Question

How can I use Context7 self-hosted? I have many projects with repositories and documentation. I want to create something like Context7.

Answer

Got it. If by “Context7” you mean a self-hosted, local-first “context hub” that ingests your repos and docs, indexes them, and lets you query with LLMs—here’s a clean way to build and run your own, end-to-end. I’ll outline a minimal, production-ready stack you can deploy on a single machine or a small server and scale later.

Plan

  1. Source ingestion
  • Git repos: clone and keep in sync (scheduled pulls)
  • Documents: PDFs, MD, DOCX, HTML; optional cloud sources via APIs
  • Normalize into chunks with metadata (repo, path, commit, headings)

  2. Indexing + storage
  • Vector DB (Qdrant or Weaviate) for semantic search
  • Relational DB (Postgres) for canonical metadata and audit
  • Embeddings via OpenRouter/OpenAI; configurable model

  3. Retrieval + RAG
  • Hybrid search: BM25 + vector
  • Context assembly with citations and deduping
  • Optional code-aware reranking

  4. API + UI
  • FastAPI backend with routes for ingest/search/chat
  • Next.js UI for query, filters, citations, and chat
  • Auth via GitHub OAuth or simple token

  5. Ops
  • Docker Compose for all services
  • Background workers for ingest/resync
  • Cron-like scheduling with Celery/Beat (or Temporal)

Below is a full working starter you can run immediately. It ingests local folders + Git repos, indexes to Qdrant, stores metadata in Postgres, and exposes a RAG API. UI is optional; the API returns structured answers with source citations.

Docker Compose

```yaml
# filename: docker-compose.yml
version: "3.9"

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: context7
      POSTGRES_PASSWORD: context7pass
      POSTGRES_DB: context7
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data

  qdrant:
    image: qdrant/qdrant:v1.8.4
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  api:
    build:
      context: ./server
      dockerfile: Dockerfile
    environment:
      DATABASE_URL: postgresql://context7:context7pass@postgres:5432/context7
      QDRANT_URL: http://qdrant:6333
      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
      EMBEDDING_MODEL: openai/text-embedding-3-large
      COMPLETION_MODEL: openai/gpt-4.1
      JWT_SECRET: ${JWT_SECRET}
      INGEST_ROOT: /data
    volumes:
      - ./data:/data
    depends_on:
      - postgres
      - qdrant
      - redis
    ports:
      - "8000:8000"

  worker:
    build:
      context: ./server
      dockerfile: Dockerfile
    command: ["bash", "-lc", "celery -A tasks.celery_app worker --loglevel=INFO"]
    environment:
      DATABASE_URL: postgresql://context7:context7pass@postgres:5432/context7
      QDRANT_URL: http://qdrant:6333
      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
      EMBEDDING_MODEL: openai/text-embedding-3-large
      INGEST_ROOT: /data
    volumes:
      - ./data:/data
    depends_on:
      - postgres
      - qdrant
      - redis

  beat:
    build:
      context: ./server
      dockerfile: Dockerfile
    command: ["bash", "-lc", "celery -A tasks.celery_app beat --loglevel=INFO"]
    environment:
      DATABASE_URL: postgresql://context7:context7pass@postgres:5432/context7
      QDRANT_URL: http://qdrant:6333
      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
      EMBEDDING_MODEL: openai/text-embedding-3-large
      INGEST_ROOT: /data
    volumes:
      - ./data:/data
    depends_on:
      - redis

volumes:
  pg_data:
  qdrant_data:
```

Server: FastAPI + Celery + Qdrant

```dockerfile
# filename: server/Dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y build-essential git poppler-utils \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
EXPOSE 8000
CMD ["bash", "-lc", "uvicorn main:app --host 0.0.0.0 --port 8000"]
```

```text
# filename: server/requirements.txt
fastapi==0.115.0
uvicorn[standard]==0.30.0
pydantic==2.9.0
sqlalchemy==2.0.36
psycopg2-binary==2.9.10
qdrant-client==1.9.2
httpx==0.27.2
python-multipart==0.0.9
celery==5.4.0
redis==5.0.1
beautifulsoup4==4.12.3
lxml==5.3.0
unstructured[docx,pdf]==0.14.9
markdown==3.7
regex==2024.11.6
sentencepiece==0.2.0
tiktoken==0.8.0
```

```python
# filename: server/main.py
import os
from typing import List, Optional

from fastapi import FastAPI, Form, HTTPException
from pydantic import BaseModel
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

from ingest import ingest_local_path, ingest_git_repo
from rag import hybrid_search_and_answer

DATABASE_URL = os.getenv("DATABASE_URL")
QDRANT_URL = os.getenv("QDRANT_URL")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "openai/text-embedding-3-large")
JWT_SECRET = os.getenv("JWT_SECRET", "change-me")
INGEST_ROOT = os.getenv("INGEST_ROOT", "/data")

app = FastAPI(title="Context7 Self-Hosted")
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
qdrant = QdrantClient(url=QDRANT_URL)

COLLECTION = "context7_chunks"

def ensure_collection():
    existing = [c.name for c in qdrant.get_collections().collections]
    if COLLECTION not in existing:
        qdrant.create_collection(
            collection_name=COLLECTION,
            # 3072 dims matches text-embedding-3-large
            vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
        )

ensure_collection()

class SearchRequest(BaseModel):
    query: str
    top_k: int = 8
    rerank: bool = True
    filters: Optional[dict] = None  # exact-match payload fields, e.g. {"namespace": "docs"}

class SearchResponse(BaseModel):
    answer: str
    citations: List[dict]  # [{"title": "...", "uri": "...", "path": "...", "score": 0.87}]

class IngestGitRequest(BaseModel):
    repo_url: str
    branch: Optional[str] = "main"
    shallow: bool = True

@app.get("/health")
def health():
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
    qdrant.get_collection(COLLECTION)
    return {"ok": True}

@app.post("/ingest/local")
def ingest_local(relative_path: str = Form(...), namespace: str = Form("local")):
    base = os.path.join(INGEST_ROOT, relative_path)
    if not os.path.exists(base):
        raise HTTPException(status_code=404, detail="Path not found")
    count = ingest_local_path(base, namespace, qdrant)
    return {"indexed": count}

@app.post("/ingest/git")
def ingest_git(req: IngestGitRequest):
    count = ingest_git_repo(req.repo_url, req.branch, req.shallow, qdrant)
    return {"indexed": count}

@app.post("/search", response_model=SearchResponse)
def search(req: SearchRequest):
    answer, citations = hybrid_search_and_answer(
        req.query, req.top_k, req.rerank, req.filters, qdrant
    )
    return {"answer": answer, "citations": citations}
```

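main.py creates the SQLAlchemy engine but never defines tables. A sketch of the metadata/audit table the plan calls for; the schema below is an assumption, nothing in the other files depends on it:

```python
# filename: server/models.py — an assumed starting schema for the plan's
# "canonical metadata and audit" layer; create via Base.metadata.create_all(engine).
from sqlalchemy import Column, DateTime, Integer, String, Text, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Source(Base):
    """One ingested repo or folder, for audit and resync bookkeeping."""
    __tablename__ = "sources"
    id = Column(Integer, primary_key=True)
    kind = Column(String(16), nullable=False)        # "git" or "local"
    uri = Column(Text, nullable=False, unique=True)  # repo URL or data path
    branch = Column(String(128))
    last_indexed_at = Column(DateTime(timezone=True), server_default=func.now())
    chunk_count = Column(Integer, default=0)
```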
```python
# filename: server/ingest.py
import os
import shutil
import subprocess
import tempfile
import uuid

from qdrant_client import QdrantClient
from qdrant_client.http.models import PointStruct
from unstructured.partition.auto import partition

from embeddings import embed_text

TEXT_EXTS = {".md", ".txt", ".py", ".js", ".ts", ".tsx", ".java", ".go", ".rb", ".php"}
DOC_EXTS = {".pdf", ".docx", ".html", ".htm"}

def file_to_chunks(path: str):
    # Extract text using unstructured, then chunk by headings and length.
    elements = partition(filename=path)
    chunks = []
    buf = []
    current_heading = "Untitled"
    for el in elements:
        text = el.text.strip()
        if not text:
            continue
        if getattr(el, "category", "").lower() == "title":
            # A new heading starts: flush the buffer under the previous one.
            if buf:
                chunks.append((current_heading, "\n".join(buf)))
                buf = []
            current_heading = text[:120]
        else:
            buf.append(text)
    if buf:
        chunks.append((current_heading, "\n".join(buf)))
    # Further split long chunks with a sliding word window (350 words, 60 overlap).
    final = []
    step = 350
    overlap = 60
    for heading, body in chunks:
        words = body.split()
        i = 0
        while i < len(words):
            final.append((heading, " ".join(words[i:i + step])))
            i += step - overlap
    return final

def index_chunk(qdrant: QdrantClient, collection: str, text: str, meta: dict):
    vec = embed_text(text)
    payload = dict(meta)
    payload["content"] = text  # store the chunk text so rag.py can assemble context
    qdrant.upsert(
        collection_name=collection,
        points=[PointStruct(id=uuid.uuid4().hex, vector=vec, payload=payload)],
    )

def ingest_local_path(base: str, namespace: str, qdrant: QdrantClient, collection: str = "context7_chunks"):
    count = 0
    for root, _, files in os.walk(base):
        for f in files:
            ext = os.path.splitext(f)[1].lower()
            if ext in TEXT_EXTS or ext in DOC_EXTS:
                path = os.path.join(root, f)
                rel = os.path.relpath(path, base)
                for heading, body in file_to_chunks(path):
                    meta = {
                        "namespace": namespace,
                        "source": "local",
                        "title": heading,
                        "path": rel,
                        "fullpath": path,
                    }
                    index_chunk(qdrant, collection, body, meta)
                    count += 1
    return count

def ingest_git_repo(repo_url: str, branch: str, shallow: bool, qdrant: QdrantClient, collection: str = "context7_chunks"):
    tmp = tempfile.mkdtemp()
    try:
        clone_args = ["git", "clone"]
        if shallow:
            clone_args += ["--depth", "1"]
        clone_args += ["--branch", branch, repo_url, tmp]
        subprocess.run(clone_args, check=True)
        return ingest_local_path(tmp, namespace=repo_url, qdrant=qdrant, collection=collection)
    finally:
        shutil.rmtree(tmp)
```

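One optional tweak, sketched below: derive the Qdrant point ID from a content hash instead of uuid4, so re-running ingestion upserts existing chunks rather than duplicating them. stable_point_id is a hypothetical helper, not used by the code above.

```python
# filename: server/ingest.py (optional) — deterministic point IDs so repeated
# ingestion runs upsert the same points instead of creating duplicates.
import hashlib
import uuid

def stable_point_id(namespace: str, path: str, text: str) -> str:
    digest = hashlib.sha256(f"{namespace}:{path}:{text}".encode()).hexdigest()
    # Qdrant point IDs must be unsigned ints or UUIDs; fold the hash into a UUID.
    return str(uuid.UUID(digest[:32]))
```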
```python
# filename: server/embeddings.py
import os

import httpx

OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "openai/text-embedding-3-large")

def embed_text(text: str):
    if not OPENROUTER_API_KEY:
        raise RuntimeError("OPENROUTER_API_KEY not set")
    # Truncate to the model's max input if needed
    text = text[:12000]
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "HTTP-Referer": "https://localhost",
        "X-Title": "Context7 Self-Hosted",
    }
    payload = {"model": EMBEDDING_MODEL, "input": text}
    url = "https://openrouter.ai/api/v1/embeddings"
    r = httpx.post(url, json=payload, headers=headers, timeout=60)
    r.raise_for_status()
    return r.json()["data"][0]["embedding"]
```

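embed_text makes one HTTP call per chunk, which will dominate ingest time on large repos. A batched variant, sketched under the assumption that the endpoint accepts OpenAI-style list inputs:

```python
# filename: server/embeddings.py (optional) — batch several chunks per request.
# Assumes the OpenRouter embeddings endpoint mirrors OpenAI's list-input API.
def embed_texts(texts: list[str]) -> list[list[float]]:
    if not OPENROUTER_API_KEY:
        raise RuntimeError("OPENROUTER_API_KEY not set")
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "HTTP-Referer": "https://localhost",
        "X-Title": "Context7 Self-Hosted",
    }
    payload = {"model": EMBEDDING_MODEL, "input": [t[:12000] for t in texts]}
    r = httpx.post("https://openrouter.ai/api/v1/embeddings",
                   json=payload, headers=headers, timeout=120)
    r.raise_for_status()
    data = r.json()["data"]
    # One embedding per input, returned index-aligned.
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]
```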
```python
# filename: server/rag.py
import os
from typing import List, Optional, Tuple

import httpx
from qdrant_client import QdrantClient
from qdrant_client.http.models import FieldCondition, Filter, MatchValue, SearchParams

from embeddings import embed_text

COMPLETION_MODEL = os.getenv("COMPLETION_MODEL", "openai/gpt-4.1")
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
COLLECTION = "context7_chunks"

def _filters_to_qdrant(filters: Optional[dict]) -> Optional[Filter]:
    if not filters:
        return None
    conditions = [FieldCondition(key=k, match=MatchValue(value=v)) for k, v in filters.items()]
    return Filter(must=conditions)

def _completion(prompt: str) -> str:
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "HTTP-Referer": "https://localhost",
        "X-Title": "Context7 Self-Hosted",
    }
    # Use the "messages" chat format for broad model compatibility.
    payload = {
        "model": COMPLETION_MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant. Use the provided context to answer succinctly with citations."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }
    url = "https://openrouter.ai/api/v1/chat/completions"
    r = httpx.post(url, json=payload, headers=headers, timeout=60)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def hybrid_search_and_answer(query: str, top_k: int, rerank: bool, filters: Optional[dict], qdrant: QdrantClient) -> Tuple[str, List[dict]]:
    # Dense-only search for brevity; `rerank` is reserved for a future scoring
    # pass, and BM25 can be merged in later (see notes below).
    res = qdrant.search(
        collection_name=COLLECTION,
        query_vector=embed_text(query),
        limit=top_k,
        query_filter=_filters_to_qdrant(filters),
        search_params=SearchParams(hnsw_ef=128),
    )
    contexts = []
    citations = []
    for point in res:
        payload = point.payload or {}
        # index_chunk stores the chunk text in the payload under "content".
        content = payload.get("content") or ""
        if not content:
            continue
        contexts.append(f"[{payload.get('title', 'Untitled')}] {content}")
        citations.append({
            "title": payload.get("title", "Untitled"),
            "uri": payload.get("path", ""),
            "path": payload.get("path", ""),
            "score": float(point.score),
        })
    prompt = (
        "Answer the question based on the following context. "
        "Cite titles/paths in parentheses. "
        f"Question: {query}\n\nContext:\n" + "\n\n".join(contexts[:8])
    )
    answer = _completion(prompt)
    return answer, citations
```

Quick start

  1. Prepare data
  • Put repos and docs under ./data
  • Or point /ingest/local at subfolders

  2. Set env
  • Create .env with OPENROUTER_API_KEY and a strong JWT_SECRET

  3. Run: docker compose up -d --build, then check GET /health

  4. Ingest: POST /ingest/local with form fields relative_path and namespace, or POST /ingest/git with a repo URL

  5. Search: POST /search with a JSON body (see the example client below)
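A minimal sketch of driving the API with httpx once the stack is up; "my-docs" is a hypothetical subfolder of ./data, not something created above:

```python
# filename: examples/client.py — a minimal sketch for exercising the API.
# Assumes the stack is running on localhost:8000.
import httpx

BASE = "http://localhost:8000"

# Ingest a subfolder under ./data (mounted at /data in the api container).
r = httpx.post(f"{BASE}/ingest/local",
               data={"relative_path": "my-docs", "namespace": "docs"},
               timeout=600)
print(r.json())  # {"indexed": <number of chunks>}

# Ask a question, optionally filtered to one namespace.
r = httpx.post(f"{BASE}/search",
               json={"query": "How does authentication work?",
                     "top_k": 8,
                     "filters": {"namespace": "docs"}},
               timeout=120)
res = r.json()
print(res["answer"])
for c in res["citations"]:
    print(f"- {c['title']} ({c['path']}) score={c['score']:.2f}")
```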

Notes and improvements

  • Chunk payloads: index_chunk stores the chunk text in the payload under "content", and rag.py reads that field to assemble context. Keep the two in sync, and reindex after changing the field name.

  • Hybrid search: Add BM25 with a local Whoosh/Lucene index or Postgres full-text search and merge with the vector scores (a merge sketch follows this list).

  • Reranking: Use Cohere rerank or an LLM scoring pass on candidate chunks.

  • Code-awareness: Use a code embedding model for code files and a text model for docs; store the modality in the payload and search both.

  • Auth: Add GitHub OAuth to restrict the UI; issue JWTs for the API.

  • Scheduling: Celery Beat can run "git pull + delta reindex" hourly (a tasks.py sketch also follows this list).

  • Security
    • Do not hardcode secrets; use env vars + Docker secrets
    • Validate file types; ignore binaries
    • Rate-limit ingestion workers
    • Avoid exposing /ingest publicly without auth
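
As referenced above, one way to merge lexical and vector results is reciprocal rank fusion. The sketch assumes bm25_hits and vector_hits are rank-ordered (point_id, score) lists produced by your BM25 index and the Qdrant query in rag.py:

```python
# filename: server/hybrid.py — a sketch, not wired into the API yet.
# RRF ignores raw scores and combines ranks, which sidesteps normalizing
# BM25 scores against cosine similarities.
from collections import defaultdict

def rrf_merge(bm25_hits, vector_hits, k: int = 60, top_k: int = 8):
    """Each hits list is [(point_id, score), ...], best first."""
    fused = defaultdict(float)
    for hits in (bm25_hits, vector_hits):
        for rank, (point_id, _score) in enumerate(hits, start=1):
            fused[point_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```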

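The compose file runs celery -A tasks.celery_app, which the files above never define. A minimal sketch; the hourly schedule and the task body are assumptions:

```python
# filename: server/tasks.py — minimal Celery app matching the compose commands.
import os

from celery import Celery

celery_app = Celery("context7", broker=os.getenv("REDIS_URL", "redis://redis:6379/0"))

# Celery Beat schedule: resync sources every hour.
celery_app.conf.beat_schedule = {
    "resync-sources-hourly": {
        "task": "tasks.resync_sources",
        "schedule": 3600.0,  # seconds
    },
}

@celery_app.task(name="tasks.resync_sources")
def resync_sources():
    # Hypothetical body: iterate registered repos (e.g. the sources table),
    # `git pull`, and reindex only files whose content changed since last run.
    ...
```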
If you want a Next.js UI and filters, I can add a lightweight client in the same repo. Also happy to tailor this to your stack (Laravel/Next.js/TypeScript) if you prefer PHP for the API and use Qdrant via its HTTP API.

Assumptions: “Context7” refers to a general self-hosted context + RAG system, not a specific closed-source product. If you had a particular feature set in mind (e.g., conversation memory, agent tools, multi-tenant, ACLs), tell me and I’ll extend the design.
