How can I use Context7 self-hosted? I have many projects with repositories and documentation, and I want to create something like Context7.
Got it. If by “Context7” you mean a self-hosted, local-first “context hub” that ingests your repos and docs, indexes them, and lets you query with LLMs—here’s a clean way to build and run your own, end-to-end. I’ll outline a minimal, production-ready stack you can deploy on a single machine or a small server and scale later.
Plan
- Source ingestion
  - Git repos: clone and keep in sync (scheduled pulls)
  - Documents: PDFs, MD, DOCX, HTML; optional cloud sources via APIs
  - Normalize into chunks with metadata (repo, path, commit, headings)
- Indexing + storage
  - Vector DB (Qdrant or Weaviate) for semantic search
  - Relational DB (Postgres) for canonical metadata and audit
  - Embeddings via OpenRouter/OpenAI; configurable model
- Retrieval + RAG
  - Hybrid search: BM25 + vector
  - Context assembly with citations and deduping
  - Optional code-aware reranking
- API + UI
  - FastAPI backend with routes for ingest/search/chat
  - Next.js UI for query, filters, citations, and chat
  - Auth via GitHub OAuth or simple token
- Ops
  - Docker Compose for all services
  - Background workers for ingest/resync
  - Cron-like scheduling with Celery Beat (or Temporal)
Below is a full working starter you can run immediately. It ingests local folders + Git repos, indexes to Qdrant, stores metadata in Postgres, and exposes a RAG API. UI is optional; the API returns structured answers with source citations.
Docker Compose
# filename: docker-compose.yml
version: "3.9"
services:
postgres:
image: postgres:16-alpine
environment:
POSTGRES_USER: context7
POSTGRES_PASSWORD: context7pass
POSTGRES_DB: context7
ports:
- "5432:5432"
volumes:
- pg_data:/var/lib/postgresql/data
qdrant:
    image: qdrant/qdrant:v1.9.2
ports:
- "6333:6333"
volumes:
- qdrant_data:/qdrant/storage
redis:
image: redis:7-alpine
ports:
- "6379:6379"
api:
build:
context: ./server
dockerfile: Dockerfile
environment:
DATABASE_URL: postgresql://context7:context7pass@postgres:5432/context7
QDRANT_URL: http://qdrant:6333
OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
EMBEDDING_MODEL: openai/text-embedding-3-large
COMPLETION_MODEL: openai/gpt-4.1
JWT_SECRET: ${JWT_SECRET}
INGEST_ROOT: /data
volumes:
- ./data:/data
depends_on:
- postgres
- qdrant
- redis
ports:
- "8000:8000"
worker:
build:
context: ./server
dockerfile: Dockerfile
command: ["bash", "-lc", "celery -A tasks.celery_app worker --loglevel=INFO"]
environment:
DATABASE_URL: postgresql://context7:context7pass@postgres:5432/context7
QDRANT_URL: http://qdrant:6333
OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
EMBEDDING_MODEL: openai/text-embedding-3-large
INGEST_ROOT: /data
volumes:
- ./data:/data
depends_on:
- postgres
- qdrant
- redis
beat:
build:
context: ./server
dockerfile: Dockerfile
command: ["bash", "-lc", "celery -A tasks.celery_app beat --loglevel=INFO"]
environment:
DATABASE_URL: postgresql://context7:context7pass@postgres:5432/context7
QDRANT_URL: http://qdrant:6333
OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
EMBEDDING_MODEL: openai/text-embedding-3-large
INGEST_ROOT: /data
volumes:
- ./data:/data
depends_on:
- redis
volumes:
pg_data:
qdrant_data:
Server: FastAPI + Celery + Qdrant
# filename: server/Dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y build-essential git poppler-utils \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
EXPOSE 8000
CMD ["bash", "-lc", "uvicorn main:app --host 0.0.0.0 --port 8000"]
# filename: server/requirements.txt
fastapi==0.115.0
uvicorn[standard]==0.30.0
pydantic==2.9.0
sqlalchemy==2.0.36
psycopg2-binary==2.9.10
qdrant-client==1.9.2
httpx==0.27.2
python-multipart==0.0.9
celery==5.4.0
redis==5.0.1
beautifulsoup4==4.12.3
lxml==5.3.0
unstructured[docx,pdf]==0.14.9
markdown==3.7
regex==2024.11.6
sentencepiece==0.2.0
tiktoken==0.8.0
# filename: server/main.py
import os
from typing import List, Optional

from fastapi import FastAPI, Form, HTTPException
from pydantic import BaseModel
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

from ingest import ingest_local_path, ingest_git_repo
from rag import hybrid_search_and_answer
DATABASE_URL = os.getenv("DATABASE_URL")
QDRANT_URL = os.getenv("QDRANT_URL")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "openai/text-embedding-3-large")
JWT_SECRET = os.getenv("JWT_SECRET", "change-me")
INGEST_ROOT = os.getenv("INGEST_ROOT", "/data")
app = FastAPI(title="Context7 Self-Hosted")
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
qdrant = QdrantClient(url=QDRANT_URL)
COLLECTION = "context7_chunks"
def ensure_collection():
existing = [c.name for c in qdrant.get_collections().collections]
if COLLECTION not in existing:
qdrant.create_collection(
collection_name=COLLECTION,
vectors_config=VectorParams(size=3072, distance=Distance.COSINE) # matches text-embedding-3-large
)
ensure_collection()
class SearchRequest(BaseModel):
query: str
top_k: int = 8
rerank: bool = True
    filters: Optional[dict] = None  # exact-match payload fields, e.g. {"namespace": "local"}
class SearchResponse(BaseModel):
answer: str
citations: List[dict] # [{"title": "...", "uri": "...", "path": "...", "score": 0.87}]
class IngestGitRequest(BaseModel):
repo_url: str
branch: Optional[str] = "main"
shallow: bool = True
@app.get("/health")
def health():
with engine.connect() as conn:
conn.execute(text("SELECT 1"))
qdrant.get_collection(COLLECTION)
return {"ok": True}
@app.post("/ingest/local")
def ingest_local(relative_path: str = Form(...), namespace: str = Form("local")):
base = os.path.join(INGEST_ROOT, relative_path)
if not os.path.exists(base):
raise HTTPException(status_code=404, detail="Path not found")
count = ingest_local_path(base, namespace, qdrant)
return {"indexed": count}
@app.post("/ingest/git")
def ingest_git(req: IngestGitRequest):
count = ingest_git_repo(req.repo_url, req.branch, req.shallow, qdrant)
return {"indexed": count}
@app.post("/search", response_model=SearchResponse)
def search(req: SearchRequest):
answer, citations = hybrid_search_and_answer(
req.query, req.top_k, req.rerank, req.filters, qdrant
)
return {"answer": answer, "citations": citations}
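One failure mode worth guarding against: the collection is created with size=3072 to match text-embedding-3-large, so switching embedding models breaks upserts and searches at runtime. A small startup check can fail fast. This is a sketch to append to main.py; the model-to-dimension map is an assumption, so extend it for whatever models you actually use.
# filename: server/main.py (optional addition)
# Known output dimensions for common OpenAI embedding models (illustrative).
EMBEDDING_DIMS = {
    "openai/text-embedding-3-large": 3072,
    "openai/text-embedding-3-small": 1536,
}

def check_embedding_dims():
    # Compare the collection's configured vector size with the model's output size.
    expected = EMBEDDING_DIMS.get(EMBEDDING_MODEL)
    actual = qdrant.get_collection(COLLECTION).config.params.vectors.size
    if expected is not None and expected != actual:
        raise RuntimeError(
            f"Collection vector size {actual} != model dimension {expected}; "
            "recreate the collection or change EMBEDDING_MODEL"
        )

check_embedding_dims()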
# filename: server/ingest.py
import os
import shutil
import subprocess
import tempfile
import uuid

from qdrant_client import QdrantClient
from qdrant_client.http.models import PointStruct
from unstructured.partition.auto import partition

from embeddings import embed_text
TEXT_EXTS = {".md", ".txt", ".py", ".js", ".ts", ".tsx", ".java", ".go", ".rb", ".php"}
DOC_EXTS = {".pdf", ".docx", ".html", ".htm"}
def file_to_chunks(path: str):
# Extract text using unstructured, then chunk by headings and length.
elements = partition(filename=path)
chunks = []
buf = []
current_heading = "Untitled"
for el in elements:
text = el.text.strip()
if not text:
continue
if getattr(el, "category", "").lower() == "title":
# flush previous buffer
if buf:
combined = "\n".join(buf)
chunks.append((current_heading, combined))
buf = []
current_heading = text[:120]
else:
buf.append(text)
if buf:
combined = "\n".join(buf)
chunks.append((current_heading, combined))
# Further split long chunks
final = []
for heading, body in chunks:
words = body.split()
step = 350
overlap = 60
i = 0
while i < len(words):
w = words[i:i+step]
final.append((heading, " ".join(w)))
i += step - overlap
return final
def index_chunk(qdrant: QdrantClient, collection: str, text: str, meta: dict):
    vec = embed_text(text)
    pid = str(uuid.uuid4())
    payload = dict(meta)
    payload["content"] = text  # keep the chunk text so /search can rebuild context
    qdrant.upsert(
        collection_name=collection,
        points=[PointStruct(id=pid, vector=vec, payload=payload)]
    )
def ingest_local_path(base: str, namespace: str, qdrant: QdrantClient, collection: str = "context7_chunks"):
count = 0
for root, _, files in os.walk(base):
for f in files:
ext = os.path.splitext(f)[1].lower()
if ext in TEXT_EXTS or ext in DOC_EXTS:
path = os.path.join(root, f)
rel = os.path.relpath(path, base)
                chunks = file_to_chunks(path)
for heading, body in chunks:
meta = {
"namespace": namespace,
"source": "local",
"title": heading,
"path": rel,
"fullpath": path,
}
index_chunk(qdrant, collection, body, meta)
count += 1
return count
def ingest_git_repo(repo_url: str, branch: str, shallow: bool, qdrant: QdrantClient, collection: str = "context7_chunks"):
tmp = tempfile.mkdtemp()
try:
clone_args = ["git", "clone"]
if shallow:
clone_args += ["--depth", "1"]
clone_args += ["--branch", branch, repo_url, tmp]
subprocess.run(clone_args, check=True)
# Index files
count = ingest_local_path(tmp, namespace=repo_url, qdrant=qdrant, collection=collection)
return count
finally:
shutil.rmtree(tmp)
# filename: server/embeddings.py
import os

import httpx
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "openai/text-embedding-3-large")
def embed_text(text: str):
if not OPENROUTER_API_KEY:
raise RuntimeError("OPENROUTER_API_KEY not set")
# Truncate to model max if needed
text = text[:12000]
headers = {
"Authorization": f"Bearer {OPENROUTER_API_KEY}",
"HTTP-Referer": "https://localhost",
"X-Title": "Context7 Self-Hosted",
}
payload = {
"model": EMBEDDING_MODEL,
"input": text,
}
    # Assumes the gateway exposes an OpenAI-compatible /embeddings route; if
    # yours doesn't (OpenRouter's support here varies), point this at
    # https://api.openai.com/v1/embeddings with an OpenAI key instead.
    url = "https://openrouter.ai/api/v1/embeddings"
r = httpx.post(url, json=payload, headers=headers, timeout=60)
r.raise_for_status()
data = r.json()
return data["data"][0]["embedding"]
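Ingest currently embeds one chunk per request, which is slow for large repos. OpenAI-compatible /embeddings endpoints accept a list as input, so a batch helper cuts request overhead. A sketch, under the same endpoint assumptions as embed_text above; embed_texts is a new helper, not part of the files shown so far.
# filename: server/embeddings.py (optional batch helper)
from typing import List

def embed_texts(texts: List[str]) -> List[List[float]]:
    # Batch variant of embed_text: one request embeds many chunks at once.
    if not OPENROUTER_API_KEY:
        raise RuntimeError("OPENROUTER_API_KEY not set")
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "HTTP-Referer": "https://localhost",
        "X-Title": "Context7 Self-Hosted",
    }
    payload = {"model": EMBEDDING_MODEL, "input": [t[:12000] for t in texts]}
    r = httpx.post("https://openrouter.ai/api/v1/embeddings", json=payload, headers=headers, timeout=120)
    r.raise_for_status()
    # Each result carries an "index" field; sort to restore input order.
    items = sorted(r.json()["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in items]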
# filename: server/rag.py
import os
from typing import List, Optional, Tuple

import httpx
from qdrant_client import QdrantClient
from qdrant_client.http.models import Filter, FieldCondition, MatchValue, SearchParams

from embeddings import embed_text

COMPLETION_MODEL = os.getenv("COMPLETION_MODEL", "openai/gpt-4.1")
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
COLLECTION = "context7_chunks"
def _filters_to_qdrant(filters: Optional[dict]) -> Optional[Filter]:
if not filters:
return None
conditions = []
for k, v in filters.items():
conditions.append(FieldCondition(key=k, match=MatchValue(value=v)))
return Filter(must=conditions)
def _completion(prompt: str) -> str:
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "HTTP-Referer": "https://localhost",
        "X-Title": "Context7 Self-Hosted",
    }
    url = "https://openrouter.ai/api/v1/chat/completions"
    # OpenAI-compatible "messages" chat format; OpenRouter routes by "model".
    payload = {
        "model": COMPLETION_MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant. Use the provided context to answer succinctly with citations."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
        "max_tokens": 800,
    }
r = httpx.post(url, json=payload, headers=headers, timeout=60)
r.raise_for_status()
data = r.json()
return data["choices"][0]["message"]["content"]
def hybrid_search_and_answer(query: str, top_k: int, rerank: bool, filters: Optional[dict], qdrant: QdrantClient) -> Tuple[str, List[dict]]:
# Simple dense-only search for brevity; you can add BM25 via local index later
res = qdrant.search(
collection_name=COLLECTION,
query_vector=_embed_query(query),
limit=top_k,
query_filter=_filters_to_qdrant(filters),
search_params=SearchParams(hnsw_ef=128)
)
    contexts = []
    citations = []
    for point in res:
        payload = point.payload or {}
        # ingest.index_chunk stores the chunk text in the payload under "content".
        content = payload.get("content", "")
        if not content:
            continue
        contexts.append(f"[{payload.get('title', 'Untitled')}] {content}")
citations.append({
"title": payload.get("title", "Untitled"),
"uri": payload.get("path", ""),
"path": payload.get("path", ""),
"score": float(point.score),
})
prompt = f"Answer the question based on the following context. Cite titles/paths in parentheses. Question: {query}\n\nContext:\n" + "\n\n".join(contexts[:8])
answer = _completion(prompt)
return answer, citations
def _embed_query(query: str):
    # Reuse the shared helper so query and chunk vectors come from the same model.
    return embed_text(query)
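The worker and beat services in docker-compose.yml load tasks.celery_app, which none of the files above define; without it those containers crash-loop. Here's a minimal sketch. The repos.json registry and the hourly schedule are assumptions: in production you'd track repos in Postgres and reindex deltas rather than re-ingesting everything.
# filename: server/tasks.py
import json
import os

from celery import Celery
from celery.schedules import crontab
from qdrant_client import QdrantClient

from ingest import ingest_git_repo

# The "redis" hostname comes from docker-compose.yml.
celery_app = Celery("context7", broker="redis://redis:6379/0", backend="redis://redis:6379/1")

# Hypothetical registry: a JSON list like [{"repo_url": "...", "branch": "main"}].
REPOS_FILE = os.path.join(os.getenv("INGEST_ROOT", "/data"), "repos.json")

@celery_app.task
def resync_repo(repo_url: str, branch: str = "main") -> int:
    qdrant = QdrantClient(url=os.getenv("QDRANT_URL", "http://qdrant:6333"))
    # Naive resync: re-clones and re-indexes; delete stale points first for a true delta.
    return ingest_git_repo(repo_url, branch, shallow=True, qdrant=qdrant)

@celery_app.task
def resync_all() -> int:
    if not os.path.exists(REPOS_FILE):
        return 0
    with open(REPOS_FILE) as f:
        repos = json.load(f)
    for r in repos:
        resync_repo.delay(r["repo_url"], r.get("branch", "main"))
    return len(repos)

# Beat drives the hourly resync.
celery_app.conf.beat_schedule = {
    "hourly-resync": {"task": "tasks.resync_all", "schedule": crontab(minute=0)},
}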
Quick start
- Prepare data
  - Put repos and docs under ./data (mounted at /data in the containers)
  - Or point /ingest/local at subfolders
- Set env
  - Create .env with OPENROUTER_API_KEY and a strong JWT_SECRET
- Run
  - docker compose up --build
  - Health: GET http://localhost:8000/health
- Ingest
  - POST http://localhost:8000/ingest/local with form data: relative_path=. (paths are relative to /data), namespace=local
  - POST http://localhost:8000/ingest/git with JSON: {"repo_url":"https://github.com/your/repo","branch":"main","shallow":true}
- Search
  - POST http://localhost:8000/search with JSON: {"query":"How does billing work?","top_k":8}
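If you'd rather script these calls than curl them, here's a minimal Python client. A sketch: the endpoints and field names match the API above, and the file path is just a suggestion.
# filename: examples/client.py
import httpx

BASE = "http://localhost:8000"

# Index everything under the mounted /data directory.
r = httpx.post(f"{BASE}/ingest/local", data={"relative_path": ".", "namespace": "local"}, timeout=None)
print(r.json())  # e.g. {"indexed": 123}

# Ask a question, restricted to the "local" namespace via an exact-match payload filter.
r = httpx.post(
    f"{BASE}/search",
    json={"query": "How does billing work?", "top_k": 8, "filters": {"namespace": "local"}},
    timeout=120,
)
body = r.json()
print(body["answer"])
for c in body["citations"]:
    print(f'- {c["title"]} ({c["path"]}, score={c["score"]:.2f})')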
Notes and improvements
- Payloads: index_chunk stores each chunk's text in the payload under "content", which is what /search reads to assemble context; if you change the payload schema, reindex.
- Hybrid search: add BM25 via a local Whoosh/Lucene index or Postgres full-text search and merge with vector scores (see the RRF sketch after this list).
- Reranking: use Cohere rerank or an LLM scoring pass on candidate chunks.
- Code-awareness: use a code embedding model for code files and a text model for docs; store the modality in the payload and search both.
- Auth: add GitHub OAuth to restrict the UI; issue JWTs for the API.
- Scheduling: Celery Beat can run "git pull + delta reindex" hourly (see tasks.py above).
- Security
  - Do not hardcode secrets; use env + Docker secrets
  - Validate file types; ignore binaries
  - Rate limit ingestion workers
  - Do not expose /ingest publicly without auth
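For the hybrid-search item above, reciprocal rank fusion (RRF) is a simple way to merge keyword and vector rankings without comparing raw scores. A sketch, assuming you also mirror chunks into a Postgres table with a tsvector column; the chunks table and its tsv column are hypothetical and would be populated during ingest alongside the Qdrant upsert.
# filename: server/hybrid.py (sketch)
from typing import List

from sqlalchemy import text as sql_text

def keyword_search(engine, query: str, limit: int = 20) -> List[str]:
    # Chunk ids ranked by Postgres full-text relevance (a BM25-like signal).
    stmt = sql_text("""
        SELECT id FROM chunks
        WHERE tsv @@ plainto_tsquery('english', :q)
        ORDER BY ts_rank(tsv, plainto_tsquery('english', :q)) DESC
        LIMIT :limit
    """)
    with engine.connect() as conn:
        return [row[0] for row in conn.execute(stmt, {"q": query, "limit": limit})]

def rrf_merge(dense_ids: List[str], keyword_ids: List[str], k: int = 60) -> List[str]:
    # RRF: score(d) = sum over lists of 1 / (k + rank(d)). Ranks stay comparable
    # even when the raw scores (cosine vs ts_rank) are on different scales.
    scores: dict = {}
    for ranked in (dense_ids, keyword_ids):
        for rank, pid in enumerate(ranked, start=1):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)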
If you want a Next.js UI with filters, I can add a lightweight client in the same repo. I'm also happy to tailor this to your stack (Laravel/Next.js/TypeScript) if you'd rather build the API in PHP and call Qdrant over its HTTP API.
Assumptions: “Context7” refers to a general self-hosted context + RAG system, not a specific closed-source product. If you had a particular feature set in mind (e.g., conversation memory, agent tools, multi-tenant, ACLs), tell me and I’ll extend the design.