@2dogsandanerd
Last active December 25, 2025 20:11
Note: This manifest describes a system designed for "Zero Data Loss" ingestion. It prioritizes accuracy and auditability over speed.
# Enterprise RAG Core – Feature Manifest V2.55
**Version:** V2.55 (Public Release Candidate)
**Status:** Production Ready / Code Verified
**Summary:** A high-precision, hybrid Graph-Vector RAG platform featuring a multi-lane ingestion engine with consensus reconciliation.
---
## 1. Agent Service (The Orchestrator)
*Handles query planning, decomposition, and context synthesis.*
* **Query Decomposition Engine:**
    * **Plan-and-Solve Pattern:** Breaks complex user prompts into multi-step execution plans.
    * **Sub-Query Modeling:** Auto-generates dependencies (`depends_on`) between steps.
    * **Routing:** Dynamically routes sub-queries to Vector Search, Graph Traversal, or Mathematical Calculation tools.
* **Semantic Caching (Redis + Embeddings):**
    * **Similarity Matching:** Caches responses based on vector similarity (>95%) rather than exact string matching.
    * **Performance:** ~40x latency reduction for recurring semantic queries.
    * **Cost Efficiency:** Estimated 80% reduction in LLM token usage for high-traffic topics.
* **Resilience & API:**
    * **Streaming Responses:** Real-time token streaming with state progress updates.
    * **Rate Limiting:** IP-based throttling (SlowAPI).
    * **Session Management:** Full conversation history with time-to-live (TTL) support.
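
The similarity-matching idea behind the semantic cache can be sketched as follows. This is a minimal in-memory illustration, not the project's implementation: the `SemanticCache` class, its `put`/`get` methods, and the plain-list storage are hypothetical stand-ins for the Redis-backed store, and embeddings are assumed to arrive as lists of floats. Only the 95% threshold comes from the manifest.

```python
import math

# From the manifest: a cached answer is reused above 95% similarity.
SIMILARITY_THRESHOLD = 0.95


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory stand-in for a Redis-backed semantic cache."""

    def __init__(self, threshold=SIMILARITY_THRESHOLD):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self._entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the best cached response above the threshold, else None."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self._entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score > self.threshold else None
```

A linear scan is only workable for small caches; a production setup would presumably delegate the nearest-neighbour lookup to a vector index.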
## 2. Ingest Service (The Multi-Lane Engine)
*The system's core differentiator: instead of relying on a single extraction method, it runs parallel lanes and reconciles them with a consensus engine.*
* **Pipeline Routing ("The Triage"):**
    * Analyzes incoming documents for complexity (layout, scan quality, text layer).
    * Routes to one or more specific processing lanes based on confidence scoring.
* **Multi-Lane Architecture:**
    * **Lane A (Fast/LedZeppelin):** Raw text extraction via PyMuPDF. <100ms/page.
    * **Lane B (Smart/Goethe):** Structure-aware extraction using Docling (Tables, Headers, Markdown).
    * **Lane C (Vision/Hawk):** VLM-based extraction (Ollama Vision) for charts, photos, and complex layouts.
* **"Solomon" Consensus Engine:**
    * **Parallel Execution:** Runs selected lanes concurrently.
    * **Reconciliation:** Compares "Ground Truth" (Text) against "Visual" (Vision) layers.
    * **Conflict Resolution:** Merges outputs to maximize coverage and accuracy.
* **Entity Extraction:**
    * LLM-based extraction of 8 core entity types (Person, Org, Location, Skill, etc.) and 7 relationship types.
    * **JSON Schema Enforcement:** Ensures strict adherence to the Graph Schema.
* **Control Room API:**
    * Real-time WebSocket feeds for pipeline status.
    * Live metrics on batch processing and lane performance.
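
To make the parallel-lane and reconciliation steps more concrete, here is a rough sketch. It is an assumption-laden illustration, not the Solomon engine itself: the lane functions are hypothetical stubs (no PyMuPDF or Ollama calls), the agreement measure (`difflib.SequenceMatcher`) and the `min_agreement` threshold are choices made for this example, and the merge rule (keep the text layer, append vision-only lines) is just one plausible reading of "merges outputs to maximize coverage".

```python
import asyncio
from difflib import SequenceMatcher


async def lane_a_text(page: str) -> str:
    # Hypothetical stub for raw text extraction (Lane A).
    return page


async def lane_c_vision(page: str) -> str:
    # Hypothetical stub for VLM extraction (Lane C); it "sees" an extra chart.
    return page + "\nChart: revenue by quarter"


async def run_lanes(page: str, lanes: dict) -> dict:
    """Run the selected lanes concurrently and return {lane_name: output}."""
    results = await asyncio.gather(*(lane(page) for lane in lanes.values()))
    return dict(zip(lanes.keys(), results))


def reconcile(text_out: str, vision_out: str, min_agreement: float = 0.8) -> dict:
    """Compare the text layer with the vision layer and merge them.

    The text layer is treated as ground truth; lines found only by the
    vision lane are appended. Low agreement is flagged for review.
    """
    agreement = SequenceMatcher(None, text_out, vision_out).ratio()
    merged = text_out.splitlines()
    seen = set(merged)
    for line in vision_out.splitlines():
        if line not in seen:
            merged.append(line)
            seen.add(line)
    return {
        "text": "\n".join(merged),
        "agreement": agreement,
        "needs_review": agreement < min_agreement,
    }


if __name__ == "__main__":
    lanes = {"text": lane_a_text, "vision": lane_c_vision}
    outputs = asyncio.run(run_lanes("Invoice 42\nTotal: 100", lanes))
    print(reconcile(outputs["text"], outputs["vision"]))
```

Flagging low-agreement pages instead of silently picking a winner fits the manifest's stated priority of auditability over speed.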
## 3. Knowledge Service (Vector Layer)
*Optimized for unstructured semantic retrieval.*
* **ChromaDB Integration:**
    * Custom HNSW configuration for Cosine/L2 distance metrics.
    * Batch chunk ingestion with metadata preservation (page numbers, source refs).
* **Retrieval Logic:**
    * **Metadata Filtering:** Pre-filtering chunks based on document ownership or attributes.
    * **Score Normalization:** Standardizes distance metrics to similarity scores (0-1).
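
Score normalization could look roughly like the sketch below. The manifest does not say which mapping is used, so both formulas are assumptions: cosine distance (range 0 to 2) is rescaled linearly, and L2 distance (unbounded) is squashed with `1 / (1 + d)`. The function name `normalize_score` is hypothetical.

```python
def normalize_score(distance: float, metric: str) -> float:
    """Map a distance to a similarity score in [0, 1] (1.0 = identical).

    Assumed mappings, not confirmed by the manifest:
      cosine: distance lies in [0, 2], so similarity = 1 - distance / 2
      l2:     distance lies in [0, inf), so similarity = 1 / (1 + distance)
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if metric == "cosine":
        return max(0.0, 1.0 - distance / 2.0)
    if metric == "l2":
        return 1.0 / (1.0 + distance)
    raise ValueError(f"unsupported metric: {metric!r}")
```

Note that scores produced by the two mappings are not directly comparable with each other, so a collection should stick to a single metric.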
## 4. Graph Service (Context Layer)
*Optimized for structured relationships and "Multi-Hop" reasoning.*
* **Neo4j Implementation:**
    * **Strict Schema:** Pre-defined ontology for Enterprise contexts (Entities: `Organization`, `Person`, `Contract`, etc.).
    * **Cypher Injection Protection:** Strict allow-listing of property names and relationship types.
* **Graph Algorithms:**
    * **Traversal:** Recursive retrieval of connected entities (e.g., "Who reports to X, who works on Project Y?").
    * **Constraint Management:** Enforced uniqueness constraints to prevent node duplication.
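
Allow-listing matters because Cypher parameters can carry values but not labels, property names, or relationship types, so those identifiers end up interpolated into the query string. A minimal sketch of the idea follows; the allow-list contents beyond the three entity names in the manifest (`REPORTS_TO`, `WORKS_ON`, `name`, `id`) and the `build_match_query` helper are hypothetical.

```python
# Entity labels named in the manifest; the other sets are hypothetical examples.
ALLOWED_LABELS = {"Organization", "Person", "Contract"}
ALLOWED_REL_TYPES = {"REPORTS_TO", "WORKS_ON"}
ALLOWED_PROPERTIES = {"name", "id"}


def build_match_query(label: str, prop: str, rel_type: str) -> str:
    """Build a Cypher query from allow-listed identifiers only.

    Identifiers are checked against fixed sets before interpolation;
    the property *value* is never interpolated and must be supplied
    separately as the `$value` query parameter.
    """
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label not allowed: {label!r}")
    if prop not in ALLOWED_PROPERTIES:
        raise ValueError(f"property not allowed: {prop!r}")
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"relationship type not allowed: {rel_type!r}")
    return f"MATCH (n:{label} {{{prop}: $value}})-[:{rel_type}]->(m) RETURN m"
```

The returned string would then be executed with the user-supplied value passed as a parameter (for example `{"value": "Alice"}`), so no untrusted text reaches the query itself.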
## 5. Shared Infrastructure & Security
* **Observability Stack:**
    * **OpenTelemetry:** Distributed tracing across all microservices (instrumented for Jaeger).
    * **Prometheus/Grafana:** Metrics for request latency, LLM token usage, and cache hit rates.
    * **Audit Logging:** Immutable logs of every data access intent (e.g., "User X viewed Document Y").
* **Security:**
    * **RBAC:** Granular permissions (`INGEST`, `READ`, `ADMIN`, `SERVICE_INTERN`).
    * **Auth:** Dual-mode authentication (JWT for users, API Keys for service-to-service).
    * **Input Sanitization:** Protection against Path Traversal and ReDoS (Regular Expression Denial of Service).
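
As an illustration of the path traversal protection mentioned above, one common approach is to resolve the requested path and verify that it still lies inside an allowed base directory. This is a sketch under assumptions: the `safe_resolve` helper and the example directory are hypothetical, and the manifest does not describe how the project actually implements the check.

```python
from pathlib import Path


def safe_resolve(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it.

    Resolving first collapses `..` segments and follows symlinks, so the
    containment check runs against the real target. Absolute user paths
    replace the base when joined and are therefore rejected as well.
    """
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # available since Python 3.9
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

For example, `safe_resolve("/srv/uploads", "reports/q1.pdf")` returns a path inside the upload directory, while `safe_resolve("/srv/uploads", "../../etc/passwd")` raises `ValueError`.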
## 6. Tech Stack
* **Core:** Python 3.11, FastAPI, Pydantic V2
* **Orchestration:** LangGraph, LangChain
* **Databases:** Neo4j 5.x (Graph), ChromaDB (Vector), Redis 7 (Cache/Queue)
* **AI/ML:** Ollama (Local Inference), Docling (PDF Processing), PyMuPDF
* **Ops:** Docker Compose, Prometheus, OpenTelemetry
---
## Roadmap (Coming in V3.0)
* **Cross-Encoder Reranking:** Re-scoring retrieval results for higher precision.
* **Full Human-in-the-Loop (HITL) UI:** A frontend for manually resolving the specific edge cases flagged by the Solomon Consensus Engine.
* **Distributed Tracing UI:** Full visualization of the request lifecycle in Grafana.
---
### Why this exists?
Most RAG systems fail on complex PDFs or hallucinate relationships. This system splits ingestion into **"Lanes"** (Text, Layout, Vision) and has them vote on the result (**Consensus**). It then stores the data in both a **Vector DB** (for fuzzy search) and a **Knowledge Graph** (for hard facts). Together, these steps yield much higher factual consistency for enterprise use cases.

2dogsandanerd commented Dec 25, 2025

ingest_cockpit telemetrie flight_deck missing content
