Agentic RAG over an Enterprise Documentation Corpus

Self-initiated · Aerojet Rocketdyne (L3Harris) · 2025 — Present

An agentic retrieval system over a large enterprise technical-documentation corpus that measurably cut specification-research time. I built it on my own initiative, and colleagues have since adopted it.

↓ research time: measured impact (specific internal figure available on request)
graph + vector: hybrid retrieval
read-only: least-privilege tool bridge

  • RAG
  • GraphRAG
  • Qdrant
  • BAAI/bge embeddings
  • MCP
  • Python
  • CUDA OCR

Context

Engineers spend an enormous amount of time locating the right passage across a sprawling technical-documentation corpus. I built an agentic retrieval system to collapse that search from a manual hunt into a grounded answer with citations.

What I built

  • Hybrid retrieval — a Qdrant vector store over BAAI/bge embeddings combined with a homegrown knowledge-graph RAG layer (entity and co-occurrence graph with community detection) so the system reasons over how documents relate, not just which are individually similar.
  • CUDA-accelerated OCR ingestion to bring scanned and image-based documents into the corpus cleanly.
  • A least-privilege, read-only tool bridge (MCP) as the enabling control — the agent can retrieve and cite, but never write, and access is scoped and fail-closed by default.
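The graph layer is homegrown and not published here, so what follows is only a minimal sketch of the general idea, under stated assumptions. It assumes entities have already been extracted per document, builds a weighted entity co-occurrence graph (edge weight = number of documents in which both entities appear), and then groups entities into communities. The entry does not name its community-detection algorithm; a deterministic label-propagation pass is swapped in here purely for brevity (Louvain or Leiden would be common alternatives). `build_cooccurrence_graph`, `label_propagation`, and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

Graph = dict[str, dict[str, int]]


def build_cooccurrence_graph(doc_entities: dict[str, set[str]]) -> Graph:
    """Hypothetical helper: edge weight = number of documents in which both entities appear."""
    graph: Graph = {}
    for entities in doc_entities.values():
        for entity in entities:
            graph.setdefault(entity, {})  # keep isolated entities as nodes
        for a, b in combinations(sorted(entities), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def label_propagation(graph: Graph, max_iters: int = 20) -> list[set[str]]:
    """Deterministic label propagation (a stand-in algorithm): each node adopts the
    neighbor label with the largest total edge weight, ties broken by smallest label."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):  # fixed visiting order keeps results reproducible
            votes: dict[str, int] = defaultdict(int)
            for neighbor, weight in graph[node].items():
                votes[labels[neighbor]] += weight
            if not votes:
                continue  # isolated node keeps its own label
            best = max(votes.values())
            new_label = min(label for label, w in votes.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities: dict[str, set[str]] = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)


# Toy data: entity sets per document, assumed to come from an upstream extractor.
docs = {
    "doc-1": {"valve", "pump", "seal"},
    "doc-2": {"valve", "pump"},
    "doc-3": {"igniter", "injector"},
    "doc-4": {"igniter", "injector", "chamber"},
}
graph = build_cooccurrence_graph(docs)
print([sorted(c) for c in label_propagation(graph)])
# [['chamber', 'igniter', 'injector'], ['pump', 'seal', 'valve']]
```

Communities like these let retrieval pull in documents that share a topic cluster with the query, even when their text is not individually the most similar.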
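The entry does not say how the vector and graph signals are combined. One common approach, swapped in here as an assumption, is reciprocal rank fusion (RRF): each retriever contributes 1 / (k + rank) for every document it returns, and documents are sorted by the summed score (k = 60 is a commonly used default). The vector ranking below stands in for the IDs a Qdrant similarity search over bge embeddings might return; it is hard-coded so the sketch runs without a Qdrant instance. `graph_ranking`, its overlap-count scoring, and the toy data are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


def graph_ranking(doc_entities: dict[str, set[str]], seed_entities: set[str]) -> list[str]:
    """Hypothetical graph signal: rank documents by how many seed entities they mention
    (for example, the entities in the query's community)."""
    overlap = {doc: len(ents & seed_entities) for doc, ents in doc_entities.items()}
    hits = [doc for doc, n in overlap.items() if n > 0]
    return sorted(hits, key=lambda d: (-overlap[d], d))


# Stand-in for the IDs a vector similarity search would return, best first.
vector_hits = ["doc-3", "doc-1", "doc-7"]
doc_entities = {
    "doc-1": {"valve", "seal", "pump"},
    "doc-3": {"valve"},
    "doc-4": {"valve", "seal"},
    "doc-7": {"igniter"},
}
graph_hits = graph_ranking(doc_entities, seed_entities={"valve", "seal", "pump"})
fused = reciprocal_rank_fusion([vector_hits, graph_hits])
print(graph_hits)  # ['doc-1', 'doc-4', 'doc-3']
print(fused)       # ['doc-1', 'doc-3', 'doc-4', 'doc-7']
```

Here doc-1 is only second in the vector list but first in the graph list, so it rises to the top, and documents found by both retrievers outrank those found by only one. Because RRF works on ranks rather than raw scores, it needs no calibration between cosine similarities and graph overlap counts.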
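MCP is named as the transport, but the bridge itself is not published. To avoid guessing at any MCP SDK API, this sketch shows only an access policy in plain Python, as one assumed way to enforce "read-only, scoped, fail-closed": write-capable tools are refused at registration, and every call is denied unless both the tool and the target collection are explicitly allowed. `ReadOnlyBridge`, `AccessDenied`, the `read_only` flag, the `search` stub, and the collection names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class AccessDenied(Exception):
    """Raised for any call the bridge has not explicitly permitted."""


@dataclass
class ReadOnlyBridge:
    # Collections this instance may read. The default is empty: nothing is readable.
    allowed_collections: frozenset[str] = frozenset()
    _tools: dict[str, Callable[..., Any]] = field(default_factory=dict, init=False, repr=False)

    def register(self, name: str, fn: Callable[..., Any], *, read_only: bool) -> None:
        # Write-capable tools are refused up front, so the agent never sees them.
        if not read_only:
            raise ValueError(f"refusing to expose write-capable tool {name!r}")
        self._tools[name] = fn

    def call(self, name: str, collection: str, **kwargs: Any) -> Any:
        # Default deny: anything not registered and explicitly in scope is rejected.
        if name not in self._tools:
            raise AccessDenied(f"unknown tool {name!r}")
        if collection not in self.allowed_collections:
            raise AccessDenied(f"collection {collection!r} is out of scope")
        return self._tools[name](collection=collection, **kwargs)


def search(collection: str, query: str) -> list[str]:
    # Hypothetical stand-in for a real retrieval call; returns a fake passage ID.
    return [f"{collection}:passage-1"]


bridge = ReadOnlyBridge(allowed_collections=frozenset({"specs"}))
bridge.register("search", search, read_only=True)
print(bridge.call("search", "specs", query="torque limits"))  # ['specs:passage-1']

refused = None
try:
    bridge.register("delete_doc", lambda **_: None, read_only=False)
except ValueError as exc:
    refused = str(exc)  # "refusing to expose write-capable tool 'delete_doc'"

denied = []
for attempt in (
    lambda: bridge.call("search", "hr-records", query="x"),  # collection out of scope
    lambda: bridge.call("delete_doc", "specs"),              # tool was never registered
):
    try:
        attempt()
    except AccessDenied as exc:
        denied.append(str(exc))
print(denied)
# ["collection 'hr-records' is out of scope", "unknown tool 'delete_doc'"]
```

The self-declared `read_only` flag only illustrates the refusal path; a sturdier setup would also give the bridge credentials that cannot write to the underlying stores, so a mislabeled tool would still fail.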

Measured result

Specification-research time dropped materially, a self-measured and work-validated result. The specific figure is available on request. The capability has since spread beyond me: colleagues now use it.

The corpus itself and the systems it lives in are deliberately unnamed here. This entry is capability-focused by design.