AI Document Intelligence
DocAgent
DocAgent turns raw construction documents into a queryable system. Files flow through a 7-stage async pipeline (parse, chunk, embed, classify, profile, extract, store) that produces both semantic search vectors and structured SQL extractions. An AI agent with 5 tools then answers business questions: aggregating costs, comparing budgets vs. actuals, listing line items, or searching document text. Every number in an answer is fetched from SQL rather than generated, and the agent does no arithmetic itself. Document profiles (stored as JSONB) supply per-document query hints and tool suggestions that guide the agent's strategy. Answers render as typed cards: tables, timelines, fact grids, and party cards.
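The stage ordering described above can be illustrated with a minimal sketch. Only the seven stage names and their order come from the description; `JobState`, `makeHandler`, `handlers`, and `runPipeline` are hypothetical names, and the placeholder handlers simply record that a stage ran. The queue layer (BullMQ + Redis) is deliberately left out, so this shows the sequencing only, not how DocAgent actually schedules jobs.

```typescript
// Stage names and order are taken from the description; everything else is hypothetical.
const STAGES = ["parse", "chunk", "embed", "classify", "profile", "extract", "store"] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical state handed from one stage to the next.
interface JobState {
  documentId: string;
  completed: Stage[];
}

type StageHandler = (state: JobState) => Promise<JobState>;

// Placeholder handler: a real stage would do its work before recording completion.
function makeHandler(stage: Stage): StageHandler {
  return async (state) => ({ ...state, completed: [...state.completed, stage] });
}

const handlers: Record<Stage, StageHandler> = {
  parse: makeHandler("parse"),
  chunk: makeHandler("chunk"),
  embed: makeHandler("embed"),
  classify: makeHandler("classify"),
  profile: makeHandler("profile"),
  extract: makeHandler("extract"),
  store: makeHandler("store"),
};

// Run every stage in order; a rejection in any stage stops the run.
async function runPipeline(documentId: string): Promise<JobState> {
  let state: JobState = { documentId, completed: [] };
  for (const stage of STAGES) {
    state = await handlers[stage](state);
  }
  return state;
}
```

Calling `runPipeline("doc-1")` resolves to a state whose `completed` array lists all seven stages in pipeline order. In a queue-backed setup each handler would presumably run inside a background worker rather than in the request path, which is what keeps the UI responsive.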
// engineering highlights
- 7-stage ingestion pipeline (parse → chunk → embed → classify → profile → extract → store) runs fully async via BullMQ + Redis, so the UI stays responsive throughout
- Semantic search via a pgvector HNSW index over 1,536-dim OpenAI embeddings matches on meaning across documents, not just keywords
- Deterministic Excel extraction: the LLM infers the column schema once, then regex processes every row, so no tokens are spent per row after the first pass
- AI agent chooses among 5 tools per question; all arithmetic is offloaded to compute_result, so reported numbers are computed from stored data rather than generated by the model
- Unified JSONB document profile stores AI-generated query hints and suggested tools, giving the agent a per-document query strategy
- Semantic category matching embeds user terms and resolves them to the nearest sheet/section names via cosine distance, which handles abbreviations, synonyms, and typos
- Structured JSON answers render as typed UI cards (key_facts, table, timeline, list, parties), giving a predictable shape and an extensible frontend
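The semantic category matching highlight can be sketched as a nearest-neighbour lookup by cosine distance. This is an illustration under stated assumptions, not DocAgent's implementation: `Category`, `cosineDistance`, `nearestCategory`, and the `maxDistance` cutoff are hypothetical, and the 3-dimensional vectors are toy stand-ins for real embeddings. In the system described here the comparison would more likely run inside pgvector than in application code.

```typescript
// Hypothetical shape: a sheet/section name paired with its embedding.
interface Category {
  name: string;
  embedding: number[];
}

// Cosine distance = 1 - cosine similarity; smaller means closer in meaning.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  if (denom === 0) throw new Error("cosine distance is undefined for a zero vector");
  return 1 - dot / denom;
}

// Return the closest category, or null when nothing is within maxDistance.
// The 0.5 default is an arbitrary illustrative cutoff, not a tuned value.
function nearestCategory(
  query: number[],
  categories: Category[],
  maxDistance = 0.5,
): Category | null {
  let best: Category | null = null;
  let bestDistance = Infinity;
  for (const category of categories) {
    const distance = cosineDistance(query, category.embedding);
    if (distance < bestDistance) {
      best = category;
      bestDistance = distance;
    }
  }
  return bestDistance <= maxDistance ? best : null;
}

// Toy data: real embeddings would come from the embedding model.
const categories: Category[] = [
  { name: "Electrical", embedding: [1, 0, 0] },
  { name: "Plumbing", embedding: [0, 1, 0] },
];
```

With this toy data, a query vector such as `[0.9, 0.1, 0]` resolves to the "Electrical" category, while a vector orthogonal to both (`[0, 0, 1]`) falls outside the cutoff and returns `null`. The cutoff is what lets the caller fall back to another strategy instead of accepting a poor match.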