PRODUCTION RAG SYSTEM

UK Finance Domain
Intelligence System

A production-grade Retrieval-Augmented Generation system for answering questions over real UK financial documents — with evidence, traceability, and deployment realism.


Machine Learning Systems Are About
Trust, Not Text

Most RAG demos stop at "the model gave the right answer."

This system asks a harder question: Why should you trust the answer at all?

Financial documents are long, dense, and ambiguous. Large language models are powerful — but unreliable when left alone.

This project treats retrieval-augmented generation as a systems problem, not a prompting problem.

Rather than optimizing for fluency, the system is designed around:

01  Evidence Grounding
02  Traceability
03  Deployment Realism
04  Observability
05  Failure Transparency

Every answer is:

  • backed by retrieved source passages
  • attributable to a specific document and page range
  • generated by a stateless, production-ready API
  • visible through a simple but honest UI

This is not a chatbot. It is a document intelligence pipeline.

01

Document Registry

What happens here

The system begins with a declarative registry of financial documents.

Each document is defined by:

  • company
  • fiscal year
  • report type
  • source URL

No document is processed unless it is explicitly declared.

Why this matters

This creates reproducibility, auditability, and clear provenance. Nothing "just appears" in the system.

ENGINEERING CHOICE

Declarative registry

What we chose: YAML-based document registry listing company, fiscal year, report type, and source URL.

Alternatives: Hardcoding URLs, dynamic crawling, manual uploads

Why: Explicit declaration equals reproducibility. Prevents accidental ingestion and mimics production data contracts.
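As a concrete illustration, a registry entry might look like the fragment below. The field names mirror the ones listed above; the companies and URLs are placeholders, not the system's actual registry.

```yaml
# documents.yaml (illustrative sketch; entries and URLs are hypothetical)
documents:
  - company: Acme Bank
    fiscal_year: 2023
    report_type: annual_report
    source_url: https://example.com/acme-annual-report-2023.pdf
  - company: Globex Holdings
    fiscal_year: 2023
    report_type: interim_report
    source_url: https://example.com/globex-interim-2023.pdf
```

Nothing outside this file is ever fetched, so the registry doubles as an audit trail of everything the system has seen.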

02

PDF Ingestion

What happens here

Registered documents are downloaded as immutable artifacts.

Files are stored by company, year, and report type. Duplicate downloads are skipped.

Why this matters

Raw documents are treated as source-of-truth artifacts, not transient inputs.
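A minimal sketch of the ingestion contract, assuming a deterministic company/year/report-type path layout (the exact layout and `fetch` mechanism are assumptions, not the project's actual code):

```python
from pathlib import Path

def artifact_path(root: Path, company: str, year: int, report_type: str) -> Path:
    """Deterministic storage location: one immutable file per registry entry."""
    return root / company.lower() / str(year) / f"{report_type}.pdf"

def ingest(root: Path, company: str, year: int, report_type: str, fetch) -> Path:
    """Download once; if the artifact already exists, skip it (immutability)."""
    dest = artifact_path(root, company, year, report_type)
    if dest.exists():
        return dest  # duplicate download skipped
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch())  # fetch() returns the raw PDF bytes
    return dest
```

Injecting `fetch` keeps the skip-if-exists logic testable without touching the network.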

03

Text Extraction

What happens here

PDFs are converted to raw text using page-aware extraction. Each page is explicitly marked. Failures are surfaced early.

Why this matters

Page boundaries are preserved so that citations remain meaningful later.

ENGINEERING CHOICE

Preserve page boundaries

What we chose: Extract text page-by-page with explicit markers, preserving ordering and structure.

Alternatives: Full-text blob dumping, aggressive OCR, layout stripping

Why: Citations require page awareness. Financial reports are page-referenced; losing structure kills explainability.
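The page-marking step can be sketched as follows. The `[[PAGE n]]` marker format and the `doc_id` parameter are illustrative assumptions; the actual per-page extraction would come from a PDF library upstream of this function.

```python
def mark_pages(pages: list[str], doc_id: str) -> str:
    """Join per-page text with explicit page markers so downstream
    chunks can always be traced back to a page range."""
    parts = []
    for n, text in enumerate(pages, start=1):
        if not text.strip():
            # surface extraction failures early instead of silently
            # producing uncitable text
            raise ValueError(f"{doc_id}: page {n} extracted no text")
        parts.append(f"[[PAGE {n}]]\n{text.strip()}")
    return "\n\n".join(parts)
```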

04

Text Normalization

Conservative cleanup

The extracted text undergoes conservative cleanup:

  • Remove obvious headers/footers
  • Normalize page markers
  • Preserve original wording

NO SEMANTIC REWRITING

No aggressive cleaning.

Why this matters

RAG systems fail when preprocessing distorts meaning. This phase optimizes for fidelity, not prettiness.

ENGINEERING CHOICE

Conservative cleanup

What we chose: Remove obvious headers/footers, normalize whitespace, preserve original wording. No semantic rewriting.

Alternatives: Heavy regex, sentence rewriting, LLM-based cleanup

Why: RAG systems break when preprocessing distorts semantics. Financial language is precise, so don't rewrite it. Fidelity over aesthetics.
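A sketch of what "conservative" means in code. The boilerplate set (known repeated header/footer lines) and the bare-page-number rule are assumptions about what counts as obvious; wording is never touched.

```python
import re

def clean_page(text: str, boilerplate: set[str]) -> str:
    """Conservative cleanup: drop known header/footer lines and bare page
    numbers, collapse runs of spaces. Never rewrites wording."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in boilerplate:              # e.g. a repeated report title
            continue
        if re.fullmatch(r"\d{1,4}", stripped):   # bare page numbers
            continue
        kept.append(re.sub(r"[ \t]+", " ", stripped))
    return "\n".join(l for l in kept if l)
```

In practice the boilerplate set might be built by finding lines that repeat on most pages of a document.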

05

Chunking & Metadata

What happens here

Documents are split into overlapping chunks that are character-bounded, page-traceable, and metadata-rich.

Each chunk inherits company, year, document ID, and page range.

Why this matters

Chunks become the atomic unit of retrieval. Traceability is built in, not bolted on.

ENGINEERING CHOICE

Fixed-size overlapping chunks

What we chose: Character-bounded chunks (~1200 chars) with overlap (~150 chars) and chunk-level metadata.

Alternatives: Sentence-based, section-based, no overlap

Why: Simple and predictable. Overlap prevents boundary loss. Metadata enables filtering. Optimized for retrieval stability, not theoretical optimality.
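The chunking described above can be sketched in a few lines. The `Chunk` shape and the `char_start` field are illustrative; the sizes match the stated defaults (~1200 chars, ~150 overlap).

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    meta: dict  # company, year, doc_id, page range, etc.

def chunk_text(text: str, meta: dict, size: int = 1200, overlap: int = 150) -> list["Chunk"]:
    """Fixed-size overlapping windows; every chunk inherits document metadata."""
    assert 0 <= overlap < size
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(Chunk(text[start:end], {**meta, "char_start": start}))
        if end == len(text):
            break
        start = end - overlap  # step back so boundary context appears in both chunks
    return chunks
```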

06

Embedding Generation

What happens here

Each chunk is embedded using a sentence-transformer model. Embeddings are deterministic, normalized, and stored once.

Why this matters

This enables efficient semantic similarity without model inference at query time.

ENGINEERING CHOICE

Sentence-Transformers MiniLM

What we chose: sentence-transformers/all-MiniLM-L6-v2

Alternatives: OpenAI embeddings, larger SBERT, Instructor models, domain-specific

Why: Fast, small, deterministic, excellent performance per parameter, no API dependency. Signals "I know when not to over-engineer."
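all-MiniLM-L6-v2 produces 384-dimensional vectors, and the library can return them already unit-normalized (e.g. `encode(..., normalize_embeddings=True)`). The sketch below shows why that normalization matters: on unit vectors, a plain inner product equals cosine similarity, which is what the flat index later relies on.

```python
import math

def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))
```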

07

Vector Store (FAISS)

What happens here

Embeddings are stored in a FAISS index loaded at API startup.

The index is read-only in production, fast, and memory-resident.

Why this matters

FAISS provides predictable latency without external dependencies.

ENGINEERING CHOICE

FAISS (in-memory)

What we chose: FAISS index loaded at startup, read-only during serving.

Alternatives: Pinecone, Weaviate, Milvus, ElasticSearch, Chroma

Why: No external dependencies, zero per-query cost, predictable latency, industry-standard. Production realism: choose simplicity until scale demands otherwise.
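A flat FAISS index (e.g. `faiss.IndexFlatIP`) is exhaustive inner-product search over the stored matrix. The pure-Python stand-in below shows the same semantics on normalized vectors; FAISS does exactly this, vectorized and at scale.

```python
def top_k(query: list[float], index: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Exhaustive inner-product search, best-first: the semantics of a
    flat FAISS index over unit-normalized embeddings."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [(i, dot(query, v)) for i, v in enumerate(index)]
    return sorted(scores, key=lambda s: -s[1])[:k]
```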

08

Query Embedding

What happens here

Incoming user queries are embedded using the same model as document chunks.

No shortcuts. No mixing models.

Why this matters

Embedding symmetry prevents silent retrieval skew.

09

Semantic Retrieval

What happens here

Top-K similar chunks are retrieved using vector similarity.

The system retrieves more than needed, then narrows.

Why this matters

Over-retrieval followed by filtering is safer than under-retrieval.

10

Metadata Filtering

What happens here

Retrieved chunks can optionally be filtered by company, fiscal year, and document type.

Filters constrain retrieval, not generation.

Why this matters

This prevents cross-company hallucination and leakage.

ENGINEERING CHOICE

Over-retrieve, then filter

What we chose: Retrieve top-K×2, apply metadata filters after retrieval.

Alternatives: Pre-filtering, exact K retrieval, no filtering

Why: Vector similarity isn't perfect. Over-retrieval reduces false negatives; filtering enforces correctness constraints. Standard pattern in serious RAG.
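The over-retrieve-then-filter pattern from sections 09–10, sketched with an injected `search_fn` (the vector search) and dict-shaped chunks; both shapes are assumptions for illustration.

```python
def retrieve_filtered(search_fn, chunks: list[dict], k: int, filters: dict) -> list[int]:
    """Over-retrieve 2K candidates, enforce metadata constraints, keep K.
    Filters constrain retrieval only; generation never sees the discards."""
    candidates = search_fn(2 * k)  # chunk ids, best-first
    def ok(cid: int) -> bool:
        meta = chunks[cid]["meta"]
        return all(meta.get(f) == v for f, v in filters.items())
    return [cid for cid in candidates if ok(cid)][:k]
```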

11

Evidence Assembly

What happens here

Filtered chunks are assembled into an evidence context with clearly separated passages and explicit source attribution.

This context is the only information passed to the LLM.

Why this matters

The LLM is not allowed to "know" anything else.

ENGINEERING CHOICE

Explicit context construction

What we chose: Assemble chunks into structured context with clear separators. No free-form prompt stuffing.

Alternatives: Raw chunk dumping, letting LLM decide relevance

Why: Constrains the LLM's job, enables auditing, encourages extractive behavior. The model reasons over supplied facts, not latent knowledge.
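Evidence assembly might look like this sketch. The passage-header format and separator are assumptions; the point is that every passage carries its source attribution into the prompt.

```python
def build_context(chunks: list[dict]) -> str:
    """Assemble filtered chunks into one evidence block. Each passage is
    clearly separated and carries document and page attribution."""
    passages = []
    for i, c in enumerate(chunks, start=1):
        m = c["meta"]
        header = f"[Passage {i} | {m['company']} {m['year']} {m['doc_id']} | pages {m['pages']}]"
        passages.append(f"{header}\n{c['text']}")
    return "\n\n---\n\n".join(passages)
```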

12

LLM Generation

What happens here

The model generates an answer using the user query, the assembled evidence, and strict instructions to avoid speculation.

Latency is measured. Failures are surfaced.

Why this matters

Generation is the last step, not the core system.

ENGINEERING CHOICE

GPT-4.1-mini via Responses API

What we chose: Smaller, cheaper, reliable reasoning model with explicit latency measurement.

Alternatives: GPT-4 full, GPT-3.5, open-source models, fine-tuned models

Why: Reasoning > verbosity, lower cost for repeated queries, faster responses. Right-sized model for task.
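The generation step, sketched with the model call injected as a callable so latency measurement and failure surfacing are visible (and testable) independently of the provider. In the real service `generate_fn` would wrap the OpenAI Responses API, roughly `client.responses.create(model="gpt-4.1-mini", input=prompt)`; the prompt wording here is illustrative.

```python
import time

def generate_answer(generate_fn, query: str, context: str) -> dict:
    """Wrap the model call with latency measurement and explicit
    failure surfacing. `generate_fn` is the actual LLM call."""
    prompt = (
        "Answer ONLY from the evidence below. If the evidence is "
        "insufficient, say so. Do not speculate.\n\n"
        f"EVIDENCE:\n{context}\n\nQUESTION: {query}"
    )
    start = time.perf_counter()
    try:
        answer = generate_fn(prompt)
        status = "ok"
    except Exception as exc:
        answer, status = None, f"error: {exc}"
    return {
        "answer": answer,
        "status": status,
        "generation_ms": round((time.perf_counter() - start) * 1000, 1),
    }
```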

13

API Response

What happens here

The API returns the answer, cited evidence, and structured metadata.

No UI assumptions are baked in.

Why this matters

The API is reusable, testable, and production-oriented.

ENGINEERING CHOICE

FastAPI

What we chose: FastAPI for API layer.

Alternatives: Flask, Django, Node.js, LangChain servers

Why: Strong typing, automatic OpenAPI docs, async-ready, production-grade, clean contracts. Signals "this is an API, not a script."
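The response contract, sketched with stdlib dataclasses. In the real service these would be Pydantic models behind a FastAPI route, and the field names here are assumptions; the shape (answer, citations, metadata) is what the text describes.

```python
from dataclasses import dataclass, asdict

@dataclass
class Citation:
    doc_id: str
    pages: str
    snippet: str

@dataclass
class QueryResponse:
    answer: str
    citations: list      # list[Citation]
    model_version: str
    retrieval_ms: float
    generation_ms: float

# asdict(resp) yields the JSON-serializable payload the endpoint returns;
# no UI assumptions, just a clean contract.
```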

14

Observability & Logging

What happens here

Each request logs retrieval latency, generation latency, model version, and request outcome.

Logs are structured, not free-form text.

Why this matters

Observability is how systems earn trust over time.

ENGINEERING CHOICE

Explicit logging

What we chose: Log retrieval start/end, LLM generation latency, model version, request completion.

Alternatives: No logging, print statements, full tracing stacks

Why: Lightweight, Cloud Run compatible, sufficient for debugging. Production awareness without monitoring theater.
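Structured logging needs nothing beyond the standard library: one JSON object per line, which Cloud Run's log collector parses natively. Field names below are illustrative.

```python
import json
import logging
import sys

logger = logging.getLogger("rag")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))

def log_event(event: str, **fields) -> str:
    """Emit one JSON object per line: machine-parseable, grep-friendly."""
    line = json.dumps({"event": event, **fields}, sort_keys=True)
    logger.info(line)
    return line
```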

15

Streamlit UI

What happens here

A simple UI enables natural-language queries, optional filters, and transparent citation display.

The UI calls the hosted API directly.

Why this matters

The UI proves the system works end-to-end — not just in code.

ENGINEERING CHOICE

Streamlit (thin client)

What we chose: Streamlit UI calling hosted API directly, displaying citations transparently.

Alternatives: React, Next.js, Gradio, no UI

Why: Fastest honest UI, minimal ceremony, easy demo, clean separation from backend. Communication over frontend mastery.

ENGINEERING CHOICE

Docker + Cloud Run

What we chose: Dockerized services, serverless Cloud Run, CI/CD via GitHub Actions.

Alternatives: VMs, Kubernetes, local-only, platform-specific runtimes

Why: Scales to zero, simple mental model, industry-relevant, minimal ops burden. Proves deployment understanding beyond notebooks.
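A Dockerfile for this setup might look like the sketch below. Module path, requirements file, and base image are placeholders; the one Cloud Run-specific detail is binding to the injected $PORT.

```dockerfile
# Sketch only: module and file names are placeholders
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run injects $PORT at runtime; the server must bind to it
CMD exec uvicorn app.main:app --host 0.0.0.0 --port $PORT
```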

Document Pipeline

Registry → Ingestion → Extraction → Normalize → Chunking → Embed → FAISS → Query → Retrieve → Filter → Evidence → LLM → API → Logs → UI


Reliable AI Is Built, Not Prompted

This project demonstrates how to build an AI system that answers questions responsibly, exposes its evidence, survives deployment, and fails honestly.


Not by chasing novelty, but by respecting systems thinking.