RAG Brain — Ryan Walsh

What it is

RAG Brain answers natural-language questions against the markdown notes in my vault — "what did I decide about the portfolio site stack?" instead of grep -ri portfolio 20-projects/. It indexes markdown files into a local ChromaDB vector store, mirrors chunk metadata and search history into SQLite, and returns top-k semantic matches with file paths, section headings, and similarity scores.

Most RAG tutorials make their chunking, embedding, and storage choices silently. This project states each decision and documents the tradeoff behind it.

What it returns

One real query against the live vault, top-3 results pulled from the search log:

$ search.py "what is reasonable-ux"

1.  20-projects/reasonable-ux.md  ::what-it-is    0.5705
2.  20-projects/reasonable-ux.md  ::status        0.4839
3.  00-dashboard/now.md           ::intro         0.4489

The top hit is the What-it-is section of the right note. The same note's Status section ranks #2, which is H2 structural chunking (the first decision below) doing its job: two distinct chunks of the same file are ranked separately by semantic relevance rather than collapsed into a single document-level result. The #3 result is a tangential mention in the dashboard intro, which scores lower and is correctly demoted.

Five decisions

Why not fixed-token chunks?

Fixed-token windows (128/256/512 tokens) split wherever the boundary falls, often mid-sentence or mid-thought, so each embedding captures a fragment. H2 structural splitting produces chunks that correspond to complete ideas: when a query asks about architecture, the '## Architecture' chunk scores well because its embedding represents one coherent topic. Very short sections get merged forward into the next section; duplicate headings within a file are disambiguated so their chunk IDs don't collide and silently overwrite each other in ChromaDB.
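A minimal sketch of the splitting rules described above, not the project's actual code. The function names, the 40-character threshold for "very short", the "intro" label for text before the first H2, and the numeric suffix for duplicate headings are all assumptions made for illustration.

```python
import re

MIN_CHARS = 40  # assumed threshold for a "very short" section


def slugify(heading):
    """Lowercase a heading and collapse non-alphanumerics to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def chunk_by_h2(body, min_chars=MIN_CHARS):
    """Split markdown on H2 headings and return a list of (slug, text).

    Simplification: '## ' lines inside fenced code blocks are not special-cased.
    """
    sections = []
    heading, lines = "intro", []
    for line in body.splitlines():
        match = re.match(r"^## +(.+?)\s*$", line)
        if match:
            sections.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(1), []
        else:
            lines.append(line)
    sections.append((heading, "\n".join(lines).strip()))

    chunks, seen, carry = [], {}, ""
    for i, (heading, text) in enumerate(sections):
        # Prepend any short section that was merged forward.
        text = (carry + "\n\n" + text).strip() if carry else text
        carry = ""
        if not text:
            continue
        is_last = i == len(sections) - 1
        if len(text) < min_chars and not is_last:
            carry = text  # too short to stand alone: merge into the next section
            continue
        slug = slugify(heading) or "section"
        seen[slug] = seen.get(slug, 0) + 1
        if seen[slug] > 1:
            slug = f"{slug}-{seen[slug]}"  # keep repeated headings distinct
        chunks.append((slug, text))
    return chunks
```

Each slug would then be combined with the file path to form a chunk ID, in the spirit of the `file.md ::heading` pairs shown in the search output.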

Why not embed the frontmatter?

YAML frontmatter — status, tags, title, started — is stored as ChromaDB metadata for filtering, not included in the embedded text. Embedding 'status: active, tags: rag, embeddings' alongside prose would dilute the semantic signal. The separation enables hybrid queries: filter by status:active before semantic ranking, not during.
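As a rough illustration of that separation, the sketch below splits a note into a metadata dict and the prose that would be embedded. It is an assumption-laden stand-in: `split_frontmatter` is a hypothetical helper, it handles only flat `key: value` lines rather than full YAML, and it flattens a bracketed tag list into one string on the assumption that metadata values should be scalars.

```python
def split_frontmatter(raw):
    """Return (metadata, body) for a note with optional '---' frontmatter.

    Only flat 'key: value' lines are understood; a real YAML parser
    would be needed for nested or multi-line values.
    """
    if not raw.startswith("---\n"):
        return {}, raw
    end = raw.find("\n---", 3)  # closing delimiter
    if end == -1:
        return {}, raw
    meta = {}
    for line in raw[4:end].splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # "[rag, embeddings]" becomes "rag, embeddings"
            meta[key.strip()] = value.strip().strip("[]")
    body = raw[end + 4:].lstrip("\n")
    return meta, body


note = "---\nstatus: active\ntags: [rag, embeddings]\n---\n\n## What it is\n\nProse."
metadata, text_to_embed = split_frontmatter(note)
# metadata goes to the vector store as filterable fields;
# only text_to_embed is sent to the embedding model.
```

With ChromaDB, a pre-filter of this kind would typically be expressed by passing `where={"status": "active"}` to `collection.query`, so that only matching chunks are candidates for ranking.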

Why pass input_type at all?

Voyage AI's voyage-3-lite treats retrieval as asymmetric: the input_type parameter tells the API whether a text is document content or a search query, and each is embedded accordingly. Passing input_type="document" when indexing and input_type="query" when searching means a query embedding lands near the document embeddings that answer it, even when the vocabulary doesn't overlap. Using "document" for both works for "find similar documents" but degrades on natural-language questions.
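The two call sites might look like the sketch below. The wrapper names `embed_documents` and `embed_query` are hypothetical; the `client` argument is assumed to be a `voyageai.Client`, whose `embed` method takes a list of texts plus `model` and `input_type` and returns an object with an `embeddings` list.

```python
MODEL = "voyage-3-lite"


def embed_documents(client, texts, model=MODEL):
    """Embed chunk texts at index time."""
    return client.embed(texts, model=model, input_type="document").embeddings


def embed_query(client, question, model=MODEL):
    """Embed a single search question at query time."""
    return client.embed([question], model=model, input_type="query").embeddings[0]


# Usage, assuming the voyageai package is installed and VOYAGE_API_KEY is set:
#   import voyageai
#   client = voyageai.Client()
#   vectors = embed_documents(client, ["chunk one", "chunk two"])
#   qvec = embed_query(client, "what is reasonable-ux")
```

Keeping the two wrappers separate makes it hard to pass the wrong input_type by accident, which is the failure mode the paragraph above warns about.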

Why two stores?

ChromaDB finds nearest neighbors. It doesn't do SQL. SQLite does SQL but doesn't do vectors. The two stores run side by side: the search_chunks table mirrors ChromaDB chunk metadata into SQLite, and the search_log table records every query with its result IDs and similarity scores. This makes the pipeline inspectable: over time you can see which queries return weak matches and use that signal to improve the chunking strategy or the structure of the notes.

-- Which source files produce the most chunks?
SELECT source, COUNT(*) FROM search_chunks GROUP BY source ORDER BY COUNT(*) DESC;

-- What has been searched, and how strong were the matches?
SELECT query, scores, searched_at FROM search_log ORDER BY searched_at DESC LIMIT 20;
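To make the "weak matches" idea concrete, here is a sketch of how the log could be written and inspected with Python's built-in sqlite3 module. The table names come from the text above; every column other than source, query, scores, and searched_at, the JSON encoding of scores, and the 0.5 threshold are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Assumed schema: only the table names and a few columns appear in the write-up.
SCHEMA = """
CREATE TABLE IF NOT EXISTS search_chunks (
    chunk_id TEXT PRIMARY KEY,
    source   TEXT NOT NULL,
    heading  TEXT
);
CREATE TABLE IF NOT EXISTS search_log (
    id          INTEGER PRIMARY KEY,
    query       TEXT NOT NULL,
    result_ids  TEXT NOT NULL,  -- JSON list of chunk IDs
    scores      TEXT NOT NULL,  -- JSON list of similarity scores
    searched_at TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_search(conn, query, results):
    """Record one search; results is a list of (chunk_id, score), best first."""
    conn.execute(
        "INSERT INTO search_log (query, result_ids, scores, searched_at) "
        "VALUES (?, ?, ?, ?)",
        (
            query,
            json.dumps([chunk_id for chunk_id, _ in results]),
            json.dumps([score for _, score in results]),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()


def weak_queries(conn, threshold=0.5):
    """Return logged queries whose best score fell below the threshold."""
    rows = conn.execute("SELECT query, scores FROM search_log ORDER BY id")
    weak = []
    for query, scores_json in rows:
        scores = json.loads(scores_json)
        if not scores or max(scores) < threshold:
            weak.append(query)
    return weak
```

Queries that show up in `weak_queries` repeatedly are candidates for a note that is missing, badly sectioned, or phrased very differently from how it gets asked about.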

Why hash files?

Each file's MD5 hash is stored in ChromaDB chunk metadata. On subsequent runs, stored hashes are compared against the hash of the current file content, and unchanged files are skipped. Even at personal-vault scale, re-embedding everything on every run wastes quota and time. Two edge cases are handled. Deleted files leave orphaned chunks in ChromaDB, so the stored source paths are diffed against the files on disk and the orphans are cleaned up before indexing. Restructured files (a changed H2 layout) would leave orphaned chunk IDs, so a file's old chunks are deleted by source path before the new ones are upserted.
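The skip, delete, and re-embed logic reduces to a set comparison between two path-to-hash maps. The sketch below assumes those maps have already been built (one from disk, one from stored chunk metadata); `file_md5` and `plan_index` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """MD5 of a file's bytes, used only for change detection."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def plan_index(current_hashes, stored_hashes):
    """Decide what to re-embed and what to purge.

    Both arguments map source path -> MD5 hex digest.
    Returns (to_embed, to_delete):
      to_embed  - new or modified files
      to_delete - sources whose existing chunks must be removed first:
                  files deleted from disk, plus modified files (so a changed
                  H2 layout cannot leave stale chunk IDs behind)
    """
    deleted = set(stored_hashes) - set(current_hashes)
    changed = {
        path for path, digest in current_hashes.items()
        if stored_hashes.get(path) != digest
    }
    previously_indexed = {path for path in changed if path in stored_hashes}
    return sorted(changed), sorted(deleted | previously_indexed)
```

In ChromaDB the purge step for each path in `to_delete` could be a metadata-filtered delete such as `collection.delete(where={"source": path})`, assuming the source path is stored under a `source` key.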

Status

Shipped as v1. Running against my vault daily. Planned expansions, not blocking completion: an MCP server so any Claude session can query the vault as a tool, hybrid search (vector + keyword) for exact-term recall, and an embedding provider comparison (Voyage / Ollama / OpenAI) once the corpus is large enough to make the comparison meaningful.