I design AI systems.
Models ship them.
I prove they work.
LLMOps systems.

Building things to understand them. Current lane: eval harnesses, trace observability, cost routing, prompt maintenance.

what i build

  • advisor patterns
  • eval harnesses
  • RAG
  • agent orchestration
  • trace observability
  • cost telemetry

What's running

Reasonable UX

  • production
  • eval harness

Playwright agent that walks a site, scores the UX, and emits a multi-page PDF. Tiered model architecture (Haiku scout → Sonnet executor → Opus advisor) with a separate eval harness that quantified which variant actually wins — the cheaper configuration, 2W/1L.

QA Agent

  • shipped
  • 84% token reduction

Automated UX quality audit agent built on Playwright + multi-model scoring. Cut running cost 84% by stripping images from conversation history and shipping screenshots at JPEG-40.

Personal OS Dashboard

  • production
  • modular · Svelte 5

Private SvelteKit dashboard on Fly.io, reading a SQLite mirror of an Obsidian vault via Litestream. Modules for projects, daily notes, biometrics, and system health.

RAG Brain

  • shipped
  • RAG pipeline
  • Voyage AI · ChromaDB

Five explicit LLMOps decisions in a working RAG pipeline: H2 structural chunking over fixed-token windows, frontmatter separation from embedded text, asymmetric query/document input types, SQLite as a queryable observability layer, and incremental indexing via content hashing.

By the numbers

11 decisions defended across project pages
$0.52–$1.94 cost per UX audit reasonable-ux
<$0.01 cost per RAG reindex RAG Brain

Building LLMOps tools · 20 yrs in regulated software · Charlotte NC