Playwright agent that walks a site, scores the UX, and emits a multi-page PDF. Tiered model architecture (Haiku scout → Sonnet executor → Opus advisor) with a separate eval harness that quantified which variant actually wins — the cheaper configuration, 2W/1L.
- shipped
- 84% token reduction
Automated UX quality audit agent built on Playwright + multi-model scoring. Cut running cost 84% by stripping images from conversation history and shipping screenshots at JPEG-40.
- production
- modular · Svelte 5
Private SvelteKit dashboard on Fly.io, reading a SQLite mirror of an Obsidian vault via Litestream. Modules for projects, daily notes, biometrics, and system health.
- shipped
- RAG pipeline
- Voyage AI · ChromaDB
Five explicit LLMOps decisions in a working RAG pipeline: H2 structural chunking over fixed-token windows, frontmatter separation from embedded text, asymmetric query/document input types, SQLite as a queryable observability layer, and incremental indexing via content hashing.