Reasonable UX

  • production
  • eval harness

Playwright agent that walks a site, scores the UX, and emits a multi-page PDF. Tiered model architecture (Haiku scout → Sonnet executor → Opus advisor) with a separate eval harness that measured which variant actually wins: the cheaper configuration, 2W/1L (two wins, one loss).
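For readers unfamiliar with win/loss notation, here is a minimal sketch of how a harness might tally a head-to-head record between two variants. The function names, the per-case score dictionaries, and the numbers in the usage note are all hypothetical; the real harness's metrics and cases are not described here.

```python
from collections import Counter


def tally(baseline: dict[str, float], candidate: dict[str, float],
          tol: float = 0.0) -> Counter:
    """Count wins, losses, and ties for `candidate` against `baseline`.

    Both arguments map a case id to a score where higher is better
    (an assumption made for this sketch). Only cases present in both
    runs are compared.
    """
    record: Counter = Counter()
    for case in baseline.keys() & candidate.keys():
        diff = candidate[case] - baseline[case]
        if diff > tol:
            record["W"] += 1
        elif diff < -tol:
            record["L"] += 1
        else:
            record["T"] += 1
    return record


def format_record(record: Counter) -> str:
    """Render the tally in the compact W/L form."""
    return f"{record['W']}W/{record['L']}L"
```

With invented scores such as `tally({"a": 0.6, "b": 0.7, "c": 0.8}, {"a": 0.7, "b": 0.9, "c": 0.5})`, the candidate wins two cases and loses one, which `format_record` renders as `2W/1L`.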

QA Agent

  • shipped
  • 84% token reduction

Automated UX quality audit agent built on Playwright + multi-model scoring. Cut token usage 84% by stripping images from conversation history and sending screenshots as quality-40 JPEGs.
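The image-stripping idea can be sketched roughly as follows. This assumes the conversation is a list of Anthropic-style messages whose `content` is either a string or a list of blocks with a `type` field; the helper name, the placeholder text, and the `keep_last` parameter are hypothetical and not taken from the project.

```python
from typing import Any

# Hypothetical placeholder left where an old screenshot used to be.
PLACEHOLDER = {"type": "text", "text": "[screenshot omitted]"}


def strip_old_images(messages: list[dict[str, Any]],
                     keep_last: int = 1) -> list[dict[str, Any]]:
    """Return a copy of `messages` with image blocks removed from all
    but the most recent `keep_last` image-bearing messages."""
    # Indices of messages that carry at least one image block.
    with_images = [
        i for i, m in enumerate(messages)
        if isinstance(m["content"], list)
        and any(b.get("type") == "image" for b in m["content"])
    ]
    keep = set(with_images[-keep_last:]) if keep_last > 0 else set()

    out = []
    for i, m in enumerate(messages):
        content = m["content"]
        if isinstance(content, list) and i not in keep:
            # Swap each image block for a short text marker so the
            # turn structure stays intact.
            content = [
                PLACEHOLDER if b.get("type") == "image" else b
                for b in content
            ]
        out.append({**m, "content": content})
    return out
```

Calling `strip_old_images(history)` before each model request keeps only the latest screenshot in context. On the capture side, Playwright's Python `page.screenshot` accepts `type="jpeg"` and `quality=40`, which is one plausible way to produce the compressed screenshots.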

Personal OS Dashboard

  • production
  • modular · Svelte 5

Private SvelteKit dashboard on Fly.io, reading a SQLite mirror of an Obsidian vault via Litestream. Modules for projects, daily notes, biometrics, and system health.

RAG Brain

  • shipped
  • RAG pipeline
  • Voyage AI · ChromaDB

Five explicit LLMOps decisions in a working RAG pipeline: H2 structural chunking over fixed-token windows, frontmatter separation from embedded text, asymmetric query/document input types, SQLite as a queryable observability layer, and incremental indexing via content hashing.
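Three of these decisions (frontmatter separation, H2 structural chunking, and content-hash incremental indexing) can be illustrated with a short standard-library sketch. All function names are hypothetical, and the sketch assumes notes are Markdown with an optional `---`-delimited frontmatter block; it is not the pipeline's actual code.

```python
import hashlib
import re


def split_frontmatter(text: str) -> tuple[str, str]:
    """Separate a leading '---' frontmatter block from the body so
    metadata is stored alongside the chunk, not embedded with it."""
    if text.startswith("---\n"):
        end = text.find("\n---\n", 3)
        if end != -1:
            return text[4:end], text[end + 5:]
    return "", text


def chunk_by_h2(body: str) -> list[str]:
    """Split before every line starting with '## '. Text before the
    first H2 becomes its own chunk; H3 and deeper stay with their
    parent section. (Does not special-case '## ' inside code fences.)"""
    parts = re.split(r"(?m)^(?=## )", body)
    return [p.strip() for p in parts if p.strip()]


def content_hash(chunk: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def chunks_to_embed(chunks: list[str], seen: set[str]) -> list[str]:
    """Return only chunks whose hash has not been indexed yet."""
    return [c for c in chunks if content_hash(c) not in seen]
```

On a re-index, `seen` would be loaded from wherever hashes are persisted (the SQLite layer would be a natural place, though that is a guess), so unchanged sections skip the embedding call entirely. For the asymmetric input types, the Voyage AI Python client's `embed` call takes an `input_type` of `"document"` or `"query"`, so indexing and retrieval would pass different values.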

  • System Health
    • revamping
    • self-discovering

    Built because sustained attention to maintenance is exactly what ADHD is worst at — so the checker had to be the thing that pays attention. Self-discovers every automation: launchd jobs, git hooks, OAuth tokens, vault integrity. Reports what to run when something breaks.

  • Agent Panel
    • personal-os
    • 10+ specialists
    • eval harness

    Specialist Claude agents for vault queries, hiring analysis, curriculum review, and automation design — each with typed inputs and narrow scope. An eval harness runs periodic trajectory checks to surface drift before it compounds. The architecture is LLMOps patterns made operational.

  • LLMOps Curriculum
    • phase 0 complete
    • build-as-you-learn

    Self-paced path through the operational layer of LLM systems — observability, evals, cost routing, model gateways. Each phase maps to work already running in the portfolio: reasonable-ux, QA Agent, and Agent Panel.