Writing

Notes while building.

Mostly working through what actually holds up when you put AI in front of real users: retrieval, evaluation, and the judgment calls a metric will not make for you. A few pieces are listed below — still writing them.

What a RAG Demo Hides: The Metrics That Actually Matter

A retrieval demo answering one good question proves almost nothing. The interesting part is everything the happy path never shows you.

Coming soon

Designing Evaluation Loops for AI Products

Evaluation is not a QA step at the end. For AI products it is part of the product, and treating it that way changes what you build.

Coming soon

Let AI Answer, or Keep It Human? A Decision I Got Wrong Once

At GMATWhiz we automated a piece of mentorship that should have stayed human. Here is the framework I wish I had used.

Coming soon