Discussion about this post

Lee

Andy - you invited parallel experiments, so here's one. I've spent the past several months building a version of this loop for myself with Claude Code as the engine, and the experience pulled me toward Nadella's framing of it more than the eval-centered one.

The difference is in what the loop is for. An eval produces an artifact - a rubric, a leaderboard, a score. Genuinely useful, and a great on-ramp (your Lesson 2). But what I ended up building isn't an artifact; it's a feedback-producing process, and its only product is a better thinker over time - me. That's Nadella's "you can never offload your learning" taken literally: the goal isn't to measure a model, it's to compound my own judgment.

Structurally that needs two things an eval doesn't:

1. An owned, model-agnostic substrate - local files, my own version control, memory that isn't in a vendor's profile. Nadella's sovereignty test (swap the model, lose nothing that matters), pointed at a person instead of a firm.

2. A memory that remembers me, not just my rubric - so the loop can surface my own inconsistencies ("you argued the opposite in March") and connect today's reading to two years of accumulated content. A rubric is my current judgment reified; it can't catch my own drift. Only a record of past selves can.
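
For anyone who wants to try the two requirements above, here is a minimal Python sketch of the idea, not the actual setup described in this comment. It assumes a plain JSON Lines file as the owned substrate (readable by any model or none, and easy to keep under your own version control) and a deliberately crude notion of drift: an earlier entry on the same topic with a different stance. The names `Position`, `PositionLog`, `record`, and `drift` are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date
from pathlib import Path


@dataclass
class Position:
    day: str     # ISO date string, e.g. "2024-03-11"
    topic: str   # free-form label chosen by the writer
    stance: str  # e.g. "for" or "against" (an assumed, simplistic encoding)
    note: str    # the argument as it was made at the time


class PositionLog:
    """Append-only record of past positions, stored in a local text file."""

    def __init__(self, path):
        self.path = Path(path)

    def entries(self):
        # Read every recorded position; an absent file means an empty log.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [Position(**json.loads(line)) for line in f if line.strip()]

    def drift(self, topic, stance):
        # Earlier entries on the same topic that took a different stance.
        return [p for p in self.entries()
                if p.topic == topic and p.stance != stance]

    def record(self, topic, stance, note, day=None):
        # Look for conflicts first, then append the new position.
        day = day or date.today().isoformat()
        conflicts = self.drift(topic, stance)
        entry = Position(day, topic, stance, note)
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(entry)) + "\n")
        return conflicts
```

Calling `record("evals", "against", ...)` after an earlier `record("evals", "for", ...)` returns the older entry, which is the raw material for a "you argued the opposite in March" prompt. A real loop would need something richer than exact topic and stance matching, but the file itself stays yours whichever model reads it.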

And it can't be built in an afternoon - only started and run. The value is in the years.

So for a class, maybe the eval is the ramp and the compounding feedback loop is the destination. If the goal is better thinkers, that's the shape I'd point students toward.
