Bare-Metal Loops

#ai #engineering #agents

My role often requires squeezing signal out of unstructured data. Last year, during a model development cycle, I noticed that an important survey question was returning tens of thousands of near-duplicate categories. As a feature it was structurally useless: typos, formatting quirks, and even emojis obscured the real signal. The standard fix is pattern-parsing code that strips out the noise, but hand-writing it would have taken me days of trial and error to get anything of substance.
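To make the cleanup concrete, the fix is normalization code along these lines. A minimal sketch; the function name and example labels are illustrative, not taken from the actual survey data:

```python
import re
import unicodedata

def normalize_category(raw: str) -> str:
    """Collapse a free-text survey answer into a canonical label."""
    # Unify Unicode forms, then lowercase
    text = unicodedata.normalize("NFKC", raw).lower()
    # Drop emojis and punctuation; keep letters, digits, and spaces
    text = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    # Collapse the whitespace left behind by stripped characters
    return re.sub(r"\s+", " ", text).strip()

# "Software Engineer ", "software engineer!!", and "software engineer 🚀"
# all collapse to the same category
print(normalize_category("Software Engineer 🚀"))  # -> "software engineer"
```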

Of course, maybe it wouldn’t take forever. I didn’t need to write the regex myself; I have LLMs like Claude for that. The problem was that Claude couldn’t guess the structure of the data. The intent was clear, but I needed a way to wrap a loop around the model, or I’d be copy-pasting for days, not much of an upgrade.

What followed taught me the lesson of bare-metal loops.

I gave Claude three things: the data as input, a filter script as its action space, and a scoring tool as the reward. Then I let it run. Within 15 minutes, the previously useless feature ranked in our top 20 for the next model.
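For the curious, the wiring was roughly this shape. A sketch with hypothetical file names (filter.py, score.py, and the data files are stand-ins; the real scripts were specific to the survey data):

```python
import subprocess

def run_iteration() -> float:
    """One pass of the loop: apply the current filter, then score the result."""
    # Action space: filter.py (which the model is free to edit) reads the
    # raw categories and writes a cleaned version
    subprocess.run(
        ["python", "filter.py", "raw_categories.txt", "cleaned_categories.txt"],
        check=True,
    )
    # Reward: score.py prints a single number measuring how much usable
    # signal the cleaned feature retains
    result = subprocess.run(
        ["python", "score.py", "cleaned_categories.txt"],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())

# The agent's whole job: edit filter.py, call run_iteration(), read the
# score, repeat. No scaffolding between action and feedback.
```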

Bare-metal loops: reduce and remove everything until you have the rawest possible interface between your model and the task. The best agentic workflows are not the ones that prop up the model with scaffolding; they are the ones with the tightest loop between model action and verifiable feedback.

The model capabilities that have improved fastest are the ones with tight feedback loops and verifiable end states. This is how coding, math, and tool use broke open over the past year. When you echo those environments in your own workflow, the model switches from chatbot to optimization agent. When you get a tight fit, it’s almost terrifying to watch.

The trap, however, is that as the cost of software drops, the stream of new tools, abstractions, and frameworks is never-ending. You could easily talk yourself into a multi-agent, auxiliary-model, token-maxxing workflow with a dozen custom skills. Not only is this fragile to model changes, it also complicates the reward loop for the model itself, and the next generation of models will likely solve the very problem you built your framework around.

This pattern repeats: chain-of-thought became reasoning, tool use became native, and planner-executor frameworks collapsed into short back-and-forth conversations. Next we can expect models to improve at calling skills, incorporating memories, and explaining themselves visually. The model keeps absorbing the wrapper.

Every abstraction should either shorten the loop, improve the feedback signal, or expand the model’s useful action space. Otherwise, it just adds friction.

Projects like the intentionally minimal pi coding agent and Karpathy’s autoresearch are explicit examples of this idea. Where possible, reduce complexity, remove abstractions, and get as close to bare-metal loops as you can.

A few ideas to make this work in practice:

  • Agentic Tools: Build small, purpose-built tools that give the model a search space and a feedback signal. Prefer CLIs, APIs, and scripts over MCPs.
  • Loop Design: Break your project into atomic tasks with a verifiable end state, mirroring the RL environments the models were trained in. A perfect example is Simon Willison’s Red/Green TDD prompt; see the sketch after this list.
  • Remove Yourself: If you have to copy-paste between two interfaces, that is a broken loop; find the shorter path. Use voice-to-text to replace typing where possible, in favor of free-form brain dumps and Q&A.
  • Customize the Interface: Chat is a lossy interface for complex state. Get the model to explain itself through HTML presentations, Mermaid diagrams, or pseudo-code, whatever compresses the state better than prose.
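The loop-design bullet maps directly onto code. Here is a sketch of a Red/Green-style step in the spirit of Simon Willison’s prompt (the test path is hypothetical): the verifiable end state is a passing test suite, and on failure the raw test output is the feedback signal.

```python
import subprocess

def red_green_step() -> tuple[bool, str]:
    """Run the tests; return (passed, feedback) for the model."""
    result = subprocess.run(
        ["pytest", "-x", "tests/"],  # hypothetical test directory
        capture_output=True, text=True,
    )
    # Exit code 0 is the verifiable end state; everything else comes
    # back to the model as raw, actionable feedback
    return result.returncode == 0, result.stdout + result.stderr

# Loop: model edits code -> red_green_step() -> if not green, hand the
# feedback straight back and go again. Done means green, not "looks done".
```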

LLMs are general optimizers, and they yearn for the loops.