Making the coding agent legible
2026-04-19
The agent worked — but the web UI couldn't tell you which model was behind it, the command surface was invisible, and when a provider rejected a request the error said nothing useful. Fixing that revealed a silent wire-compat bug we'd been shipping for weeks.
The bench earns its keep
2026-04-19
We built the agent bench to grade providers. It turned out to be just as good at finding tool-side bugs the agent had been silently working around — and at proving that our fixes actually fix anything.