2026-06-05

Agents that show their work

Before calling a task done, the agent runs your tests and opens the app in a real browser — and learns each repo so the next task starts smarter.

"It compiles" isn't "it works." So the agent now does the checks you'd do yourself before it tells you a task is finished — and it remembers what it learns.

What changed

  • Runs your tests. Before a goal step is marked done, the agent runs the project's test suite. A real failure isn't ignored — it's fed back as guidance and the agent tries again.
  • Opens the app in a real browser. It launches the running app in a headless browser and flags console errors and failed requests, so a change that builds but breaks the page doesn't slip through.
  • Shows you the proof. Test results and the browser check travel with the task, so you can see why the agent considers the work done.
  • Learns each repo. The agent recalls prior work on a repo (past changes, conventions, where things live), so the next task starts with context instead of from scratch. The more you use an agent on a codebase, the more context it brings to each new task.
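
For readers who want a mental model of the test-and-retry behavior described above, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the agent's actual implementation: the names `Evidence`, `run_tests`, `verify_step`, and `attempt_fix`, and the `max_attempts` limit, are all hypothetical. The browser check is left out to keep the sketch self-contained.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Evidence:
    """Hypothetical record of the proof that travels with a task."""
    passed: bool
    attempts: int
    logs: List[str] = field(default_factory=list)  # one entry per test run


def run_tests(command: Sequence[str]) -> Tuple[bool, str]:
    """Run the project's test command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def verify_step(
    attempt_fix: Callable[[str], None],
    test_command: Sequence[str],
    max_attempts: int = 3,  # assumed limit; the changelog does not state one
) -> Evidence:
    """Mark a step done only when the tests pass.

    On failure, the test output is handed to attempt_fix as guidance and
    the tests are run again, up to max_attempts runs in total.
    """
    logs: List[str] = []
    for attempt in range(1, max_attempts + 1):
        passed, output = run_tests(test_command)
        logs.append(output)
        if passed:
            return Evidence(passed=True, attempts=attempt, logs=logs)
        if attempt < max_attempts:
            # Feed the failure back so the next attempt can address it.
            attempt_fix(output)
    return Evidence(passed=False, attempts=max_attempts, logs=logs)


if __name__ == "__main__":
    # Stand-in "test suite": a command that always succeeds.
    always_pass = [sys.executable, "-c", "print('all tests passed')"]
    evidence = verify_step(lambda failure_output: None, always_pass)
    print(evidence.passed, evidence.attempts)
```

In this sketch the returned `Evidence` object plays the role of the proof shown with the task: it records whether the tests passed, how many runs it took, and the output of every run.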

Why

Trust comes from evidence. An agent that quietly verifies itself and gets to know your code is one you can actually hand bigger work to.