Agents that show their work

"It compiles" isn't "it works." So the agent now does the checks you'd do yourself before it tells you a task is finished — and it remembers what it learns.

What changed

Runs your tests. Before a goal step is marked done, the agent runs the project's test suite. A real failure isn't ignored — it's fed back as guidance and the agent tries again.
Opens the app in a real browser. The agent loads the running app in a headless browser and flags console errors and failed requests, so a change that builds but breaks the page doesn't slip through.
Shows you the proof. Test results and the browser check travel with the task, so you can see why the agent considers the work done.
Learns each repo. The agent recalls prior work on a repo (past changes, conventions, where things live), so the next task starts with context instead of from scratch. The more you use an agent on a codebase, the more it has to draw on.
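
For a rough picture of the test step, here is a minimal Python sketch. It illustrates the idea under stated assumptions and is not the agent's actual implementation: the names `Evidence`, `StepResult`, `run_tests`, and `complete_step`, the retry limit of three, and the callback that receives guidance text are all hypothetical.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Evidence:
    """Hypothetical record of one verification attempt, kept with the task."""
    command: List[str]
    exit_code: int
    output: str

    @property
    def passed(self) -> bool:
        # By convention, test runners exit with 0 when every test passes.
        return self.exit_code == 0


@dataclass
class StepResult:
    """Hypothetical outcome of a step: done or not, plus every attempt's evidence."""
    done: bool
    attempts: List[Evidence] = field(default_factory=list)


def run_tests(command: Sequence[str], timeout: float = 600.0) -> Evidence:
    """Run the test command and capture its exit code and combined output."""
    proc = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so failures appear in one log
        text=True,
        timeout=timeout,
    )
    return Evidence(list(command), proc.returncode, proc.stdout)


def complete_step(
    apply_change: Callable[[str], None],
    test_command: Sequence[str],
    max_attempts: int = 3,  # assumed limit, chosen only for this sketch
) -> StepResult:
    """Mark a step done only after the test command passes.

    apply_change receives guidance text: an empty string on the first
    attempt, and the previous failing test output on later attempts.
    """
    result = StepResult(done=False)
    guidance = ""
    for _ in range(max_attempts):
        apply_change(guidance)
        evidence = run_tests(test_command)
        result.attempts.append(evidence)  # failures are recorded too
        if evidence.passed:
            result.done = True
            break
        guidance = evidence.output  # feed the failure into the next attempt
    return result
```

With pytest, for instance, a call might look like `complete_step(apply_change, ["pytest", "-q"])`, where `apply_change` is whatever function makes the edit. Recording every attempt, failures included, is what would let the evidence travel with the task.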

Why

Trust comes from evidence. An agent that verifies its own work and gets to know your code is one you can actually hand bigger tasks to.