Agentic Engineering
Simon Willison recently started documenting Agentic Engineering Patterns — practices for getting good results out of coding agents that both write and execute code on their own. This page is my running companion to that idea, told from the other side of the loop: lessons and observations from my own experience working alongside AI coding agents.
- June 18, 2026
"Almost working" kept the wrong tool in place
What happened. I asked an agent to add a simple action: create a task and return its ID. The first version used a command-line tool that could create tasks but didn't reliably hand back the ID in a usable form. Instead of stepping back, the agent kept trying to make that tool work — searching for the task after creating it, parsing the command's output, writing the result to a temp file. Each workaround fixed the latest symptom but never questioned the real problem: the tool simply wasn't designed to give back a value, and no amount of patching would change that.
Why the agent missed it. Its frame was "how do I extract the ID from this command?" The right frame was "is this even the right tool for a job that needs a return value?" Because the command did create tasks, it felt close enough — and that's what made it dangerous: not obviously broken, just wrong enough to keep wasting time.
The fix. The pushback was: "I'd expect to call a function that creates the task and returns the ID directly — no parsing." That reframed it. We swapped the command-line tool for the server-side API that returns structured data, and the workarounds disappeared.
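A minimal sketch of the two shapes, for contrast. Every name here (`TaskApi`, `create_task`, `parse_id_from_cli_output`, the `#42` output format) is hypothetical and stands in for the real tool and API, which this entry doesn't name. The point is only that one path returns a value and the other has to guess at it.

```python
import re
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass(frozen=True)
class Task:
    id: int
    title: str


class TaskApi:
    """Hypothetical stand-in for a server-side API that returns structured data."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._tasks: Dict[int, Task] = {}

    def create_task(self, title: str) -> Task:
        # The created task, including its ID, is the return value.
        task = Task(id=next(self._ids), title=title)
        self._tasks[task.id] = task
        return task


def parse_id_from_cli_output(output: str) -> Optional[int]:
    """The workaround shape: scrape an ID out of human-readable text.

    Assumes an invented output format like 'Created task #42'. It returns
    None as soon as the wording changes, which is what makes it brittle.
    """
    match = re.search(r"#(\d+)", output)
    return int(match.group(1)) if match else None


api = TaskApi()
task_id = api.create_task("Write release notes").id  # no parsing step
```

With the structured call, the ID is part of the contract. With the scraper, it is a side effect of whatever the command happens to print.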
When a tool isn't built for the job, making it "almost work" can feel like progress, but each workaround just hides a choice you should have revisited. If the fixes keep stacking up, that's usually the signal to step back and ask whether the tool is right.
- June 12, 2026
"Just implement it" quietly became "build a second backend"
What happened. I asked an agent to add a test flow to an existing dashboard, with one requirement: it should behave exactly like the production path. Instead of reusing the production code, the agent quietly built a parallel version — a separate runner with its own handling for edge cases. Each piece looked reasonable alone, but together they formed a second system that only looked like the original.
Why the agent missed it. It focused on making the new flow work, and a separate copy works fine on its own. What it missed was that "behave like production" meant use the production path — not build something that merely looks like it.
The fix. I asked one question: "this separate runner — isn't it redundant?" That reframed the task. We removed the parallel code and routed the new flow through the same shared logic the production path already used.
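Roughly, the resulting shape looks like the sketch below. The names (`run_flow`, `production_flow`, `dashboard_test_flow`, `send_to_production`) and the toy edge-case rule are invented for illustration. The real dashboard code isn't shown in this entry.

```python
from typing import Callable, Dict, List

PRODUCTION_LOG: List[Dict] = []  # stand-in for the real production side effect


def send_to_production(result: Dict) -> None:
    PRODUCTION_LOG.append(result)


def run_flow(payload: Dict, publish: Callable[[Dict], None]) -> Dict:
    """Hypothetical shared logic. Edge-case handling lives here, exactly once."""
    name = payload.get("name", "").strip() or "unnamed"
    result = {"name": name, "ok": True}
    publish(result)
    return result


def production_flow(payload: Dict) -> Dict:
    return run_flow(payload, publish=send_to_production)


def dashboard_test_flow(payload: Dict, captured: List[Dict]) -> Dict:
    # Same code path as production. Only the destination of the output differs.
    return run_flow(payload, publish=captured.append)


captured: List[Dict] = []
test_result = dashboard_test_flow({"name": "   "}, captured)
prod_result = production_flow({"name": " report "})
```

Because the test flow differs only in the injected `publish` function, a fix to the edge-case handling in `run_flow` reaches both paths without anyone having to remember that a second path exists.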
"Avoid duplication" isn't about tidiness — it's about correctness. Some argue it matters less in the age of AI, since agents can later update both copies if needed. But an agent only fixes what it's pointed at; it has no reason to know a parallel copy exists. So the two paths drift apart silently while both still look healthy. When a feature is meant to mirror an existing process, reuse is the default; duplication needs a stated reason.
- May 20, 2026
Reading code is changing — but it's not going away
The principle. In the age of agentic engineering, understanding code without reading every line is becoming one of the core skills. Whether it's code an AI generated for you or code a colleague wrote (with AI help), you need to grasp intent, structure, and risk at a glance.
On code review. We already use AI agents to review pull requests. But copy-pasting AI-generated review comments adds no value — the code author could have run the same tool themselves. The real skill is reading the AI's review, deciding which comments are valid, and judging how much they matter. You're not the reviewer's typist; you're its editor.
AI generates the code and the reviews. Your job is to understand both well enough to know what matters and what doesn't — without reading every line yourself.
- May 20, 2026
Start in the driver's seat, then gradually hand over the wheel
The principle. When you join a new team or start a new project, use AI as an assistant, not an autopilot. Read the documents yourself (with AI helping you summarize and search). Read the code yourself (with AI helping you navigate and explain). Double-check what the AI generates, not because it's necessarily wrong, but because reviewing its output is how you build context.
Why it matters. Over time, as you understand the system better, you can give the agent more autonomy and spend less time monitoring its work. But if you hand over too much autonomy at the start, you fall behind even if the AI does everything correctly: the work moves forward while your understanding doesn't. You can't meaningfully steer what you don't understand.
Trust is built gradually. Start hands-on, and increase autonomy as your own understanding grows. The goal isn't to supervise forever; it's to earn the confidence to stop.
- May 13, 2026
"Simplest option" erased nine features from five engineers
What happened. I asked an agent to merge a long-lived branch back into main. There were conflicts. The agent resolved all of them by keeping my branch's version and discarding everything else — silently deleting work from five other engineers.
Why the agent missed it. It reasoned: "this branch is the goal, so the branch should win every conflict." That logic was correct for my changes but wrong for everyone else's unrelated edits that happened to touch the same files.
The fix. I said: "if you're removing code I wrote, fine — if you're removing other people's code, that's not fine." The agent then checked who wrote each deleted line and preserved everything that wasn't mine. Took 20 minutes instead of 2, but saved nine features from five engineers.
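The authorship check can be sketched as a small filter. This is an illustration under assumptions, not what the agent actually ran: `DeletedLine` and `deletions_by_others` are hypothetical names, and the author of each line is assumed to be already known (for example, gathered beforehand from something like `git blame`).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DeletedLine(NamedTuple):
    path: str
    author: str
    text: str


def deletions_by_others(
    deleted: Iterable[DeletedLine], me: str
) -> Dict[str, List[DeletedLine]]:
    """Group lines a resolution would delete by author, skipping my own."""
    grouped: Dict[str, List[DeletedLine]] = defaultdict(list)
    for line in deleted:
        if line.author != me:
            grouped[line.author].append(line)
    return dict(grouped)


# Invented sample data: what a "my branch wins" resolution would drop.
deleted = [
    DeletedLine("billing.py", "me", "old_total = 0"),
    DeletedLine("billing.py", "alice", "apply_discount(order)"),
    DeletedLine("search.py", "bob", "index.refresh()"),
    DeletedLine("search.py", "alice", "log_query(q)"),
]
at_risk = deletions_by_others(deleted, me="me")
```

An empty result means the bulk resolution only touches my own code. Anything else is a list of lines to review before accepting it.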
"Simplest option" is an instruction about effort, not correctness. In any shared codebase, ask: "whose work am I about to overwrite?" before accepting a bulk resolution.
- May 11, 2026
"Re-run" didn't mean "resume"
What happened. I told an agent to fix a bug and re-run a multi-hour test suite. The agent was about to restart from scratch, re-running hundreds of tests that had already passed, because it took "re-run" literally. What I meant was "resume."
Why the agent missed it. It optimized for the literal instruction ("get a clean run") without considering the implicit constraint ("don't redo finished work"). Agents don't reliably infer what you didn't say.
The fix. I asked one question: "won't this rerun the ones that already passed?" The agent immediately built a resume mechanism. Two minutes of work saved hours of wasted compute.
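A resume mechanism can be as small as a file that remembers which tests have passed. The sketch below is an assumption about the general shape, not the agent's actual implementation: `run_with_resume`, the JSON state file, and the two toy checks are all invented for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def run_with_resume(
    tests: Dict[str, Callable[[], bool]], state_file: Path
) -> Dict[str, bool]:
    """Run only the tests not yet recorded as passed in state_file."""
    passed = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    results: Dict[str, bool] = {}
    for name, test in tests.items():
        if name in passed:
            continue  # finished work is not redone
        ok = test()
        results[name] = ok
        if ok:
            passed.add(name)
            # Persist after every pass so an interrupted run can also resume.
            state_file.write_text(json.dumps(sorted(passed)))
    return results


calls: List[str] = []
bug_fixed = {"value": False}


def check_a() -> bool:
    calls.append("a")
    return True


def check_b() -> bool:
    calls.append("b")
    return bug_fixed["value"]


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "passed.json"
    suite = {"a": check_a, "b": check_b}
    first_run = run_with_resume(suite, state)   # runs a and b; b fails
    bug_fixed["value"] = True                   # stand-in for fixing the bug
    second_run = run_with_resume(suite, state)  # runs only b
```

In this toy run, `check_a` executes once and `check_b` twice: the second pass touches only the test that had not passed yet.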
Agents optimize the goal you stated. Side effects outside that frame — wasted time, wasted money, repeated work — are invisible to them unless you name them.
