The Engineer Who Got Rejected for Coding by Hand — and the Workflow That Replaced It

1. A Rejected PR That Changed Everything
Imagine joining the world’s leading AI lab as a senior engineer. You write your first pull request — careful, clean, thoroughly considered. Then it gets rejected. Not for logic errors. Not for missing tests. For being hand-written.
This happened to Boris Cherny, now the engineering lead behind Claude Code, when he joined Anthropic. Previously one of Meta’s most prolific engineers, Boris found himself in an environment where the baseline expectation was that code came from AI. His hand-crafted PR was a signal that he hadn’t yet made the mental shift.
That rejection became the catalyst for everything that followed — a workflow capable of shipping 20 to 30 pull requests per day, and a philosophy of AI-native engineering that is quietly reshaping how software gets built.
2. The Core Mistake: Keeping AI in a Box
Claude Code began as an internal prototype called Clyde. In its early design, Boris made the same mistake most engineers make: he tried to constrain what the AI could do.
The prevailing model for AI tooling at the time was the plugin approach — highlight some code, ask the model to complete or explain it, get a response. The AI was a helper with a narrow lane. Boris’s first instinct was to formalize that lane, to give the model structured inputs and controlled outputs.
It didn’t work. The model felt stunted.
The breakthrough came from abandoning control entirely. Rather than telling the AI how to solve a problem, the team gave it tools — Bash execution, file system access, the ability to run arbitrary programs — and then stepped back.
“Don’t try putting it in a box. Give the model tools, let it run programs itself.”
The difference became immediately apparent. In one early test, a researcher asked Clyde: “What music am I listening to right now?” This isn’t a question with an obvious API. But the model didn’t complain about missing context. It wrote an AppleScript to query the local music player, piped the output through sed, and returned the song title.
Nobody told it to do that. Nobody even thought of that approach. The model reasoned its way to a solution using tools it had been given for entirely different purposes.
This is what Boris calls the Bitter Lesson applied to agent design: general-purpose capability, given real tools, outperforms tightly engineered narrow systems. Almost every time.
3. The Parallel Agent Workflow
Once you accept that the AI should have autonomy, the next question is: how do you scale that?
Boris’s answer is what he calls the conductor model. The engineer stops being a typist and becomes an orchestrator — running multiple AI agents in parallel, each working on a separate branch, each receiving high-level task descriptions while the engineer cycles between them.
The physical setup is intentionally low-tech:
- Open 5 terminal tabs in tmux (or use Git worktrees for full isolation)
- In each tab, launch Claude Code and enter Plan Mode (Shift+Tab twice)
- Describe the task to each agent
- While agent 1 is thinking or executing, switch to agent 2 and give it a new task
- Return to agent 1 when it needs input or a decision
A typical tab header in this setup reads something like: `[tab 1] Claude Code — feature/auth-refresh → executing`
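The worktree variant of this setup takes only a few commands. This is a minimal sketch, assuming the `claude` CLI is on your PATH; the branch names and the throwaway demo repository are hypothetical, purely for illustration:

```shell
# Sketch of the worktree-per-agent setup (hypothetical branch names).
# Each agent gets its own checkout, so parallel edits never collide.
set -e
repo=$(mktemp -d) && cd "$repo"            # stand-in repo for illustration
git init -q
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"
for task in auth-refresh search-index billing-webhooks; do
  git worktree add -q -b "feature/$task" "../wt-$task"
  # In each worktree you would then start a session, e.g. inside tmux:
  #   tmux new-window -c "../wt-$task" claude
done
git worktree list                           # one line per agent workspace
```

Because each worktree is a full checkout on its own branch, an agent in one tab can run builds and tests without invalidating the state another agent depends on.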
The engineer’s job in this loop is not implementation. It is defining type signatures — the shape of what correct output looks like — and validating decisions when agents surface them. The cognitive load shifts from writing to judging.
This is why Boris uses the orchestra conductor metaphor. A conductor does not play the instruments. They hold the structure of the piece in their head, they communicate intent to performers who have deep domain expertise, and they make real-time decisions about pacing and emphasis. That is now the job.
4. Quality at Scale: When 80% of Your Code Is AI-Generated
At Anthropic, approximately 80% of code is now written by Claude Code. This raises an obvious question: how do you maintain quality when the majority of your codebase was not typed by a human?
Boris’s answer is layered:
Layer 1 — Agent self-testing. Claude Code writes and runs its own tests locally before submitting. For changes to Claude Code itself, this includes end-to-end tests where the model uses itself to verify its own behavior.
Layer 2 — Best of N review. When a PR is opened, multiple review agents run independently in parallel. Each evaluates the change against the codebase, the requirements, and known failure patterns. A deduplication agent synthesizes the results. This catches roughly 80% of low-level bugs before any human sees the code.
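The Best of N structure can be sketched in a few lines. This is an illustrative sketch, not Anthropic's actual pipeline: the reviewer functions are stand-ins for model-backed agents, and the deduplication is a simple count over identical findings rather than a real dedup agent.

```python
# Illustrative sketch of Best of N review (not Anthropic's actual code).
# N reviewers run in parallel; a dedup step merges overlapping findings.
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def run_reviews(diff, reviewers):
    # Each reviewer independently returns a list of finding strings.
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        all_findings = list(pool.map(lambda review: review(diff), reviewers))
    # Deduplicate: identical findings collapse to one entry, ranked by how
    # many reviewers flagged them (a crude stand-in for a dedup agent).
    counts = Counter(f for findings in all_findings for f in findings)
    return counts.most_common()

# Stand-in reviewers; real ones would each send the diff to a model.
reviewers = [
    lambda diff: ["missing null check", "unbounded retry loop"],
    lambda diff: ["missing null check"],
    lambda diff: ["stale docstring"],
]
print(run_reviews("…diff text…", reviewers))
```

The ranking is the useful part: a finding flagged independently by several reviewers is far more likely to be real than one flagged by a single agent.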
Layer 3 — Dynamic linting. When a category of error surfaces repeatedly, Boris doesn’t write a ticket. He asks Claude to write a lint rule that prevents the pattern at the source. The quality system is itself AI-generated and self-expanding.
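As a concrete (hypothetical) example of what such a generated rule might look like: a lint check for one recurring pattern is often just a short AST walk. Here, flagging bare `except:` clauses — a stand-in pattern chosen for illustration:

```python
# Hypothetical auto-generated lint rule: flag bare `except:` clauses,
# a pattern that silently swallows every error in the try block.
import ast

def check_bare_except(source: str) -> list[str]:
    issues = []
    for node in ast.walk(ast.parse(source)):
        # ExceptHandler.type is None exactly when the clause is `except:`.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append(f"line {node.lineno}: bare 'except:' hides errors")
    return issues

snippet = "try:\n    risky()\nexcept:\n    pass\n"
print(check_bare_except(snippet))   # flags line 3 of the snippet
```

Rules this narrow are cheap to generate and cheap to run on every commit, which is what makes the "ask the model to write a lint rule" loop practical.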
Layer 4 — Human final review. Despite all of the above, every PR that goes into production products still requires a human engineer’s sign-off. Not because the automation is insufficient, but because accountability and judgment belong to the people, not the tools.
Note that this is not fully autonomous. The human is not removed from the loop — the human’s position in the loop changes. From coder to reviewer. From implementer to decider.
5. The Printing Press Moment
Boris uses a historical analogy that deserves more attention than it typically gets.
In the 15th century, literacy and manuscript copying were rare specializations. Scribes occupied a privileged position in the information economy — they controlled access to knowledge by controlling who could produce and reproduce text. Then Gutenberg invented movable type.
Scribes did not disappear. The scribal function — careful, expert engagement with text — became the foundation for writers, editors, and publishers. The printing press did not shrink the market for literacy. It exploded it.
“We’re like 15th-century scribes. The people who couldn’t code before — CEOs, PMs, designers — are now like kings who couldn’t read. And they’re starting to read.”
At Anthropic, nearly 100% of non-engineers now use Claude Code to build things. Not to assist engineers. To build their own things — tools, workflows, prototypes — that previously required dedicated engineering resources. The total volume of software being created is expanding, not contracting.
The practical implication: AI coding tools are not a threat to engineering employment in aggregate. They are a threat to engineering as a bottleneck. The pie is getting much larger; the question is what role engineers play in the larger pie.
6. The Skills Rebalancing
This brings us to the uncomfortable part of the conversation: which skills are actually worth developing right now?
Boris is direct about what is devaluing:
- Framework loyalty. Whether you prefer React or Vue, FastAPI or Gin, matters less every month. An AI can rewrite a frontend in a different framework in hours. Specialized knowledge of a specific tool's quirks is increasingly irrelevant.
- Syntax memorization. The ability to write boilerplate from memory is not a signal of skill anymore. It is increasingly a signal of not using the right tools.
And what is gaining value:
- Systematic debugging. The ability to form hypotheses about why something is wrong — not just what is wrong — is harder to automate. Engineers who approach bugs scientifically, narrowing the hypothesis space efficiently, will continue to be irreplaceable.
- Cross-domain breadth. Boris describes the emerging ideal as the “one-person company” engineer: someone with full-stack technical depth and enough business, design, or domain literacy to make product decisions independently. AI expands what one person can build; breadth determines what that person chooses to build.
- Context-switching agility. Managing five Claude Code agents simultaneously is a different cognitive skill than deep single-task focus. The ability to make rapid, high-quality decisions across multiple parallel workstreams — without losing track of each one’s state — is what the parallel workflow demands.
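The "systematic debugging" point above has a familiar concrete form: narrowing the hypothesis space by halving it, the logic behind `git bisect`. A minimal sketch, where `is_bad` is a stand-in for running your failing test at a given commit:

```python
# Hypothesis narrowing as binary search: the logic behind `git bisect`.
# `is_bad` stands in for checking out a commit and running the failing test.
def first_bad(history, is_bad):
    lo, hi = 0, len(history) - 1       # invariant: history[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid                   # bug present at mid or earlier
        else:
            lo = mid + 1               # bug introduced after mid
    return history[lo]

commits = [f"c{i:03d}" for i in range(200)]       # stand-in commit ids
# ~8 test runs instead of up to 200 linear checks:
print(first_bad(commits, lambda c: c >= "c137"))
```

The skill being described is exactly this: choosing experiments that eliminate half the remaining explanations at a time, whether the search space is commits, config flags, or layers of a stack.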
7. Intellectual Humility as a Core Skill
Boris makes one more point that I found worth sitting with.
He admits that even he finds the pace of model improvement disorienting. Solutions he tried six months ago and abandoned as unworkable have become feasible with the current generation of models. This means that engineers who experimented early, failed, and drew firm conclusions based on those failures may now be carrying wrong beliefs about what AI can do — beliefs that are actively limiting their effectiveness.
“Intellectual humility matters more than past experience.”
This is a direct challenge to experience as a form of capital. In most engineering contexts, having done something before is an asset. In a field where the fundamental capability is improving faster than anyone can track, having done something before and failed at it is sometimes a liability — if the failure led to a fixed conclusion.
The correct posture, Boris argues, is to treat your mental model of AI capability as a variable that needs continuous updating, not a constant you can rely on.
8. Conclusion
Boris Cherny’s story is not primarily about productivity metrics — 30 PRs a day is a striking number, but it is a symptom, not the point. The point is that a different mental model of what programming is has taken hold at one of the world’s leading engineering organizations.
In that model, the engineer’s primary contribution is not code. It is judgment — judgment about what to build, how to verify it, when to trust the output, and when to push back. The physical act of typing was always the least interesting part of the job. Now it is optional.
The engineers who thrive in this environment will be the ones who recognize that giving up the keyboard is not a loss of identity. It is a reallocation of attention toward everything that was always more valuable than the typing.
Are you already running parallel agents in your workflow, or does the idea feel like too much to manage at once? What is the biggest mental barrier you have hit in transitioning to AI-native development? Share in the comments.