How I Learned to Stop Coding With IDEs and Love the Agents
For those of us who write code daily, our workflow is changing rapidly beneath our feet.
Coding agents are still clunky and prone to mistakes. But they are already (at least for me) the fastest way to write code. Perhaps more importantly, they change the kind of code I write. By reducing friction and increasing iteration speed, coding agents increase my ambition for each coding session. I’ve been coding up new features with child-like excitement (perhaps too many of them).
To give a sense of user traction, Boris Cherny (creator of Claude Code) said in a recent Q&A that “about 80% of people at Anthropic that are technical use Claude Code every day”. Here’s another notable interaction from the Q&A session:
Question from the audience: "Why did you build a CLI tool instead of an IDE?"
Answer from Boris: "...I think there's a good chance that by the end of the year people aren't using IDEs anymore..."
I think this could be correct. I’ve done a number of coding sessions now where I haven’t touched the code “manually” with an IDE. And the models are the worst they’ll ever be at coding.
But now we have new problems. Different problems. When coding “manually”, I prefer to have one big monitor. The IDE fills the screen. I minimize context switching and try to stay in flow.
But that workflow adapts poorly to coding with agents. For one thing, agents run for longer stretches, so it’s inefficient to sit there waiting. Another problem is variable completion time: some agent trajectories require almost immediate intervention, while others can run unattended for a long while.
The dynamic is closer to that of a conductor. Instead of focusing on a single, synchronous task, my focus darts between agents, encouraging them, demanding more, serving up hints. It’s an iterative process, ensuring the agents have the context they need to progress. With practice, I’ve started to develop some intuition for what a coding agent can do reliably, and what will take a few attempts. But this intuition has a short half-life.
Such is the nature of AI development.
Quick ecosystem news and curated links
My AI Skeptic Friends Are All Nuts: An excellent post on what it's like to code with agents and why, as the post argues, it's changing everything. It rose to the top of Hacker News.
Microsoft is now ranking the ‘safety’ of its cloud AI models: The Financial Times reports on a new internal safety leaderboard for models sold to cloud customers—a potential sign of growing market demand for safety metrics.
What happens when AI outcompetes you as a father?: David Duvenaud, ML legend (and now my poster child for why academic tenure is so valuable), ponders difficult questions in a thoughtful way.
Is the world vulnerable to extremely powerful technology?: Polymath Michael Nielsen draws on historical parallels to explore our potential fragility in the face of transformative AI.
The UK government pledges funding for Sizewell C: Nuclear power. Could it happen? I’m mildly optimistic.
A reminder of the Gell-Mann Amnesia effect, via a SemiAnalysis chart. I like SemiAnalysis for semiconductor news. It’s great. But their recent newsletter featured a “SWE-Bench Performance vs. Cost” graph with sufficient room for improvement that I feel compelled to comment:
Mislabeled Benchmark: The chart is titled SWE-bench, but the numbers are for SWE-bench Verified, a smaller, human-validated subset with different scores. Not a huge deal, but not a great start. Still, I’m sure I’ve made the same slip before, and it’s reasonable to infer that the audience will know. The Claude announcement page also omits the “Verified” qualifier.
Minor Nitpick: The models are technically named Claude Opus 4 and Claude Sonnet 4, not "Claude 4 Opus/Sonnet". Perhaps this is Anthropic’s fault for changing the naming scheme.
Apples-to-Oranges Comparison: My main concern. The chart reports Claude 4 Opus/Sonnet results derived from a non-public reward model and parallel sampling, but reports token costs as though sampling is conducted from a standalone model. These are not comparable.
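To make the mismatch concrete, here’s a toy sketch with entirely made-up numbers (the sample count, token count, and price below are hypothetical, not taken from the chart): best-of-n parallel sampling multiplies token cost by roughly n, before even counting whatever the reward model charges to score candidates.

```python
# Toy illustration of the apples-to-oranges problem (all numbers hypothetical).
# If a benchmark score comes from best-of-n parallel sampling plus a reward
# model picking the winner, the honest token cost is ~n times the cost of
# drawing one sample from a standalone model.

def sampling_cost_usd(n_samples: int, tokens_per_sample: int, usd_per_mtok: float) -> float:
    """Output-token cost of sampling n candidate trajectories for one task."""
    return n_samples * tokens_per_sample * usd_per_mtok / 1_000_000

tokens = 30_000   # hypothetical output tokens per trajectory
price = 15.0      # hypothetical USD per million output tokens

single = sampling_cost_usd(1, tokens, price)      # standalone model, one sample
best_of_8 = sampling_cost_usd(8, tokens, price)   # parallel sampling, best-of-8

print(f"single sample: ${single:.2f}")        # $0.45
print(f"best-of-8:     ${best_of_8:.2f}")     # $3.60, excluding reward-model scoring
```

In this made-up scenario, plotting the best-of-8 score at the single-sample price would make the model look roughly 8x cheaper than it effectively is.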
(Despite this, I naturally continue to believe their chip supply chain reports are great.)
The Illusion of Thinking: A paper that has everyone on the internet talking. I recommend sharing your opinion before reading the paper.