What Agentic Engineering Means for Computational Drug Discovery

Apr 4, 2026 · 5 min read

Simon Willison recently appeared on Lenny’s Podcast to discuss what he calls the November inflection point: the moment in late 2025 when frontier models crossed a threshold where agentic coding went from “mostly works if you watch carefully” to “almost always does what you asked.” His highlights post is worth reading in full, but for those of us in computational drug discovery, several of its themes land with unusual force.


The Inflection Point is Real in Our Field Too

Willison frames November 2025 as the moment the promise of coding agents became practical: you can spin up an agent, describe what you want, and get something back that actually runs. For those of us building MD analysis pipelines, virtual screening workflows, or ML-based substrate classifiers, this rings true. What changed isn’t just raw model capability - it’s reliability. An agent that gets it right 75% of the time creates more debugging work than it saves. One that gets it right 95% of the time fundamentally changes how you design a project.

In drug discovery, the code we write is usually not the point. The point is the science: does this compound dock with the right binding pose? Does this ML model generalize across CYP isoforms? The code is scaffolding. And if agents can build the scaffolding reliably, that frees researchers to focus on the parts that actually require domain expertise: choosing the right receptor ensemble, interpreting docking scores, deciding whether a geometric feature captures what you think it captures.


The Bottleneck Has Moved to Testing, and That’s Familiar Territory

One of Willison’s most interesting observations is that the bottleneck in software development has shifted: implementation is no longer the slow part. Prototyping three different approaches now costs as much as prototyping one used to. The new constraint is validation - how do you know which of your three prototypes is actually right?

This maps almost perfectly onto computational drug discovery, where validation has always been the hard part. It takes minutes to run a docking job. It takes weeks or months of wet lab work to find out if the predicted binder actually binds. The ratio of compute to experimental validation hasn’t changed just because the AI can write your AutoDock Vina wrapper faster. If anything, the pressure on experimental teams increases as computational throughput accelerates.

What does change is internal validation. It’s now cheap to try multiple featurization strategies for your ML model, run multiple docking protocols on your ensemble, or test three different clustering approaches on your MD trajectories, and compare them properly. The scientific judgment required to interpret those comparisons doesn’t get cheaper, but you get more data to feed it.
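As a concrete illustration of “compare them properly,” here is a minimal sketch of running three clustering protocols on the same featurized data and scoring them with one common internal metric. It assumes scikit-learn; the synthetic 2-D blobs stand in for, say, two collective variables extracted from an MD trajectory, and the protocol names are illustrative choices, not a recommendation.

```python
# Compare three clustering protocols on identical data with one shared
# internal metric (silhouette), so the comparison is apples-to-apples.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Three well-separated "conformational states" as a toy stand-in for
# featurized MD trajectory frames.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0.0, 0.0], [8.0, 8.0], [0.0, 8.0]],
    cluster_std=0.6,
    random_state=0,
)

protocols = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "ward": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "average": AgglomerativeClustering(n_clusters=3, linkage="average"),
}

# Same data, same metric, one score per protocol.
scores = {
    name: silhouette_score(X, model.fit_predict(X))
    for name, model in protocols.items()
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} silhouette = {score:.3f}")
```

The point of the sketch is the structure, not the metric: whatever internal score you choose, every protocol sees the same inputs and is judged the same way, which is what makes the resulting comparison worth feeding your scientific judgment.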


Experience Amplification, Not Experience Replacement

Willison cites a ThoughtWorks finding: AI tools benefit senior engineers and junior engineers most, while mid-career engineers face the most disruption. The reasoning is that senior engineers know what to build and how to evaluate what was built - they provide the judgment that guides the agent and the critical eye that catches its mistakes. Junior engineers benefit from having an endlessly patient collaborator that can explain concepts, demonstrate patterns, and scaffold early projects.

The parallel in computational drug discovery is direct. A postdoctoral researcher who understands force fields, binding free energies, the difference between RMSD and RMSF as quality metrics, or why a high docking score doesn’t mean a good drug candidate - that knowledge becomes more valuable when an agent can rapidly implement whatever protocol you describe. The agent doesn’t know that your receptor is in an unusual conformational state, or that the crystallographic water in the binding site probably shouldn’t be stripped. You know that.

What could be disrupted is the middle ground: researchers who can run established pipelines competently but haven’t yet developed the scientific intuition to question their own results. If agents can run the pipelines, that skill alone is no longer a differentiator.


Responsible “Vibe Coding” and the Stakes of Our Domain

Willison draws a clear line: vibe coding (rapidly assembling working software without deeply scrutinizing every line) is fine when only you are affected by any bugs. The moment other people depend on your output, you have to be more careful.

Drug discovery sits unambiguously on the “be more careful” side of that line. A virtual screening pipeline that silently uses the wrong protonation state, a feature extraction script that indexes atoms incorrectly, an ML model that leaks test set information through improper splitting: these errors don’t just waste compute time. They can misdirect medicinal chemistry efforts, consume experimental resources, and slow down projects with real human health implications downstream.
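The test-set leakage failure mode is worth making concrete. Below is a minimal sketch, using scikit-learn and synthetic data, of how a random train/test split leaks when a dataset contains near-duplicate compounds (analogues from the same chemical series), versus a group-aware split that holds whole series out together. The “series” structure and the 1-NN model are illustrative assumptions, not anyone’s real pipeline.

```python
# Leakage via improper splitting: near-duplicate compounds from the same
# series end up on both sides of a random split, so a memorizing model
# looks far better than it generalizes.
import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_series, per_series, n_feat = 40, 5, 8

# Each series: one feature centroid plus tiny within-series noise, and a
# label that depends only on the series (a worst case for leakage).
centers = rng.normal(size=(n_series, n_feat))
X = np.repeat(centers, per_series, axis=0) + 0.01 * rng.normal(
    size=(n_series * per_series, n_feat)
)
groups = np.repeat(np.arange(n_series), per_series)
y = groups % 2  # series-level label

model = KNeighborsClassifier(n_neighbors=1)

# Random split: analogues of the same series land on both sides.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
leaky_acc = model.fit(Xtr, ytr).score(Xte, yte)

# Group-aware split: whole series held out together.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr, te = next(gss.split(X, y, groups))
grouped_acc = model.fit(X[tr], y[tr]).score(X[te], y[te])

print(f"random split accuracy:  {leaky_acc:.2f}")   # inflated by leakage
print(f"grouped split accuracy: {grouped_acc:.2f}")  # honest estimate
```

In real projects the grouping is usually chemical (e.g. scaffold- or series-based splits) rather than an integer label, but the principle is the same: the split must respect the structure your model could exploit.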

This doesn’t mean agents are inappropriate for the field. It means the validation discipline we already apply to our science needs to be applied to the code too. Write tests. Sanity-check intermediate outputs against known reference datasets. Treat agent-generated code with the same skepticism you’d apply to a script handed to you by a collaborator you respect but haven’t worked with before.
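In practice, the checks above can be very lightweight. Here is a sketch of what “sanity-check against known references” might look like as plain assertions you could drop into pytest; the `score_compound` function and the pinned reference values are hypothetical placeholders for your own pipeline step and known-good data.

```python
# Lightweight sanity checks for an agent-generated pipeline step.
# score_compound and the reference values are hypothetical stand-ins.
import math

def score_compound(features):
    """Hypothetical stand-in for a pipeline step (e.g. a scoring model)."""
    return -sum(f * w for f, w in zip(features, [0.8, -0.3, 1.1]))

# 1. Known reference: a compound with a previously validated score
#    should reproduce it within tolerance.
REFERENCE_FEATURES = [1.0, 2.0, 0.5]
REFERENCE_SCORE = -0.75  # pinned when the pipeline was last trusted
assert math.isclose(score_compound(REFERENCE_FEATURES), REFERENCE_SCORE,
                    abs_tol=1e-6)

# 2. Invariants: scores should be finite for a whole batch, and the
#    pipeline should not silently drop compounds.
batch = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(10)]
batch_scores = [score_compound(f) for f in batch]
assert len(batch_scores) == len(batch)           # nothing dropped
assert all(math.isfinite(s) for s in batch_scores)
print("all sanity checks passed")
```

A handful of checks like these - one pinned reference result plus a few structural invariants - is often enough to catch the silent failure modes (wrong protonation state, shifted atom indices) before they cost anyone wet lab time.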


The New Bottleneck: Can We Ask Better Questions?

The thread that runs through Willison’s entire conversation is this: the cost of implementation is collapsing. What remains expensive is knowing what to implement and why. In computational drug discovery, this is the oldest truth in the field. CADD has always been able to generate enormous numbers of candidates; the limiting resource has always been the ability to prioritize them intelligently.

Agentic AI doesn’t solve the prioritization problem. But it does mean that a researcher with genuine scientific judgment - about target biology, about the chemical space worth exploring, about what a binding pose is actually telling you - can execute on that judgment faster and at greater scale than before. The field has always needed people who can think at the interface of chemistry, biology, and computation. That’s not going away. If anything, it matters more.

The inflection point Willison describes is real, and it’s arriving in computational drug discovery the same way it’s arriving everywhere else. The question is whether we use it to accelerate science or just to accelerate the production of numbers.


Episode: An AI state of the union: We’ve passed the inflection point, dark factories are coming, and automation timelines, Lenny’s Podcast, April 2026. Simon Willison’s highlights post: simonwillison.net.