A History of Graph Neural Networks in Drug Discovery

Mon, 08 Jun 2026 00:00:00 +0000

Graphs in drug discovery have gone from a quiet background tool to one of the main ways we think about molecules, proteins, and their interactions. This post walks through that story: how the field moved from fingerprints and QSAR to today’s 3D, attention-based graph neural networks operating directly on protein-ligand complexes.

Before Deep Learning: Graphs Hiding in Plain Sight

Long before anyone said “graph neural network,” chemoinformatics was already thinking in graphs. A small molecule is naturally a graph: atoms are nodes, bonds are edges. Classical fingerprints like ECFP/Morgan and many QSAR descriptors essentially encode local graph neighborhoods into fixed-length vectors. Kernel methods on graphs also appeared well before deep GNNs, using hand-crafted features and similarity measures rather than learned message passing.
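To make the "fingerprints encode local graph neighborhoods" idea concrete, here is a toy sketch in plain Python. It is not the ECFP/Morgan algorithm as implemented in any toolkit: the function name `circular_fingerprint`, the CRC32 hashing, and the 64-bit length are assumptions chosen for illustration, and real fingerprints also account for bond orders, charges, and other atom invariants.

```python
import zlib

def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy Morgan-style fingerprint (illustrative, not a toolkit implementation).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of undirected (i, j) atom-index pairs
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each atom's identifier is a hash of its element symbol.
    ids = [zlib.crc32(symbol.encode()) for symbol in atoms]
    bits = [0] * n_bits
    for ident in ids:
        bits[ident % n_bits] = 1

    # Each round folds the sorted neighbor identifiers into the atom's own,
    # so identifiers describe progressively larger neighborhoods.
    for _ in range(radius):
        new_ids = []
        for i in range(len(atoms)):
            env = (ids[i], tuple(sorted(ids[j] for j in neighbors[i])))
            new_ids.append(zlib.crc32(repr(env).encode()))
        ids = new_ids
        for ident in ids:
            bits[ident % n_bits] = 1
    return bits

# Heavy-atom graphs only: ethanol is C-C-O, dimethyl ether is C-O-C.
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
dimethyl_ether = circular_fingerprint(["C", "O", "C"], [(0, 1), (1, 2)])
```

Because neighbor identifiers are sorted before hashing, the result does not depend on how the atoms are numbered. The hashing scheme is fixed in advance, which is exactly the part that later GNNs replaced with learned functions.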

In protein-ligand modeling, researchers built interaction graphs: bipartite graphs in which ligand atoms (or ligand residues) were connected to the protein residues they contacted. Summary descriptors of those graphs were then fed into traditional machine learning models. Graphs were there, but the graph structure was not learned end-to-end. It was one step in a largely manual feature pipeline.

The First Wave: Molecules as Graphs

Around the mid-2010s, deep learning began to absorb graph structure directly. Message passing neural networks, graph convolutional networks (GCN), and graph attention networks (GAT) emerged as ways to learn from arbitrary graphs. In drug discovery, the first impact was on ligand-only problems: predicting solubility, toxicity, activity, or ADMET endpoints from the molecular graph.

Instead of computing fingerprints by hand, models learned node and edge embeddings and iteratively propagated information along bonds. This matched chemists’ intuition: properties depend on how atoms communicate through bonds and how local environments are arranged. For several years, many “GNN in drug discovery” papers stopped here: GNNs served as better QSAR on ligand graphs, with protein information either drawn from sequences or profiles or ignored entirely.
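The propagation step is easier to see in code. Below is a minimal sketch of one message-passing round, assuming sum aggregation over bonded neighbors followed by a linear map and ReLU. The function name, the one-hot atom features, and the identity weight matrix are illustrative assumptions; in a trained model the weights are learned and edge features usually enter the message as well.

```python
def message_passing_step(h, bonds, weight):
    """One round: add up self and neighbor features, then apply ReLU(agg @ weight).

    h: list of per-atom feature vectors
    bonds: list of undirected (i, j) atom-index pairs
    weight: matrix as a list of rows, shape (n_features, n_out)
    """
    n_feat = len(h[0])
    agg = [list(row) for row in h]  # start from each atom's own features
    for i, j in bonds:
        for k in range(n_feat):
            agg[i][k] += h[j][k]  # message from j to i
            agg[j][k] += h[i][k]  # message from i to j
    n_out = len(weight[0])
    return [
        [max(0.0, sum(row[k] * weight[k][m] for k in range(n_feat)))
         for m in range(n_out)]
        for row in agg
    ]

# C-C-O chain with one-hot features [is_carbon, is_oxygen].
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix

h1 = message_passing_step(h0, bonds, identity)
h2 = message_passing_step(h1, bonds, identity)
print(h1)  # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
print(h2)  # [[4.0, 1.0], [5.0, 2.0], [3.0, 2.0]]
```

After one round the terminal carbon (atom 0) still has no oxygen signal; after two rounds it does. That is the sense in which stacked layers let information travel along bonds.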

From Ligands to Complexes

A key shift came when researchers started treating protein-ligand complexes themselves as graphs. Rather than separate models for ligand and protein, the complex became a joint graph or a pair of interacting graphs.

Several design patterns emerged:

Protein and ligand as separate graphs with interaction edges connecting residues and atoms within a distance cutoff.
Unified interaction graphs where both protein and ligand atoms live in one graph, with edges encoding contacts, distances, and interaction types.
Structure-aware interactive GNNs that pass messages not only within the ligand or protein, but also across protein-ligand edges based on 3D geometry.

This step is conceptually important: the binding site environment, not just the ligand, becomes part of the learnable graph. Distance features, contact types, and physico-chemical interactions (H-bonds, hydrophobic contacts, metal coordination) can all be encoded as edge features. Attention and message passing mechanisms then decide which contacts matter most.
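As a rough illustration of the unified-graph pattern, the sketch below puts ligand and protein atoms in one node list and adds contact edges within a distance cutoff. The function name, the dictionary-based edge features, the 4.5 angstrom cutoff, and the toy coordinates are all assumptions for the example; published models differ in which atoms they keep and which cutoffs and features they use.

```python
import math

def build_complex_graph(ligand_xyz, protein_xyz, ligand_bonds, cutoff=4.5):
    """Build a single graph over ligand and protein atoms.

    Returns (nodes, edges) where nodes are ("ligand" | "protein", local_index)
    and edges are (u, v, features) with u, v indexing into nodes.
    """
    n_lig = len(ligand_xyz)
    nodes = [("ligand", i) for i in range(n_lig)]
    nodes += [("protein", j) for j in range(len(protein_xyz))]

    edges = []
    # Covalent edges inside the ligand.
    for i, j in ligand_bonds:
        d = math.dist(ligand_xyz[i], ligand_xyz[j])
        edges.append((i, j, {"kind": "covalent", "distance": d}))
    # Contact edges between ligand and protein atoms within the cutoff.
    for i, a in enumerate(ligand_xyz):
        for j, b in enumerate(protein_xyz):
            d = math.dist(a, b)
            if d <= cutoff:
                edges.append((i, n_lig + j, {"kind": "contact", "distance": d}))
    return nodes, edges

# Toy coordinates in angstroms: two bonded ligand atoms, one nearby and one
# distant protein atom.
ligand_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein_xyz = [(0.0, 3.0, 0.0), (10.0, 0.0, 0.0)]
nodes, edges = build_complex_graph(ligand_xyz, protein_xyz, [(0, 1)])
```

Here the nearby protein atom picks up contact edges to both ligand atoms, while the distant one stays in the node list with no edges. Interaction-type labels such as H-bond or metal coordination would be further keys in the same feature dictionary.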

Geometry, Physics, and Attention

As complex-level GNNs proliferated, three themes solidified.

3D geometry became central. Models began using distance encodings such as radial basis functions, or full SE(3)-equivariant architectures, so that predictions respect rotations and translations of the complex while still exploiting precise atomic coordinates. This makes graph models analogous to physics-inspired scoring functions, but with learnable parameters.
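A radial basis expansion is the simplest of these ideas to show. The sketch below expands a scalar distance into Gaussian features on evenly spaced centers. The 0 to 8 angstrom range, the 16 centers, and the default width are assumptions for illustration, not values taken from any particular model.

```python
import math

def rbf_encode(distance, d_min=0.0, d_max=8.0, n_centers=16, gamma=None):
    """Expand a distance into Gaussian radial basis features.

    Feature k is exp(-gamma * (distance - center_k)^2), with centers evenly
    spaced between d_min and d_max.
    """
    step = (d_max - d_min) / (n_centers - 1)
    if gamma is None:
        gamma = 1.0 / (step * step)  # width tied to the center spacing
    centers = [d_min + k * step for k in range(n_centers)]
    return [math.exp(-gamma * (distance - c) ** 2) for c in centers]

short_contact = rbf_encode(2.0)
long_contact = rbf_encode(6.0)
```

Because the input is an interatomic distance, the features are unchanged when the whole complex is rotated or translated. That gives invariance cheaply; equivariant architectures go further and carry directional information through the network as well.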

Edge semantics grew richer. Edges no longer meant just “bond” or “within X angstroms.” They carried labels for specific interaction classes: hydrogen bonds, pi-pi stacking, hydrophobic contacts, metal coordination. This is particularly natural in systems like cytochromes, where a small set of chemical interactions (Fe coordination, certain C-H distances) is mechanistically dominant.

Attention mechanisms took a larger role. GAT-style layers use learned attention coefficients to weight neighbor contributions, letting the network emphasize key contacts while down-weighting noisy ones. At the graph level, attentive pooling or multiple instance learning over subgraphs (poses, pockets, conformers) helps the model select which parts of the input to trust.
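The neighbor-weighting step can be sketched in a few lines. This follows the GAT scoring form, e_ij = LeakyReLU(a_src . h_i + a_dst . h_j) followed by a softmax over each node's neighbors, but it is a simplification: the learned projection matrix and multiple heads are omitted, edge features are not used, and the vectors `a_src` and `a_dst` are hand-picked here rather than learned.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_attention(h, neighbors, a_src, a_dst):
    """Return alpha[i][j], the softmax-normalized weight of neighbor j for node i.

    h: dict node -> feature vector (assumed already linearly projected)
    neighbors: dict node -> non-empty list of neighbor nodes
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = [leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha

def attend(h, alpha):
    """Attention-weighted sum of neighbor features for each node in alpha."""
    out = {}
    for i, weights in alpha.items():
        dim = len(h[i])
        out[i] = [sum(w * h[j][k] for j, w in weights.items()) for k in range(dim)]
    return out

# Node 0 has two contacts; the second feature stands in for "contact strength".
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 3.0]}
alpha = gat_attention(h, {0: [1, 2]}, a_src=[0.0, 0.0], a_dst=[0.0, 1.0])
updated = attend(h, alpha)
```

With these hand-picked vectors, node 2 receives most of the weight for node 0, so its features dominate the update. In a trained model the same mechanism is what lets the network down-weight noisy contacts.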

Ensembles, Poses, and MIL: Where the Field Is Now

Docking and molecular dynamics (MD) rarely provide a single “true” complex; they yield an ensemble of poses. An increasingly explored direction is to treat each pose as a graph instance and apply multiple instance learning (MIL) over the set of poses for a compound.

A typical pipeline looks like this:

Build one protein-ligand graph per pose, with rich node and edge features, using an edge-aware GAT (such as EdgeGATConv) for message passing.
Read out each pose graph to a pose-level embedding.
Use attention-based MIL (ABMIL) to aggregate pose embeddings into a compound-level representation, learning which pose or poses are most informative for binding or substrate classification.
Predict substrate status or binding affinity from that aggregated embedding.
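The aggregation step in this pipeline can be sketched without any deep learning library. The version below assumes the basic (non-gated) attention pooling form, where each pose embedding h_k gets a score w . tanh(V h_k), the scores are softmax-normalized over the poses of one compound, and the compound embedding is the weighted sum. The function name and the hand-set `V` and `w` are illustrative; in practice both are learned jointly with the graph encoder.

```python
import math

def abmil_pool(pose_embeddings, V, w):
    """Attention-based MIL pooling over the poses of one compound.

    pose_embeddings: list of pose-level vectors, one per pose
    V: matrix as a list of rows, shape (n_hidden, n_dim)
    w: vector of length n_hidden
    Returns (compound_embedding, attention_weights).
    """
    scores = []
    for h in pose_embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wm * hm for wm, hm in zip(w, hidden)))

    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(pose_embeddings[0])
    compound = [
        sum(a * h[k] for a, h in zip(weights, pose_embeddings))
        for k in range(dim)
    ]
    return compound, weights

# Three poses of one compound with 2-dimensional pose embeddings.
poses = [[0.0, 1.0], [2.0, 0.0], [-1.0, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned parameters
w = [1.0, 0.0]                # scores poses by their first feature only
compound, weights = abmil_pool(poses, V, w)
```

The returned weights are what make this approach inspectable: they show which poses the model leaned on for a given compound, which can then be compared with the docking-score ranking.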

This pipeline extends the history in a natural direction: from ligand graphs to protein-ligand complex graphs, from a single static complex to ensembles of complexes, and from fixed pooling to learnable attention-based selection over poses.

It also marries ensemble docking, which has long been used in drug discovery, with modern graph and attention machinery. Rather than manually choosing the best pose by docking score, the model learns what a productive pose looks like.

Summary

The arc is clear. Graphs were implicit in chemoinformatics from the beginning. Deep learning made them explicit and learnable. Complex-level modeling brought in the protein environment. Geometric and equivariant architectures tied models to 3D structure and its symmetries. And MIL-over-poses frameworks now allow models to reason over entire conformational ensembles rather than single snapshots.

Each step expanded what the model could see and what it could learn to ignore.