Cheminformatics

A History of Graph Neural Networks in Drug Discovery

Graphs in drug discovery have gone from a quiet background tool to one of the main ways we think about molecules, proteins, and their interactions. This post walks through that story: how the field moved from fingerprints and QSAR to today’s 3D, attention-based graph neural networks operating directly on protein-ligand complexes.

Jun 8, 2026

A Practical Guide to QSAR Model Validation: Internal, Cross, and External Checks

Building a QSAR model is only half the job. The harder question is: does it actually work? Overfitted models routinely pass internal checks while failing completely on new compounds. The OECD principles and decades of best-practice literature have converged on a three-tier validation framework that separates what a model has memorised from what it can genuinely predict.

Jun 1, 2026

Beyond SMILES: The Evolving Landscape of Molecular Representations

This post summarizes the key ideas from Zhang et al. (2026), “Molecular Knowledge Representations in the Era of Artificial Intelligence,” a preprint published on ChemRxiv (DOI: 10.26434/chemrxiv.15002830/v1). The core problem: molecules are quantum-mechanical objects. Their exact description is computationally intractable, and any real sample is a messy mixture of impurities, conformers, and side products. This means every representation of a molecule is, by necessity, an approximation, shaped by the interactions and length scales we care about.

May 23, 2026

Choosing the Right PDB Structure: A Systematic Guide for Docking and MD Simulations

Your docking pose is only as trustworthy as your starting coordinates. Here is a systematic guide to navigating the PDB, avoiding common pitfalls, and future-proofing your workflow for the coming mmCIF era.

May 19, 2026

Computational Strategies for Accelerating Drug Discovery: A Comprehensive Review

A comprehensive walkthrough of cheminformatics, machine learning, molecular docking, ADMET prediction, and molecular dynamics simulations as the modern toolbox for computer-aided drug discovery.

May 10, 2026

What Agentic Engineering Means for Computational Drug Discovery

Simon Willison recently appeared on Lenny’s Podcast to discuss what he calls the November inflection point: the moment in late 2025 when frontier models crossed a threshold where agentic coding went from “mostly works if you watch carefully” to “almost always does what you asked.” His highlights post is worth reading in full, but read through the lens of computational drug discovery, several of its themes land with unusual force.

Apr 4, 2026

Getting Started with Graph Neural Networks for Protein–Ligand Complexes Using DGL

A practical, beginner-friendly introduction to the Deep Graph Library (DGL) and how to use it to featurize protein–ligand complexes for machine learning in drug discovery.

Apr 3, 2026

Beyond 2D Fingerprints: Encoding Protein-Ligand Interactions for Machine Learning

A practical guide to three advanced 3D fingerprinting methods (PLEC, SPLIF, and E3FP) and how to choose between them when featurizing docking poses for ML-based drug discovery models.

Apr 1, 2026

Understanding Binding Energetics in Molecular Docking

A conceptual overview of the key energetic contributions governing protein–ligand binding in molecular docking, including desolvation, entropy, water displacement, electrostatics, and scoring function behavior.

Mar 13, 2026

Practical System Preparation Tips for Molecular Dynamics Simulations

Preparing a molecular system correctly before running molecular dynamics (MD) simulations is essential for obtaining meaningful and reproducible results. Small technical choices such as solvent box geometry, treatment of protein termini, and strategies for selecting representative conformations can strongly influence simulation stability, computational efficiency, and interpretation of results.

Mar 9, 2026