This post summarizes the key ideas from Zhang et al. (2026), “Molecular Knowledge Representations in the Era of Artificial Intelligence,” a preprint published on ChemRxiv (DOI: 10.26434/chemrxiv.15002830/v1).

The Core Problem

Molecules are quantum-mechanical objects. Their exact description is computationally intractable, and any real sample is a messy mixture of impurities, conformers, and side products. This means every representation of a molecule is, by necessity, an approximation, shaped by the interactions and length scales we care about.
Your docking pose is only as trustworthy as your starting coordinates. Here is a systematic guide to navigating the PDB, avoiding common pitfalls, and future-proofing your workflow for the coming mmCIF era.
A comprehensive walkthrough of cheminformatics, machine learning, molecular docking, ADMET prediction, and molecular dynamics simulations as the modern toolbox for computer-aided drug discovery.
A beginner-friendly walkthrough of PyTorch Geometric's point cloud tutorial — covering the Data object, transforms, dynamic graph construction, PointNet++ message passing, and graph-level classification.
Simon Willison recently appeared on Lenny’s Podcast to discuss what he calls the November inflection point: the moment in late 2025 when frontier models crossed a threshold where agentic coding went from “mostly works if you watch carefully” to “almost always does what you asked.” His highlights post is worth reading in full, but read through the lens of computational drug discovery, several of its themes land with unusual force.
A practical, beginner-friendly introduction to the Deep Graph Library (DGL) and how to use it to featurize protein–ligand complexes for machine learning in drug discovery.
A practical guide to three advanced 3D fingerprinting methods (PLEC, SPLIF, and E3FP) and how to choose between them when featurizing docking poses for ML-based drug discovery models.
A conceptual overview of the key energetic contributions governing protein–ligand binding in molecular docking, including desolvation, entropy, water displacement, electrostatics, and scoring function behavior.
Preparing a molecular system correctly before running molecular dynamics (MD) simulations is essential for obtaining meaningful and reproducible results. Small technical choices such as solvent box geometry, treatment of protein termini, and strategies for selecting representative conformations can strongly influence simulation stability, computational efficiency, and interpretation of results.
A practical guide to the most common methodological errors in Density Functional Theory calculations, explaining why legacy protocols like B3LYP/6-31G* are no longer reliable and how to adopt robust, modern best practices.