This post summarizes the key ideas from Zhang et al. (2026), “Molecular Knowledge Representations in the Era of Artificial Intelligence,” a preprint published on ChemRxiv (DOI: 10.26434/chemrxiv.15002830/v1). The core problem: molecules are quantum-mechanical objects. Their exact description is computationally intractable, and any real sample is a messy mixture of impurities, conformers, and side products. This means every representation of a molecule is, by necessity, an approximation, shaped by the interactions and length scales we care about.
May 23, 2026
Your docking pose is only as trustworthy as your starting coordinates. Here is a systematic guide to navigating the PDB, avoiding common pitfalls, and future-proofing your workflow for the coming mmCIF era.
May 19, 2026
A comprehensive walkthrough of cheminformatics, machine learning, molecular docking, ADMET prediction, and molecular dynamics simulations as the modern toolbox for computer-aided drug discovery.
May 10, 2026
Simon Willison recently appeared on Lenny’s Podcast to discuss what he calls the November inflection point: the moment in late 2025 when frontier models crossed a threshold where agentic coding went from “mostly works if you watch carefully” to “almost always does what you asked.” His highlights post is worth reading in full; read through the lens of computational drug discovery, several of its themes land with unusual force.
Apr 4, 2026
A practical, beginner-friendly introduction to the Deep Graph Library (DGL) and how to use it to featurize protein–ligand complexes for machine learning in drug discovery.
Apr 3, 2026
A practical guide to three advanced 3D fingerprinting methods (PLEC, SPLIF, and E3FP) and how to choose between them when featurizing docking poses for ML-based drug discovery models.
Apr 1, 2026
Conceptual overview of the key energetic contributions governing protein–ligand binding in molecular docking, including desolvation, entropy, water displacement, electrostatics, and scoring function behavior.
Mar 13, 2026
Preparing a molecular system correctly before running molecular dynamics (MD) simulations is essential for obtaining meaningful and reproducible results. Small technical choices such as solvent box geometry, treatment of protein termini, and strategies for selecting representative conformations can strongly influence simulation stability, computational efficiency, and interpretation of results.
Mar 9, 2026
Practical introduction to DataWarrior as a free, chemistry-aware workbench for data visualization, filtering, and focused library generation, based on Bruno Villoutreix’s tutorial series.
Nov 15, 2025
Step-by-step guide to constructing an open-source drug discovery pipeline with AI and chemistry tools, from data to visualization.
Jul 1, 2025