Cheminformatics

Beyond SMILES: The Evolving Landscape of Molecular Representations

This post summarizes the key ideas from Zhang et al. (2026), “Molecular Knowledge Representations in the Era of Artificial Intelligence,” a preprint published on ChemRxiv (DOI: 10.26434/chemrxiv.15002830/v1). The Core Problem: Molecules are quantum-mechanical objects. Their exact description is computationally intractable, and any real sample is a messy mixture of impurities, conformers, and side products. This means every representation of a molecule is, by necessity, an approximation, shaped by the interactions and length scales we care about.

May 23, 2026

Choosing the Right PDB Structure: A Systematic Guide for Docking and MD Simulations

Your docking pose is only as trustworthy as your starting coordinates. Here is a systematic guide to navigating the PDB, avoiding common pitfalls, and future-proofing your workflow for the coming mmCIF era.

May 19, 2026

Computational Strategies for Accelerating Drug Discovery: A Comprehensive Review

A comprehensive walkthrough of cheminformatics, machine learning, molecular docking, ADMET prediction, and molecular dynamics simulations as the modern toolbox for computer-aided drug discovery.

May 10, 2026

What Agentic Engineering Means for Computational Drug Discovery

Simon Willison recently appeared on Lenny’s Podcast to discuss what he calls the November inflection point: the moment in late 2025 when frontier models crossed a threshold where agentic coding went from “mostly works if you watch carefully” to “almost always does what you asked.” His highlights post is worth reading in full, and when read through the lens of computational drug discovery, several of its themes land with unusual force.

Apr 4, 2026

Getting Started with Graph Neural Networks for Protein–Ligand Complexes Using DGL

A practical, beginner-friendly introduction to the Deep Graph Library (DGL) and how to use it to featurize protein–ligand complexes for machine learning in drug discovery.

Apr 3, 2026

Beyond 2D Fingerprints: Encoding Protein-Ligand Interactions for Machine Learning

A practical guide to three advanced 3D fingerprinting methods (PLEC, SPLIF, and E3FP) and how to choose between them when featurizing docking poses for ML-based drug discovery models.

Apr 1, 2026

Understanding Binding Energetics in Molecular Docking

Conceptual overview of the key energetic contributions governing protein–ligand binding in molecular docking, including desolvation, entropy, water displacement, electrostatics, and scoring function behavior.

Mar 13, 2026

Practical System Preparation Tips for Molecular Dynamics Simulations

Preparing a molecular system correctly before running molecular dynamics (MD) simulations is essential for obtaining meaningful and reproducible results. Small technical choices such as solvent box geometry, treatment of protein termini, and strategies for selecting representative conformations can strongly influence simulation stability, computational efficiency, and interpretation of results.

Mar 9, 2026

How to Use DataWarrior for Drug Discovery: Key Workflows From the Villoutreix Tutorials

Practical introduction to DataWarrior as a free, chemistry-aware workbench for data visualization, filtering, and focused library generation, based on Bruno Villoutreix’s tutorial series.

Nov 15, 2025

AI + Chemistry: Building Drug Discovery Pipelines with Free Tools

Step-by-step guide to constructing an open-source drug discovery pipeline with AI and chemistry tools, from data to visualization.

Jul 1, 2025