Glossary
Computational Chemistry
Ab initio calculations:
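Quantum-mechanical calculations derived from first principles, using only physical constants and the atoms' nuclear charges and positions, with no empirical parameters fitted to experimental data. Approximations, such as a finite basis set and a chosen level of theory, are still required.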
Density functional theory (DFT):
A quantum-mechanical method that expresses the energy of a system as a functional of its electron density, rather than of the many-electron wave function, and uses it to calculate properties such as energy, structure, and reactivity.
Force field:
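A mathematical model, consisting of energy functions and their parameters, that describes the potential energy of a system of atoms or molecules as a function of their positions (e.g., bond stretching, angle bending, torsion, and non-bonded terms).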
Molecular mechanics:
A method of calculating the energy and forces of a molecular system with classical mechanics and a force field, treating atoms as particles without modeling electrons explicitly.
Quantum mechanics (QM):
A theory that describes the behavior of particles at the atomic and subatomic level, and is used to calculate properties such as energy and structure in chemical systems.
Molecular dynamics (MD):
A computational method that simulates the motion of atoms and molecules over time by numerically integrating Newton's equations of motion, typically with forces computed from a force field.
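To show what "numerically integrating the equations of motion" involves, here is a minimal sketch of the velocity Verlet integrator, one common MD integrator. It assumes a single particle in one dimension on a hypothetical harmonic spring in reduced units, rather than a real force field; the near-constant total energy is the usual sanity check.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion for one particle in 1D."""
    f = force(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt**2   # position update
        f_new = force(x)                            # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt       # velocity update with averaged force
        f = f_new
    return x, v

# Hypothetical toy system: harmonic spring F = -k * x, reduced units
k, m = 1.0, 1.0
x, v = velocity_verlet(1.0, 0.0, lambda pos: -k * pos, m, dt=0.01, n_steps=1000)
energy = 0.5 * m * v**2 + 0.5 * k * x**2   # kinetic + potential energy
print(f"x={x:.4f}, v={v:.4f}, total energy={energy:.6f}")
```

In a real MD engine the force function would sum force-field terms over all atoms, and the time step is limited by the fastest vibrations in the system.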
Monte Carlo (MC) simulations:
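A computational method that generates configurations of a system by random sampling (e.g., random trial moves accepted or rejected with the Metropolis criterion) and averages over them to estimate properties such as equilibrium energies.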
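A minimal sketch of Metropolis Monte Carlo, assuming a hypothetical one-dimensional harmonic potential in reduced units. For this potential the exact equilibrium average is <x^2> = kT/k, so the sampled average can be checked against it.

```python
import math
import random

def metropolis(energy, x0, step, kT, n_steps, seed=0):
    """Metropolis Monte Carlo sampling of a single 1D coordinate."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step, step)          # random trial move
        e_trial = energy(x_trial)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = x_trial, e_trial
            accepted += 1
        samples.append(x)                                # a rejected move repeats the old state
    return samples, accepted / n_steps

# Hypothetical harmonic potential E = 0.5 * k * x^2 (reduced units)
k, kT = 1.0, 1.0
samples, acceptance = metropolis(lambda x: 0.5 * k * x**2, 0.0, step=1.0, kT=kT, n_steps=200_000)
mean_x2 = sum(s * s for s in samples) / len(samples)
print(f"<x^2> = {mean_x2:.3f} (exact kT/k = {kT / k}), acceptance = {acceptance:.2f}")
```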
Basis set:
A set of mathematical functions (basis functions) that are combined linearly to represent the molecular orbitals, and hence the electronic wave function, in a quantum mechanical calculation.
Gaussian function:
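A function of the form exp(-alpha * r^2), possibly multiplied by a polynomial in the coordinates. Gaussian-type orbitals are widely used as basis functions in quantum mechanical calculations because the integrals between them can be evaluated analytically.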
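As an example of such an analytic integral, the sketch below computes the overlap of two normalized s-type Gaussians, (2*alpha/pi)^(3/4) * exp(-alpha*|r - A|^2), using the Gaussian product theorem. The exponents and centers are hypothetical inputs in atomic units (bohr^-2 and bohr).

```python
import math

def s_gaussian_overlap(alpha, beta, center_a, center_b):
    """Overlap integral of two normalized s-type Gaussian primitives."""
    r2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))   # |A - B|^2
    p = alpha + beta
    # Gaussian product theorem: the product of two Gaussians is another Gaussian,
    # so the integral reduces to a closed-form prefactor times an exponential.
    prefactor = (4.0 * alpha * beta / p**2) ** 0.75
    return prefactor * math.exp(-alpha * beta / p * r2)

print(s_gaussian_overlap(0.5, 0.5, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))   # identical functions
print(s_gaussian_overlap(0.5, 1.2, (0.0, 0.0, 0.0), (0.0, 0.0, 1.4)))
```

The overlap equals 1 for two identical normalized functions and decays as the centers move apart.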
Energy minimization:
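A computational method that iteratively adjusts the positions of the atoms, typically following the energy gradient, until the energy reaches a minimum. Standard minimizers find the nearest local minimum, which is not necessarily the lowest-energy (global) configuration.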
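A minimal sketch of steepest-descent minimization, assuming a hypothetical two-atom system whose only degree of freedom is the interatomic distance r, interacting through a 12-6 Lennard-Jones potential in reduced units. The analytic minimum lies at r = 2^(1/6) * sigma with energy -epsilon, which makes the result easy to check.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6**2 + 6.0 * sr6) / r

def steepest_descent(r, step=0.01, tol=1e-8, max_iter=100_000):
    """Move downhill along -dE/dr until the gradient is (nearly) zero."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_min = steepest_descent(1.5)
print(f"r_min = {r_min:.6f}, E = {lj_energy(r_min):.6f}")
```

Production minimizers use many coordinates and faster methods (e.g., conjugate gradient or quasi-Newton), but the stopping criterion, a vanishing gradient, is the same idea, and it is why only a local minimum is guaranteed.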
Ab initio molecular dynamics:
A type of molecular dynamics simulation in which the forces are computed on the fly from ab initio (often DFT) electronic-structure calculations instead of a force field.
Hybrid method:
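A computational method that combines two or more levels of theory, such as a quantum-mechanical method (e.g., DFT) for a chemically active region and molecular mechanics for its surroundings (QM/MM), to balance accuracy against computational cost.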
Electrostatic potential:
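The potential energy per unit positive charge at a point in space, created by the charges of a system (nuclei and electrons, or atomic partial charges); it is a scalar, distinct from the electric field, and is used to describe and visualize how molecules interact electrostatically.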
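When a molecule is approximated by atomic partial charges, the electrostatic potential at a point is the sum of Coulomb terms q_i / |r - r_i|. The minimal sketch below works in atomic units (charges in elementary charges, distances in bohr, so the Coulomb constant is 1) and uses a hypothetical two-charge dipole; it is undefined at the position of a charge itself.

```python
import math

def electrostatic_potential(point, charges):
    """Electrostatic potential at `point` from point charges, in atomic units.

    charges: list of (q, (x, y, z)) with q in elementary charges and positions
    in bohr; the result is in hartree per elementary charge.
    """
    v = 0.0
    for q, pos in charges:
        r = math.dist(point, pos)   # distance from the charge to the point
        v += q / r                  # Coulomb term q / r (Coulomb constant = 1 in atomic units)
    return v

# Hypothetical dipole: +0.5 and -0.5 partial charges 2 bohr apart along z
charges = [(0.5, (0.0, 0.0, 1.0)), (-0.5, (0.0, 0.0, -1.0))]
print(electrostatic_potential((0.0, 0.0, 3.0), charges))   # 0.5/2 - 0.5/4 = 0.125
```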
Non-bonded interaction:
Interactions between atoms or molecules that are not directly bonded to each other, such as van der Waals forces or electrostatic interactions.
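The sketch below sums van der Waals (12-6 Lennard-Jones) and Coulomb energies over all atom pairs while skipping directly bonded pairs. It is a simplified illustration in reduced units (Coulomb constant = 1), with hypothetical parameters and Lorentz-Berthelot combining rules; real force fields also exclude or scale 1-3 and 1-4 pairs and usually apply cutoffs.

```python
import itertools
import math

def nonbonded_energy(coords, charges, lj_params, bonds):
    """Sum Lennard-Jones + Coulomb energies over atom pairs that are not bonded.

    lj_params[i] = (epsilon_i, sigma_i). Only directly bonded (1-2) pairs are
    excluded here, which is a simplification.
    """
    bonded = {frozenset(b) for b in bonds}
    energy = 0.0
    for i, j in itertools.combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded:
            continue                                         # bonded pair: covered by bond terms
        r = math.dist(coords[i], coords[j])
        eps = math.sqrt(lj_params[i][0] * lj_params[j][0])   # geometric mean (Berthelot)
        sig = 0.5 * (lj_params[i][1] + lj_params[j][1])      # arithmetic mean (Lorentz)
        sr6 = (sig / r) ** 6
        energy += 4.0 * eps * (sr6**2 - sr6)                 # van der Waals term
        energy += charges[i] * charges[j] / r                # electrostatic term
    return energy

# Hypothetical three-atom system in which atoms 0 and 1 are bonded
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(nonbonded_energy(coords, [0.4, -0.4, 0.2], [(1.0, 1.0)] * 3, bonds=[(0, 1)]))
```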
Docking:
A computational method that predicts the binding mode (the orientation and conformation) of a small molecule in a target protein's binding site and estimates its binding affinity with a scoring function.
Ligand:
A small molecule that binds to a target protein and modulates its activity.
Receptor:
A protein that binds to a ligand and mediates its biological activity.
Molecular dynamics simulations:
See Molecular dynamics (MD).
Virtual screening:
A computational method that uses molecular docking, pharmacophore matching, similarity searching, or other techniques to rank a large database of compounds and identify potential drug candidates for experimental testing.
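One of the simplest virtual-screening strategies is ligand-based similarity searching: rank a library by fingerprint similarity to a known active. The sketch below represents fingerprints as sets of "on" bit indices and uses the Tanimoto coefficient; the fingerprints and compound names are hypothetical, and in practice they would be generated by a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0                     # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def rank_by_similarity(query_fp, library):
    """Rank library compounds (name -> fingerprint) by similarity to a query active."""
    scores = {name: tanimoto(query_fp, fp) for name, fp in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints
query = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 5, 8},
    "cmpd_C": {1, 4, 7, 9, 12, 15},
}
for name, score in rank_by_similarity(query, library):
    print(f"{name}: {score:.2f}")
```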
Pharmacophore:
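The set of steric and electronic features (e.g., hydrogen-bond donors and acceptors, aromatic rings, charged groups), together with their spatial arrangement, that a ligand needs in order to bind to a receptor and produce (or block) its biological response.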
Quantitative structure-activity relationship (QSAR):
A computational method that predicts the activity of a compound from descriptors of its chemical structure, using a statistical model trained on compounds whose activity has already been measured.
Homology modeling:
A computational method that predicts the three-dimensional structure of a protein from its amino acid sequence, using one or more related proteins of known structure (templates) identified by sequence similarity.
Fragment-based drug design:
A method of drug design that identifies small, typically weakly binding fragments that bind to a target protein and then grows, links, or merges them into larger, higher-affinity molecules.
Lead optimization:
The process of modifying a lead compound to improve its potency, selectivity, pharmacokinetic properties, or other desirable characteristics.
Drug-likeness:
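A qualitative measure of how closely a compound's physicochemical properties (e.g., molecular weight, lipophilicity, numbers of hydrogen-bond donors and acceptors) resemble those of approved drugs, used to evaluate its potential to become a drug; it is often assessed with rules of thumb such as Lipinski's rule of five.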
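A minimal sketch of a Lipinski rule-of-five filter. It assumes the four descriptors have already been computed (for example with a cheminformatics toolkit); the numbers in the example calls are hypothetical. In Lipinski's formulation, poor oral absorption becomes more likely when more than one criterion is violated.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of five."""
    rules = [
        mol_weight > 500,    # molecular weight above 500 Da
        logp > 5,            # calculated logP above 5
        h_donors > 5,        # more than 5 hydrogen-bond donors
        h_acceptors > 10,    # more than 10 hydrogen-bond acceptors
    ]
    return sum(rules)

def passes_rule_of_five(mol_weight, logp, h_donors, h_acceptors):
    """Allow at most one violation, following the original rule of thumb."""
    return lipinski_violations(mol_weight, logp, h_donors, h_acceptors) <= 1

# Hypothetical descriptor values for two compounds
print(passes_rule_of_five(180.2, 1.2, 1, 4))
print(passes_rule_of_five(650.0, 6.1, 2, 9))
```

Such rules are filters for prioritization, not hard limits; many approved drugs (e.g., some natural products) fall outside them.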
Target validation:
The process of demonstrating that a target protein is biologically relevant and is a suitable target for drug discovery.
Machine Learning for Cheminformatics
Algorithm:
A step-by-step procedure for learning a model from data and using it to make predictions (e.g., Random Forest, deep neural networks).
Bias:
Systematic error in predictions, e.g., a QSAR model favoring certain molecular scaffolds.
Cross-validation:
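Repeatedly splitting the data into training and validation subsets (e.g., k-fold) and averaging performance across the splits to obtain a more reliable estimate of how well a model generalizes.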
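The k-fold scheme can be sketched in a few lines: shuffle the sample indices once, cut them into k disjoint folds, and let each fold serve once as the validation set. This is a generic illustration using random splits; for molecular data, scaffold-based splits are a common alternative because random splits can overestimate performance on structurally novel compounds.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k disjoint, roughly equal folds
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, val_idx

splits = list(k_fold_indices(10, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), val_idx)
```

A model would be trained on each `train_idx`, scored on the matching `val_idx`, and the k scores averaged.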
Data augmentation:
Artificially increasing the size or diversity of a dataset, e.g., by generating multiple conformers or alternative (randomized) SMILES strings for each molecule.
Decision boundary:
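The surface in descriptor space that separates the predicted classes (e.g., active vs. inactive compounds) in a classification model; for a model that outputs a score, it corresponds to the chosen score threshold.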
Epoch:
One full pass through the training data in deep learning.
Feature engineering:
Creating relevant molecular descriptors (e.g., logP, TPSA, fingerprints).
Feature selection:
Choosing the most informative descriptors for QSAR models.
Generalization:
A model’s ability to predict new molecules, not just training data.
Gradient descent:
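Iterative optimization method that updates model weights in the direction of the negative gradient of the loss function, scaled by a learning rate; widely used to train deep learning models.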
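A minimal sketch of gradient descent, fitting a hypothetical one-descriptor linear model y = w * x + b by minimizing the mean squared error. Deep learning frameworks compute the gradients automatically and usually use mini-batches, but the update rule is the same: subtract the learning rate times the gradient.

```python
def fit_linear_gd(xs, ys, learning_rate=0.05, n_epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):                     # each iteration uses all data: one epoch
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]   # residuals
        # Gradients of MSE = (1/n) * sum(errors^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w               # step against the gradient
        b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear_gd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

If the learning rate is too large the updates overshoot and diverge; too small and convergence is slow, which is why it is a key hyperparameter.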
Hyperparameter tuning:
Optimizing settings that are not learned during training, such as the learning rate, the number of layers, or the number of trees in a Random Forest.
Interpretability:
Understanding why a model predicts a molecule as active/inactive (e.g., SHAP, LIME).
Loss function:
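A function, minimized during training, that measures how far predictions are from true values, e.g., mean squared error (MSE) or RMSE in regression, cross-entropy in classification.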
Model overfitting:
See Overfitting.
Neural networks:
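ML models built from layers of interconnected units, loosely inspired by biological neurons; in cheminformatics they are used for property prediction and de novo drug design.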
One-hot encoding:
Converting categorical data, such as SMILES tokens, into binary vectors that contain a single 1 at the position of the category.
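A minimal sketch of one-hot encoding a SMILES string. The tokenizer is a simplified assumption, not a full SMILES grammar: bracket atoms such as [nH] and the two-letter halogens Cl and Br are kept as single tokens, and every other character is its own token (multi-digit ring closures like %10 are not handled).

```python
import re

# Simplified tokenizer (an assumption, not a complete SMILES grammar)
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return TOKEN_PATTERN.findall(smiles)

def one_hot(tokens, vocabulary):
    """Encode each token as a binary vector with a single 1 at the token's index."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    return [[1 if index[tok] == i else 0 for i in range(len(vocabulary))] for tok in tokens]

tokens = tokenize("CC(=O)Cl")          # acetyl chloride
vocab = sorted(set(tokens))
matrix = one_hot(tokens, vocab)
for tok, row in zip(tokens, matrix):
    print(tok, row)
```

In practice the vocabulary is built from the whole training set, so that every molecule is encoded with the same vector length.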
Overfitting:
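When a model fits noise and idiosyncrasies of the training data (effectively memorizing it) and therefore generalizes poorly to new molecules; it typically shows up as much better performance on training data than on validation or test data.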
Precision & recall:
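Precision is the fraction of predicted actives that are truly active; recall is the fraction of true actives that the model recovers. Both are used to evaluate models predicting bioactivity (e.g., in hit discovery).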
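Both metrics follow directly from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP) and recall = TP / (TP + FN). The labels below are a hypothetical screening outcome, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = active, 0 = inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical outcome for 8 compounds: 4 true actives, 3 predicted actives
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Here 2 of the 3 predicted actives are real (precision 0.67) and 2 of the 4 actives are found (recall 0.50).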
QSAR (Quantitative Structure-Activity Relationship):
ML models predicting molecular bioactivity from descriptors.
Regularization:
Techniques such as L1 or L2 penalties on model weights that discourage overly complex models and so reduce overfitting in QSAR models.
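A minimal illustration of L2 (ridge) regularization for a hypothetical one-descriptor model y = w * x without intercept. Minimizing sum((w*x - y)^2) + lambda * w^2 and setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + lambda), so a larger lambda shrinks the weight toward zero.

```python
def ridge_slope(xs, ys, lam):
    """Slope of y ~ w * x minimizing sum((w*x - y)^2) + lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)   # closed form; lam = 0 gives ordinary least squares

# Hypothetical data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
for lam in (0.0, 1.0, 14.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With many descriptors the same penalty is applied to every weight; an L1 penalty can instead drive some weights exactly to zero, which also acts as feature selection.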
ROC-AUC:
Area under the receiver operating characteristic curve; measures a model’s ability to rank actives above inactives (1.0 = perfect separation, 0.5 = random ranking).
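ROC-AUC equals the probability that a randomly chosen active receives a higher score than a randomly chosen inactive (ties counted as one half). The sketch below computes it directly from that definition by comparing all active/inactive pairs, which is simple but quadratic in cost; the labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (active, inactive) pairs ranked correctly."""
    actives = [s for s, y in zip(scores, y_true) if y == 1]
    inactives = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0    # active correctly scored above inactive
            elif a == i:
                wins += 0.5    # ties count as half
    return wins / (len(actives) * len(inactives))

# Hypothetical model scores for 2 actives and 2 inactives
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Here 3 of the 4 active/inactive pairs are ordered correctly, giving an AUC of 0.75.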
SMILES (Simplified Molecular Input Line Entry System):
A line notation that encodes a molecule's structure as a text string (e.g., CCO for ethanol).
Supervised learning:
Training models with labeled data (e.g., binding affinity prediction).
Unsupervised learning:
Finding patterns in unlabeled molecular datasets (e.g., clustering).
Validation set:
Data held out from training and used to tune hyperparameters and select models, before the final evaluation on a separate test set.