Glossary

Computational Chemistry

Ab initio calculations:

Calculations that are performed using basic principles of quantum mechanics, without any experimental data or assumptions.

Density functional theory (DFT):

A computational method that uses the electronic density of a system to calculate properties such as energy, structure, and reactivity.

Force field:

A mathematical model used to describe the interactions between atoms or molecules in a system.

Molecular mechanics:

A method of calculating the energy and forces in a molecular system based on classical mechanics.

Quantum mechanics (QM):

A theory that describes the behavior of particles at the atomic and subatomic level, and is used to calculate properties such as energy and structure in chemical systems.

Molecular dynamics (MD):

A computational method that simulates the motion of atoms and molecules over time, based on classical mechanics and force field calculations.

Monte Carlo (MC) simulations:

A computational method that uses statistical sampling to simulate the behavior of a system, based on random input parameters.

Basis set:

A set of mathematical functions used to describe the wave function of electrons in a quantum mechanical calculation.

Gaussian function:

A mathematical function that describes the shape of a wave function in a quantum mechanical calculation.

Energy minimization:

A computational method used to find the lowest-energy configuration of a system, by adjusting the positions of the atoms or molecules until the energy is minimized.

Ab initio molecular dynamics:

A type of molecular dynamics simulation that uses ab initio calculations to describe the behavior of a system.

Hybrid method:

A computational method that combines two or more types of calculations, such as DFT and molecular mechanics, to improve accuracy.

Electrostatic potential:

The electric field generated by a charged particle, which can be used to describe the interactions between molecules in a system.

Non-bonded interaction:

Interactions between atoms or molecules that are not directly bonded to each other, such as van der Waals forces or electrostatic interactions.

Docking:

A computational method that predicts the binding mode and affinity of a small molecule to a target protein.

Ligand:

A small molecule that binds to a target protein and modulates its activity.

Receptor:

A protein that binds to a ligand and mediates its biological activity.

Molecular dynamics simulations:

A computational method that simulates the motion of atoms and molecules over time to predict the behavior of a system.

Virtual screening:

A computational method that uses molecular docking or other techniques to identify potential drug candidates from a large database of compounds.

Pharmacophore:

A set of chemical features that are necessary for a ligand to bind to a receptor and produce a biological response.

Quantitative structure-activity relationship (QSAR):

A computational method that predicts the activity of a compound based on its chemical structure and the activity of similar compounds.

Homology modeling:

A computational method that predicts the structure of a protein based on its sequence similarity to a known protein structure.

Fragment-based drug design:

A method of drug design that involves the identification of small fragments that bind to a target protein and the subsequent assembly of these fragments into a larger molecule.

Lead optimization:

The process of modifying a lead compound to improve its potency, selectivity, pharmacokinetic properties, or other desirable characteristics.

Drug-likeness:

A set of physicochemical properties that are commonly found in approved drugs, used to evaluate the potential of a compound to become a drug.

Target validation:

The process of demonstrating that a target protein is biologically relevant and is a suitable target for drug discovery.

Machine Learning for Cheminformatics

Algorithm:

A set of rules a model follows to make predictions (e.g., Random Forest, Deep Neural Networks).

Bias:

Systematic error in predictions, e.g., a QSAR model favoring certain molecular scaffolds.

Cross-validation:

Splitting data multiple times to ensure robust model performance.

Data augmentation:

Artificially increasing dataset size, e.g., generating molecular conformers.

Decision boundary:

A threshold separating active vs. inactive compounds in classification models.

Epoch:

One full pass through the training data in deep learning.

Feature engineering:

Creating relevant molecular descriptors (e.g., logP, TPSA, fingerprints).

Feature selection:

Choosing the most informative descriptors for QSAR models.

Generalization:

A model’s ability to predict new molecules, not just training data.

Gradient descent:

Optimization method to update model weights in deep learning.

Hyperparameter tuning:

Optimizing settings like learning rate, number of layers, or trees in Random Forest.

Interpretability:

Understanding why a model predicts a molecule as active/inactive (e.g., SHAP, LIME).

Loss function:

Measures how far predictions are from true values, e.g., RMSE in regression.

Model overfitting:

When a model memorizes training data but fails on new molecules.

Neural networks:

ML models inspired by the brain, used in de novo drug design.

One-hot encoding:

Converting categorical data like SMILES tokens into binary form.

Overfitting:

When a model is too specific to training data and lacks generalization.

Precision & recall:

Metrics to evaluate models predicting bioactivity (e.g., hit discovery).

QSAR (Quantitative Structure-Activity Relationship):

ML models predicting molecular bioactivity from descriptors.

Regularization:

Methods like L1/L2 to prevent overfitting in QSAR models.

ROC-AUC:

Metric evaluating a model’s ability to separate actives/inactives.

SMILES (Simplified Molecular Input Line Entry System):

Text-based molecular representation.

Supervised learning:

Training models with labeled data (e.g., binding affinity prediction).

Unsupervised learning:

Finding patterns in unlabeled molecular datasets (e.g., clustering).

Validation set:

Data used to tune the model before final testing.

Yassir Boulaamane