Step-by-Step MD Simulation of a Protein–Ligand Complex with GROMACS
Introduction
This post walks through the complete setup of a protein–ligand MD simulation with GROMACS.
We use CHARMM36 for proteins, CGenFF for ligands, and the TIP3P water model.
Each step is explained briefly before showing the corresponding command.
Step 0: PDBFixer
We begin by repairing the input PDB file (adding missing atoms, hydrogens, etc.). This ensures the protein structure is suitable for simulations.
conda activate pdbfixer
pdbfixer protein.pdb --output=protein_fixed.pdb
Step 1: Extract and clean the protein
Next, we separate the ligand (UNK or cofactors like FAD) and clean the protein file.
This avoids force field issues caused by unknown residues.
grep UNK protein.pdb > unk.pdb
grep -v "UNK" protein.pdb > clean.pdb
grep -v -e "UNK" -e "FAD" protein_fixed.pdb > clean.pdb
conda activate pdbfixer
pdbfixer clean.pdb --output=clean.pdb
Step 2: Generate protein topology
We use pdb2gmx
to generate the topology of the clean protein with CHARMM36.
Here we also choose protonation states interactively.
gmx pdb2gmx -f clean.pdb -o processed.gro -ter -ignh << EOF
1
1
0
0
0
0
EOF
Step 3: Convert ligand to MOL2 format
The ligand is converted to .mol2
format with hydrogens added and atom typing corrected.
obabel unk.pdb -O unk.mol2 --addh --gen3d
sed -i '2s/.*/UNK/; s/UNK1/UNK/g' unk.mol2
Step 4: Sort ligand bonds
We ensure the bond ordering in the MOL2 file is consistent using a Perl script.
perl sort_mol2_bonds.pl unk.mol2 unk_fix.mol2
Step 5: Parameterize the ligand with CGenFF
The ligand is parameterized using CGenFF, and converted to a GROMACS-compatible format.
~/mdrun/silcsbio.2024.1/cgenff/cgenff unk_fix.mol2 -f unk.str
chmod +x cgenff_charmm2gmx.py
conda activate cgenff
python cgenff_charmm2gmx.py UNK unk_fix.mol2 unk.str charmm36-jul2022.ff
rm cgenff_charmm2gmx.py
Step 6: Convert ligand/cofactor to GROMACS format
We now generate .gro
files for both the ligand and cofactor (if present).
gmx editconf -f unk_ini.pdb -o unk.gro
gmx editconf -f fad_ini.pdb -o fad.gro
Step 7: Merge protein and ligands
We merge the protein, ligand, and optional cofactors into one structure file.
Atom counts are updated to reflect the combined system.
cp processed.gro complex.gro && sed -i '$d' complex.gro && tail -n +3 unk.gro >> complex.gro && echo " " >> complex.gro
num1=$(sed -n '2p' complex.gro)
num2=$(sed -n '2p' unk.gro)
sum=$((num1 + num2))
sed -i "2s/.*/ $sum/" complex.gro
For protein + FAD + UNK:
cp processed.gro complex.gro && \
sed -i '$d' complex.gro && \
tail -n +3 fad.gro >> complex.gro && \
tail -n +3 unk.gro >> complex.gro && \
echo " " >> complex.gro
num1=$(sed -n '2p' processed.gro)
num2=$(sed -n '2p' fad.gro)
num3=$(sed -n '2p' unk.gro)
sum=$((num1 + num2 + num3))
sed -i "2s/.*/ $sum/" complex.gro
Step 8: Update topology file
We edit topol.top
to include the ligand topologies and parameters.
sed -i '/; Include Position restraint file/{:a;N;/#endif/!ba;s/\n#endif\n/\n#endif/;s/#endif/#endif\n\n; Include ligand topology\n#include "unk.itp"/}' topol.top
sed -i '/; Include forcefield parameters/{:a;N;/#include ".\/charmm36-jul2022.ff\/forcefield.itp"/!ba;s/#include ".\/charmm36-jul2022.ff\/forcefield.itp"/&\n\n; Include ligand parameters\n#include "unk.prm"/}' topol.top
sed -i '/\[ molecules \]/,/Protein_chain_A/ {/Protein_chain_A/ a\\nUNK 1\n}' topol.top
For FAD + UNK:
sed -i '/; Include Position restraint file/{:a;N;/#endif/!ba;s/#endif/#endif\n\n; Include ligand topologies\n#include "fad.itp"\n#include "unk.itp"/}' topol.top
sed -i '/; Include forcefield parameters/{:a;N;/#include ".\/charmm36-jul2022.ff\/forcefield.itp"/!ba;s|#include ".\/charmm36-jul2022.ff\/forcefield.itp"|&\n\n; Include ligand parameters\n#include "fad.prm"\n#include "unk.prm"|}' topol.top
sed -i '/\[ molecules \]/,/Protein_chain_A/ {/Protein_chain_A/ a\\nFAD 1\\nUNK 1\n}' topol.top
Step 9: Define box and solvate
We place the system in a cubic box with a 1.4 nm buffer and solvate with TIP3P water.
gmx editconf -f complex.gro -o newbox.gro -bt cubic -c -d 1.4
gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro
Step 10: Add ions
We neutralize the system and add physiological salt concentration (0.15 M NaCl).
cat <<EOF > ions.mdp
integrator = steep
emtol = 1000.0
nsteps = 50000
cutoff-scheme = Verlet
coulombtype = PME
rcoulomb = 1.0
rvdw = 1.0
constraints = none
EOF
gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr -maxwarn 3
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral -conc 0.15 <<EOF
15
EOF
Step 11: Energy minimization
We relax steric clashes by running a steepest descent minimization.
gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em
Step 12: Generate ligand restraints
Ligand restraints are generated for equilibration to avoid large displacements.
gmx make_ndx -f unk.gro -o index_unk.ndx << EOF
0 & ! a H*
q
EOF
gmx genrestr -f unk.gro -n index_unk.ndx -o posre_unk.itp -fc 1000 1000 1000 <<EOF
3
EOF
sed -i '/; Include water topology/{x;s/.*/\n; Ligand position restraints\n#ifdef POSRES\n#include "posre_unk.itp"\n#endif\n/;G}' topol.top
Step 13: Create index for complex
We define a custom index file including protein and ligand groups.
gmx make_ndx -f solv_ions.gro -o index.ndx << EOF
1 | 13
q
EOF
Step 14: NVT equilibration
We equilibrate temperature at 300 K with position restraints.
gmx grompp -f nvt.mdp -c solv_ions.gro -r solv_ions.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt
Step 15: NPT equilibration
We equilibrate pressure at 1 bar using the Parrinello-Rahman barostat.
gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt
Step 16: Production MD
Finally, we run a 100 ns production simulation in the NPT ensemble.
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_100.tpr
gmx mdrun -deffnm md_0_100
Notes
- Always fix PBC before analysis.
- Discard the first 5–10 ns of equilibration before data collection.
- Stable RMSD, Rg, SASA values indicate good simulation behavior.
- Use PCA/FEL to explore large-scale motions and free energy landscapes.