Step-by-Step MD Simulation of a Protein–Ligand Complex with GROMACS

Apr 4, 2025 · 5 min read

Introduction

This post walks through the complete setup of a protein–ligand MD simulation with GROMACS.
We use CHARMM36 for proteins, CGenFF for ligands, and the TIP3P water model.
Each step is explained briefly before showing the corresponding command.


Step 0: PDBFixer

We begin by repairing the input PDB file (adding missing atoms, hydrogens, etc.). This ensures the protein structure is suitable for simulations.

conda activate pdbfixer
pdbfixer protein.pdb --output=protein_fixed.pdb

Step 1: Extract and clean the protein

Next, we separate the ligand (UNK or cofactors like FAD) and clean the protein file.
This avoids force field issues caused by unknown residues.

grep UNK protein.pdb > unk.pdb
grep -v "UNK" protein.pdb > clean.pdb

grep -v -e "UNK" -e "FAD" protein_fixed.pdb > clean.pdb

conda activate pdbfixer
pdbfixer clean.pdb --output=clean.pdb

Step 2: Generate protein topology

We use pdb2gmx to generate the topology of the clean protein with CHARMM36.
Here we also choose protonation states interactively.

gmx pdb2gmx -f clean.pdb -o processed.gro -ter -ignh << EOF
1
1
0
0
0
0
EOF

Step 3: Convert ligand to MOL2 format

The ligand is converted to .mol2 format with hydrogens added and atom typing corrected.

obabel unk.pdb -O unk.mol2 --addh --gen3d
sed -i '2s/.*/UNK/; s/UNK1/UNK/g' unk.mol2

Step 4: Sort ligand bonds

We ensure the bond ordering in the MOL2 file is consistent using a Perl script.

perl sort_mol2_bonds.pl unk.mol2 unk_fix.mol2

Step 5: Parameterize the ligand with CGenFF

The ligand is parameterized using CGenFF, and converted to a GROMACS-compatible format.

~/mdrun/silcsbio.2024.1/cgenff/cgenff unk_fix.mol2 -f unk.str
chmod +x cgenff_charmm2gmx.py

conda activate cgenff
python cgenff_charmm2gmx.py UNK unk_fix.mol2 unk.str charmm36-jul2022.ff

rm cgenff_charmm2gmx.py

Step 6: Convert ligand/cofactor to GROMACS format

We now generate .gro files for both the ligand and cofactor (if present).

gmx editconf -f unk_ini.pdb -o unk.gro
gmx editconf -f fad_ini.pdb -o fad.gro

Step 7: Merge protein and ligands

We merge the protein, ligand, and optional cofactors into one structure file.
Atom counts are updated to reflect the combined system.

cp processed.gro complex.gro && sed -i '$d' complex.gro && tail -n +3 unk.gro >> complex.gro && echo "   " >> complex.gro

num1=$(sed -n '2p' complex.gro)
num2=$(sed -n '2p' unk.gro)
sum=$((num1 + num2))
sed -i "2s/.*/ $sum/" complex.gro

For protein + FAD + UNK:

cp processed.gro complex.gro && \
sed -i '$d' complex.gro && \
tail -n +3 fad.gro >> complex.gro && \
tail -n +3 unk.gro >> complex.gro && \
echo "   " >> complex.gro

num1=$(sed -n '2p' processed.gro)
num2=$(sed -n '2p' fad.gro)
num3=$(sed -n '2p' unk.gro)
sum=$((num1 + num2 + num3))
sed -i "2s/.*/ $sum/" complex.gro

Step 8: Update topology file

We edit topol.top to include the ligand topologies and parameters.

sed -i '/; Include Position restraint file/{:a;N;/#endif/!ba;s/\n#endif\n/\n#endif/;s/#endif/#endif\n\n; Include ligand topology\n#include "unk.itp"/}' topol.top

sed -i '/; Include forcefield parameters/{:a;N;/#include ".\/charmm36-jul2022.ff\/forcefield.itp"/!ba;s/#include ".\/charmm36-jul2022.ff\/forcefield.itp"/&\n\n; Include ligand parameters\n#include "unk.prm"/}' topol.top

sed -i '/\[ molecules \]/,/Protein_chain_A/ {/Protein_chain_A/ a\\nUNK                 1\n}' topol.top

For FAD + UNK:

sed -i '/; Include Position restraint file/{:a;N;/#endif/!ba;s/#endif/#endif\n\n; Include ligand topologies\n#include "fad.itp"\n#include "unk.itp"/}' topol.top

sed -i '/; Include forcefield parameters/{:a;N;/#include ".\/charmm36-jul2022.ff\/forcefield.itp"/!ba;s|#include ".\/charmm36-jul2022.ff\/forcefield.itp"|&\n\n; Include ligand parameters\n#include "fad.prm"\n#include "unk.prm"|}' topol.top

sed -i '/\[ molecules \]/,/Protein_chain_A/ {/Protein_chain_A/ a\\nFAD                 1\\nUNK                 1\n}' topol.top

Step 9: Define box and solvate

We place the system in a cubic box with a 1.4 nm buffer and solvate with TIP3P water.

gmx editconf -f complex.gro -o newbox.gro -bt cubic -c -d 1.4
gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro

Step 10: Add ions

We neutralize the system and add physiological salt concentration (0.15 M NaCl).

cat <<EOF > ions.mdp
integrator     = steep
emtol          = 1000.0
nsteps         = 50000
cutoff-scheme  = Verlet
coulombtype    = PME
rcoulomb       = 1.0
rvdw           = 1.0
constraints    = none
EOF

gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr -maxwarn 3
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral -conc 0.15 <<EOF
15
EOF

Step 11: Energy minimization

We relax steric clashes by running a steepest descent minimization.

gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em

Step 12: Generate ligand restraints

Ligand restraints are generated for equilibration to avoid large displacements.

gmx make_ndx -f unk.gro -o index_unk.ndx << EOF
0 & ! a H*
q
EOF

gmx genrestr -f unk.gro -n index_unk.ndx -o posre_unk.itp -fc 1000 1000 1000 <<EOF
3
EOF

sed -i '/; Include water topology/{x;s/.*/\n; Ligand position restraints\n#ifdef POSRES\n#include "posre_unk.itp"\n#endif\n/;G}' topol.top

Step 13: Create index for complex

We define a custom index file including protein and ligand groups.

gmx make_ndx -f solv_ions.gro -o index.ndx << EOF
1 | 13
q
EOF

Step 14: NVT equilibration

We equilibrate temperature at 300 K with position restraints.

gmx grompp -f nvt.mdp -c solv_ions.gro -r solv_ions.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt

Step 15: NPT equilibration

We equilibrate pressure at 1 bar using the Parrinello-Rahman barostat.

gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt

Step 16: Production MD

Finally, we run a 100 ns production simulation in the NPT ensemble.

gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_100.tpr
gmx mdrun -deffnm md_0_100

Notes

  • Always fix PBC before analysis.
  • Discard the first 5–10 ns of equilibration before data collection.
  • Stable RMSD, Rg, SASA values indicate good simulation behavior.
  • Use PCA/FEL to explore large-scale motions and free energy landscapes.