How to Validate AlphaFold Structures

36 minute read

AlphaFold has revolutionized protein structure prediction by achieving unprecedented accuracy in many cases​ (1). However, even high-quality predicted models should be rigorously validated before use in research. Just as experimental structures undergo validation (geometry checks, reliability scores, etc.), AlphaFold-predicted structures require careful assessment of confidence, stereochemistry, and functional plausibility. In this comprehensive guide, we outline practical strategies to validate an AlphaFold model, from interpreting AlphaFold’s own confidence metrics to using external validation tools, comparing against known structures, and even testing the model’s behavior in silico. The goal is to ensure the predicted structure is trustworthy for drawing biological conclusions.

AlphaFold Confidence Metrics: pLDDT and PAE

AlphaFold provides intrinsic confidence measures that are the first step in validation. pLDDT (predicted Local Distance Difference Test) is a per-residue confidence score ranging from 0 to 100. Higher pLDDT means the model is more confident that the local structure around that residue is accurate​ (2). This metric is based on the lDDT-Cα score, which assesses local structural accuracy without requiring a global superposition​ (3). In practice, AlphaFold classifies residues into confidence bands:

  • pLDDT ≥ 90: Very high confidence (high accuracy for backbone and side chains)​.
  • 70 ≤ pLDDT < 90: Confident – backbone likely correct, though side chains may have some error​.
  • 50 ≤ pLDDT < 70: Low confidence – the region is less reliable, possibly mis-modeled.
  • pLDDT < 50: Very low confidence – often indicates disordered or flexible regions​, which AlphaFold expects might not adopt a single stable structure in reality.

AlphaFold typically encodes pLDDT in the B-factor field of the output PDB/mmCIF file​ (4). This means visualization programs can color the structure by B-factor to see confidence per residue. For example, in PyMOL you can load the model and run a command to color by B-factor (pLDDT):

# In PyMOL, after loading the AlphaFold model:
spectrum b, red_white_blue, minimum=0, maximum=100
# This colors residues from red (low pLDDT) to blue (high pLDDT).
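
If you prefer to tabulate the per-residue confidence rather than only color by it, a short script can read the B-factor column and count residues in each of the bands listed above. This is a minimal sketch, assuming Biopython is installed and the model file is named model.pdb (a placeholder):

# Summarize pLDDT bands from the B-factor column of an AlphaFold model
from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
structure = parser.get_structure("model", "model.pdb")  # placeholder file name

bands = {"very high (>=90)": 0, "confident (70-90)": 0,
         "low (50-70)": 0, "very low (<50)": 0}

for residue in structure.get_residues():
    if "CA" not in residue:               # skip waters/heteroatoms without a CA
        continue
    plddt = residue["CA"].get_bfactor()   # AlphaFold stores pLDDT in this field
    if plddt >= 90:
        bands["very high (>=90)"] += 1
    elif plddt >= 70:
        bands["confident (70-90)"] += 1
    elif plddt >= 50:
        bands["low (50-70)"] += 1
    else:
        bands["very low (<50)"] += 1

for band, count in bands.items():
    print(band, count)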

AlphaFold’s second key metric is the PAE (Predicted Aligned Error). PAE is a measure of the model’s confidence in the relative positions of two residues​ (5). More formally, PAE(x, y) is the expected error (in Å) at residue x’s position if the predicted structure were aligned to the (unknown) true structure on residue y​ (6). In simpler terms, PAE tells us how well the model thinks region x and region y are positioned relative to each other. A low PAE between two residues (or domains) implies the model is confident about their relative arrangement, whereas a high PAE means the relative placement is uncertain​ (7). For example, if two domains have consistently high PAE with each other, the model is essentially saying “these domains might move relative to each other, so don’t trust the exact orientation.” By contrast, low PAE between two domains suggests a well-defined domain packing​ (8).

PAE is often visualized as a heatmap (matrix) where each axis is residue index. AlphaFold’s structure database provides an interactive PAE plot alongside each model​ (9,10). In a PAE heatmap, the diagonal is always dark (zero error when a residue is aligned on itself) and can be ignored​ (10). Off-diagonal blocks indicate inter-residue or inter-domain confidence: dark green/blue regions = low error (high confidence), and light colors = high error (low confidence)​. For instance, a multi-domain protein often shows clear dark blocks along the diagonal (each domain internally high confidence) separated by lighter regions (uncertain relative domain orientation). Interpreting PAE in validation is crucial: even if each domain has high pLDDT, a high PAE between domains means you should not assume the domain arrangement is correct​ (10). In such cases, additional validation (or experimental data) is needed to confirm domain orientations.
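
Beyond eyeballing the heatmap, you can compute inter-domain PAE directly from the JSON file offered alongside each AlphaFold DB entry. Below is a rough sketch that assumes the newer JSON layout with a predicted_aligned_error matrix (older files use residue1/residue2/distance lists instead) and hypothetical domain boundaries; adjust both for your protein:

# Compute mean PAE within and between two putative domains
import json
import numpy as np

with open("pae.json") as fh:                 # placeholder file name
    data = json.load(fh)[0]                  # the file contains a one-element list

pae = np.array(data["predicted_aligned_error"])   # N x N matrix, in angstroms

dom1 = slice(0, 150)      # hypothetical domain 1: residues 1-150 (0-based slice)
dom2 = slice(150, 320)    # hypothetical domain 2: residues 151-320

print("mean PAE within domain 1:", round(pae[dom1, dom1].mean(), 2))
print("mean PAE within domain 2:", round(pae[dom2, dom2].mean(), 2))
print("mean PAE between domains:", round(pae[dom1, dom2].mean(), 2))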

In summary, check pLDDT and PAE first. These tell you which parts of the model are likely reliable and whether the overall fold (especially multi-domain architecture) is predicted with confidence. Regions with low pLDDT or very high PAE should be treated with caution – they might be disordered in reality or simply modeled inaccurately. Often, those regions may require further analysis or may be trimmed out if you only need the stable core of the structure.

Visual Inspection of the Model

After reviewing AlphaFold’s confidence metrics, the next validation step is a careful visual inspection of the model. This is a straightforward but powerful way to catch obvious issues. Tools such as PyMOL, UCSF ChimeraX, and Mol* (MolStar) are invaluable for this task:

  • PyMOL – A widely-used molecular graphics tool. Load the AlphaFold PDB, and visualize secondary structure (cartoon view) to see if helices and strands look well-formed. Color the model by pLDDT (as described above) to highlight low-confidence segments. Pay attention to any steric clashes or oddly twisted geometries. For example, check if any side chains deeply penetrate other parts of the protein (which might indicate a clash or error). PyMOL’s distance measurement can help verify if supposed disulfide bonds or salt bridges are geometrically plausible. If you have a known motif (e.g., a catalytic triad), inspect whether those residues align in the expected geometry.
  • UCSF ChimeraX – A modern visualization tool that can directly fetch AlphaFold models from the AlphaFold DB by UniProt ID, and it has built-in presets for AlphaFold confidence coloring. In ChimeraX, one can simply use the command color bfactor palette alphafold to apply the standard pLDDT color scheme (blue = high confidence, orange/red = low)​. Use ChimeraX’s structure analysis features to examine hydrogen bonds, contacts, and clashes. Chimera(X) also allows viewing the PAE map if you have it, or you can segment the model by domains to see how they interact. If the model has flexible termini (often glowing red/orange from low pLDDT), you might visualize those separately – they could be flopping around and not relevant to the stable core.
  • Mol* – This is the web-based 3D viewer used by the AlphaFold Protein Structure Database​ (11). If you obtained your model from AlphaFold DB, you can use the online Mol* viewer to instantly see the structure colored by confidence and the PAE plot side-by-side. Mol* is great for a quick look without installing software. Through Mol*, you can rotate the model, identify secondary structure elements, and even select a region on the PAE plot to highlight which parts of the 3D structure are uncertain​ (12). While not as feature-rich as desktop tools, it’s convenient for an initial overview.

During visual inspection, look for red flags such as:

  • Unphysical bond lengths or angles: AlphaFold generally produces chemically sensible structures, but extremely distorted geometries (e.g., a peptide bond angle that looks abnormal) could appear in low-confidence regions. These are rare, but if present, they’ll likely be flagged by subsequent validation tools as well.
  • Clashes: Overlapping atoms or very tightly packed regions. Our eyes might miss slight clashes, so after visual review you’ll use tools like MolProbity to quantify this. But obvious clashes (like two side chains on top of each other) can sometimes be spotted visually.
  • Chain breaks or gaps: Check that the protein backbone is continuous. AlphaFold normally models every residue in the input sequence, so it rarely breaks the chain outright; in regions where it is unsure it tends instead to produce a continuous but unphysical loop – often highly extended or tangled. If you do see a gap (missing residues), check whether that region was actually omitted from the input sequence, since in normal operation AlphaFold predicts coordinates for all residues provided.
  • Secondary structure mismatches: Does the model’s secondary structure make sense given the sequence? If you have prior knowledge (say from sequence analysis, you expect three transmembrane helices or a beta-sheet in a certain region), see if the model matches that. If AlphaFold predicts a region to be helical (and with high confidence), it likely is. But if you see something like a beta-strand contorted oddly, or a helix with a kink, note those for further scrutiny.
  • Active site or binding site conformation: If the protein’s function is known and certain residues are critical (e.g., enzyme active site), inspect that region. Are the catalytic residues oriented in a plausible way? Is there a cavity where a substrate could bind? Sometimes AlphaFold might predict a side chain orientation that blocks a known binding cleft (especially if no ligand was present during prediction). Visualizing this early can hint if the model is consistent with function or if something might be off.

Visual inspection is mostly qualitative, but it sets the stage for targeted questions. For instance, if you notice the model has an unexpected large cavity or an unusual domain orientation, you might plan specific validations (like docking a ligand into that cavity, or checking if domain orientation is supported by known data). At this stage, you should also decide if any segments of the model might be so low confidence that they should be treated separately (for example, you might exclude a floppy tail from further analysis if it’s not essential to the protein’s folded core).
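
As a small complement to the eyeball check, a quick neighbor search can list suspiciously close atom pairs and candidate disulfides before you hand the model to a full validation server. This is a rough sketch using Biopython; the cutoffs are illustrative, and it is no substitute for MolProbity's all-atom analysis:

# Quick-and-dirty contact screen with Biopython's NeighborSearch
from Bio.PDB import PDBParser, NeighborSearch

parser = PDBParser(QUIET=True)
structure = parser.get_structure("model", "model.pdb")   # placeholder file name
atoms = [a for a in structure.get_atoms() if a.element != "H"]

ns = NeighborSearch(atoms)

# Candidate disulfides: pairs of cysteine SG atoms within ~2.3 A
for a1, a2 in ns.search_all(2.3, level="A"):
    if a1.get_name() == "SG" and a2.get_name() == "SG":
        print("possible disulfide:",
              a1.get_parent().get_full_id()[2:], a2.get_parent().get_full_id()[2:])

# Very crude clash screen: heavy atoms from non-adjacent residues closer than 2.0 A
for a1, a2 in ns.search_all(2.0, level="A"):
    r1, r2 = a1.get_parent(), a2.get_parent()
    if r1 is r2:
        continue
    same_chain = r1.get_parent() is r2.get_parent()
    if same_chain and abs(r1.get_id()[1] - r2.get_id()[1]) == 1:
        continue                              # skip bonded peptide neighbors
    print("close contact:", a1.get_full_id()[2:], a2.get_full_id()[2:])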

Using Structural Validation Tools

Automated structure-validation tools provide quantitative checks on your model’s stereochemistry and overall quality. Many of these tools were originally developed for validating experimentally determined structures, but they are equally applicable to predicted models. Here are some key validation tools and what they offer:

  • MolProbity – A comprehensive all-atom structure validation server (13). MolProbity checks for steric clashes by adding hydrogens to the structure and finding all-atom overlaps. It also evaluates bond angles and dihedrals, highlighting Ramachandran outliers (phi/psi angles that fall in disallowed regions of the Ramachandran plot) and rotamer outliers (unusual side chain conformations). MolProbity provides an overall score (the “MolProbity score”) and specific metrics like clashscore (number of serious clashes per 1000 atoms), percentage of Ramachandran favored/allowed/outlier residues, and more ​(14,15). A good AlphaFold model should have low clashscore and few (if any) Ramachandran outliers. However, it’s not uncommon for some low-pLDDT loops in AlphaFold models to show outliers or clashes, since the model wasn’t physics-refined. MolProbity can guide you to those problematic spots; you can then consider refining them (e.g., via energy minimization or rebuilding). Tip: If MolProbity reports many clashes or poor geometry in a high-pLDDT region, that’s a red flag – the model might be locally strained, and further investigation is needed.
  • PROCHECK – A classic tool (from 1993) that checks the stereochemical quality of a protein structure​ (16). PROCHECK generates a number of outputs, the most famous being the Ramachandran plot. It will tell you what fraction of residues lie in favored, allowed, and disallowed regions of Ramachandran space. It also checks other parameters: peptide bond planarity, bond length deviations, Cβ deviations, and backbone torsion angle quality. In essence, PROCHECK ensures your model’s geometry is within typical bounds seen in crystallographic structures. For an AlphaFold model, you’d like to see the vast majority of residues in the favored regions of the Ramachandran plot (ideally >90% favored, <1% outliers, similar to high-resolution crystal structures). AlphaFold models generally fare well here, especially in well-defined regions, but if you have an outlier, PROCHECK will pinpoint it. You can run PROCHECK via the command-line (it’s part of the CCP4 suite) or conveniently through web servers like PDBsum (which can generate PROCHECK analysis for any uploaded structure)​.
  • WHAT_CHECK – Part of the WHAT IF software suite, WHAT_CHECK is an extensive consistency check for protein structures. It examines a wide range of criteria: geometry, van der Waals contacts, environment of residues, and more. For example, it will check if buried polar groups have hydrogen-bonding partners, if side chain torsions are consistent with electron density (for experimental structures), etc. With a predicted model, some checks (like comparison to electron density) don’t apply, but WHAT_CHECK’s geometrical and empirical rules can still flag issues. It’s quite thorough – sometimes producing a long report. If you have access to the WHAT IF web interface or servers, running WHAT_CHECK can complement PROCHECK/MolProbity by catching subtle issues. However, if you’re already using MolProbity and PROCHECK, you may get sufficient info; use WHAT_CHECK if you want an extra layer of scrutiny.
  • Verify3D – A 3D-1D profile method that checks residue environment compatibility​ (17). Essentially, Verify3D evaluates each residue in the model to see if its environment (surface vs buried, secondary structure context, etc.) is consistent with what one would expect for that amino acid. It returns a score for each residue and an overall percentage of residues that pass a threshold. Typically, you want a Verify3D result where >80% of residues have a score above the cutoff (0.2 in the original scoring scheme)​ (18). If Verify3D flags certain residues, it means those amino acids are in an environment that is unusual (e.g., a polar residue buried without contacts, or a hydrophobic residue exposed to solvent) – potentially indicating a modeling error or an unusual but real feature. When validating an AlphaFold model, a low Verify3D percentage (significantly below 80%) is a warning. It could be due to a genuine peculiarity of the protein, but often it suggests some regions of the model are not structurally sound. Verify3D is available via the UCLA SAVES server (which bundles several validation tools).
  • ERRAT – A program that analyzes non-bonded atom-atom interactions in the model and identifies regions that deviate from expected patterns​ (19). ERRAT essentially plots a quality score across the protein and gives an overall quality factor (as a percentage). High-quality structures tend to have an ERRAT overall quality score in the high 80s or 90s (out of 100), whereas a significantly lower score might indicate problems. For example, an ERRAT score of ~79% (as seen in some homology models) is marginal, and one would try to improve the model (20). ERRAT is also available via the UCLA SAVES server. If ERRAT flags a specific region (say residues 50–60) as problematic, that region likely has odd interatomic distances (perhaps a mis-packed helix or something). Cross-check such regions with what pLDDT said – often they coincide with low pLDDT stretches.
  • SwissModel Structure Assessment (QMEAN) – The SwissModel server provides a model quality assessment tool that includes the QMEAN scoring function. QMEAN (Qualitative Model Energy ANalysis) is a composite score evaluating various geometrical aspects (e.g., torsion angles, solvation, secondary structure packing) to estimate how “protein-like” a model is​ (21). When you submit a model to SwissModel’s Assessment, you receive a QMEAN score and a QMEAN Z-score which compares your model to a baseline of experimental structures of similar size. A QMEAN Z-score around 0 is good (meaning your model is in the range of normal structures); a score significantly below -4.0 indicates a model of low quality (likely incorrect)​ (22). QMEAN also provides per-residue scores, so you can see which regions contribute to lowering the score. For AlphaFold models, QMEAN often yields scores consistent with high accuracy for high-pLDDT regions, but may penalize the low-confidence regions. Use this as an overall “sanity check” – if even QMEAN (which is trained on typical protein statistics) finds your entire model to be an outlier, you should be cautious. SwissModel will also report other descriptors like predicted secondary structure vs model secondary structure agreement, and possible buriedness mismatches.

All these tools provide complementary views of model quality. In practice, you don’t have to use every single one for every model; but it’s wise to use at least one geometry-focused tool (MolProbity/PROCHECK) and one environment-focused tool (Verify3D/ERRAT/QMEAN). Many of these can be run easily via web servers. For instance, the UCLA SAVES web server allows you to upload a PDB and get results from PROCHECK, Verify3D, and ERRAT together. MolProbity has its own web server (as part of Phenix or standalone). SwissModel’s assessment is another one-stop option for QMEAN and additional analyses.
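
If you want a quick local look at the backbone dihedrals before (or after) submitting to a server, Biopython can list phi/psi angles for you. This does not reproduce MolProbity's or PROCHECK's statistics; it is just a sketch for spot-checking a suspect loop, again assuming the model file is named model.pdb:

# List backbone phi/psi angles (in degrees) per residue
import math
from Bio.PDB import PDBParser, PPBuilder

parser = PDBParser(QUIET=True)
structure = parser.get_structure("model", "model.pdb")

ppb = PPBuilder()
for pp in ppb.build_peptides(structure):
    for residue, (phi, psi) in zip(pp, pp.get_phi_psi_list()):
        if phi is None or psi is None:        # undefined at chain termini
            continue
        print(residue.get_resname(), residue.get_id()[1],
              round(math.degrees(phi), 1), round(math.degrees(psi), 1))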

When you get the results, pay attention to consistent signals: e.g., if MolProbity and PROCHECK both say the same residue is an outlier (bad geometry) and Verify3D also flags that region, that part of the model is definitely suspect. You might then decide to remodel that segment or simply note that it’s unreliable. On the other hand, if all tools give you largely clean reports (few clashes, good Ramachandran stats, high Verify3D percentage, high ERRAT score), you can be more confident the model is reasonable.

Comparison with Experimental Structures

If a related experimental structure is available, comparing your AlphaFold model to it is one of the most telling validations. Such comparisons can quantify the accuracy of the prediction and highlight any differences that might be important.

Finding a reference structure: First, determine if the same protein (or a homolog with significant sequence identity) has a known structure in the Protein Data Bank (PDB). You might have done this prior to running AlphaFold, but it’s worth doing a BLAST or HHpred search of your sequence against the PDB. Even if an exact match isn’t found, perhaps a homologous domain or a partial structure exists. Any high-resolution structure (X-ray, cryo-EM, NMR) of a similar sequence can serve as a reference to validate fold correctness.
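
If you prefer to script the search, Biopython's NCBIWWW wrapper can run a BLAST of your sequence against the PDB database. A minimal sketch (the query file name is a placeholder; qblast calls the NCBI web service, which is slow and rate-limited, so the RCSB website search is often more convenient):

# BLAST the query sequence against PDB entries via the NCBI web service
from Bio.Blast import NCBIWWW, NCBIXML

with open("query.fasta") as fh:
    sequence = fh.read()

result_handle = NCBIWWW.qblast("blastp", "pdb", sequence)
record = NCBIXML.read(result_handle)

for alignment in record.alignments[:5]:       # top five hits
    best = alignment.hsps[0]
    identity = 100.0 * best.identities / best.align_length
    print(alignment.title[:60], f"identity: {identity:.1f}%")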

Structural alignment tools: To compare the 3D structures, you can use alignment programs like TM-align or built-in tools in PyMOL/Chimera.

  • TM-align – A command-line tool that superimposes two protein structures and calculates the TM-score and RMSD. The TM-score is particularly useful; it ranges from 0 to 1, where 1 indicates an identical structure. Importantly, TM-score is designed to be length-independent and sensitive to overall fold​ (23,24). A TM-score > 0.5 generally indicates the two structures share the same fold​ (25) (and >0.9 would mean almost identical). Scores < 0.2 indicate essentially unrelated structures (random similarity) (26). RMSD (root-mean-square deviation) in Å is also output, but RMSD can be misleading for large proteins or ones with flexible termini, so TM-score is the better metric for global accuracy. To use TM-align, download it from the Zhang Lab website and run it on your model and the reference PDB:
# Example usage of TM-align
TMalign model.pdb reference.pdb > tmalign_output.txt

The output will list aligned length, RMSD, and TM-scores. Check the TM-score first. For instance, if you get TM-score = 0.65, that’s a strong confirmation the fold is correct (assuming the reference covers the full length). You might also get a per-residue alignment which can identify where differences occur (e.g., a loop in the model that deviates from the crystal structure).
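
If you run many comparisons, it can help to call TM-align from a script and pull the scores out of its report. A short sketch, assuming the TMalign binary is on your PATH and the file names are placeholders:

# Run TM-align and extract TM-score and RMSD from its text output
import re
import subprocess

result = subprocess.run(["TMalign", "model.pdb", "reference.pdb"],
                        capture_output=True, text=True, check=True)
out = result.stdout

tm_scores = [float(m) for m in re.findall(r"TM-score=\s*([0-9.]+)", out)]
rmsd_match = re.search(r"RMSD=\s*([0-9.]+)", out)

print("TM-scores (normalized by each chain length):", tm_scores)
print("RMSD over aligned residues:", rmsd_match.group(1) if rmsd_match else "not found")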

  • PyMOL or Chimera(X) alignment – In PyMOL, you can use the align or cealign commands to superimpose two structures (if sequences are identical or very similar, a straightforward align model, reference works; for more distantly related, PyMOL’s built-in alignment might struggle, so TM-align is better). UCSF Chimera has a tool called MatchMaker which can align structures based on sequences and secondary structure. After superposition, you can visually inspect differences and also measure RMSD over aligned regions. Look at the alpha carbon trace: do the helices and strands overlap well? Are there specific regions where the model deviates? Often, you’ll find that high-pLDDT regions of the AlphaFold model align closely with the experimental structure (perhaps within 1–2 Å RMSD), whereas low-confidence loops might be displaced or completely different (AlphaFold might have modeled a loop arbitrarily if there was no template or signal for it).

Interpreting differences: If your model aligns well (high TM-score, low RMSD) with an experimental structure, that’s great validation. Minor differences in side chain conformations are expected and not usually concerning, especially if the experimental structure was determined at moderate resolution or is in a different environment (AlphaFold predictions are effectively in isolation, whereas a crystal structure might have bound ligands or lattice contacts that shift things). Focus on larger deviations:

  • Domain shifts: Perhaps the model predicted a closed conformation whereas the crystal is open (or vice versa). Check the PAE from AlphaFold; it might have indicated uncertainty between domains, and indeed the experimental structure shows a different orientation. This doesn’t mean the model is “wrong” per se, but that the protein might be dynamic or the model chose one plausible state. You might then need to verify which state is relevant biologically (through experiments or further modeling).
  • Loop conformations: A common scenario is that AlphaFold modeled a loop one way (especially if low confidence, it might even be a nonsense loop), and the crystal structure shows a different loop conformation. If the loop is functionally important (e.g., part of an active site), you’d want to trust the experimental structure more. If it’s surface-exposed, it might be flexible in reality; AlphaFold’s guess is just one of many conformations it could take. In such cases, don’t over-interpret that part of the model.
  • Secondary structure errors: Rarely, AlphaFold might predict a helix where the actual structure has a strand, or vice versa. This usually happens in lower confidence regions or if the sequence had an unusual pattern. If you find such a discrepancy, that region of the model is likely incorrect. Use the experimental structure’s assignment as the truth. It’s worth noting if the pLDDT was low there (likely yes); AlphaFold was unsure and picked one possibility. Knowing the correct secondary structure, you might rebuild that segment or simply note the model’s limitation.

If no experimental structure is available for the exact protein, you can still do remote validations:

  • Align the model to the closest known structure (even if only part of it aligns). For example, maybe one domain matches a known protein family structure. Validate that portion – a good alignment (say TM-score >0.5 for that domain) gives confidence that domain is correct. The other parts of your model that have no benchmark will rely on the internal metrics and other validations we discuss.
  • Use multiple structure prediction approaches and compare (e.g., if there’s an older homology model or a Rosetta model for the protein). Consistency between different models can be somewhat validating, though remember they might all be wrong in the same way if based on the same inputs. Still, if AlphaFold and, say, Rosetta both produce a similar fold for a novel protein, that convergence is encouraging.

Overall, structural comparison is the gold standard when you have an experimental reference. It can confirm that AlphaFold got it right, or pinpoint exactly where it differs. In favorable cases (AlphaFold’s high-accuracy predictions), you might find your model is virtually identical to the crystal structure of a homolog, with perhaps a TM-score ~0.9 and main-chain RMSD ~1 Å​. That level of agreement would give you great confidence moving forward. On the other hand, if the model and experiment differ significantly (low TM-score), you should trust the experiment more – and treat the model with skepticism or use it only for qualitative insights in regions that do align well.

Functional Validation: Ligand Binding and Dynamics

The ultimate purpose of a protein structure is often to investigate function – e.g., how it binds to a ligand, catalyzes a reaction, or interacts with other molecules. Thus, validating an AlphaFold model also involves asking: Does this structure make sense functionally? There are a few approaches to explore this:

1. Checking Binding Sites and Docking Ligands: If your protein is known (or suspected) to bind a small-molecule ligand, cofactor, or substrate, examine the predicted structure for the binding site. AlphaFold might not explicitly indicate pockets, but you can use tools or visual cues:

  • Use a pocket detection tool like CASTp or Fpocket to identify cavities on the model. These tools will list pockets by volume and residues lining them. Check if a known active site corresponds to one of the top pockets.
  • If you have a known ligand (say, the substrate of an enzyme or a cofactor like NADH), try docking it into the model. Software such as AutoDock Vina (open source), SwissDock (web-based), or Glide (commercial) can take your model (treated as the receptor) and the ligand structure and attempt to find plausible binding poses; a minimal Vina sketch follows this list. Before docking, prepare the protein: add hydrogens (for AutoDock you would prepare a PDBQT file with charges), and consider keeping the protein rigid for the initial docking. A successful dock would show the ligand fitting well into a pocket, with key interactions (hydrogen bonds to the right residues, hydrophobic groups nestled in hydrophobic patches, etc.) similar to what you’d expect. If docking suggests a reasonable pose and binding energy, it supports the validity of the binding-site conformation in the model. If the ligand cannot dock, docks only in a bizarre way, or scores very poorly, the binding site in the model may not be accurately formed – perhaps a side chain that should move is blocking the pocket, or a loop is mis-modeled.
  • For enzymes, check the geometry of catalytic residues. Do they form the correct arrangement? For example, a catalytic triad (Ser-His-Asp) should be in the right orientation to form hydrogen bonds. If the model’s active site residues are far apart or oriented incorrectly, the model might represent an inactive conformation, or it might be wrong in that region. Sometimes, energy-minimizing that region or running a short MD (see below) can relax the structure into a more favorable active-site geometry.
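
For the AutoDock Vina route mentioned above, recent releases also ship Python bindings. Below is a rough sketch following the Vina 1.2 Python API documentation; the file names and box parameters are placeholders, the receptor and ligand must first be converted to PDBQT format (e.g., with Meeko or AutoDockTools), and you should adjust the calls to match your installed version:

# Dock a prepared ligand into the AlphaFold model with the AutoDock Vina Python API
from vina import Vina

v = Vina(sf_name="vina")
v.set_receptor("model.pdbqt")                 # AlphaFold model converted to PDBQT
v.set_ligand_from_file("ligand.pdbqt")        # ligand converted to PDBQT

# Search box centered on the putative binding pocket (placeholder coordinates)
v.compute_vina_maps(center=[10.0, 12.5, -3.0], box_size=[20, 20, 20])

v.dock(exhaustiveness=8, n_poses=5)
print(v.energies(n_poses=5))                  # predicted binding energies (kcal/mol)
v.write_poses("docked_poses.pdbqt", n_poses=5, overwrite=True)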

2. Protein-Protein Interactions: If your protein’s function involves other macromolecules (another protein, DNA/RNA, etc.), consider validating those interfaces. AlphaFold-Multimer (a version of AlphaFold for protein complexes) can predict such assemblies directly, but if you only have a monomeric model, you can still simulate docking:

  • Use a protein-protein docking server (e.g., HADDOCK or ClusPro) to see if your model can interact with its partner in a way that’s consistent with known data. This is more advanced, but if an interface is known experimentally (like mutagenesis showed certain residues are important), check if those residues are surface-exposed and clustered, as they should be if an interface is there.
  • If the model is of a single chain that normally oligomerizes (dimer, etc.), you might superpose multiple copies or use symmetry to see if a plausible dimer can form. Again, low PAE between chains in an AlphaFold-Multimer prediction would signal a confident multimer interface​ (27), but for a single-chain prediction you have to manually inspect.

3. Molecular Dynamics (MD) Simulations: This is a powerful way to test the stability of the predicted structure and to possibly refine it. In MD, you treat the model with physical force fields and see if it holds up under simulation (e.g., in water at 300 K). MD can reveal if the structure wants to unfold or change and can also relieve any residual strain by nudging the model into a more favorable state.

Common MD packages include GROMACS (free, open-source), AMBER (academic, partly free tools), and NAMD (free, optimized for large systems). An example workflow using GROMACS will be covered in another blog post.

During the simulation, monitor:

  • RMSD over time: Does the protein rapidly drift away from the starting structure, or does it stabilize? A stable RMSD (plateauing after some fluctuation) suggests the structure is in a favourable energy well (likely a correct fold). If RMSD keeps increasing or the protein partially unfolds, that indicates the model had instabilities (perhaps that region was incorrect or inherently flexible).
  • Secondary structure retention: You can use tools (like DSSP on each frame) to see if helices and strands remain intact. If a predicted helix quickly unravels in MD, maybe that segment wasn’t truly helical.
  • Key interactions: Check if critical hydrogen bonds or salt bridges persist. For example, if the model suggested a salt bridge that’s important for stability, does MD maintain it or does it break?
  • Refinement: MD often causes side chains to adjust, sometimes resolving minor clashes automatically. You might take an average or low-energy structure from the simulation as a refined model. Many researchers use short MD refinement to improve homology models​ (28); the same can be done for AlphaFold models, especially to relax strained loops. As one approach, you could restrain the well-defined parts of the structure (to prevent them from moving too much) and let a loop of interest move freely; this can relieve local strain​ (28).

MD simulation essentially serves as a reality check: AlphaFold’s neural network prediction isn’t guaranteed to be at an energy minimum. If the model is good, it should withstand simulation in a realistic environment. If it’s not, you might see it start to collapse or drift, which tells you that at least some interactions in the model were not favorable. Of course, MD itself is an approximation (dependent on the force field, etc.), but it’s a well-established method to refine and validate structures.
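
As a concrete example of the RMSD check described in the list above, here is a short post-processing sketch using MDAnalysis (version 2.0 or later); the topology and trajectory file names are placeholders for whatever your MD package wrote out:

# Backbone RMSD of the trajectory relative to the starting AlphaFold model
import MDAnalysis as mda
from MDAnalysis.analysis import rms

traj = mda.Universe("model_solvated.pdb", "production.xtc")   # placeholder files
ref = mda.Universe("model.pdb")                               # original AlphaFold model

analysis = rms.RMSD(traj, ref, select="backbone")
analysis.run()

# Columns of analysis.results.rmsd: frame index, time (ps), RMSD (angstrom)
for frame, time, rmsd in analysis.results.rmsd:
    print(f"{time:8.1f} ps   {rmsd:5.2f} A")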

After MD analysis: If the structure proved stable, you gain confidence. You might also use the MD trajectory to estimate the flexibility of different regions (e.g., high B-factor regions in a crystal often correspond to mobile loops in MD). If the structure was unstable, identify what broke down – was it the interface between two domains? Was it a particular loop that kept swinging out? This can pinpoint the weak spots of the model that might need revision or cautious interpretation.

Lastly, tie back to experiments whenever possible. If you have biochemical data (e.g., mutation effects, binding assays), see if the model’s features align with them. For example, if a mutation is known to disrupt function and your model places that residue in the core of the active site, that’s consistent and supports the model. If instead the model has that residue on the surface away from any obvious functional site, maybe the model is missing something (like maybe it’s actually part of a protein-protein interface that wasn’t modeled).

By this point, you have examined the model from every angle: intrinsic confidence (AlphaFold metrics), overall geometry and quality (validation tools), comparison to known structures, and functional plausibility (docking, pockets, MD). Based on all these, you can form a judgment on how reliable each part of the model is. It’s often useful to summarize this in a table or checklist, as we do below.

Summary of Validation Steps

The following summary lists the key validation steps for an AlphaFold model and why each step is important:

  • AlphaFold Confidence (pLDDT & PAE) – Identify which regions of the model are predicted with high confidence and which are uncertain. pLDDT highlights local reliability (e.g., core vs. floppy tail) and PAE indicates confidence in domain orientation (29). This guides where to focus further validation – low pLDDT or high PAE regions are likely error-prone or flexible.
  • Visual Inspection (PyMOL/ChimeraX/Mol*) – Catch obvious structural anomalies and get an intuitive feel for the model. By coloring the structure by confidence and examining secondary structure, one can see if the fold makes sense. Visual checks can reveal steric clashes, weird loops, or missing pockets early on. It’s a simple sanity check before delving into metrics.
  • Geometric Validation (MolProbity, PROCHECK, WHAT_CHECK) – Ensure the model’s geometry is physically plausible. These tools check bond lengths, angles, and dihedral angles (Ramachandran plot) and detect atomic clashes (29). Good geometry (few outliers, low clashscore) means the model is stereochemically sound, while many violations indicate errors or regions that may need refinement.
  • Residue Environment Validation (Verify3D, ERRAT, QMEAN) – Assess whether each residue is in a realistic environment and the model has protein-like statistical features. Verify3D and ERRAT compare the structure to empirical rules derived from real proteins (19). A high Verify3D score or ERRAT quality factor, and a QMEAN Z-score near 0, suggest the model is globally consistent with known protein structures. Poor scores flag unlikely regions (e.g., a hydrophobic residue exposed, or an odd packing) for further scrutiny.
  • Comparison to Known Structures (TM-align, structural superposition) – Quantify accuracy by comparing to experimental data. If a PDB structure exists, a high TM-score (≈0.5–1.0) and low RMSD confirm the model’s correctness. Even partial matches validate domains or motifs. Discrepancies pinpoint specific errors or alternative conformations. This step effectively benchmarks the model against reality.
  • Functional Site Check (Docking & Pockets) – Verify that the model supports the protein’s known function. Checking for expected pockets and docking ligands tests whether the active or binding site is correctly formed. Successful docking (reasonable poses and scores) means the geometry allows the interaction, while failures could mean the site is modeled incorrectly. This links structure to biochemical function, an essential validation for downstream uses such as drug design.
  • Molecular Dynamics Simulation (GROMACS/AMBER/NAMD) – Test the model’s stability and refine it in a physics-based environment. A stable trajectory (little deviation) indicates the model sits in an energy minimum (likely a correct fold), whereas structural unraveling signals problems. MD can also relieve minor clashes and adjust the model to be more realistic. In sum, MD validation ensures the model’s behavior is consistent with a real protein in solvent over time.

Each step adds a layer of confidence to your AlphaFold-predicted structure. By the end of this process, you should have a detailed understanding of your model’s strengths and weaknesses: which parts you can trust, which parts should be treated as hypotheses, and what further experiments or refinements might be necessary. Validating computational models is crucial for responsible use of AlphaFold predictions in research – it separates the cases where the model is as good as a crystal structure from those where it might mislead if taken at face value​ (30). With the strategies outlined above, researchers and students can robustly assess their AlphaFold structures and proceed with confidence in subsequent analyses.

References

1- Jumper, J., Evans, R., Pritzel, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://pmc.ncbi.nlm.nih.gov/articles/PMC8728224/

2- European Bioinformatics Institute. (n.d.). pLDDT: Understanding local confidence. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/plddt-understanding-local-confidence/

3- European Bioinformatics Institute. (n.d.). pLDDT measures confidence in the predicted position of residues. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/plddt-understanding-local-confidence/

4- Jumper et al., 2021 – (lDDT, pLDDT also reflects local geometry) https://pmc.ncbi.nlm.nih.gov/articles/PMC8728224/

5- European Bioinformatics Institute. (n.d.). Predicted aligned error (PAE) – Explanation. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

6- European Bioinformatics Institute. (n.d.). Low PAE score implies confident relative positions. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

7- European Bioinformatics Institute. (n.d.). PAE visualisation explanation. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

8- EBI (n.d.). Image: Figure 18 – PAE plot. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

9- EBI (n.d.). Explanation of diagonal PAE plot. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

10- EBI (n.d.). Comparison of pLDDT and PAE. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

11- Jumper et al., 2021 – (residue confidence metric pLDDT) https://pmc.ncbi.nlm.nih.gov/articles/PMC8728224/

12- EBI (n.d.). How to view PAE plots in structure viewer. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

13- Chen, V. B., Arendall, W. B., Headd, J. J., et al. (2010). MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D: Biological Crystallography, 66(1), 12–21. https://pmc.ncbi.nlm.nih.gov/articles/PMC1933162/

14- Chen et al., 2010 – (integration with other resources) https://pmc.ncbi.nlm.nih.gov/articles/PMC1933162/

15- Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M. (1993). PROCHECK: A program to check the stereochemical quality of protein structures. Journal of Applied Crystallography, 26(2), 283–291. https://www.ebi.ac.uk/thornton-srv/software/PROCHECK/

16- EBI (n.d.). PROCHECK hand menu image. https://www.ebi.ac.uk/thornton-srv/software/PROCHECK/

17- SAVES. (n.d.). VERIFY 3D. https://saves.mbi.ucla.edu/#:~:text=VERIFY%203D

18- ResearchGate. (n.d.). Interpreting ERRAT, Verify3D and PROCHECK scores. https://www.researchgate.net/post/How-do-I-interpret-my-results-from-ERRAT-Verify3D-and-PROCHECK

19- SAVES. (n.d.). ERRAT. https://saves.mbi.ucla.edu/#:~:text=ERRAT

20- BioStars. (n.d.). Verify3D result discussion. https://www.biostars.org/p/14354/#:~:text=Verify3D%20shows%2076.10,087

21- Benkert, P., Tosatto, S. C. E., & Schwede, T. (2008). QMEAN: A comprehensive scoring function for model quality estimation. Proteins: Structure, Function, and Bioinformatics, 71(1), 261–277. https://swissmodel.expasy.org/qmean/

22- Bonvin Lab. (n.d.). QMEAN Z-score. https://www.bonvinlab.org/education/molmod_online/modelling/

23- Zhang, Y., & Skolnick, J. (2005). TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Research, 33(7), 2302–2309. https://zhanggroup.org/TM-score/

24- ZhangGroup. (n.d.). TM-score info. https://zhanggroup.org/TM-score/

25- Varadi, M., Anyango, S., Deshpande, M., et al. (2022). AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1), D439–D444. https://pmc.ncbi.nlm.nih.gov/articles/PMC10776388/

26- EBI (n.d.). Residue confidence within structure prediction. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/

27- EBI (n.d.). pLDDT: More accurate structures. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/plddt-understanding-local-confidence/

28- BioStars. (n.d.). Simulations and MD prep. https://www.biostars.org/p/14354/#:~:text=I%20would%20run%20a%20short,system%20and%20running%20the%20simulations

29- Jumper et al., 2021 – (Predicted Aligned Error detail) https://pmc.ncbi.nlm.nih.gov/articles/PMC8728224/

30- EBI (n.d.). PAE score importance for AlphaFold ID. https://www.ebi.ac.uk/training/online/courses/alphafold/inputs-and-outputs/evaluating-alphafolds-predicted-structures-using-confidence-scores/pae-a-measure-of-global-confidence-in-alphafold-predictions/