DAQ-Score Database

Precomputed Residue-Wise Local Quality Scores of Protein Models from Cryo-EM Maps

Developed by Kihara Lab
Updated entries on 2024/12/03 based on the PDB data as of 2024/07/31.

What is DAQ

DAQ is a deep-learning-based score that quantifies the residue-wise local quality of protein models built from cryo-electron microscopy (cryo-EM) maps determined at resolutions between 2.5 and 5 Å.

Key Features

Figure: example record. The model is colored by DAQ score on a scale from red (low) to blue (high). PDB 7JSN chain B Version 1, EMD-22458.

This database provides precomputed DAQ scores for structure models in the PDB that were derived from cryo-EM maps. Entries can be searched by PDB ID, EMDB ID, or keywords. The DAQ score along a protein structure model is visualized in an interactive structure viewer as well as in a graph and a score table, both of which are linked to the model structure in the viewer. Three scores are provided, which evaluate the correctness of amino acid assignments, Cα positions, and secondary structure assignments.

Overview of DAQ


DAQ uses deep learning to compute, from local density features, the likelihood that each local position in a cryo-EM map corresponds to each secondary structure type, each amino acid type, and a Cα atom. The plausibility of each residue in a structure model built from the cryo-EM map is then quantified with the following equations.

The amino acid type of residue \(i\) in a model is evaluated as: \[ DAQ(AA)(i)=\log\left(\frac{P_{aa(i)}(i)}{\sum_{j}P_{aa(i)}(j)/N}\right), \] where \(aa(i)\) is the amino acid type of residue \(i\) and \(P_{aa(i)}(i)\) is the probability, computed by deep learning, of that amino acid type at residue \(i\). This probability is normalized by the average probability of the same amino acid type over all \(N\) atom positions \(j\) in the protein model.
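The equation above can be sketched in a few lines of Python. This is an illustrative sketch, not the DAQ implementation: the function name `daq_aa`, the dictionary layout of the probabilities, the use of one position per residue, and the natural logarithm are all assumptions made for the example.

```python
import math

def daq_aa(prob_at_pos, aa_of_residue, i):
    """Sketch of DAQ(AA)(i) under simplifying assumptions.

    prob_at_pos[j] maps amino acid type -> probability at position j
    (hypothetical layout; here one position per residue is assumed).
    aa_of_residue[i] is the amino acid type assigned to residue i.
    """
    aa = aa_of_residue[i]
    n = len(prob_at_pos)
    # Average probability of this amino acid type over all positions.
    mean_p = sum(p[aa] for p in prob_at_pos) / n
    # Natural log is an assumption; zero probabilities are not handled.
    return math.log(prob_at_pos[i][aa] / mean_p)

# Made-up probabilities for a three-residue toy model.
toy_probs = [
    {"ALA": 0.6, "GLY": 0.1},
    {"ALA": 0.2, "GLY": 0.5},
    {"ALA": 0.1, "GLY": 0.3},
]
toy_types = ["ALA", "GLY", "ALA"]
toy_scores = [daq_aa(toy_probs, toy_types, i) for i in range(3)]
```

In this toy input, residue 0 scores above zero because its alanine probability is twice the model-wide average, while residue 2 scores below zero because its alanine probability is below that average.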

The Cα position of residue \(i\) in a model is evaluated as: \[ DAQ(C\alpha)(i)=\log\left(\frac{P_{C\alpha}(i)}{\sum_{j}P_{C\alpha}(j)/N}\right), \] where \(P_{C\alpha}(i)\) is the probability, computed by deep learning, that the position of the Cα atom of residue \(i\) corresponds to a Cα atom. This probability is normalized by the average Cα probability over all \(N\) atom positions \(j\) in the protein model.
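As a worked illustration with made-up numbers, and assuming the natural logarithm (the base is not stated here), suppose the average Cα probability over the model is 0.2. A residue with \(P_{C\alpha}(i)=0.6\) and a residue with \(P_{C\alpha}(i)=0.1\) would then score: \[ \log\left(\frac{0.6}{0.2}\right)=\log 3\approx 1.10, \qquad \log\left(\frac{0.1}{0.2}\right)=\log 0.5\approx -0.69. \] A positive score therefore indicates a position that looks more Cα-like than the model average, and a negative score indicates one that looks less Cα-like.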

Lastly, the secondary structure of residue \(i\) in a model is evaluated as: \[ DAQ(SS)\left(i\right)=\sum_{ss\in \{H,E,C\}}{{Pseq}_{ss}\left(i\right)\log\left(\frac{P_{ss}\left(i\right)}{\sum_{j}P_{ss}(j)/N}\right)}, \] where the sum runs over the secondary structure types helix (H), strand (E), and coil (C). \({Pseq}_{ss}(i)\) is the probability of secondary structure \(ss\) for residue \(i\), predicted from the protein sequence by the secondary structure prediction method SPOT1D. \(P_{ss}(i)\) is the probability of secondary structure \(ss\) at residue \(i\) computed by deep learning, which is normalized by the average probability of that secondary structure type over all \(N\) atom positions \(j\) in the protein model.
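The secondary structure score is a weighted sum of three log ratios, which the following Python sketch makes explicit. It is an illustration under assumptions, not the DAQ code: the function name `daq_ss`, the dictionary layout, one position per residue, and the natural logarithm are all hypothetical choices.

```python
import math

SS_TYPES = ("H", "E", "C")  # helix, strand, coil

def daq_ss(pseq, pmap, i):
    """Sketch of DAQ(SS)(i) under simplifying assumptions.

    pseq[i][ss]: sequence-based probability of type ss for residue i
                 (in DAQ this comes from SPOT1D; here it is just input).
    pmap[j][ss]: map-based probability of type ss at position j
                 (hypothetical layout, one position per residue).
    """
    n = len(pmap)
    score = 0.0
    for ss in SS_TYPES:
        # Average map-based probability of this type over all positions.
        mean_p = sum(p[ss] for p in pmap) / n
        # Weight the log ratio by the sequence-based probability.
        score += pseq[i][ss] * math.log(pmap[i][ss] / mean_p)
    return score

# Made-up probabilities for a two-residue toy model.
toy_pmap = [
    {"H": 0.8, "E": 0.1, "C": 0.1},
    {"H": 0.2, "E": 0.4, "C": 0.4},
]
toy_pseq = [
    {"H": 1.0, "E": 0.0, "C": 0.0},
    {"H": 0.0, "E": 0.5, "C": 0.5},
]
toy_ss_scores = [daq_ss(toy_pseq, toy_pmap, i) for i in range(2)]
```

The weighting means a residue is rewarded when the map-based probabilities are high for the same secondary structure types that the sequence-based prediction favors.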

Computed scores are then averaged over a window of 19 residues along the sequence, because this averaging better distinguished potentially incorrect residue placements from correct ones in the benchmark.
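The window averaging can be sketched as follows. The window size of 19 comes from the text; the rest is assumed for illustration: the function name `window_average` is hypothetical, the window is taken as centered on each residue (9 neighbors on each side), and it is simply truncated at the ends of the chain. How DAQ actually treats chain termini and gaps is not specified here.

```python
def window_average(scores, window=19):
    """Average per-residue scores over a centered sliding window.

    Assumptions: the window is centered on each residue and is
    truncated at the chain ends, so terminal residues average over
    fewer values. scores is a list of floats for one contiguous chain.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

# Small demonstration with a window of 3 so the effect is easy to follow.
demo = window_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

Smoothing of this kind suppresses isolated per-residue fluctuations, so a sustained stretch of low scores stands out more clearly than a single low value.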

Key Insights: What types of modeling issues does DAQ detect, and when does DAQ tend not to flag them?