SCOP and CATH Database
By Aishwarya Dharan
MSc. Life Science (Bioinformatics)
19mslsbf02
SCOP
Structural Classification Of Proteins
CATH
Class, Architecture, Topology, Homologous superfamily
SCOP AND CATH
• Because SCOP and SCOP2 have not kept pace with the latest releases of the PDB, the
Chandonia group established an extended version of SCOP, called SCOPe.
• The Structural Classification of Proteins extended (SCOPe) database was released in 2012.
It uses the same hierarchical system with far greater automation and is fully backward
compatible with SCOP. In 2014, manual curation was reintroduced into SCOPe to maintain
accurate structure assignment. SCOPe 2.05 classified 71,000 of the 110,000 total PDB
entries. SCOPe also corrects some errors in SCOP.
CLASSIFICATION OF SCOP
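SCOP is organized into four main hierarchical levels (Class, Fold, Superfamily, Family), followed by two finer levels (Protein domain and Species):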
1. Class: the general structural architecture of the protein domains. Proteins are usually (but
not always) separated into domains, and most of these domains are classified into one of the first
five classes:
a) all-α: domains whose structure is essentially formed by α-helices
b) all-β: domains whose structure is essentially formed by β-sheets
c) α/β: domains with α-helices and β-strands, mainly parallel β-sheets (β-α-β units)
d) α+β: domains with α-helices and β-strands, mainly antiparallel β-sheets (segregated α
and β regions)
e) multi-domain: proteins with domains of different folds
for which no homologues are known at present
2. Fold: domains with a similar arrangement of regular secondary
structures, but without evidence of evolutionary relatedness.
• Includes the different shapes of domains within a class, e.g.,
2 helices; antiparallel hairpin, left-handed twist, etc.
Source: https://www.ebi.ac.uk/training/online/sites/ebi.ac.uk.training.online/files/resize/user/511/documents/slide1_5-457x343.jpg
3. Superfamily: the domains in a fold are grouped into superfamilies, whose members have at least
a distant common ancestor.
4. Family: domains belonging to the same family:
• share some sequence similarity.
• are evolutionarily related.
• have pairwise residue identities of 30% or greater.
5. Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.
6. Species: The domains in "protein domains" are grouped according to species.
Source: https://image1.slideserve.com/1868737/structural-classification-of-proteins-n.jpg
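The SCOP levels listed above can be sketched in code. The example below is a minimal illustration, assuming the dotted "sccs" identifier style used by SCOP/SCOPe (class letter, then fold, superfamily and family numbers, e.g. d.3.1.1). The names SCOP_CLASSES, ScopId, parse_sccs and deepest_shared_level are hypothetical helpers, not part of any SCOP tool, and only the five classes named above are covered.

```python
from typing import NamedTuple

# Class letters assumed from the sccs convention; only the first five
# classes discussed in the slides are included here.
SCOP_CLASSES = {
    "a": "all-α",
    "b": "all-β",
    "c": "α/β",
    "d": "α+β",
    "e": "multi-domain",
}


class ScopId(NamedTuple):
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> ScopId:
    """Split an sccs string such as 'd.3.1.1' into the four main levels."""
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    letter, fold, superfamily, family = parts
    if letter not in SCOP_CLASSES:
        raise ValueError(f"unsupported class letter: {letter!r}")
    return ScopId(letter, int(fold), int(superfamily), int(family))


def deepest_shared_level(a: str, b: str) -> str:
    """Return the deepest main SCOP level that two identifiers share."""
    levels = ["none", "class", "fold", "superfamily", "family"]
    depth = 0
    for x, y in zip(parse_sccs(a), parse_sccs(b)):
        if x != y:
            break
        depth += 1
    return levels[depth]


if __name__ == "__main__":
    print(parse_sccs("d.3.1.1"))
    print(deepest_shared_level("a.1.1.1", "a.1.1.2"))
```

Comparing level by level from the left mirrors the hierarchy: two domains can only share a fold if they already share a class, and so on down to family.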
• The CATH Protein Structure Classification database (http://www.cathdb.info/) is a free,
publicly available online resource that provides a hierarchical domain classification of protein
structures in the Protein Data Bank. Protein structures are classified using a combination of
an automatic structural alignment program (SSAP) and manual comparison. Although the
protocol is mostly automatic, manual inspection is used to check assignments at some
critical stages, such as the detection of very distantly related homologues and analogues and
the assignment of novel architectures.
• It was created in the mid-1990s by Professor Christine Orengo and colleagues, including Janet
Thornton and David Jones, and continues to be developed by the Orengo group at University
College London.
• Experimentally determined three-dimensional protein structures are obtained from the PDB
and split into their individual polypeptide chains, where applicable.
CLASSIFICATION OF CATH
• The four main levels of the CATH hierarchy are as follows:
1. Class: the overall secondary-structure content of the domain. CATH has four classes: mainly α,
mainly β, mixed α–β, and few secondary structures. Unlike SCOP, CATH does not separate α/β from α+β.
2. Architecture: structures are grouped by their overall shape, as determined by the
orientations of the secondary structures in 3D space, ignoring the connectivity between them.
3. Topology: structures with the same number, arrangement and connectivity of secondary
structures, based on structural superposition.
4. Homologous superfamily: structural and functional similarity is assessed by sequence
comparison and then by structure comparison using SSAP. Two structures are placed in the same
homologous superfamily if any of the following hold:
• Sequence identity > 35%
• SSAP score > 80 and sequence identity > 20%
• SSAP score > 80, 60% of the larger structure is equivalent to the smaller structure, and the
domains have related functions
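The three homologous-superfamily criteria above can be written as a single decision function. This is a minimal sketch, not the CATH pipeline itself: the function name and parameters are hypothetical, sequence identity and SSAP score are assumed to be on a 0 to 100 scale, and "60% of the larger structure" is read here as an overlap fraction of at least 0.60.

```python
def same_homologous_superfamily(
    seq_identity: float,
    ssap_score: float,
    overlap_fraction: float,
    related_function: bool,
) -> bool:
    """Return True if any one of the three listed criteria is met.

    seq_identity      -- pairwise sequence identity in percent (0-100)
    ssap_score        -- SSAP structural alignment score (0-100)
    overlap_fraction  -- fraction of the larger structure equivalent to
                         the smaller one (0-1); the >= 0.60 reading is
                         an assumption
    related_function  -- whether the two domains have related functions
    """
    # Criterion 1: high sequence identity alone is sufficient.
    if seq_identity > 35:
        return True
    # Criterion 2: strong structural similarity plus moderate identity.
    if ssap_score > 80 and seq_identity > 20:
        return True
    # Criterion 3: strong structural similarity, enough structural
    # overlap, and related functions.
    if ssap_score > 80 and overlap_fraction >= 0.60 and related_function:
        return True
    return False


if __name__ == "__main__":
    # Low identity, but structurally similar and functionally related.
    print(same_homologous_superfamily(10, 85, 0.7, True))
```

The checks are ordered from the cheapest evidence (sequence) to the most demanding (structure plus function), which matches the order in which the comparison is described.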
• To illustrate the types of domains that one can observe at the architecture level,
let us look at some architectures of the mixed alpha–beta class. The following 10
entries appear at the architecture level:
Source: https://www.researchgate.net/profile/Senthilnathan_Rajendran2/publication/325182475/figure/fig2/AS:628569227153409@1526873990276/Illustration-of-10-different-CATH-architectures-subfolds-in-our-data-set-A-2-Layer.png
Source: https://images.slideplayer.com/25/7639286/slides/slide_3.jpg
APPLICATIONS OF SCOP
1) To study viral fold specificity - SCOP classification was used by Cheng and Brooks
to study fold diversity in viral capsid proteins. Cheng and Brooks concluded that
viral capsids evolved under distinct evolutionary constraints from non‐capsid
proteins, and may provide valuable templates for protein engineering.
2) To study the evolution of oligomer geometries - In a study of the evolution of different
oligomeric states by Perica, Chothia, and Teichmann, structures were collected from
10 SCOP families that have “at least one dimer and one homologous tetramer or
hexamer with the same dimeric binding mode.” The study detected locations of
mutations that were correlated with different oligomerization states and found that
“such indirect, or allosteric mutations affecting intersubunit geometry via indirect
mechanisms are as important as interface sequence changes for evolution of
oligomeric states.”
APPLICATIONS OF CATH
• One study used information that is available only in CATH and not in SCOP: Bukhari and
Caetano‐Anollés used phylogenetic data to study the emergence of different CATH domain
architectures.
• The focus of the study was the CATH architecture level, which has no analogous
level in SCOP.
• The study found that ancient architectures, such as the CATH 3‐layer (αβα) sandwich (3.40) and
the orthogonal bundle (1.10), are involved in basic cellular functions, whereas more recently
evolved architectures, such as prism, propeller, 2‐solenoid, super‐roll, clam, trefoil, and box,
are not widely distributed.
• The study also benchmarked the phylogenetic analysis of CATH domains against that of
SCOP domains, measuring the distribution of CATH architectures, topologies, and homologous
superfamilies, and of SCOP folds, superfamilies, and families, in the Bacteria, Eukarya, and
Archaea superkingdoms.
COMPARATIVE DISCREPANCIES BETWEEN SCOP & CATH
CATH assigns more domains than SCOP because CATH defines domains purely structurally,
whereas SCOP takes into account whether a domain is observed recurring in another
superfamily or as a separate single-domain fold.
(a) Structure of papain (1ppo), a cysteine proteinase from papaya, with the catalytic histidine, asparagine
and cysteine shown as ball-and-stick residues. SCOP classifies the structure as one domain (SCOP code:
4.3.1), leaving the catalytic cysteine, histidine, and asparagine together to form the active site, whereas
CATH splits the structure into two, as shown by blue (CATH code: 1.10.190.10) and yellow (3.10.160.10)
colouring, rendering each domain effectively functionless. After this study by Hadley & Jones, papain is
now treated as a single domain in CATH.
Source: Fig. 4, https://doi.org/10.1016/S0969-2126(99)80177-4
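The two CATH codes quoted for papain can be read level by level. The sketch below is illustrative only: parse_cath, CathCode and deepest_shared_level are hypothetical helper names, and the class-number labels (1 = mainly α, 2 = mainly β, 3 = α–β, 4 = few secondary structures) are assumed from the CATH numbering scheme rather than taken from these slides.

```python
from typing import NamedTuple, Optional

# Assumed mapping of CATH class numbers to class names.
CATH_CLASSES = {
    1: "mainly α",
    2: "mainly β",
    3: "α–β",
    4: "few secondary structures",
}


class CathCode(NamedTuple):
    cath_class: int
    architecture: Optional[int] = None
    topology: Optional[int] = None
    homologous_superfamily: Optional[int] = None


def parse_cath(code: str) -> CathCode:
    """Parse a full or partial CATH code such as '3.40' or '1.10.190.10'."""
    parts = [int(p) for p in code.strip().split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 dot-separated numbers, got {code!r}")
    return CathCode(*parts)


def deepest_shared_level(a: str, b: str) -> str:
    """Return the deepest CATH level at which two codes still agree."""
    levels = ["none", "class", "architecture", "topology", "homologous superfamily"]
    depth = 0
    for x, y in zip(parse_cath(a), parse_cath(b)):
        if x is None or x != y:
            break
        depth += 1
    return levels[depth]


if __name__ == "__main__":
    for code in ("1.10.190.10", "3.10.160.10"):
        print(code, "->", CATH_CLASSES[parse_cath(code).cath_class])
    print(deepest_shared_level("1.10.190.10", "3.10.160.10"))
```

Read this way, the two papain pieces already differ at the class level, even though both happen to carry architecture number 10, because architecture numbers are only meaningful within their own class.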
COMPARATIVE DISCREPANCIES BETWEEN SCOP & CATH
Examples of class assignment disagreements between CATH and SCOP. (a) SCOP ignores the
small helical elements in the haemagglutinin structure and classifies the domain as mainly β,
whereas CATH takes the helices into account and considers the structure αβ. (b) In the case of
the lysozyme superfamily (e.g. 1lys), CATH disregards the presence of small β strands and
considers the protein mainly α, whereas SCOP takes into account the functional and
evolutionary importance of these strands, and calls the lysozymes α/β.
Source: https://ars.els-cdn.com/content/image/1-s2.0-S0969212699801774-gr6_lrg.jpg
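Class-level disagreements like the two above can be flagged with a simple lookup. This is a rough sketch under stated assumptions: the SCOP-to-CATH class mapping is a hypothetical coarse correspondence (both α/β and α+β are mapped to the single CATH α–β class), "mainly β" in SCOP is taken to mean the all-β class, and the example assignments are the ones given in the text.

```python
from typing import Dict, Tuple

# Hypothetical coarse mapping from SCOP classes to the CATH class one
# would expect for the same domain.
SCOP_TO_CATH_CLASS: Dict[str, str] = {
    "all-α": "mainly α",
    "all-β": "mainly β",
    "α/β": "α–β",
    "α+β": "α–β",
}


def classes_agree(scop_class: str, cath_class: str) -> bool:
    """True if the CATH class matches the one expected from the SCOP class."""
    expected = SCOP_TO_CATH_CLASS.get(scop_class)
    return expected is not None and expected == cath_class


# (SCOP class, CATH class) as described in the two examples above.
EXAMPLES: Dict[str, Tuple[str, str]] = {
    "haemagglutinin": ("all-β", "α–β"),
    "lysozyme (1lys)": ("α/β", "mainly α"),
}

if __name__ == "__main__":
    for name, (scop_class, cath_class) in EXAMPLES.items():
        status = "agree" if classes_agree(scop_class, cath_class) else "disagree"
        print(f"{name}: SCOP {scop_class} vs CATH {cath_class} -> {status}")
```

Both examples come out as disagreements, for opposite reasons: in one case SCOP ignores minor helices, in the other CATH ignores minor strands.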
DIFFERENCE BETWEEN SCOP & CATH
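[Figure: hierarchy of the CATH structural classification system compared with the corresponding SCOP levels]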
Source: https://www.researchgate.net/profile/Syed_Abbas_Bukhari/publication/235993836/figure/fig1/AS:299889826254851@1448510714283/Hierarchy-of-the-CATH-structural-classification-system-compared-to-corresponding-SCOP.png
DIFFERENCE BETWEEN SCOP & CATH