SCOP and CATH Database

SCOP and CATH are secondary protein structure databases that provide hierarchical classifications of protein domains derived from protein structures in the PDB. SCOP classifies domains based on structural similarities and evolutionary relationships, while CATH classifies based on class, architecture, topology, and homology. Both databases aim to determine evolutionary relationships between proteins to study protein structure and function.

Uploaded by

Aishwarya Dharan
SCOP AND CATH DATABASE

By Aishwarya Dharan
MSc. Life Science(Bioinformatics)
19mslsbf02
SCOP
Structural Classification Of Proteins

CATH
Class, Architecture, Topology, Homologous superfamily
SCOP AND CATH

• Secondary databases to study protein structure.
• Secondary databases contain information derived from primary databases.
They store information such as conserved sequences, active site residues, and
signature sequences. SCOP and CATH are derived from the structures deposited
in the Protein Data Bank (PDB), which is a primary database.
• The Structural Classification of Proteins (SCOP) database is a free, publicly available database that
manually classifies protein structural domains based on similarities of their structures and amino acid sequences
(http://scop2.mrc-lmb.cam.ac.uk/).
• SCOP classification is essentially a manual process based on visual inspection and comparison of
structures; automation is used only for the most routine tasks, such as clustering protein chains on the basis of
sequence similarity.
• SCOP was created in 1994 at the Centre for Protein Engineering and the Laboratory of Molecular Biology. It was
maintained by Alexey G. Murzin and his colleagues at the Centre for Protein Engineering at Cambridge
University until the Centre closed in 2010, and subsequently at the Laboratory of Molecular Biology in Cambridge,
England.
• The main motivation for this classification is to determine the evolutionary relationship between proteins.
• SCOP has been discontinued: with the accelerating pace of protein structure publications, its limited automation of
classification could not keep up, leaving the dataset non-comprehensive. The last official version of the original
SCOP is 1.75; SCOP2 is its successor rather than another name for that release.
• SCOP2 offers two different ways for accessing data: SCOP2-browser, and SCOP2-graph.
SCOP2-browser allows navigation in a traditional way by browsing pages displaying the node
information. SCOP2-graph is a graph-based web tool for display and navigation.

• Since SCOP and SCOP2 are not up-to-date with the latest version of the PDB, an extended
version of SCOP, SCOPe, was recently established by the Chandonia group.

• The Structural Classification of Proteins extended (SCOPe) database was released in 2012 with
far greater automation of the same hierarchical system and is fully backwards compatible with
SCOP. In 2014, manual curation was reintroduced into SCOPe to maintain accurate structure
assignment. SCOPe 2.05 classified 71,000 of the 110,000 total PDB entries. SCOPe also
corrects some errors in SCOP.
CLASSIFICATION OF SCOP
SCOP is organized into hierarchical levels. The four principal ones are class, fold, superfamily,
and family; below family, domains are further grouped by protein and by species:
1. Class: the general structural architecture of the protein domains. Proteins are usually (but
not always) separated into domains, and most of these domains are classified into one of the first
five classes:
a) all-α: those whose structure is essentially formed by α-helices
b) all-β: those whose structure is essentially formed by β-sheets
c) α/β: those with α-helices and β-strands, mainly parallel β-sheets built from β-α-β units
d) α+β: those with segregated α and β regions, mainly antiparallel β-sheets
e) multi-domain: those with domains of different fold and
for which no homologues are known at present.
2. Fold: domains with a similar arrangement of regular secondary
structures but without evidence of evolutionary relatedness.
• Includes different shapes of domains within a class, e.g.,
2 helices; antiparallel hairpin, left-handed twist, etc.
Source: https://www.ebi.ac.uk/training/online/sites/ebi.ac.uk.training.online/files/resize/user/511/documents/slide1_5-457x343.jpg
3. Superfamily: the domains in a fold are grouped into superfamilies, whose members have at least a distant
common ancestor.
4. Family: domains belonging to the same family:
• share some sequence similarity;
• are evolutionarily related;
• have pairwise residue identities of 30% or greater.
5. Protein domain: the domains in families are grouped into protein domains, which are essentially the same protein.
6. Species: the domains in "protein domains" are grouped according to species.
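
The class, fold, superfamily, and family levels described above can be modelled as a small data structure. The sketch below is illustrative only and assumes the dotted "class.fold.superfamily.family" string (the SCOP concise classification string, sccs, which the slides do not mention), with the letters a to e standing for the first five classes. The names `ScopNode`, `parse_sccs`, and `deepest_shared_level` are hypothetical helpers, and the example strings are used purely as input, not as claims about particular proteins.

```python
from dataclasses import dataclass

# Letters for the first five SCOP classes, in the order listed above.
SCOP_CLASSES = {
    "a": "all-alpha",
    "b": "all-beta",
    "c": "alpha/beta",
    "d": "alpha+beta",
    "e": "multi-domain",
}

# The four principal levels, from most general to most specific.
LEVELS = ("class", "fold", "superfamily", "family")


@dataclass(frozen=True)
class ScopNode:
    """Position of a domain in the class/fold/superfamily/family hierarchy."""
    scop_class: str
    fold: int
    superfamily: int
    family: int

    @property
    def class_name(self) -> str:
        return SCOP_CLASSES.get(self.scop_class, "other class")

    def path(self):
        return (self.scop_class, self.fold, self.superfamily, self.family)


def parse_sccs(sccs: str) -> ScopNode:
    """Split a string such as 'a.1.1.2' into its four levels."""
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    cls, fold, superfamily, family = parts
    return ScopNode(cls, int(fold), int(superfamily), int(family))


def deepest_shared_level(a: ScopNode, b: ScopNode) -> str:
    """Return the most specific level two domains share, or 'none'."""
    shared = "none"
    for level, x, y in zip(LEVELS, a.path(), b.path()):
        if x != y:
            break  # numbering is relative to the parent, so stop at the first mismatch
        shared = level
    return shared
```

Two domains that share a superfamily but not a family are, in SCOP terms, likely to have a distant common ancestor without the 30% pairwise identity expected within a family.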

Source: https://image1.slideserve.com/1868737/structural-classification-of-proteins-n.jpg
• The CATH Protein Structure Classification database (http://www.cathdb.info/) is a free,
publicly available online resource that provides a hierarchical domain classification of protein
structures in the Protein Data Bank. Protein structures are classified using a combination of
automatic structural alignment with the SSAP program and manual comparison. Although the
protocol is mostly automatic, manual inspection is used to check assignments at some
critical stages, such as the detection of very distantly related homologues and analogues and
the assignment of novel architectures.
• It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet
Thornton and David Jones, and continues to be developed by the Orengo group at University
College London.
• Experimentally determined three-dimensional protein structures are obtained from the PDB
and split into their constituent polypeptide chains, where applicable.
CLASSIFICATION OF CATH
• The four main levels of the CATH hierarchy are as follows:
1. Class: the overall secondary-structure content of the domain, e.g., mainly α, mainly β, mixed α-β, or few secondary structures.
2. Architecture: Structures are classified according to their overall shape as determined by the
orientations of the secondary structures in 3D space but ignores the connectivity between them.
3. Topology: consists of structures with the same number, arrangement and connectivity of secondary
structure based on structural superposition.
4. Homologous superfamily: Functional and structural similarities are determined by sequence
comparison and then by structure comparison using SSAP. Two structures are in the same homologous
superfamily if any of the following hold:
• Sequence identity > 35%
• SSAP score > 80 and sequence identity > 20%
• SSAP score > 80, 60% of the larger structure is equivalent to the smaller structure, and the domains have
related functions
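
The three criteria above can be written as a single decision function. This is a minimal sketch of the rule as stated on this slide, not CATH's actual pipeline: the function name, the parameter names, and the choice to express the 60% equivalence as a fraction with a "greater than or equal" comparison are assumptions made for illustration.

```python
def same_homologous_superfamily(
    seq_identity: float,      # pairwise sequence identity, in percent
    ssap_score: float,        # SSAP structural alignment score (0 to 100)
    overlap_fraction: float,  # fraction of the larger structure equivalent to the smaller
    related_function: bool,   # whether the two domains have related functions
) -> bool:
    """Apply the three 'any of the following' criteria listed above."""
    # Criterion 1: high sequence identity alone is sufficient.
    if seq_identity > 35:
        return True
    # Criterion 2: strong structural similarity plus moderate sequence identity.
    if ssap_score > 80 and seq_identity > 20:
        return True
    # Criterion 3: strong structural similarity, 60% structural equivalence,
    # and related functions (threshold comparison is an assumption).
    if ssap_score > 80 and overlap_fraction >= 0.60 and related_function:
        return True
    return False
```

For example, a pair with 10% sequence identity and an SSAP score of 85 passes only through the third criterion, so it needs both sufficient structural overlap and related functions.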
• To illustrate the types of domains that one can observe at the architecture level,
let us look at the following 10 architectures from the alpha-beta, mainly alpha,
and mainly beta classes:

Class               Architecture
ALPHA-BETA (αβ)     2-Layer Sandwich (αβ1)
                    3-Layer (αβα) Sandwich (αβ2)
                    Alpha-Beta Barrel (αβ3)
                    Alpha-Beta Complex (αβ4)
                    Roll (αβ5)
MAINLY ALPHA (α)    Orthogonal Bundle (α1)
                    Up-down Bundle (α2)
MAINLY BETA (β)     Beta Barrel (β1)
                    Roll (β2)
                    Sandwich (β3)
Illustration of 10 different CATH architectures (A) 2-layer sandwich (αβ1). (B) 3-layer(αβα)
sandwich (αβ2). (C) alpha-beta barrel (αβ3). (D) alpha-beta complex (αβ4). (E) roll (αβ5). (F)
orthogonal bundle (α1). (G) up-down bundle (α2). (H) beta barrel (β1). (I) roll (β2). (J) sandwich
(β3)

Source: https://www.researchgate.net/profile/Senthilnathan_Rajendran2/publication/325182475/figure/fig2/AS:628569227153409@1526873990276/Illustration-of-10-different-CATH-architectures-subfolds-in-our-data-set-A-2-Layer.png
Source: https://images.slideplayer.com/25/7639286/slides/slide_3.jpg
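
CATH identifies each node with a dotted numeric code, one number per level; the codes quoted elsewhere in these slides include 3.40 for the 3-layer (αβα) sandwich, 1.10 for the orthogonal bundle, and the papain domains 1.10.190.10 and 3.10.160.10. The sketch below splits such a code into its class, architecture, topology, and homologous superfamily prefixes. The helper names are hypothetical, the class numbering (1 mainly alpha, 2 mainly beta, 3 alpha-beta) is supplied here as background rather than taken from the slides, and the architecture table is deliberately partial, holding only the two architectures whose codes the slides quote.

```python
# Class numbers (background knowledge, not stated in the slides).
CATH_CLASSES = {
    "1": "mainly alpha",
    "2": "mainly beta",
    "3": "alpha-beta",
}

# Partial table: only the architecture codes quoted in this document.
CATH_ARCHITECTURES = {
    "1.10": "orthogonal bundle",
    "3.40": "3-layer (aba) sandwich",
}


def parse_cath_code(code: str) -> dict:
    """Split a four-level code 'C.A.T.H' into the prefix for each level."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return {
        "class": parts[0],
        "architecture": ".".join(parts[:2]),
        "topology": ".".join(parts[:3]),
        "homologous_superfamily": ".".join(parts),
    }


def describe(code: str) -> str:
    """Name the class and, where the partial table allows, the architecture."""
    levels = parse_cath_code(code)
    class_name = CATH_CLASSES.get(levels["class"], "unknown class")
    arch_name = CATH_ARCHITECTURES.get(
        levels["architecture"], "architecture not in this partial table"
    )
    return f"{class_name} / {arch_name}"
```

Because each level's identifier is a prefix of the next, two domains share an architecture exactly when their first two numbers match, regardless of how their secondary structures are connected.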
APPLICATIONS OF SCOP
1) To study viral fold specificity: the SCOP classification was used by Cheng and Brooks
to study fold diversity in viral capsid proteins. Cheng and Brooks concluded that
viral capsids evolved under distinct evolutionary constraints from non‐capsid
proteins, and may provide valuable templates for protein engineering.
2) To study the evolution of oligomer geometries: in a study of the evolution of different
oligomeric states by Perica, Chothia, and Teichmann, structures were collected from
10 SCOP families that have “at least one dimer and one homologous tetramer or
hexamer with the same dimeric binding mode.” The study detected locations of
mutations that were correlated with different oligomerization states and found that
“such indirect, or allosteric mutations affecting intersubunit geometry via indirect
mechanisms are as important as interface sequence changes for evolution of
oligomeric states.”
APPLICATIONS OF CATH
• There was one study in which the authors used information that was available only in CATH
and not in SCOP. In the study by Bukhari and Caetano‐Anollés, phylogenetic data were used
to study the emergence of different CATH domain architectures.
• The focus of the study was on the CATH architecture level, which does not have an analogous
level in SCOP.
• The study found ancient architectures such as the CATH 3‐layer (αβα) sandwich (3.40) or
the orthogonal bundle (1.10) are involved in basic cellular functions, but more recently evolved
architectures such as prism, propeller, 2‐solenoid, super‐roll, clam, trefoil, and box are not
widely distributed.
• That study also benchmarked the phylogenetic analysis of CATH domains compared with
SCOP domains, measuring the distribution of CATH architectures, topologies, and homologies,
and SCOP folds, superfamilies, and families in Bacteria, Eukarya, and Archaea
superkingdoms.
COMPARATIVE DISCREPANCIES BETWEEN SCOP & CATH
CATH assigns more domains than SCOP because CATH defines domains purely structurally,
whereas SCOP takes into account whether or not a domain is observed recurring in another
superfamily or as a separate single-domain fold.
(a) Structure of papain (1ppo), a cysteine proteinase from papaya, with the catalytic histidine, asparagine,
and cysteine shown as ball-and-stick residues. SCOP classifies the structure as one domain (SCOP code:
4.3.1), leaving the catalytic cysteine, histidine, and asparagine together to form the active site, whereas
CATH splits the structure into two, as shown by blue (CATH code: 1.10.190.10) and yellow (3.10.160.10)
colouring, rendering each domain effectively functionless. After this study by Hadley & Jones, papain is
now treated as a single domain in CATH.
Source: Fig. 4, https://doi.org/10.1016/S0969-2126(99)80177-4
COMPARATIVE DISCREPANCIES BETWEEN SCOP & CATH

Examples of class assignment disagreements between CATH and SCOP. (a) SCOP ignores the
small helical elements in the haemagglutinin structure and classifies the domain as mainly β,
whereas CATH takes the helices into account and considers the structure αβ. (b) In case of
lysozyme superfamily (e.g. 1lys), CATH disregards the presence of small β strands and
considers the protein mainly α, whereas SCOP takes into account the functional and
evolutionary importance of these strands, and calls the lysozymes α/β.
Source: https://ars.els-cdn.com/content/image/1-s2.0-S0969212699801774-gr6_lrg.jpg
DIFFERENCE BETWEEN SCOP & CATH

[Figure: hierarchy of the CATH structural classification system compared to the corresponding SCOP levels]

Source: https://www.researchgate.net/profile/Syed_Abbas_Bukhari/publication/235993836/figure/fig1/AS:299889826254851@1448510714283/Hierarchy-of-the-CATH-structural-classification-system-compared-to-corresponding-SCOP.png
DIFFERENCE BETWEEN SCOP & CATH

• In CATH, there is only one class to represent mixed alpha-beta.
• In SCOP there are two:
  • α/β: beta structure is largely parallel, made of β-α-β motifs
  • α+β: alpha and beta structure segregated to different parts of the structure
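
The class-level difference above amounts to a many-to-one mapping: two SCOP classes collapse into one CATH class. The sketch below encodes that nominal correspondence. The dictionary and function names are hypothetical, and pairing all-α with mainly α and all-β with mainly β is an assumption made for illustration; as the haemagglutinin and lysozyme examples show, individual domains can still be assigned to non-corresponding classes.

```python
from typing import Optional

# Nominal class correspondence (illustrative assumption, not an official mapping).
SCOP_TO_CATH_CLASS = {
    "all-alpha": "mainly alpha",
    "all-beta": "mainly beta",
    "alpha/beta": "alpha-beta",   # largely parallel beta structure, beta-alpha-beta motifs
    "alpha+beta": "alpha-beta",   # segregated alpha and beta regions
}


def cath_class_for(scop_class: str) -> Optional[str]:
    """Return the CATH class that nominally corresponds to a SCOP class."""
    return SCOP_TO_CATH_CLASS.get(scop_class)


def classes_agree(scop_class: str, cath_class: str) -> bool:
    """True when the two assignments fall in nominally corresponding classes."""
    return cath_class_for(scop_class) == cath_class
```

A check such as `classes_agree("alpha+beta", "mainly alpha")` returns `False`, which is one way to flag domains whose class assignments differ between the two databases for closer inspection.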
CONCLUSION

SCOP is a valuable resource for detailed evolutionary information, and CATH is a valuable
source of geometric information.
REFERENCES
• Hadley C., Jones D. T. (1999). A systematic comparison of protein structure
classifications: SCOP, CATH and FSSP. Structure 7:1099–1112.
https://doi.org/10.1016/S0969-2126(99)80177-4
• Burkowski F. J. (2008). Structural Bioinformatics: An Algorithmic Approach.
Boca Raton, FL: CRC Press, Taylor & Francis Group.
• Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a
structural classification of proteins database for the investigation of
sequences and structures. J. Mol. Biol. 247:536–540.
THANK YOU
