
Data Bases

The document discusses various types of chemical databases, including structure, substructure, reaction, and relational databases, highlighting their importance in retrieving and analyzing chemical information. It emphasizes the role of the Cambridge Structural Database (CSD) and Protein Data Bank (PDB) in providing curated structural data for organic and biological molecules. Additionally, it covers the concept of 3D pharmacophores, which are essential for understanding molecular interactions in drug design.

Uploaded by

pgchemistry2225
Copyright © All Rights Reserved

DATA BASES

• https://www.science.co.il/chemistry/databases/Structure-databases.php
• InChI representation (formerly IChI), developed by IUPAC
DATA BASE SEARCH
• 2D search: structure search, substructure search, reaction search,
patent search, relational search
• 3D search: Cambridge Structural Database (CSD), Protein Data Bank,
3D pharmacophore


STRUCTURE SEARCHING
• The simplest searching task involves the extraction from the
database of information associated with a particular structure.
• For example, one may wish to look up the boiling point of acetic
acid or the price of acetone.
• The first step is to convert the structure provided by the user (the
query) into the relevant canonical ordering or representation.
• One could then search through the database, starting at the
beginning, in order to find this structure. However, the canonical
representation can also provide the means to retrieve
information about a given structure more directly from the
database through the generation of a hash key. A hash key is
typically an integer with a value between 0 and some large
number (e.g. 2^32 − 1).
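The hash-key lookup described above can be sketched as follows. This is a minimal illustration, not how any particular chemical database is implemented: the SMILES strings are assumed to be already canonical, CRC-32 is used only because it yields an integer between 0 and 2^32 − 1, and the `StructureIndex` class and its records are hypothetical.

```python
import zlib


def hash_key(canonical: str) -> int:
    """Map a canonical structure string to an integer in [0, 2**32 - 1]."""
    return zlib.crc32(canonical.encode("utf-8"))


class StructureIndex:
    """Toy index: hash key -> list of (canonical string, record)."""

    def __init__(self):
        self._buckets = {}

    def add(self, canonical, record):
        self._buckets.setdefault(hash_key(canonical), []).append((canonical, record))

    def lookup(self, canonical):
        # Different structures may share a hash key, so the canonical
        # strings are compared before a record is returned.
        for stored, record in self._buckets.get(hash_key(canonical), []):
            if stored == canonical:
                return record
        return None


index = StructureIndex()
# Assumed canonical SMILES; boiling points rounded to the nearest degree.
index.add("CC(=O)O", {"name": "acetic acid", "boiling_point_c": 118})
index.add("CC(C)=O", {"name": "acetone", "boiling_point_c": 56})

record = index.lookup("CC(=O)O")
```

The lookup goes straight to one bucket instead of scanning the database from the beginning, which is the point of generating the key from the canonical representation.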
SUBSTRUCTURE SEARCHING
• A substructure search identifies all the molecules
in the database that contain a specified
substructure.
• A simple example would be to identify all
structures that contain a particular functional
group or sequence of atoms such as a carboxylic
acid, benzene ring or C5 alkyl chain.
• Most chemical database systems use a two-stage
mechanism to perform substructure search.
• The first step involves the use of screens to
rapidly eliminate molecules that cannot
possibly match the substructure query. The
aim is to discard a large proportion (ideally
more than 99%) of the database.
• The second step is a slower atom-by-atom
(subgraph isomorphism) match, applied only to
the molecules that pass the screens.
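The screening step can be illustrated with bitstrings, one bit per structural feature. This is a sketch under assumptions: the feature dictionary `FEATURE_BITS`, the three-molecule database and the function names are invented for illustration, and real systems use much longer screens.

```python
# Hypothetical fragment dictionary: one bit per structural feature.
FEATURE_BITS = {
    "carboxylic_acid": 1 << 0,
    "benzene_ring": 1 << 1,
    "C5_alkyl_chain": 1 << 2,
    "amine": 1 << 3,
}


def make_screen(features):
    """Combine the bits of all listed features into one integer bitstring."""
    bits = 0
    for feature in features:
        bits |= FEATURE_BITS[feature]
    return bits


def passes_screen(molecule_bits, query_bits):
    """A molecule can only match if every bit set in the query is set in it."""
    return (molecule_bits & query_bits) == query_bits


# Toy database with precomputed screens.
database = {
    "benzoic acid": make_screen(["carboxylic_acid", "benzene_ring"]),
    "hexanoic acid": make_screen(["carboxylic_acid", "C5_alkyl_chain"]),
    "aniline": make_screen(["benzene_ring", "amine"]),
}

query = make_screen(["carboxylic_acid"])
candidates = sorted(name for name, bits in database.items()
                    if passes_screen(bits, query))
```

Molecules that fail the screen are discarded without further work; only the surviving candidates need the more expensive atom-by-atom comparison.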
REACTION DATA BASE
• Reactions are central to the subject of
chemistry, being the means by which new
chemical entities are produced. As any student
of chemistry is aware, a huge number of new
reactions and new applications of existing
reactions are published every year.
• The Beilstein Handbuch der Organischen
Chemie, for example, contains information
from 1771 onwards.
WHY DO YOU NEED REACTION SEARCH?
• When planning a synthesis, a chemist may wish to
search a reaction database in a variety of ways.
• A simple type of query would involve an exact
structure search against the products, to
determine whether there is an established
synthesis for the compound of interest.
• Another type of query is a reaction search
involving the structures or substructures of the
precursor or reactant and the product.
• A more general type of search would involve
identification of all deposited syntheses for a
particular named reaction, in order to identify
a range of possible reaction conditions to
attempt. Other quantities of interest may
include specific reagents, catalysts, solvents,
the reaction conditions (temperature,
pressure, pH, time, etc.) together with the
yield.
• One may of course wish to combine more
than one of these criteria in a single query
(e.g. “find all deposited reaction schemes
which involve the reduction of an aliphatic
aldehyde to an alcohol where the temperature
is less than 150 °C and the yield is greater than
75%”).
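The combined query quoted above can be sketched with an in-memory SQLite table. Everything here is assumed for illustration: the `reactions` table, its column names and the four rows are made up, and the reactant and product classes are stored as plain text labels, whereas a real reaction database would find them by structure or substructure search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reactions (
        reaction_id    INTEGER PRIMARY KEY,
        reaction_type  TEXT,
        reactant_class TEXT,
        product_class  TEXT,
        temperature_c  REAL,
        yield_pct      REAL
    )
    """
)
# Invented example rows, not literature data.
conn.executemany(
    "INSERT INTO reactions VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "reduction", "aliphatic aldehyde", "alcohol", 25.0, 92.0),
        (2, "reduction", "aliphatic aldehyde", "alcohol", 180.0, 88.0),
        (3, "reduction", "aliphatic aldehyde", "alcohol", 0.0, 60.0),
        (4, "oxidation", "alcohol", "aliphatic aldehyde", 20.0, 81.0),
    ],
)

# All criteria are combined in one WHERE clause.
hits = [
    row[0]
    for row in conn.execute(
        """
        SELECT reaction_id FROM reactions
        WHERE reaction_type = 'reduction'
          AND reactant_class = 'aliphatic aldehyde'
          AND product_class = 'alcohol'
          AND temperature_c < 150
          AND yield_pct > 75
        ORDER BY reaction_id
        """
    )
]
```

Only reaction 1 satisfies every condition: reaction 2 is too hot, reaction 3 has too low a yield, and reaction 4 runs in the opposite direction.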
RELATIONAL DATA BASE
• Relational database systems have been used for
many years to store numeric and textual data.
• Most databases include an identifier for each
structure, such as a Chemical Abstracts Registry
Number (a CAS number), an internal registry
number, catalogue number or chemical name.
Other types of data that may be present include
measured and calculated physical properties,
details of supplier and price where appropriate,
date of synthesis and so on.
• A typical database contains a number of
tables, linked via unique identifiers. By way of
example, suppose we wish to construct a
simple relational database to hold inventory
and assay data relating to biological testing.
• In a relational database the data is stored in
rectangular tables, the columns corresponding
to the data items and each row representing a
different piece of data.
EXAMPLE OF RELATIONAL DATA BASE
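A small relational database holding inventory and assay data, as described above, might look like the following sketch. The three tables, their column names, the compound identifiers and all values are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE compounds (
        compound_id TEXT PRIMARY KEY,   -- internal registry number
        name        TEXT
    );
    CREATE TABLE inventory (
        compound_id TEXT REFERENCES compounds(compound_id),
        amount_mg   REAL,
        location    TEXT
    );
    CREATE TABLE assay_results (
        compound_id TEXT REFERENCES compounds(compound_id),
        assay_name  TEXT,
        ic50_nm     REAL
    );
    """
)
# Invented rows: each row is one record, each column one data item.
conn.executemany("INSERT INTO compounds VALUES (?, ?)",
                 [("CPD-001", "compound A"), ("CPD-002", "compound B"),
                  ("CPD-003", "compound C")])
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)",
                 [("CPD-001", 50.0, "store 1"), ("CPD-002", 5.0, "store 1"),
                  ("CPD-003", 120.0, "store 2")])
conn.executemany("INSERT INTO assay_results VALUES (?, ?, ?)",
                 [("CPD-001", "assay X", 12.0), ("CPD-002", "assay X", 30.0),
                  ("CPD-003", "assay X", 450.0)])

# The tables are linked through the shared compound_id.
active_in_stock = conn.execute(
    """
    SELECT c.compound_id, i.amount_mg, a.ic50_nm
    FROM compounds AS c
    JOIN inventory AS i ON i.compound_id = c.compound_id
    JOIN assay_results AS a ON a.compound_id = c.compound_id
    WHERE a.ic50_nm < 100 AND i.amount_mg > 10
    ORDER BY c.compound_id
    """
).fetchall()
```

The join answers a question that no single table can: which compounds are both active in the assay and available in sufficient quantity.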
3D DATA BASE SEARCH
• EXPERIMENTAL 3D DATABASES:
– Cambridge Structural Database (CSD)
– Protein Data Bank (PDB)
Cambridge Structural Database (CSD)

• Maintained by the Cambridge Crystallographic Data Centre (CCDC)


• The CSD is the world's repository of highly
curated experimentally determined organic and
metal-organic crystal structures. It is used
globally by scientists in over 70 countries to
understand how molecules behave and interact
in three dimensions in the solid form and
ultimately how this affects physical properties.
• It holds over 1 million structures.
• CCDC are world-leading experts in structural chemistry data, software
and knowledge for materials and life science research and application.
• They are dedicated to the advancement of chemistry and
crystallography for the public benefit. They specialise in the collation,
preservation and application of scientific structural data for use in
pharmaceutical discovery, materials development and research and
education.
• CCDC compile and distribute the Cambridge Structural Database (CSD),
a certified trusted database of fully curated and enhanced organic and
metal-organic structures, used by researchers across the globe.
• Their cutting-edge software empowers scientists to extract invaluable
insights from the vast dataset, informing and accelerating their
research & development.
• https://www.ccdc.cam.ac.uk/structures/UnlicensedEnquiry?id=UnitCellSearch
PDB-Protein Data Bank
• The Protein Data Bank (PDB) contains more than 44,000 X-ray and
nuclear magnetic resonance (NMR) structures of proteins and
protein–ligand complexes, and some nucleic acid and
carbohydrate structures.
• The Protein Data Bank (PDB)[1] is a database for the
three-dimensional structural data of large biological molecules, such as
proteins and nucleic acids. The data, typically obtained by
X-ray crystallography, NMR spectroscopy, or, increasingly,
cryo-electron microscopy, and submitted by biologists and
biochemists from around the world, are freely accessible on the
Internet via the websites of its member organisations (PDBe,[2]
PDBj,[3] RCSB,[4] and BMRB[5]). The PDB is overseen by an
organisation called the Worldwide Protein Data Bank (wwPDB).
• The PDB is more a communal repository of data
files, one for each protein structure. Founded in
1971, the PDB now contains approximately 44,000
structures, most obtained using X-ray
crystallography but with some determined using
NMR.
• The PDB is most commonly accessed via a web
interface, which enables structures to be retrieved
using various textual queries (such as by author,
protein name, literature citation).
• Some web interfaces also enable searches to
be performed using amino acid sequence
information. As the number of protein
structures has grown, it has been
recognised that the “flat file” system is
inappropriate, and more modern database
systems and techniques have been introduced
based on the information in the PDB.
• The PDB has been used extensively to further
our understanding of the nature of protein
structure and its relationship to the amino
acid sequence. For example, various
classification schemes have been proposed for
dividing protein structures into families.
• The structures in the PDB also form the basis for
comparative modelling (also known as homology
modelling), where one attempts to predict the
conformation of a protein of known sequence but
unknown structure using the known 3D structure
of a related protein.
• The PDB has also provided information about the
nature of the interactions between amino acids,
and between proteins and water molecules and
small-molecule ligands.
3D PHARMACOPHORE
• A major use of 3D database systems is for the identification of
compounds that possess 3D properties believed to be
important for interaction with a particular biological target.
These requirements can be expressed in a variety of ways, one
of the most common being as a 3D pharmacophore. A 3D
pharmacophore is usually defined as a set of features together
with their relative spatial orientation. Typical features include
hydrogen bond donors and acceptors, positively and negatively
charged groups, hydrophobic regions and aromatic rings. The
use of such features is a natural extension of the concept of
bioisosterism, which recognises that certain functional groups
have similar biological, chemical and physical properties.
• Bioisosteres are chemical substituents or groups with
similar physical or chemical properties that produce
broadly similar biological properties to those of another
chemical compound.
• PDB file formats:
• The Protein Data Bank (pdb) file format is a
textual file format describing the three-
dimensional structures of molecules held in the
Protein Data Bank. The pdb format accordingly
provides for description and annotation of protein
and nucleic acid structures including atomic
coordinates, secondary structure assignments, as
well as atomic connectivity. In addition,
experimental metadata are stored. PDB format is
the legacy file format for the Protein Data Bank
which now keeps data on biological
macromolecules in the newer mmCIF file format.
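To make the description of the legacy format concrete, the sketch below reads the atomic coordinates from one fixed-width ATOM record. The column positions follow the published PDB format layout; the example atom and its coordinate values are invented, and `parse_atom_record` is a hypothetical helper, not part of any PDB toolkit.

```python
# Build one ATOM record with explicit field widths so every value lands
# in the right columns (example values are invented).
EXAMPLE_LINE = (
    f"{'ATOM':<6}{1:>5} {' N':<4}{'':1}{'MET':>3} {'A'}{1:>4}{'':4}"
    f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}{1.00:6.2f}{9.67:6.2f}"
    f"{'':10}{'N':>2}"
)


def parse_atom_record(line):
    """Extract a few fields from a fixed-width ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "element": line[76:78].strip(),    # columns 77-78
    }


atom = parse_atom_record(EXAMPLE_LINE)
```

Because the format is column-based rather than delimiter-based, fields are read by slicing; splitting on whitespace can fail when adjacent fields run together.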
Pharmacophore
• A pharmacophore is an abstract description
of molecular features that are necessary for
molecular recognition of a ligand by a
biological macromolecule.
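A 3D pharmacophore, understood as a set of features plus their relative spatial arrangement, can be sketched as feature types with allowed inter-feature distance ranges. The pharmacophore below, its distance ranges (in ångströms) and the two conformers are hypothetical, and the brute-force matching is only meant to show the idea, not how 3D database systems search in practice.

```python
from itertools import permutations
from math import dist

# Hypothetical three-point pharmacophore: feature types and, for each
# pair of features, the allowed distance range in angstroms.
PHARMACOPHORE = {
    "features": ["donor", "acceptor", "aromatic"],
    "distances": {(0, 1): (2.5, 3.5), (0, 2): (4.0, 6.0), (1, 2): (5.0, 7.0)},
}


def matches(pharmacophore, conformer_features):
    """Return True if some assignment of conformer features satisfies
    every feature type and every distance range.

    conformer_features is a list of (feature_type, (x, y, z)) tuples.
    """
    wanted = pharmacophore["features"]
    indices = range(len(conformer_features))
    for combo in permutations(indices, len(wanted)):
        # Feature types must agree position by position.
        if any(conformer_features[i][0] != t for i, t in zip(combo, wanted)):
            continue
        # Every constrained pair must fall inside its distance range.
        if all(
            lo <= dist(conformer_features[combo[a]][1],
                       conformer_features[combo[b]][1]) <= hi
            for (a, b), (lo, hi) in pharmacophore["distances"].items()
        ):
            return True
    return False


# Invented feature positions for two conformers.
conformer_hit = [("donor", (0.0, 0.0, 0.0)),
                 ("acceptor", (3.0, 0.0, 0.0)),
                 ("aromatic", (0.0, 5.0, 0.0))]
conformer_miss = [("donor", (0.0, 0.0, 0.0)),
                  ("acceptor", (3.0, 0.0, 0.0)),
                  ("aromatic", (0.0, 9.0, 0.0))]

hit = matches(PHARMACOPHORE, conformer_hit)
miss = matches(PHARMACOPHORE, conformer_miss)
```

The first conformer satisfies all three distance ranges; in the second, the aromatic ring lies too far from the donor, so it is rejected even though it has the right feature types.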
