CENG3300 Lecture 4
Engineering
Lecture 5
Introduction to cheminformatics and bioinformatics
• ILOs
• Know the common data types in cheminformatics and bioinformatics;
• Digitize molecular structure data;
• Calculate simple molecular features;
• Know the tools for processing biological data.
Cheminformatics
• Chemoinformatics, chemical informatics, chemioinformatics
• Computer and chemistry
• Scope and Applications
• Chemical Data Management
• Molecular Descriptors and Property Prediction
• Chemical Similarity and Clustering
• Quantitative Structure-Activity Relationship
• Cycles
• Break a bond in the cycle and use a digit to label the break.
Q: What is the SMILES for Cyclohexane? Benzene? Toluene?
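The ring-closure rule can be checked with a small parser. The following is only a minimal sketch, not a full SMILES implementation: it assumes the organic-subset atoms, single-digit ring labels, and branches, and it ignores bond orders, bracket atoms, and charges. The names `parse_smiles` and `ring_count` are illustrative and do not come from any library. The three strings used at the end are the usual answers to the question above.

```python
import re

# Tokens: two-letter halogens first, then one-letter organic-subset atoms
# (upper case = aliphatic, lower case = aromatic), ring-closure digits,
# branch parentheses, and bond symbols.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\d|[()]|[-=#:]")

def parse_smiles(smiles):
    """Return (atoms, bonds); bonds are (i, j) index pairs, bond order ignored."""
    atoms, bonds = [], []
    prev = None        # index of the atom the next atom will bond to
    stack = []         # saved branch points for '(' ... ')'
    open_rings = {}    # ring label digit -> index of the atom that opened it
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character at position {pos}")
        tok = m.group()
        pos = m.end()
        if tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok.isdigit():
            if tok in open_rings:
                # Second occurrence of the digit: restore the broken ring bond.
                bonds.append((open_rings.pop(tok), prev))
            else:
                open_rings[tok] = prev
        elif tok in "-=#:":
            continue   # bond orders are ignored in this sketch
        else:
            atoms.append(tok)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1))
            prev = len(atoms) - 1
    if open_rings:
        raise ValueError("unclosed ring label")
    return atoms, bonds

def ring_count(smiles):
    """Number of rings in a connected molecule: bonds - atoms + 1."""
    atoms, bonds = parse_smiles(smiles)
    return len(bonds) - len(atoms) + 1

for name, smi in [("cyclohexane", "C1CCCCC1"),
                  ("benzene", "c1ccccc1"),
                  ("toluene", "Cc1ccccc1")]:
    print(name, smi, "rings:", ring_count(smi))
```

In each string the digit 1 appears twice: once on the atom where the ring was opened and once on the atom that closes it.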
SMILES(cont.)
• Disconnections
• A period “.” separates nonbonded molecules.
• Isomeric SMILES
• Slashes ( / \ ) denote configuration around double bonds, e.g. F/C=C/F and F\C=C\F (both trans).
• At ( @ ) denotes configuration around chiral centers.
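Because SMILES is a plain text string, both conventions can be explored with ordinary string operations. This is a rough sketch under simplifying assumptions: `split_components` and `strip_stereo` are hypothetical helper names, splitting on "." assumes no ring-closure digit is shared across the dot, and deleting the stereo marks is only meant to show that they are annotations layered on top of the connectivity.

```python
def split_components(smiles):
    """Split a SMILES string on '.' into its dot-separated parts."""
    return [part for part in smiles.split(".") if part]

# Translation table that deletes the stereo marks '/', '\' and '@'.
_STEREO_MARKS = str.maketrans("", "", "/\\@")

def strip_stereo(smiles):
    """Drop double-bond and chirality marks, keeping only the connectivity."""
    return smiles.translate(_STEREO_MARKS)

print(split_components("[Na+].[Cl-]"))   # two nonbonded ions
print(strip_stereo("F/C=C/F"))           # trans isomer
print(strip_stereo("F/C=C\\F"))          # cis isomer, same connectivity
print(strip_stereo("N[C@@H](C)C(=O)O"))  # an alanine enantiomer
```

Both difluoroethene isomers reduce to the same non-isomeric string, which is why the slashes are needed to tell them apart.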
SMILES(cont.)
• SMILES is not natural for humans (readable, but not ideal)
• SMILES is friendly to computers (text string)
• Compared to SMILES:
• InChI is standardized and unique
• Lengthy representation, not human readable
• Limited stereochemistry
SMILES arbitrary target specification (SMARTS)
• Regular expressions for molecules.
• All SMILES are SMARTS (exact matches). Additionally, SMARTS
support:
• wild cards
• C~*~C any atom can be between two carbons using any (~) bond
• a1aaaaa1 any aromatic 6 atom ring
• property testing
• [R] atom in a ring
• [#6] atomic number is 6 (matches aromatic or aliphatic)
• [D3] atom with three explicit bonds (degree)
SMARTS (cont.)
• Additionally, SMARTS support:
• logical operators: not (!), and (& or ;), or (,)
• [!C&R] not aliphatic carbon and in ring
• [F,Cl,Br,I] one of the first four halogens
• matching an atomic environment ('recursive' SMARTS)
• [$(*O);$(*C)] this matches one atom that is bound to both C and O
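The property tests and logical operators can be sketched as a tiny evaluator for a single bracketed atom expression. This is an illustration under stated assumptions, not a SMARTS engine: atoms are hand-written dictionaries (the keys `symbol`, `atomic_num`, `aromatic`, `in_ring`, `degree` are invented for this example), only the primitives listed in these slides are supported, and recursive `$(...)` environments and implicit "and" by juxtaposition are left out.

```python
import re

def match_primitive(prim, atom):
    """Test one atomic primitive against an atom record (a dict)."""
    if prim == "*":
        return True
    if prim == "R":
        return atom["in_ring"]
    if prim == "a":
        return atom["aromatic"]
    if prim == "A":
        return not atom["aromatic"]
    m = re.fullmatch(r"#(\d+)", prim)
    if m:
        return atom["atomic_num"] == int(m.group(1))
    m = re.fullmatch(r"D(\d+)", prim)
    if m:
        return atom["degree"] == int(m.group(1))
    if prim[0].isupper():
        # Upper-case element symbol: aliphatic atom of that element.
        return atom["symbol"] == prim and not atom["aromatic"]
    # Lower-case element symbol: aromatic atom of that element.
    return atom["symbol"].lower() == prim and atom["aromatic"]

def match_term(term, atom):
    """Handle the 'not' operator, which binds tightest."""
    if term.startswith("!"):
        return not match_term(term[1:], atom)
    return match_primitive(term, atom)

def match_atom(smarts, atom):
    """Evaluate a bracketed atom expression such as '[!C&R]'."""
    expr = smarts.strip("[]")
    # Precedence from loosest to tightest: ';' (and), ',' (or), '&' (and), '!'.
    return all(
        any(
            all(match_term(t, atom) for t in conj.split("&"))
            for conj in alt.split(",")
        )
        for alt in expr.split(";")
    )

benzene_c = {"symbol": "C", "atomic_num": 6, "aromatic": True,
             "in_ring": True, "degree": 2}
cyclohexane_c = {"symbol": "C", "atomic_num": 6, "aromatic": False,
                 "in_ring": True, "degree": 2}
chlorine = {"symbol": "Cl", "atomic_num": 17, "aromatic": False,
            "in_ring": False, "degree": 1}

print(match_atom("[!C&R]", benzene_c))      # aromatic ring carbon
print(match_atom("[!C&R]", cyclohexane_c))  # aliphatic ring carbon
print(match_atom("[F,Cl,Br,I]", chlorine))
print(match_atom("[#6]", benzene_c))
```

The nested all/any/all mirrors the operator precedence: `;` is a low-priority "and", `,` is "or", and `&` is a high-priority "and".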
Molecular fingerprint
Differentiate
Molecular fingerprint
• Naïve fingerprint
CHHCHH-----
Molecular fingerprint
• Implementation
Molecular fingerprint
• Extended connectivity molecular fingerprint
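The idea behind extended connectivity fingerprints can be sketched as follows. This is a heavily simplified illustration and not the published ECFP algorithm: each atom starts from an identifier built only from its element and degree, each round mixes in the neighbors' identifiers, and the collected identifiers are folded into a fixed number of bits. The function names, the use of CRC32 as the hash, and the default `n_bits=64` are assumptions made for this example; bond orders, charges, and duplicate-environment removal are ignored.

```python
import zlib

def stable_hash(value):
    """Deterministic 32-bit hash of a Python value, computed from its repr."""
    return zlib.crc32(repr(value).encode("utf-8"))

def circular_fingerprint(symbols, bonds, radius=2, n_bits=64):
    """Return the set of on-bit positions of a simplified circular fingerprint.

    symbols: list of element symbols (heavy atoms only)
    bonds:   list of (i, j) atom-index pairs
    """
    neighbors = {i: [] for i in range(len(symbols))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Round 0: the identifier describes the atom alone.
    ids = [stable_hash((sym, len(neighbors[i])))
           for i, sym in enumerate(symbols)]
    features = set(ids)

    # Each further round folds in the neighbors' previous identifiers, so
    # after k rounds an identifier describes a radius-k environment.
    for _ in range(radius):
        ids = [
            stable_hash((ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
            for i in range(len(symbols))
        ]
        features.update(ids)

    # Fold the identifiers into a fixed-length bit vector.
    return {f % n_bits for f in features}

# Ethanol written with two different atom orderings (C-C-O and O-C-C).
fp1 = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
fp2 = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(sorted(fp1))
print(fp1 == fp2)
```

Sorting the neighbor identifiers before hashing is what makes the result independent of the order in which atoms are listed.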
Molecular fingerprint (cont.)
• Similarity calculation
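A common choice for comparing two fingerprints is the Tanimoto coefficient, T(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of bits switched on in each fingerprint. The sketch below assumes this measure is the one intended here; the example bit sets are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)

fp_a = {1, 4, 7, 9}   # hypothetical on-bits of molecule A
fp_b = {1, 4, 8}      # hypothetical on-bits of molecule B
print(tanimoto(fp_a, fp_b))  # 2 shared bits out of 5 distinct bits -> 0.4
```

The value ranges from 0 (no shared bits) to 1 (identical bit sets).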
http://www.ncbi.nlm.nih.gov
https://www.ebi.ac.uk
Bioinformatics databases
AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref,
BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase,
CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST,
dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC,
ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER,
FlyBase, FlyView, GCRDB, GDB, GENATLAS, GenBank, GeneCards, Genline, GenLink,
GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB,
HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase,
HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA,
KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI,
MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase,
MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase,
PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS,
GenBank
Bioinformatics topics
Include but are not limited to:
• Organization, classification, dissemination and analysis of biological and biomedical
data (particularly '-omics' data).
• Biological sequence analysis and phylogenetics.
• Genome organization and evolution.
• Regulation of gene expression and epigenetics.
• Biological pathways and networks in healthy & disease states.
• Protein structure prediction from sequence.
• Modeling and prediction of the biophysical properties of biomolecules for binding
prediction and drug design.
• Design of biomolecular structure and function.
References
• http://mscbio2025.csb.pitt.edu/notes/cheminformatics.slides.html#/
• https://bioboot.github.io/bggn213_f17/lectures/#1