Class03-What Is bioinformatics-2022-SIV2001

Bioinformatics is the application of computational tools to analyze and interpret biological data, facilitating advancements in molecular biology, genomics, and medicine. It encompasses data acquisition, management, analysis, and the development of databases and software for various biological applications. The field has evolved significantly since its inception, impacting areas such as drug development, personalized medicine, and agricultural biotechnology.


26/10/2022

SIV 2001

Class 3
What is bioinformatics

Biology
• Science that deals with living organisms and life processes.
• Plant, animal and microbial life of a particular region or environment.
• Properties and vital phenomena exhibited by an organism or group of
organisms.
• Elements, processes and interactions in living beings


Computer

• One who, or that which, computes.
• Specifically: a programmable electronic device that can store, retrieve and process data, and can also store programs that control its own action.

Information Science

• The collection, classification, storage, retrieval and distribution of recorded knowledge
• Treated both as a pure and as an applied science


Information Theory

• Theory that deals statistically with information.
• The measurement of its contents in terms of its distinguishing characteristics.
• The efficiency of processes of communication of information between humans and machines (telecommunication and computing machines)
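One common way to measure information content statistically is Shannon entropy. The sketch below is an illustration that is not part of the original slides; the function name `shannon_entropy` and the toy sequences are assumptions chosen for the example. It reports the average number of bits per symbol in a sequence.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Return the Shannon entropy of a symbol sequence, in bits per symbol."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    # H = sum over symbols of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical toy sequences, not real biological data.
print(shannon_entropy("AAAAAAAA"))  # a single repeated symbol carries 0.0 bits
print(shannon_entropy("ACGTACGT"))  # four equally frequent symbols give 2.0 bits
```

A sequence that uses all four nucleotides equally often reaches the maximum of 2 bits per base; repetitive sequences score lower.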

Knowledge
• Knowing something with familiarity gained through
experience or association
• Understanding of a science, art, or technique
• Being aware of something
• Apprehending fact through reasoning
• Being learned.

• The sum of what is known


Summary
• Data are the facts.

• Information is the organisation of, associations between, and constraints upon data that allow it to be used by a user or a machine.

• Knowledge is the interpretation of information and its use in a problem-solving context. Knowledge can lead to new insights, which in turn lead to new innovations and ultimately to wealth creation and improvements in the quality of life.

Bioinformatics - definitions
• Bioinformatics (NIH):
“research, development, or application of computational
tools and approaches for expanding the use of biological,
medical, behavioral or health data, including those to
acquire, store, organize, archive, analyze, or visualize such data.”

• Bioinformatics involves the technology that uses computers or information technologies for storage, retrieval, manipulation, and distribution of information related to biological data


What is Bioinformatics?

[Diagram: Bioinformatics at the centre of Biology, Molecular Biology, Chemistry, Medicine, Mathematics, Physics, Statistics, Computer Science and Informatics]


Why Bioinformatics?

• To better understand a living cell and how it functions at the molecular level.

• Biologists were searching for methods and algorithms to store, retrieve, analyze and interpret their huge amounts of empirical biological data.

Origin of bioinformatics and biological databases

• The first protein sequence reported was that of bovine insulin in 1956, consisting of 51 residues.

• Nearly a decade later, the first nucleic acid sequence was reported, that of yeast alanine tRNA with 77 bases.

• In 1965, Dayhoff gathered all the available sequence data to create the first bioinformatics database (Atlas of Protein Sequence and Structure).


Bioinformatics history
• Earliest bioinformatics exercise: Margaret Dayhoff (1965) compiled the first protein sequence database, the Atlas of Protein Sequence and Structure (now PIR).

• 1970s:
• Protein Data Bank (PDB), established in 1971 with fewer than a dozen X-ray crystallographic protein structures.
• Protein Sequence Database (PSD) by Margaret Dayhoff
• Sequence alignment algorithm of Needleman and Wunsch
• Sanger sequencing
• Routine sequence comparisons and database searching
• Protein structure prediction algorithm of Chou and Fasman
• The 1980s saw the establishment of GenBank and FASTA; BLAST followed in 1990.

Bioinformatics history
• 1980s:
• Human Genome Project started late 1980s
• The polymerase chain reaction (PCR) was described by Kary Mullis and co-workers
• The Smith-Waterman algorithm for sequence alignment
• The SWISS-PROT database
• The FASTP algorithm was published by Lipman & Pearson
• The National Center for Biotechnology Information (NCBI)
was established
• 1990s:
• The BLAST program by Altschul et al.
• 2000 and beyond:
• A draft of the human genome (3,000 Mbp) was published


The growth of biological data – nucleotide data

Scales of file size

As of 15 June 2019


Top organisms in GenBank (Release 191)

Organism                          Base pairs
Homo sapiens                      1.6310774187 × 10^10
Mus musculus                      9.974977889 × 10^9
Rattus norvegicus                 6.521253272 × 10^9
Bos taurus                        5.386258455 × 10^9
Zea mays                          5.062731057 × 10^9
Sus scrofa                        4.88786186 × 10^9
Danio rerio                       3.120857462 × 10^9
Strongylocentrotus purpuratus     1.435236534 × 10^9
Macaca mulatta                    1.256203101 × 10^9
Oryza sativa Japonica Group       1.255686573 × 10^9
Nicotiana tabacum                 1.197357811 × 10^9
Xenopus (Silurana) tropicalis     1.249938611 × 10^9
Drosophila melanogaster           1.11996522 × 10^9
Pan troglodytes                   1.008323292 × 10^9
Arabidopsis thaliana              1.144226616 × 10^9
Canis lupus familiaris            951,238,343
Vitis vinifera                    999,010,073
Gallus gallus                     899,631,338
Glycine max                       906,638,854
Triticum aestivum                 898,689,329

Bioinformatics approaches
1. Data acquisition

2. Data management/storage

3. Data retrieval

4. Data analysis/interpretation

5. Data compilation
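
The five steps above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not a description of any real tool: the input text `RAW_TEXT`, the record names and every function name are hypothetical, and GC content stands in for "analysis" only because it is easy to verify by hand.

```python
from collections import Counter

# Hypothetical input: two short FASTA-style records typed in by hand.
RAW_TEXT = """>seq1 toy record
ATGCGC
GGAT
>seq2 toy record
ATATAT
"""


def acquire(text):
    """Step 1, acquisition: parse FASTA-style text into (id, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def store(records):
    """Step 2, management/storage: index the records by identifier."""
    return dict(records)


def retrieve(database, seq_id):
    """Step 3, retrieval: look a sequence up by its identifier."""
    return database[seq_id]


def gc_content(sequence):
    """Step 4, analysis: fraction of bases that are G or C."""
    if not sequence:
        return 0.0
    counts = Counter(sequence)
    return (counts["G"] + counts["C"]) / len(sequence)


def compile_report(database):
    """Step 5, compilation: one summary entry per stored record."""
    return {
        seq_id: {"length": len(seq), "gc": gc_content(seq)}
        for seq_id, seq in database.items()
    }


database = store(acquire(RAW_TEXT))
report = compile_report(database)
print(retrieve(database, "seq1"))
print(report)
```

In practice each step is handled by dedicated infrastructure (sequencers, public databases, search tools), but the division of labour is the same.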


Biological information

• Molecular Sequences
• 3D Structure
• Biological Functions

Biological information
Molecular Sequences
• Nucleic acid
• Amino acid
• Sequencing Technology
• Sanger sequencing
• Next-generation sequencing (NGS)
• 3rd generation sequencing
• Molecular Sequence Analysis
• Sequence alignments → conservation pattern
• Variant (SNPs) analysis
• Comparative genomics
• Genome Annotation
• Open reading frame
• Functional sites
• Genome Viewer
• ...

[Figures: gene prediction; genome viewer/browser; comparative genomics; genome annotation (bacteria); multiple sequence alignment]
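
As an illustration of the "open reading frame" item in genome annotation, here is a minimal sketch of an ORF scan. It is a simplification under stated assumptions: it looks only at the forward strand, uses the three standard stop codons, and reports the stretch from the first ATG in a frame to the next in-frame stop. The function name `find_orfs` and the toy input are hypothetical; real gene-prediction tools do considerably more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon, inclusive.
    Positions are 0-based and `end` is exclusive. `min_codons` counts the
    start and stop codons as part of the ORF length.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # close the ORF and keep scanning this frame
    return sorted(orfs)


# Hypothetical toy sequence, not real genomic data.
print(find_orfs("CCATGAAATAGG"))  # [(2, 11, 'ATGAAATAG')]
```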


Biological information

3D structure
• Double helix DNA
• RNA structure
• Protein X-ray crystallography
• 3D structure prediction

[Figures: double helix DNA; RNA structure; protein X-ray crystallography]

Biological information

Biological functions
• Pathway analysis
• Network analysis
• Gene Ontology
• Protein-protein interaction
• Proteomics
• Metabolomics
• Molecular modeling
• Molecular dynamics
• Phylogenetics
• Evolutionary relationships
• Drug design
• Vaccine design
• ...

[Figures: pathway analysis; network analysis; molecular dynamics; phylogenetics]


Where do we use bioinformatics?

• Basic genomic and molecular biology research
• Biotechnology
• Biomedical sciences
• Forensic DNA analysis
• Agricultural biotechnology
• Pharmacology
• Anthropology


What does bioinformatics do?

Input: biological information (data)
→ BIOINFORMATICS →
Output: new biological information (new data and knowledge)

What does bioinformatics do?

1. Development of computational tools and databases
• Software for sequence analysis
• Sequence alignment, sequence database searching, motif and pattern
discovery, gene and promoter finding, reconstruction of evolutionary
relationships, genome assembly and comparison
• Software for structural analysis
• Protein and nucleic acid structural analysis, comparison, classification and
prediction
• Software for functional analysis
• Gene expression profiling, protein-protein interaction prediction, protein
sub-cellular location prediction, metabolic pathway reconstruction
• Construction and curation of biological databases
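
To make "sequence alignment" concrete, the sketch below implements global alignment in the style of Needleman and Wunsch with a simple linear gap penalty. It is a teaching illustration under stated assumptions: the scoring values (match +1, mismatch -1, gap -1) and the function name are choices made for this example, and production tools use substitution matrices and affine gap costs instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences.
print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table has (len(a) + 1) × (len(b) + 1) cells, so time and memory grow with the product of the two sequence lengths; this is one reason database searching relies on faster heuristics such as FASTA and BLAST.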


What does bioinformatics do?

2. Generate biological knowledge to better understand living systems
• Analyse and interpret the various types of biological data.
• Often identify new problems that require new software to analyze.
• Bioinformatics is essential for basic genomic and molecular biology research
• Sequence assembly
• Genome annotation
• Molecular evolution
• Analysis of gene expression
• Analysis of gene regulation
• Major impact in biotechnology and biomedical sciences
• Structural bioinformatics and Knowledge-based drug design
• Protein structure analysis
• Protein structure prediction
• 3D structure allows design of ligands that fit
• Reduces time and cost to develop drugs

What does bioinformatics do?

2. Generate biological knowledge to better understand living systems (cont.)
• Forensic DNA analysis
• Bayesian statistics and likelihood-based methods
• Personalised healthcare
• Agricultural biotechnology
• Plant genome databases
• Gene expression profiles
• New crop varieties
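
The forensic item above mentions Bayesian statistics and likelihood-based methods. The sketch below shows the basic arithmetic for a single-source DNA profile, as an illustration only: the allele frequencies, the prior, and all function names are hypothetical, loci are assumed independent and in Hardy-Weinberg equilibrium, and real casework applies further corrections that are not modelled here.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p*p if homozygous, 2*p*q if heterozygous."""
    return p * p if q is None else 2 * p * q


def likelihood_ratio(locus_frequencies):
    """LR for 'the suspect is the source' versus 'an unrelated person is the source'.

    With a matching single-source profile this is 1 / random match probability.
    """
    random_match_probability = prod(locus_frequencies)
    return 1.0 / random_match_probability


def posterior_probability(prior_probability, lr):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical allele frequencies for two loci (not real population data).
loci = [
    genotype_frequency(0.1, 0.2),  # heterozygous locus
    genotype_frequency(0.5),       # homozygous locus
]
lr = likelihood_ratio(loci)
print(lr)                               # roughly 100
print(posterior_probability(0.01, lr))  # roughly 0.5
```

The example shows why the likelihood ratio alone is not the probability of guilt: with a 1% prior, an LR of about 100 only brings the posterior to about one half.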


Main players

Organization              Database
NCBI (NIH)                GenBank
EMBL                      EMBL Nucleotide Sequence Database (reorganized into ENA)
NIG, Shizuoka, Japan      DDBJ

International Nucleotide Sequence Database Collaboration:
• GenBank
• ENA
• DDBJ
https://www.nih.gov/

Institutes at NIH

National Library of Medicine (NLM)

National Center for Biotechnology Information (NCBI)


https://www.ncbi.nlm.nih.gov/

NCBI resources
• Databases:
• PubMed
• Bookshelf
• Sequence Read Archive (SRA)
• Online Mendelian Inheritance in Man (OMIM)
• ClinVar
• BioSystems
• Nucleotide Database
• GenBank  BLAST tools
• Reference Sequence (RefSeq)
• Database of Short Genetic Variations (dbSNP)
• Tools
• 1000 Genomes Browser
• Basic Local Alignment Search Tool (BLAST)
• Genome Data Viewer (GDV)


https://www.embl.org/
• EMBL Heidelberg, Germany - Main laboratory
• EMBL Hamburg, Germany - Structural biology
• EMBL Barcelona, Spain - Tissue biology and disease modelling
• EMBL-EBI Hinxton, United Kingdom - European Bioinformatics Institute
• EMBL Grenoble, France - Structural biology
• EMBL Rome, Italy - Epigenetics and neurobiology

Resources and tools in EMBL


• ArrayExpress – archive of gene expression experiments
• BioModels – a database of computational models relevant to the life sciences
• BioStudies – a database that serves as a generic data archive at EMBL-EBI for
biomolecular datasets
• European Nucleotide Archive (ENA) – resource of nucleotide sequencing
information
• Ensembl project – genome databases for vertebrates and other eukaryotic species
(joint with Wellcome Trust Sanger Institute)
• Expression Atlas – database of summary information on which genes are
expressed under which conditions
• Gene ontology – ontology of gene functions and processes
• InterPro – database of protein functional domains and families
• MetaboLights – repository of metabolomics data
• Protein Data Bank in Europe – European resource for the collection, organisation
and dissemination of data on biological macromolecular structures
• UniProt – database of protein sequence and functional information (joint with
Swiss Institute of Bioinformatics and Protein Information Resource)


https://www.nig.ac.jp/nig/

NIG resources

Mouse
– Mouse Genetic Resources
– Mouse Genome Database
– Mouse Phenotype Database
– Japan Mouse/Rat Strain
– Microsatellite Data Base of Japan
– RefEx for Mouse or Rat
Human
– dbHERV-REs
– RefEx for human
Drosophila
– Drosophila Strains (NIG-FLY)
– Segmentation Antibodies
C. elegans
– Gene Expression Database (NEXTDB)
– cDNA library
Microorganisms
– E. coli: Strain/Vector/Antibody
– E. coli: Genome Database
– E. coli: TEC Database
– S. japonicus (JapoNet)
– Bacillus subtilis
Fish
– Zebrafish: zTRAP
– Zebrafish: Knock Out Fish Project
– Coelacanth
Aquatic organisms
– Hydra
– Xenopus laevis
– Sea urchin (Hemicentrotus pulcherrimus)
Plants
– Rice (Oryzabase)
– Liverwort (Marchantia polymorph

NBRP – National BioResource Project
SHIGEN – Shared Information of Genetic Resources
RRC – Research Resource Circulation
DDBJ – DNA Data Bank of Japan


Impact of bioinformatics
• Personal Genomics
• Increased vigilance and taking action to prevent disease
• Improving health care → providing individual-specific medical care
• Understanding the link between genomics and environment
• Novel Drug Development
• Identifying novel drug targets
• Validating drug targets
• Predicting toxicity and adverse reactions
• Improving clinical trials and testing
• Gene therapy
• Replacing the gene rather than the gene product
• Stem cell therapies
• Replacing the entire cell type or tissue to cure a disease


Impact of Bioinformatics

• Pharmacogenomics
• Personalized medicine → adjusting drugs, doses and delivery to suit individual patients
• Maximize efficacy and minimize side effects
• Identify genetics of adverse reactions
• Identify patients who respond optimally

Impact of Bioinformatics

• Ethical, Legal and Social Issues
• Personal privacy
• Insurability
• Employability
• Discrimination
• Genetic selection versus eugenics
• Cosmetic genetics
• Patentability of genes, proteins and other natural products


Limitations

• Recognize the limitations of bioinformatics output and avoid both over-reliance on it and over-expectation of it.
• Accuracy.
• Quality.

Bioinformatics is expanding…

• More reliable tools for sequence, structural and functional analysis.
• Development of tools to understand the functions and interactions of all gene products in a cell: systems biology.
