01-What Is Bioinformatics

What is bioinformatics?

BINF2010 T3 2024

Sara Ballouz, PhD


CSE, UNSW
K17 401A
Learning outcomes

• Define bioinformatics
• Describe some common sub-fields and application areas of bioinformatics
Introduction

What is bioinformatics?
• Interdisciplinary field of science that develops methods and software tools for understanding biological data
• Involves using computer technology to collect, store, analyze and disseminate biological data and information

[Figure: Venn diagram with bioinformatics at the overlap of mathematics/statistics, biology (chemistry, physics) and computer science]

Bioinformatics: brief history


• In 1966, Margaret Dayhoff pioneered the use of
computers in comparing protein sequences and
reconstructing their evolutionary histories
from sequence alignments.
• The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 to
refer to the study of “informatic processes in biotic systems.”
“Propelled by the exponential increase of sequence data, the
term bioinformatics became mainstream in the late 1980s,
coming to mean the development and use of computational
methods for data management and data analysis of sequence
data, protein structure determination, homology-based
function prediction, and phylogeny.” - Hogeweg (2011)
Who are bioinformaticians?

[Figure: a bioinformatics system connecting problems in biology to discovery in biology. Labels: computational biologists, algorithms, software developers/engineers, biologists who use computers.]

What does a “bioinformatician” do?


• Research in “computational biology”
• Developing new algorithms and new ways of looking at biological data
• Applying developments in computer science and mathematics to biological
problems

This requires a good understanding of both the computational sciences and biological problems.

Research Scientist in Computational Genomics or Proteomics

Tasks and responsibilities
• Development and optimization of bioinformatics algorithms and/or statistical procedure
in computational genomics or proteomics. In computational genomics, we primarily
focus on application in metagenomics and the detection of infection chains; in
proteomics on applications in metaproteomics, quantitative proteomics, and
proteogenomics.
• Data analysis for next generation sequencing data or mass spectrometry data in close
collaboration with experimental research groups
• Development of bioinformatics software and pipelines and support in the translation of
research projects to routine analyses pipelines in our core facility
• Training, advising, and supporting junior group members
• Publication of results and presentation at scientific meetings

Profile
• University Degree (on at least a master’s level, a PhD is of strong advantage) in a field
related to the tasks (e.g. (bio-) informatics, mathematics, statistics, physics, quantitative
biology)
• Practical experience in the development of algorithms or statistical procedures for the
analysis of experimental high-throughput data
• Knowledge in the analysis of high-throughput data, in particular, next generation
sequencing or mass spectrometry
• Good programming skills and comfort level with software development methodology
• Thorough understanding of molecular biology (genomics/proteomics)

What does a “bioinformatician” do?


• Software engineering for biological research
• Choose and implement algorithms in an efficient manner (suitable for
high-throughput analysis)
• Design and implement databases to store and retrieve biological data
• Design and implement user interfaces for biologists to be able to use the
software
• Address issues such as performance, diverse resource integration,
interoperability, scalability, data access, security, networks, costs

This requires software engineering skills and an understanding of biological
data and biologists' needs

Software Engineer

You will manage and develop software supporting the automatic annotation pipelines for
complete genomes in UniProt. You will be responsible for the design, development and
maintenance of software for the Java data services providing protein annotations to
the scientific community and computational biologists (…) You are expected to be
innovative and work with the team in the extension of the current software components
as well as to evaluate and promote new software methods, tools and programming
models for a robust and interoperable programming framework.

The primary responsibilities include:
• Design, development and maintenance of software for the automatic annotation of
large data sets
• Design, development and maintenance of software for the provision of data services,
including the Java API for the scientific community
• Assist with programming standards to promote best practices
• Support the databases and develop and execute production release pipelines of the
UniProt resources
• Assist with innovative programming technologies.

You will have a background in computing, and/or bioinformatics. Proven work experience
of programming with Java and related technologies such as Spring/Guice, Lucene/Solr,
and Java RESTful services are essential. You will be familiar with relational databases
(preferably Oracle) and have a general understanding of NoSQL databases, as well as
knowledge of SQL and Unix shell scripting (preferably bash). You should also be
familiar with standard development tools (continuous integration e.g. Jenkins, build
management e.g. Maven, source code management systems e.g. GIT, etc). Knowledge
of Perl and/or Python would be advantageous, and knowledge of JavaScript and
frameworks such as AngularJS would be beneficial.

What does a “bioinformatician” do?


• Research in biology: using computers for biological discovery
• Using existing software and databases, applying established methods
• Combining existing software and databases into new combinations
(scripting rather than programming)
• (Occasionally) developing new software and databases as needed

This is biology, but it requires understanding the limitations and assumptions of the software

Researcher in tumor heterogeneity/microenvironment

The goal of our lab is to identify biomarkers and targets for brain tumor
propagating cells, and to understand how these cells interact with the
tumor micro-environment. Our approach involves single-cell and bulk
RNA/DNA sequencing, from primary and recurrent patient glioblastoma
samples. Collaborating labs, and clinicians, will provide coordinated
bisulfite and ChIP sequencing assays, MRI and patient metadata, which we
will integrate into comprehensive models of tumor evolution and response
to treatment.

We are particularly interested in candidates with experience performing
bioinformatics analyses using state-of-the-art software tools, and who also
possess a deep knowledge of some aspect of cancer biology.

The position requires a PhD in the biomedical sciences or bioinformatics,
experience in cancer research and in using modern software tools to study
‘omics data.
Subfields

Subfields in bioinformatics
• Sequences
  • DNA, RNA, protein sequence analysis
  • Genomics, variation, populations
• Structures
  • 3D structures, mainly of proteins
  • Molecular interactions (proteins/DNA/small molecules)
• Systems
  • Transcriptomics
  • Proteomics
  • Quantitative modelling and simulation
• Others
  • Image analysis
  • Clinical bioinformatics
  • Databases, ontologies, and data mining

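Sequences

[Figure: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) drawn as a sugar-phosphate backbone or helix carrying nucleotides (nucleobases) joined in base pairs, next to a protein drawn as a polypeptide chain of amino acids.]

• DNA bases: adenine (A), cytosine (C), guanine (G), thymine (T)
• RNA bases: adenine (A), cytosine (C), guanine (G), uracil (U)
• Example DNA sequence: TATACAAGAAAGTTTGTACT
• Example protein sequence: QTELATKAGVKQQSIQLIEAGVTK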
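Because sequences are plain strings over a small alphabet, many basic analyses reduce to string processing. The following is a minimal Python sketch using the example DNA sequence above. The function names are illustrative and not from any library, and `transcribe` is a deliberate simplification that only rewrites the string in the RNA alphabet (T becomes U).

```python
from collections import Counter

# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def base_counts(dna):
    """Count how often each base occurs in a DNA string."""
    return Counter(dna.upper())


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read in the same direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(dna):
    """Simplified transcription: rewrite the DNA string in the RNA alphabet."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    dna = "TATACAAGAAAGTTTGTACT"  # example DNA sequence from the slide
    print(base_counts(dna))
    print(reverse_complement(dna))
    print(transcribe(dna))
```

In practice a library such as Biopython would usually be used for this; the sketch only shows that the underlying data structure is a string.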

Variation and populations
• Genomes
• Genetic relationships
• Genome-wide association study (GWAS)
• Genome Aggregation Database (gnomAD)

Structures: protein
• Secondary
• Tertiary
• Quaternary

Structures: RNA (example: tRNA)
• Secondary
• Tertiary

Structures: DNA

Systems: transcriptomics

What genes are expressed in a cell?
What RNAs are present in a cell?
In what quantity?

Quantitate all mRNAs simultaneously, i.e., the “transcriptome” of a cell
• Microarrays
• RNA sequencing (RNA-seq)

[Figure: a cell (nucleus, nucleolus, cytoplasm, ribosome) with its DNA and RNA types: mRNA, tRNA, rRNA, lncRNA, siRNA, miRNA, snoRNA]

Systems: proteomics

What proteins are expressed?
What isoform (splicing/translation) and in which state (post-translational modifications)?

Proteome is more complex.
Separation and identification by physical properties (e.g., charge, weight).
• 2D-PAGE
• Mass Spectrometry (MS)

Systems: pathways and networks


• Protein interactions
• Gene networks
• Biochemical pathways

How to represent molecular webs?
How to extract meaningful information?
How to deal with context and kinetics?
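One common answer to the representation question is a graph: molecules are nodes and interactions are edges. The sketch below stores an undirected interaction network as an adjacency list and extracts one simple piece of information, the highly connected "hub" nodes. The protein names P1 to P4 and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency list from (protein, protein) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def degrees(network):
    """Number of interaction partners for each protein."""
    return {protein: len(partners) for protein, partners in network.items()}


def hubs(network, min_degree):
    """Proteins with at least min_degree partners, sorted by name."""
    return sorted(p for p, d in degrees(network).items() if d >= min_degree)


if __name__ == "__main__":
    # Hypothetical interaction pairs, not real data.
    pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
    net = build_network(pairs)
    print(degrees(net))
    print(hubs(net, min_degree=3))
```

This representation ignores context and kinetics; capturing those needs edge attributes or a quantitative model on top of the graph.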
[Figure: the infamous “hairball”, a network drawing of protein interactions in yeast]


Systems: modelling and simulations


Gray-Scott model (Turing pattern, cellular automaton)

Reaction: A + 2B → 3B
[A]new = [A] + ΔA + f · (1 - [A]) - r · [A] · [B]²
[B]new = [B] + ΔB - k · [B] + r · [A] · [B]²

[Figure: Gray-Scott patterns arranged by feed parameter (f) and death parameter (k), coloured by [B]/([A]+[B]) on a spectral colour map; a second panel shows kinesin]
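The update rule above can be sketched in a few lines of Python. This is a minimal one-dimensional version under stated assumptions: ΔA and ΔB are taken to be diffusion terms, computed as a diffusion rate times a discrete Laplacian on a ring of cells. The diffusion rates `d_a` and `d_b`, the default `r`, the grid size, and the values of `f` and `k` in the demo are illustrative choices, not values from the slide.

```python
def laplacian_1d(values):
    """Discrete Laplacian on a ring of cells (periodic boundary)."""
    n = len(values)
    return [values[(i - 1) % n] - 2 * values[i] + values[(i + 1) % n]
            for i in range(n)]


def gray_scott_step(a, b, f, k, r=1.0, d_a=0.2, d_b=0.1):
    """Apply one update of the [A]new / [B]new equations to every cell."""
    lap_a = laplacian_1d(a)
    lap_b = laplacian_1d(b)
    new_a, new_b = [], []
    for i in range(len(a)):
        reaction = r * a[i] * b[i] ** 2  # A + 2B -> 3B
        # Assumed diffusion terms: delta_A = d_a * laplacian, delta_B = d_b * laplacian.
        new_a.append(a[i] + d_a * lap_a[i] + f * (1 - a[i]) - reaction)
        new_b.append(b[i] + d_b * lap_b[i] - k * b[i] + reaction)
    return new_a, new_b


if __name__ == "__main__":
    n = 40
    a = [1.0] * n
    b = [0.0] * n
    for i in range(18, 22):
        b[i] = 0.25  # small seed of B in the middle of the ring
    for _ in range(10):
        a, b = gray_scott_step(a, b, f=0.04, k=0.1)
    # Same quantity as the colour map: [B] / ([A] + [B]).
    ratio = [bi / (ai + bi) for ai, bi in zip(a, b)]
    print(" ".join(f"{x:.2f}" for x in ratio))
```

Sweeping `f` and `k` over a grid of values and running the same step function on a two-dimensional lattice is how parameter maps like the one in the figure are usually produced.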


Image analysis

How to distinguish organs from tumors?
Best way to get spatially resolved gene/protein?
• 3D rendering
• Spatial quantification
• Segmentation
• Tracking
• Object detection

Databases, ontologies, data mining, …


How to represent and store information?

[Figure panels: database, ontology, data mining]


Clinical and translational


• GWAS catalogue
• ClinVar
• Trans-Omics for Precision Medicine (TOPMed) Program
• Clinical pipeline

Medicinal product development


• Docking
• Toxicity modelling
• Clinical trial simulations
• Vaccine development
• Drug design

Can we identify suitable drug and vaccine targets?


Bioinformatics research

UNSW and affiliated research centres and institutes
• Cellular genomics
• Immunogenomics
• Neurogenomics
• Epigenomics
• Genomic cancer medicine
• Clinical genomics
• Clinical immunogenomics

• Computational Genomics
• Computational Cardiology
• Structural Biology
• Cardiac Electrophysiology

Research centres and institutes in Sydney
Bioinformatics industry

Bioinformatics in industry
Bioinformatics has been enormously boosted by the biotechnology industry.
• Dedicated bioinformatics companies
  • Geneious, Benchling, DNAnexus, …
• Platform technology companies
  • Illumina, Affymetrix, Qiagen, Life Technologies…
• Multinational pharmaceutical companies
  • GSK, Novartis, Aventis, AstraZeneca…
• A large Open Source bioinformatics movement coexists with corporate
interests.
  • See www.bioinformatics.org for a starting point.
Bioinformatics resources

Dedicated conferences and societies


Societies:
• International Society for Computational Biology (ISCB)
• Australian Bioinformatics and Computational Biology Society (ABACBS)
• Australian Computational Biology and Bioinformatics Student Society (COMBINE)

Conferences:
• Intelligent Systems for Molecular Biology (ISMB)
• European Conference on Computational Biology (ECCB)
• Research in Computational Molecular Biology (RECOMB)
• International Conference on Bioinformatics (InCoB)
• Great Lakes Bioinformatics (GLBIO), Rocky Bioinformatics Conference
• EMBL conferences (e.g., Cancer genomics, … )
• CSHL meetings (e.g., Biology of Genomes, Genome Informatics, Systems Biology)
• Oz Single Cell, Human Cell Atlas meetings, ..

Bioinformatics journals
• Top (Google Scholar ranking)
  • Bioinformatics
  • Briefings in Bioinformatics
  • PLOS Computational Biology
  • BMC Bioinformatics
  • GigaScience
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics
• Others:
  • Genome Biology, Genome Research, Molecular Systems Biology, eLife, Nature Methods

ISMB 2024: Conference areas
• Macromolecular Sequence, Structure and Function
• Bio-Ontologies
• Bioinformatics core facilities
• Bioinformatics Education
• Visualisation in biology
• Critical assessment of massive data analysis
• Mass spectrometry and proteomics
• Evolution and comparative genomics
• High-throughput DNA sequencing
• Bioinformatics of Microbes and Microbiomes
• Machine learning in computational biology
• Systems biology and networks
• Computational modelling of biological systems
• Regulatory Systems Genomics
• Text mining for healthcare and biology
• Translational medical bioinformatics
• Interpretation of genomic variation
• Integrative RNA biology
• Open Source Bioinformatics
• General Computational Biology
So what use is bioinformatics?
Bioinformatics uses

Reductionist and synthetic approaches in biology

[Figure: a biological system (organism) and its building blocks (genes/molecules), linked by the reductionist approach (experiments) and the synthetic approach (bioinformatics). Source: Kanehisa (2000) Post-genome Informatics]

Modelling, learning, prediction, simulation

[Figure: diagram linking data, model and knowledge through learning, simulation, prediction and rule extraction]

Common questions
• A chunk of DNA from an extinct organism has just been sequenced.
What functions might this DNA encode?
• How did our blood clotting system arise during evolution?
• What would happen if extra genes for ethanol production enzymes were added to
Saccharomyces cerevisiae (brewer’s yeast) for production of biofuel?
• A new HIV protease inhibitor is in a drug company’s development pipeline.
mRNA expression profiles have been run on a number of human cell lines - should
development proceed further?
• A surgical patient is genotyped and found to carry the CYP2C9*2 allele (a cytochrome P450
IIC Arg144 -> Cys mutation).
Should they be given warfarin anticoagulant therapy?

Some “Holy Grails” in bioinformatics


• Predicting the structure of proteins from their sequence (solved?)
• Predicting what a protein does and what it interacts with
• “Reverse engineering” the effect of genetic variations on disease and
treatments
• Building a computer model of a cell or of an organism, so the effect of
drugs can be tested by computer simulation
• Simulating a clinical drug trial
• Understanding and predicting the regulatory “code”
Summary
• Interdisciplinary field of science, using computer technology to collect, store, analyze and disseminate biological data
• Bioinformaticians are:
  • Computational biologists
  • Software developers/engineers
  • Biologists who use computers
• Sub-fields:
  • Sequences
  • Structures
  • Systems
  • Clinical
  • Data science
Further reading/resources
• https://www.nature.com/articles/nmeth.1427
• https://askabiologist.asu.edu/venom/protein-art
• Kanehisa (2000) Post-genome Informatics
• https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng_bioinformatics&inst=7289110936595769722
• https://www.sciencedirect.com/science/article/pii/S0958166916301082
BINFSOC is going to ABACBS!
• Contact Donren for more information:
