01-What Is Bioinformatics
BINF2010 T3 2024
• Define bioinformatics
• Describe some common sub-fields and application areas of bioinformatics
Introduction
What is bioinformatics?
• Interdisciplinary field of science that develops methods and software tools
for understanding biological data
• Involves using computer technology to collect, store, analyze and disseminate
biological data and information
[Figure: Venn diagram placing bioinformatics at the intersection of
Mathematics/Statistics, Biology (Chemistry, Physics) and Computer science]
Introduction
[Figure: path from "Problems in biology" to "Discovery in biology". Labels:
computational biologists, algorithms, software developers/engineers,
bioinformatics system, biologists who use computers]
Scientist in Genomics or Proteomics
• (…) collaboration with experimental research groups
• Development of bioinformatics software and pipelines and support in the translation of
research projects to routine analysis pipelines in our core facility
• University degree (at least master's level; a PhD is a strong advantage) in a field
related to the tasks (e.g. (bio-)informatics, mathematics, statistics, physics, quantitative
biology)
• Practical experience in the development of algorithms or statistical procedures for the
analysis of experimental high-throughput data
• Knowledge of the analysis of high-throughput data, in particular next-generation
sequencing or mass spectrometry
• Good programming skills and comfort with software development methodology
• Thorough understanding of molecular biology (genomics/proteomics)
Introduction
You will manage and develop software supporting the automatic annotation pipelines for
complete genomes in UniProt. You will be responsible for the design, development and
maintenance of software for the Java data services providing protein annotations to
the scientific community and computational biologists (…) You are expected to be
innovative and work with the team in the extension of the current software components
as well as to evaluate and promote new software methods, tools and programming
models for a robust and interoperable programming framework.
You will have a background in computing, and/or bioinformatics. Proven work experience
of programming with Java and related technologies such as Spring/Guice, Lucene/Solr,
and Java RESTful services are essential. You will be familiar with relational databases
(preferably Oracle) and have a general understanding of NoSQL databases, as well as
knowledge of SQL and Unix shell scripting (preferably bash). You should also be
familiar with standard development tools (continuous integration e.g. Jenkins, build
management e.g. Maven, source code management systems e.g. Git, etc.). Knowledge
of Perl and/or Python would be advantageous, and knowledge of JavaScript and
frameworks such as AngularJS would be beneficial.
Introduction
Researcher in tumor heterogeneity/
The goal of our lab is to identify biomarkers and targets for brain tumor
propagating cells, and to understand how these cells interact with the
tumor micro-environment. Our approach involves single-cell and bulk
RNA/DNA sequencing, from primary and recurrent patient glioblastoma
samples. Collaborating labs, and clinicians, will provide coordinated
bisulfite and ChIP sequencing assays, MRI and patient metadata, which we
will integrate into comprehensive models of tumor evolution and response
to treatment.
(…) bioinformatics analyses using state-of-the-art software tools, and who also
possess a deep knowledge of some aspect of cancer biology.
Subfields in bioinformatics
• Sequences
  • DNA, RNA, protein sequence analysis
  • Genomics, variation, populations
• Structures
  • 3D structures, mainly of proteins
  • Molecular interactions (proteins/DNA/small molecules)
• Systems
  • Transcriptomics
  • Proteomics
  • Quantitative modelling and simulation
• Others
  • Image analysis
  • Clinical bioinformatics
  • Databases, ontologies, and data mining
Subfields
Sequences
• DNA: TATACAAGAAAGTTTGTACT
• Protein: QTELATKAGVKQQSIQLIEAGVTK
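As a small illustration of what sequence analysis can mean in practice, the sketch below computes two basic properties of the nucleotide sequence shown on this slide: its GC content and its reverse complement. The function names `gc_content` and `reverse_complement` are chosen here for illustration and do not come from the course material; the code assumes an unambiguous DNA alphabet (A, C, G, T only) and a non-empty sequence.

```python
# Illustrative sketch only: assumes plain A/C/G/T input and a non-empty sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in the 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


dna = "TATACAAGAAAGTTTGTACT"  # nucleotide sequence from the slide
print(len(dna))                 # 20
print(gc_content(dna))          # 0.25
print(reverse_complement(dna))  # AGTACAAACTTTCTTGTATA
```

Libraries such as Biopython provide more complete versions of these operations, including support for ambiguous bases; the hand-written version is kept here only to show how little is needed for the basic idea.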
Subfields
Genetic relationships
Genome Aggregation Database (gnomAD)
Subfields
Structures: protein
[Figure: levels of protein structure: secondary, tertiary, quaternary]
Subfields
Structures: RNA
[Figure: tRNA secondary and tertiary structure]
Subfields
Structures: DNA
Subfields
[Figure: RNA types and cellular compartments. Compartment labels: nucleus,
nucleolus, cytoplasm, ribosome. Molecule labels: DNA, mRNA, tRNA, rRNA, lncRNA,
siRNA, miRNA, snoRNA]
Subfields
Infamous “hairball”
ClinVar
Trans-Omics for Precision Medicine (TOPMed) Program
Clinical pipeline
GWAS catalogue
Subfields
Clinical trial simulations
Vaccine development
Drug design
(…) institutes
Immunogenomics
Neurogenomics
Epigenomics
Genomic cancer medicine
Clinical genomics
Clinical immunogenomics
Computational Genomics
Computational Cardiology
Structural Biology
Cardiac Electrophysiology
Bioinformatics research
Bioinformatics in industry
Bioinformatics has been enormously boosted by the biotechnology industry.
• Dedicated bioinformatics companies
• Geneious, Benchling, DNAnexus, …
• Platform technology companies
• Illumina, Affymetrix, Qiagen, Life Technologies…
• Multinational pharmaceutical companies
• GSK, Novartis, Aventis, AstraZeneca…
• A large open-source bioinformatics movement coexists with corporate interests.
• See www.bioinformatics.org for a starting point.
Bioinformatics resources
Bioinformatics journals
• Top (Google Scholar ranking)
• Bioinformatics
• Briefings in Bioinformatics
• PLOS Computational Biology
• BMC Bioinformatics
• GigaScience
• IEEE/ACM Transactions on Computational Biology and Bioinformatics
• Others:
• Genome Biology, Genome Research, Molecular Systems Biology, eLife, Nature Methods
Bioinformatics resources
• Macromolecular Sequence, Structure and Function
Building Blocks (Genes/Molecules)
Kanehisa (2000) Post-genome Informatics
Bioinformatics uses
[Figure labels: prediction, rule extraction, knowledge]
Bioinformatics uses
Common questions
• A chunk of DNA from an extinct organism has just been sequenced. What functions
might this DNA encode?
• How did our blood clotting system arise during evolution?
• What would happen if extra genes for ethanol production enzymes were added to
Saccharomyces cerevisiae (brewer’s yeast) for production of biofuel?
• A new HIV protease inhibitor is in a drug company’s development pipeline. mRNA
expression profiles have been run on a number of human cell lines. Should
development proceed further?
• A surgical patient is genotyped and found to carry the CYP2C9*2 allele (a
cytochrome P450 IIC Arg144 → Cys mutation). Should they be given warfarin
anticoagulant therapy?
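For the first question above (what a newly sequenced piece of DNA might encode), one very early step could be to translate the sequence in each forward reading frame and look at the resulting peptides. The sketch below does this with the standard genetic code. The names `translate` and `forward_frames` and the example input are illustrative assumptions, not part of the course material, and unrecognised codons are reported as `X`. Real analyses would go much further, for example by comparing the translations against sequence databases.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; trailing bases that do not fill a codon are ignored."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    # Codons containing anything other than A/C/G/T become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]


# Hypothetical input, chosen only to show the output format.
print(forward_frames("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```

Only the three forward frames are shown to keep the sketch short; a fuller version would also translate the reverse-complement strand, giving six frames in total.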