Introduction To Bioinformatics


Uploaded by

ajays162616

29-07-2024

Introduction to Bioinformatics
24 – 07 – 2023
23-07-2024

Scope of this course


• Review of the cell structure
• Biomolecules (proteins, carbohydrates, lipids, nucleic acids): their structure, function and chemistry
• Understand the functioning of biological systems (interaction of the biomolecules in the cell: METABOLISM)
• Sequence analysis: BLAST, multiple sequence alignment, phylogenetic analysis, protein sequence
• Biological data collection, interpretation, and analysis.
• Develop analytical ability to solve real-world problems using these methodologies.


What is Bioinformatics?
• Bioinformatics is a multidisciplinary field that uses computer programming, machine learning, algorithms, statistics, and other computational tools to organize and analyze large volumes of biological data.
• Bioinformatic tools are software programs designed
• to extract meaningful information from the mass of molecular biology / biological databases
• to carry out sequence or structural analysis.


What is Biological data?

>NM_001384529.1 Homo sapiens GATA zinc finger domain containing 2A (GATAD2A), transcript variant 21, mRNA
AGTGTGAGACTGAGCCGCGAGACTGAGCTGCGGCTCCGAGCGCTGCGCGGCGGCTCCTCCCGCCCAGGGT
CAGCGCCCCGGCGCGCGCACGCGCACCCCCGCCGCCCGAGCGCGCCCCGCGCCGCCCGCGCAGTCGGTCG
GTCGGTCGTCTGTCCTGTCGCCGCTGCCGCCGCCGCCACAGCGGCCGCCGCGGGCGCCACCTGAGGGAGT
CGCCTCCGCGGGACGCCACAAGACCTGACCGGACTGCGCCGCCCGAGGCCGTCGGCCGCCGTCAGCGAGG
GCGCCGAGCAACTTCGGAGCAACAGATTTGGATGAAACCCTGAGGATCCCAGAGCTGAAAGTGAGTTTGA
AGTGTCCGATCCAGTCCTTCAACTCAGAGCACTCCTATCTGTGACACCTCTGCCCACGCATCCAGTATGT
GCAGCACACCTGCTCTGTGACTGACACTCTTGCAGAAGTGGGGCCACTTCAGGGACATGGACAAGGTGTT
GTACCTGCTGTCACAGAGCCTGTTATCTGAATGACCGAAGAAGCATGCCGAACACGGAGTCAGAAACGAG
CGCTTGAACGGGACCCAACAGAGGACGATGTGGAGAGCAAGAAAATAAAAATGGAGAGAGGATTGTTGGC
TTCAGATTTAAACACTGACGGAGACATGAGGGTGACACCTGAGCCGGGAGCAGGTCCAACCCAAGGATTG
CTGAGGGCAACAGAGGCCACGGCCATGGCCATGGGCAGAGGCGAAGGGCTGGTGGGCGATGGGCCCGTGG
ACATGCGCACCTCACACAGTGACATGAAGTCCGAGAGGAGACCCCCCTCACCTGACGTGATTGTGCTCTC
CGACAACGAGCAGCCCTCGAGCCCGAGAGTGAATGGGCTGACCACGGTGGCCTTGAAGGAGACTAGCACC
GAGGCCCTCATGAAAAGCAGTCCTGAAGAACGAGAAAGGATGATCAAGCAGCTGAAGGAAGAATTGAGGT
TAGAAGAAGCAAAACTCGTGTTGTTGAAAAAGTTGCGGCAGAGTCAAATACAAAAGGAAGCCACCGCCCA
GAAGCCCACAGGTTCTGTTGGGAGCACCGTGACCACCCCTCCCCCGCTTGTTCGGGGCACTCAGAACATT
CCTGCTGGCAAGCCATCACTCCAGACCTCTTCAGCTCGGATGCCCGGCAGTGTCATACCCCCGCCCCTGG
TCCGAGGTGGGCAGCAGGCGTCCTCGAAGCTGGGGCCACAGGCGAGCTCACAGGTCGTCATGCCCCCACT
CGTCAGGGGGGCTCAGCAAATCCACAGCATTAGGCAACATTCCAGCACAGGGCCACCGCCCCTCCTCCTG
GCCCCCCGGGCGTCGGTGCCCAGTGTGCAGATTCAGGGACAGAGGATCATCCAGCAGGGCCTCATCCGCG
TCGCCAATGTTCCCAACACCAGCCTGCTCGTCAACATCCCACAGCCCACCCCAGCATCACTGAAGGGGAC
AACAGCCACCTCCGCTCAGGCCAACTCCACCCCCACTAGTGTGGCCTCTGTGGTCACCTCTGCCGAGTCT
CCAGCAAGCCGACAGGCGGCCGCCAAGCTGGCGCTGCGCAAACAGCTGGAGAAGACGCTACTCGAGATCC
CCCCACCCAAGCCCCCAGCCCCAGAGATGAACTTCCTGCCCAGCGCCGCCAACAACGAGTTCATCTACCT
GGTCGGCCTGGAGGAGGTGGTGCAGAACCTACTGGAGACACAAGGCAGGATGTCGGCCGCCACTGTGCTG
TCCCGGGAGCCCTACATGTGTGCACAGTGCAAGACGGACTTCACGTGCCGCTGGCGGGAGGAGAAGAGCG
GCGCCATCATGTGTGAGAACTGCATGACAACCAACCAGAAGAAGGCGCTCAAGGTGGAGCACACCAGCCG
GCTGAAGGCCGCCTTTGTGAAGGCGCTGCAGCAGGAACAGGAGATTGAGCAGCGGCTCCTGCAGCAGGGC
ACGGCCCCTGCACAGGCCAAGGCCGAGCCCACCGCTGCCCCACACCCCGTGCTGAAGCAGGTCATAAAAC
CCCGGCGTAAGTTGGCGTTCCGCTCAGGAGAGGCCCGCGACTGGAGTAACGGGGCTGTGCTACAGGCCTC
CAGCCAGCTGTCCCGGGGTTCGGCCACGACGCCCCGAGGTGTCCTGCACACGTTCAGTCCGTCACCCAAA
CTGCAGAACTCAGCCTCGGCCACAGCCCTGGTCAGCAGGACCGGCAGACATTCTGAGAGAACCGTGAGCG
CCGGCAAGGGCAGCGCCACCTCCAACTGGAAGAAGACGCCCCTCAGCACAGGCGGGACCCTTGCGTTTGT
CAGCCCAAGCCTGGCGGTGCACAAGAGCTCCTCGGCCGTGGACCGCCAGCGAGAGTACCTCCTGGACATG
ATCCCACCCCGCTCCATCCCCCAGTCAGCCACGTGGAAATAGTGCGAGCCAGGCCCCGTGGAAGACGGGC
TCCCTCCTCCCCCACCTGGCCCCTGGTCTAGAAGGACCCACTGCACCACCCTCCGCTGGCTCGGGAAGAC
ACCGTGCCCGCCCCAAGAGCAAGCACCGGCCATGCTGCAGAGGCAAGACCTCAATTCTTGGCTGCAAAGT
TTCATCAGGGCTAGGGGGCTGGTGCCGCCTCATAGGCAGACGAGGATCATCGCTGGGGGACCTTTCCCGT
GGGCTTTCTTCCTTTCTCTCTTTGCCTTTAGTTTGCCCGACACCAGCAGAAAAGTGGACCTTGGGGGCTG
GTTCTGCTCCTGGCCCCCTTGTTCAGCCCCTGCCGGCACACGGGCGGCTCACCCTGGACACTGTGATGCG
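
The record above is in FASTA format: a header line starting with `>` followed by the sequence wrapped over several lines. A minimal Python sketch of reading such a record and computing its GC content might look like the following. The function names `parse_fasta` and `gc_content` are illustrative (not from any library), and the toy record holds only the first 20 bases of the transcript above under a made-up header.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span many lines
    return {h: "".join(parts) for h, parts in records.items()}


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy record: first 20 bases of the mRNA shown on the slide
example = """>toy_record first 20 bases
AGTGTGAGAC
TGAGCCGCGA
"""

records = parse_fasta(example)
for header, seq in records.items():
    print(header, len(seq), gc_content(seq))
```

In practice a library such as Biopython is usually preferred for parsing; the hand-written parser is only meant to show what the format contains.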


Protein sequence
>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRP
HDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFNQLQ
NYCNVP
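
Protein sequences use one letter per amino acid, so simple questions about a protein can be answered by counting characters. The sketch below, which assumes the sequence exactly as printed on the slide and uses an illustrative helper name `residue_counts`, tallies the residues with Python's standard `collections.Counter`.

```python
from collections import Counter

# Insulin precursor sequence as printed on the slide (AAA40590.1, Octodon degus)
insulin = (
    "MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRP"
    "HDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFNQLQ"
    "NYCNVP"
)


def residue_counts(protein):
    """Count how often each one-letter amino acid code occurs."""
    return Counter(protein.upper())


counts = residue_counts(insulin)
print("length:", len(insulin))
print("most common residues:", counts.most_common(3))
print("cysteines (C):", counts["C"])
```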

Gene Expression Data


Sequencing data

What is biological big data?


• Biological big data are the massive volumes of data generated by multi-omics experiments, such as genomics, transcriptomics, proteomics, metabolomics, phenomics, glycomics, epigenomics, and other omics. These data are used to study biological processes and to gain insights into how living systems work.


• The central role of bioinformatics in modern biological investigation based on 'omics' sciences:
• Transcriptomics: an organism's transcriptome is the sum of all of its RNA transcripts.
• Proteomics is the large-scale study of proteins.
• Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes.
• Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism.

• Glycomics - the study of glycans, or complex carbohydrates, in cells and organisms.
• Epigenomics - the study of the epigenome, which is the complete set
of epigenetic modifications on a cell's genetic material.
• Phenomics - the study of an organism's phenotype, or its observable
characteristics, and how they change over time.


Big Data Centres:


NCBI - National Center for Biotechnology Information
EMBL - European Molecular Biology Laboratory
IBDC - Indian Biological Data Centre


BIG DATA APPLICATION

Abdalla, H.B. A brief survey on big data: technologies, terminologies and data-intensive applications. J Big Data 9, 107 (2022). https://doi.org/10.1186/s40537-022-00659-3


Techniques used in the big data applications

Abdalla, H.B. A brief survey on big data: technologies, terminologies and data-intensive applications. J Big Data 9, 107 (2022). https://doi.org/10.1186/s40537-022-00659-3


Big data technologies/tools can be grouped into four categories:
(https://bioinformaticsreview.com/20160313/big-data-in-bioinformatics/)
1. Data storage and retrieval:
For mapping sequencing data to specific reference organisms:
• CloudBurst (a parallel computing model)
• Contrail (for assembling large genomes)
• Crossbow (for identifying SNPs from sequence datasets)
• DistMap (a toolkit for distributed short read mapping on a Hadoop cluster)
• SeqWare (to access large-scale whole genome datasets)
• Read Annotation pipeline (a cloud-based pipeline developed by DDBJ to analyze NGS data)
• Hydra (for processing large peptide and spectra databases)


2. Error identification: errors in the sequence datasets need to be identified.
• SAMQA (Sequence Alignment/Map Quality Analysis), which identifies errors and ensures that large-scale genomic data meet the minimum quality standards
• ART - a next-generation sequencing read simulator
3. Data analysis:
• Tools in this category allow researchers to analyze the data obtained by performing experiments.
• GATK (Genome Analysis Toolkit) - a MapReduce-based programming framework used for large-scale DNA sequence analysis
• BlueSNP - an R package for highly scalable genome-wide association studies using Hadoop clusters
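
Several of the tools listed here build on the MapReduce model. The sketch below imitates the three MapReduce stages (map, shuffle, reduce) in plain, single-process Python to count k-mers in a few reads. It does not use the GATK or Hadoop APIs; the function names and the toy reads are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would
    do between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each k-mer."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical toy reads; a real cluster would spread reads over many nodes
reads = ["ACGTAC", "GTACGT"]
pairs = [pair for read in reads for pair in map_phase(read, 3)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

Because each read is mapped independently and each key is reduced independently, both stages can run in parallel on different machines, which is what makes the model attractive for large sequencing datasets.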


4. Platform integration deployment:
• Tools in this category integrate big data technologies into user-friendly operations.
• SeqPig - distributed analysis of large sequencing datasets on Hadoop clusters (reduces the technological skills required to use MapReduce by reading large formatted files to feed analysis applications)
• CloVR (Cloud Virtual Resource) - a sequencing analysis package distributed through a virtual machine
• CloudBioLinux - offers genome analysis resources for cloud computing platforms such as Amazon EC2.


DNA sequence analysis data -
• convert raw data into meaningful results that will guide further research.
????


And then….
• DNA sequencing analysts help develop models and algorithms to manage and analyze Sanger sequencing and NGS data.
• A genome analyst can
• provide insights into the risk factors for genetic disease in a specific individual
• find new targets for drugs
• help with developing personalized medicine.
• A genome analyst can also help with study design, for example the selection of patients for clinical trials based on their genetic makeup.

Some job titles for biological data analysts:

• Bioinformatician
• Bioinformatics Scientist
• Computational Analyst/Biologist
• Genome Analyst
• Genomic Data Analyst
• Rare Disease Analyst/Cancer Analyst

• https://omicstutorials.com/bioinformatics-tools-softwares-programmes/

Data analysis requires


What should I learn?


• Java: computer-based biological simulation technologies
• Perl: string manipulation, regular expression matching, file parsing, data format interconversion, etc.
• R: for performing statistics, machine learning, visualisation and data analysis.
• Python: a high-level programming language that often needs fewer lines of code than would be possible in languages such as C++ or Java.
• BioXML (eXtensible Markup Language): a resource to gather XML documentation, DTDs and XML-aware tools for biology in one location.
• BioCORBA: a framework for inter-language support, providing interoperability between BioPerl and other Perl packages such as Ensembl and the Annotation Workbench.
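
The string manipulation and regular expression matching mentioned above can be tried out in Python with the standard `re` module. The sketch below searches a DNA string for a short motif and reports every start position. The motif `GGATCC` (the BamHI restriction site) and the helper name `find_motif` are illustrative choices, and the sequence is a short excerpt of the mRNA record shown earlier.

```python
import re


def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of `motif` in `sequence`."""
    # A lookahead lets the regex engine report overlapping matches too
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Short excerpt from the GATAD2A mRNA record on the earlier slide
excerpt = "CCCTGAGGATCCCAGAGCTG"

positions = find_motif(excerpt, "GGATCC")
print("GGATCC found at positions:", positions)
```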


What do I get from Big data?


• Ontology - Deriving phenotype data from tons of sequences
• Phylogeny - Deriving evolutionary patterns from genetic data
• SNPs - Finding nucleotide bases that differ from the norm to predict patterns in phenotype...
• Cancer studies - Data science and machine learning technologies extract new meaning from large clinical and molecular datasets.
• Basically, being trained to look at terabytes of data and derive SOME knowledge... and it all depends on what you're looking for...
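
As a small illustration of the SNP idea, the sketch below compares a sample sequence with a reference of the same length and lists the positions where the bases differ. It assumes the two sequences are already aligned and ignores insertions, deletions, and sequencing error, all of which real variant callers must handle. The name `find_snps` and the toy sequences are hypothetical.

```python
def find_snps(reference, sample):
    """Return (1-based position, reference base, sample base) for every
    mismatch between two pre-aligned sequences of equal length."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical toy sequences
reference = "ACGTACGT"
sample = "ACCTACGA"

for pos, ref_base, alt_base in find_snps(reference, sample):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```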
