29-07-2024

Introduction to Bioinformatics
24 – 07 – 2023
23-07-2024

Scope of this course

• Review of the cell structure
• Biomolecules (proteins, carbohydrates, lipids, nucleic acids): their structure, function and chemistry
• Understand the functioning of biological systems (interaction of the biomolecules in the cell: METABOLISM)
• Sequence analysis: BLAST, multiple sequence alignment, phylogenetic analysis, protein sequence
• Biological data collection, interpretation, and analysis
• Develop analytical ability to solve real-world problems using these methodologies
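
As a taste of the sequence analysis topic listed above, here is a minimal sketch of global pairwise alignment scoring (the Needleman-Wunsch dynamic programming idea). The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; BLAST and multiple sequence alignment tools use more elaborate heuristics and scoring matrices.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    Only the score is computed (no traceback), using one row of the
    dynamic programming table at a time.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("ACGT", "ACGT"))
    print(nw_score("ACGT", "AGT"))
```

Identical sequences score one point per base, and each inserted gap or mismatch lowers the score, which is the intuition behind ranking alignments.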


What is Bioinformatics?
• Bioinformatics is a multidisciplinary field that uses computer
programming, machine learning, algorithms, statistics, and
other computational tools to organize and analyze large
volumes of biological data.
• Bioinformatics tools are software programs designed
  • to extract meaningful information from the mass of molecular
  biology / biological databases
  • to carry out sequence or structural analysis.


What is Biological data?

>NM_001384529.1 Homo sapiens GATA zinc finger domain containing 2A (GATAD2A), transcript variant 21, mRNA
AGTGTGAGACTGAGCCGCGAGACTGAGCTGCGGCTCCGAGCGCTGCGCGGCGGCTCCTCCCGCCCAGGGT
CAGCGCCCCGGCGCGCGCACGCGCACCCCCGCCGCCCGAGCGCGCCCCGCGCCGCCCGCGCAGTCGGTCG
GTCGGTCGTCTGTCCTGTCGCCGCTGCCGCCGCCGCCACAGCGGCCGCCGCGGGCGCCACCTGAGGGAGT
CGCCTCCGCGGGACGCCACAAGACCTGACCGGACTGCGCCGCCCGAGGCCGTCGGCCGCCGTCAGCGAGG
GCGCCGAGCAACTTCGGAGCAACAGATTTGGATGAAACCCTGAGGATCCCAGAGCTGAAAGTGAGTTTGA
AGTGTCCGATCCAGTCCTTCAACTCAGAGCACTCCTATCTGTGACACCTCTGCCCACGCATCCAGTATGT
GCAGCACACCTGCTCTGTGACTGACACTCTTGCAGAAGTGGGGCCACTTCAGGGACATGGACAAGGTGTT
GTACCTGCTGTCACAGAGCCTGTTATCTGAATGACCGAAGAAGCATGCCGAACACGGAGTCAGAAACGAG
CGCTTGAACGGGACCCAACAGAGGACGATGTGGAGAGCAAGAAAATAAAAATGGAGAGAGGATTGTTGGC
TTCAGATTTAAACACTGACGGAGACATGAGGGTGACACCTGAGCCGGGAGCAGGTCCAACCCAAGGATTG
CTGAGGGCAACAGAGGCCACGGCCATGGCCATGGGCAGAGGCGAAGGGCTGGTGGGCGATGGGCCCGTGG
ACATGCGCACCTCACACAGTGACATGAAGTCCGAGAGGAGACCCCCCTCACCTGACGTGATTGTGCTCTC
CGACAACGAGCAGCCCTCGAGCCCGAGAGTGAATGGGCTGACCACGGTGGCCTTGAAGGAGACTAGCACC
GAGGCCCTCATGAAAAGCAGTCCTGAAGAACGAGAAAGGATGATCAAGCAGCTGAAGGAAGAATTGAGGT
TAGAAGAAGCAAAACTCGTGTTGTTGAAAAAGTTGCGGCAGAGTCAAATACAAAAGGAAGCCACCGCCCA
GAAGCCCACAGGTTCTGTTGGGAGCACCGTGACCACCCCTCCCCCGCTTGTTCGGGGCACTCAGAACATT
CCTGCTGGCAAGCCATCACTCCAGACCTCTTCAGCTCGGATGCCCGGCAGTGTCATACCCCCGCCCCTGG
TCCGAGGTGGGCAGCAGGCGTCCTCGAAGCTGGGGCCACAGGCGAGCTCACAGGTCGTCATGCCCCCACT
CGTCAGGGGGGCTCAGCAAATCCACAGCATTAGGCAACATTCCAGCACAGGGCCACCGCCCCTCCTCCTG
GCCCCCCGGGCGTCGGTGCCCAGTGTGCAGATTCAGGGACAGAGGATCATCCAGCAGGGCCTCATCCGCG
TCGCCAATGTTCCCAACACCAGCCTGCTCGTCAACATCCCACAGCCCACCCCAGCATCACTGAAGGGGAC
AACAGCCACCTCCGCTCAGGCCAACTCCACCCCCACTAGTGTGGCCTCTGTGGTCACCTCTGCCGAGTCT
CCAGCAAGCCGACAGGCGGCCGCCAAGCTGGCGCTGCGCAAACAGCTGGAGAAGACGCTACTCGAGATCC
CCCCACCCAAGCCCCCAGCCCCAGAGATGAACTTCCTGCCCAGCGCCGCCAACAACGAGTTCATCTACCT
GGTCGGCCTGGAGGAGGTGGTGCAGAACCTACTGGAGACACAAGGCAGGATGTCGGCCGCCACTGTGCTG
TCCCGGGAGCCCTACATGTGTGCACAGTGCAAGACGGACTTCACGTGCCGCTGGCGGGAGGAGAAGAGCG
GCGCCATCATGTGTGAGAACTGCATGACAACCAACCAGAAGAAGGCGCTCAAGGTGGAGCACACCAGCCG
GCTGAAGGCCGCCTTTGTGAAGGCGCTGCAGCAGGAACAGGAGATTGAGCAGCGGCTCCTGCAGCAGGGC
ACGGCCCCTGCACAGGCCAAGGCCGAGCCCACCGCTGCCCCACACCCCGTGCTGAAGCAGGTCATAAAAC
CCCGGCGTAAGTTGGCGTTCCGCTCAGGAGAGGCCCGCGACTGGAGTAACGGGGCTGTGCTACAGGCCTC
CAGCCAGCTGTCCCGGGGTTCGGCCACGACGCCCCGAGGTGTCCTGCACACGTTCAGTCCGTCACCCAAA
CTGCAGAACTCAGCCTCGGCCACAGCCCTGGTCAGCAGGACCGGCAGACATTCTGAGAGAACCGTGAGCG
CCGGCAAGGGCAGCGCCACCTCCAACTGGAAGAAGACGCCCCTCAGCACAGGCGGGACCCTTGCGTTTGT
CAGCCCAAGCCTGGCGGTGCACAAGAGCTCCTCGGCCGTGGACCGCCAGCGAGAGTACCTCCTGGACATG
ATCCCACCCCGCTCCATCCCCCAGTCAGCCACGTGGAAATAGTGCGAGCCAGGCCCCGTGGAAGACGGGC
TCCCTCCTCCCCCACCTGGCCCCTGGTCTAGAAGGACCCACTGCACCACCCTCCGCTGGCTCGGGAAGAC
ACCGTGCCCGCCCCAAGAGCAAGCACCGGCCATGCTGCAGAGGCAAGACCTCAATTCTTGGCTGCAAAGT
TTCATCAGGGCTAGGGGGCTGGTGCCGCCTCATAGGCAGACGAGGATCATCGCTGGGGGACCTTTCCCGT
GGGCTTTCTTCCTTTCTCTCTTTGCCTTTAGTTTGCCCGACACCAGCAGAAAAGTGGACCTTGGGGGCTG
GTTCTGCTCCTGGCCCCCTTGTTCAGCCCCTGCCGGCACACGGGCGGCTCACCCTGGACACTGTGATGCG
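
The record above is in FASTA format: a header line starting with `>` followed by sequence lines. A minimal sketch of how such a record could be read and summarised is given below. The helper names `parse_fasta` and `gc_content` are hypothetical, and the example uses only the first two sequence lines of the record, so the printed length and GC fraction describe that excerpt, not the full transcript.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Excerpt: header plus the first two sequence lines of the record above.
example = "\n".join([
    ">NM_001384529.1 Homo sapiens GATA zinc finger domain containing 2A (GATAD2A), transcript variant 21, mRNA",
    "AGTGTGAGACTGAGCCGCGAGACTGAGCTGCGGCTCCGAGCGCTGCGCGGCGGCTCCTCCCGCCCAGGGT",
    "CAGCGCCCCGGCGCGCGCACGCGCACCCCCGCCGCCCGAGCGCGCCCCGCGCCGCCCGCGCAGTCGGTCG",
])

records = parse_fasta(example)
header, seq = records[0]
accession = header.split()[0]  # first token of the header is the accession

if __name__ == "__main__":
    print(accession, len(seq), round(gc_content(seq), 3))
```

In practice a library such as Biopython would normally handle parsing; the hand-written version is only meant to show what the format contains.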


Protein sequence
>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRP
HDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFNQLQ
NYCNVP
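
A protein sequence like the one above is a string over the 20-letter amino acid alphabet, so simple summaries are easy to compute. The sketch below counts residues in the insulin sequence shown; the helper name `composition` is hypothetical and the analysis is illustrative only.

```python
from collections import Counter

# Insulin sequence from the record above (AAA40590.1), joined into one string.
insulin = (
    "MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRP"
    "HDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFNQLQ"
    "NYCNVP"
)


def composition(protein):
    """Return a dict mapping each amino acid letter to its fraction."""
    counts = Counter(protein)
    total = len(protein)
    return {aa: n / total for aa, n in counts.items()}


counts = Counter(insulin)

if __name__ == "__main__":
    print("length:", len(insulin))
    print("most common residues:", counts.most_common(3))
    print("cysteines:", counts["C"])
```

Counting cysteines is a natural first check for insulin, because its chains are held together by disulfide bonds between cysteine residues.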

Gene Expression Data


Sequencing data

What is biological big data?

• Biological big data is the massive volume of data generated by
multi-omics experiments, such as genomics, transcriptomics, proteomics,
metabolomics, phenomics, glycomics, epigenomics, and other omics.
These data are used to study biological processes and to gain insights
into how living systems work.


• The central role of bioinformatics in modern biological investigation based on 'omics' sciences.
• Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes.
• An organism's transcriptome is the sum of all of its RNA transcripts.
• Proteomics is the large-scale study of proteins.
• Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism.

• Glycomics - the study of glycans, or complex carbohydrates, in cells and
organisms.
• Epigenomics - the study of the epigenome, which is the complete set
of epigenetic modifications on a cell's genetic material.
• Phenomics - the study of an organism's phenotype, or its observable
characteristics, and how they change over time.


Big Data Centres:

NCBI - National Center for Biotechnology Information
EMBL - European Molecular Biology Laboratory
IBDC – Indian Biological Data Center

BIG DATA APPLICATION

Abdalla, H.B. A brief survey on big data: technologies,
terminologies and data-intensive applications. J Big Data 9, 107
(2022). https://doi.org/10.1186/s40537-022-00659-3


Techniques used in the big data applications

Abdalla, H.B. A brief survey on big data: technologies, terminologies and data-intensive
applications. J Big Data 9, 107 (2022). https://doi.org/10.1186/s40537-022-00659-3


Big data technologies/tools can be grouped into four categories:

(https://bioinformaticsreview.com/20160313/big-data-in-bioinformatics/)
1. Data storage and retrieval:
Tools for mapping sequencing data to specific reference organisms and for accessing large datasets:
• CloudBurst, a parallel computing model
• Contrail, for assembling large genomes
• Crossbow, for identifying SNPs from sequence datasets
• DistMap (a toolkit for distributed short read mapping on a Hadoop cluster)
• SeqWare (to access large-scale whole genome datasets)
• Read Annotation pipeline (developed by DDBJ; a cloud-based pipeline to analyze NGS data)
• Hydra (for processing large peptide and spectra databases)

2. Error Identification: Necessary to identify errors in the sequence datasets

• SAMQA (Sequence Alignment/Map Quality analysis) which identifies errors and ensures
that large-scale genomic data meet the minimum quality standards
• ART - a next-generation sequencing read simulator

3. Data Analysis:
• This feature of big data allows the researchers to analyze the data obtained by performing
experiments.
• GATK (Genome Analysis Toolkit) is a MapReduce-based programming framework -
used for large-scale DNA sequence analysis
• BlueSNP - R package for highly scalable genome-wide association studies using Hadoop
clusters


4. Platform Integration Deployment:

• Integrates big data technologies into user-friendly operations.
• SeqPig - distributed analysis of large sequencing datasets on Hadoop clusters (reduces
the technological skills required to use MapReduce by reading large formatted files to
feed analysis applications)
• CloVR (Cloud Virtual Resource) is a sequencing analysis package distributed through a
virtual machine
• CloudBioLinux - offers genome analysis resources for cloud computing platforms such
as Amazon EC2.
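
Several of the tools above (GATK, SeqPig, DistMap) are described in terms of MapReduce and Hadoop. The sketch below illustrates the MapReduce idea on a single machine by counting k-mers in a few reads: a map step produces partial counts per read, and a reduce step merges them. The reads and the function names `map_kmers` and `reduce_counts` are hypothetical; a real Hadoop job would distribute these steps across a cluster.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step: count every k-mer (substring of length k) in one read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial k-mer counts."""
    return left + right


# Hypothetical toy reads, not real sequencing data.
reads = ["ACGTAC", "GTACGT", "TACG"]

partials = [map_kmers(read, 3) for read in reads]    # map
totals = reduce(reduce_counts, partials, Counter())  # reduce

if __name__ == "__main__":
    for kmer, count in sorted(totals.items()):
        print(kmer, count)
```

Because each read is processed independently in the map step, the work can be split across many machines, which is what makes the model attractive for large sequencing datasets.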


DNA sequence analysis data

• Convert raw data into meaningful results that will guide further research.
????
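
One common early step in turning raw reads into usable results is quality filtering. The sketch below assumes reads stored in four-line FASTQ records with Phred+33 quality encoding (quality score = ASCII code minus 33). The reads, the threshold of 20, and the function names are hypothetical choices for illustration, not part of the original slides.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Average Phred score of a read (0.0 for an empty quality string)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(fastq_text, min_mean_q=20):
    """Keep (name, sequence) for reads whose mean quality meets the threshold.

    Assumes simple four-line FASTQ records: @name, sequence, +, quality.
    """
    lines = [ln.strip() for ln in fastq_text.strip().splitlines()]
    kept = []
    for i in range(0, len(lines) - 3, 4):
        name, seq, _plus, qual = lines[i:i + 4]
        if mean_quality(qual) >= min_mean_q:
            kept.append((name[1:], seq))  # drop the leading '@'
    return kept


# Hypothetical toy records: 'I' encodes Phred 40, '!' encodes Phred 0.
fastq = "\n".join([
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "!!!!",
])

kept = filter_reads(fastq)

if __name__ == "__main__":
    print(kept)
```

Dedicated quality-control tools do considerably more (adapter trimming, per-position statistics), but the decoding step shown here underlies all of them.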


And then….
• DNA sequencing analysts help develop models
and algorithms to manage and analyze Sanger sequencing and
NGS data.
• A genome analyst can
  • provide insights into the risk factors for genetic disease in a specific individual
  • find new targets for drugs
  • help develop personalized medicine.
• A genome analyst can also help with study design, for example the
selection of patients for clinical trials based on their genetic
makeup.

Some job titles for biological data analysts:

• Bioinformatician
• Bioinformatics Scientist
• Computational Analyst/Biologist
• Genome Analyst
• Genomic Data Analyst
• Rare Disease Analyst/Cancer Analyst

• https://omicstutorials.com/bioinformatics-tools-softwares-programmes/


Data analysis requires

What should I learn?

• Java: computer-based biological simulation technologies
• Perl: string manipulation, regular expression matching, file parsing, data
format interconversion, etc.
• R: for performing statistics, machine learning, visualisations and data analyses.
• Python: a high-level programming language that often needs fewer lines of code than
languages such as C++ or Java.
• BioXML (eXtensible Markup Language): a resource that gathers XML
documentation, DTDs and XML-aware tools for biology in one location.
• Biocorba: a framework for interlanguage support, providing interoperability between
bioperl and other Perl packages such as Ensembl and the Annotation
Workbench.
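
The string manipulation and regular expression matching mentioned above are everyday tasks in sequence work. The Python sketch below finds a motif with a regular expression and computes a reverse complement. The function names are hypothetical; the sequence is a short excerpt from the GATAD2A mRNA shown earlier, and `GGATCC` (the BamHI recognition site) is used purely as an example motif.

```python
import re

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, pattern):
    """Return 0-based start positions of all matches, including overlaps.

    A lookahead assertion lets the regex engine report overlapping hits.
    """
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


# Short excerpt from the mRNA record shown earlier.
excerpt = "CCTGAGGATCCCAGAGCTG"
hits = find_motif(excerpt, "GGATCC")

if __name__ == "__main__":
    print(hits)
    print(reverse_complement("GGATCC"))
```

The same two operations are one-liners in Perl as well, which is why both languages became popular for this kind of work.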


What do I get from Big data?

• Ontology - deriving phenotype data from tons of sequences
• Phylogeny - deriving evolutionary patterns from genetic data
• SNPs - finding nucleotide bases that differ from the norm to predict
patterns in phenotype
• Cancer studies - data science and machine learning technologies
extract new meaning from large clinical and molecular datasets.
• Basically, being trained to look at terabytes of data and derive SOME
knowledge, and it all depends on what you're looking for.
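
The SNP and phylogeny items above both start from comparing aligned sequences position by position. A minimal sketch follows, assuming two sequences that are already aligned to the same length with `-` marking gaps. The sequences and the function names `find_snps` and `p_distance` are hypothetical; real variant calling works from read alignments and quality scores rather than a direct string comparison.

```python
def find_snps(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s and "-" not in (r, s)
    ]


def p_distance(a, b):
    """Proportion of compared (non-gap) positions that differ."""
    compared = sum(1 for x, y in zip(a, b) if "-" not in (x, y))
    return len(find_snps(a, b)) / compared if compared else 0.0


# Hypothetical aligned sequences for illustration.
reference = "ACGTACGT"
sample = "ACCTACGA"
snps = find_snps(reference, sample)

if __name__ == "__main__":
    print(snps)
    print(p_distance(reference, sample))
```

Pairwise distances like `p_distance`, computed for every pair of sequences, are one simple input to distance-based tree building in phylogenetics.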
