Intro Biol Datav2

This document provides an overview of the BT3041 course on analysis and interpretation of biological data. It discusses the large amounts of data being generated across various areas of biology such as genomics, proteomics, connectomics and more. It outlines the course structure which will cover topics like data representation, comparison, clustering, classification and statistical analysis techniques to help interpret these large and complex biological datasets.

Uploaded by

a.vidhya 12345

BT3041:

Analysis and Interpretation of Biological Data

Instructor: V. Srinivasa Chakravarthy

Slot: D
Room: CRC301
Life…
under the burden of
BIG DATA
Telecom
• 4.6 billion mobile-phone subscriptions worldwide
• 1-2 billion people accessing the internet. [1]

• The world's effective capacity to exchange information through telecommunication networks:
– 281 petabytes in 1986,
– 471 petabytes in 1993,
– 2.2 exabytes in 2000,
– 65 exabytes in 2007 [8]
– predicted to reach 667 exabytes annually by 2014. (Wikipedia)
Video Surveillance
• Up to 5.9 million closed-circuit television cameras in the UK
• Including 750,000 in “sensitive locations” such as schools, hospitals and care homes
• 1 camera for every 11 people
• A few GB/camera/day

http://www.telegraph.co.uk/technology/10172298/One-surveillance-camera-for-every-11-people-in-Britain-says-CCTV-survey.html
Space Exploration
• Square Kilometer Array (SKA) project
• Radio telescopes with a combined collecting area of about 1 sq km
• “sensitive enough to detect airport radar on a
planet 50 light years away”
• Generates 750 terabytes every SECOND!

http://venturebeat.com/2014/10/05/how-big-data-is-fueling-a-new-age-in-space-exploration/
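To put the quoted SKA rate in perspective, the short sketch below converts it into a daily volume. It assumes decimal (SI) units, where 1 exabyte = 10^6 terabytes; the function name `daily_volume_eb` is illustrative only.

```python
# Rough conversion of the quoted SKA data rate into a daily volume.
# Assumption: decimal (SI) units, i.e. 1 exabyte = 1,000,000 terabytes.

SECONDS_PER_DAY = 24 * 60 * 60
TB_PER_EB = 10**6


def daily_volume_eb(rate_tb_per_s: float) -> float:
    """Return the volume in exabytes produced in one day at the given rate."""
    return rate_tb_per_s * SECONDS_PER_DAY / TB_PER_EB


ska_daily_eb = daily_volume_eb(750)  # 750 TB/s, the figure quoted on the slide
print(f"SKA: about {ska_daily_eb:.1f} exabytes per day")
```

Taken at face value, one day at that rate is of the same order as the 65 exabytes quoted earlier for worldwide telecommunication capacity in 2007.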
DATA in modern world
• Data as the fourth pillar of science
• The first 3 pillars are:
– Theory
– Experiment
– Computation
Jobs!
• Needed by 2018, in US alone:
– 140,000 to 190,000 big data analysts
– 1.5 million managers who understand big data
(Source: McKinsey & Company)
Big data in Biology
A Big Data place
• The European Bioinformatics Institute (EBI) in Hinxton, UK,
– part of the European Molecular Biology Laboratory
– one of the world's largest biology-data repositories,
– currently stores 20 petabytes of data and back-ups
– Data about genes, proteins and small molecules.
Another Big Data place
• Beijing Genomics Institute (BGI) in Shenzhen, China
• “The Sequence Factory”
• 157 genome sequencing instruments working 24/7
• Samples from people, plants, animals and microbes.
• Each day, it generates 6 terabytes of genomic data.
• Every instrument can decode one human genome per week (a task that used to take months or years and many staff).
(Marx, Nature, 2013)

…Where is it all coming from?

THE
OMICS
HUMAN GENOMICS
Human Genome Project
• Aim
– Identify the sequence of bases on all 23 pairs of human chromosomes (3 billion bases, 3 Gb)
– Identify genes within those sequences (~30 000 genes)
– Locate the position of the genes on the chromosomes

• $6 bn, 1000 scientists, 50 countries, 10+ years!

• Human genome can now be sequenced in a few days on the ‘next-generation sequencing’ (NGS) machines

• Full genome data being collected from disease conditions

– the combined cancer genome and normal genome from a single patient constitutes about 1 terabyte (10^12 bytes)
– a million genomes would generate an exabyte (10^18 bytes).

(Courtesy: Karthik Raman)
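The terabyte-to-exabyte step can be checked with a few lines of arithmetic. This is a minimal sketch using only the slide's figures and assuming decimal (SI) units; the helper name `cohort_size_bytes` is made up for illustration.

```python
# Scaling the per-patient figure from the slide to a large cohort.
# Assumption: decimal (SI) units, 1 terabyte = 10**12 bytes, 1 exabyte = 10**18 bytes.

BYTES_PER_TB = 10**12
BYTES_PER_EB = 10**18


def cohort_size_bytes(n_patients: int, tb_per_patient: int = 1) -> int:
    """Total bytes for n_patients, each contributing tb_per_patient terabytes."""
    return n_patients * tb_per_patient * BYTES_PER_TB


total = cohort_size_bytes(1_000_000)  # one million patients at about 1 TB each
print(total // BYTES_PER_EB, "exabyte(s)")
```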

Types of Genomics
• Disease genomics
– Millions of patients per disease
• DNA profiling
– Family lineages, parenting, forensics etc
• Comparative genomics
• Plant genomics
• Bacterial genomics
• Viral genomics
Transcriptome
• The set of all RNA molecules in a given cell,
population of cells or an organism
• A gene may produce many different types of
mRNA molecules, so a transcriptome is much
more complex than the genome that encodes
it.
Proteins
• Peptide: a chain of amino acids (AAs)
• Assuming an average size of 200 AAs, the number of possible proteins is 20^200 > number of protons in the universe
• Assume:
– there are 10^7–10^8 species on Earth and
– 10^3–10^5 genes/species,
– so there are 10^10–10^13 unique protein sequences,
– << possible sequence space,
– >> known protein number
– http://www.ncbi.nlm.nih.gov/books/NBK20267/
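The orders of magnitude on this slide can be reproduced with Python's exact integers. The sketch below assumes 20 standard amino acids and takes roughly 10^80 as the number of protons in the observable universe; that figure is a commonly quoted order of magnitude and is not stated on the slide.

```python
# Comparing the size of protein sequence space with the slide's estimates.
# Assumptions (illustrative): 20 standard amino acids, proteins of length 200,
# and about 10**80 protons in the observable universe (not given on the slide).
import math

N_AMINO_ACIDS = 20
PROTEIN_LENGTH = 200
PROTONS_IN_UNIVERSE = 10**80  # assumed order of magnitude

possible = N_AMINO_ACIDS ** PROTEIN_LENGTH  # exact Python integer
print(f"20^200 is about 10^{math.log10(possible):.0f}")
print("larger than the proton count:", possible > PROTONS_IN_UNIVERSE)

# Unique sequences in nature: (number of species) x (genes per species).
low = 10**7 * 10**3   # fewest species, fewest genes per species
high = 10**8 * 10**5  # most species, most genes per species
print(f"natural sequences: 10^{math.log10(low):.0f} to 10^{math.log10(high):.0f}")
```

Even the upper estimate of natural sequences is a vanishing fraction of the possible sequence space, which is the point of the "<<" on the slide.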
Single Protein Study

• Structure prediction
– Secondary structure
– Tertiary structure
– Quaternary Structure
• Can be quite complex
– p53 – protein encoded by a tumor suppressor gene
– p53 mutation database exists
– 60,000 publications on p53 alone!!!
Proteomics
• The study of the full complement of proteins (the proteome) expressed in a cell, organ or organism
Interactome
• The whole set of molecular interactions in a particular cell.
I’m here

NOW…
LET’S GET TO THE CELL LEVEL
Neurons come in different Shapes
Neuromorph
The Connectome Project
• Find out the complete wiring diagram of the brain
• Human brain
– Has 100 billion neurons
– Each neuron has about 1k-10k connections
• Feasible for smaller organisms
– C. elegans connectome – about 300 neurons, 7000 connections (White et al., 1986)
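A quick estimate shows why the full human wiring diagram is so much harder than the C. elegans one. The sketch uses only the figures quoted on the slide and simply multiplies neurons by connections per neuron, so it should be read as an order-of-magnitude bound and not a measured count.

```python
# Order-of-magnitude estimate of connections in the human brain versus
# C. elegans, using only the figures quoted on the slide.

HUMAN_NEURONS = 100 * 10**9              # 100 billion neurons
CONNECTIONS_PER_NEURON = (10**3, 10**4)  # 1k to 10k connections each

human_low = HUMAN_NEURONS * CONNECTIONS_PER_NEURON[0]
human_high = HUMAN_NEURONS * CONNECTIONS_PER_NEURON[1]

C_ELEGANS_CONNECTIONS = 7000

print(f"human: {human_low:.0e} to {human_high:.0e} connections")
print(f"ratio to C. elegans: at least {human_low // C_ELEGANS_CONNECTIONS:,}x")
```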
Human Connectome Project

- 1200 individuals
- fMRI + dMRI + MRI + MEG
- Washington U + U. Minnesota

http://www.humanconnectome.org/
Connectome Data Sizes
HCP Data Sizes (per Subject)

Session                                   Format                  .zip File Size
Structural                                Unprocessed             70.99 MB
                                          Preprocessed            1.19 GB
Resting State fMRI (each of 2 sessions)   Unprocessed             2 GB
                                          Preprocessed            3.24 GB
Task fMRI (avg per Task)                  Unprocessed             490 MB
                                          Preprocessed            771 MB
Task fMRI (all 7 Tasks)                   Unprocessed             3.43 GB
                                          Preprocessed            5.4 GB
Diffusion                                 Unprocessed             2.18 GB
                                          Preprocessed            2.81 GB
Group-Average on Unrelated 20             Additionally Processed  289 MB
Total (per Subject)                       Unprocessed             9.81 GB
                                          Preprocessed            15.77 GB
                                          Both                    25.58 GB
Total (5 Subjects)                        Unprocessed             62.16 GB
                                          Preprocessed            78.83 GB
                                          Both                    141 GB
Total (20 Subjects)                       Unprocessed             247.34 GB
                                          Preprocessed            315.05 GB
                                          Both                    562.39 GB
Total (68 Subjects)                       Unprocessed             815.4 GB
                                          Preprocessed            1.058 TB
                                          Both                    1.873 TB

For 68 subjects: 1.8 Terabytes!!
The Hierarchy

Tissue/organ: Large scale networks
Cell (e.g. neuron): Microcircuits
Proteome: Metabolome/Interactome
Transcriptome: Regulatory Networks
Genome
ANALYZE THAT!
Questions?
• How to represent an object?
• How to compare objects?
– Same type or different?
– How different?
• How to group/cluster objects based on similarity?
• How to assign objects to classes?
• How to compare groups of objects?
– Are two groups of objects really different?
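One common way to make these questions concrete, offered here only as an illustrative sketch and not as the course's prescribed method, is to represent each object as a vector of numeric features and to measure how different two objects are by the Euclidean distance between their vectors. The feature values and the names `euclidean_distance` and `nearest` are made up for illustration.

```python
# Representing objects as feature vectors and comparing them by distance.
# All sample names and feature values below are hypothetical.
import math
from typing import Dict, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Sequence[float], known: Dict[str, Sequence[float]]) -> str:
    """Name of the known object whose vector is closest to the query."""
    return min(known, key=lambda name: euclidean_distance(query, known[name]))


objects = {
    "sample_a": [1.0, 2.0, 0.5],
    "sample_b": [1.1, 1.9, 0.4],
    "sample_c": [5.0, 0.2, 3.0],
}

d_ab = euclidean_distance(objects["sample_a"], objects["sample_b"])
d_ac = euclidean_distance(objects["sample_a"], objects["sample_c"])
print(f"a vs b: {d_ab:.3f}, a vs c: {d_ac:.3f}")
print("closest to query:", nearest([4.8, 0.0, 3.1], objects))
```

A small distance suggests two objects are of the same type, and the same distance function is the building block for grouping objects by similarity or assigning a new object to the class of its nearest neighbour.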
Course Structure
• Mathematical Preliminaries
– Vectors, vector spaces
– Eigenvalues and eigenvectors
– Derivatives in higher dimensions
– Linear Least Squares problem
– Optimization
• Lagrange multipliers
– Probability and Bayes theorem
Unsupervised Learning methods
• Clustering
– K-means
– Hierarchical clustering
– Scale-based clustering
– Fuzzy clustering
– Graph based clustering
– Self-organizing map
• Dimensionality reduction
– PCA and ICA
Classification
• Prototype-based classification
– K-nearest-neighbor classifier
– Learning Vector Quantization
Classification
• Discriminant-based classification
– Linear Discriminant Analysis
– Neural Networks
• Multilayer perceptron
• Radial Basis Function Network
– Support Vector Machines
– Bayesian Classifier
Biostatistics
• Standard Normal Distribution;
• Hypothesis testing;
• Multiple hypothesis testing;
• Chi-squared distribution;
• F-test and Student’s t-test;
• ANOVA;
• Regression Analysis
Text Books
• Introduction to Data Mining – Tan/Steinbach/Kumar
• Neural Networks: A Classroom Approach – Satish Kumar
• Analysis of Biological Data – Whitlock/Schluter
Grading
• Quiz 1 – 20
• Quiz 2 – 20
• Assignments – 20
• Endsem – 40

• Grading policy – RG!!!

May the
DATA
be with you!
