Deepbind: 6.874 - Pranam Chatterjee

DeepBind is a deep learning model that learns sequence specificities from large datasets of protein binding sequences. It generates Position Weight Matrix models for over 500 transcription factors and 194 RNA binding proteins. DeepBind outperforms other non-deep learning methods on tasks like predicting DNA and RNA binding. It can also identify variants that affect protein binding and help understand alternative splicing regulation. Overall, DeepBind demonstrates the ability of deep learning to discover sequence motifs from huge amounts of protein binding data.

DeepBind

6.874 - Pranam Chatterjee

Why do we care?
● Regulatory processes
○ Transcription
○ Alternative Splicing
○ Disease correlation

● Sequence specificity
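Position Weight Matrix
Steps:
1. Get the position frequency matrix (PFM) by counting occurrences of each nucleotide at each position.
2. Divide each count by the total number of sequences.
3. Formally, given a set X of N aligned sequences of length l, the entry for nucleotide k at position j is
$M_{k,j} = \frac{1}{N} \sum_{i=1}^{N} I(X_{i,j} = k)$,
where I is the indicator function (1 if sequence i has nucleotide k at position j, 0 otherwise).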
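The counting and normalizing steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture or the paper: the function names, the toy sequences, and the optional log-odds step (uniform background, pseudocount of 0.5) are all assumptions made for the example.

```python
import math

NUCLEOTIDES = "ACGT"


def position_frequency_matrix(seqs):
    """Step 1: count each nucleotide at each position of the aligned sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pfm = {k: [0] * length for k in NUCLEOTIDES}
    for s in seqs:
        for j, base in enumerate(s):
            pfm[base][j] += 1  # raises KeyError on symbols outside ACGT
    return pfm


def position_probability_matrix(seqs):
    """Step 2: divide each count by the number of sequences N."""
    n = len(seqs)
    pfm = position_frequency_matrix(seqs)
    return {k: [count / n for count in row] for k, row in pfm.items()}


def log_odds_weights(seqs, background=0.25, pseudocount=0.5):
    """Optional step (an assumption, not on the slide): log2 odds against a
    uniform background. The pseudocount avoids taking log of zero."""
    n = len(seqs)
    pfm = position_frequency_matrix(seqs)
    total = n + pseudocount * len(NUCLEOTIDES)
    return {
        k: [math.log2(((count + pseudocount) / total) / background) for count in row]
        for k, row in pfm.items()
    }


# Hypothetical toy alignment of N = 4 sequences of length 4.
example = ["ACGT", "ACGA", "ACTT", "AGGT"]
for base, row in position_probability_matrix(example).items():
    print(base, row)
```

Each column of the normalized matrix sums to 1, so it can be read as a per-position nucleotide distribution.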
Data Issues
● Different forms of data
○ Specificity coefficient
■ Protein Binding Microarrays
■ RNAcompete arrays
○ Ranked Lists of Bound Sequences
■ ChIP-Seq
○ High Affinity Sequence List
■ HT-SELEX
● Large Quantities of Data
○ 10,000-100,000 sequences per experiment
● Additional Biases/Limitations
○ e.g., hyper-ChIPable regions of genome
○ Need to filter
DeepBind Claims
● Apply to both microarray and sequencing data
● Generalize well across technologies
● Tolerate noise and mislabeled data
● Can learn from millions of sequences through parallel implementation on a
graphics processing unit (GPU)
● Train models and tune parameters automatically
● Can discover new patterns without location information

Alipanahi, et al., Nature Biotechnology, 2015.

Overview of DeepBind

Alipanahi, et al., Nature Biotechnology, 2015.

Training Procedure

MAX or MEAN

BINDING SCORE
Alipanahi, et al., Nature Biotechnology, 2015.
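A rough sketch of how a sequence becomes a binding score: one-hot encode the sequence, scan it with motif detectors, rectify, pool with MAX or MEAN, then combine the pooled features linearly. This is a simplified illustration under stated assumptions, not the DeepBind implementation. The function names, the single hand-written detector, and all parameter values are hypothetical, and training, padding, and the hidden layer are left out.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a sequence as a list of 4-element rows, one row per base."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def convolve(x, detector):
    """Slide one motif detector (m rows of 4 weights) along the encoded sequence."""
    m = len(detector)
    if len(x) < m:
        raise ValueError("sequence is shorter than the motif detector")
    return [
        sum(x[start + j][k] * detector[j][k] for j in range(m) for k in range(4))
        for start in range(len(x) - m + 1)
    ]


def binding_score(seq, detectors, thresholds, weights, bias=0.0, pool="max"):
    """Convolve, rectify, pool, then take a weighted sum of the pooled features."""
    x = one_hot(seq)
    features = []
    for detector, threshold in zip(detectors, thresholds):
        # Rectification: keep only responses above the detector's threshold.
        rectified = [max(0.0, v - threshold) for v in convolve(x, detector)]
        if pool == "max":
            features.append(max(rectified))
        else:
            features.append(sum(rectified) / len(rectified))
    return bias + sum(w * f for w, f in zip(weights, features))


# Hypothetical detector that responds to the 3-mer "TGA".
tga_detector = [[1.0 if b == target else 0.0 for b in BASES] for target in "TGA"]

print(binding_score("ACTGAC", [tga_detector], [1.0], [1.5], pool="max"))
print(binding_score("ACTGAC", [tga_detector], [1.0], [1.5], pool="mean"))
```

Max pooling reports whether the motif occurs anywhere in the sequence, while mean pooling is sensitive to how often it occurs.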
Calibration and Testing Procedure

12 terabases of data!!!
Alipanahi, et al., Nature Biotechnology, 2015.
Let’s unpack that...
● Thousands of PBM, RNAcompete, ChIP-Seq, and HT-SELEX experiments
● Create 927 DeepBind models
● 538 Transcription Factors
● 194 RNA-binding Proteins (RBPs)

(This took 4+ years, btw)

Alipanahi, et al., Nature Biotechnology, 2015.

How well does it work?
● Test on PBM data from DREAM5 TF-DNA Motif Recognition Challenge
● 86 different mouse transcription factors
● 2 array designs (~40,000 probes each)
○ All possible 10-mers, non-palindromic 8-mers (32x)
● Train on probe intensities, predict on held-out test array design

Example Competing Algorithms (26 in total)

● FeatureREDUCE (biophysical PWM/k-mer)
● BEEML-PBM (weighted regression)
● RankMotif++ (probabilistic)
● PFM models (position frequency matrices)

None of these are deep-learning-based!
Alipanahi, et al., Nature Biotechnology, 2015.
Metrics
● Area Under the Curve (AUC)
○ Area under the ROC curve, which plots the model's true positive rate as a function of its false positive rate
○ Tells us how well the model identifies actual positives
○ Higher AUC means a better-performing model

● Pearson Correlation
○ Measures linear correlation between predicted intensities and measured probe intensities
○ Values closer to 1 indicate a better-performing model
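Both metrics are short enough to compute by hand. The sketch below uses only the standard library; the function names and toy numbers are assumptions for illustration. The AUC is computed through its rank interpretation (the chance that a randomly chosen positive scores above a randomly chosen negative), which equals the area under the ROC curve.

```python
import math


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy example: 1 = bound probe, 0 = unbound probe.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The pairwise loop is quadratic in the number of probes, so for arrays with ~40,000 probes a sorting-based implementation would be the practical choice.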
Quantitative Performance Against Other Methods

Alipanahi, et al., Nature Biotechnology, 2015.

Do in vitro models accurately identify in vivo bound
sequences?
● 506 ENCODE ChIP-Seq data sets
● In vivo laboratory biases
○ Cell-type specificities
○ Nucleosome interactions
○ Chromatin remodeling, etc.
● 137 transcription factors
● Performed better than other
non-deep learning methods based
on AUC
● Can generalize to other data
acquisition methods
Alipanahi, et al., Nature Biotechnology, 2015.
First place goes to….DEEPBIND!
Why are RBP sequence specificities difficult to predict?
● Usually bind to ssRNA
● More flexible than DNA
● Can fold into stable secondary structures
● Recognition motif is highly flexible
○ Multiple domains needed for binding
● RNA structure also affects binding

Rhee, et al., Nucleic Acids Research, 2008.

Identifying Damaging
Genetic Variants

● How to do this?
● MUTATION MAPS!
○ Importance of each base
○ Effect of each mutation on
binding score
● Illustrates effect of point
mutations on binding
affinity

Alipanahi, et al., Nature Biotechnology, 2015.
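A mutation map can be sketched as a table of score changes: for every position and every alternative base, rescore the mutated sequence and subtract the wild-type score. The code below is a simplified illustration. `toy_score` is a hypothetical stand-in for a trained model, and the plain score difference is an assumption made here, not necessarily the exact quantity plotted in the paper.

```python
BASES = "ACGT"


def toy_score(seq, motif="GATA"):
    """Hypothetical stand-in for a trained model: the best number of
    matching bases between the motif and any window of the sequence."""
    best = 0
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        matches = sum(1 for a, b in zip(window, motif) if a == b)
        best = max(best, matches)
    return float(best)


def mutation_map(seq, score_fn):
    """Return one dict per position mapping each base to the change in
    score when that base is substituted at that position."""
    wild_type = score_fn(seq)
    rows = []
    for i, ref in enumerate(seq):
        row = {}
        for base in BASES:
            if base == ref:
                row[base] = 0.0  # no change for the reference base
            else:
                mutant = seq[:i] + base + seq[i + 1:]
                row[base] = score_fn(mutant) - wild_type
        rows.append(row)
    return rows


# Negative entries weaken the predicted site, positive entries strengthen it.
for position, row in enumerate(mutation_map("CCGATACC", toy_score)):
    print(position, row)
```

Swapping `toy_score` for any function that maps a sequence to a binding score gives the same kind of map for that model.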

Mutation in MYC Enhancer Weakens TCF7L2
Binding Site

Alipanahi, et al., Nature Biotechnology, 2015.

SNP in Globin Cluster Creates GATA1 Binding Site

Alipanahi, et al., Nature Biotechnology, 2015.

DeepFind: an aggregate model
● What’s the point? To provide
collective contexts.
● I.e., true TF binding sites are
likely to be located with other
TF binding sites
● AUC ~ 0.76
● Predicts deleterious SNVs in
promoters

Alipanahi, et al., Nature Biotechnology, 2015.

One more application: Alternative Splicing
● AS generates transcriptional
diversity
● RBPs regulate splicing
● Binding scores at exon
junctions regulated by splicing
regulators
● Consistent with experimental CLIP-seq data and known binding profiles of RBPs

Alipanahi, et al., Nature Biotechnology, 2015.

Prediction of Nova Regulation Mechanism
DeepBind Motif Learning
Key Takeaways
● GOAL: given regions experimentally determined to be bound by proteins,
what is the model describing bound sequences?
● Sequences/Binding Scores -> CNN -> binding scores for novel sequences
● Generates weighted ensembles of PWMs and mutation maps
● ~600 different DeepBind models generated
● Identified RNA-binding sites involved in splicing regulation
● Identified disease-associated variants that affect TF binding

CHECK IT OUT YOURSELF: http://tools.genes.toronto.edu/deepbind/

Shortcomings and Future Work
● Comparisons with only non-deep learning models
● Not much better than non-deep learning models
● Assumes one motif in each probe
● Non-coding factors/variants ignored
● Does not account for positional dynamics of probe sequences -> DeeperBind
● How about epigenetic regulation of binding to sequences? -> DeepSEA

Cool name bro

Any Questions?
