DNA Sequencing With Machine Learning

This document discusses using machine learning classifiers to predict gene function from DNA sequences. It introduces using k-mers to represent DNA sequences as bags of words that can be analyzed using natural language processing and machine learning techniques. The document walks through preparing human, chimpanzee and dog DNA sequence and label data, using k-mers and CountVectorizer to represent the sequences as word counts, splitting the human data into train and test sets, training a multinomial naive Bayes classifier on the k-mer counts, and evaluating the classifier's performance on the test set.

Uploaded by

esraamohammed1112000


DNA sequencing and applying a classifier with ML

INTRODUCTION

In the field of medical informatics research, gene sequences are widely used as input features for classification. One application area of ML is biochemistry. Bioinformatics is an interdisciplinary science that uses computer and information science to understand biological data. One of its most difficult tasks is to distinguish between normal genes and disease-causing genes.

The classification of gene sequences into existing categories is used in genomic research to discover the functions of novel proteins. As a result, it is critical to identify and categorize such genes. We employ ML classification methods to distinguish between disease-causing and normal genes.

I will apply a classification model that can predict a gene's function based on the DNA sequence of the coding sequence alone.

You will need some libraries, such as numpy and pandas.

I will load the human data and read it, so that we have human DNA coding-region sequences, each with a class label.
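
The loading step can be sketched roughly as follows. This is a minimal sketch, assuming the data is a tab-separated table with a `sequence` column and a `class` column; those column names and the tiny inline table are assumptions for illustration, not taken from the slides. In practice you would pass the path of your own data file to `pd.read_table` instead of the in-memory text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the human data file: tab-separated,
# one coding sequence and one integer class label per row.
raw = (
    "sequence\tclass\n"
    "ATGCCCCAACTAAATACTACCGTATGG\t4\n"
    "ATGAACGAAAATCTGTTCGCTTCATTC\t4\n"
    "ATGTGTGGCATTTGGGCGCTGTTTGGC\t3\n"
)

# read_table defaults to a tab separator and uses the first row as header.
human_data = pd.read_table(io.StringIO(raw))
print(human_data.head())
```

The chimpanzee and dog tables would be read the same way into their own DataFrames.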

I also load and read data for the chimpanzee and for a more divergent species, the dog.

Here are the definitions for each of the 7 classes and how many there are in the human training data. They are gene sequence function groups.

Since the sequences are not of equal length, we will apply k-mer counting to the complete sequences, using a getKmers function.

Now our coding sequence data is converted to lowercase and split into all possible overlapping k-mer words of length 6.
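
A k-mer function of this kind could look like the sketch below. The slides name the function but its body is not shown in the text, so this implementation (a sliding window of width 6 over the lowercased sequence) is an assumption about how it works.

```python
def getKmers(sequence, size=6):
    """Return all overlapping k-mers of length `size`, in lowercase."""
    sequence = sequence.lower()
    # Slide a window of width `size` one base at a time.
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


kmers = getKmers("ATGCATGCA")
print(kmers)
# ['atgcat', 'tgcatg', 'gcatgc', 'catgca']
```

A sequence of length n yields n - 5 hexamers, so sequences of different lengths simply produce word lists of different lengths.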

Since we are going to use scikit-learn natural language processing tools to do the k-mer counting, we now need to convert the lists of k-mers for each gene into string sentences of words that the count vectorizer can use.

We can also make a y variable to hold the class labels.
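
These two steps can be sketched as follows. The DataFrame contents, the column names (`sequence`, `class`, `words`) and the variable names `human_texts` and `y_human` are hypothetical and chosen only for illustration.

```python
import pandas as pd


def getKmers(sequence, size=6):
    """Return all overlapping k-mers of length `size`, in lowercase."""
    sequence = sequence.lower()
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


# Hypothetical two-row table standing in for the human data.
human_data = pd.DataFrame(
    {"sequence": ["ATGCATGCA", "GGGCCCAAAT"], "class": [4, 3]}
)

# One list of k-mer words per gene.
human_data["words"] = human_data["sequence"].apply(getKmers)

# Join each list into a space-separated "sentence" for the vectorizer.
human_texts = [" ".join(words) for words in human_data["words"]]

# Class labels as a plain array.
y_human = human_data["class"].values

print(human_texts[0])
print(y_human)
```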

We will perform the same steps for chimpanzee and dog.

We will apply the bag-of-words model using CountVectorizer, a tool from natural language processing. This is equivalent to k-mer counting.
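
To see why bag of words is equivalent to k-mer counting, here is a small sketch on two made-up k-mer sentences. It uses single words (`ngram_range=(1, 1)`) to keep the output readable, which is a simplification for illustration and not necessarily the setting used in the slides.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical k-mer sentences for two genes.
texts = [
    "atgcat tgcatg gcatgc catgca",
    "atgcat tgcatg gcatgg",
]

cv = CountVectorizer(ngram_range=(1, 1))
X = cv.fit_transform(texts)  # sparse matrix: one row per gene, one column per k-mer

# vocabulary_ maps each k-mer to its column index.
print(sorted(cv.vocabulary_))
print(X.toarray())
```

Each cell holds the number of times a given k-mer occurs in a given gene, so every gene becomes a fixed-length count vector regardless of the original sequence length.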

If we look at the class balance, we can see that we have a relatively balanced dataset.
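
One simple way to inspect class balance is to count the rows per class label. The labels below are made up for illustration and are not the counts from the human data.

```python
import pandas as pd

# Hypothetical class labels; real labels would come from the data table.
labels = pd.Series([0, 1, 1, 2, 2, 2, 3], name="class")

# Number of sequences per class, ordered by class label.
counts = labels.value_counts().sort_index()
print(counts)
```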

Next, we split the human dataset into a training set and a test set.
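
The split can be sketched with scikit-learn's `train_test_split`. The 80/20 ratio, the random seed and the placeholder arrays are assumptions; the slides do not state the proportions in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and labels standing in for the k-mer counts.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows as unseen test data (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

print(X_train.shape, X_test.shape)
```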

A multinomial naive Bayes classifier will be created. I previously did some parameter tuning and found that an n-gram size of 4 (reflected in the CountVectorizer() instance) and a model alpha of 0.1 did the best.

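
A rough end-to-end sketch of this step on toy data is given below. It reads "n-gram size of 4" as `ngram_range=(4, 4)`, which is an assumption about the original setting, and the sequences and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def getKmers(sequence, size=6):
    """Return all overlapping k-mers of length `size`, in lowercase."""
    sequence = sequence.lower()
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def to_sentence(sequence):
    """Turn a DNA sequence into a space-separated string of k-mers."""
    return " ".join(getKmers(sequence))


# Hypothetical toy data: class 0 is AT-rich, class 1 is GC-rich.
train_seqs = ["ATATATATATATAT", "ATATATATATATATAT",
              "GCGCGCGCGCGCGC", "GCGCGCGCGCGCGCGC"]
train_labels = [0, 0, 1, 1]
test_seqs = ["ATATATATATAT", "GCGCGCGCGCGC"]

# Each feature is a run of 4 consecutive k-mer words (assumed reading).
cv = CountVectorizer(ngram_range=(4, 4))
X_train = cv.fit_transform([to_sentence(s) for s in train_seqs])
X_test = cv.transform([to_sentence(s) for s in test_seqs])  # reuse fitted vocabulary

# alpha is the additive smoothing parameter.
classifier = MultinomialNB(alpha=0.1)
classifier.fit(X_train, train_labels)

y_pred = classifier.predict(X_test)
print(y_pred)
```

Note that the vectorizer is fitted on the training sentences only and then reused with `transform` for the test sentences, so both matrices share the same columns.
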
Let's look at some model performance metrics, such as the confusion matrix, accuracy, precision, recall and F1 score. We are getting really good results on our unseen data.

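
These metrics can be computed with `sklearn.metrics`, as in the sketch below. The true and predicted labels are made up to show the calls and are not the results from the slides; the choice of `average="weighted"` for the multi-class scores is also an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predictions for a 3-class problem.
y_test = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

# Weighted averaging weights each class's score by its support.
precision = precision_score(y_test, y_pred, average="weighted")
recall = recall_score(y_test, y_pred, average="weighted")
f1 = f1_score(y_test, y_pred, average="weighted")

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```
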
THANK YOU
