IE ML PROJECT (GETTING STARTED)

Uploaded by nicool

FROM STEPS 7-13 THE LEARNING CURVE IS PRETTY STEEP, SO I’D REQUEST YOU TO
TAKE YOUR TIME AND LEARN THESE CONCEPTS THOROUGHLY BEFORE PROCEEDING.
YOU’LL GET A LOT OF ERRORS WHILE WORKING ON IT, BUT IT’S ALL PART OF THE
PROCESS! WE’VE ALL BEEN THERE!

A few things to notice:

1) Make sure you pass encoding='ISO-8859-1' when creating the DataFrame (e.g. in
pd.read_csv); otherwise you may get a UTF-8 decoding error (UnicodeDecodeError).
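A minimal sketch of why the encoding matters, assuming pandas and a CSV file. To stay self-contained it builds a tiny in-memory file encoded as ISO-8859-1 (Latin-1) instead of reading the real dataset; the file contents and the column names v1/v2 are illustrative assumptions. In Latin-1 the "£" sign is the single byte 0xA3, which is not valid UTF-8, so pandas' default decoding fails on it.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset file: two rows encoded as
# ISO-8859-1 (Latin-1). "£" becomes the single byte 0xA3, which is invalid UTF-8.
raw_bytes = "v1,v2\nham,See you at 5\nspam,Win £1000 now!\n".encode("ISO-8859-1")

# Reading with pandas' default UTF-8 decoding fails on the 0xA3 byte.
try:
    pd.read_csv(io.BytesIO(raw_bytes))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Passing the matching encoding decodes the file correctly.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")
print(df)
```

With the real data the call would look like pd.read_csv("spam.csv", encoding="ISO-8859-1"), where "spam.csv" is only a placeholder for wherever your file actually lives.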

2) When you print the dataset, you will see something like this:

3) There are three unnamed columns containing NaN, which signifies NULL (missing) values.

We don't need them, so go ahead and drop them.
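One way to drop them with pandas, sketched on a tiny hypothetical frame. It assumes the empty columns carry the default names read_csv gives to blank header cells ("Unnamed: 2", "Unnamed: 3", ...); selecting them by that prefix avoids hard-coding the exact names.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the raw data: two useful columns plus
# three empty "Unnamed" columns created by read_csv.
df = pd.DataFrame({
    "v1": ["ham", "spam"],
    "v2": ["See you at 5", "Win a prize now!"],
    "Unnamed: 2": [np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan],
    "Unnamed: 4": [np.nan, np.nan],
})

# Drop every column whose name starts with "Unnamed".
unnamed = [col for col in df.columns if col.startswith("Unnamed")]
df = df.drop(columns=unnamed)
print(df.columns.tolist())  # ['v1', 'v2']
```

If the columns are completely empty, df.dropna(axis=1, how="all") would remove them as well, without relying on their names.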

4) Now that we’ve gotten rid of the NULL columns, let’s rename the two useful columns
to something more meaningful.
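A sketch using DataFrame.rename. The old names v1/v2 and the new names label/message are assumptions for illustration; substitute whatever your file actually uses.

```python
import pandas as pd

# Hypothetical frame with the two remaining (uninformative) column names.
df = pd.DataFrame({"v1": ["ham", "spam"], "v2": ["See you at 5", "Win a prize now!"]})

# Map the old names to descriptive ones; columns not listed are left unchanged.
df = df.rename(columns={"v1": "label", "v2": "message"})
print(df.columns.tolist())  # ['label', 'message']
```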

5) Running df.info() shows 5572 entries with 5572 non-null objects in both columns,
so there are no NULL values we need to take care of. We are good to go.
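Besides reading the df.info() summary, you can count missing values per column directly. A sketch on a small hypothetical frame (the column names label and message are assumptions):

```python
import pandas as pd

# Hypothetical cleaned frame with no missing values.
df = pd.DataFrame({"label": ["ham", "spam", "ham"],
                   "message": ["See you at 5", "Win a prize now!", "Ok lar"]})

df.info()  # prints entry count, non-null count and dtype per column

# isna().sum() gives the number of missing values in each column.
null_counts = df.isna().sum()
print(null_counts)
has_nulls = bool(null_counts.any())
print(has_nulls)  # False
```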

6) Since this is a classification problem, we need to encode the target variable as 0s and
1s instead of ham and spam. Because we are detecting spam, we’ll set spam = 1 (and ham = 0).
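A sketch of the encoding with Series.map, assuming the target column is named label (a hypothetical name):

```python
import pandas as pd

# Hypothetical target column with the original string labels.
df = pd.DataFrame({"label": ["ham", "spam", "ham"]})

# Encode the target: spam -> 1 (the class we want to detect), ham -> 0.
df["label"] = df["label"].map({"ham": 0, "spam": 1})
print(df["label"].tolist())  # [0, 1, 0]
```

Series.map turns any value missing from the dictionary (e.g. a stray "Spam" with a capital S) into NaN, so a quick df["label"].isna().sum() afterwards confirms every row was encoded.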

7) You can see that the messages contain a lot of punctuation, which we need to
remove, e.g.:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

8) Write a function that removes punctuation from every message in the dataset.

(Hint: use string.punctuation.)
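A minimal sketch of such a function, using str.translate with a table that deletes every character in string.punctuation. The column name message and the sample texts are hypothetical.

```python
import string

import pandas as pd

# Translation table that deletes every character in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_PUNCT_TABLE)


df = pd.DataFrame({"message": ["Win a prize now!!!", "Ok... see you at 5:30?"]})
# Apply the function to every message in the column.
df["message"] = df["message"].apply(remove_punctuation)
print(df["message"].tolist())  # ['Win a prize now', 'Ok see you at 530']
```

string.punctuation only covers ASCII characters, so symbols such as "£" or curly quotes are left in place.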

9) Split the data into training and test sets using train_test_split (from sklearn.model_selection).
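A sketch with scikit-learn's train_test_split on a small hypothetical dataset. The 20% test size, the fixed random_state and the stratify argument are choices made here, not prescribed by the steps above; stratifying keeps the ham/spam ratio similar in both sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical, already-cleaned data: 10 messages, label 1 = spam.
df = pd.DataFrame({
    "message": [f"message {i}" for i in range(10)],
    "label": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})

X_train, X_test, y_train, y_test = train_test_split(
    df["message"], df["label"],
    test_size=0.2,          # hold out 20% of the rows for testing
    random_state=42,        # fixed seed so the split is reproducible
    stratify=df["label"],   # keep the ham/spam ratio similar in both sets
)
print(len(X_train), len(X_test))  # 8 2
```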

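10) Now comes the important part: put the words from the ham messages into bag 1 and
the words from the spam messages into bag 2. The next time you see a new message,
check how likely its words are under the ham bag and under the spam bag (taking into
account how common each class is) and pick the more likely class.

Since a message is either ham or spam, P(spam | message) = 1 - P(ham | message).
This complement rule applies to the class probabilities of the whole message, not to
the per-word probabilities of the two bags, which are estimated separately.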

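Because a computer doesn’t understand raw text as input, you need to convert the
messages into a numeric matrix: one row per message and one column per word, holding
word counts (or 0/1 values marking whether the word is present).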

For more information, refer to:

http://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b-learn-note07-2up.pdf
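To make the two-bag idea concrete, here is a small from-scratch sketch of multinomial Naive Bayes in plain Python. It counts words per class (the two bags), estimates P(word | class) with add-one (Laplace) smoothing (an addition not mentioned above, used so unseen word/class pairs don't zero out the score), and compares log P(class) + sum of log P(word | class) for the two classes. The training messages and function names are hypothetical; this is an illustration, not the scikit-learn implementation.

```python
import math
from collections import Counter


def train(messages, labels):
    """Build one word bag per class plus the class priors."""
    bags = {0: Counter(), 1: Counter()}  # 0 = ham bag, 1 = spam bag
    for text, label in zip(messages, labels):
        bags[label].update(text.lower().split())
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / len(labels) for c in bags}
    vocab = set(bags[0]) | set(bags[1])
    return bags, priors, vocab


def log_scores(text, bags, priors, vocab):
    """log P(class) + sum of log P(word | class), with add-one smoothing."""
    scores = {}
    for c, bag in bags.items():
        total = sum(bag.values())
        score = math.log(priors[c])
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((bag[word] + 1) / (total + len(vocab)))
        scores[c] = score
    return scores


def predict_proba_spam(text, bags, priors, vocab):
    """Normalise the two scores so that P(ham | msg) + P(spam | msg) = 1."""
    s = log_scores(text, bags, priors, vocab)
    m = max(s.values())  # subtract the max before exp() for numerical stability
    exp = {c: math.exp(v - m) for c, v in s.items()}
    return exp[1] / (exp[0] + exp[1])


messages = ["win a free prize now", "free cash win",
            "see you at lunch", "call me when you are free"]
labels = [1, 1, 0, 0]
bags, priors, vocab = train(messages, labels)

p_spam = predict_proba_spam("win free cash", bags, priors, vocab)
print(round(p_spam, 3), round(1 - p_spam, 3))  # P(spam | msg), P(ham | msg)
```

The last line prints the spam probability next to its complement, which is exactly the P(spam | message) = 1 - P(ham | message) relation.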

11) Putting the words into bags and converting them into a matrix can be done with
scikit-learn's CountVectorizer or TfidfVectorizer. Look them up; it’s a nice concept!

12) Use CountVectorizer or TfidfVectorizer to vectorize the messages, then fit a Multinomial Naive Bayes classifier (MultinomialNB) on the result.
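A sketch that chains the two with scikit-learn's make_pipeline, trained on a tiny hypothetical dataset. Wrapping both in a pipeline is a design choice made here: it guarantees the vocabulary is learned from the training messages only and that new messages go through the same fitted vectorizer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical, tiny training set (label 1 = spam, 0 = ham).
X_train = ["win a free prize now", "free cash win", "claim your free prize",
           "see you at lunch", "call me when you are home", "are we meeting today"]
y_train = [1, 1, 1, 0, 0, 0]

# The pipeline fits the vectorizer's vocabulary on the training messages only,
# then trains MultinomialNB on the resulting count matrix.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# New messages are transformed by the already fitted vectorizer automatically.
predictions = model.predict(["free prize for you", "see you at home"])
print(predictions)
```

To try TF-IDF features instead, replace CountVectorizer() with TfidfVectorizer() in the make_pipeline call.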

13) Evaluate the model on the test set with your preferred metric, e.g. accuracy.
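A sketch of a few scikit-learn metrics on hypothetical test labels and predictions. Because spam is usually the minority class, accuracy alone can look good even when spam is missed; precision (how much of what was flagged is really spam) and recall (how much of the real spam was caught) show that directly.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions for 8 test messages (1 = spam).
y_test = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_test, y_pred))   # 6 of 8 correct = 0.75
print("precision:", precision_score(y_test, y_pred))  # 2 of 3 flagged messages are spam
print("recall   :", recall_score(y_test, y_pred))     # 2 of 3 spam messages were caught
print(confusion_matrix(y_test, y_pred))               # rows = true class, cols = predicted
```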

GOODBYE
