Exp-9 - Jupyter Notebook

The document builds a spam classifier using Naive Bayes. It loads a spam dataset, shuffles and splits it into training and test sets, builds word-count vocabularies for spam and ham messages, and calculates the accuracy, precision, and recall of the classifier on the test set.

Uploaded by

Dhana Lakshmi


5/23/22, 8:08 PM Exp-9 - Jupyter Notebook

In [12]:

import numpy as np
import pandas as pd
from collections import Counter

In [13]:

data = pd.read_csv('./data/spam.csv')
data = data[['v1', 'v2']]
data.head()

Out[13]:

   v1    v2
0  ham   Go until jurong point, crazy.. Available only ...
1  ham   Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
3  ham   U dun say so early hor... U c already then say...
4  ham   Nah I don't think he goes to usf, he lives aro...

In [14]:

data = data.sample(frac=1)
train, test = data[:4000], data[4000:]
X_train, X_test, y_train, y_test = train['v2'], test['v2'], train['v1'], test['v1']

In [15]:

X_train.shape, X_test.shape, y_train.shape, y_test.shape

Out[15]:

((4000,), (1572,), (4000,), (1572,))

In [16]:

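# Word counts over all spam messages in the training set.
vocab_spam = Counter()
for i in train[train['v1'] == 'spam']['v2']:
    vocab_spam.update(i.split(' '))

# Word counts over all ham messages in the training set.
vocab_ham = Counter()
for i in train[train['v1'] == 'ham']['v2']:
    vocab_ham.update(i.split(' '))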
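
Splitting on a single space keeps case and punctuation attached to each word, so 'Free', 'free' and 'free!' are counted as three different tokens. A minimal sketch of a normalising tokenizer, assuming lowercase alphanumeric tokens are acceptable for this dataset; `tokenize` and `build_vocab` are hypothetical helper names, not part of the notebook:

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical helper: lowercase the text, then keep runs of
    # letters, digits and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(messages):
    # Count tokens over an iterable of message strings.
    vocab = Counter()
    for message in messages:
        vocab.update(tokenize(message))
    return vocab

# Invented example messages, for illustration only.
example_vocab = build_vocab(["Free entry, FREE prize!", "free msg"])
print(example_vocab["free"])  # 3
```

If such a tokenizer were adopted, the same function would have to be applied to the test messages when scoring, otherwise the lookups would not match the vocabulary keys.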

In [17]:

test_sentence = 'to a a spam msg'

In [18]:

intial_spam_guess = train[train['v1'] == 'spam'].shape[0]/train.shape[0]

intial_ham_guess = train[train['v1'] == 'ham'].shape[0]/train.shape[0]
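
The two priors above are the starting scores for each class. A minimal sketch of scoring a single sentence such as the `test_sentence` defined earlier, assuming add-one (Laplace) smoothing and log-probabilities, neither of which the notebook itself uses. The function `log_score`, the toy counters, and the equal priors of 0.5 are invented for illustration:

```python
import math
from collections import Counter

def log_score(sentence, vocab, prior, vocab_size):
    # log(prior) plus the sum of log add-one-smoothed word likelihoods.
    # Summing logs avoids the underflow that a long product of small
    # probabilities can run into.
    total = sum(vocab.values())
    score = math.log(prior)
    for word in sentence.split(' '):
        score += math.log((vocab.get(word, 0) + 1) / (total + vocab_size))
    return score

# Toy counts, invented for illustration only.
toy_spam = Counter({'spam': 3, 'msg': 2, 'to': 1})
toy_ham = Counter({'to': 4, 'a': 3, 'msg': 1})
vocab_size = len(set(toy_spam) | set(toy_ham))

sentence = 'to a a spam msg'
spam_score = log_score(sentence, toy_spam, 0.5, vocab_size)
ham_score = log_score(sentence, toy_ham, 0.5, vocab_size)
pred = 'spam' if spam_score > ham_score else 'ham'
print(pred)
```

With the real data, the toy counters would be replaced by `vocab_spam` and `vocab_ham`, and the priors by `intial_spam_guess` and `intial_ham_guess`.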

In [22]:

d = {'TP': 0, 'FP': 0, 'TN': 0, 'FN': 0}

# Total word counts per class, computed once rather than once per word.
total_spam = sum(vocab_spam.values())
total_ham = sum(vocab_ham.values())

for sentence, label in zip(X_test, y_test):
    # Start from the class priors and multiply in each word's relative frequency.
    spam_score = intial_spam_guess
    ham_score = intial_ham_guess
    for word in sentence.split(' '):
        # A word unseen in a class gets a count of 1 so that it does not zero the score.
        spam_score *= vocab_spam.get(word, 1) / total_spam
        ham_score *= vocab_ham.get(word, 1) / total_ham

    pred = 'spam' if spam_score > ham_score else 'ham'

    # 'spam' is the positive class.
    if label == pred:
        if label == 'spam':
            d['TP'] += 1
        else:
            d['TN'] += 1
    else:
        if label == 'spam':
            d['FN'] += 1  # spam message predicted as ham
        else:
            d['FP'] += 1  # ham message predicted as spam

In [29]:

print(f"Accuracy: {(d['TP'] + d['TN'])/sum(d.values())}")
print(f"Precision: {d['TP']/(d['TP'] + d['FP'])}")
print(f"Recall: {d['TP']/(d['TP'] + d['FN'])}")

Accuracy: 0.8810432569974554
Precision: 0.5283505154639175
Recall: 0.9808612440191388
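
The precision and recall expressions above divide by zero if the classifier never predicts spam or if the test set contains no spam. A minimal sketch of a guarded metrics helper that also reports F1, taking spam as the positive class; `classification_metrics` is a hypothetical name and the counts passed to it are invented, not results from this notebook:

```python
def classification_metrics(tp, fp, tn, fn):
    # Returns accuracy, precision, recall and F1 from confusion-matrix counts.
    # Any ratio with an empty denominator is reported as 0.0.
    def ratio(num, den):
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)   # share of predicted spam that is spam
    recall = ratio(tp, tp + fn)      # share of actual spam that was caught
    f1 = ratio(2 * precision * recall, precision + recall)
    return {'accuracy': accuracy, 'precision': precision,
            'recall': recall, 'f1': f1}

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With the notebook's dictionary, the call would be `classification_metrics(d['TP'], d['FP'], d['TN'], d['FN'])`.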
