Artificial Intelligence Course: IT153IU
International University – VNU HCM Date: May 24th, 2024
Dr. Nguyen Trung Ky Time: 2 weeks
Full name: Lê Hồng Quang
Student ID: ITITIU20286
Lab#7&8/Assignment#7&8: NaiveBayes
Exercise 1: Create a class NaiveBayesFilter, with an __init__() method
as defined in NaiveBayesFilter. Add a fit() method which takes as
arguments X, the training data, and y, the training labels. In this case,
X is a pandas.Series containing strings that are SMS messages. For
each message in X, count the number of occurrences of each word
and record this information in a DataFrame. The final form of the
DataFrame should have a column for each unique word that appears
in any message of X, as well as a Label column; you may also include
any other columns you think you'll need. Each row of the DataFrame
corresponds to a row of X and records the number of occurrences of
each word in the given message. The Label column records the label of
the message. Save this DataFrame as self.data.
import numpy as np
import pandas as pd

class NaiveBayesFilter:
    def __init__(self):
        self.data = []        # word-count DataFrame built by fit()
        self.vocabulary = []  # list of unique words seen in training
        self.proba = []       # per-message probability table built by predict_proba()
        self.p_spam = 0       # prior probability of spam, P(S)
        self.p_ham = 0        # prior probability of ham, P(H)
        # Conditional word probabilities P(word | class), filled in by fit()
        self.parameters_spam = {}
        self.parameters_ham = {}
    def fit(self, X, y):
        # X is a pandas.Series of message strings: split each message into
        # a list of words, and reset both indices so the concatenation
        # below lines up row by row.
        X = X.str.split().reset_index(drop=True).rename('SMS')
        y = y.reset_index(drop=True).rename('Label')
        # Build the vocabulary: every unique word in any training message.
        for sms in X:
            for word in sms:
                self.vocabulary.append(word)
        self.vocabulary = list(set(self.vocabulary))
        # Count occurrences of each vocabulary word in each message.
        word_counts_per_sms = {unique_word: [0] * len(X)
                               for unique_word in self.vocabulary}
        for index, sms in enumerate(X):
            for word in sms:
                word_counts_per_sms[word][index] += 1
        word_counts = pd.DataFrame(word_counts_per_sms)
        # One row per message: Label, the tokenized SMS, then the counts.
        self.data = pd.concat([y, X, word_counts], axis=1)
        print(self.data)
        spam_messages = self.data[self.data['Label'] == 'spam']
        ham_messages = self.data[self.data['Label'] == 'ham']
        # P(Spam) and P(Ham)
        self.p_spam = len(spam_messages) / len(self.data)
        self.p_ham = len(ham_messages) / len(self.data)
        # N_Spam: total number of words in spam messages
        n_words_per_spam_message = spam_messages['SMS'].apply(len)
        n_spam = n_words_per_spam_message.sum()
        # N_Ham: total number of words in ham messages
        n_words_per_ham_message = ham_messages['SMS'].apply(len)
        n_ham = n_words_per_ham_message.sum()
        # N_Vocabulary
        n_vocabulary = len(self.vocabulary)
        # Laplace smoothing constant
        alpha = 1
        print('Number of words in spam messages is: ' + str(n_spam) + '\n')
        print('Number of words in ham messages is: ' + str(n_ham) + '\n')
        print('Number of unique words is: ' + str(n_vocabulary))
        # Estimate P(word | Spam) and P(word | Ham) with Laplace smoothing.
        for word in self.vocabulary:
            n_word_given_spam = spam_messages[word].sum()
            p_word_given_spam = ((n_word_given_spam + alpha)
                                 / (n_spam + alpha * n_vocabulary))
            self.parameters_spam[word] = p_word_given_spam
            n_word_given_ham = ham_messages[word].sum()
            p_word_given_ham = ((n_word_given_ham + alpha)
                                / (n_ham + alpha * n_vocabulary))
            self.parameters_ham[word] = p_word_given_ham
        return self.data
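A minimal smoke test of fit() on a made-up two-message corpus (the messages and labels here are hypothetical, purely for illustration; the lab itself uses the SMS spam dataset):
# Toy corpus, just to inspect the DataFrame built by fit():
X_toy = pd.Series(['win cash now', 'see you at lunch'], name='SMS')
y_toy = pd.Series(['spam', 'ham'], name='Label')
NB = NaiveBayesFilter()
data = NB.fit(X_toy, y_toy)
# data has one row per message: the Label, the tokenized SMS, and one
# count column for each vocabulary word ('win', 'cash', 'now', 'see',
# 'you', 'at', 'lunch').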
Exercise 2: Implement the predict_proba() method in your naïve Bayes
classifier. This should take as an argument X, the data that needs to
be classified. For each message x in X, compute P(S|x) and P(H|x)
using Equations 1.3, 1.4, and 1.5. The method should return an (N x 2)
list, where N is the length of X. The first column corresponds to P(C =
H|x) and the second to P(C = S|x).
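Equations 1.3–1.5 are not reproduced in this report; assuming they are the standard naïve Bayes factorization that the code below implements, the quantities are

    P(S | x) ∝ P(S) · ∏_{w in x} P(w | S),    P(H | x) ∝ P(H) · ∏_{w in x} P(w | H),

with the Laplace-smoothed word likelihoods estimated in fit():

    P(w | S) = (N_{w,S} + α) / (N_S + α · N_V),

where N_{w,S} is the number of times word w appears in spam messages, N_S is the total word count over spam messages, N_V is the vocabulary size, and α = 1. The shared normalizer P(x) cancels when comparing the two classes, so the implementation can work with the unnormalized products.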
    def predict_proba(self, X):
        # Tokenize the incoming messages and re-index them, as in fit().
        X = X.str.split().reset_index(drop=True).rename('SMS')
        # Start each message at the class priors: [P(H), P(S)].
        self.proba_dict = {i: [self.p_ham, self.p_spam] for i in range(len(X))}
        for index, sms in enumerate(X):
            for word in sms:
                if word in self.vocabulary:
                    self.proba_dict[index][0] *= self.parameters_ham[word]
                    self.proba_dict[index][1] *= self.parameters_spam[word]
                # Words not seen in training are skipped, which is
                # equivalent to multiplying by 1.
        # print(self.proba_dict)
        proba_table = pd.DataFrame(self.proba_dict, index=['ham', 'spam']).T
        predict_list = []
        for i in range(len(proba_table)):
            if proba_table.loc[i, 'spam'] > proba_table.loc[i, 'ham']:
                predict_list.append('spam')
            else:
                predict_list.append('ham')
        predict_table = pd.DataFrame(predict_list, columns=['predict'])
        # print(predict_list)
        self.proba = pd.concat([X, proba_table, predict_table], axis=1)
        # Return the (N x 2) list required by the exercise: column 0 is
        # P(C = H | x), column 1 is P(C = S | x), up to the shared P(x).
        return proba_table[['ham', 'spam']].values.tolist()
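Because predict_proba() multiplies many probabilities smaller than 1, long messages can underflow both columns to 0.0; predict() in Exercise 3 avoids this by comparing log-probabilities instead. A quick check, continuing the toy example from Exercise 1 (the query message is again made up):
# Returns [[P(C = H | x), P(C = S | x)]] for the single toy message,
# up to the shared normalizer P(x).
print(NB.predict_proba(pd.Series(['win lunch now'])))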
Exercise 3: Implement the predict() method in your naïve Bayes
classifier. This should take as an argument X, the data that needs to
be classified. Implement Equation 1.2 and return a list of labels
predicting each message in X. For example:
# create the filter
NB = NaiveBayesFilter()
# fit the filter with the training data
NB.fit(X_train, y_train)
# test the predict function on five data points from the test data
NB.predict(X_test[500:505])
Output: ['ham', 'ham', 'ham', 'ham', 'spam']
    def predict(self, X):
        # Tokenize the incoming messages, as in fit().
        X = X.str.split()
        predicted_labels = []
        for sms in X:
            # Work in log space to avoid floating-point underflow.
            p_spam_given_sms = np.log(self.p_spam)  # log prior of spam
            p_ham_given_sms = np.log(self.p_ham)    # log prior of ham
            for word in sms:
                if word in self.parameters_spam:
                    p_spam_given_sms += np.log(self.parameters_spam[word])
                if word in self.parameters_ham:
                    p_ham_given_sms += np.log(self.parameters_ham[word])
            if p_spam_given_sms > p_ham_given_sms:
                predicted_labels.append('spam')
            else:
                predicted_labels.append('ham')
        return predicted_labels
Exercise 4: Implement the score() method in your naïve Bayes
classifier. This should take two arguments, a list of true labels in X
and a list of predicted labels (as returned by the predict() method),
and return the recall metric computed from the matches between them.
For example:
# test the predict function on five data points from the test data
predict_labels = NB.predict(X_test[500:505])
# calculate the score
recall = NB.score(y_test[500:505], predict_labels)
print("recall of NB: ", recall)
Output: recall of NB: 0.8
    def score(self, y, predict_label):
        # Confusion matrix A: rows are predicted labels, columns are true
        # labels, with index 0 = ham and index 1 = spam.
        y = list(y)  # y may be a pandas Series with a non-zero-based index
        A = [[0, 0],
             [0, 0]]
        for i in range(len(y)):
            if predict_label[i] == 'ham' and y[i] == 'ham':
                A[0][0] += 1   # true ham predicted as ham
            elif predict_label[i] == 'spam' and y[i] == 'spam':
                A[1][1] += 1   # true spam predicted as spam
            elif predict_label[i] == 'spam' and y[i] == 'ham':
                A[1][0] += 1   # true ham predicted as spam
            else:
                A[0][1] += 1   # true spam predicted as ham
        for line in A:
            print(' '.join(map(str, line)))
        # Metrics with ham as the positive class:
        precision = A[0][0] / (A[0][0] + A[0][1])  # correct among predicted ham
        recall = A[0][0] / (A[0][0] + A[1][0])     # recovered among true ham
        F1 = (2 * precision * recall) / (precision + recall)
        accuracy = (A[0][0] + A[1][1]) / len(y)
        print('Naive Bayes accuracy: ', accuracy * 100, '%')
        print('Precision: ', precision, ' F1: ', F1)
        # The exercise asks score() to return the recall metric.
        return recall
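One configuration consistent with the example outputs above: if all five true labels are 'ham' and the predictions are ['ham', 'ham', 'ham', 'ham', 'spam'], the confusion matrix is
A = [[4, 0],
     [1, 0]]
and recall = 4 / (4 + 1) = 0.8, matching the printed recall of NB: 0.8 (precision would then be 4 / 4 = 1.0 and F1 ≈ 0.89).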