MLT Lab 06

Name – Amit Shukla

Roll No. – 2200971640010


Branch – AIML
Subject – Machine Learning Technique Lab

Practical-06

AIM - Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to
perform this task. Built-in classes/APIs (e.g., Java or Python libraries) can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.

Theory:- The Naïve Bayesian Classifier is a probabilistic machine learning model used for text classification
tasks, such as spam detection or sentiment analysis. It is based on Bayes' Theorem, with the "naïve" assumption
that all features (words in a document) are independent of each other given the class label. Despite this
simplification, it performs remarkably well in practical applications.

Key Concepts:
 Bayes' Theorem:
It provides a way to calculate the probability of a hypothesis H given the evidence E:
P(H∣E) = P(E∣H) · P(H) / P(E)
 Prior Probability P(H):
Probability of a class (e.g., positive or negative) before seeing the data.
 Likelihood P(E∣H):
Probability of observing a word in a document, given the class.
 Posterior Probability P(H∣E):
Final probability of the class given the observed features (words).
 Feature Independence Assumption:
Assumes each word in the document contributes independently to the class probability.
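
To make the theorem concrete, here is a minimal worked example with made-up numbers (the prior of 0.6 and the word likelihoods below are illustrative assumptions, not values from the lab dataset):

# Worked Bayes' Theorem example: how likely is a document to be
# positive, given that it contains the word "great"?
p_pos = 0.6                 # P(pos): prior probability of the positive class
p_neg = 1 - p_pos           # P(neg)
p_great_given_pos = 0.30    # P("great" | pos): likelihood
p_great_given_neg = 0.05    # P("great" | neg)

# P("great") by the law of total probability
p_great = p_great_given_pos * p_pos + p_great_given_neg * p_neg

# Posterior P(pos | "great") by Bayes' Theorem
p_pos_given_great = p_great_given_pos * p_pos / p_great
print(p_pos_given_great)    # 0.18 / 0.20 = 0.9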

How the Naïve Bayesian Classifier Works for Document Classification:
1. Preprocess the Text:
Convert documents into tokens (words), remove stopwords, and vectorize the data using techniques
like Bag of Words or TF-IDF.
2. Training Phase:
Use the training documents and their labels to calculate the prior and likelihood probabilities for
each class.
3. Prediction Phase:
For a new/unseen document, compute the posterior probability for each class, and assign the class
with the highest probability (a from-scratch sketch of steps 2 and 3 follows this list).
4. Evaluation:
Use metrics such as Accuracy, Precision, and Recall to evaluate model performance.
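
Below is a minimal from-scratch sketch of the training and prediction phases on a toy two-document corpus; the corpus, the add-one (Laplace) smoothing constant, and the helper names likelihood and predict are illustrative assumptions, while the actual lab program (see Source Code below) relies on sklearn's MultinomialNB:

from collections import Counter
import math

train = [("good great good", "pos"), ("bad awful bad", "neg")]

# Training phase: estimate class priors and per-class word counts
labels = [label for _, label in train]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}
word_counts = {c: Counter() for c in priors}
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

def likelihood(word, c, alpha=1.0):
    # P(word | class) with add-one (Laplace) smoothing
    total = sum(word_counts[c].values())
    return (word_counts[c][word] + alpha) / (total + alpha * len(vocab))

# Prediction phase: choose the class with the highest log posterior
def predict(text):
    scores = {}
    for c in priors:
        scores[c] = math.log(priors[c]) + sum(
            math.log(likelihood(w, c)) for w in text.split() if w in vocab
        )
    return max(scores, key=scores.get)

print(predict("good bad good"))  # -> 'pos'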

Assumptions of Naïve Bayesian Classifier:

• The features (words) are conditionally independent given the class.

• The training dataset is representative of the real-world distribution.

• The input text is already preprocessed (cleaned and vectorized).

Source Code :-

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Load the dataset: each row holds a message and its 'pos'/'neg' label
msg = pd.read_csv('/content/sample_data/document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])

# Map the text labels to numbers: pos -> 1, neg -> 0
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message
y = msg.labelnum

# Split into training and test sets
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# Vectorize the text with a Bag of Words representation
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)

# Inspect the first few rows of the document-term matrix
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])

# Train the multinomial Naïve Bayes classifier and predict on the test set
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

# Show each test document with its predicted label
# (note: the predictions correspond to Xtest, not Xtrain)
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

# Evaluate the model
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
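
The script assumes that document.csv has no header row and two comma-separated columns per line: the message text followed by its pos/neg label. A hypothetical input file (the sentences are made up for illustration) might look like:

I love this sandwich,pos
This is an amazing place,pos
I do not like this restaurant,neg
I am tired of this stuff,neg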
