Naïve Bayesian Classifier and K-Means Clustering
1. Concept Explanation
The Naïve Bayesian Classifier is a probabilistic machine-learning algorithm for classification tasks. It applies Bayes' theorem under the assumption that features are conditionally independent of one another once the class label is known. Despite this simplifying assumption, the classifier performs well in practice for tasks such as spam detection, sentiment analysis, and medical diagnosis.
Assumptions:
1. Conditional Independence – Features are assumed to be independent of one another given the class label.
2. Prior Probabilities Are Used – The model relies on prior knowledge about the classes (their base rates in the training data).
P(C|X) = P(X|C) · P(C) / P(X)

Where:
C is the class label, X is the observed feature vector, P(C) is the prior, P(X|C) is the likelihood, and P(C|X) is the posterior probability of the class given the features.
For multiple features X = (X1, X2, ..., Xn), the Naïve Bayes assumption simplifies this to:

P(C|X) = P(C) · Π_{i=1}^{n} P(Xi|C) / P(X)
2. Spam Detection Example
Spam detection involves categorizing emails into spam and valid messages (ham). The primary purpose is to develop a predictive model that identifies spam emails based on the words they contain.
Classification Objective
The goal is to determine the probability of an email being spam given a set of observed
words. This is achieved using the Naïve Bayes classifier, which assumes that the presence of
each word in the email is independent of the others, given the class label.
Dataset
Consider a small dataset of emails with the presence (1) or absence (0) of specific keywords (Free, Win, Money, Offer) and the spam label:

Email  Free  Win  Money  Offer  Spam
1      1     1    0      1      1
2      0     1    1      0      0
3      1     1    1      1      1
4      0     0    1      0      0
5      1     0    1      1      1
Calculate Priors:
P(Spam) = 3/5 = 0.6
P(Not Spam) = 2/5 = 0.4
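These priors can be checked directly from the label vector; a minimal NumPy sketch, assuming labels encoded as 1 = Spam and 0 = Not Spam:

```python
import numpy as np

# Labels of the five training emails (1 = Spam, 0 = Not Spam)
y_train = np.array([1, 0, 1, 0, 1])

# Priors are the relative class frequencies in the training set
p_spam = np.mean(y_train == 1)      # 3/5
p_not_spam = np.mean(y_train == 0)  # 2/5
print(p_spam, p_not_spam)
```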
Calculate Likelihoods:
From the three spam emails (1, 3, 5) and the two non-spam emails (2, 4):
P(Free=1 | Spam) = 3/3 = 1.00
P(Win=1 | Spam) = 2/3 ≈ 0.67
P(Money=1 | Spam) = 2/3 ≈ 0.67
P(Offer=0 | Spam) = 0/3 = 0.00
P(Free=1 | Not Spam) = 0/2 = 0.00
P(Win=1 | Not Spam) = 1/2 = 0.50
P(Money=1 | Not Spam) = 2/2 = 1.00
P(Offer=0 | Not Spam) = 2/2 = 1.00
Several of these counts are zero; in practice this is handled with Laplace smoothing, which scikit-learn's BernoulliNB applies by default.
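The likelihoods can be verified mechanically by counting over the table; a short NumPy sketch (column order Free, Win, Money, Offer, matching the dataset):

```python
import numpy as np

# Feature matrix (columns: Free, Win, Money, Offer) and labels (1 = Spam)
X = np.array([[1,1,0,1], [0,1,1,0], [1,1,1,1], [0,0,1,0], [1,0,1,1]])
y = np.array([1, 0, 1, 0, 1])

# Unsmoothed P(feature = 1 | class) as per-column relative frequencies
p1_spam = X[y == 1].mean(axis=0)
p1_ham = X[y == 0].mean(axis=0)
print("P(x=1 | Spam):    ", p1_spam)
print("P(x=1 | Not Spam):", p1_ham)
```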
Compute Posteriors:
# Imports
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Training dataset (columns: Free, Win, Money, Offer)
X_train = np.array([[1,1,0,1], [0,1,1,0], [1,1,1,1], [0,0,1,0], [1,0,1,1]])
y_train = np.array([1, 0, 1, 0, 1])  # 1 = Spam, 0 = Not Spam

# Model training (BernoulliNB applies Laplace smoothing by default)
nb_model = BernoulliNB()
nb_model.fit(X_train, y_train)

# Test email, assumed to be (Free=1, Win=1, Money=1, Offer=0)
X_test = np.array([[1, 1, 1, 0]])

# Prediction
prediction = nb_model.predict(X_test)
print("Prediction:", "Spam" if prediction[0] == 1 else "Not Spam")
Output:
Prediction: Spam
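The library's answer can be reproduced by hand. BernoulliNB applies Laplace smoothing (alpha = 1) by default, which removes the zero counts; the sketch below assumes a test email with Free=1, Win=1, Money=1, Offer=0:

```python
import numpy as np

X = np.array([[1,1,0,1], [0,1,1,0], [1,1,1,1], [0,0,1,0], [1,0,1,1]])
y = np.array([1, 0, 1, 0, 1])
x_test = np.array([1, 1, 1, 0])  # assumed test email

scores = {}
for c in (0, 1):
    Xc = X[y == c]
    # Laplace-smoothed P(feature = 1 | class): (count + 1) / (n_c + 2)
    p1 = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)
    # Unnormalized posterior: prior times product of per-feature terms
    likelihood = np.where(x_test == 1, p1, 1 - p1).prod()
    scores[c] = (len(Xc) / len(X)) * likelihood

label = max(scores, key=scores.get)
print("Spam" if label == 1 else "Not Spam")  # Spam
```

The smoothed spam score (0.6 · 0.8 · 0.6 · 0.6 · 0.2 ≈ 0.0346) exceeds the ham score (0.4 · 0.25 · 0.5 · 0.75 · 0.75 ≈ 0.0281), which agrees with the model's output.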
1. Concept Explanation
K-Means Clustering is an unsupervised learning algorithm that groups data points through their shared features. Clustering algorithms detect natural groupings in data without predefined categories. Clustering serves multiple functions, including market segmentation, anomaly detection, and image segmentation; for example, customers can be grouped by spending behavior and income levels, which allows businesses to target specific customer groups.
1. Centroid Selection:
o K initial centroids are chosen from the data, typically at random.
2. Cluster Assignment:
o Each data point is assigned to the nearest centroid based on the Euclidean
distance.
3. Centroid Updating:
o Compute the new centroid by taking the mean of all points in the cluster.
C_k = (1/n) Σ_{i=1}^{n} x_i

where:
C_k is the centroid of cluster k, and n is the number of points x_i currently assigned to that cluster.

The assignment in step 2 uses the Euclidean distance:

d(x, C_k) = √( Σ_{j=1}^{m} (x_j − C_kj)² )

where:
x is a data point, C_kj is the j-th coordinate of centroid C_k, and m is the number of features. Steps 2 and 3 are repeated until the assignments no longer change.
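The assignment and update steps above can be sketched in NumPy; this is a minimal single-iteration illustration (not a full convergence loop), using a tiny made-up dataset:

```python
import numpy as np

def kmeans_step(X, centroids):
    """One K-Means iteration: assign each point to its nearest centroid, then update."""
    # Euclidean distance from every point to every centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)  # cluster assignment step
    # Centroid update step: mean of the points in each cluster
    new_centroids = np.array([X[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    return labels, new_centroids

# Two obvious groups and two initial centroids
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans_step(X, np.array([[0.0, 0.0], [10.0, 10.0]]))
print(labels)     # [0 0 1 1]
print(centroids)  # [[ 0.   0.5] [10.  10.5]]
```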
2. Customer Segmentation
A common application of K-Means is segmenting customers by their purchasing activities and financial capability. Businesses use this approach to target marketing and offers to each segment.
Customer  Income  Spending Score
1         15      39
2         16      81
3         17      6
4         18      77
5         20      40
6         24      94
7         25      3
8         30      73
9         35      92
10        40      8
Choose three initial centroids, here customers 1, 6, and 10:
C1 = (15, 39), C2 = (24, 94), C3 = (40, 8)
Using the Euclidean distance, compute each customer's distance to the three centroids and assign the customer to the nearest centroid.
New centroid for cluster 1 (mean of its assigned customers):
C1 = ((15+17+20+25+40)/5, (39+6+40+3+8)/5) = (23.4, 19.2)
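The centroid update can be confirmed with NumPy, assuming cluster 1 holds the five customers listed above:

```python
import numpy as np

# (income, spending score) pairs of the customers assigned to cluster 1
cluster_1 = np.array([[15, 39], [17, 6], [20, 40], [25, 3], [40, 8]])

# New centroid = per-coordinate mean of the member points
c1 = cluster_1.mean(axis=0)
print(c1)  # [23.4 19.2]
```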
import numpy as np
from sklearn.cluster import KMeans

# Customer data: (income, spending score) for the 10 customers above
X = np.array([[15,39], [16,81], [17,6], [18,77], [20,40],
              [24,94], [25,3], [30,73], [35,92], [40,8]])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Cluster assignments
labels = kmeans.labels_
centroids = kmeans.cluster_centers_