
K MEANS ----------------------------------------------------

K-Means is a popular unsupervised machine learning algorithm used for clustering
data into a predefined number of groups (k clusters). It minimizes the variance
within clusters by iteratively assigning data points to clusters and updating
cluster centroids.

Detailed Steps of K-Means

1. Initialization
Choose the number of clusters, k.
Randomly initialize k centroids (cluster centers) in the feature space.

2. Assignment Step
For each data point, compute its distance to every centroid (e.g., Euclidean
distance) and assign the point to the cluster with the nearest centroid.

3. Update Step
Recalculate the centroid of each cluster as the mean of all data points
assigned to it.

4. Repeat
Reassign data points to clusters based on the updated centroids, then
recompute centroids from the new assignments.

5. Convergence
Stop when the centroids no longer change significantly or a predefined number
of iterations is reached. (A code sketch of these steps follows below.)
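The steps above can be condensed into a short, self-contained Python/NumPy
sketch. This is an illustrative implementation, not library code; the function
name kmeans and all parameter names are made up for the example:

import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop when the centroids no longer move significantly.
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels, centroids

# Usage on two well-separated blobs:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)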
Mathematical Perspective
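Formally, K-Means minimizes the within-cluster sum of squared distances,
which in standard notation (LaTeX) is:

    J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

where C_j is the set of points assigned to cluster j and \mu_j is that
cluster's centroid. The assignment step decreases J with the centroids held
fixed, and the update step decreases J with the assignments held fixed, so
the algorithm converges to a local minimum of J.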
Strengths and Limitations
Strengths:
Simple to understand and implement.
Scales well with a large number of samples.
Works well with roughly spherical clusters.
Limitations:
Sensitive to the initial placement of centroids.
Struggles with non-spherical or overlapping clusters.
Requires k (the number of clusters) to be specified beforehand (one common
workaround is sketched below).
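One common heuristic for choosing k is the elbow method: run K-Means for a
range of k values and pick the point where the within-cluster sum of squares
stops dropping sharply. A short sketch using scikit-learn (KMeans, its
inertia_ attribute, and make_blobs are real scikit-learn APIs; the toy data
is made up for the example):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

inertias = []
for k in range(1, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares
# Plot k against inertias and choose the "elbow" where the curve flattens.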

NAIVE BAYES ------------------------------------------------------

Naive Bayes is a probabilistic machine learning algorithm based on Bayes’ theorem,
commonly used for classification tasks. It is called “naive” because it assumes
that the features are conditionally independent given the class label, which is
rarely true in real-world scenarios. Despite this simplification, Naive Bayes often
performs well in practice.
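Written out (in LaTeX), Bayes' theorem combined with the independence
assumption gives, for a class y and features x_1, ..., x_n:

    P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)

and the classifier predicts \hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y).
The variants below differ only in how they model P(x_i \mid y).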

Types of Naive Bayes

Gaussian Naive Bayes:
Assumes features are continuous and follow a Gaussian (normal) distribution
(sketched in code after this list).

Multinomial Naive Bayes:
Works with discrete data, often used for text classification where features
represent word counts or frequencies.

Bernoulli Naive Bayes:
Used for binary data (e.g., presence or absence of a feature).
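A minimal scikit-learn sketch of the Gaussian variant (GaussianNB is the real
scikit-learn class; MultinomialNB and BernoulliNB cover the other two types).
The Iris dataset is used here only as a convenient source of continuous
features:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB().fit(X_train, y_train)  # fits one Gaussian per feature per class
print(clf.score(X_test, y_test))          # mean accuracy on the held-out split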

Advantages
Simple and fast to implement.
Handles both continuous and discrete data.
Works well with high-dimensional datasets (e.g., text data).
Limitations
Relies on the independence assumption, which may not hold in many cases.
Can struggle with data where features are highly correlated.
Assigns zero probability to feature values never seen in training (mitigated
by Laplace smoothing, shown below).
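For the multinomial case, Laplace (add-one) smoothing replaces the raw
frequency estimate with (in LaTeX):

    \hat{P}(x_i \mid y) = \frac{N_{yi} + \alpha}{N_y + \alpha n}, \qquad \alpha = 1

where N_{yi} is the count of feature i in class y, N_y is the total count of
all features in class y, and n is the number of features; with \alpha = 1 no
conditional probability can be exactly zero. In scikit-learn this is the
alpha parameter of MultinomialNB.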
