Artificial Intelligence Report
Uploaded by Joan Eborde

STUDY OF UNSUPERVISED LEARNING TECHNIQUES:
K-MEANS AND HIERARCHICAL CLUSTERING ALGORITHM

By: Pratiksha Usatkar

ARTIFICIAL INTELLIGENCE
CAPIZ STATE UNIVERSITY, PHILIPPINES
BACHELOR OF SCIENCE IN COMPUTER SCIENCE
GROUP 6
I. Introduction
● Clustering algorithms are an effective strategy for machine learning on unlabeled data. Cluster analysis can be a powerful tool for any organization that needs to identify discrete groups of customers, sales transactions, or other kinds of behaviors and objects. For instance, insurance providers use cluster analysis to identify fraudulent claims, and banks use it for credit scoring. These algorithms are also widely used for fake-news identification, spam filtering, marketing and sales, network-traffic classification, and the detection of fraudulent or criminal activity.
K-Means Clustering
● K-Means clustering is an unsupervised learning algorithm. Unlike in supervised learning, there is no labelled data for this clustering. K-Means divides objects into clusters whose members share similarities with each other and are dissimilar to the objects belonging to other clusters.
● The term 'K' is a number: you must tell the system how many clusters to create. For example, K = 2 refers to two clusters. There are methods for finding the best or optimal value of K for a given dataset.
● In machine learning, the most common clustering algorithms are k-means and hierarchical clustering. These two algorithms are remarkably effective when applied to many different machine learning problems.
● K-means clustering is centroid-based; hierarchical clustering is divisive or agglomerative. K-means is one of the best-known partitioning-based clustering methods. Partitioning algorithms are clustering strategies that subdivide the data set into k groups, where k is the predetermined number of clusters. In k-means, each cluster is represented by the centre, or mean, of the data points belonging to it.
● Common clustering procedures include k-means clustering, hierarchical clustering, the EM algorithm, and OPTICS. Performance analysis of k-means with different initialization techniques is particularly relevant for high-dimensional data.
II. Related Work
● Khaled Alsabti et al. present a novel algorithm for performing k-means clustering. The authors' main aim was to study the computational aspects of the k-means procedure. The datasets were generated artificially to examine the scaling properties of their algorithm. They reported that their algorithm achieved substantially better performance than the direct k-means algorithm in most of their experimental results. Their proposed scheme is said to improve the computational speed of the direct k-means algorithm by one to two orders of magnitude in the total number of distance calculations and in the overall computation time.
● Trupti M. Kodinariya et al. elaborated six different approaches for selecting the value of K for the k-means clustering algorithm on a dataset, and analyzed situations in which clusters, though not clearly defined, are present in the data.
III. ALGORITHM
I. K-means clustering algorithm
1. K-means clustering is a type of unsupervised learning, used when you have unlabeled data.
2. The objective of the algorithm is to discover groups within the data, with the number of groups represented by the variable k.
3. The algorithm works iteratively to assign each data point to one of the k groups based on the features that are provided.
4. Data points are clustered based on feature similarity.
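The iterative assign-and-update loop described in steps 1-4 can be sketched directly. The code below is a minimal illustration, not the implementation used later in the paper; the toy data points and the farthest-point initialization are assumptions made for the example:

```python
import numpy as np

def kmeans(points, k, iters=100):
    """Minimal k-means sketch: assign each point to its nearest centroid,
    recompute centroids as cluster means, repeat until stable."""
    # Farthest-point initialization (an assumed choice) keeps seeds spread out.
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Step 3 of the text: assign by feature similarity (Euclidean distance).
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy data (assumed for illustration): two well-separated groups.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(pts, k=2)
```

With k = 2 the loop converges in a couple of iterations and recovers the two groups, each centroid sitting at the mean of its cluster.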
II. Hierarchical clustering algorithm
● Hierarchical clustering is an unsupervised machine learning algorithm used to group unlabeled datasets into clusters; it is also called hierarchical cluster analysis. In this algorithm, we develop the hierarchy of clusters in the form of a tree-shaped structure called a dendrogram.
1. Clustering is performed based upon dissimilarities between clusters.
2. It produces an arrangement, or tree, of clusterings and does not require the number of clusters as input.
3. Partitions can be visualized using a tree structure; it is possible to see partitions at different levels of granularity by using different values of k.
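The build-a-tree-then-cut-it workflow described above can be sketched with SciPy, whose `linkage` function builds the merge hierarchy and `fcluster` cuts the tree at a chosen number of clusters. The synthetic two-group data here is an assumption for illustration only:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two synthetic groups of points (assumed data, for illustration).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, size=(5, 2)),
               rng.normal(4, 0.3, size=(5, 2))])

Z = linkage(X, method='average')                 # bottom-up merge hierarchy
labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 clusters
```

Cutting the same tree `Z` at different values of `t` yields partitions at different granularities, exactly as point 3 above describes.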
I. Agglomerative
1. Start with individual clusters: if there are n data points, the number of clusters is n.
2. At every step, merge the closest pair of clusters until only one cluster is left.
3. Compute the distance matrix (single, complete, centroid, or average linkage).
4. Once all the clusters are merged into one large cluster, create the dendrogram and partition the clusters as the problem requires.
5. Create a dendrogram: a dendrogram is a tree-like diagram that records the sequence of merges or splits.
II. Divisive
1. Start with one all-inclusive cluster.
2. At each step, split a cluster until the desired k clusters are reached (or each cluster contains a single point).
3. Compute the distance matrix (single, complete, centroid, or average linkage).
4. Use the resulting hierarchy to create the dendrogram and partition the clusters.
Consider a simple example with 5 data points for both Agglomerative and Divisive Clustering. Let’s say we
have data points A, B, C, D, and E with the following distances between them:
Distance(A, B) = 1 Distance(A, C) = 2 Distance(A, D) = 3 Distance(A, E) = 4 Distance(B, C) = 2 Distance(B,
D) = 2 Distance(B, E) = 3 Distance(C, D) = 1 Distance(C, E) = 2 Distance(D, E) = 3
Agglomerative Clustering:
1. Start with 5 clusters: {A}, {B}, {C}, {D}, {E}
2. Merge the closest clusters: {A, B}, {C}, {D}, {E} (since Distance(A, B) = 1 is the smallest)
3. Continue merging: {A, B}, {C, D}, {E} (since Distance(C, D) = 1 is the smallest among the remaining)
4. And so on, until you get one big cluster: {A, B, C, D, E}
Divisive Clustering:
1. Start with 1 cluster: {A, B, C, D, E}
2. Split the cluster into two. This could be done in several ways depending on the criteria you use. One possible split could be: {A, B, C}, {D, E}
3. Continue splitting: {A, B}, {C}, {D, E} and so on, until each data point is in its own cluster: {A}, {B}, {C}, {D}, {E}
In both methods, you can use a dendrogram to visualize the process and decide the optimal number of
clusters.
The above example is simplified; in real-world scenarios, the data is usually multi-dimensional and the distance computation is more complex.
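The worked example can be checked programmatically. SciPy's `linkage` accepts the pairwise distances directly as a condensed distance vector; single linkage is assumed here as the merge criterion, since the example merges whichever pair of clusters currently has the smallest distance:

```python
from scipy.cluster.hierarchy import linkage

# Condensed distance vector in the order (A,B),(A,C),(A,D),(A,E),
# (B,C),(B,D),(B,E),(C,D),(C,E),(D,E), copied from the table above.
d = [1, 2, 3, 4, 2, 2, 3, 1, 2, 3]
Z = linkage(d, method='single')  # single linkage: nearest-pair merging
```

Each row of `Z` records one merge and its distance: the first two merges happen at distance 1 ({A,B} and {C,D}), and under single linkage every remaining merge happens at distance 2.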
IV. Customer segmentation

In an organization, identifying customers and their behavior plays an important role. We can group the customers who are likely to buy similar products based on their identification and behavior. Unsupervised clustering in machine learning (ML) helps identify such customer groups; here we use two clustering techniques, k-means clustering and hierarchical clustering, which we apply below.
Problem Statement
● Suppose there is a mall that has recorded the details of 200 customers, such as age, gender, annual income, and spending score, through campaigns; the spending score is derived from the spending habits of the purchases they have made at the shopping centre. Now the mall is introducing new, luxury products and wants to reach the right customers. We cannot ask each customer individually about the product; instead, we can separate the customers into groups likely to buy the products. This problem can be solved using clustering. To represent it we can use a two-dimensional Euclidean space, with annual income on the X-axis and spending score on the Y-axis. By representing each customer on this plane, we can use a clustering method to find the customers likely to buy luxury products.
I. Python implementation of the k-means algorithm
1. The first step is data preprocessing: import the libraries, then import the dataset, and extract the independent variables.
2. In the second step, use the elbow method to find the optimal number of clusters.
3. In the third step, train the k-means algorithm on the training data set.
4. In the fourth step, visualize the clusters.
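The elbow step can be sketched with scikit-learn. Since the mall CSV itself is not reproduced in this report, the snippet below substitutes synthetic (income, spending score) data with five built-in groups, which is an assumption for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the mall data: 200 customers with
# (annual income, spending score); the real CSV is not reproduced here.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 5.0, size=(40, 2))
               for c in [(20, 20), (20, 80), (55, 50), (90, 20), (90, 80)]])

# Elbow method: compute WCSS (sklearn's `inertia_`) for k = 1..10.
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 11)]
```

Plotting `wcss` against k shows a sharp drop up to k = 5 and little improvement afterwards: the "elbow" that identifies the optimal cluster count for this data.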

II. Implementation of hierarchical clustering
1. The first step is data preprocessing: import the libraries, then import the dataset, and extract the matrix of features.
2. In the second step, use the dendrogram to find the optimal number of clusters.
3. In the third step, train the hierarchical algorithm on the training data set.
4. In the fourth step, visualize the clusters.
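The training step can be sketched with scikit-learn's `AgglomerativeClustering`. The two-blob data is assumed for illustration; in the mall scenario `X` would instead hold the extracted feature matrix, and `n_clusters` would come from reading the dendrogram:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Assumed two-group data standing in for the extracted feature matrix.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in [(0, 0), (5, 5)]])

# Ward linkage, agglomerating bottom-up into the chosen number of clusters.
model = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = model.fit_predict(X)   # one cluster label per data point
```

`fit_predict` returns a cluster label for every row of `X`, which can then be used to colour the scatter plot in the visualization step.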
V. Python implementation of customer
segmentation, steps:
1. Import the basic libraries to read the CSV file and visualize the data.
2. Read the dataset from the CSV file and define the dataset for the model.
3. To execute k-means clustering, we must find the ideal number of clusters into which customers will be placed. For k-means, the elbow method is used, based on the within-cluster sum of squares (WCSS). The WCSS curve is plotted, and we identify the region of the elbow on the x-axis; in this plot, the elbow appears at the point 5 on the x-axis.
4. After finding the ideal number of clusters, fit the k-means clustering model to the dataset defined in the second step, and then predict a cluster for each data point.
5. For customers with low income and high spending, we have used the cyan colour. This group appears as 'careless customers' since, despite having a low income, they spend more.
6. To sell a luxury item, one should focus on people with high income and high spending habits; this group of customers is represented in maroon in the chart.
7. Now the same task is executed using hierarchical clustering. Reading the CSV file and preparing the dataset for the algorithm are the same as in the first and second steps. In k-means, the ideal number of clusters was found using the elbow method; in hierarchical clustering, the dendrogram is used for this purpose. Plot and visualize a dendrogram for the dataset. As can be seen in the charts, the combination of 5 lines is not joined on the Y-axis between 100 and 240, a span of about 140 units, so the ideal number of clusters is 5 for hierarchical clustering.
8. Now we train the hierarchical clustering algorithm and predict the cluster for each data point.
VI. COMPARISON BETWEEN K-MEANS AND
HIERARCHICAL CLUSTERING
VII. RESULT:
Python implementation of the k-means and hierarchical
algorithms
● K-means clustering is a widely used method for data cluster analysis. The 'means' in k-means refers to averaging the data, that is, finding the centroid. Hierarchical clustering is also called a greedy algorithm, because the splits and merges of clusters vary with the linkage chosen. Hierarchical clustering is the most well-known and widely used strategy for analyzing social network data. On a large dataset both algorithms produce good-quality clusterings, but with a huge dataset the k-means algorithm is faster than the other algorithms, and its performance is better than that of hierarchical clustering. The hierarchical algorithm has been adapted for categorical data; due to its complexity, a newer approach assigns a rank value to each categorical attribute so that k-means can be used, with the categorical data first converted into numeric form.
