K-Means Clustering in MATLAB
Last Updated: 28 Apr, 2025
K-means clustering is an unsupervised machine learning algorithm that partitions data points into K groups, or clusters. The algorithm looks for K centroids in the data space, each representing the center of one cluster. Every data point is assigned to its nearest centroid, which forms K clusters. The algorithm then alternates between two steps: it recomputes each centroid as the mean of the data points assigned to it, and it reassigns each data point to the closest centroid. This repeats until the centroids no longer move or a maximum number of iterations is reached. In MATLAB, k-means is available through the kmeans() function in the Statistics and Machine Learning Toolbox.
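The assign-and-update loop described above can be sketched by hand. This is a minimal illustration, not how kmeans() is implemented internally: the toy data, the variable names, the iteration cap of 100, and the random choice of starting points are all assumptions made for this example. It also assumes MATLAB R2016b or later, because it relies on implicit expansion when subtracting a row vector from a matrix.

```matlab
% Minimal k-means sketch (illustrative only; not the built-in kmeans)
rng(0);                                   % fix the seed so the run is repeatable
X = [randn(50,2) + 3; randn(50,2) - 3];   % toy data: two well-separated blobs
k = 2;
maxIter = 100;                            % assumed cap on the number of iterations

% Initialize the centroids with k distinct data points chosen at random
centroids = X(randperm(size(X,1), k), :);

for iter = 1:maxIter
    % Assignment step: squared Euclidean distance from each point to each centroid
    d = zeros(size(X,1), k);
    for j = 1:k
        d(:,j) = sum((X - centroids(j,:)).^2, 2);
    end
    [~, idx] = min(d, [], 2);             % index of the nearest centroid per point

    % Update step: move each centroid to the mean of its assigned points
    newCentroids = centroids;
    for j = 1:k
        if any(idx == j)                  % keep the old centroid if a cluster is empty
            newCentroids(j,:) = mean(X(idx == j, :), 1);
        end
    end

    % Stop when the centroids no longer move
    if isequal(newCentroids, centroids)
        break;
    end
    centroids = newCentroids;
end
```

After the loop, idx holds the cluster index of each point and centroids holds one row per cluster. In practice the built-in kmeans() function is the better choice, since it handles initialization and empty clusters more carefully.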
Here are two examples of k-means clustering with complete MATLAB code and explanations:
Example 1: Iris Dataset
The Iris dataset is a classic dataset used in machine learning and data mining. It contains measurements of the sepal length, sepal width, petal length, and petal width of three species of Iris flowers (Setosa, Versicolor, and Virginica). In this example, we will use k-means clustering to cluster the Iris dataset into three clusters based on the four features.
Matlab
% Load the Iris dataset
load fisheriris;
% Use all four features (the columns of meas) as the data matrix
X = meas;
% Fix the random seed so the cluster assignment is reproducible
rng(1);
% Apply k-means clustering with k=3
k = 3;
[idx, centroids] = kmeans(X, k);
% Plot the results
figure;
gscatter(X(:,1), X(:,2), idx, 'bgr', '.', 10);
hold on;
plot(centroids(:,1), centroids(:,2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);
legend('Cluster 1', 'Cluster 2', 'Cluster 3', 'Centroids');
title('K-Means Clustering Results');
xlabel('Sepal Length');
ylabel('Sepal Width');
Output:
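Figure: Iris dataset clustered with k-means
Explanation:
In this example, we first load the Iris dataset using the load command. We then place the four features in a matrix X and apply k-means clustering with k=3 using the kmeans() function, which returns the cluster indices idx and the centroid coordinates centroids. Finally, we plot the clustered data and the centroids using the gscatter() and plot() functions. The clustering uses all four features, but the plot shows only the first two (sepal length and sepal width), so clusters can appear to overlap in this view. Because kmeans() starts from randomly chosen centroids, the cluster numbering can differ between runs unless the random seed is fixed.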
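Since the Iris dataset also ships with the true species labels, one way to see how well the three clusters line up with the three species is to count how many flowers of each species fall into each cluster. The sketch below is one possible check, not part of the original example: the seed value and the variable names speciesCode, speciesNames, and counts are chosen here for illustration.

```matlab
load fisheriris;                 % provides meas (150x4) and species (150x1 cell array)
rng(1);                          % arbitrary fixed seed for repeatability
idx = kmeans(meas, 3);

% Convert the species names to integer codes 1..3
[speciesCode, speciesNames] = grp2idx(species);

% Count co-occurrences: rows are clusters, columns are species
counts = accumarray([idx, speciesCode], 1, [3 3]);

% Display the counts as a labelled table
disp(array2table(counts, 'VariableNames', speciesNames', ...
    'RowNames', {'Cluster1', 'Cluster2', 'Cluster3'}));
```

A row dominated by a single species means that cluster corresponds closely to that species. The cluster numbers themselves carry no meaning, so cluster 1 need not match the first species.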
Example 2: Synthetic Data
In this example, we will generate a synthetic dataset of two clusters and use k-means clustering to cluster the data.
Matlab
% Generate random data
rng(1);
X = [randn(100,2)*0.75+ones(100,2); randn(100,2)*0.5-ones(100,2)];
% Apply k-means clustering with k=2
k = 2;
[idx, centroids] = kmeans(X, k);
% Plot the results
figure;
gscatter(X(:,1), X(:,2), idx, 'bgr', '.', 10);
hold on;
plot(centroids(:,1), centroids(:,2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);
legend('Cluster 1', 'Cluster 2', 'Centroids');
title('K-Means Clustering Results');
xlabel('X1');
ylabel('X2');
Output:
Figure: Synthetic data clustered with k-means
Explanation:
In this example, we first generate a random dataset of 200 points using the randn() function: 100 points centered at (1, 1) with standard deviation 0.75 and 100 points centered at (-1, -1) with standard deviation 0.5. The rng(1) call fixes the random seed so that the data is reproducible. We then apply k-means clustering with k=2 using the kmeans() function, which returns the cluster indices idx and the centroid coordinates centroids. Finally, we plot the clustered data and the centroids using the gscatter() and plot() functions.
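Because k-means starts from random centroids and stops at a maximum number of iterations, it can help to control both. The sketch below reuses the synthetic data and passes the 'Replicates' and 'MaxIter' name-value options to kmeans(). The specific values 5 and 200 are arbitrary choices for illustration, and totalWithin is a name introduced here.

```matlab
rng(1);
X = [randn(100,2)*0.75 + ones(100,2); randn(100,2)*0.5 - ones(100,2)];

% Run k-means from 5 random starts with at most 200 iterations each;
% kmeans keeps the run with the lowest total within-cluster distance
[idx, centroids, sumd] = kmeans(X, 2, 'Replicates', 5, 'MaxIter', 200);

% sumd holds one within-cluster sum of point-to-centroid distances per cluster
% (squared Euclidean distance is the default metric)
totalWithin = sum(sumd);
fprintf('Total within-cluster sum of squared distances: %.2f\n', totalWithin);
```

A lower total indicates tighter clusters for the same k. Using several replicates reduces the chance of ending up in a poor local solution.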
Applications of k-means clustering in MATLAB:
- Image segmentation.
- Market segmentation.
- Anomaly detection.
- Recommendation systems.
- Text clustering.