
AAM QUESTION BANK WITH ANSWERS

CHAPTER 4 – Unsupervised Learning: Clustering Algorithms


[12 marks]

Q WHAT IS K-MEANS CLUSTERING?


 K-Means Clustering is an Unsupervised Learning algorithm which groups an unlabeled
dataset into different clusters.
 Here K defines the number of pre-defined clusters that need to be created in the process;
if K=2 there will be two clusters, for K=3 there will be three clusters, and so on.
 It allows us to cluster the data into different groups and provides a convenient way to
discover the categories of groups in an unlabelled dataset on its own, without the need
for any training.
 It is a centroid-based algorithm, where each cluster is associated with a centroid.
 The main aim of this algorithm is to minimize the sum of distances between the data
points and their corresponding cluster centroids.
 The algorithm takes the unlabelled dataset as input, divides the dataset into k clusters,
and repeats the process until the best clusters are found. The value of k should be
predetermined in this algorithm.

Q HOW DOES THE K-MEANS ALGORITHM WORK?


The working of the K-Means algorithm is explained in the steps below:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids (they need not be points from the input dataset).
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Recompute the centroid of each cluster as the mean of the points assigned to it.
Step-5: Repeat the third step, i.e. reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurred, go back to Step-4; otherwise go to FINISH.
Step-7: The model is ready.
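The loop above can also be written directly in a few lines of NumPy. The following is only a minimal sketch of the same iteration (random centroids, assignment, recomputation); the array X, the value of k and the iteration cap are illustrative assumptions, and empty clusters are not handled.

import numpy as np

def simple_kmeans(X, k, max_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step-2: pick K random points from X as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step-3: assign each data point to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step-4: recompute each centroid as the mean of its assigned points
        # (this sketch assumes no cluster ever becomes empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step-6: stop once no centroid moves any more
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids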



Q DESCRIBE FAILURES OF K-MEANS

Failures or challenges associated with K-Means:


1. Sensitive to Initial Centroid Positions: K-Means is sensitive to the initial placement of
centroids. Different initializations can lead to different final cluster assignments, and the
algorithm may converge to a local minimum rather than the global minimum.
2. Assumes Spherical Clusters: K-Means assumes that clusters are spherical and equally
sized. In situations where clusters have different shapes, densities, or sizes, K-Means may
fail to accurately capture the underlying structure of the data.
3. Sensitive to Outliers: Outliers can significantly impact the performance of K-Means.
Since the algorithm relies on the mean (centroid) of the data points in each cluster,
outliers can disproportionately influence the centroid, leading to suboptimal cluster
assignments.
4. Requires Pre-specification of the Number of Clusters (K): One of the major limitations of
K-Means is that it requires the user to specify the number of clusters (K) in advance.
Choosing an inappropriate value for K can result in poor clustering results.
5. Limited to Euclidean Distance: K-Means uses Euclidean distance to measure the
dissimilarity between data points and centroids. This can be a limitation when dealing
with data that does not adhere to Euclidean geometry or when the features have different
scales.
6. May Produce Unbalanced Clusters: K-Means can produce clusters of significantly
different sizes. In cases where the data naturally forms clusters of unequal sizes, K-
Means may not be the most suitable algorithm.
7. Not Robust to Non-Convex Shapes: K-Means assumes that clusters are convex, which
means it struggles with non-convex shapes. If the true clusters have complex, non-convex
boundaries, K-Means may fail to accurately represent them.
8. Does Not Handle Categorical Data Well: K-Means is designed for numerical data, and it
may not perform well with categorical or binary features. Preprocessing techniques, such
as one-hot encoding, are often required.
9. Noisy Data Impact: Noise in the data can lead to incorrect cluster assignments. K-Means
is not robust to noisy data, and outliers or irrelevant features can affect the clustering
results.



10. Convergence to Local Optima: K-Means uses an iterative optimization process, and it
may converge to a local minimum rather than the global minimum. Multiple runs with
different initializations are often performed to mitigate this issue, as sketched below.
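Because of points 1 and 10, scikit-learn's KMeans is usually run several times with different random initializations and the run with the lowest inertia (WCSS) is kept. A minimal sketch, assuming a numeric feature matrix X:

from sklearn.cluster import KMeans

# n_init controls how many times K-Means is restarted with different centroid seeds;
# the fit with the lowest inertia (within-cluster sum of squares) is kept
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(kmeans.inertia_)   # WCSS of the best of the 10 runs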

Q IMPLEMENTATION OF K-Means ALGORITHM

The steps to be followed for the implementation are given below:


o Data Pre-processing
o Finding the optimal number of clusters using the elbow method
o Training the K-means algorithm on the training dataset
o Visualizing the clusters
Step-1: Data pre-processing Step
o Importing Libraries:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
In the above code, numpy is imported for mathematical calculations, matplotlib for plotting graphs, and pandas for managing the dataset.
o Importing the Dataset:
# Importing the dataset
dataset = pd.read_csv('Mall_Customers_data.csv')

o Extracting Independent Variables:
x = dataset.iloc[:, [3, 4]].values
Step-2: Finding the optimal number of clusters using the elbow method
#finding the optimal number of clusters using the elbow method
from sklearn.cluster import KMeans
wcss_list = []   # initializing the list for the values of WCSS

# using a for loop for iterations from 1 to 10
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('WCSS')
mtp.show()
Step-3: Training the K-means algorithm on the training dataset

#training the K-means model on the dataset
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)
Step-4: Visualizing the Clusters
#visualizing the clusters
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')     # first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2')    # second cluster
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s = 100, c = 'red', label = 'Cluster 3')      # third cluster
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')     # fourth cluster
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')  # fifth cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()



Q. Advantages / Benefits of Dimension Reduction
1. It facilitates data compression and decreases the necessary storage space.
2. It reduces the amount of time needed to conduct identical calculations.
3. It addresses the issue of multi-collinearity, which enhances the performance of the model. It
eliminates superfluous characteristics.

Q. What are the common methods to perform Dimension Reduction?


1. Missing Values:
 When we come across missing values while analysing data, how should we proceed?
 To begin, we should determine the cause and then impute the missing data or eliminate
variables using suitable approaches.
 However, what if a variable has an excessive number of missing values? A variable with a
very high missing-value ratio carries little information, so it is usually better to drop it
entirely rather than impute it.
2. Low Variance: If there are a large number of dimensions, it is advisable to exclude
variables with low variance compared to the others, as such variables will not effectively
account for the variation in the target variable.
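A minimal sketch of the low-variance filter using scikit-learn's VarianceThreshold; the numeric DataFrame df and the threshold value are illustrative assumptions:

from sklearn.feature_selection import VarianceThreshold

# drop every feature whose variance falls below the chosen threshold
selector = VarianceThreshold(threshold=0.01)   # illustrative cutoff
X_reduced = selector.fit_transform(df)
kept_columns = df.columns[selector.get_support()]   # names of the surviving features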

3. Decision Trees:
 Decision trees serve as a comprehensive approach to addressing various issues such as
handling missing values and outliers and identifying the relevant variables.
 They also performed effectively during our Data Hackathon: multiple data scientists
employed decision tree algorithms and achieved successful outcomes.

4. Random Forest:
 Random Forest is a method similar to a decision tree, and its feature importance scores
can be used to keep only the most relevant variables.
 It is important to note that random forests tend to show a bias towards variables with a
higher number of distinct values, meaning they favour numeric variables over binary or
categorical ones.
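One common way to use a random forest for dimension reduction is to rank the features by their importance scores and keep only the top ones. A hedged sketch, assuming a numeric feature DataFrame X and labels y:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X, y)

# rank features by impurity-based importance and keep, say, the ten best
importances = pd.Series(forest.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).head(10).index
X_reduced = X[top_features]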

5. Strong Correlation:
Dimensions that have a strong correlation can negatively impact the model's performance.
Furthermore, it is undesirable to have several variables that carry comparable information or
similar variation, a phenomenon commonly referred to as "multicollinearity".



6. Backward Feature Elimination:
This method begins with all n dimensions. We compute the sum of squared residuals (SSR)
after removing each variable in turn, repeating this n times. We then identify the variable whose
removal results in the smallest increase in SSR and remove it, leaving a dataset with n-1 input
features. The procedure can be repeated until the desired number of features remains.
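scikit-learn's RFE (recursive feature elimination) automates a closely related idea: it repeatedly drops the least important feature according to a fitted model. A sketch under the assumption of a feature DataFrame X, a target y, and five features to keep:

from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# drop one feature per iteration until only the desired number remains
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5, step=1)
rfe.fit(X, y)
selected = X.columns[rfe.support_]   # features that survived the elimination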

7. Factor Analysis:
There are essentially two approaches to conducting factor analysis:
Exploratory Factor Analysis (EFA)
Confirmatory Factor Analysis (CFA)
8. Principal Component Analysis (PCA)
 It is a method that transforms the variables into a new set of variables that are linear
combinations of the original variables.
 The new set of variables is referred to as the principal components.
 The components are obtained such that the first principal component captures the largest
possible variation in the original data, and each subsequent component captures the
largest possible remaining variance.
 Applying PCA makes the original variables lose their interpretability: if interpretability
of the results is a priority for your analysis, PCA is not the appropriate technique for
your project.

Q. Define Correlation:
 Correlation refers to the statistical relationship between two or more variables.
 Correlation is a statistical term that quantifies the direction and magnitude of the linear
relationship between two variables.
 A correlation value of 0 indicates the absence of a linear relationship between the two
variables, whereas correlation coefficients of 1 and -1 indicate perfect positive and
negative correlations, respectively.
 The principal components in PCA are linear combinations of the original variables that
maximise the amount of variation in the data accounted for. The calculation of the principal
components involves the use of the correlation matrix.
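A small illustration of how a correlation matrix can be computed with pandas and used as a high-correlation filter; the DataFrame df and the 0.9 cutoff are illustrative assumptions:

import pandas as pd

corr = df.corr()   # pairwise correlation coefficients in [-1, 1]

# flag every pair of distinct features whose absolute correlation exceeds 0.9
cols = corr.columns
high_corr_pairs = [
    (cols[i], cols[j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr.iloc[i, j]) > 0.9
]
print(high_corr_pairs)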



Q. PCA Implementation
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")

m_data = pd.read_csv('mushrooms.csv')

# Machine learning models work with numbers, so we need to encode the
# string categories as integers
encoder = LabelEncoder()

# Now apply the transformation to all the columns:
for col in m_data.columns:
    m_data[col] = encoder.fit_transform(m_data[col])

X_features = m_data.iloc[:,1:23]
y_label = m_data.iloc[:, 0]

# Scale the features


scaler = StandardScaler()
X_features = scaler.fit_transform(X_features)

# Visualize
pca = PCA()
pca.fit_transform(X_features)
pca_variance = pca.explained_variance_

plt.figure(figsize=(8, 6))
plt.bar(range(22), pca_variance, alpha=0.5, align='center', label='individual variance')
plt.legend()
plt.ylabel('Explained variance')
plt.xlabel('Principal components')
plt.show()
pca2 = PCA(n_components=17)
pca2.fit(X_features)
x_3d = pca2.transform(X_features)

plt.figure(figsize=(8,6))
plt.scatter(x_3d[:,0], x_3d[:,5], c=m_data['class'])
plt.show()

pca3 = PCA(n_components=2)
pca3.fit(X_features)
x_3d = pca3.transform(X_features)
plt.figure(figsize=(8,6))
plt.scatter(x_3d[:,0], x_3d[:,1], c=m_data['class'])
plt.show()
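One way to choose n_components less arbitrarily than the 17 used above is to look at the cumulative explained variance ratio and keep just enough components to cover, say, 95% of the variance. A hedged sketch continuing from the scaled X_features above (the 0.95 cutoff is an illustrative choice):

import numpy as np

pca_full = PCA().fit(X_features)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# smallest number of components that explains at least 95% of the variance
n_components = int(np.argmax(cumulative >= 0.95)) + 1
print(n_components, cumulative[n_components - 1])

# alternatively, PCA accepts a float and chooses the component count itself
pca_95 = PCA(n_components=0.95).fit(X_features)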

Q. List Dimensionality Reduction Techniques
(See the list under "Q. Common techniques of Dimensionality Reduction" at the end of this unit.)



Q. Advantages / Benefits of Dimensionality Reduction
o By reducing the dimensions of the features, the space required to store the dataset also
gets reduced.
o Less computation and training time is required for reduced dimensions of features.
o Reduced dimensions of features of the dataset help in visualizing the data quickly.
o It removes the redundant features (if present) by taking care of multicollinearity.

Q. Disadvantages of Dimensionality Reduction


There are also some disadvantages of applying dimensionality reduction, which are given
below:
o Some data may be lost due to dimensionality reduction.
o In the PCA dimensionality reduction technique, the number of principal components to
retain is sometimes not known in advance.



Q. Approaches of Dimension Reduction
There are two ways to apply the dimension reduction technique, which are given below:
Feature Selection
Feature selection is the process of selecting the subset of the relevant features and leaving out the
irrelevant features present in a dataset to build a model of high accuracy. In other words, it is a
way of selecting the optimal features from the input dataset.
Three methods are used for the feature selection:
1. Filters Methods
In this method, the dataset is filtered, and a subset that contains only the relevant features is
taken. Some common techniques of the filter method are:
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
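A minimal filter-method sketch using scikit-learn's SelectKBest with the chi-square score; the non-negative feature matrix X, the labels y, and the choice of k=10 are illustrative assumptions:

from sklearn.feature_selection import SelectKBest, chi2

# score every feature independently of any model and keep the ten best
selector = SelectKBest(score_func=chi2, k=10)
X_new = selector.fit_transform(X, y)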
2. Wrappers Methods
The wrapper method has the same goal as the filter method, but it uses a machine learning
model for its evaluation. In this method, some features are fed to the ML model and its
performance is evaluated; the performance decides whether to add or remove those features to
increase the accuracy of the model. This method is more accurate than the filter method but
more complex to work with. Some common techniques of wrapper methods are:
o Forward Selection
o Backward Selection
o Bi-directional Elimination
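Forward selection can be sketched with scikit-learn's SequentialFeatureSelector, which repeatedly adds the feature that most improves a chosen model's cross-validated score; the logistic-regression estimator and the count of five features are illustrative assumptions:

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# wrapper method: each candidate feature set is evaluated by actually fitting the model
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,
    direction='forward',   # use 'backward' for backward selection
)
sfs.fit(X, y)
X_selected = sfs.transform(X)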
3. Embedded Methods: Embedded methods evaluate the importance of each feature during the
training iterations of the machine learning model itself. Some common techniques of
embedded methods are:
o LASSO
o Elastic Net
o Ridge Regression, etc.
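A hedged embedded-method sketch: LASSO shrinks the coefficients of unhelpful features to exactly zero while it trains, so feature selection falls out of the fitting itself. The alpha value and the assumed X and y are illustrative:

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# features whose LASSO coefficient is shrunk to zero are discarded
lasso = Lasso(alpha=0.01)
selector = SelectFromModel(lasso)
X_embedded = selector.fit_transform(X, y)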
Feature Extraction:
Feature extraction is the process of transforming a space with many dimensions into a space
with fewer dimensions. This approach is useful when we want to retain all of the information
but use fewer resources while processing it.



Some common feature extraction techniques are:
a) Principal Component Analysis
b) Linear Discriminant Analysis
c) Kernel PCA
d) Quadratic Discriminant Analysis

Q. Common techniques of Dimensionality Reduction


a) Principal Component Analysis
b) Backward Elimination
c) Forward Selection
d) Score comparison
e) Missing Value Ratio
f) Low Variance Filter
g) High Correlation Filter
h) Random Forest
i) Factor Analysis
j) Auto-Encoder

