ML-Notes - 4 and 5 - 16 Marks
A Cluster 1
B Cluster 1
C Cluster 2
D Cluster 1
E Cluster 2
F Cluster 2
Same as Iteration 1 → Converged
Final Cluster Assignment after 2 Iterations
Cluster 1: A, B, D
Cluster 2: C, E, F
Final Centroids
Centroid 1: (3.0, 7.67)
Centroid 2: (7.0, 4.33)
2. Organize the functioning of unsupervised learning with its two primary approaches.
Compare the strengths and weaknesses of this algorithm with respect to K-Means and K-
Medoids clustering. Also, identify the specific situations where Hierarchical Clustering
would be more advantageous than K-Means or K-Medoids and provide practical examples
where this might be applicable.
Unsupervised Learning:
Unsupervised learning trains a model on unlabeled data: the algorithm must discover structure (groups, patterns, or compact representations) on its own, without any target outputs to guide it.
Two Primary Approaches in Unsupervised Learning
1. Clustering
o Objective: Group data points into clusters such that items in the same group are
more similar to each other than to those in other groups.
o Common algorithms: K-Means, K-Medoids, Hierarchical Clustering, DBSCAN,
etc.
2. Dimensionality Reduction
o Objective: Reduce the number of variables in the dataset while preserving
important information.
o Common algorithms: PCA (Principal Component Analysis), t-SNE,
Autoencoders, etc.
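The two approaches can be illustrated side by side with a short sketch (assuming scikit-learn is available; the toy data below is made up purely for illustration):
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Toy data: 100 unlabeled samples with 5 features (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))

# Clustering: group the unlabeled samples into 3 clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project the 5 features down to 2 components
X_2d = PCA(n_components=2).fit_transform(X)

print(labels[:10])   # cluster index assigned to the first 10 samples
print(X_2d.shape)    # (100, 2)
```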
K-Means
Strengths:
Simple, fast, and scales well to large datasets.
Works well when clusters are roughly spherical and of similar size.
Weaknesses:
Sensitive to outliers and to the initial choice of centroids.
Requires k to be specified; relies on Euclidean-style distances to cluster means.
K-Medoids
Strengths:
More robust to noise and outliers.
Can use any distance metric (e.g., Manhattan, cosine).
Weaknesses:
Slower than K-Means on large datasets.
Still requires k to be specified.
Hierarchical Clustering
Strengths:
No need to pre-define number of clusters.
Builds a hierarchy of clusters (dendrogram).
Effective for nested or non-convex clusters.
Weaknesses:
Computationally expensive on large datasets.
Cannot easily undo previous clustering decisions.
Sensitive to linkage and distance metric choice.
When is Hierarchical Clustering More Advantageous?
Hierarchical clustering is better than K-Means/K-Medoids in scenarios where:
The number of clusters is unknown and flexible grouping is desired.
The data has a nested or hierarchical structure.
You want interpretability via dendrograms.
Clusters are of unequal size, shape, or density.
Practical Examples
Application: Why Hierarchical Clustering?
Gene expression analysis: Reveals nested gene groupings with biological significance.
Document or text clustering: Captures topic hierarchies (e.g., topics and subtopics).
Customer segmentation in marketing: Useful when customer behaviors are hierarchically structured.
Anomaly detection in networks: Captures complex behavior not well separated into k clusters.
Taxonomy of species or organisms: Natural fit due to inherent hierarchical classification.
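The dendrogram-based interpretability mentioned above can be demonstrated with SciPy (a minimal sketch on made-up 2-D points):
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Made-up 2-D points forming two loose groups (illustrative only)
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]])

# Agglomerative clustering with Ward linkage builds the full merge hierarchy
Z = linkage(X, method="ward")

# Cut the hierarchy into 2 flat clusters -- no k is needed until this point
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)        # e.g. [1 1 1 2 2 2]

# dendrogram(Z) would plot the merge tree if matplotlib is available
```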
3. Consider the data set
Cust-id Annual Income Spending Score
1 15000 39
2 16000 81
3 17000 6
4 18000 77
5 19000 40
6 20000 76
Perform K-means clustering on the given data set with K = 2. Calculate the centroids of the two
clusters after applying K-means.
Solution:
K-Means Clustering on the dataset with K = 2. Steps:
1. Initialize centroids (we’ll select 2 points from the data).
2. Compute distances and assign clusters.
3. Calculate new centroids.
4. Iterate once more and show final centroids.
Dataset
Cust-id Annual Income Spending Score
1 15000 39
2 16000 81
3 17000 6
4 18000 77
5 19000 40
6 20000 76
Cluster 1: Customers 1, 3, 5
Points:
(15000, 39), (17000, 6), (19000, 40)
Centroid X=(15000+17000+19000)/3=51000/3=17000
Centroid Y=(39+6+40)/3=85/3=28.33
→ New Centroid 1: (17000, 28.33)
Cluster 2: Customers 2, 4, 6
Points:
(16000, 81), (18000, 77), (20000, 76)
Centroid X=(16000+18000+20000)/3=54000/3=18000
Centroid Y=(81+77+76)/3=234/3=78.00
→ New Centroid 2: (18000, 78.00)
Final Output
Final Clusters:
Cluster 1: Cust-ids 1, 3, 5
Cluster 2: Cust-ids 2, 4, 6
Final Centroids:
Cluster 1 Centroid: (17000, 28.33)
Cluster 2 Centroid: (18000, 78.00)
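As a quick check of the arithmetic, the sketch below recomputes both centroids with NumPy, assuming the cluster assignments stated in the solution (Customers 1, 3, 5 vs. 2, 4, 6); the variable names are illustrative only:
```python
import numpy as np

# Dataset: (Annual Income, Spending Score) for customers 1..6
data = np.array([
    [15000, 39],
    [16000, 81],
    [17000, 6],
    [18000, 77],
    [19000, 40],
    [20000, 76],
], dtype=float)

# Cluster assignments taken from the solution above (0-based row indices)
cluster1 = data[[0, 2, 4]]   # customers 1, 3, 5
cluster2 = data[[1, 3, 5]]   # customers 2, 4, 6

# Each centroid is the mean of the points assigned to the cluster
print("Centroid 1:", cluster1.mean(axis=0))   # ≈ (17000, 28.33)
print("Centroid 2:", cluster2.mean(axis=0))   # = (18000, 78.00)
```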
4. Given the transaction dataset as below and the thresholds of minimum support = 2 and
minimum confidence = 50%, apply the Apriori algorithm to find all frequent itemsets and
generate the corresponding association rules.
TID ITEMSETS
T1 A, B
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Solution:
Apply the Apriori algorithm to the transaction dataset with:
Minimum support = 2
Minimum confidence = 50%
1. Find all frequent itemsets.
2. Generate association rules from those itemsets that meet the minimum confidence
threshold.
Step 0: Understand the Dataset
Transactions:
TID Items
T1 A, B
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Step 1: Generate L1 – Frequent 1-itemsets
Support count:
Item Transactions Support
A T1, T4, T5, T7, T8, T9 6
B T1, T2, T3, T4, T6, T8, T9 7
C T3, T5, T6, T7, T8, T9 6
D T2, T4 2
E T8 1❌
Frequent 1-itemsets:
L1 = {A, B, C, D}
Step 2: Generate L2 – Candidate 2-itemsets from L1
C2 candidates:
{A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D}
Support count:
Itemset Transactions Support
A, B T1, T4, T8, T9 4
A, C T5, T7, T8, T9 4
A, D T4 1❌
B, C T3, T6, T8, T9 4
B, D T2, T4 2
C, D — 0❌
Frequent 2-itemsets:
L2 = {A,B}, {A,C}, {B,C}, {B,D}
Step 3: Generate L3 – Candidate 3-itemsets from L2
Possible candidates:
{A, B, C}, {A, B, D}, {B, C, D}
Support count:
Itemset Transactions Support
A, B, C T8, T9 2✅
A, B, D T4 1❌
B, C, D — 0❌
Frequent 3-itemsets:
L3 = {A, B, C}
Step 4: Generate Association Rules
Rules are formed from the frequent itemsets in L2 and L3 and kept if confidence = support(X ∪ Y) / support(X) ≥ 50%. For example, A → B has confidence 4/6 ≈ 67% (kept), B → A has confidence 4/7 ≈ 57% (kept), and D → B has confidence 2/2 = 100% (kept).
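The same computation can be sketched in Python, assuming the mlxtend library is available; the transactions come from the problem statement, and the support threshold 0.22 is used as a value just under 2/9 (minimum support of 2 out of 9 transactions) to avoid floating-point edge cases:
```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Transactions T1..T9 from the problem statement
transactions = [
    ["A", "B"], ["B", "D"], ["B", "C"], ["A", "B", "D"], ["A", "C"],
    ["B", "C"], ["A", "C"], ["A", "B", "C", "E"], ["A", "B", "C"],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with support of at least ~2/9 of the transactions
frequent = apriori(onehot, min_support=0.22, use_colnames=True)

# Association rules with confidence >= 50%
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(frequent)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```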
5. Enumerate the architecture of a Neural Network and explain the key components such
as the input layer, hidden layer and output layer.
Architecture of a Neural Network
A Neural Network (NN) is a computational model inspired by the structure and function of the
human brain. It consists of layers of interconnected nodes (neurons) that process data through
weighted connections.
Basic Structure
A typical neural network includes:
1. Input Layer
2. One or more Hidden Layers
3. Output Layer
Input Layer → Hidden Layer(s) → Output Layer
Each layer is made up of neurons, and connections between neurons carry weights that are
adjusted during training.
Key Components:
1. Input Layer
Function: Receives the input data (features).
Structure: Each neuron in this layer corresponds to one feature of the input vector.
No computation: It simply passes data to the next layer.
Example:
For image recognition, the input layer might have 784 neurons (for 28x28 pixel grayscale
images).
2. Hidden Layer(s)
Function: Perform computations to extract patterns from the data.
There can be multiple hidden layers; networks with many hidden layers are called deep networks (hence the term deep learning).
Each neuron applies a weighted sum and passes it through an activation function like:
o ReLU (Rectified Linear Unit) – common in deep networks
o Sigmoid – used in binary classification
o Tanh – zero-centered activation
Role:
Capture complex, nonlinear relationships.
More layers and neurons = more capacity to model complex data.
3. Output Layer
Function: Produces the final result (prediction).
Structure: Number of neurons depends on the task:
o 1 neuron for binary classification
o n neurons for n-class classification (with softmax activation)
o Regression might use a linear neuron (no activation)
Activation Examples:
Softmax – for multi-class classification
Sigmoid – for binary classification
Linear – for regression problems
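To make the three output activations concrete, here is a minimal NumPy sketch (illustrative only):
```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalize to probabilities
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    # Squashes any real value into (0, 1) -- suitable for binary classification
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))      # multi-class: class probabilities summing to 1
print(sigmoid(0.35))   # binary: a single probability
print(z[0])            # regression: the raw (linear) output is used directly
```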
Additional Key Components
Weights: Numeric values associated with each connection; they determine importance.
Biases: Extra parameters that allow shifting activation functions.
Activation Functions: Introduce non-linearity so networks can model complex data.
Loss Function: Measures the difference between predicted and true values.
Optimizer: Algorithm that adjusts weights to minimize loss (e.g., SGD, Adam).
Summary of Flow
1. Input features go into the input layer.
2. Data moves through hidden layers, each applying weights, biases, and activations.
3. The output layer produces predictions.
4. During training, backpropagation adjusts weights using the loss to improve accuracy.
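This flow can be illustrated with a small NumPy forward pass (a sketch using assumed layer sizes and random weights, not a trained model):
```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: 4 input features, 3 hidden neurons, 1 output neuron
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output-layer weights and biases

x = np.array([0.5, 0.8, 0.3, 0.6])              # input layer: the feature vector

h = sigmoid(W1 @ x + b1)                        # hidden layer: weighted sum + activation
y_hat = sigmoid(W2 @ h + b2)                    # output layer: final prediction

loss = 0.5 * (y_hat - 0.5) ** 2                 # squared-error loss against a target of 0.5
print(y_hat, loss)
```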
6. Assume the neurons use the sigmoid activation function for the forward and backward
pass. The target output is 0.5 and the learning rate is 1.
Biases:
Neuron Updated Value
bh1 0.0976
bh2 -0.2983
bo 0.0359
x1=0.5 w1=0.6
x2=0.8 w2=−0.4
x3=0.3 w3=0.9
x4=0.6 w4=0.1
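The fragment above gives the inputs x1–x4, four of the weights, and the updated biases, but not the full set of hidden-to-output weights, so the complete solution cannot be reproduced here. Purely to illustrate the update rule, the sketch below runs one forward and backward pass for a single sigmoid neuron with the given inputs, weights, target 0.5, and learning rate 1 (the zero initial bias is an assumption):
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.8, 0.3, 0.6])     # inputs x1..x4 from the question
w = np.array([0.6, -0.4, 0.9, 0.1])    # weights w1..w4 from the question
b, target, lr = 0.0, 0.5, 1.0          # assumed zero bias; target and learning rate given

# Forward pass: weighted sum followed by the sigmoid activation
z = w @ x + b
out = sigmoid(z)

# Backward pass: gradient of squared error 0.5*(out - target)^2 w.r.t. each weight
delta = (out - target) * out * (1.0 - out)   # dLoss/dz via the sigmoid derivative
w -= lr * delta * x                          # gradient-descent weight update
b -= lr * delta                              # bias update

print(out, w, b)
```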