Unit-V
Ensemble Learning:
Ensemble learning is a technique in machine learning where multiple models are
combined to solve a particular problem. The primary idea is to aggregate
predictions from several models to achieve better performance than any single
model alone.
➢ Improves Accuracy: combining the predictions of several models usually gives better accuracy than any individual model.
➢ Reduces Overfitting: aggregating diverse models averages out their individual errors, which lowers variance and the risk of overfitting.
1. Bagging (Bootstrap Aggregating)
• In bagging, multiple models are trained independently, each on a random bootstrap sample (drawn with replacement) from the training data.
• The results of these models are then aggregated to form a final prediction, as in the sketch below.
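As an illustration, here is a minimal bagging sketch, assuming scikit-learn and a synthetic dataset generated with make_classification (both are assumptions made for demonstration, not part of the notes):

```python
# Minimal bagging sketch (illustrative only): each base tree is trained on a
# bootstrap sample and the trees' predictions are aggregated by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(n_estimators=50, random_state=0)  # base model: decision tree
bagging.fit(X_train, y_train)
print("Bagging test accuracy:", bagging.score(X_test, y_test))
```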
2. Boosting
• Boosting is a sequential ensemble method where each new model is trained
to correct the errors made by the previous models.
• Models are added one by one, and each new model focuses on the hardest-
to-predict instances by adjusting their weights.
• In this way, the subsequent model puts more focus on the samples misclassified by the previous model (see the sketch below).
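A comparable boosting sketch, again assuming scikit-learn (AdaBoost is used here as one common boosting algorithm; the dataset is synthetic):

```python
# Minimal boosting sketch: weak learners are added sequentially, and samples
# misclassified by earlier learners receive higher weights.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boosting = AdaBoostClassifier(n_estimators=100, random_state=0)
boosting.fit(X_train, y_train)
print("AdaBoost test accuracy:", boosting.score(X_test, y_test))
```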
Random Forest:
Random Forest is an ensemble learning method that builds many decision trees and combines their outputs.
1. Ensemble Learning:
➢ A Random Forest is an ensemble of decision trees whose individual predictions are combined into a single result.
2. Bootstrap Sampling:
➢ Each tree is trained on a different portion of the dataset, and the final prediction is made by aggregating (e.g., majority voting for classification or averaging for regression).
3. Feature Randomness:
➢ At each split, every tree considers only a random subset of the features, which makes the trees less correlated with one another.
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the category that wins the majority votes.
Example: Suppose there is a dataset that contains multiple fruit images and it is given to the Random Forest classifier. The dataset is divided into subsets, and each subset is given to one decision tree. During the training phase, each decision tree produces its own prediction result, and when a new data point arrives, the Random Forest classifier predicts the final class based on the majority of the trees' results.
Example:
Record   Age   Cholesterol   Blood Pressure   Heart Disease (Target)
1        45    230           140              1
2        50    210           130              0
3        55    250           150              1
4        60    240           145              0
5        48    235           138              1
6        53    225           135              0
Bootstrap Sample 1
Record   Age   Cholesterol   Blood Pressure   Target
2        50    210           130              0
3        55    250           150              1
5        48    235           138              1
1        45    230           140              1
Bootstrap Sample 2
Record Age Cholesterol Blood Pressure Target
4 60 240 145 0
2 50 210 130 0
6 53 225 135 0
3 55 250 150 1
• For each split, we randomly choose two features from the three available (Age, Cholesterol, Blood Pressure), as in the sketch below.
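The whole procedure can be sketched with scikit-learn's RandomForestClassifier on the six records of the table above; note that six records are far too few for a real model, and the new patient's values below are hypothetical:

```python
# Random Forest sketch on the small heart-disease table (illustrative only).
from sklearn.ensemble import RandomForestClassifier

# Columns: Age, Cholesterol, Blood Pressure
X = [[45, 230, 140], [50, 210, 130], [55, 250, 150],
     [60, 240, 145], [48, 235, 138], [53, 225, 135]]
y = [1, 0, 1, 0, 1, 0]  # Heart Disease (Target)

# max_features=2: each split considers only two of the three features
# (feature randomness); each tree is grown on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=10, max_features=2, random_state=0)
forest.fit(X, y)

# Majority vote of the 10 trees for a new (hypothetical) patient.
print(forest.predict([[52, 228, 136]]))
```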
Fixed Rule Fusion Techniques:
Fixed (untrained) rules combine the outputs of several classifiers directly, without learning any extra parameters.
Sum Rule:
Final score = Σᵢ wᵢ yᵢ
Here, yᵢ is the output (e.g., probability score) from the i-th classifier, and wᵢ is its weight.
The final prediction is based on the sum of the weighted outputs. The weights wᵢ can be equal or manually assigned based on the importance of each classifier.
Product Rule:
This rule multiplies the outputs of all classifiers. It is sensitive to low confidence
scores (i.e., if one classifier gives a near-zero score, it heavily affects the product).
Max Rule:
The final decision is based on the maximum output among the classifiers. It is
useful when one of the classifiers is highly confident about a particular prediction.
Min Rule:
The final decision is based on the minimum output among the classifiers. It is
rarely used because it heavily depends on the lowest score.
Majority Voting:
Each classifier gives its class prediction. The class with the majority votes is
chosen as the final prediction.
This method works well when all classifiers have equal weights and similar
accuracy.
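The fixed rules can be illustrated with a short NumPy sketch; the three probability scores below are made-up values used only to show how each rule combines them:

```python
# Fixed (untrained) fusion rules applied to the outputs of three classifiers.
import numpy as np

scores = np.array([0.80, 0.65, 0.90])   # positive-class scores from 3 classifiers
weights = np.array([1/3, 1/3, 1/3])     # equal weights

sum_rule = np.sum(weights * scores)     # weighted sum of the outputs
product_rule = np.prod(scores)          # sensitive to any near-zero score
max_rule = np.max(scores)               # most confident classifier decides
min_rule = np.min(scores)               # least confident classifier decides

# Majority voting on hard labels (each score thresholded at 0.5).
votes = (scores >= 0.5).astype(int)
majority_vote = np.bincount(votes).argmax()

print(sum_rule, product_rule, max_rule, min_rule, majority_vote)
```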
Trained Rule Fusion Techniques:
Trained rule fusion techniques, also known as learned fusion techniques, involve training an additional model (a meta-classifier) to learn the optimal combination of predictions from the base classifiers. This approach adapts to the training data.
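One common learned-fusion approach is stacking; the sketch below assumes scikit-learn's StackingClassifier with a random forest and an SVM as base classifiers and logistic regression as the meta-classifier (these choices, and the synthetic data, are illustrative):

```python
# Trained (learned) fusion: a meta-classifier learns how to combine base models.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
               ("svm", SVC(probability=True, random_state=0))]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking test accuracy:", stack.score(X_test, y_test))
```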
K-Means Clustering:
K-Means is an unsupervised clustering algorithm that partitions the data into K clusters. Its main steps are listed below, followed by a short code sketch.
1. Choose the Number of Clusters K
2. Initialize Centroids
• Randomly select K data points to serve as the initial centroids, the center points of the clusters.
3. Assign Data Points to Clusters
• For each data point, calculate the distance (usually Euclidean) to each centroid.
• Assign the data point to the cluster whose centroid is closest to it.
4. Update Centroids
• For each cluster, calculate the mean of all data points assigned to it and move the centroid to that mean.
5. Repeat
• Repeat the assignment and update steps until the centroids no longer change (convergence) or a maximum number of iterations is reached.
6. Result
• The algorithm outputs K clusters, each with its centroid and assigned data points.
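A minimal NumPy sketch of these steps (the kmeans function below is written for illustration and is not a library routine; it assumes no cluster becomes empty):

```python
# From-scratch K-means sketch: initialise, assign, update, repeat until stable.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return labels, centroids

# Example call on the eight points used in the worked example below
# (the resulting labels depend on the random initialisation).
pts = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
labels, centers = kmeans(pts, k=3)
```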
Advantages:
2. It is also efficient: the time taken by K-means grows roughly linearly with the number of data points.
3. Despite its simplicity, few clustering algorithms consistently perform better than K-means in general.
Disadvantages:
3. It is not suitable for discovering clusters that are not hyper-ellipsoids or hyper-spheres.
Example: Use K-means to group the following data points into three clusters (K = 3).
• A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9).
• The distance function is Euclidean distance.
• Suppose initially we assign A1, B1, and C1 as the centers of the three clusters, respectively.
After assigning each point to its nearest initial center (A1, B1 or C1), the cluster means give the updated centroids below, and the distances are recomputed:
Current Centroids A1: (2, 10) B1: (6, 6) C1: (1.5, 3.5)
Data Points   Distance to (2,10)   Distance to (6,6)   Distance to (1.5,3.5)   Cluster   New Cluster
A1 (2,10) 0 5.66 6.52 1 1
A2 (2,5) 5 4.12 1.58 3 3
A3 (8,4) 8.49 2.83 6.52 2 2
B1 (5,8) 3.61 2.24 5.7 2 2
B2 (7,5) 7.07 1.41 5.7 2 2
B3 (6,4) 7.21 2 4.53 2 2
C1 (1,2) 8.06 6.4 1.58 3 3
C2 (4,9) 2.24 3.61 6.04 2 1
Current Centroids A1: (3, 9.5) B1: (6.5, 5.25) C1: (1.5, 3.5)
Data Points   Distance to (3,9.5)   Distance to (6.5,5.25)   Distance to (1.5,3.5)   Cluster   New Cluster
A1 (2,10) 1.12 6.54 6.52 1 1
A2 (2,5) 4.61 4.51 1.58 3 3
A3 (8,4) 7.43 1.95 6.52 2 2
B1 (5,8) 2.5 3.13 5.7 2 1
B2 (7,5) 6.02 0.56 5.7 2 2
B3 (6,4) 6.26 1.35 4.53 2 2
C1 (1,2) 7.76 6.39 1.58 3 3
C2 (4,9) 1.12 4.51 6.04 1 1
Current Centroids A1: (3.67, 9) B1: (7, 4.33) C1: (1.5, 3.5)
Data Points   Distance to (3.67,9)   Distance to (7,4.33)   Distance to (1.5,3.5)   Cluster   New Cluster
A1 (2,10) 1.94 7.56 6.52 1 1
A2 (2,5) 4.33 5.04 1.58 3 3
A3 (8,4) 6.62 1.05 6.52 2 2
B1 (5,8) 1.67 4.18 5.7 1 1
B2 (7,5) 5.21 0.67 5.7 2 2
B3 (6,4) 5.52 1.05 4.53 2 2
C1 (1,2) 7.49 6.44 1.58 3 3
C2 (4,9) 0.33 5.55 6.04 1 1
Since no data point changes its cluster in this iteration, the algorithm has converged: Cluster 1 = {A1, B1, C2}, Cluster 2 = {A3, B2, B3}, Cluster 3 = {A2, C1}, with centroids (3.67, 9), (7, 4.33) and (1.5, 3.5).
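For comparison, the same result can be reproduced with scikit-learn's KMeans by fixing the initial centroids to A1, B1 and C1 as in the example (a sketch; the final labels and centroids should match the table above):

```python
# Re-running the worked example with scikit-learn's KMeans.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]])   # A1, A2, A3, B1, B2, B3, C1, C2
init_centers = np.array([[2, 10], [5, 8], [1, 2]])    # initial centroids A1, B1, C1

km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(points)
print(km.labels_)           # cluster index of each point
print(km.cluster_centers_)  # expected ≈ (3.67, 9), (7, 4.33), (1.5, 3.5)
```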
Hierarchical Clustering:
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and
this tree-shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two methods differ in how they work; in particular, hierarchical clustering has no requirement to predetermine the number of clusters as we did in the K-Means algorithm.
We have seen that K-means clustering has some challenges: it needs a predetermined number of clusters, and it always tries to create clusters of roughly the same size.
To address these two challenges, we can opt for the hierarchical clustering algorithm, because in this algorithm we do not need prior knowledge of the number of clusters.
Agglomerative Hierarchical Clustering:
• Agglomerative clustering is a bottom-up approach in which clusters are built by repeatedly merging the closest pairs of clusters.
Process:
1. Start: Each data point is its own cluster (initially, we have n clusters for n
data points).
2. Merge: Find the pair of clusters that are closest to each other and merge
them.
3. Repeat: Continue merging the closest clusters until only one cluster remains or the desired number of clusters is reached (see the sketch below).
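As referenced in the last step, a short sketch with scikit-learn's AgglomerativeClustering (the six 2-D points are made up purely for illustration):

```python
# Agglomerative (bottom-up) clustering sketch: start from single points and merge.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# linkage can be "ward", "complete", "average" or "single".
agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
print(agg.labels_)  # each point's cluster label after merging stops at 2 clusters
```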
Divisive Hierarchical Clustering:
• Divisive clustering is a top-down approach that starts with all data points in a single cluster.
• The algorithm then iteratively splits the clusters into smaller clusters until each data point is its own cluster or a stopping criterion is met.
Process:
1. Start: All data points belong to one single cluster.
2. Split: Divide a cluster into two smaller clusters.
3. Repeat: Continue splitting the clusters until every data point is its own cluster or a predefined number of clusters is reached.
The dendrogram in divisive clustering starts with a single cluster at the top
and splits downward until each data point is separated.
Linkage methods define the distance between two clusters when deciding which pair to merge:
Single Linkage (Minimum Distance):
1. Find the Minimum Distance: The distance between two clusters is the smallest distance between any pair of points from the two clusters.
2. Form Clusters: At each step, merge the two clusters with the smallest such distance.
Complete Linkage (Maximum Distance):
1. Find the Maximum Distance: The distance between two clusters is the largest distance between any pair of points from the two clusters.
2. Form Clusters: Use this maximum distance between clusters when deciding which clusters to merge (again merging the closest pair).
Example (Single Linkage): Cluster the one-dimensional data points 18, 22, 25, 27, 42, 43 using the distance matrix below.
18 22 25 27 42 43
18 0 4 7 9 24 25
22 4 0 3 5 20 21
25 7 3 0 2 17 18
27 9 5 2 0 15 16
42 24 20 17 15 0 1
43 25 21 18 16 1 0
Find the minimum distance between data points: the smallest distance is 1, between 42 and 43, so these two points are merged first.
Cluster 1: (42,43)
18 22 25 27 42,43
18 0 4 7 9 24
22 4 0 3 5 20
25 7 3 0 2 17
27 9 5 2 0 15
42,43 24 20 17 15 0
The minimum distance is now 2, between 25 and 27, so they are merged.
Cluster 2: ((42,43),(25,27))
18 22 25,27 42,43
18 0 4 7 24
22 4 0 3 20
25,27 7 3 0 15
42,43 24 20 15 0
Cluster 3: ((42,43),((25,27),22))
18 22,25,27 42,43
18 0 4 24
22,25,27 4 0 15
42,43 24 15 0
18,22,25,27 42,43
18,22,25,27 0 15
42,43 15 0
Cluster 4: ((42,43),(((25,27),22),18))
Finally, the two remaining clusters (((25,27),22),18) and (42,43) merge at distance 15, completing the hierarchy.
Final Dendrogram:
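Since the dendrogram figure itself is not reproduced here, the same single-linkage hierarchy can be drawn with SciPy (a sketch; matplotlib is assumed for the plot):

```python
# Rebuild the single-linkage (minimum-distance) dendrogram for 18, 22, 25, 27, 42, 43.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

points = np.array([[18], [22], [25], [27], [42], [43]])
Z = linkage(points, method="single", metric="euclidean")
print(Z)  # each row: the two clusters merged, their distance, and the new cluster size

dendrogram(Z, labels=[18, 22, 25, 27, 42, 43])
plt.show()
```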
Given a one-dimensional data set {1, 5, 8, 10, 2}, use the agglomerative clustering algorithm with complete link and Euclidean distance to establish a hierarchical grouping relationship.
Euclidean Distance = √((x₂ - x₁)² + (y₂ - y₁)²); for one-dimensional data this reduces to √((x₂ - x₁)²) = |x₂ - x₁|.
Sno (DataPoint)   1 (1)   2 (5)   3 (8)   4 (10)   5 (2)
1 (1)   0   4   7   9   1
2 (5)   4   0   3   5   3
3 (8)   7   3   0   2   6
4 (10)  9   5   2   0   8
5 (2)   1   3   6   8   0
The minimum distance is 1, between the data points with values 1 and 2 (points 1 and 5: first row, fifth column, or fifth row, first column), so points 1 and 5 are merged into the cluster {1,5}.
1,5 2 3 4
1,5 0 4 7 9
2 4 0 3 5
3 7 3 0 2
4 9 5 2 0
➢ From the above distance matrix, we can see the distance between points 3
and 4 is smallest.
Thus, we can update the distance matrix, where row 2 corresponds to point 2,
rows 1 and 3 correspond to clusters {1,5} and {3,4}, as follows:
1,5 2 3,4
1,5 0 4 9
2 4 0 5
3,4 9 5 0
Following the same procedure, we merge point 2 with the cluster {1, 5} to form {1,
2, 5} and update the distance matrix as follows:
{1,2,5} {3,4}
{1,2,5} 0 9
{3,4} 9 0
Finally, the clusters {1,2,5} and {3,4} merge at a complete-link distance of 9, so all five points form a single cluster at the top of the dendrogram.
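The complete-link hierarchy for {1, 5, 8, 10, 2} can be checked in the same way with SciPy (a sketch, assuming matplotlib for the plot):

```python
# Complete-linkage (maximum-distance) dendrogram for the 1-D data set {1, 5, 8, 10, 2}.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

points = np.array([[1], [5], [8], [10], [2]])
Z = linkage(points, method="complete", metric="euclidean")
print(Z)  # merges expected at distances 1, 2, 4 and 9, matching the worked example

dendrogram(Z, labels=[1, 5, 8, 10, 2])
plt.show()
```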