IML_Module_Answer
Manhattan Distance: The Manhattan distance between two points (x1, y1)
and (x2, y2) is calculated as |x1 - x2| + |y1 - y2|.
Calculations:
o Distance((3,2), (1,2)) = |3-1| + |2-2| = 2
o Distance((3,2), (2,3)) = |3-2| + |2-3| = 2
o Distance((3,2), (3,5)) = |3-3| + |2-5| = 3
o Distance((3,2), (4,4)) = |3-4| + |2-4| = 3
o Distance((3,2), (5,3)) = |3-5| + |2-3| = 3
K=2 Nearest Neighbors: The two nearest neighbors to (3,2) are (1,2) and
(2,3), both with distance 2.
Classification: Both nearest neighbors belong to class 'A'. Therefore, the
point (3, 2) is classified as A.
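A minimal Python sketch of this nearest-neighbour step (the 'A' labels on (1,2) and (2,3) come from the answer above; the 'B' labels on the remaining three points are assumed purely for illustration):

```python
# k-NN with Manhattan distance, k = 2, for the query point (3, 2).
# Labels: (1,2) and (2,3) are class 'A' per the worked answer; the 'B' labels
# on the other three points are placeholders for illustration.
train = [((1, 2), 'A'), ((2, 3), 'A'), ((3, 5), 'B'), ((4, 4), 'B'), ((5, 3), 'B')]
query = (3, 2)

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Take the two training points closest to the query.
neighbors = sorted(train, key=lambda item: manhattan(item[0], query))[:2]
labels = [label for _, label in neighbors]

# Majority vote among the k nearest labels.
prediction = max(set(labels), key=labels.count)
print(neighbors)    # [((1, 2), 'A'), ((2, 3), 'A')], both at distance 2
print(prediction)   # 'A'
```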
Entropy (Parent): First calculate the overall entropy of the "Play Tennis"
target variable. Count the number of "Yes" and "No" values, then use the
formula: -p(yes) * log2(p(yes)) - p(no) * log2(p(no))
Entropy (Children): Calculate the entropy for each "Outlook" value (Sunny,
Overcast, Rainy).
o For each Outlook value, calculate the probability of "Yes" and "No",
then compute the entropy for that Outlook value
Weighted Average of Child Entropy: Then, calculate the weighted average
of the child entropies, based on how many samples each Outlook contains.
Information Gain: Finally, calculate the Information Gain as
Entropy(Parent) - Weighted average of child entropy
Calculation:
o Total Play Tennis: 9 yes, 5 no, 14 total
o Parent Entropy = -(9/14)*log2(9/14) - (5/14)*log2(5/14) = 0.940
o Sunny: 2 yes, 3 no, 5 total, entropy = 0.97
o Overcast: 4 yes, 0 no, 4 total, entropy= 0
o Rainy: 3 yes, 2 no, 5 total, entropy = 0.97
o Weighted Entropy = (5/14)*0.97 + (4/14)*0+ (5/14)*0.97=0.692
o Information Gain = 0.940-0.692 = 0.248
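A short Python sketch of the same information-gain calculation (exact arithmetic gives a gain of about 0.247; the 0.248 above comes from rounding the child entropies to 0.97 before averaging):

```python
import math

def entropy(yes, no):
    """Binary entropy of a yes/no split; zero counts contribute nothing."""
    total = yes + no
    return -sum((c / total) * math.log2(c / total) for c in (yes, no) if c)

# Play Tennis counts from the worked answer: 9 yes / 5 no overall.
parent = entropy(9, 5)                              # ~0.940
children = {'Sunny': (2, 3), 'Overcast': (4, 0), 'Rainy': (3, 2)}

# Weighted average of child entropies, weighted by subset size (out of 14).
weighted = sum((y + n) / 14 * entropy(y, n) for y, n in children.values())
gain = parent - weighted                            # ~0.247
print(round(parent, 3), round(weighted, 3), round(gain, 3))
```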
Overfitting in Decision Trees: A single decision tree can overfit the training
data, capturing noise and generalizing poorly to unseen data. A complex tree
can memorize the training examples instead of learning general patterns.
Random Forest Solution:
o Random Subsampling of Training Data: Each tree is trained on a
random subset of the training data, so different trees learn from
different examples, which reduces the chance of any one tree
memorizing noise in the training data.
o Random Subset of Features: Each split in each tree considers only a
random subset of the features, which adds further variation.
o Ensemble Averaging: The predictions of multiple trees are then
averaged for regression, or a majority vote is taken for classification,
leading to a more robust prediction.
o Generalization: This combined approach makes a random forest
generalize better than a single decision tree and leaves it less prone to
overfitting.
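As a rough sketch, assuming scikit-learn is available, these ideas map onto a few constructor arguments; the dataset and parameter values below are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)     # single deep tree, prone to overfitting
forest = RandomForestClassifier(
    n_estimators=100,       # many trees, each trained on a bootstrap sample of the data
    max_features='sqrt',    # each split considers only a random subset of features
    random_state=0,
)

# Cross-validated accuracy; the forest typically scores higher and more stably.
print(cross_val_score(tree, X, y, cv=5).mean())
print(cross_val_score(forest, X, y, cv=5).mean())
```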
9. K-Means Algorithm:
Initial Centroids (Random): Assume the initial centroids are (2,3) and
(6,6) (any two data points from the dataset can be chosen).
Assignment Step (Euclidean distance):
o Distances to Centroid 1 (2,3):
(2,3): 0
(3,3): 1
(6,6): sqrt(25)
(8,8): sqrt(61)
(5,8): sqrt(34)
(1,2): sqrt(2)
o Distances to Centroid 2 (6,6):
(2,3): sqrt(25)
(3,3): sqrt(18)
(6,6): 0
(8,8): sqrt(8)
(5,8): sqrt(5)
(1,2): sqrt(41)
o Cluster Assignment:
Cluster 1: (2,3) , (3,3), (1,2)
Cluster 2: (6,6), (8,8), (5,8)
Update Centroids:
o New Centroid 1: ((2+3+1)/3, (3+3+2)/3) = (2, 2.67)
o New Centroid 2: ((6+8+5)/3, (6+8+8)/3) = (6.33, 7.33)
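A minimal Python sketch of this single K-Means iteration, using the same six points and initial centroids:

```python
import math

points = [(2, 3), (3, 3), (6, 6), (8, 8), (5, 8), (1, 2)]
centroids = [(2, 3), (6, 6)]   # initial centroids chosen above

# Assignment step: each point joins the cluster of its nearest centroid.
clusters = {0: [], 1: []}
for p in points:
    nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
    clusters[nearest].append(p)

# Update step: each centroid moves to the mean of its assigned points.
new_centroids = [
    (sum(px for px, _ in pts) / len(pts), sum(py for _, py in pts) / len(pts))
    for pts in clusters.values()
]
print(clusters)        # {0: [(2,3), (3,3), (1,2)], 1: [(6,6), (8,8), (5,8)]}
print(new_centroids)   # approximately [(2.0, 2.67), (6.33, 7.33)]
```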
Structure:
o Inputs: Multiple inputs (x1, x2, ... ,xn), each associated with a weight
(w1, w2, ... wn)
o Weighted Sum: Each input is multiplied by its corresponding weight,
and all results are summed up.
o Bias: A bias term (b) is added to the weighted sum.
o Activation Function: The sum (with bias) is passed through an
activation function (e.g., sigmoid, ReLU), which determines the
neuron's output.
Similarity to Biological Neuron:
o Dendrites: Inputs are analogous to dendrites that receive signals.
o Synapses: Weights are analogous to the strength of connections
between neurons (synapses).
o Cell Body: The weighted sum with bias corresponds to the cell body
accumulating input signals.
o Axon: The activation function represents the neuron's firing behavior
(axon transmitting signals).
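A minimal Python sketch of a single artificial neuron with this structure (the specific inputs, weights, bias, and the sigmoid activation are illustrative choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    # Weighted sum of inputs plus bias, passed through the activation function.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Example: three inputs with illustrative weights and bias.
print(neuron([0.5, 1.0, -1.5], [0.8, -0.2, 0.1], bias=0.3))
```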
Calculate Mean of Actual y (ȳ): ȳ = (2.5 + 3.6 + 4.8 + 6.1 + 7.1) / 5 = 4.82
Calculate Total Sum of Squares (TSS):
o TSS = Σ(yi - ȳ)^2 = (2.5-4.82)^2 + (3.6-4.82)^2 + (4.8-4.82)^2 + (6.1-
4.82)^2 + (7.1-4.82)^2
o TSS = 5.3824+1.4884+0.0004+1.6384+5.1984 = 13.708
Calculate Residual Sum of Squares (RSS):
o RSS = Σ(yi - ŷi)^2 = (2.5-2.8)^2 + (3.6-3.4)^2 + (4.8-4.6)^2 + (6.1-5.9)^2 +
(7.1-7.1)^2 = 0.09+0.04+0.04+0.04+0 = 0.21
Calculate R²: R² = 1 - (RSS / TSS) = 1 - (0.21 / 13.708) ≈ 0.985
o An R² of 0.985 indicates a very good fit: the model explains 98.5% of the variance in y.
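A short Python sketch of the R² computation, using the actual and predicted values from the answer above:

```python
y     = [2.5, 3.6, 4.8, 6.1, 7.1]   # actual values
y_hat = [2.8, 3.4, 4.6, 5.9, 7.1]   # predicted values

y_mean = sum(y) / len(y)                                   # 4.82
tss = sum((yi - y_mean) ** 2 for yi in y)                  # 13.708
rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))    # 0.21
r_squared = 1 - rss / tss                                  # ~0.985
print(round(r_squared, 3))
```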
25. Linear Regression Model for Study Hours and Test Scores:
Data:
o x (Study Hours): 1, 2, 3, 4, 5
o y (Test Score): 20, 30, 40, 50, 60
Calculate means: Compute the mean of x (x̄) and the mean of y (ȳ):
x̄ = (1+2+3+4+5)/5 = 3, ȳ = (20+30+40+50+60)/5 = 40
Calculate slope (b): b = Σ[(xi - x̄)(yi - ȳ)] / Σ[(xi - x̄)^2]
o Σ[(xi - x̄)(yi - ȳ)] = (-2)(-20) + (-1)(-10) + (0)(0) + (1)(10) + (2)(20) = 40+10+0+10+40 = 100
o Σ[(xi - x̄)^2] = (-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2 = 4+1+0+1+4 = 10
o b = 100/10 = 10
Calculate y-intercept (a): a = ȳ - b * x̄ = 40 - (10 * 3) = 10
Regression Equation: y = 10 + 10x
Prediction: For 4.5 study hours, y = 10 + 10 * 4.5 = 55, so the predicted
test score for a student who studied for 4.5 hours is 55.
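A short Python sketch of the least-squares fit above and the prediction for 4.5 study hours:

```python
x = [1, 2, 3, 4, 5]            # study hours
y = [20, 30, 40, 50, 60]       # test scores

x_mean = sum(x) / len(x)       # 3
y_mean = sum(y) / len(y)       # 40

num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))   # 100
den = sum((xi - x_mean) ** 2 for xi in x)                          # 10
b = num / den                  # slope = 10
a = y_mean - b * x_mean        # intercept = 10

print(a + b * 4.5)             # predicted score for 4.5 hours: 55.0
```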