AL3451 IA 2 Answer Key
PART – A
1 Ensemble learning is a machine learning technique where multiple models (such as
decision trees, SVMs, or neural networks) are trained and their predictions combined to
solve the same task. The goal is to achieve better accuracy and robustness than
individual models by leveraging their collective strengths.
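Illustrative sketch (scikit-learn; the synthetic dataset and choice of base models are assumptions made for the example):
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for any real classification task
X, y = make_classification(n_samples=500, random_state=42)

# Three different learners are trained; majority voting combines their predictions
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
    ("tree", DecisionTreeClassifier()),
], voting="hard")
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))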
2 Boosting aims to improve the performance of weak learners by training them
sequentially. Each new model is focused on correcting the errors made by the previous
models. This results in a strong ensemble where the combined output has higher
accuracy and reduced bias.
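A minimal boosting sketch with AdaBoost (scikit-learn; the dataset and settings are assumed for illustration):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each shallow tree (weak learner) is fitted with higher weights on the samples
# that the previous trees misclassified
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump; base_estimator in scikit-learn < 1.2
    n_estimators=100,
)
boost.fit(X, y)
print("Training accuracy:", boost.score(X, y))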
3 1. Improved Accuracy: Ensemble methods often outperform individual models
by reducing variance and bias.
2. Better Generalization: They are less likely to overfit the training data,
improving performance on unseen data.
4 Stacking combines the outputs of multiple base models using a meta-model, which
learns how to best integrate the predictions. This layered approach captures diverse
patterns in the data and compensates for individual model weaknesses, thus enhancing
overall predictive accuracy.
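A minimal stacking sketch (scikit-learn; the base models and meta-model are illustrative assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=1)

# Base models generate predictions; a logistic-regression meta-model learns how to combine them
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("svm", SVC())],
    final_estimator=LogisticRegression(),
    cv=5,  # base-model predictions for the meta-model are produced via cross-validation
)
stack.fit(X, y)
print(stack.predict(X[:5]))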
5 Gradient descent is an optimization algorithm used to minimize the loss function in
neural networks. It works by computing the gradient of the loss with respect to the
weights and updating the weights iteratively in the opposite direction of the gradient to
reach a minimum loss.
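A tiny worked sketch of the update rule w ← w − η·∇L(w), using an assumed one-dimensional loss:
# Minimize L(w) = (w - 3)^2; its gradient is dL/dw = 2(w - 3)
w = 0.0     # initial weight
eta = 0.1   # learning rate
for step in range(50):
    grad = 2 * (w - 3)   # gradient of the loss at the current weight
    w = w - eta * grad   # move in the opposite direction of the gradient
print(w)  # approaches the minimizer w = 3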
6 The vanishing gradient problem occurs when the gradients used during
backpropagation become very small, especially in deep networks. This leads to
minimal updates in the weights of the earlier layers, causing the network to learn
slowly or not at all in those layers.
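A small numerical illustration (assuming sigmoid activations) of why gradients shrink with depth:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The sigmoid derivative is at most 0.25, so backpropagation through many
# sigmoid layers multiplies many factors smaller than 1
z = 0.5
deriv = sigmoid(z) * (1 - sigmoid(z))   # ≈ 0.235
for depth in (5, 20, 50):
    print(depth, deriv ** depth)        # the gradient factor shrinks toward zero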
7 L1 regularization adds the sum of the absolute values of the weights to the loss
function, encouraging sparsity by driving some weights to zero.
L2 regularization adds the sum of the squares of the weights, penalizing large weights
and promoting smoother models with better generalization.
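A small sketch contrasting the two penalties with Lasso (L1) and Ridge (L2) regression (synthetic data assumed):
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # only the first feature is relevant

# L1 (Lasso) drives irrelevant coefficients to exactly zero; L2 (Ridge) only shrinks them
print("L1 coefficients:", Lasso(alpha=0.1).fit(X, y).coef_.round(2))
print("L2 coefficients:", Ridge(alpha=1.0).fit(X, y).coef_.round(2))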
8 Bootstrapping is a resampling technique where multiple datasets are generated by
sampling with replacement from the original dataset. This allows estimation of model
accuracy and variability, and is commonly used in ensemble methods like bagging.
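A minimal sketch of bootstrap resampling with NumPy (the sample values are assumed for illustration):
import numpy as np

rng = np.random.default_rng(42)
data = np.array([4.2, 5.1, 6.3, 5.8, 4.9])   # small original sample

# Draw many resamples with replacement and record the statistic of interest
boot_means = [rng.choice(data, size=len(data), replace=True).mean() for _ in range(1000)]
print("Bootstrap estimate of the mean:", np.mean(boot_means))
print("Bootstrap standard error:", np.std(boot_means))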
9 Resampling techniques involve drawing repeated samples from a dataset to evaluate
model performance or stability. Examples include k-fold cross-validation and
bootstrapping. These methods are useful when limited data is available for training and
testing.
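A minimal k-fold cross-validation sketch (scikit-learn; dataset and model are assumed):
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# 5-fold cross-validation: every sample is used for testing exactly once
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())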
10 Statistical significance testing helps determine if the observed performance difference
between two classifiers is due to a true difference or just random chance. It ensures that
the comparison is meaningful and not influenced by sample variability.
PART – B
11A Ensemble learning Definition:
a) Ensemble learning is a machine learning approach where multiple individual models
(learners) are trained and combined to solve the same problem. It aims to improve
prediction accuracy, reduce overfitting, and provide better generalization.
Key Points:
Benefits:
Key Features:
Steps:
Working:
Algorithm | Description
Gradient Boosting | Uses gradient descent to minimize error.
XGBoost | Extreme Gradient Boosting; optimized for speed and performance.
LightGBM | Faster and more efficient; handles large datasets.
CatBoost | Optimized for categorical features.
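For illustration, a gradient boosting sketch with scikit-learn (XGBoost, LightGBM, and CatBoost are used analogously; the parameters here are assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each new tree is fitted to the gradient of the loss (the remaining errors of the ensemble)
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X, y)
print("Training accuracy:", gb.score(X, y))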
12A
a) Feature | Bagging | Boosting
Training | Parallel | Sequential
Data Sampling | Random with replacement | Same data, but weights updated
Objective | Reduces variance | Reduces bias
Sensitivity | Less sensitive to outliers | More sensitive to outliers
Example | Random Forest | AdaBoost, Gradient Boosting
Aggregation | Voting or averaging | Weighted vote
Overfitting | Handles overfitting well | Risk of overfitting if not tuned
12B Definition:
a) GMM assumes data is generated from a mixture of several Gaussian distributions.
Each Gaussian is a cluster, characterized by:
Mean (μ)
Covariance (Σ)
Mixing coefficient (π)
Features:
Applications:
Image segmentation
Anomaly detection
Speech recognition
Customer segmentation
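Illustrative GMM sketch (scikit-learn; the synthetic data and number of components are assumptions):
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data with three clusters stands in for any real dataset
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Each fitted component has its own mean, covariance, and mixing coefficient
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(X)
print("Component means:", gmm.means_)
print("Responsibilities of the first point:", gmm.predict_proba(X[:1]))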
b) Scenario:
1. E-Step (Expectation):
For each data point x, compute the responsibility r_i of Gaussian component i:
r_i(x) = π_i · N(x | μ_i, σ_i) / Σ_j π_j · N(x | μ_j, σ_j)
2. M-Step (Maximization):
Update the parameters μ_i, σ_i, π_i of each component using the responsibilities.
Output: Each data point gets a probability vector like [0.7, 0.2, 0.1], indicating its
association with each Gaussian component.
Purpose:
High bias
Poor training and test accuracy
Flat learning curves
Create resampled sets (e.g., 1000 times), each with 5 values drawn with replacement.
Trade-offs:
Higher K → more training time and computation, but generally a better and more stable performance estimate.
PART – C
16A Paired t-Test to Evaluate Classifier Performance
Problem Statement:
Two classifiers (A and B) are evaluated using 10-fold cross-validation. The accuracy
scores (%) from each fold are as follows:
Fold Classifier A Classifier B
1 85 82
2 88 84
3 84 83
4 90 86
5 87 85
6 89 84
7 91 88
8 86 83
9 88 86
10 87 84
We are to test whether the difference in accuracy is statistically significant using a
paired t-test at the 95% confidence level (α = 0.05).
Step-by-Step Answer:
1. Per-fold differences d_i = A_i − B_i: 3, 4, 1, 4, 2, 5, 3, 3, 2, 3
2. Mean difference: d̄ = 30 / 10 = 3.0
3. Standard deviation of the differences: s_d = √(Σ(d_i − d̄)² / (n − 1)) = √(12 / 9) ≈ 1.155
4. Test statistic: t = d̄ / (s_d / √n) = 3.0 / (1.155 / √10) ≈ 8.22
5. Critical value for df = n − 1 = 9 at α = 0.05 (two-tailed): t_crit = 2.262
Computed t ≈ 8.22 > 2.262 ⇒ Reject H₀
✅ Conclusion:
There is a statistically significant difference between the two classifiers’
performances. Classifier A performs significantly better than Classifier B at the 95%
confidence level.
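The result can be cross-checked with SciPy, using the fold accuracies from the table above (a verification sketch, not part of the manual working):
from scipy import stats

a = [85, 88, 84, 90, 87, 89, 91, 86, 88, 87]  # Classifier A, folds 1-10
b = [82, 84, 83, 86, 85, 84, 88, 83, 86, 84]  # Classifier B, folds 1-10

# Paired t-test on the per-fold differences
t_stat, p_value = stats.ttest_rel(a, b)
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")
# p < 0.05, so the null hypothesis of equal mean accuracy is rejected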
16B Create a Bagging Ensemble using Decision Trees to Classify Customer Churn and
Discuss the Impact of Increasing the Number of Base Learners
Steps to Implement:
1. Data Preparation:
o Load churn dataset (e.g., from telecom company).
o Preprocess: One-hot encode categorical variables, normalize numerical
features.
2. Bootstrap Sampling:
o Generate k random datasets from training data with replacement.
3. Train Base Learners:
o Train k Decision Trees, each on one bootstrap sample.
4. Aggregate Predictions:
o For classification: Use majority voting to determine final class label.
Sample code (Python, scikit-learn):
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X, y: preprocessed churn feature matrix and labels from the data-preparation step
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging ensemble: 50 decision trees, each trained on a bootstrap sample of the training data
bag_model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner (named base_estimator in scikit-learn < 1.2)
    n_estimators=50,
    bootstrap=True,
    random_state=42
)
bag_model.fit(X_train, y_train)

# Aggregate the trees' predictions by majority voting
y_pred = bag_model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
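To examine the impact of the number of base learners, a small sketch (reusing X_train, X_test, y_train, y_test and the imports from the code above, which are assumed to be available):
# Test accuracy typically rises quickly for small ensembles, then levels off
for k in (1, 10, 50, 100, 200):
    model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=k, random_state=42)
    model.fit(X_train, y_train)
    print(k, accuracy_score(y_test, model.predict(X_test)))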
✅ Conclusion (1M)
Bagging with decision trees is effective for churn prediction. Increasing the number of
base learners improves performance initially, but has diminishing returns. A balance
between accuracy and computational cost must be maintained.