Assignment ML3

The assignment involves using the Pen-Digits dataset to implement and evaluate machine learning techniques including decision trees, bagging, and boosting. Key tasks include generating visualizations, fitting models, tuning hyperparameters, and comparing model performances based on various metrics. Additionally, the assignment emphasizes improving model accuracy through PCA and feature selection, with a focus on evaluating the impact of these enhancements.

Uploaded by

harithmsylhy3

Assignment: ML (3)

Dataset:
Use the Pen-Digits dataset (training and test sets) with the provided splits to answer the
questions below.

# Decision tree
1- Generate a scatterplot matrix to show the relationships between the variables and a
heatmap to determine correlated attributes, then write a summary of what you noticed.
2- Ensure data is in the correct format for downstream processes (e.g., remove redundant
information, convert categorical to numerical values, address missing values, etc.)
3- Fit a decision tree to the training data. Plot the tree, interpret the results, and display
accuracy and Confusion Matrix.
4- Try different ways to improve the decision tree algorithm (e.g., use different splitting
strategies, prune the tree after splitting). Does pruning the tree improve the accuracy?
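
One possible way to approach questions 3 and 4 is sketched below with scikit-learn. It is only an illustration under stated assumptions: scikit-learn's built-in digits data stands in for the Pen-Digits files (which are not part of this sheet), the 70/30 split replaces the provided splits, and cost-complexity pruning is just one of several pruning strategies. The pruning strength is chosen by cross-validation on the training split so that the test split is used only once.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): swap in the provided Pen-Digits train/test splits.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Question 3: unpruned baseline tree, its accuracy, and its confusion matrix.
base_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
base_pred = base_tree.predict(X_test)
base_acc = accuracy_score(y_test, base_pred)
base_cm = confusion_matrix(y_test, base_pred)

# Question 4: the pruning path lists each alpha at which a subtree collapses.
path = base_tree.cost_complexity_pruning_path(X_train, y_train)
alphas = np.clip(path.ccp_alphas, 0.0, None)  # guard against tiny negative values
# Keep at most 8 evenly spaced candidates so the search stays cheap.
picks = np.unique(np.linspace(0, len(alphas) - 1, num=8).astype(int))
candidates = alphas[picks]

# Choose alpha by 3-fold cross-validation on the training split only.
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=0), X_train, y_train, cv=3
    ).mean()
    for a in candidates
]
best_alpha = float(candidates[int(np.argmax(cv_scores))])

# Refit with the chosen alpha and compare against the unpruned tree.
pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0)
pruned_tree.fit(X_train, y_train)
pruned_acc = accuracy_score(y_test, pruned_tree.predict(X_test))

print(f"unpruned: {base_acc:.3f} ({base_tree.get_n_leaves()} leaves)")
print(f"pruned:   {pruned_acc:.3f} ({pruned_tree.get_n_leaves()} leaves)")
```

Whether pruning helps depends on the data, so the two printed accuracies should be read together with the leaf counts. For the plots, `sklearn.tree.plot_tree` can draw the fitted tree, and `pandas.plotting.scatter_matrix` and `seaborn.heatmap` are common choices for the scatterplot matrix and correlation heatmap in question 1.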

# Bagging
(Bagging generates a set of bootstrap datasets, fits an estimator on each bootstrap
dataset, and finally combines the estimators by majority voting (soft or hard) to reach the final decision.)
1- Apply a bagging strategy to classify the test-set samples, using SVM and Decision Tree
as base estimators. Display the accuracy and confusion matrix.
2- Apply the Random Forest algorithm (the baseline), then fine-tune this baseline. For the
number of estimators, try 5 different values within the interval [10, 200]. Plot
accuracy vs. number of estimators.
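
The two bagging questions could be sketched as follows. This is a minimal illustration, not a reference solution: scikit-learn's built-in digits data stands in for the Pen-Digits files, the 70/30 split replaces the provided splits, and the five `n_estimators` values are one arbitrary choice within [10, 200]. The Random Forest values are compared by cross-validation on the training split, which is an assumption about how the tuning should be done.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): swap in the provided Pen-Digits train/test splits.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Question 1: bag each base estimator over bootstrap samples of the training set.
# The base estimator is passed positionally because its keyword name differs
# between scikit-learn releases.
bagging_results = {}
for name, base in [("svm", SVC()), ("tree", DecisionTreeClassifier(random_state=0))]:
    bag = BaggingClassifier(base, n_estimators=10, random_state=0)
    pred = bag.fit(X_train, y_train).predict(X_test)
    bagging_results[name] = (accuracy_score(y_test, pred), confusion_matrix(y_test, pred))
    print(f"bagged {name}: accuracy = {bagging_results[name][0]:.3f}")

# Question 2: try 5 values of n_estimators in [10, 200] for Random Forest,
# scoring each with 3-fold cross-validation on the training split.
n_values = [10, 50, 100, 150, 200]
rf_cv = {
    n: cross_val_score(
        RandomForestClassifier(n_estimators=n, random_state=0), X_train, y_train, cv=3
    ).mean()
    for n in n_values
}
best_n = max(rf_cv, key=rf_cv.get)

# Refit the selected forest and evaluate it once on the test split.
rf = RandomForestClassifier(n_estimators=best_n, random_state=0).fit(X_train, y_train)
rf_pred = rf.predict(X_test)
rf_acc = accuracy_score(y_test, rf_pred)
rf_cm = confusion_matrix(y_test, rf_pred)
print(f"random forest (n_estimators={best_n}): accuracy = {rf_acc:.3f}")
```

The keys and values of `rf_cv` are the x and y coordinates for the requested accuracy vs. number of estimators plot, for example with `matplotlib.pyplot.plot`.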

# Boosting
1- Use the GradientBoosting classifier to classify the test-set samples. GradientBoosting has
two important hyperparameters: the number of estimators and the learning rate.
First, tune the number of estimators by trying 4 values in the interval [10,
200]. Then, using the tuned number of estimators, tune the learning rate
by trying 4 values within the range [0.1, 0.9]. Display the accuracy and
confusion matrix separately for the best value of each parameter (number of
estimators and learning rate).
2- Build an XGBoost classifier with the same parameters that you obtained in the previous question.
Provide the accuracy and confusion matrix.
3- Comment on Bagging and Boosting approaches.
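
The two-stage tuning described in question 1 could look roughly like this. The sketch rests on several assumptions: a 4-class subset of scikit-learn's digits data stands in for Pen-Digits purely to keep the run short, the candidate values are arbitrary picks inside the stated intervals, and a validation split carved out of the training data is used for selection. It also relies on `staged_predict`, which yields the prediction after every boosting round, so a single 200-round fit covers all smaller candidates.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): a 4-class digits subset keeps this sketch quick.
X, y = load_digits(n_class=4, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
# Hold out part of the training split for tuning so the test split stays untouched.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=0
)

# Step 1: tune the number of estimators at a fixed learning rate.
n_values = [10, 70, 130, 200]
probe = GradientBoostingClassifier(
    n_estimators=max(n_values), learning_rate=0.1, random_state=0
).fit(X_fit, y_fit)
# staged_acc[i] is the validation accuracy after i + 1 boosting rounds.
staged_acc = [accuracy_score(y_val, pred) for pred in probe.staged_predict(X_val)]
n_scores = {n: staged_acc[n - 1] for n in n_values}
best_n = max(n_scores, key=n_scores.get)

# Step 2: with the number of estimators fixed, tune the learning rate.
lr_values = [0.1, 0.3, 0.6, 0.9]
lr_scores = {}
for lr in lr_values:
    model = GradientBoostingClassifier(
        n_estimators=best_n, learning_rate=lr, random_state=0
    ).fit(X_fit, y_fit)
    lr_scores[lr] = accuracy_score(y_val, model.predict(X_val))
best_lr = max(lr_scores, key=lr_scores.get)

# Refit on the full training split and evaluate once on the test split.
final = GradientBoostingClassifier(
    n_estimators=best_n, learning_rate=best_lr, random_state=0
).fit(X_train, y_train)
test_pred = final.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)
test_cm = confusion_matrix(y_test, test_pred)
print(f"n_estimators={best_n}, learning_rate={best_lr}, accuracy={test_acc:.3f}")
```

For question 2, `xgboost.XGBClassifier` accepts `n_estimators` and `learning_rate` arguments with the same names, so `best_n` and `best_lr` can be passed to it directly, assuming the xgboost package is installed.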

# Improving with PCA and Feature Selection
1- Compare the performance of the models in terms of the following criteria: precision,
recall, accuracy, F1-score. Identify the model that performed best and worst according to
each criterion.
2- Choose the best model, then reduce its complexity and focus on the most important
features by using Principal Component Analysis (PCA) and feature selection to make it
work even better.
3- Evaluate how well the improved model does in precision, recall, accuracy, and F1-score
to see the impact of PCA and feature selection on its performance.
4- Compare the enhanced model's performance with that of its original version, and write a
comment on the differences.
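
A comparison along these lines could be set up as sketched below. Everything specific here is an assumption made for illustration: scikit-learn's digits data stands in for Pen-Digits, a Random Forest plays the role of the "best model" (substitute whichever model actually won the comparison), metrics are macro-averaged across classes, and the PCA variance threshold (95%) and `k=20` are arbitrary starting points to tune.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): swap in the provided Pen-Digits train/test splits.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_model():
    # Hypothetical "best model"; replace with the winner of the comparison.
    return RandomForestClassifier(n_estimators=100, random_state=0)


def evaluate(model):
    """Fit on the training split and return macro-averaged test metrics."""
    pred = model.fit(X_train, y_train).predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


candidates = {
    "baseline": make_model(),
    # PCA keeps enough components to explain 95% of the variance.
    "pca": Pipeline(
        [("scale", StandardScaler()), ("pca", PCA(n_components=0.95)), ("model", make_model())]
    ),
    # Drop constant features, then keep the 20 with the highest ANOVA F-score.
    "select_k_best": Pipeline(
        [
            ("drop_constant", VarianceThreshold()),
            ("select", SelectKBest(f_classif, k=20)),
            ("model", make_model()),
        ]
    ),
}

results = {name: evaluate(model) for name, model in candidates.items()}
for name, metrics in results.items():
    print(name, {key: round(float(value), 3) for key, value in metrics.items()})
```

Wrapping each transformation in a `Pipeline` ensures that scaling, PCA, and feature selection are fitted on the training split only, which keeps the comparison with the baseline fair. Neither variant is guaranteed to beat the baseline, so the printed metrics are what the written comment in question 4 should be based on.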
