
DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCE

Course code: MO 3208


Course title: MACHINE LEARNING ALGORITHMS

Method of Assessment: ASSIGNMENT 2


Lecturer: Aigul B. Mimenbayeva

Astana 2025
SUBMISSION: Submit your report to the Moodle platform as a single ZIP file containing the
report, the code, and the dataset.
COLLABORATION: Make certain that you understand the course collaboration policy,
described on the course website. You must complete this assignment individually; you are not
allowed to collaborate with anyone else. You may discuss the homework to understand the
problems and the mathematics behind the various learning algorithms, but you are not allowed to
share problem solutions or your code with any other students.
DESCRIPTION
This assignment focuses on improving the performance of classification models using
various machine learning techniques. It involves building, tuning, and evaluating Decision
Tree, Random Forest, and K-Nearest Neighbors (KNN) models. You are expected to apply
preprocessing steps, including handling missing values, feature scaling, and encoding,
followed by model development and optimization using tools from the Scikit-learn library.
Key concepts such as hyperparameter tuning (using GridSearchCV or
RandomizedSearchCV), cross-validation, and ensemble learning are explored. A stacking
ensemble model is constructed to compare the performance of the combined model against
the individual learners.
EVALUATION CRITERIA:
The criteria below describe in detail how the assignment will be graded. Each section is
assessed on its technical correctness, completeness, and clarity of presentation. The
weights assigned to each component provide a balanced assessment of both the technical
implementation and the quality of the report.

Task 1 (Decision Tree Tuning, 15%): Correct implementation of a Decision Tree model with
preprocessing; hyperparameter tuning using GridSearchCV; comparison before and after
tuning using proper metrics.

Task 2 (Random Forest Modeling, 15%): Building and tuning a Random Forest classifier;
comparison with the Decision Tree model; explanation of improvements using evaluation
metrics.

Task 3 (KNN Modeling, 15%): Implementation of KNN with proper preprocessing; tuning
n_neighbors and distance metrics; use of cross-validation and comparison with other
models.

Task 4 (Stacking Ensemble, 15%): Implementation of StackingClassifier using multiple
models; use of a meta-model (e.g., Logistic Regression or SVM); comparison with
individual models.

Model Evaluation Techniques (10%): Consistent use of evaluation metrics (Accuracy,
F1-Score, Confusion Matrix, ROC-AUC) and cross-validation throughout all tasks.

Report Quality & Presentation (30%): Clear and organized report with logical structure,
code, visualizations (if any), explanations, interpretations, and defense of the methods
used.

General Instructions:
o For each task, provide the code, visualizations (if necessary), and a detailed report on the results.
o Use the Scikit-learn library to implement the models and evaluate their performance.
o Use GridSearchCV or RandomizedSearchCV for hyperparameter tuning.
o Use cross-validation methods to validate model results and avoid overfitting.
o Apply different evaluation metrics such as accuracy, F1-Score, Confusion Matrix, and ROC-AUC.
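The four metrics listed above can be computed with Scikit-learn as follows; the label and score vectors here are made up purely for illustration:

```python
# Illustration of the four required evaluation metrics on toy data.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                     # ground-truth labels
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]    # predicted probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_score]      # thresholded labels

acc = accuracy_score(y_true, y_pred)    # 0.75 (6 of 8 correct)
f1 = f1_score(y_true, y_pred)           # 0.75 (precision = recall = 3/4)
cm = confusion_matrix(y_true, y_pred)   # [[3, 1], [1, 3]]
auc = roc_auc_score(y_true, y_score)    # 0.875 (note: uses scores, not labels)

print(acc, f1, auc)
print(cm)
```

Note that ROC-AUC is computed from the predicted probabilities (or decision scores), not from the thresholded labels, so keep both around when evaluating your models.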

Task 1: Hyperparameter Tuning of Decision Tree


1. Task Description: Improve a Decision Tree model for classification. The dataset contains a mix of
categorical and numerical features.
2. Steps:
o Load the dataset and perform preprocessing (handle missing values, normalize/standardize features).
o Split the data into training and testing sets.
o Build a Decision Tree model and tune its hyperparameters using GridSearchCV:
 max_depth
 min_samples_split
 min_samples_leaf
o Compare the performance of the model before and after hyperparameter tuning.
3. Expected Outcome: Write a report comparing the model's performance before and after optimization
using accuracy, F1-Score, and the Confusion Matrix.
4. Code Reference: https://www.analyticsvidhya.com/blog/2024/03/decision-trees-split-methods-hyperparameter-tuning/
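The steps above can be sketched as follows. Since the assignment's dataset is not given here, Scikit-learn's built-in breast-cancer dataset stands in for it:

```python
# Hedged sketch of Task 1: baseline Decision Tree vs. GridSearchCV-tuned tree.
# The breast-cancer dataset is a stand-in for the assignment's own data.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline model before tuning
base = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
base_acc = accuracy_score(y_te, base.predict(X_te))

# Grid search over the three hyperparameters named in the task
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={
        "max_depth": [3, 5, 10, None],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 5],
    },
    scoring="f1",
    cv=5,
)
grid.fit(X_tr, y_tr)
tuned_acc = accuracy_score(y_te, grid.predict(X_te))

print(f"accuracy before tuning: {base_acc:.3f}  after: {tuned_acc:.3f}")
print("F1 after tuning:", f1_score(y_te, grid.predict(X_te)))
print(confusion_matrix(y_te, grid.predict(X_te)))
```

The grid values shown are illustrative; your report should justify the ranges you search over for your dataset.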

Task 2: Building a Random Forest Ensemble


1. Task Description: Using the same dataset, build a Random Forest model for classification.
2. Steps:
o Build a Random Forest model with default hyperparameters.
o Optimize the model by tuning the following hyperparameters:
 n_estimators
 max_depth
 min_samples_split
 max_features
o Compare the results of the Random Forest model with the Decision Tree model using accuracy and other
metrics.
3. Expected Outcome: Write a report discussing how the results improved after hyperparameter
optimization.
4. Code Reference: Random Forest Hyperparameter Tuning Example
https://www.analyticsvidhya.com/blog/2020/03/beginners-guide-random-forest-hyperparameter-tuning/
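A sketch of these steps, again using the breast-cancer dataset as a stand-in and RandomizedSearchCV (which the assignment allows as an alternative to GridSearchCV) over the four listed hyperparameters:

```python
# Hedged sketch of Task 2: default Random Forest vs. a randomized search
# over n_estimators, max_depth, min_samples_split, and max_features.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline: default hyperparameters
rf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
default_acc = accuracy_score(y_te, rf.predict(X_te))

# Randomized search samples n_iter combinations from the distributions below
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": [100, 200, 400],
        "max_depth": [5, 10, None],
        "min_samples_split": [2, 5, 10],
        "max_features": ["sqrt", "log2"],
    },
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_tr, y_tr)
tuned_acc = accuracy_score(y_te, search.predict(X_te))

print(f"default: {default_acc:.3f}  tuned: {tuned_acc:.3f}")
print("best params:", search.best_params_)
```

RandomizedSearchCV is usually the better choice here because the Random Forest grid is larger than the Decision Tree grid, and sampling keeps the search affordable.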

Task 3: Applying KNN for Classification


1. Task Description: Apply KNN (K-Nearest Neighbors) to the same dataset.
2. Steps:
o Perform preprocessing for KNN (e.g., feature standardization).
o Split the data into training and testing sets.
o Build the KNN model with different numbers of neighbors and use Cross-Validation to find the best
n_neighbors.
o Explore the effect of different distance metrics (e.g., Euclidean, Manhattan).
o Evaluate the model using accuracy, F1-Score, and other metrics.
3. Expected Outcome: Write a report discussing the best n_neighbors and distance metric, and compare the
results to Decision Tree and Random Forest models.
4. Code Reference: KNN Example with Cross-Validation
https://www.kaggle.com/code/parthshah98/k-fold-cross-validation-for-knn
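The KNN steps can be sketched as below; putting the scaler and the classifier in one Pipeline ensures the scaling is re-fit inside each cross-validation fold (breast-cancer data again stands in for the assignment's dataset):

```python
# Hedged sketch of Task 3: scale features, then select n_neighbors and the
# distance metric by cross-validated grid search.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),      # KNN is distance-based, so scale first
    ("knn", KNeighborsClassifier()),
])

grid = GridSearchCV(
    pipe,
    param_grid={
        "knn__n_neighbors": list(range(1, 16, 2)),          # odd k from 1 to 15
        "knn__metric": ["euclidean", "manhattan"],           # distance metrics
    },
    cv=5,
)
grid.fit(X_tr, y_tr)
acc = accuracy_score(y_te, grid.predict(X_te))

print("best params:", grid.best_params_)
print("test accuracy:", acc)
print("test F1:", f1_score(y_te, grid.predict(X_te)))
```

Odd values of k avoid ties in binary voting; your report should show how cross-validated accuracy varies with n_neighbors for each metric.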

Task 4: Ensemble Methods (Stacking)


1. Task Description: Build an ensemble model using Stacking with Decision Tree, Random Forest, and
KNN.
2. Steps:
o Build an ensemble using StackingClassifier.
o Use Logistic Regression or SVM as a meta-model for prediction.
o Compare the results of the stacked ensemble with individual models.
3. Expected Outcome: Write a report discussing how using the ensemble method improves accuracy
compared to individual models.
4. Code Reference: Stacking Ensemble Example
https://www.kaggle.com/code/anuragbantu/stacking-ensemble-learning-beginner-s-guide
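These steps can be sketched with Scikit-learn's StackingClassifier, using Logistic Regression as the meta-model (the breast-cancer dataset again stands in for the assignment's own data):

```python
# Hedged sketch of Task 4: stack Decision Tree, Random Forest, and KNN,
# with Logistic Regression as the meta-model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("forest", RandomForestClassifier(random_state=42)),
        # KNN needs scaling, so wrap it in its own pipeline
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-model
    cv=5,  # out-of-fold base-model predictions train the meta-model
)
stack.fit(X_tr, y_tr)
acc = accuracy_score(y_te, stack.predict(X_te))
print("stacked test accuracy:", acc)
```

In your report, compare this stacked accuracy against each tuned base model from Tasks 1-3; the internal cv=5 matters because the meta-model must be trained on out-of-fold predictions to avoid leaking the training labels.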
