
DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCE

Course code: MO 3208


Course title: MACHINE LEARNING ALGORITHMS

Method of Assessment: ASSIGNMENT 2


Lecturer: Aigul B. Mimenbayeva

Astana 2025
SUBMISSION: Submit your report to the Moodle platform as a single ZIP file containing the
report, the code, and the dataset.
COLLABORATION: Make certain that you understand the course collaboration policy,
described on the course website. You must complete this assignment individually; you are not
allowed to collaborate with anyone else. You may discuss the homework to understand the
problems and the mathematics behind the various learning algorithms, but you are not allowed to
share problem solutions or your code with any other students.
DESCRIPTION
This assignment focuses on improving the performance of classification models using
various machine learning techniques. It involves building, tuning, and evaluating Decision
Tree, Random Forest, and K-Nearest Neighbors (KNN) models. You are expected to apply
preprocessing steps, including handling missing values, feature scaling, and encoding,
followed by model development and optimization using tools from the Scikit-learn library.
Key concepts such as hyperparameter tuning (using GridSearchCV or
RandomizedSearchCV), cross-validation, and ensemble learning are explored. A stacking
ensemble model is constructed to compare the performance of the combined model against
the individual learners.
EVALUATION CRITERIA:
The criteria below describe in detail how the assignment will be graded. Each section is
assessed on its technical correctness, completeness, and clarity of presentation. The
weights assigned to each component provide a balanced assessment of both the technical
implementation and the quality of the report.

Task 1 (Decision Tree Tuning, 15%): Correct implementation of a Decision Tree model with
preprocessing; hyperparameter tuning using GridSearchCV; comparison before and after
tuning using proper metrics.

Task 2 (Random Forest Modeling, 15%): Building and tuning a Random Forest classifier;
comparison with the Decision Tree model; explanation of improvements using evaluation
metrics.

Task 3 (KNN Modeling, 15%): Implementation of KNN with proper preprocessing; tuning
n_neighbors and distance metrics; use of cross-validation and comparison with other
models.

Task 4 (Stacking Ensemble, 15%): Implementation of StackingClassifier using multiple
models; use of a meta-model (e.g., Logistic Regression or SVM); comparison with
individual models.

Model Evaluation Techniques (10%): Consistent use of evaluation metrics (Accuracy,
F1-Score, Confusion Matrix, ROC-AUC) and cross-validation throughout all tasks.

Report Quality & Presentation (30%): Clear and organized report with logical structure,
code, visualizations (if any), explanations, interpretations, and defense of the methods
used.

General Instructions:
o For each task, provide the code, visualizations (if necessary), and a detailed report on the results.
o Use the Scikit-learn library to implement the models and evaluate their performance.
o Use GridSearchCV or RandomizedSearchCV for hyperparameter tuning.
o Use cross-validation methods to validate model results and avoid overfitting.
o Apply different evaluation metrics such as accuracy, F1-Score, Confusion Matrix, and ROC-AUC.
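The four metrics listed above can be computed with Scikit-learn as follows; the label and score vectors here are made up purely for illustration:

```python
# Illustration of the four required evaluation metrics on toy data.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                     # ground-truth labels
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]    # predicted probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_score]      # thresholded labels

acc = accuracy_score(y_true, y_pred)    # 0.75 (6 of 8 correct)
f1 = f1_score(y_true, y_pred)           # 0.75 (precision = recall = 3/4)
cm = confusion_matrix(y_true, y_pred)   # [[3, 1], [1, 3]]
auc = roc_auc_score(y_true, y_score)    # 0.875 (note: uses scores, not labels)

print(acc, f1, auc)
print(cm)
```

Note that ROC-AUC is computed from the predicted probabilities (or decision scores), not from the thresholded labels, so keep both around when evaluating your models.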

Task 1: Hyperparameter Tuning of Decision Tree


1. Task Description: Improve a Decision Tree model for classification. The dataset contains a mix of
categorical and numerical features.
2. Steps:
o Load the dataset and perform preprocessing (handle missing values, normalize/standardize features).
o Split the data into training and testing sets.
o Build a Decision Tree model and tune its hyperparameters using GridSearchCV:
 max_depth
 min_samples_split
 min_samples_leaf
o Compare the performance of the model before and after hyperparameter tuning.
3. Expected Outcome: Write a report comparing the model's performance before and after optimization
using accuracy, F1-Score, and the Confusion Matrix.
4. Code Reference: https://www.analyticsvidhya.com/blog/2024/03/decision-trees-split-methods-hyperparameter-tuning/
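The steps above can be sketched as follows. Since the assignment's dataset is not given here, Scikit-learn's built-in breast-cancer dataset stands in for it:

```python
# Hedged sketch of Task 1: baseline Decision Tree vs. GridSearchCV-tuned tree.
# The breast-cancer dataset is a stand-in for the assignment's own data.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline model before tuning
base = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
base_acc = accuracy_score(y_te, base.predict(X_te))

# Grid search over the three hyperparameters named in the task
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={
        "max_depth": [3, 5, 10, None],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 5],
    },
    scoring="f1",
    cv=5,
)
grid.fit(X_tr, y_tr)
tuned_acc = accuracy_score(y_te, grid.predict(X_te))

print(f"accuracy before tuning: {base_acc:.3f}  after: {tuned_acc:.3f}")
print("F1 after tuning:", f1_score(y_te, grid.predict(X_te)))
print(confusion_matrix(y_te, grid.predict(X_te)))
```

The grid values shown are illustrative; your report should justify the ranges you search over for your dataset.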

Task 2: Building a Random Forest Ensemble


1. Task Description: Using the same dataset, build a Random Forest model for classification.
2. Steps:
o Build a Random Forest model with default hyperparameters.
o Optimize the model by tuning the following hyperparameters:
 n_estimators
 max_depth
 min_samples_split
 max_features
o Compare the results of the Random Forest model with the Decision Tree model using accuracy and other
metrics.
3. Expected Outcome: Write a report discussing how the results improved after hyperparameter
optimization.
4. Code Reference: Random Forest Hyperparameter Tuning Example
https://www.analyticsvidhya.com/blog/2020/03/beginners-guide-random-forest-hyperparameter-tuning/
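A sketch of these steps, again using the breast-cancer dataset as a stand-in and RandomizedSearchCV (which the assignment allows as an alternative to GridSearchCV) over the four listed hyperparameters:

```python
# Hedged sketch of Task 2: default Random Forest vs. a randomized search
# over n_estimators, max_depth, min_samples_split, and max_features.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline: default hyperparameters
rf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
default_acc = accuracy_score(y_te, rf.predict(X_te))

# Randomized search samples n_iter combinations from the distributions below
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": [100, 200, 400],
        "max_depth": [5, 10, None],
        "min_samples_split": [2, 5, 10],
        "max_features": ["sqrt", "log2"],
    },
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_tr, y_tr)
tuned_acc = accuracy_score(y_te, search.predict(X_te))

print(f"default: {default_acc:.3f}  tuned: {tuned_acc:.3f}")
print("best params:", search.best_params_)
```

RandomizedSearchCV is usually the better choice here because the Random Forest grid is larger than the Decision Tree grid, and sampling keeps the search affordable.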

Task 3: Applying KNN for Classification


1. Task Description: Apply KNN (K-Nearest Neighbors) to the same dataset.
2. Steps:
o Perform preprocessing for KNN (e.g., feature standardization).
o Split the data into training and testing sets.
o Build the KNN model with different numbers of neighbors and use Cross-Validation to find the best
n_neighbors.
o Explore the effect of different distance metrics (e.g., Euclidean, Manhattan).
o Evaluate the model using accuracy, F1-Score, and other metrics.
3. Expected Outcome: Write a report discussing the best n_neighbors and distance metric, and compare the
results to Decision Tree and Random Forest models.
4. Code Reference: KNN Example with Cross-Validation
https://www.kaggle.com/code/parthshah98/k-fold-cross-validation-for-knn
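The KNN steps can be sketched as below; putting the scaler and the classifier in one Pipeline ensures the scaling is re-fit inside each cross-validation fold (breast-cancer data again stands in for the assignment's dataset):

```python
# Hedged sketch of Task 3: scale features, then select n_neighbors and the
# distance metric by cross-validated grid search.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),      # KNN is distance-based, so scale first
    ("knn", KNeighborsClassifier()),
])

grid = GridSearchCV(
    pipe,
    param_grid={
        "knn__n_neighbors": list(range(1, 16, 2)),          # odd k from 1 to 15
        "knn__metric": ["euclidean", "manhattan"],           # distance metrics
    },
    cv=5,
)
grid.fit(X_tr, y_tr)
acc = accuracy_score(y_te, grid.predict(X_te))

print("best params:", grid.best_params_)
print("test accuracy:", acc)
print("test F1:", f1_score(y_te, grid.predict(X_te)))
```

Odd values of k avoid ties in binary voting; your report should show how cross-validated accuracy varies with n_neighbors for each metric.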

Task 4: Ensemble Methods (Stacking)


1. Task Description: Build an ensemble model using Stacking with Decision Tree, Random Forest, and
KNN.
2. Steps:
o Build an ensemble using StackingClassifier.
o Use Logistic Regression or SVM as a meta-model for prediction.
o Compare the results of the stacked ensemble with individual models.
3. Expected Outcome: Write a report discussing how using the ensemble method improves accuracy
compared to individual models.
4. Code Reference: Stacking Ensemble Example
https://www.kaggle.com/code/anuragbantu/stacking-ensemble-learning-beginner-s-guide
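These steps can be sketched with Scikit-learn's StackingClassifier, using Logistic Regression as the meta-model (the breast-cancer dataset again stands in for the assignment's own data):

```python
# Hedged sketch of Task 4: stack Decision Tree, Random Forest, and KNN,
# with Logistic Regression as the meta-model.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("forest", RandomForestClassifier(random_state=42)),
        # KNN needs scaling, so wrap it in its own pipeline
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-model
    cv=5,  # out-of-fold base-model predictions train the meta-model
)
stack.fit(X_tr, y_tr)
acc = accuracy_score(y_te, stack.predict(X_te))
print("stacked test accuracy:", acc)
```

In your report, compare this stacked accuracy against each tuned base model from Tasks 1-3; the internal cv=5 matters because the meta-model must be trained on out-of-fold predictions to avoid leaking the training labels.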
