IntroAI - 2425 HK2 - Project 2

This project involves building decision trees using real-world datasets, specifically the UCI Heart Disease dataset and the Palmer Penguins dataset, along with an additional dataset chosen by the student. Students will prepare datasets, build classifiers, evaluate performance, and analyze the impact of decision tree depth on accuracy. A comprehensive report detailing the process, findings, and visualizations is required for submission.

Uploaded by

Nguyễn Trần Thiên Phú

Project 2 Introduction to Artificial Intelligence CS14003

Project 2

Decision Tree

1 Description
In this assignment, you are going to build decision trees on real-world datasets using scikit-learn.

The datasets you will be working on include:

• Binary class dataset: The UCI Heart Disease dataset is used for classifying whether a
patient has heart disease or not based on age, blood pressure, cholesterol level, and other
medical indicators. This dataset includes 303 samples, with labels indicating presence (1) or
absence (0) of heart disease.

• Multi-class dataset: The Palmer Penguins dataset is used for classifying penguin species
based on physical characteristics. The dataset includes 344 samples of three penguin species:
Adelie, Chinstrap, and Gentoo, with features such as bill length, flipper length, body mass,
and sex.

• Additional dataset: You have to find another dataset and build the decision tree for it.
Please provide a detailed description of the dataset information in your report.
Your dataset must:

– Contain both features and labels for supervised learning.

– Include at least 300 samples for meaningful analysis.
– Contain at least two classes (binary or multi-class labels).

2 Specifications
You are required to write Python Notebooks (.ipynb) and use the scikit-learn library to
complete the following tasks described for the Heart Disease dataset.
For the remaining datasets (Penguins dataset and your additional dataset), perform similar
tasks as with the Heart Disease dataset.
While there are no strict guidelines for code organization, each task must be clearly documented
and fully comply with all specified requirements.

University of Science Faculty of Information Technology Page 1

2.1 Preparing the datasets

This task sets up the training and test datasets for the upcoming experiments.
Using the features and labels above, please prepare the following four subsets:

• feature_train: a set of training samples.

• label_train: a set of labels corresponding to the samples in feature_train.

• feature_test: a set of test samples with a structure similar to feature_train.

• label_test: a set of labels corresponding to the samples in feature_test.

You need to shuffle the dataset before splitting and ensure it is split in a stratified fashion.
Other parameters (if there are any) should remain at their default settings.
There will be experiments on training and test sets with different proportions, including 40/60,
60/40, 80/20, and 90/10 (train/test); therefore, you will need 16 subsets in total.
Visualize the class distributions in all datasets (the original set, training sets, and test sets)
across all proportions to demonstrate that they have been appropriately prepared.
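
The splitting step can be sketched as follows. This is a minimal illustration rather than a reference solution: the data comes from `make_classification` as a stand-in for the real features and labels, and `random_state=42` is an assumption added only so the sketch is reproducible (drop it if every other parameter must stay at its default).

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 303 synthetic samples with binary labels.
# Replace with the real feature matrix and label vector.
features, labels = make_classification(
    n_samples=303, n_features=13, n_informative=5, random_state=0
)

# Train fractions for the 40/60, 60/40, 80/20, and 90/10 experiments.
TRAIN_FRACTIONS = [0.4, 0.6, 0.8, 0.9]

splits = {}
for train_fraction in TRAIN_FRACTIONS:
    feature_train, feature_test, label_train, label_test = train_test_split(
        features,
        labels,
        train_size=train_fraction,
        shuffle=True,      # shuffle before splitting (also the default)
        stratify=labels,   # keep class ratios the same in both parts
        random_state=42,   # assumption: fixed seed for reproducibility
    )
    splits[train_fraction] = (feature_train, feature_test, label_train, label_test)

# Class counts per subset; these counts can feed a bar chart.
for train_fraction, (_, _, label_train, label_test) in splits.items():
    print(train_fraction, Counter(label_train.tolist()), Counter(label_test.tolist()))
```

Four proportions times four subsets gives the 16 subsets mentioned above. The printed counts can be drawn as bar charts next to the counts of the original set to show that the class ratios are preserved.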

2.2 Building the decision tree classifiers

This task involves conducting experiments on the designated train/test proportions listed above.
You need to fit an instance of sklearn.tree.DecisionTreeClassifier (using information gain)
to each training set and visualize the resulting decision tree with Graphviz.

Figure 1: Example for a decision tree classifier (with depth = 2).
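
One possible way to fit and export a tree is sketched below. In scikit-learn, information gain corresponds to `criterion="entropy"`. The synthetic data, the `feature_names` list, and the fixed `random_state` values are assumptions for illustration; rendering the DOT text requires the separate `graphviz` package, which is assumed to be installed and is therefore left as a comment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data (assumption); substitute the prepared subsets.
features, labels = make_classification(
    n_samples=303, n_features=13, n_informative=5, random_state=0
)
feature_train, feature_test, label_train, label_test = train_test_split(
    features, labels, train_size=0.8, stratify=labels, random_state=42
)

# criterion="entropy" makes the tree choose splits by information gain.
classifier = DecisionTreeClassifier(criterion="entropy", random_state=42)
classifier.fit(feature_train, label_train)

# Hypothetical column names; use the dataset's real feature names instead.
feature_names = [f"feature_{i}" for i in range(features.shape[1])]

# With out_file=None, export_graphviz returns the tree as a DOT string.
dot_source = export_graphviz(
    classifier,
    out_file=None,
    feature_names=feature_names,
    class_names=["0", "1"],
    filled=True,
    rounded=True,
)

# Rendering in a notebook (assumes the graphviz package is installed):
# import graphviz
# graphviz.Source(dot_source)
```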

2.3 Evaluating the decision tree classifiers

For each of the above decision tree classifiers, predict the samples in the corresponding test set
and generate a report using classification_report and confusion_matrix.

Figure 2: Example for Classification Report and Confusion Matrix.

How do you interpret the classification report and the confusion matrix? Based on the results,
provide your insights into the performance of these decision tree classifiers.
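
A minimal sketch of the evaluation step, again on synthetic stand-in data (an assumption, as are the fixed seeds):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption); substitute the prepared subsets.
features, labels = make_classification(
    n_samples=303, n_features=13, n_informative=5, random_state=0
)
feature_train, feature_test, label_train, label_test = train_test_split(
    features, labels, train_size=0.8, stratify=labels, random_state=42
)

classifier = DecisionTreeClassifier(criterion="entropy", random_state=42)
classifier.fit(feature_train, label_train)

# Predict the held-out samples and summarise the results.
label_pred = classifier.predict(feature_test)

# Per-class precision, recall, F1-score, and support as formatted text.
report = classification_report(label_test, label_pred)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(label_test, label_pred)

print(report)
print(matrix)
```

In the confusion matrix, the diagonal holds the correctly classified samples, so off-diagonal entries show which classes are confused with each other.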

2.4 The depth and accuracy of a decision tree

This task focuses on the 80/20 training and test sets. You need to consider how the depth of
the decision tree affects classification accuracy.
You can specify the maximum depth of a decision tree by adjusting the max_depth parameter.
Try the following values for parameter max_depth: None, 2, 3, 4, 5, 6, 7. Then:

• Provide the decision trees, visualized using Graphviz, for each max_depth value.

• Report the accuracy_score (on the test set) of the decision tree classifier for each value of
the max_depth parameter in the following table.

max_depth | None | 2 | 3 | 4 | 5 | 6 | 7
Accuracy  |      |   |   |   |   |   |

• Provide charts and your insights on the statistics reported above.
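
The depth experiment could be organised as a simple loop, sketched below on synthetic stand-in data (an assumption, together with the fixed seeds). The resulting dictionary maps each `max_depth` value to its test accuracy and can be used to fill in the table and to draw a chart.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption); substitute the 80/20 subsets.
features, labels = make_classification(
    n_samples=303, n_features=13, n_informative=5, random_state=0
)
feature_train, feature_test, label_train, label_test = train_test_split(
    features, labels, train_size=0.8, stratify=labels, random_state=42
)

# The max_depth values requested in this task (None means unlimited depth).
MAX_DEPTHS = [None, 2, 3, 4, 5, 6, 7]

accuracies = {}
for max_depth in MAX_DEPTHS:
    classifier = DecisionTreeClassifier(
        criterion="entropy", max_depth=max_depth, random_state=42
    )
    classifier.fit(feature_train, label_train)
    predictions = classifier.predict(feature_test)
    accuracies[max_depth] = accuracy_score(label_test, predictions)

# One line per depth: the value of max_depth and the test accuracy.
for max_depth, accuracy in accuracies.items():
    print(f"{str(max_depth):>9}  {accuracy:.3f}")
```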

2.5 Repeat for Other Datasets

You are required to repeat the same workflow described above for both the Penguins dataset
and your chosen Additional dataset. For categorical features, please use one-hot encoding.
After completing the experiments for all datasets, write a comparative analysis in your report.
Discuss how characteristics of each dataset — including the number of classes, number of features,
and sample size — affect the decision tree’s performance. Use tables or plots to summarize your
findings and support your conclusions.
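
One-hot encoding of categorical features can be done with `pandas.get_dummies`, as in the sketch below. The four rows and the column names are hypothetical values shaped like Penguins-style data, not the real dataset. The real Penguins data also contains missing values, which would need handling before fitting a tree.

```python
import pandas as pd

# Hypothetical rows shaped like Penguins-style data (assumption, not real data).
penguins = pd.DataFrame(
    {
        "bill_length_mm": [39.1, 46.5, 50.0, 38.6],
        "island": ["Torgersen", "Dream", "Biscoe", "Dream"],
        "sex": ["male", "female", "male", "female"],
        "species": ["Adelie", "Chinstrap", "Gentoo", "Adelie"],
    }
)

# The label column is kept as-is; only the features are encoded.
labels = penguins["species"]

# Each categorical column becomes one indicator column per category.
features = pd.get_dummies(
    penguins.drop(columns="species"), columns=["island", "sex"]
)

print(features.columns.tolist())
```

Numeric columns pass through unchanged, so the encoded frame can be passed directly to the same splitting and training steps used for the Heart Disease dataset.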

3 Requirements
3.1 Report
The report must include the following sections:

• Member information (Student ID, full name, etc.).

• Work assignment table, which includes information on each task assigned to team members,
along with the completion rate of each member compared to the assigned tasks. For example,
if student A has a completion rate of 90% and the group work has a total score of 9.0,
then A receives a score of 9.0 × 90% = 8.1.

• A self-evaluation of the completion rate of the project and other requirements.

• All visualizations must be presented in the .ipynb file, while statistical results and insights
must be presented in the report.

• The report needs to be well-formatted and exported to PDF. If there are figures cut off by
the page break, etc., points will be deducted.

• References (if any).

3.2 Submission
• All reports, code, etc., must be submitted as a single compressed file (.zip, .rar, .7z)
named according to the format: StudentID1_StudentID2_etc.zip/.rar/.7z.

• If the compressed file is larger than 25MB, prioritize compressing the report and source code.
Images and other large files may be uploaded to Google Drive and shared via a link.

4 Assessment
The detailed assessment criteria for this project are outlined as follows:

No. | Criteria                                     | Score
1   | Analysis of the Heart Disease dataset.       | 30%
2   | Analysis of the Palmer Penguins dataset.     | 30%
3   | Analysis of an additional dataset.           | 30%
4   | Comparative analysis of all three datasets.  | 5%
5   | Well-structured and formatted notebooks.     | 5%
    | Total                                        | 100%

The detailed assessment criteria for each dataset are outlined as follows:

No. | Criteria                                       | Score
1   | Data preparation.                              | 30%
2   | Implement decision tree classifiers.           | 20%
3   | Performance evaluation of decision tree.       |
    | - Classification report and confusion matrix.  | 10%
    | - Insights.                                    | 10%
4   | Depth and accuracy of decision trees.          |
    | - Visualization (trees, tables, charts).       | 20%
    | - Insights.                                    | 10%
    | Total                                          | 100%

5 Notices
Please pay attention to the following notices:

• This is a GROUP assignment. Each group has 4 members.

• Duration: about 3 weeks.

• Any plagiarism, cheating, or dishonesty will result in a score of 0 for the course grade.

The end.
