AI Lab Report (Detailed)
1. Introduction
In recent years, the importance of machine learning (ML) techniques in solving real-world
problems has increased dramatically. One of the fundamental tasks in ML is classification,
particularly binary classification, which deals with categorizing data into two distinct
groups. This laboratory practice aims to introduce students to the basic workflow of solving
a classification problem using Python and the scikit-learn library. By performing a binary
classification task using two models — Multilayer Perceptron (MLP) and Decision Tree —
students gain practical experience in training, validating, and evaluating machine learning
models. The primary dataset used in this practice is the Breast Cancer Wisconsin dataset, a
widely used benchmark in biomedical ML research. Through this exercise, we analyze the
impact of different validation strategies (holdout vs. k-fold cross-validation), input feature
sets, and random seed values on model performance. In total, 16 experimental
configurations are tested, offering a comprehensive view of how each factor influences
accuracy, recall, precision, and F1-score.
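As a brief illustration of these four metrics, the following sketch computes them with scikit-learn's metric functions; the label vectors y_true and y_pred are hypothetical placeholders and do not come from the experiments reported here.

    # Sketch: computing the four evaluation metrics with scikit-learn.
    # y_true and y_pred are hypothetical label vectors, not results from this report.
    from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

    y_true = [0, 1, 1, 0, 1, 0, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("f1-score :", f1_score(y_true, y_pred))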
2. Dataset Description
The dataset used for this practice is the Breast Cancer Wisconsin dataset, included in the
scikit-learn library. It contains 569 instances, each with 30 numerical features extracted
from digitized images of breast mass tissue. These features describe characteristics of the
cell nuclei present in the image. The target variable is binary: in scikit-learn's encoding, '0' indicates a malignant tumor and '1' indicates a benign tumor. The dataset is relatively clean, with no missing
values, making it ideal for introductory machine learning experiments. Among the 30
features, we focus on a subset considered informative for the classification task:
- mean perimeter
- mean smoothness
- mean concave points
These features are chosen based on domain knowledge and previous studies showing their
relevance in tumor classification. The dataset exhibits a slight class imbalance (357 benign versus 212 malignant instances), which needs to be considered during evaluation.
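The snippet below is a minimal sketch of how the dataset can be loaded and inspected with scikit-learn; the variable names are illustrative and not taken from the practice script.

    # Sketch: loading the Breast Cancer Wisconsin dataset and inspecting it.
    import numpy as np
    from sklearn.datasets import load_breast_cancer

    data = load_breast_cancer()
    print(data.data.shape)           # (569, 30): instances x numerical features
    print(data.target_names)         # ['malignant' 'benign'] -> labels 0 and 1
    print(np.bincount(data.target))  # [212 357]: the slight class imbalance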
3. Data Preparation
Data preparation is a crucial step in the machine learning pipeline. The dataset is first
loaded and converted into a pandas DataFrame for easier manipulation. Labels are stored in
a separate Series. Two different feature sets are considered in this study:
1. A single feature: 'mean perimeter'
2. A combination of three features: 'mean perimeter', 'mean smoothness', and 'mean
concave points'
This allows us to study the effect of feature richness on model performance. The labels
remain unchanged, representing the binary classification goal. During preprocessing, the
data is not normalized or scaled, though in practice this could benefit algorithms such as
MLP. For holdout validation, the data is split using a 90/10 train-test ratio. For cross-
validation, 5-fold (K=5) splitting is used. The same seed values are applied consistently
across experiments to ensure reproducibility.
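A minimal sketch of this preparation follows, assuming illustrative variable names and a placeholder seed of 42 (the concrete seed values used in the experiments are not fixed here).

    # Sketch: DataFrame conversion, the two feature sets, the 90/10 holdout
    # split, and a 5-fold splitter. Names and the seed value are illustrative.
    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split, KFold

    data = load_breast_cancer()
    X = pd.DataFrame(data.data, columns=data.feature_names)
    y = pd.Series(data.target, name="target")

    features_1 = ["mean perimeter"]
    features_3 = ["mean perimeter", "mean smoothness", "mean concave points"]

    seed = 42  # the same seed is reused across experiments for reproducibility

    # Holdout validation: 90/10 train-test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X[features_3], y, test_size=0.10, random_state=seed)

    # Cross-validation: 5-fold (K=5) splitting.
    kfold = KFold(n_splits=5, shuffle=True, random_state=seed)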
4. Experiment Design
To thoroughly assess the effect of model choice, validation method, input features, and seed values on model performance, 16 configurations are tested: two models (MLP and Decision Tree), two validation methods (90/10 holdout and 5-fold cross-validation), two feature sets (one feature and three features), and two seed values, giving 2 × 2 × 2 × 2 = 16 combinations.
Cross-validation metrics are averaged across all folds, while holdout metrics are based on a
single test split. The script generates confusion matrices for visual assessment and outputs
metrics to the console or log file.
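One possible way to organize the 16 runs is sketched below; the model settings and seed values are assumptions, and X, y, features_1, and features_3 follow the preparation sketch in Section 3.

    # Sketch: iterating over models, feature sets, and seeds; each combination
    # is evaluated with both holdout validation and 5-fold cross-validation,
    # giving the 16 experimental configurations.
    from itertools import product
    from sklearn.model_selection import train_test_split, cross_validate
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    models = {"MLP": MLPClassifier(max_iter=1000),
              "DecisionTree": DecisionTreeClassifier()}
    feature_sets = {"1 feature": features_1, "3 features": features_3}
    seeds = [0, 42]  # illustrative seed values
    scoring = ["accuracy", "recall", "precision", "f1"]

    for (m_name, model), (f_name, feats), seed in product(
            models.items(), feature_sets.items(), seeds):
        model.set_params(random_state=seed)
        # Holdout: metrics from a single 90/10 test split.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[feats], y, test_size=0.10, random_state=seed)
        holdout_acc = model.fit(X_tr, y_tr).score(X_te, y_te)
        # 5-fold CV: metrics averaged across all folds.
        cv = cross_validate(model, X[feats], y, cv=5, scoring=scoring)
        print(m_name, f_name, seed,
              round(holdout_acc, 3), round(cv["test_accuracy"].mean(), 3))

Confusion matrices for each split could then be rendered with sklearn.metrics.ConfusionMatrixDisplay and saved as described in the Appendix.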
5. Experimental Results
The results of the 16 experiments are summarized in the table below. Each row
corresponds to one unique combination of model, validation method, feature set, and seed
value. The performance metrics are recorded to analyze how each variable affects
classification quality.
Based on these results, several trends can be observed and are discussed in the next section.
6. Analysis and Discussion
From the experiments, it becomes clear that the choice of input features has a major impact
on model performance. Using only 'mean perimeter' often leads to poor recall values,
especially when the data is split unfavorably due to the random seed. Adding two more
features significantly improves the classifier's ability to distinguish malignant cases.
Model comparison shows that the MLP generally outperforms the Decision Tree in terms of
precision and F1-score, particularly when cross-validation is used. This suggests that MLP is
better at generalizing, though it is also more sensitive to data scaling and convergence
settings.
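As one way to address the scaling sensitivity noted above (scaling was not applied in these experiments), the MLP could be wrapped in a pipeline with a standardizer. The following is a hedged sketch, not the configuration used in the practice.

    # Sketch: standardizing features before the MLP, which is sensitive to scale.
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    scaled_mlp = make_pipeline(StandardScaler(),
                               MLPClassifier(max_iter=1000, random_state=42))
    # scaled_mlp can be used wherever a bare MLPClassifier was used, e.g.
    # scaled_mlp.fit(X_train, y_train) or passed to cross_validate.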
Regarding validation, 5-Fold Cross-Validation provides more stable and reliable results
compared to holdout validation. The variability introduced by different seed values is
mitigated in cross-validation, which helps in obtaining a more accurate performance
estimate.
Finally, different seed values affect both training/test splits and model initialization
(especially for MLP). While the effect is sometimes small, in borderline cases it can
significantly impact results.
7. Conclusions
This laboratory practice demonstrated the fundamental steps in approaching a binary
classification problem using machine learning. The impact of model selection, validation
strategy, feature richness, and random seed was studied through systematic
experimentation.
Future improvements may include feature scaling, hyperparameter tuning, and use of
ensemble methods.
8. Appendix
Additional outputs such as confusion matrices, learning curves, and logs can be found in the
output_images directory or captured via script output. The generated figures follow these naming patterns, where X denotes the fold index and Y the seed value:
- confusion_matrix_fold_X_seed_Y.png
- confusion_matrix_retention_seed_Y.png
These figures complement the numerical evaluation and help assess which combinations of
parameters produce more confident and accurate models.