
EC2011E Foundations of Machine Learning

Programming Assignment Report

Team Members:

Kaigala Mani Charan – B230999EC

Kamana Narendra Subbaraj – B231001EC

K Vinay – B230996EC
1. Binary Classification: Breast Cancer Wisconsin (Diagnostic) Dataset

1.1 Dataset Description

The Breast Cancer Wisconsin (Diagnostic) dataset is widely used for binary classification tasks in the
medical domain. It consists of 569 instances with 30 real-valued input features computed from
digitized images of fine needle aspirates (FNA) of breast masses. The diagnosis (target variable) has
two classes:

• M = Malignant (cancerous)
• B = Benign (non-cancerous)

For each of the 10 base features (radius, texture, perimeter, area, smoothness, compactness, concavity,
concave points, symmetry, and fractal dimension), the dataset provides three statistics, giving the
30 features in total (10 × 3):

• Mean
• Standard Error
• Worst (largest) value
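For reference, the same dataset can also be loaded directly from scikit-learn, avoiding the raw wdbc.data file; a minimal sketch (note that scikit-learn's 0/1 label convention is the reverse of the M → 1 mapping used below):

from sklearn.datasets import load_breast_cancer

# Bunch object with .data (569 x 30 feature matrix) and .target (0/1 labels)
# Caution: here target 0 = malignant, 1 = benign
cancer = load_breast_cancer()
print(cancer.data.shape, cancer.target_names)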

1.2 Preprocessing Steps

# Data loading and preprocessing
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

data = pd.read_csv("wdbc.data", header=None)

# Column names follow the dataset layout: 10 base features x 3 statistics
columns = ['ID', 'Diagnosis'] + [
    f"{feat}_{stat}" for stat in ['mean', 'se', 'worst'] for feat in [
        'radius', 'texture', 'perimeter', 'area', 'smoothness', 'compactness',
        'concavity', 'concave_points', 'symmetry', 'fractal_dimension']]
data.columns = columns

# Drop the ID column and encode the target: M (malignant) -> 1, B (benign) -> 0
data.drop('ID', axis=1, inplace=True)
data['Diagnosis'] = data['Diagnosis'].map({'M': 1, 'B': 0})

# Use only the 10 mean-valued features
features = [col for col in data.columns if '_mean' in col]
X = data[features]
y = data['Diagnosis']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (fit on the training set only to avoid leakage)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
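One caveat: as written, train_test_split does not stratify by class, and the target is imbalanced (357 benign vs 212 malignant). A hedged variant that keeps the class proportions consistent across splits:

# Stratified variant: preserves the benign/malignant ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)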

1.3 Models Implemented

• Naive Bayes Classifier using GaussianNB()
• K-Nearest Neighbors (KNN) with k = 5 using KNeighborsClassifier()

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Naive Bayes (scale-invariant, so trained on the unscaled features)
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred_nb = nb.predict(X_test)

# KNN (distance-based, so trained on the standardized features)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred_knn = knn.predict(X_test_scaled)
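The results below were presumably printed with something like the following minimal sketch, using the metric helpers imported above:

# Print accuracy, confusion matrix, and per-class report for each model
for name, y_pred in [("Naive Bayes", y_pred_nb), ("KNN (k=5)", y_pred_knn)]:
    print(f"{name} Accuracy:", accuracy_score(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
    print("Classification Report:\n", classification_report(y_test, y_pred))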

1.4 Evaluation Results


Naive Bayes Classifier Output:

Accuracy: 0.9474

Confusion Matrix:
[[70  1]
 [ 5 38]]

Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.99      0.96        71
           1       0.97      0.88      0.93        43

    accuracy                           0.95       114
   macro avg       0.95      0.93      0.94       114
weighted avg       0.95      0.95      0.95       114

KNN Classifier Output (k = 5):

Accuracy: 0.9474

Confusion Matrix:
[[68  3]
 [ 3 40]]

Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.96      0.96        71
           1       0.93      0.93      0.93        43

    accuracy                           0.95       114
   macro avg       0.94      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114

2. PCA-Based Dimensionality Reduction


from sklearn.decomposition import PCA

for k in [10, 9, 8]:
    # Fit PCA on the scaled training data and project both splits
    pca = PCA(n_components=k)
    X_train_pca = pca.fit_transform(X_train_scaled)
    X_test_pca = pca.transform(X_test_scaled)

    nb_pca = GaussianNB()
    nb_pca.fit(X_train_pca, y_train)
    print(f"Naive Bayes Accuracy with PCA-{k}:",
          accuracy_score(y_test, nb_pca.predict(X_test_pca)))

    knn_pca = KNeighborsClassifier(n_neighbors=5)
    knn_pca.fit(X_train_pca, y_train)
    print(f"KNN Accuracy with PCA-{k}:",
          accuracy_score(y_test, knn_pca.predict(X_test_pca)))

PCA Results (k = number of principal components):

Principal Components    Naive Bayes Accuracy    KNN Accuracy
                  10                  0.9123          0.9474
                   9                  0.9211          0.9474
                   8                  0.9211          0.9474

Note that only the 10 mean features are used as input, so k = 10 retains every dimension and is a pure rotation. KNN with Euclidean distance is rotation-invariant, which is why its accuracy is unchanged; Gaussian Naive Bayes is not, since the rotation changes how well its feature-independence assumption holds.
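To choose k more systematically, one could inspect the cumulative explained variance of a full PCA fit; a minimal sketch (pca_full is a name introduced here for illustration):

import numpy as np

# Fit PCA with all components and report cumulative variance captured
pca_full = PCA().fit(X_train_scaled)
print(np.cumsum(pca_full.explained_variance_ratio_))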

3. KNN Hyperparameter Tuning


import matplotlib.pyplot as plt

k_values = list(range(1, 16))
accuracies = []

for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train_scaled, y_train)
    acc = model.score(X_test_scaled, y_test)
    accuracies.append(acc)

# Plot test accuracy as a function of k
plt.figure(figsize=(8, 5))
plt.plot(k_values, accuracies, marker='o')
plt.title("KNN Accuracy vs k")
plt.xlabel("k")
plt.ylabel("Accuracy")
plt.grid()
plt.show()

[Figure: line plot of KNN test accuracy vs k for k = 1 to 15; accuracy peaks near k = 5]

Observation:

• The highest accuracy was observed around k = 5.
• Smaller k risks overfitting (high variance); larger k risks underfitting (high bias). A more robust way to select k is cross-validation, sketched below.
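Selecting k by test-set accuracy risks tuning to the test set; a hedged sketch of cross-validated selection on the training split instead, using scikit-learn's cross_val_score:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation on the training set for each candidate k
for k in range(1, 16):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train_scaled, y_train, cv=5)
    print(k, scores.mean())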

4. Multi-Class Classification: Car Evaluation Dataset

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

car_data = pd.read_csv("car.data", header=None)
car_data.columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

# Encode the six categorical attributes as integers
encoder = OrdinalEncoder()
X_car = encoder.fit_transform(car_data.drop('class', axis=1))
y_car = car_data['class']

Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_car, y_car, test_size=0.2, random_state=42)
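By default, OrdinalEncoder orders categories alphabetically, which does not match the attributes' natural ordering (e.g. low < med < high < vhigh for buying). A hedged drop-in alternative passing explicit orderings, with category values taken from the UCI dataset description:

# Explicit category orders per attribute (as listed in the UCI description)
encoder = OrdinalEncoder(categories=[
    ['low', 'med', 'high', 'vhigh'],   # buying
    ['low', 'med', 'high', 'vhigh'],   # maint
    ['2', '3', '4', '5more'],          # doors
    ['2', '4', 'more'],                # persons
    ['small', 'med', 'big'],           # lug_boot
    ['low', 'med', 'high'],            # safety
])

Tree-based models are fairly insensitive to the exact integer coding, so the alphabetical default still works here, but the explicit ordering makes the encoded values meaningful.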

# Decision Tree
dt = DecisionTreeClassifier(random_state=42)
dt.fit(Xc_train, yc_train)
yc_pred_dt = dt.predict(Xc_test)

Decision Tree Results:

Accuracy: 0.9740

Classification Report:
              precision    recall  f1-score   support

         acc       0.97      0.92      0.94        83
        good       0.62      0.91      0.74        11
       unacc       1.00      1.00      1.00       235
       vgood       1.00      0.94      0.97        17

    accuracy                           0.97       346
   macro avg       0.90      0.94      0.91       346
weighted avg       0.98      0.97      0.98       346

# Random Forest
rf = RandomForestClassifier(random_state=42)
rf.fit(Xc_train, yc_train)
yc_pred_rf = rf.predict(Xc_test)

Random Forest Results:

Accuracy: 0.9740

Classification Report:
              precision    recall  f1-score   support

         acc       0.99      0.90      0.94        83
        good       0.65      1.00      0.79        11
       unacc       0.99      1.00      1.00       235
       vgood       1.00      0.94      0.97        17

    accuracy                           0.97       346
   macro avg       0.91      0.96      0.92       346
weighted avg       0.98      0.97      0.98       346
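The report tunes hyperparameters only for KNN; the Random Forest could be tuned the same way. A hedged GridSearchCV sketch (the grid values are illustrative assumptions, not settings from the assignment):

from sklearn.model_selection import GridSearchCV

# Illustrative grid; cross-validated search over forest size and depth
param_grid = {'n_estimators': [100, 200, 400], 'max_depth': [None, 8, 16]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(Xc_train, yc_train)
print(grid.best_params_, grid.best_score_)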

5. Conclusion

• KNN matched Naive Bayes on overall binary-classification accuracy (0.9474) but produced more balanced per-class recall, and it benefited clearly from feature scaling.

• PCA reduced the feature space from 10 to 8 components with no loss in KNN accuracy (0.9474), though Naive Bayes accuracy dropped slightly.

• k = 5 was optimal for KNN on this dataset.

• On the multi-class car dataset, Random Forest matched the Decision Tree's overall accuracy (0.9740) but achieved better macro-averaged precision and recall, reflecting the stronger generalization of ensemble learning.

• The assignment highlights the importance of preprocessing, model selection, and hyperparameter tuning in practical ML applications.

6. References

• UCI Machine Learning Repository: Breast Cancer Wisconsin (Diagnostic) and Car Evaluation datasets

• scikit-learn documentation (https://scikit-learn.org/)

• Course lecture slides and notes
