Iris Flower Classification
Contribution - Individual
Task - 1
Project Summary -
Project Description:
The Iris Flower Classification project focuses on developing a machine learning model to
classify iris flowers into their respective species based on specific measurements. The dataset
covers three species (setosa, versicolor, and virginica), each of which differs in its sepal
and petal measurements.
Objective:
The primary goal of this project is to leverage machine learning techniques to build a
classification model that can accurately identify the species of iris flowers based on their
measurements. The model aims to automate the classification process, offering a practical
solution for identifying iris species.
GitHub Link - https://github.com/Apaulgithub/oibsip_task1/tree/main
1 of 47 15/10/2024, 7:06 AM
Iris_Flower_Classification http://localhost:8888/nbconvert/html/Iris_Flower_Classification.ipynb...
Problem Statement
Iris is a genus of flowering plants. This project considers three of its species: Iris setosa,
Iris versicolor, and Iris virginica. These species differ in their physical characteristics,
particularly in the measurements of their sepal length, sepal width, petal length, and petal
width.
Objective:
The objective of this project is to develop a machine learning model capable of learning
from the measurements of iris flowers and accurately classifying them into their respective
species. The model's primary goal is to automate the classification process based on the
distinct characteristics of each iris species.
Project Details:
• Iris Species: The dataset consists of iris flowers, specifically from the species setosa,
versicolor, and virginica.
• Key Measurements: The essential characteristics used for classification include sepal
length, sepal width, petal length, and petal width.
• Machine Learning Model: The project involves the creation and training of a machine
learning model to accurately classify iris flowers based on their measurements.
This project's significance lies in its potential to streamline and automate the classification of
iris species, which can have broader applications in botany, horticulture, and environmental
monitoring.
Let's Begin !
Import Libraries
In [ ]: # Import Libraries
# Importing Numpy & Pandas for data processing & data wrangling
import numpy as np
import pandas as pd
Dataset Loading
In [ ]: # Load Dataset
df = pd.read_csv("https://raw.githubusercontent.com/Apaulgithub/oibsip_task1/main/Iris.csv")
Dataset Information
In [ ]: # Dataset Info
# Checking information about the dataset using info
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 150 non-null int64
1 SepalLengthCm 150 non-null float64
2 SepalWidthCm 150 non-null float64
3 PetalLengthCm 150 non-null float64
4 PetalWidthCm 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
Duplicate Values
Out[ ]: Id 0
SepalLengthCm 0
SepalWidthCm 0
PetalLengthCm 0
PetalWidthCm 0
Species 0
dtype: int64
• The Iris dataset consists of sepal and petal length and width measurements, in
centimeters, for different species.
• There are 150 rows and 6 columns provided in the data.
• No duplicate values exist.
• No Null values exist.
3. Data Wrangling
In [ ]: # Define colors for each species and the corresponding species labels.
colors = ['red', 'yellow', 'green']
species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
In [ ]: # Chart - 2 Scatter plot visualization code for Sepal Length vs Sepal Width.
# Create a scatter plot for Sepal Length vs Sepal Width for each species.
for i in range(3):
    # Select data for the current species.
    x = data[data['Species'] == species[i]]
    # Create a scatter plot with the specified color and label for the current species.
    plt.scatter(x['SepalLengthCm'], x['SepalWidthCm'], c=colors[i], label=species[i])
In [ ]: # Chart - 3 Scatter plot visualization code for Petal Length vs Petal Width.
# Create a scatter plot for Petal Length vs Petal Width for each species.
for i in range(3):
    # Select data for the current species.
    x = data[data['Species'] == species[i]]
    # Create a scatter plot with the specified color and label for the current species.
    plt.scatter(x['PetalLengthCm'], x['PetalWidthCm'], c=colors[i], label=species[i])
In [ ]: # Chart - 4 Scatter plot visualization code for Sepal Length vs Petal Length.
# Create a scatter plot for Sepal Length vs Petal Length for each species.
for i in range(3):
    # Select data for the current species.
    x = data[data['Species'] == species[i]]
    # Create a scatter plot with the specified color and label for the current species.
    plt.scatter(x['SepalLengthCm'], x['PetalLengthCm'], c=colors[i], label=species[i])
In [ ]: # Chart - 5 Scatter plot visualization code for Sepal Width vs Petal Width.
# Create a scatter plot for Sepal Width vs Petal Width for each species.
for i in range(3):
    # Select data for the current species.
    x = data[data['Species'] == species[i]]
    # Create a scatter plot with the specified color and label for the current species.
    plt.scatter(x['SepalWidthCm'], x['PetalWidthCm'], c=colors[i], label=species[i])
# Plot Heatmap
plt.figure(figsize=(8, 4))
sns.heatmap(corr_matrix, annot=True, cmap='Reds_r')
# Setting Labels
plt.title('Correlation Matrix heatmap')
# Display Chart
plt.show()
1. Categorical Encoding
In [ ]: # Encode the categorical columns
# Create a LabelEncoder object
le = LabelEncoder()
# Encode the 'Species' column to convert the species names to numerical labels
data['Species'] = le.fit_transform(data['Species'])
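As a rough illustration of what the encoding step produces, the sketch below mimics the mapping in plain Python. It assumes, as scikit-learn's LabelEncoder does, that classes are numbered in sorted order. The helper name `encode_labels` and the sample column are hypothetical and not part of the notebook.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, numbering classes in sorted order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Hypothetical sample of the 'Species' column.
species_column = ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa']
encoded, classes = encode_labels(species_column)
print(encoded)   # [0, 2, 1, 0]
print(classes)   # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
```

Keeping the `classes` list around makes it possible to translate a predicted integer back into a species name later on.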
2. Data Scaling
In [ ]: # Defining the X and y
x = data.drop(columns=['Species'])
y = data['Species']
3. Data Splitting
In [ ]: # Splitting the data to train and test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
Out[ ]: 1 37
2 35
0 33
Name: Species, dtype: int64
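The counts above (37, 35 and 33) are uneven because the split is purely random. A stratified split keeps the class proportions identical in both parts. scikit-learn's train_test_split accepts `stratify` and `random_state` arguments for this. The dependency-free sketch below shows the idea. The function name `stratified_split` and the seed are assumptions made for illustration, not code from the notebook.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, seed=0):
    """Return (train_idx, test_idx) with the same class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within each class
        n_test = round(len(indices) * test_size)  # per-class test share
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 50 + [1] * 50 + [2] * 50   # 50 flowers per encoded species
train_idx, test_idx = stratified_split(labels)
train_counts = {c: sum(labels[i] == c for i in train_idx) for c in (0, 1, 2)}
print(train_counts)  # {0: 35, 1: 35, 2: 35}
```

With a fixed seed the split is also reproducible, which makes score comparisons between runs easier to interpret.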
6. ML Model Implementation
print("\nConfusion Matrix:")
sns.heatmap(cm_train, annot=True, xticklabels=['Negative', 'Positive'], yticklabels
ax[0].set_xlabel("Predicted Label")
ax[0].set_ylabel("True Label")
ax[0].set_title("Train Confusion Matrix")
plt.tight_layout()
plt.show()
1. Explain the ML Model used and its performance using Evaluation metric
Score Chart.
Confusion Matrix:
Confusion Matrix:
Have I seen any improvement? Note down the improvement with the updated Evaluation metric
Score Chart.
It appears that hyperparameter tuning did not improve the performance of the Logistic
Regression model on the test set. The precision, recall, accuracy and F1 scores on the test set
are the same for both the tuned and untuned Logistic Regression models.
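To make the reported test scores concrete, the sketch below computes support-weighted precision, recall and F1 from a confusion matrix in plain Python. The matrix is hypothetical. It is one arrangement consistent with the reported test scores (0.979167, 0.977778, 0.977692), assuming the class counts printed after the split are training counts, so that the test set holds 17 setosa, 13 versicolor and 15 virginica flowers. Under that assumption the weighted F1 reproduces the value shown in the "F1 macro" row, which suggests the chart may use weighted rather than macro averaging.

```python
def weighted_scores(cm):
    """Support-weighted precision, recall and F1, plus accuracy.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precision = recall = f1 = 0.0
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        support = sum(cm[k])                         # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return precision, recall, accuracy, f1


# Hypothetical test-set confusion matrix: a single versicolor flower
# is predicted as virginica, everything else is correct.
cm = [[17, 0, 0],
      [0, 12, 1],
      [0, 0, 15]]
precision, recall, accuracy, f1 = weighted_scores(cm)
print(round(precision, 6), round(recall, 6), round(accuracy, 6), round(f1, 6))
# 0.979167 0.977778 0.977778 0.977692
```

With only 45 test samples, one misclassified flower moves accuracy by about 0.022, so identical test scores across several models most likely mean they all miss the same single sample count.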
1. Explain the ML Model used and its performance using Evaluation metric
Score Chart.
Confusion Matrix:
# Initialize GridSearchCV
grid_search = GridSearchCV(model, grid, cv=rskf)
Confusion Matrix:
Have I seen any improvement? Note down the improvement with the updated Evaluation metric
Score Chart.
                 Logistic    Logistic    Decision    Decision
                 regression  regression  Tree        Tree
                             tuned                   tuned
Precision Train  0.980952    0.990741    1.000000    0.954548
Accuracy Train   0.980952    0.990476    1.000000    0.952381
F1 macro Train   0.980952    0.990478    1.000000    0.952353
It appears that hyperparameter tuning did not improve the performance of the Decision Tree
model on the test set. The precision, recall, accuracy and F1 scores on the test set are lower
for the tuned Decision Tree model than for the untuned one.
1. Explain the ML Model used and its performance using Evaluation metric
Score Chart.
Confusion Matrix:
                 Logistic    Logistic    Decision    Decision    Random
                 regression  regression  Tree        Tree        Forest
                             tuned                   tuned
Precision Train  0.980952    0.990741    1.000000    0.954548    1.000000
Precision Test   0.979167    0.979167    0.979167    0.960784    0.979167
Accuracy Train   0.980952    0.990476    1.000000    0.952381    1.000000
Accuracy Test    0.977778    0.977778    0.977778    0.955556    0.977778
F1 macro Train   0.980952    0.990478    1.000000    0.952353    1.000000
F1 macro Test    0.977692    0.977692    0.977692    0.955093    0.977692
# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(rf, grid, cv=rskf, n_iter=10, n_jobs=-1)
Confusion Matrix:
Have I seen any improvement? Note down the improvement with the updated Evaluation metric
Score Chart.
Out[ ]:
                 Logistic    Logistic    Decision    Decision    Random      Random
                 regression  regression  Tree        Tree        Forest      Forest
                             tuned                   tuned                   tuned
Precision Train  0.980952    0.990741    1.000000    0.954548    1.000000    0.971693
Precision Test   0.979167    0.979167    0.979167    0.960784    0.979167    0.979167
Recall Train     0.980952    0.990476    1.000000    0.952381    1.000000    0.971429
Accuracy Train   0.980952    0.990476    1.000000    0.952381    1.000000    0.971429
Accuracy Test    0.977778    0.977778    0.977778    0.955556    0.977778    0.977778
F1 macro Train   0.980952    0.990478    1.000000    0.952353    1.000000    0.971434
F1 macro Test    0.977692    0.977692    0.977692    0.955093    0.977692    0.977692
It appears that hyperparameter tuning lowered the Random Forest model's scores on the train
set (from 1.000000 to about 0.9714), which suggests that the tuned model overfits less. The
precision, recall, accuracy and F1 scores on the test set are the same for both the tuned and
untuned Random Forest models.
1. Explain the ML Model used and its performance using Evaluation metric
Score Chart.
Confusion Matrix:
                 Logistic    Logistic    Decision    Decision    Random      Random      SVM
                 regression  regression  Tree        Tree        Forest      Forest
                             tuned                   tuned                   tuned
Precision Train  0.980952    0.990741    1.000000    0.954548    1.000000    0.971693    0.980952
Precision Test   0.979167    0.979167    0.979167    0.960784    0.979167    0.979167    0.979167
Recall Train     0.980952    0.990476    1.000000    0.952381    1.000000    0.971429    0.980952
Recall Test      0.977778    0.977778    0.977778    0.955556    0.977778    0.977778    0.977778
Accuracy Train   0.980952    0.990476    1.000000    0.952381    1.000000    0.971429    0.980952
Accuracy Test    0.977778    0.977778    0.977778    0.955556    0.977778    0.977778    0.977778
F1 macro Train   0.980952    0.990478    1.000000    0.952353    1.000000    0.971434    0.980952
F1 macro Test    0.977692    0.977692    0.977692    0.955093    0.977692    0.977692    0.977692
Confusion Matrix:
Have I seen any improvement? Note down the improvement with the updated Evaluation metric
Score Chart.
                 Logistic    Logistic    Decision    Decision    Random      Random      SVM         SVM
                 regression  regression  Tree        Tree        Forest      Forest                  tuned
                             tuned                   tuned                   tuned
Precision Train  0.980952    0.990741    1.000000    0.954548    1.000000    0.971693    0.980952    0.980952
Precision Test   0.979167    0.979167    0.979167    0.960784    0.979167    0.979167    0.979167    0.979167
Recall Train     0.980952    0.990476    1.000000    0.952381    1.000000    0.971429    0.980952    0.980952
Recall Test      0.977778    0.977778    0.977778    0.955556    0.977778    0.977778    0.977778    0.977778
Accuracy Train   0.980952    0.990476    1.000000    0.952381    1.000000    0.971429    0.980952    0.980952
Accuracy Test    0.977778    0.977778    0.977778    0.955556    0.977778    0.977778    0.977778    0.977778
F1 macro Train   0.980952    0.990478    1.000000    0.952353    1.000000    0.971434    0.980952    0.980952
F1 macro Test    0.977692    0.977692    0.977692    0.955093    0.977692    0.977692    0.977692    0.977692
It appears that hyperparameter tuning did not improve the performance of the SVM model
on the test set. The precision, recall, accuracy and F1 scores on the test set are the same for
both the tuned and untuned SVM models.
1. Explain the ML Model used and its performance using Evaluation metric
Score Chart.
Confusion Matrix:
                 Logistic    Logistic    Decision    Decision    Random      Random      SVM         SVM         XGBoost
                 regression  regression  Tree        Tree        Forest      Forest                  tuned
                             tuned                   tuned                   tuned
Precision Train  0.980952    0.990741    1.000000    0.954548    1.000000    0.971693    0.980952    0.980952    1.000000
Precision Test   0.979167    0.979167    0.979167    0.960784    0.979167    0.979167    0.979167    0.979167    0.979167
Recall Train     0.980952    0.990476    1.000000    0.952381    1.000000    0.971429    0.980952    0.980952    1.000000
Recall Test      0.977778    0.977778    0.977778    0.955556    0.977778    0.977778    0.977778    0.977778    0.977778
Accuracy Train   0.980952    0.990476    1.000000    0.952381    1.000000    0.971429    0.980952    0.980952    1.000000
Accuracy Test    0.977778    0.977778    0.977778    0.955556    0.977778    0.977778    0.977778    0.977778    0.977778
F1 macro Train   0.980952    0.990478    1.000000    0.952353    1.000000    0.971434    0.980952    0.980952    1.000000
F1 macro Test    0.977692    0.977692    0.977692    0.955093    0.977692    0.977692    0.977692    0.977692    0.977692
# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(xgb2, param_grid, n_iter=10, cv=rskf)
Confusion Matrix:
Randomized search is a popular technique because it can be more efficient than exhaustive
search methods like grid search. Instead of trying all possible combinations of
hyperparameters, randomized search samples a random subset of the hyperparameter
space. This can save time and computational resources while still finding good
hyperparameters for the model.
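The difference between the two strategies can be sketched in a few lines of plain Python. The parameter names and values below are hypothetical and chosen only for illustration. The number of draws mirrors the `n_iter=10` setting used in this notebook.

```python
import itertools
import random

# Hypothetical search space; the names and values are illustrative only.
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7, 9],
    'learning_rate': [0.01, 0.1, 0.3],
}

names = list(param_grid)

# Grid search: every combination of the listed values is evaluated.
all_combinations = [dict(zip(names, values))
                    for values in itertools.product(*param_grid.values())]

# Randomized search: only a fixed number of combinations is drawn
# (here without replacement) and evaluated.
rng = random.Random(42)
n_iter = 10
sampled = rng.sample(all_combinations, n_iter)

print(len(all_combinations))  # 36
print(len(sampled))           # 10
```

Each candidate is then fitted once per cross-validation split, so in this sketch the randomized search would train on 10 candidates instead of 36.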
Have I seen any improvement? Note down the improvement with the updated Evaluation metric
Score Chart.
It appears that hyperparameter tuning did not improve the performance of the XGBoost
model on the test set. The precision, recall, accuracy and F1 scores on the test set are the
same for both the untuned and tuned XGBoost models.
1. Explain the ML Model used and its performance using Evaluation metric
Score Chart.
Confusion Matrix:
# Initialize GridSearchCV
GridSearch = GridSearchCV(naive, param_grid, cv=rskf, n_jobs=-1)
Confusion Matrix:
Here GridSearchCV was used to tune the Naive Bayes model.
Have I seen any improvement? Note down the improvement with the updated Evaluation metric
Score Chart.
It appears that hyperparameter tuning did not improve the performance of the Naive Bayes
model on the test set. The tuned Naive Bayes model has the same precision, recall, accuracy
and F1 score on the test set as the untuned Naive Bayes model.
1. Explain the ML Model used and its performance using Evaluation metric
Score Chart.
Confusion Matrix:
# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(neural, param_grid, n_iter=10, cv=rskf, n_jobs=-
Confusion Matrix:
Here we have used Randomized search to tune the Neural Network model.
Randomized search is a popular technique because it can be more efficient than exhaustive
search methods like grid search. Instead of trying all possible combinations of
hyperparameters, randomized search samples a random subset of the hyperparameter
space. This can save time and computational resources while still finding good
hyperparameters for the model.
Have I seen any improvement? Note down the improvement with the updated Evaluation metric
Score Chart.
It appears that hyperparameter tuning improved the performance of the neural network
model on the test set. The precision, recall, accuracy and F1 scores on the test set are
higher for the tuned neural network model than for the untuned one.
In [ ]: print(score.to_markdown())
In [ ]: # Removing the likely overfitted models, i.e. those with a train recall of 0.98 or higher
score_t = score.transpose()  # transpose so that each row is a model and each column a metric
remove_models = score_t[score_t['Recall Train'] >= 0.98].index  # index of the models to remove
remove_models
Out[ ]:
                     Precision  Precision  Recall     Recall     Accuracy   Accuracy   F1 macro   F1 macro
                     Train      Test       Train      Test       Train      Test       Train      Test
Decision Tree tuned  0.954548   0.960784   0.952381   0.955556   0.952381   0.955556   0.952353   0.955093
Random Forest tuned  0.971693   0.979167   0.971429   0.977778   0.971429   0.977778   0.971434   0.977692
Naive Bayes          0.942857   0.979365   0.942857   0.977778   0.942857   0.977778   0.942857   0.977806
Naive Bayes tuned    0.942857   0.979365   0.942857   0.977778   0.942857   0.977778   0.942857   0.977806
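The filtering step can be replayed without pandas. The sketch below uses the train recall values reported in the score charts of this notebook for the models whose column names are shown, and applies the same 0.98 cut-off. The variable names are assumptions made for illustration.

```python
# Train recall per model, copied from the score charts in this notebook.
recall_train = {
    'Logistic regression': 0.980952,
    'Logistic regression tuned': 0.990476,
    'Decision Tree': 1.000000,
    'Decision Tree tuned': 0.952381,
    'Random Forest': 1.000000,
    'Random Forest tuned': 0.971429,
    'Naive Bayes': 0.942857,
    'Naive Bayes tuned': 0.942857,
}

THRESHOLD = 0.98  # models at or above this train recall are treated as overfitted

removed = [model for model, r in recall_train.items() if r >= THRESHOLD]
kept = [model for model, r in recall_train.items() if r < THRESHOLD]
print(kept)
# ['Decision Tree tuned', 'Random Forest tuned', 'Naive Bayes', 'Naive Bayes tuned']
```

Note that the cut-off also removes Logistic regression at 0.980952, so the rule is stricter than "train score equal to 1".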
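# The "def" line is not visible in this export; the function name below is assumed.
def select_best_models(df, metrics):
    best_models = {}
    for metric in metrics:
        # Highest test score for this metric
        max_test = df[metric + ' Test'].max()
        # First model, in row order, that reaches that score
        best_models[metric] = df[df[metric + ' Test'] == max_test].index[0]
    return best_models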
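The selection logic above can be checked by hand with the test scores of the four remaining models. The sketch below is a pandas-free version. The helper name `best_by_metric` is hypothetical, and ties are resolved in favour of the first model in row order, which is what `.index[0]` does in the code above.

```python
# Test-set scores of the remaining models, copied from the table above.
scores = {
    'Decision Tree tuned': {'Precision Test': 0.960784, 'Recall Test': 0.955556,
                            'Accuracy Test': 0.955556, 'F1 macro Test': 0.955093},
    'Random Forest tuned': {'Precision Test': 0.979167, 'Recall Test': 0.977778,
                            'Accuracy Test': 0.977778, 'F1 macro Test': 0.977692},
    'Naive Bayes': {'Precision Test': 0.979365, 'Recall Test': 0.977778,
                    'Accuracy Test': 0.977778, 'F1 macro Test': 0.977806},
    'Naive Bayes tuned': {'Precision Test': 0.979365, 'Recall Test': 0.977778,
                          'Accuracy Test': 0.977778, 'F1 macro Test': 0.977806},
}


def best_by_metric(scores, metrics):
    """For each metric, return the first model that reaches the highest test score."""
    best = {}
    for metric in metrics:
        column = metric + ' Test'
        top = max(row[column] for row in scores.values())
        best[metric] = next(name for name, row in scores.items() if row[column] == top)
    return best


best = best_by_metric(scores, ['Precision', 'Recall', 'Accuracy', 'F1 macro'])
print(best['Recall'])     # Random Forest tuned
print(best['Precision'])  # Naive Bayes
```

On recall, the tuned Random Forest and both Naive Bayes models are tied at 0.977778, so the result for that metric depends on row order rather than on a difference in score.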
I chose recall as the primary evaluation metric because correctly identifying each iris
species is critical to the project's objective. Selecting a model with a high recall score
means that as many flowers as possible are assigned to their true species, even if that comes
with some false positives. On that basis, I consider the tuned Random Forest the best choice
for this task.
In [ ]: # In this example, it's a data point with Sepal Length, Sepal Width, Petal Length, and Petal Width
x_rf = np.array([[5.1, 3.5, 1.4, 0.2]])
Iris-Setosa
Conclusion
In the Iris flower classification project, the tuned Random Forest model has been selected as
the final prediction model. The project aimed to classify Iris flowers into three distinct
species: Iris-Setosa, Iris-Versicolor, and Iris-Virginica. After extensive data exploration,
preprocessing, and model evaluation, the following conclusions can be drawn:
2. Data Preprocessing: Data preprocessing steps, including checking for missing and duplicate
values (none were found) and encoding the categorical Species column, were performed to
prepare the dataset for modeling.
3. Model Selection: After experimenting with various machine learning models, tuned
Random Forest was chosen as the final model because, once the models flagged as overfitted
were removed, it was among those with the highest recall on the test set.
4. Model Training and Evaluation: The Random Forest (tuned) model was trained on the
training dataset and evaluated using appropriate metrics. The model demonstrated
satisfactory accuracy and precision in classifying Iris species.
5. Challenges and Future Work: The project encountered challenges related to feature
engineering and model fine-tuning. Future work may involve exploring more advanced
modeling techniques to improve classification accuracy further.
6. Practical Application: The Iris flower classification model can be applied in real-world
scenarios, such as botany and horticulture, to automate the identification of Iris species
based on physical characteristics.
In conclusion, the Iris flower classification project successfully employed Random Forest
(tuned) as the final prediction model to classify Iris species. The project's outcomes have
practical implications in the field of botany and offer valuable insights into feature
importance for species differentiation. Further refinements and enhancements may lead to
even more accurate and reliable classification models in the future.