Iris Flower Classification Project
On
in
[…Name of discipline…]
By
Sona Halder (23SCSE1041009)
Dhruv Gupta (23SCSE1040993)
INDIA
June, 2025
Table of contents
3. Conclusion
Chapter-1: Introduction
Machine learning is now used almost everywhere, and its importance grows by
the day. From recommending what to buy to recognizing a person, many
applications, including robotics, rely on machine learning. In this project,
we’ll build the “Hello World” of machine learning: Iris flower
classification.
Iris flower classification is a very popular machine learning project. The iris
dataset contains three classes of flowers (Versicolor, Setosa, and Virginica),
and each sample is described by 4 features: ‘Sepal length’, ‘Sepal width’,
‘Petal length’, and ‘Petal width’. The aim of iris flower classification is to
predict a flower’s class from these features.
• Semi-supervised machine learning: Semi-supervised learning falls
between supervised and unsupervised learning. It uses a small amount of
labeled data and a large amount of unlabeled data.
3. Chatbot: Chatbots provide customer service without a human agent. A
chatbot takes a question from the user and returns an answer based on that
question.
In this project, we’ll solve the problem using a supervised learning approach.
We’ll use an algorithm called “Support vector machine”.
An SVM finds a separating line (hyperplane) between two classes. The SVM
algorithm finds the points closest to the line from both classes. These points
are known as support vectors. It then computes the distance between the line
and the support vectors. This distance is called the margin. The main goal is
to maximize the margin. The hyperplane with the maximum margin is known as the
optimal hyperplane.
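The idea of support vectors and margin can be illustrated with a minimal sketch. It assumes scikit-learn's SVC with a linear kernel (the project itself later calls SVC() with its default settings) and uses six made-up 2D points, so the data and the large C value are illustrative assumptions only.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy clusters (made-up points, not iris data)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# For a linear SVM the hyperplane is w . x + b = 0, and the distance
# from the hyperplane to the nearest points (the margin) is 1 / ||w||
w = clf.coef_[0]
margin = 1.0 / np.linalg.norm(w)

print("Support vectors:")
print(clf.support_vectors_)
print("Margin:", margin)
```

Only the points nearest the boundary appear in support_vectors_. Moving any other point, as long as it stays outside the margin, leaves the hyperplane unchanged.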
Chapter-2: Steps to classify iris flower
Steps to Classify Iris Flower:
• Pandas helps load data from various sources such as local storage,
databases, Excel files, and CSV files.
import pandas as pd

columns = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Class_labels']
# Load the data
df = pd.read_csv('iris.data', names=columns)
df.head()
• Next, we load the data using pd.read_csv() and set the column names
according to the iris data information.
• pd.read_csv() reads CSV files. CSV stands for comma-separated values.
• df.head() shows only the first 5 rows of the dataset.
This description summarizes each feature: the average length and width, the
minimum and maximum values, and the 25%, 50%, and 75% percentile values.
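A summary like the one described above can be produced with pandas' DataFrame.describe(). The sketch below assumes that this is the method behind the description. To stay self-contained, it loads the data from scikit-learn's bundled copy of the iris dataset rather than the iris.data file.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled iris data stands in for 'iris.data'
iris = load_iris(as_frame=True)
df = iris.data.copy()
df.columns = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width']

# count, mean, std, min, 25%, 50%, 75% and max for every numeric column
summary = df.describe()
print(summary)
```

The 50% row is the median of each feature, and the 25% and 75% rows are the first and third quartiles.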
• To visualize the whole dataset, we used the seaborn pair plot method. It
plots every pair of features against each other.
• From this visualization, we can tell that iris-setosa is well separated from
the other two flowers.
• Iris virginica is the longest flower and iris setosa is the shortest.
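import numpy as np

# X holds the four feature columns and Y the class labels
# Calculate the average of each feature for every class
Y_Data = np.array([np.average(X[:, i][Y==j].astype('float32'))
                   for i in range(X.shape[1])
                   for j in np.unique(Y)])
# Rows are features and columns are classes, then swap to one row per class
Y_Data_reshaped = Y_Data.reshape(4, 3)
Y_Data_reshaped = np.swapaxes(Y_Data_reshaped, 0, 1)
X_axis = np.arange(len(columns)-1)
width = 0.25
• Here we used two for clauses inside a single list comprehension. The outer
clause iterates over the features and the inner clause over the classes.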
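The ordering produced by the nested comprehension, and why the reshape and axis swap follow it, can be checked on a tiny example. The arrays below are made-up stand-ins with two features and two classes, not iris data.

```python
import numpy as np

# Toy stand-in for the iris arrays: 4 samples, 2 features, 2 classes
X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0], [7.0, 70.0]])
Y = np.array(["a", "a", "b", "b"])
classes = np.unique(Y)

# Outer clause: features. Inner clause: classes.
# The flat result is ordered feature 0 / class a, feature 0 / class b, ...
flat = np.array([np.average(X[:, i][Y == j])
                 for i in range(X.shape[1])
                 for j in classes])

by_feature = flat.reshape(X.shape[1], len(classes))  # rows = features
by_class = np.swapaxes(by_feature, 0, 1)             # rows = classes

print(by_class)
```

Each row of by_class now holds the per-feature averages for one class, which is the layout a grouped bar chart needs.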
• Here we can clearly see that virginica is the longest and setosa is the
shortest flower.
• Using train_test_split, we split the whole dataset into training and testing
sets. Later we’ll use the testing set to check the accuracy of the
model.
# Support vector machine algorithm
from sklearn.svm import SVC
svn = SVC()
svn.fit(X_train, y_train)
• After that, we feed the training dataset into the algorithm using the
svn.fit() method.
• Now we predict the classes of the test dataset using our trained
model.
• accuracy_score() takes the true values and the predicted values and returns
the fraction of predictions that are correct, a value between 0 and 1.
Output:
0.9666666666666667
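The split, fit, predict, and scoring steps above can be sketched end to end as follows. The 20% test size matches the 30 test samples in the classification report. The fixed random_state and the use of scikit-learn's bundled iris data (with integer labels instead of class names) are assumptions, so the accuracy printed here need not match the value reported above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled iris data stands in for 'iris.data'
X, Y = load_iris(return_X_y=True)

# Hold out 20% of the 150 samples (30 rows) for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

# Train the support vector classifier on the training set only
svn = SVC()
svn.fit(X_train, y_train)

# Predict the held-out samples and compare with the true labels
predictions = svn.predict(X_test)
acc = accuracy_score(y_test, predictions)
print(acc)
```

Without a fixed random_state the split changes on every run, so the reported accuracy can vary slightly from run to run.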
Now let’s see the detailed classification report based on the test dataset.
# A detailed classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         9
Iris-versicolor       1.00      0.83      0.91        12
 Iris-virginica       0.82      1.00      0.90         9

       accuracy                           0.93        30
      macro avg       0.94      0.94      0.94        30
   weighted avg       0.95      0.93      0.93        30
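The precision and recall columns of this report can be reproduced by hand. Precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP, and FN are the true positives, false positives, and false negatives for one class. The sketch below uses labels reconstructed to be consistent with the counts in the report (two versicolor samples predicted as virginica). They are not the project's actual test data.

```python
# Labels reconstructed to match the support, precision and recall above
y_true = (["Iris-setosa"] * 9
          + ["Iris-versicolor"] * 12
          + ["Iris-virginica"] * 9)
y_pred = (["Iris-setosa"] * 9
          + ["Iris-versicolor"] * 10 + ["Iris-virginica"] * 2
          + ["Iris-virginica"] * 9)


def precision_recall(label):
    """Return (precision, recall) for one class, counted from the label pairs."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp / (tp + fp), tp / (tp + fn)


for label in ("Iris-setosa", "Iris-versicolor", "Iris-virginica"):
    precision, recall = precision_recall(label)
    print(f"{label:<16} precision={precision:.2f} recall={recall:.2f}")
```

For Iris-versicolor this gives a precision of 10/10 and a recall of 10/12, and for Iris-virginica a precision of 9/11 and a recall of 9/9, which round to the values in the table.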
• Precision is the ratio of true positives to the sum of true positives
and false positives.
• Recall is the ratio of true positives to the sum of true positives and
false negatives.
• Here we take some random values based on the average plot to see if
the model can predict accurately.
Output:
The model appears to predict correctly: setosa is the shortest, virginica is
the longest, and versicolor falls between the two.
# Save the model
import pickle
with open('SVM.pickle', 'wb') as f:
    pickle.dump(svn, f)
# Load the model
with open('SVM.pickle', 'rb') as f:
    model = pickle.load(f)
model.predict(X_new)
• We can then load the model in any other program using pickle and
call model.predict to classify new iris data.
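The save and load round trip can be checked with a short self-contained sketch. It assumes scikit-learn's bundled iris data, writes to a temporary directory instead of the working directory, and uses hypothetical X_new measurements chosen near the per-class averages.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled iris data stands in for 'iris.data'
X, Y = load_iris(return_X_y=True)
svn = SVC()
svn.fit(X, Y)

# Hypothetical new samples: sepal length, sepal width, petal length, petal width
X_new = np.array([[5.0, 3.4, 1.5, 0.2],
                  [5.9, 2.8, 4.3, 1.3],
                  [6.6, 3.0, 5.6, 2.0]])

# Save the trained model to a temporary file
path = os.path.join(tempfile.mkdtemp(), "SVM.pickle")
with open(path, "wb") as f:
    pickle.dump(svn, f)

# Load it back, as another program would
with open(path, "rb") as f:
    model = pickle.load(f)

# The restored model should give the same predictions as the original
same = np.array_equal(svn.predict(X_new), model.predict(X_new))
print(same)
```

A pickled model should only be loaded from a trusted source, and ideally with the same scikit-learn version that was used to save it.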
Chapter-3: Conclusion
In this project, we explored the Iris Flower Classification using supervised
machine learning techniques. The main objective was to build a model that can
accurately classify iris flowers into one of three species: Setosa, Versicolor, or
Virginica, based on features such as sepal length, sepal width, petal length, and
petal width. We began by loading and understanding the dataset, followed by
thorough data preprocessing and exploration to identify patterns and
relationships among features. Data visualization techniques, including
histograms, pair plots, and correlation matrices, were used to better
understand the distribution and correlation of variables.