
Iris Flower

Classification Project
with Machine Learning

Dipali Mistry
01/11/2022

Contents

1. INTRODUCTION
   1.1 Problem statement
   1.2 Prepare the data
2. Methodology
   2.1 Pre-processing
      2.1.1 Exploratory Data Analysis
      2.1.2 Outlier Analysis
      2.1.3 Box plot grid
3. Model implementation
   3.1 Logistic Regression
   3.2 K-Nearest Neighbour (KNN)
   3.3 Support Vector Machine (SVM)
   3.4 Decision Trees
   3.5 Naive Bayes classifier
4. Conclusion

1. INTRODUCTION
Every machine learning project begins with understanding the data and defining the
objectives. While applying machine learning algorithms to your dataset, you are
understanding, building and analysing the data in order to reach the end result.

Following are the steps involved in creating a well-defined ML project:

1] Understand and define the problem


2] Prepare the data
3] Explore and Analyse the data
4] Apply the algorithms
5] Reduce the errors
6] Predict the result

To understand various machine learning algorithms, let us use the Iris dataset, one of
the most famous datasets available.

1.1 Problem statement

This dataset consists of the physical parameters of three species of iris flower:
Versicolor, Setosa and Virginica. The numeric parameters the dataset contains are
sepal width, sepal length, petal width and petal length. Using this data, we will
predict the species of each flower based on these parameters. The data consists of
continuous numeric values which describe the dimensions of the respective features,
and we will train the model on these features.

1.2 Prepare the data

Table 1.1: Sample Data (Columns: 1-16)

The dataset was created by Ronald Fisher in 1936. It contains the petal length, petal
width, sepal length and sepal width of 150 iris flowers from 3 different species. The
variables present in the given dataset are SepalLengthCm, SepalWidthCm, PetalLengthCm,
PetalWidthCm and Species.

The details of the variables present in the dataset are as follows:

SepalLengthCm - sepal length of the flower in centimetres
SepalWidthCm - sepal width of the flower in centimetres
PetalLengthCm - petal length of the flower in centimetres
PetalWidthCm - petal width of the flower in centimetres
Species - the species of the flower (Iris-setosa, Iris-versicolor or Iris-virginica), which is the target variable

Now, view the info of the data frame, which shows the column names, the count of
non-null values and the datatype of each column, along with the memory usage.
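A minimal sketch of this step, assuming the dataset is available as a CSV file named Iris.csv (the file name and path are assumptions; adjust as needed):

    import pandas as pd

    # Load the Iris data; the file name is an assumption.
    df = pd.read_csv("Iris.csv")

    # Column names, non-null counts, datatypes and memory usage.
    df.info()

    # A quick look at the first few rows.
    print(df.head())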

2. Methodology
2.1 Pre-processing

Any predictive modelling requires that we look at the data before we start
modelling. However, in data mining terms, looking at data means much more
than just looking: it means exploring the data, cleaning the data and
visualising it through graphs and plots. This is often called Exploratory
Data Analysis (EDA).

2.1.1 Exploratory Data Analysis

In exploring the data, we first check for missing values.

If there are any missing values, they should be handled before using the dataset;
for example, the fillna() method can be used to fill null values.
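A hedged sketch of this check, continuing with the df loaded above (the Iris dataset normally has no missing values, so the fillna() call is shown only for illustration):

    # Count missing values per column.
    print(df.isnull().sum())

    # If any nulls were present, one option is to fill numeric gaps with
    # the column mean; shown for illustration only.
    df = df.fillna(df.mean(numeric_only=True))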

Scatterplot of the Iris features

Bivariate scatterplots and univariate histograms in the same figure
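The report shows this figure as an image; one way to reproduce such a figure (the original code is not included, so the exact call is an assumption) is seaborn's pairplot:

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Bivariate scatterplots off the diagonal, univariate histograms on the
    # diagonal, coloured by species; the Id column, if present, is dropped.
    sns.pairplot(df.drop(columns=["Id"], errors="ignore"),
                 hue="Species", diag_kind="hist")
    plt.show()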

2.1.2 Outlier Analysis

Outlier analysis is done to handle inconsistent observations present in the given
dataset. Note that outlier analysis can only be performed on continuous variables.

Each feature is visualised in Seaborn through a boxplot, and a second plot
visualises a kernel density estimate of the underlying feature.
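A sketch of these two plots, reusing the seaborn/matplotlib imports above and assuming a violin plot is used for the kernel density view (the original figures are screenshots, so the feature shown here is an arbitrary choice):

    # Boxplot of one feature per species, plus a violin plot showing a
    # kernel density estimate of the same feature.
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    sns.boxplot(x="Species", y="PetalLengthCm", data=df, ax=axes[0])
    sns.violinplot(x="Species", y="PetalLengthCm", data=df, ax=axes[1])
    plt.tight_layout()
    plt.show()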

The Iris-setosa species is separated from the other two across all feature
combinations.

2.1.3. Box plot grid

Box plot grid

Andrews curves use the attributes of each sample as coefficients of a Fourier
series and then plot the resulting curves.
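A minimal sketch using pandas' built-in plotting helper, with the same df as above:

    from pandas.plotting import andrews_curves
    import matplotlib.pyplot as plt

    # Each sample's feature values become the coefficients of a Fourier
    # series; one curve is drawn per sample, coloured by species.
    andrews_curves(df.drop(columns=["Id"], errors="ignore"), "Species")
    plt.show()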

A parallel coordinates plot places each feature on a separate column and then draws
lines connecting the features for each data sample.

Radviz places each feature as a point on a 2D plane, and then simulates
having each sample attached to those points through a spring weighted
by the relative value for that feature.
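Both plots are available as pandas plotting helpers; a sketch, again dropping the Id column (an assumption about the CSV layout):

    from pandas.plotting import parallel_coordinates, radviz
    import matplotlib.pyplot as plt

    data = df.drop(columns=["Id"], errors="ignore")

    # One vertical axis per feature, one connecting line per sample.
    parallel_coordinates(data, "Species")
    plt.show()

    # Features as anchor points on a circle; each sample is pulled towards
    # them in proportion to its feature values.
    radviz(data, "Species")
    plt.show()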

3. Model implementation
Using some commonly used algorithms, we will train models and check how accurate each
algorithm is. We will implement and compare the following algorithms:
1] Logistic Regression
2] K – Nearest Neighbour (KNN)
3] Support Vector Machine (SVM)
4] Decision Trees
5] Naive Bayes classifier

3.1. Logistic Regression

We can start with the first algorithm, Logistic Regression. We can build our model as below:
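The report's model code appears only as screenshots; a minimal sketch with scikit-learn, assuming an 80/20 train/test split (the split ratio and parameters used in the report are not stated), might look like this:

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Features and target; column names follow the Kaggle Iris CSV.
    X = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]
    y = df["Species"]

    # Hold out a test set; the split ratio and random_state are assumptions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    logreg = LogisticRegression(max_iter=200)
    logreg.fit(X_train, y_train)
    print("Logistic Regression accuracy:",
          accuracy_score(y_test, logreg.predict(X_test)))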

3.2. K – Nearest Neighbour (KNN)
Now, let us see the scores with the K-Nearest Neighbours technique.
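A sketch reusing the train/test split and accuracy_score from the Logistic Regression example; the value of k is an assumption, as the report does not state it:

    from sklearn.neighbors import KNeighborsClassifier

    # k = 3 is an assumed value; the report does not specify the number of neighbours.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    print("KNN accuracy:", accuracy_score(y_test, knn.predict(X_test)))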

3.3. Support Vector Machine (SVM)
Thirdly, with SVM (Support Vector Machines).
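Again a sketch reusing the same split; the default RBF kernel is an assumption, as the report does not specify one:

    from sklearn.svm import SVC

    # Default RBF kernel and regularisation; the report does not state the settings used.
    svm = SVC()
    svm.fit(X_train, y_train)
    print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))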

3.4. Decision Trees
Next is the yes/no type of algorithm: decision trees.
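A sketch with the same split; the tree is left at its default settings, which is an assumption:

    from sklearn.tree import DecisionTreeClassifier

    # Default depth and splitting criterion; random_state fixed only for reproducibility.
    tree = DecisionTreeClassifier(random_state=42)
    tree.fit(X_train, y_train)
    print("Decision Tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))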

3.5. Naive Bayes classifier
And lastly, the Naive Bayes classifier, with several variants included.

The other types of Naive Bayes are also displayed inside the graph.
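A sketch comparing the four Naive Bayes variants named in the conclusion, again reusing the earlier split (all settings are scikit-learn defaults, which is an assumption):

    from sklearn.naive_bayes import (GaussianNB, MultinomialNB,
                                     BernoulliNB, ComplementNB)

    # The four variants compared in the report's conclusion.
    variants = {
        "Gaussian NB": GaussianNB(),
        "Multinomial NB": MultinomialNB(),
        "Bernoulli NB": BernoulliNB(),
        "Complement NB": ComplementNB(),
    }
    for name, model in variants.items():
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"{name} accuracy: {acc:.3f}")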

4. Conclusion
 Flower classification is a very important, simple, and basic project for any machine learning
student. Every machine learning student should be thorough with the iris flowers dataset.

 This classification can be done by many classification algorithms in machine learning, but in
this project we used Logistic Regression (accuracy: 0.98), K-Nearest Neighbour
(accuracy: 1.00), Support Vector Machine (accuracy: 1.00), Decision Trees (accuracy: 0.966) and
the Naive Bayes classifiers: Gaussian Naive Bayes (accuracy: 1.00), Multinomial Naive
Bayes (accuracy: 0.83), Bernoulli Naive Bayes (accuracy: 0.20) and
Complement Naive Bayes (accuracy: 0.567).
