Iris Dataset Project Report
Classification Project
with Machine Learning
Dipali Mistry
01/11/2022
Contents
1. INTRODUCTION
1.1 Problem statement
1.2 Prepare the data
2. Methodology
2.1 Pre Processing
2.1.1 Exploratory Data Analysis
2.1.2 Outlier Analysis
2.1.3 Box plot grid
3. Model implementation
3.1 Logistic Regression
3.2 K-Nearest Neighbour (KNN)
3.3 Support Vector Machine (SVM)
3.4 Decision Trees
3.5 Naive Bayes classifier
4. Conclusion
1. INTRODUCTION
Every machine learning project begins with understanding the data and defining
the objectives. While applying machine learning algorithms to a data set, you
are understanding, building and analysing the data in order to reach the end result.
3] Explore and Analyse the data
4] Apply the algorithms
5] Reduce the errors
6] Predict the result
To understand various machine learning algorithms, let us use the Iris data set, one of
the most famous datasets available.
This data set consists of the physical parameters of three species of iris flower:
Versicolor, Setosa and Virginica. The numeric parameters the dataset contains
are sepal width, sepal length, petal width and petal length. We will predict the
species of the flowers based on these parameters. The data consists of
continuous numeric values which describe the dimensions of the respective features,
and we will train the model on these features.
The dataset was created by Ronald Fisher in 1936. It contains the petal length,
petal width, sepal length and sepal width of 150 iris flowers from 3 different
species. The variables present in the given dataset are SepalLengthCm,
SepalWidthCm, PetalLengthCm, PetalWidthCm, and Species.
Now, view the info of the data frame, which shows details such as the count of
non-null values and each column's datatype along with the column names. It will
also show the memory usage.
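The inspection step above can be sketched as follows. This is a minimal sketch: `load_iris` from scikit-learn is used here as a stand-in for the report's CSV file, and the `Species` column is reconstructed from the numeric target.

```python
# Load the Iris data and inspect the resulting data frame.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")          # the four measurement columns
df["Species"] = iris.target_names[iris.target]  # map 0/1/2 back to species names

df.info()         # non-null counts, dtypes, column names, memory usage
print(df.head())  # first few rows as a sanity check
```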
2. Methodology
2.1 Pre Processing
Any predictive modeling requires that we look at the data before we start
modeling. However, in data mining terms, looking at data refers to much more
than just looking: it means exploring the data, cleaning the data, and
visualizing the data through graphs and plots. This is often called
Exploratory Data Analysis.
2.1.1 Exploratory Data Analysis
Bivariate scatterplots and univariate histograms are drawn in the same figure.
2.1.2 Outlier Analysis
Each feature is visualized in Seaborn through a boxplot.
The Iris-setosa species is separated from the other two across all feature
combinations.
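A sketch of one such boxplot, grouped by species; petal length is chosen here because it shows setosa's separation most clearly (the report may have plotted a different feature):

```python
# Boxplot of a single feature, grouped by species.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["Species"] = iris.target_names[iris.target]

ax = sns.boxplot(data=df, x="Species", y="petal length (cm)")
plt.savefig("petal_length_boxplot.png")
```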
2.1.3. Box plot grid
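The section title suggests one boxplot per feature arranged in a grid; a sketch under that assumption, with a 2x2 layout for the four measurements:

```python
# Grid of boxplots: one subplot per feature, grouped by species.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["Species"] = iris.target_names[iris.target]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, feature in zip(axes.flat, iris.feature_names):
    sns.boxplot(data=df, x="Species", y=feature, ax=ax)
fig.tight_layout()
fig.savefig("iris_boxplot_grid.png")
```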
A parallel coordinates plot places each feature on a separate column and then
draws lines connecting the feature values for each data sample.
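Pandas ships a helper for exactly this kind of plot; a sketch using it:

```python
# Parallel coordinates: one vertical axis per feature, one line per sample,
# coloured by the class column.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["Species"] = iris.target_names[iris.target]

ax = parallel_coordinates(df, class_column="Species")
plt.savefig("iris_parallel_coordinates.png")
```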
3. Model implementation
We will train our model with several commonly used algorithms and check how accurate
each one is. The algorithms implemented for comparison are:
1] Logistic Regression
2] K – Nearest Neighbour (KNN)
3] Support Vector Machine (SVM)
4] Decision Trees
5] Naive Bayes classifier
3.1. Logistic Regression
We start with the first algorithm, Logistic Regression, and can build the model as follows:
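A minimal sketch of the training step; the train/test split (40% held out, fixed random state) and `max_iter` setting are assumptions, as the report's exact parameters are not shown:

```python
# Train a Logistic Regression classifier on Iris and report test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

model = LogisticRegression(max_iter=200)  # extra iterations so the solver converges
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Logistic Regression accuracy: {acc:.2f}")
```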
3.2. K – Nearest Neighbour (KNN)
Now, let us see the scores with the K-Nearest Neighbours technique.
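A sketch of the KNN step; `n_neighbors=5` (scikit-learn's default) and the split are assumptions:

```python
# Train a K-Nearest Neighbours classifier on Iris and report test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # classify by majority vote of 5 neighbours
knn.fit(X_train, y_train)
acc = accuracy_score(y_test, knn.predict(X_test))
print(f"KNN accuracy: {acc:.2f}")
```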
3.3. Support Vector Machine (SVM)
Third, we train a Support Vector Machine (SVM).
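A sketch of the SVM step; the default RBF kernel and the split are assumptions:

```python
# Train a Support Vector Machine on Iris and report test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

svm = SVC()  # default RBF kernel
svm.fit(X_train, y_train)
acc = accuracy_score(y_test, svm.predict(X_test))
print(f"SVM accuracy: {acc:.2f}")
```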
3.4. Decision Trees
Next is a yes/no type of algorithm: decision trees.
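A sketch of the decision tree step; `random_state=0` is an assumption to make the tree reproducible:

```python
# Train a Decision Tree classifier on Iris and report test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

tree = DecisionTreeClassifier(random_state=0)  # fixed seed for reproducible splits
tree.fit(X_train, y_train)
acc = accuracy_score(y_test, tree.predict(X_test))
print(f"Decision Tree accuracy: {acc:.2f}")
```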
3.5. Naive Bayes classifier
And lastly, the Naive Bayes classifier, including its variants, which are
displayed together in the same graph.
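A sketch comparing the four variants named in the conclusion. Note that Multinomial, Bernoulli and Complement NB expect non-negative, ideally count-like features, so their scores on the continuous iris measurements are only indicative; the split is again an assumption:

```python
# Compare the Naive Bayes variants on Iris.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import (GaussianNB, MultinomialNB,
                                 BernoulliNB, ComplementNB)
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

scores = {}
for clf in (GaussianNB(), MultinomialNB(), BernoulliNB(), ComplementNB()):
    clf.fit(X_train, y_train)
    name = type(clf).__name__
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")
```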
4. Conclusion
Flower classification is a simple and fundamental project for any machine learning
student, and every machine learning student should be thorough with the Iris flowers
dataset. This classification can be done with many classification algorithms; in this
report we used Logistic Regression (accuracy: 0.98), K-Nearest Neighbour
(accuracy: 1.0), Support Vector Machine (accuracy: 1.0), Decision Trees
(accuracy: 0.966), and the Naive Bayes classifiers: Gaussian Naive Bayes
(accuracy: 1.0), Multinomial Naive Bayes (accuracy: 0.83), Bernoulli Naive Bayes
(accuracy: 0.20), and Complement Naive Bayes (accuracy: 0.567).