Classification of Flower Species Final

AN EFFICIENT CLASSIFICATION OF FLOWER
SPECIES USING KNN
A PROJECT REPORT
Submitted by
MANIMARAN R (Reg.No:2016104079)
MAREESWARAN S (Reg.No:2016104083)
MOHAMED ASHFALK A (Reg.No:2016104087)
In partial fulfillment for the award of the degree

of
BACHELOR OF ENGINEERING
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
APRIL 2019
SETHU INSTITUTE OF TECHNOLOGY

AN AUTONOMOUS INSTITUTION
AFFILIATED TO ANNA UNIVERSITY
PULLOOR, KARIAPATTI-626 115.
SETHU INSTITUTE OF TECHNOLOGY
AN AUTONOMOUS INSTITUTION
BONAFIDE CERTIFICATE
Certified that this technical project report “AN EFFICIENT
CLASSIFICATION OF FLOWER SPECIES USING KNN” is the bonafide
work of MANIMARAN.R (2016104079), MAREESWARAN.S (2016104083),
MOHAMED ASHFALK.A (2016104087), who carried out the
technical project work under my supervision.
SIGNATURE SIGNATURE
Mrs. Helina Rajini Suresh M.E.,(Ph.D) Mrs.S.Amalorpava Mary Rajee M.E., (Ph.D)
HEAD OF THE DEPARTMENT SUPERVISOR
Department of ECE Professor/Department of ECE
Sethu Institute of Technology Sethu Institute of Technology
Pulloor, Kariapatti-626 115 Pulloor, Kariapatti-626 115
Submitted for the 15UEC609 - Project Work End Semester Examination held at
Sethu Institute of Technology on …………………….
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGEMENT
First, we would like to thank God Almighty for giving us the talent and
opportunity to complete this project.
We wish to express our earnest gratitude to our honorable founder and
chairman Mr. S. MOHAMED JALEEL B.Sc., B.L., for the encouragement
extended to us to undertake this project.
We wish to express our sense of gratitude to our principal
Dr. A. SENTHIL KUMAR M.E., Ph.D., for the guidance, kind and
cooperative encouragement, inspiration and keen interest shown throughout the
project.
We would like to express our deep sense of gratitude to our Head of the
Department Mrs. HELINA RAJINI SURESH M.E., (Ph.D.), who extended
her heartiest encouragement, advice and valuable guidance throughout this project.
We would also like to acknowledge our deep sense of gratitude to our guide
Mrs. S. AMALORPAVA MARY RAJEE M.E., (Ph.D.), for her enthusiastic
inspiration, constant encouragement, sustained guidance and scholarly
advice imparted throughout the course of this project.
We thank our parents, faculty members, supporting staff
and friends for the support they extended during the time of our project.
ABSTRACT
Image classification is the ability of a machine learning model to assign an image to its
respective class using features learned from hundreds of images. Classification plays an
important role in sorting images into classes based on their similarities. Recently, with
advances in computer technology, classifying images according to their features has
attracted great interest in many areas such as digital libraries, search engines, and
content-based image retrieval systems. In this project, we look into one such image
classification problem, namely flower species classification, which is a hard problem
because there are hundreds of thousands of flower species around the world. Since machine
learning is all about learning from past data, a huge dataset of flower images is needed to
perform real-time flower species recognition. Without worrying too much about real-time
flower classification, we show how to perform a simple image classification task using
computer vision and machine learning algorithms in Python. We use a simple machine
learning algorithm, the KNN classifier, to classify the flower species.
TABLE OF CONTENTS
CHAPTER NO TITLE
ABSTRACT
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 MACHINE LEARNING
1.1.1 PURPOSE OF ML
1.1.2 STEPS TO PERFORM ML ALGORITHMS
1.1.3 TYPES OF ML ALGORITHMS
1.1.4 SUPERVISED LEARNING ALGORITHM
1.1.5 CLASSIFICATION TECHNIQUE
1.1.6 APPLICATIONS OF ML ALGORITHMS
2 KNN CLASSIFIER ALGORITHM
2.1 WHEN DO WE USE KNN ALGORITHM?
2.2 WORKING PRINCIPLE OF KNN
2.3 HOW DO WE CHOOSE THE FACTOR K?
2.4 ADVANTAGES OF KNN ALGORITHM
2.5 LIMITATIONS TO KNN ALGORITHM
3 LITERATURE SURVEY
4 PROPOSED WORK
4.1 LIBRARIES AND PACKAGES
4.2 FLOWER DATASET
4.3 FLOWER IMAGE RESIZING
4.4 LABEL ENCODING
4.5 TRAIN TEST SPLIT
4.6 TRAINING AND PREDICTIONS
5 SOFTWARE DESCRIPTION
5.1 INTRODUCTION
5.1.1 PYTHON
6 RESULT AND DISCUSSION
7 CONCLUSION
REFERENCES
LIST OF ABBREVIATIONS
KNN - K-Nearest Neighbors
ML - Machine Learning
CHAPTER 1
INTRODUCTION
Nature has many different kinds of flowers, and some of them share similar features.
For example, many flowers share the red color. On the other hand, these red flowers
differ in other features; red flowers do not necessarily share the same shape. These
similarities and differences highlight the difficulty of identifying each flower species
automatically. Traditionally, flower recognition is done by a botanist, who faces many
challenges in this task. This project aims at providing an automated system that detects
and recognizes flower species using a machine learning algorithm. An automated flower
recognition method has many benefits, such as fast recognition for educational purposes,
since an automated method accelerates the learning process. Automated flower recognition
gives people with limited experience in flower species the ability to recognize the
species of a flower, saving time and effort.
1.1 MACHINE LEARNING
Data science, machine learning and artificial intelligence are some of the top trending
topics in the tech world today. Data mining and Bayesian analysis are also trending, which
adds to the demand for machine learning. Machine learning is a discipline that deals with
programming the systems so as to make them automatically learn and improve with
experience. Here, learning implies recognizing and understanding the input data and taking
informed decisions based on the supplied data. It is very difficult to consider all the
decisions based on all possible inputs. To solve this problem, algorithms are developed that
build knowledge from a specific data and past experience by applying the principles of
statistical science, probability, logic, mathematical optimization, reinforcement learning,
and control theory.
1.1.1 PURPOSE OF ML
Machine learning can be seen as a branch of Artificial Intelligence (AI), since the
ability to change experience into expertise or to detect patterns in complex data is a mark
of human or animal intelligence. As a field of science, machine learning shares common
concepts with other disciplines such as statistics, information theory, game theory, and
optimization. As a subfield of information technology, its objective is to program machines
so that they will learn. However, the purpose of machine learning is
not to build an automated duplicate of intelligent behavior, but to use the power of
computers to complement and supplement human intelligence. For example, machine
learning programs can scan and process huge databases detecting patterns that are beyond
the scope of human perception. In the real world, we usually come across lots of raw data
which is not fit to be readily processed by machine learning algorithms. We need to
preprocess the raw data before it is fed into various machine learning algorithms.
1.1.2 STEPS TO PERFORM ML ALGORITHMS

A machine learning project involves the following steps
• Defining a Problem
• Preparing Data
• Evaluating Algorithms
• Improving Results
• Presenting Results
The best way to get started using Python for machine learning is to work through a
project end-to-end and cover the key steps like loading data, summarizing data, evaluating
algorithms and making some predictions. This gives you a replicable method that can be
used dataset after dataset. You can also add further data and improve the results.
Fig-1 : Block diagram of steps involved in ML
1.1.3 TYPES OF ML ALGORITHMS

There are four categories of machine learning algorithms as shown below.
• Supervised learning algorithm
• Unsupervised learning algorithm
• Semi-supervised learning algorithm
• Reinforcement learning algorithm
However, the most commonly used ones are supervised and unsupervised learning.
1.1.4 SUPERVISED LEARNING ALGORITHM

Supervised learning is commonly used in real world applications, such as face and
speech recognition, products or movie recommendations, and sales forecasting.
Supervised learning can be further divided into two types.
1. Regression
2. Classification
Regression trains on and predicts a continuous-valued response, for example
predicting real estate prices.
Classification attempts to find the appropriate class label, such as analyzing
positive/negative sentiment, male and female persons, benign and malignant tumors,
secure and unsecure loans etc.
In supervised learning, learning data comes with descriptions, labels, targets or
desired outputs, and the objective is to find a general rule that maps inputs to outputs. This
kind of learning data is called labeled data. The learned rule is then used to label new data
with unknown outputs. Supervised learning involves building a machine learning model
that is based on labeled samples. For example, if we build a system to estimate the price
of a plot of land or a house based on various features, such as size, location, and so on, we
first need to create a database and label it. We need to teach the algorithm what features
correspond to what prices. Based on this data, the algorithm will learn how to calculate the
price of real estate using the values of the input features. Supervised learning deals with
learning a function from available training data.
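As a minimal sketch of learning a function from labeled data, the snippet below fits a straight line to a handful of made-up (size, price) pairs and then applies the learned rule to a new plot. The numbers, the use of a single size feature, and the choice of a least-squares line via numpy.polyfit are illustrative assumptions and are not part of this project.

```python
import numpy as np

# Hypothetical labeled data: plot size (square metres) -> price (arbitrary units).
sizes = np.array([50.0, 80.0, 100.0, 120.0])
prices = np.array([110.0, 170.0, 210.0, 250.0])  # generated as 2 * size + 10

# "Training": find the line price = slope * size + intercept with least squared error.
slope, intercept = np.polyfit(sizes, prices, deg=1)

def predict_price(size):
    """Apply the learned rule to a new, unlabeled input."""
    return slope * size + intercept

new_price = predict_price(90.0)
print(round(new_price, 2))
```

The same pattern (fit on labeled samples, then predict on unseen inputs) carries over to classification, where the output is a class label instead of a number.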
Here, a learning algorithm analyzes the training data and produces a derived function
that can be used for mapping new examples. There are many supervised learning
algorithms such as Logistic Regression, Neural networks, Support Vector Machines
(SVMs), KNN classifiers and Naive Bayes classifiers. Common examples of supervised
learning include classifying e-mails into spam and not-spam categories, labeling webpages
based on their content, voice recognition and image classification.
1.1.5 CLASSIFICATION TECHNIQUE

Classification is a machine learning technique that uses known data to determine
how the new data should be classified into a set of existing categories.
Consider the following example to understand the classification technique. A credit
card company receives tens of thousands of applications for new credit cards. These
applications contain information about several different features like age, location, sex,
annual salary, credit record etc. The task of the algorithm here is to classify the card
applicants into categories like those who have good credit record, bad credit record and
those who have a mixed credit record.
While classifying a given set of data, the classifier system performs the following
actions.
• Initially a new data model is prepared using any of the learning algorithms.
• Then the prepared data model is tested.
• Later, this data model is used to examine the new data and to determine its class.
Classification, also called categorization, is a machine learning technique that uses
known data to determine how the new data should be classified into a set of existing
labels/classes/categories.
1.1.6 APPLICATIONS OF ML ALGORITHMS

The developed machine learning algorithms are used in various applications such as
given below.
• Vision processing
• Language processing
• Forecasting things like stock market trends, weather
• Pattern recognition
• Games
• Data mining
• Expert systems
• Robotics
CHAPTER 2
KNN CLASSIFIER ALGORITHM
KNN is a non-parametric, lazy learning algorithm. Non-parametric means that it makes
no assumption about the underlying data distribution. In other words, the model structure
is determined from the dataset. This is very helpful in practice, where most real-world
datasets do not follow theoretical mathematical assumptions. Lazy means that the
algorithm does not use the training data points to build a model in advance; all the
training data is used in the testing phase. This makes training faster and the testing phase
slower and costlier, in terms of both time and memory. In the worst case, KNN needs
to scan all data points for every prediction, and all training data must be kept in memory.
2.1 WHEN DO WE USE KNN ALGORITHM?
KNN can be used for both classification and regression predictive problems. However,
it is more widely used in classification problems in the industry. To evaluate any technique,
we generally look at three important aspects:
1. Ease of interpreting the output
2. Calculation time
3. Predictive power
Placed on this scale against other techniques, the KNN algorithm fares well across all
of these parameters. It is commonly used for its ease of interpretation and low
calculation time.
2.2 WORKING PRINCIPLE OF KNN
In KNN, K is the number of nearest neighbors. The number of neighbors is the core
deciding factor. K is generally chosen to be an odd number when the number of classes is 2,
so that the vote cannot end in a tie. When K=1, the algorithm is known as the nearest
neighbor algorithm. This is the simplest case. Suppose P1 is the point whose label needs
to be predicted. First, you find the one closest point to P1, and then the label of that
nearest point is assigned to P1.
For general K, you first find the k closest points to P1 and then classify P1 by a majority
vote of its k neighbors. Each neighbor votes for its class, and the class with the most
votes is taken as the prediction. To find the closest points, you compute the distance
between points using a distance measure such as Euclidean distance, Hamming distance,
Manhattan distance or Minkowski distance.
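The distance measures named above can be sketched in a few lines. This is an illustrative NumPy version written for this report, not the implementation used inside any particular library; the function names are our own.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

def euclidean(a, b):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(a, b, 2)

def manhattan(a, b):
    """Manhattan distance is the Minkowski distance with p = 1."""
    return minkowski(a, b, 1)

def hamming(a, b):
    """Hamming distance: number of positions at which two sequences differ."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

p1 = [0, 0]
p2 = [3, 4]
print(euclidean(p1, p2))   # 5.0
print(manhattan(p1, p2))   # 7.0
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```

Euclidean and Manhattan distance suit numeric features such as pixel intensities, while Hamming distance is meant for categorical or binary features.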
KNN has the following basic steps:
• Calculate distance
• Find closest neighbors
• Vote for labels
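The three steps above can be sketched as a small function. This is a minimal illustration that assumes Euclidean distance and a toy 2-D dataset with hypothetical labels; it is not the Scikit-learn implementation used later in this report.

```python
from collections import Counter
import numpy as np

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    train_points = np.asarray(train_points, dtype=float)
    query = np.asarray(query, dtype=float)
    # Step 1: calculate the Euclidean distance from the query to every training point.
    distances = np.linalg.norm(train_points - query, axis=1)
    # Step 2: find the indices of the k closest neighbors.
    nearest = np.argsort(distances)[:k]
    # Step 3: vote for labels and return the most common one.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data with hypothetical labels, for illustration only.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels = ["rose", "rose", "rose", "sunflower", "sunflower", "sunflower"]

print(knn_predict(points, labels, [1.1, 1.0], k=3))  # rose
print(knn_predict(points, labels, [5.1, 5.0], k=3))  # sunflower
```

Note that every prediction scans the whole training set, which is the cost of the lazy approach described earlier. In this sketch a tied vote is resolved in favor of the label that was encountered first among the nearest neighbors.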
2.3 HOW DO WE CHOOSE THE FACTOR K?

First, let us try to understand what exactly K influences in the algorithm. Following
are the different boundaries separating the two classes for different values of K.
If you look carefully, you can see that the boundary becomes smoother with increasing
value of K. With K increasing to infinity, it finally becomes all blue or all red depending on
the total majority. The training error rate and the validation error rate are two quantities
we need to assess for different values of K. Following is the curve for the training error rate
with varying value of K.
As you can see, the error rate at K=1 is always zero for the training sample. This is because
the closest point to any training data point is itself. Hence, the prediction is always accurate
with K=1. If the validation error curve had been similar, our choice of K would have
been 1. Following is the validation error curve with varying value of K.
This makes the story clearer. At K=1, we were overfitting the boundaries. As K grows, the
validation error rate initially decreases and reaches a minimum. After the minimum, it
increases with increasing K. To get the optimal value of K, you can split the initial dataset
into training and validation sets, and then plot the validation error curve to find the K with
the lowest validation error. This value of K should be used for all predictions.
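The procedure for choosing K can be sketched as follows. The data here is synthetic (generated with Scikit-learn's make_classification) and only stands in for real feature vectors, so the chosen K says nothing about the flower dataset; the range of K values tried is likewise an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for image feature vectors (an assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_redundant=0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

train_errors, val_errors = {}, {}
for k in range(1, 22, 2):  # odd values of K only, to avoid tied votes
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_errors[k] = 1.0 - model.score(X_train, y_train)
    val_errors[k] = 1.0 - model.score(X_val, y_val)

# Pick the K with the lowest validation error (the smallest such K on ties).
best_k = min(val_errors, key=lambda k: (val_errors[k], k))
print("training error at K=1:", train_errors[1])
print("best K on validation data:", best_k)
```

Plotting train_errors and val_errors against K would reproduce the two kinds of curves discussed in this section; the training error at K=1 is zero because each training point is its own nearest neighbor.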
2.4 ADVANTAGES OF KNN ALGORITHM
• KNN is simple to implement.
• KNN executes quickly for small training data sets.
• Performance asymptotically approaches the performance of the Bayes
Classifier.
• No prior knowledge about the structure of the data in the training set is needed.
• No retraining is required if a new training pattern is added to the existing
training set.
2.5 LIMITATIONS TO KNN ALGORITHM

• When the training set is large, it may take a lot of space.
• For every test data, the distance should be computed between test data and all the
training data. Thus a lot of time may be needed for the testing.
CHAPTER 3
LITERATURE SURVEY
[1] D.S. Guru, Y.H. Sharath, S. Manjunath, “Texture Features and KNN in Classification of
Flower Images”, IEEE 2016.
In this paper, we propose an algorithmic model for automatic classification of flowers using
KNN classifier. The proposed algorithmic model is based on textural features such as Gray
level co-occurrence matrix and Gabor responses. A flower image is segmented using a
threshold based method. The data set has different flower species with similar appearance
(small inter class variations) across different classes and varying appearance (large intra
class variations) within a class. Also, the images of flowers are of different pose with
cluttered background under varying lighting conditions and climatic conditions. The flower
images were collected from World Wide Web in addition to the photographs taken up in a
natural scene. Experimental Results are presented on a dataset of 1250 images consisting
of 25 flower species. It is shown that relatively a good performance can be achieved, using
KNN classifier algorithm. A qualitative comparative analysis of the proposed method with
other well known existing flower classification methods is also presented.
[2] Jinho Kim, Byung-Soo Kim, Silvio Savarese, “Comparing Image Classification
Methods: K-Nearest-Neighbor and Support-Vector-Machines”, IEEE 2017.
In order for a robot or a computer to perform tasks, it must recognize what it is looking at.
Given an image a computer must be able to classify what the image represents. While this
is a fairly simple task for humans, it is not an easy task for computers. Computers must go
through a series of steps in order to classify a single image. In this paper, we used a general
Bag of Words model in order to compare two different classification methods. Both K-
Nearest-Neighbor (KNN) and Support-Vector-Machine (SVM) classification are well
known and widely used. We were able to observe that the SVM classifier outperformed the
KNN classifier. For future work, we hope to use more categories for the objects and to use
more sophisticated classifiers.
[3] Surbi Saxena, D.R. Ochawar, “PERFORMANCE ANALYSIS OF IMAGE
CLASSIFICATION ALGORITHMS”, IEEE 2016.
Image classification plays an integral role in computer vision. Given an image a computer
must be able to classify what the image represents. While this is a fairly simple task for
humans, it is not an easy task for computers. Computers must go through a series of steps
in order to classify a single image. In this paper, we used a saliency based segmentation
process to compare two different classification methods. Both K-NearestNeighbor (KNN)
and Support-Vector-Machine (SVM) classification are designed using VHDL code. We
were able to observe that the SVM classifier outdoes the KNN classifier. For future work,
we hope to use more categories for the objects and to use more sophisticated classifiers.
[4] Bo Sun, Jumping Nu, Tian Geo, “Study on the improvement of K-Nearest-Neighbour
Algorithm”, IEEE 2017.
As one of the instance-based learning methods, the K-nearest-neighbor (KNN) algorithm
has been widely used in many fields. This paper improves the algorithm in
two respects. First, to improve the efficiency of classification, some of the
computations that occur in the classifying period are moved to the training period, which
greatly reduces the computational cost. Second, to improve the accuracy of classification,
the contribution of different attributes is taken into account and the optimal attribute weight
sets are obtained using the quadratic programming method. Finally, the paper validates
the improvements through practical experiments.
[5] Tanakorn Tiay, Pipimphorn Benyaphaichit, Panomkhawn Riyamongkol, “Flower
recognition system based on image processing”, IEEE 2016.
In this work, a flower recognition system based on image processing has been developed. This
system uses edge and color characteristics of flower images to classify flowers. Hu's
seven-moment algorithm is applied to acquire edge characteristics. Red, green, blue, hue, and
saturation characteristics are derived from histograms. K-nearest neighbor is used to
classify flowers. The accuracy of this system is more than 80%.
[6] Riddhi H. Shaparia, Dr Narendra M. Patel, Zankhana H. Shah, “Flower Classification
using Texture and Color Features”, IEEE 2016.
In this research paper, we have used texture and color features for flower classification.
A standard database of flowers has been used for the experiments. Preprocessing steps such
as noise removal and segmentation for elimination of the background are applied to the input images.
Texture and color features are extracted from the segmented images. Texture feature is
extracted using GLCM (Gray Level Co-occurrence Matrix) method and color feature is
extracted using Color moment. For classification, neural network classifier is used. The
overall accuracy of the system is 95.0 %.
CHAPTER 4
PROPOSED WORK
4.1 LIBRARIES AND PACKAGES
To understand machine learning, you need to have basic knowledge of Python
programming. In addition, a number of libraries and packages are generally used in
performing various machine learning tasks, as listed below.
• Scikit-learn - A Python library that provides many unsupervised and
supervised learning algorithms. It is used for data analysis and data
mining tasks, and its API is simple, efficient and easy to use.
• Imutils - A series of convenience functions that make basic image processing
operations such as translation, rotation, resizing, skeletonization, displaying
Matplotlib images, sorting contours and detecting edges much easier
with OpenCV.
• argparse - The recommended command-line argument parsing module in the
Python standard library.
• OS - The os module in Python provides a way of using operating system
dependent functionality. The functions that the os module provides allow
you to interface with the underlying operating system that Python is running
on, be that Windows, Linux or Mac.
• OpenCV - Open Source Computer Vision Library (OpenCV) is an open source
computer vision and machine learning software library. OpenCV was built to
provide a common infrastructure for computer vision applications.
• NumPy - NumPy is the fundamental package for scientific computing with
Python. Among other things, it provides a powerful N-dimensional array
object.
4.2 FLOWER DATASET
We use a flower image dataset for our KNN example. The dataset
consists of two flower species, each with 80 images, so a total of
160 flower images are used in this project. The task is to predict the species to which
each flower image belongs.
4.3 FLOWER IMAGE RESIZING

Image resizing is commonly applied in the literature before image segmentation. In work
that prepared image data for Grab-cut segmentation, the input image was resized to half
its dimensions, and it was noted that resizing gives faster processing.
To get satisfying segmentation results, a re-sizing value must be found that suits the
region growing segmentation deployed in the proposed system. The re-size value was
determined empirically to achieve fast and complete segmentation with an accurate
foreground boundary; the optimal value is 350 rows, while the number of columns is
calculated automatically to preserve the aspect ratio.
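The resizing rule described above (a fixed 350 rows, with the number of columns derived from the aspect ratio) can be sketched as below. This is a NumPy-only illustration with nearest-neighbor sampling and helper names of our own; in practice a library routine such as OpenCV's cv2.resize would normally be used, and the dummy image is a placeholder for a real photograph.

```python
import numpy as np

def target_size(height, width, rows=350):
    """Return (rows, cols) for a resize to `rows` rows that preserves the aspect ratio."""
    scale = rows / float(height)
    cols = max(1, int(round(width * scale)))
    return rows, cols

def resize_nearest(image, rows=350):
    """Resize an H x W (x C) array to `rows` rows using nearest-neighbor sampling."""
    height, width = image.shape[:2]
    new_rows, new_cols = target_size(height, width, rows)
    # Map every output row/column back to a source row/column.
    row_idx = np.minimum((np.arange(new_rows) * height / new_rows).astype(int), height - 1)
    col_idx = np.minimum((np.arange(new_cols) * width / new_cols).astype(int), width - 1)
    return image[row_idx][:, col_idx]

# A dummy 700 x 1000 RGB image stands in for a flower photograph.
dummy = np.zeros((700, 1000, 3), dtype=np.uint8)
resized = resize_nearest(dummy, rows=350)
print(resized.shape)  # (350, 500, 3)
```

Bringing all images to the same size also means that every image yields a feature vector of the same length, which a distance-based classifier such as KNN requires.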
4.4 LABEL ENCODING

In supervised learning, we mostly come across a variety of labels which can be in the
form of numbers or words. If they are numbers, then they can be used directly by the
algorithm. However, many times, labels need to be in readable form. Hence, the training
data is usually labelled with words.
Label encoding refers to changing the word labels into numbers so that the algorithms
can understand how to work on them.
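A minimal sketch of label encoding with Scikit-learn's LabelEncoder is shown below. The species names are hypothetical placeholders, since the actual labels of the dataset are not listed in this report.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical word labels, one per training image.
labels = ["rose", "sunflower", "rose", "sunflower", "rose"]

le = LabelEncoder()
encoded = le.fit_transform(labels)

print(le.classes_.tolist())   # ['rose', 'sunflower']
print(encoded.tolist())       # [0, 1, 0, 1, 0]
# The mapping can be inverted to get readable labels back.
print(le.inverse_transform([1, 0]).tolist())  # ['sunflower', 'rose']
```

The encoder assigns integers in sorted order of the class names, and keeping the fitted encoder around makes it possible to report predictions with readable species names.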
4.5 TRAIN TEST SPLIT

To avoid over-fitting, we will divide our dataset into training and test splits, which
gives us a better idea as to how our algorithm performed during the testing phase. This way
our algorithm is tested on un-seen data, as it would be in a production application.
To create training and test splits, execute the following script:
from sklearn.model_selection import train_test_split
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.25, random_state=42)
The above script splits the dataset into 75% train data and 25% test data. This means that
out of a total of 100 records, the training set will contain 75 records and the test set
the remaining 25.
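To make the split sizes concrete, the sketch below applies the same call to placeholder data with the size described for this project (160 images, two species). The random feature vectors and the stratify argument are assumptions added for illustration; stratify is not part of the script above and simply keeps the two species balanced in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the size described for this project: 160 images, 2 species.
# Each "image" is a random feature vector here; real features would come from the photos.
rng = np.random.default_rng(42)
data = rng.random((160, 32))
labels = np.array([0] * 80 + [1] * 80)

trainX, testX, trainY, testY = train_test_split(
    data, labels, test_size=0.25, random_state=42, stratify=labels)

print(trainX.shape, testX.shape)                # (120, 32) (40, 32)
print(np.bincount(trainY), np.bincount(testY))  # [60 60] [20 20]
```

Fixing random_state makes the split reproducible, so that repeated runs evaluate the classifier on the same unseen records.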
4.5.1 TRAIN DATA
The observations in the training set form the experience that the algorithm uses to
learn. In supervised learning problems, each observation consists of an observed output
variable and one or more observed input variables.
4.5.2 TEST DATA
The test set is a set of observations used to evaluate the performance of the model
using some performance metric. It is important that no observations from the training set
are included in the test set. If the test set does contain examples from the training set, it
will be difficult to assess whether the algorithm has learned to generalize from the training
set or has simply memorized it.
4.6 TRAINING AND PREDICTIONS

It is extremely straightforward to train the KNN algorithm and make predictions with
it, especially when using Scikit-Learn.
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=1)
model.fit(trainX, trainY)
The first step is to import the KNeighborsClassifier class from
the sklearn.neighbors library. In the second line, this class is initialized with one parameter,
i.e. n_neighbors. This is the value of K. There is no ideal value for K; it
is selected after testing and evaluation. The script above uses K=1, although 5 is a
commonly used starting value for the KNN algorithm.
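The final step is to make predictions on our test data. To do so, execute the following
script, where le is assumed to be the label encoder fitted during label encoding:
from sklearn.metrics import classification_report
print(classification_report(testY, model.predict(testX), target_names=le.classes_))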
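The steps of this chapter can be put together in one self-contained sketch. Since the flower photographs are not available here, the example generates small synthetic "images" whose species differ only in average color; the species names, colors, image size and raw-pixel features are all assumptions made for illustration, so the resulting scores do not describe the real dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)

def fake_image(mean_color):
    """Create an 8 x 8 RGB array of noisy pixels around `mean_color` (synthetic data)."""
    pixels = rng.normal(loc=mean_color, scale=10.0, size=(8, 8, 3))
    return np.clip(pixels, 0, 255)

# Two hypothetical species with clearly different average colors, 80 images each.
species_colors = {"rose": (200, 40, 60), "sunflower": (230, 200, 30)}
images, names = [], []
for name, color in species_colors.items():
    for _ in range(80):
        images.append(fake_image(color))
        names.append(name)

# Flatten each image into a raw-pixel feature vector and encode the word labels.
data = np.array([img.flatten() for img in images])
le = LabelEncoder()
labels = le.fit_transform(names)

trainX, testX, trainY, testY = train_test_split(
    data, labels, test_size=0.25, random_state=42)

model = KNeighborsClassifier(n_neighbors=1)
model.fit(trainX, trainY)
predictions = model.predict(testX)

print(classification_report(testY, predictions, target_names=le.classes_))
accuracy = accuracy_score(testY, predictions)
```

Because the two synthetic classes are very well separated, this toy run classifies every test image correctly; real photographs with varied backgrounds and lighting are a much harder case.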
CHAPTER 5
SOFTWARE DESCRIPTION
5.1 INTRODUCTION
5.1.1 PYTHON
Python is a widely used general-purpose, high-level programming language. It was
initially designed by Guido van Rossum in 1991 and is developed by the Python Software
Foundation. It was designed with an emphasis on code readability, and its syntax allows
programmers to express concepts in fewer lines of code. Python is a programming language
that lets you work quickly and integrate systems more efficiently.
It is used for:
• Web development (Server side)
• Software development
• Mathematics
• System scripting
Python is a popular platform for research and for the development of production
systems. It is a vast language with a number of modules, packages and libraries that provide
multiple ways of achieving a task.
Python and its libraries such as NumPy, SciPy, Scikit-Learn and Matplotlib are used in
data science and data analysis. They are also extensively used for creating scalable
machine learning algorithms. These libraries implement popular machine learning techniques
such as classification, regression, recommendation and clustering.
Python offers ready-made frameworks for performing data mining tasks on large
volumes of data effectively in less time. They include implementations of
algorithms such as linear regression, logistic regression, Naïve Bayes, k-means,
K-nearest neighbors and Random Forest.
CHAPTER 6
RESULT AND DISCUSSION

The proposed work is evaluated over a database of 400 images of flowers, with
each flower species represented by 80 images. A detailed study was
carried out to investigate the use of advances in machine learning for the detection of
flower species. A system for detecting flower species has been developed using
Python. The image data of the flower species were collected using a digital
camera. Algorithms for classification based on a computer vision platform were designed.
OUTPUT
ACCURACY, PRECISION AND RECALL
Consider a classification task in which a machine learning system observes tumors
and has to predict whether these tumors are benign or malignant. Accuracy, or the fraction
of instances that were classified correctly, is an obvious measure of the program's
performance. While accuracy does measure the program's performance, it does not make
distinction between malignant tumors that were classified as being benign, and benign
tumors that were classified as being malignant. In some applications, the costs incurred on
all types of errors may be the same. In this problem, however, failing to identify malignant
tumors is a more serious error than classifying benign tumors as being malignant by
mistake.
We can measure each of the possible prediction outcomes to create different
snapshots of the classifier's performance. When the system correctly classifies a tumor as
being malignant, the prediction is called a true positive. When the system incorrectly
classifies a benign tumor as being malignant, the prediction is a false positive. Similarly,
a false negative is an incorrect prediction that the tumor is benign, and a true negative is
a correct prediction that a tumor is benign. These four outcomes can be used to calculate
several common measures of classification performance, like accuracy, precision, recall
and so on.
Accuracy is calculated with the following formula −
ACC = (TP + TN)/(TP + TN + FP + FN)
Where,
TP is the number of true positives
TN is the number of true negatives
FP is the number of false positives
FN is the number of false negatives.

Precision is the fraction of the tumors that were predicted to be malignant that are
actually malignant. Precision is calculated with the following formula.
PREC = TP/(TP + FP)
Recall is the fraction of malignant tumors that the system identified. Recall is
calculated with the following formula.
R = TP/(TP + FN)
In this example, precision measures the fraction of tumors that were predicted to be
malignant that are actually malignant. Recall measures the fraction of truly malignant
tumors that were detected. The precision and recall measures could reveal that a classifier
with impressive accuracy actually fails to detect most of the malignant tumors. If most
tumors are benign, even a classifier that never predicts malignancy could have high
accuracy. A different classifier with lower accuracy and higher recall might be better
suited to the task, since it will detect more of the malignant tumors. Many other
performance measures for classification can also be used.
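The three formulas above can be checked with a short sketch. The outcome counts below are hypothetical numbers chosen to illustrate the tumor example, and returning 0.0 when a denominator is zero is a convention adopted only for this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Precision and recall are undefined when their denominator is zero;
    # returning 0.0 in that case is a convention chosen for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts: 100 tumors, 10 malignant, classifier always predicts "benign".
print(classification_metrics(tp=0, tn=90, fp=0, fn=10))   # (0.9, 0.0, 0.0)

# Hypothetical counts for a classifier that flags more tumors as malignant.
print(classification_metrics(tp=9, tn=70, fp=20, fn=1))   # accuracy 0.79, recall 0.9
```

The first classifier reaches 90% accuracy while detecting no malignant tumor at all, whereas the second has lower accuracy but finds nine of the ten malignant cases, which is the trade-off described in the paragraph above.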
The results show that our KNN algorithm was able to classify all the 30 records in
the test set with 100% accuracy, which is excellent. Although the algorithm performed very
well on this dataset, the same results should not be expected in all applications.
KNN does not always perform as well with high-dimensional or categorical features.
CHAPTER 7
CONCLUSION
KNN is a simple yet powerful classification algorithm. It requires no training before
making predictions, and training is typically one of the most difficult parts of a machine
learning algorithm. The KNN algorithm has been widely used for finding document
similarity and for pattern recognition. It has also been employed for developing
recommender systems and for dimensionality reduction and pre-processing steps in
computer vision, particularly face recognition tasks.
The flower classification system takes as input a flower image taken
from the dataset. Classification plays an important role in sorting the images. Providing
an automated method for segmentation and recognition of flower species has many benefits
for people in the agricultural field and in other fields. It accelerates the
learning process through an automated and fast application; it is also a form of
entertainment and learning with fun for people outside the agricultural field. In addition,
building our new dataset that focuses on the Arab region shows our role in this field. Region
growing segmentation is applied after a pre-processing step that resizes the input to reduce
the segmentation processing time and increase the segmentation quality.