Classification of Flower Species Final
A PROJECT REPORT
Submitted by
MANIMARAN R (Reg.No:2016104079)
MAREESWARAN S (Reg.No:2016104083)
MOHAMED ASHFALK A (Reg.No:2016104087)
SETHU INSTITUTE OF TECHNOLOGY
AN AUTONOMOUS INSTITUTION
BONAFIDE CERTIFICATE
SIGNATURE
Mrs. Helina Rajini Suresh M.E., (Ph.D)
HEAD OF THE DEPARTMENT
Department of ECE
Sethu Institute of Technology
Pulloor, Kariapatti-626 115
SIGNATURE
Mrs. S. Amalorpava Mary Rajee M.E., (Ph.D)
SUPERVISOR
Professor/Department of ECE
Sethu Institute of Technology
Pulloor, Kariapatti-626 115
Submitted for the 15UEC609 - Project Work End Semester Examination held at
Sethu Institute of Technology on …………………….
ACKNOWLEDGEMENT
First, we would like to thank God the Almighty for giving us the talent and
opportunity to complete this project.
We wish to express our earnest gratefulness to our honorable founder and
chairman Mr. S. MOHAMED JALEEL B.Sc., B.L., for the encouragement
extended to us to undertake this project.
We thank Dr. A. SENTHIL KUMAR M.E., Ph.D., for the guidance, kind and
cooperative encouragement, inspiration and keen interest shown throughout the
project.
We would like to express our deep sense of gratitude to our Head of the
Department Mrs. HELINA RAJINI SURESH M.E., (Ph.D.), who extended
her heartiest encouragement, advice and valuable guidance throughout this project.
We would also like to acknowledge our deep sense of gratitude to our guide
Mrs. S. AMALORPAVA MARY RAJEE M.E., (Ph.D.), for her enthusiastic
inspiration, constant encouragement, sustained guidance and scholarly
advice imparted throughout the course of this project.
ABSTRACT
Image classification is the ability of a machine learning model to classify or label an
image into its respective class with the help of features learned from hundreds of images.
It plays an important role in sorting images into classes based on their similarities.
Recently, classifying images according to their features has attracted great interest in
many areas, such as digital libraries, search engines, and content-based image retrieval
systems, helped by advances in computer technology. In this project, we look into one
such image classification problem, namely flower species classification, which is a hard
problem because there are hundreds of thousands of flower species around the world. Since
machine learning is about learning from past data, a huge dataset of flower images is
needed to perform real-time flower species recognition. Rather than targeting real-time
flower classification, this project shows how to perform a simple image classification
task using computer vision and machine learning algorithms in Python. We use a simple
machine learning algorithm, the KNN classifier, to classify the flower species.
TABLE OF CONTENTS
ABSTRACT
1 INTRODUCTION
1.1.1 PURPOSE OF ML
3 LITERATURE SURVEY
4 PROPOSED WORK
5 SOFTWARE DESCRIPTION
5.1 INTRODUCTION
5.1.1 PYTHON
7 CONCLUSION
REFERENCES
LIST OF ABBREVIATIONS
ML - Machine Learning
CHAPTER 1
INTRODUCTION
Nature has many different kinds of flowers, and some features are shared between them.
For example, many flowers share the color red. On the other hand, these red flowers
differ in other features; they do not necessarily share the same shape. These
similarities and differences highlight the difficulty of identifying each flower species
automatically. Traditionally, flower recognition is done by a botanist, who faces many
challenges in the task. This project aims at providing an automated system that detects
and recognizes flower species using a machine learning algorithm. An automated flower
recognition method has many benefits, such as fast recognition for educational purposes,
since an automated method accelerates the learning process. It also gives people with
limited experience of flower species the ability to recognize the species of a flower,
saving time and effort.
1.1 MACHINE LEARNING
Data science, machine learning and artificial intelligence are some of the top trending
topics in the tech world today. Data mining and Bayesian analysis are also trending, which
adds to the demand for machine learning. Machine learning is a discipline that deals with
programming systems so that they automatically learn and improve with experience. Here,
learning implies recognizing and understanding the input data and taking informed
decisions based on the supplied data. It is very difficult to enumerate all the decisions
for all possible inputs. To solve this problem, algorithms are developed that build
knowledge from specific data and past experience by applying the principles of
statistical science, probability, logic, mathematical optimization, reinforcement learning,
and control theory.
1.1.1 PURPOSE OF ML
Machine learning can be seen as a branch of Artificial Intelligence (AI), since the
ability to change experience into expertise or to detect patterns in complex data is a mark
of human or animal intelligence. As a field of science, machine learning shares common
concepts with other disciplines such as statistics, information theory, game theory, and
optimization. As a subfield of information technology, its objective is to program machines
so that they will learn. However, the purpose of machine learning is not to build an
automated duplicate of intelligent behavior, but to use the power of computers to
complement and supplement human intelligence. For example, machine learning programs
can scan and process huge databases, detecting patterns that are beyond the scope of
human perception. In the real world, we usually come across lots of raw data which is
not fit to be readily processed by machine learning algorithms. We need to preprocess
the raw data before it is fed into various machine learning algorithms. A machine
learning project typically involves the following steps:
• Defining a Problem
• Preparing Data
• Evaluating Algorithms
• Improving Results
• Presenting Results
The best way to get started using Python for machine learning is to work through a
project end-to-end and cover the key steps like loading data, summarizing data, evaluating
algorithms and making some predictions. This gives you a replicable method that can be
used dataset after dataset. You can also add further data and improve the results.
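The end-to-end steps above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes scikit-learn is installed and uses the bundled Iris dataset as a stand-in for the flower image features, and the choice of K=5 and 5-fold cross-validation is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Load data (Iris is a stand-in dataset, an assumption for illustration).
iris = load_iris()
X, y = iris.data, iris.target

# 2. Summarize data: size, feature names and class names.
print("samples, features:", X.shape)
print("features:", iris.feature_names)
print("classes:", [str(name) for name in iris.target_names])

# 3. Evaluate an algorithm on the training portion with 5-fold cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = KNeighborsClassifier(n_neighbors=5)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("mean cross-validation accuracy:", cv_scores.mean())

# 4. Fit on the training data and make some predictions on unseen data.
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("first predictions:", [str(iris.target_names[i]) for i in predictions[:5]])
```

Keeping the test portion out of the cross-validation step means the final predictions are made on records the model has never seen.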
Fig-1 : Block diagram of steps involved in ML
unsupervised learning.
Supervised learning can be further divided into two types.
1. Regression
2. Classification
Here, a learning algorithm analyzes the training data and produces a derived function
that can be used for mapping new examples. There are many supervised learning
algorithms such as Logistic Regression, Neural networks, Support Vector Machines
(SVMs), KNN classifiers and Naive Bayes classifiers. Common examples of supervised
learning include classifying e-mails into spam and not-spam categories, labeling webpages
based on their content, voice recognition and image classification.
While classifying a given set of data, the classifier system performs the following
actions.
• Initially a new data model is prepared using any of the learning algorithms.
• Later, this data model is used to examine the new data and to determine its class.
Machine learning is applied in areas such as:
• Vision processing
• Language processing
• Forecasting things like stock market trends, weather
• Pattern recognition
• Games
• Data mining
• Expert systems
• Robotics
CHAPTER 2
KNN can be used for both classification and regression predictive problems. However,
it is more widely used in classification problems in the industry. To evaluate any technique,
we generally look at 3 important aspects:
1. Ease of interpreting the output
2. Calculation time
3. Predictive power
Let us take a few examples to place KNN on the scale:
2.2 WORKING PRINCIPLE OF KNN
In KNN, K is the number of nearest neighbors. The number of neighbors is the core
deciding factor. K is generally an odd number if the number of classes is 2, so that the
vote cannot tie. When K=1, the algorithm is known as the nearest neighbor algorithm.
This is the simplest case. Suppose P1 is the point whose label needs to be predicted.
First, you find the one closest point to P1, and then the label of that nearest point is
assigned to P1.
In the general case, you first find the k points closest to P1 and then classify P1 by
the majority vote of its k neighbors. Each neighbor votes for its class, and the class
with the most votes is taken as the prediction. To find the closest points, you compute
the distance between points using distance measures such as Euclidean distance,
Hamming distance, Manhattan distance and Minkowski distance.
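The distance measures named above can be written in a few lines of plain Python. This is an illustrative sketch with hypothetical function names and sample points; Euclidean and Manhattan distance are the Minkowski distance of order 2 and 1 respectively, and Hamming distance counts differing positions.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    """Straight-line distance: Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


def manhattan(a, b):
    """Sum of absolute coordinate differences: Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


p1 = (1.0, 2.0)
p2 = (4.0, 6.0)
print(euclidean(p1, p2))          # 5.0
print(manhattan(p1, p2))          # 7.0
print(hamming("10110", "10011"))  # 2
```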
KNN has the following basic steps:
• Calculate distance
• Find closest neighbors
• Vote for labels
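The three steps can be sketched directly in Python using only the standard library. The function name `knn_predict` and the toy measurements are hypothetical, chosen for illustration; Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    # Step 1: calculate the distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: find the k closest neighbors.
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: vote for labels; the most common label among the neighbors wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data: two flower classes described by (petal length, petal width) in cm.
train_points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
train_labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

print(knn_predict(train_points, train_labels, (1.6, 0.25), k=3))  # setosa
print(knn_predict(train_points, train_labels, (4.4, 1.3), k=3))   # versicolor
```

Note that every prediction scans the whole training set, which is why calculation time grows with the size of the dataset.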
If you watch carefully, you can see that the boundary becomes smoother with increasing
value of K. As K increases to infinity, it finally becomes all blue or all red, depending
on the total majority. The training error rate and the validation error rate are two
parameters we need to assess for different values of K. Following is the curve for the
training error rate with varying value of K.
As you can see, the error rate at K=1 is always zero for the training sample. This is because
the closest point to any training data point is itself. Hence, the prediction is always accurate
with K=1. If the validation error curve had been similar, our choice of K would have
been 1. Following is the validation error curve with varying value of K.
This makes the story clearer. At K=1, we were overfitting the boundaries. Hence, the
validation error rate initially decreases and reaches a minimum. After the minimum, it
increases with increasing K. To get the optimal value of K, you can split the initial
dataset into training and validation sets. Now plot the validation error curve to get the
optimal value of K. This value of K should be used for all predictions.
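The procedure for choosing K can be sketched as follows. This is an illustration under stated assumptions: scikit-learn is available, the bundled Iris dataset stands in for the flower features, and the range of K values (1 to 25) and the 75/25 split are arbitrary choices. The two error lists could be plotted against K with Matplotlib to reproduce curves of the kind described above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Split the initial dataset into training and validation sets.
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Record the training and validation error rate for each candidate K.
k_values = list(range(1, 26))
train_errors, val_errors = [], []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_errors.append(1.0 - model.score(X_train, y_train))
    val_errors.append(1.0 - model.score(X_val, y_val))

# Pick the smallest K that reaches the lowest validation error.
best_k = k_values[val_errors.index(min(val_errors))]
print("training error at K=1:", train_errors[0])
print("K with the lowest validation error:", best_k)
```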
2.4 ADVANTAGES OF KNN ALGORITHM
CHAPTER 3
LITERATURE SURVEY
[2] Jinho Kim, Byung-Soo Kim, Silvio Savarese, “Comparing Image Classification
Methods: K-Nearest-Neighbor and Support-Vector-Machines”, IEEE 2017.
In order for a robot or a computer to perform tasks, it must recognize what it is looking at.
Given an image a computer must be able to classify what the image represents. While this
is a fairly simple task for humans, it is not an easy task for computers. Computers must go
through a series of steps in order to classify a single image. In this paper, we used a general
Bag of Words model in order to compare two different classification methods. Both K-
Nearest-Neighbor (KNN) and Support-Vector-Machine (SVM) classification are well
known and widely used. We were able to observe that the SVM classifier outperformed the
KNN classifier. For future work, we hope to use more categories for the objects and to use
more sophisticated classifiers.
[3] Surbi Saxena, D.R. Ochawar, “PERFORMANCE ANALYSIS OF IMAGE
CLASSIFICATION ALGORITHMS”,IEEE 2016.
Image classification plays an integral role in computer vision. Given an image a computer
must be able to classify what the image represents. While this is a fairly simple task for
humans, it is not an easy task for computers. Computers must go through a series of steps
in order to classify a single image. In this paper, we used a saliency based segmentation
process to compare two different classification methods. Both K-Nearest-Neighbor (KNN)
and Support-Vector-Machine (SVM) classification are designed using VHDL code. We
were able to observe that the SVM classifier outdoes the KNN classifier. For future work,
we hope to use more categories for the objects and to use more sophisticated classifiers.
[4] Bo Sun, Jumping Nu, Tian Geo, “Study on the improvement of K-Nearest-Neighbour
Algorithm”,IEEE 2017.
As one of the instance-based learning methods, the K-nearest-neighbor (KNN) algorithm
has been widely used in many fields. This paper makes improvements in two respects.
First, to improve the efficiency of classification, we move some computations that
occur in the classification period to the training period, which leads to a great
reduction in computational cost. Second, to improve the accuracy of classification, we
take into account the contribution of different attributes and obtain the optimal
attribute weight sets using the quadratic programming method. Finally, this paper
validates the improvements through a practical experiment.
[6] Riddhi H. Shaparia, Dr Narendra M. Patel, Zankhana H. Shah ,“ Flower Classification
using Texture and Color Features”, IEEE 2016.
In this research paper, we have used texture and color features for flower classification.
A standard database of flowers has been used for the experiments. Preprocessing steps
such as noise removal and segmentation for elimination of the background are applied
to the input images.
Texture and color features are extracted from the segmented images. Texture feature is
extracted using GLCM (Gray Level Co-occurrence Matrix) method and color feature is
extracted using Color moment. For classification, neural network classifier is used. The
overall accuracy of the system is 95.0 %.
CHAPTER 4
PROPOSED WORK
Label encoding refers to changing the word labels into numbers so that the algorithms
can understand how to work on them.
The dataset is split into 75% training data and 25% test data. This means that out of
a total of 100 records, the training set will contain 75 records and the test set the
remaining 25.
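Label encoding and a 75/25 split might look like the sketch below. It is not the project's original script: the 100 records are random numbers, the four species names are hypothetical, and `random_state=42` is an arbitrary seed. `LabelEncoder` and `train_test_split` are standard scikit-learn utilities, and the names `le`, `trainX`, `testX`, `trainY` and `testY` follow the snippets used later in this chapter.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# 100 hypothetical records: 4 random features each and a species name.
rng = np.random.default_rng(0)
data = rng.random((100, 4))
names = ["daisy", "rose", "sunflower", "tulip"] * 25

# Label encoding: map each word label to an integer.
le = LabelEncoder()
labels = le.fit_transform(names)
print(list(le.classes_))  # class names in sorted order
print(labels[:4])         # integer codes of the first four records

# 75% / 25% split of the encoded dataset.
trainX, testX, trainY, testY = train_test_split(
    data, labels, test_size=0.25, random_state=42
)
print(len(trainX), len(testX))  # 75 25
```

`le.inverse_transform` maps the integer codes back to the species names when the predictions need to be displayed.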
4.5.1 TRAIN DATA
The observations in the training set form the experience that the algorithm uses to
learn. In supervised learning problems, each observation consists of an observed output
variable and one or more observed input variables.
The test set is a set of observations used to evaluate the performance of the model
using some performance metric. It is important that no observations from the training set
are included in the test set. If the test set does contain examples from the training set, it
will be difficult to assess whether the algorithm has learned to generalize from the training
set or has simply memorized it.
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=1)
model.fit(trainX, trainY)
The first step is to import the KNeighborsClassifier class from
the sklearn.neighbors library. In the second line, this class is initialized with one parameter,
i.e. n_neighbors. This is the value of K. There is no ideal value for K; it is selected
after testing and evaluation. The script above uses K=1, although 5 is a commonly used
starting value for the KNN algorithm.
The final step is to make predictions on our test data. To do so, execute the following
script:
from sklearn.metrics import classification_report
print(classification_report(testY, model.predict(testX), target_names=le.classes_))
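For reference, the training and prediction snippets can be combined into one runnable script. This is a sketch under assumptions: the bundled Iris dataset stands in for the extracted flower features, and the 75/25 stratified split with `random_state=42` is an arbitrary choice, so the printed numbers are not the project's results.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Stand-in data: Iris measurements with species names as word labels.
iris = load_iris()
names = [str(iris.target_names[i]) for i in iris.target]

# Encode the word labels as integers.
le = LabelEncoder()
labels = le.fit_transform(names)

# Hold out 25% of the records for testing.
trainX, testX, trainY, testY = train_test_split(
    iris.data, labels, test_size=0.25, random_state=42, stratify=labels
)

# Train the KNN classifier with K=1, as in the snippet above.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(trainX, trainY)

# Per-class precision, recall and F1-score on the test set.
report = classification_report(testY, model.predict(testX), target_names=le.classes_)
print(report)
```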
CHAPTER 5
SOFTWARE DESCRIPTION
5.1 INTRODUCTION
5.1.1 PYTHON
Python is a widely used general-purpose, high-level programming language. It was
initially designed by Guido van Rossum, first released in 1991, and is developed under
the Python Software Foundation. It was designed with an emphasis on code readability,
and its syntax allows programmers to express concepts in fewer lines of code. Python is
a programming language that lets you work quickly and integrate systems more efficiently.
It is used for:
• Software development
• Mathematics
• System scripting
Python and its libraries like NumPy, SciPy, Scikit-Learn and Matplotlib are used in
data science and data analysis. They are also extensively used for creating scalable
machine learning algorithms. These libraries implement popular machine learning
techniques such as classification, regression, recommendation, and clustering.
Python offers ready-made frameworks for performing data mining tasks on large
volumes of data effectively and in less time. They include implementations of
algorithms such as linear regression, logistic regression, Naïve Bayes, k-means,
K nearest neighbor, and Random Forest.
CHAPTER 6
ACCURACY, PRECISION AND RECALL
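Recall is the fraction of malignant tumors that the system identified. Recall is
calculated with the following formula.
R = TP/(TP + FN)
Where TP is the number of true positives (malignant tumors that were detected) and
FN is the number of false negatives (malignant tumors that were missed).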
In this example, precision measures the fraction of tumors that were predicted to be
malignant that are actually malignant. Recall measures the fraction of truly malignant
tumors that were detected. The precision and recall measures could reveal that a classifier
with impressive accuracy actually fails to detect most of the malignant tumors. If most
tumors are benign, even a classifier that never predicts malignancy could have high
accuracy. A different classifier with lower accuracy and higher recall might be better
suited to the task, since it will detect more of the malignant tumors. Many other
performance measures for classification can also be used.
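The point about accuracy hiding poor recall can be checked with a small calculation. The counts below are hypothetical, chosen only for illustration: a test set of 100 tumors, of which 10 are malignant. Precision is computed with the standard formula TP/(TP + FP), and returning 0.0 when the denominator is zero is a convention adopted here.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Classifier A never predicts "malignant": it misses all 10 malignant tumors.
a = dict(tp=0, tn=90, fp=0, fn=10)
# Classifier B flags 20 tumors and catches 8 of the 10 malignant ones.
b = dict(tp=8, tn=78, fp=12, fn=2)

for name, c in (("A", a), ("B", b)):
    print(name, accuracy(**c), precision(c["tp"], c["fp"]), recall(c["tp"], c["fn"]))
# A 0.9 0.0 0.0
# B 0.86 0.4 0.8
```

Classifier A has the higher accuracy, yet classifier B is the more useful one for this task because it detects most of the malignant tumors.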
The results show that our KNN algorithm was able to classify all the 30 records in
the test set with 100% accuracy, which is excellent. Although the algorithm performed very
well with this dataset, the same results should not be expected in all applications. As
noted earlier, KNN doesn't always perform as well with high-dimensional or categorical
features.
CHAPTER 7
CONCLUSION
KNN is a simple yet powerful classification algorithm. It requires no training for
making predictions, and training is typically one of the most difficult parts of a
machine learning algorithm. The KNN algorithm has been widely used for finding
document similarity and for pattern recognition. It has also been employed for
developing recommender systems and for dimensionality reduction and pre-processing
steps in computer vision, particularly face recognition tasks.
The flower classification system takes as input a flower image from the dataset.
Classification plays an important role in sorting the images. Providing an automated
method for segmentation and recognition of flower species has many benefits for people
in the agricultural field and in other fields. It accelerates the learning process
through an automated and fast application; it is also a form of entertainment and
learning with fun for people outside the agricultural field. In addition, building
our new dataset that focuses on the Arab region shows our role in this field. Region
growing segmentation is applied after a pre-processing step that resizes the input to
reduce the segmentation processing time and increase the segmentation quality.
REFERENCES
[1] Nilsback, M. E. and Zisserman, A. 2015. A Visual Vocabulary for flower
Classification. In the Proceedings of Computer Vision and Pattern Recognition, Vol. 2, pp.
1447-1454.
[2] Boykov, Y.Y. and Jolly, M.P. 2016. Interactive graph cuts for optimal boundary and
region segmentation of objects in N-D images. In Proc. ICCV, volume 2, pages 105-112.
[3] Nilsback, M. E. and Zisserman, A. 2017. Automated flower classification over a large
number of classes. In the Proceedings of Sixth Indian Conference on Computer Vision,
Graphics and Image Processing, pp. 722 – 729.
[4] Nilsback, M. E. and Zisserman, A. 2016. Delving into the whorl of flower segmentation.
In the Proceedings of British Machine Vision Conference, Vol. 1, pp. 27-30.
[5] Das, M., Manmatha, R., and Riseman, E. M. 2012. Indexing flower patent images using
domain knowledge. IEEE Intelligent systems, Vol. 14, No. 5, pp. 24-33.
[6] Saitoh, T., Aoki, K., and Kaneko, T. 2015. Automatic recognition of blooming flowers.
In the Proceedings of 17th International Conference on Pattern Recognition, Vol. 1, pp 27-
30.
[7] Yoshioka, Y., Iwata, H., Ohsawa, R., and Ninomiya, S. 2016. Quantitative evaluation
of flower color pattern by image analysis and principal component analysis of Primula
sieboldii E. Morren. Euphytica, pp. 179 – 186, 2016.
[8] Gonzales, R. C., Woods, R. E., and Eddins, S. L. 2016. Digital Image Processing Using
MATLAB. Third edition.
[9] Haralick, R. M., Shanmugam, K., and Dinstein, I. 1973. Textural Features for Image
Classification. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 3, No. 6, pp. 610
– 621.
[10] Varma, M. and Ray, D. 2017. Learning the discriminative power invariance trade-off.
In the Proceedings of 11th International Conference on Computer Vision, pp 1 – 8.
[11] Mortensen, E. and Barrett, W. A, 1995. Intelligent scissors for image composition. In
Proc.ACM SIGGRAPH, pages 191–198.
[12] Saitoh, T. and Kaneko, T. 2000. Automatic recognition of Wild Flowers.