
Flower Image Classification Using CNN

Cuong Duc Nguyen
University of Skövde
Skövde, Sweden
[email protected]

Georgios Kokolakis
University of Skövde
Skövde, Sweden
[email protected]

Abstract—In this paper, we present a system for classifying images of flowers using deep convolutional neural networks (CNNs). The system is trained and tested on a dataset of images of flowers, where each image is labeled with the species of flower it depicts. We evaluate the performance of the system using metrics such as accuracy, precision, and recall. Our system achieves high accuracy in classifying images of flowers, demonstrating the effectiveness of CNNs in this task. Overall, the proposed system is a valuable tool for automating the identification of flower species, which can have applications in fields such as conservation, agriculture, and research. Additionally, the findings in this paper could be extended to other image classification tasks such as plant recognition in general.

Keywords—CNN, Image Classification, Layers, Convolutional, Pooling, Activation Function, Evaluation Metrics, Confusion Matrix.

I. INTRODUCTION

Flower image classification is a vital task in fields such as botany, horticulture and ecology. Flowers are used as a model system in various scientific studies, so accurate identification of flower species is crucial for data collection and analysis. In recent years, the rise of deep learning has led to a significant improvement in image classification performance. Among deep learning architectures, Convolutional Neural Networks (CNNs) have been widely used to achieve state-of-the-art results in many image classification tasks.

In this work, we propose a CNN-based model for classifying flower images. The proposed model uses transfer learning, a powerful technique that leverages the knowledge learned by a pre-trained model by fine-tuning it on a new dataset. We have collected a dataset of flower images and labeled them according to their species. The model is trained and tested on this dataset, and its performance is measured by evaluation metrics such as accuracy, precision and recall. Our goal is to achieve high classification performance, to support practical applications in fields such as conservation, botanical research and commercial agriculture.

II. THE DATASET

The dataset comprises 4,242 images of flowers collected from sources such as Flickr, Google Images and Yandex Images. The images are labeled into five classes, namely chamomile, tulip, rose, sunflower, and dandelion, with around 800 images per class. These images can be used to train models for recognizing different plant species from photographs. However, it is important to note that the images are not of high resolution, being approximately 320x240 pixels, and also have varying proportions [1].

III. THE PROCESS

Input: The first step is to provide the CNN with the input data, such as an image or a video. The input data is typically preprocessed and normalized to ensure that the network can effectively learn from it.

Convolution: The next step is the convolution operation, where a set of filters (also called kernels or weights) is applied to the input data. These filters extract features from the input, such as edges, textures, and patterns. The convolution operation produces feature maps that represent different aspects of the input data.

Non-Linearity (Activation): Next, the feature maps are passed through an activation function, which introduces non-linearity into the model. Commonly used activation functions are ReLU, sigmoid and tanh.

Pooling: After the convolution and activation steps, a pooling operation is applied to the feature maps. The purpose of pooling is to down-sample the feature maps and reduce their dimensions, which helps to control overfitting and reduces the number of parameters in the network.

Repeating steps 2-4: These three steps are repeated multiple times in the network, with each repetition producing a deeper and more abstract representation of the input data.

Fully Connected Layers: The output of the final convolutional/pooling layer is passed through one or more fully connected layers, which make the final predictions or decisions based on the learned features.

Output: Finally, the output of the CNN is generated, typically in the form of class scores or probabilities.

Backpropagation and Optimization: Based on the output, the network makes a prediction; the prediction is compared to the actual output (label) and the error is calculated using a loss function. The optimizer then uses backpropagation to update the model parameters so that the network reduces this error [2].

Figure 1. Complete flow of a CNN that processes an input image and classifies the objects based on the computed values.
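To make the input step concrete, the sketch below shows one way to load and normalize the flower images described in Section II using Keras; the directory layout, target size and API choice are illustrative assumptions rather than our exact pipeline.

```python
import tensorflow as tf

# Assumed layout: flowers/<class_name>/*.jpg, one sub-folder per flower class.
IMG_SIZE = (150, 150)  # assumed target size; the raw images are ~320x240

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "flowers",                 # assumed path to the extracted dataset
    label_mode="categorical",  # one-hot labels, for categorical_crossentropy
    image_size=IMG_SIZE,       # resizing gives every image the same proportions
    batch_size=128,
)

# Normalize pixel values from [0, 255] to [0, 1] so the network learns stably.
rescale = tf.keras.layers.Rescaling(1.0 / 255)
dataset = dataset.map(lambda x, y: (rescale(x), y))
```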

IV. CONVOLUTIONAL NEURAL NETWORKS (CNNS)

Convolutional Neural Networks (CNNs) have been widely used for image classification tasks because of their ability to automatically learn features from images. The key advantage of CNNs is their ability to learn hierarchies of features, with lower-level features such as edges and textures being learned in the early layers and higher-level features such as shapes and parts of objects being learned in the deeper layers. This hierarchical structure allows CNNs to automatically learn relevant features for image classification, which can be difficult or time-consuming to design manually.

Another advantage of CNNs is that they can effectively process images of different scales, orientations, and translations, making them robust to image variations. This is achieved by the use of convolutional layers, which apply filters to small regions of the input image and are able to maintain the spatial relationships between pixels. This allows CNNs to learn spatial hierarchies of features, enabling them to detect patterns and features regardless of their position in the image.

Additionally, CNNs use pooling layers to reduce the dimensionality of the data while preserving the most important information. This reduces the computational cost of the model and improves its generalization capabilities.

Finally, CNNs are able to take advantage of the large amounts of labeled data and computational resources available today, which allows them to be trained on large and complex datasets. This is particularly useful for image classification tasks, where large amounts of labeled data are required to train accurate models.

Overall, CNNs are well-suited for image classification tasks because they can automatically learn relevant features from images, handle image variations while preserving the most important information, and are able to take advantage of large amounts of labeled data and computational resources [3].

We will use the Sequential model to add the necessary layers to our CNN. The Sequential model is a simple and easy-to-use model provided by the Keras library that defines the architecture of the neural network as a linear stack of layers, added one by one.
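As an illustration, a minimal Sequential stack of the kind discussed in the following subsections might look like the sketch below; the number of blocks, filter counts and kernel sizes are assumptions for illustration, not the exact architecture we trained.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small Sequential CNN: Conv2D -> MaxPooling blocks, then Flatten and
# Dense layers ending in one output unit per flower class.
model = keras.Sequential([
    layers.Input(shape=(150, 150, 3)),      # assumed input size (RGB)
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # assumed hidden width
    layers.Dense(5, activation="softmax"),  # five flower classes
])
model.summary()  # prints the layer stack and parameter counts
```

Each layer type used here is discussed in the subsections that follow.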
A. Conv2D

A CNN is typically composed of multiple layers, including one or more convolutional layers (Conv2D) as well as other types of layers such as pooling layers, fully connected layers and normalization layers.

Conv2D layers are a specific type of layer used in CNNs to learn spatial hierarchies of features from images. These layers work by applying a set of filters to small regions of the input image, creating a set of feature maps. Each filter is designed to detect a specific feature or pattern in the image, such as edges or textures. By applying multiple filters, the convolutional layer can learn a variety of different features from the image.

The role of Conv2D layers in a CNN is to learn the local patterns of the images and extract features from them. These features are then passed to the next layer, which can be either another convolutional layer or a pooling layer. The pooling layer reduces the dimensionality of the data while preserving the most important information. This process is repeated through multiple layers, and the features learned by each layer are combined to form a more robust and abstract feature representation of the image.

Finally, the output of the last convolutional/pooling layer is passed to a fully connected layer (also known as a Dense layer), which uses the features learned by the previous layers to make a final prediction about the image.

The main advantage of Conv2D layers is that they are able to learn local patterns in the image, preserving the spatial information of the input. For example, Conv2D layers are able to detect edges and textures regardless of their position in the image, which allows the network to be robust to translations and rotations. Conv2D layers also share their parameters across different regions of the input image, reducing the number of parameters to learn and thereby reducing overfitting.

Conv2D layers also have the ability to increase the depth of the representation in CNNs, which means creating more feature maps and, hence, adding more layers to learn more abstract and complex features. This process, called stacking, is essential for CNNs to build a robust feature representation for classifying images.

Furthermore, Conv2D layers allow the network to learn features that are sensitive to the spatial relationships between pixels, which is crucial for tasks such as image classification. By using Conv2D layers, the network can learn features such as shapes and parts of objects, which are crucial for classifying images correctly [4].
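As a minimal sketch, a single Conv2D layer in Keras takes the number of filters and the kernel size as its main arguments; the values below are illustrative.

```python
from tensorflow.keras import layers

# 32 filters, each a 3x3 kernel slid across the input; applied to an RGB
# image this produces 32 feature maps, one per learned filter.
conv = layers.Conv2D(
    filters=32,          # number of feature maps to learn
    kernel_size=(3, 3),  # spatial extent of each filter
    padding="same",      # preserve the spatial dimensions of the input
    activation="relu",   # non-linearity applied to each feature map
)
```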

B. Activation Functions

In a Conv2D layer, the activation function is used to introduce non-linearity into the network, allowing it to learn complex and abstract features from the input. The activation function is applied element-wise to the output of the convolution operation, creating a new output called the activation map.

The most commonly used activation functions in Conv2D layers are ReLU (Rectified Linear Unit) and its variants such as LeakyReLU, PReLU, and ELU (Exponential Linear Unit). ReLU is a simple and computationally efficient activation function that replaces all negative values with zero, which speeds up the training process and can also help reduce overfitting.

LeakyReLU is a variant of ReLU that allows small negative values to pass through, which can help prevent the dying ReLU problem (when all the neurons output zero).

PReLU is another variant of ReLU that learns the value of the negative slope of the function.

ELU is similar to ReLU, but instead of replacing negative values with zero it replaces them with an exponential function, which allows the network to learn smoother and more robust features.

Other activation functions such as sigmoid, tanh and softmax can also be used, but they are less common in convolutional layers, as they may slow down the training process and lead to overfitting.

Overall, the choice of activation function depends on the dataset, the problem, and the specific requirements of the model. In practice, ReLU and its variants are often used in Conv2D layers due to their computational efficiency, stability, and effectiveness in preventing overfitting [5].
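In Keras, the activation can be supplied as a layer argument or appended as a separate layer; this sketch shows both forms for ReLU and LeakyReLU (the slope value is an illustrative assumption).

```python
from tensorflow.keras import layers

# ReLU given directly as an argument of the convolution:
conv_relu = layers.Conv2D(32, (3, 3), activation="relu")

# A variant such as LeakyReLU is added as its own layer after a
# convolution that has no built-in activation:
conv = layers.Conv2D(32, (3, 3))
leaky = layers.LeakyReLU(alpha=0.1)  # small negative slope avoids dying ReLU
```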
C. Max Pooling

In a CNN, max pooling is a technique used to down-sample the spatial dimensions of the feature maps, which can reduce the computational cost of the model and improve its generalization capabilities. The max pooling operation is typically applied after one or more convolutional layers and works by dividing the feature map into a set of non-overlapping regions, or pooling windows, and then taking the maximum value of each region.

The max pooling operation is implemented in Keras using the MaxPooling2D layer. The layer is typically added after one or more convolutional layers, and it takes several arguments such as pool_size and strides to control the size and stride of the pooling window.

The pool_size parameter controls the size of the pooling window, which is typically set to (2, 2) to reduce the spatial dimensions by a factor of 2 in each direction. The strides parameter controls the step size of the pooling window, which is typically set to the same value as the pool_size to ensure that the pooling window does not overlap with itself [4].
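The sketch below shows the two arguments described above; with a (2, 2) window and matching strides, a feature map of, say, 148x148x32 is reduced to 74x74x32.

```python
from tensorflow.keras import layers

pool = layers.MaxPooling2D(
    pool_size=(2, 2),  # size of each pooling window
    strides=(2, 2),    # equal to pool_size, so windows do not overlap
)
```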
D. Flatten & Dense Layers

We finally add the Flatten and Dense layers to complete our CNN. In a CNN, the Flatten and Dense layers are used to prepare the feature maps learned by the previous layers for the final classification.

The Flatten layer is used to convert the feature maps from a 2D array to a 1D array. It does this by unrolling the 2D array into a long 1D array, which can be used as input to the final fully connected layers of the network. The Flatten layer is usually added after one or more convolutional and max pooling layers.

The Dense layer is a fully connected layer that is used to make the final prediction. It is typically added after the Flatten layer and takes the 1D array of features learned by the previous layers as input. The Dense layer applies a linear transformation to the input, which is then followed by a non-linear activation function. The final Dense layer has a number of units (also known as neurons) that corresponds to the number of output classes.
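For our five flower classes, the closing layers could look like this sketch (the hidden-layer width is an illustrative assumption):

```python
from tensorflow.keras import layers

flatten = layers.Flatten()                      # unroll feature maps into a 1D vector
hidden = layers.Dense(128, activation="relu")   # assumed hidden width
output = layers.Dense(5, activation="softmax")  # one unit per flower class;
                                                # softmax yields class probabilities
```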
V. DATA AUGMENTATION

Data augmentation is a technique used to artificially increase the size of a dataset by applying various transformations to the images, such as rotation, translation, flipping, and scaling. The goal of data augmentation is to increase the diversity of the training data, which can help improve the robustness and generalization of the CNN model [6].

When working with image data, it is common to apply augmentation techniques such as the following (a Keras sketch follows the list):

• Rotation: images can be rotated by a random angle to make the model more robust to rotations in the input.
• Translation: images can be translated by a random amount to make the model more robust to translations in the input.
• Flipping: images can be horizontally or vertically flipped to make the model more robust to symmetry in the input.
• Scaling: images can be scaled by a random factor to make the model more robust to scale variations in the input.
• Zooming: images can be zoomed in or out to make the model more robust to zoom variations in the input.
• Lighting: brightness and contrast can be adjusted to make the model more robust to lighting variations in the input.
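A minimal sketch of these transformations using Keras' ImageDataGenerator; the exact ranges are illustrative assumptions rather than the values used in our experiments.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each argument corresponds to one augmentation from the list above.
datagen = ImageDataGenerator(
    rotation_range=30,            # random rotation up to 30 degrees
    width_shift_range=0.1,        # random horizontal translation
    height_shift_range=0.1,       # random vertical translation
    horizontal_flip=True,         # random horizontal flipping
    zoom_range=0.2,               # random zooming in or out
    brightness_range=(0.8, 1.2),  # random lighting adjustment
)
# datagen.flow(X_train, y_train, batch_size=128) then yields augmented batches.
```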
VI. THE FINAL MODEL

We compile the model by setting the optimizer, loss function and evaluation metrics. The Adam optimizer is used, with its default learning rate of 0.001. The loss function is categorical_crossentropy, which is commonly used for multi-class classification problems. The evaluation metric is accuracy, the ratio of correctly classified samples to the total number of samples.

The data was split into training and test sets with a 20% test size. We then used datagen, a generator that yields batches of data for training. It takes the training data X_train and labels y_train with a batch size of 128, generating data for the training process one batch of 128 at a time. The number of epochs was set to 50. The fit() method trains the model on the generator and also performs validation on the X_test and y_test data at the end of each epoch, with a batch size of 128, running for a total of 50 epochs.
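Put together, the compile-and-train step described above corresponds to the following sketch; the train/test split via scikit-learn and the array names images and labels are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# 80/20 train/test split, as described above (images/labels are assumed arrays).
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2)

model.compile(
    optimizer="adam",                 # default learning rate 0.001
    loss="categorical_crossentropy",  # multi-class classification loss
    metrics=["accuracy"])

# Train on augmented batches of 128 for 50 epochs, validating on the
# held-out test set at the end of each epoch.
history = model.fit(
    datagen.flow(X_train, y_train, batch_size=128),
    validation_data=(X_test, y_test),
    epochs=50)
```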

VII. EVALUATION

For evaluation metrics we utilized the confusion matrix. The confusion matrix provides a useful way to summarize a large amount of data and can be used to calculate various evaluation metrics such as accuracy, precision, recall and F1-score, which are useful for analyzing the performance of the model [7].

Here is a general explanation of the terms used to describe the matrix:

• True Positives (TP): cases in which we predicted yes (they have the condition), and they do have the condition.
• True Negatives (TN): we predicted no, and they don't have the condition.
• False Positives (FP): we predicted yes, but they don't actually have the condition (Type I error).
• False Negatives (FN): we predicted no, but they actually do have the condition (Type II error).

Accuracy, precision, recall and F1-score are evaluation metrics commonly used to measure the performance of a classification model. They can be calculated from the values in a confusion matrix; a code sketch follows at the end of this section.

• Accuracy is the ratio of correctly predicted observations to the total observations: Accuracy = (TP + TN) / (TP + TN + FP + FN).
• Precision is the ratio of correctly predicted positive observations to the total predicted positive observations: Precision = TP / (TP + FP).
• Recall (sensitivity, or true positive rate) is the ratio of correctly predicted positive observations to the total actual positive observations: Recall = TP / (TP + FN).
• F1-score is the harmonic mean of precision and recall: F1-Score = 2 * (Recall * Precision) / (Recall + Precision). Its range is [0, 1]; it reflects both how precise the classifier is (how many instances it classifies correctly) and how robust it is (it does not miss a significant number of instances). The greater the F1-score, the better the performance of our model.

Our model achieved an accuracy score of 77% on the test set. The best precision score was identified on Daisy flowers, with a score of 0.91. The flower with both the highest recall and the highest F1-score was the Sunflower, at 0.91 and 0.89 respectively. Overall, the model seems to identify Sunflowers and Daisies more reliably than the rest.

Figure 2. Model loss over the epochs on the training and testing sets.

Figure 3. Model accuracy over the epochs on the training and testing sets.
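For reference, these metrics can be computed from the test-set predictions with scikit-learn, as in this sketch (variable names follow the training sketch in Section VI; the tooling choice is an assumption):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Convert softmax probabilities and one-hot labels to class indices.
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)

print(confusion_matrix(y_true, y_pred))
# classification_report prints per-class precision, recall and F1-score.
print(classification_report(y_true, y_pred))
```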
FUTURE WORK

Although our model performs adequately, there are several steps we could follow, given more time, that would improve our model's performance.

Namely, we could collect more data. Having a large and diverse dataset is crucial for training accurate image classification models. If the dataset is small or not diverse enough, collecting more data can help improve the model's performance.

We could also choose a more powerful model architecture. There are several pre-trained models available, such as ResNet, VGG and Inception. Using a more complex and powerful model architecture can improve the performance of the model.
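As one possible sketch of this direction, a pre-trained backbone can be loaded from Keras and fine-tuned on the flower data; the chosen backbone, input size and head layers are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# ResNet50 pre-trained on ImageNet, without its original classification head.
base = keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base.trainable = False  # freeze the backbone; train only the new head

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),  # five flower classes
])
```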
Also, tuning our hyperparameters might increase our accuracy. Tuning the hyperparameters of the model, such as the learning rate, batch size, and number of epochs, can improve performance. Grid search or random search can be used to find the optimal hyperparameters.

Using ensemble methods, by combining multiple models, can also improve performance. For example, by training multiple models and then taking a majority vote, we can improve the overall accuracy.

Furthermore, we could leverage the power of the cloud and GPUs: the computational power of the cloud and the use of Graphics Processing Units (GPUs) can greatly speed up the training process, and therefore improve the performance of the model.

It's worth noting that these methods are not mutually exclusive, and often a combination of them is used to achieve the best results. It's also important to keep in mind that accuracy is not the only metric to consider when evaluating the performance of a model, and that the choice of method(s) will depend on the specific problem and the available resources.

In the future, we could also try to classify objects other than flowers with our model to evaluate its performance. Specifically, among other things, the model could be used to classify:

• Animals: dogs, cats, horses, birds, etc.
• Vehicles: cars, trucks, buses, motorcycles, airplanes, etc.
• Plants: trees, grass, etc.
• Fruits and vegetables: apples, bananas, tomatoes, etc.
• Household items: chairs, tables, lamps, etc.
• Fashion and apparel: clothing, shoes, jewelry, etc.
• Food: pizza, sushi, sandwiches, etc.
• Body parts: faces, eyes, hands, etc.
• Scenes: beach, city, mountain, etc.
• Objects and tools: scissors, hammer, computer, etc.

REFERENCES

[1] "Flowers Recognition." https://www.kaggle.com/datasets/alxmamaev/flowers-recognition (accessed Dec. 23, 2022).
[2] Prabhu, "Understanding of Convolutional Neural Network (CNN) — Deep Learning," Medium, Nov. 21, 2019. https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148 (accessed Jan. 11, 2023).
[3] T. Guo, J. Dong, H. Li, and Y. Gao, "Simple convolutional neural network on image classification," in 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), Mar. 2017, pp. 721–724. doi: 10.1109/ICBDA.2017.8078730.
[4] M. Agarwal, S. Gupta, and K. K. Biswas, "A new Conv2D model with modified ReLU activation function for identification of disease type and severity in cucumber plant," Sustain. Comput. Inform. Syst., vol. 30, p. 100473, Jun. 2021. doi: 10.1016/j.suscom.2020.100473.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," in 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, Dec. 2015, pp. 1026–1034. doi: 10.1109/ICCV.2015.123.
[6] S. Y. Feng et al., "A Survey of Data Augmentation Approaches for NLP." arXiv, Dec. 01, 2021. [Online]. Available: https://arxiv.org/abs/2105.03075 (accessed Jan. 11, 2023).
[7] M. Heydarian, T. E. Doyle, and R. Samavi, "MLCM: Multi-Label Confusion Matrix," IEEE Access, vol. 10, pp. 19083–19095, 2022. doi: 10.1109/ACCESS.2022.3151048.
