Flower Image Classification Using CNN
IV. CONVOLUTIONAL NEURAL NETWORKS (CNNS)

Convolutional Neural Networks (CNNs) have been widely used for image classification tasks because of their ability to automatically learn features from images. The key advantage of CNNs is their ability to learn hierarchies of features, with lower-level features such as edges and textures being learned in the early layers and higher-level features such as shapes and parts of objects being learned in the deeper layers. This hierarchical structure allows CNNs to automatically learn relevant features for image classification, which can be difficult or time-consuming to design manually.

Another advantage of CNNs is that they can effectively process images of different scales, orientations, and translations, making them robust to image variations. This is achieved by the use of convolutional layers, which apply filters to small regions of the input image and are able to maintain the spatial relationships between pixels. This allows CNNs to learn spatial hierarchies of features, enabling them to detect patterns and features regardless of their position in the image.

Additionally, CNNs use pooling layers to reduce the dimensionality of the data while preserving the most important information. This reduces the computational cost of the model and improves its generalization capabilities.

Finally, CNNs are able to take advantage of the large amounts of labeled data and computational resources available today, which allows them to be trained on large and complex datasets. This is particularly useful for image classification tasks, where large amounts of labeled data are required to train accurate models.

Overall, CNNs are well-suited for image classification tasks because they can automatically learn relevant features from images, handle image variations, preserve the most important information, and take advantage of large amounts of labeled data and computational resources [3].

We will use the Sequential model to add the necessary layers to our CNN. The Sequential model, provided by the Keras library, is a simple and easy-to-use way to define the architecture of a neural network as a linear stack of layers, added one by one.

A. Conv2D

A CNN is typically composed of multiple layers, including one or more convolutional layers (Conv2D) as well as other types of layers such as pooling layers, fully connected layers, and normalization layers.

Conv2D layers are a specific type of layer used in CNNs to learn spatial hierarchies of features from images. These layers work by applying a set of filters to small regions of the input image, creating a set of feature maps. Each filter is designed to detect a specific feature or pattern in the image, such as edges or textures. By applying multiple filters, the convolutional layer can learn a variety of different features from the image.

The role of Conv2D layers in a CNN is to learn the local patterns of the images and extract features from them. These features are then passed to the next layer, which can be either another convolutional layer or a pooling layer. The pooling layer reduces the dimensionality of the data while preserving the most important information. This process is repeated through multiple layers, and the features learned by each layer are combined to form a more robust and abstract feature representation of the image.

Finally, the output of the last convolutional/pooling layer is passed to a fully connected layer (also known as a Dense layer), which uses the features learned by the previous layers to make a final prediction about the image.

The main advantage of Conv2D layers is that they are able to learn local patterns in the image, preserving the spatial information of the input. For example, Conv2D layers are able to detect edges and textures regardless of their position in the image, which allows the network to be robust to translations and rotations. Conv2D layers also share their parameters across different regions of the input image, reducing the number of parameters to learn and also reducing overfitting.

Conv2D layers also have the ability to increase the depth of the representation in CNNs, which means creating more feature maps and adding more layers to learn more abstract and complex features. This process, called stacking, is essential for CNNs to build a robust feature representation to classify images.

Furthermore, Conv2D layers allow the network to learn features that are sensitive to the spatial relationships between pixels, which is crucial for tasks such as image classification. By using Conv2D layers, the network can learn features such as shapes and parts of objects, which are crucial for classifying images correctly [4].
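To make the filter/feature-map mechanics concrete, here is a minimal NumPy sketch (illustrative only, not the paper's implementation) of a single filter sliding over an image to produce one feature map:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel,
    producing one feature map (no padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # element-wise product of the kernel with one image patch
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge filter responds where intensity changes left to right
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge_filter = np.array([[-1, 1],
                        [-1, 1]], dtype=float)
feature_map = conv2d(image, edge_filter)
```

The feature map peaks at the column where the dark-to-bright edge sits and is zero over the flat regions, regardless of where in the image that edge appears — the translation robustness described above.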
Overall, the choice of activation function depends on the dataset, problem, and the specific requirements of the model. In practice, ReLU and its variants are often used in Conv2D layers due to their computational efficiency, stability, and effectiveness in preventing overfitting [5].
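For reference, ReLU itself is a one-line function, max(0, x); a minimal sketch:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return np.maximum(0, x)

activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
```

Because the function is cheap to compute and its gradient is 1 for all positive inputs, it trains faster and more stably than saturating activations such as sigmoid or tanh.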
C. Max Pooling

In a CNN, max pooling is a technique used to down-sample the spatial dimensions of the feature maps, which can reduce the computational cost of the model and improve its generalization capabilities. The max pooling operation is typically applied after one or more convolutional layers and works by dividing the feature map into a set of non-overlapping regions, or pooling windows, and then taking the maximum value of each region.

The max pooling operation is implemented in Keras using the MaxPooling2D layer. The layer is typically added after one or more convolutional layers, and it takes several arguments such as pool_size and strides to control the size and stride of the pooling window.
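A minimal NumPy sketch of the operation described above (non-overlapping 2×2 windows, assuming the input dimensions are divisible by the window size):

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2):
    """Down-sample by taking the maximum of each non-overlapping
    pool_size x pool_size window (stride equal to pool_size)."""
    h, w = feature_map.shape
    windows = feature_map.reshape(h // pool_size, pool_size,
                                  w // pool_size, pool_size)
    # maximum over each window's rows and columns
    return windows.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]], dtype=float)
pooled = max_pool2d(fmap)  # 4x4 -> 2x2
```

Each output value keeps only the strongest response in its window, halving both spatial dimensions while preserving the most important information.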
The pool_size parameter controls the size of the pooling window, which is typically set to (2, 2) to reduce the spatial dimensions by a factor of 2 in each direction. The strides parameter controls the step size of the pooling window; it is typically set to the same value as pool_size so that the pooling windows do not overlap [4].

When working with image data, it is common to apply data augmentation techniques such as:

• Rotation: images can be rotated by a random angle to make the model more robust to rotations in the input.
• Translation: images can be translated by a random amount to make the model more robust to translations in the input.
• Flipping: images can be horizontally or vertically flipped to make the model more robust to symmetry in the input.
• Scaling: images can be scaled by a random factor to make the model more robust to scale variations in the input.
• Zooming: images can be zoomed in or out to make the model more robust to zoom variations in the input.
• Lighting: brightness and contrast can be adjusted to make the model more robust to lighting variations in the input.
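In Keras these augmentations are typically configured through a data generator; as a toy plain-NumPy illustration (not the paper's pipeline), random flipping and rotation can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def augment(image):
    """Randomly flip and rotate a square H x W image -- a toy
    stand-in for augmentation applied during training."""
    if rng.random() < 0.5:
        image = np.fliplr(image)   # horizontal flip
    k = int(rng.integers(0, 4))    # rotate by 0/90/180/270 degrees
    return np.rot90(image, k)

img = np.arange(16, dtype=float).reshape(4, 4)
augmented = augment(img)
```

The augmented image contains the same pixel values rearranged, so the label is unchanged while the model sees a geometrically different input each epoch.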
VI. THE FINAL MODEL

We compile the model by setting the optimizer, loss function and evaluation metrics. The Adam optimizer is used; its default learning rate is 0.001. The loss function used is categorical_crossentropy, which is commonly used for multi-class classification problems. The evaluation metric is accuracy, which is the ratio of correctly classified samples to the total number of samples.

The data was split into training and test sets with a 20% test size. We then used datagen, a generator that yields batches of data for training. It takes the training data X_train and labels y_train, and a batch size of 128, feeding the training process 128 samples at a time. The number of epochs was set to 50. The fit() method trains the model on the generator and also performs validation on the X_test and y_test data at the end of each epoch, with a batch size of 128, running for a total of 50 epochs.
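The batching behaviour described above can be illustrated with a small NumPy sketch (the names and toy dataset size here are illustrative, not the paper's actual generator):

```python
import numpy as np

def batch_generator(X, y, batch_size=128):
    """Yield successive (X, y) batches, mirroring how the training
    generator feeds the model 128 samples at a time."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Toy arrays standing in for flower images and one-hot labels
X_train = np.zeros((300, 64, 64, 3))
y_train = np.zeros((300, 5))
batches = list(batch_generator(X_train, y_train))
```

With 300 samples and a batch size of 128, the generator yields two full batches and a final partial batch of 44, which is how one epoch covers the whole training set.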
VII. EVALUATION

For evaluation we utilized the confusion matrix. The confusion matrix provides a useful way to summarize a large amount of data and can be used to calculate various evaluation metrics such as accuracy, precision, recall and F1-score, which are useful for analyzing the performance of the model [7].

Our model achieved an accuracy score of 77% on the test set. The best precision score was achieved on Daisy flowers, with a score of 0.91. The flower with the highest recall as well as the highest F1-score was the Sunflower, at 0.91 and 0.89 respectively. Overall, the model seems to identify Sunflowers and Daisies more reliably than the rest.
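Per-class precision, recall and F1-score follow directly from the confusion matrix; a small sketch with made-up counts (not the paper's actual results):

```python
import numpy as np

def per_class_metrics(cm):
    """Given a confusion matrix cm (rows = true class, columns =
    predicted class), return precision, recall and F1 per class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # TP / (TP + FP)
    recall = tp / cm.sum(axis=1)      # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative 2-class confusion matrix
cm = [[8, 2],
      [1, 9]]
precision, recall, f1 = per_class_metrics(cm)
```

Accuracy is the trace of the matrix divided by its total (17/20 = 0.85 here), while the per-class scores reveal which classes the model confuses, exactly the per-flower comparison reported above.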