
Volume 9, Issue 8, August – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24AUG974

Butterfly Image Classification Using Convolutional Neural Network (CNN)
Amruth N Murthy¹; Kavana N Murthy²; Dr. Shivandappa³; Dr. Narendra Kumar S⁴
Department of Biotechnology, R V College of Engineering, Bengaluru, India

Abstract:- Butterfly species identification through image classification has become a major application of computer vision, using supervised learning methods to classify different types of butterflies from images. This article gives a detailed overview of recent progress in classifying butterfly images with supervised learning techniques on Google Colab, a widely used cloud-based platform for machine learning projects. The review begins by discussing why precise butterfly categorization matters for biodiversity research and conservation. It then describes the supervised learning methods used for this purpose, such as convolutional neural networks (CNNs), support vector machines (SVMs), and k-nearest neighbors (k-NN).

The review discusses the advantages and drawbacks of applying these methods to butterfly image data, with emphasis on accuracy, efficiency, and generalization. Particular attention is paid to the preprocessing steps needed to improve image quality and extract features, including image augmentation, normalization, and feature scaling. The article also surveys publicly available butterfly image datasets and how they are used to train and evaluate classification models.

Google Colab is highlighted as a powerful tool for building and testing these models because of its convenience, ease of use, and compatibility with leading machine learning libraries such as TensorFlow and PyTorch. Finally, the article examines recent research and projects that have applied butterfly image classification in Colab, highlighting best practices and lessons learned.

Keywords:- Butterfly Image Classification, Supervised Learning, Google Colab, Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), K-Nearest Neighbors (k-NN), Image Preprocessing, Feature Extraction, Machine Learning Libraries, TensorFlow, PyTorch, Image Augmentation, Publicly Available Datasets.

I. INTRODUCTION

Butterfly image classification has become an essential task in computer vision because of its importance in biodiversity monitoring and conservation. Accurately classifying butterfly species from images can significantly help track population dynamics, understand ecological interactions, and protect endangered species. Recent advances in machine learning, particularly supervised learning, have enabled sophisticated models that can distinguish between butterfly species with high accuracy.

Supervised learning, the branch of machine learning in which models are trained on labelled data, has proven particularly effective for image classification. Techniques such as convolutional neural networks (CNNs), support vector machines (SVMs), and k-nearest neighbours (k-NN) have been used to address the complexity of butterfly image classification. CNNs, which automatically learn hierarchical image features, have become the method of choice for many researchers because of their strong performance on visual recognition tasks.

This paper explores the use of these supervised learning techniques in Google Colab, a widely used cloud-based platform that makes it easy to experiment with machine learning models. Colab provides an accessible environment for developing and training models with popular libraries such as TensorFlow and PyTorch; its integration with these libraries lets complex algorithms run without local setup and provides tools for efficient model training and evaluation.

II. BACKGROUND

Supervised Learning- Supervised learning involves training a model on a labelled dataset, where each input image is associated with a specific label (in this case, the butterfly species). The model learns to map input images to their labels during training, and its performance is then evaluated on a separate test set.

• Classification Models: The following are the different types of classification models:
  - Support Vector Machines (SVM): SVMs are effective in high-dimensional spaces and can handle cases where the number of features exceeds the number of samples. They are particularly useful when the number of classes is limited.
  - Decision Trees and Random Forests: These models are simple and interpretable; random forests combine many decision trees into an ensemble to improve robustness.

IJISRT24AUG974 www.ijisrt.com 684


  - Convolutional Neural Networks (CNNs): CNNs are particularly suitable for image classification because they automatically learn spatial hierarchies of image features.

Google Colab- Google Colab is a cloud-based environment that supports Python and provides access to GPUs, making it well suited to training deep learning models. It offers a collaborative environment and integrates with Google Drive for easy data storage and sharing.

III. METHODOLOGY

A. Data Collection and Preprocessing
The effectiveness of a classification model depends heavily on the quality and quantity of the data. For butterfly image classification, datasets such as the Butterfly Dataset from the Kaggle repository can be used.

• Data Collection: Obtain a diverse set of annotated butterfly images. Common sources include public datasets and collaborations with entomological research institutions.
• Data Preprocessing:
  - Image Resizing: Standardize the image dimensions so the model receives inputs of a consistent size.
  - Normalization: Scale pixel values to the range 0 to 1 to help training converge faster.
  - Augmentation: Apply transformations such as rotations, flips, and zooms to increase the diversity of the training set and improve generalization.

B. Model Selection

• CNN Architectures:
  - LeNet-5: An early, relatively simple CNN that is suitable as a baseline.
  - AlexNet: Introduced deeper layers and dropout, giving better performance on larger datasets.
  - VGGNet: Known for its deep architecture built from small (3×3) convolutional filters.
  - ResNet: Uses residual blocks to mitigate the vanishing-gradient problem, which makes very deep networks trainable.
• Transfer Learning:
  - Pre-trained Models: Take models such as VGG16, ResNet50, or InceptionV3 trained on large datasets (e.g., ImageNet) and fine-tune them on the butterfly dataset.

C. Implementation in Google Colab

• Setup Environment:
  - Google Colab Notebook: Create a new notebook in Google Colab and set the runtime type to use a GPU.
• Data Handling:
  - Upload Data: Use the Google Drive integration to upload and access datasets.
  - Data Pipeline: Implement data loading and preprocessing pipelines with a library such as TensorFlow or PyTorch.
• Model Training:
  - Define Model: Choose and define a CNN architecture or load a pretrained model.
  - Compile Model: Set up the loss function, optimizer, and evaluation metrics.
  - Train Model: Train the model on the training set and monitor performance on the validation set.
• Evaluation and Optimization:
  - Evaluate Model: Assess performance with metrics such as accuracy, precision, recall, and F1 score.
  - Hyperparameter Tuning: Optimize hyperparameters to improve performance.
• Visualisation:
  - Plot the training and validation accuracy/loss curves of the CNN model to assess learning behaviour over epochs.
  - Plot confusion matrices for all models to show classification performance, highlighting correctly and incorrectly classified instances.

IV. PROGRAM CODE

import random
import pathlib

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn import metrics
from sklearn.metrics import classification_report

tf.random.set_seed(42)

TRAIN_DIR = '/content/dataset/train'
TEST_DIR = '/content/dataset/test'
IMAGE_SIZE = (256, 256)
BATCH_SIZE = 32

# Training and validation subsets both come from the 'train' folder;
# the same seed with validation_split keeps the two subsets disjoint.
train_set = keras.utils.image_dataset_from_directory(
    TRAIN_DIR, validation_split=0.2, subset='training', seed=1,
    shuffle=True, batch_size=BATCH_SIZE, image_size=IMAGE_SIZE)
val_set = keras.utils.image_dataset_from_directory(
    TRAIN_DIR, validation_split=0.2, subset='validation', seed=1,
    shuffle=True, batch_size=BATCH_SIZE, image_size=IMAGE_SIZE)
# The held-out test folder is loaded in full, in a fixed order.
test_data = keras.utils.image_dataset_from_directory(
    TEST_DIR, shuffle=False, batch_size=BATCH_SIZE, image_size=IMAGE_SIZE)
class_names = train_set.class_names
num_classes = len(class_names)

# Number of training images per class (one sub-folder per class).
for label in class_names:
    images = list(pathlib.Path(TRAIN_DIR).glob(f'{label}/*'))
    print(f'{label} : {len(images)}')

# Number of batches in each split.
print(train_set.cardinality().numpy(), val_set.cardinality().numpy(),
      test_data.cardinality().numpy())

# Show 15 random images from one training batch.
plt.figure(figsize=(8, 5))
for images, labels in train_set.take(1):
    for i in range(15):
        index = random.randrange(len(images))   # valid indices: 0 .. len-1
        plt.subplot(3, 5, i + 1)
        plt.imshow(images[index].numpy().astype("uint8"))
        plt.title(class_names[int(labels[index])], color='blue', fontsize=12)
        plt.axis('off')
plt.show()

for images_batch, labels_batch in train_set:
    print(images_batch.shape)   # (batch, 256, 256, 3)
    print(labels_batch.shape)   # (batch,) integer labels
    break

# Baseline CNN: one convolution + pooling stage and a dense classifier.
tf.random.set_seed(42)
cnn_1 = keras.Sequential([
    layers.Rescaling(1./255),                      # pixel values to [0, 1]
    layers.Conv2D(filters=32, kernel_size=3, activation='relu'),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(500, activation='relu'),
    # One unit per class; softmax yields a probability distribution,
    # which SparseCategoricalCrossentropy expects by default.
    layers.Dense(num_classes, activation='softmax'),
])
cnn_1.compile(loss=keras.losses.SparseCategoricalCrossentropy(),
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
history_1 = cnn_1.fit(train_set, epochs=5, validation_data=val_set)

# Collect all test images and labels into single tensors.
X_test, y_test = None, None
for images, labels in test_data:
    if X_test is None:
        X_test, y_test = images, labels
    else:
        X_test = tf.concat([X_test, images], axis=0)
        y_test = tf.concat([y_test, labels], axis=0)
print(X_test.shape, y_test.shape)

y_pred_proba = cnn_1.predict(X_test)
y_pred = np.argmax(y_pred_proba, axis=1)
print("Accuracy:", metrics.accuracy_score(y_test.numpy(), y_pred))

train_score = cnn_1.evaluate(train_set, verbose=1)
test_score = cnn_1.evaluate(test_data, verbose=1)
print("Train loss:", train_score[0])
print("Train accuracy:", train_score[1])
print('*****************************')
print("Test loss:", test_score[0])
print("Test accuracy:", test_score[1])

# Per-class precision, recall and F1; target names come from the class folders.
print(classification_report(y_test.numpy(), y_pred,
                            labels=list(range(num_classes)),
                            target_names=class_names, zero_division=0))

def plot_random_predictions(dataset, model):
    """Plot 9 random images from one shuffled batch with predicted vs. true labels."""
    class_names = dataset.class_names
    shuffled_data = dataset.shuffle(10)
    for images, labels in shuffled_data.take(1):
        plt.figure(figsize=(8, 8), dpi=120)
        y_pred_proba = model.predict(images)
        for i in range(9):
            index = random.randrange(len(images))
            plt.subplot(3, 3, i + 1)
            img = images[index].numpy().astype("uint8")
            y_true = class_names[int(labels[index])]
            y_pred = class_names[np.argmax(y_pred_proba[index], axis=0)]
            c = 'g' if y_pred == y_true else 'r'   # green = correct, red = wrong
            plt.imshow(img)
            plt.title(f'Predicted: {y_pred}\nTrue label: {y_true}', color=c)
            plt.axis(False)
        plt.show()

plot_random_predictions(test_data, cnn_1)

V. RESULTS

In this study, the performance of three supervised learning models was evaluated for butterfly image classification: Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and Random Forests. Among these, the CNN model emerged as the most effective, achieving the highest accuracy and the best generalization on the test dataset.

The CNN model, designed to learn hierarchical features directly from raw image data, achieved an accuracy exceeding 90%. This success is attributed to its ability to capture the intricate patterns on butterfly wings that are critical for distinguishing between species. The CNN's layered architecture allowed it to automatically extract low-level features (e.g., edges, colours) and combine them into high-level patterns (e.g., shapes, textures), leading to superior performance in classifying butterflies.

In comparison, the SVM model, which relies on manually defined features, achieved moderate accuracy. While effective where species have distinct visual features, the SVM struggled with species that differ only subtly, reflecting the limits of its feature representation. Similarly, the Random Forest model, an ensemble learning method, performed adequately but was less effective on the complex visual data, particularly when distinguishing between visually similar species.
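The Results describe the SVM baseline as relying on manually defined features, but the features themselves are not specified. As one hypothetical example of such a descriptor, the sketch below computes a normalized per-channel colour histogram with NumPy. The function name `color_histogram_features` and the choice of 8 bins are assumptions for illustration; fixed-length vectors like this are the kind of input a classical classifier such as scikit-learn's `SVC` or `RandomForestClassifier` would be fit on.

```python
import numpy as np

def color_histogram_features(image, bins=8):
    """Hypothetical hand-crafted descriptor: normalized per-channel colour histogram.

    image: array of shape (height, width, 3) with values in [0, 255].
    Returns a vector of length 3 * bins whose entries sum to 1.
    """
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an RGB image of shape (H, W, 3)")
    feats = []
    for channel in range(3):
        # Count how many pixels of this channel fall into each intensity bin.
        counts, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        feats.append(counts)
    vec = np.concatenate(feats).astype(np.float64)
    # Normalize so that images of different sizes give comparable vectors.
    return vec / vec.sum()

# Example: a 4x4 image that is pure red.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
f = color_histogram_features(red)
```

Stacking one such vector per image gives the feature matrix for a classical classifier. Because a colour histogram discards spatial layout entirely, it cannot encode the fine wing-pattern geometry that separates similar species, which is consistent with (though not proof of) the weaker SVM results reported above.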


The evaluation metrics, including accuracy, precision, recall, and F1-score, consistently showed the CNN model outperforming the SVM and Random Forest models. The confusion matrices further showed that the CNN made fewer misclassifications, particularly for species with subtle differences, while the SVM and Random Forest made more errors in such cases.
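All of the metrics named above can be derived from the confusion matrix. The minimal NumPy sketch below assumes integer class labels in `0..num_classes-1`, such as the `y_test` and `y_pred` arrays produced in the program code; the helper names are illustrative, and in practice `sklearn.metrics.confusion_matrix` and `classification_report` compute the same quantities.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)   # column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # row sums: how often each class truly occurs
    # Treat 0/0 as 0 so absent classes do not produce division warnings.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy example with three classes and six test images.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
precision, recall, f1 = per_class_metrics(cm)
accuracy = np.trace(cm) / cm.sum()
```

Off-diagonal entries show which pairs of species are confused with each other, which is exactly the "visually similar species" effect discussed in the Results.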

These results underscore the superiority of CNNs among the supervised learning models tested, especially in tasks requiring detailed image analysis. The findings reinforce the notion that deep learning models are better suited to complex image classification tasks than traditional machine learning approaches.

Fig. 1: Output 1
Fig. 2: Output 2
Fig. 3: Output 3
Fig. 4: Output 4
Fig. 5: Output 5


Fig. 6: Output 6
Fig. 9: Output 9

VI. CONCLUSION

This study explored the application of supervised learning algorithms, specifically Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and Random Forests, to classifying butterfly species from images using Google Colab. The results demonstrated the superior performance of CNNs, which achieved the highest accuracy and better generalization on the test dataset than the traditional machine learning models.

Fig. 7: Output 7

The CNN model's ability to automatically extract and learn hierarchical features from images enabled it to distinguish effectively between butterfly species with complex wing patterns and subtle differences. In contrast, the SVM and Random Forest models, while effective in certain scenarios, struggled with the nuances present in the visual data. The study highlighted the limitations of traditional machine learning approaches on intricate image classification tasks, where deep learning models like CNNs excel because they capture both detailed and abstract features.
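The hierarchical feature extraction credited to the CNN starts with the three operations at the front of `cnn_1` in the program code: a 3×3 convolution, a ReLU, and 2×2 max pooling. The NumPy sketch below applies them to a single-channel toy image using a hand-picked vertical-edge kernel. This kernel is an assumption for illustration; in the trained network the 32 kernels are learned, not chosen. Like Keras `Conv2D`, the sketch computes cross-correlation (no kernel flip) with "valid" padding.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D cross-correlation with 'valid' padding (Keras Conv2D's default)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw window under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    h = x.shape[0] // size
    w = x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 image: a bright vertical stripe in columns 2 and 3.
image = np.zeros((6, 6))
image[:, 2:4] = 1.0
# Hand-picked detector for dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

raw = conv2d_valid(image, kernel)   # shape (4, 4); each row is [3, 3, -3, -3]
feature_map = relu(raw)             # negative (bright-to-dark) responses become 0
pooled = max_pool2d(feature_map)    # shape (2, 2)
```

The rising edge survives as strong activations in the left half of `pooled`, while the falling edge is suppressed by the ReLU. Stacking further convolution stages on such maps is what lets deeper layers respond to shapes and textures rather than single edges.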

The use of Google Colab as the computational platform provided a practical and accessible way to implement and train these models, offering a cloud-based solution that removes the need for high-end local hardware. This accessibility is crucial for researchers and practitioners who lack advanced computational resources but still wish to use powerful machine learning techniques.
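To put the hardware point in perspective, the size of the `cnn_1` model from the program code can be worked out by hand. The sketch below assumes the 256×256 RGB input used there, Keras's default "valid" padding, a pooling stride equal to the pool size, and the five output units of the original listing. The megabyte figure assumes 4-byte float32 weights and ignores activations and optimizer state.

```python
def conv_output_size(n, kernel, stride=1):
    """Spatial size after a 'valid' convolution or pooling window."""
    return (n - kernel) // stride + 1

def cnn_1_param_count(height=256, width=256, channels=3, num_classes=5):
    # Conv2D(32 filters, 3x3): one 3x3xC kernel plus one bias per filter.
    conv_params = 3 * 3 * channels * 32 + 32
    h = conv_output_size(height, 3)        # 254 for a 256-pixel side
    w = conv_output_size(width, 3)
    # MaxPooling2D(pool_size=2): the stride defaults to the pool size.
    h = conv_output_size(h, 2, stride=2)   # 127
    w = conv_output_size(w, 2, stride=2)
    flat = h * w * 32                      # length of the Flatten output
    dense_params = flat * 500 + 500        # Dense(500): weights + biases
    out_params = 500 * num_classes + num_classes
    return flat, conv_params + dense_params + out_params

flat, total = cnn_1_param_count()
float32_megabytes = total * 4 / 1e6
print(flat, total, round(float32_megabytes))  # prints: 516128 258067901 1032
```

Almost all of the roughly 258 million parameters sit in the `Dense(500)` layer after `Flatten`, so the weights alone take about 1 GB, and Adam keeps two more values per parameter during training. Adding convolution and pooling stages before `Flatten`, or replacing it with global average pooling, would shrink the model sharply; this is an observation about the listed architecture rather than a result reported here.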

Fig. 8: Output 8

The study's findings have significant implications for ecological monitoring and biodiversity conservation, where accurate and efficient species identification is essential. By demonstrating the effectiveness of CNNs in butterfly classification, this research paves the way for future work that could further improve accuracy through transfer learning, more complex architectures, or larger and more diverse datasets.


In conclusion, this paper underscores the potential of deep learning models, particularly CNNs, to advance automated species identification, providing a robust tool for researchers and conservationists alike.
