Application of Transfer Learning For Image Classification On Dataset With Not Mutually Exclusive Classes
the limitations of the deep CNN, such as the requirement of a large training dataset and heavy computational cost. For datasets with well-decoupled classes, the conventional CNN can perform classification tasks with fair accuracy; however, it will suffer if the dataset contains coupled classes. Experimental results show that the deep CNN-based transfer learning models can classify images with better accuracy than the conventional CNN, and win by a large margin whether the classes are coupled or decoupled.

Fig. 1. Comparison of the pretrained network (ImageNet dataset, Conv layers, FC layers) and the transfer learning network, in which the transferred layers are reused and a custom classifier is trained on the custom dataset for the custom classification task.

2. Transfer Learning Using Pretrained Networks

2.1 AlexNet
AlexNet, developed by Alex Krizhevsky, is a fast GPU implementation of a CNN that won the ImageNet contest in 2012. It is capable of achieving high accuracy on very challenging datasets: trained to classify more than a million ImageNet images into a thousand different categories, it achieved 37.5% top-1 and 17.0% top-5 error rates on the test data [4].
The network contains five convolutional layers followed by three fully-connected layers. The output of the last fully-connected layer is fed to a softmax layer to classify the 1,000 classes.
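The paper gives no code, so the following is only an illustrative sketch: it loads the torchvision replica of AlexNet (an assumption, since the original implementation is not specified here) and confirms the layer counts described above.

    import torch.nn as nn
    from torchvision import models

    # Load the torchvision replica of AlexNet, pretrained on ImageNet.
    alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

    # Count the layers described in the text: five convolutional, three fully-connected.
    convs = [m for m in alexnet.modules() if isinstance(m, nn.Conv2d)]
    fcs = [m for m in alexnet.modules() if isinstance(m, nn.Linear)]
    print(len(convs), "conv layers,", len(fcs), "fully-connected layers")  # 5 and 3

    # The last fully-connected layer maps 4,096 features to the 1,000 ImageNet
    # classes; a softmax over its output gives the class probabilities.
    print(alexnet.classifier[-1])  # Linear(in_features=4096, out_features=1000)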
2.2 VGG16

VGG16 is another deep CNN, which surpasses AlexNet; the 16 indicates that the network is 16 layers deep. Proposed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group at the University of Oxford, VGG16 achieves 92.7% top-5 test accuracy on the ImageNet dataset, and it won first place in the localization track and second place in the classification track of the 2014 ImageNet Challenge. The work behind VGG16 investigates the impact of network depth on the accuracy of large-scale image recognition tasks; it found that increasing the network depth while using small convolutional filters can significantly improve on the prior state-of-the-art configurations.
The model takes a 224 x 224 RGB image as input and passes the data through a stack of convolutional layers. Within the convolutional layers, filters with a small 3 x 3 receptive field are used to capture the notions of left/right, up/down, and center, and 1 x 1 convolutional filters are used as a linear transformation of the input channels. The convolutional stride is 1 pixel, and the spatial padding is 1 pixel. Five max-pooling layers are inserted at specific locations and are performed over a 2 x 2 pixel window with stride 2. Three fully-connected layers are placed at the end of the stack of convolutional layers, of which the first contains 4,096 channels and the last contains 1,000 channels for the 1,000 classes in the ImageNet Challenge. A softmax layer is used as the final layer.
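Again purely as a hedged illustration rather than the authors' implementation, the torchvision replica of VGG16 can be checked against the configuration just described: 3 x 3 convolutions with stride 1 and padding 1, five 2 x 2 max-pooling stages, and a final stack of three fully-connected layers.

    import torch.nn as nn
    from torchvision import models

    # Load the torchvision replica of VGG16, pretrained on ImageNet.
    vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

    # In this 16-layer configuration all convolutions use 3 x 3 kernels with
    # padding 1, and the five max-pooling stages use a 2 x 2 window with stride 2.
    convs = [m for m in vgg16.features if isinstance(m, nn.Conv2d)]
    pools = [m for m in vgg16.features if isinstance(m, nn.MaxPool2d)]
    assert all(m.kernel_size == (3, 3) and m.padding == (1, 1) for m in convs)
    assert len(pools) == 5 and all(m.kernel_size == 2 and m.stride == 2 for m in pools)

    # The classifier ends in three fully-connected layers: 4096 -> 4096 -> 1000.
    print([m for m in vgg16.classifier if isinstance(m, nn.Linear)])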
2.3 Transfer Learning Models

The transfer learning approach, in terms of CNNs, replaces the original classifier in the pretrained network with a new classifier that classifies images in the new dataset; the rest of the structure of the newly formed transfer network is the same as that of the pretrained network except for the last layers. The pretrained networks used to build the transfer learning models are AlexNet and VGG16, which have been trained with millions of images beforehand. The comparison of the structure of the pretrained network and the new network built for a specific image classification purpose is shown in Fig. 1. In order to utilize the pretrained network in the newly constructed network, the last fully-connected layer of AlexNet and VGG16 is replaced with a new classifier, which contains a fully-connected layer whose size equals the number of classes in the new dataset, followed by a softmax layer and a classification output layer. In order to transfer the learned features of AlexNet and VGG16 to the new classification network, the weights in the transferred layers are kept frozen; only the weights in the newly added layers are trained to fulfill the new classification tasks.
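A minimal PyTorch-style sketch of this construction is given below. It assumes the torchvision pretrained weights and the 16-class Dataset I of Section 3, and it is not the authors' original implementation; VGG16 is handled identically because its torchvision classifier also ends at index 6.

    import torch.nn as nn
    from torchvision import models

    def build_transfer_model(num_classes: int) -> nn.Module:
        """Replace the 1,000-class head of a pretrained AlexNet with a new
        classifier and freeze the transferred layers, as described above."""
        model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

        # Freeze every transferred weight.
        for param in model.parameters():
            param.requires_grad = False

        # New fully-connected layer sized to the new dataset; it is the only part
        # left trainable. The softmax is applied at the loss or inference step.
        model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
        return model

    model = build_transfer_model(num_classes=16)  # 16 classes in Dataset I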
3. Experimental Results and Discussion

A dataset from Kaggle which contains abstract categories has been used to evaluate the performance of the transferred networks. The images in this dataset (namely Dataset I) are scraped from the Unsplash website and are classified into 16 different categories, each containing 500 images. The classes of Dataset I are not mutually exclusive: they include animals, architecture, arts-culture, athletics, business-work, fashion, food-drink, health, history, interiors, nature, people, street-photography, technology, textures-patterns, and travel, as shown in Fig. 2. Some classes overlap, such as animals/nature, health/food-drink, and street-photography/travel, so an image might plausibly belong to several classes; however, each image carries only one label in the dataset.

Fig. 2. Image examples from the custom dataset.

Fig. 3. Training results of the AlexNet-based transfer learning model.
Dataset I is divided into a training set, a validation set, and a test set, which contain 4,800, 1,600, and 1,600 images, respectively. In order to evaluate how transfer learning performs on a dataset with coupled classes, transfer learning networks are built using AlexNet and VGG16. The original 1,000-class classifier is replaced with a new classifier, and the newly added fully-connected layer has a size of 16 to match the number of classes in Dataset I. The images, which come in various sizes, are first resized to suit the input size of the AlexNet and VGG16 input layers. During the training process, the weights of the transferred layers are kept frozen by specifying a much higher learning rate for the classifier than for the transferred layers. The training is carried out for 6 epochs using the sgdm solver. The training results of the transferred models are shown in Fig. 3 and Fig. 4, and the classification accuracy on Dataset I is shown in Table I. For comparison, a conventional CNN is used as the benchmark.

Fig. 4. Training results of the VGG16-based transfer learning model.

Table I: Classification accuracy on the not mutually exclusive Dataset I (16 classes)

                          VGG16-based   AlexNet-based   Conventional CNN
    Validation accuracy   0.5294        0.4613          0.1550
    Test accuracy         0.5069        0.4500          0.1613
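The following sketch mirrors that training setup in PyTorch terms, where SGD with momentum stands in for the sgdm solver. The data paths, batch size, and learning rate are illustrative assumptions; the 224 x 224 resize, the frozen transferred layers, and the 6 epochs follow the text.

    import torch
    from torch import nn, optim
    from torchvision import datasets, models, transforms

    # Resize to 224 x 224 and apply the normalization used by the pretrained weights.
    # The folder layout "dataset1/train/<class>/" is an assumption.
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    train_set = datasets.ImageFolder("dataset1/train", transform=preprocess)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    # AlexNet-based transfer model: transferred layers frozen, new 16-class head.
    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    for p in model.parameters():
        p.requires_grad = False
    model.classifier[6] = nn.Linear(4096, 16)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)

    # SGD with momentum as a rough counterpart of the sgdm solver; only the newly
    # added classifier weights are trainable and therefore passed to the optimizer.
    optimizer = optim.SGD([p for p in model.parameters() if p.requires_grad],
                          lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()  # applies the softmax internally

    for epoch in range(6):  # trained for 6 epochs, as in the text
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()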
It can be seen that the VGG16-based transfer learning network has the best performance among the three networks, achieving a test accuracy of 0.5069 on the dataset with coupled classes. The AlexNet-based model comes second with a test accuracy of 0.4500. Both transfer learning networks have far better test accuracy than the conventional CNN, whose accuracy is only a poor 0.1613. This shows that the conventional CNN suffers severely from the coupled classes and struggles to produce acceptable classification accuracy, while the transfer learning models can still yield credible results.
Even though the classification accuracy of the two transfer learning networks is not as high on Dataset I, the classification score for each class can still provide insight into which classes an image could belong to. Fig. 5 shows two example images taken from the test set of Dataset I; their true classes are animals and health, respectively. For Image A, one can naturally reckon that the image belongs to two classes, animals and nature. The AlexNet-based network correctly classifies the image as animals with a score of 0.9002. The VGG16-based network, even though it finally classifies the image as nature with a score of 0.5174, still gives a fairly confident score (0.4579) to animals, and no other class gets a score higher than 0.0150. As for Image B, the VGG16-based network classifies it into food-drink with a score of 0.8249, but it also gives a score of 0.1748 to the class health, and the scores for the rest of the classes are all too small to be considered. It is common sense that Image B could be put into both the food-drink and health classes.

Fig. 5. Example images taken from the test set of Dataset I. (a) Image A, label: animals. (b) Image B, label: health.
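As a small illustration of how such per-class scores can be read off a trained model (the hyphenated class folder names, the helper name, and the image path below are all hypothetical assumptions), the softmax output can be ranked as follows.

    import torch
    from PIL import Image
    from torchvision import transforms

    # Class names of Dataset I in the alphabetical order assigned by ImageFolder.
    classes = sorted([
        "animals", "architecture", "arts-culture", "athletics", "business-work",
        "fashion", "food-drink", "health", "history", "interiors", "nature",
        "people", "street-photography", "technology", "textures-patterns", "travel",
    ])

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def top_scores(model, image_path, k=3):
        """Return the k highest softmax scores, e.g. ("animals", 0.90) for Image A."""
        model.eval()
        x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = torch.softmax(model(x), dim=1).squeeze(0)
        values, indices = probs.topk(k)
        return [(classes[int(i)], float(v)) for v, i in zip(values, indices)]

    # Example call (hypothetical path): top_scores(model, "dataset1/test/animals/img.jpg")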
Table II: Classification accuracy on the mutually exclusive Dataset II (2 classes)

                          VGG16-based   AlexNet-based   Conventional CNN
    Validation accuracy   0.9900        0.9650          0.7450
    Test accuracy         0.9950        0.9650          0.7100
In order to demonstrate the effectiveness of transfer learning on a well-separated dataset, Dataset I is cropped to remove the overlapping classes, constructing Dataset II. Dataset II contains only two mutually exclusive classes, namely animals and architecture, and it is divided into a training set, a validation set, and a test set with a ratio of 3:1:1. A comparison of the classification accuracy of the VGG16-based network, the AlexNet-based network, and the conventional CNN on Dataset II is given in Table II. It can be seen that the classification accuracies of the two transfer learning networks on Dataset II with well-decoupled classes are both above 96%, which is much improved compared with the accuracies on Dataset I with not mutually exclusive classes. Besides, the two transfer learning networks still outperform the conventional CNN (71% test accuracy) on Dataset II by a large margin.
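A minimal sketch of this Dataset II construction is given below; it assumes a dataset1/<class>/*.jpg folder layout and copies the files into train/val/test folders with the 3:1:1 ratio, since the paper does not specify how the split was implemented.

    import random
    from pathlib import Path

    def split_dataset2(src="dataset1", dst="dataset2",
                       classes=("animals", "architecture"), ratios=(3, 1, 1), seed=0):
        """Keep only the two mutually exclusive classes and split each 3:1:1."""
        rng = random.Random(seed)
        for cls in classes:
            files = sorted(Path(src, cls).glob("*.jpg"))
            rng.shuffle(files)
            total = sum(ratios)
            n_train = len(files) * ratios[0] // total
            n_val = len(files) * ratios[1] // total
            splits = {
                "train": files[:n_train],
                "val": files[n_train:n_train + n_val],
                "test": files[n_train + n_val:],
            }
            for split_name, split_files in splits.items():
                out_dir = Path(dst, split_name, cls)
                out_dir.mkdir(parents=True, exist_ok=True)
                for f in split_files:
                    (out_dir / f.name).write_bytes(f.read_bytes())

    split_dataset2()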
4. Conclusion

This paper evaluates the performance of the deep CNN-based transfer learning approach on a dataset with not mutually exclusive classes for image classification. A deep CNN has the major drawback that it requires a large dataset to train, while a conventional CNN has low accuracy when used on a dataset with coupled classes. Therefore, transfer learning based on AlexNet and VGG16 is proposed for this kind of dataset. Experimental results show that when the transfer learning approach is used on the dataset with not mutually exclusive classes, it achieves acceptable accuracy, and the classification scores reflect the potential classes that an image might belong to, while the conventional CNN fails with low accuracy. When the classes in the dataset are decoupled, the accuracy of the transfer learning approach increases dramatically, and it still outperforms the conventional CNN by a large margin.

Acknowledgment

This work was supported by Seoul National University of Science and Technology, Seoul, South Korea.

References

[1] J. Wang, Y. Zheng, M. Wang, Q. Shen, and J. Huang, "Object-Scale Adaptive Convolutional Neural Networks for High-Spatial Resolution Remote Sensing Image Classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 283-299, 2021.

[2] J. Zhang, Y. Xia, Y. Xie, M. Fulham, and D. D. Feng, "Classification of Medical Images in the Biomedical Literature by Jointly Using Deep and Handcrafted Visual Features," IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 5, pp. 1521-1530, Sept. 2018.

[3] H. Lee and H. Kwon, "Going Deeper With Contextual CNN for Hyperspectral Image Classification," IEEE Transactions on Image Processing, vol. 26, no. 10, pp. 4843-4855, Oct. 2017.

[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097-1105, 2012.

[5] K. He et al., "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[6] A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861, 2017.

[7] L. Shao, F. Zhu, and X. Li, "Transfer Learning for Visual Categorization: A Survey," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 5, pp. 1019-1034, May 2015.

[8] A. Hanni, S. Chickerur, and I. Bidari, "Deep Learning Framework for Scene Based Indoor Location Recognition," in 2017 International Conference on Technological Advancements in Power and Energy (TAP Energy), IEEE, 2017.

[9] M. Fradi, M. Afif, E.-H. Zahzeh, K. Bouallegue, and M. Machhout, "Transfer-Deep Learning Application for Ultrasonic Computed Tomographic Image Classification," in 2020 International Conference on Control, Automation and Diagnosis (ICCAD), Paris, France, 2020, pp. 1-6.

[10] Z. N. K. Swati et al., "Brain Tumor Classification for MR Images Using Transfer Learning and Fine-Tuning," Computerized Medical Imaging and Graphics, vol. 75, pp. 34-46, 2019.