Image Recognition With Deep Learning
ICIIBMS 2018, Track 2: Artificial Intelligent, Robotics, and Human-Computer Interaction, Bangkok, Thailand
repeated. After applying FCM to segment food images, they used a sphere-shaped support vector machine (SVM) to classify the segmented images. In this experiment they obtained an accuracy of 95%.

The authors of [5] used a random forests classifier to classify the Food-101 dataset, a large dataset with 101 categories. They obtained an accuracy of 50.76% with the RFDC approach.

Recent literature also contains several approaches that use deep convolutional neural networks (CNNs) to classify food images. Because CNNs scale to large datasets, they are well suited to food image classification. Deep learning was used in [6] to classify UEC-256 food images for a dietary assessment system running on a digital device. They obtained a top-1 accuracy of 54.7% and a top-5 accuracy of 81.5%.

A pre-trained DNN was applied in [7]. The deep CNN was pre-trained on ImageNet with 1000 food-related categories and then fine-tuned to classify the UEC-FOOD100 dataset. They achieved 78.77% top-1 accuracy.

In [8] GoogLeNet was used to classify Thai fast food images in the TFF food dataset, achieving 88.33% accuracy for 11 classes. In [9] several convolutional neural network models were implemented and compared on the Food-11 dataset. They obtained 70.12% with their proposed approach, 80.51% with CaffeNet and 82.07% with AlexNet.

In our paper we implemented deep learning with a convolutional neural network to classify the Food-11 dataset and obtained an accuracy of 92.86%.

II. METHODOLOGY

We used a convolutional neural network in our approach to classify food images. The convolutional neural network is a category of neural network that has proven very effective in image classification. It learns the filters that were hand-engineered in traditional algorithms. In our method we used an Inception V3 [10] model pre-trained on ImageNet [11]. Our method for classifying food images consists of four processes:

Select a food image dataset
Image pre-processing
Train on the dataset using a deep learning algorithm
Classification of food images

Figure 1 depicts our methodology.

A. Input Image

We used the Food-11 dataset for our research. The dataset was created by the authors who proposed [7]. The dataset consists of 16643 images grouped into 11 major food categories: Bread, Dairy products, Dessert, Egg, Fried food, Meat, Pasta, Rice, Seafood, Soup, and Vegetables. The food images are divided into three parts: a training set with 9866 images, a validation set with 3430 images and an evaluation set with 3347 images.

B. Image Pre-processing

We applied several image pre-processing techniques to increase the efficiency of our system by speeding up training. Pre-processing such as random rotation and horizontal flips helps convolutional neural network models become insensitive to the exact position of the object in the image. ZCA whitening reduces the redundancy in the matrix of pixel values and highlights the structures and features of the images for the convolutional neural network. First we resized all our images to 299 x 299 x 3 to reduce processing time and to fit the input size of Inception V3. After that we applied the following image pre-processing techniques:

We set the input mean to 0 over the dataset.
We also set each sample mean to 0.
Then we divided the inputs by the standard deviation of the dataset.
We also divided each input by its own standard deviation.
We applied ZCA whitening.
We randomly rotated images in the range from 0 to 180 degrees to make our learning model invariant to the orientation and location of the object in the image.
We randomly shifted images horizontally.
We randomly shifted images vertically.
We randomly flipped images.

C. Deep Learning

Our research utilizes the Inception V3 [10] model. The architecture of the model is given in Figure 2.

The layers of the Inception V3 model are:

Convolution Layer: The network begins with a convolution layer that takes an input of size 299 x 299 x 3 and creates feature maps by convolving the input images.

Max Pooling Layer: Max pooling is a sample-based discretization process. It is done by applying a max filter to non-overlapping sub-regions of the input matrices. Max pooling extracts the most important features, such as vertical edges and horizontal edges [12].
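As an illustration of max pooling over non-overlapping sub-regions, the following minimal Python sketch pools a small 2-D feature map. The function name max_pool_2d, the 2 x 2 pool size and the toy values are assumptions made for this example. It follows the description in the text and is not the pooling configuration of Inception V3 itself.

```python
def max_pool_2d(matrix, pool_size=2):
    """Max-pool a 2-D list over non-overlapping pool_size x pool_size regions."""
    rows, cols = len(matrix), len(matrix[0])
    # Trailing rows/columns that do not fill a complete region are ignored.
    out_rows, out_cols = rows // pool_size, cols // pool_size
    pooled = []
    for i in range(out_rows):
        pooled_row = []
        for j in range(out_cols):
            # Collect every value inside the current sub-region.
            region = [
                matrix[i * pool_size + di][j * pool_size + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            ]
            # Keep only the strongest response of the region.
            pooled_row.append(max(region))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 feature map (toy values, not taken from the paper).
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

Each output value is the maximum of one 2 x 2 block, so the 4 x 4 map is downsampled to 2 x 2 while the largest activations are kept.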
Average Pooling Layer: The average pooling layer reduces the variance and complexity of the data. It divides the input into rectangular pooling regions and computes the average value of each region to downsample the input features [10].

Concat Layer: The concat layer concatenates its multiple input blobs into one single output blob [10].

Dropout Layer: The dropout layer randomly drops elements from a layer in the neural network. Dropout is a technique used to reduce over-fitting in neural networks [10].

Fully Connected Layer: The fully connected (FC) layer in the CNN represents the feature vector for the input. This feature vector holds information that is vital to the input [10].

Softmax Layer: The softmax layer assigns decimal probabilities to each class in a multi-class recognition problem. Those decimal probabilities must add up to 1.0. This additional constraint helps training converge more quickly [10].

D. Classification

We trained the model on our dataset using the SGD [13] optimizer with an initial learning rate of 0.01 and a momentum of 0.9. We used a learning rate scheduler to set the learning rate to 0.002 after 15 epochs and to 0.0004 after 28 epochs.

A. Evaluation of model

Our dataset was divided into three parts: training, validation and evaluation. We used the training and validation parts while training the model, and the evaluation part during the evaluation of the model. We resized the images of the evaluation part to 299 x 299 x 3. We evaluated the accuracy of the model from the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) after classification.

Model                Accuracy
Proposed Approach    92.86%

Fig. 4. Plot of model accuracy on training and validation datasets.

We applied the models described in [9]. Fine-tuned AlexNet [9] reaches 82.23% accuracy and CaffeNet [9] reaches 80.12% accuracy on our dataset. The proposed approach, which uses transfer learning with the Inception V3 convolutional neural network, has an accuracy of 92.86%. Transfer learning reuses the knowledge gained from previous training when training on a new dataset, which is why our proposed approach achieves better accuracy.
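The evaluation from TP, TN, FP and FN counts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the one-vs-rest counting per class, the function names confusion_counts and accuracy_from_counts, and the toy labels are all hypothetical and are not taken from the paper's code or data.

```python
def confusion_counts(y_true, y_pred, positive_class):
    """Count TP, TN, FP and FN for one class, treating all other classes as negative."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive_class and truth == positive_class:
            tp += 1  # predicted the class and it was correct
        elif pred == positive_class:
            fp += 1  # predicted the class but the truth was another class
        elif truth == positive_class:
            fn += 1  # missed an image that belongs to the class
        else:
            tn += 1  # correctly did not predict the class
    return tp, tn, fp, fn


def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical labels for six evaluation images (toy data).
y_true = ["Bread", "Rice", "Soup", "Rice", "Bread", "Soup"]
y_pred = ["Bread", "Rice", "Rice", "Rice", "Soup", "Soup"]

counts = confusion_counts(y_true, y_pred, positive_class="Rice")
print(counts)                         # (2, 3, 1, 0)
print(accuracy_from_counts(*counts))  # 5 of 6 decisions are correct
```

Repeating the count for each of the 11 categories gives one set of TP, TN, FP and FN values per class.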
neural network works very well in the food classification task. However, there is still room for improvement in reducing the gap between training and test accuracy in our model. Other neural network models, such as recurrent neural networks and dilated convolutional neural networks, can also be applied to classify food images. Convolutional neural network models take time to train, but once a model is trained it can easily be used for classification. We can also improve our research by applying feature-based models, which take less computational time. We can also apply different machine learning algorithms to larger datasets with more categories.

References

[1] Health effects of overweight and obesity in 195 countries over 25 years, New England Journal of Medicine 377 (2017), no. 1, 13–27, PMID: 28604169.
[2] Y. He, C. Xu, N. Khanna, C. J. Boushey and E. J. Delp, "Analysis of food images: Features and classification," 2014 IEEE International Conference on Image Processing (ICIP), Paris, 2014, pp. 2744-2748.
[3] Z. Zong, D. T. Nguyen, P. Ogunbona and W. Li, "On the Combination of Local Texture and Global Structure for Food Classification," 2010 IEEE International Symposium on Multimedia, Taichung, 2010, pp. 204-211.
[4] S. J. Minija and W. R. S. Emmanuel, "Food image classification using sphere shaped - Support vector machine," 2017 International Conference on Inventive Computing and Informatics (ICICI), Coimbatore, 2017, pp. 109-113.
[5] Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool, Food-101 – mining discriminative components with random forests, Computer Vision – ECCV 2014 (Cham) (David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, eds.), Springer International Publishing, 2014, pp. 446–461.
[6] Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, and Yunsheng Ma, Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment, CoRR abs/1606.05675 (2016).
[7] K. Yanai and Y. Kawano, "Food image recognition using deep convolutional network with pre-training and fine-tuning," 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Turin, 2015, pp. 1-6.
[8] N. Hnoohom and S. Yuenyong, "Thai fast food image classification using deep learning," 2018 International ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI-NCON), Chiang Rai, 2018, pp. 116-119.
[9] G. Özsert Yiğit and B. M. Özyildirim, "Comparison of convolutional neural network models for food image classification," 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Gdynia, 2017, pp. 349-353.
[10] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna, Rethinking the inception architecture for computer vision, CoRR abs/1512.00567 (2015).
[11] J. Deng, W. Dong, R. Socher, L. J. Li, Kai Li and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 2009, pp. 248-255.
[12] D. Ciregan, U. Meier and J. Schmidhuber, "Multi-column deep neural networks for image classification," 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, 2012, pp. 3642-3649.
[13] Mei, Song, Montanari, Andrea & Nguyen, Phan-Minh (2018). A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences, 115, E7665-E7671.