Efficient Extraction of Deep Image Features Using Convolutional Neural
ARTICLE INFO

Keywords: Food detection; Convolutional neural network; Feature extraction; Deep learning; Food safety and quality

ABSTRACT

Background: The development of techniques and methods for rapidly and reliably detecting and analysing the quality and safety of food products is of significance for the food industry. Traditional machine learning algorithms based on handcrafted features normally have poor performance due to their limited representation capacity for complex food characteristics. Recently, the convolutional neural network (CNN) has emerged as an effective and potential tool for feature extraction. It is considered the most popular architecture of deep learning and has been increasingly applied for the detection and analysis of complex food matrices.

Scope and approach: In the current review, the structure of CNN, the method of feature extraction based on 1-D, 2-D and 3-D CNN models, and multi-feature aggregation methods are introduced. Applications of CNN as a deep feature extractor for detecting and analyzing complex food matrices are discussed, including meat and aquatic products, cereals and cereal products, fruits and vegetables, and others. In addition, data sources, model architecture and overall performance of CNN are compared with other existing methods, and trends of future studies on applying CNN for food detection and analysis are also highlighted.

Key findings and conclusions: CNN combined with nondestructive detection techniques and computer vision systems shows great potential for effectively and efficiently detecting and analysing complex food matrices, and the features based on CNN outperform handcrafted features and those extracted by machine learning algorithms. Although some challenges remain in using CNN, it is expected that CNN models will be deployed on mobile devices for real-time detection and analysis of food matrices in the future.
https://doi.org/10.1016/j.tifs.2021.04.042
Received 17 October 2020; Received in revised form 9 April 2021; Accepted 30 April 2021
Available online 6 May 2021
0924-2244/© 2021 Elsevier Ltd. All rights reserved.
Y. Liu et al. Trends in Food Science & Technology 113 (2021) 193–204
developed for the qualitative and quantitative analysis of food matrices (Ma et al., 2018; Wang et al., 2017).

In food detection and analysis, data acquisition is the first important step, as the performance of food feature extraction highly depends on the status of the data (Teng et al., 2019). Many advanced nondestructive techniques are available for data acquisition, including computer vision (Sun & Brosnan, 2003; Wang & Sun, 2003; Zheng et al., 2006), hyperspectral imaging (HSI) (Cheng et al., 2016b; Liu et al., 2018b; Ma et al., 2017; Pan et al., 2018), Terahertz imaging (Zhou et al., 2019) and so on. In addition, a number of food image datasets are also available, such as Food-101, UECFood-100, UECFood-256, Food-524, Food-475, Food-5K and Food-11, which can be readily used for detecting and analysing food attributes (Ciocca et al., 2018; McAllister et al., 2018). Therefore, the main challenge is to design and develop accurate, efficient and objective algorithms to rapidly obtain food attributes.

There are many conventional machine learning algorithms used for food data analysis, such as support vector machines (SVM) (Du & Sun, 2005), K-means clustering, and artificial neural networks (ANN) (Zhou et al., 2019). However, using these algorithms to exploit the food architecture information is a difficult task, since foods are typically non-rigid and deformable objects, and foods generally exhibit high intra-class variance, i.e., multiple visual appearances within the same food products, and low inter-class variance, i.e., similar characteristics in different types of food products (Mezgec et al., 2017). Therefore, these traditional algorithms based on handcrafted features probably have poor performance due to their limited capacity to represent these complex characteristics (Situju et al., 2019).

Alternative algorithms that can overcome the above difficulties are those based on deep learning, also known as representational learning and feature learning (LeCun et al., 2015). For example, Gu et al. (2020) employed a dual-channel network to extract more complete image features and obtained good performance. Gu et al. (2021) studied the feasibility of ensemble meta-learning in better extracting the main features by fine-tuning the deep neural network. Deep learning is a derivative of artificial neural networks that aims to imitate the visual neural network of the human brain in order to analyze and process massive data, and it can automatically extract data representations in a hierarchical way. Due to its feature learning and generalization ability, deep learning is widely applied for solving complex problems and has achieved satisfactory performance in a variety of research areas such as medicine, forestry, and agriculture (Pouladzadeh et al., 2017).

Among the widely-used deep learning algorithms, including convolutional neural network (CNN), fully convolutional network (FCN), stacked autoencoders (SAEs) and long short-term memory (LSTM), CNN and its derivatives are the most popular models for processing visual-related problems including image classification, object detection and semantic segmentation (Teng et al., 2019). The operation of CNN, which is a highly parallelized method, is based on the forward and backward propagation algorithm, so that it can automatically learn to extract distributed features of input data in convolutional layers, and the deep features generated by CNN are more efficient and robust than handcrafted features (Ciocca et al., 2018). Therefore, CNN has recently been widely studied for extracting the features of foods. McAllister et al. (2018) employed CNN models as deep feature extractors to capture the features of diverse food image datasets, which were subsequently fed into machine learning algorithms to perform food classification. Teng et al. (2019) evaluated the performance of the feature based on the CNN_5 architecture as compared with the Bag-of-Features (BoF) on a Chinese food image dataset, while Pan et al. (2020) utilized a combinational CNN with two streams of subnets in parallel to extract different types of features of food datasets, which were subsequently fused to improve the performance of classification by using a multi-feature aggregation method.

There are also a few relevant reviews published in the past. Kamilaris et al. (2018b) reviewed the applications of CNN in agriculture and indicated that CNN constitutes a promising technique with high performance for various agricultural problems. Zhou et al. (2019) focused on the applications of deep learning in the food domain, covering food identification, calorie intake calculation, quality detection of fruits, vegetables, meat and aquatic products, food supply chain, and food contamination, and indicated that deep learning is a promising tool that outperforms manual feature extractors and machine learning algorithms. Most recently, Kumar et al. (2020) described several feature extraction methods based on conventional techniques and deep learning techniques for hyperspectral image classification, and indicated that deep learning techniques were a better choice to suit the cubical form of the HSI data. However, despite the above reviews, no review is available on feature extraction methods related to different dimensions based on CNN for food detection and analysis. Therefore, the objective of this review is to introduce feature extraction methods based on 1-D, 2-D and 3-D CNN models and discuss recent research advances in the food area, including meat and aquatic products, cereals and cereal products, fruits and vegetables, and others. It is hoped that this review will provide critical information for further understanding of feature extraction techniques based on 1-D, 2-D and 3-D CNN and encourage more applications of CNN models for food detection and analysis.

2. Feature extraction using CNN

2.1. Fundamentals of CNN

The convolutional neural network represents a class of deep feed-forward neural networks, which is constructed by imitating the connective pattern of neurons in the human visual cortex (Hussain et al., 2019). A typical CNN architecture usually consists of convolutional (Conv) layers, activation layers, pooling (POOL) layers, and fully connected (FC) layers, as shown in Fig. 1. The Conv layers are made up of a series of convolution kernels, each of which obtains certain features from the input images, starting from basic features such as edges and shapes at the initial layers and becoming more complex and specific features at the last layers. The parameters of the convolution kernels (e.g. number, kernel size, strides, padding, etc.) should be adjusted and optimized according to the size of the input image and the architecture of the network (Zhou et al., 2019). Moreover, the neurons of the Conv layer are only sparsely connected to the neurons of adjacent layers, and individual neurons respond to overlapping regions of the receptive field by sharing weights until the entire visual area is covered (Hu et al., 2015; Mezgec et al., 2017). The activation layer behind Conv or FC layers is a non-linear operation of CNN models, which learns the non-linear representation of the output volume of the previous Conv or FC layer by a non-linear activation function such as Sigmoid, Tanh or the rectified linear unit (ReLU) (Paoletti et al., 2019). The POOL layer can reduce the spatial dimensions of the extracted feature maps and the number of network parameters by numerical operations such as average-pooling, sum-pooling or max-pooling (Paoletti et al., 2019). For example, a max-pooling function divides the input data into non-overlapping rectangles and retains the maximum value for each region. After a set of Conv layers alternating with POOL layers has learned to abstract features of increasing levels, the FC layer can be implemented as a classifier at the end of the model in order to classify input images into predefined labelled classes by integrating high-level features and information from previous layers (Kamilaris et al., 2018a). The output of the FC layers represents the output of the CNN model, and the number of output nodes depends on the number of labelled classes (Ciocca et al., 2018).

The early architecture of CNN is LeNet-5 (Lecun et al., 1998), which has been successfully employed for handwritten digit recognition. With the introduction of large scale labelled databases such as ImageNet, new deep learning technologies such as ReLU and Dropout, and new computing hardware such as graphics processing units (GPUs), AlexNet (Krizhevsky et al., 2012) was the first notable framework to gain better performance than traditional computer vision methods
for image recognition in ILSVRC (ImageNet Large Scale Visual Recognition Challenge). After the successful application of AlexNet, many deeper and wider network structures such as GoogleNet, VGGNet, Residual Networks (ResNet), and DenseNet have been proposed to handle complex problems of the restricted ability of feature representation (Pan et al., 2020).

Fig. 1. A typical CNN architecture comprising convolutional (Conv) layers, pooling (POOL) layer, and fully connected (FC) layer for food detection and analysis.

With the rapid expansion of the depth and breadth of CNN models, more abstract and robust features can now be obtained by employing deeper networks. Jahani Heravi et al. (2018) designed a ConvNet architecture with 23 layers for food classification, which not only added the spatial pyramid pooling (SPP) layer to improve the performance of the model but also utilized bottleneck blocks to decrease the complexity of the model, so that the architecture had fewer parameters than the ResNet and GoogleNet models. The parameters of the CNN architecture can be adjusted and updated in each training epoch of the forward and backward propagation process (Zhou et al., 2019). The forward propagation aims to calculate the error value between the output value and the labelled value according to the defined loss function, while the backward propagation intends to adjust the weights and biases of neurons based on this error value, thereby minimizing the loss function by using Stochastic Gradient Descent (SGD), Adaptive Gradient (AdaGrad), or Nesterov’s Accelerated Gradient (NAG) algorithms (Mezgec et al., 2017).

All the above mentioned CNN architectures, including other derivative frameworks of CNN, have brought a series of eminent breakthroughs in the food area. However, one troublesome issue about training CNN models is that large-scale datasets are necessary, as small-scale datasets (e.g. a few hundred images or fewer) might cause the overfitting problem (Pan et al., 2019). At present, four widely-used methods have been utilized to address the overfitting problem. The first is fine-tuning, which takes the trained weights and biases of a pre-trained model as the initialization values and then resumes training of the model, as training the entire CNN model from scratch is complex and time-consuming (Pan et al., 2020). Ciocca et al. (2018) evaluated the performances of the ResNet-50-S model trained from scratch and the fine-tuned ResNet-50 model on the Food-475 dataset, and found that the fine-tuned ResNet-50 outperformed the ResNet-50-S. The second is image augmentation for small and medium scale datasets. Pan et al. (2019) utilized four image augmentation techniques including rotation, flipping, translation and shearing to expand the size of the small-scale food dataset called MLC-41. The third is to use a CNN model pre-trained on a large-scale database as a deep feature extractor for small and medium scale datasets. The last is adjusting the network framework properly based on the basic CNN networks. Liu et al. (2018a) adjusted the AlexNet network architecture by adding a module called “Inception Module”, which not only increased the depth of the network but also removed the computation bottlenecks. Therefore, the architecture and parameters of CNN should be correspondingly adjusted according to the specific food task. Fig. 2 summarizes the procedure of using CNN for food detection and analysis.

Fig. 2. A workflow for food detection and analysis. Five major steps are involved in this workflow including dataset preparation, image preprocessing, CNN model selection and download, fine-tuning CNN model, and application.

2.2. Extraction techniques

Feature extraction (FE) is usually considered an indispensable step for image analysis and processing, and it is a complicated and time-consuming process, which needs to determine the most appropriate types of features for the successful detection and analysis of certain food categories (McAllister et al., 2018). Generally, in the spectral feature extraction method, all pixel-wise spectra of the region of interest (ROI) in each sample are averaged as a spectral signature of the ROI. At present, both handcrafted features using colour histogram, histogram of oriented gradients (HOG), scale-invariant feature transform (SIFT) and local binary pattern (LBP), and features captured using conventional methods including principal component analysis (PCA), wavelet transform (WT) and independent component analysis (ICA) can be utilized for food detection and analysis (McAllister et al., 2018; Zhou et al., 2019). However, using these features to perform food detection may give lower performance due to their limited representation capacity for complex food characteristics (Situju et al., 2019).

On the other hand, efficient feature extraction methods based on CNN have been developed rapidly in recent years after the appearance of the ImageNet database, testifying to the capacity of CNN to learn high-level features automatically. The features generated by CNN have been confirmed to be more efficient and robust than handcrafted and captured features (Ciocca et al., 2018; Jahani Heravi et al., 2018). Therefore, in order to extract deep features from pre-trained CNN, it is important to select a
layer for each model. Generally, CNN extracts high-level features from the last output layer of the model (Pan et al., 2020). Additionally, it is worth mentioning that CNN models usually employ two-dimensional (2-D) convolution filters to analyze grayscale imagery and red-green-blue (RGB) images. However, one-dimensional (1-D) and three-dimensional (3-D) filters can also be employed to learn spectral features and spatial-spectral features, respectively (Ball et al., 2017; Zhou et al., 2019). In addition, the features extracted by 1-D, 2-D and 3-D CNN can be used to train another classifier like SVM and K-nearest neighbour (KNN). In addition, multi-feature aggregation techniques are also available for food detection and analysis. Fig. 3 compares the architectures of 1-D, 2-D and 3-D CNN. As shown in Fig. 3, 1-D, 2-D and 3-D CNN involve complex model structures and considerable computational effort, and it can be difficult for researchers with minimal background in computer science to program complex neural network models. Therefore, in order to reduce programming difficulty for researchers, many popular frameworks including Tensorflow, Keras, Theano, Pytorch and Caffe have emerged to help researchers quickly build 1-D, 2-D and 3-D CNN models by calling encapsulated function interfaces to stack some duplicate network layers such as Conv layer, POOL layer and FC layer (Zhou et al., 2019).

Fig. 3. The comparison for the feature extraction process of (a) 1-D CNN (Zhou et al., 2019), (b) 2-D CNN and (c) 3-D CNN (Al-Sarayreh et al., 2020).

2.2.1. One-dimensional spectral feature extraction

Since the hierarchical architecture of CNN is designed to extract two-dimensional (2-D) image features, it is a challenge to employ CNN to extract one-dimensional (1-D) spectral signatures (Qiu et al., 2018). However, with the successful application of the CNN models, researchers started to develop 1-D CNN for signal processing (e.g., speech recognition and noise filtering), and the input of 1-D CNN can be regarded as a 1-D array (Al-Sarayreh et al., 2018). Therefore, 1-D CNN can be applied to capture spectral features of hyperspectral images with [...] extraction of a bruised apple as an example, which consists of an input layer, a 1-D CNN block (stacking several Conv layers and POOL layers) and a fully-connected neural network (FNN) block. The average spectral pixels with a size of $n_{channels} \times 1$ are considered as input data (where $n_{channels}$ can be the number of original bands or the number of optimal bands selected), and there are $m_1$ 1-D convolution kernels of size $k_1 \times 1$ in the first 1-D Conv layer, so the first 1-D Conv layer will generate $m_1$ feature maps of size $(n_{channels} - k_1 + 1) \times 1$ by the 1-D convolution operation (Hu et al., 2015). Each feature map is obtained by taking the dot product between the weight matrix $\omega$ and the local area position $x$, and the value of a neuron $V_{ij}^{x}$ at position $x$ on the $j$th feature map in the $i$th layer can be evaluated by (Chen et al., 2016):

$$V_{ij}^{x} = \sigma\left(b_{ij} + \sum_{m} \sum_{k=0}^{K_i - 1} \omega_{ijm}^{k} V_{(i-1)m}^{x+k}\right) \tag{1}$$

where $\sigma(\cdot)$ denotes the activation function of the $i$th layer, $b_{ij}$ is an additive bias of the $j$th feature map at the $i$th layer, $m$ indexes the connection between the feature map in the $(i-1)$th layer and the current ($j$th) feature map, $K_i$ is the width of the 1-D convolution kernel, and $\omega_{ijm}^{k}$ is a weight for input $V_{(i-1)m}^{x+k}$ with an offset of $k$ in the 1-D convolution kernel.

The pooling process is triggered after the convolution stage, and it can make the features invariant to location by decreasing the resolution of the feature maps. There are $m_2$ kernels of size $k_2 \times 1$ in the first POOL layer containing $m_2 \times n_3 \times 1$ nodes, and $n_3 = n_2 / k_2$. Moreover, the neurons of the POOL layer connect a small $n \times 1$ patch of the Conv layer. The pooling operation of the max-pooling can then be completed using the equation below (Chen et al., 2016):

$$a_j = \max_{n \times 1}\left(a_i^{k \times 1}\, u(k, 1)\right) \tag{2}$$
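As an illustration of the 1-D convolution in Eq. (1) and the max-pooling in Eq. (2), the following is a minimal pure-Python sketch, not the implementation used in the cited studies. The function names, the choice of ReLU for σ, the toy spectrum and the kernel values are all assumptions made for the example.

```python
def relu(z):
    """Rectified linear activation, one possible choice for sigma in Eq. (1)."""
    return max(0.0, z)


def conv1d_layer(prev_maps, weights, biases, activation=relu):
    """Valid 1-D convolution following the form of Eq. (1).

    prev_maps: list of input feature maps, each a list of length n
    weights:   weights[j][m][k] is the weight for output map j,
               input map m and kernel offset k
    biases:    biases[j] is the additive bias of output map j
    Returns one output map per bias, each of length n - K + 1.
    """
    n = len(prev_maps[0])
    out = []
    for j, b_j in enumerate(biases):
        K = len(weights[j][0])
        fmap = []
        for x in range(n - K + 1):
            total = b_j
            for m, v_prev in enumerate(prev_maps):
                for k in range(K):
                    total += weights[j][m][k] * v_prev[x + k]
            fmap.append(activation(total))
        out.append(fmap)
    return out


def max_pool1d(fmap, k):
    """Non-overlapping max-pooling with window k (a trailing partial window is dropped)."""
    return [max(fmap[i:i + k]) for i in range(0, len(fmap) - k + 1, k)]


# Toy averaged spectrum with n_channels = 8 (made-up values)
spectrum = [0.1, 0.3, 0.2, 0.8, 0.7, 0.4, 0.9, 0.5]
# m1 = 2 kernels of width k1 = 3, acting on a single input map
weights = [[[1.0, 0.0, -1.0]], [[0.5, 0.5, 0.5]]]
biases = [0.0, 0.1]

feature_maps = conv1d_layer([spectrum], weights, biases)
pooled = [max_pool1d(f, 2) for f in feature_maps]
```

With these toy sizes each feature map has 8 − 3 + 1 = 6 entries, and pooling with a window of 2 leaves 6 / 2 = 3 entries, in line with the size relations given in the text.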
Fig. 4. The comparison for multi-feature aggregation based on (a) two 2-D CNN subnetworks (Jiang et al., 2020) and (b) 1-D and 3-D CNN subnetworks (Al-Sarayreh
et al., 2018).
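The idea behind the multi-feature aggregation in Fig. 4 can be sketched as follows, assuming that the feature vectors produced by two subnetworks are simply concatenated and passed to a fully connected classifier. Concatenation is only one possible fusion rule and is an assumption here, as are the function names, the feature values and the weights, which are made up for illustration.

```python
import math


def dense(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(branch_a, branch_b):
    """Fuse two branch feature vectors by concatenation (assumed fusion rule)."""
    return list(branch_a) + list(branch_b)


# Hypothetical outputs of two subnetworks for a single sample
spectral_features = [0.2, 0.9, 0.4]   # e.g. from a 1-D CNN branch
spatial_features = [0.7, 0.1]         # e.g. from a 2-D or 3-D CNN branch

joint = aggregate(spectral_features, spatial_features)

# Made-up weights of a two-class fully connected classifier on the joint vector
fc_weights = [[0.5, -0.2, 0.1, 0.3, 0.0],
              [-0.4, 0.6, 0.2, -0.1, 0.5]]
fc_biases = [0.0, 0.1]

probabilities = softmax(dense(joint, fc_weights, fc_biases))
```

In a trained model the branch features and the classifier weights would be learned jointly; here they are fixed only to show how the joint vector is formed and consumed.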
Table 1
Applications of 1-D, 2-D and 3-D CNN models in food quality detection.

| Products | Samples | CNN models | Results | References |
|---|---|---|---|---|
| Meat and fishery products | Fresh unpacked, fresh packed, frozen unpacked, frozen packed and frozen-thawed unpacked red meats | 1-D and 3-D CNN | Classification rate of 94.4% | Al-Sarayreh et al. (2018) |
| | Fresh lamb, beef and pork | 3-D CNN | Classification rates of 96.9% and 97.1% for NIR and VIS snapshot HSI, respectively | Al-Sarayreh et al. (2020) |
| | Wholesome and defective salmon fillets | 2-D AlexNet | Classification rates of 92.7% and 91.6% for cross-validation and test sets, respectively | Xu et al. (2018) |
| | Most fresh, fresh, fairly fresh and spoiled carps | 2-D VGG-16 | Classification rate of 98.21% | Taheri-Garavand et al. (2020) |
| Cereals and cereals products | Japonica and Indica rices | 1-D VGGNet | Classification rates of 89.6% and 87.0% for training and testing sets, respectively | Qiu et al. (2018) |
| | Medium-grain, round-grain and long-grain rices | 2-D DCNN | Classification rates of 99.4% and 95.5% for calibration and validation sets, respectively | Lin et al. (2018) |
| | Qualified and defective maize kernels | 2-D ResNet | Prediction accuracy of 98.2% | Ni et al. (2019) |
| | Soybeans in different varieties | 1-D CNN | Classification rate of each variety over 90% | Zhu et al. (2019) |
| | Normal wheat and those with impurities (insects, stalks, grass, awns, spikelet) | 2-D WheNet | Recognition rates of 98.59% (Top-1%) and 99.98% (Top-5%) for testing set | Shen et al. (2019) |
| | Barley in different varieties | 2-D CNN | Classification rate of 93.21% | Kozlowski et al. (2019) |
| | Bread crusts with different baking periods (0, 5, 10, 15, 20, 25 and 30 min) | 2-D Short-CNN | Recognition rate of 98.8% | Cotrim et al. (2020) |
| Fruits and vegetables | Olives in different varieties | 2-D Inception-ResnetV2 | Classification rate of 95.91% | Ponce et al. (2019) |
| | Dates in different varieties and maturity stages | 2-D VGG-16 | Classification rates of 99.01%, 97.25%, and 98.59% for varieties, maturity, and harvesting decision, respectively | Altaheri et al. (2019) |
| | Normal and defective apples | 2-D CNN | Classification rate of 96.5% | Fan et al. (2020) |
| | Mealy and non-mealy apples | 2-D AlexNet | Classification rate of 91.11% | Lashgari et al. (2020) |
| | Sound and damaged blueberries | 3-D ResNet | Average accuracy of 88.44% | Wang et al. (2018) |
| | Sound and decayed blueberries | 3D-CNN | Detection rate of 92.15% | Qiao et al. (2020) |
| | Healthy and bruised jujubes from different geographical origins | 1-D CNN | Detection rate of over 85% for Vis-NIR spectra from all geographical origins | Feng et al. (2019) |
| | Healthy and damaged sour lemons | 2-D CNN | Classification rate of 100% | Jahanbakhshi et al. (2020) |
| | Plums in different varieties | 2-D AlexNet | Classification rates from 91% to 97% | Rodriguez et al. (2018) |
| | Healthy and defective tomatoes | 2-D ResNet | Classification rate of 96.2%, 94.2% and 94.6% for training set, validation set and testing set, respectively | da Costa et al. (2020) |
| Others | Normal milk and milk samples adulterated with sucrose, soluble starch, sodium bicarbonate, hydrogen peroxide, and formaldehyde | 1-D CNN | Classification rates of 98.76% and 96.95% for binary and multiclass, respectively | Asseiss Neto et al. (2019) |
| | Normal sesame oil and sesame oil counterfeited with rapeseed oil, soybean oil and corn oil | 2-D AlexNet | R2 > 0.99, RMSEP of 0.99%, 2.20% and 1.64% for three counterfeit samples, respectively | Wu et al. (2020) |
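Table 1 reports both Top-1 and Top-5 recognition rates for the WheNet model. As a reminder of what such figures usually mean, the sketch below computes top-k accuracy, i.e. the fraction of samples whose true class is among the k highest-scoring predictions. The function name, scores and labels are made up for illustration and are not taken from the cited study.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up class scores for four samples over three classes
scores = [
    [0.70, 0.20, 0.10],
    [0.35, 0.40, 0.25],
    [0.20, 0.30, 0.50],
    [0.50, 0.40, 0.10],
]
labels = [0, 0, 2, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy can only stay equal or increase as k grows, which is why a Top-5 rate is never lower than the corresponding Top-1 rate.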
optimal bands and aggregated these features to form the final joint spatial-spectral features based on the feature fusion method of fully-connected neural network (FNN), which was then used for adulteration detection of five different states of mixed red meats. They showed that the joint features performed better than the spectral and texture features based on SVM and obtained an overall detection accuracy of 94.4%. Recently, Al-Sarayreh et al. (2020) further proposed a 3-D CNN model to simultaneously capture the spectral and spatial features of samples for the classification of three different meat species using snapshot HSI and line-scanning HSI systems, and found that the 3-D CNN model was excellent on the snapshot HSI system in the near-infrared (NIR) range and visible (VIS) range, with classification accuracies of 96.9% and 97.1%, respectively.

With regard to aquatic products, surface defects are considered the main issue in downgrading the quality of the products. Xu et al. (2018) utilized a 2-D AlexNet model and a histograms of oriented gradients (HOG) model to extract the features of salmon fillet samples, which were subsequently fed into SVM for defect detection of the fillets. A classification accuracy of 80.0% was obtained based on the handcrafted HOG features, whereas the 2-D AlexNet feature was clearly better, resulting in a classification accuracy of 91.6% for the testing sets. In addition, freshness detection of aquatic products is also of great concern (Dai et al., 2016). Taheri-Garavand et al. (2020) employed a 2-D VGG-16 architecture as a deep feature extractor to extract the RGB image features of common carp with four different levels of freshness, which were subsequently fed into a classifier block for freshness detection. Their results indicated that the performance of the VGG-16-based feature was superior to traditional feature extraction methods, and a classification accuracy of 98.21% was achieved.

The above studies confirm that CNN features are more efficient and robust as compared with handcrafted features and features captured based on traditional algorithms; therefore, CNN features provide the most useful information about samples and can achieve encouraging results. In addition, CNN coupled with other nondestructive detection techniques should be further explored to detect the quality of more meat and aquatic products in different conditions such as during transportation, for various storage periods or at varying temperatures, which can be a focus for future studies.

3.2. Cereals and cereal products

Cereals including rice, maize, soybean, wheat and barley are important agricultural products, but they are vulnerable to diseases and insects. Therefore, the development of effective detection methods for guaranteeing the quality and yield of cereals is of great interest. CNN combined with computer vision systems or hyperspectral imaging techniques can provide an efficient tool for detecting and analyzing cereals and cereal products (Kozlowski et al., 2019; Zhu et al., 2019).

Rice from different growth environments has different nutrients and flavours (Li et al., 2020; Qiu et al., 2018), so knowing the species of rice is important for farmers and consumers. Qiu et al. (2018) utilized a modified 1-D VGGNet model with different sizes of training sets to directly extract the 1-D spectral feature of four varieties of rice from
hyperspectral images in two spectral ranges of 441–948 nm and 975–1646 nm and found that the classification accuracy gradually increased with the number of samples in the training set, and that the VGGNet model with 3000 training samples outperformed KNN and SVM models for 1-D spectra, yielding the best classification accuracy of 87.0% in the spectral range of 975–1646 nm. Lin et al. (2018) combined a 2-D CNN architecture with a machine vision system to differentiate three varieties of rice, and the image features of rice samples were extracted using CNN. In comparison with handcrafted features, the CNN model achieved the highest classification accuracies of 99.4% and 95.5% in the calibration and validation sets, respectively. The above studies demonstrated the possibility that 1-D and 2-D CNN combined with hyperspectral imaging and machine vision techniques can discriminate species of rice with acceptable precision and accuracy.

On the other hand, Zhu et al. (2019) explored the feasibility of 1-D CNN models with a few training samples for identifying three varieties of soybeans with the aid of pixel-wise spectra of hyperspectral imaging (HSI) and found that the pixel-wise CNN model yielded the most satisfactory result. Furthermore, they concluded that the performance could be further improved by increasing the number of training set samples.

Maize is known as the “queen of the cereals”, with high nutritional value, and is harvested as human food and livestock feed. However, defects of maize kernels can degrade the value of maize, so defect detection of maize is needed. Ni et al. (2019) developed an automatic inspection machine based on a dual-side camera and a 2-D ResNet model for defect detection of maize kernels. A total of 2040 pairs of maize kernel images were prepared and pretreated using the dual-side camera and a k-means clustering guided-curvature method, and were subsequently sent to the 2-D ResNet for defect detection, achieving the best classification accuracy of 98.2%.

Besides studies focusing on quality detection of rice, soybean and maize, some attempts have also been made at quality detection of wheat and barley. In one study, Shen et al. (2019) proposed a 2-D WheNet convolutional neural network to extract the image features of normal wheat and five kinds of wheat with impurities for the detection of impurities in wheat samples. Compared with the detection results of the ResNet_101 and Inception_v3 networks, the WheNet model proved to perform better, with recognition accuracies of 98.59% (Top-1%) and 99.98% (Top-5%). However, it should be noted that the use of some image preprocessing methods including image augmentation, deblurring and binarization can avoid overfitting problems and improve the performance of detection. In another study, the capacity of CNN models in different configurations was studied for the classification of barley varieties in comparison with image features extracted by AlexNet and ResNet18 models based on fixed feature extractor and fine-tuning methods, and it was found that the CNN model in 64 3 × 3 configurations yielded the most satisfactory result with a classification rate of

Several studies (Hossain et al., 2019; Rodriguez et al., 2018) have illustrated that CNN has unique advantages over traditional methods in differentiating fruit varieties. In particular, Ponce et al. (2019) combined the CNN model with machine vision systems to differentiate seven different varieties of olive fruit. They applied six different 2-D CNN architectures to extract image features of the olive fruit and found that the best classification accuracy of 95.91% was achieved with the Inception-ResnetV2 architecture. Similarly, Altaheri et al. (2019) tested the capacity of pre-trained AlexNet and VGG-16 models for real-time differentiation of date fruit according to varieties, maturity, and harvesting decision, and found that the fine-tuned VGG-16 model was the best, with classification accuracies of 99.01%, 97.25%, and 98.59% for varieties, maturity, and harvesting decisions, respectively.

Additionally, defect detection of fruit is a significant issue of concern. Recently, many studies have appeared on the application of CNN for defect detection of apple (Fan et al., 2020), blueberry (Qiao et al., 2020; Wang et al., 2018), winter jujube (Feng et al., 2019) and sour lemons (Jahanbakhshi et al., 2020), as well as differentiation of mealy apples from non-mealy ones (Lashgari et al., 2020). In particular, for detecting bruises of winter jujube from four different geographical origins, Feng et al. (2019) designed a 1-D CNN model to extract pixel-wise spectral features from the region of interest (ROI) of the jujube HSI images in the VIS-NIRS (502–947 nm) and NIR (975–1646 nm) spectral regions, and their results proved the feasibility of 1-D CNN in detecting subtle bruises as compared with traditional methods. In addition, the feasibility of 2-D CNN combined with a computer vision system was explored by Fan et al. (2020) to detect defective apples, and the best results with an accuracy of 96.5% were obtained for the testing set. Meanwhile, da Costa et al. (2020) tested the performance of different 2-D ResNet architectures based on fixed feature extractor and fine-tuning methods for detecting the external defects of tomatoes and found that the fine-tuned ResNet50 yielded the most satisfactory result, with classification rates of 96.2%, 94.2% and 94.6% for the training set, validation set and testing set, respectively.

More importantly, in order to improve the classification accuracy and reduce the number of architecture parameters, a deep residual 3-D CNN framework was utilized to simultaneously extract rich spectral and spatial features from hyperspectral images of fresh and decayed blueberries for the early detection of decay, and the most satisfactory result, with a detection rate of 92.15%, was obtained compared with AlexNet and GoogleNet models (Qiao et al., 2020). The above studies illustrated the potential of 1-D, 2-D and 3-D CNN models as an effective and rapid tool for feature extraction in defect detection of fruits.

3.4. Others

CNN has also been applied to deal with other food-related problems,
93.21% (Kozlowski et al., 2019). including quality detection of liquid foods, food volume estimation,
Meanwhile, some other studies have also been reported on evaluation of nutritional contents and detection of crop diseases. Most of
combining the CNN model with computer vision systems for detecting these studies have yielded satisfactory results, thereby encouraging
cereal products, for example, Cotrim et al. (2020) identified the further research.
browning degree of bread crust during baking, which was closely linked
to consumer purchase decisions. In their study, the short-CNN model 3.4.1. Quality detection of liquid products
based on the Inception v3 module was employed to extract the image In liquid products such as milk and edible oil, adulteration can take
features of bread crust of seven different baking periods. The best results place, thus authentication detection is vitally important to the industry.
were obtained by short-CNN with a global accuracy of 98.8% and the Asseiss Neto et al. (2019) designed a 1-D CNN architecture for the
short-CNN model was proven to be more advantageous than AlexNet adulteration detection of milk. In the detection, two types of data
and VGGNet-16 models. including infrared spectra and component features were generated by
Fourier transformed infrared spectroscopy (FTIR). The proposed 1-D
3.3. Fruits CNN was utilized as a deep feature extractor to extract features of 1-D
infrared spectra, and random forest (RF) and gradient boosting ma
Fruits offer abundant essential vitamins and other nutrients and thus chine (GBM) classifiers were employed to analyze component features.
are important for a healthy diet. However, many factors such as harvest, Results showed that 1-D CNN outperformed RF, GBM and classical
storage and transportation conditions can affect their quality, therefore, learning methods, with the highest classification accuracies of 98.76%
the quality detection of fruits and vegetables is important for the and 96.95% for binary and multiclass classifications, respectively.
industry. Recently, Wu et al. (2020) developed 2-D AlexNet architecture for
Y. Liu et al. Trends in Food Science & Technology 113 (2021) 193–204
identifying three counterfeit sesame oil samples and four pure oil samples using 3D fluorescence spectroscopy. The proposed AlexNet was utilized as a deep feature extractor to capture the features of fluorescence contour images, which were subsequently fed into SVM and partial least squares (PLS) classifiers for determining the counterfeiting of the sesame oil. Their results revealed that the model using the AlexNet-based features achieved better results than SVM and PLS, providing the possibility of combining CNN with fluorescence spectroscopy for rapid authentication detection of liquid products.

3.4.2. Estimation of food volume
Estimation of food volume can be the most direct and effective solution to monitor dietary intake. Commonly used food volume estimation methods, such as model-based and stereo-based methods, have proved effective in measuring the volume of food. However, these methods rely heavily on manual intervention, which can be tedious (Lo et al., 2018). In recent years, CNN has been introduced to estimate food volume. In one study, Myers et al. (2015) designed a framework based on CNN to estimate the volume of food from a single RGB image on three food datasets including NYUv2 RGBD (training), GFood3d (fine-tuning) and NFood-3d (testing), as shown in Fig. 5(a), and indicated that the proposed CNN volume predictor was very successful. Their volume assessment process could be described as follows: a CNN model was utilized to infer the depth map from a single RGB image, each pixel of the inferred depth map was then projected into a 3-D space to generate a voxel representation, and the volume could thus be estimated by counting the occupied voxels. However, their approach could generate large volume estimation errors due to view occlusion and contour ambiguity.

To address the problems of view occlusion and contour ambiguity, Lo et al. (2018) proposed a comprehensive approach based on CNN and a depth-sensing technique, as shown in Fig. 5(b). Paired depth images (from the initial viewing angle and its opposite side) of 8 different types of foods were captured and rendered to build the training dataset based on a range of extrinsic camera parameters, and were subsequently fed into a modified neural network for training. To validate the feasibility, the trained neural network was utilized to infer the depth image of the opposite side from the input depth image with invisible viewing angles, then the two depth images were fused to generate a completed 3D point cloud map using a point cloud completion algorithm. Furthermore, the iterative closest point (ICP) algorithm was adapted in the proposed approach to address the misalignment problem of point cloud registration, and then the global point cloud of the 3D model was meshed with the alpha shape to estimate the volume. Their results demonstrated that the modified neural network outperformed the naive version, and was able to infer the depth image of the opposite side from the input depth image without overfitting. However, to accurately estimate the volume from a single depth image, further investigations are required to establish a more representative 3D model database for training the neural network.

3.4.3. Evaluation of food nutritional contents
The nutrients in food that can produce calories to maintain life are protein, fat and carbohydrates. However, a high-calorie intake that is not burned by physical activity can increase the risk of developing lifestyle-related diseases such as obesity, diabetes and hypertension (Situju et al., 2019). In addition, high-calorie foods do not mean high nutrition. On the contrary, high-calorie foods, such as some popular beverages and junk foods, often contain very few or no nutrients. In addition to the nutrients mentioned above, there are also mineral nutrients and vitamins. Although mineral nutrients and vitamins do not contain calories, they still have a significant relationship with body metabolism; in particular, the absence of essential nutrients in the body can cause a number of health problems related to organ degradation, such as weakened immune systems, weak bone structure, sparse hairline and so on (Sundaravadivel et al., 2018). Therefore, to prevent these lifestyle-related diseases, estimating the calorie intake and nutritional contents is a relevant and important issue.

Nowadays, people are concerned about high levels of calorie intake, and evaluating the calories from food images can provide an effective
Fig. 5. The comparison of two different methods for food volume estimation using (a) the voxel grid of a depth map predicted by CNN from the RGB image of the
food item (Myers et al., 2015) and (b) a 3D point cloud map generated by fusing two depth images, with one image from depth camera and the other from deep neural
network (Lo et al., 2018).
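As a rough illustration of the voxel-counting idea in panel (a), the following NumPy sketch back-projects a metric depth map into 3-D, bins the points into vertical voxel columns, and sums the occupied voxels. It is only a minimal sketch under stated assumptions, not the implementation of Myers et al. (2015): the depth map is taken as given (in their pipeline it would be inferred by a CNN from the RGB image), the camera is assumed to be a pinhole camera looking straight down at a flat table, and the function name `volume_from_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, the `table_depth` value and the 5 mm voxel size are all hypothetical choices made for this example.

```python
import numpy as np


def volume_from_depth(depth, mask, fx, fy, cx, cy, table_depth, voxel=0.005):
    """Estimate food volume (m^3) by counting occupied voxels.

    depth       : (H, W) array of metric depths along the optical axis (m)
    mask        : (H, W) boolean array, True where a pixel belongs to the food
    fx, fy      : assumed pinhole focal lengths (pixels)
    cx, cy      : assumed principal point (pixels)
    table_depth : assumed depth of the flat table plane (m)
    voxel       : voxel edge length (m)
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return 0.0
    z = depth[rows, cols].astype(float)

    # Back-project every food pixel into camera coordinates.
    x = (cols - cx) * z / fx
    y = (rows - cy) * z / fy

    # Assign each 3-D point to a vertical voxel column (ix, iy).
    columns = np.stack([np.floor(x / voxel), np.floor(y / voxel)], axis=1).astype(int)
    _, column_id = np.unique(columns, axis=0, return_inverse=True)
    column_id = column_id.ravel()

    # The food surface in a column is its point closest to the camera.
    surface = np.full(column_id.max() + 1, np.inf)
    np.minimum.at(surface, column_id, z)

    # Fill each column from the surface down to the table and count voxels.
    layers = np.clip(np.round((table_depth - surface) / voxel), 0, None)
    return float(layers.sum()) * voxel ** 3


# Synthetic check: a 0.1 m cube standing on a table 0.5 m below the camera.
depth = np.full((480, 640), 0.5)
mask = np.zeros((480, 640), dtype=bool)
mask[178:303, 258:383] = True   # 125 x 125 pixels, about 0.1 m wide at 0.4 m
depth[mask] = 0.4               # top face of the cube
demo_volume = volume_from_depth(depth, mask, fx=500, fy=500, cx=320, cy=240,
                                table_depth=0.5)
```

For the synthetic cube, `demo_volume` should come out close to the true 0.001 m³. Note that filling every column down to the table is itself an assumption about the hidden part of the food, which is one way to see why a single view can produce large errors under occlusion.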
solution to maintain a relatively balanced calorie content in the human body. The evaluation of food calories can be defined as a regression problem that uses a multi-task 2-D CNN to predict the calorie values. Therefore, Situju et al. (2019) and Ege et al. (2019) studied the feasibility of the multi-task 2-D CNN architecture for evaluating food calories from food images. In the study of Situju et al. (2019), image datasets of two different scales were collected, including a middle-scale category-annotated food image database and a small-scale dataset containing both calorie content and salinity, which were subsequently fed into CNNs pre-trained with ImageNet to conduct two-stage fine-tuning. Their results illustrated that the multi-task CNN with two-stage fine-tuning outperformed the single-task CNN and the multi-task CNN without two-stage fine-tuning in terms of food classification and of calorie and salinity evaluation, with a relative error of 31.2%, an absolute error of 89.6 kcal and a correlation coefficient of 0.84 for calorie evaluation, and 36.1%, 0.74 g and 0.45 for salinity evaluation, respectively.

Although CNN-based methods are effective for evaluating food calories by treating the evaluation process as a regression issue, the evaluation performance still needs improvement. Furthermore, as the calorie contents of food dishes or items in the same category can be different depending on many factors such as ingredients and cooking directions (Ege et al., 2018), it is necessary to construct large-scale food image databases labelled with calorie contents, ingredients, cooking directions and bounding boxes.

Evaluation of food nutritional contents can provide nutrition information for unpacked or cooked foods in restaurants, which is similar to the provision of nutrient information on packaged food. Ahn et al. (2019) employed a multiple deep neural network (DNN) architecture to evaluate nutritional contents such as CPF (carbohydrates, proteins, and fats) of five different kinds of food items using their hyperspectral images in the wavelength range of 887–1722 nm. The proposed procedure utilized a common network to extract the common features from the hyperspectral signal from the ROI, three nutrient-specific networks to compute the CPF values based on these common features, and a verification network (VN) to verify the evaluation results. Furthermore, the autoencoder mechanism and the sandwich strategy were adapted in the joint layer to share and compress the features, and the error avoidance scheme based on VN was also adapted to remove outliers. Their results showed that the proposed procedure achieved the highest R² values and corresponding symmetric mean absolute percentage errors (SMAPE) of 0.9543 and 0.0997 for carbohydrate evaluation, 0.8527 and 0.1352 for protein evaluation, and 0.8481 and 0.1218 for fat evaluation.

3.4.4. Detection of crop diseases
Although not in the area of food products, crop diseases are related to food production. Efficient feature extraction for these diseases can enable the early detection of infected crops, avoiding the loss of food production. CNN has thus been employed for realizing efficient crop disease detection. Li et al. (2020) and Chen et al. (2020) developed a 2-D deep convolutional neural network (DCNN) backbone and a DenseNet model, respectively, as an image feature extractor for the detection of paddy diseases. In the study of Li et al. (2020), a paddy disease video detection system based on a faster-RCNN framework with a custom DCNN backbone was established. The detection process consisted of the following steps: a customized DCNN backbone was trained with still images of three paddy diseases (sheath blight, stem borer and brown spot) to extract the features of the diseases; a frame extraction module was applied to extract the frames of the input video; and each frame was then sent to the faster-RCNN framework for detecting the diseases. Their results revealed that the faster-RCNN model with the customized DCNN backbone was superior to other backbone systems including VGG16, ResNet-50 and ResNet-101, and to YOLOv3 models.

Additionally, detection of maize leaf disease was conducted by Priyadharshini et al. (2019), who utilized a modified 2-D LeNet architecture as a feature extractor to capture the features of three different maize disease images based on the PlantVillage dataset, and achieved the highest classification accuracy of 97.89% as compared with the traditional methods of SVM and ANN. The above studies showed that integrating CNN with machine vision techniques could significantly improve the detection performance for crop diseases.

4. Challenges and future work

With the unique advantages of strong feature learning and good generalization ability, CNN is a promising and attractive tool for effective and efficient analysis of complex food matrices. CNN can not only automatically locate important features, but can also obtain unparalleled performance under challenging conditions such as complex backgrounds and different resolutions and orientations of the images. Despite the advantages of CNN in the provision of better performance, there still remain numerous challenges to its applications in the food domain.

Firstly, many kinds of sensors and nondestructive detection techniques have been widely applied for obtaining external information including weight, smell, touch, firmness and taste, and internal characteristics of food such as component contents and composition. However, the effective fusion and full utilization of multisource data by using CNN is a challenging task. Current fusion methods simply stack and concatenate the data or features from sensors and advanced detection systems, which can bring redundant information and affect the accuracy of detection. Therefore, data fusion methods that effectively highlight the most informative features while reducing noise are needed.

Secondly, the robustness and generalization of the CNN model are linked to the quality and diversity of data. Although some publicly available food image datasets can be used for food recognition and classification, these datasets do not completely cover all types of foods. Moreover, some researchers use nonpublic data for food detection, which makes it even more difficult to guarantee the reproducibility of their studies. Unfortunately, building a worldwide accessible reference dataset for detection and analysis of food is challenging, and the establishment of food datasets with sufficient generality and diversity is still greatly needed.

Thirdly, it remains a challenge to train CNN models and optimize the parameters according to specific food analysis tasks. The number of layers, filters and epochs, as well as the hyperparameters of the model, are usually determined by trial-and-error tuning until optimal settings are obtained, which is mostly based on expert experience and is one of the major bottlenecks of tuning for most researchers. Although recent approaches such as the automated Bayesian optimization method can find better hyperparameters faster than experts, they do not fit well with large models. Therefore, new methods should be developed so that they can not only self-adaptively search for optimal settings but can also improve the performance of CNN.

Finally, deploying CNN models on mobile devices with limited memory and battery capacity poses further challenges for real-time detection and analysis of food. In addition, the high cost of hardware needs to be taken into account.

Despite the above challenges, the enormous potential of CNN-based approaches has not yet been fully exploited. From the perspective of improving the efficiency and accuracy of food detection, several potential research directions emerge for the CNN-based method in the future.

A promising research field is mobile hyperspectral imaging systems using snapshot HSI and 3-D CNN models. Although snapshot HSI has the advantages of fast acquisition of spectral data and ultra-portability compared with line-scanning HSI, it can only acquire data in limited spectral wavelength ranges. However, due to the potential and robustness of 3-D CNN models, snapshot HSI coupled with a 3-D CNN model can achieve excellent performance. Hence, the mobile HSI system is more appropriate for real-time detection of food in the future.

In addition, since HSI techniques have concepts similar to Terahertz imaging, and they are utilized as a nondestructive tool to reflect the intrinsic information of food for food inspection, it will be worth
examining the applicability of CNN coupled with Terahertz spectroscopy for food detection (Zhang et al., 2020).

Furthermore, an emerging research field is combining CNN with blockchain and internet of things (IoT) techniques to control food processing, as food chains usually involve a series of processes of planting (feeding), growing, harvesting (slaughtering) by farmers (butchers), processing by manufacturers, storage and transportation by distributors, and sale by retailers. IoT and blockchain can provide a large amount of information in the food chain, which can be utilized by CNN for handling and optimizing processes to ensure quality and safety. Therefore, the combination of IoT and blockchain with CNN will play an important role in food detection and analysis in the future.

5. Conclusions

CNN is a promising feature extraction tool that has gradually replaced traditional machine learning algorithms. CNN can not only extract the most robust and effective features but also has strong generalization ability, which is infeasible with traditional machine learning methods. This review introduces the principles of CNN, discusses the feature extraction methods based on 1-D, 2-D and 3-D CNN models and multi-feature aggregation methods, and summarizes the recent applications of CNN models in quality detection including meat and aquatic products, cereals and cereal products, fruits and vegetables, oil and milk. In addition, CNN is shown to be feasible in the estimation of food volume and nutritional contents and in the detection of crop diseases. However, despite the prominent performance of CNN in food detection, there is still enormous potential to further exploit new algorithms for improving the computation speed of CNN, and future studies can focus on such an area. It is hoped that this review can encourage further research in food detection based on CNN.

Acknowledgements

The authors are grateful to the National Key R&D Program of China (2018YFC1603400) for its support. This research was also supported by the Guangdong Basic and Applied Basic Research Foundation (2020A1515010936), the Fundamental Research Funds for the Central Universities (D2190450), the Contemporary International Collaborative Research Centre of Guangdong Province on Food Innovative Processing and Intelligent Control (2019A050519001) and the Common Technical Innovation Team of Guangdong Province on Preservation and Logistics of Agricultural Products (2020KJ145). Yao Liu is grateful for his MSc study supervised and supported by the Academy of Contemporary Food Engineering, South China University of Technology, China.

References

Ahn, D., Choi, J.-Y., Kim, H.-C., Cho, J.-S., Moon, K.-D., & Park, T. (2019). Estimating the composition of food nutrients from hyperspectral signals based on deep neural networks. Sensors, 19(7), 1560.
Al-Sarayreh, M., Reis, M. M., Wei Qi, Y., & Klette, R. (2018). Detection of red-meat adulteration by deep spectral-spatial features in hyperspectral images. Journal of Imaging, 4(5), 63.
Al-Sarayreh, M., Reis, M. M., Yan, W. Q., & Klette, R. (2020). Potential of deep learning and snapshot hyperspectral imaging for classification of species in meat. Food Control, 117, 107332.
Altaheri, H., Alsulaiman, M., & Muhammad, G. (2019). Date fruit classification for robotic harvesting in a natural environment using deep learning. IEEE Access, 7, 117115–117133.
Asseiss Neto, H., Tavares, W. L. F., Ribeiro, D. C. S. Z., Alves, R. C. O., Fonseca, L. M., & Campos, S. V. A. (2019). On the utilization of deep and ensemble learning to detect milk adulteration. BioData Mining, 12(1), 1–13.
Audebert, N., Le Saux, B., & Lefevre, S. (2019). Deep learning for classification of hyperspectral data. IEEE Geoscience and Remote Sensing Magazine, 7(2), 159–173.
Ball, J. E., Anderson, D. T., & Chan, C. S. (2017). Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community. Journal of Applied Remote Sensing, 11(4), Article 042609.
Chen, Y., Jiang, H., Li, C., Jia, X., & Ghamisi, P. (2016). Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 54(10), 6232–6251.
Chen, J., Zhang, D., Nanehkaran, Y. A., & Li, D. (2020). Detection of rice plant diseases based on deep transfer learning. Journal of the Science of Food and Agriculture, 100(7), 3246–3256.
Cheng, W., Sun, D.-W., & Cheng, J.-H. (2016a). Pork biogenic amine index (BAI) determination based on chemometric analysis of hyperspectral imaging data. LWT-Food Science and Technology, 73, 13–19.
Cheng, W., Sun, D.-W., Pu, H., & Liu, Y. (2016b). Integration of spectral and textural data for enhancing hyperspectral prediction of K value in pork meat. LWT-Food Science and Technology, 72, 322–329.
Cheng, W., Sun, D.-W., Pu, H., & Wei, Q. (2017). Chemical spoilage extent traceability of two kinds of processed pork meats using one multispectral system developed by hyperspectral imaging combined with effective variable selection methods. Food Chemistry, 221, 1989–1996.
Cheng, W., Sun, D.-W., Pu, H., & Wei, Q. (2018). Heterospectral two-dimensional correlation analysis with near-infrared hyperspectral imaging for monitoring oxidative damage of pork myofibrils during frozen storage. Food Chemistry, 248, 119–127.
Ciocca, G., Napoletano, P., & Schettini, R. (2018). CNN-based features for retrieval and classification of food images. Computer Vision and Image Understanding, 176, 70–77.
Cotrim, W.d. S., Rodridgues Minim, V. P., Felix, L. B., & Minim, L. A. (2020). Short convolutional neural networks applied to the recognition of the browning stages of bread crust. Journal of Food Engineering, 277, 109916.
da Costa, A. Z., Figueroa, H. E. H., & Fracarolli, J. A. (2020). Computer vision based detection of external defects on tomatoes using deep learning. Biosystems Engineering, 190, 131–144.
Dai, Q., Cheng, J.-H., Sun, D.-W., Zhu, Z., & Pu, H. (2016). Prediction of total volatile basic nitrogen contents using wavelet features from visible/near-infrared hyperspectral images of prawn (Metapenaeus ensis). Food Chemistry, 197, 257–265.
Du, C. J., & Sun, D.-W. (2005). Pizza sauce spread classification using colour vision and support vector machines. Journal of Food Engineering, 66(2), 137–145.
Ege, T., & Yanai, K. (2018). Image-based food calorie estimation using recipe information. IEICE - Transactions on Info and Systems, E101D(5), 1333–1341.
Ege, T., & Yanai, K. (2019). Simultaneous estimation of dish locations and calories with multi-task learning. IEICE - Transactions on Info and Systems, E102D(7), 1240–1246.
Elmasry, G., Barbin, D. F., Sun, D.-W., & Allen, P. (2012). Meat quality evaluation by hyperspectral imaging technique: An overview. Critical Reviews in Food Science and Nutrition, 52(8), 689–711.
Fan, S., Li, J., Zhang, Y., Tian, X., Wang, Q., He, X., Zhang, C., & Huang, W. (2020). On line detection of defective apples using computer vision system combined with deep learning methods. Journal of Food Engineering, 286, 110102.
Feng, L., Zhu, S., Zhou, L., Zhao, Y., Bao, Y., Zhang, C., & He, Y. (2019). Detection of subtle bruises on winter jujube using hyperspectral imaging with pixel-wise deep learning method. IEEE Access, 7, 64494–64505.
Gu, K., Xia, Z., Qiao, J., & Lin, W. (2020). Deep dual-channel neural network for image-based smoke detection. IEEE Transactions on Multimedia, 22(2), 311–323.
Gu, K., Zhang, Y., & Qiao, J. (2021). Ensemble meta-learning for few-shot soot density recognition. IEEE Transactions on Industrial Informatics, 17(3), 2261–2270.
Hossain, M. S., Al-Hammadi, M., & Muhammad, G. (2019). Automatic fruit classification using deep learning for industrial applications. IEEE Transactions on Industrial Informatics, 15(2), 1027–1034.
Hu, W., Huang, Y., Wei, L., Zhang, F., & Li, H. (2015). Deep convolutional neural networks for hyperspectral image classification. Journal of Sensors, 2015, Article 258619.
Hussain, G., Maheshwari, M. K., Memon, M. L., Jabbar, M. S., & Javed, K. (2019). A CNN based automated activity and food recognition using wearable sensor for preventive healthcare. Electronics, 8(12), 1425.
Jackman, P., Sun, D.-W., & Allen, P. (2011). Recent advances in the use of computer vision technology in the quality assessment of fresh meats. Trends in Food Science & Technology, 22(4), 185–197.
Jahanbakhshi, A., Momeny, M., Mahmoudi, M., & Zhang, Y.-D. (2020). Classification of sour lemons based on apparent defects using stochastic pooling mechanism in deep convolutional neural networks. Scientia Horticulturae, 263, 109133.
Jahani Heravi, E., Habibi Aghdam, H., & Puig, D. (2018). An optimized convolutional neural network with bottleneck and spatial pyramid pooling layers for classification of foods. Pattern Recognition Letters, 105, 50–58.
Jiang, S., Min, W., Liu, L., & Luo, Z. (2020). Multi-scale multi-view deep feature aggregation for food recognition. IEEE Transactions on Image Processing, 29(1), 265–276.
Kamilaris, A., & Prenafeta-Boldu, F. X. (2018a). Deep learning in agriculture: A survey. Computers and Electronics in Agriculture, 147, 70–90.
Kamilaris, A., & Prenafeta-Boldu, F. X. (2018b). A review of the use of convolutional neural networks in agriculture. Journal of Agricultural Science, 156(3), 312–322.
Kozlowski, M., Gorecki, P., & Szczypinski, P. M. (2019). Varietal classification of barley by convolutional neural networks. Biosystems Engineering, 184, 155–165.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90.
Kumar, B., Dikshit, O., Gupta, A., & Singh, M. K. (2020). Feature extraction for hyperspectral image classification: A review. International Journal of Remote Sensing, 41(16), 6248–6287.
Lashgari, M., Imanmehr, A., & Tavakoli, H. (2020). Fusion of acoustic sensing and deep learning techniques for apple mealiness detection. Journal of Food Science and Technology-Mysore, 57(6), 2233–2240.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lin, P., Li, X. L., Chen, Y. M., & He, Y. (2018). A deep convolutional neural network architecture for boosting image discrimination accuracy of rice species. Food and Bioprocess Technology, 11(4), 765–773.
Liu, C., Cao, Y., Luo, Y., Chen, G., Vokkarane, V., Ma, Y., … Hou, P. (2018a). A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure. IEEE Transactions on Services Computing, 11(2), 249–261.
Liu, Y., Sun, D.-W., Cheng, J.-H., & Han, Z. (2018b). Hyperspectral imaging sensing of changes in moisture content and color of beef during microwave heating process. Food Analytical Methods, 11(9), 2472–2484.
Liu, Y., Pu, H., & Sun, D.-W. (2017). Hyperspectral imaging technique for evaluating food quality and safety during various processes: A review of recent applications. Trends in Food Science & Technology, 69, 25–35.
Li, D., Wang, R., Xie, C., Liu, L., Zhang, J., Li, R., Wang, F., Zhou, M., & Liu, W. (2020). A recognition method for rice plant diseases and pests video detection based on deep convolutional neural network. Sensors, 20(3), 578.
Li, Y., Zhang, H., & Shen, Q. (2017). Spectral-spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sensing, 9(1), 67.
Lo, F. P. W., Sun, Y., Qiu, J., & Lo, B. (2018). Food volume estimation based on deep learning view synthesis from a single depth map. Nutrients, 10(12), 2005.
Ma, J., Pu, H., & Sun, D.-W. (2018). Predicting intramuscular fat content variations in boiled pork muscles by hyperspectral imaging using a novel spectral pre-processing technique. LWT-Food Science and Technology, 94, 119–128.
Ma, J., Sun, D.-W., & Pu, H. (2017). Model improvement for predicting moisture content (MC) in pork longissimus dorsi muscles under diverse processing conditions by hyperspectral imaging. Journal of Food Engineering, 196, 65–72.
McAllister, P., Zheng, H., Bond, R., & Moorhead, A. (2018). Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets. Computers in Biology and Medicine, 95, 217–233.
Mezgec, S., & Seljak, B. K. (2017). NutriNet: A deep learning food and drink image recognition system for dietary assessment. Nutrients, 9(7), 657.
Myers, A., Johnston, N., Rathod, V., Korattikara, A., Gorban, A., Silberman, N., Guadarrama, S., Papandreou, G., Huang, J., & Murphy, K. (2015). Im2Calories: Towards an automated mobile vision food diary. In International conference on computer vision (ICCV), 2015 IEEE (pp. 1233–1241). IEEE.
Ni, C., Wang, D., Vinson, R., Holmes, M., & Tao, Y. (2019). Automatic inspection machine for maize kernels based on deep convolutional neural networks. Biosystems Engineering, 178, 131–144.
Pandey, P., Deepthi, A., Mandal, B., & Puhan, N. B. (2017). FoodNet: Recognizing foods using ensemble of deep networks. IEEE Signal Processing Letters, 24(12), 1758–1762.
Qiao, S., Wang, Q., Zhang, J., & Pei, Z. (2020). Detection and classification of early decay on blueberry based on improved deep residual 3D convolutional neural network in hyperspectral images. Scientific Programming, 2020, Article 8895875.
Qiu, Z., Chen, J., Zhao, Y., Zhu, S., He, Y., & Zhang, C. (2018). Variety identification of single rice seed using hyperspectral imaging combined with convolutional neural network. Applied Sciences-Basel, 8(2), 212.
Qi, W., Zhang, X., Wang, N., Zhang, M., & Cen, Y. (2019). A spectral-spatial cascaded 3D convolutional neural network with a convolutional long short-term memory network for hyperspectral image classification. Remote Sensing, 11(20), 2363.
Qu, J.-H., Liu, D., Cheng, J.-H., Sun, D.-W., Ma, J., Pu, H., & Zeng, X.-A. (2015). Applications of near-infrared spectroscopy in food safety evaluation and control: A review of recent research advances. Critical Reviews in Food Science and Nutrition, 55(13), 1939–1954.
Rodriguez, F. J., Garcia, A., Pardo, P. J., Chavez, F., & Luque-Baena, R. M. (2018). Study and classification of plum varieties using image analysis and deep learning techniques. Progress in Artificial Intelligence, 7(2), 119–127.
Shen, Y., Yin, Y., Zhao, C., Li, B., Wang, J., Li, G., & Zhang, Z. (2019). Image recognition method based on an improved convolutional neural network to detect impurities in wheat. IEEE Access, 7, 162206–162218.
Situju, S. F., Takimoto, H., Sato, S., Yamauchi, H., Kanagawa, A., & Lawi, A. (2019). Food constituent estimation for lifestyle disease prevention by multi-task CNN. Applied Artificial Intelligence, 33(8), 732–746.
Steinbrener, J., Posch, K., & Leitner, R. (2019). Hyperspectral fruit and vegetable classification using convolutional neural networks. Computers and Electronics in Agriculture, 162, 364–372.
Sun, D.-W., & Brosnan, T. (2003). Pizza quality evaluation using computer vision - Part 2 - Pizza topping analysis. Journal of Food Engineering, 57(1), 91–95.
Sundaravadivel, P., Kesavan, K., Kesavan, O., Mohanty, S. P., & Kougianos, E. (2018). Smart-log: A deep-learning based automated nutrition monitoring system in the IoT. IEEE Transactions on Consumer Electronics, 64(3), 390–398.
Taheri-Garavand, A., Nasiri, A., Banan, A., & Zhang, Y.-D. (2020). Smart deep learning-based approach for non-destructive freshness diagnosis of common carp fish. Journal of Food Engineering, 278, 109930.
Teng, J., Zhang, D., Lee, D.-J., & Chou, Y. (2019). Recognition of Chinese food using convolutional neural network. Multimedia Tools and Applications, 78(9), 11155–11172.
Wang, Z., Hu, M., & Zhai, G. (2018). Application of deep learning architectures for accurate and rapid detection of internal mechanical damage of blueberry using hyperspectral transmittance data. Sensors, 18(4), 1126.
Pan, L., Li, C., Pouyanfar, S., Chen, R., & Zhou, Y. (2020). A novel combinational Wang, H. H., & Sun, D.-W. (2003). Assessment of cheese browning affected by baking
convolutional neural network for automatic food-ingredient classification. Cmc- conditions using computer vision. Journal of Food Engineering, 56(4), 339–345.
Computers Materials & Continua, 62(2), 731–746. Wang, K., Sun, D.-W., & Pu, H. (2017). Emerging non-destructive terahertz spectroscopic
Pan, L., Qin, J., Chen, H., Xiang, X., Li, C., & Chen, R. (2019). Image augmentation-based imaging technique: Principle and applications in the agri-food industry. Trends in
food recognition with convolutional neural networks. Cmc-Computers Materials & Food Science & Technology, 67, 93–105.
Continua, 59(1), 297–313. Wu, N., Zhang, C., Bai, X., Du, X., & He, Y. (2018). Discrimination of chrysanthemum
Pan, Y., Sun, D.-W., Cheng, J.-H., & Han, Z. (2018). Non-destructive detection and varieties using hyperspectral imaging combined with a deep convolutional neural
screening of non-uniformity in microwave sterilization using hyperspectral imaging network. Molecules, 23(11), 2831.
analysis. Food Analytical Methods, 11(6), 1568–1580. Wu, X., Zhao, Z., Tian, R., Shang, Z., & Liu, H. (2020). Identification and quantification of
Paoletti, M. E., Haut, J. M., Plaza, J., & Plaza, A. (2019). Deep learning classifiers for counterfeit sesame oil by 3D fluorescence spectroscopy and convolutional neural
hyperspectral imaging: A review. ISPRS Journal of Photogrammetry and Remote network. Food Chemistry, 311, 125882.
Sensing, 158, 279–317. Xu, J.-L., & Sun, D.-W. (2018). Computer vision detection of salmon muscle gaping using
Ponce, J. M., Aquino, A., & Andujar, J. M. (2019). Olive-fruit variety classification by convolutional neural network features. Food Analytical Methods, 11(1), 34–47.
means of image processing and convolutional neural networks. IEEE Access, 7, Zhang, J., Yang, Y., Feng, X., Xu, H., Chen, J., & He, Y. (2020). Identification of bacterial
147629–147641. blight resistant rice seeds using terahertz imaging and hyperspectral imaging
Pouladzadeh, P., & Shirmohammadi, S. (2017). Mobile multi-food recognition using deep combined with convolutional neural network. Frontiers of Plant Science, 11, 821.
learning. ACM Transactions on Multimedia Computing, Communications, and Zheng, C., Sun, D.-W., & Zheng, L. (2006). Correlating colour to moisture content of large
Applications, 13(3), 36. cooked beef joints by computer vision. Journal of Food Engineering, 77(4), 858–863.
Priyadharshini, R. A., Arivazhagan, S., Arun, M., & Mirnalini, A. (2019). Maize leaf Zhou, L., Zhang, C., Liu, F., Qiu, Z., & He, Y. (2019). Application of deep learning in food:
disease classification using deep convolutional neural networks. Neural Computing & A review. Comprehensive Reviews in Food Science and Food Safety, 18(6), 1793–1811.
Applications, 31(12), 8887–8895. Zhu, S., Zhou, L., Zhang, C., Bao, Y., Wu, B., Chu, H., Yu, Y., He, Y., & Feng, L. (2019).
Identification of soybean varieties using hyperspectral imaging coupled with
convolutional neural network. Sensors, 19(19), 4065.