Ieee
I. INTRODUCTION
In recent years, a heightened public focus on health and nutrition has emerged, fuelled by the widespread availability of food-related content on social media and the internet. This surge in visual food data has created both opportunities and challenges for automated image classification systems [1]. The complexity of this task increases with the sheer number of food categories, making robust and efficient classification methods essential [2].

Deep learning, particularly through Convolutional Neural Networks (CNNs), has revolutionized image recognition and detection tasks by automatically learning hierarchical features, thus effectively reducing the high dimensionality of image data without losing crucial information [3, 4]. Unlike traditional approaches that depended on manually engineered features, modern CNN-based solutions have consistently outperformed earlier methods [5, 6].

A key advancement in this field is transfer learning, where models pre-trained on large-scale datasets like ImageNet are adapted for specialized tasks, such
as food classification [7, 8, 9]. Notable architectures including Inception-v3 [10], Xception [11], MobileNet [12], EfficientNets [13], and DenseNet [14] have each contributed unique strengths to the domain, balancing computational efficiency with performance. Among these, EfficientNet-B7 stands out due to its superior accuracy, scalability, and optimized computational efficiency, making it a powerful choice for fine-grained image classification tasks such as food recognition.

EfficientNet-B7 employs a compound scaling approach that adjusts depth, width, and resolution simultaneously, achieving state-of-the-art accuracy while maintaining manageable computational requirements. This enables it to outperform deeper architectures while using significantly fewer parameters, making it particularly useful for real-world applications requiring both high accuracy and efficiency.

This paper builds on these developments by exploring the potential of state-of-the-art pre-trained networks for food image classification, with a particular focus on optimizing efficiency and accuracy using EfficientNet-B7.

II. LITERATURE SURVEY
In [16], Bossard et al. introduced the Food-101 dataset, exploring the application of discriminative patch extraction to manage the diverse and complex nature of food images. The study emphasizes the challenges inherent in food classification due to high intra-class variability and noisy backgrounds.

In [2], Chen et al. employed deep convolutional neural networks for food image recognition, demonstrating that fine-tuning pre-trained models on domain-specific data markedly improves classification accuracy compared to traditional feature engineering approaches.

In [3], Wu and Jiang extended transfer learning methods by leveraging very deep pre-trained networks for food classification tasks. Their findings underscore the effectiveness of cross-domain knowledge transfer from large-scale datasets like ImageNet, which enhances performance on specialized tasks such as food image classification.

In [13], the concept of compound scaling in EfficientNet was introduced, which balances network depth, width, and resolution to achieve state-of-the-art performance. This study highlights the benefits of EfficientNet-B7, which delivers high accuracy with fewer parameters and improved computational efficiency.

In [5], comparative studies among architectures, including Xception, MobileNet, and DenseNet, reveal that while Xception provides robust accuracy and MobileNet offers lightweight solutions, EfficientNet-B7 strikes an optimal balance between performance and efficiency. Additional techniques such as data augmentation, mixed precision training, and prefetching have been demonstrated to further enhance model generalization and robustness in food image classification.

III. PROPOSED SYSTEM
The proposed system integrates EfficientNet-B7 within an advanced training pipeline that employs transfer learning, mixed precision training, and data prefetching to maximize both accuracy and efficiency.

A. System Overview and Implementation
Dataset: The Food-101 dataset, consisting of 101,000 images across 101 categories, is split into training and validation subsets.
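As a rough illustration of the dataset split described above, the sketch below partitions a list of (image path, label) pairs class by class, so that every category keeps the same train/validation proportion. The 75/25 ratio mirrors the official Food-101 split (750 training and 250 test images per class); the paper does not state which ratio it used, so that value, the function name stratified_split, and the synthetic file names are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.25, seed=0):
    """Split (path, label) pairs into train/validation lists, class by class.

    val_fraction=0.25 is an assumption borrowed from the official Food-101
    split; the paper does not report the ratio it used.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        n_val = int(round(len(paths) * val_fraction))
        val.extend((p, label) for p in paths[:n_val])
        train.extend((p, label) for p in paths[n_val:])
    return train, val


# Synthetic stand-in for Food-101: 101 classes with 1,000 images each.
samples = [
    (f"class_{c:03d}/img_{i:04d}.jpg", c)
    for c in range(101)
    for i in range(1000)
]
train_set, val_set = stratified_split(samples)
print(len(samples), len(train_set), len(val_set))
```

Splitting per class rather than over the whole list keeps each category equally represented in both subsets, which matters when per-class accuracy is compared later.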
Image Pre-processing: Standard resizing, normalization, and extensive data augmentation (e.g., random rotations, flips, brightness adjustments) are applied to improve model generalization.

Transfer Learning: EfficientNet-B7 is initialized with ImageNet weights, enabling the model to leverage pre-learned features before fine-tuning on Food-101.

Training Pipeline:
1. Batch Processing & Prefetching: Using TensorFlow's tf.data pipelines, data batches are prefetched to reduce I/O latency and maximize GPU utilization.
2. Mixed Precision Training: TensorFlow's mixed precision API reduces memory usage and accelerates computation without sacrificing accuracy.
3. Hyperparameter Optimization: An adaptive learning rate schedule is employed together with the Adam optimizer, with checkpoints and TensorBoard logging for real-time monitoring.

B. Network Architecture and Performance Analysis
EfficientNet-B7 is the core of our approach, employing a compound scaling strategy that adjusts network depth, width, and input resolution simultaneously.

Key Architectural Features:
Mobile Inverted Bottlenecks: Efficiently capture features while keeping computational costs low.
Squeeze-and-Excitation Modules: Dynamically recalibrate channel-wise feature responses to enhance representation.

The proposed model achieves approximately 85% Top-1 accuracy on the Food-101 validation set, outperforming traditional architectures such as ResNet101 and Xception.

The integration of mixed precision training and data prefetching results in a 30–40% reduction in training time while maintaining efficient resource utilization, which is critical for real-world deployments.
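To make the mixed precision idea concrete, the following sketch emulates float16 and float32 rounding with Python's struct module. It shows why such training schemes typically scale the loss and keep a full-precision copy of the weights. It only illustrates the numerical behaviour and is not the TensorFlow API used in the paper; the helpers to_float16 and to_float32, the loss scale of 2**15, and the example gradient and update values are all assumptions chosen for the demonstration.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


LOSS_SCALE = 2.0 ** 15  # hypothetical static loss scale
tiny_grad = 1e-8        # hypothetical gradient, below float16's smallest subnormal

# Stored directly in float16, the gradient underflows to zero.
naive_grad = to_float16(tiny_grad)

# With loss scaling, the scaled gradient is representable in float16; it is
# divided by the scale again in higher precision before the weight update.
scaled_grad = to_float16(tiny_grad * LOSS_SCALE)
recovered_grad = scaled_grad / LOSS_SCALE

# A small weight update (hypothetical learning_rate * gradient) applied to 1.0.
update = 1e-4
half_weight = to_float16(1.0 - update)    # rounds back to 1.0: update lost
master_weight = to_float32(1.0 - update)  # stays below 1.0: update kept

print(naive_grad, recovered_grad, half_weight, master_weight)
```

The two halves of the example correspond to the two usual safeguards: loss scaling protects small gradients from underflow, and float32 master weights protect small updates from being rounded away.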
operations while maintaining full precision (float32) for essential calculations like model updates. As a result, training throughput increased by approximately 30–40%, significantly reducing the total training time without compromising model accuracy.

Training Performance: The model was trained for multiple epochs, demonstrating rapid convergence due to pre-trained ImageNet weights and an optimized learning strategy. The initial epochs (1–3) showed progressive improvement in accuracy, with the validation accuracy reaching 84.15% by epoch 4. As the learning rate adjusted dynamically, the accuracy continued improving, stabilizing at 84.86% in later epochs (Fig. 4).
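The paper does not specify how the learning rate was adjusted during training. One common choice is a reduce-on-plateau rule, which lowers the rate whenever validation accuracy stops improving. The sketch below is a minimal pure-Python version of that rule, offered as an assumption rather than the authors' method; the class name ReduceOnPlateau, its default parameters, and the accuracy values fed to it are hypothetical and are not the paper's results.

```python
class ReduceOnPlateau:
    """Lower the learning rate when validation accuracy stops improving.

    Hypothetical stand-in for the paper's unspecified scheduler.
    """

    def __init__(self, initial_lr, factor=0.5, patience=1, min_lr=1e-6):
        self.lr = initial_lr
        self.factor = factor      # multiplier applied on each reduction
        self.patience = patience  # epochs without improvement before reducing
        self.min_lr = min_lr      # lower bound on the learning rate
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, val_accuracy):
        """Record one epoch's validation accuracy and return the new rate."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.stale_epochs = 0
        return self.lr


# Made-up validation accuracies, for illustration only.
history = [0.70, 0.78, 0.78, 0.80, 0.80, 0.80]
scheduler = ReduceOnPlateau(initial_lr=1e-3)
rates = [scheduler.step(acc) for acc in history]
print(rates)
```

With these inputs the rate is halved each time an epoch fails to beat the best accuracy seen so far, which lets the optimizer take smaller steps as the curve flattens.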
REFERENCES
[1] Ilyukhin, Sasha V., Timothy A. Haley, and Rakesh K. Singh. "A survey of automation practices in the food industry." Food Control 12, no. 5 (2001): 285-296.
[2] Bruno, Vieira, Silva Resende, and Cui Juan. "A survey on automated food monitoring and dietary management systems." Journal of Health & Medical Informatics 8, no. 3 (2017).
[3] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521, no. 7553 (2015): 436-444.
[4] Susan, Seba, and Seema Chandna. "Object recognition from color images by fuzzy classification of gabor wavelet features." In 2013 5th International Conference on Computational Intelligence and Communication Networks, pp. 301-305. IEEE, 2013.
[5] Saini, Manisha, and Seba Susan. "Comparison of deep learning, data augmentation and bag of-visual-
[8] Susan, Seba, Dhaarna Sethi, and Kriti Arora. "Cross-domain learning for pulmonary nodule detection using Gestalt principle of similarity." Soft Computing (2023): 1-12.
[9] Saini, Manisha, and Seba Susan. "Cervical Cancer Screening on Multiclass Imbalanced Cervigram Dataset using Transfer Learning." In 2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1-6. IEEE, 2022.
[10] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "Rethinking the inception architecture for computer vision." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016.
[11] Chollet, François. "Xception: Deep learning with depthwise separable convolutions." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251-1258. 2017.
[12] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
[13] Tan, Mingxing, and Quoc Le. "Efficientnet: Rethinking model scaling for convolutional neural networks." In International conference on machine learning, pp. 6105-6114. PMLR, 2019.
[14] Iandola, Forrest, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer. "Densenet: Implementing efficient convnet descriptor pyramids." arXiv preprint arXiv:1404.1869 (2014).
[15] Tao, Huawei, Li Zhao, Ji Xi, Ling Yu, and Tong Wang. "Fruits and vegetables recognition based on color and texture features." Transactions of the Chinese Society of Agricultural Engineering 30, no. 16 (2014): 305-311.
[16] Bossard, Lukas, Matthieu Guillaumin, and Luc Van Gool. "Food-101 – mining discriminative components with random forests." In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VI 13, pp. 446-461. Springer International Publishing, 2014.
[17] Zheng, Jiannan, Liang Zou, and Z. Jane Wang. "Mid-level deep Food Part mining for food image recognition." IET Computer Vision 12, no. 3 (2018): 298-304.
[18] Zhou, Lei, Chu Zhang, Fei Liu, Zhengjun Qiu, and Yong He. "Application of deep learning in food: a review." Comprehensive Reviews in Food Science and Food Safety 18, no. 6 (2019): 1793-1811.
[19] Sethi, Dhaarna, Kriti Arora, and Seba Susan. "Transfer Learning by Deep Tuning of Pre-trained Networks for Pulmonary Nodule Detection." In 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), pp. 168-173. IEEE, 2020.
[20] Pan, Lili, Samira Pouyanfar, Hao Chen, Jiaohua Qin, and Shu-Ching Chen. "Deepfood: Automatic multi-class classification of food ingredients using deep learning." In 2017 IEEE 3rd international conference on collaboration and internet computing (CIC), pp. 181-189. IEEE, 2017.
[21] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Communications of the ACM 60, no. 6 (2017): 84-90.
[22] Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. "Caffe: Convolutional architecture for fast feature embedding." In Proceedings of the 22nd ACM international conference on Multimedia, pp. 675-678. 2014.
[23] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
[24] Yanai, Keiji, and Yoshiyuki Kawano. "Food image recognition using deep convolutional network with pre-training and fine-tuning." In 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1-6. IEEE, 2015.
[25] VijayaKumari, G., Priyanka Vutkur, and P. Vishwanath. "Food classification using transfer learning technique." Global Transitions Proceedings 3, no. 1 (2022): 225-229.
[26] Yadav, Sapna, and Satish Chand. "Automated food image classification using deep learning approach." In 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 1, pp. 542-545. IEEE, 2021.