Thesis
TP True Positive
TN True Negative
FP False Positive
FN False Negative
TiB Tebibyte
CHAPTER 1
INTRODUCTION
1.1 Introduction
Tomatoes are a key agricultural product, and their quality significantly impacts con-
sumer satisfaction and market competitiveness. The global demand for high-quality
tomatoes has increased, driven by consumers’ growing preference for fresh and healthy
produce. Traditionally, the grading and sorting of tomatoes have been performed
manually. This method, however, is time-consuming, labor-intensive, and prone to
human error, leading to higher costs and inefficiencies. In large-scale production en-
vironments, where the demand for high-quality produce is constant, manual methods
cannot efficiently meet rising consumer expectations and global market standards [7].
These inefficiencies often disrupt the supply chain, creating bottlenecks that reduce
overall productivity and profitability for producers. Furthermore, as the agricultural
sector faces labor shortages and increasing labor costs, there is a pressing need to
adopt automation to ensure consistent and reliable quality control. Traditional grad-
ing methods also struggle to account for subtle variations in tomato quality, such as
size, color, and texture, which are critical for determining ripeness and overall quality.
As a result, there is an urgent demand for innovative solutions that can enhance the
efficiency, accuracy, and scalability of the tomato grading process.
Recent advances in computer vision and machine learning technologies have shown
great potential in automating quality control tasks. These technologies can provide a
more precise, consistent, and faster alternative to manual grading, offering the abil-
ity to sort tomatoes based on multiple quality factors in real-time. The integration
of deep learning models, such as convolutional neural networks (CNNs), with au-
tomated systems can significantly improve the grading process, reduce errors, and
increase throughput, thereby boosting the profitability and sustainability of tomato
production.
1.2 Problem Statement
Manual grading systems are inadequate for handling the high volume of tomatoes
produced daily. These systems result in several inefficiencies, including delays due to
increased operational time, higher costs due to greater reliance on labor, and incon-
sistencies in grading that negatively impact market value and customer satisfaction.
These challenges highlight the need for an automated system that can accurately
assess tomato quality at scale, reduce labor dependency, and enhance precision and
consistency in grading.
1.3 Objective
The primary goal of this research is to develop a real-time, automated system for
tomato quality grading and sorting. The system aims to efficiently classify tomatoes
using both binary classification (healthy vs. rejected) and multiclass classification
(ripe, unripe, old, and damaged). Additionally, the system seeks to reduce labor
costs, improve operational efficiency, and evaluate its performance using metrics such
as accuracy, precision, recall, and F1-score.
Figure 1 gives an overview of the tomato quality grading process.
This research introduces an automated grading system that leverages a hybrid model
approach, combining deep learning and traditional machine learning techniques. The
system uses pre-trained convolutional neural networks (CNNs) for feature extraction,
enabling it to capture complex features from tomato images. These extracted features
are classified using machine learning algorithms, including Support Vector Machines
(SVM), k-Nearest Neighbors (KNN), Decision Trees (DT), and Random Forest (RF).
To ensure practical application and robustness, the system is trained on a custom
dataset sourced from field environments. This hybrid approach is designed to address
the inefficiencies of manual grading systems and enhance the accuracy and scalability
of tomato quality assessment.
The thesis is organized as follows: Chapter 1 introduces the study, including the
problem statement, objectives, proposed solution, and organization of the thesis.
Chapter 2 reviews the literature, discussing related work and summarizing key
findings. Chapter 3 presents the system design, including an overview, dataset
description, feature extraction using pre-trained CNNs, classification methods, and
performance metrics. Chapter 4 covers the implementation and results, featuring
the system architecture, implementation setup, results of feature extraction and clas-
sification tasks, evaluation with performance metrics, and a discussion of the findings.
Finally, Chapter 5 concludes the study and outlines directions for future work.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
Significant advancements have been made in automated systems for sorting and grad-
ing fruits and vegetables, focusing on external attributes such as shape, size, color,
texture, and ripeness. The increasing demand for high-quality produce and the lim-
itations of manual grading have driven the shift towards automation. Many studies
have explored algorithms combining image processing with machine learning and deep
learning to improve the accuracy and efficiency of these systems. Traditional machine
learning algorithms like support vector machines (SVM), k-nearest neighbors (KNN),
and decision trees (DT) have been used, relying on hand-crafted features such as color
histograms and texture descriptors. However, these methods often struggle to cap-
ture complex patterns. The advent of deep learning, particularly convolutional neural
networks (CNNs), has significantly improved fruit grading systems by automatically
learning hierarchical features from raw image data, eliminating the need for manual
feature extraction.
Additionally, techniques like transfer learning, data augmentation, and multispec-
tral imaging have further enhanced model performance, allowing for better evaluation
of both external and internal fruit quality attributes such as ripeness, sugar content,
and firmness. Despite these advancements, challenges such as handling variations in
lighting and environmental conditions persist, and ongoing research aims to improve
the robustness and scalability of these systems for large-scale agricultural applica-
tions.
Real-time Defects Detection System for Orange Citrus Fruits Using Multi-spectral Imaging [1]
2.3 Summary
In summary, various automated systems have been developed for fruit and vegetable
quality assessment, leveraging different algorithms and techniques. The combination
of image processing, machine learning, and deep learning has shown promising results
in tasks such as defect detection, classification of ripeness, and size classification.
Studies have demonstrated the effectiveness of methods such as transfer learning,
SVM, and CNN-based models in achieving high accuracy rates for various agricultural
applications.
CHAPTER 3
SYSTEM DESIGN
The proposed system is a real-time, automated tomato quality grading and sorting
system that utilizes deep learning and traditional machine learning models. The sys-
tem classifies tomatoes based on both binary (healthy vs. rejected) and multiclass
(ripe, unripe, old, and damaged) tasks. It leverages pre-trained Convolutional Neural
Networks (CNNs) for feature extraction, followed by traditional classifiers such as Support
Vector Machines (SVM), k-Nearest Neighbors (kNN), and Decision Trees (DT) for
classification. Data augmentation techniques such as rotation, flipping, brightness
adjustment, and noise addition improve model generalization. The system is designed
to reduce labor costs, improve operational efficiency, and achieve high accuracy,
precision, recall, and F1-score in real-world applications.
3.2 Dataset
The dataset utilized in this research was gathered from agricultural fields in Tamil
Nadu, India, with labeling and validation provided by the Department of Horticulture,
Tamil Nadu. The dataset was structured to support various classification tasks,
with details of the class distribution presented in Table 1. Prior to training the
models, a series of preprocessing steps was applied. Each image was resized to a
standard resolution of 150 × 150 pixels to maintain consistency. Additionally, several data
augmentation techniques were employed to diversify the training set and improve
the model’s ability to generalize. These techniques included random rotations to
introduce variations in image orientation, along with horizontal and vertical flips
to simulate different viewpoints. Adjustments to image brightness were made to
account for lighting inconsistencies, and random noise was added to simulate real-
world conditions, further enhancing the model’s robustness.
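The augmentations listed above can be illustrated with a short NumPy sketch. It is not the pipeline used in this study: the rotation angles, brightness range, and noise level were not reported, so restricting rotations to multiples of 90°, scaling brightness by up to ±20%, and adding zero-mean Gaussian noise with a standard deviation of 0.05 are illustrative assumptions, as is the function name `augment`.

```python
import numpy as np

def augment(image, rng, max_brightness_delta=0.2, noise_std=0.05):
    """Return a randomly augmented copy of an HxWxC float image in [0, 1].

    Assumptions (not from the thesis): rotations are multiples of 90 degrees,
    brightness is scaled by a random factor, and noise is zero-mean Gaussian.
    """
    # Random rotation by 0, 90, 180 or 270 degrees in the image plane
    out = np.rot90(image, k=int(rng.integers(0, 4)), axes=(0, 1))
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip (mirror the columns)
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip (mirror the rows)
    # Brightness adjustment: multiply every pixel by a factor near 1
    factor = 1.0 + rng.uniform(-max_brightness_delta, max_brightness_delta)
    out = out * factor
    # Additive Gaussian noise to mimic sensor noise in field images
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)  # keep pixel values in the valid range

rng = np.random.default_rng(seed=0)
image = rng.random((150, 150, 3))  # stand-in for a resized 150 x 150 RGB image
augmented = augment(image, rng)
print(augmented.shape)  # (150, 150, 3)
```

Because the rotations are multiples of 90° on a square image, the output keeps the 150 × 150 shape; arbitrary-angle rotations would additionally need interpolation and a policy for filling the corners.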
Figure 3: Sample images from the dataset showcasing the four original classes in Table 1.
The original dataset contained 1,734 images, which increased to 3,466
after augmentation. This increase in dataset size was essential for improving model
performance, particularly in achieving better generalization and reducing overfitting
during training.
In this study, feature extraction was performed using several pre-trained Convolu-
tional Neural Networks (CNNs), including ResNet50, InceptionV3, MobileNetV2,
DenseNet121, and EfficientNetB0. These models, pre-trained on large and diverse
datasets such as ImageNet, possess the ability to capture a wide range of features
from images, enabling the extraction of relevant characteristics essential for classi-
fying tomato quality. The decision to utilize these pre-trained models stems from
their exceptional performance in various computer vision tasks, as they have already
learned to recognize complex patterns in images, such as edges, textures, and shapes.
The feature extraction process, as illustrated in Figure 4, involves passing input
images through the layers of these CNN models. Each layer progressively abstracts
the features from low-level information (e.g., edges and textures) to high-level
representations (e.g., shapes, objects, and complex patterns). The output from the CNNs
consists of feature maps, structured data that encapsulate these learned representations.
These feature maps are then stored as tensors, which are multidimensional
arrays that hold the extracted features and serve as input for the next stages of the
classification pipeline.
To improve the efficiency and accuracy of the tomato classification system, sev-
eral pre-trained convolutional neural network (CNN) models were employed solely
for feature extraction. The models considered in this research included ResNet50[8],
InceptionV3[9], MobileNetV2[10], DenseNet121[11], and EfficientNetB0[12]. These
models, pre-trained on the ImageNet dataset, were selected for their ability to
extract high-level features from images efficiently. Each model offers distinct
architectural innovations that make it suitable for extracting robust and diverse features.
ResNet50 utilizes residual connections to address the vanishing gradient problem,
enabling the extraction of deep hierarchical features. InceptionV3 captures features
at multiple scales using inception modules. MobileNetV2, optimized for resource-
constrained environments, extracts lightweight yet effective features using depthwise
separable convolutions. DenseNet121 promotes feature reuse through dense connec-
tions, resulting in highly detailed feature maps. Finally, EfficientNetB0 employs
compound scaling to generate compact yet highly informative features. Table 2 sum-
marizes the key characteristics of these models, emphasizing their architectural ad-
vantages and their suitability for feature extraction.
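A short calculation shows why depthwise separable convolutions make MobileNetV2 lightweight. The sketch counts the weights of a single convolution with illustrative sizes (a 3 × 3 kernel, 32 input channels, and 64 output channels, chosen for the example rather than taken from MobileNetV2). It ignores biases, batch normalization, and the expansion layers of MobileNetV2's inverted residual blocks.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in  # one k x k spatial filter for each input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
print(std, sep, round(std / sep, 1))  # 18432 2336 7.9
```

For these sizes, the separable version needs roughly 8 times fewer weights. This factorization into a spatial step and a channel-mixing step is what lets MobileNetV2 run in resource-constrained settings.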
These pre-trained models served as feature extractors: their fully connected layers
were removed, and their convolutional layers were used to generate feature embeddings
from the input images. These embeddings were subsequently fed into traditional
machine learning classifiers to perform binary and multiclass classification tasks, a design
intended to balance computational efficiency and classification accuracy.
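The text does not state how the 4-D convolutional output was reduced to a fixed-length embedding. One common option, assumed here, is global average pooling, which averages each channel over its spatial grid. Keras exposes this as `pooling="avg"` on its application models. In the sketch, the 4 × 4 grid is illustrative, the 1,024 channels match DenseNet121's final feature dimension, and the function name is hypothetical.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average (batch, height, width, channels) maps over the two spatial axes."""
    return feature_maps.mean(axis=(1, 2))

# Hypothetical backbone output: 8 images, a 4 x 4 spatial grid, 1024 channels
feature_maps = np.random.default_rng(0).random((8, 4, 4, 1024))
embeddings = global_average_pool(feature_maps)
print(embeddings.shape)  # (8, 1024)
```

Flattening with `feature_maps.reshape(len(feature_maps), -1)` would instead give 4 × 4 × 1024 = 16,384 values per image. That much larger input would slow down the downstream SVM, kNN, and decision tree classifiers.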
In this study, traditional machine learning classifiers were employed to classify the fea-
tures extracted by pre-trained CNN models. These classifiers include Support Vector
Machine (SVM), k-Nearest Neighbors (kNN), and Decision Trees (DT). The primary
objective of utilizing these classifiers was to leverage their ability to efficiently handle
both binary and multiclass classification tasks, using the high-dimensional feature
sets derived from the CNN models. Each classifier underwent hyperparameter opti-
mization to ensure the best performance on the dataset. The tuning was performed
using cross-validation, which enabled a robust evaluation of each model’s performance
across different subsets of the training data. The goal of hyperparameter optimiza-
tion was to strike a balance between model complexity and generalization, ensuring
that the classifiers perform well without overfitting the data. The SVM classifier
was tested with two kernel types, linear and Radial Basis Function (RBF): the linear
kernel suits data that are approximately linearly separable, whereas the RBF kernel can
model non-linear decision boundaries.
The kernel type directly affects the classifier’s decision boundary, and selecting the
appropriate kernel was crucial for improving classification accuracy. For the kNN al-
gorithm, the number of neighbors was set to 5, based on cross-validation results that
suggested it offered the best trade-off between computational cost and accuracy. In
the case of the DT algorithm, two configurations were considered for the maximum
depth: no limit (None) and a maximum depth of 5. The latter option was chosen to
avoid overfitting, which could degrade the model’s generalization capability. The hy-
perparameters used for each classifier are summarized in Table 3, which outlines the
different configurations tested during the tuning process. These configurations were
selected after evaluating various combinations and were chosen based on their impact
on classifier performance. This process of fine-tuning and optimizing hyperparame-
ters helped to improve the model’s ability to accurately classify tomatoes into their
respective categories, enhancing the overall system’s performance for both binary and
multiclass classification tasks.
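The hyperparameter search described above can be sketched with scikit-learn's `GridSearchCV`. This tooling is an assumption, since the thesis does not name a library. Synthetic data from `make_classification` stands in for the CNN embeddings. The grids mirror the configurations in the text: linear and RBF kernels for the SVM, k = 5 for kNN, and a maximum depth of None or 5 for the decision tree. Five-fold cross-validation is also assumed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for CNN embeddings: 300 samples, 64 features, 4 classes
X, y = make_classification(n_samples=300, n_features=64, n_informative=10,
                           n_classes=4, random_state=0)

# Candidate settings mirroring the configurations described in the text
search_spaces = {
    "SVC": (SVC(), {"kernel": ["rbf", "linear"]}),
    "kNN": (KNeighborsClassifier(), {"n_neighbors": [5]}),
    "DT": (DecisionTreeClassifier(random_state=0), {"max_depth": [None, 5]}),
}

results = {}
for name, (estimator, grid) in search_spaces.items():
    # Each candidate is scored by mean accuracy over 5 cross-validation folds
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)
    print(name, search.best_params_, round(search.best_score_, 3))
```

SVMs and kNN are sensitive to feature scale, so a common refinement is to wrap each estimator in a `Pipeline` with a `StandardScaler`. The text does not state whether the original experiments scaled the embeddings.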
The performance of the model is evaluated using several metrics that help assess
its effectiveness in classification tasks. These metrics include True Positive (TP),
which represents the number of correctly predicted positive cases; True Negative
(TN), which corresponds to correctly predicted negative cases; False Positive (FP),
which refers to incorrectly predicted positive cases; and False Negative (FN), which
denotes incorrectly predicted negative cases.
Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy represents the proportion of correctly predicted instances out of the total
instances. It provides a general measure of the model’s performance, but it may not
be suitable for imbalanced datasets, where one class is more prevalent than the other.
Precision
$$\text{Precision} = \frac{TP}{TP + FP}$$
Precision, also known as positive predictive value, measures the ratio of true positive
predictions to the total number of positive predictions made by the model. It reflects
how many of the instances predicted as positive are actually positive. High preci-
sion indicates that the model is making fewer false positive predictions. Precision is
particularly important when the cost of a false positive is high, such as in medical
diagnoses or fraud detection.
Recall (Sensitivity)
$$\text{Recall} = \frac{TP}{TP + FN}$$
Recall, also known as sensitivity or true positive rate, measures the model’s ability
to identify all actual positive cases. It reflects how well the model detects positive
instances. High recall is essential when the cost of missing a positive case (false
negative) is significant, such as in detecting diseases or identifying critical events.
F1 Score
$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
The F1 Score is the harmonic mean of precision and recall. It provides a balance
between the two metrics, especially when there is an uneven class distribution. F1
is a more useful metric when both precision and recall are important, and there is a
need to balance false positives and false negatives.
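As a worked example of the formulas above, the sketch computes all four metrics from hypothetical counts for 100 test tomatoes, with "healthy" as the positive class. The counts are invented for illustration and are not results from this study. Returning 0.0 when a denominator is zero is a convention chosen here.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of precision and recall
    return accuracy, precision, recall, f1

# Hypothetical counts: 45 + 40 + 5 + 10 = 100 test tomatoes
acc, prec, rec, f1 = binary_metrics(tp=45, tn=40, fp=5, fn=10)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# 0.85 0.9 0.818 0.857
```

The F1 value equals 2TP / (2TP + FP + FN) = 90 / 105. Because it is a harmonic mean, it sits closer to the lower of precision and recall than an arithmetic mean would.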
Confusion Matrix
Figure 5: Illustration of confusion matrices: (a) Binary confusion matrix; (b) Multi-
class confusion matrix.
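The multiclass confusion matrix in Figure 5 (b) generalizes the four counts to several classes. For class k, the diagonal entry is its TP, the other entries in column k are its false positives, and the other entries in row k are its false negatives. This assumes rows are actual classes and columns are predictions, a convention adopted here. The NumPy sketch builds the matrix from hypothetical labels for the four classes and derives per-class precision and recall. Averaging them with equal weight (macro-averaging) is also an assumption, since the text does not say how the multiclass scores were averaged.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions; rows are actual classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

def per_class_precision_recall(cm):
    """Precision and recall for every class of a square confusion matrix."""
    tp = np.diag(cm).astype(float)
    predicted_totals = cm.sum(axis=0)  # column sums: TP + FP for each class
    actual_totals = cm.sum(axis=1)     # row sums: TP + FN for each class
    precision = np.divide(tp, predicted_totals, out=np.zeros_like(tp),
                          where=predicted_totals > 0)
    recall = np.divide(tp, actual_totals, out=np.zeros_like(tp),
                       where=actual_totals > 0)
    return precision, recall

# Hypothetical labels: 0 = ripe, 1 = unripe, 2 = old, 3 = damaged
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

cm = confusion_matrix(y_true, y_pred, n_classes=4)
precision, recall = per_class_precision_recall(cm)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(float(accuracy), round(float(precision.mean()), 3), round(float(recall.mean()), 3))
# 0.75 0.792 0.75
```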
CHAPTER 4
The system architecture of the proposed automated tomato quality grading and sort-
ing system is designed to seamlessly integrate various components, ensuring real-time
performance and high classification accuracy. The architecture comprises multiple
stages, each focusing on a specific task, such as image acquisition, feature extraction,
classification, and decision-making. The overall workflow is as follows:
• Classification: The features extracted from the images are classified using
algorithms like Support Vector Machines (SVM), k-Nearest Neighbors (kNN),
and Decision Trees (DT) to determine the quality of the tomatoes. Both binary
and multiclass classification tasks are handled in this stage.
• Decision and Grading: Based on the classification results, the system cat-
egorizes tomatoes as “healthy” or “rejected” (binary classification) or assigns
them to one of the four categories: ripe, unripe, old, or damaged (multiclass
classification). The system can then trigger the appropriate action, such as
sorting the tomatoes or sending an alert.
The architecture ensures that the system is scalable, flexible, and robust enough
to handle variations in tomato quality while maintaining high throughput and low
error rates.
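The decision and grading stage reduces to a lookup from predicted label to action. The sketch below is hypothetical throughout. The bin names, which classes are routed to rejection, and which ones raise an alert are not specified in the thesis and are placeholders only.

```python
from dataclasses import dataclass

@dataclass
class GradingDecision:
    label: str    # class predicted by the classifier
    action: str   # hypothetical sorting bin the tomato is routed to
    alert: bool   # whether to notify an operator

# Placeholder routing table covering both the binary and the multiclass labels
ROUTES = {
    "healthy": ("accept_bin", False),
    "rejected": ("reject_bin", True),
    "ripe": ("ripe_bin", False),
    "unripe": ("unripe_bin", False),
    "old": ("reject_bin", False),
    "damaged": ("reject_bin", True),
}

def decide(label):
    """Map a predicted class label to a sorting action and an alert flag."""
    if label not in ROUTES:
        raise ValueError(f"unknown label: {label!r}")
    action, alert = ROUTES[label]
    return GradingDecision(label, action, alert)

print(decide("damaged"))
```

Keeping the routing in a table separates the grading policy from the classifiers, so a packing line could change bins or alert rules without retraining any model.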
The implementation of the automated tomato quality grading and sorting system
leverages a high-performance on-premise server configuration to ensure seamless op-
eration and real-time processing capabilities. The server runs on the Ubuntu 22.04.5
LTS operating system and is hosted on a Dell PowerEdge R740 machine, which fea-
tures an Intel Xeon Gold 6140 processor with 36 cores and an NVIDIA Tesla V100
GPU for accelerated deep learning computations. The system is equipped with 256
GB of RAM and a 2.54 TiB SSD, providing ample resources for handling large-scale
image processing and deep learning workflows. The environment also uses a 1600 × 900
screen resolution and the zsh 5.8.1 shell. The
detailed server specifications are summarized in Table 4.
Component     Specification
Resolution    1600 × 900
RAM           256 GB
SSD           2.54 TiB
The process of feature extraction from images utilizes pre-trained deep learning mod-
els to efficiently derive meaningful representations from the data. This method
involves selecting a suitable pre-trained model, such as ResNet50, InceptionV3,
MobileNetV2, DenseNet121, or EfficientNetB0, to process the dataset. The in-
put images are resized to a standard dimension of 150 × 150 to ensure compatibility
with the model architecture. The dataset, organized in a specified directory, is pro-
cessed in batches, with a default size of 32 images per batch. The features extracted
from the images, along with their corresponding labels, are saved as tensors in a desig-
nated directory for subsequent use. This approach leverages transfer learning, which
eliminates the need for training a model from scratch, thus significantly reducing
computational requirements and time. The extracted features can then be utilized
for further analysis or as input to traditional machine learning classifiers, enhancing
the efficiency and accuracy of downstream tasks.
```python
"""
Extract features from images using a pre-trained deep learning model.

Args:
    model_name (str): The name of the pre-trained model.
    data_dir (str): Path to the directory containing the image dataset.
    img_height (int): Height of the input image (default: 150).
    img_width (int): Width of the input image (default: 150).
    batch_size (int): Number of images to process at a time.
    save_dir (str): Directory to save the extracted features and labels.

Saves:
    Extracted features and labels as '.tf' files.
"""
```
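The batching-and-saving step that the docstring describes can be sketched independently of any deep learning framework by passing the backbone in as a callable. This is a hedged reconstruction, not the thesis code. The function name `extract_and_save`, the `dummy_backbone` stand-in, and saving to NumPy `.npy` files instead of the `.tf` files named in the docstring are all assumptions.

```python
import os
import tempfile

import numpy as np

def extract_and_save(images, labels, backbone, save_dir, batch_size=32):
    """Run images through a frozen feature extractor in batches and save the results.

    images:   array of shape (N, height, width, 3), already resized (e.g. 150 x 150)
    labels:   array-like of shape (N,)
    backbone: callable mapping a batch of images to a 2-D (batch, features) array
    """
    # Process the dataset batch by batch; the last batch may be smaller
    batches = [backbone(images[start:start + batch_size])
               for start in range(0, len(images), batch_size)]
    features = np.concatenate(batches, axis=0)
    os.makedirs(save_dir, exist_ok=True)
    np.save(os.path.join(save_dir, "features.npy"), features)
    np.save(os.path.join(save_dir, "labels.npy"), np.asarray(labels))
    return features

def dummy_backbone(batch):
    """Hypothetical stand-in for a CNN: per-channel mean and std (6 features)."""
    return np.concatenate([batch.mean(axis=(1, 2)), batch.std(axis=(1, 2))], axis=1)

rng = np.random.default_rng(0)
images = rng.random((70, 150, 150, 3), dtype=np.float32)  # batches of 32, 32 and 6
labels = rng.integers(0, 4, size=70)

with tempfile.TemporaryDirectory() as tmp:
    features = extract_and_save(images, labels, dummy_backbone, tmp)
    reloaded = np.load(os.path.join(tmp, "features.npy"))
    reloaded_labels = np.load(os.path.join(tmp, "labels.npy"))

print(features.shape)  # (70, 6)
```

With TensorFlow installed, `backbone` could wrap a Keras application model. For example, `tf.keras.applications.DenseNet121(include_top=False, weights="imagenet", input_shape=(150, 150, 3), pooling="avg")` could be used, applying `tf.keras.applications.densenet.preprocess_input` to each batch before calling the model's `predict` method.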
The classification results for both binary and multi-class tasks highlight the perfor-
mance of models based on extracted features and traditional machine learning classi-
fiers. Using features from the InceptionV3 model, a Support Vector Classifier (SVC)
with an RBF kernel achieved an accuracy of 0.94, a precision of 0.95, a recall of 0.91,
and an F1-score of 0.93 in the binary classification task. Similarly, the DenseNet121
model combined with an SVC using a linear kernel demonstrated strong performance
The evaluation of the models was conducted using key performance metrics, including
accuracy, precision, recall, and F1-score. The results for binary classification tasks
are summarized in Table 5, while the performance of multi-class classification models
is detailed in Table 6.
In the binary classification comparison (Table 5), the model combining Incep-
tionV3 with an SVC using an RBF kernel achieved the highest accuracy of 94%, with
a precision of 0.95, recall of 0.91, and an F1-score of 0.93. This indicates that it
separated the two classes best among the tested combinations. In contrast, ResNet50
paired with an SVC using an RBF kernel reached an accuracy of only 73%, with a
precision of 0.79 and a recall of 0.57, suggesting that its features are less discriminative
for this task, possibly because of imbalanced or complex data distributions. Figure 7 (A)
visualizes the accuracy comparison for the binary classification task.
For multi-class classification (Table 6), the combination of DenseNet121 and SVC
(linear kernel) emerged as the top performer, achieving an accuracy of 96%, precision
of 0.91, and recall of 0.96. MobileNetV2 with a linear SVC closely followed with an
accuracy of 94%. However, the models based on ResNet50 showed relatively lower
The classification results demonstrate the effectiveness of using pre-trained deep learn-
ing models for feature extraction, combined with traditional machine learning clas-
sifiers, in both binary and multi-class classification tasks. The models’ performance
varies based on the combination of feature extractors and classifiers, which is a critical
factor for achieving high accuracy and generalizability. In the binary classification
task, the InceptionV3 + SVC (RBF kernel) combination yielded the highest per-
formance, achieving an accuracy of 94% with excellent precision and recall scores.
This suggests that the model is effective at distinguishing between the two classes,
even with a relatively simple classifier like SVC. The high precision and recall indicate
that the model can both correctly identify positive instances and avoid misclassifying
negative instances. However, the performance of the ResNet50 + SVC (RBF kernel)
combination was subpar, with an accuracy of just 73%, precision of 0.79, and recall of
0.57. This lower performance may be due to the difficulty of the ResNet50 model in
capturing discriminative features for the binary task, especially when the data distri-
bution is imbalanced or complex. The performance could be improved by fine-tuning
the model or exploring alternative classifiers or feature extractors. In the multi-
class classification task, the DenseNet121 + SVC (linear kernel) combination
performed exceptionally well, with an accuracy of 96%, precision of 0.91, and recall
of 0.96. This indicates that DenseNet121 is well-suited for handling the complexity
of multi-class classification, as it effectively extracts features that help the SVC make
correct predictions across the different classes. This combination stands out as the
most reliable and robust for multi-class tasks, particularly for complex datasets. On
the other hand, ResNet50 did not perform as well in multi-class tasks. The ResNet50
+ DT combination resulted in the lowest accuracy (55%), precision (0.53), and recall
(0.36), suggesting that this combination may not be ideal for multi-class classifica-
tion in this specific task. The decision tree classifier may have struggled with the
complexity of the feature space or failed to generalize well across the multiple classes.
The MobileNetV2 + SVC (linear kernel) also showed strong performance, with
an accuracy of 94%, closely following DenseNet121. This highlights MobileNetV2’s
potential in scenarios where computational efficiency is crucial, as it provides a good
trade-off between accuracy and speed.
The confusion matrices presented in Figure 8 further corroborate these findings by
visualizing the misclassifications. The models’ ability to correctly predict the classes
is evident, but they also highlight areas where improvements can be made, especially
in terms of recall for certain classes. For instance, the misclassifications observed in
the ResNet50 + DT configuration suggest that this model might be prone to under-
fitting in multi-class scenarios, which could be addressed by fine-tuning or using a
different classifier. Overall, these results highlight the importance of selecting ap-
propriate model configurations based on the task at hand. Pre-trained models like
InceptionV3 and DenseNet121, combined with traditional machine learning classi-
fiers like SVC, offer a powerful approach to achieve high performance across various
classification tasks. However, careful tuning and evaluation of model parameters are
necessary to avoid overfitting or underfitting, particularly in more complex multi-class
classification scenarios.
CHAPTER 5
5.1 Conclusion
In this work, we have successfully developed and evaluated a hybrid approach com-
bining pre-trained deep learning models for feature extraction and traditional ma-
chine learning classifiers for both binary and multi-class classification tasks. The
results demonstrate that combining state-of-the-art models such as InceptionV3,
DenseNet121, and MobileNetV2 with classifiers like Support Vector Classifiers (SVC)
and K-Nearest Neighbors (KNN) can achieve high accuracy, precision, and recall,
outperforming many baseline methods. For binary classification, the combination of
InceptionV3 and SVC (RBF kernel) emerged as the top performer, while for multi-
class tasks, DenseNet121 coupled with SVC (linear kernel) achieved outstanding
results. These findings highlight the effectiveness of using pre-trained models for
feature extraction in conjunction with traditional classifiers, providing a balance be-
tween computational efficiency and high classification performance. The work also
sheds light on the challenges faced by certain models, such as ResNet50 in multi-class
classification tasks, where the performance could be further enhanced with better
tuning or alternative approaches.
Overall, this research contributes to the growing field of hybrid machine learning
models, demonstrating their potential in a variety of classification tasks, and offering
insights into model selection and performance evaluation.
5.2 Future Work
While the current work has achieved significant results, there are several
opportunities for future research and improvement. One promising avenue is the integration
of YOLO (You Only Look Once) models for more advanced real-time object
detection and classification tasks. YOLO’s ability to perform high-speed detection
makes it an ideal candidate for applications requiring quick and efficient
predictions, such as real-time surveillance or autonomous systems [15], [16].
Additionally, vision-based large language models (LLMs) can be explored for future
work, particularly in the context of multimodal learning, where models can
combine visual and textual information to generate more comprehensive and accurate
outputs [13], [14], [17]. For example, integrating YOLO models with vision-based
LLMs could enhance the understanding and contextualization of visual data, enabling
more sophisticated systems capable of both recognizing objects and understanding
complex instructions or scenarios. This integration has the potential to elevate tasks
such as autonomous navigation, visual question answering, and interactive robotics.
Another area for future work involves expanding the dataset and considering more
complex architectures, such as Transformers, for both feature extraction and classi-
fication. Fine-tuning pre-trained models or adopting new architectures can further
improve the generalization ability of the models, especially in more challenging envi-
ronments or datasets with a larger variety of classes.
In conclusion, the future scope of this research lies in advancing the integration
of computer vision, object detection, and multimodal learning models, with potential
applications in industries ranging from healthcare to autonomous systems and beyond.
By leveraging cutting-edge technologies, future work can push the boundaries of AI,
enabling systems that are not only highly accurate but also capable of more complex
and nuanced decision-making processes.
Bibliography
[1] A. M. Abdelsalam and M. S. Sayed, “Real-time defects detection system for or-
ange citrus fruits using multi-spectral imaging,” in Proc. IEEE 59th Int. Midwest
Symp. Circuits Syst. (MWSCAS), Oct. 2016, pp. 1–4.
[3] T. Tao and X. Wei, “A hybrid CNN–SVM classifier for weed recognition in winter
rape field,” Plant Methods, vol. 18, no. 1, p. 29, Dec. 2022.
[4] Z. Zha, D. Shi, X. Chen, H. Shi, and J. Wu, “Classification of appearance quality
of red grape based on transfer learning of convolution neural network,” Tech.
Rep., 2023.
[8] W. Xu, Y.-L. Fu, and D. Zhu, “ResNet and its application to medical image pro-
cessing: Research progress and challenges,” Comput. Methods Programs Biomed.,
vol. 240, p. 107660, 2023, doi: 10.1016/j.cmpb.2023.107660.
[10] K. Dong, C. Zhou, Y. Ruan, and Y. Li, “MobileNetV2 Model for Image Clas-
sification,” in Proc. Int. Conf. Inf. Technol. Comput. Appl. (ITCA), 2020, pp.
476–480, doi: 10.1109/ITCA52113.2020.00106.
[12] V.-T. Hoang and K.-H. Jo, “Practical Analysis on Architecture of Efficient-
Net,” in Proc. Int. Conf. Hum. Syst. Interact. (HSI), 2021, pp. 1–4, doi:
10.1109/HSI52170.2021.9538782.
[15] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once:
Unified, Real-Time Object Detection,” in Proc. 2016 IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), 2016, pp. 779–788, doi: 10.1109/CVPR.2016.91.
[16] A. K. Sangaiah, F.-N. Yu, Y.-B. Lin, W.-C. Shen, and A. Sharma, “UAV T-
YOLO-Rice: An Enhanced Tiny Yolo Networks for Rice Leaves Diseases Detec-
tion in Paddy Agronomy,” IEEE Trans. Network Sci. Eng., vol. 11, no. 6, pp.
5201–5216, 2024, doi: 10.1109/TNSE.2024.3350640.
[17] J. Wang, T. Wang, W. Cai, L. Xu, and C. Sun, “Boosting Efficient Reinforce-
ment Learning for Vision-and-Language Navigation With Open-Sourced LLM,”
IEEE Robotics and Automation Letters, vol. 10, no. 1, pp. 612–619, 2025, doi:
10.1109/LRA.2024.3511402.