1. Introduction
Object detection, in a narrow sense, refers to the computer vision task of localizing and classifying target objects in images through bounding boxes. Models capable of completing such tasks, typically convolutional neural network (CNN)-based frameworks, are called object detectors. Although the concept of object detection is also captured in closely related computer vision tasks, such as semantic segmentation, instance segmentation, and panoptic segmentation, where precise geometric masks are used to highlight target objects, object segmentors are not to be confused with object detectors [1]. Object detection has wide implications across various disciplines [2,3], including precision agriculture [4], a growing branch of agriculture that aims to improve agricultural management efficiency through technology [5]. Current mainstream object detectors rely on supervised learning and require large, diverse datasets to achieve robust performance, which translates into human image annotation needs with significant labor and time costs. In a multi-class dataset, it is common for certain object classes to contain substantially fewer instances than the rest due to data collection difficulties. For example, an apple flower bud at the tip growth stage in early spring can grow into five to eight flowers at the bloom growth stage in late spring [6,7], implying that a single tip bounding box annotation would correspond to multiple bloom bounding box annotations. This problem is known as the class imbalance issue in deep learning, which causes weak exposure to certain classes for object detectors and hence leads to poor model performance for those classes [8] (Figure 1).
Generative adversarial networks (GANs), proposed by Goodfellow et al. [9], are generative models designed to create synthetic data that mimic real data. GANs comprise generators and discriminators that compete in a minimax two-player game. Generators are trained on real datasets to capture the real data distribution and learn to map input variables, such as random multidimensional noise vectors or source domain images, to synthetic target domain samples. Discriminators, on the other hand, estimate the probability of given samples being real during model training. Generators' objective functions aim to minimize the likelihood of discriminators assigning high and low real-data probabilities to real and synthetic data, respectively, while discriminators' objective functions aim to maximize the same likelihood. In practice, as model training proceeds, generators become better at generating synthetic samples indistinguishable from real samples, and discriminators become better at distinguishing synthetic samples from real samples. With sufficient data, GAN training, in theory, should eventually converge and reach a Nash equilibrium, meaning that neither the generator nor the discriminator can improve further against its counterpart [10]. Similar to object detection, GAN research in recent years has gained significant traction across many disciplines, and common GAN applications include image synthesis, image super-resolution, image-to-image translation, etc. [11,12].
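Formally, this two-player game corresponds to the value function of Goodfellow et al. [9], where $G$ denotes the generator, $D$ the discriminator, $p_{\text{data}}$ the real data distribution, and $p_z$ the input noise distribution:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))].
$$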
Given GANs' ability to synthesize data based on learned data distributions [9], utilizing synthetic data from GANs to improve weakly trained object detectors naturally becomes an intriguing research topic and a potentially valid solution for insufficient model training data. The underlying method would be to first develop one or multiple GANs targeted at the challenging classes of an object detector using its limited training dataset, then generate sufficient synthetic data for those classes, and finally retrain the object detector by incorporating the synthetic data into the original training dataset (a minimal sketch follows this paragraph). Fundamentally, this approach is an alternative form of traditional data augmentation techniques [13], and shares similarities with simulation-based image generation [14], as it involves creating entirely new data from scratch rather than modifying or transforming existing data. It is worth noting that fine-tuning object detectors with synthetic data can be an equally valid approach to improving their performance. However, besides shorter training durations, this strategy does not offer advantages over retraining in terms of final model performance on real test datasets, as fine-tuned models, unlike retrained models, are not exposed to real training data during their second phase of development. Despite the theoretical potential of this idea, the current literature has reported that synthetic training data can reduce artificial intelligence (AI) model performance and even lead to model collapse. For example, Bohacek and Farid [15] observed that the popular image generation model Stable Diffusion was highly vulnerable to data poisoning with synthetic training data, and yielded severely distorted and less diverse images even when the retraining datasets contained low quantities of self-generated images. Shumailov et al. [16] discovered that generative AI models, such as large language models (LLMs), variational autoencoders (VAEs), and Gaussian mixture models (GMMs), all experienced performance degradation when their training datasets were polluted by synthetic data from preceding models.
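To make this workflow concrete, the following minimal Python sketch outlines the three steps. All function names, the dataset interface, and the per-class image budget are hypothetical placeholders for illustration, not the implementation used in this study:

```python
# Hypothetical sketch of GAN-based class rebalancing for an object detector.
# train_gan, train_detector, and the dataset interface are placeholders.

def rebalance_and_retrain(real_dataset, detector, minority_classes,
                          images_per_class=500):
    synthetic_images = []
    for cls in minority_classes:
        # Step 1: develop a GAN on the limited real images of the weak class.
        gan = train_gan(real_dataset.images_of_class(cls))
        # Step 2: generate sufficient synthetic images for that class.
        synthetic_images += [gan.generate() for _ in range(images_per_class)]
    # Step 3: retrain (not fine-tune) the detector on real plus synthetic data,
    # so the model is still exposed to all real training images.
    augmented_dataset = real_dataset.extend(synthetic_images)
    return train_detector(detector, augmented_dataset)
```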
Additionally, in the context of GANs and object detection, further methodological concerns exist. First, GAN development is generally conducted on massive datasets and is arguably more data-demanding than object detector development. For example, the famous StyleGAN was trained on the FFHQ dataset with 70,000 images [17], while the subsequent StyleGAN2 was trained on the LSUN Car dataset with 893,000 images [18]. Brock et al. [19] trained BigGAN and BigGAN-deep using a subset of the JFT-300M dataset containing 292 million images. When trained on insufficient data, GANs are prone to the mode collapse issue [20], where generators produce highly similar outputs instead of diverse samples that reflect the full training data distribution. Consequently, the value of a homogeneous synthetic dataset for improving object detectors would be limited. Second, GAN training can be rather computationally expensive. For example, StyleGAN was trained for one week on an NVIDIA DGX-1 with eight Tesla V100 GPUs, while the StyleGAN2 project consumed approximately 51 GPU years based on a single NVIDIA V100 GPU. The high computation requirement of GAN development could pose a significant barrier to research with limited resource access. Third, object detection often deals with multi-class datasets, while regular unconditional GANs trained on such datasets cannot synthesize data of target classes on demand. Specialized GANs, such as conditional GANs [21], would be necessary for precise class data generation. Fourth, small objects, characterized by their small sizes relative to overall images, are common yet challenging targets in object detection. In contrast, GANs are typically trained on images dominated by a single, centrally located target object. It is questionable whether GANs can effectively synthesize images with randomly located small target objects. Lastly, even though GANs can generate an unlimited number of images, corresponding image annotations are still needed for the synthesized data to be useful to object detectors. Traditional costly and inefficient manual annotation is clearly not the ideal solution in such a scenario, as it can easily become a bottleneck that significantly limits the dataset preparation process. Alternative automated methods, such as pseudo-labeling (sketched below), can address the annotation speed constraint; however, the generated image annotations may be of suboptimal quality.
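As an illustration of the pseudo-labeling alternative, the sketch below runs a trained detector over GAN-generated images and writes YOLO-format annotation files. It assumes the Ultralytics API (the family YOLO11n belongs to); the weights path, image folder, and confidence threshold are hypothetical:

```python
# Hedged sketch: pseudo-labeling synthetic images with a trained detector.
# Paths and the 0.50 confidence threshold are illustrative assumptions.
from pathlib import Path
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # hypothetical weights path
synthetic_dir = Path("synthetic_images")           # hypothetical GAN output folder

for result in model.predict(source=str(synthetic_dir), conf=0.50, stream=True):
    lines = []
    for box in result.boxes:
        cls_id = int(box.cls.item())
        cx, cy, w, h = box.xywhn[0].tolist()  # normalized YOLO-format box
        lines.append(f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    # Each detection becomes a pseudo-annotation; images with no detections
    # receive empty label files and can be treated as negative samples.
    Path(result.path).with_suffix(".txt").write_text("\n".join(lines))
```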
In light of the aforementioned background, the current study investigated the feasibility of mitigating class imbalance in object detection utilizing GAN-generated data, as the first attempt under an agricultural context in the current literature. The lightweight unconditional GAN FastGAN, the state-of-the-art object detector YOLO11n, and the multi-growth-stage apple flower bud dataset AriAplBud were chosen for the study. Such an experiment design was implemented not only to avoid extreme GAN development time and resource requirements, but also to verify the viability of the seemingly paradoxical concept: utilizing less-capable lightweight GANs developed on few images to improve advanced object detectors that struggle with the same small training dataset. The objectives of the study included: (1) developing a state-of-the-art apple flower bud detector; (2) developing GAN models for apple flower bud image synthesis at individual growth stages; (3) quantifying GAN's capacity to successfully synthesize images containing object-detector-detectable apple flower buds; and (4) evaluating the usefulness of GAN-based synthetic data, by quantity, in improving object detector performance for weakly trained classes. This article follows a standard research article structure, sequentially presenting a comprehensive literature review, a detailed method description, experimental results and discussions, and study conclusions in the subsequent sections.
2. Literature Review
A moderate number of studies in the current literature utilize GANs for object detection purposes. Based on their nature, these studies can be broadly divided into three categories: implementing GANs with object detectors, retraining object detectors with synthetic data from GANs, and manipulating real data with GANs for object detectors. The current study falls into the second category and addresses the knowledge gap by focusing on lightweight GANs in an agricultural context.
Unique characteristics exist for the three study categories, as they utilize GANs in distinct manners. Incorporating GAN modules into object detection frameworks creates a complete, unified solution. However, a deep and up-to-date understanding of neural networks, significant network architecture design and modification, and iterative experimentation might be required to develop such models. Retraining with additional GAN-generated data updates model weights and hence should be able to fundamentally improve object detectors for future applications. However, the efficacy of model retraining depends on the quantity and quality of the synthetic data, which in turn rely on the specific GANs used as well as their development. Transforming real data through GANs into a target domain that object detectors are more familiar with allows for model performance improvements without detector redevelopment. Yet, to maintain the same level of model performance, the utilization of GANs becomes a constant necessity. It is difficult to conclude which category exploits GANs for object detection most effectively, as the studies often developed and evaluated their detectors based on different datasets, metrics, and benchmarking counterparts for different applications, and are therefore not always directly comparable.
2.1. Implementing GAN with Object Detector
Li et al. [22] proposed Perceptual GAN to address small object detection problems. Perceptual GAN consisted of a deep residual generator that utilized fine-grained details from low-level convolutional layers to generate super-resolved high-level convolutional features, and a discriminator with an adversarial branch for object probability estimation and a perception branch for object classification and bounding box regression. The study demonstrated that Perceptual GAN outperformed Fast R-CNN and Faster R-CNN for the Tsinghua-Tencent 100K traffic sign dataset.
Wang et al. [23] proposed CMTGAN, based on deep CNNs, dedicated to small object detection. CMTGAN included a generator with a centered mask for image super-resolution, and a discriminator for two-stage object detection that first proposed regions of interest (ROIs) and then predicted object categories and regressed bounding boxes on the ROIs. The study experiments suggested that CMTGAN outperformed YOLOv4 and Faster R-CNN combined with bilinear, bicubic, SPSR, and ESRGAN upsampling methods based on the PASCAL VOC dataset.
Sun et al. [24] proposed Ganster R-CNN, consisting of RFPN and IGAN modules, for occluded object detection. RFPN was created by combining RPN and FPN to integrate semantic information from high-level feature maps and location information from low-level feature maps, and extracted samples from real images. IGAN was composed of a GAN generator and a Faster R-CNN detector. The GAN generator created synthetic occluded samples from noise variables and real sample high- and low-resolution feature maps, while the Faster R-CNN detector learned whether the samples were real or synthetic. Based on the PASCAL VOC2007, PASCAL VOC2012, and MS COCO 2017 datasets, Ganster R-CNN outperformed Faster R-CNN, SSD513, and R-FCN.
Dewi et al. [25] proposed DC YOLO-GAN by incorporating a one-class YOLO architecture into a GAN discriminator to recognize similar-looking musical instruments, including bassoon, cello, clarinet, erhu, flute, French horn, guitar, harp, recorder, saxophone, trumpet, and violin. Their study, based on the PPMI dataset, showed that DC YOLO-GAN outperformed YOLOv2 marginally for certain instrument classes.
Jaw et al. [26] proposed RodNet, which incorporated a GAN and an object detector for nighttime object detection. The RodNet GAN treated nighttime images as the source domain and daytime images as the target domain to achieve feature transformation and project low-luminance features into visible and clean features. The RodNet object detector shared features from the GAN generator as inputs and made subsequent object predictions. RodNets incorporating YOLOv3-416 and YOLOv7-tiny were tested on the BDD100K, KITTI, and CityScape nighttime datasets. The results showed that RodNet-YOLOv3 outperformed SSD-512, RetinaNet, and YOLOv3, and RodNet-YOLOv7 outperformed YOLOv7, in both daytime and nighttime domains.
Ni et al. [27] proposed NaGAN for off-nadir object detection of multi-view remote sensing imagery. NaGAN consisted of a generator with feature generation and label alignment modules to generate nadir-like representations from off-nadir objects, and a discriminator with adversarial and detecting heads. Based on the SpaceNet satellite dataset, NaGAN consistently outperformed Faster R-CNN, Cascade R-CNN, CornerNet, FoveaBox, RetinaNet, HTC, Libra R-CNN, NAS-FPN, and CentripetalNet for all sensor viewing angles.
Bai et al. [28] proposed SOD-MTGAN to be incorporated with any object detector to improve its small object detection. The baseline object detector was used to first crop out image ROIs, which were then fed into the SOD-MTGAN generator to construct corresponding high-resolution samples, and the SOD-MTGAN discriminator finally classified object categories and regressed bounding boxes. Their experiments showed that SOD-MTGAN was able to improve the performances of both Faster R-CNN and FPN based on the MS COCO minival dataset.
Chen et al. [29] trained DRBox with a small set of human-labeled airplane images and pseudo-labeled the remaining large dataset. They further trained DCGAN to classify the human-labeled, pseudo-labeled, and generated images to filter out false detections by DRBox and prevent model overfitting.
Jiang and Ying [30] added a GAN before DSSD as a foreground-background separation translation model and performed data augmentation, including color channel change, noise addition, and contrast enhancement, only on image foregrounds. The proposed model marginally outperformed DSSD based on the PASCAL VOC2007 and PASCAL VOC2012 datasets.
Zhai et al. [31] proposed GAN-FRCNN to address the low-resolution and undersampling problems in CSGI object detection. GAN-FRCNN utilized the TVAL3 algorithm to reconstruct images at different resolutions and sampling rates from real images. A Faster R-CNN pretrained on real images was used to obtain the object classification loss and bounding box regression loss of the reconstructed images, and the high-loss images were selected as the training dataset. Based on the MS COCO 2017 dataset, GAN-FRCNN achieved substantial performance improvements for many object classes.
2.2. Retraining Object Detector with Synthetic Data from GAN
Bosquet et al. [32] proposed DS-GAN to increase the number of small objects in video datasets. DS-GAN had an encoder-decoder generator and a residual block discriminator, and was able to create downsampled low-resolution small objects from high-resolution objects. The authors incorporated DS-GAN into a small object data augmentation pipeline by using Mask R-CNN to extract small foreground target objects, then inpainting and blending the objects into images at plausible locations with correct orientations and scales. The synthetic data generated by the pipeline improved STDnet, FPN, and CenterNet for the UAVDT car dataset.
Posilovic et al. [33] proposed DetectionGAN, based on Pix2pixHD, consisting of a U-net generator, two PatchGAN discriminators, and a pretrained object detector. DetectionGAN was used as a conditional GAN to translate binary masks of steel block defects into realistic ultrasonic images, and the synthetic data were further utilized to retrain YOLOv3 along with real data. Model performance improvements were successfully achieved in the study.
Lee et al. [34] proposed RDAGAN for data augmentation purposes, which comprised an object generation network based on InfoGAN that generated target objects, and an image translation network, with an encoder-decoder generator and global and local discriminators, that inserted the generated object batches within bounding box masks of clean images and translated the overall images into the target domain. Based on the FiSmo fire and Google Landmarks v2 datasets, YOLOv5 trained with both real and augmented images showed a performance improvement.
Dai et al. [35] proposed CPGAN for thermal infrared data augmentation based on RGB image translation. CPGAN was composed of a cascade pyramid generator and a multi-scale discriminator. The cascade pyramid generator consisted of three branches with similar network structures, including low-, medium-, and high-resolution generators, for high-resolution image generation. The multi-scale discriminator consisted of three discriminators with the same structure but executed on images of low, medium, and high resolutions. The study demonstrated that synthetic thermal infrared images, when added to training datasets, were able to help improve the performances of Faster R-CNN, R-FCN, YOLOv2, YOLOv3, YOLOv4, and SSD.
Liu et al. [36] proposed DetectorGAN based on CycleGAN, which incorporated a ResNet generator, two global and one local PatchGAN discriminators, and a detector that took both real and synthetic labeled images as input and outputted bounding boxes. The study suggested that RetinaNet had performance improvements when trained with both real and synthetic images based on the NIH Chest X-ray nodule dataset and the Cityscapes pedestrian dataset.
Zhu et al. [37] proposed MCGAN to augment data for object detection in optical remote sensing images. The MCGAN architecture contained a DCGAN generator, three discriminators, and a classifier. It was trained on cropped target objects rather than whole images, and the synthetic objects were Poisson mosaicked into real images to increase data diversity. A pretrained Faster R-CNN was utilized to filter out misidentified or unidentified objects to maintain the data distribution. Based on the NWPU VHR-10 and DOTA geospatial datasets with seven classes, Faster R-CNN trained on datasets with different levels of added synthetic objects was able to perform better for most classes.
Kim et al. [38] trained a CNN-based GAN using baggage X-ray images from the GDXray dataset, and retrained Faster R-CNN with additional GAN-generated synthetic data, which achieved superior performances in detecting handguns, shuriken, and razors.
Lin et al. [39] proposed SYN-MTGAN to synthesize traffic sign images, consisting of an encoder-decoder generator inspired by CycleGAN, and a multi-task discriminator that distinguished real from synthetic images and predicted target object classes. Adversarial, cycle consistency, identity, and classification losses were combined into a weighted sum as the overall loss function. Based on a customized traffic sign dataset, Faster R-CNN trained on the synthetic dataset reached considerably higher accuracies for certain traffic sign categories than the one trained on real scene images.
Maeda et al. [40] trained PG-GAN using cropped pothole images and Poisson blended the synthetic images into undamaged road images. Based on their Road Damage Dataset 2019, the retrained SSD MobileNet performance first improved but then degraded as the size of the synthetic data in the training datasets increased to match that of the real data.
2.3. Manipulating Real Data with GAN for Object Detector
Courtrai et al. [41] developed SR-CWGAN-Yolo as an image super-resolution network by incorporating SR-GAN, CycleGAN, and YOLOv3. They observed that Faster R-CNN, EfficientDet, RetinaNet-50, and YOLOv3 pretrained on the ISPRS Potsdam car dataset achieved much higher object detection accuracies on the upsampled dataset generated by SR-CWGAN-Yolo than on the ones generated using bicubic interpolation and EDSR methods.
Nath and Behzadan [42] proposed a deep CNN-based GAN for image super-resolution and missing pixel information generation in low-resolution images. Based on two in-house datasets, namely the Pictor-v2 dataset with common construction objects including buildings, equipment, and workers, and the Pictor-v3 dataset with workers, hats, and vests, YOLOv3 trained on high-resolution images consistently performed better on super-resolved images than on their corresponding low-resolution images at various resolution levels.
Li et al. [43] tackled the issue of deep learning model generalizability when a model is trained on source datasets but needs to be applied to target datasets. They applied CycleGAN and AgGAN as image translation models to transform target domain images into source-domain-stylized images under three scenarios: datasets captured at two time points, over two locations, and by two sensors, respectively. Faster R-CNN trained on source images was subsequently applied to the transformed target images. The study results showed that the methodology failed to boost Faster R-CNN performance.
5. Conclusions
Based on YOLO11n, FastGAN, and AriAplBud, the following conclusions were drawn. The study demonstrated the feasibility of utilizing GANs to selectively improve agricultural object detector class performance and mitigate the class imbalance issue in object detection, even when the GAN was lightweight, developed on a very small dataset, and unable to converge during training. From the synthetic image quality perspective, training divergence did not necessarily indicate complete training failure, such as mode collapse for GAN, especially when periodic model validation was appropriately executed. Despite the small size of the agricultural instances, which were randomly distributed in images, the GAN was able to capture their characteristics and successfully generate synthetic images with object-detector-detectable instances. The positive sample rate of synthetic data (formalized below) generally correlated with object detector performance; that is, higher detector performance for a class generally implies a higher synthetic positive sample rate for that class at non-extreme confidence thresholds. The average number of object-detector-detectable instances per image in synthetic data, however, tended to be lower than that in real data, especially at high confidence thresholds. Synthetic positive samples from the GAN, when employed for object detector retraining, were able to help improve object detector performance on target classes considerably. However, the optimal synthetic instance quantity for model retraining remained unclear, and a negative influence of the synthetic samples on non-target classes was also observed. Further studies are needed to investigate how the quantity and quality of synthetic instances impact object detector performance improvement or degradation in target and non-target classes.
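For clarity, the positive sample rate referenced above can be formalized as follows; the notation is introduced here for illustration, with a synthetic image counting as a positive sample at confidence threshold $\tau$ if the detector finds at least one target class instance in it:

$$
\mathrm{PSR}(\tau) = \frac{\left|\{\, i \in \{1, \dots, N\} : n_i(\tau) \ge 1 \,\}\right|}{N},
$$

where $N$ is the total number of synthetic images generated for a class and $n_i(\tau)$ is the number of detector-detectable instances in synthetic image $i$ at threshold $\tau$. The related per-image statistic discussed above is the mean $\frac{1}{N}\sum_{i=1}^{N} n_i(\tau)$.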