1 Introduction

The ability to individually track animals provides an opportunity to identify inter-individual differences and personality patterns and to compare cognitive performance on various tasks (Christakis et al. 2012; Toledo et al. 2020). Some animals are relatively easy to track, especially those that are large and move slowly in two dimensions; others, such as insects, which can be small and fly fast in three dimensions, are more challenging. For example, a honey bee, with an average length of 14 mm and a flight speed exceeding 25 km/h, is difficult to track in space (Barron and Srinivasan 2006).

Thanks to technological advancements, movement ecology now relies on advanced tracking and spatial analysis tools (Brum-Bastos et al. 2022). Beneficial insects, such as bees and wasps, are vital for agricultural services like pollination and biological control, and understanding their movement patterns is crucial for conservation biology (Chapman et al. 2023). When direct observation is impractical, theoretical frameworks can estimate flight distances and predict movement between flower patches (Cresswell et al. 2000; Brunet et al. 2023). However, a gap remains between theoretical predictions and actual observations in pollination studies (Kendall et al. 2022).

The main aim of this paper is to establish a machine learning (ML) based system to recognize and track insects to enable the investigation of various aspects of their spatial behavior. Yet, building ML systems requires existing data. Hence, the systems described in this paper were built to assist in our research regarding the influence of nutrition on bees’ spatial cognition. Additionally, given the challenge of tracking small insects, there is a need to develop dedicated models for various environments. Consequently, an integral aspect of our aim is to provide detailed guidance on building the database and training the models. This will enable other researchers to replicate and customize the system for their own experiments.

In our study, we used worker honey bees as a model organism, tracking them individually to assess their behavior and the impact of stressors on their performance. This study presents the development of an AI system using machine-learning models to identify and track bees. We divide the study into two main steps: first, developing a machine-learning system (the planar bee tracking system) to identify bees in a closed 8-arm radial maze for studying behavior while walking; and second, creating a machine-learning system (the spatial bee tracking system) to track flying bees for advanced research on spatial behavior and flight patterns in the real world.

1.1 Background

Studying the spatial behavior of bees and the effect of various stressors on performance in a spatial task is a multifaceted activity. Bees are exposed to multiple biotic and abiotic stressors (Goulson et al. 2015; Samuelson et al. 2016; Gómez-Moracho et al. 2017; Belsky and Joshi 2019). We wanted to develop a system that would enable these studies to be extended and allow researchers to rigorously quantify the effect of stressors on bees' spatial behavior.

Bees’ spatial behavior can be studied using radar to examine their flight path in open areas (Menzel et al. 2005). Alternatively, bees can be trained to recognize landmarks and tested on changed terrain to observe their use of landmarks for navigation (Ushitani et al. 2016). For studying spatial behavior in bees, we designed an 8-arm radial maze similar to those used with rodents (Kurzina et al. 2020). While basic maze performance can be observed in real-time, video recording allows for more accurate and detailed analysis. Developing a machine-learning system enables high-resolution tracking of bee movement, providing precise data on speed, acceleration, and movement patterns, enhancing analysis efficiency.

A machine-learning model was trained to recognize and track the bee’s position in the maze. It provides insights into the bee’s location, maze completion, time taken, speed, and total walking distance. Another model was developed to analyze flying patterns, locations, and flight duration of bees in real-world scenarios. Videos of bees flying in a controlled environment were used to train and test this model.

Computer vision and AI have been utilized in numerous studies exploring different aspects of bee biology. For instance, one study (Bjerge et al. 2019) employed computer vision and AI to detect mite infestations in beehives, while another study (Bozek et al. 2021) investigated crowd behavior in bee colonies. Additional research (Ramírez et al. 2012; Elizondo et al. 2013) used computer vision to monitor honeycomb cells in hives, or to identify honey bee foragers that returned to the hive carrying pollen loads on their hind legs (Rodriguez et al. 2018). A comprehensive review (Odemer 2022) critically examines the historical progression of automated techniques for measuring bee flight and counting bees, with a focus on improving validation methods.

The results of these studies indicate that AI approaches provide high measurement accuracy, which may be better than manual or traditional automated approaches. Recently, Bjerge et al. (2022) provided an automated AI-based real-time insect monitoring system that identifies individual insects in the field, providing information about phenology and foraging behavior. Similarly, Zhang et al. (2024) developed machine-learning techniques to discriminate between three species of bees that pollinate alfalfa. Methods are also being developed for detecting small moving objects, such as bees, in videos recorded by unmanned aerial vehicles (Stojnić et al. 2021).

Our study introduces an automated tool for identifying and tracking individual bees. We provide a detailed description of the system, covering data preparation, training, classification, and tracking. Unlike previous research on swarm tracking or individual identification, our focus is on precisely measuring movement patterns of individuals. This detailed measurement enables analysis of walking and flying patterns, contributing to a better understanding of spatial behavior.

1.2 Machine learning background

Machine learning is a subset of AI that uses algorithms to learn patterns from data for classification, prediction, and decision-making. Supervised learning is a technique in which a model learns from labeled examples: during training, the model's parameters are adjusted until it produces accurate solutions, and its accuracy is then evaluated on a separate set of labeled examples.

Artificial neural networks (ANNs) are supervised learning algorithms inspired by brain processes (Bhattacharya et al. 2021). ANNs consist of interconnected units arranged in input, hidden, and output layers; each neuron performs a mathematical operation on the weighted contributions of its inputs. Convolutional neural networks (CNNs) are a type of ANN used for image processing. They use convolutions instead of general matrix multiplications, reducing the image representation while extracting features. You Only Look Once (YOLO) is a fast and accurate CNN algorithm for object detection (Fang et al. 2019). It divides the image into a grid of cells and predicts, for each cell, bounding boxes and how well each box fits an object (Fig. 1).

Fig. 1 You only look once (YOLO) algorithm: dividing the image into small parts in order to identify objects (Bandyopadhyay 2022)

Fig. 2 The network architecture of you only look once (YOLO) (Redmon et al. 2016) and the evolution of its versions (Bandyopadhyay 2022)

YOLO’s architecture has a total of 24 convolutional layers with two fully connected layers at the end, as illustrated in Fig. 2a. Over the years, the algorithm has been improved and developed (see timeline in Fig. 2b), and today there is a fifth version (YOLO-v5), which we used in this study. We chose to use this version because of its high speed, which is vital for developing a real-time tracking system.

When recognizing bees using the YOLO algorithm, we must also consider the possibility that several bees appear in the same frame, which would negatively impact the tracking process. Tracking means that from the moment we recognize a particular object, we continue to recognize it in the following frames as the same object we recognized first, even if more objects are recognized in those frames. Hence, we need to differentiate between the various objects by giving each object a unique number.

Simple online and realtime tracking (SORT) (Bewley et al. 2016) is a tracking algorithm written as an open-source project. This algorithm uses detection data from previous and current frames to make associations. For this purpose, it uses the Kalman filter (Bishop and Welch 2001). The SORT algorithm is one of the fastest open-source tracking algorithms for multiple objects.

Tracking algorithms use previous movement information (speed, direction) to predict future movement. When new samples are received, we associate them with previous ones based on our prediction. The Kalman filter balances the weight between the prediction and the noisy, inaccurate samples to obtain an accurate path in tracking.
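To make this concrete, the following minimal sketch (our illustration, not code from the SORT library) shows a one-dimensional constant-velocity Kalman filter balancing its prediction against noisy position samples; SORT applies the same predict/update cycle to the bounding-box state of each tracked object. All parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal 1-D constant-velocity Kalman filter (illustrative sketch).
# State x = [position, velocity]; we only measure the (noisy) position.
dt = 1.0                                 # time step between frames
F = np.array([[1, dt], [0, 1]])          # state transition (constant velocity)
H = np.array([[1, 0]])                   # measurement model: position only
Q = np.eye(2) * 1e-3                     # process noise covariance
R = np.array([[0.5]])                    # measurement noise covariance

x = np.array([[0.0], [0.0]])             # initial state estimate
P = np.eye(2)                            # initial estimate covariance

def kalman_step(x, P, z):
    # Predict: propagate the state and its uncertainty one frame forward.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: blend the prediction with the noisy measurement z.
    y = z - H @ x_pred                   # innovation (measurement residual)
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain: prediction-vs-sample weight
    return x_pred + K @ y, (np.eye(2) - K @ H) @ P_pred

for z in [1.1, 2.0, 2.9, 4.2, 5.0]:      # noisy position samples
    x, P = kalman_step(x, P, np.array([[z]]))
    print(f"position {x[0, 0]:.2f}, velocity {x[1, 0]:.2f}")
```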

2 Planar bee tracking system

This system tracks a single bee in an 8-arm radial maze. It recognizes and records the bee’s location and movement within the maze. In the upcoming sections, we provide an overview of the biological experiment, its technical aspects, and the development of the machine-learning system.

2.1 Biological and technical description of the input data

Our experiment tested honey bee foragers (Apis mellifera). The tested bees were kept in a temperature-controlled room at 25 °C. The room was naturally illuminated by one window, which provided the colony with circadian cues about the time of day. It was further illuminated by 36 fluorescent light bulbs, which turned on about half an hour after sunrise and turned off about half an hour before sunset throughout the entire experimental period. This allowed the bees to naturally return to their hive towards sunset. Every third bulb was connected to a different phase of a three-phase electricity system, which reduces the flicker frequency to which bees are sensitive. The bees were divided into two groups of 30 bees each: the control group received a balanced diet containing a 1:1 ratio of omega-6 and omega-3 essential fatty acids, and the other group received an unbalanced diet containing five times more omega-6 than omega-3, which is known to impair associative learning in bees (Arien et al. 2018).

After the feeding period, the behavior of the bees in the bee maze was tested. The maze was built in the form of 8 arms, as illustrated in Fig. 3. Honey bees have been tested in a 6-arm maze before (Brown et al. 1997), and 8-arm mazes are typically used with rodents (Olton and Samuelson 1976), thus facilitating comparisons. A feeder at the end of each arm contained 3 micro-liters of 20% w/w sucrose solution, since the quality of the reward can affect the motivation of foraging bees. Honey bees collect nectar in their crop mainly to carry back to the colony. A bee can load upwards of 50 micro-liters, so the bee in our experiment remained motivated to feed throughout the test. The bee was expected to collect food from all the arms in less than 5 minutes. The bee's performance was assessed by several measures: the speed of completing the maze, the number of times the bee entered each arm, and fine details of its movement patterns.

Fig. 3 Photo (left) and diagram (right) of the 8-arm radial maze used in this study. At the end of each arm is a feeder that could be filled with a small drop of sugar solution. The maze was covered with a removable transparent plexiglass ceiling, which had a small hole at the center, allowing the bee to fly into the center of the maze (marked by a dark circle). This hole was then covered by an additional small transparent flap, so that no additional bees could enter the maze. When the bee finished visiting all the arms of the maze, the plexiglass ceiling was lifted, and the bee flew away towards the hive. The diagram shows measurements in centimeters

The experiment was filmed using a camera fixed at a constant height and angle above the maze (see Fig. 4). Recording was started manually when the bee entered the maze and stopped when it exited.

Fig. 4 A camera is fixed to a tripod placed above the 8-arm radial maze. The maze is fixed to a Styrofoam mold, and two of the three tripod legs are fixed to the mold as well. We used a Panasonic camera, model HC-VX870, with a resolution of 1080p and an image capture rate of 50 fps. The distance between the camera and the maze is 70 cm, and the field of view (FOV) is 63.57°

We worked with individually marked bees, and each bee made repeated visits to the maze. Bees were initially trained to approach the maze entrance. When an individual bee had learned to approach the maze entrance, it was allowed to enter the maze and the video recording began. The entrance to the maze was then blocked by a transparent cover to prevent more bees from entering. The arms visited by the bee were noted by the experimenter, and when the bee had visited all eight arms, the ceiling was lifted, and the bee flew back to the hive.

2.2 Constructing the model

To train a machine-learning model, we need to use preexisting labeled examples, which enable the model to identify repetitive attributes and use them to recognize the desired object. Therefore, to train a model that can recognize a bee, we need to provide it with many bee images. The training data must contain varied images with the bee in different locations and postures, so that the model learns to recognize the bee itself rather than some constant background feature.

Fig. 5 Preparing the training dataset for the planar bee tracking system model

2.2.1 Creating the training set

Many tools require manually marking the object's location in each image to create the labeled training dataset, a highly time-consuming process. Hence, we created an automatic image-processing system to mark the bee's location. This system enables the quick creation of a set of marked images through several preparation stages, as follows (see Fig. 5).

(a1) Create reference image. To prepare the reference background image of the maze, we took two different frames from a video of a bee in the maze, such that in the first frame the bee is on the lower side of the maze and in the second frame it is on the upper side. We cut the two frames in the middle and took the parts without the bee; joining the two parts created the reference image.

(a2) Convert reference image to grayscale. Given a colorful reference image, we eliminate the unnecessary color information by converting the image to grayscale.

(b1) Convert the frame to grayscale. As in the previous stage, we eliminate unnecessary color information, this time from the current frame.

(b2) Subtract frame from reference image. For each frame, we subtract the current frame from the reference image. As a result, most pixels are reset to zero (black); where the difference is negative, the unsigned 8-bit arithmetic wraps around to values near the maximum of 255 (white). Hence, we are left with the bee in grayscale (around the value 90) and many weak noise points near the values 255 and 0.

(b3) Clean noise. We clean the image by resetting the noise (values in the ranges 0–30 and 225–255) to black, resulting in a smooth image with a grayscale bee.

(b4) Convert to black and white. Every pixel that is not black is set to white. That way, we get a two-tone image: a black background and a white bee in the middle. Note that additional white noise may remain in the image.

(b5) Morphological opening. We run an opening operation on the current frame in two phases:

  1. Erosion. To remove small noise from the frame, we blacken each pixel that is not fully surrounded by white pixels in the original image. This effectively removes unwanted disturbances.

  2. Dilation. We set each pixel to white if at least one of its neighbors is white in the original image. This restores the bee to approximately its initial size.

After these operations, we obtain a noise-free frame, with the bee in a slightly coarser form, without legs or other delicate parts.

(b6) Find bee location. From the resulting image, we determine the upper-left and bottom-right coordinates of the white pixels to generate a bounding box. We then verify that this bounding box matches the size of a bee, thus filtering out frames with multiple bees or human hands (for example, the hands of the experimenter). If the bounding box matches a bee's size, we have successfully automated the detection of the bee in the frame.

(b7) Save image and text files. We save the original image and generate a new text file containing the bee's position in the image and the width and height of the bounding box, normalized to the size of the image. These two output files constitute the training set, which is the input for the YOLO-v5 machine-learning model.

(b8) Human validation. We duplicated each image for accuracy validation and added a red square indicating the calculated bee location. By converting these images into a video and reviewing them manually, we ensured that the red square consistently appeared in the correct position, i.e., that the bee was identified accurately. This confirms the validity of the automatic detection process.

To create a training set, we generated 4000 images using the aforementioned process.
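For concreteness, the following OpenCV sketch outlines stages (b1)–(b7) above under simplifying assumptions: the file names, bee-size bounds, and kernel size are hypothetical, and the bee-size check is reduced to a single width/height test.

```python
import os

import cv2
import numpy as np

# Sketch of the automatic labeling pipeline (stages b1-b7), assuming
# 'reference.png' is the grayscale background image from stages a1-a2
# and 'maze_video.mp4' is a recording of a single bee in the maze.
os.makedirs("dataset/images", exist_ok=True)
os.makedirs("dataset/labels", exist_ok=True)

reference = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
cap = cv2.VideoCapture("maze_video.mp4")
kernel = np.ones((3, 3), np.uint8)
saved = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # b1: grayscale
    diff = reference - gray                            # b2: uint8 subtraction; negatives wrap to ~255
    diff[(diff <= 30) | (diff >= 225)] = 0             # b3: clean noise near 0 and 255
    bw = np.where(diff > 0, 255, 0).astype(np.uint8)   # b4: black background, white bee
    bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel)  # b5: erosion then dilation
    ys, xs = np.nonzero(bw)                            # b6: bounding box of white pixels
    if xs.size == 0:
        continue
    x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()
    w, h = x1 - x0, y1 - y0
    if not (10 < w < 80 and 10 < h < 80):              # crude bee-size filter (assumed bounds)
        continue                                       # skips multiple bees, hands, etc.
    H, W = gray.shape
    # b7: YOLO label format: class x_center y_center width height, normalized
    label = f"0 {(x0 + w / 2) / W:.6f} {(y0 + h / 2) / H:.6f} {w / W:.6f} {h / H:.6f}\n"
    cv2.imwrite(f"dataset/images/frame_{saved:05d}.jpg", frame)
    with open(f"dataset/labels/frame_{saved:05d}.txt", "w") as f:
        f.write(label)
    saved += 1

cap.release()
```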

2.2.2 Training the model

To train the model using the YOLO algorithm, we divided the compiled database into training (3200 images), testing (400 images), and validation (400 images) datasets.

We then trained the model for 300 epochs (an epoch is one iteration over the complete training set) with a batch size of 32 images. The batch size and number of epochs were set according to the recommended values for the memory available on our hardware (16 GB of RAM).
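As an illustration, the split and a typical training invocation might look as follows, assuming the directory layout from the labeling step above and the standard scripts of the YOLO-v5 repository; all file and path names are hypothetical.

```python
import random
import shutil
from pathlib import Path

# Sketch: split the 4000 auto-labeled images into train/test/validation
# sets (3200/400/400), matching the layout YOLO-v5 expects.
images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(42)
random.shuffle(images)
splits = {"train": images[:3200], "test": images[3200:3600], "val": images[3600:]}

for split, files in splits.items():
    for sub in ("images", "labels"):
        Path(f"bee_maze/{sub}/{split}").mkdir(parents=True, exist_ok=True)
    for img in files:
        label = Path("dataset/labels") / (img.stem + ".txt")
        shutil.copy(img, f"bee_maze/images/{split}/{img.name}")
        shutil.copy(label, f"bee_maze/labels/{split}/{label.name}")

# Training is then launched with the YOLO-v5 repository's train.py script,
# pointing a data YAML (with a single 'bee' class) at bee_maze/, e.g.:
#   python train.py --img 640 --batch 32 --epochs 300 \
#       --data bee_maze.yaml --weights yolov5s.pt
```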

The training process was completed after 4.5 hours, resulting in a fully trained model. Figure 6 shows the loss function and metrics as a function of the number of epochs completed during the training process.

Fig. 6 Loss functions as a function of the number of epochs (x-axis) in the training process

The YOLO loss function has three different measurements:

  • Box (box_loss) Loss due to a box prediction not exactly covering an object (bounding box regression loss).

  • Objectness (obj_loss) Loss due to a wrong prediction of the box-object IoU (intersection over union).

  • Classification (cls_loss) Loss due to deviations from predicting ‘1’ for the correct classes and ‘0’ for all the other classes for the object in that box.

The YOLO accuracy functions use mean average precision (mAP), which compares the ground-truth bounding box to the detected box and returns a score: the higher the score, the more accurate the model is in its detection.

  • mAP 0.5 With the IoU threshold set to 0.5, the AP (average precision) is computed for each category over all images and then averaged across categories to give the mAP.

  • mAP 0.5–0.95 The average mAP over different IoU thresholds (from 0.5 to 0.95 in steps of 0.05).

The six left plots in Fig. 6 display the loss functions. The upper three plots show training set loss, while the lower three plots show validation set loss.

All loss plots decreased with increasing epochs, indicating improved model performance. No overfitting was observed, as improvement was seen in both training and validation sets. After 300 epochs, further training yielded negligible improvement. Continuing beyond this point was unnecessary. Additionally, the four right-hand plots in Fig. 6 demonstrate significant initial improvement in measured accuracy.

Fig. 7 Flow chart for the bee recognition process

2.3 Recognizing a bee in the maze

The process for recognizing a bee in a maze from a video consists of several steps (see Fig. 7):

(a) Detect the maze's center. In the preprocessing step, we determine the location of the maze's center as a reference point. This step is crucial for accurately calculating the bee's position and distance in the video. Instead of calculating the center's location in each frame, which is inefficient and prone to noise from human hands or the bee, we detect the center by analyzing information from 100 frames. We select the point with the highest likelihood of being the center, grading each candidate by its frequency and the confidence value obtained from the model.

(b1) Detect a bee. The recognition machine-learning model processes video frames and provides the bee's position as (minX, maxX, minY, maxY), the pixel coordinates of the bee's bounding box.

(b2) Filter unreasonable results. To validate the bee's position in the current frame, we verify that it falls within the known boundaries of the maze. Additionally, we compare the current and previous locations of the bee to ensure consistent movement and to filter out bees that are merely flying above the maze.

(b3) Calculate the bee's position in centimeters. Given the machine-learning output, the bounding box (minX, maxX, minY, maxY), we calculate the center of the bee in centimeters as follows (see also the code sketch after this list):

$$\begin{aligned} x_{bee}[\mathrm{cm}] &= \left( minX_{bee}+\frac{maxX_{bee}-minX_{bee}}{2}+X_{MazeCenter}\right) \cdot pixelToCm \\ y_{bee}[\mathrm{cm}] &= \left( minY_{bee}+\frac{maxY_{bee}-minY_{bee}}{2}+Y_{MazeCenter}\right) \cdot pixelToCm \end{aligned}$$

where \(X_{MazeCenter}\) and \(Y_{MazeCenter}\) are the maze-center coordinates in pixels (see phase (a)), and pixelToCm is a constant whose value, 0.039489 cm, was measured from a video; it expresses the pixel size in centimeters.

(b4) Locate the arm and the distance. To track the bee's progress in the 8-arm maze, we determine the specific arm (numbered 1 to 8) the bee has visited and measure the distance it has traveled. We accomplish this by converting the bee's Cartesian coordinates to polar coordinates and checking which arm's angular range contains the bee's angle, as illustrated in Fig. 8.

(b5) Calculate the total path length and the average speed. The spatial cognition abilities of a bee can be evaluated by the length of its path and the speed at which it completes the maze visit: a bee with strong spatial cognition will exhibit shorter paths and faster completion times, which we aim to calculate.

We determine the total length traveled by the bee in the maze by comparing consecutive frames and measuring the distance between the bee's locations. This allows us to assess the bee's efficiency in navigating the maze.

To ensure accuracy and to exclude small movements while the bee is feeding (from the feeder at the end of each maze arm), we set a minimum distance threshold between frames. Only distances exceeding this threshold are added to the total length. The threshold was chosen experimentally and fine-tuned by observing the videos, so that genuine movement is measured while feeding behavior is excluded. The threshold also depends on the frame rate, which is 50 fps in our videos. Based on these considerations, we set the minimum movement distance to a shift of three pixels between frames. Figure 9 shows an example of a bee's processed path, plotted with Wolfram Mathematica.

To calculate the average walking speed, we divide the total path length by the total time spent walking (excluding feeding time).

(b6) Save results to new video and CSV files. The bee-recognition analysis produces two output files:

  1. A video file in which the bee and maze are annotated and an information panel is displayed on the left side (see Fig. 10 for visualization). The video makes it possible to review the detections 'in real time' and to understand the software's calculations and decisions. (For example, if the maze's center was detected in the wrong location, the video helps explain the strange results obtained when decoding that recording.)

  2. A CSV file containing all the information collected about the bee in each frame (see the example in Table 1). The CSV file enables analysis of the extracted data with additional external tools.
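The following minimal sketch illustrates the geometric core of steps (b3)–(b5). It assumes coordinates expressed relative to the maze center (as required for the polar-angle arm test); the sector-to-arm numbering and the example track are hypothetical.

```python
import math

PIXEL_TO_CM = 0.039489   # pixel size in centimeters, as measured from a video
MIN_SHIFT_PX = 3         # minimum per-frame shift counted as real movement

def bee_center(min_x, max_x, min_y, max_y, maze_cx, maze_cy):
    # b3: center of the bounding box, expressed relative to the maze
    # center (needed for the polar-angle arm test below), in pixels.
    return (min_x + (max_x - min_x) / 2 - maze_cx,
            min_y + (max_y - min_y) / 2 - maze_cy)

def arm_number(x, y):
    # b4: convert to polar coordinates and map the angle to one of the
    # eight 45-degree sectors. The sector-to-arm numbering here is
    # illustrative; the actual numbering follows Fig. 8.
    angle = math.degrees(math.atan2(y, x)) % 360
    return int(angle // 45) + 1

def path_length_cm(centers):
    # b5: accumulate frame-to-frame distances, skipping shifts below the
    # threshold (e.g., small movements while the bee is feeding).
    total = 0.0
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        if d > MIN_SHIFT_PX:
            total += d * PIXEL_TO_CM
    return total

# Example: a short, hypothetical track (maze-centered pixel coordinates)
track = [(120.0, -3.0), (124.0, -1.0), (125.0, -1.0), (131.0, 2.0)]
print(arm_number(*track[0]), f"{path_length_cm(track):.2f} cm")
```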

Fig. 8 Numbering and locations of the arms

Fig. 9 An example of the paths taken by a bee. Green dots show where the bee walked around the maze. The blue straight line indicates the straight path the bee could have walked to reach the feeder; the red dots indicate the actual path of the first entry. The numbers on the axes represent centimeters

Fig. 10 Video output file example. An example of a frame in the output video with the information block and markings (the bee in red, the maze center in magenta, and the black maze flags in green). The information panel provides comprehensive details about the bee's location, including Cartesian and polar coordinates, total distance traveled, average speed, and its position within the maze (center or a specific arm) at any given time

Table 1 CSV output file example

2.3.1 Challenges in using the model

As part of analyzing the results, when we identified anomalies, we validated the bee-tracking process by manually watching the processed videos (see Fig. 10) in which the abnormal behavior appeared. This manual procedure revealed two primary issues: incorrect detection of the maze's center and the detection of flying bees outside the maze.

In some videos, the maze’s center was inaccurately identified, typically due to obstructions (e.g., the researcher’s hand) in the initial frames. Consequently, an erroneous center was selected, leading to incorrect calculations of the bee’s location. To resolve this issue, we extended the maze’s center detection stage, as outlined in step (a) (Fig. 7) above.

Another challenge arose when another bee flying above the maze was mistakenly identified as the tracked bee. To mitigate this problem, we implemented a filtering mechanism that eliminates information about other bees in the videos, as described in step (b2).

3 Spatial bee tracking system

The second system for studying bees' spatial behavior focuses on tracking flying bees. Tracking flying bees allows us to change the experimental environment, take the bee out of the controlled room, and place it in the open field. Whereas the first system is designed for specific experiments following a single bee in a planar maze, the system described in this section is intended to be more general and to support various experiments, including following several bees in less planar environments. This makes it possible to use the system in many scenarios in which the behavior of individual bees needs to be tracked, either in more sophisticated lab-oriented experiments studying spatial learning and orientation of flying bees, or outdoors.

For example, bee traffic in and out of the hive is one of the measures used to assess colony health and strength. It is also a pertinent metric for assessing the pollination potential of the hive, for example, when colonies are rented to provide pollination services. In such cases, a camera would be placed in front of a hive to monitor hive activity. To add recognition of individual bees over a longer time frame, the system could be combined with one that identifies individually marked bees at the hive entrance; it is feasible to individually mark up to several hundred bees with colored numbered tags, RFID tags, or barcodes (e.g., Crall et al. 2015; Alburaki et al. 2021; Warren et al. 2024). It would also be possible to place a camera in the field, in front of a flowering tree or patch of flowers, to assess the floral visitation behavior of the bees. The system could further be extended to observe other bee species and focus, for example, on nests of ground-nesting bees, to quantify nest building, procurement of nectar and pollen to the nest, and nest orientation behaviors.

Many of these kinds of observations are often performed by human observers, who can only sample a small fraction of the total time and with limited measures. Cameras can greatly enlarge the amount of data collected, but manual extraction of measurements from the videos is highly time-consuming and limits the quality of the data. An AI system can track bees for longer times, with fewer errors, and provide much more data, allowing more sophisticated measures and statistical analyses.

3.1 Training the model

The model was trained to detect flying bees inside a room with static cameras. Tracking flying bees involves three-dimensional data, requiring more than one camera. However, in our scenario, bees typically maintain a consistent height during flight. Therefore, we ignored this dimension to simplify data collection and analysis.

For training, we used a process similar to that described in Sect. 2 to automatically turn a video of flying bees into a dataset, without needing to manually identify the bees in each frame. We first checked whether the model from the previous part, which recognizes a bee in the maze, also manages to recognize flying bees; it turned out that it does not. Hence, we trained a new model on the new dataset of flying bees, using YOLO-v5, as described in Sect. 2. The training graphs appear in Fig. 11; the meaning of the different graphs is explained in Sect. 2.2.2. These results indicate that the training process improved the model's accuracy. A sketch of how the trained detector is applied to video frames follows Fig. 11.

Fig. 11 The training graphs of the flying bee recognition model
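As a sketch of how such a trained detector can be applied to video frames, assuming the standard torch.hub interface of the YOLO-v5 repository (the weights file, video name, and confidence threshold are hypothetical):

```python
import cv2
import torch

# Load the trained detector through the YOLO-v5 torch.hub interface;
# 'best.pt' is the hypothetical weights file produced by training.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.4                                 # confidence threshold (assumed value)

cap = cv2.VideoCapture("flying_bees.mp4")        # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # YOLO-v5 expects RGB input
    results = model(rgb)
    detections = results.xyxy[0].cpu().numpy()   # rows: x1, y1, x2, y2, conf, class
    for x1, y1, x2, y2, conf, cls in detections:
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
cap.release()
```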

3.2 Tracking

Tracking means that from the moment a specific bee is identified, we continue to identify it in subsequent frames as the same bee, even as other bees enter and exit the frame. We differentiate between bees by giving each one a unique label. Different tracking methods are available; in this study, we used the SORT algorithm (Bewley et al. 2016), which associates new detections with those already obtained from the machine-learning model. In each frame, the SORT algorithm receives an array of fresh detections from the machine-learning model. In the first frame, the algorithm assigns a unique label to each detection and returns the list of detections with their labels. In the following frames, it tries to associate the new detections with previous ones according to proximity, speed, and direction of movement. Each bee in every frame of the output video is marked with a distinct color, together with the label obtained from the tracking algorithm. After executing the algorithm, we can see in the video that the same bee keeps the same color and label across frames, even as it moves from place to place and when several bees appear at once (see example in Fig. 12).

Fig. 12 Tracking in different frames: maintaining the same unique label and color. The distance between the camera and the hive is 150 cm, and the field of view (FOV) is 63.57°
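Connecting the detector to the tracker is then a thin loop. The following sketch assumes the open-source sort.py module from the SORT repository of Bewley et al. is on the Python path; the detections shown are hypothetical.

```python
import numpy as np
from sort import Sort   # sort.py from the open-source SORT repository

# Tracker with its published default parameters (the iou_threshold
# argument exists in recent versions of sort.py).
tracker = Sort(max_age=1, min_hits=3, iou_threshold=0.3)

# In each frame, detections are rows of [x1, y1, x2, y2, score], e.g.,
# converted from the YOLO-v5 output shown in Sect. 3.1.
detections = np.array([
    [100.0, 120.0, 118.0, 140.0, 0.91],    # hypothetical bee no. 1
    [300.0,  80.0, 317.0, 101.0, 0.88],    # hypothetical bee no. 2
])
tracks = tracker.update(detections)         # rows: [x1, y1, x2, y2, track_id]
for x1, y1, x2, y2, track_id in tracks:
    print(f"bee {int(track_id)}: ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f})")
```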

3.2.1 Challenges in using the tracking model

The tracking algorithm (Bewley et al. 2016) is not directly compatible with our detections because it relies on overlaps between the identified areas of the same object in consecutive frames to track objects continuously. In our system, however, the objects are bees: the identified area of a bee in each frame is small, while the bees move quickly. The fast movement therefore creates a significant distance between a bee's locations in consecutive frames, so the overlaps the algorithm needs in order to function effectively do not occur.

Fig. 13 Artificial enlargement of the object detection squares to improve the results of the tracking algorithm

Several empirical experiments were conducted in search of a solution to this problem. In the end, we created a two-step fix for this situation.

  • The parameters of the tracker were changed so that even if an object is not detected for several frames, the tracker keeps the track alive in case a new overlap is later detected.

  • The area of the detected object (the square surrounding the bee) was artificially enlarged before being reported to the tracking algorithm and reduced back to its original size after the tracking algorithm returned its results. The size of the original detection is thus preserved, yet the tracking algorithm sees a larger square. This creates overlap between the identifications in consecutive frames, which allows the tracking algorithm to maintain accurate tracking across frames (see Fig. 13 and the sketch below).

This fix enables the tracking algorithm to maintain consistent tracking of bees throughout the video. Note that, despite the algorithm's reasonable performance and the bees' high flight speed, there are rare instances where the algorithm loses a bee and assigns a new label to the same bee.
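A minimal sketch of this two-step fix, assuming detections as rows of [x1, y1, x2, y2, score] and hypothetical parameter values:

```python
import numpy as np
from sort import Sort

PAD = 20   # pixels added to each side before tracking (hypothetical value)

# Step one of the fix: max_age raised so a track survives several frames
# without a matching detection; the exact values here are assumptions
# and were tuned empirically in practice.
tracker = Sort(max_age=10, min_hits=1, iou_threshold=0.1)

def track_with_padding(detections):
    """detections: numpy array with rows of [x1, y1, x2, y2, score]."""
    padded = detections.copy()
    padded[:, [0, 1]] -= PAD       # enlarge: move the top-left corner out...
    padded[:, [2, 3]] += PAD       # ...and the bottom-right corner out
    tracks = tracker.update(padded)
    tracks[:, [0, 1]] += PAD       # shrink the returned boxes back, so the
    tracks[:, [2, 3]] -= PAD       # reported size matches the detection size
    return tracks                  # rows of [x1, y1, x2, y2, track_id]
```

The padding size trades off the overlap needed for association against the risk of merging nearby bees, so it must be tuned to the footage.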

3.2.2 Limitations of the tracking model

The recognition model did not recognize all the bees in the frames. The reason is the small size of the bees and the distance of the camera from them. In addition, during fast flight the bee sometimes appears smeared by motion blur, so that even the human eye finds it difficult to identify it and to differentiate it from dirt or the various staples visible on the hive in the video.

Another limitation relates to the speed of the bees' flight. Consistent tracking requires very fast and powerful processing. This research used a relatively strong computer, given budget limitations (Intel i7-10750H, 16 GB RAM, GeForce RTX 2070 GPU). However, it was not fast enough for real-time processing of high-quality Full HD (1920×1080) videos at 50 fps. We therefore adjusted the algorithm to support the weaker hardware while still allowing real-time analysis: every second frame was dropped from processing, so we handled only 25 fps. This, in turn, increases the distance between the same bee's locations in consecutive frames, which amplifies the problem described and solved in Sect. 3.2.1.
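A sketch of the frame-dropping adjustment (processing every second frame of the 50 fps source, for an effective 25 fps; the video name is hypothetical):

```python
import cv2

cap = cv2.VideoCapture("flying_bees.mp4")   # 50 fps Full HD source (assumed)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % 2 == 0:
        continue        # drop every second frame: effective rate of 25 fps
    # ... detection and tracking on the remaining frames go here ...
cap.release()
```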

4 Discussion and conclusion

This article presents machine-learning-based systems for recognizing and tracking bees in both planar and spatial environments. These systems can be used in various behavioral studies of bees and other insects.

Besides developing a tool to study insect movement, the paper's main contribution is the description of the different phases and the overall architecture of the machine-learning-based systems. A dedicated tracking system was adapted and built for each configuration of a bee, walking or flying, because the bee's appearance differs between them. In that way, each system maximizes its performance, since it is tailored specifically to its assignment.

In the first system, the planar bee tracking system, the bees were recorded while walking in a two-dimensional maze. The maze videos are analyzed by creating training and testing databases, training the model, testing it, fine-tuning it, and eventually using the system to recognize the bee and analyze its path.

The second system, the spatial bee tracking system, was built to track bees in a spatial environment, such as when they are flying or walking in a closed three-dimensional environment. This system required building its own machine-learning model and combining it with tracking algorithms, along with a few modifications of ours to enable consistent tracking of the bees across the frames of the videos. The system uses flight videos and serves as an infrastructure for future experiments investigating bees' spatial behavior while flying. A comparison between the two systems appears in Table 2.

Table 2 Planar bee tracking system versus spatial bee tracking system

The two systems share common attributes and challenges. Both demonstrate the significant contribution that machine learning can make to insect research. In training the models, we used simple computer-vision algorithms to generate the needed datasets easily and quickly, based on distinctive characteristics that appeared in the videos. We showed that generating the datasets automatically rather than manually significantly reduces the resources required. The systems we developed allow cheap real-time tracking, which reduces the resources needed for experiments and allows accurate execution of experiments that need adjustments as they proceed. Furthermore, the entire process of recognizing and tracking the bee allowed for an accurate analysis of the duration, direction, and length of the bee's movement, providing insights into patterns, trends, and anomalies.

Manual tracking allows for the sampling of simple parameters, such as the times and durations of events or the number of errors the agent makes; more fine-grained characteristics that allow a better understanding of spatial behavior cannot be extracted by manual tracking. Additional advantages of automatic over manual tracking are the ability to monitor multiple bees simultaneously or over extended observation periods, integration with other technologies, such as sensors and environmental monitoring devices, and consistency in tracking and analyzing bee behavior. Automatic tracking avoids variations that may occur in manual tracking due to factors such as fatigue or subjective judgment.

Each system has its own challenges and limitations, which we resolved or acknowledged (see Sects. 2.3.1 and 3.2.1). This study gives a detailed description of the system construction process and the approaches used to address these setbacks, which can serve as a blueprint for researchers implementing similar systems or tackling similar challenges.

The systems described in this article were developed for experiments conducted to study the spatial cognition of bees, which were examined in laboratory conditions. To adapt the system to various applications in field conditions, one needs to handle the main challenge of identifying small flying bees in highly cluttered backgrounds (e.g., soil, rocks, plants). Although the system showed reasonable performance in a slightly noisy environment, more research is needed to improve the detection while also dealing with the high computational power required to handle fast cameras with high resolution. A further research extension could explore the flight patterns of bees, which would necessitate three-dimensional tracking. This process would involve the calibration of multiple cameras or integration of RFID tags into the experiment, along with adjustments to the models to accommodate these changes.

Recently, methods have been developed to employ unoccupied aerial vehicles (UAVs, a.k.a. drones) for remote sensing of vegetation in order to estimate biodiversity and bee diversity and abundance (Torresani et al. 2023, 2024), and even for directly detecting small moving objects, such as bees, from videos recorded by these UAVs (Stojnić et al. 2021). An exciting, challenging direction for future research would be to implement our system to track bees using UAVs.

Recognizing and tracking small, fast-moving objects are complex operations that machine-learning systems will have to handle in the coming years as part of the transition to a world of autonomous systems and a growing need to understand insect flight patterns and their implications. This work is meant as another milestone toward achieving this goal.