
Received 20 September 2023, accepted 16 November 2023, date of publication 23 November 2023, date of current version 29 November 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3336592

Enhancing Pedestrian Group Detection and Tracking Through Zone-Based Clustering

MINGZUOYANG CHEN, SHADI BANITAAN (Member, IEEE), AND MINA MALEKI
Department of Electrical and Computer Engineering, University of Detroit Mercy, Detroit, MI 48221, USA
Department of Computer Science, University of Detroit Mercy, Detroit, MI 48221, USA
Corresponding author: Mingzuoyang Chen ([email protected])

ABSTRACT Advancements in self-driving car technology have the potential to revolutionize transportation
by enhancing safety, efficiency, and accessibility. Nonetheless, the successful integration of autonomous
vehicles into our urban landscapes necessitates robust and reliable pedestrian detection and tracking systems.
As we frequently observe pedestrians moving together, tracking them as a group becomes a beneficial
approach, mitigating occlusion and enhancing both the accuracy and speed of object detection and tracking.
However, utilizing a human-view camera in an autonomous vehicle presents challenges as pedestrians
occupy varied fields of view. In some instances, pedestrians closer to the camera may overlap with those
farther away, as seen from the camera’s viewpoint, which causes mis-groupings. To address
these challenges, we proposed a strategy that divides the image into distinct zones and performs grouping within
each, significantly minimizing mis-groupings. First, an object detection method extracted pedestrians and
their bounding boxes from an image. Second, zone detection was applied to separate the image into several
zones. Third, clustering methods were applied to detect pedestrian groups within each zone. Last, the object
tracking method was utilized to track pedestrian groups. We repeated the process over a ten-frame sequence
to achieve better performance, with object detection executed in the first frame and object tracking in the
remaining nine frames. The comparison of processing times of different group detection methods indicated
that tracking pedestrian groups is more time-efficient than tracking individuals and achieved a 4.5% to 14.1%
improvement. Furthermore, according to the Adjusted Rand Index (ARI) evaluation metric, our proposed
zone-based group detection method outperforms the other commonly used approaches by achieving scores
of 0.635 on the MOT17 dataset and 0.781 on the KITTI dataset. In addition, the proposed approach surpasses
the other approaches in addressing scenarios where individuals from different fields of view intersect with
each other.

INDEX TERMS DBSCAN, K-means, object detection, object tracking, zone-based group detection.

I. INTRODUCTION
As self-driving technology progresses and becomes more popular, engineers must examine not only the path for the vehicle to its destination but also safety issues that impact both the driver and pedestrian. The United States Department of Transportation reports that the majority of unexpected deaths are due to highway accidents, with an average of 39,000 fatalities occurring annually in the US since 2016, with at least 94% of those fatalities occurring on highways and affecting both drivers and pedestrians.

While safety belts and various vehicle components can protect drivers, improving pedestrian safety has become a pressing issue for scientists to address. Drivers must react appropriately to pedestrian behavior to prevent accidents, particularly when drivers may not be able to maintain complete focus on the road. Therefore, identifying pedestrian behavior is instrumental in preventing accidents [1].

Nonetheless, predicting pedestrian behavior poses a considerable challenge due to the complex nature of their environment. Road obstacles or the presence of other individuals can readily disrupt head orientation and human trajectory, undermining the accuracy of existing techniques. While humans can intuitively monitor pedestrian activities,

The associate editor coordinating the review of this manuscript and approving it for publication was Yongming Li.

2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ VOLUME 11, 2023

autonomous systems lack this proficiency. Hence, machine learning methodologies are crucial to train these systems to recognize pedestrians and utilize sophisticated techniques to comprehend their actions.

In recent years, numerous autonomous vehicle technologies have been developed to caution drivers and prevent pedestrian injuries, including but not limited to Forward Collision Warning, Pedestrian Automatic Emergency Braking, and Rear-view Video Systems [2], [3]. These functions principally rely on various types of sensors like cameras and Lidars. Given these sensor capabilities, an increasing number of researchers have shifted their focus towards pedestrian tracking, given its potential to alleviate traffic congestion and preserve human lives.

Object tracking is an essential aspect of computer vision used to track objects' movement within a video stream. Currently, the predominant pedestrian tracking methods take in a set of initial object detections and track the objects as they traverse through a video. Object detection models can provide several parameters for object tracking models to extract features and locate pedestrians on the street, in which points [4], [5], [6] or bounding boxes [7], [8], [9] are the most frequently utilized parameters for identifying pedestrian locations within an image, each offering unique advantages. Connecting points can identify objects based on point characteristics, such as finding the joints of a pedestrian. On the other hand, a bounding box, typically a rectangle enclosing an object and defined by its lower-right and upper-left coordinates, can identify pedestrians with less data but cannot accurately define their head orientation.

Recent research in pedestrian tracking has primarily focused on enhancing accuracy and processing speed. However, a comprehensive pedestrian tracking algorithm necessitates the execution of several tasks, including object detection, localization, classification, and object tracking. While strides have been made in the advancement of object detection methodologies, some researchers have begun to divert their focus towards exploring pedestrian interactions [10], [11] and extending these techniques to enhance tracking precision, thereby ensuring pedestrian safety [12].

Generally, the sparser the background, the easier object tracking becomes. Additionally, pedestrians are frequently occluded, leading to challenges and reduced tracking accuracy in crowded environments. Traditional pedestrian tracking methods primarily emphasize individual tracking. To maintain consistency in tracking the same objects, these methods necessitate incorporating specific features for reassociating a lost individual to its preceding tracking ID. This additional process could potentially slow down the tracking speed; tracking pedestrians in a dense crowd will take more time than anticipated. Therefore, it is imperative to discover an appropriate solution to this challenge in order to reduce the processing time.

Through continuous observation of street activity, we have frequently noticed that individuals often walk in groups. These group members, or unrelated individuals sharing a common destination, tend to move with analogous speeds and directions. During synchronized motion, one individual often occludes another, impacting tracking accuracy and speed. Hence, tracking these individuals collectively has emerged as a potential solution. Tracking them as a group can help with occlusion and enhance both the accuracy and speed of object detection and tracking, as one member occluding another will no longer be an issue. Furthermore, tracking a group can provide insights into group dynamics as well as the interactions between its members. Additionally, this can bolster the safety of autonomous vehicles by anticipating group behavior and making requisite motion adjustments to avert potential collisions [13], [14], [15]. This innovative approach has given rise to group detection to tackle the challenges inherent in traditional pedestrian tracking methods.

In our view, group detection can be accomplished through a classification process. We aim to separate the pedestrians into several groups using various clustering methods. After successfully grouping the pedestrians, the position and movement of occluded pedestrians can be inferred based on the behavior of their nearby pedestrians. This process would also reduce the processing time by decreasing the number of tracked objects, as corroborated by our previous papers [13], [16]. In this paper, we continue to focus primarily on pedestrian group detection. We employed bounding box coordinates provided by the PyTorch object detection method as the input of the pedestrian group detection methods, and we output the group labels for each pedestrian and the groups' bounding box coordinates. Finally, we forwarded these groups' coordinates to the object tracking method. The comparison between K-Means and DBSCAN clustering methods showed the effectiveness of clustering pedestrians when no strangers cross each other. Nonetheless, their capability diminishes when handling pedestrians from varying fields of view. In this study, we extended pedestrian group detection with Grid clustering and further innovated by adding zone detection to the traditional K-Means and DBSCAN clustering methods, named Z-KMeans and Z-DBSCAN. These methods were tested on two widely utilized datasets and evaluated accordingly. To further enhance our evaluation, we generated the groups' ground truth labels on both datasets' videos based on human observers, which were used to assess our grouping results.

The main contributions of this work are as follows:
• Introducing a dynamic zone-based pedestrian group detection method that improves object detection and tracking by addressing occlusion challenges and effectively handling varied views of pedestrians, ensuring accurate detection regardless of proximity to the human-view camera.
• Demonstrating that tracking pedestrian groups is more time efficient than tracking individual pedestrians,


reducing the tracking processing time in the range of 4.5% to 14.1% using the various proposed pedestrian group detection methods.

The paper is structured as follows: Section II describes the background of pedestrian tracking and group detection. The implementation details and clustering methods are described in Section III, followed by a comparison of the different approaches' results based on processing time and evaluation metrics in Section IV. Finally, Section V discusses the conclusions and future work.

II. RELATED WORK
Object tracking is an essential aspect of computer vision used to track objects' movement in a video stream. The method in itself is usually incapable of detecting the objects; it invariably requires the integration of an additional object detection process.

A. OBJECT DETECTION
Object detection is a technique used to detect specific objects in an image or video based on machine learning and computer vision technologies [17]. Its accuracy can be affected by background, resolution, lighting, etc. As a result, this field has received much attention in recent years [17]. In this work, we mainly introduce two different types of object detection methods: information fusion-based detectors and CNN-based detectors.

1) INFORMATION FUSION-BASED DETECTORS
To improve detection, some developers have tried to use multiple or more accurate sensors to obtain more reliable and precise data as input. For example, Liang et al. [18] exploited both LIDAR and cameras to detect objects in 3D accurately. They proposed a novel end-to-end learnable 3D object detection method with a continuous fusion layer and transferred the detected data into a bird's eye view. Their testing results on KITTI and TOR4D show that their approach significantly increases detection accuracy. Qi et al. [19] used a CNN to detect objects in 2D, then extended the detections to a 3D viewing frustum using RGB-D data from 3D sensors, which greatly increased the localization accuracy for many nearby detection boxes. For unoccluded objects, their method achieved great accuracy in 3D segmentation and object detection. To leverage the relationship between 2D scale and 3D depth, M3D-RPN [20] shared 2D and 3D anchors and applied depth-aware convolution layers to develop spatially-aware features, which improve the detection accuracy of bird's-eye-view 3D object detection. Surfconv [21] tried to solve this problem by combining a depth-aware multi-scale 2D convolution with a Data-Driven Depth Discretization scheme. They used less than 30% of the data while getting similar performance to the SOTA 3D-convolution-based methods. EPNet [22] integrated the LI-Fusion module and CE loss to overcome the inconsistency between the localization of multiple sensors, in which LI-Fusion was used to enhance the point features and CE loss to increase the localization confidence. HVNet [23] applied an attentive voxel feature encoder at different scales to find the balance between voxel size and inference time, and achieved a better mAP with high inference speed. However, the accuracy of these detection methods is heavily dependent on the quality of the image; if a single camera is used instead, the results of these methods might not be reliable.

Typically, researchers evaluate their detection results through Average Precision (AP), the area under the precision-recall curve, whose calculation incorporates several other metrics. It is commonly used to analyze the performance of object detection and segmentation systems. Table 1 compares AP and Frames Per Second (FPS) across various information fusion-based detectors. M3D-RPN achieved the highest AP on the KITTI dataset, while HVNet registered the top FPS performance.

2) CNN-BASED DETECTORS
In other studies, Convolutional Neural Networks (CNNs) are used to hierarchically learn the features we need, which can guarantee both speed and accuracy [24], [25]. Detectors based on CNNs are generally categorized into two types: two-stage detectors and single-stage detectors. The main goal of single-stage detectors is to achieve high speed and acceptable accuracy. Two-stage detectors, however, divide the process into two steps, with the first step generating proposals and the second step recognizing these proposals; they usually achieve higher accuracy at a slower speed [26].

CNN-based two-stage detectors
After many years of development, regions with deep convolutional neural network features (R-CNN) have become the most popular CNN-based two-stage detector model since it limits the number of regions used in the selective search. As a widely used object detection method, Faster R-CNN [27] designed a Region Proposal Network to share image features with the detection network. In combination with Fast R-CNN, it achieved state-of-the-art object detection accuracy. To reduce the processing time, Yan et al. [28] removed the RoI-Pooling layer, which avoids the processing of candidate boxes in the first stage and the associated calculation in the second stage. Nabati et al. [29] generated pre-defined anchor boxes by transferring Radar data into the image coordinate system to provide more accurate parameters for object detection. ThunderNet [30] used two efficient architecture blocks to discover more discriminative features; with its backbone and detection design, it could even achieve lower processing time than one-stage detectors. MaxpoolNMS [31] applied a novel multi-scale multi-channel max-pooling strategy to find the peaks in objectness score maps and remove duplicate objects. Compared with GreedyNMS, their method had similar accuracy with a massive speed-up. CPN [32] avoided false-positive detection results by identifying potential corner keypoint combinations and giving each independent classification a class label, and surpassed most


TABLE 1. Summary of information fusion-based detectors.

SOTA object detection methods. Xie et al. [33] proposed an object detection method called oriented R-CNN, a practical two-stage detector containing both an oriented Region Proposal Network and an oriented R-CNN head. The oriented RPN was used to generate high-quality oriented proposals, and the oriented R-CNN head was used to identify oriented Regions of Interest, achieving a great degree of accuracy on two widely used datasets. To make the method more flexible and responsive, Sun et al. [34] applied a smaller proposal box. In their Sparse R-CNN, classification was done using a fixed sparse set, and localization was achieved using dynamic heads, which attained great accuracy on the COCO dataset with a shorter running time.

CNN-based one-stage detectors
As a popular one-stage object detection method, YOLOv3 [35] extended previous work by calculating the score of each bounding box with logistic regression and replacing the softmax with an independent logistic classifier, which made the method more accurate and fast. Other researchers have used neural networks with different structures. For example, Zhou [36] developed a novel Scale-Transferrable Detection Network to find the interaction between detection scales. Using embedded super-resolution layers, they could explicitly explore the inter-scale consistency across multiple detection scales, leading to their method outperforming SOTA methods. In FCOS [37], the pre-defined anchor boxes were removed from the neural network; with non-maximum suppression as the only post-processing step, it became simpler while retaining high detection accuracy. MimicDet [38] trains the one-stage detector with two-stage features and shares the backbone between the two detectors; their method increased accuracy with a limited increase in time. By abandoning the upsampling layers and refinement stage, Yang [39] proposed a delicate box prediction network with a candidate generation layer to lower the detection processing time. Chen et al. [40] developed I3Net, which works specifically with one-stage detectors to learn instance-invariant features from features in different layers; with three aspects integrated, their method exceeded the SOTA accuracy. By applying a Spatial-Semantic Feature Aggregation module, CIA-SSD [41] could accurately predict the bounding box coordinates, and its Distance-variant IoU-weighted NMS significantly increased the speed and accuracy of localization.

Table 2 provides a comparison among various CNN-based object detection methods. As previously discussed, most two-stage detectors achieved a higher AP, while the one-stage detectors registered superior FPS performance. As one of the most widely used SOTA methods, Faster R-CNN was used in our work to detect objects due to its accuracy and speed. As well as being able to discover features automatically, it was robust to interruptions and could achieve real-time processing.

B. OBJECT TRACKING
Due to the complexity of the environment, pedestrian tracking is a difficult task. The vehicles on the road need to react suitably based on the pedestrian's direction. If we can track the pedestrians, it will be helpful in preventing accidents [8], since the driver may not be able to pay attention to their view of the road at all times. As a result, we may need to use technologies like object tracking to alert drivers to these conditions, potentially reducing the number of accidents.

Object tracking is another essential aspect of computer vision. It is divided into model-free tracking (MFT) and tracking-by-detection (TBD). In MFT-based methods, pedestrians are manually initialized in the first frame and tracked in the subsequent frames. TBD-based methods, however, locate pedestrians and associate their hypotheses with trajectories in each frame. Due to their greater accuracy and not requiring the number of pedestrians to be defined, TBD-based methods are more common. In addition, they typically include feature extraction and data association [42]. There are a number of approaches for tracking pedestrians that build on object detections. Xinshuo et al. [43] use the Kalman filter to locate the pedestrian and the Hungarian algorithm to link the tracking results with prior object detection results. References [44] and [45] used re-identification to discover the relationship between the detected pedestrians in each frame. In the real world, it is difficult to assert that we can track a person without interruption because pedestrians are regularly occluded by others. Some solutions tackle this difficulty by saving and utilizing the old path to construct a new one; the path is linked to the projected trajectory once it is rediscovered. Object tracking has been the subject of much research in recent years, such as the CSR-DCF tracker [46], which stands


TABLE 2. Summary of CNN-based detectors.

for Discriminative Correlation Filter Tracker with Channel and Spatial Reliability. It increases the search region and the tracking accuracy of non-rectangular objects by adjusting the filter support through a spatial reliability map. DMAN [47] proposed an approach that integrated single object tracking and data association into a unified framework, which made the network focus more on the interaction between images to limit noise. The results show that their method outperformed both online and offline trackers. To ensure reliability, mmMOT [48] ran the sensors independently and utilized a novel multi-modality fusion module. Their method could optimize the base feature extractor of each sensor and enhance the framework's robustness, subsequently improving accuracy. A unified framework was proposed by [49] to combine the model's appearance and topology, subsequently integrating object features and trajectory. Their feed-forward network with end-to-end training could achieve SOTA accuracy on the MOTChallenge dataset with the public detector. FAMNet [50] used a single network to refine feature extraction and affinity estimation. After combining single object tracking with dedicated target management, their method could recover false negatives and filter the noise from public detection methods. Using previous frame features, TransTrack [51] continuously detected objects in the current frame, and the learned object query was used to detect new objects. Therefore, their method only needs to apply object detection on the first frame, and tracking-by-detection becomes more efficient and accurate. FairMOT [7] tries to balance detection and re-identification to reduce ID switches during tracking. To obtain more accurate results, they also developed a new self-supervised learning approach to improve accuracy and processing time. In DyGLIP [52], a dynamic graph model was applied to communicate the tracking ID between different cameras. It also improved the robustness of the method through a proposed attention module, which made human and vehicle tracking more accurate. TrackFormer [53] applied global frame-level features in both self- and encoder-decoder attention, with no need for additional graph optimization. Their method achieved better results on both multi-object tracking and segmentation. To solve the problem of object occlusion, ByteTrack [54] found the similarities in the trajectories of low-scoring detected objects and then reconstructed the real objects, which made their tracking more accurate without ID switches. Further, MOTR [55] developed a 'track query' to analyze the tracked instances, updated every frame to predict accurately. Their method significantly outperformed the SOTA methods by applying a temporal aggregation network. EPformer [56] introduced an architecture capable of operating in parallel with the Transformer network. It employed a Feature Fusion Head (FFH) to extract salient features. The integration of an attention mechanism enabled the model to thoroughly mine rich contextual information, resulting in enhanced accuracy.

Our research uses the CSR-DCF tracker since it has a relatively low frame rate but higher accuracy. A summary showing the multi-object tracking accuracy (MOTA) of the mentioned methods is given in Table 3. MOTA is an evaluation metric widely used to measure the detector and tracker's overall accuracy. The MOTA score is calculated using Equation 1. In this equation, 'FN' refers to the number of false negatives (objects that are not successfully detected), 'FP' refers to the number of false positives (objects that are incorrectly detected), 'IDS' refers to the number of identity switches (incorrect associations between ground truth objects and tracked objects), and 'T' refers to the total number of ground truth objects. The higher the MOTA score, the better the tracking performance. A perfect score


of 1 indicates no tracking errors.

MOTA = 1 − (FN + FP + IDS) / T    (1)

C. GROUP DETECTION AND TRACKING
Nowadays, we frequently observe pedestrians walking together when we walk down the street. They might be a couple, a family, or a group of friends, in which case they might have similar speed and direction [10], [57], [58]. Even strangers moving at the same or different speeds might stay together for multiple frames. These situations frequently cause collision problems between group members, which lowers the performance of pedestrian tracking and trajectory prediction and makes the methods that try to solve these problems complex. These disadvantages are what we aim to avoid. Since object tracking processing time depends on the number of objects, grouping pedestrians and tracking those groups, with a lower object count than the original, would be helpful. This concept is called group detection. It is becoming increasingly important to study the behavior of these groups, not only to benefit society but also to advance autonomous vehicle applications.

1) SPATIO-TEMPORAL-BASED GROUP DETECTION
Several research projects are working on pedestrian group detection, with some of them using spatio-temporal data as a parameter to increase accuracy. A temporal-spatial method was proposed in [59] for clustering and locating pedestrian groups, followed by a CRF-based event detection mechanism for recognizing contextual behaviors. They were trying to improve their Groupon system, a device that generates coupons for groups, by finding pedestrian groups in indoor scenarios, and obtained great accuracy. Zaki [58] tried to use object tracking to find small groups of pedestrians. They introduced a dissimilarity measure to find the similarity between different pedestrians' behaviors and achieved an accuracy of 77%. Furthermore, [60] used spatial formation to determine whether pedestrians are in a group. Their method could give the probability that two pedestrians belong to the same group with high accuracy. Also, [61] worked on finding the interaction between pedestrians. They tried to use distances between individuals, relative orientations, and velocity differences as additional inputs to implement interaction behavior in robots. The results showed that these parameters enhance detection accuracy and could be applied in robotics with great potential.

2) CLUSTER-BASED GROUP DETECTION
Other researchers have tried to use clustering methods to achieve pedestrian group detection. Reference [62] designed sociologically grounded features to discover group characteristics. They used the Structural SVM framework to discover the clustering rules and provided a special loss function to limit the error. Reference [63] found the pedestrian groups by comparing their similar motion patterns and generating a motion similarity graph. Then, they applied a spectral clustering algorithm on the graph to cluster the pedestrians. The comparison showed that their method could efficiently identify pedestrian groups. Similarly, [64] developed a Multiview-Based Parameter-Free framework to group pedestrians based on their trajectories. It uses feature extraction on the point cloud of the pedestrians to detect motion, then clusters them based on density. In [10], a time-sequence DBSCAN method combined with an additional input, a coexisting time ratio, was used to cluster the pedestrians. They proved that group detection could provide more reliable trajectory prediction results with low runtime complexity.

Table 4 summarizes the different papers' algorithm types and their purposes.

3) GROUP DETECTION APPLICATIONS
Pedestrian group detection is gaining prominence in public safety and security. The larger the pedestrian group, the higher the risk of accidents. Any abnormal behavior can potentially result in severe implications, particularly in densely populated areas. Monitoring pedestrian groups can not only help us manage crowds and prevent accidents but also find potential health threats.

During holiday seasons and festivals, popular tourist destinations are frequented by a vast number of visitors, introducing potential safety hazards such as violence or stampedes. Technologies such as pedestrian group detection are essential to regulate pedestrian flow and establish emergency exit paths, significantly mitigating the risk of stampedes and violent incidents.

Furthermore, finding pedestrian groups can play a vital role in preventing the spread of infectious diseases. COVID-19, for example, transmits through tiny droplets containing the virus, inhaled when in close proximity to infected individuals. Pedestrian group detection could help us find potentially infected people and prevent large crowds from infecting each other.

4) LIMITATIONS OF EXISTING TECHNIQUES
According to these studies, pedestrian grouping can be considered a potential branch of computer vision. By applying group detection, we can expedite pedestrian tracking by reducing the number of objects to be tracked. However, most of these studies utilized an overhead camera, which causes the field of view within the image to differ from a vehicle-mounted camera. Furthermore, none of these studies worked on using group detection to reduce the processing time of pedestrian tracking. Primarily, these researchers have studied group detection to enhance the accuracy of techniques like pedestrian tracking and trajectory prediction or to facilitate traffic management. This paper is the first to focus on reducing pedestrian tracking processing time through group detection.

VOLUME 11, 2023 132167


M. Chen et al.: Enhancing Pedestrian Group Detection and Tracking Through Zone-Based Clustering

TABLE 3. Summary of object tracking methods.

TABLE 4. Summary of pedestrian group detection methods.

FIGURE 1. Structure of pedestrian group detection and tracking.

III. METHODOLOGY
When pedestrians walk down the street as a group, they mostly stay close to each other in both the vertical and horizontal directions. As a result, their coordinates in the image often distribute closely and can be separated into several groups. We aim to discover a suitable pedestrian group detection method that can correctly group the pedestrians. The general layout of our work is described in Fig. 1. First, we applied the object detection method to the image and filtered out non-pedestrian objects and far-away pedestrians based on the object filtering section below. Then, different kinds of clustering methods were used to group the pedestrians. At last, we calculated the total processing time after tracking the pedestrian groups with the specific tracker over a ten-frame sequence, with object detection on the first frame and object tracking on the rest of the frames. The Adjusted Rand Index evaluation metric was selected to assess our clustering results.

A. OBJECT DETECTION
The fasterrcnn_resnet50_fpn model is used in our work to detect pedestrians. This model can be found in PyTorch [66]. It is a Faster R-CNN model combined with a ResNet-50-FPN backbone. The Faster R-CNN model [27] has proven in recent SOTA papers to be a suitable method for object


detection [7], [24], [67]. R-CNN stands for region-based convolutional neural network. It first applies a selective search to find objects, and a CNN performs feature extraction. Then, it feeds the features into an SVM model to classify the object labels. However, R-CNN is not capable of real-time processing. Fast R-CNN extends the method by feeding the recorded regions back to the neural network, greatly decreasing the R-CNN processing time. The Region Proposal Network (RPN) makes Faster R-CNN outperform and be more acceptable than most SOTA methods. Faster R-CNN also reshapes the image by using the ROI pooling layer to feed it back to the network at the same size. Unlike selective search, it divides the network into different parts and lets the network learn the region proposals by itself. As a result, Faster R-CNN can be used for real-time object detection.

As inputs for object detection, the MOT17Det dataset [68] contains 1920*1080 images in JPG format for several videos, and the KITTI dataset [69] contains 1242*375 images in PNG format. Both datasets come with object detection ground truth coordinates. The chosen object detection algorithm is applied directly to these images, without compressing them, in order to achieve accurate detection results. The model returns the detected object labels and their bounding boxes' top-left and lower-right coordinates. The detection result on an example frame of MOT17-02 is shown in Fig. 2.

FIGURE 2. Original object detection result on MOT17-02.

B. OBJECT FILTERING
After the object detection, we obtain 2D coordinates ((x1, y1), (x2, y2)) for the following step, where (x1, y1) denotes the upper-left coordinate and (x2, y2) the lower-right coordinate. When we ran the object detection on the entire video, we observed that objects positioned far from the camera were frequently absent from the detection results. A common cause is that the chosen object detection method is not sufficiently accurate and cannot always detect pedestrians far from the camera. To address this, object filtering was applied.

Our object filtering step first removed the coordinates of objects that were not pedestrians, based on their labels. Then, we needed to remove those pedestrians that the algorithm consistently failed to detect. The camera used by the dataset generators is adjusted for human vision; in other words, long-distance objects appear smaller. Consequently, we needed a method to filter out such pedestrians, allowing us to concentrate on continuous tracking and streamline subsequent comparisons.

Since we are using a human-view camera, the size of the pedestrian bounding box cannot serve as a measure of distance, since children who are close also have small bounding boxes. Hence, we used the lower coordinate of the bounding box as the filtering parameter, which most often represents the level of the pedestrian's feet and correlates with camera distance. Therefore, the threshold was set to half the height of the image, and any detected pedestrians whose lower edges exceeded it were removed. The result of the filtering process for the first frame of MOT17-02 is shown in Fig. 3. After observing the filtered object detection results on the following frames of the videos, object filtering proved effective.

FIGURE 3. Object detection result with filtering.

C. PEDESTRIAN GROUP DETECTION
To reduce the number of tracking objects, shorten processing time, and prevent accidents, this work utilized clustering techniques to group pedestrians. In our opinion, clustering is a suitable technique for detecting pedestrian groups, as it has a limited processing time and can produce results quickly.

Clustering is a technique that partitions unlabeled data into groups based on similarities or differences. It determines a cluster structure in a dataset characterized by the highest degree of similarity within a cluster and the greatest degree of dissimilarity between clusters. In general, clustering methods can be categorized as partitioning-, density-, and model-based [70], [71], [72]. Partitional clustering methods commonly distinguish the clusters by applying distance functions, with K-means clustering being one of the most popular. However, different cluster shapes can be found using density-based methods; for example, DBSCAN is well-known for identifying clusters of any shape [73].

In this study, we employed five distinct clustering methods to identify pedestrian groups. We aimed to compare these


techniques across two diverse datasets and refine the methodology to cater to various situations. Furthermore, our study also encompasses the tracking of pedestrian groups.

1) K-MEANS CLUSTERING METHOD
K-Means [74] is one of the most widely used unsupervised machine learning algorithms due to its simplicity and usability [75]. It divides the observations into k clusters through multiple iterations: it generates k random points, uses the Euclidean distances to them to cluster the data, and iterates until the center points no longer change.

The detailed process and pseudocode of K-Means clustering are shown in Algorithm 1. The algorithm first chooses k random centroids Ck. For each data point in N that needs to be clustered, it calculates the distance to the chosen centroids, assigns the point to the nearest centroid, and forms clusters. Next, the method updates the new centroid of each cluster in Ck as the average value of its data points. The method keeps repeating until the center points remain constant, indicating that clustering is complete.

Algorithm 1 K-Means Clustering Method [76]
Input: Number of centroids k; Set of points N; List of centroids randomly assigned Ck
Output: Set of clusters with their centroids
Begin
1: Repeat
2: for each data point in N do
3: Calculate the distance between the data point and the centroid of each cluster.
4: Assign the data point to the nearest centroid.
5: end for
6: for each cluster in Ck do
7: Calculate the new centroid position
8: end for
9: Until all data points belong to a cluster, or the maximum number of iterations is reached.
End

The disadvantage of the K-Means clustering method is that it requires a suitable value for the number of clusters (k). Researchers usually use the elbow value of the sum-of-squared errors (SSE) to determine the suitable number of clusters.

SSE is a measure of the variation of error in a model. It is one of the most commonly used evaluation metrics for choosing k values in the K-Means clustering method, and it is an indicator of clustering quality: the lower the SSE, the better the clustering. As we can observe from all the k and SSE plots, an increase in k decreases the corresponding SSE, indicating a trade-off between them. Commonly, the algorithm picks the value of k where the SSE plot flattens out and resembles an elbow, so that we can obtain a low SSE while choosing a small value of k [71].

2) DBSCAN CLUSTERING METHOD
DBSCAN [77], which stands for Density-Based Spatial Clustering of Applications with Noise, is a data clustering algorithm frequently used in data mining. It uses two parameters, Epsilon and MinPts, to determine whether points are in a group. Epsilon specifies the maximum distance between points to form a cluster, and MinPts is the minimum number of points required to make a dense zone. A data point must have at least MinPts points within its epsilon-circle to be classified as a core point. A point with fewer than MinPts neighbors that lies within the epsilon-circle of a core point is named a border point; a data point that is neither core nor border is considered a noise point.

Algorithms 2 and 3 below show the key steps of DBSCAN, in which the function RangeQuery(p, ϵ) returns the list of p's neighbors contained within a range of ϵ [78]. It first selects a random data point p and calculates the distance to all other data points. If the number of points within ϵ is larger than MinPts, p is assigned as a core point to form a cluster, and the density-reachable points are merged into it, as shown in Algorithm 3. However, if the number of points within ϵ is smaller than MinPts, p is labeled as noise unless it is a density-reachable point of another core point q. If no new cluster forms, the algorithm repeats the entire procedure until all the data points are assigned.

Algorithm 2 DBSCAN Clustering Method [78]
Input:
P: data points;
ϵ: the radius of a neighborhood with respect to some point;
MinPts: the minimum number of points required to form a dense region;
Initialize cluster id C = 0
1: for each unclassified point p ∈ P do
2: Nϵ(p) = RangeQuery(p, ϵ)
3: if |Nϵ(p)| ≥ MinPts then
4: Set p's cluster id to C
5: ExpandCluster(p, Nϵ(p), C, ϵ, MinPts)
6: C = C + 1
7: else
8: Label p as noise
9: end if
10: end for

In our method, we did not need to specify the MinPts, since we also considered single pedestrians as a group. The choice of epsilon thus becomes the main problem in our testing of DBSCAN: a small value prevents dense points from clustering, whereas a large value results in the merging of clusters. So ϵ is dynamically adjusted in our testing to make it applicable in various circumstances, utilizing as ϵ the elbow value of the sorted distances between each data point and its closest neighbor.
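The ϵ-selection rule just described can be sketched as follows. The toy coordinates and the "largest-jump" elbow heuristic are illustrative assumptions, not the exact implementation used in this work:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def pick_epsilon(points):
    # Distance from each point to its nearest neighbor (column 0 of the
    # result is the point itself at distance 0), sorted in ascending order.
    dists, _ = NearestNeighbors(n_neighbors=2).fit(points).kneighbors(points)
    d = np.sort(dists[:, 1])
    # Crude elbow: the distance value just before the largest jump.
    return d[np.argmax(np.diff(d))]

# Hypothetical pedestrian coordinates: two pairs and one single walker.
points = np.array([[0, 0], [1, 0], [0.5, 1], [10, 10], [11, 10]], dtype=float)
eps = pick_epsilon(points)
# min_samples=1 so that an isolated pedestrian still forms its own group.
labels = DBSCAN(eps=eps, min_samples=1).fit_predict(points)
```

With min_samples=1 no point is ever labeled noise, so every pedestrian receives a group label, matching the assumption that a single pedestrian is a group of one.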


Algorithm 3 ExpandCluster(p, neighborPts, C, ϵ, MinPts) [78]
Input:
p: current search point;
neighborPts: density-reachable points from p;
C: current cluster id;
ϵ: the maximum distance;
MinPts: the minimum points to form a cluster;
Output:
drPts: density-reachable points from p
1: drPts = neighborPts
2: for each point q ∈ drPts do
3: if q is unclassified then
4: Nϵ(q) = RangeQuery(q, ϵ)
5: if |Nϵ(q)| ≥ MinPts then
6: drPts = drPts ∪ Nϵ(q)
7: end if
8: end if
9: if q does not belong to any cluster then
10: q's cluster id = C
11: end if
12: end for

3) GRID CLUSTERING METHOD
Grid-based clustering methods represent the data space as a grid structure, dividing it into a finite number of cells. These cells form the base unit of the grid and are clustered based on predefined criteria. Unlike other clustering algorithms, this method operates on grid cells rather than individual data points, which results in a lower processing time. The quality of the clustering heavily depends on the choice of the number of grid cells. Fig. 4 shows an example of the grid clustering process, where a unique color denotes each group within a cell.

FIGURE 4. Grid clustering example.

D. OBJECT AND GROUP TRACKING
In our study, we employ the Discriminative Correlation Filter Tracker with Channel and Spatial Reliability (CSR-DCF) [46], chosen for its higher accuracy. It increases the search region and the tracking accuracy of non-rectangular objects by adjusting the filter support through the spatial reliability map.

After many years of development, several object-tracking methods exist that achieve high accuracy while dealing with various types of distractions. Because of their speed, DCF-based tracking algorithms are widely utilized; examples include the Kernel Correlation Filter, which uses multi-channel features to increase accuracy, and the CSR-DCF tracking algorithm, which combines CNN features to improve robustness [79]. There are various open-source object tracking algorithms available online, including BOOSTING, MIL, KCF, TLD, MEDIANFLOW, GOTURN, MOSSE, and CSRT, all included in OpenCV; CSRT is the one we use to apply object tracking.

The CSRT tracker, the C++ version of the CSR-DCF tracking algorithm, uses the spatial map to find suitable spatially constrained filter channels. Then, the responses of the different feature channels are correlated, and the tracker locates the object using channel reliability weights generated from the discriminative power of each channel response. In addition, the foreground histogram, based on the estimated object bounding box, and the background histogram, extracted around the object, are kept updated to reduce the noise from detectors. As the frames progress, the filters and channel reliability weights are updated to ensure reliable detection.

In this step, we are not tracking individual objects but the clusters formed earlier, which helps us reduce the processing time by limiting the number of objects.

IV. PROPOSED ZONE-BASED GROUP DETECTION METHOD
In our prior work, we observed that the K-Means and DBSCAN clustering algorithms were incapable of effectively grouping points across disparate fields of view. Our enhanced clustering methods address this limitation by splitting the image into different zones. Unlike grid clustering, we only separate the image in the horizontal direction, employing uneven intervals. The structure of our proposed clustering methods is illustrated in Fig. 5. First, we apply the object detection method to the selected image, which provides all the detected objects' labels and positions. After object filtering, we extract the pedestrian information. Then, we separate the image based on the lower coordinates of the pedestrians' bounding boxes. Lastly, we deploy K-Means and DBSCAN to cluster the pedestrians.

To set up the zone detection, we need to create a sorted list containing the horizontal level of each pedestrian's feet in each image. Since pedestrians in a group are close to one another, their y-axis values also distribute closely. In this study, we used the lower bounding box level as the parameter to define the horizontal level of each pedestrian's feet.
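The zone-splitting idea can be sketched as below: sort the foot levels (the lower y-coordinates of the bounding boxes) and open a new zone wherever the gap between consecutive levels exceeds half the average bounding-box width of the pair. The box values are invented for illustration, and the fixed fallback thresholds used for heavy crowds are omitted:

```python
# Each box is (x0, y0, x1, y1); the foot level is y1, the width is x1 - x0.
boxes = [(100, 50, 140, 300), (160, 55, 200, 310),
         (400, 200, 460, 700), (480, 210, 545, 720)]

def split_zones(boxes):
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][3])
    zones = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        gap = boxes[cur][3] - boxes[prev][3]
        avg_width = ((boxes[prev][2] - boxes[prev][0])
                     + (boxes[cur][2] - boxes[cur][0])) / 2
        if gap > avg_width / 2:   # foot levels far apart: open a new depth zone
            zones.append([])
        zones[-1].append(cur)
    return zones

zones = split_zones(boxes)   # box indices grouped by depth zone
```

Comparing the gap against the local bounding-box width, rather than a fixed pixel distance, is what makes the split tolerant to perspective: nearby pedestrians have wide boxes and are therefore allowed larger foot-level gaps.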


FIGURE 5. The structure of our proposed group detection methods.

We first separated the image based on the gaps between the nearby sorted y-axis values. Once a gap exceeded the threshold, a divider was defined, and these dividers separated the image. This greatly increased the accuracy of pedestrian grouping. However, a problem remains: in a human-view image, the same horizontal distance is also influenced by the field of view, which means this process would separate pedestrians who are very close to the camera into different areas and could not accurately detect them as a group. As a result, we enhanced it by using the bounding box widths of horizontally nearby pedestrians as the parameter to achieve zone detection. We first create a 2*n matrix, with each line containing the sorted y-axis value and the corresponding bounding box width of one specific pedestrian. Then, we calculate the gap between nearby y-axis values and compare it with half of the average of the corresponding bounding box widths. To handle the situation in which heavy crowds make the threshold hard to define, we also set three fixed values for when the number of saved thresholds was less than 2. Fig. 6 presents a sample frame with our defined zones. As observed, pedestrians at different fields of view are effectively distinguished by our zone detection.

FIGURE 6. Zone detection on a sample frame of the MOT17 dataset.

A. ZONE-BASED DBSCAN CLUSTERING METHOD
The DBSCAN clustering method requires two parameters: epsilon and MinPts. In our view, the field of view is the main problem we must solve in clustering-based pedestrian group detection methods. Pedestrians in the same group have a wider gap when they are closer to the camera, which means we cannot use a single epsilon to perform the clustering. Since the image is divided into different zones, each one has to have its own epsilon. To set up the epsilon, we use the average bounding box width of the pedestrians within each divided region; since the bounding box width varies with magnification, it reflects the size of each pedestrian. A ratio is applied when generating epsilon, since pedestrians of the same group are not always close by. Since individual pedestrians are also considered a group in our research, we specify MinPts equal to 1. In each separate area, we run DBSCAN using our chosen epsilon and MinPts and offset the pedestrians' temporary labels by the previous area's maximum label value.

Our proposed Z-DBSCAN method is represented by the pseudocode below. As previously discussed, we set the thresholds using the sorted y1 coordinates. The picture is then divided by these thresholds, and a specific zone_label is assigned to the pedestrians in each piece. Additionally, we collect the bounding box width of each pedestrian and use the average width in each divided region to set up the epsilon. Due to the possibility of pedestrians in the same group not always touching in our field of view, we give epsilon a ratio. After clustering with the DBSCAN method and obtaining the area_label, the label_size is added to the label to avoid label duplication and generate the group_label for each pedestrian.

B. ZONE-BASED K-MEANS CLUSTERING METHOD
The K-Means clustering method requires the number of clusters to be defined initially. Our prior research indicated that K-Means clustering-based pedestrian group detection could achieve higher accuracy than DBSCAN at a lower speed [13]. This happened because we needed to execute the method numerous times to generate the SSE plot and determine the optimal k value for the clustering based on the elbow method. Moreover, since K-Means clustering is a partitional clustering method, it distinguishes the clusters by applying distance functions [73]. When data points are closely distributed


Algorithm 4 The Zone-Based DBSCAN Algorithm
Input: Multiple Bounding Box Coordinates X.
Output: Group Label of Each Object
X is a matrix containing the top-left and bottom-right coordinates of pedestrians' bounding boxes [(x0, y0), (x1, y1)]
1: label_size ← 0
2: distribute_list ← sorted(y1)
3: threshold ← gaps between distribute_list
4: ▷ use default threshold if not enough list length
5: zone_label ← threshold
6: width ← x1 − x0
7: for each area separated by zone_label do
8: eps ← average(width_area) ∗ ratio
9: area_label ← DBSCAN(eps, X_area) + label_size
10: label_size ← max(area_label)
11: group_label ← group_label + area_label
12: end for
13: return group_label

horizontally, K-Means clustering tends to group them into a single cluster, lowering pedestrian group detection accuracy with human-view cameras. In this study, we tried to mitigate this disadvantage by partitioning the image into distinct zones, which allows the method to run fewer iterations and thereby save time.

The proposed Z-KMeans method's process is almost the same as Z-DBSCAN's; the differences lie in the clustering step. Instead of using the DBSCAN clustering method and defining the epsilon, we run the K-Means method several times and generate the k-SSE plot; then, the elbow method is utilized to define the suitable k value.

V. RESULTS AND DISCUSSIONS
In this work, we introduced a framework to detect pedestrian groups. Through this framework, the interaction between group members could be discovered. Additionally, tracking speed increases as the number of tracked objects decreases. Furthermore, it was possible to predict the behavior of osculating pedestrians based on the information provided by the group. The framework starts by detecting pedestrians in selected frames. After that, pedestrian groups were identified using K-Means, DBSCAN, Grid, and our proposed zone-based clustering methods, Z-KMeans and Z-DBSCAN. Finally, object tracking is used to track the pedestrian groups continuously.

A. DATASETS
We used the MOT17Det dataset [68] as the input for our testing. This dataset contains several numbered videos with object coordinate ground truths. MOT17-02 contains a video with 600 frames over 20 seconds, where frames are images in a sequence. It records pedestrian movements on the street from a fixed position, as in a human view. The dataset also contains several ground truth files for each video, which hold the objects' coordinates in each video frame as generated by the dataset creators; the object ID, bounding box coordinates, and object class are used in our research.

We also applied the KITTI dataset [69] in our testing. It is one of the most popular datasets in mobile robotics and autonomous driving. Like the MOT17Det dataset, the KITTI dataset contains 21 videos' continuous frames in its training set and 29 videos' continuous frames in its test set. The videos are recorded through a camera mounted at the front of a moving vehicle. In its 0013 training set, a video with 339 frames is described in PNG images. The training dataset also contains a set of corresponding ground truth files.

B. GROUND TRUTH GENERATION
Since both datasets lack ground truth on group detection, we ran an experiment to determine the group labels of each pedestrian so that the clustering results could be evaluated with the Adjusted Rand Index. The experiment was provided to four annotators, and each had the original video and the videos containing one highlighted pedestrian with its ID. We highlighted the pedestrians based on the ground truth bounding box coordinates. The annotators produced the grouping labels for all detected pedestrians, excluding those hard to discern, after watching the videos numerous times. We assessed inter-annotator agreement on each pair of annotator results using Cohen's kappa score, a measure of consensus among annotators regarding clustering results. Equation 2 gives the calculation of the score, in which p0 stands for the probability of correctly labeling a given sample, and pe represents the expected agreement between two annotators on their labels if they are randomly assigned. Using sklearn.metrics.cohen_kappa_score, an average Cohen's kappa score of 0.9282 for the MOT17Det dataset and 0.9453 for the KITTI dataset was obtained for our experiment, indicating a high level of agreement between the annotators.

K = (p0 − pe) / (1 − pe) (2)

C. EVALUATION METRICS
Evaluation is a process that shows the improvement of a model and determines the effectiveness of selected algorithms. Ahmed [75] classified evaluation measures into internal evaluation, external evaluation, manual evaluation, and indirect evaluation, of which internal evaluation and external evaluation are the most widely used nowadays.

Internal evaluation assesses the clustering based on the data itself, using measures including the Silhouette Score, the Calinski-Harabasz Index, the Davies-Bouldin Index, etc. External evaluation, in contrast, compares the clustering result to existing ground truth, such as class labels; it includes the Rand Index, the confusion matrix, the Jaccard Index, etc. In this research, we applied the Adjusted Rand Index, a corrected-for-chance version of the Rand Index, as the evaluation measure.

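Both agreement scores mentioned above are available in scikit-learn; the toy labelings below are invented for illustration. Note that the ARI is invariant to a permutation of the cluster names, which is what makes it suitable for comparing clusterings:

```python
from sklearn.metrics import adjusted_rand_score, cohen_kappa_score

# Two hypothetical annotators' group labels for six pedestrians.
annotator_a = [0, 0, 1, 1, 2, 2]
annotator_b = [0, 0, 1, 1, 2, 1]   # disagrees on the last pedestrian only
kappa = cohen_kappa_score(annotator_a, annotator_b)

# [1, 1, 0] names the same partition as [0, 0, 1], so the ARI is exactly 1.
ari = adjusted_rand_score([0, 0, 1], [1, 1, 0])
```

Cohen's kappa, in contrast, treats the labels as categories, so it is appropriate for the annotator-agreement check but not for scoring a clustering against ground truth.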

The internal evaluation is used primarily to measure the accuracy of unsupervised clustering methods, and it only evaluates a cluster based on the distances between data points. It is less capable when working on data points at different fields of view. In the example of Fig. 7, pedestrians are identified by numbers located at the top-left corner of the bounding box, while the group label and the included pedestrians are denoted at the bottom-right corner. The pedestrian with ID 6 is close to the pedestrian with ID 4 and is clustered together with it by the K-Means and DBSCAN methods, while our proposed Z-DBSCAN and Z-KMeans methods separate them correctly. However, the internal evaluation cannot distinguish this situation.

Therefore, in this research, we applied the Adjusted Rand Index (ARI), a corrected-for-chance version of the Rand Index, as the evaluation measure. The Rand Index is a metric used to measure the similarity between two clustering results; it analyzes individual points within a cluster to determine whether two clusterings are similar [80]. However, it cannot return a constant value on a random clustering. To address this problem, ARI adjusts for the expected similarity between all pairs of points under random clustering. ARI is defined in Equation 3, where RI indicates the result of the Rand Index, expected(RI) is the expected value of the Rand Index under random labeling, and max(RI) is the maximum possible value of the Rand Index. The Adjusted Rand Index ranges from −1 to 1, where 1 indicates perfect agreement between the clusterings, 0 indicates agreement expected by random chance, and negative values indicate worse than random agreement.

ARI = (RI − expected(RI)) / (max(RI) − expected(RI)) (3)

D. RESULTS COMPARISON
The comparison of ARI results revealed that we achieved the best results and that our proposed Z-DBSCAN method is most in line with the ground truth of pedestrian grouping. The average ARI of the different methods over a ten-frame sequence of the MOT17 dataset and the KITTI dataset is shown in Fig. 8. This figure shows that our proposed Z-DBSCAN clustering achieved the highest ARI score on the MOT17 dataset. For the KITTI dataset, the optimal value of Grid clustering matched the ARI score of our proposed Z-DBSCAN clustering. These comparisons indicate that our proposed Z-DBSCAN clustering method exhibits consistent stability and reliability across varied scenarios.

Notably, the ARI of Z-KMeans did not exhibit ideal performance compared to K-Means on the MOT17-02 dataset. We attribute this to the chosen k value: once we separate the image into several regions, each region contains fewer pedestrians. When we compute the SSE and utilize the elbow method to identify the optimal k value, the SSE plot carries less information, making it challenging for the elbow method to determine the number of clusters accurately. In the KITTI-0013 dataset, however, pedestrians are less densely clustered than in the MOT17-02 dataset, which improves the performance of Z-KMeans clustering.

Fig. 9 illustrates the difference between the original DBSCAN clustering method and the zone-based one. In the figure, the background of the sample image has been removed from each plot, and each pedestrian is represented by a dot located at the lower-center point of the bounding box's lower edge. Dots with identical colors and shapes signify that they belong to the same group as determined by a specific method. As mentioned, only our proposed Z-DBSCAN method correctly grouped the pedestrians at varying depths within the field of view. Taking the pedestrians with IDs 3 and 5 as an example, while DBSCAN clustered them together, our zone-based approach separated them accurately, aligning with the ground truth. This exemplifies the strength of our approach in handling pedestrians at varied depths within a human-view camera perspective.

E. ANALYSIS OF THE EFFECT OF GROUP DETECTION ON OBJECT TRACKING
It is necessary to apply object tracking to our results in order to determine whether group detection is beneficial. In our method, we apply the CSRT tracker, the C++ version of the CSR-DCF, to the detection result, since it gives higher accuracy. Five tests were conducted for each method, and each test's average processing time was recorded. A comparison of the different methods' entire processing times, including both the detection time and the tracking time, is shown in Table 5. From the observations, we find that all methods reduced the processing time of object tracking, and our proposed Z-DBSCAN achieved higher accuracy while minimizing processing time on both datasets.

F. ANALYSIS OF K-MEANS CLUSTERING-BASED GROUP DETECTION
The number of clusters k is required as input to the K-Means method. For each frame that underwent object detection, the method was tested with consecutive integer values of k in a specific window to calculate the SSE, and the elbow value of the SSE plot was chosen as the k value.

Using the first frame of the MOT17-02 dataset as an example, we selected k values between 4 and 8 due to the aforementioned limited number of pedestrians. The resultant SSE is plotted against the k value in Fig. 10. After that, the elbow approach was used to determine the ideal number of clusters; for our situation, k = 6 is appropriate. As shown in Fig. 11, the K-Means method produces a clear and accurate grouping result on the first frame of MOT17-02. However, the K-Means clustering method in pedestrian grouping requires selecting an appropriate k value in each detection period, and it is challenging to determine the window size for finding it.
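The k-selection procedure discussed above can be sketched as follows: run K-Means over a window of k values, record the SSE (scikit-learn's inertia_), and take the elbow as the k just after the largest drop. The synthetic data and the simple largest-drop elbow rule are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated synthetic groups of pedestrian foot coordinates.
data = np.vstack([rng.normal(c, 0.3, size=(20, 2))
                  for c in ((0, 0), (5, 0), (0, 5))])

# SSE (inertia_) for each k in the tested window.
sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
       for k in range(2, 7)}
# Elbow: the k just after the largest drop in SSE.
drops = {k: sse[k] - sse[k + 1] for k in range(2, 6)}
elbow = max(drops, key=drops.get) + 1
```

On data with three clear groups, the SSE collapses between k = 2 and k = 3 and then flattens, so the rule lands on k = 3; on a sparse zone with few pedestrians the curve is much flatter, which is exactly the failure mode discussed above for Z-KMeans.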


FIGURE 7. A comparison of grouping results on a sample frame.

TABLE 5. Pedestrian detection and tracking processing times in seconds.

FIGURE 8. Average evaluation score of different group detection methods over ten frames.

G. ANALYSIS OF DBSCAN CLUSTERING-BASED GROUP DETECTION
The number of clusters, k, is not required by the DBSCAN clustering method since it is density-based. Instead, MinPts and epsilon are required as inputs. Since an individual object can also be seen as a group in our methodology, we set MinPts to 1. Epsilon is determined by the elbow of the curve obtained by sorting the distances between each point and its nearest neighbor. In our method, we used the groups' and pedestrians' coordinates as the input for object tracking. The choice of epsilon thus becomes the main problem in our testing of DBSCAN: a small value prevents dense points from clustering, whereas a large value merges clusters. Therefore, ϵ is dynamically adjusted in our testing, using the elbow value of the sorted distances between each data point and its closest neighbor as epsilon, which makes the method applicable in various circumstances. Fig. 12 shows the result of group detection using the DBSCAN method on the first frame; it can be observed that the clustering was not perfect. We believe the difference in the field of view caused this.
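The epsilon-selection procedure above, sorting each point's distance to its nearest neighbor and taking the elbow of that curve, might be sketched like this (the helper names and sample coordinates are ours, not the paper's):

```python
import numpy as np

def sorted_nn_distances(points):
    """Distance from each point to its nearest neighbor, sorted ascending."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # ignore each point's zero self-distance
    return np.sort(d.min(axis=1))

def elbow_epsilon(nn_dists):
    """Epsilon at the elbow of the sorted distance curve: the point
    farthest from the chord joining the curve's endpoints."""
    y = np.asarray(nn_dists, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    yn = (y - y[0]) / (y[-1] - y[0] + 1e-12)   # normalize; guard flat curves
    dist = np.abs(x - yn) / np.sqrt(2.0)       # distance to the chord yn = x
    return float(y[int(np.argmax(dist))])

# Four paired detections plus one distant outlier (illustrative coordinates):
pts = [[0, 0], [0, 1], [10, 0], [10, 1], [50, 50]]
print(elbow_epsilon(sorted_nn_distances(pts)))  # → 1.0
```

With this epsilon and MinPts = 1, the paired detections cluster while the outlier remains its own single-member group.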


H. ANALYSIS OF GRID CLUSTERING-BASED GROUP DETECTION
Grid clustering, a significant technique in image segmentation, was tested in our study with varying numbers of grid cells. Table 6 provides a comparative analysis of various grid configurations (numbers of columns and rows) for the MOT17-02 dataset, indicating that a 4 × 7 configuration yielded the most favorable results. Similarly, Table 7 for the KITTI dataset reveals that a 6 × 4 configuration was optimal. As a result, these specific configurations were chosen for further testing. Even these best results were not optimal, however, indicating that Grid clustering is inadequate for pedestrian grouping with diverse fields of view.

FIGURE 9. Comparison of the two clustering methods and the ground truth on a sample frame of the MOT17-02 dataset.

FIGURE 10. SSE for different numbers of clusters.

FIGURE 11. Group detection with the K-Means method (k = 6) on a sample frame of the MOT17 dataset.

FIGURE 12. Group detection with the DBSCAN method on a sample frame of the MOT17 dataset.

TABLE 6. Grid clustering comparison on the MOT17 dataset based on ARI score.

TABLE 7. Grid clustering comparison on the KITTI dataset based on ARI score.
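A minimal sketch of grid-based grouping as evaluated here: detections are binned by the grid cell they fall in. The function and its sample arguments are our illustration; the paper's implementation details may differ.

```python
from collections import defaultdict

def grid_clusters(points, width, height, n_cols, n_rows):
    """Group 2-D detections by the grid cell (row, col) they fall in."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        col = min(int(x * n_cols / width), n_cols - 1)   # clamp right edge
        row = min(int(y * n_rows / height), n_rows - 1)  # clamp bottom edge
        cells[(row, col)].append(i)
    return list(cells.values())

# Two nearby detections share a cell; the third lands elsewhere.
print(grid_clusters([(10, 10), (15, 12), (90, 90)], 100, 100, 4, 7))  # → [[0, 1], [2]]
```

Because group membership depends only on fixed cell boundaries, two pedestrians straddling a boundary are split even when they stand close together, which is consistent with the suboptimal results reported above.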

I. ANALYSIS OF OUR PROPOSED Z-DBSCAN CLUSTERING-BASED GROUP DETECTION
After observing the pedestrian clustering results of the DBSCAN method, we found that it did not work perfectly on the image from the human-view camera. Individual pedestrians who were not in a group were always clustered together. In our opinion, the depth of human vision caused this issue.
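The per-zone clustering developed in this section can be sketched as follows. With MinPts = 1, DBSCAN reduces to connected components under epsilon, which the sketch exploits; the fixed zone boundaries and per-zone epsilon values are illustrative inputs, whereas the paper derives its zones and epsilon values from the data.

```python
import numpy as np

def cluster_eps(points, eps):
    """DBSCAN with MinPts = 1: connected components under distance eps."""
    n = len(points)
    labels = list(range(n))
    def find(i):                      # union-find with path halving
        while labels[i] != i:
            labels[i] = labels[labels[i]]
            i = labels[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(np.subtract(points[i], points[j])) <= eps:
                labels[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def zone_dbscan(points, zone_edges, eps_per_zone):
    """Split detections into horizontal zones by y coordinate and cluster
    each zone with its own epsilon (the zone-based idea of this section)."""
    zones = [[] for _ in eps_per_zone]
    for idx, (x, y) in enumerate(points):
        z = np.searchsorted(zone_edges, y, side="right")
        zones[min(z, len(zones) - 1)].append(idx)
    groups = []
    for members, eps in zip(zones, eps_per_zone):
        pts = [points[i] for i in members]
        groups += [[members[i] for i in comp] for comp in cluster_eps(pts, eps)]
    return groups
```

Pedestrians far from the camera (small on-screen gaps) get a small epsilon, while nearby pedestrians (large gaps) get a larger one, so detections of similar magnification are clustered consistently.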


The distance between two pedestrians fluctuated depending on their distance from us: the closer we got, the more space we saw between pedestrians. Therefore, we cannot perform DBSCAN clustering with only one epsilon value. As a result, we enhanced the DBSCAN method by dividing the image into several zones and setting a dynamic epsilon for each zone, which helped us cluster pedestrians of similar magnification. We call this concept Z-DBSCAN. As shown in Fig. 13, our proposed Z-DBSCAN pedestrian clustering efficiently finds pedestrian groups compared with the original DBSCAN result.

FIGURE 13. Group detection with our Z-DBSCAN method on a sample frame of the MOT17 dataset.

J. ANALYSIS OF OUR PROPOSED Z-KMEANS CLUSTERING-BASED GROUP DETECTION
As we mentioned in the Methodology and in the analysis of Z-DBSCAN, clustering-based group detection alone could not separate closely spaced, horizontally distributed pedestrians. The proposed method addresses this by using zone detection to separate pedestrians that are distributed closely in the horizontal direction; the number of clusters is then chosen within each separated zone. We named this method Z-KMeans. Fig. 14 shows the result of group detection using the Z-KMeans method on the first frame of MOT17-02.

FIGURE 14. Group detection with the Z-KMeans method on a sample frame of the MOT17 dataset.

VI. CONCLUSION
Our focus in this study was on detecting and tracking pedestrian groups as collective entities rather than solely on individual pedestrians, because individuals within pedestrian groups frequently walk in close proximity. Furthermore, such practices could facilitate better traffic management and contribute to the prevention of infectious disease spread. This paper presented a zone-based pedestrian group detection method to detect pedestrian groups within the MOT17 and KITTI datasets. In this study, we applied zone detection to separate the image into several horizontal zones and ran the clustering method within each zone, aiming to mitigate potential misclassification arising from varying fields of view. When compared against established methods such as K-Means, DBSCAN, and grid clustering, our proposed zone-based clustering method, Z-DBSCAN, exhibited commendable performance under the ARI evaluation metric, attaining scores of 0.635 on the MOT17 dataset and 0.781 on the KITTI dataset. We also demonstrated that detecting pedestrian groups is more time-efficient than tracking individual pedestrians, yielding a performance improvement ranging from 4.5% to 14.1%.

Enhancing pedestrian group detection with an emphasis on predicting pedestrian trajectories is one potential direction for future work. By harnessing group information, a pedestrian's future direction can be anticipated, even in cases of occlusion. Future work could also examine the adaptability of the proposed method to dynamic urban environments, including changing pedestrian behavior, varying densities, and unexpected obstacles.


MINGZUOYANG CHEN received the B.S. and M.S. degrees in electrical and computer engineering from the University of Detroit Mercy, MI, USA, in 2018 and 2019, respectively, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. From 2020 to 2023, he was a Research Assistant in electrical and computer engineering with the University of Detroit Mercy. His research interests include machine learning and image processing.

SHADI BANITAAN (Member, IEEE) received the B.S. degree in computer science and the M.S. degree in computer and information sciences from Yarmouk University, and the Ph.D. degree in computer science from North Dakota State University. He was an Instructor with the University of Nizwa, from 2004 to 2009. In 2013, he joined the University of Detroit Mercy, where he is currently an Associate Professor and the Director of Computer Science and Software Engineering. His research interests include artificial intelligence, machine learning, and data mining. He is a member of the Association for Computing Machinery (ACM) and the IEEE Computer Society.

MINA MALEKI received the master's degree in software engineering from Tehran Polytechnic University, in 2007, and the Ph.D. degree in computer science from the University of Windsor, Canada, in 2014. She is currently an Assistant Professor in computer science and software engineering with the University of Detroit Mercy, MI, USA. Prior to this role, she was a Sessional Instructor with the University of Windsor, and a SOSCIP TalentEdge Postdoctoral Research Fellow with the Cross-Border Institute (CBI), University of Windsor. Her research interests include machine learning, deep learning, pattern recognition, and data analysis.
