Keywords: Deep learning; Technological innovation; Object detection; Detectors; Real-time systems;
Informatics; Machine intelligence; Indoor scene; Outdoor scene; Generative Adversarial
Network; Amodal perception; Instance segmentation; Compositional models.
Introduction: The complexity of dynamic scenes and the limitations of existing approaches make
occlusion detection and feature handling difficult in computer vision applications. These
difficulties appear in three key areas: occlusion modeling, detection accuracy, and
computational efficiency.
Modeling of Occlusion
To represent occlusions, conventional approaches frequently use semantic image masks, which
require large amounts of training data and intricate generation procedures (Nietiedt et al.,
2024).
Creating high-quality training datasets is harder in dynamic scenes because occlusions are
unpredictable, which in turn reduces reconstruction quality (Nietiedt et al., 2024).
Detection Accuracy
The performance of object detection systems can degrade sharply as the occlusion rate rises,
which is especially problematic in smart video surveillance (Ouardirhi et al., 2023).
In multi-object tracking, trajectories may become fragmented because existing techniques do not
discriminate between overlapping objects (Su et al., 2023).
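To make the notion of an occlusion rate between overlapping objects concrete, the following minimal sketch computes the intersection over union (IoU) of two bounding boxes and the fraction of a target box covered by another box. The function names and the (x1, y1, x2, y2) box format are assumptions made for illustration; they are not taken from the cited works.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a, b):
    """Area of the overlap between two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def occlusion_ratio(target, occluder):
    """Fraction of the target box covered by the occluder box (hypothetical helper)."""
    area = box_area(target)
    return intersection_area(target, occluder) / area if area > 0 else 0.0
```

For example, a target box (0, 0, 10, 10) and an occluder (5, 0, 15, 10) share half of the target's area, so the occlusion ratio is 0.5 while the IoU is one third. A tracker could use such a ratio to flag detections whose identities are likely to be confused.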
Computational Efficiency
High computational demands frequently impede performance in augmented reality systems, making
real-time occlusion processing difficult (Dai et al., 2024).
Newer algorithms, for example those using multi-template update techniques, have improved
tracking speed, but they still struggle with occlusions and large changes in appearance (Sun et
al., 2024).
Despite this progress, the intrinsic complexity of occlusions in real-world scenes remains a
major obstacle and calls for continued research.
Feature-based occlusion detection methods use a range of techniques to improve performance
in challenging conditions. The main ones are contour detection, spatio-temporal matching, and
integration of depth information, which together improve the precision and efficiency of
occlusion handling.
Contour detection
The approach presented by Dai et al. uses contour detection to align real and virtual images,
producing masks in real time for occlusion processing. This approach improves the precision of
edge detection and reduces the computational work required (Dai et al., 2024).
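As a simple illustration of what contour extraction from a mask involves, the sketch below marks the boundary pixels of a binary foreground mask using a 4-neighbour erosion in NumPy. It is a generic boundary operator chosen for illustration, not the method of Dai et al.; the function name `mask_contour` is hypothetical.

```python
import numpy as np


def mask_contour(mask):
    """Return the boundary pixels of a binary mask as a boolean array.

    A foreground pixel is treated as interior when its four direct
    neighbours are also foreground; the contour is foreground minus interior.
    """
    m = np.asarray(mask, dtype=bool)
    # Pad with background so pixels on the image border count as boundary.
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )
    return m & ~interior
```

For a 3x3 foreground square, this returns the eight outer pixels and leaves out the centre pixel. Working on a thin contour rather than the full mask is one reason contour-based alignment can reduce the amount of data to be processed.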
Spatio-temporal matching
Spatio-temporal matching aligns and correlates data or events in both space and time.
Nietiedt et al. propose a robust estimation technique that identifies occlusions in dynamic
scenes without requiring a large amount of training data. The method mitigates the negative
effects of occlusion and improves reconstruction quality, while achieving accuracy comparable
to standard methods (Nietiedt et al., 2024).
Integration of depth and RGB data
Ouardirhi et al. use a Feature Pyramid Network together with depth and RGB data to improve
object detection when objects are partially hidden. This integration makes it easier to
distinguish overlapping objects, leading to a substantial improvement in detection accuracy
(Ouardirhi et al., 2023; Ouardirhi et al., 2024).
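The idea of combining depth with RGB at several scales can be sketched as follows. This is a simplified illustration under stated assumptions: it stacks a min-max normalized depth map onto the RGB channels (early fusion) and builds a plain average-pooled image pyramid. It is not a Feature Pyramid Network, which uses learned convolutional features with lateral connections, and it is not the architecture of Ouardirhi et al. All function names are hypothetical.

```python
import numpy as np


def fuse_rgbd(rgb, depth):
    """Stack an RGB image (H, W, 3, values 0-255) and a depth map (H, W) into (H, W, 4)."""
    rgb = np.asarray(rgb, dtype=np.float32) / 255.0
    depth = np.asarray(depth, dtype=np.float32)
    span = depth.max() - depth.min()
    # Min-max normalize depth to [0, 1]; a constant depth map becomes all zeros.
    depth_norm = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)
    return np.concatenate([rgb, depth_norm[..., None]], axis=-1)


def downsample2x(x):
    """Halve height and width by averaging non-overlapping 2x2 blocks."""
    h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    x = x[: h2 * 2, : w2 * 2]  # drop a trailing row or column if the size is odd
    return x.reshape(h2, 2, w2, 2, c).mean(axis=(1, 3))


def build_pyramid(x, levels=3):
    """Return a list of progressively downsampled copies of x, finest first."""
    pyramid = [x]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid
```

With an 8x8 input and three levels, the pyramid holds arrays of shape (8, 8, 4), (4, 4, 4), and (2, 2, 4). The depth channel gives each level a cue for separating objects that overlap in the image plane but lie at different distances.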
However, obstacles remain, especially in crowded scenes where occlusions occur frequently.
Further work should prioritize detection algorithms that specifically address these persistent
problems (Gregorio et al., 2023).
Literature Review:
Saleh, K., et al. [2021] There has been tremendous progress in object detection thanks to the
capability of deep learning networks. Object detector frameworks have made great strides in
efficiency and accuracy in recent years. However, several factors, occlusion being one of them,
significantly limit their capability compared to humans. Complexity arises from the fact that
occlusion can occur in a wide range of scales, locations, and ratios. In this study, we discuss the
difficulties of handling occlusion in generic object detection in both indoor and outdoor
scenes, and we note the current efforts to solve these problems. Lastly, we go over a
few potential avenues for further study.
Liu, T., et al. [2020] Deep convolutional neural networks have improved pedestrian detection, but
small and occluded pedestrians are still difficult to detect. To address these two difficulties, we
present a coupled-network pedestrian detection approach in this research. The gated multi-layer
feature extraction sub-network adaptively generates discriminative features for pedestrian
candidates to robustly detect pedestrians with large scale variations. Using deformable region of
interest (RoI) pooling, the second sub-network addresses occlusion in pedestrian detection. We
study two gate units for the gated sub-network, the channel-wise and spatio-wise gate units, which
can enhance the representation of regional convolutional features across channel dimensions or
spatial domains. Ablation studies validate the gated multi-layer feature extraction and deformable
occlusion handling sub-networks. With the coupled architecture, our pedestrian detector performs
well on two pedestrian datasets, especially for small or occluded pedestrians.
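A channel-wise gate of the kind mentioned above can be sketched in a few lines. The version below follows the widely used squeeze-and-excitation pattern (global average pooling, a small two-layer mapping, and a sigmoid); this pattern is an assumption made for illustration, since the exact gate design of Liu et al. is not described here. The weight matrices `w1` and `w2` stand in for learned parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(features, w1, w2):
    """Rescale each channel of a (C, H, W) feature map by a gate in (0, 1).

    w1 has shape (C_hidden, C) and w2 has shape (C, C_hidden); both are
    placeholders for parameters that would normally be learned.
    """
    squeezed = features.mean(axis=(1, 2))      # one descriptor per channel, shape (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)    # ReLU bottleneck, shape (C_hidden,)
    gates = sigmoid(w2 @ hidden)               # per-channel gate, shape (C,)
    return features * gates[:, None, None], gates
```

With all-zero weights every gate equals 0.5, so the feature map is simply halved; trained weights would instead emphasize channels that are informative for a given pedestrian candidate and suppress the rest.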
Yuan, Y., et al. [2020] Correlation filters (CFs) combined with convolutional neural network
(CNN) features are effective for object tracking. Nevertheless, the deep features of a
standard CNN without residual structure lack detailed information and are susceptible to
interference from similar objects or background noise. Moreover, CF-based approaches typically
update filters at every frame, even in the presence of occlusion, which can diminish the
capacity to distinguish the target from the background. This research presents a
scale-adaptive method for object tracking. First, features are extracted from several layers of
ResNet to generate response maps. These response maps are then fused using the AdaBoost
algorithm to improve the precision of target localization. Furthermore, to prevent the filters
from updating during occlusion, an update strategy incorporating occlusion detection is
proposed. Finally, a scale filter is employed to estimate the target scale. The experimental
results indicate that the proposed method outperforms various conventional methods,
particularly when dealing with occlusion and scale change.
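The two ideas in this pipeline, fusing several response maps and skipping the filter update when the target appears occluded, can be illustrated with the following minimal sketch. It makes several assumptions: the fusion weights are supplied by the caller rather than learned with AdaBoost, occlusion is flagged with the peak-to-sidelobe ratio (a common heuristic for correlation filters, not necessarily the criterion of Yuan et al.), and the learning rate and threshold values are placeholders.

```python
import numpy as np


def fuse_responses(responses, weights):
    """Weighted average of equally sized 2-D response maps."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, np.stack(responses), axes=1)


def peak_to_sidelobe_ratio(response, exclude=1):
    """Peak height relative to the mean and spread of the area away from the peak."""
    r = np.asarray(response, dtype=float)
    py, px = np.unravel_index(np.argmax(r), r.shape)
    sidelobe = np.ones(r.shape, dtype=bool)
    # Mask out a small window around the peak; the remainder is the sidelobe.
    sidelobe[max(0, py - exclude): py + exclude + 1,
             max(0, px - exclude): px + exclude + 1] = False
    side = r[sidelobe]
    return (r[py, px] - side.mean()) / (side.std() + 1e-12)


def update_filter(old_filter, new_filter, response, lr=0.02, psr_threshold=5.0):
    """Blend in the new filter unless the response suggests occlusion."""
    if peak_to_sidelobe_ratio(response) < psr_threshold:
        return old_filter  # weak, flat response: keep the previous filter
    return (1.0 - lr) * old_filter + lr * new_filter
```

A response map with a single sharp peak yields a high ratio and the filter is updated as usual, whereas a flat response yields a ratio near zero and the previous filter is kept, which avoids learning the occluder's appearance.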
References
[1] Saleh, K., Szénási, S., & Vámossy, Z. (2021). Occlusion handling in generic object detection: A
review. In 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI),
Herľany, Slovakia (pp. 477-484). doi: 10.1109/SAMI50585.2021.9378657.
[2] Liu, T., Luo, W., Ma, L., Huang, J. J., Stathaki, T., & Dai, T. (2020). Coupled network for robust
pedestrian detection with gated multi-layer feature extraction and deformable occlusion handling. IEEE
Transactions on Image Processing, 30, 754-766.
[3] Yuan, Y., Chu, J., Leng, L., Miao, J., & Kim, B. G. (2020). A scale-adaptive object-tracking algorithm
with occlusion detection. EURASIP Journal on Image and Video Processing, 2020, 1-15.