
From classical techniques to convolution-based models: A review of object detection algorithms

FNU Neha, Dept. of Computer Science, Kent State University, Kent, OH, USA
Deepshikha Bhati, Dept. of Computer Science, Kent State University, Kent, OH, USA
Deepak Kumar Shukla, Rutgers Business School, Rutgers University, Newark, NJ, USA
Md Amiruzzaman, Dept. of Computer Science, West Chester University, West Chester, PA, USA

arXiv:2412.05252v1 [cs.CV] 6 Dec 2024

Abstract—Object detection is a fundamental task in computer vision and image understanding, with the goal of identifying and localizing objects of interest within an image while assigning them corresponding class labels. Traditional methods, which relied on handcrafted features and shallow models, struggled with complex visual data and showed limited performance. These methods combined low-level features with contextual information and lacked the ability to capture high-level semantics. Deep learning, especially Convolutional Neural Networks (CNNs), addressed these limitations by automatically learning rich, hierarchical features directly from data. These features include both semantic and high-level representations essential for accurate object detection. This paper reviews object detection frameworks, starting with classical computer vision methods. We categorize object detection approaches into two groups: (1) classical computer vision techniques and (2) CNN-based detectors. We compare major CNN models, discussing their strengths and limitations. In conclusion, this review highlights the significant advancements in object detection through deep learning and identifies key areas for further research to improve performance.

Index Terms—Object Detection, CNN, Deep Learning, Image Processing, Computer Vision

Fig. 1: (A) Single-object sunflower: A single bounding box localizes and classifies the central sunflower bloom. (B) Multiple-object sunflower: Multiple bounding boxes highlight and classify overlapping sunflowers and leaves, illustrating multi-scale object detection and localization within a complex scene.

I. INTRODUCTION

Deep learning (DL) has advanced image analysis, especially in object classification, localization, and detection tasks. In classification, the aim is to assign an image, or an object within it, to one of several categories [1]. However, classification does not provide the object's location. Localization improves on this by identifying both the object's category and position, typically with a bounding box [2], though the precision of these boxes can vary. Object detection further extends classification and localization by detecting and classifying multiple objects in an image, providing bounding boxes for each [2]. A bounding box's top-left corner is represented by (Xmin, Ymin) and its bottom-right corner by (Xmax, Ymax), along with a label indicating the object's class, as shown in Fig. 1.

Object detection has applications across fields such as medical imaging, logo detection, facial recognition, pedestrian detection, and industrial automation. However, challenges arise from image transformations such as changes in scale, orientation, and lighting. While classical computer vision techniques provided a foundation, advancements in deep learning, especially CNNs, have significantly improved detection performance. Modern methods use hierarchical representations, enabling object detection in complex environments with occlusions and varying scales.

Although many studies have reviewed specific deep learning models or object detection applications, few provide a comprehensive overview of both classical computer vision techniques and CNN-based approaches. This paper addresses this gap by offering an analysis of both. Key contributions include:
1) A review of classical computer vision techniques for object detection.
2) An analysis of generic region proposal generation techniques.
3) A detailed review of convolution-based models for object detection, including two-stage and one-stage detectors.

The paper is organized as follows: Section II covers classical computer vision techniques for object detection, Section III discusses region proposal generation, Section IV explores CNN-based detection architectures, Section V reviews applications, Section VI lists popular datasets, Section VII covers evaluation metrics, and Section VIII concludes with future directions.

(The first author contributed the most to this paper. Corresponding author: [email protected])
II. CLASSICAL COMPUTER VISION TECHNIQUES FOR OBJECT DETECTION

Earlier computer vision techniques for image processing, particularly image similarity, relied on feature-based methods [3]–[8]. These methods focused on extracting distinctive image features to reduce computational costs while enabling robust image matching despite transformations like scaling or rotation [3]. The Scale-Invariant Feature Transform (SIFT) algorithm overcame the challenge of scaling by extracting features invariant to scale, rotation, brightness, and contrast [4]. Other feature extractors, like the Canny edge detector, contributed to tasks such as image comparison and panoramic stitching by providing resilience to transformations and occlusions [5]. The Histogram of Oriented Gradients (HOG) technique enabled efficient image analysis by measuring gradient magnitudes and directions, creating descriptive feature vectors [6].

Traditional object detection involves three stages (a minimal sketch of this pipeline appears at the end of this section):
1) Proposal Generation: Scanning the image at various positions and scales to generate candidate bounding boxes, often using methods like sliding windows or selective search algorithms.
2) Feature Extraction: Extracting features from the identified regions to capture relevant visual patterns.
3) Classification: Classifying the extracted features using machine learning algorithms, such as a support vector machine (SVM).

In 2001, Viola and Jones introduced a real-time (webcam-based) facial detection classifier [7]. In 2005, Dalal and Triggs introduced an object detector using HOG features and an SVM classifier, effective across scales but limited by pose variations [6]. In 2008, Felzenszwalb et al. improved on this with the Deformable Part Model (DPM), allowing flexible parts to handle pose variation, though it struggled with overlapping parts in multi-person images [8].

Studies from 2008 to 2012 on popular object detection datasets (see Section VI) showed key limitations in traditional methods. For instance, sliding windows require substantial computational resources and can generate redundant detections. Additionally, the performance of the classifier greatly impacts the results, necessitating more robust approaches.
impacts the results, necessitating more robust approaches. spatial pyramid pooling. ing fine-tuning.
Fast R-CNN (2015) Faster than SPPNet; intro- Relies on selective search
III. GENERIC REGION PROPOSAL GENERATION duces ROI pooling to han- for region proposals, not
TECHNIQUES dle varied input sizes. learned during training.
Faster R-CNN (2015) Uses RPN for fast region Limited in detecting small
Object detection models integrate a bounding box regressor proposals; improves effi- objects due to single fea-
ciency. ture map.
within the classification network to accurately locate objects Mask R-CNN (2017) Adds instance segmenta- High computational de-
[9]. Traditionally, this involves feeding cropped images to tion, detecting objects and mand; struggles with mo-
the localization network, resulting in excessive inputs. An masks simultaneously. tion blur at low resolution.
YOLO (2015) Real-time detection at 45 Poor detection of small
OverFeat model enhances efficiency by using a sliding window fps; single forward pass. objects; produces coarse
detector within convolution layers, scanning images with a features.
SSD (2016) Handles various resolu- Default boxes may not
large filter and stride [10]. However, indiscriminate scanning tions; uses multi-scale fea- match all shapes; possible
of background regions necessitates predicting potential object ture maps for detection. overlapping detections.
locations. Methods such as interest point detection, multiscale
saliency, color contrast, edge detection, and super-pixel clus-
A. Region-based Convolutional Neural Network (R-CNN)
tering are employed for this purpose [11]–[14].
For instance, multiscale saliency leverages the Fast Fourier In 2014, Girshick et al. introduced R-CNN, a two-stage net-
Transform to analyze features at multiple scales [11]; color work that combines classical techniques like selective search
contrast relies on color intensity differences [12]; edge de- with CNNs for object detection [17] (see Fig. 2). R-CNN’s
tection identifies edges, followed by density analysis [13]; training involves three steps:
and super-pixel clustering groups similar pixels for detailed • Fine-tune a pre-trained network (e.g., AlexNet) on region
analysis [14]. proposals generated by selective search.
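To make the classification-plus-localization head described above concrete, the following PyTorch sketch pairs a SoftMax class branch with a 4-value box-regression branch on top of backbone features; the feature dimension, layer sizes, and class count are illustrative assumptions, not any specific published architecture.

import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Toy head: FC layers producing class probabilities and box coordinates."""
    def __init__(self, feat_dim=4096, num_classes=20):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1024)           # shared FC layer
        self.cls = nn.Linear(1024, num_classes + 1)   # classes + background
        self.box = nn.Linear(1024, 4)                 # (x_min, y_min, x_max, y_max)

    def forward(self, features):
        h = torch.relu(self.fc(features))
        class_probs = torch.softmax(self.cls(h), dim=-1)  # SoftMax over classes
        boxes = self.box(h)                               # box regression output
        return class_probs, boxes

head = DetectionHead()
feats = torch.randn(8, 4096)       # stand-in backbone features for 8 regions
probs, boxes = head(feats)         # (8, 21) class probabilities, (8, 4) boxes

Two-stage detectors apply such a head to each proposed region, while one-stage detectors apply it densely across the feature map.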
A. Region-based Convolutional Neural Network (R-CNN)

In 2014, Girshick et al. introduced R-CNN, a two-stage network that combines classical techniques like selective search with CNNs for object detection [17] (see Fig. 2). R-CNN's training involves three steps:
• Fine-tune a pre-trained network (e.g., AlexNet) on region proposals generated by selective search.
• Train an SVM classifier for object classification.
• Use a bounding box regressor to improve localization accuracy.

Fig. 2: R-CNN Architecture

Selective search generates around 2000 region proposals, each resized to 227×227 pixels for CNN input, reducing the computational cost of exhaustive sliding windows.

Initially, R-CNN achieved 44% accuracy, improving to 54% after fine-tuning on warped images. Adding a bounding box regressor boosted accuracy to 58%, and using VGGNet further increased it to 66%. While nine times slower than OverFeat, R-CNN's focus on region proposals reduces false positives, improving accuracy by 10%.

However, R-CNN has some limitations:
• Feature extraction is performed independently for each proposal, resulting in high computational costs.
• The separate stages of proposal generation, feature extraction, and classification prevent end-to-end optimization.
• Selective search relies on low-level visual features, struggles with complex scenes, and does not benefit from GPU acceleration.
• Despite higher accuracy compared to methods like OverFeat, R-CNN is slower due to these inefficiencies.

B. Spatial Pyramid Pooling Net (SPP-Net)

In 2015, He et al. introduced SPP-Net to improve detection speed and feature learning over R-CNN [23]. Unlike R-CNN, which processes each cropped proposal individually, SPP-Net computes the feature map for the entire image and then applies a Spatial Pyramid Pooling (SPP) layer to extract fixed-length feature vectors (see Fig. 3). The SPP layer divides the feature map into grids of varying sizes (N × N), enabling pooling at multiple scales and concatenation of the resulting feature vectors (a toy sketch of this pooling appears after this subsection).

Fig. 3: SPP-Net Architecture

SPP-Net allows multi-scale and varied-aspect-ratio inputs without resizing, preserving image details and improving both accuracy and inference speed over R-CNN. However, its multi-stage training hinders end-to-end optimization and requires extra memory for feature storage. Additionally, the SPP layer does not back-propagate to earlier layers, keeping parameters fixed before the SPP layer and limiting deeper learning.
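The following numpy sketch, a simplified single-channel version with assumed pyramid levels of 1×1, 2×2, and 4×4, shows how max-pooling a feature map at several grid sizes yields a fixed-length vector regardless of input size.

import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into 1x1, 2x2, 4x4 grids and concatenate."""
    outputs = []
    for n in levels:
        ys = np.linspace(0, feature_map.shape[0], n + 1, dtype=int)
        xs = np.linspace(0, feature_map.shape[1], n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[ys[i]:max(ys[i + 1], ys[i] + 1),
                                   xs[j]:max(xs[j + 1], xs[j] + 1)]
                outputs.append(cell.max())   # one max value per grid cell
    return np.array(outputs)                 # length 1 + 4 + 16 = 21, for any input

print(spatial_pyramid_pool(np.random.rand(13, 9)).shape)   # (21,)
print(spatial_pyramid_pool(np.random.rand(40, 64)).shape)  # (21,)

The ROI pooling layer used by Fast R-CNN (next subsection) can be viewed as the single-level case of this operation, applied to a proposal's sub-window of the shared feature map.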
C. Fast Region-based Convolutional Neural Network (Fast R-CNN)

In 2015, Girshick introduced Fast R-CNN, a two-stage detector designed to improve on SPP-Net's limitations [18]. Fast R-CNN computes a feature map for the entire image and uses a Region of Interest (ROI) pooling layer to extract fixed-length features from each region, dividing proposals into a fixed N × N grid. Unlike SPP, ROI pooling backpropagates error signals, enabling end-to-end optimization.

After feature extraction, features pass through FC layers, outputting (1) SoftMax probabilities for C+1 classes (including background) and (2) four bounding box regression parameters. Fast R-CNN achieved better accuracy than R-CNN and SPP-Net but still relied on traditional proposal methods.

D. Faster Region-based Convolutional Neural Network (Faster R-CNN)

In 2015, Ren et al. introduced Faster R-CNN, which utilizes a Region Proposal Network (RPN) to generate object proposals at each feature map position using a sliding window approach (Fig. 4) [19]. This method shares feature extraction across regions, enhancing efficiency and achieving state-of-the-art results. However, the separate computation for region classification can be inefficient with many proposals, and reliance on a single deep feature map makes detecting objects of varying scales difficult: deep features are semantically strong but spatially weak, while shallow features are spatially strong but semantically weak.

Fig. 4: Faster R-CNN Architecture

E. Mask R-CNN

In 2017, He et al. introduced Mask R-CNN, an extension of Faster R-CNN that performs pixel-level instance segmentation [20]. It adds a new branch for binary mask prediction to the two-stage pipeline, alongside class and box predictions. This branch uses a fully convolutional network (FCN) atop the CNN feature map. Mask R-CNN also replaces RoIPool with RoIAlign to better preserve spatial accuracy, enhancing mask precision. However, it struggles to detect objects with motion blur in low-resolution images.
F. You Only Look Once (YOLO)

To increase speed, one-stage models like YOLO (You Only Look Once) were developed, bypassing region proposals. Introduced in 2015 by Redmon et al., YOLO treats detection as a regression task [21]. Dividing the image into an S × S grid, YOLO predicts class probabilities, bounding boxes, and confidence scores per cell (a schematic decoding of this grid output appears at the end of this subsection). This captures context well, reducing false positives, but the grid structure can cause localization errors and struggles with small objects.

YOLO has undergone several iterations, each enhancing its performance:
• YOLOv2/YOLO9000 (2017): Introduced batch normalization and anchor boxes for improved speed and accuracy [24].
• YOLOv3 (2018): Added multi-scale predictions and residual connections for better detection across various sizes [25].
• YOLOv4 (2020): Enhanced with the CSPDarknet backbone and advanced training techniques, achieving higher precision [26].
• YOLOv5 (2021): Focused on usability, scalability, and deployment flexibility with various model sizes [27].
• YOLOv6 (2022): Optimized for edge devices with an improved backbone and attention mechanisms [28].
• YOLOv7 (2023): Employed AutoML techniques for dynamic model optimization, enhancing adaptability [29].
• YOLOv8 (2023): Incorporated a transformer-based backbone for better detection in dense scenes [30].
• YOLOv9 (2024): Utilized adversarial training to improve robustness against variations [31].
• YOLOv10 (2024): Implemented real-time feedback loops for dynamic adjustments, boosting accuracy [32].

These enhancements have established YOLO as a versatile and powerful option for real-time object detection.
G. Single Shot MultiBox Detector (SSD)

The Single Shot MultiBox Detector (SSD), introduced by Liu et al. in 2016, is a one-stage model that improves on YOLO by using anchors with multiple scales and aspect ratios within each grid cell [22]. Each anchor is refined by regressors and assigned probabilities across categories, with object detection predicted on multiple feature maps for different scales. SSD trains end-to-end with a weighted localization and classification loss, integrating results across maps. Using hard negative mining and extensive data augmentation, SSD matches Faster R-CNN's accuracy while allowing real-time inference.

V. APPLICATIONS

Object detection, powered by CNNs, has diverse applications, spanning from targeted advertising to self-driving cars and beyond. It is utilized for handwritten digit recognition, Optical Character Recognition (OCR), face detection, medical image analysis, sports analytics, and more.
• Optical Character Recognition (OCR): OCR converts images of text into machine-encoded text, facilitating tasks such as document digitization, automated data entry, and cognitive computing.
• Self-Driving Cars: Object detection is essential for autonomous vehicles to detect and classify objects such as cars, pedestrians, traffic lights, and road signs.
• Object Tracking: Used in tracking objects in videos, object detection has applications in surveillance, traffic monitoring, and sports analytics.
• Face Detection and Recognition: Widely employed in computer vision, object detection is used for social media image tagging and biometric security systems.
• Object Extraction from Images or Videos: Facilitates segmentation and meaningful representation of images, potentially enabling applications like video object extraction.
• Digital Watermarking: Embeds markers into digital signals for copyright protection and authentication purposes.
• Medical Imaging: Assists clinicians in diagnosis and therapy planning, particularly in tracking anatomical objects.

Object detection technology continues to evolve, promising further advancements and expanding its applications across various industries.

VI. POPULAR DATASETS

Key datasets in object detection include Pascal VOC [33], COCO [34], ImageNet [35], and Open Images [36]. Pascal VOC (Visual Object Classes) offers a manageable size, balancing complexity and computational efficiency, making it ideal for testing. COCO (Common Objects in Context) provides extensive annotations with multiple objects per image, including segmentation and key points. ImageNet, primarily used for classification, also includes object detection annotations. Open Images, with over 600 labeled categories, stands out for its large scale, offering both bounding box annotations and segmentation masks. Table II summarizes the key attributes of each dataset, emphasizing their unique features and primary usage. Table III compares the performance of R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, YOLO, and SSD on these datasets in terms of mAP, inference speed (measured in frames per second, or FPS), and model size.

VII. EVALUATION METRICS

Object detection models are assessed using several key metrics: Intersection over Union (IoU), Mean Average Precision (mAP), Precision, Recall, Confidence Score (CS), F1 Score, and Non-Maximum Suppression (NMS). Table IV summarizes these metrics, highlighting their limitations and potential biases.

A. Intersection over Union (IoU)

IoU measures the overlap between the predicted and ground truth bounding boxes, calculated as the ratio of the intersection area to the union area:

IoU = Area of Intersection / Area of Union
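This formula translates directly into code for axis-aligned boxes; a minimal sketch, assuming boxes in (x_min, y_min, x_max, y_max) form:

def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)            # intersection over union

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common convention).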
TABLE II: Popular Object Detection Datasets

Dataset | Number of Images | Number of Classes | Usage
Pascal VOC | 0.01 million | 20 | Initial model testing
COCO | 0.33 million | 80 | Object detection
ImageNet | 1.5 million | 1,000 | Object localization and detection
Open Images | 9.2 million | 600 | Object localization

TABLE III: Quantitative Performance Comparison of Object Detection Models on Different Datasets

Model | Pascal VOC (mAP) | COCO (mAP) | ImageNet (mAP) | Open Images (mAP) | Inference Speed (FPS) | Model Size (MB)
R-CNN | 66% | 54% | 60% | 55% | ∼5 | 200
Fast R-CNN | 70% | 59% | 63% | 58% | ∼7 | 150
Faster R-CNN | 75% | 65% | 68% | 63% | ∼10 | 180
Mask R-CNN | 76% | 66% | 69% | 64% | ∼8 | 230
YOLO | 72.5% | 58.5% | 61.5% | 57.5% | ∼45–60 | 145
SSD | 75% | 63.5% | 66.5% | 61.5% | ∼19–46 | 145

B. Mean Average Precision (mAP)

mAP evaluates model performance by averaging the Average Precision (AP) across all classes. For a single class, AP is computed as:

AP = (1/n) Σ_{k=1}^{n} (P(k) × Precision at Recall(k))

where P(k) is the change in recall from the previous highest recall, and the precision at recall k is the maximum precision observed at any recall level j where j ≥ k. A short numerical sketch follows.
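A minimal sketch of this computation, assuming detections for one class have already been matched to ground truth and sorted by descending confidence:

import numpy as np

def average_precision(is_correct, num_gt):
    """AP for one class: is_correct flags detections sorted by confidence."""
    tp = np.cumsum(is_correct)
    precision = tp / np.arange(1, len(is_correct) + 1)
    recall = tp / num_gt
    ap, prev_recall = 0.0, 0.0
    for k in range(len(is_correct)):
        # weight = change in recall; precision = max at any recall >= current
        ap += (recall[k] - prev_recall) * precision[k:].max()
        prev_recall = recall[k]
    return ap

# 5 detections (sorted by confidence) against 3 ground-truth objects
print(average_precision(np.array([1, 0, 1, 1, 0]), num_gt=3))  # ≈ 0.833

mAP is then the mean of this per-class value over all classes; benchmark-specific variants (e.g., averaging over several IoU thresholds) build on this same idea.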
C. Precision and Recall

Precision is the ratio of true positives to all positive predictions, while Recall is the ratio of true positives to all ground truth positives.

D. Confidence Score (CS)

The Confidence Score reflects the model's certainty that a predicted bounding box contains the correct object. Higher scores indicate greater accuracy and help set thresholds for accepting or rejecting detections.

E. Non-Maximum Suppression (NMS)

Non-Maximum Suppression refines bounding box predictions by sorting them by confidence score and keeping the highest-scoring box while suppressing boxes that overlap it heavily. This process ensures each object is detected once, improving accuracy and efficiency (a minimal sketch follows).
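A minimal sketch of this greedy procedure, reusing the iou helper from the Section VII-A sketch and an assumed overlap threshold of 0.5:

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop heavy overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                 # highest-confidence remaining box
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))   # [0, 2]: the near-duplicate is dropped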
TABLE IV: Evaluation Metrics: Limitations and Potential Biases of Object Detection Models

Model | Metrics Used | Limitations | Potential Biases
R-CNN | IoU, mAP, Precision, Recall, F1 Score | Separate region proposal step slows inference; high memory usage due to multiple stages. | Favors larger objects due to reliance on selective search; struggles with scale variations and densely packed objects.
Fast R-CNN | IoU, mAP, Precision, Recall, F1 Score | Dependent on external region proposals; not optimized for real-time applications. | Similar biases to R-CNN: prefers larger and well-separated objects; performance drops in high-density scenes.
Faster R-CNN | IoU, mAP, Precision, Recall, F1 Score | More complex architecture with integrated Region Proposal Network (RPN); requires careful hyperparameter tuning. | Favors objects with distinct features detectable by the RPN; limited accuracy on small or thin objects compared to single-shot models.
Mask R-CNN | IoU, mAP, Precision, Recall, F1 Score | Increased computational overhead from mask prediction; longer training times. | Bias towards classes with abundant and detailed segmentation data; misses small or occluded objects in segmentation masks.
YOLO | IoU, mAP, Precision, Recall, Confidence Score | Lower detection accuracy on small objects; struggles with overlapping objects and crowded scenes. | Prioritizes objects at the center of the image; predefined grid may miss objects at image edges.
SSD | IoU, mAP, Precision, Recall, Confidence Score | Performance degrades on very small objects; limited by predefined anchor box scales and aspect ratios. | Bias towards predefined anchor boxes, affecting generalization to unseen scales; struggles with variable object shapes and sizes not covered by anchor boxes.

VIII. DISCUSSION AND FUTURE DIRECTIONS

This review examined prominent object detection models, classifying them into classical computer vision techniques and CNN-based methods. While recent CNN architectures have significantly improved accuracy, reducing error rates to below 5% in some settings, they also increase complexity and resource demands. Traditional models like Deformable Part Models (DPMs) are shallower and more lightweight, making them better suited for edge deployment compared to modern deep learning architectures like AlexNet and VGGNet.

Key future directions for object detection include:
• Speed-Accuracy Trade-off: Enhancing both accuracy and speed for real-time, low-power applications.
• Tiny Object Detection: Improving the detection of small objects in areas such as wildlife monitoring and medical imaging.
• 3D Object Detection: Leveraging 3D sensors for applications in augmented reality and robotics.
• Multi-modal Detection: Integrating visual and textual sources for better accuracy in complex scenarios.
• Few-shot Learning: Developing models that can effectively detect objects from limited examples, particularly in low-resource settings.

This review aims to foster interest in advancing object detection models and to inspire innovation to address current limitations, including minimizing environmental impacts.

ACKNOWLEDGMENT

This study was partly supported by the West Chester University faculty development fund.

REFERENCES

[1] L. Chen, S. Li, Q. Bai, J. Yang, S. Jiang, and Y. Miao, "Review of image classification algorithms based on convolutional neural networks," Remote Sensing, vol. 13, no. 22, p. 4712, 2021.
[2] C. B. Murthy, M. F. Hashmi, N. D. Bokde, and Z. W. Geem, "Investigations of object detection in images/videos using various deep learning techniques and embedded platforms—a comprehensive review," Applied Sciences, vol. 10, no. 9, p. 3280, 2020.
[3] J. Ma, X. Jiang, A. Fan, J. Jiang, and J. Yan, "Image matching from handcrafted to deep features: A survey," International Journal of Computer Vision, vol. 129, no. 1, pp. 23–79, 2021.
[4] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
[5] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6, pp. 679–698, 1986.
[6] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1. IEEE, 2005, pp. 886–893.
[7] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1. IEEE, 2001, pp. I–I.
[8] P. Felzenszwalb, D. McAllester, and D. Ramanan, "A discriminatively trained, multiscale, deformable part model," in 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008, pp. 1–8.

[9] S. Schulter, C. Leistner, P. Wohlhart, P. M. Roth, and H. Bischof, "Accurate object detection with joint classification-regression random forests," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 923–930.
[10] P. Sermanet, "Overfeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint arXiv:1312.6229, 2013.
[11] G. Li and Y. Yu, "Visual saliency based on multiscale deep features," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5455–5463.
[12] K. Fu, C. Gong, J. Yang, and Y. Zhou, "Salient object detection via color contrast and color distribution," in Computer Vision–ACCV 2012: 11th Asian Conference on Computer Vision, Daejeon, Korea, November 5-9, 2012, Revised Selected Papers, Part I 11. Springer, 2013, pp. 111–122.
[13] C. L. Zitnick and P. Dollár, "Edge boxes: Locating object proposals from edges," in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 391–405.
[14] K. Fu, C. Gong, J. Yang, Y. Zhou, and I. Y.-H. Gu, "Superpixel based color contrast and color distribution driven salient object detection," Signal Processing: Image Communication, vol. 28, no. 10, pp. 1448–1463, 2013.
[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, 2012.
[17] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[18] R. Girshick, "Fast R-CNN," arXiv preprint arXiv:1504.08083, 2015.
[19] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2016.
[20] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
[21] J. Redmon, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[22] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016, pp. 21–37.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
[24] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7263–7271.
[25] A. Farhadi and J. Redmon, "YOLOv3: An incremental improvement," in Computer Vision and Pattern Recognition, vol. 1804. Springer Berlin/Heidelberg, Germany, 2018, pp. 1–6.
[26] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
[27] G. Jocher, A. Stoken, A. Chaurasia, J. Borovec, Y. Kwon, K. Michael, L. Changyu, J. Fang, P. Skalski, A. Hogan et al., "ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support," Zenodo, 2021.
[28] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie et al., "YOLOv6: A single-stage object detection framework for industrial applications," arXiv preprint arXiv:2209.02976, 2022.
[29] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464–7475.
[30] G. Jocher, A. Chaurasia, and J. Qiu, "Ultralytics YOLOv8," https://github.com/ultralytics/ultralytics, 2023, AGPL-3.0 license.
[31] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, "YOLOv9: Learning what you want to learn using programmable gradient information," arXiv preprint arXiv:2402.13616, 2024.
[32] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, "YOLOv10: Real-time end-to-end object detection," arXiv preprint arXiv:2405.14458, 2024.
[33] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The Pascal Visual Object Classes (VOC) challenge," International Journal of Computer Vision, vol. 88, pp. 303–338, 2010.
[34] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
[35] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[36] A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, A. Kolesnikov et al., "The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale," International Journal of Computer Vision, vol. 128, no. 7, pp. 1956–1981, 2020.
