Object Detection With Deep Learning: A Review Summary
Object detection is inherited from and closely related to neural networks
and their learning strategies. Progress in this field will advance neural
network algorithms and will, in turn, heavily impact object detection
systems, which can themselves be regarded as learning systems.
Although an exhaustive sliding-window search can find every possible
object location, it is computationally expensive because it produces a
very large number of candidate windows, many of them redundant. On the
other hand, limiting the search to a fixed set of sliding window templates
may produce unsatisfactory regions.
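
To give a feel for why the exhaustive search is expensive, the sketch below
counts how many candidate windows a multi-scale sliding-window scan would
produce. The function names, window sizes, aspect ratios, and stride are
illustrative assumptions, not values taken from the reviewed paper.

```python
def count_windows(img_w, img_h, win_w, win_h, stride):
    """Number of positions a win_w x win_h window can take over the image."""
    if win_w > img_w or win_h > img_h:
        return 0  # the window does not fit at all
    cols = (img_w - win_w) // stride + 1
    rows = (img_h - win_h) // stride + 1
    return cols * rows


def total_windows(img_w, img_h, sizes, ratios, stride):
    """Sum the window counts over every (size, aspect ratio) template."""
    total = 0
    for size in sizes:
        for ratio in ratios:
            # ratio = width / height; the area stays close to size * size
            win_w = int(round(size * ratio ** 0.5))
            win_h = int(round(size / ratio ** 0.5))
            total += count_windows(img_w, img_h, win_w, win_h, stride)
    return total


if __name__ == "__main__":
    # Hypothetical setting: 640x480 image, 3 sizes, 3 aspect ratios, stride 8.
    n = total_windows(640, 480, sizes=(64, 128, 256),
                      ratios=(0.5, 1.0, 2.0), stride=8)
    print("candidate windows:", n)
```

Reducing the number of sizes or ratios shrinks the count, which is the
trade-off described above: fewer templates are cheaper but may miss the
regions that actually fit the objects.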
B. Feature Extraction
C. Classification
Our assignment covers the brief history of deep learning, the basic CNN
structure, generic object detection architectures, and reviews of CNN
applications including object detection, face detection, and pedestrian
detection. It closes with some future guidelines and concluding remarks.
Pooling operations, such as max pooling, average pooling, and L2 pooling,
together with local contrast normalization, summarise the responses within
a receptive field to create a more robust feature description. VGG16 has
13 convolutional layers, 3 FC layers, 5 max-pooling layers, and a softmax
classification layer.
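
As an illustration of how pooling summarises a receptive field, here is a
minimal pure-Python sketch of max, average, and L2 pooling over a single
2-D feature map. The function name `pool2d` and its parameters are
hypothetical; a real network would use a deep learning library instead.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2-D list of numbers with a size x size window."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values inside the current receptive field.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            elif mode == "avg":
                out_row.append(sum(window) / len(window))
            elif mode == "l2":
                out_row.append(sum(v * v for v in window) ** 0.5)
            else:
                raise ValueError("unknown pooling mode: " + mode)
        out.append(out_row)
    return out


if __name__ == "__main__":
    fm = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
    print(pool2d(fm, mode="max"))
    print(pool2d(fm, mode="avg"))
```

Each 2x2 block of the input collapses to one number, so the output is half
the size in each dimension and less sensitive to small shifts of the input.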
Owing to these advantages, CNNs are widely used in research fields such
as image super-resolution reconstruction, image classification, image
retrieval, face recognition, pedestrian detection, and video analysis.
2) SPP-Net:
FC layers must take a fixed-size input. That is why R-CNN warps or crops
each region proposal to the same size.
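
The fixed-size constraint can be met without warping by pooling each
region into a fixed number of bins, which is the idea behind spatial
pyramid pooling. The sketch below is a simplified single-channel version:
the function name and the pyramid levels (1, 2, 4) are assumptions chosen
for illustration, not the exact configuration of SPP-Net.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into n x n bins for each pyramid level.

    The output length is sum(n * n for n in levels), whatever the input size.
    """
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin i covers rows [floor(i*rows/n), ceil((i+1)*rows/n)).
            r0 = (i * rows) // n
            r1 = ((i + 1) * rows + n - 1) // n
            for j in range(n):
                c0 = (j * cols) // n
                c1 = ((j + 1) * cols + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


if __name__ == "__main__":
    small = [[r * 10 + c for c in range(7)] for r in range(5)]
    large = [[r + c for c in range(9)] for r in range(13)]
    # Both inputs give a vector of the same length.
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Because the number of bins depends only on the pyramid levels, feature
maps of different sizes yield vectors of equal length that an FC layer
can accept.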
3) Fast R-CNN:
4) Faster R-CNN: Faster R-CNN adopts anchors of three scales and three
aspect ratios. With Faster R-CNN, region proposal-based CNN architectures
for object detection can be trained end to end. However, the alternating
training algorithm is very time-consuming, and the RPN produces
object-like regions (including backgrounds) rather than object instances
and does not cope well with objects of extreme scales or shapes.
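
The three-scale, three-ratio anchor scheme can be sketched as follows. The
function name is hypothetical, and the default scales (128, 256, 512
pixels) and ratios (1:2, 1:1, 2:1) are assumed here as commonly cited
defaults; this summary itself does not state them.

```python
def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 anchor boxes (x0, y0, x1, y1) centred on (cx, cy).

    Each anchor has area scale * scale and height / width equal to ratio.
    """
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            w = (area / ratio) ** 0.5  # width that keeps the area fixed
            h = w * ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    for box in make_anchors(100, 100):
        print(tuple(round(v, 1) for v in box))
```

In an RPN, a set like this would be placed at every position of the
feature map, and the network would score and refine each anchor.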
5) R-FCN: Recent state-of-the-art image classification networks, such as
ResNets and GoogLeNets, are fully convolutional. With R-FCN, more
powerful classification networks can be adopted to accomplish object
detection in a fully convolutional architecture by sharing nearly all the
layers, and the state-of-the-art results are obtained on both PASCAL VOC
and Microsoft COCO data sets at a test speed of 170 ms per image.
Although many deep learning methods have been proposed, several aspects
still leave room for improvement. In particular, there is still a huge
imbalance between the number of annotated objects and the number of
background examples.
B. Regression/Classification-Based Framework:
C. Experimental Evaluation:
Visual saliency detection is one of the most critical and challenging tasks
in computer vision, aiming to highlight the most dominant object regions
in an image. Numerous applications incorporate visual saliency to improve
their performance, such as image cropping and segmentation, image
retrieval, and object detection.
V. FACE DETECTION:
Some authors trained CNNs with other complementary tasks, such as 3-D
modeling and face landmarks, in a multitask learning manner.
B. Experimental Evaluation
The FDDB data set has 2845 pictures in which 5171 faces are annotated
with an elliptical shape. Here, two types of evaluations are used: the
discrete score and the continuous score.
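
The two scores can be sketched with an intersection-over-union (IoU)
helper. As a simplifying assumption, the code uses axis-aligned boxes
(x0, y0, x1, y1) rather than the elliptical annotations of FDDB, and it
assumes the usual reading that the discrete score counts a detection as
matched when its IoU with the annotation exceeds 0.5, while the continuous
score uses the IoU value itself. All function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def discrete_score(detection, annotation, threshold=0.5):
    """1.0 if the detection overlaps the annotation enough, else 0.0."""
    return 1.0 if iou(detection, annotation) > threshold else 0.0


def continuous_score(detection, annotation):
    """The overlap ratio itself is used as the matching score."""
    return iou(detection, annotation)


if __name__ == "__main__":
    gt = (0, 0, 4, 4)
    det = (0, 0, 4, 3)
    print(discrete_score(det, gt), continuous_score(det, gt))
```

The discrete score rewards any sufficiently overlapping detection equally,
whereas the continuous score also reflects how tightly it fits.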
B. Experimental Evaluation
1) Cascade Network
2) Unsupervised and Weakly Supervised Learning
3) Network Optimization
The third research direction is to extend 2-D object detection to 3-D
object detection and video object detection.