
TOPIC: OBJECT DETECTION AND IDENTIFICATION

This mini-project is submitted for the partial fulfilment of internal marks of B.Tech [IT-B, Semester: IV]

By
Shatakshi (2100970130103)
Shubhanshi Misra (2100970130110)
Tanish Agrawal (2100970130115)
Umesh Pandey (2100970130118)
Nevesh Gupta (2100970130068)

To
Dr. Ram Naresh Mahto

Galgotias College of Engineering and Technology


Greater Noida
Date: 15/06/2023
Declaration By Candidates

This is to certify that no part of this mini project has been borrowed from any resources in any form.

Date: 15/06/2023 Name: Shatakshi

TABLE OF CONTENTS
Chapter No. TITLE

ABBREVIATION/ACRONYM
ABSTRACT
1. INTRODUCTION
1.1 Introduction
1.2 About the topic
1.3 Outline of project
2. BACKGROUND
2.1 Operations on image
2.2 Artificial Intelligence and learning
2.3 Machine Learning
2.4 Deep learning
2.5 Convolutional Neural Network (CNN)
2.6 Unified detection model - YOLO
3. METHODOLOGY
4. SYSTEM REQUIREMENTS
5. OPEN COMPUTER VISION
5.1 Libraries used
6. APPLICATION OF OBJECT DETECTION
7. FUTURE WORK
7.1 Future enhancement
8. RESULTS AND DISCUSSION
9. CONCLUSION
REFERENCES
APPENDICES
ABBREVIATION/ACRONYM

1. OD: Object Detection


2. OID: Object Identification
3. CNN: Convolutional Neural Network
4. RCNN: Region-based CNN
5. Fast R-CNN: Fast Region-based CNN
6. Faster R-CNN: Faster Region-based CNN
7. SSD: Single Shot MultiBox Detector
8. YOLO: You Only Look Once
9. DNN: Deep Neural Network
10. ROI: Region of Interest
11. IoU: Intersection over Union
12. VOC: Visual Object Classes
13. COCO: Common Objects in Context
14. mAP: mean Average Precision
15. FPS: Frames Per Second
16. TP: True Positive
17. FP: False Positive
18. FN: False Negative
19. TP-Rate: True Positive Rate
20. FP-Rate: False Positive Rate
ABSTRACT

The world of the twenty-first century is moving steadily towards automation, and this trend shows no sign of abating in the near future. Image recognition leads this movement, which aims to transform the way of life of the typical person, and image processing is the part of robotics development that will allow computers to perceive the world through their new bodies. Real-time object detection is a broad, active, and challenging field of computer vision. Image localization is the task of finding a single object in an image, while object detection is the task of finding several objects, locating instances of semantic object classes in digital photos and videos. Real-time object detection has a wide range of uses, including object tracking, video surveillance, people counting, pedestrian detection, self-driving automobiles, facial recognition, ball tracking in sports, and many more.

Convolutional Neural Networks are an example of a deep learning tool for object detection; our implementation uses OpenCV (Open Source Computer Vision), a library of programming functions aimed primarily at real-time computer vision. For this assignment we designed an object detection algorithm based on YOLO, which stands for "You Only Look Once". Our system used a 21 × 21 grid and was trained on 50,000 images and tested on 10,000 photographs. We also created a text generator that places random text and URLs in an image; a record of pertinent data about the placement of the URLs in the image is kept and later provided to the YOLO algorithm for training.

Keywords: Object detection and recognition, Deep Learning, Neural Network, Image Recognition, YOLO.
REFERENCES

[1] P. Chakravorty, "What Is a Signal? [Lecture Notes]," IEEE Signal Processing Magazine, vol. 35, no. 5, pp. 175-177, Sept. 2018, doi: 10.1109/MSP.2018.2832195.
[2] M. Rouse, "image" [Online]. Available: https://whatis.techtarget.com/definition/image. [Accessed: 01/10/20]
[3] Merriam-Webster, "image" [Online]. Available: https://www.merriam-webster.com/dictionary/image. [Accessed: 01/10/20]
[4] Wikipedia, "Image" [Online]. Available: https://en.wikipedia.org/wiki/Image. [Accessed: 01/10/20]
[5] The Editors of Encyclopedia Britannica, "Image-processing" [Online]. Available: https://www.britannica.com/technology/image-processing. [Accessed: 01/10/20]
[6] BBC Bitesize, "Encoding images" [Online]. Available: https://www.bbc.co.uk/bitesize/guides/zqyrq6f/revision/3. [Accessed: 02/10/20]
[7] Wikipedia, "Object (image processing)" [Online]. Available: https://en.wikipedia.org/wiki/Object_(image_processing). [Accessed: 02/10/20]
[8] P. Ganesh, "Object Detection: Simplified" [Online]. Available: https://towardsdatascience.com/object-detection-simplified-e07aa3830954. [Accessed: 02/10/20]
[9] TensorFlow, "Object detection" [Online]. Available: https://www.tensorflow.org/lite/models/object_detection/overview. [Accessed: 03/10/20]
[10] Wikipedia, "Object detection" [Online]. Available: https://en.wikipedia.org/wiki/Object_detection. [Accessed: 02/10/20]
[11] Fritz, "Object Detection Guide" [Online]. Available: https://www.fritz.ai/object-detection/. [Accessed: 03/10/20]
[12] Wikipedia, "Outline of object recognition" [Online]. Available: https://en.wikipedia.org/wiki/Outline_of_object_recognition. [Accessed: 02/10/20]
[13] D. J. Finnegan, Thesis [Online]. Available: https://ps2fino.github.io/documents/Daniel_J._Finnegan-Thesis.pdf
[14] R. Philippe, Volvo Excavation Site, 2020. [Online]. Available: https://www.korestudios.com/portfolio/volvo-construction-equipment/
[15] AHK, Exemplar Construction Site, 2018. [Online]. Available: https://urbantoronto.ca/news/2018/04/torontos-largest-construction-site-well-spadina-front
[16] V. G. Maltarollo, K. M. Honório, and A. B. F. da Silva, "Applications of artificial neural networks in chemical problems," Artificial Neural Networks - Architectures and Applications, pp. 203-223, 2013.
[17] TutorialsPoint, Supervised Learning, 2020. [Online]. Available: https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_supervised_learning.htm
[18] B. Frank, Deep Learning the Beautiful Mind, 2016. [Online]. Available: www.mindwise-groningen.nl/deep-learning-the-beautiful-mind/
[19] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision. Springer, 2014, pp. 818-833.
[20] P. Firelord, Pictorial example of max-pooling, 2018. [Online]. Available: https://computersciencewiki.org/index.php/Max-pooling_/_Pooling
[21] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
[22] CyberAILab, A Closer Look at YOLOv3, 2018. [Online]. Available: https://www.cyberailab.com/home/a-closer-look-at-yolov3
[23] C. C. Nguyen, G. S. Tran, T. P. Nghiem, N. Q. Doan, D. Gratadour, J. C. Burie, and C. M. Luong, "Towards real-time smile detection based on faster region convolutional neural network," in 2018 1st International Conference on Multimedia Analysis and Pattern Recognition (MAPR). IEEE, 2018, pp. 1-6.
[24] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, and H. Zou, "Multi-scale object detection in remote sensing imagery with convolutional neural networks," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 3-22, 2018.
[25] S. Agarwal, A. Awan, and D. Roth, "Learning to detect objects in images via a sparse, part-based representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, pp. 1475-1490, 2004. doi:10.1109/TPAMI.2004.108
[26] B. Alexe, T. Deselaers, and V. Ferrari, "What is an object?," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (San Francisco, CA: IEEE), 2010, pp. 73-80. doi:10.1109/CVPR.2010.5540226
[27] J. Aloimonos, I. Weiss, and A. Bandyopadhyay, "Active vision," Int. J. Comput. Vis., vol. 1, pp. 333-356, 1988. doi:10.1007/BF00133571
[28] A. Andreopoulos and J. K. Tsotsos, "50 years of object recognition: directions forward," Comput. Vis. Image Underst., vol. 117, pp. 827-891, 2013. doi:10.1016/j.cviu.2013.04.005
[29] H. Azizpour and I. Laptev, "Object detection using strongly-supervised deformable part models," in Computer Vision - ECCV 2012 (Florence: Springer), 2012, pp. 836-849.
[30] G. Azzopardi and N. Petkov, "Trainable COSFIRE filters for keypoint detection and pattern recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, pp. 490-503, 2013. doi:10.1109/TPAMI.2012.106
[31] G. Azzopardi and N. Petkov, "Ventral-stream-like shape representation: from pixel intensity values to trainable object-selective COSFIRE models," Front. Comput. Neurosci., 8:80, 2014. doi:10.3389/fncom.2014.00080
[32] D. Benbouzid, R. Busa-Fekete, and B. Kégl, "Fast classification using sparse decision DAGs," in Proceedings of the 29th International Conference on Machine Learning (ICML-12), eds J. Langford and J. Pineau (New York, NY: Omnipress), 2012, pp. 951-958.
[33] Y. Bengio, "Deep learning of representations for unsupervised and transfer learning," in ICML Unsupervised and Transfer Learning, Volume 27 of JMLR Proceedings, eds I. Guyon, G. Dror, V. Lemaire, G. W. Taylor, and D. L. Silver (Bellevue: JMLR.org), 2012, pp. 17-36.
[34] L. D. Bourdev, S. Maji, T. Brox, and J. Malik, "Detecting people using mutually consistent poselet activations," in Computer Vision - ECCV 2010, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part VI, Volume 6316 of Lecture Notes in Computer Science, eds K. Daniilidis, P. Maragos, and N. Paragios (Heraklion: Springer), 2010, pp. 168-181.
APPENDICES

Appendix A: Object Detection Algorithms


1. Viola-Jones: A classic algorithm that uses Haar-like features and a
cascade of classifiers for real-time object detection.
2. Histogram of Oriented Gradients (HOG): A feature descriptor
method that counts occurrences of gradient orientations in localized
portions of an image.
3. Faster R-CNN: A two-stage algorithm that uses a region proposal
network (RPN) to generate potential object bounding boxes and a
classification network to classify and refine the detections.
4. You Only Look Once (YOLO): A one-stage algorithm that divides
the input image into a grid and predicts bounding boxes and class
probabilities directly from each grid cell.
5. Single Shot MultiBox Detector (SSD): A one-stage algorithm that
uses a series of convolutional layers with different scales to detect
objects at multiple resolutions.
6. Mask R-CNN: An extension of Faster R-CNN that adds a pixel-level
segmentation branch to perform instance segmentation alongside object
detection.
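The grid-based prediction of YOLO (item 4) can be illustrated with a short sketch: for an S × S grid, the cell responsible for predicting an object is the one containing the object's centre. The following is a minimal illustration in plain Python, not the report's actual implementation; the grid size, image dimensions, and box coordinates are arbitrary examples.

```python
def responsible_cell(box, img_w, img_h, S=7):
    """Return the (row, col) grid cell responsible for a box.

    box is (x_min, y_min, x_max, y_max) in pixels. In YOLO-style
    detectors, the grid cell containing the box centre is the one
    that predicts the object's bounding box and class.
    """
    cx = (box[0] + box[2]) / 2.0  # box centre, x
    cy = (box[1] + box[3]) / 2.0  # box centre, y
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# A box centred at (208, 208) in a 416 x 416 image with a 7 x 7 grid
# falls in the middle cell.
print(responsible_cell((176, 176, 240, 240), 416, 416, S=7))  # (3, 3)
```

The `min(..., S - 1)` clamp keeps a centre lying exactly on the right or bottom edge inside the grid.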

Appendix B: Datasets for Object Detection and Identification


1. COCO (Common Objects in Context): A large-scale dataset with 80
object categories and over 200,000 labeled images.
2. Pascal VOC (Visual Object Classes): A dataset that includes 20
object categories and over 11,000 labeled images.
3. ImageNet: A widely used dataset with over 1 million labeled images
across 1,000 object categories.
4. Open Images: A dataset with millions of labeled images covering a
wide range of object categories.
5. KITTI: A dataset specifically designed for autonomous driving,
including labeled images with objects such as cars, pedestrians, and
cyclists.
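Pascal VOC (item 2) distributes its labels as one XML annotation file per image. As a rough illustration of how such files are read, the sketch below parses a minimal VOC-style annotation with Python's standard library; the tag names follow the published VOC layout, but the sample string itself is invented for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample in the Pascal VOC annotation layout.
SAMPLE = """
<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes

print(parse_voc(SAMPLE))
```

Real VOC files carry additional tags (image size, `difficult` and `truncated` flags) that a full loader would also read.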

Appendix C: Evaluation Metrics for Object Detection


1. Intersection over Union (IoU): Measures the overlap between
predicted and ground truth bounding boxes.
2. Precision-Recall (PR) Curve: Plots precision (fraction of correctly
predicted objects) against recall (fraction of ground truth objects
detected).
3. Average Precision (AP): The area under the PR curve, which
summarizes the overall performance of an object detector.
4. Mean Average Precision (mAP): The average AP across multiple
object categories, commonly used as an evaluation metric for object
detection algorithms.
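The IoU metric (item 1) reduces to a few lines of arithmetic on box coordinates. A minimal sketch, assuming boxes in (x_min, y_min, x_max, y_max) pixel format:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max). IoU is 1.0 for identical
    boxes and 0.0 for disjoint ones; a detection is commonly counted
    as a true positive when its IoU with a ground-truth box is >= 0.5.
    """
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # two half-overlapping boxes
```

Here the two 10 × 10 boxes overlap in a 5 × 10 strip, so IoU = 50 / 150 ≈ 0.33.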
