Ex No 06
Object detection is a computer vision task that involves identifying and locating objects in
images or videos. It is an important part of many applications, such as surveillance, self-driving
cars, or robotics. Object detection algorithms can be divided into two main categories: single-shot
detectors and two-stage detectors.
You Only Look Once (YOLO), a single-shot detector, uses an end-to-end neural network that
predicts bounding boxes and class probabilities all at once. This differs from the approach taken
by previous object detection algorithms, which repurposed classifiers to perform detection.
Following a fundamentally different approach to object detection, YOLO achieved state-of-the-art
results, beating other real-time object detection algorithms by a large margin. While algorithms like
Faster R-CNN work by detecting possible regions of interest using a Region Proposal Network
and then performing recognition on those regions separately, YOLO makes all of its predictions
in a single forward pass, with one final fully connected layer producing the outputs.
The first 20 convolutional layers of the model are pre-trained on ImageNet by attaching
a temporary average pooling layer and a fully connected layer. This pre-trained model is then
converted to perform detection, since previous research showed that adding convolutional and
fully connected layers to a pre-trained network improves performance. YOLO’s final fully
connected layer predicts both class probabilities and bounding box coordinates.
YOLO divides an input image into an S × S grid. If the center of an object falls into a grid
cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes
and confidence scores for those boxes. These confidence scores reflect how confident the model is
that the box contains an object and how accurate it thinks the predicted box is.
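The grid assignment described above can be sketched in a few lines. This is only an illustration: the function name responsible_cell, the grid size S = 7, and the 448 x 448 image size are assumed example values, not part of this exercise's program.

```python
def responsible_cell(center_x, center_y, width, height, S=7):
    """Return (row, col) of the S x S grid cell that contains the object center.

    center_x, center_y are pixel coordinates of the object center;
    width, height are the image dimensions in pixels.
    """
    # Scale the center into grid units, then clamp so that a center lying
    # exactly on the right or bottom edge still maps to the last cell.
    col = min(int(center_x / width * S), S - 1)
    row = min(int(center_y / height * S), S - 1)
    return row, col


# Assumed example: 448 x 448 image, object centered at pixel (300, 100)
row, col = responsible_cell(300, 100, 448, 448, S=7)
print(row, col)  # prints: 1 4
```

Here the object center falls in row 1, column 4 of the grid, so that cell is the one responsible for detecting the object.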
YOLO predicts multiple bounding boxes per grid cell. At training time, we only want one
bounding box predictor to be responsible for each object. YOLO assigns one predictor to be
“responsible” for predicting an object based on which prediction currently has the highest
intersection over union (IoU) with the ground truth. This leads to specialization between the
bounding box predictors. Each predictor gets better at predicting certain sizes, aspect ratios,
or classes of objects, improving the overall recall score.
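The selection of the responsible predictor can be sketched as follows. This is a minimal illustration, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples; the helper names iou and responsible_predictor and the sample boxes are hypothetical and chosen only for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_predictor(predicted_boxes, ground_truth):
    """Index of the predicted box with the highest IoU against the ground truth."""
    ious = [iou(box, ground_truth) for box in predicted_boxes]
    return max(range(len(ious)), key=ious.__getitem__)


# Assumed example: one grid cell with B = 2 predicted boxes
ground_truth = (50, 50, 150, 150)
predictions = [(40, 60, 120, 160), (55, 45, 155, 145)]
print(responsible_predictor(predictions, ground_truth))  # prints: 1
```

The second box overlaps the ground truth more closely (IoU of about 0.82 against about 0.54), so its predictor is the one made responsible for this object.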
SOFTWARE REQUIRED:
Ubuntu 14.04, 64-bit
TensorFlow deep learning framework and Python language
GPU: NVIDIA GTX 750, 4 GB
PROGRAM:
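import cv2
import numpy as np
# Load image
image = cv2.imread("image.jpg")
height, width = image.shape[:2]
conf_threshold = 0.5  # assumed value; the original listing does not define it
# Assumed model files; the original listing does not name them
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setInput(cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False))
for output in net.forward(net.getUnconnectedOutLayersNames()):
    for detection in output:
        confidence = float(np.max(detection[5:]))  # best class score for this box
        if confidence > conf_threshold:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            print(center_x, center_y, w, h, confidence)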
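The program computes each box as a center point plus a width and height. OpenCV drawing functions such as cv2.rectangle take corner points instead, so a conversion step is usually needed before drawing. A minimal sketch of that conversion, assuming a hypothetical helper named center_to_corner and made-up sample values:

```python
def center_to_corner(center_x, center_y, w, h):
    """Convert a (center_x, center_y, w, h) box to (x, y, w, h),
    where (x, y) is the top-left corner in pixels."""
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return x, y, w, h


# Assumed example: a 100 x 60 box centered at pixel (200, 150)
print(center_to_corner(200, 150, 100, 60))  # prints: (150, 120, 100, 60)
```

With the top-left corner (x, y) known, the opposite corner is (x + w, y + h).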
RESULT: Thus the program was successfully completed and the outputs were verified.
REFERENCE BOOKS:
1. Bishop, C. M., Pattern Recognition and Machine Learning, Springer, 2006.
2. Manaswi, N. K., Deep Learning with Applications Using Python, Apress, 2018.