Object Detection Using CNN
crucial task in computer vision [7]. The system will be trained to spot and identify objects of interest precisely.

of the feed-forward portion of the CNN, which is usually sufficient for most image recognition tasks.
search with the edge box technique for region proposal generation, we achieve significantly faster runtimes without sacrificing mean average precision (mAP). Moreover, we simplify the system by eliminating class-specific SVMs, relying instead on the softmax output from the CNN's final layer as our confidence score. Through meticulous training data curation, we ensure precise calibration of the CNN, mitigating any potential performance degradation resulting from the absence of SVMs.

[Aishwarya Sarkale] The field of artificial intelligence is booming, and breakthroughs are happening quickly in many different areas. In particular, image identification and detection are important sub-domains with many applications; AI-powered cars and facial recognition software are just two examples. Because neural networks are used so widely, many sectors benefit from breakthroughs in other areas in addition to their own applications. Object detection, a subfield of computer vision and image processing, looks for instances of semantic objects in digital images, such as people, buildings, and cars. Applications of object detection, including face and pedestrian detection, have been extensively researched, and they have implications for computer vision domains such as image retrieval and video surveillance.

III. METHODOLOGY
The system performs real-time object detection on both a live video stream and an uploaded video file by looping through the frames captured from the video stream.

Object Recognition:
Recognition goes a step further. It involves identifying and classifying the detected objects into specific categories or labels; in other words, recognition tells us what the object in a video or image is. Simply put, detection is about finding objects, while recognition is about understanding and labeling what those objects are.

Object Detection:
Detection is the technique of finding and locating items in a video or image, using object detection techniques like Faster-RCNN and SSD. The output of an object detection method includes bounding boxes surrounding the detected objects and their associated class labels. This data has many uses, such as augmented reality, driverless vehicles, and video surveillance.

Algorithm:
Install TensorFlow.
Download the MobileNet pretrained model to your machine.
Run model prediction by passing in the configuration path to the model.
Preprocess the image.
Assign a target label to the object in the image.
Predict the probability of the target label for each frame.
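The last two steps of the algorithm above (assigning a target label and predicting its probability) can be illustrated with a minimal sketch. It assumes the network's final layer produces one raw score per class and that a softmax turns these scores into a confidence value, as described for the softmax output earlier. The label names, the scores, and the helper names `softmax` and `top_label` are hypothetical and are not taken from the proposed system.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits, labels):
    """Return the most likely label and its probability (confidence)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical labels and scores, for illustration only.
LABELS = ["person", "car", "dog"]
label, confidence = top_label([2.0, 0.5, 0.1], LABELS)
print(label, round(confidence, 3))
```

In a video setting, a function like this would be called once per frame on the scores the model returns for that frame.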
Fig 1 : Process of Proposed System

Camera/Webcam:
First, we collect videos that contain objects. We then upload a video and detect the objects in it.

Extract Frames from Video:
To extract frames from a video, we use the OpenCV video processing library. It breaks a video down into individual frames or images, which can then be saved as separate files for further analysis or use in other applications. Once the frames are extracted, the content of each frame can be analyzed or passed to image processing tasks. It is a useful technique in fields such as computer vision.

R-CNN:
R-CNN stands for “Region-based Convolutional Neural Network”. It is a deep learning model for identifying objects in images. It first generates region proposals; in R-CNN, these are generated using a selective search algorithm, which analyzes the image and identifies potential object regions based on similarities in color, texture, and other visual features. The proposed regions are then passed through the convolutional neural network for further analysis and classification. It is a popular approach in computer vision.

YOLO:
YOLO stands for “You Only Look Once”. It is a real-time object detection system that uses a single neural network to process the entire image, dividing it into regions and predicting probabilities and bounding boxes for each one. This means that YOLO can recognize several objects in a video.
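The way YOLO divides an image into regions, as described above, can be illustrated with a small sketch. It assumes a square grid in which the cell containing an object's centre is responsible for that object. The grid size of 7 is the value used in the original YOLO formulation and is only a default here; the function name `responsible_cell` and the image size are hypothetical.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Map a box centre (in pixels) to its grid cell and in-cell offsets."""
    gx = cx / img_w * grid  # centre position measured in grid units
    gy = cy / img_h * grid
    col = min(int(gx), grid - 1)  # clamp centres on the right or bottom edge
    row = min(int(gy), grid - 1)
    return row, col, gx - col, gy - row

# Hypothetical 448 x 448 image with an object centred at (224, 112).
row, col, x_off, y_off = responsible_cell(224, 112, 448, 448)
print(row, col, x_off, y_off)  # prints: 1 3 0.5 0.75
```

Because every cell makes its own predictions in a single forward pass, several objects in different parts of a frame can be detected at once.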
SSD achieves object detection by using a single neural network that processes an input image and generates a set of bounding box predictions and class probabilities. It does this by dividing the input image into a grid of cells and assigning each cell responsibility for detecting objects. The network then predicts the offsets to adjust default bounding box priors and the corresponding class probabilities for each cell. This allows SSD to detect objects of various sizes and aspect ratios at different locations in the image. The predicted bounding boxes are then filtered based on their confidence scores to obtain the final detection results.

B. Mobile-Net
To avoid the drawbacks of other systems, we propose a model that combines a single shot multibox detector (SSD) with Mobile-Net, TensorFlow, and OpenCV, and that detects objects accurately and robustly.

Mobile-Net is an efficient and lightweight CNN architecture for vision applications. It is built on depth-wise separable convolutions, each of which factors a standard convolution into a depth-wise convolution followed by a point-wise (1×1) convolution. We use depth-wise separable convolutions to build lightweight deep neural networks.

Based on the comparison of deep learning algorithms for object detection, the real-time object detection system was put through thorough testing and achieved an object detection accuracy of 92%. It consistently processed frames at 25 frames per second (FPS) on desktop computers and 15 FPS on embedded systems and smartphones. Compared to R-CNN, YOLO, and SSD, this system performed better in terms of both accuracy and speed.

The latest evaluation results demonstrate the system's robustness and efficiency in various scenarios. Its adaptability to tasks like traffic surveillance and pedestrian detection further supports its potential for applications in autonomous vehicles, surveillance systems, and augmented reality. The system's performance and versatility make it a valuable tool for real-world implementations requiring fast and accurate object detection.
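To make concrete why the depth-wise separable convolutions used in Mobile-Net give a lightweight network, the sketch below compares the number of weights in a standard convolution with that of its depth-wise separable counterpart. It assumes square k × k kernels and ignores bias terms; the layer sizes are hypothetical and are not taken from the proposed system.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise separable convolution (biases ignored)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
print(std, sep, round(std / sep, 1))  # prints: 73728 8768 8.4
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, which is the kind of saving that makes the architecture suitable for embedded systems and smartphones.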
Fig 3 : Uploading a video. Once uploaded, the video appears in the screen below.

VIII. CONCLUSION
In conclusion, this work reflects the progress made in computer vision systems, particularly in object detection and recognition, by combining traditional methods with deep learning models. The reliability of these systems in accurately identifying and categorizing objects, even in complex situations, showcases their potential for practical use. The ability to process frames in real time further boosts their effectiveness in time-critical tasks like surveillance and autonomous vehicles.