PR Project Ankit
Model:
In recent years, object detection has become a crucial task in computer vision with applications
ranging from surveillance to autonomous vehicles. YOLO (You Only Look Once) is a
state-of-the-art real-time object detection system known for its speed and accuracy. This project
aims to leverage YOLO for the detection of moving objects in video streams, enabling
applications such as traffic monitoring, security surveillance, and activity recognition.
The trained model achieved promising results on the test set, demonstrating its ability to
accurately detect and track moving objects in real-time. Quantitative metrics such as precision,
recall, and F1-score were computed to evaluate the performance of the model. Qualitative
evaluation through visual inspection of the detected bounding boxes confirmed the effectiveness
of the model in various scenarios.
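As an illustrative sketch (not the project's actual evaluation script), precision, recall, and F1-score can be computed from true-positive, false-positive, and false-negative counts obtained by matching predicted boxes to ground truth; the counts below are made-up placeholders:

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1-score from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts from matching detections to ground truth (e.g. IoU >= 0.5)
precision, recall, f1 = detection_metrics(tp=80, fp=20, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```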
YOLOv3:
YOLO, short for You Only Look Once, is a state-of-the-art object detection algorithm in
computer vision. It revolutionized object detection by introducing a real-time approach that
processes images in a single pass through a neural network, hence the name "You Only Look
Once." Traditional object detection algorithms often involve multiple passes through the network
or involve complex pipelines for region proposal and classification. YOLO, on the other hand,
treats object detection as a single regression problem, directly predicting bounding boxes and
class probabilities for objects in the input image.
CNN: YOLOv3 utilizes a deep convolutional neural network (CNN) as its backbone, typically
based on Darknet, a custom network architecture designed specifically for YOLO.
Feature Extraction: The backbone network processes the input image to extract hierarchical
features at multiple scales, capturing both fine-grained and coarse information.
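Concretely, YOLOv3 makes predictions at three scales whose feature maps are the input size divided by strides of 32, 16, and 8. A small sketch of the resulting grid sizes (coarse to fine) for the standard 416-pixel input:

```python
# Grid sizes at YOLOv3's three detection scales for a square input.
# Strides 32, 16, and 8 are YOLOv3's standard downsampling factors.
def grid_sizes(input_size=416, strides=(32, 16, 8)):
    return [input_size // s for s in strides]

print(grid_sizes(416))  # coarse 13x13 grid down to a fine 52x52 grid
```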
Detection: YOLOv3's detection head is responsible for predicting bounding boxes and class
probabilities. It consists of multiple convolutional layers that directly output bounding box
coordinates and class scores. YOLO divides the input image into a grid of cells. Each cell is
responsible for predicting bounding boxes and class probabilities for objects whose center falls
within that cell.
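The cell-ownership rule above can be sketched as a small helper; this assumes the object center is given as fractions of the image width and height (a common normalization, not code from this project):

```python
def owning_cell(cx, cy, grid_size):
    """Return (row, col) of the grid cell containing the object center
    (cx, cy), where cx and cy are fractions of image width/height."""
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col

# An object centered at (0.5, 0.25) on a 13x13 grid falls in row 3, column 6
print(owning_cell(0.5, 0.25, 13))
```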
Bounding Box Prediction: For each grid cell, YOLO predicts a fixed number of bounding
boxes, each characterized by its center coordinates, width and height, and confidence score. The
confidence score represents the model's confidence that the bounding box contains an object and
how accurate the box's localization is.
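In YOLOv3 the raw network outputs (tx, ty, tw, th) are decoded relative to the owning cell and an anchor-box prior: the center offsets pass through a sigmoid so they stay inside the cell, while width and height scale the anchor exponentially. A minimal sketch of that decoding, in grid-cell units:

```python
import math

def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Decode raw YOLOv3 outputs into a bounding box in grid-cell units."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = cell_x + sig(tx)          # center x: sigmoid keeps the offset in [0, 1)
    by = cell_y + sig(ty)          # center y
    bw = anchor_w * math.exp(tw)   # width scales the anchor prior
    bh = anchor_h * math.exp(th)   # height scales the anchor prior
    return bx, by, bw, bh

# Zero raw outputs place the center mid-cell and keep the anchor's size
print(decode_box(0, 0, 0, 0, cell_x=6, cell_y=3, anchor_w=3.6, anchor_h=2.4))
```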
Dataset:
1. COCO (Common Objects in Context)
2. Pascal VOC (Visual Object Classes)
3. KITTI
Flowchart:
import cv2
import numpy as np
import os
from google.colab.patches import cv2_imshow
from google.colab import files

# Path to the folder where frames with detected persons will be saved
output_folder = "CCTV_extracted_frames"
os.makedirs(output_folder, exist_ok=True)

# Load the pretrained YOLOv3 network (weight/config file names assumed)
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

def detect_and_extract_frames(video_path, output_folder):
    """Run YOLOv3 on each frame and save frames containing a detected person."""
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Convert the frame to a format suitable for input to the neural network
        blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True,
                                     crop=False)
        net.setInput(blob)
        outs = net.forward(net.getUnconnectedOutLayersNames())
        # Save the frame if any detection is a person (COCO class 0) above threshold
        for out in outs:
            for detection in out:
                scores = detection[5:]
                if np.argmax(scores) == 0 and scores[0] > 0.5:
                    cv2.imwrite(os.path.join(output_folder,
                                             f"frame_{frame_idx}.jpg"), frame)
                    break
            else:
                continue
            break
        frame_idx += 1
    cap.release()
    cv2.destroyAllWindows()
    return output_folder

# Call the function with the video path and output folder
video_path = "cctv_video.mp4"  # placeholder; replace with the actual video path
result_folder = detect_and_extract_frames(video_path, output_folder)

# Now you can work with the result_folder containing the extracted frames
print("Frames extracted and saved in:", result_folder)