PR Project Ankit
Model:
In recent years, object detection has become a crucial task in computer vision with applications
ranging from surveillance to autonomous vehicles. YOLO (You Only Look Once) is a
state-of-the-art real-time object detection system known for its speed and accuracy. This project
aims to leverage YOLO for the detection of moving objects in video streams, enabling
applications such as traffic monitoring, security surveillance, and activity recognition.
The trained model achieved promising results on the test set, demonstrating its ability to
accurately detect and track moving objects in real-time. Quantitative metrics such as precision,
recall, and F1-score were computed to evaluate the performance of the model. Qualitative
evaluation through visual inspection of the detected bounding boxes confirmed the effectiveness
of the model in various scenarios.
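As an illustrative sketch (not the project's actual evaluation script), precision, recall, and F1-score can be computed from true-positive, false-positive, and false-negative counts obtained by matching predicted boxes to ground truth; the counts below are made-up placeholders:

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1-score from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts from matching detections to ground truth (e.g. IoU >= 0.5)
precision, recall, f1 = detection_metrics(tp=80, fp=20, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```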
YOLOv3:
YOLO, short for You Only Look Once, is a state-of-the-art object detection algorithm in
computer vision. It revolutionized object detection by introducing a real-time approach that
processes images in a single pass through a neural network, hence the name "You Only Look
Once." Traditional object detection algorithms often involve multiple passes through the network
or involve complex pipelines for region proposal and classification. YOLO, on the other hand,
treats object detection as a single regression problem, directly predicting bounding boxes and
class probabilities for objects in the input image.
CNN: YOLOv3 utilizes a deep convolutional neural network (CNN) as its backbone, typically
based on Darknet, a custom network architecture designed specifically for YOLO.
Feature Extraction: The backbone network processes the input image to extract hierarchical
features at multiple scales, capturing both fine-grained and coarse information.
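Concretely, YOLOv3 makes predictions at three scales whose feature maps are the input size divided by strides of 32, 16, and 8. A small sketch of the resulting grid sizes (coarse to fine) for the standard 416-pixel input:

```python
# Grid sizes at YOLOv3's three detection scales for a square input.
# Strides 32, 16, and 8 are YOLOv3's standard downsampling factors.
def grid_sizes(input_size=416, strides=(32, 16, 8)):
    return [input_size // s for s in strides]

print(grid_sizes(416))  # coarse 13x13 grid down to a fine 52x52 grid
```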
Detection: YOLOv3's detection head is responsible for predicting bounding boxes and class
probabilities. It consists of multiple convolutional layers that directly output bounding box
coordinates and class scores. YOLO divides the input image into a grid of cells. Each cell is
responsible for predicting bounding boxes and class probabilities for objects whose center falls
within that cell.
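The cell-ownership rule above can be sketched as a small helper; this assumes the object center is given as fractions of the image width and height (a common normalization, not code from this project):

```python
def owning_cell(cx, cy, grid_size):
    """Return (row, col) of the grid cell containing the object center
    (cx, cy), where cx and cy are fractions of image width/height."""
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col

# An object centered at (0.5, 0.25) on a 13x13 grid falls in row 3, column 6
print(owning_cell(0.5, 0.25, 13))
```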
Bounding Box Prediction: For each grid cell, YOLO predicts a fixed number of bounding
boxes, each characterized by its center coordinates, width and height, and confidence score. The
confidence score represents the model's confidence that the bounding box contains an object and
how accurate the box's localization is.
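In YOLOv3 the raw network outputs (tx, ty, tw, th) are decoded relative to the owning cell and an anchor-box prior: the center offsets pass through a sigmoid so they stay inside the cell, while width and height scale the anchor exponentially. A minimal sketch of that decoding, in grid-cell units:

```python
import math

def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Decode raw YOLOv3 outputs into a bounding box in grid-cell units."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = cell_x + sig(tx)          # center x: sigmoid keeps the offset in [0, 1)
    by = cell_y + sig(ty)          # center y
    bw = anchor_w * math.exp(tw)   # width scales the anchor prior
    bh = anchor_h * math.exp(th)   # height scales the anchor prior
    return bx, by, bw, bh

# Zero raw outputs place the center mid-cell and keep the anchor's size
print(decode_box(0, 0, 0, 0, cell_x=6, cell_y=3, anchor_w=3.6, anchor_h=2.4))
```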
Dataset:
1. COCO (Common Objects in Context)
2. Pascal VOC (Visual Object Classes)
3. KITTI
Flowchart:
import cv2
import numpy as np
import os
from google.colab.patches import cv2_imshow
from google.colab import files

# Path to the folder where frames with detected persons will be saved
output_folder = "CCTV_extracted_frames"
os.makedirs(output_folder, exist_ok=True)

# Load the pretrained YOLOv3 network (weight/config file names assumed)
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

def detect_and_extract_frames(video_path, output_folder):
    """Run YOLOv3 on each frame and save frames containing a detected person."""
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Convert the frame to a format suitable for input to the neural network
        blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True,
                                     crop=False)
        net.setInput(blob)
        outs = net.forward(net.getUnconnectedOutLayersNames())
        # Save the frame if any detection is a person (COCO class 0) above threshold
        for out in outs:
            for detection in out:
                scores = detection[5:]
                if np.argmax(scores) == 0 and scores[0] > 0.5:
                    cv2.imwrite(os.path.join(output_folder,
                                             f"frame_{frame_idx}.jpg"), frame)
                    break
            else:
                continue
            break
        frame_idx += 1
    cap.release()
    cv2.destroyAllWindows()
    return output_folder

# Call the function with the video path and output folder
video_path = "cctv_video.mp4"  # placeholder; replace with the actual video path
result_folder = detect_and_extract_frames(video_path, output_folder)

# Now you can work with the result_folder containing the extracted frames
print("Frames extracted and saved in:", result_folder)