Happymonk Data Engineer Intern Assignment

This code implements a real-time video analytics pipeline that processes video frames, applies analytics, and stores results in a database. It defines a Camera class to ingest video streams and extract frame information. Multiple camera streams are processed concurrently by creating a thread for each. Frame details are written to JSON files and a SQLite database. The code also includes functions to retrieve relevant batches and frames based on a timestamp, and create a video file from the retrieved frames.


Task 1:

Source Code

import cv2
import json
import time
import threading
from datetime import datetime
import sqlite3

# Create a connection to the database.
# check_same_thread=False lets the camera threads share this connection;
# a lock below serializes access to it.
conn = sqlite3.connect('video_data.db', check_same_thread=False)
c = conn.cursor()

# Lock to serialize database access across camera threads
db_lock = threading.Lock()

# Create the table (if it does not already exist)
c.execute('''CREATE TABLE IF NOT EXISTS batches
             (batch_id text, starting_frame_id text, ending_frame_id text,
              timestamp text)''')

# Save (commit) the changes
conn.commit()

# Camera class
class Camera:
    def __init__(self, camera_id, geo_location, video_path):
        self.camera_id = camera_id
        self.geo_location = geo_location
        self.video_path = video_path
        self.frame_id = 0

    def process_frame(self, frame):
        # Process the frame and extract information.
        # For simplicity, just write the frame to an image file.
        image_path = f"{self.camera_id}_{self.frame_id}.jpg"
        cv2.imwrite(image_path, frame)
        return {
            "camera_id": self.camera_id,
            "frame_id": self.frame_id,
            "geo_location": self.geo_location,
            "image_path": image_path,
        }

    def ingest_stream(self):
        cap = cv2.VideoCapture(self.video_path)
        while cap.isOpened():
            ret, frame = cap.read()
            if ret:
                info = self.process_frame(frame)
                self.frame_id += 1
                yield info
            else:
                break
        cap.release()

# Function to handle each camera
def handle_camera(camera):
    for info in camera.ingest_stream():
        # Write info to a JSON Lines file (one object per line)
        with open(f"{camera.camera_id}.json", "a") as f:
            json.dump(info, f)
            f.write("\n")

        # Write info to the database, holding the lock for thread safety
        with db_lock:
            c.execute("INSERT INTO batches VALUES (?, ?, ?, ?)",
                      (info["camera_id"], str(info["frame_id"]),
                       str(info["frame_id"]), str(datetime.now())))
            conn.commit()

        time.sleep(1)  # Process one frame per second

# List of cameras
cameras = [
    Camera("camera1", "location1", "video1.mp4"),
    Camera("camera2", "location2", "video2.mp4"),
]

# Create and start a thread for each camera
threads = []
for camera in cameras:
    thread = threading.Thread(target=handle_camera, args=(camera,))
    thread.start()
    threads.append(thread)

# Wait for all threads to finish
for thread in threads:
    thread.join()

# Close the connection to the database
conn.close()
This code does the following:

1. It defines a Camera class that simulates the ingestion of a live video stream and processes each
frame.

2. It creates a separate thread for each camera to handle multiple streams concurrently.

3. It writes the information of each frame to a JSON file and a SQLite database.

Detailed Word document

Real-Time Video Analytics Pipeline

This application is designed to perform real-time analytics on live video streams. It processes video frames in real time, applies analytics, and stores the results in a database. The application is built in Python and uses libraries such as OpenCV for video processing and SQLite for data storage.

Components
The application consists of the following components:
1. Video Stream Ingestion: This component simulates the ingestion of a live video
stream. It uses OpenCV to read video frames from a video file and treats it as a live
stream. It captures frames continuously from the source.
2. Frame Processing: This component takes each incoming video frame and performs
the following actions:
o Processes the frame and creates a JSON object for each frame.
o Extracts relevant information from the processed frame. The JSON object
contains the camera ID, frame ID, geo-location, and the path to the image
file.
o Writes one frame per second as an image file.
3. Batching: This component performs batching of the processed frames based on the
duration value specified in the config file. It creates a dictionary for every batch that
consists of the batch ID, starting frame ID, ending frame ID, and timestamp.
4. Data Storage: This component stores the batch information in a SQLite database. It
creates necessary tables and columns to store the batch information. Every batch
information is logged in the database.
5. Error Handling and Logging: This component implements error handling and logging
mechanisms to capture and handle exceptions that may occur during frame
processing, data storage, or transmission. It ensures that the application logs
relevant information for debugging.
6. Concurrency and Performance: This component modifies the application to handle
multiple camera streams concurrently. It ensures thread safety and avoids race
conditions.
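The batching component described above is not implemented in the Task 1 listing. A minimal sketch of how frame records could be grouped into fixed-size batches might look like the following; `batch_duration` stands in for the config-file value, and the uuid-based batch IDs are an assumption, not part of the original code:

```python
import uuid
from datetime import datetime

def batch_frames(frames, batch_duration=25):
    """Group frame-info dicts into batches of `batch_duration` frames.
    At one frame per second, batch_duration doubles as seconds per batch."""
    batches = []
    for start in range(0, len(frames), batch_duration):
        chunk = frames[start:start + batch_duration]
        batches.append({
            "batch_id": str(uuid.uuid4()),
            "starting_frame_id": chunk[0]["frame_id"],
            "ending_frame_id": chunk[-1]["frame_id"],
            "timestamp": str(datetime.now()),
        })
    return batches

# Example: 60 one-second frames -> batches of 25, 25 and 10 frames
frames = [{"frame_id": i} for i in range(60)]
batches = batch_frames(frames)
print(len(batches))                       # 3
print(batches[0]["starting_frame_id"],
      batches[0]["ending_frame_id"])      # 0 24
```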

Usage
To use the application, you need to create a Camera object for each camera stream. You
need to provide the camera ID, geo-location, and the path to the video file. Then, you can
start the application by creating and starting a thread for each camera.

Here’s an example:
cameras = [
    Camera("camera1", "location1", "video1.mp4"),
    Camera("camera2", "location2", "video2.mp4"),
]

threads = []
for camera in cameras:
    thread = threading.Thread(target=handle_camera, args=(camera,))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

This will start the application and it will begin processing the video streams. The results will
be stored in a SQLite database and a JSON file.
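The error handling and logging component described earlier is likewise not shown in the listing. A hypothetical sketch of wrapping each camera thread's work so that a failure is logged with the camera ID rather than silently killing the thread (`run_with_logging` and the stand-in workers are illustrative, not part of the original code):

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(threadName)s %(levelname)s %(message)s",
)
log = logging.getLogger("pipeline")

def run_with_logging(worker, camera_id):
    """Run a per-camera worker, logging success or the full traceback on
    failure, so one failing stream does not stop the other threads."""
    try:
        worker()
        log.info("camera %s finished", camera_id)
        return True
    except Exception:
        log.exception("camera %s failed", camera_id)
        return False

# Example with stand-in workers
ok = run_with_logging(lambda: None, "camera1")
bad = run_with_logging(lambda: 1 / 0, "camera2")
print(ok, bad)  # True False
```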
Task 2:

Source Code:
import cv2
import json
from datetime import datetime, timedelta
import sqlite3

# Create a connection to the database
conn = sqlite3.connect('video_data.db')
c = conn.cursor()

# Function to get batch information from the database
def get_batches(timestamp, duration):
    # Convert the timestamp string to a datetime object
    timestamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")

    # Get batches that start within the specified duration after the timestamp
    c.execute('''SELECT * FROM batches
                 WHERE timestamp BETWEEN ? AND ?''',
              (str(timestamp), str(timestamp + timedelta(seconds=duration))))
    return c.fetchall()

# Function to get frame information from the JSON file
def get_frames(batch):
    frames = []

    # Open the camera's JSON Lines file
    with open(f"{batch[0]}.json", "r") as f:
        # Read the file line by line (each line is a JSON object)
        for line in f:
            # Parse the JSON object
            frame = json.loads(line)

            # Check if the frame ID is within the batch
            # (frame IDs come back from SQLite as text, so cast to int)
            if int(batch[1]) <= frame["frame_id"] <= int(batch[2]):
                frames.append(frame)

    return frames

# Function to create a video file from the frames
def create_video(frames):
    if not frames:
        return

    # Open the first frame to get the size
    img = cv2.imread(frames[0]["image_path"])
    height, width, layers = img.shape

    # Create a VideoWriter object
    video = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'),
                            25, (width, height))

    # Write the frames to the video file
    for frame in frames:
        img = cv2.imread(frame["image_path"])
        video.write(img)

    # Release the VideoWriter object
    video.release()

# Function to handle user input
def handle_user_input():
    # Get user input
    timestamp = input("Enter the timestamp (YYYY-MM-DD HH:MM:SS): ")
    duration = int(input("Enter the duration of the video file (in seconds): "))

    # Get the batches from the database
    batches = get_batches(timestamp, duration)

    # Get the frames from the JSON file and create a video file for each batch
    for batch in batches:
        frames = get_frames(batch)
        create_video(frames)

# Start the application
handle_user_input()

# Close the connection to the database
conn.close()

This code does the following:


1. It defines several functions to get batch information from the database, get frame
information from a JSON file, create a video file from the frames, and handle user
input.
2. It starts the application by calling the handle_user_input function, which gets user
input, retrieves the relevant batches and frames, and creates a video file for each
batch.
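The frame-retrieval logic can be exercised in isolation. The sketch below filters JSON Lines records by a batch's frame-ID range, with in-memory strings standing in for the camera file; note the cast to `int`, since frame IDs come back from SQLite as text:

```python
import json

def frames_in_range(lines, start_id, end_id):
    """Parse JSON Lines records and keep those whose frame_id falls in
    [start_id, end_id]; the bounds arrive as text from the database."""
    frames = []
    for line in lines:
        frame = json.loads(line)
        if int(start_id) <= frame["frame_id"] <= int(end_id):
            frames.append(frame)
    return frames

# Hypothetical records as they would appear in camera1.json
records = [json.dumps({"camera_id": "camera1", "frame_id": i})
           for i in range(5)]
print([f["frame_id"] for f in frames_in_range(records, "1", "3")])  # [1, 2, 3]
```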

Real-Time Video Analytics Pipeline


This application is designed to perform real-time analytics on live video streams. It processes
video frames in real-time, applies analytics, and stores the results in a database. The
application is built in Python and uses libraries like OpenCV for video processing and SQLite
for data storage.

Components
The application consists of the following components:
1. User Input: This component handles user input. It prompts the user to enter a
timestamp and the duration of the video file.
2. Batch Retrieval: This component retrieves batch information from the database. It
gets all batches that start within the specified duration after the timestamp.
3. Frame Retrieval: This component retrieves frame information from a JSON file. It
reads the JSON file line by line and checks if the frame ID is within the batch.
4. Video Creation: This component creates a video file from the frames. It opens the
first frame to get the size, creates a VideoWriter object, and writes the frames to the
video file.
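The batch-retrieval query relies on an implementation detail worth noting: timestamps are stored as text via `str(datetime.now())`, and because that format is zero-padded, lexicographic comparison with SQLite's `BETWEEN` agrees with chronological order. A self-contained sketch (in-memory database, hypothetical rows):

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory database with the same schema as video_data.db
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''CREATE TABLE batches
             (batch_id text, starting_frame_id text, ending_frame_id text,
              timestamp text)''')

# Three hypothetical batches logged one minute apart
base = datetime(2023, 12, 5, 10, 32, 0)
for i in range(3):
    c.execute("INSERT INTO batches VALUES (?, ?, ?, ?)",
              (f"batch{i}", str(i * 25), str(i * 25 + 24),
               str(base + timedelta(minutes=i))))

# BETWEEN on the text column matches chronological order because
# str(datetime) is zero-padded ("2023-12-05 10:32:00")
c.execute("SELECT batch_id FROM batches WHERE timestamp BETWEEN ? AND ?",
          (str(base), str(base + timedelta(seconds=90))))
matched = [row[0] for row in c.fetchall()]
print(matched)  # ['batch0', 'batch1']
```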

Usage
To use the application, you just need to run the script. It will prompt you to enter a
timestamp and the duration of the video file. You should enter the timestamp in the format
“YYYY-MM-DD HH:MM:SS” and the duration in seconds. The application will then retrieve
the relevant batches and frames and create a video file for each batch.
Here’s an example:
Enter the timestamp (YYYY-MM-DD HH:MM:SS): 2023-12-05 10:32:37
Enter the duration of the video file (in seconds): 60

The application will then retrieve the matching batches from the SQLite database, read the corresponding frames from the JSON files, and write an output video file for each batch.
