Happymonk Data Engineer Intern Assignment
Task 1:
Source Code
import cv2
import json
import time
import threading
from datetime import datetime
import sqlite3

# Connect to the SQLite database and create the batches table
conn = sqlite3.connect('batches.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS batches
             (batch_id text, starting_frame_id text, ending_frame_id text,
              timestamp text)''')
conn.commit()
# Camera class
class Camera:
    def __init__(self, camera_id, geo_location, video_path):
        self.camera_id = camera_id
        self.geo_location = geo_location
        self.video_path = video_path
        self.frame_id = 0

    def process_frame(self, frame):
        # Minimal placeholder: build the per-frame record described below
        # (the full application saves only one frame per second as an image)
        image_path = f"{self.camera_id}_{self.frame_id}.jpg"
        cv2.imwrite(image_path, frame)
        return {
            "camera_id": self.camera_id,
            "frame_id": self.frame_id,
            "geo_location": self.geo_location,
            "image_path": image_path,
        }

    def ingest_stream(self):
        # Read the video file frame by frame, treating it as a live stream
        cap = cv2.VideoCapture(self.video_path)
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            info = self.process_frame(frame)
            self.frame_id += 1
            yield info
        cap.release()
# List of cameras
cameras = [
    Camera("camera1", "location1", "video1.mp4"),
    Camera("camera2", "location2", "video2.mp4"),
]
1. It defines a Camera class that simulates the ingestion of a live video stream and processes each
frame.
2. It creates a separate thread for each camera to handle multiple streams concurrently.
3. It writes the information of each frame to a JSON file and logs batch information in a SQLite database.
This application is designed to perform real-time analytics on live video streams. It processes video
frames in real time, applies analytics, and stores the results in a database. The application is built in
Python and uses libraries such as OpenCV for video processing and SQLite for data storage.
Components
The application consists of the following components:
1. Video Stream Ingestion: This component simulates the ingestion of a live video
stream. It uses OpenCV to read video frames from a video file and treats it as a live
stream. It captures frames continuously from the source.
2. Frame Processing: This component takes each incoming video frame and performs
the following actions:
o Processes the frame and creates a JSON object for each frame.
o Extracts relevant information from the processed frame. The JSON object
contains the camera ID, frame ID, geo-location, and the path to the image
file.
o Writes one frame per second as an image file.
3. Batching: This component performs batching of the processed frames based on the
duration value specified in the config file. It creates a dictionary for every batch that
consists of the batch ID, starting frame ID, ending frame ID, and timestamp.
4. Data Storage: This component stores the batch information in a SQLite database. It
creates the necessary tables and columns to store the batch information, and every
batch is logged in the database (the batching and storage steps are sketched after this list).
5. Error Handling and Logging: This component implements error handling and logging
mechanisms to capture and handle exceptions that may occur during frame
processing, data storage, or transmission. It ensures that the application logs
relevant information for debugging.
6. Concurrency and Performance: This component modifies the application to handle
multiple camera streams concurrently. It ensures thread safety and avoids race
conditions.
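As a rough reference for the batching, data-storage, and error-handling components above, here is a minimal sketch. The make_batches and insert_batch helper names, the batches.db filename, and the exact batch-dictionary layout are assumptions made for illustration; the original source code only shows the table definition.

def make_batches(frames, duration):
    # Group processed frame records into batches of `duration` seconds
    # (the duration value comes from the config file)
    batch, batch_start, batch_id = [], time.time(), 0
    for info in frames:
        batch.append(info)
        if time.time() - batch_start >= duration:
            yield {
                "batch_id": str(batch_id),
                "starting_frame_id": str(batch[0]["frame_id"]),
                "ending_frame_id": str(batch[-1]["frame_id"]),
                "timestamp": str(datetime.now()),
            }
            batch, batch_start, batch_id = [], time.time(), batch_id + 1

def insert_batch(conn, batch):
    # Log one batch (batch ID, frame range, timestamp) in the SQLite database
    try:
        conn.execute("INSERT INTO batches VALUES (?, ?, ?, ?)",
                     (batch["batch_id"], batch["starting_frame_id"],
                      batch["ending_frame_id"], batch["timestamp"]))
        conn.commit()
    except sqlite3.Error as exc:
        # Error handling and logging: record the failure for debugging
        print(f"Failed to store batch {batch['batch_id']}: {exc}")

A trailing partial batch is ignored in this sketch; the full application would flush it when the stream ends.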
Usage
To use the application, create a Camera object for each camera stream, providing the
camera ID, the geo-location, and the path to the video file. Then start the application by
creating and starting a thread for each camera; a minimal sketch of the handle_camera
worker that each thread runs follows the example below.
Here’s an example:
cameras = [
    Camera("camera1", "location1", "video1.mp4"),
    Camera("camera2", "location2", "video2.mp4"),
]
threads = []
for camera in cameras:
    thread = threading.Thread(target=handle_camera, args=(camera,))
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()  # wait for every camera stream to finish
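The handle_camera worker referenced above is not included in the original snippet. The following is one possible minimal version, assuming it consumes Camera.ingest_stream, appends each frame record to a per-camera JSON file, and reuses the make_batches and insert_batch helpers sketched under Components; the filenames and the 10-second batch duration are illustrative.

def handle_camera(camera, batch_duration=10):
    # One worker per camera: ingest frames, write per-frame JSON records,
    # batch the frames, and log every batch in the SQLite database.
    def frames_with_json_log():
        with open(f"{camera.camera_id}_frames.json", "a") as f:
            for info in camera.ingest_stream():
                f.write(json.dumps(info) + "\n")
                yield info

    # Each thread opens its own connection: SQLite connections must not be
    # shared across threads, which also avoids race conditions on the cursor.
    conn = sqlite3.connect("batches.db")
    for batch in make_batches(frames_with_json_log(), batch_duration):
        insert_batch(conn, batch)
    conn.close()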
This will start the application and it will begin processing the video streams. The results will
be stored in a SQLite database and a JSON file.
Task 2:
Source Code:
import cv2
import json
import time
import threading
from datetime import datetime, timedelta
import sqlite3

def get_batches(c, timestamp, duration):
    # Get batches that start within the specified duration after the timestamp
    c.execute('''
        SELECT * FROM batches
        WHERE timestamp BETWEEN ? AND ?
    ''', (str(timestamp), str(timestamp + timedelta(seconds=duration))))
    return c.fetchall()

# Get the frames from the JSON file and create a video file for each batch;
# `batches` is the result of get_batches() for the user-supplied timestamp
# and duration (see Usage below)
for batch in batches:
    frames = get_frames(batch)
    create_video(frames)
Components
The application consists of the following components:
1. User Input: This component handles user input. It prompts the user to enter a
timestamp and the duration of the video file.
2. Batch Retrieval: This component retrieves batch information from the database. It
gets all batches that start within the specified duration after the timestamp.
3. Frame Retrieval: This component retrieves frame information from a JSON file. It
reads the JSON file line by line and checks whether each frame ID falls within the
batch's frame range.
4. Video Creation: This component creates a video file from the frames. It opens the
first frame to get the size, creates a VideoWriter object, and writes the frames to the
video file (both steps are sketched below this list).
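A minimal sketch of the frame-retrieval and video-creation steps described above. The layout of the frames JSON file (one record per line with frame_id and image_path fields), the default file names, the frame rate, and the assumption that a batch row stores its frame-ID range as in Task 1 are all illustrative; the original source code only shows the loop that calls get_frames and create_video.

def get_frames(batch, frames_path="camera1_frames.json"):
    # Read the per-frame JSON records line by line and keep the ones
    # whose frame ID falls inside the batch's frame-ID range.
    _, start_id, end_id, _ = batch
    frames = []
    with open(frames_path) as f:
        for line in f:
            info = json.loads(line)
            if int(start_id) <= int(info["frame_id"]) <= int(end_id):
                frames.append(info)
    return frames

def create_video(frames, out_path="batch.mp4", fps=1.0):
    # Open the first frame to get the size, create a VideoWriter object,
    # and write every frame of the batch into the video file.
    if not frames:
        return
    first = cv2.imread(frames[0]["image_path"])
    height, width = first.shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for info in frames:
        image = cv2.imread(info["image_path"])
        if image is not None:
            writer.write(image)
    writer.release()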
Usage
To use the application, you just need to run the script. It will prompt you to enter a
timestamp and the duration of the video file. You should enter the timestamp in the format
“YYYY-MM-DD HH:MM:SS” and the duration in seconds. The application will then retrieve
the relevant batches and frames and create a video file for each batch.
Here’s an example:
Enter the timestamp (YYYY-MM-DD HH:MM:SS): 2023-12-05 10:32:37
Enter the duration of the video file (in seconds): 60
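With those inputs, the retrieval and video-creation steps could be wired together roughly as follows; the parsing with datetime.strptime, the per-batch output filenames, and the end-to-end wiring are assumptions, since the original snippet does not show this step.

timestamp = datetime.strptime("2023-12-05 10:32:37", "%Y-%m-%d %H:%M:%S")
duration = 60  # seconds

conn = sqlite3.connect("batches.db")
batches = get_batches(conn.cursor(), timestamp, duration)
for batch in batches:
    # batch[0] is the batch ID; write one video file per batch
    create_video(get_frames(batch), out_path=f"batch_{batch[0]}.mp4")
conn.close()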
This will retrieve the matching batches from the SQLite database, collect the
corresponding frames recorded in the JSON file, and create a video file for each batch.