
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI-590018

A Project Report
on
Autocertify - Intelligent video content classification and
adaptation
Submitted in partial fulfillment of the requirements for the final year degree in
Bachelor of Engineering in Computer Science and Engineering
of Visvesvaraya Technological University, Belagavi

Submitted by
Kalle Praveen 1RN21CS073
K Sai Dheeraj 1RN21CS084
M Sasireeth Reddy 1RN21CS090
Manoj Kumar PG 1RN21CS094
Under the Guidance of
Dr. Bhavanishankar K
Professor
Dept. of CSE

Department of Computer Science and Engineering


RNS Institute of Technology
Affiliated to VTU, Recognized by GOK, Approved by AICTE, New Delhi
NAAC ’A+ Grade’ Accredited, NBA Accredited (UG-CSE, ECE, ISE, EIE and EEE)
Channasandra, Dr. Vishnuvardhan Road, Bengaluru-560098
Ph: (080) 28611880, 28611881   URL: www.rnsit.ac.in

2024-2025
RN SHETTY TRUST®
RNS INSTITUTE OF TECHNOLOGY
Affiliated to VTU, Recognized by GOK, Approved by AICTE, New Delhi
NAAC ’A+ Grade’ Accredited, NBA Accredited (UG-CSE, ECE, ISE, EIE and EEE)
Channasandra, Dr. Vishnuvardhan Road, Bengaluru-560098
Ph: (080) 28611880, 28611881   URL: www.rnsit.ac.in
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

Certified that the Project work entitled Autocertify - Intelligent video content classification and
adaptation has been successfully carried out at RNSIT by Kalle Praveen bearing 1RN21CS073,
K Sai Dheeraj bearing 1RN21CS084, M Sasireeth Reddy bearing 1RN21CS090 and Manoj Kumar
PG bearing 1RN21CS094, bonafide students of RNS Institute of Technology, in partial fulfillment
of the requirements of the final year degree in Bachelor of Engineering in Computer Science and
Engineering of Visvesvaraya Technological University, Belagavi, during the academic year 2024-2025.
The Project report has been approved as it satisfies the academic requirements in respect of project
work for the said degree.

—————————                —————————                —————————


Dr. Bhavanishankar K          Dr. Kavitha C               Dr. Ramesh Babu H S
Professor                     Dean and Head               Principal
Dept. of CSE, RNSIT           Dept. of CSE, RNSIT         RNSIT

External Viva

Name of the Examiners                              Signature with Date

1.

2.
Acknowledgement

I extend my profound thanks to the Management of RNS Institute of Technology for
fostering an environment that promotes innovation and academic excellence. I want to express my
gratitude to our beloved Director, Dr. M K Venkatesha, and Principal, Dr. Ramesh Babu H S, for
their constant encouragement and insightful support. Their guidance has been pivotal in keeping me
motivated throughout this endeavour.
My heartfelt appreciation goes to Dr. Kavitha C, Dean and HoD of Computer Science and
Engineering, for her vital advice and constructive feedback, which significantly contributed to shaping
this project.
I also thank the Project Coordinators for their continuous monitoring and for ensuring the process
stayed on schedule. I am deeply grateful to my project guide, Dr. Bhavanishankar K, Professor, for
his invaluable guidance, unwavering support, and valuable suggestions throughout the duration of
this project. Lastly, my thanks go to all the teaching and non-teaching staff members of the Computer
Science and Engineering Department, whose encouragement, cooperation and support have been
invaluable during this journey.

Warm Regards,

Kalle Praveen (1RN21CS073)


K Sai Dheeraj (1RN21CS084)
M Sasireeth Reddy (1RN21CS090)
Manoj Kumar PG (1RN21CS094)

Abstract

The proposed project aims to modernize and automate the video certification process in India by
building a system powered by machine learning methods. Existing challenges such as time-consuming
procedures and subjective biases create the need for a more efficient and unbiased certification
solution. The model leverages machine learning techniques such as Support Vector Machines (SVM),
You Only Look Once (YOLO) and other computer vision models to automate video categorization
based on the guidelines established by the Central Board of Film Certification (CBFC) on violence.
Moreover, it adapts the content to ensure objectionable material is made suitable for audiences
while complying with CBFC regulations. The project consists of data collection, model training,
system integration and user interface design, focusing on improving efficiency, accessibility and
adherence in the video content certification process.

Contents

Acknowledgement i

Abstract ii

List of Figures v

1 INTRODUCTION 1
1.1 Existing System and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Manual Certification by CBFC . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Automated Content Tagging on OTT Platforms . . . . . . . . . . . . . . . . 2
1.1.3 Tools for User-Driven Filtering . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 LITERATURE SURVEY 3
2.1 Relevant recent paper’s summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Conclusion about literature survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3 PROBLEM STATEMENT 5
3.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

4 SYSTEM ARCHITECTURE/BLOCK DIAGRAM 7


4.1 Flow Diagram of SVM and YOLO . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.1.1 Video Classification Process . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.1.2 YOLO Training Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 IMPLEMENTATION 10
5.1 Datasets Explanation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.2 Libraries Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.1 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.2 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5.2.3 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.4 Scikit-learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.5 TensorFlow/Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.6 Joblib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2.7 Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2.8 Openpyxl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.3 Algorithms/Methods/Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6 RESULT AND SNAPSHOTS 18


6.0.1 Certification of a video: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.0.2 Video blurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

7 Conclusion and Future Enhancements 23

References 24
List of Figures

4.1 Flow of Certification Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7


4.2 Flow of Adaptation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5.1 Certification Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

6.1 A-Rated Video Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18


6.2 A-Rated Video Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.3 PG-13 Video Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.4 PG-13 Video Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.5 U-Rated Video Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.6 U-Rated Video Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.7 Input image with visible blood. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.8 Output image with blood blurred. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.9 Input image with no visible blood. . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.10 Output image unchanged. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.11 Input image with visible blood. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.12 Output image with blood blurred. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.13 Input image with visible blood. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.14 Output image with blood blurred. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.15 Input image with no visible blood. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.16 Output image unchanged. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Chapter 1

INTRODUCTION

Digital media in India is growing very fast, and video content on platforms such as OTT services,
YouTube, and social media has become available to everybody. However, it is difficult to ensure
proper grading of, and appropriate access to, this material. The current method of video certification
is very time-consuming, subject to high variability in results, and slow. Observing this situation, the
project offers a new model based on machine learning that is intended to renovate the whole video
certification process. The solution utilizes advanced technologies, including Support Vector
Machines (SVM), to analyse input video frames and categorize them in accordance with the Central
Board of Film Certification guidelines, so that age-based content ratings such as U, U/A,
and A can be determined well in advance. The project gives due importance to consistency, quality,
and efficiency in content certification for Indian movies, bringing the process very close to the CBFC
guidelines. It aims to provide a reliable system for certifying movies while also addressing content
adaptation to make them suitable for diverse audiences. The system performs adaptation by making
use of sophisticated deep learning models such as DeepLabV3 and YOLOv10 to identify and blur
the unwanted parts of video frames, specifically scenes that depict different degrees of violence.
This preserves adherence to the CBFC regulations while keeping the content appealing and apt for the
audience. The system is built on two vital ideas: classification and adaptation. Classification assigns
films to the categories defined by the CBFC (A, PG, U), whereas adaptation ensures that sensitive
content is treated with consideration. Together, these two elements express a commitment not only
to the standards set by the CBFC but also to an improved viewer experience.

1.1 Existing System and Limitations

1.1.1 Manual Certification by CBFC

The Central Board of Film Certification (CBFC) in India manually reviews films using a panel of
experts. Films are classified into categories like U (Universal), UA (Parental guidance), A (Adults
only), and S (Restricted to specific groups). Certification is based on CBFC guidelines covering
themes such as violence, nudity, and language.
Limitations:

• Subjective decision-making due to human biases.

• Time-consuming and labor-intensive process, leading to delays.

• Inconsistent interpretations of guidelines.

1.1.2 Automated Content Tagging on OTT Platforms

Streaming platforms like Netflix and Amazon Prime use metadata and machine learning algorithms
to categorize content by age and sensitivity levels.
Limitations:

• Limited accuracy in detecting nuanced or context-dependent objectionable content.

• Inconsistent standards compared to traditional certification bodies like CBFC.

1.1.3 Tools for User-Driven Filtering

Tools like VidAngel allow users to filter or censor specific elements (e.g., violence, nudity, profanity)
from movies or shows based on personal preferences.
Limitations:

• Requires manual user intervention and customization.

• Does not integrate with centralized certification systems like CBFC.



Chapter 2

LITERATURE SURVEY

1. Machine learning (ML) techniques in automated film censorship and rating systems: Two
distinct but interrelated studies contribute significantly to the evolving landscape of automated content
analysis for film censorship and rating. The first study explores the role of machine learning (ML)
techniques in the Automated Film Censorship and Rating (AFCR) system, specifically focusing on
the use of Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and Long
Short-Term Memory (LSTM) networks to identify explicit content such as violence and nudity in
films. This study aims to enhance the efficiency and accuracy of the traditional manual film censorship
process, which is slow and prone to errors, by utilizing advanced machine learning models for content
classification.

2. Profanity detection and removal in videos using machine learning techniques: Conversely, the
second study investigates profanity detection and removal in videos using machine learning,
specifically focusing on recognizing and eliminating explicit language on video platforms using
techniques such as audio segmentation and lip coordination. This study highlights a key area of
concern in content moderation, offering a practical solution for real-time profanity detection and
video modification.

3. Developing BrutNet: A New Deep CNN Model with GRU for Realtime Violence Detection:
It presents a new deep learning model named BrutNet specifically designed for real-time violence
detection in videos. The model leverages the power of Convolutional Neural Networks (CNNs) for
extracting spatial features and Gated Recurrent Units (GRUs) for capturing temporal dependencies in
video frames, which makes it capable of detecting violent content effectively.

4. Deep Learning-Based Detection of Inappropriate Speech Content for Film Censorship: This
reference shows that deep learning models, particularly those that incorporate advanced natural
language processing (NLP) techniques, offer significant promise in the field of film censorship.
These models can automatically detect and classify inappropriate speech content, making content
moderation systems more efficient, scalable, and accurate.

2.1 Relevant recent paper’s summary


In a more focused analysis, the recent papers from these studies reveal deeper insights into their
respective areas. The first paper, focused on film censorship and rating, demonstrates a growing use of
machine learning techniques such as CNNs and LSTMs to identify inappropriate content in films. The
study emphasizes the significant role of these algorithms in improving the accuracy and speed of the
censorship process. It also explores the challenges of developing an automated system for content
classification and age-appropriate rating.
The second paper, focused on profanity detection, showcases an innovative approach involving
audio analysis and facial recognition to identify and eliminate offensive language. By employing
lip-coordination extraction and machine learning models, this study offers a practical method for
modifying video content and removing objectionable material. It addresses a major concern for online
video platforms, ensuring a safer viewing experience by automating the removal of explicit language.

2.2 Conclusion about literature survey


Together, these studies significantly contribute to the advancement of automated film censorship
and content moderation techniques. The first study highlights the growing potential of machine
learning algorithms such as CNNs and LSTMs in film classification, suggesting further development
in their application for real-time censorship and rating systems. The second study demonstrates the
effectiveness of combining facial recognition and audio analysis for profanity detection, presenting
a promising solution for enhancing content control on video platforms. Both studies advocate for
continued exploration of machine learning methodologies to improve the efficiency and accuracy of
automated content analysis, ensuring safer and more appropriate content for all viewers.



Chapter 3

PROBLEM STATEMENT

The manual process of film censorship is time-consuming, error-prone, and unsustainable given
the vast volume of video content today. This project aims to automate content censorship and
classification using machine learning to detect violent scenes and categorize content accurately. By
aligning with guidelines like the CBFC, it streamlines censorship, reduces human intervention, and
ensures real-time content rating and categorization.

3.1 Objectives
This project aims to develop an automated film censorship and rating system that leverages machine
learning models to:

• Data Collection and Preprocessing: Gather a diverse dataset of videos with varying levels of
violent content and preprocess them to eliminate noise and repetitive frames, ensuring high-
quality input data for the system.

• Content Identification and Classification: Utilize advanced machine learning techniques


like DeepLabV3 for visual analysis, YOLOv3 and YOLOv10 for object detection, and natural
language processing algorithms to detect and classify violent or inappropriate content in videos.

• Adherence to CBFC Guidelines: Ensure that the system follows the CBFC guidelines to
generate film ratings and certifications based on identified content.

• Thorough Reporting: Develop a reporting mechanism that generates detailed reports,


including ratings and timestamps for when violent scenes occur, ensuring transparency and
accuracy in the rating process.

• Content Modification and Adaptation: Modify or adapt the video content by blurring or
removing violent scenes, providing an appropriate version for different age group ratings and
audience categories.



Chapter 4

SYSTEM ARCHITECTURE/BLOCK
DIAGRAM

4.1 Flow Diagram of SVM and YOLO

4.1.1 Video Classification Process

The video classification process is depicted in Figure 4.1, which outlines the flow of the certification
model.

Figure 4.1: Flow of Certification Model

1. Dataset Preparation: The process begins with the collection of a labeled dataset of videos,
ensuring that each predefined category (such as "A-rated," "PG-13," or "U-rated") is adequately
represented. This dataset forms the foundation for the entire classification system, and its diversity is
key to building a robust classifier capable of handling various types of video content.
2. Video Preprocessing: Once the dataset is prepared, the videos undergo preprocessing to
standardize them. In this step, videos are converted into a consistent format and resolution. This
uniformity ensures that all videos, regardless of their original properties, are processed in the same
way, making it easier for the subsequent steps to work effectively.
3. Frame Extraction: After preprocessing, frames are extracted from the videos at regular
intervals, such as one frame per second. This is typically done using tools like OpenCV, which
allow for the efficient extraction of key frames that represent the content of the video. These frames
serve as important snapshots of the video’s overall content and are the primary input for the next stage
of feature extraction.
4. Feature Extraction: The extracted frames are then passed through a pretrained VGG16
deep learning model. This model is used to extract high-level features, such as objects and scene
characteristics, from the frames. These features are crucial as they encapsulate the key visual
information needed to categorize the video content accurately.
5. Training: With the features extracted from the frames, an SVM (Support Vector Machine)
classifier is trained. The classifier learns to distinguish between the predefined categories based on
the features it receives. This training step allows the model to develop the ability to classify new
frames by learning from the patterns observed in the dataset.
6. Evaluation: After training, the model’s performance is evaluated using various metrics, such
as Precision, Recall, and Accuracy. These metrics help assess how well the classifier performs in
predicting the correct categories. Based on these evaluation results, adjustments may be made to
improve the system’s performance, such as tuning the model or retraining with a more refined dataset.
7. Use Case: Once the model is trained and fine-tuned, it is ready for deployment in real-
world applications. The trained system is capable of accepting new video inputs, extracting relevant
features from them, and classifying them into the appropriate categories based on their content. This
functionality makes the system suitable for use in various scenarios, such as content moderation,
video recommendation systems, or automated content tagging.
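As a minimal sketch of the evaluation described in step 6 above, the snippet below computes Accuracy together with per-class Precision and Recall using scikit-learn; the variables y_test and y_pred are placeholders for the held-out labels and the classifier's predictions rather than names taken from the project code.

from sklearn.metrics import accuracy_score, classification_report

def report_metrics(y_test, y_pred):
    # Overall fraction of correctly classified frames.
    print("Accuracy:", accuracy_score(y_test, y_pred))
    # Per-class precision, recall and F1-score for A-rated, PG-13 and U-rated.
    print(classification_report(y_test, y_pred, digits=3))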



4.1.2 YOLO Training Process

The YOLO training process is illustrated in Figure 4.2, which outlines the flow of the adaptation
model.

Figure 4.2: Flow of Adaptation Model

The steps are described as follows:

1. Prepare the Dataset: Collect the required data, including images and their corresponding
annotations (bounding boxes and labels). Structure the data to meet YOLO’s input format
requirements.

2. Preprocessing: Resize images to the desired input size, normalize pixel values for consistency
during training, and ensure the annotations match the resized images.

3. Model Setup: Load a pre-trained YOLO model as a starting point to leverage transfer learning.
Configure the model for the specific dataset, including setting up class labels and data paths.

4. Training: Train the model using a configuration file that specifies dataset details. Adjust
hyperparameters such as batch size, epochs, and learning rate to optimize performance.

5. Model Saving: Save the trained model for future inference or further fine-tuning. The saved
model can be deployed to detect objects in new images or videos.

This step-by-step process ensures the YOLO model is effectively trained for customized datasets,
providing accurate object detection results.
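The sketch below illustrates this training flow with the Ultralytics YOLO API; the checkpoint name, dataset configuration path and hyperparameter values are illustrative assumptions rather than the exact settings used in this project.

from ultralytics import YOLO

# Start from a pre-trained checkpoint to leverage transfer learning
# ("yolov10n.pt" is an assumed file name; any compatible weights work).
model = YOLO("yolov10n.pt")

# Train on the custom dataset described by config.yaml (paths and class labels).
# Batch size, epoch count and image size are illustrative hyperparameters.
model.train(data="config.yaml", epochs=50, batch=16, imgsz=640)

# Ultralytics writes the trained weights (best.pt / last.pt) under its runs/
# directory; these can be loaded later for inference on new images or videos.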



Chapter 5

IMPLEMENTATION

5.1 Datasets Explanation


Figure 5.1 shows the certification dataset, which collects violent video clips and categorizes the type
of violence present in each.

Figure 5.1: Certification Dataset

Dataset Attributes

Types of Violence                      Labeling (Degree of Violence)
Code   Description                     Label   Description
G      Gunshots                        0       UA/Parental Guidance
F      Fighting                        1       A/Adult Rated
CA     Car Accident                    2       U/Kids
B      Blood
C      Chopping
S      Stabbing
E      Explosion
NA     Non-Violent Clips

Other Attributes
Attribute   Description
Duration    Total length of the video in seconds.
Fps         Total number of frames in the video (each second divided into 10 frames).


5.2 Libraries Used

5.2.1 Pandas

Pandas is a popular Python library for data manipulation and analysis. It was used in this project to
handle tabular data, such as loading and processing video-related information stored in DataFrames.
The library is particularly useful for data cleaning, transformation, and analysis.

5.2.2 OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine
learning software library. It provides tools for real-time image and video processing. In this project,
OpenCV was used for video frame extraction, reading and writing video files, and performing various
image processing tasks such as feature extraction, object detection, and segmentation.

5.2.3 NumPy

NumPy is a fundamental library for numerical computations in Python. It provides support for arrays
and matrices, along with mathematical functions to operate on them. In this project, NumPy was used
for handling arrays and matrices, particularly in image processing and manipulating data structures.

5.2.4 Scikit-learn

Scikit-learn is a machine learning library for Python. It provides simple and efficient tools for data
mining and data analysis. In this project:

• train_test_split was used to split the dataset into training and testing sets.

• SVC (Support Vector Classifier) was used to train the SVM (Support Vector Machine) model
for violence detection classification.

• LabelEncoder was used to encode the categorical labels into numeric form for model
training.

5.2.5 TensorFlow/Keras

TensorFlow is an open-source machine learning framework, and Keras is a high-level neural networks
API. The following Keras modules were used:



• VGG16: A pre-trained deep learning model used for feature extraction from video frames. This
model was used to extract relevant features to classify the frames based on violence content.

• Model: Used to build and customize the model from VGG16 by excluding the top layer for
feature extraction.

• img_to_array: Converts images to NumPy arrays, which is essential for preprocessing input
images for the model.

• preprocess_input: Preprocessing function for preparing the input data to be compatible
with VGG16, including scaling the pixel values appropriately.

5.2.6 Joblib

Joblib is a Python library used for serializing and deserializing Python objects, such as machine
learning models. In this project, it was used to save the trained SVM model and label encoder to disk
(dump) and to load them back during prediction (load).

5.2.7 Collections

The Counter class from the collections module was used to count the frequency of different
predicted labels. This helps determine the final classification label for a video based on the majority
prediction from the frames.

5.2.8 Openpyxl

Openpyxl is a library used for reading and writing Excel files in Python. In this project, it may have
been used (based on the pip install command) for handling Excel files to store or retrieve data
associated with video processing.



5.3 Algorithms/Methods/Pseudocode

SVM for Violence Detection and Classification of a Video

Function: extract_features(frame, feature_extractor)

• Resize frame to 224x224.

• Convert frame to array and preprocess.

• Extract features using feature extractor.

• Flatten and return features.

Explanation: Resizes, preprocesses, and extracts features from a video frame using the VGG16
model.
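A minimal Python sketch of this function is given below, assuming the Keras VGG16 model described in Section 5.2.5 is used as the feature extractor; apart from the names taken from the pseudocode, the details are illustrative.

import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array

# VGG16 without its top classification layers acts as a fixed feature extractor.
feature_extractor = VGG16(weights="imagenet", include_top=False)

def extract_features(frame, feature_extractor):
    # Resize the frame to the input size expected by VGG16.
    frame = cv2.resize(frame, (224, 224))
    # Convert to an array, add a batch dimension and apply VGG16 preprocessing.
    x = img_to_array(frame)
    x = preprocess_input(np.expand_dims(x, axis=0))
    # Extract high-level features and flatten them into a 1-D vector for the SVM.
    features = feature_extractor.predict(x, verbose=0)
    return features.flatten()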

Function: extract_features_from_videos(df)

• Load VGG16 model (exclude top layer).

• Initialize lists X (features) and y (labels).

• For each video in DataFrame:

– Open video, read frames, and extract features every second.

– Append features and labels to X and y.

• Return X and y.

Explanation: Extracts features from frames of multiple videos, samples one frame per second, and
stores features and labels.
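A sketch of the per-video loop follows, reusing the extract_features function above; the DataFrame columns path and label are assumed names, since the report does not list the actual column headers.

import cv2

def extract_features_from_videos(df, feature_extractor):
    X, y = [], []
    for _, row in df.iterrows():                     # 'path' and 'label' are assumed columns
        cap = cv2.VideoCapture(row["path"])
        fps = int(cap.get(cv2.CAP_PROP_FPS)) or 1    # guard against a reported fps of 0
        frame_count = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_count % fps == 0:               # sample roughly one frame per second
                X.append(extract_features(frame, feature_extractor))
                y.append(row["label"])
            frame_count += 1
        cap.release()
    return X, y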

Function: train_model(X, y)

• Encode labels (A-rated → 1, PG-13 → 0, U-rated → -1).

• Split data into training and testing sets.

• Train SVM model on training data.

• Save model and label encoder.



• Evaluate model accuracy on the test set.

• Return trained model.

Explanation: Encodes labels, trains an SVM classifier on extracted features, and saves the trained
model and encoder.
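A minimal training sketch under these assumptions follows; the kernel choice, test split and file names are illustrative, and LabelEncoder assigns its own integer codes rather than the exact -1/0/1 mapping mentioned above.

import joblib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def train_model(X, y):
    # Encode the string labels (A-rated / PG-13 / U-rated) as integers.
    label_encoder = LabelEncoder()
    y_enc = label_encoder.fit_transform(y)

    X_train, X_test, y_train, y_test = train_test_split(
        np.array(X), y_enc, test_size=0.2, random_state=42)

    # The kernel is an assumption; the report only states that an SVM was used.
    model = SVC(kernel="linear")
    model.fit(X_train, y_train)

    # Persist the model and encoder for later classification runs.
    joblib.dump(model, "svm_model.joblib")
    joblib.dump(label_encoder, "label_encoder.joblib")

    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
    return model, label_encoder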

Function: classify_video(input_video_path, model, label_encoder)

• Load VGG16 model (exclude top layer) for feature extraction.

• Open video using cv2.VideoCapture.

• Initialize frame count and an empty list for predictions.

• For each frame in the video:

– Read frame, and if frame is valid:

– If it’s time to extract features (every second):

* Extract features using the feature extractor.

* Predict the label using the trained model.

* Convert prediction to human-readable label using label encoder.

* Append the predicted label to the predictions list.

• Release video capture.

• Count the frequency of each predicted label.

– If A-rated (1) is predicted, return A-rated.

– Else if PG-13-rated (0) is predicted, return PG-13-rated.

– Else return U-rated.

Explanation: This function processes a video frame by frame, extracts features every second using
the VGG16 model, and uses a trained classifier to predict the video’s label. The predictions are
stored, and based on the majority, the video is classified into one of three categories: A-rated,
PG-13-rated, or U-rated.
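The sketch below mirrors this flow; it applies the presence-based priority rule from the pseudocode (A-rated outranks PG-13, which outranks U-rated), and the label strings and helper functions are those assumed in the earlier sketches.

import cv2
from collections import Counter

def classify_video(input_video_path, model, label_encoder, feature_extractor):
    cap = cv2.VideoCapture(input_video_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or 1
    frame_count, predictions = 0, []

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_count % fps == 0:                   # one prediction per second of video
            features = extract_features(frame, feature_extractor)
            pred = model.predict([features])[0]
            predictions.append(label_encoder.inverse_transform([pred])[0])
        frame_count += 1
    cap.release()

    counts = Counter(predictions)
    # Priority rule from the pseudocode: any A-rated frame makes the video A-rated,
    # otherwise any PG-13 frame makes it PG-13, otherwise it is U-rated.
    if counts.get("A-rated", 0) > 0:
        return "A-rated"
    if counts.get("PG-13", 0) > 0:
        return "PG-13"
    return "U-rated"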



HSV for Blurring the Blood in a Video

Function: get_blood_mask(frame)

• TRY:

– Preprocess the frame and obtain model output.

– Resize model output to match the original frame size.

– Create binary mask for blood (class ’15’).

– RETURN the blood mask.

• CATCH Exception:

– RETURN empty mask.

Explanation: This function uses a DeepLabV3 model to segment regions in the input frame and
identifies blood regions (class ’15’). The model output is resized to match the input frame, and a
binary mask is created where the blood regions are detected. If any error occurs during the process,
an empty mask is returned.
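A sketch of this function using the torchvision implementation of DeepLabV3 is shown below; the report does not name the framework, and a stock pre-trained model has no dedicated blood class, so both the framework choice and the assumption that output class 15 corresponds to blood (as in the pseudocode) are flagged as assumptions here.

import cv2
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

# Pre-trained DeepLabV3; in practice it would be fine-tuned so that class 15 means "blood".
seg_model = deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def get_blood_mask(frame):
    try:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        inp = preprocess(rgb).unsqueeze(0)                     # add a batch dimension
        with torch.no_grad():
            out = seg_model(inp)["out"][0]                     # per-pixel class scores
        classes = out.argmax(0).byte().cpu().numpy()
        mask = (classes == 15).astype(np.uint8) * 255          # assumed blood class index
        # Resize the mask back to the original frame size.
        return cv2.resize(mask, (frame.shape[1], frame.shape[0]),
                          interpolation=cv2.INTER_NEAREST)
    except Exception:
        return np.zeros(frame.shape[:2], dtype=np.uint8)       # empty mask on failure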

Function: hsv_blood_detection(frame)

• Convert the frame from BGR to HSV.

• Create two masks for red color ranges (blood-like hues).

• Combine the two masks to detect blood-like regions.

• RETURN the combined red mask.

Explanation: This function converts the input frame to the HSV color space, where blood-like colors
(reds) are easier to isolate. It creates two separate masks for red regions (one for each side of the red
hue spectrum) and combines them to cover the full range of red. The resulting mask highlights all the
red (blood-like) regions in the frame.
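A minimal OpenCV sketch follows; the HSV thresholds are common illustrative values for red hues, not the project's tuned ranges.

import cv2
import numpy as np

def hsv_blood_detection(frame):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two ranges are needed.
    mask1 = cv2.inRange(hsv, np.array([0, 120, 70]),   np.array([10, 255, 255]))
    mask2 = cv2.inRange(hsv, np.array([170, 120, 70]), np.array([180, 255, 255]))
    # Combine both masks to cover the full range of blood-like red hues.
    return cv2.bitwise_or(mask1, mask2)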

YOLOv10 for Identification of Violence (Blood)

Function: create_config_file()

• Define a dictionary config with dataset paths and class labels.



• Set the path for the config.yaml file.

• Create necessary directories for the file.

• Write the config dictionary to the YAML file.

• Print a success message.

Explanation: Defines a configuration dictionary, ensures the target directory exists, saves it as a
YAML file, and prints a confirmation message.
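A sketch of this configuration step is shown below, assuming PyYAML is available; the dataset paths and class list are placeholders rather than the project's actual layout.

import os
import yaml   # PyYAML

def create_config_file(config_path="config.yaml"):
    # Dataset paths and class names are illustrative placeholders.
    config = {
        "train": "datasets/blood/images/train",
        "val":   "datasets/blood/images/val",
        "nc": 1,                      # number of classes
        "names": ["blood"],           # class labels
    }
    os.makedirs(os.path.dirname(config_path) or ".", exist_ok=True)
    with open(config_path, "w") as f:
        yaml.safe_dump(config, f)
    print(f"Configuration written to {config_path}")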

Function: blur_segments_in_video()

• Load the YOLO model from the specified path.

• Open the input video file using OpenCV.

• Get video properties (frame width, height, FPS, total frames).

• Define codec and create VideoWriter object for output video.

• For each frame in the video:

– Read the frame.

– Apply YOLO object detection to the frame.

– For each detected object:

* If the confidence is above the threshold:

* Extract and blur the region of interest (ROI).

* Replace the ROI with the blurred version.


– Write the processed frame to the output video.

• Print progress after each frame is processed.

• Release video capture and writer resources.

Explanation: Processes video frames with a YOLO model, blurring high-confidence detected
objects, and saves the output as a video file.
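The sketch below follows this flow using the Ultralytics YOLO inference API and a Gaussian blur over each detected box; the file paths, confidence threshold and blur kernel size are assumptions for illustration.

import cv2
from ultralytics import YOLO

def blur_segments_in_video(model_path, input_path, output_path, conf_threshold=0.5):
    model = YOLO(model_path)
    cap = cv2.VideoCapture(input_path)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = model(frame, verbose=False)[0]          # YOLO detections for this frame
        for box in results.boxes:
            if float(box.conf[0]) < conf_threshold:
                continue                                   # skip low-confidence detections
            x1, y1, x2, y2 = map(int, box.xyxy[0])         # bounding-box corners
            roi = frame[y1:y2, x1:x2]
            if roi.size:                                   # ignore degenerate boxes
                frame[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (51, 51), 0)
        out.write(frame)

    cap.release()
    out.release()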



Chapter 6

RESULT AND SNAPSHOTS

6.0.1 Certification of a video:

It takes a video as input and returns a rating of 'A', 'U/A'/'PG-13', or 'U' according to the CBFC
guidelines. We tried four models for certification and, based on the results, chose VGG16 for the
final system.

Figure 6.1 shows an A-rated video input and Figure 6.2 shows the A-rated video output.

Figure 6.1: A-Rated Video Input        Figure 6.2: A-Rated Video Output

Figure 6.3 shows a PG-13 video input and Figure 6.4 shows the PG-13 video output.

Figure 6.3: PG-13 Video Input        Figure 6.4: PG-13 Video Output

Figure 6.5 shows a U-rated video input and Figure 6.6 shows the U-rated video output.

Figure 6.5: U-Rated Video Input        Figure 6.6: U-Rated Video Output

6.0.2 Video blurring

1. Blurring a frame in a video where blood is visible using DeepLabV3 and HSV colour detection:
using DeepLabV3 together with HSV colour detection, we were able to blur a full frame as well as
blur individual segments.

2. Blurring a segment of a frame in a video where blood is visible using YOLOv10: with YOLOv10
we were able to successfully blur the segment of a frame in which blood is visible in a given input
video.



Figure 6.7 shows an input image with visible blood and Figure 6.8 shows the output image with the
blood blurred.

Figure 6.7: Input image with visible blood.        Figure 6.8: Output image with blood blurred.

Figure 6.9 shows an input image with no visible blood and Figure 6.10 shows the output image
unchanged.

Figure 6.9: Input image with no visible blood.        Figure 6.10: Output image unchanged.



Figure 6.11 shows an input image with visible blood and Figure 6.12 shows the output image with
the blood blurred.

Figure 6.11: Input image with visible blood.        Figure 6.12: Output image with blood blurred.

Figure 6.13 shows an input image with visible blood and Figure 6.14 shows the output image with
the blood blurred.

Figure 6.13: Input image with visible blood.        Figure 6.14: Output image with blood blurred.



Figure 6.15 shows an input image with no visible blood and Figure 6.16 shows the output image
unchanged.

Figure 6.15: Input image with no visible blood.        Figure 6.16: Output image unchanged.



Chapter 7

Conclusion and Future Enhancements

With this work, we integrated violence detection and mitigation methods into a new system for an
automatic movie certification and adaptation mechanism per CBFC requirements. The ResNet50+CNN
model performed very well in classifying video content and providing appropriate certification. We
handled the problem of detecting and blurring violent content with two blurring techniques.

The first approach combined DeepLabV3 with HSV colour detection, which effectively segmented
and blurred areas containing excessive blood. The second uses YOLOv10, which makes the
segmentation and selective blurring of violent parts of video frames easier. These strategies have
helped reduce explicit content while maintaining the viewer experience and, hence, compliance with
CBFC guidelines.

These detection and adaptation techniques provide a solid footing for autonomous video certification
and adaptation, offering scalability and effectiveness in practical content moderation and compliance
applications. Expanding the dataset to a wider gamut of video genres and investigating more
sophisticated models are areas of future work that could increase the system's accuracy and
adaptability.

References

[1] S. Afsha, M. Haque and H. Nyeem, "Machine Learning Models for Content Classification in
Film Censorship and Rating," IEEE, 2021.
Link: https://ieeexplore.ieee.org/document/9775816

[2] A. Chaudhari, P. Davda, M. Dand and S. Dholay, "Profanity Detection and Removal in Videos
using Machine Learning," IEEE, 2020.
Link: https://ieeexplore.ieee.org/document/9358624

[3] M. Haque, S. Afsha and H. Nyeem, "Developing BrutNet: A New Deep CNN Model with GRU
for Realtime Violence Detection," IEEE, 2021.
Link: https://ieeexplore.ieee.org/document/9775874

[4] "Deep Learning-Based Detection of Inappropriate Speech Content for Film Censorship," IEEE,
2023.
Link: https://ieeexplore.ieee.org/document/99003330
