VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELAGAVI-590018
A Project Report
on
Autocertify - Intelligent video content classification and
adaptation
Submitted in partial fulfillment of the requirements for the final year degree in
Bachelor of Engineering in Computer Science and Engineering
of Visvesvaraya Technological University, Belagavi
Submitted by
Kalle Praveen 1RN21CS073
K Sai Dheeraj 1RN21CS084
M Sasireeth Reddy 1RN21CS090
Manoj Kumar PG 1RN21CS094
Under the Guidance of
Dr. Bhavanishankar K
Professor
Dept. of CSE
2024-2025
RN SHETTY TRUST®
RNS INSTITUTE OF TECHNOLOGY
Affiliated to VTU, Recognized by GOK, Approved by AICTE, New Delhi
NAAC 'A+ Grade' Accredited, NBA Accredited (UG - CSE, ECE, ISE, EIE and EEE)
Channasandra, Dr.Vishnuvardhan Road, Bengaluru-560098
Ph:(080)28611880,28611881 URL:www.rnsit.ac.in
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
Certified that the Project work entitled Autocertify - Intelligent video content classification and
adaptation has been successfully carried out at RNSIT by Kalle Praveen bearing 1RN21CS073, K
Sai Dheeraj bearing 1RN21CS084, M Sasireeth Reddy bearing 1RN21CS090 and Manoj Kumar PG
bearing 1RN21CS094, in partial fulfillment of the requirements of the final year degree in Bachelor
of Engineering in Computer Science and Engineering of Visvesvaraya Technological University,
Belagavi, during the year 2024-2025. The Project report has been approved as it satisfies the
academic requirements in respect of project work prescribed for the said degree.
External Viva
1.
2.
Acknowledgement
I extend my profound thanks to the Management of RNS Institute of Technology for fostering an
environment that promotes innovation and academic excellence. I want to express my gratitude to
our beloved Director, Dr. M K Venkatesha, and Principal, Dr. Ramesh Babu H S, for their constant
encouragement and insightful support. Their guidance has been pivotal in keeping me motivated
throughout this endeavour.
My heartfelt appreciation goes to Dr. Kavitha C, Dean and HoD of Computer Science and
Engineering, for her vital advice and constructive feedback, which significantly contributed to shaping
this project.
I also thank the Project Coordinators for their continuous monitoring and for ensuring that the
process stayed on schedule. I am deeply grateful to my project guide, Dr. Bhavanishankar K,
Professor, for his invaluable guidance, unwavering support, and valuable suggestions throughout the
duration of this project. Lastly, my thanks go to all the teaching and non-teaching staff members of
the Computer Science and Engineering Department, whose encouragement, cooperation, and support
have been invaluable during this journey.
Warm Regards,
Abstract
The proposed project aims to modernize and automate the video certification process in India by
building a system powered by machine learning methods. Existing challenges such as time-consuming
procedures and subjective biases create the need for a more efficient and unbiased certification
solution. The system will leverage machine learning models such as Support Vector Machines (SVM),
You Only Look Once (YOLO), and other computer vision models to automate video categorization
based on the guidelines established by the Central Board of Film Certification (CBFC) on violence.
Moreover, it will adapt the content so that objectionable material is made suitable for audiences
while complying with CBFC regulations. The project consists of data collection, model training,
system integration, and user interface design, focusing on improving efficiency, accessibility, and
adherence in the video content certification process.
Contents
Acknowledgement i
Abstract ii
List of Figures v
1 INTRODUCTION 1
1.1 Existing System and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Manual Certification by CBFC . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Automated Content Tagging on OTT Platforms . . . . . . . . . . . . . . . . 2
1.1.3 Tools for User-Driven Filtering . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 LITERATURE SURVEY 3
2.1 Relevant recent paper’s summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Conclusion about literature survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 PROBLEM STATEMENT 5
3.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
5 IMPLEMENTATION 10
5.1 Datasets Explanation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.2 Libraries Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.1 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.2 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.3 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.4 Scikit-learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.5 TensorFlow/Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.2.6 Joblib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2.7 Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2.8 Openpyxl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.3 Algorithms/Methods/Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
References 24
List of Figures
Chapter 1
INTRODUCTION
Digital media in India is growing rapidly, and video content on OTT platforms, YouTube, and social
media has become available to everybody. However, it is difficult to ensure that this material is
properly graded and that access to it is appropriately controlled. The current method of video
certification is time-consuming, subject to high variability in results, and slow. Observing this
situation, the project proposes a new model based on machine learning that is intended to modernize
the whole video certification process. The solution utilizes techniques such as Support Vector
Machines (SVM) to classify sampled video frames in accordance with the guidelines of the Central
Board of Film Certification (CBFC), so that age-based content ratings such as U, U/A, and A can be
determined well in advance. The project gives due importance to consistency, quality, and efficiency
in content certification for Indian movies while staying very close to the CBFC guidelines. It aims to
provide a reliable system for certifying movies while also addressing content adaptation to make them
suitable for diverse audiences. The system performs adaptation by making use of sophisticated deep
learning models such as DeepLabV3 and YOLOv10 to identify and blur the unwanted parts of video
frames, specifically the scenes that depict different degrees of violence. This preserves adherence to
CBFC regulations while keeping the content appealing and appropriate for its audience. The system
is built on two key ideas: classification and adaptation. Classification places films under the categories
defined by the CBFC (A, U/A, and U), whereas adaptation ensures that sensitive content is treated
with consideration. Together, these two elements reflect a commitment not only to the standards set
by the CBFC but also to an improved viewer experience.
1.1 Existing System and Limitations
1.1.1 Manual Certification by CBFC
The Central Board of Film Certification (CBFC) in India manually reviews films using a panel of
experts. Films are classified into categories like U (Universal), UA (Parental guidance), A (Adults
only), and S (Restricted to specific groups). Certification is based on CBFC guidelines covering
themes such as violence, nudity, and language.
Limitations:
1.1.2 Automated Content Tagging on OTT Platforms
Streaming platforms like Netflix and Amazon Prime use metadata and machine learning algorithms
to categorize content by age and sensitivity levels.
Limitations:
1.1.3 Tools for User-Driven Filtering
Tools like VidAngel allow users to filter or censor specific elements (e.g., violence, nudity, profanity)
from movies or shows based on personal preferences.
Limitations:
Chapter 2
LITERATURE SURVEY
1. Machine learning (ML) techniques in automated film censorship and rating systems: Two
distinct but interrelated studies contribute significantly to the evolving landscape of automated content
analysis for film censorship and rating. The first study explores the role of machine learning (ML)
techniques in the Automated Film Censorship and Rating (AFCR) system, specifically focusing on
the use of Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and Long
Short-Term Memory (LSTM) networks to identify explicit content such as violence and nudity in
films. This study aims to enhance the efficiency and accuracy of the traditional manual film censorship
process, which is slow and prone to errors, by utilizing advanced machine learning models for content
classification.
2. Profanity detection and removal in videos using machine learning techniques: Conversely, the
second study investigates profanity detection and removal in videos using machine learning,
specifically focusing on recognizing and eliminating explicit language on video platforms through
techniques such as audio segmentation and lip coordination. This study highlights a key area of
concern in content moderation, offering a practical solution for real-time profanity detection and
video modification.
3. Developing BrutNet: A New Deep CNN Model with GRU for Realtime Violence Detection:
This paper presents a new deep learning model named BrutNet, specifically designed for real-time violence
detection in videos. The model leverages the power of Convolutional Neural Networks (CNNs) for
extracting spatial features and Gated Recurrent Units (GRUs) for capturing temporal dependencies in
video frames, which makes it capable of detecting violent content effectively.
4. Deep Learning-Based Detection of Inappropriate Speech Content for Film Censorship: The
reference shows that deep learning models, particularly those that incorporate advanced natural
language processing (NLP) techniques, offer significant promise in the field of film censorship.
These models can automatically detect and classify inappropriate speech content, making content
moderation systems more efficient, scalable, and accurate.
Chapter 3
PROBLEM STATEMENT
The manual process of film censorship is time-consuming, error-prone, and unsustainable given
the vast volume of video content today. This project aims to automate content censorship and
classification using machine learning to detect violent scenes and categorize content accurately. By
aligning with the guidelines of bodies such as the CBFC, it streamlines censorship, reduces human intervention, and
ensures real-time content rating and categorization.
3.1 Objectives
This project aims to develop an automated film censorship and rating system that leverages machine
learning models, with the following objectives:
• Data Collection and Preprocessing: Gather a diverse dataset of videos with varying levels of
violent content and preprocess them to eliminate noise and repetitive frames, ensuring high-
quality input data for the system.
• Adherence to CBFC Guidelines: Ensure that the system follows the CBFC guidelines to
generate film ratings and certifications based on identified content.
• Content Modification and Adaptation: Modify or adapt the video content by blurring or
removing violent scenes, providing an appropriate version for different age group ratings and
audience categories.
Chapter 4
SYSTEM ARCHITECTURE/BLOCK DIAGRAM
The video classification process is depicted in Figure 4.1, which outlines the flow of the certification
model.
1. Dataset Preparation: The process begins with the collection of a labeled dataset of videos,
ensuring that each predefined category (such as "A-rated," "PG-13," or "U-rated") is adequately
represented. This dataset forms the foundation for the entire classification system, and its diversity is
key to building a robust classifier capable of handling various types of video content.
2. Video Preprocessing: Once the dataset is prepared, the videos undergo preprocessing to
standardize them. In this step, videos are converted into a consistent format and resolution. This
uniformity ensures that all videos, regardless of their original properties, are processed in the same
way, making it easier for the subsequent steps to work effectively.
3. Frame Extraction: After preprocessing, frames are extracted from the videos at regular
intervals, such as one frame per second. This is typically done using tools like OpenCV, which
allow for the efficient extraction of key frames that represent the content of the video. These frames
serve as important snapshots of the video’s overall content and are the primary input for the next stage
of feature extraction.
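As an illustration of this step, the following minimal sketch samples roughly one frame per second with OpenCV; the function name and video path are placeholders rather than the project's exact code.

import cv2

def extract_frames(video_path, frames_per_second=1):
    """Sample roughly `frames_per_second` frames from each second of video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25      # fall back if FPS metadata is missing
    step = max(int(fps // frames_per_second), 1)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:                  # keep one frame per sampling interval
            frames.append(frame)
        index += 1
    cap.release()
    return frames

# Example usage (placeholder path): frames = extract_frames("sample_video.mp4")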
4. Feature Extraction: The extracted frames are then passed through a pretrained VGG16
deep learning model. This model is used to extract high-level features, such as objects and scene
characteristics, from the frames. These features are crucial as they encapsulate the key visual
information needed to categorize the video content accurately.
5. Training: With the features extracted from the frames, an SVM (Support Vector Machine)
classifier is trained. The classifier learns to distinguish between the predefined categories based on
the features it receives. This training step allows the model to develop the ability to classify new
frames by learning from the patterns observed in the dataset.
6. Evaluation: After training, the model’s performance is evaluated using various metrics, such
as Precision, Recall, and Accuracy. These metrics help assess how well the classifier performs in
predicting the correct categories. Based on these evaluation results, adjustments may be made to
improve the system’s performance, such as tuning the model or retraining with a more refined dataset.
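A minimal evaluation sketch using scikit-learn metrics is given below; the classifier and the held-out feature and label arrays are assumed to come from the earlier training step.

from sklearn.metrics import accuracy_score, classification_report

def evaluate(classifier, X_test, y_test):
    """Print Accuracy plus per-class Precision and Recall for the held-out frames."""
    y_pred = classifier.predict(X_test)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))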
7. Use Case: Once the model is trained and fine-tuned, it is ready for deployment in real-
world applications. The trained system is capable of accepting new video inputs, extracting relevant
features from them, and classifying them into the appropriate categories based on their content. This
functionality makes the system suitable for use in various scenarios, such as content moderation,
video recommendation systems, or automated content tagging.
The YOLO training process is illustrated in Figure 4.2, which outlines the flow of the adaptation
model.
1. Prepare the Dataset: Collect the required data, including images and their corresponding
annotations (bounding boxes and labels). Structure the data to meet YOLO’s input format
requirements.
2. Preprocessing: Resize images to the desired input size, normalize pixel values for consistency
during training, and ensure the annotations match the resized images.
3. Model Setup: Load a pre-trained YOLO model as a starting point to leverage transfer learning.
Configure the model for the specific dataset, including setting up class labels and data paths.
4. Training: Train the model using a configuration file that specifies dataset details. Adjust
hyperparameters such as batch size, epochs, and learning rate to optimize performance.
5. Model Saving: Save the trained model for future inference or further fine-tuning. The saved
model can be deployed to detect objects in new images or videos.
This step-by-step process ensures the YOLO model is effectively trained for customized datasets,
providing accurate object detection results.
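The following sketch illustrates these steps under the assumption that the Ultralytics YOLO package is used; the weight file, dataset configuration path, and hyperparameter values are placeholders.

from ultralytics import YOLO

# Step 3: start from pre-trained weights to leverage transfer learning
model = YOLO("yolov10n.pt")

# Step 4: train with a dataset configuration file and tuned hyperparameters
model.train(data="violence_data.yaml", epochs=50, imgsz=640, batch=16)

# Step 5: Ultralytics writes the best checkpoint (best.pt) under the run directory,
# which can then be loaded for inference on new images or videos.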
Chapter 5
IMPLEMENTATION
5.1 Datasets Explanation
Dataset Attributes

Types of Violence (Code - Description):
  G  - Gunshots
  F  - Fighting
  CA - Car Accident
  B  - Blood
  C  - Chopping
  S  - Stabbing
  E  - Explosion
  NA - Non-Violent Clips

Labeling (Degree of Violence, Label - Description):
  0 - UA / Parental Guidance
  1 - A / Adult Rated
  2 - U / Kids

Other Attributes (Attribute - Description):
  Duration - Total length of the video in seconds.
  Fps      - Total number of frames in the video (each second divided into 10 frames).
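As a hypothetical illustration of how these attributes could be loaded for processing (the file and column names below are assumptions, not the project's actual files):

import pandas as pd

# Read the annotation sheet with pandas/openpyxl (placeholder file name)
annotations = pd.read_excel("violence_annotations.xlsx", engine="openpyxl")

# Expected columns, per the attribute tables above: Code, Label, Duration, Fps
print(annotations["Label"].value_counts())    # distribution of degree-of-violence labels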
5.2 Libraries Used
5.2.1 Pandas
Pandas is a popular Python library for data manipulation and analysis. It was used in this project to
handle tabular data, such as loading and processing video-related information stored in DataFrames.
The library is particularly useful for data cleaning, transformation, and analysis.
5.2.2 OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine
learning software library. It provides tools for real-time image and video processing. In this project,
OpenCV was used for video frame extraction, reading and writing video files, and performing various
image processing tasks such as feature extraction, object detection, and segmentation.
5.2.3 NumPy
NumPy is a fundamental library for numerical computations in Python. It provides support for arrays
and matrices, along with mathematical functions to operate on them. In this project, NumPy was used
for handling arrays and matrices, particularly in image processing and manipulating data structures.
5.2.4 Scikit-learn
Scikit-learn is a machine learning library for Python. It provides simple and efficient tools for data
mining and data analysis. In this project:
• train_test_split was used to split the dataset into training and testing sets.
• SVC (Support Vector Classifier) was used to train the SVM (Support Vector Machine) model
for violence detection classification.
• LabelEncoder was used to encode the categorical labels into numeric form for model
training.
5.2.5 TensorFlow/Keras
TensorFlow is an open-source machine learning framework, and Keras is a high-level neural networks
API. The following Keras modules were used:
• Model: Used to build and customize the model from VGG16 by excluding the top layer for
feature extraction.
• img_to_array: Converts images to NumPy arrays, which is essential for preprocessing input
images for the model.
• preprocess_input: Preprocessing function for preparing the input data to be compatible
with VGG16, including scaling the pixel values appropriately.
5.2.6 Joblib
Joblib is a Python library used for serializing and deserializing Python objects, such as machine
learning models. In this project, it was used to save the trained SVM model and label encoder to disk
(dump) and to load them back during prediction (load).
5.2.7 Collections
The Counter class from the collections module was used to count the frequency of different
predicted labels. This helps determine the final classification label for a video based on the majority
prediction from the frames.
5.2.8 Openpyxl
Openpyxl is a library used for reading and writing Excel files in Python. In this project, it may have
been used (based on the pip install command) for handling Excel files to store or retrieve data
associated with video processing.
5.3 Algorithms/Methods/Pseudocode
The main routines of the implementation are summarised below.
Explanation: Resizes, preprocesses, and extracts features from a video frame using the VGG16
model.
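A sketch consistent with this explanation is given below; it assumes an ImageNet-pretrained VGG16 with the top classification layer removed, and the function name is illustrative.

import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array

# VGG16 without the top layer acts as a fixed feature extractor
feature_extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def extract_frame_features(frame):
    """Resize a BGR frame, preprocess it for VGG16, and return a flat feature vector."""
    frame = cv2.resize(frame, (224, 224))                      # VGG16 input size
    array = preprocess_input(np.expand_dims(img_to_array(frame), axis=0))
    return feature_extractor.predict(array, verbose=0).flatten()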
Explanation: Extracts features from frames of multiple videos, samples one frame per second, and
stores features and labels.
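A sketch of this dataset-building step, reusing the helpers sketched earlier (extract_frames and extract_frame_features); the list of (video path, label) pairs is an assumed input.

def build_feature_dataset(video_label_pairs):
    """Collect one sampled frame per second from each labeled video and return X, y."""
    X, y = [], []
    for video_path, label in video_label_pairs:
        for frame in extract_frames(video_path, frames_per_second=1):
            X.append(extract_frame_features(frame))   # VGG16 features per frame
            y.append(label)
    return X, y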
Explanation: Encodes labels, trains an SVM classifier on extracted features, and saves the trained
model and encoder.
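A minimal training sketch for this step; X and y are assumed to come from the feature-extraction step, and the model and encoder file names are placeholders.

import joblib
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

def train_classifier(X, y):
    encoder = LabelEncoder()
    y_encoded = encoder.fit_transform(y)                     # e.g. A / PG-13 / U -> 0 / 1 / 2
    X_train, X_test, y_train, y_test = train_test_split(
        X, y_encoded, test_size=0.2, random_state=42)
    classifier = SVC(kernel="linear")
    classifier.fit(X_train, y_train)
    joblib.dump(classifier, "svm_model.joblib")              # saved for later prediction
    joblib.dump(encoder, "label_encoder.joblib")
    return classifier, encoder, (X_test, y_test)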
Explanation: This function processes a video frame by frame, extracts features every second using
the VGG16 model, and uses a trained classifier to predict the video’s label. The predictions are
stored, and based on the majority, the video is classified into one of three categories: A-rated,
PG-13-rated, or U-rated.
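A sketch of this per-video prediction loop is shown below; it reuses the earlier sketches and placeholder file names, so it is illustrative rather than the project's exact code.

import cv2
import joblib
from collections import Counter

def classify_video(video_path):
    """Predict one label per second of video and return the majority category."""
    classifier = joblib.load("svm_model.joblib")
    encoder = joblib.load("label_encoder.joblib")
    cap = cv2.VideoCapture(video_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or 25
    predictions, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % fps == 0:                                  # one frame per second
            features = extract_frame_features(frame)
            predictions.append(classifier.predict([features])[0])
        index += 1
    cap.release()
    majority = Counter(predictions).most_common(1)[0][0]
    return encoder.inverse_transform([majority])[0]           # 'A', 'PG-13' or 'U'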
Explanation: This function uses a DeepLabV3 model to segment regions in the input frame and
identifies blood regions (class ’15’). The model output is resized to match the input frame, and a
binary mask is created where the blood regions are detected. If any error occurs during the process,
an empty mask is returned.
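The sketch below follows this description under the assumption that the torchvision DeepLabV3 implementation is used; the class index 15 is taken from the explanation above, and the helper name is illustrative.

import cv2
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

seg_model = deeplabv3_resnet50(weights="DEFAULT").eval()
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deeplab_mask(frame, target_class=15):
    """Return a binary mask for pixels assigned to target_class; empty mask on error."""
    try:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        with torch.no_grad():
            output = seg_model(to_tensor(rgb).unsqueeze(0))["out"][0]
        classes = output.argmax(0).byte().cpu().numpy()
        mask = (classes == target_class).astype(np.uint8) * 255
        return cv2.resize(mask, (frame.shape[1], frame.shape[0]))  # match input frame size
    except Exception:
        return np.zeros(frame.shape[:2], dtype=np.uint8)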
Explanation: This function converts the input frame to the HSV color space, where blood-like colors
(reds) are easier to isolate. It creates two separate masks for red regions (one for each side of the red
hue spectrum) and combines them to cover the full range of red. The resulting mask highlights all the
red (blood-like) regions in the frame.
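A sketch of the colour heuristic described above; the HSV thresholds shown are typical values for red and would need tuning on the project's own footage.

import cv2

def hsv_blood_mask(frame):
    """Combine low- and high-hue red masks in HSV space to highlight blood-like regions."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower_red = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255))      # reds near hue 0
    upper_red = cv2.inRange(hsv, (170, 70, 50), (180, 255, 255))   # reds near hue 180
    return cv2.bitwise_or(lower_red, upper_red)                    # full red range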
Explanation: Defines a configuration dictionary, ensures the target directory exists, saves it as a
YAML file, and prints a confirmation message.
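A sketch of this configuration step; the directory layout, file names, and class names are placeholders for the project's actual dataset.

import os
import yaml

config = {
    "path": "datasets/violence",          # dataset root (assumed layout)
    "train": "images/train",
    "val": "images/val",
    "names": {0: "blood"},                # class labels used for detection and blurring
}

os.makedirs("configs", exist_ok=True)     # ensure the target directory exists
with open("configs/violence_data.yaml", "w") as f:
    yaml.safe_dump(config, f)
print("Saved YOLO dataset config to configs/violence_data.yaml")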
Explanation: Processes video frames with a YOLO model, blurring high-confidence detected
objects, and saves the output as a video file.
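A hedged sketch of this adaptation step, assuming the Ultralytics YOLO API; the weight file, codec, and confidence threshold are placeholders.

import cv2
from ultralytics import YOLO

def blur_detections(input_path, output_path, conf_threshold=0.5):
    """Blur high-confidence YOLO detections in every frame and write a new video."""
    model = YOLO("violence_yolo.pt")                       # placeholder trained weights
    cap = cv2.VideoCapture(input_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = model(frame, verbose=False)[0]
        for box in results.boxes:
            if float(box.conf) < conf_threshold:           # keep only confident detections
                continue
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            frame[y1:y2, x1:x2] = cv2.GaussianBlur(frame[y1:y2, x1:x2], (51, 51), 0)
        writer.write(frame)
    cap.release()
    writer.release()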
The certification model takes a video as input and returns a rating of 'A', 'U/A' (PG-13), or 'U'
according to the CBFC guidelines. We tried four models for certification and, based on the results,
selected VGG16 for certification.
Figure 6.1 shows an A-rated video input and Figure 6.2 shows the corresponding A-rated video output.
Figure 6.1: A-Rated Video Input    Figure 6.2: A-Rated Video Output
Figure 6.3 shows a PG-13 video input and Figure 6.4 shows the corresponding PG-13 video output.
Figure 6.3: PG-13 Video Input    Figure 6.4: PG-13 Video Output
Figure 6.5 shows a U-rated video input and Figure 6.6 shows the corresponding U-rated video output.
Figure 6.5: U-Rated Video Input    Figure 6.6: U-Rated Video Output
1. Blurring a frame in a video where blood is visible using DeepLabV3 and HSV colour detection:
Using DeepLabV3 and HSV colour detection, we were able to blur a full frame as well as blur it in
segments.
2. Blurring a segment of a frame in a video where blood is visible using YOLOv10: With YOLOv10,
we were able to successfully blur a segment of a frame in a given input video where blood is visible.
Figure 6.7: Input image with visible blood.    Figure 6.8: Output image with blood blurred.
Figure 6.9 shows an input image with no visible blood and Figure 6.10 shows the output image unchanged.
Figure 6.9: Input image with no visible blood.    Figure 6.10: Output image unchanged.
Figure 6.11: Input image with visible blood.    Figure 6.12: Output image with blood blurred.
Figure 6.13 shows an input image with visible blood and Figure 6.14 shows the output image with blood blurred.
Figure 6.13: Input image with visible blood.    Figure 6.14: Output image with blood blurred.
Figure 6.15 shows an input image with no visible blood and Figure 6.16 shows the output image unchanged.
Figure 6.15: Input image with no visible blood.    Figure 6.16: Output image unchanged.
With this work, we integrated violence detection and mitigation methods into a new system for an
automatic movie certification and adaptation mechanism in accordance with CBFC requirements. The
ResNet50+CNN model performed very well in classifying video content and providing appropriate
certification. We handled the problem of detecting and blurring violent content with two blurring
techniques.
The first approach combined DeepLabV3 with HSV color detection, which was able to effectively
segment and blur areas containing excessive blood. The second uses YOLOv10, which makes the
segmentation and selective blurring of violent parts of video frames easier. These strategies have
helped reduce explicit content while maintaining the viewer experience and, hence, compliance with
CBFC guidelines.
These detection and adaptation techniques provide a solid footing for autonomous video certification
and adaptation, offering scalability and effectiveness in practical content moderation and compliance
applications. Growing the dataset to cover a wider range of video genres and investigating more
sophisticated models are areas of future work that could increase the system's accuracy and
adaptability.
References
[1] S. Afsha, M. Haque and H. Nyeem, "Machine Learning Models for Content Classification in
Automated Film Censorship and Rating", IEEE.
Link: https://ieeexplore.ieee.org/document/9775816
[2] A. Chaudhari, P. Davda, M. Dand and S. Dholay, "Profanity Detection and Removal in Videos
Using Machine Learning", IEEE.
Link: https://ieeexplore.ieee.org/document/9358624
[3] M. Haque, S. Afsha and H. Nyeem, "Developing BrutNet: A New Deep CNN Model with GRU
for Realtime Violence Detection", IEEE.
Link: https://ieeexplore.ieee.org/document/9775874
[4] Deep Learning-Based Detection of Inappropriate Speech Content for Film Censorship, IEEE,
2023.
Link: https://ieeexplore.ieee.org/document/99003330