Group 24 Report

This project report describes creating a Chrome extension that will summarize YouTube transcripts by making a request to a backend REST API using natural language processing techniques. The extension will provide concise summaries of YouTube videos to help users quickly understand the key content without watching the entire video. The project aims to gain hands-on experience with abstractive text summarization and implement an interesting idea to help save time for users when browsing educational videos on YouTube.

A

Project Report
on
YouTube Transcript Summarizer

Submitted in partial fulfilment of requirements for the degree of


Bachelor of Technology
in
Computer Science & Engineering
by
Gourav Sharma
19000401000033

Under the Guidance of

Er. Aman Singh

(Assistant Professor, Department of C.S.E)

Computer Science & Engineering

Raja Balwant Singh Engineering Technical Campus Bichpuri, Agra

Affiliated to Dr. A.P.J. Abdul Kalam Technical University (Formerly Known as U.P.T.U.),
Lucknow
2022 - 2023
DECLARATION

I declare that the project work presented in this report entitled “YouTube
Transcript Summarizer”, submitted to the Computer Science and Engineering
Department, Raja Balwant Singh Engineering Technical Campus, for the award
of the Bachelor of Technology degree in Computer Science and Engineering, is
my original work. I have not plagiarized it or submitted the same work for the
award of any other degree.

May, 2023

Agra

Gourav Sharma (19000401000033)

CERTIFICATE

This is to certify that the Project entitled “YouTube Transcript Summarizer” has
been submitted by Gourav Sharma in partial fulfilment of the requirements for the
degree of Bachelor of Technology in Computer Science & Engineering of “Raja Balwant Singh
Engineering Technical Campus, affiliated to Dr. A.P.J. Abdul Kalam Technical
University (Formerly known as U.P.T.U), Lucknow” in academic session 2022-23.

May, 2023

Agra

Er. Aman Singh
Assistant Professor, C.S.E

ACKNOWLEDGEMENT

Apart from my own effort, the success of this project depended largely on the encouragement and guidance of many others. I take this opportunity to express my gratitude to the people who have been instrumental in its successful completion.
I would like to express my deep and sincere gratitude to my project guide, Er. Aman Singh (Assistant Professor of CSE), who gave me his full support and encouraged me to work on innovative and challenging projects in the educational field.
I extend my gratitude to Dr. Brajesh Kumar Singh, Head of the Department of Computer Science and Engineering, for his constant encouragement and for giving me the opportunity to prepare this project.
I am grateful to Dr. B.S. Kushwaha (Director Academics) and Dr. Pankaj Gupta (Director Finance & Admin.), Raja Balwant Singh Engineering Technical Campus, Bichpuri, Agra, for providing facilities and constant encouragement. I am also grateful to all the faculty members of the Department of Computer Science and Engineering for their deliberations and honest concerns.
Finally, I am grateful to my parents and friends for their constant support throughout this project work. Without them, this work would have remained a distant reality.
I also place on record my indebtedness to all those who have directly or indirectly lent a helping hand in this endeavor.

Gourav Sharma (19000401000033)

ABSTRACT
In this project, I will be creating a Chrome extension that makes a request to a backend REST API. The API performs NLP on the transcript of a YouTube video and responds with a summarized version of it. An enormous number of video recordings are created and shared on the Internet every day. It has become difficult to find time to watch such videos, which may be longer than expected. Sometimes the effort is futile because no relevant information can be found in them. Summarizing the transcripts of such videos automatically lets us quickly look for the important patterns in a video. It also saves the time and effort of going through the whole content. This project gives us an opportunity to gain hands-on experience with state-of-the-art NLP techniques for abstractive text summarization. It also implements an interesting idea suitable for intermediate learners and serves as a refreshing hobby project for professionals.
Table of Contents

Cover Page
Declaration
Certificate
Acknowledgment
Abstract
List of Figures
1. Introduction, Objective and Scope
2. Review of Literature
3. Materials and Methods (Technical Details)
3.1. Project Category
3.2. Techniques to be used
3.3. Parallel Techniques Available
3.4. Hardware and Software Resource Requirements and their Specifications
4. Proposed Methodology
5. Testing Technology and Security Mechanisms
6. Future Scope, Further Enhancement and Limitations
7. Conclusion
8. Bibliography
8.1. References
8.2. Snapshots
8.3. Appendix
9. Bibliographical Sketch
10. Plagiarism Check
List of Figures
3.1 Steps of Natural Language Processing

4.1 Workflow of my Project

4.2.2 Flowchart of YouTube transcript summarizer
CHAPTER 1

INTRODUCTION, OBJECTIVE & SCOPE

1.1 Introduction

YouTube is a video-sharing platform, the second most visited website and the second most used search engine. After more than 17 years online, it is stronger than ever. About 720,000 hours of fresh video content are uploaded to YouTube every day, and the number of videos on the platform is steadily growing. It has become increasingly easy to watch YouTube videos about anything, from cooking, dance and motivational videos to far more unusual content. The content is available worldwide, primarily for educational purposes.

The biggest challenge in extracting information from a video is that the viewer has to watch all of it to understand the context. With an image, by contrast, data can be gathered from a single frame. A slow network connection or other device limitations can force a viewer to watch at low resolution, which makes the video blurry and tiring to watch. Advertisements in between are also frustrating. Removing the filler at the start and end of a video, skipping advertisements, and getting a summary to jump directly to the part of interest is therefore valuable and time-efficient.

This project focuses on reducing the length of video transcripts. Summarizing transcripts automatically lets one quickly look for the important patterns in a video and saves the time and effort of going through its whole content. The most important part of this project will be its ability to string together all the necessary information and concentrate it into a short paragraph. Video summarization is the process of identifying the significant segments of a video and producing an output video whose content represents the entire input video. It has advantages such as reducing the storage space used for the video. This project gives an opportunity to gain hands-on experience with state-of-the-art NLP techniques for abstractive text summarization. It also implements an interesting idea suitable for intermediate learners and serves as a refreshing hobby project for professionals.

1.2 Objective

The objectives of the YouTube Transcript Summarizer extension, built using Flask, are as follows:

• This project presents a transcript summarization technique based on natural language processing and machine learning. It reduces YouTube video transcripts to abstractive summaries without losing important content and details.
• This project aims to reduce the length of video transcripts.
• The most important feature of this work is that it can combine all the necessary information and condense it into a short paragraph.

1.3 Scope

Many technical and educational applications generate large amounts of video and multimedia, such as recordings of sports matches. These applications are top contenders for video summarization techniques, which remove redundancy and reduce computational time and storage requirements.

• Research/Patents: - This application can be used to extract vital claims across patents or research papers, saving time and effort.

• Crash Course: - Students who want to watch YouTube videos for their studies can get a quick idea of a topic from a concise summary of the video. They can then easily check whether the video is relevant to them.

• Quick Notes: - Students who do not want to attend boring lectures, or who have missed classes, can use this application to build notes from the summary of a video. Many students browse YouTube a day before their exams and watch videos at double speed. Halving the watch time often doubles the confusion about a completely new topic, making things worse than they originally were. Removing the filler at the start and end of a video, skipping advertisements, and getting a summary to jump directly to the part of interest is therefore valuable and time-efficient.

• Customer feedback: - Businesses often receive long feedback from customers about a particular product. This application can summarize such feedback, making it easier to tell whether it is positive or negative.

• Hearing Impaired Person: - This application is beneficial for hearing-impaired persons, as they can read a text summary of a video's spoken content.

CHAPTER 2

REVIEW OF LITERATURE

Prof. SH Chaflekar et al. [1] observe that people spend a noticeable amount of their weekly time watching YouTube videos, whether for entertainment, education or exploring their interests. In most cases, the overall intent is to obtain some form of information from the video. The authors sought a way to make this "information extraction" process more efficient, since YouTube's speed adjustment option is the only relevant tool. Their summarizer is a Chrome extension that works with YouTube to extract the key points of a video and make them accessible to the user. The summary is customizable per user request, allowing varying extents of summarization. Key points from the summarization process are presented to the user, together with their corresponding timestamps, through a small UI next to the video feed. This lets the user navigate to the more important sections of the video and reach the key points more efficiently. The main idea is to find a short subset of the most essential information in the entire set and present it in a human-readable format. As online textual data grows, automatic text summarization methods have the potential to become very helpful, because more useful information can be read in a short time.

Facial recognition, described as the biggest breakthrough in biometric identification and security since fingerprints, uses an individual's facial features to identify and recognize them. It once seemed a technology taken straight from a science-fiction novel, yet it is now available in the smartphones in the palm of our hands. Facial recognition has gained traction as the primary method of identification, whether in mobile phones, smart security systems, ID verification or something as simple as logging in to a website. Recent strides in facial recognition technologies have made it possible to design, build and implement a facial recognition system oneself. Using computer vision and machine learning libraries such as Facial Recognition and Dlib, people can create a robust system that detects faces. The system then matches them against a database of pre-loaded facial data to recognize them.

Hafiz Burhan Ul Haq et al. [2] note that advancements in digital video technology have enabled video surveillance to play a vital role in ensuring security and safety. Public and private enterprises use surveillance systems to monitor and analyze daily activities. Consequently, a massive volume of video data is generated that requires further processing to meet security protocols. Analyzing video content is a tedious and time-consuming task, and it also requires high-speed computing hardware. The concept of video summarization has emerged to overcome these limitations. The paper presents a customized video summarization framework based on deep learning. It enables a user to summarize videos according to an Object of Interest (OoI), for example a person, airplane, mobile phone, bike or car. Various experiments evaluate the framework on the video summarization (VSUMM) dataset, the title-based video summarization (TVSum) dataset and the authors' own dataset. The reported accuracies are 99.6%, 99.9% and 99.2%, respectively. A desktop application was also developed to help users summarize videos based on the OoI.
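The selection step behind an OoI-based summary can be sketched as follows. The sketch assumes that an object detector (not shown, and not necessarily the one used in [2]) has already produced a set of labels for every frame. The function name `ooi_segments` and the one-set-of-labels-per-frame input format are hypothetical. The function only finds the time spans in which the chosen object appears.

```python
from __future__ import annotations


def ooi_segments(
    frame_labels: list[set[str]], object_of_interest: str, fps: float
) -> list[tuple[float, float]]:
    """Return (start_sec, end_sec) spans in which the object of interest is detected.

    frame_labels[i] is the set of labels a detector reported for frame i
    (hypothetical input; the detector itself is not shown). Runs of
    consecutive matching frames are merged into a single span.
    """
    spans: list[tuple[float, float]] = []
    run_start = None  # index of the first frame in the current matching run
    for i, labels in enumerate(frame_labels):
        if object_of_interest in labels:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Frame i is the first non-matching frame, so the run ends at i / fps.
            spans.append((run_start / fps, i / fps))
            run_start = None
    if run_start is not None:
        # The video ended while the object was still visible.
        spans.append((run_start / fps, len(frame_labels) / fps))
    return spans
```

Cutting these spans out of the video and concatenating them, for example with a video-editing library, would give the OoI summary. That step is omitted here.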

A. N. S. S. Vybhavi et al. [3] proposed a video summarization system based on natural language processing (NLP) and machine learning. It summarizes YouTube video transcripts without losing their key elements. The quantity of videos available on web platforms is steadily expanding. The content is made available globally, primarily for educational purposes, and educational content is available on YouTube, Facebook, Google and Instagram. A significant issue in extracting information from videos is that a viewer must watch the entire video to grasp the context. An image, by contrast, yields its data from a single frame. The suggested method retrieves the transcript from the video link provided by the user and then summarizes the text using Hugging Face Transformers and pipelines. The model accepts a video link and the required summary duration as input from the user and generates a summarized transcript as output.
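The first step of such a pipeline is turning the link a user provides into a video ID that a transcript API can accept. This step can be sketched with the Python standard library alone. The helper is illustrative and is not code from [3]. It assumes the common `watch?v=`, `youtu.be`, `/shorts/` and `/embed/` URL forms and the usual 11-character ID format.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters drawn from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> str | None:
    """Return the video ID from common YouTube URL forms, or None if not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&...
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            # /shorts/<id> and /embed/<id> links
            candidate = parsed.path.split("/")[2]
        else:
            return None
    else:
        return None
    return candidate if _VIDEO_ID.fullmatch(candidate) else None
```

Returning `None` for unrecognized links lets the caller show a clear error instead of sending a bad request to the transcript service.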

Fady Bassel et al. [4] observe that a video's description and keywords play an important role in choosing the right video to watch. The main idea of their approach is to generate descriptions and timestamps for videos automatically, reducing the time spent searching for the proper video. It aims to save users from watching unwanted videos, and timestamps let them find and watch only the desired part of a video. One of the main goals of the approach is keyword extraction, since extracted keywords help users find videos that match a video's significant keywords. The summarization of a video depends on its frames, emotions and speech:

First, the visual content of the frames is analyzed to output a summarized text of the video content.
Second, the emotions shown, and how they change over a specific period, are merged with the summary of the frames.
Third, the audio is transcribed into text and an abstractive summary of the audio track is produced.
Finally, all the summaries (audio, video and emotion) are fused using NLP techniques such as tokenization, sentence segmentation, lemmatization and stemming, followed by abstractive summarization.

Video summarization thus yields a meaningful, accurate description of the video, which helps users find content that matches their query. In the reported experiment, on average 87% of the participants found that the generated text represented the video well.
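Timestamps are what let a user jump straight to the desired part of a video. A minimal sketch follows: it turns a segment's start offset (in seconds) into a readable label and into a link that opens the video at that point. It relies on the `t` query parameter of YouTube watch URLs. The helper names are hypothetical and are not taken from [4].

```python
from urllib.parse import urlencode


def format_timestamp(seconds: float) -> str:
    """Format an offset as H:MM:SS, or M:SS when it is under an hour."""
    total = int(seconds)  # drop fractional seconds
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamp_link(video_id: str, seconds: float) -> str:
    """Build a watch URL that starts playback at the given offset."""
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"
```

For example, a key point starting 75 seconds into a video would be shown as `1:15` and linked to `...watch?v=<id>&t=75s`.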

Shraddha Yadav et al. [5] proposed two different methods, extractive and abstractive, for generating a summary and important keywords from a given YouTube video. They built a simple user interface through which users can easily obtain summaries with either method. According to the authors, the project addresses the problem it is meant to tackle: saving time and effort. It does so by providing only the useful information about a topic of interest. Users therefore do not have to watch long videos and can spend the saved time gaining more knowledge.

E. Apostolidis et al. [6] focus on recent advances in the area and provide a comprehensive survey of existing deep-learning-based methods for generic video summarization. They first present the motivation behind developing video summarization technologies. They then formulate the video summarization task and discuss the main characteristics of a typical deep-learning-based analysis pipeline. Next, they suggest a taxonomy of the existing algorithms and provide a systematic review of the relevant literature. The review traces the evolution of deep-learning-based video summarization technologies and leads to suggestions for future developments.

Yudong Jiang et al. [7] note that previous methods mainly take the diversity and representativeness of generated summaries as prior knowledge in algorithm design. In their paper, they formulate video summarization as a content-based recommender problem. Such a system should distill the most useful content from a long video for users who suffer from information overload. A scalable deep neural network is proposed for predicting whether a video segment is useful to users by explicitly modelling both the segment and the video. Moreover, they perform scene and action recognition in untrimmed videos to find more correlations among different video understanding tasks. The paper also discusses the effect of audio and visual features on the summarization task.

Aniqa Dilawari and Muhammad Usman Ghani Khan [8] state that a massive number of videos containing audio, visual and textual data is produced every day. This constant increase is due to the ease of recording on portable devices such as mobile phones, tablets and cameras. The major challenge is to understand the visual semantics and convert them into a condensed format, such as a caption or summary. Such a format saves storage space, enables users to index and navigate, and helps them gain information in less time. The authors propose ASoVS, an innovative joint end-to-end solution that uses a deep neural network to generate a natural language description and an abstractive text summary of an input video. The text-based description and summary let users discriminate between relevant and irrelevant information according to their needs. Furthermore, their experiments show that the joint model can attain better results than baseline methods on the separate tasks. In a human evaluation, it produced informative, concise and readable multi-line video descriptions and summaries.

P. Choudhary et al. [9] note that automatic summarization techniques give users an easy way to look up the important content of a media collection and to browse media of their choice later. With the evolution of sophisticated capture devices, end users prefer not to use cloud-based summarization solutions, which have long turnaround times. The authors propose a real-time video summarization technique for mobile platforms. It analyzes the video during live camera recording and generates a summary instantaneously. The technique analyzes intrinsic video data, such as the contents of the video stream, together with corresponding extrinsic metadata, such as external camera information. It achieves an F-measure of 0.66 on the SumMe dataset and 0.84 on the SumLive dataset. At the same time, it limits the overall current draw to 20 milliamps on an embedded system.
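The F-measure reported here is the harmonic mean of precision and recall between the frames a method selects and the frames a human annotator selected. A sketch over sets of frame indices follows. Benchmarks such as SumMe define their own exact protocol, for example how several annotators' summaries are combined. This simplified version does not reproduce that protocol.

```python
from __future__ import annotations


def f_measure(selected: set[int], ground_truth: set[int]) -> float:
    """Harmonic mean of precision and recall over selected frame indices."""
    overlap = len(selected & ground_truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)    # share of chosen frames that are correct
    recall = overlap / len(ground_truth)   # share of reference frames that were chosen
    return 2 * precision * recall / (precision + recall)
```

For example, if a method keeps frames 0 to 9 and the annotator marked frames 5 to 14, precision and recall are both 0.5, so the F-measure is 0.5.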

Justine Raju Thomas et al. [10] explain that summarization is the process of reducing a text document to create a summary that retains the most important points of the original. Extractive summarizers work on the given text to extract the sentences that best convey its message. Most extractive summarization techniques revolve around finding keywords and extracting the sentences that contain more keywords than the rest. Keyword extraction is usually done by selecting relevant words that occur more frequently than others, with stress on important ones. Manual extraction or annotation of keywords is a tedious, error-prone process involving a lot of manual effort and time. The authors propose an algorithm that extracts keywords automatically for text summarization on e-newspaper datasets. They test it on articles with similar titles from four different e-newspapers to check the similarity and consistency of the summarized results.
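The keyword-based extractive approach described above scores each sentence by how many high-frequency keywords it contains. It can be sketched in a few lines. This is a generic illustration, not the algorithm proposed in [10]. The small stop-word list, the regex tokenizer and the function names are simplifying assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real systems use a much larger one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it",
              "that", "for", "on", "with", "as", "this", "be", "by"}


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_summary(text: str, num_sentences: int = 2, num_keywords: int = 5) -> str:
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    # Keywords are the most frequent non-stop words in the whole text.
    keywords = {w for w, _ in Counter(words).most_common(num_keywords)}

    def score(sentence: str) -> int:
        # Number of keyword occurrences in the sentence.
        return sum(1 for w in re.findall(r"[a-z']+", sentence.lower()) if w in keywords)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)
```

Keeping the chosen sentences in their original order makes the extract read more naturally than listing them by score.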

Bin Zhao and Eric P. Xing [11] proposed online video highlighting, a principled way of generating a short video that summarizes the most important and interesting content of an unedited and unstructured video. Processing such a video manually would be costly in both time and money. Specifically, their method learns a dictionary from the given video using group sparse coding and updates the atoms in the dictionary on the fly. A summary video is then generated by combining segments that cannot be sparsely reconstructed using the learned dictionary. Because the method works online, it can process arbitrarily long videos and start generating summaries before seeing the end of the video. Moreover, the processing time it requires is close to the original video length, achieving quasi-real-time summarization speed.

Idham Widodo et al. [12] investigated the rhetorical structure, in terms of moves and steps, of short lectures posted on YouTube by the famous applied linguist Jack C. Richards. The data for the study were 22 video transcripts of his short lectures. The results were as follows.

(1) Three rhetorical moves occurred in 100% of the data analysed and are therefore classified as obligatory: M1 – Introduction, M2 – Content of Short Lecture and M3 – Conclusion.

(2) Two steps occurred in 100% of the short lectures and are therefore classified as obligatory: M2SB – Argumentation of the short lecture and M3SA – Summarizing the points. Four steps had a 60-99% occurrence and are classified as conventional: M1SE – Announcing the topic of the oral presentation, M1SA – Greeting the audience, M2SC – Illustration of the short lecture, and M2SA – Description of the short lecture.

The newly proposed model of spoken genre analysis is adapted from Ali and Singh (2019), Cheong's Sermon model (cited in Safnil, 2010) and Seliman (1996). It is effective enough to capture the possible rhetorical moves and steps in the whole text of a short lecture by a famous applied linguist posted on YouTube.
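The category labels above follow from a simple percentage of the 22 transcripts in which a move or step occurs. A sketch of that classification follows. The 100% (obligatory) and 60-99% (conventional) bands come from the study. The "optional" label for anything below 60% is an assumption added here, since the text above does not name that band.

```python
def classify_occurrence(present_in: int, total: int) -> tuple:
    """Return (percentage, category) for a move or step found in present_in of total texts."""
    pct = 100 * present_in / total
    if present_in == total:
        label = "obligatory"      # occurs in every transcript (100%)
    elif pct >= 60:
        label = "conventional"    # occurs in 60-99% of transcripts
    else:
        label = "optional"        # assumption: band below 60% is not named in the source
    return pct, label
```

With 22 transcripts, a step found in 14 of them (about 63.6%) counts as conventional, while one found in 13 (about 59.1%) falls below the conventional band.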

Sourav Biswas and Atul Kumar Patel [13] observe that watching long YouTube videos is very time-consuming and boring. Nowadays, YouTube is an essential source of news and information, and students consider it a second teacher; educational videos are the most viewed videos on YouTube today. In their project, the authors aim to provide a quick, precise and informative summary of a video. Many existing techniques provide only text summarization, whereas they set out to summarize a video, specifically a YouTube video. They used a Hugging Face transformer to summarize the content and a Python API to get the subtitles of a given video. The model performs text summarization on the subtitles and displays the summary to the user, so that people can save their precious time by reading the summary.
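Subtitles fetched through a Python API arrive as many short timed segments rather than as one text. A sketch of merging them into a single string suitable for a summarizer follows. It assumes each segment is a dict with `text`, `start` and `duration` keys, which is the shape of the raw data from transcript packages such as youtube-transcript-api. The exact shape depends on the package and version, so check it before relying on this. Stripping bracketed cues such as `[Music]` is an extra cleaning step chosen for this illustration.

```python
import re


def transcript_text(segments: list) -> str:
    """Join subtitle segments into one whitespace-normalized string."""
    parts = []
    for seg in segments:
        # Remove bracketed non-speech cues such as "[Music]" or "[Applause]".
        text = re.sub(r"\[[^\]]*\]", " ", seg["text"])
        # Collapse line breaks and repeated spaces inside a segment.
        parts.append(" ".join(text.split()))
    return " ".join(p for p in parts if p)
```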

Abdulwahid Albeer et al. [14] state that automatic summarization is a technique for quickly conveying key information by abbreviating large sections of material. Summarization may apply to text or to video, each with its own method of presenting the abstract of the subject. In this research, NLP-based automated text summarization is applied to YouTube videos by transcribing them and then applying the summarization stages. Term frequency-inverse document frequency (TF-IDF), computed from the words and sentences in the text, is used to extract the important keywords for the summary. Some videos are long and boring, or take much longer to present information that could be conveyed in a few minutes. The essence of the proposed system is therefore to summarize long videos and present the important information to the user in a few lines of text. This benefits students and researchers who have no time to extract useful information from long videos. The results were evaluated using the ROUGE method on the CNN/DailyMail dataset.

Vaishali P. Kadam et al. [15] describe text summarization as one of the most popular applications, and a challenging task, in natural language processing. It is important for finding specific information in an input document within a short time. There is currently demand for quick access to information in the form of a summary from which conclusions about a document can be drawn. Such a summary is presented in a limited number of words, with information specific to the search item. Summarizer systems analyze a text and generate a short version of it that retains its original meaning and actual theme. Many automated summarizer systems have been developed for various Indian languages, but these systems have not yet reached a mature stage. The paper proposes a methodology for developing an automated text summarization technique for the Marathi language. The authors report a compression accuracy of 44.48% for the summaries produced by their system.
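The exact definition of "compression accuracy" in [15] is not given here. A closely related and widely used measure is the compression ratio: the summary length divided by the source length. A sketch using whitespace-separated word counts follows. Using word counts, rather than characters or sentences, is an assumption.

```python
def compression_ratio(original: str, summary: str) -> float:
    """Fraction of the original word count that is retained in the summary."""
    original_words = len(original.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return len(summary.split()) / original_words
```

A 300-word summary of a 1,000-word transcript, for example, has a compression ratio of 0.3 (30%).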

S. Tharun et al. [16] note that thousands of video recordings are created and shared on the internet every day. It is becoming increasingly difficult to find time to watch such videos, which may take longer than anticipated. The effort may also be in vain if no meaningful information can be extracted from them. Summarizing the transcripts of such videos helps users quickly search for relevant patterns without going through the entire content. An abstractive transcript summarization model is very useful for taking YouTube video transcripts and generating a summarized version. An automatic summarizer's purpose is to shorten reading time, enable easier selection and be less prejudiced than humans. It should also present compressed content while preserving the important material of the original document. Extractive and abstractive approaches are the two most common ways to summarize text. Extractive approaches choose phrases or sentences from the input text. Abstractive methods generate new words from the input text, which makes the task much more difficult.
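Whichever approach is used, summaries are commonly scored against a human-written reference with ROUGE, the metric [14] also reports. A simplified ROUGE-1 (unigram overlap) sketch follows. It uses lowercase whitespace tokenization and no stemming. Its numbers will therefore not exactly match the official ROUGE toolkit or the `rouge-score` Python package.

```python
from __future__ import annotations

from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """Unigram-overlap precision, recall and F1 between a summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter '&' keeps the minimum count per word, i.e. clipped matches.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return {"precision": precision, "recall": recall,
            "f1": 2 * precision * recall / (precision + recall)}
```

An abstractive summary that paraphrases heavily can score low on ROUGE-1 even when it is accurate. This is one reason ROUGE is often complemented by human evaluation.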

Amey Thakur and Mega Satish [17] describe text summarization as the process of producing a synopsis of a given text document while keeping its important information and meaning. Automatic summarization has become an essential method for accurately locating significant information in vast amounts of text in a short time and with minimal effort. In their project, they implement a web application that can summarize a text or a Wikipedia link and that also allows different summarization methods to be compared. Their problem statement is that the tremendous abundance of material on the internet has produced an odd paradox: people are immersed in information, yet they yearn for wisdom. It is tough to keep up with the internet's daily production of billions of articles. This raises the question of whether information can be absorbed more effectively without increasing reading time. To address this problem, they propose a text summarizer web app built with NLP techniques and the NLTK library.

Shivani Patil et al. [18] proposed video summarization in regional languages. Their methodology uses NLP, latent semantic analysis (LSA) and MoviePy, and aims to produce a short video from a long one without missing any point. The technique first produces a short version of any downloaded video. A web application takes the video and the desired accuracy as input and converts the summarized video into text. It then translates this text into any regional language. The authors present this as an extraordinary NLP application that benefits students and teachers by saving time.

CHAPTER 3

MATERIALS & METHODS (TECHNICAL DETAILS)

3.1 Project Category


I am using Natural Language Processing (NLP) analysis based on information
extraction techniques. This paradigm, making use of techniques from artificial
intelligence, entails performing a detailed semantic analysis of the source text to
build a source representation designed for a particular application. Then a summary
representation is formed using this source representation and the output summary
text is synthesized.
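The three stages just described can be made concrete with a deliberately simple extractive stand-in. The stages are: analyzing the source into a representation, forming a summary representation from it, and synthesizing the output text. In this sketch the "source representation" is only the sentences plus normalized word frequencies. It illustrates the paradigm, not this project's summarizer, which relies on an abstractive transformer model. All names here are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on"}


@dataclass
class SourceRepresentation:
    sentences: list[str]
    sentence_tokens: list[list[str]]
    word_weights: dict[str, float]


def analyze(text: str) -> SourceRepresentation:
    """Stage 1: segment sentences, tokenize, drop stop words, weight words by frequency."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [[w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
              for s in sentences]
    counts = Counter(w for sent in tokens for w in sent)
    top = max(counts.values(), default=1)
    weights = {w: c / top for w, c in counts.items()}  # most frequent word gets 1.0
    return SourceRepresentation(sentences, tokens, weights)


def select(rep: SourceRepresentation, k: int) -> list[int]:
    """Stage 2: the summary representation is the indices of the k best sentences."""
    def score(i: int) -> float:
        toks = rep.sentence_tokens[i]
        return sum(rep.word_weights[w] for w in toks) / len(toks) if toks else 0.0
    best = sorted(range(len(rep.sentences)), key=score, reverse=True)[:k]
    return sorted(best)


def synthesize(rep: SourceRepresentation, indices: list[int]) -> str:
    """Stage 3: produce the output summary text from the chosen sentences."""
    return " ".join(rep.sentences[i] for i in indices)
```

In an abstractive system, stages 2 and 3 are replaced by a model that generates new sentences instead of selecting existing ones.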

Fig 3.1: Steps of Natural Language Processing.

3.2 Techniques to be used


3.2.1 Languages

• Python - Python is a popular programming language. It was created by Guido van Rossum and first released in 1991. It lets developers write clear programs on both small and large scales. Python has a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, procedural and functional programming, and has a large and comprehensive standard library.
• Flask - Flask is a web framework: a Python module that lets you develop web applications easily. It has a small, easy-to-extend core. It is a microframework that does not include an ORM (Object Relational Mapper) or similar features. Flask is based on the Werkzeug WSGI toolkit and the Jinja2 template engine. Both are Pocco projects.
 Json - JSON (JavaScript Object Notation) is a lightweight data-interchange
format. It is easy for humans to read and write. It is easy for machines to
parse and generate. It is based on a subset of the JavaScript Programming
Language Standard ECMA-262 3rd Edition - December 1999. JSON is a
text format that is completely language independent but uses conventions
that are familiar to programmers of the C-family of languages, including C,
C++, C#, Java, JavaScript, Perl, Python, and many others. These properties
make JSON an ideal data-interchange language.
 JavaScript - JavaScript is a simple programming language. It is designed to
build web-centric applications. It complements and integrates with Java.
JavaScript is very easy to use as it integrates with HTML. It is open-source
and cross platform.
 Html - Html stands for HyperText Markup Language. It is used to create
web pages and web applications. It is a very easy and simple language. It
can be easily understood and modified. It is a markup language, so it
provides a flexible way to design web pages along with the text.
• CSS - Cascading Style Sheets (CSS) is a stylesheet language used to describe
the presentation of a document written in HTML or XML (including XML
dialects such as SVG, MathML or XHTML). CSS describes how elements
should be rendered on screen, on paper, in speech, or on other media.
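Because JSON is language independent, the same manifest file that Chrome reads can also be parsed by Python's standard json module. The sketch below is illustrative only: it uses a trimmed-down copy of the extension's manifest (fields taken from the appendix), and the helper name `check_manifest` is hypothetical.

```python
import json

# Trimmed-down copy of the extension's manifest (fields from the appendix).
MANIFEST_TEXT = """
{
  "manifest_version": 3,
  "name": "Youtube Summariser",
  "version": "1.0",
  "action": {"default_popup": "popup.html"}
}
"""


def check_manifest(text):
    """Parse manifest JSON; return (name, popup page) or raise ValueError."""
    manifest = json.loads(text)  # JSON object -> Python dict
    if manifest.get("manifest_version") != 3:
        raise ValueError("expected a Manifest V3 extension")
    return manifest["name"], manifest["action"]["default_popup"]


name, popup = check_manifest(MANIFEST_TEXT)
print(name, popup)  # Youtube Summariser popup.html

# The reverse direction: Python dict -> JSON text.
print(json.dumps({"name": name, "popup": popup}))
# {"name": "Youtube Summariser", "popup": "popup.html"}
```

json.loads turns a JSON object into a Python dict and json.dumps does the reverse, so the popup (JavaScript) and the backend (Python) can exchange the same data without any custom parsing code.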

3.2.2 Tools

• Transformers - Hugging Face Transformers is a Python library that provides APIs
and tools to easily download and train state-of-the-art pretrained models. Using
pretrained models reduces compute costs and carbon footprint, and saves the time
and resources required to train a model from scratch. These models support
common tasks in several modalities, such as natural language processing
(including summarization), computer vision and audio.

• Visual Studio Code - Visual Studio Code is a free source-code editor developed
by Microsoft for Windows, Linux and macOS. It provides debugging, syntax
highlighting, intelligent code completion, integrated Git support and a large
library of extensions for languages such as Python and JavaScript, which makes
software development easy.

3.3 Alternative Techniques Available


3.3.1 Django
Django is a high-level Python web framework that encourages rapid development
and clean, pragmatic design. Built by experienced developers, it takes care of much
of the hassle of web development, so you can focus on writing your app without
needing to reinvent the wheel. It’s free and open source.
3.3.2 Node.js
Node.js is an open source, cross-platform runtime environment and library that is
used for running web applications outside the client’s browser. It is used for
server-side programming, and primarily deployed for non-blocking, event-driven
servers, such as traditional web sites and back-end API services, but was
originally designed with real-time, push-based architectures in mind.

3.4 Hardware and Software resource requirements and their specifications

3.4.1 Hardware Requirements

• Processor: Intel® Core™ i3 or above

• RAM: 4 GB or more

• Hard Disk: 120 GB

• Input Devices: Keyboard, Mouse

3.4.2 Software Requirements

• Operating System: Windows 10/11 or macOS

• Programming languages: Python 3.10, JavaScript, HTML, CSS

• Special Tools: YouTube Transcript API (youtube-transcript-api), Google Chrome

CHAPTER 4

PROPOSED METHODOLOGY

4.1. Proposed Algorithm

Steps for YouTube transcript summarization:


1. The first step is to get the link of the video that the user wants to
summarize. The video must be a recorded video, have a valid video ID and be
available on YouTube.
2. After taking the video link from the user, the next step is to get the
transcript of the video. The system first checks whether subtitles are
available for the given video.
3. The transcript is passed to the text summarizer. This is the main phase of
the project, on which the whole project depends: it performs the text
summarization.
4. The summarizer condenses the transcript text. If required, the summary can
be downloaded in PDF format.

Fig 4.1: Workflow of my Project
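The four steps above can be sketched as a small Python function. This is a minimal sketch, not the project's code: `extract_video_id` and `summarize_video` are hypothetical names, and the transcript fetcher and the summarizer are passed in as functions so that the control flow can run without network access. In the real system they would wrap youtube-transcript-api and Hugging Face Transformers. The optional PDF download in step 4 is not shown.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Step 1: return the video ID from a YouTube link, or None if absent."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def summarize_video(url, fetch_transcript, summarize):
    """Run steps 1-4 with caller-supplied fetch and summarize functions."""
    video_id = extract_video_id(url)
    if not video_id:
        raise ValueError("not a valid YouTube video link")
    transcript = fetch_transcript(video_id)  # Step 2: get the subtitles
    if not transcript:
        raise LookupError("no subtitles available for this video")
    return summarize(transcript)  # Steps 3-4: summarize the text


# Usage with stand-in functions (no network needed).
def fake_fetch(video_id):
    return "hello world " * 3


def first_two_words(text):
    return " ".join(text.split()[:2])


print(summarize_video("https://www.youtube.com/watch?v=abc123&t=42s",
                      fake_fetch, first_two_words))  # hello world
```

Reading the `v` parameter with parse_qs, rather than splitting the URL on "=", keeps the ID correct when the link carries extra parameters such as a timestamp.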

4.2 System Architecture and Flowchart

4.2.1 System Architecture


A. Backend
The main functionality of the system is implemented in the Python programming
language. Python has a rich ecosystem of third-party packages, such as
youtube-transcript-api, which is used to get the subtitles of videos. For
summarization we use Hugging Face Transformers. To translate text into
different languages, a Google Translate API will be useful.

B. Get Transcript
Using a Python library called youtube-transcript-api, we can get the
transcripts/subtitles for a given YouTube video. It can also retrieve the
subtitles that YouTube generates automatically, so videos without manually
created subtitles can still be processed.
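In the library releases before 1.0, `YouTubeTranscriptApi.get_transcript` returns a list of dictionaries, one per caption segment, each with `text`, `start` and `duration` keys. The sketch below joins the segment texts into one plain string before summarization. The helper name `segments_to_text`, the made-up sample segments, and the optional removal of bracketed markers such as "[Music]" are assumptions for illustration.

```python
import re

# Sample data in the shape returned by YouTubeTranscriptApi.get_transcript
# (pre-1.0 releases). The values are made up for this example.
SEGMENTS = [
    {"text": "[Music]", "start": 0.0, "duration": 2.5},
    {"text": "welcome to this\nlecture on sorting", "start": 2.5, "duration": 3.1},
    {"text": "today we cover  merge sort", "start": 5.6, "duration": 2.8},
]


def segments_to_text(segments, drop_tags=True):
    """Join caption segments into one plain-text transcript."""
    pieces = []
    for segment in segments:
        text = segment["text"]
        if drop_tags:
            # Remove bracketed non-speech markers such as "[Music]".
            text = re.sub(r"\[[^\]]*\]", " ", text)
        pieces.append(text)
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join(" ".join(pieces).split())


print(segments_to_text(SEGMENTS))
# welcome to this lecture on sorting today we cover merge sort
```

The `start` and `duration` fields are ignored here; they would only be needed if the summary had to be linked back to timestamps in the video.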

C. Text Summarization
Text summarization is the process of condensing a longer text into a concise
summary while keeping its main ideas and general meaning.
Two methods are frequently used for text summarization:
1) Extractive Summarization: The model identifies the most important phrases
and sentences in the source text and outputs only those, unchanged.
2) Abstractive Summarization: The model generates new sentences, producing a
distinct text that is shorter than the original. This project uses
Transformers to implement this strategy: abstractive summarization is
performed on the transcript obtained in the previous step using the Python
Hugging Face transformers module.
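To make the contrast concrete, here is a minimal extractive summarizer. It scores each sentence by how frequent its words are in the whole text and returns the top sentences unchanged, in their original order. This word-frequency baseline is written for illustration only and is not the project's method; the project uses abstractive summarization with Hugging Face Transformers, which writes new sentences instead of copying them. The function name and the short stop-word list are assumptions.

```python
import re
from collections import Counter

# A small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in",
              "it", "this", "that", "for", "on", "with", "as", "be", "we"}


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence):
        # Average frequency of the sentence's non-stop-words.
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOP_WORDS]
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)


text = "Python is popular. Python summarization uses Python models. Cats sleep."
print(extractive_summary(text, 1))  # Python is popular.
```

Automatically generated YouTube transcripts often have little or no punctuation, so a sentence-based extractive method like this one can struggle on them; this is one practical argument for the abstractive approach used here.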

D. User Interface
A user interface is needed so that the user can interact with the system. The
interface is built with HTML, CSS and JavaScript as a Chrome extension popup,
and it communicates with a backend built with the Flask framework. This gives
users a convenient way to interact with the system.

4.2.2 Flowchart

Fig 4.2.2: Flowchart of YouTube transcript summarizer

CHAPTER 5

TESTING AND SECURITY MECHANISMS


5.1 Testing Technologies

In this project, each module is first tested on its own (unit testing and
black-box testing), and then the integrated application is tested as a whole
(system testing). All parts of the system were tested.

5.2 Testing Objectives

Several rules can serve as testing objectives:

• Testing is a process of executing a program with the intent of finding an error.


• A good test case is the one that has a high probability of finding an
undiscovered error.

• Unit Testing Steps:


i. Preparation of Test Cases.
ii. Preparation of possible test data with all validation checks.
iii. Complete Code Review of The Model.
iv. Actual Testing done manually.
v. Prepared Test Result Script.

• Black Box Testing Steps:


In this strategy, test cases are generated from input conditions that fully
exercise all functional requirements of the program. Only the output is
checked for correctness.
This testing has been used to find errors in the following categories:
i. Interface errors.
ii. Errors in data structures or external database access.
iii. Performance errors.
iv. Initialization and termination errors.

• System Testing Steps:
i. Integration of all modules in the system.
ii. Preparation of test cases.
iii. Preparation of possible test data with all validation checks.
iv. Actual testing done manually.
v. Recording of all reproduced errors.
vi. Modifications done for the errors found during testing.
vii. Prepared the test result script after rectification of errors.

After unit testing is complete for all modules, the modules are integrated into
the whole system together with all their dependencies. During integration, each
module is considered individually and the system is tested at every step. This
helps reduce errors during system testing.
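As an example of the unit-testing step, the sketch below uses Python's built-in unittest module to test a fixed-size chunking helper like the one the backend uses to feed the transcript to the summarizer in 1000-character pieces. The helper name `split_into_chunks` is hypothetical. The second test targets an edge case: a loop written as `range(len(text)//1000 + 1)` would produce an extra, empty chunk whenever the text length is an exact multiple of 1000.

```python
import unittest


def split_into_chunks(text, size=1000):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


class SplitIntoChunksTest(unittest.TestCase):
    def test_chunks_rebuild_original(self):
        text = "abc" * 700  # 2100 characters -> 3 chunks
        self.assertEqual("".join(split_into_chunks(text)), text)

    def test_no_empty_chunk_at_exact_multiple(self):
        # Edge case: a 2000-character text must give exactly two chunks.
        chunks = split_into_chunks("x" * 2000)
        self.assertEqual(len(chunks), 2)
        self.assertTrue(all(chunks))

    def test_empty_text(self):
        self.assertEqual(split_into_chunks(""), [])


if __name__ == "__main__":
    unittest.main(argv=["split_into_chunks_test"], exit=False)
```

Each test checks one property of the helper in isolation, which matches the "preparation of test cases" and "actual testing" steps listed above; the same tests can then be rerun after every modification.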

5.3 Security Mechanism

This project does not use any special security measures. It does not collect or
store user data: the extension sends only the URL of the active tab to the
backend, which runs locally at 127.0.0.1:5000, and the backend returns the
summary text to the popup. For this local setup, no special security is required.
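If the backend were ever deployed beyond the local machine, a simple first safeguard would be to reject any URL that is not a YouTube link before fetching transcripts. The sketch below is a hypothetical hardening step, not part of the project: the function name, the host allowlist and the length limit are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of YouTube hosts accepted by the backend.
ALLOWED_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}
MAX_URL_LENGTH = 2048  # arbitrary upper bound chosen for this sketch


def is_allowed_youtube_url(url):
    """Return True only for http(s) URLs on a known YouTube host."""
    if not url or len(url) > MAX_URL_LENGTH:
        return False
    try:
        parsed = urlparse(url)
    except ValueError:  # e.g. a malformed IPv6 host such as "http://["
        return False
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS
```

Comparing the parsed hostname against an exact set, rather than checking whether the URL merely contains "youtube.com", rejects look-alike hosts such as www.youtube.com.evil.example. In the Flask route, a failed check could return an HTTP 400 response.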

CHAPTER 6

FUTURE SCOPE, FURTHER ENHANCEMENT AND LIMITATIONS

6.1. Future Scope


1. This idea can be further extended into a system that automatically
generates notes for a lecture.
2. It may be useful for deaf and hard-of-hearing users.
3. It can be used to generate meeting notes (all the important points covered
in a virtual meeting).
4. The model can also be used to arrange the important points discussed in
parliamentary sessions and other government planning meetings.
6.2. Further Enhancement
1. Improve the summarization algorithm by incorporating advanced NLP
techniques.
2. Allow users to customize the level of summarization based on their
preferences.
3. Integrate machine learning models for better understanding and
summarization of video content.
4. Expand support to other video platforms beyond YouTube.
5. Implement additional features, such as keyword extraction and topic analysis,
to enhance the summarization process.
6.3. Limitations
i. Transcripts cannot be obtained from videos that have no subtitles.
ii. Translated summaries in languages other than English cannot be exported to
text or PDF files because of character-encoding issues.
iii. The summarization algorithm used in the YouTube Transcript Summarizer may
not always produce perfectly accurate summaries. The generated summaries
may occasionally miss important details or misinterpret certain aspects of the
transcript.
iv. The effectiveness of the summarizer heavily relies on the quality and accuracy
of the provided video transcripts. If the transcript itself contains errors, typos, or

inaccuracies, it can impact the quality and coherence of the generated
summaries. Additionally, the summarizer may struggle with summarizing
videos that have poor audio quality or unclear speech.
v. The YouTube Transcript Summarizer focuses solely on the textual content of
the video transcripts. It does not take into account any visual information, such
as images, graphs, or demonstrations present in the videos. As a result, the
summaries may not capture the full richness of the video content, particularly
when visual elements play a significant role.
vi. The YouTube Transcript Summarizer relies on youtube-transcript-api, an
unofficial third-party Python library, to fetch video transcripts. Any changes
or restrictions that YouTube imposes on its platform, or changes to the library
itself, may affect the functionality or availability of the extension and may
require updates or adjustments to ensure continued compatibility.
vii. The YouTube Transcript Summarizer is developed as a Chrome extension,
limiting its usage to the Chrome browser. Users on other browsers or platforms
may not have access to the extension's features. Additionally, future updates or
changes to the Chrome browser or its extension framework may require
modifications to maintain compatibility.
viii. The YouTube Transcript Summarizer project may have limited flexibility in
terms of user control over summarization parameters. Users may not have the
ability to customize the summarization process, such as adjusting the length of
the summary or specifying the level of detail required. This lack of
customization could limit the project's suitability for individual user preferences
and requirements.
ix. The project's user interface (UI) may have limited customization options. Users
may have minimal control over the appearance, layout, or visual aspects of the
extension's UI. The project may focus on providing a functional and intuitive UI
without extensive customization features, which could restrict users who prefer
more personalized or tailored UI experiences.

CHAPTER 7

CONCLUSION

This project has proposed a YouTube Transcript Summarizer. When the user clicks
the Summarise button in the Chrome extension popup, the system takes the YouTube
video open in the current tab and fetches its transcript using the
youtube-transcript-api Python library. The transcript is then summarized with
the Hugging Face transformers package, and the summary text is shown to the user
in the extension popup. The project saves users time and resources by letting
them grasp the gist of a video without watching all of it. A summary also helps
users spot unusual or unwanted content before watching, so that it does not
interfere with their viewing experience. Because the project is built as a
Chrome extension, the summary is available with a single click from the browser.

CHAPTER 8

BIBLIOGRAPHY

8.1 REFERENCES

[1] Chaflekar, Prof & Bahadure, Achal & Bramhapurikar, Hosanna & Satpute,
Ruchika & Jumde, Rutuja & Bakhare, Sakshi & Bhirange, Shivani. (2022).
YouTube Transcript Summarizer using Natural Language Processing.
International Journal of Advanced Research in Science, Communication and
Technology. 108-113. 10.48175/IJARSCT-3034.
[2] Haq, Hafiz Burhan & Asif, Muhammad & Ahmad, Maaz & Ashraf, Rehan &
Mahmood, Toqeer. (2022). An Effective Video Summarization Framework
Based on the Object of Interest Using Deep Learning. Mathematical Problems
in Engineering. 2022. 1-25. 10.1155/2022/7453744.
[3] A. N. S. S. Vybhavi, L. V. Saroja, J. Duvvuru and J. Bayana, "Video
Transcript Summarizer," 2022 International Mobile and Embedded
Technology Conference (MECON), 2022, pp. 461-465, doi:
10.1109/MECON53876.2022.9751991.
[4] Bassel, Fady & Refaat, Mark & Abdelhamed, Mohamed & Shorim, Nada &
AbdelRaouf, Ashraf. (2021). Automatic Video summarization with
Timestamps using natural language processing fusion. 0060-0066.
10.1109/CCWC51732.2021.9376115.
[5] Shraddha Yadav, Arun Kumar Behra, Chandra Shekhar Sahu, Nilmani
Chandrakar, “SUMMARY AND KEYWORD EXTRACTION FROM
YOUTUBE VIDEO TRANSCRIPT”, International Research Journal of
Modernization in Engineering Technology and Science
Volume:03/Issue:06/June-2021 Impact Factor- 5.354.
[6] E. Apostolidis, E. Adamantidou, A. I. Metsai, V. Mezaris and I. Patras,
"Video Summarization Using Deep Neural Networks: A Survey," in
Proceedings of the IEEE, vol. 109, no. 11, pp. 1838-1863, Nov. 2021,
doi:10.1109/JPROC.2021.3117472.
[7] Yudong Jiang, Kaixu Cui, Bo Peng, Changliang Xu; “Comprehensive Video
Understanding: Video Summarization with Content-Based Video
Recommender Design”; Proceedings of the IEEE/CVF International
Conference on Computer Vision (ICCV), 2019, pp. 0-0.
[8] Dilawari, Aniqa & Khan, Muhammad Usman. (2019). ASoVS: Abstractive
Summarization of Video Sequences. IEEE Access. PP. 1-1.
10.1109/ACCESS.2019.2902507.
[9] P. Choudhary, S. P. Munukutla, K. S. Rajesh and A. S. Shukla, "Real time
video summarization on mobile platform," 2017 IEEE International

Conference on Multimedia and Expo (ICME), 2017, pp. 1045-1050, doi:
10.1109/ICME.2017.8019530.
[10] Thomas, Justine & Bharti, Drsantosh & Babu, Korra. (2016). Automatic
Keyword Detection for Text Summarization in e-Newspapers.
10.1145/2980258.2980442.
[11] Bin Zhao, Eric P. Xing; Quasi Real-Time Summarization for Consumer
Videos; Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2014, pp. 2513-2520.
[12] Widodo, I. Diani, and S. Safnil, “The Rhetorical Structure of Short Lecture by
Famous Applied Linguists Jack C. Richards Posted on YouTube”, JADILA,
vol. 1, no. 2, pp. 128-138, Nov. 2020.
[13] Sourav Biswas, A. K. P. (2022) “YouTube Transcript Summarizer to
Summarize the content of YouTube.” Zenodo.
doi:10.5281/ZENODO.6511886.
[14] Albeer, Rand & Alshahad, Huda & Aleqabie, Hiba J. & Al-Shakarchy, Noor.
(2022). Automatic summarization of YouTube video transcription text using
term frequency-inverse document frequency.
[15] Kadam, V. P., Alazani, S. A. and Namrata Mahender, C. (2022) “A text
summarization system for Marathi language.” Zenodo. doi:
10.5281/ZENODO.7073509.
[16] Tharun, S. & Kumar, R. & Sravanth, P. & Reddy, G. & Akshay, B. (2022).
Survey on Abstractive Transcript Summarization of YouTube Videos.
International Journal of Advanced Research in Science, Communication and
Technology. 231-238. 10.48175/IJARSCT-3181.
[17] Thakur, Amey & Satish, Mega. (2021). Text Summarizer.
10.13140/RG.2.2.17259.67360.
[18] Patil, Shivani & Yadav, Swati & Shinde, Shreya & Waghmare, Darshani &
Patil, Rutuja & Babar, Prof. (2022). Video Transcript Summarization in
Marathi. International Journal of Advanced Research in Science,
Communication and Technology. 82-86. 10.48175/IJARSCT-4983.

8.2 SNAPSHOTS

Snapshot 1: YouTube transcript summarizer extension.

Snapshot 2. Interface of the extension.

Snapshot 3. Extension Summarizes the Transcript.

8.3 APPENDIX

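# Code of the main app (Flask backend)

from urllib.parse import urlparse, parse_qs

from flask import Flask, request
from youtube_transcript_api import YouTubeTranscriptApi
from transformers import pipeline

app = Flask(__name__)

# Load the summarization model once at startup, not on every request.
summariser = pipeline('summarization')


@app.get('/summary')
def summary_api():
    url = request.args.get('url', '')
    # Read the "v" query parameter; splitting on '=' fails for URLs such as
    # https://www.youtube.com/watch?v=ID&t=30s.
    video_id = parse_qs(urlparse(url).query).get('v', [''])[0]
    if not video_id:
        return 'Invalid YouTube video URL', 400
    summary = get_summary(get_transcript(video_id))
    return summary, 200


def get_transcript(video_id):
    # get_transcript (youtube-transcript-api releases before 1.0) returns a
    # list of {'text', 'start', 'duration'} dicts, one per caption segment.
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id)
    return ' '.join(d['text'] for d in transcript_list)


def get_summary(transcript):
    # The model accepts a limited input length, so summarize the transcript
    # in 1000-character chunks and join the partial summaries.
    summary = ''
    for i in range(0, len(transcript), 1000):
        chunk = transcript[i:i + 1000]
        summary += summariser(chunk)[0]['summary_text'] + ' '
    return summary.strip()


if __name__ == '__main__':
    app.run()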
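For manual testing without the browser, the backend can also be called from a short Python script. This is a minimal sketch using only the standard library; the function names are hypothetical, it assumes the Flask app above is running at 127.0.0.1:5000, and the 300-second timeout is an arbitrary allowance for slow summarization. The request URL is built with urlencode so that characters such as "&" and "?" inside the video link are percent-encoded rather than read as extra parameters of the /summary request.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address used by the extension's popup.js and manifest in this appendix.
BASE_URL = "http://127.0.0.1:5000/summary"


def build_summary_url(video_url, base_url=BASE_URL):
    """Build the GET URL for /summary, percent-encoding the video link."""
    return base_url + "?" + urlencode({"url": video_url})


def fetch_summary(video_url, timeout=300):
    """Call the running Flask backend and return the summary text."""
    with urlopen(build_summary_url(video_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_summary_url("https://www.youtube.com/watch?v=abc123&t=42s"))
# http://127.0.0.1:5000/summary?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dabc123%26t%3D42s
```

With the server running, `print(fetch_summary("https://www.youtube.com/watch?v=<id>"))` prints the summary for a real video ID.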

# HTML code for the Chrome extension popup (popup.html)

<!DOCTYPE html>
<html>
<head>
  <title>Youtube Transcript Summariser</title>
  <style>
    h1 {
      text-align: center;
    }
    body {
      width: max-content;
      max-width: 800px;
    }
    button {
      background-color: red;
      color: white;
      border-radius: 8px;
      width: max-content;
      height: max-content;
      padding: 10px;
      font-size: large;
      margin: auto;
      display: block;
      border-color: coral;
    }
    /* A disabled button keeps the layout above and only changes color. */
    button[disabled] {
      background-color: lightcoral;
      border-color: lightpink;
    }
    p {
      font-size: medium;
    }
  </style>
</head>
<body>
  <h1>Youtube Transcript Summariser</h1>
  <button id="summarise" type="button">Summarise</button>
  <br/>
  <p id="output"></p>
  <script src="popup.js"></script>
</body>
</html>

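# JavaScript code for the extension popup (popup.js)

const btn = document.getElementById("summarise");
const output = document.getElementById("output");

btn.addEventListener("click", function () {
  btn.disabled = true;
  btn.innerHTML = "Summarising...";
  chrome.tabs.query({ currentWindow: true, active: true }, function (tabs) {
    const url = tabs[0].url;
    const xhr = new XMLHttpRequest();
    // Encode the video URL so that characters such as "&" in it are not
    // treated as separate parameters of the /summary request.
    xhr.open("GET", "http://127.0.0.1:5000/summary?url=" + encodeURIComponent(url), true);
    xhr.onload = function () {
      // textContent shows the summary as plain text and never runs it as HTML.
      output.textContent = xhr.responseText;
      resetButton();
    };
    xhr.onerror = function () {
      output.textContent = "Could not reach the summarizer backend.";
      resetButton();
    };
    xhr.send();
  });
});

function resetButton() {
  btn.disabled = false;
  btn.innerHTML = "Summarise";
}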

# Code for the JSON manifest used in the extension (manifest.json)
{
  "manifest_version": 3,
  "name": "Youtube Summariser",
  "description": "An extension to summarize youtube videos using the transcript",
  "version": "1.0",
  "permissions": ["activeTab", "declarativeContent"],
  "host_permissions": ["http://127.0.0.1:5000/*"],

  "action": {
    "default_title": "Summarise this video",
    "default_icon": {
      "16": "images/icon.png",
      "32": "images/icon.png",
      "48": "images/icon.png",
      "128": "images/icon.png"
    },
    "default_popup": "popup.html"
  },

  "icons": {
    "16": "images/icon.png",
    "32": "images/icon.png",
    "48": "images/icon.png",
    "128": "images/icon.png"
  }
}

CHAPTER 9

BIBLIOGRAPHICAL SKETCH

Er. Aman Singh (Supervisor) Assistant Professor.

Er. Aman Singh is currently serving as Assistant Professor in the Post Graduate
Department of Computer Science & Engineering of the Raja Balwant Singh
Engineering Technical Campus, Bichpuri, Agra. He obtained his B.Tech degree in
Computer Science and Engineering from U.P.T.U with First Division in 2011. He obtained
the Master of Technology (M.Tech) degree in Computer Science & Engineering from SRM
University, Chennai with First Division in 2014. He has eight years of experience.
He is presently engaged in research and development activities in the areas of Data Structures,
Software Engineering, Computer Organization and Architecture, and DAA.
Academic Qualification: B. Tech and M. Tech, Ph.D. (Pursuing)
Designation with Department: Assistant Professor (Computer Science & Engineering)
Contact No: 9358656548
Email: [email protected]
Specialization: Data Structure, Software Engineering, Computer Organization and
Architecture and DAA.
Experience: 08 Years

Present Area of work: Python Programming, C Programming, Data Structure and
Software Engineering.

Research Articles/Published/Membership:

• Research Articles Published: 07
• Papers published in International and National conferences: 07
• CBIR Algorithm for Image Feature Extraction Using Color, Texture and Shape Models.
• Different Approaches of Image Retrieval Techniques.
• A Review on Application of Digital Image Processing on Biotechnology & Bioscience.
• Video Based Face Recognition Biometric Security System.
• An Efficient Approach for Face Identification Using Neural Network.
• Applications of Computer in Agricultural Research - A Review
• Big Data and Its Use in Smart Farming and Agricultural Data Analysis

Journals/Academic Achievements:
• Participated in a two-week ISTE STTP on Technical Communication conducted by
IIT Bombay, 2015.
• Successfully completed FDP101x Foundation Program in ICT for Education by IIT
Bombay, 2017.
• Participated in Faculty Development Program on “Android Skilling” by Google,
AKTU, IEI Agra region, 2017.
• Participated in FDP on Natural Language Processing (WNLP-2017) sponsored by
Dr. A.P.J. Abdul Kalam Technical University, Lucknow, UP.
• No. of B.Tech. Students Guided: 20

Prof. (Dr.) Brajesh Kumar Singh (H.O.D)

Dr. Brajesh Kumar Singh was born in District Agra (U.P.) in 1978. He completed his
doctorate degree in Computer Science and Engineering from Motilal Nehru National
Institute of Technology, Allahabad (U.P.) in 2014. He joined R.B.S. Engineering
Technical Campus, Bichpuri, Agra as a Lecturer/Asstt. Prof. in 2001. In 2007, he was
appointed as Reader/Assoc. Prof. in the same organization. In December 2017, he took
over charge as Head of the Department of Computer Science and Engineering. In October
2018, he was promoted to the post of Professor. He has guided more than 50 B.Tech. and
9 M.Tech. projects of national and international repute. He is supervising 2 Ph.D.
candidates. He has 50 publications to his credit in national and international journals
and proceedings of high repute, with a large number of citations. Dr. Singh has
delivered several invited talks/keynote addresses and chaired sessions in national and
international conferences of high repute in India and abroad. He has collaborated with
IIT Bombay on training programs/workshops. He has contributed significantly to
enhancing the research standards in the Department of CSE. He is a recipient of IBM
Best Project awards. Dr. Singh has successfully organized more than 45 international
and national conferences/seminars/workshops as organizing secretary/member of
international program committees in India and abroad. He is the editor of highly
reputed national/international journals.

Academic Qualification: Ph.D. in CSE


Designation with Department: Professor & Head (Computer Science & Engineering)
Contact No: 9675430802
Email: [email protected]
Specialization: Computer Science and Engineering
Experience: 21 Years and 6 Months
Research Articles/Published/Membership: 57

Present Area of work: Software Engineering, Software Project Management, Data
Mining, Soft Computing, Computer Vision, IoT, Cloud Computing.
Awards and Recognitions
• Best Project Award by IBM.
• Best Paper Awards
• Chaired Springer Sponsored International Conference at Ajmer, India in 2017.
• Chaired Springer Sponsored International Conference at Ajmer, India in 2018.
• Coordinator, Spoken Tutorial training programs in collaboration with IIT
Bombay under National Mission on Education through ICT, MHRD, Govt. of
India.
• Delivered a keynote speech and chaired a session in IC4S 2017 at Phuket,
Thailand.
• Delivered a keynote speech and chaired a session in IC4S 2018 at Bangkok,
Thailand.
• Delivered an invited talk at the ITS campus, Sukolilo, Surabaya, Indonesia, as
visiting professor in the workshop on Software Testing for the Information
System International Conference (ISICO), held during July 22-25, 2019.
• Founder Developer of College Website: www.fetrbs.org
• Guiding 01 Ph. D. Scholars enrolled with AKTU, Lucknow.
• Invitation from IEEE international conference, China to Chair a session
• Member of IEEE Society and IEEE Communications Society, the largest
technical professional society in the world.
• Member of various International Associations/Societies of Artificial
Intelligence/Computer Science/ Scientific Computing.
• No. of M. Tech. Scholars Guided: 10
• Nominated treasurer for the IEEE UP Section SP/C (Signal
Processing/Computer) Joint Chapter in 2014.
• One Book Published for Engineering and MCA students
• Organized 1 Springer sponsored Scopus indexed International Conferences
as Conference Chair.
• Organized 2 National Conferences as Joint secretary/ Secretary.
• Supervised 01 Ph. D. Scholars enrolled with AKTU, Lucknow.
• Visited China to present Research paper in IEEE conference.

Journals/Academic Achievements

• Editor/Member of Editorial Board and Reviewer of various
International/National Journals of Intelligent Information Processing,
Multidisciplinary Advance Research in Science & Technology, Computer
Science & Information Engineering in India and abroad.
• Executive Member of various International/National conference program
committees in India, USA, China, Singapore, France, United Kingdom,
Hongkong, Japan, South Korea, Malaysia, Romania, Republic of Macedonia,
Netherlands, Greece, Denmark, Turkey, Philippines and others.

Gourav Sharma

Gourav Sharma is a final year student of Computer Science & Engineering at Raja
Balwant Singh Engineering Technical Campus, Agra. He passed his High School
and Intermediate examinations from CBSE in 2017 and 2019 respectively, with
scores of 95% and 71%. He is skilled in Python and Flask. He has won many medals
in kabaddi.

PLAGIARISM CHECK

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8
