BTP Report
by
Shalika Kumbham
(18ME10030)
Acknowledgment
I sincerely thank my supervisor Professor Krothapalli Sreenivasa Rao and
my mentor Mr. Abhijit Debnath (Ph.D. scholar) for their constant guidance.
Background:
Students have had to make many adjustments during the COVID situation, and one of the
most affected areas is the mode of classes. Creating online lecture videos and maintaining
them is a substantial task; it would be far easier if they could be organized automatically
with minimal human interaction, saving considerable time and effort. We are therefore
developing an interface that extracts the information needed to organize videos into their
respective categories. Given a video, it will provide the information belonging to those
categories. For example, for an NPTEL video on Image Processing taught by Prof. K S Rao of
the Computer Science Department, IIT Kharagpur, we have to map the information to its
corresponding columns: Publisher Name - NPTEL, Professor Name - K S Rao, Department
Name - Computer Science, etc.
Literature Review:
The TIB AV-Portal is a portal for scientific videos with a focus on technology/engineering as well
as architecture, chemistry, computer science, mathematics, and physics. Other scientific
disciplines are also included in the AV-Portal. The videos include computer visualizations,
learning material, simulations, experiments, interviews, video abstracts, lecture and
conference recordings, and (open) audiovisual learning and teaching materials. The portal
applies various automatic video analyses, whose added value is that they enable pinpoint
searches within the video content.
NDLI is an integration platform that sources metadata from a massive number of sources to
provide single-window access to a wide variety of digital learning resources. Metadata
summarizes basic information about a resource, which can make finding and working with
particular resources easier. Sources follow different metadata standards for their varied
resources, so the NDLI metadata standard has been developed to draw them into a uniform
schema; it is an envelope of several well-established global standards.
The Dublin Core Metadata Initiative (DCMI), which formulates the Dublin Core, is a project of
the Association for Information Science and Technology (ASIS&T). "Dublin Core" is also used
as an adjective for Dublin Core metadata, a style of metadata that draws on multiple
Resource Description Framework (RDF) vocabularies, packaged and constrained in Dublin
Core application profiles.
The Dublin Core Metadata Initiative (DCMI)[9] provides an open forum for developing
interoperable online metadata standards for a broad range of purposes and business models.
DCMI’s activities include
● consensus-driven working groups,
● global conferences and workshops,
● standards liaison, and
● educational efforts to promote widespread acceptance of metadata standards and
practices.
Method:
We divided the problem into subproblems and tackled them individually.
Subproblems:
1. Creation of a dataset of lecture videos containing the introduction part of each video,
and preparation of ground truth against each video.
2. Identification of keyframes, and application of a text-localization method to locate text
in the videos.
3. Creation of an indexing system to map the above-mentioned attributes from the
extracted metadata.
4. Evaluation of the efficiency of the indexing system, using the Levenshtein distance
algorithm and accuracy.
Data set creation - To create the dataset, we first collected the URLs of all the videos, along
with their start and end times, in a text file and prepared the ground truth. We then used the
youtube-dl library to download the videos from the URL links, saving them in the format we
chose from the available options. Finally, the FFmpeg tool was used to cut each video to its
respective start and end times (given in the text file).
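The download-and-trim step above can be sketched as follows. The spec-file layout (one "URL start end" entry per line) and the file names are assumptions for illustration; the report does not give its exact script.

```python
import subprocess

def build_commands(spec_line, out_name):
    """Build the download and trim commands for one spec-file entry.

    spec_line is assumed to be "URL start_time end_time", e.g.
    "https://youtu.be/abc 00:00:00 00:01:30".
    """
    url, start, end = spec_line.split()
    # youtube-dl: fetch the video in mp4 format under a fixed name
    download = ["youtube-dl", "-f", "mp4", "-o", "full.mp4", url]
    # ffmpeg: cut out [start, end] without re-encoding (-c copy)
    trim = ["ffmpeg", "-i", "full.mp4", "-ss", start, "-to", end,
            "-c", "copy", out_name]
    return download, trim

# Usage (not executed here):
# dl, tr = build_commands("https://youtu.be/abc 00:00:00 00:01:30", "intro.mp4")
# subprocess.run(dl, check=True)
# subprocess.run(tr, check=True)
```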
Text localization - EasyOCR is used to identify the text in a given frame. It provides the text,
its probability, and the coordinates of the bounding box for each text box. Text can be
recognized either as lines or as paragraphs; for our convenience, we used the paragraph
option.
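A minimal sketch of this OCR step (the wrapper function and file name are ours, not the report's code): with paragraph=True, EasyOCR's readtext merges nearby boxes and returns (bounding box, text) pairs, dropping the per-box confidence that line mode reports.

```python
def frame_text(image_path, reader):
    """Collect paragraph-level text from one keyframe.

    reader is an easyocr.Reader; with paragraph=True, readtext returns
    (bounding_box, text) pairs for each merged paragraph.
    """
    results = reader.readtext(image_path, paragraph=True)
    return [text for _bbox, text in results]

# Usage (requires `pip install easyocr`):
# import easyocr
# reader = easyocr.Reader(['en'])   # loads English detection/recognition models
# print(frame_text('keyframe.png', reader))
```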
Indexing system - We used dictionaries of Publisher Names, Institute Names, and
Department Names, together with the fuzzywuzzy library, to locate these categories in the
text and map them accordingly. To identify professor names, we used predefined rules
combined with an NER model to extract person names from a given string.
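The dictionary lookup can be sketched as below. The report uses the fuzzywuzzy library; this sketch substitutes the stdlib difflib ratio for the same fuzzy-matching idea, and the dictionary entries and threshold are illustrative assumptions, not the report's actual lists.

```python
from difflib import SequenceMatcher

# Illustrative dictionary for one category (not the report's actual list)
PUBLISHERS = ["NPTEL", "MIT OpenCourseWare", "Coursera"]

def best_match(ocr_text, choices, threshold=0.8):
    """Map a (possibly misread) OCR string to its closest dictionary entry.

    Returns the best-scoring entry, or None if nothing clears the
    similarity threshold.
    """
    best, best_score = None, 0.0
    for choice in choices:
        score = SequenceMatcher(None, ocr_text.lower(), choice.lower()).ratio()
        if score > best_score:
            best, best_score = choice, score
    return best if best_score >= threshold else None
```

This tolerates single-character OCR errors such as "NPTEI" still mapping to "NPTEL".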
Evaluation - We evaluated efficiency using the accuracy formula, i.e., the number of videos
mapped correctly for a particular category divided by the total number of videos. Since
EasyOCR might not read every letter correctly, there may be errors in the professors' names,
so for the Professor Name category we combined the Levenshtein distance algorithm with
accuracy.
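The combined metric can be sketched as follows: count a predicted name as correct if its edit distance to the ground truth is small. The edit-distance tolerance (here 2) is an assumption; the report does not state its exact threshold.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def name_accuracy(predicted, ground_truth, max_dist=2):
    """Fraction of videos whose predicted name is within max_dist edits
    of the ground-truth name (tolerance threshold is an assumption)."""
    correct = sum(levenshtein(p, g) <= max_dist
                  for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)
```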
Results:
Accuracies for the following categories are:
1. Publisher Names - 88.03%
2. Institute Names - 88.88%
3. Department Names - 82.47%
4. Professor Names - 85.89%
Conclusion:
We developed an indexing system that maps the four categories efficiently.
Given the time constraints, we could not pursue the other two categories, so future work can
focus on mapping them and on developing an interface, such as a web application, that
returns the individual results when given a video as input. Although our keyframe-extraction
accuracy is better than that of the other methods we tried, it could still be improved to
obtain clearer frames, which would help reduce OCR errors and avoid repeated frames.