
A thesis on Automatic extraction of relevant metadata from educational videos for efficient indexing

Project-I (ME47603) report

by

Shalika Kumbham
(18ME10030)

Under the supervision of
Professor Krothapalli Sreenivasa Rao

Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur

Acknowledgment
I sincerely thank my supervisor, Professor Krothapalli Sreenivasa Rao, and my mentor, Mr. Abhijit Debnath (Ph.D. Scholar), for their constant guidance. I thank IIT Kharagpur for providing the opportunity to take up my project in a department outside my own, as per my interests.
Abstract:
Due to the COVID situation, more classes are being held online, so more video lectures are being recorded, maintained, and uploaded. We generally observe that these educational videos have metadata containing five to six attributes: Institute Name, Publisher Name, Department Name, Professor Name, Subject Name, and Topic Name. It would be much easier to maintain these videos if we could organize them by these categories. In this project, we try to extract the above metadata from video lectures.

Background:
Students have had to make many adjustments to the COVID situation, and one of the most affected aspects is the mode of classes. Creating online videos and maintaining them is a tedious task. It would be much easier if we could organize them automatically, with little human interaction, saving a lot of time and effort. We are therefore trying to develop an interface that extracts the information needed to organize videos into their respective categories. Given a video, it will provide the information belonging to those categories. For example, for an NPTEL video on Image Processing taught by Prof. K. S. Rao from the Computer Science Department, IIT Kharagpur, we have to map the information to its respective field: Publisher Name - NPTEL, Professor Name - K. S. Rao, Department Name - Computer Science, etc.

Literature Review:
The TIB AV-Portal is a portal for scientific videos with a focus on technology and engineering as well as architecture, chemistry, computer science, mathematics, and physics. Other scientific disciplines are also included. The videos include computer visualizations, learning material, simulations, experiments, interviews, video abstracts, lecture and conference recordings, and (open) audiovisual learning and teaching materials. The portal applies various automatic video analyses, whose added value is that they enable pinpoint searches within the video content.
NDLI is an integration platform that sources metadata from a massive number of sources to provide single-window access to a wide variety of digital learning resources. Metadata summarizes the basic information of a resource, which can make finding and working with particular resources easier. Different sources follow different metadata standards across a variety of resources. The NDLI metadata standard has been developed to draw these into a uniform schema; it is an envelope of several well-established global standards.
The Dublin Core Metadata Initiative (DCMI), which formulates the Dublin Core, is a project of the Association for Information Science and Technology (ASIS&T). "Dublin Core" is also used as an adjective for Dublin Core metadata, a style of metadata that draws on multiple Resource Description Framework (RDF) vocabularies, packaged and constrained in Dublin Core application profiles.
The Dublin Core Metadata Initiative (DCMI)[9] provides an open forum for developing
interoperable online metadata standards for a broad range of purposes and business models.
DCMI’s activities include
● consensus-driven working groups,
● global conferences and workshops,
● standards liaison, and
● educational efforts to promote widespread acceptance of metadata standards and
practices.

Method:
We divided the problem into subproblems and tackled them individually.
Subproblems:
1. Creation of a dataset of lecture videos containing the introduction part of each video, and preparation of the ground truth for each video.
2. Identification of keyframes and application of a text localization method to locate text in the videos.
3. Creation of an indexing system to map the above-mentioned attributes from the extracted metadata.
4. Evaluation of the efficiency of the indexing system, using the Levenshtein distance algorithm and accuracy.
Dataset Creation - To create the dataset, we first collected the URLs of all the videos, along with their start and end times, in a text file and prepared the ground truth. We then used the youtube-dl library to download the videos from the URL links and save them in the desired format from the given options. The FFmpeg tool was then used to cut each video at its respective start and end times (as listed in the text file).
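A minimal sketch of this step in Python, assuming the text file holds one `<url> <start> <end>` entry per line (the report does not specify the exact file layout or download options):

```python
import subprocess
import youtube_dl  # pip install youtube-dl

def download_and_trim(url, start, end, out_path):
    """Download a lecture video, then keep only its introduction segment."""
    opts = {"format": "mp4", "outtmpl": "full_video.%(ext)s"}
    with youtube_dl.YoutubeDL(opts) as ydl:
        ydl.download([url])
    # Cut the clip with FFmpeg using stream copy (no re-encoding).
    subprocess.run(
        ["ffmpeg", "-y", "-i", "full_video.mp4",
         "-ss", start, "-to", end, "-c", "copy", out_path],
        check=True,
    )

with open("videos.txt") as f:  # hypothetical ground-truth file name
    for i, line in enumerate(f):
        url, start, end = line.split()
        download_and_trim(url, start, end, f"clip_{i:03d}.mp4")
```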

Keyframes Identification - There are many papers on keyframe extraction. We tried two or three methods and finalized the ffprobe tool, which gave the best results. This tool categorizes the frames by picture type, such as 'I' and 'P', where 'I' corresponds to keyframes. So we gather all the frames marked as 'I' and save them as the keyframes.
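One way to implement this step (a sketch; the exact commands used in the project are not given) is to query frame picture types with ffprobe, or to let FFmpeg's select filter save the I-frames directly:

```python
import subprocess

# List the picture type (I/P/B) of every frame in the clip.
probe = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "v:0",
     "-show_entries", "frame=pict_type", "-of", "csv=p=0", "clip.mp4"],
    capture_output=True, text=True, check=True,
)
print(probe.stdout.split()[:10])  # e.g. ['I', 'P', 'P', ...]

# Save the 'I' frames as images in a single pass.
subprocess.run(
    ["ffmpeg", "-i", "clip.mp4",
     "-vf", "select=eq(pict_type\\,I)", "-vsync", "vfr",
     "keyframe_%03d.png"],
    check=True,
)
```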

Text localization - EasyOCR is used to identify the text in a given frame. It provides the text, its probability, and the coordinates of the bounding box for each text box. The text can be recognized either as lines or as paragraphs; for our convenience, we decided to use the paragraph option.
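A small EasyOCR sketch of this step (in paragraph mode the reader merges nearby lines into blocks; to our knowledge, it then reports only the bounding box and text for each block):

```python
import easyocr  # pip install easyocr

# Build the reader once; the English model is downloaded on first use.
reader = easyocr.Reader(["en"])

# paragraph=True groups nearby text lines into paragraph blocks.
results = reader.readtext("keyframe_001.png", paragraph=True)

for item in results:
    bbox, text = item[0], item[1]  # paragraph mode omits the confidence score
    print(bbox, "->", text)
```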

Indexing System - We used dictionaries of Publisher Names, Institute Names, and Department Names, together with the fuzzywuzzy library, to locate these categories in the extracted text and map them accordingly. To identify professor names, we used predefined rules and an NER model to pick out person names from a given string.
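A sketch of the dictionary lookup with fuzzywuzzy, plus a person-name pass with spaCy standing in for the unnamed NER model (the dictionary entries, threshold, and model choice here are illustrative, not the project's actual ones):

```python
from fuzzywuzzy import fuzz, process  # pip install fuzzywuzzy
import spacy                          # pip install spacy

PUBLISHERS = ["NPTEL", "MIT OpenCourseWare", "Coursera"]       # illustrative
INSTITUTES = ["IIT Kharagpur", "IIT Madras", "IISc Bangalore"]

def map_attribute(ocr_text, dictionary, threshold=80):
    """Return the dictionary entry that best matches the OCR text, or None."""
    match, score = process.extractOne(
        ocr_text, dictionary, scorer=fuzz.partial_ratio
    )
    return match if score >= threshold else None

text = "NPTEL - Image Processing, IIT Kharagpur, Prof. K S Rao"
print(map_attribute(text, PUBLISHERS))  # NPTEL
print(map_attribute(text, INSTITUTES))  # IIT Kharagpur

# NER pass for professor names (spaCy model assumed for illustration).
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print([ent.text for ent in doc.ents if ent.label_ == "PERSON"])
```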
Evaluation - We evaluated the system's efficiency using the accuracy formula, i.e., the number of videos mapped correctly for a particular category divided by the total number of videos. Since EasyOCR might not read every letter correctly, there can be errors in the professors' names, so we combined the Levenshtein distance algorithm with accuracy to evaluate the Professor Name category.
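A sketch of the combined metric, assuming a professor-name prediction counts as correct when its normalized Levenshtein similarity to the ground truth clears a threshold (the exact matching rule is not stated in the report):

```python
import Levenshtein  # pip install python-Levenshtein

def is_correct(predicted, truth, threshold=0.75):
    """Accept near matches so that small OCR character errors still count."""
    if not predicted or not truth:
        return False
    distance = Levenshtein.distance(predicted.lower(), truth.lower())
    return 1 - distance / max(len(predicted), len(truth)) >= threshold

def accuracy(predictions, ground_truths):
    """Share of videos whose predicted name matches the ground truth."""
    correct = sum(is_correct(p, t) for p, t in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)

# 'K S Rao' vs 'K. S. Rao' differs only by OCR-scale edits, so both pass.
print(accuracy(["K S Rao", "P K Biswas"], ["K. S. Rao", "P. K. Biswas"]))
```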

Results:
The accuracies for the four categories are:
1. Publisher Names - 88.03%
2. Institute Names - 88.88%
3. Department Names - 82.47%
4. Professor Names - 85.89%

Conclusion:
We developed an indexing system that maps the four categories efficiently. Given the time constraints, we could not pursue the other two categories, so future work can focus on mapping those two categories and on developing an interface, such as a web application, that provides the individual results when given a video as input. Although our keyframe extraction was more accurate than the other methods we tried, it could still be improved to obtain clearer frames, which would help reduce OCR errors and avoid repeated frames.
