Technical Report 1.2
TECHNICAL REPORT
Supervisor
Prof. Kareemullah
Submitted by
Talha Azeem
2019-ag-6072
Nabeel ur Rehman
2019-ag-6078
I hereby declare that the contents of the report SemantoTube are the product of my own research and
that no part has been copied from any published source (except the references). I further declare that
this work has not been submitted for the award of any other diploma/degree. The university may
take action if the information provided is found to be false at any stage. In case of any default, the
scholar will be proceeded against as per UAF policy.
_________________
Talha Azeem
_________________
Nabeel ur Rehman
CERTIFICATE
To,
The Controller of Examinations,
University of Agriculture,
Faisalabad.
The supervisory committee certifies that TALHA AZEEM [2019-ag-6072] and NABEEL UR
REHMAN [2019-ag-6078] have successfully completed their FYP for the degree of B.S.
Computer Science under our guidance and supervision.
_____________________________________
Prof. Kareemullah
Supervisor
_____________________________________
Talha Azeem
Member
_____________________________________
Nabeel ur Rehman
Member
_______________________________________
Dr. Muhammad Ahsan Latif
Incharge,
Department of Computer Science
ACKNOWLEDGEMENT
I thank all who in one way or another contributed to the completion of this report. First, I thank
ALLAH ALMIGHTY, the most magnificent and most merciful, for all His blessings. I am also
grateful to the Department of Computer Science for making it possible for me to study here.
My special and heartfelt thanks go to my supervisor, Prof. Kareemullah, who encouraged and
directed me. Her challenges brought this work to completion, and it is with her supervision
that this work came into existence. For any faults I take full responsibility. I am also deeply
thankful to my informants, and I want to acknowledge and appreciate their help and transparency
during my research. I am also thankful to my fellow students, whose challenges and productive
criticism have provided new ideas for the work. Furthermore, I thank my family, who
encouraged me and prayed for me throughout the time of my research. May the Almighty God
richly bless all of you.
ABSTRACT
SemantoTube is a web-based video search engine that uses natural language processing
(NLP) techniques to help users search for information within YouTube videos.
With the vast amount of video content available on the internet, finding specific information
within lengthy videos can be time-consuming and frustrating. SemantoTube aims to address this
challenge by understanding the context and meaning of the video's content through NLP
analysis.
Users enter a YouTube video link and a query. The system retrieves the video's
transcript, tokenizes it, and sends the tokenized transcript, along with the user's
query, to the Cohere AI API to obtain the most relevant results. These results are
presented to the user in a clear and organized manner, allowing for efficient navigation through
the video content.
1. INTRODUCTION
1.1 Background
1.2 Description
1.3 Scope
1.4 Objectives
2. REQUIREMENTS
2.1 Functional Requirements
2.2 Non-Functional Requirements
2.3 Hardware Requirements
2.4 Software Requirements
3. METHODOLOGY
4. Timeline
5. Design
6. RESULTS & DISCUSSION
List of Figures
1. INTRODUCTION
1.1 Background
In today's world, the amount of information available on the internet is vast and growing every
day. One of the most common ways people consume this information is through videos,
especially on platforms like YouTube. However, searching for specific information within a
video can be time-consuming and frustrating. The proposed project, SemantoTube, aims to
address this problem by using natural language processing (NLP) to understand the contents of
videos and return results based on the user's query.
1.2 Description
SemantoTube is a web-based video search engine that utilizes NLP to understand the contents
of videos and return the most relevant results based on the user's query. The user can enter a link
to a YouTube video, and the system will retrieve the transcript of the video and tokenize it before
sending it to the Cohere AI API along with the user's query. The system will then receive the
most relevant results from the API and present them to the user. The user can then select a result,
and the video will automatically play at the exact time the relevant information appears. This will
save the user time when searching for specific information within lengthy YouTube videos.
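The first step described above, accepting a YouTube link from the user, can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `extract_video_id` is hypothetical, and the sketch assumes only the common `watch?v=`, `youtu.be`, `/embed/` and `/shorts/` URL forms need to be recognized.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None

    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        parts = parsed.path.strip("/").split("/")
        # Embedded player and Shorts links: /embed/<id>, /shorts/<id>
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1] or None

    # Anything else is treated as "not a YouTube video link".
    return None
```

Returning `None` for unrecognized links lets the caller show a validation message before any transcript request is made.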
1.3 Scope
The proposed project, "SemantoTube," is a web-based video search engine that utilizes natural
language processing (NLP) to understand the contents of YouTube videos and return results
based on the user's query. The project aims to provide users with a more efficient way of
searching for information in lengthy YouTube videos by utilizing the Cohere AI API to match
the user's query with the transcript of the video and return the most relevant results.
Manually scanning an entire YouTube video to find the answer to a question is difficult. SemantoTube
performs the search in seconds and returns results that match the meaning of the user's query
rather than only its keywords.
1.4 Objectives
The primary goal of the project is to develop a web-based video application that utilizes natural
language processing (NLP) techniques to help users quickly locate specific information within
longer YouTube videos in multiple languages.
● The application should be able to understand the context and meaning of the text in the
video, rather than just searching for keywords.
● Users should be able to search for specific topics, phrases, or quotes within the video
semantically and have relevant results returned to them.
● It should be user-friendly and easy to navigate.
● The application should be able to handle large numbers of videos and users.
● The application should be scalable to handle increasing numbers of videos and users.
● The application should have multilingual support, allowing users to search for videos in
different languages.
● The application should perform semantic search, understanding the intent and context of
the user's query and returning more accurate results.
● The application should be able to extract key information from the video and make it
easily accessible to the user.
● The application should be able to handle a wide range of video formats and sources.
● The application should be able to analyze the video's audio and transcript to improve the
search experience.
● The application should be able to handle large numbers of requests and provide results in
real-time.
● The application should be integrated with popular video platforms such as YouTube.
● The application should be optimized for mobile and tablet devices.
● The application should be available in multiple languages.
2. REQUIREMENTS
2.1 Functional Requirements
FR05-01 Allow the user to select a specific result from the search results
FR05-02 Automatically play the video at the exact time the relevant information appears
in the video
FR06-02 Ensure the system is responsive and adapts to different screen sizes
FR06-03 Optimize the system for mobile and tablet devices
languages
Web Framework: Flask
NLP: Cohere AI API
Cloud services: AWS, GCP or Azure
Web Server: Apache or Nginx
Operating System: Linux
Video player: YouTube player
3. METHODOLOGY
The methodology used for this project is the Rapid Application Development (RAD)
methodology. RAD is an iterative development approach that emphasizes rapid prototyping
and incremental delivery of working software. This methodology was chosen because it is
well-suited to projects where the requirements are not well-defined and are likely to change
over time.
The RAD methodology consists of four major activities: Requirements Planning, User Design,
Construction, and Cutover.
Requirements Planning: In this phase, the project team will gather and document the
requirements of the project. The team will conduct interviews with stakeholders and users to
gather requirements and feedback.
User Design: In this phase, the project team will create a prototype of the system based on the
gathered requirements. The prototype will be presented to the stakeholders and users for
feedback and testing.
Construction: In this phase, the project team will start development of the system based on the
approved prototype. The team will use agile development methods to ensure that the system is
developed in an incremental and iterative manner.
Cutover: In this phase, the project team will test and deploy the system. The team will conduct
user acceptance testing and training before deploying the system to production.
3.2 Models
There are a few different models that could be used for this project, depending on the
specific requirements and goals of the project. Below are a few options.
A transcription model: This model would be responsible for transcribing the audio
from the YouTube videos into text. This model could be trained on a large dataset of
transcribed videos to learn the patterns of speech and improve its accuracy.
The system will transcribe the speech in the video so that it can be sent to the Cohere AI API.
A natural language processing (NLP) model: This model would be responsible for
understanding the contents of the transcriptions from the videos and returning results
based on the user's query. This model could be trained on a large dataset of text and
labeled with different categories or topics to learn how to classify and understand the
text.
A search algorithm: This model would be responsible for searching through the
transcriptions and returning the most relevant results to the user's query. This model
could be trained on a dataset of text and labeled with different categories or topics to
learn how to match the user's query with the most relevant text.
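One common way to match a query against text by meaning is to compare embedding vectors with cosine similarity. The sketch below illustrates that general technique only. This report does not state how the Cohere AI API ranks results internally, so the two-dimensional vectors here are toy values and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_n: int = 3,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, highest similarity first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy example: passage 1 points in the same direction as the query,
# passage 2 partly overlaps, passage 0 is orthogonal to it.
toy_query = [1.0, 0.0]
toy_passages = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
top = rank_passages(toy_query, toy_passages, top_n=2)
```

In a real system the vectors would come from an embedding model, and passages whose wording differs from the query can still score highly if their meaning is close.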
4. Timeline
The project is expected to take approximately 2 months to complete. A tentative timeline of the
project activities is given in Figure 2 below.
Figure 2: Tentative Timeline of the Project activities
The project will start with the requirements planning phase, where the project team will gather
and document the requirements of the project. This phase is expected to take 2 months.
Next, the project team will move on to the user design phase, where a prototype of the system
will be created based on the gathered requirements. The prototype will be presented to the
stakeholders and users for feedback and testing. This phase is expected to take half a month.
In the construction phase, the project team will start development of the system based on the
approved prototype. The team will use agile development methods to ensure that the system is
developed in an incremental and iterative manner. This phase is expected to take 1 month.
Finally, in the cutover phase, the project team will test and deploy the system. The team will
conduct user acceptance testing and training before deploying the system to production. This
phase is expected to take half a month.
5. Design
5.1.1 Use Case Scenarios
Use Case Id 01
Requirement Id FR01
Description: This use case describes the process of searching for specific information within a video.
Pre-Conditions:
The user must have a YouTube video link.
The video must have a transcript available.
Post Conditions: The user is provided with relevant search results within the video.
Authority: User
Use Case Id 02
Requirement Id FR02
Description: This use case describes the process of retrieving the transcript of a video.
Pre-Conditions:
The user must have a YouTube video link.
The video must have a transcript available.
2. The system retrieves the transcript of the video from the YouTube
API.
3. The system tokenizes the transcript by breaking it into individual
words or phrases.
Post Conditions: The transcript of the video is retrieved and tokenized for further processing.
Authority: User
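The tokenization step in this use case can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes the retrieved transcript is a list of segments, each a dictionary with a `text` string and a `start` time in seconds, and it uses plain whitespace splitting as the tokenizer. The function name `chunk_transcript` and the `target_words` parameter are hypothetical.

```python
from typing import Dict, List, Optional


def chunk_transcript(segments: List[Dict], target_words: int = 40) -> List[Dict]:
    """Group transcript segments into passages of roughly target_words words.

    Each passage keeps the start time of its first segment so that a search
    hit can later be mapped back to a position in the video.
    """
    passages: List[Dict] = []
    words: List[str] = []
    start: Optional[float] = None

    for seg in segments:
        tokens = seg["text"].split()  # whitespace tokenization (an assumption)
        if not tokens:
            continue  # skip empty segments
        if start is None:
            start = seg["start"]
        words.extend(tokens)
        # Close the passage once it reaches target_words; segments are never
        # split, so a passage may run slightly over the target.
        if len(words) >= target_words:
            passages.append({"start": start, "text": " ".join(words)})
            words, start = [], None

    if words:  # flush the final, shorter passage
        passages.append({"start": start, "text": " ".join(words)})
    return passages
```

Grouping words into timestamped passages, rather than sending isolated words, keeps enough context for a semantic match while still pointing to a specific moment in the video.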
Use Case Id 03
Requirement Id FR03
Description: This use case describes the process of sending the user's query to the Cohere AI API.
Pre-Conditions:
The transcript of the video must be available and tokenized.
1. The system sends the tokenized transcript and the user's query to the
Cohere AI API.
2. The system receives the most relevant results from the API.
Post Conditions: The system receives the relevant results from the Cohere AI API.
Authority: User
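The exchange in this use case can be sketched without depending on a particular client library by passing the ranking service in as a function. Everything named here is hypothetical: `rank_fn` stands in for whatever call the system makes to the Cohere AI API, its `(query, documents, top_n)` signature is an assumption made for this sketch, and `word_overlap_ranker` is only an offline, keyword-based placeholder so the example runs without network access.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed shape of the ranking call: (query, documents, top_n) -> [(index, score), ...]
RankFn = Callable[[str, Sequence[str], int], List[Tuple[int, float]]]


def search_transcript(
    query: str, passages: List[Dict], rank_fn: RankFn, top_n: int = 3
) -> List[Dict]:
    """Send the query and passage texts to rank_fn and map hits back to passages."""
    if not query.strip() or not passages:
        return []
    documents = [p["text"] for p in passages]
    ranked = rank_fn(query, documents, min(top_n, len(documents)))
    return [
        {"start": passages[i]["start"], "text": passages[i]["text"], "score": score}
        for i, score in ranked
    ]


def word_overlap_ranker(
    query: str, documents: Sequence[str], top_n: int
) -> List[Tuple[int, float]]:
    """Offline placeholder: scores each document by the number of shared lowercase words."""
    q = set(query.lower().split())
    scored = [(i, float(len(q & set(d.lower().split())))) for i, d in enumerate(documents)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Injecting the ranker keeps the remote API call in one replaceable place, which also makes this step testable without an API key.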
Use Case Id 04
Requirement Id FR04
Description: This use case describes the process of presenting the most relevant search results to the
user.
Pre-Conditions:
The relevant results from the Cohere AI API must be available.
The system presents the results to the user in a clear and organized
manner.
Post Conditions: The user is provided with the most relevant search results.
Authority: User
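One possible way to present results in a clear and organized manner is a ranked list in which each entry shows a readable timestamp and a short preview. The sketch below assumes each result is a dictionary with `start` (seconds) and `text` keys. The function names and the 60-character preview length are illustrative choices, not taken from the project.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time offset as M:SS, or H:MM:SS for offsets of an hour or more."""
    total = int(seconds)  # drop fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def format_results(results: List[Dict], preview_chars: int = 60) -> List[str]:
    """Build one display line per result: rank, timestamp, and a text preview."""
    lines = []
    for rank, result in enumerate(results, start=1):
        text = result["text"]
        if len(text) > preview_chars:
            # Truncate long passages so every entry fits on one line.
            text = text[:preview_chars].rstrip() + "..."
        lines.append(f"{rank}. [{format_timestamp(result['start'])}] {text}")
    return lines
```

Showing the timestamp next to each preview tells the user where in the video a result will start before they select it.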
Use Case Title Automatically Play Video at Exact Time
Use Case Id 05
Requirement Id FR05
Description: This use case describes the process of automatically playing the video at the exact time the
relevant information appears.
Pre-Conditions:
The user has selected a specific result from the search results.
Task Sequence Exceptions
Post Conditions: The video starts playing at the exact time the relevant information appears.
Authority: User
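Starting playback at a given moment can be expressed through YouTube's URL parameters: watch-page links accept a `t` parameter, and embedded-player links accept `start` (in whole seconds) and `autoplay`. The sketch below builds both forms. The function names are hypothetical, and whether the project uses links, an embedded player, or a player script is not stated in this report.

```python
from urllib.parse import urlencode


def watch_url_at(video_id: str, start_seconds: float) -> str:
    """Link to the watch page that opens at the given offset."""
    start = max(0, int(start_seconds))  # whole, non-negative seconds
    return "https://www.youtube.com/watch?" + urlencode({"v": video_id, "t": f"{start}s"})


def embed_url_at(video_id: str, start_seconds: float) -> str:
    """Embedded-player URL that starts at the given offset and requests autoplay."""
    start = max(0, int(start_seconds))
    query = urlencode({"start": start, "autoplay": 1})
    return f"https://www.youtube.com/embed/{video_id}?{query}"
```

Note that browsers may block autoplay with sound, so the embedded video might wait for a click even when `autoplay=1` is set.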
5.2 Sequence Diagram
5.3 Class Diagram
6. RESULTS & DISCUSSION
6.1 Test Cases
Test Data
Expected Results: The user is provided with relevant search results within the video.
Test Data
Expected Results: The transcript of the video is retrieved and tokenized for further
processing.
Actual Results: As above
Status: (Pass/Fail) Pass
Test Case ID: 3
Test Case Title: Send Query to Cohere's API
Test Case Priority: High
Requirement: FR03
Test Description: Test the functionality of sending the user's query to the Cohere AI API.
Test Data
Expected Results: The system receives the relevant results from the Cohere AI API.
Test Data
Expected Results: The user is provided with the most relevant search results.
Test Case Title: Automatically Play Video at Exact Time
Test Case Priority: High
Requirement: FR05
Test Description: Test the functionality of automatically playing the video at the exact time the
relevant information appears.
Test Date: 05/25/2023
Dependencies:
Test Steps: Select a specific result from the search results.
Verify that the video starts playing at the exact time the relevant
information appears.
Test Data
Expected Results: The video starts playing at the exact time the relevant information
appears.
Actual Results: As above
Status: (Pass/Fail) Pass
Test Data
Expected Results: The system is compatible with different web browsers and devices, and
it adapts to different screen sizes.
Actual Results: As above
Status: (Pass/Fail) Pass
6.2 Conclusion
Based on the test cases conducted for the SemantoTube project, it can be concluded that the
system performs well in retrieving video transcripts, searching within videos, and providing
relevant results to the users. The tokenization process is accurate, and the integration with the
Cohere AI API successfully retrieves relevant results based on user queries.
The system also demonstrates responsiveness and compatibility with different web browsers and
devices, ensuring a seamless user experience across various platforms. The automatic playback
feature functions as expected, playing the video at the exact time the relevant information
appears.
Overall, the test results indicate that the SemantoTube project meets the functional requirements
and performs its intended tasks effectively. The project shows promise in providing users with a
more efficient way of searching for specific information within lengthy YouTube videos,
leveraging natural language processing techniques and the Cohere AI API.
However, it is important to note that these test cases cover a limited scope of the project. To
ensure comprehensive testing, additional test cases should be designed to cover edge cases,
performance testing, security testing, and compatibility with different video formats and sources.
Continued testing and refinement will be crucial to further improve the system's accuracy,
usability, and scalability. Regular updates and bug fixes should be implemented based on user
feedback and additional requirements that may arise. With ongoing testing and improvements,
SemantoTube has the potential to become a valuable tool for users seeking efficient video search
capabilities.