
Name: Viplavi Wade

Student Id: 13922741

Subject: Big Data Analytics

Course: MSC Advanced Computing

Date: June 24, 2024

Big Data Analytics: CineSense Project

GitHub Repo Link: https://github.com/ViplaviWade/Big-Data-Project

1.0 Introduction

CineSense is an innovative video-processing startup that extracts valuable insights from social
media video content using advanced natural language processing (NLP) and computer vision
techniques. This analysis is crucial for businesses seeking to understand their audience, improve
customer experiences, and make data-driven decisions.

The project's primary objective is to develop a Python application that efficiently downloads and
analyzes YouTube videos using parallel processing techniques. This involves tasks such as
downloading videos, extracting audio, transcribing audio to text, performing sentiment and
emotion analysis, and translating the text. The project emphasizes the use of multiprocessing,
threading, or asynchronous programming to optimize the workflow.

2.0 Tools and Technologies Used:

• Python 3 (for implementation): Python offers a wide range of libraries suited to this
project, such as pytube for downloading videos, SpeechRecognition for transcribing audio,
and spaCy and TextBlob for sentiment analysis.
• Git and GitHub: version control and repository hosting for the project, providing
visibility into contributions and tracking the changes made in each phase of the project's
development.

3.0 Implementation Phases of the Project:

3.1 Phase-1

Tasks:

1. Manually retrieve 10-15 random video URLs from YouTube. Save the URLs in a text file called
video_urls.txt, where each URL should be stored on a separate line. Consider YouTube
videos that are 2-3 minutes in duration.
2. Develop a Python script to read the URLs. Assuming you have the text file named
video_urls.txt containing the URLs of YouTube videos, load it in Python and extract the URLs
using your preferred data structure.

3. Develop a Python script to download the videos using their URLs. Test your solution by
downloading the files serially, then use parallel programming such as multiprocessing or
threading to handle the downloads; justify the strategy you choose. For testing purposes,
ensure the script downloads no more than 5 videos simultaneously to avoid YouTube
blocks. You are advised to use threads and semaphores to control the downloads. Compare
serial and parallel executions of your video download script, and discuss its time and
space complexity.

To download the videos serially or in parallel, the script prompts the user to choose the
execution mechanism: entering '1' runs the downloads serially, while entering '2' runs them
in parallel. Parallel execution is implemented with multi-threading, and a semaphore limits
the script to at most 5 simultaneous downloads to avoid YouTube blocks.
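
A minimal sketch of this download logic is shown below, assuming pytube (listed in the
tools section) is used for the actual downloads and that the URLs have already been saved
in video_urls.txt; the function and variable names are illustrative rather than the exact
ones in the repository.

import threading
import time

from pytube import YouTube  # assumed downloader, as listed in the tools section

MAX_CONCURRENT = 5                           # avoid YouTube blocks
semaphore = threading.Semaphore(MAX_CONCURRENT)

def read_urls(path="video_urls.txt"):
    """Load one URL per line into a list."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def download_video(url, output_dir="downloads"):
    """Download a single video while holding a semaphore slot."""
    with semaphore:                          # at most 5 downloads run at once
        try:
            stream = YouTube(url).streams.get_highest_resolution()
            stream.download(output_path=output_dir)
        except Exception as exc:             # unavailable video, network failure, ...
            print(f"Failed to download {url}: {exc}")

def run_serial(urls):
    for url in urls:
        download_video(url)

def run_parallel(urls):
    threads = [threading.Thread(target=download_video, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    urls = read_urls()
    choice = input("Enter 1 for serial or 2 for parallel execution: ")
    start = time.time()
    run_serial(urls) if choice == "1" else run_parallel(urls)
    print(f"Finished in {time.time() - start:.2f} seconds")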

Time and space complexity of the serial and parallel downloads.

Time Complexity:

• Serial Execution: The time complexity of serial execution is O(n). Each video
download operation is independent and sequential, leading to a linear relationship
between the number of videos (n) and the total download time. Thus, the total time
taken increases linearly with the number of videos.
• Parallel Execution: The time complexity of parallel execution is O(n/k). Assuming the
system allows downloading (k) videos concurrently without significant overhead, the
time complexity can be approximated to O(n/k), where (k) is the number of
concurrent threads. However, the actual speedup will depend on the network
bandwidth, system resources, and how well the parallelism can be achieved.

Space Complexity:

• Serial Execution: The space complexity of serial execution is O(1). The space
complexity is constant because, at any given time, only one video is being processed
and downloaded, regardless of the total number of videos.
• Parallel Execution: The space complexity of parallel execution is O(k). The space
required is proportional to the number of concurrent threads (k), since each thread
consumes memory for its stack and local variables. Additionally, up to (k) videos are
written to disk simultaneously, but the final on-disk footprint is the same as for
serial execution.

Factors that influence the actual performance of the serial and parallel execution

• Network Bandwidth: The actual performance gain from parallel downloads will
highly depend on the available network bandwidth. If the bandwidth is a bottleneck,
adding more threads might not lead to a proportional decrease in download time.
• Thread Management: Proper management of threads using semaphores and
mutexes is crucial to avoid issues like race conditions and excessive resource usage.
In my code, a semaphore is used to limit the number of concurrent downloads to
5, ensuring the system is not overwhelmed.
• Error Handling: Robust error handling is essential in both serial and parallel
execution to manage issues like unavailable videos or network failures gracefully.

Parallel execution can significantly reduce the total download time compared to serial
execution, as demonstrated by the results of my executed code: the serial execution took
43.01 seconds, whereas the parallel execution completed in 27.67 seconds. However,
this comes with increased complexity in managing concurrent threads and potential
challenges related to network bandwidth and system resources. The choice between serial
and parallel execution should weigh these factors to optimize performance effectively.
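
For reference, the measured speedup works out to 43.01 / 27.67 ≈ 1.55x, well below the
ideal 5x for 5 concurrent downloads, which is consistent with the network-bandwidth
limitation discussed above.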

4. Develop a Python script to keep a log for each download. After downloading each video,
create a logger to record which video was downloaded by which process or thread. Save the
log entries to the same file, e.g., download_log.txt. For this script, you have to use threads
and a mutex.
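
A minimal sketch of this logging pattern is given below, assuming Python's built-in
threading.Lock serves as the mutex and a plain text file as the log; the function and file
names are illustrative.

import threading

log_lock = threading.Lock()                  # mutex protecting the shared log file

def log_download(video_title, log_path="download_log.txt"):
    """Append one entry per downloaded video, recording the thread that handled it."""
    entry = f"{threading.current_thread().name} downloaded: {video_title}\n"
    with log_lock:                           # only one thread writes at a time
        with open(log_path, "a") as log_file:
            log_file.write(entry)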

5. Develop Python scripts to perform various video analysis tasks. After downloading a video,
perform the following tasks. It is preferable to develop a separate script for each
functionality. The five analysis subtasks are as follows (a consolidated sketch of these
scripts is given after this list).
I. Extract audio from a video file.
The code for extracting audio from a video file is in extract_audio.py, and the
extracted audio is saved in a folder named 'extracted_audio'.
II. Transcribe audio to text.
The transcription of audio files to text is implemented in transcribe_audio.py,
and the transcribed text files are stored in the transcribe_audio2text folder.
III. Perform sentiment analysis on a video's content, extracting its polarity and
subjectivity.
Sentiment analysis is implemented in sentiment_analysis.py and the results are
stored in the sentiment_analysis folder. In this example the analysis completed in
0.12 seconds, reporting the polarity and subjectivity of the video's transcript,
and the result is saved as a JSON file for every YouTube video.
IV. Translate the text into another language, e.g. Spanish.
The code for translating text from English into another language (Spanish by
default) is in translate_text.py. Translating the English text to Spanish took
around 4.30 seconds, and the translated text is stored in the folder named
'translations'.
V. Extract the emotions of a text.
The emotions of the text are determined in extract_emotions.py. The results are
stored as JSON files containing scores for the emotions Happy, Angry, Surprise,
Sad, and Fear, saved in a folder named extracted_emotions. This script completed
in 1.92 seconds.
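
A consolidated sketch of the five analysis scripts is shown below. It assumes the libraries
mentioned in the tools section (SpeechRecognition, TextBlob) plus moviepy for audio
extraction, deep-translator for translation, and text2emotion for the Happy/Angry/Surprise/
Sad/Fear scores; these library choices, function names, and file paths are assumptions for
illustration and may differ from the actual scripts in the repository.

import json

import speech_recognition as sr                   # assumed transcription backend
import text2emotion as te                         # assumed emotion scorer
from deep_translator import GoogleTranslator      # assumed translation backend
from moviepy.editor import VideoFileClip          # assumed for audio extraction
from textblob import TextBlob                     # sentiment: polarity, subjectivity

def extract_audio(video_path, audio_path):
    """I. Extract the audio track from a video file and save it as WAV."""
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(audio_path)
    clip.close()

def transcribe_audio(audio_path):
    """II. Transcribe a WAV file to text using Google's free recognizer."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)

def analyse_sentiment(text, out_path):
    """III. Save polarity and subjectivity of the transcript as JSON."""
    sentiment = TextBlob(text).sentiment
    with open(out_path, "w") as f:
        json.dump({"polarity": sentiment.polarity,
                   "subjectivity": sentiment.subjectivity}, f, indent=2)

def translate_text(text, target="es"):
    """IV. Translate English text into another language (Spanish by default)."""
    return GoogleTranslator(source="en", target=target).translate(text)

def extract_emotions(text, out_path):
    """V. Save Happy/Angry/Surprise/Sad/Fear scores as JSON."""
    with open(out_path, "w") as f:
        json.dump(te.get_emotion(text), f, indent=2)

if __name__ == "__main__":
    extract_audio("downloads/video1.mp4", "extracted_audio/video1.wav")
    text = transcribe_audio("extracted_audio/video1.wav")
    analyse_sentiment(text, "sentiment_analysis/video1.json")
    print(translate_text(text))
    extract_emotions(text, "extracted_emotions/video1.json")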
