Paper 4
ABSTRACT:
In modern workplaces, where documentation drives much of the daily work, there is a clear need for a converter that turns video and audio into text. Such a converter can be used for documentation purposes by a wide range of software firms, educational institutions, and other organizations. Software businesses in particular use it to capture notes, project details, project presentation material, and similar content. We chose Google Speech Recognition for our system because of its accuracy and user-friendly interface, and Python because it is easy to learn and work with. When the input is a single audio file, the audio-to-text converter makes it simple for the user to obtain a transcript.
I. INTRODUCTION:
Nowadays, working online plays a major role in education, employment, and many other areas. During the recent pandemic years the world became used to working indoors on personal computers, mobile devices, and laptops. Researchers also routinely interview participants for various projects, and grant funding for such projects often covers transcription costs. Human transcribers still carry out much of this work and usually do an excellent job. You could of course transcribe your own interviews, but this is a very time-consuming and laborious task, although some qualitative researchers advocate transcribing your own recordings as a way of becoming more familiar with the data. Our system starts from an input video or audio file and converts it into text.
Objective:
This application eases the documentation problem by producing notes directly from audio and video files. We use OpenCV with the given input video as a source to obtain the frame rate, the pytesseract module to extract the text present in each sampled frame, and then add that text to the output textbox. Speech recognition is used for the audio track.
Related work:
1. From Speech-to-Speech Translation to Automatic Dubbing (Marcello Federico, Robert Enyedi, Roberto Barra-Chicote, Ritwik Giri, Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf). The main goal of this work is to evaluate the naturalness of automatic speech dubbing after enhancing a baseline speech-to-speech translation system with the possibility to control the verbosity of the translation output, to segment and synchronize the target words with the speech-pause structure of the source utterances, and to enrich the TTS speech with ambient noise and reverberation extracted from the original audio.
2. Personalized Speech Translation using Google Speech API and Microsoft Translation API (Sagar Nimbalkar, Tekendra Baghele, Shaifullah Quraishi, Sayali Mahalle, Monali Junghare). In this work the authors translate speech from one language to another in an efficient manner. The process is carried out in three steps with the help of two APIs: the Google Speech API converts the speech into text, which is then fed to the Microsoft Translation API to translate the text into the desired language.
3. Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation (Philipp Harzig, Moritz Einfalt, Rainer Lienhart). The authors present a Transformer-based Video-to-Text architecture aimed at generating descriptions for short videos, and gradually improve a vanilla Transformer designed for Machine Translation into an architecture that generates appropriate and matching captions for video clips.
4. Textless Speech-to-Speech Translation on Real Data (Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Sravya Popuri, Juan Pino, Changhan Wang, Jiatao Gu, Wei-Ning Hsu). The authors reduce variations in the target speech while retaining the lexical content, taking advantage of self-supervised discrete representations of a reference speaker's speech and performing CTC fine-tuning with a pre-trained speech encoder.
Problem statement:
Many existing systems convert only the audio, not the video, into text. There are many live-transcription applications that we see and use in day-to-day work. The problem with these existing systems is that the transcription sometimes mishandles pronunciation, and they handle only the audio. Accuracy also becomes a problem in the video-to-text part: we try to preserve the accuracy obtained for the audio, and because the video must be processed frame by frame the output is slow to produce.
Here we propose a new video and audio to text converter that works with reasonable accuracy through a tkinter GUI. The advantages of the proposed system are simple execution, an interactive GUI, and a workable level of accuracy.
Modules Description:
• Speech recognition
• Image recognition
To handle the video input we use the moviepy package in Python to extract the audio of the uploaded video file and save it to an audio.wav file. We call vidfile.audio.write_audiofile("audio.wav") to write the audio of the given video file into a file named audio.wav, saving it in the parent directory for further use.
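A minimal sketch of this extraction step, assuming moviepy is installed (the function name and paths are illustrative, not the exact implementation):

from moviepy.editor import VideoFileClip

def extract_audio(video_path, audio_path="audio.wav"):
    # Load the uploaded video and write its audio track to a WAV file.
    vidfile = VideoFileClip(video_path)
    vidfile.audio.write_audiofile(audio_path)
    vidfile.close()
    return audio_path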
Speech recognition:
We use the SpeechRecognition module in Python with the extracted audio.wav file as the source, and Google speech recognition to extract the text from the audio file and return it as a string.
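A hedged sketch of this step with the SpeechRecognition package (the function name is illustrative and it assumes the audio.wav file produced by the extraction step above):

import speech_recognition as sr

def audio_file_to_text(audio_path="audio.wav"):
    # Transcribe a WAV file with the Google Web Speech API via SpeechRecognition.
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio_data = recognizer.record(source)        # read the whole file into memory
    try:
        return recognizer.recognize_google(audio_data)  # transcript returned as a string
    except sr.UnknownValueError:
        return ""                                     # speech could not be understood
    except sr.RequestError as err:
        return "API request failed: " + str(err)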
Image recognition:
We use OpenCV with the given input video as a source and call capture.get(cv2.CAP_PROP_FPS) to get the frame rate of the video. Dividing the current frame number by the fps lets us take a single frame per second of video; each such frame is passed to the pytesseract module to extract the text present in that frame, which is then added to the output textbox.
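The sampling described above can be sketched as follows (function and variable names are illustrative; opencv-python and pytesseract are assumed to be installed):

import cv2
import pytesseract

def video_frame_text(video_path):
    # OCR one frame per second of the video and return the recognized strings.
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS)
    step = max(int(round(fps)), 1)          # sample roughly one frame per second
    texts, frame_no = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:                          # no more frames to read
            break
        if frame_no % step == 0:
            texts.append(pytesseract.image_to_string(frame))
        frame_no += 1
    capture.release()
    return texts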
GUI output display: The GUI has three main elements: the left frame, the textbox, and the control buttons. The left frame is a container that displays the name of the project along with the control buttons. The textbox displays the output text from either the audio or the video. The control buttons are used to upload the file and trigger the conversions.
We use tkinter to display the GUI, with Upload, Audio, and Video buttons to control it. The video file is first uploaded using fd.askopenfilename() in tkinter to get the file location. If the user then presses the Audio button, the audio of the video file is extracted and used as a source for Google speech recognition through the SpeechRecognition module, and the output is displayed in the textbox. If the Video button is pressed, text is extracted once per second of video and the output is added to the GUI textbox.
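A minimal sketch of this GUI layout (widget names and the window title are illustrative, and the Audio/Video buttons are assumed to call the conversion helpers sketched earlier):

import tkinter as tk
from tkinter import filedialog as fd

filename = None                             # file location set by the Upload button

def upload():
    global filename
    filename = fd.askopenfilename(title="Select a video file")

root = tk.Tk()
root.title("Video and Audio to Text Converter")

left_frame = tk.Frame(root)                 # container for project name and control buttons
left_frame.pack(side="left", fill="y")
tk.Label(left_frame, text="Video/Audio to Text").pack(pady=10)
tk.Button(left_frame, text="Upload", command=upload).pack(fill="x")
tk.Button(left_frame, text="Audio").pack(fill="x")   # would call audio_to_text()
tk.Button(left_frame, text="Video").pack(fill="x")   # would call video_to_text()

textbox = tk.Text(root, wrap="word")        # displays the transcribed or extracted text
textbox.pack(side="right", fill="both", expand=True)

root.mainloop()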
METHOD DESCRIPTION
outins(): removes the current text in the textbox and inserts the given string.
upload_button(): asks the user for the video file and saves the file location to the global variable "filename".
audio_to_text(): extracts the audio from the video and uses speech recognition to retrieve the text from the audio.
video_to_text(): takes one frame from every second of the given video and uses optical character recognition to extract the text from each frame.
Process:
1. The system starts by opening a tkinter window which contains a textbox and three buttons.
2. The user can start by selecting the upload button and selecting the required file in the open file window.
3. When the Audio button is clicked, the audio is extracted from the given video file using the moviepy module. The extracted audio is then sent to Google speech recognition and the output is added to the textbox in the GUI.
4. When the Video button is selected, the uploaded video is used as a capture source in OpenCV's VideoCapture() method.
5. A while True loop is then used to iterate through all the frames, and break is used to exit the loop once all the frames have been read.
6. While iterating through the frames, the loop checks the frame index so that it takes one frame per second of video and sends it to the Tesseract OCR; the output from the OCR is added to a list.
7. After exiting the loop, the list is formatted into a string with a tabular appearance, and the string is inserted into the textbox (see the sketch below).
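Steps 5-7 can be sketched roughly as follows (the textbox widget and the texts list are assumed to come from the GUI and OCR code above; the names are illustrative):

def show_ocr_output(textbox, texts):
    # Format the per-second OCR results and place them in the GUI textbox.
    lines = []
    for second, text in enumerate(texts):
        lines.append(str(second) + "s\t" + text.strip())   # tabular appearance: second, text
    textbox.delete("1.0", "end")            # clear any previous output (as outins() does)
    textbox.insert("end", "\n".join(lines))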
V. Result:
Conclusion:
This program offers a very simple, barely functioning sample. This might be improved in a number of ways, including by giving users the option to load
either an mp4 or a wav file, by letting them select from a variety of speech recognizers, and by showing more details like file size and length. The user
experience of simple applications can be enhanced with graphical user interfaces, which are simple to add in Python using tools like PyQt and Tkinter
Designer. Performance issues can also be resolved using threading and running computationally intensive tasks in the background to minimise their
impact on the user experience.
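One possible way to apply the threading suggestion above (a sketch only; the task passed in is assumed to be one of the conversion functions, e.g. audio_to_text):

import threading

def run_in_background(task, *args):
    # Run a computationally intensive task without blocking the tkinter main loop.
    worker = threading.Thread(target=task, args=args, daemon=True)
    worker.start()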
References:
[1]. Marcello Federico, Robert Enyedi, Roberto Barra-Chicote, Ritwik Giri, Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf, "From Speech-to-Speech Translation to Automatic Dubbing", Proceedings of the 17th International Conference on Spoken Language Translation, July 2020.
[2]. GOPA - International Energy Consultant INTEC & Hamm-Lippstadt University of Applied Sciences 2022
[3]. Published in: IEEE Journal of Selected Topics in Signal Processing ( Volume: 16, Issue: 6, October 2022)
[4]. Published in: IEEE Transactions on Software Engineering ( Volume: 48, Issue: 1, 01 January 2022)
[5]. J. Pradeep, E. Srinivasan, S. Himavathi, Neural network based handwritten character recognition system without feature extraction, in 2011
International Conference on Computer, Communication and Electrical Technology (ICCCET), pp. 40–44 (2011).
[6]. R. Mittal, A. Garg, Text extraction using OCR: A systematic review, in 2020 Second International Conference on Inventive Research in
Computing Applications (ICIRCA), pp. 357–362 (2020).