2018 International Conference on Computational Science and Computational Intelligence (CSCI)

Design of voice to text conversion and management program based on Google Cloud Speech API

Jungyoon Choi †, Haeyoung Gill †, Soobin Ou ††, Yoojeong Song ††, Jongwoo Lee †††
† Division of ICT Convergence Engineering, Department of IT Engineering
Sookmyung Women’s University
Seoul, South Korea
e-mail: {gongju210, haee02}@naver.com
†† Department of IT Engineering
Sookmyung Women’s University
Seoul, South Korea
e-mail: {sbwoo, yjsong}@sm.ac.kr
††† Division of ICT Convergence Engineering, Department of IT Engineering
Sookmyung Women’s University
Seoul, South Korea
e-mail: [email protected] (Corresponding author)

Abstract—Sexual crime, including sexual harassment and sexual assault, is prevalent. In particular, the number of reported cases of sexual crimes occurring in the workplace is steadily increasing. Victims of sexual crime are required to prove the harm they suffered, but such proof is hard to obtain, so offenders are often not punished properly because of insufficient evidence. In this paper, we design a recording service called CCVoice. It uses mobile devices to record everyday life. At the same time, it converts the recorded file to text using the Google Cloud Speech API and saves the text file. As a result, a user can easily obtain voice evidence when suddenly subjected to a sexual crime such as sexual harassment or sexual assault.

Keywords-Voice record; Android; Web; Google Cloud Speech API

I. INTRODUCTION
Recently, the number of reported cases of sexual crimes such as sexual harassment and sexual assault has increased in the workplace and in daily life. However, even when the victim reports the crime, it is difficult for the victim to testify directly and accurately. As a result, evidence is insufficient and the perpetrator may not be properly punished or prosecuted. To solve these problems, we propose CCVoice, which records voice using a mobile device. The name CCVoice is a combination of 'CCTV' and 'voice'. While recording, it converts the recorded file to text using the Google Cloud Speech API and stores the text file. The user can check the converted text file in a web-based service and use it whenever voice evidence is needed.
We design CCVoice with the following goals. First, when the user registers a desired time slot, recording should be performed automatically at that time. Second, the voice-to-text conversion should be accurate enough that the user can present the text as evidence. Third, the user must be able to listen to the recorded voice file and view the converted text file in the web service at any time.

II. RELATED WORKS

A. Speechy
Speechy is a real-time dictation application based on artificial intelligence and a speech recognition engine [1]. There is a paid version, Speechy, and a free version, Speechy Lite; their functions are the same except that Speechy Lite has a separate recording time limit. The text can be edited precisely through vocabulary addition and text editing. However, the real-time conversion function causes a delay in the text conversion, resulting in poor speech recognition and poor accuracy.

B. Reviewiser
Reviewiser is a service that converts voice recordings to text; it uses artificial intelligence technology to convert long voice files easily and quickly [2]. The user uploads a recording file to the home page, and the service converts the audio file into text. Its editing functions let the user correct individual words while listening to the voice file. However, there is no mobile application, so users need to record and upload files using another system.

C. Google Cloud Speech API
Google Cloud Speech-to-Text is an automatic speech recognition (ASR) API based on deep neural networks that can be used in applications such as voice search and speech-to-text conversion [3]. It can also handle noisy audio in a variety of environments. It supports both live streaming and pre-recorded audio. Audio longer than one minute should be referenced through the uri field, which points to the audio file on Google Cloud Storage; this storage costs $0.02 per GB per month. The Google Cloud Speech API has a good overall understanding of the sentence and has the best recognition rate in standard language [4].

978-1-7281-1360-9/18/$31.00 ©2018 IEEE


DOI 10.1109/CSCI46756.2018.00286
III. DESIGN OF THE SYSTEM

A. Features of CCVoice
As mentioned before, CCVoice has three goals: 'Automatic recording by time slot', 'Accurate text conversion', and 'User verification service via the web'. To achieve these goals, we suggest the following detailed design measures.
First, in the application, a basic recording function and a time slot recording function let the user record conveniently. After recording for the time set by the user, voice recording ends automatically.
Second, when recording finishes in the mobile app, the voice file should be sent to the server automatically, and multi-threading allows a new recording to start even while the previous one is still being sent. To use the Google Cloud Speech API for free, the voice file is split into 1-minute segments before it is sent.
Third, the server converts the voice file received from the mobile app to text using the Google Cloud Speech API and saves both the recorded files and the text files.
Fourth, the user can view and manage recorded voice files and converted text files through the web service at any time. The web service lets users log in and access their recorded files and text files. Since files are saved with the date and time of the recording, users can select the desired file and play the recording or check the text.

B. System architecture
The system architecture of CCVoice is shown in “Fig. 1”. Users log in to the CCVoice mobile app and record using the default recording function or the time slot recording function. When recording is finished on the mobile device, the recorded voice file is sent to the server in 1-minute increments. The server saves the transferred voice files and calls the Google Cloud Speech API to convert them to text. All recorded voice files and text files are stored in a per-user directory. Users can view and manage recorded voice files and text files on web pages.

C. Development environment
“Fig. 2” represents the development environment of CCVoice. We will implement CCVoice as an Android app. The server is based on Java and runs on Ubuntu Linux 16.04 with MySQL and PHP. The web service is implemented using HTML, JavaScript, and PHP.

Figure 2. Development environment of CCVoice

IV. SUMMARY AND FUTURE WORKS
In this paper, we design CCVoice, which enables users to record everyday life like CCTV under any circumstances. Users can get the evidence they need when a sexual crime occurs, such as sexual harassment or sexual assault.
CCVoice lets the user record at a chosen time with the mobile app. After recording, the voice file is immediately sent to the server, converted into a text file via the Google Cloud Speech API, and saved. Finally, the user can check the voice file and the converted text file on the web, so the files are convenient to manage and can be used as actual evidence.
In future work, after completing this system, we will make it easier for users to find relevant content in the text file using keywords. We will also add the ability for the user to edit erroneously recognized text.
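Two steps of the design in Section III, splitting a recording into 1-minute units and saving files per user under the recording date and time, can be sketched as follows. This is a minimal illustration that assumes uncompressed WAV input and uses only the Python standard library; the paper itself plans an Android app with a Java and PHP server, and the function names, the file naming pattern, and the `/data` root are hypothetical.

```python
import io
import wave
from datetime import datetime
from pathlib import PurePosixPath

SEGMENT_SECONDS = 60  # the 1-minute unit described in Section III


def split_wav(wav_bytes, segment_seconds=SEGMENT_SECONDS):
    """Split WAV data into consecutive WAV segments of at most segment_seconds."""
    segments = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * segment_seconds
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # no audio left
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                # Each segment keeps the format of the source recording.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            segments.append(out.getvalue())
    return segments


def wav_duration_seconds(wav_bytes):
    """Return the duration of WAV data in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def storage_path(root, user_id, started_at, index, extension):
    """Build a per-user path named after the recording start time.

    The naming pattern is an assumption; the paper only states that files
    are stored per user with the date and time of the recording.
    """
    stamp = started_at.strftime("%Y%m%d_%H%M%S")
    return PurePosixPath(root) / user_id / f"{stamp}_{index:03d}.{extension}"


if __name__ == "__main__":
    # Build 150 seconds of silence (mono, 16-bit, 1 kHz) as sample input.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(1000)
        w.writeframes(b"\x00\x00" * 150 * 1000)
    started = datetime(2018, 12, 13, 9, 30, 0)
    for i, part in enumerate(split_wav(buf.getvalue()), start=1):
        print(storage_path("/data", "alice", started, i, "wav"),
              wav_duration_seconds(part))
```

Because each segment is a complete WAV file, the server could convert and store segments independently, and a text file could reuse the same path with a different extension.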
ACKNOWLEDGMENT
This research was supported by the National Research Foundation of Korea (NRF) funded by the MSIT (NRF-2018R1A4A1025559).
REFERENCES
[1] J. Zheng, “Voice Dictation-Speechy”, 2017, [online] Available: https://itunes.apple.com/us/app/voice-dictation-speechy/id1229437714?mt=8
[2] “Reviewiser”, 2018, [online] Available: https://reviewiser.io/
[3] “Google Cloud Speech-to-Text”, [online] Available: https://cloud.google.com/speech-to-text/
[4] H. Roh and K. Lee, “A Basic Performance Evaluation of the Speech Recognition APP of Standard Language and Dialect using Google, Naver, and DaumKAKAO APIs”, Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology, Vol. 7, No. 12, pp. 819-829, December 2017.
Figure 1. System architecture
