Design of Voice To Text Conversion and Management Program Based On Google Cloud Speech API
Jungyoon Choi †, Haeyoung Gill †, Soobin Ou ††, Yoojeong Song ††, Jongwoo Lee †††
† Division of ICT Convergence Engineering, Department of IT Engineering
Sookmyung Women’s University
Seoul, South Korea
e-mail: {gongju210, haee02}@naver.com
†† Department of IT Engineering
Sookmyung Women’s University
Seoul, South Korea
e-mail: {sbwoo, yjsong}@sm.ac.kr
††† Division of ICT Convergence Engineering, Department of IT Engineering
Sookmyung Women’s University
Seoul, South Korea
e-mail: [email protected] (Corresponding author)
Abstract—Sexual crime, including sexual harassment and sexual assault, is prevalent. In particular, the number of reported cases of sexual crimes occurring in the workplace is steadily increasing. Victims of sexual crime are required to prove the harm they suffered, but doing so is not easy, so offenders are often not punished properly because of insufficient evidence. In this paper, we design a recording service called CCVoice. It uses mobile devices to record everyday life. At the same time, it converts the recorded file to text using the Google Cloud Speech API and saves the text file. Therefore, voice evidence can be obtained easily when a user suddenly becomes the victim of a sexual crime such as sexual harassment or sexual assault.

Keywords-Voice record; Android; Web; Google Cloud Speech API

I. INTRODUCTION
Recently, the number of reported cases of sexual crimes such as sexual harassment and sexual assault has increased in the workplace and in daily life. However, even when a victim reports such a crime, it is difficult for the victim to testify directly and accurately. As a result, evidence is insufficient and the perpetrator may not be properly punished or prosecuted.
To solve these problems, we propose CCVoice, which records voice using a mobile device. The name CCVoice is a combination of 'CCTV' and 'voice'. While recording, it converts the recorded file to text using the Google Cloud Speech API and stores the text file. The user can check the converted text file in a web-based service and use it whenever voice evidence is needed.
We design CCVoice with the following goals. First, when the user registers a desired time slot, recording should be performed automatically at that time. Second, the accuracy of voice-to-text conversion should be good enough for the user to present the text as evidence. Third, the user must be able to listen to the recorded voice file and view the converted text file in the web service whenever desired.

II. RELATED WORKS

A. Speechy
Speechy is a real-time dictation application based on artificial intelligence and a speech recognition engine [1]. It is offered as a paid version, Speechy, and a free version, Speechy Lite; the two provide the same functions except that Speechy Lite imposes a recording time limit. The transcribed text can be refined precisely through vocabulary addition and text editing. However, the real-time conversion function introduces a delay in the text conversion, which degrades speech recognition and lowers accuracy.

B. Reviewiser
Reviewiser is a service that converts voice recordings to text; it uses artificial intelligence technology to convert long voice files easily and quickly [2]. The user uploads a recording file to the home page, and the service converts the audio file into text. Various editing functions let the user correct individual words while listening to the voice file. However, there is no mobile application, so users need to record and upload files using another system.

C. Google Cloud Speech API
Google Cloud Speech-to-Text is an automatic speech recognition (ASR) API based on deep neural networks that can be used in applications such as voice search and speech-to-text conversion [3]. It can also handle noisy audio from a variety of environments. It supports both live-streamed and pre-recorded audio. Audio longer than one minute should use the uri field to refer to an audio file on Google Cloud Storage, which costs $0.02 per GB per month. The Google Cloud Speech API has a good overall understanding of the sentence and, according to [4], has the best recognition rate for standard language.
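The rule that audio over a minute is referenced through the uri field can be illustrated with a minimal sketch. It is not the CCVoice implementation, which the paper does not show. It only builds the JSON body for the Speech-to-Text v1 REST methods speech:recognize and speech:longrunningrecognize. The function name, the 60-second constant, the ko-KR language code, the LINEAR16 encoding, the 16 kHz sample rate, and the bucket name in the usage note are all assumptions chosen for illustration.

```python
import base64

# Assumed threshold, taken from the "audio over a minute" rule in the text.
MAX_INLINE_SECONDS = 60


def build_recognize_request(duration_seconds, audio_bytes=None, gcs_uri=None,
                            language_code="ko-KR", sample_rate_hertz=16000):
    """Return (rest_method, json_body) for a Speech-to-Text v1 request.

    Hypothetical helper: short clips are sent inline as base64 content,
    longer clips must point at a Google Cloud Storage object via "uri".
    """
    config = {
        "encoding": "LINEAR16",            # assumed: uncompressed 16-bit PCM
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,     # assumed: Korean speech
    }
    if duration_seconds > MAX_INLINE_SECONDS:
        # Long audio: reference the uploaded file instead of embedding it.
        if not gcs_uri or not gcs_uri.startswith("gs://"):
            raise ValueError("audio over one minute needs a gs:// URI")
        body = {"config": config, "audio": {"uri": gcs_uri}}
        return "speech:longrunningrecognize", body
    if audio_bytes is None:
        raise ValueError("short audio needs inline bytes")
    # Short audio: the REST API expects the raw bytes base64-encoded.
    content = base64.b64encode(audio_bytes).decode("ascii")
    body = {"config": config, "audio": {"content": content}}
    return "speech:recognize", body
```

For example, build_recognize_request(90, gcs_uri="gs://example-bucket/rec.wav") selects the long-running method with a uri reference, while a 10-second clip passed as audio_bytes is embedded inline. Sending the request, authenticating, and polling the long-running operation are left out of this sketch.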