Integrating Google Speech Recognition With Android Home Screen Application For Easy and Fast Multitasking
Abstract—Over the years speech recognition has become more popular, and it provides functionality for easy user input. Speech recognition is widely used these days in various domains. It can minimize text input and also reduce the number of touches a user makes; speech input can be captured by a device such as a microphone. Speech recognition converts the user's voice data into text output, and by converting the text into commands, some actions can be accessed easily. Android home screen applications let users access and open the apps installed on the device, but multitasking is not easily done in these apps: the user has to follow a sequence of actions or touches to multitask or navigate between apps. This can be minimized by using speech recognition. By integrating speech recognition into a home screen app, the user can open an app simply by saying its name, and can also access some system functionalities easily by just saying the functionality name.

Index Terms—Android, Home Screen Application, Multitasking, Speech Recognition, Voice.

I. INTRODUCTION

The main concept behind "Integrating Google speech recognition with Android home screen application for easy and fast multitasking" is to simplify multitasking with the use of speech recognition. The application takes voice input; either the inbuilt speech recognition engine on the device processes the speech input and gives the result, or the speech input is sent to the speech recognition server, which processes the input and sends back the result. Based on the output text, the specific command is executed. The main advantage of this application is that the user can give input even while he or she is inside another application; the application takes input at any time, as long as the microphone is free to use. There are many uses for this application.

Users do not have to go through a series of touches to open a specific app; they only need to know the name of the app they want to open. The application also provides the functionality of a home screen application. There are many home screen applications on the market, and many speech recognition applications, but home screen applications only display the apps: to navigate between applications, the user has to close the currently running app, go to the app list, and select the next app, and this process has to be repeated every time, which makes navigating between different apps irritating. In this application the user can say an app's name to open it, which is simpler than always going through a series of steps to navigate between apps. The user can also access some system functionalities, such as turning on the flashlight, Wi-Fi, or Bluetooth, and adjusting the screen brightness, using voice commands.

II. ANDROID

Android is a software environment that includes an operating system, middleware, and key applications for mobile devices and tablets. In 2005 Google took over the company Android Inc. and, two years later, in collaboration with the Open Handset Alliance group, presented the Android operating system (OS). Features of the Android operating system are:

- Free download of the development environment for application development.
- Each application can access most resources in the device.
- Every application runs within a specific memory size.
- Easy development of applications using development tools (Android Studio) and a rich set of software libraries.
- High quality of audio and visual content; it is possible to use vector graphics and most audio and video formats.
- Ability to develop and test applications on most computing platforms, including Windows and Linux, using tools such as AVDs (Android Virtual Devices).

The Android OS architecture is divided into layers (fig. 1). The application layer of the Android operating system is the one visible to the end user and consists of the user applications. It includes the basic applications, which contain the UI, and services for background processes. All applications are written in the Java programming language using the Android SDK. The framework is an extensible set of software components used by all applications in the operating system. The next layer represents the libraries, which are written in the C and C++ programming languages; the OS accesses them via the framework. The Dalvik Virtual Machine (DVM) is the main part of the executive system environment. The virtual machine starts the core libraries written in the Java programming language; it is based on a register structure and is intended for mobile devices, unlike Java's virtual machine, which is based on a stack.

Fig. 1: Android Architecture

The bottom architecture layer of the Android operating system is the kernel, which is based on the Linux OS and serves as a hardware abstraction layer. The main reasons for its use are memory and process management, the security model, the network system, and the constant development of the system.
There are four basic components used in the construction of applications: the activity, the service, the broadcast receiver, and the content provider; intents are the messages that connect them. Android provides a permission model for user data security, by which apps are limited in how they may use resources. The activity is the root element of every application; it is the window that users see on their mobile device and acts as the root for the UI views. An application can have one or more activities, and the main activity is the one used at startup. The transition between activities is carried out by having the launched activity call a new activity. During the execution of an application, activities are added to a stack; the currently running activity is on the top of the stack, and activities are switched at run time using intents. An intent is a message used to run activities and services (background processes). An intent can contain the name of the component to run (package name and additional info), the action that is to be executed, the address of stored data needed to run the component, and the component type. A service is a component that runs in the background to do long-running operations or to perform work for remote processes, for example downloading files from a server or retrieving content from servers. One service can be linked to multiple applications, and the service keeps running until the connection with all applications is closed or it is stopped. A content provider manages a shared set of application data. The data can be stored in the file system, in an SQLite database, on the web, or in any other persistent storage location which the application can access [1].
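As a brief illustration of how an intent can open another application from a home screen app, here is a minimal sketch; the class name and the surrounding structure are illustrative, not taken from the project's source:

    import android.content.Context;
    import android.content.Intent;

    public class AppLauncher {
        // Opens an installed app by its package name using the
        // explicit launch intent resolved by the PackageManager.
        public static boolean launchApp(Context context, String packageName) {
            Intent intent = context.getPackageManager()
                    .getLaunchIntentForPackage(packageName);
            if (intent == null) {
                return false; // no launchable activity for this package
            }
            intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
            context.startActivity(intent);
            return true;
        }
    }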
III. SPEECH RECOGNITION

Speech recognition for the application is done on a Google server or on the device itself, depending on the Android version and the user's choice. The process involves the conversion of speech into a set of words and is performed by a software component. The accuracy of speech recognition systems differs in vocabulary size and confusability, speaker dependence versus independence, modality of speech (isolated, discontinuous, or continuous speech; read or spontaneous speech), and task and language constraints [2]. A speech recognition system can be divided into several blocks: feature extraction, acoustic models, a dictionary, and a language model. In feature extraction the signal is divided into short intervals; this period is usually 20 ms, or it can be extended based on parameters, because the signal in this interval is considered stationary. Feature vectors from a training database are used to estimate the parameters of the acoustic models. An acoustic model describes the properties of the basic elements that can be recognized; the basic element can be a phoneme for continuous speech, or a word for isolated-word recognition. The dictionary is used to connect the acoustic models with the vocabulary words. The language model reduces the number of acceptable word combinations based on the rules of the language and statistical information from different texts. Speech recognition systems based on hidden Markov models [3] are today the most widely applied in modern technologies. The continuous speech waveform is first converted to a sequence of equally spaced discrete parameter vectors. Vectors of speech characteristics consist mostly of MFC (Mel Frequency Cepstral) coefficients, standardized for speech recognition by the European Telecommunications Standards Institute, which in the early 2000s defined a standardized MFCC algorithm to be used in mobile phones [4]. Standard MFC coefficients are constructed in a few simple steps: a short-time Fourier analysis of the speech signal using a finite-duration window (typically 20 ms) is performed, and the power spectrum is computed. Speech recognition can be processed at the Google server or on the same device where the input is recorded. From Android version 6.0 the speech recognition engine is integrated into the OS, and offline speech recognition is 7% faster than online speech recognition with the Google server.
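A minimal sketch of how an application can ask the recognizer to prefer on-device processing on Android 6.0 and later; EXTRA_PREFER_OFFLINE is part of the platform API from API level 23, while the surrounding setup is only an assumed example:

    import android.content.Intent;
    import android.os.Build;
    import android.speech.RecognizerIntent;

    public class RecognizerIntentFactory {
        // Builds an intent for free-form speech recognition and, on
        // Android 6.0+, asks the engine to prefer offline recognition.
        public static Intent build() {
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
                intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true);
            }
            return intent;
        }
    }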
IV. MAIN PARTS OF THE PROJECT

A. Main class

This is the startup class and contains the functionality of the home screen application. It is responsible for showing the apps installed on the device, for opening the apps chosen by the user, and for adding widgets and shortcuts to the home screen. This class starts the Service class, and multiple events are registered.
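A minimal sketch of how such a class can enumerate the launchable apps through the PackageManager; the class and method names are illustrative:

    import java.util.List;

    import android.content.Context;
    import android.content.Intent;
    import android.content.pm.PackageManager;
    import android.content.pm.ResolveInfo;

    public class InstalledAppQuery {
        // Returns all activities that can appear in a launcher,
        // i.e. the apps a home screen application should list.
        public static List<ResolveInfo> queryLaunchableApps(Context context) {
            Intent main = new Intent(Intent.ACTION_MAIN);
            main.addCategory(Intent.CATEGORY_LAUNCHER);
            PackageManager pm = context.getPackageManager();
            return pm.queryIntentActivities(main, 0);
        }
    }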
B. Service class

The Service is an Android service declared as a service in the AndroidManifest.xml file. The Service class starts the speech recognition, displays the current status of the speech recognition component, and also displays the recognized words. This class adds a UI component that is used to start the speech recognition. This UI button always stays on top of every app, and the user can interact with it even while using another app. By tapping this button, a list of the installed apps is shown in a row at the side of the window. If the user experiences any problems using speech recognition, the user can select an app from the list displayed at the side of the window; it serves as an alternative when too much noise is recorded by the microphone. After holding the button, the user can speak the commands, and visual feedback of the recognized words is displayed on top of all apps within another UI element. After the speech is completed, the event listener functions are called by the speech recognizer.
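A sketch of how a service can keep such a button on top of every app using a WindowManager overlay; this requires the SYSTEM_ALERT_WINDOW permission, and the layout details here are assumptions rather than the project's actual values:

    import android.app.Service;
    import android.content.Intent;
    import android.graphics.PixelFormat;
    import android.os.Build;
    import android.os.IBinder;
    import android.view.Gravity;
    import android.view.View;
    import android.view.WindowManager;
    import android.widget.ImageButton;

    public class OverlayButtonService extends Service {
        private ImageButton button;

        @Override
        public void onCreate() {
            super.onCreate();
            // Android 8.0+ requires TYPE_APPLICATION_OVERLAY; older
            // versions use the now-deprecated TYPE_PHONE.
            int type = Build.VERSION.SDK_INT >= Build.VERSION_CODES.O
                    ? WindowManager.LayoutParams.TYPE_APPLICATION_OVERLAY
                    : WindowManager.LayoutParams.TYPE_PHONE;
            WindowManager.LayoutParams params = new WindowManager.LayoutParams(
                    WindowManager.LayoutParams.WRAP_CONTENT,
                    WindowManager.LayoutParams.WRAP_CONTENT,
                    type,
                    WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE,
                    PixelFormat.TRANSLUCENT);
            params.gravity = Gravity.TOP | Gravity.END;

            button = new ImageButton(this);
            button.setOnClickListener(new View.OnClickListener() {
                @Override
                public void onClick(View v) {
                    // Start listening here, e.g. via SpeechRecognizer.
                }
            });
            ((WindowManager) getSystemService(WINDOW_SERVICE))
                    .addView(button, params);
        }

        @Override
        public void onDestroy() {
            ((WindowManager) getSystemService(WINDOW_SERVICE)).removeView(button);
            super.onDestroy();
        }

        @Override
        public IBinder onBind(Intent intent) {
            return null;
        }
    }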
C. Speech Listener class

The Speech Listener implements the listener interface used by the speech recognizer; all of the abstract methods are implemented in this class, and these methods are invoked by the speech recognizer. When the speech recognizer starts listening, the corresponding methods are invoked in order over time. When the user starts speaking, the method onPartialResults() is invoked, and a Bundle is passed as the argument. After the speech is completed, or a speech timeout occurs, the onResults() method is invoked, again with a Bundle passed as the argument. Once the results arrive, the text is processed and converted into commands, and the commands are executed.
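A minimal sketch of such a listener using the platform's RecognitionListener interface; the command-dispatch hook in onResults() is a hypothetical placeholder:

    import java.util.ArrayList;

    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.SpeechRecognizer;

    public class SpeechListener implements RecognitionListener {
        @Override
        public void onPartialResults(Bundle partialResults) {
            // Interim hypotheses while the user is still speaking;
            // suitable for on-screen feedback.
            ArrayList<String> partial = partialResults
                    .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        }

        @Override
        public void onResults(Bundle results) {
            // Final hypotheses after the speech ends or times out.
            ArrayList<String> texts = results
                    .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            if (texts != null && !texts.isEmpty()) {
                // Hypothetical hook that maps the text to a command:
                // CommandDispatcher.execute(texts.get(0));
            }
        }

        // The remaining callbacks of the interface are left empty here.
        @Override public void onReadyForSpeech(Bundle params) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onRmsChanged(float rmsdB) {}
        @Override public void onBufferReceived(byte[] buffer) {}
        @Override public void onEndOfSpeech() {}
        @Override public void onError(int error) {}
        @Override public void onEvent(int eventType, Bundle params) {}
    }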
D. XML files

The activity UI is designed in XML files. These XML files are reusable: once components are added to an XML resource file, they can be reused elsewhere. Different types of UI components are available, and different combinations of attributes can be selected in the XML file; based on these attributes, the UI components are arranged on the activity screen. The UI is added to the activity screen in the onCreate() method by calling setContentView() and passing in the XML file from the resources.
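For illustration, the usual pattern for inflating such a layout; R.layout.main is a placeholder resource name, not the project's actual file:

    import android.app.Activity;
    import android.os.Bundle;

    public class MainActivity extends Activity {
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            // Inflate the XML layout resource into this activity's view tree.
            setContentView(R.layout.main);
        }
    }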
V. OUTPUT TEXT PROCESSING

When the user gives voice input, the speech recognition process converts the speech data into text output. This text is further processed to convert it into a particular command. The text is passed to an algorithm that checks it against an array of strings, compares the strings, and returns the best-matching string, which is then converted into a command. This is done at multiple levels to determine the required command with high accuracy. The app names are stored in an array list, and when the user says an app name, the output text is compared with every app name in the array list. Two strings are compared at a time: the first letters of the strings are compared and, if they are equal, the second letters, and so on. If any letter mismatches, the index of the first mismatch is stored, the comparison continues through the remaining letters, and the number of matched letters is recorded; all of these values are stored in an array list. This array list is then processed in levels to obtain the required app name: in every level the list is shortened by reordering it and removing zero-match app names. This process is repeated to further narrow the list down to the required app name, and this final app name is converted into a command and the command is executed.
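A compact sketch of this letter-by-letter scoring and of one filtering level, under the assumption that a positional letter match is what is counted; the names are illustrative:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class AppNameMatcher {
        // Counts the letters that match position by position,
        // mirroring the pairwise comparison described above.
        static int matchingLetters(String spoken, String appName) {
            int n = Math.min(spoken.length(), appName.length());
            int matched = 0;
            for (int i = 0; i < n; i++) {
                if (Character.toLowerCase(spoken.charAt(i))
                        == Character.toLowerCase(appName.charAt(i))) {
                    matched++;
                }
            }
            return matched;
        }

        // One filtering level: drop zero-match names and reorder
        // the remaining names best match first.
        public static List<String> filterLevel(final String spoken,
                                               List<String> names) {
            List<String> kept = new ArrayList<>();
            for (String name : names) {
                if (matchingLetters(spoken, name) > 0) {
                    kept.add(name);
                }
            }
            Collections.sort(kept, new Comparator<String>() {
                @Override
                public int compare(String a, String b) {
                    return matchingLetters(spoken, b) - matchingLetters(spoken, a);
                }
            });
            return kept;
        }
    }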
VI. APPLICATION FUNCTIONALITY PRINCIPLE
This application uses speech recognition for easy multitasking between applications, and system functionalities such as turning on Bluetooth or Wi-Fi and adjusting the screen brightness can be accessed easily using speech input. After the application starts, an image button is placed in a top window, which initiates the voice recognition process. When speech has been detected, the application opens a connection with Google's server and sends the recorded speech; after processing the speech, the server sends the output text back to the user's process. Depending on the Android version in use, the user has the chance to select where the speech is to be processed. From Android version 6.0 the speech recognition engine is implemented in the OS; offline recognition is faster and has a lower error rate, and more languages are available for download. Google has more than 230 billion words stored in its database. If we use this kind of speech recognizer, it is very likely that our voice is stored on Google's servers [6]. This provides a continuous increase of the data used for training, which increases the accuracy of the system. In offline speech recognition, however, it is not possible to store all of that data on the user's device, so Google developed a model whereby the user's device can download language packs of no more than 20 MB in size.
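As one example of such a system functionality, below is a sketch of switching the flashlight using the CameraManager torch API, which is available from Android 6.0; the class name is illustrative and error handling is kept minimal:

    import android.content.Context;
    import android.hardware.camera2.CameraAccessException;
    import android.hardware.camera2.CameraManager;

    public class TorchController {
        // Toggles the torch of the first camera in the list,
        // which is usually the back camera with the flash unit.
        public static void setTorch(Context context, boolean on)
                throws CameraAccessException {
            CameraManager cm = (CameraManager)
                    context.getSystemService(Context.CAMERA_SERVICE);
            String cameraId = cm.getCameraIdList()[0];
            cm.setTorchMode(cameraId, on); // requires API level 23+
        }
    }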
VII. SCREENSHOTS

(Screenshots of the application appear here in the original document.)

CONCLUSION

With the improvement of the software and hardware capabilities of mobile devices, there is an increased need for device-specific content, which has resulted in market changes. Speech recognition technology is of particular interest due to its direct support of communication between humans and devices. By integrating Google speech recognition into an Android home screen application, apps can be opened easily by saying their names, and system functionalities such as adjusting the brightness can also be accessed. Multitasking is more responsive when using speech recognition: by saying an app's name, that particular app is opened. Future enhancements will add more functionalities for easy access to system features, along with continuous speech recognition.

WEB SITES

1. Android Developer training page: https://developer.android.com/training/index.html
2. Speech Recognition API documentation: https://developer.android.com/reference/android/speech/package-summary.html
3. Speech recognition application example: https://github.com/gotev/android-speech
4. Speech recognition sample code: https://stackoverflow.com/questions/17747722/android-source-code-home-sample
References

[1] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri. OpenFst: A general and efficient weighted finite-state transducer library. Lecture Notes in Computer Science, 4783:11, 2007.
[2] M. Bacchiani, F. Beaufays, J. Schalkwyk, M. Schuster, and B. Strope. Deploying GOOG-411: Early lessons in data, measurement, and testing. In Proceedings of ICASSP, pages 5260-5263, 2008.
[3] M. J. F. Gales. Semi-tied full-covariance matrices for hidden Markov models. 1997.
[4] B. Harb, C. Chelba, J. Dean, and S. Ghemawat. Back-off language model compression. 2009.
[5] M. Kamvar and S. Baluja. A large scale study of wireless search behavior: Google mobile search. In CHI, pages 701-709, 2006.
[6] B. Raghavendhar Reddy and E. Mahender. Speech to text conversion using Android platform. In IJERA, ISSN: 2248-9622.
[7] R. Nisimura, J. Miyake, H. Kawahara, and T. Irino. "Speech-to-text input method for web system using JavaScript". IEEE SLT 2008, pages 209-212.
[8] S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Transactions on Acoustics, Speech and Signal Processing, volume 35, pages 400-401, March 1987.