Voice Application Development for Android: A practical guide to develop advanced and exciting voice applications for Android using open source software
About this ebook
Speech technology has been around for some time now. However, it has only more recently captured the imagination of the general public with the advent of personal assistants on mobile devices that you can talk to in your own language. The potential of voice apps is huge as a novel and natural way to use mobile devices.
Voice Application Development for Android is a practical, hands-on guide that provides you with a series of clear, step-by-step examples which will help you to build on the basic technologies and create more advanced and more engaging applications. With this book, you will learn how to create useful voice apps that you can deploy on your own Android device in no time at all.
This book introduces you to the technologies behind voice application development in a clear and intuitive way. You will learn how to use open source software to develop apps that talk and that recognize your speech. Building on this, you will progress to developing more complex apps that can perform useful tasks, and you will learn how to develop a simple voice-based personal assistant that you can customize to suit your own needs.
For more information about the book, visit http://lsi.ugr.es/zoraida/androidspeechbook
Voice Application Development for Android - Michael F. McTear
Table of Contents
Voice Application Development for Android
Credits
Foreword
About the Authors
Acknowledgement
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Web page for the book
Errata
Piracy
Questions
1. Speech on Android Devices
Using speech on an Android device
Speech-to-text
Text-to-speech
Voice Search
Android Voice Actions
Virtual Personal Assistants
Designing and developing a speech app
Why Google speech?
What is needed to create a Virtual Personal Assistant?
Summary
2. Text-to-Speech Synthesis
Introducing text-to-speech synthesis
The technology of text-to-speech synthesis
Using pre-recorded speech instead of TTS
Using Google text-to-speech synthesis
Starting the TTS engine
Developing applications with Google TTS
TTSWithLib app – Reading user input
TTSReadFile app – Reading a file out loud
Summary
3. Speech Recognition
The technology of speech recognition
Using Google speech recognition
Developing applications with the Google speech recognition API
ASRWithIntent app
ASRWithLib app
Summary
4. Simple Voice Interactions
Voice interactions
VoiceSearch app
VoiceLaunch app
VoiceSearchConfirmation app
Summary
5. Form-filling Dialogs
Form-filling dialogs
Implementing form-filling dialogs
Threading
XMLLib
FormFillLib
VXMLParser
DialogInterpreter
MusicBrain app
Summary
6. Grammars for Dialog
Grammars for speech recognition and natural language understanding
NLU with hand-crafted grammars
Statistical NLU
NLULib
Processing XML grammars
Processing statistical grammars
The GrammarTest app
Summary
7. Multilingual and Multimodal Dialogs
Multilinguality
Multimodality
Summary
8. Dialogs with Virtual Personal Assistants
The technology of VPA
Determining the user's intention
Making an appropriate response
Pandorabots
AIML
Using oob tag to add additional functions
The VPALib library
Creating a Pandorabot
Sample VPAs – Jack, Derek, and Stacy
Alternative approaches
Summary
9. Taking it Further
Developing a more advanced Virtual Personal Assistant
Summary
A. Afterword
Index
Voice Application Development for Android
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2013
Production Reference: 2041213
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-529-7
www.packtpub.com
Cover Image by Aniket Sawant
Credits
Authors
Michael F. McTear
Zoraida Callejas
Reviewers
Deborah A. Dahl
Greg Milette
Acquisition Editor
Rebecca Youe
Commissioning Editor
Amit Ghodake
Technical Editors
Aparna Chand
Nadeem N. Bagban
Project Coordinator
Michelle Quadros
Proofreader
Hardip Sidhu
Indexer
Mehreen Deshmukh
Graphics
Ronak Dhruv
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
Foreword
There are many reasons why users need to speak and listen to mobile devices. We spend the first couple of years of our lives learning how to speak and listen to other people, so it is natural that we should be able to speak and listen to our mobile devices. As mobiles become smaller, the space available for physical keypads shrinks, making them more difficult to use. Wearable devices such as Google Glass and smart watches don't have physical keypads. Speaking and listening is becoming a major means of interaction with mobile devices.
Eventually computers with microphones and speakers will be embedded into our home environment, eliminating the need for remote controls and handheld devices. Speaking and listening will become the major form of communication with home appliances such as TVs, environmental controls, home security, coffee makers, ovens, and refrigerators.
When we perform tasks that require the use of our eyes and hands, we need speech technologies. Speech is the only practical way to interact with an Android device while driving a car or operating complex machinery. Holding and using a mobile device while driving is illegal in some places.
Siri and other intelligent agents enable mobile users to speak a search query. While these systems require sophisticated artificial intelligence and natural language techniques, which are complex and time-consuming to implement, they demonstrate the use of speech technologies that enable users to search for information.
Guides for self-help tasks requiring both hands and eyes present big opportunities for Android applications. Soon we will have electronic guides that speak and listen to help us assemble, troubleshoot, repair, fine-tune, and use equipment of all kinds. What's causing the strange sound in my car's engine? Why won't my television turn on? How do I adjust the air conditioner to cool the house? How do I fix a paper jam in my printer? Printed instructions, user guides, and manuals may be difficult to locate and difficult to read while your eyes are examining and your hands are manipulating the equipment.
Let a speech-enabled application talk you through the process, step-by-step. These self-help applications replace user documentation for almost any product.
Rather than hunting for the appropriate paperwork, just download the latest instructions by scanning the QR code on the product. After completing a step, simply say "next" to listen to the next instruction or "repeat" to hear the current instruction again. The self-help application can also display device schematics, illustrations, and even animations and video clips illustrating how to perform a task.
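The "next" and "repeat" interaction described above can be sketched in a few lines of plain Java. This is only an illustrative sketch, not code from the book: the class name StepGuide and its methods are hypothetical, the recognized text is assumed to arrive as a string from whatever speech recognizer the app uses, and the returned string is assumed to be handed to a text-to-speech engine. No Android APIs are called here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: walks through instructions using the spoken commands "next" and "repeat". */
class StepGuide {
    private final List<String> steps;
    private int current = 0; // index of the instruction most recently presented

    StepGuide(List<String> steps) {
        if (steps.isEmpty()) {
            throw new IllegalArgumentException("at least one step is required");
        }
        this.steps = new ArrayList<>(steps); // defensive copy
    }

    /** Returns the instruction the user is currently working on. */
    String currentStep() {
        return steps.get(current);
    }

    /**
     * Maps recognized speech to the text that should be spoken back.
     * Assumption: the recognizer delivers a single best hypothesis as a string.
     */
    String handle(String recognizedText) {
        String command = recognizedText.trim().toLowerCase(Locale.ROOT);
        switch (command) {
            case "next":
                if (current < steps.size() - 1) {
                    current++;
                    return steps.get(current);
                }
                return "That was the last step.";
            case "repeat":
                return steps.get(current);
            default:
                return "Please say next or repeat.";
        }
    }

    public static void main(String[] args) {
        List<String> steps = new ArrayList<>();
        steps.add("Open the printer's rear cover.");
        steps.add("Pull the jammed sheet out slowly.");
        steps.add("Close the cover.");

        StepGuide guide = new StepGuide(steps);
        System.out.println(guide.currentStep());
        System.out.println(guide.handle("repeat"));
        System.out.println(guide.handle("next"));
    }
}
```

Keeping the command handling separate from recognition and synthesis means the same logic could be exercised from a console, as in main above, before it is wired to real speech input and output.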
Voice messages and sounds are two of the best ways to catch a person's attention. Important alerts, notifications, and messages should be presented to the user vocally, in addition to displaying them on a screen where the user might not notice them.
These are a few of the many reasons to develop applications that speak and listen to users. This book will introduce you to building speech applications. Its examples at different levels of complexity are a good starting point for experimenting with this technology. Then for more ideas of interesting applications to implement, see the Afterword at the end of the book.
James A. Larson
Vice President and Founder of Larson Technical Services
About the Authors
Michael F. McTear is Emeritus Professor of Knowledge Engineering at the University of Ulster with a special research interest in spoken language technologies. He graduated in German Language and Literature from Queen's University Belfast in 1965, was awarded an MA in Linguistics at the University of Essex in 1975, and a PhD at the University of Ulster in 1981. He has been Visiting Professor at the University of Hawaii (1986-87), the University of Koblenz, Germany (1994-95), and the University of Granada, Spain (2006-2010). He has been researching in the field of spoken dialogue systems for more than 15 years and is the author of the widely used textbook Spoken Dialogue Technology: Toward the Conversational User Interface (Springer Verlag, 2004). He is also a co-author of the book Spoken Dialogue Systems (Morgan and Claypool, 2010).
Michael has delivered keynote addresses at many conferences and workshops, including the EU-funded DUMAS Workshop, Geneva, 2004, the SIGDial workshop, Lisbon, 2005, and the Spanish Conference on Natural Language Processing (SEPLN), Granada, 2005, and has delivered invited tutorials at the IEEE/ACL Conference on Spoken Language Technologies, Aruba, 2006, and ACL 2007, Prague. He has presented on several occasions at SpeechTEK, a conference for speech technology professionals, in New York and London. He is a certified VoiceXML developer and has taught VoiceXML at training courses to professionals from companies including Genesys, Oracle, Orange, 3, Fujitsu, and Santander. He was the main developer of the VoiceXML-based home monitoring system for patients with type-2 diabetes, currently in use at the Ulster Hospital, Northern Ireland.
Zoraida Callejas is Assistant Professor at the University of Granada, Spain, where she has been teaching several subjects related to Oral and Multimodal Interfaces, Object Oriented Programming, and Software Engineering for the last eight years. She graduated in Computer Science in 2005, and was awarded a PhD in 2008 from the University of Granada. She has been Visiting Professor at the Technical University of Liberec, Czech Republic (2007-13), the University of Trento, Italy (2008), the University of Ulster, Northern Ireland (2009), the Technical University of Berlin, Germany (2010), the University of Ulm, Germany (2012), and Telecom ParisTech, France (2013).
Zoraida focuses her research on speech technology and in particular, on spoken and multimodal dialogue systems. Zoraida has made presentations at the main conferences in the area of dialogue systems, and has published her research in several international journals and books. She has also coordinated training courses in the development of interactive speech processing systems, and has regularly taught object-oriented software development in Java in different graduate courses for nine years. Currently, she leads a local project for the development of Android speech applications for intellectually disabled users.
Acknowledgement
We would like to acknowledge the advice and help provided by Amit Ghodake, our Commissioning Editor at Packt Publishing, as well as the support of Michelle Quadros, our Project Coordinator, who ensured that we kept to schedule. A special thanks to our technical reviewers, Deborah A. Dahl and Greg Milette, whose comments and careful reading of the first draft of the book enabled us to make numerous changes in the final version that have greatly improved the quality of the book.
Finally, we would like to acknowledge our partners Sandra McTear and David Griol for putting up with our absences while we devoted so much of our time to writing, and sharing the stress of our tight schedule.
About the Reviewers
Dr. Deborah A. Dahl has been working in the areas of speech and natural language processing technologies for over 30 years. She received a Ph.D. in linguistics from the University of Minnesota in 1983, followed by a post-doctoral fellowship in Cognitive Science at the University of Pennsylvania. At Unisys Corporation, she performed research on natural language understanding and spoken dialog systems, and led teams which used these technologies in government and commercial applications. Dr. Dahl founded her company, Conversational Technologies, in 2002. Conversational Technologies provides expertise in