JavaScript
Speech
Recognition
Who is this guy?
@macdonst
macdonst on Github
simonmacdonald.com
works at Adobe
Apache Cordova core contributor
nutty about speech recognition
The future won't be like Star Trek.
Scott Adams, creator of Dilbert
Why do I care about speech rec?
JavaScript Speech Recognition
+
= Cape Bretoner
Here's a conversation between two Cape
Bretoners
P1: jeet?
P2: naw, jew?
P1: naw, t'rly t'eet bye.
And here's the translation
P1: jeet?
P1: Did you eat?
P2: naw, jew?
P2: No, did you?
P1: naw, t'rly t'eet bye.
P1: No, it's too early to eat buddy.
Regular Alphabet
26 letters
Cape Breton
Alphabet
12 letters!
Alright,
enough
about me
What is speech
recognition?
Speech recognition is the
process of translating the
spoken word into text.
The process of speech rec
includes...
Record and digitize the audio
data
Perform end pointing
(trimming)
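As a rough illustration of the trimming idea, here is a minimal sketch that drops quiet samples from both ends of a recording. The function name trimSilence, the sample values, and the fixed amplitude threshold are all assumptions made for this example; real end pointing is usually more involved than a single threshold.

```javascript
// Hypothetical helper: remove leading and trailing samples whose absolute
// amplitude is below the threshold. Quiet samples in the middle are kept.
function trimSilence(samples, threshold) {
  var start = 0;
  var end = samples.length;
  while (start < end && Math.abs(samples[start]) < threshold) {
    start++;
  }
  while (end > start && Math.abs(samples[end - 1]) < threshold) {
    end--;
  }
  return samples.slice(start, end);
}

// Made-up sample data: silence, then speech, then silence again.
var recording = [0, 0.01, 0.4, -0.6, 0.02, 0.3, 0.01, 0];
console.log(trimSilence(recording, 0.05)); // prints [ 0.4, -0.6, 0.02, 0.3 ]
```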
Split data into phonemes
What is a phoneme?
It is one of the perceptually
distinct units of sound in a
specified language that
distinguish one word from another.
The English language has 44
distinct sounds
Source: English language phoneme chart
By comparison, the Rotokas
speakers in Papua New Guinea
have 11 phonemes.
But the !Xóõ speakers who
mostly live in Botswana have
112 phonemes.
Apply the phonemes to the
recognition model. This is a
massive lexicon which takes
into account all of the different
ways words can be
pronounced.
Analyze the results against the
grammar
Return a confidence weighted
result
[
{
"confidence":0.97335243225098,
"transcript":"hello"
},
{
"confidence":0.19940405040800,
"transcript":"helllow"
},
{
"confidence":0.19910827091000,
"transcript":"howlow"
}
]
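A minimal sketch of how an app might use a confidence weighted result like the one above: take the most confident alternative and ignore anything below a cut-off. The helper name bestTranscript and the 0.5 threshold are assumptions for illustration, not part of any spec.

```javascript
// Hypothetical helper: return the transcript of the most confident
// alternative, or null when no alternative reaches minConfidence.
function bestTranscript(alternatives, minConfidence) {
  var best = null;
  for (var i = 0; i < alternatives.length; i++) {
    var alt = alternatives[i];
    if (alt.confidence >= minConfidence &&
        (best === null || alt.confidence > best.confidence)) {
      best = alt;
    }
  }
  return best === null ? null : best.transcript;
}

// The same data as the result shown above.
var alternatives = [
  { confidence: 0.97335243225098, transcript: "hello" },
  { confidence: 0.19940405040800, transcript: "helllow" },
  { confidence: 0.19910827091000, transcript: "howlow" }
];

console.log(bestTranscript(alternatives, 0.5)); // prints "hello"
```

Returning null instead of a low-confidence guess lets the app ask the user to repeat themselves.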
Basically...
We want it to be like this
but more often than not...
Why is that?
When two people talk
comprehension rates are better
than 97%
A really good English language
speech recognition system is
right 92% of the time
Where does that extra 5% in
error rate come from?
Vocabulary size and confusability
Speaker dependence vs independence
Isolated or continuous speech
Initiated vs spontaneous speech
Adverse conditions
Mobile Speech Recognition
OS             Application   SDK
Android        Google Now    Java API
iOS            Siri          Many 3rd party Obj-C SDKs
Windows Phone  Cortana       C# API
So how do we
add speech rec
to our app?
You may look at the W3C
Speech API Specification
but only Chrome on the
desktop has implemented that
spec
But that's okay!
The spec looks like this:
interface SpeechRecognition : EventTarget {
    // recognition parameters
    attribute SpeechGrammarList grammars;
    attribute DOMString lang;
    attribute boolean continuous;
    attribute boolean interimResults;
    attribute unsigned long maxAlternatives;
    attribute DOMString serviceURI;

    // methods to drive the speech interaction
    void start();
    void stop();
    void abort();
};
With additional event methods
to control behaviour:
attribute EventHandler onaudiostart;
attribute EventHandler onsoundstart;
attribute EventHandler onspeechstart;
attribute EventHandler onspeechend;
attribute EventHandler onsoundend;
attribute EventHandler onaudioend;
attribute EventHandler onresult;
attribute EventHandler onnomatch;
attribute EventHandler onerror;
attribute EventHandler onstart;
attribute EventHandler onend;
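To make the attributes and handlers above a little more concrete, here is a sketch that sets a few of them in one place. The helper name configureRecognition, the "en-US" language tag, and the two callbacks are assumptions for this example. The recognition object is passed in rather than created, so the sketch does not depend on any particular browser or plugin.

```javascript
// Hypothetical helper: apply some spec attributes and event handlers to a
// recognition object supplied by the caller.
function configureRecognition(recognition, onText, onFailure) {
  recognition.lang = "en-US";          // assumed language tag
  recognition.continuous = false;      // stop after a single utterance
  recognition.interimResults = false;  // only deliver final results
  recognition.maxAlternatives = 3;     // ask for up to three guesses

  recognition.onresult = function (event) {
    if (event.results.length > 0) {
      onText(event.results[0][0].transcript);
    }
  };
  recognition.onnomatch = function () {
    onFailure("no-match");
  };
  recognition.onerror = function (event) {
    onFailure(event.error);
  };
  return recognition;
}

// Possible use where SpeechRecognition is available:
//   var recognition = configureRecognition(new SpeechRecognition(),
//       function (text) { console.log(text); },
//       function (reason) { console.warn(reason); });
//   recognition.start();
```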
Let's recognize some speech
var recognition = new SpeechRecognition();
recognition.onresult = function(event) {
    if (event.results.length > 0) {
        var test1 = document.getElementById("test1");
        test1.innerHTML = event.results[0][0].transcript;
    }
};
recognition.start();
Click to Speak
Replace me...
So that's pretty
cool...
...if taking dictation gets you
going
But I want to do
something more
exciting with the
result
Let's do something a little less
trivial
recognition.onresult = function(event) {
    var result = event.results[0][0].transcript;
    var music = document.getElementById("music");
    switch (result) {
        case "jazz":
            music.src = "jazz.mp3";
            music.play();
            break;
        case "rock":
            music.src = "rock.mp3";
            music.play();
            break;
        case "stop":
        default:
            music.pause();
    }
};
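The switch above only matches when the transcript is exactly "jazz", "rock" or "stop". A recognizer may well hand back extra words, capital letters or punctuation, so here is a sketch of a more forgiving matcher. The helper name matchCommand and the clean-up rules are assumptions for this example.

```javascript
// Hypothetical helper: find the first known command word in a transcript.
// Falls back to "stop", mirroring the default case in the switch above.
function matchCommand(transcript) {
  var commands = ["jazz", "rock", "stop"];
  var words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")  // drop punctuation and digits
    .trim()
    .split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    if (commands.indexOf(words[i]) !== -1) {
      return words[i];
    }
  }
  return "stop";
}

console.log(matchCommand(" Play some Jazz ")); // prints "jazz"
```

The returned value could then be used in place of result in the switch statement.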
Click to Speak
Which seems
much cooler to
me
Let's ask the web a question
Click to Speak
Works pretty
good...
...but ugly!
Let's style our
button with some
CSS
+
=
<a class="speechinput">
    <img src="images/mic.png">
</a>
#speechinput input {
    cursor: pointer;
    margin: auto;
    margin: 15px;
    color: transparent;
    background-color: transparent;
    border: 5px;
    width: 15px;
    -webkit-transform: scale(3.0, 3.0);
}
by Nicholas Gallagher
And we'll add some color using
Speech
Bubbles
Pure-CSS-Speech-Bubbles
Then pull it all
together!
But wait, why am
I using my eyes
like a sucker?
We'll output the answer using
SpeechSynthesis
The SpeechSynthesis spec
looks like this:
interface SpeechSynthesis {
    readonly attribute boolean pending;
    readonly attribute boolean speaking;
    readonly attribute boolean paused;

    void speak(SpeechSynthesisUtterance utterance);
    void cancel();
    void pause();
    void resume();
    SpeechSynthesisVoiceList getVoices();
};
The SpeechSynthesisUtterance
spec looks like this:
interface SpeechSynthesisUtterance : EventTarget {
    attribute DOMString text;
    attribute DOMString lang;
    attribute DOMString voiceURI;
    attribute float volume;
    attribute float rate;
    attribute float pitch;
};
With additional event methods
to control behaviour:
attribute EventHandler onstart;
attribute EventHandler onend;
attribute EventHandler onerror;
attribute EventHandler onpause;
attribute EventHandler onresume;
attribute EventHandler onmark;
attribute EventHandler onboundary;
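Putting the two interfaces above together, a sketch of speaking an answer out loud might look like this. The helper name speakAnswer and the attribute values are assumptions for this example. The synthesizer and the utterance constructor are passed in as arguments; in a supporting environment they would be window.speechSynthesis and SpeechSynthesisUtterance.

```javascript
// Hypothetical helper: build an utterance from the text and hand it to the
// synthesizer, cancelling anything that is still being spoken.
function speakAnswer(text, synth, Utterance) {
  var utterance = new Utterance();
  utterance.text = text;
  utterance.lang = "en-US";  // assumed language tag
  utterance.volume = 1.0;    // loudest
  utterance.rate = 1.0;      // normal speed
  utterance.pitch = 1.0;     // normal pitch

  if (synth.speaking) {
    synth.cancel();
  }
  synth.speak(utterance);
  return utterance;
}

// Possible use where speech synthesis is available:
//   speakAnswer("It is sunny today", window.speechSynthesis,
//       SpeechSynthesisUtterance);
```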
Plugin repos
SpeechRecognitionPlugin - https://github.com/macdonst/SpeechRecognitionPlugin
SpeechSynthesisPlugin - https://github.com/macdonst/SpeechSynthesisPlugin
* Working with Julio César (@jcesarmobile) to get iOS done
Availability
OS             Recognition  Synthesis
Android        ✓            ✓
iOS*           Soonish      Native to iOS 7.0+
Windows Phone  ×            ×
Getting started
phonegap create speech com.example.speech speech
cd speech
phonegap platform add android
phonegap plugin add https://github.com/macdonst/SpeechRecognitionPlugin
phonegap plugin add https://github.com/macdonst/SpeechSynthesisPlugin
phonegap run android
For more information on hybrid
applications
Check out Nick Van
Weerdenburg and Andrey
Feldman's presentation on
Creating a Comprehensive
Social Media App Using Ionic
and PhoneGap, 3:45pm today
in 801A.
But wait, one
more thing...
Speech recognition and speech
synthesis are not well
supported in the emulator
and sometimes developing on
the device can be a bit of a
pain.
That's why I coded
speechshim.js
https://github.com/macdonst/SpeechShim
Chrome + speechshim.js
=
W3C Web Speech API on your
desktop
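For context, Chrome exposes its recognizer under the prefixed name webkitSpeechRecognition, which is one reason code written against the unprefixed spec names needs some help on the desktop. The sketch below shows a generic feature-detection pattern only; it is not taken from speechshim.js, and the helper name findSpeechRecognition is an assumption for this example.

```javascript
// Hypothetical helper: look for a recognition constructor on a global
// object, trying the unprefixed spec name first and the prefixed name second.
function findSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}

// Possible use in a browser:
//   var Recognition = findSpeechRecognition(window);
//   if (Recognition) {
//     var recognition = new Recognition();
//   }
```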
Types of Speech Recognition
Applications
Voice Web Search
Speech Command Interface
Continuous Recognition of Open Dialog
Domain Specific Grammars Filling Multiple Input Fields
Speech UI present when no visible UI need be present
Voice Activity Detection
Speech Translation
Multimodal Interaction
Speech Driving Directions
THE END
