Speech Recognition Using Java: the loadJSGF Method and JSGF Grammars
The Java Speech API is designed to keep simple speech applications simple, and to
make advanced speech applications possible for non-specialist developers. A speech
recognizer is a speech engine that converts speech to text. The javax.speech.recognition
package defines the Recognizer interface to support speech recognition, plus a set of
supporting classes and interfaces.
The following example shows a simple application that uses speech recognition. For this
application we need to define a grammar of everything the user can say, and we need to
write the Java software that performs the recognition task.
Allocate: The allocate method requests that the Recognizer allocate all
necessary resources.
Load and enable grammars: The loadJSGF method reads in a JSGF document
from a reader created for the file that contains the javax.speech.demo grammar.
(Alternatively, the loadJSGF method can load a grammar from a URL.) Next,
the grammar is enabled. Once the recognizer receives focus (see below), an
enabled grammar is activated for recognition: that is, the recognizer compares
incoming audio to the active grammars and listens for speech that matches those
grammars.
Request focus and resume: For recognition of the grammar to occur, the
recognizer must be in the RESUMED state and must have the speech focus. The
requestFocus and resume methods achieve this.
Process result: Once the main method is completed, the application waits until
the user speaks. When the user speaks something that matches the loaded
grammar, the recognizer issues a RESULT_ACCEPTED event to the listener we
attached to the recognizer. The source of this event is a Result object that
contains information about what the recognizer heard. The getBestTokens
method returns an array of ResultTokens, each of which represents a single
spoken word. These words are printed.
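The sketch below puts these steps together into a minimal application. It is illustrative
rather than definitive: the grammar file name (javax.speech.demo.gram), the class name and
the exception handling are assumptions of this sketch, while the JSAPI calls (createRecognizer,
allocate, loadJSGF, setEnabled, commitChanges, requestFocus, resume and the ResultAdapter
listener) follow the steps just described.

import javax.speech.*;
import javax.speech.recognition.*;
import java.io.FileReader;
import java.util.Locale;

public class HelloWorldRecognizer {
    static Recognizer rec;

    public static void main(String[] args) throws Exception {
        // The grammar file might contain, in JSGF (an assumption for this sketch):
        //   grammar javax.speech.demo;
        //   public <sentence> = hello world | good morning;

        // Create a recognizer that supports English
        rec = Central.createRecognizer(new EngineModeDesc(Locale.ENGLISH));

        // Allocate: request all necessary engine resources
        rec.allocate();

        // Load and enable the javax.speech.demo grammar from a JSGF file
        RuleGrammar gram = rec.loadJSGF(new FileReader("javax.speech.demo.gram"));
        gram.setEnabled(true);
        rec.commitChanges();

        // Process result: print the best tokens when a result is accepted
        rec.addResultListener(new ResultAdapter() {
            public void resultAccepted(ResultEvent e) {
                Result r = (Result) e.getSource();
                ResultToken[] tokens = r.getBestTokens();
                for (int i = 0; i < tokens.length; i++)
                    System.out.print(tokens[i].getSpokenText() + " ");
                System.out.println();
                try {
                    rec.deallocate();
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
                System.exit(0);
            }
        });

        // Request focus and resume so the enabled grammar becomes active
        rec.requestFocus();
        rec.resume();
    }
}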
Class javax.speech.Central
java.lang.Object
|
+--javax.speech.Central
A mode descriptor defines a set of required properties for an engine. For example, a
SynthesizerModeDesc can describe a Synthesizer for Swiss German that has a male voice.
Similarly, a RecognizerModeDesc can describe a Recognizer that supports dictation for
Japanese.
An application is responsible for determining its own functional requirements for speech
input/output and providing an appropriate mode descriptor. There are three cases for mode
descriptors:
1. null
2. Created by the application
3. Obtained from the availableRecognizers or availableSynthesizers methods of
Central.
The create engine methods operate differently for the three cases. That is, engine selection
depends upon the type of the mode descriptor:
1. null mode descriptor: the Central class selects a suitable engine for the default
Locale.
2. Application-created mode descriptor: the Central class attempts to locate an engine
with all application-specified properties.
3. Mode descriptor from availableRecognizers or availableSynthesizers:
descriptors returned by these two methods identify a specific engine with a specific
operating mode. Central creates an instance of that engine. (Note: these mode
descriptors are distinguished because they implement the EngineCreate interface.)
Case 1: Example
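(A minimal sketch: passing a null mode descriptor lets Central select an engine for the
default locale, as described for case 1 above.)
// Create a synthesizer for the default Locale
Synthesizer synth = Central.createSynthesizer(null);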
Case 2: Example
// Create a dictation recognizer for British English
// Note: the UK locale is English spoken in Britain
RecognizerModeDesc desc = new RecognizerModeDesc(Locale.UK,
Boolean.TRUE);
Recognizer rec = Central.createRecognizer(desc);
Case 3: Example
// Obtain a list of all German recognizers
RecognizerModeDesc desc = new RecognizerModeDesc(Locale.GERMAN);
EngineList list = Central.availableRecognizers(desc);
// select amongst by other desired engine properties
RecognizerModeDesc chosen = ...
// create an engine from "chosen" - an engine-provided descriptor
Recognizer rec = Central.createRecognizer(chosen);
For cases 1 and 2 there is a defined procedure for selecting an engine to be created. (For case
3, the application can apply its own selection procedure.)
Locale is treated specially in the selection to ensure that language is always considered when
selecting an engine. If a locale is not provided, the default locale
(java.util.Locale.getDefault) is used.
When more than one engine is a legal match in the final step, the engines are ordered as
returned by the availableRecognizers or availableSynthesizers method.
Security
Access to speech engines is restricted by Java's security system. This is to ensure that
malicious applets don't use the speech engines inappropriately. For example, a recognizer
should not be usable without explicit permission because it could be used to monitor ("bug")
an office.
The SpeechPermission class defines the types of permission that can be granted or denied
for applications. This permission system is based on the JDK 1.2 fine-grained security model.
Engine Registration
The Central class locates, selects and creates speech engines from amongst a list of
registered engines. Thus, for an engine to be used by Java applications, the engine must
register itself with Central. There are two registration mechanisms: (1) add an
EngineCentral class to a speech properties file, (2) temporarily register an engine by calling
the registerEngineCentral method.
The speech properties files provide persistent registration of speech engines. When Central
is first called, it looks for properties in two files:
<user.home>/speech.properties
<java.home>/lib/speech.properties
where the <user.home> and <java.home> are the values obtained from the System
properties object. (The '/' separator will vary across operating systems.) Engines identified in
either properties file are made available through the methods of Central.
The property files must contain data in the format that is read by the load method of the
Properties class. Central looks for properties of the form
com.acme.recognizer.EngineCentral=com.acme.recognizer.AcmeEngineCentral
This line is interpreted as "the EngineCentral object for the com.acme.recognizer engine
is the class called com.acme.recognizer.AcmeEngineCentral." When it is first called, the
Central class will attempt to create an instance of each EngineCentral object and will
ensure that it implements the EngineCentral interface.
Note to engine providers: Central calls each EngineCentral for each call to
availableRecognizers or availableSynthesizers, and sometimes for createRecognizer
and createSynthesizer. The results are not stored, so the
EngineCentral.createEngineList method should be reasonably efficient.
Method Summary
static EngineList availableRecognizers(EngineModeDesc require)
List EngineModeDesc objects for available recognition engine
modes that match the required properties.
static EngineList availableSynthesizers(EngineModeDesc require)
List EngineModeDesc objects for available synthesis engine modes
that match the required properties.
static Recognizer createRecognizer(EngineModeDesc require)
Create a Recognizer with specified required properties.
static Synthesizer createSynthesizer(EngineModeDesc require)
Create a Synthesizer with specified required properties.
static void registerEngineCentral(String className)
Register a speech engine with the Central class for use by the
current application.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notifyAll, notify, toString, wait, wait, wait
Method Detail
createRecognizer
public static final Recognizer createRecognizer(EngineModeDesc require)
throws
IllegalArgumentException,
EngineException,
SecurityException
Create a Recognizer with specified required properties. If there is no Recognizer
with the required properties the method returns null.
The required properties defined in the input parameter may be provided as either an
EngineModeDesc object or a RecognizerModeDesc object. The input parameter may
also be null, in which case an engine is selected that supports the language of the
default locale.
A non-null mode descriptor may be either application-created or a mode descriptor
returned by the availableRecognizers method.
Parameters:
require - required engine properties or null for default engine selection
Returns:
a recognizer matching the required properties or null if none is available
Throws:
IllegalArgumentException - if the properties of the EngineModeDesc do not refer to a
known engine or engine mode.
EngineException - if the engine defined by this RecognizerModeDesc could not be
properly created.
SecurityException - if the caller does not have createRecognizer permission
See Also:
availableRecognizers, RecognizerModeDesc
availableRecognizers
public static final EngineList availableRecognizers(EngineModeDesc require)
throws
SecurityException
List EngineModeDesc objects for available recognition engine modes that match the
required properties. If the require parameter is null, then all known recognizers are
listed.
Returns a zero-length list if no engines are available or if no engines have the required
properties. (The method never returns null).
The order of the EngineModeDesc objects in the list is partially defined. For each
registered engine (technically, each registered EngineCentral object) the order of the
descriptors is preserved. Thus, each installed speech engine should order its descriptor
objects with the most useful modes first, for example, a mode that is already loaded
and running on a desktop.
Parameters:
require - an EngineModeDesc or RecognizerModeDesc defining the required
features of the mode descriptors in the returned list
Returns:
list of mode descriptors with the required properties
Throws:
SecurityException - if the caller does not have permission to use speech recognition
createSynthesizer
public static final Synthesizer createSynthesizer(EngineModeDesc require)
throws
IllegalArgumentException,
EngineException
Create a Synthesizer with specified required properties. If there is no Synthesizer
with the required properties the method returns null.
The required properties defined in the input parameter may be provided as either an
EngineModeDesc object or a SynthesizerModeDesc object. The input parameter may
also be null, in which case an engine is selected that supports the language of the
default locale.
Parameters:
require - required engine properties or null for default engine selection
Returns:
a Synthesizer matching the required properties or null if none is available
Throws:
IllegalArgumentException - if the properties of the EngineModeDesc do not refer to a
known engine or engine mode.
EngineException - if the engine defined by this SynthesizerModeDesc could not be
properly created.
See Also:
availableSynthesizers, SynthesizerModeDesc
availableSynthesizers
public static final EngineList availableSynthesizers(EngineModeDesc require)
throws
SecurityException
List EngineModeDesc objects for available synthesis engine modes that match the
required properties. If the require parameter is null, then all available known
synthesizers are listed.
Returns an empty list (rather than null) if no engines are available or if no engines
have the required properties.
The order of the EngineModeDesc objects in the list is partially defined. For each
speech installation (technically, each registered EngineCentral object) the order of
the descriptors is preserved. Thus, each installed speech engine should order its
descriptor objects with the most useful modes first, for example, a mode that is
already loaded and running on a desktop.
Throws:
SecurityException - if the caller does not have permission to use speech engines
registerEngineCentral
public static final void registerEngineCentral(String className)
throws EngineException
Register a speech engine with the Central class for use by the current application.
This call adds the specified class name to the list of EngineCentral objects. The
registered engine is not stored persistently in the properties files. If className is
already registered, the call has no effect.
Parameters:
className - name of a class that implements the EngineCentral interface and
provides access to an engine implementation
Throws:
EngineException - if className is not a legal class or it does not implement the
EngineCentral interface
2 The javax.speech Package
The javax.speech package of the Java Speech API defines an abstract software
representation of a speech engine. "Speech engine" is the generic term for a system designed
to deal with either speech input or speech output. Speech synthesizers and speech recognizers
are both speech engine instances. Speaker verification systems and speaker identification
systems are also speech engines but are not currently supported through the Java Speech API.
The javax.speech package defines classes and interfaces that define the basic functionality
of an engine. The javax.speech.synthesis package and javax.speech.recognition
package extend and augment the basic functionality to define the specific capabilities of
speech synthesizers and speech recognizers.
The Java Speech API makes only one assumption about the implementation of a JSAPI
engine: that it provides a true implementation of the Java classes and interfaces defined by
the API. In supporting those classes and interfaces, an engine may be completely software-based
or may be a combination of software and hardware. The engine may be local to the client
computer or remotely operating on a server. The engine may be written entirely as Java
software or may be a combination of Java software and native code.
The basic processes for using a speech engine in an application are as follows.
1. Identify the application's functional requirements for an engine (e.g., language or dictation
capability).
2. Locate and create an engine that meets those functional requirements.
3. Allocate the resources for the engine.
4. Set up the engine.
5. Begin operation of the engine - technically, resume it.
6. Use the engine.
7. Deallocate the resources of the engine.
Steps 4 and 6 in this process operate differently for the two types of speech engine -
recognizer or synthesizer. The other steps apply to all speech engines and are described in the
remainder of this chapter.
The "Hello World!" code example for speech synthesis and the "Hello World!" code example
for speech recognition both illustrate the 7 steps described above. They also show that simple
speech applications are simple to write with the Java Speech API - writing your first speech
application should not be too hard.
The basic engine properties are defined in the EngineModeDesc class. Additional specific
properties for speech recognizers and synthesizers are defined by the RecognizerModeDesc
and SynthesizerModeDesc classes that are contained in the javax.speech.recognition
and javax.speech.synthesis packages respectively.
The basic properties defined for all speech engines are listed in Table 4-1.
ModeName: A String that defines a specific mode of operation of the speech engine, e.g. "Acme
Spanish Dictator".
Locale: A java.util.Locale object that indicates the language supported by the speech engine,
and optionally, a country and a variant. The Locale class uses standard ISO 639 language codes
and ISO 3166 country codes. For example, Locale("fr", "ca") represents a Canadian French
locale, and Locale("en", "") represents English (the language).
Running: A Boolean object that is TRUE for engines which are already running on a platform,
otherwise FALSE. Selecting a running engine allows for sharing of resources and may also allow
for fast creation of a speech engine object.
The one additional property defined by the SynthesizerModeDesc class for speech
synthesizers is shown in Table 4-2.
List of voices: An array of voices that the synthesizer is capable of producing. Each voice is
defined by an instance of the Voice class which encapsulates voice name, gender, age and
speaking style.
The two additional properties defined by the RecognizerModeDesc class for speech
recognizers are shown in Table 4-3.
Dictation supported: A Boolean value indicating whether this mode of operation of the
recognizer supports a dictation grammar.
Speaker profiles: A list of SpeakerProfile objects for speakers who have trained the recognizer.
Recognizers that do not support training return a null list.
Each of these properties is accessible through a get method of the mode descriptor, for example:
Locale getLocale();
Furthermore, all the properties are defined by class objects, never by primitives (primitives in
the Java programming language include boolean, int etc.). With this design, a null value
always represents "don't care" and is used by applications to indicate that a particular
property is unimportant to its functionality. For instance, a null value for the "dictation
supported" property indicates that dictation is not relevant to engine selection. Since that
property is represented by the Boolean class, a value of TRUE indicates that dictation is
required and FALSE indicates explicitly that dictation should not be provided.
The simplest way to create a speech engine is to request a default engine. This is appropriate
when an application wants an engine for the default locale (specifically for the local
language) and does not have any special functional requirements for the engine. The Central
class in the javax.speech package is used for locating and creating engines. Default engine
creation uses two static methods of the Central class.
import javax.speech.*;
import javax.speech.synthesis.*;
import javax.speech.recognition.*;
{
// Get a synthesizer for the default locale
Synthesizer synth = Central.createSynthesizer(null);
// Get a recognizer for the default locale
Recognizer rec = Central.createRecognizer(null);
}
For both createSynthesizer and createRecognizer, the null parameter indicates that
the application doesn't care about the properties of the synthesizer or recognizer. However,
both creation methods have an implicit selection policy. Since the application did not specify
the language of the engine, the language from the system's default locale returned by
java.util.Locale.getDefault() is used. In all cases of creating a speech engine, the Java
Speech API forces language to be considered since it is fundamental to correct engine
operation.
If more than one engine supports the default language, Central gives preference to
an engine that is running (its running property is TRUE), and then to an engine that supports
the country defined in the default locale.
If the example above is performed in the US locale, a recognizer and synthesizer for the
English language will be returned if one is available. Furthermore, if engines are installed for
both British and US English, the US English engine would be created.
The next easiest way to create an engine is to create a mode descriptor, define desired engine
properties and pass the descriptor to the appropriate engine creation method of the Central
class. When the mode descriptor passed to the createSynthesizer or createRecognizer
methods is non-null, an engine is created which matches all of the properties defined in the
descriptor. If no suitable engine is available, the methods return null.
The list of properties is described in the Properties of a Speech Engine. All the properties in
EngineModeDesc and its sub-classes RecognizerModeDesc and SynthesizerModeDesc
default to null to indicate "don't care".
The following code sample shows a method that creates a dictation-capable recognizer for the
default locale. It returns null if no suitable engine is available.
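A sketch of such a method is shown below; the method name createDictationRecognizer and the
throws clause are illustrative, while setDictationGrammarSupported and createRecognizer are
the JSAPI calls described above.

/**
 * Return a recognizer that supports dictation for the default locale.
 * Return null if no such engine is available.
 */
Recognizer createDictationRecognizer() throws EngineException
{
    // Create a mode descriptor that requires dictation support;
    // the locale is left null so the default locale is used
    RecognizerModeDesc required = new RecognizerModeDesc();
    required.setDictationGrammarSupported(Boolean.TRUE);
    return Central.createRecognizer(required);
}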
In the next example we create a Synthesizer for Spanish with a male voice.
/**
* Return a speech synthesizer for Spanish.
* Return null if no such engine is available.
*/
Synthesizer createSpanishSynthesizer() throws EngineException
{
    // Create a mode descriptor with all required features
    // "es" is the ISO 639 language code for "Spanish"
    SynthesizerModeDesc required = new SynthesizerModeDesc();
    required.setLocale(new Locale("es", ""));
    required.addVoice(new Voice(
        null, Voice.GENDER_MALE, Voice.AGE_DONT_CARE, null));
    return Central.createSynthesizer(required);
}
Again, the method returns null if no matching synthesizer is found and the application is
responsible for determining how to handle the situation.
This section explains more advanced mechanisms for locating and creating speech engines.
Most applications do not need to use these mechanisms. Readers may choose to skip this
section.
In addition to performing engine creation, the Central class can provide lists of available
recognizers and synthesizers from two static methods.
If the mode passed to either method is null, then all known speech recognizers or
synthesizers are returned. Unlike the createRecognizer and createSynthesizer methods,
there is no policy that restricts the list to the default locale or to running engines - in advanced
selection such decisions are the responsibility of the application.
The following code shows how an application can obtain a list of speech synthesizers with a
female voice for German. All other parameters of the mode descriptor remain null for "don't
care" (engine name, mode name etc.).
import javax.speech.*;
import javax.speech.synthesis.*;
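import java.util.Locale;

// Sketch of the listing described above (not in the original text): build a
// descriptor requiring German and a female voice, then list the matching modes.
{
    SynthesizerModeDesc required = new SynthesizerModeDesc();
    required.setLocale(new Locale("de", ""));

    // Gender is the only required voice feature; the other voice features are "don't care"
    required.addVoice(new Voice(
        null, Voice.GENDER_FEMALE, Voice.AGE_DONT_CARE, null));

    EngineList list = Central.availableSynthesizers(required);
}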
If the application specifically wanted Swiss German and a running engine it would add the
following before calling availableSynthesizers:
required.setLocale(new Locale("de", "CH"));
required.setRunning(Boolean.TRUE);
Although applications do not normally care, engine-provided mode descriptors are special in
two other ways. First, all engine-provided mode descriptors are required to implement the
EngineCreate interface which includes a single createEngine method. The Central class
uses this interface to perform the creation. Second, engine-provided mode descriptors may
extend the SynthesizerModeDesc and RecognizerModeDesc classes to encapsulate
additional features and information. Applications should not access that information if they
want to be portable, but engines will use that information when creating a running
Synthesizer or Recognizer.
In the simplest case, applications simply select the first in the list which is obtained using the
EngineList.first method. For example:
EngineModeDesc required;
...
EngineList list = Central.availableRecognizers(required);
if (!list.isEmpty()) {
EngineModeDesc desc = (EngineModeDesc)(list.first());
Recognizer rec = Central.createRecognizer(desc);
}
More sophisticated selection algorithms may test additional properties of the available
engine. For example, an application may give precedence to a synthesizer mode that has a
voice called "Victoria".
The list manipulation methods of the EngineList class are convenience methods for
advanced engine selection.
anyMatch(EngineModeDesc) returns true if at least one mode descriptor in the list has
the required properties.
requireMatch(EngineModeDesc) removes elements from the list that do not match the
required properties.
The following code shows how to use these methods to obtain a Spanish dictation recognizer
with preference given to a recognizer that has been trained for a specified speaker passed as
an input parameter.
import javax.speech.*;
import javax.speech.recognition.*;
import java.util.Locale;
Recognizer getSpanishDictation(String name) throws EngineException
{
RecognizerModeDesc required = new RecognizerModeDesc();
required.setLocale(new Locale("es", ""));
required.setDictationGrammarSupported(Boolean.TRUE);
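    // The remainder of this method is a sketch (not in the original text). It
    // uses the EngineList methods described later in this section; the
    // SpeakerProfile constructor argument order (id, name, variant) is an assumption.

    // Get the list of Spanish dictation recognizers
    EngineList list = Central.availableRecognizers(required);
    if (list.isEmpty())
        return null;

    // Describe a recognizer trained for the named speaker
    SpeakerProfile profile = new SpeakerProfile(null, name, null);
    RecognizerModeDesc requireSpeaker = new RecognizerModeDesc();
    requireSpeaker.addSpeakerProfile(profile);

    // If any recognizer in the list has been trained for the speaker,
    // prune the list down to those recognizers
    if (list.anyMatch(requireSpeaker))
        list.requireMatch(requireSpeaker);

    // Create a recognizer from the first remaining mode descriptor
    EngineModeDesc desc = (EngineModeDesc) list.first();
    return Central.createRecognizer(desc);
}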
The Engine interface includes a set of methods that define a generalized state system
manager. Here we consider the operation of those methods. In the following sections we
consider the two core state systems implemented by all speech engines: the allocation state
system and the pause-resume state system. In Chapter 5, the state system for synthesizer
queue management is described. In Chapter 6, the state systems for recognizer focus and for
recognition activity are described.
A state defines a particular mode of operation of a speech engine. For example, the output
queue moves between the QUEUE_EMPTY and QUEUE_NOT_EMPTY states. The following are the
basics of state management.
The getEngineState method of the Engine interface returns the current engine state. The
engine state is represented by a long value (64-bit value). Specified bits of the state represent
the engine being in specific states. This bit-wise representation is used because an engine
can be in more than one state at a time, and usually is during normal operation.
Every speech engine must be in one and only one of the four allocation states (described in
detail in Section 4.4.2). These states are DEALLOCATED, ALLOCATED, ALLOCATING_RESOURCES
and DEALLOCATING_RESOURCES. The ALLOCATED state has multiple sub-states. Any
ALLOCATED engine must be in either the PAUSED or the RESUMED state (described in detail in
Section 4.4.4).
Synthesizers have a separate sub-state system for queue status. Like the paused/resumed state
system, the QUEUE_EMPTY and QUEUE_NOT_EMPTY states are both sub-states of the ALLOCATED
state. Furthermore, the queue status and the paused/resumed status are independent.
Recognizers have three independent sub-state systems to the ALLOCATED state (the
PAUSED/RESUMED system plus two others). The LISTENING, PROCESSING and SUSPENDED
states indicate the current activity of the recognition process. The FOCUS_ON and FOCUS_OFF
states indicate whether the recognizer currently has speech focus. For a recognizer, all three
sub-state systems of the ALLOCATED state operate independently (with some exceptions that
are discussed in the recognition chapter).
Each of these state names is represented by a static long in which a single unique bit is set.
The & and | operators of the Java programming language are used to manipulate these state
bits. For example, the state of an allocated, resumed synthesizer with an empty speech output
queue is defined by:
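Using the state constants defined on the Engine and Synthesizer interfaces, that combination
is the bit-wise OR of the individual state values:
Engine.ALLOCATED | Engine.RESUMED | Synthesizer.QUEUE_EMPTY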
For convenience, the Engine interface defines two additional methods for handling engine
states. The testEngineState method is passed a state value and returns true if all the state
bits in that value are currently set for the engine. For example, to test whether an engine is
resumed, we use the test:
if (engine.testEngineState(Engine.RESUMED)) ...
The final state method is waitEngineState. This method blocks the calling thread until the
engine reaches the defined state. For example, to wait until a synthesizer stops speaking
because its queue is empty we use:
engine.waitEngineState(Synthesizer.QUEUE_EMPTY);
In addition to method calls, applications can monitor state through the event system. Every
state transition is marked by an EngineEvent being issued to each EngineListener attached
to the Engine. The EngineEvent class is extended by the SynthesizerEvent and
RecognizerEvent classes for state transitions that are specific to those engines. For example,
the RECOGNIZER_PROCESSING RecognizerEvent indicates a transition from the LISTENING
state to the PROCESSING state (which indicates that the recognizer has detected speech and is
producing a result).
Engine allocation is the process in which the resources required by a speech recognizer or
synthesizer are obtained. Engines are not automatically allocated when created because
speech engines can require substantial resources (CPU, memory and disk space) and because
they may need exclusive access to an audio resource (e.g. microphone input or speaker
output). Furthermore, allocation can be a slow procedure for some engines (perhaps a few
seconds or over a minute).
The allocate method of the Engine interface requests the engine to perform allocation and
is usually one of the first calls made to a created speech engine. A newly created engine is
always in the DEALLOCATED state. A call to the allocate method is, technically speaking, a
request to the engine to transition to the ALLOCATED state. During the transition, the engine is
in a temporary ALLOCATING_RESOURCES state.
The deallocate method of the Engine interface requests the engine to perform deallocation
of its resources. All well-behaved applications call deallocate once they have finished
using an engine so that its resources are freed up for other applications. The deallocate
method returns the engine to the DEALLOCATED state. During the transition, the engine is in a
temporary DEALLOCATING_RESOURCES state.
Figure 4-1 shows the state diagram for the allocation state system.
Each block represents a state of the engine. An engine must always be in one of the four
specified states. As the engine transitions between states, the event labelled on the transition
arc is issued to the EngineListeners attached to the engine.
For advanced applications, it is often desirable to start up the allocation of a speech engine in
a background thread while other parts of the application are being initialized. This can be
achieved by calling the allocate method in a separate thread. The following code shows an
example of this using an inner class implementation of the Runnable interface. To determine
when the allocation method is complete, we check later in the code for the engine being in the
ALLOCATED state.
Engine engine;
{
    engine = Central.createRecognizer(null);
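    // Sketch of the rest of the listing (not in the original text): start
    // allocation in a background thread while the application continues its
    // own initialization.
    new Thread(new Runnable() {
        public void run() {
            try {
                engine.allocate();
            } catch (Exception e) {
                // Allocation failed; the engine returns to the DEALLOCATED state
                e.printStackTrace();
            }
        }
    }).start();

    // ... other application initialization proceeds here ...

    // Later, before the engine is first used, wait for allocation to complete:
    //     engine.waitEngineState(Engine.ALLOCATED);
}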
A full implementation of an application that uses this approach to engine allocation needs to
consider the possibility that the allocation fails. In that case, the allocate method throws an
EngineException and the engine returns to the DEALLOCATED state.
Another issue advanced applications need to consider is call blocking. Most methods of the
Engine, Recognizer and Synthesizer are defined for normal operation in the
ALLOCATED state. What if they are called for an engine in another allocation state? For
most methods, the operation is defined as follows:
ALLOCATED state: for nearly all methods normal behavior is defined for this state. (An
exception is the allocate method).
ALLOCATING_RESOURCES state: most methods block in this state. The calling thread waits
until the engine reaches the ALLOCATED state. Once that state is reached, the method
behaves as normally defined.
DEALLOCATED state: most methods are not defined for this state, so an
EngineStateError is thrown. (Exceptions include the allocate method and certain
methods listed below.)
DEALLOCATING_RESOURCES state: most methods are not defined for this state, so an
EngineStateError is thrown.
A small subset of engine methods will operate correctly in all engine states. The
getEngineProperties method always allows runtime engine properties to be set and tested
(although properties only take effect in the ALLOCATED state). The getEngineModeDesc
method can always return the mode descriptor for the engine. Finally, the three engine state
methods - getEngineState, testEngineState and waitEngineState - always operate as
defined.
All ALLOCATED speech engines have PAUSED and RESUMED states. Once an engine reaches the
ALLOCATED state, it enters either the PAUSED or the RESUMED state. The factors that affect the
initial PAUSED/RESUMED state are described below.
The PAUSED/RESUMED state indicates whether the audio input or output of the engine is on or
off. A resumed recognizer is receiving audio input. A paused recognizer is ignoring audio
input. A resumed synthesizer produces audio output as it speaks. A paused synthesizer is not
producing audio output.
As part of the engine state system, the Engine interface provides several methods to test
PAUSED/RESUMED state. The general state system is described previously in Section 4.4.
An application controls an engine's PAUSED/RESUMED state with the pause and resume
methods. An application may pause or resume an engine indefinitely. Each time the
PAUSED/RESUMED state changes, an ENGINE_PAUSED or ENGINE_RESUMED EngineEvent
is issued to each EngineListener attached to the Engine.
Figure 4-2 shows the basic pause and resume diagram for a speech engine. As a sub-state
system of the ALLOCATED state, the pause and resume states are represented within the
ALLOCATED state shown in Figure 4-1.
As with Figure 4-1, Figure 4-2 represents states as labelled blocks, and the engine events as
labelled arcs between those blocks. In this diagram the large block is the ALLOCATED state
which contains both the PAUSED and RESUMED states.
The PAUSED/RESUMED state of a speech engine may, in many situations, be shared by multiple
applications. Here we must make a distinction between the Java object that represents a
Recognizer or Synthesizer and the underlying engine that may have multiple Java and
non-Java applications connected to it. For example, in personal computing systems (e.g.,
desktops and laptops), there is typically a single engine running and connected to microphone
input or speaker/headphone output, and all applications share that resource.
When a Recognizer or Synthesizer (the Java software object) is paused and resumed the
shared underlying engine is paused and resumed and all applications connected to that engine
are affected.
An application should pause and resume an engine only in response to a user request (e.g.,
because a microphone button is pressed for a recognizer). For example, it should not pause
an engine before deallocating it.
Because an engine could be resumed without the application explicitly requesting a resume,
the application should always be prepared for that resume. For example, it should not place
text on the synthesizer's output queue unless it expects that text to be spoken upon a resume.
Similarly, the set of enabled grammars of a recognizer should always be appropriate to the
application context, and the application should be prepared to accept input results from the
recognizer if the recognizer is unexpectedly resumed.
4.4.6 Synthesizer Pause
For a speech synthesizer - a speech output device - pause immediately stops the audio output
of synthesized speech. Resume recommences speech output from the point at which the pause
took effect. This is analogous to pause and resume on a tape player or CD player.
For a recognizer, pausing and resuming turns audio input off and on and is analogous to
switching the microphone off and on. When audio input is off the audio is lost. Unlike a
synthesizer, for which a resume continues speech output from the point at which it was
paused, resuming a recognizer restarts the processing of audio input from the time at which
resume is called.
Under normal circumstances, pausing a recognizer will stop the recognizer's internal
processes that match audio against grammars. If the user was in the middle of speaking at the
instant at which the recognizer was paused, the recognizer is forced to finalize its recognition
process. This is because a recognizer cannot assume that the audio received just before
pausing is in any way linked to the audio data that it will receive after being resumed.
Technically speaking, pausing introduces a discontinuity into the audio input stream.
One complexity for pausing and resuming a recognizer (not relevant to synthesizers) is the
role of internal buffering. For various reasons, described in Chapter 6, a recognizer has a
buffer for audio input which mediates between the audio device and the internal components
of the recognizer which perform the match of the audio against the grammars. If the recognizer
is performing in real time, the buffer is empty or nearly empty. If the recognizer is temporarily
suspended or operates slower than real time, then the buffer may contain seconds of audio or
more.
When a recognizer is paused, the pause takes effect on the input end of the buffer; i.e., the
recognizer stops putting data into the buffer. At the other end of the buffer - where the actual
recognition is performed - the recognizer continues to process audio data until the buffer is
empty. This means that the recognizer can continue to produce recognition results for a
limited period of time even after it has been paused. (A Recognizer also provides a
forceFinalize method with an option to flush the audio input buffer.)
The SUSPENDED state of a Recognizer is superficially similar to the PAUSED state. In the
SUSPENDED state the recognizer is not processing audio input from the buffer, but is
temporarily halted while an application updates its grammars. A key distinction between the
PAUSED state and the SUSPENDED state is that in the SUSPENDED state audio input can still
be coming into the audio input buffer. When the recognizer leaves the SUSPENDED state the
audio is processed. The SUSPENDED state allows a user to continue talking to the recognizer
even while the recognizer is temporarily SUSPENDED. Furthermore, by updating grammars in
the SUSPENDED state, an application can apply multiple grammar changes instantaneously
with respect to the audio input stream.
Java Speech API events follow the JavaBeans event model. Events are issued to a listener
attached to an object involved in generating that event. All the speech events are derived from
the SpeechEvent class in the javax.speech package.
RecognizerAudioEvent: Extends AudioEvent with events for start and stop of speech and
audio level updates.
A speech engine is required to provide all its events in synchronization with the AWT event
queue whenever possible. The reason for this constraint is that it simplifies the integration of
speech events with AWT events and Java Foundation Classes events (e.g., keyboard,
mouse and focus events). This constraint does not adversely affect applications that do not
provide graphical interfaces.
Synchronization with the AWT event queue means that the AWT event queue is not issuing
another event when the speech event is being issued. To implement this, speech engines need
to place speech events onto the AWT event queue. The queue is obtained through the AWT
Toolkit:
EventQueue q = Toolkit.getDefaultToolkit().getSystemEventQueue();
The EventQueue runs a separate thread for event dispatch. Speech engines are not required to
issue the events through that thread, but should ensure that thread is blocked while the speech
event is issued.
Note that SpeechEvent is not a sub-class of AWTEvent, and that speech events are not
actually placed directly on the AWT event queue. Instead, a speech engine performs
internal activities to keep its speech event queue synchronized with the AWT event
queue, to make an application developer's life easier.
Speech engines each have a set of properties that can be changed while the engine is running.
The EngineProperties interface defined in the javax.speech package is the root interface
for accessing runtime properties. It is extended by the SynthesizerProperties interface
defined in the javax.speech.synthesis package, and the RecognizerProperties
interface defined in the javax.speech.recognition package.
{
Recognizer rec = ...;
RecognizerProperties props = rec.getRecognizerProperties();
}
The reset method is used to set all engine properties to default values.
The SynthesizerProperties and RecognizerProperties interfaces define the sets of
runtime features of those engine types. The specific properties defined by these interfaces
are described in Chapter 5 and Chapter 6 respectively.
For each property there is a get and a set method, both following the JavaBeans property
patterns. For example, the methods for handling a synthesizer's speaking volume are:
float getVolume()
void setVolume(float volume)
The get method returns the current setting. The set method attempts to set a new volume. A
set method throws an exception if it fails. Typically, this is because the engine rejects the set
value. In the case of volume, the legal range is 0.0 to 1.0. Values outside of this range cause
an exception.
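A short sketch of this pattern for the volume property is shown below. It assumes an allocated
Synthesizer named synth, and assumes the rejected value is reported through
java.beans.PropertyVetoException, the exception conventionally used for vetoable
JavaBeans-style properties.

import java.beans.PropertyVetoException;

// Assumes "synth" is an allocated Synthesizer
void halveVolume(Synthesizer synth)
{
    SynthesizerProperties props = synth.getSynthesizerProperties();
    float current = props.getVolume();
    try {
        // Legal values are 0.0 (silence) to 1.0 (loudest)
        props.setVolume(current / 2.0f);
    } catch (PropertyVetoException e) {
        // The engine rejected the requested value
        System.err.println("Volume change rejected: " + e);
    }
}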
A property change event may also be issued because another application has changed a
property, because changing one property affects another (e.g., changing a synthesizer's voice
from male to female will usually cause an increase in the pitch setting), or because the
property values have been reset.
The AudioManager of a speech engine is provided for management of the engine's speech
input or output. For the Java Speech API Version 1.0 specification, the AudioManager
interface is minimal. As the audio streaming interfaces for the Java platform are established,
the AudioManager interface will be enhanced for more advanced functionality.
For this release, the AudioManager interface defines the ability to attach and remove
AudioListener objects. For this release, the AudioListener interface is simple: it is empty.
However, the RecognizerAudioListener interface extends the AudioListener interface to
receive three audio event types (SPEECH_STARTED, SPEECH_STOPPED and AUDIO_LEVEL
events). These events are described in detail in Chapter 6. As a type of AudioListener, a
RecognizerAudioListener is attached and removed through the AudioManager.
An engine can optionally provide a VocabManager for control of the pronunciation of words
and other vocabulary. This manager is obtained by calling the getVocabManager method of a
Recognizer or Synthesizer (it is a method of the Engine interface). If the engine does not
support vocabulary management, the method returns null.
The manager defines a list of Word objects. Words can be added to the VocabManager,
removed from the VocabManager, and searched through the VocabManager.
The Word class is defined in the javax.speech package. Each Word is defined by the
following features.
Written form: a required String that defines how the Word should be presented visually.
Spoken form: an optional String that indicates how the Word is spoken. For English, the
spoken form might be used for defining how acronyms are spoken. For Japanese, the spoken
form could provide a kana representation of how kanji in the written form is pronounced.
Grammatical categories: an optional set of or'ed grammatical categories. The Word class
defines 16 different classes of words (noun, verb, conjunction etc.). These classes do not
represent a complete linguistic breakdown of all languages. Instead they are intended to
provide a Recognizer or Synthesizer with additional information about a word that
may assist in correctly recognizing or correctly speaking it.
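As an illustration of vocabulary management, the sketch below adds one Word to a recognizer's
VocabManager. It assumes an allocated Recognizer named rec; the Word constructor argument
order (written form, spoken forms, pronunciations, grammatical categories) and the Word.NOUN
category constant are assumptions of this sketch.

// Assumes "rec" is an allocated Recognizer
VocabManager vocab = rec.getVocabManager();
if (vocab != null) {
    // Constructor argument order is an assumption: written form,
    // spoken forms, pronunciations, grammatical categories
    Word jsapi = new Word("JSAPI",
                          new String[] { "jay sappy" },  // spoken form
                          null,                          // no pronunciations given
                          Word.NOUN);
    vocab.addWord(jsapi);
}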
The voice property is used to control the speaking voice of the synthesizer. The set of voices
supported by a synthesizer can be obtained by the getVoices method of the synthesizer's
SynthesizerModeDesc object. Each voice is defined by a voice name, gender, age and
speaking style. Selection of voices is described in more detail in Selecting Voices.
The remaining four properties control prosody. Prosody is a set of features of speech
including the pitch and intonation, rhythm and timing, stress and other characteristics which
affect the style of the speech. The prosodic features controlled through the
SynthesizerProperties interface are:
Volume: a float value that is set on a scale from 0.0 (silence) to 1.0 (loudest).
Speaking rate: a float value indicating the speech output rate in words per minute.
Higher values indicate faster speech output. Reasonable speaking rates depend upon
the synthesizer and the current voice (voices may have different natural speeds). Speaking
rate is also dependent upon the language because of different conventions for what is a
"word". For English, a typical speaking rate is around 200 words per minute.
Pitch: the baseline pitch is a float value given in Hertz. Different voices have different
natural sounding ranges of pitch. Typical male voices are between 80 and 180 Hertz.
Female pitches typically vary from 150 to 300 Hertz.
Pitch range: a float value indicating a preferred range for variation in pitch above the
baseline setting. A narrow pitch range provides monotonous output while a wide range
provides a more lively voice. The pitch range is typically between 20% and 80% of the
baseline pitch.
The top-of-queue item is the head of the queue: it is the item currently being spoken, or the
item that will be spoken next when a paused synthesizer is resumed.
The Synthesizer interface provides a number of methods for manipulating the output queue.
The enumerateQueue method returns an Enumeration object containing a
SynthesizerQueueItem for each object on the queue. The first object in the enumeration is
the top of queue. If the queue is empty the enumerateQueue method returns null.
Each SynthesizerQueueItem in the enumeration contains four properties. Each property has
an accessor method:
getSource returns the source object for the queue item. The source is the object
passed to the speak and speakPlainText method: a Speakable object, a URL or a
String.
getText returns the text representation for the queue item. For a Speakable object it
is the String returned by the getJSMLText method. For a URL it is the String loaded
from that URL. For a string source, it is that string object.
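A brief sketch of walking the output queue with these accessors, assuming an allocated
Synthesizer named synth:

import java.util.Enumeration;

// Assumes "synth" is an allocated Synthesizer
Enumeration queue = synth.enumerateQueue();
if (queue != null) {
    while (queue.hasMoreElements()) {
        SynthesizerQueueItem item = (SynthesizerQueueItem) queue.nextElement();
        System.out.println("source: " + item.getSource());
        System.out.println("text:   " + item.getText());
    }
}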
The QUEUE_EMPTY and QUEUE_NOT_EMPTY states are parallel states to the PAUSED and
RESUMED states. These two state systems operate independently as shown in Figure 5-1 (an
extension of Figure 4-2).
To install JSAPI, extract the downloaded file, freetts-1.2beta2-bin.zip, to the C: drive and set
the classpath to the lib directory of the FreeTTS implementation by executing the following
command at the command prompt:
The javax.speech package contains classes and interfaces that define how the speech engine
functions. A speech engine is a system that manages speech input and output. The
javax.speech package defines the basic properties of a speech engine.
AudioEvent
Central
EngineModeDesc
EngineList
Engine
AudioManager
VocabManager
The AudioEvent class specifies the events related to audio input for the speech recognizer
and audio output for the speech synthesizer. The AudioEvent class defines a method,
paramString(), which returns a parameter string to identify the event that occurred. This
method is used for debugging and for maintaining event logs.
The Central class allows you to access all the speech input and output functions of a speech
engine. This class provides methods to locate, select, and create speech engines, such as
speech recognizers and speech synthesizers. A Java application can use a speech engine if the
speech engine is registered with the Central class. The various methods declared in the
Central class are:
The EngineModeDesc class defines the basic properties of a speech engine that determine the
mode of operation, such as a Spanish or English dictation mode. The various methods declared in the
EngineModeDesc class are:
getEngineName(): Returns the engine name, which should be a unique string across
the provider.
setEngineName(): Sets the name of the engine as provided in the input parameter
string.
getModeName(): Returns the mode name, which uniquely identifies the single mode
of operation of the speech engine.
setModeName(): Sets the mode name as provided in the input parameter string.
getLocale(): Returns the object of the Locale class for the engine mode.
setLocale(): Sets the Locale of the engine according to the specified input parameter,
which is an object of the Locale class.
getRunning(): Returns a Boolean value indicating whether or not the speech engine is
already running.
setRunning(): Sets whether a running engine is required, according to the Boolean
input parameter.
equals(): Returns a Boolean value, which is true if the EngineModeDesc object input
parameter is not null and has equal values for engine name, mode name, and Locale.
The EngineList class selects the appropriate speech engine with the help of the methods of
the Central class. The EngineList class contains a set of EngineModeDesc class objects. The
various methods available in the EngineList class are:
orderByMatch(): Orders the list that matches the required features. This method takes
the EngineModeDesc class object as an input parameter.
The Engine interface is the parent interface for all speech engines. The speech engines derive
functions, such as allocation and deallocation of methods, access to EngineProperties and
EngineModeDesc classes, and use of the pause() and resume() methods from the Engine
interface. Some of the methods defined by the Engine interface are:
allocate(): Allocates the resources required by the engine and sets the engine state
to ALLOCATED. While the method executes, the engine is in the
ALLOCATING_RESOURCES state.
deallocate(): Deallocates the resources of the engine, which were acquired in the
ALLOCATED state and during operation, and sets the engine state to
DEALLOCATED.
pause(): Pauses the audio stream of the engine and sets the engine state to
PAUSED.
resume(): Resumes the audio stream to or from a paused engine and sets the engine
state to RESUMED.
The AudioManager interface allows an application to control and monitor the audio input and
output, and other audio-related events, such as start and stop audio. The methods provided by
this interface are:
The VocabManager interface manages words that the speech engine uses. This interface
provides information about difficult words to the speech engine. Some of the methods
provided by this interface are:
addWord(): Adds a word to the vocabulary of the speech engine. This method takes
an object of the Word class as an input parameter.
addWords(): Adds an array of words to the vocabulary of the speech engine. This
method takes an object array of the Word class as an input parameter.
removeWord(): Removes a word from the vocabulary of the speech engine. This
method takes an object of the Word class as an input parameter.
3 The javax.speech.recognition Package
The javax.speech.recognition package provides classes and interfaces that support speech
recognition. This package inherits the basic functioning from the javax.speech package. The
speech recognizer is a type of speech engine that has the ability to recognize and convert
incoming speech to text.
RecognizerModeDesc
Rule
GrammarEvent
Grammar
Recognizer
Result
The RecognizerModeDesc Class
The RecognizerModeDesc class extends the basic functioning of the EngineModeDesc class
with properties specific to a speech recognizer. Some commonly used methods of the
RecognizerModeDesc class are:
The Rule class defines the basic component of the RuleGrammar interface. The methods
provided by this class are:
copy(): Returns a copy of the Rule class and all its subrules, which includes the
RuleAlternatives, RuleCount, RuleParse, RuleSequence, and RuleTag classes.
toString(): Returns a string representing the portion of Java Speech Grammar Format
(JSGF) that appears on the right of a rule definition.
The Recognizer interface extends the functioning of the Engine interface of the javax.speech
package. A Recognizer is created by using the createRecognizer() method of the
Central class. Some methods defined in the Recognizer interface are:
suspend(): Suspends speech recognition temporarily and places the Recognizer
in the SUSPENDED state. The incoming audio is buffered while the
recognizer is suspended.
A Result object represents incoming audio that has matched an active grammar (an object of
the Grammar class). When incoming speech is recognized, the Result interface provides
information such as the sequence of finalized and unfinalized words, the matched grammar,
and the result state. The result state can be UNFINALIZED, ACCEPTED, or REJECTED. A
new Result object is created when the recognizer detects incoming speech that matches an
active grammar. Some methods of the Result interface are:
getResultState(): Returns the current state of the Result interface object in the form of
an integer. The values can be UNFINALIZED, ACCEPTED, and REJECTED.
getGrammar(): Returns an object of the Grammar interface that matches the finalized
tokens of the Result interface.
numTokens(): Returns the integer number of the finalized tokens in the Result
interface.
getBestTokens(): Returns an array of all the finalized tokens for the Result interface.
The javax.speech.synthesis package provides classes and interfaces that support synthesis of
speech. A speech synthesizer is a speech engine that converts text to speech. A synthesizer is
created, selected, and searched through the Central class of the javax.speech package. Some
commonly used classes of the javax.speech.synthesis package are:
Voice
SynthesizerModeDesc
Synthesizer
SynthesizerProperties
The Voice Class
The Voice class defines one output voice for the speech synthesizer. The class supports
fields, such as GENDER_MALE, GENDER_FEMALE, AGE_CHILD, and
AGE_TEENAGER to describe the synthesizer voice. Some methods provided by the Voice
class are:
setName(): Sets the voice name according to the input string parameter.
setGender(): Sets the voice gender according to the specified integer input parameter.
match(): Returns a Boolean value specifying whether or not the Voice class has all the
features corresponding to the voice object in the input parameter.
The SynthesizerModeDesc class extends the functioning of the EngineModeDesc class of the
javax.speech package. Apart from the engine name, locale, mode name, and running
properties inherited from the EngineModeDesc class, the SynthesizerModeDesc class
includes two properties, the voice to be loaded when the synthesizer is started and the list of
voices provided by the synthesizer. Some methods provided by the SynthesizerModeDesc
class are:
addVoice(): Adds a voice, specified in the voice input parameter, to the existing list of
voices.
match(): Returns a Boolean value depending on whether or not the object of the
SynthesizerModeDesc class has all the features specified by the input parameter. The
input parameter can be SynthesizerModeDesc or EngineModeDesc. If the input
parameter is EngineModeDesc, the method checks only for the features of the
EngineModeDesc class.
The Synthesizer interface provides an extension to the Engine interface of the javax.speech
package. A Synthesizer is created by using the createSynthesizer() method of the
Central class. Some methods defined by the Synthesizer interface are:
speak(): Reads out text from a Uniform Resource Locator (URL) that has been
formatted with the Java Speech Markup Language (JSML). This method accepts two
input parameters, the URL containing the JSML text and the SpeakableListener
interface object to which the Synthesizer interface sends the notifications of events.
The Synthesizer interface checks the text specified in the URL for JSML formatting
and places it in the output queue.
speakPlainText(): Reads out a plain text string. This method accepts two input
parameters, the string containing text and the SpeakableListener interface object to
which the notifications of events are sent during the synthesis process.
phoneme(): Returns the phoneme string for the corresponding text string input
parameter. The input string can be simple text without JSML formatting.
cancelAll(): Cancels all the objects in the speech output queue and stops the audio
output of the current object at the top of the queue.
setVoice(): Sets the current synthesizer’s voice according to the specified voice input
parameter.
setPitchRange(): Sets the pitch range according to the input float parameter.
setSpeakingRate(): Sets the target speech rate according to the input float parameter.
The rate is usually represented as number of words per minute.
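As an illustration of these methods, a minimal synthesis sequence might look like the sketch
below. The class name and the spoken text are illustrative; the second argument to
speakPlainText is null because no SpeakableListener notifications are needed.

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class HelloSynthesis {
    public static void main(String[] args) throws Exception {
        // Create and allocate a synthesizer for English
        Synthesizer synth =
            Central.createSynthesizer(new SynthesizerModeDesc(Locale.ENGLISH));
        synth.allocate();
        synth.resume();

        // Speak a plain text string; no SpeakableListener is attached
        synth.speakPlainText("Hello, world!", null);

        // Wait until the output queue is empty, then free the engine
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
        synth.deallocate();
    }
}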
FinalResult
These multiple interfaces are designed to explicitly indicate (a) what information is available
at what times in the result life-cycle and (b) what information is available for different types
of results. Appropriate casting of results allows compile-time checking of result-handling
code and fewer bugs.
The FinalResult interface extends the Result interface. It provides access to additional
information about a result that is available once the result has been finalized (once it is in
either the ACCEPTED or REJECTED state). Calling any method of the FinalResult interface
for a result in the UNFINALIZED state causes a ResultStateError to be thrown.
The separate interfaces determine what information is available for a result in the different
stages of its life-cycle. The state of a Result is determined by calling the getResultState
method. The three possible states are UNFINALIZED, ACCEPTED and REJECTED.
A new result starts in the UNFINALIZED state. When the result is finalized it moves to either
the ACCEPTED or REJECTED state. An accepted or rejected result is termed a finalized result.
All values and information regarding a finalized result are fixed (except that audio and
training information may be released).
Following are descriptions of a result object in each of the three states including information
on which interfaces can be used in each state.
getResultState() == Result.UNFINALIZED
Recognition of the Result is still in progress; the result has not yet been accepted or
rejected.
getResultState() == Result.ACCEPTED
Recognition of the Result is complete and the recognizer is confident it has the
correct result (not a rejected result). Non-rejection is not a guarantee of a correct
result - only sufficient confidence that the guess is correct.
Events 1: a result transitions from the UNFINALIZED state to the ACCEPTED state when
a RESULT_ACCEPTED event is issued.
Events 2: AUDIO_RELEASED and TRAINING_INFO_RELEASED events may occur
optionally (once) in the ACCEPTED state.
numTokens will return 1 or greater (there must be at least one finalized token) and the
number of finalized tokens will not change. [Note: A rejected result may have zero
finalized tokens.]
The best guess for each finalized token is available through the getBestToken(int
tokNum) method. The best guesses will not change through the remaining life of the
result.
getUnfinalizedTokens method returns null.
The getGrammar method returns the grammar matched by this result. It may be either
a RuleGrammar or DictationGrammar.
For either a RuleGrammar or DictationGrammar the methods of FinalResult may
be used to access audio data and to perform correction/training.
If the result matches a RuleGrammar, the methods of FinalRuleResult may be used
to get alternative guesses for the complete utterance and to get tags and other
information associated with the RuleGrammar. (Calls to any methods of the
FinalDictationResult interface cause a ResultStateError.)
If the result matches a DictationGrammar, the methods of FinalDictationResult
may be used to get alternative guesses for tokens and token sequences. (Calls to any
methods of the FinalRuleResult interface cause a ResultStateError.)
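A brief handling sketch for an accepted result, assuming it is invoked once the result has
reached the ACCEPTED state (the class and method names are illustrative):

    import javax.speech.recognition.*;

    class AcceptedResultHandler {
        static void handleAccepted(Result result) {
            // An accepted result has at least one finalized token.
            for (int i = 0; i < result.numTokens(); i++) {
                ResultToken tok = result.getBestToken(i);
                System.out.print(tok.getSpokenText() + " ");
            }
            System.out.println();

            // Cast according to the kind of grammar that was matched.
            if (result.getGrammar() instanceof RuleGrammar) {
                FinalRuleResult frr = (FinalRuleResult) result;
                // alternative guesses and tags for the whole utterance ...
            } else if (result.getGrammar() instanceof DictationGrammar) {
                FinalDictationResult fdr = (FinalDictationResult) result;
                // alternative guesses for tokens and token sequences ...
            }
        }
    }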
getResultState() == Result.REJECTED
Recognition of the Result is complete but the recognizer believes it does not have the
correct result. Programmatically, accepted and rejected results are very similar, but
the contents of a rejected result must be treated differently - they are likely to be
wrong.
Events 1: a result transitions from the UNFINALIZED state to the REJECTED state when
a RESULT_REJECTED event is issued.
Events 2: (same as for the ACCEPTED state) AUDIO_RELEASED and
TRAINING_INFO_RELEASED events may occur optionally (once) in the REJECTED state.
numTokens will return 0 or greater. The number of tokens will not change for the
remaining life of the result. [Note: an accepted result always has at least one finalized
token.]
As with an accepted result, the best guess for each finalized token is available through
the getBestToken(int num) method and the tokens are guaranteed not to change
through the remaining life of the result. Because the result has been rejected the
guesses are not likely to be correct.
getUnfinalizedTokens method returns null.
If the GRAMMAR_FINALIZED event was issued during recognition of the result, the
getGrammar method returns the grammar matched by this result; otherwise it returns
null. It may be either a RuleGrammar or DictationGrammar. For rejected results,
there is a greater chance that this grammar is wrong.
The FinalResult interface behaves the same as for a result in the ACCEPTED state
except that the information is less likely to be reliable.
If the grammar is known, the FinalRuleResult and FinalDictationResult
interfaces behave the same as for a result in the ACCEPTED state except that the
information is less likely to be reliable. If the grammar is unknown, then a
ResultStateError is thrown on calls to the methods of both FinalRuleResult and
FinalDictationResult.
The state system of a Recognizer is linked to the state of recognition of the current result.
The Recognizer interface documents the normal event cycle for a Recognizer and for
Results. The following is an overview of the ways in which the two state systems are linked:
The ALLOCATED state of a Recognizer has three sub-states. In the LISTENING state,
the recognizer is listening to background audio and there is no result being produced.
In the SUSPENDED state, the recognizer is temporarily buffering audio input while its
grammars are updated. In the PROCESSING state, the recognizer has detected incoming
audio that may match an active grammar and is producing a Result.
The Recognizer moves from the LISTENING state to the PROCESSING state with a
RECOGNIZER_PROCESSING event immediately prior to issuing a RESULT_CREATED
event.
The RESULT_UPDATED and GRAMMAR_FINALIZED events are produced while the
Recognizer is in the PROCESSING state.
The Recognizer finalizes a Result with a RESULT_ACCEPTED or RESULT_REJECTED
event immediately after it transitions from the PROCESSING state to the SUSPENDED
state with a RECOGNIZER_SUSPENDED event.
Unless there is a pending suspend, the Recognizer commits grammar changes with a
CHANGES_COMMITTED event as soon as the result finalization event is processed.
The TRAINING_INFO_RELEASED and AUDIO_RELEASED events can occur in any state of
an ALLOCATED Recognizer.
Accept or Reject?
Rejection of a result indicates that the recognizer is not confident that it has accurately
recognized what a user says. Rejection can be controlled through the
RecognizerProperties interface with the setConfidenceLevel method. Increasing the
confidence level requires the recognizer to have greater confidence before accepting a result, so more
results are likely to be rejected.
It is difficult for recognizers to reliably determine when they make mistakes. Applications
need to determine the cost of incorrect recognition of any particular results and take
appropriate actions. For example, confirm with a user that they said "delete all files" before
deleting anything.
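A sketch of adjusting the rejection behaviour, assuming recognizer refers to an ALLOCATED
Recognizer; the value 0.9f is purely illustrative:

    // Raise the confidence level so low-confidence results are rejected.
    // setConfidenceLevel may throw java.beans.PropertyVetoException.
    RecognizerProperties props = recognizer.getRecognizerProperties();
    props.setConfidenceLevel(0.9f);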
Result Events
Events are issued when a new result is created and when there is any change in the state or
information content of a result. The following describes the event sequence for an accepted
result. It provides the same information as above for result states, but focuses on legal event
sequences.
Before a new result is created for incoming speech, a recognizer usually issues a
SPEECH_STARTED event to the speechStarted method of RecognizerAudioListener.
A newly created Result is provided to the application by calling the resultCreated method
of each ResultListener attached to the Recognizer with a RESULT_CREATED event. The
new result may or may not have any finalized tokens or unfinalized tokens.
A new Result is created in the UNFINALIZED state. In this state, zero or more
RESULT_UPDATED events may be issued to each ResultListener attached to the Recognizer
and to each ResultListener attached to that Result. The RESULT_UPDATED event indicates that
one or more tokens have been finalized, or that the unfinalized tokens have changed, or both.
When the recognizer determines which grammar is the best match for incoming speech, it
issues a GRAMMAR_FINALIZED event. This event is issued to each ResultListener attached to
the Recognizer and to each ResultListener attached to that Result.
Zero or more RESULT_UPDATED events may be issued after the GRAMMAR_FINALIZED event but
before the result is finalized.
Once the recognizer completes recognition of the Result that it chooses to accept, it finalizes
the result with a RESULT_ACCEPTED event that is issued to the ResultListeners attached to
the Recognizer, the matched Grammar, and the Result. This event may also indicate
finalization of zero or more tokens, and/or the resetting of the unfinalized tokens to null. The
result finalization event occurs immediately after the Recognizer makes a transition from the
PROCESSING state to the SUSPENDED state with a RECOGNIZER_SUSPENDED event.
When a result is rejected some of the events described above may be skipped. A result may
be rejected with the RESULT_REJECTED event at any time after a RESULT_CREATED event
instead of a RESULT_ACCEPTED event. A result may be rejected with or without any
unfinalized or finalized tokens being created (no RESULT_UPDATED events), and with or
without a GRAMMAR_FINALIZED event.
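A sketch of a ResultAdapter that traces the event sequence described above; the class name
is illustrative, and it would be attached with recognizer.addResultListener(...):

    import javax.speech.recognition.*;

    class TracingResultListener extends ResultAdapter {
        public void resultCreated(ResultEvent e) {
            // the Result is in the UNFINALIZED state
        }
        public void grammarFinalized(ResultEvent e) {
            // getGrammar() now returns the matched grammar
        }
        public void resultUpdated(ResultEvent e) {
            // tokens were finalized and/or the unfinalized tokens changed
        }
        public void resultAccepted(ResultEvent e) {
            Result r = (Result) e.getSource();
            // r is now in the ACCEPTED state; its best-guess tokens are fixed
        }
        public void resultRejected(ResultEvent e) {
            // any tokens present are likely to be wrong
        }
    }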
A new result object is created when a recognizer has detected possible incoming speech
which may match an active grammar.
To accept the result (i.e. to issue a RESULT_ACCEPTED event), the best-guess tokens of the
result must match the token patterns defined by the matched grammar. For a RuleGrammar
this implies that a call to the parse method of the matched RuleGrammar must return
successfully. (Note: the parse is not guaranteed if the grammar has been changed.)
Because there are no programmatically defined constraints upon word patterns for a
DictationGrammar, a single result may represent a single word, a short phrase or sentence,
or possibly many pages of text.
The set of conditions that may cause a result matching a DictationGrammar to be finalized
includes:
See Also:
FinalResult, FinalRuleResult, FinalDictationResult, ResultEvent, ResultListener,
ResultAdapter, Grammar, RuleGrammar, DictationGrammar, forceFinalize,
RecognizerEvent, setConfidenceLevel
Field Summary
static int ACCEPTED
getResultState returns ACCEPTED once recognition of the result is
completed and the Result object has been finalized by being accepted.
static int REJECTED
getResultState returns REJECTED once recognition of the result is complete
and the Result object has been finalized by being rejected.
static int UNFINALIZED
getResultState returns UNFINALIZED while a result is still being
recognized.
Method Summary
void addResultListener(ResultListener listener)
Request notifications of events related to this Result.
ResultToken getBestToken(int tokNum)
Returns the best guess for the tokNumth token.
ResultToken[] getBestTokens()
Returns all the best guess tokens for this result.
Grammar getGrammar()
Return the Grammar matched by the best-guess finalized tokens of this
result or null if the grammar is not known.
int getResultState()
Returns the current state of the Result object: UNFINALIZED, ACCEPTED or
REJECTED.
ResultToken[] getUnfinalizedTokens()
In the UNFINALIZED state, return the current guess of the tokens
following the finalized tokens.
int numTokens()
Returns the number of finalized tokens in a Result.
void removeResultListener(ResultListener listener)
Remove a listener from this Result.
Field Detail
UNFINALIZED
public static final int UNFINALIZED
getResultState returns UNFINALIZED while a result is still being recognized. A
Result is in the UNFINALIZED state when the RESULT_CREATED event is issued. Result
states are described above in detail.
See Also:
getResultState, RESULT_CREATED
ACCEPTED
public static final int ACCEPTED
getResultState returns ACCEPTED once recognition of the result is completed and
the Result object has been finalized by being accepted. When a Result changes to
the ACCEPTED state a RESULT_ACCEPTED event is issued. Result states are described
above in detail.
See Also:
getResultState, RESULT_ACCEPTED
REJECTED
public static final int REJECTED
getResultState returns REJECTED once recognition of the result is complete and the
Result object has been finalized by being rejected. When a Result changes to the
REJECTED state a RESULT_REJECTED event is issued. Result states are described above
in detail.
See Also:
getResultState, RESULT_REJECTED
Method Detail
getResultState
public int getResultState()
Returns the current state of the Result object: UNFINALIZED, ACCEPTED or REJECTED.
The details of a Result in each state are described above.
See Also:
UNFINALIZED, ACCEPTED, REJECTED
getGrammar
public Grammar getGrammar()
Return the Grammar matched by the best-guess finalized tokens of this result or null
if the grammar is not known. The return value is null before a GRAMMAR_FINALIZED
event and non-null afterwards.
The grammar is guaranteed to be non-null for an accepted result. The grammar may
be null or non-null for a rejected result, depending upon whether a
GRAMMAR_FINALIZED event was issued prior to finalization.
For a finalized result, an application should determine the type of matched grammar
with an instanceof test. For a result that matches a RuleGrammar, the methods of
FinalRuleResult can be used (the methods of FinalDictationResult throw an
error). For a result that matches a DictationGrammar, the methods of
FinalDictationResult can be used (the methods of FinalRuleResult throw an
error). The methods of FinalResult can be used for a result matching either kind of
grammar.
Example:
Result result;
if (result.getGrammar() instanceof RuleGrammar) {
    FinalRuleResult frr = (FinalRuleResult) result;
    ...
}
See Also:
getResultState
numTokens
public int numTokens()
Returns the number of finalized tokens in a Result. Tokens are numbered from 0 to
numTokens()-1 and are obtained through the getBestToken and getBestTokens
methods of this (Result) interface and the getAlternativeTokens methods of the
FinalRuleResult and FinalDictationResult interfaces for a finalized result.
Starting from the RESULT_CREATED event and while the result remains in the
UNFINALIZED state, the number of finalized tokens may be zero or greater and can
increase as tokens are finalized. When one or more tokens are finalized in the
UNFINALIZED state, a RESULT_UPDATED event is issued with the tokenFinalized flag
set true. The RESULT_ACCEPTED and RESULT_REJECTED events which finalize a result
can also indicate that one or more tokens have been finalized.
In the ACCEPTED and REJECTED states, numTokens indicates the total number of tokens
that were finalized. The number of finalized tokens never changes in these states. An
ACCEPTED result must have one or more finalized tokens. A REJECTED result may have
zero or more tokens.
See Also:
RESULT_UPDATED, getBestToken, getBestTokens, getAlternativeTokens
getBestToken
public ResultToken getBestToken(int tokNum)
throws
IllegalArgumentException
Returns the best guess for the tokNumth token. tokNum must be in the range 0 to
numTokens()-1.
If the result has zero tokens (possible in both the UNFINALIZED and REJECTED states)
an exception is thrown.
If the result is in the REJECTED state, then the returned tokens are likely to be
incorrect. In the ACCEPTED state (not rejected) the recognizer is confident that the
tokens are correct but applications should consider the possibility that the tokens are
incorrect.
Throws:
IllegalArgumentException - if tokNum is out of range.
See Also:
getUnfinalizedTokens, getBestTokens, getAlternativeTokens
getBestTokens
public ResultToken[] getBestTokens()
Returns all the best guess tokens for this result. If the result has zero tokens, the return
value is null.
getUnfinalizedTokens
public ResultToken[] getUnfinalizedTokens()
In the UNFINALIZED state, return the current guess of the tokens following the
finalized tokens. Unfinalized tokens provide an indication of what a recognizer is
considering as possible recognition tokens for speech following the finalized tokens.
Unfinalized tokens can provide users with feedback on the recognition process.
The array may be any length (zero or more tokens), the length may change at any
time, and successive calls to getUnfinalizedTokens may return different tokens or
even different numbers of tokens. When the unfinalized tokens are changed, a
RESULT_UPDATED event is issued to the ResultListener. The RESULT_ACCEPTED and
RESULT_REJECTED events finalize a result and always guarantee that the return value
is null. A new result created with a RESULT_CREATED event may have a null or non-
null value.
The returned array is null if there are currently no unfinalized tokens, if the recognizer
does not support unfinalized tokens, or after a Result is finalized (in the ACCEPTED or
REJECTED state).
See Also:
isUnfinalizedTokensChanged, RESULT_UPDATED, RESULT_ACCEPTED,
RESULT_REJECTED
addResultListener
public void addResultListener(ResultListener listener)
Request notifications of events related to this Result. An application can attach
multiple listeners to a Result. A listener can be removed with the
removeResultListener method.
removeResultListener
public void removeResultListener(ResultListener listener)
Remove a listener from this Result.
4. allocate, deallocate
Recognizer, Synthesizer
Engines are located, selected and created through methods of the Central class.
Each type of speech engine has a well-defined set of states of operation, and well-defined
behavior for moving between states. These states are defined by constants of the Engine,
Recognizer and Synthesizer interfaces.
The Engine interface defines three methods for viewing and monitoring states:
getEngineState, waitEngineState and testEngineState. An EngineEvent is issued to
EngineListeners each time an Engine changes state.
The basic states of any speech engine (Recognizer or Synthesizer) are DEALLOCATED,
ALLOCATED, ALLOCATING_RESOURCES and DEALLOCATING_RESOURCES. An engine in the
ALLOCATED state has acquired all the resources it requires to perform its core functions.
Engines are created in the DEALLOCATED state and a call to allocate is required to prepare
them for usage. The ALLOCATING_RESOURCES state is an intermediate state between
DEALLOCATED and ALLOCATED which an engine occupies during the resource allocation
process (which may be very short or may take tens of seconds).
Once an application finishes using a speech engine it should always explicitly free system
resources by calling the deallocate method. This call transitions the engine to the
DEALLOCATED state via some period in the DEALLOCATING_RESOURCES state.
The methods of Engine, Recognizer and Synthesizer perform differently according to the
engine's allocation state. Many methods cannot be performed when an engine is in either the
DEALLOCATED or DEALLOCATING_RESOURCES state. Many methods block (wait) for an engine
in the ALLOCATING_RESOURCES state until the engine reaches the ALLOCATED state. This
blocking/exception behavior is defined separately for each method of Engine, Synthesizer
and Recognizer.
The pause and resume methods are used to transition an engine between the PAUSED and
RESUMED states. The PAUSED and RESUMED states are shared by all applications that use the
underlying engine. For instance, pausing a recognizer pauses all applications that use that
engine.
The state values can be bitwise OR'ed (using the Java "|" operator). For example, for an
allocated, resumed synthesizer with items in its speech output queue, the state is
Engine.ALLOCATED | Engine.RESUMED | Synthesizer.QUEUE_NOT_EMPTY.
The states and sub-states defined above put constraints upon the state of an engine. The
following are examples of illegal states:
Illegal Engine states:
Engine.DEALLOCATED | Engine.RESUMED
Engine.ALLOCATED | Engine.DEALLOCATED
Illegal Synthesizer states:
Engine.DEALLOCATED | Synthesizer.QUEUE_NOT_EMPTY
Synthesizer.QUEUE_EMPTY | Synthesizer.QUEUE_NOT_EMPTY
Illegal Recognizer states:
Engine.DEALLOCATED | Recognizer.PROCESSING
Recognizer.PROCESSING | Recognizer.SUSPENDED
Calls to the testEngineState and waitEngineState methods with illegal state values cause
an exception to be thrown.
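A sketch of testing a legal OR'ed combination of state bits, assuming engine refers to an
existing Engine:

    // true only if the engine is both allocated and resumed
    if (engine.testEngineState(Engine.ALLOCATED | Engine.RESUMED)) {
        // safe to stream audio to or from the engine
    }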
See Also:
Central, Synthesizer, Recognizer
Field Summary
static long ALLOCATED
Bit of state that is set when an Engine is in the allocated state.
static long ALLOCATING_RESOURCES
Bit of state that is set when an Engine is being allocated - the transition state
from DEALLOCATED to ALLOCATED following a call to the allocate method.
static long DEALLOCATED
Bit of state that is set when an Engine is in the deallocated state.
static long DEALLOCATING_RESOURCES
Bit of state that is set when an Engine is being deallocated - the transition
state from ALLOCATED to DEALLOCATED.
static long PAUSED
Bit of state that is set when an Engine is in the ALLOCATED state and is
PAUSED.
static long RESUMED
Bit of state that is set when an Engine is in the ALLOCATED state and is
RESUMED.
Method Summary
void addEngineListener(EngineListener listener)
Request notifications of events related to the Engine.
void allocate()
Allocate the resources required for the Engine and put it into the
ALLOCATED state.
void deallocate()
Free the resources of the engine that were acquired during allocation
and during operation and return the engine to the DEALLOCATED state.
AudioManager getAudioManager()
Return an object which provides management of the audio input or
output for the Engine.
EngineModeDesc getEngineModeDesc()
Return a mode descriptor that defines the operating properties of the
engine.
EngineProperties getEngineProperties()
Return the EngineProperties object (a JavaBean).
long getEngineState()
Returns an OR'ed set of flags indicating the current state of an Engine.
VocabManager getVocabManager()
Return an object which provides management of the vocabulary for
the Engine.
void pause()
Pause the audio stream for the engine and put the Engine into the
PAUSED state.
void removeEngineListener(EngineListener listener)
Remove a listener from this Engine.
void resume()
Put the Engine in the RESUMED state to resume audio streaming to or
from a paused engine.
boolean testEngineState(long state)
Returns true if the current engine state matches the specified state.
void waitEngineState(long state)
Blocks the calling thread until the Engine is in a specified state.
Field Detail
DEALLOCATED
public static final long DEALLOCATED
Bit of state that is set when an Engine is in the deallocated state. A deallocated engine
does not have the resources necessary for it to carry out its basic functions.
See Also:
allocate, deallocate, getEngineState, waitEngineState
ALLOCATING_RESOURCES
public static final long ALLOCATING_RESOURCES
Bit of state that is set when an Engine is being allocated - the transition state from
DEALLOCATED to ALLOCATED following a call to the allocate method. The
ALLOCATING_RESOURCES state has no sub-states. In the ALLOCATING_RESOURCES state,
many of the methods of Engine, Recognizer, and Synthesizer will block until the
Engine reaches the ALLOCATED state and the action can be performed.
See Also:
getEngineState, waitEngineState
ALLOCATED
public static final long ALLOCATED
Bit of state that is set when an Engine is in the allocated state. An engine in the
ALLOCATED state has acquired the resources required for it to carry out its core
functions.
The ALLOCATED state has sub-states PAUSED and RESUMED. Both Synthesizer and
Recognizer define additional sub-states of ALLOCATED.
An Engine is always created in the DEALLOCATED state. It reaches the ALLOCATED state
via the ALLOCATING_RESOURCES state with a call to the allocate method.
See Also:
Synthesizer, Recognizer, getEngineState, waitEngineState
DEALLOCATING_RESOURCES
public static final long DEALLOCATING_RESOURCES
Bit of state that is set when an Engine is being deallocated - the transition state
from ALLOCATED to DEALLOCATED. The DEALLOCATING_RESOURCES state has no
sub-states. In the DEALLOCATING_RESOURCES state, most methods of Engine,
Recognizer and Synthesizer throw an exception.
See Also:
getEngineState, waitEngineState
PAUSED
public static final long PAUSED
Bit of state that is set when an Engine is in the ALLOCATED state and is PAUSED. In
the PAUSED state, audio input or output is stopped.
An ALLOCATED engine is always in either the PAUSED or RESUMED state. The PAUSED and
RESUMED states are sub-states of the ALLOCATED state.
See Also:
RESUMED, ALLOCATED, getEngineState, waitEngineState
RESUMED
public static final long RESUMED
Bit of state that is set when an Engine is in the ALLOCATED state and is RESUMED. In
the RESUMED state, audio input or output is active.
An ALLOCATED engine is always in either the PAUSED or RESUMED state. The PAUSED and
RESUMED states are sub-states of the ALLOCATED state.
See Also:
PAUSED, ALLOCATED, getEngineState, waitEngineState
Method Detail
getEngineState
public long getEngineState()
Returns an OR'ed set of flags indicating the current state of an Engine. The format of the
returned state value is described above.
See Also:
getEngineState, waitEngineState, getNewEngineState, getOldEngineState
waitEngineState
public void waitEngineState(long state)
throws InterruptedException,
IllegalArgumentException
Blocks the calling thread until the Engine is in a specified state. The format of the
state value is described above.
All state bits specified in the state parameter must be set in order for the method to
return, as defined for the testEngineState method. If the state parameter defines
an unreachable state (e.g. PAUSED | RESUMED) an exception is thrown.
Throws:
InterruptedException - if another thread has interrupted this thread.
IllegalArgumentException - if the specified state is unreachable
See Also:
testEngineState, getEngineState
testEngineState
public boolean testEngineState(long state)
throws IllegalArgumentException
Returns true if the current engine state matches the specified state. The format of the
state value is described above.
The test performed is not an exact match to the current state. Only the specified states
are tested. For example the following returns true only if the Synthesizer queue is
empty, irrespective of the pause/resume and allocation states.
if (synth.testEngineState(Synthesizer.QUEUE_EMPTY)) ...
The testEngineState method is equivalent to:
if ((engine.getEngineState() & state) == state)
The testEngineState method can be called successfully in any Engine state.
Throws:
IllegalArgumentException - if the specified state is unreachable
allocate
public void allocate()
throws EngineException,
EngineStateError
Allocate the resources required for the Engine and put it into the ALLOCATED state.
When this method returns successfully the ALLOCATED bit of engine state is set, and
the testEngineState(Engine.ALLOCATED) method returns true. During the
processing of the method, the Engine is temporarily in the ALLOCATING_RESOURCES
state.
When the Engine reaches the ALLOCATED state other engine states are determined:
PAUSED or RESUMED: the pause state depends upon the existing state of the
engine. In a multi-app environment, the pause/resume state of the engine is
shared by all apps.
A Recognizer always starts in the LISTENING state when newly allocated but
may transition immediately to another state.
A Recognizer may be allocated in either the HAS_FOCUS state or LOST_FOCUS
state depending upon the activity of other applications.
A Synthesizer always starts in the QUEUE_EMPTY state when newly allocated.
While this method is being processed events are issued to any EngineListeners
attached to the Engine to indicate state changes. First, as the Engine changes from the
DEALLOCATED to the ALLOCATING_RESOURCES state, an
ENGINE_ALLOCATING_RESOURCES event is issued. As the allocation process
completes, the engine moves from the ALLOCATING_RESOURCES state to the
ALLOCATED state and an ENGINE_ALLOCATED event is issued.
The allocate method should be called for an Engine in the DEALLOCATED state. The
method has no effect for an Engine in either the ALLOCATING_RESOURCES or
ALLOCATED states. The method throws an exception in the DEALLOCATING_RESOURCES
state.
If any problems are encountered during the allocation process so that the engine
cannot be allocated, the engine returns to the DEALLOCATED state (with an
ENGINE_DEALLOCATED event), and an EngineException is thrown.
Allocating the resources for an engine may be fast (less than a second) or slow
(several tens of seconds) depending upon a range of factors. Since the allocate
method does not return until allocation is completed, applications may want to perform
allocation in a background thread and proceed with other activities. The following
code uses an inner class implementation of Runnable to create a background thread
for engine allocation:
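A sketch along those lines; the class name, the choice of a synthesizer, and the null mode
descriptor are illustrative assumptions:

    import javax.speech.Central;
    import javax.speech.Engine;

    public class BackgroundAllocation {
        static Engine engine;

        public static void main(String[] args) throws Exception {
            engine = Central.createSynthesizer(null);   // any Engine would do

            // Inner-class Runnable performs the (possibly slow) allocation.
            new Thread(new Runnable() {
                public void run() {
                    try {
                        engine.allocate();
                    } catch (Exception ex) {
                        ex.printStackTrace();
                    }
                }
            }).start();

            // ... do other start-up work while allocation proceeds ...

            // Block until allocation is complete before using the engine.
            engine.waitEngineState(Engine.ALLOCATED);
        }
    }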
deallocate
public void deallocate()
throws EngineException,
EngineStateError
Free the resources of the engine that were acquired during allocation and during
operation and return the engine to the DEALLOCATED state. When this method returns, the
DEALLOCATED bit of engine state is set so the
testEngineState(Engine.DEALLOCATED) method returns true. During the
processing of the method, the Engine is temporarily in the
DEALLOCATING_RESOURCES state.
While this method is being processed events are issued to any EngineListeners
attached to the Engine to indicate state changes. First, as the Engine changes from the
ALLOCATED to the DEALLOCATING_RESOURCES state, an
ENGINE_DEALLOCATING_RESOURCES event is issued. As the deallocation process
completes, the engine moves from the DEALLOCATING_RESOURCES state to the
DEALLOCATED state and an ENGINE_DEALLOCATED event is issued.
The deallocate method should only be called for an Engine in the ALLOCATED state.
The method has no effect for an Engine in either the DEALLOCATING_RESOURCES or
DEALLOCATED states. The method throws an exception in the ALLOCATING_RESOURCES
state.
Deallocating resources for an engine is not always immediate. Since the deallocate
method does not return until complete, applications may want to perform deallocation
in a separate thread. The documentation for the allocate method shows an example
of an inner class implementation of Runnable that creates a separate thread.
Throws:
EngineException - if a deallocation error occurs
EngineStateError - if called for an engine in the ALLOCATING_RESOURCES state
See Also:
allocate, ENGINE_DEALLOCATED, QUEUE_EMPTY
pause
public void pause()
throws EngineStateError
Pause the audio stream for the engine and put the Engine into the PAUSED state.
Pausing an engine pauses the underlying engine for all applications that are connected
to that engine. Engines are typically paused and resumed by request from a user.
Applications may pause an engine indefinitely. When an engine moves from the
RESUMED state to the PAUSED state, an ENGINE_PAUSED event is issued to each
EngineListener attached to the Engine. The PAUSED bit of the engine state is set to
true when paused, and can be tested by the getEngineState method and other
engine state methods.
The pause method operates as defined for engines in the ALLOCATED state. When
pause is called for an engine in the ALLOCATING_RESOURCES state, the method blocks
(waits) until the ALLOCATED state is reached and then operates normally. An error is
thrown when pause is called for an engine in either the DEALLOCATED or
DEALLOCATING_RESOURCES state.
The pause method does not always return immediately. Some applications need to
execute pause in a separate thread. The documentation for the allocate method
includes an example implementation of Runnable with inner classes that can perform
pause in a separate thread.
Pausing a Synthesizer
Pausing a Recognizer
Pause and resume for a recognizer are analogous to turning a microphone off and on.
Pausing stops the audio input stream as close as possible to the time of the call
to pause. The incoming audio between the pause and the resume calls is ignored.
Anything a user says while the recognizer is paused will not be heard by the
recognizer. Pausing a recognizer during the middle of user speech forces the
recognizer to finalize or reject processing of that incoming speech - a recognition
result cannot cross a pause/resume boundary.
Most recognizers have some amount of internal audio buffering. This means that
some recognizer processing may continue after the pause. For example, results can be
created and finalized.
Throws:
EngineStateError - if called for an engine in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
resume, getEngineState, ENGINE_PAUSED, suspend
resume
public void resume()
throws AudioException,
EngineStateError
Put the Engine in the RESUMED state to resume audio streaming to or from a paused
engine. Resuming an engine resumes the underlying engine for all applications that
are connected to that engine. Engines are typically paused and resumed by request
from a user.
When an engine moves from the PAUSED state to the RESUMED state, an
ENGINE_RESUMED event is issued to each EngineListener attached to the Engine.
The RESUMED bit of the engine state is set to true when resumed, and can be tested by
the getEngineState method and other engine state methods.
The resume method operates as defined for engines in the ALLOCATED state. When
resume is called for an engine in the ALLOCATING_RESOURCES state, the method
blocks (waits) until the ALLOCATED state is reached and then operates normally. An
error is thrown when resume is called for an engine in either the DEALLOCATED or
DEALLOCATING_RESOURCES state.
The resume method does not always return immediately. Some applications need to
execute resume in a separate thread. The documentation for the allocate method
includes an example implementation of Runnable with inner classes that could also
perform resume in a separate thread.
Throws:
AudioException - if unable to gain access to the audio channel
EngineStateError - if called for an engine in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
pause, getEngineState, ENGINE_RESUMED
getAudioManager
public AudioManager getAudioManager()
Return an object which provides management of the audio input or output for the
Engine.
Returns:
the AudioManager for the engine
getVocabManager
public VocabManager getVocabManager()
throws EngineStateError
Return an object which provides management of the vocabulary for the Engine. See
the VocabManager documentation for a description of vocabularies and their use with
speech engines. Returns null if the Engine does not provide vocabulary management
capabilities.
The VocabManager is available for engines in the ALLOCATED state. The call blocks
for engines in the ALLOCATING_RESOURCES state. An error is thrown for engines in the
DEALLOCATED or DEALLOCATING_RESOURCES states.
Returns:
the VocabManager for the engine or null if it does not have a VocabManager
Throws:
EngineStateError - if called for an engine in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
Word
getEngineProperties
public EngineProperties getEngineProperties()
Return the EngineProperties object (a JavaBean).
Returns:
the EngineProperties object for this engine
See Also:
getRecognizerProperties, RecognizerProperties, getSynthesizerProperties,
SynthesizerProperties
getEngineModeDesc
public EngineModeDesc getEngineModeDesc()
throws SecurityException
Return a mode descriptor that defines the operating properties of the engine. For a
Recognizer the return value is a RecognizerModeDesc. For a Synthesizer the
return value is a SynthesizerModeDesc.
Returns:
an EngineModeDesc for the engine.
Throws:
SecurityException - if the application does not have accessEngineModeDesc
permission
addEngineListener
public void addEngineListener(EngineListener listener)
Request notifications of events related to the Engine. An application can attach
multiple listeners to an Engine. A single listener can be attached to multiple engines.
Parameters:
listener - the listener that will receive EngineEvents
See Also:
Recognizer, RecognizerListener, Synthesizer, SynthesizerListener
removeEngineListener
public void removeEngineListener(EngineListener listener)
Remove a listener from this Engine. An EngineListener can be attached or removed
in any state of an Engine.
Parameters:
listener - the listener to be removed
public abstract interface Synthesizer
extends Engine
The Synthesizer interface provides primary access to speech synthesis capabilities. The
Synthesizer interface extends the Engine interface. Thus, any Synthesizer implements
basic speech engine capabilities plus the specialized capabilities required for speech
synthesis.
The primary functions provided by the Synthesizer interface are the ability to speak text,
speak Java Speech Markup Language text, and control an output queue of objects to be
spoken.
Creating a Synthesizer
Typically, a Synthesizer is created by a call to the Central.createSynthesizer method.
The procedures for locating, selecting, creating and initializing a Synthesizer are described
in the documentation for the Central class.
A synthesis package inherits many of its important capabilities from the Engine interface
and its related support classes and interfaces. The synthesis package adds specialized
functionality for performing speech synthesis.
Speaking Text
Plain text is spoken using the speakPlainText method. JSML text is spoken using one of the
three speak methods. The speak methods obtain the JSML text for a Speakable object, from
a URL, or from a String.
[Note: JSML text provided programmatically (by a Speakable object or a String) does not
require the full XML header. JSML text obtained from a URL requires the full XML header.]
A synthesizer is mono-lingual (it speaks a single language) so the text should contain only the
single language of the synthesizer. An application requiring output of more than one language
needs to create multiple Synthesizer objects through Central. The language of the
Synthesizer should be selected at the time at which it is created. The language for a created
Synthesizer can be checked through the Locale of its EngineModeDesc (see
getEngineModeDesc).
Each object provided to a synthesizer is spoken independently. Sentences, phrases and other
structures should not span multiple calls to the speak methods.
Synthesizer extends the state system of the generic Engine interface. It inherits the four
basic allocation states, plus the PAUSED and RESUMED states.
Synthesizer adds a pair of sub-states to the ALLOCATED state to represent the state of the
speech output queue (queuing is described in more detail below). For an ALLOCATED
Synthesizer, the speech output queue is either empty or not empty: represented by the states
QUEUE_EMPTY and QUEUE_NOT_EMPTY.
The QUEUE_EMPTY and QUEUE_NOT_EMPTY states of a Synthesizer indicate the current state
of the speech output queue. The state handling methods inherited from the Engine
interface (getEngineState, waitEngineState and testEngineState) can be used to test
the queue state.
The items on the queue can be checked with the enumerateQueue method which returns a
snapshot of the queue.
The cancel methods allow an application to (a) stop the output of the item currently at the top
of the speaking queue, (b) remove an arbitrary item from the queue, or (c) remove all items
from the output queue.
Applications requiring more complex queuing mechanisms (e.g. a prioritized queue) can
implement their own queuing objects that control the synthesizer.
Applications can determine the approximate point at which a pause occurs by monitoring the
WORD_STARTED events.
See Also:
Central, Speakable, SpeakableListener, EngineListener, SynthesizerListener
Field Summary
static long QUEUE_EMPTY
Bit of state that is set when the speech output queue of a Synthesizer is
empty.
static long QUEUE_NOT_EMPTY
Bit of state that is set when the speech output queue of a Synthesizer is not
empty.
Method Summary
void addSpeakableListener(SpeakableListener listener)
Request notifications of all SpeakableEvents for all speech output
objects for this Synthesizer.
void cancelAll()
Cancel all objects in the synthesizer speech output queue and stop
speaking the current top-of-queue object.
void cancel()
Cancel output of the current object at the top of the output queue.
void cancel(Object source)
Remove a specified item from the speech output queue.
Enumeration enumerateQueue()
Return an Enumeration containing a snapshot of all the objects
currently on the speech output queue.
SynthesizerProperties getSynthesizerProperties()
Return the SynthesizerProperties object (a JavaBean).
String phoneme(String text)
Returns the phoneme string for a text string.
void removeSpeakableListener(SpeakableListener listener)
Remove a SpeakableListener from this Synthesizer.
void speakPlainText(String text, SpeakableListener listener)
Speak a plain text string.
void speak(Speakable JSMLtext, SpeakableListener listener)
Speak an object that implements the Speakable interface and
provides text marked with the Java Speech Markup Language.
void speak(URL JSMLurl, SpeakableListener listener)
Speak text from a URL formatted with the Java Speech Markup
Language.
void speak(String JSMLText, SpeakableListener listener)
Speak a string containing text formatted with the Java Speech
Markup Language.
Field Detail
QUEUE_EMPTY
public static final long QUEUE_EMPTY
Bit of state that is set when the speech output queue of a Synthesizer is empty. The
QUEUE_EMPTY state is a sub-state of the ALLOCATED state. An allocated Synthesizer
is always in either the QUEUE_NOT_EMPTY or QUEUE_EMPTY state.
The queue status can be tested with the waitQueueEmpty, getEngineState and
testEngineState methods. To block a thread until the queue is empty:
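For example, assuming synthesizer refers to an ALLOCATED Synthesizer:

    synthesizer.waitEngineState(Synthesizer.QUEUE_EMPTY);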
See Also:
QUEUE_NOT_EMPTY, ALLOCATED, getEngineState, waitEngineState,
testEngineState, QUEUE_UPDATED
QUEUE_NOT_EMPTY
public static final long QUEUE_NOT_EMPTY
Bit of state that is set when the speech output queue of a Synthesizer is not empty.
The QUEUE_NOT_EMPTY state is a sub-state of the ALLOCATED state. An allocated
Synthesizer is always in either the QUEUE_NOT_EMPTY or QUEUE_EMPTY state.
A Synthesizer enters the QUEUE_NOT_EMPTY state from the QUEUE_EMPTY state when one
of the speak methods is called to place an object on the speech output queue. A
QUEUE_UPDATED event is issued to mark this change in state.
See Also:
QUEUE_EMPTY, ALLOCATED, getEngineState, waitEngineState, testEngineState,
QUEUE_UPDATED
Method Detail
speak
public void speak(Speakable JSMLtext,
SpeakableListener listener)
throws JSMLException,
EngineStateError
Speak an object that implements the Speakable interface and provides text marked
with the Java Speech Markup Language. The Speakable object is added to the end of
the speaking queue and will be spoken once it reaches the top of the queue and the
synthesizer is in the RESUMED state.
The synthesizer first requests the text of the Speakable by calling its getJSMLText
method. It then checks the syntax of the JSML markup and throws a JSMLException
if any problems are found. If the JSML text is legal, the text is placed on the speech
output queue.
Events associated with the Speakable object are issued to the SpeakableListener
object. The listener may be null. A listener attached with this method cannot be
removed with a subsequent remove call. The source for the SpeakableEvents is the
JSMLtext object.
An object placed on the speech output queue can be removed with one of the cancel
methods.
The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.
Parameters:
JSMLtext - object implementing the Speakable interface that provides Java Speech
Markup Language text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
JSMLException - if any syntax errors are encountered in JSMLtext
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(String, SpeakableListener), speak(URL, SpeakableListener),
speakPlainText(String, SpeakableListener), SpeakableEvent, addSpeakableListener
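A sketch of a minimal Speakable implementation; the class name and the JSML fragment it
returns are illustrative:

    import javax.speech.synthesis.Speakable;

    class Greeting implements Speakable {
        public String getJSMLText() {
            // programmatically supplied JSML does not need the full XML header
            return "Hello <EMP>world</EMP>";
        }
    }

    // Usage, where synth is an ALLOCATED, RESUMED Synthesizer:
    //     synth.speak(new Greeting(), null);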
speak
public void speak(URL JSMLurl,
SpeakableListener listener)
throws JSMLException,
MalformedURLException,
IOException,
EngineStateError
Speak text from a URL formatted with the Java Speech Markup Language. The text is
obtained from the URL, checked for legal JSML formatting, and placed at the end of
the speaking queue. It is spoken once it reaches the top of the queue and the
synthesizer is in the RESUMED state. In other respects it is identical to the speak
method that accepts a Speakable object.
Because of the need to check JSML syntax, this speak method returns only once the
complete URL is loaded, or until a syntax error is detected in the URL stream.
Network delays will cause the method to return slowly.
Note: the full XML header is required in the JSML text provided in the URL. The
header is optional for programmatically generated JSML (i.e. with the speak(String,
Listener) and speak(Speakable, Listener) methods).
The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.
Parameters:
JSMLurl - URL containing Java Speech Markup Language text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
JSMLException - if any syntax errors are encountered in the JSML text
MalformedURLException - if the URL is malformed
IOException - if an I/O error occurs while reading from the URL
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(Speakable, SpeakableListener), speak(String, SpeakableListener),
speakPlainText(String, SpeakableListener), SpeakableEvent, addSpeakableListener
speak
public void speak(String JSMLText,
SpeakableListener listener)
throws JSMLException,
EngineStateError
Speak a string containing text formatted with the Java Speech Markup Language. The
JSML text is checked for formatting errors and a JSMLException is thrown if any are
found. If legal, the text is placed at the end of the speaking queue and will be spoken
once it reaches the top of the queue and the synthesizer is in the RESUMED state. In all
other respects it is identical to the speak method that accepts a Speakable object.
The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.
Parameters:
JSMLText - String containing Java Speech Markup Language text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
JSMLException - if any syntax errors are encountered in JSMLText
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(Speakable, SpeakableListener), speak(URL, SpeakableListener),
speakPlainText(String, SpeakableListener)
speakPlainText
public void speakPlainText(String text,
SpeakableListener listener)
throws EngineStateError
Speak a plain text string. The text is not interpreted as containing the Java Speech
Markup Language so JSML elements are ignored. The text is placed at the end of the
speaking queue and will be spoken once it reaches the top of the queue and the
synthesizer is in the RESUMED state. In other respects it is similar to the speak method
that accepts a Speakable object.
The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.
Parameters:
text - String containing plain text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(Speakable, SpeakableListener), speak(URL, SpeakableListener), speak(String,
SpeakableListener)
phoneme
public String phoneme(String text)
throws EngineStateError
Returns the phoneme string for a text string. The return string uses the International
Phonetic Alphabet subset of Unicode. The input string is expected to be simple text
(for example, a word or phrase in English). The text is not expected to contain
punctuation or JSML markup.
Parameters:
text - plain text to be converted to phonemes
Returns:
phonemic representation of text or null
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
enumerateQueue
public Enumeration enumerateQueue()
throws EngineStateError
Return an Enumeration containing a snapshot of all the objects currently on the
speech output queue. The first item is the top of the queue. An empty queue returns a
null object.
This method returns only the items on the speech queue placed there by the current
application or applet. For security reasons, it is not possible to inspect items placed by
other applications.
The items on the speech queue cannot be modified by changing the object returned
from this method.
The enumerateQueue method works in the ALLOCATED state. The call blocks if the
Synthesizer is in the ALLOCATING_RESOURCES state and completes when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.
Returns:
an Enumeration of the speech output queue or null
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
SynthesizerQueueItem, QUEUE_UPDATED, QUEUE_EMPTIED,
addEngineListener
cancel
public void cancel()
throws EngineStateError
Cancel output of the current object at the top of the output queue. A
SPEAKABLE_CANCELLED event is issued to appropriate SpeakableListeners.
If there is another object in the speaking queue, it is moved to the top of the queue and
receives the TOP_OF_QUEUE event. If the Synthesizer is not paused, speech output
continues with that object. To prevent speech output continuing with the next object
in the queue, call pause before calling cancel.
The cancel methods work in the ALLOCATED state. The calls block if the
Synthesizer is in the ALLOCATING_RESOURCES state and complete when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
cancel(Object), cancelAll(), QUEUE_UPDATED, QUEUE_EMPTIED,
TOP_OF_QUEUE, SPEAKABLE_CANCELLED
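A sketch of the pause-before-cancel pattern described above, assuming synth refers to an
ALLOCATED Synthesizer:

    synth.pause();    // stop audio output first
    synth.cancel();   // remove the top-of-queue item without output running on
    synth.resume();   // later, when output should continue with the next item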
cancel
public void cancel(Object source)
throws IllegalArgumentException,
EngineStateError
Remove a specified item from the speech output queue. The source object must be
one of the items passed to a speak method. A SPEAKABLE_CANCELLED event is issued
to appropriate SpeakableListeners.
If the source object is the top item in the queue, the behavior is the same as the
cancel() method.
If the source object is not at the top of the queue, it is removed from the queue
without affecting the current top-of-queue speech output. A QUEUE_UPDATED event is then
issued to SynthesizerListeners.
If the source object appears multiple times in the queue, only the first instance is
cancelled.
Warning: cancelling an object just after the synthesizer has completed speaking it and
has removed the object from the queue will cause an exception. In this instance, the
exception can be ignored.
The cancel methods work in the ALLOCATED state. The calls block if the
Synthesizer is in the ALLOCATING_RESOURCES state and complete when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.
Parameters:
source - object to be removed from the speech output queue
Throws:
IllegalArgumentException - if the source object is not found in the speech output
queue.
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
cancel(), cancelAll(), QUEUE_UPDATED, QUEUE_EMPTIED,
SPEAKABLE_CANCELLED
cancelAll
public void cancelAll()
throws EngineStateError
Cancel all objects in the synthesizer speech output queue and stop speaking the
current top-of-queue object.
The cancel methods work in the ALLOCATED state. The calls block if the
Synthesizer is in the ALLOCATING_RESOURCES state and complete when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
cancel(), cancel(Object), QUEUE_EMPTIED, SPEAKABLE_CANCELLED
getSynthesizerProperties
public SynthesizerProperties getSynthesizerProperties()
Return the SynthesizerProperties object (a JavaBean). The method returns exactly
the same object as the getEngineProperties method in the Engine interface.
However, with the getSynthesizerProperties method, an application does not
need to cast the return value.
Returns:
the SynthesizerProperties object for this engine
See Also:
getEngineProperties
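A sketch of adjusting output through the properties object, assuming synth refers to an
ALLOCATED Synthesizer; the numeric values are illustrative, and each setter may throw
java.beans.PropertyVetoException if the engine rejects the value:

    SynthesizerProperties props = synth.getSynthesizerProperties();
    props.setSpeakingRate(150.0f);   // target rate in words per minute
    props.setPitch(120.0f);          // baseline pitch in hertz
    props.setVolume(0.8f);           // volume on a 0.0 - 1.0 scale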
addSpeakableListener
public void addSpeakableListener(SpeakableListener listener)
Request notifications of all SpeakableEvents for all speech output objects for this
Synthesizer. An application can attach multiple SpeakableListeners to a
Synthesizer. A single listener can be attached to multiple synthesizers.
When an event affects more than one item in the speech output queue (e.g.
cancelAll), the SpeakableEvents are issued in the order of the items in the queue
starting with the top of the queue.
Parameters:
listener - the listener that will receive SpeakableEvents
See Also:
removeSpeakableListener
removeSpeakableListener
public void removeSpeakableListener(SpeakableListener listener)
Remove a SpeakableListener from this Synthesizer.
References