
Now let me explain speech recognition using Java.

The Java Speech API is designed to keep simple speech applications simple, and to
make advanced speech applications possible for non-specialist developers. A speech
recognizer is a speech engine that converts speech to text. The javax.speech.recognition
package defines the Recognizer interface to support speech recognition, plus a set of
supporting classes and interfaces.

Simple example of Speech Recognition using Java

The following example shows a simple application that uses speech recognition. For this
application we need to define a grammar of everything the user can say, and we need to
write the Java software that performs the recognition task.

A grammar is provided by an application to a speech recognizer to define the words
that a user can say, and the patterns in which those words can be spoken. In this
example, we define a grammar that allows a user to say "Hello World" or a variant.
The grammar is defined using the Java Speech Grammar Format. This format is
documented in the Java Speech Grammar Format Specification.
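
The example's code is not reproduced in this document, so the following is a minimal sketch assembled from the steps described below. The grammar file name (javax.speech.demo.gram) and the exact rule it contains are illustrative assumptions; the class name HelloWorld and the API calls follow the step-by-step description that follows.

import java.io.FileReader;
import java.util.Locale;
import javax.speech.*;
import javax.speech.recognition.*;

/*
 * Assumed contents of the grammar file "javax.speech.demo.gram" (JSGF):
 *
 *   grammar javax.speech.demo;
 *   public <sentence> = hello world | good morning | hello mighty computer;
 */
public class HelloWorld extends ResultAdapter {

    static Recognizer rec;

    // Receives RESULT_ACCEPTED events: print the recognized words, then exit
    public void resultAccepted(ResultEvent e) {
        Result r = (Result) e.getSource();
        ResultToken[] tokens = r.getBestTokens();

        for (int i = 0; i < tokens.length; i++)
            System.out.print(tokens[i].getSpokenText() + " ");
        System.out.println();

        try {
            rec.deallocate();           // free the recognizer's resources
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.exit(0);
    }

    public static void main(String[] args) {
        try {
            // Create a recognizer that understands English
            rec = Central.createRecognizer(new EngineModeDesc(Locale.ENGLISH));

            // Allocate all necessary resources
            rec.allocate();

            // Load the JSGF grammar from a file and enable it
            FileReader reader = new FileReader("javax.speech.demo.gram");
            RuleGrammar gram = rec.loadJSGF(reader);
            gram.setEnabled(true);

            // Attach a ResultListener to receive result events
            rec.addResultListener(new HelloWorld());

            // Commit the grammar changes
            rec.commitChanges();

            // Request focus and resume so recognition can begin
            rec.requestFocus();
            rec.resume();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}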

Let's examine each step in detail.

 Create: The Central class of the javax.speech package is used to obtain a speech
recognizer by calling the createRecognizer method. The EngineModeDesc argument
provides the information needed to locate an appropriate recognizer. In this example
we requested a recognizer that understands English (since the grammar is written for
English).

 Allocate: The allocate method requests that the Recognizer allocate all
necessary resources.

 Load and enable grammars: The loadJSGF method reads in a JSGF document
from a reader created for the file that contains the javax.speech.demo grammar.
(Alternatively, the loadJSGF method can load a grammar from a URL.) Next,
the grammar is enabled. Once the recognizer receives focus (see below), an
enabled grammar is activated for recognition: that is, the recognizer compares
incoming audio to the active grammars and listens for speech that matches those
grammars.

 Attach a ResultListener: The HelloWorld class extends the ResultAdapter class,
which is a trivial implementation of the ResultListener interface. An instance of
the HelloWorld class is attached to the Recognizer to receive result events. These
events indicate progress as the recognition of speech takes place. In this
implementation, we process the RESULT_ACCEPTED event, which is provided
when the recognizer completes recognition of input speech that matches an
active grammar.

 Commit changes: Any changes in grammars and the grammar enabled status
need to be committed to take effect (that includes creation of a new grammar).

 Request focus and resume: For recognition of the grammar to occur, the
recognizer must be in the RESUMED state and must have the speech focus. The
requestFocus and resume methods achieve this.

 Process result: Once the main method is completed, the application waits until
the user speaks. When the user speaks something that matches the loaded
grammar, the recognizer issues a RESULT_ACCEPTED event to the listener we
attached to the recognizer. The source of this event is a Result object that
contains information about what the recognizer heard. The getBestTokens
method returns an array of ResultTokens, each of which represents a single
spoken word. These words are printed.

 Deallocate: Before exiting we call deallocate to free up the recognizer's resources.

Class javax.speech.Central
java.lang.Object
|
+--javax.speech.Central

public class Central extends Object

The Central class is the initial access point to all speech input and output capabilities.
Central provides the ability to locate, select and create speech recognizers and speech
synthesizers.

Creating a Recognizer or Synthesizer

The createRecognizer and createSynthesizer methods are used to create speech
engines. Both methods accept a single parameter that defines the required properties for the
engine to be created. The parameter is an EngineModeDesc and may be one of the
sub-classes: RecognizerModeDesc or SynthesizerModeDesc.

A mode descriptor defines a set of required properties for an engine. For example, a
SynthesizerModeDesc can describe a Synthesizer for Swiss German that has a male voice.
Similarly, a RecognizerModeDesc can describe a Recognizer that supports dictation for
Japanese.

An application is responsible for determining its own functional requirements for speech
input/output and providing an appropriate mode descriptor. There are three cases for mode
descriptors:

1. null
2. Created by the application
3. Obtained from the availableRecognizers or availableSynthesizers methods of
Central.

The mode descriptor is passed to the createRecognizer or createSynthesizer methods of
Central to create a Recognizer or Synthesizer. The created engine matches all the engine
properties in the mode descriptor passed to the create method. If no suitable speech engine is
available, the create methods return null.

The create engine methods operate differently for the three cases. That is, engine selection
depends upon the type of the mode descriptor:

1. null mode descriptor: the Central class selects a suitable engine for the default
Locale.
2. Application-created mode descriptor: the Central class attempts to locate an engine
with all application-specified properties.
3. Mode descriptor from availableRecognizers or availableSynthesizers:
descriptors returned by these two methods identify a specific engine with a specific
operating mode. Central creates an instance of that engine. (Note: these mode
descriptors are distinguished because they implement the EngineCreate interface.)

Case 1: Example

// Create a synthesizer for the default Locale
Synthesizer synth = Central.createSynthesizer(null);

Case 2: Example
// Create a dictation recognizer for British English
// Note: the UK locale is English spoken in Britain
RecognizerModeDesc desc = new RecognizerModeDesc(Locale.UK,
Boolean.TRUE);
Recognizer rec = Central.createRecognizer(desc);

Case 3: Example
// Obtain a list of all German recognizers
RecognizerModeDesc desc = new RecognizerModeDesc(Locale.GERMAN);
EngineList list = Central.availableRecognizers(desc);
// select amongst by other desired engine properties
RecognizerModeDesc chosen = ...
// create an engine from "chosen" - an engine-provided descriptor
Recognizer rec = Central.createRecognizer(chosen);

Engine Selection Procedure: Cases 1 & 2

For cases 1 and 2 there is a defined procedure for selecting an engine to be created. (For case
3, the application can apply its own selection procedure.)

Locale is treated specially in the selection to ensure that language is always considered when
selecting an engine. If a locale is not provided, the default locale
(java.util.Locale.getDefault) is used.

The selection procedure is:


1. If the locale is undefined, add the language of the default locale to the required
properties.
2. If a Recognizer or Synthesizer has been created already and it has the required
properties, return a reference to it. (The last created engine is checked.)
3. Obtain a list of all recognizer or synthesizer modes that match the required properties.
4. Amongst the matching engines, give preference to:
o A running engine (EngineModeDesc.getRunning is true),
o An engine that matches the default locale's country.

When more than one engine is a legal match in the final step, the engines are ordered as
returned by the availableRecognizers or availableSynthesizers method.

Security

Access to speech engines is restricted by Java's security system. This is to ensure that
malicious applets don't use the speech engines inappropriately. For example, a recognizer
should not be usable without explicit permission because it could be used to monitor ("bug")
an office.

A number of methods throughout the API throw SecurityException. Individual
implementations of Recognizer and Synthesizer may throw SecurityException on
additional methods as required to protect a client from malicious applications and applets.

The SpeechPermission class defines the types of permission that can be granted or denied
for applications. This permission system is based on the JDK 1.2 fine-grained security model.

Engine Registration

The Central class locates, selects and creates speech engines from amongst a list of
registered engines. Thus, for an engine to be used by Java applications, the engine must
register itself with Central. There are two registration mechanisms: (1) add an
EngineCentral class to a speech properties file, (2) temporarily register an engine by calling
the registerEngineCentral method.

The speech properties files provide persistent registration of speech engines. When Central
is first called, it looks for properties in two files:

<user.home>/speech.properties
<java.home>/lib/speech.properties
where the <user.home> and <java.home> are the values obtained from the System
properties object. (The '/' separator will vary across operating systems.) Engines identified in
either properties file are made available through the methods of Central.

The property files must contain data in the format that is read by the load method of the
Properties class. Central looks for properties of the form

com.acme.recognizer.EngineCentral=com.acme.recognizer.AcmeEngineCentral

This line is interpreted as "the EngineCentral object for the com.acme.recognizer engine
is the class called com.acme.recognizer.AcmeEngineCentral." When it is first called, the
Central class will attempt to create an instance of each EngineCentral object and will
ensure that it implements the EngineCentral interface.

Note to engine providers: Central calls each EngineCentral for each call to
availableRecognizers or availableSynthesizers, and sometimes for createRecognizer
and createSynthesizer. The results are not stored. The
EngineCentral.createEngineList method should be reasonably efficient.

Method Summary

static EngineList availableRecognizers(EngineModeDesc require)
    List EngineModeDesc objects for available recognition engine modes that match the required properties.

static EngineList availableSynthesizers(EngineModeDesc require)
    List EngineModeDesc objects for available synthesis engine modes that match the required properties.

static Recognizer createRecognizer(EngineModeDesc require)
    Create a Recognizer with specified required properties.

static Synthesizer createSynthesizer(EngineModeDesc require)
    Create a Synthesizer with specified required properties.

static void registerEngineCentral(String className)
    Register a speech engine with the Central class for use by the current application.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notifyAll, notify, toString, wait, wait, wait

Method Detail
createRecognizer
public static final Recognizer createRecognizer(EngineModeDesc require)
    throws IllegalArgumentException, EngineException, SecurityException

Create a Recognizer with specified required properties. If there is no Recognizer
with the required properties the method returns null.

The required properties defined in the input parameter may be provided as either an
EngineModeDesc object or a RecognizerModeDesc object. The input parameter may
also be null, in which case an engine is selected that supports the language of the
default locale.
A non-null mode descriptor may be either application-created or a mode descriptor
returned by the availableRecognizers method.

The mechanisms for creating a Recognizer are described above in detail.

Parameters:
require - required engine properties or null for default engine selection
Returns:
a recognizer matching the required properties or null if none is available
Throws:
IllegalArgumentException - if the properties of the EngineModeDesc do not refer to a
known engine or engine mode.
EngineException - if the engine defined by this RecognizerModeDesc could not be
properly created.
SecurityException - if the caller does not have createRecognizer permission
See Also:
availableRecognizers, RecognizerModeDesc

availableRecognizers
public static final EngineList availableRecognizers(EngineModeDesc require)
    throws SecurityException

List EngineModeDesc objects for available recognition engine modes that match the
required properties. If the require parameter is null, then all known recognizers are
listed.

Returns a zero-length list if no engines are available or if no engines have the required
properties. (The method never returns null).

The order of the EngineModeDesc objects in the list is partially defined. For each
registered engine (technically, each registered EngineCentral object) the order of the
descriptors is preserved. Thus, each installed speech engine should order its descriptor
objects with the most useful modes first, for example, a mode that is already loaded
and running on a desktop.

Parameters:
require - an EngineModeDesc or RecognizerModeDesc defining the required
features of the mode descriptors in the returned list
Returns:
list of mode descriptors with the required properties
Throws:
SecurityException - if the caller does not have permission to use speech recognition

createSynthesizer
public static final Synthesizer createSynthesizer(EngineModeDesc require)
    throws IllegalArgumentException, EngineException

Create a Synthesizer with specified required properties. If there is no Synthesizer
with the required properties the method returns null.

The required properties defined in the input parameter may be provided as either an
EngineModeDesc object or a SynthesizerModeDesc object. The input parameter may
also be null, in which case an engine is selected that supports the language of the
default locale.

A non-null mode descriptor may be either application-created or a mode descriptor
returned by the availableSynthesizers method.

The mechanisms for creating a Synthesizer are described above in detail.

Parameters:
require - required engine properties or null for default engine selection
Returns:
a Synthesizer matching the required properties or null if none is available
Throws:
IllegalArgumentException - if the properties of the EngineModeDesc do not refer to a
known engine or engine mode.
EngineException - if the engine defined by this SynthesizerModeDesc could not be
properly created.
See Also:
availableSynthesizers, SynthesizerModeDesc

availableSynthesizers
public static final EngineList availableSynthesizers(EngineModeDesc require)
    throws SecurityException

List EngineModeDesc objects for available synthesis engine modes that match the
required properties. If the require parameter is null, then all available known
synthesizers are listed.

Returns an empty list (rather than null) if no engines are available or if no engines
have the required properties.

The order of the EngineModeDesc objects in the list is partially defined. For each
speech installation (technically, each registered EngineCentral object) the order of
the descriptors is preserved. Thus, each installed speech engine should order its
descriptor objects with the most useful modes first, for example, a mode that is
already loaded and running on a desktop.

Throws:
SecurityException - if the caller does not have permission to use speech engines
registerEngineCentral
public static final void registerEngineCentral(String className)
    throws EngineException

Register a speech engine with the Central class for use by the current application.
This call adds the specified class name to the list of EngineCentral objects. The
registered engine is not stored persistently in the properties files. If className is
already registered, the call has no effect.

The class identified by className must have an empty constructor.

Parameters:
className - name of a class that implements the EngineCentral interface and
provides access to an engine implementation
Throws:
EngineException - if className is not a legal class or it does not implement the
EngineCentral interface
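
For example, an engine could be registered temporarily for the current session only; a short sketch (the class name is the illustrative Acme name used in the properties example above):

import javax.speech.Central;
import javax.speech.EngineException;

class RegisterAcmeEngine {
    static void register() throws EngineException {
        // Non-persistent registration: nothing is written to speech.properties
        Central.registerEngineCentral("com.acme.recognizer.AcmeEngineCentral");
    }
}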

The javax.speech package of the Java Speech API defines an abstract software
representation of a speech engine. "Speech engine" is the generic term for a system designed
to deal with either speech input or speech output. Speech synthesizers and speech recognizers
are both speech engine instances. Speaker verification systems and speaker identification
systems are also speech engines but are not currently supported through the Java Speech API.

The javax.speech package defines classes and interfaces that define the basic functionality
of an engine. The javax.speech.synthesis package and javax.speech.recognition
package extend and augment the basic functionality to define the specific capabilities of
speech synthesizers and speech recognizers.

The Java Speech API makes only one assumption about the implementation of a JSAPI
engine: that it provides a true implementation of the Java classes and interfaces defined by
the API. In supporting those classes and interfaces, an engine may be completely software-based
or may be a combination of software and hardware. The engine may be local to the client
computer or may operate remotely on a server. The engine may be written entirely as Java
software or may be a combination of Java software and native code.

The basic processes for using a speech engine in an application are as follows.

1. Identify the application's functional requirements for an engine (e.g., language or dictation capability).
2. Locate and create an engine that meets those functional requirements.
3. Allocate the resources for the engine.
4. Set up the engine.
5. Begin operation of the engine - technically, resume it.
6. Use the engine.
7. Deallocate the resources of the engine.

Steps 4 and 6 in this process operate differently for the two types of speech engine -
recognizer or synthesizer. The other steps apply to all speech engines and are described in the
remainder of this chapter.

The "Hello World!" code example for speech synthesis and the "Hello World!" code example
for speech recognition both illustrate the 7 steps described above. They also show that simple
speech applications are simple to write with the Java Speech API - writing your first speech
application should not be too hard.
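
The synthesis code itself is not reproduced in this document; a minimal sketch of the seven steps using a synthesizer might look like the following (assuming an English synthesizer is installed, and collapsing error handling into a single catch):

import java.util.Locale;
import javax.speech.*;
import javax.speech.synthesis.*;

public class HelloSynthesis {
    public static void main(String[] args) {
        try {
            // Steps 1-2: state the functional requirements and create a matching engine
            Synthesizer synth = Central.createSynthesizer(
                    new SynthesizerModeDesc(Locale.ENGLISH));
            if (synth == null) {
                System.err.println("No matching synthesizer is available");
                return;
            }
            // Step 3: allocate the engine's resources
            synth.allocate();
            // Steps 4-5: no special set-up is needed here; resume the engine
            synth.resume();
            // Step 6: use the engine, then wait until it finishes speaking
            synth.speakPlainText("Hello, world!", null);
            synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
            // Step 7: deallocate the engine's resources
            synth.deallocate();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}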

4.2     Properties of a Speech Engine


Applications are responsible for determining their functional requirements for a speech
synthesizer and/or speech recognizer. For example, an application might determine that it
needs a dictation recognizer for the local language or a speech synthesizer for Korean with a
female voice. Applications are also responsible for determining behavior when there is no
speech engine available with the required features. Based on specific functional requirements,
a speech engine can be selected, created, and started. This section explains how the features
of a speech engine are used in engine selection, and how those features are handled in Java
software.

Functional requirements are handled in applications as engine selection properties. Each
installed speech synthesizer and speech recognizer is defined by a set of properties. An
installed engine may have one or many modes of operation, each defined by a unique set of
properties and encapsulated in a mode descriptor object.

The basic engine properties are defined in the EngineModeDesc class. Additional specific
properties for speech recognizers and synthesizers are defined by the RecognizerModeDesc
and SynthesizerModeDesc classes that are contained in the javax.speech.recognition
and javax.speech.synthesis packages respectively.

In addition to mode descriptor objects provided by speech engines to describe their
capabilities, an application can create its own mode descriptor objects to indicate its
functional requirements. The same Java classes are used for both purposes. An
engine-provided mode descriptor describes an actual mode of operation, whereas an
application-defined mode descriptor defines a preferred or desired mode of operation.
(Locating, Selecting and Creating Engines describes the use of a mode descriptor.)

The basic properties defined for all speech engines are listed in Table 4-1.

Table 4-1 Basic engine selection properties: EngineModeDesc

Property Name    Description

EngineName       A String that defines the name of the speech engine, e.g., "Acme Dictation System".

ModeName         A String that defines a specific mode of operation of the speech engine, e.g., "Acme Spanish Dictator".

Locale           A java.util.Locale object that indicates the language supported by the speech engine, and optionally, a country and a variant. The Locale class uses standard ISO 639 language codes and ISO 3166 country codes. For example, Locale("fr", "ca") represents a Canadian French locale, and Locale("en", "") represents English (the language).

Running          A Boolean object that is TRUE for engines which are already running on a platform, otherwise FALSE. Selecting a running engine allows for sharing of resources and may also allow for fast creation of a speech engine object.

The one additional property defined by the SynthesizerModeDesc class for speech
synthesizers is shown in Table 4-2.

Table 4-2 Synthesizer selection properties: SynthesizerModeDesc

Property Name    Description

List of voices   An array of voices that the synthesizer is capable of producing. Each voice is defined by an instance of the Voice class, which encapsulates voice name, gender, age and speaking style.

The two additional properties defined by the RecognizerModeDesc class for speech
recognizers are shown in Table 4-3.

Table 4-3 Recognizer selection properties: RecognizerModeDesc

Property Name        Description

Dictation supported  A Boolean value indicating whether this mode of operation of the recognizer supports a dictation grammar.

Speaker profiles     A list of SpeakerProfile objects for speakers who have trained the recognizer. Recognizers that do not support training return a null list.

All three mode descriptor classes, EngineModeDesc, SynthesizerModeDesc and
RecognizerModeDesc, use the get and set property patterns for JavaBeans. For example,
the Locale property has get and set methods of the form:

Locale getLocale();

void setLocale(Locale l);

Furthermore, all the properties are defined by class objects, never by primitives (primitives in
the Java programming language include boolean, int etc.). With this design, a null value
always represents "don't care" and is used by applications to indicate that a particular
property is unimportant to its functionality. For instance, a null value for the "dictation
supported" property indicates that dictation is not relevant to engine selection. Since that
property is represented by the Boolean class, a value of TRUE indicates that dictation is
required and FALSE indicates explicitly that dictation should not be provided.

4.3     Locating, Selecting and Creating Engines


4.3.1     Default Engine Creation

The simplest way to create a speech engine is to request a default engine. This is appropriate
when an application wants an engine for the default locale (specifically for the local
language) and does not have any special functional requirements for the engine. The Central
class in the javax.speech package is used for locating and creating engines. Default engine
creation uses two static methods of the Central class.

Synthesizer Central.createSynthesizer(EngineModeDesc mode);

Recognizer Central.createRecognizer(EngineModeDesc mode);

The following code creates a default Recognizer and Synthesizer.

import javax.speech.*;
import javax.speech.synthesis.*;
import javax.speech.recognition.*;

{
    // Get a synthesizer for the default locale
    Synthesizer synth = Central.createSynthesizer(null);

    // Get a recognizer for the default locale
    Recognizer rec = Central.createRecognizer(null);
}

For both createSynthesizer and createRecognizer, the null parameters indicate that
the application doesn't care about the properties of the synthesizer or recognizer. However,
both creation methods have an implicit selection policy. Since the application did not specify
the language of the engine, the language from the system's default locale returned by
java.util.Locale.getDefault() is used. In all cases of creating a speech engine, the Java
Speech API forces language to be considered since it is fundamental to correct engine
operation.

If more than one engine supports the default language, the Central class then gives preference
to an engine that is running (its running property is true), and then to an engine that supports
the country defined in the default locale.

If the example above is performed in the US locale, a recognizer and synthesizer for the
English language will be returned if one is available. Furthermore, if engines are installed for
both British and US English, the US English engine would be created.

4.3.2     Simple Engine Creation

The next easiest way to create an engine is to create a mode descriptor, define desired engine
properties and pass the descriptor to the appropriate engine creation method of the Central
class. When the mode descriptor passed to the createSynthesizer or createRecognizer
methods is non-null, an engine is created which matches all of the properties defined in the
descriptor. If no suitable engine is available, the methods return null.

The list of properties is described in the Properties of a Speech Engine. All the properties in
EngineModeDesc and its sub-classes RecognizerModeDesc and SynthesizerModeDesc
default to null to indicate "don't care".

The following code sample shows a method that creates a dictation-capable recognizer for the
default locale. It returns null if no suitable engine is available.

/** Get a dictation recognizer for the default locale */
Recognizer createDictationRecognizer()
{
    // Create a mode descriptor with all required features
    RecognizerModeDesc required = new RecognizerModeDesc();
    required.setDictationGrammarSupported(Boolean.TRUE);
    return Central.createRecognizer(required);
}

Since the required object provided to the createRecognizer method does not have a
specified locale (it is not set, so it is null) the Central class again enforces a policy of
selecting an engine for the language specified in the system's default locale. The Central
class will also give preference to running engines and then to engines that support the country
defined in the default locale.

In the next example we create a Synthesizer for Spanish with a male voice.

/**
 * Return a speech synthesizer for Spanish.
 * Return null if no such engine is available.
 */
Synthesizer createSpanishSynthesizer()
{
    // Create a mode descriptor with all required features
    // "es" is the ISO 639 language code for "Spanish"
    SynthesizerModeDesc required = new SynthesizerModeDesc();
    required.setLocale(new Locale("es", ""));
    required.addVoice(new Voice(null, Voice.GENDER_MALE, Voice.AGE_DONT_CARE, null));
    return Central.createSynthesizer(required);
}

Again, the method returns null if no matching synthesizer is found and the application is
responsible for determining how to handle the situation.

4.3.3     Advanced Engine Selection

This section explains more advanced mechanisms for locating and creating speech engines.
Most applications do not need to use these mechanisms. Readers may choose to skip this
section.

In addition to performing engine creation, the Central class can provide lists of available
recognizers and synthesizers from two static methods.

EngineList availableSynthesizers(EngineModeDesc mode);

EngineList availableRecognizers(EngineModeDesc mode);

If the mode passed to either method is null, then all known speech recognizers or
synthesizers are returned. Unlike the createRecognizer and createSynthesizer methods,
there is no policy that restricts the list to the default locale or to running engines - in advanced
selection such decisions are the responsibility of the application.

Both availableSynthesizers and availableRecognizers return an EngineList object,
a sub-class of Vector. If there are no available engines, or no engines that match the
properties defined in the mode descriptor, the list is zero length (not null) and its isEmpty
method returns true. Otherwise the list contains a set of SynthesizerModeDesc or
RecognizerModeDesc objects, each defining a mode of operation of an engine. These mode
descriptors are engine-defined so all their features are defined (non-null) and applications can
test these features to refine the engine selection.

Because EngineList is a sub-class of Vector, each element it contains is a Java Object.
Thus, when accessing the elements, applications need to cast the objects to EngineModeDesc,
SynthesizerModeDesc or RecognizerModeDesc.

The following code shows how an application can obtain a list of speech synthesizers with a
female voice for German. All other parameters of the mode descriptor remain null for "don't
care" (engine name, mode name etc.).

import java.util.Locale;
import javax.speech.*;
import javax.speech.synthesis.*;

// Define the set of required properties in a mode descriptor
SynthesizerModeDesc required = new SynthesizerModeDesc();
required.setLocale(new Locale("de", ""));
required.addVoice(new Voice(null, Voice.GENDER_FEMALE, Voice.AGE_DONT_CARE, null));

// Get the list of matching engine modes
EngineList list = Central.availableSynthesizers(required);

// Test whether the list is empty - any suitable synthesizers?
if (list.isEmpty()) ...

If the application specifically wanted Swiss German and a running engine it would add the
following before calling availableSynthesizers:

required.setLocale(new Locale("de", "CH"));

required.setRunning(Boolean.TRUE);

To create a speech engine from a mode descriptor obtained through the
availableSynthesizers and availableRecognizers methods, an application simply calls
the createSynthesizer or createRecognizer method. Because the engine created the
mode descriptor and because it provided values for all the properties, it has sufficient
information to create the engine directly. An example later in this section illustrates the
creation of a Recognizer from an engine-provided mode descriptor.

Although applications do not normally care, engine-provided mode descriptors are special in
two other ways. First, all engine-provided mode descriptors are required to implement the
EngineCreate interface which includes a single createEngine method. The Central class
uses this interface to perform the creation. Second, engine-provided mode descriptors may
extend the SynthesizerModeDesc and RecognizerModeDesc classes to encapsulate
additional features and information. Applications should not access that information if they
want to be portable, but engines will use that information when creating a running
Synthesizer or Recognizer.

4.3.3.1     Refining an Engine List

If more than one engine matches the required properties provided to
availableSynthesizers or availableRecognizers, then the list will have more than one
entry and the application must choose from amongst them.

In the simplest case, applications simply select the first in the list which is obtained using the
EngineList.first method. For example:

EngineModeDesc required;
...
EngineList list = Central.availableRecognizers(required);

if (!list.isEmpty()) {
    EngineModeDesc desc = (EngineModeDesc)(list.first());
    Recognizer rec = Central.createRecognizer(desc);
}

More sophisticated selection algorithms may test additional properties of the available
engine. For example, an application may give precedence to a synthesizer mode that has a
voice called "Victoria".

The list manipulation methods of the EngineList class are convenience methods for
advanced engine selection.

 anyMatch(EngineModeDesc) returns true if at least one mode descriptor in the list has
the required properties.

 requireMatch(EngineModeDesc) removes elements from the list that do not match the
required properties.

 rejectMatch(EngineModeDesc) removes elements from the list that match the
specified properties.

 orderByMatch(EngineModeDesc) moves list elements that match the properties to the
head of the list.

The following code shows how to use these methods to obtain a Spanish dictation recognizer
with preference given to a recognizer that has been trained for a specified speaker passed as
an input parameter.

import javax.speech.*;
import javax.speech.recognition.*;
import java.util.Locale;

Recognizer getSpanishDictation(String name)
{
    RecognizerModeDesc required = new RecognizerModeDesc();
    required.setLocale(new Locale("es", ""));
    required.setDictationGrammarSupported(Boolean.TRUE);

    // Get a list of Spanish dictation recognizers
    EngineList list = Central.availableRecognizers(required);

    if (list.isEmpty()) return null; // nothing available

    // Create a description for an engine trained for the speaker
    SpeakerProfile profile = new SpeakerProfile(null, name, null);
    RecognizerModeDesc requireSpeaker = new RecognizerModeDesc();
    requireSpeaker.addSpeakerProfile(profile);

    // Prune list if any recognizers have been trained for speaker
    if (list.anyMatch(requireSpeaker))
        list.requireMatch(requireSpeaker);

    // Now try to create the recognizer
    RecognizerModeDesc first = (RecognizerModeDesc)(list.firstElement());
    try {
        return Central.createRecognizer(first);
    } catch (SpeechException e) {
        return null;
    }
}

4.4     Engine States


4.4.1     State systems

The Engine interface includes a set of methods that define a generalized state system
manager. Here we consider the operation of those methods. In the following sections we
consider the two core state systems implemented by all speech engines: the allocation state
system and the pause-resume state system. In Chapter 5, the state system for synthesizer
queue management is described. In Chapter 6, the state systems for recognizer focus and for
recognition activity are described.

A state defines a particular mode of operation of a speech engine. For example, the output
queue moves between the QUEUE_EMPTY and QUEUE_NOT_EMPTY states. The following are the
basics of state management.
The getEngineState method of the Engine interface returns the current engine state. The
engine state is represented by a long value (64-bit value). Specified bits of the state represent
the engine being in specific states. This bit-wise representation is used because an engine
can be in more than one state at a time, and usually is during normal operation.

Every speech engine must be in one and only one of the four allocation states (described in
detail in Section 4.4.2). These states are DEALLOCATED, ALLOCATED, ALLOCATING_RESOURCES
and DEALLOCATING_RESOURCES. The ALLOCATED state has multiple sub-states. Any
ALLOCATED engine must be in either the PAUSED or the RESUMED state (described in detail in
Section 4.4.4).

Synthesizers have a separate sub-state system for queue status. Like the paused/resumed state
system, the QUEUE_EMPTY and QUEUE_NOT_EMPTY states are both sub-states of the ALLOCATED
state. Furthermore, the queue status and the paused/resumed status are independent.

Recognizers have three independent sub-state systems to the ALLOCATED state (the
PAUSED/RESUMED system plus two others). The LISTENING, PROCESSING and SUSPENDED
states indicate the current activity of the recognition process. The FOCUS_ON and FOCUS_OFF
states indicate whether the recognizer currently has speech focus. For a recognizer, all three
sub-state systems of the ALLOCATED state operate independently (with some exceptions that
are discussed in the recognition chapter).

Each of these state names is represented by a static long in which a single unique bit is set.
The & and | operators of the Java programming language are used to manipulate these state
bits. For example, the state of an allocated, resumed synthesizer with an empty speech output
queue is defined by:

(Engine.ALLOCATED | Engine.RESUMED | Synthesizer.QUEUE_EMPTY)

To test whether an engine is resumed, we use the test:

if ((engine.getEngineState() & Engine.RESUMED) != 0) ...

For convenience, the Engine interface defines two additional methods for handling engine
states. The testEngineState method is passed a state value and returns true if all the state
bits in that value are currently set for the engine. Again, to test whether an engine is resumed,
we use the test:

if (engine.testEngineState(Engine.RESUMED)) ...

Technically, the testEngineState(state) method is equivalent to:

if ((engine.getEngineState() & state) == state)...

The final state method is waitEngineState. This method blocks the calling thread until the
engine reaches the defined state. For example, to wait until a synthesizer stops speaking
because its queue is empty we use:

engine.waitEngineState(Synthesizer.QUEUE_EMPTY);
In addition to method calls, applications can monitor state through the event system. Every
state transition is marked by an EngineEvent being issued to each EngineListener attached
to the Engine. The EngineEvent class is extended by the SynthesizerEvent and
RecognizerEvent classes for state transitions that are specific to those engines. For example,
the RECOGNIZER_PROCESSING RecognizerEvent indicates a transition from the LISTENING
state to the PROCESSING state (which indicates that the recognizer has detected speech and is
producing a result).
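
For example, an application could watch pause and resume transitions by attaching a listener. A sketch, assuming the EngineAdapter convenience implementation of EngineListener in javax.speech:

import javax.speech.*;

class StateMonitor {
    // Attach a listener that reports pause/resume transitions for an engine
    static void monitor(Engine engine) {
        engine.addEngineListener(new EngineAdapter() {
            public void enginePaused(EngineEvent e) {
                System.out.println("Engine paused");
            }
            public void engineResumed(EngineEvent e) {
                System.out.println("Engine resumed");
            }
        });
    }
}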

4.4.2     Allocation State System

Engine allocation is the process in which the resources required by a speech recognizer or
synthesizer are obtained. Engines are not automatically allocated when created because
speech engines can require substantial resources (CPU, memory and disk space) and because
they may need exclusive access to an audio resource (e.g. microphone input or speaker
output). Furthermore, allocation can be a slow procedure for some engines (perhaps a few
seconds or over a minute).

The allocate method of the Engine interface requests the engine to perform allocation and
is usually one of the first calls made to a created speech engine. A newly created engine is
always in the DEALLOCATED state. A call to the allocate method is, technically speaking, a
request to the engine to transition to the ALLOCATED state. During the transition, the engine is
in a temporary ALLOCATING_RESOURCES state.

The deallocate method of the Engine interface requests the engine to perform deallocation
of its resources. All well-behaved applications call deallocate once they have finished
using an engine so that its resources are freed up for other applications. The deallocate
method returns the engine to the DEALLOCATED state. During the transition, the engine is in a
temporary DEALLOCATING_RESOURCES state.

Figure 4-1 shows the state diagram for the allocation state system.
Each block represents a state of the engine. An engine must always be in one of the four
specified states. As the engine transitions between states, the event labelled on the transition
arc is issued to the EngineListeners attached to the engine.

The normal operational state of an engine is ALLOCATED. The paused-resumed state of an
engine is described in the next section. The sub-state systems of ALLOCATED synthesizers and
recognizers are described in Chapter 5 and Chapter 6 respectively.

4.4.3     Allocated States and Call Blocking

For advanced applications, it is often desirable to start up the allocation of a speech engine in
a background thread while other parts of the application are being initialized. This can be
achieved by calling the allocate method in a separate thread. The following code shows an
example of this using an inner class implementation of the Runnable interface. To determine
when the allocation method is complete, we check later in the code for the engine being in the
ALLOCATED state.

Engine engine;

{
    // Create a recognizer (null selects a default engine for the default locale)
    engine = Central.createRecognizer(null);

    // Start allocation in a background thread
    new Thread(new Runnable() {
        public void run() {
            try {
                engine.allocate();
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }
    }).start();

    // Do other stuff while allocation takes place
    ...

    // Now wait until allocation is complete
    engine.waitEngineState(Engine.ALLOCATED);
}

A full implementation of an application that uses this approach to engine allocation needs to
consider the possibility that the allocation fails. In that case, the allocate method throws an
EngineException and the engine returns to the DEALLOCATED state.

Another issue advanced applications need to consider is call blocking. Most methods of the
Engine, Recognizer and Synthesizer are defined for normal operation in the
ALLOCATED state. What if they are called for an engine in another allocation state? For
most methods, the operation is defined as follows:
 ALLOCATED state: for nearly all methods normal behavior is defined for this state. (An
exception is the allocate method).

 ALLOCATING_RESOURCES state: most methods block in this state. The calling thread waits
until the engine reaches the ALLOCATED state. Once that state is reached, the method
behaves as normally defined.

 DEALLOCATED state: most methods are not defined for this state, so an
EngineStateError is thrown. (Exceptions include the allocate method and certain
methods listed below.)

 DEALLOCATING_RESOURCES state: most methods are not defined for this state, so an
EngineStateError is thrown.

A small subset of engine methods will operate correctly in all engine states. The
getEngineProperties method always allows runtime engine properties to be set and tested
(although properties only take effect in the ALLOCATED state). The getEngineModeDesc
method can always return the mode descriptor for the engine. Finally, the three engine state
methods - getEngineState, testEngineState and waitEngineState - always operate as
defined.

4.4.4     Pause - Resume State System

All ALLOCATED speech engines have PAUSED and RESUMED states. Once an engine reaches the
ALLOCATED state, it enters either the PAUSED or the RESUMED state. The factors that affect the
initial PAUSED/RESUMED state are described below.

The PAUSED/RESUMED state indicates whether the audio input or output of the engine is on or
off. A resumed recognizer is receiving audio input. A paused recognizer is ignoring audio
input. A resumed synthesizer produces audio output as it speaks. A paused synthesizer is not
producing audio output.

As part of the engine state system, the Engine interface provides several methods to test
PAUSED/RESUMED state. The general state system is described previously in Section 4.4.

An application controls an engine's PAUSED/RESUMED state with the pause and resume
methods. An application may pause or resume an engine indefinitely. Each time the
PAUSED/RESUMED state changes, an ENGINE_PAUSED or ENGINE_RESUMED type of EngineEvent
is issued to each EngineListener attached to the Engine.
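
For example, a recognizer's audio input might be tied to a user-operated microphone button. A sketch (resume is wrapped because it can fail, for instance if audio is unavailable):

import javax.speech.AudioException;
import javax.speech.recognition.Recognizer;

class MicrophoneButton {
    // Turn the recognizer's audio input on or off in response to the user
    static void setMicrophoneOn(Recognizer rec, boolean on) {
        try {
            if (on)
                rec.resume();   // audio input is delivered to the recognizer again
            else
                rec.pause();    // audio input is ignored while paused
        } catch (AudioException e) {
            e.printStackTrace();
        }
    }
}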

Figure 4-2 shows the basic pause and resume diagram for a speech engine. As a sub-state
system of the ALLOCATED state, the pause and resume states are represented within the
ALLOCATED state as shown in Figure 4-1.

As with Figure 4-1, Figure 4-2 represents states as labelled blocks, and the engine events as
labelled arcs between those blocks. In this diagram the large block is the ALLOCATED state
which contains both the PAUSED and RESUMED states.

4.4.5     State Sharing

The PAUSED/RESUMED state of a speech engine may, in many situations, be shared by multiple
applications. Here we must make a distinction between the Java object that represents a
Recognizer or Synthesizer and the underlying engine that may have multiple Java and
non-Java applications connected to it. For example, in personal computing systems (e.g.,
desktops and laptops), there is typically a single engine running and connected to microphone
input or speaker/headphone output, and all applications share that resource.

When a Recognizer or Synthesizer (the Java software object) is paused and resumed the
shared underlying engine is paused and resumed and all applications connected to that engine
are affected.

There are three key implications from this architecture:

 An application should pause and resume an engine only in response to a user request (e.g.,
because a microphone button is pressed for a recognizer). For example, it should not pause
an engine before deallocating it.

 A Recognizer or Synthesizer may be paused and resumed because of a request by
another application. The application will receive an ENGINE_PAUSED or ENGINE_RESUMED
event and the engine state value is updated to reflect the current engine state.

 Because an engine could be resumed without an explicit request, an application should always
be prepared for that resume. For example, it should not place text on the synthesizer's
output queue unless it would expect it to be spoken upon a resume. Similarly, the set of
enabled grammars of a recognizer should always be appropriate to the application context,
and the application should be prepared to accept input results from the recognizer if it
is unexpectedly resumed.
4.4.6     Synthesizer Pause

For a speech synthesizer - a speech output device - pause immediately stops the audio output
of synthesized speech. Resume recommences speech output from the point at which the pause
took effect. This is analogous to pause and resume on a tape player or CD player.

Chapter 5 describes an additional state system of synthesizers. An ALLOCATED Synthesizer
has sub-states for QUEUE_EMPTY and QUEUE_NOT_EMPTY. This represents whether there is text
on the speech output queue of the synthesizer that is being spoken or waiting to be spoken.
The queue state and pause/resume state are independent. It is possible, for example, for a
RESUMED synthesizer to have an empty output queue (QUEUE_EMPTY state). In this case, the
synthesizer is silent because it has nothing to say. If any text is provided to be spoken, speech
output will start immediately because the synthesizer is RESUMED.

4.4.7     Recognizer Pause

For a recognizer, pausing and resuming turns audio input off and on and is analogous to
switching the microphone off and on. When audio input is off the audio is lost. Unlike a
synthesizer, for which a resume continues speech output from the point at which it was
paused, resuming a recognizer restarts the processing of audio input from the time at which
resume is called.

Under normal circumstances, pausing a recognizer will stop the recognizer's internal
processes that match audio against grammars. If the user was in the middle of speaking at the
instant at which the recognizer was paused, the recognizer is forced to finalize its recognition
process. This is because a recognizer cannot assume that the audio received just before
pausing is in any way linked to the audio data that it will receive after being resumed.
Technically speaking, pausing introduces a discontinuity into the audio input stream.

One complexity for pausing and resuming a recognizer (not relevant to synthesizers) is the
role of internal buffering. For various reasons, described in Chapter 6, a recognizer has a
buffer for audio input which mediates between the audio device and the internal components
of the recognizer that perform the match of the audio to the grammars. If the recognizer is
performing in real time, the buffer is empty or nearly empty. If the recognizer is temporarily
suspended or operates slower than real time, then the buffer may contain seconds of audio or
more.

When a recognizer is paused, the pause takes effect on the input end of the buffer; i.e., the
recognizer stops putting data into the buffer. At the other end of the buffer - where the actual
recognition is performed - the recognizer continues to process audio data until the buffer is
empty. This means that the recognizer can continue to produce recognition results for a
limited period of time even after it has been paused. (A Recognizer also provides a
forceFinalize method with an option to flush the audio input buffer.)

Chapter 6 describes an additional state system of recognizers. An ALLOCATED Recognizer
has a separate sub-state system for LISTENING, PROCESSING and SUSPENDED. These states
indicate the current activity of the internal recognition process. These states are largely
decoupled from the PAUSED and RESUMED states except that, as described in detail in Chapter
6, a paused recognizer eventually returns to the LISTENING state when it runs out of audio
input (the LISTENING state indicates that the recognizer is listening to background silence, not
to speech).

The SUSPENDED state of a Recognizer is superficially similar to the PAUSED state. In the
SUSPENDED state the recognizer is not processing audio input from the buffer, but is
temporarily halted while an application updates its grammars. A key distinction between the
PAUSED state and the SUSPENDED state is that in the SUSPENDED state audio input can still
be coming into the audio input buffer. When the recognizer leaves the SUSPENDED state the
audio is processed. The SUSPENDED state allows a user to continue talking to the recognizer
even while the recognizer is temporarily SUSPENDED. Furthermore, by updating grammars in
the SUSPENDED state, an application can apply multiple grammar changes instantaneously
with respect to the audio input stream.
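
A sketch of that pattern, updating grammars while the recognizer is SUSPENDED so the changes are applied together (the suspend and commitChanges methods are covered in Chapter 6; exception handling is simplified):

import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RuleGrammar;

class GrammarSwitcher {
    // Swap which grammar is enabled, committing both changes at once
    static void switchGrammars(Recognizer rec, RuleGrammar oldGram, RuleGrammar newGram) {
        try {
            rec.suspend();              // audio keeps buffering while grammars are updated
            oldGram.setEnabled(false);
            newGram.setEnabled(true);
            rec.commitChanges();        // apply all pending changes together
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}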

4.5     Speech Events


Speech engines, both recognizers and synthesizers, generate many types of events.
Applications are not required to handle all events; however, some events are particularly
important for implementing speech applications. For example, some result events must be
processed to receive recognized text from a recognizer.

Java Speech API events follow the JavaBeans event model. Events are issued to a listener
attached to an object involved in generating that event. All the speech events are derived from
the SpeechEvent class in the javax.speech package.

The events of the javax.speech package are listed in Table 4-4.

Table 4-4 Speech events: javax.speech package

Name                Description

SpeechEvent         Parent class of all speech events.

EngineEvent         Indicates a change in speech engine state.

AudioEvent          Indicates an audio input or output event.

EngineErrorEvent    Sub-class of EngineEvent that indicates an asynchronous problem has occurred in the engine.

The events of the javax.speech.synthesis package are listed in Table 4-5.

Table 4-5 Speech events: javax.speech.synthesis package

Name   Description  

SynthesizerEvent   Extends the EngineEvent for the specialized events of a Synthesizer.  

SpeakableEvent   Indicates the progress in output of synthesized text.  

The events of the javax.speech.recognition package are listed in Table 4-6.

Table 4-6 Speech events: javax.speech.recognition package

Name   Description  

RecognizerEvent   Extends the EngineEvent for the specialized events of a Recognizer.  

GrammarEvent   Indicates an update of or a status change of a recognition grammar.  

ResultEvent   Indicates status and data changes of recognition results.  

RecognizerAudioEvent   Extends AudioEvent with events for start and stop of speech and
audio level updates.  

4.5.1     Event Synchronization

A speech engine is required to provide all its events in synchronization with the AWT event
queue whenever possible. The reason for this constraint is that it simplifies the integration of
speech events with AWT events and the Java Foundation Classes events (e.g., keyboard,
mouse and focus events). This constraint does not adversely affect applications that do not
provide graphical interfaces.

Synchronization with the AWT event queue means that the AWT event queue is not issuing
another event when the speech event is being issued. To implement this, speech engines need
to place speech events onto the AWT event queue. The queue is obtained through the AWT
Toolkit:

EventQueue q = Toolkit.getDefaultToolkit().getSystemEventQueue();

The EventQueue runs a separate thread for event dispatch. Speech engines are not required to
issue the events through that thread, but should ensure that thread is blocked while the speech
event is issued.
Note that SpeechEvent is not a sub-class of AWTEvent, and that speech events are not
actually placed directly on the AWT event queue. Instead, a speech engine is performing
internal activities to keep its internal speech event queue synchronized with the AWT event
queue to make an application developer's life easier.

4.6     Other Engine Functions


4.6.1     Runtime Engine Properties

Speech engines each have a set of properties that can be changed while the engine is running.
The EngineProperties interface defined in the javax.speech package is the root interface
for accessing runtime properties. It is extended by the SynthesizerProperties interface
defined in the javax.speech.synthesis package, and the RecognizerProperties
interface defined in the javax.speech.recognition package.

For any engine, the EngineProperties is obtained by calling the getEngineProperties
method defined in the Engine interface. To avoid casting the return object, the
getSynthesizerProperties method of the Synthesizer interface and the
getRecognizerProperties method of the Recognizer interface are also provided to return
the appropriate type. For example:

{
Recognizer rec = ...;
RecognizerProperties props = rec.getRecognizerProperties();
}

The EngineProperties interface provides three types of functionality.

 The addPropertyChangeListener and removePropertyChangeListener methods
add or remove a JavaBeans PropertyChangeListener. The listener receives an event
notification any time a property value changes.

 The getControlComponent method returns an engine-provided AWT Component, or
null if one is not provided by the engine. This component can be displayed for a user to
modify the engine properties. In some cases this component may allow customization of
properties that are not programmatically accessible.

 The reset method is used to set all engine properties to default values.

The SynthesizerProperties and RecognizerProperties interfaces define the sets of
runtime features of those engine types. The specific properties defined by these interfaces
are described in Chapter 5 and Chapter 6 respectively.

For each property there is a get and a set method, both using the JavaBeans property patterns.
For example, the methods for handling a synthesizer's speaking volume are:

float getVolume();

void setVolume(float volume) throws PropertyVetoException;

The get method returns the current setting. The set method attempts to set a new volume. A
set method throws an exception if it fails. Typically, this is because the engine rejects the set
value. In the case of volume, the legal range is 0.0 to 1.0. Values outside of this range cause
an exception.

The set methods of the SynthesizerProperties and RecognizerProperties interfaces are
asynchronous - they may return before the property change takes effect. For example, a
change in the voice of a synthesizer may be deferred until the end of the current word, the
current sentence or even the current document. So that an application knows when a change
occurs, a PropertyChangeEvent is issued to each PropertyChangeListener attached to the
properties object.

A property change event may also be issued because another application has changed a
property, because changing one property affects another (e.g., changing a synthesizer's voice
from male to female will usually cause an increase in the pitch setting), or because the
property values have been reset.
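The following sketch (assuming an ALLOCATED Synthesizer referred to here as synth) attaches a
JavaBeans PropertyChangeListener and then requests a volume change; the listener is notified
when the possibly deferred change actually takes effect:

import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyVetoException;
import javax.speech.synthesis.SynthesizerProperties;

SynthesizerProperties props = synth.getSynthesizerProperties();

props.addPropertyChangeListener(new PropertyChangeListener() {
    public void propertyChange(PropertyChangeEvent evt) {
        // called when the engine has actually applied the change
        System.out.println(evt.getPropertyName() + " is now " + evt.getNewValue());
    }
});

try {
    props.setVolume(0.8f);   // may return before the new volume is in effect
} catch (PropertyVetoException e) {
    // the engine rejected the requested value
}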

4.6.2     Audio Management

The AudioManager of a speech engine is provided for management of the engine's speech
input or output. For the Java Speech API Version 1.0 specification, the AudioManager
interface is minimal. As the audio streaming interfaces for the Java platform are established,
the AudioManager interface will be enhanced for more advanced functionality.

For this release, the AudioManager interface defines the ability to attach and remove
AudioListener objects. For this release, the AudioListener interface is simple: it is empty.
However, the RecognizerAudioListener interface extends the AudioListener interface to
receive three audio event types (SPEECH_STARTED, SPEECH_STOPPED and AUDIO_LEVEL
events). These events are described in detail in Chapter 6. As a type of AudioListener, a
RecognizerAudioListener is attached and removed through the AudioManager.

4.6.3     Vocabulary Management

An engine can optionally provide a VocabManager for control of the pronunciation of words
and other vocabulary. This manager is obtained by calling the getVocabManager method of a
Recognizer or Synthesizer (it is a method of the Engine interface). If the engine does not
support vocabulary management, the method returns null.
The manager defines a list of Word objects. Words can be added to the VocabManager,
removed from the VocabManager, and searched through the VocabManager.

The Word class is defined in the javax.speech package. Each Word is defined by the
following features.

 Written form: a required String that defines how the Word should be presented visually.

 Spoken form: an optional String that indicates how the Word is spoken. For English, the
spoken form might be used for defining how acronyms are spoken. For Japanese, the spoken
form could provide a kana representation of how kanji in the written form is pronounced.

 Pronunciations: an optional String array containing one or more phonemic representations
of the pronunciations of the Word. The International Phonetic Alphabet subset of Unicode is
used throughout the Java Speech API for representing pronunciations.

 Grammatical categories: an optional set of or'ed grammatical categories. The Word class
defines 16 different classes of words (noun, verb, conjunction etc.). These classes do not
represent a complete linguistic breakdown of all languages. Instead they are intended to
provide a Recognizer or Synthesizer with additional information about a word that
may assist in correctly recognizing or correctly speaking it.

5.6     Synthesizer Properties


The SynthesizerProperties interface extends the EngineProperties interface described
in Section 4.6.1. The JavaBeans property mechanisms, the asynchronous application of
property changing, and the property change event notifications are all inherited engine
behavior and are described in that section.

The SynthesizerProperties object is obtained by calling the getEngineProperties
method (inherited from the Engine interface) or the getSynthesizerProperties method.
Both methods return the same object instance, but the latter is more convenient since it
returns an appropriately cast object.

The SynthesizerProperties interface defines five synthesizer properties that can be
modified during operation of a synthesizer to affect speech output.

The voice property is used to control the speaking voice of the synthesizer. The set of voices
supported by a synthesizer can be obtained by the getVoices method of the synthesizer's
SynthesizerModeDesc object. Each voice is defined by a voice name, gender, age and
speaking style. Selection of voices is described in more detail in Selecting Voices.

The remaining four properties control prosody. Prosody is a set of features of speech
including the pitch and intonation, rhythm and timing, stress and other characteristics which
affect the style of the speech. The prosodic features controlled through the
SynthesizerProperties interface are:

 Volume: a float value that is set on a scale from 0.0 (silence) to 1.0 (loudest).

 Speaking rate: a float value indicating the speech output rate in words per minute.
Higher values indicate faster speech output. Reasonable speaking rates depend upon
the synthesizer and the current voice (voices may have different natural speeds).
Speaking rate is also dependent upon the language because of different conventions
for what is a "word". For English, a typical speaking rate is around 200 words per
minute.

 Pitch: the baseline pitch is a float value given in Hertz. Different voices have different
natural-sounding ranges of pitch. Typical male voices are between 80 and 180 Hertz.
Female pitches typically vary from 150 to 300 Hertz.

 Pitch range: a float value indicating a preferred range for variation in pitch above the
baseline setting. A narrow pitch range provides monotonous output, while a wide range
provides a more lively voice. The pitch range is typically between 20% and 80% of the
baseline pitch.
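A minimal sketch of adjusting these prosodic values, assuming an allocated Synthesizer named
synth; values outside the engine's supported range cause a PropertyVetoException:

import java.beans.PropertyVetoException;
import javax.speech.synthesis.SynthesizerProperties;

SynthesizerProperties props = synth.getSynthesizerProperties();
try {
    props.setSpeakingRate(180.0f);   // words per minute
    props.setPitch(120.0f);          // baseline pitch in Hertz
    props.setPitchRange(60.0f);      // preferred variation above the baseline
} catch (PropertyVetoException e) {
    // one of the requested values was rejected by the engine
}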

5.4     Speech Output Queue


Each call to the speak and speakPlainText methods places an object onto the synthesizer's
speech output queue. The speech output queue is a FIFO queue: first-in-first-out. This means
that objects are spoken in the order in which they are received.

The top of queue item is the head of the queue. The top of queue item is the item currently
being spoken or is the item that will be spoken next when a paused synthesizer is resumed.

The Synthesizer interface provides a number of methods for manipulating the output queue.
The enumerateQueue method returns an Enumeration object containing a
SynthesizerQueueItem for each object on the queue. The first object in the enumeration is
the top of queue. If the queue is empty the enumerateQueue method returns null.

Each SynthesizerQueueItem in the enumeration contains four properties. Each property has
an accessor method:

 getSource returns the source object for the queue item. The source is the object
passed to the speak and speakPlainText method: a Speakable object, a URL or a
String.

 getText returns the text representation for the queue item. For a Speakable object it
is the String returned by the getJSMLText method. For a URL it is the String loaded
from that URL. For a string source, it is that string object.

 isPlainText allows an application to distinguish between plain text and JSML objects. If
this method returns true the string returned by getText is plain text.

 getSpeakableListener returns the listener object to which events associated with this
item will be sent. If no listener was provided in the call to speak or speakPlainText then
the call returns null.
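A short sketch of inspecting the queue, assuming an allocated Synthesizer named synth;
enumerateQueue returns null when the queue is empty:

import java.util.Enumeration;
import javax.speech.synthesis.SynthesizerQueueItem;

Enumeration queue = synth.enumerateQueue();
if (queue != null) {
    while (queue.hasMoreElements()) {
        SynthesizerQueueItem item = (SynthesizerQueueItem) queue.nextElement();
        String kind = item.isPlainText() ? "plain text" : "JSML";
        System.out.println(kind + ": " + item.getText());
    }
}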
The state of the queue is an explicit state of the Synthesizer. The Synthesizer interface
defines a state system for QUEUE_EMPTY and QUEUE_NOT_EMPTY. Any Synthesizer in the
ALLOCATED state must be in one and only one of these two states.

The QUEUE_EMPTY and QUEUE_NOT_EMPTY states are parallel states to the PAUSED and
RESUMED states. These two state systems operate independently as shown in Figure 5-1 (an
extension of Figure 4-2).

To install JSAPI, extract the downloaded file, freetts-1.2beta2-bin.zip, to the C drive and add
the JAR files in the lib directory of the free implementation to the classpath by executing the
following command at the command prompt:

set classpath=%classpath%;c:\freetts-bin-1_2_beta\lib\freetts.jar;c:\freetts-bin-1_2_beta\lib\cmulex.jar;c:\freetts-bin-1_2_beta\lib\jsapi.jar;

The javax.speech Package

The javax.speech package contains classes and interfaces that define how the speech engine
functions. A speech engine is a system that manages speech input and output. The
javax.speech package defines the basic properties of a speech engine.

The commonly used classes of the javax.speech package are:

 AudioEvent
 Central

 EngineModeDesc

 EngineList

The commonly used interfaces of the javax.speech package are:

 Engine

 AudioManager

 VocabManager

The AudioEvent Class

The AudioEvent class specifies the events related to audio input for the speech recognizer
and audio output for speech synthesis. The AudioEvent class defines a method,
paramString(), which returns a parameter string to identify the event that occurred. This
method is used for debugging and for maintaining event logs.

The Central Class

The Central class allows you to access all the speech input and output functions of a speech
engine. This class provides methods to locate, select, and create speech engines, such as
speech recognizers and speech synthesizers. A Java application can use a speech engine if the
speech engine is registered with the Central class. The various methods declared in the
Central class are:

 availableRecognizers(): Returns a list of available speech recognizers according to the
required properties specified in the input parameter, such as the EngineModeDesc
class or the RecognizerModeDesc class. If the parameter passed is null, the
availableRecognizers() method lists all the available known recognizers.

 availableSynthesizers(): Returns a list of available synthesizers according to the
required properties specified in the input parameter, such as the EngineModeDesc
class. If the parameter passed is null, the availableSynthesizers() method lists all the
available known synthesizers.

 createRecognizer(): Creates a recognizer according to the specified properties in the
input parameter, such as the EngineModeDesc class or the RecognizerModeDesc
class. The createRecognizer() method returns null if there is no recognizer with the
specified properties.

 createSynthesizer(): Creates a synthesizer according to the specified properties in the
input parameter, such as the EngineModeDesc class or the SynthesizerModeDesc
class. The createSynthesizer() method returns null if there is no synthesizer with the
specified properties.
 registerEngineCentral(): Registers a speech engine with the Central class. The
registerEngineCentral() method takes an object of the String class as an input
parameter. The registerEngineCentral() method adds the specified class name to the
list of engines.
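A minimal sketch of engine creation through Central, assuming the Locale-based
SynthesizerModeDesc constructor; createSynthesizer returns null when no registered engine
matches:

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

try {
    SynthesizerModeDesc required = new SynthesizerModeDesc(Locale.ENGLISH);
    Synthesizer synth = Central.createSynthesizer(required);
    if (synth == null) {
        System.err.println("No matching synthesizer is registered with Central.");
    }
} catch (Exception e) {   // for example, an EngineException during creation
    e.printStackTrace();
}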

The EngineModeDesc Class

The EngineModeDesc class defines the basic properties of a speech engine that determine the
mode of operation, such as a Spanish or English dictation mode. The various methods declared
in the EngineModeDesc class are:

 getEngineName(): Returns the engine name, which should be a unique string across
the provider.

 setEngineName(): Sets the name of the engine as provided in the input parameter
string.

 getModeName(): Returns the mode name, which uniquely identifies the single mode
of operation of the speech engine.

 setModeName(): Sets the mode name as provided in the input parameter string.

 getLocale(): Returns the object of the Locale class for the engine mode.

 setLocale(): Sets the Locale of the engine according to the specified input parameter,
which is an object of the Locale class.

 getRunning(): Returns a Boolean value indicating whether or not the speech engine is
already running.

 setRunning(): Sets the feature required to run the engine, according to the Boolean
input parameter.

 match(): Returns a Boolean value to determine whether or not the EngineModeDesc
object input parameter has all the defined features.

 equals(): Returns a Boolean value, which is true if the EngineModeDesc object input
parameter is not null and has equal values for engine name, mode name, and Locale.

The EngineList Class

The EngineList class selects the appropriate speech engine with the help of the methods of
the Central class. The EngineList class contains a set of EngineModeDesc class objects. The
various methods available in the EngineList class are:

 anyMatch(): Returns a Boolean value, which is true if one or more of the
EngineModeDesc class objects in the EngineList class match the EngineModeDesc
class object in the input parameter.
 requireMatch(): Removes the EngineModeDesc class object entries from the
EngineList class that do not match the EngineModeDesc class object specified as the
input parameter. For each EngineModeDesc class object in the list, the match method
is called. If the match method returns false, the corresponding entry is removed from
the list.

 rejectMatch(): Removes the EngineModeDesc class object entries from the
EngineList class that match the EngineModeDesc class object specified as the input
parameter. For each EngineModeDesc class object in the list, the match method is
called. If the match method returns true, the corresponding entry is removed from the
list.

 orderByMatch(): Orders the list that matches the required features. This method takes
the EngineModeDesc class object as an input parameter.
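A sketch of filtering and ordering an engine list, assuming the method names listed above and
that EngineList exposes the usual Vector access methods; passing null lists every known
synthesizer:

import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.synthesis.SynthesizerModeDesc;

EngineList list = Central.availableSynthesizers(null);
SynthesizerModeDesc required = new SynthesizerModeDesc(Locale.US);
list.requireMatch(required);      // drop engines that do not match
list.orderByMatch(required);      // best matches first
if (!list.isEmpty()) {
    SynthesizerModeDesc best = (SynthesizerModeDesc) list.elementAt(0);
    System.out.println("Best match: " + best.getEngineName());
}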

The Engine Interface

The Engine interface is the parent interface for all speech engines. The speech engines derive
functionality, such as the allocation and deallocation methods, access to the EngineProperties
and EngineModeDesc classes, and the pause() and resume() methods, from the Engine
interface. Some of the methods defined by the Engine interface are:

 allocate(): Allocates the resources required by the Engine interface and sets the state
of the Engine interface as ALLOCATED. When the method executes, the Engine
interface is in the ALLOCATING_RESOURCES state.

 deallocate(): Deallocates the resources of the engine, which are acquired at the
ALLOCATED state and during the operation. This method sets the state of the engine
as DEALLOCATED.

 pause(): Pauses the audio stream of the engine and sets the state of the engine as
PAUSED.

 resume(): Resumes the audio streaming to or from a paused engine and sets the state
of the engine to RESUMED.

 getEngineState(): Returns the current state of the Engine interface.
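The typical life-cycle, sketched here under the assumption of an engine reference named
engine obtained from the Central class, looks like this:

import javax.speech.Engine;

try {
    engine.allocate();                          // DEALLOCATED -> ALLOCATING_RESOURCES -> ALLOCATED
    engine.waitEngineState(Engine.ALLOCATED);   // allocation may complete asynchronously
    engine.resume();                            // start audio input or output
    // ... use the recognizer or synthesizer ...
    engine.pause();                             // stop audio input or output
    engine.deallocate();                        // release the engine's resources
} catch (Exception e) {
    e.printStackTrace();
}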

The AudioManager Interface

The AudioManager interface allows an application to control and monitor the audio input and
output, and other audio-related events, such as start and stop audio. The methods provided by
this interface are:

 addAudioListener(): Requests notifications of audio events to the AudioListener
object specified as an input parameter.

 removeAudioListener(): Removes the object of the AudioListener interface specified
as an input parameter from the AudioManager interface.
The VocabManager Interface

The VocabManager interface manages words that the speech engine uses. This interface
provides information about difficult words to the speech engine. Some of the methods
provided by this interface are:

 addWord(): Adds a word to the vocabulary of the speech engine. This method takes
an object of the Word class as an input parameter.

 addWords(): Adds an array of words to the vocabulary of the speech engine. This
method takes an object array of the Word class as an input parameter.

 removeWord(): Removes a word from the vocabulary of the speech engine. This
method takes an object of the Word class as an input parameter.

 removeWords(): Removes an array of words from the vocabulary of the speech
engine. This method takes an object array of the Word class as an input parameter.

 listProblemWords(): Returns an array of words that cause problems due to spelling
mistakes.
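A small sketch, assuming an allocated engine named engine; getVocabManager returns null
when vocabulary management is not supported:

import javax.speech.VocabManager;
import javax.speech.Word;

VocabManager vocab = engine.getVocabManager();
if (vocab != null) {
    Word[] problems = vocab.listProblemWords();
    if (problems != null) {
        System.out.println(problems.length + " words may need explicit pronunciations");
    }
    // Word objects built with the javax.speech.Word API can then be passed to addWord().
}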


The javax.speech.recognition Package

The javax.speech.recognition package provides classes and interfaces that support speech
recognition. This package inherits the basic functioning from the javax.speech package. The
speech recognizer is a type of speech engine that has the ability to recognize and convert
incoming speech to text.

The commonly used classes of the javax.speech.recognition package are:

 RecognizerModeDesc

 Rule

 GrammarEvent

The commonly used interfaces of the javax.speech.recognition package are:

 Grammar

 Recognizer

 Result
The RecognizerModeDesc Class

The RecognizerModeDesc class extends the basic functioning of the EngineModeDesc class
with properties specific to a speech recognizer. Some commonly used methods of the
RecognizerModeDesc class are:

 isDictationGrammarSupported(): Returns a Boolean value indicating whether or not
the engine mode provides an object of the DictationGrammar interface.

 addSpeakerProfile(): Adds a speaker profile specified in an input parameter to the
object array of the SpeakerProfile class.

 match(): Returns a Boolean value depending on whether or not the
RecognizerModeDesc object contains all the features specified by the input
parameter. The input parameter can be an object of the RecognizerModeDesc class or
the EngineModeDesc class. For the EngineModeDesc class, the match() method
checks whether or not all the features supported by the EngineModeDesc class are
defined.

 getSpeakerProfiles(): Returns an array of the SpeakerProfile class containing a list of
speaker profiles known to the current mode of the speech recognizer.

The Rule Class

The Rule class defines the basic component of the RuleGrammar interface. The methods
provided by this class are:

 copy(): Returns a copy of the Rule class and all its subrules, which includes the
RuleAlternatives, RuleCount, RuleParse, RuleSequence, and RuleTag classes.

 toString(): Returns a string representing the portion of Java Speech Grammar Format
(JSGF) that appears on the right of a rule definition.

The Recognizer Interface

The Recognizer interface extends the functioning of the Engine interface of the javax.speech
package. The Recognizer interface is created by using the createRecognizer() method of the
Central class. Some methods defined in the Recognizer interface are:

 newRuleGrammar(): Creates a new object of the RuleGrammar interface for the
Recognizer interface with the name specified as the input string parameter.

 getRuleGrammar(): Returns the object of the RuleGrammar interface with the name
specified as the input string parameter. If the grammar is not known to the Recognizer
interface, the method returns a null value.

 getDictationGrammar(): Returns the dictation grammar corresponding to the name
specified in the input string parameter.

 commitChanges(): Commits changes to the loaded types of grammar.

 removeResultListener(): Removes an object of the ResultListener interface, specified
as the input parameter, from the recognizer.

 getSpeakerManager(): Returns an object of the SpeakerManager interface that allows
management of the speakers of a Recognizer interface, such as storing speaker data.

 suspend(): Suspends speech recognition temporarily and places the Recognizer
interface in the SUSPENDED state. The incoming audio is buffered while the
recognizer is suspended.
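A sketch of the suspend/commit pattern, assuming an allocated Recognizer named rec and a
previously loaded grammar with the hypothetical name "commands":

import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RuleGrammar;

try {
    rec.suspend();                                   // buffer audio while grammars change
    RuleGrammar grammar = rec.getRuleGrammar("commands");
    if (grammar != null) {
        grammar.setEnabled(true);                    // activated once the recognizer has focus
    }
    rec.commitChanges();                             // apply the changes
} catch (Exception e) {                              // for example, a GrammarException
    e.printStackTrace();
}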

The Result Interface

The Result interface represents incoming audio that matched an active grammar object,
which is an object of the Grammar interface. When incoming speech is recognized, the Result
interface provides information such as the sequence of finalized and unfinalized words, the
matched grammar, and the result state. The result state can be UNFINALIZED, ACCEPTED,
or REJECTED. A new object of the Result interface is created when the recognizer identifies
incoming speech that matches an active grammar. Some methods of the Result interface
are:

 getResultState(): Returns the current state of the Result interface object in the form of
an integer. The values can be UNFINALIZED, ACCEPTED, and REJECTED.

 getGrammar(): Returns an object of the Grammar interface that matches the finalized
tokens of the Result interface.

 numTokens(): Returns the integer number of the finalized tokens in the Result
interface.

 removeResultListener(): Removes a listener from the Result interface that
corresponds to the object of the ResultListener interface input parameter.

 getBestTokens(): Returns an array of all the finalized tokens for the Result interface.
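A minimal listener sketch, assuming ResultToken's getSpokenText accessor; an instance would
be attached with rec.addResultListener(new PrintingListener()):

import javax.speech.recognition.Result;
import javax.speech.recognition.ResultAdapter;
import javax.speech.recognition.ResultEvent;
import javax.speech.recognition.ResultToken;

class PrintingListener extends ResultAdapter {
    public void resultAccepted(ResultEvent e) {
        // print the best-guess tokens of every accepted result
        Result result = (Result) e.getSource();
        for (int i = 0; i < result.numTokens(); i++) {
            ResultToken token = result.getBestToken(i);
            System.out.print(token.getSpokenText() + " ");
        }
        System.out.println();
    }
}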

The javax.speech.synthesis Package

The javax.speech.synthesis package provides classes and interfaces that support synthesis of
speech. A speech synthesizer is a speech engine that converts text to speech. A synthesizer is
created, selected, and searched through the Central class of the javax.speech package. Some
commonly used classes of the javax.speech.synthesis package are:

 Voice

 SynthesizerModeDesc

Some commonly used interfaces of the javax.speech.synthesis package are:

 Synthesizer

 SynthesizerProperties
The Voice Class

The Voice class defines one output voice for the speech synthesizer. The class supports
fields, such as GENDER_MALE, GENDER_FEMALE, AGE_CHILD, and
AGE_TEENAGER to describe the synthesizer voice. Some methods provided by the Voice
class are:

 getName(): Returns the voice name as a string.

 setName(): Sets the voice name according to the input string parameter.

 getGender(): Returns the integer value of the gender of the voice.

 setGender(): Sets the voice gender according to the specified integer input parameter.

 getAge(): Returns the integer value of the age of the voice.

 clone(): Creates a copy of the voice.

 match(): Returns a Boolean value specifying whether or not the Voice class has all the
features corresponding to the voice object in the input parameter.

The SynthesizerModeDesc Class

The SynthesizerModeDesc class extends the functioning of the EngineModeDesc class of the
javax.speech package. Apart from the engine name, locale, mode name, and running
properties inherited from the EngineModeDesc class, the SynthesizerModeDesc class
includes two properties, the voice to be loaded when the synthesizer is started and the list of
voices provided by the synthesizer. Some methods provided by the SynthesizerModeDesc
class are:

 addVoice(): Adds a voice, specified in the voice input parameter, to the existing list of
voices.

 equals(): Returns a Boolean value, which is true if the object of the
SynthesizerModeDesc class and the specified input parameter have equal values of
properties, such as engine name, locale, mode name, and all voices.

 match(): Returns a Boolean value depending on whether or not the object of the
SynthesizerModeDesc class has all the features specified by the input parameter. The
input parameter can be SynthesizerModeDesc or EngineModeDesc. If the input
parameter is EngineModeDesc, the method checks only for the features of the
EngineModeDesc class.

 getVoices(): Returns an array of the voices available in the synthesizer.


The Synthesizer Interface

The Synthesizer interface provides an extension to the Engine interface of the javax.speech
package. The Synthesizer interface is created by using the createSynthesizer() method of the
Central class. Some methods defined by the Synthesizer interface are:

 speak(): Reads out text from a Uniform Resource Locator (URL) that has been
formatted with the Java Speech Markup Language (JSML). This method accepts two
input parameters, the URL containing the JSML text and the SpeakableListener
interface object to which the Synthesizer interface sends notifications of events.
The Synthesizer interface checks the text specified in the URL for JSML formatting
and places it in the output queue.

 speakPlainText(): Reads out a plain text string. This method accepts two input
parameters, the string containing text and the SpeakableListener interface object to
which the notifications of events are sent during the synthesis process.

 phoneme(): Returns the phoneme string for the corresponding text string input
parameter. The input string can be simple text without JSML formatting.

 enumerateQueue(): Returns an enumeration containing the list of all the objects
present in the output queue. This method returns the objects placed on the speech
output queue by the current application only. The top of the queue represents the first
item.

 cancelAll(): Cancels all the objects in the speech output queue and stops the audio
process of the current object in the top of the queue.
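A short sketch, assuming an allocated and resumed Synthesizer named synth: queue a plain
text string and block until the whole output queue has been spoken.

import javax.speech.synthesis.Synthesizer;

try {
    synth.speakPlainText("Hello from the Java Speech API.", null);
    synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
} catch (InterruptedException e) {
    // interrupted before speech output finished
}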

The SynthesizerProperties Interface

The SynthesizerProperties interface provides an extension to the EngineProperties interface
of the javax.speech package. This interface allows you to control the runtime properties,
such as voice, speech rate, pitch range, and volume. Some methods provided by the
SynthesizerProperties interface are:

 getVoice(): Returns the current synthesizer’s voice.

 setVoice(): Sets the current synthesizer’s voice according to the specified voice input
parameter.

 getPitch(): Returns the baseline pitch for synthesis as a float value.

 setPitchRange(): Sets the pitch range according to the input float parameter.

 setSpeakingRate(): Sets the target speech rate according to the input float parameter.
The rate is usually represented as number of words per minute.

 getVolume(): Returns the volume of speech.



FinalResult

public abstract interface Result

A Result is issued by a Recognizer as it recognizes an incoming utterance that matches an
active Grammar. The Result interface provides the application with access to the following
information about a recognized utterance:

1. A sequence of finalized tokens (words) that have been recognized,
2. A sequence of unfinalized tokens,
3. Reference to the grammar matched by the result,
4. The result state: UNFINALIZED, ACCEPTED or REJECTED.

Multiple Result Interfaces

Every Result object provided by a Recognizer implements both the FinalRuleResult
and FinalDictationResult interfaces. Thus, by extension every result also implements
the FinalResult and Result interfaces.

These multiple interfaces are designed to explicitly indicate (a) what information is available
at what times in the result life-cycle and (b) what information is available for different types
of results. Appropriate casting of results allows compile-time checking of result-handling
code and fewer bugs.

The FinalResult extends the Result interface. It provides access to the additional
information about a result that is available once it has been finalized (once it is in either of the
ACCEPTED or REJECTED states). Calling any method of the FinalResult interface for a result
in the UNFINALIZED state causes a ResultStateError to be thrown.

The FinalRuleResult extends the FinalResult interface. It provides access to the
additional information about a finalized result that matches a RuleGrammar. Calling any
method of the FinalRuleResult interface for a non-finalized result or a result that matches a
DictationGrammar causes a ResultStateError to be thrown.

The FinalDictationResult also extends the FinalResult interface. It provides access to
the additional information about a finalized result that matches a DictationGrammar. Calling
any method of the FinalDictationResult interface for a non-finalized result or a result that
matches a RuleGrammar causes a ResultStateError to be thrown.

Note: every result implements both the FinalRuleResult and FinalDictationResult
interfaces even though the result will match either a RuleGrammar or a DictationGrammar,
but never both. The reason for this is that when the result is created (RESULT_CREATED event),
the grammar is not always known.
Result States

The separate interfaces determine what information is available for a result in the different
stages of its life-cycle. The state of a Result is determined by calling the getResultState
method. The three possible states are UNFINALIZED, ACCEPTED and REJECTED.

A new result starts in the UNFINALIZED state. When the result is finalized it moves to either
the ACCEPTED or REJECTED state. An accepted or rejected result is termed a finalized result.
All values and information regarding a finalized result are fixed (except that audio and
training information may be released).

Following are descriptions of a result object in each of the three states including information
on which interfaces can be used in each state.

getResultState() == Result.UNFINALIZED

 Recognition of the result is in progress.
 A new result is created with a RESULT_CREATED event that is issued to each
ResultListener attached to a Recognizer. The new result is created in the
UNFINALIZED state.
 A result remains in the UNFINALIZED state until it is finalized by either a
RESULT_ACCEPTED or RESULT_REJECTED event.
 Applications should only call the methods of the Result interface. A
ResultStateError is issued on calls to the methods of the FinalResult,
FinalRuleResult and FinalDictationResult interfaces.
 Events 1: zero or more RESULT_UPDATED events may be issued as (a) tokens are
finalized, or (b) the unfinalized tokens change.
 Events 2: one GRAMMAR_FINALIZED event must be issued in the UNFINALIZED state
before result finalization by a RESULT_ACCEPTED event. (Not required if a result is
rejected.)
 Events 3: the GRAMMAR_FINALIZED event is optional if the result is finalized by a
RESULT_REJECTED event. (It is not always possible for a recognizer to identify a best-
match grammar for a rejected result.)
 Prior to the GRAMMAR_FINALIZED event, the getGrammar method returns null. Following
the GRAMMAR_FINALIZED event the getGrammar method returns a non-null reference to
the active Grammar that is matched by this result.
 numTokens returns the number of finalized tokens. While in the UNFINALIZED state this
number may increase as ResultEvent.RESULT_UPDATED events are issued.
 The best guess for each finalized token is available through getBestToken(int
num). The best guesses are guaranteed not to change through the remaining life of the
result.
 getUnfinalizedTokens may return zero or more tokens and these may change at any
time when a ResultEvent.RESULT_UPDATED event is issued.

getResultState() == Result.ACCEPTED

 Recognition of the Result is complete and the recognizer is confident it has the
correct result (not a rejected result). Non-rejection is not a guarantee of a correct
result - only sufficient confidence that the guess is correct.
 Events 1: a result transitions from the UNFINALIZED state to the ACCEPTED state when
a RESULT_ACCEPTED event is issued.
 Events 2: AUDIO_RELEASED and TRAINING_INFO_RELEASED events may occur
optionally (once) in the ACCEPTED state.
 numTokens will return 1 or greater (there must be at least one finalized token) and the
number of finalized tokens will not change. [Note: A rejected result may have zero
finalized tokens.]
 The best guess for each finalized token is available through the getBestToken(int
tokNum) method. The best guesses will not change through the remaining life of the
result.
 getUnfinalizedTokens method returns null.
 The getGrammar method returns the grammar matched by this result. It may be either
a RuleGrammar or DictationGrammar.
 For either a RuleGrammar or DictationGrammar the methods of FinalResult may
be used to access audio data and to perform correction/training.
 If the result matches a RuleGrammar, the methods of FinalRuleResult may be used
to get alternative guesses for the complete utterance and to get tags and other
information associated with the RuleGrammar. (Calls to any methods of the
FinalDictationResult interface cause a ResultStateError.)
 If the result matches a DictationGrammar, the methods of FinalDictationResult
may be used to get alternative guesses for tokens and token sequences. (Calls to any
methods of the FinalRuleResult interface cause a ResultStateError.)

getResultState() == Result.REJECTED

 Recognition of the Result is complete but the recognizer believes it does not have the
correct result. Programmatically, an accepted and rejected result are very similar but
the contents of a rejected result must be treated differently - they are likely to be
wrong.
 Events 1: a result transitions from the UNFINALIZED state to the REJECTED state when
a RESULT_REJECTED event is issued.
 Events 2: (same as for the ACCEPTED state) AUDIO_RELEASED and
TRAINING_INFO_RELEASED events may occur optionally (once) in the REJECTED state.
 numTokens will return 0 or greater. The number of tokens will not change for the
remaining life of the result. [Note: an accepted result always has at least one finalized
token.]
 As with an accepted result, the best guess for each finalized token is available through
the getBestToken(int num) method and the tokens are guaranteed not to change
through the remaining life of the result. Because the result has been rejected the
guesses are not likely to be correct.
 getUnfinalizedTokens method returns null.
 If the GRAMMAR_FINALIZED event was issued during recognition of the result, the
getGrammar method returns the grammar matched by this result; otherwise it returns
null. It may be either a RuleGrammar or DictationGrammar. For rejected results,
there is a greater chance that this grammar is wrong.
 The FinalResult interface behaves the same as for a result in the ACCEPTED state
except that the information is less likely to be reliable.
 If the grammar is known, the FinalRuleResult and FinalDictationResult
interfaces behave the same as for a result in the ACCEPTED state except that the
information is less likely to be reliable. If the grammar is unknown, then a
ResultStateError is thrown on calls to the methods of both FinalRuleResult and
FinalDictationResult.

Result State and Recognizer States

The state system of a Recognizer is linked to the state of recognition of the current result.
The Recognizer interface documents the normal event cycle for a Recognizer and for
Results. The following is an overview of the ways in which the two state systems are linked:

 The ALLOCATED state of a Recognizer has three sub-states. In the LISTENING state,
the recognizer is listening to background audio and there is no result being produced.
In the SUSPENDED state, the recognizer is temporarily buffering audio input while its
grammars are updated. In the PROCESSING state, the recognizer has detected incoming
audio that may match an active grammar and is producing a Result.
 The Recognizer moves from the LISTENING state to the PROCESSING state with a
RECOGNIZER_PROCESSING event immediately prior to issuing a RESULT_CREATED
event.
 The RESULT_UPDATED and GRAMMAR_FINALIZED events are produced while the
Recognizer is in the PROCESSING state.
 The Recognizer finalizes a Result with RESULT_ACCEPTED or RESULT_REJECTED
event immediately after it transitions from the PROCESSING state to the SUSPENDED
state with a RECOGNIZER_SUSPENDED event.
 Unless there is a pending suspend, the Recognizer commits grammar changes with a
CHANGES_COMMITTED event as soon as the result finalization event is processed.
 The TRAINING_INFO_RELEASED and AUDIO_RELEASED events can occur in any state of
an ALLOCATED Recognizer.

Accept or Reject?

Rejection of a result indicates that the recognizer is not confident that it has accurately
recognized what a user says. Rejection can be controlled through the
RecognizerProperties interface with the setConfidenceLevel method. Increasing the
confidence level requires the recognizer to have greater confidence to accept a result, so more
results are likely to be rejected.

Important: the acceptance of a result (a RESULT_ACCEPTED event rather than a
RESULT_REJECTED event) does not mean the result is correct. Instead, acceptance implies that
the recognizer has a sufficient level of confidence that the result is correct.

It is difficult for recognizers to reliably determine when they make mistakes. Applications
need to determine the cost of incorrect recognition of any particular results and take
appropriate actions. For example, confirm with a user that they said "delete all files" before
deleting anything.
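A sketch of tightening acceptance, assuming an allocated Recognizer named rec; the useful
range of confidence values is engine-dependent:

import java.beans.PropertyVetoException;
import javax.speech.recognition.RecognizerProperties;

RecognizerProperties props = rec.getRecognizerProperties();
try {
    props.setConfidenceLevel(0.8f);   // stricter: more results will be rejected
} catch (PropertyVetoException e) {
    // the recognizer rejected the requested level
}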

Result Events
Events are issued when a new result is created and when there is any change in the state or
information content of a result. The following describes the event sequence for an accepted
result. It provides the same information as above for result states, but focuses on legal event
sequences.

Before a new result is created for incoming speech, a recognizer usually issues a
SPEECH_STARTED event to the speechStarted method of RecognizerAudioListener.

A newly created Result is provided to the application by calling the resultCreated method
of each ResultListener attached to the Recognizer with a RESULT_CREATED event. The
new result may or may not have any finalized tokens or unfinalized tokens.

At any time following the RESULT_CREATED event, an application may attach a
ResultListener to an individual result. That listener will receive all subsequent events
associated with that Result.
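A sketch of attaching a per-result listener as soon as the recognizer reports a new result,
assuming a Recognizer named rec:

import javax.speech.recognition.Result;
import javax.speech.recognition.ResultAdapter;
import javax.speech.recognition.ResultEvent;

rec.addResultListener(new ResultAdapter() {
    public void resultCreated(ResultEvent e) {
        final Result result = (Result) e.getSource();
        // this listener receives all subsequent events for this one result
        result.addResultListener(new ResultAdapter() {
            public void grammarFinalized(ResultEvent ev) {
                System.out.println("Matched grammar: " + result.getGrammar().getName());
            }
        });
    }
});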

A new Result is created in the UNFINALIZED state. In this state, zero or more
RESULT_UPDATED events may be issued to each ResultListener attached to the Recognizer
and to each ResultListener attached to that Result. The RESULT_UPDATED indicates that
one or more tokens have been finalized, or that the unfinalized tokens have changed, or both.

When the recognizer determines which grammar is the best match for incoming speech, it
issues a GRAMMAR_FINALIZED event. This event is issued to each ResultListener attached to
the Recognizer and to each ResultListener attached to that Result.

The GRAMMAR_FINALIZED event is also issued to each ResultListener attached to the
matched Grammar. This is the first ResultEvent received by ResultListeners attached to
the Grammar. All subsequent result events are issued to all ResultListeners attached to the
matched Grammar (as well as to ResultListeners attached to the Result and to the
Recognizer).

Zero or more RESULT_UPDATED events may be issued after the GRAMMAR_FINALIZED event but
before the result is finalized.

Once the recognizer completes recognition of the Result that it chooses to accept, it finalizes
the result with a RESULT_ACCEPTED event that is issued to the ResultListeners attached to
the Recognizer, the matched Grammar, and the Result. This event may also indicate
finalization of zero or more tokens, and/or the resetting of the unfinalized tokens to null. The
result finalization event occurs immediately after the Recognizer makes a transition from the
PROCESSING state to the SUSPENDED state with a RECOGNIZER_SUSPENDED event.

A finalized result (accepted or rejected state) may issue an AUDIO_RELEASED or
TRAINING_INFO_RELEASED event. These events may be issued in response to the relevant release
methods of FinalResult and FinalDictationResult or may be issued when the recognizer
independently determines to release audio or training information.

When a result is rejected some of the events described above may be skipped. A result may
be rejected with the RESULT_REJECTED event at any time after a RESULT_CREATED event
instead of a RESULT_ACCEPTED event. A result may be rejected with or without any
unfinalized or finalized tokens being created (no RESULT_UPDATED events), and with or
without a GRAMMAR_FINALIZED event.

When does a Result start and end?

A new result object is created when a recognizer has detected possible incoming speech
which may match an active grammar.

To accept the result (i.e. to issue a RESULT_ACCEPTED event), the best-guess tokens of the
result must match the token patterns defined by the matched grammar. For a RuleGrammar
this implies that a call to the parse method of the matched RuleGrammar must return
successfully. (Note: the parse is not guaranteed if the grammar has been changed.)

Because there are no programmatically defined constraints upon word patterns for a
DictationGrammar, a single result may represent a single word, a short phrase or sentence,
or possibly many pages of text.

The set of conditions that may cause a result matching a DictationGrammar to be finalized
includes:

 The user pauses for a period of time (a timeout).
 A call to the forceFinalize method of the recognizer.
 The user has spoken text matching an active RuleGrammar (the dictation result is finalized
and a new Result is issued for the RuleGrammar).
 The engine is paused.

The following conditions apply to all finalized results:

 N-best alternative token guesses available through the FinalRuleResult and
FinalDictationResult interfaces cannot cross result boundaries.
 Correction/training is only possible within a single result object.

See Also:
FinalResult, FinalRuleResult, FinalDictationResult, ResultEvent, ResultListener,
ResultAdapter, Grammar, RuleGrammar, DictationGrammar, forceFinalize,
RecognizerEvent, setConfidenceLevel

Field Summary
static int ACCEPTED
          getResultState returns ACCEPTED once recognition of the result is completed and the Result object has been finalized by being accepted.
static int REJECTED
          getResultState returns REJECTED once recognition of the result is complete and the Result object has been finalized by being rejected.
static int UNFINALIZED
          getResultState returns UNFINALIZED while a result is still being recognized.
Method Summary
void addResultListener(ResultListener listener)
          Request notifications of events related to this Result.
ResultToken getBestToken(int tokNum)
          Returns the best guess for the tokNumth token.
ResultToken[] getBestTokens()
          Returns all the best guess tokens for this result.
Grammar getGrammar()
          Return the Grammar matched by the best-guess finalized tokens of this result, or null if the grammar is not known.
int getResultState()
          Returns the current state of the Result object: UNFINALIZED, ACCEPTED or REJECTED.
ResultToken[] getUnfinalizedTokens()
          In the UNFINALIZED state, return the current guess of the tokens following the finalized tokens.
int numTokens()
          Returns the number of finalized tokens in a Result.
void removeResultListener(ResultListener listener)
          Remove a listener from this Result.
Field Detail
UNFINALIZED
public static final int UNFINALIZED
getResultState returns UNFINALIZED while a result is still being recognized. A
Result is in the UNFINALIZED state when the RESULT_CREATED event is issued. Result
states are described above in detail.
See Also:
getResultState, RESULT_CREATED

ACCEPTED
public static final int ACCEPTED
getResultState returns ACCEPTED once recognition of the result is completed and
the Result object has been finalized by being accepted. When a Result changes to
the ACCEPTED state a RESULT_ACCEPTED event is issued. Result states are described
above in detail.
See Also:
getResultState, RESULT_ACCEPTED
REJECTED
public static final int REJECTED
getResultState returns REJECTED once recognition of the result is complete and the
Result object has been finalized by being rejected. When a Result changes to the
REJECTED state a RESULT_REJECTED event is issued. Result states are described above
in detail.
See Also:
getResultState, RESULT_REJECTED
Method Detail
getResultState
public int getResultState()
Returns the current state of the Result object: UNFINALIZED, ACCEPTED or REJECTED.
The details of a Result in each state are described above.
See Also:
UNFINALIZED, ACCEPTED, REJECTED

getGrammar
public Grammar getGrammar()
Return the Grammar matched by the best-guess finalized tokens of this result or null
if the grammar is not known. The return value is null before a GRAMMAR_FINALIZED
event and non-null afterwards.

The grammar is guaranteed to be non-null for an accepted result. The grammar may
be null or non-null for a rejected result, depending upon whether a
GRAMMAR_FINALIZED event was issued prior to finalization.

For a finalized result, an application should determine the type of matched grammar
with an instanceof test. For a result that matches a RuleGrammar, the methods of
FinalRuleResult can be used (the methods of FinalDictationResult throw an
error). For a result that matches a DictationGrammar, the methods of
FinalDictationResult can be used (the methods of FinalRuleResult throw an
error). The methods of FinalResult can be used for a result matching either kind of
grammar.

Example:

Result result;
if (result.getGrammar() instanceof RuleGrammar) {
    FinalRuleResult frr = (FinalRuleResult) result;
    ...
}

See Also:
getResultState
numTokens
public int numTokens()
Returns the number of finalized tokens in a Result. Tokens are numbered from 0 to
numTokens()-1 and are obtained through the getBestToken and getBestTokens
method of this (Result) interface and the getAlternativeTokens methods of the
FinalRuleResult and FinalDictationResult interfaces for a finalized result.

Starting from the RESULT_CREATED event and while the result remains in the
UNFINALIZED state, the number of finalized tokens may be zero or greater and can
increase as tokens are finalized. When one or more tokens are finalized in the
UNFINALIZED state, a RESULT_UPDATED event is issued with the tokenFinalized flag
set true. The RESULT_ACCEPTED and RESULT_REJECTED events which finalize a result
can also indicate that one or more tokens have been finalized.

In the ACCEPTED and REJECTED states, numTokens indicates the total number of tokens
that were finalized. The number of finalized tokens never changes in these states. An
ACCEPTED result must have one or more finalized tokens. A REJECTED result may have
zero or more tokens.

See Also:
RESULT_UPDATED, getBestToken, getBestTokens, getAlternativeTokens,
getAlternativeTokens

getBestToken
public ResultToken getBestToken(int tokNum) throws IllegalArgumentException
Returns the best guess for the tokNumth token. tokNum must be in the range 0 to
numTokens()-1.

If the result has zero tokens (possible in both the UNFINALIZED and REJECTED states)
an exception is thrown.

If the result is in the REJECTED state, then the returned tokens are likely to be
incorrect. In the ACCEPTED state (not rejected) the recognizer is confident that the
tokens are correct but applications should consider the possibility that the tokens are
incorrect.

The FinalRuleResult and FinalDictationResult interfaces provide
getAlternativeTokens methods that return alternative token guesses for finalized
results.

Throws:
IllegalArgumentException - if tokNum is out of range.
See Also:
getUnfinalizedTokens, getBestTokens, getAlternativeTokens, getAlternativeTokens
getBestTokens
public ResultToken[] getBestTokens()
Returns all the best guess tokens for this result. If the result has zero tokens, the return
value is null.

getUnfinalizedTokens
public ResultToken[] getUnfinalizedTokens()
In the UNFINALIZED state, return the current guess of the tokens following the
finalized tokens. Unfinalized tokens provide an indication of what a recognizer is
considering as possible recognition tokens for speech following the finalized tokens.

Unfinalized tokens can provide users with feedback on the recognition process.
The array may be any length (zero or more tokens), the length may change at any
time, and successive calls to getUnfinalizedTokens may return different tokens or
even different numbers of tokens. When the unfinalized tokens are changed, a
RESULT_UPDATED event is issued to the ResultListener. The RESULT_ACCEPTED and
RESULT_REJECTED events finalize a result and always guarantee that the return value
is null. A new result created with a RESULT_CREATED event may have a null or non-
null value.

The returned array is null if there are currently no unfinalized tokens, if the recognizer
does not support unfinalized tokens, or after a Result is finalized (in the ACCEPTED or
REJECTED state).

See Also:
isUnfinalizedTokensChanged, RESULT_UPDATED, RESULT_ACCEPTED,
RESULT_REJECTED
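A sketch of using unfinalized tokens for live feedback inside a ResultListener, assuming
ResultToken's getSpokenText accessor:

import javax.speech.recognition.Result;
import javax.speech.recognition.ResultEvent;
import javax.speech.recognition.ResultToken;

public void resultUpdated(ResultEvent e) {
    Result result = (Result) e.getSource();
    ResultToken[] pending = result.getUnfinalizedTokens();
    if (pending != null) {
        StringBuffer guess = new StringBuffer();
        for (int i = 0; i < pending.length; i++) {
            guess.append(pending[i].getSpokenText()).append(' ');
        }
        System.out.println("listening: " + guess);   // the recognizer's current guess
    }
}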

addResultListener
public void addResultListener(ResultListener listener)
Request notifications of events related to this Result. An application can attach
multiple listeners to a Result. A listener can be removed with the
removeResultListener method.

ResultListener objects can also be attached to a Recognizer and to any Grammar.
A listener attached to the Recognizer receives all events for all results produced by
that Recognizer. A listener attached to a Grammar receives all events for all results
that have been finalized for that Grammar (all events starting with and including the
GRAMMAR_FINALIZED event).

A ResultListener attached to a Result only receives events following the point in
time at which the listener is attached. Because the listener can only be attached during
or after the RESULT_CREATED event, it will not receive the RESULT_CREATED event. Only
ResultListeners attached to the Recognizer receive RESULT_CREATED events.
See Also:
removeResultListener, addResultListener, addResultListener

removeResultListener
public void removeResultListener(ResultListener listener)
Remove a listener from this Result.


Recognizer, Synthesizer

public abstract interface Engine

The Engine interface is the parent interface for all speech engines including Recognizer and
Synthesizer. A speech engine is a generic entity that either processes speech input or
produces speech output. Engines - recognizers and synthesizers - derive the following
functionality from the Engine interface:

 allocate and deallocate methods.
 pause and resume methods.
 Access to an AudioManager and VocabManager.
 Access to EngineProperties.
 Access to the engine's EngineModeDesc.
 Methods to add and remove EngineListener objects.

Engines are located, selected and created through methods of the Central class.

Engine State System: Allocation

Each type of speech engine has a well-defined set of states of operation, and well-defined
behavior for moving between states. These states are defined by constants of the Engine,
Recognizer and Synthesizer interfaces.

The Engine interface defines three methods for viewing and monitoring states:
getEngineState, waitEngineState and testEngineState. An EngineEvent is issued to
EngineListeners each time an Engine changes state.

The basic states of any speech engine (Recognizer or Synthesizer) are DEALLOCATED,
ALLOCATED, ALLOCATING_RESOURCES and DEALLOCATING_RESOURCES. An engine in the
ALLOCATED state has acquired all the resources it requires to perform its core functions.

Engines are created in the DEALLOCATED state and a call to allocate is required to prepare
them for usage. The ALLOCATING_RESOURCES state is an intermediate state between
DEALLOCATED and ALLOCATED which an engine occupies during the resource allocation
process (which may be a very short period or may take tens of seconds).
Once an application finishes using a speech engine it should always explicitly free system
resources by calling the deallocate method. This call transitions the engine to the
DEALLOCATED state via some period in the DEALLOCATING_RESOURCES state.

The methods of Engine, Recognizer and Synthesizer perform differently according to the
engine's allocation state. Many methods cannot be performed when an engine is in either the
DEALLOCATED or DEALLOCATING_RESOURCES state. Many methods block (wait) for an engine
in the ALLOCATING_RESOURCES state until the engine reaches the ALLOCATED state. This
blocking/exception behavior is defined separately for each method of Engine, Synthesizer
and Recognizer.

Engine State System: Sub-states of ALLOCATED

The ALLOCATED state has sub-states. (The DEALLOCATED, ALLOCATING_RESOURCES and
DEALLOCATING_RESOURCES states do not have any sub-states.)

 Any ALLOCATED engine (Recognizer or Synthesizer) is either PAUSED or RESUMED.
These states indicate whether audio input/output is stopped or running.
 An ALLOCATED Synthesizer has additional sub-states for QUEUE_EMPTY and
QUEUE_NOT_EMPTY that indicate the status of its speech output queue. These two states
are independent of the PAUSED and RESUMED states.
 An ALLOCATED Recognizer has additional sub-states for LISTENING, PROCESSING and
SUSPENDED that indicate the status of the recognition process. These three states are
independent of the PAUSED and RESUMED states (with the exception of minor
interactions documented with Recognizer).
 An ALLOCATED Recognizer also has additional sub-states for FOCUS_ON and
FOCUS_OFF. Focus determines when most of an application's grammars are active or
inactive for recognition. The focus states are independent of the PAUSED and RESUMED
states and of the LISTENING/PROCESSING/SUSPENDED states. (Limited exceptions are
discussed in the documentation for Recognizer.)

The pause and resume methods are used to transition an engine between the PAUSED and
RESUMED states. The PAUSED and RESUMED states are shared by all applications that use the
underlying engine. For instance, pausing a recognizer pauses all applications that use that
engine.

Engine State System: get/test/wait

The current state of an Engine is returned by the getEngineState method. The
waitEngineState method blocks the calling thread until the Engine reaches a specified
state. The testEngineState method tests whether an Engine is in a specified state.

The state values can be bitwise OR'ed (using the Java "|" operator). For example, for an
allocated, resumed synthesizer with items in its speech output queue, the state is

Engine.ALLOCATED | Engine.RESUMED | Synthesizer.QUEUE_NOT_EMPTY

The states and sub-states defined above put constraints upon the state of an engine. The
following are examples of illegal states:

Illegal Engine states:
Engine.DEALLOCATED | Engine.RESUMED
Engine.ALLOCATED | Engine.DEALLOCATED

Illegal Synthesizer states:
Engine.DEALLOCATED | Synthesizer.QUEUE_NOT_EMPTY
Synthesizer.QUEUE_EMPTY | Synthesizer.QUEUE_NOT_EMPTY

Illegal Recognizer states:
Engine.DEALLOCATED | Recognizer.PROCESSING
Recognizer.PROCESSING | Recognizer.SUSPENDED

Calls to the testEngineState and waitEngineState methods with illegal state values cause
an exception to be thrown.
See Also:
Central, Synthesizer, Recognizer
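A small sketch, assuming an allocated Synthesizer named synth, of combining and testing
OR'ed state values:

long state = synth.getEngineState();                 // OR'ed set of state bits
boolean producing = synth.testEngineState(
        Engine.ALLOCATED | Engine.RESUMED | Synthesizer.QUEUE_NOT_EMPTY);
try {
    synth.waitEngineState(Synthesizer.QUEUE_EMPTY);  // block until output has finished
} catch (InterruptedException e) {
    // interrupted while waiting
}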

Field Summary
static long ALLOCATED
          Bit of state that is set when an Engine is in the allocated state.
static long ALLOCATING_RESOURCES
          Bit of state that is set when an Engine is being allocated - the transition state
from DEALLOCATED to ALLOCATED following a call to the allocate method.
static long DEALLOCATED
          Bit of state that is set when an Engine is in the deallocated state.
static long DEALLOCATING_RESOURCES
          Bit of state that is set when an Engine is being deallocated - the transition
state from ALLOCATED to DEALLOCATED.
static long PAUSED
          Bit of state that is set when an Engine is in the ALLOCATED state and is
PAUSED.
static long RESUMED
          Bit of state that is set when an Engine is in the ALLOCATED state and is
RESUMED.
 
Method Summary
void addEngineListener(EngineListener listener)
          Request notifications of events related to the Engine.
void allocate()
          Allocate the resources required for the Engine and put it into the
ALLOCATED state.
void deallocate()
          Free the resources of the engine that were acquired during allocation
and during operation, and return the engine to the DEALLOCATED state.
AudioManager getAudioManager()
          Return an object which provides management of the audio input or
output for the Engine.
EngineModeDesc getEngineModeDesc()
          Return a mode descriptor that defines the operating properties of the
engine.
EngineProperties getEngineProperties()
          Return the EngineProperties object (a JavaBean).
long getEngineState()
          Returns an OR'ed set of flags indicating the current state of an Engine.
VocabManager getVocabManager()
          Return an object which provides management of the vocabulary for
the Engine.
void pause()
          Pause the audio stream for the engine and put the Engine into the
PAUSED state.
void removeEngineListener(EngineListener listener)
          Remove a listener from this Engine.
void resume()
          Put the Engine in the RESUMED state to resume audio streaming to or
from a paused engine.
boolean testEngineState(long state)
          Returns true if the current engine state matches the specified state.
void waitEngineState(long state)
          Blocks the calling thread until the Engine is in a specified state.
 
Field Detail
DEALLOCATED
public static final long DEALLOCATED
Bit of state that is set when an Engine is in the deallocated state. A deallocated engine
does not have the resources necessary for it to carry out its basic functions.

In the DEALLOCATED state, many of the methods of an Engine throw an exception
when called. The DEALLOCATED state has no sub-states.

An Engine is always created in the DEALLOCATED state. A DEALLOCATED engine can
transition to the ALLOCATED state via the ALLOCATING_RESOURCES state following a call
to the allocate method. An Engine returns to the DEALLOCATED state via the
DEALLOCATING_RESOURCES state with a call to the deallocate method.

See Also:
allocate, deallocate, getEngineState, waitEngineState

ALLOCATING_RESOURCES
public static final long ALLOCATING_RESOURCES
Bit of state that is set when an Engine is being allocated - the transition state from
DEALLOCATED to ALLOCATED following a call to the allocate method. The
ALLOCATING_RESOURCES state has no sub-states. In the ALLOCATING_RESOURCES state,
many of the methods of Engine, Recognizer, and Synthesizer will block until the
Engine reaches the ALLOCATED state and the action can be performed.
See Also:
getEngineState, waitEngineState

ALLOCATED
public static final long ALLOCATED
Bit of state that is set when an Engine is in the allocated state. An engine in the
ALLOCATED state has acquired the resources required for it to carry out its core
functions.

The ALLOCATED state has sub-states for RESUMED and PAUSED. Both Synthesizer and
Recognizer define additional sub-states of ALLOCATED.

An Engine is always created in the DEALLOCATED state. It reaches the ALLOCATED state
via the ALLOCATING_RESOURCES state with a call to the allocate method.

See Also:
Synthesizer, Recognizer, getEngineState, waitEngineState

DEALLOCATING_RESOURCES
public static final long DEALLOCATING_RESOURCES
Bit of state that is set when an Engine is being deallocated - the transition state
between the ALLOCATED and DEALLOCATED states. The DEALLOCATING_RESOURCES state has no
sub-states. In the DEALLOCATING_RESOURCES state, most methods of Engine,
Recognizer and Synthesizer throw an exception.
See Also:
getEngineState, waitEngineState

PAUSED
public static final long PAUSED
Bit of state that is set when an Engine is in the ALLOCATED state and is PAUSED. In
the PAUSED state, audio input or output is stopped.

An ALLOCATED engine is always in either the PAUSED or RESUMED state. The PAUSED and
RESUMED states are sub-states of the ALLOCATED state.

See Also:
RESUMED, ALLOCATED, getEngineState, waitEngineState
RESUMED
public static final long RESUMED
Bit of state that is set when an Engine is in the ALLOCATED state and is RESUMED. In
the RESUMED state, audio input or output is active.

An ALLOCATED engine is always in either the PAUSED or RESUMED state. The PAUSED and
RESUMED states are sub-states of the ALLOCATED state.

See Also:
PAUSED, ALLOCATED, getEngineState, waitEngineState
Method Detail
getEngineState
public long getEngineState()
Returns an OR'ed set of flags indicating the current state of an Engine. The format of the
returned state value is described above.

An EngineEvent is issued each time the Engine changes state.

The getEngineState method can be called successfully in any Engine state.

See Also:
getEngineState, waitEngineState, getNewEngineState, getOldEngineState
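
For illustration only, a small helper (the method name describeState is hypothetical, not part of the API) could decode the OR'ed flags into readable text using the constants defined above:

static String describeState(Engine engine) {
    long state = engine.getEngineState();
    StringBuffer sb = new StringBuffer();
    // Each constant is a single bit, so a bitwise AND tests whether it is set.
    if ((state & Engine.ALLOCATED) != 0) sb.append("ALLOCATED ");
    if ((state & Engine.ALLOCATING_RESOURCES) != 0) sb.append("ALLOCATING_RESOURCES ");
    if ((state & Engine.DEALLOCATED) != 0) sb.append("DEALLOCATED ");
    if ((state & Engine.DEALLOCATING_RESOURCES) != 0) sb.append("DEALLOCATING_RESOURCES ");
    if ((state & Engine.PAUSED) != 0) sb.append("PAUSED ");
    if ((state & Engine.RESUMED) != 0) sb.append("RESUMED ");
    return sb.toString().trim();
}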

waitEngineState
public void waitEngineState(long state)
throws InterruptedException,
IllegalArgumentException
Blocks the calling thread until the Engine is in a specified state. The format of the
state value is described above.

All state bits specified in the state parameter must be set in order for the method to
return, as defined for the testEngineState method. If the state parameter defines
an unreachable state (e.g. PAUSED | RESUMED) an exception is thrown.

The waitEngineState method can be called successfully in any Engine state.

Throws:
InterruptedException - if another thread has interrupted this thread.
IllegalArgumentException - if the specified state is unreachable
See Also:
testEngineState, getEngineState

testEngineState
public boolean testEngineState(long state)
throws IllegalArgumentException
Returns true if the current engine state matches the specified state. The format of the
state value is described above.

The test performed is not an exact match to the current state. Only the specified states
are tested. For example the following returns true only if the Synthesizer queue is
empty, irrespective of the pause/resume and allocation states.

if (synth.testEngineState(Synthesizer.QUEUE_EMPTY)) ...
The testEngineState method is equivalent to:
if ((engine.getEngineState() & state) == state)
The testEngineState method can be called successfully in any Engine state.
Throws:
IllegalArgumentException - if the specified state is unreachable
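
As a further sketch (not taken from the specification), several bits can be combined in a single call because every specified bit must be set for the test to succeed:

// True only if the engine is ALLOCATED and currently PAUSED.
if (engine.testEngineState(Engine.ALLOCATED | Engine.PAUSED)) {
    try {
        engine.resume();
    } catch (AudioException e) {
        // the audio device could not be re-acquired
    }
}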

allocate
public void allocate()
throws EngineException,
EngineStateError
Allocate the resources required for the Engine and put it into the ALLOCATED state.
When this method returns successfully the ALLOCATED bit of engine state is set, and
the testEngineState(Engine.ALLOCATED) method returns true. During the
processing of the method, the Engine is temporarily in the ALLOCATING_RESOURCES
state.

When the Engine reaches the ALLOCATED state other engine states are determined:

 PAUSED or RESUMED: the pause state depends upon the existing state of the
engine. In a multi-app environment, the pause/resume state of the engine is
shared by all apps.
 A Recognizer always starts in the LISTENING state when newly allocated but
may transition immediately to another state.
 A Recognizer may be allocated in either the HAS_FOCUS state or LOST_FOCUS
state depending upon the activity of other applications.
 A Synthesizer always starts in the QUEUE_EMPTY state when newly allocated.

While this method is being processed events are issued to any EngineListeners
attached to the Engine to indicate state changes. First, as the Engine changes from the
DEALLOCATED to the ALLOCATING_RESOURCES state, an
ENGINE_ALLOCATING_RESOURCES event is issued. As the allocation process
completes, the engine moves from the ALLOCATING_RESOURCES state to the
ALLOCATED state and an ENGINE_ALLOCATED event is issued.

The allocate method should be called for an Engine in the DEALLOCATED state. The
method has no effect for an Engine in either the ALLOCATING_RESOURCES or
ALLOCATED states. The method throws an exception in the DEALLOCATING_RESOURCES
state.
If any problems are encountered during the allocation process so that the engine
cannot be allocated, the engine returns to the DEALLOCATED state (with an
ENGINE_DEALLOCATED event), and an EngineException is thrown.

Allocating the resources for an engine may be fast (less than a second) or slow
(several 10s of seconds) depending upon a range of factors. Since the allocate
method does not return until allocation is completed, applications may want to perform
allocation in a background thread and proceed with other activities. The following
code uses an inner class implementation of Runnable to create a background thread
for engine allocation:

static Engine engine;

public static void main(String argv[]) {
    try {
        // A null mode descriptor requests a default recognizer.
        engine = Central.createRecognizer(null);

        new Thread(new Runnable() {
            public void run() {
                try {
                    engine.allocate();
                } catch (EngineException e) {
                    e.printStackTrace();
                }
            }
        }).start();

        // Do other stuff while allocation takes place
        // ...

        // Now wait until allocation is complete
        engine.waitEngineState(Engine.ALLOCATED);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Throws:
EngineException - if an allocation error occurred or the engine is not operational.
EngineStateError - if called for an engine in the DEALLOCATING_RESOURCES state
See Also:
getEngineState, deallocate, ALLOCATED, ENGINE_ALLOCATED

deallocate
public void deallocate()
throws EngineException,
EngineStateError
Free the resources of the engine that were acquired during allocation and during
operation and return the engine to the DEALLOCATED state. When this method returns the
DEALLOCATED bit of engine state is set so the
testEngineState(Engine.DEALLOCATED) method returns true. During the
processing of the method, the Engine is temporarily in the
DEALLOCATING_RESOURCES state.

A deallocated engine can be re-started with a subsequent call to allocate.

Engines need to clean up current activities before being deallocated. A Synthesizer
must be in the QUEUE_EMPTY state before being deallocated. If the queue is not empty,
any objects on the speech output queue must be cancelled with appropriate events
issued. A Recognizer cannot be in the PROCESSING state when being deallocated. If
necessary, there must be a forceFinalize of any unfinalized result.

While this method is being processed events are issued to any EngineListeners
attached to the Engine to indicate state changes. First, as the Engine changes from the
ALLOCATED to the DEALLOCATING_RESOURCES state, an
ENGINE_DEALLOCATING_RESOURCES event is issued. As the deallocation process
completes, the engine moves from the DEALLOCATING_RESOURCES state to the
DEALLOCATED state and an ENGINE_DEALLOCATED event is issued.

The deallocate method should only be called for an Engine in the ALLOCATED state.
The method has no effect for an Engine in either the DEALLOCATING_RESOURCES or
DEALLOCATED states. The method throws an exception in the ALLOCATING_RESOURCES
state.

Deallocating resources for an engine is not always immediate. Since the deallocate
method does not return until complete, applications may want to perform deallocation
in a separate thread. The documentation for the allocate method shows an example
of an inner class implementation of Runnable that creates a separate thread.

Throws:
EngineException - if a deallocation error occurs
EngineStateError - if called for an engine in the ALLOCATING_RESOURCES state
See Also:
allocate, ENGINE_DEALLOCATED, QUEUE_EMPTY
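
A minimal shutdown sketch for a synthesizer, assuming synth already refers to an ALLOCATED Synthesizer:

try {
    // Let queued output finish before releasing the engine's resources.
    synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
    synth.deallocate();
    synth.waitEngineState(Engine.DEALLOCATED);
} catch (InterruptedException e) {
    // shutdown was interrupted
} catch (EngineException e) {
    // a deallocation error occurred
}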

pause
public void pause()
throws EngineStateError
Pause the audio stream for the engine and put the Engine into the PAUSED state.
Pausing an engine pauses the underlying engine for all applications that are connected
to that engine. Engines are typically paused and resumed by request from a user.

Applications may pause an engine indefinitely. When an engine moves from the
RESUMED state to the PAUSED state, an ENGINE_PAUSED event is issued to each
EngineListener attached to the Engine. The PAUSED bit of the engine state is set to
true when paused, and can be tested by the getEngineState method and other
engine state methods.

The PAUSED state is a sub-state of the ALLOCATED state. An ALLOCATED Engine is
always in either the PAUSED or the RESUMED state.

It is not an exception to pause an Engine that is already paused.

The pause method operates as defined for engines in the ALLOCATED state. When
pause is called for an engine in the ALLOCATING_RESOURCES state, the method blocks
(waits) until the ALLOCATED state is reached and then operates normally. An error is
thrown when pause is called for an engine in either the DEALLOCATED or
DEALLOCATING_RESOURCES states.

The pause method does not always return immediately. Some applications need to
execute pause in a separate thread. The documentation for the allocate method
includes an example implementation of Runnable with inner classes that can perform
pause in a separate thread.

Pausing a Synthesizer

The pause/resume mechanism for a synthesizer is analogous to pause/resume on a
tape player or CD player. The audio output stream is paused. The speaking queue is
left intact and a subsequent resume continues output from the point at which the pause
took effect.

Pausing a Recognizer

Pause and resume for a recognizer are analogous to turning a microphone off and on.
Pausing stops the audio input stream as close as possible to the time of the call
to pause. The incoming audio between the pause and the resume calls is ignored.

Anything a user says while the recognizer is paused will not be heard by the
recognizer. Pausing a recognizer during the middle of user speech forces the
recognizer to finalize or reject processing of that incoming speech - a recognition
result cannot cross a pause/resume boundary.

Most recognizers have some amount of internal audio buffering. This means that
some recognizer processing may continue after the pause. For example, results can be
created and finalized.

Note: recognizers add a special suspend method that allows applications to
temporarily stop the recognizer to modify grammars and grammar activation. Unlike a
paused recognizer, a suspended recognizer buffers incoming audio input to be
processed once it returns to a listening state, so no audio is lost.

Throws:
EngineStateError - if called for an engine in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
resume, getEngineState, ENGINE_PAUSED, suspend

resume
public void resume()
throws AudioException,
EngineStateError
Put the Engine in the RESUMED state to resume audio streaming to or from a paused
engine. Resuming an engine resumes the underlying engine for all applications that
are connected to that engine. Engines are typically paused and resumed by request
from a user.

The specific pause/resume behavior of recognizers and synthesizers is defined in the
documentation for the pause method.

When an engine moves from the PAUSED state to the RESUMED state, an
ENGINE_RESUMED event is issued to each EngineListener attached to the Engine.
The RESUMED bit of the engine state is set to true when resumed, and can be tested by
the getEngineState method and other engine state methods.

The RESUMED state is a sub-state of the ALLOCATED state. An ALLOCATED Engine is
always in either the PAUSED or the RESUMED state.

It is not an exception to resume an engine that is already in the RESUMED state. An
exception may be thrown if the audio resource required by the engine (audio input or
output) is not available.

The resume method operates as defined for engines in the ALLOCATED state. When
resume is called for an engine in the ALLOCATING_RESOURCES state, the method
blocks (waits) until the ALLOCATED state is reached and then operates normally. An
error is thrown when resume is called for an engine in either the DEALLOCATED or
DEALLOCATING_RESOURCES states.

The resume method does not always return immediately. Some applications need to
execute resume in a separate thread. The documentation for the allocate method
includes an example implementation of Runnable with inner classes that could also
perform resume in a separate thread.

Throws:
AudioException - if unable to gain access to the audio channel
EngineStateError - if called for an engine in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
pause, getEngineState, ENGINE_RESUMED
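
For example, a push-to-talk style toggle might look like the following sketch (the method toggleMicrophone is hypothetical; rec is assumed to be an ALLOCATED Recognizer):

void toggleMicrophone(Recognizer rec) {
    try {
        if (rec.testEngineState(Engine.RESUMED)) {
            rec.pause();    // incoming audio is ignored until resume
        } else {
            rec.resume();   // may fail if the audio device is unavailable
        }
    } catch (AudioException e) {
        System.err.println("Audio device not available: " + e);
    }
}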

getAudioManager
public AudioManager getAudioManager()
Return an object which provides management of the audio input or output for the
Engine.

The AudioManager is available in any state of an Engine.

Returns:
the AudioManager for the engine
getVocabManager
public VocabManager getVocabManager()
throws EngineStateError
Return an object which provides management of the vocabulary for the Engine. See
the VocabManager documentation for a description of vocabularies and their use with
speech engines. Returns null if the Engine does not provide vocabulary management
capabilities.

The VocabManager is available for engines in the ALLOCATED state. The call blocks
for engines in the ALLOCATING_RESOURCES state. An error is thrown for engines in the
DEALLOCATED or DEALLOCATING_RESOURCES states.

Returns:
the VocabManager for the engine or null if it does not have a VocabManager
Throws:
EngineStateError - if called for an engine in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
Word

getEngineProperties
public EngineProperties getEngineProperties()
Return the EngineProperties object (a JavaBean).

A Recognizer returns a RecognizerProperties object. The Recognizer interface
also defines a getRecognizerProperties method that returns the same object as
getEngineProperties, but without requiring a cast to be useful.

A Synthesizer returns a SynthesizerProperties object. The Synthesizer
interface also defines a getSynthesizerProperties method that returns the same
object as getEngineProperties, but without requiring a cast to be useful.

The EngineProperties are available in any state of an Engine. However, changes
only take effect once an engine reaches the ALLOCATED state.

Returns:
the EngineProperties object for this engine
See Also:
getRecognizerProperties, RecognizerProperties, getSynthesizerProperties,
SynthesizerProperties
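
As an illustrative sketch, assuming the setSpeakingRate accessor of SynthesizerProperties (the value shown is arbitrary and supported ranges are engine dependent):

// The cast-free accessor returns the same object as getEngineProperties.
SynthesizerProperties props = synth.getSynthesizerProperties();
try {
    props.setSpeakingRate(150.0f);   // approximate words per minute
} catch (java.beans.PropertyVetoException e) {
    // the engine rejected the requested value
}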

getEngineModeDesc
public EngineModeDesc getEngineModeDesc()
throws SecurityException
Return a mode descriptor that defines the operating properties of the engine. For a
Recognizer the return value is a RecognizerModeDesc. For a Synthesizer the
return value is a SynthesizerModeDesc.

The EngineModeDesc is available in any state of an Engine.

Returns:
an EngineModeDesc for the engine.
Throws:
SecurityException - if the application does not have accessEngineModeDesc
permission

addEngineListener
public void addEngineListener(EngineListener listener)
Request notifications of events related to the Engine. An application can attach
multiple listeners to an Engine. A single listener can be attached to multiple engines.

The EngineListener is extended for both recognition and synthesis. Typically, a
RecognizerListener is attached to a Recognizer and a SynthesizerListener is
attached to a Synthesizer.

An EngineListener can be attached or removed in any state of an Engine.

Parameters:
listener - the listener that will receive EngineEvents
See Also:
Recognizer, RecognizerListener, Synthesizer, SynthesizerListener
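
A sketch of attaching a listener, assuming the EngineAdapter convenience class (which provides empty implementations of the EngineListener methods):

engine.addEngineListener(new EngineAdapter() {
    public void enginePaused(EngineEvent e) {
        System.out.println("engine paused");
    }
    public void engineResumed(EngineEvent e) {
        System.out.println("engine resumed");
    }
});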

removeEngineListener
public void removeEngineListener(EngineListener listener)
Remove a listener from this Engine. An EngineListener can be attached or removed
in any state of an Engine.
Parameters:
listener - the listener to be removed
public abstract interface Synthesizer
extends Engine
The Synthesizer interface provides primary access to speech synthesis capabilities. The
Synthesizer interface extends the Engine interface. Thus, any Synthesizer implements
basic speech engine capabilities plus the specialized capabilities required for speech
synthesis.

The primary functions provided by the Synthesizer interface are the ability to speak text,
speak Java Speech Markup Language text, and control an output queue of objects to be
spoken.

Creating a Synthesizer
Typically, a Synthesizer is created by a call to the Central.createSynthesizer method.
The procedures for locating, selecting, creating and initializing a Synthesizer are described
in the documentation for the Central class.
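
A minimal create-and-allocate sketch, assuming a default English synthesizer is installed and that SynthesizerModeDesc can be constructed from a Locale:

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class CreateSynthesizer {
    public static void main(String[] args) {
        try {
            // Ask Central for any synthesizer that matches the mode descriptor.
            Synthesizer synth =
                Central.createSynthesizer(new SynthesizerModeDesc(Locale.ENGLISH));
            synth.allocate();                              // acquire engine resources
            synth.waitEngineState(Synthesizer.ALLOCATED);  // block until ready
            // ... use the synthesizer ...
            synth.deallocate();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}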

Synthesis Package: Inherited and Extended Capabilities

A synthesis package inherits many of its important capabilities from the Engine interface
and its related support classes and interfaces. The synthesis package adds specialized
functionality for performing speech synthesis.

 Inherits the engine location mechanism through the Central.availableSynthesizers method and EngineModeDesc.
 Extends EngineModeDesc as SynthesizerModeDesc.
 Inherits allocate and deallocate methods from the Engine interface.
 Inherits pause and resume methods from the Engine interface.
 Inherits getEngineState, waitEngineState and testEngineState methods from
the Engine interface.
 Inherits the DEALLOCATED, ALLOCATED, ALLOCATING_RESOURCES,
DEALLOCATING_RESOURCES, PAUSED and RESUMED states from the Engine interface.
 Adds QUEUE_EMPTY and QUEUE_NOT_EMPTY sub-states to the ALLOCATED state.
 Inherits audio management: see Engine.getAudioManager and AudioManager.
 Inherits vocabulary management: see Engine.getVocabManager and VocabManager.
 Inherits addEngineListener and removeEngineListener methods and uses the
EngineListener interface.
 Extends EngineListener interface to SynthesizerListener.
 Adds speak(Speakable, Listener), speak(URL, Listener), speak(String,
Listener) and speakPlainText(String) methods to place text on the output queue
of the synthesizer.
 Adds phoneme(String) method that converts text to phonemes.
 Adds enumerateQueue, cancel(), cancel(Object) and cancelAll methods for
management of output queue.

Speaking Text

The basic function of a Synthesizer is to speak text provided to it by an application. This
text can be plain Unicode text in a String or can be marked up using the Java Speech
Markup Language (JSML).

Plain text is spoken using the speakPlainText method. JSML text is spoken using one of the
three speak methods. The speak methods obtain the JSML text from a Speakable object, from
a URL, or from a String.

[Note: JSML text provided programmatically (by a Speakable object or a String) does not
require the full XML header. JSML text obtained from a URL requires the full XML header.]

A synthesizer is mono-lingual (it speaks a single language) so the text should contain only the
single language of the synthesizer. An application requiring output of more than one language
needs to create multiple Synthesizer objects through Central. The language of the
Synthesizer should be selected at the time at which it is created. The language for a created
Synthesizer can be checked through the Locale of its EngineModeDesc (see
getEngineModeDesc).

Each object provided to a synthesizer is spoken independently. Sentences, phrases and other
structures should not span multiple calls to the speak methods.
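
For illustration, a sketch of the two output paths (the <EMP> element is an assumption; the markup actually honoured depends on the engine):

// Plain text: the string is spoken as-is and any markup is ignored.
synth.speakPlainText("Hello world", null);

try {
    // JSML text: markup is interpreted; the XML header is optional for a String.
    synth.speak("Hello <EMP>world</EMP>", null);
} catch (JSMLException e) {
    // the string was not legal JSML
}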

Synthesizer State System

Synthesizer extends the state system of the generic Engine interface. It inherits the four
basic allocation states, plus the PAUSED and RESUMED states.

Synthesizer adds a pair of sub-states to the ALLOCATED state to represent the state of the
speech output queue (queuing is described in more detail below). For an ALLOCATED
Synthesizer, the speech output queue is either empty or not empty: represented by the states
QUEUE_EMPTY and QUEUE_NOT_EMPTY.

The queue status is independent of the pause/resume status. Pausing or resuming a
synthesizer does not affect the queue. Adding or removing objects from the queue does not
affect the pause/resume status. The only form of interaction between these state systems is
that the Synthesizer only speaks in the RESUMED state, and therefore, a transition from
QUEUE_NOT_EMPTY to QUEUE_EMPTY because of completion of speaking an object is only
possible in the RESUMED state. (A transition from QUEUE_NOT_EMPTY to QUEUE_EMPTY is
possible in the PAUSED state only through a call to one of the cancel methods.)

Speech Output Queue

A synthesizer implements a queue of items provided to it through the speak and
speakPlainText methods. The queue is "first-in, first-out (FIFO)" -- the objects are spoken
in exactly the order in which they are received. The object at the top of the queue is the object
that is currently being spoken or about to be spoken.

The QUEUE_EMPTY and QUEUE_NOT_EMPTY states of a Synthesizer indicate the current state
of the speech output queue. The state handling methods inherited from the Engine
interface (getEngineState, waitEngineState and testEngineState) can be used to test
the queue state.

The items on the queue can be checked with the enumerateQueue method which returns a
snapshot of the queue.

The cancel methods allow an application to (a) stop the output of the item currently at the top
of the speaking queue, (b) remove an arbitrary item from the queue, or (c) remove all items
from the output queue.

Applications requiring more complex queuing mechanisms (e.g. a prioritized queue) can
implement their own queuing objects that control the synthesizer.

Pause and Resume


The pause and resume methods (inherited from the javax.speech.Engine interface) have
behavior like a "tape player". Pause stops audio output as soon as possible. Resume restarts
audio output from the point of the pause. Pause and resume may occur within words, phrases
or at unnatural points in the speech output.

Pause and resume do not affect the speech output queue.

In addition to the ENGINE_PAUSED and ENGINE_RESUMED events issued to the
EngineListener (or SynthesizerListener), SPEAKABLE_PAUSED and SPEAKABLE_RESUMED
events are issued to appropriate SpeakableListeners for the Speakable object at the top of
the speaking queue. (The SpeakableEvent is first issued to any SpeakableListener
provided with the speak method, then to each SpeakableListener attached to the
Synthesizer. Finally, the EngineEvent is issued to each SynthesizerListener and
EngineListener attached to the Synthesizer.)

Applications can determine the approximate point at which a pause occurs by monitoring the
WORD_STARTED events.

See Also:
Central, Speakable, SpeakableListener, EngineListener, SynthesizerListener

Field Summary
static long QUEUE_EMPTY
          Bit of state that is set when the speech output queue of a Synthesizer is
empty.
static long QUEUE_NOT_EMPTY
          Bit of state that is set when the speech output queue of a Synthesizer is not
empty.
 
Method Summary
void addSpeakableListener(SpeakableListener listener)
          Request notifications of all SpeakableEvents for all speech output
objects for this Synthesizer.
void cancelAll()
          Cancel all objects in the synthesizer speech output queue and stop
speaking the current top-of-queue object.
void cancel()
          Cancel output of the current object at the top of the output queue.
void cancel(Object source)
          Remove a specified item from the speech output queue.
Enumeration enumerateQueue()
          Return an Enumeration containing a snapshot of all the objects
currently on the speech output queue.
SynthesizerProperties getSynthesizerProperties()
          Return the SynthesizerProperties object (a JavaBean).
String phoneme(String text)
          Returns the phoneme string for a text string.
void removeSpeakableListener(SpeakableListener listener)
          Remove a SpeakableListener from this Synthesizer.
void speakPlainText(String text, SpeakableListener listener)
          Speak a plain text string.
void speak(Speakable JSMLtext, SpeakableListener listener)
          Speak an object that implements the Speakable interface and
provides text marked with the Java Speech Markup Language.
void speak(URL JSMLurl, SpeakableListener listener)
          Speak text from a URL formatted with the Java Speech Markup
Language.
void speak(String JSMLText, SpeakableListener listener)
          Speak a string containing text formatted with the Java Speech
Markup Language.
 
Field Detail
QUEUE_EMPTY
public static final long QUEUE_EMPTY
Bit of state that is set when the speech output queue of a Synthesizer is empty. The
QUEUE_EMPTY state is a sub-state of the ALLOCATED state. An allocated Synthesizer
is always in either the QUEUE_NOT_EMPTY or QUEUE_EMPTY state.

A Synthesizer is always allocated in the QUEUE_EMPTY state. The Synthesizer
transitions from the QUEUE_EMPTY state to the QUEUE_NOT_EMPTY state when a call to
one of the speak methods places an object on the speech output queue. A
QUEUE_UPDATED event is issued to indicate this change in state.

A Synthesizer returns from the QUEUE_NOT_EMPTY state to the QUEUE_EMPTY state
once the queue is emptied because of completion of speaking all objects or because of
a cancel.

The queue status can be tested with the waitQueueEmpty, getEngineState and
testEngineState methods. To block a thread until the queue is empty:

Synthesizer synth = ...;
synth.waitEngineState(QUEUE_EMPTY);

See Also:
QUEUE_NOT_EMPTY, ALLOCATED, getEngineState, waitEngineState,
testEngineState, QUEUE_UPDATED
QUEUE_NOT_EMPTY
public static final long QUEUE_NOT_EMPTY
Bit of state that is set when the speech output queue of a Synthesizer is not empty.
The QUEUE_NOT_EMPTY state is a sub-state of the ALLOCATED state. An allocated
Synthesizer is always in either the QUEUE_NOT_EMPTY or QUEUE_EMPTY state.

A Synthesizer enters the QUEUE_NOT_EMPTY from the QUEUE_EMPTY state when one
of the speak methods is called to place an object on the speech output queue. A
QUEUE_UPDATED event is issued to mark this change in state.

A Synthesizer returns from the QUEUE_NOT_EMPTY state to the QUEUE_EMPTY state
once the queue is emptied because of completion of speaking all objects or because of
a cancel.

See Also:
QUEUE_EMPTY, ALLOCATED, getEngineState, waitEngineState, testEngineState,
QUEUE_UPDATED
Method Detail
speak
public void speak(Speakable JSMLtext,
SpeakableListener listener)
throws JSMLException,
EngineStateError
Speak an object that implements the Speakable interface and provides text marked
with the Java Speech Markup Language. The Speakable object is added to the end of
the speaking queue and will be spoken once it reaches the top of the queue and the
synthesizer is in the RESUMED state.

The synthesizer first requests the text of the Speakable by calling its getJSMLText
method. It then checks the syntax of the JSML markup and throws a JSMLException
if any problems are found. If the JSML text is legal, the text is placed on the speech
output queue.

When the speech output queue is updated, a QUEUE_UPDATED event is issued to
SynthesizerListeners.

Events associated with the Speakable object are issued to the SpeakableListener
object. The listener may be null. A listener attached with this method cannot be
removed with a subsequent remove call. The source for the SpeakableEvents is the
JSMLtext object.

SpeakableEvents can also be received by attaching a SpeakableListener to the
Synthesizer with the addSpeakableListener method. A SpeakableListener
attached to the Synthesizer receives all SpeakableEvents for all speech output
items of the synthesizer (rather than for a single Speakable).
The speak call is asynchronous: it returns once the text for the Speakable has been
obtained, checked for syntax, and placed on the synthesizer's speech output queue. An
application needing to know when the Speakable has been spoken should wait for the
SPEAKABLE_ENDED event to be issued to the SpeakableListener object. The
getEngineState, waitEngineState and enumerateQueue methods can be used to
determine the speech output queue status.

An object placed on the speech output queue can be removed with one of the cancel
methods.

The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.

Parameters:
JSMLtext - object implementing the Speakable interface that provides Java Speech
Markup Language text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
JSMLException - if any syntax errors are encountered in JSMLtext
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(String, SpeakableListener), speak(URL, SpeakableListener),
speakPlainText(String, SpeakableListener), SpeakableEvent, addSpeakableListener
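
A sketch of a Speakable implementation (the class name and markup are illustrative only):

// A simple Speakable whose JSML text is produced when the synthesizer asks for it.
class GreetingSpeakable implements Speakable {
    private final String name;
    GreetingSpeakable(String name) { this.name = name; }
    public String getJSMLText() {
        return "Hello <EMP>" + name + "</EMP>, welcome back.";
    }
}

// Elsewhere, with an ALLOCATED synthesizer:
try {
    synth.speak(new GreetingSpeakable("Alice"), null);  // queued, spoken when RESUMED
} catch (JSMLException e) {
    // the generated markup was rejected
}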

speak
public void speak(URL JSMLurl,
SpeakableListener listener)
throws JSMLException,
MalformedURLException,
IOException,
EngineStateError
Speak text from a URL formatted with the Java Speech Markup Language. The text is
obtained from the URL, checked for legal JSML formatting, and placed at the end of
the speaking queue. It is spoken once it reaches the top of the queue and the
synthesizer is in the RESUMED state. In other respects it is identical to the speak
method that accepts a Speakable object.

The source of a SpeakableEvent issued to the SpeakableListener is the URL.

Because of the need to check JSML syntax, this speak method returns only once the
complete URL is loaded, or until a syntax error is detected in the URL stream.
Network delays will cause the method to return slowly.
Note: the full XML header is required in the JSML text provided in the URL. The
header is optional for programmatically generated JSML (i.e. with the speak(String,
Listener) and speak(Speakable, Listener) methods).

The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.

Parameters:
JSMLurl - URL containing Java Speech Markup Language text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
JSMLException - if any syntax errors are encountered in the JSML text
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(Speakable, SpeakableListener), speak(String, SpeakableListener),
speakPlainText(String, SpeakableListener), SpeakableEvent, addSpeakableListener

speak
public void speak(String JSMLText,
SpeakableListener listener)
throws JSMLException,
EngineStateError
Speak a string containing text formatted with the Java Speech Markup Language. The
JSML text is checked for formatting errors and a JSMLException is thrown if any are
found. If legal, the text is placed at the end of the speaking queue and will be spoken
once it reaches the top of the queue and the synthesizer is in the RESUMED state. In all
other respects it is identical to the speak method that accepts a Speakable object.

The source of a SpeakableEvent issued to the SpeakableListener is the String.

The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.

Parameters:
JSMLText - String containing Java Speech Markup Language text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
JSMLException - if any syntax errors are encountered in JSMLText
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(Speakable, SpeakableListener), speak(URL, SpeakableListener),
speakPlainText(String, SpeakableListener)
speakPlainText
public void speakPlainText(String text,
SpeakableListener listener)
throws EngineStateError
Speak a plain text string. The text is not interpreted as containing the Java Speech
Markup Language so JSML elements are ignored. The text is placed at the end of the
speaking queue and will be spoken once it reaches the top of the queue and the
synthesizer is in the RESUMED state. In other respects it is similar to the speak method
that accepts a Speakable object.

The source of a SpeakableEvent issued to the SpeakableListener is the String object.

The speak methods operate as defined only when a Synthesizer is in the ALLOCATED
state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES state and
completes when the engine reaches the ALLOCATED state. An error is thrown for
synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.

Parameters:
text - String containing plain text to be spoken
listener - receives notification of events as synthesis output proceeds
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
speak(Speakable, SpeakableListener), speak(URL, SpeakableListener), speak(String,
SpeakableListener)

phoneme
public String phoneme(String text)
throws EngineStateError
Returns the phoneme string for a text string. The return string uses the International
Phonetic Alphabet subset of Unicode. The input string is expected to be simple text
(for example, a word or phrase in English). The text is not expected to contain
punctuation or JSML markup.

If the Synthesizer does not support text-to-phoneme conversion or cannot process
the input text, it will return null.

If the text has multiple pronunciations, there is no way to indicate which
pronunciation is preferred.

The phoneme method operates as defined only when a Synthesizer is in the
ALLOCATED state. The call blocks if the Synthesizer is in the ALLOCATING_RESOURCES
state and completes when the engine reaches the ALLOCATED state. An error is thrown
for synthesizers in the DEALLOCATED or DEALLOCATING_RESOURCES states.

Parameters:
text - plain text to be converted to phonemes
Returns:
phonemic representation of text or null
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
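
For illustration (a sketch; whether a transcription is returned, and in what detail, is engine dependent):

String ipa = synth.phoneme("tomato");
if (ipa == null) {
    System.out.println("Engine cannot convert this text to phonemes");
} else {
    System.out.println("Phonemes: " + ipa);
}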

enumerateQueue
public Enumeration enumerateQueue()
throws EngineStateError
Return an Enumeration containing a snapshot of all the objects currently on the
speech output queue. The first item is the top of the queue. An empty queue returns a
null object.

If the return value is non-null then each object it contains is guaranteed to be a
SynthesizerQueueItem object representing the source object (Speakable object,
URL, or a String) and the JSML or plain text obtained from that object.

A QUEUE_UPDATED event is issued to each SynthesizerListener whenever the
speech output queue changes. A QUEUE_EMPTIED event is issued whenever the queue
is emptied.

This method returns only the items on the speech queue placed there by the current
application or applet. For security reasons, it is not possible to inspect items placed by
other applications.

The items on the speech queue cannot be modified by changing the object returned
from this method.

The enumerateQueue method works in the ALLOCATED state. The call blocks if the
Synthesizer is in the ALLOCATING_RESOURCES state and completes when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.

Returns:
an Enumeration of the speech output queue or null
Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
SynthesizerQueueItem, QUEUE_UPDATED, QUEUE_EMPTIED,
addEngineListener
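
A sketch of inspecting the queue, assuming the getSource and getText accessors of SynthesizerQueueItem:

// Snapshot of the queue; the first element is the item currently being spoken.
java.util.Enumeration queue = synth.enumerateQueue();
if (queue != null) {
    while (queue.hasMoreElements()) {
        SynthesizerQueueItem item = (SynthesizerQueueItem) queue.nextElement();
        System.out.println(item.getSource() + " : " + item.getText());
    }
}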
cancel
public void cancel()
throws EngineStateError
Cancel output of the current object at the top of the output queue. A
SPEAKABLE_CANCELLED event is issued to appropriate SpeakableListeners.

If there is another object in the speaking queue, it is moved to the top of the queue and
receives the TOP_OF_QUEUE event. If the Synthesizer is not paused, speech output
continues with that object. To prevent speech output continuing with the next object
in the queue, call pause before calling cancel.

A SynthesizerEvent is issued to indicate QUEUE_UPDATED (if objects remain on the
queue) or QUEUE_EMPTIED (if the cancel leaves the queue empty).

It is not an exception to call cancel if the speech output queue is empty.

The cancel methods work in the ALLOCATED state. The calls block if the
Synthesizer is in the ALLOCATING_RESOURCES state and complete when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.

Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
cancel(Object), cancelAll(), QUEUE_UPDATED, QUEUE_EMPTIED,
TOP_OF_QUEUE, SPEAKABLE_CANCELLED

cancel
public void cancel(Object source)
throws IllegalArgumentException,
EngineStateError
Remove a specified item from the speech output queue. The source object must be
one of the items passed to a speak method. A SPEAKABLE_CANCELLED event is issued
to appropriate SpeakableListeners.

If the source object is the top item in the queue, the behavior is the same as the
cancel() method.

If the source object is not at the top of the queue, it is removed from the queue
without affecting the current top-of-queue speech output. A QUEUE_UPDATED is then
issued to SynthesizerListeners.

If the source object appears multiple times in the queue, only the first instance is
cancelled.
Warning: cancelling an object just after the synthesizer has completed speaking it and
has removed the object from the queue will cause an exception. In this instance, the
exception can be ignored.

The cancel methods work in the ALLOCATED state. The calls block if the
Synthesizer is in the ALLOCATING_RESOURCES state and complete when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.

Parameters:
source - object to be removed from the speech output queue
Throws:
IllegalArgumentException - if the source object is not found in the speech output
queue.
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
cancel(), cancelAll(), QUEUE_UPDATED, QUEUE_EMPTIED,
SPEAKABLE_CANCELLED

cancelAll
public void cancelAll()
throws EngineStateError
Cancel all objects in the synthesizer speech output queue and stop speaking the
current top-of-queue object.

The SpeakableListeners of each cancelled item on the queue receive a
SPEAKABLE_CANCELLED event. A QUEUE_EMPTIED event is issued to attached
SynthesizerListeners.

A cancelAll is implicitly performed before a Synthesizer is deallocated.

The cancel methods work in the ALLOCATED state. The calls block if the
Synthesizer is in the ALLOCATING_RESOURCES state and complete when the engine
reaches the ALLOCATED state. An error is thrown for synthesizers in the DEALLOCATED
or DEALLOCATING_RESOURCES states.

Throws:
EngineStateError - if called for a synthesizer in the DEALLOCATED or
DEALLOCATING_RESOURCES states
See Also:
cancel(), cancel(Object), QUEUE_EMPTIED, SPEAKABLE_CANCELLED

getSynthesizerProperties
public SynthesizerProperties getSynthesizerProperties()
Return the SynthesizerProperties object (a JavaBean). The method returns exactly
the same object as the getEngineProperties method in the Engine interface.
However, with the getSynthesizerProperties method, an application does not
need to cast the return value.

The SynthesizerProperties are available in any state of an Engine. However,
changes only take effect once an engine reaches the ALLOCATED state.

Returns:
the SynthesizerProperties object for this engine
See Also:
getEngineProperties

addSpeakableListener
public void addSpeakableListener(SpeakableListener listener)
Request notifications of all SpeakableEvents for all speech output objects for this
Synthesizer. An application can attach multiple SpeakableListeners to a
Synthesizer. A single listener can be attached to multiple synthesizers.

When an event affects more than one item in the speech output queue (e.g.
cancelAll), the SpeakableEvents are issued in the order of the items in the queue
starting with the top of the queue.

A SpeakableListener can also be provided for an individual speech output item by
providing it as a parameter to one of the speak or speakPlainText methods.

A SpeakableListener can be attached or removed in any Engine state.

Parameters:
listener - the listener that will receive SpeakableEvents
See Also:
removeSpeakableListener
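
A sketch using the SpeakableAdapter convenience class (assumed to provide empty implementations of SpeakableListener) to log when each queued item finishes:

synth.addSpeakableListener(new SpeakableAdapter() {
    public void speakableEnded(SpeakableEvent e) {
        // e.getSource() is the Speakable, URL or String that was spoken.
        System.out.println("Finished speaking: " + e.getSource());
    }
});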

removeSpeakableListener
public void removeSpeakableListener(SpeakableListener listener)
Remove a SpeakableListener from this Synthesizer.

A SpeakableListener can be attached or removed in any Engine state.

