Dragonfly
Release 0.6.6b1
Christo Butcher
2016-06-12
Contents
Documentation
    1.1 Introduction
    1.2 Object model
    1.3 Engines sub-package
    1.4 Actions sub-package
    1.5 Miscellaneous topics
    1.6 Project
    1.7 Test suite
Usage example
Dragonfly is a speech recognition framework. It is a Python package which offers a high-level object model and allows
its users to easily write scripts, macros, and programs which use speech recognition.
It currently supports the following speech recognition engines:
Dragon NaturallySpeaking (DNS), a product of Nuance
Windows Speech Recognition (WSR), included with Microsoft Windows Vista and Windows 7, and freely available for Windows XP
Dragonfly's documentation is available online at Read the Docs. Dragonfly's FAQ is available at Stack Overflow.
Dragonfly's mailing list/discussion group is available at Google Groups.
CHAPTER 1
Documentation
1.1 Introduction
Contents:
computer listen to them and speak to them. This is exactly how some of Dragonfly's users were introduced to writing
software.
Dragonfly also offers a robust and unified platform for people using speech recognition to increase their productivity
and efficiency. An entire repository of Dragonfly command-modules is available which contains command grammars
for controlling common applications and automating frequent desktop activities.
1.1.2 Installation
This section describes how to install Dragonfly. The installation procedure of Dragonfly itself is straightforward. Its
dependencies, however, differ depending on which speech recognition engine is used.
Prerequisites
To use Dragonfly, you will need the following:
Python, v2.5 or later: available, for example, from ActiveState.
Win32 extensions for Python: already included in ActiveState's Python distribution, or available from Mark
Hammond's page.
Natlink (only for Dragon NaturallySpeaking users): available, for example, from Daniel Rocco.
Installation of Dragonfly
Dragonfly is a Python package. A simple installer of the "next, next, finish" type is available from the project's download
page. This installer was created with Python's standard distutils and has been tested on Microsoft Windows XP and
Vista.
Dragonfly's installer will install the library in your Python's local site-packages directory under the dragonfly
subdirectory.
Installation for Dragon NaturallySpeaking
Dragonfly uses Natlink to communicate with DNS. Natlink is available in various forms, including Daniel Rocco's
efficient and tidy pure-Python package. It is available here.
Once Natlink is up and running, Dragonfly command-modules can be treated as any other Natlink macro files. Natlink
automatically loads macro files from a predefined directory. Common locations are:
C:\Program Files\NatLink\MacroSystem
My Documents\Natlink
At least one of these should be present after installing Natlink. That is the place where you should put Dragonfly
command-modules so that Natlink will load them. Don't forget to turn the microphone off and on again after placing
a new command-module in the Natlink directory, because otherwise Natlink does not immediately see the new file.
Installation for Windows Speech Recognition
If WSR is available, then no extra installation needs to be done. Dragonfly can find and communicate with WSR using
standard COM communication channels.
If you would like to use Dragonfly command-modules with WSR, then you must run a loader program which will load
and manage the command-modules. A simple loader is available in the
dragonfly/examples/dragonfly-main.py file. When run, it will scan the directory it's in for other *.py
files and try to load them as command-modules.
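The scanning step described above can be pictured with a short sketch. It is not the code of dragonfly-main.py: the function name find_command_modules and the choice to skip the loader's own file are assumptions made purely for illustration.

```python
import glob
import os


def find_command_modules(directory, loader_name="dragonfly-main.py"):
    """Return the *.py files in `directory`, excluding the loader itself.

    Hypothetical helper: the real loader may select files differently.
    """
    pattern = os.path.join(directory, "*.py")
    paths = sorted(glob.glob(pattern))
    return [p for p in paths if os.path.basename(p) != loader_name]
```

A loader built along these lines would then import each returned file in turn and report any file that fails to load.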
The speech recognition engine processes the audio it receives and calls the following methods of grammar classes to
notify them of the results:
Grammar.process_begin(): Called when the engine detects the start of a phrase, e.g. when the user
starts to speak. This method checks the grammar's context and activates or deactivates its rules depending on
whether the context matches.
Grammar._process_begin(): Called by Grammar.process_begin() allowing derived classes
to easily implement custom functionality without losing the context matching implemented in
Grammar.process_begin().
Grammar.process_recognition(): Called when recognition has completed successfully and results
are meant for this grammar.
Grammar.process_recognition_other(): Called when recognition has completed successfully, but
the results are not meant for this grammar.
Grammar.process_recognition_failure(): Called when recognition was not successful, e.g. the
microphone picked up background noise.
The last three methods are not defined for the base Grammar class. They are only called if they are defined for derived
classes.
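The "only called if defined" behavior of the optional callbacks can be sketched as follows. The classes and the notify helper are stand-ins invented for this illustration, not Dragonfly's Grammar class or engine code; only the callback names are taken from the list above.

```python
class SketchGrammar(object):
    """Stand-in for a grammar class; not Dragonfly's Grammar."""

    def __init__(self):
        self.log = []

    def process_begin(self, executable, title, handle):
        self.log.append("begin")


class SketchCommandGrammar(SketchGrammar):
    """Derived class that opts in to one of the optional callbacks."""

    def process_recognition(self, words):
        self.log.append("recognized: " + " ".join(words))


def notify(grammar, callback_name, *args):
    """Call `callback_name` on `grammar` only if that method is defined."""
    callback = getattr(grammar, callback_name, None)
    if callback is None:
        return False          # optional callback not defined: skip it
    callback(*args)
    return True
```

With this arrangement a derived class pays only for the callbacks it defines; a grammar without process_recognition_failure() is simply never notified of failed recognitions.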
Grammar class
1.2.2 Rules
This section describes the following classes:
dragonfly.grammar.rule_base.Rule: the base rule class
dragonfly.grammar.rule_compound.CompoundRule: a rule class of which the root element is a
dragonfly.grammar.element_compound.Compound element
dragonfly.grammar.rule_mapping.MappingRule: a rule class for creating multiple spoken-form
-> semantic value voice-commands
Rule class
class Rule(name=None, element=None, context=None, imported=False, exported=False)
Rule class for implementing complete or partial voice-commands.
This rule class represents a voice-command or part of a voice-command. It contains a root element, which
defines the language construct of this rule.
Constructor arguments:
name (str): name of this rule. If None, a unique name will automatically be generated.
element (Element): root element for this rule
context (Context, default: None): context within which to be active. If None, the rule will always
be active when its grammar is active.
imported (boolean, default: False): if true, this rule is imported from outside its grammar
exported (boolean, default: False): if true, this rule is a complete top-level rule which can be
spoken by the user. This should be True for voice-commands that the user can speak.
The self._log logger object should be used in methods of derived classes for logging purposes. It is a standard
logger object from the logging module in the Python standard library.
active
This rule's active state. (Read-only)
disable()
Disable this rule so that it is never active to receive recognitions, regardless of whether its context
matches or not.
element
This rule's root element. (Read-only)
enable()
Enable this rule so that it is active to receive recognitions when its context matches.
enabled
This rule's enabled state. An enabled rule is active when its context matches; a disabled rule is never
active regardless of context. (Read-only)
exported
This rule's exported status. See Exported rules for more info. (Read-only)
grammar
This rule's grammar object. (Set once)
imported
This rule's imported status. See Imported rules for more info. (Read-only)
name
This rule's name. (Read-only)
process_begin(executable, title, handle)
Start of phrase callback.
This method is called when the speech recognition engine detects that the user has begun to speak a
phrase. It is called by the rule's containing grammar if the grammar and this rule are active.
The default implementation of this method checks whether this rule's context matches, and if it does, this
method calls _process_begin().
Arguments:
executable: the full path to the module whose window is currently in the foreground
title: window title of the foreground window
handle: window handle to the foreground window
process_recognition(node)
Rule recognition callback.
This method is called when the user has spoken words matching this rule's contents. This method is called
only once for each recognition, and only for the matching top-level rule.
The default implementation of this method does nothing.
Note: This is generally the method which developers should override in derived rule classes to give them
custom functionality when a top-level rule is recognized.
value(node)
Semantic value callback.
This method is called to obtain the semantic value associated with a particular recognition. It could be
called from another rule's value() if that rule references this rule. It can also be called from this rule's
process_recognition() if that method has been overridden to do so in a derived class.
The default implementation of this method returns the value of this rule's root element.
Note: This is generally the method which developers should override in derived rule classes to change
the default semantic value of a recognized rule.
CompoundRule class
The CompoundRule class is designed to make it very easy to create a rule based on a single compound spec.
This rule class has the following parameters to customize its behavior:
spec: compound specification for the rule's root element
extras: extras elements referenced from the compound spec
defaults: default values for the extras
exported: whether the rule is exported
context: context in which the rule will be active
Each of these parameters can be passed as a keyword argument to the constructor, or defined as a class attribute in
a derived class.
Example usage
Class reference
class ExampleRule(MappingRule):

    mapping = {
        "[feed] address [bar]":                Key("a-d"),
        "subscribe [[to] [this] feed]":        Key("a-u"),
        "paste [feed] address":                Key("a-d, c-v, enter"),
        "feeds | feed (list | window | win)":  Key("a-d, tab:2, s-tab"),
        "down [<n>] (feed | feeds)":           Key("a-d, tab:2, s-tab, down:%(n)d"),
        "up [<n>] (feed | feeds)":             Key("a-d, tab:2, s-tab, up:%(n)d"),
        "open [item]":                         Key("a-d, tab:2, c-s"),
        "newer [<n>]":                         Key("a-d, tab:2, up:%(n)d"),
        "older [<n>]":                         Key("a-d, tab:2, down:%(n)d"),
        "mark all [as] read":                  Key("cs-r"),
        "mark all [as] unread":                Key("cs-u"),
        "search [bar]":                        Key("a-s"),
        "search [for] <text>":                 Key("a-s") + Text("%(text)s\n"),
    }
    extras = [
        Integer("n", 1, 20),
        Dictation("text"),
    ]
    defaults = {
        "n": 1,
    }

rule = ExampleRule()
grammar.add_rule(rule)
Class reference
Dictation: free-form dictation; this element matches any words the speaker says, and includes facilities for
formatting the spoken words with correct spacing and capitalization
DictListRef: reference to a dragonfly.all.DictList object; this element is similar to the
dragonfly.all.ListRef element, except that it returns the value associated with the spoken words
instead of the spoken words themselves
ElementBase class
class ElementBase(name=None, default=None)
Base class for all other element classes.
Constructor argument:
name (str, default: None): the name of this element; can be used when interpreting complex
recognition for retrieving elements by name.
_copy_sequence(sequence, name, item_types=None)
Utility function for derived classes that checks that a given object is a sequence, copies its contents into a
new tuple, and checks that each item is of a given type.
_get_children()
Returns an iterable of this element's children.
This method is used by the children property, and should be overridden by any derived classes to
give the correct child elements.
By default, this method returns an empty tuple.
children
Iterable of child elements. (Read-only)
decode(state)
Attempt to decode the recognition stored in the given state.
dependencies(memo)
Returns an iterable containing the dependencies of this element and of this element's children.
The dependencies are the objects that are necessary for this element. These include lists and other rules.
element_tree_string()
Returns a formatted multi-line string representing this element and its children.
gstring()
Returns a formatted grammar string of the contents of this element and its children.
The grammar string is of a format similar to that used by Natlink to define its grammars.
value(node)
Determine the semantic value of this element given the recognition results stored in the node.
Argument:
node: a dragonfly.grammar.state.Node instance representing this element within
the recognition parse tree
The default behavior of this method is to return an iterable containing the recognized words matched by
this element (i.e. node.words()).
Sequence class
class Sequence(children=(), name=None, default=None)
Element class representing a sequence of child elements which must all match a recognition in the correct order.
Constructor arguments:
children (iterable, default: ()): the child elements of this element
Repetition class
class Repetition(child, min=1, max=None, name=None, default=None)
Element class representing a repetition of one child element.
Constructor arguments:
child (ElementBase): the child element of this element
min (int, default: 1): the minimum number of times that the child element must be recognized;
may be 0
max (int, default: None): the maximum number of times that the child element must be recognized;
if None, the child element must be recognized exactly min times (i.e. max = min + 1)
name (str, default: None): the name of this element
For a recognition to match, at least one of the child elements must match the recognition. The first matching
child is used. Child elements are searched in the order they are given in the children constructor argument.
children
Iterable of child elements. (Read-only)
get_repetitions(node)
Returns a list containing the nodes associated with each repetition of this element's child element.
Argument:
node (Node): the parse tree node associated with this repetition element; necessary for
searching for child elements within the parse tree
value(node)
The value of a Repetition is a list containing the values of its child.
The length of this list is equal to the number of times that the child element was recognized.
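The min and max arguments can be read as bounds on the length of that list. The sketch below takes the parenthetical "max = min + 1" literally and therefore treats max as an exclusive upper bound; that reading is an inference from the argument description, and the helper allowed_repetition_counts is hypothetical rather than part of Dragonfly.

```python
def allowed_repetition_counts(min_count=1, max_count=None):
    """List the repetition counts permitted by the given bounds.

    Assumption: the upper bound is exclusive, as implied by the
    statement that max=None behaves like max = min + 1.
    """
    if max_count is None:
        max_count = min_count + 1
    return list(range(min_count, max_count))
```

Under this reading, Repetition(child, min=1, max=4) would accept one, two, or three repetitions of the child element.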
Literal class
class Literal(text, name=None, value=None, default=None)
children
Iterable of child elements. (Read-only)
dependencies(memo)
Returns an iterable containing the dependencies of this element and of this element's children.
The dependencies are the objects that are necessary for this element. These include lists and other rules.
RuleRef class
class RuleRef(rule, name=None, default=None)
children
Iterable of child elements. (Read-only)
ListRef class
class ListRef(name, list, key=None, default=None)
children
Iterable of child elements. (Read-only)
DictListRef class
class DictListRef(name, dict, key=None, default=None)
children
Iterable of child elements. (Read-only)
Dictation class
class Dictation(name=None, format=True, default=None)
children
Iterable of child elements. (Read-only)
dependencies(memo)
Returns an iterable containing the dependencies of this element and of this element's children.
The dependencies are the objects that are necessary for this element. These include lists and other rules.
Compound class
class Compound(spec, extras=None, actions=None, name=None, value=None, value_func=None, elements=None, default=None)
Choice class
class Choice(name, choices, extras=None, default=None)
firefox_context = AppContext(executable="firefox")
reader_context = AppContext(executable="firefox", title="Google Reader")
firefox_but_not_reader_context = firefox_context & ~reader_context
Class reference
class AppContext(executable=None, title=None, exclude=False)
Context class using foreground application details.
This class determines whether the foreground window meets certain requirements. Which requirements must be
met for this context to match are determined by the constructor arguments.
Constructor arguments:
executable (str): (part of) the path name of the foreground application's executable; case insensitive
title (str): (part of) the title of the foreground window; case insensitive
class Context
Base class for other context classes.
This base class implements some basic infrastructure, including what's required for logical operations on context
objects. Derived classes should at least do the following things:
During initialization, set self._str to some descriptive, human readable value. This attribute is used by the
__str__() method.
Override the Context.matches() method to implement the logic to determine when to be active.
The self._log logger object should be used in methods of derived classes for logging purposes. It is a standard
logger object from the logging module in the Python standard library.
matches(executable, title, handle)
Indicate whether the system is currently within this context.
Arguments:
executable (str): path name to the executable of the foreground application
title (str): title of the foreground window
handle (int): window handle to the foreground window
The default implementation of this method simply returns True.
Note: This is generally the method which developers should override to give derived context classes
custom functionality.
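To make the logical operations on contexts concrete, here is a self-contained sketch of how & and ~ could be built on top of matches(). All class names are stand-ins invented for this illustration; Dragonfly's own Context and AppContext classes may be implemented differently. The matching rule (case-insensitive substring of the executable path and window title) follows the AppContext argument descriptions.

```python
class SketchContext(object):
    """Stand-in base class; not Dragonfly's Context implementation."""

    def matches(self, executable, title, handle):
        return True

    def __and__(self, other):
        return _Both(self, other)

    def __invert__(self):
        return _Not(self)


class _Both(SketchContext):
    """Matches only when both wrapped contexts match."""

    def __init__(self, first, second):
        self._first, self._second = first, second

    def matches(self, executable, title, handle):
        return (self._first.matches(executable, title, handle)
                and self._second.matches(executable, title, handle))


class _Not(SketchContext):
    """Matches only when the wrapped context does not."""

    def __init__(self, inner):
        self._inner = inner

    def matches(self, executable, title, handle):
        return not self._inner.matches(executable, title, handle)


class SketchAppContext(SketchContext):
    """Case-insensitive substring match on executable path and title."""

    def __init__(self, executable=None, title=None):
        self._executable = executable.lower() if executable else None
        self._title = title.lower() if title else None

    def matches(self, executable, title, handle):
        if self._executable and self._executable not in executable.lower():
            return False
        if self._title and self._title not in title.lower():
            return False
        return True


firefox = SketchAppContext(executable="firefox")
reader = SketchAppContext(executable="firefox", title="Google Reader")
firefox_but_not_reader = firefox & ~reader
```

Because the combined object is itself a context with a matches() method, it can be combined further or passed wherever a single context is expected.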
1.2.5 Grammars
A grammar is a collection of rules. It manages the rules, loading and unloading them, activating and deactivating
them, and it takes care of all communications with the speech recognition engine. When a recognition occurs, the
associated grammar receives the recognition event and dispatches it to the appropriate rule.
Normally a grammar is associated with a particular context or functionality. Normally the rules within a grammar are
somehow related to each other. However, neither of these is strictly necessary, they are just common use patterns.
The Grammar class and derived classes are described in the Grammar classes section.
1.2.6 Rules
Rules represent voice commands or parts of voice commands. Each rule has a single root element, the basis of a tree
structure of elements defining how the rule is built up out of speakable parts. The element tree determines what a user
1.2.7 Elements
Elements are the basic building blocks of the language model. They define exactly what can be said and thereby form
the content of rules. The most common elements are:
Literal: one or more literal words
Sequence: a series of other elements
Alternative: a choice of other elements, only one of which can be said within a single recognition
Optional: an element container which makes its single child element optional
RuleRef: a reference to another rule
ListRef: a reference to a list, which is a dynamic language element which can be updated and modified
without reloading the grammar
Dictation: a free-form dictation element which allows the speaker to say one or more natural language
words
The above mentioned element types are at the heart of Dragonfly's object model. But of course using them all the time
to specify every grammar would be quite tedious. There is therefore also a special element which constructs these
basic element types from a string specification:
Compound: a special element which parses a string spec to create a hierarchy of basic elements.
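a2 = Text("Hello world!")         # Define a text action.
a2.execute()                      # Type the text.
a4 = a1 + a2                      # Combine two actions into a series.
a4.execute()                      # Execute a1, followed by a2.
a3 = Key("a-f, down/25:4")        # Define a keystroke action.
a4 += a3                          # Append a3 to the series.
a4.execute()                      # Execute a1, a2, and then a3.
Key("w-b, right/25:5").execute()  # Define and execute in one statement.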
Dragonfly's action framework allows for easy definition of things to do, such as text input and sending keystrokes. It
also allows these things to be dynamically coupled to voice commands, so as to enable the actions to contain dynamic
elements from the recognized command.
An example would be a voice command to find some bit of text:
Command specification: "please find <text>"
Associated action: Key("c-f") + Text("%(text)s")
Special element: Dictation("text")
This triplet would allow the user to say "please find some words", which would result in control-f being pressed to
open the Find dialog followed by "some words" being typed into the dialog. The special element is necessary to
define what the dynamic element "text" is.
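The "%(text)s" placeholder uses the same syntax as Python's ordinary string formatting with a mapping. The sketch below shows that substitution on its own, outside Dragonfly; the helper fill_spec is hypothetical, and it is an assumption that the action framework resolves placeholders in an equivalent way.

```python
def fill_spec(spec, extras):
    """Substitute recognized extras into an action specification.

    Uses ordinary Python "%" formatting with a mapping; assumed here to
    mirror how "%(text)s" is resolved, for illustration only.
    """
    return spec % extras


# Extras as they might look after the user says "please find some words".
extras = {"text": "some words"}
typed = fill_spec("%(text)s", extras)
```

The same mechanism covers numeric extras: a spec such as "down:%(n)d" combined with {"n": 3} yields "down:3".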
Named Repeat factors retrieve their factor-value from the supplied data:
>>> named = Repeat(extra="foo")
>>> named.factor()
Traceback (most recent call last):
...
ActionError: No extra repeat factor found for name 'foo' ('NoneType' object is unsubscriptable)
>>> named.factor({"foo": 4})
4
Repeat factors with both integer count and named extra values set combine (add) these together to determine
their factor-value:
>>> combined = Repeat(count=3, extra="foo")
>>> combined.factor()
Traceback (most recent call last):
...
ActionError: No extra repeat factor found for name 'foo' ('NoneType' object is unsubscriptable)
>>> combined.factor({"foo": 4}) # Combined factors 3 + 4 = 7.
7
Key action
This section describes the Key action object. This type of action is used for sending keystrokes to the foreground
application. Examples of how to use this class are given in Example key actions.
Keystroke specification format
The spec argument passed to the Key constructor specifies which keystroke events will be emulated. It is a string
consisting of one or more comma-separated keystroke elements. Each of these elements has one of the following two
possible formats:
Normal press-release key action, optionally repeated several times:
    [modifiers -] keyname [/ innerpause] [: repeat] [/ outerpause]
Press-and-hold a key, or release a held-down key:
    [modifiers -] keyname : direction [/ outerpause]
The different parts of the keystroke specification are as follows. Note that only keyname is required; the other fields
are optional.
modifiers: Modifiers for this keystroke. These keys are held down while pressing the main keystroke. Can be
zero or more of the following:
    a: alt key
    c: control key
    s: shift key
    w: Windows key
keyname: Name of the keystroke. Valid names are listed in Key names.
innerpause: The time to pause between repetitions of this keystroke.
repeat: The number of times this keystroke should be repeated. If not specified, the key will be pressed and
released once.
outerpause: The time to pause after this keystroke.
direction: Whether to press-and-hold or release the key. Must be one of the following:
    down: press and hold the key
    up: release the key
Note that releasing a key which is not being held down does not cause an error. It harmlessly does nothing.
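The two element formats above can be explored with a small regular-expression parser. This is an illustrative sketch written from the format description only; it is not Dragonfly's parser, and the names _ELEMENT and parse_key_spec are hypothetical. Pause values are kept as the raw strings found in the spec.

```python
import re

# Hypothetical pattern for ONE keystroke element, following the two
# formats listed above. This is an illustration, not Dragonfly's parser.
_ELEMENT = re.compile(r"""
    ^\s*
    (?:(?P<modifiers>[acsw]+)\s*-\s*)?      # optional modifiers, then "-"
    (?P<keyname>\w+)                        # required key name
    (?:\s*/\s*(?P<innerpause>\d+))?         # optional "/ innerpause"
    (?:\s*:\s*(?P<count>\d+|up|down))?      # optional ": repeat" or ": direction"
    (?:\s*/\s*(?P<outerpause>\d+))?         # optional "/ outerpause"
    \s*$
""", re.VERBOSE)


def parse_key_spec(spec):
    """Split a spec on commas and describe each keystroke element."""
    parsed = []
    for element in spec.split(","):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError("cannot parse keystroke element: %r" % element)
        fields = match.groupdict()
        count = fields.pop("count")
        # ": up" / ": down" is a direction; a number is a repeat count.
        fields["direction"] = count if count in ("up", "down") else None
        fields["repeat"] = int(count) if count and count.isdigit() else None
        parsed.append(fields)
    return parsed
```

For example, parse_key_spec("a-f, down/25:4") describes two elements: the f key with the alt modifier, and the down key with an inner pause of 25 repeated 4 times. A lone pause such as a/50 lands in the innerpause slot here only because that slot comes first in the pattern; the format description does not say which pause Dragonfly takes it to be.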
Key names
Symbol keys: bang or exclamation, at, hash, dollar, percent, caret, and or ampersand, star
or asterisk, leftparen or lparen, rightparen or rparen, minus or hyphen, underscore,
plus, backtick, tilde, leftbracket or lbracket, rightbracket or rbracket, leftbrace
or lbrace, rightbrace or rbrace, backslash, bar, colon, semicolon, apostrophe or
singlequote or squote, quote or doublequote or dquote, comma, dot, slash, lessthan or
leftangle or langle, greaterthan or rightangle or rangle, question, equal or equals
Whitespace and editing keys: enter, tab, space, backspace, delete or del
Modifier keys: shift, control or ctrl, alt
Special keys: escape, insert, pause, win, apps or popup
Navigation keys: up, down, left, right, pageup or pgup, pagedown or pgdown, home, end
Number pad keys: npmul, npadd, npsep, npsub, npdec, npdiv, numpad0 or np0, numpad1 or np1,
numpad2 or np2, numpad3 or np3, numpad4 or np4, numpad5 or np5, numpad6 or np6, numpad7 or
np7, numpad8 or np8, numpad9 or np9
Function keys: f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, f16, f17, f18, f19,
f20, f21, f22, f23, f24
Multimedia keys: volumeup or volup, volumedown or voldown, volumemute or volmute,
tracknext, trackprev, playpause, browserback, browserforward
Example key actions
The following code types the text "Hello world!" into the foreground application:
Key("H, e, l, l, o, space, w, o, r, l, d, exclamation").execute()
The following code is a bit more useful, as it saves the current file with the name "dragonfly.txt" (this works for many
English-language applications):
action = Key("a-f, a/50") + Text("dragonfly.txt") + Key("enter")
action.execute()
The following code selects the next four lines by holding down the shift key, slowly moving down 4 lines, and then
releasing the shift key:
Key("shift:down, down/25:4, shift:up").execute()
The following code locks the screen by pressing the Windows key together with the l key:
Key("w-l").execute()
The format of the keystroke specification spec is described in Keystroke specification format.
This class emulates keyboard activity by sending keystrokes to the foreground application. It does this using
Dragonfly's keyboard interface implemented in the keyboard and sendinput modules. These use the
SendInput() function of the Win32 API.
Text action
This section describes the Text action object. This type of action is used for typing text into the foreground application.
It differs from the Key action in that Text is used for typing literal text, while
dragonfly.actions.action_key.Key emulates pressing keys on the keyboard. For example,
the arrow keys are not part of a text and so cannot be typed using the Text action, but can be sent by the
dragonfly.actions.action_key.Key action.
class Text(spec=None, static=False, pause=0.02, autofmt=False)
Action that sends keyboard events to type text.
Arguments:
spec (str): the text to type
static (boolean): if True, do not dynamically interpret spec when executing this action
pause (float): the time to pause between each keystroke, given in seconds
autofmt (boolean): if True, attempt to format the text with correct spacing and capitalization. This
is done by first mimicking a word recognition and then analyzing its spacing and capitalization and
applying the same formatting to the text.
Paste action
class Paste(contents, format=None, paste=None, static=False)
Paste-from-clipboard action.
Constructor arguments:
contents (str): contents to paste
format (int, Win32 clipboard format): clipboard format
paste (instance derived from ActionBase): paste action
static (boolean): flag indicating whether the specification contains dynamic elements
This action inserts the given contents into the Windows system clipboard, and then performs the paste action to
paste it into the foreground application. By default, the paste action is the Control-v keystroke. The default
clipboard format to use is the Unicode text format.
Mouse action
This section describes the Mouse action object. This type of action is used for controlling the mouse cursor and
clicking mouse buttons.
Below you'll find some simple examples of Mouse usage, followed by a detailed description of the available mouse
events.
Example mouse actions
The following code moves the mouse cursor to the center of the foreground window ((0.5, 0.5)) and then clicks
the left mouse button once (left):
The line below moves the mouse cursor to 100 pixels left of the desktop's right edge and 250 pixels down from its top
edge ([-100, 250]), and then double clicks the right mouse button (right:2):
# Square brackets ("[...]") give desktop-relative locations.
# Integer locations ("1", "100", etc.) denote numbers of pixels.
# Negative numbers ("-100") are counted from the right-edge or the
# bottom-edge of the desktop or window.
Mouse("[-100, 250], right:2").execute()
The following command drags the mouse from the top right corner of the foreground window ((0.9, 10),
left:down) to the bottom left corner ((25, -0.1), left:up):
Mouse("(0.9, 10), left:down, (25, -0.1), left:up").execute()
The code below moves the mouse cursor 25 pixels right and 25 pixels up (<25, -25>):
# Angle brackets ("<...>") move the cursor from its current position
# by the given number of pixels.
Mouse("<25, -25>").execute()
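The examples above mix pixel counts, fractions, and negative values. The following sketch shows one way such a value could be turned into an absolute pixel position. It is not Dragonfly's implementation: the helper resolve_coordinate is hypothetical, and it assumes that a value written with a decimal point is a fraction of the reference size while any other value is a pixel count.

```python
def resolve_coordinate(text, origin, size):
    """Turn one `number` from a mouse spec into an absolute pixel position.

    text   -- the number as written in the spec, e.g. "0.5" or "-100"
    origin -- pixel position of the reference area's left or top edge
    size   -- width or height of the reference area in pixels

    Assumption: a value written with a decimal point is a fraction of
    `size`; anything else is a pixel count.
    """
    text = text.strip()
    if "." in text:
        offset = int(round(float(text) * size))
    else:
        offset = int(text)
    if text.startswith("-"):
        # Negative values count back from the right or bottom edge.
        return origin + size + offset
    return origin + offset
```

On a desktop 1920 pixels wide starting at position 0, resolve_coordinate("-100", 0, 1920) gives 1820, i.e. 100 pixels left of the right edge. Whether the far edge is counted as origin + size or one pixel less is a detail this sketch does not settle.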
The spec argument passed to the Mouse constructor specifies which mouse events will be emulated. It is a string
consisting of one or more comma-separated elements. Each of these elements has one of the following possible
formats:
Mouse movement actions:
    location is absolute on the entire desktop: [ number , number ]
    location is relative to the foreground window: ( number , number )
    move the cursor relative to its current position: < pixels , pixels >
In the above specifications, the number and pixels have the following meanings:
    number: can specify a number of pixels or a fraction of the reference window or desktop. For example:
        (10, 10): 10 pixels to the right and down from the foreground window's top-left corner
        (0.5, 0.5): center of the foreground window
    pixels: specifies the number of pixels
Mouse button-press action:
    keyname [: repeat] [/ pause]
    keyname: specifies which mouse button to click:
        left: left mouse button key
        middle: middle mouse button key
Class reference
If an error occurs during mimicking the given recognition, then an ActionError is raised. A common error is
that the engine does not know the given words and can therefore not recognize them. For example, the following
attempts to mimic recognition of one single word including a space and an exclamation-mark; this will almost
certainly fail:
Mimic("hello world!").execute()
The constructor accepts the optional extra keyword argument, and uses this to retrieve dynamic data from
the extras associated with the recognition. For example, this can be used as follows to implement dynamic
mimicking:
class ExampleRule(MappingRule):
    mapping = {
        "mimic recognition <text> [<n> times]":
            Mimic(extra="text") * Repeat(extra="n"),
    }
    extras = [
        IntegerRef("n", 1, 10),
        Dictation("text"),
    ]
    defaults = {
        "n": 1,
    }
The example above will allow the user to speak "mimic recognition hello world! 3 times", which would result
in the exact same output as if the user had spoken "hello world!" three times in a row.
Playback action
The Playback action mimics a sequence of recognitions. This is useful, for example, for repeating a series of
prerecorded or predefined voice-commands.
This class could for example be used to reload with one single action:
action = Playback([
(["focus", "Natlink"], 1.0),
(["File"], 0.5),
(["Reload"], 0.0),
])
action.execute()
Class reference
FocusWindow action
class FocusWindow(executable=None, title=None)
Action that brings a window to the foreground.
Constructor arguments:
executable (str): part of the filename of the application's executable to which the target window
belongs; not case sensitive.
title (str): part of the title of the target window; not case sensitive.
This action searches all visible windows for a window which matches the given parameters.
BringApp and StartApp actions
The StartApp and BringApp action classes are used to start an application and bring it to the foreground.
StartApp starts an application by running an executable file, while BringApp first checks whether the application
is already running and if so brings it to the foreground, otherwise starts it by running the executable file.
Example usage
The following example brings Notepad to the foreground if it is already open, otherwise it starts Notepad:
BringApp(r"C:\Windows\system32\notepad.exe").execute()
Note that the path to notepad.exe given above might not be correct for your computer, since it depends on the operating
system and its configuration.
In some cases an application might be accessible simply through the file name of its executable, without specifying
the directory. This depends on the operating system's path configuration. For example, on the author's computer the
following command successfully starts Notepad:
BringApp("notepad").execute()
Class reference
Pause action
class Pause(spec=None, static=False)
Pause for the given amount of time.
The spec constructor argument should be a string giving the time to wait in hundredths of a second. For example, the following code will pause for 20/100 s = 0.2 seconds:
Pause("20").execute()
The spec must be given as a string so that it can be used in dynamic value evaluation. For example, the following code determines the time to pause at execution time:
action = Pause("%(time)d")
data = {"time": 37}
action.execute(data)
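The dynamic evaluation above relies on ordinary Python string formatting. The following sketch shows the idea in isolation; pause_seconds is a hypothetical helper, not part of Dragonfly, and it only computes the duration rather than actually waiting.

```python
def pause_seconds(spec, data=None):
    """Resolve a Pause-style spec string to a duration in seconds.

    The spec is first filled in from *data* (if given) using %-formatting,
    then interpreted as a number of hundredths of a second.
    """
    resolved = spec % data if data is not None else spec
    return int(resolved) / 100.0


static_pause = pause_seconds("20")                        # fixed spec
dynamic_pause = pause_seconds("%(time)d", {"time": 37})   # resolved at call time
```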
An instance of the Clipboard class contains clipboard data. The data stored within an instance can be transferred to and from the Windows system clipboard as follows (before running this example, the text "asdf" was copied into the Windows system clipboard):
>>> from dragonfly.windows.clipboard import Clipboard
>>> instance = Clipboard()  # Create empty instance.
>>> print instance
Clipboard()
>>> instance.copy_from_system()  # Retrieve from system clipboard.
>>> print instance
Clipboard(unicode=u'asdf', text, oemtext, locale)
>>> # The line above shows that *instance* now contains content for
>>> # 4 different clipboard formats: unicode, text, oemtext, locale.
>>> # The unicode format content is also displayed.
>>> instance.copy_to_system()
Developers frequently want to use the Windows system clipboard to perform some task without losing the data currently stored in it. This backing up and restoring can be achieved as follows:
>>> from dragonfly.windows.clipboard import Clipboard
>>> # Initialize instance with system clipboard content.
... original = Clipboard(from_system=True)
>>> print original
Clipboard(unicode=u'asdf', text, oemtext, locale)
>>> # Use the system clipboard to do something.
... temporary = Clipboard({Clipboard.format_unicode: u"custom content"})
>>> print temporary
Clipboard(unicode=u'custom content')
>>> temporary.copy_to_system()
>>> from dragonfly.all import Key
>>> Key("c-v").execute()
>>> # Restore original system clipboard content.
... print Clipboard(from_system=True) # Show system clipboard contents.
Clipboard(unicode=u'custom content', text, oemtext, locale)
>>> original.copy_to_system()
>>> print Clipboard(from_system=True) # Show system clipboard contents.
Clipboard(unicode=u'asdf', text, oemtext, locale)
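The backup-and-restore pattern shown in the session above can be wrapped so that the restore step always runs, even if the intermediate work fails. The sketch below uses an in-memory stand-in rather than the real system clipboard; FakeSystemClipboard and preserved are hypothetical names and are not part of Dragonfly.

```python
from contextlib import contextmanager


class FakeSystemClipboard(object):
    """In-memory stand-in for the system clipboard (hypothetical)."""

    def __init__(self, content):
        self.content = content


@contextmanager
def preserved(clipboard):
    """Back up the clipboard content and restore it on exit."""
    original = clipboard.content
    try:
        yield clipboard
    finally:
        clipboard.content = original   # runs even if the body raises


system = FakeSystemClipboard(u"asdf")
with preserved(system) as clip:
    clip.content = u"custom content"   # temporary use of the clipboard
    during = system.content
after = system.content
```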
Clipboard class
Arguments:
format (int): the clipboard format to look for.
has_text()
Determine whether this instance has text content.
The main program using Dragonfly's configuration toolkit would normally look something like this:
from dragonfly.all import Config, Section, Item
# *Setup* phase.
# This defines a configuration object with the name "Example
# configuration". It contains one section with the title
# "Test section", which has two configuration items. Both
# these items have a default value and a docstring.
config = Config("Example configuration")
config.test = Section("Test section")
config.test.fruit = Item("apple", doc="Must eat fruit.")
config.test.color = Item("blue", doc="The color of life.")
# *Load* phase.
# This searches for a file with the same name as the main program,
# but with the extension ".py" replaced by ".txt". It is also
# possible to explicitly specify the configuration file's path.
# See Config.load() for more details.
config.load()
# *Use* phase.
# The configuration values can now be accessed through the
# configuration object as follows.
print "The color of life is", config.test.color
print "You must eat an %s every day" % config.test.fruit
The configuration defined above is essentially complete. Every configuration item has a default value and can be accessed by the program. A user who would like to modify some or all of these settings can do so in an external configuration file without modifying the main program code.
This external configuration file is interpreted as Python code. This gives its author powerful tools for determining the
desired configuration settings. However, it will usually consist merely of variable assignments. The configuration file
for the program above might look something like this:
# Test section
test.fruit = "banana"
test.color = "white"
Implementation details
This configuration toolkit makes use of Python's special methods for setting and retrieving object attributes. This makes it much easier to use, as there is no need for functions such as value = get_config_value(item_name); instead the configuration values are immediately accessible as Python objects. It also allows for more extensive error checking; it is, for example, trivial to implement custom Item classes which only allow specific values or value types, such as integers, boolean values, etc.
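As a rough illustration of this technique, the sketch below uses __setattr__ and __getattr__ to expose item values as plain attributes while checking the type of each assignment. It is a simplified stand-in written for this explanation, not Dragonfly's code; SketchItem and SketchSection are hypothetical names.

```python
class SketchItem(object):
    """Holds a default and a current value; accepts only same-typed values."""

    def __init__(self, default):
        self.default = default
        self.value = default

    def validate(self, value):
        # Reject values that are not of the default value's type.
        if not isinstance(value, type(self.default)):
            raise TypeError("expected %s, got %s"
                            % (type(self.default).__name__, type(value).__name__))


class SketchSection(object):
    def __init__(self):
        # Bypass our own __setattr__ when creating the item table.
        object.__setattr__(self, "_items", {})

    def __setattr__(self, name, value):
        items = self.__dict__["_items"]
        if isinstance(value, SketchItem):
            items[name] = value              # define a new item
        elif name in items:
            items[name].validate(value)      # check, then store the new value
            items[name].value = value
        else:
            raise AttributeError("unknown item %r" % name)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_items"][name].value
        except KeyError:
            raise AttributeError(name)


test = SketchSection()
test.fruit = SketchItem("apple")
test.fruit = "banana"          # same type as the default: accepted
try:
    test.fruit = 3             # different type: rejected
    rejected = False
except TypeError:
    rejected = True
```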
Configuration class reference
class Config(name)
Configuration class for storing program settings.
Constructor argument:
name (str): the name of this configuration object.
This class can contain zero or more Section instances, each of which can contain zero or more Item instances. It is these items which store the actual configuration settings. The sections merely divide the items up
into groups, so that different configuration topics can be split for easy readability.
generate_config_file(path=None)
Create a configuration file containing this configuration object's current settings.
path (str, default: None): path of the configuration file to create. If None, then a path is generated from the calling module's file name by replacing its extension with .txt.
load(path=None)
Load the configuration file at the given path, or look for a configuration file associated with the calling module.
path (str, default: None): path to the configuration file to load. If None, then a path is generated from the calling module's file name by replacing its extension with .txt.
If the path is a file, it is loaded. If it does not exist or is not a file, nothing is loaded and this configuration's defaults remain in place.
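The path rule described above can be sketched with the standard library alone. The helper names below are hypothetical, and how Dragonfly determines the calling module's file name is not shown; the sketch simply takes that file name as an argument.

```python
import os


def default_config_path(module_path):
    """Replace the module file's extension with .txt."""
    root, _extension = os.path.splitext(module_path)
    return root + ".txt"


def should_load(path):
    """Only an existing regular file is loaded; otherwise defaults stay in place."""
    return os.path.isfile(path)


derived = default_config_path(os.path.join("macros", "example.py"))
missing_loadable = should_load(os.path.join("no_such_directory_xyz", "missing.txt"))
```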
class Section(doc)
Section of a configuration for grouping items.
Constructor argument:
doc (str): the name of this configuration section.
A section can contain zero or more subsections and zero or more configuration items.
class Item(default, doc=None, namespace=None)
Configuration item for storing configuration settings.
Constructor arguments:
default: the default value for this item
doc (str, default: None): an optional description of this item
namespace (dict, default: None): an optional namespace dictionary which will be made available to the Python code in the external configuration file during loading
A configuration item is the object that stores the actual configuration settings. Each item has a default value, a
current value, an optional description, and an optional namespace.
This class performs the checking of configuration values assigned to it during loading of the configuration file.
The default behavior of this class is to accept only values of the same Python type as the item's default value. So, if the default value is a string, then the value assigned in the configuration file must also be a string; otherwise an exception is raised and loading fails.
Developers who want other kinds of value checking should override the Item.validate() method.
validate(value)
Determine whether the given value is valid.
This method performs validity checking of the configuration value assigned to this item during loading of the external configuration file. The default behavior is to raise a TypeError if the type of the assigned value is not the same as the type of the default value.
1.6 Project
Contents:
The first line of a commit message represents the title or summary of the change. Standard Git tools often display it
differently than the rest of the message, for example in bold font, or show only this line, for example when summarizing
multiple commits.
A commit message's first line should be formatted as follows:
The first line should be no longer than 50 characters
Issues can be referenced anywhere within a commit message via their numbered tag, e.g. #7.
Commits that change the status of an issue, for example by fixing a bug or implementing a feature, should make the relationship and change explicit on the last line of the commit message using the following format: Resolve #X. GitHub will automatically update the issue accordingly.
Please see GitHub's help on commit message keywords for more information.
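The two rules stated in this section (a first line of at most 50 characters, and a closing "Resolve #X" line for commits that change an issue's status) can be checked mechanically. The following is a minimal sketch, not a tool shipped with the project; check_commit_message and the sample message are hypothetical.

```python
import re


def check_commit_message(message, resolves_issue=False):
    """Return a list of problems found in a commit message."""
    lines = message.strip().splitlines()
    if not lines:
        return ["empty commit message"]
    problems = []
    if len(lines[0]) > 50:
        problems.append("first line longer than 50 characters")
    if resolves_issue and not re.match(r"^Resolve #\d+$", lines[-1].strip()):
        problems.append("last line should read 'Resolve #X'")
    return problems


good = check_commit_message(
    "Fix Pause spec parsing\n\nHandle empty specs.\n\nResolve #7",
    resolves_issue=True,
)
too_long = check_commit_message("x" * 51)
```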
Example
Version incrementation
built automatically, and if not trigger it:
Sequence
Basic usage:
>>> seq = Sequence([Literal("hello"), Literal("world")])
>>> test_seq = ElementTester(seq)
>>> test_seq.recognize("hello world")
[u'hello', u'world']
>>> test_seq.recognize("hello universe")
RecognitionFailure
Constructor arguments:
Hello world
The spec of the compound element below is parsed into a single literal "hello world". The semantic value of the compound element will therefore be the same as for that literal element, namely "hello world".
>>> element = Compound("hello world")
>>> tester = ElementTester(element)
>>> tester.recognize("hello world")
u'hello world'
>>> tester.recognize("hello universe")
RecognitionFailure
The spec of the compound element below is parsed into a sequence with three elements: the word "hello", an optional "there", and an alternative of "world" or "universe". The semantic value of the compound element will therefore have three elements, even when "there" is not spoken.
>>> element = Compound("hello [there] (world | universe)")
>>> tester = ElementTester(element)
>>> tester.recognize("hello world")
[u'hello', None, u'world']
>>> tester.recognize("hello there world")
[u'hello', u'there', u'world']
>>> tester.recognize("hello universe")
[u'hello', None, u'universe']
>>> tester.recognize("hello galaxy")
RecognitionFailure
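To show why the optional element still occupies a slot in the result, here is a small stand-alone matcher built from plain functions. It is a sketch written for this explanation, not Dragonfly's parser: the helper names are hypothetical, the spec is assembled by hand rather than parsed from a string, and a failed recognition is represented by None instead of RecognitionFailure.

```python
def literal(text):
    parts = text.split()
    def match(words, pos):
        if words[pos:pos + len(parts)] == parts:
            yield text, pos + len(parts)
    return match


def optional(child):
    def match(words, pos):
        for result in child(words, pos):
            yield result
        yield None, pos        # not spoken: value None, no words consumed
    return match


def alternative(*children):
    def match(words, pos):
        for child in children:
            for result in child(words, pos):
                yield result
    return match


def sequence(*children):
    def match(words, pos, index=0):
        if index == len(children):
            yield [], pos
            return
        for value, next_pos in children[index](words, pos):
            for rest, end in match(words, next_pos, index + 1):
                yield [value] + rest, end
    return match


def recognize(element, utterance):
    words = utterance.split()
    for value, end in element(words, 0):
        if end == len(words):  # every spoken word must be consumed
            return value
    return None                # stands in for RecognitionFailure


# Hand-built equivalent of "hello [there] (world | universe)".
element = sequence(literal("hello"),
                   optional(literal("there")),
                   alternative(literal("world"), literal("universe")))
without_there = recognize(element, "hello world")
with_there = recognize(element, "hello there world")
failed = recognize(element, "hello galaxy")
```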
A list update is automatically available for recognition without reloading the grammar:
>>> tester_fruit.recognize("item apple")
RecognitionFailure
>>> list_fruit.append("apple")
>>> list_fruit
['apple']
>>> tester_fruit.recognize("item apple")
[u'item', u'apple']
>>> tester_fruit.recognize("item banana")
RecognitionFailure
>>> list_fruit.append("banana")
>>> list_fruit
['apple', 'banana']
>>> tester_fruit.recognize("item apple")
[u'item', u'apple']
>>> tester_fruit.recognize("item banana")
[u'item', u'banana']
>>> tester_fruit.recognize("item apple banana")
RecognitionFailure
>>> list_fruit.remove("apple")
>>> list_fruit
['banana']
>>> tester_fruit.recognize("item apple")
RecognitionFailure
>>> tester_fruit.recognize("item banana")
[u'item', u'banana']
Lists can contain the same value multiple times, although that does not affect recognition:
>>> list_fruit.append("banana")
>>> list_fruit
['banana', 'banana']
>>> tester_fruit.recognize("item banana")
[u'item', u'banana']
>>> tester_fruit.recognize("item banana banana")
RecognitionFailure
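The behavior in these sessions follows from the element holding a reference to the list rather than a copy of it. A minimal sketch of that idea, with a hypothetical make_list_recognizer helper that handles single-word items only and returns None in place of RecognitionFailure:

```python
def make_list_recognizer(prefix, items):
    """Return a recognizer for '<prefix> <item>' that consults *items* on every call."""
    def recognize(utterance):
        words = utterance.split()
        if len(words) == 2 and words[0] == prefix and words[1] in items:
            return words
        return None            # stands in for RecognitionFailure
    return recognize


list_fruit = []
recognize_fruit = make_list_recognizer("item", list_fruit)
before = recognize_fruit("item apple")    # list is still empty
list_fruit.append("apple")                # update is visible immediately
after = recognize_fruit("item apple")
list_fruit.append("apple")                # a duplicate does not change the outcome
repeated = recognize_fruit("item apple apple")
```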
Multiple lists
list_meat = List("list_meat")
list_veg = List("list_veg")
element = Sequence([Literal("food"),
                    ListRef("list_meat_ref", list_meat),
                    ListRef("list_veg_ref", list_veg)])
tester_meat_veg = ElementTester(element)
# Explicitly load tester grammar because lists can only be updated
# for loaded grammars.
tester_meat_veg.load()
element = Sequence([Literal("carnivore"),
                    ListRef("list_meat_ref1", list_meat),
                    ListRef("list_meat_ref2", list_meat)])
tester_carnivore = ElementTester(element)
# Explicitly load tester grammar because lists can only be updated
# for loaded grammars.
tester_carnivore.load()
RecognitionFailure
>>> tester_carnivore.recognize("carnivore hamburger steak")
[u'carnivore', u'hamburger', u'steak']
>>> tester_carnivore.recognize("carnivore steak hamburger")
[u'carnivore', u'steak', u'hamburger']
>>> tester_carnivore.recognize("carnivore steak steak")
[u'carnivore', u'steak', u'steak']
>>> list_meat.remove("steak")
>>> tester_carnivore.recognize("carnivore steak hamburger")
RecognitionFailure
>>> tester_carnivore.recognize("carnivore hamburger hamburger")
[u'carnivore', u'hamburger', u'hamburger']
ListRef construction
ListRef objects must be created referencing the correct type of list object:
>>> print ListRef("list_fruit_ref", []) # Fails.
Traceback (most recent call last):
...
TypeError: List argument to ListRef constructor must be a Dragonfly list.
>>> print ListRef("list_fruit_ref", List("list_fruit")) # Succeeds.
ListRef('list_fruit')
Note:
RecognitionObserver instances can be used with both the DNS and the WSR backend engines. However, WSR does not offer access to the words recognized by a different context, so RecognitionObserver.on_recognition() will always be called with words = False.
Test fixture initialization:
>>> from dragonfly import *
>>> from dragonfly.test import ElementTester
>>> test_recobs.register()
>>> test_recobs.waiting, test_recobs.words
(False, None)
86
>>> history
[(u'hello', u'world'), (u'eighty', u'six')]
Minimum length is 1:
>>> history = RecognitionHistory(1)
>>> history.register()
>>> history
[]
>>> for i, word in enumerate(["one", "two", "three", "four", "five"]):
...     assert test_int.recognize(word) == i + 1
>>> history
[(u'five',)]
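A bounded history of this kind can be sketched with collections.deque, which discards the oldest entry once its maximum length is reached. SketchHistory is a hypothetical stand-in rather than Dragonfly's RecognitionHistory, and rejecting lengths below 1 with a ValueError is an assumption made for this sketch.

```python
from collections import deque


class SketchHistory(object):
    """Keep only the most recent *length* recognitions."""

    def __init__(self, length):
        if length < 1:
            # Assumption: lengths below the minimum of 1 are rejected.
            raise ValueError("length must be a positive integer")
        self._recent = deque(maxlen=length)

    def on_recognition(self, words):
        self._recent.append(tuple(words))   # oldest entry drops out when full

    def as_list(self):
        return list(self._recent)


history = SketchHistory(1)
for word in ["one", "two", "three", "four", "five"]:
    history.on_recognition([word])
latest = history.as_list()
```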
executing 'a'
>>> a.execute({"foo": 2})
executing 'a' {'foo': 2}
>>>
Concatenating actions
# In place concatenation.
Repeating actions
# Integer-factor repetition.
executing 'a'
>>> factor = Repeat(extra="foo")  # Named-factor repetition.
>>> ((a + b) * factor).execute({"foo": 2})
executing 'a' {'foo': 2}
executing 'b' {'foo': 2}
executing 'a' {'foo': 2}
executing 'b' {'foo': 2}
>>> ((a + b) * factor).execute({"bar": 2})
Traceback (most recent call last):
...
ActionError: No extra repeat factor found for name 'foo'
>>> c = a
>>> c.execute({"foo": 2})
executing 'a' {'foo': 2}
>>> c *= Repeat(extra="foo")
>>> c.execute({"foo": 2})
executing 'a' {'foo': 2}
executing 'a' {'foo': 2}
>>> c += b
>>> c *= 2
>>> c.execute({"foo": 1})
executing 'a' {'foo': 1}
executing 'b' {'foo': 1}
executing 'a' {'foo': 1}
executing 'b' {'foo': 1}
>>> c *= 2
>>> c.execute({"foo": 0})
executing 'b' {'foo': 0}
executing 'b' {'foo': 0}
executing 'b' {'foo': 0}
executing 'b' {'foo': 0}
>>> c *= 0
>>> c.execute({"foo": 1})
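The + and * behavior exercised above is ordinary Python operator overloading. The sketch below shows one simple way to obtain concatenation and repetition, including a repeat factor looked up by name at execution time. It is not Dragonfly's implementation: all class names are hypothetical, in-place operators are not modelled, and a missing factor raises KeyError here rather than ActionError.

```python
class SketchAction(object):
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def execute(self, data=None):
        self.log.append((self.name, data))

    def __add__(self, other):
        return SketchSeries([self, other])       # a + b: run a, then b

    def __mul__(self, factor):
        return SketchRepetition(self, factor)    # a * n: run a n times


class SketchSeries(SketchAction):
    def __init__(self, actions):
        self.actions = list(actions)

    def execute(self, data=None):
        for action in self.actions:
            action.execute(data)


class SketchRepeat(object):
    """Repeat factor: a fixed count, or a name looked up in the data."""

    def __init__(self, count=None, extra=None):
        self.count = count
        self.extra = extra

    def resolve(self, data):
        if self.extra is None:
            return self.count
        if not data or self.extra not in data:
            raise KeyError("No extra repeat factor found for name %r" % self.extra)
        return data[self.extra]


class SketchRepetition(SketchAction):
    def __init__(self, action, factor):
        self.action = action
        self.factor = factor if isinstance(factor, SketchRepeat) else SketchRepeat(count=factor)

    def execute(self, data=None):
        for _ in range(self.factor.resolve(data)):
            self.action.execute(data)


log = []
a = SketchAction("a", log)
b = SketchAction("b", log)
((a + b) * SketchRepeat(extra="foo")).execute({"foo": 2})
executed = [name for name, _data in log]

try:
    (a * SketchRepeat(extra="foo")).execute({"bar": 2})
    missing_factor_rejected = False
except KeyError:
    missing_factor_rejected = True
```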
CHAPTER 2
Usage example
A very simple example of Dragonfly usage is to create a static voice command with a callback that will be called when
the command is spoken. This is done as follows:
from dragonfly.all import Grammar, CompoundRule

# Voice command rule combining spoken form and recognition processing.
class ExampleRule(CompoundRule):
    spec = "do something computer"                  # Spoken form of command.
    def _process_recognition(self, node, extras):   # Callback when command is spoken.
        print "Voice command spoken."

# Create a grammar which contains and loads the command rule.
grammar = Grammar("example grammar")                # Create a grammar to contain the command rule.
grammar.add_rule(ExampleRule())                     # Add the command rule to the grammar.
grammar.load()                                      # Load the grammar.
The example above is very basic and doesn't show any of Dragonfly's more advanced features, such as dynamic speech elements. To learn more about these, please see the project's documentation.
CHAPTER 3
Dragonfly offers a powerful and unified interface to developers who want to use speech recognition in their software.
It is used for both speech-enabling applications and for automating computer activities.
In the field of scripting and automation, there are alternatives available that add speech commands to increase efficiency. Dragonfly differs from them in that it is a full development platform. The open source alternatives currently available for use with DNS compare to Dragonfly as follows:
Vocola uses its own easy-to-use scripting language, whereas Dragonfly uses Python and gives the macro-writer
all the power available.
Unimacro offers a set of macros for common activities, whereas Dragonfly is a platform on which macro-writers
can easily build new commands.