Python 201 - (Slightly) Advanced Python Topics
Revision: 1.1a
Date: October 05, 2014
Copyright: Copyright (c) 2003 Dave Kuhlman. All Rights Reserved. This software is
subject to the provisions of the MIT License
http://www.opensource.org/licenses/mit-license.php.
Abstract: This document is a self-learning document for a second course in Python
programming. This course contains discussions of several advanced
topics that are of interest to Python programmers.
2 Regular Expressions
For more help on regular expressions, see:
Literal characters must match exactly. For example, "a" matches "a".
Concatenated patterns match concatenated targets. For example, "ab" ("a"
followed by "b") matches "ab".
Alternate patterns (separated by a vertical bar) match either of the alternative
patterns. For example, "(aaa)|(bbb)" will match either "aaa" or "bbb".
Repeating and optional items:
"abc*" matches "ab" followed by zero or more occurances of "c", for
example, "ab", "abc", "abcc", etc.
"abc+" matches "ab" followed by one or more occurances of "c", for
example, "abc", "abcc", etc, but not "ab".
"abc?" matches "ab" followed by zero or one occurances of "c", for
example, "ab" or "abc".
Sets of characters -- Characters and sequences of characters in square
brackets form a set; a set matches any character in the set or range. For
example, "[abc]" matches "a" or "b" or "c". And, for example, "[_a-z0-9]"
matches an underscore or any lower-case letter or any digit.
Groups -- Parentheses indicate a group within a pattern. For example, "ab(cd)*ef"
is a pattern that matches "ab" followed by any number of occurrences of "cd"
followed by "ef", for example, "abef", "abcdef", "abcdcdef", etc.
There are special names for some sets of characters, for example "\d" (any
digit), "\w" (any alphanumeric character), "\W" (any non-alphanumeric
character), etc. For more information, see Python Library Reference: Regular
Expression Syntax -- http://docs.python.org/library/re.html#regular-expression-syntax
Because of the use of backslashes in patterns, you are usually better off defining
regular expressions with raw strings, e.g. r"abc".
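The rules above can be checked interactively. Here is a small sketch (the "$" anchors each pattern at the end of the string so that the whole string must match):

```python
import re

# Raw strings (r"...") keep any backslashes intact, which is why they are
# preferred for patterns.
print(bool(re.match(r"abc*$", "ab")))             # "*": zero or more "c"
print(bool(re.match(r"abc+$", "ab")))             # "+": at least one "c"
print(bool(re.match(r"[abc]$", "b")))             # set: any one of a, b, c
print(bool(re.match(r"ab(cd)*ef$", "abcdcdef")))  # group "(cd)" repeated
```

The first, third, and fourth lines print True; the second prints False because "+" requires at least one "c".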
import sys, re
pat = re.compile('aa[bc]*dd')
while 1:
line = raw_input('Enter a line ("q" to quit):')
if line == 'q':
break
if pat.search(line):
print 'matched:', line
else:
print 'no match:', line
Comments:
Use search() to scan through a string and find the first match, scanning from the left.
>>> import re
>>> pat = re.compile('aa[0-9]*bb')
>>> x = pat.match('aa1234bbccddee')
>>> x
<_sre.SRE_Match object at 0x401e9608>
>>> x = pat.match('xxxxaa1234bbccddee')
>>> x
>>> type(x)
<type 'NoneType'>
>>> x = pat.search('xxxxaa1234bbccddee')
>>> x
<_sre.SRE_Match object at 0x401e9608>
Notes:
You can also call the corresponding functions match and search in the re
module, e.g.:
import sys, re
Targets = [
'There are <<25>> sparrows.',
'I see <<15>> finches.',
'There is nothing here.',
]
def test():
pat = re.compile('<<([0-9]*)>>')
for line in Targets:
mo = pat.search(line)
if mo:
value = mo.group(1)
print 'value: %s' % value
else:
print 'no match'
test()
Explanation:
In the regular expression, put parentheses around the portion of the regular
expression that will match what you want to extract. Each pair of parentheses
marks off a group.
After the search, check to determine if there was a successful match by
checking for a matching object. "pat.search(line)" returns None if the search
fails.
If you specify more than one group in your regular expression (more than one
pair of parentheses), then you can use "value = mo.group(N)" to extract the
value matched by the Nth group from the matching object. "value =
mo.group(1)" returns the first extracted value; "value = mo.group(2)" returns
the second; etc. An argument of 0 returns the string matched by the entire
regular expression.
Use "values = mo.groups()" to get a tuple containing the strings matched by all
groups.
Use "mo.expand()" to interpolate the group values into a string. For example,
"mo.expand(r'value1: \1 value2: \2')" inserts the values of the first and second
group into a string. If the first group matched "aaa" and the second matched
"bbb", then this example would produce "value1: aaa value2: bbb". For
example:
import sys, re
pat = re.compile('aa([0-9]*)bb([0-9]*)cc')
while 1:
line = raw_input('Enter a line ("q" to quit):')
if line == 'q':
break
mo = pat.search(line)
if mo:
value1, value2 = mo.group(1, 2)
print 'value1: %s value2: %s' % (value1, value2)
else:
print 'no match'
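The group(), groups(), and expand() calls described above can be sketched on a fixed string (the pattern and target here are illustrative, not from the example above):

```python
import re

pat = re.compile(r'(\d+)-(\d+)')
mo = pat.search('pages 12-34')
print(mo.group(0))    # '12-34' -- the string matched by the whole pattern
print(mo.group(1))    # '12'    -- the first parenthesized group
print(mo.groups())    # ('12', '34') -- all groups as a tuple
print(mo.expand(r'value1: \1 value2: \2'))   # interpolates the group values
```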
Comments:
import re
def repl_func(mo):
s1 = mo.group(1)
s2 = '*' * len(s1)
return s2
def test():
pat = r'(\d+)'
in_str = 'there are 2034 birds in 21 trees'
out_str, count = re.subn(pat, repl_func, in_str)
print 'in: "%s"' % in_str
print 'out: "%s"' % out_str
print 'count: %d' % count
test()
Notes:
Here is an even more complex example -- You can locate sub-strings (slices) of a
match and replace them:
import sys, re
pat = re.compile('aa([0-9]*)bb([0-9]*)cc')
while 1:
line = raw_input('Enter a line ("q" to quit): ')
if line == 'q':
break
mo = pat.search(line)
if mo:
value1, value2 = mo.group(1, 2)
start1 = mo.start(1)
end1 = mo.end(1)
start2 = mo.start(2)
end2 = mo.end(2)
print 'value1: %s start1: %d end1: %d' % (value1, start1, end1)
print 'value2: %s start2: %d end2: %d' % (value2, start2, end2)
repl1 = raw_input('Enter replacement #1: ')
repl2 = raw_input('Enter replacement #2: ')
newline = (line[:start1] + repl1 + line[end1:start2] +
repl2 + line[end2:])
print 'newline: %s' % newline
else:
print 'no match'
Explanation:
Put together a new string with string concatenation from pieces of the original
string and replacement values. You can use string slices to get the sub-strings
of the original string. In our case, the following gets the start of the string,
adds the first replacement, adds the middle of the original string, adds the
second replacement, and finally, adds the last part of the original string:

newline = (line[:start1] + repl1 + line[end1:start2] +
    repl2 + line[end2:])
You can also use the sub function or method to do substitutions. Here is an example:
import sys, re
pat = re.compile('[a-m]+')
def replacer(mo):
    return mo.group(0).upper()
print pat.sub(replacer, 'abcdefghij klmnopqrs')
Notes:
If the replacement argument to sub is a function, that function must take one
argument, a match object, and must return the modified (or replacement)
value. The matched sub-string will be replaced by the value returned by this
function.
In our case, the function replacer converts the matched value to upper case.
This is also a convenient use for a lambda instead of a named function, for example:
pat = re.compile('[a-m]+')
print pat.sub(lambda mo: mo.group(0).upper(), 'abcdefghij klmnopqrs')
3 Iterator Objects
Note 1: You will need a sufficiently recent version of Python in order to use iterators
and generators; they were introduced in Python 2.2.
Note 2: The iterator protocol has changed slightly in Python version 3.0.
Learn how to implement a generator function, that is, a function which, when
called, returns an iterator.
Learn how to implement a class containing a generator method, that is, a
method which, when called, returns an iterator.
Learn the iterator protocol, specifically what methods an iterator must support
and what those methods must do.
Learn how to implement an iterator class, that is, a class whose instances are
iterator objects.
Learn how to implement recursive iterator generators, that is, an iterator
generator which recursively produces iterator generators.
Learn that your implementation of an iterator object (an iterator class) can
"refresh" itself and learn at least one way to do this.
Definitions:
def generateItems(seq):
for item in seq:
yield 'item: %s' % item
anIter = generateItems([])
print 'dir(anIter):', dir(anIter)
anIter = generateItems([111,222,333])
for x in anIter:
print x
anIter = generateItems(['aaa', 'bbb', 'ccc'])
print anIter.next()
print anIter.next()
print anIter.next()
print anIter.next()
The value returned by the call to the generator (function) is an iterator. It obeys
the iterator protocol. That is, dir(anIter) shows that it has both __iter__() and
next() methods.
Because this object is an iterator, we can use a for statement to iterate over the
values returned by the generator.
We can also get its values by repeatedly calling the next() method, until it
raises the StopIteration exception. This ability to call the next method enables
us to pass the iterator object around and get values at different locations in our
code.
Once we have obtained all the values from an iterator, it is, in effect, "empty" or
"exhausted". The iterator protocol, in fact, specifies that once an iterator raises
the StopIteration exception, it should continue to do so. Another way to say
this is that there is no "rewind" operation. But, you can call the generator
function again to get a "fresh" iterator.
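This "no rewind" behavior can be seen directly. A minimal sketch:

```python
def generate_items(seq):
    # A simple generator function: yields each item in turn.
    for item in seq:
        yield item

an_iter = generate_items([1, 2])
print(next(an_iter))
print(next(an_iter))
try:
    next(an_iter)                 # the iterator is now exhausted
except StopIteration:
    print('exhausted')
# Calling the generator function again gives a fresh iterator.
print(next(generate_items([1, 2])))
```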
The following example implements a function that returns a generator object. The
effect is to generate the objects in a collection, excluding the items in a separate
collection:
DATA = [
'lemon',
'lime',
'grape',
'apple',
'pear',
'watermelon',
'canteloupe',
'honeydew',
'orange',
'grapefruit',
]
def test():
iter1 = make_producer(DATA, ('apple', 'orange', 'honeydew', ))
print '%s' % iter1
for fruit in iter1:
print fruit
test()
$ python workbook063.py
<generator object <genexpr> at 0x7fb3d0f1bc80>
lemon
lime
grape
pear
watermelon
canteloupe
grapefruit
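The make_producer function itself is not shown above; a plausible reconstruction, consistent with the `<genexpr>` repr in the output, is a function that returns a generator expression (the name and body here are an assumption):

```python
# Assumed reconstruction of make_producer: produce the items of seq
# that are not in the exclusion collection.
def make_producer(seq, excludes):
    return (item for item in seq if item not in excludes)

for fruit in make_producer(['lemon', 'apple', 'orange'], ('apple', )):
    print(fruit)
```

This prints "lemon" and "orange", skipping the excluded "apple".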
Notes:
#
# A class that provides an iterator generator method.
#
class Node:
def __init__(self, name='<noname>', value='<novalue>', children=None):
self.name = name
self.value = value
        if children is None:
            self.children = []
        else:
            self.children = children
def set_name(self, name): self.name = name
def get_name(self): return self.name
def set_value(self, value): self.value = value
def get_value(self): return self.value
def iterchildren(self):
for child in self.children:
yield child
#
# Print information on this node and walk over all children and
# grandchildren ...
def walk(self, level=0):
print '%sname: %s value: %s' % (
get_filler(level), self.get_name(), self.get_value(), )
for child in self.iterchildren():
child.walk(level + 1)
#
# A function that is the equivalent of the walk() method in
# class Node.
#
def walk(node, level=0):
print '%sname: %s value: %s' % (
get_filler(level), node.get_name(), node.get_value(), )
for child in node.iterchildren():
walk(child, level + 1)
def get_filler(level):
return ' ' * level
def test():
a7 = Node('gilbert', '777')
a6 = Node('fred', '666')
a5 = Node('ellie', '555')
a4 = Node('daniel', '444')
a3 = Node('carl', '333', [a4, a5])
a2 = Node('bill', '222', [a6, a7])
a1 = Node('alice', '111', [a2, a3])
# Use the walk method to walk the entire tree.
print 'Using the method:'
a1.walk()
print '=' * 30
# Use the walk function to walk the entire tree.
print 'Using the function:'
walk(a1)
test()
Note that when an iterator is "exhausted" it, normally, cannot be reused to iterate
over the sequence. However, in this example, we provide a refresh method which
enables us to "rewind" and reuse the iterator instance:
#
# An iterator class that does *not* use ``yield``.
# This iterator produces every other item in a sequence.
#
class IteratorExample:
def __init__(self, seq):
self.seq = seq
self.idx = 0
def next(self):
self.idx += 1
if self.idx >= len(self.seq):
raise StopIteration
value = self.seq[self.idx]
self.idx += 1
return value
def __iter__(self):
return self
def refresh(self):
self.idx = 0
def test_iteratorexample():
a = IteratorExample('edcba')
for x in a:
print x
print '----------'
a.refresh()
for x in a:
print x
print '=' * 30
a = IteratorExample('abcde')
try:
print a.next()
print a.next()
print a.next()
print a.next()
print a.next()
print a.next()
except StopIteration, e:
print 'stopping', e
test_iteratorexample()
d
b
----------
d
b
==============================
b
d
stopping
The next method must keep track of where it is and what item it should
produce next.
Alert: The iterator protocol has changed slightly in Python 3.0. In particular, the
next() method has been renamed to __next__(). See: Python Standard Library:
Iterator Types -- http://docs.python.org/3.0/library/stdtypes.html#iterator-
types.
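One way to write an iterator class that works under both protocols is to define __next__() and alias it to next; a sketch (this class produces every other item, like IteratorExample above):

```python
class EveryOther(object):
    """Produce every other item in a sequence."""
    def __init__(self, seq):
        self.seq = seq
        self.idx = 0
    def __iter__(self):
        return self
    def __next__(self):            # Python 3 spelling
        self.idx += 1
        if self.idx >= len(self.seq):
            raise StopIteration
        value = self.seq[self.idx]
        self.idx += 1
        return value
    next = __next__                # Python 2 spelling

print(list(EveryOther('edcba')))   # ['d', 'b']
```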
#
# An iterator class that uses ``yield``.
# This iterator produces every other item in a sequence.
#
class YieldIteratorExample:
def __init__(self, seq):
self.seq = seq
self.iterator = self._next()
self.next = self.iterator.next
def _next(self):
flag = 0
for x in self.seq:
if flag:
flag = 0
yield x
else:
flag = 1
def __iter__(self):
return self.iterator
def refresh(self):
self.iterator = self._next()
self.next = self.iterator.next
def test_yielditeratorexample():
a = YieldIteratorExample('edcba')
for x in a:
print x
print '----------'
a.refresh()
for x in a:
print x
print '=' * 30
a = YieldIteratorExample('abcde')
try:
print a.next()
print a.next()
print a.next()
print a.next()
print a.next()
print a.next()
except StopIteration, e:
print 'stopping', e
test_yielditeratorexample()
d
b
----------
d
b
==============================
b
d
stopping
Because the _next method uses yield, calling it (actually, calling the iterator
object it produces) in an iterator context causes it to be "resumed" immediately
after the yield statement. This reduces bookkeeping a bit.
self.iterator = self._next()
self.next = self.iterator.next
Here is an example:
For more on generator expressions, see The Python Language Reference: Generator
expressions -- http://docs.python.org/reference/expressions.html#generator-
expressions.
def f(x):
    return x*3
genexpr = (f(x) for x in [1, 2, 3])
for x in genexpr:
    print x
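Generator expressions can also be consumed directly by functions such as sum(); a small sketch:

```python
def f(x):
    return x * 3

# sum() consumes the generator expression lazily; no intermediate list
# is built.
total = sum(f(x) for x in [1, 2, 3])
print(total)    # 18
```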
4 Unit Tests
Unit test and the Python unit test framework provide a convenient way to define and
run tests that ensure that a Python application produces specified results.
This section, while it will not attempt to explain everything about the unit test
framework, will provide examples of several straightforward ways to construct and
run tests.
Some assumptions:
import unittest
class MyTest(unittest.TestCase):
def test_one(self):
# some test code
pass
def test_two(self):
# some test code
pass
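A test case class like MyTest can be collected and run with a loader and a runner; a minimal sketch (the assertions here are illustrative):

```python
import unittest

class MyTest(unittest.TestCase):
    def test_one(self):
        self.assertEqual(1 + 1, 2)
    def test_two(self):
        self.assertTrue('abc'.startswith('a'))

# Collect the test_* methods into a suite and run them.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(MyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())    # True
```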
class XmlTest(unittest.TestCase):
def test_import_export1(self):
inFile = file('test1_in.xml', 'r')
inContent = inFile.read()
inFile.close()
doc = webserv_example_heavy_sub.parseString(inContent)
outFile = StringIO.StringIO()
outFile.write('<?xml version="1.0" ?>\n')
doc.export(outFile, 0)
outContent = outFile.getvalue()
outFile.close()
self.failUnless(inContent == outContent)
if __name__ == "__main__":
test_main()
----------------------------------------------------------------------
Ran 1 test in 0.035s
OK
This example tests the ability to parse an XML document test1_in.xml and
export that document back to XML. The test succeeds if the input XML
document and the exported XML document are the same.
The code which is being tested parses an XML document returned by a request
to Amazon Web services. You can learn more about Amazon Web services at:
http://www.amazon.com/webservices. This code was generated from an XML
Schema document by generateDS.py. So we are, in effect, testing generateDS.py.
You can find generateDS.py at: http://www.davekuhlman.org/#generateds-py.
Testing for success/failure and reporting failures -- Use the methods listed at
http://www.python.org/doc/current/lib/testcase-objects.html to test for and
report success and failure. In our example, we used "self.failUnless(inContent
== outContent)" to ensure that the content we parsed and the content that we
exported were the same.
Add additional tests by adding methods whose names have the prefix "test". If
you prefer a different prefix for test names, add something like the following
to the above script:
loader.testMethodPrefix = 'trial'
By default, the tests are run in the order of their names sorted by the cmp
function. So, if needed, you can control the order of execution of tests by
selecting their names, for example, using names like test_1_checkderef,
test_2_checkcalc, etc. Or, you can change the comparison function by adding
something like the following to the above script:
loader.sortTestMethodsUsing = mycmpfunc
As a bit of motivation for creating and using unit tests, while developing this
example, I discovered several errors (or maybe "special features") in generateDS.py.
Documentation -- The two important sources for information about extending and
embedding are the following:
Types of extensions:
Tools -- There are several tools that support the development of Python extensions:
Create the "init" function -- The name of this function must be "init" followed
by the name of the module. Every extension module must have such a function.
Create the function table -- This table maps function names (referenced from
Python code) to function pointers (implemented in C/C++).
Implement each wrapper function.
1. Capture the arguments with PyArg_ParseTuple. The format string specifies how
arguments are to be converted and captured. See 1.7 Extracting Parameters in
Extension Functions. Here are some of the most commonly used types:
Use "i", "s", "f", etc to convert and capture simple types such as integers,
strings, floats, etc.
Use "O" to get a pointer to Python "complex" types such as lists, tuples,
dictionaries, etc.
Use items in parentheses to capture and unpack sequences (e.g. lists and
tuples) of fixed length. Example:
Use ":aName" (colon) at the end of the format string to provide a function
name for error messages. Example:
Use ";an error message" (semicolon) at the end of the format string to
provide a string that replaces the default error message.
3. Handle errors and exceptions -- You will need to understand how to (1)
clear errors and exceptions and (2) raise errors (exceptions).
Many functions in the Python C API raise exceptions. You will need to
check for and clear these exceptions. Here is an example:
char * message;
int messageNo;
message = NULL;
messageNo = -1;
/* Is the argument a string?
*/
if (! PyArg_ParseTuple(args, "s", &message))
{
/* It's not a string. Clear the error.
* Then try to get a message number (an integer).
*/
PyErr_Clear();
if (! PyArg_ParseTuple(args, "i", &messageNo))
{
        ...
You can also raise exceptions in your C code that can be caught (in a
"try:except:" block) back in the calling Python code. Here is an example:
if (n == 0)
{
PyErr_SetString(PyExc_ValueError, "Value must not be zero");
return NULL;
}
And, you can test whether a function in the Python C API that you have
called has raised an exception. For example:
if (PyErr_Occurred())
{
/* An exception was raised.
* Do something about it.
*/
    ...
For each built-in Python type there is a set of API functions to create and
manipulate it. See the "Python/C API Reference Manual" for a description
of these functions. For example, see:
http://www.python.org/doc/current/api/intObjects.html
http://www.python.org/doc/current/api/stringObjects.html
http://www.python.org/doc/current/api/tupleObjects.html
http://www.python.org/doc/current/api/listObjects.html
http://www.python.org/doc/current/api/dictObjects.html
Etc.
The reference count -- You will need to follow the reference-counting
rules that Python uses to garbage collect objects. You can
learn about these rules at
http://www.python.org/doc/current/ext/refcounts.html. You will not
want Python to garbage collect objects that you create too early or too
late. With respect to Python objects created with the above functions,
these new objects are owned and may be passed back to Python code.
However, there are situations where your C/C++ code will not
automatically own a reference, for example when you extract an object
from a container (a list, tuple, dictionary, etc). In these cases you should
increment the reference count with Py_INCREF.
5.3 SWIG
Note: Our discussion and examples are for SWIG version 1.3
SWIG will often enable you to generate wrappers for functions in an existing C
function library. SWIG does not understand everything in C header files. But it does a
fairly impressive job. You should try it first before resorting to the hard work of
writing wrappers by hand.
1. Create an interface file -- Even when you are wrapping functions defined in an
existing header file, creating an interface file is a good idea. Include your
existing header file into it, then add whatever else you need. Here is an
extremely simple example of a SWIG interface file:
%module MyLibrary
%{
#include "MyLibrary.h"
%}
%include "MyLibrary.h"
Comments:
The "%{" and "%}" brackets are directives to SWIG. They say: "Add the code
between these brackets to the generated wrapper file without processing
it."
The "%include" statement says: "Copy the file into the interface file here."
In effect, you are asking SWIG to generate wrappers for all the functions
in this header file. If you want wrappers for only some of the functions in
a header file, then copy or reproduce function declarations for the
desired functions here. An example:
%module MyLibrary
%{
#include "MyLibrary.h"
%}
You can find more information about the directives that are used in SWIG
interface files in the SWIG User Manual, in particular at:
http://www.swig.org/Doc1.3/Preprocessor.html
http://www.swig.org/Doc1.3/Python.html
2. Generate the wrappers:
3. Compile and link the library. On Linux, you can use something like the
following:
gcc -c MyLibrary.c
gcc -c -I/usr/local/include/python2.3 MyLibrary_wrap.c
gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so
Note that we produce a shared library whose name is the module name
prefixed with an underscore. SWIG also generates a .py file, without the leading
underscore, which we will import from our Python code and which, in turn,
imports the shared library.
Here is a makefile that will execute swig to generate wrappers, then compile and link
the extension.
CFLAGS = -I/usr/local/include/python2.3
all: _MyLibrary.so
_MyLibrary.so: MyLibrary.o MyLibrary_wrap.o
	gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so
MyLibrary.o: MyLibrary.c
	gcc -c MyLibrary.c -o MyLibrary.o
MyLibrary_wrap.o: MyLibrary_wrap.c
	gcc -c ${CFLAGS} MyLibrary_wrap.c -o MyLibrary_wrap.o
MyLibrary_wrap.c: MyLibrary.i
	swig -python MyLibrary.i
clean:
	rm -f MyLibrary.o MyLibrary_wrap.o MyLibrary_wrap.c \
	MyLibrary.py _MyLibrary.so
And, here are C source files that can be used in our example.
MyLibrary.h:
/* MyLibrary.h
*/
int getVersion();
int getMode();
MyLibrary.c:
/* MyLibrary.c
*/
int getVersion()
{
return 123;
}
int getMode()
{
return 1;
}
5.4 Pyrex
Pyrex is a useful tool for writing Python extensions. Because the Pyrex language is
similar to Python, writing extensions in Pyrex is easier than doing so in C. Cython
is a newer derivative of Pyrex.
# python_201_pyrex_string.pyx
import string
all: python_201_pyrex_string.so
python_201_pyrex_string.so: python_201_pyrex_string.o
gcc -shared python_201_pyrex_string.o -o python_201_pyrex_string.so
python_201_pyrex_string.o: python_201_pyrex_string.c
gcc -c ${CFLAGS} python_201_pyrex_string.c -o python_201_pyrex_string.o
python_201_pyrex_string.c: python_201_pyrex_string.pyx
pyrexc python_201_pyrex_string.pyx
clean:
rm -f python_201_pyrex_string.so python_201_pyrex_string.o \
python_201_pyrex_string.c
Here is another example. In this one, one function in the .pyx file calls another. Here
is the implementation file:
# python_201_pyrex_primes.pyx
all: python_201_pyrex_primes.so
python_201_pyrex_primes.so: python_201_pyrex_primes.o
	gcc -shared python_201_pyrex_primes.o -o python_201_pyrex_primes.so
python_201_pyrex_primes.o: python_201_pyrex_primes.c
	gcc -c ${CFLAGS} python_201_pyrex_primes.c -o python_201_pyrex_primes.o
python_201_pyrex_primes.c: python_201_pyrex_primes.pyx
	pyrexc python_201_pyrex_primes.pyx
clean:
	rm -f python_201_pyrex_primes.so python_201_pyrex_primes.o \
	python_201_pyrex_primes.c
$ python
Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import python_201_pyrex_primes
>>> dir(python_201_pyrex_primes)
['__builtins__', '__doc__', '__file__', '__name__', 'showPrimes']
>>> python_201_pyrex_primes.showPrimes(5)
prime: 2
prime: 3
prime: 5
prime: 7
prime: 11
This next example shows how to use Pyrex to implement a new extension type, that
is, a new Python built-in type. Notice that the class is declared with the cdef keyword,
which tells Pyrex to generate the C implementation of a type instead of a class.
# python_201_pyrex_clsprimes.pyx
all: python_201_pyrex_clsprimes.so
python_201_pyrex_clsprimes.so: python_201_pyrex_clsprimes.o
	gcc -shared python_201_pyrex_clsprimes.o -o python_201_pyrex_clsprimes.so
python_201_pyrex_clsprimes.o: python_201_pyrex_clsprimes.c
	gcc -c ${CFLAGS} python_201_pyrex_clsprimes.c -o python_201_pyrex_clsprimes.o
python_201_pyrex_clsprimes.c: python_201_pyrex_clsprimes.pyx
	pyrexc python_201_pyrex_clsprimes.pyx
clean:
	rm -f python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o \
	python_201_pyrex_clsprimes.c
$ python
Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import python_201_pyrex_clsprimes
>>> dir(python_201_pyrex_clsprimes)
['Primes', '__builtins__', '__doc__', '__file__', '__name__']
>>> primes = python_201_pyrex_clsprimes.Primes()
>>> dir(primes)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
'__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__str__', 'primes', 'showPrimes']
>>> primes.showPrimes(4)
prime: 2
prime: 3
prime: 5
prime: 7
Documentation -- Also notice that Pyrex preserves the documentation for the
module, the class, and the methods in the class. You can show this documentation
with pydoc, as follows:
$ pydoc python_201_pyrex_clsprimes
$ python
Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import python_201_pyrex_clsprimes
>>> help(python_201_pyrex_clsprimes)
5.5 SWIG vs. Pyrex
Choose SWIG when:
You already have an existing C or C++ implementation of the code you want to
call from Python. In this case you want SWIG to generate the wrappers. But note
that Cython promises to enable you to quickly wrap and call functions
implemented in C.
You want to write the implementation in C or C++ by hand. Perhaps, because
you think you can do so quickly, for example, or because you believe that you
can make it highly optimized. Then, you want to be able to generate the Python
(extension) wrappers for it quickly.
You do not have a C/C++ implementation and you want an easier way to write
that C implementation. Writing Pyrex code, which is a lot like Python, is easier
than writing C or C++ code by hand.
You start to write the implementation in C, then find that it requires lots of
calls to the Python C API, and you want to avoid having to learn how to do that.
5.6 Cython
Here is a simple example that uses Cython to wrap a function implemented in C.
/* test_c_lib.h */
/* test_c_lib.c */
#include "test_c_lib.h"
# test_c.pyx
#!/bin/bash -x
cython test_c.pyx
gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c.o test_c.c
gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c_lib.o test_c_lib.c
gcc -shared -fPIC -I/usr/local/include/python2.6 -o test_c.so test_c.o test_c_lib.o
Here is a small Python file that uses the wrapper that we wrote in Cython:
# run_test_c.py
import test_c
def test():
test_c.test(4, 5)
test_c.test(12, 15)
if __name__ == '__main__':
test()
$ python run_test_c.py
result from calculate: 60
result from calculate: 540
In older versions of the Python source code distribution, a template for the C code
was provided in Objects/xxobject.c. Objects/xxobject.c is no longer included in the
Python source code distribution. However:
The discussion and examples for creating extension types have been
expanded. See: Extending and Embedding the Python Interpreter, 2. Defining
New Types -- http://docs.python.org/extending/newtypes.html.
In the Tools/framer directory of the Python source code distribution there is an
application that will generate a skeleton for an extension type from a
specification object written in Python. Run Tools/framer/example.py to see it in
action.
And, you can use Pyrex to generate a new built-in type. To do so, implement a
Python/Pyrex class and declare the class with the Pyrex keyword cdef. In fact, you
may want to use Pyrex to generate a minimal extension type, and then edit that
generated code to insert and add functionality by hand. See the Pyrex section for an
example.
Pyrex also goes some way toward giving you access to (existing) C structs and
functions from Python.
Extension classes the Pyrex way -- An alternative is to use Pyrex to compile a class
definition that does not have the cdef keyword. Using cdef on the class tells Pyrex to
generate an extension type instead of a class. You will have to determine whether
you want an extension class or an extension type.
6 Parsing
Python is an excellent language for text analysis.
In some cases, simply splitting lines of text into words will be enough. In these cases
use string.split().
In other cases, regular expressions may be able to do the parsing you need. If so, see
the section on regular expressions in this document.
However, in some cases, more complex analysis of input text is required. This
section describes some of the ways that Python can help you with this complex
parsing and analysis.
XML parsers and XML tools -- There is lots of support for parsing and processing
XML in Python. Here are a few places to look for support:
As an example, we'll implement a recursive descent parser written in Python for the
following grammar:
#!/usr/bin/env python
"""
A recursive descent parser example.
Usage:
python rparser.py [options] <inputfile>
Options:
-h, --help Display this help message.
Example:
python rparser.py myfile.txt
The grammar:
Prog ::= Command | Command Prog
Command ::= Func_call
Func_call ::= Term '(' Func_call_list ')'
Func_call_list ::= Func_call | Func_call ',' Func_call_list
    Term ::= <word>
"""
import sys
import string
import types
import getopt
#
# To use the IPython interactive shell to inspect your running
# application, uncomment the following lines:
#
## from IPython.Shell import IPShellEmbed
## ipshell = IPShellEmbed((),
## banner = '>>>>>>>> Into IPython >>>>>>>>',
## exit_msg = '<<<<<<<< Out of IPython <<<<<<<<')
#
# Then add the following line at the point in your code where
# you want to inspect run-time values:
#
# ipshell('some message to identify where we are')
#
# For more information see: http://ipython.scipy.org/moin/
#
#
# Constants
#
# Token types
NoneTokType = 0
LParTokType = 1
RParTokType = 2
WordTokType = 3
CommaTokType = 4
EOFTokType = 5
#
# Representation of a node in the AST (abstract syntax tree).
#
class ASTNode:
def __init__(self, nodeType, *args):
self.nodeType = nodeType
self.children = []
for item in args:
self.children.append(item)
def show(self, level):
self.showLevel(level)
print 'Node -- Type %s' % NodeTypeDict[self.nodeType]
level += 1
for child in self.children:
if isinstance(child, ASTNode):
child.show(level)
elif type(child) == types.ListType:
for item in child:
item.show(level)
else:
self.showLevel(level)
print 'Child:', child
def showLevel(self, level):
for idx in range(level):
print ' ',
#
# The recursive descent parser class.
# Contains the "recognizer" methods, which implement the grammar
# rules (above), one recognizer method for each production rule.
#
class ProgParser:
def __init__(self):
pass
def prog_reco(self):
commandList = []
while 1:
result = self.command_reco()
if not result:
break
commandList.append(result)
return ASTNode(ProgNodeType, commandList)
def command_reco(self):
if self.tokenType == EOFTokType:
return None
result = self.func_call_reco()
return ASTNode(CommandNodeType, result)
def func_call_reco(self):
if self.tokenType == WordTokType:
term = ASTNode(TermNodeType, self.token)
self.tokenType, self.token, self.lineNo = self.tokens.next()
if self.tokenType == LParTokType:
self.tokenType, self.token, self.lineNo = self.tokens.next()
result = self.func_call_list_reco()
if result:
if self.tokenType == RParTokType:
self.tokenType, self.token, self.lineNo = \
self.tokens.next()
return ASTNode(FuncCallNodeType, term, result)
else:
raise ParseError(self.lineNo, 'missing right paren')
else:
raise ParseError(self.lineNo, 'bad func call list')
else:
raise ParseError(self.lineNo, 'missing left paren')
else:
return None
def func_call_list_reco(self):
terms = []
while 1:
result = self.func_call_reco()
if not result:
break
terms.append(result)
if self.tokenType != CommaTokType:
break
self.tokenType, self.token, self.lineNo = self.tokens.next()
return ASTNode(FuncCallListNodeType, terms)
#
# The parse error exception class.
#
class ParseError(Exception):
def __init__(self, lineNo, msg):
        Exception.__init__(self, msg)
self.lineNo = lineNo
self.msg = msg
def getLineNo(self):
return self.lineNo
def getMsg(self):
return self.msg
def is_word(token):
for letter in token:
if letter not in string.ascii_letters:
return None
return 1
#
# Generate the tokens.
# Usage:
# gen = genTokens(infile)
# tokType, tok, lineNo = gen.next()
# ...
def genTokens(infile):
lineNo = 0
while 1:
lineNo += 1
        try:
            line = infile.next()
        except StopIteration:
            yield (EOFTokType, None, lineNo)
            return
        toks = line.split()
for tok in toks:
if is_word(tok):
tokType = WordTokType
elif tok == '(':
tokType = LParTokType
elif tok == ')':
tokType = RParTokType
            elif tok == ',':
                tokType = CommaTokType
            else:
                tokType = NoneTokType
            yield (tokType, tok, lineNo)
def test(infileName):
parser = ProgParser()
#ipshell('(test) #1\nCtrl-D to exit')
result = None
try:
result = parser.parseFile(infileName)
except ParseError, exp:
sys.stderr.write('ParseError: (%d) %s\n' % \
(exp.getLineNo(), exp.getMsg()))
if result:
result.show(0)
def usage():
print __doc__
sys.exit(1)
def main():
args = sys.argv[1:]
try:
opts, args = getopt.getopt(args, 'h', ['help'])
except:
usage()
for opt, val in opts:
if opt in ('-h', '--help'):
usage()
if len(args) != 1:
usage()
inputfile = args[0]
test(inputfile)
if __name__ == '__main__':
#import pdb; pdb.set_trace()
main()
And, here is a sample of the data we can apply this parser to:
aaa ( )
bbb ( ccc ( ) )
ddd ( eee ( ) , fff ( ggg ( ) , hhh ( ) , iii ( ) ) )
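Before moving on, the token-generation idea can be exercised on its own. The following is a minimal sketch (not the genTokens above: it uses plain token strings instead of the TokType constants) of how whitespace-separated input like the sample data becomes a token stream:

```python
def gen_tokens(lines):
    # Simplified version of genTokens: every whitespace-separated piece
    # of each line becomes one token, followed by a final EOF marker.
    line_no = 0
    for line_no, line in enumerate(lines, 1):
        for tok in line.split():
            yield (tok, line_no)
    yield (None, line_no + 1)  # EOF marker

tokens = list(gen_tokens(['aaa ( )', 'bbb ( ccc ( ) )']))
# tokens[0] is ('aaa', 1); the last entry is the EOF marker (None, 3)
```

The parser pulls from such a generator one tuple at a time, which is exactly the role that self.tokens plays in ProgParser.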
In this section we'll describe Plex and use it to produce a tokenizer for our recursive
descent parser.
In order to use it, you may want to add Plex-1.1.4/Plex to your PYTHONPATH.
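Alternatively, the path can be extended from inside a script; the directory shown below is only a placeholder for wherever you unpacked Plex:

```python
import sys

# Make the unpacked Plex package importable; replace this placeholder
# path with your actual Plex-1.1.4 location.
sys.path.insert(0, '/path/to/Plex-1.1.4')
```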
#!/usr/bin/env python
"""
Sample Plex lexer
Usage:
python plex_example.py inputfile
"""
import sys
import Plex
def test(infileName):
letter = Plex.Range("AZaz")
digit = Plex.Range("09")
name = letter + Plex.Rep(letter | digit)
number = Plex.Rep1(digit)
space = Plex.Any(" \t")
endline = Plex.Str('\n')
    #comment = Plex.Str('"') + Plex.Rep(Plex.AnyBut('"')) + Plex.Str('"')
resword = Plex.Str("if", "then", "else", "end")
    def count_lines(scanner, text):
        # Pattern action: bump the scanner's line count on each newline.
        scanner.line_count += 1
lexicon = Plex.Lexicon([
(endline, count_lines),
(resword, 'keyword'),
(name, 'ident'),
(number, 'int'),
( Plex.Any("+-*/=<>"), 'operator'),
(space, Plex.IGNORE),
#(comment, 'comment'),
(Plex.Str('('), 'lpar'),
(Plex.Str(')'), 'rpar'),
# comments surrounded by (* and *)
(Plex.Str("(*"), Plex.Begin('comment')),
Plex.State('comment', [
(Plex.Str("*)"), Plex.Begin('')),
(Plex.AnyChar, Plex.IGNORE),
]),
])
infile = open(infileName, "r")
scanner = Plex.Scanner(lexicon, infile, infileName)
scanner.line_count = 0
while True:
token = scanner.read()
if token[0] is None:
break
position = scanner.position()
posstr = ('(%d, %d)' % (position[1], position[2], )).ljust(10)
tokstr = '"%s"' % token[1]
tokstr = tokstr.ljust(20)
print '%s tok: %s tokType: %s' % (posstr, tokstr, token[0],)
print 'line_count: %d' % scanner.line_count
def usage():
print __doc__
sys.exit(1)
def main():
args = sys.argv[1:]
if len(args) != 1:
usage()
infileName = args[0]
test(infileName)
if __name__ == '__main__':
#import pdb; pdb.set_trace()
main()
And, when we apply the above test program to this data, here is what we see:
And, here are some comments on constructing the patterns used in a lexicon:
Plex.Range takes a string of paired characters; "AZaz" defines the two
ranges A-Z and a-z.
Concatenation of patterns is expressed with "+", and alternation with "|".
Plex.Rep(pat) matches zero or more occurrences of pat; Plex.Rep1(pat)
matches one or more.
Plex.Str matches any one of the strings given as its arguments, while
Plex.Any matches any single character in its argument string.
Plex.Begin switches the scanner into a named state; that is how the
comments surrounded by (* and *) are recognized and ignored.
Now let's revisit our recursive descent parser, this time with a tokenizer built with
Plex. The tokenizer is trivial, but will serve as an example of how to hook it into a
parser:
#!/usr/bin/env python
"""
A recursive descent parser example using Plex.
This example uses Plex to implement a tokenizer.
Usage:
python python_201_rparser_plex.py [options] <inputfile>
Options:
-h, --help Display this help message.
Example:
python python_201_rparser_plex.py myfile.txt
The grammar:
    Prog ::= Command | Command Prog
    Command ::= Func_call
    Func_call ::= Term '(' Func_call_list ')'
    Func_call_list ::= Func_call | Func_call ',' Func_call_list
    Term ::= <word>
"""
import sys
import types
import getopt
import Plex
#
# Constants
#
# Token types
NoneTokType = 0
LParTokType = 1
RParTokType = 2
WordTokType = 3
CommaTokType = 4
EOFTokType = 5
# Node types
ProgNodeType = 0
CommandNodeType = 1
FuncCallNodeType = 2
FuncCallListNodeType = 3
TermNodeType = 4
# Dictionary mapping node type values to node type names
NodeTypeDict = {
    ProgNodeType: 'ProgNodeType',
    CommandNodeType: 'CommandNodeType',
    FuncCallNodeType: 'FuncCallNodeType',
    FuncCallListNodeType: 'FuncCallListNodeType',
    TermNodeType: 'TermNodeType',
    }
#
# Representation of a node in the AST (abstract syntax tree).
#
class ASTNode:
def __init__(self, nodeType, *args):
self.nodeType = nodeType
self.children = []
for item in args:
self.children.append(item)
def show(self, level):
self.showLevel(level)
print 'Node -- Type %s' % NodeTypeDict[self.nodeType]
level += 1
for child in self.children:
if isinstance(child, ASTNode):
child.show(level)
elif type(child) == types.ListType:
for item in child:
item.show(level)
else:
self.showLevel(level)
print 'Child:', child
def showLevel(self, level):
for idx in range(level):
print ' ',
#
# The recursive descent parser class.
# Contains the "recognizer" methods, which implement the grammar
# rules (above), one recognizer method for each production rule.
#
class ProgParser:
def __init__(self):
self.tokens = None
self.tokenType = NoneTokType
self.token = ''
self.lineNo = -1
        self.infile = None
def prog_reco(self):
commandList = []
while 1:
result = self.command_reco()
if not result:
break
commandList.append(result)
return ASTNode(ProgNodeType, commandList)
def command_reco(self):
if self.tokenType == EOFTokType:
return None
result = self.func_call_reco()
return ASTNode(CommandNodeType, result)
def func_call_reco(self):
if self.tokenType == WordTokType:
term = ASTNode(TermNodeType, self.token)
self.tokenType, self.token, self.lineNo = self.tokens.next()
if self.tokenType == LParTokType:
self.tokenType, self.token, self.lineNo = self.tokens.next()
result = self.func_call_list_reco()
if result:
if self.tokenType == RParTokType:
self.tokenType, self.token, self.lineNo = \
self.tokens.next()
return ASTNode(FuncCallNodeType, term, result)
else:
raise ParseError(self.lineNo, 'missing right paren')
else:
raise ParseError(self.lineNo, 'bad func call list')
else:
raise ParseError(self.lineNo, 'missing left paren')
else:
return None
def func_call_list_reco(self):
terms = []
while 1:
result = self.func_call_reco()
if not result:
break
terms.append(result)
if self.tokenType != CommaTokType:
break
self.tokenType, self.token, self.lineNo = self.tokens.next()
return ASTNode(FuncCallListNodeType, terms)
#
# The parse error exception class.
#
class ParseError(Exception):
def __init__(self, lineNo, msg):
        Exception.__init__(self, msg)
self.lineNo = lineNo
self.msg = msg
def getLineNo(self):
return self.lineNo
def getMsg(self):
return self.msg
#
# Generate the tokens.
# Usage - example
# gen = genTokens(infile)
# tokType, tok, lineNo = gen.next()
# ...
def genTokens(infile, infileName):
letter = Plex.Range("AZaz")
digit = Plex.Range("09")
name = letter + Plex.Rep(letter | digit)
lpar = Plex.Str('(')
rpar = Plex.Str(')')
comma = Plex.Str(',')
comment = Plex.Str("#") + Plex.Rep(Plex.AnyBut("\n"))
space = Plex.Any(" \t\n")
lexicon = Plex.Lexicon([
(name, 'word'),
(lpar, 'lpar'),
(rpar, 'rpar'),
(comma, 'comma'),
(comment, Plex.IGNORE),
(space, Plex.IGNORE),
])
scanner = Plex.Scanner(lexicon, infile, infileName)
while 1:
tokenType, token = scanner.read()
name, lineNo, columnNo = scanner.position()
        if tokenType is None:
tokType = EOFTokType
token = None
elif tokenType == 'word':
tokType = WordTokType
elif tokenType == 'lpar':
tokType = LParTokType
elif tokenType == 'rpar':
tokType = RParTokType
elif tokenType == 'comma':
tokType = CommaTokType
else:
tokType = NoneTokType
tok = token
yield (tokType, tok, lineNo)
def test(infileName):
parser = ProgParser()
#ipshell('(test) #1\nCtrl-D to exit')
result = None
try:
result = parser.parseFile(infileName)
except ParseError, exp:
sys.stderr.write('ParseError: (%d) %s\n' % \
(exp.getLineNo(), exp.getMsg()))
if result:
result.show(0)
def usage():
print __doc__
sys.exit(-1)
def main():
args = sys.argv[1:]
try:
opts, args = getopt.getopt(args, 'h', ['help'])
except:
usage()
for opt, val in opts:
if opt in ('-h', '--help'):
usage()
if len(args) != 1:
usage()
infileName = args[0]
test(infileName)
if __name__ == '__main__':
#import pdb; pdb.set_trace()
main()
And, here is a sample of the data we can apply this parser to:
Comments:
We can now put comments in our input, and they will be ignored. Comments
begin with a "#" and continue to the end of line. See the definition of comment
in function genTokens.
This tokenizer does not require us to separate tokens with whitespace as did
the simple tokenizer in the earlier version of our recursive descent parser.
The changes we made over the earlier version were to:
1. Import Plex.
2. Replace the definition of the tokenizer function genTokens.
3. Change the call to genTokens so that the call passes in the file name,
which is needed to create the scanner.
Our new version of genTokens does the following:
1. Create patterns for scanning.
2. Create a lexicon (an instance of Plex.Lexicon), which uses the patterns.
3. Create a scanner (an instance of Plex.Scanner), which uses the lexicon.
4. Execute a loop that reads tokens (from the scanner) and "yields" each
one.
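The hand-off between tokenizer and parser, independent of Plex, boils down to the parser pulling tokens one at a time from a generator. A minimal sketch of that pattern (the token tuples here are simplified relative to the code above):

```python
def gen_tokens(text):
    # Yield (token_type, token) pairs; a None type marks end of input.
    for tok in text.split():
        kind = 'word' if tok.isalpha() else 'punct'
        yield (kind, tok)
    yield (None, None)

gen = gen_tokens('aaa ( )')
kind, tok = next(gen)   # the parser "advances" by asking for the next token
```

Each recognizer method in the parser advances the stream the same way whenever it consumes a token.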
And, for lexical analysis, you may also want to look here:
In the sections below, we give examples and notes about the use of PLY and
pyparsing.
Learn how to construct lexers and parsers with PLY by reading doc/ply.html in the
distribution of PLY and by looking at the examples in the distribution.
For those of you who want a more complex example, see A Python Parser for the
RELAX NG Compact Syntax, which is implemented with PLY.
Now, here is our example parser. Comments and explanations are below:
#!/usr/bin/env python
"""
A parser example.
This example uses PLY to implement a lexer and parser.
The grammar:
    Prog ::= Command_list
    Command_list ::= Command | Command_list Command
    Command ::= Func_call
    Func_call ::= Term '(' ')' | Term '(' Func_call_list ')'
    Func_call_list ::= Func_call | Func_call_list ',' Func_call
    Term ::= NAME
"""
import sys
import types
import getopt
import ply.lex as lex
import ply.yacc as yacc
#
# Globals
#
startlinepos = 0
#
# Constants
#
# Node types
ProgNodeType = 0
CommandListNodeType = 1
CommandNodeType = 2
FuncCallNodeType = 3
FuncCallListNodeType = 4
TermNodeType = 5
# Dictionary mapping node type values to node type names
NodeTypeDict = {
    ProgNodeType: 'ProgNodeType',
    CommandListNodeType: 'CommandListNodeType',
    CommandNodeType: 'CommandNodeType',
    FuncCallNodeType: 'FuncCallNodeType',
    FuncCallListNodeType: 'FuncCallListNodeType',
    TermNodeType: 'TermNodeType',
    }
#
# Representation of a node in the AST (abstract syntax tree).
#
class ASTNode:
def __init__(self, nodeType, *args):
self.nodeType = nodeType
self.children = []
for item in args:
self.children.append(item)
def append(self, item):
self.children.append(item)
def show(self, level):
self.showLevel(level)
print 'Node -- Type: %s' % NodeTypeDict[self.nodeType]
level += 1
for child in self.children:
if isinstance(child, ASTNode):
child.show(level)
elif type(child) == types.ListType:
for item in child:
item.show(level)
else:
self.showLevel(level)
print 'Value:', child
def showLevel(self, level):
for idx in range(level):
print ' ',
#
# Exception classes
#
class LexerError(Exception):
def __init__(self, msg, lineno, columnno):
self.msg = msg
self.lineno = lineno
self.columnno = columnno
def show(self):
sys.stderr.write('Lexer error (%d, %d) %s\n' % \
(self.lineno, self.columnno, self.msg))
class ParserError(Exception):
def __init__(self, msg, lineno, columnno):
self.msg = msg
self.lineno = lineno
self.columnno = columnno
def show(self):
sys.stderr.write('Parser error (%d, %d) %s\n' % \
(self.lineno, self.columnno, self.msg))
#
# Lexer specification
#
tokens = (
'NAME',
'LPAR','RPAR',
'COMMA',
)
# Tokens
t_LPAR = r'\('
t_RPAR = r'\)'
t_COMMA = r'\,'
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
# Ignore whitespace
t_ignore = ' \t'
def t_newline(t):
r'\n+'
global startlinepos
startlinepos = t.lexer.lexpos - 1
    t.lexer.lineno += t.value.count("\n")
def t_error(t):
global startlinepos
msg = "Illegal character '%s'" % (t.value[0])
columnno = t.lexer.lexpos - startlinepos
raise LexerError(msg, t.lineno, columnno)
#
# Parser specification
#
def p_prog(t):
'prog : command_list'
t[0] = ASTNode(ProgNodeType, t[1])
def p_command_list_1(t):
'command_list : command'
t[0] = ASTNode(CommandListNodeType, t[1])
def p_command_list_2(t):
'command_list : command_list command'
t[1].append(t[2])
t[0] = t[1]
def p_command(t):
'command : func_call'
t[0] = ASTNode(CommandNodeType, t[1])
def p_func_call_1(t):
'func_call : term LPAR RPAR'
t[0] = ASTNode(FuncCallNodeType, t[1])
def p_func_call_2(t):
'func_call : term LPAR func_call_list RPAR'
t[0] = ASTNode(FuncCallNodeType, t[1], t[3])
def p_func_call_list_1(t):
'func_call_list : func_call'
t[0] = ASTNode(FuncCallListNodeType, t[1])
def p_func_call_list_2(t):
'func_call_list : func_call_list COMMA func_call'
t[1].append(t[3])
t[0] = t[1]
def p_term(t):
'term : NAME'
t[0] = ASTNode(TermNodeType, t[1])
def p_error(t):
global startlinepos
msg = "Syntax error at '%s'" % t.value
columnno = t.lexer.lexpos - startlinepos
raise ParserError(msg, t.lineno, columnno)
#
# Parse the input and display the AST (abstract syntax tree)
#
def parse(infileName):
    global startlinepos
    startlinepos = 0
# Build the lexer
lex.lex(debug=1)
# Build the parser
yacc.yacc()
# Read the input
    infile = open(infileName, 'r')
content = infile.read()
infile.close()
try:
# Do the parse
result = yacc.parse(content)
# Display the AST
result.show(0)
except LexerError, exp:
exp.show()
except ParserError, exp:
exp.show()
USAGE_TEXT = __doc__
def usage():
print USAGE_TEXT
sys.exit(-1)
def main():
args = sys.argv[1:]
try:
opts, args = getopt.getopt(args, 'h', ['help'])
except:
usage()
for opt, val in opts:
if opt in ('-h', '--help'):
usage()
if len(args) != 1:
usage()
infileName = args[0]
parse(infileName)
if __name__ == '__main__':
#import pdb; pdb.set_trace()
main()
Creating the syntax tree -- Basically, each rule (1) recognizes a non-terminal,
(2) creates a node (possibly using the values from the right-hand side of the
rule), and (3) returns the node by setting the value of t[0]. A deviation from this
is the processing of sequences, discussed below.
Sequences -- p_command_list_1 and p_command_list_2 show how to handle
sequences of items. In this case:
p_command_list_1 recognizes a command and creates an instance of
ASTNode with type CommandListNodeType and adds the command to it
as a child, and
p_command_list_2 recognizes an additional command and adds it (as a
child) to the instance of ASTNode that represents the list.
Distinguishing between different forms of the same rule -- In order to process
alternatives to the same production rule differently, we use different functions
with different implementations. For example, we use:
p_func_call_1 to recognize and process "func_call : term LPAR RPAR" (a
function call without arguments), and
p_func_call_2 to recognize and process "func_call : term LPAR
func_call_list RPAR" (a function call with arguments).
Reporting errors -- Our parser reports the first error and quits. We've done this
by raising an exception when we find an error. We implement two exception
classes: LexerError and ParserError. Implementing more than one exception
class enables us to distinguish between different classes of errors (note the
multiple except: clauses on the try: statement in function parse). And, we use
an instance of the exception class as a container in order to "bubble up"
information about the error (e.g. a message, a line number, and a column
number).
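The list-building trick in p_command_list_1 and p_command_list_2 can be seen with plain objects: the first rule creates the list node, and each later reduction appends to the same node. A minimal sketch (the Node class here is a stand-in for the document's ASTNode, not PLY itself):

```python
class Node:
    # Stand-in for ASTNode: a type tag plus a list of children.
    def __init__(self, node_type, *children):
        self.node_type = node_type
        self.children = list(children)

    def append(self, child):
        self.children.append(child)

# First command reduces via "command_list : command":
cmd_list = Node('command_list', Node('command'))
# Each later command reduces via "command_list : command_list command"
# and is appended to the node created by the first reduction:
cmd_list.append(Node('command'))
cmd_list.append(Node('command'))
```

After the three reductions, the single command_list node holds all three commands as children, which is exactly what t[0] = t[1] propagates up the parse.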
You will also want to look at the samples in the examples directory, which are very
helpful. My examples below are fairly simple. You can see more of the ability of
pyparsing to handle complex tasks in the examples.
Where to get it - You can find pyparsing at: Pyparsing Wiki Home --
http://pyparsing.wikispaces.com/
import sys
from pyparsing import alphanums, ZeroOrMore, Word
fieldDef = Word(alphanums)
lineDef = fieldDef + ZeroOrMore("," + fieldDef)
def test():
args = sys.argv[1:]
if len(args) != 1:
print 'usage: python pyparsing_test1.py <datafile.txt>'
sys.exit(-1)
infilename = sys.argv[1]
    infile = open(infilename, 'r')
for line in infile:
fields = lineDef.parseString(line)
print fields
test()
abcd,defg
11111,22222,33333
And, when we run our parser on this data file, here is what we see:
Note how the grammar is constructed from normal Python calls to function and
object/class constructors. I've constructed the parser in-line because my
example is simple, but constructing the parser in a function or even a module
might make sense for more complex grammars. pyparsing makes it easy to use
these different styles.
Note that we could replace the definition of lineDef above with:
lineDef = delimitedList(fieldDef)
And note that delimitedList takes an optional argument delim used to specify
the delimiter. The default is a comma.
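For comparison, the net effect of delimitedList on simple flat input (fields kept, delimiters dropped) can be mimicked with plain string handling. This sketch is not pyparsing's implementation, just the same observable result:

```python
def delimited_fields(line, delim=','):
    # Split on the delimiter and drop it, keeping only the fields,
    # as pyparsing's delimitedList does by default.
    return [field.strip() for field in line.split(delim)]

delimited_fields('abcd,defg')            # ['abcd', 'defg']
delimited_fields('1; 2; 3', delim=';')   # ['1', '2', '3']
```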
from pyparsing import Literal, Word, ZeroOrMore, alphas, alphanums, nums
lparen = Literal("(")
rparen = Literal(")")
identifier = Word(alphas, alphanums + "_")
integer = Word( nums )
functor = identifier
arg = identifier | integer
args = arg + ZeroOrMore("," + arg)
expression = functor + lparen + args + rparen
def test():
content = raw_input("Enter an expression: ")
parsedContent = expression.parseString(content)
print parsedContent
test()
Explanation:
Input format:
[name] [phone] [city, state zip]
Last, first 111-222-3333 city, ca 99999
import sys
from pyparsing import alphas, nums, ZeroOrMore, Word, Group, Suppress, Combine
lastname = Word(alphas)
firstname = Word(alphas)
city = Group(Word(alphas) + ZeroOrMore(Word(alphas)))
state = Word(alphas, exact=2)
zip = Word(nums, exact=5)
name = Group(lastname + Suppress(",") + firstname)
phone = Combine(Word(nums, exact=3) + "-" + Word(nums, exact=3) + "-" +
    Word(nums, exact=4))
record = name + phone + Group(city + Suppress(",") + state + zip)
def test():
args = sys.argv[1:]
if len(args) != 1:
print 'usage: python pyparsing_test3.py <datafile.txt>'
sys.exit(-1)
infilename = sys.argv[1]
    infile = open(infilename, 'r')
for line in infile:
line = line.strip()
if line and line[0] != "#":
fields = record.parseString(line)
print fields
test()
Comments:
We use the exact=n argument to the Word constructor to restrict the parser to
accepting a specific number of characters, for example in the zip code and
phone number. Word also accepts min=n and max=n to enable you to restrict
the length of a word to within a range.
We use Group to group the parsed results into sub-lists, for example in the
definition of city and name. Group enables us to organize the parse results into
simple parse trees.
We use Combine to join parsed results back into a single string. For example,
in the phone number, we can require dashes and yet join the results back into a
single string.
We use Suppress to remove unneeded sub-elements from parsed results. For
example, we do not need the comma between last and first name.
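The net effect of Combine and Suppress can be pictured without pyparsing: Combine glues adjacent matched pieces back into one string, and Suppress drops a matched piece from the results. A rough sketch of both effects on the sample input above:

```python
# Combine-like behavior: the phone number is matched in pieces (digits
# and required dashes) but reported as one joined string.
pieces = ['111', '-', '222', '-', '3333']
phone = ''.join(pieces)

# Suppress-like behavior: the comma between last and first name is
# required in the input but removed from the results.
raw = ['Last', ',', 'first']
name = [tok for tok in raw if tok != ',']
```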
from pyparsing import Literal, Word, Group, Dict, ZeroOrMore, alphas, nums,\
delimitedList
import pprint
testData = """
+-------+------+------+------+------+------+------+------+------+
| | A1 | B1 | C1 | D1 | A2 | B2 | C2 | D2 |
+=======+======+======+======+======+======+======+======+======+
| min | 7 | 43 | 7 | 15 | 82 | 98 | 1 | 37 |
| max | 11 | 52 | 10 | 17 | 85 | 112 | 4 | 39 |
| ave | 9 | 47 | 8 | 16 | 84 | 106 | 3 | 38 |
| sdev | 1 | 3 | 1 | 1 | 1 | 3 | 1 | 1 |
+-------+------+------+------+------+------+------+------+------+
"""
def main():
# Now parse data and print results
data = datatable.parseString(testData)
print "data:", data
print "data.asList():",
pprint.pprint(data.asList())
print "data keys:", data.keys()
print "data['min']:", data['min']
print "data.max:", data.max
if __name__ == '__main__':
main()
data: [['min', '7', '43', '7', '15', '82', '98', '1', '37'],
['max', '11', '52', '10', '17', '85', '112', '4', '39'],
['ave', '9', '47', '8', '16', '84', '106', '3', '38'],
['sdev', '1', '3', '1', '1', '1', '3', '1', '1']]
data.asList():[['min', '7', '43', '7', '15', '82', '98', '1', '37'],
['max', '11', '52', '10', '17', '85', '112', '4', '39'],
['ave', '9', '47', '8', '16', '84', '106', '3', '38'],
['sdev', '1', '3', '1', '1', '1', '3', '1', '1']]
data keys: ['ave', 'min', 'sdev', 'max']
data['min']: ['7', '43', '7', '15', '82', '98', '1', '37']
data.max: ['11', '52', '10', '17', '85', '112', '4', '39']
Notes:
Note the use of Dict to create a dictionary. The print statements show how to
get at the items in the dictionary.
Note how we can also get the parse results as a list by using method asList.
Again, we use Suppress to remove unneeded items from the parse results.
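What Dict adds on top of the nested-list result can be pictured with ordinary dictionaries: the first item of each row becomes the key and the rest become the value. A sketch using two of the rows from the output above:

```python
rows = [
    ['min', '7', '43', '7', '15', '82', '98', '1', '37'],
    ['max', '11', '52', '10', '17', '85', '112', '4', '39'],
]
# Dict-style access: key each row on its first column.
table = dict((row[0], row[1:]) for row in rows)
table['min'][0]   # '7'
```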
7 GUI Applications
7.1 Introduction
This section will help you to put a GUI (graphical user interface) in your Python
program.
We will use a particular GUI library: PyGTK. We've chosen this because it is reasonably
light-weight and our goal is to embed light-weight GUI interfaces in an (possibly)
existing application.
For simpler GUI needs, consider EasyGUI, which is also described below.
For more heavy-weight GUI needs (for example, complete GUI applications), you may
want to explore WxPython. See the WxPython home page at:
http://www.wxpython.org/
7.2 PyGtk
Information about PyGTK is here: The PyGTK home page -- http://www.pygtk.org//.
#!/usr/bin/env python
import sys
import getopt
import gtk
class MessageBox(gtk.Dialog):
    def __init__(self, message="", buttons=(), pixmap=None,
            modal=True):
gtk.Dialog.__init__(self)
self.connect("destroy", self.quit)
self.connect("delete_event", self.quit)
if modal:
self.set_modal(True)
hbox = gtk.HBox(spacing=5)
hbox.set_border_width(5)
self.vbox.pack_start(hbox)
hbox.show()
if pixmap:
self.realize()
pixmap = Pixmap(self, pixmap)
hbox.pack_start(pixmap, expand=False)
pixmap.show()
label = gtk.Label(message)
hbox.pack_start(label)
label.show()
for text in buttons:
b = gtk.Button(text)
b.set_flags(gtk.CAN_DEFAULT)
b.set_data("user_data", text)
b.connect("clicked", self.click)
self.action_area.pack_start(b)
b.show()
self.ret = None
def quit(self, *args):
self.hide()
self.destroy()
gtk.main_quit()
def click(self, button):
self.ret = button.get_data("user_data")
self.quit()
def message_box(title='Message Box', message='', buttons=(), pixmap=None,
        modal=True):
    win = MessageBox(message, buttons, pixmap=pixmap, modal=modal)
    win.set_title(title)
    win.show()
    gtk.main()
    return win.ret
def test():
result = message_box(title='Test #1',
message='Here is your message',
buttons=('Ok', 'Cancel'))
print 'result:', result
USAGE_TEXT = """
Usage:
python simple_dialog.py [options]
Options:
-h, --help Display this help message.
Example:
python simple_dialog.py
"""
def usage():
print USAGE_TEXT
sys.exit(-1)
def main():
args = sys.argv[1:]
try:
opts, args = getopt.getopt(args, 'h', ['help'])
except:
usage()
for opt, val in opts:
if opt in ('-h', '--help'):
usage()
if len(args) != 0:
usage()
test()
if __name__ == '__main__':
#import pdb; pdb.set_trace()
main()
Some explanation:
#!/usr/bin/env python
import sys
import getopt
import gtk
def test():
result = input_box(title='Test #2',
message='Enter a valuexxx:',
default_text='a default value')
if result is None:
print 'Canceled'
else:
print 'result: "%s"' % result
USAGE_TEXT = """
Usage:
python simple_dialog.py [options]
Options:
-h, --help Display this help message.
Example:
python simple_dialog.py
"""
def usage():
print USAGE_TEXT
sys.exit(-1)
def main():
args = sys.argv[1:]
try:
opts, args = getopt.getopt(args, 'h', ['help'])
except:
usage()
for opt, val in opts:
if opt in ('-h', '--help'):
usage()
if len(args) != 0:
usage()
test()
if __name__ == '__main__':
#import pdb; pdb.set_trace()
main()
Most of the explanation for the message box example is relevant to this example,
too. Here are some differences:
Our EntryDialog class constructor creates instance of gtk.Entry, sets its default
value, and packs it into the client area.
The constructor also automatically creates two buttons: "OK" and "Cancel". The
"OK" button is connected to the click method, which saves the value of the entry
field. The "Cancel" button is connected to the quit method, which does not save
the value.
And, if class EntryDialog and function input_box look usable and useful, add
them to your utility gui module.
#!/usr/bin/env python
import sys
import getopt
import gtk
class FileChooser(gtk.FileSelection):
def __init__(self, modal=True, multiple=True):
gtk.FileSelection.__init__(self)
self.multiple = multiple
self.connect("destroy", self.quit)
self.connect("delete_event", self.quit)
if modal:
self.set_modal(True)
self.cancel_button.connect('clicked', self.quit)
self.ok_button.connect('clicked', self.ok_cb)
if multiple:
self.set_select_multiple(True)
self.ret = None
def quit(self, *args):
self.hide()
self.destroy()
gtk.main_quit()
def ok_cb(self, b):
if self.multiple:
self.ret = self.get_selections()
else:
self.ret = self.get_filename()
self.quit()
def file_sel_box(title="Browse", modal=False, multiple=True):
    win = FileChooser(modal=modal, multiple=multiple)
    win.set_title(title)
    win.show()
    gtk.main()
    return win.ret
def file_open_box(modal=True):
return file_sel_box("Open", modal=modal, multiple=True)
def file_save_box(modal=True):
return file_sel_box("Save As", modal=modal, multiple=False)
def test():
result = file_open_box()
print 'open result:', result
result = file_save_box()
print 'save result:', result
USAGE_TEXT = """
Usage:
python simple_dialog.py [options]
Options:
-h, --help Display this help message.
Example:
python simple_dialog.py
"""
def usage():
print USAGE_TEXT
sys.exit(-1)
def main():
args = sys.argv[1:]
try:
opts, args = getopt.getopt(args, 'h', ['help'])
except:
usage()
for opt, val in opts:
if opt in ('-h', '--help'):
usage()
if len(args) != 0:
usage()
test()
if __name__ == '__main__':
main()
#import pdb
#pdb.run('main()')
A little guidance:
Note that there are also predefined dialogs for font selection (FontSelectionDialog)
and color selection (ColorSelectionDialog)
7.3 EasyGUI
If your GUI needs are minimalist (maybe a pop-up dialog or two) and your application
is imperative rather than event driven, then you may want to consider EasyGUI. As the
name suggests, it is extremely easy to use.
How to know when you might be able to use EasyGUI:
Your application does not need to run in a window containing menus and a
menu bar.
Your GUI needs amount to little more than displaying a dialog now and then to
get responses from the user.
You do not want to write an event driven application, that is, one in which your
code sits and waits for the user to initiate operations, for example, with
menu items.
EasyGUI plus documentation and examples are available at EasyGUI home page at
SourceForge -- http://easygui.sourceforge.net/
See the documentation at the EasyGUI Web site for more features.
You can run the easygui module itself to see a demonstration of its dialogs:
$ python easygui.py
import easygui
def testeasygui():
response = easygui.enterbox(msg='Enter your name:', title='Name Entry')
easygui.msgbox(msg=response, title='Your Response')
testeasygui()
import easygui
def test():
response = easygui.fileopenbox(msg='Select a file')
print 'file name: %s' % response
test()
8.1 Introduction
Python has an excellent range of implementation organization structures. These
range from statements and control structures (at a low level) through functions,
methods, and classes (at an intermediate level) and modules and packages at an
upper level.
This section provides some guidance with the use of packages. In particular:
In order to be able to import individual modules from a directory, the directory must
contain a file named __init__.py. (Note that this requirement does not apply to
directories that are listed in PYTHONPATH.) The __init__.py file serves several
purposes:
It marks the directory as a Python package, so that the modules in it can be
imported with dotted names.
It can contain initialization code to be run when the package is first imported.
It can expose selected features of the modules in the package (see below).
A second, slightly more advanced way to enable the user to import the package is to
expose those features of the package in the __init__ module. Suppose that module
mod1 contains functions fun1a and fun1b and suppose that module mod2 contains
functions fun2a and fun2b. Then file __init__.py might contain the following:
from mod1 import fun1a, fun1b
from mod2 import fun2a, fun2b
import testpackages
Then testpackages will contain fun1a, fun1b, fun2a, and fun2b.
For example, here is an interactive session that demonstrates importing the package:
Testpackages
Testpackages/README
Testpackages/MANIFEST.in
Testpackages/setup.py
Testpackages/testpackages/__init__.py
Testpackages/testpackages/mod1.py
Testpackages/testpackages/mod2.py
We'll describe how to configure the above files so that they can be packaged as a
single distribution file and so that the Python package they contain can be installed
as a package by Distutils.
The MANIFEST.in file lists the files that we want included in our distribution. Here is
the contents of our MANIFEST.in file:
The setup.py file describes to Distutils (1) how to package the distribution file and (2)
how to install the distribution. Here is the contents of our sample setup.py:
#!/usr/bin/env python
from distutils.core import setup
setup(
    name='testpackages',
    version='1.0',  # placeholder version number
    description='A sample package for the Distutils example',
    packages=['testpackages'],
    )
Explanation:
Then, you can give this distribution file to a potential user, who can install it by doing
the following:
9 End Matter