Manual PyParsing PDF
Abstract
A quick reference guide for pyparsing, a recursive descent parser framework for the Python
programming language.
This publication is available in Web form[1] and also as a PDF document[2]. Please forward any
comments to [email protected].
Table of Contents
1. pyparsing: A tool for extracting information from text
2. Structuring your application
3. A small, complete example
4. How to structure the returned ParseResults
4.1. Use pp.Group() to divide and conquer
4.2. Structuring with results names
5. Classes
5.1. ParserElement: The basic parser building block
5.2. And: Sequence
5.3. CaselessKeyword: Case-insensitive keyword match
5.4. CaselessLiteral: Case-insensitive string match
5.5. CharsNotIn: Match characters not in a given set
5.6. Combine: Fuse components together
5.7. Dict: A scanner for tables
5.8. Each: Require components in any order
5.9. Empty: Match empty content
5.10. FollowedBy: Adding lookahead constraints
5.11. Forward: The parser placeholder
5.12. GoToColumn: Advance to a specified position in the line
5.13. Group: Group repeated items into a list
5.14. Keyword: Match a literal string not adjacent to specified context
5.15. LineEnd: Match end of line
5.16. LineStart: Match start of line
5.17. Literal: Match a specific string
5.18. MatchFirst: Try multiple matches in a given order
5.19. NoMatch: A parser that never matches
5.20. NotAny: General lookahead condition
5.21. OneOrMore: Repeat a pattern one or more times
5.22. Optional: Match an optional pattern
[1] http://www.nmt.edu/tcc/help/pubs/pyparsing/
[2] http://www.nmt.edu/tcc/help/pubs/pyparsing/pyparsing.pdf
import pyparsing as pp
The examples in this document will refer to the module as pp.
4. Your script will assemble a parser that matches your BNF. A parser is an instance of the abstract base
class pp.ParserElement that describes a general pattern.
Building a parser for your input file format is a bottom-up process. You start by writing parsers for
the smallest pieces, and assemble them into larger and larger pieces until you have a parser for the
entire file.
5. Build a Python string (type str or unicode) containing the input text to be processed.
6. If the parser is p and the input text is s, this code will try to match them:
p.parseString(s)
If the syntax of s matches the syntax described by p, this expression will return an object that
represents the parts that matched. This object will be an instance of class pp.ParseResults.
If the input does not match your parser, it will raise an exception of class pp.ParseException.
This exception will include information about where in the input the parse failed.
The .parseString() method proceeds in sequence through your input text, using the pieces of
your parser to match chunks of text. The lowest-level parsers are sometimes called tokens, and parsers
at higher levels are called patterns.
You may attach parse actions to any component parser. For example, the parser for an integer might
have an attached parse action that converts the string representation into a Python int.
7. Extract your application's information from the returned ParseResults instance. The exact structure
of this instance depends on how you built the parser.
To see how this all fits together:
Section 3, A small, complete example (p. 5).
Section 4, How to structure the returned ParseResults (p. 7).
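Steps 4 through 7 can be sketched in a few lines. This is a minimal illustration, not part of the original scripts: it assumes Python 3 and an installed pyparsing, and the names integer, value, and failed_at are invented for the example.

```python
import pyparsing as pp

# Step 4: build a parser. The parse action converts the matched text to an int.
integer = pp.Word(pp.nums)
integer.setParseAction(lambda toks: int(toks[0]))

# Steps 5 and 6: match the parser against an input string.
result = integer.parseString("42")

# Step 7: extract the application's data from the ParseResults instance.
value = result[0]

# Input that does not match raises pp.ParseException, which records
# where the parse failed as an offset counting from 0.
try:
    integer.parseString("forty-two")
    failed_at = None
except pp.ParseException as exc:
    failed_at = exc.loc
```

Here value ends up as a Python int rather than a string, because the parse action replaced the token before it was stored in the result.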
That last production can be read as: an identifier consists of one first followed by zero or more
rest.
Here is a script that implements that syntax and then tests it against a number of strings.
trivex
#!/usr/bin/env python
#================================================================
# trivex: Trivial example
#----------------------------------------------------------------
# - - - - - I m p o r t s
import sys
The next line imports the pyparsing module and renames it as pp.
trivex
import pyparsing as pp
# - - - - - M a n i f e s t c o n s t a n t s
In the next line, the pp.alphas variable is a string containing all lowercase and uppercase letters. The
pp.Word() class produces a parser that matches a string of letters defined by its first argument; the
exact=1 keyword argument tells that parser to accept exactly one character from that string. So first
is a parser (that is, a ParserElement instance) that matches exactly one letter or an underbar.
trivex
first = pp.Word(pp.alphas+"_", exact=1)
The pp.alphanums variable is a string containing all the letters and all the digits. So the rest pattern
matches one or more letters, digits, or underbar characters.
trivex
rest = pp.Word(pp.alphanums+"_")
The Python + operator is overloaded for instances of the pp.ParserElement class to mean sequence:
that is, the identifier parser matches what the first parser matches, followed optionally by what
the rest parser matches.
trivex
identifier = first+pp.Optional(rest)
# - - - - - m a i n
def main():
"""
"""
for text in testList:
test(text)
# - - - t e s t
def test(s):
'''See if s matches identifier.
'''
print "---Test for '{0}'".format(s)
When you call the .parseString() method on an instance of the pp.ParserElement class, either
it returns a list of the matched elements or raises a pp.ParseException.
trivex
try:
result = identifier.parseString(s)
print " Matches: {0}".format(result)
except pp.ParseException as x:
print " No match: {0}".format(str(x))
# - - - - - E p i l o g u e
if __name__ == "__main__":
main()
The return value is an instance of the pp.ParseResults class; when printed, it appears as a list of the
matched strings. You will note that for single-letter test strings, the resulting list has only a single element,
while for multi-letter strings, the list has two elements: the first character (the part that matched first)
followed by the remaining characters that matched the rest parser.
If we want the resulting list to have only one element, we can change one line to get this effect:
identifier = pp.Combine(first+pp.Optional(rest))
The pp.Combine() class tells pyparsing to combine all the matching pieces in its argument list into a
single result. Here is an example of two output lines from the revised script:
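The difference between the two definitions of identifier can be checked with a short sketch. It assumes Python 3 (the script above is Python 2), and the helper matches() and the test strings are illustrative choices, not taken from the original script.

```python
import pyparsing as pp

first = pp.Word(pp.alphas + "_", exact=1)
rest = pp.Word(pp.alphanums + "_")

identifier = first + pp.Optional(rest)
combined = pp.Combine(first + pp.Optional(rest))

def matches(parser, text):
    """Return the matched pieces as a list, or None if text does not match."""
    try:
        return parser.parseString(text).asList()
    except pp.ParseException:
        return None

single = matches(identifier, "a")          # one piece
split = matches(identifier, "foo")         # first character, then the rest
fused = matches(combined, "foo")           # one combined piece
rejected = matches(identifier, "9lives")   # starts with a digit
```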
As a dictionary. You can attach a results name s to a parser by calling its .setResultsName(s)
method (see Section 5.1, ParserElement: The basic parser building block (p. 11)). Once you have
done that, you can extract the matched string from the ParseResults instance r as r[s].
Here are some general principles for structuring your parser's ParseResults instance.
However, when you apply pp.Group() to some parser, all the matching pieces are returned in a
single pp.ParseResults that acts like a list.
That result doesn't really match our concept that the parser is a sequence of two things: a single word,
followed by a sequence of words.
By applying pp.Group() like this, we get a parser that will return a sequence of two items that match
our concept.
1. The grouped parser has two components: a word and a pp.Group. Hence, the result returned acts
like a two-element list.
2. The first element is an actual string, 'imaginary'.
3. The second part is another pp.ParseResults instance that acts like a list of strings.
So for larger grammars, the pp.ParseResults instance, which the top-level parser returns when it
matches, will typically be a many-layered structure containing this kind of mixture of ordinary strings
and other instances of pp.ParseResults.
The next section will give you some suggestions on managing the structure of these beasts.
In the four-element list shown above, you can access the first and third elements by name, but the
second and fourth would be accessible only by position.
A more sensible way to structure this parser would be to write a parser for the combination of a name
and a score, and then combine two of those for the overall parser.
Don't use a results name for a repeated element. If you do, only the last one will be accessible by
results name in the ParseResults.
A better approach is to wrap the entire name in a pp.Group() and then apply the results name to
that.
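The last two principles can be seen side by side in a small sketch. It assumes Python 3; the input string and the names repeated, record, and score are made up for the illustration.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
number = pp.Word(pp.nums)

# A results name on a repeated element: only the last match is kept.
repeated = pp.OneOrMore(word.setResultsName("name")) + number
r1 = repeated.parseString("Doug Piranha 87")
last_only = r1["name"]

# Group the repetition, then name the group as a whole.
full_name = pp.Group(pp.OneOrMore(word)).setResultsName("name")
record = full_name + number.setResultsName("score")
r2 = record.parseString("Doug Piranha 87")
whole_name = r2["name"].asList()
score = r2["score"]
```

With the grouped version, r2["name"] is itself a ParseResults holding every word of the name, so nothing is lost.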
5. Classes
Here are the classes defined in the pyparsing module.
5.1. ParserElement: The basic parser building block
In the process of building a large syntax out of small pieces, define a parser for each piece, and then
combine the pieces into larger and larger aggregations until you have a parser that matches the entire
input.
To assemble parsers into larger configurations, you will use pyparsing's built-in classes such as pp.And,
pp.Or, and pp.OneOrMore. Each of these class constructors returns a parser, and many of them accept
one or more parsers as arguments.
For example, if a certain element of the syntax described by some parser p is optional, then
pp.Optional(p) returns another parser (that is, another instance of a subclass of pp.ParserElement)
that will match pattern p if it occurs at that point in the input, and do nothing if the input does not
match p.
Here are the methods available on a parser instance p that subclasses pp.ParserElement.
p.addParseAction(f1, f2, ...)
Attaches one or more additional parse actions to p, after any that are already attached, and
returns p. See the p.setParseAction() method below for a discussion of parse actions.
p.copy()
Returns a copy of p.
p.ignore(q)
This method modifies p so that it ignores any number of occurrences of text that matches pattern
q. This is a useful way to instruct your parser to ignore comments.
p.leaveWhitespace()
This method instructs p not to skip whitespace before matching the input text. The method returns
p.
When used on a parser that includes multiple pieces, this method suppresses whitespace skipping
for all the included pieces. Here is an example:
You will note that even though .leaveWhitespace() was never called on the num parser itself,
whitespace is still disallowed for the string ' 47' because the wn parser disabled automatic
whitespace skipping for all of its pieces.
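A minimal sketch of this behavior, assuming Python 3. The names plain, wn, and accepts() are illustrative; they are not the names used in the original example.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
num = pp.Word(pp.nums)

plain = word + num                     # default: whitespace between pieces is skipped
wn = (word + num).leaveWhitespace()    # no skipping anywhere inside this parser

def accepts(parser, text):
    """Return True if parser matches the start of text."""
    try:
        parser.parseString(text)
        return True
    except pp.ParseException:
        return False

plain_with_space = accepts(plain, "xy 47")
wn_adjacent = accepts(wn, "xy47")
wn_with_space = accepts(wn, "xy 47")
```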
p.parseFile(f, parseAll=False)
Try to match the contents of a file against parser p. The argument f may be either the name of a file
or a file-like object.
If the entire contents of the file do not match p, it is not considered an error unless you pass the
argument parseAll=True.
p.parseString(s, parseAll=False)
Try to match string s against parser p. If there is a match, it returns a pp.ParseResults instance
(see Section 5.26, ParseResults: Result returned from a match (p. 28)). If there is no match, it
will raise a pp.ParseException.
By default, if the entirety of s does not match p, it is not considered an error. If you want to ensure
that all of s matched p, pass the keyword argument parseAll=True.
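The effect of parseAll can be sketched as follows, assuming Python 3; the input string is invented for the example.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Default: trailing text that the parser does not consume is ignored.
partial = word.parseString("spam 123").asList()

# parseAll=True: the unmatched "123" is now an error.
try:
    word.parseString("spam 123", parseAll=True)
    strict_ok = True
except pp.ParseException:
    strict_ok = False
```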
p.scanString(s)
Search through string s to find regions that match p. This method is an iterator that generates a
sequence of tuples (r, start, end), where r is a pp.ParseResults instance that represents
the matched part, and start and end are the beginning and ending offsets within s that bracket
the position of the matched text.
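As a rough illustration of the (r, start, end) tuples, the sketch below collects every run of digits in a string. It assumes Python 3, and the sample text is made up.

```python
import pyparsing as pp

num = pp.Word(pp.nums)
text = "ab 12 cd 345"

matched = []   # the matched strings
sliced = []    # the same regions, recovered from the offsets
for r, start, end in num.scanString(text):
    matched.append(r[0])
    sliced.append(text[start:end])
```

The offsets bracket the matched text itself, so slicing the input with them recovers the same strings that appear in the results.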
p.setBreak()
When this parser is about to be used, call up the Python debugger pdb.
p.setFailAction(f)
This method modifies p so that it will call function f if it fails to parse. The method returns p.
Here is the calling sequence for a fail action:
s
The input string.
loc
The location in the input where the parse failed, as an offset counting from 0.
p.setName(name)
Attaches a name to this parser for debugging purposes. The argument is a string. The method returns
p.
>>> count.parseString('FAIL')
pyparsing.ParseException: Expected count-parser (at char 0), (line:1, col:1)
In the above example, if you convert a parser to a string, you get a generic description of it: the
string W:(0123...) tells you it is a Word parser and shows you the first few characters in the
set. Once you have attached a name to it, the string form of the parser is that name. Note that when
the parse fails, the error message identifies what it expected by naming the failed parser.
p.setParseAction(f1, f2, ...)
This method modifies p by attaching one or more parse actions, replacing any that were attached
earlier, and returns p. When the parser matches the input, it then calls each function fi in the
order specified.
The calling sequence for a parse action can be any of these four prototypes:
f()
f(toks)
f(loc, toks)
f(s, loc, toks)
These are the arguments your function will receive, depending on how many arguments it accepts:
s
The string being parsed. If your string contains tab characters, see the reference documentation[18]
for notes about tab expansion and its effect on column positions.
[18] http://packages.python.org/pyparsing/
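Two of the four prototypes are sketched below, assuming Python 3. The function names to_int and remember and the list positions are hypothetical; pyparsing works out how many arguments each function accepts.

```python
import pyparsing as pp

positions = []

def to_int(toks):
    # One-argument form: receives only the matched tokens.
    return int(toks[0])

def remember(s, loc, toks):
    # Three-argument form: also receives the input string and the offset
    # where the match started. Returning None leaves the tokens unchanged.
    positions.append(loc)

number = pp.Word(pp.nums).setParseAction(to_int)
word = pp.Word(pp.alphas).setParseAction(remember)

result = (word + number + word).parseString("ab 12 cd").asList()
```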
p.setResultsName(name)
For parsers that deposit the matched text into the ParseResults instance returned by
.parseString(), you can use this method to attach a name to that matched text. Once you do
this, you can retrieve the matched text from the ParseResults instance by using that instance as
if it were a Python dict.
The result of this method is a copy of p. Hence, if you have defined a useful parser, you can create
several instances, each with a different results name. Continuing the above example, if we then use
the count parser, we find that it does not have the results name that is attached to its copy
beanCounter.
>>> r2 = count.parseString('8873')
>>> r2.keys()
[]
>>> print r2
['8873']
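The copy semantics can be sketched end to end as follows. This assumes Python 3, where .keys() is wrapped in list() for display; the results name "beans" and the input strings are illustrative guesses, not necessarily those of the original example.

```python
import pyparsing as pp

count = pp.Word(pp.nums)
beanCounter = count.setResultsName("beans")   # a named copy of count

r = beanCounter.parseString("7388")
named_value = r["beans"]
named_keys = list(r.keys())

r2 = count.parseString("8873")                # the original has no results name
plain_keys = list(r2.keys())
```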
p.setWhitespaceChars(s)
For parser p, change its definition of whitespace to the characters in string s.
Additionally, these ordinary Python operators are overloaded to work with ParserElement instances.
p1+p2
Equivalent to pp.And([p1, p2]).
p * n
For a parser p and an integer n, the result is a parser that matches n repetitions of p. You can give
the operands in either order: for example, 3 * p is the same as p * 3.
p1 | p2
Equivalent to pp.MatchFirst([p1, p2]).
p1 ^ p2
Equivalent to pp.Or([p1, p2]).
p1 & p2
Equivalent to pp.Each([p1, p2]).
~ p
Equivalent to pp.NotAny(p).
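A brief sketch of the +, *, & and ~ operators, assuming Python 3. The parsers and inputs are invented for the illustration.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
num = pp.Word(pp.nums)

sequence = (word + num).parseString("abc 123").asList()   # And
triple = (num * 3).parseString("1 2 3").asList()          # three repetitions
any_order = (num & word).parseString("abc 123").asList()  # Each

# NotAny: digits are accepted only if the first one is not '0'.
no_leading_zero = ~pp.Literal("0") + num
accepted = no_leading_zero.parseString("123").asList()
try:
    no_leading_zero.parseString("012")
    zero_accepted = True
except pp.ParseException:
    zero_accepted = False
```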
Class pp.ParserElement also supports one static method:
pp.ParserElement.setDefaultWhitespaceChars(s)
This static method changes the definition of whitespace to be the characters in string s. Calling this
method has this effect on all subsequent instantiations of any pp.ParserElement subclass.
5.2. And: Sequence
The argument is a sequence of ParserElement instances. The resulting parser matches a sequence
of items that match those expressions, in exactly that order. You may also use the Python + operator
to get this functionality. Here are some examples:
5.3. CaselessKeyword: Case-insensitive keyword match
A variant of pp.Keyword (see Section 5.14, Keyword: Match a literal string not adjacent to specified
context (p. 23)) that treats uppercase and lowercase characters the same.
5.4. CaselessLiteral: Case-insensitive string match
The argument is a literal string to be matched. The resulting parser matches that string, except that it
treats uppercase and lowercase characters the same.
The matched value will always have the same case as the matchString argument, not the case of the
matched text.
>>> ni=pp.CaselessLiteral('Ni')
>>> print ni.parseString('Ni')
['Ni']
>>> print ni.parseString('NI')
['Ni']
>>> print ni.parseString('nI')
['Ni']
>>> print ni.parseString('ni')
['Ni']
5.5. CharsNotIn: Match characters not in a given set
A parser of this class matches one or more characters that are not in the notChars argument. You may
specify a minimum count of such characters using the min keyword argument, and you may specify a
maximum count as the max argument. To create a parser that matches exactly N characters that are not
in notChars, use the exact=N keyword argument.
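For instance, a comma-delimited field might be matched as sketched below. This assumes Python 3, and the input strings are made up.

```python
import pyparsing as pp

# Everything up to, but not including, the next comma.
field = pp.CharsNotIn(",")
first_field = field.parseString("spam eggs,ham").asList()

# Exactly three characters that are not commas.
three = pp.CharsNotIn(",", exact=3)
first_three = three.parseString("abcdef").asList()
```

Note that the blank inside 'spam eggs' is part of the match: any character outside notChars qualifies, including whitespace.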
5.6. Combine: Fuse components together
The purpose of this class is to modify a parser containing several pieces so that the matching string will
be returned as a single item in the returned ParseResults instance. The return value is another
ParserElement instance that matches the same syntax as parser, but combines the pieces in the
result.
parser
A parser, as a ParserElement instance.
joinString
A string that will be inserted between the pieces of the matched text when they are concatenated
in the result.
adjacent
In the default case, adjacent=True, the text matched by components of the parser must be
adjacent. If you pass adjacent=False, the result will match text containing the components of
parser even if they are separated by other text.
In the example above, hiwayPieces matches one or more letters (pp.Word(pp.alphas)) followed
by one or more digits (pp.Word(pp.nums)). Because it has two components, a match on hiwayPieces
will always return a list of two strings. The hiway parser returns a list containing one string, the
concatenation of the matched pieces.
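The two parsers described here, plus the joinString and adjacent arguments, can be sketched as follows. This assumes Python 3; the highway names and the parser named loose are illustrative.

```python
import pyparsing as pp

hiwayPieces = pp.Word(pp.alphas) + pp.Word(pp.nums)
hiway = pp.Combine(hiwayPieces)

pieces = hiwayPieces.parseString("I25").asList()
fused = hiway.parseString("I25").asList()

# With adjacent=False the pieces may be separated by whitespace, and
# joinString is inserted between them in the combined result.
loose = pp.Combine(hiwayPieces, joinString="-", adjacent=False)
joined = loose.parseString("US 380").asList()

# The default (adjacent=True) rejects separated pieces.
try:
    hiway.parseString("I 25")
    separated_ok = True
except pp.ParseException:
    separated_ok = False
```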
5.7. Dict: A scanner for tables
The Dict class is a highly specialized pattern used to extract data from text arranged in rows and
columns, where the first column contains labels for the remaining columns. The pattern argument
must be a parser that describes a two-level structure such as a Group within a Group. Other group-like
patterns such as the delimitedList() function may be used.
The constructor returns a parser whose .parseString() method will return a ParseResults instance
like most parsers; however, in this case, the ParseResults instance can act like a dictionary whose
keys are the row labels and each related value is a list of the other items in that row.
Here is an example.
catbird
#!/usr/bin/env python
#================================================================
# dicter: Example of pyparsing.Dict pattern
#----------------------------------------------------------------
import pyparsing as pp
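Independently of the script above, the core idea can be shown in a compact sketch. It assumes Python 3, and the row layout (a label followed by one or more numbers) and the sample data are invented for the illustration.

```python
import pyparsing as pp

label = pp.Word(pp.alphas)
value = pp.Word(pp.nums)

# Two-level structure: each row is a Group, and Dict wraps the repetition.
row = pp.Group(label + pp.OneOrMore(value))
table = pp.Dict(pp.OneOrMore(row))

r = table.parseString("red 1 2 blue 3 4")

red_values = r["red"].asList()     # the row's items after the label
blue_values = r["blue"].asList()
labels = sorted(r.keys())
rows = r.asList()                  # the list view is still available
```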
5.8. Each: Require components in any order
This class returns a ParserElement that matches a given set of pieces, but the pieces may occur in
any order. You may also construct this class using the & operator and the identity Each([p0, p1,
p2, ...]) == p0 & p1 & p2 & .... Here is an example: a pattern that requires a string of letters
and a string of digits, but they may occur in either order.
>>> num=pp.Word(pp.nums)
>>> name=pp.Word(pp.alphas)
>>> nameNum = num & name
>>> print nameNum.parseString('Henry8')
['Henry', '8']
>>> print nameNum.parseString('16Christine')
['16', 'Christine']
5.9. Empty: Match empty content
>>> e=pp.Empty()
>>> print e.parseString('')
[]
>>> print e.parseString('shrubber')
[]
>>> print e.parseString('shrubber', parseAll=True)
pyparsing.ParseException: Expected end of text (at char 0), (line:1, col:1)
5.10. FollowedBy: Adding lookahead constraints
This class is used to specify a lookahead; that is, some content which must appear in the input, but
which you don't want this parser to consume. Here is an example.
The name pattern matches one or more letters; the oneOrTwo pattern matches either '1' or '2'; and
the number pattern matches one or more digits. The expression pp.FollowedBy(oneOrTwo) requires
that the next thing after the name matches the oneOrTwo pattern, but the input is not advanced past
it.
Thus, the number pattern matches one or more digits, including the '1' or '2' just after the name. In
the 'Robin88' example, the match fails because the character just after 'Robin' is neither '1' nor
'2'.
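The three patterns described above might be assembled as in the following sketch. It assumes Python 3; the overall parser name pat and the input 'Robin144' are guesses for illustration, while 'Robin88' comes from the text.

```python
import pyparsing as pp

name = pp.Word(pp.alphas)
oneOrTwo = pp.Word("12", exact=1)
number = pp.Word(pp.nums)

# The lookahead checks the next character but does not consume it,
# so number still sees the leading '1' or '2'.
pat = name + pp.FollowedBy(oneOrTwo) + number

good = pat.parseString("Robin144").asList()
try:
    pat.parseString("Robin88")
    bad_ok = True
except pp.ParseException:
    bad_ok = False
```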
5.11. Forward: The parser placeholder
1HX 'X'
10H0123456789 '0123456789'
We'll write our pattern so that the 'H' can be either uppercase or lowercase.
Here's the complete script. We start with the usual preliminaries: imports, some test strings, the main,
and a function to run each test.
hollerith
#!/usr/bin/env python
#================================================================
# hollerith: Demonstrate Forward class
#----------------------------------------------------------------
import sys
import pyparsing as pp
# - - - - - M a n i f e s t c o n s t a n t s
# - - - - - m a i n
def main():
holler = hollerith()
for text in TEST_STRINGS:
test(holler, text)
# - - - t e s t
Next we'll define the function hollerith() that returns a parser for a Hollerith string.
hollerith
# - - - h o l l e r i t h
def hollerith():
First we define a parser intExpr that matches the character count. It has a parse action that converts
the number from character form to a Python int. The lambda expression defines a nameless function
that takes a list of tokens and converts the first token to an int.
hollerith
#--
# Define a recognizer for the character count.
#--
intExpr = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
Next we create an empty Forward parser as a placeholder for the logic that matches the 'H' and the
following characters.
hollerith
#--
# Allocate a placeholder for the rest of the parsing logic.
#--
stringExpr = pp.Forward()
Next we define a closure[19] that will be added to intExpr as a second parse action. Notice that we are
defining a function within a function. The countedParseAction function will retain access to an
external name (stringExpr, which is defined in the outer function's scope) after the function is defined.
hollerith
#--
# Define a closure that transfers the character count from
# the intExpr to the stringExpr.
#--
def countedParseAction(toks):
'''Closure to define the content of stringExpr.
'''
The argument is the list of tokens that was recognized by intExpr; because of its parse action, this list
contains the count as a single int.
hollerith
n = toks[0]
The contents parser will match exactly n characters. We'll use pp.CharsNotIn (see Section 5.5,
CharsNotIn: Match characters not in a given set (p. 16)) to do this match, specifying the excluded
characters as an empty string so that any character will be included. Incidentally, this does not work
for n==0, but '0H' is not a valid Hollerith literal. A more robust implementation would raise a
pp.ParseException in this case.
hollerith
#--
# Create a parser for any (n) characters.
#--
contents = pp.CharsNotIn('', exact=n)
This next line inserts the final pattern into the placeholder parser: an 'H' in either case followed by the
contents pattern. The '<<' operator is overloaded in the Forward class to perform this operation:
for any Forward recognizer F and any parser p, the expression F << p modifies F so that it matches
pattern p.
[19] http://en.wikipedia.org/wiki/Closure_(computer_science)
#--
# Store a recognizer for 'H' + contents into stringExpr.
#--
stringExpr << (pp.Suppress(pp.CaselessLiteral('H')) + contents)
Parse actions may elect to modify the recognized tokens, but we don't need to do that, so we return
None to signify that the tokens remain unchanged.
hollerith
return None
That is the end of the countedParseAction closure. We are now back in the scope of hollerith().
The next line adds the closure as the second parse action for the intExpr parser.
hollerith
#--
# Add the above closure as a parse action for intExpr.
#--
intExpr.addParseAction(countedParseAction)
Now we are ready to return the completed hollerith parser: intExpr recognizes the count and
stringExpr recognizes the 'H' and string contents. When we return it, it is still just an empty Forward,
but it will be filled in before it is asked to parse.
hollerith
#--
# Return the completed pattern.
#--
return (pp.Suppress(intExpr) + stringExpr)
# - - - - - E p i l o g u e
if __name__ == "__main__":
main()
Here is the output of the script. Note that the last test fails because the '999H' is not followed by 999
more characters.
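Apart from the Hollerith script, the placeholder mechanism itself can be seen in isolation. The sketch below assumes Python 3, and the grammar (a fixed word followed by any word) is invented purely for the illustration.

```python
import pyparsing as pp

# An empty placeholder can be used in a larger parser right away.
placeholder = pp.Forward()
greeting = pp.Literal("hello") + placeholder

# The << operator fills in the placeholder afterwards. The greeting
# parser sees the new content because it refers to the same object.
placeholder << pp.Word(pp.alphas)

result = greeting.parseString("hello world").asList()
```

The Hollerith parser relies on the same property, except that the placeholder is filled in by a parse action while the parse is already under way.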
5.12. GoToColumn: Advance to a specified position in the line
This class returns a parser that causes the input position to advance to column number colNo, where
column numbers are counted starting from 1. The value matched by this parser is the string of characters
between the current position and position colNo. It is an error if the current position is past column
colNo.
In this example, pat is a parser with three parts. The first part matches one to four letters. The second
part skips to column 5. The third part matches one or more digits.
In the first test, the GoToColumn parser returns '@@' because that was the text between the letters and
column 5. In the second test, that parser returns the empty string because there are no characters between
'wxyz' and '987'. In the third example, the part matched by the GoToColumn is empty because white
space is ignored between tokens.
5.13. Group: Group repeated items into a list
This class causes the value returned from a match to be formed into a list. The parser argument is
some parser that involves repeated tokens such as ZeroOrMore or a delimitedList.
>>> lb = pp.Literal('{')
>>> rb = pp.Literal('}')
>>> wordList = pp.OneOrMore(pp.Word(pp.alphas))
>>> pat1 = lb + wordList + rb
>>> print pat1.parseString('{ant bee crow}')
['{', 'ant', 'bee', 'crow', '}']
>>> pat2 = lb + pp.Group(wordList) + rb
>>> print pat2.parseString('{ant bee crow}')
['{', ['ant', 'bee', 'crow'], '}']
In the example above, both pat1 and pat2 match a sequence of words within {braces}. In the test
of pat1, the result has five elements: the three words and the opening and closing braces. In the test of
pat2, the result has three elements: the open brace, a list of the strings that matched the OneOrMore
parser, and the closing brace.
5.14. Keyword: Match a literal string not adjacent to specified context
The matchString argument is a literal string. The resulting parser will match that exact text in the
input. However, unlike the Literal class, the next input character must not be one of the characters
in the identChars argument. The default value of identChars is a string containing all the letters and
digits plus underbar (_) and dollar sign ($).
If you provide the keyword argument caseless=True, the match will be case-insensitive.
Examples:
>>> key=pp.Keyword('Sir')
>>> print key.parseString('Sir Robin')
['Sir']
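The contrast with Literal is sketched below, assuming Python 3. The input 'Sirrah' and the helper first_token() are illustrative additions.

```python
import pyparsing as pp

key = pp.Keyword("Sir")
lit = pp.Literal("Sir")

def first_token(parser, text):
    """Return the first matched token, or None if text does not match."""
    try:
        return parser.parseString(text)[0]
    except pp.ParseException:
        return None

keyword_alone = first_token(key, "Sir Robin")   # followed by a blank: matches
keyword_inside = first_token(key, "Sirrah")     # followed by a letter: no match
literal_inside = first_token(lit, "Sirrah")     # Literal has no such restriction
```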
5.15. LineEnd: Match end of line
An instance of this class matches if the current position is at the end of a line or the end of a string. If it
matches at the end of a line, it returns a newline ('\n') in the result.
In the next example, note that the end of the string does match the pp.LineEnd(), but in this case no
value is added to the result.
For more examples, see Section 7.12, lineEnd: An instance of LineEnd (p. 45).
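Both cases can be sketched with a single parser, assuming Python 3; the parser name line and the input strings are made up.

```python
import pyparsing as pp

line = pp.Word(pp.alphas) + pp.LineEnd()

# At the end of a line, the newline is added to the result.
with_newline = line.parseString("spam\n").asList()

# At the end of the string, LineEnd matches but contributes nothing.
at_end_of_text = line.parseString("spam").asList()
```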
5.16. LineStart: Match start of line
An instance of this class matches if the current position is at the beginning of a line; that is, if it is either
the beginning of the text or preceded by a newline. It does not advance the current position or contribute
any content to the result.
Here are some examples. The first pattern matches a name at the beginning of a line.
The ansb pattern here matches a name followed by a newline followed by another name. Note that
although there are four components, there are only three strings in the result; the pp.LineStart()
does not contribute a result string.
For more examples, see Section 7.13, lineStart: An instance of LineStart (p. 45).
5.17. Literal: Match a specific string
Matches the exact characters of the text argument. Here are some examples.
5.18. MatchFirst: Try multiple matches in a given order
Use an instance of this class when you want to try to match two or more different parsers, but you want
to specify the order of the tests. The parserList argument is a list of parsers.
A typical place to use this is to match the text against a set of strings in which some strings are substrings
of others. For example, suppose your input text has two different command names CATCH and CAT:
you should test for CATCH first. If you test for CAT first, it will match the first three characters of CATCH,
which is probably not what you want.
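The CATCH and CAT situation can be reproduced with a short sketch, assuming Python 3. The parser names wrong_order and right_order are hypothetical.

```python
import pyparsing as pp

cat = pp.Literal("CAT")
catch = pp.Literal("CATCH")

# Alternatives are tried strictly in the order given.
wrong_order = pp.MatchFirst([cat, catch])
right_order = pp.MatchFirst([catch, cat])

truncated = wrong_order.parseString("CATCH").asList()
complete = right_order.parseString("CATCH").asList()
short_one = right_order.parseString("CAT").asList()
```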
You may also get the effect of this class by combining the component parsers with the | operator.
The definition of keySet in the above example could also have been done this way:
5.20. NotAny: General lookahead condition
This is similar to pp.FollowedBy (see Section 5.10, FollowedBy: Adding lookahead constraints (p. 19))
in that it tests the text at the current position without advancing that position, but here the test is
that the text does not match something. In this
case the parser argument is any parser. The match succeeds if and only if the text at the current position
does not match parser. The current position is not advanced whether parser matches it or not.
In the example below, the pattern matches a sequence of letters, followed by a sequence of digits provided
the first one is not '0'.
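A pattern of this kind might look like the following sketch. It assumes Python 3, and the names and test strings are invented rather than taken from the original example.

```python
import pyparsing as pp

name = pp.Word(pp.alphas)
num = pp.Word(pp.nums)

# Letters, then digits whose first digit is not '0'.
pat = name + pp.NotAny(pp.Literal("0")) + num

good = pat.parseString("Woody27").asList()
try:
    pat.parseString("Woody07")
    zero_ok = True
except pp.ParseException:
    zero_ok = False
```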
5.21. OneOrMore: Repeat a pattern one or more times
An instance of this class matches one or more repetitions of the syntax described by the parser argument.
5.22. Optional: Match an optional pattern
Use this pattern when a syntactic element is optional. The parser argument is a parser for the optional
pattern. By default, if the pattern is not present, no content is added to the ParseResults; if you would
like to supply content to be added in that case, provide it as the default keyword option.
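The effect of the default option is sketched below, assuming Python 3; the parsers and the default value "0" are illustrative.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
num = pp.Word(pp.nums)

without_default = word + pp.Optional(num)
with_default = word + pp.Optional(num, default="0")

absent_plain = without_default.parseString("abc").asList()
absent_filled = with_default.parseString("abc").asList()
present = with_default.parseString("abc 42").asList()
```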
5.23. Or
An instance of this class matches one of a given set of parsers; the argument is a sequence containing
those parsers. If more than one of the parsers match, the parser used will be the one that matches the
longest string of text.
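The longest-match rule can be contrasted with first-match behavior in a small sketch. It assumes Python 3, and the integer and real parsers are invented for the comparison.

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
real = pp.Combine(pp.Word(pp.nums) + "." + pp.Word(pp.nums))

# Or tries every alternative and keeps the longest match.
longest = pp.Or([integer, real]).parseString("3.14").asList()

# MatchFirst stops at the first alternative that matches.
first_hit = pp.MatchFirst([integer, real]).parseString("3.14").asList()
```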
You may also use the ^ operator to construct a set of alternatives. This line is equivalent to the third
line of the example above:
5.24. ParseException
This is the exception thrown when the parse fails. These attributes are available on an instance:
.lineno
The line number where the parse failed, counting from 1.
.col
The column number where the parse failed, counting from 1.
.line
The text of the line in which the parse failed.
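These attributes can be inspected as in the following sketch, which assumes Python 3. The two-line input is made up so that the failure lands on the second line.

```python
import pyparsing as pp

pat = pp.Word(pp.alphas) + pp.Word(pp.nums)

try:
    # The second line holds letters where digits are required.
    pat.parseString("abc\nxyz")
    failure = None
except pp.ParseException as exc:
    failure = (exc.lineno, exc.col, exc.line)
```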
If you have assigned names to any of the components of your parser, you can use the ParseResults
instance as if it were a dictionary: the keys are the names, and each related value is the string that
matched that component.
>>> r.asDict()
{'last': 'Piranha', 'first': 'Doug'}
R.asList()
This method returns R as a normal Python list.
>>> r.asList()
['Doug', 'Piranha']
R.copy()
Returns a copy of R.
R.get(key, defaultValue=None)
Works like the .get() method on the standard Python dict type: if the ParseResults has no
component named key, the defaultValue is returned.
R.insert(where, what)
Like the .insert() method of the Python list type, this method will insert the value of the string
what before position where in the list of strings.
R.items()
This method works like the .items() method of Python's dict type, returning a list of tuples
(key, value).
>>> r.items()
[('last', 'Piranha'), ('first', 'Doug')]
R.keys()
Returns a list of the keys of named results. Continuing the Piranha example:
>>> r.keys()
['last', 'first']
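The list-like and dictionary-like views can be exercised together, as in this sketch. It assumes Python 3, where .keys() is sorted here only to give a stable order; the construction of the parser is a guess at how the Piranha example was built.

```python
import pyparsing as pp

first = pp.Word(pp.alphas).setResultsName("first")
last = pp.Word(pp.alphas).setResultsName("last")

r = (first + last).parseString("Doug Piranha")

as_list = r.asList()
as_dict = r.asDict()
by_name = r["last"]
missing = r.get("middle", "?")    # no component has this name
names = sorted(r.keys())
```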
An instance of this class matches a string literal that is delimited by some quote character or characters.
quoteChar
This string argument defines the opening delimiter and, unless you pass an endQuoteChar argument,
also the closing delimiter. The value may have multiple characters.
>>> qs = pp.QuotedString('"')
>>> print qs.parseString('"semprini"')
['semprini']
>>> cc = pp.QuotedString('/*', endQuoteChar='*/')
>>> print cc.parseString("/* Attila the Bun */")
[' Attila the Bun ']
>>> pat = pp.QuotedString('"', escChar='\\')
>>> print pat.parseString(r'"abc\"def"')
['abc"def']
>>> text = """'Ken
... Obvious'"""
>>> print text
'Ken
Obvious'
>>> pat = pp.QuotedString("'")
>>> print pat.parseString(text)
pyparsing.ParseException: Expected quoted string, starting with ' ending
with ' (at char 0), (line:1, col:1)
>>> pat = pp.QuotedString("'", multiline=True)
>>> print pat.parseString(text)
['Ken\nObvious']
>>> pat = pp.QuotedString('|')
>>> print pat.parseString('|clever sheep|')
['clever sheep']
>>> pat = pp.QuotedString('|', unquoteResults=False)
>>> print pat.parseString('|clever sheep|')
['|clever sheep|']
An instance of this class matches a regular expression expressed in the form expected by the Python re
module [20]. The argument r may be either a string containing a regular expression, or a compiled regular
expression as an instance of re.RegexObject.
If the argument r is a string, you may provide a flags argument that will be passed to the re.match()
function as its flags argument.
>>> r1 = '[a-e]+'
>>> pat1 = pp.Regex(r1)
>>> print pat1.parseString('aeebbaecd', parseAll=True)
['aeebbaecd']
[20] http://docs.python.org/2/library/re.html
An instance of this class will search forward in the input until it finds text that matches a parser target.
include
By default, when text matching the target pattern is found, the position is left at the beginning
of that text. If you specify include=True, the position will be left at the end of the matched text,
and the ParseResults will include a two-element list whose first element is the text that was
skipped and the second element is the text that matched the target parser.
ignore
You can specify a pattern to be ignored while searching for the target by specifying an argument
ignore=p, where p is a parser that matches the pattern to be ignored.
failOn
You can specify a pattern that must not be skipped over by passing an argument failOn=p, where
p is a parser that matches that pattern. If you do this, the SkipTo parser will fail if it ever recognizes
input that matches p.
An instance of this class matches only if the text position is at the end of the string.
An instance of this class matches only if the text position is at the start of the string.
An instance of this class is a parser that matches the same content as parser p, but when it matches text,
the matched text is not deposited into the returned ParseResult instance.
See also the .suppress() method in Section 5.1, ParserElement: The basic parser building
block (p. 11).
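For instance, the parentheses around a number can be required in the input but kept out of the result. This is a minimal sketch in Python 3; the second pattern uses the equivalent .suppress() method:

```python
import pyparsing as pp

# The delimiters must be present, but only the number is deposited.
pat = pp.Suppress('(') + pp.Word(pp.nums) + pp.Suppress(')')
tokens = pat.parseString('(42)').asList()
print(tokens)       # ['42']

# Same effect using the .suppress() method on each delimiter parser.
pat2 = pp.Literal('(').suppress() + pp.Word(pp.nums) + pp.Literal(')').suppress()
tokens2 = pat2.parseString('(42)').asList()
print(tokens2)      # ['42']
```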
An instance of this class matches what parser p matches, but when the matching text is deposited in
the returned ParseResults instance, all lowercase characters are converted to uppercase.
An instance of this class will match multiple characters from a set of characters specified by the arguments.
initChars
If no bodyChars argument is given, this argument specifies all the characters that will be matched.
If a bodyChars string is supplied, initChars specifies valid initial characters, and characters
after the first that are in the bodyChars string will also match.
bodyChars
See initChars.
min
The minimum length to be matched.
max
The maximum length to be matched.
exact
If you supply this argument with a value of some number n, this parser will match exactly n characters.
asKeyword
By default, this parser will disregard the text following the matched part. If you specify
asKeyword=True, the match will fail if the next character after the matched part is one of the
matching characters (a character in initChars if there is no bodyChars argument, or a character
in bodyChars if that keyword argument is present).
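The sketch below (Python 3, illustrative names) shows the initChars/bodyChars pair and the exact argument in use:

```python
import pyparsing as pp

# First character from initChars, the remaining ones from bodyChars.
ident = pp.Word(pp.alphas + '_', pp.alphanums + '_')
ident_tokens = ident.parseString('x_1 = 3').asList()
print(ident_tokens)     # ['x_1']

# exact=5 requires five digits; four are not enough.
zip5 = pp.Word(pp.nums, exact=5)
zip_tokens = zip5.parseString('87801').asList()
print(zip_tokens)       # ['87801']
try:
    zip5.parseString('8780')
    short_ok = True
except pp.ParseException:
    short_ok = False
print(short_ok)         # False
```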
An instance of this class matches only when the previous character (if there is one) is a word character
and the next character is not a word character.
The optional wordChars argument specifies which characters are considered word characters; the default
value is the set of all printable, non-whitespace characters.
An instance of this class matches only when the current position is at the beginning of a word, and the
previous character (if there is one) is not a word character.
The optional wordChars argument specifies which characters are considered word characters; the default
value is the set of all printable, non-whitespace characters.
An instance of this class matches any number of text items, each of which matches parser p, even if
there are no matching items.
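A brief sketch in Python 3: the same parser succeeds whether or not any items are present, returning an empty result in the second case.

```python
import pyparsing as pp

numbers = pp.ZeroOrMore(pp.Word(pp.nums))

some = numbers.parseString('1 22 333').asList()
none = numbers.parseString('abc').asList()
print(some)     # ['1', '22', '333']
print(none)     # []
```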
6. Functions
These functions are available in the pyparsing module.
The loc argument to this function is the location (Python index, counted from 0) of some position in a
string s. The returned value is the column number of that position within its line, counting from 1.
Newlines ('\n') are treated as line separators.
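As a small illustration (Python 3; the sample string is arbitrary), this sketch converts the index of the letter 'e' into a column number, and also into a line number using the companion function pp.lineno():

```python
import pyparsing as pp

s = 'abc\ndef'
loc = s.index('e')      # Python index 5, on the second line

column = pp.col(loc, s)
line_number = pp.lineno(loc, s)
print(column)           # 2
print(line_number)      # 2
```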
This rather specialized function creates a parser that matches some count, followed by that many
occurrences of a pattern matching some parser, like "3 Moe Larry Curly". This function deposits a list
of the values into the returned ParseResults, omitting the count itself. Note that the integers in this
example are returned as type str, not type int.
If the count is for some reason not an integer in the usual form, you can provide an intExpr keyword
argument that specifies a parser that will match the count and return it as a Python int.
This function creates a parser for a sequence P D P D ... D P, where P matches some parser and
D is some delimiter, defaulting to ,.
By default, the result is a list of the P items with the delimiters (D items) omitted.
To include the delimiters and fuse the entire result into a single string, pass in the argument
combine=True.
The last example only matches one name because the Combine class suppresses the skipping of
whitespace within its internal pieces.
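A minimal sketch of both modes, assuming Python 3. The combined case uses input without embedded blanks, for the reason just given, and an explicit delim argument to show a non-default delimiter:

```python
import pyparsing as pp

# Default: comma-delimited, delimiters omitted from the result.
names = pp.delimitedList(pp.Word(pp.alphas))
plain = names.parseString('Moe, Larry, Curly').asList()
print(plain)    # ['Moe', 'Larry', 'Curly']

# combine=True keeps the delimiters and fuses everything into one string.
fused_names = pp.delimitedList(pp.Word(pp.alphas), delim=':', combine=True)
fused = fused_names.parseString('Moe:Larry:Curly').asList()
print(fused)    # ['Moe:Larry:Curly']
```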
This function builds a parser that matches a sequence of key text alternating with value text. When
matched, this parser will deposit a dictionary-like value into the returned ParseResults with those
keys and values.
The keyParser argument is a parser that matches the key text and the valueParser is a parser that
matches the value text.
Here is a very simple example to give you the idea. The text to be matched is a sequence of five-character
items, each of which is a one-letter color code followed by a four-character color name.
Here's a slightly more subtle example. The text has the form "degree, name; ...", where the degree
part is the degree of the musical scale as a number, and the name part is the name of that note. Here's
a first attempt.
>>> text = '1, do; 2, re; 3, mi; 4, fa; 5, sol; 6, la; 7, ti'
>>> key = pp.Word(pp.nums) + pp.Suppress(',')
>>> value = pp.Word(pp.alphas) + pp.Suppress(';')
>>> notePat = pp.dictOf(key, value)
>>> noteNames = notePat.parseString(text)
>>> noteNames.keys()
['1', '3', '2', '5', '4', '6']
>>> noteNames['4']
'fa'
>>> noteNames['7']
KeyError: '7'
Note that the last key-value pair is missing. This is because the value pattern requires a trailing
semicolon, and the text string does not end with one of those. Unless you were careful to check your work,
you might not notice that the last item is missing. This is one reason that it is good practice always to
use the parseAll=True option when calling .parseString(). Notice how that reveals the error:
It's easy enough to fix the definition of the text, but instead let's fix the parser so that it defines value
as ending either with a semicolon or with the end of the string:
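A minimal sketch of that fix, written for Python 3 and reusing the names from the first attempt; the only change to the parser is the alternative of pp.StringEnd() and the suppressed semicolon:

```python
import pyparsing as pp

text = '1, do; 2, re; 3, mi; 4, fa; 5, sol; 6, la; 7, ti'
key = pp.Word(pp.nums) + pp.Suppress(',')
# A value now ends with either the end of the string or a semicolon.
value = pp.Word(pp.alphas) + (pp.StringEnd() | pp.Suppress(';'))
notePat = pp.dictOf(key, value)

noteNames = notePat.parseString(text, parseAll=True)
print(sorted(noteNames.keys()))   # ['1', '2', '3', '4', '5', '6', '7']
print(noteNames['7'])             # ti
```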
Given a string text and a location loc (Python index) within that string, this function returns the line
containing that location, without a line terminator.
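For example, in this Python 3 sketch (the three-line sample text is arbitrary), location 5 falls inside the second line:

```python
import pyparsing as pp

text = 'abc\ndef\nghi'
current = pp.line(5, text)   # index 5 is the 'e' on the second line
print(current)               # def
```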
The loc argument to this function is the location (Python index, counted from 0) of some position in a
string s. The returned value is the line number of that position, counting from 1. Newlines ('\n') are
treated as line separators. For an example demonstrating this function, see Section 6.1, col(): Convert
a position to a column number (p. 35).
Use this function as a parse action to force a parser to match only at a specific column number within
the line, counting from 1.
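The sketch below (Python 3; the column number 5 and the sample inputs are arbitrary choices) attaches the parse action to a word so that it is accepted only when it starts in column 5:

```python
import pyparsing as pp

# The word must start in column 5 of its line.
word_at_5 = pp.Word(pp.alphas).setParseAction(pp.matchOnlyAtCol(5))
pat = pp.Word(pp.nums) + word_at_5

aligned = pat.parseString('123 abc').asList()
print(aligned)              # ['123', 'abc']
try:
    pat.parseString('12 abc')   # here 'abc' starts in column 4
    misaligned_ok = True
except pp.ParseException:
    misaligned_ok = False
print(misaligned_ok)        # False
```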
This function returns a new parser that matches the same pattern as the given parser, and that
additionally requires the matched text to be the same as the text most recently matched by parser.
The last example above failed because, even though the string "no" occurred both before and after the
hyphen, the name2 parser matched the entire string "now" before it tested to see if it matched the
previous occurrence "no". Compare the behavior of Section 6.11, matchPreviousLiteral(): Match
the literal text that the preceding expression matched (p. 40).
This function works like the one described in Section 6.10, matchPreviousExpr(): Match the text
that the preceding expression matched (p. 39), except that the returned parser matches the exact
characters that parser matched, without regard for any following context. Compare the example below
with the one in Section 6.10, matchPreviousExpr(): Match the text that the preceding expression
matched (p. 39).
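The difference between the two functions can be sketched side by side. This is a Python 3 illustration using digits rather than words; the helper accepts() is hypothetical and exists only for this example.

```python
import pyparsing as pp

def accepts(pat, text):
    """Return True if pat matches text, False on a ParseException."""
    try:
        pat.parseString(text)
        return True
    except pp.ParseException:
        return False

first = pp.Word(pp.nums)
same_expr = first + pp.Suppress(':') + pp.matchPreviousExpr(first)

head = pp.Word(pp.nums)
same_literal = head + pp.Suppress(':') + pp.matchPreviousLiteral(head)

print(accepts(same_expr, '1:1'))       # True
print(accepts(same_expr, '1:10'))      # False: '10' is matched, then compared with '1'
print(accepts(same_literal, '1:1'))    # True
print(accepts(same_literal, '1:10'))   # True: only the literal '1' is matched
```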
This function returns a parser that matches text that is structured as a nested list, that is, as a sequence
LCR where:
The opener argument L is some opening delimiter string, defaulting to (.
The closer argument R is some closing delimiter string, defaulting to ).
The content argument C is some content that can occur between these two delimiters. Anywhere
in this content, another level of the LCR sequence may occur any number of times. If you don't specify
a content argument, the corresponding value deposited into the returned ParseResults will be
a list of the strings at each level that consist of non-whitespace groups separated by whitespace.
If the content part may contain the L or R delimiter strings inside quote strings, you can specify an
ignoreExpr parser that describes what a quoted string looks like in your context, and the parsing
process will not treat those occurrences as delimiters. The default value I is an instance of Section 5.27,
QuotedString: Match a delimited string (p. 29). If you specify ignoreExpr=None, no occurrences
of the delimiter characters will be ignored.
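A minimal sketch in Python 3, first with the default parentheses and then with braces as the delimiters; each nesting level becomes a nested list in the result:

```python
import pyparsing as pp

# Default delimiters: ( and ).
parens = pp.nestedExpr()
tree = parens.parseString('(a (b c) d)').asList()
print(tree)         # [['a', ['b', 'c'], 'd']]

# Explicit opener and closer arguments.
braces = pp.nestedExpr('{', '}')
tree2 = braces.parseString('{x {y}}').asList()
print(tree2)        # [['x', ['y']]]
```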
This function returns a parser that matches one of a set of literals. In particular, if any literal is a substring
of another, this parser will always check for the longer one first; this behavior is useful, for example,
when you are parsing a set of keywords.
The alternatives argument specifies the different literals that the parser will match. This may be
either a list of strings, or one string with the alternatives separated by spaces.
By default, the match will be case-sensitive. To specify a case-insensitive match, pass the argument
caseless=True.
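The longest-first rule matters when one literal is a prefix of another, as in this Python 3 sketch with relational operators. The caseless result is lowercased before printing because the case of the returned token is not relied on here.

```python
import pyparsing as pp

# '<=' is tried before '<' even though '<' is listed first.
relop = pp.oneOf('< <= = >= >')
op = relop.parseString('<= 3').asList()
print(op)           # ['<=']

# A hand-written first-match alternative stops too early.
naive = pp.Literal('<') | pp.Literal('<=')
naive_op = naive.parseString('<= 3').asList()
print(naive_op)     # ['<']

# The alternatives may also be given as a list of strings.
answer = pp.oneOf(['yes', 'no'], caseless=True)
reply = answer.parseString('YES')[0].lower()
print(reply)        # yes
```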
Use this function to create a string that you can pass to pp.Word() to create a parser that will match
any one of a specified set of characters. The argument allows you to use ranges of character codes so
that you don't have to specify every single character. The syntax of the argument is similar to the [...]
construct of general-purpose regular expressions.
The ranges argument string consists of one or more occurrences of:
Single characters.
A backslash followed by a single character, so that your parser can match characters such as '-' (as
'\-') or ']' (as '\]').
A character specified by its hexadecimal character code as '\xHH'.
A character specified by its octal character code as '\0N...', where N can be one, two or three octal
digits.
Two of the above choices separated by '-', meaning all the characters with codes between those
values, including the endpoints. For example, pp.srange('[a-z]') will return a string containing
every lowercase letter.
Here's an example that demonstrates the use of this function in creating a parser for a Python identifier.
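One way to write such a parser, as a sketch in Python 3 with illustrative names, covering ASCII identifiers only:

```python
import pyparsing as pp

# srange() expands a range specification into a plain string.
print(pp.srange('[a-e]'))           # abcde

first = pp.srange('[a-zA-Z_]')      # valid first characters
rest = pp.srange('[a-zA-Z0-9_]')    # valid following characters
ident = pp.Word(first, rest)

found = ident.parseString('_spam42 = 1').asList()
print(found)                        # ['_spam42']
```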
If you attach this function to a parser as a parse action, when the parser matches some text, the value
that will be deposited in the ParseResults will be the literal string value.
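As an illustration (Python 3; the 'n/a' marker and the replacement '0' are made-up values), a placeholder in the input can be replaced by a fixed string:

```python
import pyparsing as pp

# Whenever 'n/a' is matched, deposit '0' in its place.
missing = pp.Literal('n/a').setParseAction(pp.replaceWith('0'))
reading = missing | pp.Word(pp.nums)

readings = pp.OneOrMore(reading).parseString('12 n/a 7').asList()
print(readings)     # ['12', '0', '7']
```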
>>> @pp.traceParseAction
... def basil(toks):
... '''Dummy parse action
... '''
... return None
...
>>> number = pp.Word(pp.nums).setParseAction(basil)
>>> print number.parseString('575')
>>entering wrapper(line: '575', 0, ['575'])
<<leaving wrapper (ret: None)
['575']
7. Variables
These variables are defined in the pyparsing module.
[21] http://www.nmt.edu/tcc/help/pubs/docbook43/iso9573/
>>> text = '// Look out of the yard? What will we see?'
>>> print pp.cppStyleComment.parseString(text)
['// Look out of the yard? What will we see?']
>>> print pp.cppStyleComment.parseString('/* Author: R. J. Gumby */')
['/* Author: R. J. Gumby */']
In the next example, pattern pat1 matches a word, followed by the end of a line, followed by pattern
initialWord. The first sample text matches because the position just after a '\n' is considered the
start of a line.
This example fails to match. The first part of pat2 matches the word 'Harpenden'; automatic blank
skipping moves to the beginning of the word 'southeast'; and then the pp.lineStart parser fails
because the space before 'southeast' is not a newline or the beginning of the string.
>>> pp.nums
'0123456789'
>>> len(pp.printables)
94
>>> print pp.printables
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-
./:;<=>?@[\]^_`{|}~
>>> pp.punc8bit
u'\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2
\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xd7\xf7'
Note that the quotes are returned as part of the result. If you don't like that, you can attach the parse
action described in Section 6.15, removeQuotes(): Strip leading trailing quotes (p. 42).
To match the rest of the line including the newline character at the end (if there is one), combine this
with the parser described in Section 7.12, lineEnd: An instance of LineEnd (p. 45).
If internal "\'" sequences were interpreted as escapes, the last line above would have displayed as:
"Don't"