Pyparsing Docs Readthedocs Io en Latest
Release 3.0.0b1
Paul T. McGuire
3 pyparsing
3.1 pyparsing module
Index
PyParsing Documentation, Release 3.0.0b1
Release v3.0.0b1
Contents:
CHAPTER 1
Contents
An excellent enhancement in this release is the new railroad diagram generator for documenting pyparsing parsers:
import pyparsing as pp
from pyparsing.diagram import to_railroad, railroad_to_html
from pathlib import Path
Cleaned up the default tracebacks shown when parseString raises a ParseException. Exception traces
should now stop at the call to parseString and no longer include the internal traceback frames. (If the full
traceback is desired, set ParserElement.verbose_traceback to True.)
Expanded __diag__ and __compat__ to actual classes instead of just namespaces, to add some helpful behavior:
• enable() and disable() methods to give extra help when setting or clearing flags (detects invalid flag
names, detects when trying to set a __compat__ flag that is no longer settable). Use these methods now to set
or clear flags, instead of directly setting to True or False:
import pyparsing as pp
pp.__diag__.enable("warn_multiple_tokens_in_named_alternation")
pp.__diag__.enable_all_warnings()
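Because Python's '<<' operator binds more tightly than '|', a Forward definition such as fwd << a | b | c
assigns only a to fwd. To correct this, use the '<<=' operator (preferred) or parentheses to override operator precedence.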
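A minimal sketch of the precedence problem and both corrections. The Literal expressions a, b, c and the Forward names are illustrative only; the code relies on Forward, Literal, and ParserElement.matches():

```python
import pyparsing as pp

a, b, c = pp.Literal("a"), pp.Literal("b"), pp.Literal("c")

# Pitfall: '<<' binds more tightly than '|', so this line is evaluated
# as (broken << a) | b | c. Only 'a' becomes the body of the Forward;
# the MatchFirst built from the returned value is discarded.
broken = pp.Forward()
broken << a | b | c

# Preferred fix: '<<=' is an augmented assignment, so the whole
# right-hand side (a | b | c) is built before it is assigned.
preferred = pp.Forward()
preferred <<= a | b | c

# Alternative fix: parentheses build the alternation first.
parenthesized = pp.Forward()
parenthesized << (a | b | c)

print(broken.matches("c"))         # False
print(preferred.matches("c"))      # True
print(parenthesized.matches("c"))  # True
```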
• Enhanced default strings created for Word expressions, now showing string ranges if possible. Word(alphas)
would formerly print as W:(ABCD...), now prints as W:(A-Za-z).
• Added ignoreWhitespace(recurse: bool = True), and added a recurse argument to leaveWhitespace, to provide
finer control over pyparsing’s whitespace skipping. Contributed by Michael Milton.
• Added ParserElement.recurse() method to make it simpler for grammar utilities to navigate through
the tree of expressions in a pyparsing grammar.
• Minor reformatting of output from runTests to make embedded comments more visible.
• New pyparsing_test namespace, assert methods and classes added to support writing unit tests.
– assertParseResultsEquals
– assertParseAndCheckList
– assertParseAndCheckDict
– assertRunTestResults
– assertRaisesParseException
– reset_pyparsing_context context manager, to restore pyparsing config settings
• Enhanced error messages and error locations when parsing fails on the Keyword or CaselessKeyword
classes due to the presence of a preceding or trailing keyword character.
• Enhanced the Regex class to be compatible with expressions compiled using the re-equivalent regex module.
Individual expressions can be built with regex-compiled expressions using:
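import pyparsing as pp
import regex
# illustrative: build a pyparsing Regex from a pattern compiled by the regex module
integer_parser = pp.Regex(regex.compile(r'\d+'))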
• countedArray formerly returned its list of items nested within another list, so that accessing the items
required indexing the 0’th element to get the actual list. This extra nesting has been removed. In addition, if there
are other metadata fields parsed between the count and the list items, they can be preserved in the resulting list
if given results names.
• ParseException.explain() is now an instance method of ParseException:
import pyparsing as pp

expr = pp.Word(pp.nums) * 3
try:
    expr.parseString("123 456 A789")
except pp.ParseException as pe:
    print(pe.explain(depth=0))
prints:
Removed Py2.x support and other deprecated features. Pyparsing now requires Python 3.5 or later. If you are using
an earlier version of Python, you must use a Pyparsing 2.4.x version.
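The flattened countedArray result described in the notes above can be checked with a small sketch; the word_list grammar and the sample strings are illustrative, not taken from the release notes:

```python
import pyparsing as pp

# A leading integer says how many words follow it.
word_list = pp.countedArray(pp.Word(pp.alphas))

result = word_list.parseString("3 red green blue")
print(result.asList())   # ['red', 'green', 'blue'] (no extra nesting)

# Only as many items as the count specifies are consumed.
print(word_list.parseString("2 red green blue").asList())   # ['red', 'green']
```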
• Fixed bug in regex definitions for real and sci_real expressions in pyparsing_common.
• Fixed FutureWarning raised beginning in Python 3.7 for Regex expressions containing ‘[‘ within a regex
set.
• Fixed bug in PrecededBy which caused infinite recursion.
• Fixed bug in CloseMatch where end location was incorrectly computed; and updated
partial_gene_match.py example.
• Fixed bug in indentedBlock with a parser using two different types of nested indented blocks with different
indent values, but sharing the same indent stack.
• Fixed bug in Each when using Regex, where the Regex expression would get parsed twice.
• Fixed FutureWarning that was sometimes raised when '[' was passed as a character to Word.
And finally, many thanks to those who helped in the restructuring of the pyparsing code base as part of this release.
Pyparsing now has more standard package structure, more standard unit tests, and more standard code formatting
(using black). Special thanks to jdufresne, klahnakoski, mattcarmody, ckeygusuz, tmiguelt, and toonarmycaptain,
to name just a few.
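As an illustration of the pyparsing_test assert methods listed in the release notes above, here is a minimal unittest sketch. The test class, grammar, and inputs are hypothetical, and it assumes the pyparsing_test.TestParseResultsAsserts mixin is combined with unittest.TestCase:

```python
import unittest

import pyparsing as pp

ppt = pp.pyparsing_test


class IntegerListTest(ppt.TestParseResultsAsserts, unittest.TestCase):
    # hypothetical grammar under test: one or more integers
    integers = pp.Word(pp.nums)[1, ...]

    def test_parses_list(self):
        # parses the whole string and compares the result to the expected list
        self.assertParseAndCheckList(self.integers, "1 2 3", ["1", "2", "3"])

    def test_rejects_letters(self):
        # passes only if a ParseException is raised inside the block
        with self.assertRaisesParseException():
            self.integers.parseString("abc")
```

Run it with python -m unittest, or load it with unittest.TestLoader as in any other test module.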
Note: While this content is still valid, there are more detailed descriptions and examples at the online doc server at
https://pyparsing-docs.readthedocs.io/en/latest/pyparsing.html
To parse an incoming data string, the client code must follow these steps:
1. First define the tokens and patterns to be matched, and assign this to a program variable. Optional results names
or parsing actions can also be defined at this time.
2. Call parseString() or scanString() on this variable, passing in the string to be parsed. During the
matching process, whitespace between tokens is skipped by default (although this can be changed). When token
matches occur, any defined parse action methods are called.
3. Process the parsed results, returned as a list of strings. Matching results may also be accessed as named attributes
of the returned results, if names are defined in the definition of the token pattern, using setResultsName().
The following complete Python program will parse the greeting “Hello, World!”, or any other greeting of the form
“<salutation>, <addressee>!”:
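A minimal version of such a program might look like the following sketch (it may differ in detail from the listing in the original documentation):

```python
from pyparsing import Word, alphas

# <salutation>, <addressee>!
# the "," and "!" strings are converted to Literal expressions automatically
greet = Word(alphas) + "," + Word(alphas) + "!"

hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))
```

Running it prints Hello, World! -> ['Hello', ',', 'World', '!'].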
• The pyparsing module can be used to interpret simple command strings or algebraic expressions, or can be used
to extract data from text reports with complicated format and structure (“screen or report scraping”). However,
it is possible that your defined matching patterns may accept invalid inputs. Use pyparsing to extract data from
strings assumed to be well-formatted.
• To keep up the readability of your code, use operators such as +, |, ^, and ~ to combine expressions. You can
also combine string literals with ParseExpressions - they will be automatically converted to Literal objects. For
example:
In the definition of equation, the string "=" will get added as a Literal("="), but in a more readable
way.
• The pyparsing module’s default behavior is to ignore whitespace. This is the case for 99% of all parsers ever
written. This allows you to write simple, clean, grammars, such as the above equation, without having to
clutter it up with extraneous ws markers. The equation grammar will successfully parse all of the following
statements:
x=2+2
x = 2+2
a = 10 * 4
r= 1234/ 100000
Of course, it is quite simple to extend this example to support more elaborate expressions, with nesting using
parentheses, floating point numbers, scientific notation, and named constants (such as e or pi). See fourFn.py,
included in the examples directory.
• To modify pyparsing’s default whitespace skipping, you can use one or more of the following methods:
– use the static method ParserElement.setDefaultWhitespaceChars to override the normal
set of whitespace chars (' \t\n'). For instance, when defining a grammar in which newlines are significant,
you should call ParserElement.setDefaultWhitespaceChars(' \t') to remove newline
from the set of skippable whitespace characters. Calling this method will affect all pyparsing expressions
defined afterward.
– call leaveWhitespace() on individual expressions, to suppress the skipping of whitespace before
trying to match the expression
– use Combine to require that successive expressions must be adjacent in the input string. For instance, this
expression:
will match “3.14159”, but will also match “3 . 12”. It will also return the matched results as [‘3’, ‘.’,
‘14159’]. By changing this expression to:
it will not match numbers with embedded spaces, and it will return a single concatenated string ‘3.14159’
as the parsed token.
• Repetition of expressions can be indicated using * or [] notation. An expression may be multiplied by an integer
value (to indicate an exact repetition count), or indexed with a tuple, representing min and max repetitions (with
... representing no min or no max, depending whether it is the first or second tuple element). See the following
examples, where n is used to indicate an integer value:
– expr*3 is equivalent to expr + expr + expr
– expr[2, 3] is equivalent to expr + expr + Optional(expr)
– expr[n, ...] or expr[n,] is equivalent to expr*n + ZeroOrMore(expr) (read as “at least
n instances of expr”)
– expr[..., n] is equivalent to expr*(0, n) (read as “0 to n instances of expr”)
– expr[...] and expr[0, ...] are equivalent to ZeroOrMore(expr)
– expr[1, ...] is equivalent to OneOrMore(expr)
Note that expr[..., n] does not raise an exception if more than n exprs exist in the input stream; that is,
expr[..., n] does not enforce a maximum number of expr occurrences. If this behavior is desired, then
write expr[..., n] + ~expr.
• MatchFirst expressions are matched left-to-right, and the first match found will skip all later expressions
within, so be sure to define less-specific patterns after more-specific patterns. If you are not sure which
expressions are most specific, use Or expressions (defined using the ^ operator) - they will always match the
longest expression, although they are more compute-intensive.
• Or expressions will evaluate all of the specified subexpressions to determine which is the “best” match, that is,
which matches the longest string in the input data. In case of a tie, the left-most expression in the Or list will
win.
• If parsing the contents of an entire file, pass it to the parseFile method using:
expr.parseFile(sourceFile)
• ParseExceptions will report the location where an expected token or expression failed to match. For
example, if we tried to use our “Hello, World!” parser to parse “Hello World!” (leaving out the separating
comma), we would get an exception, with the message:
In the case of complex expressions, the reported location may not be exactly where you would expect. See more
information under ParseException.
• Use the Group class to enclose logical groups of tokens within a sublist. This will help organize your results
into more hierarchical form (the default behavior is to return matching tokens as a flat list of matching input
strings).
• Punctuation may be significant for matching, but is rarely of much interest in the parsed results. Use the
suppress() method to keep these tokens from cluttering up your returned lists of tokens. For example,
delimitedList() matches a succession of one or more expressions, separated by delimiters (commas by
default), but only returns a list of the actual expressions - the delimiters are used for parsing, but are suppressed
from the returned output.
• Parse actions can be used to convert values from strings to other data types (ints, floats, booleans, etc.).
• Results names are recommended for retrieving tokens from complex expressions. It is much easier to access a
token using its field name than using a positional index, especially if the expression contains optional elements.
You can also shortcut the setResultsName call:
• Be careful when defining parse actions that modify global variables or data structures (as in fourFn.py),
especially for low level tokens or expressions that may occur within an And expression; an early element of an
And may match, but the overall expression may fail.
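Several of the hints above can be seen together in one sketch. The integer, variable, and arith_op definitions for the equation grammar are assumptions (the original definitions are not shown here), as are the sample inputs:

```python
import pyparsing as pp

# Assumed building blocks for the "equation" grammar.
integer = pp.Word(pp.nums)
variable = pp.Word(pp.alphas, max=1)
arith_op = pp.Word("+-*/", max=1)
# "=" is converted to Literal("=") automatically.
equation = variable + "=" + integer + arith_op + integer

# Whitespace between tokens is skipped, so all of these parse.
for stmt in ["x=2+2", "x = 2+2", "a = 10 * 4", "r= 1234/ 100000"]:
    print(stmt, "->", equation.parseString(stmt, parseAll=True).asList())

# Without Combine, embedded spaces are accepted and the pieces stay separate.
loose_real = pp.Word(pp.nums) + "." + pp.Word(pp.nums)
print(loose_real.parseString("3 . 14159").asList())   # ['3', '.', '14159']

# With Combine, the pieces must be adjacent and are joined into one token.
real = pp.Combine(pp.Word(pp.nums) + "." + pp.Word(pp.nums))
print(real.parseString("3.14159").asList())   # ['3.14159']
print(real.matches("3 . 14159"))              # False

# expr[..., n] stops after n matches but does not reject extra input;
# adding ~expr enforces the maximum.
at_most_two = integer[..., 2]
strict_two = integer[..., 2] + ~integer
print(at_most_two.parseString("1 2 3").asList())     # ['1', '2']
print(strict_two.matches("1 2 3", parseAll=False))   # False

# MatchFirst ('|') takes the first alternative that matches;
# Or ('^') takes the longest match.
word = pp.Word(pp.alphas)
print((pp.Literal("in") | word).parseString("inside").asList())   # ['in']
print((pp.Literal("in") ^ word).parseString("inside").asList())   # ['inside']
```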
ParserElement - abstract base class for all pyparsing classes; methods for code to use are:
• parseString(sourceString, parseAll=False) - only called once, on the overall matching pattern; returns a
ParseResults object that makes the matched tokens available as a list, and optionally as a dictionary, or as an
object with named attributes; if parseAll is set to True, then parseString will raise a ParseException
if the grammar does not process the complete input string.
• parseFile(sourceFile) - a convenience function, that accepts an input file object or filename. The file
contents are passed as a string to parseString(). parseFile also supports the parseAll argument.
• scanString(sourceString) - generator function, used to find and extract matching text in the given
source string; for each matched text, returns a tuple of:
– matched tokens (packaged as a ParseResults object)
– start location of the matched text in the given source string
– end location in the given source string
scanString allows you to scan through the input source string for random matches, instead of exhaustively
defining the grammar for the entire source text (as would be required with parseString).
• transformString(sourceString) - convenience wrapper function for scanString, to process the
input source string, and replace matching text with the tokens returned from parse actions defined in the grammar
(see setParseAction).
• searchString(sourceString) - another convenience wrapper function for scanString, returns a list
of the matching tokens returned from each call to scanString.
• setName(name) - associate a short descriptive name for this element, useful in displaying exceptions and
trace information
• runTests(testsString) - useful development and testing method on expressions, to pass a multiline
string of sample strings to test against the expression. Comment lines (beginning with #) can be inserted and
they will be included in the test output:
digits = Word(nums).setName("numeric digits")
real_num = Combine(digits + '.' + digits)
real_num.runTests("""
If fn modifies the toks list in-place, it does not need to return a value, and pyparsing will use the modified toks list.
• addParseAction - similar to setParseAction, but instead of replacing any previously defined parse
actions, will append the given action or actions to the existing defined parse actions.
• setBreak(breakFlag=True) - if breakFlag is True, calls pdb.set_trace() as this expression is about to be
parsed
• copy() - returns a copy of a ParserElement; can be used to use the same parse expression in different places
in a grammar, with different parse actions attached to each
• leaveWhitespace() - change default behavior of skipping whitespace before starting matching (mostly
used internally to the pyparsing module, rarely used by client code)
• setWhitespaceChars(chars) - define the set of chars to be ignored as whitespace before trying to match
a specific ParserElement, in place of the default set of whitespace (space, tab, newline, and return)
• setDefaultWhitespaceChars(chars) - class-level method to override the default set of whitespace
chars for all subsequently created ParserElements (including copies); useful when defining grammars that treat
one or more of the default whitespace characters as significant (such as a line-sensitive grammar, to omit newline
from the list of ignorable whitespace)
• suppress() - convenience function to suppress the output of the given element, instead of wrapping it with
a Suppress object.
• ignore(expr) - function to specify parse expression to be ignored while matching defined patterns; can be
called repeatedly to specify multiple expressions; useful to specify patterns of comment syntax, for example
• setDebug(dbgFlag=True) - function to enable/disable tracing output when trying to match this element
• validate() - function to verify that the defined grammar does not contain infinitely recursive constructs
• parseWithTabs() - function to override default behavior of converting tabs to spaces before parsing the
input string; rarely used, except when specifying whitespace-significant grammars using the White class.
• enablePackrat() - a class-level static method to enable a memoizing performance enhancement, known
as “packrat parsing”. Packrat parsing is disabled by default, since it may conflict with some user programs
that use parse actions. To activate the packrat feature, your program must call the class method
ParserElement.enablePackrat(). For best results, call enablePackrat() immediately after importing pyparsing.
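A short sketch of several of these ParserElement methods (scanString, searchString, transformString, setParseAction, and runTests) on an illustrative integer expression; the sample text and the doubling parse action are hypothetical:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
text = "order 12 shipped in 3 boxes"

# scanString yields (tokens, start, end) for each match found in the text
for tokens, start, end in integer.scanString(text):
    print(tokens[0], start, end)   # 12 6 8, then 3 20 21

# searchString returns just the tokens from each match
print(integer.searchString(text).asList())   # [['12'], ['3']]

# transformString replaces each match with the output of the parse action
doubled = integer.copy().setParseAction(lambda toks: str(int(toks[0]) * 2))
print(doubled.transformString(text))   # order 24 shipped in 6 boxes

# runTests prints each test string with its result or error, and returns
# (success, results); success is False here because "101." fails
real_num = pp.Combine(pp.Word(pp.nums) + "." + pp.Word(pp.nums))
success, results = real_num.runTests("""
    # valid number
    3.14159
    # no digits after the decimal point
    101.
    """)
print(success)   # False
```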
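• Word - one or more contiguous characters; construct with a string containing the set of allowed initial
characters, and an optional second string of allowed body characters; for instance, a typical identifier can be
defined using either of:
- Word(alphas+"_", alphanums+"_")
- Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))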
If only one string is given, it specifies that the same character set defined for the initial character is used for the
word body; for instance, to define an identifier that can only be composed of capital letters and underscores, use:
- Word("ABCDEFGHIJKLMNOPQRSTUVWXYZ_")
- Word(srange("[A-Z_]"))
A Word may also be constructed with any of the following optional parameters:
– min - indicating a minimum length of matching characters
– max - indicating a maximum length of matching characters
– exact - indicating an exact length of matching characters
If exact is specified, it will override any values for min or max.
Sometimes you want to define a word using all characters in a range except for one or two of them; you can do
this with the new excludeChars argument. This is helpful if you want to define a word with all printables
except for a single delimiter character, such as ‘.’. Previously, you would have to create a custom string to pass
to Word. With this change, you can just create Word(printables, excludeChars='.').
• Char - a convenience form of Word that will match just a single character from a string of matching characters
single_digit = Char(nums)
• CharsNotIn - similar to Word, but matches characters not in the given constructor string (accepts only one
string for both initial and body characters); also supports min, max, and exact optional parameters.
• Regex - a powerful construct, that accepts a regular expression to be matched at the current parse position;
accepts an optional flags parameter, corresponding to the flags parameter in the re.compile method; if the
expression includes named sub-fields, they will be represented in the returned ParseResults
• QuotedString - supports the definition of custom quoted string formats, in addition to pyparsing’s built-
in dblQuotedString and sglQuotedString. QuotedString allows you to specify the following
parameters:
– quoteChar - string of one or more characters defining the quote delimiting string
– escChar - character to escape quotes, typically backslash (default=None)
– escQuote - special quote sequence to escape an embedded quote string (such as SQL’s “” to escape an
embedded “) (default=None)
– multiline - boolean indicating whether quotes can span multiple lines (default=False)
– unquoteResults - boolean indicating whether the matched text should be unquoted (default=True)
– endQuoteChar - string of one or more characters defining the end of the quote delimited string
(default=None => same as quoteChar)
• SkipTo - skips ahead in the input string, accepting any characters up to the specified pattern; may be
constructed with the following optional parameters:
– include - if set to True, also consumes the match expression (default is False)
– ignore - allows the user to specify patterns to not be matched, to prevent false matches
– failOn - if a literal string or expression is given for this argument, it defines an expression that should
cause the SkipTo expression to fail, and not skip over that expression
SkipTo can also be written using ...:
LBRACE, RBRACE = map(Literal, "{}")
brace_expr = LBRACE + SkipTo(RBRACE) + RBRACE
# can also be written as
brace_expr = LBRACE + ... + RBRACE
• White - also similar to Word, but matches whitespace characters. Not usually needed, as whitespace is
implicitly ignored by pyparsing. However, some grammars are whitespace-sensitive, such as those that use leading
tabs or spaces to indicate grouping or hierarchy. (If matching on tab characters, be sure to call parseWithTabs
on the top-level parse element.)
• Empty - a null expression, requiring no characters - will always match; useful for debugging and for specialized
grammars
• NoMatch - opposite of Empty, will never match; useful for debugging and for specialized grammars
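The following sketch exercises several of the basic classes above (Word, Char, QuotedString, SkipTo); the expressions and inputs are illustrative only:

```python
import pyparsing as pp

# Word with separate initial and body character sets
identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
print(identifier.parseString("_count2 = 0").asList())   # ['_count2']

# all printable characters except '.'
segment = pp.Word(pp.printables, excludeChars=".")
print(segment.parseString("www.example.com").asList())   # ['www']

# Char matches exactly one character from the given set
single_digit = pp.Char(pp.nums)
print(single_digit[1, ...].parseString("123").asList())   # ['1', '2', '3']

# QuotedString with an SQL-style doubled quote as the escape
sql_string = pp.QuotedString("'", escQuote="''")
print(sql_string.parseString("'O''Brien'").asList())   # ["O'Brien"]

# SkipTo, and the equivalent '...' form
LBRACE, RBRACE = map(pp.Suppress, "{}")
brace_expr = LBRACE + pp.SkipTo(RBRACE) + RBRACE
brace_expr_dots = LBRACE + ... + RBRACE
print(brace_expr.parseString("{any text}").asList())        # ['any text']
print(brace_expr_dots.parseString("{any text}").asList())   # ['any text']
```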
• And - construct with a list of ParserElements, all of which must match for And to match; can also be created
using the ‘+’ operator; multiple expressions can be Anded together using the ‘*’ operator as in:
A special form of And is created if the ‘-‘ operator is used instead of the ‘+’ operator. In the ipAddress example
above, if no trailing ‘.’ and Word(nums) are found after matching the initial Word(nums), then pyparsing will
back up in the grammar and try other alternatives to ipAddress. However, if ipAddress is defined as:
then no backing up is done. If the first Word(nums) of strictIpAddress is matched, then any mismatch after that
will raise a ParseSyntaxException, which will halt the parsing process immediately. By careful use of the ‘-‘
operator, grammars can provide meaningful error messages close to the location where the incoming text does
not match the specified grammar.
• Or - construct with a list of ParserElements, any of which must match for Or to match; if more than one
expression matches, the expression that makes the longest match will be used; can also be created using the ‘^’
operator
• MatchFirst - construct with a list of ParserElements, any of which must match for MatchFirst to match;
matching is done left-to-right, taking the first expression that matches; can also be created using the ‘|’ operator
• Each - similar to And, in that all of the provided expressions must match; however, Each permits matching to
be done in any order; can also be created using the ‘&’ operator
• Optional - construct with a ParserElement, but this element is not required to match; can be constructed with
an optional default argument, containing a default string or object to be supplied if the given optional parse
element is not found in the input string; parse action will only be called if a match is found, or if a default is
specified
• ZeroOrMore - similar to Optional, but can be repeated; ZeroOrMore(expr) can also be written as
expr[...].
• OneOrMore - similar to ZeroOrMore, but at least one match must be present; OneOrMore(expr) can also
be written as expr[1, ...].
• FollowedBy - a lookahead expression, requires matching of the given expressions, but does not advance the
parsing position within the input string
• NotAny - a negative lookahead expression, prevents matching of named expressions, does not advance the
parsing position within the input string; can also be created using the unary ‘~’ operator
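The ipAddress comparison described under And can be sketched as follows. The ip_address and strict_ip_address definitions are assumptions reconstructed from the description (an octet followed by three '.'-separated octets), and port_or_ip is a hypothetical enclosing alternative used to show backtracking. The sketch also shows Each:

```python
import pyparsing as pp

octet = pp.Word(pp.nums)
# '+': after a mismatch, enclosing alternatives may still be tried
ip_address = octet + ("." + octet) * 3
# '-': once the first octet matches, any later mismatch is fatal
strict_ip_address = octet - ("." + octet) * 3

# hypothetical alternative that falls back to a bare number
port_or_ip = ip_address | octet
print(port_or_ip.parseString("8080").asList())   # ['8080']

strict_port_or_ip = strict_ip_address | octet
try:
    strict_port_or_ip.parseString("8080")
    strict_failed = False
except pp.ParseSyntaxException as pse:
    # raised immediately; the 'octet' alternative is never tried
    strict_failed = True
    print("ParseSyntaxException:", pse)

# Each ('&'): all expressions must match, in any order
size = pp.Keyword("size") + "=" + pp.Word(pp.nums)
color = pp.Keyword("color") + "=" + pp.Word(pp.alphas)
attrs = size & color
print(attrs.matches("color=red size=10"))   # True
print(attrs.matches("size=10 color=red"))   # True
print(attrs.matches("size=10"))             # False
```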
• ^ - creates Or (longest match) using the expressions before and after the operator
• & - creates Each using the expressions before and after the operator
• * - creates And by multiplying the expression by the integer operand; if expression is multiplied by a 2-tuple,
creates an And of (min,max) expressions (similar to “{min,max}” form in regular expressions); if min is None,
interpret as (0,max); if max is None, interpret as expr*min + ZeroOrMore(expr)
• - - like + but with no backup and retry of alternatives
• * - repetition of expression
• == - matching expression to string; returns True if the string matches the given expression
• <<= - inserts the expression following the operator as the body of the Forward expression before the operator
(<< can also be used, but <<= is preferred to avoid operator precedence misinterpretation of the pyparsing
expression)
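A few of these operators in action, as a sketch with illustrative expressions:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

# '==' tests whether the entire string matches the expression
print(integer == "123")   # True
print(integer == "12a")   # False

# '*' with a (min, max) tuple: two or three integers
two_or_three = integer * (2, 3)
print(two_or_three.parseString("1 2 3 4").asList())   # ['1', '2', '3']
print(two_or_three.matches("1"))                      # False

# '~' builds a NotAny negative lookahead
not_end = ~pp.Keyword("end") + pp.Word(pp.alphas)
print(not_end.matches("begin"))     # True
print(not_end.matches("end"))       # False
print(not_end.matches("endless"))   # True: Keyword("end") does not match "endless"
```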
• Combine - joins all matched tokens into a single string, using specified joinString (default joinString="");
expects all matching tokens to be adjacent, with no intervening whitespace (can be overridden by specifying
adjacent=False in constructor)
• Suppress - clears matched tokens; useful to keep returned results from being cluttered with required but
uninteresting tokens (such as list delimiters)
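A minimal sketch of Suppress and of Combine with a joinString and adjacent=False; the grammars and inputs are hypothetical:

```python
import pyparsing as pp

# Suppress keeps required punctuation out of the results
LPAR, RPAR = map(pp.Suppress, "()")
call = pp.Word(pp.alphas) + LPAR + pp.Word(pp.nums) + RPAR
print(call.parseString("sqrt(16)").asList())   # ['sqrt', '16']

# the suppress() method is equivalent to wrapping in Suppress
pair = pp.Word(pp.nums) + pp.Literal(",").suppress() + pp.Word(pp.nums)
print(pair.parseString("3, 4").asList())   # ['3', '4']

# Combine with a joinString; adjacent=False allows whitespace between the parts
full_name = pp.Combine(pp.Word(pp.alphas) + pp.Word(pp.alphas), joinString=" ", adjacent=False)
print(full_name.parseString("Ada   Lovelace").asList())   # ['Ada Lovelace']
```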
• Group - causes the matched tokens to be enclosed in a list; useful in repeated elements like ZeroOrMore and
OneOrMore to break up matched tokens into groups for each repeated pattern
• Dict - like Group, but also constructs a dictionary, using the [0]’th elements of all enclosed token lists as the
keys, and each token list as the value
• SkipTo - catch-all matching expression that accepts all characters up until the given pattern is found to match;
useful for specifying incomplete grammars
• Forward - placeholder token used to define recursive token patterns; when defining the actual expression later
in the program, insert it into the Forward object using the <<= operator (preferred) or the << operator (see fourFn.py for an example).
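A sketch of Forward, Group, and Dict working together; the nested-list grammar and the key/value table format are illustrative assumptions:

```python
import pyparsing as pp

# Forward + Group: a recursive grammar for nested parenthesized lists
LPAR, RPAR = map(pp.Suppress, "()")
item = pp.Forward()
nested = pp.Group(LPAR + item[...] + RPAR)
item <<= pp.Word(pp.alphanums) | nested
print(nested.parseString("(a (b c) d)").asList())   # [['a', ['b', 'c'], 'd']]

# Dict: the first token of each group becomes the key for that group
row = pp.Group(pp.Word(pp.alphas) + pp.Word(pp.nums))
table = pp.Dict(row[1, ...])
result = table.parseString("width 10 height 20")
print(result.asList())                     # [['width', '10'], ['height', '20']]
print(result["width"], result["height"])   # 10 20
```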
• ParseResults - class used to contain and manage the lists of tokens created from parsing the input using the
user-defined parse expression. ParseResults can be accessed in a number of ways:
– as a list
* if setResultsName() is used to name elements within the overall parse expression, then these
fields can be referenced as dictionary elements or as attributes
* the Dict class generates dictionary entries using the data of the input text - in addition to ParseResults
listed as [ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ] it also acts as a dictionary with entries defined as
{ a1 : [ b1, c1, ... ] }, { a2 : [ b2, c2, ... ] }; this is especially useful when processing tabular data where
the first column contains a key value for that line of data
* list elements that are deleted using del will still be accessible by their dictionary keys
* supports get(), items() and keys() methods, similar to a dictionary
* a keyed item can be extracted and removed using pop(key). Here key must be non-numeric (such
as a string), in order to use dict extraction instead of list extraction.
* new named elements can be added (in a parse action, for instance), using the same syntax as adding an
item to a dict (parseResults["X"] = "new item"); named elements can be removed using
del parseResults["X"]
– as a nested list
* results returned from the Group class are encapsulated within their own list structure, so that the
tokens can be handled as a hierarchical tree
ParseResults can also be converted to an ordinary list of strings by calling asList(). Note that this will strip
the results of any field names that have been defined for any embedded parse elements. (The pprint module
is especially good at printing out the nested contents given by asList().)
Finally, ParseResults can be viewed by calling dump(). dump() will first show the asList() output, followed
by an indented structure listing parsed tokens that have been assigned results names.
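A sketch of the ParseResults access styles above, using a hypothetical date grammar. It also shows the expr("name") shortcut for setResultsName():

```python
import pyparsing as pp

integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
# expr("name") is a shortcut for expr.setResultsName("name")
date = integer("year") + "/" + integer("month") + "/" + integer("day")

result = date.parseString("2021/07/04")
print(result.asList())                # [2021, '/', 7, '/', 4]
print(result["year"], result.month)   # 2021 7
print(result.get("hour", 0))          # 0 (no 'hour' name, so the default is returned)

day = result.pop("day")               # extracts and removes the 'day' name
print(day, "day" in result)           # 4 False
print(result.dump())
```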
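• ParseException - exception raised when a grammar parse fails; ParseExceptions have attributes loc, msg,
line, lineno, and column; to view the text line and location where the reported ParseException occurs, print
the exception's line attribute, followed by a caret ("^") indented to the reported column.
• validate() can detect infinitely recursive grammars, such as this left-recursive definition, in which
badGrammar can call itself again without consuming any input:
badGrammar = Forward()
goodToken = Literal("A")
badGrammar <<= Optional(goodToken) + badGrammar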
• ParseFatalException - exception that parse actions can raise to stop parsing immediately. Should be
used when a semantic error is found in the input text, such as a mismatched XML tag.
• ParseSyntaxException - subclass of ParseFatalException raised when a syntax error is found,
based on the use of the ‘-‘ operator when defining a sequence of expressions in an And expression.
You can also get some insights into the parsing logic using diagnostic parse actions, and setDebug(), or test the
matching of expression fragments by testing them using scanString().
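To display the failing line and location of a ParseException, a sketch along these lines can be used. The greeting grammar mirrors the "Hello, World!" example, and caught is a hypothetical name used to keep the exception after the except block:

```python
import pyparsing as pp

greet = pp.Word(pp.alphas) + "," + pp.Word(pp.alphas) + "!"

caught = None
try:
    greet.parseString("Hello World!")   # missing comma
except pp.ParseException as err:
    caught = err                         # keep a reference after the except block
    print(err.line)                      # the full line of input text
    print(" " * (err.column - 1) + "^")  # caret under the failure position
    print(err)                           # message plus location details

print(caught.loc, caught.lineno, caught.column)   # 6 1 7
```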
– opExpr - the pyparsing expression for the operator; may also be a string, which will be converted to
a Literal; if None, indicates an empty operator, such as the implied multiplication operation between
‘m’ and ‘x’ in “y = mx + b”.
– numTerms - the number of terms for this operator (must be 1, 2, or 3)
– rightLeftAssoc - indicates whether the operator is right or left associative, using the
pyparsing-defined constants opAssoc.RIGHT and opAssoc.LEFT.
– parseAction - the parse action to be associated with expressions matching this operator expression
(the parseAction tuple member may be omitted)
3. Call infixNotation passing the operand expression and the operator precedence list, and save the
returned value as the generated pyparsing expression. You can then use this expression to parse input
strings, or incorporate it into a larger, more complex grammar.
• matchPreviousLiteral and matchPreviousExpr - functions to define an expression that matches
the same content as was parsed in a previous parse expression. For instance:
first = Word(nums)
matchExpr = first + ":" + matchPreviousLiteral(first)
will match “1:1”, but not “1:2”. Since this matches at the literal level, this will also match the leading “1:1” in
“1:10”.
In contrast:
first = Word(nums)
matchExpr = first + ":" + matchPreviousExpr(first)
will not match the leading “1:1” in “1:10”; the expressions are evaluated first, and then compared, so “1” is
compared with “10”.
• nestedExpr(opener, closer, content=None, ignoreExpr=quotedString) - method for
defining nested lists enclosed in opening and closing delimiters.
– opener - opening character for a nested list (default="("); can also be a pyparsing expression
– closer - closing character for a nested list (default=")"); can also be a pyparsing expression
– content - expression for items within the nested lists (default=None)
– ignoreExpr - expression for ignoring opening and closing delimiters (default=quotedString)
If an expression is not provided for the content argument, the nested expression will capture all
whitespace-delimited content between delimiters as a list of separate values.
Use the ignoreExpr argument to define expressions that may contain opening or closing characters that should
not be treated as opening or closing characters for nesting, such as quotedString or a comment expression.
Specify multiple expressions using an Or or MatchFirst. The default is quotedString, but if no expressions are
to be ignored, then pass None for this argument.
• indentedBlock(statementExpr, indentationStackVar, indent=True) - function to define an indented block
of statements, similar to indentation-based blocking in Python source code:
– statementExpr - the expression defining a statement that will be found in the indented block; a valid
indentedBlock must contain at least 1 matching statementExpr
– indentationStackVar - a Python list variable; this variable should be common to all
indentedBlock expressions defined within the same grammar, and should be reinitialized to [1] each
time the grammar is to be used
– indent - a boolean flag indicating whether the expressions within the block must be indented from the
current parse location; if using indentedBlock to define the left-most statements (all starting in column
1), set indent to False
• originalTextFor(expr) - helper function to preserve the originally parsed text, regardless of any token
processing or conversion done by the contained expression. For instance, the following expression:
will return the parse of “John Smith” as [‘John’, ‘Smith’]. In some applications, the actual name as it was given
in the input string is what is desired. To do this, use originalTextFor:
• ungroup(expr) - function to “ungroup” returned tokens; useful to undo the default behavior of And to
always group the returned tokens, even if there is only one in the list. (New in 1.5.6)
• lineno(loc, string) - function to give the line number of the location within the string; the first line is
line 1, newlines start new rows
• col(loc, string) - function to give the column number of the location within the string; the first column
is column 1, newlines reset the column number to 1
• line(loc, string) - function to retrieve the line of text representing lineno(loc, string); useful
when printing out diagnostic messages for exceptions
• srange(rangeSpec) - function to define a string of characters, given a string of the form used by regexp
string ranges, such as "[0-9]" for all numeric digits, "[A-Z_]" for uppercase characters plus underscore,
and so on (note that rangeSpec does not include support for generic regular expressions, just string range specs)
• getTokensEndLoc() - function to call from within a parse action to get the ending location for the matched
tokens
• traceParseAction(fn) - decorator function to debug parse actions. Lists each call, called arguments, and
return value or exception
• removeQuotes - removes the first and last characters of a quoted string; useful to remove the delimiting
quotes from quoted strings
• replaceWith(replString) - returns a parse action that simply returns the replString; useful when using
transformString, or converting HTML entities, as in:
nbsp = Literal("&nbsp;").setParseAction(replaceWith("<BLANK>"))
• keepOriginalText - (deprecated, use originalTextFor instead) restores any internal whitespace or suppressed
text within the tokens for a matched parse expression. This is especially useful when defining expressions
for scanString or transformString applications.
• withAttribute(*args, **kwargs) - helper to create a validating parse action to be used with start
tags created with makeXMLTags or makeHTMLTags. Use withAttribute to qualify a starting tag with a
required attribute value, to avoid false matches on common tags such as <TD> or <DIV>.
withAttribute can be called with:
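– keyword arguments, as in (align="right"); for an attribute name that is a Python reserved word, such as
class, pass a dict with the ** operator, as in (**{"class": "Customer", "align": "right"}), or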
– a list of name-value tuples, as in (("ns1:class", "Customer"), ("ns2:align",
"right"))
An attribute can be specified to have the special value withAttribute.ANY_VALUE, which will match any
value - use this to ensure that an attribute is present but any attribute value is acceptable.
• downcaseTokens - converts all matched tokens to lowercase
• upcaseTokens - converts all matched tokens to uppercase
• matchOnlyAtCol(columnNumber) - a parse action that verifies that an expression was matched at a
particular column, raising a ParseException if matching at a different column number; useful when parsing
tabular data
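Several of the helpers above can be tried together. The following is a minimal sketch. It assumes pyparsing 3.x, where the camelCase names used in this document are still available. The sample strings and variable names are illustrative only.

```python
from pyparsing import (Literal, Word, alphas, col, line, lineno, originalTextFor,
                       quotedString, removeQuotes, replaceWith, srange)

# originalTextFor: return the matched input text, internal whitespace included
full_name = Word(alphas) + Word(alphas)
name_tokens = full_name.parseString("John   Smith").asList()
name_text = originalTextFor(full_name).parseString("John   Smith").asList()

# lineno / col / line: 1-based position helpers, handy in diagnostic messages
text = "abc\ndef\nghi"
loc = text.index("e")
position = (lineno(loc, text), col(loc, text), line(loc, text))

# srange: expand a regex-style character range into a plain string of characters
digits = srange("[0-9]")

# removeQuotes: strip the delimiting quotes from a matched quoted string
unquoted = quotedString.copy().setParseAction(removeQuotes)
bare = unquoted.parseString('"hello world"')[0]

# replaceWith + transformString: substitute text wherever the expression matches
nbsp = Literal("&nbsp;").setParseAction(replaceWith("<BLANK>"))
replaced = nbsp.transformString("a&nbsp;b")

print(name_tokens, name_text, position, digits, bare, replaced)
```

Note that originalTextFor keeps the three spaces between the two names, whereas the plain And returns two separate tokens.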
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ
Grammars are conventionally represented in what are called “railroad diagrams”, which allow you to visually follow
the sequence of tokens in a grammar along lines which are a bit like train tracks. You might want to generate a railroad
diagram for your grammar in order to better understand it yourself, or maybe to communicate it to others.
To generate a railroad diagram in pyparsing, you first have to install pyparsing with the diagrams extra. To do this,
just run pip install pyparsing[diagrams], and make sure you add pyparsing[diagrams] to any
setup.py or requirements.txt that specifies pyparsing as a dependency.
Next, run pyparsing.diagram.to_railroad() to convert your grammar into a form understood by the
railroad-diagrams module, and then pyparsing.diagram.railroad_to_html() to convert that into an
HTML document. For example:
You can view an example railroad diagram generated from a pyparsing grammar for SQL SELECT statements here.
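import railroad
from pyparsing.diagram import to_railroad
railroad.DIAGRAM_CLASS = "my-custom-class"  # class name the railroad-diagrams library assigns to each diagram
my_railroad = to_railroad(my_grammar)  # my_grammar: a pyparsing expression you have already defined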
pyparsing
3.1.1 pyparsing module - Classes and methods to define and execute parsing grammars
The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional
lex/yacc approach, or the use of regular expressions. With pyparsing, you don’t need to learn a new syntax for
defining grammars or matching expressions - the parsing module provides a library of classes that you use to construct
the grammar directly in Python.
Here is a program to parse “Hello, World!” (or any greeting of the form "<salutation>, <addressee>!"),
built up using Word, Literal, and And elements (the '+' operators create And expressions, and the strings are
auto-converted to Literal expressions):
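A minimal version of such a greeting parser looks like the sketch below. It assumes pyparsing 3.x. The plain strings "," and "!" are turned into Literal expressions when they are combined with '+'.

```python
from pyparsing import Word, alphas

# a greeting is a word, a comma, another word, and an exclamation point
greet = Word(alphas) + "," + Word(alphas) + "!"

hello = "Hello, World!"
result = greet.parseString(hello)
print(hello, "->", result)  # Hello, World! -> ['Hello', ',', 'World', '!']
```

The same grammar also accepts "Hello , World !", because whitespace between elements is skipped by default.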
The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of
'+', '|', '^' and '&' operators.
The ParseResults object returned from ParserElement.parseString can be accessed as a nested list, a
dictionary, or an object with named attributes.
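The three access styles can be seen on a single result. The sketch below assumes pyparsing 3.x. The date grammar and the results names "year", "month" and "day" are illustrative.

```python
from pyparsing import Word, nums

integer = Word(nums)
# expr("name") is shorthand for expr.setResultsName("name")
date_str = integer("year") + "/" + integer("month") + "/" + integer("day")
result = date_str.parseString("1999/12/31")

print(len(result), result[0], result[-1])  # as a list: 5 1999 31
print(result["year"], result["month"])     # as a dict: 1999 12
print(result.year, result.day)             # as attributes: 1999 31
```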
The pyparsing module handles some of the problems that are typically vexing when writing text parsers:
• extra or missing whitespace (the above program will also handle “Hello,World!”, “Hello , World !”, etc.)
• quoted strings
• embedded comments
Getting Started -
Visit the classes ParserElement and ParseResults to see the base classes that most other pyparsing classes
inherit from. Use the docstrings for examples of how to:
• construct literal match expressions from Literal and CaselessLiteral classes
• construct character word-group expressions using the Word class
• see how to create repetitive expressions using ZeroOrMore and OneOrMore classes
• use '+', '|', '^', and '&' operators to combine simple expressions into more complex ones
• associate names with your parsed results using ParserElement.setResultsName
• access the parsed data, which is returned as a ParseResults object
• find some helpful expression short-cuts like delimitedList and oneOf
• find more useful common expressions in the pyparsing_common namespace class
class pyparsing.And(exprs, savelist=True)
Bases: pyparsing.core.ParseExpression
Requires all given ParseExpressions to be found in the given order. Expressions may be separated by
whitespace. May be constructed using the '+' operator. May also be constructed using the '-' operator,
which will suppress backtracking.
Example:
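from pyparsing import OneOrMore, Word, alphas, nums
integer = Word(nums)
name_expr = OneOrMore(Word(alphas))
expr = integer("id") + name_expr("name") + integer("age")  # same as And([integer("id"), name_expr("name"), integer("age")])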
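The difference between the '+' and '-' operators shows up when the And sits inside an alternation. This is a minimal sketch assuming pyparsing 3.x. The names soft and hard, the keyword "age" and the input are illustrative.

With '+', a failure after the keyword lets the enclosing MatchFirst try its next alternative. With '-', the failure is raised as ParseSyntaxException, which alternatives do not catch.

```python
from pyparsing import Keyword, ParseSyntaxException, Word, alphas, nums

integer = Word(nums)
name = Word(alphas)

# '+' builds an ordinary And: if integer fails, the whole And fails and
# the enclosing MatchFirst ('|') is free to try its next alternative
soft = Keyword("age") + integer | name

# '-' inserts an error stop after Keyword("age"): a failure past that point
# raises ParseSyntaxException instead of backtracking
hard = Keyword("age") - integer | name

soft_result = soft.parseString("age abc").asList()
print(soft_result)  # falls back to name: ['age']

try:
    hard.parseString("age abc")
    hard_error = None
except ParseSyntaxException as err:
    hard_error = err
    print("stopped:", err)
```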
Example:
prints:
ignore(other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly,
to define multiple comment or other ignorable patterns.
Example:
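from pyparsing import OneOrMore, Word, alphas, cStyleComment
patt = OneOrMore(Word(alphas))
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj']
patt.ignore(cStyleComment)
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj', 'lskjd']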
data_word = Word(alphas)
label = data_word + FollowedBy(':')
attr_expr = Group(label + Suppress(':') + OneOrMore(data_word).setParseAction(' '.join))
text = "shape: SQUARE posn: upper left color: light blue texture: burlap"
attr_expr = (label + Suppress(':') + OneOrMore(data_word, stopOn=label).setParseAction(' '.join))
result = Dict(OneOrMore(Group(attr_expr))).parseString(text)
print(result.dump())
prints:
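from pyparsing import Group, Optional, Word, nums, oneOf
color = oneOf("RED ORANGE YELLOW GREEN BLUE PURPLE BLACK WHITE BROWN")
shape_type = oneOf("SQUARE CIRCLE TRIANGLE STAR HEXAGON OCTAGON")
integer = Word(nums)
shape_attr = "shape:" + shape_type("shape")
posn_attr = "posn:" + Group(integer("x") + ',' + integer("y"))("posn")
color_attr = "color:" + color("color")
size_attr = "size:" + integer("size")
# use Each (the '&' operator) to accept the attributes in any order;
# shape and posn are required, color and size are optional
shape_spec = shape_attr & posn_attr & Optional(color_attr) & Optional(size_attr)
shape_spec.runTests('''
shape: SQUARE color: BLACK posn: 100, 120
shape: CIRCLE size: 50 color: BLUE posn: 50,80
color:GREEN size:20 shape:TRIANGLE posn:20,40
'''
)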
prints:
- color: GREEN
- posn: ['20', ',', '40']
  - x: 20
  - y: 40
- shape: TRIANGLE
- size: 20
prints:
[['shape', 'SQUARE'], ['color', 'BLACK'], ['posn', 'upper left']]
thereby leaving b and c out as parseable alternatives. It is recommended that you explicitly group the values
inserted into the Forward:
fwdExpr << (a | b | c)
Converting to use the '<<=' operator instead will avoid this problem.
See ParseResults.pprint for an example of a recursive parser created using Forward.
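The precedence problem with '<<' and the '<<=' fix can be checked directly. This is a minimal sketch assuming pyparsing 3.x. The names broken, fixed and nested are illustrative, and the nested-parentheses grammar is just one example of a recursive definition.

```python
from pyparsing import Forward, Group, Literal, ParseException, Suppress, Word, ZeroOrMore, alphas

a, b, c = Literal("a"), Literal("b"), Literal("c")

# '<<' binds more tightly than '|', so this line means (broken << a) | b | c:
# the MatchFirst is built and then discarded, and broken only holds a
broken = Forward()
broken << a | b | c

# '<<=' is an augmented assignment, so the whole right-hand side goes into the Forward
fixed = Forward()
fixed <<= a | b | c

try:
    broken.parseString("b")
    broken_error = None
except ParseException as err:
    broken_error = err

# a typical recursive use: arbitrarily nested, parenthesized word lists
LPAR, RPAR = map(Suppress, "()")
nested = Forward()
nested <<= Group(LPAR + ZeroOrMore(Word(alphas) | nested) + RPAR)
print(nested.parseString("(a (b c) d)").asList())  # [['a', ['b', 'c'], 'd']]
```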
copy()
Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing
pattern, using copies of the original parse element.
Example:
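from pyparsing import OneOrMore, Suppress, Word, nums
integer = Word(nums).setParseAction(lambda toks: int(toks[0]))
integerK = integer.copy().addParseAction(lambda toks: toks[0] * 1024) + Suppress("K")
integerM = integer.copy().addParseAction(lambda toks: toks[0] * 1024 * 1024) + Suppress("M")
print(OneOrMore(integerK | integerM | integer).parseString("5K 100 640K 256M"))  # -> [5120, 100, 655360, 268435456]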
prints:
ignoreWhitespace(recursive=True)
Enables the skipping of whitespace before matching the characters in the ParserElement’s defined
pattern.
Parameters recursive – If true (the default), also enable whitespace skipping in child elements (if any)
leaveWhitespace(recursive=True)
Disables the skipping of whitespace before matching the characters in the ParserElement’s defined
pattern. This is normally only used internally by the pyparsing module, but may be needed in some
whitespace-sensitive grammars.
Parameters recursive – If true (the default), also disable whitespace skipping in child elements (if any)
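The effect of leaveWhitespace and ignoreWhitespace on a single element can be shown with two literals. This is a minimal sketch assuming pyparsing 3.x. The names loose, tight and restored are illustrative.

```python
from pyparsing import Literal, ParseException

# by default every element skips leading whitespace before matching
loose = Literal("a") + Literal("b")

# leaveWhitespace() on "b" means it must follow "a" immediately
tight = Literal("a") + Literal("b").leaveWhitespace()

# ignoreWhitespace() turns whitespace skipping back on for that element
restored = Literal("a") + Literal("b").leaveWhitespace().ignoreWhitespace()

loose_result = loose.parseString("a b").asList()
tight_result = tight.parseString("ab").asList()
restored_result = restored.parseString("a b").asList()

try:
    tight.parseString("a b")
    tight_error = None
except ParseException as pe:
    tight_error = pe  # fails at loc 1, where the space is

print(loose_result, tight_result, restored_result, tight_error)
```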
parseImpl(instring, loc, doActions=True)
streamline()
validate(validateTrace=None)
Check defined expressions for valid structure, check for infinite recursive definitions.
class pyparsing.GoToColumn(colno)
Bases: pyparsing.core._PositionToken
Token to advance to a specific column of input text; useful for tabular report scraping.
parseImpl(instring, loc, doActions=True)
preParse(instring, loc)
class pyparsing.Group(expr)
Bases: pyparsing.core.TokenConverter
Converter to return the matched tokens as a list - useful for returning tokens of ZeroOrMore and OneOrMore
expressions.
Example:
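from pyparsing import Group, Optional, Word, alphas, delimitedList, nums
ident = Word(alphas)
num = Word(nums)
term = ident | num
func = ident + Optional(delimitedList(term))
print(func.parseString("fn a, b, 100"))
# -> ['fn', 'a', 'b', '100']
func = ident + Group(Optional(delimitedList(term)))
print(func.parseString("fn a, b, 100"))
# -> ['fn', ['a', 'b', '100']]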
prints:
class pyparsing.LineEnd
Bases: pyparsing.core._PositionToken
Matches if current position is at the end of a line within the parse string
parseImpl(instring, loc, doActions=True)
class pyparsing.LineStart
Bases: pyparsing.core._PositionToken
Matches if current position is at the beginning of a line within the parse string
Example:
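from pyparsing import LineStart, restOfLine
test = '''\
AAA this line
AAA and this line
  AAA but not this one
B AAA and definitely not this one
'''
for t in (LineStart() + 'AAA' + restOfLine).searchString(test):
    print(t)
prints:
['AAA', ' this line']
['AAA', ' and this line']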
If the lookbehind expression is a string, Literal, Keyword, or a Word or CharsNotIn with a specified
exact or maximum length, then the retreat parameter is not required. Otherwise, retreat must be specified to give
a maximum number of characters to look back from the current parse position for a lookbehind match.
Example:
data_word = Word(alphas)
label = data_word + FollowedBy(':')
attr_expr = Group(label + Suppress(':') + OneOrMore(data_word).setParseAction(' '.join))
# use stopOn attribute for OneOrMore to avoid reading label string as part of the data
class pyparsing.OnlyOnce(methodCall)
Bases: object
Wrapper for parse actions, to ensure they are only called once.
reset()
Allow the associated parse action to be called once more.
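The call-once behavior and reset() can be seen in a short run. This is a minimal sketch assuming pyparsing 3.x, where a second call of an OnlyOnce-wrapped action raises ParseException. The names seen, record_once and number are illustrative.

```python
from pyparsing import OnlyOnce, ParseException, Word, nums

seen = []
# wrap an ordinary parse action so that it may run only once until reset
record_once = OnlyOnce(lambda toks: seen.append(toks[0]))
number = Word(nums).setParseAction(record_once)

number.parseString("1")          # first call: the wrapped action runs

try:
    number.parseString("2")      # second call: OnlyOnce raises ParseException
    second_error = None
except ParseException as pe:
    second_error = pe

record_once.reset()              # allow the action to run once more
number.parseString("3")
print(seen)                      # ['1', '3']
```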
class pyparsing.Optional(expr, default=<pyparsing.core._NullToken object>)
Bases: pyparsing.core.ParseElementEnhance
Optional matching of the given expression.
Parameters:
• expr - expression that may match zero or one time
• default (optional) - value to be returned if the optional expression is not found.
Example:
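from pyparsing import Combine, Optional, Word, nums
# US postal code: a 5-digit ZIP, plus an optional 4-digit qualifier
zip_code = Combine(Word(nums, exact=5) + Optional('-' + Word(nums, exact=4)))
zip_code.runTests('''
# ZIP+4 form
12101-0001
# invalid ZIP
98765-
''')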
prints:
# ZIP+4 form
12101-0001
['12101-0001']
# invalid ZIP
98765-
     ^
FAIL: Expected end of text (at char 5), (line:1, col:6)
prints:
Parameters:
• depth (default=16) - number of levels back in the stack trace to list expression and function names; if
None, the full stack trace names will be listed; if 0, only the failing input line, marker, and exception
string will be shown
Returns a multi-line string listing the ParserElements and/or function names in the exception’s stack trace.
Example:
import pyparsing as pp
expr = pp.Word(pp.nums) * 3
try:
    expr.parseString("123 456 A789")
except pp.ParseException as pe:
    print(pe.explain(depth=0))
prints:
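from pyparsing import OneOrMore, Word, alphas, cStyleComment
patt = OneOrMore(Word(alphas))
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj']
patt.ignore(cStyleComment)
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj', 'lskjd']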
ignoreWhitespace(recursive=True)
Enables the skipping of whitespace before matching the characters in the ParserElement’s defined
pattern.
Parameters recursive – If true (the default), also enable whitespace skipping in child elements (if any)
leaveWhitespace(recursive=True)
Disables the skipping of whitespace before matching the characters in the ParserElement’s defined
pattern. This is normally only used internally by the pyparsing module, but may be needed in some
whitespace-sensitive grammars.
Parameters recursive – If true (the default), also disable whitespace skipping in child elements (if any)
parseImpl(instring, loc, doActions=True)
recurse()
streamline()
validate(validateTrace=None)
Check defined expressions for valid structure, check for infinite recursive definitions.
exception pyparsing.ParseException(pstr, loc=0, msg=None, elem=None)
Bases: pyparsing.exceptions.ParseBaseException
Exception thrown when parse expressions don't match the input string. Supported attributes by name are:
• lineno - returns the line number of the exception text
• col - returns the column number of the exception text
• line - returns the line containing the exception text
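The position attributes can be checked on a failure in a multi-line input. This is a minimal sketch assuming pyparsing 3.x. The three-integer grammar and the input text are illustrative.

```python
from pyparsing import ParseException, Word, nums

integer = Word(nums).setName("integer")
three_ints = integer + integer + integer

text = "123\n456\nabc"
try:
    three_ints.parseString(text)
    error = None
except ParseException as pe:
    error = pe
    # loc is a 0-based offset; lineno and col are 1-based; line is the failing line's text
    print(pe.loc, pe.lineno, pe.col, repr(pe.line))  # 8 3 1 'abc'
```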
Example:
from pyparsing import ParseException, Word, nums
try:
    Word(nums).setName("integer").parseString("ABC")
except ParseException as pe:
    print(pe)
    print("column: {}".format(pe.col))
prints:
prints:
ignore(other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly,
to define multiple comment or other ignorable patterns.
Example:
patt = OneOrMore(Word(alphas))
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj']
patt.ignore(cStyleComment)
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj', 'lskjd']
ignoreWhitespace(recursive=True)
Extends ignoreWhitespace defined in base class, and also invokes ignoreWhitespace on all
contained expressions.
leaveWhitespace(recursive=True)
Extends leaveWhitespace defined in base class, and also invokes leaveWhitespace on all
contained expressions.
recurse()
streamline()
validate(validateTrace=None)
Check defined expressions for valid structure, check for infinite recursive definitions.
exception pyparsing.ParseFatalException(pstr, loc=0, msg=None, elem=None)
Bases: pyparsing.exceptions.ParseBaseException
User-throwable exception thrown when inconsistent parse content is found; stops all parsing immediately.
class pyparsing.ParseResults(toklist=None, name=None, asList=True, modal=True,
isinstance=<built-in function isinstance>)
Bases: object
Structured parse results, to provide multiple means of access to the parsed data:
• as a list (len(results))
integer = Word(nums)
date_str = (integer.setResultsName("year") + '/'
+ integer.setResultsName("month") + '/'
+ integer.setResultsName("day"))
# equivalent form:
# date_str = integer("year") + '/' + integer("month") + '/' + integer("day")
prints:
append(item)
Add single element to end of ParseResults list of elements.
Example:
# use a parse action to compute the sum of the parsed integers, and add it to the end
def append_sum(tokens):
    tokens.append(sum(map(int, tokens)))
print(OneOrMore(Word(nums)).addParseAction(append_sum).parseString("0 123 321"))  # -> ['0', '123', '321', 444]
asDict()
Returns the named parse results as a nested dictionary.
Example:
integer = Word(nums)
date_str = integer("year") + '/' + integer("month") + '/' + integer("day")
result = date_str.parseString('12/31/1999')
print(type(result), repr(result))  # -> <class 'pyparsing.ParseResults'> (['12', '/', '31', '/', '1999'], {'day': [('1999', 4)], 'year': [('12', 0)],
result_dict = result.asDict()
print(type(result_dict), repr(result_dict))  # -> <class 'dict'> {'day': '1999', 'year': '12', 'month': '31'}
import json
print(json.dumps(result))  # -> Exception: TypeError: ... is not JSON serializable
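The practical benefit of asDict is that its output can be serialized where a ParseResults cannot. This is a minimal sketch assuming pyparsing 3.x. The date grammar is illustrative.

```python
import json
from pyparsing import Word, nums

integer = Word(nums)
date_str = integer("year") + "/" + integer("month") + "/" + integer("day")
result = date_str.parseString("1999/12/31")

try:
    json.dumps(result)                # ParseResults is neither a list nor a dict
    dumps_error = None
except TypeError as exc:
    dumps_error = exc

result_dict = result.asDict()         # a plain dict of the named results
as_json = json.dumps(result_dict, sort_keys=True)
print(as_json)                        # {"day": "31", "month": "12", "year": "1999"}
```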
asList()
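Returns the parse results as a nested list of matching tokens; nested ParseResults are converted to plain Python lists.
Example:
from pyparsing import OneOrMore, Word, alphas
patt = OneOrMore(Word(alphas))
result = patt.parseString("sldkj lsdkj sldkj")
# even though the result prints in list-like form, it is actually a pyparsing ParseResults
print(type(result), result)
# use asList() to create an actual Python list
result_list = result.asList()
print(type(result_list), result_list)  # -> <class 'list'> ['sldkj', 'lsdkj', 'sldkj']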
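asList does not itself convert tokens to strings. Values produced by parse actions keep their types, and grouped results become nested lists. The sketch below assumes pyparsing 3.x. The pair grammar is illustrative.

```python
from pyparsing import Group, OneOrMore, Word, nums

# a parse action converts each number to int; Group nests each pair
integer = Word(nums).setParseAction(lambda toks: int(toks[0]))
pairs = OneOrMore(Group(integer + integer))

nested = pairs.parseString("1 2 3 4").asList()
print(type(nested), nested)  # <class 'list'> [[1, 2], [3, 4]]
```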
clear()
Clear all elements and results names.
copy()
Returns a new copy of a ParseResults object.
dump(indent='', full=True, include_list=True, _depth=0)
Diagnostic method for listing out the contents of a ParseResults. Accepts an optional indent argument
so that this string can be embedded in a nested display of other data.
Example:
integer = Word(nums)
date_str = integer("year") + '/' + integer("month") + '/' + integer("day")
result = date_str.parseString('12/31/1999')
print(result.dump())
prints:
extend(itemseq)
Add sequence of elements to end of ParseResults list of elements.
Example:
patt = OneOrMore(Word(alphas))
# use a parse action to append the reverse of the matched strings, to make a palindrome
def make_palindrome(tokens):
    tokens.extend(reversed([t[::-1] for t in tokens]))
    return ''.join(tokens)
print(patt.addParseAction(make_palindrome).parseString("lskdj sdlkjf lksd"))  # -> 'lskdjsdlkjflksddsklfjkldsjdksl'
integer = Word(nums)
date_str = integer("year") + '/' + integer("month") + '/' + integer("day")
result = date_str.parseString("1999/12/31")
print(result.get("year")) # -> '1999'
print(result.get("hour", "not specified")) # -> 'not specified'
print(result.get("hour")) # -> None
getName()
Returns the results name for this token expression. Useful when several different expressions might match
at a particular location.
Example:
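from pyparsing import Group, OneOrMore, Regex, Suppress, Word, alphanums, nums
integer = Word(nums)
ssn_expr = Regex(r"\d\d\d-\d\d-\d\d\d\d")
house_number_expr = Suppress('#') + Word(nums, alphanums)
user_data = (Group(house_number_expr)("house_number")
             | Group(ssn_expr)("ssn")
             | Group(integer)("age"))
user_info = OneOrMore(user_data)
result = user_info.parseString("22 111-22-3333 #221B")
for item in result:
    print(item.getName(), ':', item[0])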
prints:
age : 22
ssn : 111-22-3333
house_number : 221B
haskeys()
Returns True if any results names are defined. Since keys() returns an iterator, this method is a convenient way
to check for the existence of any defined results names.
insert(index, insStr)
Inserts new element at location index in the list of parsed tokens.
Similar to list.insert().
Example:
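# use a parse action to insert the parse location in the front of the parsed results
def insert_locn(locn, tokens):
    tokens.insert(0, locn)
print(OneOrMore(Word(nums)).addParseAction(insert_locn).parseString("0 123 321"))  # -> [0, '0', '123', '321']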
items()
keys()
pop(*args, **kwargs)
Removes and returns item at specified index (default= last). Supports both list and dict semantics
for pop(). If passed no argument or an integer argument, it will use list semantics and pop tokens
from the list of parsed tokens. If passed a non-integer argument (most likely a string), it will use dict
semantics and pop the corresponding value from any defined results names. A second default return value
argument is supported, just as in dict.pop().
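The list and dict semantics of pop() can be compared on one result. This is a minimal sketch assuming pyparsing 3.x. The "LABEL" grammar follows the example below, and the "UNITS" name is a made-up absent key.

```python
from pyparsing import OneOrMore, Word, alphas, nums

label = Word(alphas)
patt = label("LABEL") + OneOrMore(Word(nums))
result = patt.parseString("AAB 123 321")

# string argument: dict semantics; the results name is removed, the token stays in the list
label_value = result.pop("LABEL")
after_name_pop = result.asList()

# no argument: list semantics, removes and returns the last token
last = result.pop()
# integer argument: list semantics, removes and returns the token at that index
first = result.pop(0)

# a second argument is returned as the default when the name is not defined
missing = result.pop("UNITS", "n/a")

print(label_value, after_name_pop, last, first, result.asList(), missing)
# AAB ['AAB', '123', '321'] 321 AAB ['123'] n/a
```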
Example:
def remove_first(tokens):
    tokens.pop(0)
print(OneOrMore(Word(nums)).parseString("0 123 321"))  # -> ['0', '123', '321']
print(OneOrMore(Word(nums)).addParseAction(remove_first).parseString("0 123 321"))  # -> ['123', '321']
label = Word(alphas)
patt = label("LABEL") + OneOrMore(Word(nums))
print(patt.parseString("AAB 123 321").dump())
# Use pop() in a parse action to remove named result (note that corresponding value is not removed from the list form of results)
prints:
pprint(*args, **kwargs)
Pretty-printer for parsed results as a list, using the pprint module. Accepts additional positional or keyword
args as defined for pprint.pprint.
Example:
prints:
['fna',
 ['a',
  'b',
  ['(', 'fnb', ['c', 'd', '200'], ')'],
  '100']]
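A grammar that produces nested output of this shape can be written with a Forward. The sketch below assumes pyparsing 3.x. The function-call grammar (ident, num, term, func) is an illustrative assumption, not necessarily the grammar behind the output above. The width=40 argument is passed through to pprint.pprint.

```python
from pyparsing import Forward, Group, Optional, Word, alphanums, alphas, delimitedList, nums

ident = Word(alphas, alphanums)
num = Word(nums)
func = Forward()
# a term is a name, a number, or a parenthesized nested call
term = ident | num | Group("(" + func + ")")
func <<= ident + Group(Optional(delimitedList(term)))

result = func.parseString("fna a,b,(fnb c,d,200),100")
result.pprint(width=40)
```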
values()
exception pyparsing.ParseSyntaxException(pstr, loc=0, msg=None, elem=None)
Bases: pyparsing.exceptions.ParseFatalException
Just like ParseFatalException, but thrown internally when an ErrorStop ('-' operator) indicates that
parsing is to stop immediately because an unbacktrackable syntax error has been found.
class pyparsing.ParserElement(savelist=False)
Bases: abc.ABC
Abstract base level parser element class.
DEFAULT_WHITE_CHARS = ' \n\t\r'
addCondition(*fns, **kwargs)
Add a boolean predicate function to expression's list of parse actions. See setParseAction for function
call signatures. Unlike setParseAction, functions passed to addCondition need to return
boolean success/fail of the condition.
Optional keyword arguments:
• message = define a custom message to be used in the raised exception
• fatal = if True, will raise ParseFatalException to stop parsing immediately; otherwise will raise ParseException
• callDuringTry = boolean to indicate if this method should be called during internal tryParse calls,
default=False
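The message and fatal arguments can be tried on a date grammar. This is a minimal sketch assuming pyparsing 3.x. The year-2000 rule and the names year_int and fatal_year are illustrative.

```python
from pyparsing import ParseException, ParseFatalException, Word, nums

integer = Word(nums).setParseAction(lambda toks: int(toks[0]))

# the condition sees the tokens after earlier parse actions (here, already an int)
year_int = integer.copy().addCondition(lambda toks: toks[0] >= 2000,
                                       message="Only support years 2000 and later")
date_str = year_int + "/" + integer + "/" + integer
print(date_str.parseString("2021/12/31"))  # [2021, '/', 12, '/', 31]

try:
    date_str.parseString("1999/12/31")
    date_error = None
except ParseException as pe:
    date_error = pe
    print(pe.msg)

# a plain failed condition lets '|' fall back to the next alternative...
fallback = (year_int | Word(nums)).parseString("1999")

# ...while fatal=True raises ParseFatalException, which stops parsing immediately
fatal_year = integer.copy().addCondition(lambda toks: toks[0] >= 2000, fatal=True)
try:
    (fatal_year | Word(nums)).parseString("1999")
    fatal_error = None
except ParseFatalException as pfe:
    fatal_error = pfe
```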
Example:
(line:1, col:1)
addParseAction(*fns, **kwargs)
Add one or more parse actions to expression’s list of parse actions. See setParseAction.
See examples in copy.
canParseNext(instring, loc)
copy()
Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing
pattern, using copies of the original parse element.
Example:
prints:
defaultName
static enablePackrat(cache_size_limit=128)
Enables “packrat” parsing, which adds memoizing to the parsing logic. Repeated parse attempts at the
same string location (which happens often in many complex grammars) can immediately return a cached
value, instead of re-executing parsing/validating code. Memoizing is done of both valid results and parsing
exceptions.
Parameters:
• cache_size_limit - (default= 128) - if an integer value is provided will limit the size of the packrat
cache; if None is passed, then the cache size will be unbounded; if 0 is passed, the cache will be
effectively disabled.
This speedup may break existing programs that use parse actions that have side-effects. For this reason,
packrat parsing is disabled when you first import pyparsing. To activate the packrat feature,
your program must call the class method ParserElement.enablePackrat. For best results, call
enablePackrat() immediately after importing pyparsing.
Example:
import pyparsing
pyparsing.ParserElement.enablePackrat()
ignore(other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly,
to define multiple comment or other ignorable patterns.
Example:
patt = OneOrMore(Word(alphas))
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj']
patt.ignore(cStyleComment)
patt.parseString('ablaj /* comment */ lskjd')
# -> ['ablaj', 'lskjd']
ignoreWhitespace(recursive=True)
Enables the skipping of whitespace before matching the characters in the ParserElement’s defined
pattern.
Parameters recursive – If true (the default), also enable whitespace skipping in child elements (if any)
static inlineLiteralsUsing(cls)
Set class to be used for inclusion of string literals into a parser.
Example:
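from pyparsing import ParserElement, Suppress, Word, nums
# change to Suppress
ParserElement.inlineLiteralsUsing(Suppress)
integer = Word(nums)
date_str = integer("year") + '/' + integer("month") + '/' + integer("day")
date_str.parseString("1999/12/31")  # -> ['1999', '12', '31']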
leaveWhitespace(recursive=True)
Disables the skipping of whitespace before matching the characters in the ParserElement’s defined
pattern. This is normally only used internally by the pyparsing module, but may be needed in some
whitespace-sensitive grammars.
Parameters recursive – If true (the default), also disable whitespace skipping in child elements (if any)
matches(testString, parseAll=True)
Method for quick testing of a parser against a test string. Good for simple inline microtests of subexpressions
while building up larger parser.
Parameters:
• testString - to test against this expression for a match
• parseAll - (default= True) - flag to pass to parseString when running tests
Example:
expr = Word(nums)
assert expr.matches("100")
name
packrat_cache = {}
packrat_cache_lock = <unlocked _thread.RLock object owner=0 count=0>
packrat_cache_stats = [0, 0]
parseFile(file_or_filename, parseAll=False)
Execute the parse expression on the given file or filename. If a filename is specified (instead of a file
object), the entire file is opened, read, and closed before parsing.
parseImpl(instring, loc, doActions=True)
parseString(instring, parseAll=False)
Parse a string with respect to the parser definition. This function is intended as the primary interface to the
client code.
Parameters
• instring – The input string to be parsed.
• parseAll – If set, the entire input string must match the grammar.
Raises ParseException – Raised if the input string does not match the grammar; if parseAll is set, also
raised when the match does not consume the entire input string.
Returns the parsed data as a ParseResults object, which may be accessed as a list, a dict,
or an object with attributes if the given parser includes results names.
To require that the entire input string match the grammar, set the parseAll flag to True. This is
equivalent to ending the grammar with StringEnd().
To report proper column numbers, parseString operates on a copy of the input string where all tabs
are converted to spaces (8 spaces per tab, as per the default in string.expandtabs). If the input string
contains tabs and the grammar uses parse actions that use the loc argument to index into the string being
parsed, one can ensure a consistent view of the input string by doing one of the following:
• calling parseWithTabs on your grammar before calling parseString (see
parseWithTabs),
• define your parse action using the full (s,loc,toks) signature, and reference the input string using
the parse action’s s argument, or
• explicitly expand the tabs in your input string before calling parseString.
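The effect of tab expansion on loc can be observed with a parse action that records where each word starts. This is a minimal sketch assuming pyparsing 3.x and the default tab size of 8 used by str.expandtabs. The helper make_words is hypothetical and exists only for the illustration.

```python
from pyparsing import OneOrMore, Word, alphas

def make_words(found):
    # full (s, loc, toks) signature: s is the string pyparsing is actually scanning
    word = Word(alphas).setParseAction(lambda s, loc, toks: found.append((loc, s[loc])))
    return OneOrMore(word)

text = "a\tb"

expanded = []
make_words(expanded).parseString(text)
print(expanded)  # [(0, 'a'), (8, 'b')]: loc indexes the tab-expanded string

kept = []
make_words(kept).parseWithTabs().parseString(text)
print(kept)      # [(0, 'a'), (2, 'b')]: loc indexes the original string
```

Using s[loc] inside the action stays consistent in both cases. Indexing the original text with the first set of locations would not.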
Examples:
By default, partial matches are OK.
The parsing behavior varies by the inheriting class of this abstract class. Please refer to the children directly
to see more examples.
It raises an exception if parseAll flag is set and instring does not match the whole grammar.
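Partial matching and the two ways of requiring a full match can be compared on one input. This is a minimal sketch assuming pyparsing 3.x. The input string is illustrative.

```python
from pyparsing import ParseException, StringEnd, Word

# by default, a match of the leading part of the input is enough
partial = Word("a").parseString("aaaaabaaa").asList()
print(partial)  # ['aaaaa']

# parseAll=True requires the whole input to match...
try:
    Word("a").parseString("aaaaabaaa", parseAll=True)
    all_error = None
except ParseException as pe:
    all_error = pe

# ...as does ending the grammar with StringEnd()
try:
    (Word("a") + StringEnd()).parseString("aaaaabaaa")
    end_error = None
except ParseException as pe:
    end_error = pe

print(all_error.loc, end_error.loc)  # 5 5: both fail where the 'b' starts
```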
parseWithTabs()
Overrides default behavior to expand <TAB>s to spaces before parsing the input string. Must be called
before parseString when the input grammar contains elements that match <TAB> characters.
postParse(instring, loc, tokenlist)
preParse(instring, loc)
recurse()
static resetCache()
runTests(tests, parseAll=True, comment='#', fullDump=True, printResults=True, failureTests=False,
postParse=None, file=None)
Execute the parse expression on a series of test strings, showing each test, the parsed results or where the
parse failed. Quick and easy way to run a parse expression against a list of sample strings.
Parameters:
• tests - a list of separate test strings, or a multiline string of test strings
• parseAll - (default= True) - flag to pass to parseString when running tests
• comment - (default= '#') - expression for indicating embedded comments in the test string; pass
None to disable comment filtering
• fullDump - (default= True) - dump results as list followed by results names in nested outline; if
False, only dump nested list
• printResults - (default= True) prints test output to stdout
• failureTests - (default= False) indicates if these tests are expected to fail parsing
• postParse - (default= None) optional callback for successful parse results; called as fn(test_string,
parse_results) and returns a string to be added to the test output
• file - (default= None) optional file-like object to which test output will be written; if None, will
default to sys.stdout
Returns: a (success, results) tuple, where success indicates that all tests succeeded (or failed if
failureTests is True), and results is a list of (test_string, result) tuples, where result is the ParseResults of a
successful parse or the exception raised by a failed one
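The return value is useful when runTests is driven from a script rather than read on screen. This is a minimal sketch assuming pyparsing 3.x, with printResults=False to suppress the report. The test strings are illustrative.

```python
from pyparsing import pyparsing_common

number_expr = pyparsing_common.number.copy()

# printResults=False suppresses the report; the return value is still available
success, results = number_expr.runTests("""
    100
    -100
    6.02e23
    """, printResults=False)

# each entry pairs the test string with its ParseResults (or the exception raised)
for test_string, parsed in results:
    print(test_string, "->", parsed[0])

# with failureTests=True, success means every test failed to parse
fail_ok, fail_results = number_expr.runTests("100Z", failureTests=True, printResults=False)
```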
Example:
number_expr = pyparsing_common.number.copy()
result = number_expr.runTests('''
# unsigned integer
100
# negative integer
-100
# float with scientific notation
6.02e23
# integer with scientific notation
1e-12
''')
result = number_expr.runTests('''
# stray character
100Z
# missing leading digit before '.'
-.100
# too many '.'
3.14.159
''', failureTests=True)
print("Success" if result[0] else "Failed!")
prints:
# unsigned integer
100
[100]
# negative integer
-100
[-100]
Success
# stray character
100Z
   ^
FAIL: Expected end of text (at char 3), (line:1, col:4)
Success
Each test string must be on a single line. If you want to test a string that spans multiple lines, create a test
like this:
expr.runTests(r"this is a test\n of strings that spans \n 3 lines")
(Note that this is a raw string literal; you must include the leading 'r'.)
scanString(instring, maxMatches=9223372036854775807, overlap=False)
Scan the input string for expression matches. Each match will return the matching tokens, start location,
and end location. May be called with optional maxMatches argument, to clip scanning after ‘n’ matches
are found. If overlap is specified, then overlapping matches will be reported.
Note that the start and end locations are reported relative to the string being parsed. See parseString
for more information on parsing strings with embedded tabs.
Example:
source = "sldjf123lsdjjkf345sldkjf879lkjsfd987"
print(source)
for tokens, start, end in Word(alphas).scanString(source):
    print(' '*start + '^'*(end-start))
    print(' '*start + tokens[0])
prints:
sldjf123lsdjjkf345sldkjf879lkjsfd987
^^^^^
sldjf
        ^^^^^^^
        lsdjjkf
                  ^^^^^^
                  sldkjf
                           ^^^^^^
                           lkjsfd
searchString(instring, maxMatches=9223372036854775807)
Another extension to scanString, simplifying the access to the tokens found to match the given parse
expression. May be called with optional maxMatches argument, to clip searching after ‘n’ matches are
found.
Example:
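from pyparsing import Word, alphas
# a capitalized word starts with an uppercase letter, followed by zero or more lowercase letters
cap_word = Word(alphas.upper(), alphas.lower())
print(cap_word.searchString("More than Iron, more than Lead, more than Gold I need Electricity"))
# the sum() builtin can be used to merge results into a single ParseResults object
print(sum(cap_word.searchString("More than Iron, more than Lead, more than Gold I need Electricity")))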
prints:
[['More'], ['Iron'], ['Lead'], ['Gold'], ['I'], ['Electricity']]
['More', 'Iron', 'Lead', 'Gold', 'I', 'Electricity']
setBreak(breakFlag=True)
Method to invoke the Python pdb debugger when this element is about to be parsed. Set breakFlag to
True to enable, False to disable.
setDebug(flag=True)
Enable display of debugging messages while doing pattern matching. Set flag to True to enable, False
to disable.
Example:
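from pyparsing import OneOrMore, Word, alphas, nums
wd = Word(alphas).setName("alphaword")
integer = Word(nums).setName("numword")
term = wd | integer
wd.setDebug()  # turn on debugging for wd only
OneOrMore(term).parseString("abc 123 xyz 890")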
prints:
The output shown is that produced by the default debug actions - custom debug actions can be specified
using setDebugActions. Prior to attempting to match the wd expression, the debugging message
"Match <exprname> at loc <n>(<line>,<col>)" is shown. Then if the parse succeeds, a
"Matched" message is shown, or an "Exception raised" message is shown. Also note the use
of setName to assign a human-readable name to the expression, which makes debugging and exception
messages easier to understand - for instance, the default name created for the Word expression without
calling setName is "W:(A-Za-z)".
setDebugActions(startAction, successAction, exceptionAction)
Enable display of debugging messages while doing pattern matching.
static setDefaultWhitespaceChars(chars)
Overrides the default whitespace chars
Example:
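A minimal sketch of the effect, assuming pyparsing 3.x. The sketch saves and restores the original setting, because setDefaultWhitespaceChars changes DEFAULT_WHITE_CHARS globally. Expressions created after the change use the new default. The sample text is illustrative.

```python
from pyparsing import OneOrMore, ParserElement, Word, alphas

text = "abc def\nghi jkl"

# with the default whitespace set, newlines are skipped like spaces
default_tokens = OneOrMore(Word(alphas)).parseString(text).asList()

# setDefaultWhitespaceChars also rebinds DEFAULT_WHITE_CHARS, so save the original first
original_white = ParserElement.DEFAULT_WHITE_CHARS
try:
    # make newlines significant: only spaces and tabs are skipped from now on
    ParserElement.setDefaultWhitespaceChars(" \t")
    line_tokens = OneOrMore(Word(alphas)).parseString(text).asList()
finally:
    ParserElement.setDefaultWhitespaceChars(original_white)

print(default_tokens)  # ['abc', 'def', 'ghi', 'jkl']
print(line_tokens)     # ['abc', 'def']
```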
setFailAction(fn)
Define action to perform if parsing fails at this expression. Fail action fn is a callable function that takes
the arguments fn(s, loc, expr, err) where:
• s = string being parsed
• loc = location where expression match was attempted and failed
• expr = the parse expression that failed
• err = the exception thrown
The function returns no value. It may throw ParseFatalException if it is desired to stop parsing
immediately.
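A fail action that raises ParseFatalException turns a recoverable mismatch into an immediate stop. This is a minimal sketch assuming pyparsing 3.x. The assignment grammar and the helper require_integer are hypothetical. The fail action receives (s, loc, expr, err) as described above.

```python
from pyparsing import ParseFatalException, Word, alphas, nums

def require_integer(s, loc, expr, err):
    # called as fn(s, loc, expr, err) when expr fails to match at loc
    raise ParseFatalException(s, loc, "expected an integer after '='")

name = Word(alphas)
lenient = (name + "=" + Word(nums)) | name
strict = (name + "=" + Word(nums).setFailAction(require_integer)) | name

# without the fail action, the failed assignment is abandoned and '|' matches just the name
lenient_result = lenient.parseString("x = abc").asList()
print(lenient_result)  # ['x']

try:
    strict.parseString("x = abc")
    strict_error = None
except ParseFatalException as pfe:
    strict_error = pfe
    print("fatal:", pfe.msg)
```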
setName(name)
Define name for this expression, makes debugging and exception messages clearer. Example:
Word(nums).parseString("ABC")  # -> Exception: Expected W:(0-9) (at char 0), (line:1, col:1)
setParseAction(*fns, **kwargs)
Define one or more actions to perform when successfully matching parse element definition. Parse action
fn is a callable method with 0-3 arguments, called as fn(s, loc, toks), fn(loc, toks),
fn(toks), or just fn(), where:
• s = the original string being parsed (see note below)
• loc = the location of the matching substring
• toks = a list of the matched tokens, packaged as a ParseResults object
If the functions in fns modify the tokens, they can return them as the return value from fn, and the modified
list of tokens will replace the original. Otherwise, fn does not need to return any value.
If None is passed as the parse action, all previously added parse actions for this expression are cleared.
Optional keyword arguments:
• callDuringTry = (default= False) indicates whether the parse action should be run during lookaheads and
alternate testing
Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process.
See parseString for more information on parsing strings containing <TAB> s, and suggested methods
to maintain a consistent view of the parsed string, the parse location, and line and column positions within
the parsed string.
Example:
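from pyparsing import Word, nums
integer = Word(nums)
date_str = integer + '/' + integer + '/' + integer
date_str.parseString("1999/12/31") # -> ['1999', '/', '12', '/', '31']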
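Attaching a parse action to integer converts each matched field to a Python int while parsing. The date fields then come back as numbers rather than strings. A minimal sketch:

```python
import pyparsing as pp

# convert each matched run of digits to an int at parse time
integer = pp.Word(pp.nums).setParseAction(lambda toks: int(toks[0]))
date_str = integer + "/" + integer + "/" + integer

result = date_str.parseString("1999/12/31")
print(result.asList())
```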
setResultsName(name, listAllMatches=False)
Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE:
this returns a copy of the original ParserElement object; this is so that the client can define a basic
element, such as an integer, and reference it in multiple places with different names.
You can also set results names using the abbreviated syntax, expr("name") in place of
expr.setResultsName("name") - see __call__.
Example:
from pyparsing import Word, nums
integer = Word(nums)
date_str = (integer.setResultsName("year") + '/'
+ integer.setResultsName("month") + '/'
+ integer.setResultsName("day"))
# equivalent form:
date_str = integer("year") + '/' + integer("month") + '/' + integer("day")
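Once results names are assigned, the fields can be read by key or as attributes, or collected with asDict(). Here is a short sketch using the abbreviated syntax:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
date_str = integer("year") + "/" + integer("month") + "/" + integer("day")

result = date_str.parseString("1999/12/31")
# named results are available by key or as attributes
year, month, day = result["year"], result.month, result.day
print(year, month, day, result.asDict())
```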
setWhitespaceChars(chars, copy_defaults=False)
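Overrides the whitespace characters skipped before matching this expression only (the static setDefaultWhitespaceChars changes the default for newly created expressions instead).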
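The following sketch shows per-expression whitespace control. The second word below skips spaces and tabs but not newlines, so the pair only matches when both words are on the same line. The variable names are illustrative.

```python
import pyparsing as pp

# the second word may follow spaces or tabs, but not a newline
second = pp.Word(pp.alphas).setWhitespaceChars(" \t")
pair = pp.Word(pp.alphas) + second

same_line = pair.parseString("key value").asList()
try:
    pair.parseString("key\nvalue")
    crosses_newline = True
except pp.ParseException:
    crosses_newline = False

print(same_line, crosses_newline)
```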
split(instring, maxsplit=9223372036854775807, includeSeparators=False)
Generator method to split a string using the given expression as a separator. The optional maxsplit
argument limits the number of splits; the optional includeSeparators argument (default= False)
indicates whether the matched separator text should be included in the split results.
Example:
from pyparsing import oneOf
punc = oneOf(list(".,;:/-!?"))
print(list(punc.split("This, this?, this sentence, is badly punctuated!")))
prints:
['This', ' this', '', ' this sentence', ' is badly punctuated', '']
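The optional arguments are easier to see on a smaller input. This sketch uses a two-character separator set chosen for illustration.

```python
import pyparsing as pp

punc = pp.oneOf(list(",;"))
text = "a,b;c"

plain = list(punc.split(text))
with_seps = list(punc.split(text, includeSeparators=True))  # keep the separators
first_only = list(punc.split(text, maxsplit=1))              # split only once

print(plain, with_seps, first_only)
```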
streamline()
suppress()
Suppresses the output of this ParserElement; useful to keep punctuation from cluttering up returned
output.
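The following short comparison shows the same grammar with and without suppress() on the comma. Only the suppressed version drops the punctuation from the results.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
comma = pp.Literal(",")

kept = (word + comma + word).parseString("ham, eggs").asList()
dropped = (word + comma.suppress() + word).parseString("ham, eggs").asList()

print(kept, dropped)
```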
transformString(instring)
Extension to scanString, to modify matching text with modified tokens that may be returned from a
parse action. To use transformString, define a grammar and attach a parse action to it that modifies
the returned token list. Invoking transformString() on a target string will then scan for matches,
and replace the matched text patterns according to the logic in the parse action. transformString()
returns the resulting transformed string.
Example:
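from pyparsing import Word, alphas
wd = Word(alphas)
wd.setParseAction(lambda toks: toks[0].title())
print(wd.transformString("now is the winter of our discontent made glorious summer by this sun of york."))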
prints:
Now Is The Winter Of Our Discontent Made Glorious Summer By This Sun Of York.
from pyparsing import QuotedString
qs = QuotedString('"')
print(qs.searchString('lsjdf "This is the quote" sldjf'))
complex_qs = QuotedString('{{', endQuoteChar='}}')
print(complex_qs.searchString('lsjdf {{This is the "quote"}} sldjf'))
sql_qs = QuotedString('"', escQuote='""')
print(sql_qs.searchString('lsjdf "This is the quote with ""embedded"" quotes" sldjf'))
prints:
[['This is the quote']]
[['This is the "quote"']]
[['This is the quote with "embedded" quotes']]
from pyparsing import Regex
realnum = Regex(r"[+-]?\d+\.\d*")
# ref: https://stackoverflow.com/questions/267399/how-do-you-match-only-valid-roman-numerals-with-a-regular-expression
roman = Regex(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
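This sketch exercises these patterns with matches(). It also adds a pattern with named groups, whose names become results names. The roman-numeral pattern below spells the hundreds group as D?C{0,3}, since Python's re module rejects a bare D?{0,3} as a multiple repeat.

```python
import pyparsing as pp

realnum = pp.Regex(r"[+-]?\d+\.\d*")
roman = pp.Regex(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
# named groups in the pattern become results names on the parsed output
date = pp.Regex(r"(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)")

parsed_date = date.parseString("1999-12-31")
print(realnum.matches("3.14"), roman.matches("MCMXCIX"), parsed_date["year"])
```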
from pyparsing import Regex
make_html = Regex(r"(\w+):(.*?):").sub(r"<\1>\2</\1>")
print(make_html.transformString("h1:main title:"))
# prints "<h1>main title</h1>"
report = '''
Outstanding Issues Report - 1 Jan 2000
# - parse action will call token.strip() for each matched token, i.e., the description body
prints:
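from pyparsing import Word, alphas, ZeroOrMore, Suppress
source = "a, b, c,d"  # sample input
wd = Word(alphas)
# often, delimiters that are useful during parsing are just in the
# way afterward - use Suppress to keep them out of the parsed output
wd_list2 = wd + ZeroOrMore(Suppress(',') + wd)
print(wd_list2.parseString(source))
prints:
['a', 'b', 'c', 'd']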
• punc8bit (non-alphabetic characters in ASCII range 128-255 - currency, symbols, superscripts, diacriticals,
etc.)
• printables (any non-whitespace character)
Example:
A short-cut class for defining Word(characters, exact=1), when defining a match of any single character
in a string of characters.
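Here is a short sketch. Char matches exactly one character from the given set, so OneOrMore is needed to consume several.

```python
import pyparsing as pp

# each Char matches exactly one character from the given set
vowel = pp.Char("aeiou")
letters = pp.OneOrMore(vowel).parseString("aei").asList()
print(letters)
```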
pyparsing.col(loc, strg)
Returns current column within a string, counting newlines as line separators. The first column is number 1.
Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process.
See ParserElement.parseString for more information on parsing strings containing <TAB> s, and
suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column
positions within the parsed string.
pyparsing.countedArray(expr, intExpr=None)
Helper to define a counted list of expressions.
This helper defines a pattern of the form:
integer expr expr expr...
where the leading integer tells how many expr expressions follow. The matched tokens return the array of expr
tokens as a list; the leading count token is suppressed.
If intExpr is specified, it should be a pyparsing expression that produces an integer value.
Example:
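from pyparsing import Word, alphas, alphanums, nums, countedArray
integer = Word(nums).setParseAction(lambda t: int(t[0]))
# if other fields must be parsed after the count but before the
# list items, give the fields results names and they will
# be preserved in the returned ParseResults:
count_with_metadata = integer + Word(alphas)("type")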
typed_array = countedArray(Word(alphanums), intExpr=count_with_metadata)("items")
result = typed_array.parseString("3 bool True True False")
print(result.dump())
# prints
# ['True', 'True', 'False']
# - items: ['True', 'True', 'False']
# - type: 'bool'
pyparsing.dictOf(key, value)
Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value. Takes
care of defining the Dict, ZeroOrMore, and Group tokens in the proper order. The key pattern can include
delimiting markers or punctuation, as long as they are suppressed, thereby leaving the significant key text. The
value pattern can include named results, so that the Dict results can include named token fields.
Example:
from pyparsing import Word, alphas, Suppress, OneOrMore, FollowedBy, dictOf
data_word = Word(alphas)
label = data_word + FollowedBy(':')
text = "shape: SQUARE posn: upper left color: light blue texture: burlap"
attr_expr = (label + Suppress(':') + OneOrMore(data_word, stopOn=label).setParseAction(' '.join))
print(OneOrMore(attr_expr).parseString(text).dump())
attr_label = label
attr_value = Suppress(':') + OneOrMore(data_word, stopOn=label).setParseAction(' '.join)
prints:
[['shape', 'SQUARE'], ['posn', 'upper left'], ['color', 'light blue'], ['texture', 'burlap']]
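The following is a self-contained sketch of dictOf on the same text. The definitions of data_word and label (a word immediately followed by a colon) are assumptions made so that the snippet runs on its own.

```python
import pyparsing as pp

# assumed helper definitions: a label is a word immediately followed by ':'
data_word = pp.Word(pp.alphas)
label = data_word + pp.FollowedBy(":")
# the value is every following word up to the next label, joined with spaces
attr_value = pp.Suppress(":") + pp.OneOrMore(data_word, stopOn=label).setParseAction(" ".join)

text = "shape: SQUARE posn: upper left color: light blue texture: burlap"
result = pp.dictOf(label, attr_value).parseString(text)
print(result["shape"], result.asDict())
```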
pyparsing.line(loc, strg)
Returns the line of text containing loc within a string, counting newlines as line separators.
pyparsing.lineno(loc, strg)
Returns current line number within a string, counting newlines as line separators. The first line is number 1.
Note - the default parsing behavior is to expand tabs in the input string before starting the parsing process.
See ParserElement.parseString for more information on parsing strings containing <TAB> s, and
suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column
positions within the parsed string.
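The sketch below applies the three location helpers (line, lineno, and col) to a two-line string. Here loc 5 points at the "e" in the second line.

```python
import pyparsing as pp

text = "abc\ndef"
loc = text.index("e")  # 5: the second character of the second line

line_number = pp.lineno(loc, text)  # lines are numbered from 1
column = pp.col(loc, text)          # columns are numbered from 1
line_text = pp.line(loc, text)      # the full text of the line containing loc

print(line_number, column, repr(line_text))
```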
pyparsing.makeHTMLTags(tagStr)
Helper to construct opening and closing tag expressions for HTML, given a tag name. Matches tags in either
upper or lower case, attributes with namespaces and with quoted or unquoted values.
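This sketch extracts a link and its href attribute using the tag expressions returned by makeHTMLTags. The sample markup is illustrative. Note that the "A" tag name also matches the lowercase tag in the input.

```python
import pyparsing as pp

text = '<td>More info at the <a href="https://github.com/pyparsing/pyparsing/wiki">pyparsing</a> wiki page</td>'

# opening and closing tag expressions for <A>, matched without regard to case
a, a_end = pp.makeHTMLTags("A")
link_expr = a + pp.SkipTo(a_end)("link_text") + a_end

# tag attributes such as href are available as named results
links = [(link.link_text, link.href) for link in link_expr.searchString(text)]
print(links)
```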
Example:
text = '<td>More info at the <a href="https://github.com/pyparsing/pyparsing/wiki">pyparsing</a> wiki page</td>'
prints:
pyparsing.makeXMLTags(tagStr)
Helper to construct opening and closing tag expressions for XML, given a tag name. Matches tags only in the
given upper/lower case.
Example: similar to makeHTMLTags
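Because makeXMLTags is case-sensitive, an expression like the following only picks up the lowercase tag. The item tag name and the sample text are made up for this sketch.

```python
import pyparsing as pp

item, item_end = pp.makeXMLTags("item")
item_expr = item + pp.SkipTo(item_end)("body") + item_end

xml = "<ITEM>upper</ITEM><item>lower</item>"
# only the tag spelled exactly "item" is matched
bodies = [match.body for match in item_expr.searchString(xml)]
print(bodies)
```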
pyparsing.matchOnlyAtCol(n)
Helper method for defining parse actions that require matching at a specific column in the input text.
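In this sketch, matchOnlyAtCol(1) serves as a parse action, so only words that begin a line are accepted. The sample text is illustrative.

```python
import pyparsing as pp

# reject any word whose match does not start in column 1
word_at_col1 = pp.Word(pp.alphas).setParseAction(pp.matchOnlyAtCol(1))

text = "alpha beta\ngamma delta"
found = [t[0] for t in word_at_col1.searchString(text)]
print(found)
```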
pyparsing.matchPreviousExpr(expr)
Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that
is, it looks for a ‘repeat’ of a previous expression. For example:
from pyparsing import Word, nums, matchPreviousExpr
first = Word(nums)
second = matchPreviousExpr(first)
matchExpr = first + ":" + second
will match "1:1", but not "1:2". Because this matches by expressions, it will not match the leading "1:1" in
"1:10"; the expressions are evaluated first and then compared, so "1" is compared with "10". Do not use
with packrat parsing enabled.
pyparsing.matchPreviousLiteral(expr)
Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that
is, it looks for a ‘repeat’ of a previous expression. For example:
from pyparsing import Word, nums, matchPreviousLiteral
first = Word(nums)
second = matchPreviousLiteral(first)
matchExpr = first + ":" + second
will match "1:1", but not "1:2". Because this matches a previous literal, it will also match the leading "1:1"
in "1:10". If this is not desired, use matchPreviousExpr. Do not use with packrat parsing enabled.
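The difference between the two helpers can be checked directly. Here parses is a small illustrative helper. Separate Word instances are used because each helper attaches a parse action to the expression it is given.

```python
import pyparsing as pp

def parses(expr, text):
    # True if expr matches at the start of text (trailing text is allowed)
    try:
        expr.parseString(text)
        return True
    except pp.ParseException:
        return False

first = pp.Word(pp.nums)
by_expr = first + ":" + pp.matchPreviousExpr(first)

first_lit = pp.Word(pp.nums)
by_literal = first_lit + ":" + pp.matchPreviousLiteral(first_lit)

print(parses(by_expr, "1:10"), parses(by_literal, "1:10"))
```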
pyparsing.nestedExpr(opener='(', closer=')', content=None, ignoreExpr=quotedString using single or double quotes)
Helper method for defining nested lists enclosed in opening and closing delimiters (“(” and “)” are the default).
Parameters:
• opener - opening character for a nested list (default= "("); can also be a pyparsing expression
• closer - closing character for a nested list (default= ")"); can also be a pyparsing expression
• content - expression for items within the nested lists (default= None)
• ignoreExpr - expression for ignoring opening and closing delimiters (default= quotedString)
If an expression is not provided for the content argument, the nested expression will capture all whitespace-
delimited content between delimiters as a list of separate values.
Use the ignoreExpr argument to define expressions that may contain opening or closing characters that
should not be treated as opening or closing characters for nesting, such as quotedString or a comment expres-
sion. Specify multiple expressions using an Or or MatchFirst. The default is quotedString, but if no
expressions are to be ignored, then pass None for this argument.
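This short sketch shows three cases: the default behavior, custom delimiters, and the default quotedString ignore expression keeping a quoted ")" from closing the list.

```python
import pyparsing as pp

parens = pp.nestedExpr()             # defaults: "(" and ")"
brackets = pp.nestedExpr("[", "]")

nested = parens.parseString("(a (b c) d)").asList()
nested_brackets = brackets.parseString("[1 [2 3]]").asList()
# quoted strings are ignored for nesting, so the ")" inside quotes is not a closer
quoted = parens.parseString('(say ")" twice)').asList()

print(nested, nested_brackets, quoted)
```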
Example:
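from pyparsing import (oneOf, Combine, Optional, Word, alphas, alphanums, Group, Suppress,
                       nestedExpr, quotedString, cStyleComment, delimitedList, pyparsing_common)
data_type = oneOf("void int short long char float double")
decl_data_type = Combine(data_type + Optional(Word('*')))
ident = Word(alphas+'_', alphanums+'_')
number = pyparsing_common.number
arg = Group(decl_data_type + ident)
LPAR, RPAR = map(Suppress, "()")
code_body = nestedExpr('{', '}', ignoreExpr=(quotedString | cStyleComment))
c_function = (decl_data_type("type")
              + ident("name")
              + LPAR + Optional(delimitedList(arg), [])("args") + RPAR
              + code_body("body"))
c_function.ignore(cStyleComment)
source_code = '''
    int is_odd(int x) {
        return (x%2);
    }
    int dec_to_hex(char hchar) {
        if (hchar >= '0' && hchar <= '9') {
            return (ord(hchar)-ord('0'));
        } else {
            return (10+ord(hchar)-ord('A'));
        }
    }
'''
for func in c_function.searchString(source_code):
    print("%(name)s (%(type)s) args: %(args)s" % func)
prints:
is_odd (int) args: [['int', 'x']]
dec_to_hex (int) args: [['char', 'hchar']]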
pyparsing.nullDebugAction(*args)
‘Do-nothing’ debug action, to suppress debugging output during parsing.
pyparsing.oneOf(strs, caseless=False, useRegex=True, asKeyword=False)
Helper to quickly define a set of alternative Literals. It makes sure to do longest-first testing when there
is a conflict, regardless of the input order, but returns a MatchFirst for best performance.
Parameters:
• strs - a string of space-delimited literals, or a collection of string literals
• caseless - (default= False) - treat all literals as caseless
• useRegex - (default= True) - as an optimization, will generate a Regex object; otherwise, will generate
a MatchFirst object (if caseless=True or asKeyword=True, or if creating a Regex raises an
exception)
• asKeyword - (default= False) - enforce Keyword-style matching on the generated expressions
Example:
prints:
[['B', '=', '12'], ['AA', '=', '23'], ['B', '<=', 'AA'], ['AA', '>', '12']]
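Below is a sketch of a comparison grammar that produces the output shown above. The operand definitions and the sample input are assumptions chosen to match it. The last line shows longest-first testing: "<=" is matched whole even though "<" appears earlier in the list.

```python
import pyparsing as pp

comp_oper = pp.oneOf("< = > <= >= !=")
var = pp.Word(pp.alphas)
number = pp.Word(pp.nums)
term = var | number
comparison_expr = term + comp_oper + term

found = comparison_expr.searchString("B = 12 AA=23 B<=AA AA>12").asList()
print(found)

# longest-first: "<=" is matched as one operator, not as "<" followed by "="
longest = comp_oper.parseString("<=")[0]
```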
pyparsing.removeQuotes(s, l, t)
Helper parse action for removing quotation marks from parsed quoted strings.
Example:
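A sketch follows. The module-level quotedString is copied first, so adding the parse action does not change the shared expression.

```python
import pyparsing as pp

text = "'Now is the Winter of our Discontent'"

# by default, the quotation marks are kept in the parsed result
raw = pp.quotedString.parseString(text)[0]

# a copy with removeQuotes strips them, leaving the shared quotedString untouched
unquoted = pp.quotedString.copy().setParseAction(pp.removeQuotes)
stripped = unquoted.parseString(text)[0]

print(raw, stripped)
```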
pyparsing.replaceHTMLEntity(t)
Helper parse action to replace common HTML entities with their special characters
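This sketch pairs it with the module's commonHTMLEntity expression (copied, so the shared expression is left unchanged) and transformString. The sample text is illustrative.

```python
import pyparsing as pp

# copy the module-level expression before attaching the parse action
entity = pp.commonHTMLEntity.copy().setParseAction(pp.replaceHTMLEntity)

decoded = entity.transformString("a &lt; b &amp;&amp; c &gt; d")
print(decoded)
```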
pyparsing.replaceWith(replStr)
Helper method for common parse actions that simply return a literal value. Especially useful when used with
transformString().
Example:
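The following sketch replaces placeholder values in running text. The placeholder strings and the replacement value are illustrative.

```python
import pyparsing as pp

# every match of "N/A" or "NA" is replaced by the literal string "unknown"
placeholder = pp.oneOf("N/A NA").setParseAction(pp.replaceWith("unknown"))

fixed = placeholder.transformString("age: N/A, height: 180")
print(fixed)
```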
pyparsing.srange(s)
Helper to easily define string ranges for use in Word construction. Borrows syntax from regexp ‘[]’ string range
definitions:
The input string must be enclosed in []’s, and the returned string is the expanded character set joined into a
single string. The values enclosed in the []’s may be:
• a single character
• an escaped character with a leading backslash (such as \- or \])
• an escaped hex character with a leading '\x' (\x21, which is a '!' character) (\0x## is also supported
for backwards compatibility)
• an escaped octal character with a leading '\0' (\041, which is a '!' character)
• a range of any of the above, separated by a dash ('a-z', etc.)
• any combination of the above ('aeiouy', 'a-zA-Z0-9_$', etc.)
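A few expansions are checked in this sketch. The last one uses the escaped hex form.

```python
import pyparsing as pp

digits = pp.srange("[0-9]")
ident_chars = pp.srange("[a-z$_]")
bang = pp.srange(r"[\x21]")  # escaped hex character, '!'

print(digits, ident_chars, bang)
```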
pyparsing.traceParseAction(f)
Decorator for debugging parse actions.
When the parse action is called, this decorator will print ">> entering
method-name(line:<current_source_line>, <parse_location>,
<matched_tokens>)". When the parse action completes, the decorator will print "<<" followed
by the returned value, or any exception that the parse action raised.
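This sketch captures the trace instead of letting it go to the console. It assumes the decorator writes its messages to sys.stderr. It checks only the leading ">>entering" and "<<leaving" markers rather than the full message format.

```python
import contextlib
import io

import pyparsing as pp

@pp.traceParseAction
def remove_duplicate_chars(tokens):
    # collapse all matched words into their sorted set of distinct characters
    return "".join(sorted(set("".join(tokens))))

wds = pp.OneOrMore(pp.Word(pp.alphas)).setParseAction(remove_duplicate_chars)

trace = io.StringIO()
with contextlib.redirect_stderr(trace):
    result = wds.parseString("slkdjs sld sldd sdlf sdljf")

print(result.asList())
print(trace.getvalue())
```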
Example:
from pyparsing import Word, alphas, OneOrMore, traceParseAction
wd = Word(alphas)
@traceParseAction
def remove_duplicate_chars(tokens):
    return ''.join(sorted(set(''.join(tokens))))
wds = OneOrMore(wd).setParseAction(remove_duplicate_chars)
print(wds.parseString("slkdjs sld sldd sdlf sdljf"))
prints:
pyparsing.withAttribute(*args, **attrDict)
Helper to create a validating parse action to be used with start tags created with makeXMLTags or
makeHTMLTags. Use withAttribute to qualify a starting tag with a required attribute value, to avoid
false matches on common tags such as <TD> or <DIV>.
Call withAttribute with a series of attribute names and values. Specify the list of filter attributes names
and values as:
• keyword arguments, as in (align="right"), or
• as an explicit dict with ** operator, when an attribute name is also a Python reserved word, as in
**{"class":"Customer", "align":"right"}
• a list of name-value tuples, as in (("ns1:class", "Customer"), ("ns2:align",
"right"))
Attribute names with a namespace prefix (such as "ns1:class") cannot be passed as keyword arguments; use the
explicit dict or the list of name-value tuples instead. Attribute names are matched insensitive to upper/lower case.
If just testing for class (with or without a namespace), use withClass.
To verify that the attribute exists, but without specifying a value, pass withAttribute.ANY_VALUE as the
value.
Example:
html = '''
<div>
Some text
'''
div,div_end = makeHTMLTags("div")
# only match div tag having a type attribute with value "grid"
div_grid = div().setParseAction(withAttribute(type="grid"))
grid_expr = div_grid + SkipTo(div | div_end)("body")
for grid_header in grid_expr.searchString(html):
print(grid_header.body)
# construct a match with any div tag having a type attribute, regardless of the value
div_any_type = div().setParseAction(withAttribute(type=withAttribute.ANY_VALUE))
div_expr = div_any_type + SkipTo(div | div_end)("body")
for div_header in div_expr.searchString(html):
print(div_header.body)
prints:
1 4 0 1 0
1 4 0 1 0
1,3 2,3 1,1
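Here is a self-contained sketch of the same two searches. The HTML sample, including the "graph" type value, is illustrative.

```python
import pyparsing as pp

# illustrative markup; the type values are made up for this sketch
html = """
<div>
Some text
<div type="grid">1 4 0 1 0</div>
<div type="graph">1,3 2,3 1,1</div>
<div>this has no type</div>
</div>
"""

div, div_end = pp.makeHTMLTags("div")

# only div tags whose type attribute is exactly "grid"
div_grid = div().setParseAction(pp.withAttribute(type="grid"))
grid_expr = div_grid + pp.SkipTo(div | div_end)("body")
grid_bodies = [m.body for m in grid_expr.searchString(html)]

# any div tag that has a type attribute, whatever its value
div_any_type = div().setParseAction(pp.withAttribute(type=pp.withAttribute.ANY_VALUE))
any_expr = div_any_type + pp.SkipTo(div | div_end)("body")
any_bodies = [m.body for m in any_expr.searchString(html)]

print(grid_bodies, any_bodies)
```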
data = '''
def A(z):
A1
B = 100
G = A2
A2
A3
B
def BB(a,b,c):
BB1
def BBA():
bba1
indentStack = [1]
stmt = Forward()
rvalue = Forward()
funcCall = Group(identifier + "(" + Optional(delimitedList(rvalue)) + ")")
rvalue << (funcCall | identifier | Word(nums))
assignment = Group(identifier + "=" + rvalue)
stmt << (funcDef | assignment | identifier)
module_body = OneOrMore(stmt)
parseTree = module_body.parseString(data)
parseTree.pprint()
prints:
[['def',
'A',
['(', 'z', ')'],
':',
[['A1'], [['B', '=', '100']], [['G', '=', 'A2']], ['A2'], ['A3']]],
'B',
['def',
'BB',
['(', 'a', 'b', 'c', ')'],
':',
[['BB1'], [['def', 'BBA', ['(', ')'], ':', [['bba1'], ['bba2'], ['bba3']]]]]],
'C',
'D',
['def',
'spam',
['(', 'x', 'y', ')'],
':',
[[['def', 'eggs', ['(', 'z', ')'], ':', [['pass']]]]]]]
pyparsing.originalTextFor(expr, asString=True)
Helper to return the original, untokenized text for a given expression. Useful to restore the parsed fields of an
HTML start tag into the raw tag text itself, or to revert separate tokens with intervening whitespace back to the
original matching input text. By default, returns a string containing the original parsed text.
If the optional asString argument is passed as False, then the return value is a ParseResults containing
any results names that were originally matched, and a single token containing the original matched text from
the input string. So if the expression passed to originalTextFor contains expressions with defined results
names, you must set asString to False if you want to preserve those results name values.
Example:
from pyparsing import makeHTMLTags, originalTextFor, SkipTo
src = "this is test <b> bold <i>text</i> </b> normal text "
for tag in ("b", "i"):
    opener, closer = makeHTMLTags(tag)
    patt = originalTextFor(opener + SkipTo(closer) + closer)
    print(patt.searchString(src)[0])
prints:
['<b> bold <i>text</i> </b>']
['<i>text</i>']
pyparsing.ungroup(expr)
Helper to undo pyparsing’s default grouping of And expressions, even if all but one are non-empty.
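This small sketch shows the effect. The Group wraps the matched tokens in a nested list, and ungroup removes that outer level.

```python
import pyparsing as pp

grouped = pp.Group(pp.Word(pp.alphas) + pp.Word(pp.nums))
ungrouped = pp.ungroup(grouped)

with_group = grouped.parseString("abc 123").asList()
without_group = ungrouped.parseString("abc 123").asList()
print(with_group, without_group)
```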
pyparsing.infixNotation(baseExpr, opList, lpar=Suppress('('), rpar=Suppress(')'))
Helper method for constructing grammars of expressions made up of operators working in a precedence hierarchy.
Operators may be unary or binary, left- or right-associative. Parse actions can also be attached to operator
expressions. The generated parser will also recognize the use of parentheses to override operator precedences
(see example below).
Note: if you define a deep operator list, you may see performance issues when using infixNotation. See
ParserElement.enablePackrat for a mechanism to potentially improve your parser performance.
Parameters:
• baseExpr - expression representing the most basic operand of the nested expression
• opList - list of tuples, one for each operator precedence level in the expression grammar; each tuple is
of the form (opExpr, numTerms, rightLeftAssoc, parseAction), where:
– opExpr is the pyparsing expression for the operator; may also be a string, which will be converted
to a Literal; if numTerms is 3, opExpr is a tuple of two expressions, for the two operators
separating the 3 terms
– numTerms is the number of terms for this operator (must be 1, 2, or 3)
– rightLeftAssoc is the indicator whether the operator is right or left associative, using the
pyparsing-defined constants opAssoc.RIGHT and opAssoc.LEFT.
– parseAction is the parse action to be associated with expressions matching this operator expression
(the parse action tuple member may be omitted); if the parse action is passed a tuple or
list of functions, this is equivalent to calling setParseAction(*fn) (ParserElement.setParseAction)
• lpar - expression for matching left-parentheses (default= Suppress('('))
• rpar - expression for matching right-parentheses (default= Suppress(')'))
Example:
arith_expr.runTests('''
5+3*6
(5+3)*6
-2--11
''', fullDump=False)
prints:
5+3*6
[[5, '+', [3, '*', 6]]]
(5+3)*6
[[[5, '+', 3], '*', 6]]
-2--11
[[['-', 2], '-', ['-', 11]]]
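Below is a sketch of an arith_expr definition consistent with the runTests output shown above. The use of signed_integer and identifier as operands, and the exact operator table, are assumptions.

```python
import pyparsing as pp

integer = pp.pyparsing_common.signed_integer
varname = pp.pyparsing_common.identifier

# precedence levels, highest first: unary minus, then * and /, then + and -
arith_expr = pp.infixNotation(integer | varname, [
    ("-", 1, pp.opAssoc.RIGHT),
    (pp.oneOf("* /"), 2, pp.opAssoc.LEFT),
    (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT),
])

def parse(text):
    # parse the whole string and return the nested token lists
    return arith_expr.parseString(text, parseAll=True).asList()

print(parse("5+3*6"))
```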
pyparsing.locatedExpr(expr)
Helper to decorate a returned token with its starting and ending locations in the input string.
This helper adds the following results names:
• locn_start = location where matched expression begins
• locn_end = location where matched expression ends
• value = the actual parsed results
If the input text contains <TAB> characters, you may want to call ParserElement.parseWithTabs so that
the reported locations refer to the original, unexpanded string.
Example:
from pyparsing import Word, alphas, locatedExpr
wd = Word(alphas)
for match in locatedExpr(wd).searchString("ljsdf123lksdjjf123lkkjj1222"):
    print(match)
prints:
[[0, 'ljsdf', 5]]
[[8, 'lksdjjf', 15]]
[[18, 'lkkjj', 23]]
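The named results can be read from each match. Since locatedExpr wraps its result in a Group, each match holds one group. A sketch:

```python
import pyparsing as pp

wd = pp.Word(pp.alphas)
spans = []
for match in pp.locatedExpr(wd).searchString("ljsdf123lksdjjf123lkkjj1222"):
    located = match[0]  # the Group created by locatedExpr
    spans.append((located.locn_start, located.value, located.locn_end))

print(spans)
```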
pyparsing.withClass(classname, namespace='')
Simplified version of withAttribute for matching on a tag's class attribute, which is awkward to pass as a keyword
argument because class is a reserved word in Python.
Example:
html = '''
<div>
Some text
'''
div,div_end = makeHTMLTags("div")
div_grid = div().setParseAction(withClass("grid"))
div_any_type = div().setParseAction(withClass(withAttribute.ANY_VALUE))
div_expr = div_any_type + SkipTo(div | div_end)("body")
for div_header in div_expr.searchString(html):
print(div_header.body)
prints:
1 4 0 1 0
1 4 0 1 0
1,3 2,3 1,1
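Here is a self-contained sketch. The class names in the sample markup are illustrative.

```python
import pyparsing as pp

# illustrative markup; the class names are made up for this sketch
html = '<div class="grid">1 4 0 1 0</div><div class="graph">1,3 2,3 1,1</div><div>no class</div>'

div, div_end = pp.makeHTMLTags("div")

# only div tags with class="grid"
div_grid = div().setParseAction(pp.withClass("grid"))
grid_expr = div_grid + pp.SkipTo(div | div_end)("body")
grid_bodies = [m.body for m in grid_expr.searchString(html)]

# any div tag that has a class attribute
div_any = div().setParseAction(pp.withClass(pp.withAttribute.ANY_VALUE))
any_expr = div_any + pp.SkipTo(div | div_end)("body")
any_bodies = [m.body for m in any_expr.searchString(html)]

print(grid_bodies, any_bodies)
```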
# exact match
patt.parseString("ATCATCGAATGGA") # -> (['ATCATCGAATGGA'], {'mismatches': [[]], 'original': ['ATCATCGAATGGA']})
from pyparsing import Word, alphas, OneOrMore, tokenMap
upperword = Word(alphas).setParseAction(tokenMap(str.upper))
OneOrMore(upperword).runTests('''
my kingdom for a horse
''')
wd = Word(alphas).setParseAction(tokenMap(str.title))
OneOrMore(wd).setParseAction(' '.join).runTests('''
now is the winter of our discontent made glorious summer by this sun of york
''')
prints:
00 11 22 aa FF 0a 0d 1a
[0, 17, 34, 170, 255, 10, 13, 26]
now is the winter of our discontent made glorious summer by this sun of york
['Now Is The Winter Of Our Discontent Made Glorious Summer By This Sun Of York']
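This sketch reproduces the hex conversion output shown above as well as the uppercase example. Extra arguments to tokenMap (here the base 16) are passed to the function after each token.

```python
import pyparsing as pp

# tokenMap applies a function to every token; extra args follow the token
hex_ints = pp.OneOrMore(pp.Word(pp.hexnums)).setParseAction(pp.tokenMap(int, 16))
upperword = pp.Word(pp.alphas).setParseAction(pp.tokenMap(str.upper))

values = hex_ints.parseString("00 11 22 aa FF 0a 0d 1a").asList()
shouted = pp.OneOrMore(upperword).parseString("my kingdom for a horse").asList()
print(values, shouted)
```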
class pyparsing.pyparsing_common
Bases: object
Here are some common low-level expressions that may be useful in jump-starting parser development:
• numeric forms (integers, reals, scientific notation)
• common programming identifiers
• network addresses (MAC, IPv4, IPv6)
• ISO8601 dates and datetime
• UUID
• comma-separated list
Parse actions:
• convertToInteger
• convertToFloat
• convertToDate
• convertToDatetime
• stripHTMLTags
• upcaseTokens
• downcaseTokens
Example:
from pyparsing import pyparsing_common, tokenMap
pyparsing_common.number.runTests('''
# any int or real number, returned as the appropriate type
100
-100
+100
3.14159
6.02e23
1e-12
''')
pyparsing_common.fnumber.runTests('''
# any int or real number, returned as float
100
-100
+100
3.14159
6.02e23
1e-12
''')
pyparsing_common.hex_integer.runTests('''
# hex numbers
100
FF
''')
pyparsing_common.fraction.runTests('''
# fractions
1/2
-3/4
''')
pyparsing_common.mixed_integer.runTests('''
# mixed fractions
1
1/2
-3/4
1-3/4
''')
import uuid
pyparsing_common.uuid.setParseAction(tokenMap(uuid.UUID))
pyparsing_common.uuid.runTests('''
# uuid
12345678-1234-5678-1234-567812345678
''')
prints:
# any int or real number, returned as the appropriate type
100
[100]
-100
[-100]
+100
[100]
3.14159
[3.14159]
6.02e23
[6.02e+23]
1e-12
[1e-12]
# any int or real number, returned as float
100
[100.0]
-100
[-100.0]
+100
[100.0]
3.14159
[3.14159]
6.02e23
[6.02e+23]
1e-12
[1e-12]
# hex numbers
100
[256]
FF
[255]
# fractions
1/2
[0.5]
-3/4
[-0.75]
# mixed fractions
1
[1]
1/2
[0.5]
-3/4
[-0.75]
1-3/4
[1.75]
# uuid
12345678-1234-5678-1234-567812345678
[UUID('12345678-1234-5678-1234-567812345678')]
from pyparsing import pyparsing_common
date_expr = pyparsing_common.iso8601_date.copy()
date_expr.setParseAction(pyparsing_common.convertToDate())
print(date_expr.parseString("1999-12-31"))
prints:
[datetime.date(1999, 12, 31)]
static convertToDatetime(fmt='%Y-%m-%dT%H:%M:%S.%f')
Helper to create a parse action for converting parsed datetime string to Python datetime.datetime
Params -
• fmt - format to be passed to datetime.strptime (default= "%Y-%m-%dT%H:%M:%S.%f")
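The sketch below uses the fmt parameter with a day/month/year timestamp. The Regex and the format string are illustrative. An impossible date makes strptime fail, and that failure surfaces as a ParseException.

```python
import datetime

import pyparsing as pp

# hypothetical day/month/year timestamp format, for illustration
stamp = pp.Regex(r"\d\d/\d\d/\d{4} \d\d:\d\d")
stamp.setParseAction(pp.pyparsing_common.convertToDatetime(fmt="%d/%m/%Y %H:%M"))

parsed = stamp.parseString("31/12/1999 23:59")[0]
try:
    stamp.parseString("31/02/1999 10:00")  # February 31st does not exist
    rejected = False
except pp.ParseException:
    rejected = True

print(parsed, rejected)
```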
Example:
from pyparsing import pyparsing_common
dt_expr = pyparsing_common.iso8601_datetime.copy()
dt_expr.setParseAction(pyparsing_common.convertToDatetime())
print(dt_expr.parseString("1999-12-31T23:59:59.999"))
prints:
[datetime.datetime(1999, 12, 31, 23, 59, 59, 999000)]
convertToFloat(l, t)
Parse action for converting parsed numbers to Python float
convertToInteger(l, t)
Parse action for converting parsed integers to Python int
static downcaseTokens(s, l, t)
Parse action to convert tokens to lower case.
fnumber = fnumber
any int or real number, returned as float
fraction = fraction
fractional expression of an integer divided by an integer, returns a float
print(table_text.parseString(text).body)
Prints:
static upcaseTokens(s, l, t)
Parse action to convert tokens to upper case.
uuid = UUID
UUID (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
class pyparsing.pyparsing_unicode
Bases: pyparsing.unicode.unicode_set
A namespace class for defining common language unicode_sets.
class Arabic
Bases: pyparsing.unicode.unicode_set
Unicode set for Arabic Unicode Character Range
class CJK
Bases: pyparsing.unicode.Chinese, pyparsing.unicode.Japanese, pyparsing.unicode.Hangul
Unicode set for combined Chinese, Japanese, and Korean (CJK) Unicode Character Range
class Chinese
Bases: pyparsing.unicode.unicode_set
Unicode set for Chinese Unicode Character Range
class Cyrillic
Bases: pyparsing.unicode.unicode_set
Unicode set for Cyrillic Unicode Character Range
class Devanagari
Bases: pyparsing.unicode.unicode_set
Unicode set for Devanagari Unicode Character Range
class Greek
Bases: pyparsing.unicode.unicode_set
Unicode set for Greek Unicode Character Ranges
class Hangul
Bases: pyparsing.unicode.unicode_set
Unicode set for Hangul (Korean) Unicode Character Range
class Hebrew
Bases: pyparsing.unicode.unicode_set
Unicode set for Hebrew Unicode Character Range
class Japanese
Bases: pyparsing.unicode.unicode_set
Unicode set for Japanese Unicode Character Range, combining Kanji, Hiragana, and Katakana ranges
class Hiragana
Bases: pyparsing.unicode.unicode_set
Unicode set for Hiragana Unicode Character Range
class Kanji
Bases: pyparsing.unicode.unicode_set
Unicode set for Kanji Unicode Character Range
class Katakana
Bases: pyparsing.unicode.unicode_set
Unicode set for Katakana Unicode Character Range
alias of pyparsing_unicode.Japanese.Hiragana
alias of pyparsing_unicode.Japanese.Katakana
alias of pyparsing_unicode.Japanese.Kanji
Korean
alias of pyparsing_unicode.Hangul
class Latin1
Bases: pyparsing.unicode.unicode_set
Unicode set for Latin-1 Unicode Character Range
class LatinA
Bases: pyparsing.unicode.unicode_set
Unicode set for Latin-A Unicode Character Range
class LatinB
Bases: pyparsing.unicode.unicode_set
Unicode set for Latin-B Unicode Character Range
class Thai
Bases: pyparsing.unicode.unicode_set
Unicode set for Thai Unicode Character Range
Ελληνικά
alias of pyparsing_unicode.Greek
alias of pyparsing_unicode.Cyrillic
alias of pyparsing_unicode.Arabic
alias of pyparsing_unicode.Thai
alias of pyparsing_unicode.Chinese
alias of pyparsing_unicode.Japanese
alias of pyparsing_unicode.Hangul
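This sketch builds a Word from the Greek set's alphas. The sample words are illustrative.

```python
import pyparsing as pp

# a Word made only of Greek letters
greek_word = pp.Word(pp.pyparsing_unicode.Greek.alphas)

words = pp.OneOrMore(greek_word).parseString("αβγ δεζ").asList()
print(words)
```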
class pyparsing.unicode_set
Bases: object
A set of Unicode characters, for language-specific strings for alphas, nums, alphanums, and
printables. A unicode_set is defined by a list of ranges in the Unicode character set, in a class attribute
_ranges, such as:
A unicode set can also be defined using multiple inheritance of other unicode sets:
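The following sketch shows both styles. LatinBasic is a hypothetical set defined by _ranges, and GreekAndLatin combines it with the Greek set by multiple inheritance.

```python
import pyparsing as pp

class LatinBasic(pp.unicode_set):
    # hypothetical set: printable ASCII plus the Latin-1 supplement
    _ranges = [(0x0020, 0x007E), (0x00A0, 0x00FF)]

class GreekAndLatin(pp.pyparsing_unicode.Greek, LatinBasic):
    # combined by multiple inheritance; the ranges of both bases are merged
    pass

word = pp.Word(LatinBasic.alphas)
mixed = pp.Word(GreekAndLatin.alphas)

print(word.parseString("café au lait")[0], mixed.parseString("alphaαβγ")[0])
```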
alphanums = ''
alphas = ''
nums = ''
printables = ''
pyparsing.conditionAsParseAction(fn, message=None, fatal=False)
Function to convert a simple predicate function that returns True or False into a parse action. Can be used in
places when a parse action is required and ParserElement.addCondition cannot be used (such as when adding a
condition to an operator level in infixNotation).
Optional keyword arguments:
• message = define a custom message to be used in the raised exception
• fatal = if True, will raise ParseFatalException to stop parsing immediately; otherwise will raise ParseException
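This sketch wraps a simple predicate. The limit of 100 and the message text are illustrative. Where addCondition is available, it is the more direct way to express the same check.

```python
import pyparsing as pp

# turn a True/False predicate into a parse action that raises on False
is_small = pp.conditionAsParseAction(lambda toks: int(toks[0]) < 100,
                                     message="value must be below 100")
small_int = pp.Word(pp.nums).addParseAction(is_small)

accepted = small_int.parseString("42")[0]
try:
    small_int.parseString("420")
except pp.ParseException as err:
    error_text = str(err)

print(accepted, error_text)
```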
class pyparsing.pyparsing_test
Bases: object
namespace class for classes useful in writing unit tests
class TestParseResultsAsserts
Bases: object
A mixin class to add parse results assertion methods to normal unittest.TestCase classes.
assertParseAndCheckDict(expr, test_string, expected_dict, msg=None, verbose=True)
Convenience wrapper assert to test a parser element and input string, and assert that the resulting
ParseResults.asDict() is equal to the expected_dict.
assertParseAndCheckList(expr, test_string, expected_list, msg=None, verbose=True)
Convenience wrapper assert to test a parser element and input string, and assert that the resulting
ParseResults.asList() is equal to the expected_list.
assertParseResultsEquals(result, expected_list=None, expected_dict=None, msg=None)
Unit test assertion to compare a ParseResults object with an optional expected_list, and
compare any defined results names with an optional expected_dict.
assertRaisesParseException(exc_type=<class 'pyparsing.exceptions.ParseException'>, msg=None)
assertRunTestResults(run_tests_report, expected_parse_results=None, msg=None)
Unit test assertion to evaluate output of ParserElement.runTests(). If a list of list-dict tuples
is given as the expected_parse_results argument, then these are zipped with the report tuples
returned by runTests and evaluated using assertParseResultsEquals. Finally, asserts that
the overall runTests() success value is True.
Parameters
• run_tests_report – tuple(bool, [tuple(str, ParseResults or Exception)]) returned
from runTests
• expected_parse_results (optional) – [tuple(str, list, dict, Exception)]
class reset_pyparsing_context
Bases: object
Context manager to be used when writing unit tests that modify pyparsing config values:
• packrat parsing
• default whitespace characters.
• default keyword characters
• literal string auto-conversion class
• __diag__ settings
Example:
with reset_pyparsing_context():
    # test that literals used to construct a grammar are automatically suppressed
    ParserElement.inlineLiteralsUsing(Suppress)
    # assert that the '()' characters are not included in the parsed tokens
    self.assertParseAndCheckList(group, "(abc 123 def)", ['abc', '123', 'def'])
copy()
restore()
save()
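Here is a self-contained sketch of the mixin and the context manager inside a unittest.TestCase. The test names are illustrative, and run_suite is a helper that runs the case quietly. After the with block, plain strings in grammars are again converted to ordinary Literals.

```python
import io
import unittest

import pyparsing as pp
from pyparsing import pyparsing_test as ppt


class SuppressedLiteralsTest(ppt.TestParseResultsAsserts, unittest.TestCase):
    def test_parens_are_suppressed(self):
        with ppt.reset_pyparsing_context():
            # inside the context, plain strings in a grammar become Suppress objects
            pp.ParserElement.inlineLiteralsUsing(pp.Suppress)
            term = pp.Word(pp.alphas) | pp.Word(pp.nums)
            group = "(" + pp.OneOrMore(term) + ")"
            # the '()' characters should not appear in the parsed tokens
            self.assertParseAndCheckList(group, "(abc 123 def)", ["abc", "123", "def"], verbose=False)


def run_suite():
    # run the test case without console noise and report overall success
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SuppressedLiteralsTest)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()


print(run_suite())
```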
CHAPTER 4
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making
participation in our project and our community a harassment-free experience for everyone, regardless of age, body size,
disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic
status, nationality, personal appearance, race, religion, or sexual identity and orientation.
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate
and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits,
issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any
contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
4.4 Scope
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the
project or its community. Examples of representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed representative at an online or offline
event. Representation of a project may be further defined and clarified by project maintainers.
4.5 Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at
[email protected]. All complaints will be reviewed and investigated and will result in a response that is deemed
necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard
to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent
repercussions as determined by other members of the project’s leadership.
4.6 Attribution
This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
p
pyparsing, 25
Index
A
addCondition() (pyparsing.ParserElement method), 44
addParseAction() (pyparsing.ParserElement method), 45
alphanums (pyparsing.unicode_set attribute), 77
alphas (pyparsing.unicode_set attribute), 77
And (class in pyparsing), 26
append() (pyparsing.ParseExpression method), 38
append() (pyparsing.ParseResults method), 40
asDict() (pyparsing.ParseResults method), 40
asList() (pyparsing.ParseResults method), 41
assertParseAndCheckDict() (pyparsing.pyparsing_test.TestParseResultsAsserts method), 77
assertParseAndCheckList() (pyparsing.pyparsing_test.TestParseResultsAsserts method), 77
assertParseResultsEquals() (pyparsing.pyparsing_test.TestParseResultsAsserts method), 77
assertRaisesParseException() (pyparsing.pyparsing_test.TestParseResultsAsserts method), 77
assertRunTestResults() (pyparsing.pyparsing_test.TestParseResultsAsserts method), 77
C
canParseNext() (pyparsing.ParserElement method), 45
CaselessKeyword (class in pyparsing), 26
CaselessLiteral (class in pyparsing), 26
Char (class in pyparsing), 58
CharsNotIn (class in pyparsing), 27
clear() (pyparsing.ParseResults method), 41
CloseMatch (class in pyparsing), 69
col() (in module pyparsing), 59
Combine (class in pyparsing), 27
comma_separated_list (pyparsing.pyparsing_common attribute), 73
conditionAsParseAction() (in module pyparsing), 77
convertToDate() (pyparsing.pyparsing_common static method), 73
convertToDatetime() (pyparsing.pyparsing_common static method), 73
convertToFloat() (pyparsing.pyparsing_common method), 73
convertToInteger() (pyparsing.pyparsing_common method), 73
copy() (pyparsing.Forward method), 30
copy() (pyparsing.Keyword method), 32
copy() (pyparsing.ParseExpression method), 38
copy() (pyparsing.ParserElement method), 45
copy() (pyparsing.ParseResults method), 41
copy() (pyparsing.pyparsing_test.reset_pyparsing_context method), 78
countedArray() (in module pyparsing), 59
D
DEFAULT_KEYWORD_CHARS (pyparsing.Keyword attribute), 32
DEFAULT_WHITE_CHARS (pyparsing.ParserElement attribute), 44
defaultName (pyparsing.ParserElement attribute), 45
delimitedList() (in module pyparsing), 59
Dict (class in pyparsing), 28
dictOf() (in module pyparsing), 59
downcaseTokens() (pyparsing.pyparsing_common static method), 73
dump() (pyparsing.ParseResults method), 41
E
Ελληνικά (pyparsing.pyparsing_unicode attribute), 76
Each (class in pyparsing), 28
Empty (class in pyparsing), 29
enablePackrat() (pyparsing.ParserElement static method), 45