all_wiki_discussion_toc_2015.md

File metadata and controls

4014 lines (2700 loc) · 166 KB

Pyparsing Wikispaces Discussion - 2015

2015-01-11 06:18:12 - perw07 - Problem with 'asKeyword' in Word()
2015-02-02 08:43:02 - Griffon26 - Parsing items in any order, some of which may repeat?
2015-02-06 12:32:31 - knoguchi - select_parser.py unary operator
2015-02-07 22:06:42 - mhucka - How to deal with ParseResults dict_keyiterator in Python 3?
2015-02-25 04:19:46 - rougier - How to parse C-like declarations ?
2015-02-28 07:31:15 - hlamer - Parse a string with escape characters
2015-03-05 01:14:20 - togr - Enhancement/bugfix to fourFn.py example
2015-03-30 06:37:57 - ansari11 - How to disallow having some keywords in an expression?
2015-04-09 03:51:42 - AndreWin - Nested formatting
2015-04-12 04:00:26 - AndreWin - Parsing link in dokuwiki format
2015-04-15 05:36:10 - reneryu - Return results of NestedExprs as the way they are in the input string
2015-04-17 08:50:32 - rjmarshall17 - Newbie issue: Not able to get to nested block
2015-04-25 04:32:07 - Jimorie - Efficiency problems when parsing nested function calls
2015-04-28 00:36:28 - mwjackson - parse trees with infixNotation
2015-05-01 07:27:42 - Euticus - Newbie question: using restOfLine
2015-05-07 15:37:13 - thatsgobbles - Newbie to pyparsing: Issues using Forward and maximum recursion depth
2015-05-17 06:00:21 - Animusmontus - Script parser goes infinite loop at end of script? What do I do to stop this?
2015-05-23 10:09:32 - lars.stavholm - Line sensitive parsing
2015-05-26 08:05:47 - AndreWin - Parsing c++ code block with arbitrary order of its elements
2015-05-29 08:12:18 - jellby - Comments in blocks
2015-06-04 13:49:16 - r3d4rmy - Parsing dblSlashComment
2015-06-05 08:39:06 - onyx_onyx - Error caused by wrong token
2015-06-07 00:06:39 - zaymich - Parsing of Optional(...)
2015-06-17 12:34:46 - nicoder - Class vs Instance Declaration for ParseElements?
2015-06-27 01:17:09 - HubbaBubbaMan - Building a template engine with pyparsing
2015-07-03 22:03:29 - zaymich - Line comments and arithmetic expressions
2015-07-21 14:40:38 - yosepfkaggerman - Parsing multiple lines as in an IDE
2015-07-29 04:36:42 - larapsodia - Minimum Length
2015-08-26 08:06:27 - AndreWin - Problems with end of line
2015-09-07 06:11:02 - Rittel - Skip to first possibility in text
2015-09-09 12:40:46 - TheVeryOmni - deferred execution of ParseAction in case of 'Or' - is it a bug?
2015-09-13 08:00:28 - Williamzjc - questions and advice about the codes of pyparsing2.0.3
2015-10-01 14:34:19 - rjmco - Parsing nginx configuration files
2015-10-12 07:40:40 - Williamzjc - One question about setParseAction
2015-10-19 05:12:31 - heronils - 2to3 required?
2015-10-28 07:46:08 - Williamzjc - Suggestion
2015-11-05 13:44:08 - mbeaches - Markup/down grouping of style
2015-11-10 00:10:34 - utkarsh007 - How can I remove C/C++ style comments and generate a string back using pyparsing
2015-11-11 09:01:11 - pdelsante - simpleBool.py: Binding values to parsed operands
2015-11-27 05:27:42 - Williamzjc - Correction and Suggestion
2015-12-02 04:41:43 - mentaal - issue with excludeChars "Word" keyword arg
2015-12-08 04:57:42 - Jeroen537 - QuotedString behaves unexpectedly (?)
2015-12-16 07:29:57 - StephenDause - ParseException seems to give incorrect location
2015-12-21 04:33:45 - anon3456 - Parsing GDB/MI
2015-12-27 06:41:50 - AndreWin - Parsing markdown


2015-01-11 06:18:12 - perw07 - Problem with 'asKeyword' in Word()

Problem with 'asKeyword' in Word()

I have run into an unexpected inconsistency. I have seen that Literal() and Keyword() both match strings, but Keyword() only matches if the word boundaries are 'correct'.

I noticed that Word() has the optional argument 'asKeyword' and assumed that it would do 'the obvious'. Unfortunately, it does not work the way I expected (examples below).

Since we supply Word() with one (or two) character classes (initChars and bodyChars) I was assuming that setting 'asKeyword=True' would imply that the match boundaries would be 'not in initChars' and 'not in bodyChars'.

Here is an example of what really happens:

>>> list(Word(alphas+'#').scanString(' #initial '))
[((['#initial'], {}), 1, 9)]
>>> list(Word(alphas+'#', asKeyword=True).scanString(' #initial '))
[((['initial'], {}), 2, 9)]

Looking at the pyparsing code I have installed suggests that the implementation uses the built-in r'\b' from the re module to mark the boundaries, which is a general word boundary marker and does not care about initChars and bodyChars.

  1. Is this the intended behaviour? (I think the documentation should be augmented in that case.)

  2. Would it be possible to either change this behaviour to what I think is more natural, or perhaps add another flag to be able to choose between the current and the 'natural' one?

I think I will be able to solve my current problem by using Word() without the 'asKeyword' attribute. My grammar always has a specific delimiter following the words I'm looking for, so there will never be false matches.
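
For reference, the r'\b' behavior described above can be reproduced with the re module alone. This is a minimal sketch mirroring the Word(alphas+'#') example, not pyparsing's actual implementation:

```python
import re

# r'\b' only sees the \w vs. non-\w transition: '#' is non-\w, so no
# boundary exists before it, and the keyword-bounded match starts at
# 'initial' (column 2), just as Word(..., asKeyword=True) reported.
m = re.search(r'\b[A-Za-z#]+\b', ' #initial ')
print(m.group(), m.start())  # initial 2
```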


2015-02-02 08:43:02 - Griffon26 - Parsing items in any order, some of which may repeat?

Suppose I'm parsing command line options passed to a program. Some of the options are optional, others required and yet others can be specified multiple times. The options can be specified in any order.

Is it possible to write a grammar for this without doing checks on how often an item occurred in a parse action?

I've tried (Optional(option1) & option2), but I don't see how to extend that to options that can be specified multiple times. (Optional(option1) & option2 & ZeroOrMore(option3)) will require all option3s to be next to each other, which is not what I want.

2015-02-02 13:51:11 - ptmcg

Really? This is not my experience:

from pyparsing import *

A = Word('A')
B = Word('B')
C = Word('C')

expr = (Optional(A) & B & ZeroOrMore(C))

tests  = '''\
C B A
B C A C C
B
C B C A C'''.splitlines()

for test in tests:
    print expr.parseString(test, parseAll=True).asList()

prints

['C', 'B', 'A']
['B', 'C', 'A', 'C', 'C']
['B']
['C', 'B', 'C', 'A', 'C']

And if you attach results names, the groupings are even easier to get to:

expr = (Optional(A('A')) & B('B') & ZeroOrMore(C('C')))
for test in tests:
    print expr.parseString(test, parseAll=True).dump()

prints

['C', 'B', 'A']
- A: A
- B: B
- C: C
['B', 'C', 'A', 'C', 'C']
- A: A
- B: B
- C: ['C', 'C', 'C']
['B']
- B: B
['C', 'B', 'C', 'A', 'C']
- A: A
- B: B
- C: ['C', 'C', 'C']

2015-02-03 12:11:49 - Griffon26

Oops! You're right. And here I thought I knew how Each worked.

Thanks for the excellent answer and I'm sorry to have wasted your time with this.


2015-02-06 12:32:31 - knoguchi - select_parser.py unary operator

I'm looking at the select_parser.py from the examples. The unary operators should be right-associative, but this line makes them left-associative. I think this is wrong. Isn't it?

(oneOf('- + ~') | NOT, UNARY, opAssoc.LEFT),

2015-02-06 12:39:04 - knoguchi

Also, is there a reason why there are two '||' operator definitions in the infixNotation?

('||', BINARY, opAssoc.LEFT),

2015-02-06 14:20:14 - ptmcg

Right on both counts! Plus I don't see any '&&' operator. It has been a while since I last looked at this. I would change this statement to:

expr << infixNotation(expr_term,
    [
    (oneOf('- + ~') | NOT, UNARY, opAssoc.RIGHT),
    (oneOf('* / %'), BINARY, opAssoc.LEFT),
    (oneOf('+ -'), BINARY, opAssoc.LEFT),
    (oneOf('<< >> & |'), BINARY, opAssoc.LEFT),
    (oneOf('< <= > >='), BINARY, opAssoc.LEFT),
    (oneOf('= == != <>') | IS | IN | LIKE | GLOB | MATCH | REGEXP, BINARY, opAssoc.LEFT),
    ('&&', BINARY, opAssoc.LEFT),
    ('||', BINARY, opAssoc.LEFT),
    ((BETWEEN,AND), TERNARY, opAssoc.LEFT),
    ])
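
The opAssoc.RIGHT change matters because a chain of unary operators nests rightward: '- - 3' must parse as -(-3). A tiny hand-rolled sketch (hypothetical, independent of select_parser.py) shows the shape:

```python
# A unary expression is an operator followed by another unary
# expression, so parsing naturally recurses to the right:
# ['-', '-', '3'] nests as ('-', ('-', 3)).
def parse_unary(tokens, i=0):
    """Parse a prefix-unary chain from tokens[i:], returning (tree, next_i)."""
    if tokens[i] in ('-', '+', '~'):
        op = tokens[i]
        operand, j = parse_unary(tokens, i + 1)
        return (op, operand), j
    return int(tokens[i]), i + 1

tree, _ = parse_unary(['-', '-', '3'])
print(tree)  # ('-', ('-', 3))
```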

2015-02-09 17:35:47 - knoguchi

Thank you for the quick answer. Please update the example in the next release. pyparsing is great :-)


2015-02-07 22:06:42 - mhucka - How to deal with ParseResults dict_keyiterator in Python 3?

This may be a dumb question, but I'm trying to convert our PyParsing code from Python 2 to Python 3, and hit a snag with the way that ParseResults.keys() returns a dict_keyiterator object. Our application makes heavy use of dictionary values attached to ParseResults. Previously, in Python 2, it was very convenient to use .keys() to see the keys that are present, as well as find out their number using len(), etc. Now with dict_keyiterator, none of that works. If you type x.keys() on an object 'x', you don't see the values -- you get back a <class 'dict_keyiterator'> object.

How does one view the values interactively? Also, how can I check how many there are, etc.? This may be a basic python 3 question, but honestly I've spent half an hour searching for information on how to achieve this with keyiterators and am coming up short...

2015-02-08 11:18:48 - ptmcg

If you want to see if a particular key is present, then the keys() function has not been the recommended approach for some time - instead use 'in':

if 'optional_key' in parseResult:
    ... do stuff ...

If you just want to iterate over the keys, again, you can just use the iterator as-is:

for key in parseResult.keys():
    print (key)

But if you really want a list, just do as you would do with any dict object in Python 3: wrap the call to keys() inside a list() construct:

listOfKeys = list(parseResult.keys())
print ('There are {} keys in the parseResult'.format(len(listOfKeys)))

But since you are doing a Python 2 -> Python 3 upgrade, surely you are getting used to this concept by now. You will have this same issue with any Python 3 dict.keys() call. I changed pyparsing's ParseResults so that it would be consistent with the handling of dict's keys(), values(), items(), etc. functions, as they changed from returning lists to returning iterators in Py3.
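
The same point can be sketched with a plain dict, using iter() to mimic the iterator that ParseResults.keys() returns under Python 3:

```python
d = {'name': 'x', 'value': '1'}

it = iter(d.keys())   # an iterator, like ParseResults.keys() in Py3
keys = list(it)       # materialize once to view it, len() it, etc.
print(len(keys), sorted(keys))  # 2 ['name', 'value']
```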

2015-02-08 21:37:31 - mhucka

Hi Paul,

Thank you very much for your reply and suggestions. I think I was not specific enough in my question. I am of course already using some of the standard idioms for Python 3 dicts, but for some strange reason, my ParseResults objects do not exhibit some of the behaviors you describe above. In particular:

(Pdb) type(pr)
type(pr)
<class 'pyparsing.ParseResults'>
(Pdb) pr.keys()
pr.keys()
<dict_keyiterator object at 0x11059cc58>
(Pdb) list(pr.keys())
list(pr.keys())
*** Error in argument: '(pr.keys())'

Now, using the approach of printing the keys with a for loop, I can verify that there is one key, so the error above is apparently not due to the fact that the dictionary is empty. Do you have any thoughts about what might be going on?

2015-02-08 21:42:33 - mhucka

Ah, I just figured out what's going on, and it's really embarrassing. It's an artifact of pdb's behavior. If I simply type 'list(pr.keys())', it fails, but if I use 'p list(pr.keys())', it works :-). This did not occur with Python 2, and I hadn't noticed that I'd gotten into the bad habit of not issuing a print ('p') command in pdb.

Anyway, this gets me past the most problematic issue in adapting my Python 2-based code. Thanks again.

2015-02-08 22:27:07 - ptmcg

Perhaps this was because you happened to be using 'list', which I believe is also a valid pdb command.

Glad you got it all squared away - and don't be embarrassed, this post will very likely help someone out in the future struggling with a similar situation.

-- Paul

2015-02-08 22:29:37 - ptmcg

Just to see if it really is 'list' vs. omission of the print command, try typing this instead:

(Pdb) [k for k in pr.keys()]

I think this will probably print out okay - it's not the omission of print that is the problem, it's that the default str representation is now that of a <dict_keyiterator>, not of a nice viewable list.


2015-02-25 04:19:46 - rougier - How to parse C-like declarations ?

I would like to parse declarations in a C-like source (GLSL code) such that I get a list of (type, name, value). For example:

int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};

I would like to obtain:

[ ('int',   'a[3]', ''),
  ('int',   'b',    '1'),
  ('int',   'c',    '2.0'),
  ('float', 'd',    'f(z[2], 2) + 3*g(4,a)'),
  ('float', 'e',    ''),
  ('Point', 'f',    '{1,2}') ]

I've played with Forward() and operatorPrecedence() to try to parse the rhs expression but I suspect it is not necessary in my case.

2015-02-25 06:53:13 - rougier

So far I have

IDENTIFIER = Regex('[a-zA-Z_][a-zA-Z_0-9]*')
INTEGER    = Regex('([+-]?(([1-9][0-9]*)|0+))')
EQUAL      = Literal('=').suppress()
SEMI       = Literal(';').suppress()
SIZE       = INTEGER | IDENTIFIER
VARNAME    = IDENTIFIER
TYPENAME   = IDENTIFIER
VARIABLE = Group(VARNAME.setResultsName('name')
                 + Optional(EQUAL + Regex('[^,;]*').setResultsName('value')))
VARIABLES = delimitedList(VARIABLE.setResultsName('variable',listAllMatches=True))
DECLARATION = (TYPENAME.setResultsName('type')
               + VARIABLES.setResultsName('variables', listAllMatches=True) + SEMI)

code = '''
float a=1, b=3+f(2), c;
float d=1.0, e;
float f = z(3,4);
'''

for (token, start, end) in DECLARATION.scanString(code):
    for variable in token.variable:
        print token.type, variable.name, variable.value

But the last expression is not parsed because of the ',' in the function call.

2015-02-25 10:17:43 - rougier

Answering my own question, this seems to be working. Maybe there is a more elegant solution.

IDENTIFIER       = Word(alphas+'_', alphas+nums+'_' )
INT_DECIMAL      = Regex('([+-]?(([1-9][0-9]*)|0+))')
INT_OCTAL        = Regex('(0[0-7]*)')
INT_HEXADECIMAL  = Regex('(0[xX][0-9a-fA-F]*)')
INTEGER          = INT_HEXADECIMAL | INT_OCTAL | INT_DECIMAL
FLOAT            = Regex('[+-]?(((\d+\.\d*)|(\d*\.\d+))([eE][-+]?\d+)?)|(\d*[eE][+-]?\d+)')
LPAREN, RPAREN   = Literal('(').suppress(), Literal(')').suppress()
LBRACK, RBRACK   = Literal('[').suppress(), Literal(']').suppress()
LBRACE, RBRACE   = Literal('{').suppress(), Literal('}').suppress()
SEMICOLON, COMMA = Literal(';').suppress(), Literal(',').suppress()
EQUAL            = Literal('=').suppress()
SIZE             = INTEGER | IDENTIFIER
VARNAME          = IDENTIFIER
TYPENAME         = IDENTIFIER
OPERATOR         = oneOf('+ - * / [ ] . & ^ ! { }')

PART        = nestedExpr() | nestedExpr('{','}') | IDENTIFIER | INTEGER | FLOAT | OPERATOR
EXPR        = delimitedList(PART, delim=Empty()).setParseAction(keepOriginalText)
VARIABLE    = (VARNAME('name') + Optional(LBRACK + SIZE + RBRACK)('size')
                               + Optional(EQUAL + EXPR)('value'))
VARIABLES   = delimitedList(VARIABLE.setResultsName('variables',listAllMatches=True))
DECLARATION = (TYPENAME('type') + VARIABLES + SEMICOLON)

code = '''
int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};
'''

for (token, start, end) in DECLARATION.scanString(code):
    vtype = token.type
    for variable in token.variables:
        name = variable.name
        size = variable.size
        value = variable.value
        s = '%s / %s' % (vtype,name)
        if size:  s += ' [%s]' % size[0]
        if value: s += ' / %s' % value[0]
        s += ';'
        print s

2015-03-05 05:29:53 - ptmcg

Looks pretty good. Just a couple of thoughts. I usually do all my punctuation in a single statement:

LPAREN,RPAREN,LBRACK,RBRACK,LBRACE,RBRACE,SEMI,COMMA,EQUAL = map(Suppress, '()[]{};,=')  

The implementation of expr.suppress() is to just return Suppress(self).

You might want to include '|' and '%' in your definition of OPERATOR.

I like your use of results names - I find that using the shortcut form leaves my parser easier to read:

expr.setResultsName('ABC')

can be written just

expr('ABC')

and

expr.setResultsName('XYZ', listAllMatches=True) 

can be written

expr('XYZ*').  

You might want to tweak your definitions of INT_DECIMAL, INT_OCTAL, and INT_HEXADECIMAL.
At the moment '0' would be parsed as an INT_OCTAL, and '0x' would be parsed as a valid INT_HEXADECIMAL. And I'm pretty sure that delimitedList(expr, delim=Empty()) is the same as OneOrMore(expr), or expr*(1,None) if you want to use the multiplication operator form for ZeroOrMore or OneOrMore.

But most of these are just style points, it looks like you've got a decent working parser, well done!

Cheers, -- Paul
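
The integer-regex tweak suggested above can be checked with the re module directly (same patterns as in the post, with '*' tightened to '+'):

```python
import re

# With '*', the degenerate strings '0' and '0x' are accepted:
assert re.fullmatch(r'0[0-7]*', '0')
assert re.fullmatch(r'0[xX][0-9a-fA-F]*', '0x')

# Requiring at least one digit ('+') rejects them, so a plain '0'
# falls through to INT_DECIMAL instead:
assert not re.fullmatch(r'0[0-7]+', '0')
assert not re.fullmatch(r'0[xX][0-9a-fA-F]+', '0x')
assert re.fullmatch(r'0[xX][0-9a-fA-F]+', '0xFF')
```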

2015-03-05 07:49:43 - rougier

Thanks for the review and especially for the expr('XYZ*'), I was very frustrated not being able to use the call syntax with listAllMatches=True. For the record, here is the new version:

IDENTIFIER  = Word(alphas+'_', alphas+nums+'_' )
INT_DECIMAL = Regex('([+-]?(([1-9][0-9]*)|0+))')
INT_OCTAL   = Regex('(0[0-7]+)')
INT_HEXADECIMAL = Regex('(0[xX][0-9a-fA-F]+)')
INTEGER     = INT_HEXADECIMAL | INT_OCTAL | INT_DECIMAL
FLOAT       = Regex('[+-]?(((\d+\.\d*)|(\d*\.\d+))([eE][-+]?\d+)?)|(\d*[eE][+-]?\d+)')
LPAREN,RPAREN,LBRACK,RBRACK,LBRACE,RBRACE,SEMI,COMMA,EQUAL = map(Suppress, '()[]{};,=')
SIZE        = INTEGER | IDENTIFIER
OPERATOR    = oneOf('+ - * / [ ] . & ^ ! { } % |')
PART        = nestedExpr() | nestedExpr('{','}') | IDENTIFIER | INTEGER | FLOAT | OPERATOR
EXPR        = OneOrMore(PART).setParseAction(keepOriginalText)
VARIABLE    = (IDENTIFIER('name') + Optional(LBRACK + SIZE + RBRACK)('size')
                                  + Optional(EQUAL + EXPR)('value'))
VARIABLES   = delimitedList(VARIABLE('variables*'))
DECLARATION = (IDENTIFIER('type') + VARIABLES + SEMI)

code = '''
int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};
float g[1+2]; // This won't work but this is expected (size = identifier or integer)
'''

for (token, start, end) in DECLARATION.scanString(code):
    vtype = token.type
    for variable in token.variables:
        name = variable.name
        size = variable.size
        value = variable.value
        s = '%s / %s' % (vtype,name)
        if size:  s += ' [%s]' % size[0]
        if value: s += ' / %s' % value[0]
        s += ';'
        print s

2015-02-28 07:31:15 - hlamer - Parse a string with escape characters

Hi. Is there a clever way to parse a string which contains escaped spaces? An escaped space should be treated as an ordinary character, not as a separator. Bash works this way.

ls file\ with\ spaces.txt  file_without_spaces.txt

I'd like to get

['ls', 'file with spaces.txt', 'file_without_spaces.txt']

2015-02-28 07:49:54 - hlamer

I came up with a solution just after posting.

nonSpace = Word(alphanums + '_.')
space = Literal('\\ ')
token = Combine((nonSpace^space) * (1, None))
token.parseString('file\ with\ spaces.txt')
>> (['file\\ with\\ spaces.txt'], {})

Delete the post please.

2015-03-05 05:15:55 - ptmcg

If you don't mind, I'd just as soon leave your post, as it is a good example of how to address this particular case. (As a Python tip, if you are going to have a string with embedded '\' characters, it is a good habit to use Python's raw string literal form, to avoid accidentally inserting a '\t', '\a' or other escaped character sequence when you actually mean to have the backslashes in the string. So in your sample this would look like: r'file\ with\ spaces.txt'.)
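
The raw-string tip is easy to verify: in a normal string literal '\t' collapses to a single tab character, while the raw form keeps both characters:

```python
plain = 'dir\table'   # '\t' becomes one tab character (8 chars total)
raw = r'dir\table'    # raw literal keeps backslash + 't' (9 chars)
print(len(plain), len(raw))  # 8 9
```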


2015-03-05 01:14:20 - togr - Enhancement/bugfix to fourFn.py example

The order of expressions in the definition of 'atom' means that 'e' and 'pi' mask functions beginning with those strings:

@@ -77,6 +80,7 @@
 fn  = { 'sin' : math.sin,
         'cos' : math.cos,
         'tan' : math.tan,
+        'exp' : math.exp,
         'abs' : abs,
         'trunc' : lambda a: int(a),
         'round' : round,
@@ -131,6 +135,8 @@
     test( '6.02E23 * 8.048', 6.02E23 * 8.048 )
     test( 'e / 3', math.e / 3 )
     test( 'sin(PI/2)', math.sin(math.pi/2) )
+    test( 'exp(0)', 1 )
+    test( 'exp(1)', math.e )
     test( 'trunc(E)', int(math.e) )
     test( 'trunc(-E)', int(-math.e) )
     test( 'round(E)', round(math.e) )

The two added tests produce

exp(0)!!! 2.71828182846 != 1 ['E'] => ['E']
exp(1) = 2.71828182846 ['E'] => ['E']

where the latter passes by accident.

Changing the definition of 'atom' to the following (i.e. moving 'pi' and 'e' to after function application) fixes this without any regressions:

atom = (Optional('-') \
                + ( fnumber | ident + lpar + expr + rpar | pi | e ).setParseAction(pushFirst) \
                  | ( lpar + expr.suppress() + rpar )).setParseAction(pushUMinus)

Thanks, and best regards,

Tom Grydeland, Norut

2015-03-05 05:09:38 - ptmcg

Nice catch, I'll be sure to add your tests. But I think that I will take a slightly different tack on the means of solution. 'e' and 'pi' are defined using CaselessLiteral, which has the behavior you describe, but rather than reorder to test for them after the function definition, I'll leave atom the way it is, but instead define them using CaselessKeyword. This will serve as a good illustration of the whole rationale for having the Keyword and CaselessKeyword classes - unlike Literal and CaselessLiteral, the Keyword forms only match whole words, not just the leading string. I'll be sure to add comments as well, to explain why CaselessKeyword is being used instead of CaselessLiteral, as these examples are supposed to be instructional, and not just sample code base for users to leverage and expand on for themselves.

Thanks for posting, and Welcome to Pyparsing! :)

-- Paul
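
The Literal vs. Keyword distinction can be sketched with plain regexes (a rough analogy, not pyparsing's implementation): a literal-style match takes the leading 'e' of 'exp(0)', while a keyword-style match adds a word-boundary check:

```python
import re

# CaselessLiteral('e')-style match: grabs the 'e' at the start of 'exp(0)'
assert re.match(r'e', 'exp(0)', re.IGNORECASE)

# CaselessKeyword('e')-style match: requires a word boundary after 'e',
# so 'exp' is left for the function-name rule, while 'e / 3' still matches
assert not re.match(r'e\b', 'exp(0)', re.IGNORECASE)
assert re.match(r'e\b', 'e / 3', re.IGNORECASE)
```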


2015-03-30 06:37:57 - ansari11 - How to disallow having some keywords in an expression?

I am trying to write a parser for some SQL queries. But my select syntax does not distinguish between the keywords and the normal text. Is there any way to exclude some specific expressions or keywords from another? Here is the code

select_kw = CaselessKeyword('SELECT')
from_kw = CaselessKeyword('FROM')
where_kw = CaselessKeyword('WHERE')

keyword =oneOf (
    select_kw.setResultsName('SELECT') |
    from_kw.setResultsName('FROM') |
    where_kw.setResultsName('WHERE') |
)

##IDENTIFIER - START
simple_identifier = Combine(Word(alphas+'_', exact=1)+Optional(Word(alphanums+'_'+'#'+'$')))
special_identifier=QuotedString(''')
identifier1 = simple_identifier| special_identifier
identifier=~keyword+identifier1
##IDENTIFIER - END

schema_name = ~keyword+identifier.setResultsName('schema_name')
table_name = ~keyword+Combine(Optional(schema_name+'.')+identifier).setResultsName('table_name')
column_name = ~keyword+identifier.setResultsName('column_name')
column_list = Group(delimitedList(column_name, delim=',')).setResultsName('column_list')

##SELECT_CLAUSE - START
select_item = Optional(table_name.setResultsName('')+'.')+asterisk | expression+Optional(as_kw +column_alias)
select_list = Group(delimitedList(select_item, delim=',')).setResultsName('select_list')
select_clause =Group(select_kw+Optional(top_kw+unsigned_integer)+Optional( all_kw|distinct_kw ) +select_list).setResultsName('select_clause')
##SELECT_CLAUSE - END

##FROM_CLAUSE - START
table_alias = ~keyword+identifier.setResultsName('table_alias')
table= ~keyword+table_name + Optional(as_kw) +Optional(table_alias) 
from_clause= Group(from_kw+Group(delimitedList(table, delim=',')).setResultsName('table(s)')).setResultsName('from_clause')
where_clause=Group(where_kw+condition).setResultsName('where_clause')

##CONDITIONS
condition= condition.setResultsName('condition')+or_kw+condition.setResultsName('condition') | condition.setResultsName('condition')+and_kw+condition.setResultsName('condition')| not_kw+condition.setResultsName('condition')| left_brace+ condition.setResultsName('condition') +right_brace| predicate.setResultsName('predicate')
subquery = select_clause \
    +from_clause \
    +Optional(where_clause)\


ssql='select * from schm12.tbl4table1_1 where\n'
info =subquery.parseString(ssql)
dinfo=string.split(info.dump(),'\n')
pprint(dinfo)

Give me this output

["[['SELECT', ['*']], ['FROM', [['schm12.tbl4table1_1'], 'where']]]",
 "- from_clause: ['FROM', [['schm12.tbl4table1_1'], 'where']]",
 "  - table(s): [['schm12.tbl4table1_1'], 'where']",
 "    - table_alias: ['where']",
 "    - table_name: ['schm12.tbl4table1_1']",
 "      - schema_name: ['schm12']",
 "- select_clause: ['SELECT', ['*']]",
 "  - select_list: ['*']"]

It does not exclude Where as a keyword. What should I do to exclude the keywords from such identifiers?

2015-03-30 16:47:21 - ptmcg

The basic problem is here:

select_kw=CaselessKeyword('SELECT')
from_kw=CaselessKeyword('FROM')
where_kw=CaselessKeyword('WHERE')

keyword =oneOf (
select_kw.setResultsName('SELECT')|
from_kw.setResultsName('FROM')|
where_kw.setResultsName('WHERE')|
)

The purpose of oneOf is to simplify (and correct) expressions like:

Literal('ABC') | Literal('DEF') | Literal('DEFGH')

to just:

oneOf('ABC DEF DEFGH')

In your definition of keyword, just list the keyword expressions or'ed together with the '|' operator:

keyword = select_kw | from_kw | where_kw

It is rarely necessary to define results names for literal strings or keywords, but if you really want them, I usually define them further down in the parser, where they get used in larger expressions.

Once you have changed the definition of keyword, your definition of identifier should be sufficiently guarded from accidentally matching a defined keyword, so you can remove most if not all the other '~keyword' expressions sprinkled about. Just be sure to update keyword when you add support for other SQL keywords as you expand the amount of SQL you plan to support with your parser. (I notice you already have some code to handle 'TOP', 'AS', 'OR', 'AND', and 'NOT'.)

I also find that my parser code is easier to read using a shortcut for 'setResultsName'. You can replace:

expr.setResultsName('ABCDEF')

with just plain:

expr('ABCDEF')

Glad to hear you are enjoying working with pyparsing! -- Paul
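
The '~keyword + identifier' guard recommended above behaves like a regex negative lookahead; here is a stdlib sketch of the same idea (the keyword list and identifier pattern are illustrative only):

```python
import re

# Reject identifiers that are exactly a reserved word, but still
# accept words that merely start with one (e.g. 'whereabouts').
ident = re.compile(r'(?!(?:SELECT|FROM|WHERE)\b)[A-Za-z_]\w*', re.IGNORECASE)

assert ident.match('schm12')
assert not ident.match('where')
assert ident.match('whereabouts')  # prefix alone is not a keyword
```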


2015-04-09 03:51:42 - AndreWin - Nested formatting

Hi! I decided to create a parser for dokuwiki markup. It's not difficult for me to parse bold, italic or monospaced text:

import pyparsing as pp
bold_text = pp.QuotedString('**')
italic_text = pp.QuotedString('//')
monospaced_text = pp.QuotedString("''")

In dokuwiki, formatting can be nested. For example:

_**''some text''**_

What's the best way to parse such an expression?

Thanks in advance, Andrei.

2015-04-12 03:56:38 - AndreWin

I decided to solve this task by using a parse action that will search for and parse nested expressions.


2015-04-12 04:00:26 - AndreWin - Parsing link in dokuwiki format

Hello! I should parse the following code:

[[http://google.com|This is Google]]

I typed:

urltext = pp.Literal('http://') + pp.Word(pp.alphanums+'/_.=?&#%')
link = pp.Literal('[[') + urltext + pp.Optional(pp.Literal('|') + ... ) + pp.Literal(']]')

I don't know what to write instead of the dots (...). Please help me.

Thanks in advance, Andre.

2015-04-12 04:37:11 - AndreWin

I found answer:

urltext = pp.Literal('http://')+pp.Word(pp.alphanums+'/_.=?&#%')
link = pp.Literal('[[') + urltext + pp.Optional(pp.Literal('|') + pp.SkipTo(']]')) + pp.Literal(']]')

Test:

link.parseString('[[http://google.ru|This is Google]]')
(['[[', 'http://', 'google.ru', '|', 'This is Google', ']]'], {})

2015-04-12 04:38:18 - AndreWin

fix previous answer:

urltext = pp.Literal('http://')+pp.Word(pp.alphanums+'/_.=?&#%')
link = pp.Literal('[[') + urltext + pp.Optional(pp.Literal('|') + pp.SkipTo(']]')) + pp.Literal(']]')

Test:

link.parseString('[[http://google.ru|This is Google]]')
(['[[', 'http://', 'google.ru', '|', 'This is Google', ']]'], {})

2015-04-15 05:36:10 - reneryu - Return results of NestedExprs as the way they are in the input string

Hi all,

I need to parse strings with a format like '12@(1, 2@(1, 3))' and extract the two parts enclosed in the outermost parentheses, separated by a comma: in this case, '1' and '2@(1, 3)'. These two parts have the same pattern, either a number or a number@parentheses, and the content inside the parentheses can be further nested in this way. Is there a way to retrieve results as they appear in the original string ('1' and '2@(1, 3)' in this case) instead of getting a list of every element? Thanks.

2015-04-16 04:23:44 - ptmcg

Whenever you have nesting of an element within itself, you will need to forward declare that expression using a pyparsing Forward expression. Then once you have the contents defined, you 'assign' them to the declared Forward using '<<' or '<<=' operator. To keep the nesting levels, the Group class adds structure to the parsed tokens. See below:

from pyparsing import *

AT = Literal('@')
integer = Word(nums)
marker = Combine(integer + AT)
LPAREN,RPAREN,COMMA = map(Suppress, '(),')

sample = '12@(1, 2@(1, 3))' 

term = Forward()
expr = Group(marker + Group(LPAREN + term + COMMA + term + RPAREN))
term <<= expr | integer


print expr.parseString(sample).asList()

prints:

[['12@', ['1', ['2@', ['1', '3']]]]]
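
If the goal is to recover the two top-level parts verbatim, as originally asked, a plain string scan over balanced parentheses also works. This is a stdlib sketch, independent of pyparsing:

```python
def split_top_level(s):
    """Split the outermost (...) of 'N@( a, b )' into verbatim parts."""
    inner = s[s.index('(') + 1 : s.rindex(')')]
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch == ',' and depth == 0:
            # only commas outside nested parentheses split the parts
            parts.append(inner[start:i].strip())
            start = i + 1
    parts.append(inner[start:].strip())
    return parts

print(split_top_level('12@(1, 2@(1, 3))'))  # ['1', '2@(1, 3)']
```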

2015-04-17 08:50:32 - rjmarshall17 - Newbie issue: Not able to get to nested block

Hi,

I'm new to pyparsing and I'm trying to write a parser for text that resembles:

dataType EPISODE episodeNumber xxx originalAirDate yyy length zzz
title: 'This is the episode title'
director: 'Fred Flintstone'
producer: 'Barney Rubble'
rating: 34.5
characters {
    total: 5
    cartoonist: 'Bam Bam'
    studio: 'ABC'
    character {
        name: 'Fred Flintstone'
        age: 30
        hair: black
        garment: spotted
        voice: 'Alan Reed'
    }
    character {
        name: 'Barney Rubble'
        age: 35
        hair: blonde
        garment: brown
        voice: 'Mel Blanc'
    }
    town: 'Bedrock'
    employer: 'Slate Rock and Gravel'
}
RecentAiring: aaa
RecentRating: bbb

Anyway, I think that should give you an idea; it's key/value pairs with nested sections and subsections that are labeled and also contain key/value pairs.

My first attempt at a parser is:

lbrace, rbrace, lbracket, rbracket, equals, colon, space, quote = map(Suppress, "{}[]=: '")
key = Word(alphanums)
value = Optional(quote) + Word(alphanums + ' .-_') + Optional(quote)
key_value = Dict(Group(key + colon + value))
subsection = Dict(Group(Word(alphas) + lbrace + ZeroOrMore(key_value) + rbrace))
section = Dict(Group(Word(alphas) + lbrace + ZeroOrMore(key_value) + Group(ZeroOrMore(subsection)) + ZeroOrMore(key_value) + rbrace))

The key_value parser seems to work fine, as does subsection. But when I use the section parser on the characters section I see:

>>> result = section.parseString(data)
>>> pprint(result.asList())
[['characters',
  ['total', '5'],
  ['cartoonist', 'Bam Bam'],
  ['studio', 'ABC'],
  [['character',
    ['name', 'Fred Flintstone'],
    ['age', '30'],
    ['hair', 'black'],
    ['garment', 'spotted'],
    ['voice', 'Alan Reed']],
   ['character',
    ['name', 'Barney Rubble'],
    ['age', '35'],
    ['hair', 'blonde'],
    ['garment', 'brown'],
    ['voice', 'Mel Blanc']]],
  ['town', 'Bedrock'],
  ['employer', 'Slate Rock and Gravel']]]

But I don't seem to be able to get to the character subsections, e.g.:

result.characters.character[0].name
Traceback (most recent call last):
  File '<pyshell#288>', line 1, in <module>
    result.characters.character[0].name
IndexError: string index out of range

I'm not sure what I'm doing wrong. Any suggestions?

Thanks,

Rob

2015-04-17 08:52:28 - rjmarshall17

Sorry, the definitions should have been in a code block, i.e.:

lbrace, rbrace, lbracket, rbracket, equals, colon, space, quote = map(Suppress, "{}[]=: '")
key = Word(alphanums)
value = Optional(quote) + Word(alphanums + ' .-_') + Optional(quote)
key_value = Dict(Group(key + colon + value))
subsection = Dict(Group(Word(alphas) + lbrace + ZeroOrMore(key_value) + rbrace))
section = Dict(Group(Word(alphas) + lbrace + ZeroOrMore(key_value) + Group(ZeroOrMore(subsection)) + ZeroOrMore(key_value) + rbrace))

2015-04-17 09:00:13 - rjmarshall17

Never mind...figured it out:

>>> section = Dict(Group(Word(alphas) + lbrace + ZeroOrMore(key_value) + Group(ZeroOrMore(subsection))('character') + ZeroOrMore(key_value) + rbrace))
>>> result = section.parseString(data)
>>> result.characters.character[0].name
'Fred Flintstone'
>>> result.characters.character[1].name
'Barney Rubble'

2015-04-17 10:34:42 - rjmarshall17

Coming back to this...I'm beginning to see that assigning a name like this, rather than taking the name from the label on the section, is going to be problematic. Is there a way to use the label on the section rather than setting the name?

2015-04-18 12:31:28 - rjmarshall17

Hi,

So I tried something like this:

def labeledNestedExpr(bracket_type='brace',body=None):
    bracket_types = {
            'brace':map(Suppress,'{}'),
            'bracket':map(Suppress,'[]'),
            'paren':map(Suppress,'()'),
            }
    openBracket,closeBracket = bracket_types[bracket_type]
    startBlock = Word(alphanums+'-_+.') + openBracket
    bodyExpr = Forward()
    def getBlockLabel(toks):
        label = toks[0]
        bodyExpr << ZeroOrMore(body)(label)
        return None
    startBlock.addParseAction(getBlockLabel)
    return (startBlock + bodyExpr + closeBracket)

Which seems to work fine on a single level, e.g.:

data = '''
character {
        name: 'Fred Flintstone'
        age: 30
        hair: black
        garment: spotted
        voice: 'Alan Reed'
    }
'''

result = nested.parseString(data)
result.character.name
'Fred Flintstone'
result.character.voice
'Alan Reed'
result['character']['age']
'30'

But it, obviously, doesn't handle nesting. What do I need to do so that it will handle an arbitrary number of levels? I have tried nestedExpr but it doesn't give me the ability to label the blocks.

2015-04-18 14:12:22 - rjmarshall17

OK, so I fixed it so that it handles nesting, but the result is incorrect. Here's what I have now:

#!/usr/bin/env python

from pyparsing import *
from pprint import pprint

lbrace, rbrace, lbracket, rbracket, equals, colon, space, quote = map(Suppress, "{}[]=: '")
key = Word(alphanums)
value = Optional(quote) + Word(alphanums + ' .-_/') + Optional(quote)
key_value = Dict(Group(key + colon + value))

def labeledNestedExpr(bracket_type='brace',body=None):
    bracket_types = {
        'brace':map(Suppress,'{}'),
        'bracket':map(Suppress,'[]'),
        'paren':map(Suppress,'()'),
    }
    openBracket,closeBracket = bracket_types[bracket_type]
    startBlock = Word(alphanums+'-_+.') + openBracket
    content = Forward()
    def getBlockLabel(toks):
        # print 'toks=%s' % toks
        label = toks[0]
        content << Group(body)(label)
        return None
    startBlock.addParseAction(getBlockLabel)
    ret = Forward()
    ret << Group(startBlock + ZeroOrMore( ret | content) + closeBracket)
    return ret

data = '''
characters {
    total: 5
    cartoonist: 'Bam Bam'
    studio: 'ABC'
    character {
        name: 'Fred Flintstone'
        age: 30
        hair: black
        garment: spotted
        voice: 'Alan Reed'
    }
    character {
        name: 'Barney Rubble'
        age: 35
        hair: blonde
        garment: brown
        voice: 'Mel Blanc'
    }
    town: 'Bedrock'
    employer: 'Slate Rock and Gravel'
}
'''

nested = Dict(labeledNestedExpr(body=key_value))
result = nested.parseString(data)
pprint(result.asList())

The result ends up:

[['characters',
  [['total', '5']],
  [['cartoonist', 'Bam Bam']],
  [['studio', 'ABC']],
  ['character',
   [['name', 'Fred Flintstone']],
   [['age', '30']],
   [['hair', 'black']],
   [['garment', 'spotted']],
   [['voice', 'Alan Reed']]],
  ['character',
   [['name', 'Barney Rubble']],
   [['age', '35']],
   [['hair', 'blonde']],
   [['garment', 'brown']],
   [['voice', 'Mel Blanc']]],
  [['town', 'Bedrock']],
  [['employer', 'Slate Rock and Gravel']]]]

So it's all there, but I wanted to be able to access it like:

result.characters.studio and get 'ABC'
result.characters.character[0].name and get 'Fred Flintstone'

Instead I get:

result.characters.studio
result.characters.character[0].name

So, again, any ideas what I'm doing wrong here?

Thanks,

Rob

2015-04-18 14:13:34 - rjmarshall17

Sorry, still not used to how to format these posts...

2015-05-01 16:24:48 - ptmcg

Yes, the wikispaces markup is not very code-friendly, I'm sorry about that. Congratulations on making as much progress as you have; Dict is one of the more advanced features in pyparsing. To help you figure out where things are going wrong, please try using results.dump() instead of results.asList() - dump() will give you more visibility into the names you have defined and what values they point to.
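To illustrate the difference, here is a minimal sketch using a hypothetical two-key mini-grammar (not the full grammar from this thread):

```python
from pyparsing import Dict, Group, Suppress, Word, ZeroOrMore, alphanums, alphas

# hypothetical mini-grammar, just to contrast dump() with asList()
key_value = Dict(Group(Word(alphas) + Suppress(':') + Word(alphanums)))
section = ZeroOrMore(key_value)

result = section.parseString('total: 5 studio: ABC')
print(result.asList())  # raw token lists only
print(result.dump())    # also lists the defined results names and their values
```

dump() prints each named result on its own line, which makes it much easier to see whether a name like 'character' actually got attached to anything.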


2015-04-25 04:32:07 - Jimorie - Efficiency problems when parsing nested function calls

Hello parsers!

I have written a basic dice and arithmetics parser, which works fine in most aspects. But it needs a crazy long time to parse nested function calls!

I would be very grateful if someone could help me tune this grammar for better performance!

I'll paste the pyparsing code below, which you can test running with an input like 'foo(foo(foo(1)))' and see that it takes exponentially longer to tokenize the string the more nested function calls you add.

# coding=utf-8

from pyparsing import *

class DiceParser (object):

    def __init__ (self):

        # Start with pre-defining recursive placeholder rules
        self.expr = Forward()

        # Numbers
        self.num = Combine(Word(nums) + Optional('.' + Optional(Word(nums))))
        #self.num.setParseAction(self.convertNum)

        # Parentheses
        self.lpar = Literal('(').suppress()
        self.rpar = Literal(')').suppress()
        self.parexpr = self.lpar + self.expr + self.rpar

        # Expression lists
        self.exprlist = delimitedList(self.expr) | Empty()
        #self.exprlist.setParseAction(self.wrapList)

        # Arrays
        self.lbracket = Literal('[').suppress()
        self.rbracket = Literal(']').suppress()
        self.array = self.lbracket + self.exprlist + self.rbracket

        # Functions
        self.ident = Word(alphas, alphas + nums + '_')
        self.func = Forward()
        self.func << self.ident + self.lpar + self.exprlist + self.rpar
        #self.func.setParseAction(self.wrapFunc)

        # Our atomic expression
        self.atom = self.parexpr | self.func | self.array | self.num

        # Our operators
        self.diceop = CaselessLiteral('D') | CaselessLiteral('T')
        #self.diceop.setParseAction(self.wrapBinaryOp)
        self.dicemod = CaselessLiteral('op') | CaselessLiteral('ob')
        #self.dicemod.setParseAction(self.wrapUnaryOp)
        self.signop = oneOf('+ -')
        #self.signop.setParseAction(self.wrapUnaryOp)
        self.setop = oneOf('&')
        #self.setop.setParseAction(self.wrapBinaryOp)
        self.multop = oneOf('* /')
        #self.multop.setParseAction(self.wrapBinaryOp)
        self.plusop = oneOf('+ -')
        #self.plusop.setParseAction(self.wrapBinaryOp)

        # Operator precedence thankfully handled by the framework!
        self.expr << operatorPrecedence(self.atom,
            [(self.diceop, 2, opAssoc.LEFT),
             (self.dicemod, 1, opAssoc.RIGHT),
             (self.signop, 1, opAssoc.RIGHT),
             (self.setop, 2, opAssoc.LEFT),
             (self.multop, 2, opAssoc.LEFT),
             (self.plusop, 2, opAssoc.LEFT),]
        )

if __name__ == '__main__':
    from sys import argv
    string = ' '.join(argv[1:])
    parser = DiceParser()
    print parser.expr.parseString(string, parseAll=True)

Thanks in advance for any help, and for the excellent pyparsing library!

2015-05-01 16:21:00 - ptmcg

With recursive grammars (and operatorPrecedence internally creates a recursive grammar), you can often get performance improvements by using ParserElement.enablePackrat().

Call this method after importing pyparsing and see if you get any improvements.

You'll also see some benefits by replacing self.num with self.num = Regex(r'\d+(\.\d*)?') which is equivalent to the full pyparsing expression you have written.
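Both suggestions in a minimal sketch (the fragment below is illustrative, not the full DiceParser):

```python
from pyparsing import ParserElement, Regex

# enable packrat memoization before building the grammar
ParserElement.enablePackrat()

# Regex equivalent of Combine(Word(nums) + Optional('.' + Optional(Word(nums))))
num = Regex(r'\d+(\.\d*)?')
result = num.parseString('3.14')
print(result)   # ['3.14']
```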


2015-04-28 00:36:28 - mwjackson - parse trees with infixNotation

Hi,

I'm having trouble turning multiple infix operators into a nested parse tree.

My grammar is thus:

atom = quotedString ^ Word(alphanums) ^ Combine(Word(nums) + Optional(Literal('.')) + Optional(Word(nums)))
ident = Word(alphas, alphanums + '_')
expression = infixNotation(applicable, [
    (oneOf('^'), 2, opAssoc.LEFT, push_op_first),
    (oneOf('* / %'), 2, opAssoc.LEFT, push_op_first),
    (oneOf('+ -'), 2, opAssoc.LEFT),
    (oneOf('< <= > >='), 2, opAssoc.LEFT, push_op_first),
    (oneOf('== !='), 2, opAssoc.LEFT, push_op_first),
    (oneOf('& ^ |'), 2, opAssoc.LEFT, push_op_first),
    (oneOf('|>'), 2, opAssoc.LEFT, push_first_and_compose),
    (oneOf('='), 2, opAssoc.LEFT )
])
applicable <<= atom ^ ident ^ collection ^ func_appl ^ lambda_exp

And I would expect the result of

expression.parseString('x = 1 + 2 + 3').asList()

to parse as

[['x', '=', [[1, '+', 2], '+', 3]]]

but instead am seeing

[['x', '=', [1, '+', 2, '+', 3]]]

Although when I use opAssoc.RIGHT, I see the desired nested tree with the reverse ordering ([['x', '=', [1, '+', [2, '+', 3]]]]), but using opAssoc.LEFT, the nesting disappears...

Any ideas?

2015-05-01 16:15:55 - ptmcg

This is a known issue with infixNotation. Left-associative operations that have more than 2 successive operators are all collected into a single list. I have written a workaround parse action that will do the conversion of one of these lists to the nested pairs of pairs form. Or you can simply evaluate the list from left-to-right.
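The left-to-right evaluation alternative can be sketched like this (OPS and eval_flat are hypothetical helper names, not pyparsing APIs):

```python
import operator

# fold a flat [operand, op, operand, op, ...] list, as produced by
# infixNotation for a run of same-precedence left-associative operators
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def eval_flat(tokens):
    it = iter(tokens)
    result = next(it)
    for op in it:
        result = OPS[op](result, next(it))
    return result

print(eval_flat([1, '+', 2, '+', 3]))   # 6
```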

2015-05-01 18:06:22 - mwjackson

Where can I find this workaround?

2015-05-03 09:50:53 - ptmcg

I searched and searched and could not find the email thread that contained this, so I had to recreate the parse action. See the following code:

from pyparsing import *

atom = Word(alphas) | Word(nums)

def nest_operand_pairs(tokens):
    tokens = tokens[0]
    ret = ParseResults(tokens[:3])
    remaining = iter(tokens[3:])
    done = False
    while not done:
        next_pair = (next(remaining,None), next(remaining,None))
        if next_pair == (None, None):
            done = True
            break
        ret = ParseResults([ret])
        ret += ParseResults(list(next_pair))
    return [ret]

expr = infixNotation(atom,
    [
    (oneOf('* /'), 2, opAssoc.LEFT, ),
    (oneOf('+ -'), 2, opAssoc.LEFT, ),
    ])

print expr.parseString('a + b + c')
print expr.parseString('a + b * d + c * e')



# redefine expr, adding parse actions to binary operations to simulate 
# recursive parsing
expr = infixNotation(atom,
    [
    (oneOf('* /'), 2, opAssoc.LEFT, nest_operand_pairs),
    (oneOf('+ -'), 2, opAssoc.LEFT, nest_operand_pairs),
    ])

print expr.parseString('a + b + c')
print expr.parseString('a + b * d + c * e')

2015-05-06 01:01:59 - mwjackson

great, thanks!


2015-05-01 07:27:42 - Euticus - Newbie question: using restOfLine

I am relatively new to Python and VERY new to pyparsing, but I have been very pleased with pyparsing so far. I am trying to build an assembler for a 'classic' microprocessor. In this assembly language, a comment begins with a ';' and extends to the end of the line. Here's one of my attempts:

import pyparsing as pp

comment = pp.Literal(';') + pp.restOfLine()    
test=(
    'hi there',
    ';this is a commet\n')

for t in test:
    print t, '->', comment.parseSring(t)

Running this gives me a traceback. Can you tell me what I am doing wrong?

2015-05-01 16:12:40 - ptmcg

Debugging your posted code:

>python vv.py
    hi there ->
    Traceback (most recent call last):
      File "vv.py", line 10, in <module>
        print t, '->', comment.parseSring(t)
    AttributeError: 'And' object has no attribute 'parseSring'

This is due to a typo - there is no method parseSring, it is spelled parseString. Fixing this gives:

    >python vv.py
    hi there ->
    Traceback (most recent call last):
      File "vv.py", line 10, in <module>
        print t, '->', comment.parseString(t)
      File "C:\Python27\lib\site-packages\pyparsing.py", line 1125, in parseString
        raise exc
    pyparsing.ParseException: Expected ";" (at char 0), (line:1, col:1)

This is actually working correctly. Your first test does not start with a comment, so you get a pyparsing exception that no leading ';' was found.

I changed your code slightly to read:

    for t in test:
        print t, '->', pp.Optional(comment).parseString(t)

and now we get:

    >python 'vv.py'
    hi there -> []
    ;this is a commet
    -> [';', 'this is a commet']
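For an assembler you will usually want comments stripped wherever they occur, not just recognized at line starts; pyparsing's ignore() does this. A sketch, assuming a made-up two-field instruction format (not from the original post):

```python
import pyparsing as pp

comment = pp.Literal(';') + pp.restOfLine
# hypothetical instruction grammar: mnemonic plus optional operand
instruction = pp.Group(pp.Word(pp.alphas) + pp.Optional(pp.Word(pp.alphanums + '#$,')))
program = pp.OneOrMore(instruction)
program.ignore(comment)   # comments may now appear anywhere

src = '''
LDA #$01   ; load the accumulator
STA $0200  ; store it
'''
result = program.parseString(src)
print(result.asList())   # [['LDA', '#$01'], ['STA', '$0200']]
```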

2015-05-02 09:09:56 - Euticus

Thanks-I guess I must have been out of it when I checked my code!


2015-05-07 15:37:13 - thatsgobbles - Newbie to pyparsing: Issues using Forward and maxiumum recursion depth

Hello all,

I'm having trouble with a postfix recipe grammar I am working on. In the grammar, it is possible to have operands (ingredients) acted upon by unary and binary operators. The resulting products can then be used as operands in later operations. I'm seeing what looks like an infinite recursion when trying to parse a simple example of my recipe grammar. Shown below is my pyparsing code:

from pyparsing import *

INGREDIENT_SIGIL                =   Literal( '*' )
UNARY_OP_SIGIL                  =   Literal( '=' )
BINARY_OP_SIGIL                 =   Literal( '/' )
VARIANT_LIST_START_SIGIL        =   Literal( '[' )
VARIANT_LIST_SEPARATOR          =   Literal( '|' )
VARIANT_LIST_CLOSE_SIGIL        =   Literal( ']' )
VARIANT_TAG_LIST_SEPARATOR      =   Literal( ',' )
PSEUDO_INGREDIENT_SIGIL         =   Literal( ':' )
SIMULTANEOUS_OP_SIGIL           =   Literal( '+' )
VARIANT_TAG_SIGIL               =   Literal( '#' )
MODIFIER_SIGIL                  =   Literal( ',' )
RECIPE_START_SIGIL              =   Literal( '<' )
RECIPE_CLOSE_SIGIL              =   Literal( '>' )
ANNOTATION_SIGIL                =   Literal( ';' )
COMMENT_START_SIGIL             =   Literal( '(' )
COMMENT_CLOSE_SIGIL             =   Literal( ')' )
OPERAND_GROUP_START_SIGIL       =   Literal( '{' )
OPERAND_GROUP_CLOSE_SIGIL       =   Literal( '}' )
HEADING_START_SIGIL             =   Literal( '.' )
HEADING_CLOSE_SIGIL             =   Literal( '.' )

general_operand                 =   Forward()
general_unary_operator          =   Forward()
general_binary_operator         =   Forward()

nz_digits                       =   '123456789'
digits                          =   '0' + nz_digits
id_char                         =   alphas + "-'"
title_char                      =   id_char + digits + '[]#'

string                          =   Word( title_char )
phrase                          =   Word( id_char )
pos_integer                     =   Word( nz_digits, digits )

modifier                        =   MODIFIER_SIGIL + string
annotation                      =   ANNOTATION_SIGIL + string

sentence                        =   phrase + ZeroOrMore( modifier ) + ZeroOrMore( annotation )

simple_operand                  =   ( INGREDIENT_SIGIL
                                    + Optional( PSEUDO_INGREDIENT_SIGIL )
                                    + sentence
                                    )
simple_unary_operator           =   ( UNARY_OP_SIGIL
                                    + Optional( SIMULTANEOUS_OP_SIGIL )
                                    + sentence
                                    )
simple_binary_operator          =   ( BINARY_OP_SIGIL
                                    + Optional( SIMULTANEOUS_OP_SIGIL )
                                    + sentence
                                    )

variant_tag_list                =   VARIANT_TAG_SIGIL + pos_integer + ZeroOrMore( VARIANT_TAG_LIST_SEPARATOR + pos_integer )

heading                         =   HEADING_START_SIGIL + phrase + ZeroOrMore( annotation ) + HEADING_CLOSE_SIGIL

variant_operand                 =   ( VARIANT_LIST_START_SIGIL
                                    + general_operand + Optional( variant_tag_list )
                                    + OneOrMore( VARIANT_LIST_SEPARATOR + general_operand + Optional( variant_tag_list ))
                                    + VARIANT_LIST_CLOSE_SIGIL
                                    )
variant_unary_operator          =   ( VARIANT_LIST_START_SIGIL
                                    + general_unary_operator + Optional( variant_tag_list )
                                    + OneOrMore( VARIANT_LIST_SEPARATOR + general_unary_operator + Optional( variant_tag_list ))
                                    + VARIANT_LIST_CLOSE_SIGIL
                                    )
variant_binary_operator         =   ( VARIANT_LIST_START_SIGIL
                                    + general_binary_operator + Optional( variant_tag_list )
                                    + OneOrMore( VARIANT_LIST_SEPARATOR + general_binary_operator + Optional( variant_tag_list ))
                                    + VARIANT_LIST_CLOSE_SIGIL
                                    )

general_operand                 <<  ( ( general_operand + general_operand + simple_binary_operator )
                                    | ( general_operand + simple_unary_operator )
                                    | simple_operand
                                    | variant_operand
                                    )

general_unary_operator          <<  ( simple_unary_operator
                                    | variant_unary_operator
                                    )

general_binary_operator         <<  ( simple_binary_operator
                                    | variant_binary_operator
                                    )

recipe                          =   ( RECIPE_START_SIGIL
                                    + heading
                                    + general_operand
                                    + RECIPE_CLOSE_SIGIL
                                    )

The sample input recipe:

<.Savory Wild Mushroom Soup.
* mushrooms =chop
>

Cheers, Mark

2015-05-17 07:24:44 - ptmcg

I've taken a run at postfix expressions before, and I never worked out a good solution to the inherent left-recursion. So for your example, I took another run at postfix. See below:

from pyparsing import *
from collections import deque

# define stack for push/pop expressions in parse actions
expr_stack = deque()

# define parse actions for each expression operand, bin_op, and unary_op
def pushOperand(tokens):
    expr_stack.append(tokens[0])

def pop1thenPush(tokens):
    prefix = 'un:'
    op1 = None
    try:
        op1 = expr_stack.pop()
    except Exception:
        if op1 is not None:
            expr_stack.append(op1)
        raise
    expr_stack.append(ParseResults([op1,prefix+tokens[0]]))

def pop2thenPush(tokens):
    prefix = 'bin:'
    op1 = op2 = None
    try:
        op2 = expr_stack.pop()
    except Exception:
        if op2 is not None:
            expr_stack.append(op2)
        raise
    try:
        op1 = expr_stack.pop()
    except Exception:
        if op1 is not None:
            expr_stack.append(op1)
        if op2 is not None:
            expr_stack.append(op2)
        raise
    expr_stack.append(ParseResults([op1,op2,prefix+tokens[0]]))

# define parse action for overall expr to pull off from expr_stack,
# and clear it for next parse
def getStack(tokens):
    ret = list(expr_stack)
    expr_stack.clear()
    if len(ret) > 1:
        raise ParseException('too many operands for given operators')
    return ret[0]


# define the expressions
operand = Word(alphas)
bin_op = oneOf('+ - * /')
unary_op = oneOf('+ -')
expr = OneOrMore(bin_op | unary_op | operand)


# attach parse actions to expressions
operand.setParseAction(pushOperand)
unary_op.setParseAction(pop1thenPush)
bin_op.setParseAction(pop2thenPush)
expr.setParseAction(getStack)


# some tests, including some failure cases 
tests = '''\
A B +
A -
A B + C + -
A B + C + - - -
A B + C + - - - D E - +
A B + C + - - - + D E - +
A B C + +
A B C
A B + * C + - - - + D E - +
'''.splitlines()

for t in tests:
    print t
    try:
        print expr.parseString(t, parseAll=True)
    except ParseException as pe:
        print 'Error: ' + str(pe)
    print

This prints:

A B +
['A', 'B', 'bin:+']

A -
['A', 'un:-']

A B + C + -
[[['A', 'B', 'bin:+'], 'C', 'bin:+'], 'un:-']

A B + C + - - -
[[[[['A', 'B', 'bin:+'], 'C', 'bin:+'], 'un:-'], 'un:-'], 'un:-']

A B + C + - - - D E - +
[[[[[['A', 'B', 'bin:+'], 'C', 'bin:+'], 'un:-'], 'un:-'], 'un:-'], ['D', 'E', 'bin:-'], 'bin:+']

A B + C + - - - + D E - +
[[[[[[['A', 'B', 'bin:+'], 'C', 'bin:+'], 'un:-'], 'un:-'], 'un:-'], 'un:+'], ['D', 'E', 'bin:-'], 'bin:+']

A B C + +
['A', ['B', 'C', 'bin:+'], 'bin:+']

A B C
Error: too many operands for given operators (at char 0), (line:1, col:1)

A B + * C + - - - + D E - +
Error: Expected end of text (at char 6), (line:1, col:7)

The left-recursion is handled using a supplemental stack, essentially a 'lookbehind' after each operator. It is also important to test for binary operations before unary ones, since our success/failure test for matching is whether there are one or two items on the stack on which to operate.

Thanks for being patient on this, -- Paul


2015-05-17 06:00:21 - Animusmontus - Script parser goes infinite loop at end of script? What do I do to stop this?

<snip>
    r_part = pp.Or([r_comment_line, r_includes, r_global_constants, r_global_variables, r_forward_declarations, r_main_function, r_function_declarations])

    r_program = pp.OneOrMore(r_part) + pp.stringEnd
 <snip>
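One frequent cause of this symptom is that an alternative inside the Or can match the empty string (e.g. it is built from ZeroOrMore or Optional): OneOrMore then keeps "matching" at the same position at end of input and never terminates. A minimal sketch, not the original SE5 grammar:

```python
import pyparsing as pp

# An empty-matching part would make OneOrMore spin at end of input:
# part = pp.ZeroOrMore(pp.Word(pp.alphas))        # may match ''
# program = pp.OneOrMore(part) + pp.stringEnd     # can loop forever

# Fix: make sure every alternative consumes at least one token
part = pp.Group(pp.OneOrMore(pp.Word(pp.alphas)))
program = pp.OneOrMore(part) + pp.stringEnd
result = program.parseString('globalconst foo bar')
print(result.asList())   # [['globalconst', 'foo', 'bar']]
```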




//========================================================================
//  SE5 Main Event Global Constants
//========================================================================
//  

//------------------------------------------------------------------------
// Global Constants
//------------------------------------------------------------------------
globalconst

  AI_MAIN_EVENT_EVENTTEXT_TXT:                                                     string    := 'EventText.txt'
  AI_MAIN_EVENT_NORMAL:                                                            string    := 'Normal'

  AI_MAIN_EVENT_FLAG_VEHICLENAME:                                                  string    := '[%VehicleName%]'
  AI_MAIN_EVENT_FLAG_DAMAGEAMOUNT:                                                 string    := '[%DamageAmount%]'
  AI_MAIN_EVENT_FLAG_SYSTEMNAME:                                                   string    := '[%SystemName%]'
  AI_MAIN_EVENT_FLAG_SECTORLOC:                                                    string    := '[%SectorLoc%]'
  AI_MAIN_EVENT_FLAG_MOVEAMOUNT:                                                   string    := '[%MoveAmount%]'
  AI_MAIN_EVENT_FLAG_SUPPLYAMOUNT:                                                 string    := '[%SupplyAmount%]'
  AI_MAIN_EVENT_FLAG_ORDNANCEAMOUNT:                                               string    := '[%OrdnanceAmount%]'
  AI_MAIN_EVENT_FLAG_DESTSYSTEMNAME:                                               string    := '[%DestSystemName%]'
  AI_MAIN_EVENT_FLAG_DESTSECTORLOC:                                                string    := '[%DestSectorLoc%]'
  AI_MAIN_EVENT_FLAG_PLANETNAME:                                                   string    := '[%PlanetName%]'
  AI_MAIN_EVENT_FLAG_AMOUNT:                                                       string    := '[%Amount%]'
  AI_MAIN_EVENT_FLAG_STARNAME:                                                     string    := '[%StarName%]'
  AI_MAIN_EVENT_FLAG_WARPPOINTNAME:                                                string    := '[%WarpPointName%]'
  AI_MAIN_EVENT_FLAG_STORMNAME:                                                    string    := '[%StormName%]'
  AI_MAIN_EVENT_FLAG_MINERALSAMOUNT:                                               string    := '[%MineralsAmount%]'
  AI_MAIN_EVENT_FLAG_ORGANICSAMOUNT:                                               string    := '[%OrganicsAmount%]'
  AI_MAIN_EVENT_FLAG_RADIOACTIVESAMOUNT:                                           string    := '[%RadioactivesAmount%]'
  AI_MAIN_EVENT_FLAG_TIMEREMAINING:                                                string    := '[%TimeRemaining%]'

  AI_MAIN_EVENT_EVENT_MESSAGE_1_TITLE:                                             string    := 'Event Message 1 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_1_TEXT:                                              string    := 'Event Message 1 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_1_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 1 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_1_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 1 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_2_TITLE:                                             string    := 'Event Message 2 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_2_TEXT:                                              string    := 'Event Message 2 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_2_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 2 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_2_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 2 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_3_TITLE:                                             string    := 'Event Message 3 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_3_TEXT:                                              string    := 'Event Message 3 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_3_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 3 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_3_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 3 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_4_TITLE:                                             string    := 'Event Message 4 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_4_TEXT:                                              string    := 'Event Message 4 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_4_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 4 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_4_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 4 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_5_TITLE:                                             string    := 'Event Message 5 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_5_TEXT:                                              string    := 'Event Message 5 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_5_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 5 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_5_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 5 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_6_TITLE:                                             string    := 'Event Message 6 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_6_TEXT:                                              string    := 'Event Message 6 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_6_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 6 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_6_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 6 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_7_TITLE:                                             string    := 'Event Message 7 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_7_TEXT:                                              string    := 'Event Message 7 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_7_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 7 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_7_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 7 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_8_TITLE:                                             string    := 'Event Message 8 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_8_TEXT:                                              string    := 'Event Message 8 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_8_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 8 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_8_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 8 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_9_TITLE:                                             string    := 'Event Message 9 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_9_TEXT:                                              string    := 'Event Message 9 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_9_SMALL_PICTURE_FILENAME:                            string    := 'Event Message 9 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_9_LARGE_PICTURE_FILENAME:                            string    := 'Event Message 9 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_10_TITLE:                                            string    := 'Event Message 10 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_10_TEXT:                                             string    := 'Event Message 10 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_10_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 10 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_10_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 10 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_11_TITLE:                                            string    := 'Event Message 11 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_11_TEXT:                                             string    := 'Event Message 11 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_11_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 11 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_11_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 11 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_12_TITLE:                                            string    := 'Event Message 12 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_12_TEXT:                                             string    := 'Event Message 12 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_12_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 12 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_12_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 12 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_13_TITLE:                                            string    := 'Event Message 13 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_13_TEXT:                                             string    := 'Event Message 13 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_13_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 13 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_13_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 13 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_14_TITLE:                                            string    := 'Event Message 14 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_14_TEXT:                                             string    := 'Event Message 14 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_14_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 14 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_14_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 14 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_15_TITLE:                                            string    := 'Event Message 15 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_15_TEXT:                                             string    := 'Event Message 15 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_15_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 15 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_15_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 15 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_16_TITLE:                                            string    := 'Event Message 16 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_16_TEXT:                                             string    := 'Event Message 16 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_16_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 16 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_16_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 16 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_17_TITLE:                                            string    := 'Event Message 17 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_17_TEXT:                                             string    := 'Event Message 17 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_17_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 17 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_17_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 17 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_18_TITLE:                                            string    := 'Event Message 18 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_18_TEXT:                                             string    := 'Event Message 18 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_18_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 18 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_18_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 18 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_19_TITLE:                                            string    := 'Event Message 19 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_19_TEXT:                                             string    := 'Event Message 19 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_19_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 19 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_19_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 19 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_20_TITLE:                                            string    := 'Event Message 20 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_20_TEXT:                                             string    := 'Event Message 20 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_20_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 20 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_20_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 20 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_21_TITLE:                                            string    := 'Event Message 21 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_21_TEXT:                                             string    := 'Event Message 21 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_21_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 21 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_21_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 21 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_22_TITLE:                                            string    := 'Event Message 22 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_22_TEXT:                                             string    := 'Event Message 22 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_22_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 22 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_22_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 22 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_23_TITLE:                                            string    := 'Event Message 23 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_23_TEXT:                                             string    := 'Event Message 23 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_23_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 23 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_23_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 23 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_24_TITLE:                                            string    := 'Event Message 24 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_24_TEXT:                                             string    := 'Event Message 24 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_24_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 24 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_24_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 24 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_25_TITLE:                                            string    := 'Event Message 25 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_25_TEXT:                                             string    := 'Event Message 25 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_25_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 25 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_25_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 25 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_26_TITLE:                                            string    := 'Event Message 26 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_26_TEXT:                                             string    := 'Event Message 26 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_26_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 26 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_26_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 26 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_27_TITLE:                                            string    := 'Event Message 27 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_27_TEXT:                                             string    := 'Event Message 27 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_27_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 27 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_27_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 27 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_28_TITLE:                                            string    := 'Event Message 28 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_28_TEXT:                                             string    := 'Event Message 28 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_28_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 28 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_28_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 28 Large Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_29_TITLE:                                            string    := 'Event Message 29 Title'
  AI_MAIN_EVENT_EVENT_MESSAGE_29_TEXT:                                             string    := 'Event Message 29 Text'
  AI_MAIN_EVENT_EVENT_MESSAGE_29_SMALL_PICTURE_FILENAME:                           string    := 'Event Message 29 Small Picture Filename'
  AI_MAIN_EVENT_EVENT_MESSAGE_29_LARGE_PICTURE_FILENAME:                           string    := 'Event Message 29 Large Picture Filename'
 endglobalconst

I am using Python 3, and parsing with parser.r_program.parseFile().

Thanks in advance.

2015-05-17 06:03:24 - Animusmontus

Oops, forgot to say that it does identify the comment lines and global constants. The problem is at the end of the file, right after the 'endglobalconst'. There is no carriage return or line feed after 'endglobalconst'.

2015-05-17 07:00:34 - ptmcg

Well, it is helpful to see the full script, but this isn't really enough for me to see the problem directly. But let me suggest a troubleshooting approach.

From the naming, it looks like this input file contains only a r_global_constants structure. See if you have the same problem if you try parsing it with just that one expression instead of the full Or of all the program parts. If that still has an infinite loop, then please post the definition of r_global_constants (and any expressions it references). If that does not infinitely loop, then try adding other program parts one at a time until you get the infinite looping behavior. Then you can focus on what that added component does - for instance, if it starts with empty or restOfLine, those expressions might loop forever on the end of the input, since they don't necessarily advance the parsing position. Also, I suggest you try using MatchFirst instead of Or, if your components are sufficiently non-ambiguous. If you get infinite looping even on just r_global_constants, there might also be an issue with your definition of comments, if it is possible that they might match an empty string. HTH, -- Paul

2015-05-17 08:48:36 - Animusmontus

That script is the whole script minus some constant definitions so that it would fit the 20kb message limit. I have debug set to true on r_comment_line, r_includes and r_global_constants. I can see by the reported line and column that it is past the endglobalconst and attempting to parse non-existent text. I originally did not have the stringEnd added to r_program and it was doing the same thing. I thought it needed to be told where the end was, but that did not work.

The whole python script is:

#!/usr/bin/python3
'''
'''

import pyparsing as pp

_debug = True
_caseless = True

class parser(object):
    '''
    '''

    s_uppers = pp.srange('[A-Z0-9]')
    s_uppers_ = pp.srange('[A-Z0-9_!@#$%^&*]')

    t_int_literal = pp.Word(pp.nums)
    t_str_literal = pp.QuotedString("'", '\\', "''")
    t_real_literal = pp.Or([pp.Combine(pp.Word(pp.nums) + '.' + pp.Word(pp.nums) + pp.Optional(pp.oneOf('E e') + pp.Optional(pp.oneOf('+ -')) + pp.Word(pp.nums))),
                            pp.Combine(pp.Word(pp.nums) + pp.oneOf('E e') + pp.Optional(pp.oneOf('+ -')) + pp.Word(pp.nums))])
    t_bool_literal = pp.oneOf('false true')
    t_crlf = pp.Literal('\r\n')
    t_cr = pp.Literal('\r')
    t_lf = pp.Literal('\n')
    t_colon = pp.Suppress(pp.Literal(':'))
    t_assign = pp.Suppress(pp.Literal(':='))
    t_lp = pp.Suppress(pp.Literal('('))
    t_rp = pp.Suppress(pp.Literal(')'))
    t_end = pp.Suppress('end')
    t_begin = pp.Suppress('begin')

    r_filename = pp.Combine(pp.Word(pp.alphanums + '_') + '.txt')
    r_filename.setName('filename')
    r_filename.setDebug(True)

    r_include = pp.Group('#include' + r_filename)
    r_include.setName('include statement')
    r_include.setDebug(True)

    r_includes = pp.Group(pp.ZeroOrMore(r_include))

    r_upper_id = pp.Word(s_uppers, s_uppers_)
    r_upper_id.setName('upper id')
    r_upper_id.setDebug(True)

    r_type = pp.oneOf('long string real boolean longlist')
    r_type.setName('type')
    r_type.setDebug(True)

    r_value = t_int_literal | t_str_literal | t_real_literal | t_bool_literal
    r_value.setName('value')
    r_value.setDebug(True)

    r_constant_declaration = pp.Group(r_upper_id + t_colon + r_type + t_assign + r_value)
    r_constant_declaration.setName('constant declaration')
    r_constant_declaration.setDebug(True)

    r_constant_declarations = pp.OneOrMore(r_constant_declaration)

    r_global_constants = pp.Group('globalconst' + r_constant_declarations + pp.Suppress('endglobalconst'))
    r_global_constants.setName('global constants')
    r_global_constants.setDebug(True)

    r_id = pp.Word(pp.alphas, pp.alphanums + '_')

    r_variable_declaration = pp.Group(r_id + t_colon + r_type +
                                      pp.Optional(t_assign + r_value))

    r_variable_declarations = pp.OneOrMore(r_variable_declaration)

    r_global_variables = pp.Group('globalvars' + r_variable_declarations + pp.Suppress('endglobalvars'))

    r_function_parameter = pp.Group(r_id + t_colon + r_type + pp.Suppress(pp.Optional('byref')))

    r_function_parameters = pp.OneOrMore(r_function_parameter)

    r_forward_function_declaration = pp.Group('function' + r_id + pp.Suppress('returns') + r_type + 'params' + r_function_parameters + t_end)

    r_forward_function_declarations = pp.OneOrMore(r_forward_function_declaration)

    r_forward_declarations = pp.Group('deffunc' + r_forward_function_declarations + pp.Suppress('enddeffunc'))

    r_function_variable_declaration = pp.Group(r_id + t_colon + r_type)

    r_function_variable_declarations = pp.OneOrMore(r_function_variable_declaration)

    r_function_vars_declarations = pp.Group('vars' + r_function_variable_declarations)

    r_statement = pp.Forward()

    r_statements = pp.Group(pp.OneOrMore(r_statement))

    r_main_function = pp.Group('function' + 'Main' + pp.Suppress('returns') + r_type + r_function_vars_declarations + t_begin + r_statements + t_end)

    r_function_declaration = pp.Group('function' + r_id + pp.Suppress('returns') + r_type + pp.Suppress('params') + r_function_parameters + r_function_vars_declarations + t_begin +
                                      r_statements + t_end)

    r_function_declarations = pp.Group(pp.ZeroOrMore(r_function_declaration))

    r_comment_line = pp.Suppress('//' + pp.restOfLine('comment'))

    r_part = pp.Or([r_comment_line, r_includes, r_global_constants, r_global_variables, r_forward_declarations, r_main_function, r_function_declarations])

    r_program = pp.OneOrMore(r_part) + pp.stringEnd

    r_expression = pp.Forward()

    r_set_statement = pp.Group('set' + r_id + t_assign + r_expression)

    r_else_statement = pp.Group('else' + r_statements)

    r_if_statement = pp.Group('if' + r_expression + pp.Suppress('then') + r_statements + pp.Optional(r_else_statement) + pp.Suppress('endif'))

    r_for_statement = pp.Group('for' + r_id + t_assign + r_expression + pp.Suppress('to') + r_expression + pp.Suppress('do') + r_statements + pp.Suppress('endfor'))

    r_params = pp.Group(pp.delimitedList(r_expression))

    r_function_call = pp.Group(r_id + t_lp + r_params + t_rp)

    r_member_call = pp.Combine(r_id + '.' + r_function_call)

    r_callable = r_function_call | r_member_call

    r_expr_value = pp.Or([t_int_literal, t_str_literal, t_real_literal, t_bool_literal, r_callable])

    r_call_statement = pp.Group('call' + r_callable)

    r_loop_statement = pp.Group('loop' + r_statements + pp.Suppress('endloop'))

    r_return_statement = pp.Group('return' + r_expression)

    r_exitwhen_statement = pp.Group('exitwhen' + r_expression)

    r_case_test_statement = pp.Group(r_expr_value + t_colon + r_statements)

    r_case_test_statements = pp.OneOrMore(r_case_test_statement)

    r_case_statement = pp.Group('case' + r_id + r_case_test_statements + pp.Suppress('endcase'))

    r_statement <<= pp.Or([r_comment_line, r_set_statement, r_if_statement, r_for_statement, r_call_statement, r_loop_statement, r_return_statement, r_exitwhen_statement,
                          r_case_statement])

    r_expression <<= pp.operatorPrecedence(r_expr_value, 
                                           (
                                            (pp.oneOf('+ -'), 1, pp.opAssoc.RIGHT),
                                            ('^', 2, pp.opAssoc.RIGHT),
                                            (pp.oneOf('* /'), 2, pp.opAssoc.LEFT),
                                            (pp.oneOf('+ -'), 2, pp.opAssoc.LEFT),
                                            (pp.oneOf('and or xor !'), 2, pp.opAssoc.LEFT),
                                            (pp.oneOf('= < > >= <= <>'), 2, pp.opAssoc.LEFT)
                                           )
                                          )

2015-05-17 09:21:17 - Animusmontus

MatchFirst is a bust. Instead of going through all the Or alternatives, it infinitely loops over r_includes at the position of 'globalconst'.

2015-05-17 10:35:01 - ptmcg

Here is the parser fragment you posted earlier:

r_part = pp.Or([r_comment_line, r_includes, r_global_constants, r_global_variables, r_forward_declarations, r_main_function, r_function_declarations])
r_program = pp.OneOrMore(r_part) + pp.stringEnd

For the sake of readability, let's write that as:

r_part = pp.Or([r_comment_line, 
                r_includes, 
                r_global_constants, 
                r_global_variables, 
                r_forward_declarations, 
                r_main_function, 
                r_function_declarations])
r_program = pp.OneOrMore(r_part) + pp.stringEnd

Although I still find the operator syntax generally more pleasing to read:

r_part = (r_comment_line ^ 
            r_includes ^ 
            r_global_constants ^ 
            r_global_variables ^ 
            r_forward_declarations ^ 
            r_main_function ^ 
            r_function_declarations)

The problem in your parser is that you define two of these sub-elements using 'ZeroOrMore(subsubexpression)', so they will successfully match even if the subsubexpression is not present, such as at the end of the program. And since they match on nothing, they don't advance the parse position at all, and so the next OneOrMore iteration, they match nothing again, and again, and again.

The solution is to change this statement:

r_includes = pp.Group(pp.ZeroOrMore(r_include))

to this:

r_includes = pp.Group(pp.OneOrMore(r_include))

and this statement:

r_function_declarations = pp.Group(pp.ZeroOrMore(r_function_declaration))

to this:

r_function_declarations = pp.Group(pp.OneOrMore(r_function_declaration))

After changing these lines, I was able to get your parser to terminate on the given sample text.

-- Paul

2015-05-17 12:09:49 - Animusmontus

Yeah, I finally found that. They were ZeroOrMore when I first wrote the parser, which had a fixed order for all the parts. But looking at the scripts some, I was not sure of the order of some of the parts, so I added r_part and did not change anything above it. The fixed order was r_includes, r_global_constants, r_global_variables, r_forward_declarations, r_main_function, r_function_declarations; but the comments in the scripts mentioned that the includes were actually supposed to fit between global constants and global variables. So I changed it to be more, let's say, flexible.

2015-05-27 06:00:00 - ptmcg

This architectural guideline is an excellent developer's mantra: Be strict when sending and tolerant when receiving. Flexibility is important in parsers, and the more variety you can tolerate in inbound data, the more robust your parser will be, and likely to last over time without constant intervention.


2015-05-23 10:09:32 - lars.stavholm - Line sensitive parsing

What's the best approach for line-sensitive parsing with pyparsing?

I'm trying to parse the output of the dmidecode command as follows:

Memory Device
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: DIMM
        Set: 1
        Locator: DIMM_A1 
        Bank Locator: Not Specified
        Type: DDR3
        Type Detail: Synchronous
        Speed: 1600 MHz
        Manufacturer: 00CE00B300CE
        Serial Number: 13B9E3D5
        Asset Tag: 03133563
        Part Number: M393B1G70BH0-YK0  
        Rank: 1

Memory Device
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: DIMM
        Set: 1
[snip]

My parser looks like this:

header = Group(Literal('Memory Device') + LineEnd())
colon = Literal(':').suppress() 
caption = Group(OneOrMore(Word(alphas)) + FollowedBy(colon))
value = Group(restOfLine())
item = Group(caption + colon + value)
section = Group(header + OneOrMore(item))
parser = OneOrMore(section)

And the printout of the result is:

result = parsed.asList()
pprint(result)

[[['Memory Device', '\n'], [['Total', 'Width'], [' 72 bits\r']],
[['Data', 'Width'], [' 64 bits\r']],
[['Size'], [' 8192 MB\r']],
[['Form', 'Factor'], [' DIMM\r']],
[['Set'], [' 1\r']],
[['Locator'], [' DIMM_A1 \r']],
[['Bank', 'Locator'], [' Not Specified\r']],
[['Type'], [' DDR3\r']],
[['Type', 'Detail'], [' Synchronous\r']],
[['Speed'], [' 1600 MHz\r']],
[['Manufacturer'], [' 00CE00B300CE\r']],
[['Serial', 'Number'], [' 13B9E3D5\r']],
[['Asset', 'Tag'], [' 03133563\r']],
[['Part', 'Number'], [' M393B1G70BH0-YK0 \r']],
[['Rank'], [' 1\r']],
[['Memory', 'Device', 'Total', 'Width'], [' 72 bits\r']],
[snip]

As you can see, the second header ('Memory Device') doesn't stop at LineEnd(), but continues into the next caption.

What am I doing wrong? (it's driving me nuts) (BTW, PyParsing rocks!)

2015-05-27 05:52:34 - ptmcg

This is because 'Memory Device' matches the specification for a caption, so it and the following caption get slurped into the body of the previous section. You might need to define line breaks as significant whitespace, but there are several steps to this, and you might get away with making a slight mod to the definition of caption, to use a negative lookahead to make sure that 'Memory Device' separators don't get misinterpreted as captions:

caption = Group(~header + OneOrMore(Word(alphas)) + FollowedBy(colon))

If you find that you have to define more headers than just 'Memory Device', then you would just expand this to:

caption = Group(~(header1 | header2 | header3) + OneOrMore(Word(alphas)) + FollowedBy(colon))

or perhaps more neatly:

header = header1 | header2 | header3
caption = Group(~header + OneOrMore(Word(alphas)) + FollowedBy(colon))

Congrats on making good progress on this on your own!

-- Paul
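The negative-lookahead fix can be sketched end to end (sample trimmed to two sections; names follow the original post):

```python
from pyparsing import (FollowedBy, Group, LineEnd, Literal, OneOrMore,
                       Word, alphas, restOfLine)

header = Group(Literal('Memory Device') + LineEnd().suppress())
colon = Literal(':').suppress()
# negative lookahead: a caption may not begin where a header begins
caption = Group(~header + OneOrMore(Word(alphas)) + FollowedBy(colon))
item = Group(caption + colon + restOfLine)
section = Group(header + OneOrMore(item))
parser = OneOrMore(section)

sample = '''Memory Device
        Total Width: 72 bits
        Size: 8192 MB

Memory Device
        Total Width: 72 bits
'''
result = parser.parseString(sample)
print(len(result))  # 2 -- the second header starts a new section
```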

2015-05-27 09:00:56 - lars.stavholm

Thanks, I've tried line breaks as significant whitespace, and it worked but looks kinda' ugly, I'm trying to avoid that type of solution. As a workaround I added markers for line start, line end, and TAB's. That's working, but still kinda' ugly. The negative lookahead sounds promising, I'll try that and get back to you.


2015-05-26 08:05:47 - AndreWin - Parsing c++ code block with arbitrary order of its elements

Hello! I have for example the following c++ code:

s = '''
// comment1
int var1;
int var2;
// comment2
void somemethod1();
'''

These elements can be placed in any order. I wrote the following:

import pyparsing as pp
comment = pp.Literal('\\') + pp.restOfLine
var_name = pp.Word(pp.alphas, pp.alphanums + '_')
var_def = pp.Keyword('int') + var_name + ';' + pp.Optional(pp.restOfLine)
func_name = var_name
func_def = pp.Keyword('void') + func_name + pp.Literal('();') + pp.Optional(pp.restOfLine)
class_body = pp.ZeroOrMore(comment) & pp.ZeroOrMore(var_def) & pp.ZeroOrMore(func_def)
class_body.parseString(s)

I get the error: ParseException: Expected end of text (at char 0), (line:1, col:1). How can I solve this? Thanks, Andrei.

2015-05-26 08:42:03 - ptmcg

Changing the literal string in your comment from \ to // will clean things up for you.

A few other suggestions:

  • Use Groups to add structure to your parsed output. Right now all the parsed tokens are just one long list of strings
  • Avoid literals that are really a series of potentially separate tokens. Defining the function body as '();' means that you won't detect functions with extra whitespace inside the ()'s or before the ';'. You can fix this by replacing pp.Literal('();') with pp.Literal('(')+pp.Literal(')')+pp.Literal(';'), or the simplified version below. (this also gives you an option to add a placeholder in case you have to handle the passing in of function arguments).

Finally, with Group adding per-statement structure, I can just call pprint() on the returned ParseResults:

import pyparsing as pp
LPAR,RPAR,SEMI = map(pp.Literal,'();')
G = pp.Group
comment = G(pp.Literal('//') + pp.restOfLine)
var_name = pp.Word(pp.alphas, pp.alphanums + '_')
var_def = G(pp.Keyword('int') + var_name + SEMI + pp.Optional(pp.restOfLine))
func_name = var_name
func_args = pp.Empty() # placeholder here for adding args later
func_def = G(pp.Keyword('void') + func_name + LPAR + G(pp.Optional(func_args)) + RPAR + SEMI + pp.Optional(pp.restOfLine))
class_body = pp.ZeroOrMore(comment) & pp.ZeroOrMore(var_def) & pp.ZeroOrMore(func_def)
class_body.parseString(s).pprint()

prints:

[['//', ' comment1'],
 ['int', 'var1', ';', ''],
 ['int', 'var2', ';', ''],
 ['//', ' comment2'],
 ['void', 'somemethod1', '(', [], ')', ';', '']]

2015-05-27 01:06:16 - AndreWin

Thank you very much!) My parser works now!

2015-05-27 01:08:40 - AndreWin

To be honest, I didn't know that a parser can modify text. I rewrote my parser so it does this replacement itself, so I don't need any loop in my code. Cool! :)

2015-05-27 05:43:33 - ptmcg

What kind of text modification are you doing? Are you using a parse action for this? Glad to hear these advanced features are working out for you, congratulations!

2015-05-27 05:52:41 - AndreWin

Thanks :) I need to insert a macro name at the start of the class body and before class methods beginning with void. I used setParseAction and transformString for it. The examples for pyparsing were very helpful for me.
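For readers landing here, the transformString technique mentioned above looks roughly like this; the macro name (EXPORT_MACRO) and the simplified method grammar are made up for illustration:

```python
import pyparsing as pp

# Hypothetical: prefix every "void" method declaration with a macro name.
method_decl = (pp.Keyword('void') + pp.Word(pp.alphas, pp.alphanums + '_')
               + pp.Literal('();'))
# The parse action returns a single replacement string; transformString
# substitutes it for the matched text and leaves everything else untouched.
method_decl.setParseAction(lambda t: 'EXPORT_MACRO %s %s%s' % (t[0], t[1], t[2]))

src = 'int var1;\nvoid somemethod1();'
print(method_decl.transformString(src))
# int var1;
# EXPORT_MACRO void somemethod1();
```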


2015-05-29 08:12:18 - jellby - Comments in blocks

I have a very simple grammar. Basically it's made up of blocks, each of which starts with '&' and some name, and extends down to the next one or to the end of the file. In addition, there are comments of two kinds: one-line comments, from a '*' to the end of the line, and C-style comments, enclosed in '/* */'. The comments can be inside the blocks too. My try so far:

#!/usr/bin/env python
from pyparsing import *

NL = Suppress( LineEnd() )
restOfLineNL = restOfLine + NL

LineComment = Literal('*') + restOfLineNL
BlockComment = nestedExpr('/*', '*/')
Comment = BlockComment | LineComment

ModuleName = LineStart() + Word( '&', alphanums + '_', min=2 ) + NL
Module = ModuleName + SkipTo( StringEnd() | ModuleName, include=False)
Grammar = ZeroOrMore( Module )
Grammar.ignore(Comment)

result = Grammar.parseString('''
&keyword1

*comment

&keyword2
arbitrary lines
* with comments interspersed
possibly empty

/*
&keyword3
c-style comments
are allowed too
*/

&keyword4
''', parseAll=False)

for line in result:
  print('|'+line.rstrip()+'|')

this results in:

|&keyword1|
||
|&keyword2|
|arbitrary lines
* with comments interspersed
possibly empty

/|
|&keyword3|
|c-style comments
are allowed too|
|&keyword4|
||

This has a few problems:

  1. The comment inside &keyword2 is not removed.

  2. The C-style comment around &keyword3 is not removed.

  3. I would like to remove the empty strings too (this is minor).

Thanks in advance.


2015-06-04 13:49:16 - r3d4rmy - Parsing dblSlashComment

I'm relatively new to PyParsing and I've been able to get a good way just going off what I've read on here but for this one issue I can't seem to find a way to resolve it given what I currently understand.

I used the response in to structure what I needed. The main difference is I'm dealing with dblSlashComments that can appear in different places.

Example input txt file:

// Comment 1

BeginScript ScriptName
{

FunctionA(' Some text' VarA 'Some text')

FunctionB(VarB = $Dir_text 'Some text', VarC = 'Some text');

FunctionC('Some text')

//Comment 2

FunctionD(VarD = Value1, VarE = Value2, //Comment 2
                 VarF = Value3,...); //Comment 3

FunctionE(VarG, VarH = 'Some text');

FunctionF(VarI = //Comment 4
                  Value4);

FunctionG(//Comment
                  VarG = Value5);
}

I can't post an actual example due to the nature of my work, so the following are the rules the above pseudo-example uses:

  1. A comment starts with '//' and can appear anywhere, but it will always be followed by a newline
  2. All functions are enclosed by '(' and ');'
  3. A function can take anywhere from 0 to many arguments
  4. A function can only take a standalone variable if it's the first argument
  5. A function can only take a variable, string, or assignment as arguments
  6. A function can only contain alphabetical characters
  7. An assignment can be either Var1 = Var2, Var1 = 'Text', or Var1 = float or int
  8. Variables on the left side of an '=' can only be alphanumeric while the right side can be almost any combination of characters (no '/' or '\\' though)
  9. Multiple arguments are separated by commas
  10. A string will always be enclosed by double quotes
  11. Strings should maintain their white space
  12. Strings can span multiple lines
  13. Anything that is not a comment or function can be ignored
  14. Comments should be preserved for the parsed function (if possible)

I know this is a lot to try and understand so below you'll see my initial shot at it:

# Integer can be a positive or negative number
        integer_ = Regex(r'-?[0-9]+')

        # Float is any positive or negative digit, followed by a decimal, followed by digits
        float_ = Regex(r'-?[0-9]+\.[0-9]+')

        # A double quoted string that can span multiple lines OR a double quoted string
        quotedStr = QuotedString('"', multiline=True) | dblQuotedString

        # Define basic punctuation and suppress them from showing in the output
        EQ,COMMA,LPAR,RPAR,SEMI = map(Suppress,'=,();')

        # Definition of a function value
        # [Any upper or lower case alphabetical characters]
        function = Word(alphas)

        # Definition of the value 'x' in 'x' = 'y'
        # [Same as function definition plus integers]
        assignee = Word(alphanums)

        # Defintion of a specialized alphanumeric string
        # [Same as assignee definition plus special chars]
        alphanum_special = Word(alphanums + '_$%*:-')

        # Define a placeholder for func_call - we don't have it yet, but we need it now
        func_call = Forward()

        # Definition of the value 'y' in 'x' = 'y'
        # [Float OR integer OR alphanumeric/special chars OR quoted string]
        expr_arg = (float_ | integer_ | alphanum_special | quotedStr)

        # Definition of the complete structure 'x' = 'y'
        named_arg = (assignee + EQ + expr_arg)

        # Definition for what makes a complete argument in a function
        # [Either a named_arg OR an expr_arg]
        ##func_arg = (named_arg | expr_arg) # Removed due to comment issue

        func_call << (function('fname') + LPAR +
        ZeroOrMore((Optional(ZeroOrMore(named_arg + COMMA)) |
        Optional(named_arg) |
        Optional(expr_arg)) + Optional(dblSlashComment))
         + RPAR + SEMI)

        # Statements are comments or function calls
        statement = Group(func_call('func_call') | dblSlashComment('comment'))

        try:
            stmts = OneOrMore(statement).parseString(text)
            # for each statement, print out parsed bits

            for stmt in stmts:
                #~ print stmt.dump()
                if stmt.comment:
                    print 'COMMENT: ', stmt.comment[0].strip()
                if stmt.fname:
                    print 'FUNCTION CALL: ', stmt.fname
                    print 'ARGUMENTS: ', stmt.args
        except ParseException, err:
            print err.line
            print ' '*(err.column-1) + '^'
            print err

I'm most concerned about solving the comment issue as it's proving to be the biggest hassle. If I can only keep comments that appear outside of functions, I'd be ok with that as well.

I currently have an infinite loop due to the first ZeroOrMore in func_call; however, if I take it out, then it stops after the first occurrence of a comment, so I'm obviously not parsing something right.

Thank you for reading and any help is appreciated.

2015-06-07 02:19:23 - ptmcg

Hmmm. In general, pyparsing parsers usually are written to discard comments as they are found. It is difficult to anticipate all the places comments might occur in source code. For instance, in this familiar expression,

y = m*x + b;

A C programmer could comment this as:

y /* dependent value */ = /* slope */ m * /* independent value */ x + /* y-intercept */ b;

Or this for loop:

for (line=1; line <= page_size; ++line)

for (line=1; // line is 1-based
     line <= min(numlines, page_size); // watch out for array index overflow
     ++line)

To handle all these cases, you end up sprinkling Comment expressions all over your nice grammar until it is incomprehensible.

So for pyparsing, I added the 'ignore' method, so that comments could be defined as 'ignorable'. 'ignore' is usually called just once, after the complete grammar is defined, for example:

overall_parser.ignore(cStyleComment)

Now when overall_parser.parseString is called, any sub-expression will first check for and skip over any C-style comment. (All ignored expressions are checked in the same code that skips over whitespace.)

Are you absolutely sure you want the comments? It is going to be a fair bit of work to keep the flexibility of where comments can be found, and keep your parser manageable. One approach might be to take two passes at your code: pass 1 will find all the comments and make note of their locations (use scanString to simplify doing this); then in pass 2, ignore comments, but annotate functions with comments that were found in pass 1. Other than this, I don't have much to suggest. I don't think it is possible to have an expression that is 'ignore'd to appear in the body of a parser - it would never match.

Good luck with your project. -- Paul
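Paul's two-pass idea can be sketched with scanString: pyparsing's built-in dblSlashComment yields each comment with its character offsets, and lineno converts an offset to a line number (the sample code below is illustrative, not the poster's input):

```python
import pyparsing as pp

code = '''foo(a=1); // trailing comment
// standalone comment
bar(b=2);'''

# pass 1: record the line number and text of every // comment
comment_locs = [(pp.lineno(start, code), toks[0])
                for toks, start, end in pp.dblSlashComment.scanString(code)]
print(comment_locs)
```

Pass 2 would then parse the code with comments ignored, and annotate each function call with the comments recorded near its line numbers.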

2015-06-07 02:25:30 - ptmcg

Sorry for writing that long bit about comments appearing anywhere - rereading your original question, it is clear you are very well aware of this. But the main question still boils down to: do you really need the comments? I think your parser will spring to life if you remove all the Optional(dblSlashComment) expressions, and just add stmts.ignore(dblSlashComment) before using stmts to parse your code. This parser would work for pass 2 in the scenario I described above, and it would clean out a lot of the comment-handling clutter that you have in your parser now.

-- Paul
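A minimal sketch of this pass-2 parser, assuming a simplified func(name=value, ...) statement shape (the grammar names here are illustrative, not the poster's full grammar); a single ignore call makes comments disappear wherever whitespace can occur:

```python
import pyparsing as pp

LPAR, RPAR, SEMI, EQ, COMMA = map(pp.Suppress, '();=,')
name = pp.Word(pp.alphas + '_', pp.alphanums + '_')
value = pp.Word(pp.alphanums)
named_arg = pp.Group(name + EQ + value)
args = pp.Optional(named_arg + pp.ZeroOrMore(COMMA + named_arg))
func_call = pp.Group(name('fname') + LPAR + pp.Group(args)('args') + RPAR + SEMI)

stmts = pp.OneOrMore(func_call)
stmts.ignore(pp.dblSlashComment)   # comments are skipped wherever whitespace may appear

sample = '''
// setup
foo(a=1,   // inline comment
    b=2);
bar(c=3);
'''
result = stmts.parseString(sample, parseAll=True)
for st in result:
    print(st.fname, st.args.asList())
```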

2015-06-08 07:43:55 - r3d4rmy

Thanks for the quick reply Paul. I'm trying out what you said, since while preserving the comments would be nice, it's not a necessity.

The script seems to have a fit over the line:

named_arg = (assignee + EQ + expr_arg)

I turned on debug and it seems to keep expecting an alpha word:

Exception raised:Expected W:(abcd...) (at char 167), (line:2, col:65)

Not so coincidentally, that's the exact location of the first comment embedded in a function with multiple arguments. I've tried using the ignore method, but apparently I haven't been able to make it understand that I just want to pretend the comment isn't there and go on to the next line.

I might be missing something basic here though and not realizing it.

2015-06-08 12:36:09 - r3d4rmy

Ok, so I solved everything that I wanted to except dealing with quotes. What would be the most correct way to say: if the value y in an assignment x = y is a quoted string, preserve the whitespace inside the string and pass it through with the quotes intact? An example might be: String = 'Keep the spacing here', which should end up looking like [String, 'Keep the spacing here'].
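For the follow-up question about quotes: pyparsing's QuotedString with unquoteResults=False returns the matched string verbatim, quotes and inner spacing included. A sketch with illustrative names:

```python
import pyparsing as pp

EQ = pp.Suppress('=')
name = pp.Word(pp.alphas, pp.alphanums)
# unquoteResults=False keeps the quote characters and the inner spacing intact
quoted = pp.QuotedString("'", unquoteResults=False)
number = pp.Word(pp.nums)
assignment = pp.Group(name + EQ + (quoted | number))

result = assignment.parseString("String = 'Keep the spacing here'")
print(result.asList())  # [['String', "'Keep the spacing here'"]]
```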


2015-06-05 08:39:06 - onyx_onyx - Error caused by wrong token

Hi.

I'm new to PyParsing, and I've been tinkering about trying to build a simple parser for small configuration files of the form:

lorryCapacity <float>
serviceTime <int>
binCapacity <float>

I have the following grammar:

lorryCapKWD = Literal('lorryCapacity')
serviceTimeKWD = Literal('serviceTime')
binCapacityKWD = Literal('binCapacity')

floatVal = Regex(r'\d+\.\d+')
number = Regex(r'\d+')

lorryCapDecl    = Group(lorryCapKWD + floatVal)
serviceTimeDecl = Group(serviceTimeKWD + number)
binCapDecl      = Group(binCapacityKWD + floatVal)

configFile = Group(lorryCapDecl + serviceTimeDecl + binCapDecl)

Now, this parses wonderfully with valid input. My problem arises when we supply something malformed, for example, specifying a float after the 'serviceTime' keyword. The parse fails (as it should), but it fails with a message similar to 'expected 'binCapacity''. This is wrong; it should fail because a float was encountered where an int was expected, and print an error referencing THAT mistake.

I think I understand the reason behind this - the code sees the leading 'whole' part of the number, e.g. 2 in serviceTime 2.34, accepts that as the int that was expected, and moves on to parse the rest of the input. It proceeds to encounter a '.' character, and, as this wasn't the token 'binCapacity' that was expected, it throws the error I mentioned above.

How would I go about fixing things so the whole '2.34' gets parsed, recognised as a non-integer, and correctly flagged as the source of error?

Apologies if this is something really simple - I am a PyParsing newbie!

2015-06-07 01:55:17 - ptmcg

Your issue is very similar to zaymich's, which I just responded to. Pyparsing's error messaging can be confusing because of the internal backtracking it does while trying to match subexpressions within your grammar.

The solution to your problem is the same as for zaymich: use the '-' operator. But I think your parser is an even better case for '-'. '-' (which internally I call ErrorStop, to mean 'stop backtracking here in case of subsequent error') is best used when you have syntax like:

cmdA = keywordA + ... rest of command A's elements ...
cmdB = keywordB + ... rest of command B's elements ...
cmdC = keywordC + ... rest of command C's elements ...

In this grammar, once keywordA is matched, then one must have the remaining parts of command A match.

For your parser, this becomes:

lorryCapDecl    = Group(lorryCapKWD - floatVal)
serviceTimeDecl = Group(serviceTimeKWD - number)
binCapDecl      = Group(binCapacityKWD - floatVal)

I hope this improves your error messages.

-- Paul

2015-06-07 11:25:48 - onyx_onyx

I tried your suggestion, and used

lorryCapDecl    = Group(lorryCapKWD - floatVal)
serviceTimeDecl = Group(serviceTimeKWD - number)
binCapDecl      = Group(binCapacityKWD - floatVal)

Unfortunately, the original issue remains; I still get an error saying 'expected 'binCapacity'' instead of one referring to the invalid value type that has been specified when I supply a malformed input script.

2015-06-07 13:25:27 - ptmcg

Please post the malformed script that is giving you this error.

2015-06-07 13:40:41 - ptmcg

Nevermind, I have recreated your error.

2015-06-07 13:57:50 - ptmcg

I added tests to your program like this, trying a couple variations of bad input:

tests = '''\
lorryCapacity 1.1
serviceTime 200
binCapacity 2.2
=\
lorryCapacity 1.1
serviceTime 200
binCapacity 22
=\
lorryCapacity 1.1
serviceTime 200.5
binCapacity 22
=\
lorryCapacity 1.1
binCapacity 2.2
=\
lorryCapacity 1.1
serviceTime 200
'''.split('=')

for t in tests:
    print t
    try:
        configFile.parseString(t).pprint()
    except ParseException as e:
        print e.line
        print ' '*(e.col-1)+'^'
        print e
    print

Things look okay, except for Test 3, which is the specific case you described, a floating point value for service time:

lorryCapacity 1.1
serviceTime 200
binCapacity 2.2

[[['lorryCapacity', '1.1'], ['serviceTime', '200'], ['binCapacity', '2.2']]]

lorryCapacity 1.1
serviceTime 200
binCapacity 22

binCapacity 22
            ^
Expected Re:('\\d+\\.\\d+') (at char 46), (line:3, col:13)

lorryCapacity 1.1
serviceTime 200.5
binCapacity 22

serviceTime 200.5
               ^
Expected 'binCapacity' (at char 33), (line:2, col:16)

lorryCapacity 1.1
binCapacity 2.2

binCapacity 2.2
^
Expected 'serviceTime' (at char 18), (line:2, col:1)

lorryCapacity 1.1
serviceTime 200


^
Expected 'binCapacity' (at char 34), (line:3, col:1)

The simplest solution is to add a negative lookahead to the items that take integer arguments, like this:

serviceTimeDecl = Group(serviceTimeKWD + ~floatVal + number)

Giving:

serviceTime 200.5
           ^
Found unwanted token, Re:('\\d+\\.\\d+') (at char 29), (line:2, col:12)

This is better, but the reported position is at the front of the leading whitespace (because NotAny, the pyparsing class used to implement the '~' operator, does not do whitespace skipping). We can force whitespace skipping by adding a do-nothing Empty() expression - just like other helpers like 'lineStart' and 'lineEnd', pyparsing defines 'empty' for Empty(). An Empty() expression always matches, matching nothing, but only after skipping over whitespace. So by expanding your expression to:

serviceTimeDecl = Group(serviceTimeKWD + empty + ~floatVal + number)

The exception output is now:

serviceTime 200.5
            ^
Found unwanted token, Re:('\\d+\\.\\d+') (at char 30), (line:2, col:13)

Finally, to make your error message intelligible to those not familiar with your parser (or to you, 6 months from now), override pyparsing's default naming for expressions using setName():

floatVal = Regex(r'\d+\.\d+').setName('real number')
number = Regex(r'\d+').setName('integer number')

Not to be confused with 'setResultsName', which assigns field names to the parsed output, 'setName' assigns a human-friendly name to the expression itself. Your exception output is now:

serviceTime 200.5
            ^
Found unwanted token, real number (at char 30), (line:2, col:13)

Sorry for steering you down the ErrorStop path earlier, I was thinking of a different kind of parser. You may still find it useful at some time in the future (mostly when the order of the elements is not predictable), but for now, you can leave the '+' operators where they are.

Good luck - welcome to pyparsing! -- Paul
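Putting the pieces from this thread together, a complete runnable sketch of the final parser (Empty() forces whitespace skipping before the negative lookahead, as Paul describes):

```python
import pyparsing as pp

lorryCapKWD = pp.Literal('lorryCapacity')
serviceTimeKWD = pp.Literal('serviceTime')
binCapacityKWD = pp.Literal('binCapacity')

floatVal = pp.Regex(r'\d+\.\d+').setName('real number')
number = pp.Regex(r'\d+').setName('integer number')

lorryCapDecl = pp.Group(lorryCapKWD + floatVal)
# Empty() skips whitespace; ~floatVal rejects a real number where an int belongs
serviceTimeDecl = pp.Group(serviceTimeKWD + pp.Empty() + ~floatVal + number)
binCapDecl = pp.Group(binCapacityKWD + floatVal)
configFile = pp.Group(lorryCapDecl + serviceTimeDecl + binCapDecl)

good = 'lorryCapacity 1.1\nserviceTime 200\nbinCapacity 2.2'
bad = 'lorryCapacity 1.1\nserviceTime 200.5\nbinCapacity 2.2'

print(configFile.parseString(good).asList())
try:
    configFile.parseString(bad)
except pp.ParseException as e:
    print(e)  # points at the unwanted real number after serviceTime
```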


2015-06-07 00:06:39 - zaymich - Parsing of Optional(...)

Hi. I have the following grammar:

from __future__ import unicode_literals
from pyparsing import *

clause_a = Literal('aa')
clause_b = Literal('bb')
clause_c = Literal('cc')

clauses = clause_a + Optional(clause_b) + Optional(clause_c)

clause_d = Literal('dd')

root = clauses + Optional(clause_d) + StringEnd()

print root.parseString('aabbcc') #<= Ok
print root.parseString('aabb') #<= Ok
print root.parseString('aacc') #<= Ok
print root.parseString('aabbdd') #<= Ok

print root.parseString('aabdd') #<= here i want exception for clause_b!

How?

Thanks for answer, great thanks for pyparsing.

2015-06-07 01:46:54 - ptmcg

To change this behavior, you have to do a couple of things.

First, your definition of clause_b has to break up the two 'b' characters, so that pyparsing has a chance to do some processing between them. You could do:

B = Literal('b')
clause_b =  B + B

But then you would start getting results like:

['aa', 'b', 'b', 'cc']

To combine these back together use pyparsing's Combine class:

clause_b =  Combine(B + B)

['aa', 'bb', 'cc']

Now you are back to where you were before, but now you can change pyparsing's error handling. Pyparsing defines an alternative to the '+' operator for combining expressions: '-'. Using the '-' operator, you can suppress pyparsing's backtracking when an error is found. If you write clause_b as:

clause_b =  Combine(B - B)

Now you have the behavior you want. The first 'b' will be matched, but when failing on the second 'b', you will get this exception:

pyparsing.ParseSyntaxException: Expected 'b' (at char 3), (line:1, col:4)

Be judicious in how you use '-', don't just change all your '+' operators to '-'! Pyparsing's backtracking is often the desired behavior.

Cheers, -- Paul
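A side-by-side sketch of the two behaviors on the failing input 'aabdd': with '+', the Optional backtracks and the reported error is a confusing failure on 'dd'; with '-', the failure surfaces where it actually happened, as a ParseSyntaxException:

```python
import pyparsing as pp

B = pp.Literal('b')
clause_b_plus = pp.Combine(B + B)    # '+' allows silent backtracking on failure
clause_b_minus = pp.Combine(B - B)   # '-' stops backtracking (ErrorStop)

expr_plus = pp.Literal('aa') + pp.Optional(clause_b_plus) + pp.Literal('dd')
expr_minus = pp.Literal('aa') + pp.Optional(clause_b_minus) + pp.Literal('dd')

def error_for(expr, s):
    # return the exception type name and message raised while parsing s
    try:
        expr.parseString(s)
    except pp.ParseBaseException as e:
        return type(e).__name__, str(e)

print(error_for(expr_plus, 'aabdd'))
print(error_for(expr_minus, 'aabdd'))
print(expr_minus.parseString('aabbdd'))  # well-formed input still parses
```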


2015-06-17 12:34:46 - nicoder - Class vs Instance Declaration for ParseElements?

Hi all,

In all the examples I have seen, parse elements have been defined as class variables, for example:

However, doing it this way is causing issues for me as I want to pass parseResults to an object for further processing.

Is there anything wrong with creating my parser like:

class Parser(object):

    # keywords
    def __init__(self):
        self.ID = Word(alphas + '_', alphanums + '_')

etc.

I have really screwed up my parser with getting static and instance confused, and I just want to do it all in an object now.

Any help would be much appreciated.

2015-08-19 06:11:47 - ptmcg

I think you might be misunderstanding the role of the classes in your posted example. The parser itself is defined in a module-level method called parser, no classes involved. The purpose of the classes is to act as parse-time callbacks - that is, when some expression in the parser is matched, pyparsing takes the parsed tokens, and creates one of the AQL classes with it. So in aql.py, the classes are for post-parsing work, not for parser definition.

Let's say you want to have a simple-minded date validator that looks at dates like YYYY/MM/DD, and for whatever reason, you want to validate that MM is from 1 to 12 and DD is from 1 to 31 (which can still give invalid dates like Feb 30, but that's not relevant for this example). I can create a Validator class that gets constructed with a valid value range, and then give it a validate method that will be a parse-time parse action (doing as you say, 'pass ParseResults to an object for further processing'). Here is how a Validator class might look:

from pyparsing import *

class Validator(object):
    def __init__(self, minval, maxval):
        self.minval = minval
        self.maxval = maxval
    def __call__(self, s, locn, tokens):
        if not self.minval <= tokens[0] <= self.maxval:
            raise ParseException(s,locn,
                                 'parsed value %s not in range(%d to %d)' % (tokens[0], self.minval, self.maxval))
integer = Word(nums).setParseAction(lambda t: int(t[0]))

date = (integer + '/' + 
        integer().addParseAction(Validator(1,12)) + '/' + 
        integer().addParseAction(Validator(1,31)))

print date.parseString('2000/1/1')   # passes both range validations
print date.parseString('2000/13/1')  # raises ParseException: parsed value 13 not in range(1 to 12)

2015-06-27 01:17:09 - HubbaBubbaMan - Building a template engine with pyparsing

Hey all,

as a little practice project for programming I want to build a template engine like . But now, I have problems defining my grammar...

For example consider the following template:

<!doctype html>
<html>
    <head>
        <title>Welcome to {{ page }}</title>
    </head>
    <body>
        {% each visitors %}
            {% if it.member %}
                <div>{% it.name %} is a member</div>
        {% else %}
        <div>{% it.name %} is not a member</div>
        {% end %}
    {% end %}
    </body>
</html>

I started with the following code for the grammar

VAR_START = Suppress('{{')
VAR_END = Suppress('}}') 
variable = VAR_START + SkipTo(VAR_END) + VAR_END

BLOCK_START = Suppress('{%')
BLOCK_END = Suppress('%}')
BLOCK_END_PART = BLOCK_START + Suppress('end') + BLOCK_END 

block_body = Forward()

loop_head = BLOCK_START + Suppress('each') + Word(alphas, alphanums + '_') + BLOCK_END
loop = loop_head + block_body + BLOCK_END_PART

but how can I define block_body in the end? I don't really have an idea because the block_body can start with text containing variables and then there will be again a loop or a condition...

I would very much appreciate any help of you :-)

Thanks
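The thread received no reply; one common way to close the recursion is to make block_body a ZeroOrMore over the three kinds of content: nested loops, variables, and plain text. A sketch, assuming plain text never contains a bare '{' (a simplification; real engines handle that case):

```python
import pyparsing as pp

VAR_START, VAR_END = pp.Suppress('{{'), pp.Suppress('}}')
BLOCK_START, BLOCK_END = pp.Suppress('{%'), pp.Suppress('%}')

variable = VAR_START + pp.SkipTo(VAR_END).setParseAction(lambda t: t[0].strip()) + VAR_END
block_end = BLOCK_START + pp.Suppress('end') + BLOCK_END
text = pp.CharsNotIn('{')  # assumption: plain text contains no lone '{'

block_body = pp.Forward()
loop_head = (BLOCK_START + pp.Suppress('each')
             + pp.Word(pp.alphas, pp.alphanums + '_') + BLOCK_END)
loop = pp.Group(loop_head + block_body + block_end)
block_body <<= pp.ZeroOrMore(loop | variable | text)

template = block_body
tmpl = 'Hello {{ name }}! {% each items %}[{{ it }}]{% end %} bye'
print(template.parseString(tmpl, parseAll=True).asList())
```

Because every alternative consumes at least one character, the ZeroOrMore cannot loop forever; a SkipTo-based text chunk could match empty and hang it, which is why CharsNotIn is used here.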


2015-07-03 22:03:29 - zaymich - Line comments and arithmetic expressions

Hi!

I've defined the following grammar:

def ParseComment(tokens):
    return 'comment'

line_comment = (Suppress(Literal('--')) 
                + restOfLine() 
                + StringEnd()).setParseAction(ParseComment)

expr_operand = Optional(line_comment) + Word(initChars=nums, bodyChars=nums)

expr_arithmetic = operatorPrecedence(expr_operand,
                                      [
                                        (signop, UNARY, opAssoc.RIGHT),
                                        (multop, BINARY, opAssoc.LEFT),
                                        (plusop, BINARY, opAssoc.LEFT),
                                        (concatop, BINARY, opAssoc.LEFT),
                                      ])

print expr_arithmetic.parseString(\
'''
--a
1+2
''')

so that line comments ('--...') can be included in an arithmetic expression, for example.

I can't compose a parser that treats '--...' at the beginning of a line as a line comment, instead of as two unary minuses.

Can it be done this way, or is there another approach?

2015-07-03 22:05:07 - zaymich

Of course, at the beginning:

signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')
concatop = Word('||')
UNARY, BINARY = 1, 2

2015-07-04 04:40:52 - zaymich

No more question :-) I should use line_comment = ... LineEnd() ... Thanks.
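For the record, a sketch of that resolution: define the comment expression and let the expression parser ignore it (restOfLine is used here; LineEnd works similarly). This assumes '--' always starts a comment, as in SQL, so '1--2' would lose its right-hand side; the ambiguity with doubled unary minus remains a design decision:

```python
import pyparsing as pp

line_comment = pp.Literal('--') + pp.restOfLine  # '--' through end of line

signop = pp.oneOf('+ -')
multop = pp.oneOf('* /')
plusop = pp.oneOf('+ -')

operand = pp.Word(pp.nums)
expr = pp.infixNotation(operand, [
    (signop, 1, pp.opAssoc.RIGHT),
    (multop, 2, pp.opAssoc.LEFT),
    (plusop, 2, pp.opAssoc.LEFT),
])
expr.ignore(line_comment)   # comments are skipped like whitespace

print(expr.parseString('--a\n1+2'))
```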


2015-07-21 14:40:38 - yosepfkaggerman - Parsing multiple lines as in an IDE

I'm trying to write my own code editor; I figure it's a good way to learn PyQt. I am using a QTextEdit in which I can write code (it's not real code, more pseudo code). Each line ending in a semi-colon represents some command, e.g.

PSEUDOF->FWD->90;
PSEUDOS->STOP;
PSEUDOR->RIGHT 90;
PSEUDOF->FWD 10;

These are relatively easy to read; as the user presses [ENTER], the current line is read, parsed and checked for errors, so the following

PSEUDO->RIGHT -pi/2

would generate an error because the line doesn't end in a semi-colon and the value following RIGHT needs to be a number (my editor, my rules). All this I have more or less got working by defining a grammar for each statement type, i.e. fwd, back, stop, etc. I would like to know how to do multiple lines though. I am familiar with editors such as Eclipse, Sublime or Visual Studio which handle multiple lines very well; in my case

PSEUDO->DO:
FWD->90
RIGHT->45
FWD->10
LEFT->55
FWD->50
STOP;

Should all be read in and treated as one statement, starting at the keyword PSEUDO and ending at the semi-colon. However the following should be read as 3 separate statements.

PSEUDO->DO:
FWD->90
RIGHT->45
FWD->10
LEFT->55
FWD->50
STOP;

PSEUDO->DO:
FWD->90
RIGHT->45
STOP;

PSEUDO->BACK 10;

My question is: how can I go about reading multiple lines as described above as discrete statements? What should I be keeping in mind when writing/defining a grammar?
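No reply was posted; the usual approach is to leave newlines as ordinary whitespace, so that a statement simply runs from the PSEUDO keyword to the terminating semicolon. A sketch with a guessed-at grammar (the command set and DO-block syntax are assumptions based on the examples above):

```python
import pyparsing as pp

SEMI = pp.Suppress(';')
ARROW = pp.Suppress('->')
command = pp.oneOf('FWD BACK RIGHT LEFT STOP')
number = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

# a single-command statement: PSEUDO->BACK 10;
simple_stmt = pp.Group(pp.Keyword('PSEUDO') + ARROW + command
                       + pp.Optional(number) + SEMI)
# a DO block: everything from PSEUDO->DO: through STOP; is one statement
do_block = pp.Group(pp.Keyword('PSEUDO') + ARROW + pp.Suppress('DO:')
                    + pp.OneOrMore(pp.Group(command + ARROW + number))
                    + pp.Keyword('STOP') + SEMI)
program = pp.OneOrMore(do_block | simple_stmt)

sample = '''
PSEUDO->DO:
FWD->90
RIGHT->45
STOP;

PSEUDO->BACK 10;
'''
r = program.parseString(sample, parseAll=True).asList()
print(r)
```

Since newlines are just whitespace to the parser, the DO block spans as many lines as needed, and the semicolon is what actually terminates each statement.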


2015-07-29 04:36:42 - larapsodia - Minimum Length

I have a parsing grammar with one of the elements defined as so:

VBZ = oneOf(vbz_prefixes)('prefix') + SkipTo( vbz_suffix + endOfString | endOfString)('stem') + Optional( (vbz_suffix)('suffix') )

I would like this grammar to only match words with a 'stem' of at least two letters. Is there any way to do this?
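One way (not from the thread) is addCondition, which turns a boolean check on the matched tokens into a parse-time requirement. A sketch with hypothetical prefix/suffix data standing in for the poster's vbz_prefixes and vbz_suffix:

```python
import pyparsing as pp

vbz_prefixes = ['re', 'un']        # hypothetical data
vbz_suffix = pp.Literal('es')      # hypothetical suffix
end = pp.StringEnd()

# reject any stem shorter than two letters
stem = pp.SkipTo(vbz_suffix + end | end).addCondition(
    lambda t: len(t[0]) >= 2, message='stem must be at least 2 letters')

VBZ = pp.oneOf(vbz_prefixes)('prefix') + stem('stem') + pp.Optional(vbz_suffix('suffix'))

print(VBZ.parseString('remakes'))   # stem 'mak' is long enough
```

A word like 'rees' would now fail to parse, because its stem is empty.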


2015-08-26 08:06:27 - AndreWin - Problems with end of line

Hello!

I have string:

s = '''1;2;3;\r\n4;5;6;\r\n7;8;9;\r\n'''

I'd like to have:

[[1,2,3], [4,5,6], [7,8,9]]

My code:

num = pp.Word(pp.nums)
delimiter = pp.Suppress(';')
EOL = pp.Suppress(pp.LineEnd())
textline = pp.Group(pp.OneOrMore(num + delimiter) + EOL)
fulltext = pp.OneOrMore(textline)

My results:

>> fulltext.parseString(s)
([(['1', '2', '3', '4', '5', '6', '7', '8', '9'], {})], {})

What should I change to get nested lists?

P.S.: I also can't understand what the difference is between LineEnd() and lineEnd.

Please help me.

Thanks in advance.

Andrey.

2015-09-08 04:48:24 - ptmcg

By default, a line ending is treated like normal whitespace. To change this behavior in your parser, call

ParserElement.setDefaultWhitespaceChars(' ')

Insert this code right after importing pyparsing.

When you do this, your EOL's will not be part of the automatically-skipped whitespace.

lineEnd, lineStart, empty, etc. are just convenience constants, equivalent to LineEnd(), LineStart(), Empty(), and so on. Note however, that these constants are defined with the regular set of whitespace characters - since you are changing them, then you should not use the convenience constants, but should construct your own, as you do in EOL.
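Putting Paul's fix into the original example (with '\r' kept in the whitespace set so Windows line endings still parse, and a parse action so the results are ints rather than strings):

```python
import pyparsing as pp

# newline is no longer skippable whitespace; '\r' still is
pp.ParserElement.setDefaultWhitespaceChars(' \t\r')

num = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
delimiter = pp.Suppress(';')
EOL = pp.Suppress(pp.LineEnd())
textline = pp.Group(pp.OneOrMore(num + delimiter) + EOL)
fulltext = pp.OneOrMore(textline)

s = '1;2;3;\r\n4;5;6;\r\n7;8;9;\r\n'
print(fulltext.parseString(s).asList())
```

Note that setDefaultWhitespaceChars is global: it only affects expressions created after the call, so it belongs right after the import.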

2015-09-08 05:04:28 - AndreWin

Thanks a lot!)


2015-09-07 06:11:02 - Rittel - Skip to first possibility in text

I am trying to use SkipTo to reach the first occurrence of several possible Literals in the text.

Imagine something similar to this:

OneOrMore(SkipTo(Literal('Example1') | Literal('Example2')))

If I now have a text similar to this one:

<<Lot of stuff>>
Example2
<<More stuff>>
Example1
<<Stuff>>

It only finds the Example1 occurrence and just ignores the other one. Now my question is how I can skip to the first possibility in the file.

Thank you already in advance!

2015-09-07 07:17:38 - Rittel

I mistyped there. The problem lies with the following expression:

OneOrMore(SkipTo(...longer expression...) | SkipTo(...another long expression...)))

I am not able to fuse the SkipTo's together for other reasons.

2015-09-07 09:26:24 - ptmcg

If you are only trying to process bits and pieces from within a larger body of text, try using searchString or scanString instead of parseString.

from pyparsing import oneOf, lineno

sample = '''
<<Lot of stuff>>
Example2
<<More stuff>>
Example1
<<Stuff>>'''

expr = oneOf('Example1 Example2')

for toks, start, end in expr.scanString(sample):
    print toks
    print 'starts at line', lineno(start, sample)
    print 'ends at line', lineno(end, sample)
    print

prints

['Example2']
starts at line 3
ends at line 3

['Example1']
starts at line 5
ends at line 5

2015-09-09 12:40:46 - TheVeryOmni - deferred execution of ParseAction in case of 'Or' - is it a bug?

Hi there, could someone tell me if this is a bug or an intended behaviour of PyParsing?

import pyparsing as pp

#Two expressions and an input string which could - syntactically - be matched against both expressions. The 'Literal' expression is considered invalid though, so this PE should always detect the 'Word' expression.
def validate(token):
    if token[0] == 'def':
        raise pp.ParseException('signalling invalid token')
    return token

a = pp.Word(pp.alphanums).setName('Word').setDebug()
b = pp.Literal('def').setDebug().setName('Literal').setParseAction(validate)

#The 'Literal' expression's ParseAction is not executed directly after syntactically detecting the 'Literal' expression, but only after the Or-decision has been made (which is too late)...
print(pp.Or([b, a]).setDebug().parseString('def'))

This is either a bug or setParseAction is not the correct approach to add additional, non-syntactic checks to a ParseExpression. In that case: what is the correct approach to do it?

Thx in advance!

2015-09-09 17:01:22 - ptmcg

Very interesting example. Parse actions are the intended place for these types of additional validation checks, so I would like this mechanism to be smart enough to handle things.

The issue is that Or tests all the alternatives in its list before selecting the correct one. Since pyparsing does not know whether parse actions are stateful, or have side-effects, or anything, they are not run during the Or alternative selection process. And as you found, in this case, Or picks b, and then b fails in its parse action.

It looks like I have to make Or smarter than it is now, and save all the expressions that 'work', and then execute them in descending order of matching length, so that if a validating parse action fails, then another alternative that worked in the first pass can be tried.

I'll put this logic into Or, and get something checked into SourceForge shortly (just at the moment, my paying day job has a pressing deadline, so I might not get to this in the next few hours). This probably has not cropped up earlier, for a few reasons, one being that I tend to steer people to using MatchFirst over Or, and MatchFirst does not have this issue.

Thanks for reporting it - if I don't get back by Monday, drop me another note.

-- Paul
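Paul's point about MatchFirst can be seen directly: the '|' operator runs each alternative, parse actions included, and commits to the first that fully succeeds, so the failing Literal simply falls through to the Word alternative (the tagging parse action below is only there to show which alternative matched):

```python
import pyparsing as pp

def validate(tokens):
    # reject the keyword 'def' even though it matches syntactically
    if tokens[0] == 'def':
        raise pp.ParseException('signalling invalid token')
    return tokens

a = pp.Word(pp.alphanums).addParseAction(lambda t: 'word:' + t[0])
b = pp.Literal('def').setParseAction(validate)

result = (b | a).parseString('def')
print(result)  # the Literal matched but its action rejected it; Word won
```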

2015-11-04 01:16:19 - ptmcg

I just checked in the fix to this into SVN on Sourceforge, to be released in 2.0.6 (sometime in the next few weeks, I expect).


2015-09-13 08:00:28 - Williamzjc - questions and advice about the codes of pyparsing2.0.3

It is hard to comprehend the whole program. Following is my advice; some points are insignificant, some not. I will continue reading it.

In line 311, it writes self[name].__name = name; I think the author made a mistake. self[name].__name only appears once. It seems it should be self.__name = name.

Line 531-537 could be corrected as follows:

out = [res.asList() if isinstance(res, ParseResults) else res for res in self.__toklist]

Line 510-516 could be:

return '[' + ', '.join(str(i) if isinstance(i, ParseResults) else repr(i) for i in self.__toklist) + ']'

Line 303: if not (isinstance(toklist, (type(None), str, list)) and toklist in (None,'',[])):, why not 'if toklist:'

Similarly in line 297.

Line 489 addoffset = ... # lambda a: offset if a<0 else a+offset

Similarly Line 711 return 1 if loc<len(s) and s[loc] == '\n' else loc - s.rfind('\n', 0, loc)

Line 490: the variable otheritems could be omitted, since it is only used once (also 584, worklist). Always use the method append for the expression list+[v], and use extend for list+list (498, 329, 335).

In line 354-358 (see also 446-449)

        for name, val in self.__tokdict.items():
            for j in removed:
                for k, (value, position) in enumerate(val):
                    val[k] = _ParseResultsWithOffset(value, position - (position > j))

or even (but I am not sure it is safe; the method __delitem__ could be improved):

        for name, val in self.__tokdict.items():
            for j in removed:
                val[:] = [_ParseResultsWithOffset(value, position - (position > j))
                          for (value, position) in val]

Line 502-504: the definition of __radd__ is suggested to be redefined.

Line 558 and 582 can be combined, since 'out' is never used between the two lines.

In Line 794 and 800, just use @staticmethod.

Finally, I also suggest redefining (just a suggestion)

class _ParseResultsWithOffset(object):
    def __init__(self,p1,p2):
        self.pr = p1
        self.offset = p2
    def __getitem__(self,i):
        if i == 0 or i == 'pr':
            return self.pr
        elif i == 1 or i == 'offset':
            return self.offset
    def __repr__(self):
        return repr((self.pr, self.offset))
    def setOffset(self,i):
        self.offset = i

2015-09-13 08:15:29 - ptmcg

Thanks for the code review - I'll look over your notes and see if I can add any insights on why things are the way they are. But I wouldn't be surprised if there are some odd bits in there that can stand improving/removing!

2015-09-13 12:55:58 - ptmcg

Several of your comments have to do with converting explicit 'if condition: else:' code to a ternary expression. Pyparsing's origins predate this feature of Python, and I avoided it in much of the code while I was still trying to maintain support for older Python versions. After a quick skim through the code, I do find that a ternary expression has crept in already, in the implementation of srange. I do prefer list comprehensions over calling .append() inside a loop, and, now that 2.0.3 and beyond are focused solely on Py2.6/3.0 and later, I'll try to incorporate some of these tighter code idioms.

Note, though, that this body of code is still bridging the 2.6+/3.x gap, so I am limited in some code features that are not accessible across that whole version range.

Decorator syntax was not part of Python at pyparsing's inception, but instead we had this clunky form that you see in ParserElement. This will be cleaned up in the next release, per your suggestions (and a few others that you missed :) ).

Line 502-504 (redefining __radd__): ParseResults.__radd__ is there specifically to support calling 'sum(expr.searchString(inputstring))', so I won't be changing that.

Thanks for raising these points - I'll have them checked into SourceForge in the next hour or so.

-- Paul


2015-10-01 14:34:19 - rjmco - Parsing nginx configuration files

Hi. I am new to pyparsing, and I'm trying to use it to parse and modify nginx configuration files such as the below.

server {
  listen 443 ssl;
  server_name prod.example.com example.com stage.example.com;

  ssl_certificate ssl.crt/example.com.crt;
  ssl_certificate_key ssl.key/example.com.key;

  keepalive_timeout 5;

  access_log /var/log/nginx/example.access.log;
  error_log /var/log/nginx/example.error.log;

  root /var/www/example/cs/current;

  client_max_body_size 20M;

  location / {
    try_files $uri $uri/index.html @cs_shared;
  }

  location /images {
    expires 30d;
  }

  location /scripts {
    expires 30d;
  }

  location /styles {
    expires 30d;
  }

  location @cs_shared {
    root /var/www/example/cs/shared/public;
    try_files $uri $uri/index.html @ss;
  }

  location @ss {
    root /var/www/example/ss/current/public;
    try_files $uri @index;
  }

  location @index {
    root /var/www/example/cs/current;
    rewrite ^ /index.html break;
  }

  location @rails {
    proxy_pass http://example-ror;

    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $host;
    proxy_set_header X-Request-Start 't=${msec}';
    proxy_redirect off;

    proxy_read_timeout 300;

    proxy_buffer_size 16k;
    proxy_buffers 32 16k;

  }

  location ~* ^/(api|assets|system|rich|admin) {
    root /var/www/example/ss/current/public;
    try_files $uri @rails;
  }

  error_page 500 502 503 504 /500.html;
  location = /500.html {
    root /var/www/example/cs/current;
  }

  error_page 405 =200 $uri;

}

Using the following grammar taken from nginxparser (), I am able to parse and modify most of the configuration, but the last 'server' block.

        # constants
        left_bracket = Literal('{').suppress()
        right_bracket = Literal('}').suppress()
        semicolon = Literal(';').suppress()
        space = White().suppress()
        key = Word(alphanums + '_/')
        value = CharsNotIn('{};,')
        location = CharsNotIn('{};,' + string.whitespace)
        # modifier for location uri [ = | ~ | ~* | ^~ ]
        modifier = Literal('=') | Literal('~*') | Literal('~') | Literal('^~')
    
        # rules
        assignment = (key + Optional(space + value) + semicolon)
        block = Forward()
    
        block << Group(
            Group(key + Optional(space + modifier) + Optional(space + location))
            + left_bracket
            + Group(ZeroOrMore(Group(assignment) | block))
            + right_bracket)
    
        script = OneOrMore(Group(assignment) | block).ignore(pythonStyleComment)

The last server block is ignored by the grammar because of the following nginx directive:

proxy_set_header X-Request-Start 't=${msec}';

The brackets inside the double quotes are interpreted as delimiters for the blocks.

I've been trying to adjust the grammar with nestedExpr as below but without much success. When I do so, the python process uses all available memory on the system.

    # constants
    left_bracket = Literal('{').suppress()
    right_bracket = Literal('}').suppress()
    semicolon = Literal(';').suppress()
    space = White().suppress()
    key = Word(alphanums + '_/')
    #value = dblQuotedString | CharsNotIn('{};,') 
    value = CharsNotIn('{};,')
    location = CharsNotIn('{};,' + string.whitespace)
    # modifier for location uri [ = | ~ | ~* | ^~ ]
    modifier = Literal('=') | Literal('~*') | Literal('~') | Literal('^~')

    # rules
    assignment = (key + Optional(space + value) + semicolon)
    block = Forward()

    block << Group(
        Group(key + Optional(space + modifier) + Optional(space + location))
        + nestedExpr('{','}', Group(ZeroOrMore(Group(assignment) | block))))

    script = OneOrMore(Group(assignment) | block).ignore(pythonStyleComment)

Does anyone have any idea on how to solve this?

Thanks in advance!

2015-10-02 01:37:42 - Williamzjc

space is not needed
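The commented-out line in the question points at the usual fix: let values try quoted strings before the CharsNotIn fallback, so braces inside quotes never reach the block logic. A sketch of just the assignment rule (not the full nginx grammar; originalTextFor re-assembles the value exactly as written):

```python
import pyparsing as pp

semicolon = pp.Suppress(';')
key = pp.Word(pp.alphanums + '_/')
# quoted strings first, so '{' and '}' inside quotes stay opaque
value_part = pp.sglQuotedString | pp.dblQuotedString | pp.CharsNotIn("{};,'\"\r\n")
value = pp.originalTextFor(pp.OneOrMore(value_part))
assignment = pp.Group(key + pp.Optional(value) + semicolon)

r = assignment.parseString("proxy_set_header X-Request-Start 't=${msec}';")
print(r.asList())
```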


2015-10-12 07:40:40 - Williamzjc - One question about setParseAction

If a ParserElement expression E has a parse action and its sub-expressions also have their own actions, but I want to re-use E without any of those actions, how should I do it? How do I delete the actions?

2015-10-21 01:40:37 - ptmcg

At one time, I was thinking of supporting this using something like E.setParseAction(None). But that is not how things work currently.

You could just poke the member variable directly: E.parseAction = []
You could also back up to where you set the parse action, and apply the parse action to a copy of E, leaving E clear of parse actions. I have no parse action 'stripper' that will clear the parse actions on E and all sub-expressions within E, but this would not be a difficult recursive method to write.
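
The recursive stripper described above might look something like this (a sketch: it relies on the parseAction, expr and exprs attributes, which are pyparsing implementation details, not public API):

```python
import pyparsing as pp

def strip_parse_actions(expr, _seen=None):
    """Recursively clear parse actions on expr and all sub-expressions."""
    if _seen is None:
        _seen = set()
    if id(expr) in _seen:          # guard against Forward() cycles
        return expr
    _seen.add(id(expr))
    expr.parseAction = []          # clear this expression's own actions
    for sub in getattr(expr, 'exprs', []):   # And/Or/MatchFirst children
        strip_parse_actions(sub, _seen)
    inner = getattr(expr, 'expr', None)      # Group/Suppress/etc. wrappers
    if inner is not None:
        strip_parse_actions(inner, _seen)
    return expr

# Example: both the leaf action and the group action get removed.
num = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
pair = pp.Group(num + num).setParseAction(lambda t: sum(t[0]))
strip_parse_actions(pair)
```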

2015-10-21 04:05:45 - Williamzjc

Thank you


2015-10-19 05:12:31 - heronils - 2to3 required?

I see print statements in the code examples, so why not add a note that one has to run 2to3 -w before using pyparsing on Python 3?

2015-10-19 14:22:41 - ptmcg

Well, strictly speaking, pyparsing itself is Py2.6 thru Py 3.x compatible, so 2to3 isn't necessary to use pyparsing. The examples are somewhat lagging, I agree, but I'm not sure that is enough of an issue that one would say they have to run 2to3 before they could use pyparsing.

2015-10-19 14:26:06 - ptmcg

I'm open to posting a notice somewhere regarding the version dependency of the examples, is the wiki page itself sufficient? Many people install pyparsing using pip or easy_install and (sadly) never even see the examples or the docs, so I'm not sure there is anyplace in the install itself to make this point. But I can definitely add a header on the wiki page, and in the README file that co-resides with the examples.

2015-10-22 23:38:50 - heronils

Hi again,

Yes, I see now that SourceForge also has Py 3.x installers, that's great. I just installed it and it works :-)

I suggest to append: ' (including Python 3)' to 'supports Python versions 2.6 and later' above. That is more explicit.

2015-10-24 12:20:48 - heronils

The reason is, I have run into websites where 'supports 2.x' was stated, but it was written before Python 3 existed. Other website authors saying this ignored Python 3 because they didn't like it (there seem to be a few out there). These authors did not care that people would probably install their software on a Python 3 box and then get errors.

Therefore I like it when a website says explicitly: 'yes, it is Python 3'. It removes these ambiguities.

2015-11-04 01:24:15 - ptmcg

I think the above description is clearer, thanks for the suggestion!


2015-10-28 07:46:08 - Williamzjc - Suggestion

Line 1737: it might be len(self.initCharsOrig) == 1, I think; please consider it.

Line 2996: it should be string.upper

The following are some personal recommendations.

Lines 1813/1834: we could use re._pattern_type (line 145: types.GeneratorType)

Lines 495-496: I think we can delete them if they are unnecessary

Line 1631: set seems redundant

Line 1545: the definition of setName seems redundant too

Finally, I suggest rewriting lines 490-496 as

        for k, vlist in other.__tokdict.items():
            for v in vlist:
                self[k] = _ParseResultsWithOffset(v[0],addoffset(v[1]))

I am sorry the line numbers may be inexact, since I wrote some comments into the file.

2015-10-28 07:52:49 - Williamzjc

Line 2996 is right, I was wrong; I didn't know such a function existed.

2015-10-30 21:14:04 - ptmcg

I put a couple of these into 2.0.4/2.0.5. Dead right about self.initCharsOrig, thanks! Also Token.setName() was completely unnecessary, so I cut it. The other items were less compelling; you'd need to make a stronger case for them.


2015-11-05 13:44:08 - mbeaches - Markup/down grouping of style

I'm just discovering pyparsing, and I suspect it can easily do what I want, but I'm a bit lost in getting it to do so.

I have a string marked up with some standard type of markup like 'this is a sentence with a **few words bolded** and some regular text'. I'm trying to get a list (or some object) which would give me something like (list of tuples would be great, but anything I can iterate over would be fine):

[('normal','this is a sentence with a'),('bold','few words bolded'),('normal','and some regular text')]

I've looked at the markup example (along with many others) but can't seem to get it just right. I'm not wanting to transform the string; I'm having to build a Word document with python-docx and need to handle in-paragraph character styles (add_run).

I'd typically post what I've tried, but it results in garbage. I'm trying to group using the optional call to place in the 'normal' and 'bold' but I'm getting all the normal along with the bold. Not giving up, but having to ask for help after working through the samples for the afternoon.

Any pointers?

2015-11-05 13:45:44 - mbeaches

those bolded words were meant to be literally quoted with double asterisks in the string. probably should have read the how to post code first :(

2015-11-06 05:42:58 - mbeaches

Feel like I'm getting close, but having some trouble with the (not bold and not italic) part. Also unsure if transformString is the way to go, but not sure how else to get everything to fire. How do I also get the 'normal' text into wordLine?

from pyparsing import *

wikiInput = '''
Here is a simple Wiki input:
  *This is in italics.*   **This is in bold!**   ***This is in bold italics!***
  And this text is **also bold**.
'''

def buildLineAction(lst, style):
    def parseAction(s,l,t):
        lst.append((style,t[0]))
    return parseAction

BOLD = 'Bold'
ITALIC = 'Italic'
NORMAL = 'Normal'
CHAR_STYLE_CHOICES = {BOLD: 'With bold emphasis',
                      ITALIC: 'With italics',
                      NORMAL: 'Default Formatting',}

wordLine = []
italicized = QuotedString('*').setParseAction(buildLineAction(wordLine,ITALIC))
bolded = QuotedString('**').setParseAction(buildLineAction(wordLine,BOLD))
normal = (~bolded & ~ italicized).setParseAction(buildLineAction(wordLine,NORMAL))

wikiMarkup = bolded | italicized | normal

print wikiMarkup.transformString(wikiInput)
print wordLine

2015-11-06 17:27:02 - mbeaches

This works sort of, but I'm sure there's a better way.

from pyparsing import *

wikiInput = '''
Here is a simple Wiki input:
  *This is in italics.*   **This is in bold!**   ***This is in bold italics!***
  And this text is **also bold**.
'''

def buildLineAction(lst, style):
    def parseAction(s,l,t):
        if style == NORMAL:
            phrase = ' '.join(t[0])
            if phrase:
                lst.append((style,phrase))
        else:
            lst.append((style,t[0]))
    return parseAction

BOLD = 'Bold'
ITALIC = 'Italic'
NORMAL = 'Normal'
CHAR_STYLE_CHOICES = {BOLD: 'With bold emphasis',
                      ITALIC: 'With italics',
                      NORMAL: 'Default Formatting',}

wordLine = []
word = ZeroOrMore(Word(alphanums))
italicized = QuotedString('*').setParseAction(buildLineAction(wordLine,ITALIC))
bolded = QuotedString('**').setParseAction(buildLineAction(wordLine,BOLD))
normal = (Group(word) | ~bolded | ~italicized).setParseAction(buildLineAction(wordLine,NORMAL))

wikiMarkup = bolded | italicized | normal

print wikiMarkup.transformString(wikiInput)
print wordLine

it produces:

HereisasimpleWikiinput:*Thisisinitalics.***Thisisinbold!*****Thisisinbolditalics!***Andthistextis**alsobold**.

[('Normal', 'Here is a simple Wiki input: *This is in italics.* **This is in bold!** ***This is in bold italics!*** And this text is **also bold**.')]

2015-11-06 17:29:06 - mbeaches

wrong output. this is what it produces

[('Normal', 'Here is a simple Wiki input'), ('Italic', 'This is in italics.'), ('Bold', 'This is in bold!'), ('Bold', '*This is in bold italics!'), ('Normal', 'And this text is'), ('Bold', 'also bold')]

2015-11-10 00:10:34 - utkarsh007 - How can I remove C/C++ style comments and generate a string back using pyparsing

Here is what I want to do

variable  'light-on' { //2 values
    type discrete[2] {  'true'  'false' };
    property 'position = (218, 195)' ;
}

And it should return a string of the format

variable  'light-on' {
    type discrete[2] {  'true'  'false' };
    property 'position = (218, 195)' ;
}

I was using regex to do this. Is using pyparsing better, given that the file is large?

2015-11-10 00:21:31 - utkarsh007

Please give a code example, not just the name of a method. I know the method but don't know how to use it.

2015-11-10 01:11:18 - ptmcg

Pyparsing has a built-in expression for matching C++ style comments, cppStyleComment. Use it with suppress() and transformString() to do what you want:

    code = '''variable  'light-on' { //2 values
        type discrete[2] {  'true'  'false' };
        property 'position = (218, 195)' ;
    }
    '''

    from pyparsing import *

    print cppStyleComment.suppress().transformString(code)

gives:

variable  'light-on' { 
    type discrete[2] {  'true'  'false' };
    property 'position = (218, 195)' ;
}

2015-11-10 05:45:47 - utkarsh007

@ptmcg Also I have one question: how can I replace multiple spaces and tabs with one space? More formally, can you show me the pyparsing implementation of this:

remove_multiple_spaces = re.compile(r'[ \t\r\f][ \t\r\f]*')
string = remove_multiple_spaces.sub(' ', string)
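
One pyparsing rendering of that regex, as a sketch: White matches whitespace explicitly (it is exempt from pyparsing's usual whitespace skipping), so a parse action can collapse each run to a single space:

```python
import pyparsing as pp

# Collapse any run of spaces/tabs/CR/FF into one space.
collapse = pp.White(' \t\r\f').setParseAction(pp.replaceWith(' '))

result = collapse.transformString('a  \t  b\tc')
```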

2015-11-10 05:52:17 - utkarsh007

Also, is there something like str.find() in pyparsing? I want functionality that returns the first occurrence of a pattern starting from a specific point, and then stops.

2015-11-11 17:02:30 - ptmcg

ParseExpression.scanString may be what you are looking for. It returns a generator that moves through the input string, yielding each match that it finds (along with the start and end locations). As you iterate over the generator, you get each successive match, starting at the last match's end location. Will that do what you are looking for?
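
Because scanString returns a lazy generator, taking a single item from it scans only as far as the first match and then stops; in effect it is str.find() for a pattern. A sketch (to start the search at offset n, scan text[n:]):

```python
import pyparsing as pp

text = 'aaa 123 bbb 456'
integer = pp.Word(pp.nums)

# next() pulls exactly one match; the rest of the string is never scanned.
tokens, start, end = next(integer.scanString(text))
first = (tokens[0], start, end)
```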

2015-11-12 08:05:51 - utkarsh007

No, I would have to iterate over it, and that will slow things down. I want a function similar to str.find(), where I can give a start location for the search and stop at the first found location.


2015-11-11 09:01:11 - pdelsante - simpleBool.py: Binding values to parsed operands

Hello,

Suppose you want to extend the simpleBool.py example () to create a parser object that can be imported from other modules.

For example, assume you have a list of boolean expressions:

expressions = [
    'a and b',
    'a or b',
    'a and not b',
]

Now, somewhere else, you have got a list of dictionaries defining values for a and b:

values = [
    { 'a': True, 'b': False, },
    { 'a': True, 'b': True, },
    { 'a': False, 'b': True, },
    { 'a': False, 'b': False, },
   ...
]

Now, you want to take entries from the 'values' list and evaluate every entry with all of the expressions above. This can be done by creating a class inside simpleBool.py, for example:

# On top, you define a global empty dict to hold operands:
operands = {}

# Now, you change the __init__ method of BoolOperand to take values
# from the operands dict instead of directly searching for global variables
# of the given name, e.g. 'a' becomes 'operands['a']'
class BoolOperand(object):
    def __init__(self,t):
        self.label = t[0]
        self.value = eval("operands['{}']".format(t[0]))

# Rest of simpleBool.py here

# Then, right before __main__, you define a new class:
class RuleMatcher(object):
    def __init__(self, rule, values):
        global operands = values
        return boolExpr.parseString(rule)[0]

You can then do something like that:

from simpleBool import RuleMatcher

for element in values:
    for rule in expressions:
        res = RuleMatcher(rule, element)

This works pretty well, but now suppose you wanted to do this from a multithreaded process, spawning a thread for every rule and handing the values to all threads. You cannot rely on the global variable 'operands' inside the simpleBool module, since that is not thread safe. I have been thinking about this for some time, but I can't seem to find any clever way to assign the operands as an attribute of the RuleMatcher instance, and then have the BoolBinOp class take them from there... am I making a mountain out of a molehill? Please do not tell me that threads are evil, I already know but I can't avoid them ;-)

2015-11-11 09:07:55 - pdelsante

Uhm, sorry, there is a typo in RuleMatcher, should be:

class RuleMatcher(object):
    def __init__(self, rule, values):
        global operands
        operands = values
        return boolExpr.parseString(rule)[0]

2015-11-11 17:08:41 - ptmcg

One thing that I would start with is change the location of the eval in SimpleBool out of the init method and into the bool method, so that the binding is as late as possible. Then you could replicate the parsed data structure across all the threads (using pickle, or just repeated calls to parseString - ParseResults has a copy() method, but I think it is a shallow copy only). By giving each thread its own copy of the ParseResults, then you won't have issues with one thread stepping on another's evaluation process.

2015-11-12 02:53:59 - pdelsante

Hello, thanks for your answer. I see what you are meaning, and I could easily solve that part of the problem by moving the call to infixNotation() inside the RuleMatcher __init__() as follows:

class RuleMatcher(object):
    def __init__(self, rule, values):
        global operands
        operands = values
        self.boolExpr = infixNotation(...)
        return self.boolExpr.parseString(rule)[0]

Still, I don't think this would be enough, because of the global operands variable, that is shared across threads. So, basically, even if I move the eval() to the SimpleBool.__bool__() method (which is a good suggestion), every thread would be accessing the same 'operands' dict.

For example, when thread1 is initiated, it sets the operands global variable with the first set of values: { 'a': True, 'b': False }. Then, it calls parseString() with its own rule string. Now, assume that thread2 is initiated at around the same time: when it sets the operands global dict, it will overwrite thread1's values and this may change thread1's output. This is what I really don't know how to solve.

I could make it so that operands is a dict of dicts, having the thread id as the first-level key, for example:

operands = {
    'thread1': { 'a': True, 'b': False, },
    'thread2': { 'a': True, 'b': True, },
}

Then, the eval would become:

import threading
thread_id = threading.current_thread().ident
eval("operands['{}']['{}']".format(thread_id, t[0]))

Still, I don't like that approach as I would also need to remember to delete dictionary entries once I'm done with them.
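
A threading.local container sidesteps both the shared global and the manual cleanup: each thread sees its own operands attribute, and the entries disappear with the thread. A stdlib-only sketch (function names are illustrative):

```python
import threading

_ctx = threading.local()   # each thread gets its own attribute namespace

def set_operands(values):
    # Called by each thread before evaluating its rule.
    _ctx.operands = values

def get_operand(name):
    # Called from inside the evaluation (e.g. a __bool__ method).
    return _ctx.operands[name]

# Demonstrate isolation: each thread's values are invisible to the others.
results = {}

def worker(tid, values):
    set_operands(values)
    results[tid] = get_operand('a')

t1 = threading.Thread(target=worker, args=(1, {'a': True}))
t2 = threading.Thread(target=worker, args=(2, {'a': False}))
t1.start(); t2.start(); t1.join(); t2.join()
```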


2015-11-27 05:27:42 - Williamzjc - Correction and Suggestion

Lines 177-178: it should be

self.msg = ''
self.pstr = pstr

Please consider it.

Line 999: should it be loc or preloc? Which one is better?

Why not define self.__accumNames as a set? Only lines 219, 302 and 686 would need slight corrections.

From line 1733: the definitions of the properties initCharsOrig, initChars, bodyCharsOrig and bodyChars are a little verbose. Why not keep just two of them?

Line 2970: it is indeed redundant.


2015-12-02 04:41:43 - mentaal - issue with excludeChars "Word" keyword arg

Hi, I am trying to make a parser for a number which can contain an '_'. I would like the underscore to be suppressed in the output. For example, a valid word would be 1000_000, which should return the number 1000000. I tried the excludeChars keyword argument for this, as my understanding is that it should do the following: 'If supplied, this argument specifies characters not to be considered to match, even if those characters are otherwise considered to match.' So below is my attempt:

num = pp.Word(pp.nums+'_', excludeChars='_')
num.parseString('123_4')

but I end up with the result '123' instead of '1234'. Any suggestions? Thanks!

2015-12-08 05:39:14 - ptmcg

You also posted this in StackOverflow - I answered it there.

Welcome to pyparsing! -- Paul
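
One workaround, as a sketch (not necessarily the StackOverflow answer): excludeChars removes '_' from the set of matchable characters, which is why Word(nums+'_', excludeChars='_') stops at the underscore. Instead, let the Word match the underscores and strip them in a parse action:

```python
import pyparsing as pp

# First character must be a digit; later characters may include '_'.
num = pp.Word(pp.nums, pp.nums + '_')
num.setParseAction(lambda t: t[0].replace('_', ''))

result = num.parseString('1000_000')[0]
```

Wrapping the replace in int() would yield a true number instead of a string.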


2015-12-08 04:57:42 - Jeroen537 - QuotedString behaves unexpectedly (?)

How can I get QuotedString to strip the opening and closing delimiters, and nothing more, when some characters from these delimiters can also be part of the inner string? I want to retrieve what is in between, for further processing (parsing). Examples:

print(QuotedString('"""').parseString('"""^"]"""'))
['^"]'] (as I expected)

print(QuotedString('"""').parseString('""""^"]"""'))
['"^"]'] (also as expected, the fourth double quote from the beginning is included in the string)

print(QuotedString('"""').parseString('""""^"]""""'))
['"^"]'] (unexpected, the fourth double quote from the end is excluded).

Do I misunderstand how QuotedString is supposed to work? If so, is there another way of achieving my goal? I'd rather avoid complicated regular expressions.

(Background: As a learning project, I am working on a parser for SPARQL on the basis of its EBNF grammar. One of the production rules reads

STRING_LITERAL_LONG2      ::=   '"""' ( ( '"' | '""' )? ( [^"\] | ECHAR ) )* '"""'

where ECHAR stands for a string representing an escaped character, like '\t' etc. The interesting part here is that one or two double quotes are allowed inside the expression, but without any of them being escaped.)

2015-12-08 05:29:04 - Jeroen537

OK, I found a way around it, I think:

def stringLiteralLong2Helper(r):
    return ( Optional ( Literal('"') | Literal('""') ) + \
           ZeroOrMore( Word ('^"]', exact=1) | ECHAR )
           ).leaveWhitespace().parseString(r[0][3:-3])

STRING_LITERAL_LONG2 = SkipTo(stringEnd).setParseAction(stringLiteralLong2Helper) 

This seems to work on my test set. Any comments welcome.

The question about QuotedString behavior remains.

2015-12-08 05:36:45 - ptmcg

The problem isn't that QuotedString is stripping off the ending quotes, it is that the regex inside is finishing early, when it sees the first instance of '"""'. Since your issue seems to occur only when the one- or two-quote sequence comes at the end of the quoted string, I think the hack/workaround of Combine(QuotedString('"""') + Optional(Word('"'))) might work for you, see below (using the new runTests() method):

from pyparsing import *

tests = r'''
"""^"]"""
""""^"]"""
""""^"]""""
""""^"]"""""
""""^"]"" blah"""
'''[1:-1]  # strip leading and trailing newlines

qs = QuotedString('"""')
qs.runTests(tests, parseAll=True)

print

qs = Combine(QuotedString('"""') + Optional(Word('"')))
qs.runTests(tests, parseAll=True)

prints

"""^"]"""
['^"]']

""""^"]"""
['"^"]']

""""^"]""""
          ^
Expected end of text (at char 10), (line:1, col:11)

""""^"]"""""
          ^
Expected end of text (at char 10), (line:1, col:11)

""""^"]"" blah"""
['"^"]"" blah']


"""^"]"""
['^"]']

""""^"]"""
['"^"]']

""""^"]""""
['"^"]"']

""""^"]"""""
['"^"]""']

""""^"]"" blah"""
['"^"]"" blah']

2015-12-08 06:43:53 - Jeroen537

Great, a simple and elegant solution. I also understand better now how QuotedString works. Thanks for explaining. Nice to learn about runTests, which I was not aware of. Is there an overview of current functionality apart from the API docs? Thanks btw for a great module!

2015-12-08 08:04:02 - ptmcg

The source distribution includes this CHANGES file, which you can also view from the SourceForge SVN repo: . Glad pyparsing is working for you! -- Paul

2015-12-08 08:16:54 - ptmcg

(btw - this link includes the CHANGES updates that will be included in the next release, 2.0.7. This release is just now under development, so to-be-released mods are staged in SVN, so you can get an advance look at what's coming. The current released version is 2.0.6.)


2015-12-16 07:29:57 - StephenDause - ParseException seems to give incorrect location

I'm having trouble getting a useful location of parse failures. I took the cpp_enum_parser.py example and modified it to demonstrate:

#!/usr/bin/python

from pyparsing import *
# sample string with enums and other stuff
sample = \
 '''enum hello {
        Zero,
        One,
        Two,
        Three,
        Five=5,
        Six,
        Ten=10
        .};
    enum blah
        {
        alpha,
        beta,
        gamma = 10 ,
        zeta = 50
        };
    '''

# syntax we don't want to see in the final parse tree
LBRACE,RBRACE,EQ,COMMA,SEMICOLON = map(Suppress,'{}=,;')
_enum = Suppress('enum')
identifier = Word(alphas,alphanums+'_')
integer = Word(nums)
enumValue = Group(identifier('name') + Optional(EQ + integer('value')))
enumList = Group(enumValue + ZeroOrMore(COMMA + enumValue))
enum = _enum + identifier('enum') + LBRACE + enumList('names') + RBRACE + SEMICOLON
enums = ZeroOrMore(Group(enum))

enum_vals = enums.parseString(sample, parseAll=True)

print enum_vals

The ParseException I get has a location of char 0, line 1, col 1. I am expecting a location around line 9.

Is there anything I'm doing wrong, or is this a bug?
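
Pyparsing's '-' operator (And with an error stop) is one way to get a more useful failure location in cases like this: with plain '+', the failed enum just makes ZeroOrMore match zero items, so the reported error is the parseAll end-of-text check at char 0. A sketch:

```python
import pyparsing as pp

LBRACE, RBRACE, EQ, COMMA, SEMI = map(pp.Suppress, '{}=,;')
identifier = pp.Word(pp.alphas, pp.alphanums + '_')
integer = pp.Word(pp.nums)
enumValue = pp.Group(identifier('name') + pp.Optional(EQ + integer('value')))
enumList = pp.Group(enumValue + pp.ZeroOrMore(COMMA + enumValue))

# Once 'enum' has matched, '-' forbids backtracking, so a bad body raises
# ParseSyntaxException at the offending spot instead of failing silently.
enum = pp.Suppress('enum') - identifier('enum') - LBRACE - enumList('names') - RBRACE - SEMI

try:
    enum.parseString('enum hello { Zero, One, .};')
    loc = None
except pp.ParseSyntaxException as err:
    loc = err.loc   # points into the enum body, well past char 0
```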


2015-12-21 04:33:45 - anon3456 - Parsing GDB/MI

I'm trying to parse GDB/MI with pyparsing, but I'm having a couple problems.

import pyparsing as pp

test='''*stopped,reason="breakpoint-hit",disp="keep",bkptno="1",thread-id="0",frame={addr="0x08048564",func="main",args=[{name="argc",value="1"},{name="argv",value="0xbfc4d4d4"}],file="myprog.c",fullname="/home/nickrob/myprog.c",line="68"}'''

ID = pp.Word(pp.alphas).setResultsName('id') # id i.e. 'stopped', 'reason' or 'value'
OP = pp.Regex('[\*\-\+\&\@\~\^]') # operator i.e. '*'
STRING = pp.QuotedString('"', escChar='\\') # quoted string i.e. '"0xbfc4d4d4"'
EQUALS = pp.Literal('=') # equals
_ARRAY = pp.nestedExpr(opener='[', closer=']', content=STRING)
DICT = pp.nestedExpr(opener='{', closer='}', content=STRING|_ARRAY)
ARRAY = pp.nestedExpr(opener='[', closer=']', content=STRING|DICT)
VALUE = STRING ^ DICT ^ ARRAY
WHOLE = (OP + ID + pp.Literal(',').suppress() + pp.delimitedList(pp.dictOf(ID + EQUALS.suppress(), VALUE))) #.setDebug() # everything joined up
print(WHOLE.parseString(test)) #.asDict()

output:

['*', 'stopped', ['reason', 'breakpoint-hit'], ['disp', 'keep'], ['bkptno', '1']]

I can't work out why it's stopping at 'bkptno', any guidance would be appreciated :) Thanks

2015-12-21 05:58:22 - Williamzjc

thread-id does not match ID

2015-12-21 06:03:06 - anon3456

ah. >.< thanks!

2015-12-21 06:09:13 - Williamzjc

You'd better also consider 'frame = ...'; that makes the problem easier to see.

2015-12-21 06:22:03 - anon3456

ah. >.< thanks!

2015-12-21 07:55:34 - ptmcg

As williamzjc pointed out, your ID expression needs to accept embedded '-'s. I recommend you use the two-argument form of Word to do this, as in ID = Word(pp.alphas, pp.alphanums+'-_'); if you just do ID = Word(pp.alphas+'-_') then you could match an id consisting of just one or more '-'s (which may be valid, I don't know this format).

As for the rest, I would advise that you do your own explicit definitions for ARRAY and DICT, using a Forward definition for VALUE. nestedExpr is provided in pyparsing mostly as a short cut for jumping over nested {}'s or ()'s in source code, and getting at the contents is not so straightforward.

By your inclusion of dictOf, I assume you want to get at the elements by their names. Here is how I would structure your parser:

#~ instead of
#~ _ARRAY = pp.nestedExpr(opener='[', closer=']', content=STRING)
#~ DICT = pp.nestedExpr(opener='{', closer='}', content=STRING|_ARRAY)
#~ ARRAY = pp.nestedExpr(opener='[', closer=']', content=STRING|DICT)

VALUE = pp.Forward()
KEY_VALUE = pp.Group(ID + EQUALS.suppress() + VALUE)
ARRAY = pp.Group(pp.Suppress('[') + pp.Optional(pp.delimitedList(VALUE)) + pp.Suppress(']'))
DICT = (pp.Suppress('{') + pp.Optional(pp.Dict(pp.delimitedList(KEY_VALUE))('map')) + pp.Suppress('}'))

Then your expression for WHOLE works fine. Instead of using asDict(), you can just access the results directly by name. If you print out the contents using dump(), you'll get a nice output like this:

['*', 'stopped', ['reason', 'breakpoint-hit'], ['disp', 'keep'], ['bkptno', '1'], 
    ['thread-id', '0'], ['frame', [['addr', '0x08048564'], ['func', 'main'], 
    ['args', [[['name', 'argc'], ['value', '1']], [['name', 'argv'], ['value', '0xbfc4d4d4']]]], 
    ['file', 'myprog.c'], ['fullname', '/home/nickrob/myprog.c'], ['line', '68']]]]
- bkptno: 1
- disp: keep
- frame: [[['addr', '0x08048564'], ['func', 'main'], ['args', [[['name', 'argc'], ['value', '1']], 
         [['name', 'argv'], ['value', '0xbfc4d4d4']]]], ['file', 'myprog.c'], 
         ['fullname', '/home/nickrob/myprog.c'], ['line', '68']]]
  - id: frame
  - map: [['addr', '0x08048564'], ['func', 'main'], ['args', [[['name', 'argc'], ['value', '1']], 
         [['name', 'argv'], ['value', '0xbfc4d4d4']]]], ['file', 'myprog.c'], 
         ['fullname', '/home/nickrob/myprog.c'], ['line', '68']]
    - addr: 0x08048564
    - args: [[[['name', 'argc'], ['value', '1']], [['name', 'argv'], ['value', '0xbfc4d4d4']]]]
      - id: args
    - file: myprog.c
    - fullname: /home/nickrob/myprog.c
    - func: main
    - line: 68
- id: stopped
- reason: breakpoint-hit
- thread-id: 0

Then you can access the results using code like:

results['id']  # gives 'stopped'
results['bkptno'] # gives 1
results.bkptno # also gives 1
results.frame.map.file # gives 'myprog.c'

-- Paul


2015-12-27 06:41:50 - AndreWin - Parsing markdown

Hello!

I'd like to parse a string like this: 'Variable my_cool_var can accept _only_ positive value'.

I tried:

italic = pp.QuotedString('_').leaveWhitespace()
italic.setParseAction(lambda t: '<italic>{0}</italic>'.format(t[0]))
print(italic.transformString('Variable my_cool_var can accept _only_ positive value'))

I get result:

Variable my<italic>cool</italic>var can accept <italic>only</italic> positive value

How can I change my code to keep 'my_cool_var' as is?

Best regards, Andrey.

2015-12-27 08:32:35 - ptmcg

leaveWhitespace() is there to override pyparsing's default behavior of skipping over whitespace before trying to match. I think you want to surround your italic definition with WordStart/WordEnd like this:

italic = (pp.WordStart() + pp.QuotedString('_') + pp.WordEnd())

This will give you

Variable my_cool_var can accept <italic>only</italic> positive value

-- Paul

2015-12-27 08:56:00 - AndreWin

Thanks a lot, Paul!)

But... I can't understand why this example works... I read doc, but this wasn't to help me...

Best regards, Andrey.

2015-12-27 09:02:16 - AndreWin

I just tried:

italic.transformString('This is _my_cool_var_ in my code')

I got result:

'This is _my_cool_var_ in my code'

I'd like to get <italic>my_cool_var</italic> unstead.

Best regards, Andrey.

2015-12-27 09:06:11 - AndreWin

It seems I understood: in 'This is _my_cool_var_ in my code' the parser sees the quoted string _my_ and my conditions don't work.

Best regards, Andrey.

2015-12-27 09:10:11 - AndreWin

It seems, escQuote parameter of QuotedString should help. But I don't know what condition to write for this parameter...

Best regards, Andrey.

2015-12-27 09:24:53 - AndreWin

italic = (pp.WordStart() + pp.QuotedString('_', escQuote='_') + pp.WordEnd())
italic.setParseAction(lambda t: '<italic>{0}</italic>'.format(t.asList()[0]))

test:

italic.transformString('Variable my_cool_var can accept only_only positive value')

This will give:

'Variable my_cool_var can accept only_only positive value'

So this problem seems to be solved!)

Best regards, Andrey.

2015-12-27 09:24:56 - ptmcg

More likely, QuotedString is not really what you should be using. Probably best to roll your own, something like

italic = pp.Combine(pp.WordStart() + '_' + pp.SkipTo('_' + pp.WordEnd(), include=True))

For that matter, you might as well just use a Regex:

italic = pp.Regex(r'\b_.*?_\b')

2015-12-27 09:28:02 - AndreWin

Hm... My last example works fine...

For example:

italic.transformString('This is _my_very_cool_var_ in my code')

This will give:

'This is <italic>my_very_cool_var</italic> in my code'

Best regards, Andrey.