Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix various Python 3.12 SyntaxWarning #5538

Merged

Conversation

elboulangero
Copy link
Contributor

No description provided.

@solardiz
Copy link
Member

Thanks! I wonder what these changes mean for Python 2? Most of these scripts use just python, which is our current convention for scripts written in a way that they're meant to work with both 2 and 3.

@exploide Can you please take a look?

@exploide
Copy link
Contributor

So the changes look fine in general.

Compatibility, also with Python 2, should not be an issue since the syntax stayed the same but Python 3.12+ is stricter and emits a warning when it comes to backslash sequences. That means if a string contains a backslash it must be a valid escape sequence like \n or, if a literal backslash is intended, then it must either be escaped with another backslash \\ or the string should be a r"raw\ string\ literal". But this works the same with Python 2.

There are only two things I would like to double-check.

  1. In adxcsouf2john.py the byte-string literal has been changed to a raw-string literal. But now string and bytes are probably incompatible and shouldn't match in the regex. I would expect that you need a raw-byte-string literal: rb'...'
  2. In ejabberd2john.py it uses the Grammer class of parsimonious. I don't know this package and can't tell whether it still works with the preceding newline this change now introduced.

Unfortunately, we have no test files for any of the formats in john-samples. But since @elboulangero found these warnings, you probably have some test files? Can you please retest especially the two mentioned scripts or at least justify that you are sure it works this way?

@solardiz
Copy link
Member

Thank you very much @exploide for such a thorough review!

@elboulangero This reminds me, if you have any sample/test inputs you could publish, please submit those to our john-samples repo via a pull request. Thank you!

@elboulangero
Copy link
Contributor Author

Thanks for the quick feedback.

First, I can share the warnings that I get with Python 3.12, and that are fixed with this PR:

/usr/share/john/adxcsouf2john.py:98: SyntaxWarning: invalid escape sequence '\ '
  matches = re.findall(b'([a-zA-Z0-9_-]{3,9})\ (\d{8})', data)
/usr/share/john/aix2john.py:36: SyntaxWarning: invalid escape sequence '\s'
  if re.match('^\s*\S+\s*:\s*$',line):
/usr/share/john/dashlane2john.py:75: SyntaxWarning: invalid escape sequence '\D'
  sys.stderr.write("\nThe required .aes files can be found inside %AppData%\Dashlane\profiles directory tree on Windows.\n")
/usr/share/john/dashlane2john.py:76: SyntaxWarning: invalid escape sequence '\ '
  sys.stderr.write("\nThe required .aes files can be found inside ~/Library/Application\ Support/Dashlane/profiles/ directory tree on macOS.\n")
/usr/share/john/dns/rdata.py:277: SyntaxWarning: invalid escape sequence '\#'
  if not token.is_identifier() or token.value != '\#':
/usr/share/john/ejabberd2john.py:80: SyntaxWarning: invalid escape sequence '\s'
  grammar = Grammar("""\
/usr/share/john/vmx2john.py:26: SyntaxWarning: invalid escape sequence '\)'
  ks_re = '.+phrase/(.*?)/pass2key=(.*?):cipher=(.*?):rounds=(.*?):salt=(.*?),(.*?),(.*?)\)'
/usr/share/legion/app/auxiliary.py:162: SyntaxWarning: invalid escape sequence '\['
  string = '\[[0-9]+\]\[[a-z-]+\].+'  # when a password is found, the line contains [port#][plugin-name]

In ejabberd2john.py it uses the Grammer class of parsimonious. I don't know this package and can't tell whether it still works with the preceding newline this change now introduced.

For this one, I think that just printing the two strings (before and after turning it in a raw string), is enough to show that there was no change:

$ python3.12
Python 3.12.6 (main, Sep  7 2024, 14:20:15) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> print("""\
...     entry = (term _ "." _)* _
...     term = boolean / atom / list / tuple / map / string / binary / number
...     atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
...     _ = ~"\s*"
...     list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
...     tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
...     map   = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
...     keyvalue = term _ "=>" _ term _
...     string = '"' ~r'(\\\\.|[^"])*' '"'
...     binary = "<<" string ">>"
...     boolean = "true" / "false"
...     number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"
...     """)
<stdin>:1: SyntaxWarning: invalid escape sequence '\s'
    entry = (term _ "." _)* _
    term = boolean / atom / list / tuple / map / string / binary / number
    atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
    _ = ~"\s*"
    list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
    tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
    map   = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
    keyvalue = term _ "=>" _ term _
    string = '"' ~r'(\\.|[^"])*' '"'
    binary = "<<" string ">>"
    boolean = "true" / "false"
    number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"

>>> print(r"""
...     entry = (term _ "." _)* _
...     term = boolean / atom / list / tuple / map / string / binary / number
...     atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
...     _ = ~"\s*"
...     list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
...     tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
...     map   = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
...     keyvalue = term _ "=>" _ term _
...     string = '"' ~r'(\\.|[^"])*' '"'
...     binary = "<<" string ">>"
...     boolean = "true" / "false"
...     number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"
...     """)

    entry = (term _ "." _)* _
    term = boolean / atom / list / tuple / map / string / binary / number
    atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
    _ = ~"\s*"
    list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
    tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
    map   = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
    keyvalue = term _ "=>" _ term _
    string = '"' ~r'(\\.|[^"])*' '"'
    binary = "<<" string ">>"
    boolean = "true" / "false"
    number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"

In fact, I saved the two outputs in two files and then compared them, otherwise it's not possible to really say if there's a difference.

In adxcsouf2john.py the byte-string literal has been changed to a raw-string literal. But now string and bytes are probably incompatible and shouldn't match in the regex. I would expect that you need a raw-byte-string literal: rb'...'

You are surely right on that, I updated the PR to use rb'...'. Thanks for the feedback!

you probably have some test files? Can you please retest especially the two mentioned scripts or at least justify that you are sure it works this way?

No, I don't have tests unfortunately.

I'm part of Kali Linux team, and Python 3.12 is the new default in Kali, it's about to hit Kali Rolling later today if all goes well. We see those warnings when the john package is installed, so I fix it, mainly for cosmetics purpose.

But no, I don't have specific tests, sorry. It's Ok if you don't want to merge this PR, if you think the changes can't be included without more extended testing.

Also, we do run all the tests as part of our CI pipelines, and at the moment the tests run only against Python 3.12. We don't test Python 2.

@exploide
Copy link
Contributor

I think it's fine and ready to merge as soon as you push the rb'...' change. (You wrote you changed it but the commit has not been updated.)

@solardiz
Copy link
Member

I think it's fine and ready to merge as soon as you push the rb'...' change. (You wrote you changed it but the commit has not been updated.)

Yes, I'd like to merge this based on the discussion so far. For the one pending change above, please amend and force-push the one existing commit, not add an extra commit. Thanks!

@elboulangero
Copy link
Contributor Author

I think it's fine and ready to merge as soon as you push the rb'...' change. (You wrote you changed it but the commit has not been updated.)

Ah indeed, I forgot to push, thanks for the nudge.

Now it's done and ready to merge I think.

@solardiz solardiz merged commit 63f6c35 into openwall:bleeding-jumbo Sep 17, 2024
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants