pp_split: no SWITCHSTACK in @ary = split(...) optimisation #18232

Closed

richardleach
Contributor

The @ary = split(...) optimisation uses SWITCHSTACK to make @ary
masquerade as the value stack. However, code that is not aware of this
could modify @ary during the split and cause perl to segfault/panic.
(e.g. see added tests)

This commit essentially removes that SWITCHSTACK and then reverses
some operations (e.g. Copy stack<->array) towards the end of the function.
There is also some refactoring to consolidate all changes to @ary at the
end of the function, rather than having some at the beginning (previously
essential) and some at the end.

No user-visible changes - besides perl not crashing in the tests - are
intended. However, there is the unavoidable side effect that a large split
will now permanently grow the stack when that might previously not have
happened.
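
As a rough illustration of the approach described above (a toy model, not the code in pp.c): results are pushed onto a growable stack, and the target array is only written once, at the end. All names here (GrowStack, grow_push, split_chars_to_array) are hypothetical, and single chars stand in for SV pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy value stack. 'sp' is the index of the topmost item, -1 when empty
 * (mirroring perl's convention that SP points AT the top item). */
typedef struct {
    char  *items;
    long   sp;
    size_t cap;
} GrowStack;

/* Grow the stack if it is full, then push one item. */
static void grow_push(GrowStack *st, char c)
{
    if ((size_t)(st->sp + 1) == st->cap) {
        size_t new_cap = st->cap ? st->cap * 2 : 8;
        char *p = realloc(st->items, new_cap);
        assert(p != NULL);
        st->items = p;
        st->cap = new_cap;
    }
    st->items[++st->sp] = c;
}

/* Split 'str' into single characters. Every result is pushed onto the
 * stack first; 'ary' is only written at the very end, after which the
 * results are popped again. 'ary' must have room for strlen(str) items.
 * Returns the number of items produced. */
static size_t split_chars_to_array(GrowStack *st, const char *str, char *ary)
{
    size_t iters = 0;
    for (const char *p = str; *p; p++) {
        grow_push(st, *p);
        iters++;
    }
    if (iters > 0) {
        /* Consolidate all changes to the array in one place. */
        memcpy(ary, st->items + st->sp + 1 - (long)iters, iters);
        st->sp -= (long)iters;   /* pop all 'iters' results */
    }
    return iters;
}
```

In this model, sp is back where it started after a call, but cap keeps its grown size. That corresponds to the side effect mentioned above: a large split leaves the stack permanently larger.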

Note: this PR is standalone but comes from the discussion in #18014.
Use of a temporary AV was previously suggested (#18090) but rejected as
adding too much complexity to an already complex OP. Should a temp
AV be needed in the future, #18090 could probably be improved upon
by some of the refactoring in this commit.

Basic Performance Measurements

  • A small split was about 10% faster:
    $str = "perl"; my @ary; for (1 .. 1_000_000) { @ary = split(//, $str); }
  • A mid-sized split was about 5% slower:
    $str = "perl" x 1000; for (1 .. 1_000) { my @ary = split(//, $str); }
  • No meaningful difference for splits where the resulting SVs have to be
    on both @ary and the stack. e.g.
    for ( @ary = split(//, $str) ){ }
    (There should be no performance difference other than noise for any
    gimme_scalar split that does not generate SVs.)

On x64 Linux with gcc, there was no size difference in the perl binary,
although disassembly suggests that pp_split itself has fewer instructions.

@richardleach
Contributor Author

Note for reviewers: this PR passes all tests, but there are a few things I'm unsure about:

  1. "SP + 1 - iters" seems a bit messy. Should I be doing something else here?
    Copy(SP + 1 - iters, AvARRAY(ary), iters, SV*);
  2. This block works, but is it the appropriate thing to be doing? Should I instead be using PUSHMARK(SP) earlier in the function and then popping it at this point?
    if (gimme != G_ARRAY) {
        /* SP points to the final SV* pushed to the stack. But the SV*  */
        /* are not going to be used from the stack. Point SP to below   */
        /* the first of these SV*.                                      */
        SP = SP + 1 - iters;
        PUTBACK;
    }
  3. This block works, but is it the appropriate thing to be doing?
    if (gimme != G_ARRAY) {
        PUSHMARK(SP);
        GETTARGET;
        XPUSHi(iters);
    }

@richardleach richardleach requested a review from iabyn October 11, 2020 18:50
@richardleach richardleach requested a review from tonycoz October 27, 2020 23:41
pp.c Outdated
GETTARGET;
XPUSHi(iters);
if (gimme != G_ARRAY) {
PUSHMARK(SP);
Contributor

I hadn't noticed before, but this is incorrect, with your patch:

$ ./miniperl -e 'sub foo { print "@_\n" } foo(1, scalar split " ", "a b")'
2

my system perl (5.28):

$ perl -e 'sub foo { print "@_\n" } foo(1, scalar split " ", "a b")'
1 2

Removing the PUSHMARK fixes that.

Contributor Author

Thanks Tony, looks like another test case to add to split.t :)

However, removing the PUSHMARK causes test failures in lib/Config.t:

not ok 79 - 3
# Failed test 79 - 3 at lib/Config.t line 140
#      got "PERL_API_REVISION=\'5\'"
# expected "3"
# 3 lines found
not ok 80 - 3
# Failed test 80 - 3 at lib/Config.t line 141
#      got "\'5\'"
# expected "3"
# 3 lines found
ok 81 - trailing colon gives 1-line response: PERL_API_REVISION='5' PERL_API_SUBVERSION='3' PERL_API_VERSION='33' 
ok 82 - trailing colon gives 1-line response: '5' '3' '33' 
not ok 83 - 4
# Failed test 83 - 4 at lib/Config.t line 146
#      got "PERL_API_REVISION"
# expected "4"
# found 'tag='
not ok 84 - 4
# Failed test 84 - 4 at lib/Config.t line 147
#      got "PERL_API_REVISION"
# expected "4"
# found 'tag='
# test tagged responses, multi-line and single-line

Contributor Author

Mashing on the keyboard "fixes it" by changing that block to:

    if (gimme != G_ARRAY) {
        if (PL_op->op_private & OPpSPLIT_ASSIGN)
            PUSHMARK(SP);
        GETTARGET;
        XPUSHi(iters);
    }

But I don't have a good enough understanding of the mark stack yet to know whether this is the correct thing to do or just happens to work for the cases tested....

Contributor Author

I've pushed an update that only does the PUSHMARK sometimes and adds in your test case from above. Please could you take another look when you get a chance.

Contributor

I suspect you have an off-by-one error. Using your latest:

tony@mars:.../git/perl2$ ./perl -e 'sub foo { print "@_\n" } foo(1, scalar(@x = split " ", "a b"))'
2
# system perl
tony@mars:.../git/perl2$ perl -e 'sub foo { print "@_\n" } foo(1, scalar(@x = split " ", "a b"))'
1 2

I'll take a closer look soon, but maybe you'll have an idea.

Contributor

It looks like the problem is the:

                SP = SP + 1 - iters;

I assume you added 1 here to match the:

Copy(SP + 1 - iters, AvARRAY(ary), iters, SV*);

that you were suspicious of, but that Copy() is correct. SP in perl points at the topmost stack item, not one slot beyond it, so the pointer arithmetic adds 1 to adjust for that.
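
To make that arithmetic concrete, here is a small stand-alone model (an illustrative sketch, not perl's Copy() macro): ints stand in for SV pointers, and copy_top_items is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Copy the top 'iters' items of a stack into 'dst'.
 * 'sp' points AT the topmost item, so the lowest of the 'iters' items
 * lives at sp - (iters - 1), which is the same as sp + 1 - iters. */
static void copy_top_items(const int *sp, int *dst, size_t iters)
{
    if (iters == 0)
        return;
    memcpy(dst, sp + 1 - iters, iters * sizeof(int));
}
```

With a five-item stack and sp at the last item, copying 3 items starts at the third item, two slots below sp. Starting at sp - iters instead would pick up one item too many from below.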

The issue with:

                SP = SP + 1 - iters;

is that we've pushed iters items, but this only removes (iters-1) of them. That would be okay if the code near the return re-used the slot, but it pushes instead, so we end up with both the first element of the split result and the count of items on the stack. E.g. if we remove the extra PUSHMARK()s you get:

tony@mars:.../git/perl2$ ./miniperl -le 'sub foo { print "@_\n" } foo(1, scalar (@x = split " ", "a b"));'
1 a 2

The PUSHMARKs just hide the "a", and anything else pushed before the mark, from the list op (the call to foo()).

So the latest commit has two issues:

  • extra PUSHMARK()s that confuse any list operators (the only PUSHMARK needed is the prep for the tied call)
  • the off-by-one error in adjusting SP
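
The interaction of the two issues can be reproduced with a toy model of the value stack and mark stack. This is a simplified sketch, not perl's implementation: the Toy struct and toy_* functions are hypothetical, strings stand in for SVs, and the list op is assumed to take everything above the topmost mark.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 16

typedef struct {
    const char *stack[TOY_MAX]; /* value stack */
    int sp;                     /* index of the topmost item, -1 if empty */
    int marks[TOY_MAX];         /* mark stack: saved sp values */
    int mark_top;               /* index of the topmost mark, -1 if none */
} Toy;

static void toy_pushmark(Toy *t)
{
    assert(t->mark_top + 1 < TOY_MAX);
    t->marks[++t->mark_top] = t->sp;
}

static void toy_push(Toy *t, const char *v)
{
    assert(t->sp + 1 < TOY_MAX);
    t->stack[++t->sp] = v;
}

/* Model of: foo(1, scalar(@x = split " ", "a b"))
 * Writes the arguments foo() would see into 'out', space separated. */
static void toy_call(int buggy_sp_adjust, int extra_pushmark,
                     char *out, size_t outlen)
{
    Toy t;
    int iters = 2;
    t.sp = -1;
    t.mark_top = -1;

    toy_pushmark(&t);            /* the list op marks the start of its args */
    toy_push(&t, "1");
    toy_push(&t, "a");           /* split pushes its results */
    toy_push(&t, "b");

    if (buggy_sp_adjust)
        t.sp = t.sp + 1 - iters; /* removes only iters-1 items */
    else
        t.sp -= iters;           /* removes all iters items */

    if (extra_pushmark)
        toy_pushmark(&t);
    toy_push(&t, "2");           /* scalar context: push the item count */

    /* The list op sees everything above the TOPMOST mark. */
    int mark = t.marks[t.mark_top];
    out[0] = '\0';
    for (int i = mark + 1; i <= t.sp; i++) {
        if (out[0])
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, t.stack[i], outlen - strlen(out) - 1);
    }
}
```

In this model, the buggy adjustment without the extra mark yields "1 a 2", adding the extra mark yields just "2", and the corrected adjustment without the extra mark yields "1 2", matching the three outputs quoted in this thread.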

Contributor Author

Thanks Tony. I'd missed the off-by-one and so assumed there must be something I should be doing with the mark stack. New commit pushed.

@tonycoz
Contributor

tonycoz commented Nov 16, 2020

(mostly) squashed and applied as 607eaf2 and ab307de
