pp_split: no SWITCHSTACK in @ary = split(...) optimisation #18232
Note for reviewers: this PR passes all tests, but there are a few things I'm unsure about:
pp.c
    GETTARGET;
    XPUSHi(iters);
    if (gimme != G_ARRAY) {
        PUSHMARK(SP);
I hadn't noticed before, but this is incorrect. With your patch:
$ ./miniperl -e 'sub foo { print "@_\n" } foo(1, scalar split " ", "a b")'
2
my system perl (5.28):
$ perl -e 'sub foo { print "@_\n" } foo(1, scalar split " ", "a b")'
1 2
Removing the PUSHMARK fixes that.
Thanks Tony, looks like another test case to add to split.t :)
However, removing the PUSHMARK causes test failures in lib/Config.t:
not ok 79 - 3
# Failed test 79 - 3 at lib/Config.t line 140
# got "PERL_API_REVISION=\'5\'"
# expected "3"
# 3 lines found
not ok 80 - 3
# Failed test 80 - 3 at lib/Config.t line 141
# got "\'5\'"
# expected "3"
# 3 lines found
ok 81 - trailing colon gives 1-line response: PERL_API_REVISION='5' PERL_API_SUBVERSION='3' PERL_API_VERSION='33'
ok 82 - trailing colon gives 1-line response: '5' '3' '33'
not ok 83 - 4
# Failed test 83 - 4 at lib/Config.t line 146
# got "PERL_API_REVISION"
# expected "4"
# found 'tag='
not ok 84 - 4
# Failed test 84 - 4 at lib/Config.t line 147
# got "PERL_API_REVISION"
# expected "4"
# found 'tag='
# test tagged responses, multi-line and single-line
Mashing on the keyboard "fixes it" by changing that block to:
    if (gimme != G_ARRAY) {
        if (PL_op->op_private & OPpSPLIT_ASSIGN)
            PUSHMARK(SP);
        GETTARGET;
        XPUSHi(iters);
    }
But I don't have a good enough understanding of the mark stack yet to know whether this is the correct thing to do or just happens to work for the cases tested....
I've pushed an update that only does the PUSHMARK sometimes and adds in your test case from above. Please could you take another look when you get a chance.
I suspect you have an off-by-one error. Using your latest:
tony@mars:.../git/perl2$ ./perl -e 'sub foo { print "@_\n" } foo(1, scalar(@x = split " ", "a b"))'
2
# system perl
tony@mars:.../git/perl2$ perl -e 'sub foo { print "@_\n" } foo(1, scalar(@x = split " ", "a b"))'
1 2
I'll take a closer look soon, but maybe you'll have an idea.
It looks like the problem is the:
    SP = SP + 1 - iters;
I assume you added 1 here to match the:
    Copy(SP + 1 - iters, AvARRAY(ary), iters, SV*);
that you were suspicious of, but that Copy() is correct. SP in perl points at the topmost stack item, not one slot beyond it, so the pointer arithmetic adds the +1 to adjust for that.
The issue with:
    SP = SP + 1 - iters;
is that we've pushed iters items, but this only removes (iters-1) items. This would be okay if the code near the return re-used the slot, but it pushes instead, so we end up with the first element of the split result on the stack as well as the count of items. E.g. if we remove the extra PUSHMARK()s you get:
tony@mars:.../git/perl2$ ./miniperl -le 'sub foo { print "@_\n" } foo(1, scalar (@x = split " ", "a b"));'
1 a 2
The PUSHMARKs just hide the "a" from the list op (the call to foo()), and anything else pushed.
So the latest commit has two issues:
- extra PUSHMARK()s that confuse any list operators (the only PUSHMARK needed is the prep for the tied call)
- the off-by-one error in adjusting SP
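To make the two issues above concrete, here is a minimal toy model in plain C. It is not perl's implementation: `ToyStack`, `toy_split` and `toy_list_args` are hypothetical names invented for this sketch, and the mark is modelled as a plain depth index. The only things taken from the discussion are that the stack pointer points at the topmost item and that a list op sees only the items above the most recent mark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_MAX 16

/* Hypothetical toy value stack. slots[0] is an unused sentinel, and sp
 * points AT the topmost item (not one past it), so an empty stack has
 * sp == slots. */
typedef struct {
    const char *slots[STACK_MAX];
    const char **sp;
} ToyStack;

void toy_init(ToyStack *s)
{
    s->slots[0] = NULL;
    s->sp = s->slots;
}

void toy_push(ToyStack *s, const char *v)
{
    assert(s->sp < s->slots + STACK_MAX - 1);
    *++s->sp = v;
}

size_t toy_depth(const ToyStack *s)
{
    return (size_t)(s->sp - s->slots);
}

/* Number of items a list op would see if its mark was taken at depth
 * `mark`: everything above the mark, nothing at or below it. */
size_t toy_list_args(const ToyStack *s, size_t mark)
{
    return toy_depth(s) - mark;
}

/* Push `iters` fields, copy them to `out`, drop them from the stack and
 * push `count` in their place. With buggy != 0 the stack pointer is
 * adjusted as "sp = sp + 1 - iters", which drops only iters - 1 items.
 * Returns the resulting stack depth. */
size_t toy_split(ToyStack *s, const char *const *fields, size_t iters,
                 const char **out, const char *count, int buggy)
{
    for (size_t i = 0; i < iters; i++)
        toy_push(s, fields[i]);

    /* sp is the last pushed item, so the first one is at sp + 1 - iters. */
    memcpy(out, s->sp + 1 - iters, iters * sizeof *out);

    if (buggy)
        s->sp = s->sp + 1 - iters;  /* leaves the first field behind */
    else
        s->sp -= iters;             /* removes all iters fields */

    toy_push(s, count);
    return toy_depth(s);
}
```

In this model, starting from a stack holding just "1" and splitting into "a" and "b", the correct adjustment leaves "1 2" and the buggy one leaves "1 a 2". A mark taken at depth 0 exposes every item to the list op, while a stray mark taken at depth 2 in the buggy case exposes only the final "2", which is consistent with the outputs quoted above.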
Thanks Tony. I'd missed the off-by-one and so assumed there must be something I should be doing with the mark stack. New commit pushed.
The @ary = split(...) optimisation uses SWITCHSTACK to make @ary
masquerade as the value stack. However, code that is not aware of this
could modify @ary during the split and cause perl to segfault/panic
(e.g. see added tests).
This commit essentially removes that SWITCHSTACK and then reverses
some operations (e.g. Copy stack<->array) towards the end of the function.
There is also some refactoring to consolidate all changes to @ary at the
end of the function, rather than having some at the beginning (previously
essential) and some at the end.
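As a rough illustration of the "collect first, touch @ary only at the end" shape described above, here is a small self-contained C sketch. It is not the pp_split code: `ToyArray`, `ToyHook` and `toy_split_assign` are hypothetical names, the splitting rule (single spaces) is a simplification, and the `hook` callback merely stands in for code that modifies the target array while the split is still running.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FIELDS 32

/* Hypothetical stand-in for the target array (@ary in the description). */
typedef struct {
    char **items;
    size_t len;
} ToyArray;

/* Stand-in for code that is unaware of the split and may modify ary. */
typedef void (*ToyHook)(ToyArray *ary);

/* Free every element and reset the array to empty. */
void toy_array_clear(ToyArray *ary)
{
    for (size_t i = 0; i < ary->len; i++)
        free(ary->items[i]);
    free(ary->items);
    ary->items = NULL;
    ary->len = 0;
}

/* Copy the n bytes starting at s into a new NUL-terminated string.
 * Allocation failure is only asserted, to keep the sketch short. */
char *toy_copy_field(const char *s, size_t n)
{
    char *f = malloc(n + 1);
    assert(f != NULL);
    memcpy(f, s, n);
    f[n] = '\0';
    return f;
}

/* Split str on single spaces and assign the fields to ary.
 * Fields are collected in private scratch space; ary is only written
 * once splitting has finished. Returns the number of fields. */
size_t toy_split_assign(ToyArray *ary, const char *str, ToyHook hook)
{
    char *stack[TOY_MAX_FIELDS];    /* scratch space, never aliases ary */
    size_t iters = 0;
    const char *start = str;

    for (;;) {
        const char *end = strchr(start, ' ');
        size_t n = end ? (size_t)(end - start) : strlen(start);

        assert(iters < TOY_MAX_FIELDS);
        stack[iters++] = toy_copy_field(start, n);
        if (hook)
            hook(ary);              /* may clear or otherwise change ary */
        if (!end)
            break;
        start = end + 1;
    }

    /* All changes to ary happen here, after the split is complete. */
    toy_array_clear(ary);
    ary->items = malloc(iters * sizeof *ary->items);
    assert(ary->items != NULL);
    memcpy(ary->items, stack, iters * sizeof *ary->items);
    ary->len = iters;
    return iters;
}
```

Passing `toy_array_clear` as the hook clears the array after every field, yet the final contents are still the complete set of fields, because nothing produced by the split lives in the array until the last step. The cost is the one noted in the description: the scratch space has to be large enough to hold every field at once.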
No user-visible changes - besides perl not crashing in the tests - are
intended. However, there is the unavoidable side effect that a large split
will now permanently grow the stack when that might previously not have
happened.
Note: this PR is standalone but comes from the discussion in #18014.
Use of a temporary AV was previously suggested (#18090) but rejected as
adding too much complexity to an already complex OP. Should a temp
AV be needed in the future, #18090 could probably be improved upon
by some of the refactoring in this commit.
Basic Performance Measurements
$str = "perl"; my @ary; for (1 .. 1_000_000) { @ary = split(//, $str); }
$str = "perl" x 1000; for (1 .. 1_000) { my @ary = split(//, $str); }
on both @ary and the stack, e.g. for ( @ary = split(//, $str) ){ }
(There should be no performance difference other than noise for any
gimme_scalar split that does not generate SVs.)
On x64 Linux with gcc, there was no size difference in the perl binary,
although disassembly suggests that pp_split itself has fewer instructions.