-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Minor pcre optimizations #12923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor pcre optimizations #12923
Conversation
This changes the variables that are bools to actually be bools instead of ints, which allows some additional optimization by the compiler (e.g. removing some ternaries and move extensions). It also gets rid of the use_flags argument because that's just the same as flags == 0. This reduces the call frame.
This can't be true if the second condition is true because it would require the string to occupy the entire address space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Didn't find any obvious mistakes.
@@ -1013,8 +1013,8 @@ static inline void add_offset_pair( | |||
static void populate_subpat_array( | |||
zval *subpats, const char *subject, PCRE2_SIZE *offsets, zend_string **subpat_names, | |||
uint32_t num_subpats, int count, const PCRE2_SPTR mark, zend_long flags) { | |||
bool offset_capture = (flags & PREG_OFFSET_CAPTURE) != 0; | |||
bool unmatched_as_null = (flags & PREG_UNMATCHED_AS_NULL) != 0; | |||
zend_long offset_capture = flags & PREG_OFFSET_CAPTURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect this to change the emitted assembly, at least after optimizations. Does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sequence emitted by my compiler (GCC 13) for the old code was:
mov reg, flags
test reg, flag
setne reg2
and for the if:
test reg2, reg2
jz ...
while the change I made turns this into:
mov reg, flags
and reg, flag
and then the same sequence for the if.
So for some reasons the compiler produces one extra instruction in the old code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ok. This optimization seems to work fine on godbolt, probably something else in the function that trips it up. I'm just asking because it's easy to imagine that somebody will switch it back, expecting it to work. So a comment might be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I doubt that it really matters. What this optimization PR showed yet again is that reducing the number of memory accesses is the most worthwhile optimization. In my previous pcre optimization PR I reduced the number of memory accesses and it had a great improvement, and the biggest improvement from this PR is again caused by avoiding the marks
HashTable unwrapping from the zval.
bool offset_capture = (flags & PREG_OFFSET_CAPTURE) != 0; | ||
bool unmatched_as_null = (flags & PREG_UNMATCHED_AS_NULL) != 0; | ||
zend_long offset_capture = flags & PREG_OFFSET_CAPTURE; | ||
zend_long unmatched_as_null = flags & PREG_UNMATCHED_AS_NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on another line similar unmatched_as_null
is declared as uint32_t
No description provided.