Optimize IngestDocMetadata isAvailable #120753
You can't actually remove a key from a map while you're in the middle of iterating through that same map. The only tests of this code happened to pass because of the order of the keys in the tests (I swear I am not making this up).
This PR also shortcuts the map lookup in the almost-always-expected case of the key being looked up on a CtxMap *not* having a leading underscore (and therefore the whole isAvailable thing being an unnecessary expense).
Pinging @elastic/es-data-management (Team:Data Management)
Hi @joegallo, I've created a changelog YAML for you.
private static Map<String, FieldProperty<?>> validateLeadingUnderscores(final Map<String, FieldProperty<?>> properties) {
    for (String key : properties.keySet()) {
        assert key.charAt(0) == UNDERSCORE;
@masseyke `assert` made sense to me before I pulled this out into a method, now I'm thinking it should be an `IllegalArgumentException`, perhaps. What do you think?
I switched it in a683c54.
It looks like the constructor that lets you pass in arbitrary properties is only used by unit tests. It seems a little odd that we have this dangerous constructor and validation around it just for the sake of unit tests.
fe05c56 drops the bad constructor and fusses with a handful of tests that were using the more flexible version of things. I always hated `TestIngestCtxMetadata` so thanks for pushing on this, because now it's gone.
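For readers weighing the `assert` versus `IllegalArgumentException` question raised in this thread: a Java `assert` is only evaluated when the JVM runs with assertions enabled (`-ea`), while an explicit exception is always thrown. Below is a minimal sketch of the exception-based variant. The class name, the generic signature, and the error message are assumptions for illustration; the actual change in a683c54 is not shown on this page.

```java
import java.util.Map;

// Hypothetical stand-in, not the real IngestDocMetadata code.
class LeadingUnderscoreValidation {
    private static final char UNDERSCORE = '_';

    // Returns the same map if every key starts with an underscore, otherwise throws.
    // Unlike an assert, this check runs whether or not the JVM was started with -ea.
    static <V> Map<String, V> validateLeadingUnderscores(final Map<String, V> properties) {
        for (String key : properties.keySet()) {
            if (key == null || key.isEmpty() || key.charAt(0) != UNDERSCORE) {
                // The message text is an assumption made for this sketch.
                throw new IllegalArgumentException("metadata key [" + key + "] must start with an underscore");
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(validateLeadingUnderscores(Map.of("_index", "a", "_id", "b")).size());
        try {
            validateLeadingUnderscores(Map.of("index", "a"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```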
@@ -150,7 +151,7 @@ public Object remove(Object key) {
     @Override
     public void clear() {
         // AbstractMap uses entrySet().clear(), it should be quicker to run through the validators, then call the wrapped maps clear
-        for (String key : metadata.keySet()) {
+        for (String key : new ArrayList<>(metadata.keySet())) { // copy the key set to get around the ConcurrentModificationException
             metadata.remove(key);
         }
         // TODO: this is just bogus, there isn't any case where metadata won't trip a failure above?
We could remove this TODO now
Ehhhhhhh, I rewrote the comment in ce216ff, but I didn't remove it. `clear()` still doesn't work in the general case, for sure, and I'm not sure it works in any particularly practical case.
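The iteration problem that the `new ArrayList<>(metadata.keySet())` change works around can be reproduced with a plain JDK map. The sketch below is only an illustration: it uses a `LinkedHashMap` rather than the real metadata classes, and the class and method names are hypothetical. It also shows one way such a loop can appear to work by accident: with a single entry the iterator is already exhausted when the removal happens, so nothing is thrown. That is not necessarily the same accident the tests in this PR relied on.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, not part of the PR.
class ClearWhileIterating {

    // Removes each key while iterating the live key set.
    // Returns true if the iterator detected the modification and threw.
    static boolean removeDuringIterationFails(Map<String, Object> map) {
        try {
            for (String key : map.keySet()) {
                map.remove(key);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a snapshot of the keys, so removals cannot invalidate the iterator.
    static void removeViaCopy(Map<String, Object> map) {
        for (String key : new ArrayList<>(map.keySet())) {
            map.remove(key);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> two = new LinkedHashMap<>();
        two.put("_index", "a");
        two.put("_id", "b");
        System.out.println("two entries, live key set fails: " + removeDuringIterationFails(two));

        Map<String, Object> one = new LinkedHashMap<>();
        one.put("_index", "a");
        System.out.println("one entry, live key set fails: " + removeDuringIterationFails(one));

        Map<String, Object> copy = new LinkedHashMap<>();
        copy.put("_index", "a");
        copy.put("_id", "b");
        removeViaCopy(copy);
        System.out.println("after removeViaCopy, empty: " + copy.isEmpty());
    }
}
```

Copying the key set costs one extra allocation per call, which seems a reasonable trade for a `clear()` that is rarely invoked.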
assert key != null && key.isEmpty() == false;
// we can avoid a map lookup on most keys since we know that the only keys that are 'metadata keys' for an ingest document
// must be keys that start with an underscore
if (key.charAt(0) != UNDERSCORE) {
I'm amazed that this makes a difference at all (and I'm really curious why), but I've seen the charts!
I would not have guessed that this would have made a measurable performance improvement, but it seems to. And it doesn't seem to cause any harm. LGTM
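As a rough, self-contained sketch of the shortcut discussed in this thread: the first-character check answers the common case without touching the properties map at all. The class name and the contents of the `PROPERTIES` map are assumptions for illustration; the real code keys a map of `FieldProperty` values, which is not reproduced here.

```java
import java.util.Map;

// Hypothetical stand-in, not the real IngestDocMetadata code.
class LeadingUnderscoreShortcut {
    private static final char UNDERSCORE = '_';

    // Illustrative stand-in for the hardcoded properties map; keys and values are assumptions.
    private static final Map<String, Class<?>> PROPERTIES = Map.of(
        "_index", String.class,
        "_id", String.class,
        "_version", Number.class
    );

    static boolean isAvailable(String key) {
        assert key != null && key.isEmpty() == false;
        // Every metadata key starts with an underscore, so any other key cannot be
        // a metadata key and the map lookup can be skipped.
        if (key.charAt(0) != UNDERSCORE) {
            return false;
        }
        return PROPERTIES.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(isAvailable("_index"));
        System.out.println(isAvailable("message"));
        System.out.println(isAvailable("_unknown"));
    }
}
```

The shortcut is only safe because every key in the map is guaranteed to start with an underscore, which is why the PR formalizes that invariant first.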
For example, this won't work for IngestCtxMap because the _version isn't nullable, and so the metadata of an ingest document cannot be clear()-ed.
fe05c56 here is big enough that I'm re-requesting review -- I don't want to sneak anything by anybody.
We no longer allow arbitrary properties; the properties map is hardcoded. TestIngestCtxMetadata is deleted entirely, and several tests are rewritten to account for the newer, slightly-less-flexible (but way easier to reason about!) IngestDocMetadata.
f15f401 to f18ea3a
A good bit of the diff on the PR is fixing some src and test buglets around the code I'm changing -- it turns out the code and tests only appeared to work, but in reality they were both broken.
The actual work of this PR is that the `IngestDocMetadata` is formalized to have all its properties start with a leading underscore character (this was already true except for tests, and now it's true in tests, too), and then since we know that all metadata properties start with a leading underscore, we can shortcut the `isAvailable` check (which is a map `containsKey` call) in the case of a key with a leading character that is not an underscore and just return false.
On the benchmark I'm running we call this a few hundred times per document, because we guard everything in `isAvailable` checks inside `CtxMap`, so even though this is the micro-est of optimizations it does actually matter in practice. My guess is that it gives better CPU cache locality because we already have the key but the properties map itself might be elsewhere in a worse cache line or whatever -- that is, it's faster this way because we don't have to run off to the heap in the vast majority of cases now (or at least that's the story I'm telling myself, it might not actually be true).
Here's a `rename` processor profile in the 'before' side, the purple bits are the `isAvailable` invocations (we really do call this a lot!):