Optimize IngestDocument FieldPath allocation #120573

Merged 8 commits into elastic:main on Jan 29, 2025

Conversation

@joegallo (Contributor) commented on Jan 22, 2025

Constructing a FieldPath from a String path requires splitting the path (via String#split) into an array of substrings (we're splitting on dots, so for example "foo.bar.baz" becomes ["foo", "bar", "baz"]).

So that's allocation of an ArrayList to hold the results as we do the scan, allocation of the Strings to hold each individual substring, and finally allocation of the resulting array at the end when the scan is finished. It's not the slowest thing ever, but it's not free. Of course the scan itself has some small CPU cost, too. (For the record, though, we go down the fast path of String#split, so it's not like we're doing regexes here, thank goodness.)
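
For illustration, here's roughly what that per-construction work looks like (the field names are just an example, not taken from the actual code):

// Roughly the work described above, performed on every FieldPath construction:
String path = "foo.bar.baz";
String[] pathElements = path.split("\\."); // an ArrayList, three substrings, and the result array
// pathElements is now {"foo", "bar", "baz"}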

We call it like it's free, however (😬). Consider the happy path of a rename processor:

Object value = document.getFieldValue(path, Object.class);
document.removeField(path);
try {
    document.setFieldValue(target, value);
    // ... (failure handling elided; this is just the happy path)

Let's imagine we loop over 1000 incoming JSON documents and run that processor. For each document we'll turn the source path into a FieldPath twice (once for the getFieldValue and once for the removeField), then we'll turn the target path into a FieldPath once (for the setFieldValue). And we do that for all 1000 documents.

Anyway... that's 3000 FieldPath constructions -- and 3000 arrays of substrings -- for a single rename processor over those 1000 documents.

This PR introduces a local static cache that holds onto previously allocated FieldPath objects and allows us to look them up by the path they represent. Returning references to already allocated FieldPath objects is way faster than allocating new ones.

The same pattern of a map that we just whack when it exceeds its size limit is applied in StringLiteralDeduplicator (introduced in #76405) and DateProcessor (see #92880) -- it works pretty well!
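
For anyone unfamiliar with that pattern, here's a minimal, self-contained sketch of the idea (the class name, size limit, and cached value type are illustrative -- the actual change caches FieldPath objects inside IngestDocument):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a size-limited cache with no eviction policy: when the map grows
// past its limit, we just clear ("whack") the whole thing and start over.
final class PathElementCache {
    private static final int MAX_SIZE = 512; // illustrative limit
    private static final Map<String, String[]> CACHE = new ConcurrentHashMap<>();

    static String[] pathElements(String path) {
        String[] cached = CACHE.get(path);
        if (cached != null) {
            return cached; // cache hit: no split, no new allocations
        }
        if (CACHE.size() > MAX_SIZE) {
            CACHE.clear();
        }
        String[] elements = path.split("\\.");
        CACHE.put(path, elements);
        return elements;
    }
}

The appeal over an LRU is that there's no per-lookup bookkeeping: in steady state the same handful of paths get requested over and over, so the occasional full clear is cheap.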

Like #120571, this doesn't improve any one ingest processor specifically; it just kinda makes them all a bit faster -- reading and writing values is what this speeds up, and most of what every processor does is read a value, do something with it, and then write the result.

This makes all of ingest on my local test benchmark faster by about 20%, but that's not necessarily indicative of normal workloads. It makes an example convert processor faster by 80%, and a remove processor faster by 70%, but a date processor only gets a smidge faster -- the former processors are mostly just shuffling values around, so the effect is outsized there, while the latter has real work to do (parsing date strings) that this doesn't make any faster.

@joegallo added the >enhancement, :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), Team:Data Management (Meta label for data/management team), v9.0.0, and v8.18.0 labels on Jan 22, 2025
@joegallo requested a review from nielsbauman on January 22, 2025 04:22
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @joegallo, I've created a changelog YAML for you.

@nielsbauman (Contributor) left a comment

LGTM, thanks! 🚀

@joegallo added the auto-backport (Automatically create backport pull requests when merged) label on Jan 27, 2025
@joegallo merged commit d763805 into elastic:main on Jan 29, 2025
16 checks passed
@joegallo deleted the ingest-document-field-path branch on January 29, 2025 18:38
joegallo added a commit to joegallo/elasticsearch that referenced this pull request Jan 29, 2025
@elasticsearchmachine (Collaborator)

💚 Backport successful

Branch: 8.x

@joegallo (Contributor, Author)

[screenshot: nightly ingest benchmarks, 2025-03-24]

Here's a screenshot from the nightly benchmarks -- the drop in ingest time spent in the set and remove processors really jumps out. There were also some later drops from #125051 and #125232.
