[ES|QL] Support some stats on aggregate_metric_double #120343

Merged: limotova merged 25 commits into elastic:main from the add-aggregate-double-metric-aggregates branch on Jan 29, 2025

Contributor

@limotova limotova commented Jan 17, 2025

Adds non-grouping support for min, max, sum, and count, using
CompositeBlock as the underlying block type and an internal
FromAggregateMetricDouble function to handle converting from
CompositeBlock to the correct metric subfields.

Closes #110649
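
As a rough illustration of what non-grouping min, max, sum, and count mean for a field whose values are already pre-aggregated, here is a minimal sketch. `PreAggregated` and `NonGroupingStats` are hypothetical names invented for this example; they are not the CompositeBlock or FromAggregateMetricDouble classes from the PR, and the handling of empty input is an assumption.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical stand-in for one stored aggregate_metric_double value (not an Elasticsearch class).
record PreAggregated(double min, double max, double sum, int count) {}

final class NonGroupingStats {
    // MIN over the field: the smallest of the stored minimums (empty when there are no rows).
    static OptionalDouble min(List<PreAggregated> docs) {
        return docs.stream().mapToDouble(PreAggregated::min).min();
    }

    // MAX over the field: the largest of the stored maximums.
    static OptionalDouble max(List<PreAggregated> docs) {
        return docs.stream().mapToDouble(PreAggregated::max).max();
    }

    // SUM over the field: the sum of the stored sums.
    static double sum(List<PreAggregated> docs) {
        return docs.stream().mapToDouble(PreAggregated::sum).sum();
    }

    // COUNT over the field: the sum of the stored counts, not the number of rows.
    static long count(List<PreAggregated> docs) {
        return docs.stream().mapToLong(PreAggregated::count).sum();
    }

    public static void main(String[] args) {
        List<PreAggregated> docs = List.of(
            new PreAggregated(1.0, 5.0, 9.0, 3),
            new PreAggregated(-2.0, 4.0, 6.0, 4)
        );
        System.out.println(min(docs) + " " + max(docs) + " " + sum(docs) + " " + count(docs));
    }
}
```

Each aggregate reads exactly one subfield, which is why a per-metric extraction step is enough to reuse ordinary numeric aggregations.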

@limotova limotova force-pushed the add-aggregate-double-metric-aggregates branch from a3123a7 to c12cf21 Compare January 17, 2025 06:57
Adds support for min, max, sum, and count
@limotova limotova force-pushed the add-aggregate-double-metric-aggregates branch from c12cf21 to ae16694 Compare January 18, 2025 04:14
@limotova limotova requested review from martijnvg and dnhatn January 18, 2025 08:31
Member

@martijnvg martijnvg left a comment

Thanks Larisa, this looks good! I left a few comments.

About the failing yamlRestCompatTest test suite, the following error is returned by aggregate double metric field mapper: Must have all subfields to use aggregate double metric in ESQL

The yamlRestCompatTest test suite runs 8.x versions of the same yaml test against current main in this branch. This error is now returned because previously only the min and max metrics were configured to be stored, which breaks the assumption in the aggregate double metric field mapper. Maybe be less strict here (see comment in AggregateDoubleMetricFieldMapper)?

Member

@dnhatn dnhatn left a comment

I've left some comments, but the approach looks good. Thanks Larisa!

@limotova limotova force-pushed the add-aggregate-double-metric-aggregates branch from 0064bc5 to a5be73b Compare January 23, 2025 02:17
@limotova limotova requested review from dnhatn and martijnvg January 27, 2025 16:33
Member

@dnhatn dnhatn left a comment

I left some comments and questions. Thanks Larisa for iterating on this.

@@ -233,6 +233,17 @@ private static Block constantBlock(BlockFactory blockFactory, ElementType type,
case BYTES_REF -> blockFactory.newConstantBytesRefBlockWith(toBytesRef(val), size);
case DOUBLE -> blockFactory.newConstantDoubleBlockWith((double) val, size);
case BOOLEAN -> blockFactory.newConstantBooleanBlockWith((boolean) val, size);
case COMPOSITE -> {
Member

Hmm, a composite type can be more than aggregate_metric_double? Can we just leave it unsupported here?

Contributor Author

I had to add this to be able to support the unit tests and wasn't really sure of a way to work around it

Member

If composite can be more than just aggregate_metric_double, then should aggregate_metric_double have its own element type?

Member

I'm not sure it really needs its own type. Maybe there's something funny around constants though?

Member

Maybe we instanceof on the val and do an AggregateMetricDouble if it's one of those constants. Again, this feels like it's the kind of thing we'd use for just tests and ROW. Which is ok.
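
The instanceof idea in the comment above could look roughly like the sketch below. Everything here is a simplified, hypothetical stand-in (`SketchBlock`, `AggregateMetricLiteralSketch`, `ConstantBlockSketch`); the real Block and BlockFactory APIs are deliberately not reproduced, and the fallback behaviour is an assumption.

```java
// Hypothetical, simplified block types; not the org.elasticsearch.compute.data classes.
interface SketchBlock {}

record ConstantDoubleSketchBlock(double value, int size) implements SketchBlock {}

// Hypothetical literal holding the four metric values.
record AggregateMetricLiteralSketch(Double min, Double max, Double sum, Integer count) {}

record ConstantAggregateMetricSketchBlock(AggregateMetricLiteralSketch value, int size) implements SketchBlock {}

final class ConstantBlockSketch {
    // Dispatch on the runtime type of the constant instead of treating every
    // composite value as an aggregate metric double.
    static SketchBlock constantBlock(Object val, int size) {
        if (val instanceof AggregateMetricLiteralSketch literal) {
            return new ConstantAggregateMetricSketchBlock(literal, size);
        }
        if (val instanceof Double d) {
            return new ConstantDoubleSketchBlock(d, size);
        }
        // Any other value stays unsupported in this sketch.
        throw new UnsupportedOperationException("can't make a constant block for [" + val + "]");
    }

    public static void main(String[] args) {
        System.out.println(constantBlock(new AggregateMetricLiteralSketch(1.0, 2.0, 3.0, 2), 4));
        System.out.println(constantBlock(1.5, 2));
    }
}
```

Checking the value rather than the element type keeps other composite payloads unsupported, which matches the concern that a composite type can hold more than one kind of value.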


package org.elasticsearch.compute.data;

public class AggregateMetricDoubleLiteral {
Member

Did we introduce this because of tests?

Contributor Author

Yeah, there's a couple of other places in tests where the need for something like this also popped up in the first attempt at implementing aggregate_metric_double

Member

Maybe I'm misreading, but is this only needed for tests? If so should this be moved to the test sources?

Contributor Author

Yeah it's only used in the tests so far but one function that required it is outside of the test code

Member

That code was originally for ROW but we've used it more in tests since. I think, conceptually at least, we could use this thing for ROW support for aggregate metric double. It's a lot more convenient than a Map representation or something.

@limotova limotova changed the title from "[ES|QL] Support some stats on aggregate_double_metric" to "[ES|QL] Support some stats on aggregate_metric_double" Jan 28, 2025
@limotova limotova marked this pull request as ready for review January 28, 2025 05:25
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jan 28, 2025
@limotova limotova added auto-backport Automatically create backport pull requests when merged and removed needs:triage Requires assignment of a team area label labels Jan 28, 2025
@dnhatn dnhatn requested a review from nik9000 January 28, 2025 05:38

@@ -501,4 +503,10 @@ interface SingletonOrdinalsBuilder extends Builder {
*/
SingletonOrdinalsBuilder appendOrd(int value);
}

interface AggregateMetricDoubleBuilder extends Builder {
Member

Interesting! This is not something in server but we have to add it here for this. Huh. I suppose that's ok.


- case UNSUPPORTED, OBJECT, DOC_DATA_TYPE, TSID_DATA_TYPE, PARTIAL_AGG -> throw new IllegalArgumentException(
-     "can't make random values for [" + type.typeName() + "]"
- );
+ case UNSUPPORTED, OBJECT, DOC_DATA_TYPE, TSID_DATA_TYPE, PARTIAL_AGG, AGGREGATE_METRIC_DOUBLE ->
Member

At some point we'll have to be able to make random AggregateMetricDoubles. But later is fine.

Member

@nik9000 nik9000 left a comment

I'm happy when the others are happy.

@@ -141,6 +144,9 @@ protected TypeResolution resolveType() {
public Expression surrogate() {
var s = source();
var field = field();
if (field.dataType() == DataType.AGGREGATE_METRIC_DOUBLE) {
return new Sum(s, FromAggregateMetricDouble.withMetric(source(), field, AggregateMetricDoubleBlockBuilder.Metric.COUNT));
Member

This is a really clear explanation of what this field is - COUNT(aggregate_metric_double) is a SUM of the preaggregated counts. It's really quite educational.

It does ask "should we have methods to pick apart the sub-fields?" like GET_COUNT(aggregate_metric_double). Or something. Not now, but eventually? Like, if we want to treat the field like it's just a container of numbers.

Which is odd. We don't have container type fields until this one.

Member

Wait, you already wrote it and called it FromAggregateMetricDouble. Of course! Do we plug it in? It looks like not. That's fine and good. But we should talk about if it's appropriate to expose it in the language as a function one day.

Member

We plugged it in, but it is not yet exposed at the language level. I think we should consider exposing this method so that users can retrieve values from aggregate_metric_double individually, as we currently don't have a good way to return all values at once. Let's address this in a follow-up.
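
The rewrite discussed in this thread, where COUNT over an aggregate_metric_double field becomes a SUM over its extracted count subfield, can be sketched with a miniature expression tree. All types below (`FieldRef`, `ExtractMetric`, `SumOf`, `CountOf`, and the two enums) are hypothetical and exist only for this example; returning null for "no rewrite" is an assumption, not what the ES|QL `surrogate()` method does for other types.

```java
// Hypothetical miniature expression tree; none of these are the ES|QL classes.
enum SketchType { DOUBLE, AGGREGATE_METRIC_DOUBLE }

enum SketchMetric { MIN, MAX, SUM, COUNT }

interface SketchExpr {}

record FieldRef(String name, SketchType type) implements SketchExpr {}

// Plays the role that FromAggregateMetricDouble.withMetric plays in the diff above.
record ExtractMetric(SketchExpr field, SketchMetric metric) implements SketchExpr {}

record SumOf(SketchExpr input) implements SketchExpr {}

record CountOf(FieldRef field) implements SketchExpr {
    // Returns the replacement expression, or null when no rewrite applies (assumption).
    SketchExpr surrogate() {
        if (field.type() == SketchType.AGGREGATE_METRIC_DOUBLE) {
            // Counting pre-aggregated values means adding up the stored counts.
            return new SumOf(new ExtractMetric(field, SketchMetric.COUNT));
        }
        return null;
    }
}

final class SurrogateSketch {
    public static void main(String[] args) {
        FieldRef latency = new FieldRef("latency", SketchType.AGGREGATE_METRIC_DOUBLE);
        System.out.println(new CountOf(latency).surrogate());
    }
}
```

Because the extraction step is an ordinary expression node, exposing it at the language level later would mostly be a matter of registering it as a function, which is the follow-up the reviewers mention.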

@@ -372,7 +372,7 @@ private PhysicalOperation planTopN(TopNExec topNExec, LocalExecutionPlannerConte
case GEO_POINT, CARTESIAN_POINT, GEO_SHAPE, CARTESIAN_SHAPE, COUNTER_LONG, COUNTER_INTEGER, COUNTER_DOUBLE, SOURCE ->
TopNEncoder.DEFAULT_UNSORTABLE;
// unsupported fields are encoded as BytesRef, we'll use the same encoder; all values should be null at this point
- case PARTIAL_AGG, UNSUPPORTED -> TopNEncoder.UNSUPPORTED;
+ case PARTIAL_AGG, UNSUPPORTED, AGGREGATE_METRIC_DOUBLE -> TopNEncoder.UNSUPPORTED;
Member

This means you can't SORT if one of these is used later. That's fine for now, but may not be appropriate later.

Contributor Author

@limotova limotova Jan 29, 2025

Yeah, I think that's something we want to support later on rather than in this PR, and it might require some discussion since it's not clear what exactly we would sort on.

@@ -288,7 +291,7 @@ public AggregateDoubleMetricFieldType(String name) {
}

public AggregateDoubleMetricFieldType(String name, Map<String, String> meta, MetricType metricType) {
- super(name, true, false, false, TextSearchInfo.SIMPLE_MATCH_WITHOUT_TERMS, meta);
+ super(name, true, false, true, TextSearchInfo.SIMPLE_MATCH_WITHOUT_TERMS, meta);
Member

Were we just incorrectly saying "no, this doesn't have doc values" before? That feels silly.

}
}

private void copyDoubleValuesToBuilder(Docs docs, BlockLoader.DoubleBuilder builder, NumericDocValues values)
Member

👍

Member

I suppose you could reuse one of the other builders somehow maybe? Like the one that numerics uses for doubles. It has code just like this, right? Too tricky to share?

Member

@dnhatn dnhatn left a comment

Great work. Thanks Larisa!

Member

@martijnvg martijnvg left a comment

LGTM

private final Double sum;
private final Integer count;

public AggregateMetricDoubleLiteral(Double min, Double max, Double sum, Integer count) {
Member

I think this can be turned into a record?

Contributor Author

Yeah I think it can; I just did that and moved it to be under AggregateMetricDoubleBlockBuilder
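
For readers unfamiliar with the class-to-record conversion suggested here, a minimal sketch follows. Only the four components (min, max, sum, count and their types) come from the constructor shown above; the record name and the demo class are hypothetical, and the final nesting under AggregateMetricDoubleBlockBuilder is not reproduced.

```java
// Sketch of the literal as a record; the name is hypothetical, the components mirror the constructor above.
record AggregateMetricDoubleLiteralSketch(Double min, Double max, Double sum, Integer count) {}

final class LiteralRecordDemo {
    public static void main(String[] args) {
        var a = new AggregateMetricDoubleLiteralSketch(1.0, 5.0, 9.0, 3);
        var b = new AggregateMetricDoubleLiteralSketch(1.0, 5.0, 9.0, 3);
        // Records generate accessors, equals, hashCode, and toString from the components.
        System.out.println(a.equals(b));
        System.out.println(a.count());
    }
}
```

The record form removes the hand-written fields and constructor while keeping value equality, which is convenient for constants compared in tests.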

@limotova limotova merged commit bcd8d15 into elastic:main Jan 29, 2025
16 checks passed
@limotova limotova deleted the add-aggregate-double-metric-aggregates branch January 29, 2025 22:08
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120343

limotova added a commit to limotova/elasticsearch that referenced this pull request Jan 29, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jan 29, 2025
@@ -212,6 +212,9 @@ private static Stream<AggDef> groupingAndNonGrouping(Tuple<Class<?>, Tuple<Strin
if (tuple.v1().isAssignableFrom(Rate.class)) {
// rate doesn't support non-grouping aggregations
return Stream.of(new AggDef(tuple.v1(), tuple.v2().v1(), tuple.v2().v2(), true));
} else if (tuple.v2().v1().equals("AggregateMetricDouble")) {
Contributor

@idegtiarenko idegtiarenko Feb 3, 2025

I am a bit puzzled by this condition.
I think tuple.v2().v1() (related PR: #121542) corresponds to extra configs. I do not think they are ever set to AggregateMetricDouble.
I also do not think we enter this branch in CsvTests.

Could you please help me understand when this is happening?

Contributor

Okay, it looks like we define case AGGREGATE_METRIC_DOUBLE -> "AggregateMetricDouble"; below, but I do not think dataTypeToString is called when building AggDef here

Contributor Author

You're correct, we don't enter this branch. This PR was a 2nd/3rd iteration on adding aggregate metric double to ES|QL; the condition was necessary in an older iteration and I mistakenly left it in. I plan to remove it in the next phase/PR.

Contributor

Thank you!

Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for non-grouping aggregations on aggregate_metric_double field in es|ql
6 participants