@youngxinler
Contributor

This PR:

  1. Adds a "some_double_nans" field to TestMetricsRowGroupFilter that contains NaN as well as non-NaN values, for coverage.
  2. Changes the "double" and "float" fields in TestMetricsRowGroupFilterTypes to include some NaN values, for coverage.

For TestMetricsRowGroupFilterTypes, the choice was between adding new fields that contain NaN and adding NaN to the existing fields. I think the latter is better, because every field type in this class is unique.

#6518

@youngxinler
Contributor Author

@RussellSpitzer could you help review this? It improves the test coverage of the RowGroupFilter code for NaN values.



@Test
public void testDoubleWithNan() {
Member

For this and the Float test we probably want some of the negative tests as well, but they would only work in ORC. So I think we can just add an "if (orc) { negative cases }" block.

Contributor Author

@youngxinler Jan 17, 2023

Thanks for the suggestion, @RussellSpitzer. When I ran the tests as you suggested, I found that ORC also can't skip the read, so I think the negative test applies to both Parquet and ORC?

Regarding #6517 and its comment "// Only ORC should be able to distinguish using min/max when NaN is present", I added these new tests for the ORC skip case:

      // just for "format == FileFormat.ORC"
      // record.setField("_some_double_nans", (i % 10 == 0) ? Double.NaN : 2D); // includes some nan values
      shouldRead = shouldRead(equal("some_double_nans", 10.0));
      Assert.assertFalse("Should skip: column with some nans contains target value", shouldRead);

      shouldRead = shouldRead(greaterThan("some_double_nans", 10.0));
      Assert.assertFalse("Should skip: column with some nans contains target value", shouldRead);

About ORC NaN pushdown: apache/orc#1077 is meant to avoid ORC pushdown when a float column contains NaN.
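Why min/max statistics become unreliable once NaN is present can be shown in plain Java. This is a minimal sketch, not Iceberg, Parquet, or ORC code: the class `NanBoundsSketch` and the method `naiveMin` are hypothetical, and it assumes a stats collector that tracks the minimum with a plain `<` comparison. Every ordered comparison against NaN is false, so the resulting bound depends on where the NaN happens to sit in the data.

```java
final class NanBoundsSketch {
  // Hypothetical collector: tracks the minimum with a plain "<" comparison.
  static double naiveMin(double[] values) {
    double min = values[0];
    for (double v : values) {
      if (v < min) { // always false when either side is NaN
        min = v;
      }
    }
    return min;
  }

  public static void main(String[] args) {
    double[] nanFirst = {Double.NaN, 2.0D, 3.0D};
    double[] nanLater = {2.0D, Double.NaN, 3.0D};

    System.out.println(naiveMin(nanFirst)); // NaN: the bound is stuck at NaN
    System.out.println(naiveMin(nanLater)); // 2.0: the NaN is silently ignored

    // Double.compare imposes a total order in which NaN sorts above +Infinity.
    System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0); // true
  }
}
```

Either outcome is a problem for a filter: a NaN bound carries no range information, and a bound that ignores NaN says nothing about whether NaN rows exist. A filter that cannot tell these cases apart has to read the row group.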

@youngxinler
Contributor Author

@RussellSpitzer could I trouble you for a review when you have time? This improves the test coverage of the RowGroupFilter code for NaN values. I have made the requested changes.

- record.setField("_float", ((float) (100 - i)) / 100F + 1.0F); // 2.0f, 1.99f, 1.98f, ...
- record.setField("_double", ((double) i) / 100.0D + 2.0D); // 2.0d, 2.01d, 2.02d, ...
+ record.setField("_float", (i % 10 == 0) ? Float.NaN : ((float) (100 - i)) / 100F + 1.0F); // 2.0f, 1.99f, 1.98f, ... OR NAN
+ record.setField("_double", (i % 10 == 0) ? Double.NaN : ((double) i) / 100.0D + 2.0D); // 2.0d, 2.01d, 2.02d, ... OR NAN
Member

We probably shouldn't do that here. We can do negative tests as long as there aren't NaNs, correct?

Member

In testEq() it checks that the first parameterized value is read and the second value is not read. So if there are NaNs this should fail; if it doesn't fail with this code change, let's figure out why.
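The expectation described here can be sketched outside the test suite. The following is an illustration under stated assumptions, not the actual test or filter code: `EqSkipSketch`, `column`, `shouldReadEq`, the record count `ROWS`, and the values chosen for `readValue` and `skipValue` are all hypothetical. It assumes bounds are computed with `Math.min`/`Math.max` (which return NaN if either argument is NaN) and that a filter must read whenever a bound is NaN.

```java
final class EqSkipSketch {
  static final int ROWS = 100; // assumed record count, not taken from the PR

  // Generates the "_double" column, with or without the NaN change from the thread.
  static double[] column(boolean withNans) {
    double[] values = new double[ROWS];
    for (int i = 0; i < ROWS; i++) {
      values[i] = (withNans && i % 10 == 0) ? Double.NaN : ((double) i) / 100.0D + 2.0D;
    }
    return values;
  }

  // Conservative eq check: NaN bounds carry no range information, so the group must be read.
  static boolean shouldReadEq(double[] values, double target) {
    double lower = Double.POSITIVE_INFINITY;
    double upper = Double.NEGATIVE_INFINITY;
    for (double v : values) {
      lower = Math.min(lower, v); // becomes NaN as soon as one value is NaN
      upper = Math.max(upper, v);
    }
    if (Double.isNaN(lower) || Double.isNaN(upper)) {
      return true;
    }
    return target >= lower && target <= upper;
  }

  public static void main(String[] args) {
    double readValue = 2.5D; // hypothetical value inside the generated range
    double skipValue = 5.0D; // hypothetical value outside the generated range

    System.out.println(shouldReadEq(column(false), readValue)); // true
    System.out.println(shouldReadEq(column(false), skipValue)); // false: can skip
    System.out.println(shouldReadEq(column(true), skipValue));  // true: NaN blocks the skip
  }
}
```

Under these assumptions, an assertion that the second value is not read holds for the NaN-free column and fails once NaN is mixed in, which is the behavior the comment expects to observe.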

}

@Test
public void testDoubleWithNan() {
Member

+1. If ORC doesn't support it either, this is OK with me.

private final Object readValue;
private final Object skipValue;
// float or double columns that contain NaN can't be skipped via pushdown
private final boolean unableSkip;
Member

Rather than modifying the whole test, I think we should just make special cases for the float and double with NaNs. I actually think this is covered fine in the other MetricRowGroupFilter test, so we can probably just leave this file alone.

@youngxinler force-pushed the ImproveTestCoverageOfRowGroupFilter branch from fd6f081 to 82fe6d3 on January 26, 2023 11:08
@youngxinler
Contributor Author

Please take a look when you have time, @RussellSpitzer.

import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

import static org.apache.iceberg.avro.AvroSchemaUtil.convert;
Member

Why did all these imports move?

Member

This breaks our checkstyle checks. Please run all relevant tests and check ./gradlew build -x test -x integrationTest before asking for review.

@RussellSpitzer merged commit 505368a into apache:master on Feb 8, 2023
@RussellSpitzer
Member

Thanks for the PR, @youngxinler!

@youngxinler
Contributor Author

Thanks for the review and guidance throughout the process, @RussellSpitzer.
