Parquet: Improve Test Coverage of RowGroupFilter Code with Nans #6518 #6554
@RussellSpitzer, could you help review this? It improves test coverage of the RowGroupFilter code with NaNs.
@Test
public void testDoubleWithNan() {
For this and the Float test we probably want some of the negative tests as well, but they would only work in ORC. So I think we should just add an "if (orc) { negative cases }" block.
Thanks for the suggestion, @RussellSpitzer. When I ran the test as you suggested, I found that ORC can't skip the read either, so I think the negative tests work for both Parquet and ORC?
As for #6517 ("// Only ORC should be able to distinguish using min/max when NaN is present"), I added these new tests for the ORC skip case:
// just for "format == FileFormat.ORC"
// record.setField("_some_double_nans", (i % 10 == 0) ? Double.NaN : 2D); // includes some nan values
shouldRead = shouldRead(equal("some_double_nans", 10.0));
Assert.assertFalse("Should skip: column with some nans contains target value", shouldRead);
shouldRead = shouldRead(greaterThan("some_double_nans", 10.0));
Assert.assertFalse("Should skip: column with some nans contains target value", shouldRead);
Regarding ORC NaN pushdown, see apache/orc#1077, which avoids ORC pushdown if a float column contains NaNs.
@RussellSpitzer, could I trouble you for another review when you have time? I have updated the changes that improve test coverage of the RowGroupFilter code with NaNs.
record.setField("_float", ((float) (100 - i)) / 100F + 1.0F); // 2.0f, 1.99f, 1.98f, ...
record.setField("_double", ((double) i) / 100.0D + 2.0D); // 2.0d, 2.01d, 2.02d, ...
record.setField("_float", (i % 10 == 0) ? Float.NaN : ((float) (100 - i)) / 100F + 1.0F); // 2.0f, 1.99f, 1.98f, ... OR NAN
record.setField("_double", (i % 10 == 0) ? Double.NaN : ((double) i) / 100.0D + 2.0D); // 2.0d, 2.01d, 2.02d, ... OR NAN
We probably shouldn't do that here. We can do negative tests as long as there aren't NaNs, correct?
testEq() checks that the first parameterized value is read and that the second value is not read. So if there are NaNs, this should fail. If it doesn't fail with this code change, let's figure out why.
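To make the concern concrete, here is a minimal, self-contained sketch of a min/max-based equality filter that refuses to skip when the column contains NaN. It is not Iceberg's actual filter code: the class `NanStatsSketch`, its `Stats` holder, and the methods `collect`, `shouldReadEq`, and `sampleColumn` are hypothetical names invented for illustration, and the rule "any NaN means always read" is an assumption drawn from this discussion. The sample data mirrors the `_double` field above.

```java
/** Hypothetical sketch for illustration; not Iceberg's row group filter. */
final class NanStatsSketch {

  /** Minimal per-column statistics (illustrative names only). */
  static final class Stats {
    final double min;
    final double max;
    final boolean hasNaN;

    Stats(double min, double max, boolean hasNaN) {
      this.min = min;
      this.max = max;
      this.hasNaN = hasNaN;
    }
  }

  /** Builds 2.0, 2.01, 2.02, ... optionally replacing every 10th value with NaN. */
  static double[] sampleColumn(boolean withNaN) {
    double[] values = new double[100];
    for (int i = 0; i < values.length; i++) {
      values[i] = (withNaN && i % 10 == 0) ? Double.NaN : ((double) i) / 100.0D + 2.0D;
    }
    return values;
  }

  /** Collects min/max over non-NaN values and records whether any NaN was seen. */
  static Stats collect(double[] values) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    boolean hasNaN = false;
    for (double v : values) {
      if (Double.isNaN(v)) {
        hasNaN = true;
        continue;
      }
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    return new Stats(min, max, hasNaN);
  }

  /** Returns false only when the stats prove no row can equal the target. */
  static boolean shouldReadEq(Stats stats, double target) {
    if (stats.hasNaN) {
      // Assumption: bounds are not trusted once NaN is present, so never skip.
      return true;
    }
    return target >= stats.min && target <= stats.max;
  }

  public static void main(String[] args) {
    Stats clean = collect(sampleColumn(false));
    Stats withNaN = collect(sampleColumn(true));
    System.out.println("clean, eq 2.5:  " + shouldReadEq(clean, 2.5));
    System.out.println("clean, eq 10.0: " + shouldReadEq(clean, 10.0));
    System.out.println("NaNs,  eq 10.0: " + shouldReadEq(withNaN, 10.0));
  }
}
```

Under these assumptions, a "should skip" assertion such as the one testEq() makes for its second parameterized value holds for the clean column but not for the column with NaNs, which is why the existing test would be expected to fail once NaNs are added to the shared fields.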
}

@Test
public void testDoubleWithNan() {
+1. If ORC doesn't support it either, this is OK with me.
private final Object readValue;
private final Object skipValue;
// float or double type contain nans can't pushDown to skip
private final boolean unableSkip;
Rather than modifying the whole test, I think we could just add special cases for float and double with NaNs. Actually, I think this is covered fine in the other MetricRowGroupFilter test, so we can probably just leave this file alone.
fd6f081 to 82fe6d3
@RussellSpitzer, please take a look when you have time.
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

import static org.apache.iceberg.avro.AvroSchemaUtil.convert;
Why did all these imports move?
This breaks our checkstyle rules. Please run all relevant tests and check ./gradlew build -x test -x integrationTest before asking for review.
Thanks for the PR, @youngxinler!
Thanks for the review and for the guidance throughout the process, @RussellSpitzer.

This PR:
TestMetricsRowGroupFilterTypes. As for whether to add new fields containing NaN or to add NaN to existing fields, I think the latter is better, because every field type in this class is unique.
#6518