Conversation

@szehon-ho
Member

Some minor cleanup discussed in #7539 (comment).

This tries to align the readable_metrics code of BaseEntriesTable and BaseFilesTable by extracting a common class and making the two code paths more similar.

  if (partitionType.fields().size() < 1) {
- // avoid returning an empty struct, which is not always supported. instead, drop the partition
- // field (id 102)
+ // avoid returning an empty struct, which is not always supported.
Member Author

This is unrelated, but it fixes long-standing uneven line breaks introduced by the Spotless refactor.

- * file projection
- *
- * @return file projection with required columns to read readable metrics
+ * file projection.
Member Author

I think the @return tag does not convey additional information, so I removed it in favor of the method comment, for brevity.

StructProjection structProjection = structProjection(projection);
return CloseableIterable.transform(entryAsStruct, structProjection::wrap);

return CloseableIterable.transform(
Member Author

I was trying to make both branches more alike:

  1. Calculate the final struct projection
  2. Calculate the 'file' projection, if needed for reading the manifest
  3. Use these to transform the result
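
The shape described in the list above can be sketched in a self-contained way. Everything in this sketch is a hypothetical stand-in rather than Iceberg API: rows are plain maps, projections are lists of column names, and the class, method, and metric column names (`ProjectionSketch`, `fileProjection`, `lower_bounds`, `upper_bounds`) are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-ins, not Iceberg classes: a row is a map from column name
// to value, and a projection is an ordered list of column names.
final class ProjectionSketch {
  static final String READABLE_METRICS = "readable_metrics";
  // Assumed source columns that the derived column is computed from.
  static final List<String> METRIC_COLUMNS = List.of("lower_bounds", "upper_bounds");

  // Step 1: the final struct projection keeps exactly the requested columns, in order.
  static Function<Map<String, Object>, List<Object>> structProjection(List<String> requested) {
    return row -> requested.stream().map(row::get).collect(Collectors.toList());
  }

  // Step 2: the projection used for reading drops the derived column and adds
  // the source columns it needs, but only when the derived column is requested.
  static List<String> fileProjection(List<String> requested) {
    List<String> columns = new ArrayList<>(requested);
    if (columns.remove(READABLE_METRICS)) {
      for (String metric : METRIC_COLUMNS) {
        if (!columns.contains(metric)) {
          columns.add(metric);
        }
      }
    }
    return columns;
  }

  // Step 3: read with the file projection, derive the extra column, then apply
  // the final struct projection.
  static List<List<Object>> rows(List<Map<String, Object>> source, List<String> requested) {
    Function<Map<String, Object>, List<Object>> project = structProjection(requested);
    List<String> toRead = fileProjection(requested);
    boolean needsMetrics = requested.contains(READABLE_METRICS);

    List<List<Object>> result = new ArrayList<>();
    for (Map<String, Object> sourceRow : source) {
      Map<String, Object> row = new LinkedHashMap<>();
      for (String column : toRead) {
        row.put(column, sourceRow.get(column));
      }
      if (needsMetrics) {
        // Toy "readable" value: a string built from the two bound columns.
        row.put(READABLE_METRICS, row.get("lower_bounds") + ".." + row.get("upper_bounds"));
      }
      result.add(project.apply(row));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> file = new LinkedHashMap<>();
    file.put("file_path", "a.parquet");
    file.put("record_count", 3L);
    file.put("lower_bounds", 1);
    file.put("upper_bounds", 9);
    System.out.println(rows(List.of(file), List.of("file_path", "readable_metrics")));
  }
}
```

Both the branch with the derived column and the branch without it go through the same three calls; the only difference is whether `fileProjection` has to widen the set of columns that are read.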

@szehon-ho force-pushed the readable_metrics_refactor branch from 5de22b6 to 141c9ab on May 15, 2023 21:06

  private CloseableIterable<? extends ManifestEntry<? extends ContentFile<?>>> entries(
- Schema newFileProjection) {
- return ManifestFiles.open(manifest, io, specsById).project(newFileProjection).entries();
+ Schema fileStructProjection) {
Member Author

Made this method a bit more functional (its input and output are clearer).
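
As a rough illustration of what "more functional" means here, the following sketch passes the projection in as an argument and returns the result, instead of relying on instance state for the projection. The class `EntriesSketch` and its list-of-strings model are hypothetical stand-ins, not the actual manifest reader.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a manifest reader; not Iceberg API.
final class EntriesSketch {
  private final List<String> manifestColumns;

  EntriesSketch(List<String> manifestColumns) {
    this.manifestColumns = manifestColumns;
  }

  // The projection is an explicit input and the selected columns are the output.
  // The method does not read or mutate a projection field on the instance, so
  // the same reader can serve different projections.
  List<String> entries(List<String> fileStructProjection) {
    return manifestColumns.stream()
        .filter(fileStructProjection::contains)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    EntriesSketch reader =
        new EntriesSketch(List.of("file_path", "record_count", "lower_bounds"));
    System.out.println(reader.entries(List.of("lower_bounds", "file_path")));
  }
}
```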

@szehon-ho
Member Author

@dramaticlly fyi

Contributor

@dramaticlly left a comment

Thank you @szehon-ho for the refactoring. I think the logic in BaseFiles is now much easier to read.

private StructLike withReadableMetrics(
ContentFile<?> file, Types.NestedField readableMetricsField) {
int metricsPosition = projection.columns().indexOf(readableMetricsField);
int columnCount = projection.columns().size();
Contributor

Maybe we can reuse projectionColumnCount as the variable name above?

Member Author

Yeah, projectionColumnCount does describe what it is, but the way I think about the method withReadableMetrics is oriented more toward its goal, which is making a struct.

So I think it's better to name the variables after the end result: struct, structSize, metricsStruct, metricsPosition

(projectionColumnCount becomes structSize). I changed the references to match. What do you think?

Contributor

Thanks for the explanation. I think structSize sounds great!
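
The goal-oriented naming agreed on in this thread (struct, structSize, metricsStruct, metricsPosition) can be illustrated with a small positional struct. This is a hypothetical sketch, not the implementation in this PR: the class `StructWithMetricsSketch`, its constructor, and the way other positions are shifted around the metrics column are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical sketch, not the Iceberg implementation: a positional struct that
// exposes the projected file fields plus one derived metrics value.
final class StructWithMetricsSketch {
  private final int structSize;         // total number of columns in the struct
  private final int metricsPosition;    // where the metrics column sits in the struct
  private final List<Object> fileFields; // projected file fields, without the metrics column
  private final Object metricsStruct;   // derived readable metrics value

  StructWithMetricsSketch(
      int structSize, int metricsPosition, List<Object> fileFields, Object metricsStruct) {
    if (metricsPosition < 0 || metricsPosition >= structSize) {
      throw new IllegalArgumentException("metricsPosition out of range: " + metricsPosition);
    }
    if (fileFields.size() != structSize - 1) {
      throw new IllegalArgumentException("expected " + (structSize - 1) + " file fields");
    }
    this.structSize = structSize;
    this.metricsPosition = metricsPosition;
    this.fileFields = fileFields;
    this.metricsStruct = metricsStruct;
  }

  int size() {
    return structSize;
  }

  Object get(int pos) {
    if (pos < 0 || pos >= structSize) {
      throw new IndexOutOfBoundsException("position " + pos + " not in [0, " + structSize + ")");
    }
    if (pos == metricsPosition) {
      return metricsStruct;
    }
    // Positions after the metrics column map to the previous index in fileFields.
    return fileFields.get(pos < metricsPosition ? pos : pos - 1);
  }

  public static void main(String[] args) {
    StructWithMetricsSketch struct =
        new StructWithMetricsSketch(3, 1, List.of("a.parquet", 10L), "metrics");
    for (int pos = 0; pos < struct.size(); pos++) {
      System.out.println(pos + " -> " + struct.get(pos));
    }
  }
}
```

Naming the fields after the struct being built keeps `get` readable: every branch is phrased in terms of the struct's size and the metrics column's position in it.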

* ensuring that the underlying metrics used to create that column are part of the final
* projection.
*
* @param projectionSchema projection to transform
Contributor

nit: 'projection to transform' reads a bit weird. The first time I read it, I was not clear on why we need to transform. Maybe 'requested projection' or 'intended projection', which would correspond to the returned actual projection?

Member Author

I like that. Changed to 'requestedProjection'.

@szehon-ho force-pushed the readable_metrics_refactor branch from f071cf5 to 45db9e8 on May 23, 2023 01:31
@szehon-ho force-pushed the readable_metrics_refactor branch from 45db9e8 to be003e9 on May 23, 2023 01:39
@szehon-ho merged commit 7dbdfd3 into apache:master on May 23, 2023

@szehon-ho
Member Author

Merged, as this is just a continuation/code fixup of #7539.
