
Add -Ycollect-statistics for collecting statistics without printing them #10795


Merged: 1 commit merged into scala:2.13.x on Jun 26, 2024


@szeiger (Contributor) commented Jun 14, 2024

This change adds a new flag -Ycollect-statistics that enables the same statistics gathering as -Ystatistics but without dumping all the statistics to the console. This is useful for build tools that want to collect statistics via a compiler plugin without interfering with the console output or the operation of -Ystatistics (if specified explicitly by the user).

Note that there is an internal YstatisticsEnabled setting that may appear to do this already, but in fact it controls both collecting and printing together. Even if you switched it on internally (without enabling any phase statistics via -Ystatistics / -Yhot-statistics) you would still get at least the phase timings summary.
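As a usage sketch only: a build that wants the collected numbers but no console dump might enable the flag through sbt's standard `scalacOptions` key. This assumes sbt and a compiler that includes the flag; version 2.13.15 is taken from the milestone this PR was assigned to. The plugin that actually reads the statistics is not shown.

```scala
// build.sbt (sketch). Assumes the flag is available from Scala 2.13.15,
// the milestone this PR targets.
ThisBuild / scalaVersion := "2.13.15"

// Gather the same statistics as -Ystatistics without printing them;
// a separate compiler plugin (not shown) would read the collected values.
scalacOptions += "-Ycollect-statistics"
```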

I have a 2.12 version ready as well. It's not quite merge-compatible, so it can't be cherry-picked directly.

@scala-jenkins added this to the 2.13.15 milestone on Jun 14, 2024
@lrytz (Member) commented Jun 14, 2024

Maybe we can change -Ystatistics to take an optional argument -Ystatistics:silent?

@szeiger (Contributor, Author) commented Jun 14, 2024

I thought about that, but it's not obvious what the semantics should be, or how a build tool's use of this feature would compose with an explicit -Ystatistics from the user. -Ystatistics takes a list of phases. An empty list implicitly disables statistics output (and gathering), but any valid phase enables statistics gathering for all phases and prints the output both for the specified phase and for the phase timings and statistics-collection overhead in general.
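The composition rules above can be restated as a small model. This is a hypothetical sketch: `StatsFlags` and its members are illustrative names that do not exist in the compiler, phase-name validation is not modeled, and `-Yhot-statistics` is left out because its exact interaction is not spelled out here.

```scala
// Hypothetical model of how -Ystatistics and -Ycollect-statistics compose,
// as described in this thread. Not the compiler's actual settings code.
final case class StatsFlags(
    ystatisticsPhases: List[String], // phases passed to -Ystatistics (Nil = flag absent)
    ycollectStatistics: Boolean      // -Ycollect-statistics, added by this PR
) {
  // Gathering is on if the user named any phase or a build tool asked to
  // collect. Phase names are assumed valid; this model does not check them.
  def collecting: Boolean = ystatisticsPhases.nonEmpty || ycollectStatistics

  // The phase-timings and collection-overhead summary is printed only when
  // -Ystatistics names at least one phase; -Ycollect-statistics never prints.
  def printsSummary: Boolean = ystatisticsPhases.nonEmpty

  // Detailed statistics are printed only for the phases the user named.
  def printsPhase(phase: String): Boolean = ystatisticsPhases.contains(phase)
}
```

In this model, `StatsFlags(Nil, ycollectStatistics = true)` collects but prints nothing, so a user's explicit `-Ystatistics:typer` alone still decides what reaches the console.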

@lrytz (Member) commented Jun 17, 2024

That makes sense, thanks. Looks good to me, @som-snytt maybe you have some more thoughts?

@som-snytt (Contributor) commented Jun 21, 2024

Maybe ideally -Ystatistics to collect and -Vstatistics to print. -Y has no compatibility guarantee, but maybe someone's build would break, so it's nicer not to.

-Y for behavior, -V for output.

At this late date, whatever keeps the ecosystem on life support.

> Collect cold statistics

All statistics are cold.

Oh, I suppose that when they are represented in a visual form that roils the emotions, they become hot. Also, "damn lies, and statistics".

@szeiger force-pushed the wip/collect-statistics-2.13 branch from 4627ee7 to 61ca27a on June 26, 2024 12:30
@szeiger (Contributor, Author) commented Jun 26, 2024

> Collect cold statistics
>
> All statistics are cold.

These are the statistics that are relatively cheap to collect, as opposed to -Yhot-statistics. I'm trying to get some more reliable data from our build, but so far it looks like even the cold statistics have a significant performance impact and we may want to split them up further. I'd like to collect at least the phase timings in every build (which should not have a noticeable performance impact).

Depending on how the performance analysis goes, I may end up splitting cold statistics in another PR that turns -Ycollect-statistics into a ChoiceSetting to control this.

@lrytz merged commit bea510c into scala:2.13.x on Jun 26, 2024
szeiger added a commit to szeiger/scala that referenced this pull request Jun 26, 2024
Backport of scala#10795.
@szeiger (Contributor, Author) commented Jun 27, 2024

After running some more benchmarking builds (a medium-size build of part of our codebase running in parallel on a 192-core machine), I'm getting an average build time of 770s without statistics and 820s with statistics, a 6.5% increase. This is purely from having cold statistics collecting enabled, without any output. Splitting off a subset of even colder statistics looks worthwhile. I wouldn't want to take a 6.5% performance hit in all our builds.
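For reference, the quoted overhead is the extra time relative to the baseline build without statistics:

```latex
\frac{820\,\mathrm{s} - 770\,\mathrm{s}}{770\,\mathrm{s}} = \frac{50}{770} \approx 0.0649 \approx 6.5\%
```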

@lrytz (Member) commented Jun 27, 2024

@szeiger (Contributor, Author) commented Jun 27, 2024

At first glance it looks like #6234 defeats the point of this trick. Or is it still better when statistics have been enabled but are turned off at a later time? I would expect that never enabling statistics will result in similarly efficient code. I'll separate the phase timings from the rest (using their own AlmostFinal machinery) and run another benchmark with that. I hope that even with an inefficient AlmostFinal the overhead will be close to zero because the timers are only accessed a dozen or so times per build.

szeiger added a commit to szeiger/scala that referenced this pull request Jul 11, 2024
Backport of scala#10795.
@SethTisue added the release-notes label (worth highlighting in next release notes) on Jul 15, 2024
@SethTisue changed the title from "Collect statistics without printing them" to "Add -Ycollect-statistics for collecting statistics without printing them" on Aug 22, 2024
Labels
release-notes worth highlighting in next release notes

5 participants