partial parallelisation of genbcode, and code that it touches #5815
*/
def packageInternalName: String = {
lazy val (packageInternalName:String, simpleName: String) = {
this pattern introduces a third field, scala/scala-dev#308
Now that is a feature I was not aware of. It is reasonably easy to work around, though.
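To illustrate the point raised in this thread, here is a minimal sketch of the tuple-pattern `lazy val` and one possible workaround. All names (`NameSplit`, `NamePartsPattern`, `NamePartsSingle`) are hypothetical and are not taken from the PR. As the linked issue describes, the pattern form keeps the intermediate tuple in an extra synthetic field next to the two named ones; the alternative holds a single lazy tuple and exposes its parts through `def`s.

```scala
// Hypothetical example; none of these names come from the PR itself.
object NameSplit {
  // Splits "scala/collection/Seq" into ("scala/collection", "Seq").
  def split(internalName: String): (String, String) = {
    val idx = internalName.lastIndexOf('/')
    if (idx < 0) ("", internalName)
    else (internalName.substring(0, idx), internalName.substring(idx + 1))
  }
}

// Pattern form: the compiler stores the tuple in a synthetic field
// in addition to the two named lazy vals.
class NamePartsPattern(internalName: String) {
  lazy val (packageInternalName: String, simpleName: String) =
    NameSplit.split(internalName)
}

// Possible workaround: one lazy field holding the tuple, read through defs.
class NamePartsSingle(internalName: String) {
  private lazy val parts: (String, String) = NameSplit.split(internalName)
  def packageInternalName: String = parts._1
  def simpleName: String = parts._2
}
```

Both classes return the same values; the second simply trades two extra fields for two accessor calls.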
@@ -1077,6 +1052,13 @@ abstract class BTypes {
"scala/Null",
"scala/Nothing"
)

def apply(internalName: InternalName) : ClassBType = {
classBTypeFromInternalName.getOrElseUpdate(internalName, new ClassBType(internalName))
We should probably make sure there's a single ClassBType per InternalName, also under concurrent access. I haven't checked in detail if something depends on this assumption.
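One way to get that guarantee, sketched here under assumptions rather than as what the PR does: a plain mutable map's `getOrElseUpdate` is not atomic, so two threads can each construct an instance for the same name. `java.util.concurrent.ConcurrentHashMap.computeIfAbsent` creates the value at most once per key. `ClassBType` below is a simplified stand-in for the real class, and `ClassBTypeCache` is a hypothetical name.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Simplified stand-in for the real ClassBType; the actual class carries far more state.
final class ClassBType(val internalName: String)

// Hypothetical cache object, not part of the PR.
object ClassBTypeCache {
  private val cache = new ConcurrentHashMap[String, ClassBType]()

  // Explicit java.util.function.Function so the sketch does not rely on SAM conversion.
  private val create = new JFunction[String, ClassBType] {
    def apply(name: String): ClassBType = new ClassBType(name)
  }

  // computeIfAbsent invokes `create` at most once per key, so all callers
  // observe the same instance for a given internal name.
  def apply(internalName: String): ClassBType =
    cache.computeIfAbsent(internalName, create)
}
```

Whether identity of `ClassBType` instances actually matters elsewhere in the backend is the open question from the comment; the sketch only shows how the invariant could be kept.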
The above are just two random comments, but before going into more details, let's make a plan for how to get this in.
First, thanks @mkeskells for the PR, this is going to improve compiler performance a lot! I'm a little worried about the change in its current form because of the existing code structure in GenBCode. I would very much like to clean this up before doing such a substantial change and making the code even harder to follow.
The current pattern of splitting up the backend into "components" is a bit of a red herring, because it basically puts everything in a hierarchy of traits, but ultimately everything is bunched into a single class
I'd like to use more composition instead of inheritance, like we already do for
Second, I'd like to separate the parts of the backend that can access
Since you have some deep experience with the backend now, maybe you have other suggestions? I can start working on this refactoring next week.
@lrytz the real expert on this is @retronym, and I would defer to him on the changes and the structure of the files.
I did have an earlier version where I extracted the components into separate files to work on and then attempted to isolate global access, but ran into a few issues.
I do think that there are some bits that we can easily lift (maybe in a separate PR).
Happy to discuss this on a call, or via email, but I think that we need to talk to @retronym.
I also have another change that affects this area, based on @retronym's use of per-run settings. I think that could also be done before considering the restructure, as it is simple point fixes and would be easier to consider now than to track after the rework. It is more CPU and memory reductions.
I will submit this per-run change as a PR on Sunday/Monday if I get the time.
I also note that this PR is showing errors. I hope to look at this in the same timeframe.
/rebuild
@mkeskells I plan to work on this after Copenhagen, I hope to have it done in 3 weeks.
GenBCode has internal phases:
worker1
optimisation
worker2
worker3
This parallelises optimisation (under some circumstances) with worker1.
When worker1 has finished, multiple worker2 and worker3 instances can commence.
The unit of work for the parallelisation is changed to be a source file (was a class).
There is some modification of the I/O patterns:
I/O is very expensive on Windows, so reducing the file operations reduces the stat calls
minor inlining changes to reduce memory usage
partial move to NIO for performance
changes to data structures for thread safety
small changes to the I/O library for type refinement
added a -Y option to enable/disable parallel running
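The staging described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not GenBCode's code: `SourceUnit`, `PipelineSketch`, and the stage bodies are hypothetical placeholders, the optimisation stage is omitted, and the `parallel` flag only stands in for the -Y option. It shows worker1 completing sequentially, after which the later stages run once per source file on a thread pool.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical unit of work: one source file and the classes it defines.
final case class SourceUnit(file: String, classes: List[String])

object PipelineSketch {
  // Placeholder for worker1 (sequential): would build class nodes per source file.
  def worker1(unit: SourceUnit): SourceUnit = unit

  // Placeholders for worker2 and worker3, which run per source file.
  def worker2(unit: SourceUnit): List[String] = unit.classes.map(_ + ".class")
  def worker3(outputs: List[String]): Int = outputs.size // count of files "written"

  def run(units: List[SourceUnit], parallel: Boolean, threads: Int = 4): Int = {
    val generated = units.map(worker1) // worker1 finishes before later stages start
    if (!parallel) generated.map(u => worker3(worker2(u))).sum
    else {
      val pool = Executors.newFixedThreadPool(threads)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
      try {
        // One task per source file, matching the unit of work described above.
        val futures = generated.map(u => Future(worker3(worker2(u))))
        Await.result(Future.sequence(futures), 1.minute).sum
      } finally pool.shutdown()
    }
  }
}
```

Using a source file rather than a class as the task granularity means fewer, larger tasks, which keeps scheduling overhead low at the cost of less even load balancing.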
Running benchmarks on a warmed-up VM, using sbt to compile akka-actor, I get the following times on a quad-core i7 laptop running Windows 10 with an SSD.
For Windows: in summary, this change reduces total compile time by 20%, CPU usage by 10%, and allocation by about 2%.
The Unix results will be posted shortly, but are not expected to be as dramatic.
Variance is based on 60 compile cycles, removing the first 10 as warmup. The compile target is akka-actor.
the tool used to measure these results will be contributed in #5760 and updated in https://github.com/rorygraves/scalac_perf/tree/2.12.x_profile2
post processing the results is via https://github.com/rorygraves/perf_tester
results key
baseline - 2.12.x branch snapped at the end of March
genBcodeBase[Enabled/Disabled] - the parallelization changes (with parallelization enabled/disabled)
genBcodeBase_BT[Enabled/Disabled] - the parallelization changes (with parallelization enabled/disabled), with an optimization for BTypes descriptor generation, which is a separate commit
ALL - summary of 60 cycles of compile
after 10 90% - ignore the first 10 cycles, and the worst 10% of the remainder
after 10 90% JVM, no GC - additionally ignore data outside the jvm/GenBCode phase, and ignore the results when a GC occurred during the jvm phase
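As a rough illustration of the "after 10 90%" row, the following sketch drops the warmup cycles and then the slowest tenth of what remains before averaging. It assumes "worst" means slowest and rounds the trimmed count down; the linked perf_tester tool may compute this differently, and `CycleStats` is a hypothetical name.

```scala
// Hypothetical helper, not taken from perf_tester.
object CycleStats {
  // Mean cycle time after dropping `warmup` leading cycles and the slowest
  // `worstFraction` of the remaining cycles (count rounded down).
  def trimmedMean(timesMs: Seq[Double], warmup: Int = 10, worstFraction: Double = 0.10): Double = {
    val afterWarmup = timesMs.drop(warmup)
    val dropCount = math.floor(afterWarmup.size * worstFraction).toInt
    val kept = afterWarmup.sorted.take(afterWarmup.size - dropCount)
    require(kept.nonEmpty, "no cycles left after trimming")
    kept.sum / kept.size
  }
}
```

With 60 cycles this keeps 45 of the 50 post-warmup measurements.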
notes
This PR builds on the work in #5800, which is withdrawn.
This is a squashed and tidied-up version of that PR.
Results use an i7 Windows 10 SSD quad-core machine with Norton AV (with exclusions around the dev area).
Unix results are not quite as dramatic, and will be added shortly.
Windows results