partial parallelisation of genbcode, and code that it touches #5815


Closed



@mkeskells mkeskells commented Mar 30, 2017

GenBCode has internal phases:
worker1
optimisation
worker2
worker3

This parallelises optimisation (under some circumstances) with worker1; once worker1 has finished, multiple worker2 and worker3 instances can commence.

The unit of work for the parallelisation is changed to be a source file (it was a class).
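The pipelined structure described above can be sketched roughly as follows. All names here are hypothetical and not the PR's actual code: worker1 runs serially (it needs compiler state), and its per-source-file outputs are then processed by worker2/worker3 on a thread pool, with one task per source file.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical sketch of the pipeline: worker1 is serial; worker2/worker3
// then run in parallel, one task per source file (the new unit of work).
object PipelineSketch {
  final case class SourceUnit(name: String, classes: List[String])

  def worker1(u: SourceUnit): List[String] = u.classes.map(c => "tree:" + c)
  def worker2(item: String): String       = "bytes:" + item
  def worker3(item: String): String       = item // e.g. write the class file

  def compile(units: List[SourceUnit]): List[String] = {
    // Stage 1 runs on the calling thread, in order.
    val stage1 = units.map(worker1)
    // Stages 2 and 3 run concurrently; the unit of work is a source file.
    val futures = stage1.map(items => Future(items.map(i => worker3(worker2(i)))))
    futures.flatMap(f => Await.result(f, Duration.Inf))
  }
}
```

This is only a shape: the real backend also overlaps worker1 with the optimiser under some circumstances, which a plain two-stage sketch like this does not show.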

There are some modifications to the I/O patterns.

I/O is very expensive on Windows, so reducing the file operations reduces the stat calls.

Minor inlining changes to reduce memory usage.

Partial move to NIO for performance.

Changes to data structures for thread safety.

Small changes to the IO library for type refinement.

Added a -Y option to enable/disable parallel running.
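A rough illustration of the NIO and stat-reduction points above (hypothetical names, not the PR's actual code): write each class file with a single `Files.write` call, and create parent directories at most once per package, so repeated writes into the same package do not re-stat the directory.

```scala
import java.nio.file.{Files, Path}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch: one Files.write per class file, and parent
// directories created at most once per package to cut down on stat calls.
object ClassfileWriter {
  // Concurrent set so the cache is safe if workers write in parallel.
  private val createdDirs = ConcurrentHashMap.newKeySet[Path]()

  def write(outDir: Path, internalName: String, bytes: Array[Byte]): Path = {
    val target = outDir.resolve(internalName + ".class")
    val parent = target.getParent
    // add() returns true only for the first caller, so createDirectories
    // (and its underlying stats) runs once per package directory.
    if (createdDirs.add(parent)) Files.createDirectories(parent)
    Files.write(target, bytes)
  }
}
```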

Running benchmarks on a warmed-up VM, using sbt to compile akka-actor, I get the following times on a quad-core i7 laptop running Windows 10 with an SSD.
For Windows, in summary, this change reduces total compile time by about 20%, CPU usage by about 10%, and allocation by about 2%.

The Unix results will be posted shortly, but are not expected to be as dramatic.

Variance is based on 60 compile cycles, removing the first 10 as warmup. The compile target is akka-actor.

The tool used to measure these results will be contributed in #5760 and updated in https://github.com/rorygraves/scalac_perf/tree/2.12.x_profile2

Post-processing of the results is via https://github.com/rorygraves/perf_tester

results key
baseline - the 2.12.x branch snapped at the end of March

genBcodeBase[Enabled/Disabled] - the parallelization changes (with parallelization enabled/disabled)

genBcodeBase_BT[Enabled/Disabled] - the parallelization changes (with parallelization enabled/disabled), plus an optimization for BTypes descriptor generation, which is a separate commit

ALL - summary of all 60 compile cycles

after 10 90% - ignore the first 10 cycles, and the worst 10% of the remainder

after 10 90% JVM, no GC - additionally ignore data outside the JVM/GenBCode phase, and ignore results where a GC occurred during that phase

notes

This PR builds on the work in #5800, which is withdrawn; it is a squashed and tidied-up version of that PR.

Results were gathered on a quad-core i7, Windows 10, with SSD, running Norton AV (with exclusions around the dev area).
The Unix results are not quite as dramatic, and will be added shortly.

Windows results

ALL

                  RunName	                AllWallMS	                   CPU_MS	                Allocated
              00_baseline	 10725.42 [+2.87% -0.87%]	 10502.60 [+2.71% -0.88%]	  2799.19 [+1.10% -0.99%]
  01_genBcodeBaseDisabled	  9621.73 [+2.76% -0.88%]	  9470.83 [+2.72% -0.88%]	  2781.74 [+1.11% -1.00%]
       02_genBCodeEnabled	  8979.15 [+3.68% -0.83%]	  9666.15 [+3.15% -0.86%]	  2752.37 [+1.15% -0.99%]
03_genBcodeBaseDisabled_BT	  9549.15 [+2.71% -0.87%]	  9379.17 [+2.65% -0.87%]	  2761.93 [+1.11% -1.00%]
    04_genBCodeEnabled_BT	  8836.47 [+3.48% -0.84%]	  9590.63 [+3.08% -0.87%]	  2748.42 [+1.14% -0.99%]
after 10 90%

                  RunName	                AllWallMS	                   CPU_MS	                Allocated
              00_baseline	  9741.62 [+1.04% -0.96%]	  9611.46 [+1.04% -0.96%]	  2789.80 [+1.00% -1.00%]
  01_genBcodeBaseDisabled	  8815.88 [+1.04% -0.96%]	  8686.11 [+1.03% -0.96%]	  2773.25 [+1.00% -1.00%]
       02_genBCodeEnabled	  7906.04 [+1.05% -0.95%]	  8735.07 [+1.04% -0.95%]	  2740.46 [+1.00% -1.00%]
03_genBcodeBaseDisabled_BT	  8738.31 [+1.05% -0.95%]	  8606.60 [+1.04% -0.95%]	  2751.74 [+1.00% -1.00%]
    04_genBCodeEnabled_BT	  7875.92 [+1.04% -0.95%]	  8707.64 [+1.04% -0.96%]	  2736.24 [+1.00% -1.00%]
after 10 90% JVM, no GC

                  RunName	                AllWallMS	                   CPU_MS	                Allocated
              00_baseline	  2776.93 [+1.09% -0.93%]	  2753.47 [+1.08% -0.93%]	   604.19 [+1.00% -1.00%]
  01_genBcodeBaseDisabled	  1758.43 [+1.05% -0.96%]	  1736.98 [+1.03% -0.94%]	   589.91 [+1.00% -1.00%]
       02_genBCodeEnabled	  1141.89 [+1.08% -0.94%]	  2057.29 [+1.03% -0.94%]	   583.88 [+1.00% -1.00%]
03_genBcodeBaseDisabled_BT	  1685.44 [+1.10% -0.93%]	  1660.39 [+1.11% -0.93%]	   581.33 [+1.00% -1.00%]
    04_genBCodeEnabled_BT	  1108.42 [+1.09% -0.92%]	  2043.40 [+1.04% -0.95%]	   577.50 [+1.00% -1.00%]

def packageInternalName: String = {
lazy val (packageInternalName:String, simpleName: String) = {
Member

this pattern introduces a third field, scala/scala-dev#308

Contributor Author

Now there is a feature that I was not aware of. Reasonably easy to work around it, though.
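One possible shape of such a workaround for scala/scala-dev#308 (illustrative only, not the code in this PR): replace the destructuring `lazy val (a, b) = ...`, which keeps the whole tuple in a hidden third field, with independent lazy vals.

```scala
// scala/scala-dev#308: `lazy val (a, b) = expr` stores the tuple itself in a
// hidden extra field. Sketch of a workaround: two independent lazy vals, so
// only the two String fields are retained. Class and method names are
// hypothetical.
final class InternalNameParts(internalName: String) {
  private def sep = internalName.lastIndexOf('/')

  lazy val packageInternalName: String =
    if (sep < 0) "" else internalName.substring(0, sep)

  lazy val simpleName: String =
    if (sep < 0) internalName else internalName.substring(sep + 1)
}
```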

@@ -1077,6 +1052,13 @@ abstract class BTypes {
"scala/Null",
"scala/Nothing"
)

def apply(internalName: InternalName) : ClassBType = {
classBTypeFromInternalName.getOrElseUpdate(internalName, new ClassBType(internalName))
Member

We should probably make sure there's a single ClassBType per InternalName, also under concurrent access. I haven't checked in detail if something depends on this assumption.
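One way to guarantee a single ClassBType per InternalName under concurrent access (a sketch under that assumption, not the PR's actual code) is to back the cache with a `ConcurrentHashMap` and use `computeIfAbsent`, which is atomic per key, instead of a plain `getOrElseUpdate`:

```scala
import java.util.concurrent.ConcurrentHashMap

// Sketch: computeIfAbsent is atomic per key, so two threads asking for the
// same internal name always observe the same ClassBType instance, unlike a
// non-atomic check-then-insert. ClassBType here is a stand-in, not the real
// backend class.
final class ClassBType(val internalName: String)

object ClassBTypeCache {
  private val cache = new ConcurrentHashMap[String, ClassBType]()

  def apply(internalName: String): ClassBType =
    cache.computeIfAbsent(internalName, n => new ClassBType(n))
}
```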


lrytz commented Mar 31, 2017

The above are just two random comments; before going into more detail, let's make a plan for how to get this in.

First, thanks @mkeskells for the PR, this is going to improve compiler performance a lot!

I'm a little worried about the change in its current form because of the existing code structure in GenBCode. I would very much like to clean this up before doing such a substantial change and making the code even harder to follow.

The current pattern of splitting up the backend into "components" is a bit of a red herring, because it basically puts everything in a hierarchy of traits, but ultimately everything is bunched into a single class GenBCode.

[screenshot: the GenBCode trait/class hierarchy]

I'd like to use more composition instead of inheritance, like we already do for BTypes and the optimizer.

Second, I'd like to separate the parts of the backend that can access global (and therefore Symbol, Type) from the rest, which can be parallelized. Again, we already have this for BTypes and the optimizer.

Since you have some deep experience with the backend now, maybe you have other suggestions?

I can start working on this refactoring next week.
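As a rough illustration of the composition idea (all names here are hypothetical, not the eventual refactoring): the Global-facing part stays on the compiler thread, and hands plain data to a post-processor that workers can use without touching compiler internals.

```scala
// Hypothetical sketch of composition over inheritance in the backend: the
// frontend-facing part may touch Global (Symbols, Types) and runs on the
// compiler thread; the post-processor only sees plain data, so it is safe
// to call from worker threads.
final case class GeneratedClass(internalName: String, bytes: Array[Byte])

trait FrontendAccess {                  // compiler-thread only
  def generate(): List[GeneratedClass]
}

final class PostProcessor {             // composable, parallelizable part
  def process(c: GeneratedClass): Int = c.bytes.length // e.g. write to disk
}
```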

@lrytz lrytz mentioned this pull request Mar 31, 2017
@mkeskells

@lrytz the real expert on this is @retronym, and I would defer to him on the changes and structure of the files.
I did find it hard to navigate when I was doing this work, though.

I did have an earlier version where I extracted the components into separate files to work on and then attempted to isolate global access, but ran into a few issues:

  1. it would make it hard to review the changes
  2. there are lots of things that have access patterns to global, and my drive was for performance rather than structure

I do think that there are some bits that we can easily lift (maybe in a separate PR), e.g. changes to the IO lib, and descriptor generation in BTypes.

Happy to discuss this on a call, or via email, but I think that we need to talk to @retronym; I know that he has other changes in this area.

I also have another change that affects this area, based on @retronym's use of per-run settings. I think that could also be done before considering the restructure, as it is simple point fixes and would be easier to consider now than to track after the rework. It gives further CPU and memory reductions.

I will submit this per-run change as a PR on Sunday/Monday if I get the time.

I also note that this PR is showing errors; I hope to look at that in the same timeframe.

@mkeskells

/rebuild

@mkeskells

Some parts of this have been done in other PRs. The parallelism will be addressed after the refactor that @lrytz is looking to do. @lrytz, what is the timeframe for this to complete?

@mkeskells mkeskells closed this May 24, 2017

lrytz commented May 26, 2017

@mkeskells I plan to work on this after Copenhagen, I hope to have it done in 3 weeks.
