
A sub-millisecond GC for .NET?!

Howdy folks, I wanted to draw your attention to a GitHub discussion over in the .NET runtime repository, where an experimental garbage collector called Satori has emerged that is producing very exciting numbers for those of us in the .NET performance crowd.

Quick Links:

- The discussion thread

- The comment introducing the new GC

- Benchmarks, Benchmarks, Benchmarks

TL;DR / Results

Compared to the traditional Server GC in interactive mode, Satori offers improvements in synthetic benchmarks across several key metrics that range from impressive to downright shocking. How shocking?

- 50x improvement to Median Pause Time

- >100x improvement to 99th percentile pause times

- 3x smaller heap size

So yeah… it is really a stunning development.

I would encourage anyone interested in writing high-performance .NET code to try out Satori on their own workloads to see if it offers any benefits; you can find instructions on how to get it at the end of this article. That feedback can help the folks at Microsoft prioritize investment in this experiment.

What is a garbage collector and why do I care?


Automated garbage collection is a way of managing memory in applications that is used by many (most?) popular languages such as C#/.NET, Java, Go, Ruby, JavaScript, PHP and many more. Notably, it isn’t used much in lower-level languages such as Rust, C/C++ and Zig because those languages need more precise control over the behavior of their applications.

Specifically, when you use a garbage collector, you are giving up control over how your application manages memory; you are also giving up control over how your program is executed in time. Most types of garbage collectors have to freeze your program in place during a phase called “stop the world” in order to walk all of the objects your program has created and see which ones are still reachable. The GC then removes the dead objects and does some other housekeeping, such as rearranging the surviving objects into more compact regions of memory so that your program’s memory usage doesn’t grow out of control over time.
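
To make the “which objects are still reachable” part concrete, here is a minimal C# sketch of my own (not from the article or the Satori thread) that uses a `WeakReference` to watch an object become unreachable and get reclaimed by a forced collection:

```csharp
using System;

// A tiny illustration of liveness: once nothing references an object strongly,
// a collection is free to reclaim it.
var weak = new WeakReference(new byte[1024]); // a weak reference does not keep the array alive

Console.WriteLine(weak.IsAlive); // True: the array has not been collected yet

GC.Collect();                    // force a blocking ("stop the world") collection
GC.WaitForPendingFinalizers();

Console.WriteLine(weak.IsAlive); // almost certainly False: the array was unreachable
```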

As you can imagine, however, stopping everything that your program is doing at seemingly random times in the middle of operations can be a disruptive event, and the longer the “pause”, the more disruptive it can be. This is why lower-level languages generally avoid garbage collection: the pauses are unpredictable by design. Those languages generally require the developer to keep track of the objects that they create and manually delete them from memory when they are done with them. This can be tricky to do correctly, and when a developer forgets to delete an object, a memory leak occurs, which can cause the amount of memory that a program uses to grow until the program crashes.

Garbage-collected languages for the most part don’t have to worry about memory leaks (although other types of resource leaks can be just as bad or worse), and this is a big reason that GCs are so popular. They make programming safer and simpler, and most of the time you don’t have to think about them. They just work… until they don’t.

Large-scale and high-throughput applications have to start taking the behavior of the GC into account because the pauses can become extremely disruptive. Imagine if your favorite app froze for 5 seconds every few minutes! Real-world GC pauses can last this long or even longer if there are large numbers of objects to track or very large amounts of memory to manage. Or sometimes a programmer might accidentally choose an algorithm that allocates objects so quickly that it causes the GC to “thrash”. In these cases the amount of time that the GC spends cleaning up memory can actually exceed the time spent running the application’s actual code! So a lot of care and optimization goes into making GCs as fast and efficient as possible.
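
If you want to see that effect on your own machine, here is a small sketch of my own (not from the article) that hammers the allocator and then reports how much time the runtime spent paused; it uses standard APIs such as `GC.CollectionCount`, `GC.GetTotalAllocatedBytes` and `GC.GetTotalPauseDuration` (the last one requires .NET 7 or later):

```csharp
using System;
using System.Diagnostics;

// An allocation-heavy loop that puts pressure on the GC, followed by a report
// of how much of the wall-clock time went to GC pauses.
var sw = Stopwatch.StartNew();
long checksum = 0;

for (int i = 0; i < 5_000_000; i++)
{
    // Each iteration allocates a short-lived array; in a tight loop this kind
    // of pattern is exactly what drives a GC to "thrash".
    var temp = new int[128];
    temp[0] = i;
    checksum += temp[0];
}

sw.Stop();

Console.WriteLine($"Elapsed:      {sw.Elapsed.TotalMilliseconds:N0} ms (checksum {checksum})");
Console.WriteLine($"Gen0/1/2 GCs: {GC.CollectionCount(0)}/{GC.CollectionCount(1)}/{GC.CollectionCount(2)}");
Console.WriteLine($"Allocated:    {GC.GetTotalAllocatedBytes() / (1024 * 1024):N0} MB");
Console.WriteLine($"Time paused:  {GC.GetTotalPauseDuration().TotalMilliseconds:N0} ms");
```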

Some history of the .NET GC


The .NET garbage collector has a long history and has evolved quite a bit over the years. Long ago, the Workstation GC was created; it was essentially intended for desktop UI applications. It is more or less single-threaded and only uses a single heap to store managed objects.

As the demands of .NET applications grew, a new “server” garbage collector emerged that was designed to maximize application throughput at the expense of longer pause times. It is generally more efficient for a GC to run a few larger collections rather than many smaller ones. The first big difference was that the server GC used multiple heaps (generally one per CPU core) to store objects. This makes it easier for the .NET runtime to allocate and manage objects when running on very large machines with lots of CPU cores. Another difference was that the server GC would, by design, use a lot more of the computer’s memory. This was because the GC was trying to minimize the number of pauses by letting more objects build up over time and then doing a big collection of all of them.

As .NET matured, more advanced features arrived, such as the option to use Concurrent garbage collection in both workstation and server mode. This allowed parts of the collection to run alongside the application and helped keep pause times smaller. Concurrent garbage collection was replaced with Background garbage collection in later versions of .NET and helped the garbage collector scale even better over time.

Even more innovations and features have been added over the years, largely driven by Maoni Stephens, who has been an incredible resource through her blog posts. She and the team have added more features in recent years, including a major one called DATAS, which trades a little bit of application throughput for dramatically smaller heap sizes when using the Server GC.

But quietly, while this was happening in .NET, other ecosystems were also delivering innovations and improvements. Java has a vibrant ecosystem where developers have the option of swapping out their garbage collector completely rather than just changing a few options. And then Go made a quantum leap that made everyone pay attention. Advances in the Go garbage collector made pause times lightning fast, often less than 1 millisecond, which makes them pretty much invisible during request processing. This immediately made developers in other language ecosystems jealous, and the comparisons began pouring in.

As people drew comparisons between .NET and Go, the important thing to note is that although Go had smaller pause durations, .NET offered superior throughput. This was an intentional tradeoff made by the .NET team and has worked fairly well over the years. But pauses can still be painful, especially when everything else in .NET has become dramatically faster in the last 10 years. A web request that used to take 250 milliseconds might now take only 10 milliseconds, and being interrupted by a GC pause, even a short one, shows up a lot more on a monitoring dashboard than it used to.

So what is going on now?


For the last 10 or 15 years, when people would ask about alternative garbage collectors for .NET, the runtime team has patiently explained each time that .NET supports some more exotic features (such as interior pointers) that some other ecosystems (like Java) do not. They would also explain the tradeoffs and send the commenter on their way. Which is why, when I saw yet another thread asking about a pauseless GC in .NET, I hopped in to provide the standard answer.

Now skip forward two years. The thread had been untouched for 8 months when, all of a sudden, .NET runtime engineer Vladimir Sadov popped up and floated the idea that a GC could keep pauses down to 2-3 milliseconds, but that was just theoretical. Until it wasn’t. Vladimir has apparently been sitting on an experimental fork of the runtime that provides an alternative garbage collector (called Satori) that completely dominates the existing server and workstation GCs on pause latency, even with all the new innovations over the years.

This is exciting for me as a .NET developer who works on financial processing systems where latency is always a concern. The idea of having my cake and eating it too, getting both high throughput and low pause times, is really enticing, and the results have been nothing but encouraging so far.

How does it perform?


There is a drop in allocation throughput of about 15-20%, but the improvement in pause time places Satori amongst the lowest pause times in the industry. To pile on the wins, Satori keeps the heap much smaller than the server GC. This has a significant impact on benchmark performance. For example, in a 30 second synthetic benchmark, the existing server GC consumed about 2.6 seconds of that time (about 8%); Satori only needed 156 milliseconds (about 0.5%). That is real time given back to the application.
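
If you want to gather the same kind of numbers for your own workload, the runtime already exposes the raw data. Here is a minimal sketch of my own (standard APIs, not the benchmark harness from the thread) that reports heap size and pause statistics; `GC.GetGCMemoryInfo` is available since .NET 5 and `GC.GetTotalPauseDuration` since .NET 7:

```csharp
using System;

// Run this after (or periodically during) your workload to approximate the
// "GC Time %" and pause figures shown in the table below.
var info = GC.GetGCMemoryInfo();
var recentPauses = info.PauseDurations.ToArray(); // pauses of the most recent GC

Console.WriteLine($"Heap size:     {info.HeapSizeBytes / (1024.0 * 1024.0):N1} MB");
Console.WriteLine($"Pause time %:  {info.PauseTimePercentage:N2}");  // percentage of time paused, as recorded by the GC
Console.WriteLine($"Total paused:  {GC.GetTotalPauseDuration().TotalMilliseconds:N0} ms");
Console.WriteLine($"Recent pauses: {string.Join(", ", recentPauses)}");
```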

Just take a look at this table from the GC stress benchmark:

| Mode | GC Count | GC Time % | Allocation Rate MB/s | P50 | p90 | p99 | pMax |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| workstation-batch | 38 | 88.42 | 39.1 | 971.571 | 1015.808 | 1211.597 | 1211.569 |
| workstation-interactive | 36 | 88.04 | 39.19 | 997.785 | 1037.926 | 2351.104 | 2351.104 |
| workstation-lowlatency | 2622 | 95.96 | 46.63 | 0.042 | 0.064 | 421.069 | 1131.315 |
| workstation-sustainedlowlatency | 39 | 88.69 | 37.92 | 985.497 | 1042.841 | 2156.134 | 2156.134 |
| server-batch | 19 | 10.61 | 172.7 | 157.594 | 495.616 | 495.616 | 495.616 |
| server-interactive | 20 | 11.03 | 174.46 | 148.48 | 153.6 | 772.915 | 772.915 |
| server-sustained-lowlatency | 19 | 11.23 | 172.91 | 165.888 | 801.178 | 801.178 | 801.178 |
| server-batch-datas | 78 | 42.55 | 112.94 | 154.522 | 171.622 | 491.52 | 1124.762 |
| server-interactive-datas | 49 | 96.38 | 23.61 | 1073.971 | 1116.57 | 1143.603 | 1143.603 |
| server-sustained-lowlatency-datas | 46 | 96.45 | 22.96 | 1102.643 | 1154.253 | 1171.456 | 1171.456 |
| satori-interactive | 21 | N/A | 144.75 | 0.203 | 31.166 | 27.853 | 27.853 |
| satori-lowlatency | 21 | N/A | 147.62 | 0.143 | 0.192 | 5.491 | 5.491 |

The numbers ultimately speak for themselves. In all benchmarks run so far, Satori offers sub-millisecond pause times at the 90th percentile, and at the 99th percentile in many. Max pause times improve on the existing server GC by anywhere from 20x to 100x or more in some cases. In case you’re wondering how the workstation GC (which was designed for low-latency workloads) holds up, it would probably need to be rearchitected to even compete fairly; its single-threaded design leaves it far slower and more temperamental than the other options.
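
A quick aside on the mode names in the table: the workstation/server split is chosen at process start (for example via the `ServerGarbageCollection` project property or the `DOTNET_gcServer` environment variable), and I assume the batch/interactive/lowlatency/sustainedlowlatency suffixes correspond to the standard `GCLatencyMode` values, which you can inspect or set from plain C#:

```csharp
using System;
using System.Runtime;

// Which GC flavor is this process running, and with which latency mode?
Console.WriteLine($"Server GC:    {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}"); // Batch, Interactive, LowLatency, SustainedLowLatency

// The latency mode can be changed at runtime, e.g. for a latency-sensitive phase.
GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
```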

So how do I get it?


It is actually pretty simple.

1. You need to build your app targeting .NET 8.0.

2. You need to publish your app in self-contained mode.
   1. `dotnet publish --self-contained -c Release -o .\pub`

3. Once published, you need to copy two modified DLLs into your publish folder. You can then run your application without any other changes (see the sketch after this list for a quick way to confirm what your app is running on).
   1. I have provided these for Windows here. (I’m working on building them for Linux…)
   2. Or, if you don’t want to download random DLLs off the internet, you can clone the Satori fork of the .NET runtime and run `build.cmd clr -c Release`.
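
Once it is running from the publish folder, there is (as far as I know) no switch that announces “Satori” by name, but a quick sanity check of my own can at least confirm that the app is using the self-contained runtime you just patched rather than a shared install, and show which GC flavor is active:

```csharp
using System;
using System.Runtime;
using System.Runtime.InteropServices;

// Print where the runtime was loaded from and which GC flavor is in use.
// Run it from the publish folder before and after copying in the modified DLLs.
Console.WriteLine($"Framework:   {RuntimeInformation.FrameworkDescription}");
Console.WriteLine($"Runtime dir: {RuntimeEnvironment.GetRuntimeDirectory()}");
Console.WriteLine($"Server GC:   {GCSettings.IsServerGC}");
Console.WriteLine($"Latency:     {GCSettings.LatencyMode}");
```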

You can find the original instructions from Vladimir here.

Try it out, see how it performs for you, and report back to the thread if you find the results compelling. I hope you’ve found this article interesting!
