WP Understanding Java Garbage Collection 20170110
Understanding Java Garbage Collection
And what you can do about it
Introduction
The Java programming language utilizes a managed
runtime (the Java Virtual Machine, or JVM) to improve
developer productivity and provide cross-platform
portability. Because different operating systems and
hardware platforms vary in the ways that they manage
memory, the JVM performs this function for the developer,
allocating memory as objects are created and freeing
it when they are no longer used. This process of freeing
unused memory is called ‘garbage collection’ (GC), and
is performed by the JVM on the memory heap during
application execution.

Java garbage collection can have a big impact on
application performance and throughput. As the JVM
heap size grows, so does the amount of time that an
application must pause to allow the JVM to perform GC.
The result can be long, unexpected pauses that can
delay transactions, deteriorate application throughput,
cause user-session time-outs, force nodes to fall out of
clusters, or result in even more severe business-related
losses (e.g. drop in revenue or damage to reputation).

This paper explains in more detail how garbage
collection works, the different algorithm types employed
by commercially available JVMs, and how developers
and architects can make better informed decisions on
which garbage collector to use and how to maximize
application performance.

Executive Summary
Garbage Collection (GC) is an integral part of application
behavior on Java platforms, yet it is often misunderstood.
Java developers need to understand how GC works and
how the actions they can take in selecting and tuning
collector mechanisms, as well as in application
architecture choices, can affect runtime performance,
scalability and reliability.

This white paper reviews and classifies the various
garbage collectors and collection techniques available in
JVMs today. This paper provides an overview of common
garbage collection techniques and algorithms, and defines
terms and metrics common to all collectors, including:
• Generational
• Parallel
• Stop-the-world
• Incremental
• Concurrent
• Mostly-concurrent

The paper classifies each major JVM collector’s
mechanisms and characteristics and discusses the trade-offs
involved in balancing requirements for responsiveness,
throughput, space, and available memory across varying
scale levels. The paper concludes with some pitfalls,
common misconceptions, and “myths” around garbage
collection.

TABLE OF CONTENTS
Why Care About the Java Garbage Collector?
Classifying the Collector
Steps in Garbage Collection
Mark
Sweep
Compact
Types of Collectors
Generational Collectors
Remembered Set
Commercial Implementations
What Developers and Architects Can Do
Garbage Collection Metrics
The Need for Empty Memory for GC
GC Strategy: Delaying the Inevitable
Choosing a Garbage Collector
Oracle HotSpot ParallelGC
Oracle HotSpot CMS
Oracle HotSpot G1GC (Garbage First)
Azul Systems Zing C4
GC Tuning Observations
Summary
About Azul Systems
Appendix A: Garbage Collection Terminology

Why Care About the Java Garbage Collector?
Overall, garbage collection is much better and more
efficient than you might think. It’s much faster than
malloc() at allocating memory, and dead objects cost
nothing to collect (really!). GC will find all the dead
objects, even in cyclic graphs, without any assistance
from the developer. But in many ways garbage collection
is much more insidious than many developers and
architects realize.

For most collectors, GC-related pauses are proportional
to the size of their heaps, at approximately 1 second
for each gigabyte of live objects. So, a larger heap
(which can be advantageous for most apps) means a
longer pause. Worse yet, if you run a 20 minute test
and tune until all the pauses go away, the likelihood is
that you’ve simply moved the pause to the 21st minute.
So unfortunately, the pause will still happen and your
application will suffer. In addition, the presence of
garbage collection doesn’t eliminate object leaks; the
developer still has to find and fix references holding
those leaked objects.

The good news is Java does provide some level of GC
control. Developers and architects can make decisions
that change application performance through their effect
on the behavior of the garbage collector. For example, in
C++ it makes sense to null every reference field when it’s
no longer needed. However, in a Java program, coding in
nullifiers everywhere is disastrous and far worse than
coding in nothing. If every single class uses a finalizer to
null reference fields, the garbage collector will potentially
have to perform millions of object finalizations per GC
cycle, leading to very long garbage collection pauses.

Trying to solve garbage collection at the application
programming layer is dangerous. It takes a lot of practice
and understanding to get it right; time that could be better
spent building value-added features. And, even if you
make all the right decisions, it is likely that other code
your application leverages will not be optimized or that the
application workload will change over time, and your
application will still have GC-related performance issues.

Also, depending on the characteristics of your application,
choosing the wrong garbage collector type or using
the wrong settings can greatly increase pause times or
even cause out-of-memory crashes. With a proper
understanding of garbage collection and what your available
options are, you can make better informed decisions and
product choices that can improve the performance and
reliability of your application at runtime.

Classifying the Collector
Garbage collectors are divided into several types. For
each type some collectors are categorized as ‘mostly’,
as in ‘mostly concurrent’. This means that the collector
sometimes doesn’t operate according to that classification
and has a fallback mechanism for when that occurs. So, a
‘mostly concurrent’ collector may operate concurrently
with application execution and only occasionally
stop-the-world if needed.

Concurrent collector – performs garbage collection
concurrently while application execution continues.

Parallel collector – uses multiple threads. A collector
can be concurrent but not parallel, and it can be
concurrent AND parallel. (Side note: be cautious when
researching older literature on garbage collection, since
what we used to call parallel is now called concurrent.)

Stop-the-world (STW) – the opposite of concurrent.
It performs garbage collection while the application is
completely stopped.

Incremental – performs garbage collection as a series
of smaller increments with potentially long gaps in
between. The application is stopped during garbage
collection but runs in between increments.

Moving – the collector moves objects during garbage
collection and has to update references to those live
objects.

Conservative – most non-managed runtimes are
conservative. In this model, the collector is unsure of
whether a field is a reference or not, so it assumes that
it is. This is in contrast to a Precise Collector.

Precise – a precise collector knows exactly where every
possible object reference is. A collector cannot be a
moving collector without also being precise, because you
have to know which references to fix when you move the
live objects. Precise collectors identify the live objects in
the memory heap, reclaim resources held by dead
objects and periodically relocate live objects.

Most of the work the virtual machine does to be precise
is actually in the compiler, not the collector itself. All
commercial JVMs today are moving and precise.

Steps in Garbage Collection
Before the garbage collector can reclaim memory, it
must ensure the application is at a ‘GC safepoint’. A
GC safepoint is a point or range in a thread’s execution
where the collector can identify all the references in the
thread’s execution stack. The terms ‘safepoint’ and ‘GC
safepoint’ are often used interchangeably; however, many
types of safepoints exist, some of which require more
information than a GC safepoint. A ‘Global Safepoint’ is
when all application threads are at a safepoint.

Safepoint opportunities in your code should be frequent.
If the garbage collector has to wait for a safepoint that is
minutes (or longer) away, your application could run out
of memory and crash before garbage can be collected.
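
As a rough illustration of a global safepoint, the sketch below models the idea with ordinary Java threads. Each worker checks a flag once per loop iteration (its safepoint opportunity) and parks when a pause is requested; the "collector" proceeds only after every worker has parked. The flag, the latches and all method names are hypothetical stand-ins: a real JVM implements safepoints inside the runtime and compiled code, not in application code like this.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLongArray;

class SafepointSketch {
    // Hypothetical stand-ins for the JVM's internal safepoint state.
    private static volatile boolean safepointRequested;
    private static volatile boolean shutdown;
    private static volatile CountDownLatch arrived;
    private static volatile CountDownLatch resume;

    // Application thread: does work and polls for a safepoint on every iteration.
    private static void work(AtomicLongArray progress, int id) {
        while (!shutdown) {
            progress.incrementAndGet(id);            // one unit of application work
            if (safepointRequested) {                // safepoint opportunity (poll)
                CountDownLatch parked = arrived;
                CountDownLatch wake = resume;
                parked.countDown();                  // report "I am at a safepoint"
                try {
                    wake.await();                    // stay stopped until the pause ends
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }

    private static long[] snapshot(AtomicLongArray progress) {
        long[] copy = new long[progress.length()];
        for (int i = 0; i < copy.length; i++) copy[i] = progress.get(i);
        return copy;
    }

    // Returns true if no application thread made progress during the pause.
    static boolean worldStoppedDuringPause(int threads) {
        safepointRequested = false;
        shutdown = false;
        AtomicLongArray progress = new AtomicLongArray(threads);
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            pool[i] = new Thread(() -> work(progress, id));
            pool[i].start();
        }
        try {
            arrived = new CountDownLatch(threads);
            resume = new CountDownLatch(1);
            safepointRequested = true;               // ask for a global safepoint
            arrived.await();                         // reached once every thread has parked
            long[] before = snapshot(progress);
            Thread.sleep(50);                        // stand-in for the collector's own work
            long[] after = snapshot(progress);
            safepointRequested = false;
            resume.countDown();                      // end of pause: release the application
            shutdown = true;
            for (Thread t : pool) t.join();
            return Arrays.equals(before, after);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("world stopped during pause: " + worldStoppedDuringPause(4));
    }
}
```

The time the collector spends blocked in `arrived.await()` grows with the distance between polls, which is the reason infrequent safepoint opportunities delay collection in this model.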
Mark
This phase, also known as ‘trace’, finds all the live
objects in the heap. The process starts from the ‘roots’,
which include thread stacks, static variables, special
references from JNI code and other areas where live
objects are likely to be found. A reference can prevent
an object from being garbage collected only if it is part
of a chain of references that starts at a GC root.
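
The marking rule can be sketched as a plain graph traversal. The example below is a toy model under stated assumptions, not how any particular JVM implements tracing: `Node`, `mark` and `demo` are hypothetical names, and the single root stands in for thread stacks, statics and JNI references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MarkSketch {
    // Hypothetical toy object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Trace from the roots; everything visited is live, everything else is garbage.
    static Set<Node> mark(Collection<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {            // first visit only, so cycles terminate
                pending.addAll(n.refs);
            }
        }
        return live;
    }

    // Builds a small graph and returns the sorted names of the live objects.
    static String demo() {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(b);                    // root -> A -> B
        c.refs.add(d);                    // C <-> D form a cycle with no path from a root
        d.refs.add(c);
        d.refs.add(b);                    // D references B, but D itself is unreachable
        Set<String> names = new TreeSet<>();
        for (Node n : mark(Collections.singletonList(a))) names.add(n.name);
        return String.join(",", names);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model C and D are never marked even though they reference each other, and the reference from D to B plays no part in keeping B alive; B is live only because of the chain from the root through A.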
Types of Collectors
Mark/Sweep/Compact Collector – performs the three
phases as three separate steps.
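
A minimal sketch of the three phases run as separate steps over a toy heap follows. It assumes a heap modelled as an array of slots whose references are slot indices; the class and method names are hypothetical and no production collector is structured this simply.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MarkSweepCompactSketch {
    // Toy heap: refs[i] lists the slots that the object in slot i points to;
    // a null entry is a free slot. All names here are hypothetical.
    int[][] refs;
    boolean[] marked;

    MarkSweepCompactSketch(int[][] refs) {
        this.refs = refs;
        this.marked = new boolean[refs.length];
    }

    // Phase 1 (mark): flag every slot reachable from the roots.
    void mark(int[] roots) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int r : roots) pending.push(r);
        while (!pending.isEmpty()) {
            int slot = pending.pop();
            if (marked[slot]) continue;
            marked[slot] = true;
            for (int target : refs[slot]) pending.push(target);
        }
    }

    // Phase 2 (sweep): free every unmarked slot; returns how many were freed.
    int sweep() {
        int freed = 0;
        for (int i = 0; i < refs.length; i++) {
            if (refs[i] != null && !marked[i]) {
                refs[i] = null;
                freed++;
            }
        }
        return freed;
    }

    // Phase 3 (compact): slide survivors to the low end of the heap and
    // rewrite every reference, including the roots, to the new locations.
    int[] compact(int[] roots) {
        int[] forward = new int[refs.length];
        int next = 0;
        for (int i = 0; i < refs.length; i++) {
            if (refs[i] != null) forward[i] = next++;
        }
        int[][] moved = new int[refs.length][];
        for (int i = 0; i < refs.length; i++) {
            if (refs[i] == null) continue;
            int[] updated = new int[refs[i].length];
            for (int k = 0; k < updated.length; k++) updated[k] = forward[refs[i][k]];
            moved[forward[i]] = updated;
        }
        refs = moved;
        marked = new boolean[refs.length];   // marks are reset for the next cycle
        int[] newRoots = new int[roots.length];
        for (int k = 0; k < roots.length; k++) newRoots[k] = forward[roots[k]];
        return newRoots;
    }

    public static void main(String[] args) {
        int[][] heap = { {2}, {3}, {4}, {1}, {}, {} };
        MarkSweepCompactSketch gc = new MarkSweepCompactSketch(heap);
        int[] roots = {0};
        gc.mark(roots);
        System.out.println("freed slots: " + gc.sweep());
        roots = gc.compact(roots);
        System.out.println("slot 0 now points to slot " + gc.refs[0][0]);
    }
}
```

The compact step is where the earlier point about precision shows up: every reference to a moved object has to be found and rewritten, which is only possible when the collector knows exactly where all references are.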
Because these young generation objects die quickly,
the live set in the young generation takes up a small
percentage of the available space. Thus a moving
collector makes sense, since we have space in which

Oracle’s HotSpot uses what’s called a ‘blind store’.
Every time you store a reference it marks a card. This
works well, because checking the reference takes more
CPU time, so the system saves time by just marking
the card.
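
The blind store idea can be sketched with a toy card table. This is an illustration under assumptions, not HotSpot code: the class, its methods and the use of plain integer addresses are hypothetical, and the 512-byte card size is chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;

class CardTableSketch {
    static final int CARD_SHIFT = 9;                 // assumed 512-byte cards, for illustration
    private final byte[] cards;

    CardTableSketch(int heapBytes) {
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    // Blind store: mark the card unconditionally, without inspecting
    // the reference being written or the card's current state.
    void onReferenceStore(int fieldAddress) {
        cards[fieldAddress >>> CARD_SHIFT] = 1;
    }

    // Only marked cards need to be examined later for stored references.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) dirty.add(i);
        }
        return dirty;
    }

    // Three reference stores into a 4 KB toy heap; two land on the same card.
    static List<Integer> demo() {
        CardTableSketch table = new CardTableSketch(4096);
        table.onReferenceStore(10);
        table.onReferenceStore(700);
        table.onReferenceStore(705);
        return table.dirtyCards();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The store path here is a shift and a single byte write with no branch, which is the trade the text describes: some cards get marked unnecessarily, but each reference store stays cheap.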
Oracle’s JRockit* Dynamic Garbage Collector | Monolithic, stop-the-world, copying | Mark/Sweep - can choose mostly concurrent or parallel, incremental compaction, fall back to monolithic stop-the-world
IBM J9* Balanced | Monolithic, stop-the-world, copying | Mostly concurrent marker, mostly incremental compaction, fall back to monolithic stop-the-world
IBM J9* Opt throughput | Monolithic, stop-the-world, copying | Parallel Mark/Sweep, stop-the-world compaction
[Figure: Test Bench ops/sec versus Heap Size (GB) for CMS, parallel-gc, C4-NoConcYoung and C4]
[Figure: second plot for the same four collectors over the same horizontal range of 0 to 35, with a logarithmic vertical axis from 0.01 to 10; axis labels not recoverable]