Java® Performance Companion
Charlie Hunt
Monica Beckwith
Poonam Parhar
Bengt Rutisson
Boston • Columbus • Indianapolis • New York • San Francisco • Amsterdam • Cape Town
Dubai • London • Madrid • Milan • Munich • Paris • Montreal • Toronto • Delhi • Mexico City
São Paulo • Sydney • Hong Kong • Seoul • Singapore • Taipei • Tokyo
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim,
the designations have been printed with initial capital letters or in all capitals.
The authors and publisher have taken care in the preparation of this book, but make no expressed or
implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed
for incidental or consequential damages in connection with or arising out of the use of the information or
programs contained herein.
For information about buying this title in bulk quantities, or for special sales opportunities (which may include
electronic versions; custom cover designs; and content particular to your business, training goals, marketing
focus, or branding interests), please contact our corporate sales department at [email protected] or
(800) 382-3419.
For government sales inquiries, please contact [email protected].
For questions about sales outside the United States, please contact [email protected].
Visit us on the Web: informit.com/aw
Cataloging-in-Publication Data is on file with the Library of Congress.
Copyright © 2016 Pearson Education, Inc.
All rights reserved. Printed in the United States of America. This publication is protected by copyright, and
permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval
system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or
likewise. For information regarding permissions, request forms and the appropriate contacts within the
Pearson Education Global Rights & Permissions Department, please visit www.pearsoned.com/permissions.
ISBN-13: 978-0-13-379682-7
ISBN-10: 0-13-379682-5
Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, April 2016
Contents
Preface
Acknowledgments
About the Authors
Index
Preface
Welcome to the Java® Performance Companion. This book offers companion material
to Java™ Performance [1], which was first published in September 2011. Although
the additional topics covered in this book are not as broad as the material in Java™
Performance, they are covered in much greater depth. The topics covered in this book are the G1
garbage collector, also known as the Garbage First garbage collector, and the Java
HotSpot VM Serviceability Agent. There is also an appendix that covers additional
HotSpot VM command-line options of interest that were not included in the Java™
Performance appendix on HotSpot VM command-line options.
If you are currently using Java 8, have interest in migrating to Java 8, or have
plans for using Java 9, you will likely be either evaluating G1 GC or already using
it. Hence, the information in this book will be useful to you. If you have interest in
diagnosing unexpected HotSpot VM failures, or in learning more about the details of
a modern Java Virtual Machine, this book’s content on the HotSpot VM Serviceability
Agent should be of value to you, too. The HotSpot VM Serviceability Agent is the tool
of choice for not only HotSpot VM developers but also the Oracle support engineers
whose daily job involves diagnosing and troubleshooting unexpected HotSpot VM
behavior.
This book begins with an overview of the G1 garbage collector by offering some
context around why G1 was implemented and included in HotSpot VM as a GC. It then
goes on to offer an overview of how the G1 garbage collector works. This chapter is
followed by two additional chapters on G1. The first is an in-depth description of the
internals of G1. If you already have a good understanding of how the G1 garbage
collector works, and either have a need to further fine-tune G1 or want to know more
about its inner workings, this chapter would be a great place to start. The third chap-
ter on G1 is all about fine-tuning G1 for your application. One of the main design
points for G1 was to simplify the tuning required to realize good performance. For
instance, the major inputs into G1 are the initial and maximum Java heap size it can
use, and a maximum GC pause time you are willing to tolerate. From there G1 will
attempt to adaptively adjust to meet those inputs while it executes your application.
In circumstances where you would like to achieve better performance, or you would
like to do some additional tuning on G1, this chapter has the information you are
looking for.
The remaining chapter is dedicated entirely to the HotSpot VM Serviceability
Agent. This chapter provides an in-depth description of and instructions for how to
use the Serviceability Agent. If you have interest in learning more about the inter-
nals of the HotSpot VM, or how to troubleshoot and diagnose unexpected HotSpot
VM issues, this is a good chapter for you. In this chapter you will learn how to use
the HotSpot VM Serviceability Agent to observe and analyze HotSpot VM behavior
in a variety of ways through examples and illustrations.
Last, there is an appendix that includes HotSpot VM command-line options that
were not included in Java™ Performance’s appendix on HotSpot VM command-line
options. Many of the HotSpot VM command-line options found in the appendix are
related to G1. And, rather than merely listing these options with only a description,
an attempt is made to also mention when it is appropriate to use them.
References
[1] Charlie Hunt and Binu John. Java™ Performance. Addison-Wesley, Upper Saddle
River, NJ, 2012. ISBN 978-0-13-714252-1.
Acknowledgments
Charlie Hunt
For those who have ever considered writing a book, or are curious about the effort
involved in doing so, the book-writing experience is a major undertaking! For me it
just would not have happened without the help of so many people. I cannot begin to
mention everyone who made this possible.
In an attempt to at least name those who have had a profound impact on this book
getting drafted and eventually into print, I would first like to thank my coauthors,
Monica Beckwith, Bengt Rutisson, and Poonam Parhar. When the idea of doing a
companion book to Java™ Performance first surfaced, I thought it would be great
to offer the opportunity to these talented HotSpot VM engineers to showcase their
expertise. I am sure I have learned much more from each of them than they have
learned from me. I could not be prouder of their contributions to this book.
I also extend sincere thanks to Monica Beckwith for her persistence and passion
in sharing her in-depth knowledge of G1 GC. In the early days of G1, I had the plea-
sure of working with Monica on a daily basis on G1 performance, eventually handing
off full reins to her. She has done an exceptional job with driving G1’s performance
and sharing her G1 knowledge.
I also have to explicitly call out Poonam Parhar and thank her for her patience.
Poonam so patiently waited for the other contributors to complete their initial
drafts—patiently as in years of patience! Had all of us finished our drafts in a timely
way, this book probably would have been on the shelf at least two years earlier.
Monica Beckwith
I felt honored when I was approached by my mentor Charlie Hunt to write a few
chapters for this book. I didn’t have the slightest idea that it would take me so
long. So, my first set of thanks goes to my fellow writers for their patience and to
Charlie for his persistence and encouragement throughout. While we are talking
about encouragement, I want to thank my hubby, Ben Beckwith—when he saw my
frustration he had nothing but words of encouragement for me. He was also the
initial reviewer of my drafts. Thank you, Ben. And then, of course, my two kiddos,
Annika and Bodin, and my mom, Usha, who have been nothing but supportive of me
and of this book.
My technical strength on G1 owes much to John Cuthbertson, and I am thankful to
him for supporting my crazy queries and patiently listening and working with
me to “make G1 adaptive” and to “tame mixed collections.” When we used to
discuss the adaptive marking threshold, I got tired of typing and talking about
InitiatingHeapOccupancyPercent, so I shortened it to IHOP and John just loved
it. It’s really hard to find such supportive colleagues as John and Charlie.
And then there are Paul Hohensee and Tony Printezis. They are my mentors in
their own right, and I can assure you that their persistence in reviewing my chapters
has improved the readability and content by at least 75 percent! :)
Thank you all for trusting me and encouraging me. I am forever in your debt!
Poonam Parhar
I was deeply honored and excited when Charlie suggested that I write a chapter on the
Serviceability Agent. I thought it was a great idea, as this wonderful tool is little known
to the world, and it would be great to talk about its usefulness and capabilities. But I had
never written a book before, and I was nervous. Big thanks to Charlie for his trust in me,
for his encouragement, and for guiding me throughout writing the chapter on the SA.
I would like to thank my manager, Mattis Castegren, for always being supportive
and encouraging of my work on this book, and for being the first reviewer of the
chapter on the SA. Huge thanks to Kevin Walls for reviewing my chapter and helping
me improve the quality of the content.
Special thanks to my husband, Onkar, who is my best friend, too, for being sup-
portive and always being there whenever I need help. And of course I am grateful
to my two little angels, Amanvir and Karanvir, who are my continuous source of
motivation and happiness.
And my most sincere thanks to my father, Subhash C. Bajaj, for his infectious
cheerfulness and for being a source of light, and for always inspiring me to never
give up.
Bengt Rutisson
When Charlie asked me to write a chapter for this book, I was very honored and
flattered. I had never written a book before and clearly had no idea how much
work it is—even to write just one chapter! I am very grateful for all the support
from Charlie and the reviewers. Without their help, I would not have been able to
complete this chapter.
A big thanks to my wife, Sara Fritzell, who encouraged me throughout the work
and helped me set up deadlines to get the chapter completed. And, of course, many
thanks to our children, Max, Elsa, Teo, Emil, and Lina, for putting up with me during
the writing period.
I would also like to thank all of the members of the HotSpot GC engineering
team, both past and present. They are by far the most talented bunch of engineers
I have ever worked with. I have learned so much from all of them, and they have all
inspired me in so many ways.
About the Authors
Charlie Hunt (Chicago, IL) is currently a JVM Engineer at Oracle leading a variety
of Java SE and HotSpot VM projects whose primary focus is reducing memory
footprint while maintaining throughput and latency. He is also the lead author of
Java™ Performance (Addison-Wesley, 2012). He is a regular presenter at the JavaOne
Conference where he has been recognized as a Java Rock Star. He has also been
a speaker at other well-known conferences, including QCon, Velocity, GoTo, and
Dreamforce. Prior to leading a variety of Java SE and HotSpot VM projects for Oracle,
Charlie worked in several different performance positions, including Performance
Engineering Architect at Salesforce.com and HotSpot VM Performance Architect at
Oracle and Sun Microsystems. He wrote his first Java application in 1998, joined
Sun Microsystems in 1999 as Senior Java Architect, and has had a passion for Java
and JVM performance ever since.
Poonam Parhar (Santa Clara, CA) is currently a JVM Sustaining Engineer at Oracle
where her primary responsibility is to resolve customer-escalated problems against
JRockit and HotSpot VMs. She loves debugging and troubleshooting problems and
is always focused on improving the serviceability and supportability of the HotSpot
VM. She has nailed down many complex garbage collection issues in the HotSpot VM
and is passionate about improving its debugging tools and serviceability so that
garbage-collector-related issues are easier to troubleshoot and fix.
She has made several contributions to the Serviceability Agent
debugger and also developed a VisualVM plugin for it. She presented “VisualVM
Plugin for the SA” at the JavaOne 2011 conference. In an attempt to help customers
and the Java community, she shares her work experiences and knowledge through
the blog she maintains at https://blogs.oracle.com/poonam/.
1
Garbage First Overview
This chapter is an introduction to the Garbage First (or G1) garbage collector (GC)
along with a historical perspective on the garbage collectors in the Java HotSpot
Virtual Machine (VM), hereafter called just HotSpot, and the reasoning behind
G1’s inclusion in HotSpot. The reader is assumed to be familiar with basic garbage
collection concepts such as young generation, old generation, and compaction.
Chapter 3, “JVM Overview,” of the book Java™ Performance [1] is a good source for
learning more about these concepts.
Serial GC was the first garbage collector introduced in HotSpot in 1999 as part
of Java Development Kit (JDK) 1.3.1. The Parallel and Concurrent Mark Sweep
collectors were introduced in 2002 as part of JDK 1.4.2. These three collectors roughly
correspond to the three most important GC use cases: “minimize memory footprint
and concurrent overhead,” “maximize application throughput,” and “minimize
GC-related pause times.” One might ask, “Why do we need a new collector such as
G1?” Before answering, let’s clarify some terminology that is often used when comparing
and contrasting garbage collectors. We’ll then move on to a brief overview of the four
HotSpot garbage collectors, including G1, and identify how G1 differs from the others.
Terminology
In this section, we define the terms parallel, stop-the-world, and concurrent. The term
parallel means a multithreaded garbage collection operation: when a GC activity is
described as parallel, multiple threads are used to perform it. Stop-the-world means
that all application threads are paused for the duration of the GC operation.
Concurrent means that the GC activity executes at the same time as the application
threads, without pausing them.
Parallel GC
Parallel GC is a parallel stop-the-world collector, which means that when a GC occurs, it
stops all application threads and performs the GC work using multiple threads. The GC
work can thus be done very efficiently without any interruptions. This is normally the
best way to minimize the total time spent doing GC work relative to application work.
However, individual pauses of the Java application induced by GC can be fairly long.
Both the young and old generation collections in Parallel GC are parallel and
stop-the-world. Old generation collections also perform compaction. Compaction
moves objects closer together to eliminate wasted space between them, leading to an
optimal heap layout. However, compaction may take a considerable amount of time,
which is generally a function of the size of the Java heap and the number and size
of live objects in the old generation.
At the time when Parallel GC was introduced in HotSpot, only the young
generation used a parallel stop-the-world collector. Old generation collections used a
single-threaded stop-the-world collector. Back when Parallel GC was first introduced,
the HotSpot command-line option that enabled Parallel GC in this configuration was
-XX:+UseParallelGC.
At the time when Parallel GC was introduced, the most common use case for
servers required throughput optimization, and hence Parallel GC became the
default collector for the HotSpot Server VM. Additionally, the sizes of most Java
heaps tended to be between 512MB and 2GB, which keeps Parallel GC pause times
relatively low, even for single-threaded stop-the-world collections. Also at the time,
latency requirements tended to be more relaxed than they are today. It was common
for Web applications to tolerate GC-induced latencies in excess of one second, and as
much as three to five seconds.
As Java heap sizes and the number and size of live objects in the old generation grew,
the time to collect the old generation became longer and longer. At the same time,
hardware advances made more hardware threads available. As a result, Parallel GC
was enhanced by adding a multithreaded old generation collector to be used with a
multithreaded young generation collector. This enhanced Parallel GC reduced the
time required to collect and compact the heap.
The enhanced Parallel GC was delivered in a Java 6 update release. It was
enabled by a new command-line option called -XX:+UseParallelOldGC. When
-XX:+UseParallelOldGC is enabled, parallel young generation collection is also
enabled. This is what we think of today as Parallel GC in HotSpot, a multithreaded
stop-the-world young generation collector combined with a multithreaded stop-the-
world old generation collector.
Tip
In Java 7 update release 4 (also referred to as Java 7u4, or JDK 7u4), -XX:+UseParallelOldGC
was made the default GC and the normal mode of operation for Parallel GC. As of Java 7u4,
specifying -XX:+UseParallelGC also enables -XX:+UseParallelOldGC, and likewise
specifying -XX:+UseParallelOldGC also enables -XX:+UseParallelGC.
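As a side note, one quick way to confirm which collectors a running JVM has actually selected is the standard GarbageCollectorMXBean API. The following is a minimal sketch; the reported collector names (typically "PS Scavenge" and "PS MarkSweep" when Parallel GC is in use, or "G1 Young Generation" and "G1 Old Generation" for G1) vary by collector and JDK version.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShowCollectors {
    public static void main(String[] args) {
        // Print the name, collection count, and accumulated collection time of
        // each collector the running JVM is using.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + "ms");
        }
    }
}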
Parallel GC works well for applications that meet these requirements. For
applications that do not meet these requirements, pause times can become excessively
long, since a full GC must mark through the entire Java heap and also compact the
old generation space. As a result, pause times tend to increase with increased Java
heap sizes.
Figure 1.1 illustrates how the Java application threads (gray arrows) are stopped
and the GC threads (black arrows) take over to do the garbage collection work. In
this diagram there are eight parallel GC threads and eight Java application threads,
although in most applications the number of application threads usually exceeds the
number of GC threads, especially in cases where some application threads may be
idle. When a GC occurs, all application threads are stopped, and multiple GC threads
execute during GC.
Serial GC
Serial GC is very similar to Parallel GC except that it does all its work in a single
thread. The single-threaded approach allows for a less complex GC implementation
and requires very few external runtime data structures. The memory footprint is the
lowest of all HotSpot collectors. The challenges with Serial GC are similar to those
for Parallel GC. Pause times can be long, and they grow more or less linearly with
the heap size and amount of live data. In addition, with Serial GC the long pauses
are more pronounced, since the GC work is done in a single thread.
Figure 1.2 How Java application threads are interrupted by a single GC thread when
Serial GC is used
Because of the low memory footprint, Serial GC is the default on the Java HotSpot
Client VM. It also addresses the requirements for many embedded use cases. Serial
GC can be explicitly specified as the GC to use with the -XX:+UseSerialGC HotSpot
command-line option.
Figure 1.2 illustrates how Java application threads (gray arrows) are stopped
and a single GC thread (black arrow) takes over to do the garbage collection work
on a machine running eight Java application threads. Because it is single-threaded,
Serial GC in most cases will take longer to execute a GC event than Parallel GC since
Parallel GC can spread out the GC work to multiple threads.
Concurrent Mark Sweep (CMS) GC
CMS GC was designed to address the long old generation collection pauses of Serial and Parallel GC by avoiding lengthy stop-the-world pauses of the
application threads. To achieve this, the CMS old generation collector does most of its
work concurrently with application thread execution, except for a few relatively short
GC synchronization pauses. CMS is often referred to as mostly concurrent, since there
are some phases of old generation collection that pause application threads. Exam-
ples are the initial-mark and remark phases. In CMS’s initial implementation, both
the initial-mark and remark phases were single-threaded, but they have since been
enhanced to be multithreaded. The HotSpot command-line options to support mul-
tithreaded initial-mark and remark phases are -XX:+CMSParallelInitialMarkEnabled
and -XX:+CMSParallelRemarkEnabled. These are automatically enabled by default
when CMS GC is enabled by the -XX:+UseConcurrentMarkSweepGC command-line option.
It is possible, and quite likely, for a young generation collection to occur while
an old generation concurrent collection is taking place. When this happens, the old
generation concurrent collection is interrupted by the young generation collection
and immediately resumes upon the latter’s completion. The default young generation
collector for CMS GC is commonly referred to as ParNew.
Figure 1.3 shows how Java application threads (gray arrows) are stopped for the
young GCs (black arrows) and for the CMS initial-mark and remark phases, and old
generation GC stop-the-world phases (also black arrows). An old generation collec-
tion in CMS GC begins with a stop-the-world initial-mark phase. Once initial mark
completes, the concurrent marking phase begins where the Java application threads
are allowed to execute concurrently with the CMS marking threads. In Figure 1.3,
the concurrent marking threads are the first two longer black arrows, one on top of
the other below the “Marking/Pre-cleaning” label. Once concurrent marking com-
pletes, concurrent pre-cleaning is executed by the CMS threads, as shown by the
two shorter black arrows under the “Marking/Pre-cleaning” label. Note that if there
are enough available hardware threads, CMS thread execution overhead will not
have much effect on the performance of Java application threads. If, however, the
hardware threads are saturated or highly utilized, CMS threads will compete for
CPU cycles with Java application threads. Once concurrent pre-cleaning completes,
the stop-the-world remark phase begins. The remark phase marks objects that may
have been missed after the initial mark and while concurrent marking and concur-
rent pre-cleaning execute. After the remark phase completes, concurrent sweeping
begins, which frees all dead object space.
One of the challenges with CMS GC is tuning it such that the concurrent work
can complete before the application runs out of available Java heap space. Hence,
one tricky part about CMS is to find the right time to start the concurrent work.
A common consequence of the concurrent approach is that CMS normally requires
on the order of 10 to 20 percent more Java heap space than Parallel GC to handle
the same application. That is part of the price paid for shorter GC pause times.
Figure 1.3 How Java application threads are impacted by the GC threads
when CMS is used
Another challenge with CMS GC is how it deals with fragmentation in the old
generation. Fragmentation occurs when the free space between objects in the
old generation becomes so small or nonexistent that an object being promoted from
the young generation cannot fit into an available hole. The CMS concurrent collec-
tion cycle does not perform compaction, not even incremental or partial compaction.
A failure to find an available hole causes CMS to fall back to a full collection using
Serial GC, typically resulting in a lengthy pause. Another unfortunate challenge
associated with fragmentation in CMS is that it is unpredictable. Some application
runs may never experience a full GC resulting from old generation fragmentation
while others may experience it regularly.
Tuning CMS GC can help postpone fragmentation, as can application modifica-
tions such as avoiding large object allocations. Tuning can be a nontrivial task and
requires much expertise. Making changes to the application to avoid fragmentation
may also be challenging.
Garbage First (G1) GC
Unlike the other HotSpot collectors, G1 divides the Java heap into a set of regions rather than contiguous generational spaces. With the other collectors, it must
be decided up front where the young and old generations should be placed in the
virtual address space, since the young and old generations are separate consecutive
chunks of memory.
Tip
The term to describe the collection of a subset of old generation regions in conjunction with a
young collection is mixed GC. Hence, a mixed GC is a GC event in which all young generation
regions are collected in addition to a subset of old generation regions. In other words, a mixed
GC is a mix of young and old generation regions that are being collected.
Similar to CMS GC, there is a fail-safe to collect and compact the entire old
generation in dire situations such as when old generation space is exhausted.
A G1 old generation collection, ignoring the fail-safe type of collection, is a set of
phases, some of which are parallel stop-the-world and some of which are parallel con-
current. That is, some phases are multithreaded and stop all application threads, and
others are multithreaded and execute at the same time as the application threads.
Chapters 2 and 3 provide more detail on each of these phases.
G1 initiates an old generation collection when a Java heap occupancy threshold is
exceeded. It is important to note that the heap occupancy threshold in G1 measures
the old generation occupancy compared to the entire Java heap. Readers who are
familiar with CMS GC remember that CMS initiates an old generation collection
using an occupancy threshold applied against the old generation space only. In G1,
once the heap occupancy threshold is reached or exceeded, a parallel stop-the-world
initial-mark phase is scheduled to execute.
The initial-mark phase executes at the same time as the next young GC. Once the
initial-mark phase completes, a concurrent multithreaded marking phase is initiated
to mark all live objects in the old generation. When the concurrent marking phase is
completed, a parallel stop-the-world remark phase is scheduled to mark any objects
that may have been missed due to application threads executing concurrently with
the marking phase. At the end of the remark phase, G1 has full marking information
on the old generation regions. If there happen to be old generation regions that do
not have any live objects in them, they can be reclaimed without any additional GC
work during the next phase of the concurrent cycle, the cleanup phase.
Also at the end of the remark phase, G1 can identify an optimal set of old
generation regions to collect.
Tip
The set of regions to collect during a garbage collection is referred to as a collection set (CSet).
The regions selected for inclusion in a CSet are based on how much space can be freed
and the G1 pause time target. After the CSet has been identified, G1 schedules a GC
to collect regions in the CSet during the next several young generation GCs. That
is, over the next several young GCs, a portion of the old generation will be collected
in addition to the young generation. This is the mixed GC type of garbage collection
event mentioned earlier.
With G1, every region that is garbage collected, regardless of whether it is young
or old generation, has its live objects evacuated to an available region. Once the live
objects have been evacuated, the young and/or old regions that have been collected
become available regions.
An attractive outcome of evacuating live objects from old generation regions into
available regions is that the evacuated objects end up next to each other in the virtual
address space with no unused space between them, which in effect compacts that portion of the heap.
G1 Design
As mentioned earlier, G1 divides the Java heap into regions. The region size can vary
depending on the size of the heap but must be a power of 2 and at least 1MB and at
most 32MB. Possible region sizes are therefore 1, 2, 4, 8, 16, and 32MB. All regions
are the same size, and their size does not change during execution of the JVM. The
region size calculation is based on the average of the initial and maximum Java
heap sizes such that there are about 2000 regions for that average heap size. As an
example, for a 16GB Java heap with -Xmx16g -Xms16g command-line options, G1
will choose a region size of 16GB/2000 = 8MB.
If the initial and maximum Java heap sizes are far apart or if the heap size is very
large, it is possible to have many more than 2000 regions. Similarly, a small heap size
may end up with many fewer than 2000 regions.
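The following is a simplified sketch of the sizing heuristic just described, not the actual HotSpot code: it targets roughly 2000 regions (the implementation aims for about 2048, as noted in Chapter 2) for the average of the initial and maximum heap sizes, then clamps the result to a power of 2 between 1MB and 32MB.

class RegionSizeSketch {
    // A simplified sketch only; the real HotSpot calculation differs in detail.
    static long regionSizeFor(long initialHeapBytes, long maxHeapBytes) {
        long average = (initialHeapBytes + maxHeapBytes) / 2;
        long target = average / 2048;               // aim for about 2048 regions
        long size = 1024 * 1024;                    // 1MB minimum
        while (size * 2 <= target && size < 32L * 1024 * 1024) {
            size *= 2;                              // round down to a power of 2, cap at 32MB
        }
        return size;
    }

    public static void main(String[] args) {
        long sixteenGb = 16L * 1024 * 1024 * 1024;
        // Prints 8MB for -Xms16g -Xmx16g, matching the example in the text.
        System.out.println(regionSizeFor(sixteenGb, sixteenGb) / (1024 * 1024) + "MB");
    }
}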
Each region has an associated remembered set (a collection of the locations that
contain pointers into the region, shortened to RSet). The total RSet size is limited but
noticeable, so the number of regions has a direct effect on HotSpot’s memory footprint.
The total size of the RSets heavily depends on application behavior. At the low end,
RSet overhead is around 1 percent and at the high end 20 percent of the heap size.
A particular region is used for only one purpose at a time, but when the region is
included in a collection, it will be completely evacuated and released as an available
region.
There are several types of regions in G1. Available regions are currently unused.
Eden regions constitute the young generation eden space, and survivor regions con-
stitute the young generation survivor space. The set of all eden and survivor regions
together is the young generation. The number of eden or survivor regions can change
from one GC to the next, between young, mixed, or full GCs. Old generation regions
comprise most of the old generation. Finally, humongous regions are considered to
be part of the old generation and contain objects whose size is 50 percent or more of
a region. Until a JDK 8u40 change, humongous regions were collected as part of the
old generation, but in JDK 8u40 certain humongous regions are collected as part of
a young collection. There is more detail on humongous regions later in this chapter.
The fact that a region can be used for any purpose means that there is no need
to partition the heap into contiguous young and old generation segments. Instead,
G1 heuristics estimate how many regions the young generation can consist of and
still be collected within a given GC pause time target. As the application starts allo-
cating objects, G1 chooses an available region, designates it as an eden region, and
starts handing out memory chunks from it to Java threads. Once the region is full,
another unused region is designated an eden region. The process continues until the
maximum number of eden regions is reached, at which point a young GC is initiated.
During a young GC, all young regions, eden and survivor, are collected. All live
objects in those regions are evacuated to either a new survivor region or to an old
generation region. Available regions are tagged as survivor or old generation regions
as needed when the current evacuation target region becomes full.
When the occupancy of the old generation space, after a GC, reaches or
exceeds the initiating heap occupancy threshold, G1 initiates an old generation
collection. The occupancy threshold is controlled by the command-line option
-XX:InitiatingHeapOccupancyPercent, which defaults to 45 percent of the
Java heap.
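As a small worked example (illustrative arithmetic only, not HotSpot code), with a 16GB heap and the default threshold of 45 percent, a concurrent marking cycle is initiated once occupancy reaches roughly 7.2GB:

class IhopSketch {
    public static void main(String[] args) {
        long heapBytes = 16L * 1024 * 1024 * 1024;   // -Xmx16g
        long thresholdBytes = heapBytes * 45 / 100;  // default IHOP of 45 percent
        // Prints roughly 7372MB, that is, about 7.2GB.
        System.out.println("marking threshold = " + thresholdBytes / (1024 * 1024) + "MB");
    }
}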
G1 can reclaim old generation regions early when the marking phase shows that
they contain no live objects. Such regions are added to the available region set. Old
regions containing live objects are scheduled to be included in a future mixed collection.
12 Chapter 1
Garbage First Overview
Humongous Objects
G1 deals specially with large object allocations, or what G1 calls “humongous objects.”
As mentioned earlier, a humongous object is an object that is 50 percent or more
of a region size. That size includes the Java object header. Object header sizes
vary between 32- and 64-bit HotSpot VMs. The header size for a given object within
a given HotSpot VM can be obtained using the Java Object Layout tool, also
known as JOL. As of this writing, the Java Object Layout tool can be found on the
Internet [2].
When a humongous object allocation occurs, G1 locates a set of consecutive avail-
able regions that together add up to enough memory to contain the humongous
object. The first region is tagged as a “humongous start” region and the other regions
are marked as “humongous continues” regions. If there are not enough consecutive
available regions, G1 will do a full GC to compact the Java heap.
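As a rough illustration of the arithmetic involved (a sketch, not HotSpot code), the number of contiguous regions needed is the object size divided by the region size, rounded up:

class HumongousRegionsSketch {
    // Illustrative only: how many contiguous regions a humongous allocation needs.
    static int contiguousRegionsNeeded(long objectBytes, long regionBytes) {
        return (int) ((objectBytes + regionBytes - 1) / regionBytes);
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // A 10MB object with a 4MB region size needs 3 consecutive regions.
        System.out.println(contiguousRegionsNeeded(10 * mb, 4 * mb));
    }
}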
Humongous regions are considered part of the old generation, but they contain
only one object. This property allows G1 to eagerly collect a humongous region
when the concurrent marking phase detects that it is no longer live. When this
happens, all the regions containing the humongous object can be reclaimed
at once.
A potential challenge for G1 is that short-lived humongous objects may not be
reclaimed until well past the point at which they become unreferenced. JDK 8u40
implemented a method to, in some cases, reclaim a humongous region during a young
collection. Avoiding frequent humongous object allocations can be crucial to achieving
application performance goals when using G1. The enhancements available in JDK
8u40 help but may not be a solution for all applications having many short-lived
humongous objects.
Ideally, G1 meets its pause time goals without requiring a full GC, and it can usually be tuned such that a full GC is
not needed.
Concurrent Cycle
A G1 concurrent cycle includes the activity of several phases: initial marking,
concurrent root region scanning, concurrent marking, remarking, and cleanup. The
beginning of a concurrent cycle is the initial mark, and the ending phase is cleanup.
All these phases are considered part of “marking the live object graph” with the
exception of the cleanup phase.
The purpose of the initial-mark phase is to gather all GC roots. Roots are the
starting points of the object graphs. To collect root references from application
threads, the application threads must be stopped; thus the initial-mark phase is
stop-the-world. In G1, the initial marking is done as part of a young GC pause since
a young GC must gather all roots anyway.
The marking operation must also scan and follow all references from objects in
the survivor regions. This is what the concurrent root region scanning phase does.
During this phase all Java threads are allowed to execute, so no application pauses
occur. The only limitation is that the scanning must be completed before the next GC
is allowed to start. The reason for that is that a new GC will generate a new set of
survivor objects that are different from the initial mark’s survivor objects.
Most marking work is done during the concurrent marking phase. Multiple
threads cooperate to mark the live object graph. All Java threads are allowed to
execute at the same time as the concurrent marking threads, so there is no pause in
the application, though an application may experience some throughput reduction.
After concurrent marking is done, another stop-the-world phase is needed to
finalize all marking work. This phase is called the “remark phase” and is usually a
very short stop-the-world pause.
The final phase of concurrent marking is the cleanup phase. In this phase, regions
that were found not to contain any live objects are reclaimed. These regions are not
included in a young or mixed GC since they contain no live objects. They are added
to the list of available regions.
The marking phases must be completed in order to find out what objects are live
so as to make informed decisions about what regions to include in the mixed GCs.
Since it is the mixed GCs that are the primary mechanism for freeing up memory in
G1, it is important that the marking phase finishes before G1 runs out of available
regions. If the marking phase does not finish prior to running out of available regions,
G1 will fall back to a full GC to free up memory. This is reliable but slow. Ensuring
that the marking phases complete in time to avoid a full GC may require tuning,
which is covered in detail in Chapter 3.
Heap Sizing
The Java heap size in G1 is always a multiple of the region size. Except for that
limitation, G1 can grow and shrink the heap size dynamically between -Xms and -Xmx
just as the other HotSpot GCs do.
G1 may increase the Java heap size for several reasons:
1. An increase in size can occur based on heap size calculations during a full GC.
2. When a young or mixed GC occurs, G1 calculates the time spent to perform
the GC compared to the time spent executing the Java application. If too much
time is spent in GC according to the command-line setting -XX:GCTimeRatio,
the Java heap size is increased. The idea behind growing the Java heap size in
this situation is to allow GCs to happen less frequently so that the time spent
in GC compared to the time spent executing the application is reduced.
The default value for -XX:GCTimeRatio in G1 is 9. All other HotSpot garbage
collectors default to a value of 99. The larger the value for GCTimeRatio,
the more aggressive the increase in Java heap size. The other HotSpot collectors
are thus more aggressive in their decision to increase Java heap size and by
default are targeted to spend less time in GC relative to the time spent executing
the application. (A short worked example of this ratio follows the list.)
3. If an object allocation fails, even after having done a GC, rather than immediately
falling back to doing a full GC, G1 will attempt to increase the heap size to
satisfy the object allocation.
4. If a humongous object allocation fails to find enough consecutive free regions to
allocate the object, G1 will try to expand the Java heap to obtain more available
regions rather than doing a full GC.
5. When a GC requests a new region into which to evacuate objects, G1 will prefer
to increase the size of the Java heap to obtain a new region rather than failing
the GC and falling back to a full GC in an attempt to find an available region.
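As promised in item 2 above, here is a small worked example of the -XX:GCTimeRatio relationship (a sketch of the stated goal, not HotSpot's actual sizing code): the collector tries to keep the fraction of time spent in GC at or below 1 / (1 + GCTimeRatio).

class GcTimeRatioSketch {
    public static void main(String[] args) {
        // G1 default: GCTimeRatio = 9, so up to 1/(1 + 9) = 10% of time in GC.
        // Other HotSpot collectors default to 99, so up to 1/(1 + 99) = 1%.
        double g1Goal = 1.0 / (1 + 9);
        double othersGoal = 1.0 / (1 + 99);
        System.out.printf("G1 goal: %.0f%%, other collectors: %.0f%%%n",
                g1Goal * 100, othersGoal * 100);
    }
}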
References
[1] Charlie Hunt and Binu John. Java™ Performance. Addison-Wesley, Upper Saddle
River, NJ, 2012. ISBN 978-0-13-714252-1.
[2] “Code Tools: jol.” OpenJDK, circa 2014. http://openjdk.java.net/projects/code-tools/jol/.
2
Garbage First Garbage Collector in Depth
Background
G1 GC is the latest addition to the Java HotSpot Virtual Machine. It is a compacting
collector based on the principle of collecting the most garbage first, hence the name
“Garbage First” GC. G1 GC has incremental, parallel, stop-the-world pauses that
achieve compaction via copying and also has parallel, multistaged concurrent
marking that helps in reducing the mark, remark, and cleanup pauses to a bare
minimum.
With the introduction of G1 GC, HotSpot JVM moved from the conventional heap
layout where the generations are contiguous to generations that are now composed of
noncontiguous heap regions. Thus, for an active Java heap, a particular region could
be a part of either eden or survivor or old generation, or it could be a humongous
region or even just a free region. Multiples of these regions form a “logical” generation
to match conventional wisdom formed by previous HotSpot garbage collectors’ idea
of generational spaces.
Garbage Collection in G1
G1 GC reclaims most of its heap regions during collection pauses. The only exception
to this is the cleanup stage of the multistaged concurrent marking cycle. During the
cleanup stage, if G1 GC encounters purely garbage-filled regions, it can immediately
reclaim those regions and return them to a linked list of free regions; thus, freeing up
those regions does not have to wait for the next garbage collection pause.
G1 GC has three main types of garbage collection cycles: a young collection cycle,
a multistage concurrent marking cycle, and a mixed collection cycle. There is also a
single-threaded (as of this writing) fallback pause called a “full” garbage collection
pause, which is the fail-safe mechanism for G1 GC in case the GC experiences
evacuation failures.
Tip
An evacuation failure is also known as a promotion failure or to-space exhaustion or even
to-space overflow. The failure usually happens when there is no more free space to promote
objects. When faced with such a scenario, all Java HotSpot VM GCs try to expand their heaps.
But if the heap is already at its maximum, the GC tries to tenure the regions where objects
were successfully copied and update their references. For G1 GC, the objects that could not
be copied are tenured in place. All GCs will then have their references self-forwarded. These
self-forwarded references are removed at the end of the garbage collection cycle.
Tip
“GC efficiency” actually refers to the ratio of the space to be reclaimed versus the estimated
GC cost to collect the region. Due to the lack of a better term, in this book we refer to the
sorting of the heap regions in order to identify candidate regions as calculating the “GC
efficiency.” The reason for using the same terminology is that GC efficiency evaluates the
benefit of collecting a region with respect to the cost of collecting it. And the “efficiency” that
we are referring to here is solely dependent on liveness accounting and hence is just the cost
of collecting a region. For example, an old region whose collection is less time-consuming
than other more expensive old regions is considered to be an “efficient” region. The most
efficient regions would be the first ones in the sorted array of regions.
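The following toy sketch illustrates that ranking idea; the RegionInfo fields are hypothetical and do not correspond to actual HotSpot data structures.

import java.util.Comparator;
import java.util.List;

class RegionInfo {
    long reclaimableBytes;   // space freed if this region is collected
    double predictedCostMs;  // estimated cost of collecting this region
}

class GcEfficiencyRanking {
    // Regions that free the most space per unit of predicted cost sort first.
    static void rank(List<RegionInfo> candidateOldRegions) {
        candidateOldRegions.sort(
                Comparator.comparingDouble((RegionInfo r) -> r.reclaimableBytes / r.predictedCostMs)
                          .reversed());
    }
}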
At the end of a collection cycle, the regions that were a part of the collection set or
CSet (refer to the section “Collection Sets and Their Importance” later in this chapter
for more information) are guaranteed to be free and are returned to the free list. Let’s
talk about these concepts in more detail.
Tip
G1 GC will select the default of 200ms if -XX:MaxGCPauseMillis is not set on the
command line. If the user sets -Xmn or related young generation sizing command-line options
such as -XX:NewRatio, G1 GC may not be able to adjust the young generation size based
on the pause time goal, and hence the pause time target could become a moot option.
Based on the Java application’s object allocation rate, new free regions are
added to the young generation as needed until the desired generation size is
met. The heap region size is determined at the launch of the JVM. The heap
region size has to be a power of 2 and can range anywhere from 1MB to 32MB.
The JVM shoots for approximately 2048 regions and sets the heap region size
accordingly (Heap region size = Heap size/2048). The heap region size is aligned
and adjusted to fall within the 1MB to 32MB and power of 2 bounds. The adaptive
selection of heap region size can be overwritten on the command line by setting it
with -XX:G1HeapRegionSize=n. Chapter 3, “Garbage First Garbage Collector
Performance Tuning,” includes more information on when to override the JVM’s
automatic sizing.
In the G1 GC log output, the new size of the young generation can be calculated by adding the new Eden
size to the new Survivor size (for example, 1043M + 13M = 1056M).
Humongous Regions
For the G1 GC, the unit of collection is a region. Hence, the heap region size
(-XX:G1HeapRegionSize) is an important parameter since it determines what
size objects can fit into a region. The heap region size also determines what objects
are characterized as “humongous.” Humongous objects are very large objects that
span 50 percent or more of a G1 GC region. Such an object doesn’t follow the usual
fast allocation path and instead gets allocated directly out of the old generation in
regions marked as humongous regions.
Figure 2.1 illustrates a contiguous Java heap with the different types of G1 regions
identified: young region, old region, and humongous region. Here, we can see that
each of the young and old regions spans one unit of collection. The humongous region,
on the other hand, spans two units of collection, indicating that humongous regions
are formed of contiguous heap regions. Let’s look a little deeper into three different
humongous regions.
In Figure 2.2, Humongous Object 1 spans two contiguous regions. The first
contiguous region is labeled “StartsHumongous,” and the consecutive contiguous
region is labeled “ContinuesHumongous.” Also illustrated in Figure 2.2, Humongous
Object 2 spans three contiguous heap regions, and Humongous Object 3 spans just
one region.
Tip
Past studies of these humongous objects have indicated that the allocations of such objects
are rare and that the objects themselves are long-lived. Another good point to remember
is that the humongous regions need to be contiguous (as shown in Figure 2.2). Hence it
makes no sense to move them since there is no gain (no space reclamation), and doing so is very
expensive since copying large amounts of memory is costly. Hence, in an effort to avoid
the copying expense of these humongous objects during young garbage collections, it was
deemed better to directly allocate the humongous objects out of the old generation. But in
recent years, there are many transaction-type applications that may have not-so-long-lived
humongous objects. Hence, various efforts are being made to optimize the allocation and
reclamation of humongous objects.
Tip
An important potential issue or confusion needs to be highlighted here. Say the current G1
region size is 2MB. And say that a byte array has a length of 1MB. This byte array
will still be considered a humongous object and will need to be allocated as such, since the
1MB array length doesn't include the array's object header size.
Tip
In G1, the IHOP threshold defaults to 45 percent of the total Java heap. It is important
to note that this heap occupancy percentage applies to the entire Java heap, unlike the
heap occupancy command-line option used with CMS GC where it applies only to the old
generation. In G1 GC, there is no physically separate old generation—there is a single pool
of free regions that can be allocated as eden, survivor, old, or humongous. Also, the number
of regions that are allocated, for say the eden, can vary over time. Hence having an old
generation percentage didn’t really make sense.
Trivia
I used to write e-mails to G1 GC dev engineers and also to G1 GC users about the marking
threshold and would always have to refer to it in full as InitiatingHeapOccupancyPercent
since there is another difference (other than the one mentioned in the preceding tip)
between CMS and G1’s marking threshold—it’s the option name! CMS’s marking threshold
is called CMSInitiatingOccupancyFraction, and as you can see there is no “percent”
in the option name. So to avoid any confusion, I would always have to specify the full option
name for G1, and soon I developed a form of endearment for this option and started calling
it IHOP.
When the old generation occupancy reaches (or exceeds) the IHOP threshold,
a concurrent marking cycle is initiated. Toward the end of marking, G1 GC calculates
the amount of live objects per old region. Also, during the cleanup stage, G1 GC ranks
the old regions based on their “GC efficiency.” Now a mixed collection can happen!
During a mixed collection pause, G1 GC not only collects all of the regions in the
young generation but also collects a few candidate old regions such that the old
regions with the most garbage are reclaimed.
Tip
An important point to keep in mind when comparing CMS and G1 logs is that the multistaged
concurrent cycle in G1 has fewer stages than the multistaged concurrent cycle in CMS.
The number of mixed collections per mixed collection cycle can be controlled
by the minimum old generation CSet per mixed collection pause count and the heap
waste percentage.
Remembered Sets and Their Importance
Each region has an associated remembered set (RSet) that tracks the locations of incoming references into that region from other regions. Two kinds of incoming references are tracked:
Old-to-young references—G1 GC maintains pointers from regions in the old
generation into the young generation region. The young generation region is
said to “own” the RSet and hence the region is said to be an RSet “owning”
region.
Old-to-old references—Here pointers from different regions in the old generation
will be maintained in the RSet of the “owning” old generation region.
In Figure 2.3, we can see one young region (Region x) and two old regions (Region
y and Region z). Region x has an incoming reference from Region z. This reference
is noted in the RSet for Region x. We also observe that Region z has two incoming
references, one from Region x and another from Region y. The RSet for Region z needs
to note only the incoming reference from Region y and doesn’t have to remember
the reference from Region x, since, as explained earlier, young generation is always
collected in its entirety. Finally, for Region y, we see an incoming reference from
Region x, which is not noted in the RSet for Region y since Region x is a young region.
As shown in Figure 2.3, there is only one RSet per region. Depending on the
application, it could be that a particular region (and thus its RSet) could be “popular”
such that there could be many updates in the same region or even to the same
location. This is not that uncommon in Java applications.
G1 GC has its way of handling such demands of popularity; it does so by changing
the density of RSets. The density of RSets follows three levels of granularity, namely,
sparse, fine, and coarse. For a popular region, the RSet would probably get coarsened
to accommodate the pointers from various other regions. This will be reflected in
the RSet scanning time for those regions. (Refer to Chapter 3 for more details on
RSet scanning times.) Each of the three granular levels has a per-region-table (PRT)
abstract housing for any particular RSet.
Since G1 GC regions are internally further divided into chunks, at the G1 GC
region level the lowest granularity achievable is a 512-byte heap chunk called a
“card” (refer to Figure 2.4). A global card table maintains all the cards.
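As an illustration of the arithmetic (a sketch, not HotSpot code), the card covering a given heap address can be computed by subtracting the heap base address and shifting right by 9 bits, since 512 = 2^9:

class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 512-byte cards

    // Illustrative only: index into the global card table for a heap address.
    static long cardIndexFor(long address, long heapBase) {
        return (address - heapBase) >>> CARD_SHIFT;
    }
}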
When a pointer makes a reference to an RSet’s owning region, the card containing
that pointer is noted in the PRT. A sparse PRT is basically a hash table of those card
indices. This simple implementation leads to faster scan times by the garbage collector.
On the other hand, fine-grained PRT and coarse-grained bitmap are handled in a
different manner. For fine-grained PRT, each entry in its open hash table corresponds
to a region (with a reference into the owning region) where the card indices within
that region are stored in a bitmap. There is a maximum limit to the fine-grained PRT,
and when it is exceeded, a bit (called the “coarse-grained bit”) is set in the coarse-
grained bitmap. Once the coarse-grained bit is set, the corresponding entry in the
fine-grained PRT is deleted. The coarse-grained bitmap is simply a bitmap with one
bit per region such that a set bit means that the corresponding region might contain
a reference to the owning region. So then the entire region associated with the set
bit must be scanned to find the reference. Hence a remembered set coarsened to a
coarse-grained bitmap is the slowest to scan for the garbage collector. More details
on this can be found in Chapter 3.
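To make the three granularities more concrete, here is a highly simplified model; the class, thresholds, and fields are illustrative and do not mirror HotSpot's implementation.

import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SimplifiedRememberedSet {
    // Sparse: per referencing region, a small set of card indices.
    final Map<Integer, Set<Integer>> sparse = new HashMap<>();
    // Fine: per referencing region, a bitmap with one bit per card in that region.
    final Map<Integer, BitSet> fine = new HashMap<>();
    // Coarse: one bit per region; a set bit means "scan that entire region."
    final BitSet coarse = new BitSet();

    static final int SPARSE_LIMIT = 4;   // illustrative thresholds only
    static final int FINE_LIMIT = 16;

    void recordReference(int fromRegion, int cardIndex) {
        if (coarse.get(fromRegion)) {
            return;                                // already coarsened
        }
        BitSet fineCards = fine.get(fromRegion);
        if (fineCards != null) {
            fineCards.set(cardIndex);
            if (fineCards.cardinality() > FINE_LIMIT) {
                fine.remove(fromRegion);           // coarsen: drop the fine entry
                coarse.set(fromRegion);            // and remember only the region
            }
            return;
        }
        Set<Integer> cards = sparse.computeIfAbsent(fromRegion, k -> new HashSet<>());
        cards.add(cardIndex);
        if (cards.size() > SPARSE_LIMIT) {         // promote sparse entry to fine
            BitSet promoted = new BitSet();
            cards.forEach(promoted::set);
            fine.put(fromRegion, promoted);
            sparse.remove(fromRegion);
        }
    }
}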
During any collection cycle, when scanning the remembered sets and thus the
cards in the PRT, G1 GC will mark the corresponding entry in the global card table to
avoid rescanning that card. At the end of the collection cycle this card table is cleared;
this is shown in the GC output (printed with -XX:+PrintGCDetails) as Clear CT
and is next in sequence to the parallel work done by the GC threads (i.e., external
root scanning, updating and scanning the remembered sets, object copying, and
termination protocol). There are also other sequential activities such as choosing and
freeing the CSet and also reference processing and enqueuing. In GC log output
produced with -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps on
JDK 8u45, the RSet and card table activities appear among these phases. More details
on the log output are covered in Chapter 3.
To keep the remembered sets up-to-date, G1 GC uses a write barrier that runs whenever the application updates a reference field, for example:
object.field = some_other_object;
This assignment will trigger the barrier code. Since the barrier is issued after a
write to any reference, it is called a “post-write” barrier. Write barrier instruction
sequences can get very expensive, and the throughput of the application will fall
proportionally with the complexity of the barrier code; hence G1 GC does the
minimum amount of work that is needed to figure out if the reference update is a
cross-region update since a cross-region reference update needs to be captured in the
RSet of the owning region. For G1 GC, the barrier code includes a filtering technique
briefly discussed in “Older-First Garbage Collection in Practice” [4] that involves a
simple check which evaluates to zero when the update is in the same region. The
following sketch approximates that filtering check (the exact HotSpot barrier code differs; names here are illustrative):
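class WriteBarrierSketch {
    static final int LOG_OF_HEAP_REGION_SIZE = 21;   // for example, 2MB regions

    // Illustrative only: true when storing newValueAddr into the field at
    // fieldAddr creates a cross-region reference that must be recorded in the
    // RSet of the region owning the referenced object.
    static boolean needsRSetUpdate(long fieldAddr, long newValueAddr) {
        if (newValueAddr == 0) {
            return false;                 // storing null: nothing to record
        }
        // The XOR of the two addresses, shifted by log2(region size), is zero
        // only when both addresses fall within the same region.
        return ((fieldAddr ^ newValueAddr) >>> LOG_OF_HEAP_REGION_SIZE) != 0;
    }
}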
Tip
The concurrent refinement threads are threads dedicated to the sole purpose of
maintaining the remembered sets by scanning the logged cards in the filled log buffers
and then updating the remembered set for those regions. The maximum number of
refinement threads is determined by -XX:G1ConcRefinementThreads. As of JDK 8u45,
if -XX:G1ConcRefinementThreads is not set on the command line, it is ergonomically
set to be the same as -XX:ParallelGCThreads.
Once the update log buffer reaches its holding capacity, it is retired and a new
log buffer is allocated. The card enqueuing then happens in this new buffer. The
retired buffer is placed in a global list. Once the refinement threads find entries in
the global list, they start concurrently processing the retired buffers. The refinement
threads are always active, albeit initially only a few of them are available. G1 GC
handles the deployment of the concurrent refinement threads in a tiered fashion,
adding more threads to keep up with the amount of filled log buffers. Activation
thresholds are set by the following flags: -XX:G1ConcRefinementGreenZone,
-XX:G1ConcRefinementYellowZone, and -XX:G1ConcRefinementRedZone. If
the concurrent refinement threads can’t keep up with the number of filled buffers,
mutator threads are enlisted for help. At such a time, the mutator threads will stop
their work and help the concurrent refinement threads to finish processing the filled
log buffers. Mutator threads in GC terminology are the Java application threads.
Hence, when the concurrent refinement threads can't keep up with the number of filled
buffers, the Java application will be halted until the filled log buffers are processed.
Thus, measures should be taken to avoid such a scenario.
Tip
Users are not expected to manually tune any of the three refinement zones. There
may be rare occasions when it makes sense to tune -XX:G1ConcRefinementThreads
or -XX:ParallelGCThreads. Chapter 3 explains more about concurrent refinement and
the refinement threads.
Concurrent Marking in G1 GC
With the introduction of G1 GC regions and liveness accounting per region, it became
clear that an incremental and complete concurrent marking algorithm was required.
Taiichi Yuasa presented an algorithm for incremental mark and sweep GC in which
he employed a “snapshot-at-the-beginning” (SATB) marking algorithm [5].
Yuasa’s SATB marking optimization concentrated on the concurrent marking
phase of the mark-sweep GC. The SATB marking algorithm was well suited for G1
GC’s regionalized heap structure and addressed a major complaint about the HotSpot
JVM’s CMS GC algorithm—the potential for lengthy remark pauses.
G1 GC establishes a marking threshold which is expressed as a percentage of the
total Java heap and defaults to 45 percent. The threshold can be set on the command
line using the -XX:InitiatingHeapOccupancyPercent (IHOP) option; when it is
crossed, a concurrent marking cycle is initiated. The marking task is divided into
chunks such that most of the work is done concurrently while the mutator threads are
active. The goal is to have the entire Java heap marked before it reaches its full capacity.
The SATB algorithm simply creates an object graph that is a logical “snapshot”
of the heap. SATB marking guarantees that all garbage objects that are present at
the start of the concurrent marking phase will be identified by the snapshot. Objects
allocated during the concurrent marking phase will be considered live, but they are
not traced, thus reducing the marking overhead. The technique guarantees that all
live objects that were alive at the start of the marking phase are marked and traced,
and any new allocations made by the concurrent mutator threads during the marking
cycle are marked as live and consequently not collected.
The marking data structures contain just two bitmaps: previous and next. The
previous bitmap holds the last complete marking information. The current marking
cycle creates and updates the next bitmap. As time passes, the previous marking
information becomes more and more stale. Eventually, the next bitmap will replace
the previous bitmap at the completion of the marking cycle.
Corresponding to the previous bitmap and the next bitmap, each G1 GC heap
region has two top-at-mark-start (TAMS) fields respectively called previous TAMS
(or PTAMS) and next TAMS (or NTAMS). The TAMS fields are useful in identifying
objects allocated during a marking cycle.
Figure 2.5 A heap region showing previous bitmap, next bitmap, PTAMS, and NTAMS during initial mark
Figure 2.6 A heap region showing previous bitmap, next bitmap, PTAMS, and NTAMS during concurrent marking
At the start of a marking cycle, the NTAMS field is set to the current top of each
region as shown in Figure 2.5. Objects that are allocated (or have died) since the
start of the marking cycle are located above the corresponding TAMS value and
are considered to be implicitly live. Live objects below TAMS need to be explicitly
marked. Let's walk through an example: in Figure 2.6, we see a heap region during concurrent marking.
[Heap region diagrams (previous bitmap, next bitmap, PTAMS, NTAMS) at later points in the marking cycle, including panels labeled "All Live Are Marked" and "All Objects Are Implicitly Live."]
At the end of the remark pause, all live objects above the PTAMS and below the
NTAMS are completely marked, as shown in Figure 2.9. Objects allocated during the
concurrent marking cycle are allocated above the NTAMS and are considered implicitly
live with respect to the next bitmap (see Figure 2.10).
Initial Mark
During initial mark, the mutator threads are stopped in order to facilitate the mark-
ing of all the objects in the Java heap that are directly reachable by the roots (also
called root objects).
Tip
Root objects are objects that are reachable from outside of the Java heap; native stack objects
and JNI (Java Native Interface) local or global objects are some examples.
Since the mutator threads are stopped, the initial-mark stage is a stop-the-world
phase. Also, since a young collection also traces roots and is stop-the-world, it is
convenient (and time efficient) to carry out initial marking at the same time as a
regular young collection. This is also known as "piggybacking." During the initial-mark
pause, the NTAMS value for each region is set to the current top of the region
(see Figure 2.5). This is done iteratively until all the regions of the heap are processed.
Concurrent Marking
The concurrent marking stage is concurrent and multithreaded. The command-line
option to set the number of concurrent threads to be used is -XX:ConcGCThreads.
By default, G1 GC sets the total number of concurrent threads to one-fourth of the
parallel GC threads (-XX:ParallelGCThreads).

While marking is active, a SATB pre-write barrier logs the previous value of any
reference field that the application overwrites, so that the snapshot stays complete.
The following pseudo-code illustrates the pre-write barrier:

if (marking_is_active) {
    pre_val := x.f;
    if (pre_val != NULL) {
        satb_enqueue(pre_val);
    }
}

Marking an object sets the corresponding bit in the marking bitmap (and pushes the
object onto a local marking stack if the object lies behind the finger). Marking then
iterates over the set bits in a section of the marking bitmap, tracing the field
references of the marked objects, setting more bits in the marking bitmap, and
pushing objects as necessary.
Live data accounting is piggybacked on the marking operation. Hence, every time
an object is marked, it is also counted (i.e., its bytes are added to the region’s total).
Only objects below NTAMS are marked and counted. At the end of this stage, the
next marking bitmap is cleared so that it is ready when the next marking cycle starts.
This is done concurrently with the mutator threads.
Tip
JDK 8u40 introduces a new command-line option, -XX:+ClassUnloadingWithConcurrentMark,
which, by default, enables class unloading with concurrent marking.
Hence, concurrent marking can track classes and calculate their liveness. And during the
remark stage, the unreachable classes can be unloaded.
Remark
The remark stage is the final marking stage. During this stop-the-world stage, G1 GC
completely drains any remaining SATB log buffers and processes any updates. G1 GC
also traverses any unvisited live objects. As of JDK 8u40, the remark stage is stop-the-
world, since mutator threads are responsible for updating the SATB log buffers and as
such “own” those buffers. Hence, a final stop-the-world pause is necessary to cover all
the live data and safely complete live data accounting. In order to reduce time spent
in this pause, multiple GC threads are used to process the log buffers in parallel. The
-XX:ParallelGCThreads option helps set the number of GC threads available during any
GC pause. Reference processing is also a part of the remark stage.
Tip
Any application that heavily uses reference objects (weak references, soft references, phantom
references, or final references) may see high remark times as a result of the reference-processing
overhead. We will learn more about this in Chapter 3.
Cleanup
During the cleanup pause, the two marking bitmaps swap roles: the next marking
bitmap becomes the previous one (given that the current marking cycle has been
finalized and the next marking bitmap now has consistent marking information), and
the previous marking bitmap becomes the next one (which will be used as the current
marking bitmap during the next cycle). Similarly, PTAMS and NTAMS swap roles as
well. Three major contributions of the cleanup pause are identifying completely free
regions, sorting the heap regions to identify efficient old regions for mixed garbage
collection, and RSet scrubbing. Current heuristics rank the regions according to live-
ness (the regions that have a lot of live objects are really expensive to collect, since
copying is an expensive operation) and remembered set size (again, regions with
large remembered sets are expensive to collect due to the regions’ popularity—the
concept of popularity was discussed in the “RSets and Their Importance” section).
The goal is to collect/evacuate the candidate regions that are deemed less expensive
(fewer live objects and less popular) first.
An advantage of identifying the live objects in each region is that on encountering
a completely free region (that is, a region with no live objects), its remembered set
can be cleared and the region can be immediately reclaimed and returned to the
list of free regions instead of being placed in the GC-efficient (the concept of GC
efficiency was discussed in the “Garbage Collection in G1” section) sorted array and
having to wait for a reclamation (mixed) garbage collection pause. RSet scrubbing
also helps detect stale references. So, for example, if marking finds that all the objects
on a particular card are dead, the entry for that particular card is purged from the
“owning” RSet.
There are other times when a humongous allocation may not be able to find
contiguous regions in the old generation for allocating humongous objects.
At such times, G1 GC will attempt to expand the Java heap. If the expansion of the
Java heap space is unsuccessful, G1 GC triggers its fail-safe mechanism and falls
back to a serial (single-threaded) full collection.
During a full collection, a single thread operates over the entire heap and does
mark, sweep, and compaction of all the regions (expensive or otherwise) constituting
the generations. After completion of the collection, the resultant heap now consists of
purely live objects, and all the generations have been fully compacted.
Tip
Prior to JDK 8u40, unloading of classes was possible only at a full collection.
The single-threaded nature of the serial full collection and the fact that the
collection spans the entire heap can make this a very expensive collection, especially
if the heap size is fairly large. Hence, it is highly recommended that a nontrivial
tuning exercise be done in such cases where full collections are a frequent occurrence.
Tip
For more information on how to get rid of evacuation failures, please refer to Chapter 3.
References
[1] Charlie Hunt and Binu John. Java™ Performance. Addison-Wesley, Upper Saddle
River, NJ, 2012. ISBN 978-0-13-714252-1.
[2] Tony Printezis and David Detlefs. “A Generational Mostly-Concurrent Garbage
Collector.” Proceedings of the 2nd International Symposium on Memory Management.
ACM, New York, 2000, pp. 143–54. ISBN 1-58113-263-8.
[3] Urs Hölzle. “A Fast Write Barrier for Generational Garbage Collectors.” Presented
at the OOPSLA’93 Garbage Collection Workshop, Washington, DC, October 1993.
[4] Darko Stefanovic, Matthew Hertz, Stephen M. Blackburn, Kathryn S. McKinley,
and J. Eliot B. Moss. “Older-First Garbage Collection in Practice: Evaluation in a Java
Virtual Machine.” Proceedings of the 2002 Workshop on Memory System Performance.
ACM, New York, 2002, pp. 25–36.
[5] Taiichi Yuasa. “Real-Time Garbage Collection on General Purpose Machines.”
Journal of Systems and Software, Volume 11, Issue 3, March 1990, pp. 181–98.
Elsevier Science, Inc., New York.
3
Garbage First Garbage Collector Performance Tuning
Tip
The serial stages of the young collection pause can be multithreaded and use the value
of -XX:ParallelGCThreads to determine the GC worker thread count.
Let’s look at an excerpt from a G1 GC output log generated while running DaCapo
with the HotSpot VM command-line option -XX:+PrintGCDetails. Here is the
command-line and “Java version” output:
The snippet shows one G1 GC young collection pause, identified in the first line
by (G1 Evacuation Pause) and (young). The line’s timestamp is 108.815, and
total pause time is 0.0543862 seconds:
The following lines show the major parallel work carried out by the eight worker
threads:
[GC Worker Start (ms): Min: 108815.5, Avg: 108815.5, Max: 108815.6, Diff: 0.1]
[Ext Root Scanning (ms): Min: 0.1, Avg: 0.2, Max: 0.2, Diff: 0.1, Sum: 1.2]
[Update RS (ms): Min: 12.8, Avg: 13.0, Max: 13.2, Diff: 0.4, Sum: 103.6]
[Processed Buffers: Min: 15, Avg: 16.0, Max: 17, Diff: 2, Sum: 128]
[Scan RS (ms): Min: 13.4, Avg: 13.6, Max: 13.7, Diff: 0.3, Sum: 109.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 25.1, Avg: 25.2, Max: 25.2, Diff: 0.1, Sum: 201.5]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4]
[GC Worker Total (ms): Min: 51.9, Avg: 52.0, Max: 52.1, Diff: 0.1, Sum: 416.0]
[GC Worker End (ms): Min: 108867.5, Avg: 108867.5, Max: 108867.6, Diff: 0.1]
GC Worker Start and GC Worker End tag the starting and ending timestamps
respectively of the parallel phase. The Min timestamp for GC Worker Start is the
time at which the first worker thread started; similarly, the Max timestamp for GC
Worker End is the time at which the last worker thread completed all its tasks. The
lines also contain Avg and Diff values in milliseconds. The things to look out for in
those lines are:
How far away the Diff value is from 0, 0 being the ideal.
Any major variance in Max, Min, or Avg. This indicates that the worker
threads could not start or finish their parallel work around the same time.
That could mean that some sort of queue-handling issue exists that requires
further analysis by looking at the parallel work done during the parallel
phase.
[Ext Root Scanning (ms): Min: 0.1, Avg: 0.2, Max: 0.2, Diff: 0.1, Sum: 1.2]
Here again, we look for Diff >> 0 and major variance in Max, Min, or Avg.
Tip
The variance (Diff) is shown for all the timed activities that make up the parallel phase.
A high variance usually means that the work is not balanced across the parallel threads for
that particular activity. This knowledge is an analysis starting point, and ideally a deeper dive
will identify the potential cause, which may require refactoring the Java application.
Another thing to watch out for is whether a worker thread is caught up in dealing
with a single root. We have seen issues where the system dictionary, which is treated
as a single root, ends up holding up a worker thread when there is a large number
of loaded classes. When a worker thread is late for “termination” (explained later in
this section), it is also considered held up.
[Update RS (ms): Min: 12.8, Avg: 13.0, Max: 13.2, Diff: 0.4, Sum: 103.6]
[Processed Buffers: Min: 15, Avg: 16.0, Max: 17, Diff: 2, Sum: 128]
In order to limit the time spent updating RSets, G1 sets a target time as a percentage
of the pause time goal (-XX:MaxGCPauseMillis). The target time defaults to
10 percent of the pause time goal. Any evacuation pause should spend most of its time
copying live objects, and 10 percent of the pause time goal is considered a reason-
able amount of time to spend updating RSets. If after looking at the logs you realize
that spending 10 percent of your pause time goal in updating RSets is undesirable,
you can change the percentage by updating the -XX:G1RSetUpdatingPauseTimePercent
command-line option to reflect your desired value. It is important to
remember, however, that if the number of updated log buffers does not change, any
decrease in RSet update time during the collection pause will result in fewer buffers
being processed during that pause. This will push the log buffer update work off onto
the concurrent refinement threads and will result in increased concurrent work and
sharing of resources with the Java application mutator threads. Also, worst case, if
the concurrent refinement threads cannot keep up with the log buffer update rate,
the Java application mutators must step in and help with the processing—a scenario
best avoided!
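For example (an illustrative setting, not a recommendation), a command line that lowers the RSet update target from the default 10 percent to 5 percent of the pause time goal might include:

-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1RSetUpdatingPauseTimePercent=5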
Tip
As discussed in Chapter 2, there is a command-line option called -XX:G1ConcRefinementThreads.
By default it is set to the same value as -XX:ParallelGCThreads, which means that any
change in -XX:ParallelGCThreads will change the -XX:G1ConcRefinementThreads value as well.
Before collecting regions in the current CSet, the RSets for the regions in the CSet
must be scanned for references into the CSet regions. As discussed in Chapter 2, a
popular object in a region or a popular region itself can lead to its RSet being coars-
ened from a sparse PRT (per-region table) to a fine-grained PRT or even a coarsened
bitmap, and thus scanning such an RSet will require more time. In such a scenario,
you will see an increase in the Scan RS time shown here since the scan times depend
on the coarseness gradient of the RSet data structures:
[Scan RS (ms): Min: 13.4, Avg: 13.6, Max: 13.7, Diff: 0.3, Sum: 109.0]
Another parallel task related to RSets is code root scanning, during which the code
root set is scanned to find references into the current CSet:
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
In earlier versions of HotSpot, the entire code cache was treated as a single root
and was claimed and processed by a single worker thread. A large and full or nearly
full code cache would thus hold up that worker thread and lead to an increase in
the total pause time. With the introduction of code root scanning as a separate
parallel activity, the work of scanning the nmethods is reduced to just scanning
the RSets for references from the compiled code. Hence for a particular region in
the CSet, only if the RSet for that region has strong code roots is the corresponding
nmethod scanned.
Tip
Developers often refer to the dynamically compiled code for a Java method by the HotSpot
term of art nmethod. An nmethod is not to be confused with a native method, which refers
to a JNI method. Nmethods include auxiliary information such as constant pools in addition
to generated code.
Tip
To reduce nmethod scanning times, only the RSets of the regions in the CSet are scanned for
references introduced by the compiler, rather than the “usual” references that are introduced
by the Java application mutator threads.
Tip
-XX:+G1SummarizeRSetStats is a diagnostic option and hence must be enabled by
adding -XX:+UnlockDiagnosticVMOptions to the command line, for example,
JAVA_OPTS="-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+PrintGCDetails
-XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=1 -Xloggc:jdk8u45_h2.log".
Before GC RS summary
After GC RS summary
The main things to look out for in this snippet are as follows:
The log output is printed for both before and after the GC pause. The Processed
cards tag summarizes the work done by the concurrent refinement threads and
sometimes, though very rarely, the Java application mutator threads. In this case, 96
completed buffers had 23,270 processed cards, and all (100 percent) of the work was
done by the concurrent RSet refinement threads. There were no RSet coarsenings as
indicated by Did 0 coarsenings.
Other parts of the log output describe concurrent RSet times and current RSet
statistics, including their sizes and occupied cards per region type (young, free,
humongous, or old). You can use the log to figure out how the code root sets are
referencing the RSets for each region type as well as the total number of references
per region type.
Tip
The ability to visualize four areas of potential improvement—RSet coarsenings, updating
RSets, scanning RSets, and scanning nmethods referencing RSets—can help significantly in
understanding your application and may pave the way for application improvements.
[Object Copy (ms): Min: 25.1, Avg: 25.2, Max: 25.2, Diff: 0.1, Sum: 201.5]
Tip
G1 GC uses the copy times as weighted averages to predict the time it takes to copy a single
region. Users can adjust the young generation size if the prediction logic fails to keep up with
the desired pause time goal.
Termination
After completing the tasks just described, each worker thread offers termination if its
work queue is empty. A thread requesting termination checks the other threads’ work
queues to attempt work stealing. If no work is available, it terminates. Termination
tags the time that each worker thread spends in this termination protocol. A GC
worker thread that gets caught up in a single root scan can be late to complete all
the tasks in its queue and hence eventually be late for termination.
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
If any (or all) worker threads are getting caught up somewhere, it will show up
in long termination times and may indicate a work-stealing or load-balancing issue.
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4]
[GC Worker Total (ms): Min: 51.9, Avg: 52.0, Max: 52.1, Diff: 0.1, Sum: 416.0]
Tip
The main GC thread is the VM thread that executes the GC VM operation during a safepoint.
Tip
During a mixed collection pause, Code Root Fixup will include the time spent in updating
non-evacuated regions.
Tip
For a young collection all the young regions are collected, hence there is no “choosing”
as such since all the young regions automatically become a part of the young CSet. The
“choosing” occurs for a mixed collection pause and becomes an important factor in
understanding how to “tame” your mixed collections. We will discuss choosing the CSet in
detail later in this chapter.
Reference processing and enqueuing are for soft, weak, phantom, final, and JNI
references. We discuss this and more in the section titled “Reference Processing Tuning.”
The act of reference enqueuing may require updating the RSets. Hence, the
updates need to be logged and their associated cards need to be marked as dirty.
The time spent redirtying the cards is shown as the Redirty Cards time (0.2 ms
in the preceding example).
Humongous Reclaim is new in JDK 8u40. If a humongous object is found to be
unreachable by looking at all references from the root set or young generation regions
and by making sure that there are no references to the humongous object in the RSet,
that object can be reclaimed during the evacuation pause. (See Chapter 2 for detailed
descriptions of humongous regions and humongous objects.)
The remainder of the Other time is spent in fixing JNI handles and similar work.
The collective time in Other should be very small, and any time hog should have a
reasonable explanation. As an example, you could see higher times in Free CSet
if your CSet per pause is very large. Similarly, Ref Proc and Ref Enq could show
higher times depending on how many references are used in your application. Similar
reasoning can be applied to Humongous Reclaim times, if you have many short-lived
humongous objects.
6.317: [GC pause (G1 Evacuation Pause) (young) 6.317: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 5800, predicted base time: 20.39 ms, remaining time: 179.61 ms, target pause time: 200.00 ms]
6.317: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 225 regions, survivors: 68 regions, predicted young region time: 202.05 ms]
6.317: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 225 regions, survivors: 68 regions, old: 0 regions, predicted pause time: 222.44 ms, target pause time: 200.00 ms]
, 0.1126312 secs]
In the above example, the pause time goal was left at its default value of 200ms,
and it was observed that even though the prediction logic predicted the pause time to
be 222.44ms, the actual pause time was only 112.63 ms. In this case, we could have
easily added more young regions to the CSet.
The second example above showcases a similar scenario to the one shown before,
only here the pause time target was changed to 50 ms (no adjustments were made
to the young generation). Once again, the prediction logic was off and it predicted a
pause time of 75.36 ms, whereas the actual pause time was 21.86 ms.
After adjusting the young generation (as can be seen in the number of eden and
survivor regions added to the CSet in the third example, below), we could get the
pause times to be in the 50 ms range as shown here:
Here, even though the prediction logic is still way off, our pause time (50.75 ms)
is in the desired range (50 ms).
Tip
Unlike CMS GC’s initiation of a marking cycle, which is with respect to its old generation size,
G1’s InitiatingHeapOccupancyPercent is with respect to the entire Java heap size.
The concurrent marking cycle starts with an initial marking pause which happens
at the same time as (aka, is “piggybacked onto”) a young collection pause. This pause
marks the beginning of the collection cycle and is followed by other concurrent and
parallel tasks for root region scanning, concurrent marking and liveness accounting,
final mark, and cleanup. Figure 3.1 shows all the pauses in a concurrent mark-
ing cycle: initial mark, remark, and cleanup. To learn more about the concurrent
marking cycle please refer to Chapter 2.
The concurrent marking tasks can take a long time if the application’s live object
graph is large and may often be interrupted by young collection pauses. The concur-
rent marking cycle must be complete before a mixed collection pause can start and is
immediately followed by a young collection that calculates the thresholds required to
trigger a mixed collection on the next pause, as shown in Figure 3.1. The figure shows
an initial-mark pause (which, as mentioned earlier, piggybacks on a young collection).
There could be more than one young collection when the concurrent phase is under
way (only one pause is shown in the figure). The final mark (also known as remark)
completes the marking, and a small cleanup pause helps with the cleanup activities
as described in Chapter 2. There is a young generation evacuation pause right after
the cleanup pause which helps prepare for the mixed collection cycle. The four pauses
after this young collection pause are the mixed collection evacuation pauses that
successfully collect all the garbage out of the target CSet regions.
Figure 3.1 Young collection pauses, mixed collection pauses, and pauses in a concurrent marking cycle
If any of the concurrent marking tasks and hence the entire cycle take too long
to complete, a mixed collection pause is delayed, which could eventually lead to an
evacuation failure. An evacuation failure will show up as a to-space exhausted
message on the GC log, and the total time attributed to the failure will be shown in
the Other section of the pause. Here is an example log snippet:
When you see such messages in your log, you can try the following to avoid the
problem:
It is imperative to set the marking threshold to fit your application’s static plus
transient live data needs. If you set the marking threshold too high, you risk
running into evacuation failures. If you set the marking threshold too low, you
may prematurely trigger concurrent cycles and may reclaim close to no space
during your mixed collections. It is generally better to err on the side of start-
ing the marking cycle too early rather than too late, since the negative conse-
quences of an evacuation failure tend to be greater than those of the marking
cycle running too frequently.
If you think that the marking threshold is correct, but the concurrent cycle is
still taking too long and your mixed collections end up “losing the race” to reclaim
regions and triggering evacuation failures, try increasing your total concurrent
thread count. -XX:ConcGCThreads defaults to one-fourth of -XX:ParallelGCThreads.
You can either increase the concurrent thread count directly or
increase the parallel GC thread count, which effectively increases the concurrent
thread count.
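For example, one might raise the concurrent thread count directly with something like the following (the value 4 is purely illustrative):

-XX:ConcGCThreads=4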
Tip
Increasing the concurrent thread count will take processing time away from mutator
(Java application) threads since the concurrent GC threads work at the same time as your
application threads.
A mixed collection pause collects the young regions plus some candidate regions
selected from the old generation. Tuning mixed collections can be broken
down to varying the number of old regions in the mixed collection’s CSet and adding
enough back-to-back mixed collections to diffuse the cost of any single one of them
over the time it takes to collect all eligible old regions. Taming mixed collections will
help you achieve your system-level agreement for GC overhead and responsiveness.
The -XX:+PrintAdaptiveSizePolicy option dumps details of G1’s ergonomics
heuristic decisions. An example follows.
First the command line:
The first line tells us the evacuation pause type, in this case a mixed collection,
and the predicted times for activities such as CSet selection and adding young and
old regions to the CSet.
On the fifth timestamp, tagged Mixed GCs, you can see G1 decide to continue with
mixed collections since there are candidate regions available and reclaimable bytes
are still higher than the default 5 percent threshold.
This example highlights two tunables: the number of old regions to be added to
the CSet as can be seen here:
The last line of this example shows that mixed GCs will be continued since there
is enough garbage to be reclaimed (10.24 percent). Here the reclaimable percentage
threshold is kept at its default value of 5 percent.
If mixed collections are becoming exponentially expensive as can be seen in
Figure 3.2, increasing this threshold will help. Remember, however, that the increase
will leave more regions fragmented and occupied. This means that the old generation
will retain more (transient) live data, which must be accounted for by adjusting your
marking threshold accordingly.
[Figure 3.2 Pause time (in milliseconds)]
The minimum threshold for the number of old regions to be included in the
CSet per mixed collection pause within a mixed collection cycle is specified by
-XX:G1MixedGCCountTarget and defaults to 8. As briefly discussed in Chapter 2,
the minimum number of old regions per mixed collection pause is
Minimum old CSet size per mixed collection pause = (total number of candidate old regions identified for the mixed collection cycle) / G1MixedGCCountTarget
This formula determines the minimum number of old regions per CSet for each
mixed collection to be x, so as to facilitate y back-to-back mixed collections that will
collect all candidate old regions. The set of back-to-back mixed collections carried out
after a completed concurrent marking cycle constitutes a mixed collection cycle. Let’s
look at line 4 of the preceding example:
Line 4 tells us that only 24 regions were added to the CSet, since the minimum
number of old regions to be added per CSet was not met. The previous (young)
collection pause tells us that there are 189 candidate old regions available for
reclamation, hence G1 GC should start a mixed collection cycle:
Lines 3 and 5 show that even though there were more candidate old regions
available for collection, the total number of old regions in the current CSet was
capped at 103 regions. As mentioned in the preceding description, the 103 region
count comes from the total heap size of 1G and the 10 percent default value for
G1OldCSetRegionThresholdPercent, which rounds up to 103.
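To spell out the arithmetic in this example: 189 candidate old regions divided by the default G1MixedGCCountTarget of 8 gives 23.6, which rounds up to the minimum of 24 old regions seen in line 4; and assuming the 1MB region size G1 chooses for a 1G heap, 10 percent of the resulting 1024 regions is 102.4, which rounds up to the 103-region cap.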
Now that we know how to specify the minimum and maximum number of regions
per CSet per mixed collection, we can modify the thresholds to suit our pause time
goal and at the same time maintain the desired amount of transient live data in the
old generation.
The -XX:G1MixedGCLiveThresholdPercent option, which defaults to 85 percent
(JDK 8u45), is the maximum percentage of live data within a region that will allow it
to be included in a CSet. Per-region live data percentages are computed during the con-
current marking phase. An old region deemed to be too expensive to evacuate—that is,
whose live data percentage is above the liveness threshold—is not included as a CSet
candidate region. This option directly controls fragmentation per region, so be careful.
Tip
Increasing the G1MixedGCLiveThresholdPercent value means that it will take longer to
evacuate old regions, which means that mixed collection pauses will also be longer.
Heap size. Make sure that you can accommodate all the static and transient
live data and your short- and medium-lived application data in your Java heap.
Apart from the accommodation of live data, additional Java heap space, or
headroom, should be available in order for GC to operate efficiently. The more
the available headroom, the higher the possible throughput and/or the lower
the possible latency.
Avoid over-specifying your JVM command-line options! Let the defaults work
for you. Get a baseline with just your initial and maximum heap settings and a
desired pause time goal. If you already know that the default marking threshold
is not helping, add your tuned marking threshold on the command line for the
baseline run. So, your base command would look something like this:
-Xms2g -Xmx4g -XX:MaxGCPauseMillis=100
or
-Xms2g -Xmx4g -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=55
If your application has long-lived humongous objects, make sure that your
marking threshold is set low enough to accommodate them. Also, make sure
that the long-lived objects that you deem “humongous” are treated as such by
G1. You can ensure this by setting -XX:G1HeapRegionSize to a value that
guarantees that objects greater than or equal to 50 percent of the region size
are treated as humongous. As mentioned in Chapter 2, the default value is
calculated based on your initial and maximum heap sizes and can range from
1 to 32MB.
Here is a log snippet using -XX:+PrintAdaptiveSizePolicy:
A concurrent cycle is being requested since the heap occupancy crossed the
marking threshold due to a humongous allocation request. The request was for
2,097,168 bytes, which is far larger than the 1MB G1 heap region size default
set by G1 at JVM start-up time.
There are times when evacuation failures are caused by not having enough
space in the survivor regions for newly promoted objects. When you observe
this happening, try increasing -XX:G1ReservePercent. The reserve percent
creates a false ceiling for the reserved space so as to accommodate any variation
in promotion patterns. The default value is 10 percent of the total Java heap and
is limited by G1 to 50 percent of the total Java heap since more would amount
to a very large amount of wasted reserve space.
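For example, raising the reserve from its default of 10 percent to 20 percent (an illustrative value) would look like this:

-XX:+UseG1GC -XX:G1ReservePercent=20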
Reference Processing
Garbage collection must treat Java reference objects—phantom references, soft
references, and weak references—differently from other Java objects. Reference
objects require more work to collect than non-reference objects.
In this section, you will learn how to identify if the time required to perform
reference processing during a G1 garbage collection pause is an issue for your
application and how to tune G1 to reduce this overhead, along with tips for isolating
which reference object type is inducing the most overhead. Depending on the
application and its pause time requirements, refactoring the application’s source
code may be required to reduce reference processing overhead.
Tip
Java Platform, Standard Edition (Java SE) API documentation for each of the reference
object types (http://docs.oracle.com/javase/8/docs/api) and the java.lang.ref package
APIs for phantom reference, soft reference, and weak reference are both good sources for
understanding how each reference object type behaves and how and when they are garbage
collected.
G1 reports the time spent enqueuing reference objects separately from the time spent
processing them. The two times are reported in the
Other section of the log on both young and mixed GCs:
Ref Proc is the time spent processing reference objects, and Ref Enq is the
time spent enqueuing reference objects. As the example suggests, the time spent in
Ref Enq is rarely as long as the time spent in Ref Proc. In fact, we have yet to see
an application that consistently has higher Ref Enq times. If it happens, it means
that the amount of effort required to process a reference is very small relative to its
enqueuing time, which is unlikely for most reference object types.
G1 also reports reference processing activity during the remark phase of G1's
concurrent cycle. Using -XX:+PrintGCDetails, the log output from remark will
include reference processing time:
Note that G1 is the only HotSpot GC that reports reference processing times with
-XX:+PrintGCDetails. However, all HotSpot GCs will report per-reference-object-
type information using -XX:+PrintReferenceGC.
As a general guideline, Ref Proc times in PrintGCDetails output that are
more than 10 percent of the total GC pause time for a G1 young or mixed GC are
cause to tune the garbage collector’s reference processing. For G1 remark events it
is common to see a larger percentage of time spent in reference processing since the
remark phase of the concurrent cycle is when the bulk of reference objects discovered
during an old generation collection cycle are processed. If the elapsed time for the G1
remark pause exceeds your target pause time and a majority of that time is spent in
reference processing, tune as described in the next section.
A large number of a particular reference object type indicates that the application
is heavily using it. Suppose you observe the following:
object types, and the amount of time to process them also dominates reference
processing time. Corrective actions include the following:
One reference object type to watch for and be careful of using is the soft reference.
If PrintReferenceGC log output suggests that a large number of soft references are
being processed, you may also be observing frequent old generation collection cycles,
which consist of concurrent cycles followed by a sequence of mixed GCs. If you see
a large number of soft references being processed and GC events are occurring too
frequently, or heap occupancy consistently stays near the maximum heap size, tune
the aggressiveness with which soft references are reclaimed using
-XX:SoftRefLRUPolicyMSPerMB. It defaults to a value of 1000, and its units are milliseconds.
The default setting of -XX:SoftRefLRUPolicyMSPerMB=1000 means that a
soft reference will be cleared and made eligible for reclamation if the time it was
last accessed is greater than 1000ms times the amount of free space in the Java
heap, measured in megabytes. To illustrate with an example, suppose
-XX:SoftRefLRUPolicyMSPerMB=1000, and the amount of free space is 1GB, that is, 1024MB.
Any soft reference that has not been accessed since 1024 × 1000 = 1,024,000ms, or
1024 seconds, or slightly over 17 minutes ago, is eligible to be cleared and reclaimed
by the HotSpot garbage collector.
The effect of setting a lower value for -XX:SoftRefLRUPolicyMSPerMB is to
provoke more aggressive clearing and reclamation of soft references, which leads to
lower heap occupancy after GC events, or in other words, less live data. Conversely,
setting -XX:SoftRefLRUPolicyMSPerMB higher causes less aggressive soft ref-
erence clearing and reclamation, which leads to more live data and higher heap
occupancy. Tuning -XX:SoftRefLRUPolicyMSPerMB may not actually lead to lower
reference processing times and in fact may increase them.
The primary reason to tune -XX:SoftRefLRUPolicyMSPerMB is to reduce
the frequency of old generation collection events by reducing the amount of live
data in the heap. We recommend against the use of soft references as a means for
implementing memory-sensitive object caches in Java applications because doing so
will increase the amount of live data and result in additional GC overheads. See the
sidebar “Using Soft References” for more detail.
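To make the discussion concrete, here is a minimal, hypothetical sketch of the kind of soft-reference-based, memory-sensitive cache being discussed; it is not code from this book, and the class and method names are invented for illustration. The collector may clear the referents under memory pressure, subject to -XX:SoftRefLRUPolicyMSPerMB:

import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Values are held only through SoftReferences, so the garbage collector is
// free to clear them; callers must be prepared for get() to return null.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            map.remove(key);   // the referent was cleared (or was never cached)
        }
        return value;
    }
}

Every live entry in such a cache adds to the live data the collector must track and to the reference objects it must process, which is exactly the overhead described in the preceding paragraphs.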
size Java heap with the same load. This means that if you are using soft references for a
memory-sensitive object cache, the size of that object cache may vary between garbage
collectors, not only between different JVM vendors, but also from one garbage collector to
another within the same JVM. You may observe completely different GC behavior such as
a difference in live data or heap occupancy, number of GCs, the frequency at which they
occur, and/or their duration by moving between HotSpot’s CMS, Parallel, and G1 GCs. JVMs
such as IBM’s J9 or Azul’s Zing may also behave differently with respect to soft reference
reclamation. This puts limitations on the portability of a Java application using soft references
since it will behave differently moving from one garbage collector to another, or at least
require JVM tuning of the constant multiplier, which may be nontrivial, in order to achieve
acceptable application behavior.
4
The Serviceability Agent
SA Components
The SA consists mostly of Java classes, but it also contains a small amount of native
code to read raw bits from the processes and the core files:
On Solaris, SA uses libproc to read bits from a process or a core file.
On Linux, SA uses a mix of /proc and ptrace (mostly the latter) to read bits
from a process. For core files, SA parses Executable and Linkable Format (ELF)
files directly.
On Windows, SA uses the Windows Debugger Engine interface (dbgeng.dll
library) to read the raw bits from the processes and core files.
The SA is delivered in two binaries: sa-jdi.jar, plus sawindbg.dll on Windows or
libsaproc.so on Solaris/Linux.
SA components are built as part of the standard build of the HotSpot repository.
The native code component of SA is placed in libsaproc.so or sawindbg.dll, and the
Java classes are placed in sa-jdi.jar.
The binary sa-jdi.jar provides the SA Java APIs and also includes useful debugging
tools implemented using these APIs. It also includes an implementation of the Java
Debug Interface (JDI), which allows JDI clients to do read-only debugging on core
files and hung processes.
The complete set of SA classes is shipped with the following JDK versions:
JDK 7 and later releases, on all platforms
JDK 6u17+ on Solaris and Linux
JDK 6u31+ on Windows
Prior to these versions, the SA was not shipped with the JDK on Windows, and
only a subset of SA classes was shipped with JDKs on Solaris and Linux. The JDK
versions listed make the complete set of SA classes available on all of these platforms.
:
class constantPoolCacheOopDesc: public arrayOopDesc {
friend class VMStructs;
private:
// the corresponding constant pool
constantPoolOop _constant_pool;
:
nonstatic_field(constantPoolCacheOopDesc, _constant_pool, constantPoolOop)
From the _constant_pool field in the file vmStructs.cpp, the SA knows there
is a class named constantPoolCacheOopDesc that has a field with the name
_constant_pool of type constantPoolOop in the Java HotSpot VM.
Note that VMStructs is declared as a friend class. Most of the classes in HotSpot
declare VMStructs to be a friend so that the private fields of that class can be
accessed in VMStructs.
During the HotSpot build, vmStructs.cpp is compiled into vmStructs.o, which is
included in the shared library libjvm.so or jvm.dll. vmStructs.o contains all the data
that the SA needs to read the HotSpot data internal representations. And at runtime,
the SA can read this data from the target VM.
The structure and field names declared in vmStructs.cpp are used by the
corresponding Java code in the SA. Thus, if a field named in vmStructs.cpp is deleted
or renamed, the corresponding Java code that accesses that field also needs to be
modified. If declarations in VMStructs and the Java code in SA are not in sync, SA
will fail when it tries to examine a process/core file.
SA Version Matching
As we saw in the previous section, the Java code in SA is a mirror of the C++ code
in HotSpot. If some data structures or algorithms are changed, added, or removed
in HotSpot, the same changes have to be made in the SA Java code. Due to this
tight coupling between the SA Java code and the HotSpot implementation, an SA
instance can reliably debug only the HotSpot VM that was built from the same
repository as that of the SA instance. In order to detect the version mismatch
between the SA and the target HotSpot, we place a file named sa.properties
into sa-jdi.jar during the HotSpot build process. This file contains an SA version
property, for example,
sun.jvm.hotspot.runtime.VM.saBuildVersion=24.0-b56
At runtime, SA reads this property, compares it with the version of the target
HotSpot VM being analyzed, and throws a VMVersionMismatchException if the
versions do not match. This check can be disabled by running the SA tools with the
following system property:
-Dsun.jvm.hotspot.runtime.VM.disableVersionCheck
With this option, SA does not complain if its version does not match the version of
the target VM and attempts to attach to it. This option is useful if you want to attach
the SA to a non-matching version of the target HotSpot VM.
HSDB
HotSpot Debugger is the main GUI tool. It facilitates examining a Java process,
core file, and also a remote Java process. Let’s see how we can launch and use it on
a Windows machine.
First, let’s set the JAVA_HOME environment variable to the folder where the JDK
that we want to use is installed so that we can use this variable wherever we need
to access files/folders in that JDK:
set JAVA_HOME=d:\Java\7u40
On Windows, the PATH environment variable should contain the location of the
JVM binary used by the target process/core and also the folder where Microsoft
Debugging Tools for Windows is installed on the machine, for example:
set PATH=%JAVA_HOME%\bin\server;d:\windbg;%PATH%
%JAVA_HOME%\bin\java -classpath %JAVA_HOME%\lib\sa-jdi.jar sun.jvm.hotspot.HSDB
On Solaris and Linux, we just need to set JAVA_HOME to point to the installed JDK
and then launch the tool as in the following:
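For example, a command of the following form (mirroring the Windows invocation above and assuming the standard $JAVA_HOME/lib location of sa-jdi.jar) launches the tool:

$JAVA_HOME/bin/java -classpath $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.HSDB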
These launch commands bring up the HSDB GUI tool as shown in Figure 4.1.
Beginning with JDK 9, there are two different ways to launch HSDB. The first is to
launch the HSDB class directly:

set PATH=d:\Java\jdk9-b102\bin;%PATH%
java sun.jvm.hotspot.HSDB

The second is to use the new jhsdb launcher, which bundles the SA tools under a
single command and lists its modes when run without arguments:

set PATH=d:\Java\jdk9-b102\bin;%PATH%
jhsdb.exe
    clhsdb           command line debugger
    hsdb             ui debugger
    jstack --help to get more information
    jmap --help to get more information
    jinfo --help to get more information

For example, jhsdb.exe hsdb brings up the GUI debugger, and jhsdb.exe clhsdb brings
up the command-line debugger with its hsdb> prompt.
The following shows how to start the jstack tool and the additional options that can
be passed to it:
jhsdb.exe jstack
--locks to print java.util.concurrent locks
--mixed to print both java and native frames (mixed mode)
--exe executable image name
--core path to coredump
--pid pid of process to attach
HSDB offers three ways to attach to a debuggee:
Attach to a local HotSpot process
Attach to a core file
Attach to a remote debug server
Tip
Note that the HotSpot core/crash dump file can be very large depending on the amount of
state information it contains at the time of core/crash dump file generation. You may have
to configure the operating system to generate large core files and also ensure that the file
system where the file is generated has sufficient space.
To open a core file with the SA, launch HSDB and click File > Open HotSpot core
file as shown in Figure 4.5. Then, enter the path to the core and the path to the Java
executable as shown in Figure 4.6.
After attaching to the core file, HSDB provides the same set of tools to explore the
core as it does with the live process.
1. Start the RMI registry on the remote machine, with sa-jdi.jar in the bootclasspath:
$JAVA_HOME/bin/rmiregistry -J-Xbootclasspath/p:${JAVA_HOME}/lib/sa-jdi.jar
This command creates and starts a remote object registry. One can specify a port
number at which this registry should be started. If no port number is specified,
the registry is started on port 1099.
2. Start the debug server on the remote machine, specifying the process or core
file to be debugged:
or
uniqueID is an optional string. If we want to run more than one debug server at
the same time on the same machine, we must specify a different uniqueID string
for each debug server. The debug server starts the RMI registry at the default port
1099 if the RMI registry was not already started.
Now, let’s start a Java process on a Solaris SPARC machine, attach a debug server
to it, and then connect to that debug server from a Windows machine.
3. Start the debug server, passing it the HotSpot process ID and the unique name
we want to assign to this debuggee:
From the Windows machine, we can now connect to this specific debug server
using the unique identifier and the hostname in [uniqueID@]hostname format as
shown in Figure 4.7. Once the unique ID and hostname have been entered and the
OK button has been pressed, the SA will display a status window saying it is trying
to connect to the debug server, as shown in Figure 4.8.
After connecting the HSDB to the debug server, we can use all the utilities
available under the Tools menu and debug the process as if it were running on our
local machine.
HSDB Tools
HSDB offers many utilities that help us explore and debug Java processes or core
files.
Java Threads
The first window that appears in HSDB when it is connected to a Java process or
core file is the panel that displays all the Java threads in the target JVM. Figure 4.9
shows all the Java threads in the attached Java process.
This panel has some interesting icons at the top left to show information on the
selected Java thread:
Inspect Thread: This icon brings up the Object Inspector window showing the
VM representation of the thread object. See Figure 4.10.
Show Stack Memory: This shows the stack data with symbol information at the
stack memory of the selected thread as in Figure 4.11.
Show Java Stack Trace: This shows the Java stack trace of a thread. The method
names and addresses are hyperlinks in the displayed stack trace, and clicking
these method links shows the method details in the lower part of the window.
See Figure 4.12.
Show Thread Information: This shows detailed information about the selected
thread, as shown in Figure 4.13.
Find Crashes: This last icon on the Java Threads panel searches for whether
any of the threads encountered a crash and, if so, shows details about the crash.
Now let’s take a quick look at the utilities available in this GUI tool under the
Tools menu, as shown in Figure 4.14.
Some of these tools are helpful in debugging Java-level issues, and some are very
useful in troubleshooting JVM-level problems.
Class Browser
With the Class Browser (see the example in Figure 4.15), we can see all the classes
loaded by the target VM. It also allows us to dump the class files for all or some
selective classes loaded in the target VM. This tool is very useful when we do not have
access to the source code of the application and just have the core dump file and we
need to investigate some issue with the loaded classes. For example, if some loaded
class is not behaving as expected, we can dump that class, look at its code, and try
to figure out the problem. Or perhaps there are too many loaded classes and we are
getting out-of-memory errors; with the Class Browser we can look through the classes
and see if some unneeded classes are also getting loaded or whether the classes are
getting unloaded as expected.
Deadlock Detection
This feature helps us detect the Java-level deadlock among the threads. If a Java-level
deadlock exists among the threads in the target VM, this tool prints information
about the threads involved in the deadlock and also the monitors they are waiting
to acquire. Figure 4.16 shows an example where no deadlocks were found.
Object Inspector
We can inspect the Java objects as well as the VM internal C++ structures using the
Object Inspector tool.
To inspect a Java object, we need to provide the object’s address in the Java heap;
this tool shows the internal fields of the object as in Figure 4.17.
Object Inspector can also show the VM internal C++ structures that are described
by the VMStructs database in the target VM. See Figure 4.18.
Object Histogram
With the Object Histogram tool, an example of which is shown in Figure 4.19, we can
get a histogram of the objects present in the Java heap. This tool helps in diagnosing
memory leaks or out-of-memory-related problems in Java programs.
This tool also has a feature to show the instances of a particular class. Clicking on
the search icon in the top right corner brings up the window showing all the instances
of the selected class. See Figure 4.20.
We can select any instance from the list of instances and get the liveness path
of the object through which that object is reachable and is considered alive by the
garbage collector, as shown in Figure 4.21.
We can also see the liveness path of the object in another window by using the
Compute Liveness and Show Liveness buttons available in the Inspector window.
See Figure 4.23.
Find Pointer
This tool can help us find where a particular address lies in the Java process address
space. This is particularly useful when we are dealing with JVM crashes and want
to know the details of the memory address being accessed when the JVM crashed.
For instance, where in the JVM does the address reside? Is the address at a location
in the Java heap, in the eden space, in the old generation space? Figure 4.25 shows
an example of an address that has been entered and the resulting information about
the address that is displayed.
Memory Viewer
Memory Viewer shows the raw memory contents in hexadecimal format at any given
heap address. See Figure 4.28.
Code Viewer
Code Viewer (see the example in Figure 4.30) can show us the bytecodes of a method
and the JIT compiler-generated machine code of the method if the method has been
compiled. This tool is very useful in troubleshooting JIT compiler issues. Many
times we encounter problems where a certain method compiled by the server/client
compiler is either producing unexpected results or is causing a JVM crash. By looking
at the disassembled generated compiled code (see Figure 4.31), we can get to the root
of such issues.
Heap Parameters
The Heap Parameters utility shows us the heap boundaries of various generations
of the Java heap. It is helpful in finding out the heap region in which a particular
address lies. See Figure 4.32 for an example.
System Properties
We can get the system properties used by the target VM using the System Properties
tool. An example is shown in Figure 4.33.
VM Version Info
This utility shows the detailed JVM version of the target process or core file. An
example is shown in Figure 4.34.
CLHSDB
Command-Line HotSpot Debugger is the command-line variant of the HSDB. To
launch CLHSDB, we need to set the same environment variables as we did for the
HSDB. Use the following command to launch this tool:
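A command of the following form, mirroring the HSDB launch shown earlier (the classpath shown is an assumption based on the standard JDK layout), starts CLHSDB:

%JAVA_HOME%\bin\java -classpath %JAVA_HOME%\lib\sa-jdi.jar sun.jvm.hotspot.CLHSDB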
It offers almost all the features that the UI version of the tool does; for example, to
examine any Java object or VM data structure there is a command called inspect. The
universe command, shown next, prints the heap boundaries of the various generations:
hsdb> universe
Heap Parameters:
Gen 0:  eden [0x23f50000,0x23fae5a0,0x243a0000) space capacity = 4521984, 8.546337182971014 used
        from [0x243a0000,0x243a0000,0x24420000) space capacity = 524288, 0.0 used
        to   [0x24420000,0x24420000,0x244a0000) space capacity = 524288, 0.0 used
Invocations: 0
Gen 1:  old  [0x294a0000,0x294a0000,0x29f50000) space capacity = 11206656, 0.0 used
Invocations: 0
perm [0x33f50000,0x33f68700,0x34b50000) space capacity = 12582912, 0.7954915364583334 used
  ro space: [0x37f50000,0x383d2e40,0x38950000) space capacity = 10485760, 45.1129150390625 used
  rw space: [0x38950000,0x38fd67b8,0x39550000) space capacity = 12582912, 54.37768300374349 used
Invocations: 0
List of Commands
Here is the complete list of commands available with the CLHSDB tool:
hsdb> help
Available commands:
assert true | false
attach pid | exec core
detach
dumpcfg { -a | id }
dumpcodecache
dumpideal { -a | id }
dumpilt { -a | id }
echo [ true | false ]
examine [ address/count ] | [ address,address]
field [ type [ name fieldtype isStatic offset address ] ]
findpc address
flags [ flag | -nd ]
help [ command ]
history
inspect expression
intConstant [ name [ value ] ]
jhisto
jstack [-v]
livenmethods
longConstant [ name [ value ] ]
pmap
print expression
printas type expression
printmdo [ -a | expression ]
printstatics [ type ]
pstack [-v]
quit
reattach
revptrs address
scanoops start end [ type ]
search [ heap | perm | rawheap | codecache | threads ] value
source filename
symbol address
symboldump
symboltable name
thread { -a | id }
threads
tokenize ...
type [ type [ name super isOop isInteger isUnsigned size ] ]
universe
verbose true | false
versioncheck [ true | false ]
vmstructsdump
where { -a | id }
FinalizerInfo
This tool prints details on the finalizable objects in the target VM:
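A typical invocation, assuming sa-jdi.jar under $JAVA_HOME/lib and with <pid> as a placeholder for the target process ID, is:
java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.tools.FinalizerInfo <pid>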
HeapDumper
This tool can dump the Java heap to a file in the hprof format:
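For example (same classpath assumption as above; <pid> is a placeholder):
java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.tools.HeapDumper <pid>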
PermStat
This tool prints the statistics of the permanent generation of the attached process
or the core file:
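A possible invocation for a live process is shown below; the SA tools generally also accept the path to the Java executable followed by the path to a core file in place of the process ID:
java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.tools.PermStat <pid>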
PMap
This tool prints the process map of the target process/core much like the Solaris
pmap tool:
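For example (classpath assumption as above; <pid> is a placeholder):
java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.tools.PMap <pid>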
Object Histogram
Object histograms can be collected using the utilities available in HSDB and CLHSDB.
A standalone tool is also available that can be used to dump object histograms from
the target VM:
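A possible invocation of the standalone tool (classpath assumption as above; <pid> is a placeholder) is:
java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.tools.ObjectHistogram <pid>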
jhat also provides an interface to use the Object Query Language (OQL). Good documentation on this
language is available in the jhat tool. That help documentation can also be accessed
from https://blogs.oracle.com/poonam/entry/object_query_language_help.
ClassDump
Using this tool, we can dump the loaded classes from the target VM. It is possible
to dump a single class or multiple classes from the selected packages. A few system
properties are available that can be set to specify the name of the class that we want
to dump or the list of packages from which we want to dump the classes. These are
listed in Table 4.1.
Here is how we can attach the ClassDump utility to a running process and dump
the loaded classes in a folder specified using the -Dsun.jvm.hotspot.tools.
jcore.outputDir property:
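A possible invocation (the output directory name classes and <pid> are placeholders; the classpath assumption is as in the earlier examples):
java -cp $JAVA_HOME/lib/sa-jdi.jar -Dsun.jvm.hotspot.tools.jcore.outputDir=classes sun.jvm.hotspot.tools.jcore.ClassDump <pid>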
myserver 20 % ls classes
./ ../ TestClass.class java/ sun/
As we can see, the ClassDump utility has dumped the classes loaded in the process
in the classes/ folder. Similarly, this tool can attach to a core file or to a remote debug
server and dump the classes loaded in the target VM.
DumpJFR
DumpJFR is an SA-based tool that can be used to extract Java Flight Recorder (JFR)
information from the core files and live HotSpot processes.
Java Flight Recorder and Mission Control tools are shipped with the JDK since
JDK 7u40. As we know, the Java Flight Recorder is a tool for collecting diagnostic
and profiling data about a running Java application. It is integrated into the JVM,
and its usage causes almost no performance overhead. Java Mission Control can be
used to analyze the data collected by the Java Flight Recorder.
DumpJFR provides the capability to extract the JFR data from the core files of
crashes, or hung Java processes, which otherwise is not possible to access. This tool
is shipped with the JDK since JDK 8u60.
DumpJFR tool sources are present under hotspot/src/closed/agent/ in the repository.
During the build process of HotSpot sources, DumpJFR class files get included into
sa-jdi.jar when the closed part of the HotSpot repository gets built.
Please note that Java Flight Recorder and Mission Control, and hence this tool,
require a commercial license for use in production.
This is how we can attach DumpJFR to a live process and dump the JFR data
into a recording file:
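The tool class name below (sun.jvm.hotspot.tools.DumpJFR) and the invocation style are assumptions based on how the other SA tools are launched; <pid> is a placeholder:
java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.tools.DumpJFR <pid>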
This attaches DumpJFR to a core file to extract the Java Flight Recorder
information:
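Again assuming the same tool class name, with the Java executable and core file paths as placeholders:
java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.tools.DumpJFR <path to java executable> <path to core file>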
The DumpJFR tool dumps the JFR data to a file called recording.jfr in the current
working folder. This recording file can be analyzed using Java Mission Control.
JSDB
JavaScript Debugger provides a JavaScript interface to the SA. It is a command-line
JavaScript shell based on Mozilla’s Rhino JavaScript Engine.
More details on this utility can be found in the open-source HotSpot repository in
the file hotspot/agent/doc/jsdb.html.
1. Copy all the libraries used by the program from the core host to the debugger
host, say, to a folder /space/corelibs/. Note that the libraries can be copied either
directly under /space/corelibs/ or to a full directory path under /space/corelibs/.
For example, /local/java/jre/lib/sparc/server/libjvm.so from the core host can be
copied directly either under /space/corelibs/ or under /space/corelibs/local/java/
jre/lib/sparc/server/ on the debugger host. Similarly, /usr/lib/libthread_db.so
from the core host can be copied either to /space/corelibs/ or to /space/corelibs/
usr/lib/ on the debugger host.
The list of required library files can be obtained either from the hs_err log file
under the section “Dynamic Libraries” or by using the native debuggers such
as gdb, dbx, and WinDbg.
2. Then set the SA environment variable SA_ALTROOT to the folder containing the
shared libraries on the debugger host, that is, setenv SA_ALTROOT /space/
corelibs/.
Now, for the core file core.16963 that we tried to open in the previous section, we
copied all the required libraries from the core host to the system where we want to
open the core with SA and then set the environment variable SA_ALTROOT:
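For example (the executable path is illustrative, following the directory names used above):
setenv SA_ALTROOT /space/corelibs/
$JAVA_HOME/bin/java -cp $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.HSDB /local/java/jre/bin/java core.16963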
JDI Implementation
The SA binary sa-jdi.jar also has an implementation of the Java Debug Interface
(JDI) that makes it possible for any JDI client (e.g., JDB) to attach to the core files
and also Java processes using the JDI Connectors provided by this implementation.
The VM object returned by the attach() method of these connectors is read-only.
This means that the obtained VM object cannot be modified. So, JDI clients using
these connectors should not call any JDI methods that are defined to throw a
VMCannotBeModifiedException.
$JAVA_HOME/bin/jdb
-J-Xbootclasspath/a:$JAVA_HOME/lib/sa-jdi.jar:$JAVA_HOME/lib/tools.jar
-connect sun.jvm.hotspot.jdi.SACoreAttachingConnector:core=
${CORE_FILE},javaExecutable=${EXEC_FILE}
$JAVA_HOME/bin/jdb
-J-Xbootclasspath/a:$JAVA_HOME/lib/sa-jdi.jar:$JAVA_HOME/lib/tools.jar
-connect sun.jvm.hotspot.jdi.SAPIDAttachingConnector:pid=2402
1. Start the RMI registry on the remote machine where the process or core file to be debugged resides:
${JAVA_HOME}/bin/rmiregistry
-J-Xbootclasspath/p:${JAVA_HOME}/lib/sa-jdi.jar
This command creates and starts a remote object registry. An optional port number
may be specified as the port at which the registry should be started. If no port
number is specified, the registry is started on port 1099.
2. Start the debug server on the remote machine, specifying the process or core
file to be debugged:
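Assuming the SA debug server class sun.jvm.hotspot.DebugServer and sa-jdi.jar under $JAVA_HOME/lib, a typical invocation for a live process is:
${JAVA_HOME}/bin/java -cp ${JAVA_HOME}/lib/sa-jdi.jar sun.jvm.hotspot.DebugServer <pid> [uniqueID]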
or
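for a core file (same assumptions):
${JAVA_HOME}/bin/java -cp ${JAVA_HOME}/lib/sa-jdi.jar sun.jvm.hotspot.DebugServer <path to java executable> <path to core file> [uniqueID]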
SA Debug Server starts the RMI registry at port 1099 if the registry is not already
running.
An alternative to these two steps is to use the jsadebugd utility that is shipped
with the JDK to start the RMI registry and the debug server on the remote machine.
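Its usage mirrors the two-step approach; the process ID, executable, core file, and uniqueID are placeholders:
${JAVA_HOME}/bin/jsadebugd <pid> [uniqueID]
${JAVA_HOME}/bin/jsadebugd <path to java executable> <path to core file> [uniqueID]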
uniqueID is an optional string. If more than one debug server is to run at the
same time on the same machine, each must have a different uniqueID string.
Extending Serviceability Agent Tools
package sun.jvm.hotspot.tools;
import java.io.PrintStream;
import java.util.*;
import sun.jvm.hotspot.runtime.*;
To write our own tool, we extend the Tool class and implement the run() method,
which contains the main functionality of the tool. In this example, we obtain the
system properties using the VM class and then print those properties to the
standard output.
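A minimal sketch of such a tool, matching the sun.jvm.hotspot.tools.SysPropsDumper name used in the output below and repeating the package and import statements shown above for completeness, might look like this (the formatting of the printed output may differ):

package sun.jvm.hotspot.tools;

import java.io.PrintStream;
import java.util.*;

import sun.jvm.hotspot.runtime.*;

public class SysPropsDumper extends Tool {

   public void run() {
      // VM.getVM() is valid once the Tool base class has attached to the target VM
      Properties sysProps = VM.getVM().getSystemProperties();
      PrintStream out = System.out;
      if (sysProps == null) {
         out.println("System properties not available");
         return;
      }
      for (String key : sysProps.stringPropertyNames()) {
         out.println(key + " = " + sysProps.getProperty(key));
      }
   }

   public static void main(String[] args) {
      SysPropsDumper dumper = new SysPropsDumper();
      dumper.start(args);   // attaches to the target given on the command line
      dumper.stop();        // detaches from the target
   }
}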
Let’s compile and run this tool against a running process and see what the output
looks like:
sun.jvm.hotspot.tools.SysPropsDumper 5880
Attaching to process ID 5880, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 24.0-b56
java.runtime.name = Java(TM) SE Runtime Environment
java.vm.version = 24.0-b56
sun.boot.library.path = D:\Java\7u40\jre\bin
java.vendor.url = http://java.oracle.com/
java.vm.vendor = Oracle Corporation
path.separator = ;
file.encoding.pkg = sun.io
java.vm.name = Java HotSpot(TM) Client VM
sun.os.patch.level = Service Pack 1
sun.java.launcher = SUN_STANDARD
user.script =
user.country = US
user.dir = D:\tests
java.vm.specification.name = Java Virtual Machine Specification
java.runtime.version = 1.7.0_40-b43
java.awt.graphicsenv = sun.awt.Win32GraphicsEnvironment
os.arch = x86
java.endorsed.dirs = D:\Java\7u40\jre\lib\endorsed
line.separator =
java.io.tmpdir = C:\Users\pobajaj\AppData\Local\Temp\
java.vm.specification.vendor = Oracle Corporation
user.variant =
os.name = Windows 7
sun.jnu.encoding = Cp1252
java.library.path = D:\Java\7u40\bin;C:\windows\Sun\Java\bin;C:\windows\
system32;C:\windows;D:\Java\7u40\bin;C:\windows\system32;C:\windows;C:
\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\Program
Files (x86)\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Micros
oft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\
DTS\Binn\;D:\Program Files\Perforce;.
java.specification.name = Java Platform API Specification
java.class.version = 51.0
sun.management.compiler = HotSpot Client Compiler
os.version = 6.1
user.home = C:\Users\pobajaj
user.timezone =
java.awt.printerjob = sun.awt.windows.WPrinterJob
file.encoding = Cp1252
java.specification.version = 1.7
user.name = pobajaj
java.class.path = .
java.vm.specification.version = 1.7
sun.arch.data.model = 32
sun.java.command = TestClass
java.home = D:\Java\7u40\jre
user.language = en
java.specification.vendor = Oracle Corporation
awt.toolkit = sun.awt.windows.WToolkit
Serviceability Agent Plugin for VisualVM
After we are done, we can detach the process from the SA by clicking the Detach from
Process button. The process will resume.
Opening and diagnosing core files is similar to exploring processes with SA-Plugin.
First we need to open the core file in VisualVM and then attach to that core from the
SA Plugin view. As in the case of a process, we can explore objects in the Java heap,
look at thread objects, search for values in the Java heap, and look at the compiled
code of methods from the core file. SA-Plugin’s capability of exploring the core files
helps a great deal in the postmortem of dead processes.
SA-Plugin Utilities
This plugin makes some of the Serviceability Agent utilities available in VisualVM
in four panels, described in the following sections.
Java Threads
Double-clicking a thread in the Java Threads panel opens that thread's object in the
Oop Inspector panel, and clicking the Show Stack Trace icon on the Java Threads
panel shows the stack trace of the thread. An example of the Show Stack Trace panel
is shown in Figure 4.38.
Oop Inspector
We can inspect the Java objects as well as the VM internal C++ structures using this
panel. All the Java objects in the HotSpot VM are represented as oops—ordinary
object pointers. The Oop Inspector panel shows details of oops. The ability to see all
the internal fields of any Java object provides great debugging help. This panel also
provides the ability to compute the liveness of any oop in the Java heap. An example
of the Oop Inspector panel is shown in Figure 4.39.
Code Viewer
This panel shows the bytecodes of a method and the JIT compiler-generated machine
code of the method if the method has been compiled. See Figures 4.40 and 4.41 for
examples of the Code Viewer panel. Figure 4.40 shows the bytecodes of a method, and
Figure 4.41 shows the JIT compiler-generated machine code for the compiled method.
Find Panel
This panel, seen in Figure 4.42, has three utilities:
Find Pointer helps us find where a particular address lies in the Java process
address space.
Find Value in Heap helps us find all the locations in the Java heap where a
particular value is present.
Find Value in CodeCache helps us find all the locations in the compiler code
cache where a particular value is present.
Troubleshooting Problems Using the SA
Diagnosing OutOfMemoryError
java.lang.OutOfMemoryError is thrown when there is insufficient space
to allocate new objects in the Java heap or in the permanent generation space
or metaspace. (Please note that in JDK 8, permanent generation space has been
removed in favor of the metaspace.) At that point, the garbage collector cannot make
any more space available and the heap cannot be expanded further.
The possible reasons for this error could be that the footprint of the application is
large and cannot be accommodated in the specified Java heap, or there is a memory
leak in the application.
Consider a Java program (its full source code is shown later in this section) that
throws java.lang.OutOfMemoryError in the Java heap space when run:
D:\tests>java MemoryError
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Vector.<init>(Vector.java:131)
at java.util.Vector.<init>(Vector.java:144)
at java.util.Vector.<init>(Vector.java:153)
at Employee.<init>(MemoryError.java:42)
at MemoryError.main(MemoryError.java:14)
There are several ways to diagnose the cause of this error:
Collect the heap dump from the process using the jmap utility, the JConsole
utility, or using the -XX:+HeapDumpOnOutOfMemoryError JVM option, and
then analyze the heap dump using jhat or VisualVM.
Collect the heap histogram from the running process using the SA tools.
Collect the core or crash dump file at the occurrence of OutOfMemoryError
using the -XX:OnOutOfMemoryError option, and then obtain the heap
histogram from that core/crash dump file using the SA tools.
Since our program does not run for long and does not give us enough time to attach
any tool to the running process, we run the process with -XX:OnOutOfMemoryError
to produce a crash dump file when the OutOfMemoryError occurs. We are running
this program on a Windows machine.
D:\tests>java -XX:OnOutOfMemoryError="D:\Tools\userdump8.1\x64\userdump
%p" MemoryError
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="D:\Tools\userdump8.1\x64\userdump %p"
# Executing "D:\Tools\userdump8.1\x64\userdump 4768"...
User Mode Process Dumper (Version 8.1.2929.5)
Copyright (c) Microsoft Corp. All rights reserved.
Dumping process 4768 (java.exe) to
d:\tests\java.dmp...
The process was dumped successfully.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Vector.<init>(Vector.java:131)
at java.util.Vector.<init>(Vector.java:144)
at java.util.Vector.<init>(Vector.java:153)
at Employee.<init>(MemoryError.java:42)
at MemoryError.main(MemoryError.java:14)
To generate a core file on Solaris or Linux platforms, the program can be run with
-XX:OnOutOfMemoryError set to invoke an operating system core dump utility.
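For example, assuming the gcore utility is available on the system, an invocation along these lines produces a core file when the error occurs:
java -XX:OnOutOfMemoryError="gcore %p" MemoryError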
Now, let’s open this crash dump file in the HSDB tool:
Get the object histogram of the heap by clicking Tools > Object Histogram, as
shown in Figure 4.43.
This histogram shows that apart from the system classes, Address and Employee
classes appear near the top of the histogram and occupy a lot of space in the heap.
This tells us that the instances of these classes are the main culprits for the
OutOfMemoryError.
In the Object Histogram utility, we can find out all the instances of a particular
class that are present in the Java heap. To do this, there is an icon at the top right
corner of the Object Histogram window. Let’s find all the instances for the Address
class. Figure 4.44 shows the Object Histogram window actively searching for all
address instances.
Finding all the instances may take some time as the tool has to traverse the whole
heap. This will bring up the Show Objects of Type window for the Address class as
shown in Figure 4.45.
Now, by clicking on the Compute Liveness button we can get the liveness path (the
reference path by which an instance is reachable from the GC roots) for the instances
of the Address class. See Figure 4.46.
Then we can see the liveness path of the selected instance by clicking on the Show
Liveness Path button.
The liveness path of the Address instance at 0x2564b698 shows that it is
reachable from a Vector field employeesList.
Here is the Java code of the test program:
import java.util.*;

public class MemoryError {
    static Vector employeesList;

    public static void main(String arg[]) throws Exception {
        HashMap employeesMap = new HashMap();
        employeesList = new Vector();
        int i = 0;
        while (true) {
            Employee emp1 =
                new Employee("Ram",
                             new Address("MG Road", "Bangalore", 123, "India"));
            Employee emp2 =
                new Employee("Bob",
                             new Address("House No. 4", "SCA", 234, "USA"));
            Employee emp3 =
                new Employee("John",
                             new Address("Church Street", "Bangalore",
                                         569, "India"));
            employeesMap.put(new Integer(i++), emp1);
            employeesList.add(emp1);
            employeesMap.put(new Integer(i++), emp2);
            employeesList.add(emp2);
            employeesMap.put(new Integer(i++), emp3);
            employeesList.add(emp3);
            emp2.addReports(emp1);
            emp3.addReports(emp1);
            emp3.addReports(emp2);
        }
    }
}

class Employee {
    public String name;
    public Address address;
    public Vector directReports;

    public Employee(String nm, Address addr) {
        name = nm;
        address = addr;
        directReports = new Vector();
    }

    public void addReports(Employee emp) {
        directReports.add((Object) emp);
    }
}

class Address {
    public String addr;
    public String city;
    public int zip;
    public String country;

    public Address(String a, String ci, int z, String co) {
        addr = a;
        city = ci;
        zip = z;
        country = co;
    }
}
We can see that this Java code maintains records of employees along with their
address details. The employee records are added to two collection objects: a hash
map and a vector. The vector employeesList and the hash map employeesMap hold
references to the Employee and Address instances and thus prevent them from
being collected by the garbage collector. Therefore, these instances keep increasing
in number in the Java heap.
This test program demonstrates a mistake that is commonly made in real
applications. Often, applications inadvertently maintain multiple caches and
collections of objects that are not required for the program logic, and that mistake
increases the footprint of the process, thus leading to out-of-memory errors.
Please note that to demonstrate the problem and to produce the out-of-memory
error, the employee records are added in a while loop in the test program, and that
may not be the case in real-world applications.
We can also obtain the object histograms from the crash dump file using the Object
Histogram standalone utility:
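A possible invocation against the Windows crash dump produced earlier (the classpath layout, the java.exe path, and the dump file name are placeholders following the earlier examples):
java -cp %JAVA_HOME%\lib\sa-jdi.jar sun.jvm.hotspot.tools.ObjectHistogram <path to java.exe> java.dmp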
To demonstrate deadlock detection, we start a test program that is prone to deadlock
and let it run for some time. The program enters a deadlock soon after, and we then
attach HSDB to the hung process. HSDB has a
utility called Deadlock Detection that is available under the Tools menu. Upon launching
the Deadlock Detection tool, we get the message window shown in Figure 4.47.
This shows that the program has one Java-level deadlock involving two threads—
Thread-0 and Thread-1. Thread-0 is waiting to lock Monitor 0x00ef2aac, which is held
by Thread-1, and Thread-1 is waiting to lock Monitor 0x00ef0f8c, which is already
held by Thread-0; hence the deadlock.
To get more details on these monitors, we can use the Monitor Cache Dump utility.
See Figure 4.48.
Sometimes, it is interesting to look at the state of the threads. We can do this
by inspecting the thread object in the Oop Inspector utility. Let’s take a look at the
Thread-0 and Thread-1 thread objects in the Oop Inspector. Double-clicking on these
threads in the Java Threads panel will bring up the Oop Inspector windows for these
thread objects. Figure 4.49 shows the Oop Inspector for Thread-0, and Figure 4.50
shows the Oop Inspector window for Thread-1.
In these two snapshots, both threads have the thread status of 1025. Let’s look
at the code in Java HotSpot VM that computes the thread state of the Java thread
from its threadStatus field:
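The HotSpot source itself is not quoted here; the following small Java illustration performs an equivalent computation using the JVMTI thread state bit masks (the class and method names are hypothetical):

public class ThreadStatusDemo {
    // JVMTI thread state constants (per the JVMTI specification)
    static final int JVMTI_THREAD_STATE_ALIVE                    = 0x0001;
    static final int JVMTI_THREAD_STATE_TERMINATED               = 0x0002;
    static final int JVMTI_THREAD_STATE_RUNNABLE                 = 0x0004;
    static final int JVMTI_THREAD_STATE_WAITING_INDEFINITELY     = 0x0010;
    static final int JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT     = 0x0020;
    static final int JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER = 0x0400;

    static Thread.State toThreadState(int threadStatus) {
        if ((threadStatus & JVMTI_THREAD_STATE_RUNNABLE) != 0) {
            return Thread.State.RUNNABLE;
        } else if ((threadStatus & JVMTI_THREAD_STATE_BLOCKED_ON_MONITOR_ENTER) != 0) {
            return Thread.State.BLOCKED;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_INDEFINITELY) != 0) {
            return Thread.State.WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_WAITING_WITH_TIMEOUT) != 0) {
            return Thread.State.TIMED_WAITING;
        } else if ((threadStatus & JVMTI_THREAD_STATE_TERMINATED) != 0) {
            return Thread.State.TERMINATED;
        } else if ((threadStatus & JVMTI_THREAD_STATE_ALIVE) == 0) {
            return Thread.State.NEW;
        } else {
            return Thread.State.RUNNABLE;
        }
    }

    public static void main(String[] args) {
        // 1025 = 0x0401 = ALIVE | BLOCKED_ON_MONITOR_ENTER, so this prints BLOCKED
        System.out.println(toThreadState(1025));
    }
}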
This computation would return the thread state as BLOCKED when the
threadStatus is 1025. This means that both threads are in the BLOCKED state.
So, using the Deadlock Detection tool, we can easily nail down the Java-level
deadlocks in applications.
Test Program
Here is the test program that has the Java classes TestCrash and Object1.
Object1 has a native method (nativeMethod()) that is implemented in C++ code
in the file test.cpp. The TestCrash class creates Object1 instances and then calls
nativeMethod() on one of these Object1 instances.
import java.util.ArrayList;

public class TestCrash {
    // The main method (not shown here) creates Object1 instances and calls
    // nativeMethod() on one of them, as described in the text.

    void method1() {
        /* Fill the heap with ArrayLists */
        ArrayList a1 = new ArrayList();
        for (int i = 0; i < 10000; i++) {
            Object1 obj = new Object1();
            a1.add(obj);
        }
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
        }
    }
}

class Object1 {
    byte[] array = {1, 2, 3};
    native void nativeMethod();
}
Here is the native part of the test case implementing method nativeMethod()
using JNI. This source code is placed in a file named test.cpp following the JNI source
file naming conventions.
#include "test.h"
#include <stdlib.h>
#include <memory.h>
/*
* Class: Object1
* Method: nativeMethod
* Signature: ()V
*/
jbyte* array =
(jbyte*)env->GetPrimitiveArrayCritical(arrayObject, 0);
}
Now, we need to compile this program on a Solaris machine with the following
instructions:
export JAVA_HOME=/java/jdk1.7.0_40/
$JAVA_HOME/bin/javac TestCrash.java
CC -m32 -I$JAVA_HOME/include -I$JAVA_HOME/include/solaris -G test.cpp -o libtest.so
When the compiled test program is run, the JVM crashes. The native stack trace in
the resulting hs_err log file looks like this:
V [libjvm.so+0x5718db] void GenCollectedHeap::gen_process_strong_roots(int,bool,bool,bool,SharedHeap::ScanningOption,OopsInGenClosure*,bool,OopsInGenClosure*)+0x5b
V [libjvm.so+0x574594] void GenMarkSweep::mark_sweep_phase1(int,bool)+0x88
V [libjvm.so+0x573e5d] void GenMarkSweep::invoke_at_safepoint(int,ReferenceProcessor*,bool)+0x179
V [libjvm.so+0x579d9e] void OneContigSpaceCardGeneration::collect(bool,bool,unsigned,bool)+0x8a
V [libjvm.so+0x57116a] void GenCollectedHeap::do_collection(bool,bool,unsigned,bool,int)+0x676
V [libjvm.so+0x572743] void GenCollectedHeap::do_full_collection(bool,int)+0x4f
V [libjvm.so+0x217f6a] void VM_GenCollectFull::doit()+0xa6
V [libjvm.so+0x1d05df] void VM_Operation::evaluate()+0x77
V [libjvm.so+0x14aa82] void VMThread::loop()+0x496
V [libjvm.so+0x14a4d0] void VMThread::run()+0x98
V [libjvm.so+0x85a40d] java_start+0xaf5
C [libc.so.1+0xbd673] _thrp_setup+0x9b
C [libc.so.1+0xbd920] _lwp_start+0x0
Figure 4.52 shows the disassembly of the code that was being executed around PC
0xfe20af22 when the crash happened.
The program counters and corresponding assembly instructions shown in
Figure 4.52 indicate that the process crashed when trying to access the value at
address eax+100, which is at program counter 0xfe20af22. From the hs_err file,
we can see the contents of the registers; the value in the EAX register was 0x2e617661.
To find out what was at 0x2e617661 and determine why the crash happened while
attempting to read the value at 0x2e617661+100 we can use HSDB’s Find Pointer
panel, shown in Figure 4.53, to see that this address does not lie in the Java heap.
Using the Find Address in Heap tool, we can find all the locations in the Java heap
from where this particular address is referenced. See Figure 4.54.
Now, examine these found locations in the Object Inspector (see Figure 4.55) to
see if the addresses lie within some valid objects.
All the found addresses bring up the byte array object at 0xe7a4ea38 in the
Object Inspector, which means the object at 0xe7a4ea38 is the closest valid object
just before these locations. If we look carefully, these locations actually go beyond the
limits of the byte array object, which should have ended at 0xe7a4ea48, and from
address 0xe7a4ea48 the next object should have started. See the raw contents at
memory location 0xe7a4ea38 in Figure 4.56.
We can look at the raw bits as characters in the dbx debugger (shown below). This
clearly shows that the object at 0xe7a4ea38 has a byte stream that goes beyond its
size limit of three elements and overwrites the object starting at 0xe7a4ea48.
(dbx) x 0xe7a4ea38/100c
0xe7a4ea38: '\001' '\0' '\0' '\0' '\030' '\007' '�' '�' '\003' '\0'
'\0' '\0' 'H' 'e' 'l' 'l'
0xe7a4ea48: 'o' ' ' 'J' 'a' 'v' 'a' '.' 'H' 'e' 'l' 'l' 'o' ' ' 'J'
'a' 'v'
0xe7a4ea58: 'a' '.' 'H' 'e' 'l' 'l' 'o' ' ' 'J' 'a' 'v' 'a' '.' 'H'
'e' 'l'
0xe7a4ea68: '\003' '\0' '\0' '\0' 'a' 'v' 'a' '.' 'H' 'e' 'l' 'l' 'o'
' ' 'J' 'a'
0xe7a4ea78: 'v' 'a' '.' 'H' 'e' 'l' 'l' 'o' ' ' 'J' 'a' 'v' 'a' '.'
'H' 'e'
0xe7a4ea88: 'l' 'l' 'o' ' ' 'J' 'a' 'v' 'a' '.' 'H' 'e' 'l' 'l' 'o' ' '
'J'
0xe7a4ea98: 'a' 'v' 'a' '.'
This gives us a big clue. Now, we can easily search in the code where the bytes
“Hello Java.Hello Java. . .” are being written and find the buggy part of the code that
overflows a byte array. The following shows the faulty lines in our JNI code:
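The faulty lines sit in the body of Java_Object1_nativeMethod, between the GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical calls. A hypothetical sketch of the kind of code that produces the observed corruption (the loop bound and pattern handling are assumptions) is:

/* 'array' points at a Java byte[] of length 3, but this loop writes 100
 * bytes of the repeating pattern "Hello Java." through it, overflowing
 * into the neighboring objects in the Java heap. */
const char pattern[] = "Hello Java.";   /* 11 characters */
for (int i = 0; i < 100; i++) {
    array[i] = (jbyte) pattern[i % 11];
}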
Appendix: Additional HotSpot VM Command-Line Options of Interest
Boolean HotSpot VM command-line options are enabled or disabled by placing a
"+" or "-" character ahead of the command-line option name. Other command-line
options expect a value to be specified; for instance, -XX:ConcGCThreads is a
command-line option that expects a numeric value such as -XX:ConcGCThreads=6.
The default values for a given HotSpot VM command-line option mentioned in
this appendix are based on the option’s default value in JDK 8u45. As a result, there
may be some cases where a given command-line option has a different default value
from what is mentioned in this appendix. Changes in default values usually occur
due to additional information or input on the behavior of the command-line option
at an alternative value.
-XX:+UseG1GC
Enables the G1 garbage collector. To use G1 in both Java 7 and Java 8 releases, it
must be explicitly enabled using this command-line option. As of this writing, there
are plans to make G1 the default GC for Java 9, though there is a possibility of
reverting to Parallel GC prior to Java 9's release.
-XX:ConcGCThreads
Sets the number of threads that the GC should use when doing work concurrently
with the Java application. By default this is roughly one-fourth of the number of
threads used to perform GC work when the Java threads are stopped.
Reducing -XX:ConcGCThreads can lead to higher throughput performance since
fewer GC threads compete with the Java application for CPU usage. Too few concur-
rent threads may cause the concurrent cycle to not complete fast enough to reclaim
memory.
-XX:G1HeapRegionSize
Sets the size of the heap regions for G1. By default this is about 1/2000 of the heap
size. Possible region sizes are 1M, 2M, 4M, 8M, 16M, and 32M.
Objects that are larger than half the region size need to be managed specially by
G1. Increasing the region size allows larger objects to be allocated with the normal
allocation path, which may help performance if such large objects are common.
However, having larger regions means that G1 has less flexibility when it comes to
making ergonomic decisions, such as, for example, deciding on the size of the young
generation.
-XX:G1HeapWastePercent
Controls the amount of reclaimable memory (garbage) that G1 is allowed to leave
uncollected. By default this value is 5 percent.
G1 continues collecting old regions in mixed collections until the space that the
remaining candidate regions would free up drops below -XX:G1HeapWastePercent
of the heap. Especially on large heaps, it can make sense to use a lower value so
that more of the reclaimable space is actually collected.
-XX:G1MixedGCCountTarget
Sets the target value for how many mixed GCs should be performed after a concurrent
cycle. The default value is 8.
Old regions normally take a little longer to collect than young regions. Allowing
more mixed GCs after a concurrent cycle allows G1 to spread out the reclamation of
the old regions over more collections. But increasing -XX:G1MixedGCCountTarget
also means that it will take longer until a new concurrent cycle can be started. If mixed
GC pause times are too long, it may help to increase -XX:G1MixedGCCountTarget.
-XX:+G1PrintRegionLivenessInfo
This is a diagnostic VM option, and as such it needs to be enabled with
-XX:+UnlockDiagnosticVMOptions. When enabled, it will print liveness information for each
region on the heap. The information includes the usage, the size of the remembered
set, and the “GC efficiency,” which is a measure of how valuable it is to collect this
region. The information is logged after a marking cycle has completed and also after
the regions have been sorted for inclusion in the collection set.
The information logged by -XX:+G1PrintRegionLivenessInfo can be useful
when trying to understand the heap usage and to identify issues with remembered
sets. Since it logs one line for each region in the heap, the data can be hard to manage
for large heaps.
-XX:G1ReservePercent
To reduce the risk of getting a promotion failure, G1 reserves some memory for
promotions. This memory will not be used for the young generation. By default G1
keeps 10 percent of the heap reserved for this purpose.
On a large heap with a large live set, 10 percent of the heap may be too much
to reserve. Reducing this value can leave more memory for the young generation
and lead to longer times between GCs, which normally increases throughput
performance.
-XX:+G1SummarizeRSetStats
This is a diagnostic VM option, and as such it needs to be enabled with
-XX:+UnlockDiagnosticVMOptions. When enabled, it will print a detailed
summary about the remembered sets when the VM exits. In combination with
-XX:G1SummarizeRSetStatsPeriod this summary can be printed periodically
instead of just at VM exit.
If remembered set issues are suspected, this can be a useful tool to analyze them.
-XX:G1SummarizeRSetStatsPeriod
This is a diagnostic VM option, and as such it needs to be enabled with
-XX:+UnlockDiagnosticVMOptions. It can only be used together with
-XX:+G1SummarizeRSetStats. If set to a value other than 0, this will print the
same summary produced by -XX:+G1SummarizeRSetStats. But instead of printing
it just at VM exit, the summary is printed each time the number of GCs specified by
-XX:G1SummarizeRSetStatsPeriod has occurred.
It can be expensive to print this information for every GC, but printing it periodically
makes it possible to identify trends in the remembered set management.
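For example, the following hypothetical command line (MyApp is a placeholder application) prints the remembered set summary every 10 GCs and again at VM exit:
java -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=10 MyApp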
-XX:+G1TraceConcRefinement
This is a diagnostic VM option, and as such it needs to be enabled in combination
with -XX:+UnlockDiagnosticVMOptions. With -XX:+G1TraceConcRefinement
enabled, information about the concurrent refinement threads is logged.
The information produced includes when concurrent refinement threads are activated
and when they are deactivated. This can be useful to identify issues with concurrent
refinement.
-XX:+G1UseAdaptiveConcRefinement
When enabled, this command-line option dynamically recalculates the values for
-XX:G1ConcRefinementGreenZone, -XX:G1ConcRefinementYellowZone, and
-XX:G1ConcRefinementRedZone every GC. This flag is enabled by default.
-XX:GCTimeRatio
Sets the target ratio of time spent in the Java application threads to time spent in
the GC threads.
G1 attempts to honor the value set for -XX:GCTimeRatio to ensure that
the Java application threads get enough execution time as specified by this
command-line option. G1 does this by splitting work up and aborting work that
can be split up or aborted. G1 also tries to spread out GC pauses to accomplish
this goal. The default value for -XX:GCTimeRatio can vary depending on the
garbage collector in use by the HotSpot VM. When G1 is in use, the default
-XX:GCTimeRatio is 9.
The HotSpot VM converts the -XX:GCTimeRatio value to a percentage using the
following formula: 100/(1 + GCTimeRatio). In other words, -XX:GCTimeRatio can
be thought of as asking the HotSpot VM to attempt to spend no more than 100/(1 +
GCTimeRatio) percent of its time executing in the GC threads. Hence, a default value
of -XX:GCTimeRatio=9 with G1 means that up to 10 percent of the time can be spent
doing GC work.
It should be noted that the HotSpot VM's throughput garbage collector, Parallel
GC, sets a default value of -XX:GCTimeRatio=99. This means that the HotSpot
VM should attempt to spend up to 1 percent of its time doing GC work. This makes
Parallel GC's default much more throughput oriented than G1's default of 9.
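As a worked example, with -XX:GCTimeRatio=19 the target becomes 100/(1 + 19) = 5 percent of the time spent in GC threads. A hypothetical command line (MyApp and the heap size are placeholders) would be:
java -XX:+UseG1GC -Xms2g -Xmx2g -XX:GCTimeRatio=19 MyApp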
-XX:+HeapDumpBeforeFullGC
When this command-line option is enabled, an hprof file is created just prior to a full GC
starting. The hprof file is created in the directory where the HotSpot VM is launched.
Comparing the contents of the Java heap before and after a full GC using this
command-line option in conjunction with -XX:+HeapDumpAfterFullGC can give a
good indication of memory leaks and other issues.
-XX:+HeapDumpAfterFullGC
When this command-line option is enabled, an hprof file is created right after the
full GC has completed. The hprof file is created in the directory where the HotSpot
VM is launched.
Comparing the contents of the Java heap before and after a full GC using this
command-line option in conjunction with -XX:+HeapDumpBeforeFullGC can give
a good indication of memory leaks and other issues.
-XX:InitiatingHeapOccupancyPercent
Sets the value for when a concurrent cycle in G1 should be started. The default value
is 45. In other words, after a GC, G1 measures the occupancy of the old generation
space and compares that to the current Java heap size. If the occupancy of the old
generation space reaches or exceeds the InitiatingHeapOccupancyPercent,
then a G1 concurrent cycle is initiated by scheduling an initial-mark operation
to begin on the next GC. The G1 concurrent cycle is the means by which G1
concurrently collects the old generation space. A concurrent cycle begins with an
initial-mark operation, which can be observed in the GC logs with the
-XX:+PrintGCDetails command-line option.
If full GCs are occurring due to the old generation running out of available space,
lowering this value will initiate a concurrent cycle earlier to avoid exhausting
available space.
If no full GCs are occurring and it is desirable to increase throughput, it may help
to increase this value to get fewer concurrent cycles. G1 concurrent cycles do require
CPU cycles to execute and therefore may steal CPU cycles from application threads.
Hence, frequent G1 concurrent cycles can reduce peak throughput. However, it is
generally better to err with G1 concurrent cycles running too early rather than too
late. If G1 concurrent cycles are run too late, the consequence is likely to be frequent
full GCs, which of course introduces undesirable lengthy GC pauses.
See also -XX:+G1UseAdaptiveIHOP.
-XX:+UseStringDeduplication
Enables the deduplication of Java Strings. This command-line option and feature
was introduced in JDK 8u20. String deduplication is disabled by default.
A String object is considered a candidate for deduplication when all of the following
conditions are met:
The object is an instance of java.lang.String.
The object is being evacuated from a young heap region.
The object is being evacuated to a young/survivor heap region and the
object’s age is equal to the deduplication age threshold, or the object is
being evacuated to an old heap region and the object’s age is less than the
deduplication age threshold. See -XX:StringDeduplicationAgeThreshold
for additional information on the deduplication threshold.
Interned Strings are dealt with differently from noninterned Strings. Interned
Strings are explicitly deduplicated just before being inserted into the HotSpot VM's
internal StringTable to avoid counteracting HotSpot Server JIT compiler
optimizations done on String literals.
See also -XX:StringDeduplicationAgeThreshold and -XX:+PrintStringDeduplicationStatistics.
-XX:StringDeduplicationAgeThreshold
Sets the String object age threshold when a String object is considered a candidate
for deduplication. The default value is 3.
More specifically, a String becomes a candidate for deduplication once a String
object has been promoted to a G1 old region, or its age is higher than the deduplication
age threshold. Once a String has become a candidate for deduplication, or has been
deduplicated, it will never become a candidate again. This approach avoids making
the same object a candidate more than once.
Also see -XX:+UseStringDeduplication and -XX:+PrintStringDeduplicationStatistics.
-XX:+PrintStringDeduplicationStatistics
Enables the printing of String deduplication statistics. The default value is disabled.
This command-line option can be very helpful when you want to know if
enabling deduplication will result in a significant savings in the amount of space
in use by String objects in the Java heap. Hence, enabling this command-line
option provides data that justifies whether there may be value in enabling
-XX:+UseStringDeduplication.
Also see -XX:+UseStringDeduplication and -XX:StringDeduplicationAgeThreshold.
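For example, a hypothetical command line (MyApp is a placeholder) that enables String deduplication with G1 and prints the deduplication statistics:
java -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics MyApp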
-XX:+G1UseAdaptiveIHOP
This is a new command-line option available in JDK 9 and later. It adaptively adjusts
the initiating heap occupancy threshold from the initial value of the command-line
option InitiatingHeapOccupancyPercent. The intent is to let G1 adapt the
marking threshold to application behavior so as to increase throughput by triggering
the marking cycle as late as possible yet not exhaust old generation space.
The mechanism enabled by G1UseAdaptiveIHOP uses the value of the
command-line option InitiatingHeapOccupancyPercent as an initial value for
the marking cycles until sufficient observations about the application behavior have
been made. It then adaptively adjusts to a more optimal heap occupancy percent at
which to start the marking cycle.
When G1UseAdaptiveIHOP is disabled (when -XX:-G1UseAdaptiveIHOP is
explicitly specified as a command-line option), G1 will always use the
InitiatingHeapOccupancyPercent value as the occupancy at which to start the marking cycle.
G1UseAdaptiveIHOP, as of this writing, will be enabled by default with its
introduction in JDK 9.
See also -XX:InitiatingHeapOccupancyPercent.
-XX:MaxGCPauseMillis
This command-line option sets the pause time goal in milliseconds for G1. The default
value is 200. Note that this is a goal, not a hard maximum pause time that G1 can
never exceed.
G1 attempts to size the young generation to make sure that it can be collected
within the goal set by -XX:MaxGCPauseMillis. When G1 is in use, this option,
along with sizing the Java heap with -Xms and -Xmx, is expected to be the primary
means of tuning. These three command-line options are also the suggested starting
point when using G1, even when migrating from Parallel GC or CMS GC to G1.
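For example, a hypothetical starting point for such a migration (the application name and the heap and pause-time values are placeholders) might be:
java -XX:+UseG1GC -Xms8g -Xmx8g -XX:MaxGCPauseMillis=150 MyApp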
-XX:MinHeapFreeRatio
Sets the minimum percentage of the Java heap that should remain free. The default
value is 40. The value is actually a percentage, not a ratio. It is unfortunate that the
command-line name includes the term ratio.
The Java HotSpot VM uses the value set by -XX:MinHeapFreeRatio to help
determine when to grow the Java heap. This decision is made during a GC and
can be described thus: if less than -XX:MinHeapFreeRatio percent of the Java
heap is free, G1 will attempt to grow the Java heap to meet the value set as the
-XX:MinHeapFreeRatio.
Obviously, in order for the Java heap size to grow, the values set for -Xms, the
initial Java heap size, and -Xmx, the maximum Java heap size, must be set to
different values. If -Xms and -Xmx are set to the same value, the Java HotSpot VM
will not grow or shrink the Java heap.
-XX:MaxHeapFreeRatio
Sets the maximum percentage of the Java heap that is allowed to remain free. The default
value is 70. Similarly to -XX:MinHeapFreeRatio, the value is actually a percentage,
not a ratio. Again, similarly to -XX:MinHeapFreeRatio, it is unfortunate that the
command-line name includes the term ratio.
The Java HotSpot VM uses the value set by -XX:MaxHeapFreeRatio to help
determine when to shrink the Java heap. This decision is made during a GC and
can be described thus: if more than -XX:MaxHeapFreeRatio percent of the Java
heap is free, G1 will attempt to shrink the Java heap to meet the value set as the
-XX:MaxHeapFreeRatio.
Also as is the case with -XX:MinHeapFreeRatio, in order for the Java heap size
to shrink, the values set for -Xms, the initial Java heap size, and -Xmx, the maximum
Java heap size, must be set to different values. If -Xms and -Xmx are set to the same
value, the Java HotSpot VM will not shrink or grow the Java heap.
-XX:+PrintAdaptiveSizePolicy
Turns on logging of information about heap size changes. This information can be
very useful in understanding the ergonomic heuristic decisions made by G1 (also
applicable to Parallel GC).
Use of -XX:+PrintAdaptiveSizePolicy in tuning G1 is described in Chapter 3,
“Garbage First Garbage Collector Performance Tuning.”
-XX:+ResizePLAB
Sets whether the thread-local allocation buffers that the GC uses for promoting
objects should be resized dynamically or just have a static size. The default value is
true.
The dynamic resizing of the PLABs in G1 has been shown to cause some performance
issues. For some applications, disabling the resizing of thread-local promotion buffers
can improve performance by reducing the duration of the G1 GC pauses.
-XX:+ResizeTLAB
Sets whether the thread-local allocation buffers that the Java application threads
use for Java object allocations should be resized dynamically or have a static fixed
size. The default is true.
In most cases the TLAB resizing improves application performance by reducing
contention in the allocation path. Contrary to the PLAB sizing, it is not very common
to see that turning this off is good for performance. Hence, for almost all applications,
it is best to leave this command-line option enabled, which again is its default.
-XX:+ClassUnloadingWithConcurrentMark
Turns on class unloading during G1 concurrent cycles, the concurrent collection of
old generation. The default is true.
Normally it is beneficial to be able to unload unreachable classes during G1 con-
current cycles rather than having to rely on full GCs to do so. The unloading of classes
during G1 concurrent cycles can sometimes increase G1 remark GC times. If G1
remark pause times are higher than can be tolerated to meet GC pause time goals,
and few classes are expected to be unreachable, it may be beneficial to disable this
option, that is, -XX:-ClassUnloadingWithConcurrentMark.
-XX:+ClassUnloading
Turns class unloading on and off. The default value is true, meaning the HotSpot
VM will unload unreachable classes. If this command-line option is disabled—that is,
-XX:-ClassUnloading—the HotSpot VM will not unload any unreachable classes,
ever, not even as part of full GCs.
-XX:+UnlockDiagnosticVMOptions
Sets whether flags tagged as diagnostic options should be allowed or not. The default
is false.
There are some command-line options that are tagged as "diagnostic." These can be
seen in a list of Java HotSpot VM command-line options by using -XX:+PrintFlagsFinal
in conjunction with -XX:+UnlockDiagnosticVMOptions. "Diagnostic" options are listed
with {diagnostic} in the command-line option type field of the -XX:+PrintFlagsFinal
output. Examples of diagnostic HotSpot VM command-line options include
-XX:+G1PrintRegionLivenessInfo, which prints detailed liveness information
for each G1 region on each GC, and -XX:+LogCompilation, which produces infor-
mation about optimization decisions the Java HotSpot VM’s JIT compiler has made.
-XX:+UnlockExperimentalVMOptions
Sets whether flags tagged as experimental options should be allowed or not. The
default is false.
Similar to -XX:+UnlockDiagnosticVMOptions, there are some command-line
options that are tagged as "experimental." These can be seen in a list of Java HotSpot
VM command-line options by using -XX:+PrintFlagsFinal in conjunction with
-XX:+UnlockExperimentalVMOptions. "Experimental" options are listed with
{experimental} in the command-line option type field of the -XX:+PrintFlagsFinal output.
It is important to note that experimental command-line options are not part of the
officially supported Java SE product, but they are available for experimentation and
offer additional capabilities that may be worth exploring for some use cases.
Some experimental command-line options may be performance-related features
that have not undergone full scrutiny or rigorous quality assurance testing. Yet
they may offer performance benefits in some use cases.
An example of an experimental command-line option is -XX:G1NewSizePercent,
which controls the minimum size to which G1 can reduce the young generation
relative to the Java heap size. It defaults to 5. In other words, the minimum size to
which G1 can adaptively resize the young generation is 5 percent of the Java heap
size. For some use cases where very low GC pause times are desired, one could
experiment in a non-production environment with a lower -XX:G1NewSizePercent
of, say, 1 or 2, realizing that in doing so there is no official product support.
To use an experimental option, the -XX:+UnlockExperimentalVMOptions must
be specified in conjunction with the experimental command-line option. For example,
to use the -XX:G1NewSizePercent=2 command-line option, you would specify both
-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=2. The
-XX:+UnlockExperimentalVMOptions option need be specified only once if more
than one experimental command-line option is desired.
-XX:+UnlockCommercialFeatures
Sets whether Oracle-specific features that require a license are to be enabled. The
default is false.
There are some capabilities or features developed by Oracle that are available only
via the Oracle Java runtime distributed by Oracle and are not included in OpenJDK.
An example of an Oracle commercial feature is Java Flight Recorder, which is part of
Oracle’s monitoring and management tool called Java Mission Control. Java Flight
Recorder is a profiling and event collection framework that can collect and show
within Java Mission Control low-level information about how the application and
JVM are behaving.