Java EE Support Patterns

1.25.2013

Java concurrency: the hidden thread deadlocks

Most Java programmers are familiar with the Java thread deadlock concept. It essentially involves 2 threads waiting forever for each other. This condition is often the result of flat (synchronized) or ReentrantLock (read or write) lock-ordering problems.
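
To illustrate the classic case, find below a minimal sketch (not the article’s sample program; class and lock names are arbitrary) in which 2 pooled threads acquire the same two monitors in reverse order. Running something similar produces the JVM deadlock detection output shown next.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Minimal illustration of the classic flat (synchronized) lock-ordering deadlock.
 */
public class ClassicDeadlock {

    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // pool-1-thread-1: acquires LOCK_A then LOCK_B
        pool.execute(new Runnable() {
            public void run() { lockInOrder(LOCK_A, LOCK_B); }
        });

        // pool-1-thread-2: reversed order, acquires LOCK_B then LOCK_A
        pool.execute(new Runnable() {
            public void run() { lockInOrder(LOCK_B, LOCK_A); }
        });

        pool.shutdown();
    }

    private static void lockInOrder(Object first, Object second) {
        synchronized (first) {
            // Wait 2 seconds so both threads end up holding their first monitor
            try { Thread.sleep(2000); } catch (InterruptedException ignored) {}
            synchronized (second) {
                System.out.println("Work done!");
            }
        }
    }
}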

Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting to lock monitor 0x0237ada4 (object 0x272200e8, a java.lang.Object),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x0237aa64 (object 0x272200f0, a java.lang.Object),
  which is held by "pool-1-thread-2"

The good news is that the HotSpot JVM is always able to detect this condition for you…or is it?

A recent thread deadlock problem affecting an Oracle Service Bus production environment has forced us to revisit this classic problem and identify the existence of “hidden” deadlock situations.

This article will demonstrate and replicate, via a simple Java program, a very special lock-ordering deadlock condition which is not detected by the latest HotSpot JVM 1.7. You will also find a video at the end of the article explaining the Java sample program and the troubleshooting approach used.

The crime scene

I usually like to compare major Java concurrency problems to a crime scene where you play the role of lead investigator. In this context, the “crime” is an actual production outage of your client’s IT environment. Your job is to:

  • Collect all the evidence, hints & facts (thread dumps, logs, business impact, load figures…)
  • Interrogate the witnesses & domain experts (support team, delivery team, vendor, client…)
The next step of your investigation is to analyze the collected information and establish a list of one or more potential “suspects” along with clear proof. Eventually, you want to narrow it down to a primary suspect or root cause. Obviously, the principle of “innocent until proven guilty” does not apply here; it is exactly the opposite.

Lack of evidence can prevent you from achieving the above goal. What you will see next is that the lack of deadlock detection by the HotSpot JVM does not necessarily prove that you are not dealing with this problem.

The suspect

In this troubleshooting context, the “suspect” is defined as the application or middleware code with the following problematic execution pattern:

  • Usage of FLAT lock followed by the usage of ReentrantLock WRITE lock (execution path #1)
  • Usage of ReentrantLock READ lock followed by the usage of FLAT lock (execution path #2)
  • Concurrent execution performed by 2 Java threads but via a reversed execution order
The above lock-ordering deadlock criteria can be visualized as per the diagram below:


Now let’s replicate this problem via our sample Java program and look at the JVM thread dump output.

Sample Java program

The above deadlock condition was first identified from our Oracle OSB problem case. We then re-created it via a simple Java program. You can download the entire source code of our program here.

The program simply creates and fires 2 worker threads. Each of them executes a different execution path and attempts to acquire locks on shared objects, but in a different order. We also created a deadlock detector thread for monitoring and logging purposes.

For now, find below the Java class implementing the 2 different execution paths.

package org.ph.javaee.training8;

import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * A simple thread task representation
 * @author Pierre-Hugues Charbonneau
 *
 */
public class Task {

    // Object used for FLAT lock
    private final Object sharedObject = new Object();
    // ReentrantReadWriteLock used for WRITE & READ locks
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     *  Execution pattern #1
     */
    public void executeTask1() {

        // 1. Attempt to acquire a ReentrantReadWriteLock READ lock
        lock.readLock().lock();

        // Wait 2 seconds to simulate some work...
        try { Thread.sleep(2000); } catch (Throwable any) {}

        try {
            // 2. Attempt to acquire a Flat lock...
            synchronized (sharedObject) {}
        }
        // Remove the READ lock
        finally {
            lock.readLock().unlock();
        }

        System.out.println("executeTask1() :: Work Done!");
    }

    /**
     *  Execution pattern #2
     */
    public void executeTask2() {

        // 1. Attempt to acquire a Flat lock
        synchronized (sharedObject) {

            // Wait 2 seconds to simulate some work...
            try { Thread.sleep(2000); } catch (Throwable any) {}

            // 2. Attempt to acquire a WRITE lock
            lock.writeLock().lock();

            try {
                // Do nothing
            }
            // Remove the WRITE lock
            finally {
                lock.writeLock().unlock();
            }
        }

        System.out.println("executeTask2() :: Work Done!");
    }

    public ReentrantReadWriteLock getReentrantReadWriteLock() {
        return lock;
    }
}
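
For completeness, find below a minimal sketch of the driver side firing the 2 worker threads against a shared Task instance. The class and method names used here are simplified assumptions; the actual WorkerThread1, WorkerThread2 and deadlock detector classes are available from the source code download above.

package org.ph.javaee.training8;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Minimal driver sketch (hypothetical class name): each worker executes one of
 * the 2 execution paths so that the locks end up acquired in reverse order.
 */
public class DeadlockSimulator {

    public static void main(String[] args) {

        final Task task = new Task();
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Worker #1: READ lock followed by FLAT lock
        pool.execute(new Runnable() {
            public void run() {
                task.executeTask1();
            }
        });

        // Worker #2: FLAT lock followed by WRITE lock
        pool.execute(new Runnable() {
            public void run() {
                task.executeTask2();
            }
        });

        pool.shutdown();
    }
}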

As soon as the deadlock situation was triggered, a JVM thread dump was generated using JVisualVM.


As you can see from the Java thread dump sample, the JVM did not detect this deadlock condition (e.g. no presence of Found one Java-level deadlock), but it is clear that these 2 threads are in a deadlock state.

Root cause: ReentrantLock READ lock behavior

The main explanation we found at this point is associated with the usage of the ReentrantLock READ lock. Read locks are normally not designed to have a notion of ownership. Since there is no record of which thread holds a read lock, the HotSpot JVM deadlock detector logic appears unable to detect deadlocks involving read locks.

Some improvements have been implemented since then, but we can see that the JVM still cannot detect this special deadlock scenario.

Now, if we replace the read lock (execution pattern #1) in our program with a write lock, the JVM finally detects the deadlock condition. But why?
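
For clarity, here is a sketch of what execution pattern #1 looks like once the READ lock is swapped for a WRITE lock (only the lock type changes; the rest of the Task method is unchanged).

    /**
     *  Execution pattern #1 - modified to use a WRITE lock
     */
    public void executeTask1() {

        // 1. Attempt to acquire a ReentrantReadWriteLock WRITE lock (was: READ lock)
        lock.writeLock().lock();

        // Wait 2 seconds to simulate some work...
        try { Thread.sleep(2000); } catch (Throwable any) {}

        try {
            // 2. Attempt to acquire a Flat lock...
            synchronized (sharedObject) {}
        }
        // Remove the WRITE lock
        finally {
            lock.writeLock().unlock();
        }

        System.out.println("executeTask1() :: Work Done!");
    }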

Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting for ownable synchronizer 0x272239c0, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x025cad3c (object 0x272236d0, a java.lang.Object),
  which is held by "pool-1-thread-2"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-2":
       at sun.misc.Unsafe.park(Native Method)
       - parking to wait for  <0x272239c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
       at org.ph.javaee.training8.Task.executeTask2(Task.java:54)
       - locked <0x272236d0> (a java.lang.Object)
       at org.ph.javaee.training8.WorkerThread2.run(WorkerThread2.java:29)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:722)
"pool-1-thread-1":
       at org.ph.javaee.training8.Task.executeTask1(Task.java:31)
       - waiting to lock <0x272236d0> (a java.lang.Object)
       at org.ph.javaee.training8.WorkerThread1.run(WorkerThread1.java:29)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:722)

This is because write locks are tracked by the JVM similarly to flat locks. This means the HotSpot JVM deadlock detector appears to be currently designed to detect:

  • Deadlock on Object monitors involving FLAT locks
  • Deadlock involving Locked ownable synchronizers associated with WRITE locks
The lack of per-thread read lock tracking appears to prevent deadlock detection for this scenario and significantly increases the troubleshooting complexity.

I suggest that you read Doug Lea’s comments on this whole issue, since concerns were raised back in 2005 regarding the possibility of adding per-thread read-hold tracking due to the potential lock overhead.

Find below my troubleshooting recommendations if you suspect a hidden deadlock condition involving read locks:

  • Analyze the thread call stack traces closely; they may reveal code acquiring read locks and preventing other threads from acquiring write locks.
  • If you are the owner of the code, keep track of the read lock count via the lock.getReadLockCount() method (a monitoring sketch follows below)
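
As a rough illustration of the second recommendation, find below a minimal monitoring sketch. The class name and polling interval are arbitrary, and it assumes your code can expose the ReentrantReadWriteLock instance. Note that ThreadMXBean.findDeadlockedThreads() covers object monitors and ownable synchronizers (write locks) but will not flag the read lock scenario described above, which is why the read lock count is logged as a complementary hint.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Minimal deadlock monitoring sketch (hypothetical class name).
 */
public class DeadlockMonitor implements Runnable {

    private final ReentrantReadWriteLock monitoredLock;
    private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

    public DeadlockMonitor(ReentrantReadWriteLock monitoredLock) {
        this.monitoredLock = monitoredLock;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {

            // Detects deadlocks on object monitors and ownable synchronizers (write locks) only
            long[] deadlockedThreadIds = threadBean.findDeadlockedThreads();
            if (deadlockedThreadIds != null) {
                System.out.println("JVM-detected deadlock involving "
                        + deadlockedThreadIds.length + " threads");
            }

            // Read locks have no owner tracking; a persistently non-zero count
            // combined with blocked writer threads is a hint of a hidden deadlock
            System.out.println("Current READ lock count: "
                    + monitoredLock.getReadLockCount());

            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
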
I’m looking forward to your feedback, especially from individuals with experience with this type of deadlock involving read locks.

Finally, find below a video explaining these findings via the execution and monitoring of our sample Java program.



1.12.2013

QOTD: Java Thread vs. Java Heap Space

The following question is quite common and relates to OutOfMemoryError: unable to create new native thread problems during the JVM thread creation process, as well as to the overall JVM thread capacity.

This is also a typical interview question I ask new technical candidates (senior role). I recommend that you attempt to provide your own response before looking at the answer.

Question:

Why can’t you increase the JVM thread capacity (total # of threads) by expanding the Java heap space capacity via -Xmx?

Answer:

The Java thread creation process requires native memory to be available for the JVM process. Expanding the Java heap space via the -Xmx argument will actually reduce your Java thread capacity, since this memory will be “stolen” from the native memory space.

  • For a 32-bit JVM, the Java heap space is in a race with the native heap, which includes the memory required for thread creation
  • For a 64-bit JVM, the thread capacity will mainly depend on your OS physical & virtual memory availability along with your current OS process-related tuning parameters
In order to better understand this limitation, I now propose the following video tutorial.

You can also download the sample Java program from the link below:
https://docs.google.com/file/d/0B6UjfNcYT7yGazg5aWxCWGtvbm8/edit
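
For reference, find below a minimal sketch in the same spirit (this is an assumption, not the downloadable sample program) showing that the thread capacity limit is reached regardless of the -Xmx value. Run it only in a disposable test environment, since it deliberately exhausts the native memory available for thread creation.

/**
 * Minimal thread capacity sketch (hypothetical class name): creates sleeping
 * daemon threads until the JVM can no longer allocate native thread memory.
 */
public class ThreadCapacityDemo {

    public static void main(String[] args) {
        int createdThreads = 0;
        try {
            while (true) {
                Thread thread = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(Long.MAX_VALUE); // keep the thread alive
                        } catch (InterruptedException ignored) {
                        }
                    }
                });
                thread.setDaemon(true);
                thread.start();
                createdThreads++;
            }
        } catch (OutOfMemoryError oom) {
            // Typically "unable to create new native thread"
            System.out.println("Thread capacity reached after " + createdThreads
                    + " threads: " + oom.getMessage());
        }
    }
}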

1.11.2013

Java Verbose GC - tutorial video


This is my first tutorial video; it will provide you with technical details on how to enable and analyze the verbose:gc output data of your JVM process.

You can also download the Java sample program from the link below. Please make sure that you configure your Java runtime with a heap space of only 1 GB (-Xmx1024m).
https://docs.google.com/file/d/0B6UjfNcYT7yGenZCU3FfTHFfNnc/edit
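
For reference, verbose GC output can typically be enabled on a HotSpot 1.6/1.7 JVM with arguments similar to the following (the main class name and log file path are placeholders):

java -Xmx1024m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log <YourMainClass>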




Future videos may also include scripts so that the non-English audience can perform proper language translation.

I’m looking forward to your feedback and suggestions on the topics and video formats you would like to see.

1.09.2013

Java video tutorials: A picture is worth a thousand words

This is my first post for 2013. I want to thank all my readers for their great feedback and comments from the various articles I posted in 2012.

It is sometimes quite difficult to truly show and explain some troubleshooting techniques via articles only. For 2013, on top of my regular posting, I will be introducing tutorial and troubleshooting videos. These training videos will be available for free from YouTube. I really hope that you will appreciate this addition. Watch for a new Videos section from the top navigation bar.

I’m really looking forward to your suggestions and recommendations on what types of tutorials and videos you would like to watch and learn from. Here are some examples:

  • Live Thread Dump analysis tutorial
  • Live Heap Dump analysis tutorial
  • JVM monitoring and tuning
  • JVM verbose:gc live analysis
  • Java CPU troubleshooting
  • JVisualVM tutorial
  • Weblogic & JBoss console management
  • Weblogic & JBoss monitoring
  • Java & Java EE sample program creation via Eclipse
  • Many more…

Regards,
Pierre-Hugues Charbonneau

12.24.2012

Java Thread: retained memory analysis

This article will provide you with a tutorial allowing you to determine how much and where Java heap space is retained from your active application Java threads. A true case study from an Oracle Weblogic 10.0 production environment will be presented in order for you to better understand the analysis process.

We will also attempt to demonstrate that excessive garbage collection or Java heap space memory footprint problems are often not caused by true memory leaks but rather by thread execution patterns and a high volume of short lived objects.

Background

As you may have seen from my past JVM overview article, Java threads are part of the JVM fundamentals. Your Java heap space memory footprint is driven not only by static and long lived objects but also by short lived objects.

OutOfMemoryError problems are often wrongly assumed to be due to memory leaks. We often overlook faulty thread execution patterns and the short lived objects they “retain” on the Java heap until their execution completes. In this problematic scenario:

  • Your “expected” application short lived / stateless objects (XML, JSON data payloads etc.) become retained by the threads for too long (thread lock contention, huge data payload, slow response time from remote systems etc.)
  • Eventually such short lived objects get promoted to the long lived object space e.g. OldGen/tenured space by the garbage collector
  • As a side effect, this causes the OldGen space to fill up rapidly, increasing the Full GC (major collections) frequency
  • Depending on the severity of the situation, this can lead to excessive garbage collection, increased JVM pause time and ultimately OutOfMemoryError: Java heap space
  • Your application is now down and you are puzzled about what is going on
  • Finally, you are thinking of either increasing the Java heap or looking for memory leaks…are you really on the right track?
In the above scenario, you need to look at the thread execution patterns and determine how much memory each of them retains at a given time.

OK I get the picture but what about the thread stack size?

It is very important to avoid any confusion between the thread stack size and Java heap memory retention. The thread stack size is a special memory space used by the JVM to store each method call. When a thread calls method A, it “pushes” the call onto the stack. If method A calls method B, that call is also pushed onto the stack. Once the method execution completes, the call is “popped” off the stack.

The Java objects created as a result of such thread method calls are allocated on the Java heap space. Increasing the thread stack size will definitely not have any effect here. Tuning of the thread stack size is normally required when dealing with java.lang.StackOverflowError or OutOfMemoryError: unable to create new native thread problems.
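
As a quick illustration (hypothetical code, not taken from the case study below), a large object referenced only by a local variable stays retained on the Java heap for as long as the owning thread keeps that method call on its stack:

/**
 * Minimal retention sketch: the payload lives on the Java heap, while only its
 * reference sits in the thread stack frame of handleRequest().
 */
public class RetainedMemoryExample {

    public void handleRequest() {
        // Allocated on the Java heap; simulates a huge response payload
        byte[] jsonPayload = new byte[45 * 1024 * 1024];

        // While this call is "stuck", the payload remains retained by the thread
        processSlowly(jsonPayload);

        // Once handleRequest() returns (frame popped), the payload becomes
        // unreachable and eligible for garbage collection
    }

    private void processSlowly(byte[] payload) {
        try {
            Thread.sleep(60000); // simulate a slow remote call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}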


Case study and problem context

The following analysis is based on a true production problem we investigated recently.

  • Severe performance degradation was observed from a Weblogic 10.0 production environment following some changes to the user web interface (using Google Web Toolkit and JSON as data payload)
  • Initial analysis revealed several occurrences of OutOfMemoryError: Java heap space errors along with excessive garbage collection. Java heap dump files were generated automatically (-XX:+HeapDumpOnOutOfMemoryError) following the OOM events
  • Analysis of the verbose:gc logs confirmed full depletion of the 32-bit HotSpot JVM OldGen space (1 GB capacity)
  • Thread dump snapshots were also generated before and during the problem
  • The only problem mitigation available at that time was to restart the affected Weblogic server when the problem was observed
  • A rollback of the changes was eventually performed which did resolve the situation
The team first suspected a memory leak problem from the new code introduced.

Thread dump analysis: looking for suspects…

The first analysis step was to analyze the generated thread dump data. A thread dump will often show you the culprit threads allocating memory on the Java heap. It will also reveal any hogging or stuck threads attempting to send and receive data payloads from a remote system.

The first pattern we noticed was a good correlation between OOM events and STUCK threads observed from the Weblogic managed servers (JVM processes). Find below the primary thread pattern found:

<10-Dec-2012 1:27:59 o'clock PM EST> <Error> <BEA-000337>
<[STUCK] ExecuteThread: '22' for queue:
'weblogic.kernel.Default (self-tuning)'
has been busy for "672" seconds working on the request
which is more than the configured time of "600" seconds.


As you can see, the above thread appears to be STUCK or taking a very long time to read and receive the JSON response from the remote server. Once we found that pattern, the next step was to correlate this finding with the JVM heap dump analysis and determine how much memory these stuck threads were retaining on the Java heap.

Heap dump analysis: retained objects exposed!

The Java heap dump analysis was performed using MAT (Eclipse Memory Analyzer). We will now list the analysis steps which allowed us to pinpoint the retained memory size and its source.

1. Load the HotSpot JVM heap dump



2. Select the HISTOGRAM view and filter by “ExecuteThread”

* ExecuteThread is the Java class used by the Weblogic kernel for thread creation & execution *


As you can see, this view was quite revealing. We can see a total of 210 Weblogic threads created. The total retained memory footprint from these threads is 806 MB. This is pretty significant for a 32-bit JVM process with a 1 GB OldGen space. This view alone tells us that the core of the problem and memory retention originates from the threads themselves.

3. Deep dive into the thread memory footprint analysis

The next step was to deep dive into the thread memory retention. To do this, simply right-click on the ExecuteThread class and select: List objects > with outgoing references.


As you can see, we were able to correlate STUCK threads from the thread dump analysis with high memory retention from the heap dump analysis. The finding was quite surprising.

4. Thread Java local variables identification

The final analysis step required us to expand a few thread samples and understand the primary source of memory retention.


As you can see, this last analysis step revealed a huge JSON response data payload as the root cause. That pattern was also exposed earlier via the thread dump analysis, where we found a few threads taking a very long time to read & receive the JSON response; a clear symptom of a huge data payload footprint.

It is crucial to note that short lived objects created via local method variables will show up in the heap dump analysis. However, some of those will only be visible from their parent threads since they are not referenced by other objects, like in this case. You will also need to analyze the thread stack trace in order to identify the true caller, followed by a code review to confirm the root cause.

Following this finding, our delivery team was able to determine that the recent faulty JSON code changes were generating, under some scenarios, huge JSON data payloads of 45 MB+. Given the fact that this environment uses a 32-bit JVM with only 1 GB of OldGen space, you can understand that only a few threads were enough to trigger severe performance degradation.

This case study is clearly showing the importance of proper capacity planning and Java heap analysis, including the memory retained from your active application & Java EE container threads.

Learning is experience. Everything else is just information

I hope this Java thread example has helped you understand how you can pinpoint the Java heap memory footprint retained by your active threads by combining thread dump and heap dump analysis. Now, this article will remain just words if you don’t experiment, so I highly recommend that you take some time to learn this analysis process yourself for your application(s).

Please feel free to post any comment or question.

12.17.2012

QOTD: Weblogic threads

This is the first post of a new series that will bring you frequent and short “question of the day” articles. This new format will complement my existing writing and is designed to provide you with fast answers for common problems and questions that I often get from work colleagues, IT clients and readers.

Each of these QOTD posts will be archived and you will have access to the entire list via a separate page. I also encourage you to post your feedback and recommendations along with your own complementary answer for each question.

Now let’s get started with our first question of the day!

Question:

What is the difference between Oracle Weblogic thread states & attributes (Total, Standby, Active, idle, Hogging, Stuck)?

Answer:

Weblogic thread states can be a particularly confusing topic for Weblogic administrators and individuals starting to learn and monitor Weblogic threads. The thread monitoring section can be accessed for each managed server under the Monitoring > Threads tab.

As you can see, the thread monitoring tab provides a complete view of each Weblogic thread along with its state. Now let’s review each section and state so you can properly understand how to assess the thread pool health.

# Summary section

  • Execute Thread Total Count: This is the total number of threads “created” from the Weblogic self-tuning pool and visible from the JVM Thread Dump. This value corresponds to the sum of: Active + Standby threads
  • Active Execute Threads: This is the number of threads “eligible” to process a request. When thread demand goes up, Weblogic will start promoting threads from Standby to Active state which will enable them to process future client requests
  • Standby Thread Count: This is the number of threads waiting to be marked “eligible” to process client requests. These threads are created and visible from the JVM Thread Dump but not available yet to process a client request
  • Hogging Thread Count: This is the number of threads taking much more time than the current average execution time calculated by the Weblogic kernel
  • Execute Thread Idle Count: This is the number of Active threads currently “available” to process a client request





In the above snapshots, we have:

  • Total of 43 threads, 29 in Standby state and 14 in Active state
  • Out of the 14 Active threads, we have 1 Hogging thread and 7 Idle threads, i.e. 7 threads “available” for request processing
  • Another way to see the situation: we have a total of 7 threads currently “processing” client requests, with 1 out of 7 in Hogging state (i.e. taking more time than the current calculated average)
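
If you prefer to track these counters outside the console, the same attributes are exposed via JMX on the ThreadPoolRuntimeMBean. Find below a rough sketch; the host, port, credentials, server name and exact ObjectName pattern are assumptions to validate against your Weblogic version, and the Weblogic JMX client library must be on the classpath for the t3 protocol.

import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;

/**
 * Rough JMX monitoring sketch (hypothetical class name and placeholder values).
 */
public class ThreadPoolMonitor {

    public static void main(String[] args) throws Exception {

        JMXServiceURL url = new JMXServiceURL("t3", "localhost", 7001,
                "/jndi/weblogic.management.mbeanservers.runtime");

        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.SECURITY_PRINCIPAL, "weblogic");    // placeholder credentials
        env.put(Context.SECURITY_CREDENTIALS, "password");
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES,
                "weblogic.management.remote");

        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();

            // ObjectName pattern to validate against your Weblogic version
            ObjectName threadPool = new ObjectName(
                    "com.bea:ServerRuntime=ManagedServer1,Name=ThreadPoolRuntime,Type=ThreadPoolRuntime");

            System.out.println("Total: "
                    + connection.getAttribute(threadPool, "ExecuteThreadTotalCount"));
            System.out.println("Idle: "
                    + connection.getAttribute(threadPool, "ExecuteThreadIdleCount"));
            System.out.println("Standby: "
                    + connection.getAttribute(threadPool, "StandbyThreadCount"));
            System.out.println("Hogging: "
                    + connection.getAttribute(threadPool, "HoggingThreadCount"));
        } finally {
            connector.close();
        }
    }
}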

# Thread matrix




This matrix gives you a view of each thread along with its current state. There is one more state that you must also understand:

  • STUCK: A stuck thread is identified by Weblogic when it is taking more time than the configured stuck thread time (default is 600 seconds). When facing slowdown conditions, you will normally see Weblogic threads transitioning from the Hogging state to the STUCK state, depending on how long these threads remain stuck executing their current request