WebLogic Stability
Detect and Analyse Stuck Threads
OGh SIG Cloud Application Foundation 28 sept
Maarten Smeets
Ankur Arora
2
• About Maarten
– Oracle Integration Consultant
– Experience with Oracle SOA Suite since 2007
– Well certified (SOA, BPM, Java, SQL,
PL/SQL among others)
– Author of a lot of blog articles
(http://javaoraclesoa.blogspot.com)
• About AMIS
– Located in the Netherlands
https://nl.linkedin.com/in/smeetsm
@MaartenSmeetsNL
3
• About Ankur
– Oracle Fusion Middleware Consultant
– Experience with
Oracle SOA Suite since 2008
– ITIL Certified
https://nl.linkedin.com/in/annkurarora
4
Agenda
• Introduction
– What is a stuck thread?
– Can a stuck thread recover?
– What are the consequences of a stuck thread?
– WebLogic settings for stuck threads
• Analyze
– Analyze a stuck thread with ThreadLogic
– Analyze a stuck thread with MAT
• Prevent
– Timeouts for authentication providers
– Prevent stuck threads in Service Bus
– Prevent stuck threads in composites
– Work Managers
– Methods to prevent stuck threads
5
Introduction
6
What is a stuck thread
Often an indication of stuck threads
7
What is a stuck thread
Server, Monitoring, Threads
Get a thread dump
First a thread becomes a Hogger. Next it becomes Stuck
8
Can a stuck thread recover?
• Yes. A thread that is marked as stuck is not necessarily doing nothing. It is usually waiting for something that is taking longer than the configured time. Typical causes:
– Infrastructure
– Software
– Slow response from a third-party application
9
Consequences of a Stuck Thread
What are the consequences of a stuck thread?
• The stuck thread cannot be assigned to other tasks.
• Too many stuck threads can prevent requests or responses from being processed.
• A large number of stuck threads in a domain will slow the application down
or even make the domain unusable.
10
WebLogic Settings for Stuck Threads
To check or change the settings that WebLogic uses to mark a thread as stuck, follow these steps in the
Administration Console.
• Click Lock & Edit if you want to change the default settings to suit your application requirements.
• Click Servers.
• Click the Managed Server whose settings you want to check or change.
• Open the Configuration > Tuning tab.
• Stuck Thread Max Time: the number of seconds that a thread must be continually working before a
server instance diagnoses it as stuck (default 600).
• Stuck Thread Timer Interval: the interval, in seconds, at which a server instance scans
threads to see whether they have been continually working for the configured Stuck Thread Max Time (default 60).
• Click Save, then Activate Changes.
• A restart of the Managed Server is required for the changes to take effect.
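The interplay of the two settings above can be illustrated with a small model. This is only a sketch of the idea (a periodic scan that flags threads busy for longer than the maximum time), not WebLogic's actual implementation; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecuteThread:
    """Hypothetical stand-in for a server execute thread."""
    name: str
    busy_since: Optional[float]  # time (s) the current request started; None if idle


def scan_for_stuck(threads: List[ExecuteThread], now: float,
                   stuck_thread_max_time: float = 600.0) -> List[str]:
    """One scan: names of threads busy on one request for at least the max time."""
    return [t.name for t in threads
            if t.busy_since is not None
            and now - t.busy_since >= stuck_thread_max_time]


def detection_delay_bounds(max_time: float = 600.0,
                           timer_interval: float = 60.0) -> tuple:
    """Because scans are periodic, a thread is flagged somewhere between
    max_time and max_time + timer_interval after its request started."""
    return (max_time, max_time + timer_interval)
```

With the values 600 and 60, a long-running request would be reported as stuck between 10 and 11 minutes after it started, under the assumptions of this model.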
11
What is a stuck thread
WebLogic Server marks a thread as stuck if it is continually working for a set period of time.
12
Analysis
13
Analyze a stuck thread
Obtain a thread dump
14
Analyze a stuck thread
Obtain a thread dump using CLI
• Obtain the process ID <pid> with jps
• All JVMs (JRockit, HotSpot, OpenJDK)
– on Linux/Unix: kill -3 <pid>
– OS independent: jstack -l <pid>
• JRockit specific
– jrcmd <pid> print_threads
15
Analyze a stuck thread
Obtain a thread dump with a GUI
• Java VisualVM.
• Bundled with JDK version 6 update 7
or greater
16
Analyze a stuck thread
Obtain a thread dump with a GUI
• Java Mission Control
• Bundled with JDK 7 Update 40
or greater
17
Analyse a thread dump with ThreadLogic
• ThreadLogic
18
Analyse a thread dump with ThreadLogic
• Found my stuck thread
19
Analyse a thread dump with ThreadLogic
• Suggestions to
increase stability!
20
Analyse a thread dump with ThreadLogic
• OSB has
stuck thread
21
Obtain a heap dump
Using CLI
• Obtain the process ID <pid> with jps
• All JVMs (JRockit, HotSpot, OpenJDK), OS independent
– jmap -dump:format=b,file=file.hprof <pid>
• JRockit specific
– jrcmd <pid> hprofdump filename=file.hprof
22
Obtain a heap dump
GUI: Java Mission Control
23
Analyse a heap dump with MAT
• Eclipse Memory Analyzer
Tool allows heap analysis
24
Analyse a heap dump with MAT
• Stuck thread leads to dispatcher. Dispatcher leads to pipeline.
This information cannot be obtained from the thread dump alone
25
Analyse a heap dump with MAT
• Stuck thread leads to testClient invoke
• Invoke leads to serviceMetadata
• serviceMetadata leads to Proxy
• Name of proxy service causing issue
26
Analysis summary
• Look at the threads in the WebLogic console
• Obtain a thread dump from the console or CLI
• Analyse the thread dump with ThreadLogic or manually
• Obtain a heap dump from the console or CLI
• Analyse the heap dump with MAT
27
Prevention
28
Preventing stuck threads
• Circuit breakers
• Timeouts
• Bulkheads
• Redundancy
29
Circuit breaker (SOA Suite 12.2.1)
Overview
• Automatically suspends upstream endpoints when a downstream endpoint is down.
• Automatically resumes any suspended service when the downstream endpoint comes back up.
• Prevents fault buildup in the server. No need to bulk-recover faulted instances
• Supported for
– Web Service: Incoming requests are rejected for the duration that
the Web service is suspended.
– Adapters: JMS, AQ, DB, File and FTP adapters can be automatically
suspended in this release.
– EDN Subscribers: The EDN subscriber closest to the downstream
endpoint gets suspended.
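The general circuit breaker pattern behind this feature can be sketched in a few lines. This is a generic illustration, not the SOA Suite implementation: the class name, the failure threshold and the reset timeout are all assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the behaviour can be tested
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Suspended: reject immediately instead of tying up a thread.
                raise CircuitOpenError("downstream suspended; request rejected")
            # Reset timeout elapsed: let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            raise
        # Success: close the circuit and forget earlier failures.
        self.failures = 0
        self.opened_at = None
        return result
```

The point of the pattern is that rejected calls fail fast, so threads are not held waiting on an endpoint that is already known to be down.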
30
Circuit breaker
Configuration
31
Circuit breaker
Configuration
32
Circuit breaker
In action
33
Timeouts in SOA Suite
• Timeouts on references
• SyncMaxWaitTime
• Timeouts in the SOA EJB’s
• JTA timeout
• JDBC timeout
• Distributed lock timeout (Database)
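What all of these timeouts have in common is that they bound how long a thread can wait. The sketch below shows that principle with a plain socket read; it does not show how any of the SOA Suite timeouts listed above are configured, and the function name is hypothetical.

```python
import socket


def read_with_timeout(host, port, timeout_seconds):
    """Connect and wait for data, but give up after timeout_seconds.

    Returns the received bytes, or None when the peer did not answer in time.
    Without a timeout the calling thread would block indefinitely.
    """
    with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            return None
```

A caller that gets `None` back can fault the request or retry, and its thread is released either way.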
34
Prevent stuck threads
Timeouts authentication providers
• The LDAPServerMBean.ConnectTimeout attribute for all LDAP Authentication providers
has a default value of zero. This default setting can result in a slowdown in WebLogic
Server execution if the LDAP server is unavailable.
• Oracle recommends that you set the LDAPServerMBean.ConnectTimeout attribute to a
non-zero value; for example, 60 seconds. You can set this value via either the WebLogic
Server Administration Console, WLST or in the config.xml file.
35
Prevent stuck threads in Service Bus
• Understand the threading model
– Avoid common misconceptions!
• Understand the different ways the Service Bus can call external services
– Service Callout
– Publish
– Routing
– Java Callout
36
Service Bus Threading model
http://www.ateam-oracle.com/wp-content/uploads/2013/09/OSBThreadingModelHTTPTransport_1.1.pdf
37
Prevent stuck threads
In the Service Bus
• Oracle Service Bus supports several different ways to invoke an external service. Understand the differences
and choose the right one for your requirements.
• Service Callout
– Enrichment of the payload
– Blocking call: the pipeline thread waits for the response, so it can become stuck if the callout is slow
– Solution: make sure that any Business Services used by a Callout in a pipeline use a different
Work Manager from the pipeline itself
Sources:
http://www.oracle.com/technetwork/middleware/soasuite/learnmore/con7977-2769509.pdf
https://blogs.oracle.com/reynolds/entry/following_the_thread_in_osb
38
Prevent stuck threads
In the Service Bus
• Publish
– Invoke the service asynchronously
– Non-blocking call
– Continue with the message flow without waiting on a response
– No guaranteed delivery unless QoS set to exactly once
39
Prevent stuck threads
In the Service Bus
• Routing
– Common mechanism to invoke a service
– Demarcation between request and response pipelines; thread is released after routing before receiving response
– Uses asynchronous servlet to wait for response from HTTP-based service
– Setting the Quality of Service to Exactly-Once will use the same thread for the response as the request
• Java Callout
– Should only be used when invoking very fast services
– The request thread is the same thread executing the Java method
– Can cause stuck threads / thread starvation if the response does not come
– Examples: read or update a cache, quick calculations
40
Prevent stuck threads
In the Service Bus
• Resolution
– Assign Minimum Constraint Work Manager to Invoked Business Service
– Set the number of threads ( 0 < Threads <= 2)
• Work Managers used should be considered carefully
• References – http://docs.oracle.com/middleware/1213/wls/CNFGD/self_tuned.htm#CNFGD112
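Why a blocking callout needs threads of its own can be shown with a toy model. In this sketch two Python thread pools stand in for two Work Managers, which is a simplification: WebLogic uses one shared self-tuning pool, and a minimum threads constraint plays the role of the second pool. All names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_pipelines(pipeline_pool, callout_pool, n_requests, wait_seconds=0.5):
    """Run n 'pipeline' tasks that each block on a 'callout'; count successes."""
    in_step = threading.Barrier(n_requests)  # keeps the demonstration deterministic

    def callout():
        return "enriched"

    def pipeline():
        in_step.wait()                        # every pipeline thread is now busy
        future = callout_pool.submit(callout)
        try:
            result = future.result(timeout=wait_seconds)  # blocking call
        except FutureTimeout:
            result = None                     # no thread was free to run the callout
        in_step.wait()                        # release threads only after all have waited
        return result

    futures = [pipeline_pool.submit(pipeline) for _ in range(n_requests)]
    return sum(1 for f in futures if f.result() is not None)


def shared_pool_demo():
    # Pipelines and callouts compete for the same two threads.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return run_pipelines(pool, pool, n_requests=2)


def separate_pool_demo():
    # Callouts have their own threads, so pipelines cannot starve them.
    with ThreadPoolExecutor(max_workers=2) as pipelines, \
            ThreadPoolExecutor(max_workers=2) as callouts:
        return run_pipelines(pipelines, callouts, n_requests=2)
```

In the shared case every thread is occupied by a pipeline waiting for a callout that has no thread left to run on, so no request completes until the waits time out. With separate threads for the callouts both requests complete.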
41
Prevent stuck threads in composites
SOA Direct-Binding
• Is the right choice for invoking a SOA composite ONLY IF your use case involves one of these three
requirements
– You need to propagate the Security Subject to SOA
– You need to propagate the transaction context to SOA
– You have an asynchronous process with a callback to the OSB from SOA
• Thread is blocked until a response is returned
• Timeouts cannot be set on this transport; could
potentially lead to long running processes
42
SOA-Direct and performance
• Creating direct bindings (SOA-Direct, OSB-Direct, etc.) for components in addition to any other bindings, e.g.
SOAP bindings allows the engine to avoid unnecessary marshaling. However, this may not necessarily
improve overall performance and scalability. Calling services through OSB may provide more scalability as
the OSB HTTP transport is capable of releasing threads to do other work while waiting for responses.
http://www.oracle.com/technetwork/middleware/bpm/learnmore/bpm11gperftuning-1912340.pdf
43
Work Managers
• WebLogic uses a concept called Work Manager to prioritize work and to maintain threads and thread pools.
Work Managers can be created and configured by the administrator at the WebLogic level or by
application developers at the application level (deployment descriptors).
• A Work Manager lets you guarantee that each application gets its share of the available
resources (threads/connections), or limit the amount of resources (e.g. threads) it can use.
• Purpose
– Indicating the type of work
– Prioritizing work
44
Work Manager constraints
• A constraint defines the minimum and maximum number of threads allocated to execute requests and the
total number of requests that can be queued or executing before the server begins rejecting requests.
Constraints can be shared by several Work Managers.
• Max threads – Default, unlimited.
The maximum number of threads that can concurrently execute requests. Can be set based on the
availability of a resource the request depends on e.g. a connection pool.
• Min threads – Default, zero.
The minimum number of threads to allocate to requests. Useful for preventing deadlocks.
• Capacity – Default, -1 (never reject requests).
The capacity (including queued and executing) at which the server starts rejecting requests.
• The thread pool is shared amongst all Work Managers, so a constraint does not guarantee that that number of
threads will be available for processing at any given time. Some users confuse the notion of constraints with
the idea of establishing dedicated thread pools for a service.
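How the maximum threads and capacity constraints interact can be expressed as a small decision function. This is a deliberately simplified model of the descriptions above: it ignores the minimum threads constraint and the availability of threads in the shared pool, and the function name is hypothetical.

```python
def admit(executing, queued, max_threads=None, capacity=-1):
    """Decide what happens to a new request under the given constraints.

    executing   -- requests of this Work Manager currently running
    queued      -- requests of this Work Manager currently waiting
    max_threads -- max threads constraint (None = unlimited, the default)
    capacity    -- capacity constraint (-1 = never reject, the default)
    """
    if capacity != -1 and executing + queued >= capacity:
        return "reject"    # capacity counts queued and executing requests together
    if max_threads is None or executing < max_threads:
        return "execute"   # still below the max threads constraint
    return "queue"         # must wait until a running request finishes
```

For example, with `max_threads=5` and `capacity=10`, the sixth concurrent request is queued and the eleventh is rejected.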
45
Work Manager request classes
• Request classes define how requests are prioritized and how threads are allocated to requests. They can be
used to ensure that high priority applications are scheduled before low priority ones, requests complete
within a given response time or certain users are given priority over others. Each Work Manager may specify
one request class.
• Fair Share – Defines the average thread-use time. Specified as a relative value, not a percentage.
• Response Time – Defines the requested response time (in milliseconds).
• Context – Allows you to specify request classes based on contextual information such as the user or user
group.
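Because a fair share is a relative value, its effect depends on which other request classes are competing at the same time. A short calculation makes this concrete; the function name is hypothetical, and the resulting proportions should only be expected under sustained load when all classes have work queued.

```python
def fair_share_ratios(shares):
    """Convert relative fair-share values into fractions of thread-use time."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}
```

With shares of 80 and 20, two competing applications would get roughly 80% and 20% of the thread-use time. Add a third application with a share of 100 and the same values of 80 and 20 now correspond to 40% and 10%.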
46
Types of Work Managers
• Default – Used if no other Work Manager is configured. All applications are given an equal priority.
• Global – Domain-scoped and defined in config.xml. Applications use the global Work Manager as a
blueprint and create their own instance. The work each application does can then be distinguished from
that of other applications.
• Application – Application-scoped and applied only to a specific application. Specified in
weblogic-application.xml, weblogic-ejb-jar.xml, or weblogic.xml.
47
Rule of thumb
• Design the application for the worst-case scenario, not the best-case scenario.
• Define timeouts in service calls.
• Define timeouts for authentication providers.
• Keep monitoring the capacity of the infrastructure (CPU, etc.).
• Define Circuit Breakers, Work Managers, etc. to avoid one service bringing down the entire server.
48
Demonstration
49