WebLogic Stability: Detect and Analyse Stuck Threads (OGh SIG Cloud Application Foundation, 28 Sept)

This document discusses strategies for detecting and preventing stuck threads in WebLogic. It begins with an introduction that defines a stuck thread and outlines potential consequences. It then covers how to analyze stuck threads using thread dumps from the WebLogic console or CLI and heap dumps analyzed in MAT. Finally, it discusses ways to prevent stuck threads, such as using circuit breakers, timeouts, and bulkheads to limit resource consumption when failures occur.


WebLogic Stability

Detect and Analyse Stuck Threads


OGh SIG Cloud Application Foundation, 28 Sept
Maarten Smeets
Ankur Arora
2

• About Maarten
– Oracle Integration Consultant
– Experience with Oracle SOA Suite since 2007
– Well certified (SOA, BPM, Java, SQL, PL/SQL among others)
– Author of many blog articles
(http://javaoraclesoa.blogspot.com)
• About AMIS
– Located in the Netherlands

https://nl.linkedin.com/in/smeetsm

@MaartenSmeetsNL
3

• About Ankur
– Oracle Fusion Middleware Consultant
– Experience with Oracle SOA Suite since 2008
– ITIL Certified

https://nl.linkedin.com/in/annkurarora
4

Agenda

• Introduction
– What is a stuck thread?
– Can a stuck thread recover?
– What are the consequences of a stuck thread?
– WebLogic settings for stuck threads

• Analyze
– Analyze a stuck thread with ThreadLogic
– Analyze a stuck thread with MAT

• Prevent
– Timeouts in authentication providers
– Prevent stuck threads in Service Bus
– Prevent stuck threads in composites
– Work Managers
– Methods to prevent stuck threads
5

Introduction
6

What is a stuck thread

Often an indication of stuck threads


7

What is a stuck thread

Server > Monitoring > Threads

Get a thread dump

First a thread becomes a Hogger. Next, it becomes Stuck.


8

Can a stuck thread recover?

• Yes. A thread marked as stuck is not necessarily doing nothing. It is usually waiting for something that is taking longer than the configured time, for example:
– Infrastructure
– Software
– A slow response from a third-party application
9

Consequences of a Stuck Thread

What are the consequences of a stuck thread?

• That particular thread cannot be used for other tasks.

• Too many stuck threads can prevent requests or responses from being processed.

• A large number of stuck threads in a domain will make the application slow or can even make the domain unusable.
10

Weblogic Settings for Stuck Thread

To check or change the settings WebLogic uses to mark a thread as stuck, follow the steps below.

• Click Lock & Edit if you want to change the default settings to suit your application requirements.
• Click Servers.
• Click the Managed Server whose settings you want to check or change.
• Open the Configuration > Tuning tab:
– Stuck Thread Max Time: the amount of time, in seconds, that a thread must be continually working before a server instance diagnoses it as stuck.
– Stuck Thread Timer Interval: the amount of time, in seconds, after which a server instance periodically scans threads to see if they have been continually working for the configured Stuck Thread Max Time.
• Click Save, then Activate Changes.
• Restart the Managed Server for the changes to take effect.
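To make the interplay of the two settings concrete, here is a minimal Python sketch of a periodic scan. It is a simplified model, not WebLogic's actual implementation. It assumes the commonly documented defaults of 600 seconds for Stuck Thread Max Time and 60 seconds for Stuck Thread Timer Interval (check your own domain), and WorkerThread, scan and first_flagged are hypothetical names. The point it illustrates is that a thread is only reported at the first scan after it crosses the max time, so detection can lag by up to one timer interval.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed values: the usual WebLogic defaults for Stuck Thread Max Time and
# Stuck Thread Timer Interval, in seconds. Check your own domain configuration.
STUCK_THREAD_MAX_TIME = 600
STUCK_THREAD_TIMER_INTERVAL = 60


@dataclass
class WorkerThread:
    name: str
    busy_since: Optional[int]  # second at which the current request started; None if idle


def scan(threads: List[WorkerThread], now: int,
         max_time: int = STUCK_THREAD_MAX_TIME) -> List[str]:
    """One timer tick: names of threads continually working for at least max_time."""
    return [t.name for t in threads
            if t.busy_since is not None and now - t.busy_since >= max_time]


def first_flagged(threads: List[WorkerThread], until: int,
                  interval: int = STUCK_THREAD_TIMER_INTERVAL,
                  max_time: int = STUCK_THREAD_MAX_TIME) -> Dict[str, int]:
    """Scan every `interval` seconds from t=0 up to `until` (inclusive) and
    record the second at which each thread is first reported as stuck."""
    flagged: Dict[str, int] = {}
    for now in range(0, until + 1, interval):
        for name in scan(threads, now, max_time):
            flagged.setdefault(name, now)
    return flagged
```

For example, with thread A busy since second 0, thread B busy since second 100 and thread C idle, first_flagged(threads, until=720) returns {'A': 600, 'B': 720}: B crosses 600 seconds of work at second 700 but is only reported at the 720-second scan.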
11

What is a stuck thread

WebLogic Server marks a thread as stuck if it has been continually working for a set period of time (the Stuck Thread Max Time).
12

Analysis
13

Analyze a stuck thread


Obtain a thread dump

• Obtain a thread dump from the WebLogic Administration Console


14

Analyze a stuck thread


Obtain a thread dump using CLI

• Obtain the process ID <pid> with jps

• All JVMs (JRockit, HotSpot, OpenJDK)
– On Linux/Unix: kill -3 <pid> (the dump is written to the JVM's standard output)
– OS-independent: jstack -l <pid>

• JRockit-specific
– jrcmd <pid> print_threads
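For a quick manual pass over a dump captured with jstack or kill -3, a short script can list the threads WebLogic has already flagged. This Python sketch assumes HotSpot-style output, where each thread starts with its name in double quotes and WebLogic prefixes execute-thread names with a state such as [STUCK]. stuck_threads and report are hypothetical helpers; ThreadLogic remains the more complete tool.

```python
import re
from typing import Dict, List

# Assumption: HotSpot-style dump where each thread block starts with its quoted
# name, e.g. "[STUCK] ExecuteThread: '3' for queue: '...'" followed by details.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]*)"')


def stuck_threads(dump_text: str, max_frames: int = 5) -> Dict[str, List[str]]:
    """Map each thread whose name contains [STUCK] to the top frames of its stack."""
    stuck: Dict[str, List[str]] = {}
    current = None  # name of the stuck thread whose frames we are collecting
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            name = header.group("name")
            current = name if "[STUCK]" in name else None
            if current is not None:
                stuck[current] = []
            continue
        frame = line.strip()
        # Stack frames look like "at java.net.SocketInputStream.read(...)".
        if current is not None and frame.startswith("at ") and len(stuck[current]) < max_frames:
            stuck[current].append(frame[3:])
    return stuck


def report(path: str) -> None:
    """Print the stuck threads found in a thread dump saved to `path`."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for name, frames in stuck_threads(fh.read()).items():
            print(name)
            for frame in frames:
                print("    at", frame)
```

Calling report("threaddump.txt") on a saved dump (the file name is only an example) prints each stuck thread with its top stack frames, which is often enough to see what the thread is waiting on.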
15

Analyze a stuck thread


Obtain a thread dump with a GUI

• Java VisualVM

• Bundled with JDK 6 Update 7 or later
16

Analyze a stuck thread


Obtain a thread dump with a GUI

• Java Mission Control

• Bundled with JDK 7 Update 40 or later
17

Analyse a thread dump with ThreadLogic

• ThreadLogic
18

Analyse a thread dump with ThreadLogic

• Found my stuck thread


19

Analyse a thread dump with ThreadLogic

• Suggestions to increase stability!
20

Analyse a thread dump with ThreadLogic

• OSB has a stuck thread
21

Obtain a heap dump


Using CLI

• Obtain the process ID <pid> with jps

• All JVMs (JRockit, HotSpot, OpenJDK), OS-independent
– jmap -dump:format=b,file=file.hprof <pid>

• JRockit-specific
– jrcmd <pid> hprofdump filename=file.hprof
22

Obtain a heap dump


GUI: Java Mission Control
23

Analyse a heap dump with MAT

• Eclipse Memory Analyzer (MAT)

The tool allows heap analysis
24

Analyse a heap dump with MAT

• Stuck thread leads to dispatcher. Dispatcher leads to pipeline.


This information cannot be obtained from the thread dump alone
25

Analyse a heap dump with MAT

Stuck thread leads to testClient invoke

Invoke leads to serviceMetadata

serviceMetadata leads to Proxy

Name of the proxy service causing the issue


26

Analysis summary

• Look at the threads in the WebLogic console

• Obtain a thread dump from the console or CLI

• Analyse the thread dump with ThreadLogic or manually

• Obtain a heap dump from the CLI or a GUI (Java Mission Control)

• Analyse the heap dump with MAT


27

Prevention
28

Preventing stuck threads

• Circuit breakers
• Timeouts
• Bulkheads
• Redundancy


29

Circuit breaker (SOA Suite 12.2.1)


Overview
• Automatically suspends upstream endpoints when a downstream endpoint is down

• Automatically resumes any suspended service when the downstream endpoint comes back up

• Prevents fault buildup in the server; no need to bulk-recover faulted instances

• Supported for
– Web Services: incoming requests are rejected for as long as the Web service is suspended.
– Adapters: JMS, AQ, DB, File and FTP adapters can be suspended automatically in this release.
– EDN subscribers: the EDN subscriber closest to the downstream endpoint is suspended.
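The SOA Suite circuit breaker is configured rather than coded, but the underlying pattern is easy to sketch. The Python class below is a generic, hypothetical illustration of that pattern, not Oracle's implementation: after a number of consecutive failures it opens and rejects calls immediately instead of tying up a thread on a dead endpoint, and after a reset timeout it lets a trial call through (half-open) to detect that the endpoint is back. The threshold, timeout and class names are assumptions, and the class is not thread-safe as written.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the downstream endpoint while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0  # consecutive failures seen so far
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # let a trial call through
        return "open"

    def call(self, func, *args, **kwargs):
        state = self.state
        if state == "open":
            # Fail fast: no thread waits on an endpoint known to be down.
            raise CircuitOpenError("downstream endpoint is suspended")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open: suspend calls again
            raise
        # Success: the endpoint is healthy, so resume normal operation.
        self.failures = 0
        self.opened_at = None
        return result
```

Rejecting early is what keeps threads from piling up: while the circuit is open, callers fail at once instead of each waiting for a downstream timeout.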
30

Circuit breaker
Configuration
31

Circuit breaker
Configuration
32

Circuit breaker
In action
33

Timeouts in SOA Suite

• Timeouts on references

• SyncMaxWaitTime

• Timeouts in the SOA EJBs

• JTA timeout

• JDBC timeout

• Distributed lock timeout (Database)
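The effect of a timeout on a blocking call can be sketched in Python with a thread pool. The helper below is hypothetical and stands in for any of the timeouts listed above: the caller stops waiting after timeout_s seconds and gets an error it can handle, instead of blocking indefinitely.

```python
import concurrent.futures

# Hypothetical pool standing in for the threads that perform outbound calls.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s: float, *args):
    """Run func(*args) on the pool but stop waiting after timeout_s seconds.

    Raises concurrent.futures.TimeoutError when no result arrives in time.
    """
    future = _pool.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the call has not started yet
        raise
```

Note that the timeout only frees the waiting caller: Python cannot interrupt the worker thread, so a slow call keeps its pool thread busy until it returns. This is one argument for setting timeouts at several of the layers listed above rather than only at the outermost one.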


34

Prevent stuck threads


Timeouts in authentication providers

• The LDAPServerMBean.ConnectTimeout attribute for all LDAP Authentication providers has a default value of zero, which means no time limit. This default setting can result in a slowdown in WebLogic Server execution if the LDAP server is unavailable.

• Oracle recommends that you set the LDAPServerMBean.ConnectTimeout attribute to a non-zero value, for example 60 seconds. You can set this value via the WebLogic Server Administration Console, WLST, or the config.xml file.
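A bounded connect attempt is what a non-zero ConnectTimeout buys: an unreachable LDAP server costs at most the configured time instead of the operating system's TCP timeout. The Python probe below illustrates only that idea; it is not part of WebLogic, ldap_reachable is a hypothetical helper, and port 389 is simply the standard LDAP port used as a default.

```python
import socket


def ldap_reachable(host: str, port: int = 389, connect_timeout_s: float = 60.0) -> bool:
    """Try to open a TCP connection to an LDAP server within connect_timeout_s.

    Unlike WebLogic's ConnectTimeout, where 0 means "no limit", Python treats a
    0 timeout as non-blocking mode, so only positive values are accepted here.
    """
    if connect_timeout_s <= 0:
        raise ValueError("connect_timeout_s must be positive")
    try:
        with socket.create_connection((host, port), timeout=connect_timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

Used as a health check, ldap_reachable("ldap.example.com", 389, 60.0) (an example host name) returns False after at most 60 seconds when the server does not answer.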
35

Prevent stuck threads in Service Bus

• Understand the threading model
– Avoid common misconceptions!

• Understand the different ways the Service Bus can call external services
– Service Callout
– Publish
– Routing
– Java Callout
36

Service Bus Threading model

http://www.ateam-oracle.com/wp-content/uploads/2013/09/OSBThreadingModelHTTPTransport_1.1.pdf
37

Prevent stuck threads


In the Service Bus

• Oracle Service Bus supports several different ways to invoke an external service; understand the differences and choose the right one for your requirements:

• Service Callout
– Enrichment of the payload
– Blocking call

Because a Service Callout blocks the pipeline thread while it waits, make sure that any Business Services used by a Callout in a pipeline use a different Work Manager from the pipeline itself.
http://www.oracle.com/technetwork/middleware/soasuite/learnmore/con7977-2769509.pdf
https://blogs.oracle.com/reynolds/entry/following_the_thread_in_osb
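The starvation problem can be reproduced with a Python analogy in which a ThreadPoolExecutor stands in for the threads available to a Work Manager. This is a simplification and not OSB code: WebLogic Work Managers draw from one shared self-tuning pool and are limited through constraints rather than separate pools. In the sketch, every pipeline thread blocks on a callout; when the callout must run on the same saturated pool, no thread is ever free to execute it and every request times out. run and business_service are hypothetical names.

```python
import concurrent.futures as cf
import threading
import time


def business_service():
    """Stand-in for the external service behind the Business Service."""
    time.sleep(0.05)
    return "response"


def run(shared_pool: bool, threads: int = 2, wait_s: float = 0.5):
    """Run `threads` concurrent pipeline requests, each making one blocking callout.

    shared_pool=True runs the callout on the pipeline's own pool (same Work Manager);
    shared_pool=False gives the callout a pool of its own (separate Work Manager).
    """
    pipeline_pool = cf.ThreadPoolExecutor(max_workers=threads)
    callout_pool = pipeline_pool if shared_pool else cf.ThreadPoolExecutor(max_workers=threads)
    # Make every pipeline thread busy before any callout is submitted,
    # so the outcome does not depend on scheduling luck.
    all_busy = threading.Barrier(threads)

    def pipeline():
        all_busy.wait(timeout=5)
        callout = callout_pool.submit(business_service)
        return callout.result(timeout=wait_s)  # blocks, like a Service Callout

    futures = [pipeline_pool.submit(pipeline) for _ in range(threads)]
    outcomes = []
    for f in futures:
        try:
            outcomes.append(f.result())
        except cf.TimeoutError:
            outcomes.append("starved")  # no free thread ever ran the callout
    pipeline_pool.shutdown()
    if callout_pool is not pipeline_pool:
        callout_pool.shutdown()
    return outcomes
```

With these settings, run(shared_pool=True) returns ['starved', 'starved'], while run(shared_pool=False) returns ['response', 'response'].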
38

Prevent stuck threads


In the Service Bus

• Publish
– Invoke the service asynchronously
– Non-blocking call
– Continue with the message flow without waiting on a response
– No guaranteed delivery unless the QoS is set to Exactly-Once
39

Prevent stuck threads


In the Service Bus

• Routing
– Common mechanism to invoke a service
– Demarcation between request and response pipelines; the thread is released after routing, before the response is received
– Uses an asynchronous servlet to wait for the response from an HTTP-based service
– Setting the Quality of Service to Exactly-Once will use the same thread for the response as for the request

• Java Callout
– Should only be used when invoking very fast services, for example reading or updating a cache, or quick calculations
– The request thread is the same thread executing the Java method
– Can cause stuck threads / thread starvation if the response does not come
40

Prevent stuck threads


In the Service Bus

• Resolution
– Assign a Work Manager with a minimum threads constraint to the invoked Business Service
– Set the number of threads (0 < threads <= 2)

• Consider carefully which Work Managers are used

• References: http://docs.oracle.com/middleware/1213/wls/CNFGD/self_tuned.htm#CNFGD112
41

Prevent stuck threads in composites


SOA Direct-Binding

• Is the right choice for invoking a SOA composite ONLY IF your use case involves one of these three requirements:
– You need to propagate the Security Subject to SOA
– You need to propagate the transaction context to SOA
– You have an asynchronous process with a callback to the OSB from SOA

• The thread is blocked until a response is returned

• Timeouts cannot be set on this transport; this could potentially lead to long-running processes
42

SOA-Direct and performance

• Creating direct bindings (SOA-Direct, OSB-Direct, etc.) for components, in addition to any other bindings such as SOAP bindings, allows the engine to avoid unnecessary marshaling. However, this may not necessarily improve overall performance and scalability. Calling services through OSB may provide more scalability, as the OSB HTTP transport is capable of releasing threads to do other work while waiting for responses.

http://www.oracle.com/technetwork/middleware/bpm/learnmore/bpm11gperftuning-1912340.pdf
43

Work Managers

• WebLogic uses Work Managers to prioritize work and to manage threads and thread pools. Work Managers can be created and configured by the administrator at the WebLogic level, or by application developers at the application level (deployment descriptors).

• A Work Manager lets you ensure that each application gets its share of the available resources (threads/connections), or limit the amount of resources (e.g. threads) it can use.

• Purpose
– Indicating the type of work
– Prioritizing work
44

Work Manager constraints

• A constraint defines the minimum and maximum number of threads allocated to execute requests, and the total number of requests that can be queued or executing before the server begins rejecting requests. Constraints can be shared by several Work Managers.

• Max threads (default: unlimited)
The maximum number of threads that can concurrently execute requests. Can be set based on the availability of a resource the request depends on, e.g. a connection pool.
• Min threads (default: zero)
The minimum number of threads to allocate to requests. Useful for preventing deadlocks.
• Capacity (default: -1, never reject requests)
The capacity (including queued and executing requests) at which the server starts rejecting requests.
• The thread pool is shared among all Work Managers, so a constraint does not guarantee that that number of threads will be available for processing at any given time. Some users confuse the notion of constraints with the idea of establishing dedicated thread pools for a service.
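A constraint behaves much like a bulkhead. The toy Python model below, which is not WebLogic's scheduler, combines a max-threads limit with a capacity limit: requests beyond max_threads wait, and once queued plus executing requests reach the capacity, new requests are rejected. ConstrainedWorkManager and RequestRejected are hypothetical names. The semaphore only caps concurrency and reserves no threads, which mirrors the point above that a constraint is not a dedicated thread pool.

```python
import threading


class RequestRejected(Exception):
    """Raised when the capacity constraint is reached."""


class ConstrainedWorkManager:
    """Toy model of a max-threads constraint combined with a capacity constraint."""

    def __init__(self, max_threads: int, capacity: int = -1):
        self._slots = threading.Semaphore(max_threads)  # at most max_threads run at once
        self._capacity = capacity  # -1 means never reject (the default on the slide)
        self._in_system = 0  # queued + executing requests
        self._lock = threading.Lock()

    @property
    def in_system(self) -> int:
        with self._lock:
            return self._in_system

    def execute(self, work):
        """Run work() under the constraints, or raise RequestRejected."""
        with self._lock:
            if self._capacity != -1 and self._in_system >= self._capacity:
                raise RequestRejected("capacity reached: request rejected")
            self._in_system += 1
        try:
            with self._slots:  # queue here until a thread slot is free
                return work()
        finally:
            with self._lock:
                self._in_system -= 1
```

Rejecting at the capacity limit trades a fast, explicit error for the caller against a queue that would otherwise grow while threads are stuck.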
45

Work Manager request classes

• Request classes define how requests are prioritized and how threads are allocated to requests. They can be
used to ensure that high priority applications are scheduled before low priority ones, requests complete
within a given response time or certain users are given priority over others. Each Work Manager may specify
one request class.

• Fair Share – Defines the average thread-use time. Specified as a relative value, not a percentage.

• Response Time – Defines the requested response time (in milliseconds).

• Context – Allows you to specify request classes based on contextual information such as the user or user group.
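Because Fair Share values are relative, only their ratio matters. Assuming all the Work Managers involved have work pending, each one's expected share of thread-use time is its value divided by the sum of all values; the hypothetical helper below does just that arithmetic (WebLogic applies the ratio as an average over time, not to individual requests).

```python
from typing import Dict


def fair_share_fractions(shares: Dict[str, float]) -> Dict[str, float]:
    """Convert relative Fair Share values into fractions of thread-use time,
    assuming every Work Manager listed has requests waiting."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("fair-share values must add up to a positive number")
    return {name: value / total for name, value in shares.items()}
```

For example, fair-share values of 50 and 150 give 25% and 75% of thread-use time, and 500 and 1500 give exactly the same split.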
46

Types of Work Managers

• Default – Used if no other Work Manager is configured. All applications are given an equal priority.

• Global – Domain-scoped and defined in config.xml. Applications use the global Work Manager as a blueprint and create their own instance, so the work each application does can be distinguished from that of other applications.

• Application – Application-scoped and applied only to a specific application. Specified in weblogic-application.xml, weblogic-ejb-jar.xml, or weblogic.xml.
47

Rule of thumb

• Design the application for the worst-case scenario, not the best-case scenario.

• Define timeouts in service calls.

• Define timeouts for authentication providers.

• Keep monitoring infrastructure capacity, such as CPU.

• Define circuit breakers, Work Managers, etc. to avoid one service bringing down the entire server.
48

Demonstration
49
