Wa de Troubleshooting Guide
Troubleshooting Guide
Table of Contents
Introduction
Verify that an Issue Has Already Been Fixed
Compatibility Matrix for CA Workload Automation DE (dSeries Edition)
Finding Logs or Additional Data for an Issue
Troubleshooting the CA Workload Automation DE Server
Server Shuts Down When the Number of Active Application Generations Reaches 600
Server Shuts Down with Java Out of Memory Error
Server Slows Down with Out.of.memory Exception Error
Unable to retrieve users from LDAP subdirectories
Job Fails with ‘Command Requires a User ID’ Error
How to Set Up a Disaster Recovery Environment for CA Workload Automation DE
Update the Server JRE to the Latest Update Version
Change the IP Address of the Server
Cannot Install the Server Due to Ports Conflict
Predefined (Canned) Report Generation Fails Due to Insufficient Java Heap Memory
Server Stops Sending Email Notifications Due to Connection Reset Error
Open a Support Ticket for Updating the CA Workload Automation DE License
Server Stops Triggering the Application When the Number of Application Generations Reaches Maximum
1. Log in to CA Support.
2. Enter “CA Workload Automation Solutions & Patches” in the “Find It” field and click the right arrow.
3. Click “CA Workload Automation DE” under Product in the left pane.
4. Click “CA Workload Automation Solutions & Patches” from the search result to view the screen
as follows:
5. Click the patch number to verify the issues that are fixed as part of the published patches.
https://support.ca.com/irj/portal/anonymous/phpsupcontent?contentID=7e52789e-ab05-4dea-b58b-a4925a7f4beb&productID=7833
1. Server Trace Logs: Stores communication messages between the CA Workload Automation DE
components and maintains debugging information. By default, the trace log is located in the
following directory:
server_install_dir\logs\tracelog.timestamp.txt
Note: Trace logs are archived (rolled over) every day. Each trace log is created with a timestamp, for
example, tracelog.<timestamp>.txt. You can find older trace logs based on the date and time
when the trace log was created.
2. Server Buffer Logs (applicable only for R11.3.x server): Stores all messages between the server and
the database regardless of category or severity. By default, the buffer log is located in the following
directory:
server_install_dir/logs/buffer/buffer.timestamp.txt
Note: After the buffer log file reaches its maximum size, it is archived and a new buffer log file is
created with a new timestamp. By default, only the latest buffer log is archived and saved. So, we
recommend that you back up the buffer log as soon as the issue occurs.
(Applicable only for r11.1 server) If the problem can be reproduced consistently, enable debug logs on
the server side and collect the log using the setlogid CLI command. The following is a snippet of the
setlogid command:
3. Thread Dumps: A thread dump shows the stack traces of the threads currently used by the server. Issue
the threaddump CLI command to view the stack traces. Back up the thread dump output when the issue occurs.
Example: This example displays a partial response to the THREADDUMP command. It displays the
stack traces of the threads currently used by the server.
Note: A thread dump can also be taken from the CLI perspective of CA WA Desktop Client.
a. How to find the version and build number of the CA Workload Automation DE server?
b. How to find the version and build number of CA Workload Automation Agent?
c. How to find the version and build number of CA WA Desktop Client?
5. Agent Logs: Collect the agent logs if the agent was involved in an event monitoring or job related
process when the issue occurs. By default, agent logs are located in the agent_install_dir\log
directory. You can zip the entire log folder when the issue occurs and provide it to CA Support.
You can also provide the log details of the issue by following these steps:
a. Stop the agent.
b. Change the log level of the agent to 8 by modifying the log.level parameter in the
agentparam.txt file in the agent installation directory.
Note: The log level determines the type and number of logs the agent generates and the
amount of information the logs contain. Setting the log level to 8 generates the tracing
information that is useful for troubleshooting communication issues.
c. Delete the contents of the log folder.
d. Start the agent.
e. Reproduce the issue.
f. Zip the entire log folder.
Internal trace log: By default, the internal trace log is located in the
C:\Users\<User>\workspace-CAWA-11.x directory.
Trace log set by operator: The trace log that is set by the operator. You can set the trace log
location in CA WA Desktop Client by following these steps:
a. Click Trace, Set Trace File from the CA WA Desktop Client menu.
b. Specify the location of the trace file.
7. Database Dumps: If the issue is related to the database, provide the database dump.
Note: For more information about collecting database dumps for a database, contact your database
administrator.
8. Heap Dumps: If the issue is related to memory, collect the heap dumps. To collect the heap dumps
from the server, do the following:
a. (On UNIX) If jdk is installed on your server machine, use the "jmap" tool by issuing the
following command:
jmap -dump:format=b,file=heapdump.dmp <dSeries_server_pid>
You can retrieve the server pid (process ID) from the server installation directory by running
the following command:
cat serverPID
b. (On Windows) If jdk (1.6) is installed on your machine, use the "jvisualvm" tool as follows:
a. Change to the following directory on the command prompt:
<jdk home>\bin
b. Enter the following command:
visualvm.exe --jdkhome "C:\Software\Java\jdk1.6.0" --userdir "C:\Temp\visualvm_userdir"
The main window of VisualVM opens. By default, the Applications section is
displayed on the left pane of the main window. The Applications section shows the
Java applications running on the local and remote JVMs.
c. Right-click an application node and select Main application from the popup menu to
collect heap dumps.
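The UNIX steps above (read the process ID from the serverPID file, then pass it to jmap) can be sketched as a small helper. This is only an illustration, not part of the product: the function name build_jmap_command and the install path used in the usage note are hypothetical, and the sketch assumes that serverPID is a plain-text file containing only the process ID.

```python
from pathlib import Path


def build_jmap_command(server_install_dir, dump_file="heapdump.dmp"):
    """Return the jmap argument list for the PID stored in <install_dir>/serverPID.

    Assumption: serverPID is a text file that holds only the numeric process ID.
    """
    pid_text = (Path(server_install_dir) / "serverPID").read_text().strip()
    if not pid_text.isdigit():
        raise ValueError(f"serverPID does not contain a numeric PID: {pid_text!r}")
    # Same form as the command in the text: jmap -dump:format=b,file=<file> <pid>
    return ["jmap", f"-dump:format=b,file={dump_file}", pid_text]
```

To run the dump, the returned list could be passed to subprocess.run(..., check=True) from the standard library, for example with a hypothetical install directory such as /opt/ESP_dSeries.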
Solution:
If the operating system has memory issues, the server might shut down. To fix this issue, you must
increase the heap size.
To increase the heap size, follow these steps:
1. Restart the OS.
2. Do one of the following to increase the heap size:
On Windows:
Edit the following property in the startServer.lax file in the server_install/bin directory:
lax.nl.java.option.java.heap.size.max=value_in_bytes
On UNIX:
MAX_HEAP_SIZE=value_in_megabytes
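Note that the two settings use different units: the Windows property takes bytes, while the UNIX setting takes megabytes. The following sketch shows the conversion for one heap size. The helper name heap_settings and the 2048 MB value are illustrative assumptions, not recommended values.

```python
def heap_settings(max_heap_mb):
    """Return both heap settings from the text for a heap size given in megabytes."""
    if max_heap_mb <= 0:
        raise ValueError("heap size must be positive")
    return {
        # Windows (startServer.lax): the value is expressed in bytes.
        "windows": f"lax.nl.java.option.java.heap.size.max={max_heap_mb * 1024 * 1024}",
        # UNIX: the value is expressed in megabytes.
        "unix": f"MAX_HEAP_SIZE={max_heap_mb}",
    }


for platform, line in heap_settings(2048).items():
    print(platform, line)
```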
To fix this issue, either avoid using the "attach spool file" option in the job email notification settings or
limit the size of email attachments. You can limit the size of email attachments by setting the maximum
size of spool file attachments in email notifications.
For more information about setting up a disaster recovery environment for CA Workload Automation
DE, refer to the PIB RI72688 on CA Support Online.
For more information about updating the server JRE to the latest update version, refer to the PIB
RI69979 on CA Support Online.
Solution:
When you install the server, the server reads the system hostname and IP address from the system's
'hosts' file. To fix this issue, add the system IP address and the hostname in the hosts file. The hosts file
is located in the following directory:
3. Right-click "Topology"
6. Click the icon at the right side of "Internal" section to show the "Advanced Parameters"
7. Set the "Maximum memory available to a reporting process (in megabytes)" value higher than the
default of 512, for example, to 1024 or more.
HAC.emailNotification: javax.mail.MessagingException:
Exception reading response; nested exception is:
java.net.SocketException: Connection reset
Solution:
The server connects to the email server with no timeout setting and relies on the email server to reset
the connection. If this connection is disrupted and is not fully reset due to networking glitches, the server
cannot send email notifications.
To fix this issue, restart the server in the standalone configuration or issue the CHANGEROLE command
in the CA WA High Availability configuration.
Note: For further information or clarification about your license, click Update an Existing Issue or
Contact CA Licensing.
6. Enter the appropriate information in the ‘Customer Care Case Request’ form and click 'Submit'.
Server Stops Triggering the Application When the Number of Application
generations Reaches Maximum
When the number of Application generations reaches the maximum value, the server stops triggering the
application and sends a warning email to the administrator.
To avoid this issue, we recommend that you issue the RESETGEN CLI command and reset the
application’s generation number periodically.
When I try to open predefined reports, the reports do not appear in the Reports pane in the Services
perspective.
Solution:
Verify that you have REPORT_DESIGN permission for opening predefined reports.
Stop the Desktop Client and then start it again by right-clicking and selecting "Run as administrator".
When the CA WA High Availability Server Fails Over, Will the CA WA Desktop Client
Connection to the Server Also Fail Over Automatically?
When the preferred (primary) server goes down, CA WA Desktop Client will point to the standby server
automatically and this connection will be shown as an active instance.
When the preferred server is back up and running, CA WA Desktop Client will automatically establish
connection with the preferred server and this connection will be shown as an active instance.
1. Add the following properties in the config.ini file in the '<drive>:\Program Files (x86)\CA\WA
Desktop Client\configuration' directory:
Dsun.java2d.d3d=false
Dj3d.displaylist=false
<drive>:\WINDOWS\system32\nvoglnt.dll
<drive>:\Users\Administrator\workspace-CAWA-11.x\.metadata
To validate if the server can connect to the database, follow these steps:
Solution:
The ORA-28000 error occurs when the Oracle user specifies a wrong password consecutively for the
maximum number of times, which is set in the user's profile parameter FAILED_LOGIN_ATTEMPTS.
As a result, the database user account is locked.
To fix this issue, wait for the PASSWORD_LOCK_TIME period to elapse or contact your database administrator to verify your
password. To change the password that is specified in the server configuration, use the setdbparm
utility.
Solution:
CA Workload Automation DE r11.1 can be configured with Oracle RAC, but the RAC failover feature is not
supported. If the database fails over to another node in an Oracle RAC database configuration, the server is
forced to shut down. The server must be restarted manually to establish a connection to the new Oracle
RAC database instance. To configure the CA Workload Automation DE database for failover in an Oracle RAC
cluster environment, use the following JDBC connection string:
jdbc:oracle:thin:@(DESCRIPTION=(load_balance=on)(FAILOVER=on)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME= service-name)))
where,
You can modify the database connection jdbc.URL parameter using the setdbparm utility.
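As an illustration of the overall shape of such a connection string, the following sketch assembles one from a list of RAC nodes. It is an assumption-laden example rather than the documented value: the HOST and PORT entries follow the general Oracle Net connect descriptor syntax and do not appear in the string above, and the helper name rac_jdbc_url, the host names, the port, and the service name are all hypothetical.

```python
def rac_jdbc_url(nodes, service_name):
    """Build an Oracle thin-driver URL with one ADDRESS entry per (host, port) pair.

    Assumption: HOST/PORT entries follow the general Oracle Net descriptor syntax.
    """
    if not nodes:
        raise ValueError("at least one (host, port) pair is required")
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host, port in nodes
    )
    return (
        "jdbc:oracle:thin:@(DESCRIPTION=(load_balance=on)(FAILOVER=on)"
        f"(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME={service_name})))"
    )


# Hypothetical two-node cluster; replace with the values for your environment.
print(rac_jdbc_url([("rac-node1", 1521), ("rac-node2", 1521)], "my_service"))
```

Checking that the parentheses balance before passing the value to setdbparm is a cheap way to catch copy-and-paste truncation.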
The server cannot be restarted when the connection from the server to DB2 fails because of a DB2 database exception
error. The server trace log contains the following error message:
Solution:
To fix this issue, contact your database administrator and verify the following:
1. Change the password in the DB2 database configuration. Change the rdbms.password
parameter in the CA Workload Automation DE database connection properties using the
'setdbparm' utility.
Solution:
The server takes a long time to display the application workflow in the Monitor perspective when the
server waits for an existing database connection.
By default, the server can establish up to fifty database connections. You can increase the default value.
If the trace log file contains a message similar to the following, the server is waiting for an existing
connection to complete the task:
Solution:
The SQL Server database exception error indicates that the CA Workload Automation database
transaction log file is full and has run out of disk space, forcing the server to shut down.
To fix this issue, contact your database administrator and review the log_reuse_wait_desc column in the
sys.databases catalog view to find the root cause.
Server Shuts Down When the Disk Space of DB2 Database is Full
Symptom:
When the disk space of the DB2 database is full, the server shuts down with the following error message in
the trace log:
Solution:
To fix this issue, contact your administrator and increase the disk space on the database server.
1. Remove active and completed jobs information from the server repository as follows:
Force complete active application generations that are in trouble or that have had failed/suberror jobs
for a long time to ensure the following:
Issue the 'purgecompletedjobs' CLI command periodically, or schedule the command to run once
a day for all applications (for example, purging jobs older than 7 days, or fewer days depending on the run frequency).
Example:
2. Scale down job historical data periodically by deleting excess records from the ESP_APPLICATION
and ESP_GENERIC_JOB tables in the server database:
b. Issue a single SQL statement to clear data from both the ESP_APPLICATION and
ESP_GENERIC_JOB tables.
The SQL statement deletes job historical data from both tables based on the
END_DATE_TIME date, because the CAWA database schema uses cascade delete and
therefore sets up dependencies between rows in the tables. We recommend that you
keep a few months of data. DO NOT specify a date in the SQL statement that
affects active workflow.
Solution:
The server collects and stores the information about completed Applications in the relational database.
The server performance gets impacted when the volume of history data increases over a certain period
of time.
To improve the server performance, issue the MOVEHISTORYDATA CLI command to move the history
data in the database tables to stage tables. This command has the following format:
movehistorydata [olderthan("olderthan")]
olderthan("olderthan")
Moves history data older than the specified time to stage tables.
Example: This example moves all the history data that is older than a month to stage tables:
movehistorydata olderthan("now less 1 month")
Note: When the data has been moved, a message is added to the tracelog.txt file indicating the number
of Applications and rows that were moved.
The agent may fail to start with the error message "Unexpected encryption
type. None received. DES expected."
This problem can occur when the encryption type and key defined for the agent in the Topology do not
match those specified during agent installation.
Make sure that the encryption key defined for the agent in the Topology is of the same type (DES)
and matches the key defined for the agent during installation.
When there is a large number of agents in an agent group configured with CPU load
balancing, many stale messages are exchanged between the agents and the
server. This can result in decreased performance of the following:
Change role
Startup of server
Workload processing of server
For additional information on troubleshooting this problem, refer to PIB RI73864.
dSeries agent fails to start after the owner of the dSeries installation directory
was changed to a non-root user
Description:
ESP dSeries was installed on a Linux machine as root. After the installation the owner of the ESP_dSeries
directory was changed with the 'chown' command:
chown -R dseries:dseries ESP_dSeries
On subsequent start up the dSeries agent fails with the following message:
/opt/ESP_dSeries/startServer: WARNING: Could not increase file descriptor limit to 4096.
Solution:
1. Increase the nofile parameter by updating /etc/security/limits.conf:
soft nofile 4096
hard nofile 65535
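After changing limits.conf and logging in again as the installation owner, you can confirm that the new limit is in effect. The following is a minimal sketch using Python's standard resource module (UNIX only). The function name nofile_limit_ok is hypothetical, and the 4096 threshold is taken from the warning message above.

```python
import resource


def nofile_limit_ok(required=4096):
    """Return True when the current soft limit on open files is at least `required`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the requirement.
    return soft == resource.RLIM_INFINITY or soft >= required


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard} ok={nofile_limit_ok()}")
```

Run the check as the same user that starts the server, because limits are applied per login session.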
Jobs may go to the 'Ready' state and not complete if they are run as a non-root
user. The following errors may show up in the spool file:
---------------------------------------------------------------------------------------------
Output of messages for workload object SECONDJOB/GEGU.10/MAIN
Start date Tue Jan 7 15:30:32 2014
---------------------------------------------------------------------------------------------
-bash: /data/CA/WorkloadAutomation/ESPSystemAgent/cybspawn: Permission denied
cybspawn: unable to obtain the message queue ID. errno:13 Reason:Permission denied
cybspawn: unable to obtain the message queue ID. errno:13 Reason:Permission denied
Solution:
Change the permission to 755 on the directory /data/CA/WorkloadAutomation/ESPSystemAgent.
Then restart the agent and verify whether the issue is resolved.
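To confirm the permission before and after the change, the mode bits of the directory can be inspected. The following is a small sketch using only the Python standard library; the helper name has_mode is hypothetical.

```python
import os
import stat


def has_mode(path, expected=0o755):
    """Return True when the permission bits of `path` equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected
```

For example, has_mode("/data/CA/WorkloadAutomation/ESPSystemAgent") should return True after the directory has been changed to 755.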
When dSeries is shut down and brought back up after a long time, the jobs that were
missed during that period are triggered. How can I prevent these
jobs from being triggered?
If you do not want scheduled event executions that were missed during the outage to trigger after the
server is restarted, we suggest that you suspend the events before the server outage and resume the
events later.
During a failover performed with the changerole command, some agents were not
updated with the active dSeries server and kept trying to connect to
the previous one
Most likely, the root cause was a second execution of the "changerole" command before the server
completed processing all the "Control Active" messages received as part of the previous execution of the
changerole command.
The ESP dSeries connects to the Agent, sends a message, waits for an acknowledgment of receipt, and
then disconnects.
When an Agent has something to send to ESP dSeries Manager, it does the same thing: connects, sends,
gets an acknowledgment, and disconnects.
ESP dSeries and Agents have sender and receiver components. The receiver listens on a predefined
TCP/IP port.
When the sender has one or more messages to transmit, it connects to the receiver's port, sends the
messages, and then closes the connection.
Each Agent has a dynamic sender port (i.e. the sender component grabs any free port, when it wants to
send a message to the Manager) and one receiver port (i.e. the communication.inputport ).
The ESP dSeries Manager has a receiver port (i.e. the communication.managerport ) and has a dynamic
sender port (i.e. the sender component grabs any free port, when it wants to send a message to the
Agent) for each connected Agent.
For these reasons, it is important that your firewall rules (where present) accommodate the
Agent-Server communication method.
1. The file system ran out of available disk space. You can check this with the 'df' command
(refer to your OS reference guide for the command usage). The following example shows the output on
IBM AIX:
# df -k
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 65536 46488 30% 1690 14% /
/dev/hd2 5898240 1087404 82% 29116 11% /usr
/dev/hd9var 589824 493344 17% 463 1% /var
/dev/hd3 2031616 2028036 1% 91 1% /tmp
/dev/hd1 65536 63972 3% 51 1% /home
2. The permission on the Agent's spool directory was changed; use the 'ls -al' command against the
spool directory and its subdirectories to display the permission.
3. The maximum value for a file reference has been exceeded. This value is defined by the
operating system standard (LINK_MAX). It is usually 32,000 or 32,768 (depending on the OS).
Most likely the spool files have not been cleared for a long time, so the directory might have reached
this maximum value (the Agent's log directory might also contain too many files, so it is
useful to check this as well). Move as many spool files as possible out of
their directories, and then repeat this procedure regularly.
For all three cases above, stop the Agent first, and then perform the necessary actions. If needed,
get help from your UNIX administrator.
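The three checks above (free disk space, directory permissions, and the number of entries approaching LINK_MAX) can be combined into one quick report. The following is a hedged sketch: the function name spool_health and both thresholds are illustrative assumptions, and counting subdirectories is used as an approximation of the link count that LINK_MAX limits.

```python
import os
import shutil


def spool_health(spool_dir, min_free_bytes=100 * 1024 * 1024, max_subdirs=32000):
    """Report free space, subdirectory count, and write access for a spool directory."""
    usage = shutil.disk_usage(spool_dir)
    # Each subdirectory adds a link to its parent, so this approximates the link count.
    subdirs = sum(
        1 for entry in os.scandir(spool_dir) if entry.is_dir(follow_symlinks=False)
    )
    return {
        "free_bytes": usage.free,
        "low_space": usage.free < min_free_bytes,
        "subdirs": subdirs,
        "near_link_max": subdirs >= max_subdirs,
        "writable": os.access(spool_dir, os.W_OK | os.X_OK),
    }
```

Run the check as the user that owns the Agent process, because os.access evaluates permissions for the calling user.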
A job failed with Status: WARNING: User <user_name> is not authorized to send
AFM to Agent <agent_name>
Symptom:
A job failed with Status: WARNING: User dba is not authorized to send AFM to Agent <agent_name>.
Solution:
The WA user that triggered the application does not have the AGENT.* and AGENTUSR.* permissions for the
agent on which the job is executed. The WA administrator needs to add those permissions to the WA user in
question.
To migrate to CAWA DE 11.3, create a new database instance, install the new
CAWA DE 11.3 release, and migrate the artifacts, global variables, and history data from the current CAWA DE 11.1 to
CAWA DE 11.3 (do not install on top of the current CAWA DE 11.1). Refer to the Implementation
Guide for information about upgrading the server.
You can upgrade to r11.3 from the following versions of the server:
r5.0.3
r11.1
1. Install the r11.3 server and default agent on Windows or UNIX with the latest service packs applied.
Note: If you install the r11.3 server on the same computer as your existing installation, you must use
different ports.
For agents, both CAWA DE 11.1 and 11.3 (on the same server) can use the same Agent R7 installed on other
nodes. See the note in step 1 above.
Note: If you install the r11.3 server on the same computer as your existing installation, you must use
different ports.
The operating system might not be set for the Daylight Saving Time (DST)
adjustment. If it is not set, does CA Workload Automation DE still follow DST?
Yes. CA Workload Automation DE retrieves the DST information from the server JRE and does not use
the DST setting at the operating system level. If any recent changes were made to your local time zone,
check the following site for the latest TZUpdater and, if needed, apply it to both the Desktop Client and
the dSeries server JREs:
http://java.sun.com/javase/downloads/index.jsp#timezone
The TZUpdater installation instructions can be found on the same site.
What are the Oracle 10.x/11.x certified character sets for dSeries server?
The certified character sets for Oracle 10.x/11.x are WE8ISO8859P1/WE8MSWIN1252 respectively.
Oracle Unicode is not supported.
The migration utility obtains all artifacts, except for the history and the global variables, via the
source server and stores them into the "migration\tmp\<migration_dir>" directory; then the
destination server copies these artifacts into its database during the start routine;
The migration utility obtains the history data and the global variables and places them into the
destination server's database directly by itself, i.e. without dSeries server’s involvement.
Therefore, in order to migrate the artifacts into the server 11.3 you would need to start up the source
server (either the 5.0.3 or the 11.1 version) and the destination (the 11.3) server's database; then start
the destination (the 11.3) server.
How to cold start CA Workload Automation (CA WA) and what happens after
cold start?
Symptom:
If CA Workload Automation shuts down improperly, it can result in inconsistencies in the database. This
can cause several issues, ranging from events not triggering properly to server shutdown. A cold start is then
required to clear these inconsistencies and errors in the database.
Solution:
To do a cold start of CA Workload Automation, rename runonce.properties.bak to
runonce.properties in the install directory and start CA WA.
Users can query ESP_APPLICATION table in the CA WA database to see all the historical data. Here is a
query that will select all the rows from the defined date:
The CA Workload Automation dSeries server has five cold start levels.
What do those levels mean?
The purpose of a cold start is to clear out active workload and delete the events' schedules. However, there
are five cold start levels that vary this behavior:
Start.type.level = -2
The server starts with a normal cold start, but scheduled events that had not been processed at the time
of the shutdown are rescheduled. Events that require a manual trigger are not preserved.
Start.type.level = -1
The server starts with a normal cold start, and the generation count for all applications is set to zero. These
applications are also purged.
Start.type.level = 0
The server cold starts its scheduler and active workload (nothing is preserved).
Start.type.level = 1
The server starts with the cold start, but all scheduled events are preserved. Events that were not yet
processed at the time of the shutdown continue to be triggered after the server is started. Events that
require a manual trigger are also preserved.
Start.type.level = 2
The server starts with the cold start, but active workload is preserved. Workload that was running at the
time of shutdown continues to run after the server is started.
Which version of JDK 6 should be chosen for the Web Services installation: 32-bit
or 64-bit?
Currently, the dSeries Web Services installer is available only as a 32-bit build; therefore, choose the
32-bit JDK 6 as well.
Comment out the line containing LOG_FILE and add another line so that it looks like this:
#LOG_FILE=/opt/sunje01/dseries11.1/clilog.txt
LOG_FILE=/dev/null
How can I check if the port is available, the firewall does not block the traffic,
and the service is running on it?
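Telnet to <IP address> <port number> (separate the host and the port with a space, not a colon).
For example, the following response indicates that the port cannot be reached:
C:\>telnet host1 80
Connecting To host1...Could not open connection to the host, on port 80: Connect failed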
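Where a telnet client is not installed, the same reachability check can be made with a short script. The following is a minimal sketch using Python's standard socket module; the function name port_is_open and the host and port in the usage note are illustrative.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True when a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("host1", 80) returns False when the service is down, the port is blocked by a firewall, or the host name cannot be resolved. The check cannot distinguish between these causes.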
How to find the version and build number of the CA Workload Automation DE
server?
You can run the version utility to verify the version and build number of the server.
To verify the server version and build number, follow these steps:
install_dir
Specifies the server installation directory.
Note: Alternatively, you can connect to the Desktop Client as the ADMIN user and run the CLI command “about”
from the CLI perspective.
How to find the version and build number of CA Workload Automation Agent?
To find the version and build number of an agent, follow these steps:
1. Change to the agent installation directory on the command prompt.
2. Enter the following command:
On Windows:
cybAgent -v
On UNIX:
./cybAgent -v