
Troubleshoot

AI Operations Management -
Containerized
Version : 24.4

PDF Generated on : 12/19/2024

The Information Company

© Copyright 2024 Open Text



Table of Contents
1. Troubleshoot 15

1.1. Find the log files 16

1.1.1. Find the log files 18

1.2. Troubleshoot installation 19

1.2.1. Failed creating reader to topic in scheduler pod log 20

1.2.2. Automatically create required databases functionality fails 21

1.2.3. Deployment fails when UCMDB Probe is integrated in demo environment 22

1.2.4. Fluent bit logs for postload-taskcontroller has many errors due to insufficient buffer size 23
1.2.5. Troubleshoot guided install 25

1.2.6. idl_config.sh script is failing with the below error on SaaS 27

1.2.7. Collector customization settings were lost after installing 2021.11 Patch 1 28

1.2.8. Anomaly detection capability configuration doesn't overwrite the chart configuration 29
1.2.9. itom-monitoring-service-data-broker remains in Init mode when OBM capability is not selected 30

1.2.10. AMC configuration files aren't deployed after installation 31

1.2.11. Error: The maximum number of addresses has been reached 34

1.2.12. Image upload error while image download and upload 35

1.2.13. Error: Target group name 'xxx-tg-TLS-5443' cannot be longer than '32' characters 36
1.2.14. itom-monitoring-admin pod not starting after installation 37

1.2.15. The itom-monitoring-admin pod is unresponsive during certificate import 38

1.2.16. itom-di-postload pods deployment isn't successful 40

1.2.17. Cannot install secondary deployment because 'MultipleDeployment' is not enabled in feature-gates 42
1.2.18. Failed to parse values.yaml 43

1.2.19. Can not reuse name 44

1.2.20. Failed pre-install: BackoffLimitExceeded 45

1.2.21. Pre-upgrade hooks failed: job failed: BackoffLimitExceeded 50

1.2.22. “502: Bad Gateway” 52

1.2.23. Suite Installer 503 nginx error 53

1.2.24. Suite uninstall fails 54


1.2.25. Pods in status CrashLoopBackOff 55

1.2.26. First master node goes down during installation 56

1.2.27. The browser does not trust certificates 57

1.2.28. Remove the OBM configuration 58

1.2.29. CAS content zip optic content schema installation gets stuck 61

1.2.30. "Too many pods" error when installing an application 62

1.2.31. Log forwarding for the new capability is not configured 63

1.3. Troubleshoot upgrade 64

1.3.1. Helm upgrade fails with timeout error 65

1.3.2. The itom-monitoring-admin pod is in CrashLoopBackOff state 67

1.3.3. AWS content upgrade fails 68

1.3.4. Automatic upgrade of OBM Content Pack and UCMDB view fails while upgrading AI Operations Management 69


1.3.5. itom-monitoring-admin pod doesn't come up while upgrading due to Liquibase lock 71

1.3.6. All the pods not running after upgrade 72

1.3.7. ops-monitoring-ctl tool fails with invalid username or password 73

1.3.8. gen_secret.sh failing due to cert issue during rerun on the same environment 75
1.3.9. Pre-upgrade hooks failed 76

1.3.10. Error while upgrading OMT 78

1.3.11. OPTIC DL Vertica Plugin fails with an error during the upgrade 80

1.3.12. "UPGRADE FAILED" error occurs after updating certificates from OMT Management Portal 81

1.3.13. Events sent from OBM are not stored in the opr_event Vertica table 82

1.3.14. Upgrading Hyperscale Observability results in AWS itom-snf-monitoring pod to not restart 83
1.3.15. Pulsar Push adapter doesn't work after upgrade 84

1.4. Troubleshoot administration 86

1.4.1. NFS storage is running out of space 87

1.4.2. Cannot delete a database or user 88

1.4.3. UCMDB probe pod stuck 89

1.4.4. Vault pod is stuck 90

1.4.5. Cannot login to OMT swagger 91


1.5. Troubleshoot generic metric collection issues 92

1.5.1. Troubleshoot Content Administration Service (CAS) 93

1.5.2. User does not have required permissions to modify OOTB content 97

1.5.3. Agent System Metric Push does not have a Store and Forward capability 98

1.5.4. Credentials manager pod fails to start after suite restart 99

1.5.5. itom-opsbridge-cs-redis in CrashLoopBackOff state 100

1.6. Troubleshoot Agent Metric Collector 102

1.6.1. Issues related to Edge self-monitoring when AMC is deployed 103

1.6.2. AMC throws an SSL connection error 104

1.6.3. AMC failed to collect metrics 105

1.6.4. Topology forward to OPTIC DL fails 106

1.6.5. AMC failed to discover new nodes 107

1.7. Troubleshoot resource related issues 108

1.7.1. Pod "omi-0" not getting started 109

1.7.2. Worker node does not start 110

1.8. Troubleshoot Docker 111

1.8.1. Login to Docker Hub fails 112

1.8.2. "System error: read parent: connection reset by peer" 113

1.8.3. Docker pull doesn't work: Error while pulling image 114

1.9. Troubleshoot Post Installation Issues 115

1.9.1. Troubleshoot verification of installation 116

1.9.2. Renew token failed in http_code=403 119

1.9.3. Book-keeper pods fail 120

1.9.4. Find the pod logs 123

1.10. Troubleshoot Postgresql 125

1.10.1. idm-postgresql cannot access /var/pgdata 126

1.11. Troubleshoot Agentless Monitoring 127

1.11.1. Troubleshoot APM integration configuration 128

1.11.2. Quick Report shows no data 129


1.11.3. Unable to see SiteScope providers in Agentless Monitoring UI 130

1.12. Troubleshoot Application Monitoring 133

1.12.1. APM fails to communicate CI's deletion to Application Monitoring 134

1.12.2. Sync Issue between APM and MCC Application Monitoring 135

1.12.3. Setup is idle for 15 mins during sync 136

1.12.4. Missing Files Content/Files Section Under Monitor Resources 137

1.12.5. Error "crash loop back off" 138

1.13. Troubleshoot OBM 139

1.13.1. OBM pod fails to start 140

1.13.2. omi-1 pod has no policies deployed 141

1.13.3. Business logic engine service is currently unavailable 142

1.13.4. Classic OBM UI remains inaccessible after logging into AI Operations Management 143
1.13.5. Event browser connection error 144

1.13.6. Japanese view names do not appear correctly 145

1.13.7. OBM dialog boxes and applets fail to load 146

1.13.8. OBM services not starting 147

1.13.9. OBM UI fails to load the page due to lightweight single sign-on issue 148

1.13.10. Recipients page does not open 149

1.13.11. RTSM Administration pages do not load 150

1.13.12. Event Correlations are skipped in high load situations 151

1.13.13. RTSM Gateway gets locked 152

1.13.14. Workspaces menu is empty 153

1.13.15. Export My Workspace content to another system 154

1.13.16. Troubleshoot Data Flow Probe 155

1.13.16.1. Cannot transfer Data Flow Probe from one domain to another 156

1.13.16.2. Discovery shows disconnected status for a Probe 157

1.13.16.3. BSM server and the Probe connection fails due to an HTTP exception 158

1.13.16.4. The Discovery tab is not displayed 159

1.13.16.5. Data Flow Probe node name cannot be resolved to its IP address 160


1.13.16.6. mysqld.exe and associated files are not deleted 161

1.13.16.7. The Probe fails to start or fails to connect to server 162

1.13.16.8. Integration Probe not listed in Data Flow Probe Setup module tree 163

1.13.16.9. Troubleshoot PostgreSQL 164

1.13.16.9.1. Unable to find the Data Flow Probe database scripts 165

1.13.16.9.2. Data Flow Probe database service cannot start 166

1.13.17. Downtime notifications are not sent to DES unless OBM processes are restarted 167
1.13.18. File format and extension pop-up appear in the graph_type excel 168

1.13.19. Errors appear in opr-configserver.log 169

1.13.20. OBM Configurator tool doesn't terminate after timeout 170

1.13.21. Common keyboard shortcuts 171

1.13.22. Adding or deleting dashboard or favorite on one GW is not visible on other GWs 172
1.13.23. PD does not find any entry point to forward data to BVD 173

1.13.24. Missing footer inside variable picker 174

1.13.25. Conditional dashboard assignment fails 175

1.13.26. OBM and OMW connection issues 176

1.13.27. Troubleshoot Build Adapter Package 177

1.13.28. Logs 178

1.13.29. Common keyboard shortcuts 181

1.13.30. Failed to create CMDB role 182

1.13.31. Delete integration_admin user 183

1.13.32. Creating a BVD Connected Server using the CLI doesn't work 184

1.13.33. Service Health data flow from OBM to RAW tables of OPTIC DL 185

1.13.34. Operations Agent Health dashboard is not displayed when selecting an Operations Agent CI 186

1.13.35. PD graphing fails from OPTIC DL as content packs having PD artifacts to graph metrics from OPTIC DL fail to import 187

1.13.36. OMi Server self-monitoring content pack shows errors and unresolved content 188
1.13.37. OutOfMemoryError: GC overhead limit exceeded error 189

1.14. Troubleshoot OMT 190

1.15. Troubleshoot Stakeholder Dashboards and OPTIC reports 191


1.15.1. Operations Cloud doesn't load latest content 192

1.15.2. bvd-redis pod goes to CrashLoopBackOff state 193

1.15.3. Localization isn't working while exporting the report to PDF 194

1.15.4. First Stakeholder Dashboard is blank in Firefox browser 195

1.15.5. BVD CLI and Web to PDF CLI exit with error 196

1.15.6. Data table widget becomes blank on editing 197

1.15.7. Requesting PDF of a UIF page without specifying the file name throws an error 198
1.15.8. BVD pod is in CrashedLoopState as migration table is locked 199

1.15.9. No valid trusted impersonation between user 200

1.15.10. BVD does not show the correct license information 201

1.15.11. Please try after some time-server is busy 202

1.15.12. Request to the server fails 203

1.15.13. Unable to login BVD console. Getting 403 error 204

1.15.14. Schedule jobs are deleted if the schedules are failed 205

1.15.15. Name of the report and exported csv file names are different during cross launch 206
1.15.16. Link provided in the mail along with scheduled report is not loading complete report 207

1.15.17. Popup notifications are not shown even though they are enabled 208

1.15.18. Number of bytes received from Vertica exceeded the configured maximum 209
1.15.19. Certain passwords provided during application configuration for Vertica do not work with BVD 210

1.15.20. No data in RUM BVD dashboards 211

1.15.21. BVD data and statistics aging issue 214

1.15.22. Vertica certificate issue 216

1.15.23. Vertica DB connection fails with self signed certificate 217

1.15.24. BVD reports failed to load with a red banner without any error 218

1.15.25. Vertica Database connection fails 219

1.15.26. BVD pods failing with error WRONGPASS invalid username-password pair 220

1.15.27. Uploading a dashboard locks the SVG file on disk 221

1.15.28. Blank SVG files or shapes 222

1.15.29. Dashboard loading crashes or blocks browser 223


1.15.30. Exchange Certificate in Vertica 224

1.15.31. Vertica RO user password does not get updated 225

1.15.32. Administration menu doesn't appear in the left panel 226

1.15.33. Connection timed out 227

1.15.34. Date type parameter using UTC time zone 228

1.15.35. Request to the server fails 229

1.15.36. Processing request from server: Request failed with status code 500 230

1.15.37. Processing request from server: Request failed with status code 404 231

1.15.38. Server busy error 232

1.15.39. Exporting report to PDF fails with warning message 233

1.15.40. WebtoPDF not generating PDF when port number is not specified in the URL 234
1.15.41. WebtoPDF not generating PDF 235

1.15.42. Mail command failed:501 Invalid MAIL FROM address provided 236

1.15.43. Notification messages 237

1.16. Troubleshoot OPTIC Data Lake 239

1.16.1. Troubleshoot data flow 241

1.16.1.1. Metric data does not reach the Vertica database 242

1.16.1.2. Troubleshoot Forecast/Aggregate data flow 243

1.16.1.3. Data logging to Vertica stopped 246

1.16.1.4. Aggregate not happening after upgrade 247

1.16.1.5. Aggregate table has missing or no data 248

1.16.1.6. Data is in OPTIC Data Lake Message Bus topic but not present in Vertica tables 250
1.16.1.7. Unable to create the same dataset again 251

1.16.1.8. Data sent to the OPTIC DL HTTP Receiver not available in Vertica database 252
1.16.1.9. Postload task flow not running 253

1.16.1.10. Automatic certificate request from itom-collect-once-data-broker-svc not received 254
1.16.1.11. Single message pushed to a topic is not streaming into database 255

1.16.2. Error messages 256

1.16.2.1. Certificate with the alias 'CA on abc.net' is already installed 257


1.16.2.2. dbinit.sh reinstall fails with error 258

1.16.2.3. ERROR: Unavailable: initiator locks for query - Locking failure: Timed out I locking 259
1.16.2.4. Can't forward events from OBM to OPTIC DL 260

1.16.2.5. Failed to create consumer: Subscription is fenced 261

1.16.2.6. Error while publishing data to the OPTIC DL Message Bus topics 262

1.16.2.7. After upgrade, the Vertica itom_di_metadata_ TABLE is not updated 263

1.16.2.8. Data loading to Vertica is stopped for a topic 264

1.16.2.9. Error getting topic partitions metadata 265

1.16.2.10. Insufficient resources on pool error 266

1.16.3. Troubleshoot OPTIC DL connection issues 267

1.16.3.1. Vertica database is not reachable 268

1.16.3.2. Failed to connect to host 269

1.16.3.3. Correlated group of events does not appear in OBM 270

1.16.3.4. Data Source not getting listed in PD 271

1.16.3.5. Table does not get deleted after dataset is deleted 272

1.16.3.6. Vertica catalog directory has random empty folders 273

1.16.4. Troubleshoot OPTIC DL pods issues 274

1.16.4.1. The itom-di-metadata-server and itom-di-data-access-dpl pods are not Up and Running after installation 275

1.16.4.2. How to recover itomdipulsar-bookkeeper pods from read-only mode 277

1.16.4.3. itom-di-dp-worker-dpl pod is in CrashLoopBackOff state 280

1.16.4.4. The itomdipulsar pods stuck in the init state 281

1.16.4.5. itomdipulsar-zookeeper pod in CrashLoopBackOff state 282

1.16.4.6. Postload pods do not start and are stuck in 1/2 status 283

1.16.4.7. Suite deployment failed with pods in pending state 284

1.16.5. Troubleshoot using ITOM DI monitoring dashboards 286

1.16.5.1. Guidelines for adding panels to the OPTIC Data Lake Health Insights dashboard 287
1.16.5.2. Vertica Streaming Loader dashboard panels have no data loaded 288

1.16.5.3. The DP worker memory usage meter displays increasing memory usage 289

1.16.5.4. Data Flow Overview dashboard displays some topics with message batch backlog greater than 10K 290


1.16.5.5. Data not found in Vertica and Scheduler batch message count is zero 291

1.16.5.6. Postload Detail dashboard 293

1.16.5.7. Postload Detail dashboard Taskflow drop-down does not list the configured task flows 295

1.16.5.8. Postload Overview dashboard 296

1.16.5.9. Request error rate in Data Flow Overview dashboard is greater than zero 299

1.16.5.10. Request error rate in Data Flow Overview dashboard is increasing over time 300
1.16.5.11. The dashboard loads slowly or the page is unresponsive 301

1.16.5.12. The Receiver dashboard, Average Message Outgoing Rate panel displays zero 302

1.16.5.13. The Receiver dashboard, Avg incoming requests rate - error (All) panel is greater than zero req/sec 303

1.16.5.14. The Receiver dashboard, Receiver running panel shows less than 100% 304

1.16.5.15. Vertica dashboard queries fail frequently during daily aggregation phase 305
1.16.6. How to's 306

1.16.6.1. How to check the Vertica tables 307

1.16.6.2. How to check if OPTIC DL Message Bus topics and data is created 308

1.16.6.3. How to check the OPTIC DL Message Bus pod communication 309

1.16.6.4. How to recover OPTIC DL Message Bus from a worker node failure 310

1.16.6.5. How to check connectivity between Vertica node and OPTIC DL Message Bus Proxy services 311

1.16.6.6. How to verify the OPTIC DL Vertica Plugin version after reinstall 313

1.17. Troubleshoot Hyperscale Observability 315

1.17.1. Performance Dashboards and events aren't visible 317

1.17.2. Dashboard not found error on trying to redirect monitoring Service Overview dashboard 318
1.17.3. UCMDB views for Hyperscale Observability aren't available in Performance Dashboard 319
1.17.4. Performance Dashboard displays graphs with no data 320

1.17.5. Discovery is failing and not discovering any of the components for multi probe domain 322
1.17.6. Hyperscale Observability events not forwarding to OPTIC Data Lake 323

1.17.7. WRONGPASS invalid username-password or user is disabled 324

1.17.8. Troubleshoot AWS Hyperscale Observability 325

1.17.8.1. Deleting a credential using the CLI (ops-monitoring-ctl) fails 326

1.17.8.2. AWS events are not forwarded to OBM 327


1.17.8.3. Discovery failed because of invalid proxy 329

1.17.8.4. Unable to discover AWS resources and collect metrics 330

1.17.8.5. Metrics collected but no records in database 331

1.17.8.6. Discovery or metric collection failed 332

1.17.8.7. Unable to create a monitoring configuration 335

1.17.8.8. Multiple CIs with same name in uCMDB and PD Views 337

1.17.8.9. Events are triggered incorrectly 338

1.17.8.10. No events in the event browser 339

1.17.8.11. Modifications to default threshold configuration files get overridden 340

1.17.8.12. No Hyperscale Observability dashboards appear when a CI is selected 341

1.17.8.13. Unable to collect specific metrics for ECS 342

1.17.8.14. A few widgets in the Performance Dashboards don't have data 343

1.17.9. Troubleshoot Azure Hyperscale Observability 344

1.17.9.1. error loading graphQL mapping file for service: datafactoryservices 345

1.17.9.2. The resource type could not be found in the namespace 'Microsoft.DataLakeStore' for api version '2016-11-01' 346

1.17.9.3. PT Dashboard doesn't come up 347

1.17.10. Troubleshoot Kubernetes Hyperscale Observability 348

1.17.10.1. Events for Kubernetes infrastructure objects aren't displaying in OBM Event Browser 349
1.17.10.2. Kubernetes collector triggers false events with major severity 350

1.17.10.3. Kubernetes collection fails due to hostname verification failure 352

1.17.10.4. Kubernetes Summary page displays undefined value in MYSQL innodb graph 354
1.17.10.5. Kubernetes Summary page displays wrong data in the Total Namespaces Count widget 355


1.17.11. Troubleshoot VMware Hyperscale Observability 356

1.17.11.1. Find VMware Virtualization logs files 357

1.17.11.2. Failed to activate zone. Error : [Action: activate zone , Resource: , Status Code: 500, Request Status: Server internal error 359

1.17.11.3. The ops-monitoring-ctl update command displays an error 361

1.18. Troubleshoot Automatic Event Correlation 362

1.18.1. UIF pages take a long time to load 363

1.18.2. AEC events aren't getting processed after an upgrade 364


1.18.3. Troubleshoot AEC pipeline 365

1.18.4. aec_admin role doesn't exist in IDM 367

1.18.5. Automatic Event Correlation Explained UI is not visible in UI Foundation 368

1.18.6. Automatic Event Correlation Explained UI does not load the translation resources 369
1.18.7. Automatic Event Correlation Explained UI does not show complete data 370

1.18.8. Automatic Event Correlation Explained UI cannot be launched from OBM 371

1.18.9. AEC pods restart frequently 372

1.18.10. Automatic Event Correlation fails 373

1.18.11. Correlation job fails with timeout error 374

1.18.12. AEC pods display error due to insufficient resources in Vertica's general resource pool 375

1.18.13. itom-analytics-opsbridge-notification pod fails with OOMKilled error 376

1.18.14. AEC Explained UI partitions fail to load 377

1.18.15. Analytics pods are in CrashLoopBackOff state 378

1.18.16. AEC topology partitions do not reflect topology in RTSM 379

1.18.17. AEC pipeline pods are scaled down 380

1.19. Troubleshoot Monitoring Service Edge 381

1.19.1. OBM agent proxy selection during Edge installation on K3s installs additional components 382

1.20. Troubleshoot common Reports 383

1.20.1. Metric collector fails to connect to agent node 384

1.20.2. Discovery authorization fails 385

1.20.3. Troubleshoot issues related to historical or missing data 388

1.20.4. Error in report widgets quexserv.error.query.nosuch.host 390

1.20.5. RUM reports are showing partial data or no data 392

1.20.6. Business Process Monitoring reports are showing partial data or no data 397

1.20.7. Troubleshoot Business Process Monitoring reports collection issues 402

1.20.8. System Infrastructure reports are showing no or partial data or updated data is not shown in the reports 405

1.20.9. System Infrastructure report widget displays partial data for metrics collected by Operations Agent 408

1.20.10. Troubleshoot System Infrastructure Reports collection issues with Agent Metric Collector 416
1.20.11. Troubleshoot System Infrastructure Reports collection issues with Metric Streaming policies 421


1.20.12. Forecast data is not displayed in System Infrastructure Summary or System Resource Details reports 426

1.20.13. System Infrastructure Availability data is missing in reports 428

1.20.14. Event reports are showing no or partial data or updated data is not shown in the reports 431
1.20.15. System Infrastructure report widget displays partial data for metrics collected by SiteScope 433

1.20.16. Troubleshoot System Infrastructure Reports collection issues with SiteScope as collector 441
1.20.17. Task flows aren't listed on the OPTIC DL Health Insights dashboard 444

1.20.18. Aggregate tables are not updated, data in the system infrastructure or event reports aren't refreshed 445

1.20.19. Insufficient resources to execute plan on pool itom_di_postload_respool_provider_default 448
1.20.20. tenant_id is not configured - SiteScope 449

1.20.21. lastUpdatedBy: is not defined in the schema 450

1.20.22. ops-monitoring-ctl tool is not starting the metric collection 451

1.20.23. Agent Metric Collector is unable to collect metrics from the Operations Agents on the worker nodes 452

1.20.24. The content upload fails or if the tables in mf_shared_provider_default schema are not populated completely 453

1.20.25. From OPTIC Data Administration : Could not complete request successfully 455
1.20.26. SysInfra file system or node instanceType doesn't display the targets 456

1.20.27. ProducerBlockedQuotaExceededException error in DI receiver logs 457

1.20.28. Service health aggregation 459

1.20.29. CI enrichment not available for metric data 461

1.20.30. Downtime enrichments aren't forwarded to OPTIC Data Lake 462

1.20.31. Troubleshooting topology centric reports 463

1.20.32. Issue with Data Enrichment Service with Classic OBM Integration 464

1.21. Troubleshoot Open Data Ingestion 465

1.21.1. Error “Failed to authorize user” error code: 3002 466

1.21.2. Error “Failed to authorize request” error code: 3001 467

1.22. Troubleshoot CMI tool errors 468

1.22.1. The process can't access the file because it's being used by another process 469
1.22.2. Configuration not found in the configuration sheet 470

1.22.3. Field name has invalid input at row number in the sheet 471

1.22.4. [ERROR] : Column name has invalid input at row number in the configuration sheet 472


1.22.5. Type mismatch exception “Can't get a STRING value from a NUMERIC cell” 473

1.22.6. CMI tool fails to generate Excel files when ran directly on an OA machine 474

1.23. Troubleshoot Integration 475

1.23.1. Change password for an UCMDB user 476

1.23.2. Data forwarding issues from classic OBM to OPTIC Data Lake 477

1.23.3. Inherited SiteScope integration role is missing 478

1.23.4. SiteScope topology doesn't appear in RTSM 479

1.23.5. SiteScope (non-TLS) - OBM (TLS) integration fails while configuring the connected server 480

1.23.6. Duplicate SiteScope node CIs in the OBM RTSM 481

1.24. Contact support 482


1. Troubleshoot
You can use the information in this section to troubleshoot problems that you may encounter when installing and using AI Operations Management and the ITOM Container Deployment Foundation.

Complete the following steps to troubleshoot issues:

1. Perform general checks. Follow these steps:


Check the Known issues section in the Release notes.
Check for the availability of patches that may have fixed some known issues.
Check if the issue is related to a third-party product. Contact the respective vendor for support.
2. After detecting a problem, refer to the troubleshoot topics to find solutions.
3. If you are still unable to resolve the issue on your own, then analyze the logs and contact support.

Related topics
To see the list of known issues, see the Known issues section in the Release Notes.
For more information on how to manage logs, see Logs.
To contact support, see Contact Support.
For more information on Troubleshooting Toolkit, see Deployment Toolkit.
For more information on OMT troubleshooting, see Troubleshoot OMT.


1.1. Find the log files


To troubleshoot your issue, you can review log files.

Silent install logs

Logs for the OMT install phase


1. Phase1: /opt/cdf/log/scripts/install
2. Phase2: /opt/cdf/log/scripts/silent-install
3. Upload Images: /opt/cdf/log/scripts/uploadimages
4. OMT API server: <itom-logging-vol>/container/cdf-apiserver-*.log

Logs for AI Operations Management install phase


1. Orchestration: <itom-core-volume>/suite-install/opsbridge/output/log/opsbridge_config.log
2. OBM: <itom-core-volume>/suite-install/opsbridge/output/log/opsbridge_config.log
3. OPTIC Data Lake: <itom-core-volume>/suite-install/opsbridge/output/log/di_config.log
4. BVD: <itom-core-volume>/suite-install/opsbridge/output/log/bvd_config.log
5. EUM: Log into suite-conf-pod-opsbridge-eum-*-*. /var/opt/OV/log/startup.log
6. PM: Log into suite-conf-pod-opsbridge-pm-*-*. /var/opt/OV/log/pm.log
7. CollSvc: None

OMT API server logs


Log of cdf-apiserver pod: kubectl logs <cdf-apiserver pod name> -n core -c cdf-apiserver

Application logs
Installation

/opt/kubernetes/install-.log

NFS share

Location to access logs from the worker node:

<path to NFS log-volume>/<namespace>/<namespace>__<pod-name>__<container-name>__<node-name>[__<optional component-name>]

Use the double underscore in <namespace>__<pod-name>__<container-name>__<node-name>[__<optional component-name>].

Example: /var/vols/itom/log-volume/opsbridge-w87mk/opsbridge-w87mk__omi-0__omi__example.hostname.net__obm

<NFS_obm_directory>/omi/opt/HP/BSM/log/topaz_all.log
<NFS_obm_directory>/omi/opt/HP/BSM/log/jboss7_boot.log
<NFS_obm_directory>/omi/opt/HP/BSM/log/supervisor/nanny_all.log
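
For example, to follow the OBM application log directly on the NFS share, you can combine the example directory above with one of these file names; a minimal sketch, assuming the example path shown earlier (adjust namespace, pod, and node name to your environment):

# <NFS_obm_directory> based on the example above; the actual directory name differs per deployment
NFS_OBM_DIR=/var/vols/itom/log-volume/opsbridge-w87mk/opsbridge-w87mk__omi-0__omi__example.hostname.net__obm
# Follow the OBM application log
tail -f "$NFS_OBM_DIR/omi/opt/HP/BSM/log/topaz_all.log"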

Login

<NFS_obm_directory>/omi/opt/HP/BSM/log/jboss/login.log

OMT logs


OPTIC Management Toolkit (OMT) uses Fluent bit to collect and gather logs for OMT system components, containers, and
Kubernetes. For more information, see the OMT documentation.


1.1.1. Find the log files


You can find log files in the directory <CDF_LOG_VOLUME>/container .

Available log types are access, audit, bvd, and error. The logs follow the naming schema <log type>-<pod name>.log.

The access logs contain information about all requests processed by the server (for example HTML files or graphics).
This data can then be statistically analyzed and summarized by another program.
The audit logs contain information about successful and failed user logins.
The bvd and error logs contain debug information for more detailed troubleshooting.

By default, the application only logs errors and auditing information.

You can enable additional logging for the receiver, controller, and web server. To enable additional logging, complete the
following steps:

Enabling access logging for the receiver will also log the API keys. This might impact the security of the application as every
user with access to the log files will be able to see the API keys.

1. Launch the Management Portal from a supported web browser:

https://<external_access_host>:5443

<external_access_host> is the fully qualified domain name of the host which you specified as EXTERNAL_ACCESS_HOST in
the install.properties file during the OPTIC Management Toolkit (OMT) installation. Usually, this is the master node's
FQDN.

2. Click on Launch Dashboard (opens a new browser page), select <application namespace> and Deployments.

3. For either bvd-receiver-deployment or bvd-www-deployment , click Actions > View/edit YAML.

Look for the required YAML file.

bvd-receiver : Receive incoming messages (data items)

bvd-www : Provides web-UI and real-time push to browser

bvd-controller : Does aging of old data items and bootstrap of database

bvd-ap-bridge : Talks to Autopass server and calculates # of allowed dashboards

bvd-redis : In memory database for statistics and session data. Message bus for server process communication

4. In the new window, search for "debug" to find the DEBUG entry.

5. Change the DEBUG value to bvd:*

6. Click UPDATE.
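
If you prefer the command line to the dashboard, the same DEBUG variable can be changed with kubectl; this is a sketch, assuming DEBUG is defined as a container environment variable on the deployments listed above:

# Set DEBUG on the web server deployment (use bvd-receiver-deployment or bvd-controller-deployment as needed)
kubectl set env deployment/bvd-www-deployment DEBUG='bvd:*' -n <application namespace>
# Confirm the new value
kubectl set env deployment/bvd-www-deployment --list -n <application namespace> | grep DEBUG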

If you install the application with the internal PostgreSQL, you can also view the Redis and PostgreSQL logs using kubectl :

kubectl get pods -n <opsbridge namespace>

kubectl logs <pod name> -c bvd-redis -n <opsbridge namespace>

kubectl logs <pod name> -c bvd-postgres -n <opsbridge namespace>

Note: After the installation of the application, some errors are expected; the application processes and PostgreSQL log these errors. They should stop after a few minutes, once the application is up and running correctly.


1.2. Troubleshoot installation


This section provides the following troubleshooting topics:

Container crashes due to low entropy


AMC configuration files aren't deployed after installation
Error: The maximum number of addresses has been reached
Image upload error while image download and upload
Error: Target group name 'xxx-tg-TLS-5443' cannot be longer than '32' characters
itom-di-postload pods deployment isn't successful
Cannot install secondary deployment because 'MultipleDeployment' is not enabled in feature-gates
Failed to parse values.yaml
Can not reuse name
Failed pre-install: BackoffLimitExceeded
“502: Bad Gateway”
Suite Installer 503 nginx error
Suite uninstall fails
Pods in status CrashLoopBackOff
First master node goes down during installation
The browser does not trust certificates
Remove the OBM configuration


1.2.1. Failed creating reader to topic in scheduler pod log
When deploying AWS in a shared OPTIC Data Lake environment with AI Operations Management as provider, an error was
encountered in the scheduler pod NFS logs.

Cause
This occurs when there is a connection error between the Message Bus and Vertica. Following is the error recorded in the
scheduler pod NFS logs:

Failed creating reader to topic - requesting restart: ConnectError | Zero message count found in topic!

Solution
Perform the following steps as a workaround to resolve this error:

1. Run the following command to add the existing application suite chart values to the helmvalues_opsb.yaml:

helm get values opsb -n opsb -o yaml > helmvalues_opsb.yaml

2. Update the following parameters in the helmvalues_opsb.yaml file and set the values as shown below:

pulsar: pulsar.itomqapri.saqa-aws.cloud
externalDNS:
  enabled: true

3. Run the following command to update the deployment:

helm upgrade opsb -n opsb -f helmvalues_opsb.yaml <suite_chart_tgz_path> --timeout=20m

4. Verify the following new entries created for the Message Bus in AWS Route53 service:
A-record
TXT-record
5. Ensure that there are no issues in the data flow and the Message Bus to Vertica connection is restored.
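
To check the Route53 entries from step 4 on the command line, you can list the record sets of the hosted zone; a sketch assuming the AWS CLI is configured and that the record names contain the Message Bus host name (hosted zone ID and filter string are placeholders):

aws route53 list-resource-record-sets --hosted-zone-id <hosted-zone-id> \
  --query "ResourceRecordSets[?contains(Name, 'pulsar')].[Name,Type]" --output table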


1.2.2. Automatically create required databases functionality fails

Error
psql: error: connection to server at "port 5432 failed: FATAL: database "dbadmin" does not exist"

Issue
Automatically create required databases functionality fails.

Solution
Create a database with the same name as the database administrator user name.

For example, if your database administrator user name is dbadmin, then run the following command to create a database
with the same name:

CREATE DATABASE dbadmin OWNER dbadmin;

Then use the same user dbadmin while deploying with the Automatically create required databases feature on AppHub.
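
For example, you could run the statement with psql before retrying the deployment; a sketch assuming a PostgreSQL superuser named postgres and the default port 5432:

# Create a database named after the database administrator user (here: dbadmin)
psql -h <database host> -p 5432 -U postgres -c "CREATE DATABASE dbadmin OWNER dbadmin;"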


1.2.3. Deployment fails when UCMDB Probe is integrated in demo environment
The Universal CMDB (UCMDB) probe restarts in a demo environment.

Cause
When you set the deployment size to Demo and you have UCMDB probe integrated in your environment, the UCMDB probe
might restart.

Solution
Perform the following steps to remediate the issue:

1. Run the following command to retrieve the deployment configuration

helm get values <helm deployment name> -n <application namespace> > <VALUES_FILE_NAME>

Example

helm get values deployment01 -n opsb-helm > /var/tmp/values_new.yaml

2. Edit the yaml file and update the following values:


Parameter Value

ucmdbprobe.size Small

ucmdbprobe.deployment.maxRemoteProcesses 2

3. Run the following command to redeploy.

helm upgrade <helm deployment name> <chart> -n <suite namespace> -f <values.yaml>

Example

helm upgrade deployment01 /home/opsbridge-suite-chart/charts/opsbridge-suite-20xx.xx.0.tgz -n opsb-helm -f /var/tmp/values_new.yaml
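
For reference, the two parameters from the table above would typically appear as nested keys in the retrieved values file; a minimal sketch, assuming the dotted names map directly to YAML nesting:

ucmdbprobe:
  size: Small
  deployment:
    maxRemoteProcesses: 2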


1.2.4. Fluent bit logs for postload-taskcontroller has many errors due to insufficient buffer size
After deploying AI Operations Management or Network Operations Management to use fluent bit functionality, it's advised to
monitor the console of the fluent bit pod.

Error
The console has error messages that the lines are too long in the taskcontroller.log .

For example: taskcontroller.log requires a larger buffer size, lines are too long. Skipping file.

Cause
Insufficient buffer size for taskcontroller.log

Solution
The workaround for this issue is to update the fluent bit configmap created for OPTIC Data Lake Postload.

1. Edit the configmap di-postload-fluentbit

kubectl edit -n <application namespace> cm/di-postload-fluentbit

2. Search for " di-postload-input.conf " key in the configmap and add the following lines at the end of the INPUT section
for taskcontroller.log as shown in the examples.

Buffer_Chunk_Size 64KB
Buffer_Max_Size 128KB

Important: Retain the indentation in the file as shown in the example.

Example for Network Operations Management:

di-postload-input.conf: |-
[INPUT]
Name tail
Tag odlpostload.*
Path /fluentbit/deployment-log/nom/nom*postload-taskcontroller*/taskcontroller.log
Multiline On
Parser_FirstLine odl-postload-tc
Buffer_Chunk_Size 64KB
Buffer_Max_Size 128KB

Example for AI Operations Management:


di-postload-input.conf: |-
[INPUT]
Name tail
Tag odlpostload.*
Path /fluentbit/deployment-log/opsb/opsb*postload-taskcontroller*/taskcontroller.log
Multiline On
Parser_FirstLine odl-postload-tc
Buffer_Chunk_Size 64KB
Buffer_Max_Size 128KB

3. Save the change and exit.


4. Get the name of the fluent bit deployment:

kubectl get deploy -n <application namespace> | grep itom-fluentbit

5. Redeploy the fluent bit pod:

kubectl scale deploy/<fluent bit deployment name> -n <application namespace> --replicas=0


kubectl scale deploy/<fluent bit deployment name> -n <application namespace> --replicas=1
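
On clusters where kubectl rollout restart is available, the scale down/up in step 5 can be replaced by a single in-place restart; a sketch, not part of the original procedure:

kubectl rollout restart deploy/<fluent bit deployment name> -n <application namespace>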


1.2.5. Troubleshoot guided install


The console shows phase wise deployment status information and error messages. For detailed logs, at the end of the
installation, you can check the guided_Install_dd-mm-yy-h:min:sec.log file, for example, guided_Install_18-04-23-08:15:03.log file, in
the same directory from where you executed the script. In case of a failure, you can use the log file to troubleshoot issues.

The following are some common troubleshooting scenarios:

LPV deployment fails

Solution
1. Check the guided_Install_dd-mm-yy-h:min:sec.log file. For example, guided_Install_18-04-23-08:15:03.log file.
2. If it fails stating that the disk is already mounted or the disk already has a partition created, add a new disk that's not mounted
or partitioned.
3. Refer to Configure separate disk devices for each LPV on worker nodes and complete the prerequisite.
4. If it fails for any other reason follow the steps mentioned in Uninstall local storage provisioner.

Vertica installation fails

Vertica deployment fails

Solution 1
Check the firewall status. If you have disabled the firewall on the master node and enabled on Vertica nodes, disable the
firewall on all Vertica nodes.

Solution 2
1. Check the guided_Install_dd-mm-yy-h:min:sec.log file. For example, guided_Install_18-04-23-08:15:03.log file.
2. If the error is due to Vertica failure, run the following commands on all the Vertica nodes to clean up:

sudo rpm -e vertica-11.1.*


sudo rm -rf /opt/vertica/
sudo rm -rf /var/opt/vertica/

3. If you have used the certificates created by the guided install script, on the Vertica server, clean up the certificates and
key in the following directories:
1. < directory where you have downloaded and unzipped the files>/resources/issuecert directory
2. /tmp on all the vertica nodes

If you have used existing certificates, on all the Vertica servers, clean up the certificates and key in the /tmp directory.

4. Rerun the guided install script

Vertica cleanup step fails


At times, the cleanup fails with the following error:

ERROR: Vertica processes still running.ERROR: You must stop them prior to uninstall.

Solution
Perform the following steps to resolve this issue:


1. Stop the Vertica processes that are running on all the nodes.
2. Run the following command on all Vertica nodes:

rpm -e vertica-<version>

OPTIC Data Lake Vertica plugin installation fails

Solution
Check the guided_Install_dd-mm-yy-h:min:sec.log file. For example, guided_Install_18-04-23-08:15:03.log file. If the cause of the
issue is an existing /home/dbadmin/.itomdipulsarudx , then follow these steps:

1. Remove the /home/dbadmin/.itomdipulsarudx file from all the Vertica nodes and follow the instructions at Install OPTIC
Data Lake Vertica Plugin.
2. Rerun the guided install script.

If there is any other issue with the plugin, see Install the Vertica OPTIC DL plugin, Create the variables file, and Configure the
Vertica database and Enable TLS to install and configure manually.

PostgreSQL installation fails


If you have used the certificates created by the guided install script and PostgreSQL installation fails, cleanup the certificates
and key created by guided install script in the following directories on the PostgreSQL server:

< directory where you have downloaded and unzipped the files>/resources/issuecert directory
/tmp

If you have used existing certificates, on the PostgreSQL server, clean up the certificates and key in the /tmp directory.

Rerun the script


If you must run the guided installer again because of OMT failure, the installer resumes from the last completed stage
depending on the user input. You must clean up all the resources by following these steps:

1. Uninstall OMT.
2. Edit your least-input.properties or custom-input.properties file depending on what you have used as shown in the example
below:

completedModule: "Master,Worker,NFS,relationalDatabase,Vertica,OMTDatabase,OMTPreRequisite"

If the script fails at the OMTDeployment stage, perform the cleanup steps, and then remove both the OMTDatabase and OMTPreRequisite modules from the completedModule section and add them to the deploy section.

deploy: "OMTDatabase,OMTPreRequisite,OMTDeployment,LPVDeployment,ApplicationPrerequisite"

3. Rerun the guided install script.


1.2.6. idl_config.sh script is failing with the below error on SaaS

Error
Error: UPGRADE FAILED: create: failed to create: Secret "sh.helm.release.v1.opsb.v5" is invalid: data: Too long:
must have at most 1048576 bytes

2022-09-16T09:28:03,428Z ERROR idl_config:main: Helm upgrade failed.

Issue
idl_config.sh script fails due to helm secrets size issue.

Cause
The opsb-background.png file under the images directory takes up almost 217 KB, which pushes the Helm release secret over its size limit.

Solution
1. Extract the application chart
tar -xzf opsbridge-suite-<version>.tgz
2. Go to /opsbridge-suite/_bosun/images/
cd /opsbridge-suite/_bosun/images/
3. Delete opsb-background.png
rm opsb-background.png
4. Continue with running the idl_config.sh script using the following command:
./idl_config.sh -cacert <absolute path of obm certificate> -chart <absolute path of opsbridge application chart tgz> -namespace <application namespace> -release <deployment name>

Example:

./idl_config.sh -cacert /home/ec2-user/obm.crt -chart /tmp/opsbridge-suite-chart/charts/opsbridge-suite/ -namespace opsb -release opsb


1.2.7. Collector customization settings were lost after installing 2021.11 Patch 1

Cause
You may lose OOTB collector configurations after the 2021.11 Patch 1 installation if autoStartAgentMetricCollector is set to true.

Solution
If you have changed the OOTB configuration, make sure that AMC doesn't autostart after the patch installation. Set autoStartAgentMetricCollector to false in the values.yaml file before the patch installation. Follow the installation instructions given in the patch documentation.
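
A minimal sketch of the change; the exact location of this key in values.yaml depends on your chart version, so treat the placement below as an assumption and search your values file for the parameter name:

# Hypothetical placement -- locate autoStartAgentMetricCollector in your values.yaml and set it to false
autoStartAgentMetricCollector: false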

Tip

Don't customize OOTB collector configurations; instead, clone them and disable the OOTB versions before the installation.


1.2.8. Anomaly detection capability configuration doesn't overwrite the chart configuration
When deploying the anomalyDetection capability, the configuration doesn't correctly apply to the itom-oba-config chart in the values.yaml file. The anomaly detection configurator or source type configurator UI displays a "failed to load data" error message on the screen.

Cause
This issue occurs due to the incorrect structure of the anomaly parameters in the values.yaml file. The itom-oba-config chart (anomalyDetection capability) doesn't read the parameters provided by the suite in the values.yaml file.

Solution
1. Run the following command to get the values.yaml file.

helm get values <helm deployment name> -n <suite namespace> > /tmp/values.yaml

2. Rearrange the configuration parameters located under itom-oba-config instead of anomalyDetection as follows:

itom-oba-config:
  deployment:
    oba:
      protocol: https
      host: <OBA Application Server host>
      configParameterServicePort: 9090

3. Run the following command to update the suite:

helm upgrade <helm deployment name> -n <suite namespace> -f <values.yaml> <chart> --reuse-values


1.2.9. itom-monitoring-service-data-broker remains in Init mode when OBM capability is not selected

Cause
When you have deployed OPTIC Reporting or HyperScale Observability through AppHub without selecting external OBM or
OBM capability, itom-monitoring-service-data-broker remains in Init status.

Solution
In AppHub, edit the same deployment to enable OBM capability and redeploy.
If you want to use Classic OBM, do the following and redeploy:

For OPTIC Reporting > Enable Agent Metric Collector > enable Use Classic Operations Bridge Manager (OBM)
For HyperScale Observability > enable Use External OBM.

Note: There may be a delay during Data broker pod startup as agent configuration runs every time the pod is recreated. This is the expected behavior and requires no action.


1.2.10. AMC configuration files aren't deployed after installation
When installing AI Operations Management with the OPTIC Reporting capability, the default Agent Metric Collector (AMC) credential, target, and collector configuration files are deployed automatically through the auto configuration job (itom-monitoring-collection-autoconfigure-job). However, one or more configuration files may fail to get deployed due to the causes listed below.

Cause 1
The IsAgentMetricCollectorEnabled parameter wasn't enabled in the values.yaml file during installation.

Solution 1
Ensure that you set the parameter IsAgentMetricCollectorEnabled to true in the values.yaml file. For more information,
see Configure System Infrastructure Reports using Agent Metric Collector.
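
A minimal sketch of the flag; the surrounding structure in values.yaml is an assumption, so search your values file for the exact location of the parameter:

# Hypothetical placement -- locate IsAgentMetricCollectorEnabled in your values.yaml and set it to true
IsAgentMetricCollectorEnabled: true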

Cause 2
All the containers in the required pods (Content Manager service, Monitoring Admin service, Data Administration service, IDM
service, and BVD service) aren't in running state.

Solution 2
Ensure that all the pods are up and running so that you can deploy the AMC configuration file. Run the following command to
verify if all the pods are running:

kubectl get pods -A

Cause 3
Download of the ops-monitoring-ctl and ops-content-ctl CLIs failed, or the Sysinfra content pack (OpsB_SysInfra_Content_202x.xx.xxx.zip) failed.

Solution 3
See Administer AMC with CLI to know more about how and where to download the CLIs.

Open the autoconfig logs using the following command:

kubectl logs -n $(kubectl get pod -A | awk '/autoconfigure/ {print $1, $2}') > autoconfig.log

Look for the error message in the autoconfig.log file.


Target Status Retries


------------------------------------------------------------------------------------------
amc_obm_rtsm Create Failed 3
amc_oa_nodes Create Failed 3

Credential Status Retries


------------------------------------------------------------------------------------------
amc_obm_basic_auth Create Failed 3
amc_obm_cert_auth Create Failed 3

File Status Retries


------------------------------------------------------------------------------------------
bbc_ports.txt Create Failed 3

Collector Status Tag Enabled Version Retries


----------------------------------------------------------------------------------------------------
agent-collector-sysinfra Create Failed standard 2022.05.140 3

If the credential and target configuration files aren't available on the NFS server, you can manually create the credential,
target, and collector configurations. For more information, see AMC credential configuration and AMC target configuration.

To create and deploy the configurations, see Administer AMC with CLI.

Cause 4
Certificate exchange between Operations Bridge Manager (OBM) and Data Broker Container (DBC) isn't done within two
hours of installation.

Solution 4
This suggests that the certificate exchange didn't happen within two hours.

Perform the following steps to resolve the issue:

You can use one of the following ways to grant certificates from OBM:
From OBM UI - Administration -> Setup and Maintenance -> Certificate Requests

Or

From CLI
Run the following command to get a list of available certificates. You will get the core IDs of the available certificates. For
example: "daf17e6a-e203-75d6-10d6-ff59507e88dc"

# /opt/OV/bin/ovcm -listpending

Run the following command to grant the certificate:

# /opt/OV/bin/ovcm -grant coreId

For example:

# /opt/OV/bin/ovcm -grant "daf17e6a-e203-75d6-10d6-ff59507e88dc"

Rerun the autoconfig job with the following commands:


kubectl get jobs -n $(kubectl get jobs -A | awk '/autoconfigure/ {print $1, $2}') -o yaml > file_itom-monitoring-collection-autoconfigure-job.yaml
yq eval -i 'del(.spec.template.metadata.labels)' file_itom-monitoring-collection-autoconfigure-job.yaml
yq eval -i 'del(.spec.selector)' file_itom-monitoring-collection-autoconfigure-job.yaml
kubectl delete job -n $(kubectl get jobs -A | awk '/autoc/ {print $1, $2}')
kubectl apply -f file_itom-monitoring-collection-autoconfigure-job.yaml

After the AMC configuration files are deployed, you will see the following message in the auto configuration log:

=======================================SUMMARY====================================
=================
obmendpoint - https://hostname.swinfra.net:443/

Certificates :
32fdaad0-76e8-75d5-1966-88cc61e5a54c

Trusted Certificates :
CA_41f46360-1006-75d5-0dc3-8fe1ffa780f9_2048
MF RE CA on Vault b355f46c
MF RID CA on Vault b355f46c
kubernetes

Target Status Retries


------------------------------------------------------------------------------------------
amc_obm_rtsm Created
amc_oa_nodes Created

Credential Status Retries


------------------------------------------------------------------------------------------
amc_obm_basic_auth Created
amc_obm_cert_auth Created

Collector Status Tag Enabled Version Retries


----------------------------------------------------------------------------------------------------
agent-collector-sysinfra Created standard true 2022.11.149

=================================================================================
===================


1.2.11. Error: The maximum number of addresses has been reached
Error: "The maximum number of addresses has been reached. Service: AmazonEC2; Status Code: 400; Error Code: AddressLimitExceeded"

Cause
While performing the Prepare VPC step of the AWS infrastructure manual setup and uploading the network-with-vpc.template, the upload fails with this error. This is because the default Elastic IP service quota is 5, which isn't enough to successfully create the stack.

Solution
To resolve this issue, you must increase the quota (for example, to 10) in the AWS Management Console. After the quota is approved at the higher number, the stack gets created. For more information, see the AWS documentation.


1.2.12. Image upload error while image download and upload
If you are using the ITOM Cloud Deployment Toolkit for AWS setup, an upload error appears while running the image-transfer.py script during creation of ECR repositories and download and upload of images.

Solution
Perform the following steps to resolve this issue:

1. Go to the location /etc/docker and open the daemon.json file.


2. Edit the file as follows and save it:

{
"max-concurrent-downloads": 1,
"max-concurrent-uploads": 1
}

3. Run the command: systemctl start docker

4. Run the following commands on the system with AWS CLI:

region=<ecr_region>
ecrURL=`aws ecr get-authorization-token --region $region --query="authorizationData[0].proxyEndpoint" | grep -oE "[0-9]+[^\"]*"`
ecrUserName=AWS
ecrUserPassword=`aws --region $region ecr get-login-password`
python3 image-transfer.py -su <source-username> -sp <source-password> -so <source-orgname> -sr <source-registry> -p /tmp/cdf-image-set.json -ts 4 -ry 3 -tu $ecrUserName -tp $ecrUserPassword -to <target-orgname> -tr $ecrURL
python3 image-transfer.py -su <source-username> -sp <source-password> -so <source-orgname> -sr <source-registry> -p /tmp/_image-set.json -ts 4 -ry 3 -tu $ecrUserName -tp $ecrUserPassword -to <target-orgname> -tr $ecrURL

The image-transfer.py script is in the <unzipped opsbridge-suite-chart>/scripts/byok folder.


In these commands:

<ecr_region> is the region where you will put your ECR repositories.
<source-username> is the user name of the source docker registry.
<source-password> is the password of the source docker registry.
<source-registry> is the URL of the source registry. For example, registry.hub.docker.com .
<source-orgname> is the organization name in the source docker registry. It's hpeswitom if you use Docker Hub, or
contact the admin of the registry.
<target-orgname> is the organization name in the target docker registry. You can get it from the admin of the
registry.


1.2.13. Error: Target group name 'xxx-tg-TLS-5443' cannot be longer than '32' characters
While running the load balancer script on AWS using the toolkit to deploy the suite, the following error appears:

Error: Target group name 'xxx-tg-TLS-5443' cannot be longer than '32' characters

where, xxx is the environment-prefix.

Cause
This issue occurs because you have set the environment-prefix in the itom-cloud-toolkit-20xx.xx.xx-XX/aws/tf-itom-sa/template.tfvars file to a value with more characters than the default.

Solution
Ensure that the environment-prefix you set in the itom-cloud-toolkit-20xx.xx.xx-XX/aws/tf-itom-sa/template.tfvars file is no longer than the default value, so that the generated target group name stays within 32 characters.
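A quick way to sanity-check a prefix before re-running the script (a minimal sketch; the -tg-TLS-5443 suffix is taken from the error message above):

prefix=xxx
echo -n "${prefix}-tg-TLS-5443" | wc -c    # the result must be 32 or less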


1.2.14. itom-monitoring-admin pod not starting after installation
After completing the installation, the itom-monitoring-admin pod doesn't come up. You will see the following statements in the
container log file:

liquibase: 400 Bad Request. Default objects cannot be updated.


liquibase: 400 Bad Request. Default objects cannot be updated.
liquibase: 400 Bad Request. Default objects cannot be updated.
liquibase: 400 Bad Request. Default objects cannot be updated.

Cause
The issue occurs if the default objects fail to upload again after itom-monitoring-admin pod restarts.

Solution
To resolve this issue, follow these steps:

1. For the embedded PostgreSQL database, get the credentials of the PostgreSQL database, log in to the pod, and then log
in to the PostgreSQL database:

kubectl exec -ti itom-postgresql-766d5455df-lmwd4 -n opsb-helm -c itom-postgresql -- get_secret IDM_DB_USER_PASSWORD_KEY

kubectl -n opsb-helm exec <pod_name> -ti -c itom-postgresql -- bash

psql -d monitoringadmindb -U monitoringadminuser -p <port> -h itom-postgresql;

2. For the external PostgreSQL database, log into PostgreSQL database using the IP address:

psql -d monitoringadmindb -U monitoringadminuser -p <port> -h <IP addres of the PostgreSQL database>;

3. When prompted for the password, enter the password.

4. Delete entries in the cs_spec_definition table:

TRUNCATE monitoringadminschema.cs_spec_definition CASCADE;

5. Delete spec definition from metadata:

DELETE FROM monitoringadminschema.cs_metadata where type= 'specdefinition';

6. Delete thresholds:

truncate monitoringadminschema.CS_THRESHOLD_NEW cascade;


delete from monitoringadminschema.cs_metadata where type ='threshold';

7. Delete and restart the itom-monitoring-admin pod:

kubectl delete pod itom-monitoring-admin-b7565db8b-zc5gs -n collection-service
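After the pod restarts, you can confirm that it reaches the Running state. This is a minimal check; replace the namespace with the one used in your environment:

kubectl get pods -n <namespace> | grep itom-monitoring-admin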


1.2.15. The itom-monitoring-admin pod is unresponsive during certificate import
The itom-monitoring-admin pod doesn't start or takes more time. The issue persists even after increasing the readiness probe
timeouts. This issue occurs during certificate import when charts are deployed.

itom-monitoring-admin 2022-07-10 14:32:00,835 INFO 200-setup-user:setup_user Created user: serviceuser


itom-monitoring-admin 2022-07-10 14:32:00,847 INFO 200-setup-user:setup_user Created group: servicegroup
itom-monitoring-admin 2022-07-10 14:32:00,873 INFO 999-service:source Running as user 1999: uid=1999(serviceuser) gid=1999(servicegroup) groups=1999(servicegroup)
itom-monitoring-admin 2022-07-10 14:32:00,881 WARN 999-service:source Not running with read-only filesystem
itom-monitoring-admin 2022-07-10 14:32:00,897 WARN 200-calc-heap:source heap space (773741824) exceeded maximum (500000000)
itom-monitoring-admin 2022-07-10 14:32:00,935 INFO utils:importKey Importing key into keystore /tmp/home/secrets/server-keystore: /var/run/secrets/boostport.com/…

Cause:
This issue occurs on host systems that don't provide enough entropy for secure random number generation.
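To check whether a worker node is affected, you can look at the available kernel entropy. This is a quick check; values that stay in the low hundreds indicate an entropy shortage:

cat /proc/sys/kernel/random/entropy_avail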

Solution:
You can resolve this issue by installing the haveged package on all worker nodes. Perform the following steps to add the Extra Packages for Enterprise Linux (EPEL) repository on Red Hat Enterprise Linux (RHEL) or CentOS and install haveged:

1. Run the following command to create a temporary directory for storing the EPEL repository rpm file:

mkdir <folder_name>

2. Navigate to the newly created directory and download the EPEL repository rpm file by running one of the following commands:

wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm or curl -LO https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

3. Run the following command to install the newly downloaded rpm package:

sudo yum install epel-release-latest-7.noarch.rpm

4. Run the following command to install haveged package:

yum install haveged

5. Verify if the haveged services are running on your system.


a. View the loading status of haveged :

systemctl status haveged

b. Enable haveged :

systemctl enable haveged

c. Start haveged :

systemctl start haveged

d. View the active status of haveged :


systemctl status haveged

6. Restart the itom-monitoring-admin pod.


1.2.16. itom-di-postload pods deployment isn't successful
Cause
This is a known issue that occurs when the Vertica read-write or read-only database user password contains the "=" character.

Solution
Follow the steps to resolve this issue:

Task 1: Change the passwords. Change the Vertica passwords for the Vertica database read-write and read-only users so that they don't contain the "=" character.

For external Vertica, don't change the password for the dbadmin user. You must change the password only for the vertica_rouser and vertica_rwuser you created during the Vertica installation. Perform the following steps:

1. On the Vertica node, run the command: su - <dbadmin username>


2. Log on to the Vertica database and run the following queries using vsql:
ALTER USER <vertica_rouser> IDENTIFIED BY '<newpassword>';
ALTER USER <vertica_rwuser> IDENTIFIED BY '<newpassword>';

Tip

If you have used embedded Vertica, you must perform these steps for the dbadmin user.

Task 2: Update suite secrets verification

Run the following command on the control plane to verify if you can update the password for the ITOMDI_DBA_PASSWORD_KEY and ITOMDI_RO_USER_PASSWORD_KEY parameters:

cd /opt/kubernetes/scripts/
./generate_secrets -u -n <suite namespace> -c <opsbridge-suite_chart_file> -o <secrets.yaml>

Where:
<suite namespace> is the namespace where you have installed AI Operations Management.
<opsbridge-suite_chart_file> is a file with .tgz extension. The suite zip contains the file opsbridge-suite-<version>.tgz in
the charts directory.

If you have upgraded to this version from a version where you had = in the passwords, you won't be able to edit the ITOMDI_DBA_PASSWORD_KEY and ITOMDI_RO_USER_PASSWORD_KEY parameters and you see the following message:

2021-09-02T09:54:06+03:00 [INFO] secret 'ITOMDI_DBA_PASSWORD_KEY' has already been configured.


2021-09-02T09:54:06+03:00 [INFO] secret 'ITOMDI_RO_USER_PASSWORD_KEY' has already been configured.

You must perform the next task to patch the secret with the new password.

Task 3: patch AI Operations Management secret with the new password

1. Run the following command:

kubectl patch secret opsbridge-suite-secret -p '{"data":{"ITOMDI_DBA_PASSWORD_KEY":"'$(echo <newpassword> | tr -d '\n' | base64)'","ITOMDI_RO_USER_PASSWORD_KEY":"'$(echo <newpassword> | tr -d '\n' | base64)'"}}' -n <suite-namespace>
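To confirm that the secret now holds the new value, you can decode it back out (a minimal check):

kubectl get secret opsbridge-suite-secret -n <suite-namespace> -o jsonpath='{.data.ITOMDI_DBA_PASSWORD_KEY}' | base64 -d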

2. Run the following commands to restart the pods:


kubectl scale deployments/itom-di-metadata-server -n <suite-namespace> --replicas=0


kubectl scale deployments/itom-di-administration -n <suite-namespace> --replicas=0
kubectl scale deployments/itom-di-postload-taskcontroller -n <suite-namespace> --replicas=0
kubectl scale deployments/itom-di-postload-taskexecutor -n <suite-namespace> --replicas=0
kubectl scale deployments/itom-di-scheduler-udx -n <suite-namespace> --replicas=0
kubectl scale deployments/itom-di-data-access-dpl -n <suite-namespace> --replicas=0

kubectl scale deployments/itom-di-metadata-server -n <suite-namespace> --replicas=1


kubectl scale deployments/itom-di-administration -n <suite-namespace> --replicas=1
kubectl scale deployments/itom-di-postload-taskcontroller -n <suite-namespace> --replicas=1
kubectl scale deployments/itom-di-postload-taskexecutor -n <suite-namespace> --replicas=1
kubectl scale deployments/itom-di-scheduler-udx -n <suite-namespace> --replicas=1
kubectl scale deployments/itom-di-data-access-dpl -n <suite-namespace> --replicas=1


1.2.17. Cannot install secondary deployment because 'MultipleDeployment' is not enabled in feature-gates

The following error is seen when you try to create the suite namespace or a secondary namespace with # ./cdfctl.sh deployment create -d <suite namespace> -t Helm -u admin -p <password>:

[FATAL] 2020-06-05 09:17:11: Cannot install secondary deployment because 'MultipleDeployment' is not enabled in core/feature-gates

Cause
The FEATURE_GATE setting doesn't allow multiple deployments. For example, if you explicitly set "Prometheus=true" in install.properties, the default "MultipleDeployment=true" is no longer applied.

Solution
1. Edit the feature-gates ConfigMap: kubectl edit cm feature-gates -n core
2. Set the MultipleDeployment feature to true, save, and exit.
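To review the current flags before editing (a quick check; the exact data keys inside this ConfigMap can differ between OMT versions):

kubectl get cm feature-gates -n core -o yaml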


1.2.18. Failed to parse values.yaml


The values.yaml file fails to parse during helm install.
During helm install you may see: Error: Failed to parse values.yaml: error converting YAML to JSON: yaml: line xx: found character that cannot start any token.

Cause
helm install verifies the values.yaml file and reports an error if there are any syntax errors. The installation doesn't proceed until all the syntax errors are corrected.

Solution
Manually verify and correct all the errors. You can also use any YAML syntax validator tool.
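For example, yq (already used elsewhere in this guide) can serve as a quick syntax check before re-running helm install; it reports the offending line if the YAML is invalid:

yq eval '.' values.yaml > /dev/null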


1.2.19. Can not reuse name


Error: cannot re-use a name that is still in use.
If you re-run the helm install command, you may see an error like: Error: cannot re-use a name that is still in use.

Cause
This happens when the previous install session was canceled or aborted before it completed, so the Helm release name is still registered when you re-run the helm install command.
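To confirm that the aborted release is still registered, you can list all releases, including failed ones (a quick check):

helm list -n <suite namespace> -a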

Solution
1. Uninstall the deployment that was aborted or canceled by using the following command:
helm uninstall <helm deployment name> -n <suite namespace> --no-hooks
Example:

helm uninstall deployment01 -n opsb-helm --no-hooks

2. Run helm-install as described in Deploy.

If you still face errors, do the following:

1. Delete the complete namespace by following Uninstall.


2. Follow the steps mentioned in Re-install the suite while retaining OMT section of Re-run helm install.

Important

There is no need to uninstall OMT.


1.2.20. Failed pre-install: BackoffLimitExceeded


During application installation, you may see the following error and the application installation may abort.

Error: failed pre-install: job failed: BackoffLimitExceeded

Cause
This issue occurs when one or more Kubernetes jobs have failed and have exceeded the number of retries.

The installation process has a check for certificates and databases. During the installation, the installation process validates
the following:

Certificate validation: External database CA certificates, application client authentication certificates, OPTIC Data Lake
message bus client authentication certificates, and ingress controller certificates.
Database validation: External database connection parameters for PostgreSQL and Oracle. If you have selected a
capability that requires OPTIC Data Lake, the Vertica database connection parameters are also checked.
OPTIC Data Lake Plugin validation: Supported Vertica versions, OPTIC Data Lake Vertica Plugin versions, and the Vertica
configurations.

The certificate validation, database validation, or Plugin validation may fail due to various reasons.

Causes for failures in certificate validation


Certificate validation fails when the parameters don't have the expected values.

The following table provides more information on specific parameters and their expected value:

Certificate format: The certificate format must be X509 with Privacy Enhanced Mail (PEM) encoding.
Key length: The key length for RSA must be greater than or equal to 2048 bits; for ECDSA it must be 256 bits.
Signature Hash Algorithm (SHA): The algorithm must be rsaWithSha256 or later. SHA-256, SHA-384, and SHA-512 are supported.
Validity period: The certificate validity period must be more than 5 days.
Type: The certificate key type must be RSA or ECDSA.
Extended Key Usage (EKU): Must be empty or set to "TLS Web Server auth".
Key Usage: Key usage must contain a digital signature.
SAN: The SAN list must contain the external access host.
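To check an existing certificate against these requirements before installation, you can inspect it with openssl. This is a quick sketch; replace mycert.pem with your certificate file:

openssl x509 -in mycert.pem -noout -text | grep -A1 -E "Signature Algorithm|Public-Key|Not After|Subject Alternative Name|Key Usage"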

Causes for failures in DB validation


Database validation fails when you give the wrong username, schema name, or password.

Causes for failures in OPTIC Data Lake Plugin validation


The following table provides more information on errors and causes for OPTIC Data Lake Plugin validation failure:

Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... no such host
Cause: The Vertica servers aren't responsive, the Vertica host name isn't correct, or the network path to the Vertica hostname isn't reachable.

Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... connect: connection refused
Cause: The Vertica port isn't correct, or the Vertica service stopped.

Error: Could not connect to Vertica: Error: [28000] Invalid username or password
Cause: The Vertica user name isn't correct, or the Vertica user password that's stored in a secret isn't correct.

Error: x509: certificate signed by unknown authority ...
Cause: You have configured Vertica with TLS, but the Vertica certificate for the application is wrong.

Error: Error: [3D000] Database "<myDB>" does not exist
Cause: The Vertica database name is wrong or doesn't exist.

Error: Invalid Vertica version detected Vertica Analytic Database v9.2.1, expected v10.1.0,v10.1.1
Cause: The Vertica version isn't supported.

Error: Failed reading UDX version
Cause: The OPTIC Data Lake Vertica Plugin isn't installed, or the Vertica user doesn't have the required privileges for the OPTIC Data Lake Vertica Plugin in Vertica.

Error: Invalid UDX version detected 2.4.0-8, expected 2.5.0-19
Cause: The OPTIC Data Lake Vertica Plugin version is wrong.

Solution

Solution for deployment using AppHub


If you are deploying using AppHub, do the following:

1. Sign in to AppHub and go to Deployments.


2. Select the name of the deployment and click on View Health to analyze the details of pod status, summary, and
Logs. This will show an X mark for the pods that are in error state.
3. Select the pod in error state.
4. Go to the Advanced tab of that specific pod to see the error messages due to which the installation has failed and fix
the errors.
Resolve the issues related to OPTIC Data Lake Plugin validation according to the error logs as follows:
Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... no such host
Solution: Check the Vertica server connection.

Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... connect: connection refused
Solution: Check the Vertica port configuration and the Vertica connection.

Error: Could not connect to Vertica: Error: [28000] Invalid username or password
Solution: Check and give the correct Vertica username and password.

Error: x509: certificate signed by unknown authority ...
Solution: Check and give the correct Vertica certificate name.

Error: Error: [3D000] Database "<myDB>" does not exist
Solution: Check and give the correct Vertica database name.

Error: Invalid Vertica version detected Vertica Analytic Database v9.2.1, expected v10.1.0,v10.1.1
Solution: Ensure that you have deployed or upgraded to the supported Vertica version.

Error: Failed reading UDX version
Solution: Install the OPTIC Data Lake Vertica Plugin. Ensure that the Vertica user has the required privileges.

Error: Invalid UDX version detected 2.4.0-8, expected 2.5.0-19
Solution: Install the supported OPTIC Data Lake Vertica Plugin version.

5. After fixing the configurations, redeploy.


6. Simultaneously, open one more terminal for the control plane node and run the following command:

kubectl get jobs | grep itom-analytics-flink-controller-pre-upgrade-hook

7. Delete the pre-upgrade-hook job.

kubectl delete job itom-analytics-flink-controller-pre-upgrade-hook

8. Continue to install or re-deploy the application.

Solution for deployment using CLI


1. Look for failed jobs in the application namespace.
Run the following command to list the jobs and see their status:

kubectl get jobs -n <application namespace>

2. Check for all the failed jobs and their logs to identify the problem and fix the issues.
Run the following command to see the logs of the failed job:

kubectl logs pods/<name of the failed job pod name> -n <application namespace>

Resolve the issues related to OPTIC Data Lake Plugin validation according to the error logs as follows:
Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... no such host
Solution: Check the Vertica server connection.

Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... connect: connection refused
Solution: Check the Vertica port configuration and the Vertica connection.

Error: Could not connect to Vertica: Error: [28000] Invalid username or password
Solution: Check and give the correct Vertica username and password.

Error: x509: certificate signed by unknown authority ...
Solution: Check and give the correct Vertica certificate name.

Error: Error: [3D000] Database "<myDB>" does not exist
Solution: Check and give the correct Vertica database name.

Error: Invalid Vertica version detected Vertica Analytic Database v9.2.1, expected v10.1.0,v10.1.1
Solution: Ensure that you have deployed or upgraded to the supported Vertica version.

Error: Failed reading UDX version
Solution: Install the OPTIC Data Lake Vertica Plugin. Ensure that the Vertica user has the required privileges.

Error: Invalid UDX version detected 2.4.0-8, expected 2.5.0-19
Solution: Install the supported OPTIC Data Lake Vertica Plugin version.

3. After fixing the configurations, redeploy.


4. Simultaneously, open one more terminal for the control plane node and run the following command:

kubectl get jobs | grep itom-analytics-flink-controller-pre-upgrade-hook

5. Delete the pre-upgrade-hook job.

kubectl delete job itom-analytics-flink-controller-pre-upgrade-hook

6. Continue to install or re-deploy the application.


Example
This is an example scenario when itom-certificate-validator-job has failed.

# kubectl get jobs -n nom4


NAME COMPLETIONS DURATION AGE
apply-license-job-qouixka 1/1 3m10s 10d
create-odl-objects-job-7xpim4e 1/1 11m 10d
import-bvd-dashboards-job-6t6lpqf 1/1 5m43s 10d
itom-autocreate-databases-sk9yrq7 1/1 5s 10d
itom-certificate-validator-job 0/1 3h10m 3h10m
itom-pt-ui-config-job-pkzhndm 1/1 6m19s 10d

In this example, the itom-certificate-validator-job has failed; it shows 0/1 completions.

Run the following command to find the itom-certificate-validator pod name:

# kubectl get pods -n nom4 | grep itom-certificate-validator-job


itom-certificate-validator-job-lbgvv 0/1 Error 0 3h14m

Run the following command to verify the logs:

# kubectl logs pods/itom-certificate-validator-job-lbgvv -n nom4


2023-08-14T03:39:31.588+0000 INFO : Sleeping 0 ...
2023-08-14T03:39:31.690+0000 INFO : Running source scripts ...
2023-08-14T03:39:31.789+0000 INFO : Running startup scripts ...
2023-08-14T03:39:31.883+0000 INFO : Startup scripts completed
2023-08-14T03:39:31.977+0000 INFO : CMD: /bin/sh -c /script/main
Starting Certificate Validation
Started validating the CA Certificates
Found 4 certificates at /var/run/secrets/db-certificates

Parsing Certificate: /var/run/secrets/db-certificates/ProvRE.crt


Parsed the Certificate: /var/run/secrets/db-certificates/ProvRE.crt sucessfully
Begin the expiry check for the certificate.
Certificate expiry validation successful.

Begin the Public Key length check for the certificate.


Certificate Public Key Length validation successful.

Begin the Signature Key Algorithm check for the certificate.


Certificate Signature Key Algorithm matches with the required key algorithm: SHA256-RSA

Parsing Certificate: /var/run/secrets/db-certificates/ProvRID.crt


Parsed the Certificate: /var/run/secrets/db-certificates/ProvRID.crt sucessfully
Begin the expiry check for the certificate.
Certificate expiry validation successful.

Begin the Public Key length check for the certificate.


Certificate Public Key Length validation successful.

Begin the Signature Key Algorithm check for the certificate.


Certificate Signature Key Algorithm matches with the required key algorithm: SHA256-RSA

Parsing Certificate: /var/run/secrets/db-certificates/postgres.crt


Parsed the Certificate: /var/run/secrets/db-certificates/postgres.crt sucessfully
Begin the expiry check for the certificate.
Certificate expiry validation successful.

Begin the Public Key length check for the certificate.


Certificate key length validation failed: certificate Public Key Length validation failed. Minimum key length must be 2048 bits.
Begin the Signature Key Algorithm check for the certificate.
Certificate Signature Key Algorithm matches with the required key algorithm: SHA256-RSA


Common Certificate Validation Failed for CA Certificates: Certificate: SAN: sac-hvm03312.swinfra.net Issuer CN: sac-hvm03312.swinfra.
net Common Name: sac-hvm03312.swinfra.net
Parsing Certificate: /var/run/secrets/db-certificates/vertica-ca.crt
Parsed the Certificate: /var/run/secrets/db-certificates/vertica-ca.crt sucessfully
Begin the expiry check for the certificate.
Certificate expiry validation successful.

Begin the Public Key length check for the certificate.


Certificate Public Key Length validation successful.

Begin the Signature Key Algorithm check for the certificate.


Certificate Signature Key Algorithm matches with the required key algorithm: SHA256-RSA

Error: Not all the certificates were found valid.

Started validating the API Client CA Certificates


processApiClientCaDir: There are no API Client CA Certificates present to validate.

Started validating the ODL external CA signed Certificates


processOpticDLCertDir: There are no ODL external CA signed certificates present to validate.

Started validating the NGINX Ingress controller custom server Certificates


processNginxCertDir: There are no NGINX Ingress controller custom server certificates present to validate.

Certificate validation failed. Terminating the certificate validation process.


1.2.21. Pre-upgrade hooks failed: job failed: BackoffLimitExceeded
During the suite install or upgrade the following error appears:

Error: pre-upgrade hooks failed: job failed: BackoffLimitExceeded

Cause
During the suite deployment or upgrade, the job itom-di-scheduler-udx-preinstall runs. This job checks for the supported Vertica and OPTIC DL Vertica Plugin versions, and the Vertica configurations. If the versions don't match, the error appears.

Solution
Follow these steps to resolve this error:

1. Run the following command:


kubectl get pods -n <suite namespace> | grep itom-di-scheduler-udx-preinstall
Note down the udx validator job pod name.
2. Run the following command:
kubectl logs -n <suite namespace> <udx validator job pod name>
The error logs appear.

You can resolve the issue according to the error logs as follows:

Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... no such host
Cause and solution: The Vertica servers aren't responsive, the Vertica host name isn't correct, or the network path to the Vertica hostname isn't reachable. You must check the Vertica server connection.

Error: ERROR driver: Failed to establish a connection to the primary server or any backup host. ... connect: connection refused
Cause and solution: The Vertica port isn't correct, or the Vertica service stopped. You must check the Vertica port configuration and the Vertica connection.

Error: Could not connect to Vertica: Error: [28000] Invalid username or password
Cause and solution: The Vertica user name isn't correct, or the Vertica user password that's stored in a secret isn't correct. You must check and give the correct Vertica username and password.

Error: x509: certificate signed by unknown authority ...
Cause and solution: You have configured Vertica with TLS, but the Vertica certificate for the suite isn't correct. You must check and give the correct Vertica certificate name.

Error: Error: [3D000] Database "<myDB>" does not exist
Cause and solution: The Vertica database name isn't correct. You must check and give the correct Vertica database name.

Error: Invalid Vertica version detected Vertica Analytic Database v9.2.1, expected v10.1.0,v10.1.1
Cause and solution: The Vertica version isn't supported. You must ensure that you deploy or upgrade to the supported Vertica version.

Error: Failed reading UDX version
Cause and solution: The OPTIC DL Vertica Plugin isn't installed, or the Vertica user doesn't have the required privileges for the OPTIC DL Vertica Plugin in Vertica. You must install the OPTIC DL Vertica Plugin and ensure that the required privileges are granted.

Error: Invalid UDX version detected 2.4.0-8, expected 2.5.0-19
Cause and solution: The OPTIC DL Vertica Plugin version isn't correct. You must install the supported OPTIC DL Vertica Plugin version.
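To confirm what the validator sees, you can query the Vertica version directly from a host that can reach the database. This is a minimal check; adjust the connection options to match your environment:

vsql -h <vertica_host> -U <vertica_rwuser> -d <database_name> -c "SELECT version();"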


1.2.22. “502: Bad Gateway”


"502 Bad Gateway" error when attempting to launch OBM

After the installation of AI Operations Management, a 502 Bad Gateway error displays when trying to access OBM.

Cause
The 502 error displays because OBM is not yet up and running.

Solution
Depending on the host machine, it might take up to one hour for OBM to start after the initial configuration.


1.2.23. Suite Installer 503 nginx error


"503 nginx error" when attempting to run the Suite Installer

After the installation of the Container Deployment Foundation, a 503 Nginx error displays when trying to access the Suite
Installer.

Cause
This error may display because the time on the master and worker nodes is different.

Solution
To resolve this issue, synchronize the time on your nodes by using, for example, NTP or VMware Tools.
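To verify whether the node clocks are in sync, you can compare the output of the following command on each node (a quick check; chronyd or ntpd can then be used to correct any drift):

timedatectl status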


1.2.24. Suite uninstall fails


After CDF fails to install a suite, you cannot uninstall the suite by clicking SUITE > Management > Actions > Uninstall.

Solution
To solve this problem, run the following commands to recreate the suite-db pods:

kubectl get pods -n core

kubectl delete pod <pod name of suite-db> -n core

Alternatively, you can restart the virtual machine on which the suite-db pod is running.


1.2.25. Pods in status CrashLoopBackOff


Reboot does not work: after attempting to reboot, the pods remain in the CrashLoopBackOff status.

Cause
This relates to the vault-renewal container, which does not get a valid token.

Solution
You have to delete the failed pods. Once deleted, the pods are recreated automatically and should run without error.

You can get the status of all pods with the following command:
kubectl get pods --all-namespaces

First, delete all failed database-related pods (suite-db, idm-postgresql, postgresql-aplm).

Next, delete all failed pods within the namespace core.

After that, delete all failed pods within the namespace opsbridge, starting with postgres, ucmdb, omi, redis, and bvd. Use the following command to delete the failed pods within the namespaces specified above:

kubectl delete pod <pod_name> --namespace <pod_namespace>
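To quickly find the pods that need to be deleted, you can filter the full pod list on the failure states (a simple sketch):

kubectl get pods --all-namespaces | grep -E "CrashLoopBackOff|Error"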


1.2.26. First master node goes down during installation
The first control plane node encounters errors or goes down when you are using the installation portal to configure the
installation.

Cause
This issue occurs because the first control plane node has crashed and the virtual IP address now resolves to the second or the third control plane node.

Solution
Continue the installation from the following URL:

https://<second/third_control_plane_node>:3000

In this URL, replace <second/third_control_plane_node> with the hostname of either the second control plane node or the third control plane node.


1.2.27. The browser does not trust certificates


When you configure the OMT installation on the installation portal, you receive an error message that resembles the
following:

ERROR: The browser does not trust certificate

Cause
This issue occurs because the root CA isn't imported to the certificate trust management of your browser.

Solution
1. Run the following command on your terminal to check the certificate details:
kubectl get cm -n $CDF_NAMESPACE public-ca-certificates -o yaml
2. Copy the RE_ca.crt part to a file as a certificate and save it (see the extraction sketch after this list).
3. Upload the RE_ca.crt certificate file to your browser.
4. Upload the ca.crt file from $CDF_HOME/ssl to your browser.
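Instead of copying the certificate out of the YAML output by hand, you can extract it directly with jsonpath. This is a sketch that assumes the ConfigMap stores the certificate under a data key named RE_ca.crt, as the output of step 1 suggests; verify the key name in that output first:

kubectl get cm -n $CDF_NAMESPACE public-ca-certificates -o jsonpath='{.data.RE_ca\.crt}' > RE_ca.crt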


1.2.28. Remove the OBM configuration


While there isn't necessarily a specific cause, you might need to remove the OBM configuration created by the obm-configurator.jar tool. This topic describes the steps to do that.

Solution for removing OBM configuration


To get a clean OBM system, after setting it up for event forwarding and correlation, remove the OBM configuration in the
following order:

1. Before deleting content packs, make sure the Content Pack Development is enabled in the infrastructure settings

a. Go to Administration > Setup and Maintenance > Infrastructure Settings .


b. Search for development.
c. Make sure the value for Enable Content Pack Development is set to true. If it's set to false, switch it to true.

2. Remove COSO AEC Integration content pack

a. Go to Administration > Setup and Maintenance > Content Packs .


b. Select the content pack COSO AEC Integration and press Delete.

3. Remove COSO Data Lake Event Integration content pack

a. Go to Administration > Setup and Maintenance > Content Packs .


b. Select the content pack COSO Data Lake Event Integration and press Delete.

4. When Close Cause if Symptoms Closed content pack is installed

Note

This was only part of the integration until 2021.08.

a. Remove Close Cause if Symptoms Closed content pack

i. Go to Administration > Setup and Maintenance > Content Packs .


ii. Select the content pack Close Cause if Symptoms Closed CP and press Delete.

b. Disable and remove Time-Based Event Automation Rules

i. Go to Administration > Event Processing > Time-Based Event Automation .


ii. Select Close Cause if Symptoms Closed and press Delete.
iii. Select Close Cause if Symptoms Closed: Reset Symptoms Check and press Delete.

c. Remove the script Close Cause if Symptoms Closed

i. Go to Administration > Event Processing > Automation > Time-Based Event Automation .
ii. Select the rightmost icon in the icon list preceding the rules list to open the Scripts Manager.
iii. In the Scripts Manager, select the CloseCauseIfSymptomsClosed script and delete it.

5. Disable and remove event integration rule

a. Go to Administration > Event Processing > Event Forwarding .


b. Select the rule COSO Data Lake Event Integration Rule and press Delete.

6. Remove the connected server and alias

a. Go to Administration > Setup and Maintenance > Connected Servers .


b. Select COSO Data Lake and delete it.
c. Select COSO Data Lake Alias and delete it.

7. Remove the AEC Event Forwarding script

a. Go to Administration > Setup and Maintenance > Connected Servers .


b. Select the rightmost icon in the icon list preceding the rules list to open the Scripts Manager.


c. In the Scripts Manager, select the COSO Data Lake Event Forwarding Script script and delete it.

8. Remove AEC configuration items

a. Go to Administration > Operations Console > Tools.


b. Select ConfigurationItem on the left.
c. Select Launch AEC Explained and delete it.
d. Select Show Correlation Group Details (AEC Explained) and delete it.
e. Select Show Occurrence Details (AEC Explained) and delete it.

9. Remove the integration user and integration user role

a. Go to Administration > Users > Users, Groups and Roles.


b. Click users.
c. In the user list, select the integration user, whose name was defined at configuration time, and press Delete.
d. Click user roles.
e. In the roles list, select COSO Data Lake Event Integration User Role and press Delete.

10. In the Infrastructure Settings, revert the settings for COSO endpoints

a. Go to Administration > Setup and Maintenance > Infrastructure Settings .


b. Search for OPTIC DL.
c. For each of the nine values for Administration Endpoint, Data Access Endpoint, Data Receiver
Endpoint, Integration Password, Integration User, Proxy Password, Proxy URL, Proxy User Name, and
Tenant ID, press Revert.

11. Remove the suite certificates


Remove the suite certificates from the OBM trust store.

a. Get the list of installed certificates.


On Linux:

/opt/OV/bin/ovcert -list

On Windows:

"%OvInstallDir%\bin\win64\ovcert" -list

b. In the list of installed certificates, find the certificates that begin with MF CDF or MF RE or MF RIC or MF RID .....
Remove the certificates from both resource groups.
For example, on Linux:

/opt/OV/bin/ovcert -remove "MF RE CA on Vault 0e34f12b"


/opt/OV/bin/ovcert -remove "MF RE CA on Vault 0e34f12b" -ovrg server

For example, on Windows:

"%OvInstallDir%\bin\win64\ovcert" -remove "MF RE CA on Vault 0e34f12b"


"%OvInstallDir%\bin\win64\ovcert" -remove "MF RE CA on Vault 0e34f12b" -ovrg server

Repeat the earlier commands for all suite certificates.

Remove suite configuration

Remove the OBM certificates


Get a list of existing certificates from the suite.

./idl_config.sh -namespace <SUITE_NAMESPACE> -list


Delete desired certificates based on the Common Name.

./idl_config.sh -namespace <SUITE_NAMESPACE> -chart <SUITE_HELM_CHART> -delete <DESIRED_COMMON_NAMES>

For example:

1. Get the common name from the certificate list.

./idl_config.sh -namespace opsbridge-helm -list


2021-04-22 05:43:49 INFO idl_config:main: Logging to /tmp/idl_config_log/idl_config.log
--------------------------------------------------------------------------------------
certificateKey=client-6f3ac156-d3f2-4408-8e53-a58aee15a6a7.crt
subject=
commonName = CA_2450e432-77e4-75b2-0def-e50b69eb3874_2048
localityName = Multi2-OBM.swinfra.net
organizationName = Hewlett-Packard
organizationalUnitName = OpenView

2. Remove the certificate.

./idl_config.sh -namespace opsbridge-helm -chart /root/opsbridge-helm \


-delete "CA_2450e432-77e4-75b2-0def-e50b69eb3874_2048"

Remove the OBM instance from the DataSource registry


Use the call-analytics-datasources.sh script from the OBM integration tools to remove OBM instances from the DataSource
Registry.

1. Get the list of registered instances.

cd integration-tools
./call-analytics-datasources.sh -aec-namespace <AEC_NAMESPACE> list both

Check the list for the endpoint ID of the OBM instance to remove.

2. Delete the OBM receiver.

./call-analytics-datasources.sh -aec-namespace <AEC_NAMESPACE> remove receiver -i <endpoint-id>

3. Delete the OBM source.

./call-analytics-datasources.sh -aec-namespace <AEC_NAMESPACE> remove source -i <endpoint-id>


1.2.29. CAS content zip optic content schema installation gets stuck
The CAS content zip optic content schema installation gets stuck in the installing state and doesn't complete.

The output of the ./ops-content-ctl list content command shows the content stuck in the installing state.

Also, the CAS deployment log shows unsuccessful deployment. You can verify the logs by running the command:

kubectl logs <CAS_JOB_POD_NAME> -n <Opsbridge_Namespace>

Cause
This issue may occur for several reasons. One such reason is that the itom-opsb-content-manager pod restarts during installation, causing the schema installation to become unresponsive.
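To check whether the itom-opsb-content-manager pod restarted during installation, look at its RESTARTS count (a quick check):

kubectl get pods -n <Opsbridge_Namespace> | grep itom-opsb-content-manager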

Solution
You can resolve the issue in either of the following ways:

Solution 1
When multiple contents have issues, restart the CAS job pod. To restart, run the command:

kubectl delete pod <CAS_JOB_POD_NAME> -n <Opsbridge_Namespace>

Solution 2
When only one or two contents have the issue, you can resolve it by force-starting the content installation. To force start the installation, run the command:

ops-content-ctl install content -n <name of the content> -v <version for the content> -f


1.2.30. "Too many pods" error when installing an


application
When you try to deploy an application, you get the following errors:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 26m default-scheduler 0/1 nodes are available: 1 Too many pods.
Warning FailedScheduling 10m (x17 over 25m) default-scheduler 0/1 nodes are available: 1 Too many pods.

Cause
This issue occurs because by default Kubernetes only allows 110 pods for each node.
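To see how close a node is to the limit, you can count the pods currently scheduled on it. This is a quick check; replace <node_name> with the node named in the scheduling error:

kubectl get pods -A --field-selector spec.nodeName=<node_name> --no-headers | wc -l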

Solution
To work around this issue, you need to modify the maximum number of pods running on each node. Perform the following
steps:

1. Run the following command to get the maximum pods value for the node:

kubectl get nodes <node_name> -o json | jq -r '.status.capacity.pods'

2. Run the following command to update maxPods field in the kubelet-config file. The <value> must be a non-negative
integer:

Note

Ensure that the current node has enough resources to run these pods.

yq -i e '.maxPods=<value>' $CDF_HOME/cfg/kubelet-config

For example:

yq -i e '.maxPods=110' $CDF_HOME/cfg/kubelet-config

3. Run the following command to restart kubelet service:

systemctl restart kubelet

4. Run the following command to check the maxPods value for the node:

kubectl get nodes <node_name> -o json | jq -r '.status.capacity.pods'

Repeat the steps if you want to modify the maxPods value for other nodes. You should ssh to these nodes before performing
the steps.


1.2.31. Log forwarding for the new capability is not configured

Issue
When helm upgrade is run on an application to enable a new capability, the log forwarding for the new capability
is not configured. The log entries for the new capability won't appear in Elasticsearch.

Cause
The log forwarding configuration associated with the new capability is not loaded.

Solution
The workaround is to reload the configuration by restarting the fluent-bit pod. You must scale down the replicas of the itom-fluentbit pod to 0 and scale it back to 1.
This forces Fluent Bit to read all the ConfigMaps, including that of the new capability.

1. Get the name of the fluent bit deployment:

kubectl get deploy -n <application namespace> | grep itom-fluentbit

2. Redeploy the fluent bit pod:

kubectl scale deploy/<fluent bit deployment name> -n <application namespace> --replicas=0


kubectl scale deploy/<fluent bit deployment name> -n <application namespace> --replicas=1


1.3. Troubleshoot upgrade


This section provides the following troubleshooting topics:

Upgrade 2021.11 to 2022.05 or Redeploying 2021.11 via AppHub UI is failing


monitoring-admin pod doesn't come up while upgrading
ops-monitoring-ctl tool fails with invalid username or password
UCMDB pod is stuck after suite upgrade
gen_secret.sh failing due to cert issue during rerun on the same environment
Interrupt the upgrade
Suite container fails when new capability is added
Pre-upgrade hooks failed
No configuration found for content
Error while upgrading OMT


1.3.1. Helm upgrade fails with timeout error


During the helm upgrade, the itomdipulsar-broker-pre-upgrade-backlog-setting job goes into a race condition. Upgrade fails with
the following errors in the pre-upgrade-backlog-setting pod:

[main] WARN com.microfocus.pulsar.config.job.OpticBacklogQuota - Attempt 1 retrying to get topic backlog quota for the persistent://public/default/valid_data_topic due to {}"

org.apache.pulsar.shade.javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error"

ERROR com.microfocus.pulsar.config.job.OpticBacklogQuota - Attempt 5 retrying to get topic backlog quota for the persistent://public/default/valid_data_topic due to {}"

Additionally, run the helm list -A command for the application deployment. An error similar to the following appears:

Upgrade "aws" failed: pre-upgrade hooks failed: timed out waiting for the condition

Cause
The helm upgrade fails when itomdipulsar-broker-pre-upgrade-backlog-setting takes longer than the configured helm timeout interval. This can happen when the pre-upgrade backlog job can't complete because the topic-level policy isn't working due to a corrupted policy.

Solution
Follow these steps to delete the corrupted topic and resolve this issue:

1. Ensure the itomdipulsar-broker-pre-upgrade-backlog-setting pod isn't in Running state. Run the following command to check
the state:

kubectl get pods -n <application namespace> | grep -i itomdipulsar-broker-pre-upgrade-backlog-setting

If the pod isn't running, skip step 2 and perform from step 3.
2. If the pod is running, run the following commands:

kubectl delete job <pre-upgrade-broker-apply-settings> -n <application namespace>

kubectl get deployment itomdipulsar-broker -n <application namespace>

Note down the number of replicas and then run the following commands:

kubectl scale deployment itomdipulsar-broker --replicas=0 -n <application namespace>

kubectl scale deployment itomdipulsar-broker --replicas=<number of replicas> -n <application namespace>

Wait until the broker is completely up and then perform the next steps.
3. Run the following commands:

kubectl exec -it itomdipulsar-bastion-0 -n <application namespace> -c pulsar -- /bin/bash

cd bin

4. Run the following command to delete the corrupted topic:

./pulsar-admin topics delete -f persistent://public/default/__change_events

5. Run the following command to list the topics and ensure that you see the __change_events topic in the output:


./pulsar-admin topics list public/default | grep -i __change_events

6. Start the upgrade for the application. For more information, see the Upgrade section.


1.3.2. The itom-monitoring-admin pod is in CrashLoopBackOff state
When upgrading the application with external Oracle database, the itom-monitoring-admin pod goes in CrashLoopBackOff state
and the itom-monitoring-admin logs include a message similar to the following:

PLS-00201: identifier 'DBMS_LOCK' must be declared


ORA-06550: line 1, column 7:
PL/SQL: Statement ignored

Solution
To resolve this issue, follow these steps:

1. Log in to the Oracle database as SYSDBA user.


2. Run the following command:

GRANT EXECUTE on SYS.DBMS_LOCK to monitoringadminuser;
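To verify that the grant is in place, you can query the Oracle data dictionary. This is a minimal check; adjust the grantee if your monitoring admin user has a different name:

SELECT grantee, privilege FROM dba_tab_privs WHERE table_name = 'DBMS_LOCK' AND grantee = 'MONITORINGADMINUSER';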


1.3.3. AWS content upgrade fails


When upgrading from 2022.05 to a newer version, AWS content upgrade fails.

Solution
To resolve this issue, uninstall AWS content and install it again. Uninstalling content will lead to data loss. So make sure that
you back up the data before uninstalling content.


1.3.4. Automatic upgrade of OBM Content Pack and UCMDB view fails while upgrading AI Operations Management
During AI Operations Management upgrade, some UCMDB packages and OBM Content Packs fail to upgrade.

Cause
The issue occurs due to the versioning scheme changes between prior releases and the current one.

Solution
You can resolve the issue by deploying the UCMDB views and then uploading and deploying the content packs manually.

Deploy the view


Follow these steps to deploy the UCMDB views package to RTSM:

Launch the RTSM UI of a target OBM server as a desktop application from the OBM UI.
Deploy the UCMDB views package to RTSM from your local directory.

Launch RTSM UI as a desktop application from the Local Client


Follow these steps:

1. Go to Administration > RTSM Administration and click Local Client to download the Local Client tool.
2. Launch the Local Client tool.
a. Extract the UCMDB_Local_Client.zip package to a location of your choice, for example, the desktop.
b. Double-click UCMDB Local Client.cmd (Windows) or UCMDB Local Client.sh (Mac). The UCMDB Local Client window opens.
3. Add or edit the login configuration for the target OBM server that you want to access.
a. Click the add or edit icon. The Add/Edit Configuration dialog opens.
b. Enter the following details:
Host/IP: Specify the value provided in the values.yaml for <externalAccessHost>.
Protocol: Select HTTPS as the protocol from the drop-down list.
Port: Specify the value provided in the values.yaml for <externalAccessPort>.
Target Env: Select UD/UCMDB as the target environment from the drop-down list.
c. Click OK.
4. Launch RTSM UI from the UCMDB Local Client window.
a. In the UCMDB Local Client window, click the Label value for the OBM server that you want to access. The Log In dialog opens.
b. In the Log In dialog, enter your login parameters.
c. Click Login. The RTSM UI opens in a new window.

Deploy <service_name> UCMDB views package to RTSM


Follow these steps:

1. Download the <service_name> UCMDB views from the following location:

On Linux:
wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<service_name>_UCMDB_Views.zip
On Windows:
https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<service_name>_UCMDB_Views.zip


where, <service_name> is any Hyperscale Observability service such as AWS, Azure, GCP, Kubernetes, and VMware.

2. In the RTSM UI, go to Managers > Administration > Package Manager.


3. Click the button to open the Deploy Packages to Server dialog box.
4. Click the button to open the Deploy Packages to Server (from local disk) dialog box.
5. Select the <service_name> UCMDB views package zip file and click Open. The package appears in the upper pane of
the dialog box and its resources appear in the lower pane.
6. Select the resources from the package that you want to deploy. All the resources are selected by default.
7. Click Deploy.
8. A status report appears indicating whether the deployment was successful for each resource selected.

Import the Event Mapper content pack into OBM


Follow these steps:

1. Download the Event Mapper content pack from the following location:

On Linux:

wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_Event_Mapper_<version>.zip

On Windows:

https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_Event_Mapper_<version>.zip

2. On the OBM user interface, go to Administration > SETUP AND MAINTENANCE > Content Packs.
3. Click Import. The Import Content Pack window appears.
4. Browse to the location where you have saved the Event Mapper content pack and then click Import. The Event Mapper
content pack gets imported. Click Close.

Import the Hyperscale Observability <service_name> Content Pack


Follow these steps:

1. Download the <service_name> content pack from the following location:

On Linux:

wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<service_name>_Content_Pack_<version>.zip

On Windows:

https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<service_name>_Content_Pack_<version>.zip

2. On OBM user interface, go to Administration > SETUP AND MAINTENANCE > Content Packs.
3. Click Import. The Import Content Pack window appears.
4. Browse to the location where you have saved the <service_name> content pack and then click Import. The
<service_name> content pack gets imported. Click Close.
where <service_name> is any Hyperscale Observability service such as AWS, Azure, GCP, Kubernetes, and VMware.


1.3.5. itom-monitoring-admin pod doesn't come up while upgrading due to Liquibase lock
When upgrading from an older version of the application, the itom-monitoring-admin pod doesn't come up. You will see the
following statements in the container log file:

liquibase: Waiting for changelog lock....


liquibase: Waiting for changelog lock....
liquibase: Waiting for changelog lock....
liquibase: Waiting for changelog lock....
liquibase: Waiting for changelog lock....
liquibase: Waiting for changelog lock....
liquibase: Waiting for changelog lock....

Cause
This issue may occur if the embedded PostgreSQL database is interrupted during schema creation and fails to release the Liquibase lock.

Solution
To resolve this issue, follow these steps:

1. Applies to PostgreSQL only. Get the password of the embedded PostgreSQL pod:

# kubectl exec -ti <pod_name> -n opsb-helm -c itom-postgresql -- get_secret IDM_DB_USER_PASSWORD_KEY


<Password>

2. Applies to PostgreSQL only. Log in to the embedded PostgreSQL pod and then log in to psql.
Example:

psql -d monitoringadmindb -U monitoringadminuser -p 5432 -h itom-postgresql

3. Query the lock table and find the id:

select * from monitoringadminschema.databasechangeloglock;

4. Update the table entry to unlock:

UPDATE monitoringadminschema.databasechangeloglock SET LOCKED=0 WHERE ID=<id>;

5. Restart itom-monitoring-admin pod.
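For step 5, a minimal sketch of one way to restart the pod: delete it and let its Deployment recreate it (the namespace placeholder and the grep pattern are assumptions; adjust them to your environment):

kubectl delete pod -n <application namespace> $(kubectl get pods -n <application namespace> | awk '/itom-monitoring-admin/ {print $1}')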


1.3.6. All the pods not running after upgrade


After the application upgrade, the bvd-www-deployment and uif-upload-job pods aren't running.

Solution
To resolve this issue, you must delete the pods that aren't running as follows:

kubectl delete pod <pod name> -n <application namespace>
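For example, a minimal sketch that first lists the pods that aren't in the Running or Completed state and then deletes them (the pod name suffixes are placeholders; use the names printed by the first command):

kubectl get pods -n <application namespace> | awk 'NR>1 && $3 != "Running" && $3 != "Completed" {print $1}'
# delete the pods reported above, for example:
kubectl delete pod bvd-www-deployment-<suffix> uif-upload-job-<suffix> -n <application namespace>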


1.3.7. ops-monitoring-ctl tool fails with invalid username or password
The ops-monitoring-ctl tool isn't enabling the AMC collection after upgrading to a newer version.

Run the ops-monitoring-ctl get collector -V2 command to check the status of the collection configurations. You may get the
following error:

"statusCode": 403,
"reasonPhrase": "Forbidden"
Error: Invalid username or password, you must be logged in to the server

This error message appears when the username and password are invalid or the user is unauthorized.

You can also check the application.log file to find out the issue. The log file is available in this location: <log-vol>/cloud-monitoring/monitoring-admin/<pod_name>. See the Map to NFS page for more information.

Cause
The user isn't assigned to the monitoringServiceAdminRole role or to the Administrators group.

Solution
1. Create the role monitoringServiceAdminRole if it doesn't exist. To do that:
a. Enter the URL https://<external_hostname>:<port>/idm-admin in the browser to access IDM. Enter your credentials,
log in, and then go to Organization > Roles.
b. To check whether the role exists, click the search icon on the upper-right of the screen and enter the role name.

c. Create the role if it doesn't exist. To do that:

d. Click to add the role. Enter the related values for Name, Display name, Description, Application, and Associate
permission, and then click SAVE.

2. Assign the role to the Administrators group if the role isn't assigned to the group. To do that:
a. Go to Organization > Group.
b. To check whether the group exists, click the search icon on the upper-right of the screen and enter the group
name. The page displays the group name and related roles.
c. Click the Administrators link to see the Associated roles in the Group Settings.


d. Search for the role. If it doesn't exist in the list, add the role.
3. Rerun autoconfiguration job.

a. kubectl get jobs -n $(kubectl get jobs -A | awk '/autoconfigure/ {print $1, $2}') -o yaml > file_itom-monitoring-collection-autoconfigure-job.yaml

b. yq eval -i 'del(.spec.template.metadata.labels)' file_itom-monitoring-collection-autoconfigure-job.yaml

c. yq eval -i 'del(.spec.selector)' file_itom-monitoring-collection-autoconfigure-job.yaml

d. kubectl delete job -n $(kubectl get jobs -A | awk '/autoc/ {print $1, $2}')

e. kubectl apply -f file_itom-monitoring-collection-autoconfigure-job.yaml


1.3.8. gen_secret.sh failing due to cert issue during rerun on the same environment
Error:

curl: (60) Peer's certificate issuer has been marked as not trusted by the user. More details here:
http://curl.haxx.se/docs/sslcerts.html. curl performs SSL certificate verification by default, using a "bundle" of
Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an
alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a problem with the certificate (it might be
expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification
of the certificate, use the -k (or --insecure) option. 2021-06-03_00:27:20.158 [ ERR] Failed to execute curl when
posting secrets yaml -- curl error

Solution
On the master node, run ls -a under the /root directory to find the file .gs.<hostname>.curl-ca-bundle.crt and delete it.

cd /root
ls -a
rm -rf .gs.<hostname>.curl-ca-bundle.crt

Rerun the gen_secrets.sh script.


1.3.9. Pre-upgrade hooks failed

Scenario 1
Error: UPGRADE FAILED: pre-upgrade hooks failed: job failed: BackoffLimitExceeded.

Solution 1

CLI
In a shared OPTIC Data Lake setup, when you upgrade the consumer before upgrading the provider using the CLI, the
upgrade will fail with "Error: UPGRADE FAILED: pre-upgrade hooks failed: job failed: BackoffLimitExceeded"

1. List all the pods and look for the restrict-consumer-upgrade pod:

kubectl get pods -n <suite namespace> | grep restrict-consumer-upgrade

Example output:

itom-restrict-consumer-upgrade-8dzm7 0/1 Error 0 28s

You can describe the pod and see details:

kubectl describe pods/<podname> -n <application namespace>

2. Get the logs for the itom-restrict-consumer-upgrade pod for the detailed cause of the failure:
kubectl logs <pod-name> -n <application namespace> -c <container name>
Example output:

kubectl logs itom-restrict-consumer-upgrade-mh2pd -n opsb-helm -c restrict-consumer-upgrade

"Consumer is not eligible for upgrade as Optic DL Provider is not Upgraded to 2023.05".

3. You can check the Jobs:

kubectl get jobs -n <application namespace>

If the consumer upgrade fails, the restrict-upgrade pre-upgrade job will be in the 0/1 state.
4. If any job other than the restrict-upgrade job has failed, try re-running the upgrade. If the upgrade fails the second
time, you can contact Support and services.

AppHub
In a shared OPTIC Data Lake setup, when you upgrade the consumer before upgrading the provider using AppHub,
the upgrade will fail with "Error: UPGRADE FAILED: pre-upgrade hooks failed: timed out waiting for the condition"

1. In the AppHub UI, go to DEPLOYMENTS > View Health and select the (Pre-hook) itom-restrict-consumer-upgrade container,
which will be in an error state.
2. Click the Logs tab and then click restrict-consumer-upgrade (Error).
You will see the message Consumer is not eligible for upgrade as Optic DL Provider is not Upgraded to
2023.05.

In a non-shared OPTIC Data Lake setup, follow the steps mentioned in Solution 2.

Solution 2

1. List all the pods and see which pod is in an error state:
kubectl get pods -n <suite namespace>
2. Find the pod which is in an error state.
3. Get the logs of the pod for the detailed cause of the failure:
kubectl logs <pod-name> -n <suite namespace>
4. If there is an issue with the itom-opsb-db-connection-validator-job , there might be some misconfiguration with the external
database used. The cause of the failure can be found in the logs. Verify and fix the parameters passed, and then re-run the
upgrade.
5. If any job other than the itom-opsb-db-connection-validator-job has failed, try re-running the upgrade. If the upgrade fails
the second time, you can contact Support and services.

Scenario 2
Error: UPGRADE FAILED: cannot patch "itomdipulsar-bookkeeper" with kind StatefulSet: StatefulSet.apps "itomdipulsar-bookkeeper" is invalid

Solution
1. Verify the parameters itomdipulsar.bookkeeper.volumes.ledgers.size, itomdipulsar.bookkeeper.volumes.journal.size, and
itomdipulsar.zookeeper.volumes.data.size in the current values file against the values file passed during installation.
To get the values passed for installation, use the command:
helm get values <release-name> -n <namespace>
2. Make the values of the parameters itomdipulsar.bookkeeper.volumes.ledgers.size, itomdipulsar.bookkeeper.volumes.journal.size,
and itomdipulsar.zookeeper.volumes.data.size the same as the values passed during installation and re-run the upgrade.


1.3.10. Error while upgrading OMT


You receive error messages when you try to upgrade OMT on a node.

Solution
Follow these steps to troubleshoot the upgrade failure:

1. Run the following command:


source /etc/profile
2. Optional. If the upgrade failed on a master (control plane) node that was not the first master (control plane) node
stopped, run the following command to remove the etcd member from the etcd cluster:
upgrade.sh -d /<Parameter file path>/CDF_upgrade_parameters.txt

You will receive a message that resembles one of the following:


Remove member successfully
Not found this node in etcd cluster
3. Check whether the backup-complete file was created in the /<backup directory>/CDF_201703_backup directory.
If the backup-complete file does not exist, the OMT backup process failed. Follow these steps to back up the OMT,
and then follow the remaining steps to troubleshoot the upgrade:
1. Delete the /<backup directory>/CDF_201703_backup folder.
2. Run the following command:

upgrade.sh -u /<Parameter file path>/CDF_upgrade_parameters.txt

If the backup-complete file exists, the OMT backup completed successfully. Follow the remaining steps to
troubleshoot the upgrade.
4. Run the following command to check the status of the kubelet service:
systemctl status kubelet
If the kubelet service is not active, delete the kubelet.service file in the /usr/lib/systemd/system directory.
If the kubelet service is active, run the following command to stop the kubelet service. Then, delete the
kubelet.service file in the /usr/lib/systemd/system directory.

systemctl stop kubelet

5. Run the following command to check the docker service status: systemctl status docker
If the docker service is not active, delete the docker.service file in the /usr/lib/systemd/system directory.
If the docker service is active, run the following command to stop the docker service. Then, delete the
docker.service file in the /usr/lib/systemd/system directory.

systemctl stop docker

6. Check the docker-bootstrap service status with the command: systemctl status docker-bootstrap.
If the docker-bootstrap is not active, delete the docker-bootstrap.service file in the /usr/lib/systemd/system
directory.
If the docker-bootstrap is active, run the following command, and then delete the docker-bootstrap.service file in
the /usr/lib/systemd/system directory.

systemctl stop docker-bootstrap

7. Run the following commands to unmount the mounted data:

for data in $(mount | grep "${K8S_HOME}/data/" | cut -d" " -f3 | sort -r);do umount -f -l $data; done
for data in $(mount | grep "/usr/lib/kubelet" | cut -d" " -f3 | sort -r);do umount -f -l $data; done

8. Reboot the machine on which you are retrying the upgrade.


9. Run the following command to delete the $<K8S_HOME> directory:

rm -rf $<K8S_HOME>

10. Run the following command to roll back the $<K8S_HOME> directory:

mv /<backup directory>/CDF_201703_backup $<K8S_HOME>


11. Delete the backup-complete file in the $<K8S_HOME> directory.


12. Run the following commands to recover the docker.service and docker-bootstrap.service files:

mv ${K8S_HOME}/docker.service /usr/lib/systemd/system/
mv ${K8S_HOME}/docker-bootstrap.service /usr/lib/systemd/system/

13. (Optional) If the upgrade failed on the first master (control plane) node that was stopped, manually restore the data on
the NFS server.
14. Run the following command to retry the upgrade:

upgrade.sh -u /<Parameter file path>/CDF_upgrade_parameters.txt


1.3.11. OPTIC DL Vertica Plugin fails with an error during the upgrade
While upgrading the OPTIC DL Vertica Plugin, the following error appears:

fatal dbinit: ROLLBACK 5365: [42704] User available location ["/home/dbadmin/.itomdipulsarudx"] does not exist on node ["v_itomdb_node0001"]

where, /home/dbadmin/ is the <path to Vertica admin home>.

Cause
This issue is because you may have cleaned up the Vertica server before the upgrade. However, some residual files may
have remained.

Solution
Perform these steps to resolve this issue:

1. Log on to the same Vertica node where you have already installed the OPTIC DL Vertica Plugin. This means the node
that has the /usr/local/itom-di-pulsarudx folder.
2. Delete the directory <path to Vertica admin home>/.itomdipulsarudx
3. Upgrade the OPTIC DL Vertica Plugin.


1.3.12. "UPGRADE FAILED" error occurs after


updating certificates from OMT Management
Portal
While performing helm upgrade, you might get the following message:

Error: UPGRADE FAILED: template: opsbridge-suite/charts/itom-ingress-controller/templates/secret.yaml:5:24: executing "opsbridge-suite/charts/itom-ingress-controller/templates/secret.yaml" at <index $secret.metadata.annotations "deployments.microfocus.com/ingore-tls-cert">: error calling index: index of untyped nil

Cause
Some values in the nginx-default-secret secret are missing.

Solution
Follow these steps to add the missing values:

1. Run the following command to get the <release-name> :

helm list -n <opsb namespace> -a 2>/dev/null | grep -E 'opsbridge-suite-[0-9]+\.[0-9]+\.[0-9]\+[0-9]+' | awk '{print $1}' | xargs

2. Run the following command to add annotations to the nginx-default-secret secret and put it under helm management:

kubectl patch secret nginx-default-secret -n <suite-ns> -p "{ \"metadata\": { \"annotations\": { \"meta.helm.sh/release-name\": \"<deployment-name>\", \"meta.helm.sh/release-namespace\": \"<suite-ns>\" }, \"labels\":{ \"app.kubernetes.io/managed-by\": \"Helm\" } } }"

where,

suite-ns is the namespace where you have installed the AI Operations Management.

deployment-name is the helm deployment name.

3. Run the helm upgrade again.


1.3.13. Events sent from OBM are not stored in the opr_event Vertica table

Problem
After the n-2 upgrade of the application, events sent from OBM aren't stored in the opr_event Vertica table as the opr_event
target and source scheduler streams are missing.

Solution
Restart the suite ( cdfctl runlevel set -l DOWN/UP -n <suite_namespace> ) to resolve this issue.
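For example, the restart runs the runlevel command twice, first DOWN and then UP (sketch based on the command above; replace <suite_namespace> with your application namespace):

cdfctl runlevel set -l DOWN -n <suite_namespace>
cdfctl runlevel set -l UP -n <suite_namespace>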


1.3.14. Upgrading Hyperscale Observability results in the AWS itom-snf-monitoring pod not restarting
When you upgrade Hyperscale Observability for AWS from 2022.05 to 2022.11, you find that the AWS itom-snf-monitoring pod doesn't
restart until you disable TLS on RDS.

Cause
This issue happens because the container software insists that the RDS certificate needs a hostname in its SAN (Subject
Alternative Name) field.

Solution
To resolve the issue:

1. Include the Subject Alternative Name (SAN) extension in the TLS certificate for PostgreSQL by following the steps
outlined in the Enable TLS in PostgreSQL.
2. Run the below Helm upgrade command:

helm upgrade <helm deployment name> -n <suite namespace> -f <values.yaml> <chart> [--set-file "caCertificates.vertica-ca\.crt"=<vertica certificate file>] [--set-file "caCertificates.postgres\.crt"=<relational database certificate file> [--set-file oracleWallet=<base64 encoded wallet text file>]] [-f <deployment.yaml>] [-f <secrets.yaml>] --timeout 15m


1.3.15. Pulsar Push adapter doesn't work after upgrade
After you upgrade Content Pack from version 2020.08 or earlier to version 2020.11 or later, the Pulsar Push adapter no
longer works. You can't create an integration point for pushing CIs by using the Pulsar Push adapter.

You can see error messages similar to the following:

jvm 1 | <2022-03-30 17:43:05,415> 239409 [ERROR] [AdHoc:AD_HOC_TASK_PATTERN_ID-5-1648654952471] (DataAdapterLoggerIm


pl.java:83) - >> java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.Compl
etionException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.pulsar.common.util.SecurityUtilityorg.apache.puls
ar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: java.util.
concurrent.CompletionException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.pulsar.common.util.SecurityUtili
ty
jvm 1 | <2022-03-30 17:43:05,416> 239410 [INFO ] [AdHoc:AD_HOC_TASK_PATTERN_ID-5-1648654952471] (PulsarClientImpl.java:66
9) - Client closing.
jvm 1 | <2022-03-30 17:43:07,444> 241438 [ERROR] [AdHoc:AD_HOC_TASK_PATTERN_ID-5-1648654952471] (FederationNoneLifeCyc
leProbeProcessor.java:72) - Failed executing none life cycle federation request [com.hp.ucmdb.discovery.probe.request.TestConnection
ProbeRequest] for integration [RTSM_COSO_Topology_Streaming]
jvm 1 |
jvm 1 | com.hp.ucmdb.federationspi.exception.DataAccessCommunicationException: java.util.concurrent.ExecutionException: org.apa
che.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: java.lang.NoClassDefFoundError: Could not initiali
ze class org.apache.pulsar.common.util.SecurityUtility
jvm 1 | at com.microfocus.ucmdb.adapters.pulsar.PulsarPushAdapter.testConnection(PulsarPushAdapter.java:113)
jvm 1 | at com.hp.ucmdb.adapters.GenericAdapter.callPushConnectorToTestConnection(GenericAdapter.java:166)
jvm 1 | at com.hp.ucmdb.adapters.GenericAdapter.testConnection(GenericAdapter.java:150)
jvm 1 | at com.hp.ucmdb.discovery.probe.processor.TestConnectionProbeRequestProcessor.processFederationNoneLifeCycle(TestC
onnectionProbeRequestProcessor.java:18)
jvm 1 | at com.hp.ucmdb.discovery.probe.processor.TestConnectionProbeRequestProcessor.processFederationNoneLifeCycle(TestC
onnectionProbeRequestProcessor.java:12)
jvm 1 | at com.hp.ucmdb.discovery.probe.processor.FederationNoneLifeCycleProbeProcessor.process(FederationNoneLifeCycleProb
eProcessor.java:63)

Cause
This issue is caused by Java conflicts in the adapter resource package. You need to manually remove the unnecessary .jar
files.

The following nine .jar resources are the ones that are required and should be kept. You can remove other .jar files from the
Pulsar Push adapter package.

adapterCode/PulsarPushAdapter/dependencies/javax.activation.jar
adapterCode/PulsarPushAdapter/dependencies/javax.ws.rs-api.jar
adapterCode/PulsarPushAdapter/dependencies/jcip-annotations.jar
adapterCode/PulsarPushAdapter/dependencies/kafka-push-adapter.jar
adapterCode/PulsarPushAdapter/dependencies/pulsar-client-admin-api.jar
adapterCode/PulsarPushAdapter/dependencies/pulsar-client-api.jar
adapterCode/PulsarPushAdapter/dependencies/pulsar-client.jar
adapterCode/PulsarPushAdapter/dependencies/validation-api.jar
adapterCode/PulsarPushAdapter/PulsarPushAdapter.jar

Solution
To remove the unnecessary resources that are causing Java conflicts, follow these steps:

1. From UCMDB UI, go to Administration > Package Manager, and then select PulsarPushAdapter.
2. Click the Undeploy resources button from the toolbar.
3. Select the unnecessary jars causing conflicts for removal. That is, select the .jar resources (starting with "adapterCode
- PulsarPushAdapter") that are not in the required .jar list. Don't select the nine resources listed in the Cause section as


they're required by the adapter.


4. Click Next and confirm the list.
The change will take effect in a few minutes without the need of cleaning up or restarting Data Flow Probe.


1.4. Troubleshoot administration


This section provides the following troubleshoot administration topics:

Cannot delete a database or user


UCMDB probe pod stuck
Vault pod is stuck
Cannot login to OMT swagger
Vertica fails with pool size error


1.4.1. NFS storage is running out of space


Network File System (NFS) has less or no usable storage space.

Cause
This issue occurs if logs consume a lot of space in NFS storage.

Solution
Follow these steps to fix this issue:

1. Run the following command to identify the logging persistent volume name:

kubectl get pvc -n $CDF_NAMESPACE itom-logging-vol -o json|jq -r .spec.volumeName

2. Run the following commands to identify the NFS server and NFS path that the logging persistent volume is mounted to.
Replace <logging PV name> with the persistent volume name that you identified in the previous step.

kubectl get pv <logging PV name> -o json|$CDF_HOME/bin/jq -r '.spec.nfs.server'


kubectl get pv <logging PV name> -o json|$CDF_HOME/bin/jq -r '.spec.nfs.path'

3. Log in to the NFS server that you identified in the previous step and go to /var/vols/itom/<log file directory path> . The <log
file directory path> depends on whether logging volumes are created manually or by using the storage provisioner.

4. Identify and delete the old log files (generated by OMT and AI Operations Management capabilities) that are no longer
required (see the sketch after these steps for one way to list candidates).

5. Configure log rotation or deletion to avoid encountering this issue again. For detailed steps, see Change the log rotation
or delete configuration.
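For step 4, a minimal sketch that lists log files older than 30 days under the logging path (the path placeholder and the 30-day threshold are assumptions; review the output before deleting anything):

find /var/vols/itom/<log file directory path> -type f -name "*.log*" -mtime +30 -print
# after reviewing the list, delete with:
# find /var/vols/itom/<log file directory path> -type f -name "*.log*" -mtime +30 -delete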


1.4.2. Cannot delete a database or user


When you uninstall the suite or remove the databases, you run the commands below.

PostgreSQL
psql -f RemoveSQL.sql
Oracle
echo exit | sqlplus sys/mysyspassword as SYSDBA @RemoveSQL.sql

You may not be able to remove an Oracle or a PostgreSQL database and may get the errors below.
Oracle:

DROP USER cdfidmdb CASCADE

ERROR at line 1:

ORA-01940: cannot drop a user that is currently connected

This is the error that occurs during RemoveSQL.sql execution for Oracle.

PostgreSQL:

psql:RemoveSQL.sql:1: ERROR: database "bvd" is being accessed by other users


DETAIL: There are 5 other sessions using the database.
psql:RemoveSQL.sql:2: ERROR: database "cdfidmdb" is being accessed by other users
DETAIL: There are 9 other sessions using the database.
psql:RemoveSQL.sql:3: ERROR: database "autopassdb" is being accessed by other users
DETAIL: There are 10 other sessions using the database.
psql:RemoveSQL.sql:4: ERROR: role "cdfidmuser" cannot be dropped because some objects depend on it

Cause
This error occurs due to the following reasons:

There are multiple active connections or sessions that are accessing the database.
The suite is not uninstalled before deleting the relational databases.

Solution
You need to disconnect from the database or close the active sessions in order to remove the users or databases.
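For PostgreSQL, a minimal sketch that terminates the remaining sessions before rerunning RemoveSQL.sql (run it as a superuser; the database names come from the errors above, so adjust them to your setup):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname IN ('bvd', 'cdfidmdb', 'autopassdb')
  AND pid <> pg_backend_pid();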


1.4.3. UCMDB probe pod stuck

Cause
The itom-ucmdb-probe pod gets stuck in the 1/2 state.

Solution
1. Log on to any one of the master (control plane) nodes.
2. Run the following command:

kubectl scale sts itom-ucmdb -n opsb --replicas=0

3. Log on to the PostgreSQL RTSM database and execute the following query:

select * from jgroupsping;

4. If data is present even after the deployment is down then truncate the table:

TRUNCATE jgroupsping;

5. Run the following commands to start the deployment:

kubectl scale sts itom-ucmdb -n opsb --replicas=1


kubectl scale sts itom-ucmdb -n opsb --replicas=2


1.4.4. Vault pod is stuck

Symptom
Vault pod isn't up after restoring the backed up data.

Solution
1. Scale down vault deployment. Run the following command:

kubectl scale deployment itom-vault -n <suite-namespace> --replicas=0

2. Copy/restore vault data again from backed up NFS volumes to /mnt/efs/var/vols/itom/opsbvol*/vault


3. Give execute permissions to the vault directory. Run the following command:
chmod -R 755 /mnt/efs/var/vols/itom/opsbvol*/vault
4. Change ownership of the vault directory. Run the following command:
chown -R <SYSTEM_USER_ID>:<SYSTEM_GROUP_ID> /mnt/efs/var/vols/itom/opsbvol*/vault
5. Delete the secrets:

kubectl delete secret vault-credential


kubectl delete secret vault-root-cert
kubectl delete secret vault-passphrase
kubectl delete secret vault-approle-xxxxxx
kubectl delete secret vault-approle-yyyyyy

6. Restore the secrets from the backed up data:

kubectl apply -f vault-credential.yaml


kubectl apply -f vault-root-cert.yaml
kubectl apply -f vault-passphrase.yaml
kubectl apply -f vault-approle-xxxxxx.yaml
kubectl apply -f vault-approle-yyyyyy.yaml

7. Scale up the vault deployment. Run the following command:


kubectl scale deployment itom-vault -n <suite-namespace> --replicas=1


1.4.5. Cannot login to OMT swagger


When you go to the OMT swagger page with the https://<external_host>:5443/suiteInstaller/swagger-ui.html URL, you cannot get the
token value when entering your CDF management portal username and password into the token-controller.

Cause
This issue occurs because you have configured the EXTERNAL_ACCESS_PORT in the base-configmap to a wrong port.

Solution
1. Log on to any one of the master (control plane) nodes.
2. Run the following command to edit the base-configmap and change the value of the parameter EXTERNAL_ACCESS_PORT
to "5443" (see the patch sketch after these steps for a non-interactive alternative):
kubectl edit cm base-configmap -n core
3. Run the following command to list the running pods:
kubectl get pods -n core
Your terminal resembles the following:

[root@sh ~]# kubectl get pods -n core


NAME READY STATUS RESTARTS AGE
cdf-add-node-1573115695592 0/1 Completed 0 4d1h
cdf-add-node-1573117979792 0/1 Completed 0 4d
cdf-apiserver-59d8b989b5-kkspw 2/2 Running 0 4d1h
fluentd-4xx2w 2/2 Running 0 4d1h
fluentd-jcn9s 2/2 Running 0 4d
idm-865b8b8f54-4zwjc 2/2 Running 0 4d1h
idm-865b8b8f54-x7h6v 2/2 Running 0 4d1h
itom-cdf-deployer-jtv9v 0/1 Completed 0 4d1h
itom-cdf-image-utils-cd9nh 0/1 Completed 0 4d1h

4. Run the following command to delete the cdf-apiserver-xxxx pod. You need to replace the <cdf-apiserver> with the pod
name you get from the previous step.
kubectl delete pod -n core <cdf-apiserver>
5. Wait a few minutes until the cdf-apiserver pod starts again. You can run the following command to check the pod status.
kubectl get pods -n core
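As a non-interactive alternative to step 2, a minimal sketch that patches the ConfigMap directly (this assumes EXTERNAL_ACCESS_PORT is a key under the ConfigMap's data section, which is what kubectl edit shows):

kubectl patch cm base-configmap -n core --type merge -p '{"data":{"EXTERNAL_ACCESS_PORT":"5443"}}'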


1.5. Troubleshoot generic metric collection issues


This section covers the following troubleshoot general metric collection scenarios:

Troubleshoot Content Administration Service (CAS)


Agent System Metric Push does not have a Store and Forward capability
Credentials manager pod fails to start after suite restart
itom-opsbridge-cs-redis in CrashLoopBackOff state


1.5.1. Troubleshoot Content Administration Service (CAS)
BVD and OPTIC Data Lake artefacts used in OPTIC Reporting and Automatic Event Correlation are packaged as content zips.
These zips are internal and not modifiable.

RUM: Internal file - OpsB_RUM_Content_2020.08.zip. Contains the Real User Monitor schema and Real User Monitor reports.
Event: Internal file - OpsB_Event_Content_2020.08.zip. Contains the Event schema.
SysInfra: Internal file - OpsB_SysInfra_Content_2020.08.zip. Contains the System Infrastructure schema.
BPM: Internal file - OpsB_BPM_Content_2020.08.zip. Contains the Synthetic Transaction schema.
CMDB: Internal file - OpsB_CMDB_Content_2020.08.zip. Contains OPTIC Data Lake tables (purely internal - not documented).

These content zips are deployed to the respective services by the Content Administration Service (CAS). Further, there is a
startup Job that helps invoke CAS right after AI Operations Management is deployed. In case you notice any issues with
CAS' operation, use this document to isolate the root cause and resolve it.

Flow
During helm deployment:

Kubernetes Job (itom-opsb-content-administration-job-xxxxx), Deployment (itom-opsb-content-administration), and
Service (itom-opsb-content-administration-svc) for CAS are created
Kubernetes Pods created for CAS
CAS Pod (itom-opsb-content-administration-xxxxxxxx-xxxxx) is started
CAS Job Pod (itom-opsb-content-administration-job-xxxxx-xxxxx) waits for BVD, OPTIC Data Lake, Node Resolver
(explained later under AMC), and CAS services to be available
Once available, uses internal CAS API to deploy content files
CAS Pod receives these requests
CAS Pod unzips respective content files and builds OPTIC Data Lake and BVD requests
CAS Pod executes the OPTIC Data Lake and BVD requests in the following order: bvd, metadata, enrichment,
retention, roll-up, blk-upload, entityConfig

Note

metadata, enrichment, retention, roll-up, blk-upload, entityConfig are concepts internal to OPTIC Data
Lake

If a request to OPTIC Data Lake to create metadata, enrichment and roll-up (OPTIC Data Lake tables) fails, the
request is tried again every 3 minutes, 10 times.
If the request fails even after 30 minutes, content deployment exits with the following log message:

Exiting as metadata table not created. List of tables not created are: <list of tables failed>

Once each section is completed, you will see the following message:

Continuing as all metadata table was created.

Logs


Accessing logs
For offline analysis:
Ask for a tar/zip with the contents of the {log-vol}/{namespace}/*content-admin*/ folder
For real-time analysis:
On any of the CDF nodes, start by setting environment variables for the namespace and various pods

export cas_ns=$(kubectl get pods --all-namespaces | awk '!/job/ && /content-manager/{printf $1}');
export cas_job=$(kubectl get pods -n $cas_ns| awk '/job/ && /content-management/{printf $1}');
export cas_pod=$(kubectl get pods --all-namespaces | awk '!/job/ && /content-manager/{printf $2}');

CAS Job

Check logs with:

kubectl logs -n $cas_ns $cas_job

CAS

cas-util tool helps interact with CAS. Use the following command to download it to one of the CDF nodes

wget --no-check-certificate https://<external_access_host>/staticfiles/contrib/cas-util/cas-util.sh && chmod 755 cas-util.sh

Tail the current logs from CAS and check for errors:

./cas-util.sh logs tail

Download logs from CAS to share them:

./cas-util.sh logs download

Analyzing logs

CAS Job
On successfully requesting CAS to deploy content:

INFO: <content> Content Administration Service Job now calling curl -k -s -o /dev/null -w %{http_code} https://itom-opsb-co
ntent-administration-svc:8443/v1/content/configuration/all/category/RUM
INFO: Curl command to CAS is triggered with response code 202
INFO: <content> configuration is triggered. Please check the status

When there is a problem requesting CAS to deploy content:

ERROR: <content> was not accepted by either OPTIC Data Lake Administration Service or BVD. Will not be able to conf
igure <content>. Please check the status in Content Administration Service.

Steps to resolve:

Proceed to CAS log analysis and resolve the root cause

CAS
On successfully deploying a BVD dashboard:


INFO: Uploading SystemAvailability-TopN dashboard.


INFO: /opt/content-administration/bin/bvd-cli-linux --import --file ...
Success authenticating user. Importing dashboard...
Import dashboard success.
INFO: Uploading SystemAvailability-TopN dashboard is completed successfully.

On successfully deploying metadata to OPTIC Data Lake:

INFO: Performing configure task on 'OPTIC Data Lake'. May take some time to complete task...

2020-10-07 07:45:25 INFO: Job Summary:
Type||Category||Status
======================================
bvd||system_infra||Completed
blk-upload(ingestion-conf)||system_infra||Skipped [Reason: Content is not available].
blk-upload(load-conf)||system_infra||Skipped [Reason: Content is not available].
entityConfiguration||system_infra||Skipped [Reason: Content is not available].
metadata||system_infra||Completed
retention||system_infra||Completed
enrichment(hourly)||system_infra||Completed
enrichment(daily)||system_infra||Completed
enrichment(forecast)||system_infra||Completed
roll-up(perl)||system_infra||Completed
roll-up(task)||system_infra||Completed
roll-up(task-flow)||system_infra||Completed
======================================

When there is a problem deploying to OPTIC Data Lake:

[main] INFO OBMDIConfig.sendDataSetToAdminWs(801) - Performing 'reconfigure' with method 'PUT' on 'opsb_internal_repo


rts_schedule_config_1h_id' with 'OPTIC Data Lake' URL 'https://itom-di-administration-svc:8443/urest/v2/itom-data-ingestion-
administration/dataSetConfiguration/opsb_internal_reports_schedule_config_1h_id'.
[main] ERROR OBMDIConfig.sendDataSetToAdminWs(837) - Could not transfer 'opsb_internal_reports_schedule_config_1h_id
' content to 'OPTIC Data Lake' endpoint 'https://itom-di-administration-svc:8443/urest/v2/itom-data-ingestion-administration/
dataSetConfiguration/opsb_internal_reports_schedule_config_1h_id'. Reason:
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
...
[main] ERROR OBMDIConfig.sendDataSetToAdminWs(838) - Kindly review the reason and take corrective action on 'OPTIC D
ata Lake'.
...
[main] ERROR OBMDIConfig.main(2552) - Could not send data under '/var/content-administration/content/opr/integration/cos
o/metadata/' to 'OPTIC Data Lake' endpoint 'https://itom-di-administration-svc:8443/urest/v2/itom-data-ingestion-administrat
ion/dataSetConfiguration'.

Steps to resolve:
Resolve the root cause on OPTIC Data Lake (see OPTIC Data Lake Troubleshooting for details)

Run CAS commands manually to re-deploy out of the box content:

opsb-content-service.sh easyconfigure -configuration_types all -category agent_infra


opsb-content-service.sh easyconfigure -configuration_types all -category agentless_infra
opsb-content-service.sh easyconfigure -configuration_types all -category BPM
opsb-content-service.sh easyconfigure -configuration_types all -category event
opsb-content-service.sh easyconfigure -configuration_types all -category system_infra
opsb-content-service.sh easyconfigure -configuration_types all -category RUM
opsb-content-service.sh easyconfigure -configuration_types retention -category entityConfiguration
opsb-content-service.sh easyconfigure -configuration_types entityConfiguration -category all

To start the collection:


opsb-content-service.sh start -collection_types oa -all


Note

To check the status of content schemas:


opsb-content-service.sh status -configuration_types all -category all

Note

To check the status of collection:


opsb-content-service.sh status -collection_types oa -all


1.5.2. User does not have required permissions to modify OOTB content
After a Helm upgrade from a lower version to the 24.2 chart, when you try to uninstall or modify out-of-the-box (OOTB) content
using ops-content-ctl , you will get an error message stating that the user does not have the required permissions to modify
OOTB content.
Cause
This issue occurs when you don't have the di-admin role set in your provider org for the ops-content-ctl .

Solution
To resolve the issue, you must manually add the di-admin role to your user in the provider organization used by ops-content-ctl .

Follow the steps to add the role:

Log into the IDM server.


Add di-admin to your user roles.


1.5.3. Agent System Metric Push does not have a Store and Forward capability

Issue
Agent System Metric Push doesn't have a Store and Forward capability.

Solution
1. Install Operations Agent (OA) 12.15.
2. Contact Software Support and get the hotfix (OCTCR19G1192060) for OA 12.15. This provides the Store and Forward
capability.


1.5.4. Credentials manager pod fails to start after suite restart
When the credential manager restarts during database initialization, it checks for an "already exists" string in the error message. If
the error contains this string, the credential manager proceeds further. If the database is localized and the error messages are
displayed in other languages, the "already exists" string isn't available in the error and the credential manager tries to
reinitialize the database. This results in the credential manager pod failing to start.

Solution
LC_MESSAGES is responsible for printing out messages in the required language. Therefore set LC_MESSAGES to US English in
the PostgreSQL server.

export LC_MESSAGES=en_US.UTF-8

Note

You can also configure the locale for system error messages in postgresql.conf
file.
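A minimal sketch of that postgresql.conf alternative (lc_messages is the standard PostgreSQL parameter; reload the configuration afterwards so the change takes effect):

# in postgresql.conf
lc_messages = 'en_US.UTF-8'

-- then, from psql, reload the configuration without a restart:
SELECT pg_reload_conf();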


1.5.5. itom-opsbridge-cs-redis in CrashLoopBackOff state
Redis pod is in CrashLoopBackOff state.

Following are some sample error scenarios:

The cs-redis pods are restarting continuously with the following errors:

1667855677.596905032,"2022-11-08T00:14:37.596905032+03:00","time="2022-11-07T21:14:37Z" level=error msg="Couldn't connect to redis instance"","itom-opsbridge-cs-redis-5f69c95669-z4h49","opsb-helm","xxxxx","cs-redis"

1667855697.596963443,"2022-11-08T00:14:57.596963443+03:00","time="2022-11-07T21:14:57Z" level=error msg="Couldn't connect to redis instance"","itom-opsbridge-cs-redis-5f69c95669-z4h49","opsb-helm","xxxxx","cs-redis"

1667855717.104071175,"2022-11-08T00:15:17.104071175+03:00","/bin/startRedis.sh: line 116: 45 Killed nohup /usr/sbin/redis-server /var/opt/redis/redis.conf","itom-opsbridge-cs-redis-5f69c95669-z4h49","opsb-helm","xxxxx","cs-redis"

1667855717.114281550,"2022-11-08T00:15:17.11428155+03:00","2022-11-07T21:15:17,113+0000 redis:start Exiting as one of the processes has exited. Redis status: 1, redis-export

Found a huge (~1.3 GB) Redis backup file redis-cs/data/dump.rdb .

Solution
1. Edit itom-opsbridge-cs-redis deployment to increase the memory limit to 8 GB.

Default configurations from the deployment:

name: cs-redis
ports:
- containerPort: 6380
  protocol: TCP
- containerPort: 9121
  name: redis-exporter
  protocol: TCP
resources:
  limits:
    cpu: "2"
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 1Gi

Change the default configuration as described below:


name: cs-redis
ports:
- containerPort: 6380
  protocol: TCP
- containerPort: 9121
  name: redis-exporter
  protocol: TCP
resources:
  limits:
    cpu: "2"
    memory: 8Gi
  requests:
    cpu: 100m
    memory: 2Gi

2. Log into cs-redis container and execute the following commands:


a. Get the pod name as described in the sample command.

$kubectl get pods -A | grep cs-redis


monitoring-edge itom-opsbridge-cs-redis-59d7df9c59-lhddx 2/2 Running 0 21h

b. Log into the container bash as described in the sample command.

$ kubectl exec -it -n monitoring-edge -c cs-redis itom-opsbridge-cs-redis-59d7df9c59-lhddx bash

3. Execute the following commands inside cs-redis container in sequential order.

a. REDIS_PASSWORD=`get_secret redis_pwd | cut -f 2 -d=` ; redis-cli --tls --cert /var/run/secrets/boostport.com/cs-redis.crt --key /var/run/secrets/boostport.com/cs-redis.key --cacert /var/run/secrets/boostport.com/ca.crt -p 6380 -a ${REDIS_PASSWORD}

b. XTRIM oa_metricpull_recurring MAXLEN 0

c. XTRIM oa_metricpull_background MAXLEN 0

4. Reset the memory limit as described in step 1 to default values.


1.6. Troubleshoot Agent Metric Collector


This section covers the following troubleshooting scenario:

Issues related to Edge self-monitoring when AMC is deployed


AMC throws an SSL connection error
AMC failed to collect metrics
Metrics aren't added for newly added nodes

Note

For enhanced logging in itom-monitoring-oa-metric-collector ( agent-collector-sysinfra ), you can edit the collector
configuration and set the metricCollectorLogLevel parameter to DEBUG.

For example:

1. Copy the collector configuration to a file. Run the following command:

./ops-monitoring-ctl get coll -n agent-collector-sysinfra -o yaml > <filename>.yaml

2. Edit the collector configuration yaml file. Set the metricCollectorLogLevel parameter to DEBUG .

3. Update the collector configuration. Run the following command:

./ops-monitoring-ctl update -f <filename>.yaml


1.6.1. Issues related to Edge self-monitoring when AMC is deployed

Problem
You may encounter any of the issues when you configure self monitoring for Monitoring Service Edge to send alerts to OBM.

You need to check the log file. Log file path of Edge self monitoring: /var/opt/OV/log/edge-self-monitoring.log

Issue 1: Topology not integrated or events not being sent

Solution 1
You can try the following:

1. Run ovc and check if opcgeni and opcmona are running.


2. If opcgeni and/or opcmona are in stopped state, run ovc -start to start these processes.
3. If there is any error when running ovc -start, run ovc -kill;ovc -start , wait for 5 minutes, and check that
/var/opt/OV/bin/instrumentation/ contains the "monitoring" binary. Ideally, the file monitoring.gz shouldn't be present as it will be unzipped
when the scheduled task policy runs the execute.sh script.

Issue 2: Topology doesn't include some pods

Solution 2
You can try the following:

1. Search for "CI:" in the Edge self monitoring log and check if any POD is missing in the list.
2. Enable debug logging by setting SELF_MON_LOG_LEVEL to debug in the deployment of data broker container.
3. After debug log is enabled, search for "Topology XML:" in the log and check the XML contents.

The following screenshot illustrates a part of topology xml which contains the pod CI.

Issue 3: Event not generated for collection failure or pod not in running state

Solution 3
You can try the following:

1. Search for event generated in the log.


2. If it's an event related to a pod not in the running state, the log message should read "Critical event generated for pod <pod
name> with state <pod state>".
3. If it's an event related to discovery or metric collection failure, the log message should read "Critical event generated for
job <job name> with state <job state>".
4. In the debug log, search for "Event XML:"

The following screenshot is a sample scenario for an event generated when the pod isn't in a running state.


1.6.2. AMC throws an SSL connection error

Cause
You may get this error if the communication between the Agent Metric Collector and the Operations Agent fails. This happens
if you have changed the ASYMMETRIC_KEY_LENGTH from 2048 to 4096 on the OBM server and not on DBC.

Note

The Data Broker Container (DBC) is an Operations Agent node that's managed by OBM. It enables the Agent Metric Collector to
communicate with OBM and receives the certificate updates.

Solution
Change the ASYMMETRIC_KEY_LENGTH to 4096 on DBC. Follow the steps on DBC (Agent node or managed node):

1. Update the configuration variable ASYMMETRIC_KEY_LENGTH using the following command:

ovconfchg -ns sec.cm -set ASYMMETRIC_KEY_LENGTH <RSA Encryption algorithm supported key length>

2. To remove the existing node certificate on the agent, run the following commands:

ovcert -remove <certificate alias>

ovcert -remove <CA certificate alias>

3. To request a new node certificate from the management server, run the following command:

ovcert -certreq


1.6.3. AMC failed to collect metrics


Operations Agent isn't logging performance metrics.

Cause
This issue may occur if a metric class is missing in the parameter file.

Solution
1. Go to:
On Linux: /var/opt/perf/parm
On Windows: %OvDataDir%\parm.mwc
2. Look for the following line in the parameter file: log global application process device=disk, cpu, filesystem transaction
If a metric class is missing then add it.
3. Run the command: ovc -restart oacore


1.6.4. Topology forward to OPTIC DL fails


The topology forwarding to OPTIC Data Lake fails. The following warning appears in the itomdipulsar-proxy log file:

javax.net.ssl.SSLProtocolException: The certificate chain length (11) exceeds the maximum allowed length (10)

Cause
This issue is because the connection to the OPTIC Data Lake Message Bus proxy fails and the topology data push doesn't
work.

Solution
Follow these steps to resolve this issue:

1. The error indicates that the certificate chain length is greater than the configured maximum. Run the following command to
update the certificate chain length:
helm upgrade <release name> -f <values YAML filename> -n <application namespace> <chart location> --set itomdipulsar.proxy.co
nfigData.PULSAR_MEM="-Xms2g -Xmx2g -XX:MaxDirectMemorySize=1g -Djdk.tls.maxCertificateChainLength=15"
2. Run the following commands to verify the certificate chain length settings:
kubectl get pods -n <application namespace>
kubectl exec -it itomdipulsar-proxy-<pod value> -n <application namespace> -c itomdipulsar-proxy /bin/bash
ps -ef |grep -i java
The Java process JVM arguments display the parameter updated in step 1.


1.6.5. AMC failed to discover new nodes

Problem
Agent Metric Collector (AMC) isn't collecting metrics from new nodes.

Cause
The new nodes aren't available in a collector configuration or node filter file.

Solution
When your configuration includes the node filter files, those filter files should contain newly added node names. To ensure
that the node filter files contain the new node names, do the following:

Run the AMC benchmark tool on all nodes to get an updated recommendation and node filter list. For more information,
see Use the amc-benchmark-tool.
Update the node filter file with the newly added nodes.
Enable the delta detection capability to get the list of new nodes for which metrics aren't collected. For more
information, see Manage new nodes.


1.7. Troubleshoot resource related issues


This section provides the following troubleshooting topics:

Pod "omi-0" not getting started


Worker node does not start

Make sure the resource allocation is as per Sizing Calculator. For more information on Sizing Calculator, see: Sizing the
deployment.


1.7.1. Pod "omi-0" not getting started


The pod "omi-0" does not start after a successful installation of OMT.

Solution
We recommend increasing the disk I/O throughput to a minimum of 100-150 MB/s.


1.7.2. Worker node does not start


Due to missing disk space, the worker nodes do not start.

Solution
To solve this problem, make sure that the / and /var directories have at least 5 GB free disk space.
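A quick way to check the available space on both paths (sketch):

df -h / /var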


1.8. Troubleshoot Docker


This section provides the following troubleshooting topics:

Login to Docker Hub fails


"System error: read parent: connection reset by peer"
Docker pull doesn't work: Error while pulling image


1.8.1. Login to Docker Hub fails

Solution
To solve this problem, try these solutions:

Make sure to provide the correct user name and password.


Make sure to configure the Docker HTTP proxy in /usr/lib/systemd/system/docker.service.d/http_proxy.conf (see the
sketch after this list).
Make sure to configure the host HTTP proxy as follows:
export http_proxy https_proxy
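A minimal sketch of such a Docker proxy drop-in file (the proxy host and port are placeholders for illustration; reload systemd and restart Docker after saving it):

# /usr/lib/systemd/system/docker.service.d/http_proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:8080"
Environment="HTTPS_PROXY=http://proxy.example.com:8080"
Environment="NO_PROXY=localhost,127.0.0.1"

systemctl daemon-reload
systemctl restart docker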


1.8.2. "System error: read parent: connection


reset by peer"
You cannot start a container and it fails with the error message "System error: read parent: connection reset by peer".

Solution
Edit the kube-registry-proxy.yaml file by adding the following parameters:

name: DOCKER_FIX
value: "dockerfix"


1.8.3. Docker pull doesn't work: Error while pulling image
Docker pull does not work on worker nodes. You see the following error message: "Error while pulling image: Get http://localhost:5000/v1/repositories/itom-hcm-poca-jenkins/images: read tcp 127.0.0.1:43074->127.0.0.1:5000: read: connection reset by peer" .

Solution
1. Change the subnet mask to 255.255.255.0.
2. Configure the parameter FLANNEL_BACKEND_TYPE as follows:
FLANNEL_BACKEND_TYPE = vxlan


1.9. Troubleshoot Post Installation Issues


This section provides the following troubleshooting topics:

Troubleshoot verification of installation


Renew token failed in http_code=403
Book-keeper pods fail
Find the pod logs


1.9.1. Troubleshoot verification of installation


Check the resources (pods, deployments, service)
You can use the key-value pairs to list resources (pods, deployments, service) by their labels using the following command:
kubectl get pods --selector="itom.microfocus.com/capability"="itom-data-ingestion" -n opsb-helm

Example: The following command lists all the pods that are part of collection-services capability.
kubectl get pods --selector="itom.microfocus.com/capability"="collection-services" -n opsb-helm
The available key-value pairs are:
app:
app.kubernetes.io/managed-by
app.kubernetes.io/name
app.kubernetes.io/version
itom.microfocus.com/capability
itom.microfocus.com/description

You can use this command to view pods which are not running:

kubectl get pods --all-namespaces -o wide | awk -F " *|/" '($3!=$4 || $5!="Running") && $5!="Completed" {print $0}'

You can use this command to watch the pods' status changes:

kubectl get pods --all-namespaces -w

Verify Stakeholder Dashboard Installation

Verify the status of BVD pods


To get all pods in the deployment, run the following command:
kubectl get pods -n <suite namespace>
The following table lists BVD pods and they should be in running state:

Pod                         Description
bvd-redis                   Hosts the in-memory redis data structure.
bvd-controller-deployment   Performs database initialization.
bvd-ap-bridge               Handles licensing.
bvd-www-deployment          Hosts the BVD Reporting application.
bvd-receiver-deployment     Receives data for BVD reporting.
bvd-quexserv                Performs query execution.

Verify Reporting Installation

Verify if schema tables are created


After you install the reporting capability, the schema tables are automatically created. You may verify if the raw and
aggregated tables are created in the OPTIC Data Lake. For a complete list of tables, see the AI Operations Management Data
Model.
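One way to check is to query the Vertica catalog for the tables in the OPTIC Data Lake schema. This is a sketch that assumes the default schema name mf_shared_provider_default; adjust the name if you use a different schema:

SELECT table_schema, table_name
FROM v_catalog.tables
WHERE table_schema = 'mf_shared_provider_default'
ORDER BY table_name;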

Verify if Reports are imported


After you install the reporting capability, the BVD dashboards are automatically imported. You may verify if the following BVD
dashboards are imported:

System infrastructure reports:

System Executive Summary


System Availability - Top N
System Availability Detail
System CPU - Top N
System Memory - Top N
System Resource Detail
System Disk Space - Top 10
System Disk Space Detail
System Resource Top 3

Event reports:

Event Executive Summary


Event Assignment by User Group
Event Assignment by User
Events by CI Type
Events by CI
Events by Policy
Events by ETI

Real user monitor reports:

RUM_AllApps_Dashboard
RUM_PerApp_Dashboard

Verify Automatic Event Correlation Installation


To verify the installation, you need to ensure that the status of the pods is Running.
Automatic Event Correlation (AEC) creates several Kubernetes deployments and one CRON job. The table below lists the
associated pod names.

Pod                                         Description
itom-analytics-datasource-registry          REST service that manages information that is required to integrate EA with OBM.
itom-analytics-ea-config                    REST service for Automatic Event Correlation internal configuration.
itom-analytics-event-attribute-reader       Pulsar client that reads attributes from input data (for example, OBM events) and triggers internal logic.
itom-analytics-opsbridge-notification       Container responsible for sending results back from EA to OBM, such as Auto Event Correlation events.
itom-analytics-auto-event-correlation-job   Indicates the CRON job. The CRON job runs every 10 minutes.

Kubernetes keeps up to three associated pods in the active pod list, and their status should be either Running or Completed.

Note

The itom-analytics-datasource-registry pod will switch to Running from Init only after the DI administration services become
available.

In Vertica, the name of the tables associated with Automatic Event Correlation are prefixed with " aiops_ ." Two of the tables
that are associated with a Data Set created by DI are:

aiops_correlation_event
aiops_correlation_event_rejected

The Automatic Event Correlation capability creates the schema " itom_analytics_provider_default". The following internal tables
for Automatic Event Correlation are created in that schema:

aiops_internal_aec_user_groups
aiops_internal_correlation_graph
aiops_internal_correlation_groups
aiops_internal_correlation_metadata
aiops_internal_correlation_transactions
aiops_internal_topology_metadata
aiops_internal_topological_mappings

Note

You will be able to see the aiops_internal_topological_mappings table created only after the cmdb_entity_* tables are
created by Content Administration Service job.

1.9.2. Renew token failed in http_code=403


Error message “ERROR: Renew token failed in http_code=403”.

Cause
This issue relates to the container kubernetes-vault-renew. You see this error message when the vault token has expired.

Solution
You have to generate a new vault token. You can follow either of these steps.

Solution 1
Initialize a new token with the following commands on the master (control plane) node:

cd $CDF_HOME/bin

kube-restart.sh

Solution 2
Delete the pod manually if a ReplicationController or Deployment manages the pod. A new pod is created automatically.
If no ReplicationController or Deployment manages the pod, run the following command on the node which runs the pod:

docker restart `docker ps -a | grep <podName> | grep kubernetes-vault-init | awk '{print $1}'`

1.9.3. Book-keeper pods fail


When you run kubectl describe on failing Book-keeper pods, messages similar to the following are displayed:

"1 node(s) didn't match node selector"


" 3 node(s) had volume node affinity conflict"

The number of nodes in the above messages could vary.

Cause
This issue occurs if you have configured local storage provisioner on the control plane node. When the control plane node is
not configured to share the workload with the worker nodes, you must not add new disks or use the local directories and
configure local storage provisioner on the control plane node.

Solution
Perform the following tasks to remove local storage provisioner on the control plane node:

1. Make a note of the number of replicas for the book-keeper and zoo-keeper pods:
kubectl get statefulset -n <suite_namespace>
Example:

kubectl get statefulset -n opsb


NAME READY AGE
itom-monitoring-pt-coso-dl-data-access 1/1 47d
itom-monitoring-pt-zookeeper 1/1 47d
itomdipulsar-autorecovery 1/1 47d
itomdipulsar-bastion 1/1 47d
itomdipulsar-bookkeeper 3/3 47d
itomdipulsar-zookeeper 3/3 47d

Here, the number n/n under Ready indicates the total number of replicas for that pod. In this example, the bookkeeper
and zookeeper pods have 3 replicas each.
2. Scale down the bookkeeper pods:
kubectl scale -n <suite_namespace> statefulset itomdipulsar-bookkeeper --replicas=0

3. Scale down the zookeeper pods:


kubectl scale -n <suite_namespace> statefulset itomdipulsar-zookeeper --replicas=0

4. Run the following command to view the local persistent volumes:


kubectl get pv

kubectl get pv
NAME                CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS     CLAIM                                                                 STORAGECLASS    REASON  AGE
db-single           5Gi       RWX           Retain          Bound      core/db-single-vol                                                    cdf-default             47d
itom-logging        5Gi       RWX           Retain          Bound      core/itom-logging-vol                                                 cdf-default             47d
itom-vol            5Gi       RWX           Retain          Bound      core/itom-vol-claim                                                   cdf-default             47d
local-pv-1131a5eb   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-bookkeeper-journal-itomdipulsar-bookkeeper-2        fast-disks              47d
local-pv-1529aa1b   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-282f7bdf   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-2c1edf48   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-zookeeper-zookeeper-data-itomdipulsar-zookeeper-2   fast-disks              47d
local-pv-30cf49a5   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-3aeb6ceb   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-3ea851fe   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-bookkeeper-ledgers-itomdipulsar-bookkeeper-0        fast-disks              47d
local-pv-40b7d363   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-525ce6a3   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-bookkeeper-ledgers-itomdipulsar-bookkeeper-2        fast-disks              47d
local-pv-56c439d0   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-57221822   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-65e8b9f1   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-704dc194   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-730252fc   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-bookkeeper-journal-itomdipulsar-bookkeeper-0        fast-disks              47d
local-pv-76fbd452   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-7aa7b6d9   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-7b218186   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-7c22b037   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-842f2fa0   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-bookkeeper-ledgers-itomdipulsar-bookkeeper-1        fast-disks              47d
local-pv-87baf490   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-zookeeper-zookeeper-data-itomdipulsar-zookeeper-1   fast-disks              47d
local-pv-93b6c6d7   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-99265f58   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-zookeeper-zookeeper-data-itomdipulsar-zookeeper-0   fast-disks              47d
local-pv-a0e1a70e   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-a8bb2cfa   154Gi     RWO           Delete          Bound      opsb/itomdipulsar-bookkeeper-journal-itomdipulsar-bookkeeper-1        fast-disks              47d
local-pv-b96141b2   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-b9b30a27   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-bb1fb96e   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-bca7630d   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-bff6f09    154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-c303c8b6   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-cf2f5162   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-e8aa1bc6   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-f085e3cc   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-f619faaf   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-fe791519   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
local-pv-ff46c635   154Gi     RWO           Delete          Available                                                                        fast-disks              47d
vol1                10Gi      RWX           Retain          Bound      opsb/opsb-dbvolumeclaim                                                                       47d
vol2                10Gi      RWX           Retain          Bound      opsb/opsb-configvolumeclaim                                                                   47d
vol3                10Gi      RWX           Retain          Bound      opsb/opsb-datavolumeclaim                                                                     47d
vol4                10Gi      RWX           Retain          Bound      opsb/opsb-logvolumeclaim                                                                      47d
vol5                10Gi      RWX           Retain          Available                                                                                                47d
vol6                10Gi      RWO           Retain          Available                                                                        omistatefulset          47d
vol7                10Gi      RWO           Retain          Available                                                                        omistatefulset          47d

5. Delete the PVCs of zookeeper and bookkeeper pods that are bound to disks mounted on control plane nodes:
kubectl delete pvc <PVC_name_of_the_fastdisks> -n <suite_namespace>
Example:

kubectl delete pvc itomdipulsar-bookkeeper-ledgers-itomdipulsar-bookkeeper-0 -n opsb


kubectl delete pvc itomdipulsar-bookkeeper-journal-itomdipulsar-bookkeeper-1 -n opsb
kubectl delete pvc itomdipulsar-bookkeeper-ledgers-itomdipulsar-bookkeeper-2 -n opsb
kubectl delete pvc itomdipulsar-bookkeeper-journal-itomdipulsar-bookkeeper-0 -n opsb
kubectl delete pvc itomdipulsar-bookkeeper-journal-itomdipulsar-bookkeeper-2 -n opsb
kubectl delete pvc itomdipulsar-bookkeeper-ledgers-itomdipulsar-bookkeeper-1 -n opsb
kubectl delete pvc itomdipulsar-zookeeper-zookeeper-data-itomdipulsar-zookeeper-1 -n opsb
kubectl delete pvc itomdipulsar-zookeeper-zookeeper-data-itomdipulsar-zookeeper-2 -n opsb
kubectl delete pvc itomdipulsar-zookeeper-zookeeper-data-itomdipulsar-zookeeper-0 -n opsb

6. Delete the PVs of zookeeper and bookkeeper pods that are bound to disks mounted on control plane nodes:
kubectl delete pv <PV_name_of_the_fastdisks>
Example:

kubectl delete pv local-pv-3e49e84a


kubectl delete pv local-pv-5ef2a7d5
kubectl delete pv local-pv-63d53e90
kubectl delete pv local-pv-6a918ae7
kubectl delete pv local-pv-74549671
kubectl delete pv local-pv-cc2b7212
kubectl delete pv local-pv-dd1c2fec
kubectl delete pv local-pv-e0d68fb8
kubectl delete pv local-pv-ecf7f67f

7. Unmount the disks that were added for the configuration of local storage provisioner on the control plane node:
umount /mnt/disks/<vol_name>
Example:

umount /mnt/disks/vol1
umount /mnt/disks/vol2
umount /mnt/disks/vol3

8. Scale up the bookkeeper pod:


kubectl scale -n <suite_namespace> statefulset itomdipulsar-bookkeeper --replicas=N
Where, N is the total number of replicas for the bookkeeper pod.
9. Scale up the zookeeper pod:
kubectl scale -n <suite_namespace> statefulset itomdipulsar-zookeeper --replicas=N
Where, N is the total number of replicas for the zookeeper pod.

1.9.4. Find the pod logs


If an error occurs during AI Operations Management installation or after installation, you can use the script find-current-pod-logs.sh to find the relevant NFS server and log folder for a given running pod, and so locate the log files associated with that pod.

Run the following script in the current folder opsbridge-suite-chart/scripts:

./find-current-pod-logs.sh

When you run the script it'll prompt for the POD number. Enter the POD number from the list for which you want to find
the relevant NFS server and log folder details.

Note

To view the relevant information you must give the POD number from the list and not the POD
name.

For example:

[root@mastermulti scripts]# ./find-current-pod-logs.sh


find-current-pod-logs.sh v1.0 running against CDF version 2021.11.x
=======================================================================
1) bvd-ap-bridge-65985d4dbb-7bs9r
2) bvd-controller-deployment-587759749f-sb52j
3) bvd-explore-deployment-789d7d54fb-pb492
4) bvd-quexserv-58f677798f-4v4s7
5) bvd-receiver-deployment-6478f5f68d-bxrlt
6) bvd-redis-757f478468-89c26
7) bvd-www-deployment-794b6b7464-m9fsb
8) credential-manager-9cc77778-47qv6
9) itom-analytics-aec-explained-58d66d76d-2mj2p
10) itom-analytics-auto-event-correlation-job-27248100-hgprg
11) itom-analytics-auto-event-correlation-job-27248110-mvtz8
12) itom-analytics-auto-event-correlation-job-27248120-vh6hk
13) itom-analytics-datasource-registry-76b6dd7f98-2ntw5
14) itom-analytics-ea-config-68575bf799-jpbt5
15) itom-analytics-event-attribute-reader-5dfb77c54d-kjm46
16) itom-analytics-opsbridge-notification-7868879f86-cdgfp
17) itom-analytics-text-clustering-server-88f996cf4-jngwv
18) itom-autopass-lms-5b8d9456c4-twd8h
19) itom-di-administration-7cc45f6bd7-fh9xr
20) itom-di-data-access-dpl-84f6f56cc7-jllwc
21) itom-di-dp-job-submitter-dpl-69b6dcc69f-vkmrs
22) itom-di-dp-master-dpl-7866c6776d-m7vdh
23) itom-di-dp-worker-dpl-5c94f8cdbb-5vgtr
24) itom-di-metadata-server-6d585ccd7-4w6pf
25) itom-di-postload-taskcontroller-8694b55cc7-4d2bm
26) itom-di-postload-taskexecutor-6c67b9485b-s9tbc
27) itom-di-receiver-dpl-766977df94-whrmg
28) itom-di-scheduler-udx-b879dcddf-wbshm
29) itom-di-vertica-dpl-585b66c774-v65kq
30) itom-idm-d79bf965c-rgbtc
31) itom-ingress-controller-8df444557-fgf58
32) itom-ingress-controller-8df444557-hrj9n
33) itom-monitoring-admin-8449c884cd-fr2gc
34) itom-monitoring-collection-autoconfigure-job-8k6il-6p5m7
35) itom-monitoring-collection-manager-5b889b99f4-bvnh9
36) itom-monitoring-job-scheduler-56c796b646-5gddv
37) itom-monitoring-oa-discovery-collector-747dd7b955-vl7hg
38) itom-monitoring-oa-metric-collector-6494b7669c-blt4s
39) itom-monitoring-service-data-broker-5d78dbd95b-fcm4w
40) itom-monitoring-snf-89987f874-dd6nv
41) itom-omi-aec-integration-jhfw8-xrksq
42) itom-omi-aec-integration-watcher-27248100-hq9nx

43) itom-omi-aec-integration-watcher-27248110-9l8tn
44) itom-omi-aec-integration-watcher-27248120-bzgvc
45) itom-omi-csr-granter-gc6xi-np5dn
46) itom-omi-di-integration-w0d4g-c5kd6
47) itom-opsb-content-management-job-wihuv-x6gxj
48) itom-opsb-content-manager-7968f45f8c-sbkr2
49) itom-opsb-db-connection-validator-job-7457f
50) itom-opsb-resource-bundle-7c7d559bc5-h7tkp
51) itom-opsbridge-cs-redis-fd8848b6c-xl7nq
52) itom-opsbridge-data-enrichment-service-c545d766-8fjvh
53) itom-reloader-d78b57dc5-s42lz
54) itom-ucmdb-0
55) itom-vault-5977f4b7ff-2q2zs
56) itomdimonitoring-gen-certs-job-shzrt
57) itomdimonitoring-verticapromexporter-58bfcd4bd4-cpjk5
58) itomdipulsar-bookkeeper-0
59) itomdipulsar-bookkeeper-init-vxgzo5r-gbhcj
60) itomdipulsar-broker-68597546cb-b25nn
61) itomdipulsar-minio-connector-post-upgrade-job-ym81yor-npt9w
62) itomdipulsar-proxy-85c5d89594-kktfl
63) itomdipulsar-zookeeper-0
64) itomdipulsar-zookeeper-metadata-lwjdff3-6dffx
65) omi-0
66) omi-artemis-7bdd945f5b-2jzwd
67) opr-event-dataset-updater-lcsb3-pkbjx
68) webtopdf-deployment-978d56fb5-4d85v
Select POD: 6
bvd-redis-757f478468-89c26
Go to the NFS Server: yournfsserver.example.net
Then: cd /var/vols/itom/cdf-log/container/

Run the following command: "ls -lrt |grep bvd-redis-757f478468-89c26"

Note

Instead of redirecting to the NFS server manually you can use the option to mount/unmount the NFS volume and move to the local
folder. You can use the following options with the POD logger script:

-m : (optional) will mount the NFS volume locally and move to that local folder
-u : (optional) will unmount a previously mounted volume
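For example, assuming you run the script from the opsbridge-suite-chart/scripts folder:

# Mount the NFS log volume locally and change to that local folder
./find-current-pod-logs.sh -m

# Unmount a previously mounted volume when you are done
./find-current-pod-logs.sh -u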

1.10. Troubleshoot Postgresql


This section provides the following troubleshooting topic:

idm-postgresql cannot access /var/pgdata

1.10.1. idm-postgresql cannot access /var/pgdata


You receive the error message that idm-postgresql cannot access /var/pgdata/.

Solution
Make sure the directory is owned by the correct user and group, for example, user ID 1999 and group ID 1999.

An example command: chown -R 1999:1999 /var/pgdata/

1.11. Troubleshoot Agentless Monitoring


This section covers the following topic(s):

Troubleshoot APM integration configuration
Quick Report shows no data
Unable to see SiteScope providers in Agentless Monitoring UI

1.11.1. Troubleshoot APM integration configuration
Problem
Objects with duplicate APM IDs or with APM ID == (-1) exist.

Solution
Follow these steps to resolve the issue:

1. Open a JMX Console (there is one provided in <SiteScope root directory>\java\bin\jconsole.exe), and
enter 28006 (the default port) in the Port field.
In the MBeans tab, select com.mercury.sitescope/Integration/Bac/Tools/BacIntegrationToolsJMX.
For objects with duplicate APM IDs, activate fixDuplicateBACConfiguration() .
For objects with APM ID == (-1), activate fixMinusOneBACConfiguration() .
It's also recommended to activate softSync() to send the new configuration to APM.
2. If measurements have the wrong category ID, restart SiteScope.

1.11.2. Quick Report shows no data


Quick Report shows no data.

Cause
Metrics don't reach OPTIC Data Lake.

Solution
Ensure that the SiteScope metric streaming is enabled to OPTIC Data Lake and that SiteScope is integrated with OBM.

Follow these steps:

1. Integrate SiteScope metrics with OPTIC Data Lake


2. SiteScope is integrated with OBM
I. Establish trust between SiteScope and AI Operations Management
i. Task 1: Establish trust from SiteScope to the application
ii. Task 2: Establish trust from the AI Operations Management to SiteScope
II. Create a connected server in OBM and verify topology synchronization
i. Task 1: Create a user
ii. Task 2: Create a user group and associate the user and the group
iii. Task 3: Add SiteScope as a Connected Server
iv. Task 4: Verify topology from SiteScope in RTSM

Validation
Run the following queries to ensure that the metric streaming is enabled and data is reaching OPTIC Data Lake:

SELECT node_fqdn, to_timestamp_tz(max(timestamp_utc_s)) as 'latest timestamp', count(*),
       timestampdiff(minute, to_timestamp(max(timestamp_utc_s)), clock_timestamp()) as 'age in minutes'
FROM mf_shared_provider_default.opsb_agentless_node
GROUP BY 1
ORDER BY 2 DESC, 1;

Here mf_shared_provider_default is the default schema name.

Example output: the query returns one row per node, showing the node FQDN, the latest timestamp, the row count, and the age in minutes (the screenshot is not reproduced in this export).

If you are using a different schema name, then run the following query:

SELECT schema_id, schema_name, u.user_name AS owner, create_time, is_system_schema
FROM v_catalog.schemata s JOIN v_catalog.users u ON s.schema_owner_id = u.user_id
ORDER BY schema_name;

1.11.3. Unable to see SiteScope providers in Agentless Monitoring UI

Problem
You can't see the SiteScope providers, monitor groups, and monitors under the provider groups in Agentless Monitoring UI
after you onboard SiteScope.

Solution
Go through the following sections to check and rectify the issue.

Verify using CLI


1. Run the command to check the target using CLI. You should check the URL, port, and hostname of the SiteScope.
Note: Make a note of the SiteScope target endpoint and ensure the URL doesn't have a "/" at the end.

ops-monitoring-ctl.exe get target

For example in Windows:

C:\> ops-monitoring-ctl.exe get target


NAME SUBTYPE ENDPOINT
SiteScopeWindowsTarget sis-url http://sitescopewin.abc.com:8080/SiteScope

For example in Linux:

# ./ops-monitoring-ctl get target


NAME SUBTYPE ENDPOINT
SiteScopeLinuxTarget sis-url https://sitescopelnx.abc.com:8443/SiteScope

2. Run the command to check the provider group you created.

ops-monitoring-ctl.exe get providergroups

For example:

C:\> ops-monitoring-ctl.exe get providergroups


ID NAME DESC PROVIDER TYPE Parent Name
5023cbc4-e8b0-44bc-8f91-056a20741770 SiteScopeProviderGrp SiteScopeProviderGrp providergroup

3. Run the command to check the provider properties; check that it points to the exact target and provider group names. Ensure that the names match exactly, as they are case sensitive.

ops-monitoring-ctl.exe get providers

For example in Windows:

C:\> ops-monitoring-ctl.exe get providers


ID NAME DESC Parent Name PROVIDER TYPE TARGETNAM
E
984ac31a-2ba4-485b-90e9-73280e0fbdd7 provider_SiteScopeWin provider_SiteScopeWin SiteScopeProviderGrp pro
vider [SiteScopeWindowsTarget]

For example in Linux:

# ./ops-monitoring-ctl get providers


ID NAME DESC Parent Name PROVIDER TYPE TARGETNAME
c11733f6-8127-4253-9ebc-493b6d5ec640 provider_SiteScopeLnx provider_SiteScopeLnx SiteScopeProviderGrp provid
er [SiteScopeLinuxTarget]

Verify using SiteScope


1. Check the roles created in SiteScope. The names are case sensitive and should match the IDM roles for SiteScope. For
example: SISadmin and SISuser .

2. Check the URL of the IDM server in the classic SiteScope master.config file or in infrastructure settings. The URL must
end with "/".
For example: https://<FQDN_of_the_external_access_host>/idm-service/v3.0/tokens/

3. You must import the application certificate to SiteScope from SiteScope UI > Preferences > Certificate
management

4. Restart SiteScope if you have made any changes.

5. Check the error.log on the SiteScope server machine for errors.

Verify using IDM from the application


1. In IDM, check if the IDM default roles are correctly assigned to the Agentless Monitoring UI user (use the same user that you used to log in to the Agentless Monitoring UI).

For details, see Manage IDM users.

2. If SiteScope uses HTTPS, you must import the SiteScope CA certificate into the suite and vice versa. See Add SiteScope certificates.

1.12. Troubleshoot Application Monitoring


This section covers the following topics:

APM fails to communicate CI's deletion to Application Monitoring
Sync Issue between APM and Application Monitoring
Setup is idle for 15 mins during sync
Missing Files Content/Files Section Under Monitor Resources
Error "crash loop back off"

1.12.1. APM fails to communicate CI's deletion to Application Monitoring
In case of a network issue, Application Performance Management (APM) fails to communicate CI's deletion to MCC
Application Monitoring.

Solution
You must manually delete the CIs from the MCC UI.

1.12.2. Sync Issue between APM and MCC Application Monitoring
Sync issue between Application Performance Management (APM) and MCC Application Monitoring

Cause
Network or deployment issues. Check the log files in the apm-config-sync-service pod to verify the sync flow.

Solution
Restart the apm-config-sync-service pod.
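One way to restart it, assuming the pod is managed by a deployment so that a deleted pod is recreated automatically (the pod name below is a placeholder):

# Find the APM config sync pod
kubectl get pods -n <namespace> | grep apm-config-sync-service

# Delete it; the deployment recreates it
kubectl delete pod <apm-config-sync-service_pod_name> -n <namespace>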

1.12.3. Setup is idle for 15 mins during sync


After MCC Application Monitoring is integrated with Application Performance Management (APM), the setup is idle for 15 minutes.

Cause
The data sync begins after the APM to MCC sync interval of 15 minutes is completed.

Solution
Restart APM sync service pod to start the data sync immediately.

1.12.4. Missing Files Content/Files Section Under Monitor Resources
Missing Files Content/Files Section under Monitor Resources.

Solution
Restart monitoring-resources pod.

1.12.5. Error "crash loop back off"


The APM sync service pod fails to start in an Azure environment with the error message: crash loop back off.

Solution
Update the env name and value of the deployment by performing the following steps:

1. Run the command:

kubectl edit deploy itom-apm-config-sync-service -n <namespace>

2. Navigate to env variable section.


3. Add env name as SERVICE_HEAP and value as 500000000 .
4. Save the deployment.
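The resulting entry in the deployment's env section would look roughly like this (the surrounding fields are not shown):

env:
  - name: SERVICE_HEAP
    value: "500000000"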

1.13. Troubleshoot OBM


This section covers the troubleshooting topics for OBM.

1.13.1. OBM pod fails to start


At times, the OBM pod fails to start after installation with the following error:

omi - ERROR: failed with status 1 in /docker-entrypoint

Cause
This issue occurs when the NFS volumes hosting the omi-0 and omi-1 pods contain files from a previous chart installation.

Solution
To resolve this issue, do the following:

1. Delete the omi persistent volumes (see the sketch after this list).


2. Start and complete a fresh installation of AI Operations Management again.
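A minimal sketch for step 1, assuming the omi volumes are bound through PVCs in the suite namespace; the names below are placeholders, so verify them with the first command before deleting anything:

# Identify the omi volumes and their claims
kubectl get pv | grep omi

# Delete the claim and the volume (use the names returned above)
kubectl delete pvc <omi_pvc_name> -n <namespace>
kubectl delete pv <omi_pv_name>

# Also remove the leftover files in the corresponding directories on the NFS server,
# since files from the previous installation are what cause the failure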

1.13.2. omi-1 pod has no policies deployed

Problem
After the successful containerized installation and deployment of OBM in high availability mode, the omi-1 pod does not have
any policies deployed to it.

Solution
Do the following to deploy the policies to omi-1 :

List the deployed policies on omi-0 : kubectl exec omi-0 -n <namespace> -c omi -- /opt/OV/bin/ovpolicy -list.
In the OBM UI, deploy the same policies to omi-1 .
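After deploying, you can verify the result with the same command against omi-1 (a sketch, assuming the pod and container names mirror omi-0):

kubectl exec omi-1 -n <namespace> -c omi -- /opt/OV/bin/ovpolicy -list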

1.13.3. Business logic engine service is currently unavailable
An error message appears as Business logic engine service is currently unavailable when you run discovery or integration jobs.
This is possible while working on the OBM dashboards that are tightly coupled with the RTSM model.

Cause
Due to the discovery or integration jobs, marble receives massive topology changes from the UCMDB. By design, marble
restarts the dashboard service to reload the model. The outage lasts about 90 seconds after the shutdown notification was
received.

Solution
We suggest not to run discovery or integration jobs during the hours in which you modify dashboards.

1.13.4. Classic OBM UI remains inaccessible after logging into AI Operations Management
If you log into AI Operations Management and then log into classic OBM, you can't access classic OBM and you are prompted
for the login credentials again.

Cause
This issue occurs in classic OBM due to the browser cookie for SSO of AI Operations Management overwriting the browser
cookie of OBM. To solve this issue, configure the AI Operations Management to use lightweight single sign-on (LW-SSO) and
not hpsso.

Note

This issue occurs only for classic OBM and not for AI Operations Management.

This issue occurs only when the FQDN of classic OBM and AI Operations Management are the same.

For example, FQDN of classic OBM has the format, [hostname1].[domain].[tld] such as mambo8.mambo.net and
FQDN of AI Operations Management has the format, [hostname2].[domain].[tld] such as omidock.mambo.net .

In this scenario, you can’t access the classic OBM because the AI Operations Management sets an SSO for *.mambo.net,
which overwrites the SSO set by the classic OBM to mambo8.mambo.net. However, if the FQDN of classic OBM is completely
different, you won’t encounter the issue.

Solution
Perform the following steps:

1. Log in to the AI Operations Management/idm-admin URL.

2. Choose the SYSTEM SETTINGS tab.

3. Switch from Basic to Advanced mode.

4. From the LWSSO section, locate and double-click the Creation Domain Mode.

5. Edit the value and modify it to lwsso from hpsso.

6. Save the configuration.

7. Log out from AI Operations Management.

8. In classic OBM, from your web browser, open the developer console and expand Cookies.

For example, to open the developer console and check cookies in Google Chrome:

a. Right-click on the browser and click Inspect to open the developer console

b. Go to the Applications tab on the console.

c. Expand the Cookies dropdown under the Storage section.

9. Under Cookies, delete all cookies that are displayed.

10. Reload the classic OBM page in the same browser and you can now access it.

1.13.5. Event browser connection error


When you open the OBM Event Browser and the CDF management portal in the same browser, an Event Browser connection error is displayed.

Solution
If you want to open the CDF management portal and OBM in the same browser, use a private browsing window for one of
them.

1.13.6. Japanese view names do not appear correctly

After reloading the Event Browser (HTML), Japanese view names are displayed as question marks, and no events are listed.

Solution
This issue occurs if the Microsoft SQL Server database is not installed on a Japanese operating system. To resolve the issue,
install the MS SQL database on a Japanese operating system.

1.13.7. OBM dialog boxes and applets fail to load

OBM dialog boxes and applets, such as the Authentication Wizard, do not load properly.

Cause
Old java files on your client computer.

Solution
Clear the Java cache by following this procedure:

1. Open Control Panel > Java > Temporary Internet Files > Settings .
2. In the Temporary Internet Files section, click Settings.
3. In the Temporary File Settings dialog box, click Delete Files.

1.13.8. OBM services not starting


OBM services do not start when OBM is reconfigured using a new set of empty databases, because the UCMDB packages are missing.

Solution
In order to reduce the startup time, UCMDB packages are uploaded only once and ucmdb_pkg_up_ok.marker file is created after
a successful upload attempt. If you want to reconfigure OBM by using a new set of empty databases, you will skip the upload
step as the marker file indicates that the UCMDB packages are already uploaded.

To work around this issue, before running the reconfiguration, make sure you delete the marker file in the following location:

/var/opt/OV/conf/ucmdb_pkg_up_ok.marker
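For a containerized OBM, one way to remove the marker is from inside the omi container; this is a sketch, and on a classic OBM server you can simply delete the file on the host:

kubectl exec omi-0 -n <namespace> -c omi -- rm -f /var/opt/OV/conf/ucmdb_pkg_up_ok.marker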

1.13.9. OBM UI fails to load the page due to lightweight single sign-on issue
If you access UIs from the Operations Bridge management portal, you may experience issues loading the user interface. The error message appears as:

An error occurred. Server Response could not be Parsed.

Cause
You face this issue due to lightweight single sign-on (LW-SSO). You have to set the LW-SSO expiry period value properly.

Solution
Follow the steps:

1. Login to the Operation Bridge Management Portal.


2. Go to Administration or IdM Administration and select SYSTEM SETTINGS tab.
3. Switch from Basic to Advanced mode and select LWSSO section.
4. Edit the LWSSO Expiration Period from 30 to 600.
5. Save the configuration.
6. Log out from user interface and login again to check the availability of UI.

1.13.10. Recipients page does not open


The Recipients page does not open, or the recipients page opens, but recipients cannot be added or modified.

This error occurs if the default templates for recipients were not loaded when OBM was installed. Do the following to fix this
issue:

1. Access Administration > RTSM Administration > Package Manager .

2. Click Deploy packages to server (from local disk) .

3. Click Add, select the BSMAlerts.zip file and click Open. Deploy the package.

You may have to copy the BSMAlerts.zip file from the OBM system to your local system.

Recipients that were created before BSMAlerts.zip was redeployed have no valid notification template and must be deleted and added again.

1.13.11. RTSM Administration pages do not load


RTSM Administration pages do not load and are not accessible.

Solution
Make sure that the OBM gateway server is able to access the Default Virtual Server for Application Users URL. This URL is available in Infrastructure Settings (go to Administration > Setup and Maintenance > Infrastructure Settings).
If you are using a reverse proxy or load balancer, make sure you log in through the Default Virtual Server for Application
Users URL.

1.13.12. Event Correlations are skipped in high load situations
If a high number of events with ETIs is forwarded to OBM over a long period of time (30 minutes or longer), the correlation
engine only considers the specific number of recent events. The oldest events are removed from the queue and no longer
considered for correlation if the limit is exceeded.

Solution
Go to Administration > Setup and Maintenance > Infrastructure Settings . Check and update the limit of the Max
Waiting Queue Size parameter.
The default value is 5000. The valid range is 100 to 20000. If you are experiencing this problem, lower the incoming event
rate or increase the Max Waiting Queue size limit. If the limit is increased, you should also monitor the memory consumption
and, if necessary, increase the memory setting (parameter -Xmx) for the opr-backend process.

Additionally, check the following:

Rules Topology Pane is Empty


No rule is selected in the Correlation Rules pane
No view is active in the Rules Topology pane

Indicators List is Empty


No configuration item type is selected in the Rules Topology pane
No indicator is defined for the selected CI type

Can't Save Correlation Rule


The rule is invalid or incomplete, for example:

The rule doesn't have at least one symptom event


The rule doesn't have a cause event
The topology path is invalid
The rule refers to a configuration item that's not resolvable

Correlation Generator Displays URL in Title Bar


This issue is related to the Security Settings of Internet Explorer. To display the title in place of the URL, go to Internet Options > Security > Internet Zone > Custom Level and enable Allow websites to open windows without address or status bars.

1.13.13. RTSM Gateway gets locked

Cause
When you try to run a query on large CI models, the business_impact_new service is busy for a few minutes. During this
time, the RTSM gateways get locked.

Solution
1. Go to Administration > RTSM Administration > Modeling > Modeling Studio.
2. Edit user preferences in Modeling Studio:
1. Go to Tools > User Preferences > General.
2. Set Show Hidden queries in the Modelling Studio to True and click OK.
3. Log out of OBM and log in again.
4. Depending on your environment, do one of the following to edit the business_impact_new query:
Delete the SLA branch from the query (including C2 CI) if you do not have SLA CIs.
Limit the depth to 4 (or smaller if required).
Limit the number of visited objects to 400000000 or smaller by editing the tql.compound.link.max.visited.objects
file.

1.13.14. Workspaces menu is empty


After the reconfiguration of OBM in a distributed environment, the Workspaces menu is empty.

Cause
This issue occurs if a new database is created on the data processing server, with the uimashup files being on the gateway
server.

Solution
If the database was recreated during the reconfiguration, run the following commands to copy the uimashup files to the
correct location:

Windows:

xcopy /S /Y "<OMi_Home>\conf\uimashup\import\loaded\*" "<OMi_Home>\conf\uimashup\import\toload"

Linux:

cp -rf /opt/HP/BSM/conf/uimashup/import/loaded/* /opt/HP/BSM/conf/uimashup/import/toload

A server restart is required.

1.13.15. Export My Workspace content to another system
To move My Workspace content between OBM systems, perform the following steps:

1. On the source gateway system, open the JMX console: http(s)://localhost:29000 . Log in to the JMX console using the
appropriate credentials.

2. Invoke Foundations:service=UIMDataLoader .

3. Invoke exportAllData and specify the following:

1. The path to the directory where OBM should save the configuration files for the exported data.

2. customerID = 1

You can also export a specific type of content, rather than all content, by using the exportEventsMetaData method
for events, the exportComponentsMetaData method for components, or the exportPagesData method for pages.

4. Go to the directory you specified in the previous step and find the following files:

EventsMetaData_<date>_<timestamp>.uim.xml

ComponentsMetaData_<date>_<timestamp>.uim.xml

PagesData_<date>_<timestamp>.uim.xml

5. Copy these files to the target system and save them under <OMi_Home>/conf/uimashup/import/toload in the corresponding
folder: Events, Components, or Pages.

6. On the target gateway system, open the JMX console: http(s)://localhost:29000 . Log in to the JMX console using the
appropriate credentials.

7. Invoke Foundations:service=UIMDataLoader .

8. Invoke loadAllData and specify customerID = 1 .

If you only exported a specific type of content, use loadEventsMetaData , loadComponentsMetaData , or loadPagesData .

9. Log in to OBM and go to the My Workspace area. All content exported from the source system should now be available
in the target system. If everything was imported correctly, the files were moved from the toload to the loaded folder and
there is nothing in the errors folder.

1.13.16. Troubleshoot Data Flow Probe


This section provides the following troubleshooting topics:

Cannot transfer Data Flow Probe from one domain to another


Discovery shows disconnected status for a Probe
BSM server and the Probe connection fails due to an HTTP exception
Discovery tab is not displayed
Data Flow Probe node name cannot be resolved to its IP address
mysqld.exe and associated files are not deleted
The Probe fails to start or fails to connect to server
Integration Probe not listed in Data Flow Probe Setup module tree
Troubleshoot PostgreSQL

1.13.16.1. Cannot transfer Data Flow Probe from one domain to another
Cannot transfer Data Flow Probe from one domain to another.

Cause
Once you have defined the domain of a Probe, you can change its range, but not the domain.

Solution
Install the Probe again:

1. If you are going to use the same ranges for the Probe in the new domain, export the ranges before removing the Probe.
2. Remove the existing Probe from RTSM. For more information on removing Probe, see the Remove Domain or Probe
button in Data Flow Probe Setup Window topic.
3. Install the Probe. For more information on installing the Probe, see the section about installing the Data Flow Probe in
the UCMDB Help.
4. During installation, give the new Probe a different name or delete the reference to Probe from the original domain.

Related topic
For more information on removing Probe, see the Remove Domain or Probe button on Data Flow Probe Setup
Window page.
For more information on installing the Probe, see Install the Data Flow Probe.

1.13.16.2. Discovery shows disconnected status for a Probe

Cause
We do not know the cause for this issue.

Solution
Check the following on the Probe machine:

1. That the Probe is running


2. That there are no network problems
3. If the probe status is Disconnected or Disconnected (being restarted). Search for restart messages in the
wrapperProbeGW logs.
4. If the probe does not restart, try to take probe thread dump from the disconnected time and search for the ProbeGW
Tasks Downloader thread.
5. If there is no probe thread dump, investigate the problematic timeframe in the wrapperProbeGw log. In particular:
1. Check if the probe tasks confirmer has been running for more than 5 minutes.
2. Check if some of the resources are being downloaded for more than 5 minutes.

1.13.16.3. BSM server and the Probe connection fails due to an HTTP exception
The connection between the Business Service Management (BSM) server and the Probe fails due to an HTTP exception.

Cause
The cause for this issue is unknown.

Solution
Ensure that none of the Probe ports are in use by another process.

1.13.16.4. The Discovery tab is not displayed


The Discovery tab is not displayed on the main page of Business Service Management.

Cause
The cause for this issue is unknown.

Solution
Install a license for the Probe. For more information, see "Licensing Models for Run-time Service Model".

1.13.16.5. Data Flow Probe node name cannot be resolved to its IP address
The suite cannot resolve a Data Flow Probe node name to its IP address. Due to this, the suite cannot discover the host, and
the Probe does not function in a correct manner.

Cause
We do not know the cause for this issue.

Solution
Add the host machine name to the Windows HOSTS file on the RTSM Data Flow Probe machine.

1.13.16.6. mysqld.exe and associated files are not deleted
After uninstalling the Data Flow Probe, the suite does not delete the mysqld.exe and associated files.

Cause
We do not know the cause for this issue.

Solution
To delete all files, restart the machine which has the installed Data Flow Probe.

1.13.16.7. The Probe fails to start or fails to connect to server
After updating the RTSM Server CUP, the Probe fails to start or fails to connect to server.

Cause
We do not know the cause for this issue.

Solution
The Probe's CUP version must match the RTSM Server's CUP version. If the CUP versions do not align, you must update the
Probe's CUP version. In some cases, you may have to deploy the CUP manually on a Probe.

1.13.16.8. Integration Probe not listed in Data Flow Probe Setup module tree
The Data Flow Probe Setup module tree does not list the Integration Probe when you try to check the Probe connection.

Cause
The Data Flow Probe Setup module displays Data Flow Probes for discovery. Integration Probes—i.e., Probes on Linux
machines, and Windows Probes configured for integration— do not display in the Data Flow Probe Setup module.

Solution
To see the connection of integration Probe, create a dummy integration point and verify that the Probe gets listed among the
Probes that you can select for the integration point (in the Data Flow Probe field).

1.13.16.9. Troubleshoot PostgreSQL


This section provides the following troubleshooting topics:

Unable to find the Data Flow Probe database scripts


Data Flow Probe database service cannot start

1.13.16.9.1. Unable to find the Data Flow Probe database scripts

Solution
The table below lists the Data Flow Probe database scripts. You can modify these scripts for administration purposes, both in
Windows and Linux environments.

The Data Flow Probe machine hosts these scripts in the following location:
Windows: C:\hp\UCMDB\DataFlowProbe\tools\dbscripts
Linux: /opt/hp/UCMDB/DataFlowProbe/tools/dbscripts
You should change the Data Flow Probe database scripts for specific administration purposes.

exportPostgresql [PostgreSQL root account password]
    Exports all data from the DataFlowProbe database schema to data_flow_probe_export.bin in the current directory.

importPostgresql [Export file name] [PostgreSQL root account password]
    Imports data from a file created by the exportPostgresql script into the DataFlowProbe schema.

enable_remote_user_access
    Configures the PostgreSQL Data Flow Probe account so that it can be accessed from remote machines.

remove_remote_user_access
    Configures the PostgreSQL Data Flow Probe account so that it can be accessed only from the local machine (default).

set_db_user_password [new PostgreSQL Data Flow Probe account password] [PostgreSQL root account password]
    Modifies the PostgreSQL Data Flow Probe account password.

set_root_password [new PostgreSQL root account password] [Current PostgreSQL root account password]
    Modifies the PostgreSQL root account password.
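For example, to export the probe database on a Linux probe; this is a sketch, the password argument is a placeholder, and the script files may carry a platform-specific extension (such as .sh or .bat):

cd /opt/hp/UCMDB/DataFlowProbe/tools/dbscripts
./exportPostgresql <PostgreSQL_root_account_password>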

1.13.16.9.2. Data Flow Probe database service cannot start

Cause 1
The hosts file on the machine must not contain "localhost" entries.

Solution 1
On the Data Flow Probe machine, open the hosts file:

Windows: %systemroot%\system32\drivers\etc\hosts
Linux: /etc/hosts

Ensure that you comment out all lines containing "localhost".
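For example, the commented-out entries might look like this (a sketch of a typical hosts file):

# 127.0.0.1   localhost
# ::1         localhost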

Cause 2
Microsoft Visual C++ 2010 x64 Redistributable is installed during the installation of the Probe. If for some reason you uninstall this redistributable, PostgreSQL stops working.

Solution 2
Check if you have installed the Microsoft Visual C++ 2010 x64 Redistributable. If not, reinstall it.

1.13.17. Downtime notifications are not sent to DES unless OBM processes are restarted
Problem
In OPTIC DL, when you enable the infrastructure setting Enable forwarding Downtime/Service Health Data to OPTIC
DL, the change isn't reflected immediately in the OBM process. When downtime occurs, the data isn't forwarded to OPTIC DL.

Solution
After enabling the infrastructure setting Enable forwarding Downtime/Service Health Data to OPTIC DL , login to the
OBM Pod and restart the oprAS process.

kubectl exec -ti -n $(kubectl get pods -A | awk '/omi-0/ {print $1,$2}') -c omi -- bash
<OBM_HOME>/opr/support/opr-support-utils.sh -restart oprAS

1.13.18. File format and extension pop-up appear in the graph_type excel

Problem
When you open the graph_type excel, you see the file format and extension pop-up.

Solution
Click Yes in the pop-up to see the data stored in the graph_type excel.

1.13.19. Errors appear in opr-configserver.log

Problem
After you install or upgrade OBM, you see errors in the opr-configserver log when you upload a content pack. You see the "Instrumentation was not added as binary part is empty" error message.

Solution
You can ignore these errors; when you launch the Content Packs UI, you will find the instrumentation in the respective Content Packs.

1.13.20. OBM Configurator tool doesn't terminate after timeout

Problem
If the installation of the AI Operations Management certificate on a Windows OBM system takes too long, the certificate
installation process isn’t terminated correctly and the OBM Configurator tool is stuck.

Solution
Perform the following steps:

1. Press Ctrl+C and abort the OBM Configurator tool.

2. Run the OBM Configurator tool as you were using it before by specifying all parameters, but with an extra parameter, --force.

1.13.21. Common keyboard shortcuts


You can use keyboard shortcuts to facilitate working with certain OBM user interfaces.

The following table lists common shortcuts of HTML5 OBM UIs. Each shortcut's functionality is specific to the listed UI context.
Not all shortcuts might apply to all HTML5 UIs.

These shortcuts do not apply to Java-based UIs.

Keyboard shortcut (Context): Description

Enter (Controls): Select or activate a UI control that is focused. For example, you can open the Edit Key Performance Indicator panel if the focus is on the Edit button.

Spacebar (Switches): Select or unselect a check button, or toggle an on/off switch.

Up Arrow or Down Arrow (Radio buttons): Switch the focus from one radio button to another.

Arrow keys: Depending on the context, the arrow keys behave differently. In general, use the arrow keys to navigate between items of equal semantic in an intuitive way. For example, when using the Top View component, use the keys to move from one CI to another.

Shift+Home or Shift+End (Text fields): Select the complete text.


1.13.22. Adding or deleting dashboard or favorite on one GW is not visible on other GWs

In OBM, adding and deleting custom dashboards and favorites are visible to PD only on the GW on which the action was
performed. The other GW PD UI does not display the new dashboard.

Cause
This is because PD maintains a cache for each GW server. After the dashboards or favorites are added or removed on one
GW server, the GW server cache is not updated.

Solution
To resolve this problem, perform the following steps on the Gateway Server:

1. Launch OBM in a browser.


2. Open a new tab in the same browser and launch http://<OMi_Gateway_Server>/OVPM/rest/1.0/admin/clearcache/all
3. Click Clear Cache.
4. Go to Administration > Operations Console > Performance Dashboard Mappings.
5. Check that the Performance Dashboard Mappings screen loads without errors or warnings.


1.13.24. PD does not find any entry point to forward data to BVD

Problem
PD queries data using the time zone of the OBM system. If the agent system is using a different time zone that is behind
OBM's time zone, PD does not find any entry point to forward to BVD.

Solution
Make sure that the agent and the OBM use the same time zone.
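To compare the time zones on both systems, the following checks may help (a sketch; the agent or OBM system can be Linux or Windows):

timedatectl | grep "Time zone"   # Linux
tzutil /g                        # Windows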


1.13.24. Missing footer inside variable picker

Problem
For the Chinese locale, sometimes not all strings are localized. Due to this, you will not see the footer inside the variable picker from where you can open the Policy Parameters page.

Solution
To resolve this issue, do one of the following:

Refresh the page and check if you can see the Chinese messages and the footer to open the Policy Parameters page.
Switch to another locale to manage policy parameters.


1.13.25. Conditional dashboard assignment fails

Cause
Conditional dashboard assignment fails if a static CI Property is selected while creating a conditional dashboard.

Solution
Do not select a static CI Property (for example, "Actual Deletion Period", "Deletion Candidate Period") while creating
conditional dashboards.


1.13.26. OBM and OMW connection issues


You are not able to connect Operations Manager (OMW) systems to Operations Bridge Manager (OBM) systems.

Symptom
When you connect OMW systems to OBM, the connection fails with the following error message:

Operation failed. HTTP Status: 500 (Internal Server Error). Internal server error. Details: Connection reset.

Solution
Follow the steps on the OMW system:

1. Enable TLS 1.2 support on the Windows system. Go to the page https://support.microsoft.com/en-us/help/3140245/update-to-enable-tls-1-1-and-tls-1-2-as-default-secure-protocols-in-wi and follow the instructions.
2. Install the latest OMW patches: OMW_203, OMW_204, and the OVCS binary.

Check the connection by following the steps:

1. On the OMW system: ovcert -trust <OBM_FQDN>
2. On the OBM system: ovcert -trust <OMW_FQDN>
3. On the OBM system: cp /opt/HP/BSM/conf/JRE/lib/java.security /opt/HP/BSM/conf/JRE/lib/java.security.bak
4. On the OBM system: cp /opt/HP/BSM/conf/JRE/lib/java.security.ORIG /opt/HP/BSM/conf/JRE/lib/java.security
5. On the OBM system: Restart OBM. Export the server certificate from the OMW system (for example, with 'openssl s_client -connect <OMW_HTTPS_URL> < /dev/null | openssl x509 -out /tmp/c.crt').
6. On the OBM system: Create a new connected server either in the UI or via the command line: '/opt/HP/BSM/opr/bin/ConnectedServer.sh -user admin -pw admin -a -certificatefile <OMW_CERT_FILE> -type OMW -name <OMW_NAME> -label <OMW_LABEL> -dns <OMW_FQDN> -iuser <OMW_INTEGRATION_USER> -ipw <OMW_INTEGRATION_PASSWORD>'


1.13.27. Troubleshoot Build Adapter Package


The procedure for building a new adapter requires complete and correct re-naming and replacing. Any error will likely affect
the adapter. The package must be unzipped and re-zipped correctly to act as a UCMDB package. Refer to the out-of-the-box
packages as examples. Common errors include:

Including another directory on top of the package directories in the ZIP file.

Solution: ZIP the package from the same directory level as the package directories, such as discoveryResources, adapterCode, and so on. Do not include another directory level on top of this in the ZIP file (see the example after this list).

Omitting a critical re-name of a directory, a file, or a string in a file.

Solution: Follow the instructions to create the package very carefully.

Misspelling a critical re-name of a directory, a file, or a string in a file.

Solution: Do not change your naming convention in mid-stream once you begin the re-naming procedure. If you realize
that you need to change the name, start over completely rather than trying to retroactively correct the name, as there
is a high risk of error. Also, use search and replace rather than manually replacing strings to reduce the risk of errors.

Deploying adapters with the same file names as other adapters, especially in the discoveryResources and
adapterCode directories.

Solution: You may be using a UCMDB version with a known issue that prevents mapping files from having the same
name as any other adapter in the same UCMDB environment. If you attempt to deploy a package with duplicate names,
the package deployment will fail. This problem may occur even if these files are in different directories. Further, this
problem can occur regardless of whether the duplicates are within the package or with other previously deployed
packages.
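For the first error above (an extra directory level inside the ZIP), packaging from inside the package root might look like this (a minimal sketch; <package_root> and MyAdapterPackage.zip are placeholders, and you would list every package directory your adapter actually contains):

cd <package_root>
# Zip the package directories directly so the archive root contains discoveryResources, adapterCode, etc.
zip -r ../MyAdapterPackage.zip discoveryResources adapterCode

Running unzip -l ../MyAdapterPackage.zip afterwards should show the package directories at the top level of the archive, with no extra wrapping folder.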


1.13.28. Logs
This section provides information about OBM logs.

OBM Logs - Overview


OBM records the procedures and actions performed by the various components in log files. The log files are usually designed
to aid Software Support when OBM does not perform as expected.

You can view log files with any text editor.

Log File Locations


Most log files are located in the <OBM_HOME>/log directory and sub-directories organized by component.

Log file properties are defined in files in the following directory and its subdirectories: <OBM_HOME>/conf/core/Tools/log4j .

In addition to application log files, there are web server log files. The web server log files are in the <OBM_HOME>/WebServer/logs directory.

Log File Locations in a Distributed Deployment


In single server installations, all OBM servers and their logs reside on the same machine. In the case of a distributed
deployment of the servers among several machines, logs for a particular server are usually saved on the computer on which
the server is installed. However, if it's necessary for you to inspect logs, you should do so on all machines.

When comparing logs on client machines with those on the OBM server machines, keep in mind that the date and time
recorded in a log are recorded from the machine on which the log was produced. It follows that if there is a time difference
between the server and client machines, the same event is recorded by each machine with a different time stamp.

Log Severity Levels


Each log is configured so that the information it records corresponds to a certain severity threshold. Because the various logs
are used to keep track of different information, each is preset to an appropriate default level.

Typical log levels are listed below from narrowest to widest scope:

Error. The log records only events that adversely affect the immediate functioning of OBM. When a malfunction occurs,
you can check if Error messages were logged and inspect their content to trace the source of the failure.
Warning. The log's scope includes, in addition to Error-level events, problems for which OBM is currently able to
compensate and incidents that should be noted to prevent possible future malfunctions.
Info. The log records all activity. Most of the information is routine and the log file quickly fills up.
Debug. This level is used by Software Support when troubleshooting problems.

The default severity threshold level for log files differs per log but is generally set to either Warning or Error.

The names of the different log levels may vary slightly on other servers and for various procedures. For example, Info may
be referred to as Always logged or Flow.

If required, you can change the log level in the respective properties file in the log directory: <OBM_HOME>/conf/core/Tools/log4j.

Log File Size and Automatic Archiving


A size limit is set for each type of log file. When a file reaches this limit, it is renamed and becomes an archived log. A new
active log file is then created.

For many logs, you can configure the number of archived log files that are saved. When a file reaches its size limit, it's
renamed with the numbered extension 1 (log.1). If there is currently an archived log with the extension 1 (log.1), it is
renamed to log.2, log.2 becomes log.3, and so on, until the oldest archived log file (with the number corresponding to the
maximum number of files to be saved) is permanently deleted.

The maximum file size and the number of archived log files are defined in the log properties files located in <OBM_HOME>/conf/core/Tools/log4j.

property.<MODULE>_fileMaxSize = 2000KB
property.<MODULE>_backupCount = 10

For example,

property.opr-backend_fileMaxSize = 2000KB
property.opr-backend_backupCount = 10

Application Server Log


The log file <OBM_HOME>/log/opr-as_boot.log logs start-up activities including running the application server process,
deployment, and start-up status, as well as the number of busy ports.

*.hprof Files
*.hprof files contain a dump heap of an OBM process's data structures. These files are generated by the Java virtual machine
(JVM) if a process fails with a Java Out Of Heap Memory condition.

You are rarely aware of a problem because the problematic process restarts automatically after a failure. The existence of many *.hprof files indicates that there may be a problem in one of the OBM components, and their contents should be analyzed to determine the problem.

If you run out of disk space, you can delete the *.hprof files.

Event Flow Logging for an Event


You can enable event flow logging (or flow trace) for an event by setting the custom attribute __TRACE__ . It may have any
severity level. By default, only events with the custom attribute __TRACE__ set are logged to the flow trace log files.

To enable event flow logging for all events, set the infrastructure setting Event Flow Logging Mode to file.

You can enable trace logging on the OM server or agent sending the event, or you can add the trace to the event at a later time. Whenever
this custom attribute is enabled on an event, trace output for this event appears in the following flow trace logs:

OBM data processing server: <OBM_HOME>/log/opr-backend/opr-flowtrace-backend.log


OBM gateway server: <OBM_HOME>/log/wde/opr-gateway-flowtrace.log

Tasks

Delete OBM Logs


You can delete all OBM log files after stopping OBM. This enables you to free up disk space. However, from a support
perspective, it's useful to save older logs.

Don't delete the log directory.

1. Stop OBM.
2. Delete all files under <OBM_HOME>/log . Don't delete the log directory.
3. Delete all .hprof files under /var/opt/OV/log/ (Linux) or %OvDataDir%\log (Windows).
4. Delete all files under <OBM_HOME>/WebServer/logs . Don't delete the logs directory.

Some files can't be deleted, because Apache owns them.


If the log directories fill up quickly, it's possible that you still have the loglevel set to DEBUG from troubleshooting an issue.
Change it back to its original value (usually INFO ) if you're done with troubleshooting.

If you have increased the log file size or number of files (for example, when you enabled DEBUG logging), change those back
to their original value (otherwise, more space may be consumed).

Change Log Levels


If requested by Software Support, you may have to change the severity threshold level in a log, for example, to a debug
level.

1. Open the log properties file in a text editor. Log file properties are defined in files in the following directory: <OBM_HOME>/conf/core/Tools/log4j.
2. Locate the loglevel parameter. For example,

loglevel=ERROR

3. Change the level to the required level. For example,

loglevel=DEBUG

4. Save the file.
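For example, switching the opr-so-forwarder module to debug logging from the command line might look like this (a sketch; adjust the properties file to the module you are troubleshooting, and note that the key name can differ between modules):

sed -i 's/^loglevel=.*/loglevel=DEBUG/' <OBM_HOME>/conf/core/Tools/log4j/opr-so-forwarder/opr-so-forwarder.properties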



1.13.30. Failed to create CMDB role

Problem
With the latest OBM versions, in an IdM-enabled environment, creating a role in OBM results in two corresponding roles in IdM: one for the OBM application and another for the CMDB application. In older OBM versions, the system maps only one IdM role to each OBM role for the OBM application.
During an upgrade from version 2022.05 to a newer OBM version, such as 24.2 or 24.4, the second IdM role that should be mapped to the OBM one does not show up.

Solution
To ensure that existing roles are properly reflected in IdM after the upgrade, either re-create the role(s) or modify a role in
OBM (for example, by changing the description), which will sync it to IdM as two roles. Next, update your existing groups in
IdM to assign both the OBM and CMDB roles.


1.13.31. Delete integration_admin user


If OBM is available and the integration_admin user within OBM is not a Super-Admin, delete the user. The integration_admin
user will be recreated automatically.

Restart omi-0 pod


Run the command to bring down OBM:

kubectl -n <namespace> scale statefulset omi --replicas=0

Run the command to start OBM:

kubectl -n <namespace> scale statefulset omi --replicas=2

Run this command to start OBM if HA is disabled:

kubectl -n <namespace> scale statefulset omi --replicas=1

Enable event forwarding


1. Run the following command to get the values.yaml file of your existing deployment configuration:

helm get values <deployment_name> -n <namespace> > <values_file_name>

2. Run the command to enable event forwarding:

helm upgrade <deployment_name> <chart> -n <namespace> --set noop=true -f <values_file_name>


1.13.32. Creating a BVD Connected Server using the CLI doesn't work

Creating a BVD Connected Server using the CLI opr-connected-server doesn't work.

Solution
1. Get the API Key from the BVD admin UI (referred to below as <apikey> ).
2. In the OBM UI, go to Administration > Setup and Maintenance > Connected Servers, and click Business Value
Dashboard in the left pane.
3. Click New.
4. Enter the required details.
For example, in an AI Operations Management deployment with OBM capability, enter the details as follows:
Display Label: Local BVD deployment
Identifier: Local_BVD_Deployment
Endpoint URL: https://bvd-receiver:4000/bvd-receiver/api/submit/<apikey>
5. Click Create.

Related topics
For more information on creating Connected Servers, see Connected Servers.


1.13.33. Service Health data flow from OBM to RAW tables of OPTIC DL

No data is available in the service health RAW tables (opr_hi_* or opr_kpi_*).

Solution

1. Ensure the OPR-SO-FORWARDER service is running on your Operations Bridge Manager (OBM). Run the command: <OBM_HOME>/tools/bsmstatus/bsmstatus. If you don't see the OPR-SO-FORWARDER service listed in the status, then:
1. Stop OBM. Run the command:
For Linux: sh /opt/HP/BSM/scripts/run_hpbsm stop .
For Windows: <Topaz_Home>\bin\SupervisorStop.bat (for nanny).
<Topaz_Home>\bin\SupervisorStopAll.bat (for UCMDB and nanny).
2. Re-run the config wizard. Run the command:
For Linux: sh /opt/HP/BSM/bin/config-server-wizard.sh .
For Windows: <Topaz_Home>\bin\config-server-wizard.bat .
3. While running the config-server-wizard.sh script, in the Server deployment page, select the Forwarding service to
OPTIC DL enabled checkbox.
4. Start OBM. Run the command:
For Linux: sh /opt/HP/BSM/scripts/run_hpbsm start .
For Windows: <Topaz_Home>\bin\SupervisorStart.bat .
You can view the OPR-SO-FORWARDER service running.

2. Ensure that the forwarding infrastructure setting Enable forwarding Downtime/Service Health data to OPTIC DL is enabled.
3. Ensure that the OpsB_serviceHealth content is installed in the Operations Bridge Suite setup, and verify that the opr_hi_* and opr_kpi_* tables are present in the Vertica DB (see the query sketch after this list).

Run the command on the master node to list the contents: ops-content-ctl list content. For more information, see Command line interface.
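A query along the following lines can be used to check for the tables in Vertica (a sketch; host, user, and database names are placeholders for your environment):

vsql -h <vertica_host> -U <vertica_user> -d <database> -c "SELECT table_schema, table_name FROM v_catalog.tables WHERE table_name ILIKE 'opr_hi_%' OR table_name ILIKE 'opr_kpi_%';"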

Note

It is possible that no new service health records are generated in OBM, which is why you don't see any records in the DB. To confirm, enable DEBUG in <OBM_HOME>/conf/core/Tools/log4j/opr-so-forwarder/opr-so-forwarder.properties; you should then see the following log lines every 5 minutes in the log file (<OBM_HOME>/log/opr-so-forwarder/opr-so-forwarder.log): "Got 0 hi entries from DB" / "Got 0 kpi entries from DB".


1.13.34. Operations Agent Health dashboard is not displayed when selecting an Operations Agent CI

Problem
The Operations Agent Health dashboard isn't displayed when selecting an Operations Agent CI in the Monitoring Health tab.

Solution
1. Go to Administration > Operations Console > Performance Dashboard Mappings .

2. Select the Operations Agent entry in the CI type tree on the left.

3. Move the Operations Agent Health dashboard to the first position in the list on the right.


1.13.35. PD graphing fails from OPTIC DL as content packs having PD artifacts to graph metrics from OPTIC DL fail to import

Problem
PD graphing from OPTIC DL fails because content packs that contain PD artifacts (MetricConfig) to graph metrics from OPTIC DL fail to import when OBM is upgraded from version 2020.10 or older.

Solution
Run the PD_Identity_Issue_Workaround.sql file in the event database and then run the command for uploading the content
pack: <OBM_HOME>/bin/opr-content-auto-upload.<sh/bat> -a -forceReload -uploadFolder <OBM_HOME>/conf/opr/content/en_US


1.13.36. OMi Server self-monitoring content pack shows errors and unresolved content

Problem
OMi Server self-monitoring content pack shows errors and unresolved content after upgrading OBM from versions older than
2022.05.

Solution
This issue doesn't affect functionality. To resolve this error, run the below commands in DPS to import the content pack
again.

For Windows
1. %topaz_home%/bin/opr-content-manager.bat -i %topaz_home%\conf\opr\content\en_US\OMi_Self_Monitoring.zip -user <Login name of the user required for authentication> -pw <Password for the specified user>

2. %topaz_home%/opr/bin/opr-assign.bat -user <Login name of the user required for authentication> -pw <Password for the specified user> -delete_auto_assignment -id d474d89b-26c3-46e5-b0f8-4772d63c5ea5

3. To verify if the auto assignment is deleted, run the below command.

%topaz_home%/opr/bin/opr-assign.bat -list_auto_assignment_by_view -view_name "OMi Deployment" -user <Login name of the user required for authentication> -pw <Password for the specified user>

For Linux
1. /opt/HP/BSM/bin/opr-content-manager.sh -i /opt/HP/BSM/conf/opr/content/en_US/OMi_Self_Monitoring.zip -user <Login name of the user required for authentication> -pw <Password for the specified user>

2. /opt/HP/BSM/opr/bin/opr-assign.sh -user <Login name of the user required for authentication> -pw <Password for the specified user> -delete_auto_assignment -id d474d89b-26c3-46e5-b0f8-4772d63c5ea5

3. To verify if the auto assignment is deleted, run the below command.

/opt/HP/BSM/opr/bin/opr-assign.sh -list_auto_assignment_by_view -view_name "OMi Deployment" -user <Login name of the user required for authentication> -pw <Password for the specified user>


1.13.37. OutOfMemoryError: GC overhead limit exceeded error

This issue can occur if you have configured OBM with PostgreSQL using TLS and the database server certificate contains Certificate Revocation List (CRL) Distribution Points (DPs). If these CRL DPs point to very large CRLs, the configuration wizard and post-installation process fail with the following error:

OutOfMemoryError: GC overhead limit exceeded

This issue is known to occur on Azure with Azure PostgreSQL Flexible Server database.

Cause
This issue occurs if the CRL is very large, for example, more than 35 MB.

Solution
To fix this issue, do the following:

1. Go to the directory where you have the values.yaml file for AI Operations Management deployment.
2. In the OBM Settings section, look for the obm.deployment.database.postgresCrlCheckEnabled parameter.
3. Set the value of this parameter to false to disable CRL checks while connecting to the database (see the values.yaml sketch after these steps). Before you disable the check, make sure you have read and agreed to the security implications.
4. Save the file.
5. Run the following command to upgrade the deployment for the changes done:

helm upgrade <deployment> /home/opsbridge-suite-chart/charts/opsbridge-suite-<version>.tgz --namespace <namespace> -f values.yaml

6. Run the following command to delete the omi pod:

kubectl -n <namespace> delete pod -l app.kubernetes.io/name=omi
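For steps 2 and 3, the relevant values.yaml fragment might look like the following (a sketch that only shows the nesting implied by the parameter name obm.deployment.database.postgresCrlCheckEnabled; your file contains additional settings):

obm:
  deployment:
    database:
      postgresCrlCheckEnabled: false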


1.14. Troubleshoot OMT


To troubleshoot OMT, see Troubleshoot OMT.


1.15. Troubleshoot Stakeholder Dashboards and OPTIC reports


1.15.1. Operations Cloud doesn't load latest content

You access Operations Cloud and the UI isn't loading the information properly.

For example:

The Help and About page displays an outdated version.


Pages aren't loading and showing "Invalid widgets".
The instances and categories are missing in the side navigation panel.

Cause
The new versions of Operations Cloud content packs aren't properly recognized and thus not uploaded.

Solution
Perform these steps to resolve this issue:

Note

Replace <application namespace> with the appropriate namespace in the commands below.

1. Run the following command to touch all configuration maps:

kubectl -n <application namespace> exec -ti $(kubectl -n <application namespace> get pods -l=app.kubernetes.io/name=content-service|tail -1|awk '{print $1}') -c content-service -- bash -c 'touch /tmp/uif-content/*'

2. Run the following command to touch a configuration map by name (for example, obmContentPack.json, aecContentPack-1.4.2.json, …):

kubectl exec -it $(kubectl -n <application namespace> get pods -l=app.kubernetes.io/name=content-service|tail -1|awk '{print $1}') -c content-service -n <application namespace> -- bash -c "touch /tmp/uif-content/<Name of the Json file>"

If the issue persists even after running either of the above two commands, restart the pod. To restart the pod, do the following:

1. Run the following command to stop the pod:

kubectl -n <application namespace> scale deployment uif-contentservice-deployment --replicas=0

2. Run the following command to start the pod:

kubectl -n <application namespace> scale deployment uif-contentservice-deployment --replicas=1


1.15.2. bvd-redis pod goes to CrashLoopBackOff state

On fresh install bvd-redis pod goes to CrashLoopBackOff state.

Cause
The issue occurs when the randomly generated password begins with the '--' characters. The password is then treated as another argument instead of a value, causing the pod startup to fail.

Solution
1. Run the following command to login to any container that's in running state:

kubectl exec -it <pod-name> -n <namespace> bash

2. Update the redis password with the following command:

update_secret redis_pwd <newPasswordValue> bvd-secret

After you update the secret, the bvd-redis pod that is in CrashLoopBackOff state comes into the running state at the next pod restart attempt.
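If you need a new random value, a hex-encoded string can never start with '--'. Assuming openssl is available in the container from step 1, a sketch:

update_secret redis_pwd "$(openssl rand -hex 16)" bvd-secret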


1.15.3. Localization isn't working while exporting the report to PDF

Make sure to adjust the language setting before exporting the translated dashboard to a PDF.

To do this:

1. Navigate to side navigation panel, click Administration > Setup & Configuration > Settings.
2. Under User settings, select the localized language for the dashboard.

Failing to do so will result in the PDF being generated in English instead of the desired localized language.


1.15.4. First Stakeholder Dashboard is blank in Firefox browser

The first Stakeholder dashboard that you access using the Firefox browser appears blank.

Cause
This issue occurs in the Firefox browser.

Solution 1
Use the Chrome or Edge browser instead.

Solution 2
1. Make sure you have two Stakeholder dashboards.
2. After login, navigate to a different dashboard first and then to the one you want to see.


1.15.5. BVD CLI and Web to PDF CLI exit with error

BVD and Web To PDF CLIs exit with the following error instead of performing the operation specified by the command line
parameters.

Error message: unable to get local issuer certificate

Cause
To identify a known server, the CLIs store the server certificates in a subdirectory of the user's home directory. If the server certificate changes after renewal, the CLIs can't identify the certificate and will display the "unable to get local issuer certificate" error message.

Solution
1. Go to your home directory.
2. For bvd-cli, you'll find a bvdCliCert directory.
3. For PDF print, you'll find a Web2PdfCert directory.
4. From that directory, delete the file that has the same name as the server that you're trying to use with the CLI (see the sketch after these steps).
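For example, if the server FQDN is myserver.example.com (a hypothetical name) and you are using bvd-cli, the cleanup might look like this (a sketch; the exact file name depends on how the CLI stored the certificate):

rm ~/bvdCliCert/myserver.example.com*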


1.15.6. Data table widget becomes blank on editing

When you select a row in a data table widget and try to edit the widget, the widget becomes blank.

Cause
The issue occurs whenever you edit the data table with a row selected and the attached predefined query has a dimension.

Solution
Change the time range in the time selector or remove the context that you applied for the dimension.


1.15.7. Requesting PDF of a UIF page without specifying the file name throws an error

When you request a PDF of a UIF page using the WebtoPDF CLI command without specifying the file name, the Web to PDF service throws an error.

Cause
When you don't specify a file name while requesting a PDF of a UIF page, WebtoPDF fails to save the file.

Solution
Provide the file name in the CLI command using the --out option.

Example:

[pdf-print|pdf-print.exe] --suite_url <suite url> --url <URL of the page> --user <webpage username> --pass <webpage password> --out <output file location>


1.15.8. BVD pod is in CrashLoopBackOff state as migration table is locked

The bvd-controller-deployment pod is in CrashLoopBackOff state. The log of the databasecreation container in that pod shows the following entries:

2022-11-10T07:44:05.708Z bvd:audit:migrateManager Current DB schema version: 0
Can't take lock to run migrations: Migration table is already locked
2022-11-10T07:44:05.731Z bvd:error:db Upgrade to DB schema version 20170907110000 failed. Error: ERROR: Migration table is already locked
2022-11-10T07:44:05.732Z bvd:error:init Error during startup of init: ERROR: Migration table is already locked
2022-11-10T07:44:05.796Z bvd:error:init Init process is aborting now

Cause
The databasecreation container, which is responsible for creating/updating the database tables, got interrupted before it
could finish. This will happen only after an installation or upgrade.

Solution
If you are sure migrations aren't running, you can release the lock manually by running the command: knex migrate:unlock

For new installation:

1. Scale bvd-controller-deployment to 0.
2. Delete all tables from the BVD database.
3. Scale bvd-controller-deployment to 1.

This should trigger the creation of the BVD database tables.

For upgrade:

1. Scale bvd-controller-deployment to 0.
2. Restore the BVD database from the backup taken before the upgrade.
3. Scale bvd-controller-deployment to 1.

This should trigger the update of the BVD database tables to the current version.
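The scaling steps in both procedures can be performed with kubectl, for example (a sketch; replace <namespace> with your application namespace):

kubectl -n <namespace> scale deployment bvd-controller-deployment --replicas=0
# ... delete the tables or restore the backup as described above ...
kubectl -n <namespace> scale deployment bvd-controller-deployment --replicas=1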


1.15.9. No valid trusted impersonation between user

Establish trust relationship in IdM to allow impersonation, which is necessary for PDF and CSV export. If you don't establish
the trust, then in the IdM error log file you find the following error message:

2022-08-17T05:05:24.893+0000 ERROR [https-jsse-nio-8443-exec-9] com.hp.ccue.identity.service.IdentityServiceImpl [] - No valid


trusted impersonation between user: 2c9082888146c0de0181481584d008e8 and trustor admin

2022-08-17T05:05:24.895+0000 ERROR [https-jsse-nio-8443-exec-9] com.hp.ccue.identity.web.api.v3.IdentityController [] -


Authentication failed: no valid trust

Cause
The log shows that the suite admin (trustee) tries to impersonate the admin (trustor), but there is no trust relationship between the groups that these two users belong to.

For example:

“suiteadmin” is added only to group “suitegroup”


“admin” is added to groups Administrators , SuiteAdministrators , admin

The following table illustrates the trust established between groups:

Trustor | Trustee
PreSales | superIDMAdmins
Administrators | superIDMAdmins
superIDMAdmins | superIDMAdmins

Solution
Perform any of the following to resolve the issue:

Add the user suiteadmin to the group superIDMAdmins, to match trust rule 2.

Create another trust between one of the groups of admin (Administrators, SuiteAdministrators, admin) and the group of suiteadmin (“suitegroup”).


1.15.10. BVD does not show the correct license information

BVD doesn't show the correct license information and the bvd-controller logs the following errors:

bvd:error:bvd-ap-controller Error getting token from IDM ERROR: getaddrinfo ENOTFOUND <external suite hostname>
bvd:error:bvd-ap-controller Aborting steps to get the feature details

Cause
AP-Bridge accesses IdM using an external hostname instead of an internal name. The external suite hostname isn't DNS
resolvable from within the Kubernetes cluster.

Solution
Make sure that AP-Bridge uses the internal IdM hostname and that it's resolvable.
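To check resolution from inside the cluster, you can run a lookup from one of the BVD pods (a sketch; getent must be available in the container image):

kubectl -n <namespace> exec -ti <bvd-controller pod> -- getent hosts <IdM hostname>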


1.15.11. Please try after some time-server is busy

Cause
If the server receives more than one request at the same time, the following error appears:

ERROR: Please try after some time…server is busy

Solution
Wait until the first CLI request is complete before sending another Web to PDF CLI request.


1.15.12. Request to the server fails


Request to the server fails. Unable to process the request.

Cause
If you enter a wrong proxy configuration in the Web to PDF CLI, the current request and subsequent requests with the wrong proxy to the server will fail.

Solution
Make sure to type a correct proxy configuration in the Web to PDF CLI.


1.15.13. Unable to log in to BVD console. Getting 403 error

When a user tries to log in to BVD, the following error might occur:

{"error":true,"message":"Forbidden"}

bvd-www-deployment log file has the following entries:

bvd:error:idmService Failed to get IDM groups of user admin. Status Code:403. Error Detail:Access is denied

bvd:passportIdMStrategy Error verifying authentication Error: Status Code:403. Error Detail:Access is denied

Cause
The integration user credential has been changed and the user isn't added to any group, so it has no permission in the system to validate access to the BVD console, which results in a 403 Forbidden error.

Solution
Update the integration_admin settings in the database and restart the bvd-www-deployment pod. You will then be able to log in to the BVD console successfully.
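The restart can be triggered, for example, like this (a sketch; replace <namespace> with your application namespace):

kubectl -n <namespace> rollout restart deployment bvd-www-deployment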


1.15.14. Scheduled jobs are deleted if the schedules fail

If the scheduled job fails, you don't get the export result, and the schedule gets deleted.

Cause
If a scheduled export fails (BVD or SMTP server not reachable, wrong SMTP configuration, etc.), the server will retry to
execute the schedule five times before giving up. When the server gives up, it will delete the schedule without notice.

Solution
If you're missing the export result and the schedule is gone, request your administrator to inspect the WebToPDF-
deployment logs. If necessary, fix these reasons together with your admin and create that schedule again.


1.15.15. Names of the report and the exported CSV file are different during cross launch

When you create a CSV export for a BVD report that was opened as a drill-down report from another report, the created export has the name of the original report and not the name of the actual exported report.

Solution
Set the current child report title for the CSV zip file.


1.15.16. Link provided in the mail along with the scheduled report does not load the complete report

The link provided in the mail along with a scheduled report doesn't load the complete report. The scroll bar of the widget group isn't working, and therefore only the first page of data is visible.

Cause
The URL contains &page=1 . Hence, the complete report isn't loaded.

Solution
Remove &page=1 from the URL and reload the page. This will enable the scroll bar of the widget group, allowing you to see
all data by scrolling through it.


1.15.17. Popup notifications are not shown even though they are enabled

Popup notifications do not appear on the UI, even if they are enabled.

Cause
The browser and the server time are not in sync. If the server time is more than 10 seconds after the browser time, the popup cannot be displayed.

Solution
You need to sync the server and browser time for the notifications to pop up.
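On a Linux server, enabling NTP-based time synchronization might look like this (a sketch; your environment may use chrony, ntpd, or a virtualization-level time source instead):

timedatectl set-ntp true
timedatectl status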


1.15.18. Number of bytes received from Vertica exceeded the configured maximum

When customers run a query for OOTB reports, they sometimes come across the following error:

The number of bytes received from Vertica exceeded the configured maximum. Terminated connection. Received bytes: 5247312; Allowed
maximum: 5242880

Cause
The number of bytes received from Vertica exceeded the following configured limit:

Query result format "Default": 5 MB


Query result format "Use in Widget Group": 20 MB

Solution
To fix the issue, apply any one of the following solutions:

Solution 1: Reduce the response size

To reduce the response size, change the query so that it returns fewer data. Example: Remove unused data fields or limit the
number of rows returned.

Solution 2: Increase the maximum response limit through the QUERY_RESPONSE_LIMIT environment variable.

To increase the query response limit:

1. Run the following command to edit the bvd-quexserv deployment:

kubectl -n <namespace> edit deployment bvd-quexserv

2. Locate the following two lines:

- name: DEBUG
  value: bvd:error*,bvd:audit*

3. Add the following after them:

- name: QUERY_RESPONSE_LIMIT
  value: "100000"

4. Replace 100000 with the required query size in bytes. Make sure to retain the indentation.
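After the edit, the container's environment section may look like the following (a sketch; the indentation must match the surrounding YAML, and depending on your kubectl version the numeric value may need to be quoted as a string):

env:
- name: DEBUG
  value: bvd:error*,bvd:audit*
- name: QUERY_RESPONSE_LIMIT
  value: "100000"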


1.15.19. Certain passwords provided during application configuration for Vertica do not work with BVD

When the Vertica database connection is configured via the suite configuration, all installed capabilities can access the
Vertica database except BVD. In the BVD UI, the following error is displayed:

Database authentication failed

However, entering the same password (used in the suite configuration) in the BVD Vertica connection settings slide in works
without any issues.

Cause
BVD falsely assumes that the password provided is base64 encoded if the password contains the characters a-z, A-Z, 0-9, /
and +, and the length of the password is a multiple of 4.

Solution
Follow these steps to fix this issue:

1. Change the Vertica password to include another character besides a-z, A-Z, 0-9, / and +, or use a password with a
length that isn't a multiple of 4.
2. Reconfigure the suite to use the changed password.


1.15.20. No data in RUM BVD dashboards

Issues
The two RUM dashboards, RUM_PerApp and RUM_AllApps don't show any data.

The user can’t apply parameters to the dashboards.

After selecting an application, the Apply button stays disabled.

Solution
1. Go to Rum_AllApps_Dashboard
2. Click Settings and select Data Collectors.
3. In the Data Collectors, choose and delete these three queries:
Application #1 (app01)
Application #2 (app02)
Application #3 (app03)

4. Click and select Create Parameter Query.


1. Add the following values:

2. In the Query field, add the value: select distinct(application_name) as app_name from opsb_rum_page
3. Click RUN.
4. Scroll down, in both the Value column and Label column drop-down choose app_name.

5. Retain default values for other parameters.


6. Click SAVE.

5. Click and select Create Parameter Query from the menu to create the second query.
1. Add values


2. In the Query field, add the value: select distinct(application_name) as app_name from opsb_rum_page
3. Click RUN.
4. Scroll down, in both the Value column and Label column drop-down choose app_name.

5. Retain default values for other parameters.


6. Click SAVE.

6. Click and select Create Parameter Query from the menu to create the third query.
1. Add values

2. In the Query field, add the value: select distinct(application_name) as app_name from opsb_rum_page
3. Click RUN.
4. Scroll down, in both the Value column and Label column drop-down choose app_name.


5. Retain default values for other parameters.


6. Click SAVE.

After you create the three new parameter queries you will see data in the RUM BVD dashboard.


1.15.21. BVD data and statistics aging issue


The following were the issues with BVD data and channel statistics aging:

Issue 1: The format of the next data aging and channel statistics aging time read from the DB isn't valid. After a
restart, the next data aging happens in 10 days and statistics aging in 24 h regardless of what's configured in the UI.
Issue 2: All next schedules will happen in 10/1 day intervals, unless the intervals get changed in the UI again.
Issue 3: In the System settings UI, the Save button doesn't get enabled to save the changes, if only Aging data gets
modified.

Issue 1
After a restart, the next data aging happens in 10 days and statistics aging in 24 h regardless of what's configured in the
UI. After restarting the system, the BVD controller logs the following:

2021-07-21T09:12:13.595Z bvd:error:controller The format of the next data aging time read from the DB is not valid.
Time read: "2021-07-30T10:52:14.308Z"
2021-07-21T09:12:13.602Z bvd:error:controller The format of the next channel statistics aging time read from the DB is
not valid.

Cause
The format of the next data aging and channel statistics aging time read from the DB isn't valid.

Solution
Use a database trigger to strip the quotes before inserting/updating the bvdSettings table.

For Postgres:

SET search_path TO <bvd schema name>;


CREATE OR REPLACE FUNCTION MF_OCTCR19G1396476() RETURNS trigger AS $MF_OCTCR19G1396476_TRG$
BEGIN
IF NEW.key like 'next%Time' THEN
NEW.value = replace(NEW.value,'"','');
END IF;
RETURN NEW;
END;
$MF_OCTCR19G1396476_TRG$ LANGUAGE plpgsql;
CREATE TRIGGER MF_OCTCR19G1396476_TRG BEFORE INSERT OR UPDATE ON "bvdSettings"
FOR EACH ROW EXECUTE PROCEDURE MF_OCTCR19G1396476();

For Oracle:

CREATE OR REPLACE TRIGGER <bvd schema name>.OCTCR19G13964676


BEFORE INSERT OR UPDATE ON <bvd schema name>."bvdSettings"
FOR EACH ROW
BEGIN
IF :new."key" like 'next%Time'
THEN :new."value" := replace(:new."value",'"');
END IF;
END;

To remove the trigger:

For Postgres:


SET search_path TO <bvd schema name> ;


DROP TRIGGER MF_OCTCR19G1396476_TRG;
DROP FUNCTION MF_OCTCR19G1396476();

For Oracle:

DROP TRIGGER <bvd schema name>.OCTCR19G13964676;

Issue 2
All the next schedules will happen in 10/1 day intervals, unless the intervals get changed in the UI again

Cause
BVD controller doesn't load configured aging interval on startup.

Solution
Workaround 1: After bvd-controller-deployment is started/restarted, change the aging settings in the UI and save them. This makes the controller pick up the changed settings. You must repeat this after each start/restart of bvd-controller-deployment.
Workaround 2: Configure the aging time range with environment variables to overwrite the default values. For this, you need to edit the bvd-controller-deployment using the following command:

kubectl edit deployment -n <suite namespace> bvd-controller-deployment

Search for the following:

containers:
- args:
- controller
env:

Add the following after the "env:" in the next line:

- name: BVD_AGING_CHANNEL_STORAGE_TIME
  value: "2"
- name: BVD_AGING_PURGE_OLDER
  value: "2"

Note

Keep the same indentation as the other lines after "env:". Replace the value 2 with the actual interval in days that you want to configure. In the UI, BVD_AGING_CHANNEL_STORAGE_TIME is named Data channel statistics and BVD_AGING_PURGE_OLDER is named Data records.

After saving the changes, when the BVD controller gets restarted, the changed settings are applied.

Issue 3
In the System settings dialog box in the UI, the Save button is not enabled if you modify only the Aging data.

Solution
Add a space character to the custom CSS field and remove it again. This will enable the Save button.


1.15.22. Vertica certificate issue


When configuring TLS for Vertica you can experience issues with the following scenarios:

1. Common Name (CN) of Certificate Authority (CA) and server certificate are the same
2. CN of CA and server certificate differ
3. CN of CA is empty

CN of CA and server certificate are the same


bvd-quexserv crashes with the following error in the log file:

FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory.

Cause

The issue arises, if the common name (CN) of the certificate authority (CA) is the same as the CN of the server certificate.
This is a third-party issue in NodeJS: https://github.com/nodejs/node/issues/37757

Solution

Use different CNs for the CA and server certificate.
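To compare the CNs before configuring TLS, you can inspect both certificates (a sketch; the file names are placeholders):

openssl x509 -noout -subject -in <ca_certificate.pem>
openssl x509 -noout -subject -in <server_certificate.pem>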

Certificate verification issue


bvd-quexserv shows the following error in the log file:

ERROR: unable to verify the first certificate

Cause

The CA signs the server certificate, but the CA isn't in BVD's trusted list.

Solution

Add the CA (and if necessary, the whole certificate chain) to BVD during the configuration of the Vertica connection. You can
add multiple certificates to the same file.

CN of CA is empty
bvd-quexserv shows the following error in the log file:

ERROR: Found an empty CN, Please use proper Issuer/Subject CN for the server(vertica.example.com) certificate for
database(vertica).

Cause

This is a third party issue in NodeJS: https://github.com/nodejs/node/issues/37025.

Solution

Use a non-empty CN for the CA. Give it an arbitrary value (different from the CN of the server certificate) to avoid this error.


1.15.23. Vertica DB connection fails with self-signed certificate

When you test the Vertica connection in the predefined query UI with TLS enabled using a self-signed certificate, you might get the following error in the UI:

TLS connection failed: provided certificate is not correct

In the bvd-quexserv log, you can find the following error:

bvd:error:VerticaConnector Cannot connect to vertica: Error: self signed certificate

Cause
This error occurs when Operations Cloud tries to make an HTTPS request to the Vertica server with a wrong or faulty self-
signed SSL certificate. Operations Cloud rejects such certificates by default.

Solution
Operations Cloud rejects the Certificate Authority (CA) with an empty Common Name (CN) field regardless of any other
attributes within the certificate. To resolve this issue, assign a valid domain name to the Common Name (CN) field in the SSL
certificate.


1.15.24. BVD reports failed to load with a red banner without any error

When viewing a dashboard, a red banner shows up without any error and bvd-quexserv pod logs don't show any log
messages for the corresponding time.

Cause
This is a suspected memory leak issue in the bvd-quexserv .

Solution
Run the following command:

kubectl rollout restart deployment bvd-quexserv -n <namespace>


1.15.25. Vertica Database connection fails


The DB test connection for Vertica TLS fails with any one of the following error messages:

Not able to connect to Vertica


None of the Vertica cluster node is reachable
Vertica authentication failed

In the Quexserv log, you can find the following error message for the DB connection failure:

Error: The sending password for "vertica", encryption algorithm MD5 does not match the effective server
configured encryption algorithm SHA512

Cause
BVD doesn't support SHA512-encrypted passwords for Vertica connections. It supports only MD5 as the encryption algorithm for authentication with Vertica.

Solution
Run the following query on the Vertica database to alter the security algorithm to MD5 for the username used to connect
BVD to that Vertica:

ALTER USER username SECURITY_ALGORITHM 'MD5' IDENTIFIED BY 'newpassword' REPLACE 'oldpassword';

If you changed the password for that user, configure BVD with the new password in the respective environment. If the Vertica connection details were configured in the BVD UI, update the password there; if they were configured during the suite installation, update the configuration there.


1.15.26. BVD pods failing with error WRONGPASS invalid username-password pair

Whenever the bvd-www, bvd-explore, or bvd-receiver PODs aren't running, the log files contain the following error:

[ioredis] Unhandled error event: ReplyError: WRONGPASS invalid username-password pair

Cause
The redis password used by the BVD pods doesn't match the password configured in redis.

Solution
Delete the bvd-redis-deployment POD using the following command:

kubectl delete pod -n <suite namespace> <bvd-redis-deployment pod name>

This will delete the redis POD and will trigger Kubernetes to create a new one. The new redis POD will then use the same
password as the BVD PODs.


1.15.27. Uploading a dashboard locks the SVG file on disk

On the Internet Explorer 11 browser, uploading SVG files locks the files in the file system. For example, after uploading an SVG file in Internet Explorer 11, when you click Apply or Save in the Dashboards page and then try to export to the file from Visio again, the Visio changes aren't saved to the SVG file.

Cause
The issue is due to Internet Explorer 11 browser limitation.

Solution
After saving the uploaded dashboard, refresh the Dashboards page, for example by pressing F5.


1.15.28. Blank SVG files or shapes


When editing a dashboard in Visio, the dashboard is blank or shapes are missing after it's exported as an SVG file.

Cause
Visio supports styles for shapes that the SVG standard doesn't. Applying one of these styles to a shape can lead to missing shapes or empty dashboards after exporting to SVG. This happens, for example, to all shapes with shadows.

Solution
Remove the unsupported style or review the Microsoft Visio documentation for possible workarounds.


1.15.29. Dashboard loading crashes or blocks browser

When sending a huge amount of data to a data channel at a high frequency, the dashboard containing multiple widgets with this data channel blocks or even crashes the browser.

Cause
This issue can occur due to one of the following possible causes when sending the data to BVD:

1. Cause 1: Sending unneeded data


2. Cause 2: Update frequency not matching widget size

Solution

Solution for sending unneeded data


If possible, send only required data to BVD to display the dashboard. If you send additional data, this data is also sent to the
browser during the initial loading and every update of the dashboard. For example, don't send JSON data of several hundred
KB to BVD if you want to display only one string and a number.

Solution for update frequency not matching widget size


During the initial loading and every update of a dashboard, Line Chart widgets have to initially load all data required to
display that widget. Due to this, you must make sure that the update frequency of the data channel matches the size and
time of the Line Chart displayed in the dashboard. For example, if a Line Chart is about 100 pixels in size, the data channel
shouldn't hold more than 100 updates for the given time. If the period is one month, the channel shouldn't update more often
than three times a day. If there are more updates, too much data transfer between the server and the browser might block
the browser if multiple MB are reached. As a consequence, the Line Chart becomes unreadable because it's technically
impossible to display more data points.


1.15.30. Exchange Certificate in Vertica


When you set up a Vertica database connection in the Data Collector, use a CA-issued TLS certificate instead of a self-signed one. Self-signed certificates are a security risk because the same entity that issues the certificate also signs it.

Certificate Authority (CA) issued certificates are safer because another entity can verify the issued certificate.

You can exchange the self-signed certificate in Vertica with a CA certificate as follows:

1. Prerequisite. You must have a CA private key and public certificate ready.
2. Log on to Vertica's Management Console as the administrator.
3. On the home page, click Settings.
4. In the left panel, click SSL certificates.
5. Click Browse to import the new key, and click Apply to apply the change.
6. Restart the Management Console.

For more information and additional details, review the Vertica documentation.
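Before importing the key and certificate, you can optionally verify the CA-issued certificate on the command line. The file names below are placeholders for your own certificate and CA files:

openssl x509 -in <server_certificate>.crt -noout -subject -issuer -dates
openssl verify -CAfile <ca_certificate>.crt <server_certificate>.crt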


1.15.31. Vertica RO user password does not get updated

The RO user password doesn't get updated when you change the password in the Helm values and redeploy the chart.

Solution
Log in to the BVD UI and change the password of the database connection in the Predefined query UI.


1.15.32. Administration menu doesn't appear in the left panel

The Administration menu doesn't appear in the left panel of Operations Cloud.

To verify, run the following command:

kubectl logs <uif-contentservice-deployment-xxxxx> -n <application namespace> -c content-service

Check if the uif-contentservice log file displays a parse error.

Cause
This issue is because the baseCP.json file isn't entirely copied during the uif-contentservice-deployment pod start up.

Solution
Follow these steps to resolve this issue:

1. Log on to the control plane.


2. Run the following command:
kubectl exec -it <uif-contentservice-deployment-xxxxx> -n <application namespace> bash
3. Run the following command:
cd /var/bvd/uif-content
4. Open the baseCP.json file and verify if the file is complete. If the file isn't complete, run the following command to copy
the complete file manually:
cp /bvd/cm/initialContent/baseCP.json /var/bvd/uif-content/__baseCP.json
5. Wait for five minutes and then refresh the report.


1.15.33. Connection timed out


BVD reports fail to load with a connection timeout error. bvd-quexserv reports the following error for a query taking more than 30 seconds to execute:

' Error ETIMEDOUT : Connection timed out '

Cause
The default query execution request timeout of 4 minutes isn't applied.

Solution
Set the request timeout explicitly for the bvd-www and bvd-quexserv deployments by running the following commands:
kubectl set env deployment/bvd-www-deployment -n <namespace> REQ_TIMEOUT=240000
kubectl set env deployment/bvd-quexserv -n <namespace> REQ_TIMEOUT=240000
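The value 240000 is the timeout in milliseconds (4 minutes). To confirm that the variable is set on both deployments, you can list their environment variables:

kubectl set env deployment/bvd-www-deployment -n <namespace> --list | grep REQ_TIMEOUT
kubectl set env deployment/bvd-quexserv -n <namespace> --list | grep REQ_TIMEOUT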


1.15.34. Date type parameter using UTC time zone

Date type parameter is using UTC time zone instead of server time zone.

Cause
BVD treats Vertica timestamp data type values as being in the UTC time zone. Otherwise, different time stamps would be displayed for the same value in line charts and text widgets.

Solution
Use the to_char Vertica function to convert timestamp data type to a string.

Example: to_char(to_timestamp(timestamp_utc_s), 'YYYY-MM-DD HH:MI:SS')


1.15.35. Request to the server fails


Request to the server fails. Unable to process the request.

Cause
If you enter a wrong proxy configuration in the Web to PDF CLI, the current request and subsequent requests that use the wrong proxy will fail.

Solution
Make sure to enter the correct proxy configuration in the Web to PDF CLI.


1.15.36. Processing request from server: Request failed with status code 500

Cause
If the URL or the user credentials of the BVD web page entered in the Web to PDF CLI are wrong, the following error appears:

ERROR: Processing request from server: Request failed with status code 500

Solution
Make sure to enter the correct URL and user credentials of the BVD web page in the Web to PDF CLI.


1.15.37. Processing request from server: Request failed with status code 404

Cause
If the URL entered in the CLI is wrong, the following error appears:

ERROR: Processing request from server: Request failed with status code 404

Solution
Make sure to enter the correct URL in the Web to PDF CLI.


1.15.38. Server busy error

Cause
If the server receives more than one request at the same time, the following error appears:

ERROR: Please try after some time…server is busy

Solution
Wait until the first CLI request is complete before sending another Web to PDF CLI request.


1.15.39. Exporting report to PDF fails with warning message

When a user attempts to export a report to PDF using the following command:

/pdf-print-linux --url https://<external_hostname>:<port>/bvd/#/show/<Report Name> --suite_url https://<external_hostname>:<port> --user <BVD Username> --pass <Password> --strict_host_check no --out <Output file location>

Web to PDF service displays the following warning message:

(node:9837) Warning: Accessing non-existent property 'padLevels' of module exports inside circular dependency

Cause
The issue is due to third-party Node.js packaging.

Solution
You can ignore the warning.


1.15.40. WebtoPDF not generating PDF when port number is not specified in the URL

WebtoPDF fails to generate a PDF when the user doesn't specify a port number in the URL, and displays the following error message:

Failed to get the idm token Error: Request failed with status code 502

Cause
If the user doesn't specify a port in the URL, the WebtoPDF service doesn't use the default value 443 for the port number to generate the PDF.

Solution
You need to specify the default port number 443 in the command.

Example:

[pdf-print|pdf-print.exe] --url "https://batvm.swlabs.com:19443/bvd/#show/Welcome?params=none" --suite_url https://suite.example.com --user admin --pass Password@123


1.15.41. WebtoPDF not generating PDF


Web to PDF doesn't generate PDF output when the total number of pages is greater than a single digit.

Cause
This is due to the slow response time of the server.

Solution
Use the --SaveOnServer option and set it to "True". This parameter saves the PDF on the server, in the default location of the Suite container. You can find the output PDF file under reports, for example: /var/vols/itom/cdf/vol3/ips/reports.
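For example, a call with --SaveOnServer might look as follows; the host name, credentials, and report name are placeholders, and the option value follows the description above:

[pdf-print|pdf-print.exe] --url "https://<external_hostname>:<port>/bvd/#/show/<Report Name>" --suite_url https://<external_hostname>:<port> --user <BVD Username> --pass <Password> --SaveOnServer True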


1.15.42. Mail command failed: 501 Invalid MAIL FROM address provided

WebToPDF isn't able to send emails of scheduled PDF exports. You can find the following error in the WebToPDF log file:

bvd:error:schedule Failed to send the mail to the user. Error: "ERROR: Mail command failed: 501 Invalid MAIL FROM address provided"

Cause
WebToPDF uses the user name configured to connect to the SMTP server as the sender of the emails. If that user name isn't
an email address, the server throws an error depending on the SMTP server configuration.

Solution
You can try any one of the following:

Configure the email address as the user name to log in to the SMTP server.
Configure the SMTP server to use the account's email address as "FROM" when sending emails.


1.15.43. Notification messages


The following section describes the messages and solutions.

The data received from the server does not contain {0} but {1} values

Widgets: Bar chart, Donut chart

The chart by default displays {0} values. However, the data sent by the server only contains {1} values. Check the data sent
by the server and make sure that it contains all required data fields. Open Administration > Dashboards & Reports >
Stakeholder Dashboards & Reports > Dashboard Management to verify the configured data fields and the data sent by
the server in the data channel selector.

There is no valid data channel

Widgets: Bar Chart, Status Color Group, Donut Chart, Feed, Status Images, Sparkline / Multiple Area Chart, Text Value, Status
Visibility Group

There is no data channel set for this widget. Therefore it won't receive any data to display. Open Administration >
Dashboards & Reports > Stakeholder Dashboards & Reports > Dashboard Management to configure a data channel for
the widget.

There are no 'path' or 'rect' elements for the dashboard item '{0}'

Widgets: Status Color Group

The Status Color Group is coloring lines and areas of the grouped shapes. If these don't contain lines and areas, coloring will
fail with this error. Open the dashboard in Visio and make sure that valid shapes are grouped with this Status Color Group.

The property '{0}' used in the coloring rule is not part of the data. Please check the syntax of the rule.
(Dashboard item '{1}')

Widgets: All widgets with coloring/image selection rules

Open Administration > Dashboards & Reports > Stakeholder Dashboards & Reports > Dashboard Management and
verify the rule's syntax: the rule should only use data fields that are included in the data sent by the server.

The coloring rule '{0}' is not valid. Please check the syntax of the rule. (Dashboard item '{1}')

Widgets: All widgets with coloring/image selection rules

Open Administration > Dashboards & Reports > Stakeholder Dashboards & Reports > Dashboard Management and
verify that the rule adheres to the format described in the Coloring Rule section.

The data received from the server is missing either the 'link' or the 'title' field (Feed '{0}')

Widgets: Feed

Make sure that the data sent by the server contains the data fields “link” and “title”.

The data received from the server does not contain a status (Status images '{0}')

Widgets: Status Image

The status field configured in Administration > Dashboards & Reports > Stakeholder Dashboards & Reports >
Dashboard Management isn't part of the data sent by the server. Make sure to choose the correct status field or the server
sends the correct data.

The data received from the server does not contain a property called '{0}' (Spark line '{1}')

Widgets: Sparkline / Multiple Area Chart

The data field configured in Administration > Dashboards & Reports > Stakeholder Dashboards & Reports >
Dashboard Management isn't part of the data sent by the server. Make sure to choose the correct data field or the server
sends the correct data.

There is no valid URL for the dashboard item '{0}'

Widgets: Web Page


There is no URL set for this widget. Therefore it won't display any data. Open Administration > Dashboards & Reports >
Stakeholder Dashboards & Reports > Dashboard Management to configure a URL of the dashboard item.

The property '{0}' used in the visibility rule is not part of the data. Please check the syntax of the rule.
(Dashboard item '{1}')

Widgets: All widgets with visibility rules

Open Administration > Dashboards & Reports > Stakeholder Dashboards & Reports > Dashboard Management and
verify the rule's syntax: the rule should only use data fields that are included in the data sent by the server.

The visibility rule '{0}' is not valid. Please check the syntax of the rule. (Dashboard item '{1}')

Widgets: All widgets with visibility rules

Open Administration > Dashboards & Reports > Stakeholder Dashboards & Reports > Dashboard Management and
verify that the rule adheres to the format described in the Visibility Rule section.

Unable to calculate a color with the data received from the server (Dashboard item '{0}')

Widgets: All widgets with coloring rules

The value sent by the server, together with the given coloring rule, didn't result in a color. Open Administration >
Dashboards & Reports > Stakeholder Dashboards & Reports > Dashboard Management to verify the accuracy of your
coloring rule and give a default color as the last entry. For details on defining coloring rules, see the Coloring Rule section.

Widget type '{0}' is not supported

Widgets: Widget Group

The widget type you placed in your Widget Group isn't supported. Unsupported widget types are Feed and Web Page widgets.
Edit the dashboard or template in Visio to make sure to group the supported widget type with the widget group. For more
information, see the Group Widgets topic.

Unable to create Text Value widget. There is no valid data channel set for the text value

Widgets: Text value widget

In the report, the widget having hyperlinks uses the Text Value widget. To avoid this message, use the Hyperlink Group widget.


1.16. Troubleshoot OPTIC Data Lake


This section covers the following troubleshooting scenarios:

Troubleshoot data flow


Metrics do not reach the Vertica database
Troubleshoot Forecast/Aggregate data flow
Data logging to Vertica stopped
Aggregate not happening after upgrade
Aggregate table has missing or no data
Data is in OPTIC Data Lake Message Bus topic but not present in Vertica tables
Unable to create the same dataset again
Data sent to the OPTIC DL HTTP Receiver not available in Vertica database
Postload task flow not running
Automatic certificate request from itom-collect-once-data-broker-svc not received
Single message pushed to a topic is not streaming into database
Error messages
Certificate with the alias 'CA on abc.net' is already installed
dbinit.sh reinstall fails with error
ERROR: Unavailable: initiator locks for query - Locking failure: Timed out I locking
Insufficient resources to execute plan on pool itom_di_stream_respool_provider_default
Can't forward events from OBM to OPTIC DL
Failed to create consumer: Subscription is fenced
Error while publishing data to the OPTIC DL Message Bus topics
Not enough free slots available to run the job
After upgrade, the Vertica itom_di_metadata_* TABLE is not updated
Data loading to Vertica is stopped for a topic
Data Ingestion loader is constantly restarting
Error getting topic partitions metadata
Frequent WOS spill error observed in the Vertica log
Insufficient resources on pool error
Troubleshoot OPTIC DL Connection issues
Vertica database is not reachable
Failed to connect to host
Correlated group of events does not appear in OBM
Data Source not getting listed in PD
Table does not get deleted after dataset is deleted
Vertica catalog directory has random empty folders
Troubleshoot OPTIC DL pods issue
The itom-di-metadata-server and itom-di-data-access-dpl pods are not Up and Running after installation
itom-di-dp-worker-dpl pod is in CrashLoopBackOff state
The itomdipulsar pods stuck in the init state
The itomdipulsar-bookkeeper pods are not accessible
itomdipulsar-zookeeper pod in CrashLoopBackOff state
Postload pods do not start and are stuck in 1/2 status
Suite deployment failed with pods in pending state
Troubleshoot using ITOM DI monitoring dashboards
Guidelines for adding panels to the OPTIC Data Lake Health Insights dashboard
Vertica Streaming Loader dashboard panels have no data loaded
The DP worker memory usage meter displays increasing memory usage
Data Flow Overview dashboard displays some topics with message batch backlog greater than 10K
Data not found in Vertica and Scheduler batch message count is zero
Postload Detail dashboard Taskflow drop-down does not list the configured task flows
Request error rate in Data Flow Overview dashboard is greater than zero
Request error rate in Data Flow Overview dashboard is increasing over time
The dashboard loads slowly or the page is unresponsive
The Receiver dashboard, Average Message Outgoing Rate panel displays zero
The Receiver dashboard, Avg incoming requests rate - error (All) panel is greater than zero req/sec
The Receiver dashboard, Receiver running panel shows less than 100%
Vertica dashboard queries fail frequently during daily aggregation phase


How to's
How to check the Vertica tables
How to check if OPTIC DL Message Bus topics and data is created
How to check the OPTIC DL Message Bus pod communication
How to recover OPTIC DL Message Bus from a worker node failure
How to check connectivity between Vertica node and OPTIC DL Message Bus Proxy services
How to verify the OPTIC DL Vertica Plugin version after reinstall


1.16.1. Troubleshoot data flow


This section covers possible problems that can cause the OPTIC Data Lake data flow to fail and how you can troubleshoot
them.


1.16.1.1. Metric data does not reach the Vertica database

Cause
Metric data doesn't reach the Vertica database if a Pulsar topic isn't created.

Solution
Follow the steps:

1. Run the following query on the Vertica database to identify the missing topics: select distinct(source_name) from itom_di_scheduler_provider_default.stream_microbatch_history where end_message ilike '%topicnotfound%';
2. Run the following command and note down the bastion pod name: kubectl get pods -A | grep -i bastion
3. Run the following command to log in to the Pulsar bastion pod: kubectl -n opsb-helm exec -it <itomdipulsar-bastion-pod-name> -c itomdipulsar-bastion bash
4. For each of the topics from step 1, run the following command to create a Pulsar topic: bin/pulsar-admin topics create-partitioned-topic -p 3 <topic_name>
For example: bin/pulsar-admin topics create-partitioned-topic -p 3 opsb_agent_cpu
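To confirm that the topics were created, you can optionally list the partitioned topics from the same bastion pod. This is a verification sketch; public/default is the Pulsar namespace shown in other examples in this guide, so treat it as an assumption for your environment:

bin/pulsar-admin topics list-partitioned-topics public/default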


1.16.1.2. Troubleshoot Forecast/Aggregate data flow

Forecast data flow


The forecast flow helps forecast the value of a metric/field at a specific configurable point of time in the future. This flow also
helps compute the days to exceed a given threshold value.

Aggregate data flow


Aggregation allows you to roll up the data from raw to hourly/daily tables in Vertica. In OPTIC Data Lake, the data aggregation flow uses Postload data processing. The Postload process configures the taskflow in the data processor to aggregate the data periodically.

Solution

Verify the dashboards


1. Log on to the Grafana dashboards.
2. Click the namespace where you have deployed the OPTIC DL.
3. Select the ITOM DI/Postload Overview Dashboard.
4. Check the Taskflow overview panel for the state and status of the task. Verify if there is any task flow in Failed non-
recoverable tasks panel.
You can click the task id to go to the Postload Detail dashboard and view more details on the task in the Failed non-
recoverable tasks panel.
If the TaskFlowID is in the “FAILED_NON_RECOVERABLE” state in this case, perform the next steps.
5. Run the following command to get the itom-di-administration pod name from the namespace where you have deployed
OPTIC DL:
kubectl get pods -n <suite namespace>
6. Run the following command to copy the certificates:
kubectl cp <suite namespace>/<administration pod>:/var/run/secrets/boostport.com/RE/server.crt /tmp/AdminstrationPod.crt
kubectl cp <suite namespace>/<administration pod>:/var/run/secrets/boostport.com/RE/server.key /tmp/AdminstrationPod.key
7. Run the following command with the certificate names to continue the task:
curl -k -X POST https://<master-hostname>:<administration-service-port>/urest/v1/itom-data-ingestion-administration/monitoring/taskflow/{taskflowid}/continue --key /tmp/AdminstrationPod.key --cert /tmp/AdminstrationPod.crt -H 'content-type: application/json'

8. If the flow doesn't continue and the task goes to the “FAILED_NON_RECOVERABLE” state often, perform this step:
Go to the <conf-volume> location and perform the following steps:
i. Open the task-executor-logback.xml file.
ii. Change the level from INFO to DEBUG in the following line:
<logger name="com.microfocus" level="INFO" additivity="false">
<appender-ref ref="LogFileAppender"/>
</logger>
iii. Save the file.
iv. Open the log4perl.conf file.
v. Change the level from INFO to DEBUG in the following line:
log4perl.logger.topology = INFO, topologyAppender
vi. Save the file.
vii. Run the curl command from step 7.


viii. Collect the logs and contact Software Support to resolve the issue.

Verify the database layer


Run the command: kubectl get pods -n <suite namespace> and make sure that the following pods are running:
itom-di-metadata-server
itom-di-scheduler-udx
itom-di-administration
itom-di-vertica-dpl - This pod is available in Internal Vertica.
Check for Vertica connection issues. To troubleshoot, see Failed to connect to host scenario from the Related topics.
Check that the users have the schema permissions (read permission) required to create and view reports. Make sure to grant the permissions as mentioned in the Prepare Vertica topic.
Check the administration.log file. Search the administration.log file with the task id to see the complete details. The file
shouldn't have any errors related to the forecast. To debug the issues, set the log file level from INFO to DEBUG . For
more information, to see the list of log files and to set the log file level from INFO to DEBUG , see Find the log files from
the Related topics.
In the Vertica database, use either the admin tools or the DB Visualizer tool and check the *_itom_di_configuration
schema. The POSTLOAD_TASKFLOW table should be available with the schema, table, and mapping details.
If the forecast isn't happening on the data/table - See the troubleshooting scenarios Forecast functionality isn't working
as expected and Trace issue in Aggregate and Forecast.
Vertica error due to Certificate issue: Check the Vertica certificate details in the cert: parameter in the suite installation
parameters file.

Verify the data processing layer

The Postload data processing layer consists of the following pods:

1. itom-di-postload-taskcontroller
2. itom-di-postload-taskexecutor - This pod receives and processes the jobs from the data processor master pod. It acts as the task manager.

To verify the Postload data processing layer:

Check if the Vertica database is reachable. To troubleshoot, see the Vertica database isn't reachable. Also, check the
task status, see Postload task flow not running from the Related topics.
Check if the Vertica sessions have been exceeded. Run the query:
SELECT node_name, current_value FROM v_monitor.configuration_parameters WHERE parameter_name = 'MaxClientSessions';
You can increase the maximum sessions based on the memory available in Vertica; see the example after this list.
Check error, tasks, and state in data processor logs:
taskexecutor.log
forecast.log
For more information on log files and the locations, see Find the log files from the Related topics.
If there are no forecast logs, see the troubleshooting scenario No logs for aggregate and forecast from the Related
topics.
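If MaxClientSessions is too low for your load, the following is a minimal sketch of raising it as the Vertica dbadmin user. The value 200 is only an example; size it according to the memory available in Vertica:

SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 200);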

Related topics
To troubleshoot Vertica connection issues, see Failed to connect to host.
To check the Vertica tables, see How to check the Vertica tables.
To troubleshoot when the Vertica database isn't reachable, see Vertica database isn't reachable.
To troubleshoot Data is in OPTIC DL Message Bus topic but not present in Vertica tables, see Data is in OPTIC DL
Message Bus topic but not present in Vertica tables.
To check the task status of Vertica database, see Data Processor Postload task flow not running.
To troubleshoot, see Forecast data is not displayed in System Infrastructure Summary or System Resource Details


reports.
For information on data flows and task flows that enable the OPTIC Reporting - System infrastructure reports and event
reports, see Reporting data and task flows.
To troubleshoot Aggregate table has missing or no data, see Aggregate table has missing or no data.
To verify the creation of OPTIC DL Message Bus topics and data, see How to check if OPTIC DL Message Bus topics and
data is created.
To prepare Vertica database, see Prepare Vertica database.
To see the aggregate and availability log files and to set the log file level from INFO to DEBUG , see System
Infrastructure reports are showing no or partial data or updated data is not shown in the reports and System
Infrastructure Availability data is missing in reports.
To troubleshoot Aggregate tables are not updated data in the system infrastructure or event reports are not refreshed,
see Aggregate tables are not updated data in the system infrastructure or event reports are not refreshed.


1.16.1.3. Data logging to Vertica stopped


Data logging to Vertica stopped and the OPTIC DL HTTP Receiver is in a hung state.

Cause
This issue is because the OPTIC DL HTTP Receiver isn't responsive. The receiver log file shows the java.lang.OutOfMemoryError:
GC overhead limit exceeded error.

Solution
Perform these steps to resolve the issue:

1. Run the command: kubectl get pods -n <suite namespace>


2. Copy the itom-di-receiver-dpl-<pod value> name.
3. Run the command to go into the pod: kubectl -n <suite namespace> exec -ti itom-di-receiver-dpl-<pod value> -c itom-di-receiver-cnt bash
4. Run the command: set | grep -i jvm
5. Update the value of the parameter RECEIVER_JVM_ARGS from -Xms512m -Xmx1024m to -Xms512m -Xmx2048m.
6. Run the following command to restart the pod:
kubectl delete pod itom-di-receiver-dpl-<pod value> -n <suite namespace>
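A minimal sketch of applying the new heap setting at the deployment level, assuming the receiver deployment is named itom-di-receiver-dpl and honors the RECEIVER_JVM_ARGS environment variable (both names are taken from the pod and variable names above):

kubectl set env deployment/itom-di-receiver-dpl -n <suite namespace> RECEIVER_JVM_ARGS="-Xms512m -Xmx2048m"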


1.16.1.4. Aggregate not happening after upgrade

Cause
This issue is because of any of the following reasons:

The task got stuck in the DISPATCHED, RUNNING, or FINISHED state and you upgraded the suite before the task recovered.
OR
The task-related message was lost in the OPTIC DL Message Bus.

Perform these steps to further check the issue:

1. Log on to the Grafana dashboards.


2. Click the namespace folder where you have deployed OPTIC DL.
3. Go to the ITOM DI/Postload Detail dashboard and select the time for which you want to analyze the data.
4. From the Taskflow drop-down select the taskflow that has the issue.
5. The Taskflow state and the Taskflow status panels will display the taskflow state. The Task details panel lists the current state and status of the tasks. The different states of a task are: READY, SCHEDULED, DISPATCHED, RUNNING, FINISHED, while the different statuses are: SUCCESS, FAILED_RECOVERABLE, FAILED_NON_RECOVERABLE.
6. Check if one of the Tasks is in the DISPATCHED or RUNNING or FINISHED state for a long time with the taskflow in the SCHEDULED or RUNNING state with the taskflow Status empty.

Solution
Perform these steps to resolve this issue:

1. Run the following commands on the master node to scale down the itom-di-postload-taskcontroller and itom-di-postload-taskexecutor pods:
kubectl scale deployment itom-di-postload-taskcontroller --replicas=0 -n <suite namespace>
kubectl scale deployment itom-di-postload-taskexecutor --replicas=0 -n <suite namespace>
2. Run the following commands to log on to the pulsar bastion pod:
kubectl get pods -n <suite namespace>
Note down the bastion pod name.
kubectl -n <suite namespace> exec -ti <bastion pod-<POD value>> -c pulsar -- bash
3. Run the following commands to delete the topics:
bin/pulsar-admin topics delete-partitioned-topic -f persistent://public/itomdipostload/di_internal_postload_state
bin/pulsar-admin topics delete-partitioned-topic -f persistent://public/itomdipostload/di_postload_task_status_topic
bin/pulsar-admin topics delete-partitioned-topic -f persistent://public/itomdipostload/di_postload_task_topic
4. Run the following commands on the master node to scale up the itom-di-postload-taskcontroller and itom-di-postload-taskexecutor pods:
kubectl scale deployment itom-di-postload-taskcontroller --replicas=<replica count> -n <suite namespace>
kubectl scale deployment itom-di-postload-taskexecutor --replicas=<replica count> -n <suite namespace>
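After the pods are scaled back up, you can optionally confirm that the task topics were recreated by listing the partitioned topics from the bastion pod. This is a verification sketch; the public/itomdipostload namespace is taken from the delete commands above:

bin/pulsar-admin topics list-partitioned-topics public/itomdipostload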


1.16.1.5. Aggregate table has missing or no data


The aggregate table has missing or no data.

Possible causes
Cause 1: The required pods aren't running.
Cause 2: The taskflow is in FAILED_NON_RECOVERABLE state.
Cause 3: Errors in the log file.
Cause 4: Vertica connection issue.

Solution

The required pods aren't running


Check if the itom-di-postload pods are up and running. Run the following command to check the pod status:
kubectl get pods -n <application namespace> | grep itom-di-postload
If the pods aren't running contact Software Support.

The taskflow is in FAILED_NON_RECOVERABLE state


1. Log on to the Grafana dashboard.
2. Go to the <namespace> folder where you have deployed OPTIC DL.
3. Select the ITOM DI/Postload Overview Dashboard.
4. Check the Taskflow overview panel for the state and status of the task. Verify if there is any taskflow in Failed non-
recoverable tasks panel.
You may click the task id to go to the Postload Detail dashboard and view more details on the task in the Failed
non-recoverable tasks panel.
If the TaskFlowID is in the FAILED_NON_RECOVERABLE state in this case, perform the next steps.
5. Run the following command to get the itom-di-administration pod name from the namespace where you have deployed
OPTIC DL:
kubectl get pods -n <application namespace>
6. Run the following command to copy the certificates:
kubectl cp <application namespace>/<administration pod>:/var/run/secrets/boostport.com/RE/server.crt /tmp/AdminstrationPod.crt
kubectl cp <application namespace>/<administration pod>:/var/run/secrets/boostport.com/RE/server.key /tmp/AdminstrationPod.ke
y
7. Run the following command with the certificate names to continue the task:
curl -k -X POST https://<master-hostname>:<administration-service-port>/urest/v1/itom-data-ingestion-administration/monitoring/tas
kflow/{taskflowid}/continue --key /tmp/AdminstrationPod.key --cert /tmp/AdminstrationPod.crt -H 'content-type: application/json'
The {taskflowid} is the TaskFlowID from step 4.
If the flow doesn't continue and the task goes to this state often, perform the next steps.
8. Run the following command:
kubectl get ns
Note down the <application namespace> .
9. Run the following command:
kubectl get cm -n <application namespace> | grep logback
The list of logback files appears. Note down the taskexecutor-logback file name.
10. Run the following command to edit the file:
kubectl edit cm taskexecutor-logback-cm -n <application namespace>
11. Change the level from INFO to DEBUG in the following line:

<logger name="com.microfocus" level="INFO" additivity="false">


<appender-ref ref="LogFileAppender"/>
</logger>

12. Save the file.


13. Run the following commands to edit the config file:


kubectl edit cm log4perl-cm -n <application namespace>
14. Change the level from INFO to DEBUG in the following line:

log4perl.logger.topology = INFO, topologyAppender

15. Save the file.


16. Run the curl command from step 7.
17. Collect the logs and contact Software Support to resolve the issue.

Errors in the log file


1. To get the Taskflow ID, log on to the Grafana dashboard and click the ITOM DI/Postload Overview dashboard from
the <namespace> folder where you have deployed OPTIC DL. Verify if the task flow is in Failed non-
recoverable tasks panel. Note down the taskflow id. You can click the task to see more details about the task flow.
2. Go to the <log-volume> location on the NFS server and in each of the <application namespace>__itom-di-postload-taskexecutor-<pod value>__postload-taskexecutor__<node name> directory, run the command: grep -irh "TASKFLOW_ID" * > txt
The output lists the log files: aggregate.log and taskexecutor.log
3. Check the error in the logs and perform the remediation steps mentioned in the log file.

Vertica connection issue


1. Verify the configuration parameters for the database connection are correct.
2. Verify if the Vertica database node is running using the adminTools interface. On the Vertica server, go to the location
/opt/vertica/bin and type admintools . Press enter. Select View Database Cluster State to verify the state.
3. Check the schema permissions (read permission) to the users to create and view reports. Make sure to give the user
permissions while creating Vertica database users.
4. Check the Vertica certificate details in the cert: parameter in the application values.yaml file.
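To quickly test connectivity with the configured credentials from a Vertica node, you can use vsql. This is a basic connectivity check; host, user, and password are placeholders for your own values, and 5433 is the default Vertica port:

/opt/vertica/bin/vsql -h <vertica-host> -p 5433 -U <db-user> -w <password> -c "SELECT version();"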


1.16.1.6. Data is in OPTIC Data Lake Message Bus topic but not present in Vertica tables

Cause
This issue may occur if there is a mismatch in the mapping of a topic name to the corresponding table name.

Solution
1. Log on to the Vertica database. Use either the admin tools or any DB visualizer tool of your choice to view the database tables.
2. Click itom_di_configuration_* > TABLE > MICROBATCH. In the Data tab, check the CONTENT_JSON row to verify that the table name and topic name are the same.
For example:
"topic_name": "SCOPE_GLOBAL",
"streaming_table_schema": "<suite>_store",
"streaming_table_name": "SCOPE_GLOBAL"
If there is a mismatch, update them.
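You can also inspect the stored mapping directly with a query instead of the DB visualizer tool. This is a sketch; replace the schema placeholder with your actual itom_di_configuration_* schema name:

SELECT CONTENT_JSON FROM <itom_di_configuration_schema>.MICROBATCH;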


1.16.1.7. Unable to create the same dataset again

Unable to create the same dataset again even after dropping the table from the Vertica database.

Cause
This is because the metadata information stored in OPTIC Data Lake isn't cleaned.

Solution
Perform the following steps to clean up metadata information:

1. Drop the required tables from the database by running the command:

drop table <schema_name>.<table_name>

2. Go to <conf-volume>/di/administration/conf/metadata/<table_schema> directory.
3. Run the following command to delete the required files ending with metadata.json :

rm -f <table_name>_metadata.json

4. Go to <conf-volume>/di/vertica-ingestion/conf/dataset/<table_schema> directory.
5. Run the following command to delete the required files ending with dataset.json :

rm -f <table_name>_dataset.json

6. If you have configured the dataset for json-load , go to <conf-volume>/di/vertica-ingestion/conf/dataset/<table_schema>


directory and run the following command to delete the files ending with microbatch.json :

rm -f <table_name>_microbatch.json


1.16.1.8. Data sent to the OPTIC DL HTTP Receiver not available in Vertica database

Cause
Data sent to the OPTIC DL HTTP Receiver doesn't have the required headers. The receiver-itom-di-receiver-dpl-<pod value>.log
displays receiver.topic.from.header value is true but no header provided with header fieldname x-monitoredcollection message.

Solution
To resolve this issue, make sure that the data streamed to the OPTIC DL HTTP Receiver contains the required header fields. If
data has required headers, you can check the sent data in the log by changing the log level from INFO to DEBUG as follows:

1. Run the following command:


kubectl get ns
Note down the <suite namespace> .
2. Run the following command:
kubectl get cm -n <suite namespace> | grep logback
The list of logback files appears. Note down the itom-di-receiver-logback file name.
3. Run the following command to edit the file:
kubectl edit cm itom-di-receiver-logback-cm -n <suite namespace>
4. Change the level from INFO to DEBUG in the following line:

<logger name="com.microfocus" level="INFO" additivity="false">


<appender-ref ref="LogFileAppender"/>
</logger>

5. Save the file.


6. Go to <log-volume> and check the receiver-itom-di-receiver-dpl-<pod value>.log file and debug the error.


1.16.1.9. Postload task flow not running


The configured Postload task isn’t running and displays Failed status on the Grafana dashboard.

Cause
If a Postload task flow encounters an error, the task flow execution retries the task for the configured number of times, and
then the task switches to FAILED_NON_RECOVERABLE state. Common causes for these errors are:

1. Shell/Perl script error (Postload tasks are shell/Perl scripts)


2. Vertica database or other connection issues

Solution
When a task in the data processing flow fails, it means that it has retried for the configured number of times (=1440 for bulk load - approx. 1 day), and then the task status is set to FAILED. Perform the following steps:

1. Log on to the Grafana dashboard.


2. Click the <namespace> folder where you have deployed OPTIC DL.
3. Select the ITOM DI/Postload Overview dashboard.
4. From the Failed non-recoverable tasks panel, note down the taskflow id with FAILED task status.
5. Run the following command to get the itom-di-administration pod name from the namespace where you have deployed
OPTIC DL:
kubectl get pods -n <namespace>
6. Run the following command to copy the certificates:
kubectl cp <namespace>/<administration pod>:/var/run/secrets/boostport.com/RE/server.crt /tmp/AdminstrationPod.crt
kubectl cp <namespace>/<administration pod>:/var/run/secrets/boostport.com/RE/server.key /tmp/AdminstrationPod.key
7. Run the following command and note down the <administration-service-port>:
For embedded Kubernetes:
kubectl get svc itom-di-administration-nodeport -n <namespace> -o json | jq -r .spec.ports[].nodePort
For AWS:
kubectl -n <namespace> get svc itom-di-administration-svc -o json | jq -r '.metadata.annotations."external-dns.alpha.kubernetes.io/hostname"'
For Azure:
kubectl -n <namespace> get svc itom-di-administration-svc -o json | jq -r '.spec.loadBalancerIP'
8. Run the following command on the master or bastion node using the certificates from step 6 to monitor the task flow and to resume FAILED task flows:
curl -k -X POST https://<hostname>:<administration-service-port>/urest/v2/itom-data-ingestion-administration/monitoring/taskflow/{taskflowname}_taskflow/continue --key /tmp/AdminstrationPod.key --cert /tmp/AdminstrationPod.crt -H 'content-type: application/json'


1.16.1.10. Automatic certificate request from itom-collect-once-data-broker-svc not received

After the upgrade or reconfiguration of the suite, the certificate request from itom-collect-once-data-broker-svc is not received.

Cause
This issue is because during the suite upgrade or reconfiguration, you've changed the value of the parameter global.di.cloud.externalAccessHost.pulsar in the values.yaml file. Due to this, the data flow from the OPTIC DL Message Bus to the Vertica database gets interrupted.

Solution
To resolve this issue, you must restart the Vertica database.
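A minimal sketch of restarting the database with admintools as the dbadmin user on a Vertica node; the database name is a placeholder and your environment may additionally prompt for the database password:

/opt/vertica/bin/admintools -t stop_db -d <database-name>
/opt/vertica/bin/admintools -t start_db -d <database-name>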


1.16.1.11. Single message pushed to a topic is not streaming into database

A few topics have a high message ingestion rate; at the same time, if only one message streams occasionally for other topics, that message isn't streamed to the database.

Cause
This issue is due to wrong backlog information retrieved from the OPTIC DL Message Bus, because of which the OPTIC DL Streaming Loader doesn't run micro batches for such topics.

However, the backlog information gets reflected the moment more than two messages are streamed in for the same topics.

Solution
Perform the following steps to disable the OPTIC DL Streaming Loader backlog checks and resolve this issue:

1. Run the following command:

helm upgrade <release-name> <application-chart> -n <application namespace> -f <values YAML filename> --set itom-di-udx-scheduler.scheduler.configData.scheduler.enableFrameBacklogCheck="false" --set itom-di-udx-scheduler.scheduler.configData.scheduler.enableMicrobatchBacklogCheck="false"

2. Run the following command to verify if enableFrameBacklogCheck and enableMicrobatchBacklogCheck are false :

kubectl get cm itom-di-udx-scheduler-scheduler -n <application namespace> -o yaml | grep BacklogCheck


1.16.2. Error messages


This section covers possible error messages related to OPTIC Data Lake.


1.16.2.1. Certificate with the alias 'CA on abc.net' is already installed

While running the OPTIC Data Lake secure connection script, the following warning appears:

WARNING: Certificate with alias '<alias name> on <host name>' is already installed

Cause
In the case of the OPTIC Data Lake secure connection, if you have removed the earlier certificate and installed a new certificate on the same system, the warning appears. This issue may occur if the certificate is similar to any of the deleted certificates.

Solution
You can ignore this warning.


1.16.2.2. dbinit.sh reinstall fails with error


During the OPTIC DL Vertica Plugin reinstall, the dbinit.sh command fails with the following error:

Todays OPTIC DL Vertica Plugin is older ... not supported ... 291 1

Cause
This issue occurs because you have run the dbinit.sh command as the database administrator user instead of the root user.
Also, the OPTIC DL Vertica Plugin RPM you are installing is older compared to the existing RPM.

Solution
You must make sure to install the RPM as a root user. Perform the following steps:

1. Log on to the Vertica node where you have installed the RPM as the root user.
2. Run the following command and note down the RPM version:
rpm -qa itom-di-pulsarudx
3. If you have run the RPM as a database administrator, you will see the files in the location $HOME/.coso_rpm_installed and
/home/dbadmin/.coso_rpm_installed . Run the following command to check the files and compare the RPM versions:
cat /home/dbadmin/.coso_rpm_installed
The RPM version will appear: itom-di-pulsarudx-<RPM version>
cat $HOME/.coso_rpm_installed
The RPM version will appear: itom-di-pulsarudx-<RPM version>
4. Run the following command and manually update the file with the latest RPM version from the compared files in step 3:
cat $HOME/.coso_rpm_installed/ itom-di-pulsarudx-<RPM version>
5. Run the following command to complete the RPM install:
./dbinit.sh <option>
Options usage: dbinit.sh [-h|-?|--help] [-p|--preview] [-s|--silent|--suppress] [-w|--dbapass]


1.16.2.3. ERROR: Unavailable: initiator locks for query - Locking failure: Timed out I locking

The Vertica Streaming Loader log ( scheduler.log ) displays the following error:

[Vertica][VJDBC](5156) ERROR: Unavailable: initiator locks for query - Locking failure: Timed out I locking Table:itom_di_scheduler_provider_default.stream_microbatch_history. . Your current transaction isolation level is SERIALIZABLE

Cause
This error appears when the Data Retention process runs in the background and deletes old microbatch data that's older than the retention period, while lane workers try to insert new microbatch data into the stream_microbatch_history table at the same time.

Solution
You can ignore this error. The lane worker will retry, stream the message again, and consume the message.


1.16.2.4. Can't forward events from OBM to OPTIC DL

Symptom
After executing the obm-configurator.jar script to integrate OBM with OPTIC DL, events aren't forwarded to OPTIC DL from OBM, and the RUN TEST command from the connected server results in a NullPointerException error.

Cause
The log file opr-event-sync-adapter.log shows that the integration Groovy script times out in the init method:

2022-04-26 09:38:04,115 [RMI TCP Connection(24846)-137.15.90.24] INFO ForwardingScriptExecutorImpl.init(63) - Initializing forwarding script: COSO Data Lake Event Forwarding Script
2022-04-26 09:38:09,146 [RMI TCP Connection(24846)-137.15.90.24] ERROR ForwardingScriptExecutorImpl.scriptLifecycleController(208) - java.util.concurrent.TimeoutExceptionmethod com.microfocus.opsbridge.aiops.forwarder.ItomAnalyticsAdapter.init threw an exception.
2022-04-26 09:38:09,146 [RMI TCP Connection(24846)-137.15.90.24] ERROR ForwardingScriptExecutorImpl.executeMethod(150) - Script execution failed: COSO Data Lake Event Forwarding Script, method=init
2022-04-26 09:38:09,146 [RMI TCP Connection(24846)-137.15.90.24] ERROR ForwardingScriptExecutorCache.createExecutor(281) - Script COSO Data Lake Event Forwarding Script has failed to load. Script ID: 2e798590-b8a4-45d1-838c-20cfbddbd632

Solution
Increase the timeout for init in OBM by executing the following commands:

On Linux:

/opt/HP/BSM/opr/support/opr-support-utils.sh -set_setting -context opr -set opr.epi.maxInitDestroyExecTime 30000

On Windows:

\HPBSM\opr\support\opr-support-utils.bat -set_setting -context opr -set opr.epi.maxInitDestroyExecTime 30000


1.16.2.5. Failed to create consumer: Subscription is fenced

[pulsar-io-22-6] WARN org.apache.pulsar.broker.service.persistent.PersistentSubscription - Attempting to add consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/interface_health-partition-1, name=1_itom_di_scheduler_default_default_interface_health}, consumerId=47840, consumerName=5fbecfff0f, address=/15.218.54.88:40216} on a fenced subscription

[pulsar-io-22-6] WARN org.apache.pulsar.broker.service.ServerCnx - [/15.218.54.88:40216][persistent://public/default/interface_health-partition-1][1_itom_di_scheduler_default_default_interface_health] Failed to create consumer: Subscription is fenced

Cause
This issue is because the OPTIC DL Message Bus topic consumer got disconnected and reconnected. The reconnect may
happen before the consumer resets in the server. This causes the consumer to conflict with its own subscription. The retries
in the consumer resolve the issue.

Solution
You can ignore this warning message.


1.16.2.6. Error while publishing data to the OPTIC DL Message Bus topics

The following error appears in the logs while publishing the data to the OPTIC DL Message Bus topics:
ERROR: Failed to create producer: Future didn't finish within deadline

Cause
This issue is because the worker node goes down and the data isn't published to topics even after the node comes up.

Solution
Perform the following steps to resolve this issue:

1. Log on to the Grafana dashboard.


2. Click the <namespace> folder where you have deployed the OPTIC DL.
3. Select the ITOM DI/Pulsar Overview dashboard.
4. Check if there is a drop in the Message Rate panel graph. If the graph indicates a drop to zero, the message bus isn't getting data.

5. Run the following commands to kill all the broker pods:


kubectl get ns
Note down the application namespace.
kubectl get pods -n <namespace>
Note down all the itomdipulsar-broker pod names.
kubectl delete pod itomdipulsar-broker<pod value> -n <suite namespace>
6. Wait for a few minutes.
7. Run the command to check if the broker pods are running again:
kubectl get pods -n <namespace>
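If there are several broker replicas, you can optionally delete them in one call. This is a convenience sketch that assumes the broker pod names contain itomdipulsar-broker, as shown above:

kubectl delete -n <namespace> $(kubectl get pods -n <namespace> -o name | grep itomdipulsar-broker)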


1.16.2.7. After upgrade, the Vertica itom_di_metadata_* TABLE is not updated

After the upgrade, the Vertica itom_di_metadata_* TABLE isn't updated and the following errors appear:

In the master node itom-di-metadata-server pod: /var/vols/itom/<application namespace>/itom-log/di/metadata-server/__itom-di-metadata-server-<pod value>/metadata-server-app.log :

org.springframework.orm.ObjectOptimisticLockingFailureException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; statement executed: HikariProxyPreparedStatement@141103738 wrapping com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl@448a2be4; nested exception is org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; statement executed: HikariProxyPreparedStatement@141103738 wrapping com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl@448a2be4

In the Vertica master node: <catalog-path>/<database-name>/<node-name>_catalog/vertica.log

IV lock table - deadlock error Deadlock IV locking Table:itom_di_metadata_provider_default.FIELD_TAG. IV held by [user vertica_rwuse
r

Cause
This issue is because of the deadlock in the Vertica tables and the itom-di-metadata-server pod couldn't perform all the table
update operations.

Solution
To resolve this issue, restart the itom-di-metadata-server pod to rerun all the modification updates in Vertica for the changes to
appear in the tables. Perform these steps:

1. Log on to the master node and run the following command to stop the itom-di-metadata-server pod:
kubectl -n <suite namespace> scale deployment itom-di-metadata-server --replicas=0
2. Run the following command to start the itom-di-metadata-server pod:
kubectl -n <suite namespace> scale deployment itom-di-metadata-server --replicas=1
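
Alternatively, the scale-down and scale-up can be replaced with a single rolling restart (a sketch; assumes a kubectl version that supports rollout restart):

kubectl -n <suite namespace> rollout restart deployment itom-di-metadata-server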


1.16.2.8. Data loading to Vertica is stopped for a topic
Data loading to Vertica stopped for a topic. The following exception appears in the scheduler.log file:

Rolling back microbatch: [Vertica][VJDBC](5948) ROLLBACK: Local temporary objects may not specify a schema name
java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](5948) ROLLBACK: Local temporary objects may not specify a schema name

Cause
This issue is due to the accidental dropping of the rejected table. The data streaming stops when there are no rejected tables
present in Vertica.

Solution
Perform these steps to resolve this issue:

1. Log on to the Vertica node as the dbadmin user.


2. Run the following query. If schema or table name has special characters, then enclose them with double quotes(" "):

COPY "<schema>"."<table>" SOURCE public.SinglePulsarSource(timeslice_id='default' , serviceURL='pulsar+ssl://localhost:9999


99', stream='public/default/testtopic|0|,public/default/testtopic|1|,public/default/testtopic|2|', log_level='error' , subscription='defa
ult' , performance_test='false' , has_message_check='true',reader_ack_grouping_time_millis=0,max_message_count=0,max_stre
am_bytes=0,is_saas='true',tenant_name='some_tenant',duration=interval '1200 milliseconds' , read_timeout=interval '500 millis
econds') PARSER public.KafkaJSONParser(flatten_arrays=True,flatten_maps=True) REJECTED DATA AS TABLE "<schema>"."<tabl
e>_rejected" AUTO STREAM NAME 'vkstream_itom_di_scheduler_provider_default_microbatch_1_test'

For example:

COPY mf_shared_provider_default.opr_event_annotation SOURCE public.SinglePulsarSource(timeslice_id='default' , serviceURL='lo


calhost:999999', stream='public/default/nonexistenttopic|0|', log_level='error' , subscription='default' , performance_test='false' ,
has_message_check='true',reader_ack_grouping_time_millis=0,max_message_count=0,max_stream_bytes=0,is_saas='true',tena
nt_name='null',duration=interval '30000 milliseconds' , read_timeout=interval '500 milliseconds', validate_hostname='false') PAR
SER KafkaParser() REJECTED DATA AS TABLE mf_shared_provider_default.opr_event_annotation_rejected STREAM NAME 'vkstream
_rej_create';

3. Run the following command to check the rejected table creation:

select count(*) from v_catalog.tables where table_name ilike '<table>_rejected';

For example:

select count(*) from v_catalog.tables where table_name ilike 'opr_event_annotation_rejected';

The output displays 1 .


4. If the data is still not streaming to the Vertica database, run the following commands to restart the itom-di-scheduler-udx
pod:

kubectl get pods -n <application namespace>

Note down the itom-di-scheduler-udx pod name.

kubectl delete pod itom-di-scheduler-udx-<pod value> -n <application namespace>


1.16.2.9. Error getting topic partitions metadata


Data doesn't flow to the Vertica database and the following error appears in the scheduler.log file:

Error getting topic partitions metadata: ConnectError or Failed to establish connection: Connection refused

Cause
This issue is because the OPTIC DL Vertica plugin fails to establish connections due to the OPTIC DL Message Bus port not
being open or OPTIC DL SSL certificate issues. The OPTIC DL Message Bus client fails to interact with OPTIC DL Message Bus
and the message streaming gets affected. As a result, the data streaming to the database gets affected.

To further check the ERROR message and the reason from the scheduler.log , run the following command:

grep "ERROR" scheduler.log | grep "Failed extracting partition for topic"

The output similar to the following appears:

ERROR 27 [Notification-Configuration-Processor-Thread-5] [Config name: dataset_name_abc_dblogerror_trail_abc, config type: MICROBATCH, Event type: CREATE]--- c.v.solutions.kafka.model.StreamSource : Unable to fetch the partition information: UDX Version(2.12.0-20) Failed extracting partition for topic - requesting restart ConnectError

Solution
Follow these steps to resolve this issue:

1. Run the following command to get the configured OPTIC DL Message Bus Proxy or broker port from scheduler config
map:
kubectl get cm itom-di-udx-scheduler-scheduler -n <application namespace> -o yaml | grep pulsar.datasource.port
The output similar to the following appears:
pulsar.datasource.port: "31051"
2. Verify that the configured pulsar port is available and reachable from all the Vertica nodes (see the connectivity check after this list).
3. Verify the OPTIC DL SSL certificates are valid.
4. Restart the Vertica database.
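
For step 2, a quick connectivity check from each Vertica node can confirm whether the port is reachable (a sketch; it assumes the nc utility is installed on the Vertica node and uses the port value from step 1):

nc -zv <Kubernetes master or worker hostname> <pulsar port from step 1>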


1.16.2.10. Insufficient resources on pool error


An Insufficient resources on pool error appears on Vertica and data is blocked from reaching the database.

java.sql.SQLTransientException: [Vertica][VJDBC](3587) ERROR: Insufficient resources to execute plan on pool <suite>_resource_pool

Cause
This is because the streaming and bulk upload channels store metrics into OPTIC Data Lake simultaneously, and the Vertica database queries for these operations have high resource demands. This error message indicates that the query couldn't execute because the resources available in Vertica at that point in time didn't match the estimated resource requirement. The resource is memory in MB.

Solution
You must configure resource pools for different operations recommended by the suite. The parameters of the resource pools
depend on the concurrent load from the operation.
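
As a starting point, you can inspect and adjust the pool settings from vsql (a sketch; the pool name and values are placeholders and must follow the sizing recommended for your deployment):

SELECT * FROM v_monitor.resource_pool_status WHERE pool_name ILIKE '%resource_pool%';
ALTER RESOURCE POOL <suite>_resource_pool MEMORYSIZE '<size>' PLANNEDCONCURRENCY <concurrency>;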


1.16.3. Troubleshoot OPTIC DL connection issues


This section covers possible problems that can cause the connections to OPTIC Data Lake to fail and how you can
troubleshoot them.


1.16.3.1. Vertica database is not reachable


The Vertica database is unreachable and the following error appears in /var/log/messages:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

Cause
This error message appears in the /var/log/messages before the database goes into an unexpected shut down. This is because
the deployments have high RAM provisioning and low disk I/O.

Solution
To resolve this issue, set the dirty tuning ratios for the OS of the node where the issue occurred as mentioned in the Vertica
documentation: Tuning Parameters.
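
The dirty ratios are kernel parameters that you set with sysctl, for example (a sketch; take the actual values from the Vertica Tuning Parameters page):

sysctl vm.dirty_ratio vm.dirty_background_ratio
echo 'vm.dirty_ratio = <value from the Vertica documentation>' >> /etc/sysctl.d/99-vertica.conf
echo 'vm.dirty_background_ratio = <value from the Vertica documentation>' >> /etc/sysctl.d/99-vertica.conf
sysctl -p /etc/sysctl.d/99-vertica.conf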


1.16.3.2. Failed to connect to host


java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](100176) Failed to connect to host <vertica_hostname> on port 5433.
Reason: Failed to establish a connection to the primary server or any backup address.

Cause
This issue occurs if the Vertica database is down or if the database connection details provided during the initial configuration aren't correct.

Solution
Try the following solutions one by one:

1. Verify that the configuration parameters for the database connection are correct.
2. Verify that the Vertica database node is running using the adminTools interface (see the sketch after this list).
3. If there are configuration changes for any particular pod, restart that pod for the new configuration to take effect.
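
For step 2, the node state can also be checked from the command line as the dbadmin user (a sketch; admintools is in /opt/vertica/bin by default):

/opt/vertica/bin/admintools -t list_allnodes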


1.16.3.3. Correlated group of events does not appear in OBM

Cause
A long downtime may occur during the upgrade process. If the OPTIC Data Lake is inactive for more than 16 hours, the correlated group of events does not appear in OBM.

Solution
This is a known issue in Kubernetes. Delete the cron job definition and resubmit it.

Run the following commands:

kubectl get cronjob -n <application namespace> itom-analytics-community-detector-job -o yaml > community-detector-job.yaml

kubectl delete cronjob -n <application namespace> itom-analytics-community-detector-job

kubectl create -f community-detector-job.yaml

rm community-detector-job.yaml


1.16.3.4. Data Source not getting listed in PD


DataSource “OPTIC Data Lake” not getting listed in Performance Dashboard (PD) while designing a graph.

Cause
This may occur when there is a topology synchronization issue.

Solution
1. Check if the selected CI's monitored_by property contains the value SiteScope, that is, the CI is monitored by SiteScope.
2. In OBM UI, select Administration > Setup and Maintenance > Infrastructure Settings > Foundations. Select
ITOM Intelligent Data Lake and check if Data Receiver Endpoint URL is configured.
3. If the data receiver endpoint is configured but the data source is still not listed in the PD UI, the Data Access Endpoint may not be available. Contact the Support team.


1.16.3.5. Table does not get deleted after dataset is deleted
The dataset configuration DELETE API call deletes the dataset configuration from the configuration schema but tables aren't
deleted in Vertica.

Run the following queries on the Vertica node for further check:

Note

For the tenant name 'provider' and deployment name 'default' the schema names are: itom_di_configuration_provider_default, itom_di_metadata_provider_default, mf_shared_provider_default.

1. In the schema itom_di_configuration_<tenant>_<deployment> , for table DATASET , make sure that the configuration isn't
present:
Select * from "itom_di_configuration_<tenant>_<deployment>"."DATASET" where name ilike '<Dataset Name>';
2. In the schema itom_di_metadata_<tenant>_<deployment> , for table FIELD_METADATA , make sure the metadata is present:
Select * from "itom_di_metadata_<tenant>_<deployment>"."FIELD_METADATA" where dataset_name ilike '<Dataset Name>';
3. In the schema mf_shared_<tenant>_<deployment> , make sure that the table is present:
Select * from "mf_shared_<tenant>_<deployment>"."<Dataset Name>";

Cause
This issue is because the DELETE call sent by the itom-di-administration pod isn't received or not processed by the itom-di-meta
data-server pod.

Solution
Perform these steps on the Vertica node to resolve this issue:

1. Run the following query to get the server ID:


Select DATASET_SERVER_UNIQUE_ID from "itom_di_metadata_<tenant>_<deployment>"."FIELD_METADATA" where dataset_name ilike '<Dataset Name>';
Note down the server ID.
2. Delete the records from FIELD_METADATA , FIELD_TAG , and DATASET_TAG in the schema itom_di_metadata_<tenant>_<deployment> :
DELETE from "itom_di_metadata_<tenant>_<deployment>"."FIELD_METADATA" where DATASET_SERVER_UNIQUE_ID='<serverId from the step 1>';
DELETE from "itom_di_metadata_<tenant>_<deployment>"."FIELD_TAG" where DATASET_SERVER_UNIQUE_ID='<serverId from the step 1>';
DELETE from "itom_di_metadata_<tenant>_<deployment>"."DATASET_TAG" where DATASET_SERVER_UNIQUE_ID='<serverId from the step 1>';
3. Drop the tables from the schema mf_shared_<tenant>_<deployment> :
DROP TABLE "mf_shared_<tenant>_<deployment>"."<Dataset Name>";
4. Send the POST API call for the dataset.


1.16.3.6. Vertica catalog directory has random empty folders
Empty folders with random unreadable names in the Vertica catalog directory.

Cause
This issue is because, during the OPTIC DL Vertica Plugin initialization or reconfiguration, the SSL directory path may get
initialized with unwanted characters. This may result in the creation of these empty folders in the Vertica catalog directory.

Solution
There is no other impact on the data ingestion and the folders can be manually cleaned up. You must check all the Vertica
nodes and perform the clean up on each of the Vertica nodes. Follow these steps:

1. Log on to the Vertica node and go to the catalog folder.


2. Run the following command to get the list of empty folders:
find . -type d -empty
This lists the folders which are empty. Check the folder names with unwanted characters.
3. Run the following command to get the folder ID (inode number):
ls -ila
Note down the folder ID.
4. Run the following command to delete the folder by its ID:
find . -inum <Folder ID from step 3> -exec rm -ir {} \;


1.16.4. Troubleshoot OPTIC DL pods issues


This section covers possible problems that can cause the OPTIC Data Lake data pods issue and how you can troubleshoot
them.


1.16.4.1. The itom-di-metadata-server and itom-di-data-access-dpl pods are not Up and Running after installation

The itom-di-metadata-server and itom-di-data-access-dpl pods aren't Up and Running completely after the installation. The metadata-server-out.log displays FlywayMigrateException.

Cause
This issue occurs during the suite installation when the itom-di-metadata-server pod gets restarted while it's starting up, which causes the metadata schema to go into an inconsistent state.

To verify the issue perform these steps:

1. After the suite installation run the following command to check the pod status:
kubectl get pods -n <suite namespace>
The status of all the pods appears. The following output appears for the itom-di-metadata-server and itom-di-data-access-dpl pods not running completely:

itom-di-data-access-dpl-85656848f7-dkdqp 0/2 Init:2/4 0 9h
itom-di-metadata-server-cbbdd9d79-jj82j 1/2 Running 77 9h

2. Run the command: kubectl describe pv itom-log


Note down the Path from the output. For example: /var/vols/itom/<application namespace>/itom-log
3. Go to the location <log path>/di/metadata-server/__itom-di-metadata-server-<pod value> and open the metadata-server-out.log
file. The following exception appears:

2021-02-11 14:47:26.331 ERROR 96 [main] --- o.s.boot.SpringApplication : Application run failed

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'flywayInitializer' defined in class path resource [org/springframework/boot/autoconfigure/flyway/FlywayAutoConfiguration$FlywayConfiguration.class]: Invocation of init method failed; nested exception is org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException:

Migration <schema migration file name>.sql failed
-----------------------------------------------------------------
SQL State : <SQL state>
Error Code : <error code>
Message : [Vertica][VJDBC](<error code>) ROLLBACK: <Rollback message>
Location : db/migration/<schema migration file name>.sql (<File Location with issue>)
Line : <error line number>
Statement : <SQL Statement with issue>

Solution
Perform these steps to resolve this issue:

Note

Perform these steps only in the case of a new installation of the suite. In a managed Kubernetes deployment, a reference to the
master node in this topic implies bastion node.

1. Run the following command on the master node to scale down the itom-di-metadata-server pod:


kubectl -n <suite namespace> scale deployment itom-di-metadata-server --replicas=0


2. Run the following query on the Vertica server to drop the itom_di_metadata_provider_default schema from the database:
drop schema itom_di_metadata_provider_default cascade;
3. Run the following command on the master node to scale up the itom-di-metadata-server pod:
kubectl -n <suite namespace> scale deployment itom-di-metadata-server --replicas=1


1.16.4.2. How to recover itomdipulsar-bookkeeper pods from read-only mode
This topic helps you to verify and recover the itomdipulsar-bookkeeper pods from read-only mode. If the used disk space reaches 95%, the Bookie switches to read-only mode.

On the Grafana Pulsar-Bookie Metrics dashboard, the ReadOnly Bookies panel displays a value indicating that the pod disk is 95% filled. This means that the disk space is low. In the Healthy pane, the Writeable Bookies and writable bookies (percentage) panels display lower values than configured.

Tip

You can check the number of configured Bookies on the Grafana Pulsar-Bookie Metrics
dashboard.

On further check, ITOM DI/Pulsar Overview dashboard, Pod overview pane, the Bookie and Bookies up panels display
lower values than configured.

Run the command to check the EnsembleSize :

kubectl -n <application namespace> describe cm itomdipulsar-broker | grep -A2 managedLedgerDefaultEnsembleSize

The default value of EnsembleSize for a production setup (other than the low footprint deployment) is 2.

If the number of writable Bookies is less than the EnsembleSize configured, the application stops working.

Perform the steps mentioned in any one of the solutions to resolve the issue so that you have a minimum of two writable
Bookies.

Solution 1
This solution is applicable for the application deployed on AWS and gives you steps to increase the storage volume that helps
the system handle the additional load.

If you have set up AWS EBS dynamic volumes and deployed the application, you can edit the storage class and increase the
capacity of the PVC volume. Follow these steps to resolve this issue:

1. Log on to the bastion node.


2. Open the OPTIC DL Message Bus storage class YAML file. After the volumeBindingMode parameter, include the
parameter allowVolumeExpansion: true . Save the file.
3. Run the following commands to get the PVC names:
kubectl -n <application namespace> get pvc
Note down the itomdipulsar-bookkeeper PVC names from the Name column.
4. Run the following command to edit each of the PVC and increase the Capacity :
kubectl -n <application namespace> edit pvc <pvc name>
You must resize the storage of one PVC at a time. Wait for the changes to be complete for one PVC and then update the
next. You can only increase the storage volume and can't decrease it. You can perform the resizing once in six hours for
an EBS volume.
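
The same changes can also be applied non-interactively with kubectl patch (a sketch; the storage class name, PVC name, and target size are placeholders):

kubectl patch storageclass <storage class name> -p '{"allowVolumeExpansion": true}'
kubectl -n <application namespace> patch pvc <itomdipulsar-bookkeeper PVC name> -p '{"spec":{"resources":{"requests":{"storage":"<new size>"}}}}'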

Solution 2
This solution gives you steps to add more worker nodes to handle the additional load.

To resolve this issue you must add additional itomdipulsar-bookkeeper pods. This solution requires additional worker nodes and
storage but, enables the OPTIC DL Message Bus to function without data loss.

1. Add two additional worker nodes to the existing Kubernetes cluster.


2. After you add the worker nodes to the cluster, configure the PV on that node for the itomdipulsar-bookkeeper pod with the
required size.


3. Run the following command to scale up the itomdipulsar-bookkeeper pod by 2:

kubectl -n <application namespace> scale statefulset itomdipulsar-bookkeeper --replicas=<Number_Of_ExistingReplicas>+2

For example: If the existing itomdipulsar-bookkeeper replicas are 3, the command to scale up is:

kubectl -n helm scale statefulset itomdipulsar-bookkeeper --replicas=5

4. Go to the ITOM DI/Pulsar - Bookie Metrics dashboard and check if the Writable Bookies panel displays a minimum
of two BookKeeper pods as writable.

Solution 3
This solution is applicable for the application deployed on embedded Kubernetes and gives you steps to clean up the BookKeeper pod data that isn't consumed by the application. This relieves the load on the system. However, there will be a minor loss of data that isn't consumed by the application.

When two or more itomdipulsar-bookkeeper pods reach read-only state, the applications can't ingest data to the OPTIC DL
Message Bus. Applications can consume data from the OPTIC DL Message Bus but won't be able to acknowledge the
messages. To resolve this issue, you must clean up the itomdipulsar-bookkeeper pod data for the pods that aren't available.
With this solution, you will lose the messages that aren't yet consumed by the application.

1. Run the following command to stop the pods:

/opt/kubernetes/scripts/cdfctl.sh runlevel set -l DOWN -n <application namespace>

2. On each worker node, clean up the volumes used by OPTIC DL Message Bus itomdipulsar-bookkeeper and itomdipulsar-zook
eeper pods. You must clean up only the contents of the volume folder. Run the following commands:

kubectl -n <application namespace> get pvc | grep 'itomdipulsar-bookkeeper\|itomdipulsar-zookeeper\|VOLUME'

Note down the VOLUME names for the itomdipulsar-bookkeeper and itomdipulsar-zookeeper pods.

kubectl describe pv <Volume> | grep -A 2 'Claim\|Node Affinity\|Source' | grep 'Claim\|hostname\|Path'

Where:
Claim indicates the BookKeeper or ZooKeeper volume claim.
The hostname (under Node Affinity) indicates the node on which the PV resides.
Path gives the path of the volume.
Note down the Path for the volumes.

cd <path to the volume used by OPTIC DL Message bus pods>


rm -rf *

3. Run the following command to drop Vertica Streaming Loader schema:

DROP SCHEMA <schema name> CASCADE;

The default value for the Vertica Streaming Loader is: itom_di_scheduler_provider_default
For example:

DROP SCHEMA itom_di_scheduler_provider_default CASCADE

4. Run the following command to start the pods:

/opt/kubernetes/scripts/cdfctl.sh runlevel set -l UP -n <application namespace>

You will see that the pods aren't in the Running state. The pods will run after you complete the next step.
5. Run the following command to bring the pods to the running state. For noop= true , you must use a parameter that isn't
used in any of the application charts:


helm upgrade <release name> <application chart path> -n <application namespace> --set noop=true -f <values YAML filename>
--no-hooks

6. Go to the ITOM DI/Pulsar - Bookie Metrics dashboard and check if the Writable Bookies panel displays a minimum
of two BookKeeper pods as writable.
7. Run the following commands to restart the itom-di-administration pod:

kubectl scale deployment itom-di-administration --replicas=0 -n <application namespace>

kubectl scale deployment itom-di-administration --replicas=1 -n <application namespace>


1.16.4.3. itom-di-dp-worker-dpl pod is in CrashLoopBackOff state
OPTICDL:24.2/DPCrashbackloopoff


1.16.4.4. The itomdipulsar pods stuck in the init state
The itomdipulsar pods are stuck in the init state and the PVC is in Pending state.

Cause
This issue is because of the following:

you have configured the Local Volume Provisioner's PV with disk sizes less than the default size
OR
the disk size isn't the same as the size in the values YAML file.

Solution
Perform these steps to resolve this issue:

1. Run the following command and note down the PVC names for the storage class:
kubectl get pvc -n <suite namespace>
The default name for the storage class is fast-disks . If you have provided any other name for the storage class note
down the storage class name and run the following command to list the PVC:
kubectl get pvc -n <suite namespace> | grep <storage class name> | grep -v NAME | awk '{print $1}'
2. Run the following command to delete each of the PVC:
kubectl -n <suite namespace> delete pvc <PVC name>
3. Run the following commands to delete the itomdipulsar-bookkeeper and itomdipulsar-zookeeper StatefulSet pods:
kubectl get statefulset -n <suite namespace>
Note down the itomdipulsar-bookkeeper and itomdipulsar-zookeeper StatefulSet pod names and run the following
commands:
kubectl delete statefulset -n <suite namespace> <itomdipulsar-bookkeeper pod name>
kubectl delete statefulset -n <suite namespace> <itomdipulsar-zookeeper pod name>
4. Run the following commands and note down the PV names for zookeeper-data , bookkeeper-journal and bookkeeper-ledgers :
kubectl get pv | grep <storage class name> | grep -v NAME | awk '{print $1}'
5. Run the following command and note down the Capacity of each of the PV:
kubectl describe pv <PV name>
6. You must update the values YAML file with the values from the earlier steps and upgrade the helm chart.

Follow these steps to update the YAML file and upgrade the helm chart:

1. Update the values YAML file for OPTIC DL Message Bus with the Capacity values noted down from the earlier step:

bookkeeper:
volumes:
journal:
name: "journal"
size: "<DISKSIZE>"
ledgers:
name: "ledgers"
size: "<DISKSIZE>"
zookeeper:
volumes:
data:
name: "zookeeper-data"
size: "<DISKSIZE>"

2. Run the following command to upgrade the helm chart:


helm upgrade <release name> <chart name> -n <suite namespace> -f <values.yaml>


1.16.4.5. itomdipulsar-zookeeper pod in CrashLoopBackOff state
The itomdipulsar-zookeeper pod is in CrashLoopBackOff state.

Cause
This issue is because the node is restarted during the data flow. The itomdipulsar-zookeeper log file reports the following
exception:

2022-03-15T10:52:49,629 [main] INFO org.apache.zookeeper.server.persistence.FileSnap - Reading snapshot /pulsar/data/zookeeper/version-2/snapshot.400068543
2022-03-15T10:52:49,647 [main] ERROR org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk
java.io.IOException: Transaction log: /pulsar/data/zookeeper/version-2/log.40005ecdf has invalid magic number 0 != 1514884167
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:643) ~[org.apache.pulsar-pulsar-zookeeper-2.7.2.jar:2.7.2]

Solution
Perform these steps to resolve this issue:

1. On the control plane run the following command to scale down the itomdipulsar-zookeeper pod:
kubectl scale statefulset itomdipulsar-zookeeper --replicas=0 -n <suite namespace>
2. Run the following command to get the PV and the PVC attached to the itomdipulsar-zookeeper pod in
CrashLoopBackOff state:
kubectl get pvc -n <suite namespace>
3. Run the following command to describe the zookeeper data PV:
kubectl describe pv <zookeeper data PV name>
4. From the output, go to the location of the PV on the node where it's running.
5. Delete the corrupted file reported in the itomdipulsar-zookeeper log file (see the sketch after this list).
6. Run the following command to scale up the itomdipulsar-zookeeper pod:
kubectl scale statefulset itomdipulsar-zookeeper --replicas=1 -n <suite namespace>
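
For step 5, the clean-up looks like the following (a sketch; the file name shown is the one from the example log above, the version-2 subdirectory location may differ on your volume, and moving the file aside is safer than deleting it outright):

cd <zookeeper data PV path from step 4>/version-2
mv log.40005ecdf /tmp/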


1.16.4.6. Postload pods do not start and are stuck in 1/2 status
The itom-di-postload-taskcontroller and itom-di-postload-taskexecutor pods don't start and are stuck in 1/2 status.

Cause
This is because of a failure in the creation of one or more of the Postload internal topics in the OPTIC Data Lake Message Bus.
Sometimes, the topic partitions aren't successfully created even though topic creation has returned success in OPTIC Data
Lake Message Bus. This is because of an issue in OPTIC Data Lake Message Bus:
https://github.com/apache/pulsar/issues/9173

For further debugging, you can check the taskcontroller.log file if the following error messages appear:

2021-06-07 08:54:29.036 INFO 106 [main] --- c.m.itomdi.utils.manager.TopicCreator : Going to create topic : di_postload_task_topic with partition count 1

2021-06-07 08:54:29.065 INFO 106 [main] --- c.m.itomdi.utils.manager.TopicCreator : Namespace itomdipostload already exists

2021-06-07 08:54:29.470 ERROR 106 [main] --- c.m.i.u.p.PostloadProcessorProducer : Exception in creating producer for the topic: persistent://public/itomdipostload/di_postload_task_topic.

org.apache.pulsar.client.api.PulsarClientException$TopicDoesNotExistException: Topic Not Found.

Solution
Perform the following steps to resolve this issue:

1. Log on to the OPTIC Data Lake Message Bus bastion pod:


kubectl exec -it itomdipulsar-bastion-0 -n <suite namespace> -c pulsar -- bash
2. Run the following command with the topic name that appears in the log file:
bin/pulsar-admin topics create-missed-partitions public/itomdipostload/<topic name in the log file>
For example:
bin/pulsar-admin topics create-missed-partitions public/itomdipostload/di_postload_task_topic
3. Run the following command to verify the topic creation:
pulsar-admin topics list public/itomdipostload
4. Run the following command to check if there are any errors for the topic created:
pulsar-admin topics stats public/itomdipostload/<topicname>
5. Run the following command to check if the pods are running:
kubectl get pods -n <suite namespace>


1.16.4.7. Suite deployment failed with pods in pending state
The suite deployment failed with pods in pending state.

Cause
This issue may be because of insufficient memory in the pod.

To verify, run the commands:

1. kubectl get pods -n <suite namespace>


If any of the pod(s) from the output display status as Pending, note down the pod name.
2. kubectl describe pods -n <suite namespace> <pod name>
A message similar to the following appears:
Insufficient memory (3)

Solution
Perform the following steps to resolve the issue:

1. Run the following command to check the memory:


kubectl get nodes
Note down the worker node name for each worker listed.
2. Run the following command:
kubectl describe nodes <Worker_node_name>
From the output scroll to Allocated Resources similar to the following:

Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 6515m (54%) 38150m (317%)
memory 19885Mi (98%) 49904Mi (246%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)

The output displays the requested memory for the worker. If the percentage used in the Requests column is high, 98%
for example, it's likely additional pods won't schedule on this worker. The output displays a list of memory requested
from pods on the running system.
In the output, scroll to the section Non-terminated Pods similar to the following:

Non-terminated Pods: (24 in total)
Namespace  Name                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
---------  ----                            ------------  ----------  ---------------  -------------  ---
core       fluentd-w589p                   105m (0%)     500m (4%)   205Mi (1%)       650Mi (3%)     3h19m
core       itom-logrotate-xs6h7            100m (0%)     200m (1%)   100Mi (0%)       200Mi (0%)     3h19m
core       local-volume-provisioner-8989t  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h7m

The list of pods and their requested usage appears. You may see some pods with zero memory requested; a pod that's scheduled but not yet executed doesn't show any memory allocated. With this output, you can find what's deployed on the node and the memory requested to schedule each pod. StatefulSets have an additional dependency in Kubernetes: with multiple replicas, the pods start in numerical order, and each replica must be up and running successfully before the next one is started. For example, if the bookkeeper or zookeeper pods fail to start instance 0, it's possible that there isn't enough memory on the other workers to schedule the later replicas. In that case, check the memory usage of the other workers as well.
3. Make sure to calculate the memory according to the requirement and increase the memory of the worker nodes.
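
To review the requests on every worker in one pass, a small loop over the nodes can help (a sketch):

for node in $(kubectl get nodes -o name); do
  echo "== $node =="
  kubectl describe "$node" | grep -A 8 "Allocated resources"
done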


1.16.5. Troubleshoot using ITOM DI monitoring dashboards
This section covers possible problems that you can resolve using the OPTIC Data Lake ITOM DI monitoring dashboards.


1.16.5.1. Guidelines for adding panels to the OPTIC Data Lake Health Insights dashboard
You can add more panels to the out of the box ITOM DI Grafana dashboards or create custom dashboards. Follow these
guidelines:

1. You mustn't alter or save the changes in the OOTB ITOM DI dashboard.
2. Create a copy from the ITOM DI dashboard and then add panels.
3. You must save the dashboard with a different name. Make sure that the new name isn't the same as the existing OOTB
ITOM DI dashboards.
4. You must create and save the custom dashboard with a different name and in a different folder.

Alter a dashboard
You mustn't alter or save the changes in the OOTB ITOM DI dashboard. You may create a copy or duplicate the required
dashboard and then use it to alter.

1. Navigate to the dashboard you want to alter.


2. Click the Dashboard settings icon.
3. Click Save As... on the left bar.
4. Type the name of the duplicated dashboard and click Save.


1.16.5.2. Vertica Streaming Loader dashboard panels have no data loaded
The ITOM DI/Vertica Streaming Loader dashboard panels display "No data".

Cause
This issue is because the interval level in the ITOM DI/Vertica Streaming Loader dashboard isn't set to at least 1.5 times the value of scrapeIntervalSec in the values YAML file.

The monitoring.verticapromexporter:scrapeIntervalSec: 60 parameter in the values YAML and the interval mentioned in the ITOM
DI/Vertica Streaming Loader dashboard are dependent on each other. The default value of
the scrapeIntervalSec parameter in the values YAML file is 60 seconds.
If you have edited scrapeIntervalSec value make sure to change the interval level in the dashboard.

Solution
In the ITOM DI/Vertica Streaming Loader dashboard, considering the scrapeIntervalSec as 60 sec , update the interval level
as 2 mins or 120 sec .
You must update the following panels with the interval level:

1. Pod Overview
Avg message ingestion rate
Avg ingestion throughput (bytes/sec)
2. Data Flow Summary
Scheduler message ingestion rate (all topics)
Scheduler ingestion throughput (bytes/sec) (all topics)
3. Per Topic
Scheduler message ingestion rate
Scheduler read rate
4. Per Partition
Scheduler message ingestion rate
Scheduler read rate

Perform the following steps in each of the panels to view the data:

1. Click on the drop-down icon next to the panel heading.

2. Click Edit.
3. In Metrics field, edit the value from sum(rate(vertica_pulsar_udx_message_count[1m])) to sum(rate(vertica_pulsar_udx_messag
e_count[2m]))

4. Click the Dashboards Settings icon and click Save As....


5. Save the dashboard with a new name. Make sure that the new name isn't the same as the existing ITOM DI dashboard
names.


1.16.5.3. The DP worker memory usage meter displays increasing memory usage
OPTICDL:24.2/DPWorkerMemoryincreases


1.16.5.4. Data Flow Overview dashboard displays some topics with message batch backlog greater than 10K
After starting the data collection, the data isn't arriving in Vertica at the expected time. The ITOM DI/Data Flow Overview
dashboard displays some topics with the message batch backlog greater than 10K .

Cause
The issue is either because the Vertica Streaming Loader isn't able to pull data from the OPTIC DL Message Bus for these topics, or because an issue in Vertica resource usage halts a few of the data streams.

Solution
Perform the following steps to resolve the issue:

1. Click the drop-down icon on the Top 10 topics backlog panel and go to the ITOM DI/ Pulsar - Topic dashboard.

2. From the Topic drop-down, select the particular topic for which the message backlog is greater than 10k.

3. Scroll down to the Local msg batch backlog panel and hover the cursor over the graph to find which subscription has the message backlog for that topic.

4. If the subscription has itom_di_scheduler in the subscription name, go to the ITOM DI/Vertica Streaming
Loader dashboard.

5. In the ITOM DI/Vertica Streaming Loader dashboard, see the Scheduler message ingestion rate panel.
Select the topic for which the message backlog is higher.

If the ingestion rate is 0 for that topic, then there are some errors in the streaming of data from that particular topic. Follow these steps:

1. Run the following command to check the Vertica Streaming Loader logs:
kubectl -n <suite namespace> logs itom-di-udx-scheduler-<pod value> -c itom-di-udx-scheduler-scheduler

2. From this log information check the exact cause of the issue and the remediation steps.
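
To narrow the search, you can filter the same log for errors (a sketch):

kubectl -n <suite namespace> logs itom-di-udx-scheduler-<pod value> -c itom-di-udx-scheduler-scheduler | grep -i error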

If the ingestion rate is > 0 for that topic, it's because the data is getting streamed to Vertica but the streaming rate is slow.
Follow these steps:

1. Go to the Vertica dashboard.

2. In the Vertica dashboard, check the CPU usage, memory usage, and Resource pool memory usage.

3. If any of these panels display resource issues, check the Vertica logs. Go to the following location on the Vertica system
to check the logs:
<catalog-path>/<database-name>/<node-name>_catalog/vertica.log

4. From this log information check the exact cause of the issue and the remediation steps.

If the cause of the issue isn't clear from logs, collect the log details and contact Software Support.


1.16.5.5. Data not found in Vertica and Scheduler batch message count is zero
The data is missing in the Vertica database. The Vertica Streaming dashboard Scheduler batch message count panel shows zero, but the Pulsar-Topic dashboard shows an increasing message backlog. On further checking the Pulsar-Topic dashboard for any topic with continuous message streaming, the backlog is increasing but the data rows in the data table aren't increasing.

Possible causes
Cause 1: Data set and micro batch information aren't available
Cause 2: Communication issue between the Vertica node and OPTIC DL Message Bus Proxy services
Cause 3: Data is in the Vertica database rejected tables
Cause 4: Mismatch in the mapping of a topic name to the corresponding table name or missing configuration

Solution

Data set and micro batch information aren't available


1. Log on to the Vertica database.
2. Run the following query to check if the stream_* tables have the required configurations for the dataset:
select table_schema, table_name, create_time from v_catalog.tables where table_name ILIKE '<table name>' ;
3. Run the following query:
select * from itom_di_scheduler_provider_default.stream_sources where source ilike '<table name>' ;
Check if the stream_* table enabled column has the value as t for the corresponding dataset.

If the data table is present, but micro batch (stream source) isn't present in stream_sources , follow these steps:

1. Run the following commands to restart the itom-di-scheduler-udx pod:


kubectl get pods -n <suite namespace>
Note down the itom-di-scheduler-udx pod name.
kubectl delete pod itom-di-scheduler-udx-<pod value> -n <suite namespace>

Tip

Wait for a few minutes for the pod to initialize.

2. Run the queries mentioned in steps 2 and 3 and check if stream_sources have the micro batch information.

Communication issue between the Vertica node and proxy services


In case of communication issues between the Vertica node and OPTIC DL Message Bus Proxy services, you can use the disyst
emcheck tool to check the connectivity. Perform the following steps:

Note

In a managed Kubernetes deployment, a reference to the master node in this topic implies a bastion
node.

1. The disystemcheck tool is available when you install the OPTIC DL Vertica plugin. It's available in the /usr/local/itom-di-pulsarudx/bin/ folder. If you are testing on any other node, you must copy this tool to that node. Go to the /usr/local/itom-di-pulsarudx/bin/ folder on the Vertica node.
2. Run disystemcheck tool as follows:
./disystemcheck -h <master hostname>
where:


-h : is the hostname of the Kubernetes master or worker that you want to check.

Note

If the OPTIC DL service ports are different than the default ones, use the disystemcheck CLI options, to specify the
ports.

3. If there are issues in connectivity or port availability, the tool gives error messages that you must fix.

Data is in the Vertica database rejected tables


For each of the streaming tables, a corresponding rejected table gets created. This table has the rejected data and the
reason for rejection while copying data from OPTIC DL Message Bus to Vertica.

If the data is available in the rejected table, check the rejection reason in the same table and fix the issue.
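
A quick way to see whether rows were rejected and why is to group on the rejection reason (a sketch; it assumes the default rejected-table columns and uses placeholder schema and table names):

SELECT rejected_reason, COUNT(*) FROM "<schema>"."<table>_rejected" GROUP BY rejected_reason ORDER BY COUNT(*) DESC;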

Mismatch in the mapping of a topic name to the corresponding


table name or missing configuration
1. Log on to the Vertica database. Use either the admin tools or any of the DB Visualizer tool of your choice to view the
database tables.
2. Click itom_di_configuration_* > TABLE > MICROBATCH. In the Data tab, check the CONTENT_JSON row if the table
name and topic name are the same.
For example:
"topic_name": "SCOPE_GLOBAL",
"streaming_table_schema": "<suite>_store",
"streaming_table_name": "SCOPE_GLOBAL"
3. If there is a mismatch, update them. If the configuration is missing, check the administration.log file to see the complete
details. If the cause of the issue isn't clear from the logs, collect the log details and contact Software Support.


1.16.5.6. Postload Detail dashboard

When to look at this dashboard


You may use the dashboard in the following cases:

Case 1 - The data isn't loaded to the Vertica database. The Task Controller and Task Executor pods are up and running.
However, in the Postload Detail dashboard, the Taskflow drop-down doesn't list the taskflows.

Follow these steps to check the data flow and troubleshoot:

1. Open the Postload Detail dashboard.


2. In the Taskflow drop-down, check if it lists all the configured taskflows. If you see that there are missing or no
taskflows, this means that the Postload pods haven't received the configurations. Continue with the next steps.
3. Go to the Postload Overview dashboard. In the Taskflow overview section, check if the Taskflow state and Task
state panels display 0.

To resolve this issue, follow the steps mentioned in the troubleshooting scenario Postload Detail dashboard Taskflow drop-
down doesn't list the configured taskflows.

About the dashboard


The Postload Detail dashboard provides detailed taskflow information during the Postload processing of data. This dashboard
also provides the following details:

The state and status for each taskflow and details of the configured tasks.
The Task Controller and Task Executor pod memory usage details like CPU, memory usage, and direct memory.

You can click the Postload Overview button to go to the dashboard.

The following table provides you the details of the Postload detail dashboard:

Panel name Description

Taskflow - The drop-down list to select the taskflow.


Taskflow statistics - This panel provides an overview of memory utilized by the running pods, job, and task details.

Taskflow state - The current state of the selected taskflow.


The different states of taskflow: RUNNING , SUSPENDED , FINISHED .

Taskflow status - The current status of the selected taskflow.


The different status of taskflow: ERROR , WARNING .

Task details - The list of tasks with details for selected taskflow. This panel displays the following information:

a. Task id
b. Task name
c. Taskflow id
d. Taskflow name
e. Task state
f. Task status
g. Start time
h. End time
i. Retry count
j. Task exec time exceeded

Pod details - This panel provides the Memory and CPU usage information by the task controller and task executor pods.

1. Task Controller memory usage - The memory used over time by the itom-di-postload-taskcontroller pod.
2. Task Controller CPU usage - The CPU usage over time by the itom-di-postload-taskcontroller pod.
3. Task Executor memory usage - The memory used over time by the itom-di-postload-taskexecutor pod.
4. Task Executor CPU usage - The CPU usage over time by the itom-di-postload-taskexecutor pod.
5. Task Controller direct memory usage - The direct memory used over time by the itom-di-postload-taskcontroller pod.
6. Task Executor direct memory usage - The direct memory used over time by the itom-di-postload-taskexecutor pod.


1.16.5.7. Postload Detail dashboard Taskflow drop-down does not list the configured task flows
The ITOM DI/Postload Detail dashboard, Taskflow drop-down doesn't list the configured task flows. On further
investigation, the ITOM DI/Postload Overview dashboard Taskflow state and Task state panels don't show any value.

Possible causes
Cause 1: Due to missing task flow configuration
Cause 2: The itom-di-administration pod isn't running

Solution

Due to missing task flow configuration


In the Vertica database, use either the admin tools or the DB Visualizer tool and check the itom_di_configuration_* schema.
The POSTLOAD_TASKFLOW table should be available with the schema, table, CSV name, and the mapping details. If the
table data is missing contact Software Support.
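
For the default tenant and deployment, a query such as the following shows whether the configuration rows exist (a sketch; adjust the schema name for your tenant and deployment):

SELECT * FROM "itom_di_configuration_provider_default"."POSTLOAD_TASKFLOW";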

The itom-di-administration pod isn't running


Run the following command to check if the pod is running:
kubectl get pods -n <suite namespace> | grep itom-di-administration
If the pods aren't running, check the administration.log file. Search the administration.log file with the task id to see the
complete details. The file shouldn't have any errors related to the aggregation. From the log information check
the exact cause of the issue and the remediation steps. To debug the issues, set the log file level from INFO to DEBUG .
If the itom-di-administration pod is running, collect the log details from the following log files and contact Software
Support:
taskexecutor.log
taskexecutor-out.log
taskcontroller.log


1.16.5.8. Postload Overview dashboard

When to look at this dashboard


You may use the dashboard in the following cases:

Case 1 - Data isn't loaded to Vertica. In the Postload Overview dashboard, the Pod overview panel shows the pods aren't in a
running state and the meter isn't at 100%.

Follow these steps to check the data flow and troubleshoot:

1. Log on to OPTIC Data Lake Health Insights dashboards.


2. Open the Postload Overview dashboard.
3. The number represents the number of Task Controller and Task Executor pods that are up and running.
4. The running percentage meter gives a comparison of the pods that are in running state for the
configured pod instances. If all pods are running this meter will be at 100%.

If you observe that the pods aren't running, perform the following troubleshooting steps:

1. Check the Pulsar pod communication.


2. Check if data in the OPTIC DL Message Bus topic reaches the Vertica database.
3. Data from OPTIC DL Message Bus isn't available in Vertica and backlog message is growing.

Case 2 - The data isn't loaded to the Vertica database. In the Postload Detail dashboard, the Taskflow drop-down lists
the taskflow, but the taskflow doesn't run. In the Postload Overview dashboard, the Failed non-recoverable tasks
panel displays tasks that are erroneous.

Follow these steps to check the data flow and troubleshoot:

1. Open the Postload Overview dashboard.


2. In the Taskflow overview section, check if any tasks of a taskflow are in FAILED_NON_RECOVERABLE status. These taskflows appear red in color under the Failed non-recoverable tasks panel.
A FAILED_NON_RECOVERABLE taskflow won't execute until you fix the cause of the failure(s). A taskflow can end up in the FAILED_NON_RECOVERABLE state if even one of its tasks fails execution.
3. Go to the Postload Detail dashboard and select the FAILED_NON_RECOVERABLE taskflow from the Taskflow drop-down
to get further details for the taskflow failure.

To resolve this issue, follow the steps mentioned in the troubleshooting scenario Postload task flow not running.

About the dashboard


The Postload Overview dashboard provides the message flow information during the Postload processing of data. This
dashboard also provides the following details:

The number of running Task Controller, Task Executor pods.


The overall CPU and memory usage by the Task Controller, Task Executor pods.
The list of Tasks in FAILED_NON_RECOVERABLE state.
The list of Tasks that have exceeded the maximum time.


The tasks and taskflow that are present and their states
The details of all the configured taskflows.

You can click the Postload detail dashboard button to go to the detailed dashboard.

The following table provides you the details of the Postload Overview dashboard:

Panel name Description

Pod overview - This panel provides an overview of Postload processing pod status and memory usage.

1. # Task Controller - The total number of task controller pods running.
2. Task Controller running - The percentage of task controller pods running.
3. Task Controller memory usage - The percentage of memory used by the task controller pod.
4. Task Controller CPU usage - The percentage of CPU used by the task controller pod.
5. # Task Executor - The total number of task executor pods running.
6. Task Executor running - The percentage of task executor pods running.
7. Task Executor memory usage - The percentage of memory used by the task executor pod.
8. Task Executor CPU usage - The percentage of CPU used by the task executor pod.

Taskflow overview - This panel provides an overview of tasks and task flows in Postload processing.

Failed non-recoverable tasks - The list of all tasks in the FAILED_NON_RECOVERABLE state. When a task in the data processing flow fails, it means that it has retried for the configured number of times (=1440 for bulk load - approx. 1 day) and the task status is then set to FAILED_NON_RECOVERABLE .
This panel displays the following information:

1. Task id
2. Task name
3. Taskflow id
4. Taskflow name
5. Task state
6. Task status
7. Status info
8. Start time
9. End time
10. Retry count
11. Taskflow state
12. Taskflow status

Task exceeded maximum time - The list of tasks that exceed the defined maximum time. The value can be 0 or 1 . The
value 0 indicates the task hasn't exceeded the maximum time defined and 1 indicates the task has exceeded the maximum
running time.

Taskflow state - The state of the configurable taskflow.


The different states of taskflow: SCHEDULED , RUNNING , FINISHED .
The different status of taskflow: SUCCESS , ERROR , WARNING .
Following is the description of the taskflow state and status:

1. A taskflow selected for execution is in SCHEDULED state and status NONE . All the tasks in this task flow are
in READY state at this time.
2. If a taskflow is in RUNNING state and status NONE , that means the taskflow is running or is about to run with the
checks passed. One of its tasks can be in DISPATCHED / RUNNING or all tasks can be in READY state at this time.
3. If a taskflow is in FINISHED state and status SUCCESS , that means all the tasks finished with status SUCCESS .
4. If a taskflow is in FINISHED state and status WARNING , that means one or more tasks finished with status SUCCESS_WI
TH_WARN or FAILED_RECOVERABLE and the other tasks finished with status SUCCESS .
5. If a taskflow is in FINISHED state and status ERROR , that means one of the tasks finished with the status FAILED_NON_
RECOVERABLE .

Task state - The state of the configurable task.


The different states of a task: READY , SCHEDULED , DISPATCHED , RUNNING , FINISHED .
The different status of a task: SUCCESS , SUCCESS_WITH_WARN , FAILED_RECOVERABLE , FAILED_NON_RECOVERABLE .
Following is the description of the task state and status:

1. If a task is in state READY and status NONE , that means the task is in a taskflow that's either scheduled to run or is
currently running.
2. If a task is in state SCHEDULED and status NONE , that means the itom-di-postload-taskcontroller pod has identified
the task that will go to the itom-di-postload-taskexecutor pod for running.
3. If a task is in state DISPATCHED and status NONE , that means the itom-di-postload-taskcontroller pod has sent the
task to the itom-di-postload-taskexecutor pod for running.
4. If a task is in state RUNNING and status NONE , that means the task is running in the itom-di-postload-taskexecutor pod.
5. If a task is in state FINISHED and status SUCCESS , that means the task ran successfully.
6. If a task is in state FINISHED and status SUCCESS_WITH_WARN , that means the task run finished with a warning.
7. If a task is in state FINISHED and status FAILED_RECOVERABLE , that means the task run failed and its retry count is
less than or equal to maximum retries.
8. If a task is in state FINISHED and status FAILED_NON_RECOVERABLE , that means the task run failed and its retry count
exceeds maximum retries.

Last 10 executed Taskflows - The list of the last 10 executed taskflows with details. This panel displays the following
information:

1. Taskflow id
2. Taskflow name
3. Taskflow state
4. Taskflow status
5. Status info
6. Start time
7. End time

Taskflow details - The list of taskflows with details based on the selected time range. This panel displays the following
information:

1. Taskflow id
2. Taskflow name
3. Taskflow state
4. Taskflow status
5. Status info
6. Start time
7. End time

Task details - The list of tasks with details based on the selected time range. This panel displays the following information:

1. Task id
2. Task name
3. Taskflow id
4. Taskflow name
5. Task state
6. Task status
7. Start time
8. End time
9. Retry count
10. Taskflow state
11. Taskflow status


1.16.5.9. Request error rate in Data Flow Overview dashboard is greater than zero

Some or most of the data hasn't reached Vertica. On checking further, the ITOM DI/Data Flow Overview dashboard's
Request error rate panel shows a value greater than zero.

Cause
The issue occurs because of one of the following:

The itom-di-receiver pod isn't running
Invalid certificates
An improper header sent with the request to the receiver
The OPTIC DL Message Bus topic isn't available

Solution
Perform the following steps to resolve the causes:

1. In the ITOM DI/Data Flow Overview dashboard, click the drop-down icon on the Request error rate panel and go to
the Receiver dashboard.
2. In the ITOM DI/Receiver dashboard, check if the receiver pod is running from the Pod overview > Receiver and
the Data Flow Overview dashboard panels.
3. Check the Topic drop-down if the list displays the required topic.
4. If the pods aren't running or the topics aren't listed, run the following command to check the OPTIC DL HTTP Receiver
logs:

kubectl -n <application namespace> logs itom-di-receiver-dpl-<pod value> -c itom-di-receiver-cnt

Alternatively, check the log file on the log volume:

/var/vols/itom/<application namespace>/itom-log/di/receiver/__itom-di-receiver-dpl-<pod value>/receiver-itom-di-receiver-dpl-<pod_name>.log

From the log information, identify the exact cause of the issue and the remediation steps.
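
For example, a quick scan of the receiver log for recent errors can narrow down the cause (a minimal sketch; namespace and pod value as in the command above):

kubectl -n <application namespace> logs itom-di-receiver-dpl-<pod value> -c itom-di-receiver-cnt | grep -iE "error|exception" | tail -20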

If the cause of the issue isn't clear from the logs, collect the log details and contact Software Support.


1.16.5.10. Request error rate in Data Flow Overview dashboard is increasing over time

Some or most of the data hasn't reached Vertica. On checking further, the ITOM DI/Data Flow Overview dashboard's
Request error rate panel shows a value that's increasing over time.

Cause
The issue occurs because either the OPTIC DL HTTP Receiver isn't able to forward the messages or the OPTIC DL Message Bus isn't
accepting messages.

Solution
Perform the following steps to resolve the issue:

1. Open the ITOM DI/Pulsar - Messaging Metrics dashboard.

2. Scroll to Storage, Storage Write Latency panel.​

3. Check if the average storage write latency is between 20 and 50 ms (or according to the suite requirement).
If the average storage write latency is greater than 1 second, the OPTIC DL Message Bus isn't able to write
data to its persistent store, or the write to the BookKeeper pod is slow.

4. Open the ITOM DI/Pulsar - Bookie Metrics dashboard for further debugging.

5. In ITOM DI/Pulsar - Bookie Metrics dashboard, check the Writable Bookies for the number of writable bookies and
Writable bookies (percentage).​

6. If one or more BookKeeper has gone to a read-only state, the Writable bookies (percentage) appear less than 100.​

7. Run the following command to check the Bookkeeper logs:


kubectl -n <suite namespace> logs itomdipulsar-bookkeeper-<pod value> -c itomdipulsar-bookkeeper

8. If there are errors related to disk space utilization crossing 95%, increase the OPTIC DL Message Bus Bookkeeper
component replicas.
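
For the disk utilization check in step 8, you can also look at the free space inside a BookKeeper pod directly (a minimal sketch; adjust the namespace and pod value to your deployment):

kubectl -n <suite namespace> exec -ti itomdipulsar-bookkeeper-<pod value> -c itomdipulsar-bookkeeper -- df -h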

If the cause of the issue isn't clear from logs, collect the log details and contact Software Support.​


1.16.5.11. The dashboard loads slowly or the page is unresponsive

After increasing the dashboard data time range to view historical data with a range greater than 24 hours, the dashboard
loads slowly or the page is unresponsive.

Cause
This issue occurs because, when you increase the time range in the dashboard, the query results for the panels increase and take time
to populate. In the case of panels with graphs, the data points appear slowly due to the increased query results.

Solution
To resolve this issue, don't use the relative time range to view the data. Instead, use an absolute time range of less than
24 hours for the period for which you want to view the data. Follow these steps:

1. Log on to the ITOM DI dashboard for which you want to view the historical data.

2. Click .
3. In the Absolute time range section, type the From and To time.
4. Click Apply time range. You will be able to view the data.

If you prefer to save the dashboard, you must save it with a new name.


1.16.5.12. The Receiver dashboard, Average Message Outgoing Rate panel displays zero

Data collection has started, but data hasn't arrived in Vertica. In the ITOM DI/Data Flow dashboard, the Pulsar message incoming rate
panel shows that data isn't flowing to the OPTIC DL Message Bus. On checking further, the ITOM DI/Receiver
dashboard's Average Message Outgoing Rate panel displays zero.

Cause
This issue is because the requests are reaching the OPTIC DL HTTP Receiver but the messages aren't getting published to the
OPTIC DL Message Bus successfully.

Solution
To resolve this issue, follow these steps:

1. Run the command: kubectl describe pv itom-log


Note down the Path from the output. For example: /var/vols/itom/<application namespace>/itom-log
2. On the NFS server, go to the location <log path>/di/receiver/__itom-di-receiver-dpl-<pod value>
3. Open the receiver-out.log file and check for errors.
4. If there is a Topic not found error, verify that the topic was created by checking whether the respective topic is listed in the ITOM
DI/Pulsar - Topic dashboard Topic drop-down.
5. If the topic doesn't exist, contact Software Support.
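
For example, a quick way to check for this error in the receiver log on the NFS server (a minimal sketch; log path and pod value as noted in the steps above):

grep -i "topic not found" <log path>/di/receiver/__itom-di-receiver-dpl-<pod value>/receiver-out.log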


1.16.5.13. The Receiver dashboard, Avg incoming requests rate - error (All) panel is greater than zero req/sec

Some collected data doesn't reach Vertica. On checking further, the ITOM DI/Receiver dashboard's Avg incoming
requests rate - error (All) panel displays a number of rejected requests greater than zero req/sec.

Cause
This issue is because some or all requests coming to the OPTIC DL HTTP Receiver server are failing.

Solution
To resolve this issue, follow these steps:

1. Run the command: kubectl describe pv itom-log


Note down the Path from the output. For example: /var/vols/itom/<application namespace>/itom-log
2. On the NFS server, go to the location <log path>/di/receiver/__itom-di-receiver-dpl-<pod value>
3. Open the receiver-out.log file and check for errors.
4. If there is a receiver.topic.from.header value is true but no header provided with header fieldname <messagetopic> error, there is
a missing header with the request to the OPTIC DL HTTP Receiver.
5. If there are SSL errors, there is an issue with the certificates.


1.16.5.14. The Receiver dashboard, Receiver running panel shows less than 100%

Some or most of the data hasn't reached Vertica. On checking further, the ITOM DI/Receiver dashboard's Receiver
running panel shows less than 100%.

Cause
This issue is because some or all the itom-di-receiver-dpl pods aren't in the running state.

Solution
To resolve this issue, check the status of the itom-di-receiver-dpl pod. Perform the solutions as mentioned for each of the
following pod states:​

ImagePullBackoff : Check if pods are able to successfully pull image from the repository.​

Pod Initializing : Check if the itomdipulsar-bookkeeper , itomdipulsar-broker , itomdipulsar-proxy , itomdipulsar-zookeeper, and itom-
idm pods are running. This is because the itom-di-receiver-dpl pod performs a dependency check on these pods.​

Pending : Run the command kubectl -n <application namespace> describe pod itom-di-receiver-dpl-<pod value> and check if there is any memory or CPU crunch
on the CDF cluster.

CrashloopBackoff/Error : Go to /var/vols/itom/<application namespace>/itom-log/di/receiver/__itom-di-receiver-dpl-<pod


value> location and check the OPTIC DL HTTP Receiver receiver-out.log for errors. From the log information check
the exact cause of the issue and the remediation steps.​
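
A quick way to see the current state of each receiver pod before working through the scenarios above (a minimal sketch; adjust the namespace to your deployment):

kubectl get pods -n <application namespace> | grep itom-di-receiver-dpl
kubectl -n <application namespace> describe pod itom-di-receiver-dpl-<pod value>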


1.16.5.15. Vertica dashboard queries fail frequently during daily aggregation phase

The ITOM DI/Vertica dashboard queries fail often during the daily aggregation phase. The following error appears in the
verticapromexporter log:
Unable to getTableCounts Error: [57014] Query canceled while waiting for resources

Run the following command to check the log file:


kubectl -n <suite namespace> logs itomdimonitoring-verticapromexporter-<pod value> -c itomdimonitoring-verticapromexporter

Cause
This issue is because the ITOM DI/Vertica dashboard needs a memory resource pool to display metrics on the dashboard.
The dashboard runs the queries to get data from the Vertica database. By default, the dashboard uses the general resource
pool. Due to increased load in the general resource pool, the queries from the dashboard get rejected and metric collection
isn't complete. The dashboard appears blank for that period. To avoid this issue you must create a dedicated resource pool
for the dashboard.

Solution
Perform the following steps to resolve this issue:

1. Log on to the Vertica system as the dbadmin.


2. Run the following queries to create a resource pool for the ITOM DI/Vertica dashboard on user tables:
CREATE RESOURCE POOL <resource pool name for dashboard > MEMORYSIZE '<memory size>';
GRANT USAGE ON RESOURCE POOL <resource pool name for dashboard > to <read-only user name> WITH GRANT OPTION;
For example:
CREATE RESOURCE POOL itom_monitor_respool_provider_default MEMORYSIZE '1G';
GRANT USAGE ON RESOURCE POOL itom_monitor_respool_provider_default to vertica_rouser WITH GRANT OPTION;
3. Update the values YAML file with the parameter to include the resource pool for monitoring:

monitoring:
  verticapromexporter:
    config:
      monitoringResourcePool: <resource pool name for dashboard>

For example:

monitoring:
  verticapromexporter:
    config:
      monitoringResourcePool: itom_monitor_respool_provider_default

4. Run the helm upgrade command as follows:


helm upgrade <deployment name> -n <suite namespace> -f <values.yaml> <suite deploy chart with version.tgz>
5. Run the following command to restart the verticapromexporter pod:
kubectl delete pod itomdimonitoring-verticapromexporter-<pod value> -n <suite namespace>
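
To confirm that the resource pool was created before you restart the verticapromexporter pod, you can query the Vertica catalog (a minimal sketch using the example pool name from step 2; run it as the dbadmin user):

vsql -U dbadmin -c "SELECT name, memorysize FROM resource_pools WHERE name = 'itom_monitor_respool_provider_default';"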


1.16.6. How to's


This section covers frequently asked questions that can help resolve the OPTIC Data Lake data flow issues.


1.16.6.1. How to check the Vertica tables


This section helps you to check the Vertica tables.

1. Log on to the Grafana dashboards.


2. Click the <namespace> folder where you have deployed OPTIC DL.
3. Select the ITOM DI/Vertica dashboard.
4. Check the Rows added per table panel. This panel displays the rate at which the rows are getting added to the
Vertica tables.


1.16.6.2. How to check if OPTIC DL Message Bus topics and data is created

This section helps you to verify the creation of OPTIC DL Message Bus topics and data.

1. Run the following command to get the Pulsar bastion pod:

kubectl get pods -n <application namespace>

Note down the bastion pod name.


2. Run the following command to log on to the bastion container:

kubectl -n <application namespace> exec -ti <bastion pod-<POD value>> -c pulsar -- bash

3. Run the following command to list the topics:

./bin/pulsar-admin topics list public/default

The topics list appears. This confirms the creation of topics. You must note down the topic name.
For example, if the output of the command is: "persistent://public/default/di_task_status_topic-partition-0" , you must
note down the topic name: di_task_status_topic .
4. Run the following command to check the topic details:

./bin/pulsar-client consume -s test-subscription --subscription-mode NonDurable -n 0 <topic name>

For example:

./bin/pulsar-client consume -s test-subscription --subscription-mode NonDurable -n 2 di_task_status_topic

The output appears with the ----- got message ----- separator. This confirms the creation of messages with the data.
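
If you also want to see the message rates and backlog for a topic, the pulsar-admin CLI provides a stats command (a minimal sketch; run it from the same bastion container, with the topic name noted above):

./bin/pulsar-admin topics stats persistent://public/default/<topic name>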


1.16.6.3. How to check the OPTIC DL Message Bus pod communication

This section helps you to check the communication between OPTIC DL Message Bus producers and consumers.

Perform the following steps:

1. Run the command to get the broker pod: kubectl get pods -n <suite namespace> . Note down the broker pod name.
2. Run the command to log on to the broker container: kubectl -n <suite namespace> exec -ti <broker pod-<POD value>> -c ito
mdipulsar-broker bash
3. Run the command to list the topics: ./bin/pulsar-admin topics list public/default . Note down the topic name.
4. Run the command to consume the message from the topic: ./bin/pulsar-client consume -s test-subscription --subscription-mod
e NonDurable -n 0 <topic name>
5. Open a new session and repeat steps 1 and 2.
6. Run the command to send the message to the topic: bin/pulsar-client produce -m hi <topic name> . The message hi appears
in the consumer session.


1.16.6.4. How to recover OPTIC DL Message Bus from a worker node failure

If you have deployed OMT with its default local storage provisioner and there is a worker node failure, follow this topic to
recover the OPTIC DL Message Bus from the worker node failure.

When the worker node is in an unrecoverable state, the OPTIC DL Message Bus itomdipulsar-autorecovery , itomdipulsar-bastion ,
itomdipulsar-bookkeeper , and itomdipulsar-zookeeper pods running on a Kubernetes worker node will be in an error state
(pending/terminating). Even when one of the pods is in an error state, the data ingestion and consumption to and from the
OPTIC DL Message Bus continues to work. However, there may be a performance degradation as the load needs to be
handled by a cluster that has one less node and one less OPTIC DL Message Bus pod. To restore the Kubernetes cluster to its
original capacity, you must add a new worker node with a local disk provisioned manually. For more information, see the
Install section.

For the application deployed on embedded Kubernetes, follow these steps to recover the OPTIC DL Message Bus pods from
the error state to the running state:

1. Run the following commands to delete the itomdipulsar-autorecovery-0 and itomdipulsar-bastion-0 pods that are in a
terminating state:
kubectl -n <application namespace> delete pod itomdipulsar-autorecovery-0 --force
kubectl -n <application namespace> delete pod itomdipulsar-bastion-0 --force
2. Run the following command to delete the itomdipulsar-zookeeper-<pod value> pods and the attached PVC that are in a
terminating state:
kubectl delete pod itomdipulsar-zookeeper-<pod value> -n <application namespace> --force ; kubectl -n <application namespace>
delete pvc itomdipulsar-zookeeper-zookeeper-data-itomdipulsar-zookeeper-<pod value> --force
3. Run the following command to delete the itomdipulsar-zookeeper-<pod value> pods that are in pending state:
kubectl delete pod itomdipulsar-zookeeper-<pod value> -n <application namespace> --force
4. Run the following command to verify if the itomdipulsar-zookeeper-<pod value> pods are running in the new node:
kubectl get pods -n <application namespace> -o wide
5. Log on to the itomdipulsar-bastion-0 pod:
kubectl exec -it itomdipulsar-bastion-0 -n <application namespace> -c pulsar bash
6. Run the following command to list the bookie IDs and get the ID of the problematic bookie:
/pulsar/bin/pulsar-admin bookies list-bookies
Note down the ID of the bookie.
7. Run the following command to decommission the problematic bookie:
./bin/bookkeeper shell decommissionbookie -bookieid <ID noted in step 6>
8. Run the following command to check if the bookie is removed:
/pulsar/bin/pulsar-admin bookies list-bookies
9. Run the following command to delete the itomdipulsar-bookkeeper-<pod value> pods and the attached PVC that are in a
terminating state:
kubectl delete pod itomdipulsar-bookkeeper-<pod value> -n <application namespace> --force ; kubectl -n <application namespace>
delete pvc itomdipulsar-bookkeeper-journal-itomdipulsar-bookkeeper-<pod value> --force ; kubectl -n <application namespace> dele
te pvc itomdipulsar-bookkeeper-ledgers-itomdipulsar-bookkeeper-<pod value> --force
10. Run the following command to delete the itomdipulsar-bookkeeper-<pod value> pods that are in pending state:
kubectl delete pod itomdipulsar-bookkeeper-<pod value> -n <application namespace> --force
11. Run the following command to check if the itomdipulsar pods are running:
kubectl get pods -n <application namespace> -o wide| grep -i pulsar
The list of pods appears with Status as Running or Completed.


1.16.6.5. How to check connectivity between Vertica node and OPTIC DL Message Bus Proxy services

The OPTIC DL Vertica plugin errors show connectivity issues between the OPTIC DL Vertica plugin and the OPTIC DL Message Bus
Proxy. The connectivity error information is in the vertica.log or dbLog files on the Vertica nodes.

Cause
This issue is because the ports aren't open or due to the connectivity issues between the Vertica node and OPTIC DL Message
Bus Proxy.

Solution
You can use the disystemcheck tool to check the connectivity. Perform the following steps:

Note

In a managed Kubernetes deployment, a reference to the master node in this topic implies bastion
node.

1. The disystemcheck tool is available when you install the OPTIC DL Vertica plugin. It's available in the /usr/local/itom-di-puls
arudx/bin/ folder. If you are testing on any other node, you must copy this tool to that node. Go to the /usr/local/itom-di-pul
sarudx/bin/ folder on the Vertica node.
2. Run disystemcheck tool as follows:
./disystemcheck -h <master hostname>
where:
-h : is the hostname of the Kubernetes master or worker that you want to check.

Note

If the OPTIC DL service ports are different than the default ones, use the disystemcheck CLI options, to specify the
ports.

Following is the disystemcheck sample output

[verticadba@vhost1 testCerts]$ ls
disystemcheck server.crt server.key
[verticadba@vhost1 testCerts]$ /usr/local/itom-di-pulsarudx/bin/disystemcheck -h demosys-master-node.net
systemcheck version 1.0
[SUCCESS] Receiver port open (30001)
[SUCCESS] MINIO port open (30006)
[SUCCESS] Pulsar Admin service operational
[SUCCESS] OpticDL Administration port open (30004)
[SUCCESS] OpticDL Administration service operational
[SUCCESS] Data Access port open (30003)
[SUCCESS] Data Access service operational
[SUCCESS] Receiver service operational
[SUCCESS] Published Pulsar message
[SUCCESS] Received Pulsar message

3. If there are issues in connectivity or port availability, the tool gives error messages that you must fix. Following is the dis
ystemcheck sample output with errors:


[verticadba@vhost1 testCerts]$ ./disystemcheck -h my-master-host


systemcheck version 1.0
[FAILED] Data Access port open (30003) https://my-master-host:30003 Get "https://my-master-host:30003": dial tcp: lookup my-m
aster-host on 11.123.45.250:53: no such host
[FAILED] Data Access service operational https://my-master-host:30003/urest/v1/itom-data-ingestion-store/dataSetMetadata Get "
https://my-master-host:30003/urest/v1/itom-data-ingestion-store/dataSetMetadata": dial tcp: lookup my-master-host on 11.123.45
.250:53: no such host
[FAILED] Receiver port open (30001) https://my-master-host:30001 Get "https://my-master-host:30001": dial tcp: lookup my-mast
er-host on 11.123.45.250:53: no such host
[FAILED] MINIO port open (30006) https://my-master-host:30006/minio/prometheus/metrics Get "https://my-master-host:30006/mi
nio/prometheus/metrics": dial tcp: lookup my-master-host on 11.123.45.250:53: no such host
[FAILED] Pulsar Admin service operational https://my-master-host:31001/admin/v2/clusters Get "https://my-master-host:31001/ad
min/v2/clusters": dial tcp: lookup my-master-host on 11.123.45.250:53: no such host
[FAILED] OpticDL Administration port open (30004) https://my-master-host:30004 Get "https://my-master-host:30004": dial tcp: lo
okup my-master-host on 11.123.45.250:53: no such host
[FAILED] OpticDL Administration service operational https://my-master-host:30004/urest/v2/itom-data-ingestion-administration/da
taSetConfiguration Get "https://my-master-host:30004/urest/v2/itom-data-ingestion-administration/dataSetConfiguration": dial tcp
: lookup my-master-host on 11.123.45.250:53: no such host
[FAILED] Receiver service operational https://my-master-host:30001/receiver Post "https://my-master-host:30001/receiver": dial tc
p: lookup my-master-host on 11.123.45.250:53: no such host
[FAILED] Put "https://my-master-host:31001/admin/v2/persistent/public/default/nooptesttopic/partitions": dial tcp: lookup my-mast
er-host on 11.123.45.250:53: no such host
WARN[0002] [Failed to connect to broker.] error="dial tcp: lookup my-master-host on 11.123.45.250:53: no such host" r
emote_addr="pulsar+ssl://my-master-host:31051"
WARN[0003] [Failed to connect to broker.] error="dial tcp: lookup my-master-host on 11.123.45.250:53: no such host" r
emote_addr="pulsar+ssl://my-master-host:31051"
..,
WARN[0028] [Failed to connect to broker.] error="dial tcp: lookup my-master-host on 11.123.45.250:53: no such host" r
emote_addr="pulsar+ssl://my-master-host:31051"
WARN[0053] [Failed to connect to broker.] error="dial tcp: lookup my-master-host on 11.123.45.250:53: no such host" r
emote_addr="pulsar+ssl://my-master-host:31051"
[FAILED] to create Pulsar producer connection error

disystemcheck tool usage


This tool checks the external port access to the OPTIC DL pods. Additionally, it tests the pods by sending a request to them.
The disystemcheck tool is available when you install the OPTIC DL Vertica plugin. It's available in the /usr/local/itom-di-pulsarudx/
bin/ folder.

Log on to the Vertica node where you have installed the RPM. Go to the location /usr/local/itom-di-pulsarudx/bin and run the
command ./disystemcheck --help to see the usage.

Following are the parameters and the description:

Parameter Description Default

-a, --admin itom-di-administration port. If it's different than the default, type the port number. 30004

--checkminio To check itom-di-minio port.

--checkreceiver To check itom-di-receiver-dpl port.

-d, --dataaccess itom-di-data-access-dpl port. If it's different than the default, type the port number. 30003

-h, --host The hostname of the Kubernetes master or worker to test.

-m, --minioport itom-di-minio port. If it's different than the default, type the port number. 30006

-p, --pulsaradmin OPTIC DL Pulsar Administration port. If it's different than the default, type the port number. 31001

-t, --pulsarclient itomdipulsar-proxy port. If it's different than the default, type the port number. 31051

-r, --receiver itom-di-receiver-dpl port. If it's different than the default, type the port number. 30001


1.16.6.6. How to verify the OPTIC DL Vertica Plugin version after reinstall

If you reinstall the OPTIC DL Vertica Plugin RPM, you must make sure to install the required OPTIC DL Vertica Plugin RPM. You
must install the RPM on only one node. It's recommended to install it on the Vertica cluster master node. However, if you have
installed the RPM on any of the nodes in the cluster, make sure to check all the nodes.

Perform the following steps on the Vertica node to verify the RPM version is the same as the rpm installed to the database:

Step 1 - Get the installed RPM version


Perform any of the following steps to get the RPM version:

Run the following command:


rpm -qa itom-di-pulsarudx
You will see a similar output with the version as follows:
itom-di-pulsarudx-2.1.0-031.x86_64

OR

When the OPTIC DL Vertica Plugin processes data, a history record gets written to the database. This record also
displays the version of the OPTIC DL Vertica Plugin. Log on to the Vertica database and run the following query:
select ENDING_MSG from itom_di_scheduler_default_default.microbatch_history limit 1;
You will see a similar output with the version as follows:

ENDING_MSG
---------------------
UDX 2.1.0-031: ...

Step 2 - Check the library


Run the following commands to get the library that's installed from the OPTIC DL Vertica Plugin:

1. cd /usr/local/itom-di-pulsarudx/lib
2. sha1sum libitom-di-pulsarudx-9.2.1.so
An output similar to the following appears:
d5b89107716658c38b74d8bbf8e4c7fbc34a7142 libitom-di-pulsarudx-9.2.1.so
In this output, d5b89107716658c38b74d8bbf8e4c7fbc34a7142 is the library version.

Step 3 - Check the library in the Vertica catalog directory


Run the following commands to get the library from the Vertica catalog directory:

1. Log on to the Vertica system as the dbadmin user.


2. Run the following command:
admintools -t list_db -d <database name>
3. From the output, note down the Catalog Directory location
4. Run the following commands:
cd <Catalog Directory location >/Libraries
sha1sum */PulsarSourceLib*
An output similar to the following appears:
d5b89107716658c38b74d8bbf8e4c7fbc34a7142 026d604cf6e97db86b456649c5bd34f900a000000432a1b2/PulsarSourceLib_026d60
4cf6e97db86b456649c5bd34f900a000000432a1b2.so
In this output, d5b89107716658c38b74d8bbf8e4c7fbc34a7142 is the library version.

The output d5b89107716658c38b74d8bbf8e4c7fbc34a7142 is the same for both. This confirms the currently installed version of
the RPM library.

Step 4 - Uninstall the wrong RPM version and clear the dbinit.sh history

If the output is different, you must uninstall the OPTIC DL Vertica Plugin. Perform the following steps:

1. Run the following command:


vsql -U verticadba -f /usr/local/itom-di-pulsarudx/sql/uninstall.sql
2. Log on to the Vertica database and run the following query:
DROP LIBRARY
3. Run the following commands from the admintools to restart the database:
admintools -t stop_db -p <dbadmin password> -d <database name> -F
Wait for the database to stop.
admintools -t start_db -d <database name>
4. Run the following commands as the root user to reset the dbinit.sh history. This will allow you to install the OPTIC DL
Vertica Plugin from the start with the required parameter values:
for i in $HOME/.dbinit_env_vars.*; do mv $i $i.old; done

You can now install the OPTIC DL Vertica Plugin with the same user names used before. For more information, see the
Prepare section.

This PDF was generated on 12/19/2024 for your convenience. For the latest documentation, always see https://docs.microfocus.com. Page 314
AI Operations Management - Containerized 24.4

1.17. Troubleshoot Hyperscale Observability


This section provides you with the steps to troubleshoot issues that you may face while using the Hyperscale
Observability capability.

Logging details
Category Log File Name Log Location

Collection-manager collection-manager.log <log-vol>/cloud-monitoring/collection-manager/<pod_name>/

Discovery aws-collector.log <log-vol>/cloud-monitoring/aws/discovery/collector/<pod_name>/

Discovery job-fetcher.log <log-vol>/cloud-monitoring/aws/discovery/job-fetcher/<pod_name>/

Discovery result-processor.log <log-vol>/cloud-monitoring/aws/discovery/result-processor/<pod_name>/

Job-scheduler scheduler.log <log-vol>/cloud-monitoring/job-scheduler/<pod_name>/

Metric aws-collector.log <log-vol>/cloud-monitoring/aws/metric/collector/<pod_name>/

Metric job-fetcher.log <log-vol>/cloud-monitoring/aws/metric/job-fetcher/<pod_name>/

Metric result-processor.log <log-vol>/cloud-monitoring/aws/metric/result-processor/<pod_name>/

Monitoring-admin application.log <log-vol>/cloud-monitoring/monitoring-admin/<pod_name>/

Threshold-processor threshold-processor.log <log-vol>/cloud-monitoring/aws/threshold-processor/static/<pod_name>/

Find the path of log volumes


Find the NFS servers and the path to which log volume ( opsb-logvolumeclaim) is mapped using the following steps:

1. List down the PVC's for suite namespace and look for opsb-logvolumeclaim:

kubectl get pvc -n <suite-namespace>

# kubectl get pvc -n opsb-helm


NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
opsb-configvolumeclaim Bound opsbvol1 10Gi RWX 7h48m
opsb-datavolumeclaim Bound opsbvol3 10Gi RWX 7h48m
opsb-dbvolumeclaim Bound opsbvol4 10Gi RWX 7h48m
opsb-logvolumeclaim Bound opsbvol5 10Gi RWX 7h48m
opsb-monitoringvolumeclaim Bound opsbvol2 10Gi RWX 7h48m
opsb-pvc-omi-0 Bound opsbvol6 10Gi RWO omistatefulset 7h48m

2. Get the NFS server and pathname:

kubectl describe pvc <PVC-NAME-for-logvolumeclaim> -n <suite-namespace> | grep Volume: | sed 's/.*Volume: *//' | xargs kubectl de
scribe pv | grep 'Server:\|Path:'

For example:


# kubectl describe pvc opsb-logvolumeclaim -n opsb-helm | grep Volume: | sed 's/.*Volume: *//' | xargs kubectl describe pv | grep
'Server:\|Path:'
Server: mycomputer.example.net
Path: /var/vols/itom/opsbvol1


1.17.1. Performance Dashboards and events aren't visible

On OBM, when you search for the Performance Dashboards and events, they don't show up.

Cause
The issue occurs when the Hyperscale Observability Content Pack doesn't get imported to OBM during installation.

Solution
To resolve the issue, follow the steps to manually import the Content Pack.

1. Download the Content Pack from the following location:


On Linux:
wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<m
onitoring_type>_Content_Pack_<version>.zip
On Windows:
https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<monitoring_type>_Content_P
ack_<version>.zip
Where < monitoring_type > is AWS, Azure, or Kubernetes.
2. On OBM user interface, go to Administration > SETUP AND MAINTENANCE > Content Packs.
3. Click Import. The Import Content Pack window appears.
4. Browse to the location where you have saved the Content Pack and then click Import. The Content Pack gets imported.
Click Close.
5. To verify the import of the Content Pack, log in to OBM and go to ADMINISTRATION > SETUP AND MAINTENANCE >
CONTENT PACKS. Search for OBM Content Pack - Monitoring Service <monitoring_type>.


1.17.2. Dashboard not found error on trying to redirect to the Monitoring Service Overview dashboard

While using the Monitoring Service Store and Forward dashboard, if you try to redirect to the Monitoring Service Overview
dashboard, you get the error message: Dashboard not found.

Cause
This issue occurs due to a failure in redirection.

Solution
You can resolve this issue by going to the Monitoring Service Overview dashboard from the dashboard section.


1.17.3. UCMDB views for Hyperscale Observability aren't available in Performance Dashboard

UCMDB views for Hyperscale Observability aren't available in the View Explorer of OBM Performance Dashboard.

Cause
The issue occurs when the Hyperscale Observability related UCMDB views aren't deployed to OBM during installation.

Solution
To resolve the issue, manually deploy the UCMDB views corresponding to the monitoring type. For example, if AWS views are
missing, you should manually deploy the UCMDB views of the AWS service package.

Launch the RTSM UI as a desktop application from Local Client:

1. Go to Administration > RTSM Administration and click Local Client to download the Local Client tool.
2. Launch the Local Client tool.
a. Extract the UCMDB_Local_Client.zip package to a location of your choice, for example, the desktop.
b. Double-click UCMDB Local Client.cmd (Windows) or UCMDB Local Client.sh (Mac). The UCMDB Local Client window
opens.
3. Add or edit login configuration for the target OBM server that you want to access.
a. Click or . The Add/Edit Configuration dialog opens.
b. Enter the following details:
Host/IP: Specify the value provided in the values.yaml for <externalAccessHost>.
Protocol: Select HTTPS as the protocol from the drop-down list.
Port: Specify the value provided in the values.yaml for <externalAccessPort>.
Target Env: Select CMS as the target environment from the drop-down list.
c. Click OK.
4. Launch RTSM UI from the UCMDB Local Client window.
a. In the UCMDB Local Client window, click the Label value for the OBM server that you want to access. The Log
In dialog opens.
b. In the Log In dialog, enter your login parameters.
c. Click Login. The RTSM UI opens in a new window.

Deploy UCMDB views package to RTSM:

1. Download relevant service UCMDB views from the following location:


On Linux:
wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<m
onitoring_type>_UCMDB_Views.zip
On Windows:
https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_<monitoring_type>_UCMDB_Vi
ews.zip
Where < monitoring_type > is AWS, Azure, or Kubernetes.

2. In the RTSM UI, go to Managers > Administration > Package Manager.


3. Click the button to open the Deploy Packages to Server dialog box.
4. Click the button to open the Deploy Packages to Server (from local disk) dialog box.
5. Select the UCMDB views package zip file and click Open. The package appears in the upper pane of the dialog box and
its resources appear in the lower pane.
6. Select the resources from the package that you want to deploy. All the resources are selected by default.
7. Click Deploy. A status report appears indicating whether the deployment was successful for each resource selected.
8. To verify the deployment of UCMDB views, log in to OBM and go to Workspaces > Operations
Console > Performance Perspective and search for Monitoring_Service_<monitoring_type> in the View
Explorer.


1.17.4. Performance Dashboard displays graphs with no data

Performance Dashboard displays graphs with no data for AWS, Azure, or Kubernetes collector.

Cause
This issue occurs if you have deployed multiple Hyperscale Observability content packs with same CI types. The recently
deployed content pack overrides the previously deployed content pack. For example, if you deploy AWS content pack and
then deploy Azure or Kubernetes content pack, the Performance Dashboard for AWS displays graphs with no data.

Currently deploying multiple Hyperscale Observability content packs with same CI types isn't supported.

Solution
To resolve this issue, redeploy the Hyperscale Observability content pack for the required collector as mentioned below:

Prerequisites
Make sure that you have enabled the containerized OBM capability along with Hyperscale Observability capability.

Import Hyperscale Observability content pack for AWS into OBM


Perform the following steps to import the Hyperscale Observability content pack for AWS into OBM:

1. Download the AWS content pack from the following location:


On Linux:
wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_AWS_Content_Pack_<version>.zip
On Windows:
https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_AWS_Content_Pack_<version>
.zip
2. On OBM user interface, go to Administration > SETUP AND MAINTENANCE > Content Packs.
3. Click Import. The Import Content Pack window appears.
4. Browse to the location where you have saved the AWS content pack and then click Import. The AWS content pack gets
imported. Click Close.

Import Hyperscale Observability content pack for Azure into OBM


Perform the following steps to import the Hyperscale Observability content pack for Azure into OBM:

1. Download the Azure content pack from the following location:


On Linux:
wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_Azure_Content_Pack_<version>.zip
On Windows:
https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_Azure_Content_Pack_<version
>.zip
2. On OBM user interface, go to Administration > SETUP AND MAINTENANCE > Content Packs.
3. Click Import. The Import Content Pack window appears.
4. Browse to the location where you have saved the Azure content pack and then click Import. The Azure content pack
gets imported. Click Close.

Import Hyperscale Observability content for Kubernetes into OBM


Perform the following steps to import Kubernetes content pack into OBM:

1. Download the Kubernetes content pack from the following location:


On Linux:
wget --no-check-certificate https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_Kubernetes_Content_Pack_<version>.zip
On Windows:
https://<externalAccessHost>:<externalAccessPort>/staticfiles/monitoring-service/Monitoring_Service_Kubernetes_Content_Pack_<ve
rsion>.zip
2. On OBM, go to Administration > SETUP AND MAINTENANCE > Content Packs.
3. Click Import. The Import Content Pack window appears.
4. Browse to the location where you have saved the Kubernetes content pack and then click Import.
5. The Kubernetes content pack gets imported. Click Close.


1.17.5. Discovery is failing and not discovering any of the components for multi probe domain

The service discovery fails and doesn't discover any of the components of the service environment. The <service-name>-collector.log
shows no discovery for resources across all the collections.
Example:
When you run ./ops-monitoring-ctl get collector-status for a Kubernetes service, you get the following error message:

Discovery Collection Failed for any of the 3 Kubernetes collections configured

Cause
Hyperscale Observability supports one default domain with a collector. Trying to use it with multiple domains causes the
discovery to fail.

Solution
You can resolve the issue by changing the value of the PROBE_DOMAIN environment variable to the DefaultDomain setting in the
itom-ucmdb-probe deployment.

Follow these steps:

1. Edit the itom-ucmdb-probe deployment.

2. Find the environment variable PROBE_DOMAIN under the containers section.
3. Set its value to DefaultDomain :
   - name: PROBE_DOMAIN
     value: DefaultDomain
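
The same change can be made with a single kubectl command (a minimal sketch, assuming the probe runs as a Deployment named itom-ucmdb-probe; adjust the namespace to your deployment):

kubectl -n <application namespace> set env deployment/itom-ucmdb-probe PROBE_DOMAIN=DefaultDomain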


1.17.6. Hyperscale Observability events not forwarding to OPTIC Data Lake

You have a scenario where AI Operations Management integrates with external OBM while Hyperscale Observability is
enabled. Hyperscale Observability sends events to OBM, but OBM does not forward them to the OPTIC opr_event table. The
events remain in the Received from... state.

Cause
The cause of the issue is incorrect hostname resolution. Due to this, OBM fails to forward events from Hyperscale
Observability to OPTIC Data Lake.

Solution
To resolve this issue, configure the Data Broker to use an aliased hostname.

Follow the steps:

Update the host files


1. Update the host files on OBM Gateway and DPS:
For Linux, edit the /etc/hosts file.
For Windows, edit the C:\Windows\System32\drivers\etc\hosts file.
2. Add an alias for the Data Broker host as follows:
<IP address> <external_accesshost>_hso.<domain> <external_accesshost>_hso
Example:
34.12.13.54 myopsb_hso.com myopsb_hso

Configure Data Broker


1. Log in to the Data Broker pod.
2. Run the following command to set the alias:
/opt/OV/bin/ovconfchg -ns xpl.net -set LOCAL_NODE_NAME <external_accesshost>_hso.<domain>
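
To verify that the alias is set, you can read the value back from the Data Broker pod (a minimal sketch; the pod name depends on your deployment):

kubectl -n <application namespace> exec -ti <data broker pod> -- /opt/OV/bin/ovconfget xpl.net LOCAL_NODE_NAME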

Clean up and grant certificates


1. In the Data Broker pod, clean up all existing certificates.
2. Resend a certificate request from the Data Broker pod.
3. On the OBM system, approve the certificate request for the Data Broker.
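
The following is a hedged sketch of the standard Operations Agent and OBM certificate commands for these steps; verify the exact aliases and request IDs in your environment before removing anything:

Inside the Data Broker pod:
/opt/OV/bin/ovcert -list
/opt/OV/bin/ovcert -remove <alias>
/opt/OV/bin/ovcert -certreq

On the OBM server:
ovcm -listpending -l
ovcm -grant <request ID>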

Additional configuration
You need to perform the additional steps to add the Fully Qualified Domain Name (FQDN) to your DNS.

1. Update Kube DNS Entry on Kubernetes Deployment : Run the following command to edit the ConfigMap dns-hosts-c
onfigmap in the kube-system namespace:
kubectl edit cm -n kube-system dns-hosts-configmap
2. Update the host keys: Modify the dns-hosts-key entry in the ConfigMap to include the alias name and the IP address of
externalAccesshost . The entry should follow this format:
<IP address> <external_accesshost>_hso.<domain>
Example:
34.12.13.54 myopsb_hso.com
3. Save and apply changes: After making the changes, save the ConfigMap . Kubernetes will automatically apply the
updated DNS configuration to the cluster.


1.17.7. WRONGPASS invalid username-password or user is disabled

When you trigger the ops-monitoring-ctl get monitor-status command from the CLI, a panic error displays in the CLI.

You can't see the monitoring status because of the following panic error:

panic: runtime error: invalid memory address or nil pointer dereference

[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x8200f0]

goroutine 1 [running]:

The following error displays in the monitoring-admin log located in the pod:

io.lettuce.core.RedisCommandExecutionException: WRONGPASS invalid username-password pair or user is disabled.

Cause
The monitoring-admin pod can't connect to the Redis pod.

Solution
The following are the solutions to fix this error:

You must restart the monitoring-admin pod.


Use or upgrade to the latest version of ops-monitoring-ctl .
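
A minimal sketch of restarting the monitoring-admin pod (Kubernetes recreates the pod automatically; adjust the namespace to your deployment):

kubectl get pods -n <suite namespace> | grep monitoring-admin
kubectl delete pod <monitoring-admin pod name> -n <suite namespace>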


1.17.8. Troubleshoot AWS Hyperscale Observability


1.17.8.1. Deleting a credential using the CLI (ops-monitoring-ctl) fails

Problem
Deleting a credential using the CLI fails sometimes.

Solution
Run the delete command again to delete a credential configuration:

You may run one of the following commands:

ops-monitoring-ctl delete -f <filename>

Here, <filename> is the credential configuration YAML file that you want to delete.

For example:

ops-monitoring-ctl delete -f ./input.yaml

OR

ops-monitoring-ctl delete credential -n <credential name>

For example:

ops-monitoring-ctl delete credential -n my-first-aws-cred

You can delete credentials that aren't referenced in any target.

Important

Delete the credential yaml file ( input.yaml file in the example) after deleting the
credential.


1.17.8.2. AWS events are not forwarded to OBM

Cause
The Operations Bridge Manager (OBM) pod isn't running.

Solution
1. Verify if the OBM pod is running:

AWS collector will start sending events to OBM only if the OBM pod is running:

Make sure that the omi-0 (and omi-1 in HA) pod is running.

Run the following command to see the pod status in a namespace:

kubectl get pods --namespace <suite namespace>

For example:

kubectl get pods -n opsbridge1

2. Make sure that OBM trusts the Data Broker Container. See the section Grant certificate request on OBM at Prerequisites
for using Classic OBM with Hyperscale Observability.
3. Make sure that you have deployed the MonitoringService_Threshold_Event_Mapper policy.
4. Send test data to the endpoint ( generic_event_mapper ) of the MonitoringService_Threshold_Event_Mapper policy
and verify if a corresponding event gets generated in the OBM event browser:

Run the CURL command to send test data to the policy receiver endpoint ( generic_event_mapper ):

curl -k -X POST <URL> -d @<testdatafile>

Here:

<URL> is:

From containers: https://itom-collect-once-data-broker-clusterip:30005/bsmc/rest/events/generic_event_mapper

From master or control plane nodes: https://<clusterIP>:30005/bsmc/rest/events/generic_event_mapper

<testdatafile> is the file that you have to create.

Sample test data:


<root>
<EventDetails>
<Title>Sample message to test connectivity</Title>
<Description></Description>
<MsgKey>Config1:CI1</MsgKey>
<Category></Category>
<Subcategory></Subcategory>
<ETI></ETI>
<Node></Node>
<RelatedCI></RelatedCI>
<SubComponent></SubComponent>
<SourceCI></SourceCI>
<SourceEventId></SourceEventId>
<eventDrillDownUrl></eventDrillDownUrl>
<CloseEventPattern>^Config1:CI1$</CloseEventPattern>
<SendWithClosedStatus>false</SendWithClosedStatus>
</EventDetails>
<ViolationDetails>
<Severity>Critical</Severity>
</ViolationDetails>
</root>

Save this as testdata.xml.

For example:

curl -k -X POST https://itom-collect-once-data-broker-clusterip:30005/bsmc/rest/events/generic_event_mapper -d @testdata.xml

Verify if an event gets generated in the OBM event browser with the title ''Sample message to test connectivity".


1.17.8.3. Discovery failed because of invalid proxy

Error message
Run the command:

ops-monitoring-ctl get collector-status -r -o yaml

You may get the following error:

Job Processing Failed. Err: GetDiscoveryTriggerStatusError ('Failed to get basic session credentials via sts services. error message:', 'com.a
mazonaws.SdkClientException: Unable to execute HTTP request: Remote host terminated the handshake

Cause
This error occurs if you haven't configured a proxy or configured a wrong proxy.

Solution
Set the correct proxy. For details, see Configure proxy credential and target.


1.17.8.4. Unable to discover AWS resources and collect metrics
After you post the monitoring configurations, the system is unable to discover AWS resources and collect metrics from
CloudWatch. The following error message gets logged in the discovery or metric pod logs:

level=error msg="Error : [GetResourcesWithContext : RequestError: send request failed\ncaused by: Post \https://tagging.<aws_region
>.amazonaws.com/\: EOF] while fetching the Resource for Service. [service_name]"

For details about the log location, see Troubleshoot Hyperscale Observability.

Cause
This issue may occur if the proxy isn't set in the AWS target configuration.

Solution
Perform the following steps:

1. Edit the target configuration created earlier and add the proxy details under spec .
Example:

spec:
subType: aws-region
endpoint: <aws_region>
credential: <credential_name>
proxy:
url: http://<corp_proxy>:<port>/

Here,
<aws_region> is the AWS region that you want to monitor. For example, us-east-1, ap-east-1, eu-west-2 , etc.
<credential_name> is the name of the credential created earlier.
http://<corp_proxy>:<port>/ is the proxy URL that's used to connect to the internet.
2. Run the following command to update the existing target configuration:
On Linux:

./ops-monitoring-ctl update -f ./<target_configuration.yaml>

On Windows:

ops-monitoring-ctl update -f ./<target_configuration.yaml>

Here <target_configuration.yaml> is the name of the target configuration file.


1.17.8.5. Metrics collected but no records in database

Cause 1
When all metrics collected have null values, data isn't sent to the OPTIC Data Lake

Cause 2
You changed the metric collection frequency to 1 minute but didn't enable detailed monitoring for EC2 services on AWS.

Solution 1
This is the expected behavior. The AWS collector will send data to the OPTIC Data Lake only if the collector is able to poll at
least one metric value from AWS.

This implies that for a given frequency if data is available on the target AWS, the AWS collector will fetch the same data and
send it to the database. If the AWS collector receives null values for all the metrics, it won't send data to the database.

Solution 2
AWS collector won't send data to the database if you have scheduled the collector to run at a frequency lesser than 5
minutes and haven't enabled detailed monitoring.

For monitoring at a higher frequency of 1 minute, you must subscribe to AWS CloudWatch Detailed Monitoring Metrics (at a 1
minute frequency). For details, see Enable or turn off detailed monitoring for your instances.

Make sure you factor in the high frequency polling when you plan for detailed monitoring.

The following table gives you information about the Hyperscale Observability Metric Collection Interval and the AWS
CloudWatch Metric Interval:

AWS service name                      Hyperscale Observability Metric Collection Interval   AWS CloudWatch Metric Interval

Elastic Compute Cloud (EC2) service   5 minutes                            1 minute or 5 minutes based on AWS CloudWatch plan
Auto Scaling Group (ASG) service      5 minutes                            1 minute
Elastic Block Store (EBS) service     5 minutes                            1 minute
Relational Database Service (RDS)     5 minutes                            1 minute for RDS instances
Simple Notification Service (SNS)     5 minutes                            5 minutes
Simple Queue Service (SQS)            5 minutes                            5 minutes
Elastic Load Balancing (ELB) service  5 minutes                            1 minute
Simple Storage Service (S3)           5 minutes for buckets and requests   24 hours for buckets and 1 minute for requests


1.17.8.6. Discovery or metric collection failed

Cause 1
The AWS collector is unable to reach the target AWS account if you haven't configured a proxy or configured a wrong proxy.

Cause 2
Wrong access key, secret key, or AssumeRole ARNs.

Cause 3
AWS account doesn't have the ReadOnlyAccess permission to connect to the CloudWatch APIs.

Cause 4
Discovery failed because of timeout

Solution 1
1. Run the command:

ops-monitoring-ctl get collector-status -r -o yaml

If the proxy is missing or wrong, you will see the following error:

Job Processing Failed. Error: GetDiscoveryTriggerStatusError ('Failed to get basic session credentials via sts services. error message:',
'com.amazonaws.SdkClientException: Unable to execute HTTP request: Remote host terminated the handshake

2. Set the correct proxy. For details, see Configure proxy credential and target.

Solution 2
1. Run the command:

ops-monitoring-ctl get collector-status -r -o yaml

If you have configured a wrong access key, secret key, or AssumeRole ARNs, you will see the following error:

If discovery fails:

corresponding to Invalid Access key : "The security token included in the request is invalid. (Service: AWSSecurityTokenService;
Status Code: 403; Error Code: InvalidClientTokenId; Request ID: 1cd7d66a-34c9-4b19-ae72-3bb3a5780d7b;"


corresponding to Role Arn : "User: arn:aws:iam::<account ID>:user/CloudMon_User is not authorized to perform: sts:AssumeRole
on resource: arn:aws:iam::<account ID>:role/CloudMon_EC3 (Service: AWSSecurityTokenService; Status Code: 403; Error Code:
AccessDenied; Request ID: 61064e1d-18d1-4c58-bea8-217faf6b0cb6";

If metric collection fails:


Metrics collection failed. Error: no resources received for metrics collection for Region: us-east-2'

2. Update credentials in the collector configuration:


If you used the CLI to manage your collector configuration, run the following command to update a credential
configuration:

ops-monitoring-ctl update -f <filename>

Here, <filename> is the credential configuration YAML file that you have to update. If prompted, enter the password for the user that you configured in the Set up monitoring CLI step. After the configuration gets posted, the change gets propagated to all running collectors using this credential.

For example:

ops-monitoring-ctl update -f ./input.yaml

Important

Delete the credential yaml file ( input.yaml file in the example) after updating the
credential.

Solution 3
1. Run the command:

ops-monitoring-ctl get collector-status -r -o yaml

If you haven't assigned a role with the ReadOnlyAccess policy to an AWS user, you may get the following errors:

[AccessDeniedException:
User: arn:aws:iam::<owner_id>:user/<User1> is not authorized to perform:
tag:GetResources\n\tstatus code: 400,

[AuthorizationError:
User: arn:aws:iam::<owner_id>:user/<User1> is not authorized to perform:
SNS:GetTopicAttributes on resource: arn:aws:sns:us-east-1:<owner_id>:SNS-TOPIC-TEST\n\tstatus
code: 403

2. Assign a role with the ReadOnlyAccess policy to an AWS user to monitor your AWS resources.

Solution 4
If discovery takes longer than the configured discovery frequency, it may time out in the first run but succeed in later runs.

Run the command:

ops-monitoring-ctl get collector-status -r -o yaml

Wait for the next run. If it still fails:

Adjust the discovery frequency (see, Modify frequency of discovery and metric collection)
Add stricter tags to reduce the monitored instances per collector configuration.


Note

If the " ops-monitoring-ctl get collector-status " command returns " NA " as the status, it indicates that the collector has not
run yet.


1.17.8.7. Unable to create a monitoring configuration

Cause 1
Either the protocol, host, or port that you have specified in the URL is wrong.

Cause 2
IDM username or password is wrong.

Cause 3
The syntax in the YAML file is wrong.

Cause 4
You require a proxy to connect to the AI Operations Management server from the ops-monitoring-ctl CLI.

Solution 1
1. Make sure the URL is in the following format: https://<host>:<port>
2. Make sure that the values specified for protocol, hostname, and port are correct.
3. Run the command to set URL:

ops-monitoring-ctl config set <propertyname> [propertyvalue]


For example:
ops-monitoring-ctl config set cs.server https://123.456.1.1:443

Solution 2
Run the command to set the correct IDM username and password:

ops-monitoring-ctl config set <propertyname> [propertyvalue]

Examples:

To set the username configuration property

ops-monitoring-ctl config set cs.user


When prompted, enter the username.

To set the password configuration property

ops-monitoring-ctl config set cs.password


When prompted, enter the password.

Solution 3
You may get the following error: "mapping values are not allowed in this context"

Use any online YAML validator tool to check and correct the syntax of the yaml file.


Solution 4
Use the ops-monitoring-ctl CLI from a server that has direct access to the AI Operations Management server.


1.17.8.8. Multiple CIs with same name in uCMDB and PD Views

Cause 1
AWS allows multiple AWS resources to have the same name. If you have defined multiple resources on AWS running with the same name, you will see corresponding CIs with the same names.

Cause 2
When AWS resources are terminated, sometimes uCMDB doesn't delete the corresponding CI. This happens if the uCMDB probe restarts unexpectedly and the AWS resource gets terminated before the next scheduled discovery run (by default, one hour).

Solution 1
The CIs are not duplicate and represent actual running instances on AWS. This is the expected behavior.

Solution 2
The terminated CIs (that are not deleted in uCMDB) won't be displayed in PD views because the views filter for the attribute MonitoredBy=MonitoringService in CIs. The AWS collector removes the MonitoredBy=MonitoringService attribute from CIs that are terminated.

After the default aging period, the terminated CIs are deleted automatically in uCMDB.


1.17.8.9. Events are triggered incorrectly

Cause
The threshold configurations, or the JsonLogic expression, or the classifications that you have used to generate events are
wrong.

Solution
Before deploying a newly created threshold configuration, use the ms-helper-util tool to test the threshold
configurations and the JsonLogic expressions.

For details, see Validate threshold configurations.
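
For reference, the JsonLogic expressions used in threshold configurations are plain JSON documents. The following is only an illustrative sketch (the metric variable name and the value 90 are placeholders, not taken from a shipped threshold configuration); it evaluates to true when the referenced metric exceeds 90:

{ ">": [ { "var": "cpu_utilization" }, 90 ] }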


1.17.8.10. No events in the event browser

Cause
The Data Broker Container is not able to connect to the Operations Bridge Manager (OBM).

Solution
Try to post sample data to the data broker and check whether events are generated.

Here is the sample XML:

<root>
  <EventDetails>
    <Title>Sample message to test connectivity</Title>
    <Description></Description>
    <MsgKey>Config1:CI1</MsgKey>
    <Category></Category>
    <Subcategory></Subcategory>
    <ETI></ETI>
    <Node></Node>
    <RelatedCI></RelatedCI>
    <SubComponent></SubComponent>
    <SourceCI></SourceCI>
    <SourceEventId></SourceEventId>
    <eventDrillDownUrl></eventDrillDownUrl>
    <CloseEventPattern>^Config1:CI1$</CloseEventPattern>
    <SendWithClosedStatus>false</SendWithClosedStatus>
  </EventDetails>
  <ViolationDetails>
    <Severity>Critical</Severity>
  </ViolationDetails>
</root>


1.17.8.11. Modifications to default threshold configuration files get overridden

Cause
Modifications to default threshold configuration files get overridden if you restart the monitoring-admin pod.

Solution
Don't modify the default threshold configuration files. You can either save the default thresholds with a different name and
then modify them, or create new thresholds. For details, see Create your own thresholds for Hyperscale Observability
collectors.


1.17.8.12. No Hyperscale Observability dashboards appear when a CI is selected

Cause
Change in the default dashboards. The default dashboards will change if you deploy a Management Pack after deploying
Hyperscale Observability content pack.

Solution
Follow the steps:

1. Log in to OBM.
2. Go to Administration > Operations Console > Performance Dashboard Mapping.
3. Select the required CI Type.
4. The right pane should display MonitoringService_AWS_<ServiceName>_Overview as the 'Default' dashboard. If it's not set as the default dashboard, select it as the default.


1.17.8.13. Unable to collect specific metrics for ECS

When you use ECS/<metric name> to collect a specific ECS metric, no collection happens in the AWS collector.

Cause
Not known.

Solution
You can resolve the issue by using the '*' wildcard, like ECS/*.


1.17.8.14. A few widgets in the Performance Dashboards don't have data

Cause
You may have configured Hyperscale Observability to collect only specific metrics.

Solution
Follow the steps:

Note

Certain metrics occur only for a specific configuration of instances in AWS. For example, if an EBS volume is created, only a few common metrics are collected for all volume types, and some metrics are collected only for specific volume types.

1. Run the following command to check the list of services that you are monitoring or the metrics that you are collecting:

./ops-monitoring-ctl get collector -n <name of the collector> -o yaml

You will see a list of services under the metricConfig section along with the metrics (if defined) which are required for
the collector. For a list of metrics that are required by Performance Dashboards, see Metrics available for
visualization on Performance Dashboard (PD) section on the AWS collector configuration page.
2. Update the collector configuration to include the metrics required for PD visualization, see AWS collector configuration.


1.17.9. Troubleshoot Azure Hyperscale Observability

This section covers the following troubleshooting scenarios:

error loading graphQL mapping file for service: datafactoryservices
The resource type could not be found in the namespace 'Microsoft.DataLakeStore' for api version '2016-11-01'
PT Dashboard doesn't come up


1.17.9.1. error loading graphQL mapping file for service: datafactoryservices

After you upgrade from AI Operations Management 2023.05 Patch 2 to 24.1, Azure discovery fails. You may face this issue when the Azure collector was running before you started the upgrade.

You will see an error message like the following:

time="2024-01-16T09:22:31Z" level=error msg="error loading graphQL mapping file for service: datafactoryservices" container=azure
-collector file="service_mapping.go:205" func="mapping.LoadGraphQLMappingDynamically()"

time="2024-01-16T09:22:31Z" level=error msg="error loading graphQL mapping file for service: servicebus" container=azure-collector
file="service_mapping.go:205" func="mapping.LoadGraphQLMappingDynamically()"

time="2024-01-16T09:22:31Z" level=error msg="error loading graphQL mapping file for service: datalakestores" container=azure-colle
ctor file="service_mapping.go:205" func="mapping.LoadGraphQLMappingDynamically()"

Cause
The discovery fails because of an error while loading the GraphQL mapping file for the discovery service.

Solution
1. Go to the config volume NFS location. For example, /var/vols/itom/<opsbvol1>/azure-collector/content.
2. Enter the below commands to remove these 3 topic mapping files from the above folder:

rm -rf topic_mapping_servicebus.json
rm -rf topic_mapping_datafactoryservices.json
rm -rf topic_mapping_datalakestores.json

3. Restart the Azure discovery pod by running the command:

kubectl delete pod <pod_name>


1.17.9.2. The resource type could not be found in the namespace 'Microsoft.DataLakeStore' for api version '2016-11-01'

Azure service discovery happens partially. When you try to view the monitoring status by running the command:

ops-monitoring-ctl get ms

You will see the following:

...
azur1e [azure] ENABLED discovery recurring Discovery Collection Partially Completed on 01 Oct
24 11:07 IST

When you try to view the Azure service monitoring status by running the command:

ops-monitoring-ctl get ms -o yaml -n <azure collector config name>

You will see the following:

....
state: |-
Resources discovered Partially
...

Cause
This issue occurs when the Discovery Collector tries to discover a deprecated Microsoft Azure service.

Solution
You can safely ignore this error.


1.17.9.3. PT Dashboard doesn't come up


When you start the performance dashboard (PT) for the first time, it doesn't come up.

Cause
The underlying APIs that fetch the data for PT graphs fail.

Solution
Refresh the browser.


1.17.10. Troubleshoot Kubernetes Hyperscale Observability

This section covers the following troubleshooting topics:

Events for Kubernetes infrastructure objects aren't displaying in OBM Event Browser
Kubernetes collector triggers false events with major severity
Kubernetes collection fails due to hostname verification failure
Kubernetes Summary page displays undefined value in MYSQL innodb graph
Kubernetes Summary page displays wrong data in the Total Namespaces Count widget


1.17.10.1. Events for Kubernetes infrastructure objects aren't displaying in OBM Event Browser

Cause
The events aren't triggered due to a known issue with default static thresholds for Kubernetes infrastructure objects.

Solution
To resolve this issue, follow these steps:

1. Log in to OBM and go to Administration > Monitoring > Policy Templates.


2. Select Template by Type > Events > Event from REST Web Service.
3. In the middle pane, expand MonitoringService_Threshold_Event_Mapper and select the version.

4. Click Duplicate to create a duplicate of the MonitoringService_Threshold_Event_Mapper policy and rename it. You'll have to update this policy in the next step. Make sure that you update the duplicated policy and not the original policy.
5. Click Edit and go to Source tab.
6. Specify the Path as k8s_threshold_event_mapper.
7. Click Save.
8. Click Assign and Deploy. The Assign and Deploy window opens.
9. In the Configuration Item tab, select the configuration item itom-monitoring-service-data-broker-svc .
10. Click Assign.


1.17.10.2. Kubernetes collector triggers false events with major severity

Kubernetes events with Major severity are triggered even if a pod metric doesn't cross the defined threshold. You will see
this issue with the following Kubernetes pod related metrics:

pod_limits_cpu_cores
pod_limits_mem_b

Cause
This issue occurs if the value of pod_limits_cpu_cores or pod_limits_mem_b metric is set to zero.

Solution
To resolve this issue, you must omit ootb-kubernetes-pods-cpu-util-vs-cpu-limits and ootb-kubernetes-pods-mem-util-vs-
mem-limits thresholds from the Kubernetes collector configuration. Follow these steps:

1. Edit the Kubernetes collector configuration file to remove ootb-kubernetes-pods-cpu-util-vs-cpu-limits and ootb-kubernetes-po
ds-mem-util-vs-mem-limits thresholds:
Example:

apiVersion: core/v1
type: collector
metadata:
  tenant: public
  namespace: default
  name: <unique_collector_name>
  displayLabel: <display_label>
  description: <description>
spec:
  subType: k8s
  enabled: <true or false>
  targets:
    - <k8s-target_name>
  thresholds:
    - ootb-kubernetes-daemonset-misscheduled-count
    - ootb-kubernetes-pods-status-phase-window-base
    - ootb-kubernetes-clusters-cpu-util-vs-cpu-allocatable
    - ootb-kubernetes-namespaces-cpu-util-vs-cpu-limits
    - ootb-kubernetes-nodes-mem-util-mem-allocatable
    - ootb-kubernetes-daemonset-current-vs-desired-scheduled-daemon-pod
    - ootb-kubernetes-nodes-cpu-util-vs-cpu-allocatable
    - ootb-kubernetes-nodes-memory-pressure-status
    - ootb-kubernetes-pods-status-phase
    - ootb-kubernetes-nodes-pid-pressure-status
    - ootb-kubernetes-nodes-disk-pressure-status
    - ootb-kubernetes-pvc-status-phase-window-base
    - ootb-kubernetes-pv-status-phase-window-base
    - ootb-kubernetes-nodes-net-unavailable-status
    - ootb-kubernetes-namespaces-mem-util-vs-mem-limits
    - ootb-kubernetes-clusters-mem-util-vs-mem-allocatable
    - ootb-kubernetes-pvc-status-phase
    - ootb-kubernetes-nodes-kubelet-ready-status
    - ootb-kubernetes-pv-status-phase
    - ootb-kubernetes-deployment-unavailable-replica-count
  collectionModes:
    - collectionType: pull

2. Run the following command to update the collector configuration:


ops-monitoring-ctl update -f <collector_configuration_yaml_file>


1.17.10.3. Kubernetes collection fails due to hostname verification failure

Kubernetes collection fails with the message similar to the following:

x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, openshift, openshift.default, openshift.default.svc, openshift.default.svc.cluster.local, 172.30.0.1, not example.net.net

Cause
This issue occurs if the endpoint specified in the Kubernetes target configuration doesn't match the subject name in the
certificate of the Kubernetes cluster that you want to monitor.

Solution
Follow these steps to resolve this issue:

1. Log in to the Kubernetes node and run the following command to get the certificate:

openssl s_client -connect [hostname]:[port] > [certificate_name].pem

2. Run the following command to list the subject name and alternate name of the saved certificate:

openssl x509 -text -noout -in [certificate_name].pem -certopt no_header,no_version,no_serial,no_signame,no_validity,no_issuer,no_pubkey,no_sigdump,no_aux

Example output:

Subject: CN = <DNS>
X509v3 extensions:
    X509v3 Key Usage: critical
        Digital Signature, Key Encipherment
    X509v3 Extended Key Usage:
        TLS Web Server Authentication
    X509v3 Basic Constraints: critical
        CA:FALSE
    X509v3 Subject Key Identifier:
        76:5B:75:F0:B5:30:C6:D4:6F:A3:1D:7C:E7:59:70:DD:62:64:88:33
    X509v3 Authority Key Identifier:
        keyid:F5:F5:E1:33:86:C9:05:7A:A5:38:5A:0E:24:3C:78:09:3E:8F:2C:FA
    X509v3 Subject Alternative Name:
        DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:openshift, DNS:openshift.default, DNS:openshift.default.svc, DNS:openshift.default.svc.cluster.local, DNS:<DNS>, IP Address:<IP_Address>

3. Open the target configuration file and add the serverName parameter. You can specify any of the subject alternative names listed in the previous step as the value of serverName.
Example:
Based on the example in step 2, you can specify any of the following as the serverName: kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, openshift, openshift.default, openshift.default.svc, openshift.default.svc.cluster.local.


---
apiVersion: core/v1
type: target
metadata:
  tenant: public
  namespace: default
  name: k8s-openshift
spec:
  subType: k8s
  endpoint: <endpoint_FQDN>
  credential: openshift
  context:
    tls:
      serverName: openshift
  proxy:
    url: myproxy.net:8080

4. Save the file.


5. Run the following command to update the collector configuration:

ops-monitoring-ctl update -f <filename>


1.17.10.4. Kubernetes Summary page displays undefined value in MYSQL innodb graph

The Kubernetes Application Summary page displays undefined values in the MYSQL innodb data Read, Write, and Fsync
graph.

Solution
To rectify the error in the InnoDB Reads, Writes and Fsync graph, perform the following steps:

1. Go to MySQL Instance Overview page (on the Kubernetes Application Summary page, select MySQL service
page and then drill down to any one instance) and navigate to the MYSQL innodb data Read, Write, and Fsync graph and
click More Actions.
2. Click Edit and expand the Predefined query dropdown and select Edit the Predefined Query.
3. In the SQL Query space, replace the existing query with the query below:

SELECT dt, data_fsyncs_count as 'Fsyncs', data_pending_fsyncs_count as 'Pending Fsyncs', data_pending_reads_count as 'Pending Reads', data_pending_writes_count as 'Pending Writes', data_reads_count as 'Reads', data_writes_count as 'Writes' FROM (
  SELECT TIME_SLICE(to_timestamp(timestamp_utc_s), 300) AS dt, data_fsyncs_count, data_pending_fsyncs_count, data_pending_reads_count, data_pending_writes_count, data_reads_count, data_writes_count
  FROM opsb_k8sapp_mysql_innodb
  WHERE
    data_fsyncs_count IS NOT NULL AND
    data_pending_fsyncs_count IS NOT NULL AND
    data_pending_reads_count IS NOT NULL AND
    data_pending_writes_count IS NOT NULL AND
    data_reads_count IS NOT NULL AND
    data_writes_count IS NOT NULL AND
    timestamp_utc_s > (EXTRACT(epoch FROM ${${Calendar:start}}::TIMESTAMPTZ))
    AND timestamp_utc_s < (EXTRACT(epoch FROM ${${Calendar:end}}::TIMESTAMPTZ))
    AND ${resource_name IN (${monSvcInnodbResource})}
) as op1

4. Now select Run Query and then click Save.

Note

After running the query, if you see an error that the query results are empty, go to Administration > Dashboards & Reports > Predefined queries, search for Time Period(Calendar), click Edit, change the Defaults value from the 12 hours setting to 12 months, and click Save.

5. In the Visualization dropdown, remove the default Table option and select Time-series Line instead.
6. In the Time field add dt and in the Metric values field add all the remaining values from the dropdown, namely 'Fsyncs', 'Pending Fsyncs', 'Pending Reads', 'Pending Writes', 'Reads', 'Writes'.
7. Click Close.
8. On the Kubernetes Application Summary page, click and select Save. The graph will now display the correct
values.


1.17.10.5. Kubernetes Summary page displays wrong data in the Total Namespaces Count widget

The widget Total Namespaces Count displays wrong data on the Kubernetes Namespaces List and Kubernetes
Cluster Instance Overview pages.

Solution
To rectify the error in the count on widget Total Namespaces Count in the Kubernetes Cluster Instance Overview
page perform the following steps:

1. Navigate to Total Namespaces Count widget (go to Kubernetes Cluster Instance Overview page, click on any
available cluster on the summary page) and click More Actions.
2. Click Edit and expand the Predefined query dropdown and select Edit the Predefined Query.
3. In the SQL Query space, replace the existing query with the query below:

SELECT dt, cnt FROM (
  SELECT TIME_SLICE(to_timestamp(timestamp_utc_s), 300) AS dt, count(distinct(resource_name)) as cnt
  FROM kubernetes_namespaces
  WHERE
    timestamp_utc_s > (EXTRACT(epoch FROM ${${Calendar:start}}::TIMESTAMPTZ))
    AND timestamp_utc_s < (EXTRACT(epoch FROM ${${Calendar:end}}::TIMESTAMPTZ))
    AND ${cluster_name IN (${monSvcClusterName})} AND ${collection_policy_name IN (${monSvcK8SConfig})}
    AND ${labels LIKE '%' || ((${monSvcK8SClusterTags})) || '%'}
  group by dt
) as op1 order by dt desc limit 1

4. Select Run Query and then click Save.

5. Click More Actions and then select Refresh. The graph will now display the correct count.

Perform the same steps to rectify the wrong count on the Total Count widget in the Kubernetes Namespace List page.


1.17.11. Troubleshoot VMware Hyperscale Observability

This section covers the following troubleshooting topics:

Find VMware Virtualization log files
Failed to activate zone. Error : [Action: activate zone , Resource: , Status Code: 500, Request Status: Server internal error
The ops-monitoring-ctl update command displays an error


1.17.11.1. Find VMware Virtualization log files

Use this topic to find the log files required for troubleshooting the discovery issue.

VMware Virtualization logs

Probe communication log


Path to access the probe communication logs:

/var/vols/itom/<log-volume>/ucmdb/probe/vcenter-probe/communicationLog/<Job-ID>/VMware VirtualCenter Topology with scope by VIM/<.record file>

Example:

/var/vols/itom/opsbvol4/ucmdb/probe/vcenter-probe/communicationLog/633c58e2-d6e9-4aa8-b492-c39b375b30e3/VMware VirtualCenter Topology with scope by VIM/ex_135b59795d3261db4a2a08b0b863f93f.record

Required log files: *.record

Probe log
Path to access the probe logs:

/var/vols/itom/<log-volume>/ucmdb/probe/vcenter-probe/log

Example:

/var/vols/itom/opsbvol4/ucmdb/probe/vcenter-probe/log

Required log files:

probe-error.log
RemoteProcesses.log

Server log
Path to access the server logs:

/var/vols/itom/<log-volume>/ucmdb/server/itom-ucmdb-0

Example:

/var/vols/itom/opsbvol4/ucmdb/server/itom-ucmdb-0

Required Log Files:

cmdb.reconciliation.log
cmdb.reconciliation.error.log
mam.autodiscovery.log
error.log


Commands for log volume and job ID


Run the following commands to get the log volume and the Job ID:

Log volume

kubectl get pv

Job ID

./ops-monitoring-ctl get coll -n <collector-name> -o yaml --full


1.17.11.2. Failed to activate zone. Error : [Action: activate zone , Resource: , Status Code: 500, Request Status: Server internal error

You will see the VMware Virtualization collection fail with the following error:

time="2024-03-28T11:23:16Z" level=error msg="Failed to activate zone. Error : [Action: activate zone , Resource: , Status Code: 500,
Request Status: Server internal error, Response: {\r\n \"errorCode\" : 500,\r\n \"errorSource\" : null,\r\n \"message\" : {\r\n \"code\" :
11001014,\r\n \"parameter\" : null,\r\n \"description\" : \"Zone based discovery is not enabled.\",\r\n \"errorParametersValues\" : nu
ll,\r\n \"errorMap\" : null,\r\n \"parametrizedError\" : false\r\n },\r\n \"details\" : null,\r\n \"recommendedActions\" : null,\r\n \"nested
Errors\" : null,\r\n \"data\" : null\r\n}]" JobID=3456721c-71e1-410b-bb56-ba6b7b648b70 JobName=sg-vcenter-collector-001 JobType=di
scovery JobUnitID=84c62084-93a6-4210-9dbd-274784fae1e2 container=vcenter-discovery-collector file="zone.go:286" func="ud.Switc
hZoneState()"

Cause
This error occurs when you have not enabled the zone-based discovery in the UCMDB Web UI.

Solution
You can resolve the error by enabling zone-based discovery in the UCMDB Web UI. Follow these steps:

Note

Only administrators have permission to enable zone-based discovery solution in UCMDB Web
UI.

Caution

Once the UCMDB Web UI zone-based discovery solution is enabled, the existing discovery running in the UCMDB Web UI will no
longer take effect and you must start using UCMDB Web UI for configuring discovery. The existing discovery configuration (apart
from the Data Flow Probe Setup) won't be migrated and needs to be re-created in the new discovery solution.

You can enable the UCMDB Web UI zone-based discovery solution from either of the following:

Enable UCMDB Web UI zone discovery from UCMDB Web UI


When you activate a zone for discovery in UCMDB Web UI for the first time, you should see a warning message informing you about the impact on the existing discovery in the UCMDB Web UI (if any). If you are fully aware of the impact and you are sure you want to continue, click click here to enable in the warning message to enable the UCMDB Web UI zone-based discovery solution.

Enable UCMDB Web UI zone discovery from JMX Console


1. Access the UCMDB server JMX console.

You may have to log in with a user name (default: sysadmin) and password.

2. Locate the setSettingValue operation from the UCMDB:service=Settings Services category.

3. Provide values for the parameters as described in the table below:

Parameter     Value

customerID    <Customer ID> (Default: 1)
name          appilog.collectors.enableZoneBasedDiscovery
value         true (Default: false)

4. Click Invoke.

How to check if the UCMDB Web UI zone-based discovery solution is enabled

Once the UCMDB Web UI zone-based discovery solution is enabled, the following changes happen:

Discovery-related components on UCMDB server switch to follow the new discovery logic

Dispatch Manager reads the enableZoneBasedDiscovery flag and switches to follow the new discovery logic

Task Generator reads the flag and switches to follow the new discovery logic

Result Processing reads the flag and switches to follow the new discovery logic

Key words to check if the discovery works in the new discovery logic:

isNewZoneBased:true in the <UCMDBServer>/runtime/log/mam.dispatch.log file

[ZONE BASED DISPATCH] in the <UCMDBServer>/runtime/log/mam.dispatch.log file

Probe server switches to follow the new discovery logic

Probe reads the flag and restarts to switch. You should see the following in the <DataFlowProbe>\runtime\log\WrapperProbeGw.log file:

... 41088315 [INFO ] [ProbeGW: DB Tasks Distributor] (InfrastructureSettingsManager.java:232) - current enableZoneBasedDiscovery is true
... 41088315 [INFO ] [ProbeGW: DB Tasks Distributor] (InfrastructureSettingsManager.java:234) - enableZoneBasedDiscovery is changed, restart probe.
...


1.17.11.3. The ops-monitoring-ctl update command displays an error

When you update vCenter-specific collectors, the ops-monitoring-ctl update command displays the following error message: "Error: Invalid input file"

Solution
You can only create, enable/disable, or delete a configuration using ops-monitoring-ctl command parameters. Updating or modifying a collector configuration using parameters isn't currently supported. To update or modify the configuration, you must use a YAML input file.

Example:

ops-monitoring-ctl update -f <yaml_input_file>


1.18. Troubleshoot Automatic Event Correlation


This section covers possible problems related to Automatic Event Correlation (AEC) and how to troubleshoot them.


1.18.1. UIF pages take a long time to load


After you upgrade AI Operations Management, the URLs with /ui in the path take more than 20 seconds to load.

Cause
The issue occurs when the AEC content pack isn't upgraded yet.

Solution
To resolve the issue, perform the following steps to upgrade the Automatic Event Correlation (AEC) content pack:

1. Go to the scripts directory in the unzipped opsbridge-suite-chart file.
2. On the control plane node, bastion node, or the installer node, depending on your Kubernetes distribution, run the reloadAecContent.sh script.
3. Provide the namespace in which the opsbridge-suite chart is deployed and the password of the admin user when prompted.

The following is the sample output:

./reloadAecContent.sh
Enter the namespace in which Operations Bridge is running: opsbridge-helm
Please enter the password for user 'admin'
Enter password: ***********
..
Content pack was added successfully.

After you have upgraded the AEC content pack, you can launch the URLs again.


1.18.2. AEC events aren't getting processed after an upgrade

The root-cause service doesn't process Automatic Event Correlation (AEC) events after an upgrade.

Cause
This issue occurs if the automatic upgrade of AEC schema fails. For more information, see Post-upgrade configurations.

Solution
Follow these steps to update the schema manually:

1. Run the following commands as dbadmin in the Vertica database:

SELECT MAKE_AHM_NOW();
ALTER TABLE itom_analytics_provider_default.aiops_internal_root_cause_score DROP COLUMN id CASCADE;

Example output:

dbadmin=> SELECT MAKE_AHM_NOW();


MAKE_AHM_NOW
-----------------------------------
AHM set (New AHM Epoch: 32129882)
(1 row)
dbadmin=> ALTER TABLE itom_analytics_provider_default.aiops_internal_root_cause_score DROP COLUMN id CASCADE;
ALTER TABLE

2. Execute the commands on a node where kubectl is configured to control the AI Operations Management deployment
and restart the itom-analytics-root-cause pod using the following sequence:

a. Get pod name:

kubectl get pod -A | grep itom-analytics-root-cause

b. Restart pod:

kubectl delete pod -n <opsb-namespace> <name from previous command>


1.18.3. Troubleshoot AEC pipeline


The AEC pipeline consists of 3 pods in Kubernetes:

1. itom-analytics-aec-pipeline-jm- : This pod is the Job Manager that handles Pulsar connectors and overall pipeline setup and
teardown.
2. itom-analytics-aec-pipeline-tm- : This pod is the Task Manager where the main pipeline logic runs.
3. itom-analytics-flink-controller- : This pod is the Flink controller that provides an improved Kubernetes integration to handle
upgrades and self-healing logic.

Note

The itom-analytics-aec-pipeline-jm- and itom-analytics-aec-pipeline-tm- pods constitute an Apache Flink cluster for
distributed data processing.

Temporarily scale down the AEC pipeline


At times, the AEC pipeline needs to be paused during troubleshooting. Since the controller pod itom-analytics-flink-controller-
manages part of the cluster's lifecycle on top of Kubernetes, the scale down procedure is different.

Follow the below step to scale down the AEC pipeline:

1. On a Kubernetes control plane node, execute the following command to edit the pipeline's Config Map (CM):
kubectl -n <SUITE_NAMESPACE> edit cm itom-analytics-aec-flink-cm

2. Add the following annotation under the resource's metadata's annotations:


flink.aiops.microfocus.com/suspend: "true"

3. Save the resource and wait for about 1 minute.

To undo the scale down of the AEC pipeline, edit the CM again and remove the annotation.
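
For reference, the metadata section of the edited Config Map would then look similar to the following sketch (only the annotation matters; the other keys in the Config Map stay unchanged):

apiVersion: v1
kind: ConfigMap
metadata:
  name: itom-analytics-aec-flink-cm
  annotations:
    flink.aiops.microfocus.com/suspend: "true"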

Access the AEC pipeline detailed logs


Although warnings and errors should always be printed to standard output, the AEC pipeline pods log only a subset of
information to standard output. Detailed information is logged to files located on the NFS, which are subject to log rotation.

The NFS path to locate the logs is:


<NFS_LOG_PATH>/itom-analytics/aec-pipeline/

For example,

If an installation selects /var/vols/itom/opsbvol1 as the NFS log volume, the NFS path is:
/var/vols/itom/opsbvol1/itom-analytics/aec-pipeline/

There are three main files in this log location:

job_manager.log: This log file will contain information about the AEC pipeline's internal checkpoints.
task_manager_0.log: This log file will contain information about the Pulsar connectors.
aec_pipeline.log: Under normal operation, this log file contains AEC pipeline start-up and shut-down information.

If the pods are failing, these log files provide additional insight into the cause.

You can edit the following file, to modify log levels:


<NFS_CONF_PATH>/itom-analytics/aec-pipeline/logback-console.xml
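
For example, raising the root log level in that file typically means changing the level attribute of the root element, similar to the sketch below. The appender name FILE is an assumption; keep whatever appender references already exist in your copy of logback-console.xml:

<root level="DEBUG">
    <appender-ref ref="FILE"/>
</root>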

Increase pod memory


If the logs indicate that the processes are running out of memory, you can increase the memory. To ensure this is persisted
across upgrades, this must be done via Helm values. For more information on helm values, see Configure values.yaml.

You can add custom values in YAML file as following:


aec:
  deployment:
    flinkPipeline:
      jobManagerResources:
        memory: "512"
      taskManagerResources:
        memory: "2048"

The values for memory must be surrounded with double quotes and must not include units. They are assumed to be
mebibytes (MiB).

You can perform the Helm upgrade with the modified YAML file. This will ensure that the values are propagated to the
necessary Kubernetes resources.
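
For example, assuming the custom values are saved in a file named aec-values.yaml (the file name is illustrative), the upgrade could be run as follows:

helm upgrade <deployment name> -n <suite namespace> <opsbridge suite chart.tgz> --reuse-values -f aec-values.yaml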

Advanced tuning
Since the AEC pipeline runs on top of Flink, most of its configuration properties can be adjusted, except the following
properties. For more information on flink configuration properties, see Apache Flink Configuration documentation.

Service addresses and ports


Java Key or Trust Store properties
kubernetes.jobmanager.replicas
jobmanager.memory.process.size
taskmanager.memory.process.size

You can add the other properties, as described in the Apache Flink Configuration documentation, in the YAML file under aec.deployment.flinkPipeline.additionalConf.
For example, you can tune Akka's internal frame size by providing the following properties in the YAML file:

aec:
  deployment:
    flinkPipeline:
      additionalConf:
        akka.framesize: "20971520b"

Related topics
Configure values.yaml
Apache Flink Configuration


1.18.4. aec_admin role doesn't exist in IDM

Cause
For a non-default UI Foundation context root, the AEC Explained content upload fails. Therefore, the aec_admin role isn't uploaded to UI Foundation and isn't created in IDM.

Solution
To create the aec_admin role in IDM, ensure that the content upload to UI Foundation is successful.

Verify whether the content upload to UI Foundation is successful by checking the logs of the aec-explained-ui-uploader container in the itom-analytics-aec-explained deployment, and then do the following, as required:

1. If the UI Foundation is uploaded successfully, do the following:

a. Access the AEC Explained UI by directly accessing UI Foundation via the URL.

b. Identify a user who is part of the Administrators group with admin rights and privileges.

Note: A user without admin rights and privileges can still access the AEC Explained UI in this release but can't
see any events or AEC data.

c. If you can see the AEC Explained UI in UI Foundation, then there was a problem with UI Foundation creating
the aec_admin role in IDM.

d. Verify the logs of the bvd-explore container in the bvd-explore deployment.

e. If you can't find the issue in the bvd-explore logs, you can create the aec_admin role manually in IDM.

2. If UI Foundation isn't uploaded successfully, do the following:

a. Verify that the bvd explore context root has a forward slash (/).

For example, a sample of the aec-explained-ui-uploader container is as follows:

2022-01-10T19:49:38.347Z INFO Using bvd explore url: https://bvd-explore:4000

2022-01-10T19:49:38.348Z INFO Using bvd explore context root: /preview

2022-01-10T19:49:38.356Z INFO Trying to delete bvd explore config files with aec- prefix

b. If the bvd explore context root has no forward slash, edit bvd-config configmap and add a forward slash to the exploreC
ontextRoot . Ensure that the bvd explore context root appears as follows:

bvd.exploreContextRoot: /preview

c. You can also set the contextRoot to its default value (/ui) by upgrading your deployment:

helm upgrade <releasename> -n <suite namespace> <chart> --reuse-values --set bvd.params.exploreContextRoot="/ui"


1.18.5. Automatic Event Correlation Explained UI is not visible in UI Foundation

Cause
After you launch UI Foundation, the AEC Explained UI isn't available. This can happen if you're using Internet Explorer
which isn't supported. If you're using any other browser to access the UI, you can see the AEC Explained category on the
left side under the Search icon.

Solution
If the AEC Explained category is missing, verify the aec-explained-ui-uploader container logs in the itom-analytics-aec-
explained pod. You can find the URL where the AEC Explained UI is uploaded and whether there are any upload errors. You
can restart uploading the configuration files by deleting the pod, itom-analytics-aec-explained .
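
The following commands are one way to do this (a sketch; replace the namespace and pod name placeholders with your values):

# Check the uploader logs for the upload URL and any upload errors
kubectl -n <suite namespace> logs deployment/itom-analytics-aec-explained -c aec-explained-ui-uploader

# Delete the pod to restart the configuration upload
kubectl -n <suite namespace> delete pod <itom-analytics-aec-explained pod name>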


1.18.6. Automatic Event Correlation Explained UI does not load the translation resources

Cause
You can't see the translation resources because UI Foundation can't query the localization files from the static-files-container in
the itom-opsb-resource-bundle pod.

Solution
You can check whether the pod, itom-opsb-resource-bundle is reachable or ignore the error. Then you can see the UI in the
default language, which is English.


1.18.7. Automatic Event Correlation Explained UI does not show complete data

Cause
The AEC Explained UI shows the AEC Explained page information but without any data. This signals that the AEC
Explained back end has problems querying your Vertica database or processing data.

Solution
You can check the logs of the aec-explained-service container in the itom-analytics-aec-explained pod for any errors. If there are
no errors in the back end, you can also trigger a redeployment of the UI by deleting the itom-analytics-aec-explained pod.


1.18.8. Automatic Event Correlation Explained UI cannot be launched from OBM

Cause
When you try to cross launch the AEC Explained UI from the OBM Event Browser, the AEC Explained UI isn't launched.
This means that the AEC Explained URL tools aren't properly configured in OBM. This can come from problems during the
AEC integration or suite upgrade.

Solution
Perform the following steps in OBM:

1. From the OBM menu, navigate to Administration > Operations Console and click Tools.

2. Click ConfigurationItem, and edit the Show Correlation Group Details (AEC Explained), Show Occurrence Details (AEC Explained), and Launch AEC Explained URL tools that are integrated with the AEC content pack.

Note

The URLs should point to your UI Foundation.

If the tools contain a setting variable such as ${setting.integrations.uif.url}/aec-overview, go to Setup and Maintenance > Infrastructure Settings and search for "UI Foundation URL". Enter the correct URL for the setting variable.

If the setting variable does not exist, you can edit the URL tools directly. Replace ${setting.integrations.uif.url} with the URL of your UI Foundation deployment (including the UI Foundation context root). The result should look like this: https://my.hostname.com:443/ui/aec-overview


1.18.9. AEC pods restart frequently


The AEC pods ( itom-analytics-aec-pipeline-jm-566b59766-vz7qm and itom-analytics-aec-pipeline-tm-7478fbcfc9-4qcxx ) continuously
restart in your AI Operations Management environment.

Solution
You must manually clear the <OPSB_DATA_VOLUME>/itom-analytics/aec-pipeline/state directory (delete Flink's state) if
Automatic Event Correlation (AEC) fails to recover from the checkpoint and keeps restarting.
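
A sketch of that cleanup is shown below. Scale down the AEC pipeline first (see Troubleshoot AEC pipeline) so that Flink isn't writing to the directory while you delete its contents; <OPSB_DATA_VOLUME> is the same placeholder used above:

rm -rf <OPSB_DATA_VOLUME>/itom-analytics/aec-pipeline/state/*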

AEC's Flink periodically clears the data but if disk space utilization increases (for example, 10+ GB), activate a CRON job to
restart the pipeline once a day.

kubectl -n <SUITE_NAMESPACE> patch cj itom-analytics-flink-housekeeping-job -p '{"spec": {"suspend" : false}}'


1.18.10. Automatic Event Correlation fails

Cause
To configure Automatic Event Correlation (AEC), you set a user name while configuring the connected server (to forward
events to OPTIC Data Lake) and while configuring the endpoints using the call-analytics-datasources tool. AEC will fail if the
user name entered in the connected server for OPTIC Data Lake does not match the user name entered in the call-analytics-da
tasources tool.

Solution
Run the following command to check if there is a mismatch in the user names:

kubectl -n <suite namespace> exec <datasource registry pod> -c datasource-registry -- troubleshoot-obm-receivers.sh

For example:

kubectl -n opsbridge-helm exec itom-analytics-datasource-registry-54ff88bdc4-znq57 -c datasource-registry -- troubleshoot-obm-receivers.sh

Sample output:

2020-09-04 13:17:37,995 INFO troubleshoot-obm-receivers:main Found 2 OBM receiver(s).
2020-09-04 13:17:38,477 INFO troubleshoot-obm-receivers:main Checking connected servers in https://myhost.mycompany.net:443, which is bound to receiver with id=mambo8_receiver
2020-09-04 13:17:38,612 INFO troubleshoot-obm-receivers:main Source id=mambo8 found in OBM's connected servers' user names.
2020-09-04 13:17:38,906 INFO troubleshoot-obm-receivers:main Checking connected servers in https://myhost.mycompany.net:443, which is bound to receiver with id=foobar_receiver
2020-09-04 13:17:39,036 WARN troubleshoot-obm-receivers:main Source id=foobar was NOT found in OBM's connected servers' user names, only these were found:
[
"myhost8"
]

If there is a mismatch, correct the user name.

Related topics
To configure automatic event correlation, see the OBM Configurator Tool.
For more details about the call-analytics-datasources tool, see Configure endpoints for Automatic Event Correlation.


1.18.11. Correlation job fails with timeout error


The status of the itom-analytics-auto-event-correlation-job job displays a timeout error instead of completed when checking the
pods.

Cause
A timeout error occurs in environments that contain a large correlation graph with 5 million or more entries in the itom_analytics_provider_default.aiops_internal_correlation_graph table.

Solution
To solve the issue, you need to increase the timeout of the job by adding the batch-job.timeout-minutes key to the itom-analytics-config config map with a value greater than 60 (the default).
For example:

apiVersion: v1
data:
  batch-job.timeout-minutes: "120"
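
One way to add the key is with kubectl patch (a sketch; replace the namespace placeholder with your AI Operations Management namespace):

kubectl -n <suite namespace> patch configmap itom-analytics-config --type merge -p '{"data":{"batch-job.timeout-minutes":"120"}}'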


1.18.12. AEC pods display error due to insufficient resources in Vertica's general resource pool

The general Vertica resource pool doesn't have enough resources for the AEC queries and displays an "ERROR: Insufficient resources to execute plan on pool general" error on the following pods:

itom-analytics-text-clustering-server
itom-analytics-root-cause
itom-analytics-aec-pipeline-tm

Solution
Create two new resource pools for AEC by following the steps to resolve this issue:

1. Log on to the Vertica system as a dbadmin .


2. Run the following queries to create two new resource pools for AEC:

CREATE RESOURCE POOL aec_interactive_pool_provider_default MAXCONCURRENCY NONE MEMORYSIZE '<memory size>' MAXMEMORYSIZE '<max memory size>' PRIORITY 0 RUNTIMEPRIORITY HIGH;

CREATE RESOURCE POOL aec_background_pool_provider_default MAXCONCURRENCY NONE MAXMEMORYSIZE '<max memory size>' PRIORITY 0 RUNTIMEPRIORITY MEDIUM;

GRANT USAGE ON RESOURCE POOL aec_interactive_pool_provider_default to <read-write user name> WITH GRANT OPTION;

GRANT USAGE ON RESOURCE POOL aec_background_pool_provider_default to <read-write user name> WITH GRANT OPTION;

For example:

CREATE RESOURCE POOL aec_interactive_pool_provider_default MAXCONCURRENCY NONE MEMORYSIZE '10%' MAXMEMORYSIZE '80%' PRIORITY 0 RUNTIMEPRIORITY HIGH;

CREATE RESOURCE POOL aec_background_pool_provider_default MAXCONCURRENCY NONE MAXMEMORYSIZE '15%' PRIORITY 0 RUNTIMEPRIORITY MEDIUM;

GRANT USAGE ON RESOURCE POOL aec_interactive_pool_provider_default to vertica_rwuser WITH GRANT OPTION;

GRANT USAGE ON RESOURCE POOL aec_background_pool_provider_default to vertica_rwuser WITH GRANT OPTION;

3. Run the following command to get the helm values:

helm get values <deployment name> -n <suite namespace> > valuesBackup.yaml

4. Update the valuesBackup.yaml file with the parameters to include the resource pools for AEC:

aec:
  deployment:
    vertica:
      aecBackgroundResourcepool: aec_background_pool_provider_default
      aecInteractiveResourcepool: aec_interactive_pool_provider_default

5. Run the helm upgrade command:

helm upgrade <deployment name> -n <suite namespace> <suite deploy chart with version.tgz> -f valuesBackup.yaml


1.18.13. itom-analytics-opsbridge-notification pod fails with OOMKilled error
Post upgrade (from 2023.05 to 23.4), the itom-analytics-opsbridge-notification pod has an OOMKilled error and the aiops_correlation_event topic has a high backlog.

Cause
During the upgrade, if the post-upgrade task isn't performed on time, the root-cause service sends too many correlation
events to the Notification Container, resulting in a high backlog that blocks the component.

Solution
Follow the steps to clear the backlog:

1. Perform the post-upgrade task.


2. On pulsar bastion, check the aiops_correlation_event_internal topic backlog:
./bin/pulsar-admin topics partitioned-stats public/default/aiops_correlation_event_internal
3. If the backlogSize has a high value, clear the backlog:

Scale down the root-cause service:


kubectl -n $(helm list -A | awk '/opsbridge-suite/ {print $2}') scale deploy itom-analytics-root-cause --replicas=0

Clear the root-cause service backlog on pulsar bastion:


./bin/pulsar-admin topics clear-backlog persistent://public/default/aiops_correlation_event_internal -s itom_analytics_root-cause

Scale up the root-cause service:


kubectl -n $(helm list -A | awk '/opsbridge-suite/ {print $2}') scale deploy itom-analytics-root-cause --replicas=1

4. Scale down the Notification Container:


kubectl -n $(helm list -A | awk '/opsbridge-suite/ {print $2}') scale deploy itom-analytics-opsbridge-notification --replicas=0
5. On pulsar bastion, to clear the aiops_correlation_event backlog, get the topic subscriptions and clear each subscription
backlog:

Get the topic subscriptions:


./bin/pulsar-admin topics subscriptions public/default/aiops_correlation_event

Clear each subscription backlog:


./bin/pulsar-admin topics clear-backlog persistent://public/default/aiops_correlation_event -s <subscription_name>

6. Scale up the Notification Container:


kubectl -n $(helm list -A | awk '/opsbridge-suite/ {print $2}') scale deploy itom-analytics-opsbridge-notification --replicas=1


1.18.14. AEC Explained UI partitions fail to load


In the AEC Explained UI, the Partitions Metrics widget displays an error on the AEC Overview and the Topology Partitions
pages. The itom-analytics-aec-explained pod displays the following error in the logs:

time="2024-03-04T19:44:59Z" level=warning msg="could not get partitions metrics" component=server error="could not get partition
s metrics from database: could not get partitions: ERROR 6999: [42V13] The library [VFunctionsLib] for the function [L ││ ISTAGG(varch
ar)] was compiled with an incompatible SDK Version [11.0.1]"

Problem
During the Vertica upgrade, a few packages aren't upgraded successfully.

Solution
Force install the packages by running the following command:

"admintools -t install_package -d <dbname> -p 'password' -P all --force-reinstall"


1.18.15. Analytics pods are in CrashLoopBackOff state

At times, the following pods may go in the CrashLoopBackOff state:

itom-analytics-ea-config-
itom-analytics-datasource-registry-

When you check their logs, you may see lines similar to the following example:

2022-05-10T13:13:58,580Z INFO base-utils:importKey Importing key into keystore …


2022-05-10T13:16:01,015Z INFO base-utils:importTrusts Importing certificates into truststore …

Cause
This issue occurs when the entropy is low.

To confirm entropy is low, use SSH to run the following command in each Kubernetes worker:

cat /proc/sys/kernel/random/entropy_avail

If the number returned by the command is below 1000 in any of the Kubernetes worker, entropy is too low.

Solution
The solution depends on your Kubernetes setup.

The Kubernetes cluster isn't an OpenShift cluster: Use the package manager of your operating system to install the rng-tools package in each Kubernetes worker.
The Kubernetes cluster is an OpenShift cluster using Red Hat Enterprise Linux (RHEL) for the workers: Use the yum package manager to install the rng-tools package on each Kubernetes worker.
The Kubernetes cluster is an OpenShift cluster using Red Hat CoreOS (RHCOS) for the workers: Please contact Red Hat
support to know the available options to enhance the entropy pool. You can deploy custom solutions, for example using
"haveged", but you must understand the security implications in that case.

1. Depending on your Kubernetes setup, start the service that you have added to the workers. For example, if you
installed rng-tools then execute the following commands:

systemctl start rngd

systemctl enable rngd

2. To check that the entropy pool has increased, run the following command again:

cat /proc/sys/kernel/random/entropy_avail

The reported output should be higher than before.


1.18.16. AEC topology partitions do not reflect topology in RTSM

Some CIs in the AEC partitions may be missing or no longer in the RTSM partitions.

Cause
AEC relies on the topology pushed by the data flow probe to the OPTIC Data Lake. If that forwarded topology is stale or incomplete, the topology partitions in AEC don't match the RTSM.

Solution
Clear the current topology and reload it from the Optic Data Lake:

1. Delete the forwarded topology from the Optic data lake and the AEC partitions:

TRUNCATE TABLE itom_analytics_provider_default.aiops_internal_topological_mappings;

TRUNCATE TABLE itom_analytics_provider_default.aiops_internal_topology_metadata;

TRUNCATE TABLE mf_shared_provider_default.cmdb_entity_configuration_item_raw;

TRUNCATE TABLE mf_shared_provider_default.cmdb_entity_relation_raw;

2. Do a full topology sync from RTSM/UCMDB. Go to Data Flow Management > Integration Studio and click Full
Synchronization for the integration point that forwards topology to Optic Data Lake.

After 10–20 minutes, the AEC partitions reload and will be visible in the AEC Explained UI.


1.18.17. AEC pipeline pods are scaled down

Cause
The flink-controller controls the AEC pipeline pods ( itom-analytics-aec-pipeline-jm and itom-analytics-aec-pipeline-tm ). It scales down the pods during suite upgrades and should scale them back up again. If the pods fail to scale back up, the flink-controller may scale them down again when you scale them up manually.

Solution
Edit the itom-analytics-aec-flink-cm configmap to delete the flink.aiops.microfocus.com/suspend: "true" line:

kubectl edit cm -n <namespace> itom-analytics-aec-flink-cm

Once the config map is saved, the flink-controller should scale up the pipeline pods. If not, scale them back up manually.
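A minimal sketch of how to scale the pipeline pods back up manually, assuming they are deployments with the names mentioned above and one replica each; verify the actual object names and replica counts in your cluster first (for example with kubectl get deploy -n <namespace>):

kubectl -n <namespace> scale deploy itom-analytics-aec-pipeline-jm --replicas=1

kubectl -n <namespace> scale deploy itom-analytics-aec-pipeline-tm --replicas=1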

Note

Before software revision 25.1, the itom-analytics-flink-housekeeping-job (disabled by default) scaled down the pipeline pods on every run (every night at midnight) when the itom-analytics-flink-controller-pre-upgrade-hook pod was detected. The itom-analytics-flink-controller-pre-upgrade-hook pod is kept for a few days after the job is completed. You can also manually delete the pod to solve the issue:

kubectl delete -n <namespace> $(kubectl get pods -A -l "job-name=itom-analytics-flink-controller-pre-upgrade-hook" -o name)


1.19. Troubleshoot Monitoring Service Edge


This section describes the issues related to Monitoring Service Edge.


1.19.1. OBM agent proxy selection during Edge installation on K3s installs additional common components

When you install only OBM agent proxy on Monitoring Service Edge on K3s, other additional components like job scheduler
and monitoring admin are also installed.

Solution
Overall scaling requirement for Edge is high. To ensure optimum resource utilization, scale down the following pods that aren't required for OBM agent proxy (see the example after this list):

itom-monitoring-admin-xxxx
itom-opsbridge-cs-redis-xxxx
itom-monitoring-snf-xxxx
itom-monitoring-collection-manager-xxxx
itom-monitoring-job-scheduler-xxxx
credential-manager-xxxx
itom-postgresql-xxxx
itom-vault-xxxx
itom-resource-bundle-xxxx
itom-ingress-controller-xxxx
itom-reloader-xxxx
itom-idm-xxxx
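A minimal sketch of how to scale down one of these workloads with kubectl; <edge-namespace> and <deployment-name> are placeholders, so first list the deployments and then use the names reported in your Edge namespace:

kubectl get deploy -n <edge-namespace>

kubectl scale deploy <deployment-name> --replicas=0 -n <edge-namespace>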


1.20. Troubleshoot Reports


This section covers possible problems that can cause the issues related to OPTIC Reporting and how you can troubleshoot
them.


1.20.1. Metric collector fails to connect to agent node

The metric collector needs the correct proxy and port details to connect to the agent node. If the metric collector fails to connect to the agent node, check the proxy file and the port file for this information.

To check the proxy and port file, do the following:

1. Run the following command to copy the proxy file and port file to data broker container:

kubectl cp <filename> <namespace>/<dbc pod name>:/var/opt/OV/conf/bbc -c itom-monitoring-service-data-broker

Example:

kubectl cp proxyFile.xml opsb/itom-monitoring-service-data-broker-6c57cd9bd-hh8xz:/var/opt/OV/conf/bbc -c itom-monitoring-service-data-broker

kubectl cp portFile opsb/itom-monitoring-service-data-broker-6c57cd9bd-hh8xz:/var/opt/OV/conf/bbc -c itom-monitoring-service-data-broker

2. Go to the data broker container:

kubectl exec -it itom-monitoring-service-data-broker-6c57cd9bd-hh8xz -n opsb -c itom-monitoring-service-data-broker bash

3. To set the proxy file, run the following command:

ovconfchg -ns bbc.http -set PROXY_CFG_FILE <proxyFileName>

Example:

ovconfchg -ns bbc.http -set PROXY_CFG_FILE proxyFile.xml

4. To set the port file, run the following command:

ovconfchg -ns bbc.cb.ports -set CB_PORTS_CFG_FILE <portFileName>

Example:

ovconfchg -ns bbc.cb.ports -set CB_PORTS_CFG_FILE portFile

5. Run the following command to check the connectivity to a given agent:

/opt/OV/bin/ovcodautil -showds -n <hostname or IP address>

Example:

itom-monitoring-service-data-broker-6c57cd9bd-hh8xz:/ # ovcodautil -showds -n mycomputer.myhost.net

Sample output:

NumDataSources = 2
SCOPE
CODA


1.20.2. Discovery authorization fails


The discovery service authorization fails with the following error:

Failed to authorization token for UCMDB endpoint

Or you are unable to discover the node.

Cause
Authorization or node discovery may fail due to any of the following reasons:

Server may be down


Wrong URL
Wrong user name and password

Solution
To identify the reason for the failure, follow these steps:

Log in to the oa-discovery-collector pod:

kubectl exec -it -c itom-monitoring-oa-discovery-collector -n $(kubectl get pods -A | awk '/itom-monitoring-oa-discovery/ {print $1, $2}') -- bash

If the target OBM is containerized, run the following command:

curl -X POST https://obm-endpoint:<port>/ucmdb-server/rest-api/authenticate/ -H "Content-Type: application/json" --data '{"username":"userNameValue","password":"userPasswordValue","clientContext": 1}' --proxy "http://<proxyUrl>:<proxyPort>" -U proxyUserName:proxyPassword --cert /path_to/client.crt --key /path_to/client.key --cacert /path_to/trustCert

If the target OBM is classic, run the following command:

curl -X POST https://obm-endpoint:<port>/rest-api/authenticate/ -H "Content-Type: application/json" --data '{"username":"userNameValue","password":"userPasswordValue","clientContext": 1}' --proxy "http://<proxyUrl>:<proxyPort>" -U proxyUserName:proxyPassword --cert /path_to/client.crt --key /path_to/client.key --cacert /path_to/trustCert

Sample commands:

Without proxy and without loadbalancer:

curl -X POST https://my-obm-endpoint:443/rest-api/authenticate/ -H "Content-Type: application/json" --data '{"username":"admin","password":"admin","clientContext": 1}' -k

With proxy and without load balancer:

curl -X POST https://my-obm-endpoint:443/rest-api/authenticate/ -H "Content-Type: application/json" --data '{"username":"admin","password":"admin","clientContext": 1}' --proxy "http://my-proxyUrl:8080" -U optionalProxyUserName:optionalProxyPassword -k

With proxy and with load balancer:

curl -X POST https://my-loadbalancer-endpoint:19443/rest-api/authenticate/ -H "Content-Type: application/json" --data '{"username":"admin","password":"admin","clientContext": 1}' --proxy "http://my-proxyUrl:8080" -U optionalProxyUserName:optionalProxyPassword --cert /path_to/client.crt --key /path_to/client.key --cacert /path_to/trustCert

If the authentication succeeds, the following output appears with the token:


{
"token" : "eyJ0eXAiOiJKV1QiUzI1NiJ9.eyJ1bmlxdWVfc2FsdCI6InBsImV4cCI6MTY4MjUwNzc4MiwicmVwb3NpdG9yeSI6IlVDT
UR.r25PGw-_tBw-YBzu-uB78kKZ5NKvh1vaEvpO9Uu2LU8",
"isProviderCustomer" : true
}

Note down the token from the output and run the following command:

To get the CIs for OOTB TQL and containerized OBM:

curl -X POST https://<Containerized_OBM_Host_Name>:<port>/ucmdb-server/rest-api/topologyQuery -H "Content-Type: application/json" --data '@/All_Node_CIs.json' -H "Authorization: Bearer eyJ0eXAiOiJKV1QiUzI1NiJ9.eyJ1bmlxdWVfc2FsdCI6InBsImV4cCI6MTY4MjUwNzc4MiwicmVwb3NpdG9yeSI6IlVDTUR.r25PGw-_tBw-YBzu-uB78kKZ5NKvh1vaEvpO9Uu2LU8" -k

To get the CIs for OOTB TQL and classic OBM:

curl -X POST https://<Classic_OBM_Host_Name>:<port>/rest-api/topologyQuery -H "Content-Type: application/json" --data '@/All_Node_CIs.json' -H "Authorization: Bearer eyJ0eXAiOiJKV1QiUzI1NiJ9.eyJ1bmlxdWVfc2FsdCI6InBsImV4cCI6MTY4MjUwNzc4MiwicmVwb3NpdG9yeSI6IlVDTUR.r25PGw-_tBw-YBzu-uB78kKZ5NKvh1vaEvpO9Uu2LU8" -k

The sample output is as follows:

{
"cis" : [ {
"ucmdbId" : "4b849c138fb6bb2aa18fa4576ed",
"globalId" : "4b849c138fb6bb2aa18fa4576ed",
"type" : "ip_address",
"properties" : {
"display_label" : "1XX.1X.X.1XX",
"authoritative_dns_name" : "CI Name",
"ip_address_type" : "IPv4"
},
"attributesDisplayNames" : null,
"attributesQualifiers" : null,
"displayLabel" : null,
"label" : "IpAddress"
....
}

To get the CIs for custom TQL and containerized OBM:

curl -X POST https://<Containerized_OBM_Host_Name>:<port>/ucmdb-server/rest-api/topology -H "Content-Type: application/json" --data 'Custom_TQL_Name' -H "Authorization: Bearer eyJ0eXAiOiJKV1QiUzI1NiJ9.eyJ1bmlxdWVfc2FsdCI6InBsImV4cCI6MTY4MjUwNzc4MiwicmVwb3NpdG9yeSI6IlVDTUR.r25PGw-_tBw-YBzu-uB78kKZ5NKvh1vaEvpO9Uu2LU8" -k

To get the CIs for custom TQL and classic OBM:

curl -X POST https://<Classic_OBM_Host_Name>:<port>/rest-api/topology -H "Content-Type: application/json" --data 'Custom_TQL_Name' -H "Authorization: Bearer eyJ0eXAiOiJKV1QiUzI1NiJ9.eyJ1bmlxdWVfc2FsdCI6InBsImV4cCI6MTY4MjUwNzc4MiwicmVwb3NpdG9yeSI6IlVDTUR.r25PGw-_tBw-YBzu-uB78kKZ5NKvh1vaEvpO9Uu2LU8" -k

The sample output is as follows:


{
"cis" : [ {
"ucmdbId" : "4b849c138fb6bb2aa18fa4576ed",
"globalId" : "4b849c138fb6bb2aa18fa4576ed",
"type" : "ip_address",
"properties" : {
"display_label" : "1XX.1X.X.1XX",
"authoritative_dns_name" : "CI Name",
"ip_address_type" : "IPv4"
},
"attributesDisplayNames" : null,
"attributesQualifiers" : null,
"displayLabel" : null,
"label" : "IpAddress"
....
}

If the authentication doesn't succeed or nodes aren't discovered, resolve the issue as follows:

If the server is down, make sure it's up and running.
If the URL is wrong, update the endpoint with the correct URL in the target configuration file.
If the user name or password is wrong, update the credential configuration file with the proper credentials.


1.20.3. Troubleshoot issues related to historical or missing data

When you have to reaggregate the historical or missing data in sysinfra or event tables, you can use the process_historic_raw_data.sh script. This script copies and reaggregates the historical data in agent or agentless raw tables. You can use the same script for event and service health data aggregation.

Cause
When the data is missing in aggregate tables.
When the data isn't copied from the raw tables to aligned tables or not aggregated.

Solution
Before you begin

Check if the data is available in the raw table for the required duration:

a. Log in to Vertica database as an admin user using VSQL client.

b. Run the following query:


select * from <raw table name> where to_timestamp(timestamp_utc_s) between '<Start time>' and '<End time>'

If the query doesn't return any records, then no data is present in the raw table. Therefore you can't use the process_historic_raw_data.sh script to retrieve the raw data in the sysinfra tables. For steps to fix this issue, see Troubleshoot System Infrastructure Reports - Agent Metric Collector.

For information on raw table schema details, see Aligned dataset.

If the query specified in the previous step returns records, download and run the process_historic_raw_data.sh script, see Script
to copy and reaggregate data for sysinfra and event tables.
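If you prefer to run the raw-table check from the command line instead of a SQL client, a minimal vsql sketch follows; the host, admin user, password, table name, and time range are placeholders that you must replace with your own values (5433 and itomdb are the defaults used elsewhere in this guide):

vsql -h <vertica_host> -p 5433 -d itomdb -U <vertica_admin_user> -w '<password>' -c "select count(*) from <raw table name> where to_timestamp(timestamp_utc_s) between '<Start time>' and '<End time>';"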

Important

You must run the process_historic_raw_data.sh script as the same user that installed OMT. Running the script as a different
user will result in errors.

If the script displays an error message when uploading the content, manually upload the content. For more information,
see CLI to manage content.

After the script runs (immediately or at the scheduled time), wait for a few minutes and then verify if the missing data is
available in the sysinfra tables:

Check the historical task status for the metric that you specified when running the process_historic_raw_data.sh script.
Example:
In this example, the metric specified is avail .

./process_historic_raw_data.sh -b 2021-11-09T09:00:00+05:30 -e 2021-11-09T11:00:00+05:30 -n OpsB_SysInfra_Content_Historic -m avail -c All

1. Log on to the Grafana monitoring dashboards.


2. From the Home drop-down, select the dashboard from the ITOM DI folder list.
3. Select the Postload Detail dashboard.
4. From the Taskflow drop-down, select the taskflow corresponding to the metric specified when running the script. For
example, based on the example in the previous step, select opsb_sysavl_taskflow.

Note

If you have specified more than one metric, you must check all the respective
taskflows.


5. Make sure all the historical tasks are in FINISHED state. For example, based on the example in previous step, check the
state of tasks like opsb_agtnodesh_id, opsb_sysavldlyh, and opsb_sysinfra_node_hist_1h.

6. If the historical tasks are in FINISHED state, run the query to check if sysinfra table has all the records.
Example:

select count(*) from mf_shared_provider_default.opsb_sysinfra_node where TO_TIMESTAMP(timestamp_utc_s) >= '2021-11-09 09:00:00' and TO_TIMESTAMP(timestamp_utc_s) <= '2021-11-09 11:05:00';

7. Repeat the query specified in the previous step to check the records in hourly and daily tables.

If the historical tasks are in READY , SCHEDULED , DISPATCHED , or RUNNING state for a long time, follow the steps mentioned
in Aggregate table has missing or no data and System Infrastructure Availability data is missing in reports topics.

If the data is still missing in sysinfra tables, collect debug information from the process_historic_raw_data.sh.<timestamp>.log file
and contact Support.


1.20.4. Error in report widgets: quexserv.error.query.nosuch.host

The reports or widgets appear without any data and the following error appears:

quexserv.error.query.nosuch.host

Cause
This issue occurs because the OPTIC Data Lake Vertica database isn't connected to Operations Cloud, so the query for the report doesn't run.

Solution
Follow these steps to update and validate the database connection:

1. Enter the following address on a browser:


https://<external_access_host>:<external_access_port>/ui/
The user name is admin
The password is what you entered as the admin password when installing OMT.
By default <external_access_port> is 443.

2. Open the side navigation panel and click Administration > Dashboards & Reports > Predefined Queries.

3. Click and then click DB connection settings.


4. Validate if the following are aligned with your Vertica database settings:

Host name: You can connect to either a single Vertica host or a Vertica cluster. If you want to connect to a Vertica
cluster, enter the host names as a comma separated list. This ensures that the user interface uses the live node from
the cluster. If you are using an embedded Vertica, use itom-di-vertica-svc .
Port: You set the Vertica port in the vertica.port parameter in the values.yaml file during the application installation. The
default port is 5433.
Security: The Enable TLS for secure communication check box gets cleared if you set the vertica.tlsEnabled
parameter to false in the values.yaml file during the application installation. For more information, see the section
Vertica in the Configure values.yaml page.


Database name: You set the Vertica database name in the vertica.db parameter in the values.yaml file during the
application installation. The default is itomdb .
Login: You may set the Vertica read only user login name in the vertica.rouser parameter in the values.yaml file during
the application installation. The default is vertica_rouser .
Password: Set to the password of the Vertica read only user.
Confirm password: Type the Vertica read only user password again to confirm.
5. Click TEST CONNECTION to test the connection. TEST CONNECTION must be successful.
6. Click SAVE SETTINGS.
7. Refresh the report to see the data.
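If TEST CONNECTION keeps failing, you can verify the read-only credentials directly against Vertica from a host that has the vsql client; a minimal sketch assuming the default database name and read-only user described above (adjust host, port, and credentials to your environment):

vsql -h <vertica_host> -p 5433 -d itomdb -U vertica_rouser -w '<password>' -c "select 1;"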


1.20.5. RUM reports are showing partial data or no data

Possible causes
Cause 1: The queries for corresponding reports aren't returning any data.
Cause 2: The tables required for corresponding report queries don't have data.
Cause 3: Data is present in raw tables, the issue is with OPTIC DL pods.
Cause 4: Data isn't available in OPTIC DL Message Bus topics.

Solution

The queries for corresponding reports aren't returning any data


Run a query:

Enter the following URL on a browser and log in using the credentials: https://<external_access_host>:<external_access_port>/<ui>

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

Query isn't executed


Check the OPTIC Data Lake DB Connection. Follow these steps:

1. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.

2. Click and then click DB CONNECTION. The SET UP DB CONNECTION pane appears.
3. Give the Vertica database details:
Hostname: If Vertica is a cluster, enter the host names of the cluster nodes separated by commas
Port: The default port is 5433
TLS: The Use TLS check box is cleared if you set the vertica.tlsEnabled parameter to false in the values.yaml file
during the suite installation. The default is that TLS is enabled. See the section 'Vertica' in the Configure
Values.yaml page.
DB name: The Vertica database name is configured in the vertica.db parameter in the values.yaml file during the
suite installation. The default is itomdb .
Login: The Vertica read-only user login name is configured in the vertica.rouser parameter in the values.yaml file
during the suite installation. The default is vertica_rouser .
Password: Set to the password of the Vertica read-only user.


4. Click TEST CONNECTION to test the connection. A confirmation message appears.

5. If the connection isn't successful, provide the correct details, and then test the connection. If the connection is
successful, click SAVE SETTINGS.

Check the health of bvd pods and resolve the issues according to the log message:

Pod: bvd-www-deployment
Containers: bvd-www, kubernetes-vault-renew
Description: Provides web UI and real-time push to browser for BVD dashboards
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-www-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-www-deployment-POD>_<opsbridge-namespace>_bvd-www-*.log, <bvd-www-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-redis
Containers: bvd-redis, stunnel, kubernetes-vault-renew
Description: In the memory database for statistics and session data, message bus for server process communication
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-redis-POD> -n <opsbridge-namespace>
Log files: <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-*.log, <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-stunnel-*.log, <bvd-redis-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-quexserv
Containers: bvd-quexserv, kubernetes-vault-renew
Description: Query execution service for executing Vertica on demand queries
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-quexserv-POD> -n <opsbridge-namespace>
Log files: <bvd-quexserv-POD>_<opsbridge-namespace>_bvd-quexserv-*.log, <bvd-quexserv-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-receiver-deployment
Containers: bvd-receiver, kubernetes-vault-renew
Description: Receive incoming messages (data items)
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-receiver-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-receiver-deployment-POD>_<opsbridge-namespace>_bvd-receiver-*.log, <bvd-receiver-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-ap-bridge
Containers: bvd-ap-bridge, kubernetes-vault-renew
Description: Talks to Autopass server and calculates # of allowed dashboards
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-ap-bridge-POD> -n <opsbridge-namespace>
Log files: <bvd-ap-bridge-POD>_<opsbridge-namespace>_bvd-ap-bridge-*.log, <bvd-ap-bridge-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-controller-deployment
Containers: bvd-controller, kubernetes-vault-renew
Description: Does aging of old data items and bootstrap of database
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-controller-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-controller-deployment-POD>_<opsbridge-namespace>_bvd-controller-*.log, <bvd-controller-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-explore-deployment
Containers: bvd-explore, kubernetes-vault-renew
Description: Provides web UI and back end services for BVD explore
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-explore-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-explore-deployment-POD>_<opsbridge-namespace>_bvd-explore-*.log, <bvd-explore-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Query is executed, but the top 5 or bottom 5 records aren't displayed


Check if the widget and query are mapped correctly.

Enter the following URL on a browser and log in using the credentials: https://<external_access_host>:<external_access_port>/<ui>

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Verify if the widget is mapped to the correct query. If not, correct the query and
click RUN QUERY. The query runs and the result appears with data.
8. Based on the query type, click SAVE or SAVE DATA QUERY.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Verify if the widget is mapped to the correct query. If not, correct the query and click RUN QUERY.
The query result appears with data.
5. Click SAVE.

Query is executed, but no records are displayed


Run the same SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_rum_action
opsb_rum_crash
opsb_rum_event
opsb_rum_page


opsb_rum_request
opsb_rum_session
opsb_rum_tcp
opsb_rum_trans
opsb_rum_page_1h
opsb_rum_page_1d
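For example, to check the latest record in one of the listed tables (a sketch using the page table; substitute any other table from the list):

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.opsb_rum_page;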

The tables required for corresponding report queries don't have data

Verify if the latest data is present in the tables. Run the SQL queries from a database tool like DbVisualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_rum_action
opsb_rum_crash
opsb_rum_event
opsb_rum_page
opsb_rum_request
opsb_rum_session
opsb_rum_tcp
opsb_rum_trans

Data is present in the tables, the issue is with OPTIC DL pods

Check the health of the OPTIC DL pods


Pod: itom-di-receiver-dpl
Containers: itom-di-receiver-cnt, kubernetes-vault-renew
Description: It receives the data from sources in JSON format over HTTP and sends the data to relevant OPTIC DL Message Bus topics. Dependency - pulsar-itomdipulsar-proxy
Command to check health: kubectl describe pod <itom-di-receiver-dpl-POD> -n <opsbridge-namespace>

Pod: itom-di-administration
Containers: itom-di-administration, kubernetes-vault-renew
Description: It helps to configure the data process, data ingestion to Vertica, and receiver tasks.
Command to check health: kubectl describe pod <itom-di-administration-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-broker
Containers: certificate-renew, itomdipulsar-broker
Description: It helps to produce and consume messages. Dependency - itomdipulsar-zookeeper, itomdipulsar-bookkeeper
Command to check health: kubectl describe pod <itomdipulsar-broker-POD> -n <opsbridge-namespace>

Pod: itom-di-scheduler-udx
Containers: certificate-renew, itom-di-udx-scheduler-scheduler
Description: It schedules the processed data and loads to Vertica.
Command to check health: kubectl describe pod <itom-di-scheduler-udx-POD> -n <opsbridge-namespace>

Pod: itom-di-metadata-server
Containers: itom-di-metadata-server, kubernetes-vault-renew
Description: It receives the data from the OPTIC DL Message Bus and sends it to Vertica. It manages table creation and streaming configuration in Vertica.
Command to check health: kubectl describe pod <itom-di-metadata-server-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-zookeeper
Containers: certificate-renew, itomdipulsar-zookeeper
Description: It stores the metadata of itomdipulsar-bookkeeper and OPTIC DL Message Bus pods.
Command to check health: kubectl describe pod <itomdipulsar-zookeeper-POD> -n <opsbridge-namespace>

Resolve the issues according to the message.

Data isn't available in OPTIC DL Message Bus Topics


Run the following commands to check if the OPTIC DL Message Bus topics are created inside the itomdipulsar-bastion-0 pod:

1. Login to the itomdipulsar-bastion container:

kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash

2. Execute the following script to list the OPTIC DL Message Bus topics:
pulsar@itomdipulsar-bastion-0:/pulsar/bin> ./pulsar-admin topics list-partitioned-topics public/default |grep rum

Perform the next solution steps if the data is present in the topics and the issue persists.

Check the communication between OPTIC DL Message Bus producer and consumer

Run the following scripts in the itomdipulsar-bastion-0 pod to check if the connection to the OPTIC DL Message Bus consumer is healthy:

1. Login to the itomdipulsar-bastion container:

kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash
2. Execute the following producer command to send the message 'hi':
./bin/pulsar-client produce <topic name> -m hi
3. Open one more session and execute the following consumer command, where you should see the same message 'hi' typed in the producer console:
./bin/pulsar-client consume -s test-subscription -n 0 <topic name>

Check if data is present in the OPTIC DL Message Bus topic


1. Login to the itomdipulsar-bastion container:
kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash
2. Execute the following consumer command to check the data in the Pulsar topic:
./bin/pulsar-client consume -s test-subscription -n 0 <topic name>
3. After you finish the analysis, run the following command to unsubscribe from the Pulsar topic:
bin/pulsar-admin topics unsubscribe <Topic Name> -s <Subscription Name> -f
For example: bin/pulsar-admin topics unsubscribe opsb_agent_node -s test-subscription -f


1.20.6. Business Process Monitoring reports are showing partial data or no data

Possible causes
Cause 1: The queries for corresponding reports aren't returning any data.
Cause 2: The raw tables required for corresponding report queries don't have data.
Cause 3: Data isn't available in OPTIC DL Message Bus topics.

Solution

The queries for corresponding reports aren't returning any data


Run a query:

Enter the following URL on a browser:

https://<external_access_host>:<external_access_port>/<ui>

Log in with your IdM username and password.

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

Query isn't executed


Check the OPTIC Data Lake DB Connection. Follow these steps:

1. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.

2. Click and then click DB CONNECTION. The SET UP DB CONNECTION pane appears.
3. Give the Vertica database details:
Hostname: If Vertica is a cluster, enter the host names of the cluster nodes separated by commas.
Port: The default port is 5433.
TLS: The Use TLS check box isn't selected if you set the vertica.tlsEnabled parameter to false in the values.yaml file
during the application installation. The default is that TLS is enabled. See the section 'Vertica' in the Configure
Values.yaml page.
DB name: Specify the name of the database to which you want to connect the UI.
Login: Specify the user name of the read-only user. For information about creating a read-only user, see the
Related topics.


Password: Specify the password.


4. Click TEST CONNECTION to test the connection. A confirmation message appears.

5. If the connection isn't successful, give the correct details and then test the connection. If the connection is successful,
click SAVE SETTINGS.

Check the health of bvd pods and resolve the issues according to the log message:

Pod: bvd-www-deployment
Containers: bvd-www, kubernetes-vault-renew
Description: Provides web UI and real-time push to browser for BVD dashboards
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-www-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-www-deployment-POD>_<opsbridge-namespace>_bvd-www-*.log, <bvd-www-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-redis
Containers: bvd-redis, stunnel, kubernetes-vault-renew
Description: In the memory database for statistics and session data, message bus for server process communication
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-redis-POD> -n <opsbridge-namespace>
Log files: <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-*.log, <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-stunnel-*.log, <bvd-redis-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-quexserv
Containers: bvd-quexserv, kubernetes-vault-renew
Description: Query execution service for executing Vertica on demand queries
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-quexserv-POD> -n <opsbridge-namespace>
Log files: <bvd-quexserv-POD>_<opsbridge-namespace>_bvd-quexserv-*.log, <bvd-quexserv-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-receiver-deployment
Containers: bvd-receiver, kubernetes-vault-renew
Description: Receive incoming messages (data items)
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-receiver-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-receiver-deployment-POD>_<opsbridge-namespace>_bvd-receiver-*.log, <bvd-receiver-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-ap-bridge
Containers: bvd-ap-bridge, kubernetes-vault-renew
Description: Talks to Autopass server and calculates # of allowed dashboards
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-ap-bridge-POD> -n <opsbridge-namespace>
Log files: <bvd-ap-bridge-POD>_<opsbridge-namespace>_bvd-ap-bridge-*.log, <bvd-ap-bridge-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-controller-deployment
Containers: bvd-controller, kubernetes-vault-renew
Description: Does aging of old data items and bootstrap of database
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-controller-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-controller-deployment-POD>_<opsbridge-namespace>_bvd-controller-*.log, <bvd-controller-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-explore-deployment
Containers: bvd-explore, kubernetes-vault-renew
Description: Provides web UI and back end services for BVD explore
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-explore-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-explore-deployment-POD>_<opsbridge-namespace>_bvd-explore-*.log, <bvd-explore-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Query is executed, but the top 5 or bottom 5 records aren't displayed


Check if the widget and query are mapped correctly.

Enter the following URL on a browser:

https://<external_access_host>:<external_access_port>/<ui>

Log in with your IdM username and password.

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Verify if the widget is mapped to the correct query. If not, correct the query and
click RUN QUERY. The query runs and the result appears with data.
8. Based on the query type, click SAVE or SAVE DATA QUERY.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Verify if the widget is mapped to the correct query. If not, correct the query and click RUN QUERY.
The query result appears with data.
5. Click SAVE.

Query is executed, but no records are displayed


Run the same SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc)) from opsbridge_store.<table name>


Verify the latest data is present in the following tables in the opsbridge_store schema:

opsb_synthetic_trans

The raw tables required for corresponding report queries don't have data

Data is present in raw tables


Verify if the latest data is present in the tables. Run the SQL queries from a database tool like DbVisualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_synthetic_trans
opsb_synthetic_trans_trc_hop
opsb_synthetic_trans_trc_path
opsb_synthetic_trans_errors
opsb_synthetic_trans_components

Data isn't present in raw tables


Check the health of the OPTIC Data Lake pods:

Pod: itom-di-receiver-dpl
Containers: itom-di-receiver-cnt, kubernetes-vault-renew
Description: Receives the data from the source and passes it to the OPTIC DL Message Bus.
Command to check health: kubectl describe pod <itom-di-receiver-dpl-POD> -n <opsbridge-namespace>

Pod: itom-di-administration
Containers: itom-di-administration, kubernetes-vault-renew
Command to check health: kubectl describe pod <itom-di-administration-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-broker
Containers: certificate-renew, itomdipulsar-broker
Command to check health: kubectl describe pod <itomdipulsar-broker-POD> -n <opsbridge-namespace>

Pod: itom-di-scheduler-udx
Containers: certificate-renew, itom-di-udx-scheduler-scheduler
Description: Responsible for getting data from OPTIC DL Message Bus and streaming it to Vertica.
Command to check health: kubectl describe pod <itom-di-scheduler-udx-POD> -n <opsbridge-namespace>

Pod: itom-di-metadata-server
Containers: itom-di-metadata-server, kubernetes-vault-renew
Description: Responsible for meta data configuration (table creation).
Command to check health: kubectl describe pod <itom-di-metadata-server-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-zookeeper
Containers: certificate-renew, itomdipulsar-zookeeper
Command to check health: kubectl describe pod <itomdipulsar-zookeeper-POD> -n <opsbridge-namespace>


Data isn't available in OPTIC DL Message Bus Topics

Check the communication between OPTIC DL Message Bus producer and consumer

Run the following scripts in the itomdipulsar-bastion-0 pod to check if the connection to the OPTIC DL Message Bus consumer is healthy:

1. Login to the itomdipulsar-bastion container:

kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash

2. Execute the following producer command to send the message 'hi':

./bin/pulsar-client produce <topic name> -m hi

3. Open one more session and execute the following consumer command, where you should see the same message 'hi' typed in the producer console:

./bin/pulsar-client consume -s test-subscription -n 0 <topic name>

Check if data is present in the OPTIC DL Message Bus topic


1. Login to the itomdipulsar-bastion container:

kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash

2. Execute the following consumer command to check the data in the OPTIC DL Message Bus topic:

./bin/pulsar-client consume -s test-subscription -n 0 <topic name>

Execute the following scripts to list the OPTIC DL Message Bus topics:
pulsar@itomdipulsar-bastion-0:/pulsar/bin> ./pulsar-admin topics list-partitioned-topics public/default |grep synthetic_trans

"persistent://public/default/opsb_synthetic_trans"

"persistent://public/default/opsb_synthetic_trans_trc_hop"

"persistent://public/default/opsb_synthetic_trans_trc_path"

"persistent://public/default/opsb_synthetic_trans_errors"

"persistent://public/default/opsb_synthetic_trans_components"

Data isn't present in OPTIC DL Message Bus topics


There must be some error in the collection or configuration. See Troubleshoot Business Process Monitoring reports collection
issues to resolve the issue.


1.20.7. Troubleshoot Business Process Monitoring reports collection issues

Business Process Monitoring reports are showing no data or partial data.

Cause
There must be some error in the collection or configuration.

Solution
To resolve this issue, follow these solutions in the same order and check if the data appears on the report.

Check if the certificate exchange between OPTIC Data Lake and OBM is successful

1. Log on to OBM.
2. Run the following command:
/opt/OV/bin/ovcert -list

All granted certificates are listed.

Check if you have installed Operations Agent and integrated with BPM Server

To stream BPM data into OPTIC Data Lake, you must integrate the Operations Agent which is on the BPM server with OBM.

Perform the following steps to check if you have installed Operations Agent:

1. Log on to the BPM node.


2. Run the following command:
cd /opt/OV/bin
3. Run the following command:
./opcagt -version
The version of the Operations Agent appears. Make sure that the version is 12.10 or later.

Option 1
Run the following commands if you want to install and integrate Operations Agent (on a BPM server) with OBM:

On Linux: ./oainstall.sh -i -a -s <OBM load balancer or gateway server> -cert_srv <OBM load balancer or gateway server>

If the master node (OBM node) is in HA, run the following command: ./oainstall.sh -i -a -s <HA_VIRTUAL_IP's FQDN> -cert_srv <CDF master node FQDN>

On Windows: cscript oainstall.vbs -i -a -s <OBM load balancer or gateway server> -cert_srv <OBM load balancer or gateway server>

If the master node (OBM node) is in HA, run the following command: cscript oainstall.vbs -i -a -s <HA_VIRTUAL_IP's FQDN> -cert_srv <CDF master node FQDN>

Option 2
If Operations Agent is already installed, follow the steps to integrate Operations Agent with OBM:

1. Run the following command to integrate Operations Agent with OBM:


On Linux:


/opt/OV/bin/OpC/install/opcactivate -srv <OBM load balancer or gateway server>


On Windows:
<C:>\HP\HP BTO Software\bin\win64\OpC\install>cscript opcactivate.vbs -srv <OBM load balancer or gateway server>
This step sends the certificate request from the BPM server (Operations Agent node) to OBM.
2. If you didn't configure OPTIC Data Lake, run the following command to update the trusted certificates:
ovcert -updatetrusted

Grant the Operations Agent certificate


Follow the steps:

1. Log on to OBM UI.


2. Go to Administration > SETUP AND MAINTENANCE > Certificate Request

3. Click to grant the certificate.

Check if the nodes are added as monitored nodes in OBM


Follow the steps to make sure that you add the BPM server (Operations Agent node) as a monitored node in OBM:

1. Go to Administration > Setup and Maintenance > Monitored node .


2. In the Node Views pane, expand Predefined Node Filters and click Monitored Nodes.
3. All monitored nodes are listed. If the Operations Agent node isn't listed then follow the steps:
1. Click .
2. Select Computer and then select the node type. Create New Monitored Nodes window opens.
3. In the Primary DNS Name box, add the Fully Qualified Domain Name (FQDN) of the node.
4. In the IP Addresses box, click a row in the IP Address column. The IP address of the node will automatically
populate.
5. Click Ok. This adds the node to the node list.

Check if you have deployed the BPM Metrics streaming policy


See Task 1: Deploy the BPM Metrics Streaming Aspect

Check the di_receiver log file for errors


1. Go to: /var/vols/itom/log-volume/<opsbridge-namespace>/<opsbridge-namespace>__<itom-di-receiver-pod>__receiver__<worker machine where di receiver is running>
For example:
/var/vols/itom/log-volume/opsbridge-jugcl/opsbridge-jugcl__itom-di-receiver-dpl-799f8547cd-d6dxb__receiver__btpvm0785.hpeswlab.net
2. Check the following file for error: receiver-out.log
3. Take appropriate action.

Check system.txt for errors


1. Log into the Operations Agent node.
2. Go to /var/opt/OV/log . Check for system.txt file for errors and fix the errors.

Check the tmp folder


1. Log into the Operations Agent node.
2. Go to %OvDataDir%\tmp\opcgeni\jsonfwd\opcgeni\snfd1 if the OA version is earlier than 12.22 and %OvDataDir%\tmp\opcgeni\<MetricsMapping folder>\opcgeni\snfd1 if the OA version is 12.22.
3. Ensure that no files are accumulating in these folders and that new files are forwarded as they arrive.


Enable debug mode


Enable DEBUG for the respective components by following these steps:

Administration

Edit the following file and change the level " INFO " to " DEBUG "

File Name: <NFS-conf-volume>/di/administration/conf/logback.xml

Receiver

Edit the following file and change the level " INFO " to " DEBUG "

File Name:

<NFS-conf-volume>/di/receiver/conf/logback.xml

<logger name="com.hpe.itomdi" level="INFO" additivity="false">

<root level="INFO">

Data-processor

Edit the following file and change the level " INFO " to " DEBUG "

File Name: <NFS-conf-volume>/di/data-processor/conf/logback.xml

<root level="INFO">

<logger name="com.hpe.itomdi.postload" level="INFO">

<logger name="com.hpe.itomdi.postloadapi" level="INFO">

<logger name="com.hpe.itomdi.preload" level="INFO">

<logger name="com.hpe.itomdi.preloadapi" level="INFO">

<logger name="com.hpe.itomdi.utils" level="INFO">

<logger name="org.apache.flink" level="INFO" additivity="false">

<logger name="org.apache.kafka" level="INFO" additivity="false">

<logger name="org.apache.hadoop" level="INFO" additivity="false">

<logger name="org.apache.zookeeper" level="INFO" additivity="false">

Vertica-ingestion

Edit the following file and change the level " INFO " to " DEBUG "

File Name: <NFS-conf-volume>/di/metadata-server/conf/logback.xml

<logger name="com.hpe.opsb.di" level="INFO" additivity="false">

<root level="INFO">


1.20.8. System Infrastructure reports are showing no or partial data, or updated data is not shown in the reports

Cause
Predefined queries for corresponding reports ( opsb_sysinfra_* and opsb_sysinfra_SysExecSumm* ) aren't returning any data.
Tables required for the corresponding ( opsb_agent* or opsb_agentless* or opsb_sysinfra* ) report queries aren't created.
The tables required for corresponding report queries don't have data.
Task flows ( opsb_sys* ) required for populating the corresponding tables used in reports aren't running.
There are errors during the execution of task flows.

Solution
If the copy scripts were run, and there are no errors in the aggregate log files, check if the widget that's not showing data is
connected to the correct queries. Follow these steps:

Enter the following URL on a browser:

https://<external_access_host>:<external_access_port>/<ui>

Log in with your IdM username and password.

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query is displayed in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

If the query doesn't return any data then follow the steps:

1. Check if data from the monitored nodes are sent to the raw tables ( opsb_agent_* and opsb_agentless_* tables) from
the integrated sources. If data is sent, continue with the following steps, else see if you have configured the data
sources correctly. For more information to configure data sources, see Configure reporting.
2. The flow or sequence to check for the data is given below:
Operations Agent data flow across tables:
opsb_agent_* > opsb_sysinfra_*
SiteScope or Agentless data flow across tables:
opsb_agentless_* > opsb_sysinfra_*
3. If there is no data in the opsb_sysinfra_* tables, check the copy scripts.

4. In the database, go to mf_shared_provider_default schema. In the opsb_internal_reports_schedule_config_1h table, look for


the following columns:


Note

Copy scripts copy data from the raw tables to respective Aligned sysinfra tables. Copy scripts are executed once every 5
minute interval.

tablename : Displays for the source table for which the data is getting copied
processed_time : Displays till when data was considered for aggregation
execution_time : Displays the last time the copy script was executed
last_processed_epoch : Displays the Vertica epoch value of the last processed row from the raw tables.
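For example, a minimal sketch of a query you can run from a Vertica client (such as vsql or DbVisualizer) to inspect this bookkeeping table; only the table and column names listed above come from the product, the ordering is illustrative:

-- List the copy-script bookkeeping rows, most recently executed first
select tablename, processed_time, execution_time, last_processed_epoch
from mf_shared_provider_default.opsb_internal_reports_schedule_config_1h
order by execution_time desc;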

If the copy scripts were run, check the copy script log files:

Note

Make sure you change the logging level back to ERROR after completing the analysis. Leaving it at INFO leads to log file locking issues and may cause the task script run to hang.

1. In the NFS server, go to:


For Operations Agent - <mounted log path>/reports/agent_infra
For SiteScope or Agentless - <mounted log path>/reports/agentless_infra
2. Look for errors in:
For Operations Agent: the opsb_agt*.log file.
For SiteScope or Agentless: the opsb_agtl*.log file.
3. If there are no errors in the logs, change the log level to INFO and verify log messages in the next run.
To change the log level, go to /<conf vol path on NFS>/di/postload/conf/reports and open the <scriptname>log4perl.conf file
and change the log4perl.logger.<scriptname> to INFO.

In the subsequent run of the task script, you will find detailed logs in the log file of the respective scripts with more
details like the query runs and the number of records updated.
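For example, assuming the <scriptname>log4perl.conf file uses standard Log4perl syntax, the change might look like the following sketch (keep whatever appender is already configured for the logger, and revert to ERROR after the analysis):

# before (default)
log4perl.logger.<scriptname> = ERROR, <existing appender>
# after (temporary, for analysis only)
log4perl.logger.<scriptname> = INFO, <existing appender>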

If the copy scripts were run, and there are no errors in the log files, check for the data flow across the aggregate tables:

1. Operations Agent data flow across aggregate tables :


opsb_agent_* > opsb_agent_*_1h
opsb_agent_* > opsb_agent_*_1d
opsb_sysinfra_* > opsb_sysinfra_*_1h
opsb_sysinfra_disk > opsb_sysinfra_disk_1h > opsb_sysinfra_disk_1d
For other CPU, node, filesys , and netif tables: opsb_sysinfra_* > opsb_sysinfra_*_1d
2. SiteScope or Agentless data flow across aggregate tables :
opsb_agentless_* > opsb_agentless_*_1h
opsb_agentless_disk > opsb_agentless_disk_1h > opsb_agentless_disk_1d
opsb_agentless_generic > opsb_agentless_generic_1h > opsb_agentless_generic_1d
For other CPU, node, filesys , and netif tables: opsb_agentless_* > opsb_agentless_*_1d
opsb_sysinfra_* > opsb_sysinfra_*_1h
opsb_sysinfra_disk > opsb_sysinfra_disk_1h > opsb_sysinfra_disk_1d
For other CPU, node, filesys , and netif tables: opsb_sysinfra_* > opsb_sysinfra_*_1d

3. Check for errors in the aggregate.log file:

1. Log in to the NFS server.

2. Go to /<log nfs path>/<opsb namespace>/<opsb namespace>__itom-di-postload-taskexecutor-<pod name>__postload-taskexecutor__<worker node where task executor is running>
For example: /var/vols/nfs/vol5/opsb/opsb__itom-di-postload-taskexecutor-7d9b99d899-fwv7p__postload-taskexecutor__host.mycomputer.net

3. For example, in case of Vertica database connectivity issues, the following error statements can be observed:
2020-08-26 19:12:39.995 [TFID:{opsb_sysdisk_taskflow} TID:{opsb_agentless_disk_1d_id}] RID:{d0993d10-92f9-4be4-b326-29a5d4582e2b}-output] []- ERROR DbCommons::ERROR /taskexecutor/conf-local/bin/enrichment/DbCommons.pm (34) 34 Unable to open config.properties file to read vertica db credentials
2020-08-26 19:12:40.518 [TFID:{opsb_sysdisk_taskflow} TID:{opsb_agent_disk_1d_id}] RID:{1583ee75-67ec-4b56-b3bb-9dd13b3be1e5}-output] []- Invalid connection string attribute: SSLMode (SQL-01S00)1ERROR ReadHistory::try {...} /taskexecutor/conf-local/bin/enrichment/ReadHistory.pm (36) 36 opsb_agent_disk_1d : Can't connect to database: FATAL 3781: Invalid username or password
Solution: Check and fix the Vertica database connectivity.
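To confirm basic connectivity to the Vertica database, you can run a quick check like the following sketch from a host that has the Vertica vsql client installed (host, user, and password are placeholders; the port and database name shown are the documented defaults 5433 and itomdb, adjust them if your deployment differs):

vsql -h <vertica-host> -p 5433 -U <vertica-user> -w <password> -d itomdb -c "select version();"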

Check if there is data in the raw tables. If there is no data in the agent raw tables, see System Infrastructure report widget displays partial data for metrics collected by Operations Agent; if there is no data in the agentless raw tables, see System Infrastructure report widget displays partial data for metrics collected by SiteScope.

If the copy scripts weren't run as scheduled (for more than 30 minutes), see Aggregate tables aren't updated, data in the system infrastructure, or event reports aren't refreshed.

Related topics
See How to recover itomdipulsar-bookkeeper pods from read-only mode if the itomdipulsar-bookkeeper pods are in read-only mode.
Data Processor Postload task flow not running
Aggregate functionality isn't working as expected
Reporting data and task flows
Vertica database isn't reachable
Failed to connect to host
Trace issue in Aggregate and Forecast
No logs for aggregate and forecast
Aggregate not happening after upgrade


1.20.9. System Infrastructure report widget displays partial data for metrics collected by Operations Agent
Even though the System Infrastructure report widget shows the data query, the widget doesn't display updated data or
displays partial data.

Possible causes
Cause 1: The queries for corresponding reports aren't returning any data.
Cause 2: The daily or hourly or aligned tables required for corresponding report queries don't have data.
Cause 3: Data is present in raw tables, but the copy script hasn't run.
Cause 4: Data isn't present in raw tables, the issue is with OPTIC DL pods.
Cause 5: Data isn't available in OPTIC DL Message Bus Topics.

Solution

The queries for corresponding reports aren't returning any data


Run the query as follows:

Enter the following URL on a browser: https://<external_access_host>:<external_access_port>/<ui>

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query is displayed in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

Query isn't executed


Check the OPTIC Data Lake DB Connection. Follow these steps:

1. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.

2. Click and then click DB CONNECTION. The SET UP DB CONNECTION pane appears.
3. Give the Vertica database details:
Hostname: If Vertica is a cluster, enter the host names of the cluster nodes separated by commas
Port: The default port is 5433
TLS: The Use TLS check box isn't selected if you had set the vertica.tlsEnabled parameter to false in
the values.yaml file during the suite installation. The default is that TLS is enabled. See the section 'Vertica' in
the Configure Values.yaml page.


DB name: The Vertica database name is configured in the vertica.db parameter in the values.yaml file during the
suite installation. The default is itomdb .
Login: The Vertica read-only user login name is configured in the vertica.rouser parameter in the values.yaml file
during the suite installation. The default is vertica_rouser .
Password: Set to the password of the Vertica read-only user.
4. Click TEST CONNECTION to test the connection. A confirmation message appears.

5. If the connection isn't successful, provide the correct details, and then test the connection. If the connection is
successful, click SAVE SETTINGS.

Check the health of bvd pods and resolve the issues according to the log message:

Pod: bvd-www-deployment
Containers: bvd-www, kubernetes-vault-renew
Description: Provides web UI and real-time push to browser for BVD dashboards.
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-www-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-www-deployment-POD>_<opsbridge-namespace>_bvd-www-*.log, <bvd-www-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-redis
Containers: bvd-redis, stunnel, kubernetes-vault-renew
Description: In-memory database for statistics and session data; OPTIC DL Message Bus for server process communication.
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-redis-POD> -n <opsbridge-namespace>
Log files: <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-*.log, <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-stunnel-*.log, <bvd-redis-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-quexserv
Containers: bvd-quexserv, kubernetes-vault-renew
Description: Query execution service for executing Vertica on-demand queries.
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-quexserv-POD> -n <opsbridge-namespace>
Log files: <bvd-quexserv-POD>_<opsbridge-namespace>_bvd-quexserv-*.log, <bvd-quexserv-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-receiver-deployment
Containers: bvd-receiver, kubernetes-vault-renew
Description: Receives incoming messages (data items).
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-receiver-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-receiver-deployment-POD>_<opsbridge-namespace>_bvd-receiver-*.log, <bvd-receiver-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-ap-bridge
Containers: bvd-ap-bridge, kubernetes-vault-renew
Description: Talks to the Autopass server and calculates the number of allowed dashboards.
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-ap-bridge-POD> -n <opsbridge-namespace>
Log files: <bvd-ap-bridge-POD>_<opsbridge-namespace>_bvd-ap-bridge-*.log, <bvd-ap-bridge-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-controller-deployment
Containers: bvd-controller, kubernetes-vault-renew
Description: Does aging of old data items and bootstrap of the database.
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-controller-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-controller-deployment-POD>_<opsbridge-namespace>_bvd-controller-*.log, <bvd-controller-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-explore-deployment
Containers: bvd-explore, kubernetes-vault-renew
Description: Provides web UI and back-end services for BVD explore.
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-explore-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-explore-deployment-POD>_<opsbridge-namespace>_bvd-explore-*.log, <bvd-explore-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Query is executed, but the top 5 or bottom 5 records aren't displayed


Check if the widget and query are mapped correctly:

1. Enter the following URL on a browser:


https://<external_access_host>:<external_access_port>/<ui>
2. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

3. Click the dashboard from which you didn't see data and click .
4. Click the widget for which there is no data and then copy the Data channel name.
5. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.

6. Search for the Data Channel name that you copied, select the Data channel name, and click .
7. The query is displayed in the right pane. Verify if the widget is mapped to the correct query. If not, correct the query and
click RUN. If the query is executed successfully, the query result is displayed:


8. Based on the query type, click SAVE or SAVE DATA QUERY

Query is executed, but no records are displayed


Run the same SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_sysinfra_netif_1d
opsb_sysinfra_avail_1d
opsb_sysinfra_cpu_1d
opsb_sysinfra_disk_1d
opsb_sysinfra_filesys_1d
opsb_sysinfra_node_1d
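For example, to check the freshness of one of the daily tables listed above, you can fill in the query template with any of the table names:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.opsb_sysinfra_cpu_1d;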

The daily or hourly or aligned tables required for corresponding report queries don't have data

Data isn't present in daily tables


Check whether data is present in the daily tables by running the following SQL query from a database tool like DbVisualizer:

select to_timestamp(max(timestamp_utc_s)) from mf_shared_provider_default.<table name>

Verify if the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_sysinfra_netif_1h
opsb_sysinfra_avail_1h
opsb_sysinfra_cpu_1h
opsb_sysinfra_disk_1h
opsb_sysinfra_filesys_1h
opsb_sysinfra_node_1h

If the data isn't available in these hourly tables, perform the next solution.

Data isn't present in hourly tables


Check if daily aggregations are executed:

1. Log in to the NFS server.


2. Go to the itom_di_postload_provider_default schema and look for the itom_di_postload_provider_default.ROLLUP_CONTROL table.
3. In the ROLLUP_Name column, look for the aggregation name (for example: opsb_sysinfra_node_1d ), then check the LAST_EXE_TIME column (to know the last run time) and the MAX_EPOCH_TIME column (to know the time until when the records were aggregated). A sample query is shown after these steps.
Daily aggregations are expected to run every 4:45 minutes. If it's beyond this time, check for errors in the aggregation log and report the error.
4. Go to the following location to view the log files:
/<log nfs path>/<opsb namespace>/<opsb namespace>__itom-di-postload-taskexecutor-<pod name>__postload-taskexecutor__<worker node where task executor is running>
For example: /var/vols/nfs/vol5/opsb/opsb__itom-di-postload-taskexecutor-7d9b99d899-fwv7p__postload-taskexecutor__host.mycomputer.net
5. Check the log file output and fix the issue. For example, in the case of Vertica database connectivity issues, you may see the error "Unable to open config.properties file to read vertica db credentials"; in that case, fix the Vertica database connectivity.
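A sample query for the ROLLUP_CONTROL check in step 3 (a minimal sketch; column-name casing may differ in your environment):

select rollup_name, last_exe_time, max_epoch_time
from itom_di_postload_provider_default.rollup_control
where rollup_name ilike 'opsb_sysinfra%'
order by last_exe_time desc;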

If the data isn't present in tables, run the SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc_s)) from mf_shared_provider_default.<table name>

Verify if the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_sysinfra_netif
opsb_sysinfra_avail
opsb_sysinfra_cpu
opsb_sysinfra_disk
opsb_sysinfra_filesys
opsb_sysinfra_node

If the data isn't available in these aligned tables, perform the next solution.

Data isn't present in aligned tables


Check if hourly aggregations are executed:

1. In OPTIC Data Lake, go to the itom_di_postload_provider_default schema and look for the itom_di_postload_provider_default.ROLLUP_CONTROL table.
2. In the ROLLUP_Name column, look for the aggregation name (for example: opsb_sysinfra_node_1h ), then check the LAST_EXE_TIME column (to know the last run time) and the MAX_EPOCH_TIME column (to know the time till when the records were aggregated).
Hourly aggregations are expected to run every 12 minutes. If it's beyond this time, check for errors in the aggregation log and report the error.
3. Go to the following location to view the log files:
/<log nfs path>/<opsb namespace>/<opsb namespace>__itom-di-postload-taskexecutor-<pod name>__postload-taskexecutor__<worker node where task executor is running>
For example: /var/vols/nfs/vol5/opsb/opsb__itom-di-postload-taskexecutor-7d9b99d899-fwv7p__postload-taskexecutor__host.mycomputer.net

If the data isn't present in tables, run the SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc_s)) from mf_shared_provider_default.<table name>

Verify if the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_agent_netif
opsb_agent_cpu
opsb_agent_disk
opsb_agent_filesys
opsb_agent_node

Perform the next solution if the data is present in the raw tables and the issue persists.


Data is present in raw tables, but the copy script hasn't run
The copy scripts copy data from the raw tables to the respective aligned sysinfra tables. You can check the opsb_internal_reports_schedule_config_1h table in the mf_shared_provider_default schema in the Vertica database and look for these columns:

tablename - displays the source table from which the data is being copied
processed_time - displays the time up to which data was considered for aggregation
execution_time - displays the last time the copy script was executed

Copy scripts are executed once every 5-minute interval.

Copy scripts were run


If the copy scripts have run, check the copy script log files:

1. On the NFS server, go to: <mounted log path>/reports/agent_infra


2. Look for errors in:
For Operations Agent: the opsb_agt*.log file.
3. If there are no errors in the logs, change the log level to INFO and verify log messages in the next run.
To change the log level, go to /<conf vol path on NFS>/di/postload/conf/reports and open the <scriptname>log4perl.conf file
and change the log4perl.logger.<scriptname> to INFO.

In the subsequent run of the task script, you will find detailed logs in the log file of the respective scripts with more
details like the query runs and the number of records updated.

If the copy scripts were run, and there are no errors in the log files, check the section "Agent Metric Collector Errors".

Copy scripts weren't run


If copy scripts weren't run as scheduled (for more than 30 minutes) then see Aggregate tables aren't updated, data in the
system infrastructure, or event reports aren't refreshed.

Data isn't present in raw tables, the issue is with OPTIC DL pods

Check the health of the OPTIC DL pods


Pod: itom-di-receiver-dpl
Containers: itom-di-receiver-cnt, kubernetes-vault-renew
Description: It receives the data from sources in JSON format over HTTP and sends the data to relevant OPTIC DL Message Bus topics. Dependency - pulsar-itomdipulsar-proxy
Command to check health: kubectl describe pod <itom-di-receiver-dpl-POD> -n <opsbridge-namespace>

Pod: itom-di-administration
Containers: itom-di-administration, kubernetes-vault-renew
Description: It helps to configure the data process, data ingestion to Vertica, and receiver tasks.
Command to check health: kubectl describe pod <itom-di-administration-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-broker
Containers: certificate-renew, itomdipulsar-broker
Description: It helps to produce and consume messages. Dependency - itomdipulsar-zookeeper, itomdipulsar-bookkeeper
Command to check health: kubectl describe pod <itomdipulsar-broker-POD> -n <opsbridge-namespace>

Pod: itom-di-scheduler-udx
Containers: certificate-renew, itom-di-udx-scheduler-scheduler
Description: It schedules the processed data and loads it to Vertica.
Command to check health: kubectl describe pod <itom-di-scheduler-udx-POD> -n <opsbridge-namespace>

Pod: itom-di-metadata-server
Containers: itom-di-metadata-server, kubernetes-vault-renew
Description: It receives the data from the OPTIC DL Message Bus and sends it to Vertica. It manages table creation and streaming configuration in Vertica.
Command to check health: kubectl describe pod <itom-di-metadata-server-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-zookeeper
Containers: certificate-renew, itomdipulsar-zookeeper
Description: It stores the metadata of itomdipulsar-bookkeeper and OPTIC DL Message Bus pods.
Command to check health: kubectl describe pod <itomdipulsar-zookeeper-POD> -n <opsbridge-namespace>

Resolve the issues according to the message.

Check if itom-collect-once-data-broker pod stays in Init:0/2 state


The itom-collect-once-data-broker pod stays in Init:0/2 state when Collection Service is deployed with classic OBM.

Follow the steps to configure collection services with OBM.

You can also execute the following command to get the log:

# kubectl logs <itom-collect-once-data-broker-POD> -n <opsbridge-namespace> -c itom-collect-once-data-broker-init

--2019-06-19 22:08:00-- http://omi:383/com.hp.ov.sec.cm.certificateserver/msg

Resolving omi (omi)... 16.78.120.136

Connecting to omi (omi)|16.78.120.136|:383... failed: Connection timed out.

Retrying.

Resolve the issues according to the log message.

Data isn't available in OPTIC DL Message Bus Topics


Run the below command to check if the OPTIC DL Message Bus Topics are created inside the itomdipulsar-bastion-0 pod:

1. Login to the itomdipulsar-bastion container:


kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash
2. Execute the following scripts to list the OPTIC DL Message Bus topics:
pulsar@itomdipulsar-bastion-0:/pulsar/bin> ./pulsar-admin topics list-partitioned-topics public/default |grep agent

"persistent://public/default/opsb_agent_filesys"

"persistent://public/default/opsb_agent_node"

"persistent://public/default/opsb_agent_disk"

"persistent://public/default/opsb_agent_cpu"

"persistent://public/default/opsb_agent_netif"

Perform the next solution steps if the data is present in the topics and the issue persists.

Check the communication between OPTIC DL Message Bus producer and consumer

Run the following scripts in the itomdipulsar-bastion-0 pod to check if the connection to the OPTIC DL Message Bus consumer is healthy:

1. Login to the itomdipulsar-bastion container:


kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash


2. Execute the following producer command to publish the message 'hi':


./bin/pulsar-client produce <topic name> -m hi
3. Open one more session and execute the following consumer command, where you should see the same message 'hi'
typed in the producer console:
./bin/pulsar-client consume -s test-subscription -n 0 <topic name>

Check if data is present in the OPTIC DL Message Bus topic


1. Login to the itomdipulsar-bastion container:
kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash
2. Execute the following consumer command to check the data in the Pulsar topic:
./bin/pulsar-client consume -s test-subscription -n 0 <topic name>
3. After you analyze, run the command to unsubscribe the Pulsar topic:
bin/pulsar-admin topics unsubscribe <Topic Name> -s <Subscription Name> -f
For example: bin/pulsar-admin topics unsubscribe opsb_agent_node -s test-subscription -f

Data isn't present in the OPTIC DL Message Bus topics


There must be some error in the collection or configuration. For more information, see Troubleshoot System Infrastructure
Reports collection issues with Agent Metric Collector or Troubleshoot System Infrastructure Reports collection issues with
Metric Streaming Policy.


1.20.10. Troubleshoot System Infrastructure Reports collection issues with Agent Metric Collector
System Infrastructure reports are showing no data or partial data for a few Agent nodes.

Possible causes
Cause 1: Agent nodes aren't discovered.
Cause 2: The nodes may have failed to connect to the Agent Metric Collector.
Cause 3: Agent connected to the metric collector, but no metrics are pulled.

Solution
To analyze and resolve this issue, follow the applicable solutions:

Agent nodes aren't discovered

Ensure that you have enabled the collection


Run the command to check the status of the collection: ops-monitoring-cli get ms

If the collection configuration isn't present, deploy the configuration. For steps, see Manage Agent Metric Collection.
If you have disabled the collection configuration, enable the configuration. Run the command: ops-monitoring-cli enable col
lector -n <collector-name>
If you have enabled the collection configuration, check the self health of the collection. Run the command: ops-monitorin
g-cli get ms -o yaml -r

Perform the following steps according to the error messages that appear for the collection health check:

Error: Payload validation failed


Solution: Update the configuration with the right value to the fields indicated. For more information, see Sample AMC
configuration YAML files.
Error: Fetch Credentials from Job Fetcher failed
Solution: Ensure that the credential manager pod is up and running and the credentials exist as follows:
Run the command to ensure the credential manager pod is Running : kubectl get pods --all-namespaces | grep credential-man
ager
Run the command to ensure the credentials exist: ./ops-monitoring-ctl get cred
Error: error while initializing RTSM Rest Client
Solution: This error appears if you have used certificate based authentication and given the wrong PEM certificates. To
resolve this error, update the certificate credential consumed by the discovery collector with the correct certificate
credential.
Error: Failed to get authorization token for UCMDB endpoint
This error may appear due to the following reasons:
Incorrect Target endpoint details : Check the target endpoint, port, and availability. The HTTP response code
can be either 0 or 5XX .
Solution: Update the HTTP target configuration with the correct target endpoint details. For more information, see
Metric collector target configuration.
Incorrect authentication credentials: The HTTP Response code can be 4XX . The following error message
appears:
Error: error in rest response. HTTP response code: 400.
Solution: Update the basic auth credential configuration with the correct credentials. For more information, see
Discovery service credential configuration.
Error: error while fetching OOTB/Custom TQL: The following error message appears:
Server side error while fetching Custom TQL: tql3,


Solution: Perform the steps in Discovery authentication fails.

Verify node discovery


Follow these steps:

1. Run the following command to get the discovery collector pod name:

kubectl get pods -n <application namespace> | grep oa-discovery-collector

For example:

kubectl get pods -n opsbridge-suite | grep oa-discovery-collector

2. Run the following command to get inside the discovery collector pod:

kubectl exec -it <pod name> -c <container name> -n <application namespace> <command>

For example:

kubectl exec -it itom-monitoring-oa-discovery-collector-7966b59-flvrr -c itom-monitoring-oa-discovery-collector -n opsbridge-suite /bin/bash

3. Find the <config_name>Nodes.log file

ls

For example:

agent-collector-sysinfraNodes.log

Only the node names present in this file are considered for metric collection. You can also check the following two log
files:
<config_name>MissingNodes.log : This file has the node names present in the node allow filter list but weren't present in
the TQL response.
<config_name>FaultyNodeFromTqlNodes.log : The file has the faulty nodes from TQL response. This means that TQL
response for the node didn't have the FQDN or the short name.

For further verification, run the following command to enter the discovery pod and access the oa-discovery-collector.log file:

kubectl exec -it <pod name> -c <container name> -n <application namespace> <command>

For example:

kubectl exec -it itom-monitoring-oa-discovery-collector-7966b59-flvrr -c itom-monitoring-oa-discovery-collector -n opsbridge-suite /bin/bash

Open the oa-discovery-collector.log file and check the log file messages. If you observe errors, contact Software Support.

The nodes may have failed to connect to the Agent Metric Collector

Verify agent node connectivity issues


Secure communication issue: This indicates an issue with the certificate imported in the metric collector pod. Log in to the pod and run the following commands:

1. /opt/OV/bin/ovcert -list . The command should list the node certificate and the trust certificates.
2. /opt/OV/bin/ovcoreid . The command output should match the alias name in the Certificates section of the ovcert -list
command.
3. Run the following command to check the connection:


/opt/OV/bin/ovconfchg -ns bbc.http -set PROXY_CFG_FILE proxy file;/opt/OV/bin/ovconfchg -ns bbc.cb.ports -set CB_PORTS_CFG_FILE port file;/opt/OV/bin/bbcutil -ping <agent node name or IP>

You should use the host IP if you have used the host IP mapping. The command output should contain status=eServiceOK. If it shows status=eSSLError, then there is an issue with trust between the metric collector container and the agent nodes.

Verify proxy, port, host name mapping for collection configuration


Possible errors:

The below errors indicate that the metric collector is unable to fetch the configuration files like port, proxy, and hosts.

Run the following curl command in the metric collector container to check if the file retrieval is working. It should display the
file contents.

curl --location --request GET 'https://localhost:40005/v1.0/file/<tenant>/<namespace>/<file name>' -k

"Failed to fetch etc hosts file. Collection from some agent nodes may fail"
"Failed to fetch connection configuration file port or proxy file name. Collection from some agent nodes may fail"

Socket connection issue: Possible reasons for this error:

1. If you have used host IP mapping, then the IP for the node name isn't given. Run ops-monitoring-ctl.exe get file -n hosts file name and check if it lists the file. The file name appears if it's configured.
2. For proxy and port files configuration, run the following command to check the connection to the agent node. It will
display the target node IP address and OV Communication Broker port if configured correctly.

/opt/OV/bin/ovconfchg -ns bbc.http -set PROXY_CFG_FILE proxy file;/opt/OV/bin/ovconfchg -ns bbc.cb.ports -set CB_PORTS_CFG_FILE port file;/opt/OV/bin/bbcutil -gettarget <agent hostname>

3. During the execution of a job schedule, run ovconfget bbc.http and ovconfget bbc.cb.ports to check if PROXY_CFG_FILE and
CB_PORTS_CFG_FILE are set.

Note

: The configurations are emptied at the end of every scheduled run of the job.

OVBBCCB connection issue

1. This means that the agent node communication broker is running on a non-default port (other than 383). Run ops-monitoring-ctl.exe get file -n port file name

The command will display the port file name if configured.

2. The port file should be present in /var/opt/OV/conf/bbc in the container. During the execution of a job schedule, /opt/OV/bin/ovconfget bbc.cb.ports will display CB_PORTS_CFG_FILE=port file name.

Note

The configurations are emptied at the end of every scheduled run of the job.

3. Run the following commands to check the connection to the agent node:

/opt/OV/bin/ovconfchg -ns bbc.cb.ports -set CB_PORTS_CFG_FILE port file;/opt/OV/bin/bbcutil -getcbport <agent hostname>
/opt/OV/bin/ovconfchg -ns bbc.cb.ports -set CB_PORTS_CFG_FILE port file;/opt/OV/bin/bbcutil -ping <host or IP of the agent machine>

4. Verify metric collection status. Run the command: ops-monitoring-ctl.exe get collector-status . It should display "Metric collection succeeded on timestamp".


Agent connected to the metric collector, but no metrics are pulled

Check if the Operations Agent processes are running


Run the following command on the Operations Agent node: /opt/OV/bin/ovc

Check if " oacore " is running


Agent performance metrics are plotted in the out-of-the-box reports, so it's important to check whether the performance metrics are being collected. Run the following commands on the Operations Agent node to see the status of the oacore process.

Location of binary

UNIX (except AIX): /opt/OV/bin/ovc -status| grep oacore


AIX: /usr/lpp/OV/bin/ovc -status| grep oacore
Windows: "%OvInstallDir%\bin\win64\ovc.exe" -status| findstr oacore

If the oacore process is in a stopped or aborted state, run the following command to start it: ovc -start oacore

(Optional) Start oacore in debug mode:

UNIX (except AIX): /opt/OV/bin/oacore oacore /var/opt/OV/conf/oa/PipeDefinitions/oacore.xml


AIX: / usr/lpp/OV/bin/oacore oacore /var/opt/OV/conf/oa/PipeDefinitions/oacore.xml
Windows: "%OvInstallDir%bin\win64\oacore.exe" oacore "%OvDataDir%\conf\oa\PipeDefinitions\oacore.xml"

Check if the metrics are collected


Run the following commands on the Operations Agent node, to check if metrics are collected:

ovcodautil -ds SCOPE -o GLOBAL -flat -last

ovcodautil -ds SCOPE -o FILESYSTEM -flat -last

ovcodautil -ds SCOPE -o CPU -flat -last

ovcodautil -ds SCOPE -o DISK -flat -last

ovcodautil -ds SCOPE -o NETIF -flat -last

If metrics aren't collected, check the parm file for the following line:

log global application process device=disk,cpu,filesystem transaction

If a metric class is missing, add it and restart oacore.

The parm file location:

On UNIX: /var/opt/perf/parm
On Windows: %OvDataDir%\parm.mwc

Check system.txt for errors


1. Log in to the Operations Agent node.
2. Go to /var/opt/OV/log and check the system.txt file for errors.
3. Check the hpcswatch.log file for errors.
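For a quick scan of recent errors on a Linux node, something like the following sketch can help (paths as documented in this section; adjust the search pattern as needed):

grep -i "error" /var/opt/OV/log/system.txt | tail -n 20
grep -i "error" /var/opt/OV/hpcs/hpcswatch.log | tail -n 20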


The Watchdog mechanism monitors the hpsensor process. Watchdog runs once in every hour and logs the status of
the hpsensor process in the hpcswatch.log file located at:

On Windows:

%OvDataDir%\hpcs\hpcswatch.log

On Linux:

/var/opt/OV/hpcs/hpcswatch.log

While installing Operations Agent, the Watchdog mechanism adds a crontab entry on UNIX and Linux systems and a scheduled task entry on Windows systems.

Note

The hpcswatch.log file rolls over when it reaches a maximum size of 1 MB. During the roll over a
new hpcswatch.log file replaces the existing file.

Kubernetes object names

Discovery Collector: deployment itom-monitoring-oa-discovery-collector, pod itom-monitoring-oa-discovery-collector-xxxxxxxxx-xxxx
Recurring Metric Collector: deployment itom-monitoring-oa-metric-collector, pod itom-monitoring-oa-metric-collector-xxxxxxxxx-xxxx
Background Metric Collector: deployment itom-monitoring-oa-metric-collector-bg, pod itom-monitoring-oa-metric-collector-bg-xxxxxxxxx-xxxx

Note

: For enhanced logging in itom-monitoring-oa-metric-collector ( agent-collector-sysinfra ), you can edit the collector
configuration and set the metricCollectorLogLevel parameter to DEBUG.

For example:

1. Copy the collector configuration to a file. Run the following command:

./ops-monitoring-ctl get coll -n agent-collector-sysinfra -o yaml > <filename>.yaml

2. Edit the collector configuration yaml file. Set the metricCollectorLogLevel parameter to DEBUG .

3. Update the collector configuration. Run the following command:

./ops-monitoring-ctl update -f <filename>.yaml
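The relevant part of the edited YAML might look like the following sketch (only the metricCollectorLogLevel key comes from the steps above; the surrounding structure of your collector configuration may differ):

# excerpt from <filename>.yaml - temporary setting, revert when the analysis is done
metricCollectorLogLevel: DEBUG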

For more troubleshooting on the data flow and for the self health monitoring, see the OA - Monitoring service overview
dashboard.


1.20.11. Troubleshoot System Infrastructure Reports collection issues with Metric Streaming policies

System Infrastructure reports are showing no data or partial data.

Cause
There must be some error in the Metric Streaming Policy collection or configuration.

Solution
To resolve this issue, follow these solutions in the same order and check if the data appears on the report.

Verify if the certificate exchanges were successful


1. Log on to OBM.
2. Run the following command:
/opt/OV/bin/ovcert -list
3. The OMT (earlier CDF) certificate must appear under "Trusted Certificates". For example:

CA_<OVRG_CORE_ID_XYZ.FQDN>_<ASYMMETRIC_KEY_LENGTH>
MF CDF RE CA on XYZ.FQDN
MF CDF RIC CA on XYZ.FQDN
MF CDF RID CA on XYZ.FQDN
MF CDF di-integration CA on XYZ.FQDN

Run the command: ovcert -check to check status

OvCoreId set : OK
Private key installed : OK
Certificate installed : OK
Certificate valid : OK
Trusted certificates installed : OK
Trusted certificates valid : OK

The status should be OK. Make sure to grant the certificate request on the OBM. If OMT certificates are missing, execute
the command to update the trusted certificates on the Operations Agent node: ovcert -updatetrusted

If the certificate exchange wasn't successful, see:

Configure Classic OBM


Configure a secure connection between containerized OBM and OPTIC Data Lake

Verify if the Operations Agent nodes are added as monitored nodes in OBM
Follow the steps to make sure that the Operations Agent node is added as a monitored node in OBM:

1. Go to Administration > Setup and Maintenance > Monitored node .


2. In the Node Views pane, expand Predefined Node Filters and click Monitored Nodes.
3. All monitored nodes are listed.

If the Operations Agent node isn't listed then follow the steps:


Option 1- Install and integrate Operations Agent


Run the following commands if you want to install and integrate Operations Agent (on a SiteScope server) with OBM:
On Linux: ./oainstall.sh -i -a -s <CDF master node FQDN> -cert_srv <CDF master node FQDN>

Note

: If the master node (OBM node) is in HA, run the following command: ./oainstall.sh -i -a -s <HA_VIRTUAL_IP's FQDN> -cert_srv <CDF master node FQDN>

On Windows: cscript oainstall.vbs -i -a -s <CDF master node FQDN> -cert_srv <CDF master node FQDN>

Note

: If the master node (OBM node) is in HA, run the following command: cscript oainstall.vbs -i -a -s <HA_VIRTUAL_IP's FQDN> -cert_srv <CDF master node FQDN>

Option 2 - Integrate Operations Agent


If Operations Agent is already installed, follow the steps to integrate Operations Agent with OBM:

1. Run the following command to integrate Operations Agent with OBM:

On Linux:
/opt/OV/bin/OpC/install/opcactivate -srv <FQDN of OBM>
On Windows:
<C:>\HP\HP BTO Software\bin\win64\OpC\install>cscript opcactivate.vbs -srv <FQDN of OBM>
This step sends the certificate request from the SiteScope server (Operations Agent node) to OBM.

2. If you didn't configure a secure connection between OBM and OPTIC Data Lake, run the following command on all
Operations Agent nodes to update the trusted certificates:
ovcert -updatetrusted

3. Grant the Operations Agent certificate

Follow the steps:

1. Log on to OBM UI.


2. Go to Administration > SETUP AND MAINTENANCE > Certificate Request

3. Click to grant the certificate

OV Certificate Server ( ovcs ) crashes in OBM


The issue is usually observed when you try to authenticate an Agent node.

Run the following commands on OBM:


On Linux:

1. Run the command to get the status of the certificate server:


/opt/OV/bin/ovc | grep -i ovcs
Sample output:

ovcs OV Certificate Server SERVER (14150) Aborted

2. Run the command to restart the certificate server:


/opt/OV/bin/ovc -restart ovcs

On Windows:

1. Run the command to get the status of the certificate server:


"%OvInstallDir%\bin\win64\ovc"
2. Run the command to restart the certificate server:
"%OvInstallDir%\bin\win64\ovc" -restart ovcs


Check if performance metrics are collected

Check if the Operations Agent processes are running


Run the following command on the Operations Agent node: /opt/OV/bin/ovc

Check if " oacore " is running


Agent performance metrics are plotted in the out-of-the-box reports, so it's important to check whether the performance metrics are being collected. Run the following commands on the Operations Agent node to see the status of the oacore process.

Location of binary

UNIX (except AIX): /opt/OV/bin/ovc -status| grep oacore


AIX: /usr/lpp/OV/bin/ovc -status| grep oacore
Windows: "%OvInstallDir%\bin\win64\ovc.exe" -status| findstr oacore

If the oacore process is in a stopped or aborted state, run the following command to start it: ovc -start oacore

(Optional) Start oacore in debug mode:

UNIX (except AIX): /opt/OV/bin/oacore oacore /var/opt/OV/conf/oa/PipeDefinitions/oacore.xml


AIX: / usr/lpp/OV/bin/oacore oacore /var/opt/OV/conf/oa/PipeDefinitions/oacore.xml
Windows: "%OvInstallDir%bin\win64\oacore.exe" oacore "%OvDataDir%\conf\oa\PipeDefinitions\oacore.xml"

Check if the metrics are collected


Run the following commands on the Operations Agent node, to check if metrics are collected:

ovcodautil -ds SCOPE -o GLOBAL -flat -last

ovcodautil -ds SCOPE -o FILESYSTEM -flat -last

ovcodautil -ds SCOPE -o CPU -flat -last

ovcodautil -ds SCOPE -o DISK -flat -last

ovcodautil -ds SCOPE -o NETIF -flat -last

If metrics aren't collected, check the parm file for the following line:

log global application process device=disk,cpu,filesystem transaction

If a metric class is missing, add it and restart oacore.

The parm file is located at:

On UNIX: /var/opt/perf/parm
On Windows: %OvDataDir%\parm.mwc

Verify if the System Metrics Metric Store Aspect is deployed


See, Configure System Infrastructure Reports using metric streaming policies.


Check system.txt for errors


1. Log in to the Operations Agent node.
2. Go to /var/opt/OV/log and check the system.txt file for errors.
3. Check the hpcswatch.log file for errors.

The Watchdog mechanism monitors the hpsensor process. Watchdog runs once in every hour and logs the status of
the hpsensor process in the hpcswatch.log file located at:

On Windows:

%OvDataDir%\hpcs\hpcswatch.log

On Linux:

/var/opt/OV/hpcs/hpcswatch.log

While installing Operations Agent, the Watchdog mechanism adds a crontab entry on UNIX and Linux systems and a scheduled task entry on Windows systems.

Note

The hpcswatch.log file is rolled over when it reaches a maximum size of 1 MB. During roll over the existing file is replaced
with a new hpcswatch.log file.

Enable debug mode


Enable DEBUG for the respective components by following the steps mentioned:

hpsensor

Change the value of the hpcs.trace parameter in the hpcs.conf file from "INFO" to "DEBUG":

On the Operations Agent, the hpcstrace.log file is created at:

On Windows: %OvDataDir%\hpcs\

On Linux: /var/opt/OV/hpcs/

Administration

In OPTIC Data Lake, edit the following file and change the level from " INFO " to " DEBUG ":

File Name: <NFS-conf-volume>/di/administration/conf/logback.xml

<logger name="com.swgrp.itomdi.administration.service.fileSystemToDbMigration" level="INFO" additivity="false">


<appender-ref ref="MigrationLogFileAppender"/>
</logger>
<logger name="com.swgrp.itomdi" level="INFO" additivity="false">
<appender-ref ref="LogFileAppender"/>
</logger>
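For example, after the change the second logger element shown above would read as follows (revert to INFO once the analysis is complete):

<logger name="com.swgrp.itomdi" level="DEBUG" additivity="false">
<appender-ref ref="LogFileAppender"/>
</logger>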

Receiver

In OPTIC Data Lake, edit the following file and change the level from " INFO " to " DEBUG ":

File Name: <NFS-conf-volume>/di/receiver/conf/logback.xml

<logger name="com.hpe.itomdi" level="INFO" additivity="false">

<root level="INFO">

Data-processor

In OPTIC Data Lake, edit the following file and change the level from " INFO " to " DEBUG ":


File Name: <NFS-conf-volume>/di/data-processor/conf/logback.xml

<root level="INFO">

<logger name="com.hpe.itomdi.postload" level="INFO">

<logger name="com.hpe.itomdi.postloadapi" level="INFO">

<logger name="com.hpe.itomdi.preload" level="INFO">

<logger name="com.hpe.itomdi.preloadapi" level="INFO">

<logger name="com.hpe.itomdi.utils" level="INFO">

<logger name="org.apache.flink" level="INFO" additivity="false">

<logger name="org.apache.kafka" level="INFO" additivity="false">

<logger name="org.apache.hadoop" level="INFO" additivity="false">

<logger name="org.apache.zookeeper" level="INFO" additivity="false">

Vertica-ingestion

In OPTIC Data Lake, edit the following file and change the level from "INFO" to "DEBUG":

File Name: <NFS-conf-volume>/di/metadata-server/conf/logback.xml

<logger name="com.hpe.opsb.di" level="INFO" additivity="false">

<root level="INFO">


1.20.12. Forecast data is not displayed in System Infrastructure Summary or System Resource Details reports

Cause
Predefined queries for the forecast in the System Executive Summary ( opsb_sysinfra_SysExecSumm_forecast* ) and System Resource Detail ( opsb_sysinfra_SysResourceDetail_forecast* ) reports aren't returning any data.
Forecast tables for node and filesystem aren't created.
There is no data in the forecast tables.
Task flow ( opsb_sysavl/opsb_sysfs ) required for the forecast isn't running.
There are errors during the execution of the forecast task flows.

Solution
If the Forecast data isn't shown in the System Executive Summary or the System Usage Details reports, check for the data on
the reports as follows:

Check if the widget that isn't showing data is connected to the correct queries. Enter the following URL on a browser:

https://<external_access_host>:<external_access_port>/<ui>

Log in with your IdM username and password.

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

Check for the data in the forecast tables in the database in the mf_shared_provider_default schema; a sample check is shown after the list below. The flow or sequence to check for the data is as follows:

1. Operations Agent data flow across tables:


opsb_agent_filesys_1d > opsb_agent_filesys_forecast
opsb_agent_node_1d > opsb_agent_node_forecast
opsb_sysinfra_filesys_1d > opsb_sysinfra_filesys_forecast
opsb_sysinfra_node_1d > opsb_sysinfra_node_forecast
2. SiteScope or Agentless data flow across tables:
opsb_agentless_filesys_1d > opsb_agentless_filesys_forecast
opsb_agentless_node_1d > opsb_agentless_node_forecast
opsb_sysinfra_filesys_1d > opsb_sysinfra_filesys_forecast
opsb_sysinfra_node_1d > opsb_sysinfra_node_forecast
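A simple row-count check against the forecast tables listed above can confirm whether any forecast data exists (a minimal sketch; substitute any of the forecast table names):

select count(*) from mf_shared_provider_default.opsb_sysinfra_node_forecast;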

Check for errors in the forecast.log file:


1. Log in to the NFS server.


2. Go to /<log volume path>/<opsb namespace>/<opsb namespace>__itom-di-postload-taskexecutor-<pod name>__postload-taskexecutor__<worker node where task executor is running>
For example: /var/vols/nfs/vol5/opsb/opsb__itom-di-postload-taskexecutor-7d9b99d899-fwv7p__postload-taskexecutor__host.mycomputer.net
3. For example, in case of Vertica database connectivity issues, the following error statements appear:
2020-08-18 19:46:02.253 [TFID:{opsb_sysavl_taskflow} TID:{opsb_sysinfra_node_forecast_id}] RID:{b552e434-1b1f-4bd7-9d00-17d3557256df}-output] []- Invalid connection string attribute: SSLMode (SQL-01S00)1ERROR ReadForecastHistory::try {...} /taskexecutor/conf-local/bin/enrichment/ReadForecastHistory.pm (35) 35 opsb_sysinfra_node_forecast : Can't connect to database: [Vertica][DSI] An error occurred while attempting to retrieve the error message for key 'VConnectFailed' and component ID 101: Could not open error message files - Check that "/opt/vertica/lib64/en-US/VerticaMessages.xml" or "/opt/vertica/lib64/VerticaMessages_en-US.xml" exists and are accessible. MessageParameters=["could not translate host name \"OPSBsac2590Vert-01.swinfra.net\" to address: Name or service not known
2020-08-18 19:46:02.253 [TFID:{opsb_sysavl_taskflow} TID:{opsb_sysinfra_node_forecast_id}] RID:{b552e434-1b1f-4bd7-9d00-17d3557256df}-output] []- "] (SQL-08001) [state was 08001 now 01S00]
Solution: Check and fix the Vertica database connectivity.

If copy scripts weren't run as scheduled (for more than 30 minutes) then see Troubleshoot Forecast/Aggregate data flow.

Related topics
Data Processor Postload task flow not running
Reporting data and task flows
Vertica database isn't reachable
Failed to connect to host
Trace issue in Aggregate and Forecast
No logs for aggregate and forecast
Aggregate not happening after upgrade


1.20.13. System Infrastructure Availability data is missing in reports

Cause
Predefined queries for availability ( opsb_sysinfra_Avail* and opsb_sysinfra_SysExecSumm_avail* ) aren't returning any data.
Tables required for availability reports ( opsb_agent_node or opsb_agentless_node or opsb_sysinfra_node or opsb_sysinfra_avai
l* ) aren't created.
The tables required for availability reports don't have data.
The task flow ( opsb_sysavl ) required for availability isn't running.
There are errors during the execution of the availability task flow.
Late arrival data isn't getting captured in the Availability tables.

Solution
Check if the widget that isn't showing data is connected to the correct queries. Enter the following URL on a browser:

https://<external_access_host>:<external_access_port>/<ui>

Log in with your IdM username and password.

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

Check the data in the tables:

1. Check if data is flowing across tables:


Data flow across tables for Operations Agent data:
opsb_agent_node > opsb_sysinfra_node > opsb_sysinfra_avail > opsb_sysinfra_avail_1h > opsb_sysinfra_avail_1d
Data flow across tables for SiteScope (Agentless) data:
opsb_agentless_node > opsb_sysinfra_node > opsb_sysinfra_avail > opsb_sysinfra_avail_1h > opsb_sysinfra_avail_1d
2. Check for uptime seconds in the opsb_agent_node and opsb_agentless_node raw tables. Verify that the value in the column uptime_s increments with every collection (a sample query is shown after this list).
3. If the data is present in the raw tables ( opsb_agent_node and opsb_agentless_node) , go to the aligned table ( opsb_sysinfra_
node) , and verify that there is data in the uptime_s column.
4. If the data isn't present in the aligned tables ( opsb_sysinfra_* tables), go to the OPTIC Data Lake Monitoring health chart
and verify if the post load task flow ( ops_sysavl ) is running.
5. If data isn't present in opsb_sysinfra_* tables, check the task script logs.
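A sample query for the uptime check in step 2 (a minimal sketch; it assumes the timestamp_utc_s column used by the raw-table queries elsewhere in this guide, and no other column names are assumed):

-- Inspect the most recent samples and confirm uptime_s keeps increasing between collections
select to_timestamp(timestamp_utc_s) as sample_time, uptime_s
from mf_shared_provider_default.opsb_agent_node
order by timestamp_utc_s desc
limit 20;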

Check task script logs for errors:


Note

Make sure you change the logging level back to ERROR after completing the analysis. Leaving it at INFO leads to log file locking issues and may cause the task script run to hang.

1. On the NFS server, the script logs are present at: /<mount path for log directory>/reports/system_infra/opsb_sys*.log (* can
be uptm or avlhly or avldly , or avlla ). Check the following logs:

Note

For the late arrival data which aren't captured in the availability table, check for errors in the opsb_sysavlla.log
file.

For opsb_sysinfra_avail table see opsb_sysuptm.log


For opsb_sysinfra_avail_1h table see opsb_sysavlhly.log
For opsb_sysinfra_avail_1d table see opsb_sysavldly.log
2. If there are no errors in the logs, change the log level to INFO and verify log messages in the next run.
To change the log level, go to /<conf vol path on NFS>/di/postload/conf/reports/system_infra/ and open the <scriptname>log4p
erl.conf file and change the log4perl.logger.<scriptname> to INFO.

In the subsequent run of the task script, you will find detailed logs in the log file of the respective scripts with more
details like the query runs and the number of records updated.

3. If there are no errors found in the logs and data isn't present in the opsb_sysavl* tables, verify if data is present in the opsb_sysinfra_* tables.
4. If there is no data in the opsb_sysinfra_* tables, check the copy scripts.

5. In the database, go to mf_shared_provider_default schema. In the opsb_internal_reports_schedule_config_1h table, look for:

Note

Copy scripts copy data from the raw tables to the respective aligned sysinfra tables. Copy scripts are executed once every 5 minutes.

tablename - Displays the source table from which the data is copied
processed_time - Displays the time up to which data was considered for aggregation
execution_time - Displays the last time the copy script was executed
last_processed_epoch - Displays the Vertica epoch value of the last processed row from the opsb_sysinfra_node table.
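
A query sketch for the check above (table and column names as listed; run it from a database tool such as DB Visualizer):

select tablename, processed_time, execution_time, last_processed_epoch from mf_shared_provider_default.opsb_internal_reports_schedule_config_1h order by execution_time desc;

If execution_time hasn't advanced for more than 30 minutes, the copy scripts aren't running on schedule.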

If the copy scripts were run, check the copy script log files:

Note

Make sure you change the logging level back to ERROR after completing the analysis. Leaving it at INFO leads to log file locking issues and may cause the task script run to hang.

1. In the NFS server, go to:


For Operations Agent- <mounted log path>/reports/agent_infra
For SiteScope or Agentless - <mounted log path>/reports/agentless_infra
2. Look for errors in the:
For Operations Agent opsb_agtnode*.log file.
For SiteScope or Agentless opsb_agtlnode.log file.
3. If there are no errors in the logs, change the log level to INFO and verify log messages in the next run.


To change the log level, go to /<conf vol path on NFS>/di/postload/conf/reports/agent_infra/ or /<conf vol path on NFS>/di/postload/conf/reports/agentless_infra/, open the <scriptname>log4perl.conf file, and change log4perl.logger.<scriptname> to INFO.

In the subsequent run of the task script, you will find detailed logs in the log file of the respective scripts with more
details like the query runs and the number of records updated.

If the copy scripts were run and there are no errors in the log files, check if there is data in the raw tables. If there is no data in the agent raw tables, see System Infrastructure report widget displays partial data for metrics collected by Operations Agent. If there is no data in the agentless raw tables, see System Infrastructure report widget displays partial data for metrics collected by SiteScope.

If copy scripts weren't run as scheduled (for more than 30 minutes), see Aggregate tables aren't updated, data in the system infrastructure, or event reports aren't refreshed.

Related topics
Data Processor Postload task flow not running
Aggregate functionality isn't working as expected
Reporting data and task flows
Aggregate not happening after upgrade


1.20.14. Event reports are showing no or partial data or updated data is not shown in the reports

Cause
Predefined queries or data channels ( opsb_event_* ) for the widget not returning any data.
One or more of the metrics mapped in the corresponding widgets are null.
Tables required for the corresponding ( opr_event, opsb_event_*_summary_1h or opsb_event_*_summary_1d ) report queries
aren't created.
Data isn't present in one or more event tables - raw tables ( opr_event ), hourly aggregation tables, or daily aggregation
tables.
Task flow ( opsb_evt* ) required for corresponding reports isn't running.
There are errors during the execution of the task flow.
Late arrival data isn't getting captured in the Availability tables

Solution
Check if the widget that isn't showing data is connected to the correct queries. Enter the following URL on a browser:

https://<external_access_host>:<external_access_port>/<ui>

Log in with your IdM username and password.

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

If the query doesn't return any data then follow the steps:

1. Check if the events from the integrated OBM sources are forwarded to the opr_event table. For details about the event tables, see Event schema. If data is forwarded, continue with the next steps. If not, check if event forwarding is configured correctly. If you are using classic OBM, see Configure Classic OBM. If you are using containerized OBM, see Configure a secure connection between containerized OBM and OPTIC Data Lake.
2. The events forwarded from OBM are saved in the raw table ( opr_event ). Task scripts aggregate the data from the opr_ev
ent table and insert it into hourly aggregate tables ( opsb_event_*_summary_1h ). Further, daily aggregations are
computed from the respective hourly tables and inserted into daily aggregate tables ( opsb_event_*_summary_1d )
Task flow: opsb_event > opsb_event_*_summary_1h > opsb_event_*_summary_1d
3. If there is no data present or no recent data updates in the hourly aggregation table, check the task script logs present
in the NFS server. On the NFS server, the task script logs are present at: /<mount path for log directory>/reports/event and
check opsb_evt*.log (* can be cihly or etihly or cithly or polhly or usrhly or grphly or hly ).


Note

For the late arrival data that isn't captured in the availability table, check for errors in the opsb_evt*la.log file (* can be cihly, etihly, cithly, polhly, usrhly, grphly, or hly).

4. If there are no errors in the logs, then change the log level of the task script from ERROR to INFO and check for the
detailed log in the consequent run.
To change the logging level, go to <postload mount path on NFS>/conf/reports/event and edit opsb_evt*log4perl.conf (* can be cihly, etihly, cithly, polhly, usrhly, grphly, or hly) to change the logging level from ERROR to INFO.
In the subsequent run of the task script, you will find detailed logs in the log file of the respective scripts with more
details like the query runs and the number of records updated.

Note

Make sure you change the logging level back to ERROR after completing the analysis. Leaving it at INFO leads to log file locking issues and may cause task flows to become unresponsive.
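
For example, to scan the event task script logs mentioned in step 3 for errors on the NFS server (a sketch; the mount path is environment-specific):

grep -iE "error|fail" /<mount path for log directory>/reports/event/opsb_evt*.log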

If there is no data present or the data shown isn't recent in the daily aggregated tables, verify the respective hourly aggregate table. If data is present in the hourly aggregate table but not updated in the daily table, check the aggregate.log file for any errors.

1. Log in to the NFS server.


2. Go to /<log nfs path>/<opsb namespace>/<opsb namespace>__itom-di-postload-taskexecutor-<pod name>__postload-taskexecutor
__<worker node where task executor is running>
For example: /var/vols/nfs/vol5/opsb/opsb__itom-di-postload-taskexecutor-7d9b99d899-fwv7p__postload-taskexecutor__host.myco
mputer.net

3. For example, in case of Vertica database connectivity issues the following error statements appear:
2020-08-26 19:12:40.518 [TFID:{opsb_evtcihly_taskflow} TID:{opsb_event_ci_summary_1d_id}] RID:{43d9115d-4413-43ff-9012-eaad
996b06db}-output] []- Invalid connection string attribute: SSLMode (SQL-01S00)1ERROR ReadHistory::try {...} /taskexecutor/conf-loc
al/bin/enrichment/ReadHistory.pm (36) 36 opsb_event_ci_summary_1d : Can't connect to database: FATAL 3781: Invalid username or
password
Solution: Check and fix the Vertica Database connectivity
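
To verify connectivity outside the task scripts, you can run a quick check with vsql from a node that can reach the database (a sketch; the port and database name shown are the defaults mentioned elsewhere in this document and may differ in your environment):

vsql -h <vertica host> -p 5433 -U <vertica user> -d itomdb -c "select 1;"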

Note

For “Events by CI” report, it's a prerequisite that the topology data forwarding is enabled. For information about topology data
forwarding, see Forward topology from classic OBM to OPTIC Data Lake or Forward topology from containerized OBM to OPTIC Data
Lake.

Related topics
Data Processor Postload task flow not running
Aggregate functionality isn't working as expected
Reporting data and task flows
Vertica database isn't reachable
Failed to connect to host
Trace issue in Aggregate and Forecast
No logs for aggregate and forecast
Aggregate not happening after upgrade


1.20.15. System Infrastructure report widget displays partial data for metrics collected by SiteScope
Even though the System Infrastructure report widget shows the data query, the widget doesn't display updated data or
displays partial data.

Possible causes
Cause 1: The queries for corresponding reports aren't returning any data.
Cause 2: The daily or hourly or aligned tables required for corresponding report queries don't have data.
Cause 3: Data is present in raw tables, but the copy script hasn't run.
Cause 4: Data isn't present in raw tables, the issue is with OPTIC DL pods.
Cause 5: Data isn't available in OPTIC DL Message Bus Topics.

Solution

The queries for corresponding reports aren't returning any data


Run a query:

Enter the following URL on a browser:

https://<external_access_host>:<external_access_port>/<ui>

Log in with your IdM username and password.

For BVD Reports:

1. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

2. Click the dashboard from which you didn't see data and click .
3. Click on the widget for which there is no data and then copy the Data channel name.
4. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.
5. Search for the Data Channel name that you copied.

6. Select the Data channel name, and click .


7. The query appears in the right pane. Click Run. The query result appears with data.

For Flex Reports:

1. From the side navigation panel, type the report name in search or navigate to the report.

2. Click on the icon on the widget from which you didn't see data and then click .

3. Expand the PREDEFINED QUERY section and then click below the query name.
4. The query appears. Scroll down to the end of the query and click RUN QUERY. The query result appears with data.

Query isn't executed


Check the OPTIC Data Lake DB Connection. Follow these steps:

1. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.

2. Click and then click DB CONNECTION. The SET UP DB CONNECTION pane appears.
3. Give the Vertica database details:
Hostname: If Vertica is a cluster, enter the host names of the cluster nodes separated by commas
Port: The default port is 5433


TLS: The Use TLS check box isn't selected if you had set the vertica.tlsEnabled parameter to false in
the values.yaml file during the suite installation. The default is that TLS is enabled. See the section 'Vertica' in
the Configure Values.yaml page.
DB name: The Vertica database name is configured in the vertica.db parameter in the values.yaml file during the
suite installation. The default is itomdb .
Login: The Vertica read-only user login name is configured in the vertica.rouser parameter in the values.yaml file
during the suite installation. The default is vertica_rouser .
Password: Set to the password of the Vertica read-only user.
4. Click TEST CONNECTION to test the connection. A confirmation message appears.

5. If the connection isn't successful, provide the correct details, and then test the connection. If the connection is
successful, click SAVE SETTINGS.

Check the health of bvd pods and resolve the issues according to the log message:

Pod: bvd-www-deployment
Containers: bvd-www, kubernetes-vault-renew
Description: Provides web UI and real-time push to browser for BVD dashboards
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-www-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-www-deployment-POD>_<opsbridge-namespace>_bvd-www-*.log, <bvd-www-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-redis
Containers: bvd-redis, stunnel, kubernetes-vault-renew
Description: In-memory database for statistics and session data, OPTIC DL Message Bus for server process communication
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-redis-POD> -n <opsbridge-namespace>
Log files: <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-*.log, <bvd-redis-POD>_<opsbridge-namespace>_bvd-redis-stunnel-*.log, <bvd-redis-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-quexserv
Containers: bvd-quexserv, kubernetes-vault-renew
Description: Query execution service for executing Vertica on-demand queries
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-quexserv-POD> -n <opsbridge-namespace>
Log files: <bvd-quexserv-POD>_<opsbridge-namespace>_bvd-quexserv-*.log, <bvd-quexserv-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-receiver-deployment
Containers: bvd-receiver, kubernetes-vault-renew
Description: Receives incoming messages (data items)
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-receiver-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-receiver-deployment-POD>_<opsbridge-namespace>_bvd-receiver-*.log, <bvd-receiver-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-ap-bridge
Containers: bvd-ap-bridge, kubernetes-vault-renew
Description: Talks to the Autopass server and calculates the number of allowed dashboards
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-ap-bridge-POD> -n <opsbridge-namespace>
Log files: <bvd-ap-bridge-POD>_<opsbridge-namespace>_bvd-ap-bridge-*.log, <bvd-ap-bridge-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-controller-deployment
Containers: bvd-controller, kubernetes-vault-renew
Description: Does aging of old data items and bootstrap of the database
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-controller-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-controller-deployment-POD>_<opsbridge-namespace>_bvd-controller-*.log, <bvd-controller-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Pod: bvd-explore-deployment
Containers: bvd-explore, kubernetes-vault-renew
Description: Provides web UI and back-end services for BVD explore
Command to check the status of pods: kubectl get pods --all-namespaces -o wide | grep "bvd"
Command to describe pods: kubectl describe pod <bvd-explore-deployment-POD> -n <opsbridge-namespace>
Log files: <bvd-explore-deployment-POD>_<opsbridge-namespace>_bvd-explore-*.log, <bvd-explore-deployment-POD>_<opsbridge-namespace>_kubernetes-vault-renew-*.log

Query is executed, but the top 5 or bottom 5 records aren't displayed


Check if the widget and query are mapped correctly:

1. Enter the following URL on a browser:


https://<external_access_host>:<external_access_port>/<ui>
2. Log in with your IdM username and password.
3. From the side navigation panel, click Administration > Dashboards & Reports > Stakeholder Dashboards &
Reports > Dashboard Management.

4. Click the dashboard from which you didn't see data and click .
5. Click the widget for which there is no data and then copy the Data channel name.
6. From the side navigation panel, click Administration > Dashboards & Reports > Predefined Queries.

7. Search for the Data Channel name that you copied, select the Data channel name, and click .
8. The query is displayed in the right pane. Verify that the widget is mapped to the correct query. If not, correct the query and click RUN. If the query is executed successfully, the query result is displayed.


9. Based on the query type, click SAVE or SAVE DATA QUERY

Query is executed, but no records are displayed


Run the same SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_sysinfra_netif_1d
opsb_sysinfra_node_1d
opsb_sysinfra_cpu_1d
opsb_sysinfra_disk_1d
opsb_sysinfra_filesys_1d
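
For example, a sketch using one of the daily tables listed above:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.opsb_sysinfra_node_1d;

If the returned timestamp is old, the daily aggregation hasn't processed recent data.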

The daily or hourly or aligned tables required for corresponding report queries don't have data

Data isn't present in daily tables


If data isn't present in the daily tables, run the SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_sysinfra_netif_1h
opsb_sysinfra_node_1h
opsb_sysinfra_cpu_1h
opsb_sysinfra_disk_1h
opsb_sysinfra_filesys_1h

Data is present in hourly tables


Check if daily aggregations are executed:

1. In OPTIC Data Lake, go to the itom_di_postload_provider_default schema and look for the itom_di_postload_provider_default.ROLLUP_CONTROL table.
2. In the ROLLUP_NAME column, look for the aggregation name (for example: system_infra_metric_node_1d), then look for the LAST_EXE_TIME (to know the last run time) and the MAX_EPOCH_TIME (to know the time until which the records were aggregated).
Daily aggregations are expected to run every 4:45 minutes. If it's beyond this time, check for errors in the aggregation log and report the error.
3. Run the following command to view the log files:
/var/vols/itom/log-volume/<opsbridge-namespace>/<opsbridge-namespace>__<itom-di-dp-worker-dpl-podname>__dp-worker__<kub
ernetes worker node>
For example:
/var/vols/itom/log-volume/opsbridge-jugcl/opsbridge-jugcl__itom-di-dp-worker-dpl-9bd6b6964-2rfd2__dp-worker__btpvm0785.hpeswla
b.net
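
A query sketch for the ROLLUP_CONTROL check above (column names as described in this section; the aggregation name is an example):

select rollup_name, last_exe_time, max_epoch_time from itom_di_postload_provider_default.rollup_control where rollup_name = 'system_infra_metric_node_1d';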

Data isn't present in hourly tables


Run the SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema:

opsb_sysinfra_netif
opsb_sysinfra_node
opsb_sysinfra_cpu
opsb_sysinfra_disk
opsb_sysinfra_filesys

Data is present in aligned tables


Check if hourly aggregations are executed:

1. In OPTIC Data Lake, go to the itom_di_postload_provider_default and look for the itom_di_postload_provider_default.ROLLUP_CO
NTROL table.
2. In the ROLLUP_NAME column, look for the aggregation name (for example: system_infra_metric_node_1h), then look for the LAST_EXE_TIME (to know the last run time) and the MAX_EPOCH_TIME (to know the time until which the records were aggregated).
Hourly aggregations are expected to run every 12 minutes. If it's beyond this time, check for errors in the aggregation log and report the error.
3. Run the following command to view the log files:
Go to the respective pod directory in the NFS volumes:
cd <log-volume>/<suite namespace>
ls <suite namespace>__itom-di-postload-taskexecutor-<pod value>__postload-taskexecutor__<node name>
For example:
/var/vols/itom/opsbvol2/opsb-helm/opsb-helm__itom-di-postload-taskexecutor-85d9b6fbcc-r8mjc__postload-taskexecutor__btp-hvm00
779.swinfra.net

Data isn't present in aligned tables


Run the SQL queries from a database tool like DB Visualizer:

select to_timestamp(max(timestamp_utc)) from mf_shared_provider_default.<table name>

Verify the latest data is present in the following tables in the mf_shared_provider_default schema :

opsb_agentless_netif
opsb_agentless_cpu
opsb_agentless_disk
opsb_agentless_filesys
opsb_agentless_node


Data is present in raw tables, but the copy script hasn't run

Copy scripts copy data from the raw tables to the respective aligned sysinfra tables. In the Vertica database, go to the opsbridge_store schema. In the opsb_internal_reports_schedule_config_1h table, look for:

tablename - Displays the source table from which the data is copied
processed_time - Displays the time up to which data was considered for aggregation
execution_time - Displays the last time the copy script was executed

Copy scripts are executed once every 5 minutes.

Copy scripts were run


If the copy scripts were run, check the copy script log files:

1. On the NFS server, go to: <mounted log path>/reports/agentless_infra


2. Look for errors in the:
For SiteScope or Agentless: opsb_agtl*.log file.
3. If there are no errors in the logs, change the log level to INFO and verify log messages in the next run.
To change the log level, go to /<conf vol path on NFS>/di/postload/conf/reports and open the <scriptname>log4perl.conf file
and change the log4perl.logger.<scriptname> to INFO.

In the subsequent run of the custom task script, you will find detailed logs in the log file of the respective scripts
with more details like the query runs and the number of records updated.

If the copy scripts were run, and there are no errors in the log files, check for Collection or configuration errors.

Copy scripts weren't run


If copy scripts weren't run as scheduled (for more than 30 minutes) then see Aggregate tables aren't updated, data in the
system infrastructure, or event reports aren't refreshed.

Data isn't present in raw tables, the issue is with OPTIC DL pods

Check the health of the OPTIC DL pods


Pod: itom-di-postload-taskcontroller
Containers: itom-di-postload-taskcontroller-cnt, kubernetes-vault-renew
Description: The task controller takes as input a list of task flows from the administration pod and schedules the eligible tasks for execution. It then sends the task and its associated payload as messages onto the configured task topic in the OPTIC DL Message Bus.
Command to check health: kubectl describe pod <itom-di-postload-taskcontroller-POD> -n <opsbridge-namespace>

Pod: itom-di-postload-taskexecutor
Containers: itom-di-postload-taskexecutor-cnt, kubernetes-vault-renew
Description: The OPTIC DL Message Bus consumers running in the task executors read these messages from the task topic and trigger the tasks for execution. It then sends the task execution status back to the task controller onto the status topic.
Command to check health: kubectl describe pod <itom-di-postload-taskexecutor-POD> -n <opsbridge-namespace>

Pod: itom-di-receiver-dpl
Containers: itom-di-receiver-cnt, kubernetes-vault-renew
Description: Receives the data from the source and passes it to the OPTIC DL Message Bus.
Command to check health: kubectl describe pod <itom-di-receiver-dpl-POD> -n <opsbridge-namespace>

Pod: itom-di-administration
Containers: itom-di-administration, kubernetes-vault-renew
Command to check health: kubectl describe pod <itom-di-administration-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-broker
Containers: certificate-renew, itomdipulsar-broker
Command to check health: kubectl describe pod <itomdipulsar-broker-POD> -n <opsbridge-namespace>

Pod: itom-di-scheduler-udx
Containers: certificate-renew, itom-di-udx-scheduler-scheduler
Description: Responsible for getting data from the OPTIC DL Message Bus and streaming it to Vertica.
Command to check health: kubectl describe pod <itom-di-scheduler-udx-POD> -n <opsbridge-namespace>

Pod: itom-di-metadata-server
Containers: itom-di-metadata-server, kubernetes-vault-renew
Description: Responsible for metadata configuration (table creation).
Command to check health: kubectl describe pod <itom-di-metadata-server-POD> -n <opsbridge-namespace>

Pod: itomdipulsar-zookeeper
Containers: certificate-renew, itomdipulsar-zookeeper
Command to check health: kubectl describe pod <itomdipulsar-zookeeper-POD> -n <opsbridge-namespace>

Resolve the issues according to the log messages.
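
A quick way to check the status of all of these pods at once (a sketch; the grep pattern is only illustrative):

kubectl get pods -n <opsbridge-namespace> -o wide | grep -E "itom-di|itomdipulsar"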

Data isn't available in OPTIC DL Message Bus Topics


Run the below command to check if the OPTIC DL Message Bus Topics are created inside the itomdipulsar-bastion-0 pod:

1. Login to the itomdipulsar-bastion container:

kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash

2. Execute the following scripts to list the OPTIC DL Message Bus topics:
pulsar@itomdipulsar-bastion-0:/pulsar/bin> ./pulsar-admin topics list-partitioned-topics public/default |grep agentless

"persistent://public/default/opsb_agentless_filesys"

"persistent://public/default/opsb_agentless_node"

"persistent://public/default/opsb_agentless_disk"

"persistent://public/default/opsb_agentless_cpu"


"persistent://public/default/opsb_agentless_generic"

"persistent://public/default/opsb_agentless_netif"

Perform the next solution steps if the issue persists.

Check the communication between OPTIC DL Message Bus producer and consumer

Run the following scripts in the itomdipulsar-bastion-0 pod to check if the connection to the OPTIC DL Message Bus consumer is healthy:

1. Login to the itomdipulsar-bastion container:


kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash
2. Execute the following producer command to send the message 'hi':
./bin/pulsar-client produce <topic name> -m hi
3. Open one more session and execute the following consumer command, where you should see the same message
'hi' typed in the producer console:
./bin/pulsar-client consume -s test-subscription -n 0 <topic name>
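
For example, using one of the agentless topics listed earlier (a sketch; any existing topic works):

./bin/pulsar-client produce persistent://public/default/opsb_agentless_node -m hi
./bin/pulsar-client consume -s test-subscription -n 0 persistent://public/default/opsb_agentless_node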

Check if data is present in the OPTIC DL Message Bus topic


1. Login to the itomdipulsar-bastion container:

kubectl exec -it itomdipulsar-bastion-0 -c pulsar -n <opsbridge-namespace> -- bash

2. Execute the following consumer command to check the data in the OPTIC DL Message Bus topic:

./bin/pulsar-client consume -s test-subscription -n 0 <topic name>

Data isn't present in OPTIC DL Message Bus topics


There must be some error in the collection or configuration. See Troubleshoot System Infrastructure Reports collection issues
with SiteScope as collector to resolve the issue.


1.20.16. Troubleshoot System Infrastructure Reports collection issues with SiteScope as collector

System Infrastructure Reports are showing no data or partial data.

Cause
There must be some error in the SiteScope collection or configuration.

Solution
To resolve this issue, follow these solutions in the same order and check if the data appears on the report.

Check if the certificate exchange between OPTIC Data Lake and OBM is successful

1. Log on to OBM
2. Run the following command: /opt/OV/bin/ovcert -list

All granted certificates are listed.

Check if you have installed Operations Agent and integrated it with the SiteScope server
To stream SiteScope data into OPTIC Data Lake, you must integrate the Operations Agent which is on the SiteScope server
with OBM.

Perform the following steps to check if you have installed Operations Agent:

1. Log on to the SiteScope node.


2. Run the following command: cd /opt/OV/bin
3. Run the following command: ./opcagt -version
The version of the Operations Agent appears. Make sure that the version is 12.14 or later.

Option 1
Run the following commands if you want to install and integrate Operations Agent (on a SiteScope server) with OBM:

On Linux: Run the following commands:

./oainstall.sh -i -a -s <OBM load balancer or gateway server>

cd /opt/OV/bin

./ovcert -certreq

If the master node (OBM node) is in HA, run the following command: ./oainstall.sh -i -a -s <HA_VIRTUAL_IP's FQDN> -cert_srv <OMT
master node FQDN>

On Windows: Run the following commands:

cscript oainstall.vbs -i -a -s <OBM load balancer or gateway server>

<C:>\Program Files\HP\HP BTO Software\bin>ovcert -certreq

If the master node (OBM node) is in HA, run the following command: cscript oainstall.vbs -i -a -s <HA_VIRTUAL_IP's FQDN> -cert_srv <OMT master node FQDN>


Option 2
If Operations Agent is already installed, follow the steps to integrate Operations Agent with OBM:

1. Run the following command to integrate Operations Agent with OBM:


On Linux:
Run the following commands:
cd /opt/OV/bin
./ovcert -certreq
If the valid certificate exists an ERROR " There is already a valid certificate for this node installed" appears.
To remove this certificate and send a new certificate, check all the certificates available. Run the command: ./ovcert -list
To remove the certificate run the command: ./ovcert -remove <name of the certificate to be removed>
After you remove all certificates from the certificates section, run the command to update the new certificate: ./ovcert -c
ertreq
On Windows:
Navigate to the following location and run the command:
<C:>\Program Files\HP\HP BTO Software\bin>ovcert -certreq
If the valid certificate exists an ERROR " There is already a valid certificate for this node installed" appears.
To remove this certificate and send a new certificate, check all the certificates available. Run the command: ovcert -list
To remove the certificate run the command: ovcert -remove <name of the certificate to be removed>
After you remove all certificates from the certificates section, run the command to update the new certificate: ovcert -
certreq
2. To configure a secure connection between containerized OBM and OPTIC Data Lake, perform the following steps:
1. Get Integration tools
2. Configure a secure connection between containerized OBM and OPTIC Data Lake

Grant the Operations Agent certificate


Follow the steps:

1. Log on to OBM UI.


2. Go to Administration > SETUP AND MAINTENANCE > Certificate Request

3. Click to grant the certificate.

Check if you have deployed the SiteScope Metrics streaming policy


See task 1: Deploy the SiteScope Metrics Streaming Aspect

Check if the monitors are enabled and if you have added the OPTIC Data Lake tag
See task 5: Enable Monitors

Check the di_receiver log file for errors


1. Go to : /var/vols/itom/log-volume/<opsbridge-namespace>/<opsbridge-namespace>__<itom-di-receiver-pod>__receiver__<worker
machine where di receiver is running>
For example:
/var/vols/itom/log-volume/opsbridge-jugcl/opsbridge-jugcl__itom-di-receiver-dpl-799f8547cd-d6dxb__receiver__btpvm0785.hpeswlab.n
et
2. Check the following file for error: receiver-out.log
3. Take appropriate action.

Check system.txt for errors


1. Log in to the Operations Agent node.


2. Go to /var/opt/OV/log . Check for system.txt file for errors. Fix the errors.

Enable debug mode


You can change the log level from INFO to DEBUG in the logback files for each of the following:

Pod: OPTIC DL Administration - XML name: administration-logback-cm
Pod: OPTIC DL HTTP Receiver - XML name: itom-di-receiver-logback-cm
Pod: OPTIC DL Postload Processor - XML names: taskcontroller-logback-cm, taskexecutor-logback-cm
Pod: OPTIC DL Metadata Server - XML name: metadata-server-logback-cm

Perform the following steps to change the log level:

1. Run the following command:


kubectl get ns
Note down the <application namespace> .
2. Run the following command:
kubectl get cm -n <application namespace> | grep logback
The list of logback files, as mentioned in the table, appear. Perform the next steps for each of the logback files.
3. Run the following command to edit the file:
kubectl edit cm <logback-cm> -n <application namespace>
For example: kubectl edit cm administration-logback-cm -n suitens
4. Change the level from INFO to DEBUG in the following line:
<logger name="com.microfocus" level="INFO" additivity="false">
<appender-ref ref="LogFileAppender"/>
</logger>
5. Save the file.


1.20.17. Task flows aren't listed on the OPTIC DL Health Insights dashboard

Cause
This issue may occur due to the following reasons:

The task flows are configured but not running.


The task flows aren't available.

Solution
1. If the task flows are configured but not running, follow the steps described in Data Processor Postload task flow not
running.
2. If the task flows aren't available, follow the steps described below to resolve the issue on a fresh installation. If you see
this issue in a running setup, or if the issue persists after following the steps below, contact Software Support.

Important

The steps described below may lead to data loss as database tables will get
formatted.

a. Run the following command to download the content :

ops-content-ctl.exe download content -n <content name> -v <version>

b. Rename the downloaded content to increment the version. For example,


rename OpsB_SysInfra_Content_2021.11.004.zip to OpsB_SysInfra_Content_2021.11.005.zip

c. Upload the renamed content file:

ops-content-ctl upload content -f <renamed_file_name>

Example:

ops-content-ctl upload content -f OpsB_SysInfra_Content_2021.11.005.zip

d. Run the following command to uninstall the previous version of the content :

ops-content-ctl uninstall content -n <content_name> -v <content_version>

Example:

ops-content-ctl uninstall content -n OpsB_SysInfra_Content -v 2021.11.004

e. Run the following command to install the uploaded content with the incremented version:

ops-content-ctl install content -n <content_name> -v <new_content_version>

Example:

ops-content-ctl install content -n OpsB_SysInfra_Content -v 2021.11.005

f. Verify if the task flows are available after installing the content.
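
To confirm the result, you can list the installed content again with the same CLI used in step a (the Linux form is shown):

./ops-content-ctl list content

The reinstalled content should appear with the incremented version and an installed status.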


1.20.18. Aggregate tables aren't updated, data in the system infrastructure, or event reports aren't refreshed

Cause
This issue may also occur if the log level gets changed to INFO (from the default option ERROR) and not reverted to ERROR
after analyzing the log. This causes the task flows to lock the log files. The locked files that are present in the log location
don't allow the new process to add entries in the log file. Due to this, though the status of the task flow is RUNNING no data
gets processed.

Solution
Perform the following:

1. On the OPTIC Data Lake Health Insights dashboard: Check the status of the task flows and note down the
taskflowId and taskId of tasks that are running for a long time. If you don't see the task flows listed on the
dashboard, refer the steps described in Task flows are not listed on the OPTIC DL Health Insights dashboard.
2. On the master (control plane) node: Check if the same tasks are running for a long time:
Set the logging level to ERROR: If any processes (within the long-running tasks) are running, and if the task
flows aren't completed even after an hour, check the logging level and set it to ERROR.
3. Check if the aggregate data flow is not working. See Troubleshoot Forecast/Aggregate data flow.

See the details of the tasks:

On the OPTIC Data Lake Health Insights dashboard:


1. Go to the Postload Detail dashboard and select the timeframe for which you want to analyze the data (For example:
You may select the timeframe like last 12 or 24 hours).
Dashboard URL: https://<masternode hostname>:<port>/itomdimonitoring/d/postloaddetailv1.
For more details, see Postload Detail dashboard.
2. In the Taskflow drop-down, select the task flow that's running for a long time. The dashboard displays a list of all the
tasks configured in that task flow and their current state
(READY/RUNNING/FINISHED/FAILED_RECOVERABLE/FAILED_NON_RECOVERABLE).
3. Make a note of the taskflowId and taskId/s from the dashboard for all the tasks that are in the RUNNING state and
have been running for a long duration (more than an hour).

On the master (control plane) node:


1. Run the command to get the list of pods:
kubectl -n <deployment_namespace> get pods
2. Run the command to log in to the pod:
kubectl -n <deployment_namespace> exec -it <taskexecutor_pod_instance> -c itom-di-postload-taskexecutor-cnt -- bash


Note

If there are many taskexecutor pods, repeat this for each instance of the taskexecutor that's running on the
node.

3. Run the command to get the list of all the processes that are running in the taskexecutor:
ps -ef
4. For all the taskId/s that you noted down from the OPTIC Data Lake Health Insights dashboard, check if the corresponding
tasks (processes within the tasks) are running on any of the taskexecutor pods.

5. If the process for the above task is present in the list then perform the following:

Set the logging level to ERROR:


1. In the configuration file that's present in the mount configuration path (For example: /mnt/itom/postload/conf/rep
orts/ ), check the logging level of the task flow and set it to ERROR (the default option).

For example:
If you want to check the logging level for an agent netif task flow, go to /mnt/itom/postload/conf/reports/agent_infr
a . You will find the following files:

I have no name!@itom-di-postload-taskexecutor-7b5c7f798c-486fm:/mnt/itom/postload/conf/reports/agent_infra> ls -ltrh
total 24K
-rw-r--r-- 1 root root 619 Sep 28 07:34 opsb_agtnetiflog4perl.conf.bkp
-rw-r--r-- 1 1999 1999 587 Sep 29 07:40 opsb_agtfslog4perl.conf
-rw-r--r-- 1 1999 1999 598 Sep 29 07:49 opsb_agtcpulog4perl.conf
-rw-r--r-- 1 1999 1999 609 Sep 29 07:51 opsb_agtdisklog4perl.conf
-rw-r--r-- 1 1999 1999 609 Sep 29 07:53 opsb_agtnodelog4perl.conf
-rw-r--r-- 1 1999 1999 450 Sep 29 07:58 opsb_agtnetiflog4perl.conf

Open the opsb_agtnetiflog4perl.conf file and verify that the value is set to ERROR.

2. Go to mounted log location in the taskexecutor pod (For example: /mnt/itom/postload/log/reports/ )


You will see directories like agent_infra, system_infra, agentless_infra, and event. Go to the content directory
where you had seen the task flow issues.
For example:
To verify log files specific to the Agent content, in this example agent netif related task flow, go to /mnt/itom/p
ostload/log/reports/agent_infra and check the opsb_agtnetif.log.* files.
3. Remove the locked files (with the extension of .LCK ).


4. Terminate the process for which you removed the locked files:
kill -9 <PID of the task>
The process restarts automatically.
For example:
To terminate the opsb_agtnetif.pl process:
1. Run the ps -ef command and get the process id (PID) of the opsb_agtnetif.pl process.
2. Run the command: kill -9 20787
3. The process disappears from the list of running processes in taskexecutor pod and starts automatically in
the next schedule.

5. Verify that the new process for the task triggers per the schedule:

To verify that the process has started in the next schedule, check that the killed process (for example: opsb_agtnetif.pl) is shown in the output of the ps -ef command.

Check the database and make sure that the data aggregations are running and data is updated. For
example: You can check if the netif data aggregations are running and data is updated.

If copy scripts were not run as per schedule (for more than 30 minutes) then see Troubleshoot Forecast/Aggregate data flow.

Related topics
To troubleshoot Aggregate table has missing or no data, see Aggregate table has missing or no data.
To troubleshoot Aggregate not happening after upgrade, see Aggregate not happening after upgrade.


1.20.19. Insufficient resources to execute plan on pool itom_di_postload_respool_provider_default
There's a delay in loading aggregated data in reports. The postload resource pool reaches 100% memory usage and the
following error appears in the aggregation log files:

ERROR 3587: Insufficient resources to execute plan on pool itom_di_postload_respool_provider_default [Timedout waiting for resource request: Request exceeds limits: Memory(KB) Exceeded: Requested = 13895, Free = 0 (Limit = 6918675, Used = 7264791) (queueing threshold)]

Cause
This issue is because the postload resource pool default memory (default memory value - 25% ) allocated isn't enough for the
ongoing streaming.

Solution
Perform the following steps to increase resource pool memory:

Get the Limit and Used memory details from the error message. The additional memory required (in KB) is Used - Limit.
For example: 7264791 - 6918675 = 346,116 KB (about 338 MB required).

Convert this additional memory into a percentage of the total Vertica memory; this is the <memory in percentage>. For example, if Vertica memory is 32 GB RAM and the required additional memory is 338 MB, you must increase the memory by 1%.

If the <memory in percentage> is more than 5% , check the sizing calculator, and contact Support to validate the size.

If the <memory in percentage> is less than 5% , follow these steps:

1. Log on to the Vertica database as the dbadmin user.


2. Run the following query:
ALTER RESOURCE POOL <postload resourcepool name> MAXMEMORYSIZE '<memory in percentage>';
The <postload resourcepool name> is the name you have given in the dbinit_conf.yaml file.
For example: ALTER RESOURCE POOL "itom_di_postload_respool_provider_default" MAXMEMORYSIZE '26%';
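
To check the current setting before and after the change, a query against the Vertica resource pool catalog can help (a sketch; RESOURCE_POOLS is a standard Vertica system table):

select name, memorysize, maxmemorysize from resource_pools where name = 'itom_di_postload_respool_provider_default';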


1.20.20. tenant_id is not configured - SiteScope


A tenant id enables you to configure multiple tenants. The tenant_id can have a maximum of 80 characters.

You should use the COSO_tenant.properties file to configure the tenant id when SiteScope is the data source.

The COSO_tenant.properties is protected with file permissions. Only users with specific permissions can access this file.

On Windows:

If the SiteScope service is running with a local system account, any user in the Administrators group will be able to
modify this file.
If SiteScope is running with a user account, only that specific user will be able to modify this file.

On Linux:

The root user and the user who is running the SiteScope service can modify this file.

Cause 1
COSO_tenant.properties file isn't accessible

Solution 1
Make sure you have the required permissions.

Cause 2
The warning message "tenant_id is not configured" is logged in the error.log file if a tenant_id isn't configured or if the tenant_id has more than 80 characters.

Solution 2
Follow the steps:

1. Configure the tenant_id in the COSO_tenant.properties and make sure that you do not exceed 80 characters.

You can find the COSO_tenant.properties at:

On Windows: <SITESCOPE_HOME>\templates.applications\COSO_tenant.properties

On Linux: /opt/HP/SiteScope/templates.applications/COSO_tenant.properties

2. Restart SiteScope.


1.20.21. lastUpdatedBy: is not defined in the schema
When you create collection configuration file from the OOTB "agent-collector-sysinfra", you may get the following error:

Error: 400 BAD_REQUEST "additionalProperties - $.metadata.lastUpdatedBy: is not defined in the schema and the schema does not allow additional properties, additionalProperties - $.metadata.lastUpdatedDate: is not defined in the schema and the schema does not allow additional properties"

Cause
The lastUpdatedBy or lastUpdatedDate from the metadata section isn't removed.

Solution
Remove the lastUpdatedBy or lastUpdatedDate from the metadata section.


1.20.22. ops-monitoring-ctl tool is not starting the metric collection
The ops-monitoring-ctl tool doesn't start the collection and displays the following message:

Name: agent-collector-sysinfra-2760

metric:

timestamp: 1641959364

durationInMilliSec: 236774

state: Metric collection failed on 12 Jan 22 09:19 IST

summary: Fetching collection state failed. Hence collection is not started

The discovery section of the output may still show a successful run:

discovery:

timestamp: 1641958216

durationInMilliSec: 1171

state: Discovery succeeded on 12 Jan 22 09:00 IST

summary: 2494 nodes discovered for target

Cause
The suite is running slow or not responding.

Solution


1.20.23. Agent Metric Collector is unable to collect metrics from the Operations Agents on the worker nodes

Cause
You may get the following error: An SSL connection IO error has occurred. This may be due to a network problem or an SSL
handshake error. Possible causes for SSL handshake errors are that no certificate is installed, an invalid certificate is
installed, or the peer doesn't trust the initiator's certificate.

This error occurs if you change the ASYMMETRIC_KEY_LENGTH of the Operations Agent on the OBM server from 2048 to 4096 but don't change the ASYMMETRIC_KEY_LENGTH on the DBC.

Solution
Update the ASYMMETRIC_KEY_LENGTH on DBC.

Note

The Data Broker Container (DBC) is an Operations Agent node that's managed by OBM. It enables the Agent Metric Collector to
communicate with OBM and receives certificate updates.

Perform the steps to apply the configuration changes on the DBC (Operations Agent node):

1. Update the configuration variable ASYMMETRIC_KEY_LENGTH using the following command:

ovconfchg -ns sec.cm -set ASYMMETRIC_KEY_LENGTH <RSA Encryption algorithm supported key length>

2. To remove the existing node certificate on the agent, run the following commands:

ovcert -remove <certificate alias>

ovcert -remove <CA certificate alias>

3. To request a new node certificate from the management server, run the following command:

ovcert -certreq
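
For example, if the OBM server agent was changed to 4096 as described in the cause, the DBC-side sequence might look like this (a sketch; the certificate aliases come from the ovcert -list output on your node):

ovconfchg -ns sec.cm -set ASYMMETRIC_KEY_LENGTH 4096
ovcert -list
ovcert -remove <certificate alias>
ovcert -remove <CA certificate alias>
ovcert -certreq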


1.20.24. The content upload fails or the tables in mf_shared_provider_default schema are not populated completely

Symptom
The content upload failed with the status Completed with errors or tables in mf_shared_provider_default schema aren't
populated completely.

Solution
Perform the following to reinstall the content. To configure the CLI, see Administer AMC with CLI.

1. Run the following command in the folder containing the ops-content-ctl tool to list the Name, Version, and Status of all
the content files.

On Linux:

./ops-content-ctl list content

On Windows:

ops-content-ctl.exe list content

2. Run the following command in the folder containing the ops-content-ctl tool to download the failed content.

On Linux:

./ops-content-ctl download content -n <content name> -v <version>

On Windows:

ops-content-ctl.exe download content -n <content name> -v <version>

Note

The downloaded file gets saved as a .zip file in the current directory.

3. Before uploading the content zip file increase the version number of the downloaded content. For example.,

On Linux:

mv OpsB_SysInfra_Content_2021.05.003.zip OpsB_SysInfra_Content_2021.05.003.001.zip

On Windows:

ren OpsB_SysInfra_Content_2021.05.003.zip OpsB_SysInfra_Content_2021.05.003.001.zip

4. Run the following command in the folder containing the ops-content-ctl tool to upload the renamed content.

On Linux:

./ops-content-ctl upload content -f <zip>

On Windows:


ops-content-ctl.exe upload content -f <zip>

5. Run the following command in the folder containing the ops-content-ctl tool to reinstall the uploaded content.

On Linux:

./ops-content-ctl upgrade content -n <content name> -v <version>

On Windows:

ops-content-ctl.exe upgrade content -n <content name> -v <version>

Note

For version 2021.08 and the next versions, you can run the following command in the folder containing theops-content-ctl tool
to force install the content.

./ops-content-ctl install content -n <content name> -v <version> -f

For example,

./ops-content-ctl install content -n OpsB_SysInfra_Content -v 2021.08.026 -f


1.20.25. From OPTIC Data Administration: Could not complete request successfully
You observe a failure in deployment of CAS content due to server side timeout in OPTIC Data Lake.

Cause
The issue occurs due to the server taking longer than 60 seconds for POST/PUT API calls for CMDB Content.

Solution
To resolve the issue, run the below helm upgrade with additional setting to increase client timeout:

helm upgrade <helm deployment name> <chart> -n <application namespace> --reuse-values --set opsbcontentmanager.contentman
ager.config.defaultHTTPTimeout=180


1.20.26. SysInfra file system or node instanceType doesn't display the targets

Problem
The targets aren't displayed for the filesystem and node instance types in the SysInfra content because the forecast field group is selected by default.

Solution
1. Launch URL: https://<external_access_host>:<external_access_port>/ui/
2. Log in with your IDM username and password.
3. Click select Operations > Performance & Analysis > Troubleshoot Performance from the left side
navigation.

4. Click and select the field group from the list.


5. Select the normal fieldGroup instead of the forecast.


1.20.27. ProducerBlockedQuotaExceededException error in DI receiver logs

Problem
Data ingestion POST requests sent to the receiver fail with the HTTP status code 429 and the following error appears in the receiver log:

ProducerBlockedQuotaExceededException

Cause
The receiver returns the HTTP status code 429 when the number of requests sent to it exceeds the configured throttle limit, or when one of the components in the data flow pipeline triggers backpressure.

Solution
Perform these steps to resolve this issue:

1. Configure backlog quota for each topic in the Message Bus. To get the backlog quota value of topic configured on your
setup, run the following command:
helm get values <pulsar-release-name> -a -n <namespace> | grep backlogQuotaTopicDefaultLimitSize
2. If the backlog quota configured is 200 MB and the topic has 3 partitions, the backlog quota set for each partition is
200/3 = 66 MB. You may check the backlog quota value of a topic partition, using the following command:

kubectl exec -it itomdipulsar-bastion-0 -n <namespace> -c pulsar -- /bin/bash -c "bin/pulsar-admin namespaces get-backlog-quotas public/default"

Note: Replace <pulsar-release-name> and <namespace> with values corresponding to your deployment.

Messages sent to receiver will be rejected if the backlog quota has exceeded on the topic set in the header of the
request. The following message appears in the receiver logs:

The backlog quota of the topic persistent://public/default/<partition-name> that the producer <producer-name> produces to is excee
ded

Backlog quota indicates a maximum backlog a subscription can have on a topic partition. The earlier message indicates
that one or more consumers on the partition isn't consuming messages at the same rate at which it's ingested. Do the
following steps to identify the subscription that's causing the backlog:

Go to ITOM DI / Pulsar - Topic dashboard and choose affected partition from Topic filter and pick the time range of
the observation.
Check the list of subscriptions for affected partitions in the Local msg batch backlog panel.
Note: Scheduler creates subscriptions that have the following pattern:
<partition-id>_<scheduler-schema-name>_<topic-name>
For example:
0_itom_di_scheduler_provider_default_demotopic
If the subscription that caused the backlog isn't created by the scheduler and created for testing or troubleshooting
or any other purposes, follow up with a respective owner and check whether these are valid subscriptions or not
and identify their purpose. If these are test subscriptions, delete them.
If the subscription belongs to the scheduler, check if itom-di-scheduler-udx pod and Vertica are up and running. Also,
see the following OPTIC DL Message Bus troubleshooting scenarios to resolve the symptoms that caused the
backlog issue:
The itomdipulsar-bookkeeper pods are not accessible
Vertica Streaming Loader dashboard panels have no data loaded


Automatic certificate request from itom-collect-once-data-broker-svc not received


Single message pushed to a topic is not streaming into database
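
To inspect the per-subscription backlog for an affected partition directly from the bastion pod, a command such as the following may help (a sketch; pulsar-admin topics stats is a standard Pulsar admin command and reports the backlog per subscription):

kubectl exec -it itomdipulsar-bastion-0 -n <namespace> -c pulsar -- bin/pulsar-admin topics stats persistent://public/default/<partition-name>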


1.20.28. Service health aggregation

Cause
The tables required for the corresponding ( opr_hi* or opr_kpi* ) report queries aren't created.
The tables required for corresponding report queries don't have data.
Task flow ( opr_hi* or opr_kpi* ) required for populating corresponding tables used in reports isn't running.
There are errors during the execution of task flows.

Solution
If the scripts were run, and there are no errors in the aggregate log files:

1. If the query does not return any data then follow the steps:
1. Check if data from the monitored nodes are sent to the raw tables ( opr_hi_status and opr_kpi_status tables) from the
integrated sources (OBM). If data is sent, continue with the following steps, else see if you have configured the
data sources correctly. See Configure reporting.
2. The flow or sequence to check for the data is:
HI Duration data flow across tables:
opr_hi_status > opr_hi_duration_1h > opr_hi_duration_1d
HI Severity data flow across tables:
opr_hi_status > opr_hi_severity_1h > opr_hi_severity_1d
KPI Status data flow across tables:
opr_kpi_status > opr_kpi_status_1h > opr_kpi_status_1d
2. If there is no data in the opr_hi_duration_*, opr_hi_severity_*, and opr_kpi_status_* tables, check the aggregation scripts.
3. In the database, go to the mf_shared_provider_default schema. In the opsb_internal_service_health_schedule_config_1d and opsb_internal_service_health_schedule_config_1h tables, look for the following columns (a query sketch follows this list):

Note

Aggregation scripts copy data from the raw tables to the respective *_1h aggregate tables. Daily aggregation copies data from
the hourly tables to the respective daily tables. Aggregation scripts run once every 60 minutes.

tablename: displays the source table from which the data is copied.
processed_time: displays the time up to which data was considered for aggregation.
execution_time: displays the last time the copy script was executed.
last_processed_epoch: displays the Vertica epoch value of the last processed row from the raw tables.

4. If the scripts were run, check the script log files:


1. In the NFS server, go to <mounted log path>/reports/service_health.
2. Look for errors in the:
opr_hi*.log file for HI
opr_kpi*.log file for KPI
5. If there are no errors in the logs, change the log level to INFO and verify the log messages in the next run. To change the
log level, go to /<conf vol path on NFS>/di/postload/conf/reports, open the <scriptname>log4perl.conf file, and change
log4perl.logger.<scriptname> to INFO.


Note

After completing the analysis, change the log level back to ERROR. Retaining it as INFO may lead to log file locking issues
and cause the task script to hang.

While the task script runs, you can view more details in the log file, including the queries run and the number of records updated.

6. Check if there is data in the raw tables. If there is no data in the service health raw tables (opr_hi* or opr_kpi*), see Data and Task flow - Service health aggregation.
7. If the raw tables don't have data, see Data flow from OBM to raw tables.
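
The checks in steps 2 and 3 can also be run from any host that has the Vertica vsql client installed; the following is a minimal sketch. The host, user, and password are placeholders, and the schema qualification of the raw table is an assumption for your deployment:

# Hypothetical example: check the aggregation bookkeeping columns and a raw-table row count
vsql -h <vertica-host> -U <db-user> -w <password> -c "SELECT tablename, processed_time, execution_time, last_processed_epoch FROM mf_shared_provider_default.opsb_internal_service_health_schedule_config_1h;"
vsql -h <vertica-host> -U <db-user> -w <password> -c "SELECT COUNT(*) FROM mf_shared_provider_default.opr_hi_status;"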

Related topics
Data Processor Post-load task flow not running
Aggregate functionality is not working as expected
Reporting data and task flows
Aggregate not happening after upgrade
System Infrastructure Availability data is missing in reports


1.20.29. CI enrichment not available for metric data

Inaccurate downtime and topology reports.

Cause
DES service is unable to update the downtime and CI enrichment fields.

Solution
To resolve the issue, follow the steps below:

Ensure the rules and topology for the content are available in DES.
Log in to the DES pod and execute the following command:
curl --key /var/run/secrets/boostport.com/server.key --cert /var/run/secrets/boostport.com/server.crt --cacert /var/run/secrets/boostport.com/issue_ca.crt -H "Content-Type: application/json" -X GET https://<externalAccessHost>:30010/v1/itomdes/reconciliation | jq . | tee
Check that the topic name is available under the rules.
Check that the view name and attributes available under views in the previous result match the view name and CI attributes in RTSM.

Analyze the enrichment from the log file


Find the log file at <path to NFS log-volume>/cloud-monitoring/des/<namespace>__<podname>__<container-name>__<node-name>. The name of the log file is data-enrichment.log.
If enrichment has failed, find the information in the log.
Example:
time="2022-07-25T08:33:36Z" level=info msg="Record Count:0 Time Taken for Search :2.719923ms Query :@name:server1_sac\\-hv
m00668Node01Cell_8880 @primary_dns_name:sac\\-hvm00668\\.swinfra\\.net @ciType:websphereas Topic Name:opsb_websphere_jd
bc" Application=data_enrichment PID=103 file="reconciliation_parser.go:245" func="task.reconcileTask.updateMetricRecord()"

Verify Redis
Connect to the Redis CLI and verify that the keys are present by running the following:
kubectl exec -ti <cs-redis pod name> -c <cs-redis container name> -n <namespace> -- bash
Example: kubectl exec -ti itom-opsbridge-cs-redis-dc8948476-mmjsl -c cs-redis -n opsbridge-suite -- bash
Get the Redis password using the get_secret command:
>get_secret $REDIS_PWD_KEY
<password of redis instance>
Connect to the Redis CLI using the command:
>redis-cli --cert /var/run/secrets/boostport.com/cs-redis.crt --key /var/run/secrets/boostport.com/cs-redis.key --cacert /var/run/secrets/boostport.com/ca.crt -h cs-redis -p 6380 --tls -a <password of redis instance>
Check if the CI ID is present in Redis using the keys ciid* command.
To fetch the properties cached for the CI, execute the command:
HGETALL ciid<Required ID from Keys Result>

Verify RTSM
If the Redis keys don't list the CI ID, verify whether the RTSM view lists the CI.
If the Redis key is available but the search query Record Count is 0, verify whether the CI property values from the HGETALL
result of the CI match the RTSM properties of the CI.


1.20.30. Downtime enrichments aren't forwarded to OPTIC Data Lake

After modifying the DES endpoint in the OBM infrastructure settings and enabling it for forwarding to OPTIC Data Lake, downtime
data isn't available in the opr_downtime table.

Cause
Incorrect infrastructure setting in OBM.

Solution
Do the following to resolve the issue:

Make sure the DES endpoint is https://<externalAccessHost>:30010/v1/itomdes, if you are using Classic OBM.
Make sure the DES endpoint is https://itom-opsbridge-des-svc:40009/v1/itomdes, if you are using Containerized OBM.
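
To confirm that the Classic OBM server can reach the DES endpoint at all, a quick hedged check such as the following can help; the host and port are placeholders and certificate validation is skipped for brevity:

# Hypothetical example: verify that the DES endpoint is reachable from the classic OBM host
curl -sk -o /dev/null -w "%{http_code}\n" https://<externalAccessHost>:30010/v1/itomdes
# A connection error or timeout points to a network or endpoint problem rather than an OBM setting issue.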


1.20.31. Troubleshooting topology centric reports


The topology centric reports have any of the following issues:

No data in the tables:


cmdb_entity_relation_businesselement_ci_latest
cmdb_entity_relation_cicollection_ci_latest
cmdb_entity_relation_location_ci_latest
cmdb_entity_relation_node_ci_latest
Empty result set for parameter queries or data queries.

Solution 1
If there is no data in the tables, check the data flow into the cmdb tables. If the data is up to date, check the custom script log file.

Go to the following file and check for errors:

<mount path for log directory>/reports/cmdb/cmdb_helper.log

Solution 2
If there are no errors in the logs, change the log level to INFO and verify log messages in the next run.

Follow these steps to change the log level:

1. Go to /<conf vol path on NFS>/di/postload/conf/reports .


2. Open the cmdb_helperlog4perl.conf file.
3. Change the log4perl.logger.cmdb_helper value to INFO .
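
A minimal sketch of this edit on the NFS conf volume follows; the path is the placeholder used above, and the exact property line may differ in your file:

# Hypothetical example: locate and edit the cmdb_helper logger level
cd /<conf vol path on NFS>/di/postload/conf/reports
grep -n "log4perl.logger.cmdb_helper" cmdb_helperlog4perl.conf
vi cmdb_helperlog4perl.conf   # change the level on that line from ERROR to INFO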

In the next task script run, the respective script log files will contain detailed logs, including the queries run and the number of records updated.

For more information, see Troubleshoot Reports.

Note

Make sure you change the log level back to ERROR after completing the analysis. Leaving it at INFO may lead to log file locking
issues and cause the task script run to hang.


1.20.32. Issue with Data Enrichment Service with Classic OBM Integration

Troubleshoot issue with CI enrichment for metrics


This topic offers steps to fix missing cmdb_id enrichment or old cmdb_id enrichment for the metrics sent to the Data
Enrichment Service for storing in OPTIC DL.

Cause
In the case of external OBM, the Data Enrichment Service collects CIs from RTSM at a fixed interval provided by the ciCollectionIntervalMin suite configuration. By default, the interval is 60 minutes, so any CIs discovered in OBM take a maximum of
60 minutes to become available for enrichment in DES. The same applies to CIs that are redeployed and resolved to a new CI ID.

Solution
If you need to enrich the metrics with cmdb_id sooner, set ciCollectionIntervalMin to a smaller interval. The
configuration can be updated in the AI Operations Management values.yaml. Edit the ciCollectionIntervalMin value under
itomopsbridgedes > des > cicache in values.yaml (as shown below) and update the chart. For more information, see Configure values.yaml.

Note

Lowering the interval will add an additional load on the RTSM scans from DES.

itomopsbridgedes:
  des:
    cicache:
      # When external OBM is configured, Data enrichment Service will collect CIs from RTSM at fixed interval provided with ciCollectionIntervalMin. Value should be provided in minutes.
      ciCollectionIntervalMin: "60"
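
After editing values.yaml, apply the change with a Helm upgrade. This is a sketch only; the release name, chart reference, and namespace are placeholders for your deployment:

# Hypothetical example: apply the updated values.yaml to the existing release
helm upgrade <release-name> <opsbridge-suite-chart> -n <namespace> -f values.yaml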

Alternatively, edit the itom-opsbridge-des-cicache-builder deployment and update the environment variable "CI_COLLECTION_INTERVAL_MIN".


1.21. Troubleshoot Open Data Ingestion


This section covers the following troubleshooting scenarios:

Error “Failed to authorize user” error code: 3002


Error “Failed to authorize request” error code: 3001


1.21.1. Error “Failed to authorize user” error code: 3002

The OPTIC Data Lake API call for data access fails with the following error even if the token is valid:

{"statusCode":403,"reasonPhrase":null,"errorCode":"3002","errorSource":"Data Access","message":"Forbidden","details":"Failed to auth


orize user","recommendedActions":"Please check if the user associated with token has appropriate roles","data":null,"nestedErrors":null
}

Cause
This is because the user role doesn't have the right privileges for the service used. While you deploy the application, make
sure the users are mapped to the correct roles during user creation.

Solution
Perform these steps to resolve this issue:

Make sure the user that you use for token authorization has the appropriate associated role:

For the OPTIC Data Lake Administration API calls, the user must have di_admin role.
For the OPTIC Data Lake HTTP Receiver API ingestion calls, the user must have di_ingestion role.
For OPTIC Data Lake Data Access API calls, the user must have di_data_access role.

For steps to create these users and assign roles, see OMT documentation. Set the correct role and perform the API call.


1.21.2. Error “Failed to authorize request” error code: 3001

The OPTIC DL API call for data access fails with the following error even if the token is valid and the associated role is correct:

{ "statusCode": 401, "reasonPhrase": null, "errorCode": "3001", "errorSource": "Data Access", "message": "Authorization Error"
,
"details": "Failed to authorize request", "recommendedActions": "Please check if the valid token/cert is provided", "data": null, "
nestedErrors": null
}

Cause
This issue occurs because the token has expired. The token has a limited lifetime; the default is 30 minutes.

Solution
To resolve this issue, you must refresh the IdM token. Send the following content to https://<HOST>:<PORT>/idm-service/v3.0/tokens to refresh the token ID:

{
"refresh_credentials":{ "refresh_token":"<REFRESH_TOKEN>" }
}
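
As a sketch, the refresh request can be sent with curl; this assumes a POST with a JSON body and skips certificate validation, so adjust it to your environment:

# Hypothetical example: refresh the IdM token (replace host, port, and token)
curl -k -X POST "https://<HOST>:<PORT>/idm-service/v3.0/tokens" -H "Content-Type: application/json" -d '{"refresh_credentials":{"refresh_token":"<REFRESH_TOKEN>"}}'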


1.22. Troubleshoot CMI tool errors


This section provides you with the steps to troubleshoot issues that you may face while using the Custom Metric Ingestion
tool.

Logging details
JAR Name                                 Log File Name              Log Location

codatoexcel-<version>.jar                codatoexcel.log            <jar directory>

exceltojson_schema-<version>.jar         exceltojson_schema.log     <jar directory>

exceltojson_oacollector-<version>.jar    oacollector.log            <jar directory>

By default, the log level is set to DEBUG, but you can change it to ERROR, WARN, or INFO in the log4j2.properties file
present in the <jar directory>.
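
A minimal sketch of that edit; the exact property names in log4j2.properties may differ between tool versions, so check the file first:

# Hypothetical example: inspect and edit the CMI tool log level
cd <jar directory>
grep -in "level" log4j2.properties
vi log4j2.properties   # set the level to ERROR, WARN, or INFO as needed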


1.22.1. The process can't access the file because it's being used by another process

Cause
This error occurs when the <Datasource.xlsx> or Configuration.xlsx files are open while running the CMI tools.

Solution
Close the <Datasource.xlsx> and Configuration.xlsx files before running the tools.


1.22.2. Configuration not found in the configuration sheet

Error: Configuration not found for '<table name>' in the configuration sheet '<sheet name>'

Cause
This error may occur when you run the exceltojson_schema tool if there is a typo while updating the table names in
<Datasource.xlsx> or Configuration.xlsx.

For example: [ERROR]: Configuration not found for 'opsb_weblogic_cluster_status' in the configuration sheet 'Schema-Raw'

Solution
Correct the name in <Datasource.xlsx> and Configuration.xlsx.


1.22.3. Field name has invalid input at row number in the sheet

Error: '<Field name>' has invalid input at row <row number> in the sheet '<sheet name>'

Cause
This error may occur when you run the exceltojson_schema tool if there is no input or invalid input for a field in
the Datasource.xlsx or Configuration.xlsx.

For example: [ERROR]: 'type' has invalid input at row 3 in the sheet 'opsb_weblogic_cluster_status'

Solution
Correct the values and run the tool again.


1.22.4. [ERROR]: Column name has invalid input at row number in the configuration sheet

Error: '<Column name>' has invalid input at row <row number> in the configuration sheet '<Sheet name>', '<Metric name>' metric id doesn't exist in sheet '<Table name>'

For example: [ERROR] : 'groupByFields' has invalid input at row 3 in the configuration sheet 'Aggregation', 'instance_namee_id' metric id doesn't exist in sheet 'opsb_ad_ws_servcoll'

Solution
Make sure that the id column value in the table (for example, the opsb_ad_ws_servcoll table) in <Datasource.xlsx> matches the value given in the corresponding column (for example, the groupByFields column) of the sheet (for example, the Aggregation sheet) in Configuration.xlsx.


1.22.5. Type mismatch exception “Can't get a STRING value from a NUMERIC cell”

Cause
The data type entered for the respective field in the .xlsx file isn't correct.

Solution
Correct the values and run the tool again.


1.22.6. CMI tool fails to generate Excel files when run directly on an OA machine

The CodaToExcel tool fails to generate Excel files when run directly on OA machines in 2021.11, 2022.05, and 2022.11.
The tool exits with the following exception: Class: coso.integration.tool.Utils, Method: isWindows, Line No.: 39, Message: OS is not Windows

Solution
Dump the Coda data object to a text file and give the text file as a source to the CodaToExcel binary using the -s parameter.

Dump the Coda object using the following commands:

/opt/OV/bin/ovcodautil -obj > /tmp/CMI_2021.11.001/NewDump.txt

java -jar codatoexcel-2021.11.jar -s ./NewDump.txt


1.23. Troubleshoot Integration


This section provides the following troubleshooting topics:

Change password for a UCMDB user


Inherited SiteScope integration role is missing
SiteScope topology doesn't appear in RTSM
SiteScope (non-TLS) - OBM (TLS) integration fails while configuring the connected server
Duplicate SiteScope node CIs in the OBM RTSM


1.23.1. Change password for a UCMDB user


To change the password for a UCMDB user:

1. Log in to the OBM RTSM using the Local Client. Ensure that the Local Client is installed on the OBM server. For more
information, see Use Local Client.
2. Go to Users and Groups.
3. Select the user and click Reset Password. Enter and confirm the new password.
4. In the Roles tab, check that both of the following are shown:
Inherited Role contains "SiteScope Integration Roles"
Parent Groups contains "OBM integration admins"


1.23.2. Data forwarding issues from classic OBM to OPTIC Data Lake

Events, downtime, and service health data is not getting forwarded from classic OBM to OPTIC Data Lake.

Cause
This issue is seen in classic OBM integrations with AI Operations Management installed on GCP. The data forwarding fails due
to incorrect OPTIC Data Lake hostnames populated by the obm-configurator.jar file.

Solution
1. In the AI Operations Management installation, collect the helm values for service names from the values.yaml file.

2. In the classic OBM, go to Administration > Setup and Maintenance > Infrastructure Settings and select OPTIC DL.

3. Under OPTIC DL - Settings, update the following services with the data collected from the values.yaml file:

Data Access Endpoint

Data Enrichment Service Receiver Endpoint

Data Receiver Endpoint

4. Go to Administration > Setup and Maintenance > Connected Server > COSO Data Lake and update the Fully
qualified domain name with the Data Enrichment Service Receiver Endpoint for connecting to OPTIC DL.


1.23.3. Inherited SiteScope integration role is missing

If the Parent Groups contains OBM Integration Admins, but the Inherited Roles is missing SiteScope Integration
Role, then do the following:

1. In Users and Groups, click the Groups tab.


2. Select the OBM Integration Admins group.
3. In the Assigned Roles section, click Edit.
4. From the list of Available Roles, select SiteScope Integration Role.
5. In the Select Tenants for Role popup, accept the default All Tenants and click OK.


1.23.4. SiteScope topology doesn't appear in RTSM

After integrating the containerized OBM with SiteScope, if the SiteScope topology doesn't appear in RTSM, you can check for
errors in the SiteScope logs. If you see an error in <SiteScope>\logs\bac_integration\discovery.log , clear the topology cache
on SiteScope:

Stop the SiteScope service.


Delete the four files in the <SiteScope>\discovery\hsqldb directory.
Start the SiteScope service.
Synchronize the topology. Log in to SiteScope and navigate to Preferences > Integration Preferences. Edit APM
Integration, and in the resulting window click Re-Synchronize under APM Preferences Available Operations.


1.23.5. SiteScope (non-TLS) - OBM (TLS) integration fails while configuring the connected server

The non-HTTPS SiteScope tries to communicate with the HTTPS OBM via SSLv3, which isn't allowed by OBM's
TLS configuration. As a result, Apache aborts the connection attempts. If possible, switch SiteScope to HTTPS and retry the
integration. If this isn't possible, make sure that SiteScope and OBM exclude the same protocols and ciphers.
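
To check which protocol versions and ciphers the OBM gateway actually accepts, an openssl probe from the SiteScope host can help; the host and port are placeholders:

# Hypothetical example: confirm that the OBM gateway negotiates TLS 1.2
openssl s_client -connect <obm-gateway-host>:443 -tls1_2 </dev/null 2>/dev/null | grep -E "Protocol|Cipher"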


1.23.6. Duplicate SiteScope node CIs in the OBM RTSM

If you have duplicate SiteScope node CIs in OBM RTSM, merge the CIs as follows:

1. Go to Administration > RTSM Administration > Modeling > IT Universe Manager.

2. Search for the SiteScope node CIs.

3. Select the duplicate CIs, right-click, and select Merge CIs from the context menu. As the merge target, select the CI
that has the FQDN as a primary domain name.


1.24. Contact support


You can contact support at Support and services.

Before you contact support, run the support toolset (for OPTIC Management Toolkit issues) or the diagnoseIdM command (for
IdM issues) to collect diagnostic information that will help support.

Support toolset
OMT provides a support toolset that helps to collect the following information about Containerd, Kubernetes, suites,
commands, directories, and files:

Containerd: containers, inspect, containerd service systemd logs


Kubernetes: nodes, pods, namespaces, images, containers, cluster-info , describe, and logs
Suite: cdfapiserver-db dump, suite data, modules, products deployments, and features
Commands: user defined
Directories and files: user defined

You can view the summary information on a console, and view detailed output information in an encrypted .tar file.

How to use the support toolset


To use the support toolset, follow these steps:

1. Log on to the control plane node.

2. Run the following command:

cd $CDF_HOME/tools/support-tool

3. Run the following command:

# ./support-dump [ -c <dump-filename-with-path> ] [-u <username> [-p <password>]] [-P <package_password>]

Note

Example usage

Run the following command to create a dump file with a default file name in a default directory:

# ./support-dump

Run the following command to create a dump file with a specified file name in a specified directory (for example, create
a dump.aes file in /var/test):

./support-dump -c /var/test/dump.aes

Run the following command to create a dump file with a specified user name and password. For example, create a
dump file with a default file name in a default directory with the password abcdef. Connect the suite-installer with
admin as the user and 123456 as the password.

# ./support-dump -u admin -p 123456 -P abcdef

4. Run the following command to unpack the dump file:

dd if=xxxx.aes |openssl aes-256-cbc -md sha1 -d -k <package_password>|tar zxf -


Encryption of the output file


A dump file is encrypted by default. However, if you choose not to encrypt the output file, you can use the --no-encrypt-output
option.
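
For example (a sketch only, combining the option with the command shown earlier):

# ./support-dump --no-encrypt-output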

Caution

By selecting the disable output file encryption option, you are disabling or bypassing security features, thereby exposing the
system to increased security risks. By using this option, you understand and agree to assume all associated risks and hold
OpenText harmless for the same.

How to use configuration files

The support toolset provides a configuration file (conf/supportdump.config) that includes some predefined [commands], [files],
and [dirs] to specify information collection details. You can define your own [commands], [files], and [dirs] in this configuration
file. Additionally, you can create other configuration files in the same directory. When using the configuration files, pay
attention to the following:

The output of the same command will be saved into one file. For example, all the output of the cat command will be
saved to the cat.out file.
All directories, files, and output of commands are stored in the <local_ip>-<NodeType>/os directory.
Wildcard characters can be used in a file name and directory name. For example, /etc/sysconfig/network-scripts/ifcfg-*
Single environment variable is supported. For example, ${CDF_HOME}/log .
A file or files (separated by spaces) following a directory will be excluded from the support toolset collection.

Note

Example usage
The support toolset collects all files and directories in ${CDF_HOME}/cfg except the *_User.json file:
${CDF_HOME}/cfg *_User.json

Dump file

The default support dump file is: dmp/support_data_YYYYMMDD-hhmmss.aes . The dump file contains the support_data_YYYYMMDD-
hhmmss.log file of the running support toolset and the ITOM_Core_Platform directory for the dump files. The following table
describes the dump files in the ITOM_Core_Platform directory.

Name: <local_ip>-<NodeType> (Directory)
Description: The directory of container information and user-defined information on the current node.

  workload:
    containerd_config.out: output of the final configuration for containerd
    containerd_images.out: image summary managed by containerd
    containerd_inspect.out: detail information for each image
    containers.out: container detail information managed by containerd
    journalctl_containerd.out: logs for the containerd service

  os: user-defined commands, directories, and files
    commands: directory of output files of commands defined in the [commands] section in .config files. The file name format: <command>.out.
    other directories: directories and files defined in the [files] and [dirs] sections in .config files. The structure of directories is preserved.

Name: global (Directory)
Description:
  deployment: suite related data, for example, suite feature, embedded suite database data
  helm: summary of helm information
  kubernetes: details of Kubernetes related resources, for example, pods, clusters
  platform: summary of pods information


Example: Running the support toolset


The following output is an example of running the support toolset on a console.

[root@myhost support-tool]# ./support-dump


Package password:
Retype package password:

##############################################
OMT - Support Data Export

Date: 2021-08-23 09:49:18


Current node: myhost.mycomany.com
Node type: Master
Containerd: vv1.5.5
Kubernetes: server-v1.21.4 client-v1.21.4
##############################################

----------------------------------------------
Containers in k8s.io namespace
Export: containers.out
Comments: on Master node myhost.mycomany.com
----------------------------------------------
CONTAINER IMAGE RUNTIME
0318347028899bdd7fb30af24ca5fda6a1433a4532519adc39e1a63cc2191a02 itom-image.registry.example.com:port/hpeswitomsand
box/kubernetes-vault-init:0.15.0-0019 io.containerd.runc.v2
04880158976e091d2ff6efa8da436f90104779f289a7a587a982e92df8e85145 itom-image.registry.example.com:port/hpeswitomsandb
ox/opensuse-base:15.3-002 io.containerd.runc.v2
......

Collecting containerd images information


Collecting containerd configuration information

----------------------------------------------
Nodes
Export: kube_summary.out
----------------------------------------------
NAME STATUS ROLES AGE VERSION
myhost.mycomany.com Ready control-plane,master,worker 8h v1.21.4

----------------------------------------------
Pods
Export: kube_summary.out
----------------------------------------------
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINA
TED NODE READINESS GATES
core apphub-apiserver-5b555c6896-jftc8 2/2 Running 0 8h 172.16.0.23 myhost.mycomany.com
<none> <none>
core apphub-ui-54959bb964-m6cx6 2/2 Running 0 8h 172.16.0.22 myhost.mycomany.com
<none> <none>
......

----------------------------------------------
POD Containers
Export: containers_by_pod.out
----------------------------------------------
NAMESPACE POD NODE IMAGE
CONTAINER CONTAINER_ID
core apphub-apiserver-5b555c6896-jftc8 myhost.mycomany.com itom-image.registry.example.com:port/hpeswi
tomsandbox/apphub-apiserver:1.1.0-49 apphub-apiserver 701a4434b5b6
core apphub-apiserver-5b555c6896-jftc8 myhost.mycomany.com itom-image.registry.example.com:port/hpeswi
tomsandbox/kubernetes-vault-renew:0.15.0-0019 kubernetes-vault-renew fe298a393acb


......

Management Portal user(-p):admin


Password:

----------------------------------------------
Suite Deployment
Export: suite_features.out
----------------------------------------------
SUITE VERSION NAMESPACE DEPLOYMENT_STATUS INSTALL_DATE NFS_SERVER NFS_OUTPUT_PATH
demo 2021.11.001 demo-bheco INSTALL_FINISHED null null

----------------------------------------------
Suite Features
Export: suite_features.out
----------------------------------------------
SUITE EDITION SELECTED FEATURE_SET FEATURE
demo <<EDITION_EXPRESS>> true <<FS1_NAME>> <<FS1_DESC>>
<<FS2_NAME>> <<FS2_DESC>>
<<FS3_NAME>> <<FS3_DESC>>
<<EDITION_PREMIUM>> false
<<EDITION_ULTIMATE>> false

Inspect containers .................... exported to containerd_inspect.out


cluster-info dump .......... exported to cluster_info.out
describe pods ................... exported to kube_describe.out
Suite data ........................ exported to suite_data directory
Get logs from all pods ............ exported
Run kube-status.sh ................ exported
Run get pods wide ................. exported
Get describe from all pods ........ exported
Making OS commands & files list ... done
Running OS commands in list ....... done
Collecting OS files in list ....... done
Packing dump files ................ package file is /opt/kubernetes/tools/support-tool/dmp/support_data_20210823-094901.aes.

Please use below command to uncompress the package file:


dd if=/opt/kubernetes/tools/support-tool/dmp/support_data_20210823-094901.aes |openssl aes-256-cbc -md sha1 -d -k <your_pass
word>|tar zxf -

diagnoseIdM command
The diagnoseIdM command is a subcommand of the idm.sh script. It collects diagnostic information about IdM, including:

All IdM log files


All IdM configuration files
Tomcat configuration files
IdM version information
Frequently used SQL query results

Diagnostic information is saved to a file in the /idmtools/idm-installer-tools/ directory. To run the command, follow these steps:

1. Run the following command to get into the IdM pod:

kubectl exec -it $(kubectl get pod -n <namespace> -ocustom-columns=NAME:.metadata.name |grep idm|head -1) -n <namespace> -c idm sh

For example:

kubectl exec -it $(kubectl get pod -n core -ocustom-columns=NAME:.metadata.name |grep idm|head -1) -n core -c idm sh


2. Run the following command to collect diagnostic information about IdM:

sh /idmtools/idm-installer-tools/idm.sh diagnoseIdM

To collect diagnostic information about the SAML configuration (SAML metadata, and the status of signing and
encryption certificates), run the following command:

sh /idmtools/idm-installer-tools/idm.sh diagnose samlMetadata -host <FQDN> -port <port>

For example:

sh /idmtools/idm-installer-tools/idm.sh diagnose samlMetadata -host mymachine.mycompany.com -port 1234


© Copyright 2024 Open Text


For more info, visit https://docs.microfocus.com

