Data Domain - Historical Database Pruning Failed After Upgrading To DDOS 6.1 or 6.2 - Dell US

Article Number: 000055469

Data Domain: Historical database pruning failed after upgrading to DDOS 6.1 or 6.2

Summary: How to address the historical database failing to prune after certain DDOS upgrades. Left unresolved, this eventually fills
the DD OS /ddr/ partition and causes downtime.

Audience Level: Partners

Article Content

Symptoms

Data Domain Restorers (DDRs) use a historical database to record events (such as various performance metrics) over time.
This database is a SQLite database file held under /ddr/hd, for example:

/ddr/hd:
-rw-r--r-- 1 root root 1490051072 Jun 8 07:23 dd_hd.db

Note that the /ddr/hd directory sits on the /ddr file system:

Filesystem       1K-blocks    Used Available Use% Mounted on
/dev/dd_dg0_0p14   5160576 3691320   1207112  76% /ddr

Once a day, cron starts a 'prune' job that removes old or unnecessary data from the historical database to prevent it from growing too
large:

config.crontab.hd_prune = 12 12 * * * root /ddr/bin/dd_hd_rdb_tool -p
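
Splitting the schedule portion of that entry into its fields shows when the job fires. The following is a minimal shell sketch, assuming the value uses the standard five cron time fields followed by a user and a command; the variable names are illustrative only:

```shell
# Split a system crontab entry into its fields. Assumes the standard layout:
# minute hour day-of-month month day-of-week user command
entry='12 12 * * * root /ddr/bin/dd_hd_rdb_tool -p'

# read assigns one word per variable; the last variable receives the remainder.
read -r minute hour day_of_month month day_of_week user cmd <<< "$entry"

echo "minute=$minute hour=$hour user=$user"
echo "command=$cmd"
```

With the minute and hour fields both set to 12 and the three date fields left as wildcards, the job is scheduled for 12:12 every day.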

Under certain circumstances, however, this job can fail (particularly after some upgrades to 6.2.x code). When this happens,
corresponding messages appear in messages.engineering:

Feb 4 12:10:00 dd2500-01 hdc: INFO: SQLITE: rc=1, no such table: hd_perf_clgrp
Feb 4 12:10:00 dd2500-01 hdc: INFO: Error: <5146: Internal error accessing historical database. Diagnostics:
Operation(prepare stmt), Reported by(hdal), Location(hdal_db_sqlite3.c, 237, _get_object_meta_info), Database
error(sqlite3, 1, no such table: hd_perf_clgrp), ddr/lib/dd_hd_rdb_sqlite.c, dd_hd_rdb_report_error_sqlite, 3825>

In addition, an alert will be raised indicating that pruning has failed:

Feb 4 12:12:02 dd2500-01 dd_hd_rdb_tool: INFO: Event posted: p0-19 (11000013:285212691): EVT-HD-RDB-0004: Historical
database pruning failed.
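
Both kinds of message can be searched for in one pass. The sketch below is only illustrative: the function names are hypothetical, and the patterns are taken from the two log excerpts above rather than from any documented message catalogue.

```shell
# Print pruning-related failures found in a log file.
# Usage: find_prune_errors <logfile>
find_prune_errors() {
    local logfile=$1
    # -E enables alternation between the two message patterns.
    grep -E 'no such table: |EVT-HD-RDB-0004' "$logfile"
}

# Print the distinct table names reported as missing, one per line.
# Usage: missing_tables <logfile>
missing_tables() {
    local logfile=$1
    # -o keeps only the matched text; awk then takes its last word (the table name).
    grep -oE 'no such table: [A-Za-z0-9_]+' "$logfile" | awk '{print $NF}' | sort -u
}
```

For example, `missing_tables /ddr/var/log/debug/messages.engineering` lists each table named in a "no such table" error once.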

If pruning fails repeatedly, old or unnecessary data cannot be removed from the historical database, so the database slowly
grows in size.

Eventually this can cause the /ddr file system to become 100% full (either transiently during attempted pruning or permanently
if the historical database grows to a sufficient size). This in turn causes further issues, such as the system being unable
to update registry files, and can lead to system instability or unexpected DDFS restarts.

Cause

Pruning the historical database generally requires as much free space in the /ddr file system as the current size of the historical
database. If /ddr/ has become nearly full, the pruning job may fail, which only adds to disk space consumption on /ddr/ over time.
This article, however, focuses on pruning that fails for reasons other than a prior lack of space. Something in the sequence of DDOS
upgrades, or other problems encountered during those upgrades, may have left a table missing from the historical DB.
A missing table is not something that the sqlite validation check will catch. The missing table is typically "hd_perf_clgrp" or
"hd_space_fmig_runs_detailed", as seen below:

Feb 4 12:10:00 dd2500-01 hdc: INFO: Error: <5146: Internal error accessing historical database.
Diagnostics: Operation(prepare stmt), Reported by(hdal), Location(hdal_db_sqlite3.c, 237,
_get_object_meta_info), Database error(sqlite3, 1, no such table: hd_perf_clgrp),
ddr/lib/dd_hd_rdb_sqlite.c, dd_hd_rdb_report_error_sqlite, 3825>

If this error, or any other error about a missing table, keeps appearing in the logs (messages.engineering), or if alerts about the
historical database failing to prune show up daily, contact your contracted support provider as early as possible to have the underlying
issue resolved.
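
Whether one of the tables named in this section is actually absent can be checked by querying the database's schema catalogue. This is a hedged sketch, not a supported procedure: it assumes the stock sqlite3 command-line shell is on PATH and accepts a database file and an SQL statement as arguments, and the function name is hypothetical.

```shell
# Report which of the named tables are absent from a SQLite database file.
# Assumes the standard sqlite3 command-line shell is available.
# Usage: report_missing_tables <db-file> <table>...
# Returns 0 if all tables exist, 1 if any is missing, 2 if the file is absent.
report_missing_tables() {
    local db=$1
    shift
    [ -f "$db" ] || { echo "no such file: $db" >&2; return 2; }
    local table found status=0
    for table in "$@"; do
        # sqlite_master is SQLite's built-in schema table.
        found=$(sqlite3 "$db" \
            "SELECT count(*) FROM sqlite_master WHERE type='table' AND name='$table';")
        if [ "$found" != "1" ]; then
            echo "missing table: $table"
            status=1
        fi
    done
    return "$status"
}
```

A call such as `report_missing_tables dd_hd.db hd_perf_clgrp hd_space_fmig_runs_detailed` would typically be run against a copy of the database rather than the live file. Note that a query which fails for any other reason is also reported as a missing table.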

Resolution

Data Domain Support will most likely need a remote session to troubleshoot the root cause of the historical database not pruning.
If the pruning issue is left unresolved, disk usage on the smaller /ddr/ partition keeps increasing until the partition fills up,
potentially making the DD unavailable and causing downtime.

Unless the pruning problem is the result of /ddr/ already being nearly full, the resolution involves fixing the historical database
structure and, in some cases, running the pruning action off the small /ddr/ partition so that the process does not run out of space.
Neither of these actions requires downtime or affects running backups or other DD activities. The only downside is that some
historical and performance entries may be dropped from the database while it is being made consistent or pruned on a
separate, larger partition.

Note that the code issue in which an existing trigger prevents the historical database upgrade (and hence causes the daily pruning
failures) was fixed in the following releases:

DDOS 6.1.2.40 and later
DDOS 6.2.0.20 and later

Hence, any customer planning to upgrade to DDOS 6.1 or DDOS 6.2 for the first time is strongly advised to upgrade directly to one of
the fixed releases above.
Note: DD OS upgrade will not resolve the issue if the Data Domain is already having historical database pruning errors.
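
When planning an upgrade, a target release can be compared against the two fixed releases listed above. The following is a minimal sketch, assuming a plain dotted version string with no build suffix and a `sort` that supports `-V` (version sort, as in GNU coreutils); the function names are hypothetical.

```shell
# Succeed if version $1 is the same as or newer than version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if a DDOS 6.1.x or 6.2.x release includes the fix.
# Returns 2 for any other release family, which this article does not cover.
has_prune_fix() {
    case $1 in
        6.1.*) version_ge "$1" 6.1.2.40 ;;
        6.2.*) version_ge "$1" 6.2.0.20 ;;
        *)     return 2 ;;
    esac
}
```

For example, `has_prune_fix 6.2.0.10` fails while `has_prune_fix 6.2.0.35` succeeds. As the note above says, this only helps when choosing a target release; it does not repair a database that is already failing to prune.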

Additional Information

This content is translated in other languages:

https://downloads.dell.com/TranslatedPDF/PT-BR_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/ZH-CN_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/ES_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/DE_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/FR_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/IT_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/JA_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/NL_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/KO_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/RU_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/PT_KB531069.pdf
https://downloads.dell.com/TranslatedPDF/SV_KB531069.pdf

Errors raised while pruning the historical database are logged in "messages.engineering", so "log view debug/messages.engineering" can
be used to search for them, bearing in mind that the pruning job is scheduled to start at 12:12 PM local DD time every day. It is useful to
provide DD Support with any matching errors, or a full SUB, up front for analysis.

Partner Notes
If any of the actions above fail because the /ddr/ partition is too full, and there is no way to free enough space to complete those
actions, the DD may even enter a limited session with "write error: No space left on device".

The historical database then has to be copied from /ddr/hd/ to a partition with more free disk space, and the actions performed on the
copy. Once the copy is consistent, upgraded and pruned, it is copied back to replace the original. The steps are summarized as
follows:

Delete the dd_hd.backup* and dd_hd.tmp_backup* files under /ddr/hd:
# du -sh /ddr/hd/*
509M dd_hd.backup.db.gz
2.3G dd_hd.db
496M dd_hd.tmp_backup.db.gz
# rm /ddr/hd/dd_hd.backup.db.gz /ddr/hd/dd_hd.tmp_backup.db.gz

Make a copy of the large historical database somewhere else after verifying that the destination partition has enough extra space:
# cp /ddr/hd/dd_hd.db /ddr/var/tools/
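
The space check mentioned in this step can be scripted before running the copy. This is a rough sketch under stated assumptions: the function name is hypothetical, and it relies on the POSIX output format of `df -P`, where the fourth column of the second line is the available space.

```shell
# Succeed if <dest_dir> has at least as much free space as <file> occupies.
# Usage: fits_in <file> <dest_dir>
fits_in() {
    local file=$1 dest_dir=$2
    local need_kb avail_kb
    # du -k: space the file uses on disk, in 1K blocks.
    need_kb=$(du -k "$file" | awk '{print $1}')
    # df -Pk: one line per file system; column 4 is available space in 1K blocks.
    avail_kb=$(df -Pk "$dest_dir" | awk 'NR == 2 {print $4}')
    echo "need ${need_kb} KB, available ${avail_kb} KB in ${dest_dir}"
    [ "$avail_kb" -ge "$need_kb" ]
}
```

It could be combined with the copy as `fits_in /ddr/hd/dd_hd.db /ddr/var/tools && cp /ddr/hd/dd_hd.db /ddr/var/tools/`. Since the later upgrade and prune steps also work on the copy, leaving generous headroom beyond the bare file size is sensible.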

Perform any necessary actions on the copy of the database (such as dropping the offending trigger):
# sqlite3 -vfs /ddr/var/tools/dd_hd.db
sqlite> DROP TRIGGER IF EXISTS hd_ddboost_optdup_prune;
sqlite> .quit

Run the historical database upgrade tool on the copy of the DB:
# dd_hd_rdb_tool -u -d /ddr/var/tools/dd_hd.db
You can check progress in a duplicate Bash session or wait for the above command to finish:
# tail -f /ddr/var/log/debug/messages.engineering | grep "HD RDB"
Sep 25 13:30:34 DD9500 dd_hd_rdb_tool: INFO: hd_upgrade: HD RDB Upgrade /ddr/var/tools/dd_hd.db begin
Sep 25 13:31:04 DD9500 dd_hd_rdb_tool: INFO: hd_upgrade: HD RDB Upgrade end

Now test pruning the copy of the database as well:
# dd_hd_rdb_tool -p -d /ddr/var/tools/dd_hd.db
You can check progress in a duplicate Bash session or wait for the above command to finish:
# tail -f /ddr/var/log/debug/messages.engineering | grep "hd_prune"
Sep 25 13:36:41 DD9500 dd_hd_rdb_tool: INFO: hd_prune: HD RDB Pruning begin
Sep 25 13:36:47 DD9500 dd_hd_rdb_tool: NOTICE: hd_prune_alerts: keep latest 1000 samples
Sep 25 13:36:47 DD9500 dd_hd_rdb_tool: NOTICE: hd_prune_ddboost: prune samples older than 7 days
Sep 25 13:36:47 DD9500 dd_hd_rdb_tool: NOTICE: hd_prune_vdisk: prune samples older than 31 days
Sep 25 13:36:47 DD9500 dd_hd_rdb_tool: NOTICE: hd_prune_pcr: prune hd_pcr_measurements_ext samples older than 5 days
Sep 25 13:36:47 DD9500 dd_hd_rdb_tool: NOTICE: hd_prune_pcr: Pruning tenant, tenant-unit, and mtree samples from hd_pcr_measurements table
Sep 25 13:37:15 DD9500 dd_hd_rdb_tool: INFO: hd_prune: HD RDB Pruning end
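
Rather than watching the log by eye, the "Pruning begin" and "Pruning end" markers can be checked after the fact. The sketch below assumes those two messages appear exactly as in the excerpt above; the function name is hypothetical.

```shell
# Succeed if the log contains a "Pruning end" line and no later
# "Pruning begin" line is left without a matching end.
# Usage: prune_finished <logfile>
prune_finished() {
    local logfile=$1
    awk '
        /hd_prune: HD RDB Pruning begin/ { running = 1 }
        /hd_prune: HD RDB Pruning end/   { running = 0; seen_end = 1 }
        END { exit ((seen_end && !running) ? 0 : 1) }
    ' "$logfile"
}
```

For example, `prune_finished /ddr/var/log/debug/messages.engineering && echo "prune completed"`. A begin line with no end line after it suggests that the run is still in progress or was interrupted.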

Finally, replace the historical database in place with the upgraded, fixed copy, and make sure to restart the "hdc" process:
# mv /ddr/var/tools/dd_hd.db /ddr/hd/dd_hd.db
# kill `pidof hdc`

This has been seen a few times, for example in Bug 234768 and Bug 245390. The fundamental reason for the problem (an existing trigger
preventing the historical database upgrade) is described and fixed in Bug 234998.

Article Properties

Affected Product: Data Domain
Product: Data Domain
Last Published Date: 21 Dec 2020
Version: 3
Article Type: Solution
