OceanProtect Backup Storage 1.3.0 Troubleshooting (With Data Backup Feature) 02
1.3.0
Issue 02
Date 2023-07-12
The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: https://www.huawei.com
Email: [email protected]
Purpose
This chapter describes the entire troubleshooting process, including the safety
operation guide, troubleshooting preparation, basic principles and methods of
fault locating, troubleshooting procedures, and common troubleshooting methods.
Intended Audience
This document is intended for:
● Technical support engineers
● Maintenance engineers
Symbol Conventions
Symbols that may be found in this document are defined as follows.
Symbol Description
Change History
Issue Date Description
Contents
2 Troubleshooting Procedure
3 Information Collection and Fault Reporting
3.1 Collecting Live Network Information
3.2 Collecting System Fault Information
3.2.1 Using OceanProtect to Collect Information
3.2.1.1 Exporting Alarms and Events
3.2.1.2 Exporting Debug Logs
3.2.2 Using SmartKit to Collect Device Information
3.3 Collecting ProtectAgent Logs
3.4 Collecting the Ethernet Switch Information
4 Troubleshooting
4.1 Login Faults
4.1.1 Failed to Log In to OceanProtect Using the Firefox Browser
4.2 Download Faults
4.2.1 Failed to Download ProtectAgent Using Internet Explorer 11 in the Windows Server OS
4.3 Application Faults
4.3.1 Error Message "ORA-19571" Is Displayed When Oracle Archive Logs Are Backed Up
4.3.2 Restoration Task Fails Due to the Full Flash Recovery Area of the Database
4.3.3 When an Oracle Database Is Being Backed Up, the Backup Job Stops Running for a Long Time
4.3.4 Oracle Database Backup Fails and the Error Details Contain "RMAN-06059: expected archived log not found, loss of archived log compromises recoverability"
4.3.5 A Database Breaks Down After the Database Authentication Mode Is Changed During Oracle ADG or RAC Cluster Backup
4.3.6 Failed to Execute an Oracle Recovery Job and the Error Details Contain Error Code ORA-19698
4.3.7 Data Recovery to the Original Host or Instant Recovery Fails and the Error Details Contain "ORA-01034: ORACLE not available"
4.3.8 A Message Is Displayed Indicating that the Database Is Running When a Recovery Job Is Executed
4.3.9 Backup Fails Because the Name of a Protected VM Contains Special Characters
4.3.10 Restoration Is Successful but the VM Fails to Start After the VM Is Successfully Restored to the Original Location
4.3.11 Restoration Is Successful but the VM Fails to Be Started When the Original VM Is Overwritten for Restoration
4.3.12 Container Process in the D State Due to a Storage Pool Fault
4.3.13 Abnormal POD Status
4.3.14 HDFS Restoration Fails Because Space Occupied by Data Increases During the Process
4.3.15 The Protection Status of Some Resources Remains Creating
4.3.16 The Storage Pool Capacity Is Used Up
4.3.17 Failed to Initialize the Dameng Database
4.3.18 After a SQL Server Cluster Instance Is Recovered, the Primary and Secondary Replica Nodes in the Availability Group of the Cluster Instance Are in the Not Synchronizing/Suspect State
4.3.19 When Recovering an OpenGauss Cluster Instance, the Cluster Status Is Degraded and the Task Status Is Failed
4.3.20 Failed to Create a Copy Index
4.3.21 Failed to Restart the Backup Subjob of the Source Deduplication Service During the Backup
4.3.22 Service Data Write Times Out and Fails When Backing Up Non-InnoDB Data of MySQL or MariaDB
4.3.23 An Error Is Reported When a Replication Job Is Executed, Indicating that the Remote Replication Pair Is Faulty
4.3.24 Failed to Recover an OpenStack Cloud Server Because the Agent Host Is Faulty for a Long Time
4.3.25 No Available Network Drive for Browsing Windows Fileset Resources
4.3.26 Plug-in Authentication Failure Reported by the Windows Fileset Host
4.3.27 Failed to Execute a Database Backup Job
4.3.28 Failed to Delete Temporary Snapshot Files of the Huawei Cloud Stack Application
4.3.29 Failed to Execute a Live Mount Job
4.3.30 Failed to Install the Agent on a Windows Host
4.3.31 Recovery Subjob Fails During MySQL/MariaDB Instance Recovery
4.3.32 Host Goes Offline Repeatedly and Jobs Fail to Be Executed Occasionally
4.3.33 Job Stopped Unexpectedly During Execution
4.3.34 Failed to Log In to Controller B When the Management Port of Controller A Is Faulty
4.3.35 Error Code "SQL4973N" Is Displayed When the System Failed to Run the Rollforward Command During Restoration
4.3.36 Abnormal Status of the Dameng Database Cluster After Restoration
4.3.37 Source Deduplication Enabled in a Backup Policy Does Not Take Effect
4.3.38 Failed to Create a Snapshot During the Execution of an OpenStack Host Backup Job
4.3.39 Some Application Copies Failed to Be Replicated (or Reversely Replicated) from Version 1.3.0 or Later to Version 1.2.1
5 Common Operations
5.1 Uninstalling ProtectAgent
5.2 Forcibly Uninstalling ProtectAgent
5.3 Installing ProtectAgent (Manual Mode, Applicable to the Linux OS)
5.4 Installing ProtectAgent (Manual Mode, Applicable to the Windows OS)
5.5 How Do I Access the GaussDB Container to Perform Database Operations?
5.6 Logging In to OceanProtect
5.7 Downloading the ProtectAgent Software Package
5.8 What If I Failed to Download ProtectAgent Using Internet Explorer 11 on the Windows Server?
B Glossary
This chapter provides guidelines for safety operations during activities such as
installation, maintenance, and troubleshooting. The guidelines consist of safety
regulations for both personnel and equipment. You must follow these guidelines
to avoid personal injury and equipment damage.
1.1 Alarm and Safety Symbols
1.2 Safety Precautions for ESD Protection
1.3 Safety Precautions for Laser Protection
1.4 Safety Precautions for Using Fibers
1.5 Safety Precautions for Short Circuit Protection
1.6 Safety Precautions for Operating Equipment
1.7 Safety Precautions for Condensation Prevention
Table 1-1 lists the alarm and safety symbols labeled on equipment.
Symbol Description
Personal Injury
DANGER
The laser emitted by an optical module is an invisible infrared ray, which may
cause permanent eye injury. Do not look into the optical module during device
maintenance.
Equipment Damage
To prevent equipment damage when you handle the equipment, follow these
precautions:
● When not in use, the optical interfaces on the equipment and fiber connectors
on fiber jumpers must be covered with dust-proof caps.
● After removing a fiber jumper that connects to an optical interface on the
equipment, cover the optical interface and the fiber jumper connector with
dust-proof caps.
● When performing a hardware loopback test by connecting a fiber jumper to
an optical interface, add an attenuator to prevent the risk of damage to the
optical module caused by excessively strong optical power.
● When using the Optical Time Domain Reflectometer (OTDR), disconnect the
fiber jumper between the peer equipment and the local equipment to avoid
damage to the optical module caused by excessively strong optical power.
● Unless necessary, do not remove or insert the modules connecting to fibers.
DANGER
The laser beam on an optical interface board or from a fiber may cause eye injury.
Do not look into optical interfaces or fiber connectors during installation and
maintenance.
Replacing Fibers
Use dust-proof caps to cap the connectors of the fibers that are not in use.
NOTICE
● Do not place tools on air intake boards of cabinets. Otherwise, a short circuit
may occur.
● Do not drop screws into a cabinet or the equipment. Otherwise, a short circuit
may occur.
DANGER
● Before checking device installation and cable connections, ensure that the
system power supply is switched off. Otherwise, incorrect or loose cable
connections may result in personal injury or equipment damage.
● Do not wear an ESD wrist strap when powering on the equipment to prevent
an electric shock.
Troubleshooting
DANGER
NOTE
If the temperature difference cannot be determined, wait one night after moving devices to
the equipment room and then install them.
2 Troubleshooting Procedure
This section describes the troubleshooting procedure, helping you solve problems
that may occur when using OceanProtect backup storage.
Table 2-1 describes the procedure for troubleshooting OceanProtect backup
storage.
Step 1 Record the fault message.
System messages (such as error codes and job details) that are displayed after a fault
occurs can be used to diagnose the fault.
Step 3 Record all information.
If no system messages are displayed, you can collect alarms and logs. For details, see
3.2 Collecting System Fault Information.
Step 4 Rectify the fault.
After the fault is located, remove the fault by referring to the instructions in the
following information:
● Prompt message
● Alarm information
● Log information
● 4 Troubleshooting
Step 5 Contact Huawei technical support.
If the fault persists after you follow instructions in the preceding information,
summarize fault information and contact Huawei technical support engineers.
After a fault occurs, you can collect and report basic information, fault
information, and device information in a timely manner to help maintenance
personnel quickly locate and rectify the fault. Note that the information collection
operations described in this chapter can be performed only after being authorized
by the customer.
3.1 Collecting Live Network Information
3.2 Collecting System Fault Information
3.3 Collecting ProtectAgent Logs
3.4 Collecting the Ethernet Switch Information
Application host information
● Operating system version: Records the type and version number of the operating
system installed on the application host.
● Application version: Records the application type and version number.
Context
Specify the alarms and events to be exported by setting the severity or time of
occurrence.
Internet Explorer on Windows is used as an example here. If another browser is
used, perform related operations based on site requirements.
Precautions
Alarms and events exported from the OceanProtect backup storage are saved in a
*.xls file. Do not modify the file.
Procedure
Step 1 Log in to OceanProtect by referring to 5.6 Logging In to OceanProtect.
Step 2 Choose Insight > Alarms and Events > Alarms.
The Alarms page is displayed.
----End
Procedure
Step 1 Log in to the OceanProtect WebUI by referring to 5.6 Logging In to
OceanProtect.
Step 2 Export system logs.
1. Choose System > Log Management.
2. In the system debug log settings area, select the level of logs to be exported
from the drop-down list.
You can export the debug logs at the selected level (and above).
3. Click OK to save the setting of the export log level.
4. In the log export area, click the drop-down list and select the node and
software module whose logs are to be exported.
The exported logs include system debug logs and system running parameters.
5. Click Export to export debug logs.
Step 3 Export agent logs.
1. Choose Protection > Hosts and Applications > Hosts, and find the target
host.
2. Select the target host and choose More > Export Logs in the upper area of
the page.
----End
Prerequisites
● The SmartKit installation package of V200R007C00RC9SPC200 or later is
available on the maintenance terminal. This section uses the
V200R007C00RC9SPC200 version as an example. The package name is
SmartKit_V200R007C00RC9SPC200_zh.zip. Obtain the software package as
follows:
– Enterprise users: Click here.
– Carrier users: Click here.
● You have obtained the signature file and digital signature public key and
verified the software integrity.
a. On the OceanProtect backup storage server software download page,
download the .asc digital signature file with the same name as the
software package.
b. Use the digital signature public key and verification tool provided by the
Huawei technical support website to verify the digital signature of the
software package. Obtain the digital signature public key, validation tool,
and their usage methods as follows:
▪ Enterprise users
Software digital signature validation tool (OpenPGP)
▪ Carrier users
Digital Signature Authentication Mode
● You have obtained the management IP address, username, and password for
logging in to OceanProtect backup storage.
Context
After logging in to SmartKit, click the help icon in the upper right corner to obtain
help information.
Procedure
Step 1 Import the information collection tool package. If the information collection tool
package has been imported, skip this step.
1. On the SmartKit home page, click Function Management.
2. Click Import.
3. In the Import dialog box that is displayed, select Collect.zip and click OK.
4. In the Verification and Installation window, select Information Collection,
and click Install.
5. Click OK in the dialog box displayed after the tool is successfully installed.
Step 2 In the navigation tree on the left, select Storage and click Storage Information
Collection in the right pane.
Step 3 In the Storage Information Collection window, click the Information Collection
area.
Step 4 Add the OceanProtect backup storage. If the OceanProtect backup storage has
been added, skip this step.
1. Click Add Devices.
3. Click Next.
4. Configure authentication information. The username is admin. To obtain the
default password of user admin, see the OceanProtect Backup Storage 1.3.0
Account List (with Data Backup Feature).
5. Click Finish.
If the following dialog box is displayed, click OK.
----End
Step 1 Log in to the host, and run the following command to go to the ProtectAgent
installation directory:
cd /opt/DataBackup/ProtectClient
Step 3 Run the following command to go to the log storage directory and obtain logs.
cd /opt/DataBackup/ProtectClient/ProtectClient-E/stmp
----End
----End
4 Troubleshooting
Symptom
A user fails to log in to OceanProtect using the Firefox browser. The system
displays the error message "You are attempting to import a cert with the same
issuer/serial as an existing cert, but that is not the same cert. Error code:
SEC_ERROR_REUSED_ISSUER_AND_SERIAL."
Alarm Information
None.
Possible Causes
The certificate verification mechanism of the Firefox browser cannot identify
certificates with the same issuer but different serial numbers. This problem occurs
when a user logs in to one OceanProtect system and then logs in to another.
Troubleshooting
● Use another browser, such as Chrome, to log in.
▪ If no, go to c.
c. Close the Firefox browser and go to the installation path of the Firefox
browser, for example,
C:\Users\%userprofile%\AppData\Roaming\Mozilla\Firefox\Profiles\%profile.default%\.
d. Delete the cert_override.txt file and .db files (for example, cert8.db or
cert9.db) from the path.
e. Log in to OceanProtect again and check whether the fault is rectified.
Suggestion
None.
Symptom
When archive logs of the Oracle database are backed up, the backup job fails and
the error code ORA-19571 is displayed in the job details.
Possible Causes
The value of the Oracle database parameter control_file_record_keep_time is
smaller than the archive log backup period. As a result, the archive log file is
overwritten before the next log backup starts, and the logs cannot be backed up.
Troubleshooting
Step 1 Use PuTTY to log in to the Oracle database host.
● Solution 2:
On OceanProtect, change the archive log backup period in the SLA to be
smaller than the value of control_file_record_keep_time.
----End
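The relationship between the two values can be checked before the SLA is changed. The following is a minimal sketch, not a Huawei or Oracle tool: the function name keep_time_covers_period is hypothetical, and the sketch assumes that control_file_record_keep_time is expressed in days and that the archive log backup period is known in hours.

```shell
# Hypothetical helper (not part of OceanProtect or Oracle).
# $1: value of control_file_record_keep_time, in days
# $2: archive log backup period from the SLA, in hours (assumed unit)
# Succeeds only when the retention is strictly longer than the backup period.
keep_time_covers_period() {
    keep_days=$1
    period_hours=$2
    [ $((keep_days * 24)) -gt "$period_hours" ]
}
```

For example, `keep_time_covers_period 7 24` succeeds, whereas `keep_time_covers_period 0 24` fails and indicates that either the parameter or the backup period needs to be adjusted.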
Possible Causes
The database host has residual database files, causing the Flash Recovery Area to
become full during the restoration. As a result, the restoration task fails.
Troubleshooting
Step 1 Use PuTTY to log in to the Oracle database host.
Step 4 Run the following command to delete the residual database files:
rm -rf Residual database file directory
----End
Symptom
When an Oracle database is being backed up, the backup job stops running for a
long time. The operating system of the database host is Red Hat 7.6 and the
kernel version is 3.10.0-957.el7.x86_64.
Possible Causes
A kernel issue in Red Hat 7.6 (kernel 3.10.0-957.el7.x86_64) causes the OceanProtect
backup storage process to restart. As a result, the file system on the backup storage
cannot be written to, and the Oracle backup job is suspended.
Troubleshooting
Step 1 Use PuTTY to log in to the Oracle database host.
Step 2 Run the following commands to check whether the operating system of the host is
Red Hat 7.6 and whether the kernel version is 3.10.0-957.el7.x86_64:
cat /etc/redhat-release
uname -r
● If yes, go to Step 3.
● If no, contact technical support.
Step 3 Download the Red Hat kernel package.
Step 4 Upload the downloaded kernel package to the database host.
Step 5 Run the following command to install the new kernel package:
rpm -ivh Kernel package
Step 6 After restarting the operating system, run the following command to check
whether the current kernel version is the new one:
uname -r
----End
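The check in Step 2 can also be scripted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the release check assumes that /etc/redhat-release contains the strings "Red Hat" and "release 7.6" on an affected host.

```shell
# Hypothetical helpers (not part of OceanProtect).

# $1: content of /etc/redhat-release
# Succeeds when the text identifies Red Hat release 7.6.
is_rhel_76() {
    case $1 in
        *"Red Hat"*"release 7.6"*) return 0 ;;
        *) return 1 ;;
    esac
}

# $1: kernel release string, for example the output of `uname -r`
# Succeeds when it matches the affected kernel named in this section.
is_affected_kernel() {
    [ "$1" = "3.10.0-957.el7.x86_64" ]
}
```

On the database host, the helpers could be called as `is_rhel_76 "$(cat /etc/redhat-release)" && is_affected_kernel "$(uname -r)"`. The same kernel check can be repeated after Step 6 to confirm that the host no longer runs the affected version.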
Possible Causes
Before the backup job is executed, the file system that stores backup log files is
deleted. As a result, invalid log files exist on the Oracle host and the backup fails.
(On DeviceManager, the file system whose name starts with L stores backup log
files.)
Troubleshooting
Step 1 Use PuTTY to log in to the Oracle database host.
Step 2 Run the following command to switch to user oracle:
su - oracle
Step 4 Run the following commands to verify and delete the invalid log files:
crosscheck archivelog all;
delete expired archivelog all;
----End
Symptom
In the Oracle 11.2.0.1 ADG or RAC cluster running AIX 7.1/7.2, after OS
authentication is changed to database authentication, the database breaks down
unexpectedly.
Possible Causes
In the Oracle 11.2.0.1 environment, if the rman command fails to be invoked
during backup job execution, the fault is caused by the Oracle database.
Troubleshooting
Upgrade the Oracle database to 11.2.0.4 or a later version.
Symptom
An Oracle recovery job fails to be executed. The error details are as follows:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at xx/xx/2023 11:17:16
ORA-19698: +DG/db_name/redo_file_name is from different database: id=722521960, db_name=db_name
Possible Causes
When Oracle recovery to a different host is performed, if a redo file with the same
name that belongs to another database exists at the target location, Oracle attempts
to use this redo log file for recovery. An error is reported because this redo log file
belongs to another database.
Troubleshooting
Step 1 Change the redo file path of the conflicting database and move the redo file from
the file system to another location.
----End
Symptom
After a user performs the following operations, data restoration to the original
host or instant recovery fails, and the error details contain "ORA-01034: ORACLE
not available".
1. Perform backup for database A on a single node (the volume type is ASM) to
generate backup copy a.
2. Use the single node as the target location for a live mount. The mounted
copy and the target host both contain a database with the same name as
database A.
3. Cancel the live mount after performing the live mount.
4. Use backup copy a to restore data to the original host or perform instant
recovery.
Possible Causes
During original host restoration or instant recovery, OceanProtect backup storage
modifies the spfile file. In the preceding scenario, the spfile file does not exist in
the database. As a result, the database fails to be started and the restoration job
fails.
Troubleshooting
Step 1 Use PuTTY to log in to the Oracle database host.
----End
Perform the following steps to shut down the database and perform the job again:
----End
Possible Causes
The name of a protected VM or disk contains the special character @.
Troubleshooting
Step 1 Check whether the VM name or virtual disk name contains the special character
@.
1. Use a client or browser to log in to the ESXi host or vCenter Server.
2. Check whether the name of the VMDK file or its parent folder of the VM in
the datastore contains the at sign (@).
– If yes, go to Step 2 to change the VM name.
– If no, contact technical support engineers.
Step 2 Shut down the VM.
This section assumes that the old VM name is yms123@ and the new VM name is
yms1234 after the change.
Step 3 Use PuTTY to log in to the ESXi host where the VM resides.
Step 4 Run the following command to go to the directory where the VM resides:
cd /vmfs/volumes/datastore name/original VM name
Step 5 Run the following command to rename the .vmdk and _n.vmdk files. (The virtual
disk files include only the old name.vmdk file and the old name_n.vmdk file.)
For example, run the following command to copy the configuration file:
cp "yms123@.vmx" "yms1234.vmx"
Step 7 Open the new VM configuration file and change all the old names to the new
names.
Step 8 Run the following command to rename all the other files:
mv "old name.nvram" "new name.nvram"
mv "yms123@_1-ctk.vmdk" "yms1234_1-ctk.vmdk"
Step 10 Run the following command to delete the old VM configuration file:
rm /vmfs/volumes/datastore name/new name/old name.vmx
Step 11 Run the following command to register the new VM on the ESXi host.
vim-cmd solo/registervm /vmfs/volumes/datastore name/new name/new name.vmx
----End
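The renaming pattern used in this procedure (old VM name prefix replaced by the new VM name) can be expressed as a small helper. The following is a minimal sketch only: the function name renamed_vm_file is hypothetical, it only computes the target file name, and it does not replace any step of the procedure above.

```shell
# Hypothetical helper (not part of ESXi or OceanProtect).
# Prints the new file name for a VM file by replacing the old VM name
# prefix with the new VM name. Files without the prefix are printed unchanged.
# $1: old VM name, $2: new VM name, $3: current file name
renamed_vm_file() {
    old=$1
    new=$2
    file=$3
    case $file in
        "$old"*) printf '%s%s\n' "$new" "${file#"$old"}" ;;
        *)       printf '%s\n' "$file" ;;
    esac
}
```

With the example names from this section, `renamed_vm_file 'yms123@' 'yms1234' 'yms123@_1-ctk.vmdk'` prints `yms1234_1-ctk.vmdk`. Printing the planned names first makes it easier to review the changes before running any mv command.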
● The system enters the emergency mode. In emergency mode, run the journalctl
-p err command. If information similar to the following is displayed, the boot
partition cannot be mounted:
failed to mount /boot
Possible Causes
A disk fails to be mounted in the full copy because the inode information in the
first block group of the disk is modified.
Troubleshooting
Step 1 Check whether the problem is caused by the failure to mount some data disks or
system disks on the VM.
This section uses the /dev/sdc1 device as an example. In practice, a different
device may be the one that cannot be found.
1. Log in to the protected VM and enter the maintenance mode as prompted.
2. Run the following command to check whether automatic mounting upon
system startup has been enabled for the faulty disk:
cat /etc/fstab
If the mounting information about /dev/sdc1 does not exist, the disk is not
mounted.
4. Run the following command to check whether the device exists:
fdisk -l | grep "dev/sdc1"
xfs_repair /dev/sdc1
Step 4 Run the following command to check whether the disk is mounted successfully:
mount | grep "/dev/sdc1"
If you can access the operating system and the mounting information about
/dev/sdc1 exists, the fault is rectified.
----End
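The /etc/fstab check in this procedure can be scripted. The following is a minimal sketch, assuming that the disk is listed in /etc/fstab by its device path (entries that use UUID= or LABEL= are not matched). The function name fstab_has_device is hypothetical.

```shell
# Hypothetical helper (not part of OceanProtect).
# Succeeds when the device has an uncommented entry whose first field
# is exactly the given device path.
# $1: device path, $2: fstab-style file (defaults to /etc/fstab)
fstab_has_device() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/etc/fstab}"
}
```

For example, `fstab_has_device /dev/sdc1 || echo "no automatic mount entry for /dev/sdc1"` reports a missing entry for the example device used in this section.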
Possible Causes
When a VM is restored in Hot-Add transmission mode, the VM disk is unmounted
from the ProtectAgent host after the data restoration is complete. However, the
disk fails to be unmounted owing to an internal error of the VDDK library and
then gets locked. Consequently, the VM cannot be started.
Troubleshooting
Step 1 Use a browser to log in to the vCenter Server.
4. Click on the right of the disk to unmount the disk. When unmounting the
disk, deselect Delete files from datastore.
5. Click OK.
Step 5 After the consolidation is complete, choose ACTIONS > Power > Power On to
start the VM.
----End
Possible Causes
The container process is in the D state due to a storage pool fault.
Troubleshooting
Step 1 Log in to OceanProtect.
Step 2 Choose System > Infrastructure > Local Storage.
Step 3 Click Open the device management platform on the right of the local storage
name to go to the DeviceManager page.
Step 4 Choose Settings > Container Settings.
Step 5 In the Node Information area on the Container Settings tab page, view the
nodes in the Stopped state. For example, 0A.
2. Select S by pressing arrow keys, and then press S to complete the setting.
Then, "whose current sort field is S" is displayed in the upper right corner.
3. Press Q to exit the sorting page.
4. Press Shift and R to sort the S column in ascending order.
5. Check whether a process is in the D state in any of the following scenarios:
– If the COMMAND column contains a service container process and the
value in the S column corresponding to the process is D, the service
container is in the D state.
– If the COMMAND column contains the /pause process and the value in
the S column corresponding to the process is D, the pause container
process is in the D state.
– If the COMMAND column contains the isulad process and the value in
the S column corresponding to the process is D, the isulad process is in
the D state.
6. Run the following command to manually restart the node if at least one of
the scenarios described in 9.5 exists.
reboot
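As a cross-check for the D state scenarios listed above, processes in the D (uninterruptible sleep) state can also be listed non-interactively. The following is a minimal sketch, assuming that the standard ps utility is available on the node. The function name list_d_state is hypothetical, and the interactive top procedure remains the documented method.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID STAT COMMAND" lines (as printed by
# `ps -eo pid,stat,comm`) from stdin and prints those whose state starts
# with D, that is, processes in uninterruptible sleep.
list_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Usage on a live node:
#   ps -eo pid,stat,comm | list_d_state
```

In the output, look for service container processes, the pause process, or the isulad process, as described in the scenarios above.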
Step 10 Check whether the alarm "The system detects that the container root directory is
faulty" is cleared.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Suggestion
None.
References
None.
Possible Causes
A disk I/O error occurs.
Troubleshooting
Step 1 Log in to the CLI as the super administrator.
Step 4 Run the following command to check the pod status in the namespace and record
the name of the pod whose status is error and the namespace to which the pod
belongs.
The first column indicates the namespace, the second column indicates the pod
name, and the fourth column indicates the pod status.
container.sh -c kubectl get pod -A -owide
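When many pods are listed, the namespace and name of the affected pods can be extracted from the command output with a short filter. The following is a minimal sketch, assuming that the output has been captured or can be piped in the shell in use. The function name pods_with_status is hypothetical, and the exact status string (Error in the usage line) is an assumption that must match what the command actually prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pod -A -owide` output from stdin
# and prints "<namespace> <pod name>" for every pod whose STATUS column
# (fourth column) equals the requested status. The header line is skipped.
pods_with_status() {
    wanted="$1"
    awk -v s="$wanted" 'NR > 1 && $4 == s { print $1, $2 }'
}

# Usage (status string is an assumption):
#   container.sh -c kubectl get pod -A -owide | pods_with_status Error
```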
Step 18 Run the following command to check whether the pod status is restored to
normal:
container.sh -c kubectl get pod -A -owide
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Suggestion
None.
References
None.
Symptom
After backup files are restored, space occupied by these files increases. As a result,
restoration fails due to insufficient storage space.
Possible Cause
A snapshot is created for the directory during backup. In this case, a new file is
saved each time an update is performed.
Troubleshooting
Step 1 Delete the snapshot of the backup directory and perform restoration again.
● If the restoration is successful, no further action is required.
● If the restoration fails, contact technical support engineers.
----End
Suggestion
None.
Reference
None.
Symptom
After a resource protection job is created, the protection status of some resources
remains Creating. File systems are used in the following example.
Possible Cause
When the front- and back-end container cards are faulty, the message delivered
by the system to the database for changing the status is lost. As a result, the job is
complete but the protection status is abnormal.
Procedure
Step 1 Log in to OceanProtect by referring to 5.6 Logging In to OceanProtect.
Step 2 Choose Protection from the navigation pane. On the corresponding resource
protection page, select the resource whose protection status is abnormal and click
More > Remove Protection.
Step 3 Click More > Protection for the resource.
If the protection status remains Creating, contact technical support engineers.
----End
Suggestion
None.
Reference
None.
Possible Cause
The storage pool capacity has been used up.
Procedure
It is strongly recommended that you expand the storage pool capacity.
If the storage pool capacity cannot be expanded, perform the following
operations:
Step 2 Log in to a machine (physical machine or VM) that can access the logical port.
Run the following command to check whether the logical port can be accessed:
ping $ip
Example:
[root@localhost agent245]# ping 192.168.105.158
PING 192.168.105.158 (192.168.105.158) 56(84) bytes of data.
64 bytes from 192.168.105.158: icmp_seq=1 ttl=64 time=0.233 ms
64 bytes from 192.168.105.158: icmp_seq=2 ttl=64 time=0.183 ms
^C
--- 192.168.105.158 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.183/0.208/0.233/0.025 ms
The preceding command output indicates that the logical port can be accessed
normally.
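If the reachability check has to be repeated for several addresses, the packet loss value can be read from the ping summary line instead of being checked by eye. The following is a minimal sketch. The function name packet_loss_percent is hypothetical, and the helper assumes the summary format shown in the preceding example.

```shell
#!/bin/sh
# Hypothetical helper: reads ping output from stdin and prints the
# packet-loss percentage taken from the summary line, for example "0".
packet_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage: send three probes and treat anything other than 0% loss as a failure.
#   loss=$(ping -c 3 192.168.105.158 | packet_loss_percent)
#   [ "$loss" = "0" ] && echo "logical port reachable"
```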
Step 3 Search for the file system to be mounted.
1. Click Services > File System and search for the file systems whose names
start with pvc_.
2. Find the file system whose total capacity is 200 GB and allocated capacity is
about 175 GB, and record the file system name.
3. Find the file system whose total capacity is 100 GB and record the file system
name.
2. Click the Share tab and select the share path name.
3. Click Add.
4. Select Advanced in the upper right corner and set parameters as prompted.
Step 5 Log in to the host mentioned in step Step 2 and mount the file system.
Parameter Description
Example:
sudo /usr/bin/mount -t nfs -o retry=0,vers=3,soft,nolock,timeo=10 192.168.105.158:/pvc_a000eb07_8e0a_4da1_93cf_1889cc044c84 /tmp/pvc1
Mount the file systems found in the preceding steps. After the mounting is
complete, run the mount command to check the mounting status. The following
is an example:
$ mount | grep pvc
192.168.105.158:/pvc_30e42b07xx on /tmp/pvc1 type nfs xxxxx
192.168.105.158:/pvc_30a752d5xx on /tmp/pvc2 type nfs xxxxx
192.168.105.158:/pvc_30a7521exx on /tmp/pvc3 type nfs xxxxx
The preceding three file systems are mounted to the /tmp/pvc1, /tmp/pvc2,
and /tmp/pvc3 folders.
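The mapping between file system names and mount directories can be listed directly from the mount output, which helps when the directories are visited one by one afterwards. The following is a minimal sketch. The function name list_pvc_mounts is hypothetical, and the helper assumes the output format shown in the preceding example.

```shell
#!/bin/sh
# Hypothetical helper: reads `mount` output from stdin and prints
# "<file system name> <mount point>" for each export whose name
# starts with pvc_.
list_pvc_mounts() {
    awk '$1 ~ /:\/pvc_/ && $2 == "on" {
        split($1, parts, ":/")   # parts[2] is the file system name
        print parts[2], $3
    }'
}

# Usage on the host from Step 2:
#   mount | list_pvc_mounts
```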
Go to the mount directories queried in the preceding step (for example,
/tmp/pvc1, /tmp/pvc2, /tmp/pvc3) in sequence and find the inf_pvc_file1 file. The
following is an example:
ll -h
total 171G
-rw-r----- 1 nobody nobody 171G Sep 5 19:19 inf_pvc_file1
4. Wait until the command is executed successfully, which takes about 30
minutes.
Step 8 Log in to OceanProtect, choose Explore > Copy Data, and delete unnecessary
copies to release space.
Step 9 Log in to DeviceManager, choose System > Storage Pools to view the storage
pool capacity.
4. Enter y.
5. Enter minisystem.
The command output is as follows:
-----------------System Information-----------------
| Product Version | 1.1.RC2 |
| System Version | 7600508195 |
| Release Time | 20220314221834 |
----------------------------------------------------
6. Run the following command to restart all data backup feature containers in
OceanProtect backup storage:
container.sh -c kubectl delete --all pods -n dpa
I have read and understand the consequences associated with performing this operation.
9. Run the following command to check whether all Pods are started.
container.sh -c kubectl get pods -n dpa
If all Pods are in the Running state, the services are restarted successfully. This
process takes about 5 to 10 minutes.
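Because the restart takes several minutes, the Pod status check is usually repeated. The following is a minimal sketch of a check that succeeds only when every listed Pod is in the Running state. The function name all_pods_running is hypothetical, the helper assumes the default column layout of the command output (NAME, READY, STATUS, RESTARTS, AGE), and the polling loop in the comment assumes that the shell in use allows piping.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods -n dpa` output from stdin.
# Succeeds only if at least one Pod is listed and the STATUS column
# (third column in the default output) of every Pod is Running.
all_pods_running() {
    awk 'NR > 1 { total++; if ($3 != "Running") bad++ }
         END { if (total > 0 && bad == 0) exit 0; exit 1 }'
}

# Usage: poll every 30 seconds for up to 10 minutes (20 attempts).
#   i=0
#   until container.sh -c kubectl get pods -n dpa | all_pods_running; do
#       i=$((i + 1)); [ "$i" -ge 20 ] && break
#       sleep 30
#   done
```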
----End
Suggestion
None.
Reference
None.
Possible Cause
The /etc/ld.so.conf file does not contain the directory where the database file
resides. As a result, database initialization fails.
Troubleshooting
Step 1 Use PuTTY to log in to the Dameng database host.
Step 3 Run the su - dmdba -c 'echo $DM_HOME' command to obtain the Dameng
database installation directory.
Step 4 Run the vi /etc/ld.so.conf command to open the ld.so.conf file. In a new line,
enter the bin directory under the database installation directory, for
example, /dm8/bin.
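The edit in Step 4 can also be made without an editor. The following is a minimal sketch of an append that adds the directory only if it is not already present, so that repeating the step does not create duplicate entries. The function name ensure_line is hypothetical, and /dm8/bin is the example directory from Step 4.

```shell
#!/bin/sh
# Hypothetical helper: appends a line to a file only if the file does not
# already contain exactly that line, so repeated runs add no duplicates.
ensure_line() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage as root on the database host (replace /dm8/bin with the bin
# directory under the installation directory reported in Step 3):
#   ensure_line /etc/ld.so.conf /dm8/bin
```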
Step 5 Run the ldconfig /etc/ld.so.cache command to reload the ld.so.conf file.
Step 6 Perform database recovery again to check whether the fault is rectified.
----End
Suggestion
None.
Reference
None.
Symptom
After a SQL Server cluster instance is recovered, the primary and secondary replica
nodes in the availability group of the cluster instance are in the Not
Synchronizing/Suspect state.
Possible Cause
The failover mode of the availability group is Manual.
Troubleshooting
Step 1 On the host where the cluster instance is located, log in to Microsoft SQL Server
Management Studio as a database administrator.
Step 2 Locate the availability group node where the primary and secondary replicas in
the Not Synchronizing/Suspect state reside.
Step 3 Right-click the availability group and choose Failover from the shortcut menu.
Step 4 Perform a primary/secondary failover for the availability group and click Next.
Step 5 Select a new host as the primary node of the availability group and click Next.
Step 6 Select Click here to confirm failover with potential data loss and click Next.
Step 7 Click Connect to authenticate the login to the SQL Server instance.
Step 8 On the displayed login page, log in to the SQL Server instance as a database
administrator and click Connect.
Step 10 After the primary/secondary failover of the availability group is configured, click
Finish.
Step 13 Select an availability group, right-click the database, and choose Resume Data
Movement. In the dialog box that is displayed, click OK and wait until the process
is complete.
----End
Suggestion
None.
Reference
None.
Possible Cause
During OpenGauss cluster instance recovery, the standby node fails to invoke the
gs_ctl command to rebuild the instance.
Procedure
Step 1 Use PuTTY to log in to the standby node of the OpenGauss database as a
database user.
/opt/mogdb/install/data/dn
(1 row)
----End
Suggestion
None.
Reference
None.
Possible Causes
1. The disk space used for global search is insufficient.
2. The index information is incorrect.
Troubleshooting
Step 1 Find the file system used for global search.
1. Log in to the OceanProtect backup storage as user admin.
2. Run the change user_mode current_mode user_mode=developer command
to go to the developer view.
3. Run the minisystem command to go to the minisystem view.
4. Run the container.sh -c kubectl get pvc -n dpa command to view the PVC
information. The PVC name corresponding to the storage volume used for
global search is data-nas.
5. Change the hyphen (-) in the storage volume name to an underscore (_) to
get the file system name used by Elasticsearch for global search. For example,
change pvc-63ec14f7-0ad3-481e-b06b-6fedd12409e5 to
pvc_63ec14f7_0ad3_481e_b06b_6fedd12409e5.
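The name conversion in substep 5 is a plain character substitution and can be scripted to avoid typing errors in long names. The following is a minimal sketch. The function name pvc_to_fs_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: converts a storage volume name to the matching
# file system name by replacing every hyphen with an underscore.
pvc_to_fs_name() {
    printf '%s\n' "$1" | tr '-' '_'
}

# Usage:
#   pvc_to_fs_name pvc-63ec14f7-0ad3-481e-b06b-6fedd12409e5
```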
Step 2 Log in to DeviceManager and check whether the file system capacity is used up.
Step 4 Manually create a copy index and check whether the index is successfully created.
----End
Suggestion
None.
Reference
None.
Symptom
When a backup job is being executed, the backup subjob fails after the source
deduplication service is restarted using the systemctl restart dataturbo
command.
Possible Cause
At the time when DataTurbo is restarted, the backup subjob is writing data or
operating the mount directory. As a result, service data cannot be written and the
backup fails.
Troubleshooting
Adopt a job-level retry mechanism.
Step 2 Choose Protect > Policies > SLAs > Create to create a backup policy for the
backup job.
Step 3 In the displayed Backup Policy dialog box, configure Automatic Retry in the
Advanced area.
----End
Suggestion
None.
Reference
None.
4.3.22 Service Data Write Times Out and Fails When Backing
Up Non-InnoDB Data of MySQL or MariaDB
Symptom
Service data write times out and fails when backing up non-InnoDB data of
MySQL or MariaDB.
Possible Cause
For a MySQL or MariaDB database that does not use the InnoDB storage engine,
the current data table is locked before the backup job is complete. As a result,
data cannot be written and all DDL and DML statements are blocked.
Troubleshooting
Step 1 Use PuTTY to log in to the MySQL or MariaDB database host.
Step 6 After the backup is complete, change the value of lock-ddl-per-table to 0 and
save the change.
lock-ddl-per-table = 0
----End
Suggestion
None.
Reference
None.
Possible Cause
The replication cluster at the primary site is deleted when the replication card or
cluster at the secondary site is faulty and is added again after the fault is rectified,
resulting in an abnormal remote replication pair on the secondary site.
Troubleshooting
Step 1 Obtain the ID of the remote replication pair.
1. Log in to OceanProtect at the primary site by referring to 5.6 Logging In to
OceanProtect.
2. Choose Insight > Jobs. On the Historical Jobs tab, find the event description
The system attempts to synchronize remote replication pair xxxxxxx,
where xxxxxxx indicates the ID of the remote replication pair.
----End
Suggestion
None.
Reference
None.
Possible Causes
For cloud disk recovery, the agent host unmounts the cloud disk to be recovered
from the cloud server and then mounts it to the agent host. During the recovery, if
the agent host is faulty or powered off and no other agent is available to take
over the recovery job, the job fails and the disks of the OpenStack cloud server are
lost.
Troubleshooting
Step 1 Use PuTTY to log in to any OpenStack node using the management IP address as
user fsp.
Step 2 Run the following command to switch to user root:
su - root
Step 3 Run the following command to import environment variables, select openstack
environment variable of cloud_admin (keystone v3), and enter the password of
cloud_admin:
source set_env
Step 4 Run the following command to query the ID of the host (agent host) to which the
cloud disk is mounted:
cinder show volume-id
volume-id indicates the ID of the lost cloud disk. You can view the ID in the
recovery job.
In the command output, attached_servers indicates the ID of the agent host to be
queried.
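The attached_servers value can be picked out of the command output with a small filter. The following is a minimal sketch that assumes the output is a two-column table with rows of the form "| property | value |". The function name table_value is hypothetical, and the layout must be checked against the actual output in your environment.

```shell
#!/bin/sh
# Hypothetical helper: reads a "| Property | Value |" table from stdin and
# prints the value cell of the row whose property name equals $1.
# The table layout is an assumption, not taken from the product output.
table_value() {
    awk -F'|' -v key="$1" '{
        name = $2; value = $3
        gsub(/^ +| +$/, "", name)    # trim padding around the property name
        gsub(/^ +| +$/, "", value)   # trim padding around the value
        if (name == key) print value
    }'
}

# Usage (volume-id is the ID of the lost cloud disk):
#   cinder show volume-id | table_value attached_servers
```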
Step 5 Run the following command to unmount the cloud disk from the agent host:
nova volume-detach server-id volume-id
NOTE
When mounting a system disk to a cloud server, set the drive letter to /dev/vda to ensure
that the system disk can be found when the server starts. Run the following command to
mount the system disk:
nova volume-attach server-id volume-id /dev/vda
----End
Suggestions
When creating a recovery job, you are advised to specify multiple agent hosts to
execute the recovery job. If one agent host is faulty, other agent hosts can take
over the recovery job. In addition, if an agent host is faulty and can be recovered
within 5 minutes, the system re-executes or resumes the recovery job to ensure
that the recovery job is successfully executed.
Possible Causes
The mechanism for mounting the Windows network drive is defective. As a result,
the plug-in process for browsing resources cannot access the network drives
mounted by other users.
Troubleshooting
Method 1:
You are advised to back up the CIFS share in NAS share mode. To back up the CIFS
share in Windows fileset mode, perform the following steps:
Step 1 Contact the administrator to obtain the psexec tool installation package and
install the psexec tool.
Step 2 Add the psexec.exe file to the Path variable of environment variables.
Step 3 On the fileset host, run the psexec -i -s cmd.exe command to start the tool.
Step 4 In the displayed cmd window, run the mount command.
----End
Method 2:
----End
Suggestions
You are advised to back up CIFS shares in NAS share mode.
Reference
None.
Possible Causes
The Windows host lacks the Microsoft C and C++ (MSVC) runtime libraries.
Troubleshooting
Step 1 Download the vc_redist.exe executable file specific to the Windows host
architecture from the Microsoft official website.
Step 3 Execute the job again and verify that the job is successfully executed.
----End
Suggestions
None.
Reference
None.
Symptom
The database backup job fails.
Possible Causes
With source deduplication enabled, the dataturbo process is faulty when a backup
subjob is being executed, leading to backup failure.
Troubleshooting
● If the job is scheduled by an SLA, the SLA automatically retries the job.
● If the backup job is manually executed, execute the backup job again.
Suggestions
None.
Reference
None.
Possible Causes
The management IP address of the Huawei distributed block storage node cannot
communicate with the IP addresses of the External OM planes for OpenStack host
nodes.
Troubleshooting
Step 1 Use PuTTY to log in to an OpenStack controller node through its External OM
plane IP address.
Default account: fsp
Default password: See the default password of the account for the target node in
the "Type A (Background)" sheet of the Account List specific to your version.
NOTE
----End
Possible Causes
The Linux host is in SELinux enforcing mode and cannot run the MySQL service.
Troubleshooting
Step 1 Log in to the MySQL database host and run the vi /etc/selinux/config command
to change the SELinux configuration to disabled.
Step 2 Save the modification and restart the MySQL database host.
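Before and after the change, the configured SELinux mode can be read back from the configuration file. The following is a minimal sketch. The function name selinux_config_mode is hypothetical, and the helper only reports the value stored in the file, which takes effect after the host is restarted.

```shell
#!/bin/sh
# Hypothetical helper: prints the SELINUX= value from a SELinux
# configuration file (normally /etc/selinux/config). Comment lines and
# the SELINUXTYPE= setting are ignored.
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$1"
}

# Usage:
#   selinux_config_mode /etc/selinux/config   # value applied at next boot
#   getenforce                                # mode currently in effect
```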
----End
Suggestions
None.
Reference
None.
Possible Causes
During the agent installation, a DLL may not be found due to environment
problems.
Troubleshooting
Step 1 Download the missing DLL specific to the Windows host architecture from the
Microsoft official website.
Step 2 Manually save the downloaded library in the Windows system directory.
----End
Suggestions
None.
Reference
None.
Symptom
A recovery subjob fails during MySQL/MariaDB instance recovery.
The error message Aborting because of a corrupt database page in the system
tablespace is displayed.
Possible Causes
The database page in the backup copy is corrupted. As a result, the recovery fails.
Troubleshooting
Step 1 Log in to the MySQL/MariaDB database as user root.
Step 2 Run the vi /etc/my.cnf command to open the my.cnf configuration file and add
the following content to the file.
innodb_force_recovery=1
Step 3 Return to the backup storage management page and enable Forcible Recovery
for recovery.
NOTE
----End
Suggestions
None.
Reference
None.
Possible Causes
Any backup link between the controller of the backup storage device and the
agent host is faulty.
Troubleshooting
Step 1 Log in to DeviceManager and check the IP address of the logical port.
Step 2 Check whether the networks between each logical port and the host are normal.
Method 1:
Check whether the IP address of each logical port can be pinged on the host.
Method 2:
Log in to the background of the backup storage device and run the
ping --vrf vrf-srv desIP -I srcIP command to check whether the IP address of each
logical port can be pinged.
NOTE
----End
Suggestions
Rectify network faults between logical ports and the host to ensure that the
networks between each network port and the host are normal.
Reference
None.
Symptom
During execution, a job is stopped unexpectedly when the progress reaches a
certain value. (Not applicable to VMware data protection jobs)
Possible Causes
A module in the system is faulty.
Troubleshooting
Step 1 Query the faulty module based on the job type and progress in Table 4-4.
Table 4-4 Mapping between job progress, job types, and modules
NOTE
Example:
● If an archiving job is stopped unexpectedly with the job progress between 6% and 95%,
the data protection engine module is faulty. Contact technical support engineers.
● (Special case) If the progress of a live mount job of the backup copy stops at 3%, an
error occurs in the step of copy cloning. In this case, contact technical support engineers.
Step 2 Export system logs or agent logs by referring to 3.2.1.2 Exporting Debug Logs
and save them properly.
----End
Suggestions
None.
Reference
None.
Symptom
On the OceanProtect backup storage in a dual-controller environment, after the
management port of controller A is disabled in the background, the login to the
management interface of controller B fails.
Possible Causes
An internal component of the software processes and forwards the login request
to controller A. However, the management port of controller A has been disabled
and the login fails.
Troubleshooting
Enable the management port of controller A and then log in to the management
interface of controller B. Alternatively, wait for 3 to 5 minutes, refresh the page,
and try to log in to the management interface of controller B.
Suggestions
None.
Reference
None.
Symptom
Error code "SQL4973N" is displayed when the system fails to run the rollforward
command during restoration.
Possible Causes
The Rollforward utility processes all log files found on each database partition, but
the stop point on the specified database partition does not match the
corresponding record on the catalog database partition.
Troubleshooting
Step 1 Log in to the database node and run the drop db Database name command to
delete the original target database.
Step 2 Run the create db Database name command to re-create the target database.
Step 3 Run the db2 update db cfg for Database name using LOGARCHMETH1 DISK:
Archive directory and db2 update db cfg for Database name using TRACKMOD
ON commands to enable database archiving.
Step 4 Run the db2 backup db Database name on all nodes to Backup copy path
command to complete an offline backup.
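Because Step 1 deletes the target database, it can help to print the full command sequence with the real names filled in and review it before running anything. The following is a minimal sketch that only prints the commands. The function name print_db2_rebuild_commands and the example arguments are hypothetical, and Steps 1 and 2 are shown with the db2 prefix on the assumption that they are run from the operating system shell like Steps 3 and 4.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands from Steps 1 to 4
# for a given database name, archive directory, and backup copy path.
print_db2_rebuild_commands() {
    db="$1"; archive_dir="$2"; backup_path="$3"
    printf 'db2 drop db %s\n' "$db"
    printf 'db2 create db %s\n' "$db"
    printf 'db2 update db cfg for %s using LOGARCHMETH1 DISK:%s\n' "$db" "$archive_dir"
    printf 'db2 update db cfg for %s using TRACKMOD ON\n' "$db"
    printf 'db2 backup db %s on all nodes to %s\n' "$db" "$backup_path"
}

# Usage (all three arguments are placeholders):
#   print_db2_rebuild_commands SAMPLE /db2/archive /db2/backup
```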
----End
Suggestions
None.
Reference
None.
Symptom
On the OceanProtect WebUI, check the Dameng cluster details. A host whose
running mode is only Active Node exists. After the cluster is restored, the
database is not started and is abnormal.
Possible Causes
After the cluster is restored, the daemon process of the host whose running mode
is only Active Node fails to start the active node.
Troubleshooting
Step 1 Use PuTTY to log in to the Dameng database host whose running mode is only
Active Node.
Step 2 Run the following command to restart the daemon process. In the command,
dmdba indicates the Dameng database installation user, and
DmWatcherServicedmrw indicates the name of the registered daemon process
service.
su - dmdba -c "$DM_HOME/bin/DmWatcherServicedmrw restart"
Step 3 Use PuTTY to log in to the host where the cluster monitor resides.
Step 4 Run the following command to log in to the monitor. In the command,
/dm8/dmmonitor.ini indicates the path of the monitor configuration file.
su - dmdba -c "dmmonitor /dm8/dmmonitor.ini"
Step 5 Run the show command to check the cluster status. In the command output, if
ISTATUS of all nodes is OPEN, the database has been started.
show
2023-07-08 18:14:08
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP1 45330 TRUE AUTO TRUE
----End
Suggestions
None.
Reference
Parameters in the command for restarting the daemon process:
[dmdba@DM2 ~]$ $DM_HOME/bin/DmWatcherServicedmrw restart
Possible Causes
The requiretty permission is enabled in the /etc/sudoers configuration file. As a
result, the source deduplication function is unavailable.
Troubleshooting
Step 1 Log in to the agent host as user root.
Step 2 Run the following command to search for the directory of the sudoers
configuration file:
whereis sudoers
Step 3 Run the following command to change the permission on the sudoers
configuration file:
chmod u+w /etc/sudoers
Step 4 Run the following command to open the /etc/sudoers configuration file:
vim /etc/sudoers
----End
Suggestions
None.
Reference
Parameters in the command for restarting the daemon process:
[dmdba@DM2 ~]$ $DM_HOME/bin/DmWatcherServicedmrw restart
Procedure
Step 1 Log in to the agent host as user root.
Step 2 Run the following command to search for the directory of the sudoers
configuration file:
whereis sudoers
Step 3 Run the following command to change the permission on the sudoers
configuration file:
chmod u+w /etc/sudoers
Step 4 Run the following command to open the /etc/sudoers configuration file:
vim /etc/sudoers
Step 5 Check whether the Defaults requiretty field exists in the configuration file. If yes,
add # before the Defaults requiretty field to cancel the configuration.
Add the following content to the configuration file, and save the modification and
exit.
Common username ALL=(ALL) ALL
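The requiretty change in Step 5 can be previewed before the file is edited. The following is a minimal sketch that prints the file content with any active Defaults requiretty line commented out and leaves the original file untouched. The function name comment_out_requiretty and the temporary file path in the usage lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the content of a sudoers-style file with any
# active "Defaults requiretty" line commented out. The input file is not
# modified, so the result can be reviewed before it is applied.
comment_out_requiretty() {
    sed 's/^\([[:space:]]*Defaults[[:space:]][[:space:]]*requiretty\)/#\1/' "$1"
}

# Usage:
#   comment_out_requiretty /etc/sudoers > /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new    # syntax check before replacing the file
```

Checking the syntax first reduces the risk of locking out sudo access with a malformed file.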
----End
Possible Causes
The snapshot cannot be created because the storage pool capacity is used up.
Troubleshooting
Expand the storage pool capacity or delete unnecessary volumes to release
storage space.
Suggestions
None.
Reference
None.
Possible Causes
The version of the primary end for replication (or reverse replication) is later than
that of the secondary end.
Troubleshooting
Upgrade the earlier version by referring to the upgrade guide. Ensure that the
device versions of primary and secondary ends are the same.
Suggestions
None.
Reference
None.
Possible Causes
The PAM file of the agent host is modified, and # is added to the first line.
Troubleshooting
Step 1 Use PuTTY to log in to the agent host.
Step 2 Run the vi /etc/pam.d/su command to open the su file.
Step 3 Delete # from the #auth parameter in the first line.
Step 4 Perform the push installation again.
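Whether the cause described above applies can be confirmed before the file is edited. The following is a minimal sketch that tests whether the first line of the su file starts with #auth. The function name first_auth_line_commented is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the first line of the given file starts
# with "#auth", that is, the first auth entry has been commented out.
first_auth_line_commented() {
    head -n 1 "$1" | grep -q '^#auth'
}

# Usage on the agent host:
#   first_auth_line_commented /etc/pam.d/su && echo "first auth line is commented out"
```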
----End
Suggestions
None.
Reference
None.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm
Controller Is Faulty is displayed on the tab page.
Possible Causes
The controller is faulty.
Impact
Controller faults may deteriorate system performance and reliability.
Fault Diagnosis
Procedure
Step 1 Check whether the controller Alarm indicator on the storage device is steady
yellow.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
For details about how to replace a faulty controller, see the Parts Replacement of
the corresponding product model.
Step 3 Check whether the controller Alarm indicator is steady off and Health Status of
the controller on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
Log in to the DeviceManager and choose System > Hardware > Devices. On the
front view of a 2 U controller or disk enclosure that is marked by an exclamation
mark (!), click the disk in the red square. Health Status of the disk is Faulty.
The disk Alarm/Location indicator on the storage device is steady yellow. Figure
4-3 shows the location of the disk Alarm/Location indicator.
Alarm Information
On the Alarms and Events page of the DeviceManager, click the Current Alarms
tab. The alarm Hard Disk Is Faulty is displayed on the tab page.
Possible Causes
The disk is faulty.
Impact
A disk failure causes the disk domain to which the disk belongs to be degraded or
fail. If the disk domain is degraded, the system read/write performance
deteriorates and data loss may occur. If the disk domain fails, data loss occurs and
services are interrupted.
Fault Diagnosis
Procedure
Step 1 Check whether the disk Alarm/Location indicator on the storage device is steady
yellow.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
After the disk is reinserted, check whether the disk Alarm/Location indicator is
steady off and Health Status of the disk on the DeviceManager is Normal.
For details about how to replace a disk, see the Parts Replacement of the
corresponding product model.
NOTICE
Ensure that the replaced disk has a capacity equal to or larger than the faulty disk
and is of the same type as the faulty disk.
Step 4 Check whether the disk Alarm/Location indicator is steady off and Health Status
of the disk on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
to display the rear view. Click the interface module in the red square. Health
Status of the interface module is Faulty.
The interface module Power indicator on the storage device is steady yellow.
Figure 4-5 shows the location of the interface module Power indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm
Interface Module Is Faulty is displayed on the tab page.
Possible Causes
The interface module is faulty.
Impact
If an interface module malfunctions, it cannot process services and services will
work in single-link mode, resulting in service interruption risks.
Fault Diagnosis
Procedure
Step 1 Check whether the interface module Power indicator on the storage device is
steady yellow.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
NOTICE
Before removing and reinserting the interface module, power off the interface
module in the rear view of the storage device displayed on the DeviceManager.
After the interface module is reinserted, check whether the interface module
Power indicator is steady green and Health Status of the interface module on the
DeviceManager is Normal.
For details about how to replace an interface module, see the Parts Replacement
of the corresponding product model.
Step 4 Check whether the interface module Power indicator is steady green and Health
Status of the interface module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
Log in to the DeviceManager and choose System > Hardware > Devices. Select a
disk enclosure that is marked by an exclamation mark (!), and click to display
the rear view. On the rear view of the storage device, click the expansion module
in the red square. Health Status of the expansion module is Faulty.
The expansion module Alarm indicator on the storage device is steady yellow.
Figure 4-7 shows the location of the expansion module Alarm indicator.
Figure 4-7 Location of the expansion module Alarm indicator on a 2 U SAS disk
enclosure
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm
Expansion Module Is Faulty is displayed on the tab page.
Possible Causes
The expansion module is faulty.
Impact
An expansion module fault may cause the running of only a single link at the
backend, which may degrade system performance and system reliability.
Fault Diagnosis
Procedure
Step 1 Check whether the expansion module Alarm indicator on the storage device is
steady yellow.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
For details about how to replace an expansion module, see the Parts Replacement
of the corresponding product model.
Step 3 Check whether the expansion module Alarm indicator is steady off and Health
Status of the expansion module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
Log in to the DeviceManager and choose System > Hardware > Devices. Select a
controller enclosure. On the front view of the controller enclosure, click the fan
module in the red square. Health Status of the fan module is Faulty.
The fan module Running/Alarm indicator on the storage device is steady yellow.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm Fan
Module Is Faulty is displayed on the tab page.
Possible Causes
The fan module is faulty.
Impact
If a fan module is faulty, the temperature of the controller enclosure or disk
enclosure may increase. If the system works at a high temperature for a long time,
the service life of the system may be shortened.
Fault Diagnosis
Procedure
Step 1 Check whether the fan module Running/Alarm indicator on the storage device is
steady yellow.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Check whether any objects affect the rotation of the fans.
● If yes, go to Step 3.
● If no, go to Step 4.
After the objects are removed, check whether the fan module Running/Alarm
indicator is steady green and Health Status of the fan module on the
DeviceManager is Normal.
For details about how to replace a faulty fan module, see the Parts Replacement
of the corresponding product model.
Step 5 Check whether the fan module Running/Alarm indicator is steady green and
Health Status of the fan module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
Log in to DeviceManager and choose System > Hardware > Devices.
The BBU Running/Alarm indicator on the storage device is steady yellow. Figure
4-10 shows the location of the BBU Running/Alarm indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm BBU Is
Faulty is displayed on the tab page.
Possible Causes
The BBU is faulty.
Impact
A BBU fault may reduce the reliability of the system.
Fault Diagnosis
Procedure
Step 1 Check whether the BBU Running/Alarm indicator on the storage device is steady
yellow.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Replace the faulty BBU.
For details about how to replace a BBU, see the Parts Replacement of the
corresponding product model.
Step 3 Check whether the BBU Running/Alarm indicator is steady green and Health
Status of the BBU on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
marked by an exclamation mark (!). On the main page, click to switch to the
rear view of the storage device. On the rear view of the storage device, click the
power module in the red square. Health Status of the power module is Faulty.
The power module Running/Alarm indicator is steady yellow. Figure 4-12 and
Figure 4-13 show the location of the power module Running/Alarm indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm Power
Module Is Faulty is displayed on the tab page.
Possible Causes
The power module is faulty.
Impact
If a power module is faulty, the power supply to the system may become unstable
or be cut off, which reduces the reliability of the system.
Fault Diagnosis
Procedure
Step 1 Check whether the power module Running/Alarm indicator on the storage
device is steady yellow.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Replace the faulty power module.
For details about how to replace a power module, see the Parts Replacement of
the corresponding product model.
Step 3 Check whether the power module Running/Alarm indicator is steady green
and Health Status of the power module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
5 Common Operations
Procedure
Step 1 Use PuTTY to log in to the host where ProtectAgent needs to be uninstalled as a
system administrator.
Step 4 If the host was used to protect Oracle databases and ProtectAgent will be installed
on it again to protect VMware VMs in the future, or if the host was used to
protect VMware VMs and ProtectAgent will be installed on it again to protect
Oracle databases in the future, perform the following operations to delete
the /etc/HostSN/HostSN file on the host. Skip this step in other scenarios.
Step 5 If ProtectAgent will be installed on the host again and the OS has been reinstalled
on the host or the /etc/HostSN/HostSN file has been deleted from the host
before ProtectAgent is installed, perform the following operations to delete the
host on OceanProtect. Skip this step in other scenarios.
1. Log in to OceanProtect.
2. Choose Protect > Hosts and Applications > Hosts.
3. Search for the host on which ProtectAgent has been uninstalled based on the
IP address and click Delete Host.
Before deleting the host, ensure that the host is not associated with any SLA.
----End
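The conditions in Step 4 and Step 5 are easy to misread, so the sketch below restates them as two boolean checks. It is an illustration only: the function names and the resource labels "Oracle" and "VMware" are hypothetical, and the checks do not query the host or OceanProtect.

```python
from typing import Optional


def must_delete_hostsn_file(previous_resource: str,
                            next_resource: Optional[str]) -> bool:
    """Step 4: delete /etc/HostSN/HostSN only when the host switches between
    protecting Oracle databases and protecting VMware VMs (either direction).
    Pass None as next_resource if ProtectAgent will not be installed again."""
    return {previous_resource, next_resource} == {"Oracle", "VMware"}


def must_delete_host_on_oceanprotect(will_reinstall_agent: bool,
                                     os_reinstalled: bool,
                                     hostsn_file_deleted: bool) -> bool:
    """Step 5: delete the host on OceanProtect only when ProtectAgent will be
    installed again and the OS was reinstalled or the HostSN file was deleted."""
    return will_reinstall_agent and (os_reinstalled or hostsn_file_deleted)
```

For example, a host that previously protected Oracle databases and will protect VMware VMs requires both the file deletion in Step 4 and, as a consequence, the host deletion in Step 5.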
Step 1 Use PuTTY to log in to the host where ProtectAgent needs to be uninstalled as a
system administrator.
Step 2 Run the ps -ef | grep rdadmin command to check whether ProtectAgent is
running.
If the following command output is displayed, ProtectAgent is not running. In this
case, go to Step 3.
NOTE
End the monitor process first, and then end other processes in sequence.
2. If yes, run the net user rdadmin /delete command to delete user rdadmin.
----End
Precautions
If the protected resource is GaussDB (DWS) and the firewall is not required, run
the systemctl stop firewalld command to disable the firewall. If you need to
enable the firewall of the GaussDB (DWS) node or agent host, enable the ports
required for GaussDB (DWS) backup and restoration before enabling the firewall.
Otherwise, the GaussDB (DWS) cluster may not work properly. For details about
how to query the ports required for GaussDB (DWS) backup and restoration, see
"Querying Ports to Be Opened for GaussDB (DWS) Backup and Restoration" in
OceanProtect Backup Storage 1.3.0 Data Backup Feature Guide (for GaussDB
(DWS)). Enabling the firewall is a high-risk operation. Exercise caution when
enabling the firewall.
Procedure
Step 1 Download the ProtectAgent software package from OceanProtect.
For details, see 5.7 Downloading the ProtectAgent Software Package.
Step 2 Check whether the protected resource is an HDFS or HBase cluster and whether
Kerberos authentication is configured in the cluster.
● If yes, ensure that the time on the agent host, the OceanProtect backup
storage, and the Kerberos server is consistent before the installation. If the
time is inconsistent, use the same NTP server. Otherwise, the ProtectAgent
installation may fail.
● If no, ensure that the time on the agent host is consistent with that on the
OceanProtect backup storage before the installation.
To view the time on the OceanProtect backup storage, perform the following
steps:
a. Log in to the OceanProtect WebUI by referring to 5.6 Logging In to
OceanProtect.
b. Choose System > Settings > Device Time.
View the current device time. If the time is inconsistent, change the host
time or use the same NTP server. Otherwise, the ProtectAgent installation
may fail.
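The time comparison in Step 2 can be expressed as a simple skew check. This is a sketch under stated assumptions: the function name clocks_consistent is hypothetical, and the 60-second tolerance is an illustrative value because this document does not state how large a difference is acceptable. Both timestamps are assumed to be timezone-aware values noted from the agent host and from System > Settings > Device Time.

```python
from datetime import datetime, timedelta, timezone


def clocks_consistent(host_time: datetime,
                      device_time: datetime,
                      tolerance: timedelta = timedelta(seconds=60)) -> bool:
    """Return True if the two clocks differ by no more than the tolerance.

    The default tolerance is an assumption for illustration, not a value
    taken from the product documentation.
    """
    return abs(host_time - device_time) <= tolerance


# Example values only: the host clock is five minutes ahead of the device.
device = datetime(2023, 7, 12, 8, 0, 0, tzinfo=timezone.utc)
host = device + timedelta(minutes=5)
needs_time_correction = not clocks_consistent(host, device)
```

If the check fails, correct the host time or point both systems at the same NTP server, as described in Step 2.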
Step 3 Install ProtectAgent.
1. Use PuTTY to log in to the host where ProtectAgent is to be installed as a
system administrator.
2. Run the following commands in sequence to create a directory for storing the
software package and grant the permission.
The permission on the directory for storing the software package must be
755. Assume that the software package is stored in /opt/install.
mkdir /opt/install
chmod -R 755 /opt/install
The software package name varies depending on the agent type and OS. Use the
actual package name.
unzip DataProtect_xxx_client_xxx.zip
NOTE
▪ If you choose to overwrite the original host, the new host inherits the copy
information of the original host, and the UUID of the new host is the same as
that of the original host.
▪ If you choose not to overwrite the original host, delete the original host...
and reinstall it.
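Step 3 requires the permission on the package directory to be 755. The sketch below shows one way to confirm that on a Linux agent host before unpacking the package. The helper name has_mode is hypothetical; /opt/install is the example directory used in this procedure.

```python
import os
import stat


def has_mode(path: str, expected: int = 0o755) -> bool:
    """Return True if the permission bits of path equal the expected mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected
```

A call such as has_mode("/opt/install") returns False if the directory was created with a restrictive umask and chmod has not yet been run.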
Prerequisites
The language for installing ProtectAgent, OceanProtect, and the agent host OS
must be the same. Otherwise, information may be displayed in multiple languages.
Procedure
Step 1 Download the ProtectAgent software package.
1. Log in to the OceanProtect WebUI by referring to 5.6 Logging In to
OceanProtect.
2. Choose Protect > Hosts and Applications > Hosts.
3. Click ProtectAgent Package Management and set related parameters.
Table 5-1 describes the parameters.
OS Choose Windows.
4. Click OK.
Obtain the downloaded ProtectAgent software package.
If you fail to download the ProtectAgent package using Internet Explorer 11
on the Windows Server, refer to 5.8 What If I Failed to Download
ProtectAgent Using Internet Explorer 11 on the Windows Server?.
Step 2 Ensure that the time on the hosts is consistent with that on the OceanProtect
backup storage.
To view the time on the OceanProtect backup storage, perform the following
steps:
The software package name must not contain any space. Otherwise, ProtectAgent
installation will fail.
3. Go to the software package directory and install the software.
a. To specify the service IP address used for registration, perform this step.
Otherwise, go to Step 3.3.b.
i. Go to the ..\DataProtect_xxx_client_general_windows
\DataProtect_xxx_client_general_windows\conf directory, where
xxx indicates the ProtectAgent version.
ii. Open the client.conf file and add eip=XXX.XXX.XXX.XXX to the end
of the file, where XXX.XXX.XXX.XXX is the service IP address used for
registration.
b. Install the software.
i. Go to the ..\DataProtect_xxx_client_general_windows
\DataProtect_xxx_client_general_windows directory, where xxx
indicates the ProtectAgent version.
ii. Double-click install.bat to install the software.
NOTE
Install the software as prompted. During the installation, you must enter the
private key created when you downloaded the ProtectAgent software package
and enter the ProtectAgent user password.
Step 5 Repeat the preceding steps to install the ProtectAgent software on other agent
hosts.
----End
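Two preparation details in this procedure lend themselves to a quick script: the package name must not contain spaces, and the optional registration address is added as an eip= line at the end of client.conf. The following is a minimal sketch, not a supported tool. The function names are hypothetical, and the file is assumed to be UTF-8 text, which this document does not confirm.

```python
import ipaddress
from pathlib import Path


def package_name_is_valid(package_name: str) -> bool:
    """The installation fails if the package name contains a space."""
    return " " not in package_name


def append_registration_ip(conf_path: Path, service_ip: str) -> None:
    """Append eip=<service_ip> as the last line of client.conf.

    Raises ValueError if service_ip is not a valid IP address.
    """
    ipaddress.ip_address(service_ip)
    existing = conf_path.read_text(encoding="utf-8")
    # Start a new line first if the file does not already end with one.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with conf_path.open("a", encoding="utf-8") as conf:
        conf.write(f"{separator}eip={service_ip}\n")
```

Back up client.conf before editing it, and run install.bat only after the file contains the intended address.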
Step 1 Use PuTTY to log in to the CLI of the storage system as user admin.
NOTE
XXX indicates the name of the database to be operated. Currently, the following
database names are supported: "ADMINDB", "ANON_POLICY", "ANTI_RANSOMWARE",
"APPLICATIONDB", "ARCHIVEDB", "BASE_PARSER", "DATAMOVERDB", "DEE_SCHEDULER",
"DME_UNIFIED", "GENERALSCHEDULERDB", "INDEXER", "POSTGRES", "PROTECT_MANAGER",
and "REPLICATIONDB".
PROTECT_MANAGER=#
Step 4 Run the following command to disable the database paging function.
\pset pager off
CAUTION
Database operations are highly risky. Exercise caution when performing these
operations.
----End
Prerequisites
The management terminal meets the following requirements:
● Its operating system and browser are compatible.
OceanProtect supports multiple OSs and browsers. For details about the
compatibility information, use OceanProtect Compatibility Query.
● To ensure normal display, you are advised to set the zoom ratio of the
browser or the system to 100%.
– For Windows 7, choose Control Panel > Display. For other operating
systems, perform operations based on the site requirements.
– In a browser, press Ctrl+0 (Windows) or Command+0 (macOS).
Context
● This section uses the Windows OS as an example to describe how to log in to
OceanProtect.
● If a user performs no operations for longer than the timeout interval after
logging in (60 minutes by default; the interval can be modified), the system
logs the user out automatically.
Precautions
Use different browsers to access the OceanProtect and DeviceManager management
pages. Otherwise, when a user logs out of one management page or switches its
language, the other management page is also logged out or its language is
switched at the same time.
Procedure
Step 1 On the maintenance terminal, open a browser.
Step 2 In the address box, enter https://XXX.XXX.XXX.XXX:25080 and press Enter.
The OceanProtect login page is displayed.
XXX.XXX.XXX.XXX indicates the management IP address of the controller.
NOTE
The web browser may prompt that the website has a security certificate error. If the IP
address is correct, you can ignore the prompt and continue accessing OceanProtect.
NOTE
● The GUI may vary slightly depending on the product version and model. The actual GUI
prevails.
● To learn details about each step and operation, click to view online help.
● You can click the username in the upper right corner of the page and select Log Out to
log out of OceanProtect.
----End
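The login address in Step 2 is the management IP address of the controller followed by port 25080. The sketch below builds that address and rejects malformed input. The function name login_url is hypothetical, and only the dotted IPv4 form shown in this procedure is handled.

```python
import ipaddress

OCEANPROTECT_PORT = 25080  # port given in Step 2 of this procedure


def login_url(management_ip: str) -> str:
    """Build the OceanProtect login URL for a controller management IP.

    Raises ValueError if management_ip is not a valid IPv4 address.
    """
    address = ipaddress.IPv4Address(management_ip)
    return f"https://{address}:{OCEANPROTECT_PORT}"
```

For example, login_url("192.0.2.10") yields https://192.0.2.10:25080, where 192.0.2.10 is a documentation-only example address.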
Procedure
Step 1 Log in to the OceanProtect WebUI by referring to 5.6 Logging In to
OceanProtect.
Step 2 Choose Protection > Hosts and Applications > Hosts.
Step 3 Click ProtectAgent Package Management and set related parameters.
Table 5-2 describes the parameters.
Parameter Description
----End
B Glossary
A
Application-consistent backup This function simultaneously backs up application
data (such as database and VM data) and application memory data to ensure the
consistency of application system backups.
Archive storage Storage media for archived data.
Auditor A built-in user role. Users with this role can view all
system resources.
C
Copy archiving Copy archiving saves copies generated by backup or
replication to specified archive storage for long-term
retention.
Copy-based recovery A copy-based recovery mode in which data can be
recovered to the point in time when the copy is
generated.
Copy replication A data protection technology that replicates data copies
to a remote site for remote protection.
D
Data compression The process of encoding data to reduce its size. Lossy
compression (i.e., compression using a technique in which
a portion of the original information is lost) is acceptable
for some forms of data (e.g., digital images) in some
applications, but for most IT applications, lossless
compression (i.e., compression using a technique that
preserves the entire content of the original data, and
from which the original data can be reconstructed
exactly) is required.
Data desensitization This function is used to process sensitive data and protect
privacy.
Data protection administrator A built-in user role. Users with this role have
permissions related to data protection, data utilization, and resource
management.
Deduplication Deduplication is a specialized data compression
technique for eliminating coarse-grained redundant data,
typically to improve storage utilization. In the
deduplication process, duplicate data is deleted, leaving
only one copy of the data to be stored, along with
references to the unique copy of data. Deduplication is
able to reduce the required storage capacity since only
the unique data is stored.
Differential backup A backup mode in which data that has changed since the
last full backup is backed up.
E
External storage Production storage devices that can be managed by the
system.
F
File level restore A recovery mode that can quickly recover one or more
files on a host or VM disk.
Fileset A collection of one or more files or directories to be
protected.
Full backup A backup mode in which all data at the current time
point is backed up.
G
Global search Global search is used to query resources (hosts, VMs, and
databases), copies, and copy files in the system based on
keywords.
I
Incremental backup A backup mode which backs up only data newly added or
modified since the last full or incremental backup.
Incremental backup forever A backup mode in which only the newly added or
modified data is backed up after all data is backed up in the first backup.
Index A collection of files that store source data collected by
the search engine. An index consists of an index mapping
file and multiple index files. The index mapping file
defines all fields in the index (such as the field name, field
type, whether to store the field, and the mapping text analyzer),
and the index files store the instantiated data of the index
mapping file.
Instant restore Instant restore is a feature that provides the instant
restoration function for restoring data and creating
images using backups. With Instant restore enabled, the
time required for restoring data and creating images can
be reduced to minutes, which is much faster than the
normal restoration function.
L
Live Mount A technology that uses backup copies. It directly maps a
backup copy to a host that uses data in a standard
storage protocol to recover or access the backup copy
data immediately.
Local storage Built-in storage for storing backup data.
Log backup A backup mode in which transaction logs in databases
are backed up. Transaction logs are a series of records
about all modifications made to databases and the
transactions that executed these modifications.
M
Mount update policy A policy used to update instant mounting data.
P
Point-in-time recovery A point-in-time-based data recovery mode in which data
can be recovered to any point in time between two copies.
R
Rate limiting policy A policy used to limit the bandwidth for backup,
replication, and archiving.
Remote device administrator A built-in user role. This role is used for
authentication and authorization between the source and target clusters during
copy replication.
Resource A collection of hardware and software required by data
protection tasks, such as hosts, databases, and VMs.
S
Service Level Agreement Service Level Agreement adopts one or more policies
(backup, archiving, and replication) to protect resources at a specified
frequency or period.
Source cluster A cluster that initiates replication during copy replication.
System administrator A built-in user role. Users with this role have full system
permissions, including data protection, data utilization, system configuration,
and resource management.
T
Target cluster A cluster for receiving replication data from the source
cluster during copy replication.
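The glossary definitions of full, differential, and incremental backup differ only in the reference point used to decide which data is copied. The sketch below contrasts them on a set of files. It is an illustration of the definitions, not of how the product detects changes: the function name files_to_back_up is hypothetical, and comparing modification timestamps is an assumption made to keep the example small.

```python
from datetime import datetime
from typing import List, Mapping


def files_to_back_up(mode: str,
                     modified_at: Mapping[str, datetime],
                     last_full: datetime,
                     last_backup: datetime) -> List[str]:
    """Select files according to the glossary definition of each backup mode.

    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent full or incremental backup
    """
    if mode == "full":
        # All data at the current time point is backed up.
        return sorted(modified_at)
    if mode == "differential":
        cutoff = last_full      # changes since the last full backup
    elif mode == "incremental":
        cutoff = last_backup    # changes since the last full or incremental backup
    else:
        raise ValueError(f"unknown backup mode: {mode}")
    return sorted(name for name, changed in modified_at.items() if changed > cutoff)
```

With a full backup taken before two later changes and an incremental backup taken between them, a differential backup selects both changed files, while an incremental backup selects only the most recent one.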