Module 8 - Data Domain Cloud Tier
Introduction
This module focuses on the definition, features, and architecture of Data Domain
Cloud Tier.
Introduction
This lesson covers benefits and components of Data Domain Cloud Tier.
The Data Domain Cloud Tier enables the movement of data from the active tier of a
Data Domain system to low-cost, high-capacity object storage in the public, private,
or hybrid cloud for long-term data retention. Only unique, deduplicated data is sent
from the Data Domain system to the cloud or retrieved from the cloud. Sending
only deduplicated data ensures that the data being sent to the cloud occupies as
little space as possible.
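As a rough illustration of why sending only unique data keeps cloud usage small, the following Python sketch fingerprints segments and returns only those not seen before. The fixed segment size, the SHA-256 fingerprint, and the function name are assumptions made for illustration; they do not describe the actual Data Domain segmenting or fingerprinting.

```python
import hashlib


def unique_segments(data: bytes, known: set, segment_size: int = 4) -> list:
    """Return the segments of `data` whose fingerprints are not in `known`.

    `known` is updated in place, so a repeated segment is returned only once.
    The segment size and hash are illustrative assumptions.
    """
    to_send = []
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in known:
            known.add(fingerprint)
            to_send.append(segment)
    return to_send
```

Calling the function a second time with the same `known` set models a later data movement run: segments already recorded are skipped.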
Metadata to support the cloud is maintained in the cloud tier shelf of the local
storage. This metadata is used in operations such as deduplication, cleaning, and
replication. Using local storage for metadata minimizes writes to the cloud. The
metadata includes the index, the Directory Manager (DM) for managing the
namespace, and container metadata. Some metadata, including container
metadata, is also stored with the data in the cloud for disaster recovery purposes.
Cloud tiering provides a scalable solution for data storage. With the Data Domain
Cloud Tier, users can store up to two times the maximum active tier capacity in the
cloud for long-term retention of data. With cloud tiering policies, data is in the right
place at the right time. Data is scheduled to be moved to the cloud tier using
policies based on the age of the data.
When data is moved from the active tier to the cloud tier, it is deduplicated and
stored in object storage in the native Data Domain format. This results in a lower
Total Cost of Ownership (TCO) over time for long-term cloud storage. The cloud
tier supports encryption of data at rest and the Data Domain Retention Lock
feature, thus ensuring the ability to satisfy regulatory and compliance policies.
Here are a few considerations when deciding to implement Cloud Tier on a Data
Domain system:
• The DD Cloud Tier feature is not supported on any Data Domain system that
has the Extended Retention feature enabled.
• The DD Cloud Tier feature may consume all available bandwidth in a shared
WAN link, especially in a low-bandwidth configuration (1 Gbps). The DD Cloud
Tier feature may impact other applications sharing the WAN link.
• On systems with a dedicated management interface, reserve that interface for
system management traffic (using protocols such as HTTP and SSH). Backup
and Cloud Tier data traffic should be directed to other interfaces, such as eth1a.
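To get a feel for the WAN consideration above, the following Python sketch estimates how long a data movement run would occupy a link. It assumes decimal terabytes, no protocol overhead, and that Cloud Tier can use the stated share of the link; the function name and parameters are hypothetical.

```python
def transfer_hours(data_tb: float, link_gbps: float, share: float = 1.0) -> float:
    """Estimate hours needed to move `data_tb` terabytes over a WAN link.

    `share` is the fraction of the link available to Cloud Tier traffic.
    Assumes 1 TB = 10**12 bytes and ignores protocol overhead.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the range (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * share)
    return seconds / 3600
```

Under these assumptions, moving 10 TB of already deduplicated data over a fully available 1 Gbps link takes roughly 22 hours, which is why other applications on the same link may be affected.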
Model Sizing
Here are the Data Domain models that support Data Domain Cloud Tier along with
the supported physical memory and storage requirements for each model.
Take a moment, and familiarize yourself with the specifications of each model.
The Data Domain cloud tier is managed through a single Data Domain namespace.
There is no separate cloud gateway or virtual appliance required. Data movement
is supported by the native Data Domain policy management framework.
With Data Domain, cloud storage supports Dell EMC Elastic Cloud Storage (ECS),
Virtustream, Amazon Web Services, Google Cloud Storage, Alibaba Cloud, and
Microsoft Azure. Extra storage is required to hold metadata associated with the
data in the cloud tier. Deduplication, cleaning, and replication operations use
metadata.
The Data Domain Cloud Tier is supported on physical Data Domain systems with
expanded memory configurations. Data Domain Cloud Tier can be used with DD
VE 3.0 or later in 16-TB, 64-TB, and 96-TB storage options.
Extra metadata storage is required to support the cloud tier. The amount of
required metadata storage is based on the Data Domain platform.
A Data Domain system can run either the Cloud Tier or Extended Retention
features but not both on the same system.
The cloud tier is supported in high availability (HA) configuration. Both nodes must
be running DD OS 6.0 or higher with HA enabled.
With DD OS 6.0 and later, one or two cloud units are supported on each Data
Domain system. Each cloud unit has the maximum capacity of the active tier. The
active tier does not have to be at maximum capacity to scale the cloud tier to
maximum capacity. Each cloud unit maps to one cloud provider, and the two cloud
units can use different providers. Metadata shelves store metadata for both cloud units.
The number of metadata shelves that are needed depends on the cloud unit
physical capacity.
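The capacity rule above is simple arithmetic, sketched here in Python. The function name is hypothetical and the figures passed to it are examples only; check the model sizing information for real limits.

```python
def max_cloud_tier_capacity_tb(max_active_tier_tb: float, cloud_units: int) -> float:
    """Return the maximum cloud tier capacity for one or two cloud units.

    Each cloud unit can hold up to the maximum active tier capacity of the
    model, regardless of how much active tier storage is installed.
    """
    if cloud_units not in (1, 2):
        raise ValueError("one or two cloud units are supported")
    return max_active_tier_tb * cloud_units
```

For example, a model with a hypothetical 100 TB maximum active tier could address up to 200 TB in the cloud with two cloud units.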
This example shows a system with an active tier and two cloud units. Each cloud
unit has a capacity equal to that of the active tier. Data that is stored on the active
tier provides local access to data and can be used for operational recoveries. The
cloud tier provides long-term retention for data that is stored in the cloud.
The NFS, CIFS, and DD Boost protocols are supported for data movement to and
from the cloud tier.
VTL Tape Out to Cloud is supported with DD OS version 6.1 and later. DD VTL
Tape Out to Cloud supports storing the VTL vault on DD Cloud Tier storage.
Each cloud unit has its own segment index and metadata, so each cloud unit is a
deduplication unit by itself. There is no deduplication across the active tier and
the cloud units. Cloud tiers use the same Data Domain compression algorithm. Data is
compressed using the LZ compression algorithm for the cloud tier. Cloud
deduplication does not do the packing phase.
Cloud tier cleaning does not do partial copy forward to avoid unnecessary reads
from the cloud. When all segments within a region are dead, the entire object is
deleted. Most of the work of cleaning happens locally using local cloud metadata
information. The cloud is accessed to delete objects in the cloud with no live data
and to perform some copy forward of container metadata-related activities.
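The cleaning rule described above can be sketched in Python as follows. The data model (a dictionary of object names to segment identifiers, held locally) is an assumption for illustration and not the actual Data Domain metadata layout.

```python
def objects_to_delete(objects: dict, live_segments: set) -> list:
    """Return the names of cloud objects that contain no live segments.

    `objects` maps an object name to the set of segment ids it holds.
    Objects with at least one live segment are left untouched, which
    models the absence of partial copy forward in the cloud tier.
    """
    return sorted(
        name for name, segments in objects.items()
        if not (segments & live_segments)
    )
```

Because the decision uses only local metadata, the cloud would be contacted solely to delete the objects returned.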
DD Retention Lock is supported in the cloud tier. Files that are locked on the active
tier using retention lock can be moved to the cloud. Also, you can apply retention
lock on files that are already in the cloud tier. Deleting files in the cloud unit is
prevented on compliant Data Domain systems.
Secure HTTP (HTTPS) is used for the transfer of data between a Data Domain
system and the cloud.
Data Domain encryption can be enabled at three levels: the Data Domain system,
the active tier, and the cloud tier. Encryption of the active tier is only applicable if
encryption is enabled for the Data Domain system. A license for encryption is
required. You are prompted for the security officer username and password to
enable encryption.
Cloud units have separate controls for enabling encryption. Data at rest encryption
is enabled by default on data in the cloud. Users can disable encryption. Active tier
encryption is not required to enable cloud tier encryption. With DD OS 6.0 and
later, using an external key manager is not supported.
Once data is in the cloud tier, the encryption status cannot be changed. The
decision to encrypt data or not must be made before sending any data to the cloud.
The use of an embedded key manager is supported.
Replication
Managed file replication and MTree replication are supported on cloud tier-enabled
Data Domain systems. One or both systems can have cloud tier enabled. If the
source system is cloud tier-enabled, data may be read from the cloud if the file was
already migrated to the cloud tier. A replicated file is always placed first in the
active tier on the destination system even when cloud tier is enabled.
To support replication to the cloud, any source system must be running DD OS 6.0
or later. See the system requirements in the DD OS Release Notes.
Introduction
Configure Storage
As mentioned previously, with cloud tier storage, the Data Domain system holds
the metadata for the files residing in the cloud. A copy of the metadata resides in
the cloud for disaster recovery.
To enable the cloud tier, you must meet the storage requirement for the licensed
capacity. Configure the cloud tier in the file system. Click Next. A cloud file system
requires a local store for a local copy of the cloud metadata.
If creating a file system, the cloud tier can be enabled at the time that the new file
system is created. To create a file system, select Create File System and then
configure the active tier of the system.
In Data Management > File System, the main panel displays statistics for the
active and cloud tiers. Highlighted here is space usage data for the cloud tier.
When the cloud tier is added to the file system, add a cloud unit for a supported
cloud provider for storing cloud tier data. A maximum of two cloud units are
supported.
You can modify cloud unit credentials for an existing cloud unit, if necessary. To
delete a cloud unit, contact EMC Support.
Firewall port requirements. Port 443 or Port 80 must be open to the cloud provider
networks for both endpoint IPs and provider authentication IP for bi-directional
traffic. Remote cloud provider destination IP and access authentication IP address
ranges must be enabled through the firewall.
For ECS private cloud, local ECS authentication, and web storage (S3), access to
ports 9020 (HTTP) and 9021 (HTTPS) must be enabled through the firewall. ECS
private cloud load balancer IP access and port rules must be configured.
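Before adding a cloud unit, it can help to confirm that the required ports are reachable from a host on the same network path. The Python sketch below uses only the standard library; the helper name and the `ECS_PORTS` constant are hypothetical, and a successful TCP connection shows only that the port is open, not that authentication will succeed.

```python
import socket

# Ports named in this lesson for ECS private cloud access.
ECS_PORTS = {"http": 9020, "https": 9021}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as `can_connect("ecs.example.com", ECS_PORTS["https"])` would test the HTTPS port; the hostname here is a placeholder.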
Proxy settings must be configured to support object sizes up to 4.5 MB. If customer
traffic is being routed through a proxy, a self-signed/CA-signed proxy certificate
must be imported.
OpenSSL cipher suites. Default communication with all cloud providers is initiated
with a strong cipher.
Import CA certificates before adding cloud units for ECS, Virtustream Storage
Cloud, Azure, and Amazon Web Services S3 (AWS).
For AWS, Azure, and Virtustream, root CA certificates can be downloaded from
https://www.digicert.com/digicert-root-certificates.htm. For AWS and Azure cloud
providers, download the Baltimore CyberTrust Root certificate. For a Virtustream
cloud provider, download the DigiCert High Assurance EV Root CA certificate.
For ECS, the root certificate authority varies by customer. Contact the load
balancer provider for details.
Downloaded certificate files have a .crt extension. Use OpenSSL to convert the file
from .crt format to .pem. For additional information, see the Data Domain System
Administration Guide found at support.emc.com.
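The course material calls for OpenSSL to perform the conversion. As an alternative sketch, the Python standard library can do the same re-encoding, assuming the downloaded .crt file is DER (binary) encoded; if the file already contains PEM text, no conversion is needed. The function names and file paths are hypothetical.

```python
import ssl
from pathlib import Path


def der_to_pem(der_bytes: bytes) -> str:
    """Re-encode a DER certificate as PEM text (Base64 with header and footer)."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def convert_file(crt_path: str, pem_path: str) -> None:
    """Read a DER-encoded .crt file and write the PEM equivalent."""
    pem_text = der_to_pem(Path(crt_path).read_bytes())
    Path(pem_path).write_text(pem_text)
```

Note that `ssl.DER_cert_to_PEM_cert` only re-encodes the bytes; it does not verify that they form a valid certificate.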
Regions are configured at bucket level instead of object level. All objects that are
contained in a bucket are stored in the same region. A region is specified when a
bucket is created, and cannot be changed once it is created.
The Alibaba Cloud user credentials must have permissions to create and delete
buckets and to add, modify, and delete files within the buckets they create.
• GetObject
• PutObject
• DeleteObject
AWS offers a range of storage classes. The Cloud Providers Compatibility Matrix,
available from http://compatibilityguide.emc.com:8080/CompGuideApp/, provides
up-to-date information about the supported storage classes.
For enhanced security, the Cloud Tier feature uses Signature Version 4 for all AWS
requests. Signature Version 4 signing is enabled by default.
The AWS user credentials must have permissions to create and delete buckets and
to add, modify, and delete files within the buckets they create.
• DeleteBucket
• ListAllMyBuckets
• GetObject
• PutObject
• DeleteObject
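The permissions listed above could be expressed as an AWS IAM policy document. The Python sketch below builds one from the listed actions only; the `s3:` prefix follows standard IAM action naming, while the wildcard resource and the helper name are assumptions. A real deployment may need additional actions and a narrower resource scope.

```python
import json

# Actions exactly as listed in this lesson.
LISTED_ACTIONS = [
    "DeleteBucket",
    "ListAllMyBuckets",
    "GetObject",
    "PutObject",
    "DeleteObject",
]


def build_policy(actions: list, resource: str = "*") -> dict:
    """Build an IAM policy document that allows the given S3 actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"s3:{action}" for action in actions],
                "Resource": resource,  # assumption: narrow this in practice
            }
        ],
    }


def policy_json(actions: list) -> str:
    """Return the policy as formatted JSON text."""
    return json.dumps(build_policy(actions), indent=2)
```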
3. In the Add Cloud Unit screen, enter a name for the cloud unit
4. For Cloud provider, select EMC Elastic Cloud Storage (ECS)
5. Supply the provider Access key and Secret key
6. Enter the provider endpoint with this format
http://<ip/hostname>:<port>. If using a secure endpoint, use HTTPS
instead
7. If an HTTP proxy server is required to get around a firewall, click Configure for
HTTP Proxy Server
8. Click Add
The File System main window displays information for the cloud unit.
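The endpoint format used in the steps above can be checked before it is typed into the dialog. This Python sketch validates the scheme, host, and port with the standard library; the function name is hypothetical, and requiring an explicit port is an assumption based on the format shown.

```python
from urllib.parse import urlsplit


def is_valid_endpoint(endpoint: str) -> bool:
    """Check that endpoint looks like http(s)://<ip/hostname>:<port>."""
    try:
        parts = urlsplit(endpoint)
        # Accessing .port raises ValueError for a non-numeric or out-of-range port.
        port = parts.port
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname) and port is not None
```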
The Cloud Tier feature supports qualified S3 cloud providers under an S3 Flexible
provider configuration option.
3. Enter a name for this cloud unit. Only alphanumeric characters are supported
4. For Cloud provider, select Flexible Cloud Tier Provider Framework for S3 from
the drop-down list
5. Enter the provider Access key as password text
6. Enter the provider Secret key as password text
7. Specify the appropriate Storage region
8. Enter the provider endpoint in this format: http://<ip/hostname>:<port>. If you
are using a secure endpoint, use https instead
9. For Storage class, select the appropriate storage class from the drop-down list
10. Ensure that port 443 (HTTPS) is not blocked in firewalls. Communication with
the S3 cloud provider occurs on port 443
11. If an HTTP proxy server is required to get around a firewall for this provider,
click Configure for HTTP Proxy Server. Enter the proxy hostname, port, user,
and password
12. Click Add
The Google Cloud Provider user credentials must have permissions to create and
delete buckets and to add, modify, and delete files within the buckets they create.
Microsoft Azure offers a range of storage account types. The Cloud Providers
Compatibility Matrix, available from
http://compatibilityguide.emc.com:8080/CompGuideApp/, provides up-to-date
information about the supported storage classes.
3. Enter a name for this cloud unit. Only alphanumeric characters are supported
4. For Cloud provider, select Microsoft Azure Storage from the drop-down list
5. For Account type, select Government or Public
6. Select the storage class from the drop-down list
7. Enter the provider Account name
8. Enter the provider Primary key as password text
9. Enter the provider Secondary key as password text
10. Ensure that port 443 (HTTPS) is not blocked in firewalls. Communication with
the Azure cloud provider occurs on port 443
11. If an HTTP proxy server is required to get around a firewall for this provider,
click Configure for HTTP Proxy Server. Enter the proxy hostname, port, user,
and password
12. Click Add
The Virtustream cloud provider uses the following endpoints, depending on storage
class and region. Be sure that DNS can resolve these hostnames before
configuring cloud units.
• s-us.objectstorage.io
• s-eu.objectstorage.io
• s-eu-west-1.objectstorage.io
• s-eu-west-2.objectstorage.io
• s-us-central-1.objectstorage.io
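A quick way to confirm DNS resolution for the endpoints above is sketched below in Python. The helper name is hypothetical, and the check should be run from a host that uses the same DNS servers as the Data Domain system for the result to be meaningful.

```python
import socket

VIRTUSTREAM_ENDPOINTS = [
    "s-us.objectstorage.io",
    "s-eu.objectstorage.io",
    "s-eu-west-1.objectstorage.io",
    "s-eu-west-2.objectstorage.io",
    "s-us-central-1.objectstorage.io",
]


def unresolved_hosts(hosts, resolve=socket.getaddrinfo) -> list:
    """Return the hostnames that fail DNS resolution.

    `resolve` defaults to socket.getaddrinfo and can be replaced for testing.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

An empty list from `unresolved_hosts(VIRTUSTREAM_ENDPOINTS)` indicates that every hostname resolved.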
Introduction
In this lab you will configure cloud tier storage on a Data Domain system using Dell
EMC Elastic Cloud Storage as the cloud service provider.
Data Movement
Introduction
This lesson covers data movement to the cloud and supported protocols.
A file is moved from the active tier to the cloud tier based on the date it was last
modified. For data integrity, the entire file is moved at this time. The Data
Movement Policy establishes the age threshold, age range, and the destination.
The data movement schedule is set at File System > Cloud Units.
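The age-threshold part of a data movement policy can be pictured with the Python sketch below. It models only the threshold, not the age range or destination, and the inclusive comparison is an assumption; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def eligible_for_cloud(last_modified: datetime, now: datetime,
                       age_threshold_days: int) -> bool:
    """Return True if the file is at least `age_threshold_days` old.

    Age is measured from the last modification time, as in the policy above.
    """
    return now - last_modified >= timedelta(days=age_threshold_days)
```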
The schedule is set relative to active tier cleaning. Run cloud tier cleaning after
every Nth run of active tier cleaning. By default, cloud tier cleaning runs after every
4th scheduled active tier cleaning. On-demand cleaning is per cloud unit.
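The relative schedule can be expressed as a simple modulo check, sketched here in Python. The function name and the idea of counting scheduled active tier cleaning runs from 1 are assumptions for illustration.

```python
def cloud_cleaning_due(active_cleaning_runs: int, frequency: int = 4) -> bool:
    """Return True when cloud tier cleaning should follow this active tier run.

    `active_cleaning_runs` is the number of scheduled active tier cleanings
    completed so far; `frequency` is the N in "every Nth run".
    """
    if frequency < 1:
        raise ValueError("frequency must be at least 1")
    return active_cleaning_runs > 0 and active_cleaning_runs % frequency == 0
```

With the default frequency of 4, cloud tier cleaning follows the 4th, 8th, 12th, and subsequent scheduled active tier cleanings.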
Recall is the act of bringing data from the cloud to the active tier. Restore is the act
of recovering data from the active tier and making it available to the client.
For nonintegrated backup applications, you must recall the data to the active tier
before you can restore the data. Backup administrators must trigger a recall or
backup applications must perform a recall before cloud-based backups can be
restored. Once a file is recalled, aging is reset and starts again from 0, and the file
is eligible based on the age policy set. A file can be recalled on the source MTree
only. Integrated applications can recall a file directly.
Recall fails if there is no space in the active tier to move the file. This decision is
made before any movement is started. Recall is per file. Cloud Tier checks for
existing data segments on the active tier. Only segments not present in the active
tier are invoked for recall from the cloud.
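The recall behavior described above can be sketched in Python as a planning step that runs before any data moves. The data model (segment ids mapped to sizes) and the choice to check free space against only the missing segments are assumptions; the function name is hypothetical.

```python
def plan_recall(file_segments: dict, active_tier: set, free_bytes: int) -> list:
    """Return the segment ids that must be fetched from the cloud for one file.

    `file_segments` maps segment id -> size in bytes. Segments already on the
    active tier are skipped. The space check happens before any movement.
    """
    missing = {
        seg: size for seg, size in file_segments.items()
        if seg not in active_tier
    }
    if sum(missing.values()) > free_bytes:
        raise RuntimeError("recall fails: not enough space in the active tier")
    return sorted(missing)
```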
Select Data Management > File System > Summary. In the Cloud Tier section of
the Space Usage panel, click Recall, or expand the File System status panel at
the bottom of the screen. Click Recall.
The Recall link is available only if a cloud unit is created and has
data. The Recall File from Cloud dialog is displayed.
In the Recall File from Cloud dialog, enter the exact file name (no wildcards) and
full path of the file, for example: /data/col1/mt11/file1.txt. Click Recall.
Before DD OS 6.1, there was a four-file limit for recalls at a given time. Any new
recall jobs had to poll for a slot, creating a bottleneck.
In DD OS 6.1 and later, only four recall jobs are active at any given time.
Customers can queue up to 1,000 recall jobs to run automatically as previous jobs
complete. The recall queue is auto-regenerated, so if the system is restarted during
a recall, the recall continues when the system is back up.
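The DD OS 6.1 queueing behavior can be modeled with a small Python class. This is a conceptual sketch, not the product implementation; the class name is hypothetical, and treating the 1,000-job limit as applying to waiting jobs only is an assumption.

```python
from collections import deque


class RecallQueue:
    """Model of four active recall jobs with up to 1,000 queued behind them."""

    MAX_ACTIVE = 4
    MAX_QUEUED = 1000

    def __init__(self):
        self.active = []
        self.pending = deque()

    def submit(self, path: str) -> None:
        """Start the recall if a slot is free, otherwise queue it."""
        if len(self.active) < self.MAX_ACTIVE:
            self.active.append(path)
        elif len(self.pending) < self.MAX_QUEUED:
            self.pending.append(path)
        else:
            raise RuntimeError("recall queue is full")

    def complete(self, path: str) -> None:
        """Finish an active recall and promote the next queued job, if any."""
        self.active.remove(path)
        if self.pending:
            self.active.append(self.pending.popleft())
```

Unlike the pre-6.1 behavior, a fifth job submitted here does not have to poll for a slot; it starts automatically when an earlier job completes.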
Once the file has been recalled to the active tier, you can restore the data.
Tape Out to cloud storage offers the ability to store offsite and retrieve tapes for
long-term retention (LTR) use cases.
VTL Tape Out to Cloud requires DD OS 6.1 or higher installed on either a physical
Data Domain system or a DD VE instance. The Data Domain Cloud Tier feature
must be enabled. A cloud profile and cloud unit name should be configured before
using the VTL Tape Out to Cloud feature.
Both VTL and Cloud Tier Capacity licenses are required to use the VTL Tape Out
to Cloud feature.
The workflow for backing up and restoring data using the VTL Tape Out to Cloud
feature is as follows:
1. Perform the backup Server/Client configuration and user application setup
2. Backup to primary disk storage pools
3. During backup, the data is backed up while the backup server maintains the
necessary backup catalog and tracking metadata
4. Data replicates to Data Domain VTL vault
5. This replication can be onsite or to geographically separated sites. The backup
server tracks the tapes in a “mountable” state
6. Once the tapes are ready for long-term retention, they are ejected from the tape
storage pool
You can manage a DD VTL using the Data Domain System Manager or the Data
Domain Operating System (DD OS) Command Line Interface (CLI).
End-to-End Workflow
The Data Domain VTL Tape Out to Cloud feature uses these components in the
Data Domain system. The user interacts with the Data Domain system using the
CLI or GUI. The VTL service uses the Tape Out to Cloud functionality built on the
Data Domain file system Long-Term Retention service.
The DD file system uses NFS v3 APIs to access the VTL tape pool to send the
virtual tapes in the vault to the cloud tier.
There are two types of policies that Tape Out to Cloud is built upon.
The Tape selection policy is applied at the pool level and sets the age threshold for
data moving to the cloud. The minimum setting is 14 days. If the policy is set to
user-managed, the user uses a command to select one or more tapes to move at
the next scheduled data movement. If the setting is set to none, no tapes are
moved to the cloud.
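The three tape selection settings can be pictured with the Python sketch below. The policy strings, the function name, and the barcode-to-age data model are hypothetical and are not DD OS command syntax; see the Data Domain Command Reference Guide for the real commands.

```python
MIN_AGE_THRESHOLD_DAYS = 14  # minimum setting stated in this lesson


def tapes_to_move(tapes: dict, policy: str, age_threshold_days: int = 14,
                  user_selected=()) -> list:
    """Return barcodes of vaulted tapes to move at the next data movement.

    `tapes` maps a tape barcode to its age in days.
    """
    if policy == "none":
        return []
    if policy == "user-managed":
        # Only tapes explicitly chosen by the user are moved.
        return sorted(b for b in user_selected if b in tapes)
    if policy == "age-threshold":
        if age_threshold_days < MIN_AGE_THRESHOLD_DAYS:
            raise ValueError("age threshold must be at least 14 days")
        return sorted(b for b, age in tapes.items() if age >= age_threshold_days)
    raise ValueError(f"unknown policy: {policy}")
```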
The cloud data movement schedule defines how frequently vaulted tapes are
moved to the cloud. The cloud data movement schedule can be set to never, to any
number of days/weeks, or run manually.
You can find specific commands that are used to set the tape selection policy, and
cloud data movement schedule in the Data Domain Command Reference Guide.
The vtl tape recall start command can be used to recall one or more
tapes from the cloud.
The vtl tape show pool command can also be used to view the state of tapes
being recalled.
When the recall is completed, the vtl tape show pool command can be used
to list the location of the tapes. The recalled tapes are displayed as “vault.”
Introduction
In this lab you will configure data movement to the cloud and then recall a file from
the cloud.
Summary