Data Domain Fundamentals Student Guide
Copyright ©2015 EMC Corporation. All Rights Reserved. Published in the USA. EMC believes the information in this publication is accurate as of its
publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND
WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. The trademarks, logos, and
service marks (collectively "Trademarks") appearing in this publication are the property of EMC Corporation and other parties. Nothing contained in this
publication should be construed as granting any license or right to use any Trademark without the prior written permission of the party that owns the
Trademark.
EMC, EMC² AccessAnywhere Access Logix, AdvantEdge, AlphaStor, AppSync ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems,
Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Bus-Tech, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator,
Centera, CenterStage, CentraStar, EMC CertTracker. CIO Connect, ClaimPack, ClaimsEditor, Claralert ,cLARiiON, ClientPak, CloudArray, Codebook
Correlation Technology, Common Information Model, Compuset, Compute Anywhere, Configuration Intelligence, Configuresoft, Connectrix, Constellation
Computing, EMC ControlCenter, CopyCross, CopyPoint, CX, DataBridge , Data Protection Suite. Data Protection Advisor, DBClassify, DD Boost, Dantz,
DatabaseXtender, Data Domain, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, DLS ECO, Document Sciences, Documentum, DR Anywhere,
ECS, elnput, E-Lab, Elastic Cloud Storage, EmailXaminer, EmailXtender , EMC Centera, EMC ControlCenter, EMC LifeLine, EMCTV, Enginuity, EPFM.
eRoom, Event Explorer, FAST, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File Virtualization, Graphic Visualization, Greenplum,
HighRoad, HomeBase, Illuminator , InfoArchive, InfoMover, Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS,Kazeon, EMC LifeLine,
Mainframe Appliance for Storage, Mainframe Data Library, Max Retriever, MCx, MediaStor , Metro, MetroPoint, MirrorView, Multi-Band
Deduplication,Navisphere, Netstorage, NetWorker, nLayers, EMC OnCourse, OnAlert, OpenScale, Petrocloud, PixTools, Powerlink, PowerPath, PowerSnap,
ProSphere, ProtectEverywhere, ProtectPoint, EMC Proven, EMC Proven Professional, QuickScan, RAPIDPath, EMC RecoverPoint, Rainfinity, RepliCare,
RepliStor, ResourcePak, Retrospect, RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, ScaleIO Smarts, EMC Snap, SnapImage,
SnapSure, SnapView, SourceOne, SRDF, EMC Storage Administrator, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix
DMX, Symmetrix VMAX, TimeFinder, TwinStrata, UltraFlex, UltraPoint, UltraScale, Unisphere, Universal Data Consistency, Vblock, Velocity, Viewlets, ViPR,
Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning, Virtualize Everything, Compromise Nothing, Virtuent, VMAX, VMAXe, VNX, VNXe, Voyence,
VPLEX, VSAM-Assist, VSAM I/O PLUS, VSET, VSPEX, Watch4net, WebXtender, xPression, xPresso, Xtrem, XtremCache, XtremSF, XtremSW, XtremIO,
YottaYotta, Zero-Friction Enterprise Storage.
Copyright 2016 EMC Corporation. All rights reserved. Data Domain Fundamentals 1
This course covers the features, benefits, and advantages of using a Data Domain system
for backup operations, the physical architecture of a typical backup environment using Data
Domain systems, and the methods used for administering a Data Domain system.
Having an understanding of where this course fits into your Data Domain curriculum will
help you find the additional training you require, as well as have realistic expectations of
what is covered in the course.
This course, Data Domain Fundamentals, is a prerequisite for all courses in both the Data
Domain Specialist for Implementation Engineers and Data Domain Specialist for Storage
Administrators certifications.
To earn the Data Domain Specialist for Implementation Engineers certification, you will
need to complete the following:
• Data Domain System Administration
• Data Domain Hardware Installation
• Data Domain System Implementation with Application Hardware
• Data Domain Extended Retention – Installation, Configuration, and Administration
• Data Domain Specialist Exam for Implementation Engineers
To earn the Data Domain Specialist for Storage Administrators certification, you will need to
complete the following:
• Data Domain System Administration
• Data Domain System Implementation with Application Hardware
• Data Domain Extended Retention – Installation, Configuration, and Administration
• Data Domain Specialist Exam for Storage Administrators
This module focuses on the Data Domain benefits, features and current hardware models.
This lesson covers the benefits and advantages of the Data Domain solution for backup,
recovery, archiving, and compliance.
Increasing storage speed and capacity to keep pace with the data being generated, while
remaining cost-effective, is a perpetual challenge. Gathering, storing, and protecting data
backups is one of the most expensive and resource-intensive tasks. In a conventional
tape-centric environment, writing data to tapes and then shipping and storing those tapes
off-site is one of the largest financial and labor challenges. The above diagram illustrates
the conventional process of handling backups through backup servers.
In step one, clients and servers store data on the primary storage device.
In step two, the backup servers preserve the data on the primary storage device by copying
it to a tape library.
In step three, tapes are physically transported and stored off-site for archival and disaster
recovery purposes. This ensures the backup data will not be lost due to a negative event in
the data center.
Step four describes the off-site data recovery process. In this case, data recovery requires a
manual process of transporting the tapes back to the primary storage device in the data
center.
Introducing a Data Domain system to an existing backup environment adds a scalable
backup and archive solution to any enterprise environment.
Data Domain systems are a protection storage platform for backup and archive data that
reduce the amount of disk storage needed to retain and protect data by ratios of 10-30x
and greater, making disk a cost-effective alternative to tape. These systems can protect up
to 86 PB of logical capacity in a single system enabling customers to retain data online and
onsite for longer retention periods, as well as providing faster and more reliable restores.
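As a rough illustration of what a 10-30x reduction means for capacity planning, the sketch
below divides the logical (pre-deduplication) backup size by an assumed reduction ratio. The
function name and the 300 TB figure are hypothetical and chosen only for the example; real
ratios depend on the data and the retention policy.

```python
def physical_tb_needed(logical_tb: float, dedup_ratio: float) -> float:
    """Disk needed to hold logical_tb of backups at a given reduction ratio."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_tb / dedup_ratio


# Hypothetical example: 300 TB of logical backup data at both ends of the range.
for ratio in (10, 30):
    print(f"{ratio}x reduction: {physical_tb_needed(300, ratio):.1f} TB of disk")
```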
Data Domain systems allow more backups to complete faster, with throughput of up to 59
TB/hour. This also reduces the pressure on limited backup windows.
EMC Data Domain Replicator software transfers only the deduplicated and compressed
unique changes across any IP network, requiring a fraction of the bandwidth, time, and
cost, compared to traditional replication methods. “Time-to-DR readiness” is greatly
reduced when compared to other replication methods.
Data Domain’s Data Invulnerability Architecture – built into every Data Domain system –
provides the industry’s best defense against data integrity issues ensuring you can access
and recover your data when you need it.
Finally, Data Domain systems are able to consolidate backup, archive, and disaster recovery
onto a single platform making them an ideal protection storage solution.
A Data Domain appliance is a storage system with shelves of disks and a controller. The
appliance is optimized first for backup and second for archive applications. It also
supports industry-leading enterprise applications.
The list on the left consists primarily of backup, archive, and enterprise applications.
It includes not only EMC’s own offerings, NetWorker and Avamar, but also products from Dell,
Symantec, Oracle, HP, IBM, SAP HANA, and others.
The data is transferred from the application to primary storage through Ethernet or Fibre
Channel.
Ethernet connections use the mass storage protocols NFS or CIFS. They can also use
optimized protocols such as NDMP, or products such as Data Domain Boost, a custom
integration with leading backup applications.
Fibre Channel connectivity enables a Data Domain system to act as a virtual tape library
which eliminates virtual tape management. Fibre Channel connectivity also enables DD
Boost over Fibre Channel.
After the Data Domain system receives the data, it deduplicates the data during the storage
process and later replicates it for disaster recovery. Only the deduplicated and compressed
unique data segments that remain after filtering on the target tier are replicated.
Data Domain systems are the ideal protection storage platform for backup and archive
data. You can use a variety of data sources including databases, email servers, virtual
machines, and more. One of the main strengths of Data Domain systems is that all of these
data sources, and a broad range of backup and archive use cases and applications, can be
protected on a single system.
For both backup and archive use cases, one unique feature that Data Domain systems offer
is the ability to encrypt data inline, meaning data is deduplicated and then encrypted in
real time as it is written to disk. Furthermore, on the archive side, Data Domain systems
can meet a variety of US and international compliance regulations for archive data.
Deployed in the context of today's backup and recovery solutions, Data Domain systems
optimize data protection environments by minimizing reliance on tape. Customers can
consolidate tape operations to reduce costs and ease management while significantly
reducing backup windows and quickly restoring key applications. Both local and remote
Data Domain deduplication storage reduce data volume by 10 to 30 times, making disk
backup cost-effective and wide area network vaulting of data operationally feasible. In
summary, backup and recovery solutions employing Data Domain systems minimize risk,
improve data protection and recovery, and control costs.
In the context of business continuity and availability, Data Domain systems optimize data
protection and allow customers to quickly restore key applications. Incorporating Data
Domain systems into business continuity and availability solutions helps ensure that
business operations recover quickly from disasters while improving the availability of key
business systems.
Finally, all data sent to a Data Domain system can be efficiently replicated to a secondary
site for disaster recovery and/or long-term retention. This secondary site can be “on
premises” at your own DR site or “in the cloud” at a service provider’s facility.
In summary, Data Domain systems simplify the storage and handling of data by reducing,
or in many cases, entirely eliminating the need for tape for data storage. With Data Domain
systems, data is backed up to disk instead of tape.
Data Domain deduplication greatly reduces the data footprint before the data is backed up.
Data Domain global compression technology combines an exceptionally efficient,
high-performance inline deduplication technology with a local compression technique. The
reduced data footprint allows data to be retained on-site for longer periods and allows
transfer across the network for archival.
Tape backups can optionally be incorporated into a Data Domain environment if regulatory or
corporate policies require them.
Data Domain systems are part of the EMC Data Protection Suite.
For backup and recovery, the Data Protection Suite also includes Avamar and NetWorker
with integrated data deduplication technology, providing enterprise-wide protection of files,
applications, and databases in both physical and virtual environments, as well as unified
search capabilities with Data Protection Search. The Suite allows enterprises to ensure long-
term reliable backup retention, including long term retention to private and public clouds
with CloudBoost. It also includes Data Protection Advisor, for robust, yet simple data
protection environment monitoring, analysis and reporting.
The introduction of DD Boost for Enterprise Applications to the Suite provides advanced
integration with leading third party backup and enterprise applications. By distributing parts
of the deduplication to the backup or application server, DD Boost technology speeds
backups by up to 50%. In addition, DD Boost technology enables more efficient resource
utilization, including reducing the backup impact on the server by 20 to 40% and reducing
the impact on the network by 80 to 99%.
The Cloud option includes MozyEnterprise which combines enterprise-grade security and
controls with best-in-class online backup and recovery. MozyEnterprise delivers the ability
for enterprises to easily protect mobile workers and small remote offices.
The Syncplicity option delivers customers secure sync, share, collaboration, and real-time
document protection for DT/LT and mobile users to hybrid cloud.
The archive components of the Suite leverage EMC SourceOne. An added option for Data
Protection Suite for Archive is available, which includes SourceOne Discovery Manager and
Email Supervisor.
All of these best-in-class solutions are part of EMC’s affordable, easy-to-implement, and
easy-to-manage Data Protection Suite.
This lesson covers Data Domain system hardware models.
This is the current Data Domain family. By reducing backup storage requirements by 10 to
30x and archive storage requirements by up to 5x, Data Domain systems can significantly
reduce the storage footprint in environments ranging from small enterprise/ROBO (Remote
Office/Branch Office) sites all the way up to large enterprises.
This table highlights the capacities and compatibilities of the options for the ES30 expansion
shelves.
The ES30-SATA can accommodate 15 drives of 1 TB, 2 TB, or 3 TB each and supports the
DD4200, DD4500, DD7200, and DD9500.
The ES30-SAS can accommodate 15 drives of 2 TB or 3 TB each and supports the DD2500,
DD4200, DD4500, DD7200, and DD9500.
Both the ES30-SATA and ES30-SAS have one spare drive. ES30-SATA and ES30-SAS
shelves can be attached to the same head unit, but cannot be combined in the same set.
The ES30-60 can accommodate 15 drives of 4 TB each and supports the DD4500, DD7200, and
DD9500.
The DS60 (Dense Storage) shelf supports 3 TB and 4 TB SAS drives in 15-drive increments, up
to 60 drives per shelf. The DS60 is available for the DD4200, DD4500, DD7200, and DD9500
systems.
This lesson covers the hardware features common to all Data Domain appliances.
All Data Domain systems are based on the same basic hardware architecture.
Hardware features common to all models include:
• Rack mountable in four-post racks
• Hot-swappable disks, with redundant hot-swappable fans and power modules
• DIMMs (Dual In-line Memory Modules) for RAM (Random Access Memory)
• A battery-backed NVRAM (Non-Volatile RAM) card or PRAM (Persistent RAM)
• Video, keyboard, and mouse ports for connecting a monitor, keyboard, and mouse
• Front panel LEDs (Light Emitting Diodes) that provide system status indicators
Most Data Domain systems support the addition of one or more storage expansion shelves
to increase capacity.
Documents for each hardware model are published on the EMC support site.
Components under high mechanical or electrical stress, such as spinning drives, fans, and
power supplies, are provided in an N+1 redundant configuration. N+1 redundancy is a
system configuration in which certain components have at least one independent backup
component, so the system continues to function if a part fails. This allows uninterrupted
operation at full capacity if one component fails. For data, RAID 6 (Redundant Array of
Independent Disks) technology protects data integrity when up to two disks fail.
Connectivity features include keyboard, mouse, and monitor connections, as well as serial
and Ethernet ports. Many systems also support Fibre Channel connectivity.
Many Data Domain models provide keyboard and PS/2 mouse ports for connecting directly to
the unit with a keyboard and monitor. Check with the onsite administrator for the preferred
access method. For repairs in the field, access to the command line interface to shut down,
restart, and run diagnostics is usually through the serial port.
All Data Domain systems may be connected to Ethernet networks for TCP/IP-based data
transfer and system management. All models have a minimum of two built-in ports. Some
models may be configured with additional ports by adding optional Ethernet expansion
cards. Newer systems also include a dedicated Ethernet port for what is known as lights-out
management or remote system management. Interface cards are usually added to provide
additional network capacity.
This module covered the Data Domain benefits, features, and current hardware models.
This module provides an overview of Data Domain architecture and technologies.
This lesson covers the data paths used by Data Domain systems.
In environments that rely on Ethernet connectivity, backup and archive media servers send
data from clients to Data Domain systems on the network. A direct connection between a
dedicated port on the backup management server and a dedicated port on the Data Domain
system may also be used.
The data is written to the backup file system on the Data Domain system. Physical
separation of the replication traffic from backup traffic can be achieved by using two
separate Ethernet interfaces on the Data Domain system. This allows backups and
replication to run simultaneously without network conflicts.
Data Domain systems support the following protocols over Ethernet connections:
• NFS
• CIFS
• DD Boost
• NDMP
• Telnet/SSH (for system administration purposes only)
• FTP/SFTP (for system administration purposes only)
• HTTP/HTTPS (for system administration purposes only)
If a supported FC HBA is installed on the Data Domain system, the system can be
connected to a Fibre Channel storage area network (SAN) and use VTL and DD Boost
for backup operations.
If the Data Domain virtual tape library (VTL) option is licensed, the backup or archive server
sees the Data Domain system as one or multiple VTLs.
If the Data Domain Boost (DD Boost) option is licensed, any supported backup application
will be able to perform backup and restore operations using the DD Boost protocol over the
Fibre Channel connection. Refer to the Data Domain Boost Compatibility Guide and Data
Domain Boost Administrator Guide (available on the EMC support portal) for backup
applications that support DD Boost over Fibre Channel.
This lesson covers the Data Domain file structures.
Data Domain system administrative files are stored in /ddvar. This directory stores system
core and log files, generated support upload bundles, compressed core files, and upgrade
packages.
The /ddvar folder keeps administrative files separate from storage files.
You cannot rename or delete /ddvar, nor can you access all of its subdirectories. However,
files stored in /ddvar can be deleted and retrieved.
The MTree (Managed Tree) file structure is the destination for user data. It is also the root
directory for user data. You can configure your backup application to separate and organize
backup files using the MTree file structure.
MTrees provide more granular space management and reporting. This allows for finer
management of several features including replication, snapshots, quotas, and retention
lock. These operations can be performed on a specific MTree rather than on the entire file
system.
This lesson covers general deduplication methods and Data Domain deduplication.
In file-based deduplication, only the original instance of a file is stored. Future identical
copies of the file use a small reference to point to the original file content. File-based
deduplication is also called single-instance storage (SIS).
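The idea of single-instance storage can be sketched in a few lines. This is a toy in-memory
model, not how any product implements it: the class and method names are hypothetical, and
SHA-256 is assumed as the way to recognize identical files.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level deduplication: identical files are stored only once."""

    def __init__(self):
        self.contents = {}  # file hash -> file bytes (one copy per unique file)
        self.names = {}     # file name -> file hash (the small reference)

    def save(self, name: str, data: bytes) -> bool:
        """Store a file; return True only if its content was not seen before."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.contents
        if is_new:
            self.contents[digest] = data
        self.names[name] = digest
        return is_new

    def load(self, name: str) -> bytes:
        return self.contents[self.names[name]]


store = SingleInstanceStore()
store.save("report.doc", b"quarterly report")
store.save("report-copy.doc", b"quarterly report")     # identical: only a reference
store.save("report-v2.doc", b"quarterly report, v2")   # any change stores a whole new file
print(len(store.contents), "unique files stored for", len(store.names), "names")
```

Note the limitation the last call illustrates: a small change to a file produces a different
hash, so the entire file is stored again.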
Variable-length segment deduplication evaluates data by examining its contents to look for
the boundary from one segment to the next. Variable-length segments are any number of
bytes within a range determined by the particular algorithm implemented.
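A minimal sketch of content-defined segmentation follows. It is not the Data Domain
algorithm, which this guide does not describe; it uses a simple rolling "gear" hash as a
stand-in, and the function name, table seed, and size limits are all assumptions made for
illustration. A boundary is declared wherever the hash of the most recent bytes matches a
pattern, so boundaries follow the content rather than fixed offsets.

```python
import random

# Hypothetical per-byte random table for the rolling hash (fixed seed for repeatability).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def split_segments(data: bytes, min_size: int = 64, avg_bits: int = 8,
                   max_size: int = 1024) -> list:
    """Split data into variable-length segments of min_size..max_size bytes."""
    segments = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Rolling hash: older bytes are shifted out after 32 steps.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_pattern = (h >> (32 - avg_bits)) == 0  # top bits all zero
        if (length >= min_size and at_pattern) or length >= max_size:
            segments.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        segments.append(data[start:])  # trailing partial segment
    return segments


sample = bytes(random.Random(1).getrandbits(8) for _ in range(5000))
parts = split_segments(sample)
print(len(parts), "segments; sizes range from",
      min(len(p) for p in parts), "to", max(len(p) for p in parts))
```

The minimum and maximum sizes correspond to the "range determined by the particular
algorithm" mentioned above.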
With post-process deduplication, files are written to disk first and then scanned and
compressed.
With Data Domain inline deduplication, variable-length segments are examined as soon as
they arrive in the system to determine if they are a new segment or a duplicate of a
segment previously stored. Data Domain deduplication occurs in RAM, before the data is
written to disk. Around 99% of data segments are analyzed in RAM without disk access.
Because this analysis happens in RAM, Data Domain inline deduplication reduces the disk
seeks needed to determine whether new data must be stored. Writes from RAM to disk are done
in full-stripe batches to use the disk more efficiently, reducing disk access.
This lesson covers the Stream-Informed Segment Layout (SISL) scaling architecture.
SISL is short for Stream-Informed Segment Layout.
SISL is used to implement EMC Data Domain inline deduplication. SISL uses fingerprints
and RAM to identify segments already on disk.
SISL architecture provides fast and efficient deduplication by avoiding excessive disk reads
to check if a segment is on disk:
• 99% of duplicate data segments are identified inline in RAM before they are stored to
disk.
• Scales with Data Domain systems using newer and faster CPUs and RAM.
• Increases the throughput rate for processing new data.
SISL does the following:
1. Segment: The data is split into variable-length segments.
2. Fingerprint: Each segment is given a fingerprint, or hash, for identification.
3. Filter: The summary vector and segment locality techniques identify 99% of the
duplicate segments in RAM (inline) before storing to disk. If a segment is a
duplicate, it is referenced and discarded. If a segment is new, the data moves on to
step 4.
4. Compress: New segments are grouped and compressed using common algorithms:
lz, gz, gzfast, or off/no compression (lz by default).
5. Write: Writes data (segments, fingerprints, metadata, and logs) to containers stored
on disk.
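Steps 2 through 5 above can be sketched as a toy in-memory pipeline. This is an
illustration under stated assumptions, not the DD OS implementation: the summary vector is
modeled as a small Bloom filter, SHA-256 stands in for the fingerprint function, a Python
dictionary stands in for the on-disk fingerprint index, zlib stands in for the compression
step, and segment locality is omitted. All class and attribute names are hypothetical, and
the input is assumed to be already split into segments (step 1).

```python
import hashlib
import zlib


class SummaryVector:
    """Tiny Bloom filter: it can report false positives but never false negatives."""

    def __init__(self, bits: int = 1 << 16, hashes: int = 3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, fp: bytes):
        # Derive bit positions from successive 4-byte slices of the fingerprint.
        for k in range(self.hashes):
            yield int.from_bytes(fp[4 * k:4 * k + 4], "big") % self.bits

    def add(self, fp: bytes) -> None:
        for p in self._positions(fp):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, fp: bytes) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(fp))


class ToyStore:
    def __init__(self):
        self.summary = SummaryVector()
        self.index = {}       # fingerprint -> container number (stand-in for disk index)
        self.containers = []  # each entry: one compressed batch of new segments

    def write(self, segments):
        """Return (recipe of fingerprints, number of new segments stored)."""
        recipe, new = [], []
        for seg in segments:
            fp = hashlib.sha256(seg).digest()           # step 2: fingerprint
            recipe.append(fp)
            # Step 3: filter. Only consult the index if the summary says "maybe".
            if self.summary.might_contain(fp) and fp in self.index:
                continue                                # duplicate: keep reference only
            self.summary.add(fp)
            self.index[fp] = len(self.containers)
            new.append(seg)
        if new:
            # Steps 4 and 5: compress the batch of new segments and write a container.
            self.containers.append(zlib.compress(b"".join(new)))
        return recipe, len(new)


store = ToyStore()
_, stored = store.write([b"alpha", b"beta", b"alpha"])
print(stored, "new segments in first backup")
_, stored = store.write([b"beta", b"gamma"])
print(stored, "new segment in second backup")
```

The point of the filter is that a "definitely not present" answer from the summary vector
avoids an index lookup entirely, which is where the disk reads would otherwise occur.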
This lesson covers the Data Domain Data Invulnerability Architecture (DIA).
Data Invulnerability Architecture (DIA) is an important EMC Data Domain technology that
provides safe and reliable storage.
Data Domain systems are designed as the storage of last resort – built to ensure you can
reliably recover your data with confidence. Its elements comprise an architectural design
which provides data invulnerability.
The four technologies within DIA that protect against data loss are:
• Inline data verification
• Fault avoidance and containment
• Continuous fault detection and healing
• File system recoverability
DIA provides data integrity and recoverability along with extremely resilient, protective
disk storage. This keeps data safe.
As stated previously, DIA uses four technologies to fight data loss.
The inline data verification check verifies all file system data and metadata. The end-to-end
verification flow is as follows:
1. Receives the write request from the backup software
2. Analyzes data for redundancy
3. Stores new data segments
4. Stores fingerprints
5. Verifies that DD OS can read the data from disk
6. Verifies that the checksum that is read back matches the checksum written to disk.
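The read-back check in steps 5 and 6 can be illustrated with ordinary file I/O. This is a
simplified sketch, not DD OS code: the function name is hypothetical, SHA-256 is assumed as
the checksum, and on a general-purpose operating system the read may be served from cache
rather than from the physical disk.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str, data: bytes) -> str:
    """Write data, read it back, and confirm the checksums match."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    with open(path, "rb") as f:
        read_back = f.read()  # step 5: confirm the data can be read
    actual = hashlib.sha256(read_back).hexdigest()
    if actual != expected:    # step 6: compare checksums
        raise IOError("read-back checksum does not match the written checksum")
    return expected


with tempfile.TemporaryDirectory() as tmp:
    checksum = write_and_verify(os.path.join(tmp, "segment.bin"), b"backup data")
    print("verified, checksum starts with", checksum[:12])
```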
In addition to end-to-end verification, Data Domain systems are equipped with a specialized
log-structured file system in which new data never overwrites existing data. Traditional
file systems often overwrite blocks when data changes, reusing the old block
address. The Data Domain file system writes only to new blocks. This eliminates the
chance that an incorrect overwrite, such as one caused by a software bug, damages the
latest backup data. Older versions also remain safe.
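The write-only-to-new-blocks behavior can be modeled with an append-only list. This is a
conceptual sketch with hypothetical names, not the Data Domain file system: an update
appends a new block and records a new version, so earlier blocks are never modified.

```python
class AppendOnlyStore:
    """Toy log-structured store: blocks are appended, never overwritten."""

    def __init__(self):
        self.blocks = []    # the log; existing entries are never changed
        self.versions = {}  # name -> list of block indexes, oldest first

    def write(self, name: str, data: bytes) -> int:
        self.blocks.append(bytes(data))
        index = len(self.blocks) - 1
        self.versions.setdefault(name, []).append(index)
        return index

    def read(self, name: str, version: int = -1) -> bytes:
        """Read a version of a file; -1 (the default) is the latest."""
        return self.blocks[self.versions[name][version]]


store = AppendOnlyStore()
store.write("db.bak", b"monday backup")
store.write("db.bak", b"tuesday backup")  # lands in a new block
print(store.read("db.bak"))      # latest version
print(store.read("db.bak", 0))   # the older version is still intact
```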
RAID 6 redundancy enables continuous fault detection and healing, providing an extra level
of protection within the Data Domain operating system. The DD OS continuously detects
faults and recovers from them. Continuous fault detection and healing ensures successful
data restore operations.
Lastly, the DIA file system recovery reconstructs lost or corrupted file system metadata. It
includes file system check tools so if a Data Domain system does have a problem, DIA file
system recovery ensures that the system is brought back online quickly.
This module covered an overview of Data Domain architecture and technologies.
This module focuses on the features and benefits of the Data Domain Operating System
(DD OS).
This lesson covers the features, benefits, and ecosystem of the DD Boost protocol.
DD Boost is a private data transfer protocol that is more efficient than CIFS or NFS, and it
provides options that increase efficiency further.
Data Domain Boost is a software option, supported across the entire Data Domain family,
that distributes parts of the deduplication process out of the Data Domain system and onto
the backup or application server, enabling client-side deduplication. This can speed backups
by up to 50% and enables more efficient resource utilization, including reducing the
impact on the server by 20 to 40%. It also reduces the impact on the network by 80 to
99%.
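The general idea behind client-side deduplication can be sketched as follows. This is not
the DD Boost protocol, which is private and not described in this guide; it is a generic
model in which the client fingerprints its segments, asks the server which fingerprints it
lacks, and sends only those segments. All names are hypothetical, and SHA-256 is assumed as
the fingerprint.

```python
import hashlib


class ToyDedupServer:
    """Stand-in for the storage system: keeps one copy of each segment."""

    def __init__(self):
        self.store = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        return [fp for fp in fingerprints if fp not in self.store]

    def put(self, fp: str, segment: bytes) -> None:
        self.store[fp] = segment


def client_backup(server: ToyDedupServer, segments) -> int:
    """Send only segments the server lacks; return the payload bytes sent."""
    fingerprints = [hashlib.sha256(s).hexdigest() for s in segments]
    needed = set(server.missing(fingerprints))
    sent = 0
    for fp, seg in zip(fingerprints, segments):
        if fp in needed:
            server.put(fp, seg)
            needed.discard(fp)  # do not resend a repeat within the same backup
            sent += len(seg)
    return sent


server = ToyDedupServer()
print(client_backup(server, [b"aaaa", b"bbbb", b"aaaa"]), "bytes sent in first backup")
print(client_backup(server, [b"aaaa", b"cccc"]), "bytes sent in second backup")
```

Because unchanged segments are never transmitted, the network carries mostly fingerprints
plus the new data, which is the source of the network savings described above.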
In addition, DD Boost for backup applications enables the application to control the Data
Domain replication process with full catalog awareness of both the local and remote copies
of the backup.
DD Boost for Enterprise Applications provides application owners control and visibility of
their own backups to Data Domain systems using their native utilities.
EMC Avamar and NetWorker support DD Boost over LAN, SAN, and WAN. Other leading
backup and enterprise applications support DD Boost over LAN and/or SAN.
This lesson covers the features and benefits of the Data Domain VTL.
Data Domain Virtual Tape Library software eliminates the challenges of physical tape and
can emulate up to 60 or more virtual tape libraries with up to 1080 virtual tape drives, and
unlimited tape cartridges.
EMC has qualified Data Domain Virtual Tape Library with leading open systems and IBM i
enterprise backup applications. It integrates non-disruptively into existing Fibre Channel
storage area network (SAN) backup environments.
Any Data Domain system running VTL can also run other backup operations using NAS,
NDMP, and DD Boost simultaneously.
Using EMC Data Domain Replicator software, organizations can vault virtual tape cartridges
over a wide area network (WAN) to another site for disaster recovery, remote office backup
and recovery, or multisite tape consolidation.
Disk-based network storage provides a shorter RTO by eliminating the need for handling,
loading, and accessing tapes from a remote location.
This lesson covers Data Replication features, benefits and types.
EMC Data Domain Replicator (DD Replicator) provides automated, policy-based, network
efficient, and encrypted replication for DR (disaster recovery) and multi-site backup and
archive consolidation. DD Replicator asynchronously replicates only compressed,
deduplicated data over a WAN (wide area network), which eliminates up to 99 percent of
the bandwidth required compared to standard replication methods.
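To put the bandwidth figure in perspective, the sketch below estimates transfer time over a
WAN link with and without a given reduction. The function name, the 1 TB data size, and the
100 Mb/s link speed are hypothetical values chosen for the example, and protocol overhead is
ignored.

```python
def transfer_seconds(size_bytes: int, link_bits_per_sec: int,
                     reduction_pct: int = 0) -> float:
    """Seconds to send size_bytes after removing reduction_pct percent of it."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range 0-99")
    remaining_bytes = size_bytes * (100 - reduction_pct) / 100
    return remaining_bytes * 8 / link_bits_per_sec


one_tb = 10**12          # 1 TB, decimal
link = 100_000_000       # 100 Mb/s WAN link (assumed)
full = transfer_seconds(one_tb, link)
reduced = transfer_seconds(one_tb, link, reduction_pct=99)
print(f"without reduction: {full / 3600:.1f} hours")
print(f"with 99% reduction: {reduced / 60:.1f} minutes")
```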
When replicating over untrusted networks, Data Domain Replicator can encrypt sensitive
data. This encryption can be enabled on all or only a selected portion of the replicated data
set.
For fast time-to-DR readiness, Data Domain Replicator provides logical throughput
performance up to 52 TB per hour over a 10 Gb network in replication deployments where
one Data Domain system is mirroring its data to another.
You can also consolidate data from up to 270 remote sites by simultaneously replicating
data to a single, large Data Domain system at a central hub.
Data Domain Replicator offers flexibility by providing multiple replication topologies such as
full-system mirroring, bidirectional, many-to-one, one-to-many, and cascaded. In addition,
you can replicate either all or a subset of the data on the Data Domain system. For the
highest level of security, DD Replicator can encrypt data being replicated between DD
systems using the standard SSL (Secure Socket Layer) protocol.
To manage network utilization, you can set up a schedule to throttle Data Domain
Replicator WAN utilization at different times of the day.
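The idea of a time-of-day throttle schedule can be sketched in Python as follows. This is a conceptual model only, not DD OS syntax: the schedule format, the rate values, and the function name are all hypothetical.

```python
from datetime import time

# Hypothetical schedule: (start time, rate limit in Mbps). None means unlimited.
SCHEDULE = [
    (time(0, 0), None),    # overnight: no throttle
    (time(8, 0), 50),      # business hours: leave most of the WAN to users
    (time(18, 0), 200),    # evening: allow more replication traffic
]


def rate_limit_at(now, schedule):
    """Return the limit of the most recent schedule entry at or before `now`."""
    entries = sorted(schedule, key=lambda entry: entry[0])
    # Before the first entry of the day, yesterday's last entry still applies.
    current = entries[-1][1]
    for start, limit in entries:
        if start <= now:
            current = limit
    return current


if __name__ == "__main__":
    print(rate_limit_at(time(9, 30), SCHEDULE))
    print(rate_limit_at(time(23, 0), SCHEDULE))
```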
Replication is set up with a source Data Domain system and one or more destination Data
Domain systems. There are five replication types: Collection, Directory, MTree, Pool, and
Managed.
• Collection
– Duplicates the entire data store on the source and transfers it to the
destination. The replicated volume is read-only.
• Directory
– Provides replication at the level of individual directories.
• MTree
– Replicates entire MTrees (that is, a virtual file structure that enables advanced
management).
• Pool
– Pool replication is similar to directory replication, but the source is VTL data.
• Managed
– Used with Data Domain Boost and is managed and controlled by the backup
software.
Data Domain supports several replication topologies, in which data flows from a source to a
destination over a LAN or WAN. Directory replication can be configured in the following
ways. One-to-one replication is the simplest type: data is replicated from a source Data
Domain system to a destination Data Domain system.
In a bidirectional replication pair, each system is both a source and a destination. Data
from a source directory on the first system is replicated to a destination directory on the
second system, and data from a source directory on the second system is replicated to a
destination directory on the first system.
This lesson covers Data Domain Extended Retention features and benefits.
EMC Data Domain Extended Retention provides long-term retention of backup data and
eliminates tape infrastructure for backup retention. This software is supported on the
DD860, DD990, DD4200, DD4500, DD7200, and DD9500 systems.
EMC Data Domain Extended Retention provides an internal tiering approach that enables
cost-effective, long-term retention of backup data on an EMC Data Domain system. With
this, customers can leverage Data Domain systems for long-term backup retention and
minimize reliance on tape. Data Domain Extended Retention transparently incorporates two
tiers of storage on a Data Domain system to achieve cost-effective scalability while
delivering the throughput required to ingest hundreds of terabytes of backup data. This
combination makes Data Domain systems the ideal tape elimination solution for long-term
backup retention.
Data Domain Extended Retention provides transparent separation of short-term and
long-term backup data by storing it in different tiers on Data Domain systems. Data is initially
stored in the active tier for backup and operational recovery, then moved to an extremely
scalable retention tier that is optimized for long-term data retention—usually measured in
years.
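The two-tier placement described above can be illustrated with a small Python sketch. It assumes that data moves to the retention tier once it passes an age threshold. That threshold, the tier labels, and the function name are assumptions made for illustration; this guide does not say what triggers the move.

```python
from datetime import date, timedelta


def choose_tier(last_modified, today, age_threshold_days):
    """Return 'retention' for data older than the threshold, else 'active'."""
    age = today - last_modified
    if age > timedelta(days=age_threshold_days):
        return "retention"
    return "active"


if __name__ == "__main__":
    today = date(2016, 6, 1)
    print(choose_tier(date(2016, 5, 25), today, age_threshold_days=30))
    print(choose_tier(date(2015, 1, 10), today, age_threshold_days=30))
```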
It ensures long-term data access and recoverability with fault isolation so that in the event
of a failure or catastrophe the system continues to operate with all unaffected components.
Data Domain Extended Retention enables granular unit-to-unit replication for disaster
recovery. In the event of a connectivity issue affecting the replication process, the Data
Domain system only needs to replicate the impacted unit to resynchronize.
This lesson covers Data Domain encryption and data sanitization.
EMC Data Domain systems offer two types of encryption:
• Encryption of data at rest
• Encryption of data in flight
Encryption of data at rest protects user data if the Data Domain system is lost or stolen. It
also eliminates accidental exposure if a failed drive needs replacement. When the file
system is intentionally locked, an intruder who circumvents the network security controls
and gains access to the Data Domain system will be unable to read the file system without
the proper administrative control, passphrase, and cryptographic key.
By default, the Data Domain Encryption software option encrypts all data on the system using
an internally-generated encryption key. This encryption key is static and cannot be changed
by the user.
In addition to the features above, it also provides inline encryption: as the data is
being ingested, the data stream is deduplicated, compressed, and encrypted using an
encryption key before being written to the RAID group.
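The order of the inline steps (deduplicate, compress, encrypt, then write) can be sketched in Python as below. This is a toy model under several assumptions, not the DD OS implementation: it uses fixed-size segments, SHA-256 fingerprints, and zlib compression, and it takes the encryption step as a caller-supplied function because the Python standard library has no AES cipher. The name `ingest` and the dictionary standing in for the disk are hypothetical.

```python
import hashlib
import zlib


def ingest(stream, store, encrypt, segment_size=4096):
    """Store a byte stream inline and return its list of segment fingerprints.

    `store` maps fingerprint -> stored bytes and stands in for the disk.
    `encrypt` is any bytes -> bytes function supplied by the caller.
    """
    recipe = []
    for offset in range(0, len(stream), segment_size):
        segment = stream[offset:offset + segment_size]
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in store:                    # 1. deduplicate
            compressed = zlib.compress(segment)         # 2. compress
            store[fingerprint] = encrypt(compressed)    # 3. encrypt, then write
        recipe.append(fingerprint)
    return recipe


if __name__ == "__main__":
    disk = {}
    # Byte reversal is a placeholder for a real cipher. It is NOT encryption.
    placeholder_cipher = lambda data: data[::-1]
    data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
    recipe = ingest(data, disk, placeholder_cipher)
    print(len(recipe), "segments referenced,", len(disk), "segments stored")
```

Because the encryption happens only for segments that survive deduplication, a repeated segment costs neither a second compression nor a second encryption pass.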
Encryption of data in flight encrypts data being transferred via DD Replicator software. It
uses OpenSSL AES 256-bit encryption to encapsulate the replicated data over the wire. The
encryption encapsulation layer is immediately removed as soon as it lands on the
destination Data Domain system. Data within the payload can also be encrypted via Data
Domain encryption software.
Data sanitization, also referred to as electronic shredding, is performed when classified
or sensitive data is written to any system that is not approved to store such data. The Data
Domain sanitization command exists to enable the administrator to delete files at the logical
level, whether a backup set or individual files.
Data Domain's sanitization approach complies with DoD (Department of Defense)
and NIST (National Institute of Standards and Technology) procedures. Normal file deletion
leaves residual data that allows recovery. Sanitization removes all traces of deleted
files, leaving no residual data.
System sanitization was designed to remove all traces of deleted files and restore the
system to the state prior to the file's existence. The primary use of the sanitize command
is to resolve Classified Message Incidents (CMIs) that occur when classified data is copied
inadvertently onto a non-secure system. System sanitization is typically required in
government installations.
This lesson covers Data Domain Retention Lock Governance and Compliance Editions.
EMC Data Domain Retention Lock enables IT organizations to efficiently store and manage
different types of governance and compliance archive data on a single EMC Data Domain
system.
Data Domain Retention Lock Compliance edition meets the strictest retention requirements
of regulatory standards such as SEC 17a-4(f) for electronic records including file, email, and
content. Data Domain Retention Lock Compliance edition ensures that files on the Data
Domain system that are locked by an archive application for a specified retention period
cannot be deleted or overwritten under any circumstances until the retention period
expires.
In addition, with Data Domain Retention Lock Compliance edition, litigation hold enables
customers to extend retention policies to protect compliant archive data during legal
discovery.
Data Domain Retention Lock also enables secure file locking of archive data at an individual
file level, so that locked files can be intermixed with unlocked files on the same Data Domain
system.
Data Domain Retention Lock leverages industry-standard protocols such as Network File
System (NFS) and Common Internet File System (CIFS) for time-based retention of files. As
a result, it can be integrated seamlessly with industry-leading archive applications including
EMC SourceOne and Symantec Enterprise Vault, providing customers with an end-to-end
archiving solution.
DD Retention Lock Governance edition allows customers to maintain the integrity of the
archive data with the assumption that the system administrator is generally trusted with all
legal actions performed on the Data Domain system.
Locked files cannot be modified on the Data Domain system even after the retention period
for the file expires. Archive data that is retained on the Data Domain system is not deleted
automatically when the retention period expires; an archiving application must delete the
file.
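The locking rules in the paragraph above can be modeled with a short Python sketch. The class and method names are hypothetical, and the model covers only the three behaviors stated here: a locked file can never be modified, it can be deleted only after its retention period expires, and nothing deletes it automatically.

```python
from dataclasses import dataclass
from datetime import datetime


class RetentionLockError(Exception):
    """Raised when an operation violates the retention lock rules."""


@dataclass
class LockedFile:
    path: str
    retain_until: datetime

    def modify(self, now):
        # Locked files stay read-only, even after the retention period expires.
        raise RetentionLockError(f"{self.path} is locked and cannot be modified")

    def delete(self, now):
        # Deletion must be requested explicitly (for example, by the archiving
        # application) and is refused until the retention period has expired.
        if now < self.retain_until:
            raise RetentionLockError(f"{self.path} is retained until {self.retain_until}")
        return True


if __name__ == "__main__":
    record = LockedFile("/archive/mail-2016.pst", datetime(2023, 1, 1))
    try:
        record.delete(datetime(2020, 6, 1))
    except RetentionLockError as err:
        print("refused:", err)
    print("deleted after expiry:", record.delete(datetime(2023, 6, 1)))
```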
With DD Retention Lock Governance edition, IT administrators can meet secure data
retention requirements while keeping the ability to update the retention period should the
corporate governance policies change.
The DD Retention Lock Compliance edition meets the strict requirements of regulatory
standards for electronic records, such as SEC 17a-4(f), and other standards that are
practiced worldwide.
DD Retention Lock Compliance, when enabled on an MTree, ensures that all files locked by
an archiving application, for a time-based retention period, cannot be deleted or overwritten
under any circumstances until the retention period expires. This is achieved using multiple
hardening procedures:
• Requiring dual sign-on for certain administrative actions. Before engaging DD
Retention Lock Compliance edition, the System Administrator must create a Security
Officer role. The System Administrator can create the first Security Officer, but only
the Security Officer can create other Security Officers on the system.
• Some of the actions requiring dual sign-on are:
– Extending the retention periods for an MTree
– Renaming the MTree
– Deleting the Retention Lock Compliance license from the Data Domain system
• Securing the system clock from illegal updates. DD Retention Lock Compliance implements
an internal security clock to prevent
malicious tampering with the system clock. The security clock closely monitors and
records the system clock. If there is an accumulated two-week skew within a year
between the security clock and the system clock, the Data Domain file system (DDFS)
is disabled and can be resumed only by a security officer.
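The security clock rule can be illustrated with the following Python sketch. It assumes that "accumulated skew" means the sum of the absolute differences observed within a trailing 365-day window. That interpretation, the class name, and the role string are assumptions made for illustration; this guide does not describe how DD OS computes the skew.

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(weeks=2)
WINDOW = timedelta(days=365)


class SecurityClockMonitor:
    def __init__(self):
        self.observations = []          # (when, absolute skew) pairs
        self.filesystem_enabled = True

    def record_skew(self, when, skew):
        """Record a skew observation and disable the file system at the limit."""
        self.observations.append((when, abs(skew)))
        cutoff = when - WINDOW
        total = sum((s for t, s in self.observations if t > cutoff), timedelta())
        if total >= MAX_SKEW:
            self.filesystem_enabled = False
        return total

    def resume(self, role):
        """Only a security officer may resume the file system."""
        if role != "security-officer":
            raise PermissionError("only a security officer can resume the file system")
        self.filesystem_enabled = True


if __name__ == "__main__":
    monitor = SecurityClockMonitor()
    monitor.record_skew(datetime(2016, 1, 1), timedelta(days=10))
    monitor.record_skew(datetime(2016, 6, 1), timedelta(days=-5))
    print("file system enabled:", monitor.filesystem_enabled)
```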
This lesson covers Data Domain Secure Multi-Tenancy.
The secure multi-tenancy for Data Domain feature allows enterprises and service providers
to deliver data protection-as-a-service. Specifically this feature:
• Enables enterprises to deploy Data Domain systems in a private cloud
• Enables service providers to deploy Data Domain systems in a hybrid/public cloud
• Allows for different cloud models for protection storage
– Local Backup, Backup-as-a-Service (BaaS) for hosted applications
– Replicated Backup, Disaster Recovery-as-a-Service (DRaaS)
– Remote Backup, BaaS over WAN
Secure multi-tenancy for Data Domain systems is a feature that enables secure isolation of
many users and workloads on a shared system. As a result, the activities of one tenant are
not visible or apparent to other tenants. This capability improves cost efficiencies through a
shared infrastructure while providing each tenant with the same visibility, isolation, and
control that they would have with their own stand-alone Data Domain system.
A tenant may be one or more business units or departments hosted onsite for an enterprise
or “large enterprise” (LE). A common example would be Finance and Human Resources
sharing the same Data Domain system. Each department would be unaware of the presence
of the other. A tenant may also be one or more external applications that are hosted
remotely by a service provider (SP) on behalf of a client.
Secure multi-tenancy (SMT) components, also known as management objects, provide security and isolation within
a shared infrastructure. SMT components are initially created by the admin during the basic
provisioning sequence, but can also be created manually as needed.
In SMT terms, the landlord is the storage admin or the Data Domain Administrator. The
landlord is responsible for managing the Data Domain system. The landlord sets up the file
systems, storage, networking, replication and protocols. They are also responsible for
monitoring overall system health and replacing any failed hardware as necessary.
A tenant is responsible for scheduling and running the backup application for the tenant
customer and for managing their own tenant-units including configuring backup protocols
and monitoring resources and stats within their tenant-unit.
Tenant-units are logical containers for MTrees. They also contain important information,
such as users, notification groups, and other configuration elements. Tenant-units cannot
be viewed or detected by other tenants, which ensures security and isolation of the control
path when running multiple tenants simultaneously on the shared infrastructure.
In this simple example, two companies, Red and Blue, share the same Data Domain
system. Tenant units and individual data paths are logically and securely isolated from each
other and are managed independently. Tenant users can back up using their application
servers to Data Domain storage in secure isolation from other tenants on the Data Domain
system.
Tenant administrators can perform self-service fast copy operations within their tenant
units for data restores as needed. Tenant administrators are able to monitor data capacity
and associated alerts for capacity and stream use.
The landlord, responsible for the Data Domain system, monitors and manages all tenants in
the system and has visibility across the entire system. They set capacity and stream quotas
on the system for the different tenant units, and report on tenant unit data.
Logical Data isolation allows providers to spread the capital expenditure and operational
expenditure of a protection storage infrastructure across multiple tenants. Data isolation is
achieved by using separate DD Boost users for different MTrees or by using the access
mechanisms of NFS, CIFS, and VTL.
The DD Boost protocol allows creation of multiple DD Boost users on a Data Domain
system. With that, each tenant can be assigned one or more DD Boost user credentials that
can be assigned access privileges to one or more MTrees in a tenant unit defined for a
particular tenant. This allows secure access to different tenant datasets using their separate
DD Boost credentials by restricting access and visibility.
Similarly, for other protocols such as CIFS, NFS, and VTL, the native protocol-level access
control mechanisms can be used to provide isolation.
Metering and reporting enable a provider to ensure that it is running a sustainable business
model. The need for such reporting is even greater in a multi-tenant environment, where the
provider must track usage on a shared asset such as a Data Domain system.
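The DD Boost isolation model described in this section can be sketched as a small Python data structure. The class, its fields, and the sample names are hypothetical. The sketch shows only the idea that each DD Boost user is granted access to specific MTrees inside one tenant unit and sees nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class TenantUnit:
    name: str
    mtrees: set = field(default_factory=set)
    # DD Boost user name -> set of MTrees that user may access
    boost_users: dict = field(default_factory=dict)

    def grant(self, user, mtree):
        """Give a DD Boost user access to an MTree owned by this tenant unit."""
        if mtree not in self.mtrees:
            raise ValueError(f"{mtree} does not belong to tenant unit {self.name}")
        self.boost_users.setdefault(user, set()).add(mtree)

    def can_access(self, user, mtree):
        return mtree in self.boost_users.get(user, set())


if __name__ == "__main__":
    red = TenantUnit("red", mtrees={"/data/col1/red-backups"})
    blue = TenantUnit("blue", mtrees={"/data/col1/blue-backups"})
    red.grant("red-boost", "/data/col1/red-backups")
    print(red.can_access("red-boost", "/data/col1/red-backups"))
    print(blue.can_access("red-boost", "/data/col1/blue-backups"))
```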
This module covered the features and benefits of the Data Domain Operating System (DD
OS).
This module focuses on how to access a Data Domain system through the Data Domain
Command Line Interface, Data Domain System Manager, and the Data Domain
Management Center.
This lesson covers the features and benefits of the Data Domain Command Line Interface.
The EMC Data Domain command line interface (CLI) enables you to manage Data Domain
systems.
The initial installation and configuration of the Data Domain Operating System will most
likely be done with direct access to the hardware, either through a serial connection or using
a keyboard and monitor directly attached to the system.
To access the Data Domain system for the first time, use the default administrator username
and password. The default administrator name is sysadmin. The initial password for
the sysadmin user is the system’s serial number.
After the initial configuration, use the SSH or Telnet (if enabled), IPMI, or SOL utilities to
access the system using the CLI remotely.
The DD OS Command Reference Guide provides information for using the commands to
accomplish specific administration tasks. Each command also has an online help page that
gives the complete command syntax. Help pages are available at the CLI using the help
command. Any Data Domain system command that accepts a list (such as a list of IP
addresses) accepts entries separated by commas, by spaces, or both.
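The list rule described above (commas, spaces, or both) can be illustrated with a short Python sketch. The function is hypothetical and only shows how such input can be split. It is not the parser used by DD OS.

```python
import re


def split_cli_list(text):
    """Split a list argument whose entries are separated by commas and/or spaces."""
    return [item for item in re.split(r"[,\s]+", text.strip()) if item]


if __name__ == "__main__":
    print(split_cli_list("10.0.0.1, 10.0.0.2 10.0.0.3,10.0.0.4"))
```

All three separator styles yield the same list of entries, so the user does not have to remember which one a given command expects.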
Data Domain systems running DD OS 5.0 or higher support remote power
management using the Intelligent Platform Management Interface (IPMI), and they support
remote monitoring of the boot sequence using Serial over LAN (SOL).
This lesson covers the features and benefits of the Data Domain System Manager (DDSM)
and the Data Domain Management Center (DDMC).
Data Domain systems are managed using sophisticated tools, including EMC Data Domain
System Manager, a web-based graphical user interface.
The Data Domain System Manager is a browser-based graphical user interface, available
through Ethernet connections, for managing a single system from any location. Data
Domain System Manager provides a single, consolidated management interface that allows
for configuration and monitoring of many system features and system settings. It provides
simple configuration wizards which guide you through a simplified configuration of your
system to get your system operating quickly.
You can access the System Manager from many popular web browsers:
• Microsoft Internet Explorer™
• Google Chrome™
• Mozilla Firefox™
EMC’s Data Domain Management Center is a scalable, virtual appliance framework that
streamlines the management and monitoring of Data Domain systems. Data Domain
Management Center integrates complex workflows into a single interface, eliminating the
overhead of managing devices across large data centers or remote sites. Key features
include health and status resource monitoring, capacity and replication management,
integrated reporting, and permission-, group-, and property-based administration.
The Data Domain Management Center solution is supported across multiple Data Domain
Operating System releases.
Data Domain Management Center is designed for customers with multiple Data Domain
systems who are seeking to aggregate management and reporting from a single interface.
This module covered accessing a Data Domain system using the Data Domain
Command Line Interface, Data Domain System Manager, and the Data Domain
Management Center.
This course covered features, benefits and advantages of using a Data Domain system for
backup operations, the physical architecture of a typical backup environment using Data
Domain systems, and the methods used for administering a Data Domain system.
This concludes the training. Proceed to the course assessment on the next slide.