M01res01-Technology Overview
This module also includes knowledge checks and a lab, which enable you to test your knowledge.
This lesson is an introduction to the EMC Data Domain appliance. The first topic answers the
question: What is a Data Domain system?
EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery. An EMC Data Domain system can also be used for online storage with additional features
and benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity.
Most Data Domain systems have a controller and multiple storage units.
EMC has several hardware offerings to meet the needs of a variety of environments, including:
• Small enterprise data centers and remote offices
• Midsized enterprise data centers
• Enterprise data centers
• Large enterprise data centers
• EMC Data Domain Expansion Shelves
Visit the Data Domain Hardware page on http://www.emc.com/ for specific models and
specifications.
http://www.emc.com > Products and Solutions > Backup and Recovery > EMC Data Domain >
Hardware
The latest Data Domain Operating System (DD OS) has several features and benefits, including:
• Support for leading backup, file archiving, and email archiving applications
• Simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data Domain Boost
• Inline write/read verification, continuous fault detection, and healing
• Conformance with IT governance and regulatory compliance standards for archived data
This lesson covers deduplication, which is an important technology that improves data storage by
providing extremely efficient data backups and archiving. This lesson also covers the different types
of deduplication (inline, post-process, file-based, block-based, fixed-length, and variable-length)
and the advantages of each type. The last topic in this lesson covers Data Domain deduplication
and its advantages.
Deduplication is similar to data compression, but it looks for redundancy of large sequences of
bytes. Sequences of bytes identical to those previously encountered and stored are replaced with
references to the previously encountered data.
This is all hidden from users and applications. When the data is read, the original data is provided
to the application or user.
Deduplication performance depends on the amount of data, the bandwidth, and the disk speed, CPU, and memory of the hosts and devices performing the deduplication.
When processing data, deduplication recognizes data that is identical to previously stored data.
When it encounters such data, deduplication creates a reference to the previously stored data, thus
avoiding storing duplicate data.
Deduplication typically uses hashing algorithms.
Hashing algorithms yield a value based on the content of the data being hashed. This value
is called the hash or fingerprint, and it is much smaller than the original data.
Different data contents yield different hashes (collisions are theoretically possible but
astronomically unlikely with strong algorithms); each hash can be checked against previously
stored hashes.
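To make fingerprinting concrete, here is a minimal sketch in Python (illustrative only, not Data Domain code; the choice of SHA-1 and an in-memory index are assumptions): a segment is stored only when its fingerprint has not been seen before.

```python
import hashlib

fingerprint_index = {}  # fingerprint -> stored segment (a real system maps to a disk location)

def store_segment(segment: bytes) -> str:
    """Store a segment only if its fingerprint is new; return the fingerprint."""
    fp = hashlib.sha1(segment).hexdigest()  # the fingerprint is far smaller than the data
    if fp not in fingerprint_index:
        fingerprint_index[fp] = segment     # new content: store it once
    return fp                               # duplicate content: only a reference is kept

ref1 = store_segment(b"backup data block")
ref2 = store_segment(b"backup data block")   # identical content arrives again
assert ref1 == ref2 and len(fingerprint_index) == 1  # stored only once
```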
In file-based deduplication, only the original instance of a file is stored. Future identical copies of
the file use a small reference to point to the original file content. File-based deduplication is
sometimes called single-instance storage (SIS).
In this example, eight files are being deduplicated. The blue files are identical, but each has its own
copy of the file content. The grey files also have their own copy of identical content. After
deduplication there are still eight files. The blue files point to the same content, which is stored
only once on disk. The same is true for the grey files. If each file is 20 megabytes, file-based
deduplication reduces the storage required from 160 megabytes to 40 megabytes.
File-based deduplication enables storage savings, and it can be combined with compression (a way
to transmit the same amount of data in fewer bits) for additional storage savings. It is popular in
desktop backups, and it can be more effective for data restores because it doesn't need to
reassemble files from segments. It can also be included in backup software, so an organization
doesn't have to depend on a particular vendor's disk appliance.
File-based deduplication results are often not as great as with other types of deduplication (such as
block- and segment-based deduplication). The most important disadvantage is that a modified file
shares no deduplication with previously backed-up versions: even a small change causes the entire
file to be stored again.
File-based deduplication stores an original version of a file and creates a digital signature for it
using a cryptographic hash function (such as SHA-1). Future exact copies of the file are replaced
with references to that signature rather than being stored again.
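A minimal sketch of single-instance storage in Python, following the slide's eight-file example (the file names and the 20-byte contents standing in for 20 MB files are illustrative assumptions): each unique file content is stored once, and every copy keeps only a small reference.

```python
import hashlib

content_store = {}  # signature -> file content, stored only once
file_table = {}     # file name -> signature (the small reference)

def backup_file(name: str, content: bytes) -> None:
    sig = hashlib.sha1(content).hexdigest()   # the file's digital signature
    content_store.setdefault(sig, content)    # store content only on first sight
    file_table[name] = sig                    # later copies are just references

blue = b"B" * 20  # stands in for 20 MB of identical "blue" content
grey = b"G" * 20  # stands in for 20 MB of identical "grey" content
for i in range(4):
    backup_file(f"blue{i}.ppt", blue)
    backup_file(f"grey{i}.doc", grey)

# Eight files, but only two contents stored: 160 MB reduced to 40 MB in the slide's terms.
assert len(file_table) == 8 and len(content_store) == 2
```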
Fixed-length segment deduplication (also called block-based deduplication or fixed-segment
deduplication) is a technology that reduces data storage requirements by comparing incoming data
segments (also called fixed data blocks or data chunks) with previously stored data segments. It
divides data into segments of a single, fixed length (for example, 4 KB, 8 KB, or 12 KB).
Fixed-length segment deduplication reads data and divides it into fixed-size segments. These
segments are compared to other segments already processed and stored. If the segment is
identical to a previous segment, a pointer is used to point to that previous segment.
In this example, the data stream is divided into a fixed length of four units. Small pointers to the
common content are assembled in the correct order to represent the original data. Each unique
data element is stored only once.
For data that is identical (does not change), fixed-length segment deduplication reduces storage
requirements.
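A toy fixed-length deduplicator in Python, using a 4-byte segment size to mirror the slide's four-unit example (real systems use kilobyte-scale segments; this is a sketch, not product code):

```python
import hashlib

SEGMENT_SIZE = 4  # four units, as in the slide's example; real segments are 4 KB-12 KB

def dedup_fixed(data: bytes, store: dict) -> list:
    """Divide data into fixed-size segments; store each unique segment once
    and represent the stream as an ordered list of fingerprints (pointers)."""
    refs = []
    for i in range(0, len(data), SEGMENT_SIZE):
        seg = data[i:i + SEGMENT_SIZE]
        fp = hashlib.sha1(seg).hexdigest()
        store.setdefault(fp, seg)       # identical segments are stored only once
        refs.append(fp)                 # small pointers, kept in the correct order
    return refs

store = {}
refs = dedup_fixed(b"ABCDABCDABCD", store)
assert len(refs) == 3 and len(store) == 1          # one unique segment, referenced 3 times
assert b"".join(store[fp] for fp in refs) == b"ABCDABCDABCD"  # original data reassembles
```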
When data is altered, the segments shift, causing more segments to be stored. For example, when
you add a slide to a Microsoft PowerPoint deck, all subsequent blocks in the file shift and are likely
to be considered different from those in the original file, so the deduplication effect is less
significant. Smaller blocks achieve better deduplication than large ones, but they take more
resources to deduplicate.
In backup applications, the backup stream consists of many files. Backup streams are rarely
entirely identical, even when they are successive backups of the same file system. A single addition,
deletion, or change to any file changes the number of bytes in the new backup stream. Even if no
file has changed, adding a new file to the backup stream shifts the rest of the stream. Fixed-length
segment deduplication therefore stores large numbers of otherwise-unchanged segments, because
the boundaries between the segments have moved.
Many hardware and software deduplication products use fixed-length segments for deduplication.
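The boundary-shift problem is easy to demonstrate with a fixed-length chunker like the toy one above (redefined compactly here so the example stands alone; again an illustration, not Data Domain's implementation): inserting a single byte at the front of the stream shifts every segment boundary, so nothing in the new backup matches the old one.

```python
import hashlib

def dedup_fixed(data: bytes, store: dict, size: int = 4) -> list:
    """Fixed-length chunking, as in the previous sketch."""
    refs = []
    for i in range(0, len(data), size):
        fp = hashlib.sha1(data[i:i + size]).hexdigest()
        store.setdefault(fp, data[i:i + size])
        refs.append(fp)
    return refs

store = {}
old_refs = dedup_fixed(b"ABCDEFGHIJKL", store)           # first backup: 3 segments
new_refs = dedup_fixed(b"X" + b"ABCDEFGHIJKL", store)    # one byte added at the front
assert not set(old_refs) & set(new_refs)   # every boundary shifted: zero segments match
assert len(store) == 7                     # 7 segments stored instead of the ideal 4
```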
Variable-length segment deduplication evaluates data by examining its contents to look for the
boundary from one segment to the next. Variable-length segments are any number of bytes within
a range determined by the particular algorithm implemented.
Unlike fixed-length segment deduplication, variable-length segment deduplication divides the
backup or data stream into segments based on the contents of the data stream itself.
When you apply variable-length segmentation to a data sequence, deduplication divides the
sequence into variable-sized segments based on its contents. In this example, byte A is added to
the beginning of the data. Only one new segment needs to be stored, because the content that
defines the boundaries between the remaining segments was not altered.
Eventually, variable-length segment deduplication finds the segments that have not changed and
backs up fewer segments than fixed-length segment deduplication. Even for storing individual files,
variable-length segments have an advantage: many files are very similar to, but not identical to,
other versions of the same file. Variable-length segmentation isolates the changes, finds more
identical segments, and stores fewer segments than fixed-length deduplication.
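A toy content-defined chunker in Python illustrates why variable-length segmentation resynchronizes after an insertion (the rolling-sum boundary test, window, and modulus here are arbitrary stand-ins for the rolling-hash methods real products use):

```python
def variable_segments(data: bytes, window: int = 3, modulus: int = 5) -> list:
    """Cut a segment wherever a rolling sum of the last `window` bytes
    satisfies a content-defined condition (a toy stand-in for a Rabin hash)."""
    segments, start = [], 0
    for i in range(window, len(data) + 1):
        if sum(data[i - window:i]) % modulus == 0:   # boundary depends only on content
            segments.append(data[start:i])
            start = i
    if start < len(data):
        segments.append(data[start:])                # trailing remainder
    return segments

text = b"the quick brown fox jumps over the lazy dog"
old = variable_segments(text)
new = variable_segments(b"A" + text)   # one byte inserted at the front
# Boundaries resynchronize after the insertion, so most segments still match:
print(len(set(old) & set(new)), "of", len(old), "segments unchanged")
```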
With post-process deduplication, files are written to disk first, then they are scanned and
compressed.
Post-process deduplication should never interfere with the incoming backup data speed.
Post-process deduplication requires more I/O. It writes new data to disk and then reads the new
data before it checks for duplicates. It requires an additional write to delete the duplicate data and
another write to update the hash table. If it can’t determine whether a data segment is duplicate or
new, it requires another write (this happens about 5% of the time). It requires more disk space to:
• Initially capture the data.
• Store multiple pools of data.
• Provide adequate performance by distributing the data over a large number of drives.
Post-process deduplication is run as a separate processing task and could lengthen the time
needed to fully complete the backup.
In post-process deduplication, files are first written to disk in their entirety (buffered to a large
cache); only after the files are written is the disk scanned for duplicates and compressed. In other
words, deduplication happens after the files are written to disk.
With post-process deduplication, a data segment enters the appliance (as part of a larger stream of
data from a backup), and it is written to disk in its entirety. Then a separate process (running
asynchronously and possibly from another appliance accessing the same disk) reads the block of
data to determine if it is a duplicate. If it is a duplicate, it is deleted and replaced with a pointer. If it
is new, it is stored.
With Data Domain inline deduplication, incoming data is examined as soon as it arrives to
determine whether a segment (also called a block or chunk) is new and unique or a duplicate of a
previously stored segment. Inline deduplication occurs in RAM before the data is written to disk.
Around 99% of data segments are analyzed in RAM without disk access.
In some cases, an inline deduplication process will temporarily store a small amount of data on disk
before it is analyzed. A very small amount of data is not identified immediately as either unique or
redundant. That data is stored to disk and examined again later against the previously stored data.
Inline deduplication requires less disk space than post-process deduplication. There is less
administration for an inline deduplication process, as the administrator does not need to define
and monitor the staging space.
Because inline deduplication analyzes the data in RAM, it reduces the disk seeks needed to
determine whether new data must be stored.
When deduplication occurs close to where the data is created, it is often referred to as source-
based deduplication; when it occurs near where the data is stored, it is commonly called
target-based deduplication.
EMC Data Domain Global Compression™ is the EMC Data Domain trademarked name for the
combination of global compression (deduplication) and local compression.
Global compression equals deduplication. It identifies previously stored segments and cannot be
turned off.
Local compression compresses segments before writing them to disk. It uses common, industry-
standard algorithms (for example, lz, gz, and gzfast). The default compression algorithm used by
Data Domain systems is lz.
Local compression is similar to zipping a file to reduce the file size. Zip is a file format used for data
compression and archiving. A zip file contains one or more files that have been compressed, to
reduce file size, or stored as is. The zip file format permits a number of compression algorithms.
Local compression can be turned off.
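Local compression is ordinary lossless compression applied to the segments that survive deduplication. A sketch using Python's zlib, a DEFLATE (gzip-family) implementation, shown here as a stand-in for the lz, gz, and gzfast algorithms the system offers:

```python
import zlib

segment = b"unique backup data " * 100        # a new, non-duplicate segment
compressed = zlib.compress(segment, level=6)  # compress before writing to disk
print(f"{len(segment)} bytes -> {len(compressed)} bytes on disk")
assert zlib.decompress(compressed) == segment  # lossless: the original is fully restored
```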
This lesson covers EMC Data Domain SISL™ Scaling Architecture.
In this lesson, you learn more about SISL architecture, its advantages, and how it works.
SISL architecture provides fast and efficient deduplication:
• 99% of duplicate data segments are identified inline in RAM before they are stored to disk.
• System throughput increases directly as CPU performance increases.
• The disk footprint is reduced by minimizing disk access.
SISL does the following (a toy end-to-end sketch follows the list):
1. Segment
The data is broken into variable-length segments.
2. Fingerprint
Each segment is given a fingerprint, or hash, for identification.
3. Filter
The summary vector and segment locality techniques identify 99% of the duplicate
segments in RAM, inline, before storing to disk. If a segment is a duplicate, it is referenced
and discarded. If a segment is new, the data moves on to step 4.
4. Compress
New segments are grouped and compressed using common algorithms: lz, gz, gzfast (lz by
default).
5. Write
Data (segments, fingerprints, metadata, and logs) is written to containers, and the
containers are written to disk.
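The five steps compose into a pipeline. Below is a toy end-to-end sketch in Python that preserves the order of operations (fixed 8-byte segments, SHA-1 fingerprints, and zlib stand in for the real variable-length segmenter, the summary-vector filter, and lz compression; none of this is Data Domain's actual implementation):

```python
import hashlib, zlib

def sisl_write(stream: bytes, index: set, container: list) -> None:
    """Toy pipeline: segment -> fingerprint -> filter -> compress -> write."""
    for i in range(0, len(stream), 8):               # 1. Segment (fixed-size here)
        seg = stream[i:i + 8]
        fp = hashlib.sha1(seg).hexdigest()           # 2. Fingerprint
        if fp in index:                              # 3. Filter: duplicate segments
            container.append(("ref", fp))            #    become references and are discarded
        else:
            index.add(fp)
            container.append(("data", fp, zlib.compress(seg)))  # 4. Compress new segments
    # 5. Write: a real system now appends the container to the on-disk container log

index, container = set(), []
sisl_write(b"AAAAAAAA" * 3 + b"new data", index, container)
assert sum(1 for e in container if e[0] == "ref") == 2  # two duplicate segments referenced
```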
This lesson covers EMC Data Domain Data Invulnerability Architecture (DIA), which is an important
EMC Data Domain technology that provides safe and reliable storage.
Data Invulnerability Architecture (DIA) is an important EMC Data Domain technology that provides
safe and reliable storage.
The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise
an architectural design whose goal is data invulnerability. Four technologies within the DIA fight
data loss:
• End-to-end verification
• Fault avoidance and containment
• Continuous fault detection and healing
• File system recoverability
DIA helps to provide data integrity, recoverability, and extremely resilient, protective disk storage.
This keeps data safe.
The end-to-end verification check verifies all file system data and metadata. The end-to-end
verification flow is shown on this slide.
If something goes wrong, it is corrected through self-healing; if it cannot be corrected, the system
raises an alert so that the backup can be run again.
Since every component of a storage system can introduce errors, an end-to-end test is the simplest
way to ensure data integrity. End-to-end verification means reading data after it is written and
comparing it to what was sent to disk, proving that it is reachable through the file system to disk,
and proving that the data is not corrupted.
When the DD OS receives a write request from backup software, it computes a strong checksum
over the constituent data. After analyzing the data for redundancy, it stores the new data segments
and all of the checksums. After the backup completes and all data is synced to disk, the DD
OS verifies that it can read the entire file from the disk platter and through the Data Domain file
system, and that the checksums of the data read back match the checksums of the written data.
This ensures that the data on the disks is readable and correct and that the file system metadata
structures used to find the data are also readable and correct. This confirms that the data is correct
and recoverable from every level of the system. If there are problems anywhere, for example if a
bit flips on a disk drive, it is caught. Mostly, a problem is corrected through self-healing. If a
problem can’t be corrected, it is reported immediately, and a backup is repeated while the data is
still valid on the primary store.
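A sketch of the verify-after-write idea in Python (the checksum choice and file layout are illustrative; Data Domain's internal structures are not exposed): after syncing, the data is read back through the file system and its checksum compared with the one computed at write time.

```python
import hashlib, os, tempfile

def write_and_verify(path: str, data: bytes) -> None:
    checksum = hashlib.sha256(data).hexdigest()   # computed when the write request arrives
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())                      # make sure the data is synced to disk
    with open(path, "rb") as f:                   # read back through the file system
        readback = f.read()
    if hashlib.sha256(readback).hexdigest() != checksum:
        # A real system first attempts self-healing; if that fails, it alerts
        # so the backup can be repeated while the primary copy is still valid.
        raise IOError("end-to-end verification failed")

with tempfile.TemporaryDirectory() as tmp:
    write_and_verify(os.path.join(tmp, "segment"), b"backup payload")
```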
Data Domain systems are equipped with a specialized log-structured file system that has important benefits.
• New data never overwrites existing data. (The system never puts existing data at risk.)
Traditional file systems often overwrite blocks when data changes, reusing the old block address. The Data
Domain file system writes only to new blocks. This isolates any incorrect overwrite (caused by a software bug)
to only the newest backup data; older versions remain safe. A minimal sketch of this append-only behavior
follows this list.
As shown in this slide, the container log never overwrites or updates existing data. New data is written to new
containers. Old containers and references remain in place and safe even when software bugs or hardware faults
occur when new backups are stored.
• The system includes non-volatile RAM (NVRAM) for fast, safe restarts.
The system includes a non-volatile RAM (NVRAM) write buffer into which it puts all data not yet safely on disk. The
file system leverages the security of this write buffer to implement a fast, safe restart capability.
The file system includes many internal logic and data structure integrity checks. If a problem is found by one of
these checks, the file system restarts. The checks and restarts provide early detection and recovery from the kinds
of bugs that can corrupt data. As it restarts, the Data Domain file system verifies the integrity of the data in the
NVRAM buffer before applying it to the file system and thus ensures that no data is lost due to a power outage.
In a traditional system, a power outage during a partial stripe update can leave old data lost and a recovery
attempt can fail. For this reason, Data Domain systems never update just one block in a stripe. Following the
no-overwrite policy, all new writes go to new RAID stripes, and those new RAID stripes are written in their
entirety. Verification after write ensures that the new stripe is consistent (there are no partial stripe writes).
New writes don't put existing backups at risk.
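The no-overwrite principle can be shown with a few lines of Python (a toy, not the real container format): the container log only ever appends, so older containers and the backups that reference them are never touched.

```python
class ContainerLog:
    """Append-only container log: new data never overwrites existing data."""
    def __init__(self):
        self.containers = []              # entries here are never modified in place

    def append(self, data: bytes) -> int:
        self.containers.append(data)      # always write to a brand-new container
        return len(self.containers) - 1   # the container's permanent address

log = ContainerLog()
monday = log.append(b"monday backup")
log.append(b"tuesday backup (data changed)")   # Monday's container is untouched
assert log.containers[monday] == b"monday backup"
```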
Continuous fault detection and healing provide an extra level of protection within the Data Domain
operating system. The DD OS detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.
This is the flow for continuous fault detection and healing:
1. The Data Domain system periodically rechecks the integrity of the RAID stripes and
container logs.
2. The Data Domain system uses RAID system redundancy to heal faults. RAID 6 is the
foundation for Data Domain systems continuous fault detection and healing. Its dual-parity
architecture offers advantages over conventional architectures, including RAID 1 (mirroring)
and the single-parity approaches of RAID 3, RAID 4, and RAID 5.
RAID 6:
Protects against two disk failures.
Protects against disk read errors during reconstruction.
Protects against the operator pulling the wrong disk.
Guarantees RAID stripe consistency even during power failure, without reliance on
NVRAM or an uninterruptible power supply (UPS).
Verifies data integrity and stripe coherency after writes.
By comparison, after a single disk fails in other RAID architectures, any further
simultaneous disk errors cause data loss. A system whose focus is data protection must
include the extra level of protection that RAID 6 provides.
To ensure that all data returned to the user during a restore is correct, the Data Domain file
system stores all of its on-disk data structures in formatted data blocks. These are self-
identifying and covered by a strong checksum (see the sketch after this list). On every read
from disk, the system first
verifies that the block read from disk is the block expected. It then uses the checksum to
verify the integrity of the data. If any issue is found, it asks RAID 6 to use its extra level of
redundancy to correct the data error. Because the RAID stripes are never partially updated,
their consistency is ensured and thus so is the ability to heal an error when it is discovered.
Continuous error detection works well for data being read, but it does not address issues
with data that may be unread for weeks or months before being needed for a recovery. For
this reason, Data Domain systems actively re-verify the integrity of all data every week in an
ongoing background process. This scrub process finds and repairs defects on the disk before
they can become a problem.
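A sketch of a self-identifying, checksummed block in Python (the 8-byte ID plus SHA-256 layout is invented for illustration): every read first confirms the block is the one expected, then verifies the checksum; on a mismatch, a real system would ask RAID 6 to reconstruct the block from its extra redundancy.

```python
import hashlib, struct

def pack_block(block_id: int, payload: bytes) -> bytes:
    """Self-identifying block: 8-byte ID + 32-byte checksum + payload (toy layout)."""
    return struct.pack(">Q", block_id) + hashlib.sha256(payload).digest() + payload

def read_block(raw: bytes, expected_id: int) -> bytes:
    block_id = struct.unpack(">Q", raw[:8])[0]
    checksum, payload = raw[8:40], raw[40:]
    if block_id != expected_id:
        raise IOError("wrong block returned from disk")   # self-identification failed
    if hashlib.sha256(payload).digest() != checksum:
        raise IOError("checksum mismatch: reconstruct via RAID 6 redundancy")
    return payload

raw = pack_block(7, b"on-disk data structure")
assert read_block(raw, expected_id=7) == b"on-disk data structure"
```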
The EMC Data Domain Data Invulnerability Architecture (DIA) file system recovery is a feature that
reconstructs lost or corrupted file system metadata. It includes file system check tools.
If a Data Domain system does have a problem, DIA file system recovery ensures that the system is
brought back online quickly.
This slide shows DIA file system recovery:
• Data is written in a self-describing format.
• The file system can be recreated by scanning the logs and rebuilding it from metadata
stored with the data.
In a traditional file system, consistency is not continuously checked. Data Domain systems ensure
consistency for all new writes through initial verification after each backup. The usable size of a
traditional file system is often limited by the time it takes to recover the file system in the event of
corruption.
Imagine running fsck on a traditional file system with more than 80 TB of data. The reason the
checking process can take so long is the file system needs to sort out the locations of the free
blocks so new writes do not accidentally overwrite existing data. Typically, this entails checking all
references to rebuild free block maps and reference counts. The more data in the system, the
longer this takes.
In contrast, since the Data Domain file system never overwrites existing data and doesn’t have
block maps and reference counts to rebuild, it has to verify only the location of the head of the log
to safely bring the system back online and restore critical data.
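A toy illustration of log-based recovery in Python (the entry format is invented): because each log entry is self-describing, a single forward scan rebuilds the namespace, and there are no free-block maps or reference counts to reconstruct.

```python
def rebuild_namespace(log_entries):
    """Each entry is self-describing: (file_name, container_address).
    One forward scan of the log reconstructs the file system metadata."""
    namespace = {}
    for name, address in log_entries:   # later entries supersede earlier ones
        namespace[name] = address
    return namespace

# A log with a rewrite: a.txt was later written again to a new container.
log = [("a.txt", 0), ("b.txt", 1), ("a.txt", 2)]
assert rebuild_namespace(log) == {"a.txt": 2, "b.txt": 1}
```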
This lesson covers the Data Domain file system. The Data Domain file system includes:
• ddvar (Administrative files)
• MTrees (File Storage)
Data Domain system administrative files are stored in /ddvar. This directory stores system core and
log files, generated support upload bundles, compressed core files, and .rpm upgrade packages.
The ddvar file structure keeps administrative files separate from storage files.
You cannot rename or delete /ddvar, nor can you access all of its sub-directories, such as the core
sub-directory.
The MTree (Managed Tree) file structure is the destination and root directory for deduplicated data. It
comes pre-configured for NFS export as /backup. You configure directory export levels to separate and
organize backup files in the MTree file system.
The MTree file structure:
• Uses compression.
• Implements data integrity.
• Reclaims storage space with file-system cleaning. You will learn more about file-system cleaning
later in this course.
MTrees provide more granular space management and reporting. This allows for finer management of
replication, snapshots, and retention locking. These operations can be performed on a specific MTree rather
than on the entire file system. For example, you can configure directory export levels to separate and
organize backup files.
Although a Data Domain system supports a maximum of 100 MTrees, system performance might degrade
rapidly if more than 14 MTrees are actively engaged in read or write streams (With DD OS 5.3 and 5.4 the
DD990, DD890, and DD880 series appliances will support up to 32 active MTrees). The degree of
degradation depends on overall I/O intensity and other file-system loads. For optimum performance, limit
the number of simultaneously active MTrees to a maximum of 14 or 32, depending on the model in use.
Whenever possible, aggregate operations on the same MTree into a single operation.
You can add subdirectories to MTree directories. You cannot add anything to the /data directory, and you
can make changes only under the col1 subdirectory. The backup MTree (/data/col1/backup) cannot be
deleted or renamed, but MTrees that you add can be renamed and deleted. You can replicate directories
under /backup.
This lesson covers Data Domain protocols.
Five protocols can be used to connect to a Data Domain appliance:
• NFS
This protocol allows Network File System (NFS) clients access to Data Domain system
directories and MTrees.
• CIFS
This protocol allows Common Internet File System (CIFS) clients access to Data Domain
system directories and MTrees.
• VTL
The virtual tape library (VTL) protocol enables backup applications to connect to and
manage Data Domain system storage as if it were a tape library. All of the functionality
generally supported by a physical tape library is available with a Data Domain system
configured as a VTL. The movement of data from a system configured as a VTL to a physical
tape library is managed by backup software (not by the Data Domain system). The VTL
protocol is used with Fibre Channel (FC) networking.
• DD Boost
The DD Boost protocol enables backup servers to communicate with storage systems
without the need for Data Domain systems to emulate tape. There are two components to
DD Boost: one component that runs on the backup server and another component that
runs on a Data Domain system.
• NDMP
If the VTL communication between a backup server and a Data Domain system is through
NDMP (Network Data Management Protocol), no Fibre Channel (FC) is required. When you
use NDMP, none of the initiator and port functionality applies.
This lesson covers Data Domain data paths, which include NFS, CIFS, DD Boost, NDMP, and VTL over
Ethernet or Fibre Channel.
This lesson also covers where a Data Domain system fits into a typical backup environment.
Data Domain systems connect to backup servers as storage capacity to hold large collections of backup data.
This slide shows how a Data Domain system integrates non-intrusively into an existing storage environment.
Often a Data Domain system is connected directly to a backup server. The backup data flow from the clients
is simply redirected to the Data Domain device instead of to a tape library.
Data Domain systems integrate non-intrusively into typical backup environments and reduce the amount of
storage needed to back up large amounts of data by performing deduplication and compression on data
before writing it to disk. The data footprint is reduced, making it possible for tapes to be partially or
completely replaced.
Depending on an organization’s policies, a tape library can be either removed or retained.
An organization can replicate and vault duplicate copies of data when two Data Domain systems have the
Data Domain Replicator software option enabled.
One option (not shown) is that data can be replicated locally with the copies stored onsite. The smaller data
footprint after deduplication also makes WAN vaulting feasible. As shown in the slide, replicas can be sent
over the WAN to an offsite disaster recovery (DR) location.
WAN vaulting can replace the process of rotating tapes from the library and sending the tapes to a vault by
truck.
If an organization’s policies dictate that tape must still be made for long-term archival retention, data can
flow from the Data Domain system back to the server and then to a tape library.
This completes the same flow that the backup server was performing initially: tapes come out in the same
standard backup software formats as before and can go off-site for long-term retention. If a tape must be
retrieved, it goes back into the tape library, and the data flows back through the backup software to the
client that needs it.
A data path is the path that data travels from the backup (or archive) servers to a Data Domain
system.
Ethernet supports the NFS, CIFS, FTP, NDMP, and DD Boost protocols that a Data Domain system
uses to move data.
In the data path over Ethernet (a family of computer networking technologies), backup and archive
servers send data from clients to Data Domain systems on the network via TCP/IP or UDP/IP (the
communication protocols used on the internet and other networks).
You can also use a direct connection between a dedicated port on the backup or archive server and
a dedicated port on the Data Domain system. The connection between the backup (or archive)
server and the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide
shows the Ethernet connection.
When Data Domain Replicator is licensed on two Data Domain systems, replication is enabled
between the two systems. The Data Domain systems can be either local, for local retention, or
remote, for disaster recovery. Data in flight over the WAN can be secured using VPN. Physical
separation of the replication traffic from backup traffic can be achieved by using two separate
Ethernet interfaces on a Data Domain system. This allows backups and replication to run
simultaneously without network conflicts. Since the Data Domain OS is based on Linux, it needs
additional software to work with CIFS. Samba software enables CIFS to work with the
Data Domain OS.
A data path is the path that data travels from the backup (or archive) servers to a Data Domain
system.
Fibre Channel supports the VTL and DD Boost protocols that a Data Domain system uses to move
data.
If the Data Domain virtual tape library (VTL) option is licensed, and a VTL FC HBA is installed on the
Data Domain system, the system can be connected to a Fibre Channel storage area network (SAN).
The backup or archive server sees the Data Domain system as one or more VTLs with up to 512
virtual Linear Tape-Open (LTO-1, LTO-2, LTO-3, or LTO-4) tape drives and 20,000 virtual slots across
up to 100,000 virtual cartridges.
This lesson covers Data Domain administration interfaces, which include the System Manager,
which is the graphical user interface (GUI), and the command line interface (CLI).
The EMC Data Domain command line interface (CLI) enables you to manage Data Domain systems.
You can do everything from the CLI that you can do from the System Manager.
After the initial configuration, use the SSH or Telnet (if enabled) utilities to access the system
remotely and open the CLI.
The DD OS 5.4 Command Reference Guide provides information for using the commands to
accomplish specific administration tasks. Each command also has an online help page that gives the
complete command syntax. Help pages are available at the CLI using the help command. Any Data
Domain system command that accepts a list (such as a list of IP addresses) accepts entries
separated by commas, by spaces, or both.
With the Data Domain System Manager (formerly the Data Domain Enterprise Manager), you can manage
one or more Data Domain systems. You can monitor and add systems from the System Manager. (To add a
system you need a sysadmin password.) You can also view cumulative information about the systems you’re
monitoring.
A Data Domain system should be added to, and managed by, only one System Manager.
You can access the System Manager from many browsers:
• Microsoft Internet Explorer™
• Google Chrome™
• Mozilla Firefox™
The Summary screen presents a status overview of, and cumulative information for, all managed systems in
the DD Network devices list and summarizes key operating information. The System Status, Space Usage,
and Systems panes provide key factors to help you recognize problems immediately and to allow you to drill
down to the system exhibiting the problem.
The tally of alerts and the charts of disk space that the System Manager presents enable you to quickly
spot problems.
Click the plus sign (+) next to the DD Network icon in the sidebar to expose the systems being managed by
the System Manager.
The System Manager includes tabs to help you navigate your way through administrative tasks. To access
the top- and sub-level tabs, shown in this slide, you must first select a system. In the lower pane on the
screen, you can view information about the system you selected. In this slide, a system has been selected,
and you can view details about it.
This lab covers the lab environment setup required for this class.
This lab covers the steps necessary to access a Data Domain system.
EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery with high-speed, inline deduplication. An EMC Data Domain system can also be used for
online storage with additional features and benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
With an Ethernet connection, the system can be accessed using the NDMP, DD Boost, CIFS, and NFS
protocols. The Fibre Channel connection supports the VTL protocol.
EMC Data Domain implements deduplication in a special hardware device. Most Data Domain
systems have a controller and multiple storage units.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity. Data Domain systems use non-volatile random access
memory (NVRAM) to protect unwritten data. NVRAM is used to hold data not yet written to disk.
Holding data like this ensures that data is not lost in a power outage.