M01res01-Technology Overview
This module also includes knowledge checks and a lab, which enable you to test your knowledge.
This lesson is an introduction to the EMC Data Domain appliance. The first topic answers the
question: What is a Data Domain system?
EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery. An EMC Data Domain system can also be used for online storage with additional features
and benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity.
Most Data Domain systems have a controller and multiple storage units.
EMC has several hardware offerings to meet the needs of a variety of environments, including:
• Small enterprise data centers and remote offices
• Midsized enterprise data centers
• Enterprise data centers
• Large enterprise data centers
• EMC Data Domain Expansion Shelves
Visit the Data Domain Hardware page on http://www.emc.com/ for specific models and
specifications.
http://www.emc.com > Products and Solutions > Backup and Recovery > EMC Data Domain >
Hardware
The latest Data Domain Operating System (DD OS) has several features and benefits, including:
• Support for leading backup, file archiving, and email archiving applications
• Simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data Domain Boost
• Inline write/read verification, continuous fault detection, and healing
• Conformance with IT governance and regulatory compliance standards for archived data
This lesson covers deduplication, which is an important technology that improves data storage by
providing extremely efficient data backups and archiving. This lesson also covers the different types
of deduplication (inline, post-process, file-based, block-based, fixed-length, and variable-length)
and the advantages of each type. The last topic in this lesson covers Data Domain deduplication
and its advantages.
Deduplication is similar to data compression, but it looks for redundancy of large sequences of
bytes. Sequences of bytes identical to those previously encountered and stored are replaced with
references to the previously encountered data.
This is all hidden from users and applications. When the data is read, the original data is provided
to the application or user.
Deduplication performance depends on the amount of data, the bandwidth, and the disk speed, CPU, and memory of the hosts and devices performing the deduplication.
When processing data, deduplication recognizes data that is identical to previously stored data.
When it encounters such data, deduplication creates a reference to the previously stored data, thus
avoiding storing duplicate data.
Deduplication typically uses hashing algorithms.
Hashing algorithms yield a value based on the content of the data being hashed. This value
is called the hash or fingerprint, and it is much smaller than the original data.
Different data contents yield different hashes (collisions are theoretically possible but
astronomically unlikely with strong algorithms); each hash can be checked against previously
stored hashes.
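To make fingerprinting concrete, here is a minimal sketch in Python (illustrative only, not Data Domain code; the choice of SHA-1 and an in-memory index are assumptions): a segment is stored only when its fingerprint has not been seen before.

```python
import hashlib

fingerprint_index = {}  # fingerprint -> stored segment (a real system maps to a disk location)

def store_segment(segment: bytes) -> str:
    """Store a segment only if its fingerprint is new; return the fingerprint."""
    fp = hashlib.sha1(segment).hexdigest()  # the fingerprint is far smaller than the data
    if fp not in fingerprint_index:
        fingerprint_index[fp] = segment     # new content: store it once
    return fp                               # duplicate content: only a reference is kept

ref1 = store_segment(b"backup data block")
ref2 = store_segment(b"backup data block")   # identical content arrives again
assert ref1 == ref2 and len(fingerprint_index) == 1  # stored only once
```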
In file-based deduplication, only the original instance of a file is stored. Future identical copies of
the file use a small reference to point to the original file content. File-based deduplication is
sometimes called single-instance storage (SIS).
In this example, eight files are being deduplicated. The blue files are identical, but each has its own
copy of the file content. The grey files also have their own copy of identical content. After
deduplication there are still eight files. The blue files point to the same content, which is stored
only once on disk. The same is true for the grey files. If each file is 20 megabytes, file-based
deduplication reduces the storage required from 160 megabytes to 40 megabytes.
File-based deduplication enables storage savings, and it can be combined with compression (a way
to transmit the same amount of data in fewer bits) for additional storage savings. It is popular in
desktop backups, and it can be more effective for data restores because it doesn't need to
reassemble files from segments. It can also be included in backup software, so an organization
doesn't have to depend on a particular vendor's disk appliance.
File-based deduplication results are often not as great as with other types of deduplication (such as
block- and segment-based deduplication). The most important disadvantage is that a modified file
shares no deduplication with previously backed-up versions: even a small change causes the entire
file to be stored again.
File-based deduplication stores an original version of a file and creates a digital signature for it
using a cryptographic hash function (such as SHA-1). Future exact copies of the file are replaced
with references to that signature rather than being stored again.
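A minimal sketch of single-instance storage in Python, following the slide's eight-file example (the file names and the 20-byte contents standing in for 20 MB files are illustrative assumptions): each unique file content is stored once, and every copy keeps only a small reference.

```python
import hashlib

content_store = {}  # signature -> file content, stored only once
file_table = {}     # file name -> signature (the small reference)

def backup_file(name: str, content: bytes) -> None:
    sig = hashlib.sha1(content).hexdigest()   # the file's digital signature
    content_store.setdefault(sig, content)    # store content only on first sight
    file_table[name] = sig                    # later copies are just references

blue = b"B" * 20  # stands in for 20 MB of identical "blue" content
grey = b"G" * 20  # stands in for 20 MB of identical "grey" content
for i in range(4):
    backup_file(f"blue{i}.ppt", blue)
    backup_file(f"grey{i}.doc", grey)

# Eight files, but only two contents stored: 160 MB reduced to 40 MB in the slide's terms.
assert len(file_table) == 8 and len(content_store) == 2
```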
Fixed-length segment deduplication (also called block-based deduplication or fixed-segment
deduplication) is a technology that reduces data storage requirements by comparing incoming data
segments (also called fixed data blocks or data chunks) with previously stored data segments. It
divides data into segments of a single, fixed length (for example, 4 KB, 8 KB, or 12 KB).
Fixed-length segment deduplication reads data and divides it into fixed-size segments. These
segments are compared to other segments already processed and stored. If the segment is
identical to a previous segment, a pointer is used to point to that previous segment.
In this example, the data stream is divided into a fixed length of four units. Small pointers to the
common content are assembled in the correct order to represent the original data. Each unique
data element is stored only once.
For data that is identical (does not change), fixed-length segment deduplication reduces storage
requirements.
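A toy fixed-length deduplicator in Python, using a 4-byte segment size to mirror the slide's four-unit example (real systems use kilobyte-scale segments; this is a sketch, not product code):

```python
import hashlib

SEGMENT_SIZE = 4  # four units, as in the slide's example; real segments are 4 KB-12 KB

def dedup_fixed(data: bytes, store: dict) -> list:
    """Divide data into fixed-size segments; store each unique segment once
    and represent the stream as an ordered list of fingerprints (pointers)."""
    refs = []
    for i in range(0, len(data), SEGMENT_SIZE):
        seg = data[i:i + SEGMENT_SIZE]
        fp = hashlib.sha1(seg).hexdigest()
        store.setdefault(fp, seg)       # identical segments are stored only once
        refs.append(fp)                 # small pointers, kept in the correct order
    return refs

store = {}
refs = dedup_fixed(b"ABCDABCDABCD", store)
assert len(refs) == 3 and len(store) == 1          # one unique segment, referenced 3 times
assert b"".join(store[fp] for fp in refs) == b"ABCDABCDABCD"  # original data reassembles
```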
When data is altered, the segments shift, causing more segments to be stored. For example, when
you add a slide to a Microsoft PowerPoint deck, all subsequent blocks in the file shift and are likely
to be considered different from those in the original file, so the deduplication effect is less
significant. Smaller blocks achieve better deduplication than large ones, but they take more
resources to deduplicate.
In backup applications, the backup stream consists of many files. Backup streams are rarely
entirely identical, even when they are successive backups of the same file system. A single addition,
deletion, or change to any file changes the number of bytes in the new backup stream. Even if no
file has changed, adding a new file to the backup stream shifts the rest of the stream. Fixed-length
segment deduplication therefore stores large numbers of otherwise-unchanged segments, because
the boundaries between the segments have moved.
Many hardware and software deduplication products use fixed-length segments for deduplication.
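The boundary-shift problem is easy to demonstrate with a fixed-length chunker like the toy one above (redefined compactly here so the example stands alone; again an illustration, not Data Domain's implementation): inserting a single byte at the front of the stream shifts every segment boundary, so nothing in the new backup matches the old one.

```python
import hashlib

def dedup_fixed(data: bytes, store: dict, size: int = 4) -> list:
    """Fixed-length chunking, as in the previous sketch."""
    refs = []
    for i in range(0, len(data), size):
        fp = hashlib.sha1(data[i:i + size]).hexdigest()
        store.setdefault(fp, data[i:i + size])
        refs.append(fp)
    return refs

store = {}
old_refs = dedup_fixed(b"ABCDEFGHIJKL", store)           # first backup: 3 segments
new_refs = dedup_fixed(b"X" + b"ABCDEFGHIJKL", store)    # one byte added at the front
assert not set(old_refs) & set(new_refs)   # every boundary shifted: zero segments match
assert len(store) == 7                     # 7 segments stored instead of the ideal 4
```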
Variable-length segment deduplication evaluates data by examining its contents to look for the
boundary from one segment to the next. Variable-length segments are any number of bytes within
a range determined by the particular algorithm implemented.
Unlike fixed-length segment deduplication, variable-length segment deduplication divides the
backup or data stream into segments based on the contents of the data stream itself.
When you apply variable-length segmentation to a data sequence, deduplication divides the
sequence into variable-sized segments based on its contents. In this example, byte A is added to
the beginning of the data. Only one new segment needs to be stored, because the content that
defines the boundaries between the remaining segments was not altered.
Eventually, variable-length segment deduplication finds the segments that have not changed and
backs up fewer segments than fixed-length segment deduplication. Even for storing individual files,
variable-length segments have an advantage: many files are very similar to, but not identical to,
other versions of the same file. Variable-length segmentation isolates the changes, finds more
identical segments, and stores fewer segments than fixed-length deduplication.
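A toy content-defined chunker in Python illustrates why variable-length segmentation resynchronizes after an insertion (the rolling-sum boundary test, window, and modulus here are arbitrary stand-ins for the rolling-hash methods real products use):

```python
def variable_segments(data: bytes, window: int = 3, modulus: int = 5) -> list:
    """Cut a segment wherever a rolling sum of the last `window` bytes
    satisfies a content-defined condition (a toy stand-in for a Rabin hash)."""
    segments, start = [], 0
    for i in range(window, len(data) + 1):
        if sum(data[i - window:i]) % modulus == 0:   # boundary depends only on content
            segments.append(data[start:i])
            start = i
    if start < len(data):
        segments.append(data[start:])                # trailing remainder
    return segments

text = b"the quick brown fox jumps over the lazy dog"
old = variable_segments(text)
new = variable_segments(b"A" + text)   # one byte inserted at the front
# Boundaries resynchronize after the insertion, so most segments still match:
print(len(set(old) & set(new)), "of", len(old), "segments unchanged")
```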
With post-process deduplication, files are written to disk first, then they are scanned and
compressed.
Post-process deduplication should never interfere with the incoming backup data speed.
Post-process deduplication requires more I/O. It writes new data to disk and then reads the new
data before it checks for duplicates. It requires an additional write to delete the duplicate data and
another write to update the hash table. If it can’t determine whether a data segment is duplicate or
new, it requires another write (this happens about 5% of the time). It requires more disk space to:
• Initially capture the data.
• Store multiple pools of data.
• Provide adequate performance by distributing the data over a large number of drives.
Post-process deduplication is run as a separate processing task and could lengthen the time
needed to fully complete the backup.
In post-process deduplication, files are first written to disk in their entirety (buffered to a large
cache); only after the files are written is the disk scanned for duplicates and compressed. In other
words, deduplication happens after the files are written to disk.
With post-process deduplication, a data segment enters the appliance (as part of a larger stream of
data from a backup), and it is written to disk in its entirety. Then a separate process (running
asynchronously and possibly from another appliance accessing the same disk) reads the block of
data to determine if it is a duplicate. If it is a duplicate, it is deleted and replaced with a pointer. If it
is new, it is stored.
With Data Domain inline deduplication, incoming data is examined as soon as it arrives to
determine whether a segment (also called a block or chunk) is new and unique or a duplicate of a
previously stored segment. Inline deduplication occurs in RAM before the data is written to disk.
Around 99% of data segments are analyzed in RAM without disk access.
In some cases, an inline deduplication process will temporarily store a small amount of data on disk
before it is analyzed. A very small amount of data is not identified immediately as either unique or
redundant. That data is stored to disk and examined again later against the previously stored data.
Inline deduplication requires less disk space than post-process deduplication. There is less
administration for an inline deduplication process, as the administrator does not need to define
and monitor the staging space.
Because inline deduplication analyzes the data in RAM, it reduces the disk seeks needed to
determine whether new data must be stored.
When deduplication occurs close to where the data is created, it is often referred to as source-
based deduplication; when it occurs near where the data is stored, it is commonly called
target-based deduplication.
EMC Data Domain Global Compression™ is the EMC Data Domain trademarked name for the
combination of global compression (deduplication) and local compression.
Global compression equals deduplication. It identifies previously stored segments and cannot be
turned off.
Local compression compresses segments before writing them to disk. It uses common, industry-
standard algorithms (for example, lz, gz, and gzfast). The default compression algorithm used by
Data Domain systems is lz.
Local compression is similar to zipping a file to reduce the file size. Zip is a file format used for data
compression and archiving. A zip file contains one or more files that have been compressed, to
reduce file size, or stored as is. The zip file format permits a number of compression algorithms.
Local compression can be turned off.
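Local compression is ordinary lossless compression applied to the segments that survive deduplication. A sketch using Python's zlib, a DEFLATE (gzip-family) implementation, shown here as a stand-in for the lz, gz, and gzfast algorithms the system offers:

```python
import zlib

segment = b"unique backup data " * 100        # a new, non-duplicate segment
compressed = zlib.compress(segment, level=6)  # compress before writing to disk
print(f"{len(segment)} bytes -> {len(compressed)} bytes on disk")
assert zlib.decompress(compressed) == segment  # lossless: the original is fully restored
```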
This lesson covers EMC Data Domain SISL™ Scaling Architecture.
In this lesson, you learn more about SISL architecture, its advantages, and how it works.
SISL architecture provides fast and efficient deduplication:
• 99% of duplicate data segments are identified inline in RAM before they are stored to disk.
• System throughput increases directly as CPU performance increases.
• The disk footprint is reduced by minimizing disk access.
SISL does the following (a toy end-to-end sketch follows the list):
1. Segment
The data is broken into variable-length segments.
2. Fingerprint
Each segment is given a fingerprint, or hash, for identification.
3. Filter
The summary vector and segment locality techniques identify 99% of the duplicate
segments in RAM, inline, before storing to disk. If a segment is a duplicate, it is referenced
and discarded. If a segment is new, the data moves on to step 4.
4. Compress
New segments are grouped and compressed using common algorithms: lz, gz, gzfast (lz by
default).
5. Write
Data (segments, fingerprints, metadata, and logs) is written to containers, and the
containers are written to disk.
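The five steps compose into a pipeline. Below is a toy end-to-end sketch in Python that preserves the order of operations (fixed 8-byte segments, SHA-1 fingerprints, and zlib stand in for the real variable-length segmenter, the summary-vector filter, and lz compression; none of this is Data Domain's actual implementation):

```python
import hashlib, zlib

def sisl_write(stream: bytes, index: set, container: list) -> None:
    """Toy pipeline: segment -> fingerprint -> filter -> compress -> write."""
    for i in range(0, len(stream), 8):               # 1. Segment (fixed-size here)
        seg = stream[i:i + 8]
        fp = hashlib.sha1(seg).hexdigest()           # 2. Fingerprint
        if fp in index:                              # 3. Filter: duplicate segments
            container.append(("ref", fp))            #    become references and are discarded
        else:
            index.add(fp)
            container.append(("data", fp, zlib.compress(seg)))  # 4. Compress new segments
    # 5. Write: a real system now appends the container to the on-disk container log

index, container = set(), []
sisl_write(b"AAAAAAAA" * 3 + b"new data", index, container)
assert sum(1 for e in container if e[0] == "ref") == 2  # two duplicate segments referenced
```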
This lesson covers EMC Data Domain Data Invulnerability Architecture (DIA), which is an important
EMC Data Domain technology that provides safe and reliable storage.
Data Invulnerability Architecture (DIA) is an important EMC Data Domain technology that provides
safe and reliable storage.
The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise
an architectural design whose goal is data invulnerability. Four technologies within the DIA fight
data loss:
• End-to-end verification
• Fault avoidance and containment
• Continuous fault detection and healing
• File system recoverability
DIA helps to provide data integrity, recoverability, and extremely resilient, protective disk storage.
This keeps data safe.
The end-to-end verification check verifies all file system data and metadata. The end-to-end
verification flow is shown on this slide.
If something goes wrong, it is corrected through self-healing; if it cannot be corrected, the system
raises an alert so that the backup can be run again.
Since every component of a storage system can introduce errors, an end-to-end test is the simplest
way to ensure data integrity. End-to-end verification means reading data after it is written and
comparing it to what was sent to disk, proving that it is reachable through the file system to disk,
and proving that the data is not corrupted.
When the DD OS receives a write request from backup software, it computes a strong checksum
over the constituent data. After analyzing the data for redundancy, it stores the new data segments
and all of the checksums. After the backup completes and all data is synced to disk, the DD
OS verifies that it can read the entire file from the disk platter and through the Data Domain file
system, and that the checksums of the data read back match the checksums of the written data.
This ensures that the data on the disks is readable and correct and that the file system metadata
structures used to find the data are also readable and correct. This confirms that the data is correct
and recoverable from every level of the system. If there are problems anywhere, for example if a
bit flips on a disk drive, it is caught. Mostly, a problem is corrected through self-healing. If a
problem can’t be corrected, it is reported immediately, and a backup is repeated while the data is
still valid on the primary store.
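A sketch of the verify-after-write idea in Python (the checksum choice and file layout are illustrative; Data Domain's internal structures are not exposed): after syncing, the data is read back through the file system and its checksum compared with the one computed at write time.

```python
import hashlib, os, tempfile

def write_and_verify(path: str, data: bytes) -> None:
    checksum = hashlib.sha256(data).hexdigest()   # computed when the write request arrives
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())                      # make sure the data is synced to disk
    with open(path, "rb") as f:                   # read back through the file system
        readback = f.read()
    if hashlib.sha256(readback).hexdigest() != checksum:
        # A real system first attempts self-healing; if that fails, it alerts
        # so the backup can be repeated while the primary copy is still valid.
        raise IOError("end-to-end verification failed")

with tempfile.TemporaryDirectory() as tmp:
    write_and_verify(os.path.join(tmp, "segment"), b"backup payload")
```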
Data Domain systems are equipped with a specialized log-structured file system that has important benefits.
• New data never overwrites existing data. (The system never puts existing data at risk.)
Traditional file systems often overwrite blocks when data changes, reusing the old block address. The Data
Domain file system writes only to new blocks. This isolates any incorrect overwrite (caused by a software bug)
to only the newest backup data; older versions remain safe. A minimal sketch of this append-only behavior
follows this list.
As shown in this slide, the container log never overwrites or updates existing data. New data is written to new
containers. Old containers and references remain in place and safe even when software bugs or hardware faults
occur when new backups are stored.
• The system includes non-volatile RAM (NVRAM) for fast, safe restarts.
The system includes a non-volatile RAM (NVRAM) write buffer into which it puts all data not yet safely on disk. The
file system leverages the security of this write buffer to implement a fast, safe restart capability.
The file system includes many internal logic and data structure integrity checks. If a problem is found by one of
these checks, the file system restarts. The checks and restarts provide early detection and recovery from the kinds
of bugs that can corrupt data. As it restarts, the Data Domain file system verifies the integrity of the data in the
NVRAM buffer before applying it to the file system and thus ensures that no data is lost due to a power outage.
In a traditional system, a power outage during a partial stripe update can leave old data lost and a recovery
attempt can fail. For this reason, Data Domain systems never update just one block in a stripe. Following the
no-overwrite policy, all new writes go to new RAID stripes, and those new RAID stripes are written in their
entirety. Verification after write ensures that the new stripe is consistent (there are no partial stripe writes).
New writes don't put existing backups at risk.
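The no-overwrite principle can be shown with a few lines of Python (a toy, not the real container format): the container log only ever appends, so older containers and the backups that reference them are never touched.

```python
class ContainerLog:
    """Append-only container log: new data never overwrites existing data."""
    def __init__(self):
        self.containers = []              # entries here are never modified in place

    def append(self, data: bytes) -> int:
        self.containers.append(data)      # always write to a brand-new container
        return len(self.containers) - 1   # the container's permanent address

log = ContainerLog()
monday = log.append(b"monday backup")
log.append(b"tuesday backup (data changed)")   # Monday's container is untouched
assert log.containers[monday] == b"monday backup"
```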
Continuous fault detection and healing provide an extra level of protection within the Data Domain
operating system. The DD OS detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.
This is the flow for continuous fault detection and healing:
1. The Data Domain system periodically rechecks the integrity of the RAID stripes and
container logs.
2. The Data Domain system uses RAID system redundancy to heal faults. RAID 6 is the
foundation for Data Domain systems continuous fault detection and healing. Its dual-parity
architecture offers advantages over conventional architectures, including RAID 1 (mirroring)
and the single-parity approaches of RAID 3, RAID 4, and RAID 5.
RAID 6:
Protects against two disk failures.
Protects against disk read errors during reconstruction.
Protects against the operator pulling the wrong disk.
Guarantees RAID stripe consistency even during power failure, without reliance on
NVRAM or an uninterruptible power supply (UPS).
Verifies data integrity and stripe coherency after writes.
By comparison, after a single disk fails in other RAID architectures, any further
simultaneous disk errors cause data loss. A system whose focus is data protection must
include the extra level of protection that RAID 6 provides.
To ensure that all data returned to the user during a restore is correct, the Data Domain file
system stores all of its on-disk data structures in formatted data blocks. These are self-
identifying and covered by a strong checksum (see the sketch after this list). On every read
from disk, the system first
verifies that the block read from disk is the block expected. It then uses the checksum to
verify the integrity of the data. If any issue is found, it asks RAID 6 to use its extra level of
redundancy to correct the data error. Because the RAID stripes are never partially updated,
their consistency is ensured and thus so is the ability to heal an error when it is discovered.
Continuous error detection works well for data being read, but it does not address issues
with data that may be unread for weeks or months before being needed for a recovery. For
this reason, Data Domain systems actively re-verify the integrity of all data every week in an
ongoing background process. This scrub process finds and repairs defects on the disk before
they can become a problem.
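A sketch of a self-identifying, checksummed block in Python (the 8-byte ID plus SHA-256 layout is invented for illustration): every read first confirms the block is the one expected, then verifies the checksum; on a mismatch, a real system would ask RAID 6 to reconstruct the block from its extra redundancy.

```python
import hashlib, struct

def pack_block(block_id: int, payload: bytes) -> bytes:
    """Self-identifying block: 8-byte ID + 32-byte checksum + payload (toy layout)."""
    return struct.pack(">Q", block_id) + hashlib.sha256(payload).digest() + payload

def read_block(raw: bytes, expected_id: int) -> bytes:
    block_id = struct.unpack(">Q", raw[:8])[0]
    checksum, payload = raw[8:40], raw[40:]
    if block_id != expected_id:
        raise IOError("wrong block returned from disk")   # self-identification failed
    if hashlib.sha256(payload).digest() != checksum:
        raise IOError("checksum mismatch: reconstruct via RAID 6 redundancy")
    return payload

raw = pack_block(7, b"on-disk data structure")
assert read_block(raw, expected_id=7) == b"on-disk data structure"
```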
The EMC Data Domain Data Invulnerability Architecture (DIA) file system recovery is a feature that
reconstructs lost or corrupted file system metadata. It includes file system check tools.
If a Data Domain system does have a problem, DIA file system recovery ensures that the system is
brought back online quickly.
This slide shows DIA file system recovery:
• Data is written in a self-describing format.
• The file system can be recreated by scanning the logs and rebuilding it from metadata
stored with the data.
In a traditional file system, consistency is not continuously checked. Data Domain systems ensure
consistency for all new writes through initial verification after each backup. The usable size of a
traditional file system is often limited by the time it takes to recover the file system in the event of
corruption.
Imagine running fsck on a traditional file system with more than 80 TB of data. The reason the
checking process can take so long is the file system needs to sort out the locations of the free
blocks so new writes do not accidentally overwrite existing data. Typically, this entails checking all
references to rebuild free block maps and reference counts. The more data in the system, the
longer this takes.
In contrast, since the Data Domain file system never overwrites existing data and doesn’t have
block maps and reference counts to rebuild, it has to verify only the location of the head of the log
to safely bring the system back online and restore critical data.
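A toy illustration of log-based recovery in Python (the entry format is invented): because each log entry is self-describing, a single forward scan rebuilds the namespace, and there are no free-block maps or reference counts to reconstruct.

```python
def rebuild_namespace(log_entries):
    """Each entry is self-describing: (file_name, container_address).
    One forward scan of the log reconstructs the file system metadata."""
    namespace = {}
    for name, address in log_entries:   # later entries supersede earlier ones
        namespace[name] = address
    return namespace

# A log with a rewrite: a.txt was later written again to a new container.
log = [("a.txt", 0), ("b.txt", 1), ("a.txt", 2)]
assert rebuild_namespace(log) == {"a.txt": 2, "b.txt": 1}
```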
This lesson covers the Data Domain file system. The Data Domain file system includes:
• ddvar (Administrative files)
• MTrees (File Storage)
Data Domain system administrative files are stored in /ddvar. This directory stores system core and
log files, generated support upload bundles, compressed core files, and .rpm upgrade packages.
The ddvar file structure keeps administrative files separate from storage files.
You cannot rename or delete /ddvar, nor can you access all of its sub-directories, such as the core
sub-directory.
The MTree (Managed Tree) file structure is the destination and root directory for deduplicated data. It
comes pre-configured for NFS export as /backup. You configure directory export levels to separate and
organize backup files in the MTree file system.
The MTree file structure:
• Uses compression.
• Implements data integrity.
• Reclaims storage space with file-system cleaning. You will learn more about file-system cleaning
later in this course.
MTrees provide more granular space management and reporting. This allows for finer management of
replication, snapshots, and retention locking. These operations can be performed on a specific MTree rather
than on the entire file system. For example, you can configure directory export levels to separate and
organize backup files.
Although a Data Domain system supports a maximum of 100 MTrees, system performance might degrade
rapidly if more than 14 MTrees are actively engaged in read or write streams (With DD OS 5.3 and 5.4 the
DD990, DD890, and DD880 series appliances will support up to 32 active MTrees). The degree of
degradation depends on overall I/O intensity and other file-system loads. For optimum performance, limit
the number of simultaneously active MTrees to a maximum of 14 or 32, depending on the model in use.
Whenever possible, aggregate operations on the same MTree into a single operation.
You can add subdirectories to MTree directories. You cannot add anything to the /data directory, and you
can make changes only under the col1 subdirectory. The backup MTree (/data/col1/backup) cannot be
deleted or renamed, but MTrees that you add can be renamed and deleted. You can replicate directories
under /backup.
This lesson covers Data Domain protocols.
Five protocols can be used to connect to a Data Domain appliance:
• NFS
This protocol allows Network File System (NFS) clients access to Data Domain system
directories and MTrees.
• CIFS
This protocol allows Common Internet File System (CIFS) clients access to Data Domain
system directories and MTrees.
• VTL
The virtual tape library (VTL) protocol enables backup applications to connect to and
manage Data Domain system storage as if it were a tape library. All of the functionality
generally supported by a physical tape library is available with a Data Domain system
configured as a VTL. The movement of data from a system configured as a VTL to a physical
tape library is managed by backup software (not by the Data Domain system). The VTL
protocol is used with Fibre Channel (FC) networking.
• DD Boost
The DD Boost protocol enables backup servers to communicate with storage systems
without the need for Data Domain systems to emulate tape. There are two components to
DD Boost: one component that runs on the backup server and another component that
runs on a Data Domain system.
• NDMP
If the VTL communication between a backup server and a Data Domain system is through
NDMP (Network Data Management Protocol), no Fibre Channel (FC) is required. When you
use NDMP, none of the initiator and port functionality applies.
This lesson covers Data Domain data paths, which include NFS, CIFS, DD Boost, NDMP, and VTL over
Ethernet or Fibre Channel.
This lesson also covers where a Data Domain system fits into a typical backup environment.
Data Domain systems connect to backup servers as storage capacity to hold large collections of backup data.
This slide shows how a Data Domain system integrates non-intrusively into an existing storage environment.
Often a Data Domain system is connected directly to a backup server. The backup data flow from the clients
is simply redirected to the Data Domain device instead of to a tape library.
Data Domain systems integrate non-intrusively into typical backup environments and reduce the amount of
storage needed to back up large amounts of data by performing deduplication and compression on data
before writing it to disk. The data footprint is reduced, making it possible for tapes to be partially or
completely replaced.
Depending on an organization’s policies, a tape library can be either removed or retained.
An organization can replicate and vault duplicate copies of data when two Data Domain systems have the
Data Domain Replicator software option enabled.
One option (not shown) is that data can be replicated locally with the copies stored onsite. The smaller data
footprint after deduplication also makes WAN vaulting feasible. As shown in the slide, replicas can be sent
over the WAN to an offsite disaster recovery (DR) location.
WAN vaulting can replace the process of rotating tapes from the library and sending the tapes to a vault by
truck.
If an organization’s policies dictate that tape must still be made for long-term archival retention, data can
flow from the Data Domain system back to the server and then to a tape library.
This completes the same flow that the backup server was performing initially: tapes come out in the same
standard backup software formats as before and can go off-site for long-term retention. If a tape must be
retrieved, it goes back into the tape library, and the data flows back through the backup software to the
client that needs it.
A data path is the path that data travels from the backup (or archive) servers to a Data Domain
system.
Ethernet supports the NFS, CIFS, FTP, NDMP, and DD Boost protocols that a Data Domain system
uses to move data.
In the data path over Ethernet (a family of computer networking technologies), backup and archive
servers send data from clients to Data Domain systems on the network via TCP/IP or UDP/IP (the
communication protocols used on the internet and other networks).
You can also use a direct connection between a dedicated port on the backup or archive server and
a dedicated port on the Data Domain system. The connection between the backup (or archive)
server and the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide
shows the Ethernet connection.
When Data Domain Replicator is licensed on two Data Domain systems, replication is enabled
between the two systems. The Data Domain systems can be either local, for local retention, or
remote, for disaster recovery. Data in flight over the WAN can be secured using VPN. Physical
separation of the replication traffic from backup traffic can be achieved by using two separate
Ethernet interfaces on a Data Domain system. This allows backups and replication to run
simultaneously without network conflicts. Since the Data Domain OS is based on Linux, it needs
additional software to work with CIFS. Samba software enables CIFS to work with the
Data Domain OS.
A data path is the path that data travels from the backup (or archive) servers to a Data Domain
system.
Fibre Channel supports the VTL and DD Boost protocols that a Data Domain system uses to move
data.
If the Data Domain virtual tape library (VTL) option is licensed, and a VTL FC HBA is installed on the
Data Domain system, the system can be connected to a Fibre Channel storage area network (SAN).
The backup or archive server sees the Data Domain system as one or more VTLs with up to 512
virtual Linear Tape-Open (LTO-1, LTO-2, LTO-3, or LTO-4) tape drives and 20,000 virtual slots across
up to 100,000 virtual cartridges.
This lesson covers Data Domain administration interfaces, which include the System Manager,
which is the graphical user interface (GUI), and the command line interface (CLI).
The EMC Data Domain command line interface (CLI) enables you to manage Data Domain systems.
You can do everything from the CLI that you can do from the System Manager.
After the initial configuration, use the SSH or Telnet (if enabled) utilities to access the system
remotely and open the CLI.
The DD OS 5.4 Command Reference Guide provides information for using the commands to
accomplish specific administration tasks. Each command also has an online help page that gives the
complete command syntax. Help pages are available at the CLI using the help command. Any Data
Domain system command that accepts a list (such as a list of IP addresses) accepts entries
separated by commas, by spaces, or both.
With the Data Domain System Manager (formerly the Data Domain Enterprise Manager), you can manage
one or more Data Domain systems. You can monitor and add systems from the System Manager. (To add a
system you need a sysadmin password.) You can also view cumulative information about the systems you’re
monitoring.
A Data Domain system should be added to, and managed by, only one System Manager.
You can access the System Manager from many browsers:
• Microsoft Internet Explorer™
• Google Chrome™
• Mozilla Firefox™
The Summary screen presents a status overview of, and cumulative information for, all managed systems in
the DD Network devices list and summarizes key operating information. The System Status, Space Usage,
and Systems panes provide key factors to help you recognize problems immediately and to allow you to drill
down to the system exhibiting the problem.
The tally of alerts and the charts of disk space that the System Manager presents enable you to quickly
spot problems.
Click the plus sign (+) next to the DD Network icon in the sidebar to expose the systems being managed by
the System Manager.
The System Manager includes tabs to help you navigate your way through administrative tasks. To access
the top- and sub-level tabs, shown in this slide, you must first select a system. In the lower pane on the
screen, you can view information about the system you selected. In this slide, a system has been selected,
and you can view details about it.
This lab covers the lab environment setup required for this class.
This lab covers the steps necessary to access a Data Domain system.
EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery with high-speed, inline deduplication. An EMC Data Domain system can also be used for
online storage with additional features and benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
With an Ethernet connection, the system can be accessed using the NDMP, DD Boost, CIFS, and NFS
protocols. The Fibre Channel connection supports the VTL protocol.
EMC Data Domain implements deduplication in a special hardware device. Most Data Domain
systems have a controller and multiple storage units.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity. Data Domain systems use non-volatile random access
memory (NVRAM) to protect unwritten data. NVRAM is used to hold data not yet written to disk.
Holding data like this ensures that data is not lost in a power outage.