EMC Data Domain

The document provides an overview of EMC Data Domain storage systems, including their hardware components, software features, data protection architecture, and advantages. Data Domain systems use deduplication and compression to reduce storage needs and can be used for backup, archiving, disaster recovery, and as online storage. The systems have controller and disk shelf components and support various connectivity and capacity options.

Uploaded by

venkat

Data Domain Overview

September 4, 2014 Paras EMC Data Domain

EMC Data Domain Systems:
EMC Data Domain storage systems are traditionally used for disk
backup, archiving, and disaster recovery.

An EMC Data Domain system can also be used for online storage with
additional features and benefits.
A Data Domain system can connect to your network via Ethernet or
Fibre Channel connections.
Data Domain systems use low-cost Serial Advanced Technology
Attachment (SATA) disk drives and implement a redundant array of
independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity.
Note: Data Domain uses only RAID 6; no other RAID levels are supported.
Most Data Domain systems have a controller and multiple storage
units.
Hardware Overview:
Data Domain Models available:

Data Domain hardware consists of a controller and a disk array
enclosure.

In this post I will walk through the hardware of the Data Domain 990
(DD990) model.
Hardware overview:
Two components: a. Controller b. Disk Shelf
Data Domain components in chassis:
1. Quad-socket, 10-core Xeon processors (Westmere-EX)
2. Two memory configurations available
3. Base: 128 GB supports up to 360 TB raw, 285 TB usable
4. Expanded: 256 GB supports up to 720 TB raw, 570 TB usable
5. External expansion using ES30 and ES20 shelves
6. Three quad-port 6 Gb/s SAS HBAs for external connectivity
7. Connectivity up to 24 shelves, or up to max capacity
8. Four I/O slots for data access connectivity
9. Up to four dual-port 1 GbE NICs, optical
10. Up to four quad-port 1 GbE NICs, copper
11. Up to three dual-port 10 GbE NICs, copper with SFP+ interface
12. Up to three dual-port 10 GbE NICs, optical with LC interface
13. Up to three dual-port 8 Gb Fibre Channel VTL HBAs
14. Two 2 GB remote-battery NVRAM with Battery Backup Unit

The DD990 is available in two configurations: one with 128 GB of RAM
and one with 256 GB of RAM.
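
As a quick check on the capacity figures above, the short Python
calculation below compares usable capacity to raw capacity for the two
configurations. It uses only the numbers quoted in the list; the
dictionary keys and function name are my own and purely illustrative.

```python
# Raw and usable capacity (TB) for the two DD990 memory configurations,
# taken from the component list above.
configs = {
    "base_128GB": {"raw_tb": 360, "usable_tb": 285},
    "expanded_256GB": {"raw_tb": 720, "usable_tb": 570},
}

def usable_fraction(cfg):
    """Fraction of raw capacity that remains usable."""
    return cfg["usable_tb"] / cfg["raw_tb"]

for name, cfg in configs.items():
    print(f"{name}: {usable_fraction(cfg):.1%} of raw capacity is usable")
```

Both configurations come out at the same ratio, roughly 79%. The list
does not break down where the remaining capacity goes.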

DD990 chassis enclosure view:
[Figures: controller front panel view, controller back panel view,
disk shelf front view, disk shelf back view]

Software overview:
Overview:
Support for leading backup, file archiving, and email archiving
applications

Simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data Domain Boost

Inline write/read verification, continuous fault detection, and
healing

Conformance with IT governance and regulatory compliance standards
for archived data
Software components: Data Domain Operating system
Data Domain Inline Deduplication:
Data Domain performs inline deduplication. The following steps occur
during inline deduplication:

1. Inbound segments are analyzed in RAM.
2. If a segment is redundant, a reference to the stored segment is
created.
3. If a segment is unique, it is compressed and stored.

Inline deduplication requires less disk space than post-process
deduplication. It also needs less administration, because the
administrator does not have to define and monitor staging space.
Inline deduplication analyzes the data in RAM, which reduces the disk
seeks needed to determine whether new data must be stored.
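
The three steps above can be sketched in a few lines of Python. This
is a toy model, not the DD OS implementation: the class name, the use
of SHA-1 as the fingerprint, zlib as the compressor, and the
pre-split input segments are all assumptions made for illustration.

```python
import hashlib
import zlib

class InlineDedupStore:
    """Toy inline deduplicating store: the duplicate check happens
    before anything is written to 'disk'."""

    def __init__(self):
        self.index = {}  # fingerprint -> position in self.disk (kept in RAM)
        self.disk = []   # compressed unique segments (stand-in for disk)

    def write(self, segments):
        """Store a stream of segments and return the references describing it."""
        refs = []
        for seg in segments:
            fp = hashlib.sha1(seg).hexdigest()
            if fp not in self.index:
                # Unique segment: compress it and store it.
                self.disk.append(zlib.compress(seg))
                self.index[fp] = len(self.disk) - 1
            # Unique or redundant, the stream only records a reference.
            refs.append(self.index[fp])
        return refs

    def read(self, refs):
        """Rebuild the original stream from its references."""
        return b"".join(zlib.decompress(self.disk[r]) for r in refs)

store = InlineDedupStore()
refs1 = store.write([b"alpha", b"beta", b"alpha"])
refs2 = store.write([b"alpha", b"beta", b"gamma"])
print(f"segments written: 6, segments stored: {len(store.disk)}")
```

Because the lookup happens before the write, no staging area is ever
filled with duplicate data, which is the point the paragraph above
makes about inline versus post-process deduplication.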
EMC Global and Local Compression:
Global Compression:
EMC Data Domain Global Compression is the EMC Data Domain
trademarked name for global compression, local compression, and
deduplication.

Global compression equals deduplication. It identifies previously
stored segments and cannot be turned off.
Local Compression:
Local compression compresses segments before writing them to disk.
It uses common, industry-standard algorithms (for example, lz, gz,
and gzfast). The default compression algorithm used by Data Domain
systems is lz.
Local compression is similar to zipping a file to reduce the file size. Zip
is a file format used for data compression and archiving. A zip file
contains one or more files that have been compressed, to reduce file
size, or stored as is. The zip file format permits a number of
compression algorithms. Local compression can be turned off.
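
To get a feel for what local compression does to a single segment,
here is a small sketch using Python's standard zlib module. zlib is
only a stand-in: I am not claiming it matches the lz, gz, or gzfast
implementations in DD OS, and the sample data is made up.

```python
import zlib

# A highly repetitive segment compresses well; illustrative data only.
segment = b"backup block 0001 " * 500

fast = zlib.compress(segment, 1)  # lower effort, quicker
best = zlib.compress(segment, 9)  # higher effort, usually smaller

for label, blob in (("level 1", fast), ("level 9", best)):
    print(f"{label}: {len(segment)} -> {len(blob)} bytes")

# Compression is lossless: the original segment is recovered exactly.
restored = zlib.decompress(best)
```

The trade-off between the two levels (CPU time against output size) is
the same kind of choice an administrator makes when picking a local
compression algorithm.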
EMC Data Domain SISL Scaling Architecture:
The Stream-Informed Segment Layout (SISL) architecture helps to speed
up Data Domain systems.

SISL does the following:

1. Segment: The data is broken into variable-length segments.
2. Fingerprint: Each segment is given a fingerprint, or hash, for
identification.
3. Filter: The summary vector and segment locality techniques
identify 99% of the duplicate segments in RAM, inline, before storing
to disk. If a segment is a duplicate, it is referenced and discarded.
If a segment is new, the data moves on to step 4.
4. Compress: New segments are grouped and compressed using common
algorithms: lz, gz, gzfast (lz by default).
5. Write: Data (segments, fingerprints, metadata, and logs) is written
to containers, and containers are written to disk.
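
The five SISL steps can be walked through with a rough Python sketch.
Everything concrete here is an assumption for illustration: the simple
rolling value used to pick segment boundaries, SHA-1 fingerprints, a
small Bloom filter standing in for the summary vector, zlib as the
compressor, and the container layout. Segment locality is not modeled
at all.

```python
import hashlib
import zlib

def segment_stream(data, mask=0x3F, min_len=16, max_len=256):
    """Step 1: cut data into variable-length segments at content-defined points."""
    segments, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_len and (rolling & mask) == 0) or length >= max_len:
            segments.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        segments.append(data[start:])
    return segments

class SummaryVector:
    """Step 3 helper: a Bloom filter. 'No' is definite; 'maybe' needs an index lookup."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, fp):
        for k in range(self.hashes):
            yield int.from_bytes(fp[4 * k:4 * k + 4], "big") % self.bits

    def add(self, fp):
        for pos in self._positions(fp):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fp):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(fp))

def ingest(data, summary, index, containers, per_container=4):
    """Steps 2-5: fingerprint, filter, compress, and write new segments."""
    recipe, pending = [], []

    def flush():
        if pending:
            payload = zlib.compress(b"".join(seg for _, seg in pending))  # step 4
            meta = [(fp, len(seg)) for fp, seg in pending]
            containers.append((meta, payload))                            # step 5
            pending.clear()

    for seg in segment_stream(data):
        fp = hashlib.sha1(seg).digest()                 # step 2
        recipe.append(fp)
        if summary.might_contain(fp) and fp in index:   # step 3
            continue                                    # duplicate: reference only
        summary.add(fp)
        index[fp] = len(containers)  # container this segment will land in
        pending.append((fp, seg))
        if len(pending) == per_container:
            flush()
    flush()
    return recipe

summary, index, containers = SummaryVector(), {}, []
data = bytes(range(256)) * 8
first = ingest(data, summary, index, containers)
stored_after_first = len(index)
second = ingest(data, summary, index, containers)
print(f"unique segments after first pass: {stored_after_first}, "
      f"after second pass: {len(index)}")
```

Ingesting the same data a second time adds no new segments, because
every fingerprint is filtered out in RAM before the compress and write
steps are reached.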

EMC Data Domain Data Invulnerability Architecture (DIA):


The EMC Data Domain operating system (DD OS) is built for data
protection. Its elements comprise an architectural design whose goal is
data invulnerability. Four technologies within the DIA fight data loss:
1. End-to-end verification
2. Fault avoidance and containment
3. Continuous fault detection and healing
4. File system recoverability

Now let's look at each of these technologies.
1. End-to-end verification:

Steps involved in end-to-end verification:

1. A write request comes from the backup software.
2. The data is analyzed for redundancy.
3. Only new data segments are stored.
4. Fingerprints are stored and verified.
5. After the backup, DD OS verifies that it can read the data from
disk through the Data Domain file system.
6. DD OS verifies that the checksums are correct.
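
A minimal sketch of the write-then-verify idea, assuming SHA-256 as
both fingerprint and checksum and a Python list as the "disk". The
function names are hypothetical; the real DD OS verification path is
not described in enough detail here to reproduce it.

```python
import hashlib
import zlib

def write(segments, disk, index):
    """Steps 1-4: accept a write, store only new segments, record fingerprints."""
    recipe = []
    for seg in segments:
        fp = hashlib.sha256(seg).hexdigest()
        if fp not in index:                  # redundancy check
            disk.append(zlib.compress(seg))  # store new segments only
            index[fp] = len(disk) - 1
        recipe.append(fp)
    return recipe

def verify(recipe, disk, index):
    """Steps 5-6: read every segment back and recompute its checksum."""
    for fp in recipe:
        data = zlib.decompress(disk[index[fp]])
        if hashlib.sha256(data).hexdigest() != fp:
            return False
    return True

disk, index = [], {}
recipe = write([b"block-a", b"block-b", b"block-a"], disk, index)
ok_before = verify(recipe, disk, index)

# Simulate silent corruption of a stored segment, then verify again.
disk[0] = zlib.compress(b"bit rot")
ok_after = verify(recipe, disk, index)
print(ok_before, ok_after)
```

The second verification fails because the data read back no longer
matches the fingerprint recorded at write time, which is exactly the
class of problem end-to-end verification is meant to catch right after
the backup rather than at restore time.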
2. Fault avoidance and containment
Data Domain systems are equipped with a specialized log-structured
file system that has the following features:

1. New data never overwrites existing data.
2. Fewer complex data structures.
3. System includes non-volatile RAM (NVRAM) for fast, safe restart.
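
The "new data never overwrites existing data" property is easiest to
see in a tiny append-only log. The class below is a hypothetical toy,
not the Data Domain file system, and it does not model NVRAM.

```python
class AppendOnlyLog:
    """Toy log-structured store: writes only append; nothing is updated in place."""

    def __init__(self):
        self._log = []     # (key, value) records, oldest first
        self._latest = {}  # key -> position of the newest record for that key

    def put(self, key, value):
        self._log.append((key, value))          # new data goes to the end
        self._latest[key] = len(self._log) - 1  # only the pointer moves

    def get(self, key):
        return self._log[self._latest[key]][1]

    def history(self, key):
        """Every version ever written for a key, still intact in the log."""
        return [value for k, value in self._log if k == key]

log = AppendOnlyLog()
log.put("report.doc", b"version 1")
log.put("report.doc", b"version 2")
print(log.get("report.doc"), log.history("report.doc"))
```

A failed or interrupted write can only affect the record being
appended; earlier records are never touched, which is what contains
the fault.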
3. Continuous fault detection and healing
Continuous fault detection and healing provide an extra level of
protection within the Data Domain operating system. The DD OS
detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.

Continuous fault detection and healing process:


1. The Data Domain system periodically rechecks the integrity of the
RAID stripes and container logs.
2. The Data Domain system uses RAID system redundancy to heal
faults. RAID 6 is the foundation for Data Domain systems' continuous
fault detection and healing. Its dual-parity architecture offers
advantages over conventional architectures, including RAID 1
(mirroring) and the single-parity approaches of RAID 3, RAID 4, and
RAID 5.
RAID 6:
Protects against two disk failures.
Protects against disk read errors during reconstruction.
Protects against the operator pulling the wrong disk.
Guarantees RAID stripe consistency even during power failure without
reliance on NVRAM or an uninterruptible power supply (UPS).
3. During every read, data integrity is re-verified.
4. Any errors are healed as they are encountered.
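
To show why dual parity survives two disk failures, here is a sketch
of the common P+Q RAID 6 scheme over GF(2^8). This is the textbook
construction, assumed here for illustration; how DD OS implements its
software RAID 6 is not covered in this post. Block contents and
function names are made up.

```python
# GF(2^8) log/antilog tables for the polynomial 0x11D with generator 2.
EXP, LOG = [0] * 512, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 512):
    EXP[power] = EXP[power - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def compute_pq(blocks):
    """P is the XOR of all data blocks; Q weights block i by 2**i in GF(2^8)."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(EXP[i], byte)
    return bytes(p), bytes(q)

def recover_two(blocks, x, y, p, q):
    """Rebuild data blocks x and y (given as None) from the survivors plus P and Q."""
    size = len(p)
    survivors = [b if b is not None else bytes(size) for b in blocks]
    p_partial, q_partial = compute_pq(survivors)
    dx, dy = bytearray(size), bytearray(size)
    denom = EXP[x] ^ EXP[y]
    for j in range(size):
        a = p[j] ^ p_partial[j]  # equals Dx ^ Dy
        b = q[j] ^ q_partial[j]  # equals 2**x * Dx ^ 2**y * Dy
        dx[j] = gf_div(b ^ gf_mul(EXP[y], a), denom)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]  # two data disks lost
rebuilt_1, rebuilt_3 = recover_two(damaged, 1, 3, p, q)
print(rebuilt_1, rebuilt_3)
```

With single parity (P only) there is one equation per byte, so only
one missing block can be solved for. The second, independent Q parity
adds a second equation, which is what makes two simultaneous failures
recoverable.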
4. File system recoverability
File system recovery is a feature that reconstructs lost or corrupted
file system metadata.
In Data Domain file systems, data is written in a self-describing
format. The file system can be recreated by scanning the logs and
rebuilding it from the metadata stored with the data.
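
The idea of a self-describing format can be sketched as follows: each
record carries its own header, so an index that is lost can be rebuilt
by scanning the log from the start. The record layout (a 4-byte header
length, a JSON header, then the payload) and the function names are
assumptions for this example, not the on-disk format used by DD OS.

```python
import json
import struct

def append_record(log, name, payload):
    """Append a self-describing record: header length, JSON header, payload."""
    header = json.dumps({"name": name, "size": len(payload)}).encode()
    log.extend(struct.pack(">I", len(header)) + header + payload)

def rebuild_index(log):
    """Recreate name -> (offset, size) metadata by scanning the whole log."""
    index, pos = {}, 0
    while pos < len(log):
        (header_len,) = struct.unpack_from(">I", log, pos)
        header = json.loads(bytes(log[pos + 4:pos + 4 + header_len]))
        data_start = pos + 4 + header_len
        index[header["name"]] = (data_start, header["size"])
        pos = data_start + header["size"]
    return index

log = bytearray()
append_record(log, "a.txt", b"hello")
append_record(log, "b.txt", b"world!")

# Pretend the in-memory metadata was lost: rebuild it from the log alone.
index = rebuild_index(log)
offset, size = index["b.txt"]
print(bytes(log[offset:offset + size]))
```

Nothing outside the log is needed for the rebuild, which is why
metadata loss or corruption does not have to mean data loss.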
Why use a Data Domain system?

Data Domain has the following advantages:

1. Data deduplication
2. Easy integration
3. Network-efficient replication
4. Safe and reliable
