ThomasRivera Introduction To Data Protection
Author:
SNIA - Data Protection & Capacity Optimization (DPCO) Committee
Learning Objectives:
Get a basic grounding in backup and restore technology including tape, disk,
snapshots, deduplication, virtual tape, and replication technologies
Compare and contrast backup and restore alternatives to achieve data
protection and data recovery
Identify and define backup and restore operations and terms
Held in the balance are concepts like the value of the data (data
importance or business criticality), budget, speed, and cost of downtime
Detection
Corruption or failure reported
Diagnosis / Decision
What went wrong?
What recovery point should be used?
What method of recovery should be used -- overall strategy for the recovery?
Restoration
Moving the data from backup to primary location
From tape to disk, disk to disk, or cloud to disk
Restore the lost or corrupted information from the backup or archive (source) to the primary or production disks
Recovery – Almost done!
Application environment - perform standard recovery and startup operations
Any additional steps
Replay log may be applied to a database
Journals may be replayed for a file system
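The "What recovery point should be used?" question in the Diagnosis / Decision step can be illustrated with a minimal Python sketch. It assumes the simplest policy, which is to pick the newest backup taken before the corruption occurred. The function name and the sample timestamps are hypothetical and are not part of the tutorial.

```python
from datetime import datetime

def choose_recovery_point(backup_times, corruption_time):
    """Return the newest backup taken strictly before the corruption, or None."""
    candidates = [t for t in backup_times if t < corruption_time]
    return max(candidates) if candidates else None

# Hypothetical nightly backups at 02:00
backups = [
    datetime(2015, 3, 1, 2, 0),
    datetime(2015, 3, 2, 2, 0),
    datetime(2015, 3, 3, 2, 0),
]
corrupted_at = datetime(2015, 3, 2, 14, 30)

recovery_point = choose_recovery_point(backups, corrupted_at)
print(recovery_point)  # 2015-03-02 02:00:00
```

Everything written between the chosen recovery point and the corruption is lost unless a log or journal replay, as listed under Recovery, can roll the data forward.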
[Figure: timeline axis labeled Years, Days, Hrs, Mins, Secs | Secs, Mins, Hrs, Days, ????. Technologies shown: Tape Backups, Vaults, Archival, Disk Backups, Snapshots, Capture on Write, Data Replication, Synthetic Backup, Cloud Backup, Instant Recovery, Point-in-Time Recovery, Roll Back, Restore from Tape, Disk, Cloud, Search & Retrieve]
Cold
Offline image of all data
As backup window shrinks & data size expands, cold backup becomes untenable
Cheapest and simplest way to back up data
Application Consistent
Application supports ability to take parts of data set offline during backup
Application knows how to recover from a collection of consistent pieces
Avoids downtime due to backup window
Crash Consistent or Atomic
Data copied or frozen at the exact same moment across entire dataset
Application recovery from an atomic backup similar to an application failover
Rebuilding may be needed
No backup window
Introduction to Data Protection 10
Approved SNIA Tutorial © 2015 Storage Networking Industry Association. All Rights Reserved.
Data Protection Design Trade-offs
[Diagrams: backup topologies. Components shown: Network Clients, Application Hosts, Backup Hosts, LAN, WAN, Cloud, Agent, Application Server, Backup Server, Media Server, Catalog, SAN / SCSI, Data and Metadata paths, Mirror, Snapshot, Data Mover, Secondary Storage]
Full Backup
Everything copied to backup (cold or hot backup)
Full view of the volume at that point in time
Restoration is straightforward, as all data is available in one backup image
Huge resource consumption (server, network, tapes)
Incremental Backup
Only the data that changed since the last full or incremental backup
Changed files are commonly identified by a change in the archive bit
Usually requires multiple increments and previous full backup to do full restore
Much less data is transferred
Differential backup
All of the data that changed from the last full backup
Usually less data is transferred than a full
Usually less time to restore full dataset than incremental
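The restore-time difference between the two schemes can be sketched in Python. This is a minimal illustration and not a backup product's logic: the function name and the day labels are hypothetical, and it assumes a schedule uses either incrementals or differentials between fulls, not both.

```python
def backups_needed_for_restore(schedule):
    """Return the backup labels required to restore to the latest backup.

    schedule: chronological list of (label, kind), where kind is
    "full", "incremental" or "differential". Raises ValueError when
    the schedule contains no full backup.
    """
    last_full = max(i for i, (_, kind) in enumerate(schedule) if kind == "full")
    needed = [schedule[last_full][0]]
    after_full = schedule[last_full + 1:]
    incrementals = [label for label, kind in after_full if kind == "incremental"]
    differentials = [label for label, kind in after_full if kind == "differential"]
    needed.extend(incrementals)        # every increment since the full
    needed.extend(differentials[-1:])  # only the newest differential
    return needed

incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]

print(backups_needed_for_restore(incremental_week))   # ['Sun', 'Mon', 'Tue', 'Wed']
print(backups_needed_for_restore(differential_week))  # ['Sun', 'Wed']
```

The incremental schedule moves less data each night but needs the whole chain at restore time, while the differential schedule needs only two images.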
Incremental Forever
One initial full backup, followed only by incrementals
File-level backups
Any change to a file will cause entire file to be backed up
Open files often require special handling software
Open files may get passed over – measure the risks
PRO: Ease of backup and restore
CON: Moves large amounts of data
Block-level backups
Only the blocks that change in a file are saved
Requires client-side processing to discover changed blocks
PRO: Smaller backups, Less network impact, Faster
CON: Client-side impact, increased complexity
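One way the client-side discovery of changed blocks might work is to compare per-block hashes against those recorded at the previous backup. The Python sketch below assumes fixed 4096-byte blocks and SHA-256 digests. Both are illustrative choices, the function names are hypothetical, and the sketch does not handle files that shrink.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only

def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def changed_blocks(previous_hashes, data, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that are new or differ."""
    changed = {}
    for index, digest in enumerate(block_hashes(data, block_size)):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = data[index * block_size:(index + 1) * block_size]
    return changed

original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
modified = b"A" * 4096 + b"X" * 4096 + b"C" * 4096

delta = changed_blocks(block_hashes(original), modified)
print(sorted(delta))  # [1]
```

Only one of the three blocks is sent, which is the "smaller backups" benefit. The hashing work done on the client is the "client-side impact" cost.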
Client-side backups
Intelligent agent monitors changes and protects only new blocks
Agent enables advanced technology, granular backups and user policies
Deduplication can enable network efficiency and reduce backup data volume
PRO: Efficiently distributes work
CON: Complex client/server
Backup to Tape, Disk and Beyond
Tape drives run faster than most backup jobs – Is this good?
Matching backup speed is more important than exceeding it
Avoid shoe-shining
Slower hosts can tie up an expensive drive
It’s a shame to waste a drive on these hosts
Slower tapes can tie up expensive (important) servers
It is a shame to let the tape drive throttle backup servers
Slow backup can impact production servers as well
Replacing your tapes may not solve your backup challenges
A well designed backup architecture is the best answer
If backup target speed is your issue:
Consider alternatives such as virtual tape (VTL) or D2D2T
Security, security, security...
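The speed-matching point above can be expressed as a small Python check. This is a sketch under stated assumptions: it assumes a drive that can stream at any rate between a minimum and a maximum, and the rates used in the example are hypothetical rather than taken from any drive specification.

```python
def classify_match(feed_mb_s, drive_min_mb_s, drive_max_mb_s):
    """Compare a host's backup feed rate with a drive's streaming range."""
    if feed_mb_s < drive_min_mb_s:
        # The drive must stop, rewind and restart while waiting for data
        return "shoe-shining"
    if feed_mb_s > drive_max_mb_s:
        # The drive throttles the backup server
        return "drive-bound"
    return "matched"

# Hypothetical drive that streams between 40 and 160 MB/s
for feed in (20, 100, 300):
    print(feed, classify_match(feed, 40, 160))
```

With these assumed numbers, the 20 MB/s host risks shoe-shining, the 100 MB/s host is matched, and the 300 MB/s host is held back by the drive.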
Backup to Disk / Disk to Disk Backup
What?
Disk as a primary backup target
Why?
Performance and reliability
Reduced backup window
Greatly improved restores
RAID protection
Eliminate mechanical interfaces
More effective sharing of backup targets
Considerations
Fibre Channel Disks versus SATA versus SAS
I/O random access vs. MB/s sequential
SAN, NAS or DAS
VTL or mirroring
Consider a mix of Disk and Tape (D2D2T)
Consider a capacity-optimized appliance
[Diagram labels: LAN, Backup Server, SAN, Disk Target, Tape Library]
What:
Virtual Tape Libraries emulate traditional tape
Fits within existing backup environment
Reduce / eliminate tape handling
Why:
Considerations:
Easy to manage in traditional backup software environment
Can extend the life of current physical tape investment
[Diagram labels: Backup, VTL, IP / FC SAN]
Introduction to CDP
What:
Continuous Data Protection
Capture every change as it occurs
Block-based
File-based
Application-based
[Diagram labels: App Server, Path, Protect, Record of Updates]
Implementations of true CDP today are delivering zero data loss, zero backup
window and simple recovery; CDP customers can protect all data at all times and
recover directly to any point in time
“Near CDP” (Snapshots, checkpoints) may also help but will not catch every change
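The idea of capturing every change and recovering to any point in time can be sketched as a block-level change journal in Python. The class and method names are hypothetical, timestamps are plain integers, and entries are assumed to arrive in time order. Real CDP products differ in how they store and index the record of updates.

```python
class ChangeJournal:
    """Minimal record of updates: every block write is kept with its timestamp."""

    def __init__(self):
        self._entries = []  # (timestamp, block_number, data), in time order

    def record(self, timestamp, block_number, data):
        self._entries.append((timestamp, block_number, data))

    def image_at(self, point_in_time):
        """Rebuild the block image as it was at the given time by replaying writes."""
        image = {}
        for timestamp, block_number, data in self._entries:
            if timestamp > point_in_time:
                break
            image[block_number] = data
        return image

journal = ChangeJournal()
journal.record(1, 0, b"v1")
journal.record(2, 1, b"hello")
journal.record(3, 0, b"v2")
journal.record(4, 0, b"corrupt")

print(journal.image_at(3))  # {0: b'v2', 1: b'hello'}
```

Recovering to time 3 skips the corrupting write at time 4. A snapshot taken only at time 2 ("near CDP") would have missed the valid update at time 3.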
What?
A disk based “instant copy” that captures the original data at a specific point in time
Snapshots can be read-only or read-write.
Also known as Checkpoint, Point-in-Time, Stable Image, Clone
Often handled at the storage level
May be done at application server, hypervisor, and/or in cloud
Why?
Allows for complete backup or restore
With application downtime measured in minutes (or less)
May be able to be combined with replication
Most vendors: image-level only (entire volume)
Backup/Restore of individual files is possible
If conventional backup is done from snapshot
Or, if file-map is stored with Image backup
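A point-in-time copy can be created instantly if the original blocks are preserved only when they are about to be overwritten. The Python sketch below assumes this copy-on-write approach, which is one common snapshot technique and is not named in the tutorial. The class and method names are hypothetical.

```python
class Volume:
    """Toy volume with read-only, copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Create a snapshot instantly; no data is copied yet."""
        snap = {}  # block index -> original contents, filled on first overwrite
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        for snap in self.snapshots:
            # Preserve the old block the first time it is overwritten
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        """Read a block as it was when the snapshot was taken."""
        return snap.get(index, self.blocks[index])

vol = Volume([b"a", b"b", b"c"])
snap = vol.snapshot()
vol.write(1, b"B2")

print(vol.blocks)                                      # [b'a', b'B2', b'c']
print([vol.read_snapshot(snap, i) for i in range(3)])  # [b'a', b'b', b'c']
```

A conventional backup can then read from the snapshot view while the application keeps writing to the live volume.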
What?
The process of examining a data-set or I/O stream at the sub-file level and
storing and/or sending only unique data
Client-side software, target-side hardware or software, or both client and target
Why?
Reduction in cost per terabyte stored
Significant reduction in storage footprint
Less network bandwidth required
Check out SNIA Tutorial: Advanced Deduplication Concepts
Considerations
Greater amount of data stored in less physical space
Suitable for backup, archive and (maybe) primary storage
Enables lower cost replication for offsite copies
Store more data for longer periods
Beware 1000:1 dedupe claims – Know your data and use case
Multiple performance trade-offs
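Storing only unique sub-file data can be sketched in Python with fixed-size chunks identified by their SHA-256 digest. The chunk size, the hash choice, and the function names are assumptions made for illustration. Products may use variable-size chunking and other fingerprinting schemes.

```python
import hashlib

def deduplicate(data, chunk_size=8):
    """Split data into fixed-size chunks and keep one copy of each unique chunk."""
    store = {}   # digest -> chunk bytes (the unique data actually stored)
    recipe = []  # ordered digests needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe

def rehydrate(store, recipe):
    """Rebuild the original data from the chunk store and recipe."""
    return b"".join(store[digest] for digest in recipe)

data = b"ABCDEFGH" * 3 + b"12345678"
store, recipe = deduplicate(data)

print(len(recipe), len(store))  # 4 2
```

Here four chunks reduce to two stored chunks, a 2:1 ratio. The ratio depends entirely on how repetitive the data is, which is why the slide advises knowing your data and use case.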
Related tutorials
Active Archive – Data Protection for the Data Center
Advanced Deduplication Concepts
Trends in Data Protection and Restoration Technologies
Understanding Data Deduplication
Retaining Information for 100 Years
Visit the Data Protection and Capacity Optimization
Committee website http://www.snia.org/dpco/
DPCO online Product Selection Guide
http://sniadataprotectionguide.org/