Data Protection and DR
Storage policies are the core of HyperStore data protection and DR.
It is through the use of storage policies that data durability and availability are
set.
Again, it is important to point out here that a single HyperStore cluster can
support multiple storage policies.
So, it is extremely flexible, allowing you to choose the durability and
availability for whatever data you are storing. By default,
a maximum of 25 storage policies can be created.
However, this is a soft limit and can be increased if needed.
Through the use of protection policies, we can set the data protection via
replication or erasure coding, and decide whether data should reside in a single
site or across multiple sites to extend your DR capability.
Through the use of consistency levels, it is possible to strike a good balance
between availability and a guaranteed level of data consistency.
It is through the storage policy that we can also set whether the data is stored in
a compressed format or encrypted.
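As a purely illustrative sketch, and not the actual HyperStore policy syntax, the
dimensions a storage policy controls might be summarized like this; all names and
values here are hypothetical:

    # Illustrative only, not the HyperStore API: the knobs a storage policy sets.
    example_policy = {
        "name": "example-policy",                        # hypothetical policy name
        "protection": {"scheme": "ec", "k": 4, "m": 2},  # or {"scheme": "replication", "rf": 3}
        "layout": "single-site",                         # or "multi-site" to extend DR capability
        "consistency": {"read": "quorum", "write": "quorum"},
        "compression": True,
        "encryption": True,
    }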
HyperStore does not support deduplication. We have discussed at a high level the
two data protection mechanisms that we employ and these can be seen here.
Replication, and our parity-based erasure coding option.
Any number of replicas can be configured, but it is dependent on the total number
of nodes in the cluster.
Erasure coding schemes available will also be dependent on the total number of
nodes in the cluster.
We will discuss the pros and cons of each in the following slide.
Data durability in terms of data protection refers to the ability of the stored
data to not be affected by bit rot, degradation, or other corruption.
Rather than hardware redundancy, it is concerned with data redundancy, so that data
is never lost or compromised.
Data durability is not new.
And as mentioned in earlier modules, RAID is a data protection mechanism used in
block and file, and depending on the RAID protection, can be used to provide a
different data durability target.
AWS, which uses a replication factor of four, i.e. four separate copies of the
data, provides eleven nines of data durability.
Or, in real terms, you can expect to lose no more than 0.000000001% of your data on
an annual basis.
Sounds pretty good, right?
HyperStore can extend that durability even further by allowing as many copies as
you have nodes.
Or, through the use of large stripes with a large number of parity parts, increasing
durability to 13 or 14 nines and beyond.
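As a quick sketch in plain Python, not Cloudian tooling, the relationship between
"nines" of durability and the expected annual data loss quoted above works out as
follows:

    # A minimal sketch: expected annual loss fraction for a given number of "nines"
    # of durability, e.g. 11 nines = 99.999999999% durable.
    def annual_loss_fraction(nines: int) -> float:
        return 10.0 ** -nines  # the complement of the durability figure

    for n in (11, 13, 14):
        print(f"{n} nines of durability -> about {annual_loss_fraction(n):.0e} of data lost per year")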
A key question is when to use replication and when to use erasure coding.
This isn't always an easy question to answer, as there tends to be a business
decision to be made to determine the data durability required.
Sometimes though, the decision is a little easier.
For small clusters of three or four nodes, replication is the only data
protection mechanism open to you.
Also, for small objects, the overhead of splitting a small object into a number of
fragments, creating coding fragments, and then storing them on a file system using
an 8K block size, would not only waste CPU resources, but also consume significantly
more capacity on disk than a comparable replication factor.
For example, a single 4K object being stored using EC4 plus 2 would require 6
fragments.
Each fragment would be 1K.
However, as the block size on disk is 8K, it would require 6x8K to store that data,
or 48K.
A comparable replication factor with the same tolerance to failure as EC 4 plus 2
would be to store three copies, or RF3.
Each 4K copy would be stored using an 8K block size on disk, or 24K in total.
As you can see here, replication uses half the amount of storage that erasure
coding does in this case, when really, erasure coding is there to provide the exact
opposite, i.e. more efficient use of storage capacity than replication.
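The arithmetic for this small-object example can be sketched as follows, assuming,
as above, an 8K file system block size:

    import math

    BLOCK_KB = 8  # assumed file system block size, as in the example above

    def on_disk_kb(piece_kb: float) -> int:
        # every stored piece occupies at least one whole block
        return math.ceil(piece_kb / BLOCK_KB) * BLOCK_KB

    object_kb = 4
    ec_total = 6 * on_disk_kb(object_kb / 4)   # EC 4+2: six 1K fragments -> 48K on disk
    rf_total = 3 * on_disk_kb(object_kb)       # RF3: three 4K copies     -> 24K on disk
    print(ec_total, rf_total)                  # 48 24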
We support multiple data protection and distribution options, all within a single
HyperStore cluster.
These are shown here, and are replication or erasure coding in a single DC,
replication of copies across DCs, replication of erasure coded data between DCs,
and lastly, the ability to create a single erasure coding scheme across three or
more DCs.
However, for this last option, very low latency network links are required between
all of the DCs, as well as adequate bandwidth to support the amount of data being
stored.
The CAP theorem states that, in a distributed data store, of the three properties,
consistency, availability, and partition tolerance, you can have two.
Cassandra is regarded as an AP database being designed around availability and
partition tolerance, and Redis is regarded as a CP database, being designed to
provide consistency and partition tolerance.
Cloudian HyperStore is designed to use a combination of data protection and
consistency levels to support and provide all 3 capabilities.
Data consistency refers to how up-to-date and in-sync data is in a distributed
system.
Strong consistency ensures that all data returned back to the client is the same as
the data that was stored and returned in the same order.
It is similar to synchronous replication, which is utilized in block and file
storage.
Eventual consistency for writes does not guarantee that all copies or fragments, as
defined in the storage policy, are written before success is given back to the
client.
However, it does guarantee that a majority of replicas, or at least one parity
protection fragment, is written, so that data is not stored unprotected, and it will
eventually write all replicas or fragments as soon as it is able.
The following slide shows the difference between a write consistency of all versus
a write consistency of quorum. With all, all writes must complete before the write
is acknowledged back to the client.
This is, however, at the expense of availability, as any failure of a node where the
write was destined would result in a failed write back to the client. With quorum,
as long as a majority of writes, more than half, have been successful, the write
will be acknowledged back to the client at that point.
The system will still continue to write the remaining copies and fragments if it is
able to, but a performance benefit, although small, is still seen, as the
acknowledgement is returned to the client sooner.
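As a minimal sketch, not HyperStore code, the quorum rule for replicated data
described above is simply a majority count:

    def quorum(copies: int) -> int:
        # a majority of the copies, i.e. more than half
        return copies // 2 + 1

    for rf in (3, 4, 5):
        print(f"RF{rf}: acknowledge the write after {quorum(rf)} of {rf} copies succeed")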
Consistency level is equally important in a multi-DC environment due to the risk of
higher latency and/or slower network response across geographic distances. Choosing
a consistency level of all, where data has to be written in both locations before
acknowledging to the client, does introduce a higher possibility of a failed write.
Consideration also needs to be given to any consistency level where the data
between locations is not checked for consistency.
It is possible, where a local consistency level is employed, that stale data might
be returned back to the client, so this has to be seriously considered before doing
so.
We have mentioned erasure coding before, but let's now look a bit deeper into what
erasure coding is and how it provides efficiency and tolerance to failure.
Erasure coding is defined as a number of data disks and a number of parity disks.
The data disks we refer to with the letter K, and the parity or coding disks we
refer to with the letter M.
Each fragment written to a disk we refer to as a strip, with the combination of
strips referred to as a stripe. Erasure coding schemes are given in the form K plus M.
Our implementation of erasure coding is based on Reed-Solomon error correction,
which gives better utilization of usable storage versus replicas.
The M figure is how we refer to the tolerance to failure while still being able to
read or regenerate the data.
As each strip is always on a different disk and node from the other strips, the plus
number is the tolerance to failure.
So, 4 plus 2 can survive two disk or node failures, 6 plus 3 can survive three, and
8 plus 4 can survive four.
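A small sketch, in illustrative Python rather than Cloudian code, of the K plus M
relationships just described:

    from typing import NamedTuple

    class ECScheme(NamedTuple):
        k: int  # data strips
        m: int  # parity/coding strips

        def failures_tolerated(self) -> int:
            return self.m  # strips that can be lost while data remains readable

        def min_strips_to_read(self) -> int:
            return self.k  # any K of the K+M strips can rebuild the object

    for s in (ECScheme(4, 2), ECScheme(6, 3), ECScheme(8, 4)):
        print(f"{s.k}+{s.m}: tolerates {s.failures_tolerated()} disk or node failures")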
Let's review some of the data protection mechanisms available.
Replication factor is multiple copies of the object with usable storage being
defined by raw terabytes divided by the RF number.
We can also create erasure coding policies which are limited to a single site.
The usable storage here is calculated as K divided by K plus M, multiplied by the
raw terabytes. In multi-DC environments, we can employ replicated erasure coding.
With replicated EC, the storage policy only allows for one erasure coding scheme.
So, all DCs configured within that storage policy have to have the same erasure
coding scheme.
You cannot mix erasure coding schemes in the same replicated erasure coding storage
policy.
The usable storage here would be the same calculation as for the single-site EC,
but divided by the number of DCs involved in the storage policy.
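The usable-storage arithmetic above can be sketched as follows; this is a rough
illustration, and raw_tb is assumed to be the total raw capacity across all DCs in
the policy:

    def usable_rf(raw_tb: float, rf: int) -> float:
        return raw_tb / rf  # replication: raw capacity divided by the copy count

    def usable_ec(raw_tb: float, k: int, m: int, dcs: int = 1) -> float:
        # single-site EC with dcs=1; replicated EC divides again by the number of DCs
        return raw_tb * k / (k + m) / dcs

    print(usable_rf(600, 3))        # 200.0 TB usable from 600 TB raw with RF3
    print(usable_ec(600, 4, 2))     # 400.0 TB usable with EC 4+2 in one DC
    print(usable_ec(600, 4, 2, 2))  # 200.0 TB usable with EC 4+2 replicated across two DCs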
And lastly, distributed EC. This allows for the usage of a single erasure coding
scheme, but is spread across 3 or more DCs.
This allows for a complete data center failure while still ensuring protection of
the data.
This also maintains a good level of efficiency, as we are not losing efficiency by
replicating the data across multiple sites.
This data protection method however does require very low latency networks between
sites.
Please note the low latency.
Although high bandwidth is also important, it is the network latency which is
critical here in distributed EC.
In storage policies involving multiple DCs, there are some considerations when it
comes to applying a consistency level.
These can be seen here and are referred to as local or each.
With local, the data must be able to be written in the DC where the coordinator
node for the request is located.
With each, the data must be able to be written in each of the DCs to be successful.
This is further extended by whether the consistency level is for all fragments, K
plus M, or only a quorum, K plus 1.
This applies to writes only; for reads, only K fragments need to be read to be
successful.
In all cases, K plus M fragments will eventually be written to each DC.
As mentioned, distributed EC can only be employed in three or more DCs and some of
the recommended EC policies can be seen here for the number of participating DCs.
The design rules which must be applied are: a minimum of three DCs, nine nodes, and
a five plus four EC scheme; the K plus M total must be evenly divisible by the
number of DCs; and losing a DC must still leave a number of fragments equal to or
more than K plus 1.
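These design rules can be sketched as a simple check, in illustrative Python only,
assuming one fragment per node, spread evenly across the DCs:

    def valid_distributed_ec(k: int, m: int, dcs: int, nodes: int) -> bool:
        total = k + m
        if dcs < 3 or nodes < 9:
            return False  # minimum of three DCs and nine nodes
        if total % dcs != 0:
            return False  # K+M must divide evenly by the number of DCs
        remaining_after_dc_loss = total - total // dcs
        return remaining_after_dc_loss >= k + 1  # a DC failure must still leave K+1 fragments

    print(valid_distributed_ec(5, 4, dcs=3, nodes=9))  # True: the minimum configuration
    print(valid_distributed_ec(4, 2, dcs=3, nodes=9))  # False: a DC loss leaves 4 fragments, below K+1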
The following examples show how distributed EC works in a multi DC configuration.
Consider a three DC configuration with 12 nodes, four nodes in each DC, using an
erasure coding scheme of seven plus five.
In the event of a DC failure you would lose four of your 12 nodes leaving eight
nodes remaining.
With an erasure coding scheme of seven plus five you need 12 nodes to store the
seven data parts K and five coding parts M as discussed earlier.
You need to have K fragments available to be able to read or regenerate the data
and K plus one nodes available for a successful write.
Assuming we are using a consistency level of quorum we have eight remaining nodes
which is more than the required seven for reads and meets the minimum K plus one
for writes.
So we can lose a DC and still operate with no impact to service.
If we consider the bottom example, although this configuration would not be valid
and you would be unable to configure such a cluster, we use it here to show the
impact.
We have 16 nodes across four DCs with each DC having four nodes.
We are using an EC scheme of 12 plus four.
So all nodes in the cluster will have one fragment of the stored object.
If we lose a DC we have 12 remaining nodes.
In this instance we would still be able to read our data as we need K parts which
in a 12 plus four scheme is 12.
And we have 12 remaining nodes.
However, we would not be able to write data, as we would need K plus one, or 13
nodes, for a successful write, and we only have 12.
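Both examples can be checked with the same arithmetic; again, this is just an
illustrative sketch:

    def dc_loss_impact(k: int, m: int, dcs: int) -> str:
        total = k + m
        remaining = total - total // dcs  # fragments left after losing one DC
        reads = "OK" if remaining >= k else "FAIL"
        writes = "OK" if remaining >= k + 1 else "FAIL"
        return f"EC {k}+{m} across {dcs} DCs: {remaining} fragments remain, reads {reads}, writes {writes}"

    print(dc_loss_impact(7, 5, 3))    # 8 remain: reads OK (need 7), writes OK (need 8)
    print(dc_loss_impact(12, 4, 4))   # 12 remain: reads OK (need 12), writes FAIL (need 13)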
Cassandra is the database where we store our metadata.
Cassandra can only store copies of the data and does not itself understand or have
a concept of erasure coding.
With a data protection scheme using replication the number of metadata copies we
store is equal to the number of data copies.
So a policy of RF3 would store three copies of the data and three copies of the
metadata.
But how do we know how many copies of metadata to store when using an erasure
coding policy?
We use a formula to determine the number of metadata copies which is 2M plus one.
Or in other words twice the number of coding parts plus one.
In this example of an EC scheme of three plus two, which is the minimum EC scheme
we can support, we store three data parts K and two coding or parity parts M, and
two times two plus one, or five, metadata copies.
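The metadata copy counts described above can be sketched as follows, in
illustrative Python only:

    def metadata_copies_ec(m: int) -> int:
        return 2 * m + 1  # twice the number of coding parts, plus one

    def metadata_copies_rf(rf: int) -> int:
        return rf         # replication: metadata copies equal data copies

    print(metadata_copies_ec(2))   # EC 3+2 -> 5 metadata copies
    print(metadata_copies_rf(3))   # RF3    -> 3 metadata copies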
This is explained further in this slide and shows the difference between an all
versus a quorum consistency level.
Metadata is also subject to the consistency level.
Therefore, if we were using a consistency level of all, then all K plus M parts
would need to be written, as well as all five metadata copies.
If quorum, then only K plus one data and coding parts would need to be written,
along with a majority of the metadata copies.
The majority of five is of course three, which is just one more than half, rounded
down.
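Putting the rules together, the parts and metadata copies that must be written
before acknowledging can be sketched like this, for an EC policy, as an
illustration only:

    def write_requirements(k: int, m: int, level: str) -> tuple:
        meta = 2 * m + 1                # metadata copies kept for an EC policy
        if level == "all":
            return k + m, meta          # every data/coding part and every metadata copy
        if level == "quorum":
            return k + 1, meta // 2 + 1 # K+1 parts and a majority of metadata copies
        raise ValueError(level)

    print(write_requirements(3, 2, "all"))     # (5, 5) for EC 3+2
    print(write_requirements(3, 2, "quorum"))  # (4, 3) for EC 3+2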