Data Protection and DR
Storage policies are the core of HyperStore data protection and DR.
It is through the use of storage policies that data durability and availability are
set.
Again, it is important to point out here that a single HyperStore cluster can
support multiple storage policies.
So, it is extremely flexible, allowing you to choose the durability and
availability for whatever data you are storing. By default,
a maximum of 25 storage policies can be created.
However, this is a soft limit and can be increased if needed.
Through the use of protection policies, we can set the data protection via
replication or erasure coding, and decide whether data should reside in a single
site or across multiple sites to extend your DR capability.
Through the use of consistency levels, it is possible to strike a good balance
between availability and a guaranteed level of data consistency.
It is through the storage policy that we can also set whether the data is stored in
a compressed format or encrypted.
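As a purely illustrative sketch, and not the actual HyperStore policy syntax, the
dimensions a storage policy controls might be summarized like this; all names and
values here are hypothetical:

    # Illustrative only, not the HyperStore API: the knobs a storage policy sets.
    example_policy = {
        "name": "example-policy",                        # hypothetical policy name
        "protection": {"scheme": "ec", "k": 4, "m": 2},  # or {"scheme": "replication", "rf": 3}
        "layout": "single-site",                         # or "multi-site" to extend DR capability
        "consistency": {"read": "quorum", "write": "quorum"},
        "compression": True,
        "encryption": True,
    }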
HyperStore does not support deduplication. We have discussed at a high level the
two data protection mechanisms that we employ and these can be seen here.
Replication, and our parity-based erasure coding option.
Any number of replicas can be configured, but it is dependent on the total number
of nodes in the cluster.
Erasure coding schemes available will also be dependent on the total number of
nodes in the cluster.
We will discuss the pros and cons of each in the following slide.
Data durability in terms of data protection refers to the ability of the stored
data to not be affected by bit rot, degradation, or other corruption.
Rather than hardware redundancy, it is concerned with data redundancy, so that data
is never lost or compromised.
Data durability is not new.
And as mentioned in earlier modules, RAID is a data protection mechanism used in
block and file, and depending on the RAID protection, can be used to provide a
different data durability target.
AWS, which uses a replication factor of four, i.e. four separate copies of the
data, provides eleven nines of data durability.
Or, in real terms, you can expect to lose no more than 0.000000001% of your data on
an annual basis.
Sounds pretty good, right?
HyperStore can extend that durability even further by allowing as many copies as
you have nodes.
Or, through the use of large stripes with a large number of parity parts, increasing
durability to 13 or 14 nines and beyond.
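As a quick sketch in plain Python, not Cloudian tooling, the relationship between
"nines" of durability and the expected annual data loss quoted above works out as
follows:

    # A minimal sketch: expected annual loss fraction for a given number of "nines"
    # of durability, e.g. 11 nines = 99.999999999% durable.
    def annual_loss_fraction(nines: int) -> float:
        return 10.0 ** -nines  # the complement of the durability figure

    for n in (11, 13, 14):
        print(f"{n} nines of durability -> about {annual_loss_fraction(n):.0e} of data lost per year")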
A key question is when to use replication and when to use erasure coding.
This isn't always an easy question to answer, as there tends to be a business
decision to be made to determine the data durability required.
Sometimes though, the decision is a little easier.
For small clusters of three or four nodes, replication is the only data
protection mechanism open to you.
Also, for small objects, the overhead of splitting a small object into a number of
fragments, creating coding fragments, and then storing them on a file system using
an 8K block size, would not only waste CPU resources, but also consume significantly
more capacity on disk than a comparable replication factor.
For example, a single 4K object being stored using EC4 plus 2 would require 6
fragments.
Each fragment would be 1K.
However, as the block size on disk is 8K, it would require 6x8K to store that data,
or 48K.
A comparable replication factor with the same tolerance to failure as EC 4 plus 2
would be to store three copies, or RF3.
Each 4K copy would be stored using an 8K block size on disk, or 24K in total.
As you can see here, replication uses half the amount of storage that erasure
coding does in this case, when really, erasure coding is there to provide the exact
opposite, i.e. more efficient use of storage capacity than replication.
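The arithmetic for this small-object example can be sketched as follows, assuming,
as above, an 8K file system block size:

    import math

    BLOCK_KB = 8  # assumed file system block size, as in the example above

    def on_disk_kb(piece_kb: float) -> int:
        # every stored piece occupies at least one whole block
        return math.ceil(piece_kb / BLOCK_KB) * BLOCK_KB

    object_kb = 4
    ec_total = 6 * on_disk_kb(object_kb / 4)   # EC 4+2: six 1K fragments -> 48K on disk
    rf_total = 3 * on_disk_kb(object_kb)       # RF3: three 4K copies     -> 24K on disk
    print(ec_total, rf_total)                  # 48 24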
We support multiple data protection and distribution options, all within a single
HyperStore cluster.
These are shown here, and are replication or erasure coding in a single DC,
replication of copies across DCs, replication of erasure coded data between DCs,
and lastly, the ability to create a single erasure coding scheme across three or
more DCs.
However, for this last option, very low latency network links are required between
all of the DCs, as well as adequate bandwidth to support the amount of data being
stored.
The CAP theorem states that, in a distributed data store, of the three properties,
consistency, availability, and partition tolerance, you can have two.
Cassandra is regarded as an AP database being designed around availability and
partition tolerance, and Redis is regarded as a CP database, being designed to
provide consistency and partition tolerance.
Cloudian HyperStore is designed to use a combination of data protection and
consistency levels to support and provide all 3 capabilities.
Data consistency refers to how up-to-date and in-sync data is in a distributed
system.
Strong consistency ensures that all data returned back to the client is the same as
the data that was stored and returned in the same order.
It is similar to synchronous replication, which is utilized in block and file
storage.
Eventual consistency for writes does not guarantee that all copies or fragments, as
defined in the storage policy, are written before success is given back to the
client.
However, it does guarantee that a majority of replicas, or at least one parity
protection fragment, is written, so that data is not stored unprotected, and it will
eventually write all replicas or fragments as soon as it is able.
The following slide shows the difference between a write consistency of all versus
a write consistency of quorum. With all, all writes must complete before the write
is acknowledged back to the client.
This is, however, at the expense of availability, as any failure of a node where the
write was destined would result in a failed write back to the client. With quorum,
as long as a majority of writes, more than half, have been successful, the write
will be acknowledged back to the client at that point.
The system will still continue to write the remaining copies and fragments if it is
able to, but a performance benefit, although small, is still seen, as the
acknowledgement is returned to the client sooner.
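As a minimal sketch, not HyperStore code, the quorum rule for replicated data
described above is simply a majority count:

    def quorum(copies: int) -> int:
        # a majority of the copies, i.e. more than half
        return copies // 2 + 1

    for rf in (3, 4, 5):
        print(f"RF{rf}: acknowledge the write after {quorum(rf)} of {rf} copies succeed")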
Consistency level is equally important in a multi-DC environment due to the risk of
higher latency and/or slower network response across geographic distances. Choosing
a consistency level of all, where data has to be written in both locations before
acknowledging to the client, does introduce a higher possibility of a failed write.
Consideration also needs to be given to any consistency level where the data
between locations is not checked for consistency.
It is possible, where a local consistency level is employed, that stale data might
be returned back to the client, so this has to be seriously considered before doing
so.
We have mentioned erasure coding before, but let's now look a bit deeper into what
erasure coding is and how it provides efficiency and tolerance to failure.
Erasure coding is defined as a number of data disks and a number of parity disks.
The data disks we refer to with the letter K, and the parity or coding disks we
refer to with the letter M.
Each fragment written to a disk we refer to as a strip, with the combination of
strips referred to as a stripe. Erasure coding schemes are given in the form K plus M.
Our implementation of erasure coding is based on Reed-Solomon error correction,
which gives better utilization of usable storage versus replicas.
The M figure is how we refer to the tolerance to failure while still being able to
read or regenerate the data.
As each strip is always on a different disk and node from the other strips, the plus
number is the tolerance to failure.
So, 4 plus 2 can survive two disk or node failures, 6 plus 3 can survive three, and
8 plus 4 can survive four.
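A small sketch, in illustrative Python rather than Cloudian code, of the K plus M
relationships just described:

    from typing import NamedTuple

    class ECScheme(NamedTuple):
        k: int  # data strips
        m: int  # parity/coding strips

        def failures_tolerated(self) -> int:
            return self.m  # strips that can be lost while data remains readable

        def min_strips_to_read(self) -> int:
            return self.k  # any K of the K+M strips can rebuild the object

    for s in (ECScheme(4, 2), ECScheme(6, 3), ECScheme(8, 4)):
        print(f"{s.k}+{s.m}: tolerates {s.failures_tolerated()} disk or node failures")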
Let's review some of the data protection mechanisms available.
Replication factor is multiple copies of the object with usable storage being
defined by raw terabytes divided by the RF number.
We can also create erasure coding policies which are limited to a single site.
The usable storage here is calculated as K divided by K plus M, multiplied by the
raw terabytes. In multi-DC environments, we can employ replicated erasure coding.
With replicated EC, the storage policy only allows for one erasure coding scheme.
So, all DCs configured within that storage policy have to have the same erasure
coding scheme.
You cannot mix erasure coding schemes in the same replicated erasure coding storage
policy.
The usable storage here would be the same calculation as for the single-site EC,
but divided by the number of DCs involved in the storage policy.
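The usable-storage arithmetic above can be sketched as follows; this is a rough
illustration, and raw_tb is assumed to be the total raw capacity across all DCs in
the policy:

    def usable_rf(raw_tb: float, rf: int) -> float:
        return raw_tb / rf  # replication: raw capacity divided by the copy count

    def usable_ec(raw_tb: float, k: int, m: int, dcs: int = 1) -> float:
        # single-site EC with dcs=1; replicated EC divides again by the number of DCs
        return raw_tb * k / (k + m) / dcs

    print(usable_rf(600, 3))        # 200.0 TB usable from 600 TB raw with RF3
    print(usable_ec(600, 4, 2))     # 400.0 TB usable with EC 4+2 in one DC
    print(usable_ec(600, 4, 2, 2))  # 200.0 TB usable with EC 4+2 replicated across two DCs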
And lastly, distributed EC. This allows for the usage of a single erasure coding
scheme, but is spread across 3 or more DCs.
This allows for a complete data center failure while still ensuring protection of
the data.
This also maintains a good level of efficiency, as we are not losing efficiency by
replicating the data across multiple sites.
This data protection method however does require very low latency networks between
sites.
Please note the low latency.
Although high bandwidth is also important, it is the network latency which is
critical here in distributed EC.
In storage policies involving multiple DCs, there are some considerations when it
comes to applying a consistency level.
These can be seen here and are referred to as local or each.
With local, the data must be able to be written in the DC where the coordinator
node for the request is located.
With each, the data must be able to be written in each of the DCs to be successful.
This is further extended by whether the consistency level is for all fragments, K
plus M, or only a quorum, K plus 1.
This applies to writes only; for reads, only K fragments need to be read to be
successful.
In all cases, K plus M fragments will eventually be written to each DC.
As mentioned, distributed EC can only be employed in three or more DCs and some of
the recommended EC policies can be seen here for the number of participating DCs.
The design rules which must be applied are: a minimum of three DCs, nine nodes, and
a five plus four EC scheme; the K plus M total must be evenly divisible by the
number of DCs; and losing a DC must still leave a number of fragments equal to or
more than K plus 1.
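These design rules can be sketched as a simple check, in illustrative Python only,
assuming one fragment per node, spread evenly across the DCs:

    def valid_distributed_ec(k: int, m: int, dcs: int, nodes: int) -> bool:
        total = k + m
        if dcs < 3 or nodes < 9:
            return False  # minimum of three DCs and nine nodes
        if total % dcs != 0:
            return False  # K+M must divide evenly by the number of DCs
        remaining_after_dc_loss = total - total // dcs
        return remaining_after_dc_loss >= k + 1  # a DC failure must still leave K+1 fragments

    print(valid_distributed_ec(5, 4, dcs=3, nodes=9))  # True: the minimum configuration
    print(valid_distributed_ec(4, 2, dcs=3, nodes=9))  # False: a DC loss leaves 4 fragments, below K+1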
The following examples show how distributed EC works in a multi DC configuration.
Consider a three DC configuration with 12 nodes, four nodes in each DC, using an
erasure coding scheme of seven plus five.
In the event of a DC failure you would lose four of your 12 nodes leaving eight
nodes remaining.
With an erasure coding scheme of seven plus five you need 12 nodes to store the
seven data parts K and five coding parts M as discussed earlier.
You need to have K fragments available to be able to read or regenerate the data
and K plus one nodes available for a successful write.
Assuming we are using a consistency level of quorum we have eight remaining nodes
which is more than the required seven for reads and meets the minimum K plus one
for writes.
So we can lose a DC and still operate with no impact to service.
If we consider the bottom example, although this configuration would not be valid
and you would be unable to configure such a cluster, we use it here to show the
impact.
We have 16 nodes across four DCs with each DC having four nodes.
We are using an EC scheme of 12 plus four.
So all nodes in the cluster will have one fragment of the stored object.
If we lose a DC we have 12 remaining nodes.
In this instance we would still be able to read our data as we need K parts which
in a 12 plus four scheme is 12.
And we have 12 remaining nodes.
However, we would not be able to write data, as we would need K plus one, or 13
nodes, for a successful write, and we only have 12.
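Both examples can be checked with the same arithmetic; again, this is just an
illustrative sketch:

    def dc_loss_impact(k: int, m: int, dcs: int) -> str:
        total = k + m
        remaining = total - total // dcs  # fragments left after losing one DC
        reads = "OK" if remaining >= k else "FAIL"
        writes = "OK" if remaining >= k + 1 else "FAIL"
        return f"EC {k}+{m} across {dcs} DCs: {remaining} fragments remain, reads {reads}, writes {writes}"

    print(dc_loss_impact(7, 5, 3))    # 8 remain: reads OK (need 7), writes OK (need 8)
    print(dc_loss_impact(12, 4, 4))   # 12 remain: reads OK (need 12), writes FAIL (need 13)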
Cassandra is the database where we store our metadata.
Cassandra can only store copies of the data and does not itself understand or have
a concept of erasure coding.
With a data protection scheme using replication the number of metadata copies we
store is equal to the number of data copies.
So a policy of RF3 would store three copies of the data and three copies of the
metadata.
But how do we know how many copies of metadata to store when using an erasure
coding policy?
We use a formula to determine the number of metadata copies which is 2M plus one.
Or in other words twice the number of coding parts plus one.
In this example of an EC scheme of three plus two, which is the minimum EC scheme
we can support, we store three data parts K and two coding or parity parts M, and
two times two plus one, or five, metadata copies.
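The metadata copy counts described above can be sketched as follows, in
illustrative Python only:

    def metadata_copies_ec(m: int) -> int:
        return 2 * m + 1  # twice the number of coding parts, plus one

    def metadata_copies_rf(rf: int) -> int:
        return rf         # replication: metadata copies equal data copies

    print(metadata_copies_ec(2))   # EC 3+2 -> 5 metadata copies
    print(metadata_copies_rf(3))   # RF3    -> 3 metadata copies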
This is explained further in this slide and shows the difference between an all
versus a quorum consistency level.
Metadata is also subject to the consistency level.
Therefore, if we were using a consistency level of all, then all K plus M parts
would need to be written, as well as all five metadata copies.
If quorum, then only K plus one data and coding parts would need to be written,
along with a majority of the metadata copies.
The majority of five is of course three, which is just one more than half, rounded
down.
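Putting the rules together, the parts and metadata copies that must be written
before acknowledging can be sketched like this, for an EC policy, as an
illustration only:

    def write_requirements(k: int, m: int, level: str) -> tuple:
        meta = 2 * m + 1                # metadata copies kept for an EC policy
        if level == "all":
            return k + m, meta          # every data/coding part and every metadata copy
        if level == "quorum":
            return k + 1, meta // 2 + 1 # K+1 parts and a majority of metadata copies
        raise ValueError(level)

    print(write_requirements(3, 2, "all"))     # (5, 5) for EC 3+2
    print(write_requirements(3, 2, "quorum"))  # (4, 3) for EC 3+2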