Lec 12
Design of Key Value Stores
Now, can the RDBMS, that is, the structured data system, still be used for today's workloads? To answer
this question, let us understand today's workload. There is a mismatch between today's workload and the
existing RDBMS or structured data systems, in the following manner. The first mismatch is about the data.
In today's workload the data is characterized by a very large volume, and the data is also unstructured,
meaning that it does not fit into any predefined schema. Moreover, the data volume is too large to fit
into tables that can be stored on one computer system. The second mismatch is in terms of reads and
writes: there are a large number of random read and write operations coming from millions of clients. So,
how are you going to handle this large number of queries, that is, the read and write operations? The
third point in which today's workload differs from the workload handled by earlier RDBMS systems is that
it is write heavy. Most of the time these workloads perform write operations, with a smaller number of
read operations, and reads and writes put together are much larger in volume than in earlier RDBMS
workloads. Hence, these are known as write-heavy workloads.
The fourth point is that foreign keys are rarely used in today's workloads, and the join operation is also
becoming infrequent. So, we are going to design a database management system that can handle today's
workload, which has the following characteristics or requirements. First, it has to handle a large volume
of unstructured data, which cannot be specified in a schema. Second, the workload is write heavy, that is,
there are a lot of write operations. And finally, foreign keys and join operations are rarely required in
today's workload. So, let us see, with these requirements, how we are going to design a new database
system that caters to today's workloads, and how and why we are going to provide the key-value store for
the big data system.
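
To make these requirements concrete, here is a minimal sketch of what a key-value interface might look
like. It is only an illustration under stated assumptions: the class name KeyValueStore, the put and get
methods and the sample records are all hypothetical, and everything is kept in the memory of one machine.
It shows that values are looked up by key alone, with no predefined schema, no foreign keys and no joins.

```python
# Minimal in-memory key-value store sketch (hypothetical names, illustration only).
class KeyValueStore:
    def __init__(self):
        self._data = {}  # key -> value; the store never inspects the value

    def put(self, key, value):
        """Write (or overwrite) the value stored under key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Read the value stored under key, or default if the key is absent."""
        return self._data.get(key, default)


store = KeyValueStore()
# Two records with different fields: no schema is enforced by the store.
store.put("user:1", {"name": "Asha", "city": "Patna"})
store.put("user:2", {"name": "Ravi", "followers": 120, "tags": ["db", "cloud"]})
print(store.get("user:2")["followers"])  # prints 120
```

Each write is a single assignment by key, which is one reason this kind of interface suits write-heavy
workloads.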
So, what is needed in today's workload? First is the speed with which these write-heavy workloads are
served; that means lightning-fast writes have to be supported. The second point is that we have to avoid a
single point of failure; that means the system should not be affected by the failure of a node. This is
called 'Fault-Tolerance' and availability. The third important criterion is a low cost of operation and a
low total cost of ownership, with fewer system administrators required to manage the entire big data
system. Also, the system should be able to handle incremental scalability, that is, to support scale out.
So, let us understand the scalability aspect. By scalability we mean that, as the data volume grows, we
keep on adding more systems, and the storage capacity of the big data system increases accordingly.
Scalability means we can keep on adding more nodes, or more computer systems, to scale the performance of
the storage system linearly. This is called 'Scalability'. The technique of adding computers, without
replacing the existing ones with new ones, is called 'Scale Out'.
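
The idea of avoiding a single point of failure can be sketched as follows. This is a hedged illustration,
not the design of any particular system: the Cluster class, the node names and the choice of two replicas
per key are assumptions made for this example. Every key is written to two nodes, so a read still succeeds
when one of them has failed. The sketch does not move existing data when a node is added, so all nodes are
added before any write.

```python
import hashlib


class Cluster:
    """Hypothetical cluster: each node is a dict, each key is stored on `replicas` nodes."""

    def __init__(self, replicas=2):
        self.replicas = replicas
        self.nodes = {}    # node name -> {key: value}
        self.down = set()  # names of nodes that are currently failed

    def add_node(self, name):
        """Scale out: add one more machine to the cluster."""
        self.nodes[name] = {}

    def _owners(self, key):
        # Choose `replicas` consecutive nodes, starting at a position derived from a hash of the key.
        names = sorted(self.nodes)
        start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(names)
        count = min(self.replicas, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def put(self, key, value):
        # Write the value to every owner that is currently alive.
        for name in self._owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Return the value from the first live owner that holds the key.
        for name in self._owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        return None


cluster = Cluster(replicas=2)
for name in ("node-a", "node-b", "node-c"):
    cluster.add_node(name)

cluster.put("user:1", "Asha")
first_owner = cluster._owners("user:1")[0]
cluster.down.add(first_owner)   # simulate the failure of one node
print(cluster.get("user:1"))    # prints Asha, served by the second replica
```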
Let us understand what we mean by scale out, which is supported by the new systems that handle today's
big data workloads. Scale out means that you incrementally grow your cluster capacity by adding more
commodity off-the-shelf (COTS) systems. This way of achieving scalability is cheaper, because we are not
going to replace machines with costlier systems; we keep on adding more systems. That is called 'Scaling
Out'. Also, over a long duration, we phase in a few newer, faster machines as we phase out a few older
machines. So, that is called 'Scale Out', and hence it becomes a cheaper way of scaling the entire system.
That is how we are going to build the cluster systems, which are supported by scale-out technology.
Scaling out is used by many companies that run data centers and clouds today. In contrast to scaling out,
there is scaling up. Scaling up means that the traditional computer system is replaced with a
higher-capacity, more powerful machine, to increase the capacity of the system in terms of memory,
processing capability and so on. So, providing scalability by 'Scaling up' is a costly affair compared to
scale out, because the old machines need to be replaced with newer, very powerful machines.
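
The difference between the two approaches can be sketched with a small capacity model. All machine names
and capacities (in terabytes) below are hypothetical numbers chosen only for illustration, and the
functions scale_out, phase_out and scale_up are names assumed for this sketch.

```python
def total_capacity(nodes):
    """Total storage capacity of a cluster given as {machine name: capacity in TB}."""
    return sum(nodes.values())


def scale_out(nodes, name, capacity):
    """Add one more machine; the existing machines stay in service."""
    grown = dict(nodes)
    grown[name] = capacity
    return grown


def phase_out(nodes, name):
    """Retire one older machine from the cluster."""
    return {n: c for n, c in nodes.items() if n != name}


def scale_up(old_name, new_name, new_capacity):
    """Replace a single machine with a bigger one; the old machine is discarded."""
    return {new_name: new_capacity}


cluster = {"old-1": 4, "old-2": 4}               # 8 TB in total
cluster = scale_out(cluster, "new-1", 8)
after_scale_out = total_capacity(cluster)        # 4 + 4 + 8 = 16

# Over time: phase out one older machine and phase in a newer one.
cluster = scale_out(phase_out(cluster, "old-1"), "new-2", 8)
after_phase = total_capacity(cluster)            # 4 + 8 + 8 = 20

single = scale_up("server", "bigger-server", 16)
print(after_scale_out, after_phase, total_capacity(single))  # prints 16 20 16
```

In the scale-out path the capacity that was already paid for keeps contributing, whereas in the scale-up
path the old server's capacity is thrown away when the bigger machine arrives.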