Cloud Computing
Prof. Soumya Kanti Ghosh
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Lecture – 14
Introduction to Map Reduce
Hello, so we will continue our discussion on cloud computing. In our previous lecture we discussed data stores, that is, how to manage data in the cloud, giving an overview of the subject. Now we would like to look at another programming paradigm, called MapReduce: a very popular programming paradigm originally developed by Google, but now used for many scientific purposes. Google developed it primarily for large scale search, that is, to search over the huge volume of documents that its search engine crawls; but it has become an important programming paradigm for the scientific world, which exploits this philosophy to efficiently execute different types of scientific problems.
(Refer Slide Time: 01:30)
So, MapReduce is a programming model developed at Google. The primary objective was to implement large scale search and text processing on massively scalable web data stored using BigTable and the GFS distributed file system. As we have seen, the data are stored in BigTable and the GFS distributed file system; the question is how to process this massively scalable web data, where a huge volume of data comes into play. MapReduce is designed for processing and generating large volumes of data via massively parallel computation, utilizing tens of thousands of processors at a time.
So, I have a large pool of processors and a huge pool of data, and I want to do some analysis on it. How can I do it? One very popular example problem is this: given a huge volume of data and a number of processors, how do I do some sort of word counting, that is, count the frequency of particular words in that huge volume of data? For example, I want to find out how many times "IIT Kharagpur" appears in a huge chunk of data, which is stored in an HDFS, GFS or BigTable type of architecture.
It should also be fault tolerant: it should ensure progress of the computation even if processors or the network fail. Because there are a huge number of processors and an underlying network, we do need to ensure fault tolerance. One example is Hadoop, an open source implementation of MapReduce, initially developed at Yahoo and later released as open source; it is available in pre-packaged AMIs on the Amazon EC2 platform. So, what we are looking at is providing a programming platform, or programming paradigm, which can interact with databases stored in this sort of cloud data store; it can be of HDFS or GFS type, or managed by BigTable, and so on and so forth.
If we look again at parallel computing, as we have seen in our previous lectures, the different models of parallel computing depend on the nature and evolution of multiprocessor computer architecture. There is the shared memory model and the distributed memory model; these are the two popular ones. Parallel computing was developed for compute-intensive scientific tasks, as we all know, and later found application in the database arena as well.
So, initially it was more about executing huge scientific tasks, and later we have seen that it has a lot of application in the database domain too. We have also seen in an earlier lecture that there are three types of scenario: shared memory, shared disk and shared nothing. So, whenever we want to build a programming paradigm that works with data stored in these sorts of cloud storages, we need to take care of what sort of mechanism is underneath: whether it is a shared memory, shared disk or shared nothing type of configuration.
(Refer Slide Time: 05:28)
This picture we have already seen in earlier lectures, so we do not want to repeat it: the shared memory structure, shared disk and shared nothing. But the perspective we are taking now is a little different: there it was more about the storage; now we are trying to look at how a program can exploit this type of structure.
So, this we have already seen: shared memory is suitable for servers with multiple CPUs. Shared nothing is a cluster of independent servers, each with its own hard disk, connected by a high-speed network. And shared disk is a hybrid architecture: an independent server cluster shares storage through a high-speed network, with storage like NAS or SAN; the clusters are connected to the storage via standard Ethernet, Fast Fibre Channel, InfiniBand, and so on and so forth.
First of all, the computation should be inherently parallel; it should not be a strict sequence of operations that you then try to parallelize. If it is job 1, job 2, job 3, job 4 in sequence, with only some parallelism in between, a fully parallel operation may not be possible.
So, what happens when more data comes in? You go on deploying more processors, or requesting more processors from the cloud, and your efficiency remains constant; the efficiency value does not change. Then we say the computation is scalable: if I increase both, say linearly, the efficiency stays constant. Parallel efficiency increases with the size of the data for a fixed number of processors; so with a fixed number of processors, more data effectively means more efficiency.
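Stated a little more formally, the usual definition behind these statements (a short restatement of what was just said, in the notation used below) is:

```latex
% Parallel efficiency on p processors, where T_1 is the time taken
% by one processor and T_p the time taken by each of p processors:
\[
  \varepsilon \;=\; \frac{T_1}{p\,T_p}
\]
% The computation is called scalable when \varepsilon stays constant
% as the data size and the processor count p grow proportionally.
```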
(Refer Slide Time: 09:46)
Now, the example: this is the example in the book you are referring to, and you will also find similar (not identical) examples in other literature. Consider a very large collection of documents, say the web documents crawled from the entire internet. It is pretty large, and it grows every day. The problem is to determine the frequency, that is, the total number of occurrences, of each word in this collection. So, if there are n documents and m distinct words, we need to determine m frequencies, one for each word. This is a simple problem, perhaps most relevant for search engines and that type of application.
So, we have two approaches. First: let each processor compute the frequencies of m/p words. If there are p processors and m frequencies to calculate, I divide m by p. For example, suppose I have ten processors and I am looking for 90 words, so m = 90 and p = 10; then every processor handles its chunk of 9 words (roughly; if m is not divisible by p you have to make some asymmetric division). At the end, the processors report their results together through some system. The other way: let each processor compute the frequencies of all m words across n/p documents.

So, say the total number of documents n is 10,000, the number of words m is 90, and the number of processors p is 10. In the first approach, 90/10 = 9 words on average are given to each processor, and each counts those.
The other approach says: each processor computes the frequencies of all 90 words, but on n/p documents. With 10,000 documents and p = 10 processors, each processor takes 1,000 documents and does the processing; once the partial frequencies of the m words come out of the individual processors, they are summed up and aggregated, and the combined result is shown, all processors summing their results. Now, which one will be more efficient under this parallel computing paradigm? We need to look at that. Here parallel computing is implemented as a distributed memory model with a shared disk, so that each processor is able to access any document from the disk in parallel with no contention; this can be one of the implementation mechanisms.
Now, assume the time to read each word from a document equals the time to send the word to another processor via inter-processor communication, and call it c. This keeps things simple; in a real-life case they would differ, but we make this assumption. Also assume the time to add to a running total of the frequencies is negligible: once I find the frequencies of the m words, summing them up costs nothing.
Assume each word occurs f times per document on average. Then the time to compute all m frequencies with a single processor is $n \cdot m \cdot f \cdot c$: there are n documents, each contributing on average $m \cdot f$ word occurrences, and each read costs c. If I had a single processor, this would be the total time. Now take the first approach, in which each processor computes the frequencies of m/p words.
Because the words each processor is responsible for are scattered across all the documents, each processor still reads the whole collection, at most $n \cdot m \cdot f$ words. So the parallel efficiency is calculated as

$$\varepsilon = \frac{n\,m\,f\,c}{p \cdot n\,m\,f\,c} = \frac{1}{p}$$

a very vanilla consideration: we take all processors to behave the same, aggregation time as negligible, and count a cost c for every read, write and communication. Considering this we get $1/p$, so the efficiency falls with increasing p. It is not constant, hence not scalable. This is one of the major problems: though the first approach is easy to conceptualize, it has a scalability problem. So this approach, letting each processor compute the frequencies of m/p words, is not scalable.
(Refer Slide Time: 15:36)
Whereas in the second approach we divide the document set d across the processors, and every processor computes the counts for all m words on its share before the results are aggregated. Apparently this looks like it could be more costly: there is an aggregation step, and we are partitioning the whole document set across the processors, so it could be less efficient than the first approach. But let us see. The number of reads performed by each processor is $(n/p)\,m\,f$, so the time taken to read is $(n/p)\,m\,f\,c$: each processor holds an n/p share of the data volume and, for each document, reads $m f$ word occurrences at cost c each. The time taken to write the partial frequencies of the m words in parallel to disk is $c\,m$.

So, once a processor is done, it writes its partial result to disk, which costs $c\,m$. Then the partial frequencies have to be communicated to the other $p-1$ processors, because no single processor has the whole frequency vector, and each processor locally adds p sub-vectors to generate its $1/p$ share of the final m-vector of frequencies; this takes $p \cdot c \cdot (m/p) = c\,m$ time. So each processor does its part individually.
If we adopt all of this for the second approach, the parallel efficiency comes out as

$$\varepsilon = \frac{n\,m\,f\,c}{p\left(\frac{n}{p}\,m\,f\,c + 2\,c\,m\right)} = \frac{1}{1 + \frac{2p}{n f}}$$

If you look at it a little minutely, and consult the book, it is not a difficult deduction; it is pretty easy, step by step. Now, this is an interesting phenomenon: the term we have here is $1 + \frac{2p}{nf}$, and in practice p is many, many times less than $nf$, so this term tends towards one. It can be seen that the efficiency of the second approach is then much higher than that of the first.
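To see the difference concretely, here is a minimal sketch using the illustrative numbers from above (n = 10,000 documents, m = 90 words, p = 10 processors; the average frequency f = 10 is an assumed value, since the lecture does not fix one):

```python
# Parallel efficiency of the two word-counting approaches.
# Approach 1: each processor counts m/p words but reads everything -> eps = 1/p
# Approach 2: each processor counts all m words over n/p documents
#             -> eps = 1 / (1 + 2p/(n*f))

n, m, p, f = 10_000, 90, 10, 10  # f = 10 is an assumed average word frequency

eps_approach1 = 1.0 / p
eps_approach2 = 1.0 / (1.0 + 2.0 * p / (n * f))

print(f"Approach 1 efficiency: {eps_approach1:.4f}")  # 0.1000
print(f"Approach 2 efficiency: {eps_approach2:.4f}")  # ~0.9998
```

Since p = 10 is tiny compared with n·f = 100,000, the second approach stays near perfect efficiency, while the first collapses as p grows.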
In the first approach (there is a typo on the slide; it should read m/p), each processor reads many more words than it needs to, resulting in wasted time. What we did in the first approach was divide the m words into chunks across the processors: as in our example, if m is 90 and the number of processors p is 10, then 90/10 = 9, so everybody gets 9 words. But when each processor searches the whole document collection, it reads many documents where there is no hit, no success.
In the second approach, every read is useful, as it results in a computation that contributes to the final result. It is also scalable: the efficiency remains constant as both n and p increase proportionally. That is, if my data load increases, I increase the processors; if I proportionally increase data and processors, the efficiency remains constant in this second case. Moreover, efficiency tends to one for fixed p and gradually increasing n: if the number of processors is fixed and we gradually increase n, increasing the data load, the efficiency basically approaches one.
With this background, having seen that computing partial results individually and then aggregating is the more efficient scheme, we look at the MapReduce model. It is a parallel programming abstraction used by many different parallel applications which carry out large scale computations involving thousands of processors, and it leverages a common underlying fault tolerant implementation. There are two phases of MapReduce: the map operation and the reduce operation. A configurable number of M mapper processors and R reducer processors are assigned to work on the problem, and the computation is coordinated by a single master process. So, what do we have now? There are different mapper processors and different reducer processors; the whole process is divided into these two stages.
(Refer Slide Time: 22:16)
So I have mappers, that is, M mapper processors, and reducers, that is, R reducer processors. When the data comes in, a mapper does some processing on it; then, based on the type of problem, the output goes to different reducers which carry out their part of the execution, and the reducer generates the more aggregated result. Again: this is a parallel programming abstraction used by many parallel applications which carry out large scale computations involving thousands of processors; this is where the application comes into play. It is a two-phase process, one phase being the map operation and the other the reduce operation. The number of M mapper processors and R reducer processors is configurable; that means you can provision more mappers and reducers as needed.
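As an illustration, here is a minimal sketch of the word-count example in this two-phase style. It is a single-process simulation written for this lecture, not Hadoop's or Google's actual API, and all the names are our own:

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: (doc_id, text) -> list of (word, 1) key-value pairs."""
    return [(word, 1) for word in text.split()]

def reduce_phase(word, counts):
    """Reduce: (word, [counts]) -> (word, total frequency)."""
    return (word, sum(counts))

documents = {"d1": "w1 w2 w4", "d2": "w1 w2 w3"}

# Shuffle: group intermediate pairs from all mappers by key.
grouped = defaultdict(list)
for doc_id, text in documents.items():
    for word, count in map_phase(doc_id, text):
        grouped[word].append(count)

# Reduce: aggregate each key's list of values.
result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)  # {'w1': 2, 'w2': 2, 'w4': 1, 'w3': 1}
```

In a real deployment the grouped pairs would be written out as sorted files, one per reducer, which is exactly what the next part of the lecture describes.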
(Refer Slide Time: 23:28)
So, the MapReduce phases. If we look at the map phase, each mapper reads approximately 1/M of the input from the global file system; it is not the whole data d, but a chunk of it. The map operation consists of transforming one set of key-value pairs into another set of key-value pairs:

$$\text{map}: (k_1, v_1) \rightarrow [(k_2, v_2)]$$

Each mapper writes its computational results in one file per reducer: if there are reducers R1, R2, R3, a mapper creates three files, one corresponding to each reducer. The files are sorted by key and stored on the local file system, and the master keeps track of the location of these files. So there is a MapReduce master which takes care of the locations of the files: each mapper produces one file for every reducer, and the master records where the files are stored on the local disks.
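The one-file-per-reducer split is typically done with a partition function over the intermediate key. A common default is hash-modulo-R, which can be sketched as below (the helper name is our own, and a deterministic hash is chosen deliberately):

```python
import zlib

def partition(key: str, num_reducers: int) -> int:
    """Pick which reducer (and hence which output file) a key goes to.

    A deterministic hash is used so that every mapper, running in any
    process, sends the same key to the same reducer.
    """
    return zlib.crc32(key.encode()) % num_reducers

# With R = 2 reducers, each mapper writes two files: file 0 holds pairs
# whose partition(key, 2) == 0, file 1 holds the rest.
print(partition("w1", 2), partition("w3", 2))
```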
(Refer Slide Time: 24:55)
In the reduce phase, the master informs the reducers where the partial computations have been stored on the local file systems of the respective mappers; that is, a reducer consults the master, which tells it where its related files are stored for every mapper. Each reducer then makes remote procedure calls to the mappers to fetch the files. The mapper's output sits somewhere on its disk, while the reducers may be on different machines with different VMs and so on running; ideally they are not so far apart, geographically distributed, that this would not work. Nevertheless, the reducer works on the particular data produced by the mappers.

Each reducer groups the results of the map step by key and applies an aggregation function f:

$$\text{reduce}: (k_2, [v_2]) \rightarrow (k_2, f([v_2]))$$

Here the aggregate function comes into play. In our problem, for every key, that is, every word, we want to calculate the frequency, so the function sums up the counts; it can be different for different types of problems. So the reducer produces another set of key-value pairs, and the final results are written back to GFS, the Google file system.
(Refer Slide Time: 26:36)
So, a MapReduce example: there are 3 mappers and 2 reducers. The map function in our case takes the data d, containing the set of words w1, w2, ..., wn, and for every wi produces its count in the portion of the data that mapper holds. So if you see, d1 has w1, w2, w4, d2 has its own words, and each mapper counts these. Every mapper does this and stores the result in an intermediate space from which the reducers read, generating one file for every reducer; each mapper generates a particular file for each of the two reducers.

So, for two reducers, every mapper generates two files, and the reducers in turn accumulate those. One reducer ends up with w1 and w2, say w1 as 7 and w2 as 15, while w3 and w4 go to the other. So the reducers compute the final counts from the outputs of the mappers, getting their input from the mappers' output.
(Refer Slide Time: 28:08)
The MapReduce model is fault tolerant, and there are different mechanisms for this. One is the heartbeat message: at every fixed time period, each worker reports whether it is alive. If communication exists but no progress is being made, the master duplicates those tasks and assigns them to processors that have already completed their work or are free. If a mapper fails, the master reassigns the key-value partition designated to it to another worker node for re-execution. If a reducer fails, only its remaining tasks need to be reassigned to another node, since the completed tasks have already been written back to the Google file system.
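As a rough sketch of the heartbeat idea just described (a toy, single-process illustration; the function names and the 10-second timeout are our own assumptions, not Google's or Hadoop's actual values):

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; an assumed value, not a real default

last_seen: dict[str, float] = {}  # worker id -> time of last heartbeat

def on_heartbeat(worker_id: str) -> None:
    """Called whenever a worker's periodic 'I am alive' message arrives."""
    last_seen[worker_id] = time.time()

def presumed_failed() -> list[str]:
    """Workers silent for longer than the timeout are presumed dead;
    the master would re-execute their in-progress tasks elsewhere."""
    now = time.time()
    return [w for w, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT]
```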
(Refer Slide Time: 29:04)
So, if you want to calculate the efficiency of MapReduce: consider a general computation task on a volume of data D that takes $wD$ time on a uniprocessor, covering the time to read the data from disk, perform the computation, and write back to disk. Let the time to read or write one word from or to disk be c. Now the computation task is decomposed into map and reduce stages: in the map stage, the mapping time is $c_m D$ and the data produced as output is $\sigma D$; in the reduce stage, the reduce time is $c_r \sigma D$ and the data produced at the output is $\sigma \mu D$. This is not that difficult: the mapping time is what the mappers spend over the data, $\sigma D$ is how much the map stage produces, the reducers' time is counted through $c_r$, and finally we have the reducer output.
(Refer Slide Time: 30:11)
Considering no overheads in decomposing the task into map and reduce stages, we can relate w to the stage costs by summing them up: $wD$ is simply the sum of the read, map and reduce times above. Now, suppose we have P processors that serve as both mappers and reducers, irrespective of the phase, to solve the problem. Then we have additional overheads: each mapper writes to its local disk, followed by each reducer remotely reading from that disk. For the analysis, take the time to read a word locally or remotely as the same, c. With P processors, each processor's share of the basic work is $wD/P$, and the data produced by each mapper is $\sigma D/P$.
(Refer Slide Time: 31:12)
Time is also required to write back to disk: once a mapper has read and computed, it writes its $\sigma D/P$ of intermediate data to its local disk, costing $c\,\sigma D/P$. Similarly, the data read by each reducer from its partition on each of the P mappers is $\sigma D/P^2$, so $\sigma D/P$ in total, costing another $c\,\sigma D/P$. Adding this $2c\sigma D/P$ overhead to each processor's $wD/P$ share of the basic work, the parallel efficiency of the MapReduce implementation comes out as

$$\varepsilon_{MR} = \frac{wD}{P\left(\frac{wD}{P} + \frac{2c\sigma D}{P}\right)} = \frac{1}{1 + \frac{2c\sigma}{w}}$$
There are also applications of relational operations using MapReduce: executing SQL statements such as relational joins and group-by on large data sets. We want to exploit the advantages of a parallel database, such as large scale fault tolerance, and we can implement functions like the group-by clause, so some relational operations can be executed this way.
So with this we come to the end of today's talk. What we have tried to do here is give you an overview of the MapReduce paradigm: how a problem can be divided into a set of parallel executions on mapper nodes, which create intermediate results, and a set of reducer nodes, which take this data and create the final results. One interesting point is that each mapper creates a data file for every reducer; the data is created per reducer, so each reducer knows where its data is. Over and above that, there is a master controller, the MapReduce master, which knows where the data is stored by the mappers and how the reducers will read it. Not only that: it handles how to reallocate tasks if a mapper node fails, and, if a reducer node fails, how to reallocate the not-yet-executed operations, and so on and so forth. So, with this we will stop our lecture today.
Thank you.
Cloud Computing
Prof. Soumya Kanti Ghosh
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Lecture - 13
Managing Data
Hello. So, we will continue our discussion on cloud computing. Today we will discuss some aspects of managing data in the cloud. As we have discussed in our earlier lectures, one of the major aspects of the cloud is data, because at the end of the day your data, and even the applications processing it, are in somebody else's domain; they are executed somewhere beyond your direct control, virtually hosted in some virtual machine somewhere in the cloud. That makes things tricky from the security point of view, which we have discussed. Not only that: if you look from the other point of view, the cloud provider's point of view, managing huge volumes of data, keeping replicas, and making the data queryable again becomes a major issue.
So, our conventional relational or object oriented models may not directly fit. As long as you are working on small instances, some experimental database application or small experiments, it is fine; but when you have a large scale system where a huge amount of reading and writing is going on, or the volume of data is much, much higher than in normal operations, then we need to look at it in a different way. These issues did not arise only with the cloud; they existed a little earlier too: how parallel database accesses, parallel database execution, and read-write-execute operations can be done. But they become more prominent, or de facto mechanisms, in the context of the cloud. So, what we will try to do is more of an overview of how data can be managed in the cloud, and what different strategies or schemes people, or the service providers, follow. It is not exactly the security point of view; it is more the data management point of view.
(Refer Slide Time: 03:02)
So, we will talk a little bit about relational databases, already known to you, and then about scalable databases or data services. A couple of things are important here: one is the Google file system and BigTable, and the other is MapReduce, the parallel programming paradigm; these come in back to back when we are doing these things. What we want, when we are managing anything on a cloud platform, whether application or data, is to make it scalable, in the sense that it scales as the requirement goes up: scale-up and scale-down in a ubiquitous way, with minimum human or management interference. That is the type of infrastructure we want to come up with, and it is true for data also.
(Refer Slide Time: 04:09)
These systems are primarily suitable for large volumes of massively parallel text processing; that is one of the major uses. They also suit environments like enterprise analytics: if we want to do analytics on distributed data stores, it may be a chain of shops or commercial outlets, a banking or other financial organization, or even large volumes of other types of data, like meteorological or climatological data, something which needs to be churned through or is inherently distributed, where I need to do some parallel processing; that is where the actual benefit comes into play. If you have a simple database with a single instance, you may not have gone to the cloud for that; it may be a simple system, or you buy a VM and work on it, but then you are not taking out the actual advantages of the cloud.

So, we will see that, similar to the BigTable model, there are Google App Engine's Datastore and Amazon SimpleDB, which different providers offer in different flavors, but the basic philosophy is the same.
(Refer Slide Time: 05:46)
If we quickly look at the relational database, which is known to all or most of you: users and application programs interact with an RDBMS through SQL, the structured query language, which is how user programs and so on interact with the database.
The relational database management system's parser transforms queries into memory and disk level operations and optimizes the execution time. In any query, we need to optimize the execution time. In a large database, whether you do a project or select before or after a join makes a lot of difference; the query and its output are the same, but the execution time may vary to a great extent. For example, I have two large relations, R1 and R2, and I do some projection or selection, say I select attributes A1, A2, and then do a join. Whether I do the join before or after the selection matters: if I do the select on R1 first, the number of tuples may come down from 1 million to a few thousand, and similarly for R2 if I do a select on it, and then the join is much less costly. So whether you do the join first or last is a database optimization problem, nothing specific to the cloud, but the relational database allows you to optimize these things.
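As a small worked illustration of this push-down idea (standard relational algebra; the predicates $\theta_1$ on R1's attributes and $\theta_2$ on R2's are arbitrary here):

```latex
% Both plans give the same answer; the second applies selections
% first, so the join sees thousands of tuples instead of millions:
\[
  \pi_{A_1,A_2}\bigl(\sigma_{\theta_1 \wedge \theta_2}(R_1 \bowtie R_2)\bigr)
  \;\equiv\;
  \pi_{A_1,A_2}\bigl(\sigma_{\theta_1}(R_1) \bowtie \sigma_{\theta_2}(R_2)\bigr)
\]
% If \sigma_{\theta_1} cuts R_1 from 10^6 tuples to 10^3, the join's
% input shrinks by three orders of magnitude.
```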
The disk space management layer is another component: it stores data records on pages of contiguous memory blocks, so that disk movement is minimized. Pages are fetched from disk into memory as requested, using prefetching and page replacement policies. So this is another aspect: one layer makes query processing more efficient, while this one makes storage more efficient; for example, if a query requires some 5 tables and they are stored nearby, the access rate is higher. Next comes the database file system layer.
So we have seen the RDBMS parser, then the disk space management layer, and then the database file system layer. The last is independent of the OS file system; it is a separate file system, in order to have full control over retaining or releasing pages in memory, and the files used by the database may span multiple disks to handle large storage.

In other words, if I depend on the operating system for paging and all those things, that is fine when the database load is small; but if it is pretty large, the number of hops it takes becomes costly. So we need to interact directly, at a much lower level, with the hardware or the available resources, and that is exactly what this database file system layer tries to emulate. It uses parallel I/O, like the RAID disks we have heard about, RAID 1, RAID 2, RAID 5, RAID 6, arrays or multiple clusters, which keep redundancy in the system. So the failure downtime is much less; that is, it is basically a failure-proof implementation of the database.
(Refer Slide Time: 09:42)
Usually database storage is row oriented: we have tuples, a set of rows of the same schema, which is optimal for write oriented operations, that is, transaction processing applications. Relational records are stored in contiguous disk pages and accessed through indexes, with a primary key on specific columns; the B+ tree is one of the favourite storage mechanisms for this sort of thing. Column oriented storage, on the other hand, is efficient for data warehouse workloads. Those who have gone through data warehouses know: it is high dimensional data, a huge volume of data being collected and populated from different sources, more of a warehouse than a simple database. Column oriented storage is more suitable for data warehouse type loads, where aggregates of measures, rather than individual records, drive the analytics; aggregation of measure columns is performed based on the values of the dimension columns. We are not going into data warehousing here, but it has dimension tables and so on, and the operations are mostly aggregate operations; we want to do some sort of analysis.
As data storage techniques we have seen the B+ tree and join indexes. One layout is row oriented, the other column oriented, and a join index allows the row oriented and column oriented data to be linked to one another. All of this is in any standard database book or standard literature; we are primarily following Gautam Shroff's Enterprise Cloud Computing book for this course, which is why we mention it, but these are very standard operations and you can find them in any standard book.
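A toy sketch of the two layouts (plain Python lists and dicts, just to make the trade-off concrete; this is illustrative, not how any real engine lays out pages):

```python
# Row-oriented: each record is kept together -- good for inserting
# or reading whole tuples (transaction processing).
rows = [
    {"region": "east", "sales": 120},
    {"region": "west", "sales": 340},
    {"region": "east", "sales": 210},
]

# Column-oriented: each attribute is kept together -- good for
# aggregating one measure over many records (warehouse analytics).
columns = {
    "region": ["east", "west", "east"],
    "sales":  [120, 340, 210],
}

# An aggregate touches only the one column it needs:
total_sales = sum(columns["sales"])          # scans a single array
east_sales = sum(s for r, s in zip(columns["region"], columns["sales"])
                 if r == "east")             # dimension filters a measure
print(total_sales, east_sales)               # 670 330
```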
Let me go over the picture quickly and then come back. This is the typical structure of shared memory: different processors share the memory. Here it is shared disk: different processors share the disk. And here we have shared nothing: each individual processor has its own individual disk. In the case of shared memory, which suits servers with multiple CPUs, the memory address space is shared and managed by an SMP operating system, which schedules processes in parallel, exploiting all the processors; that means I have a shared memory space and I basically execute over it in a parallel mode.
At the extreme other end is shared nothing: a cluster of independent servers, each having its own disk space, connected by a network, typically a high speed backbone; each server owns its own disk and does its part of the execution. In between is shared disk, a hybrid architecture, so to say: independent servers cluster around storage reached through a high speed network, which can be NAS or SAN, and the clusters are connected to the storage via standard Ethernet, fibre channel, etcetera, as shown here. So it is shared storage which the different processors access. Based on the type of parallelism your application needs, you can go for any of these structures.

Shared memory is more efficient when the working set is compact, whereas at the other end, if the processors work individually on separate data sets on separate machines, shared nothing can be the advantage.
If we look at what features of a parallel relational database make it advantageous for this sort of parallel operation: the relational database achieves efficient execution of SQL queries by exploiting multiple processors, and for shared nothing architectures, tables are partitioned and distributed across processing nodes. That is, I can partition a table, and the data contained in the table can be processed in parallel, distributed across different disks, with the processors working on their shares; it totally depends on what your working mechanism is out there.
The SQL optimizer handles distributed joins: whenever we need to do a join, we fall back on the distributed SQL optimizer. There is distributed two-phase commit, and locking for transaction isolation between the processors; these are some of the features. They are also fault tolerant: system failures are handled by transferring control to a standby system. I can keep different standby systems under some protocol or policy, and if there is a failure, I can shift that particular execution to one of the standby systems, restoring the computation and data; these capabilities are most required for data warehouse type applications.
There are examples of databases capable of handling parallel processing: for traditional transaction processing there are Oracle, DB2 and SQL Server, and for data warehouse applications there are Vertica, Teradata and Netezza, which are more data warehouse type databases. Now, with this background in our store, we look at the cloud file system.
(Refer Slide Time: 16:50)
Now, as we understand, we cannot throw the whole existing thing out and start doing something totally new, because these databases have matured: they are fault tolerant, they are efficient, we have RAID and so on. We need to exploit some of that and put some more cloud philosophy behind it. One of the predominant things is the cloud file system: Google's file system, GFS, and back to back with it an open source counterpart called HDFS, the Hadoop distributed file system, which is, so to say, a one-to-one mechanism with the Google file system. GFS is designed to manage relatively large files using a very large distributed cluster of commodity servers connected by a high speed network. So whether GFS or HDFS, they are able to work on very large data files distributed over commodity servers, typically Linux servers, interconnected through very high speed links.
They can handle failures even during the read or write of individual files. Fault tolerance is definitely a necessity: for a simple system view, $P(\text{system failure}) = 1 - (1 - P(\text{component failure}))^N$, so if N is pretty large, the chance that some component has failed approaches certainty, and failures must be treated as the norm rather than the exception. GFS supports parallel reads, writes and appends by multiple simultaneous client programs. And we have HDFS, the Hadoop distributed file system, an open source implementation of the GFS architecture, available on the Amazon EC2 cloud platform.
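To put an illustrative number on this (the component failure probability here is an arbitrary assumption, purely for illustration):

```latex
% With N = 1000 commodity servers, each failing independently with
% probability p = 0.001 over some period:
\[
  P(\text{system failure}) \;=\; 1 - (1 - 0.001)^{1000}
  \;\approx\; 1 - e^{-1} \;\approx\; 0.63
\]
% i.e. some component failure is more likely than not, which is why
% GFS/HDFS replicate chunks and recover from failures automatically.
```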
So, the big picture of a typical GFS: there are a few components. There is a master (the master node in GFS, or the name node in HDFS), there are client applications, and there are chunk servers in GFS, or data nodes in HDFS, in a typical cloud environment. A single master controls the namespace.

If you look at the read operation in GFS: the client program sends the full path and offset of a file to the master (or the name node in HDFS; we will refer to the GFS master node, which corresponds back to back to the HDFS name node). The master replies with the metadata for one of the replicas of the chunk where this data is found. The client caches the metadata for faster access, and then reads the data from the designated chunk server. So from the master it gets the metadata, and from there it directly accesses the chunk server.
(Refer Slide Time: 22:01)
For a read operation, any of the replicated chunk servers will do, whereas the write/append operation in GFS is a little tricky. The client program sends the full path of the file to the master (GFS) or name node (HDFS). The master replies with the metadata for all replicas of the chunk where the data is found. The client sends the data to be appended to all the chunk servers, and the chunk servers acknowledge receipt of the data. The master designates one of the chunk servers as primary; the primary chunk server appends its copy of the data into the chunk, choosing an offset. Appending can even be done beyond the current end of file to account for multiple simultaneous writers.

This is a pretty interesting point: you can have appends beyond EOF because there are simultaneous writers writing, and it is consolidated at a later stage. The primary sends the offset to the replicas; if all replicas do not succeed in writing at the designated offset, the client retries. The idea is that whenever I look for a piece of data, all the replicas, say all 3, should ideally hold it at the same offset, so that reads proceed with no delay: once the offset is computed, the other chunks are accessed directly at that offset.
(Refer Slide Time: 23:42)
Fault tolerance in the Google file system: the master maintains regular communication with the chunk servers via what we call heartbeat messages, a sort of "are you alive" check. In case of a failure, the chunk server metadata is updated to reflect the failure, so that subsequent allocation avoids it; and for failure of the primary chunk server itself, the master assigns a new primary. Clients which occasionally still try the failed chunk server update their metadata from the master and retry.
(Refer Slide Time: 24:47)
And the clients get updated: a client which occasionally tries the failed chunk server will find it flagged. Now, another related concept is BigTable: a distributed structured storage system built on GFS. Data is accessed by row key, column key and timestamp, so multiple instances (versions) are stored: there is a row key, a column key, and a timestamp which together say where the data is.

That means the chronology is maintained in that fashion: multiple versions are stored in decreasing timestamp order.
Again we see the structure: there are different tables, and different tablets which belong to a table; it is a hierarchical structure, and we have a master server which is primarily a registry, a metadata repository. Each table in BigTable is split into ranges called tablets, and each tablet is managed by a tablet server, which stores each column family for a given row range in a separate distributed file called an SSTable. This type of management ensures that, at the end of the day, the access rate is pretty high.
(Refer Slide Time: 27:03)
A single metadata table is maintained by the metadata server, and the metadata itself can be very large; in that case it is again broken down, split into different tablets, with a root tablet pointing to the other metadata tablets.

So, if the metadata repository is pretty large, it is broken into tablets, and a root tablet coordinates these metadata tablets to realize the metadata service. BigTable supports large parallel reads and inserts, even simultaneously on the same table. Insertion is done in sorted fashion, which can require more work than a simple append; that is true for other databases also, because on insert you need to push data aside and create an insertion point, whereas for an append you put the data at the end of the storage, the data, or the table.
(Refer Slide Time: 28:22)
Now, if you look at the Dynamo architecture: it is a key-value store, the value being an arbitrary array of bytes, and it uses MD5 to generate a 128-bit hash value of the key.

It basically maps a key to a virtual node using this hash function: the range of the hash function is mapped, as we were discussing, onto a set of virtual nodes arranged in a ring. An object is replicated at its primary virtual node as well as at N−1 additional virtual nodes, where N is the replication factor, so every object is replicated around the ring. Each physical node manages a number of virtual nodes at distributed positions on the ring; so the physical node servers are linked to the virtual node servers.
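Here is a minimal consistent-hashing sketch in this spirit (our own simplified illustration, not Dynamo's actual code: MD5 of the key placed on a ring of virtual nodes, with the next N distinct servers clockwise holding the replicas):

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    """128-bit MD5 hash, as Dynamo uses, taken as a ring position."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Virtual nodes: several ring positions per physical server.
virtual_nodes = sorted(
    (ring_position(f"{server}#{i}"), server)
    for server in ("nodeA", "nodeB", "nodeC")
    for i in range(4)  # 4 virtual nodes per physical node (arbitrary)
)
positions = [pos for pos, _ in virtual_nodes]

def preference_list(key: str, n_replicas: int = 3):
    """First n distinct servers clockwise from the key's ring position."""
    start = bisect.bisect(positions, ring_position(key))
    servers = []
    for i in range(len(virtual_nodes)):
        server = virtual_nodes[(start + i) % len(virtual_nodes)][1]
        if server not in servers:
            servers.append(server)
        if len(servers) == n_replicas:
            break
    return servers

print(preference_list("user:42"))  # e.g. ['nodeB', 'nodeC', 'nodeA']
```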
(Refer Slide Time: 30:12)
The Dynamo architecture provides load balancing and handles transient failures such as network partitions. A write request on an object is executed at one of its virtual nodes, and the request is then forwarded to all the other nodes which hold replicas of the object; that is, if the object is replicated on N−1 other nodes, one is updated and the rest are notified. There is a quorum protocol that maintains eventual consistency of the replicas when a large number of concurrent reads and writes are going on; the quorum determines the minimum number of replicas that must take part to handle this large read-write traffic.
(Refer Slide Time: 31:02)
Next, we have Dynamo's distributed object versioning: each write creates a new version of the object, stamped with its local timestamp, and there are algorithms for quorum consistency.

Let R be the number of replicas consulted in a read operation and W the number written in a write operation. The system is quorum consistent when $R + W > N$, where N is the total number of replicas. There are overheads either way: for efficient writes (small W), a large number of replicas have to be read, and for efficient reads, a large number of replicas need to be written. This is implemented by different storage engines at the node level: Berkeley DB is used by Amazon, and it can also be implemented using MySQL, etcetera.
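A quick numeric sketch of the quorum condition (the N, R, W values below are chosen only for illustration):

```python
def quorum_consistent(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees a read quorum overlaps every write quorum,
    so at least one replica read has seen the latest write."""
    return r + w > n

print(quorum_consistent(n=3, r=2, w=2))  # True:  2 + 2 > 3
print(quorum_consistent(n=3, r=1, w=1))  # False: stale reads possible
```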
The final concept we have is the datastore. Google and Amazon offer simple, traditional key-value pair database stores: Google App Engine's Datastore, and in Amazon's case what is called SimpleDB. All entities (objects) in the Datastore reside in one BigTable; the Datastore exploits column oriented storage, that is, it stores data as column families. Unlike our traditional relational layout, which is more row or tuple based, this is called a column family layout.
(Refer Slide Time: 32:30)
There are several features or characteristics: multiple index tables are used to support efficient queries. The BigTable is horizontally partitioned (called sharding) across disks, and rows are stored lexicographically sorted by key value. This lexicographic sorting of keys enables efficient execution of prefix and range queries on key values. Entities are grouped for transactional purposes, since in a transaction there is a set of entities accessed together more frequently, and index tables support a variety of queries.

So we can have different indexes for different types of queries. We should understand that this is not a small database; it is a large database, so I cannot churn the whole database. It needs to be sliced appropriately, so that the different varieties of queries can be executed more efficiently.
(Refer Slide Time: 33:37)
There are a few more properties: indexes are created automatically, such as single-property indexes, and there is a kind index which supports efficient lookup queries of the "select all" form; indexes are configurable, and at query execution the index with the highest selectivity is chosen.
With this we will stop our discussion here. What we have tried to cover are different aspects: we have the notion of our traditional databases, which are established, fault tolerant and efficient, with different mechanisms behind them, and we already have parallel execution in place. When we deal with large volumes of data in the cloud, where they are likely to be, what are the different aspects we need to look at? We may not be able to follow the row oriented relational database; we may need to go for a column oriented database, and there are different file systems like GFS and HDFS, and over them datastores like Dynamo and SimpleDB, which are being implemented by various cloud service providers (CSPs) for efficient storage, access, and read-write execution on very, very large databases.
Thank you.
Cloud Computing
Prof. Soumya Kanti Ghosh
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
Lecture – 12
Economics
Hello. So, we will continue our discussion, our lectures, on cloud computing. Today we will take up a topic we need to look at: what is the economy behind cloud computing? Why will people go for cloud computing? It is not as if we are getting something totally new; rather, the different types of applications and so on are now delivered as a service. So what makes this viable, or what will make it viable? Is going to the cloud always beneficial? How do we decide whether to go to the cloud or buy something in-house, and how do we balance between in-house infrastructure and provisioning from the cloud? To answer that, we will try to see some basic phenomena, the basic economic points which make the cloud viable, and for which types of operations or situations.
So, we try to look at: when should an organization or individual, especially an organization, switch over to the cloud, partially or fully, and what considerations should it keep in mind? We have seen SLAs; there are other issues too, as discussed in our initial lectures; there are some limitations and issues with the cloud. But assuming all those things are running fine: is it economical to be on the cloud always, or at times, and at what times, and what should be the business considerations? We will have a brief discussion on that, and those of you interested in working on this line can build on this type of work.
(Refer Slide Time: 02:20)
So, from the economic point of view, let us relook at the different cloud properties. One is the common infrastructure.
Will I require all those resources only at peak time? Say, for example, at IIT Kharagpur, in a particular department, a particular lab, for a particular year, say MTech first year: the number of seats is, say, 50 for the department, and expecting 50 students to work on 50 individual systems, we set up a lab of 50 systems. Alongside, we have to provide power, and at the same time air conditioning, and, anticipating that a system may go down now and then, we keep another 10 percent, another 5 systems, on standby; so we have 55 systems. But it may so happen that the number of students who join the course is less than 50, so I have surplus capacity even after they join. And these 50 systems are again underutilized: they may not be utilized all the time; maybe they are used 8 hours daily during lab hours and are idle otherwise. This is what sizing for the peak load looks like.
Moreover, when I configure each system individually, I choose the processor, memory, etcetera thinking that lab assignments will push it up to that level; that means I size for the peak load. With this consideration, at many times, I do a lot of oversizing.
catering to number of users and then I think that all the user will jump into the thing right
and then the peak load will be there right. If those who are working in the networking
you might have seen that typically 10, 24 port networks switch right. How many people
can connect? 24. Even 100 mbps line it is 24 roughly, 24x100 if we not go to the
integrity of this binary thing.
So, it is approximately 2.4 gigabytes, but the switch uplink maybe 1 gigabyte. So, it is
some sort of a blocking architecture, but if I give a uplink of 2.4 or 3 gigabyte then it is
over provisioning, right. I it is all the people are coming statistically at the same peak
time may not be there. So, it statistically we need to look at that whether it is viable to
provision. Such a higher thing when you provision higher, thing it involves lot of costing
and other things in to the things maintenance of the thing, there are things of adjoining
other accessories, etcetera.
Even the cost of the equipment itself goes up, and so on. So that is one factor. Another is location independence, another property of the cloud: services are ubiquitously available, meeting performance requirements, with benefits deriving from latency reduction and enhancement of the user experience. Then there is online connectivity as an enabler of the other attributes, ensuring service cost and performance, with impact on the network architecture: I should always have online connectivity. So these are the different factors which may force us to look at things differently.
(Refer Slide Time: 06:55)
There are two other direct economic factors. One is utility pricing: the pay-as-you-go model, where I pay per unit, like electricity. The other is on-demand resources: when I demand resources, they are provisioned; scalable, elastic resources, provisioned and de-provisioned without delay or cost associated with the change. There may be a cost factor, but there should be minimal human or managerial involvement in managing it. So as demand comes and goes, resources are provisioned and de-provisioned.
Buying equipment is one thing, but maintaining it over the years at times becomes much costlier than the equipment itself. And there is a major problem, especially in the computing world, that after typically 2 to 3 years, definitely within 5 years, the whole thing becomes obsolete: the technology becomes obsolete, and the system's capacity is no longer viable for installing new sets of tools and software. That is a big problem. On the other end, there are the statistics of scale, first for infrastructure built to peak requirements.
Suppose I build an infrastructure assuming that all my students will always be there in the class, or that all will register for the course, etcetera. Multiplexing demand may help me achieve higher utilization: when different demands are multiplexed, utilization is higher, and there is a lower cost per delivered resource than with unconsolidated workloads. So if the workload is consolidated, the cost is lower.
For infrastructure built to less than peak, multiplexing demand also reduces the unserved demand: with a blocking architecture there can be requests which are not served, and multiplexing may reduce this unserved portion. It is not necessarily served one-to-one; I go by some sort of scheduling algorithm and go on serving. This means lower loss of revenue, or fewer service level agreement violations, because an SLA violation means you have to pay out something. So both for infrastructure built to peak and built to less than peak, we have these sorts of benefits.
(Refer Slide Time: 10:30)
Another term, which comes up not only for cloud but for anything where risk is involved and you have to deliver services, is the coefficient of variation, commonly CV.

It is not covariance we are talking about; it is the coefficient of variation: a statistical measure of the dispersion of data in a series around the mean. The coefficient of variation is the ratio of the standard deviation to the mean, that is, σ/μ for those who know the symbols. It is a useful statistic for comparing the degree of variation of one data series to another: I can say whether the coefficient of variation of this data series is more than, less than, or equal to that of another, and it is a good measure even if the means are drastically different.
So, the CV gives me an idea of how these two data series behave, right. It is widely used, or widely looked at, in the investing world. In the investing world, the coefficient of variation allows you to determine how much volatility, or risk, you are assuming in comparison to the amount of return you can expect from your investment.
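As a quick aside (a minimal sketch, not from the lecture; the two demand traces are made-up numbers), the CV is easy to compute for any demand series, and it lets us compare the smoothness of two workloads even when their means differ:

    import statistics

    def coefficient_of_variation(series):
        # CV = standard deviation / |mean| (dimensionless)
        return statistics.pstdev(series) / abs(statistics.mean(series))

    smooth = [95, 100, 105, 98, 102, 100]   # steady load
    spiky = [10, 5, 80, 7, 95, 3]           # bursty load

    print(coefficient_of_variation(smooth))  # small: easy to provision for
    print(coefficient_of_variation(spiky))   # large: risky to provision for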
So, this CV gives you an idea of how much risk you are assuming relative to the return you can expect from the investment. If you see, it is also important in our sort of scenario: we are basically taking some risk by moving our organizational, on-premise infrastructure to cloud infrastructure. So, there is a risk that the infrastructure may not be available at some point; and not only infrastructure, I mean to say all types of services on the cloud. If the service is not available, how much risk do I need to take on those things? In simple language, the lower the ratio of standard deviation to mean, the better is your risk-return trade-off, right. So, the lower the ratio of standard deviation to mean, the better is the risk position; I can have more smoothness in the curve, and that is what is important.
So, it is some sort of a measure of smoothness. If it is a very variable load, with a lot of peak and off-peak swings, then I am in much bigger trouble in estimating how things will be. If it is a smoothed-out load, then I am in a much better position. So, the coefficient of variation CV, as we said, is not the same as the variance, nor the correlation coefficient; it is the ratio of the standard deviation sigma to the absolute value of the mean, |mu|.
So, smoother curves mean a large mean for a given standard deviation. As we were saying with sigma by mu: if the mean is large for a given standard deviation, things will not vary that much; or with a smaller standard deviation for a given mean, things will also be much under control. So, the importance of smoothness: a facility with fixed assets servicing highly variable demand will achieve lower utilization than a similar one servicing relatively smooth demand, right. Suppose I am a service provider, I have some 100 systems as my backbone, and I know that the demand will be somewhat smooth, right.
Then I can basically have a better management of things; if I have n systems, deciding the value of n is much easier. But if it is varying very much, demand suddenly goes up to 100 and then down to 10, etcetera, then I have a problem, right. Similarly, if we look at our day-to-day life, for any shopkeeper, the amount of provisioning he will do, what he will keep in his store, depends on the demand. If the demand is somewhat smooth, it may vary, Monday's demand is different from Tuesday's, weekdays differ from weekends etcetera, fine, but he has an idea. But if it is totally random and he cannot predict anything, then it is very difficult to provision. So, the same thing holds here.
So, multiplexing demand from multiple sources may reduce the coefficient of variation, right; that is the thing. Now, I cannot predict what the different sources are or how things will behave, but if I multiplex the different demands from multiple sources, it may happen that this multiplexed demand gives me a better CV, right, a smoother CV, where the overall demand is such that it is not that all are going to the peak at the same time and all are coming down to the trough together; I have a multiplexed aggregate. Like, say X1, X2, …, Xn are independent random variables of demand; say, for argument's sake, though they are independent, they have identical mean mu and standard deviation sigma.
So, the aggregate demand has mean n·mu, the sum of the means, and aggregate variance n·sigma², so the aggregate standard deviation is √n·sigma. If we calculate the coefficient of variation of the aggregate, it is (√n·sigma)/(n·mu) = (1/√n)·Cv, right. So, if n increases, if I multiplex more, then the 1/√n factor decreases and I get more smoothing out of this coefficient of variation, a smoother curve. So, adding n independent demands reduces Cv by a factor of 1/√n; it becomes more smoothed out, and the penalty of insufficient or excess resources grows smaller, right.
So, what is happening: if the aggregate is smoothed out, then I do not have to keep plenty of extra resources at my back end, right; the requirement reduces. Otherwise I do not know whether the demand will be 10 units or 100 units etcetera, and then I have to keep 100 units at the back end. For example, aggregating 100 workloads brings the penalty down to 10 percent, since 1 by root over 100 is 1 by 10. So, it may bring the whole thing down to a tenth. So, aggregating multiple workloads may allow me to have a reduced loading.
But what about different kinds of workloads? Negatively correlated demands: like if one demand is X and another is 1−X, the sum of the two random variables is always 1. So, appropriate selection of the customer segments is another important thing, right. Say I have a computing infrastructure; I select customers such that some of them are active in the day time and some of them are active in the night time, right.
So, it is compensating, right; if both are working at the same time, then the peak will go much higher, but by selecting negatively correlated demands, or demands separated in the timescale, I could have managed those things with the same infrastructure. So, the selection of the customer base is the important thing: what sort of load will come when I go on selecting customers. Perfectly correlated demand: if the demands are perfectly correlated, the aggregate demand will be n·X, the variance will be n²·sigma²(x), the mean n·mu and the standard deviation n·sigma(x), and so the coefficient of variation remains constant. So, with perfectly correlated demand this thing remains a constant.
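To make both cases concrete, here is a minimal simulation sketch (the demand distribution and the numbers are assumed for illustration, not from the lecture): aggregating n independent demands shrinks the CV by about 1/√n, while aggregating perfectly correlated demands leaves the CV unchanged.

    import random

    def cv(series):
        n = len(series)
        mean = sum(series) / n
        var = sum((x - mean) ** 2 for x in series) / n
        return var ** 0.5 / mean

    T, n = 10000, 100                     # time steps, number of customers
    base = [random.gauss(50, 10) for _ in range(T)]

    # Independent demands: each customer draws its own random demand.
    indep = [sum(random.gauss(50, 10) for _ in range(n)) for _ in range(T)]

    # Perfectly correlated demands: every customer mirrors the same draw.
    corr = [n * x for x in base]

    print(cv(base))    # ~0.20 for a single demand stream
    print(cv(indep))   # ~0.02, reduced by about 1/sqrt(100) = 1/10
    print(cv(corr))    # ~0.20, unchanged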
There is a third issue: if all demands come to the peak at the same time, then I have a serious problem, right. Then I have congestion; all are demanding at the same time. Like, if all classes break on the hourly basis, then at every hour we have a huge demand on the road network, right. A lot of vehicles, students with cycles etcetera, are on the road because all classes break together. So, if the peak comes at the same time like this, then we have a problem.
So, for common infrastructure in the real world: with correlated demands, private, mid-sized and large providers can experience similar statistics of scale; with independent demands, a mid-size provider can achieve statistical economy similar to an infinitely large provider, right. The available data on economy of scale for large providers is mixed, right: factors include use of the same COTS, that is, commercial off-the-shelf, systems and components; locating near cheap power supplies, because power supply is a major cost for data centers, so they want to build data centers where the power supply is much cheaper and nearby; and early-entrant automation tools, or a third party taking care of the automation.
So, latency is also a big thing, right. Human response time is of the order of 10 to 100 milliseconds. Latency is correlated strongly with distance, right: the more the distance, the more the network latency; and with more hops come more failures and so on.
It also depends on what sort of routing algorithms etcetera come into play. Anyway, we know the speed of light in fiber: roughly 124 miles per millisecond. So, if I am searching something and it takes more than a couple of seconds, we are not happy with that; even for a VoIP call, if it is delayed by more than something like 200 milliseconds, it becomes very difficult to communicate over voice over IP.
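A rough back-of-the-envelope sketch of the figures quoted above (the per-hop overhead here is an assumed, illustrative value):

    # Speed of light in fiber: roughly 124 miles per millisecond.
    FIBER_MILES_PER_MS = 124

    def round_trip_ms(distance_miles, hops=0, per_hop_ms=0.5):
        # One-way propagation delay doubled, plus an assumed
        # per-hop processing overhead (illustrative number).
        one_way = distance_miles / FIBER_MILES_PER_MS + hops * per_hop_ms
        return 2 * one_way

    # A ~7500-mile intercontinental path with 15 hops:
    print(round_trip_ms(7500, hops=15))   # ~136 ms, already close to the
                                          # ~200 ms comfort limit for VoIP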
(Refer Slide Time: 21:49)
So, supporting a global user base requires a dispersed service architecture. If my provider has to support a global user base, then it has to have an appropriately distributed and dispersed architecture, and similarly the protocols. So, coordination, consistency, availability and partition tolerance are the issues, and this has a direct implication on investment, on what sort of investment we want to make. We would now like to quickly look at another aspect.
So, economy of scale might not always be very effective, all things considered, right; but cloud services do not need to be cheaper to be economical, right. The economics may depend on my requirement, on what sort of demand I am generating. The popular example people give is to consider a car: buying or leasing a car may cost me, say, 500 per day, whereas renting a car may cost, say, 5000 per day. So, when is which one economical? Just looking at the daily rates, buying the car may always seem economical.
But if I am commuting rarely, say once a month or a couple of days in a month, then buying a car may be more costly than renting one. If, however, I require the car on a daily basis, going to my workplace, travelling a large distance etcetera, then buying a car may be more economical than renting one, all right.
So, it all depends on what sort of demand you are having.
So, let us do some simple mathematical expressions, a little simplified. Suppose I have a demand D(t); the demand varies over time, from 0 to T. So, the demand for resources is D(t); P is the max or peak demand; A is the average demand, the average over the time scale; B is the baseline, or owned, unit cost, that is, the unit cost if I own the resources; C is the cloud unit cost; and U is the utility premium, the premium for renting rather than owning, like renting a car. In our car example it is 5000 by 500, so the utility premium is something like 10. So, the utility premium is the cost of the cloud service divided by the cost of the baseline service, right; that is the utility premium we are having.
So, I have a variable demand D(t), a peak demand P, an average demand A, a baseline owned unit cost B, a cloud unit cost C, and I want to find out the utility premium, that is, U = C/B. Now, if I want to see the overall cloud cost over the period T, it is

CT = U × B × (integral from 0 to T of D(t) dt) = U × B × A × T,

which is the overall costing of my cloud. If I want to calculate the overall baseline cost over time T, I have to provision for the peak demand, because whenever I go for my own baseline I have to build for the peak; so it is

BT = P × B × T,

where B is the baseline owned unit cost and T is the overall time scale.
Now, when is the cloud cheaper? When CT < BT: if the cost of the cloud is less than the cost of the baseline, then it is cheaper. In other words, in a very simplified form, the cloud is cheaper when P/A is greater than the utility premium, right: when the ratio of peak demand to average demand is greater than the utility premium. So, when the utility premium is less than the ratio of peak demand to average demand, the cloud may be cheaper than owning the infrastructure or owning the services.
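A small sketch of this break-even rule, using the assumed car-example numbers from above (B = 500, C = 5000, so U = 10):

    def cloud_is_cheaper(peak, average, baseline_unit_cost, cloud_unit_cost):
        # Cloud wins when the peak-to-average ratio exceeds the
        # utility premium U = C / B.
        utility_premium = cloud_unit_cost / baseline_unit_cost
        return peak / average > utility_premium

    # Spiky workload: peak 120 units, average 10 units -> P/A = 12 > 10.
    print(cloud_is_cheaper(peak=120, average=10,
                           baseline_unit_cost=500, cloud_unit_cost=5000))  # True

    # Flat workload: peak 120, average 60 -> P/A = 2 < 10.
    print(cloud_is_cheaper(120, 60, 500, 5000))                            # False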
(Refer Slide Time: 27:05)
So, utility pricing in the real world: in practice, demands are often spiky, like news stories suddenly breaking, market promotions, product launches, internet flash crowds, something going viral on the internet etcetera, and there are seasonal peaks like Christmas or tax season, when demand goes up. Often a hybrid model is used: like you own a car for the daily commute, but rent a car when travelling or when you need to move a larger distance. The key factor is again the ratio of peak to average demand, but we should also consider other costs.
Costs which we have not considered in the previous calculation: network costs, both fixed and usage-based; interoperability overhead, like when one system's data has to talk to another system there is an overhead; and considerations of reliability, accessibility and so on, right. So, these are the other factors.
(Refer Slide Time: 28:17)
Another aspect we will quickly look at is the value of on-demand services. The simple problem is that, when owning your own resources, you pay a penalty whenever the resources do not match the instantaneous demand. Suppose I have 100 resources; when they are underutilized, say utilized at 70 or 80 percent, then I pay the penalty for the remaining 20 or 30 percent, right. Either I pay for unused resources or I suffer the penalty of missed service delivery, right: if the demand is higher than the resources, then I am not able to serve it.
So, how do I calculate the penalty? The penalty is proportional to the integral of |D(t) − R(t)| dt, where R(t) is the resources provisioned at time t, all right. If the demand is non-linear, then periodic provisioning in the cloud is a big question, right. Suppose the demand is exponential, say D(t) = e^t; then any fixed provisioning interval t_p, provisioned according to the current demand, will fall exponentially behind. Suppose I require time t_p to provision; by the time I provision, the demand has gone up, so the resources lag as R(t) = e^(t − t_p), and this provisioning delay t_p will create havoc, right: D(t) − R(t) = (1 − e^(−t_p)) e^t = k1·e^t, say. So, the penalty cost is c·k1·e^t; in other words, this penalty grows exponentially. It is extremely difficult to match this unless you over-provision, and even that is difficult at times because the demand grows exponentially. So, you need to be careful: you need to study what type of demand you are expecting, and based on that the provisioning should be done.
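A numeric sketch of this effect (the provisioning delay and horizons are assumed values): with D(t) = e^t and resources lagging by t_p, the unserved demand itself grows exponentially, so the accumulated penalty explodes with the horizon.

    import math

    def penalty(t_p, T, dt=0.001):
        # Approximate integral of |D(t) - R(t)| over [0, T]
        # for D(t) = e^t and R(t) = e^(t - t_p).
        total, t = 0.0, 0.0
        while t < T:
            total += abs(math.exp(t) - math.exp(t - t_p)) * dt
            t += dt
        return total

    # Doubling the horizon multiplies the penalty by ~e^5, not by 2:
    print(penalty(t_p=0.1, T=5))    # ~14
    print(penalty(t_p=0.1, T=10))   # ~2100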
(Refer Slide Time: 30:27)
So, based on these, I have kept a small assignment for you to work on in your own time, and we will discuss this assignment in one of the classes during some free time. It says: consider a peak computing demand of an organization of 120 units, with the demand as a function of time expressed as given in the slide; D(t) is the demand over time t, and the resource provisioning of the cloud to satisfy the current demand is given as shown, where t-tilde is the delay in provisioning extra computing resources on demand. So, this tilde denotes the delay, right.
Thank you.
Cloud Computing
Prof. Soumya Kanti Ghosh
Department of computer Science and Engineering
Indian Institute of Technology, Kharagpur
Lecture – 11
Service Level Agreement (SLA)
Hello, welcome to this lecture on cloud computing. Today we will be discussing one of the important topics of cloud computing, one of the major aspects of making cloud computing a reality: the service level agreement, right. Whenever a service provider and a service consumer want to exchange any services, what we say metered services, which involve some costing, there should be some agreement, right. The agreement involves the pricing factor, the service availability factor, and it may involve other quality factors as well.
Now, this is a very tricky issue, because an organization or even an individual switching from his own proprietary computing paradigm to some cloud, taking the service from some service provider, thinks that he will get the same type of reliability, the same type of performance and the same type of security that he was having on his own premises, right, where the system was in his own control.
Now, in order to do that, he needs to have some agreement. As such, even today we do not have any fully standard format for doing this. What you usually face whenever you purchase any services, or take any VM from any cloud provider, is that you basically sign off on something, right: we say that these are the terms and conditions we agree to. So, some sort of agreement will be there. So, we want to see what the different parameters are to be taken into consideration, what this service level agreement typically looks like, what its different components are, and, at the end, what sort of SLA or SLA parameters some of the popular cloud service providers offer; that we will see in this particular lecture today.
(Refer Slide Time: 02:42)
So, if we look at what an SLA, or service level agreement, is: a formal contract between the service provider and the service consumer. The SLA is sometimes known as, or referred to as, the foundation of the consumer's trust in the provider, right. Whenever I purchase some service, the first thing that comes up is how much I can trust the provider. If an SLA is there, it gives me a formal way of trusting; not only that, its parameters may help me compare one provider with another, to decide from whom I will take the service. The purpose is to define a formal basis for the performance and availability that the service provider guarantees to deliver. If I require an uptime of, say, 95 percent, there should be a guarantee of delivery of 95 percent, right. So, I want a service provider which guarantees that it provides more than 95 percent uptime, right.
Now, SLA is a broad term, right. It has different objectives, what we call service level objectives. So, what do these SLOs do? They are objectively measurable conditions for services, right. The SLA, the agreement, has different components and these are objectively measurable; if I say that availability should be 95 percent, it cannot be verbal, somewhere I should measure the thing objectively. So, an SLA is constituted of a number of objectives, and later on we will see that these objectives can be calculated from different parameters, right, system-level parameters which I can measure. So, these are the objectives which are there.
Now, interestingly, the objectives may vary from consumer to consumer. Like, a service consumer such as an academic institution like ours may have a different type of objective, or, if not totally different, a different focus on the objectives, than a consumer like a financial organization or a software company; they have a different type of strategic objective. Like, I can say that my uptime may be 95 percent; however, I may require the data persistence to be much higher, right.
Some other organization may say that its uptime must typically be more than 99 percent, that it cannot be compromised below 99 percent; however, it always takes its own backup of the data, so it may not require a backup facility as an objective, etcetera, right. My objectives may also vary over time: I may say that during peak hours my availability should be more than 99 percent, whereas during off-peak hours my requirement of availability may be more than 90 percent, right. So, these things should be objectively measurable; these components should come into play.
So, what are the typical SLA contents we are looking for? It should specify the set of services the provider will deliver. I would like to reiterate that these SLAs apply to different types of service provisioning: it can be IaaS-type services for which the SLA is wanted, PaaS-type services, SaaS-type services, or other types of services, like maybe data services etcetera. The SLA corresponds to that respective type, right: an SLA for a particular software-as-a-service module, and so and so forth.
There should be a complete, specific definition of each service, right, and the responsibilities of the provider and the consumer with respect to the SLA; also a set of metrics to measure whether the provider is offering the services as guaranteed, because how do I know otherwise? So, some metrics should be there which show that it is providing whatever is guaranteed.
And, as we see in our day-to-day organizational activities, there are these types of agreements, MoUs etcetera, where things are defined objectively and there are remedies for violating them, what will happen and so on; there are penalties, what we say penalties for not providing the guaranteed service. If I have promised a guaranteed service and it is not provided, then what I should pay as a penalty, or what other incentive I need to give to compensate, should also be specified, right.
Also, what happens if the guarantee is not met, and whether the SLA will change over time, right. It may change during a particular time of the day, over different days and so on. So, whether the SLA will change over time is another aspect: whether it is a temporally changing phenomenon, right.
Now, there is the concept of web services; I believe most of you know about service oriented architecture and web services, which are, what we say, among the prime movers for the emergence of these cloud services. For the benefit of all, I will take one short lecture later on, in subsequent talks, on web services and service oriented architecture, for those who are not accustomed to them, but I believe most of you are used to them, right. So, in the case of web services, where a service provider and a consumer communicate with each other,
there is also an agreement, right, what we say a web service SLA. A web service SLA has components like the web service agreement: an XML-based language and protocol for negotiating, establishing and managing the service agreement at run time. So, it is a de-facto XML-based thing; actually, the whole web services stack is built on the basic foundation of XML. It specifies the nature of the agreement template, what the agreed-upon format of the template should be, and it facilitates discovering compatible providers, right: there can be more than one provider, so it should be possible to see which are the compatible providers.
The interaction is usually request-response, right. And SLA violations are dynamically managed and verified: if there is an SLA violation, it needs to be dynamically managed and verified, and necessary action is to be initiated if there is a violation of the SLA. There is also a concept called WSLA, the Web Service Level Agreement framework: a framework for web services, with formal XML-schema-based languages to express SLAs and a runtime interpreter.
It measures and monitors QoS parameters, quality of service parameters, and reports violations, if any. There is a lack of formal definitions for the semantics of the metrics, right: what we are talking about is more of a syntactic way of looking at things; there are a few efforts on the semantics side, but there is less standardization of what the semantics of the whole thing means. Like, whether a parameter tells us that the system is performing well or going down, or that I should expect a failure, or that this type of event may happen under these conditions. So, there are different underlying semantics which need to be formally defined or standardized.
So, if you look at this WSLA: as the cloud evolved from web services and service oriented architecture, keeping that framework in mind, obviously these SLAs also have a relationship with them, right.
(Refer Slide Time: 12:12)
But if we look into the nitty-gritty of the differences, we can mainly divide them into three major components: one is the QoS parameters, the quality of service parameters; one is automation; and another is resource allocation. So, these are the three things we are looking at.
In the case of the QoS parameters for traditional web services, what we see are response time, and SLA violation rates for reliability, availability, cost of services etcetera. So, in traditional web services response time plays a vital role, and there are SLA violation rates defined for reliability, availability, cost of services etcetera; if there are violations, these rates define what the consequences will be.
In the case of a cloud, QoS related to security, privacy, trust management etcetera has more importance, right. The basic ways of handling things may still be there, but in a cloud we are more concerned about security, privacy, trust and overall management, and so and so forth. From the automation point of view: in traditional web services, SLA negotiation, provisioning, service delivery and monitoring are not automated, right; in a cloud, SLA automation is required for highly dynamic and scalable services. Like, I can scale up and scale down, so dynamic and scalable services require this automated monitoring.
For resource allocation, traditional web services use UDDI, that is, Universal Description, Discovery and Integration; UDDI is one of the protocols which is very prominent in web services. It provides registry services, used for advertising and discovering web services. So, UDDI is there, whereas in a cloud, resources are allocated and distributed globally without any central directory, per se, right.
So, ideally it is a distributed thing, and there is a different mechanism for knowing who has what. Now, if we look at the different types of SLAs: one is the off-the-shelf, or non-negotiable, SLA, sometimes known as a direct SLA, right. This off-the-shelf type is not conducive for mission-critical data or applications; it may not be suitable if you have a very mission-critical application you want to run, and so and so forth.
The provider creates an SLA template and defines all the criteria: contract period, billing, responsibilities, response time, availability etcetera; it provides the template and whatever is in it. This is what the present-day state-of-the-art clouds follow; especially the public clouds follow this type of thing. So, whenever you want to buy some services from a public cloud, it will give you this form, with the other terms and conditions, and you need to agree to them to work on it, right.
Whereas with a negotiable SLA, you negotiate and settle what the terms will be, negotiable via an external agent: there may be external agents which negotiate between the provider and the consumer. And it can be negotiable via multiple external agents: if you are buying multiple services and then amalgamating those services to achieve one particular thing, then there can be multiple layers.
So, it can be either off-the-shelf, that is, standard or whatever is provided, or negotiable. Usually, whenever we have a small requirement, when we want to do something on a very small scale of operations, we go for the off-the-shelf stuff. Whenever we have a large requirement, like we want to move a whole organizational process onto the cloud etcetera, then we want to have a negotiable SLA, we want a special rate. It is as good as booking a hotel or transport: for one or two persons you go for whatever is available, but if you are booking the whole hotel, or booking a transport bus or something, then you negotiate; we likewise negotiate to fine-tune the agreement.
So, coming back to what we were discussing: we have SLAs, right, service level agreements, and the SLOs, the service level objectives, contribute to forming these SLAs, right. An SLO is an objectively measurable condition for the service; that is one point. An SLA encompasses multiple QoS parameters, and the SLOs can cover different QoS parameters, like availability, serviceability, billing, throughput, response time, quality etcetera. These somehow need to be measured. So, for example, the availability of a service is 99.9 percent; or I can say that the response time of a database query should be between 3 to 5 seconds, right; or the throughput of a particular server at peak time should be, say, 0.875. So, these can be different types of SLOs.
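Such SLOs can be written down directly as measurable thresholds; here is a minimal sketch using the example numbers above (the monitored values are hypothetical):

    # Each SLO is a metric name and a predicate over the measured value.
    slos = {
        "availability_pct": lambda v: v >= 99.9,
        "db_query_response_s": lambda v: 3 <= v <= 5,
        "peak_throughput": lambda v: v >= 0.875,
    }

    # Hypothetical monitored values:
    measured = {"availability_pct": 99.95,
                "db_query_response_s": 4.2,
                "peak_throughput": 0.81}

    for metric, ok in slos.items():
        status = "met" if ok(measured[metric]) else "VIOLATED"
        print(metric, measured[metric], status)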
So, service level management: I have the agreement, I have the SLOs; now I have to manage this whole scenario. It is the monitoring and managing of the performance of services based on the SLOs; the SLOs report what the different values are etcetera. From the provider's perspective, decisions are made based on business objectives and technical realities, right: the provider has a business goal, and the technical reality of how much capacity it has at the back end is also important. From the consumer's perspective, the decision is about how to use the cloud services, right: whether it is suitable for my organization or my personal need, etcetera; that is the way I measure it.
(Refer Slide Time: 19:00)
So, there are several considerations for SLAs; a few of them follow. One is the business-level objectives. The responsibilities of the provider and the consumer are important; that is, the balance of responsibility between the provider and the consumer will vary according to the type of service, right.
There are some services where the provider's responsibility is much, much higher, whereas there are services which also require the consumer's responsibility, right. Like, if I put in some data and get output from it: the provider may state that it can accept data at a rate of, say, x megabits per second; then, if I send data at a rate more than x, there may be a problem of overflow of data and that type of thing, which may in turn violate other SLAs of the provider.
So, there is a responsibility on both sides to check and comply, and there is auditing to maintain that all of it is followed. Business continuity and disaster recovery is another important aspect: the consumer should ensure that the cloud provider has adequate protection in case of a disaster, right; the disaster may be natural, man-made, a system failure etcetera. There should also be system redundancy: many cloud providers deliver their services via massively redundant systems, right, so that you can get the guarantee.
(Refer Slide Time: 20:34)
So, these are some of the things which are interlinked. There are other issues, like maintenance: maintenance of the cloud infrastructure affects any kind of cloud offering, and is applicable to both software and hardware; maintenance is a big factor. Location of the data: if the cloud provider promises to enforce data location regulations, the consumer must be able to audit the provider to prove the regulations are being followed.
Like, it may so happen that IIT Kharagpur, for example, says that all its data on the cloud should reside within the jurisdiction of India, right. Then there should be a way to verify that the data is not in some other country or some other territory. Seizure of data: if law enforcement targets the data and applications associated with a particular consumer, the multi-tenant nature of cloud computing makes it likely that other consumers will be affected also.
Like, suppose law enforcement wants to seize consumer C1's data; if it resides multi-tenant alongside some other consumer's data, then the whole thing may be seized, and the others will be affected. So, this is an issue which needs to be addressed; likewise, failure of the provider: in case the provider fails,
(Refer Slide Time: 21:56)
what should happen, and jurisdiction: where any type of litigation will be addressed. There are other SLA requirements. One is definitely security: there is the big issue of data encryption; if I encrypt the data, how will the key management be done, where will the encrypted data reside, where will the key reside, how will the key be communicated, etcetera.
Privacy issues: isolation of the consumer's data in a multi-tenant environment, how to isolate a consumer's data and services in a multi-tenant environment. Data retention and deletion, right: how long the cloud will retain the data and when it will delete it, and that type of thing. Hardware erasure or destruction: the provider is required to zero out the memory if a consumer powers off a VM, or even zero out the platters of a disk if it is to be disposed of or recycled. So, if there is hardware erasure or disposal of hardware etcetera, what the other effects will be; these things need to be specified. There are several other requirements.
(Refer Slide Time: 22:57)
So, these are easy to talk about, but implementing them is very difficult, because the provider may not allow a third party to work on their systems etcetera, right. Then auditability: as the consumers are liable for any breaches that occur, it is vital that they should be able to audit the provider's systems and procedures, right, as a breach at the provider will affect the consumer's own business. An SLA should make it clear how and when those audits take place. Because audits are disruptive and expensive, providers will most likely place limits and charges on them, right.
So, if you want to do frequent auditing, it is expensive, not only in monetary terms but in resource terms also, right: there may be downtime, there may be other resource requirements, etcetera. So, the provider may not be interested in doing it very frequently; this needs to be properly provisioned for.
There is another component, called the key performance indicator, KPI, right. It is a low-level resource metric, right. Multiple KPIs are composed, aggregated or converted into high-level SLOs, and multiple SLOs are integrated to form the SLA. So, looking at it top-down, we have the SLA, then the SLOs and then the KPIs, right.
So, there are different KPIs that can be there. A possible mapping: if availability is one of the objectives, then availability = 1 − (downtime/uptime), right. So, the KPIs are the components which are very low level and directly measured from the system parameters; the SLOs are computed based on these KPIs, and the aggregation of the SLOs gives the SLA.
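A minimal sketch of this KPI-to-SLO mapping, using the availability formula above (the monitoring figures are assumed):

    # Low-level KPIs, directly measured from the system (assumed figures):
    kpis = {"uptime_hours": 8700.0, "downtime_hours": 6.0}

    # SLO derived from the KPIs, per the mapping above:
    availability = 1 - kpis["downtime_hours"] / kpis["uptime_hours"]

    # The SLA then integrates such SLOs against agreed targets:
    sla_target = 0.999
    print(round(availability, 4),
          "SLA met" if availability >= sla_target else "SLA violated")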
So, there are industry-defined KPIs. On monitoring, the natural question is who should monitor the performance of the provider, and whether the consumer meets its responsibilities; a solution is a neutral third-party organization performing this monitoring, which eliminates conflicts of interest. Then there are the issues of auditability, as we have discussed.
(Refer Slide Time: 26:48)
So, for the metrics for monitoring and auditing, these are the typical widely used ones: throughput, availability, reliability, load balancing. So, when elasticity kicks in and new VMs are booted or terminated, for example, then the load balancing of the systems is an important factor, right.
Durability: how likely the data is to be lost, how durable the data or services are. Then elasticity, and linearity: how the system performs as the load increases. If it is linear growth, like the load increases and the system provisioning also increases, if it is a linear graph, then it is easy to scale up, easy to chase the higher demand. If it is non-linear, especially exponential etcetera, then it is a difficult process; we will see that later on.
(Refer Slide Time: 27:44)
So, there are a few more metrics, like agility, automation, customer service, response time, service level violation rate, transaction time and resolution time; these are the other components. Now, if we look at the chart, there are different requirements at different levels, right: IaaS may require something, PaaS may require something and SaaS may require something else from among these requirement components.
(Refer Slide Time: 28:16)
So, we will quickly look at some of the popular example cloud providers' SLAs, what they typically offer. Like Amazon EC2, the IaaS service provider: it shows an availability of 99.95 percent, a service year of 365 days, annual percentage uptime per region, and so and so forth. Similarly Zoho, which provides SaaS, Rackspace, Terremark; these are some of the providers listed here.
So, what we have tried to discuss in this particular talk is that service level agreements play a vital role when we want to put cloud computing into practice, right, whether for personal use or organizational use, when we want to migrate from a conventional, traditional system to cloud computing. Then I need to look across these different SLAs, right: what I need, what parameters I need, whether they are measurable at the provider's end, and how I can guarantee that my work, or my business process, will not be affected adversely by the cloud service provider. So, with this we will stop here.
Thank you.
Cloud Computing
Prof. Soumya Kanti Ghosh
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Lecture – 15
Open Stack
Hello, so we will continue our discussion on cloud computing. Today we will discuss and show a demo of an open source cloud, which is OpenStack, one of the very popular open source clouds. Initially we will have a brief overview of OpenStack, and then we will show a demo of OpenStack: how OpenStack can be configured and how VMs can be provisioned. Primarily we have been using OpenStack for an IaaS type of cloud, infrastructure as a service; that is what we will demonstrate today.
So, if you see, OpenStack is one of the very popular open source clouds, which you can download and install on a particular hardware configuration; even with a couple of PCs you can install OpenStack and see the performance of an IaaS type of cloud.
OpenStack is a cloud operating system that controls large pools of compute, storage and networking resources throughout a data centre, all managed through a dashboard that gives administrators control while empowering users to provision resources through a web interface.
So, what it says is: you have a set of resources, and OpenStack is a layer above these bare-metal resources; it gives the administrator control over these resources, and the users can basically provision VMs out of it. So, it acts as an IaaS type of cloud and, as we mentioned, it is open source, so you can download it, install it and provision things. Incidentally, at IIT Kharagpur we have made an experimental cloud called Meghamala, which is based on OpenStack and which has been installed over blade servers.
Nevertheless, you can install OpenStack over PCs; here also we have tried it with PCs etcetera for small student projects. So, it gives a feel of how a particular infrastructure-as-a-service cloud works, and also of how we can work on those things.
As far as OpenStack's capability is concerned, it has the capability for all the service layers, though primarily we use it more as infrastructure as a service. So, at the infrastructure-as-a-service level it provisions compute, network and storage; at the PaaS level, on top of IaaS, there are things like Cloud Foundry; and at the SaaS level there is browser or thin-client access. So, this is the whole pyramid of typical cloud services, accessible over the internet: infrastructure as a service, where you have the physical servers, CPU, RAM, storage, data center networks etcetera; over that, developer services; and over that, software as a service.
(Refer Slide Time: 03:44)
Now, if you look at the capability of OpenStack, primarily from the point of view of infrastructure as a service: it can give VMs on demand, and both provisioning and snapshotting are possible. So, it offers virtual machines on demand; it has provision for networking; it has storage for VMs and arbitrary files, so you can have storage for virtual machines and other files; and it supports multi-tenancy, with quotas for different projects and users, where a user can be associated with multiple projects and that type of thing.
So, in essence it gives you a full-fledged experience of a cloud and, as it is open source and you are installing it, you have both administrative and physical control over the whole thing, over the hardware being used and what you are running on it. So, it is a good thing, a very popular thing, for an individual user or a group or a lab to install to have a live experience, and it is extremely useful for students: with a couple of PCs, or even a couple of laptops, you can install OpenStack and see how things work.
As we mentioned earlier, there is a good amount of research going on in resource management in the cloud, power management, and people are talking about green cloud and that sort of thing. So, this sort of lab-scale implementation of open source clouds may help in gaining experience and in experimenting with different types of parameters of the cloud. So, OpenStack is one of the very popular open source clouds.
(Refer Slide Time: 05:58)
So, if you look at the history of OpenStack, it started as a collaboration between NASA and Rackspace and, over the years, it has gone through different releases; you can see that it started around 2010 and there have been regular releases over the years. So, it started with a release called Austin and went on to Newton and Pike, with Queens yet to be released.
If we look at the overall architecture of OpenStack, there are a couple of components. One is Horizon, which is primarily the dashboard, provided by the Horizon dashboard project. Neutron mostly looks after the networking; Cinder primarily looks after the block storage of OpenStack; Nova is the compute; Glance is the image service, handling the different images which can be hosted in OpenStack.
Then we have other things like Swift, which is object storage, just as we have Cinder as block storage; then Ceilometer, which is the telemetry service; and Keystone, which is the identity service. So, what we can see is that all the different sorts of services which are required in a typical cloud are there in OpenStack.
Though they are developed in different project modes, it comes as a bundle, and when we do an installation we have all these flavours in it. So, it is good for getting a feel of the cloud, and you can have an operational internal, or in-house, cloud using this type of open source. It is less costly: you pay only for the infrastructure, and if you have infrastructure already present, or excess infrastructure already present, you can deploy on it using OpenStack or any other open source cloud.
So, looking at it layer by layer, you have the standard hardware at the backbone, then the OpenStack shared services, and over that we have the compute, networking and storage services, and then the OpenStack dashboard and the services we talked about.
The user applications come in from the top, and requests go through the uniform APIs and utilize these different services to execute, or realize, a particular job. I would like to mention here that we have taken most of these resources from the OpenStack site and other resources; so, these are things which you can also get there. So, if we look at the major components, as we were mentioning: Horizon, Neutron, Cinder, Nova, Glance, Swift, Ceilometer and Keystone; these are the major components.
We will just have a brief overview of these components before we see a demo. One of the critical components is compute: the service is compute and the project is the Nova project. It manages the life cycle of compute instances in an OpenStack environment; its responsibilities include spawning, scheduling and decommissioning of virtual machines on demand.
It comes with storage: ephemeral storage, and persistent storage which can be associated with the VMs; these are possible using this compute, or Nova, service. Another major service is networking, which is under the Neutron project: it enables network connectivity as a service for the other OpenStack services, such as OpenStack compute and others.
So, if you have other OpenStack services, like compute (Nova), the storage services and so on, this Neutron, or networking, service provides networking as a service to these different components of OpenStack. It provides an API for users to define networks and the attachments to them. If you look at any cloud infrastructure, two types of networks are prominent: one is the internal network, which is internal to the cloud, and the other is the external network, which connects to the world outside the cloud.
Like, as we were saying, we have an open source cloud in our institute, which we call Meghamala: an experimental cloud using OpenStack, which is being used by faculty and research scholars primarily for their computing needs; it comes with different flavours, and we will show an example of how we do it.
So, it has an internal network for the cloud, whereas the external network for that cloud is basically the IIT network. As it is an in-house cloud, it is not accessible from the external world; however, the cloud itself has an internal network, which is for communicating between the different components and providing services, and an external link, which gives connectivity to the outside.
This is based on the Neutron type of services if you are using OpenStack. It has a pluggable architecture that supports many popular networking vendors and technologies; so, interoperability between different networking vendors and technologies is feasible in OpenStack.
The next one is the object storage service, which comes under the project Swift: it stores and retrieves arbitrary unstructured data objects via a RESTful, HTTP-based API; I believe you know about and understand RESTful services. It is highly fault tolerant, with data replication and a scale-out architecture; its implementation is not like a file server with mountable directories etcetera. So, it is basically a fault-tolerant system with a scale-out architecture; it is not a simple file server, it is much more than that. And in this case it writes objects and files to multiple devices, ensuring the data is replicated across the server clusters.
So, in order to be fault tolerant, it writes to multiple places. If you remember, when we talked about data on the cloud, or data services on the cloud, in our earlier lecture, replication was the scenario we discussed, with 3 replicas. So, it replicates the data into 3 places; whenever there is a write operation, the write should be synced with all the replicas, whereas in the case of a read, any of the replicas can respond. Nevertheless, this storage service, Swift, provides such structures for fault tolerance as a service.
Another type of storage service is block storage, which is provided by Cinder: it provides persistent block storage to running instances, and its pluggable driver architecture facilitates the creation and management of block storage. So, again, it is persistent block storage, with a pluggable driver architecture to facilitate the creation and management of block storage devices.
(Refer Slide Time: 16:25)
Then there is another component which is not directly under compute or storage but plays a vital role in realizing the cloud: the identity service, or what we say Keystone, under the Keystone project. It provides authentication and authorization services for the other OpenStack services. So, it is the identity service, providing authorization and authentication of services for the other components. Along with that, it provides a catalogue of endpoints for all OpenStack services: how those services are reached, and that type of thing.
(Refer Slide Time: 17:10)
The next component is Glance, or what we say the image service. OpenStack supports different types of images, or image services, which can be loaded, which can be instantiated, on the different VMs. So, I can have different VMs, and different images of, say, operating systems and other things can be instantiated on the different VMs based on the users' needs and requirements.
These sorts of services are provided by the Glance project, the Glance service. It stores and retrieves virtual machine disk images; the VM disk images are stored and retrieved by Glance, and OpenStack compute makes use of this during instance provisioning: as we were mentioning, whenever OpenStack does instance provisioning, it uses this image service. Like, you can have different images, say of operating systems; you can have different flavours or images of operating systems, and, when you create an instance, based on the requirements of the user, those are instantiated by the compute service. So, these image repositories are there in OpenStack.
(Refer Slide Time: 18:54)
Another service is telemetry, which is Ceilometer: it monitors and meters the OpenStack cloud for billing, benchmarking, scalability and statistical purposes. This is important for the overall metering of the cloud services: as we discussed in our initial lectures, cloud is a metered service, meaning whatever is used is metered. So, this Ceilometer, the telemetry service in OpenStack, helps us in monitoring and metering the OpenStack cloud for billing, benchmarking, scalability and statistical purposes; it is not only for metering, it is required because you are offering measured services.
So, you can do benchmarking, address scalability of resources, and do statistical analysis; like, you want to know what the loading is and how it behaves, that type of thing; those types of data are available. So, though it may not directly contribute to compute or storage or networking, which are the major components that allow us to do the actual work, it supports us in this aspect.
Nevertheless, it plays a vital role in realizing this metered service, along with the statistical analysis, or statistical measurement, of things to chart the performance of the system, which not only helps in billing the resource usage, but also helps in understanding future requirements and how the infrastructure at the backbone needs to be increased; it helps in realizing those requirements also. And maybe the trend, maybe the periodical requirement: I can say that every day this is the load, the weekend load is less, however at some particular times it goes up, etcetera. So, for analyzing the overall performance of things, what we require is different sorts of data, which this telemetry service provides us.
Then we have the OpenStack dashboard component, which is under the project Horizon. It provides a web-based self-service portal to interact with the underlying OpenStack services, such as launching an instance, assigning IP addresses and configuring access control. Actually, we will show in our demo how we use this OpenStack dashboard for user management and resource management: we want to assign a VM, assign an IP address, configure the access controls, load images etcetera; all of this can be done using the dashboard. So, it is also an important aspect, and it is, what we can say, the frontend interface for the administrator to manage the cloud; it is extensively used for management of the cloud.
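For reference, the same launch-an-instance action that the dashboard performs can also be scripted against the OpenStack APIs; here is a minimal sketch using the openstacksdk Python library (the cloud profile, image, flavor and network names are placeholders for your own deployment):

    import openstack

    # Credentials and endpoints come from clouds.yaml or environment
    # variables; "mycloud" is a placeholder profile name.
    conn = openstack.connect(cloud="mycloud")

    image = conn.compute.find_image("cirros")        # assumed image name
    flavor = conn.compute.find_flavor("m1.small")    # assumed flavor name
    network = conn.network.find_network("internal")  # assumed network name

    # Nova spawns the VM; Neutron attaches it to the chosen network.
    server = conn.compute.create_server(
        name="demo-vm",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)
    print(server.status)   # ACTIVE once provisioning completes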
(Refer Slide Time: 23:09)
So, if we look at the overall architecture, we see the different components: OpenStack object storage, which, as we have discussed, is realized by Swift; the OpenStack image service, realized by Glance; OpenStack compute services, realized by Nova; block storage, which is Cinder; OpenStack networking, which is Neutron; and the OpenStack identity service, which is Keystone. So, there are the different types of services, and we have seen Horizon, which gives us the dashboard. So, there is the OpenStack dashboard, and from the internet the users come in via the command line interfaces (nova, neutron, swift etcetera), cloud management tools like RightScale, and other GUI tools like the dashboard, Cyberduck, and so on and so forth.
So, there is the user interface, and then, based on the requirements, requests flow into the system. This shows, at the top level, how the OpenStack modules are interconnected and how the process flow goes through these OpenStack modules. So, this is an overview; there are a few more slides on the individual components. We will not go into much of the nitty-gritty of the material, but will try to see what the different components are.
(Refer Slide Time: 24:55)
Like, if you look at the OpenStack workflow, these are the different components: Nova for compute, with the compute nodes and other components within Nova; Cinder for block storage, which also has its own components; Neutron for the networking aspects, with the Neutron API, scheduler, plug-ins, queues etcetera, all networking-related components. We have Ceilometer, which is primarily the metering service, or telemetry component, and Keystone for the identity services. The user logs in to the UI, specifies VM parameters like name, flavor, keys etcetera, and hits the create button. Then Horizon sends an HTTP request to Keystone, with the authentication information specified in the HTTP header.
So, Horizon, the dashboard service, sends this information to the identity service; Keystone sends a temporary token back to Horizon via HTTP, and Horizon then sends a POST request to the Nova API carrying that token. Once the authentication is in place, Horizon's request reaches the compute service, which in turn drives the other operations. Finally, the Nova API sends an HTTP request to Keystone to validate the API token; this is how the whole thing goes on.
(Refer Slide Time: 26:53)
And if we look at the token exchange mechanism: the user goes to Keystone for identification or authentication, and the token then flows to Nova, then to Glance and Neutron. So, this is the process flow of how that particular authentication token is used.
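To make this token exchange concrete, the sketch below shows roughly what the underlying Keystone v3 API call looks like, using Python's requests library; the endpoint URL, user, project, and password are placeholders. Keystone returns the token in the X-Subject-Token response header, and subsequent requests to Nova, Glance, or Neutron then carry it in an X-Auth-Token header for validation.

    import requests

    KEYSTONE = 'http://controller:5000/v3'  # placeholder endpoint

    # Keystone v3 password authentication, scoped to a project
    body = {
        'auth': {
            'identity': {
                'methods': ['password'],
                'password': {
                    'user': {
                        'name': 'demo',                # hypothetical user
                        'domain': {'id': 'default'},
                        'password': 'secret',          # placeholder credential
                    }
                }
            },
            'scope': {
                'project': {
                    'name': 'demo',                    # hypothetical project
                    'domain': {'id': 'default'},
                }
            }
        }
    }

    resp = requests.post(KEYSTONE + '/auth/tokens', json=body)
    token = resp.headers['X-Subject-Token']  # the temporary token Horizon receives

    # Later calls to Nova, Glance, or Neutron carry the token for validation
    headers = {'X-Auth-Token': token}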
Similarly, there is a provisioning flow: the Nova API makes an RPC cast to the scheduler; the scheduler picks up the message from the message queue and fetches information about the whole cluster from the database; it filters and selects compute nodes and publishes a message to the compute queue. Nova compute gets the message from the Nova message queue (MQ), nova compute makes an RPC call to nova conductor, and so on and so forth. So, that is how provisioning is done in OpenStack.
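An RPC cast here is fire-and-forget: the sender publishes a message on the queue and returns immediately, unlike an RPC call that waits for a reply. Nova actually implements this with oslo.messaging over a broker such as RabbitMQ; the sketch below only illustrates the cast pattern with Python's standard queue module, and the component names are simplified stand-ins.

    import queue
    import threading

    scheduler_queue = queue.Queue()  # stand-in for the AMQP scheduler queue

    def scheduler_worker():
        while True:
            msg = scheduler_queue.get()  # scheduler picks the message off the queue
            # a real scheduler would now filter hosts and publish a
            # message to the chosen compute node's queue
            print('scheduling instance:', msg['name'])
            scheduler_queue.task_done()

    threading.Thread(target=scheduler_worker, daemon=True).start()

    # "cast": the API side enqueues the request and returns immediately,
    # without waiting for any reply from the scheduler
    scheduler_queue.put({'name': 'demo-vm'})
    scheduler_queue.join()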
Then we have individual components like the compute driver, which are extensions of these. We are not going into the details of OpenStack here; those who are interested in particular aspects can refer to its documentation and resources.
Similarly, we have the Neutron architecture, which is the networking architecture; Neutron, too, has several components under its fold.
(Refer Slide Time: 28:27)
Glance is the image service; it has the Glance database, the Glance registry, which records what images are available, then the Glance API and the storage adapter. So, again we have a few components under Glance.
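For instance, the Glance API in front of these components can be exercised through openstacksdk as well; a minimal sketch, again assuming a clouds.yaml entry named 'mycloud':

    import openstack

    conn = openstack.connect(cloud='mycloud')  # hypothetical cloud entry

    # The image proxy talks to the Glance API, which in turn consults the
    # registry/database and the backing store
    for image in conn.image.images():
        print(image.name, image.status)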
Similarly, there is the Cinder architecture, which provides block storage; the major components of Cinder are the Cinder database, scheduler, API, volumes, and a backup service.
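A volume request correspondingly passes through the Cinder API and scheduler before a volume service carves out the block device; a minimal sketch with openstacksdk, where the cloud entry, volume name, and size are assumptions:

    import openstack

    conn = openstack.connect(cloud='mycloud')  # hypothetical cloud entry

    # The request hits the Cinder API; the Cinder scheduler then picks a
    # backend and the volume service creates the block device
    volume = conn.create_volume(size=1, name='demo-vol', wait=True)
    print(volume.id, volume.status)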
(Refer Slide Time: 29:10)
Keystone, as we have already discussed, is the identity service, and it has different modules or components: the policy backend, token backend, catalogue service, identity service, assignment backend, and credentials backend. These are the different components or services under the Keystone identity service.
If you look at the OpenStack storage concepts, they are along similar lines to other cloud storage. Ephemeral storage persists until the VM is terminated and is accessible from within the VM as a local file system; once the VM is terminated, it goes away as well. It is used to run the operating system and/or as scratch space: being ephemeral, it primarily holds the operating system, loaded as and when required, and can also serve as scratch space. It is managed by Nova. Block storage, as we have seen, persists until specifically deleted by the user.
Block storage is accessible within the VM as a block device and is used to add additional persistent storage to the VM and its operating system; otherwise you are left with only the storage that came with the VM. It is managed by Cinder.
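Attaching such a Cinder volume to a running VM, an action the dashboard also offers, looks roughly like this with openstacksdk; the server and volume names are hypothetical and assumed to exist already:

    import openstack

    conn = openstack.connect(cloud='mycloud')  # hypothetical cloud entry

    server = conn.get_server('demo-vm')   # existing VM (assumed)
    volume = conn.get_volume('demo-vol')  # existing volume (assumed)

    # Cinder exposes the volume inside the VM as a block device (e.g. /dev/vdb)
    conn.attach_volume(server, volume)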
Then we have object storage, which is managed by Swift; it persists until specifically deleted, is accessible from anywhere, and is used to add and store files, including VM images. So, the VM images, some of which are managed by the Glance service, are also stored as files in the object storage; and the object storage, as we discussed earlier, is managed by Swift.
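Storing a file in Swift's object storage, accessible from anywhere, can be sketched as follows; the container and object names are hypothetical:

    import openstack

    conn = openstack.connect(cloud='mycloud')  # hypothetical cloud entry

    # Create a container, then put an object into it through the Swift API
    conn.object_store.create_container(name='demo-container')
    conn.object_store.upload_object(
        container='demo-container',
        name='hello.txt',
        data=b'hello from swift',
    )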
So, in summary, for a quick overview of the whole of OpenStack: the user logs in to Horizon and initiates the VM creation. Keystone authorizes it; Nova initiates provisioning and saves state to the database. The Nova scheduler finds the appropriate host. Neutron configures the networking aspects, Cinder provides the block devices, and the image URI is looked up through Glance.
The image is retrieved via Swift, and the VM is rendered by a hypervisor. So, overall, this is a brief overview of OpenStack, the open source cloud. What we will do next is a demo of these things; I will try to give a live demo of our OpenStack installation at IIT Kharagpur, as I mentioned, and then we will see how a VM can be created and all those things.
Thank you.