DBMS Lecture 40 Transcript
Lecture – 40
Course Summarization
So, I will start with a quick recap of what we covered and what we are expected to
have learnt.
(Refer Slide Time: 00:32)
In week 2, we started with the query language SQL, first at an introductory level and
then at an intermediate level. SQL is really the first major aspect of a database,
particularly a relational database, that a student must master.
(Refer Slide Time: 01:01)
In week 3, we continued with advanced SQL and covered modeling from a specification
using the Entity-Relationship Model.
In the next week, we covered design issues, which was really the involved part and
possibly the most important aspect of relational database design beyond being able to
write queries.
This is based on dependencies and the different normal forms, and I am sure you have
spent a good amount of time mastering these.
(Refer Slide Time: 01:40)
In the following week, week 6, we discussed indexing and hashing for making accesses
really efficient.
(Refer Slide Time: 02:05)
In week 7, we covered another critical aspect of database systems: how to make
transactions work concurrently. We defined transactions and concurrency, and then we
took up two critical aspects. The first is serializability: transactions can be executed
with their instructions intermixed and still produce a result that is the same as if
they had been executed in some serial order. The second is recoverability. We
specifically looked at different protocols for managing this kind of concurrency,
particularly the two-phase locking protocol, at the evils of deadlock that may arise
when you run transactions concurrently, and at how simple protocols like the
timestamp-based protocol can handle that.
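As a minimal sketch of the serializability idea recapped above, the following Python
checks conflict serializability by building a precedence graph and looking for a cycle.
The encoding of a schedule as (transaction, operation, item) tuples and all function
names are assumptions made for this illustration, not something shown in the lecture.

```python
def precedence_graph(schedule):
    """Build precedence edges from a schedule.

    schedule is a list of (txn, op, item) tuples, with op being 'R' or 'W'.
    An edge (Ti, Tj) is added when an operation of Ti comes before a
    conflicting operation of Tj: same item, different transactions,
    and at least one of the two operations is a write.
    """
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(nodes, edges):
    """Depth-first search cycle detection on a directed graph."""
    adjacent = {n: [] for n in nodes}
    for src, dst in edges:
        adjacent[src].append(dst)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(n):
        state[n] = 1
        for m in adjacent[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


def is_conflict_serializable(schedule):
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    nodes = {txn for txn, _, _ in schedule}
    return not has_cycle(nodes, precedence_graph(schedule))


# Interleaved, but every conflict orders T1 before T2.
interleaved_ok = [
    ("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B"),
]
# T1 reads A, T2 writes A, then T1 writes A: edges in both directions.
interleaved_bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

print(is_conflict_serializable(interleaved_ok))
print(is_conflict_serializable(interleaved_bad))
```

The first schedule is equivalent to running T1 and then T2 serially, while the second
has a cycle between T1 and T2 and so has no equivalent serial order.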
(Refer Slide Time: 03:03)
And in the current week, we have dealt with different strategies of recovery,
particularly log-based recovery, and we have touched upon query processing and
optimization.
So, you have got a first-level overview in this course. It was done in limited time and
with a limited number of assignments, so you will get just a first-level idea. This will
not make you an expert in database systems, but it will certainly get you started well,
whether you go on to write database programs, take up advanced courses later, or take up
a job in a database area.
(Refer Slide Time: 03:48)
So, given that, in this current module I will discuss a few things beyond the textbook.
The space of RDBMSs is quite crowded; you will see a lot of them. So, I will take a
quick look at the common RDBMS systems, since there have been a number of queries on the
forum and in the live session about that. I would also like to discuss non-relational
database systems, which were not a part of the curriculum here, but I will just present
a brief overview.
And then finally, I would like to conclude with what the road forward should be for you,
presenting a kind of skill-profile matrix showing what skills you must pick up to get a
job of a certain profile, and which companies you might look at working for.
(Refer Slide Time: 04:55)
So, starting with the common databases: there are several relational database systems.
This slide summarizes the different aspects of relational database systems that we have
been discussing so far. So, this is just a summary of that.
Now, these are the common database systems. I have chosen the ones which are most widely
used, most easily accessible, and used by large companies for large databases. There are
primarily two classes. One set of database systems is commercial: Oracle from Oracle
Corporation; Sybase from Sybase Corporation, which is now part of SAP AG; DB2 from IBM;
SQL Server from Microsoft; and Teradata, a more recent entrant that is regularly making
ripples, which grew out of joint work between Caltech and a group at Citibank. If you
are working for a company that subscribes to any of these database products, then you
should be able to use them and understand what you can do with them. But if you are
working with smaller companies, or working as a student, then you will need to use some
of the database systems which are free, under GPL licensing, or open source.
So, the most prominent amongst them is PostgreSQL, which is from the PostgreSQL Global
Development Group. These are non-commercial software in the sense that you do not need
to pay for them; they are open source and released under free licenses (MySQL under the
GPL, PostgreSQL under its own permissive license). Another very commonly used one is
MySQL, which was originally from a Swedish company called MySQL AB and is now owned by
Oracle Corporation, but you still do not need to pay for it. So, these are the database
systems to primarily look for. Besides that, there are some other database systems which
use certain object-oriented features on top of the relational features.
So, if you look through these, then in most cases you will find that, in terms of gross
functionality, a large subset of the SQL that you can write on databases maintained
through these systems will be the same. So, what you have learnt here would be
applicable irrespective of which of these database systems you are using, but of course
there are specifics which differ amongst them. In the next couple of slides, with four
on each slide, I have given a brief background on each particular database system.
(Refer Slide Time: 07:48)
This is so that you know how stable and how old each system is, and what its basic
nuances are. For example, Oracle started in 1977, so you can say it is a 40-year-old
database system. The latest version is 12c, and these are the different kinds of support
that it has.
Sybase started in 1987, so that is almost 30 years, but it is less vibrant right now;
the last stable release happened in 2014, more than three and a half years ago. But
Sybase has been a very good database system for programming through APIs, and it has
really good support for that.
DB2 is also very old, possibly the oldest surviving database system; its roots go back
to 1970, around when E. F. Codd, of Boyce-Codd normal form fame, published his
relational model at IBM. It is also widely used; the last release was in 2016.
Teradata is relatively new. It was released in 1984, but it is one where a lot of new
developments are still happening and new experiments keep on happening, and the current
version is Teradata 15.
So, it is a very vibrant system. MySQL is probably the most widely used among the free
and open source community; its first internal release happened in 1995, and the most
recent releases happened this year. So, these are the common database systems that you
will come across. Given the organization that you are working with, first find out which
database system it uses, and then look into the specific manual and the specific
features of that system.
(Refer Slide Time: 10:34)
Beyond these relational database systems, there are also certain database systems which
use object-oriented notions. If you are familiar with object orientation, you would have
understood that the relational approach does not keep things object-oriented, because
you are always flattening things out in terms of attributes. But when you want to model
the same thing in a C++ or Java program, you would like to look at a course as a class,
an instructor as a class, teaches as a kind of class, and their instances as objects.
So, there have been attempts to give an object-oriented layer on top of relational
databases, or to define things in that way. Objectivity/DB, O2 and ObjectStore are some
of the examples, but unfortunately these have not been as popular as the regular
relational databases.
So, if you happen to use any one of them, you should be cautious and really know why you
are using it; then you would be able to go a long way with it.
(Refer Slide Time: 11:53)
Now, when you come across a particular system that your company or your university needs
to use, or when you have to choose one, it will be good to look at the different aspects
of that system. Here are some of the parameters on which these database systems vary,
from a minimal to a very large extent: what operating systems the system supports, what
its fundamental features are, and what its limits are. For example, every database sets
a number of limits in terms of the index size, the table size and a whole lot of other
things.
Other parameters are how tables and views are created and what kind of restrictions you
have on that, what kinds of indexes are supported, and what data types the different
databases support. We have talked about a very limited set of data types in terms of
SQL, but in an actual commercial or even open source database, the set of data types
could be wider than that. Then there are other objects, partitioning and the access
control mechanism. Access control is very important for ensuring security. And finally,
there is the kind of programming language support that you have.
This matters because, as we have seen in the application development module, it is not
enough to just have the database and fire SQL queries at it; no application user will
actually fire SQL queries. The application user needs a GUI, possibly, or a text
interface through which queries are put in a different form, and those need to be
processed by taking them to the database engine. So, you possibly need an interface in
terms of C, C++, Java, Python or a similar programming language. How you connect to the
database, or embed your relational query into these languages, differs between database
systems. So, these are the parameters that you must look at.
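As a small, hedged illustration of embedding a relational query in a host language, the
sketch below uses Python's standard sqlite3 module. SQLite is swapped in here only
because it ships with Python; it is not one of the systems compared in the lecture, and
connecting to Oracle, PostgreSQL or MySQL would need their own driver libraries. The
course, instructor and teaches tables echo the lecture's example, but the column names
and sample rows are assumptions.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, made-up university schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE teaches ("
        " id INTEGER REFERENCES instructor(id),"
        " course_id TEXT REFERENCES course(course_id),"
        " PRIMARY KEY (id, course_id))"
    )
    conn.executemany("INSERT INTO instructor VALUES (?, ?)", [(1, "Das"), (2, "Rao")])
    conn.executemany(
        "INSERT INTO course VALUES (?, ?)",
        [("CS101", "Databases"), ("CS102", "Algorithms")],
    )
    conn.executemany(
        "INSERT INTO teaches VALUES (?, ?)",
        [(1, "CS101"), (1, "CS102"), (2, "CS102")],
    )
    conn.commit()
    return conn


def courses_taught_by(conn, instructor_name):
    """Return the titles of courses taught by the named instructor.

    The application passes the user's input as a bound parameter (the '?'),
    so the end user never writes SQL and the input is not spliced into the query.
    """
    rows = conn.execute(
        """
        SELECT c.title
        FROM course AS c
        JOIN teaches AS t ON t.course_id = c.course_id
        JOIN instructor AS i ON i.id = t.id
        WHERE i.name = ?
        ORDER BY c.title
        """,
        (instructor_name,),
    ).fetchall()
    return [title for (title,) in rows]


conn = build_demo_db()
print(courses_taught_by(conn, "Das"))
```

A GUI or text interface would call a function like courses_taught_by with whatever the
user typed, which is the sense in which the query is put "in a different form".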
In the next series of slides, which I will not really discuss because they are more like
data, I have given a compilation of how these common RDBMS systems agree or differ on
different aspects. So, this slide shows the operating system support for the different
databases. If you are working on Android, for example, you can easily make out that you
do not have the choice of using SQL Server or Oracle, but you can use Sybase.
You can also use Postgres and MySQL. Actually, if you look at the two rightmost columns,
which are for the open source databases, you will find that they have the widest choice
in terms of operating system. In many aspects you will find that these free database
systems give you better options. Obviously, when it comes to the real core of database
systems, in terms of supporting very large databases, very fast operations and very
secure applications, you might need to work only with commercial software because it
offers that. But otherwise, for a large number of common medium-scale or low-cost
applications, the free database systems, Postgres and MySQL, are really good options.
There are other such features too.
I have compiled the different limits of the different database systems here. You can see
how they differ in terms of maximum row size, columns per row, and so on, which matters
when you are choosing which database to use.
(Refer Slide Time: 15:42)
You can use this information when making that choice. This slide talks about tables and
views, and the type system, that is, what kind of typing is used.
The different data types that are used are given here.
(Refer Slide Time: 15:52)
The access control features, which are critical for ensuring good security and good
management of the database, are given here.
So, this is how the systems compare across the different features and parameters. This
is for your reference only; it is not for your assignment or examination. I just wanted
you to have a view of what to expect, after all the theory and the small hands-on work
that you have seen, when you go to real life, and what systems you will have to work
with. Now let us move on, and let me briefly talk about a group of database systems
which are non-relational. Of course, I would warn you that the basis for these DBMSs is
not covered in this course and is beyond it; this is just for your information, and to
keep you in tune with what is happening around the industry today.
(Refer Slide Time: 17:16)
The non-relational database systems have arisen from what you all must have heard of:
the whole aspect of Big Data. Big Data, as the name suggests, is certainly voluminous
and complex data. Now the question you might have is: if I have done a good relational
design and I have a good RDBMS, can I not handle big data? The answer is that big data
is not only about volume; volume is only one aspect. Big data is typically characterized
by certain Vs.
These are not a standardized characterization, but they are the more commonly accepted
ones. One is volume: the quantity of data in a big data situation has to be very large.
Now, what counts as very large is again a subjective question. Someone can say that a
million records is large; someone else would say no, a million records is small and 10
million is large; some might say that a database needs to be petabytes in size to be
large; and so on.
All of this is subjective, but the data has to be voluminous in some sense. Then there
has to be variety: different types of data. All that we have seen in the relational
database is basically strings and numbers. Whatever we have dealt with through all 39
modules so far has been primarily about strings and numbers and nothing else. But big
data can be about free text; it could be natural language comments, like the comments
you regularly write on Facebook.
Those Facebook comments are phrases. Suppose I want to make a query based on them, such
as: how many Facebook users have commented on the success of Virat Kohli as captain of
India? Then the question is, how do I do that? The information is not available in a
structured way. There is no relational schema which says that the values are recorded as
Virat Kohli having done very well, moderately or marginally, or as Indian captains being
successful or not successful, and so on.
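To make the difficulty concrete, here is a deliberately naive Python sketch of such a
query over free-text comments. The sample comments, the user names and the
users_mentioning helper are all invented for illustration. It only matches keywords; it
cannot tell whether a comment calls the captain successful or not, which is exactly the
kind of information a relational schema would have stored as an explicit attribute.

```python
import re

# Hypothetical free-text comments; there is no column that records an opinion.
comments = [
    {"user": "asha", "text": "Virat Kohli has been a brilliant captain!"},
    {"user": "ravi", "text": "Great match today."},
    {"user": "asha", "text": "kohli's captaincy won us the series"},
    {"user": "meera", "text": "Not sure Kohli is successful as captain."},
]


def users_mentioning(comments, keywords):
    """Return the set of users with a comment containing every keyword.

    Matching is case-insensitive and anchored at the start of a word, so
    'captain' also matches 'captaincy'. This is keyword search only, not
    an analysis of what the comment actually says.
    """
    patterns = [re.compile(r"\b" + re.escape(k), re.IGNORECASE) for k in keywords]
    return {
        c["user"]
        for c in comments
        if all(p.search(c["text"]) for p in patterns)
    }


matching_users = users_mentioning(comments, ["kohli", "captain"])
print(len(matching_users), sorted(matching_users))
```

Note that asha is counted once even though she commented twice, and that meera's
doubtful comment is counted the same as a positive one.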
This information is expressed through the various texts, phrases and clauses that we
write. So, variety is a major issue; the data could also include audio, images and
video. Big data includes all of that. The third V is velocity: the processing speed may
need to be really fast. Often in big data we say that the processing has to be real
time. What is real time? It basically means that from the time I fire the query to the
time I get the result, there is a fixed time limit within which it has to happen.
If I want to do a railway reservation, that is also a kind of real time, but it is not
very critical. If I get the reservation done in one minute, that is okay; even if it
takes 5 minutes, it is okay; it is good if it can happen in 10 seconds; but I do not
ever need it in, say, 20 milliseconds. But when you talk about real time, it could
really be about getting all this processing done in milliseconds, microseconds,
nanoseconds and so on. Building that kind of real-time system over a large volume of
varied data is a big challenge.
So, those are the challenges of big data. Then there could be variability, that is,
inconsistency of the data, because maintaining integrity is a big problem; and there
could be issues in terms of the quality of the data, which is called veracity. There is
a lot of debate about these characteristics. Many people take the first three and say
that these are the 3 Vs of big data, the three main characteristics. But of late, more
and more people are also considering variability and veracity as characteristics of big
data. Now, you have a fairly good idea of relational databases: what they can do, how to
design them, how to query them, how to implement them. If you look at these
requirements, you will understand that it is not easy to meet them using the relational
model.
(Refer Slide Time: 22:07)
So, we need non-relational databases to effectively support big data. That is one major
reason why you need non-relational databases.
Non-relational databases, as the name suggests, do not follow the relational model. They
offer flexible schema design: the schema may itself change while the database is
evolving, which is not the case in the relational model, where the schema is fixed and
only the data can change. They may be able to handle unstructured data, such as natural
language comments, images and audio, which do not fit neatly into a table structure. As
for other features, they are typically open source, because they are still at an
experimental stage, and they need to be scalable. Some of the popular ones, whose names
you must have heard, at least some of them, are MongoDB, Cassandra, HBase and so on.
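The flexible-schema idea can be sketched in a few lines of Python. The
DocumentCollection class below is a toy written for this illustration, using plain
dictionaries; it is not the API of MongoDB or of any real document database. The point
it shows is that records in the same collection may carry different fields, and new
fields can appear without any schema change.

```python
class DocumentCollection:
    """A toy in-memory collection of schema-less documents (plain dicts)."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # No schema check: any set of fields is accepted.
        self._docs.append(dict(doc))

    def find(self, **criteria):
        """Return documents whose fields equal all the given criteria.

        A document that lacks a requested field simply does not match
        (unless the requested value is None).
        """
        return [
            d for d in self._docs
            if all(d.get(key) == value for key, value in criteria.items())
        ]

    def fields(self):
        """Return the union of field names seen so far across all documents."""
        names = set()
        for d in self._docs:
            names.update(d)
        return names


posts = DocumentCollection()
posts.insert({"user": "asha", "text": "hello"})
# A later document adds new fields; no ALTER TABLE equivalent is needed.
posts.insert({"user": "ravi", "image_url": "photo.jpg", "tags": ["cricket"]})

print(posts.find(user="ravi"))
print(sorted(posts.fields()))
```

In a relational table, adding image_url or tags would mean changing the table definition
for every row; here each document simply carries the fields it has.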
Now, not all non-relational databases are of the same type. There have been four
different styles or strategies to realize them: key-value store, graph store, column
store and document store. They are also known as NoSQL databases. I personally find the
name NoSQL a bit of a misnomer; NoSQL does not mean that you are strictly prohibited
from using SQL in these databases.
I would rather read NoSQL as "not only SQL": in a relational database you can use only
SQL and solve problems, whereas here you need to do a lot of other things beyond that.
So, here is a quick comparison between the relational and the non-relational. In terms
of flexibility of the data model, relational is very structured, whereas non-relational
has to handle unstructured and semi-structured data and therefore has to be flexible in
its data model. In terms of cost, complexity and speed, relational is faster, less
capable, but cheaper and less complex, whereas non-relational supports many more
database operations, is highly complex in its internal structure and is usually
costlier. In terms of performance and scalability, the non-relational ones certainly
need to scale better. In terms of consistency, relational has very strict consistency
rules, whereas non-relational uses some kind of eventually consistent system, so not
everything is consistent at every moment. In terms of enterprise management and
integration, relational fits in very well because, as you have seen from the little bit
of history of these common databases, it has been around for more than 40 years.
So, relational databases fit easily into the IT stack, whereas non-relational is still
in the agile form of development that is becoming more and more common, and it fits into
cloud-based development and so on. These are some of the distinctions that exist.
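The phrase "eventually consistent" can be illustrated with a toy model of two replicas.
This sketch assumes a simple last-write-wins rule based on a version number supplied by
the caller; the Replica class and the sync function are invented for this example, and
real NoSQL systems use considerably more elaborate mechanisms.

```python
class Replica:
    """One copy of the data, storing key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last write wins: keep the value with the highest version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange updates in both directions so the two replicas converge."""
    for source, target in ((a, b), (b, a)):
        for key, (version, value) in list(source.data.items()):
            target.write(key, value, version)


replica_a, replica_b = Replica(), Replica()
replica_a.write("status", "old", version=1)
sync(replica_a, replica_b)

# A new write reaches only replica_a at first.
replica_a.write("status", "new", version=2)
stale_read = replica_b.read("status")   # replica_b has not seen the update yet

sync(replica_a, replica_b)
fresh_read = replica_b.read("status")   # after syncing, both replicas agree

print(stale_read, fresh_read)
```

Between the write and the sync, a reader at replica_b sees the old value. A relational
system with strict consistency rules would not allow that window; an eventually
consistent one accepts it in exchange for scalability.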
And these are the different types of NoSQL databases. The key-value store strategy is
followed by Redis and MemcacheDB; graph store is used by OrientDB and Neo4j; column
store is used by Cassandra and HBase; and document store is used by MongoDB and
Couchbase. I am not going deeper into what they are or how they are distinguished; I
just want you to have the idea that these are different from relational databases. They
can handle unstructured data, and they can scale in volume and variety in ways that
relational databases cannot, but there are different principles for actually
implementing them, and there is more depth to them.
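As a rough analogy only, the four styles can be pictured with plain Python data
structures holding the same two users. The data and the helper functions are
assumptions made for this sketch; the real systems named above (Redis, Neo4j, Cassandra,
MongoDB and so on) have their own storage formats and query interfaces, which this does
not attempt to reproduce.

```python
# Key-value store: an opaque value looked up by a single key.
key_value = {"user:1": "Asha", "user:2": "Ravi"}

# Document store: self-describing records whose fields may differ.
documents = [
    {"_id": 1, "name": "Asha", "follows": [2]},
    {"_id": 2, "name": "Ravi", "city": "Pune", "follows": []},
]

# Column store: values grouped by column rather than by row,
# and a row may simply have no entry in a column.
columns = {
    "name": {1: "Asha", 2: "Ravi"},
    "city": {2: "Pune"},
}

# Graph store: nodes plus labelled edges between them.
graph = {
    "nodes": {1: {"name": "Asha"}, 2: {"name": "Ravi"}},
    "edges": [(1, "FOLLOWS", 2)],
}


def followers_of(graph, node_id):
    """Return ids of nodes with a FOLLOWS edge pointing at node_id."""
    return sorted(
        src for src, label, dst in graph["edges"]
        if label == "FOLLOWS" and dst == node_id
    )


def column_values(columns, column):
    """Read one whole column without touching the others."""
    return sorted(columns[column].values())


print(key_value["user:1"])
print([d["name"] for d in documents if d.get("city") == "Pune"])
print(column_values(columns, "name"))
print(followers_of(graph, 2))
```

Each shape favours a different kind of access: lookup by key, retrieval of a whole
record, scanning one attribute across many records, or following relationships.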
So, if you are interested, you can take specific courses which deal with the big data and
prepare yourself for the bigger challenges ahead.
Similar to what I did for relational databases, I have presented here a tentative
comparative study of these different NoSQL databases, in terms of the context in which
you use them.
Now, while you are handling unstructured data, there will also be a lot of data which is
structured.
So, how do you handle relational data along with NoSQL in these databases? And how does
performance compare between these different NoSQL databases? Based on all that, you can
make some judgment. Naturally, knowing relational databases, the foundational ones, is
very important today, but it is always good to look forward and be with the times. I
would urge that, if you have started growing an interest in handling data, you do take
specific courses on big data and NoSQL databases. I will end this discussion with a very
simple skill and job profile matrix, which will give you some idea and which you can
also use for a certain kind of self-assessment.
(Refer Slide Time: 27:37)
So, let me explain the structure of this matrix. Here on the left, I have given typical
job profiles; if you look at LinkedIn, Naukri and the like, you will find these kinds of
profiles. At the lowest level there are application programmers, for which typically 0
to 4 years of experience is asked for.
Then comes the next level; this is also your kind of career progression if you choose to
take up databases as your primary profession. The next level is senior application
programmer, which requires 2 to 6 years of experience depending on the organization and
on your skills. Then you move on to database analyst or architect, which happens in 4 to
8 years' time. These are primarily the application development track and its hierarchy.
On a somewhat different track is the administrator, who actually administers the
database in an organization and controls all that is happening in the different database
applications; typically 8 to 10 years' experience is required.
So, these are the profiles related to applications. Then there is a profile for those
who really want to become a database engineer, in the sense that you want to make
changes in Oracle, in MySQL, in MongoDB, or in Sybase.
If you want to become a database engineer who changes or develops the database system
itself, then this is the kind of background you will need, and this is the number of
years you would need. Of course it is not a single grade; there are multiple grades,
junior and mid levels and so on. And last, shown in a different color, is the whole set
of profiles which relate to programming for big data, analyzing big data and so on.
So, at a very initial level, you may not be expected to do all of the schema design and
normalization by yourself; it would be good to be able to, but there will be seniors to
help you. But once you become a senior application programmer, that becomes a critical
skill to have from then onward. The next level would be in terms of application or
database architecture management: deciding how to index, and performance optimization.
Between these two, there will be certain overlaps. A senior application programmer, in
addition to doing this, might do some of these optimization techniques, depending on how
competent he or she is. Or some database architect may focus only on this. But these are
the typical skills that you need. To be a database administrator, you need all of these
skills, but you are specifically administering an organization's or enterprise's whole
database system. So, it is not just one database application, but a whole lot of
databases and a whole lot of user groups, security, network connectivity and all that.
That certainly needs more experience; you can see that the experience level is much
higher, and so are the skill sets. If you want to become a database engineer, that is,
not focus only on the application side but also work on the internals of database
systems, then you need a whole lot of additional skills, like good knowledge of
algorithms, architecture, compilers and all of that. Only then, coupled with all the
database knowledge, will you be able to work as a database system engineer. Then there
is the emerging area of big data. I am saying the experience here is 0 to 6 years; it
could be 0 to 8, but not more than that, because the area did not exist a long time ago.
Here you need at least this basic level of relational database understanding and
knowledge, but what is critical is a whole set of other skills. You must be aware of big
data, data mining and warehousing strategies; machine learning is often very useful in
this kind of big data application; Python programming and TensorFlow all become
critical. You have to be a good programmer in any case, not just an SQL programmer; you
might have to be a good programmer in C, C++ or Python. But that is a very fast-emerging
area.
So, besides the basics of databases, if you pick up a few basics from here, you will be
able to enter this space, and that will give you a very bright future in my view.
Otherwise, you can focus on the application programming side as I have mentioned. So,
this is basically the skill-profile mapping that you can focus on.
So, finally, before I close, here is a glimpse of companies that are very active in the
RDBMS space. Really, any big organization you talk about has consultancy projects,
product development, different database management back-end services and so on. For DB
application development, I have listed around 20 companies, but there are really
hundreds of them. Almost any big organization in any area you think of requires
databases. So, as a database application programmer, senior programmer and, to some
extent, architect, you have a wide range of jobs available which you may just grab, if
you have studied the basics of databases well. The second group of companies which I
show here are system development companies, which are actually working on new DBMS
products and services around them.
These are companies like Oracle, Teradata, Microsoft and so on. Naturally these are big
companies, and to crack a job here you need a lot more skills besides databases, like I
said: algorithms, programming and all that. And here are some companies, among many
others, which are focusing on the big data space.
These may not be absolutely accurate, because they are all collected from different
sources, but these are the different companies and the kind of non-relational database
that they are working with. So, if you pick up skills in a certain non-relational NoSQL
database, then you can better target the corresponding companies, or other companies.
And you can see that all the new-generation companies, the companies that are working on
products for the next 10 to 15 years, are in this space.
So, there are a whole lot of opportunities for you all. If you prepare a little hard,
then the job will run after you; you will not have to run after the job.
(Refer Slide Time: 35:56)
So, with that I conclude this course, with a couple of final words, the hygiene words.
Read the DBMS textbook thoroughly and solve the exercises. There is no shortcut to that;
there is no other way to master the course. You must practice query coding as much as
you can, and practice database design from specification. We are releasing a tutorial on
this where, for a hospital management system, we show from the specification how you can
do the initial schema and the refinements, and finally how you can implement it using
MySQL.
So, do similar practice very heavily. Keep in mind that knowledge of database systems
alone will not be good enough to get a good job or a good placement. So, develop good
knowledge in programming, data structures, algorithms and discrete structures; these are
the minimum required around database systems, and they will really make you powerful.
And if you need help, we are there to help you.
As long as the course is on, the forum will be on, and you can post in the forum. Beyond
that also, if you need help, please ask for it; mail us. I wish you all the very best
with your course, your examination and the future course of your profession and life.
All the very best.