Chapter1-Introduction

A Database Management System (DBMS) is software that facilitates the creation, management, and secure access of databases, addressing issues like data redundancy, integrity, and security. The document discusses the evolution of database solutions, various data models, and the purpose of DBMS compared to traditional file systems. It also covers levels of abstraction in database systems and identifies different types of database users.

DBMS - Introduction

Contents
What is Database Management System (DBMS)?
DBMS based applications
Historical Perspective / How database solutions have evolved?
Purpose of Database Systems / File System vs DBMS
Data isolation
Difficulty in accessing the data
Data redundancy and inconsistency
Integrity problems
Atomicity problems
Concurrent-access anomalies
Security problems
Various Data Models (Historical & Current)
Hierarchical data model
Network data model
Relational data model
Entity-Relationship model
Object relational data model
Semi-structured data model / Document based data model
Key-Value based data model
Column family data model
Graph based data model
Levels of Abstraction & Data Independence
Physical level (Physical Schema)
Logical level (Conceptual Schema)
View level (External Schema)
Database Users
Naïve users
Application programmers
Sophisticated users
Database Administrators (DBAs)
Database Languages
Data Definition Language (DDL)

RamakrishnaN - MGIT
Data Control Language (DCL)
Data Manipulation Language (DML)
Transaction Control Language (TCL)
PL/SQL (Oracle), T-SQL (Microsoft SQL Server)
Database Management System’s Structure/Architecture (Components in a Database System)
Database based application/service high level architecture

What is Database Management System (DBMS)?


A database is a collection of interrelated data. Almost every organization uses databases to store its data.

A Database Management System (DBMS) is software, typically a server application or service, that supports creating databases, defining structures to store data, storing data in them, and retrieving that data in an efficient and convenient manner. It also ensures secure access to the data (only authorized users are given access) and enforces data consistency requirements.

Database Systems typically deal with very large volumes of data.

DBMS based applications


DBMSs are used in many domains, for example:

• Enterprises – to store their employee data, accounts data, inventory of items, production of
items, sales of items and so on.
• Banking and Finance – to store customer data and data on various accounts such as savings
accounts, current accounts, loan accounts and so on.
• Universities – to store their departments data, courses data, academics data, students data
and so on.
• Telecommunications – to store their network/tower data, customer data, call logs, monthly
generated bills and so on.

Historical Perspective / How database solutions have evolved?


1950s and early 1960s: Magnetic tapes, which support only sequential access, were used for data storage. Data processing programs were bound by the limitations of magnetic tapes: processing typically involved reading data from multiple tapes and writing the results to new tapes, which were then treated as the master tapes.

1960s - 1970s: The widespread use of hard disks improved the data processing capabilities of programs, since desired data could be accessed directly (no sequential access limitation). Hierarchical (tree-based) data structures were used to store the data. IBM’s Information Management System (IMS) was one such early database system.

In a tree structure, every data item can have only one parent data item, so this structure soon posed limitations. The Network Model addressed this: a data item can have multiple relationships/associations with other items. Integrated Data Store (IDS) was one of the early implementations of this model.

Programs had to be written to navigate these structures in order to access or operate on the data.

1970s - 1980s: The relational model and non-procedural ways of accessing and operating on data were proposed by E. F. Codd. The simplicity of the relational model, and the way it hides internal storage details while accessing data, attracted strong interest from industry. IBM introduced its first commercial relational database system, SQL/DS, based on its System R research project. Early commercially released relational systems included IBM DB2, Oracle and Ingres. Many other systems followed from various vendors, such as Microsoft SQL Server, MySQL, PostgreSQL, SQLite and more.

In file-based programs, programmers had to devise efficient ways of accessing the data based on the data model used. With relational database systems, programmers specify only what data is needed; efficiency is taken care of by the database system itself.

1990s: Object-oriented databases and object-relational databases were introduced, combining object-oriented concepts with relational databases. They allow handling of complex data types and support inheritance, encapsulation, etc.

Distributed databases emerged, allowing data to be spread across multiple physical locations, which requires synchronization techniques. High availability without any downtime became one of the key new requirements.

Late 2000s/2010s: With the rise of social media, e-commerce and IoT devices came the need to handle massive amounts of unstructured or semi-structured data, like chat messages/posts, documents, multimedia content and key-value pairs. Several different types of NoSQL databases emerged as a result. MongoDB, Apache Cassandra, Amazon DynamoDB, Microsoft Azure Cosmos DB and Neo4j are a few such systems.

With the growing popularity of Cloud computing, database systems started moving to the Cloud, leaving machine management to Cloud vendors. Amazon RDS, Google Cloud Spanner and Microsoft Azure SQL are a few such Cloud-based relational database solutions.

Purpose of Database Systems / File System vs DBMS


Before DBMSs were invented, enterprises and universities stored their data in files supported by the operating system. Multiple programs were developed to access and operate on the data in those files. For example, each department of a university used to keep separate files to track its students, faculty, courses and student-course enrolments. Every department maintained its own set of such files and programs, suited to its own needs. This file-based approach has several disadvantages:

Data isolation – Data is scattered across multiple files. File formats may vary over time and depend heavily on the programmer who created them. Multiple file formats complicate writing programs that need data from several of those files. The programming languages used may also vary over time, according to the programmers' preferences. So neither the data formats nor the programs handling that data may remain uniform over time.

Difficulty in accessing the data – It is difficult to foresee future requirements and write, up front, a complete set of programs that retrieve the data as needed. Suppose a need arises to get the list of students who live in a particular area of the city. A program might exist that fetches the list of students with their addresses, but not one that filters by area. So one must either use the existing program and filter the data manually by address, or first develop a program that does the filtering and returns only the desired students. Neither approach is fast or efficient. Later, another requirement may arise for which no program is available yet!

Data redundancy and inconsistency – The same data might be stored in multiple files (referred to as redundancy), wasting storage space and eventually leading to data inconsistency as well. For example, suppose a student is enrolled in a minor degree along with a major degree, so the department offering the minor also maintains the student's information in its files. If the student's address changes and one department updates it while the other does not, the two copies disagree. This leads to confusion about which one is up to date.
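
A database design typically stores each fact once and lets other tables refer to it by key, so an address change is made in one place only. Below is a minimal sketch using Python's sqlite3 module. The tables `students` and `enrolments` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the address lives only in 'students'; departments keep just the roll number.
conn.executescript("""
CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE enrolments (roll_no INTEGER REFERENCES students(roll_no), dept TEXT, degree TEXT);
INSERT INTO students VALUES (1, 'Asha', 'Old Street');
INSERT INTO enrolments VALUES (1, 'CSE', 'major'), (1, 'ECE', 'minor');
""")

# The address is updated once...
conn.execute("UPDATE students SET address = 'New Street' WHERE roll_no = 1")

# ...and both departments see the same, current value through a join.
rows = conn.execute("""
    SELECT e.dept, s.address
    FROM enrolments e JOIN students s ON s.roll_no = e.roll_no
    ORDER BY e.dept
""").fetchall()
print(rows)  # [('CSE', 'New Street'), ('ECE', 'New Street')]
```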

Integrity problems – Stored data values might need to satisfy certain constraints at all times. For example, a student's marks can never be negative, a phone number must contain 10 digits, and so on. It is difficult to enforce such constraints across the many programs that handle the data in the files. It is even more difficult to enforce constraints that span multiple files. For example, the student-course enrolment data should never refer to a student or course that is not present in the department's list of students or list of courses.
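
In a DBMS, such rules are declared once in the schema, and the system rejects any violating change no matter which program makes it. This is a minimal sketch, assuming SQLite via Python's sqlite3 module, with a hypothetical schema. Two simplifications apply. The phone rule is reduced to "exactly 10 characters", and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled per connection
conn.executescript("""
CREATE TABLE students (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    phone   TEXT CHECK (length(phone) = 10),   -- simplified '10 digits' rule
    marks   INTEGER CHECK (marks >= 0)         -- marks can never be negative
);
CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrolments (                      -- constraint spanning tables
    roll_no   INTEGER NOT NULL REFERENCES students(roll_no),
    course_id TEXT    NOT NULL REFERENCES courses(course_id)
);
INSERT INTO students VALUES (1, 'Asha', '9876543210', 85);
INSERT INTO courses VALUES ('DBMS', 'Database Management Systems');
""")

def try_insert(sql, params):
    """Return True if the DBMS accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("INSERT INTO students VALUES (?, ?, ?, ?)", (2, "Ravi", "12345", 70)))       # False: bad phone
print(try_insert("INSERT INTO students VALUES (?, ?, ?, ?)", (3, "Meena", "9123456780", -5)))  # False: negative marks
print(try_insert("INSERT INTO enrolments VALUES (?, ?)", (99, "DBMS")))                        # False: unknown student
print(try_insert("INSERT INTO enrolments VALUES (?, ?)", (1, "DBMS")))                         # True
```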

Atomicity problems – A computer system or program is subject to unexpected or abrupt failures or crashes. In such cases, the data should be left in a consistent state. For example, suppose the university's Accounts department transfers some funds to the CSE department. The program or system might crash after the amount is deducted from the Accounts department but before it is credited to the CSE department. That leaves the data in an inconsistent state: the university's overall balance should be the same as before the transfer, but it is not. Either both the debit and the credit should happen, or neither should.
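
A DBMS solves this by running the debit and the credit inside one transaction. This is a minimal sketch using Python's sqlite3 module, in which a connection used as a context manager commits on success and rolls back if an exception escapes. The "crash" is only simulated by raising an exception; after a real process crash, the DBMS's recovery mechanism performs the equivalent undo. The table and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (dept TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("Accounts", 100000), ("CSE", 20000)])
conn.commit()

def transfer(amount, crash_midway=False):
    # 'with conn:' commits if the block completes and rolls back if an exception escapes it.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE dept = 'Accounts'", (amount,))
        if crash_midway:
            # Simulated failure between the debit and the credit.
            raise RuntimeError("crash after debit, before credit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE dept = 'CSE'", (amount,))

def balances():
    return dict(conn.execute("SELECT dept, balance FROM accounts ORDER BY dept"))

try:
    transfer(5000, crash_midway=True)
except RuntimeError:
    pass
print(balances())  # {'Accounts': 100000, 'CSE': 20000}: the debit was undone

transfer(5000)
print(balances())  # {'Accounts': 95000, 'CSE': 25000}
```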

Concurrent-access anomalies – Multiple programs may operate on the same data, and often several of them execute concurrently, knowingly or unknowingly. Concurrent (parallel) execution may leave the data in an inconsistent or undesired state. For example, suppose a course allows only 40 students to enrol. After 39 students have enrolled, two enrolments start in parallel. Each one initially finds fewer than 40 enrolments and proceeds, so the enrolment data finally ends up with 41 enrolments. In another example, the Accounts department transfers funds to the CSE and ECE departments while an external transaction credits some amount to the Accounts department. If these three operations run concurrently, the overall balance might end up in an inconsistent state. Preventing this would require someone to supervise all program executions at all times, which is difficult and prone to human error.
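
The 40-seat example can be replayed deterministically. Below is a minimal sketch using Python's sqlite3 module with a hypothetical `enrolments` table. The two "programs" run their steps in one connection, interleaved by hand, so the anomaly shows up every time. The fix does the check and the insert in a single SQL statement. That fix relies on an assumption: SQLite allows only one writer at a time. Other DBMSs may need a stricter isolation level (such as SERIALIZABLE) or explicit locking to give the same guarantee.

```python
import sqlite3

LIMIT = 40
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolments (roll_no INTEGER, course_id TEXT)")
conn.executemany("INSERT INTO enrolments VALUES (?, 'DBMS')", [(n,) for n in range(1, 40)])

def count():
    return conn.execute("SELECT COUNT(*) FROM enrolments WHERE course_id = 'DBMS'").fetchone()[0]

# Unsafe "check, then insert", with the two programs' steps interleaved by hand.
seen_by_a = count()  # program A checks: 39
seen_by_b = count()  # program B checks before A inserts: also 39
if seen_by_a < LIMIT:
    conn.execute("INSERT INTO enrolments VALUES (40, 'DBMS')")
if seen_by_b < LIMIT:
    conn.execute("INSERT INTO enrolments VALUES (41, 'DBMS')")
print(count())  # 41: the limit was violated

conn.execute("DELETE FROM enrolments WHERE roll_no > 39")  # reset to 39 enrolments

def enrol_atomically(roll_no):
    # Check and insert in ONE statement: the row is inserted only if fewer than LIMIT exist.
    cur = conn.execute(
        """INSERT INTO enrolments (roll_no, course_id)
           SELECT ?, 'DBMS'
           WHERE (SELECT COUNT(*) FROM enrolments WHERE course_id = 'DBMS') < ?""",
        (roll_no, LIMIT),
    )
    return cur.rowcount == 1

print(enrol_atomically(40), enrol_atomically(41), count())  # True False 40
```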

Security problems – Not every user or programmer needs access to all of the department's data. For example, only the HoD of the department should have access to the department's fund balance, not every faculty member or programmer. A faculty member or programmer should not be able to see the salaries of all other faculty! Such data security constraints are difficult to enforce on data kept in a file-based system.

All these difficulties led to the development of Database Management Systems for handling data. Typical Database Management Systems on the market today, like Oracle and Microsoft SQL Server, handle all of the above aspects while maintaining the data.

Various Data Models (Historical & Current)


A data model specifies how data items are organized and related to each other.

Hierarchical data model – Tree-like structure. Suitable only for use cases like an organization's employee data, where every employee typically has only one manager but may have multiple reportees.

Example – IBM’s Information Management System (IMS).
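
The key property of a tree is that each record has at most one parent. Below is a minimal Python sketch of that idea, using a hypothetical `Employee` class. It is not how IMS actually stores records. Modelling a student enrolled in several courses would force duplicate records here, which is the limitation the network model addressed.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Employee:
    name: str
    # A tree allows at most ONE parent; repr/compare disabled to avoid walking back up the tree.
    manager: Optional["Employee"] = field(default=None, repr=False, compare=False)
    reportees: list = field(default_factory=list)  # any number of children

    def add_reportee(self, emp):
        emp.manager = self
        self.reportees.append(emp)
        return emp

def chain_of_command(emp):
    """Follow the single parent link from an employee up to the root."""
    chain = []
    while emp is not None:
        chain.append(emp.name)
        emp = emp.manager
    return chain

principal = Employee("Principal")
hod = principal.add_reportee(Employee("HoD CSE"))
faculty = hod.add_reportee(Employee("Faculty A"))
print(chain_of_command(faculty))  # ['Faculty A', 'HoD CSE', 'Principal']
```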

Network data model – An improvement over the hierarchical model above, in which multiple relationships are supported. It is a graph-like structure, with nodes representing entities and connections representing the relationships among those entities. For example, student and course are two nodes, and a student enrolling in a course is a relationship between them. Pointers are literally used to link one data item to other data items.

Example – Integrated Data Store (IDS).
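
The pointer-based linking can be sketched in Python with object references standing in for pointers. The class `Record` and the helper `link` are hypothetical illustrations, not the IDS interface. Unlike the tree, one record here can be linked to many others.

```python
class Record:
    """A record whose links to other records are plain object references (pointers)."""
    def __init__(self, name):
        self.name = name
        self.links = []  # a record may be linked to MANY records, unlike in a tree

def link(a, b):
    # Store the relationship in both directions so it can be navigated either way.
    a.links.append(b)
    b.links.append(a)

asha, ravi = Record("Asha"), Record("Ravi")
dbms, os_course = Record("DBMS"), Record("OS")
link(asha, dbms)
link(asha, os_course)
link(ravi, dbms)

# Programs navigate record by record by following pointers.
print([c.name for c in asha.links])  # ['DBMS', 'OS']
print([s.name for s in dbms.links])  # ['Asha', 'Ravi']
```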

Relational data model – Tabular (rows and columns) structure for storing data. A database contains a collection of tables. Columns represent the attributes that need to be tracked, while rows represent individual records. Both entities and relationships are stored as records (rows) in tables. For example, the list of students is in one table, the list of courses is in another, and students' enrolments in courses are in yet another. SQL is used to query the data from the database system.

Examples – Oracle, Microsoft SQL Server, PostgreSQL, MySQL, SQLite.
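
The three tables from the example above can be sketched with SQLite through Python's sqlite3 module. The table names and rows are hypothetical. The relationship is just another table of rows, and a join query combines the tables by matching column values instead of following pointers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students   (roll_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses    (course_id TEXT PRIMARY KEY, title TEXT);
-- The relationship itself is stored as rows in its own table.
CREATE TABLE enrolments (roll_no INTEGER, course_id TEXT);
INSERT INTO students   VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO courses    VALUES ('DBMS', 'Database Management Systems'), ('OS', 'Operating Systems');
INSERT INTO enrolments VALUES (1, 'DBMS'), (1, 'OS'), (2, 'DBMS');
""")

# SQL states WHAT is wanted; the DBMS decides how to match rows across tables.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrolments e
    JOIN students s ON s.roll_no = e.roll_no
    JOIN courses  c ON c.course_id = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
for name, title in rows:
    print(name, "->", title)
```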

Entity-Relationship model – Used only for database design, not for actually storing data. It deals with a collection of objects (entities) and the relationships among them. An entity is a thing or object in the real world that is distinguishable from other objects. These objects have associations/relationships with other objects. This standard pictorial representation helps avoid gaps in understanding among stakeholders during database design.

Object relational data model – An extension of the relational data model in which data is still stored in tabular form. It adds support for user-defined data types for columns (like Address) and collection types such as lists (like phoneNumbers). Entities can inherit attributes from other entities, and entities can have methods that operate on their data (data encapsulation).

Example – Oracle, PostgreSQL

Semi-structured data model / Document based data model – Unlike the relational data model, this permits data items of the same type to have slightly different sets of attributes. Data is stored in the form of documents in JSON or XML format, or as Blob (Binary Large Object) types. Suitable for unstructured or semi-structured data, like user profiles and video/audio/document content.

Examples – MongoDB, CouchDB, Microsoft Azure Blob Storage
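
The idea of documents of the same kind with different attributes can be imitated in plain Python with the json module. The helper `find` is hypothetical and is not the query API of MongoDB or CouchDB. It only shows that a document lacking a field simply does not match a condition on that field.

```python
import json

# Two "student" documents with different sets of attributes (hypothetical data).
students = [
    {"_id": 1, "name": "Asha", "phones": ["9876543210", "9123456780"]},
    {"_id": 2, "name": "Ravi", "address": {"city": "Hyderabad", "area": "Gandipet"}},
]
stored = [json.dumps(doc) for doc in students]  # documents kept as JSON text

def find(docs, **criteria):
    """Hypothetical helper: documents whose top-level fields equal every given criterion."""
    matches = []
    for text in docs:
        doc = json.loads(text)
        if all(key in doc and doc[key] == value for key, value in criteria.items()):
            matches.append(doc)
    return matches

print(find(stored, name="Ravi")[0]["address"]["area"])  # Gandipet
print(find(stored, name="Asha")[0].get("address"))      # None: Asha's document has no address
```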

Key-Value based data model – Data is stored as key-value pairs, where each key is unique and the value stored against a key can be of any type. A value can also be serialized JSON content.

Examples – Amazon DynamoDB, Microsoft Azure Cosmos DB, Redis cache
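
The access pattern is a lookup by a unique key, with the value treated as opaque. Below is a minimal in-memory sketch built on a Python dict. The class `KeyValueStore` and its `put`/`get`/`delete` methods are hypothetical and do not mirror the actual API of DynamoDB, Cosmos DB or Redis.

```python
import json

class KeyValueStore:
    """Hypothetical toy store: unique keys, values kept as serialized JSON text."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = json.dumps(value)  # a later put with the same key overwrites

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("session:42", {"user": "asha", "cart": ["DBMS textbook"]})
print(store.get("session:42")["cart"])  # ['DBMS textbook']
store.delete("session:42")
print(store.get("session:42"))          # None
```

Only lookups by key are cheap here. Finding every session for user "asha" would mean scanning all values, which is the trade-off this model makes for simplicity and speed.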

Column family data model – Data is stored in terms of column families, rather than as complete rows as in the relational model. In a typical row-oriented relational store, data is read row by row, and the complete row is read even if we are interested in only one or a few particular columns. In this column-based data model, data can be read column-wise very efficiently. Column-wise data can also be stored compactly using good compression techniques. Time-series data generated by IoT sensors and big data analytics workloads favour this data model for storing large volumes of data and for efficient real-time analysis.

Examples – Apache Cassandra, Google Cloud BigTable
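
The row-wise versus column-wise layout described above can be sketched in plain Python. The sensor readings are hypothetical, and run-length encoding is just one simple compression technique chosen for illustration. This shows the layout idea only, not how Cassandra or Bigtable organize data on disk.

```python
# The same three records in a row-wise layout (hypothetical sensor readings).
rows = [
    {"sensor": "s1", "ts": 1, "temp": 30.5},
    {"sensor": "s1", "ts": 2, "temp": 30.7},
    {"sensor": "s2", "ts": 1, "temp": 28.9},
]

# Column-wise layout: one list per column.
columns = {name: [r[name] for r in rows] for name in ("sensor", "ts", "temp")}

# Average temperature: the row layout touches every whole record,
# while the column layout reads only the 'temp' list.
avg_from_rows = sum(r["temp"] for r in rows) / len(rows)
avg_from_columns = sum(columns["temp"]) / len(columns["temp"])

def run_length_encode(values):
    """Simple compression that works well on columns with repeated adjacent values."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

print(run_length_encode(columns["sensor"]))  # [['s1', 2], ['s2', 1]]
```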

Graph based data model – Represents the data as a collection of nodes, relationships and properties. Nodes represent entities, and connections (which can have directions) represent relationships. Properties are key-value pairs holding additional information on nodes and connections. Graph DBMSs employ efficient data structures, such as adjacency lists or matrices, to store and retrieve data. Social networking applications use this data model because it is efficient for traversing relationships.

Example – Neo4j, Amazon Neptune.
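
Traversal over an adjacency list can be sketched in a few lines of Python. The "follows" graph and the function `within_hops` are hypothetical, and node/edge properties are left out for brevity. The function runs a breadth-first search to answer a typical social-network question: who is reachable within N hops?

```python
from collections import deque

# Adjacency list: each person maps to the people they follow (directed edges).
follows = {
    "Asha": ["Ravi", "Meena"],
    "Ravi": ["Kiran"],
    "Meena": ["Kiran"],
    "Kiran": [],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return {n: d for n, d in seen.items() if n != start}

print(within_hops(follows, "Asha", 2))  # {'Ravi': 1, 'Meena': 1, 'Kiran': 2}
```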

Levels of Abstraction & Data Independence

A database system is a collection of interrelated data together with mechanisms to access or operate on that data. A major purpose of a database system is to abstract away the internal complexities of actual data storage and provide a simple and efficient way of accessing or operating on the data.

Physical level (Physical Schema) is the lowest level and deals with how data is actually stored in the system. It involves complex low-level data structures for persisting data on disk, and it organizes the data to support faster access for the more frequent use cases. Most of these aspects are handled by the database management system itself; even database administrators have only a limited role at this level.

Logical level (Conceptual Schema) is the next higher level and deals with what data is stored in the database and what relationships exist among that data. It completely hides the physical-level complexities (referred to as physical data independence). Database administrators, who decide what data is to be stored in the database, work at this logical level of abstraction. It deals with data in terms of simple records (or rows) and the columns/fields in those records, without concern for how the data is stored at the physical disk level.
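
Physical data independence can be observed directly. In this minimal sketch (SQLite via Python's sqlite3 module, hypothetical `faculty` table), an index is added as a physical-level change. The logical schema and the query text stay the same, and so do the results; only the evaluation plan may change. The exact `EXPLAIN QUERY PLAN` output varies between SQLite versions, so it is printed rather than relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO faculty (name, dept) VALUES (?, ?)",
                 [("Anil", "CSE"), ("Bhavya", "ECE"), ("Chitra", "CSE")])

QUERY = "SELECT name FROM faculty WHERE dept = ? ORDER BY name"
before = conn.execute(QUERY, ("CSE",)).fetchall()

# Physical-level change: add an index. Tables and columns (the logical schema) are untouched.
conn.execute("CREATE INDEX idx_faculty_dept ON faculty(dept)")

after = conn.execute(QUERY, ("CSE",)).fetchall()
print(before == after, after)  # True [('Anil',), ('Chitra',)]

# The plan may now use the index; its exact wording depends on the SQLite version.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("CSE",)):
    print(row)
```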

View level (External Schema) is the highest level of abstraction; it exposes only parts of the database instead of the entire database. Although the logical level uses simpler structures, complexity remains because of the huge volumes of data and the many fields, which a naïve user of the database may not need. For example, a department clerk might be interested in faculty names and phone numbers but not in their subject expertise. A view can present only the desired aspects of the information, hiding all the unwanted details.

Generally, several views exist over the same data, each providing the abstraction suited to a particular usage.

It is better to develop applications against these views rather than against the logical level, so that applications remain independent of changes to the logical level when such changes are unavoidable (referred to as logical data independence).

Views also help secure access to the data, since permissions can be granted to end users at the desired view level only. DBAs can grant end users access to views that expose the relevant information, instead of the complete logical-level schema/data.
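
The clerk example can be sketched as a view using SQLite via Python's sqlite3 module. The table, view name and data are hypothetical. SQLite has no user accounts or GRANT statement, so the permission step is only described here. A server DBMS such as PostgreSQL or Oracle would grant the clerk SELECT on the view rather than on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, expertise TEXT, salary INTEGER);
INSERT INTO faculty (name, phone, expertise, salary) VALUES
    ('Anil',   '9876543210', 'Databases', 90000),
    ('Chitra', '9123456780', 'Networks',  95000);

-- External schema for the department clerk: only name and phone are exposed.
CREATE VIEW faculty_contacts AS SELECT name, phone FROM faculty;
""")

cur = conn.execute("SELECT * FROM faculty_contacts ORDER BY name")
print([d[0] for d in cur.description])  # ['name', 'phone']: expertise and salary stay hidden
print(cur.fetchall())
```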

Database Users
There are a few types of database system users, distinguished by the way they are expected to interact with the system. Each type of user is given a different kind of interface.

Naïve users interact with the database system by invoking one of the existing application programs. For example, a university student goes to the course enrolment web page, fills in the required details and enrols in the course. Here, the user adds data to the database, or modifies existing data, through an existing web interface.

Application programmers are software professionals who develop application programs that interact with the back-end database, simplifying database access for other end users such as naïve users.

Sophisticated users interact with the database system by writing database queries (SQL) themselves, without needing any predefined application or web interface.

Database Administrators (DBAs) are the users who manage the entire database system of the enterprise/university. They

• Design the database for their organization, according to the data that needs to be maintained.
• Define the database schema accordingly in the database system, and alter the schema and migrate
the data if the need arises after initial creation.
• Create users, and grant/revoke permissions for the appropriate database users to secure the stored
data.
• Maintain the database servers with regular system updates and so on.
• Take regular backups of the data in the database, to be able to recover from failures such as natural
disasters.
• Ensure that sufficient disk space is available for near-future demands, and expand the disk drives if
required.
• Monitor database system performance indicators such as CPU usage and network usage, and take
corrective measures if needed.

Database Languages
SQL – Structured Query Language is a standard non-procedural language used in Database
Management Systems.

Data Definition Language (DDL) – Helps define the database schema, i.e., what data needs to be stored in the database, what consistency constraints must be honoured on that data, etc. The DBA defines all of this.

Examples – CREATE TABLE, ALTER TABLE commands
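
The two DDL commands can be run against SQLite from Python's sqlite3 module. Below is a minimal sketch with a hypothetical `course` table. `PRAGMA table_info` is SQLite-specific; it reads back the schema the DBMS stored in its catalog (data dictionary).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: defines what data is stored and the constraints it must satisfy.
conn.execute("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        credits   INTEGER CHECK (credits BETWEEN 1 AND 5)
    )
""")

# ALTER TABLE: evolves the schema after creation (here, adding a column).
conn.execute("ALTER TABLE course ADD COLUMN dept TEXT")

# The schema itself is stored by the DBMS and can be queried (SQLite-specific PRAGMA).
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]
print(columns)  # ['course_id', 'title', 'credits', 'dept']
```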

Data Control Language (DCL) – Helps control access to the data in the database, i.e., defines which user is permitted to access which part of the data. This is sometimes considered part of DDL itself. The DBA defines all of this.

Examples – GRANT, REVOKE commands

Data Manipulation Language (DML) – Helps insert new data into the database, retrieve data from the database, update data in the database, etc. All users of the database use these declarative commands, i.e., users specify only what needs to be done, without any details of how. Database Management Systems employ efficient techniques to carry out these operations quickly.

Examples – SELECT, INSERT commands
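
Here are the common DML commands against SQLite via Python's sqlite3 module, on a hypothetical `course` table. Each statement says which rows to affect, not how to locate them. Besides SELECT and INSERT, it shows UPDATE and DELETE, which also belong to DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT, credits INTEGER)")

# INSERT: add new rows.
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [("DBMS", "Database Management Systems", 3), ("OS", "Operating Systems", 4)])
# UPDATE: change existing rows that match a condition.
conn.execute("UPDATE course SET credits = 4 WHERE course_id = 'DBMS'")
# DELETE: remove rows that match a condition.
conn.execute("DELETE FROM course WHERE course_id = 'OS'")
conn.commit()

# SELECT: describe the rows wanted; the DBMS decides how to find them.
print(conn.execute("SELECT course_id, credits FROM course").fetchall())  # [('DBMS', 4)]
```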

Transaction Control Language (TCL) – Helps manage transactions. Transactions allow a group of SQL (DML) operations to be treated as a single unit of work: either all of them are executed completely, or none of them is.

Examples – BEGIN TRANSACTION, COMMIT, ROLLBACK
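
These TCL statements can be issued as written to SQLite from Python's sqlite3 module. The module must be told not to manage transactions itself, which `isolation_level=None` does. The `enrolments` table is hypothetical. The committed group of inserts survives, and the rolled-back insert disappears.

```python
import sqlite3

# isolation_level=None: the sqlite3 module stops opening transactions implicitly,
# so the TCL statements below reach SQLite exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE enrolments (roll_no INTEGER, course_id TEXT)")

conn.execute("BEGIN TRANSACTION")
conn.execute("INSERT INTO enrolments VALUES (1, 'DBMS')")
conn.execute("INSERT INTO enrolments VALUES (1, 'OS')")
conn.execute("COMMIT")    # both rows become permanent together

conn.execute("BEGIN TRANSACTION")
conn.execute("INSERT INTO enrolments VALUES (2, 'DBMS')")
conn.execute("ROLLBACK")  # the uncommitted row is discarded

print(conn.execute("SELECT COUNT(*) FROM enrolments").fetchone()[0])  # 2
```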

PL/SQL (Oracle) and T-SQL (Microsoft SQL Server) are procedural extensions to SQL. They allow writing procedural code (loops, conditions) that executes SQL statements in a more flexible way, and they are used for writing stored procedures, triggers, etc.

SQL is meant solely for operating on data in the database. It does not contain the general-purpose programming constructs or capabilities of languages like C/C++/Java, such as taking user input or performing network communication. Often we need to embed SQL commands in application programs written in languages like C/C++/Java. To achieve this, APIs are available that follow standards such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC).

For example, a university exposes web pages/forms that allow students to enrol in courses by filling in the required details themselves. A university would also have an application that generates payroll cheques for faculty every month. All such requirements involve application programs that access the database through ODBC/JDBC.

In most real-world applications, interactions happen as follows:

Application (C++/Java/…) → ODBC/JDBC → Stored Procedures in Database System → SQL commands

Database Management System’s Structure/Architecture (Components in a Database System)

[Figure: Database System Architecture / Components of Database System]

Query Processor layer


DDL Interpreter – Interprets DDL commands and saves the schema definitions in the Data Dictionary.

DML Compiler & Organizer / Optimizer – Compiles DML queries into low-level instructions that the Query Evaluation Engine can understand. A query can usually be translated into several alternative evaluation plans that all produce the same result. The DML optimizer performs query optimization, i.e., it picks the most efficient evaluation plan among the alternatives.

Query Evaluation Engine – Executes the low-level instructions generated for the query and returns
the results of the query.

Storage Manager layer


Authorization Manager – Checks whether the user is authorized to access the requested data.

Integrity Manager – Checks and ensures that the integrity constraints defined over the data are always maintained. That is, during every data modification operation such as insert, delete and update, it ensures that the data adheres to domain constraints, uniqueness constraints, referential integrity constraints across the data/relations and so on.

Transaction, Lock & Recovery Manager – Ensures that data remains in a consistent state despite system failures/crashes in the middle of query execution. It also keeps data consistent when concurrent query executions operate on the same data.

File Manager – Data (the set of records) is stored on the hard disk as a set of files. The File Manager keeps track of the pages/data blocks of those files, allocates the needed space on disk storage, etc.

Buffer Manager – Responsible for fetching the needed data blocks from disk into main memory during query execution. It maintains an in-memory cache for faster access to frequently accessed data.
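
A toy buffer manager can be sketched in Python. The class `BufferPool` and the callback `read_block_from_disk` are hypothetical stand-ins for real disk I/O, and least-recently-used (LRU) eviction is only one possible replacement policy. Real buffer managers also handle pinned blocks and writing modified (dirty) blocks back to disk, which this sketch leaves out.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: caches up to `capacity` disk blocks, evicting the least recently used."""
    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk  # hypothetical stand-in for disk I/O
        self.frames = OrderedDict()  # block_id -> block contents, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)  # hit: mark as most recently used
            return self.frames[block_id]
        self.disk_reads += 1                   # miss: the block must come from disk
        block = self.read_block_from_disk(block_id)
        self.frames[block_id] = block
        if len(self.frames) > self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used block
        return block

pool = BufferPool(capacity=2, read_block_from_disk=lambda b: f"contents of block {b}")
for b in [1, 2, 1, 3, 1, 2]:
    pool.get_block(b)
print(pool.disk_reads)  # 4: repeated accesses to block 1 were served from memory
```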

Disk Storage layer


Data Dictionary – Stores the metadata of the database, i.e., definitions of database tables and views, constraints, indices, stored procedures, triggers, user access controls, disk-level file information of each table, etc.

Data files – Store the actual data (the set of records).

Indices – Help achieve faster access to the stored data. An index maintains pointers to data items, which help retrieve the needed data faster.

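Statistical Data – Contains statistics about the stored data (such as the number of records and the distribution of values), which the query processor uses to choose efficient evaluation plans. The records of updates that the recovery manager uses to restore a consistent state are kept in a separate log.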

Database based application/service high level architecture


Database-based applications are usually partitioned into two or three parts. The three-tier architecture is the more common of the two.

[Figure: 3-tier architecture]

The client machine merely acts as a front end / UI for the user. The application client communicates with the application server on the server side, which in turn connects to the database system to access the data. The application server contains the core business logic, which decides what data to fetch from the database system. The application server typically connects to the database system using ODBC/JDBC.

In the 2-tier architecture, the application on the client machine communicates directly with the database system; it is mostly of educational interest.

[Figure: 2-tier architecture]
