Database System - Notes - Unit-1
UNIT 1
What is Database Management System?
A Database Management System (DBMS) is software for storing and retrieving users’ data while
applying appropriate security measures. It consists of a group of programs that manipulate the
database. The DBMS accepts a request for data from an application and instructs the operating
system to provide the specific data. In large systems, a DBMS helps users and other third-party
software store and retrieve data.
A DBMS allows users to create their own databases as per their requirements. The term “DBMS”
includes the users of the database and other application programs. It provides an interface between
the data and the software application.
Let us see a simple example of a university database. This database maintains information
concerning students, courses, and grades in a university environment. The database is organized as
five files.
Some popular DBMS software:
MySQL
Microsoft Access
Oracle
dBASE
FoxPro
SQLite
Microsoft SQL Server
Compared with a file-based data management system, a DBMS has numerous benefits. Some of the most
significant ones are described here:
1. Data Integrity - Data integrity means data is consistent and accurate in the database. It is essential as there
are multiple databases in DBMS. All these databases contain data which is visible to multiple users. Therefore, it
is essential to ensure that data is consistent and correct in all databases for all users.
2. Data Security - Data security is a vital concept in a database. Only authorized users must be allowed to access
the database, and their identity must be authenticated, for example with a username and password. Unauthorized
users must not be allowed to access the database under any circumstances, as this violates the integrity constraints.
A DBMS provides a better platform for data privacy, thus helping companies offer improved data security.
3. Better data integration - Due to the database management system, we have access to well managed and
synchronized form of data making it easy to handle. It also gives an integrated view of how a particular
organization is working and keeps track of how one segment of the company affects another segment.
4. Minimized Data Inconsistency - Data inconsistency occurs between files when different versions of the same
data appear in different places. A DBMS keeps data consistent by minimizing data redundancy.
Besides, any change to the database is immediately visible to all users, so inconsistent copies do not arise.
5. Faster Data Access - The database management system helps users produce quick answers to queries,
making data access faster and more accurate.
For a given dataset, a DBMS can help answer financial queries such as:
(a) What is the bonus given to every salesperson in the last two months?
(b) How many customers have a credit score of more than 800?
(c) What is last year’s profit?
6. Better decision making - Because a DBMS gives improved, managed access to data, we can generate
better-quality information and hence make better decisions. Better data quality ultimately
improves validity, accuracy, and the time it takes to read data. A DBMS doesn’t guarantee data quality; it provides a
framework that makes it easier to enhance data quality.
7. Simplicity - A DBMS allows us to understand data better through a clear and simple logical view. With a DBMS, many
operations, such as inserting, deleting, or creating files or data, are easy to implement.
8. Recovery and Backup - A DBMS automatically takes care of recovery and backup. Users are not required to
take periodic backups, as this is handled by the DBMS. It also restores a database to its previous condition
after a system failure or crash.
9. Increased end-user productivity - The available data is transformed into helpful information with the help of
combination tools. This helps end users make better, more informed, and quicker decisions that can make the difference
between success and failure in the global economy.
Additionally, DBMSs today serve as the backbone of several advanced technology practices such as Data
Science, Data Modelling, and Machine Learning.
The view of data in a DBMS describes how the data is visualized at each level of data abstraction. Data
abstraction allows developers to keep complex data structures away from the users. The developers achieve this
by hiding the complex data structures through levels of abstraction.
One more feature to keep in mind is data independence: changing the data
schema at one level of the database must not require modifying the data schema at the next higher level. In this
section, we will discuss the view of data in a DBMS in terms of data abstraction, data independence, and instances and schemas:
1. Data Abstraction
2. Data Independence
3. Instance and Schema
Data Abstraction
Data abstraction means hiding the complex data structures in order to simplify the user’s interface to the system.
It is done because many of the users interacting with the database system do not have the computer training needed to
understand the complex data structures of the database system.
To achieve data abstraction, we will discuss the Three-Schema architecture, which abstracts the database at
the three levels discussed below:
Three-Schema Architecture:
The main objective of this architecture is to have an effective separation between the user interface and
the physical database. The user never has to be concerned with the internal storage of the database
and has a simplified interaction with the database system.
Physical (internal) level: The physical or internal level schema describes how the data is stored on the hardware.
It also describes how the data can be accessed. The physical level is the lowest level of data abstraction and
has complex data structures. Only the database administrator operates at this level.
Logical (conceptual) level: This is the level above the physical level. Here, the data is described in terms of
entity sets, entities, their data types, the relationships among the entity sets, the user operations performed to
retrieve or modify the data, and certain constraints on the data. Adding constraints to the view of data also adds
security, as users are restricted from accessing particular parts of the database.
The developers and the database administrator operate at the logical or conceptual level.
View (external) level: This is the highest level of data abstraction and exhibits only a part of the whole database:
the data in which the user is interested. The view level can describe many views of the same data. Here, the user
retrieves information from the database using different applications.
Example: Suppose we have to create a database for a college. What entity sets would be involved? Student, Lecturer,
Department, Course, and so on.
The entity sets Student, Lecturer, Department, and Course will be stored as consecutive
blocks of storage locations. This is the physical or internal level; it is hidden from the programmers, but
the database administrator is aware of it.
At the logical level, the programmers define the entity sets and the relationships among them using a
language such as SQL. So the programmers work at the logical level, and the database
administrator also operates at this level.
At the view level, the users have a set of applications which they use to retrieve the data they are interested
in.
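The split between the logical level and the view level in the college example can be sketched in code. This is a minimal illustration only: it assumes Python's built-in sqlite3 module as a stand-in DBMS, and the student table, its columns, the student_directory view, and the sample rows are all hypothetical names chosen for the example.

```python
import sqlite3

# Physical level: where and how bytes are stored is left entirely to the DBMS.
conn = sqlite3.connect(":memory:")

# Logical level: the programmer defines the entity set and its constraints.
# (Hypothetical table, columns, and rows, used only for illustration.)
conn.execute("""
    CREATE TABLE student (
        roll_no    INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        department TEXT NOT NULL,
        fee_due    REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Physics", 0.0), (2, "Ravi", "Maths", 1500.0)],
)

# View level: expose only the part of the data a particular user needs.
# The fee_due column is deliberately left out of this view.
conn.execute(
    "CREATE VIEW student_directory AS "
    "SELECT roll_no, name, department FROM student"
)

directory = conn.execute(
    "SELECT * FROM student_directory ORDER BY roll_no"
).fetchall()
print(directory)
```

A user who queries student_directory sees the roll number, name, and department, but never the fee_due column, which is one way constraints on the view of data can add security.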
Data Independence - Data independence defines the extent to which the data schema can be changed at one
level without modifying the data schema at the next level. Data independence can be classified as shown below:
Logical Data Independence - Logical data independence describes the degree to which the logical or
conceptual schema can be changed without modifying the external schema. Why would the data schema need to
change at the logical or conceptual level?
Changes to the data schema at the logical level are made either to enlarge or reduce the database by
adding or deleting entities or entity sets, or by changing the constraints on the data.
Physical Data Independence - Physical data independence defines the extent up to which the data schema can
be changed at the physical or internal level without modifying the data schema at logical and view level.
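Both kinds of data independence can be observed in a small experiment. The sketch below assumes Python's built-in sqlite3 module; the student table, the student_names view, and the index name are hypothetical. Adding an index stands in for a physical-level change, and adding a column stands in for a logical-level change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, used only for illustration.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Physics"), (2, "Ravi", "Maths")],
)
# External (view-level) schema used by an application.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")

view_query = "SELECT * FROM student_names ORDER BY roll_no"
before = conn.execute(view_query).fetchall()

# Physical-level change: add an index. No table definition or query changes.
conn.execute("CREATE INDEX idx_student_department ON student (department)")
after_physical = conn.execute(view_query).fetchall()

# Logical-level change: add a column. The view names its columns explicitly,
# so the external schema and the application query stay as they were.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = conn.execute(view_query).fetchall()

print(before == after_physical == after_logical)
```

The same application query returns the same rows after each change, which is the behaviour the two definitions above describe.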
Instance - We can define an instance as the information stored in the database at a particular point in time. As
discussed above, the database comprises several entity sets and the relationships between them. The
data in the database keeps changing with time as we insert or delete data. If we retrieve
information from the database at a particular time, it corresponds to an
instance.
Schema - Developers have to deal with both the definition of the database and
the data in the database. The definition of a database comprises the description of what data it will contain and
what the relationships between the data will be. This definition is the database schema.
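The difference between schema and instance can be made concrete: inserting data changes the instance but leaves the schema alone. A minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical course table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)

def schema(connection):
    """Column names of the course table (part of its definition)."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in connection.execute("PRAGMA table_info(course)")]

def instance(connection):
    """The rows stored in the course table right now."""
    return connection.execute("SELECT * FROM course ORDER BY course_id").fetchall()

schema_before, instance_before = schema(conn), instance(conn)

conn.execute("INSERT INTO course VALUES (101, 'Databases')")

schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)      # the definition is unchanged
print(instance_before, instance_after)    # the stored data has changed
```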
SQL is the standard language for database management. RDBMSs such as MySQL, MS Access,
Oracle, Sybase, and SQL Server use SQL as their standard database language. SQL
uses different commands for different operations. We will learn about the DCL, TCL, DQL, DDL, and DML
commands in SQL with examples.
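DDL (Data Definition Language) commands:
(i) CREATE - The CREATE statement is used to define the database structure (schema).
Syntax:
CREATE TABLE table_name (column_name datatype [, ...]);
(ii) DROP - The DROP command removes tables and databases from the RDBMS.
Syntax:
DROP TABLE table_name;
(iii) ALTER - The ALTER command allows you to alter the structure of the database.
Syntax: To add a new column to the table
ALTER TABLE table_name ADD column_name column_definition;
(iv) TRUNCATE - This command is used to delete all the rows from the table and free the space occupied by the table.
Example:
TRUNCATE TABLE students;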
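The DDL commands above can be tried end to end. The sketch below is one possible run, assuming Python's built-in sqlite3 module; the Email column is a hypothetical addition. SQLite has no TRUNCATE TABLE statement, so a DELETE without a WHERE clause is used as a stand-in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(connection):
    """Names of all tables currently defined in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]

def column_names(connection, table):
    """Column names of the given table, in definition order."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

# CREATE: define a new table.
conn.execute(
    "CREATE TABLE students (RollNo INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
after_create = table_names(conn)

# ALTER: add a new column (Email is a hypothetical column for this example).
conn.execute("ALTER TABLE students ADD COLUMN Email TEXT")
after_alter = column_names(conn, "students")

# TRUNCATE stand-in: SQLite lacks TRUNCATE TABLE, so delete every row instead.
conn.execute(
    "INSERT INTO students (RollNo, FirstName, LastName) VALUES (60, 'Tom', 'Erichsen')"
)
conn.execute("DELETE FROM students")
rows_left = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

# DROP: remove the table definition itself.
conn.execute("DROP TABLE students")
after_drop = table_names(conn)

print(after_create, after_alter, rows_left, after_drop)
```

Note the difference between the last two steps: emptying the table leaves its definition in place, while DROP removes the table entirely.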
DML (Data Manipulation Language) commands:
INSERT
UPDATE
DELETE
INSERT: This statement is used to insert data into a row of a table.
Example:
INSERT INTO students (RollNo, FirstName, LastName) VALUES ('60', 'Tom', 'Erichsen');
UPDATE: This command is used to update or modify the value of a column in the table.
Example:
UPDATE students
SET FirstName = 'John', LastName = 'Wick'
WHERE StudID = 3;
DELETE: This command is used to remove one or more rows from a table.
Syntax:
DELETE FROM table_name [WHERE condition];
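The three DML commands can be run in sequence to see their effect on the stored rows. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, identifies rows by RollNo throughout (the UPDATE example in these notes uses a StudID column instead), and adds a second, hypothetical student so that there is a row to delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (RollNo INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)

def all_rows(connection):
    return connection.execute("SELECT * FROM students ORDER BY RollNo").fetchall()

# INSERT: add rows (the second student is hypothetical).
conn.execute(
    "INSERT INTO students (RollNo, FirstName, LastName) VALUES (60, 'Tom', 'Erichsen')"
)
conn.execute(
    "INSERT INTO students (RollNo, FirstName, LastName) VALUES (61, 'Ola', 'Hansen')"
)
after_insert = all_rows(conn)

# UPDATE: change column values in the rows matched by WHERE.
conn.execute(
    "UPDATE students SET FirstName = 'John', LastName = 'Wick' WHERE RollNo = 60"
)
after_update = all_rows(conn)

# DELETE: remove the rows matched by WHERE; rowcount reports how many.
cursor = conn.execute("DELETE FROM students WHERE RollNo = ?", (61,))
deleted = cursor.rowcount
after_delete = all_rows(conn)

print(after_insert, after_update, deleted, after_delete, sep="\n")
```

Leaving out the WHERE clause in UPDATE or DELETE applies the change to every row, so it is worth checking the condition before running either statement.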
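TCL (Transaction Control Language) commands:
Commit - This command is used to save all the changes made by a transaction to the database.
Syntax:
COMMIT;
For example:
DELETE FROM students
WHERE RollNo = 25;
COMMIT;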
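Rollback - The ROLLBACK command undoes the changes of a transaction that have not yet been committed to the
database.
Syntax:
ROLLBACK;
Savepoint - The SAVEPOINT command marks a point within a transaction to which you can later roll back
without undoing the entire transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
Example:
SAVEPOINT RollNo;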
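How COMMIT, ROLLBACK, and SAVEPOINT interact is easiest to see in a short run. The sketch below assumes Python's built-in sqlite3 module, opened with isolation_level=None so that every BEGIN, COMMIT, and ROLLBACK is issued explicitly; the table contents and the savepoint name after_ann are hypothetical.

```python
import sqlite3

# isolation_level=None stops the module from opening transactions implicitly,
# so the transaction boundaries below are exactly the ones written out.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE students (RollNo INTEGER PRIMARY KEY, FirstName TEXT)")

# Transaction 1: committed, so the row is saved.
conn.execute("BEGIN")
conn.execute("INSERT INTO students VALUES (1, 'Tom')")
conn.execute("COMMIT")

# Transaction 2: roll back to a savepoint, then commit what remains.
conn.execute("BEGIN")
conn.execute("INSERT INTO students VALUES (2, 'Ann')")
conn.execute("SAVEPOINT after_ann")
conn.execute("INSERT INTO students VALUES (3, 'Raj')")
conn.execute("ROLLBACK TO SAVEPOINT after_ann")  # undoes only the insert of row 3
conn.execute("COMMIT")

# Transaction 3: rolled back entirely, so the delete never takes effect.
conn.execute("BEGIN")
conn.execute("DELETE FROM students")
conn.execute("ROLLBACK")

rows = conn.execute("SELECT * FROM students ORDER BY RollNo").fetchall()
print(rows)
```

Only the work that was committed survives: rows 1 and 2 remain, row 3 was undone by the partial rollback, and the final delete was undone by the full rollback.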
DQL (Data Query Language) command:
SELECT: This command selects attributes based on the condition described by the WHERE
clause.
Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
For example:
SELECT FirstName
FROM Student
WHERE RollNo > 15;
Architecture of DBMS
1. Data - A database is a collection of integrated and shared data. Integrated data means that an application
programmer can use data from multiple tables at the same time.
For example, a query about student information may use both the student and faculty tables.
Shared data means that a single source of data can be used by more than one application programmer at
the same time.
2. Software - The software used by the various types of users to access the data in the database, such as
programming languages, SQL commands, and graphical user interface (GUI) applications.
3. Hardware - The physical components used in the database system are the hardware. Hardware includes:
(a) I/O devices - Input and output devices that are used for getting the inputs and producing the outputs of the
system.
(b) Processor - The hardware component that converts inputs to outputs, such as a query
to its result. A processor includes an Arithmetic Logic Unit, a Control Unit, and a Memory Unit (registers).
(c) Secondary storage device - The physical device used for storing the data permanently, because
RAM can be used only for temporary storage.
(d) Network - A network is a collection of computers that are connected through networking components.
Multiple computers can be connected through the network to share information between systems and users.
4. Users - There are four types of users:
(a) Application programmers - These users know programming and access the data in the database by writing
their own programs (applications).
(b) Sophisticated users - These users are given commands, such as SQL commands, to access the
data in the database. They need not know programming.
(c) Naïve users - These are the end users. They access the data in
the database through graphical user interface applications provided to them.
(d) Database Administrators (DBAs) - These users administer the database and have
control over it.
5. Other Components
1. Query Processor: It interprets the requests (queries) received from the end user via an application program into
instructions. It also executes the user request received from the DML compiler.
The Query Processor contains the following components:
DML Compiler: It translates DML statements into low-level instructions (machine language) so that they
can be executed.
DDL Interpreter: It processes DDL statements into a set of tables containing metadata (data about
data).
Embedded DML Pre-compiler: It converts DML statements embedded in an application program into
procedure calls.
Query Optimizer: It chooses the most efficient way to execute the instructions generated by the DML Compiler.
2. Storage Manager: The Storage Manager is a program that provides an interface between the data stored in the
database and the queries received. It is also known as the Database Control System. It maintains the consistency
and integrity of the database by applying the constraints and executing the DCL statements. It is responsible for
updating, storing, deleting, and retrieving data in the database.
It contains the following components:
Authorization Manager: It ensures role-based access control, i.e., it checks whether a particular person is
privileged to perform the requested operation.
Integrity Manager: It checks the integrity constraints when the database is modified.
Transaction Manager: It controls concurrent access by scheduling the operations of the transactions it
receives. Thus, it ensures that the database remains in a consistent state before and after
the execution of a transaction.
File Manager: It manages the file space and the data structures used to represent information in the
database.
Buffer Manager: It is responsible for cache memory and the transfer of data between secondary storage
and main memory.
Data Dictionary: It contains information about the structure of every database object. It is the repository of
metadata.
Indices: They provide faster retrieval of data items.
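Two of the components above, the data dictionary and indices, can be inspected directly in a real system. The sketch below is illustrative only and assumes Python's built-in sqlite3 module, where the catalog table sqlite_master plays the role of the data dictionary; the employees table, the index name idx_employee_city, and the city value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index, used only for illustration.
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, employee_name TEXT, employee_city TEXT)"
)
conn.execute("CREATE INDEX idx_employee_city ON employees (employee_city)")

# Data dictionary: the DBMS records every object's definition as metadata.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Indices: ask the query processor how it plans to evaluate a lookup by city.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT employee_name FROM employees WHERE employee_city = ?",
    ("Pune",),
).fetchall()
# The last field of each plan row is a human-readable description of the step.
plan_text = " ".join(row[-1] for row in plan)

print(catalog)
print(plan_text)
```

The plan description names the index, which shows the optimizer choosing an index lookup over a scan of the whole table. The exact wording of the plan text differs between SQLite versions.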
SQL provides set operators to compare rows from two or more tables or to combine the results of two
or more queries into a final result. These operators are used to combine the results of two (or
more) SELECT statements. While working with SQL, you will often need to query data from two or more tables, and
instead of joining the tables, you can use set operators that combine the rows returned for each table into a single
result. For some problems, set operators are easier to use than joining
data.
There are some rules that you have to follow while applying set operators in SQL. These are mentioned below:
The SELECT statements on which you want to apply the SQL set operators must return the same number
of columns.
The columns must appear in the same order.
The corresponding columns must have the same (or compatible) data types.
If you want to order/sort the results, the ORDER BY clause must go at the end of the last query. You
can't add ORDER BY inside each SELECT query before the set operators.
SELECT first_select_query
set_operator
SELECT second_select_query
The above example shows two SELECT queries with the set operator in the middle. As mentioned above in the
rules, if you select two columns in the first query, you must select two columns in the second query. The data
types also need to be compatible, i.e., if you select two character types in the first query, you must also do the
same in the second query.
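The rules above can be checked against a real engine. The following sketch assumes Python's built-in sqlite3 module and two hypothetical two-column tables with made-up rows. One caveat: SQLite is dynamically typed and does not enforce the data-type rule, so only the column-count and ORDER BY rules are exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, used only for illustration.
for table in ("comp1_employees", "comp2_employees"):
    conn.execute(f"CREATE TABLE {table} (employee_id INTEGER, employee_name TEXT)")
conn.executemany("INSERT INTO comp1_employees VALUES (?, ?)", [(2, "Ravi"), (1, "Asha")])
conn.executemany("INSERT INTO comp2_employees VALUES (?, ?)", [(3, "Kabir")])

def is_rejected(sql):
    """Return True if the DBMS refuses to run the statement."""
    try:
        conn.execute(sql)
    except sqlite3.OperationalError:
        return True
    return False

# Valid: same number of columns, and one ORDER BY at the very end.
ordered = conn.execute(
    "SELECT employee_id, employee_name FROM comp1_employees "
    "UNION "
    "SELECT employee_id, employee_name FROM comp2_employees "
    "ORDER BY employee_id DESC"
).fetchall()

# Invalid: the two SELECTs return different numbers of columns.
column_mismatch = is_rejected(
    "SELECT employee_id, employee_name FROM comp1_employees "
    "UNION SELECT employee_id FROM comp2_employees"
)

# Invalid: ORDER BY placed before the set operator.
early_order_by = is_rejected(
    "SELECT employee_id, employee_name FROM comp1_employees ORDER BY employee_id "
    "UNION SELECT employee_id, employee_name FROM comp2_employees"
)

print(ordered, column_mismatch, early_order_by)
```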
There are different types of set operators that are mentioned below:
UNION
UNION ALL
MINUS
INTERSECT
Let us look at each set operator in more detail with examples. The examples use the following two tables.
All the operations are performed on the comp1_employees and comp2_employees tables
given below.
comp1_employees:
employee_id employee_name employee_city
comp2_employees:
Union
UNION combines the results of two or more SELECT statements. To successfully execute a UNION
operation, the number of columns and their data types must be the same in both queries. After performing
the UNION operation, duplicate rows are eliminated from the result.
Now, let's take an example to clearly understand how the UNION operator works.
SELECT * FROM comp1_employees
UNION
SELECT * FROM comp2_employees;
As shown in the above code snippet, there are two SELECT queries joined by a UNION operator. The
first SELECT query fetches the records from comp1_employees, the second SELECT query fetches the
records from comp2_employees, and the UNION operation is performed on the results of both queries.
After performing the UNION operation on both tables, all the records from the comp1_employees table
and the comp2_employees table are displayed except for the duplicate data, i.e., the rows with employee_id 2 and 77
are duplicates. Hence, they are displayed only once.
Union All
UNION and UNION ALL are similar in their functioning, but there is a slight difference. UNION ALL is also used to
combine the results of two or more SELECT statements. To successfully execute a UNION ALL operation, the
number of columns and their data types must be the same in both queries. After performing the UNION
ALL operation, duplicate rows are not eliminated, and all the data is displayed in the result
without removing the duplication.
Now, let's take an example to clearly understand how the UNION ALL operator works.
SELECT * FROM comp1_employees
UNION ALL
SELECT * FROM comp2_employees;
As shown in the above code snippet, there are two SELECT queries joined by a UNION ALL operator. The
first SELECT query fetches the records from comp1_employees, the second SELECT query fetches the
records from comp2_employees, and the UNION ALL operation is performed on the results of both queries.
After performing the UNION ALL operation on both tables, all the records from the comp1_employees table
and the comp2_employees table are displayed. Since it's a UNION ALL operation, all the records are displayed,
including the duplicate rows, which is not the case with the UNION operation.
In short, the UNION operation removes duplicates from the final result, whereas the UNION ALL operation does not
remove duplicates and displays all the data.
Intersect
The INTERSECT operator allows you to find the results that exist in both queries. To successfully execute an
INTERSECT operation, the number of columns and their data types must be the same in both queries. After
performing the INTERSECT operation, the records which are common to both SELECT statements are
returned.
Now, let's take an example to clearly understand how the INTERSECT operator works.
SELECT * FROM comp1_employees
INTERSECT
SELECT * FROM comp2_employees;
As shown in the above code snippet, there are two SELECT queries joined by an INTERSECT operator. The
first SELECT query fetches the records from comp1_employees, the second SELECT query fetches the
records from comp2_employees, and the INTERSECT operation is performed on the results of both queries.
After performing the INTERSECT operation on both tables, all the records that are common to
the comp1_employees table and the comp2_employees table are displayed.
The rows with employee_id 2 and 77 exist in both tables;
hence they are displayed in the final result.
Minus/Except
The MINUS operator returns the rows which are present in the result of the first query but absent from that of the
second query. To successfully execute the MINUS operation, the number of columns and their data types must be
the same in both queries. After performing the MINUS operation, the records of the first query which are not
present in the second SELECT statement or query are displayed.
Note: The MINUS keyword is used mainly by Oracle databases. Other databases such as SQLite,
PostgreSQL, and SQL Server use the EXCEPT operator to perform the same operation.
SELECT * FROM comp1_employees
EXCEPT
SELECT * FROM comp2_employees;
As shown in the above code snippet, there are two SELECT queries joined by an EXCEPT operator. The
first SELECT query fetches the records from comp1_employees, the second SELECT query fetches the
records from comp2_employees, and the EXCEPT operation is performed on the results of both queries.
After performing the EXCEPT operation on both tables, all the records that are present in
the comp1_employees table but not in the comp2_employees table are displayed. EXCEPT returns
the rows that are present only in comp1_employees, without any duplicates.
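The four set operators can be compared side by side on one pair of tables. The sketch below assumes Python's built-in sqlite3 module. The rows are hypothetical sample data, chosen only so that the employees with employee_id 2 and 77 appear in both tables, as the discussion above describes. SQLite uses EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("comp1_employees", "comp2_employees"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(employee_id INTEGER, employee_name TEXT, employee_city TEXT)"
    )

# Hypothetical rows: ids 2 and 77 are identical in both tables.
conn.executemany(
    "INSERT INTO comp1_employees VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (77, "Meera", "Mumbai")],
)
conn.executemany(
    "INSERT INTO comp2_employees VALUES (?, ?, ?)",
    [(2, "Ravi", "Delhi"), (77, "Meera", "Mumbai"), (90, "Kabir", "Chennai")],
)

def employee_ids(operator):
    """Run comp1 <operator> comp2 and return the employee_id of each result row."""
    sql = (
        "SELECT * FROM comp1_employees "
        f"{operator} "
        "SELECT * FROM comp2_employees "
        "ORDER BY employee_id"
    )
    return [row[0] for row in conn.execute(sql)]

results = {op: employee_ids(op) for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT")}
for op, ids in results.items():
    print(f"{op:10} {ids}")
```

With this sample data, UNION lists each employee once, UNION ALL repeats ids 2 and 77, INTERSECT keeps only those two, and EXCEPT keeps only the employee found in comp1_employees alone.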
The Cartesian join, also called Cross join, is used to generate a paired combination of each row of the first table
with each row of the second. That means it results in the Cartesian product of two or more tables.
If you don't specify a condition when joining two tables, the database system combines each row from the first
table with each row from the second. This type of join is called a Cartesian join or cross join.
Below are the two tables, employees and dept, on which the Cartesian join is applied in the example.
employees:
employee_id employee_name
1 Bhim Shekh
2 Palash Yadav
3 Ela Shikha
4 Mrinal Thakur
dept:
city_id dept_id dept_name
011 1 Finance
022 2 Admin
033 3 Intelligence
044 4 Services
Let's take an example to clearly understand how the Cartesian join works.
SELECT * FROM employees CROSS JOIN dept;
The above query joins the employees and dept tables with the CROSS JOIN keyword, and the result obtained
is every paired combination of their rows, i.e., after applying the cross join or Cartesian join, each
row from the employees table is combined with each row from the dept table. With four rows in each table, the
result has 4 x 4 = 16 rows.
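The size of a Cartesian product is easy to confirm by running the join. The sketch below loads the employees and dept tables shown above into an in-memory database, assuming Python's built-in sqlite3 module; storing city_id as text, so that the leading zero is kept, is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, employee_name TEXT)")
# city_id is stored as TEXT here (an assumption) to preserve the leading zero.
conn.execute("CREATE TABLE dept (city_id TEXT, dept_id INTEGER, dept_name TEXT)")

conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Bhim Shekh"), (2, "Palash Yadav"), (3, "Ela Shikha"), (4, "Mrinal Thakur")],
)
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [("011", 1, "Finance"), ("022", 2, "Admin"),
     ("033", 3, "Intelligence"), ("044", 4, "Services")],
)

# CROSS JOIN pairs every employees row with every dept row.
pairs = conn.execute(
    "SELECT * FROM employees CROSS JOIN dept ORDER BY employee_id, dept_id"
).fetchall()

print(len(pairs))   # number of rows in the Cartesian product
print(pairs[0])     # first employee paired with the first department
```

Because no join condition is given, the number of result rows is the product of the two table sizes, which is why cross joins on large tables grow very quickly.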