0% found this document useful (0 votes)
11 views

Introduction 1

The document outlines a course on Database Management Systems (DBMS) at NIT Durgapur, detailing its objectives, content, and various data models. It covers fundamental concepts of DBMS, including data organization, transaction processing, and the differences between structured and unstructured data. Additionally, it discusses the advantages and disadvantages of using DBMS compared to traditional file systems, emphasizing the importance of data independence and the three-schema architecture.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views

Introduction 1

The document outlines a course on Database Management Systems (DBMS) at NIT Durgapur, detailing its objectives, content, and various data models. It covers fundamental concepts of DBMS, including data organization, transaction processing, and the differences between structured and unstructured data. Additionally, it discusses the advantages and disadvantages of using DBMS compared to traditional file systems, emphasizing the importance of data independence and the three-schema architecture.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 70

Database Management System

(CSC 631)

Sanghita Bhattacharjee
Dept of CSE
NIT Durgapur
Course Objectives

 Understand the basic concepts and appreciate the applications of database systems
 Comprehend the fundamentals of design principles for logical design of relational
databases
 Apply the query writing skill and its subsequent optimization
 Discuss the basic issues of transaction processing and concurrency control
Course Content

 This course highlights basic concepts of DBMS, the principals of


DBMS, architecture of DBMS, ER model and EER model,
relational algebra, tuple relational calculus, SQL, database design ,
normalization, transaction management, concurrency control,
recovery
 This course gives a concept of distributed database
Acknowledgment / References

 A Silberschatz, H F Korth and S Sudarshan, Database System Concepts, 5th Edition, 2006
 Ramez Elmasriand Shamkant, B Navathe, Fundamentals of Database Systems, 3rd Edition,
Addison Wesley, 2000
 Video lectures:
(i) Database Management System by Prof. Partha Pratim Das
(ii) Introduction to database systems by Prof. P. Sreenivasa Kumar
Lecture 1
Introduction

 In computerized information system, data is the basic resource of


the organization / enterprise. So, proper organization and
management for data is required for the organization to run
smoothly. Database management system deals the knowledge of
how data stored and managed on a computerized information
system. In any organization, it requires accurate and reliable data for
better decision making, ensuring privacy of data and controlling data
efficiently.
Database
 What is Data ?
By data, we mean the known facts that can be recorded and that have implicit meaning. It
can also be defined as the representation of facts, concepts or instruction in a formal
manner, which is suitable for understanding and processing
e.g: text, number, speech, image, audio, video, sensor data (traffic data)

 Database is the collection of inter related data

 What is Information ?
Information is the processed data on which decisions and actions are based. Or information
is processed meaningful data. Information can be defined as the organized and classified
data to provide meaningful values
Eg: “ Age of Rohan is 18”
Database Management System

 DBMS is the set of programs that are used to access and modify the data in
database
 DBMS is a general purpose software system that enables
 Creation of large disk-resident databases
 Efficient retrieval of data
 Concurrent use of the system by allowing multiple users in a consistent manner
 Avoid redundancy of data and provides correctness of data
 Ensure availability of data in spite of system failures (disk failure, power failure,
software failure etc.)

 It provides an environment that is both convenient and efficient to use in order to


store and access information in database
Database
 Purpose of database
 Collect and manage various data about a real-world enterprise or part of an
enterprise
 Maintain the data to serve various purposes of enterprise
 Update database to reflect the changes
An Example

 A common example is : University Database


 Data about students, faculty, courses, staff, alumni, course- taken, grade_report
etc.
 Reflects the academic aspects of the university
 Purpose: To keep an accurate track of the academic activities of the university
Other Examples:
 Banking: all transactions
 Airlines: reservations, schedules
 Sales: customers, products, purchases
 Manufacturing: production, inventory, orders, supply chain
 Human resources: employee records, salaries, tax deductions
 Databases touch all aspects of our lives
File System Storage based Approach
 File-processing system is supported by a conventional operating system.
 In the early days, file system was used to store data
 A file is a collection of related information
 A system of files and collection of application programs manipulating them is a file-based
system
File System Storage based Approach

Text file DBMS (My SQL)

1 Sohini Kolkata 9434578960 Sl no name Address phno


2 Deepika Haldia 9567123469 1 sohini Kolkata 943457896
…………… 0
…………..
2 Deepika Haldia 956712346
9

File system

File system

Memory/HD
Memory/HD
Structured data vs Unstructured data
Structured data Unstructured data

Has predefined format or predefined data model Cannot be arranged according to predefined
or schema. data set
Examples: Adhar no, address, name, credit card Examples: audio, video, image, geo spatial data,
no, age, price sensor data ( traffic data, weather data), email
content, social media data( whatsapp, FB data)
Textual only Textual and non textual too

Quantitative . Easy to store and access Qualitative. Difficult to mange and access

Relational data based to store it Relational database cannot be used to store


unstructured data
Human generated , machine generated Human generated , machine generated

SQL in RDBMS is used to access structured data Platform like No-SQL (Not only SQL) is used for
RDMS housing, managing and using unstructured data.
MONGO DB is used to store
Data is stored in data ware house Data is stored in data ware house, data lakes
Drawbacks of using File Systems (cont.)
 Uses unstructured data ( e.g. :: images, audio, video, email content,
social media data)
 Data redundancy and inconsistency
 Multiple file formats, duplication of information in different files
 Difficulty in accessing data
 Need to write a new program to carry out each new task i.e. each
different data access request (a query) performed by a separate
program
 Data isolation — multiple files and formats
 Integrity problems
 Integrity constraints (e.g. account balance > 0) become part of
program code
 Hard to add new constraints or change existing ones
Drawbacks of using File Systems (cont.)
 Atomicity problem
 Failures may leave database in an inconsistent state with partial updates
carried out
 Example: transfer of funds from one account to another should either
complete or not happen at all
 Concurrent access by multiple users
 Concurrent accessed needed for performance
 Concurrent accesses of system in uncontrolled manner can lead to
inconsistencies
 Example: two people reading a balance and want to update the account at the
same time
 Security problems
 Database systems offer solutions to all the above problems
Advantages of DBMS
 Controlling redundancy
 Improved data sharing
 Data integrity
 Security
 Data consistency
 Efficient data access
 Data independence

 Disadvantages
 Little bit complex
 Need more memory to run
A Simple Database System Environment

Users/ programmers

Application programs/ queries

Software to process queries


DBMS software

Software to access the stored


data

Meta data ( stored Stored database


data definition)
Abstract View of Data

 Major purpose of database system is to provide users with an


abstract view of data. This implies that the system hides certain
details of how data is stored and maintained.
Levels of Abstraction

 Physical (Internal) level


 This lowest level describes how data/ record (e.g., student
record) is stored in memory
type instructor = record
 Logical (Conceptual) level id : string;
name: string;
 Next higher level describes what data is stored in database dept_name: string;
and what relationship exits between those data, tables, salary : int;
attributes, and constraints end;
 View (External) level
 Application programs hide details of data types
 Describes only part of the entire database
 Consists of set of views
 Views can also hide information (e.g., grade, salary) for
security purposes
Data Models
 A data model- a collection of concepts that can be used to describe the structure of a database –
provides necessary means to achieve this abstraction. By structure of database, we mean
 Data types
 data relationships
 data semantics
 data constraint - that should hold on the data
 Most data models also include a set of basic operations for specifying retrievals and updates on
the database
 Conceptual data model or high level data model
 Provide the concepts that close to the way many users can perceive
 Useful for understanding . Stakeholders and database programmer create this
 Low level or physical data model
 Provide the concepts that describe how data is stored in memory i.e. file format, file
structure
 This is meant for computer specialist or DBA. Not for end user
 Representational model
 It gives logical representation of data without giving details of physical representation.
Database programmer or developer create this
ER Model

 ER (Entity Relationship Model)


 An example of conceptual level data model
 Graphical representation of database design
 It provides the concepts of entities, relationships and attributes
 Example : The University Database
 Entities: student, faculty, course, department, alumni, staff etc.
 Relationships: enrolment relationship between student & course, associate
relationship between faculty member, department etc.
 Attributes: name, roll_number, address etc., of student entity ; name,
emp_code, phone_no, emp_name etc., of faculty entity etc.
Example of ER Model

department address C_code

m n
course
student enroll

C_name credit
name roll_no

Entity: Student, course


Relationship: Enroll
Attributes: Name, roll_no, address, credit
Relational Model

 An example of representational model


 Describes logical representation of database without giving physical details of database
 Relational model uses a collection of tables to represent both data and relationship
among the data. Each table has multiple columns, and each column has a unique name.
Relation
name  Tables are also known as relations.
Attributes
Student

Roll_no Name Address Department


CS19001 Ravi Delhi CSE
CS19002 Nitesh Kolkata CSE
BT19001 Rekha Kolkata BT
CH19005 Raja Kanpur CHE
Tuples
A Sample Relational Database
Other Models
Institute
 Hierarchical model
 Data is organized into tree based structure Departments Infrastructure
Older
models  A child has one and only one parent
 Useful for one to many relationship
 Network model Teachers Subjects
Students
 Data is organized into graph based structure
 A child can have multiple parents
Cores Labs Electives
 Useful for many to many relationship. More faster than hierarchical model to access data
 Object-oriented model
 Extension of relational database
 Suitable for complex database design such as GIS, telecommunications, scientific
equipment, multi media, CAD/ CAM
 Use object oriented languages (C++, java ) to develop such database
 Use concept of class, object, object type, in heritance, encapsulation
Other Models

 Structured data vs semi structured data


 Structured data can be organized by the means of relational database,
while semi structure data cannot be organized into tabular representation
 Self describing and flexible in representing data
 However, hard to implement
 XML can be used to represent semi structured data. HTML, JSON also

 Semi structure data model


 Permit the specification of data where individual data items of the same
type may have different attributes
Schema or Meta Data vs Instance
 In data model, it is important to distinguish the description of database and
database itself
 Schema
 Description of database or logical structure of database, which is specified
during the time of database design and is not changed frequently
 Table names, attributes names with their data types, constraints etc.
Student_name address Roll_no student

course_name Course_code credit course

 Data and schema are stored separately


 DBMS stores the description of schema of constructs and constraints, called
meta data, in DBMS catalog so that DBMS software can refer to the schema
whenever needs to
 Analogous to type of variable in a programming language
Schema diagram of student database

The logical schema is essential for relational database design


Instance
 Instance
 The actual content of the database at a particular point in time
 Snapshot of database at a time
 Analogous to value of a variable
Example of Database Instance
 Course instance

 Student instance

Student_name Address Roll No


Rohit Sharma Mumbai 18CS004
Deva Sharma Delhi 18BT001
Alia Chatterjee Kolkata 18CS007
Udit Goswami Assam 18CH003
Three Schema Architecture
 Goal is to separate the user application from the physical database
 Level architecture is used to describe various schema
 Physical/ internal schema: Internal level has internal schema which describes how data is
stored in the memory i.e. physical storage of the database. Internal schema uses physical data
model and describes data storage and access methods for the database
 Storage allocation : hashing, B tree
 Access paths: primary index, secondary index
 Others: encryption, optimization of internal structure
 Designed by DBA
 Logical/ conceptual schema: Conceptual level has conceptual schema which describes
database for the users. It hides the details of physical storage structure and concentrates on
entities, attributes, data types, relationships, user operations, constraints etc. Representational
data model can be used to describe conceptual schema. It is designed by database developer
or application programmer
 View level schema: View level includes a number of view schemas. Each view schema
describes a part of database that a particular group of users is interested and hides the rest of
the database from that user group. High level data model or representational data model can be used
at this level
 Three schemas are only descriptions of data; the actual data is stored
at the physical level only
 In the three-schema architecture, each user group refers to its own
external schema. Hence, the DBMS must transform a request
specified on an external schema into a request against the conceptual
schema, and then into a request on the internal schema for
processing over the stored database
 If the request is a database retrieval, the data extracted from the
stored database must be reformatted to match the user’s external
view. The processes of transforming requests and results between
levels are called mappings
Illustrating the three schema
architecture

View View 1 View 2 View n


level

View / conceptual
mapping

Logical level
Relation 1 Relation 2 Relation n

Logical/ internal
mapping

Physical Physical schema


level
Why Data Independence

 A database normally contains a lot of data in addition to user data. For


example, it stores data about data , meta data, to locate and retrieve
data easily.
 It is difficult to update or modify set of meta data once it is stored in the
database. But as database expands, it needs to change over time to
satisfy the requirement of users
 If entire data is dependent , it would become a tedious and highly
complex job
 To over come this, we need data independence.
What is Data Independence?

 Schema architecture can be used to explain data independence


 Data independence is defined as the capacity to change schema at one level of
database system without having to change the schema at next higher level
 Aim is to keep data separated from all programs that make use of it
 Types:
 Physical data independence
 Logical data independence
Logical Data Independence

 The capacity to change the conceptual schema without affecting the view
level schemas or application programs
 We may change conceptual schema to expand database
 Examples
 merging / splitting of records
 adding a new attribute to some relation
 no need to change the programs or views that don’t require to use the
new attribute
 deleting an attribute
 no need to change the programs or views that use the remaining data
 view definitions in VL-LL mapping only need to be changed for views
that use the deleted attribute
 changes to constraints can be applied to conceptual schema without affecting
view level schema
Physical Data Independence

 The ability to modify physical/internal schema without affecting the logical or view schema
 Changes to internal schema may be needed as some physical files have to be reorganized by
creating additional access structure to improve the performance of retrieval
 If same data as before remains in database, no need to change conceptual schema
 Physical Data Independence – modification is localized
 achieved by suitably modifying PL-LL mapping
 A very important feature of modern DBMS
 Examples:
 modification at physical level creating a new index
 changing access method
 switching to different data structure
 modifying file organization,
 use of new storage structure
 change of location of database say from C drive to D drive
Database languages

 Database languages are used to read, manipulate and store the data in
database
 Several languages : DDL, DML, DCL, TCL
Data Definition language (DDL)
 Specification notation for defining the database schema. It is used to create schema,
table, index, constraints in database
 DDL commands: create, drop, alter, truncate, rename
 To create the database/ tables : create
 To alter the structure of database : alter
 To drop the database/ table: drop
 To delete the all records from a table including space allocated for the records: truncate
 Example
create table student (
student_name char(30),
address varchar (30),
roll_no char(10) NOT NULL PRIMARY KEY,
age int
);
DDL
 Drop table student;
 Alter table student add phone_no int(10);
 Alter table student drop column age;
 Alter table student modify student_ name varchar(30);
DDL
 DDL compiler whose function is to process DDL statements in order to identify
descriptions of the schema constructs and to store the schema description in the
DBMS catalog (data dictionary )
 Data dictionary is the table which contains the information about database objects
 Database schema (description of entities , attributes as well as meaning of data
elements)
 Integrity constraint
 primary key uniquely identifies records
 synonyms, authorization and security codes
 database authorization ( who can access what)
Data Manipulation Language (DML)
 Language for accessing and manipulating the database
 DML also known as query language
 Two classes of languages
 Procedural – user instructs the system to perform sequence of operations on
database to compute the desired result ( relational algebra)
 Non procedural – user describes the desired information without giving a specific
procedure for getting the information (tuple relational calculus, domain relational
calculus)
 Pure query language (tuple relational calculus, domain relational calculus, relational algebra)
 SQL is the most widely used commercial query language
 DML compiler translates DML statements in query language into evaluation plan
consisting of low level instructions that query evaluation engine understand
 DML commands: insert, update, delete, select
DML Examples
 Insert into student values (‘Anupam’,’Kolkata’,’101’,’982345123’);
 Delete * from student where roll_no=‘101’;
 Update student set address =‘Delhi’ where roll_no=‘101’;
// careful with update. Where clause specifies which record(s) to be
updated. Omitting where clause, all records in the table will be update
Query

 Relational database systems allow a rich class of questions


1. Find the name of the student with roll 18CS1003
2. Find the average marks in CSE department
3. Find the number of students have enrolled in CSE and EE
and many more……
 These questions are called queries
 To process the query we need query language
SQL
 SQL (structured query language ): widely used non-procedural language
 Uses to perform information retrieval from database
 Used in relational database management system (RDMS) and useful for handling
structured data
 E.g. find the name of the student with roll_no 18CS003
select student_name
from student
where roll_no= ’18CS003’;
select * from student; (select all records from student table)
 Application programs generally access databases through
 Application program interface (e.g. ODBC/JDBC) which allow SQL queries
to be sent to a database
 SQL can be embedded in high level language
 Copy all columns from one table to another table:
INSERT INTO table2 SELECT * FROM table1 WHERE condition;
 Copy only some columns from one table into another table:
INSERT INTO table2 (column1, column2, column3,…)SELECT column1, colum
n2, column3, ... FROM table1 WHERE condition;

SQL
DCL and TCL
 DCL: used to control the privilege in database
 Grant privilege/ permission to a user to access the database using GRANT
statement
 Revoke the privilege using REVOKE statement. Take back the privilege to
access the database
 REVOKE and GRANT are examples of DCL

 TCL: Manage the transactions in database


 Commit: save the work
 Roll back: Restores the database to the original since last commit
 Savepoint: set a point in transaction to roll back later
Roll back vs Commit
 select * from student
student  delete * from student where age=17
 commit
ID Name Age  select * from student
1 A 10
2 B 13
student

ID Name Age
3 A 14
1 A 10
4 C 14
2 B 13
5 V 17

3 A 14
4 C 14
Roll back vs Commit
ID Name Age
1 A 10
student
2 B 13

 select * from student


3 A 14
 delete * from student where age=17
4 C 14
 select * from student

 roll back ID Name Age


1 A 10
 select * from student 2 B 13

student
3 A 14
4 C 14
5 V 17
Database Design
 How to say the database design is “good” or “bad”
 Employee table

EmpID Name Salary Dept Building HOD


11 Raja 1000 CSE Annex 1 Dr. Ray
12 David 1000 IT Annex 2 Dr.
Sharma
13 Amir 4000 CHE Main Dr. Gupta
14 Natasha 4000 BT BIO Dr. Palit
15 Dona 3000 CSE Annex 1 Dr. Ray
16 Sanjay 2000 BT BIO Dr. Palit

Any problem in table !!!!!!!!!!!!


Major problem is redundancy
How to design good database

 Proper modelling of database (ER model)


 Normalization (normalize if the design is bad and make it good)

EmpID Name Salary Dept Dept Building HOD


11 Raja 1000 CSE
CSE Annex 1 Dr. Ray
12 David 1000 IT
13 Amir 4000 CHE IT Annex 2 Dr. Sharma
14 Natasha 4000 BT
CHE Main Dr. Gupta
15 Dona 3000 CSE
16 Sanjay 2000 BT BT BIO Dr. Palit

EMP Dept
Database Users and Administrators
 Users are differentiated by the way they expect to interact with the system

 Naive users
 Unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously
 No deep knowledge of database required
 Use the GUI provided by an application program
 Examples: people accessing database over the web, bank tellers, data entry operators
Database Users and Administrators

 Sophisticated user / data analyst


 Interact with the system without writing application programs
 They form their requests in a database query language and submit each such query
to a query processor, whose function is to break down DML statements into
instructions that the storage manager understands.
 Uses SQL to generate answers for complex queries

 Application Programmers
 Computer professionals who write application programs
 Embed SQL in high level programming language and develop programs to satisfy requirements
 Interact with system through DML calls
 Should thoroughly understand the logical schema or relevant views
 Testing of programs is necessary
 DBA (Database Administrator)
 Coordinates all the activities of the database system; the database
administrator has a good understanding of the enterprise’s information
resources and needs
 Functionalities
 Creation and modification of conceptual schema
 Implementation of storage structure and access method
 Physical organization modifications
 Grant/ revoke authorization to other users for data access
 Integrity constraints specification
 Execute immediate recovery procedure in case of failures
 Ensure physical security to database
Overall Database Architecture
Data storage and Querying

 Storage manager
 Query processor
 Transaction management
Storage Management
 Storage manager is a program module that provides the interface
between the low level data stored in the database and the application
programs and queries submitted to the system
 Responsible for interaction with the file manager
 The raw data are stored on the disk using the file system, which is
usually provided by a conventional operating system.
 The storage manager translates the various DML statements into low-
level file-system commands. Thus, the storage manager is responsible
for storing, retrieving, and updating data in the database.
Query Processor

 Parsing and translation


 Optimization
 Evaluation

Scanner,
Intermediate query
Query Parser
form
and
translator

Desired Execution Evaluation


Optimizer
o/p engine plan

data
Transaction Management

 What happens if the system fails?


 What happens if more than one users simultaneously access the system?
 A transaction is a collection of operations (R/W) that performs a single logical
function in database application
 It is unit of atomicity and consistency (correctness of data)
 Transaction manager ensures atomicity and consistency even there is some
failures (power failure, system crashes) and it provides recovery from transaction
failures
 Concurrency control manager is responsible for managing concurrent execution of
multiple transactions

You might also like