
DATABASE MANAGEMENT SYSTEM

-Amrit Gupta
UNIT 1

Introduction to database, file, record, and fields; problems with databases. Categorization of DBMS (network, hierarchical, and relational databases). Applications of DBMS. The three-layered architecture. Advantages and disadvantages of DBMS.

UNIT 2

Important components: DBA, database, application program, DDL, DML, etc. Components of DBMS: query processor, data dictionary. Physical database structures, normalization, and logical design.

UNIT 3

Introduction to RDBMS, E-R model and E-R diagram, examples and exercises. E. F. Codd's 12 rules for relational databases. Database concepts: transaction management, properties of a transaction, commit and rollback, concurrency, locking.
UNIT 4

Data integrity, integrity constraints, auditing, backup and recovery. Data dictionary, system catalogue, introduction to distributed databases. Introduction to client-server and ODBC connectivity.

UNIT 5

Introduction to SQL: SQL language, DML commands, relational algebra and SQL. Introduction, security and integrity violations, authorization, granting of privileges, security specification in SQL. Data warehousing, multidimensional data models, data warehouse architecture, ROLAP, MOLAP, HOLAP, OLAP and OLTP. Understanding the concept of data warehousing: data mining, data pre-processing, data marts, cluster analysis, decision making.
UNIT 1

Introduction to database, file, record, and fields; problems with databases. Categorization of DBMS (network, hierarchical, and relational databases). Applications of DBMS. The three-layered architecture. Advantages and disadvantages of DBMS.
Database
A database is any organized collection of data. Databases are
used by an organization as a method of storing, managing and
retrieving information.
What is a Database?

➢ A database typically consists of:

• Tables
Collection of related records
• Fields (columns)
Single category of data to be stored in a database (name, telephone number, etc.)
• Records (rows)
Collection of related fields in a database (all the fields for one customer, for example)
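The three terms above can be seen in a small runnable sketch. It uses Python's built-in sqlite3 module only as a convenient host for the SQL; the table name customer and the sample values are assumptions made up for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Table: a collection of related records. Each column is a field.
conn.execute("CREATE TABLE customer (name TEXT, telephone TEXT)")

# Each inserted row is one record: all the fields for one customer.
conn.executemany(
    "INSERT INTO customer (name, telephone) VALUES (?, ?)",
    [("Asha", "555-0101"), ("Ravi", "555-0102")],
)

# Field names come from the cursor description; the rows are the records.
cursor = conn.execute("SELECT name, telephone FROM customer ORDER BY name")
fields = [column[0] for column in cursor.description]
records = cursor.fetchall()
print(fields)
print(records)
```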
Features of DBMS

➢ Capacity for large amounts of data
➢ An easy-to-use interface language (SQL, Structured Query Language)
➢ Efficient retrieval mechanisms
➢ Multi-user support
➢ Security management
➢ Concurrency and transaction control
➢ Persistent storage with backup and recovery for reliability
Database - Advantages & Disadvantages

➢ Advantages

• Reduced data redundancy
• Reduced updating errors and increased consistency
• Greater data integrity and independence from application programs
• Improved data access for users through the use of host and query languages
• Improved data security
• Reduced data entry, storage, and retrieval costs
• Easier development of new application programs
Database - Advantages & Disadvantages

➢ Disadvantages

• Database systems are complex, difficult, and time-consuming to design
• Substantial hardware and software start-up costs
• Damage to the database affects virtually all application programs
• Extensive conversion costs in moving from a file-based system to a database system
• Initial training required for all programmers and users
Database Models

➢ Hierarchical databases
➢ Network databases
➢ Object-oriented databases
➢ Relational databases
➢ NoSQL databases
Database Models
Hierarchical Databases:

In a hierarchical database, data is organized in parent-child relationships, so that as more data elements are added the structure resembles a tree. Child records are linked to their parent record through a field. A parent record may have multiple child records, but a child record cannot have more than one parent.
Database Models
Network Databases:
A network database organizes data in ranks or levels, grouping records by a common point of linkage: the records that share the link take the lower rank, and the record they have in common takes the higher rank. Unlike the hierarchical model, a record may be linked to more than one higher-ranked record.

Although the framework is more complex, network databases are better able to represent two-directional relationships. Their conceptual simplicity also favours the use of a simpler database management language.
The disadvantages are that the structure is hard to alter because of its complexity, and that applications are highly dependent on that structure.
Database Models
Object-Oriented Databases:

Information stored in the database is represented as objects, each of which serves as an instance of the database model. An object can therefore be referenced and called directly, which substantially reduces the workload on the database.

[Figure: object-oriented model labelled with Object, Attribute, Method, and Class]

Database Models

In the chart above, different objects are linked to one another through methods; for example, one can get the address of a Person (represented by the Person object) using the livesAt() method. These objects also have attributes, which are the data elements that need to be defined in the database.
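The Person object and its livesAt() method can be sketched in ordinary Python classes. This is only an illustration of objects, attributes, and methods, not an object database; the Address class, its fields, and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # Attributes: the data elements stored for each Address object.
    street: str
    city: str


@dataclass
class Person:
    name: str
    address: Address

    def livesAt(self) -> Address:
        # Method linking a Person object to its Address object.
        return self.address


person = Person("Asha", Address("12 Lake Road", "Pune"))
home = person.livesAt()
print(home.city)
```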
Database Models
Relational Databases:

A relational database organizes data into tables (relations) made up of rows and columns. Each row is a record, each column is a field, and tables are linked to one another through shared key values.
Database Models

Because it uses tables to organize data, the relational model has become exceedingly popular. Relational databases are widely integrated into web application interfaces, where they serve as repositories for user data. They are also easy to learn, since the language used to interact with the database (SQL in this case) is simple and easy to comprehend.
It is also worth noting that scaling and traversing data in a relational database is a lightweight task compared with a hierarchical database.
Database Models
➢ NoSQL Databases:
A NoSQL database (the name originally meant "non-SQL" or "non-relational") provides a mechanism for storing and retrieving data that is modelled by means other than the tabular relations used in relational databases.
The features usually cited for NoSQL databases are simplicity of design, simpler horizontal scaling to clusters of machines, and finer control over availability. The data structures they use differ from those used by default in relational databases, which makes some operations faster in NoSQL. These data structures are sometimes also viewed as more flexible than relational database tables. The suitability of a given NoSQL database depends on the problem it should solve.
MongoDB falls in the category of NoSQL document-based databases.

➢ Advantages of NoSQL –
There are many advantages to working with NoSQL databases such as MongoDB and Cassandra. The main advantages are high scalability and high availability.

➢ Disadvantages of NoSQL –
• Most NoSQL databases are open source, and there is no single standard across products.
• GUI tools are not always available.
• Backup is a weak point for some NoSQL databases, such as MongoDB.
• Large document size.
Classification of DBMS
1) Data Model
2) User Numbers
3) Database distribution
i. Centralized Systems
ii. Distributed database systems
iii. Homogeneous distributed database systems
iv. Heterogeneous distributed database systems
Categorization of DBMS
TYPES OF DATABASES
➢ Non-relational databases
Non-relational databases place information in field categories that we create, so that the information can be sorted and disseminated the way we need it.
The data in a non-relational database, however, is limited to that program and cannot be extracted and applied to other software programs or other database files.

The data can only be "copied and pasted". Example: a spreadsheet.

➢ Relational databases
In relational databases, fields can be used in a number of ways. A relational database is built on a database model that provides logical connections among files (known as tables) by including identifying data from one table in another table.
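Including identifying data from one table in another table can be sketched as follows, again with sqlite3 as the host. The customer and purchase tables and their contents are hypothetical; the point is only that purchase.customer_id repeats the identifying value from customer, which lets the two tables be joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- purchase.customer_id carries the identifying data from customer.
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO purchase VALUES (10, 1, 'keyboard'), (11, 1, 'monitor'), (12, 2, 'mouse');
""")

# The shared customer_id value is the logical connection between the tables.
rows = conn.execute("""
    SELECT customer.name, purchase.item
    FROM purchase
    JOIN customer ON customer.customer_id = purchase.customer_id
    ORDER BY purchase.purchase_id
""").fetchall()
print(rows)
```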
Database Architecture

A database architecture is a representation of the DBMS design. It helps to design, develop, implement, and maintain the database management system. A DBMS architecture divides the database system into individual components that can be independently modified, changed, replaced, and altered. It also helps in understanding the components of a database.
A database stores critical information and helps access data quickly and securely. Selecting the correct DBMS architecture therefore helps in easy and efficient data management.
➢ 1-Tier Architecture (Single Tier Architecture)
➢ 2-Tier Architecture
➢ 3-Tier Architecture
1-Tier Architecture
This Architecture in DBMS is the simplest architecture of
Database in which the client, server, and Database all reside on
the same machine. A simple one tier architecture example would
be anytime you install a Database in your system and access it to
practice SQL queries. But such architecture is rarely used in
production.
2-Tier Architecture
This Architecture in DBMS is a Database architecture where
the presentation layer runs on a client (PC, Mobile, Tablet, etc.),
and data is stored on a server called the second tier. Two tier
architecture provides added security to the DBMS as it is not
exposed to the end-user directly. It also provides direct and faster
communication.
3-Tier Architecture
This is the most popular client-server architecture in DBMS. The development and maintenance of functional processes, logic, data access, data storage, and the user interface are done independently, as separate modules.
3-Tier Architecture

In the diagram,

o It shows the architecture of DBMS.
o Mapping is the process of transforming requests and responses between the various database levels of the architecture.
o Mapping is not good for small databases, because it takes more time.
o In External / Conceptual mapping, the DBMS transforms a request on an external schema into a request against the conceptual schema.
o In Conceptual / Internal mapping, the request is transformed from the conceptual level to the internal level.
3-Tier Architecture
➢ 1. Physical Level
- The physical level describes the physical storage structure of data in the database.
- It is also known as the Internal Level.
- This level is very close to the physical storage of data and describes how the data is stored on secondary storage devices such as disks and tapes.
- At the lowest level, data is stored in the form of bits with physical addresses on the secondary storage device.
- At the highest level, it can be viewed in the form of files.
- The internal schema defines the various stored data types. It uses a physical data model.

2. Conceptual Level
- The conceptual level describes the structure of the whole database for a group of users; data is represented in the form of various database tables.
- It is also called the data model.
- The conceptual schema is a representation of the entire content of the database.
- This schema contains all the information needed to build relevant external records.
- It hides the internal details of physical storage.

3. External Level
- The external level relates to the data as viewed by individual end users.
- This level includes a number of user views or external schemas. This level is closest to the user.
- The main focus of the external level is data abstraction.
- External view describes the segment of the database that is required for a particular
Advantages of DBMS
➢ Minimized redundancy and data inconsistency: Data is normalized in a DBMS to minimize redundancy, which helps keep data consistent. For example, student information can be kept in one place in the DBMS and accessed by different users. Primary keys and foreign keys make this reduction in redundancy possible.

➢ Simplified data access: A user needs only the name of the relation, not its exact location, to access data, so the process is very simple.

➢ Multiple data views: Different views of the same data can be created to cater to the needs of different users. For example, faculty salary information can be hidden from the student view of the data but shown in the admin view.

➢ Data security: Only authorized users are allowed to access the data in a DBMS. Data can also be encrypted by the DBMS, which makes it more secure.

➢ Concurrent access to data: Data can be accessed by different users at the same time in a DBMS.

➢ Backup and recovery mechanism: The DBMS backup and recovery mechanism helps to avoid data loss and data inconsistency in case of catastrophic failures.
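The "multiple data views" point can be sketched with SQL views. The faculty table, the view names, and the sample rows are assumptions for illustration. SQLite has no user accounts, so this shows only how the two views differ; restricting who may query each view would need a DBMS with authorization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE faculty (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO faculty VALUES ('Dr. Rao', 'Physics', 90000), ('Dr. Sen', 'Maths', 85000);

    -- Student view: the salary column is left out.
    CREATE VIEW student_view AS SELECT name, department FROM faculty;
    -- Admin view: the salary column is included.
    CREATE VIEW admin_view AS SELECT name, department, salary FROM faculty;
""")

student_columns = [c[0] for c in conn.execute("SELECT * FROM student_view").description]
admin_columns = [c[0] for c in conn.execute("SELECT * FROM admin_view").description]
print(student_columns)
print(admin_columns)
```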
UNIT 2

Important components: DBA, database, application program, DDL, DML, etc. Components of DBMS: query processor, data dictionary. Physical database structures, normalization, and logical design.
DBA (Database Administrator)

Database administrators (DBAs) use specialized software to store and organize data.
The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, and backup and data recovery.

The DBA is responsible for understanding and managing the overall database environment. By developing and implementing a strategic blueprint to follow when deploying databases within their organization, DBAs are instrumental to the ongoing efficacy of modern applications that rely on databases for data storage and access.
Without the DBA's oversight, application and system outages, downtime, and slowdowns become far more likely. Problems such as these result in business outages that can negatively affect revenue, customer experience, and company reputation.

Types of DBAs: System DBA, Database architect, Database analyst, Application DBA, Task-oriented DBA, Performance analyst, Data Warehouse administrator, and Cloud DBA.
DDL (Data Definition Language)

Data Definition or Data Description Language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP.

➢ CREATE Statement:
CREATE TABLE [table name] ( [column definitions] ) [table parameters]
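A concrete instance of the CREATE TABLE template might look like the sketch below. The employees table name appears elsewhere in these slides, but the column definitions are assumptions, and the statement is run through sqlite3 only so that it can be checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# [table name] = employees; [column definitions] = the four columns below.
conn.execute("""
    CREATE TABLE employees (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        hired_on   TEXT
    )
""")

# PRAGMA table_info lists one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
print(columns)
```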
DDL (Data Definition Language)

➢ DROP Statement:
The DROP statement destroys an existing database, table, index, or view.

DROP objecttype objectname

For example, the command to drop a table named employees is:

DROP TABLE employees;

The DROP statement is distinct from the DELETE and TRUNCATE statements, in that DELETE and TRUNCATE do not remove the table itself. For example, a DELETE statement might delete some (or all) data from a table while leaving the table itself in the database, whereas a DROP statement removes the entire table from the database.
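The difference between DELETE and DROP can be checked with a short sketch. The table contents are made up, and table_exists is a hypothetical helper that looks the table up in SQLite's sqlite_master catalog.

```python
import sqlite3


def table_exists(conn, name):
    # Hypothetical helper: look the table up in SQLite's catalog.
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employees (name) VALUES (?)", [("Asha",), ("Ravi",)])

# DELETE removes the rows but leaves the table itself in the database.
conn.execute("DELETE FROM employees")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
exists_after_delete = table_exists(conn, "employees")

# DROP removes the table definition along with any data.
conn.execute("DROP TABLE employees")
exists_after_drop = table_exists(conn, "employees")
print(rows_after_delete, exists_after_delete, exists_after_drop)
```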
DDL (Data Definition Language)

➢ ALTER Statement:
The ALTER statement modifies an existing database object.

ALTER objecttype objectname parameters

For example, the commands to add (then remove) a column named bubbles for an existing table named sink are:

ALTER TABLE sink ADD bubbles INTEGER;
ALTER TABLE sink DROP COLUMN bubbles;
DDL (Data Definition Language)

➢ TRUNCATE Statement:

The TRUNCATE statement is used to delete all data from a table. It is usually much faster than DELETE.

TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, indexes, and so on remain. To remove the table definition in addition to its data, use the DROP TABLE statement.
DML (Data Manipulation Language)

Data Manipulation Language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database.

➢ INSERT Statement:
INSERT INTO table_name (column1 [, column2, column3 ... ]) VALUES (value1 [, value2, value3 ... ])

If you are adding values for all the columns of the table, you do not need to specify the column names in the SQL query.

INSERT INTO table_name VALUES (value1, value2, value3, ...);


DML (Data Manipulation Language)

➢ DELETE Statement:
The DELETE statement removes one or more records from a table. Either all the rows can be deleted, or a subset may be chosen using a condition.

DELETE FROM table_name [WHERE condition]

DML (Data Manipulation Language)

➢ UPDATE Statement:
The UPDATE statement changes the data of one or more records in a table. Either all the rows can be updated, or a subset may be chosen using a condition.

UPDATE table_name SET column_name = value [, column_name = value ...]
[WHERE condition]
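The three DML statements can be tried together in a short sketch. The employee table and its values are assumptions (the names are borrowed from the normalization example later in the deck), and sqlite3 is used only as a host for the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# INSERT with an explicit column list.
conn.execute("INSERT INTO employee (id, name, salary) VALUES (1, 'Brown', 20)")
# INSERT without column names: one value per column, in table order.
conn.execute("INSERT INTO employee VALUES (2, 'Green', 35)")
conn.execute("INSERT INTO employee VALUES (3, 'Moore', 48)")

# UPDATE with a WHERE condition changes only the matching rows.
updated = conn.execute("UPDATE employee SET salary = 40 WHERE name = 'Green'").rowcount

# DELETE with a WHERE condition removes only the matching rows.
deleted = conn.execute("DELETE FROM employee WHERE name = 'Brown'").rowcount

remaining = conn.execute("SELECT name, salary FROM employee ORDER BY id").fetchall()
print(updated, deleted, remaining)
```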
DQL (Data Query Language)
DQL statements are used for performing queries on the data within schema objects. The purpose of a DQL command is to return a relation (result set) based on the query passed to it.

➢ SELECT Statement:
The SELECT statement returns a result set of records from one or more tables.

o SELECT clause is the list of columns or SQL expressions that must be returned by the query. This is approximately the relational algebra projection operation.
o AS optionally provides an alias for each column or expression in the SELECT clause. This is the relational algebra rename operation.
o FROM specifies from which table to get the data.
o WHERE specifies which rows to retrieve. This is approximately the relational algebra selection operation.
o GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
o HAVING selects among the groups defined by the GROUP BY clause.
o ORDER BY specifies how to order the returned rows.
o The DISTINCT keyword eliminates duplicate data.
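The clauses listed above can be combined in one query, as in this sketch. The assignment table and its rows are assumptions chosen so that each clause visibly filters or reorders something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignment (employee TEXT, project TEXT)")
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?)",
    [
        ("Brown", "Mars"),
        ("Green", "Jupiter"), ("Green", "Venus"),
        ("Hoskins", "Venus"), ("Hoskins", "Jupiter"), ("Hoskins", "Mars"),
        ("Moore", "Venus"),
    ],
)

# WHERE drops the Mars rows, GROUP BY counts the rest per employee,
# HAVING keeps groups with at least two rows, ORDER BY sorts by the alias.
busy = conn.execute("""
    SELECT employee AS name, COUNT(*) AS projects
    FROM assignment
    WHERE project <> 'Mars'
    GROUP BY employee
    HAVING COUNT(*) >= 2
    ORDER BY name DESC
""").fetchall()

# DISTINCT removes duplicate project names.
projects = [row[0] for row in conn.execute(
    "SELECT DISTINCT project FROM assignment ORDER BY project"
)]
print(busy)
print(projects)
```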
DCL (Data Control Language)
DCL statements are used to control access to data stored in a database (authorization).

➢ DCL Commands:

• GRANT – allows specified users to perform specified tasks.
• REVOKE – removes a user's access to a database object.
• DENY – bans certain permissions from users (available in SQL Server; MySQL has no DENY statement).

The examples below use MySQL account syntax:

GRANT SELECT, INSERT, UPDATE, DELETE ON Employee TO 'jeffrey'@'localhost';
GRANT ALL ON db1.* TO 'jeffrey'@'localhost';
GRANT 'role1', 'role2' TO 'user1'@'localhost', 'user2'@'localhost';

REVOKE INSERT ON *.* FROM 'jeffrey'@'localhost';
REVOKE 'role1', 'role2' FROM 'user1'@'localhost', 'user2'@'localhost';
REVOKE SELECT ON world.* FROM 'role3';

DENY CREATE TABLE, SELECT, UPDATE TO 'jeffrey'@'localhost';


TCL (Transaction Control Language)
TCL commands deal with transactions within the database.

➢ TCL Commands:

• COMMIT – permanently saves a transaction into the database. The COMMIT command saves all changes made since the last COMMIT or ROLLBACK command.

• ROLLBACK – restores the database to the last committed state. ROLLBACK can also be used with a savepoint to undo only the changes made after that savepoint.

• SAVEPOINT – marks a point within a transaction so that you can roll back to that point whenever required.

Example:
SAVEPOINT savepoint_name;

• SET TRANSACTION – specifies characteristics for the transaction.
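COMMIT, ROLLBACK, and SAVEPOINT can be traced in a small sketch. It assumes SQLite's syntax (ROLLBACK TO savepoint_name) and puts Python's sqlite3 module in autocommit mode so that every transaction statement is issued explicitly; the account table and the amounts are made up.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN, so the
# transaction control statements below are exactly what the database sees.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
query = "SELECT name, balance FROM account ORDER BY name"

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("COMMIT")  # the insert is now permanent

conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 70 WHERE name = 'A'")
conn.execute("SAVEPOINT after_debit")
conn.execute("INSERT INTO account VALUES ('B', 30)")
conn.execute("ROLLBACK TO after_debit")  # undoes only the insert of B
conn.execute("COMMIT")                   # keeps the update made before the savepoint
after_savepoint = conn.execute(query).fetchall()

conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")  # back to the last committed state
after_rollback = conn.execute(query).fetchall()
print(after_savepoint, after_rollback)
```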


DBMS Structure
A DBMS is software that allows access to data stored in a database and provides an easy and effective method of –
• Defining the information.
• Storing the information.
• Manipulating the information.
• Protecting the information from system crashes or data theft.
• Differentiating access permissions for different users.

The DBMS is divided into three components:

• Query Processor
• Storage Manager
• Disk Storage
DBMS Structure
1. Query Processor:
It translates the requests (queries) received from the end user via an application program into instructions. It also executes the user request received from the DML compiler.
The Query Processor contains the following components –
• DML Compiler –
It translates DML statements into low-level instructions (machine language) so that they can be executed.

• DDL Interpreter –
It processes DDL statements into a set of tables containing metadata (data about data).

• Embedded DML Pre-compiler –
It converts DML statements embedded in an application program into procedural calls.

• Query Optimizer –
It chooses an efficient way to carry out the instructions generated by the DML Compiler before they are executed.
DBMS Structure
2. Storage Manager:
The Storage Manager is a program that provides an interface between the data stored in the database and the queries received. It is also known as the Database Control System. It maintains the consistency and integrity of the database by applying the constraints, and it executes the DCL statements. It is responsible for updating, storing, deleting, and retrieving data in the database.
It contains the following components –

• Authorization Manager –
It ensures role-based access control, i.e. it checks whether a particular person is privileged to perform the requested operation.

• Integrity Manager –
It checks the integrity constraints when the database is modified.

• Transaction Manager –
It controls concurrent access by scheduling the operations of the transactions it receives. It thus ensures that the database remains in a consistent state before and after the execution of a transaction.

• File Manager –
It manages the file space and the data structures used to represent information in the database.

• Buffer Manager –
It is responsible for cache memory and the transfer of data between secondary storage and main memory.
DBMS Structure

3. Disk Storage:
It contains the following components –

• Data Files –
They store the data.

• Data Dictionary –
It contains information about the structure of every database object. It is the repository of information that governs the metadata.

• Indices –
They provide faster retrieval of data items.
NORMALIZATION

• https://www.javatpoint.com/dbms-keys
EXAMPLE OF A RELATION WITH ANOMALIES

Employee   Salary   Project   Budget   Function
Brown      20       Mars      2        technician
Green      35       Jupiter   15       designer
Green      35       Venus     15       designer
Hoskins    55       Venus     15       manager
Hoskins    55       Jupiter   15       consultant
Hoskins    55       Mars      2        consultant
Moore      48       Mars      2        manager
Moore      48       Venus     15       designer
Kemp       48       Venus     15       designer
Kemp       48       Jupiter   15       manager

The key is made up of the attributes Employee and Project
ANOMALIES IN UN-NORMALIZED TABLE (RELATION)
➢ Redundancy
➢ Update anomaly
➢ Deletion anomaly
➢ Insertion anomaly
Anomalies in the example Relation

Anomaly – Multiple copies of data.

• The salary of each employee is repeated in all the tuples relating to that employee: therefore there is redundancy.
• If the salary of an employee changes, we have to modify the value in all the corresponding tuples. This problem is known as the update anomaly.

• If an employee stops working on all the projects but does not leave the company, all the corresponding tuples are deleted and so even the basic information, name and salary, is lost. This problem is known as the deletion anomaly.

• If we have information on a new employee, we cannot insert it until the employee is assigned to a project. This is known as the insertion anomaly.
NORMALIZATION

The rules for these transformations, called normalization, are based on sound theoretical principles and ensure that the final normalized relations obtained:

• reduce duplication of data,
• ensure that no mistakes occur when data are added or deleted, and
• simplify retrieval of required data.
FIRST NORMAL FORM
➢ A relation is in 1NF if and only if all underlying domains contain atomic values.

➢ Involves
• Removing repeating groups and placing them in a separate table
• Also identifying the primary key
SECOND NORMAL FORM
➢ A table is in 2NF if –
➢ it is in 1NF, and
➢ every non-key attribute is fully functionally dependent on the entire key and NOT PARTIALLY dependent.

➢ Involves
• If an attribute depends on only part of a composite (multi-attribute) key, remove it to a separate table
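Applied to the example relation with key (Employee, Project), this step can be sketched in plain Python. It assumes, as the example data suggests, that Salary depends only on Employee and Budget only on Project; the names employee, project, and assignment for the resulting tables are illustrative.

```python
# Rows of the example relation: (Employee, Salary, Project, Budget, Function).
rows = [
    ("Brown", 20, "Mars", 2, "technician"),
    ("Green", 35, "Jupiter", 15, "designer"),
    ("Green", 35, "Venus", 15, "designer"),
    ("Hoskins", 55, "Venus", 15, "manager"),
    ("Hoskins", 55, "Jupiter", 15, "consultant"),
    ("Hoskins", 55, "Mars", 2, "consultant"),
    ("Moore", 48, "Mars", 2, "manager"),
    ("Moore", 48, "Venus", 15, "designer"),
    ("Kemp", 48, "Venus", 15, "designer"),
    ("Kemp", 48, "Jupiter", 15, "manager"),
]

# Salary depends on part of the key (Employee): move it to its own table.
employee = {emp: salary for emp, salary, _, _, _ in rows}
# Budget depends on the other part of the key (Project): same treatment.
project = {proj: budget for _, _, proj, budget, _ in rows}
# Function depends on the whole key, so it stays with (Employee, Project).
assignment = {(emp, proj): function for emp, _, proj, _, function in rows}

# Joining the three tables gives back the original rows: nothing was lost.
rejoined = {
    (emp, employee[emp], proj, project[proj], function)
    for (emp, proj), function in assignment.items()
}

# Each salary is now stored once, so a raise is a single update.
employee["Green"] = 40
print(len(employee), len(project), len(assignment))
```

Because each salary now lives in one place, the update anomaly described earlier disappears, and an employee with no project can still be recorded in the employee table.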
THIRD NORMAL FORM
➢ A table is in 3NF if –
➢ it is in 2NF, and
➢ it has no transitive dependencies for non-prime attributes (NPA).

An attribute that is not part of any candidate key is known as a non-prime attribute.
➢ Involves
• All non-key attributes should depend directly on the primary key; otherwise remove them to a separate table

• 3NF is violated when one field depends on another field which in turn depends on the primary key (transitive dependency)
Boyce-Codd Normal Form (BCNF)

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF.

BCNF is stricter than 3NF: for every non-trivial functional dependency X -> Y, X should be a super key of the table.
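The super key condition can be checked mechanically by computing the closure of X under the functional dependencies. The sketch below does this for the example relation; the dependency list is an assumption read off the example data, and the function names closure and bcnf_violations are hypothetical.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under `dependencies`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in dependencies:
            # If the left side is already determined, so is the right side.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


def bcnf_violations(all_attributes, dependencies):
    """List the non-trivial dependencies X -> Y where X is not a super key."""
    violations = []
    for left, right in dependencies:
        trivial = set(right) <= set(left)
        is_super_key = closure(left, dependencies) == set(all_attributes)
        if not trivial and not is_super_key:
            violations.append((left, right))
    return violations


attributes = {"Employee", "Salary", "Project", "Budget", "Function"}
dependencies = [
    (("Employee",), ("Salary",)),
    (("Project",), ("Budget",)),
    (("Employee", "Project"), ("Function",)),
]
violations = bcnf_violations(attributes, dependencies)
print(violations)
```

Under these assumed dependencies, Employee and Project on their own are not super keys, so the un-normalized example relation is not in BCNF.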
NORMALIZATION IS CARRIED OUT FOR FOUR REASONS

• To structure the data so that any pertinent relationships between entities can be represented

• To permit simple retrieval of data in response to query and report requests

• To simplify the maintenance of the data through updates, insertions and deletions

• To reduce the need to restructure or reorganize data when new application requirements arise
UNIT 3

Introduction to RDBMS, E-R model and E-R diagram, examples and exercises. E. F. Codd's 12 rules for relational databases. Database concepts: transaction management, properties of a transaction, commit and rollback, concurrency, locking.
ENTITY RELATIONSHIP DIAGRAMS
Basic Elements and Rules
WHAT IS A RELATIONAL DATABASE SYSTEM?

➢ Models a real-world enterprise
• Entities (e.g., teams, games)
• Relationships

➢ A Relational Database Management System (RDBMS) is a software system designed to store, manage, and facilitate access to databases.
BUSINESS ANALYST
➢ A business analyst (BA) is someone who analyzes an organization or business domain and documents its business, processes, or systems, assessing the business model or its integration with technology.

➢ A BA often has a technical background, whether from having worked as a programmer or engineer or from completing a Computer Science degree. Others may move into a BA role from a business role – their status as a subject matter expert and their analytical skills make them suitable for the role.
ENTITY-RELATIONSHIP (E-R) MODELING
KEY TERMS
A database can be modeled as:
• a collection of entities,
• relationships among entities.
Entity
• A person, place, object, event or concept in the user environment about which the organization wishes to maintain data
• An entity is an object that exists and is distinguishable from other objects.
ATTRIBUTES
Entities have attributes
• A characteristic of an entity that is of interest to an organization
• Example: people have names and addresses
Example:
customer = (customer_id, customer_name, customer_street, customer_city)
loan = (loan_number, amount)

• An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.

➢ Domain – the set of permitted values for each attribute
COMPOSITE ATTRIBUTES
ENTITY-RELATIONSHIP (E-R) MODELING
KEY TERMS

➢ Identifier
• A candidate key that has been selected as the unique identifying characteristic for an entity type

• Selection rules for an identifier
1. Choose a candidate key that will not change its value
2. Choose a candidate key that will never be null
ENTITY-RELATIONSHIP (E-R) MODELING
KEY TERMS

➢ Relationship
An association between the instances of one or more entity types that is of interest to the organization
Depicting Entities and Attributes
CARDINALITY
➢ The number of instances of entity B that can be associated with each instance of entity A
➢ For a binary relationship the mapping cardinality must be one of the following types:
• One to one
• One to many
• Many to one
• Many to many
ATTRIBUTE OF A RELATIONSHIP SET

[Figure: E-R diagram. Entities: customer, account. Relationship: holds. Attribute labels: SSN, name, street, city, number, balance. Legend: entity, attribute, relationship.]
ERD DEVELOPMENT PROCESS
➢ Identify the entities
➢ Determine the attributes for each entity
➢ Select the identifier for each entity
➢ Establish the relationships between the entities
➢ Draw an entity model
➢ Test the relationships and the keys


WHY USE ER DIAGRAMS?

➢ Provides a global quick reference to an organization's data structures.

➢ Can be used individually to design an Information System's (IS) data structure
CONCURRENCY CONTROL
➢ Concurrent execution of user programs: key to good DBMS performance.

➢ Interleaving actions of different programs: trouble!
• e.g., account-transfer & print statement at same time

➢ DBMS ensures such problems don't arise.
• Users/programmers can pretend they are using a single-user system. (called "Isolation")

Thank goodness! Don't have to program "very, very carefully".
COMPARISON OF FILE AND DATABASE SYSTEM

File System | Relational Database System
Data redundancy and inconsistency | Controls redundancy
Difficulty in accessing data | Provides storage structures for efficient query processing; concurrency control
Multiple files and formats | Simple and single format
Integrity problems | Helps enforce integrity constraints
May leave data in an inconsistent state with partial updates carried out | Ensures atomicity of transactions
Concurrent accesses can lead to inconsistencies | Transaction isolation
Security problems | Restricts unauthorized access
Difficult to recover data | Provides backup and recovery
No multiple user interfaces | Provides multiple user interfaces
Up-to-date information is difficult to obtain | Real-time data, flexibility
DATABASE USERS
➢ End users in many fields
• Business, education, science, …
➢ Programmers
• Build enterprise applications on top of DBMSs
• Build web services that run off DBMSs
➢ Database administrators (DBAs)
• Design physical schemas
• Handle security and authorization
• Data availability, crash recovery
• Database tuning as needs evolve

…must understand how a DBMS works


Codd’s 12 Rules for RDBMS
Dr Edgar F. Codd was a computer scientist who invented the relational model for database management. The relational database was created on the basis of this model. Codd's rules define the qualities a DBMS requires in order to be a Relational Database Management System (RDBMS). He proposed a total of 13 rules, numbered 0 to 12:

Rule 0: The foundation rule:


For any system that is advertised as, or claimed to be, a relational data
base management system, that system must be able to manage data bases
entirely through its relational capabilities.

Rule 1: The information rule:


All information in a relational data base is represented explicitly at the
logical level and in exactly one way – by values in tables.

Rule 2: The guaranteed access rule:


Each and every datum (atomic value) in a relational data base is
guaranteed to be logically accessible by resorting to a combination of table
name, primary key value and column name.
Codd’s 12 Rules for RDBMS
Rule 3: Systematic treatment of null values:
Null has several meanings: it can mean missing data, not applicable, or no value. It should be handled consistently. Also, a primary key must never be null, and an expression on NULL must give null.
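How an SQL engine treats NULL can be observed directly. The sketch below uses SQLite through Python, where SQL NULL comes back as None. The student table is an assumption, and the explicit NOT NULL on the key is there because SQLite does not by itself reject NULL in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic and comparison on NULL give NULL; IS NULL is the test to use.
null_sum, null_equals_null, null_is_null = conn.execute(
    "SELECT NULL + 1, NULL = NULL, NULL IS NULL"
).fetchone()

# A primary key must not be null: the insert below is rejected.
conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY NOT NULL, name TEXT)")
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Asha')")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

print(null_sum, null_equals_null, null_is_null, null_key_rejected)
```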

Rule 4: Dynamic online catalog based on the relational model:


The database dictionary (catalog) is the structural description of the complete database, and it must be stored online. The catalog must be governed by the same rules as the rest of the database, and the same query language should be used on the catalog as is used to query the database.

Rule 5: The comprehensive data sublanguage rule:


A relational system may support several languages and various modes of terminal
use (for example, the fill-in-the-blanks mode). However, there must be at least one
language(Eg: ? ) whose statements are expressible, per some well-defined syntax, as
character strings and that is comprehensive in supporting all of the following items:
• Data definition.
• View definition.
• Data manipulation (interactive and by program).
• Integrity constraints.
• Authorization.
• Transaction boundaries (begin, commit and rollback).
Rule 6: The view updating rule:
All views that are theoretically updatable are also updatable by the system.

Rule 7: Possible for high-level insert, update, and delete:


Insert, update and delete must be supported as set-level operations on any relation,
base or derived, and not only one row at a time. Set operations such as union,
intersection and minus (difference) should also be supported. [CRUD]
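A short sketch of set-level operations, assuming SQLite through Python's sqlite3 module. The emp and contractors tables and their rows are hypothetical; SQLite spells "minus" as EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('Asha', 'IT', 100), ('Ravi', 'IT', 200), ('Meena', 'HR', 300);
    CREATE TABLE contractors (name TEXT);
    INSERT INTO contractors VALUES ('Ravi'), ('John');
""")

# One UPDATE statement changes every matching row (set-at-a-time).
updated_rows = conn.execute(
    "UPDATE emp SET salary = salary + 10 WHERE dept = 'IT'"
).rowcount

def names(sql):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

union = names("SELECT name FROM emp UNION SELECT name FROM contractors")
intersection = names("SELECT name FROM emp INTERSECT SELECT name FROM contractors")
minus = names("SELECT name FROM emp EXCEPT SELECT name FROM contractors")
print(updated_rows, union, intersection, minus)
```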

Rule 8: Physical data independence:


Application programs and terminal activities remain logically unimpaired whenever
any changes are made in either storage representations or access methods.

Rule 9: Logical data independence:


If the logical structure (table structures) of the database changes, the user's
view of the data should not change. For example, if a table is split into two tables, a new view should
return the join of the two tables. This rule is the most difficult to satisfy.
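The table-split case can be sketched as follows, assuming SQLite through Python's sqlite3 module. The names employee, employee_core and employee_pay are hypothetical: the old table name survives as a view over the two new tables, so the application's query does not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The original 'employee' table has been split into two tables.
    CREATE TABLE employee_core (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_pay (id INTEGER PRIMARY KEY, salary INTEGER);
    INSERT INTO employee_core VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO employee_pay VALUES (1, 100), (2, 200);
    -- A view with the old name presents the join of the two tables.
    CREATE VIEW employee AS
        SELECT c.id, c.name, p.salary
        FROM employee_core AS c JOIN employee_pay AS p ON p.id = c.id;
""")

# The application's query is the same as before the split.
rows = conn.execute("SELECT id, name, salary FROM employee ORDER BY id").fetchall()
print(rows)
```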

Rule 10: Integrity independence:


The database should be able to enforce its own integrity rather than relying on other
programs. Key and check constraints, triggers, etc. should be stored in the data dictionary.
This also makes the RDBMS independent of the front end.
E.g.: entity integrity, referential integrity.
Rule 11: Distribution independence:
The end-user must not be able to see that the data is distributed over
various locations. Users should always get the impression that the data is
located at one site only.

Rule 12: The nonsubversion rule:


If a relational system has a low-level (single-record-at-a-time) language,
that low level cannot be used to subvert or bypass the integrity rules and
constraints expressed in the higher-level relational language
(multiple-records-at-a-time).
**No cheating rule.
DB Transaction Concept

A transaction can be defined as a group of tasks. A single task is the minimum
processing unit, which cannot be divided further.

Let’s take an example of a simple transaction. Suppose a bank employee transfers
Rs 500 from A's account to B's account. This very simple and small transaction
involves several low-level tasks.
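The transfer can be sketched as one transaction, assuming SQLite through Python's sqlite3 module. The account table, the opening balances and the function names are hypothetical; the point is only that the debit and the credit are committed together or not at all.

```python
import sqlite3

def open_bank():
    """Create a tiny in-memory bank with accounts A and B."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY,"
        " balance INTEGER CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 200)")
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
            credited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            if credited.rowcount != 1:
                raise ValueError("unknown destination account")
    except (sqlite3.IntegrityError, ValueError):
        return False
    return True

def balances(conn):
    """Return {account name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = open_bank()
print(transfer(conn, "A", "B", 500), balances(conn))
print(transfer(conn, "A", "C", 100), balances(conn))
```

The second call fails after the debit has already been executed (account C does not exist), so the rollback restores A's balance and the call returns False.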
TRANSACTIONS: ACID PROPERTIES
 The key concept is a transaction: a sequence of database actions
(reads/writes) treated as a unit in order to ensure accuracy, completeness, and data
integrity.
 Atomicity: the DBMS ensures the all-or-nothing property even if the system
crashes in the middle of a transaction.
 Consistency: each transaction, executed completely, must take the DB between
consistent states or must not run at all.
 Isolation: the DBMS ensures that concurrent transactions appear to run in
isolation.
 Durability: the DBMS ensures the durability of committed transactions even if the
system crashes.

Steps of a transaction:

1) Begin the transaction.
2) Execute a set of data manipulations and/or queries.
3) If no error occurs, then commit the transaction.
4) If an error occurs, then roll back the transaction.

States of a transaction:

• Active − In this state, the transaction is being executed. This is the initial state of
every transaction.

• Partially Committed − When a transaction executes its final operation, it is said
to be in a partially committed state.

• Failed − A transaction is said to be in a failed state if any of the checks made by
the database recovery system fails. A failed transaction can no longer proceed.

• Aborted − If any of the checks fails and the transaction has reached a failed
state, the recovery manager rolls back all its write operations on the
database to bring the database back to the state it was in prior to
the execution of the transaction. Transactions in this state are called aborted.
The database recovery module can select one of two operations after a
transaction aborts −
Re-start the transaction
Kill the transaction

• Committed − If a transaction executes all its operations successfully, it is
said to be committed. All its effects are now permanently established on
the database.
DB Transaction Concept
In a multiprogramming environment where multiple transactions can be
executed simultaneously, it is highly important to control the concurrency of
transactions. We have concurrency control protocols to ensure atomicity, isolation,
and serializability of concurrent transactions. Concurrency control protocols can
be broadly divided into two categories −
Lock based protocols
Time stamp based protocols

 Lock-Based Protocols:
Database systems equipped with lock-based protocols use a mechanism by
which no transaction can read or write data until it acquires an appropriate
lock on it. Locks are of two kinds −
Binary Locks − A lock on a data item can be in one of two states: locked or
unlocked.
Shared/exclusive − This type of locking mechanism differentiates locks based
on their use. If a lock is acquired on a data item to perform a write operation, it is
an exclusive lock, because allowing more than one transaction to write to the same data
item would lead the database into an inconsistent state. Read locks are shared
because no data value is being changed.
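The shared/exclusive rule can be sketched as a toy lock table in Python. Everything here (the LockTable class, the 'S'/'X' mode codes, the transaction ids) is hypothetical and only models compatibility; a real lock manager would also queue waiting transactions and handle deadlocks.

```python
class LockTable:
    """Toy shared/exclusive lock table (illustrative only)."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of transaction ids holding it)

    def acquire(self, txn, item, mode):
        """Try to lock item in mode 'S' (shared) or 'X' (exclusive)."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may re-acquire, or upgrade from S to X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # readers can share the item
            return True
        return False  # any request that conflicts with an X lock must wait

    def release(self, txn, item):
        """Drop txn's lock on item; remove the entry when nobody holds it."""
        _, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

table = LockTable()
print(table.acquire("T1", "x", "S"))  # first reader
print(table.acquire("T2", "x", "S"))  # second reader shares the lock
print(table.acquire("T3", "x", "X"))  # writer is refused while readers hold it
```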
UNIT 4

Data integrity, integrity constraints, Auditing, backup and
recovery. Data dictionary, system catalogue,
introduction to distributed data base. Introduction to
client – server and ODBC connectivity.
Data Integrity
Data integrity refers to the accuracy and consistency (validity) of data over its
lifecycle. Compromised data, after all, is of little use to enterprises, not to mention
the dangers presented by sensitive data loss. For this reason, maintaining data
integrity is a core focus of many enterprise security solutions.

Consider the tables employees and departments and the business rules for the
information in each of the tables.
 Integrity Constraints:
• Are a set of rules used to maintain the quality of information.
• Ensure that data insertion, updating, and other processes are performed in
such a way that data integrity is not affected.
• Guard against accidental damage to the database.

Types of integrity constraints :


1) Domain Constraint
2) Entity Integrity Constraint
3) Referential Integrity Constraint
4) Key Constraint

 Domain constraints
• Domain constraints can be defined as the definition of a valid set of values for
an attribute.

• The data type of domain includes string, character, integer, time, date, currency,
etc. The value of the attribute must be available in the corresponding domain.
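A domain constraint can be sketched with a CHECK clause, assuming SQLite through Python's sqlite3 module. The employee table and the age range 18 to 65 are made-up examples of a domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause restricts 'age' to the domain of integers from 18 to 65.
conn.execute(
    "CREATE TABLE employee ("
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 18 AND 65))"
)

def try_insert(name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Asha", 30), try_insert("Ravi", 12), try_insert("Meena", "abc"))
```

Only the first row is accepted: 12 is outside the range and "abc" is not an integer, so both fall outside the attribute's domain.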
INTEGRITY CONSTRAINTS
 An integrity constraint is a rule that restricts the values
that may be present in the database.

 Entity integrity - The rows in a table represent entities, and
each one must be uniquely identified. Hence we have the
primary key, which must have a unique non-null value for
each row.

 Referential integrity - This constraint involves the foreign keys.
Foreign keys tie the relations together, so it is vitally
important that the links are correct. Every foreign key must
either be null or its value must be the actual value of a key
in another table.

 Entity Integrity Constraints

• The entity integrity constraint states that a primary key value can't be null.

• This is because the primary key value is used to identify individual rows in a
relation, and if the primary key has a null value, then we can't identify
those rows.

• A table can contain null values in fields other than the primary key.

 Referential Integrity Constraints

• A referential integrity constraint is specified between two tables.

• Under a referential integrity constraint, if a foreign key in Table 1 refers to
the primary key of Table 2, then every value of the foreign key in
Table 1 must either be null or be present in Table 2.

 Key Constraints

• A key is an attribute, or set of attributes, used to identify an entity uniquely
within its entity set.

• An entity set can have multiple keys, one of which is chosen as the
primary key. A primary key must contain unique, non-null values in the
relational table.

 NOT NULL Integrity Constraints


By default, all columns in a table allow nulls. Null means the absence of
a value. A NOT NULL constraint requires that a column of a table contain no
null values. For example, you can define a NOT NULL constraint to
require that a value be input in the last_name column for every row of
the employees table.

 UNIQUE Key Integrity Constraints


A UNIQUE key constraint requires that every value in a column or set of columns (key)
be unique; that is, no two rows of a table have duplicate values in the specified
column or set of columns.

For example, a UNIQUE key constraint defined on the DNAME column of
the dept table disallows rows with duplicate department names.
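The dept example can be tried out as below, assuming SQLite through Python's sqlite3 module. The deptno column and the department names are made up; only the UNIQUE constraint on dname comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE on dname disallows two departments with the same name.
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT UNIQUE)")

def add_dept(deptno, dname):
    """Insert a department; return False if a constraint rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO dept VALUES (?, ?)", (deptno, dname))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_dept(10, "SALES"), add_dept(20, "SALES"), add_dept(30, "RESEARCH"))
```

The second insert is rejected because "SALES" is already present; the other two succeed.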

 PRIMARY Key Integrity Constraints


Each table in the database can have at most one PRIMARY KEY constraint. The values
in the group of one or more columns subject to this constraint constitute the unique
identifier of the row. In effect, each row is named by its primary key values.
• No two rows of a table have duplicate values in the specified column or set of
columns.
• The primary key columns do not allow nulls. That is, a value must exist for the
primary key columns in each row.
 FOREIGN Key Integrity Constraints

 A foreign key is a column (or combination of columns) in a table whose
values must match values of a column in some other
table. FOREIGN KEY constraints enforce referential integrity,
which essentially says that if column value A refers to column
value B, then column value B must exist.

For example, if orders.customer_id is a foreign key that references customers.id:

• Each value inserted or updated in orders.customer_id must exactly
match a value in customers.id, or be NULL.

• Values in customers.id that are referenced
by orders.customer_id cannot be deleted or updated, unless
you have cascading actions. However, values
of customers.id that are not present in orders.customer_id can
be deleted or updated.
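The customers/orders rules can be checked with a small sketch, assuming SQLite through Python's sqlite3 module. The column layout and sample rows are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

def open_shop():
    """Create customers and orders, with orders.customer_id as a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customers(id))"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi')")
    # A NULL foreign key is allowed; it simply refers to no customer.
    conn.execute("INSERT INTO orders VALUES (100, 1), (101, NULL)")
    conn.commit()
    return conn

def allowed(conn, sql):
    """Run one statement; return False if an integrity constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

shop = open_shop()
print(allowed(shop, "INSERT INTO orders VALUES (102, 99)"))   # no customer 99
print(allowed(shop, "DELETE FROM customers WHERE id = 1"))    # still referenced
print(allowed(shop, "DELETE FROM customers WHERE id = 2"))    # not referenced
```

No cascading action is declared here, so deleting a referenced customer is refused; only the unreferenced customer can be deleted.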
Database Backup & Recovery

One of the key responsibilities of a database administrator (DBA) is
to prepare for the possibility of media, hardware and software failure as
well as to recover databases during a disaster. Should any of these
failures occur, the major objective is to ensure that the database is
available to users within an acceptable time period, while ensuring that
there is no loss of data. Two questions follow:

• Is the DBA confident that the data on which the company's business
depends are backed up successfully and that the data can be
recovered from these backups within the permissible time limits, per a
Service Level Agreement (SLA) or recovery time objective, as specified
in the organization’s disaster recovery plan?

• Has the DBA taken measures to draft and test the procedures to
protect as well as recover the databases from numerous types of
failures?

• Develop a comprehensive backup plan.
 What needs to be backed up? E.g.: OS, RDBMS software, passwords.

• Perform effective backup management.
 Automate, monitor and validate backups.

• Perform periodic database restore testing.

• Have backup and recovery SLAs drafted and communicated to all stakeholders.

• Have the database portion of the Disaster Recovery Plan (DRP) drafted and
documented.

• Keep your knowledge and know-how on database and OS backup and
recovery tools up to date.
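The "back up, then test the restore" idea can be sketched on a very small scale, assuming SQLite through Python's sqlite3 module (Connection.backup exists since Python 3.7). The emp table and the row-count check are illustrative assumptions, not a complete validation procedure.

```python
import sqlite3

def backup_and_verify(source, count_sql):
    """Copy source into a new in-memory database and run a simple restore check."""
    target = sqlite3.connect(":memory:")
    source.backup(target)  # copies the whole database into target
    # Validate the copy: structural check plus a row-count comparison.
    intact = target.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    same_rows = (source.execute(count_sql).fetchone()[0]
                 == target.execute(count_sql).fetchone()[0])
    return target, intact and same_rows

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")
source.execute("INSERT INTO emp VALUES (1, 'Asha'), (2, 'Ravi')")
source.commit()

restored, verified = backup_and_verify(source, "SELECT COUNT(*) FROM emp")
print(verified)
```

In practice the target would be a file or another server, and the restore test would run on a schedule; the sketch only shows that a backup is not trusted until it has been read back and checked.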
Data Dictionary

DEFINITION:
A collection of names, definitions, and attributes about data elements that
are being used or captured in a database, information system, or part of a
research project. It describes the meanings and purposes of data elements
within the context of a project, and provides guidance on interpretation,
accepted meanings and representation. A Data Dictionary also provides
metadata about data elements. The metadata included in a Data Dictionary can
assist in defining the scope and characteristics of data elements, as well as the rules
for their usage and application.

USES:
• Assist in avoiding data inconsistencies across a project
• Help define conventions that are to be used across a project
• Provide consistency in the collection and use of data across multiple members
of a research team
• Make data easier to analyze
• Enforce the use of Data Standards
System Catalogue
The system catalogue is a group of tables and views that hold vital details about a database.
Every database contains a system catalogue, and the information in the system
catalogue specifies the framework of the database.
The system catalogue is a set of objects containing information that defines:
• Other objects included in the database
• The database structure itself
• Several other vital pieces of information
A system catalogue implementation can be split into logical groups of
objects, so that some tables are accessible not just to the database administrator but
to all other database users as well. For instance, users might want to see the specific
database privileges that they have been granted, but have no need to
know about the database's processes or internal structure.

A user generally looks up the system catalogue to gain information about the
user's own objects and privileges, while the database administrator must be able to
inquire about any event or structure inside the database. In certain implementations,
some system catalogue objects are accessible only to the database administrator.

A system catalogue is extremely important to database administrators and to all other
database users who wish to understand the nature and structure of a database. The system
catalogue allows order to be kept, not just by the users and the database administrator,
but also by the database server itself.
Distributed Database
A Distributed Database (DDB) is an integrated collection of databases that is
physically distributed across sites in a computer network. Portions of the
database are stored in multiple physical locations and processing is distributed
among multiple database nodes. To form a distributed database system (DDBS),
the files must be structured, logically interrelated, and physically distributed
across multiple sites. In addition, there must be a common interface to access
the distributed data.
 TYPES :
 Homogenous Database
All sites store the database identically. The operating system, database
management system and data structures used are the same at all sites.
Hence, they are easy to manage.
 Heterogeneous Database
Different sites can use different schemas and software, which can lead to
problems in query processing and transactions. Also, a particular site might be
completely unaware of the other sites. Different computers may use different
operating systems and different database applications. They may even use different
data models for the database. Hence, translations are required for different sites
to communicate.

 Distributed Data Storage:


 Replication
 Fragmentation
 Horizontal Fragmentation (Splitting by Rows)
 Vertical Fragmentation (Splitting by Columns)
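The two kinds of fragmentation can be sketched in plain Python on a list of rows. The employees data and both function names are hypothetical; in a real distributed database each fragment would be stored at a different site.

```python
def horizontal_fragment(rows, predicate):
    """Split by rows: those that satisfy predicate, and the rest."""
    matching = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matching, rest

def vertical_fragment(rows, key, columns):
    """Split by columns, keeping the key in every fragment so rows can be rejoined."""
    return [{key: r[key], **{c: r[c] for c in columns}} for r in rows]

employees = [
    {"id": 1, "name": "Asha", "city": "Delhi", "salary": 100},
    {"id": 2, "name": "Ravi", "city": "Mumbai", "salary": 200},
]

# Horizontal: each site could hold the employees of one city.
delhi, others = horizontal_fragment(employees, lambda r: r["city"] == "Delhi")

# Vertical: personal details at one site, pay details at another.
details = vertical_fragment(employees, "id", ["name", "city"])
pay = vertical_fragment(employees, "id", ["salary"])

# Rejoining the vertical fragments on the key reconstructs the original rows.
pay_by_id = {r["id"]: r for r in pay}
rejoined = [{**d, **pay_by_id[d["id"]]} for d in details]
print(rejoined == employees)
```

Keeping the key in every vertical fragment is what makes the original table recoverable by a join.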
Open Database Connectivity (ODBC)

ODBC is a C programming language interface that makes it possible for
applications to access data from a variety of DBMSs. ODBC is a low-level,
high-performance interface that is designed specifically for relational data stores.

• Allows maximum interoperability.
• Is independent of any particular DBMS from which it accesses data.
• Lets you add software components called drivers, which interface between an
application and a specific DBMS.

 The application calls ODBC functions through an ODBC driver manager with
which it is linked, and the driver passes the query to the DBMS.

E.g.: SQLite, MySQL



Security & Integrity Violations

The data stored in the database needs to be protected from unauthorized
access, malicious destruction or alteration, and accidental introduction of
inconsistency.

 Accidental loss of data from:


• Crashes during transaction processing
• Anomalies due to concurrent access to the database
• Anomalies due to the distribution of data over several computers

 Malicious Access :
• Unauthorized reading of data (theft of information)
• Unauthorized modification of data
• Unauthorized destruction of data
