
Database-notes

The document provides an overview of databases, including their definitions, importance, applications, and major components. It discusses database management systems (DBMS), their functions, and the languages used for data definition, manipulation, and control. Additionally, it covers data modeling concepts, including entities, attributes, relationships, and the process of creating entity-relationship diagrams.

Uploaded by

foorwebiosapp
Copyright
© All Rights Reserved

Chapter: Introduction to Database

What is a Database?
A database is an organized collection of a large amount of data, stored and managed by software.
Using database software makes it easier to find, add, update and delete data.

Why study databases?

Databases are useful:
-Many computing applications deal with large amounts of information.
-Database systems provide a set of tools for storing, searching and managing this information.
Databases in CS:
-Data is a core topic in computer science.
-Basic concepts and skills with databases are part of the skill set expected in computer science.

What are the applications/uses of databases?

-Medical records
-Bank accounts
-Student records
-Airlines: reservations, schedules
-Customer history
-Web indexes
-Library catalogs
-Sales: customers, products, purchases
-Manufacturing: production, inventory, orders, supply chain
-Human resources: employee records, salaries, tax deductions
What are the major components of a database system?
-Data (the database)
-Software
-Hardware
-Users

[Figure: components of a database system. User and application program; software to process queries; software to access stored data; metadata; stored data]
Who are the users of a database?
End user: Uses the database to achieve some goal.
Application developer: Writes software that allows end users to interface with the database system.
DBA: The database administrator designs and manages the database system.
Database system programmer: Writes the database software (SQL commands).

What does a database system allow users to do?

-Store data
-Update data
-Delete data
-Retrieve data
-Organize data (indexing/sorting)
-Protect data (security)
-Apply conditions
What is a file-based system? What are its problems?

A file-based system is a way of managing data where each set of data is stored in separate files,
often in a flat, unstructured format.

File based systems

• Data is stored in files


• Each file has a specific format
• Programs that use these files depend on knowledge about that format

A file-based system is not as good as a database system because of the following problems (the problems of early data management systems):

1. Data Redundancy and Inconsistency: Data is often duplicated across multiple files,
leading to redundancy and potential inconsistencies.
2. Data Isolation: It is difficult to access and integrate data scattered across different files.
3. Integrity Problems: Enforcing data integrity rules (like constraints and relationships) is
difficult.
4. Atomicity Issues: Ensuring that all parts of a transaction are completed successfully (or
none at all) is complex.
5. Concurrent Access: Handling multiple users accessing and updating data simultaneously
is challenging.
6. No standards
7. Data dependence
8. No way to generate ad hoc queries
9. No provision for security, recovery, concurrency, etc
10. Security: Implementing robust security measures is harder compared to database
systems.
11. Lack of Data Independence: Changes in data structure often require changes in application
programs, unlike databases, which separate data and application logic.
Chapter: Database System Concepts and Architecture

What is a DBMS (Database Management System)?

Database Management System
A DBMS is a program used to control, access and manipulate the data in a database.
It provides an environment in which the user can interact with the database.

Examples of DBMS:
-Oracle
-MS Access
-MySQL
-SQL Server
-DB2
-Ingres
-PostgreSQL

What does a DBMS do?

A DBMS provides:
-Persistence (stability)
-Consistency
-Integrity (completeness)
-Security
-Data independence
-Avoidance of redundancy/duplication
-Data dictionary
A DBMS provides users with:
-DDL (Data Definition Language)
-DML (Data Manipulation Language) and
-DCL (Data Control Language): grant, revoke, access control, authorization
(often all within one language, such as SQL)

Describe DDL, DML and DCL.

DDL (Data Definition Language):

DDL is used to define and manage database structures. It includes commands that create, alter,
and delete database objects such as tables, indexes, and schemas.

• Commands: CREATE, ALTER, DROP

DML (Data Manipulation Language)

DML is used to manipulate the data within the database. It includes commands that allow users
to insert, update, delete, and retrieve data from database tables.

• Commands: INSERT, UPDATE, DELETE, SELECT

• Procedural DML:
  - Requires the user to specify what data is needed and how to get it.
  - The user must define the sequence of operations to retrieve or manipulate the
    data.
  - Example: procedural extensions of SQL such as PL/SQL or T-SQL, where the user writes
    procedures and functions that include detailed steps and loops.
• Non-Procedural DML:
  - Requires the user to specify what data is needed without defining the sequence
    of operations.
  - The database system determines the optimal way to retrieve or manipulate the
    data.
  - Example: Standard SQL (SELECT, INSERT, UPDATE, DELETE), where the
    user specifies the desired data through high-level queries.
Non-procedural DML is declarative and generally considered easier to use, because it leaves the
database system free to choose and optimize the execution plan.
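The difference between DDL, DML and the two styles of data manipulation can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the Customers table, its columns and its rows are made up for the example.

```python
import sqlite3

# Hypothetical Customers table in a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# DML: insert and update rows.
conn.executemany(
    "INSERT INTO Customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Dhaka"), (2, "Ben", "Khulna"), (3, "Chitra", "Dhaka")],
)
conn.execute("UPDATE Customers SET city = 'Sylhet' WHERE id = 2")

# Non-procedural DML: state WHAT is wanted; the DBMS decides how to find it.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM Customers WHERE city = 'Dhaka' ORDER BY id")]

# Procedural style: spell out HOW, row by row, with an explicit loop and test.
procedural = []
for row_id, name, city in conn.execute("SELECT id, name, city FROM Customers ORDER BY id"):
    if city == "Dhaka":
        procedural.append(name)

print(declarative, procedural)
```

Both approaches return the same names; the declarative query simply leaves the choice of access path to the database system.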

DCL (Data Control Language):

DCL is used to control access to data in the database (access control and authorization). It includes
commands that grant or revoke permissions to users or groups of users.

• Commands: GRANT, REVOKE
• Example:

GRANT SELECT ON Customers TO user1;

These commands control access permissions for the database.

What is a Data Dictionary?

A data dictionary is a central collection of information about data elements, including their
definitions, meanings, relationships, origin, usage, format, and properties.

A data dictionary provides:
-Metadata management: storing definitions, relationships, and formats of data elements.
-Records of tables, users, rules, views and indexes.
-Logs (who uses the database and what data is being used).
-Data integrity: enforcing data integrity rules and constraints.
-Data access control: managing access permissions and security policies for data
elements.
-Data integration: facilitating data integration by providing a clear map of data
relationships and dependencies.
-Schemas and mappings.
What are the Levels of Abstraction?

Physical level: describes how a record (e.g., customer) is stored.
Logical level: describes the data stored in the database, and the relationships among the data.
    type customer = record
        name : string;
        street : string;
        city : string;
    end;
View level: application programs hide details of data types. Views can also hide information
(e.g., salary) for security purposes.
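As a small sketch of the view level, the following uses Python's sqlite3 module to define a view that hides the salary column. The employee table, the view name employee_public and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full employee record, including the sensitive salary.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000)")

# View level: expose only the columns that ordinary users are allowed to see.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who is given access only to the view never sees the salary column, even though it is still stored in the underlying table.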

View of Data

What are Instances and Schemas?

Schema – the logical structure of the database (overall design), e.g., the database consists of
information about a set of customers and accounts and the relationship between them.
Analogous to the type information of a variable in a program.

Physical schema: database design at the physical level
Logical schema: database design at the logical level

Instance – the actual content of the database at a particular point in time.
Analogous to the value of a variable.
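The distinction can be sketched with a small experiment: inserting a row changes the instance but not the schema. The account table and the helper functions schema and instance are hypothetical names chosen for this illustration, which uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number INTEGER PRIMARY KEY, balance INTEGER)")

def schema(connection):
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk). Keep the name and type.
    return [(row[1], row[2]) for row in connection.execute("PRAGMA table_info(account)")]

def instance(connection):
    # The instance is the content of the table at this moment.
    return connection.execute(
        "SELECT number, balance FROM account ORDER BY number").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO account VALUES (101, 500)")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after, instance_before, instance_after)
```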
What is data independence?

Physical Data Independence – the ability to modify the physical schema without changing the
logical schema
Applications depend on the logical schema
In general, the interfaces between the various levels and components should be well defined so
that changes in some parts do not seriously influence others.
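One way to see physical data independence in practice is to add an index, which changes how data is stored and searched but not the logical schema. In this minimal sketch (Python's sqlite3 module, with a made-up customer table and index name), the same query gives the same answer before and after the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Dhaka"), (2, "Ben", "Khulna"), (3, "Chitra", "Dhaka")],
)

# The application's query is written against the logical schema only.
query = "SELECT name FROM customer WHERE city = 'Dhaka' ORDER BY name"
before = conn.execute(query).fetchall()

# Physical change: add an index on city. No table or column definition changes.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after = conn.execute(query).fetchall()

print(before == after)
```

The application code did not have to change; only the way the DBMS may locate the rows did.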

What is ANSI/SPARC Architecture?

ANSI/SPARC stands for American National Standards Institute, Standards Planning And
Requirements Committee.

1) Internal level: For system designers/programmers
2) Conceptual level: For database designers and administrators (DBA)
3) External level: For database users

1) Internal Level:
• Deals with physical storage of data
• Structure of records on disk - files, pages, blocks
• Indexes and ordering of records
• Used by database system programmers
2) Conceptual Level:
• Deals with the organisation of the data as a whole
• Abstractions are used to remove unnecessary details of the internal level
• Used by DBAs and application programmers
3) External Level:
• Provides a view of the database tailored to a user
• Parts of the data may be hidden
• Data is presented in a useful form
• Used by end users and application programmers
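[Figure: ANSI/SPARC architecture. User 1, User 2 and User 3 work through External View 1 and External View 2; the external views map to the Conceptual View (DBA), which maps to the Stored Data]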

Who is the Database Administrator?

The database administrator (DBA) coordinates all the activities of the database system and has a good
understanding of the enterprise’s information resources and needs.
The database administrator's duties include:
-Schema definition
-Storage structure and access method definition
-Schema and physical organization modification
-Granting user authority to access the database
-Specifying integrity constraints
-Acting as liaison with users
-Monitoring performance and responding to changes in requirements
What is mapping?

• Mappings translate information from one level to the next
• External/Conceptual mapping (gives logical data independence)
• Conceptual/Internal mapping (gives physical data independence)
• These mappings provide data independence

• Physical data independence
  • Changes to internal level shouldn’t affect conceptual level
• Logical data independence
  • Conceptual level changes shouldn’t affect external levels

What is Database System Structure?

A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into the
storage manager and the query processor components.

The storage manager is important for data and memory management and acts as the interface to the stored data.

The query processor is important because it helps the database system simplify and facilitate access
to data.

What is Storage Management?

The storage manager is a program module that provides the interface between the low-level data
stored in the database and the application programs and queries submitted to the system.

The storage manager is responsible for the following tasks:
-Interaction with the file manager
-Efficient storing, retrieving and updating of data
The storage manager components include:
-Authorization and integrity manager
-Transaction manager
-File manager
-Buffer manager
Compare Disk-Based and Memory Resident DBMS.

• In a disk-based DBMS, the original database resides on disk and a part of the database
is loaded into main memory for processing.
• In a main-memory DBMS (MMDBMS), the original copy of the database resides in main memory and a
back-up is kept on disk.
• Memory access is faster than disk access, so a main-memory database (MMDB) is faster than a disk-based DBMS.
• Memory should be non-volatile in an MMDB, because ordinary main memory loses its contents when power is lost.
• A disk-based DBMS is highly scalable, whereas an MMDBMS has limited scalability.

What is Centralized Architecture?

• A centralized DBMS runs on a single computer system and doesn’t interact
with other computer systems.
• All processing is performed centrally.
• A number of controllers are used for different purposes.
What is Client-Server Architecture?

• Client-server systems have functionality split between a server system and multiple
client systems.
• The server system satisfies requests generated by the client systems.
• Database functionality is divided into two parts: front-end and back-end.
– The back-end manages access structures, query evaluation, concurrency control and
recovery.
– The front-end consists of tools such as forms, report writers and
graphical user interface facilities.

[Figure: several clients connected to one server]
Chapter: Data modeling

What is Data Modeling?

Data Modeling in Databases is the process of defining and structuring the data elements and
their relationships within a database, using conceptual, logical, and physical models. This
ensures efficient organization, storage, and retrieval of data in the database system.

Database Design
-Conceptual design: Build a model independent of the choice of DBMS
-Logical design: Create the database in a given DBMS
-Physical design: How the database is stored in hardware

Diagramming Entities
In an E/R Diagram, an entity is usually drawn as a box with rounded corners

• The box is labelled with the name of the class of objects represented by that entity
Relationships
A relationship is an association between two or more entities
• Each Student takes several Modules
• Each Module is taught by a Lecturer
• Each Employee works for a single Department

Relationships have
• A name
• A set of entities that participate in them
• A degree: the number of entities that participate (most have degree 2)
• A cardinality ratio

Cardinality Ratios
Each entity in a relationship can participate in zero, one, or more than one instances of that
relationship
• This leads to 3 types of relationship:

• One to one (1:1)
  • Each lecturer has a unique office
• One to many (1:M)
  • A lecturer may tutor many students, but each student has just one tutor
• Many to many (M:M)
  • Each student takes several modules, and each module is taken by several students

Diagramming Relationships
Relationships are links between two entities
• The name is given in a diamond box
• The ends of the link show cardinality
Removing M:M Relationships
• Many to many relationships are difficult to represent directly in database tables
• We can split a many to many relationship into two one to
many relationships
• A new entity represents the M:M relationship
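The split can be sketched with the student/module example. In this minimal illustration (Python's sqlite3 module), the enrolment table is the new entity; all table names, column names and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE module  (module_id  INTEGER PRIMARY KEY, title TEXT);

-- enrolment replaces the M:M link: one student has many enrolments,
-- and one module has many enrolments (two 1:M relationships).
CREATE TABLE enrolment (
    student_id INTEGER REFERENCES student(student_id),
    module_id  INTEGER REFERENCES module(module_id),
    PRIMARY KEY (student_id, module_id)
);

INSERT INTO student VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO module  VALUES (10, 'Database Systems'), (20, 'Networks');
INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
""")

# Follow the two 1:M links to list the students taking one module.
enrolled = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM student s
    JOIN enrolment e ON e.student_id = s.student_id
    JOIN module m    ON m.module_id = e.module_id
    WHERE m.title = 'Database Systems'
    ORDER BY s.name
""")]
print(enrolled)
```

The composite primary key on enrolment prevents the same student from being recorded twice on the same module.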
Making E/R Models
To make an E/R model you need to identify, from a description:
• Entities
• Attributes
• Relationships
• Cardinality ratios
General guidelines
• Since entities are things or objects they are often nouns in the description
• Attributes are facts or properties, and so are often nouns also
• Verbs often describe relationships between entities

Example
A university consists of a number of departments. Each department offers several courses. A
number of modules make up each course. Students enrol in a particular course and take modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer tutors a group of students
Example – Entities
The nouns in the description above are the candidate entities: departments, courses, modules,
students and lecturers.
Entities and Attributes
Sometimes it is hard to tell if something should be an entity
or an attribute
• They both represent objects or facts about the world
• They are both often represented by nouns in descriptions
• General guidelines
• Entities can have attributes but attributes have no smaller parts
• Entities can have relationships between them, but an attribute belongs to a single entity

Example
We want to represent information about products in a database. Each product has a description, a
price and a supplier. Suppliers have addresses, phone numbers, and names. Each address is made
up of a street address, a city, and a postcode.

Example - Entities/Attributes
• Entities or attributes:
• product
• description
• price
• supplier
• address
• phone number
• name
• street address
• city
• postcode
• Products, suppliers, and addresses all have smaller parts so we can make them entities
• The others have no smaller parts and belong to a single entity

Example – Relationships
• Each product has a supplier
• Each product has a single supplier but there is nothing to stop a supplier supplying many
products
• A many to one relationship
• Each supplier has an address
• A supplier has a single address
• It does not seem sensible for two different suppliers to have the same address
• A one to one relationship
One to One Relationships:
• Some relationships between entities, A and B, might be redundant if
• It is a 1:1 relationship between A and B
• Every A is related to a B and every B is related to an A
• Example – the supplier-address relationship
• Is one to one
• Every supplier has an address
• We don’t need addresses that are not related to a supplier

Redundant Relationships
• We can merge the two entities that take part in a redundant relationship together
• They become a single entity
• The new entity has all the attributes of the two old ones
Making E/R Diagrams
• From a description of the requirements identify the
• Entities
• Attributes
• Relationships
• Cardinality ratios of the relationships
• Draw the E/R diagram and then
• Look at one to one relationships as they might be redundant
• Look at many to many relationships as they might need to be split into two one to
many links

Debugging Designs
With a bit of practice E/R diagrams can be used to plan queries
• You can look at the diagram and figure out how to find useful information
• If you can’t find the information you need, you may need to change the design
How can you find a list of students who are enrolled in Database systems?
What are Entities, Attributes and Relationships?

• Entity: A thing (object or concept) that has an independent existence in an organization.
Example: Customer, account, etc. in a Bank Information System.
• Attribute: A property of an entity, e.g. a customer entity can be described by name, date of
birth etc.
• Relationship: A relationship is an association between two or more entities.

What are the different Types of Attribute?

• Simple attribute: Cannot be divided into sub-parts.
• Composite attribute: Can be divided into sub-parts, e.g. address.
• Single-valued attribute: Takes one value for a specific entity, e.g. account number.
• Multi-valued attribute: Takes a set of values for a specific entity, e.g. phone number (0, 1
or more than 1 values).

What is ER diagram?

An Entity-Relationship (ER) Diagram is a visual representation of the data entities and their
relationships within a database.

A rectangle represents an entity, e.g. Customer.
An ellipse represents an attribute, e.g. Name.
A diamond represents a relationship, e.g. Opens.
A line links an entity to an attribute, or an entity to a relationship.
A double ellipse represents a multi-valued attribute, e.g. address.
A dashed ellipse represents a derived attribute, e.g. age (derived from date-of-birth).
Double lines represent total participation of an entity in a relationship set.

[Figure: Customer and Loan entity sets linked by the Borrower relationship set, with attributes Loan number and Amount]

What are the types of Mapping Cardinalities?

Mapping Cardinalities
Express the number of entities to which another entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types:
-One to one
-One to many
-Many to one
-Many to many

• One-to-one: A customer can open only one account. An account can be opened by only
one customer.
• One-to-many: A customer can open one or more accounts. An account can be opened
by only one customer.
• Many-to-one: A customer can open only one account. An account can be opened by one
or more customers.
• Many-to-many: A customer can open many accounts. An account can be opened by
many customers.

[Figure: mappings between sets A and B: (a) one to one, (b) one to many, (c) many to one, (d) many to many]
Note: Some elements in A and B may not be mapped to any elements in the other set
What are Participation Constraints?
• Total: Every entity in entity set E participates in at least one relationship in relationship set R.
• Partial: Only some entities in E participate in relationships in R.

How do Mapping Cardinalities affect ER Design?

We can make access-date an attribute of account, instead of a relationship attribute, if each account
can have only one customer, i.e., the relationship from account to customer is many to one, or
equivalently, customer to account is one to many.

What are the symbols of E-R Diagrams?


Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes
-Double ellipses represent multivalued attributes.
-Dashed ellipses denote derived attributes.
Underline indicates primary key attributes (will study later)

Give an example of an E-R Diagram with Composite, Multivalued, and Derived Attributes.

Give an example of Relationship Sets with Attributes.


What are Roles in an E-R diagram?

-Entity sets of a relationship need not be distinct, e.g. the works-for relationship set relates
the employee entity set to itself.
-The labels “manager” and “worker” are called roles; they specify how employee entities interact
via the works-for relationship set.
-Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
-Role labels are optional, and are used to clarify the semantics of the relationship.

What are Cardinality Constraints?

-We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an
undirected line (—), signifying “many,” between the relationship set and the entity set.
-E.g.: One-to-one relationship:
-A customer is associated with at most one loan via the relationship borrower
-A loan is associated with at most one customer via borrower
What is a One-To-Many Relationship?
In the one-to-many relationship from customer to loan, a loan is associated with at most one customer
via borrower, and a customer is associated with several (including 0) loans via borrower.

What are Weak Entity Sets?

-An entity set that does not have enough attributes of its own to form a primary key is referred to as a weak entity set.
-An entity set that has a primary key is termed a strong entity set.
-For a weak entity set to be meaningful, it must be associated with another entity set called the
identifying entity set.
-The association is total and one-to-many from the identifying to the weak entity set.
-The identifying relationship is depicted using a double diamond.
-The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes
among all the entities of the weak entity set that depend on the same strong entity.
-The primary key of a weak entity set is formed by the primary key of the identifying (strong)
entity set on which the weak entity set is existence dependent, plus the weak entity set’s
discriminator.
Give more examples of Weak Entity Sets.

-In a university, a course is a strong entity and a course-offering can be modeled as a weak entity
-The discriminator of course-offering would be semester (including year) and section-number (if
there is more than one section)
-If we model course-offering as a strong entity we would model course-number as an attribute.
Then the relationship with course would be implicit in the course-number attribute
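The course-offering example can be sketched as tables. This is one possible mapping, written with Python's sqlite3 module; the column names and sample values are assumptions. The primary key of the weak entity combines the identifying entity's key (course_number) with the discriminator (semester, section_number).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE course (course_number TEXT PRIMARY KEY, title TEXT);

-- Weak entity: an offering cannot be identified without its course.
CREATE TABLE course_offering (
    course_number  TEXT NOT NULL REFERENCES course(course_number),
    semester       TEXT NOT NULL,
    section_number INTEGER NOT NULL,
    PRIMARY KEY (course_number, semester, section_number)
);

INSERT INTO course VALUES ('CS101', 'Databases'), ('CS102', 'Networks');
-- The same discriminator values may repeat under different courses.
INSERT INTO course_offering VALUES ('CS101', '2024-Spring', 1);
INSERT INTO course_offering VALUES ('CS102', '2024-Spring', 1);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Same course and same discriminator: violates the primary key.
duplicate_rejected = rejected("INSERT INTO course_offering VALUES ('CS101', '2024-Spring', 1)")
# Offering of a course that does not exist: violates the existence dependency.
orphan_rejected = rejected("INSERT INTO course_offering VALUES ('CS999', '2024-Spring', 1)")
offering_count = conn.execute("SELECT COUNT(*) FROM course_offering").fetchone()[0]
print(duplicate_rejected, orphan_rejected, offering_count)
```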

What is Specialization?

-Top-down design process; we designate subgroupings within an entity set that are distinctive
from other entities in the set.
-These subgroupings become lower-level entity sets that have attributes or participate in
relationships that do not apply to the higher-level entity set.
-Depicted by a triangle component labeled ISA (E.g. customer “is a” person).
-Attribute inheritance – a lower-level entity set inherits all the attributes and relationship
participation of the higher-level entity set to which it is linked.
What is existence dependency?

Existence dependency means the lifetime of one entity relies on another. If the existence of an
entity ‘x’ depends on the existence of entity ‘y’, then ‘x’ is called existence dependent on ‘y’. If y
is deleted, then x must be deleted as well.

y is the dominant entity.
x is the subordinate entity.

Imagine a child table (dependent) with a foreign key referencing a parent table. If a record in the
parent table is deleted (ceases to exist), the corresponding records in the child table may also be
deleted to ensure data integrity.
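The parent/child case described above can be sketched with a foreign key declared ON DELETE CASCADE. The example uses Python's sqlite3 module; the table names parent and child and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id) ON DELETE CASCADE
);
INSERT INTO parent VALUES (1), (2);
INSERT INTO child VALUES (10, 1), (11, 1), (12, 2);
""")

# Deleting the dominant entity removes its subordinate entities too.
conn.execute("DELETE FROM parent WHERE id = 1")
remaining_children = conn.execute("SELECT id FROM child ORDER BY id").fetchall()
print(remaining_children)
```

Only the child of parent 2 survives, so no child row is left pointing at a parent that no longer exists.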

What is UML?

UML (Unified Modeling Language) is a standardized visual language for modeling the structure
and behavior of software systems.

[Figure: a Customer with id, name and address, drawn once as an E-R entity with attributes and once as a UML class]

An ERD models entities, whereas UML models objects.

Objects are like entities but additionally provide a set of functions.

ER diagrams focus on data modeling (entities & relationships) while UML offers a broader
toolkit for software system design.

What is Generalization?

-A bottom-up design process – combine a number of entity sets that share the same features into
a higher-level entity set.
-Specialization and generalization are just inversions of each other; they are represented in an
E-R diagram in the same way.
-The terms specialization and generalization are used interchangeably.
Specialization and Generalization

-Can have multiple specializations of an entity set based on different features.
-E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller
-Each particular employee would be a member of one of permanent-employee or
temporary-employee, and also a member of one of officer, secretary, or teller
-The ISA relationship is also referred to as the superclass-subclass relationship

What is Aggregation?

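Consider the ternary relationship works-on, which we saw earlier.
Suppose we want to record managers for tasks performed by an employee at a branch.
Aggregation treats a relationship set (here works-on) as a higher-level entity set, so that it can
itself take part in another relationship.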
What is the difference between a weak entity set and a strong entity set? Explain with an
example.
Draw an ER diagram for the following scenario: "In a university, a student enrolls in one or more
courses. Each course is taught by a single teacher. To maintain instruction quality, a teacher can
deliver only one course."

Why the OO Model?
-Limitations of the relational model for complex applications, e.g. CAD, Internet and bioinformatics
applications.
-To support complex data types, e.g.
Address {Street No, Street Name, City}, Phone-number {0 to n entries}

What are Objects and Classes?

There are many similar objects in a database.
We can group similar objects to form a “Class”.
class employee {
    /* Variables */
    String Name;
    String Address;
    Date Start-date;
    int Salary;
    /* Messages */
    int annual-salary();
    String get-Name();
    String get-Address();
    int Service-Length();
}

What is Inheritance?
-Several classes are similar
-To allow the direct representation of similarities among classes, we can construct a class
hierarchy

[Figure: class hierarchy with Person at the top, Employee and Customer below it, and Officer, Security and Others at the lowest level]

An object of a class contains the variables defined in its superclasses.
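A class hierarchy of this kind can be sketched in Python. The class names follow the hierarchy above; placing Officer under Employee, the office_number attribute, the monthly salary and the sample values are all assumptions made for illustration.

```python
class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address


class Employee(Person):
    def __init__(self, name, address, salary):
        super().__init__(name, address)  # variables defined in the superclass
        self.salary = salary             # assumed to be a monthly salary

    def annual_salary(self):
        return self.salary * 12


class Officer(Employee):
    def __init__(self, name, address, salary, office_number):
        super().__init__(name, address, salary)
        self.office_number = office_number


officer = Officer("Asha", "12 Lake Road", 5000, 42)

# The Officer object holds its own variable plus those of Employee and Person.
officer_variables = sorted(vars(officer))
print(officer_variables, officer.annual_salary())
```

The officer object also inherits the annual_salary method from Employee without redefining it.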


What is Object Identity?

-An object retains its identity even if some or all of the values of its variables or the definitions of
its methods change over time.
-Object-oriented systems use a unique object identifier to identify each object.
-The Object Database Management Group (ODMG) standardized the Object Query Language (OQL),
which plays a role like that of SQL in relational systems.

What is the Object-Relational Model?

• The object-relational data model extends the relational data model by providing a richer
type system, including complex data types and object orientation.
• SQL needs to be correspondingly extended to deal with the richer type system.
• An ORDBMS provides a convenient migration path for users of relational databases who wish
to use object-oriented features.

What are Conceptual, Logical, and Physical Data Models?

Conceptual Data Model

• Description: A high-level model of the entities and their relationships, built independently
of the choice of DBMS.

Logical Data Model

• Description: Adds more detail to the conceptual model, specifying the attributes of the
entities and the relationships between them, without considering how these will be
physically implemented in the database.
• Purpose: To prepare a detailed blueprint of the database structure that can be used to
design the physical model.

Physical Data Model

• Description: Translates the logical data model into a specific database management
system (DBMS), defining tables, columns, data types, indexes, and constraints.
• Purpose: To implement the actual database structure in the chosen DBMS for efficient
storage, retrieval, and management of data.
Chapter: Relational Model and Relational Algebra

What are Relational Systems in Databases?

• Information is stored as tuples or records in relations or tables
• There is a sound mathematical theory of relations
• Most modern DBMS are based on the relational model

• The relational model covers 3 areas:
  • Data structure
  • Data integrity
  • Data manipulation

Characteristics of relational systems are:
• Simple
• Has all the properties needed to process data with efficiency

What is relation (table) in Database?

Relation/Table: A table in a database represents a relation, which is a set of tuples (rows). Each
table consists of columns (fields) and rows (records).

Define degree of relation

The degree (or arity) of a relation refers to the number of attributes (columns) in a
table, that is, the total number of fields.

[Table: example Employees relation with 5 columns and 3 rows]

The example Employees relation has 5 columns. Therefore, the degree of the Employees relation is 5.

Define cardinality of relation

The cardinality of a relation in a database refers to the number of tuples (rows) present in a
table.

The example Employees relation has 3 rows. Therefore, the cardinality of the Employees relation is 3.
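Degree and cardinality can be computed directly if a relation is modelled as a set of tuples. In this sketch the attribute names and rows are invented; they were chosen only so that the relation has the degree (5) and cardinality (3) stated above.

```python
# Hypothetical Employees relation: the attribute names and rows are made up.
attributes = ("EmployeeID", "Name", "Department", "Salary", "City")
employees = {
    (1, "Asha", "Sales", 50000, "Dhaka"),
    (2, "Ben", "Accounts", 45000, "Khulna"),
    (3, "Chitra", "HR", 48000, "Sylhet"),
}

degree = len(attributes)      # number of attributes (columns)
cardinality = len(employees)  # number of tuples (rows)

print(degree, cardinality)
```

Adding or deleting a row changes the cardinality; only a change to the table definition changes the degree.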

What is a union-compatible relation?

Two relations R and S are union-compatible if they have the same number of columns and
corresponding columns have the same domains.

[Tables: example union-compatible relations R and S, and the result of the union operation on them]

Let R and S be two union-compatible relations. Then R ∪ S is a relation which contains every tuple
that appears in R, in S, or in both (duplicates are removed).

R ∪ S = {x : x ∈ R or x ∈ S}
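The definition can be sketched in Python by modelling a relation as a set of tuples and a domain as a Python type. The function union_compatible, the headers and the rows are assumptions made for illustration.

```python
def union_compatible(header_r, header_s):
    """Same number of columns, and corresponding columns have the same domain.

    A header is a tuple of (column name, domain) pairs; column names may differ.
    """
    return len(header_r) == len(header_s) and all(
        domain_r == domain_s
        for (_, domain_r), (_, domain_s) in zip(header_r, header_s)
    )


header_r = (("id", int), ("name", str))
header_s = (("student_no", int), ("full_name", str))
header_t = (("id", int), ("name", str), ("city", str))  # one column too many

R = {(1, "Asha"), (2, "Ben")}
S = {(2, "Ben"), (3, "Chitra")}

# R ∪ S = {x : x ∈ R or x ∈ S}; a set keeps the shared tuple (2, "Ben") only once.
r_union_s = R | S if union_compatible(header_r, header_s) else None
print(r_union_s)
```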

[Tables: two examples of relations that are not union-compatible]

Define super key, candidate key, primary key and foreign key in a database.

Super Key

A super key is a set of one or more attributes (columns) that can uniquely identify a record (row)
in a table. A super key can have additional attributes that are not necessary for unique
identification.

Example: In a student table, possible super keys could be:

• {StudentID}
• {StudentID, Name}
• {StudentID, Name, Email}

Candidate Key

A candidate key is a minimal super key, meaning it is a super key with no redundant attributes.
In other words, it is the smallest subset of attributes that can uniquely identify a record in a table.

Example: In the student table, possible candidate keys could be:

• {StudentID}
• {Email}

Here, {StudentID, Name} is not a candidate key because Name is redundant (StudentID alone
can uniquely identify a record).

Primary Key

A primary key is a candidate key that is chosen by the database designer to uniquely identify
records in a table. There can be only one primary key per table, and it cannot contain null values.

Example: If we choose {StudentID} as the primary key for the student table, it means StudentID
will uniquely identify each record in that table, and no two students can have the same
StudentID.
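The difference between a super key and a candidate key can be checked mechanically on sample data. The rows and the helper functions below are hypothetical. Note that a key is really a property of the schema: a test on one set of rows can show that an attribute set is not a key, but it cannot prove that it always will be one.

```python
from itertools import combinations

# Hypothetical student rows; two students share the name "Asha".
students = [
    {"StudentID": 1, "Name": "Asha", "Email": "asha@example.com"},
    {"StudentID": 2, "Name": "Ben", "Email": "ben@example.com"},
    {"StudentID": 3, "Name": "Asha", "Email": "asha2@example.com"},
]


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(rows)


def is_candidate_key(rows, attrs):
    """A candidate key is a super key none of whose proper subsets is a super key."""
    if not is_super_key(rows, attrs):
        return False
    return not any(
        is_super_key(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_super_key(students, ("StudentID", "Name")))      # unique, but not minimal
print(is_candidate_key(students, ("StudentID", "Name")))
print(is_candidate_key(students, ("Email",)))
```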
Foreign key:
A foreign key is a field (or collection of fields) in one table that refers to the primary key of
a row in another table. It establishes a link between the two tables, enforcing referential integrity
by ensuring that the value in the foreign key column must match a primary key value in the
referenced table.

[Tables: example Customers table and Orders table, both containing a CustomerID column]

In this example:

• CustomerID in the Customers table is the primary key.
• CustomerID in the Orders table is a foreign key that references the CustomerID in the
Customers table.
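The Customers/Orders link can be sketched as follows, using Python's sqlite3 module. Only the table names and the CustomerID column come from the example above; the other columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
INSERT INTO Customers VALUES (1, 'Asha');
INSERT INTO Orders VALUES (100, 1);  -- valid: customer 1 exists
""")

# An order for a customer that does not exist would be an orphaned record.
try:
    conn.execute("INSERT INTO Orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(orphan_rejected, order_count)
```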

Describe the options to maintain referential integrity.


Referential integrity is a concept in relational databases that ensures the consistency and validity
of data relationships between tables. It requires that a foreign key in one table must always refer
to a valid, existing primary key in another table. This guarantees that relationships between
tables remain consistent and that there are no orphaned records.

Restrict (No Action): Rejects the deletion or update of a referenced primary key while rows in
other tables still refer to it. Inserting a foreign key value that does not exist in the referenced
table's primary key is rejected in the same way.
Cascade: When a record with a referenced primary key is deleted, all rows whose foreign key
refers to it are automatically deleted as well (cascading the deletion).
Set Null: Instead of deleting the referencing rows, their foreign key value is set to null when
the referenced primary key is deleted.
Set Default: The foreign key is set to its default value when the referenced primary key
is deleted.

Triggers: Custom procedural code that executes in response to certain events (INSERT,
UPDATE, DELETE) on a table.
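
The referential actions above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the Reviews table and all sample rows are assumptions made only for this illustration, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless this is set
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID) ON DELETE CASCADE
);
-- Reviews is a hypothetical table, added only to show SET NULL
CREATE TABLE Reviews (
    ReviewID   INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID) ON DELETE SET NULL
);
INSERT INTO Customers VALUES (1, 'Rahim'), (2, 'Karim');
INSERT INTO Orders  VALUES (10, 1), (11, 2);
INSERT INTO Reviews VALUES (20, 1);
""")

# An order that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO Orders VALUES (12, 99)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deleting customer 1 cascades to Orders and sets Reviews.CustomerID to NULL.
conn.execute("DELETE FROM Customers WHERE CustomerID = 1")
orders_left = conn.execute("SELECT OrderID, CustomerID FROM Orders").fetchall()
reviews_left = conn.execute("SELECT ReviewID, CustomerID FROM Reviews").fetchall()
print(insert_rejected, orders_left, reviews_left)
```

Other database systems use the same ON DELETE clauses, but the details (for example whether RESTRICT and NO ACTION differ) vary between products.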
What are Integrity Constraints?

Integrity Constraints are rules applied to ensure the accuracy, consistency, and validity of data
within a database. They enforce the reliability of data by restricting the types of data that can be
inserted, updated, or deleted, thus maintaining the integrity of the database.

Example (CHECK constraint):

CREATE TABLE Accounts (
    AccountID INT PRIMARY KEY,
    Balance DECIMAL(10, 2),
    CHECK (Balance >= 0) );

Example (NOT NULL constraint):

CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    Price DECIMAL(10, 2) NOT NULL );
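
As a quick check of how such constraints behave, the sketch below creates the two tables in an in-memory SQLite database (through Python's sqlite3 module) and records which inserts are rejected. The helper is_rejected and the sample values are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (
    AccountID INT PRIMARY KEY,
    Balance   DECIMAL(10, 2),
    CHECK (Balance >= 0)
);
CREATE TABLE Products (
    ProductID   INT PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    Price       DECIMAL(10, 2) NOT NULL
);
""")

def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "valid account":    is_rejected("INSERT INTO Accounts VALUES (1, 250.00)"),
    "negative balance": is_rejected("INSERT INTO Accounts VALUES (2, -5.00)"),   # CHECK
    "duplicate key":    is_rejected("INSERT INTO Accounts VALUES (1, 10.00)"),   # PRIMARY KEY
    "missing name":     is_rejected("INSERT INTO Products VALUES (1, NULL, 9.99)"),  # NOT NULL
}
print(results)
```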

Design a database indicating the tables (including primary key and foreign key) for the
following scenario:
In a car company, there are many employees: some are in sales, some in accounts, some in
human resources and some in management. The company sells cars to customers.
Customer information, including names, addresses, mobile numbers, etc., is vital. The orders of
each customer are also stored. The payment information, with payment amount, payment type,
payment date, etc., is important.

To design a database for a car company with the given requirements, we'll create several tables
to store information about employees, customers, orders, and payments. Each table will include
primary keys and foreign keys where necessary to establish relationships between the data.

Tables

1. Employees
o EmployeeID (Primary Key)
o Name
o Department (Sales, Accounts, Human Resource, Management)
o Address
o MobileNumber
2. Customers
o CustomerID (Primary Key)
o Name
o Address
o MobileNumber
3. Orders
o OrderID (Primary Key)
o CustomerID (Foreign Key references Customers.CustomerID)
o OrderDate
o TotalAmount
4. OrderDetails
o OrderDetailID (Primary Key)
o OrderID (Foreign Key references Orders.OrderID)
o CarID (Foreign Key references Cars.CarID)
o Quantity
o UnitPrice
5. Payments
o PaymentID (Primary Key)
o OrderID (Foreign Key references Orders.OrderID)
o PaymentAmount
o PaymentType (e.g., Credit Card, Cash, Bank Transfer)
o PaymentDate
6. Cars
o CarID (Primary Key)
o CarModel
o Manufacturer
o Price

Relationships

- Each Order is placed by a Customer, and a customer can place multiple orders.
- Each Order can have multiple OrderDetails, indicating different cars purchased in that
order.
- Each Payment is associated with an Order.
- Each OrderDetail includes details about a specific Car.

What are the six basic operators (fundamental operators) of relational algebra?

Unary operators (applied to a single relation):
Select (σ) [rows]
Project (π) [columns]
Rename (ρ)

Binary operators (applied to a pair of relations):
Union (∪)
Set difference (–)
Cartesian product (×)

Give examples of Select Operation.

Relation r

A   B   C    D
α   α   1    7
α   β   5    7
β   β   12   3
β   β   23   10

σ_{A=B ∧ D>5}(r)

A   B   C    D
α   α   1    7
β   β   23   10

Describe Select Operation.

-Notation: σ_p(r)
-p is called the selection predicate
-Defined as:
σ_p(r) = {t | t ∈ r and p(t)}
-Where p is a formula in which terms are connected by: ∧ (and), ∨ (or), ¬ (not)
-Each term is one of:
<attribute> op <attribute> or <attribute> op <constant>
where op is one of: =, ≠, >, ≥, <, ≤
-Example of selection:
σ_{branch-name="Perryridge"}(account)

Give an example of the Project Operation.

Relation r:

π_{A,C}(r)

Notation:
π_{A1, A2, …, Ak}(r)
where A1, A2, …, Ak are attribute names and r is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that are not
listed.
Duplicate rows are removed from the result, since relations are sets.
E.g. to eliminate the branch-name attribute of account:
π_{account-number, balance}(account)

Relations r, s:

r ∪ s:

Explain Union Operation.

Notation: r ∪ s
Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
For r ∪ s to be valid:
1. r, s must have the same number of attributes
2. The attribute domains must be compatible (e.g., the 2nd column of r deals with the same
type of values as does the 2nd column of s)
E.g. to find all customers with either an account or a loan:
π_{customer-name}(depositor) ∪ π_{customer-name}(borrower)

Explain Set Difference Operation.

r – s:
-Notation: r – s
-Defined as:
r – s = {t | t ∈ r and t ∉ s}
-Set differences must be taken between compatible relations:
-r and s must have the same number of attributes
-attribute domains of r and s must be compatible

Give an example of the Cartesian-Product Operation.

r × s:

Notation: r × s
Defined as:
r × s = {t q | t ∈ r and q ∈ s}
Assume that attributes of r(R) and s(S) are disjoint (that is, R ∩ S = ∅).
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
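
As a rough illustration, the select, project, union, set difference and Cartesian product operators can be sketched in Python by treating a relation as a list of dictionaries (one dictionary per tuple). The function names and all sample data are assumptions made for this sketch; they do not come from any library.

```python
def select(rows, predicate):
    """σ_p(r): keep the tuples for which the predicate holds."""
    return [t for t in rows if predicate(t)]

def project(rows, attrs):
    """π_attrs(r): keep only the listed attributes and drop duplicate tuples."""
    result = []
    for t in rows:
        reduced = {a: t[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result

def union(r, s):
    """r ∪ s: tuples in r or in s (r and s must be union-compatible)."""
    return r + [t for t in s if t not in r]

def difference(r, s):
    """r – s: tuples in r that are not in s."""
    return [t for t in r if t not in s]

def product(r, s):
    """r × s: attribute names of r and s are assumed to be disjoint."""
    return [{**t, **q} for t in r for q in s]

r = [
    {"A": "α", "B": "α", "C": 1,  "D": 7},
    {"A": "α", "B": "β", "C": 5,  "D": 7},
    {"A": "β", "B": "β", "C": 12, "D": 3},
    {"A": "β", "B": "β", "C": 23, "D": 10},
]
borrower  = [{"customer-name": "Jones"}, {"customer-name": "Smith"}]
depositor = [{"customer-name": "Hayes"}, {"customer-name": "Jones"}]

selected  = select(r, lambda t: t["A"] == t["B"] and t["D"] > 5)
projected = project(r, ["A", "B"])
either    = union(borrower, depositor)
only_loan = difference(borrower, depositor)
pairs     = product([{"A": 1}, {"A": 2}], [{"E": "x"}, {"E": "y"}])
print(selected, projected, either, only_loan, pairs, sep="\n")
```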
How does the Rename Operation work?
-Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
-Allows us to refer to a relation by more than one name.
-Example: ρ_x(E)
returns the expression E under the name x
-If a relational-algebra expression E has arity n, then
ρ_{x(A1, A2, …, An)}(E)
returns the result of expression E under the name x, and with the
attributes renamed to A1, A2, …, An.

Banking Example

branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)

Example Queries

Find all loans of over $1200

σ_{amount > 1200}(loan)

Find the loan number for each loan of an amount greater than $1200

π_{loan-number}(σ_{amount > 1200}(loan))

Find the names of all customers who have a loan, an account, or both, from the bank

π_{customer-name}(borrower) ∪ π_{customer-name}(depositor)

Find the names of all customers who have a loan and an account at the bank.

π_{customer-name}(borrower) ∩ π_{customer-name}(depositor)

Find the names of all customers who have a loan at the Perryridge branch.
(LN abbreviates loan-number.)

π_{customer-name}(σ_{branch-name="Perryridge"}
(σ_{borrower.LN = loan.LN}(borrower × loan)))

Find the names of all customers who have a loan at the Perryridge branch but do not have an
account at any branch of the bank.

π_{customer-name}(σ_{branch-name="Perryridge"}
(σ_{borrower.LN = loan.LN}(borrower × loan))) –
π_{customer-name}(depositor)

Find the names of all customers who have a loan at the Perryridge branch (two equivalent
queries).

- Query 1
π_{customer-name}(σ_{branch-name="Perryridge"}(
σ_{borrower.LN = loan.LN}(borrower × loan)))
- Query 2
π_{customer-name}(σ_{loan.LN = borrower.LN}
((σ_{branch-name="Perryridge"}(loan)) × borrower))

Find the largest account balance

-Rename the account relation as d
-The query is:

π_{balance}(account) – π_{account.balance}
(σ_{account.balance < d.balance}(account × ρ_d(account)))
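
The idea behind the largest-balance query (remove every balance that is smaller than some other balance) can be sketched with Python sets. The balance values are made up for the illustration.

```python
# π_balance(account), with illustrative values
balances = {500, 400, 700, 750, 350}

# π_account.balance(σ_{account.balance < d.balance}(account × ρ_d(account))):
# every balance that is smaller than at least one other balance.
not_largest = {a for a in balances for d in balances if a < d}

# The set difference leaves only the balance that no other balance exceeds.
largest = balances - not_largest
print(largest)
```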

Additional Operations

Set intersection
Natural join
Division

Set-Intersection Operation

Notation: r ∩ s
Defined as:
r ∩ s = { t | t ∈ r and t ∈ s }
Assume:
r, s have the same arity
attributes of r and s are compatible
Note: r ∩ s = r – (r – s)

Natural-Join Operation

-Let r and s be relations on schemas R and S respectively.
Then r ⋈ s is a relation on schema R ∪ S obtained as follows:
-Consider each pair of tuples tr from r and ts from s.
-If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result,
where
-t has the same value as tr on R
-t has the same value as ts on S
-Example: R = (A, B, C, D), S = (E, B, D)
-Result schema = (A, B, C, D, E)
-r ⋈ s is defined as:
π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r × s))

Natural Join Operation – Example
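
A minimal sketch of the natural join in Python, with relations represented as lists of dictionaries. The function name natural_join and the sample tuples are assumptions chosen to match the schemas R = (A, B, C, D) and S = (E, B, D).

```python
def natural_join(r, s):
    """r ⋈ s: combine tuples that agree on all attributes common to both schemas."""
    result = []
    for tr in r:
        for ts in s:
            common = set(tr) & set(ts)          # attributes in R ∩ S
            if all(tr[a] == ts[a] for a in common):
                merged = {**tr, **ts}           # tuple over R ∪ S
                if merged not in result:
                    result.append(merged)
    return result

r = [
    {"A": 1, "B": "x", "C": 10, "D": "p"},
    {"A": 2, "B": "y", "C": 20, "D": "q"},
    {"A": 3, "B": "x", "C": 30, "D": "q"},
]
s = [
    {"E": "e1", "B": "x", "D": "p"},
    {"E": "e2", "B": "y", "D": "q"},
    {"E": "e3", "B": "z", "D": "q"},
]

joined = natural_join(r, s)
for row in joined:
    print(row)
```

Only the first two tuples of r find a partner in s with the same B and D values; the third tuple of r is dropped, which is the loss of information that the outer join is designed to avoid.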


Division Operation (r ÷ s)

Suited to queries that include the phrase “for all”.
Let r and s be relations on schemas R and S respectively where
R = (A1, …, Am, B1, …, Bn)
S = (B1, …, Bn)
The result of r ÷ s is a relation on schema
R – S = (A1, …, Am)
A tuple t is in r ÷ s if, for every tuple u in s, the combined tuple tu is in r.

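
A small Python sketch of division, assuming relations are lists of dictionaries. The function name divide, its s_attrs parameter and the student/course data are hypothetical and serve only to show a "for all" query.

```python
def divide(r, s, s_attrs):
    """r ÷ s: tuples over R – S that appear in r together with every tuple of s."""
    r_attrs = [a for a in r[0] if a not in s_attrs]   # assumes r is not empty
    candidates = []
    for t in r:
        head = {a: t[a] for a in r_attrs}
        if head not in candidates:
            candidates.append(head)
    # Keep a candidate only if it is paired with all tuples of s.
    return [h for h in candidates if all({**h, **u} in r for u in s)]

takes = [
    {"student": "Ana", "course": "DB"},
    {"student": "Ana", "course": "OS"},
    {"student": "Bob", "course": "DB"},
]
required = [{"course": "DB"}, {"course": "OS"}]

# "Which students have taken all required courses?"
took_all = divide(takes, required, ["course"])
print(took_all)
```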
Outer Join

An extension of the join operation that avoids loss of information.
Computes the join and then adds tuples from one relation that do not match tuples in the other
relation to the result of the join.
Uses null values:
-null signifies that the value is unknown or does not exist
-All comparisons involving null are (roughly speaking) false by definition.
-Will study precise meaning of comparisons with nulls later
Null Values

It is possible for tuples to have a null value, denoted by null, for some of their attributes
null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null.
Aggregate functions simply ignore null values
For duplicate elimination and grouping, null is treated like any other value, and two nulls are
assumed to be the same
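
These rules can be observed in practice. The sketch below uses Python's sqlite3 module with a made-up loan table in which two amounts are null; it assumes SQLite's behaviour, which follows the rules listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)",
    [("L-1", 1000), ("L-2", None), ("L-3", 2000), ("L-4", None)],
)

# Arithmetic involving null yields null (None in Python).
arithmetic = conn.execute(
    "SELECT amount + 100 FROM loan WHERE loan_number = 'L-2'"
).fetchone()[0]

# COUNT(amount) and SUM(amount) ignore nulls; COUNT(*) counts every row.
count_all, count_amount, total = conn.execute(
    "SELECT COUNT(*), COUNT(amount), SUM(amount) FROM loan"
).fetchone()

# For duplicate elimination the two nulls are treated as the same value.
distinct_amounts = conn.execute("SELECT DISTINCT amount FROM loan").fetchall()

# A comparison with null is never true, so no row is selected here.
comparison_rows = conn.execute(
    "SELECT loan_number FROM loan WHERE amount = NULL"
).fetchall()

print(arithmetic, count_all, count_amount, total, len(distinct_amounts), comparison_rows)
```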
Chapter: SQL command

What is query language?

A query language in a database is a specialized computer language that acts as an interface for
users to ask questions, retrieve specific data, and even manipulate and manage information
within the database.

The most widely used query language for relational databases is SQL (Structured Query
Language).

Procedural Query Language:

- Requires the user to specify how to obtain the desired result by providing a detailed
sequence of operations.
- The user defines the control flow and the exact steps to be followed.
- Example: relational algebra, and SQL's procedural extensions such as PL/SQL, where the
user writes procedures and functions that include loops and conditionals.

Non-Procedural Query Language:

- Requires the user to specify what data is needed without detailing how to obtain it.
- The database management system determines the best way to execute the query.
- Example: Standard SQL, where users write declarative queries like SELECT statements
to specify the desired data.

Differentiate between the DROP, TRUNCATE and DELETE commands.

DROP:

- Purpose: Deletes an entire table, including its structure, indexes, constraints, and data.
- Usage: DROP TABLE table_name;
- Effect: Removes all data and the schema definition of the table. In many systems (e.g.
Oracle, MySQL) it cannot be rolled back.

TRUNCATE:

- Purpose: Removes all rows from a table quickly without logging individual row
deletions.
- Usage: TRUNCATE TABLE table_name;
- Effect: Fast operation; empties the table without affecting its structure. Whether it can be
rolled back depends on the DBMS (not in Oracle or MySQL; SQL Server and PostgreSQL
allow it inside a transaction).

DELETE:

- Purpose: Removes specific rows from a table based on a condition or deletes all rows if
no condition is specified.
- Usage: DELETE FROM table_name WHERE condition;
- Effect: Slow compared to TRUNCATE; deletes rows one by one and logs each deletion
in the transaction log. Can be rolled back (if within a transaction) and allows using
conditions for selective deletions.

SQL Query Structure

1. Basic Form:

SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P

A1, A2, ..., An: Represent attributes (columns) to be selected.
r1, r2, ..., rm: Represent relations (tables) from which to select.
P: A predicate (condition) for filtering rows.

Relational Algebra Equivalent:

π_{A1, A2, ..., An}(σ_P(r1 × r2 × ... × rm))
π_{A1, A2, ..., An}: Projection operation to select specific attributes.
σ_P: Selection operation to filter rows based on predicate P.
r1 × r2 × ... × rm: Cartesian product of relations (tables).

Example SQL Query

1. Example Query:

SELECT branch-name
FROM loan

- This query selects the branch-name attribute from the loan relation.

Relational Algebra Equivalent:

- In relational algebra, the query can be expressed as:

π_{branch-name}(loan)
π_{branch-name}: Projection operation to select the branch-name attribute.

SQL allows duplicates in relations as well as in query results.
To force the elimination of duplicates, insert the keyword distinct after select.
Find the names of all branches in the loan relation, and remove duplicates:
select distinct branch-name
from loan
The keyword all specifies that duplicates are not removed.
select all branch-name
from loan

An asterisk in the select clause denotes “all attributes”:
select *
from loan
The select clause can contain arithmetic expressions involving the operations +, –, *, and /,
operating on constants or attributes of tuples.
The query:
select loan-number, branch-name, amount * 100
from loan
would return a relation which is the same as the loan relation, except that the attribute amount is
multiplied by 100.

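
The effect of distinct, all and arithmetic in the select clause can be checked with a short sketch using Python's sqlite3 module. The column names use underscores (branch_name, loan_number) because a hyphen is not allowed in an unquoted SQL identifier; the rows are sample data assumed for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown",   1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
])

# ALL (the default) keeps duplicates; DISTINCT removes them.
all_branches = [row[0] for row in conn.execute("SELECT ALL branch_name FROM loan")]
distinct_branches = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT branch_name FROM loan")
)

# Arithmetic in the select clause changes the result, not the stored data.
scaled = conn.execute(
    "SELECT loan_number, amount * 100 FROM loan WHERE loan_number = 'L-11'"
).fetchone()

print(len(all_branches), distinct_branches, scaled)
```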
Design a database with table names (using the normalization concept) for the scenario given
below:
In an organization there is the following information:
Emp_id, Emp_Name, Address, Emp_salary, Joining_date, current_position, casual_leave,
earned_leave, training_id, training_date, training_duration, training_received.

Answer:
Draw tables like:
Emp_info(Emp_id, Emp_Name, Address)
Emp_Salary(Emp_id, Month, Year, Salary)
Position_info(Emp_id, position, Joining_date)
Leave_info(Emp_id, leave_id, start_date_leave, end_date_leave)
Training_info(T_id, Emp_id, Training_start, Training_end, completed (Yes/No))

Here is an example of a table:

Customer_info(id, name, address, payment)

Write SQL command to find how many customers have paid more than 175.
Ans:
Select count(*)
From Customer_info A
Where A.payment > 175;

Write SQL command to find the name of the customer from the above table.

Answer:
Select name
From Customer_info;

Write SQL command to show the name and address of the customers who have paid more
than 200.

Select A.Name, A.Address
From Customer_info A
Where A.payment > 200;

Consider the tables:

Bank_info (as A): (B_id, B_Name, B_Address)
Customer_info (as B): (id, Name, Address, Payment, B_id)

Write SQL command to find the list of customer name, address and bank name of those
who have paid more than 190 tk.

Select B.Name, B.Address, A.B_Name
From Bank_info A, Customer_info B
Where A.B_id = B.B_id and B.Payment > 190;

Write SQL Command to select unique Bank id from the customer_info table.

Select distinct B_id
From Customer_info;

Write SQL Command to find Bank id and the number of customers who paid in the bank
from the customer_info table.

Select B_id, count(*)
From Customer_info
Group by B_id;

Write SQL Command to find the bank name where “XYZ” exists in the bank name.

SELECT B_Name
FROM Bank_info
WHERE B_Name LIKE '%XYZ%';

SELECT B_Name
FROM Bank_info
WHERE B_Name LIKE '%_YZ%';

Here '_' matches exactly one character and '%' matches any sequence of zero or more characters.
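
To see the two wildcards at work, the following sketch runs both patterns against a few made-up bank names using Python's sqlite3 module. The column name B_Name is taken from the Bank_info table above; the rows are assumptions. Note that LIKE is case-insensitive for ASCII letters in SQLite by default, which may differ in other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bank_info (B_id INTEGER PRIMARY KEY, B_Name TEXT)")
conn.executemany("INSERT INTO Bank_info (B_Name) VALUES (?)", [
    ("XYZ Bank",), ("Bank AYZ",), ("YZ Corp",), ("Alpha",),
])

def names_like(pattern):
    """Return the bank names matching the LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT B_Name FROM Bank_info WHERE B_Name LIKE ? ORDER BY B_Name", (pattern,)
    )
    return [row[0] for row in rows]

contains_xyz = names_like("%XYZ%")   # 'XYZ' anywhere in the name
any_char_yz  = names_like("%_YZ%")   # some single character followed by 'YZ'
print(contains_xyz, any_char_yz)
```

'YZ Corp' is not matched by the second pattern because '_' requires one character before 'YZ'.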

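Write SQL Command to find the name of the students who study in ICT and have taken
course ICT-203.
(Assuming the tables Student(id, Name, dept_id), Department(id, Name) and
Course(std_id, course_id).)
Select A.Name
From Student A, Department B, Course C
Where A.dept_id = B.id and C.std_id = A.id
and B.Name = 'ICT' and C.course_id = 'ICT-203';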

Write SQL Command to change dept. Name from ICT to IT

Table: dept_info

Id   Name
1    ICT
2    CSE
3    EE

Update dept_info
Set Name = 'IT'
Where id = 1;

Id   Name   Dept_id   Status
1    X      6         0
2    Y      3         0
3    Z      2         0
4    a      5         0

Write SQL Command to edit the status to 1 where the student studies in CSE and has taken
course 102.
UPDATE student
SET status = 1
WHERE dept_id = (SELECT dept_id FROM departments WHERE dept_name = 'CSE')
AND id IN (SELECT student_id FROM enrollments WHERE course_id = 102);

Course table:

Std_id Course_id
1 101
1 102
2 101
2 103
3 104
3 103

How many students have taken each course?

Select count(*), course_id
From course
Group by course_id;

Result:

Count Course
2 101
1 102
2 103
1 104
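
The result table above can be reproduced with a short sketch using Python's sqlite3 module; the table is loaded with the same six rows as the Course table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (std_id INTEGER, course_id INTEGER)")
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    (1, 101), (1, 102), (2, 101), (2, 103), (3, 104), (3, 103),
])

# One output row per course_id, with the number of students in that group.
per_course = conn.execute(
    "SELECT count(*), course_id FROM course GROUP BY course_id ORDER BY course_id"
).fetchall()
print(per_course)
```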

Customer table

id Name Address
1 M Dhk
2 N
3 P Ctg

Select name
From customer
Where Address IS NULL;

To list the customers whose address is present instead:

Select name
From customer
Where Address IS NOT NULL;

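Find the number of students who have NOT taken course ICT-501

(Assuming a student table with one row per student, and a course table with the columns
Std_id and Course_id.)

Select count(*)
From student
Where id not in (Select Std_id From course Where Course_id = 'ICT-501');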
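
One common way to answer a "has NOT taken" question is to exclude the students who did take the course with NOT IN. The sketch below tries this on made-up data with Python's sqlite3 module; the student table and its rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (std_id INTEGER, course_id TEXT);
INSERT INTO student VALUES (1, 'X'), (2, 'Y'), (3, 'Z');
INSERT INTO course  VALUES (1, 'ICT-501'), (1, 'ICT-203'), (2, 'ICT-203');
""")

# Student 1 took ICT-501; student 2 took another course; student 3 took nothing.
not_taken = conn.execute("""
    SELECT count(*)
    FROM student
    WHERE id NOT IN (SELECT std_id FROM course WHERE course_id = 'ICT-501')
""").fetchone()[0]
print(not_taken)
```

Filtering the course table alone with course_id <> 'ICT-501' would count rows of other courses instead, so student 1 would still be counted and student 3 would be missed.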

Write SQL command to create table

Create table T1(
Id int primary key,
Name varchar(30) NOT NULL,
Price int default 0);

Write SQL command to drop a table.

Drop table T1;

How to add a new field?

Alter table T1
Add email varchar(30);

How to delete a column

Alter table T1
Drop column email;

(To change the data type of an existing column, SQL Server uses ALTER COLUMN, while
Oracle and MySQL use MODIFY.)

How to add a constraint:

Alter table T1
Add constraint PK_person
Primary key (id, last_name);

- The command modifies the T1 table.
- It adds a primary key constraint named PK_person.
- The primary key is a composite key made up of the id and last_name columns.
- This means that each combination of id and last_name must be unique in the table T1, and
neither id nor last_name can be null.

Create table T1(
c1 int, c2 int, c3 varchar(30),
primary key (c1, c2));
Here c1 and c2 together form the primary key.

Create table T1(
c1 int, c2 int, c3 varchar(30),
unique (c1, c2));

Create table T1(
Id int NOT NULL unique);

How to drop a constraint

Alter table T1
Drop constraint PK_person;

Change table name

Alter table T1
Rename to T2;

(SQL Server uses a system procedure instead: EXEC sp_rename 'T1', 'T2';)

Remove all data from a table

Truncate table T1;

Create table T1(
c1 int, c2 int,
Check (c1 > 0 AND c1 = c2));

Create table T1(
c1 int primary key,
c2 varchar(30) NOT NULL);

Say T1(Roll, Name, GPA)

Insert into T1
(Roll, Name, GPA)
Values (1, 'X', 5.0);

Multiple value insert:

Insert into T1
(Roll, Name, GPA)
Values (1, 'X', 5.0),
(2, 'Y', 5.0),
(3, 'Z', 5.0);

Delete a row:

Delete from T1
Where Roll = 2;

Say T1(c1,c2,c3)

Create view V(c1,c2)
As select c1,c2
From T1;

Create view V
As select c1,c2
From T1
Where c3 = 6;

Drop view V;

Create index i
On T1(c1,c2);

Create unique index i
On T1(c1,c2);

Drop index i;

Give an example SQL command where the format of the date you are trying to insert, matches
the format of the date column in the table.
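
One possible answer is sketched below with Python's sqlite3 module. The Booking table is hypothetical. SQLite keeps dates as ISO-8601 text ('YYYY-MM-DD'); most other systems have a native DATE type and commonly accept a literal in the same 'YYYY-MM-DD' form, but the accepted formats depend on the DBMS and its settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for the illustration
conn.execute("CREATE TABLE Booking (Booking_ID INTEGER PRIMARY KEY, Travel_Date DATE)")

# The inserted literals use the same YYYY-MM-DD format as the column's stored values.
conn.execute("INSERT INTO Booking VALUES (1, '2024-03-15')")
conn.execute("INSERT INTO Booking VALUES (2, '2024-11-02')")

# Because the format is consistent, range comparisons and date functions work.
march_bookings = conn.execute(
    "SELECT Booking_ID FROM Booking "
    "WHERE Travel_Date BETWEEN '2024-03-01' AND '2024-03-31'"
).fetchall()
travel_year = conn.execute(
    "SELECT strftime('%Y', Travel_Date) FROM Booking WHERE Booking_ID = 2"
).fetchone()[0]
print(march_bookings, travel_year)
```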

Explain with example the difference between "Union" and "Union all" keywords in the
context of SQL.

UNION:

- Description: Combines the result sets of two or more SELECT queries and removes
duplicate rows.

Example:

SELECT name FROM employees
UNION
SELECT name FROM contractors;

This query will return a list of unique names from both the employees and contractors tables,
removing any duplicates.

UNION ALL:

- Description: Combines the result sets of two or more SELECT queries without removing
duplicate rows.

Example:

SELECT name FROM employees
UNION ALL
SELECT name FROM contractors;

This query will return a list of all names from both the employees and contractors tables,
including duplicates.
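
The difference is easy to see on a small data set. The sketch below uses Python's sqlite3 module; the names in the two tables are sample data, with 'Sara' appearing in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees   (name TEXT);
CREATE TABLE contractors (name TEXT);
INSERT INTO employees   VALUES ('Ali'), ('Sara');
INSERT INTO contractors VALUES ('Sara'), ('Tom');
""")

union_names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM employees UNION SELECT name FROM contractors"
))
union_all_names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM employees UNION ALL SELECT name FROM contractors"
))
print(union_names)      # 'Sara' appears once
print(union_all_names)  # 'Sara' appears twice
```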

Consider a database with the following three tables. The columns underlined are primary
keys; these are foreign keys in other tables.
Applicant_Info (Application_ID, Passport_No, Name, Fathers_Name, Address, Mobile_No,
Citizenship, Application_Date)
Booking_Info (Booking_ID, Application_ID, Paid_Amount, Travel_From, Travel_To,
Travel_Date)
Write down the necessary SQL commands for the above tables.
Find the applicant names and mobile numbers of people who are Bangladeshi citizens.
Find the number of people of British citizenship who have applied so far.
Find the number of people of each country who have applied so far.
Find the applicant names and paid amounts of those who have paid more than four (4) Lac taka
in a single booking.
Find the applicant names and paid amounts of those who have paid more than four (4) Lac while
traveling from UK in a single booking.
Edit the Name of a person from Abdul Karimm to Abdul Karim (Application_ID = '102512').
Find the names of applicants whose names start with "George".
Find the number of unique Application_ID values from the Booking_Info table.

Consider the following schema:


worker (name varchar(15),department varchar(20), badge int)
titles ( titleid varchar(20), price float)
Write a stored procedure that will insert a new record in worker table. The information should be
passed as parameters.

Write a stored procedure that will update price of the costliest item in the titles table to double.

Consider a database with the following table. The columns underlined are primary keys.
Applicant_Info (Application_ID, Passport_No, Name, Fathers_Name, Address, Mobile_No,
Citizenship, Application_Date)
i) Write down the necessary SQL commands for the above table.
ii) Write an SQL command to add one record in the Applicant_Info table.
iii) Write an SQL command to delete that one record in the Applicant_Info table.

Consider a university database for the scheduling of classrooms for final exams. This database
could be modeled as a single entity set exam, with attributes course-name, section-number,
room-number and time. Alternatively, one or more additional entity sets could be defined, along
with relationship sets to replace some of the attributes of the exam entity set, as

- course with attributes name, department and c-number
- section with attributes s-number and enrollment, and dependent as a weak entity set
on course
- room with attributes r-number, capacity and building

Show the E-R diagram illustrating the use of all three additional entity sets listed.

Consider the following schema:

Suppliers(sid: integer, sname: string, saddr: string)
Parts(pid: integer, pname: string, pcolor: string)
Subparts(spid: integer, pid: integer)
Catalog(sid: integer, pid: integer, cost: real)

The key fields are underlined, and the domain of each field is listed after the field
name. Therefore sid is the key for Suppliers, pid is the key for Parts, and sid and pid
together form the key for Catalog. The Catalog relation lists the prices charged for
parts by Suppliers. The Subparts relation lists for each part, its subparts if any.

Figure: E-R diagram (supplier, catalog, parts and subparts; catalog carries the attribute cost)

Write SQL queries that answer the questions below:


Find the sids of suppliers who supply some red or green part.
Find the sids of suppliers who supply some red part and some green part.
Find the sids of suppliers who supply every part.
Find the sids of suppliers who supply every red part.
Find the pids of parts supplied by at least two different suppliers.
Find the pids of the most expensive parts supplied by suppliers named Lucent.
Find how many suppliers supply some red part.
List the names for all the subparts for a computer.
List the names for all the subparts of each part in the database.
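
As a hedged sketch of how some of these questions can be answered, the code below loads made-up rows into the Suppliers, Parts and Catalog tables (using Python's sqlite3 module) and runs four of the queries. The sample data and the Python variable names are assumptions; the "every part" query is the usual double NOT EXISTS formulation of relational division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (sid INTEGER PRIMARY KEY, sname TEXT, saddr TEXT);
CREATE TABLE Parts     (pid INTEGER PRIMARY KEY, pname TEXT, pcolor TEXT);
CREATE TABLE Catalog   (sid INTEGER REFERENCES Suppliers(sid),
                        pid INTEGER REFERENCES Parts(pid),
                        cost REAL,
                        PRIMARY KEY (sid, pid));
INSERT INTO Suppliers VALUES (1, 'Acme', 'Dhaka'), (2, 'Lucent', 'Khulna'), (3, 'Bolt', 'Sylhet');
INSERT INTO Parts     VALUES (10, 'nut', 'red'), (11, 'bolt', 'green'), (12, 'gear', 'blue');
INSERT INTO Catalog   VALUES (1, 10, 5.0), (1, 11, 6.0), (1, 12, 7.0), (2, 10, 4.0), (3, 11, 3.0);
""")

def column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Suppliers who supply some red or green part.
red_or_green = column("""
    SELECT DISTINCT C.sid FROM Catalog C JOIN Parts P ON C.pid = P.pid
    WHERE P.pcolor IN ('red', 'green') ORDER BY C.sid""")

# Suppliers who supply some red part and some green part.
red_and_green = column("""
    SELECT C.sid FROM Catalog C JOIN Parts P ON C.pid = P.pid WHERE P.pcolor = 'red'
    INTERSECT
    SELECT C.sid FROM Catalog C JOIN Parts P ON C.pid = P.pid WHERE P.pcolor = 'green'""")

# Suppliers who supply every part: there is no part that the supplier lacks.
every_part = column("""
    SELECT S.sid FROM Suppliers S
    WHERE NOT EXISTS (SELECT 1 FROM Parts P
                      WHERE NOT EXISTS (SELECT 1 FROM Catalog C
                                        WHERE C.sid = S.sid AND C.pid = P.pid))""")

# Parts supplied by at least two different suppliers.
multi_supplier_parts = column("""
    SELECT pid FROM Catalog GROUP BY pid
    HAVING COUNT(DISTINCT sid) >= 2 ORDER BY pid""")

print(red_or_green, red_and_green, every_part, multi_supplier_parts)
```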

Consider the following relations:

Student(snum: integer, sname: string, major: string, level: string, age: integer)
Class(name: string, meets_at: time, room: string, fid: integer)
Enrolled(snum: integer, cname: string)
Faculty(fid: integer, fname: string, deptid: integer)

The meaning of these relations is straightforward; for example, Enrolled has one record
per student-class pair such that the student is enrolled in the class.

Write the SQL statements required to create these relations, including appropriate versions of all
primary and foreign key integrity constraints.

Express integrity constraints in SQL for the following constraint: Two classes cannot meet
in the same room at the same time.
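
One way to express this constraint, sketched under the assumption that a meeting time is stored as a single value in meets_at, is a UNIQUE constraint over (meets_at, room). The code below creates the four relations in SQLite through Python's sqlite3 module; the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Faculty (fid INTEGER PRIMARY KEY, fname TEXT, deptid INTEGER);
CREATE TABLE Student (snum INTEGER PRIMARY KEY, sname TEXT, major TEXT,
                      level TEXT, age INTEGER);
CREATE TABLE Class (
    name     TEXT PRIMARY KEY,
    meets_at TEXT,
    room     TEXT,
    fid      INTEGER REFERENCES Faculty(fid),
    UNIQUE (meets_at, room)      -- no two classes in the same room at the same time
);
CREATE TABLE Enrolled (
    snum  INTEGER REFERENCES Student(snum),
    cname TEXT    REFERENCES Class(name),
    PRIMARY KEY (snum, cname)
);
INSERT INTO Faculty VALUES (1, 'Dr. A', 10);
INSERT INTO Class   VALUES ('Database', 'MWF 10:00', 'R128', 1);
""")

def try_insert(sql):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

clash    = try_insert("INSERT INTO Class VALUES ('Networks', 'MWF 10:00', 'R128', 1)")
no_clash = try_insert("INSERT INTO Class VALUES ('Compilers', 'MWF 11:00', 'R128', 1)")
print(clash, no_clash)
```

A UNIQUE constraint does not stop rows in which meets_at or room is null, so the columns may also need NOT NULL in a real schema.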

Write a trigger that will add a record in the Employee_History table when a record is removed
from the Employee_Info table.
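
A possible shape of such a trigger is sketched below in SQLite syntax, run through Python's sqlite3 module. The columns of Employee_Info and Employee_History and the trigger name are hypothetical, since they are not given here; trigger syntax also differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical column layouts for the two tables
CREATE TABLE Employee_Info    (Emp_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee_History (Emp_ID INTEGER, Name TEXT, Removed_On TEXT);

-- Copy each deleted row into the history table.
CREATE TRIGGER trg_employee_removed
AFTER DELETE ON Employee_Info
FOR EACH ROW
BEGIN
    INSERT INTO Employee_History VALUES (OLD.Emp_ID, OLD.Name, date('now'));
END;

INSERT INTO Employee_Info VALUES (1, 'Rahim'), (2, 'Karim');
""")

conn.execute("DELETE FROM Employee_Info WHERE Emp_ID = 1")
history   = conn.execute("SELECT Emp_ID, Name FROM Employee_History").fetchall()
remaining = conn.execute("SELECT Emp_ID, Name FROM Employee_Info").fetchall()
print(history, remaining)
```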

Consider two tables: Customer_Info(CustomerID, Name, Mobile_No) and
Customer_Backup(CustomerID, Name, Mobile_No).
Write a procedure with an input parameter of CustomerID that will search the name and mobile
number of that customer from the Customer_Info table and add that record to the
Customer_Backup table.

Write SQL commands to:

(i) create a database named "PGDB".
(ii) remove a database named "PGDB".
(iii) create a table named "People" with columns of Name, Age, District & Mobile_No.
(iv) ensure that the age value in the already created "People" table is greater than 20.
(v) ensure that the District value in the already created "People" table is by default "Dhaka".
(vi) construct an index in the Mobile_No column of "People" table.
(vii) construct a view named "View People" using the Name and Age info from "People" table.
(viii) remove all data from "People" table without removing the structure of the table.

Chapter : Database Design and Normalization

Functional Dependencies

Functional Dependencies in a database describe a relationship between two sets of attributes in
a relation, where one set uniquely determines another set. They are fundamental in ensuring data
integrity and are used in normalization processes to reduce redundancy.

Example

In a relation R with attributes A and B, if A uniquely determines B (denoted as A → B), then for
any two tuples in R, if the A values are the same, the B values must also be the same.

For instance, in a table of students:

- StudentID → StudentName: each StudentID uniquely determines a StudentName.
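
Whether a functional dependency holds in a given set of rows can be tested mechanically: group the rows by the left-hand side and check that each group has a single right-hand side value. The following Python sketch does this; the function name holds and the sample rows are assumptions for the illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs → rhs holds in rows."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # The first value seen for a key is remembered; a later, different value
        # for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True

students = [
    {"StudentID": 1, "StudentName": "Ana", "Course": "DB"},
    {"StudentID": 1, "StudentName": "Ana", "Course": "OS"},
    {"StudentID": 2, "StudentName": "Ana", "Course": "DB"},
]

id_determines_name   = holds(students, ["StudentID"], ["StudentName"])
name_determines_id   = holds(students, ["StudentName"], ["StudentID"])
id_determines_course = holds(students, ["StudentID"], ["Course"])
print(id_determines_name, name_determines_id, id_determines_course)
```

Such a check only shows that a dependency is not violated by the current data; whether it is a rule of the application is a design decision.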

What is Normalization ?

Normalization is the process of dividing large tables into smaller, related tables to organize data,
minimize redundancy, and improve data integrity.

What are the advantages of Normalization?

Minimized Data Redundancy: Reduces duplication of data.
Improved Data Integrity: Ensures data consistency and accuracy.
Efficient Data Management: Simplifies maintenance and updates.
Enhanced Query Performance: Smaller tables can speed up updates and many lookups, although
queries that must join several tables may become slower.
Logical Data Structure: Facilitates better understanding and organization of data.
Avoidance of Anomalies: Prevents insertion, update, and deletion anomalies.

Normal form   Description
1NF           A relation is in 1NF if it contains only atomic values.
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF          A stronger definition of 3NF, known as Boyce-Codd Normal Form: the left-hand
              side of every non-trivial functional dependency is a super key.

Give example of normalization.

1st NF:

The First Normal Form (1NF) in a database ensures that each table column contains atomic,
indivisible values, and each column contains values of a single type. Additionally, each table
must have a unique identifier or primary key.

-each field in a table contains a single (atomic) piece of information
-each record needs to be unique

Table before 1NF:

Course        Content
Programming   C, C++, Java
Web           HTML, PHP, ASP

The same table in 1NF:

Course        Content
Programming   C
Programming   C++
Programming   Java
Web           HTML
Web           PHP
Web           ASP

2NF:
The Second Normal Form (2NF) in a database builds on 1NF by ensuring that all non-key
attributes are fully functionally dependent on the entire primary key, eliminating partial
dependency on any subset of the primary key.

Prime attribute: an attribute that is part of some candidate key.

Non-prime attribute: an attribute that is NOT part of any candidate key.

Every non-prime attribute should be fully functionally dependent on the whole key.
(If X → A holds, there should NOT be any proper subset Y of X for which Y → A also holds.)

Conditions of 2NF:

-It should be in 1NF.
-It should NOT have any partial dependency.

Table Name: student_project

Std_id Project_id Stu_name Project_name

Here the key is (Std_id, Project_id), but Stu_name depends only on Std_id and Project_name
depends only on Project_id. Both are partial dependencies.

Now in 2NF:

Student:
Std_id Stu_name

Project:
Project_id Project_name

Student_project:
Std_id Project_id

3NF:

- The first condition for a table to be in Third Normal Form is that the table is in
Second Normal Form.

- No non-key attribute is transitively dependent on the primary key.

- Third Normal Form reduces data duplication and helps to achieve
data integrity.

Vendor table

id Name Acc_no Bank_code_no Bank

id → Name, Acc_no, Bank_code_no

i) In this example, Name, Acc_no and Bank_code_no are functionally
dependent on id.

ii) Bank is functionally dependent on Bank_code_no:

Bank_code_no → Bank

So Bank depends on id only transitively (id → Bank_code_no → Bank).

So in 3 NF:

Table: Vendor

id Name Acc_no Bank_code_no


Table: Bank

Bank_code_no Bank

Another example of 3NF:

Before applying 3NF, Table: Student

Stu_id Stu_name city Zip_code

Here city depends on Zip_code, which is not a key (Stu_id → Zip_code → city).

After applying 3NF:

Table: Student

Stu_id Stu_name Zip_code

Table: Zip_code
Zip_code city
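
The decomposition can be illustrated in Python: split the rows of the original Student table into a student part and a zip-code part, then join them again to confirm that no information is lost. The sample rows are invented for the sketch.

```python
students = [
    {"Stu_id": 1, "Stu_name": "Ana", "city": "Dhaka",      "Zip_code": "1205"},
    {"Stu_id": 2, "Stu_name": "Bob", "city": "Dhaka",      "Zip_code": "1205"},
    {"Stu_id": 3, "Stu_name": "Cy",  "city": "Chittagong", "Zip_code": "4000"},
]

# Table Student: the city column is removed, Zip_code stays as a foreign key.
student_table = [
    {"Stu_id": t["Stu_id"], "Stu_name": t["Stu_name"], "Zip_code": t["Zip_code"]}
    for t in students
]

# Table Zip_code: each zip code is stored once with its city (Zip_code → city).
zip_table = {t["Zip_code"]: t["city"] for t in students}

# Joining the two tables on Zip_code gives back the original rows.
rejoined = [{**t, "city": zip_table[t["Zip_code"]]} for t in student_table]
print(len(zip_table), rejoined == students)
```

The city 'Dhaka' is now stored once instead of once per student, which is the redundancy that 3NF removes.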

With appropriate example, briefly discuss the second and third normalization form for a
relational database system.

Normalization is a process used in database design to minimize redundancy and dependency by
organizing the fields and tables of a database. The goal is to divide large tables into smaller, more
manageable ones without losing any data. Let's discuss the Second Normal Form (2NF) and Third Normal
Form (3NF).

Second Normal Form (2NF)

A relation is in 2NF if:

1. It is in First Normal Form (1NF).
2. All non-key attributes are fully functionally dependent on the primary key.

Example: Consider a table that records the courses taken by students:

Here, the primary key is a composite key consisting of StudentID and CourseID. The attribute
CourseName depends only on CourseID and not on StudentID. Thus, it violates 2NF as not all
non-key attributes are fully functionally dependent on the entire primary key.

To convert this to 2NF, we split it into two tables:

Students

Now, each non-key attribute is fully functionally dependent on the primary key of its table.

Third Normal Form (3NF)

A relation is in 3NF if:

1. It is in Second Normal Form (2NF).
2. There are no transitive dependencies (i.e., non-key attributes are not dependent on other
non-key attributes).

Example: Consider the Courses table from the previous example:

Here, InstructorOffice is dependent on Instructor, which is not part of the primary key.
This indicates a transitive dependency and violates 3NF.

To convert this to 3NF, we create a new table for instructors:

Courses:
Now, there are no transitive dependencies, and the table is in 3NF.

Normalization helps in organizing the database efficiently, ensuring data integrity and reducing
redundancy.
Chapter: File Organization and Indexing

Write the basic concepts of an index.

Indexing mechanisms are used to speed up access to desired data.
- E.g., author catalog in a library
Search key: an attribute or set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form (search-key, pointer).
Index files are typically much smaller than the original file.
Two basic kinds of indices:
- Ordered indices: search keys are stored in sorted order
- Hash indices: search keys are distributed uniformly across “buckets” using a
“hash function”.

What are Index Evaluation Metrics?

Indexing techniques are evaluated on the basis of:

-Access types supported efficiently, e.g.
- finding records with a specified attribute value
- finding records whose attribute value falls in a specified range
-Access time
-Insertion time
-Deletion time
-Space overhead

What are Ordered Indices?

-In an ordered index, index entries are stored sorted on the search key value. E.g., author
catalog in a library.
-Primary index: in a sequentially ordered file, the index whose search key specifies the
sequential order of the file.
- Also called clustering index
- The search key of a primary index is usually but not necessarily the primary key.
-Secondary index: an index whose search key specifies an order different from the sequential
order of the file. Also called non-clustering index.
-Index-sequential file: ordered sequential file with a primary index.

What are Dense Index Files?

-Dense index: an index record appears for every search-key value in the file.

What are Sparse Index Files?

-Sparse index: contains index records for only some search-key values.
- Applicable when records are sequentially ordered on the search key
-To locate a record with search-key value K we:
- Find the index record with the largest search-key value ≤ K
- Search the file sequentially starting at the record to which the index record points

What is difference between dense and sparse index file?

-Sparse indices requires less space and less maintenance overhead for insertions and deletions.
-Sparse indices are generally slower than dense index for locating records.
-There is a tradeoff between access time and space overhead
-A good compromise is to have a sparse index with one index entry per block.
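
The sparse-index lookup can be sketched in a few lines of Python. This is only an illustration under assumptions: the account numbers, the block contents and the names BLOCKS, SPARSE_INDEX and lookup are made up for the example, and the "file" is an in-memory list rather than disk blocks. The index holds one entry per block, as in the compromise mentioned above.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted on the search key, grouped into blocks.
BLOCKS = [
    [("A-101", 500), ("A-110", 600)],
    [("A-215", 700), ("A-217", 750)],
    [("A-222", 700), ("A-305", 350)],
]

# Sparse index: one (first search key in block, block number) entry per block.
SPARSE_INDEX = [(block[0][0], i) for i, block in enumerate(BLOCKS)]
INDEX_KEYS = [key for key, _ in SPARSE_INDEX]


def lookup(key):
    """Return the record with the given search key, or None if it is absent."""
    # Find the index entry with the largest search-key value not exceeding key.
    pos = bisect_right(INDEX_KEYS, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    block_no = SPARSE_INDEX[pos][1]
    # Scan sequentially from the record the index entry points to.
    for record in BLOCKS[block_no]:
        if record[0] == key:
            return record
    return None
```

A dense index would instead hold one entry for every search-key value, so the final sequential scan would not be needed, at the cost of a larger index.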

What is Multilevel Index?


If primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, keep primary index on disk and construct a
sparse index on that primary index.
 outer index – a sparse index of primary index
 inner index – the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be created,
and so on.
Indices at all levels must be updated on insertion or deletion from the file.

How does single-level index insertion occur?

 • Perform a lookup using the search-key value appearing in the record to be inserted.
 • Dense indices – if the search-key value does not appear in the index, insert it.
 • Sparse indices – if the index stores an entry for each block of the file, no change
needs to be made to the index unless a new block is created. In this case, the first
search-key value appearing in the new block is inserted into the index.

How does deletion from an index take place?

If the deleted record was the only record in the file with its particular search-key value, the
search-key value is also deleted from the index.
Single-level index deletion:
 Dense indices – deletion of search-key is similar to file record deletion.
 Sparse indices – if an entry for the search key exists in the index, it is deleted by
replacing the entry in the index with the next search-key value in the file (in
search-key order). If the next search-key value already has an index entry, the
entry is deleted instead of being replaced.
Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level
algorithms

What do you understand by primary and secondary index? Explain with example.

Frequently, one wants to find all the records whose values in a certain field (which is not the
search-key of the primary index) satisfy some condition.
 Example 1: In the account database stored sequentially by account number, we
may want to find all accounts in a particular branch
 Example 2: We want to find all accounts with a specified balance or range of
balances
We can have a secondary index with an index record for each search-key value; index record
points to a bucket that contains pointers to all the actual records with that particular search-key
value.
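
A secondary index with buckets can be sketched as follows. The account records and the names ACCOUNTS, build_secondary_index, accounts_with_balance and accounts_in_balance_range are assumptions made for illustration; list positions stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical account file, stored in account-number order (the primary order).
ACCOUNTS = [
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750),
    ("A-222", "Redwood", 700),
]


def build_secondary_index(records, field):
    """Map each value of the chosen field to a bucket of record positions."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index


balance_index = build_secondary_index(ACCOUNTS, field=2)


def accounts_with_balance(balance):
    """Follow the bucket pointers to fetch every matching record."""
    return [ACCOUNTS[pos] for pos in balance_index.get(balance, [])]


def accounts_in_balance_range(low, high):
    """Range query: visit the index keys in sorted order."""
    hits = []
    for balance in sorted(balance_index):
        if low <= balance <= high:
            hits.extend(ACCOUNTS[pos] for pos in balance_index[balance])
    return hits
```

Each index entry here covers every record with that balance, which is why a secondary index has to be dense: the file is not sorted on balance, so nothing can be found by scanning forward from a nearby entry.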

Secondary Index on balance field of account


-Secondary indices have to be dense.
-When a file is modified, every index on the file must be updated. Updating indices imposes
overhead on database modification.
-Sequential scan using primary index is efficient, but a sequential scan using a secondary index is
expensive
 each record access may fetch a new block from disk

What are B+-Tree Index Files?


B+-tree indices are an alternative to indexed-sequential files.
-Disadvantage of indexed-sequential files: performance degrades as file grows. Periodic
reorganization of entire index file is required.
-Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in
the face of insertions and deletions. Reorganization of entire file is not required to maintain
performance.
-Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
-Advantages of B+-trees outweigh disadvantages, and they are used extensively.
A B+-tree is a rooted tree satisfying the following properties:

-All paths from root to leaf are of the same length.
-Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
-A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
-Special cases:
 • If the root is not a leaf, it has at least 2 children.
 • If the root is a leaf (that is, there are no other nodes in the tree), it can have
between 0 and (n–1) values.

B+-Tree Node Structure

A typical node has the layout P1, K1, P2, K2, . . . , Pn–1, Kn–1, Pn.
 • Ki are the search-key values
 • Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of
records (for leaf nodes).
The search-keys in a node are ordered:
K1 < K2 < K3 < . . . < Kn–1
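
The occupancy rules and a root-to-leaf search can be sketched as below. This is a minimal illustration, not a full B+-tree: the Node class, the hand-built tree and the record labels r1 to r4 are assumptions, and insertion, deletion and the leaf-to-leaf links are left out.

```python
from bisect import bisect_right
from math import ceil


def occupancy_bounds(n):
    """Minimum and maximum occupancy for a B+-tree with n pointers per node."""
    return {
        "internal_children": (ceil(n / 2), n),
        "leaf_values": (ceil((n - 1) / 2), n - 1),
    }


class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys          # sorted search keys K1 < K2 < ...
        self.children = children  # child pointers (non-leaf nodes only)
        self.records = records    # record pointers (leaf nodes only)

    @property
    def is_leaf(self):
        return self.children is None


def search(node, key):
    """Follow one root-to-leaf path and return the record pointer, or None."""
    while not node.is_leaf:
        # children[i] covers keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None


# A tiny hand-built tree with n = 3 (hypothetical data).
leaf1 = Node(["Brighton", "Downtown"], records=["r1", "r2"])
leaf2 = Node(["Mianus", "Perryridge"], records=["r3", "r4"])
root = Node(["Mianus"], children=[leaf1, leaf2])
```

Because every path from root to leaf has the same length, each lookup visits the same number of nodes regardless of the key.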
Advantages of B+ Tree Index in Database

1. Balanced Structure:
o Ensures all leaf nodes are at the same level.
o Provides uniform access time for all data.
2. Efficient Searching:
o Binary search within nodes.
o Reduces the number of disk reads.
3. Range Queries:
o Supports efficient range queries.
o Sequential access to leaf nodes via linked list pointers.
4. Dynamic Insertions and Deletions:
o Automatically balances itself during insertions and deletions.
o Minimizes the need for reorganization.
5. Non-Redundant Internal Nodes:
o Only stores keys, not actual data.
o Reduces redundancy and saves space.
6. High Fan-Out:
o Minimizes tree height.
o Reduces I/O operations due to fewer levels to traverse.
7. Efficient Space Utilization:
o Node splitting and merging keep nodes densely packed.
o Ensures effective use of storage.

Disadvantages of B+ Tree Index in Database

1. Complex Implementation:
o More complex than simpler structures like hash tables or binary trees.
o Requires careful management of node splits and merges.
2. Space Overhead:
o Additional storage required for pointers.
o Internal and leaf nodes need to store links.
3. Write-Heavy Workloads:
o Insertions and deletions can cause node splits and merges.
o Performance may degrade under heavy write operations.
4. Maintenance Cost:
o Regular rebalancing needed to maintain structure.
o More CPU and memory overhead during maintenance operations.
5. Not Always Optimal for Single Key Lookups:
o May be slower than hash indexes for single key lookups.
o More suitable for range queries and ordered datasets.

What is Hashing?

-A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
-In a hash file organization we obtain the bucket of a record directly from its search-key value
using a hash function.
-Hash function h is a function from the set of all search-key values K to the set of all bucket
addresses B.
-Hash function is used to locate records for access, insertion as well as deletion.
-Records with different search-key values may be mapped to the same bucket; thus entire bucket
has to be searched sequentially to locate a record.

Give Example of Hash File Organization.


Hash file organization of the account file, using branch-name as the key:
-There are 10 buckets.
-The ith letter of the alphabet is assumed to be represented by the integer i.
-The hash function returns the sum of the representations of the characters modulo 10.
 E.g. h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3

What are Hash Functions?

The worst hash function maps all search-key values to the same bucket; this makes access time
proportional to the number of search-key values in the file.
An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key
values from the set of all possible values.
Typical hash functions perform computation on the internal binary representation of the
search key.
 • For example, for a string search key, the binary representations of all the
characters in the string could be added and the sum modulo the number of buckets
could be returned.
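
The branch-name hash function from the account-file example (10 buckets, each letter counted as its position in the alphabet) can be written out as a short sketch. The function name h follows the notes; ignoring spaces and letter case is an assumption needed to make "Round Hill" work.

```python
NUM_BUCKETS = 10


def h(branch_name, num_buckets=NUM_BUCKETS):
    """Sum the alphabet positions of the letters, modulo the bucket count."""
    total = 0
    for ch in branch_name.lower():
        if "a" <= ch <= "z":                 # spaces and other characters are skipped
            total += ord(ch) - ord("a") + 1  # 'a' -> 1, ..., 'z' -> 26
    return total % num_buckets
```

With these assumptions the function gives h("Perryridge") = 5, h("Round Hill") = 3 and h("Brighton") = 3, matching the values in the example. Round Hill and Brighton land in the same bucket, which is exactly the kind of collision the bucket has to absorb.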

How to Handle Bucket Overflows?

Bucket overflow can occur because of
 • Insufficient buckets
 • Skew in distribution of records. This can occur due to two reasons:
   – multiple records have the same search-key value
   – the chosen hash function produces a non-uniform distribution of key values
Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is
handled by using overflow buckets.
Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list.
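
Overflow chaining can be sketched with a small class. The Bucket class, its capacity parameter and the record layout (search key first) are assumptions for illustration; a real system would chain disk blocks rather than Python objects.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain, if any

    def insert(self, record):
        bucket = self
        # Walk the chain until a bucket with free space is found.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.records.append(record)

    def find(self, key):
        # The whole chain may have to be searched sequentially.
        bucket = self
        while bucket is not None:
            for record in bucket.records:
                if record[0] == key:
                    return record
            bucket = bucket.overflow
        return None

    def chain_length(self):
        length, bucket = 1, self.overflow
        while bucket is not None:
            length, bucket = length + 1, bucket.overflow
        return length
```

The longer the chain grows, the closer a lookup gets to a sequential scan, which is why skew and too few buckets hurt performance.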

What are Hash Indices?
Hashing can be used not only for file organization, but also for index-structure creation.
A hash index organizes the search keys, with their associated record pointers, into a hash file
structure.
Describe hash file organization with example.

In a hash file organization, the location of a record is computed from its search key instead of
being found by searching, much like a phonebook arranged by hashed names.

1. Hash Function: A hash function turns a search key (like an ID) into a bucket address.
2. Storage: Each record is stored in the bucket given by its hash value.
3. Retrieval: Hashing the search key again leads directly to the bucket, which is then
searched for the record.

Lookups are fast, but collisions (several records mapping to one bucket) can slow them down.

What are the advantages and disadvantages of hash functions used for indexing?
Advantages of Hash Function for Indexing:
Fast Retrieval: Direct access to records via search key enables quicker lookups compared to
traditional methods.
Simple Implementation: Easier to understand and implement than complex indexing
structures.
Dynamic Operations: Efficient handling of insertions and deletions due to hash value-based
record placement.
Fast Access: Provides constant time complexity (O(1)) for search, insert, and delete
operations on average.
Efficient Space Utilization: More space-efficient if the hash function evenly distributes
entries.
Simplicity: Straightforward and easy to manage implementation.
Uniform Distribution: Good hash functions distribute keys uniformly, reducing collisions
and enhancing efficiency.

Disadvantages of Hash Function for Indexing:

Collisions:

 Multiple keys hashing to the same index require extra handling (e.g., chaining, open
addressing), potentially degrading performance.

Fixed Size:

 The hash table size must be predefined and can be difficult to adjust dynamically.

Poor Performance in Worst Case:

 With many collisions, time complexity can degrade to O(n).

No Range Queries:

 Hash indexing does not efficiently support range queries.

Dependency on Hash Function:

 Performance depends on the hash function's quality; poor hash functions can cause
clustering and performance issues.

No Order:

 Hashing does not maintain data order, requiring additional techniques for ordered
retrieval.

Non-Key Value Search:

 Searching for values in non-key fields is less efficient compared to indexed search trees.
Chapter: Transaction management

What is a transaction? Describe in brief the ACID properties of a database system.

A transaction is a single unit of program execution that accesses and possibly updates various
data items while maintaining data consistency and integrity. Its changes are committed if every
operation succeeds and rolled back if any operation fails (the ACID properties).

-A transaction must see a consistent database when it starts executing.
-During transaction execution the database may be temporarily inconsistent.
-When the transaction is committed the database must be consistent.

Two major issues to deal with:

i) failures of various kinds, such as hardware failures and system crashes
ii) concurrent execution of multiple transactions

ACID Properties

1. Atomicity (All or nothing)
o Definition: Atomicity ensures that either all operations within a transaction are
completed successfully or none of them take effect. If any operation fails, the
entire transaction is aborted, and the database is left unchanged.
o Example: In a banking system, transferring money from one account to another
involves debiting one account and crediting another. Atomicity ensures that either
both operations are completed, or neither is.
2. Consistency (Valid state to valid state)
o Definition: Consistency ensures that a transaction brings the database from one
valid state to another valid state, maintaining the database's predefined rules and
constraints.
o Example: In a university database, if a new student is added, the database should
maintain all integrity constraints such as unique student IDs and valid course
registrations.
3. Isolation (Transactions do not interfere)
o Definition: Isolation ensures that transactions are executed in isolation from each
other. Concurrent transactions should not interfere with each other, and the
intermediate state of a transaction should not be visible to other transactions.
o Example: If two transactions are updating the same set of rows in a table,
isolation ensures that one transaction’s updates do not affect the other transaction
until the first one is complete.
4. Durability (Changes are permanent)
o Definition: Durability ensures that once a transaction has been committed, its
changes are permanent, even in the event of a system failure.
o Example: After transferring money between accounts, the changes should persist
even if the system crashes immediately after the transaction is committed.
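
The money-transfer example can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the account table, the CHECK constraint that forbids negative balances and the function names are made up for the illustration. The credit is deliberately executed first, so that a failing debit has to undo an update that already succeeded.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move amount between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; the credit was rolled back too


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()
```

Calling transfer(conn, "A", "B", 30) succeeds and changes both balances; calling transfer(conn, "A", "B", 500) fails on the debit, and the credit to B is rolled back with it, so the balances are unchanged.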
Describe different transaction states with necessary transition diagram.

Transaction States

1. Active
o The initial state when the transaction begins. Operations are being executed.
2. Partially Committed
o After the final operation has been executed but before the transaction is
committed. All changes are temporary until successfully committed.
3. Committed
o The state after the transaction has been successfully completed and all changes are
permanently saved to the database.
4. Failed
o The state when the transaction cannot proceed due to a failure, such as a
constraint violation or system crash.
5. Aborted
o The state after the transaction has been rolled back, and the database is restored to
its state prior to the start of the transaction.
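
In place of a diagram, the transitions between the five states can be sketched as a small state machine. The State enum, the TRANSITIONS table and the Transaction class are illustrative names, not part of any database API.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Allowed transitions between the five states.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),  # terminal state
    State.ABORTED: set(),    # terminal state; a restart begins a new transaction
}


class Transaction:
    def __init__(self):
        self.state = State.ACTIVE

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
```

A successful transaction goes active, partially committed, committed; an unsuccessful one reaches failed from either of the first two states and then aborted.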
Describe the shadow database scheme for implementing database atomicity and durability.

Shadow Database Scheme

The shadow database scheme is a technique to ensure atomicity and durability in databases. It
involves maintaining two copies of the database: the current database and the shadow (backup)
database.

1. Current Database: The working copy where all transactions are initially applied.
2. Shadow Database: An unchanged copy of the database representing the last consistent
state.

Steps

1. Begin Transaction:
o Operations are applied to the current database.
2. Commit Transaction:
o Before committing, the changes are written to a new location in the storage.
o If the commit is successful, the shadow database pointer is updated to point to the
new location.
3. Rollback Transaction:
o If a transaction fails, the system discards the changes made in the current database.
o The shadow database remains unchanged, so the last consistent state is still
available, which ensures atomicity.

Benefits

 Atomicity: Ensures that either all operations of a transaction are reflected in the database
or none at all.
 Durability: Once a transaction is committed, its changes are permanent even in case of a
system failure.

How to Implement Atomicity and Durability?

The recovery-management component of a database system implements the support for
atomicity and durability.
The shadow-database scheme:
 • assumes that only one transaction is active at a time.
 • a pointer called db_pointer always points to the current consistent copy of the
database.
 • all updates are made on a shadow copy of the database, and db_pointer is made
to point to the updated shadow copy only after the transaction reaches partial
commit and all updated pages have been flushed to disk.
 • in case the transaction fails, the old consistent copy pointed to by db_pointer can be
used, and the shadow copy can be deleted.
-Assumes that disks do not fail.
-Useful for text editors, but extremely inefficient for large databases: executing a single
transaction requires copying the entire database.
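
The db_pointer idea can be sketched in memory. This is only an illustration: the ShadowDatabase class and its method names are assumptions, a Python dictionary stands in for the database, and nothing is actually flushed to disk.

```python
import copy


class ShadowDatabase:
    """Shadow-copy scheme for one transaction at a time (in-memory sketch)."""

    def __init__(self, initial):
        self.db_pointer = initial  # current consistent copy
        self._shadow = None        # working copy of the active transaction

    def begin(self):
        # Copy the whole database: this is what makes the scheme expensive.
        self._shadow = copy.deepcopy(self.db_pointer)
        return self._shadow

    def commit(self):
        # A real system would flush all updated pages to disk first and
        # only then switch db_pointer, in one atomic step.
        self.db_pointer = self._shadow
        self._shadow = None

    def abort(self):
        # The old copy was never touched, so the shadow copy is just dropped.
        self._shadow = None
```

Until commit switches the pointer, readers of db_pointer only ever see the old consistent copy, so a failed transaction leaves no trace.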

What are Concurrent Executions?


Multiple transactions are allowed to run concurrently in the system. Advantages are:
 increased processor and disk utilization, leading to better transaction throughput:
one transaction can be using the CPU while another is reading from or writing to
the disk
 reduced average response time for transactions: short transactions need not wait
behind long ones.
Concurrency control schemes – mechanisms to achieve isolation, i.e., to control the interaction
among the concurrent transactions in order to prevent them from destroying the consistency of
the database.

How is concurrency controlled in a database?

Controlling concurrency in databases typically involves using various locking mechanisms to
ensure that transactions do not interfere with each other in a way that would lead to data
inconsistency. Here are the key methods:

Lock-Based Protocols

Shared Lock (S)

 Shared Mode (S): Allows multiple transactions to read the same data item
simultaneously.
 Behavior: Transactions can acquire shared locks concurrently, but no transaction can
write to the data item until all shared locks are released.

Exclusive Lock (X)

 Exclusive Mode (X): Allows a transaction to read and write to a data item.
 Behavior: When a transaction holds an exclusive lock, no other transactions can read or
write to the data item until the exclusive lock is released.
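
The behaviour of shared and exclusive locks comes down to a compatibility table. The sketch below assumes a simple in-memory lock table; the names COMPATIBLE, LockTable, request and release_all are made up, and waiting queues and lock upgrades are left out.

```python
# Keyed by (held mode, requested mode); True means both can be held together.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


class LockTable:
    def __init__(self):
        self.held = {}  # data item -> list of (transaction, mode)

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False
        holders.append((txn, mode))
        return True

    def release_all(self, txn):
        """Drop every lock held by txn."""
        for item in self.held:
            self.held[item] = [(t, m) for t, m in self.held[item] if t != txn]
```

Several readers can hold S locks on the same item at once, but an X request is refused until all of them have released.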

Two-Phase Locking (2PL)

 Growing Phase: Transactions acquire all the locks they need but do not release any.
 Shrinking Phase: Transactions release their locks but do not acquire any new ones.
 Guarantee: Ensures serializability but can lead to deadlocks.

What is a deadlock in a database? How is it handled?

A deadlock in a database occurs when two or more transactions are waiting for each other to
release locks, creating a cycle of dependencies that prevents any of them from proceeding.

Handling Deadlocks

1. Prevention Protocols:

 Ensure no circular wait: Enforce a strict order in which transactions must request locks,
avoiding cycles in the wait-for graph.

2. Wait-Die Scheme:

 • Older Transactions Wait: If an older transaction requests a lock held by a younger
transaction, it waits.
 • Younger Transactions Die: If a younger transaction requests a lock held by an older
transaction, it is aborted and restarted later.

3. Wound-Wait Scheme:

 Older Transactions Wound Younger Ones: If an older transaction requests a lock held
by a younger transaction, the younger transaction is aborted (wounded) and restarted
later.
 Younger Transactions Wait: If a younger transaction requests a lock held by an older
transaction, it waits.
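
The two schemes differ only in who gets aborted. A minimal sketch, assuming each transaction carries a start timestamp and that a smaller timestamp means an older transaction (the function names and return strings are illustrative):

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"
```

In both schemes it is always the younger transaction that is aborted, so the oldest transaction in the system can never be rolled back by these rules.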

Deadlock Detection and Resolution:

 • Deadlock Detection: Regularly check for cycles in the wait-for graph.
 • Resolution: Abort one or more transactions to break the cycle, typically choosing the one
that has done the least amount of work (to minimize rollback costs).
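
Detection amounts to finding a cycle in the wait-for graph. The sketch below uses a depth-first search; the dictionary representation of the graph, the work_done measure and the function names are assumptions for illustration.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None.

    wait_for maps each transaction to the set of transactions it waits on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(txn):
        colour[txn] = GREY
        path.append(txn)
        for nxt in sorted(wait_for.get(txn, ())):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # edge back into the current path: a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        colour[txn] = BLACK
        path.pop()
        return None

    for txn in sorted(wait_for):
        if colour.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


def choose_victim(cycle, work_done):
    """Pick the transaction in the cycle that has done the least work."""
    return min(cycle, key=lambda txn: work_done[txn])
```

Once a victim is aborted, its edges disappear from the graph and the remaining transactions in the cycle can proceed.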

Timeout:

 Transactions wait for a specified period before timing out and rolling back if they cannot
acquire the necessary locks.

How to recover from a deadlock?

1. Partial Rollback:

 Partial Rollback: Only the deadlocked transactions are rolled back to a point before they
requested the conflicting locks.
 Steps:
o Identify the deadlocked transactions.
o Roll back these transactions to a safe state.
o Restart them to continue execution.

2. Total Rollback:

 Complete Rollback: All the transactions involved in the deadlock are rolled back
entirely.
 Steps:
o Abort all transactions in the deadlock cycle.
o Restart them from the beginning.

Choosing Transactions to Abort

 Least Cost: Preferably abort transactions that have done the least amount of work to
minimize rollback costs.
 Age: Younger transactions are often chosen to abort over older transactions to avoid
significant disruption.

What are Schedules?
Schedules – sequences that indicate the chronological order in which instructions of concurrent
transactions are executed
 a schedule for a set of transactions must consist of all instructions of those
transactions
 must preserve the order in which the instructions appear in each individual
transaction.

Example Schedules

Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
-Schedule 1 in the text is a serial schedule, in which T1 is followed by T2.
-Schedule 3 in the text is not a serial schedule, but it is equivalent to Schedule 1.
-In both Schedule 1 and Schedule 3, the sum A + B is preserved.
-Schedule 4 in the text is a concurrent schedule that does not preserve the value of the
sum A + B.

What is Serializability?

 • Each transaction keeps the database consistent.
 • So, serial execution of transactions also keeps it consistent.

A schedule (possibly concurrent) is serializable if it is equivalent to a serial schedule. There are
two types of serializability:

1. Conflict Serializability
2. View Serializability

We focus only on read and write operations. Transactions can perform any computation on data
in local buffers between reads and writes. Our schedules include only read and write instructions.

What is Conflict Serializability?

-Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists
some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
-Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and
lj are consecutive in a schedule and they do not conflict, their results would remain the same
even if they had been interchanged in the schedule.
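
The conflict rule can be turned into a test for conflict serializability. The sketch below uses the standard precedence-graph test, which these notes do not describe: draw an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when the graph has no cycle. The tuple format (transaction, action, item) and the function names are assumptions.

```python
def conflicts(op1, op2):
    """Operations are (transaction, action, item) with action 'R' or 'W'."""
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    return t1 != t2 and q1 == q2 and "W" in (a1, a2)


def precedence_graph(schedule):
    """Edge (Ti, Tj) for each conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later):
                edges.add((earlier[0], later[0]))
    return edges


def is_conflict_serializable(schedule):
    """True when the precedence graph is acyclic."""
    edges = precedence_graph(schedule)
    nodes = {op[0] for op in schedule}
    while nodes:
        # Transactions with no incoming edge from the remaining ones can go first.
        free = [n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)]
        if not free:
            return False  # a cycle exists among the remaining transactions
        nodes -= set(free)
    return True
```

The order in which transactions are removed by the loop is one equivalent serial order.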

What do you understand by view serializability? Explain with view equivalent conditions
with example.

View Serializability

View serializability is a concept in database concurrency control that ensures transactions are
executed in a way that produces the same results as some serial (one after the other) execution of
those transactions, preserving the consistency of the database.

View Equivalent Conditions

Two schedules are considered view equivalent if they satisfy the following conditions:

1. Initial Read:
o The same transaction reads the initial value of each data item in both schedules.
2. Update Read:
o If a transaction reads a value written by another transaction, the same transaction
reads that value in both schedules.
3. Final Write:
o The same transaction performs the final write operation on each data item in both
schedules.

Example:

Consider two transactions, T1 and T2, accessing data items A and B:

Schedule S1 (Serial):

1. T1 reads A
2. T1 writes A
3. T2 reads B
4. T2 writes B

Schedule S2 (Concurrent):

1. T1 reads A
2. T2 reads B
3. T1 writes A
4. T2 writes B
In this example:

 • Initial Read: T1 reads the initial value of A and T2 reads the initial value of B in
both schedules.
 • Update Read: No transaction reads a value written by another transaction in either
schedule, so this condition holds trivially.
 • Final Write: T1 performs the final write on A and T2 performs the final write on B in
both schedules.

Therefore, even though the execution order is different in S2, all three conditions hold. This
makes S2 view equivalent to the serial schedule S1,
and thus, view serializable.
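
The three view-equivalence conditions can also be checked mechanically. This is a sketch under assumptions: operations are written as (transaction, action, item) tuples, and a read is matched to the transaction whose write it sees rather than to one specific write, which is enough for small examples like S1 and S2.

```python
def view_profile(schedule):
    """Summarise a schedule as (reads, final_writes).

    Each read is recorded as (reader, item, k, writer): the reader's k-th
    read of item saw the value written by writer, where None means the
    initial value. This covers the initial-read and update-read conditions.
    final_writes maps each item to the transaction that writes it last.
    """
    last_writer = {}
    read_counts = {}
    reads = set()
    for txn, action, item in schedule:
        if action == "R":
            k = read_counts.get((txn, item), 0)
            read_counts[(txn, item)] = k + 1
            reads.add((txn, item, k, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads, last_writer


def view_equivalent(s1, s2):
    """Same operations, same reads-from relation, same final writers."""
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)


S1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "B"), ("T2", "W", "B")]
S2 = [("T1", "R", "A"), ("T2", "R", "B"), ("T1", "W", "A"), ("T2", "W", "B")]
```

Here view_equivalent(S1, S2) is true. Moving a write by T2 on A in front of T1's read of A would change where that read gets its value and break the equivalence.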

Is the following schedule conflict serializable? Show the entire procedure.

T1 T2 T3
Read(P)
P := P × 5
Write(P)
Read(P)
value := P × 0.5
P := P - value
Write(P)
Read(Q)
Q := Q - 100
Write(Q) Read(P)
M := P
Write(M)

Read(Q)
Q := Q + value
Write(Q)

Chapter: Integrity and Security

What are Domain Constraints?

• Integrity constraints guard against accidental damage to the database, by ensuring that
authorized changes to the database do not result in a loss of data consistency.
• Domain constraints are the most elementary form of integrity constraint.
• They test values inserted in the database, and test queries to ensure that the comparisons
make sense.

What is Referential Integrity?

• Ensures that a value that appears in one relation for a given set of attributes also appears
for a certain set of attributes in another relation.
– Guards against dangling tuples (tuples that refer to a value with no match in the
other relation)
– Also called a subset dependency
– Example: If “Perryridge” is a branch name appearing in one of the tuples in the
account relation, then there exists a tuple in the branch relation for branch
“Perryridge”.
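
The Perryridge example can be tried with Python's built-in sqlite3 module. The table layouts, the underscore column names and the helper try_insert_account are assumptions for the sketch; note that SQLite enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " branch_name TEXT NOT NULL REFERENCES branch(branch_name),"
    " balance INTEGER)"
)
conn.execute("INSERT INTO branch VALUES ('Perryridge')")
conn.execute("INSERT INTO account VALUES ('A-102', 'Perryridge', 400)")


def try_insert_account(number, branch, balance):
    """Return True if the row is accepted, False if it would dangle."""
    try:
        conn.execute(
            "INSERT INTO account VALUES (?, ?, ?)", (number, branch, balance)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

An account for Perryridge is accepted, while an account for a branch that has no tuple in the branch relation is rejected.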

What are Assertions?

• An assertion is a predicate expressing a condition that we wish the database always to
satisfy.
• An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
• When an assertion is made, the system tests it for validity, and tests it again on every
update that may violate the assertion
– This testing may introduce a significant amount of overhead; hence assertions
should be used with great care.
• Asserting
for all X, P(X)
is achieved in a round-about fashion using
not exists X such that not P(X)

What are Triggers?

• A trigger is a statement that is executed automatically by the system as a side effect of a
modification to the database.
• To design a trigger mechanism, we must:
– Specify the conditions under which the trigger is to be executed.
– Specify the actions to be taken when the trigger executes.
• Example:
• Suppose that instead of allowing negative account balances, the bank deals with
overdrafts by
– setting the account balance to zero
– creating a loan in the amount of the overdraft
– giving this loan a loan number identical to the account number of the overdrawn
account
• The condition for executing the trigger is an update to the account relation that results in
a negative balance value.
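
The overdraft trigger can be sketched with SQLite through Python's sqlite3 module. The table layouts, the trigger name overdraft and the sample account are assumptions; the trigger syntax is SQLite's and differs in detail from other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The trigger fires after an update that leaves a negative balance. It creates
# a loan numbered like the account for the overdrawn amount, then resets the
# balance to zero.
conn.executescript(
    """
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER);

    CREATE TRIGGER overdraft AFTER UPDATE OF balance ON account
    WHEN NEW.balance < 0
    BEGIN
        INSERT INTO loan VALUES (NEW.account_number, -NEW.balance);
        UPDATE account SET balance = 0
        WHERE account_number = NEW.account_number;
    END;

    INSERT INTO account VALUES ('A-102', 400);
    """
)

# Withdrawing 500 from a balance of 400 fires the trigger.
conn.execute(
    "UPDATE account SET balance = balance - 500 WHERE account_number = 'A-102'"
)
```

After the update, the account balance is 0 and the loan relation holds a loan A-102 of 100. A second overdraft on the same account would clash with the loan number already used, which this sketch does not handle.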

When Not To Use Triggers?


• Triggers were used earlier for tasks such as
– maintaining summary data (e.g. total salary of each department)
– Replicating databases by recording changes to special relations (called change or
delta relations) and having a separate process that applies the changes over to a
replica
• There are better ways of doing these now:
– Databases today provide built in materialized view facilities to maintain
summary data
– Databases provide built-in support for replication

What is Security?

• Security - protection from malicious attempts to steal or modify data.
– Database system level
• Authentication and authorization mechanisms to allow specific users
access only to required data
• We concentrate on authorization in the rest of this chapter
– Operating system level
• Operating system super-users can do anything they want to the database!
Good operating system level security is required.
– Network level: must use encryption to prevent
• Eavesdropping (unauthorized reading of messages)
• Masquerading (pretending to be an authorized user or sending messages
supposedly from authorized users)

– Physical level
• Physical access to computers allows destruction of data by intruders;
traditional lock-and-key security is needed
• Computers must also be protected from floods, fire, etc.
– Human level
• Users must be screened to ensure that authorized users do not give
access to intruders
• Users should be trained on password selection and secrecy

What is Authorization?

Forms of authorization on parts of the database:


• Read authorization - allows reading, but not modification of data.
• Insert authorization - allows insertion of new data, but not modification of existing
data.
• Update authorization - allows modification, but not deletion of data.
• Delete authorization - allows deletion of data

Forms of authorization to modify the database schema:


• Index authorization - allows creation and deletion of indices.
• Resources authorization - allows creation of new relations.
• Alteration authorization - allows addition or deletion of attributes in a relation.
• Drop authorization - allows deletion of relations.

How are Authorization and Views related?


• Users can be given authorization on views, without being given any authorization on the
relations used in the view definition
• Ability of views to hide data serves both to simplify usage of the system and to enhance
security by allowing users access only to data they need for their job
• A combination of relation-level security and view-level security can be used to limit a
user’s access to precisely the data that user needs.
• Suppose a bank clerk needs to know the names of the customers of each branch, but is
not authorized to see specific loan information.
• Approach: Deny direct access to the loan relation, but grant access to the view
cust-loan, which consists only of the names of customers and the branches at
which they have a loan.
• The cust-loan view is defined in SQL as follows:
create view cust-loan as
select branch-name, customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number

• The clerk is authorized to see the result of the query:


select *
from cust-loan
• When the query processor translates the clerk’s query into a query on the actual relations
in the database, we obtain a query on borrower and loan.
• Authorization must be checked on the clerk’s query before query processing replaces a
view by the definition of the view.

Authorization on Views
• Creation of view does not require resources authorization since no real relation is being
created
• The creator of a view gets only those privileges that provide no additional authorization
beyond that he already had.
• E.g. if creator of view cust-loan had only read authorization on borrower and loan, he
gets only read authorization on cust-loan

What is Encryption?

• Data may be encrypted when database authorization provisions do not offer sufficient
protection.
• Properties of good encryption technique:
– Relatively simple for authorized users to encrypt and decrypt data.
– Encryption scheme depends not on the secrecy of the algorithm but on the
secrecy of a parameter of the algorithm called the encryption key.
– Extremely difficult for an intruder to determine the encryption key.

What is Authentication?

• Password based authentication is widely used, but is susceptible to sniffing on a network
• Challenge-response systems avoid transmission of passwords
– DB sends a (randomly generated) challenge string to user
– User encrypts string and returns result.
– DB verifies identity by decrypting result
– Can use public-key encryption system by DB sending a message encrypted using
user’s public key, and user decrypting and sending the message back
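
A challenge-response exchange can be sketched with Python's standard library. The notes describe the user encrypting the challenge; this sketch swaps in a keyed hash (HMAC-SHA256) over a shared secret instead, which serves the same purpose of proving knowledge of the secret without sending it. The function names are illustrative.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """DB side: a fresh random challenge for each login attempt."""
    return secrets.token_bytes(16)


def respond(shared_secret, challenge):
    """User side: prove knowledge of the secret without transmitting it."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify(shared_secret, challenge, response):
    """DB side: recompute the expected response and compare in constant time."""
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Because every challenge is new, a response sniffed from the network cannot be replayed for a later login.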

• Digital signatures are used to verify authenticity of data
– E.g. use private key (in reverse) to encrypt data, and anyone can verify
authenticity by using public key (in reverse) to decrypt data. Only holder of
private key could have created the encrypted data.
– Digital signatures also help ensure nonrepudiation: the sender
cannot later claim to have not created the data
