Unit 1
● A file system is a way of arranging files on a storage medium such as a
hard disk. It organizes the files and helps retrieve them when they are
required. A file system consists of files grouped into directories, which
can in turn contain other directories and files. It performs basic
operations such as file management, file naming, and enforcing access
rules.
2. File system: redundant data can be present. DBMS: redundancy is controlled and kept to a minimum.
4. File system: no efficient query processing. DBMS: efficient query processing.
7. File system: provides less security than a DBMS. DBMS: has more security mechanisms than a file system.
● Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC,
19 etc).
● Record: Collection of related data items, e.g. in the above example the three
data items had no meaning. But if we organize them in the following way,
then they collectively represent meaningful information.
Question 1:
● The columns of this relation are called fields or attributes; each attribute
takes its values from a domain. The rows are called tuples or records.
● We now have a collection of 4 tables. They can be called a “related collection” because we can clearly find out that there
are some common attributes existing in a selected pair of tables. Because of these common attributes we may combine
the data of two or more tables together to find out the complete details of a student. Questions like “Which hostel does the
youngest student live in?” can be answered now, although Age and Hostel attributes are in different tables.
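As a minimal sketch of how a common attribute lets two tables be combined, the snippet below answers the hostel question with Python's built-in sqlite3 module. The original tables are not reproduced in these notes, so the table names, column names (roll_no, age, hostel_name) and rows are illustrative assumptions only.

```python
import sqlite3

# Hypothetical schema and rows: the original tables are not shown in the
# notes, so everything below is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE hostel  (roll_no INTEGER REFERENCES student(roll_no), hostel_name TEXT);
    INSERT INTO student VALUES (1, 'Asha', 19), (2, 'Ravi', 18), (3, 'Meera', 20);
    INSERT INTO hostel  VALUES (1, 'Ganga'), (2, 'Kaveri'), (3, 'Ganga');
""")

# roll_no is the common attribute that lets the two tables be combined.
youngest_hostel = conn.execute("""
    SELECT h.hostel_name
    FROM student AS s
    JOIN hostel  AS h ON h.roll_no = s.roll_no
    ORDER BY s.age ASC
    LIMIT 1
""").fetchone()[0]

print(youngest_hostel)
```

Age lives in one table and the hostel name in the other; the join on the shared roll_no column is what makes the combined question answerable.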
Question 2:
Disadvantages of DBMS
● It is somewhat complex. Because a DBMS supports many functions, the
underlying software is complex. Designers and developers need thorough
knowledge of the software to get the most out of it.
● Because of its complexity and functionality, it uses a large amount of memory. It
also needs a large amount of memory to run efficiently.
● A DBMS typically runs as a centralized system, i.e., all users access the
same database. Hence any failure of the DBMS will impact all
the users.
● A DBMS is generalized software, i.e., it is written to work across many kinds of
systems rather than for one specific application. Hence some applications will run slowly.
Types of Databases
● Centralized databases
○ One to a few cores, shared memory
● Client-server
○ One server machine executes work on behalf of multiple client machines.
● Parallel databases
○ Many core shared memory
○ Shared disk
○ Shared nothing
● Distributed databases
○ Geographical distribution, Schema/data heterogeneity
DBMS
Database Engine
● A database system is partitioned into modules that deal with each of the
responsibilities of the overall system.
● The functional components of a database system can be divided into
○ The storage manager,
○ The query processor component,
○ The transaction management component.
● Query Processor: It translates the requests (queries) received from end users via an
application program into low-level instructions, and then executes the instructions produced by
the DML compiler. The query processor contains the following components:
1. DML Compiler – which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands. A
query can usually be translated into any of a number of alternative evaluation plans that all
give the same result. The DML compiler also performs query optimization, that is, it picks
the lowest cost evaluation plan from among the alternatives.
2. DDL Interpreter – which interprets DDL statements and records the definitions in the data
dictionary.
3. Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
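To make the idea of an evaluation plan concrete, here is a small sketch that asks SQLite (used purely as a convenient stand-in, not the system described in these notes) which plan it picked for a query. The loan table and the index name are assumptions for illustration, and the wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements: their definitions end up in the system catalog (data dictionary).
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT PRIMARY KEY, branch_name TEXT, amount INTEGER);
    CREATE INDEX loan_branch_idx ON loan (branch_name);
""")

query = "SELECT loan_number FROM loan WHERE branch_name = ?"

# EXPLAIN QUERY PLAN returns one row per plan step; the last column is a
# human-readable description whose exact wording varies by SQLite version.
plan_steps = conn.execute("EXPLAIN QUERY PLAN " + query, ("Perryridge",)).fetchall()
for step in plan_steps:
    print(step[-1])
```

The same query could be answered by scanning the whole table or by using the index; the printed description shows which alternative the optimizer chose.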
Question 3:
Storage Manager
● Storage Manager is a program that provides an interface between the data stored in the
database and the queries received. It is also known as Database Control System. It maintains
the consistency and integrity of the database by applying the constraints and executes
the DCL statements. It is responsible for updating, storing, deleting, and retrieving data in the
database. It contains the following components –
1. Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
2. Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
3. File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
4. Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than
the size of main memory.
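The caching role of the buffer manager can be sketched as follows. The notes do not say which replacement policy is used, so least-recently-used (LRU) eviction is assumed here, and the class name BufferPool and the dictionary standing in for the disk are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: caches up to `capacity` disk blocks in memory."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: block id -> contents (stands in for the disk)
        self.capacity = capacity
        self.frames = OrderedDict()   # cached blocks, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)     # evict the least recently used block
        self.disk_reads += 1                    # cache miss: fetch from "disk"
        self.frames[block_id] = self.disk[block_id]
        return self.frames[block_id]


disk = {i: f"block-{i}" for i in range(10)}   # 10 blocks on "disk"
pool = BufferPool(disk, capacity=3)           # only 3 blocks fit in memory
for block_id in [0, 1, 0, 2, 3, 0, 1]:
    pool.get_block(block_id)

print(pool.disk_reads, list(pool.frames))
```

Seven requests are served with five disk reads here, even though the "database" of ten blocks is larger than the three-frame memory.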
Excellence and Service
CHRIST
Deemed to be University
Database users
● Naive users
○ are unsophisticated users who interact with the system by invoking one of the application programs
that have been written previously.
● Application programmers
○ are computer professionals who write application programs
● Sophisticated users
○ interact with the system without writing programs.
○ they form their requests in a database query language and submit each query to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands. Analysts who submit queries to explore data in the database fall in this category.
● Database Administrators
○ A person who has central control over the system is called a database administrator (DBA).
Question 4:
● Expand DBA
Database applications
● Database applications are usually partitioned into two or three parts
● Two-tier architecture -- the application resides at the client machine, where
it invokes database system functionality at the server machine
● Three-tier architecture -- the client machine acts as a front end and does not
contain any direct database calls.
○ The client end communicates with an application server, usually through a forms
interface.
○ The application server in turn communicates with a database system to access data.
View of Data
● Physical level - The lowest level of abstraction describes how the data are
actually stored. The physical level describes complex low-level data
structures in detail.
● Logical level - The next-higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data. The
logical level thus describes the entire database in terms of a small number of
relatively simple structures. Database administrators use the logical level of
abstraction.
● View level - The highest level of abstraction describes only part of the entire
database. The view level of abstraction exists to simplify users' interaction with
the system. The system may provide many views for the same database.
● Physical level: describes how a record (e.g., customer) is stored (block of consecutive
storage locations –e.g.: words or bytes). The language compiler hides this level of detail from
programmers.
● Logical level: describes the record stored in database by a type definition, and the
relationships among the record types. Programmers using a programming language work at
this level of abstraction. Similarly, database administrators usually work at this level.
● View level: application programs hide details of data types. Views can also hide information
(such as an employee’s salary) for security purposes. The views also provide a security
mechanism to prevent users from accessing certain parts of the database.
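A minimal sketch of the view level, using SQLite as a stand-in: the base table plays the role of the logical level, and a view exposes only part of it. The table, view, and column names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: the full record type, including the sensitive column.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Smith', 'Loans', 52000), (2, 'Jones', 'Audit', 61000);

    -- View level: a restricted window on the same data, without salary.
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

SQLite has no per-user permissions, so this only shows what the view exposes; in a multi-user DBMS, access would typically be granted on the view rather than on the underlying table.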
Transaction Management
Data Models
ER model
● is a high-level data model.
● It is based on a perception of a real world that consists of a collection
of basic objects, called entities, and of relationships among these
objects.
● The E-R data model employs three basic notions:
○ entity sets,
○ relationship sets, and
○ attributes.
● Entities are described in a database by a set of attributes.
● A relationship is an association among several entities.
● The set of all entities of the same type and the set of all relationships
of the same type are termed an entity set and relationship set,
respectively.
E-R Diagram
● The overall logical structure (schema) of a database can be expressed graphically by an E-R
diagram, which is built up from the following components:
○ Rectangles, which represent entity sets
○ Ellipses, which represent attributes
○ Diamonds, which represent relationships among entity sets
○ Lines, which link attributes to entity sets and entity sets to relationships
○ Double ellipses, which represent multivalued attributes
○ Dashed ellipses, which denote derived attributes
○ Double lines, which indicate total participation of an entity in a relationship set
○ Double rectangles, which represent weak entity sets
Entity Set
● An entity is a “thing” or “object” in the real world that is distinguishable from all other
objects.
● E.g.: each person in an enterprise is an entity.
● An entity is represented by a set of attributes. Attributes are descriptive properties
possessed by each member of an entity set.
● An entity set is a set of entities of the same type that share the same
properties, or attributes. The set of all persons who are customers at a given
bank, for example, can be defined as the entity set customer.
Attributes
● Simple and composite attributes
○ Simple attributes are not divided into subparts.
○ Composite attributes can be divided into subparts. E.g.: name attribute: first-name, middle-initial,
and last-name.
Relationship set
● A relationship is an association among several entities.
● A relationship set is a set of relationships of the same type.
● Represented by a diamond shape.
● Most of the relationship sets in a database system are binary.
● The association between entity sets is referred to as participation; that is,
the entity sets E1, E2, . . ., En participate in relationship set R. A
relationship instance in an E-R schema represents an association
between the named entities in the real-world enterprise that is being
modeled.
● The function that an entity plays in a relationship is called that entity’s role.
● The same entity set may participate in a relationship set more than once, in different roles. E.g.: Employee has
2 roles, worker and manager, in the relationship works-for.
● A relationship may also have descriptive attributes.
● There can be more than one relationship set involving the same entity sets. E.g.: customer and loan entity
sets participate in the relationship set borrower. The customer and loan entity sets may participate in
another relationship set, guarantor.
Descriptive Attributes
Cardinality
Mapping cardinalities
Example:
One-to-many:
A loan is associated with at most one customer through a borrower, and a
customer is associated with several loans (including zero) through a borrower.
Many-to-one:
A loan is associated with several customers (including zero) through a
borrower, and a customer is associated with at most one loan through a borrower.
One-to-one:
A loan is associated with at most one customer through a borrower, and a
customer is associated with at most one loan through a borrower.
Many-to-Many:
A customer is associated with several loans through a borrower. A loan is
associated with several customers through a borrower.
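The four cases can be told apart mechanically, as in the sketch below. A mapping cardinality is really a constraint declared on the schema; the hypothetical helper mapping_cardinality only reports the tightest case that a given set of (customer, loan) pairs is consistent with, read in the direction customer to loan.

```python
from collections import Counter


def mapping_cardinality(borrower):
    """Classify a set of (customer, loan) pairs, read from customer to loan."""
    pairs = set(borrower)
    loans_per_customer = Counter(customer for customer, _ in pairs)
    customers_per_loan = Counter(loan for _, loan in pairs)
    customer_has_many = any(n > 1 for n in loans_per_customer.values())
    loan_has_many = any(n > 1 for n in customers_per_loan.values())
    if customer_has_many and loan_has_many:
        return "many-to-many"
    if customer_has_many:
        return "one-to-many"    # a customer has several loans, each loan one customer
    if loan_has_many:
        return "many-to-one"    # a loan has several customers, each customer one loan
    return "one-to-one"


print(mapping_cardinality([("Smith", "L-1"), ("Smith", "L-2")]))
print(mapping_cardinality([("Smith", "L-1"), ("Jones", "L-1")]))
```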
Role
● Roles are indicated by labeling the lines that connect diamonds to rectangles.
Keys
● The attribute values of an entity must be such that they can
uniquely identify the entity.
● A key is a set of attributes that suffices to distinguish entities
from each other. Keys also help uniquely identify relationships, and thus
distinguish relationships from each other.
○ A super key is a set of one or more attributes that, taken collectively, allow us to identify
uniquely an entity in the entity set. E.g.: the customer-id attribute of the customer entity set. Its
attributes can have null values.
○ A candidate key is a minimal super key of an entity set, i.e., a super key with no
redundant attributes. Its attributes can have null values.
○ Primary key: only one candidate key is selected as the primary key. Its attributes
cannot have null values.
○ Partial key (discriminator): distinguishes the entities of a weak entity set only in
combination with the primary key of the entity set it depends on.
● All candidate keys are super keys, but not all super keys are
candidate keys.
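The difference between super keys and candidate keys can be checked on sample data, as sketched below. Keys are declared on the schema and must hold for every legal instance, so testing a single instance (as this code does) can only refute a key, never prove one. The helper names and the customer rows are assumptions for illustration.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Minimal superkeys: superkeys that contain no smaller superkey."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys


# Illustrative instance of the customer entity set.
customer = [
    {"customer-id": "C1", "customer-name": "Smith", "customer-city": "Rye"},
    {"customer-id": "C2", "customer-name": "Jones", "customer-city": "Harrison"},
    {"customer-id": "C3", "customer-name": "Smith", "customer-city": "Harrison"},
    {"customer-id": "C4", "customer-name": "Smith", "customer-city": "Rye"},
]
attributes = ["customer-id", "customer-name", "customer-city"]

print(is_superkey(customer, ["customer-id", "customer-name"]))  # super key, but not minimal
print(candidate_keys(customer, attributes))
```

Here {customer-id, customer-name} is a super key because it contains customer-id, yet it is not a candidate key since customer-name is redundant.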
● Example of a ternary relationship: the three entity sets employee, job, and branch
are related through the relationship set works-on.
● Specialization
● Generalization
● Aggregation
Specialization
● The process of designating subgroupings within an entity set is called specialization.
● Specialization is depicted by a triangle component labeled ISA.
● The label ISA stands for “is a” and represents, for example, that a customer “is a”
person.
● The ISA relationship may also be referred to as a superclass-subclass relationship.
● Higher- and lower-level entity sets are depicted as regular entity sets—that is, as
rectangles containing the name of the entity set.
Generalization
● Generalization is a containment relationship that exists between a higher-level entity set and
one or more lower-level entity sets.
● E.g.: person is the higher-level entity set and customer and employee are lower-level
entity sets.
● Higher- and lower-level entity sets also may be designated by the terms superclass
and subclass, respectively.
● The person entity set is the superclass of the customer and employee subclasses.
Constraints on Generalization
● Condition-defined
○ Membership in a lower-level entity set is decided by whether an entity satisfies an explicit
condition. When all the lower-level entities are evaluated on the basis of the same attribute
(e.g.: account-type), the generalization is said to be attribute-defined.
● User-defined
○ Here, the database user assigns lower-level entities to a given entity set.
● Disjoint
○ An entity may belong to only one lower-level entity set within a single generalization.
● Overlapping
○ An entity may belong to more than one lower-level entity set within a single
generalization.
● Completeness constraint
○ Total generalization or specialization: each higher-level entity must belong to a
lower-level entity set.
○ Partial generalization or specialization: some higher-level entities may not belong to
any lower-level entity set.
Aggregation
● Aggregation is an abstraction through which relationships are treated as higher-level entities.
E.g.: the relationship set works-on (relating the entity sets employee, branch, and job) is regarded as a
higher-level entity set called works-on. Such an entity set is treated in the same manner as is
any other entity set.
● We can then create a binary relationship manages between works-on and manager to
represent who manages what tasks.
Relational Model
● The relational model is a lower-level model.
● It uses a collection of tables to represent both data and the relationships among those data.
Each table has multiple columns, and each column has a unique name.
● The relational model is an example of a record-based model.
● Its conceptual simplicity has led to its widespread adoption; today a vast majority of database
products are based on the relational model.
● Designers often formulate database schema design by first modeling data at a high level,
using the E-R model, and then translating it into the relational model.
Database Schema
● A database schema, along with primary key and foreign key dependencies, can be depicted pictorially by
schema diagrams.
● Each relation appears as a box, with the attributes listed inside it and the relation name above it. If there are
primary key attributes, a horizontal line crosses the box, with the primary key attributes listed above the
line.
● Foreign key dependencies appear as arrows from the foreign key attributes of the referencing relation to
the primary key of the referenced relation.
● Do not confuse a schema diagram with an E-R diagram. In particular, E-R diagrams do not show foreign
key attributes explicitly, whereas schema diagrams show them explicitly.
1950s and early 1960s: Magnetic tapes were developed for data storage. Processing of data consisted of
reading data from one or more tapes and writing data to a new tape.
Data could also be input from punched card decks, and output to printers. Data
were accessed in sequential order.
Late 1960s and 1970s: Widespread use of hard disks began in the late 1960s. Hard disks allowed direct access
to data.
A landmark paper by Codd [1970] defined the relational model and nonprocedural
ways of querying data in the relational model, and relational databases were born.
1980s: The fully functional System R prototype led to IBM’s first relational database
product, SQL/DS.
The 1980s also saw much research on parallel and distributed databases, as well
as initial work on object-oriented databases.
Early 1990s: The SQL language was designed primarily for decision support
applications, which are query intensive, whereas the mainstay of databases in the 1980s was
transaction processing applications, which are update intensive.
Late 1990s: Database systems were deployed to support very high transaction processing
rates, as well as very high reliability and 24×7 availability (availability 24 hours a
day, 7 days a week, meaning no downtime for scheduled maintenance activities).
● Example: to select those tuples of the loan relation where the branch is
“Perryridge”
● To select all tuples in which the amount lent is more than $1200
● To combine several predicates into a larger predicate, the
connectives and (∧), or (∨), and not (¬) can be used.
● Example: to find those tuples pertaining to loans of more than $1200 made by
the Perryridge branch
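The three selections described above can be sketched in Python by treating a relation as a list of dictionaries. The helper name select and the sample rows are assumptions; in the usual relational-algebra notation, selection is written with sigma, e.g. σ branch-name = "Perryridge" (loan).

```python
def select(relation, predicate):
    """Relational-algebra selection: keep the tuples that satisfy predicate."""
    return [t for t in relation if predicate(t)]


# Illustrative loan relation; the rows are made up for this sketch.
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
]

# sigma branch-name = "Perryridge" (loan)
perryridge = select(loan, lambda t: t["branch-name"] == "Perryridge")

# sigma amount > 1200 (loan)
over_1200 = select(loan, lambda t: t["amount"] > 1200)

# sigma branch-name = "Perryridge" AND amount > 1200 (loan)
both = select(loan, lambda t: t["branch-name"] == "Perryridge" and t["amount"] > 1200)

print([t["loan-number"] for t in both])
```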
● The project operation is a unary operation that returns its argument relation,
with certain attributes left out.
● Since a relation is a set, any duplicate rows are eliminated.
● Projection is denoted by the uppercase Greek letter pi (Π).
● The attributes appear as a subscript to Π. The argument relation follows in
parentheses.
● Example: to list all loan numbers and the amount of the loan
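Projection, including the elimination of duplicate rows, can be sketched the same way. The helper name project and the sample rows are illustrative assumptions.

```python
def project(relation, attributes):
    """Relational-algebra projection with duplicate elimination."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:      # a relation is a set, so drop duplicates
            result.append(row)
    return result


# Illustrative loan relation; the rows are made up for this sketch.
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
]

# Pi loan-number, amount (loan): every row survives, since loan numbers differ.
loan_amounts = project(loan, ["loan-number", "amount"])

# Pi branch-name (loan): the two Perryridge rows collapse into one.
branches = project(loan, ["branch-name"])

print(len(loan_amounts), [b["branch-name"] for b in branches])
```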
● Example: Find the names of all bank customers who have either an account
or a loan or both.
● First: find the names of all customers with a loan in the bank:
● Second: find the names of all customers with an account in the bank
● Notice that there are 10 tuples in the result, even though there
are seven distinct borrowers and six depositors. This apparent
discrepancy occurs because Smith, Jones, and Hayes are
borrowers as well as depositors. Since relations are sets,
duplicate values are eliminated.
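The count of 10 can be reproduced with Python sets, which eliminate duplicates just as relations do. Only Smith, Jones, and Hayes come from the text; the remaining names are placeholders chosen to give seven borrowers and six depositors.

```python
# Names other than Smith, Jones, and Hayes are illustrative placeholders.
borrower_names = {"Adams", "Curry", "Hayes", "Jackson", "Jones", "Smith", "Williams"}
depositor_names = {"Hayes", "Johnson", "Jones", "Lindsay", "Smith", "Turner"}

either = borrower_names | depositor_names    # union: a loan, an account, or both
in_both = borrower_names & depositor_names   # the customers counted only once

print(len(borrower_names), len(depositor_names), len(either), sorted(in_both))
```

Seven plus six is thirteen, but the three customers who appear in both sets are kept once, leaving ten.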
● Example: to find the names of all customers who have a loan at the
Perryridge branch.
Intermediate Result
● The rename operator is denoted by the lowercase Greek letter rho (ρ).
● Given a relational-algebra expression E, the expression ρx (E) returns the result of
expression E under the name x.
● Step 2: take the set difference between the relation Πbalance (account) and
the temporary relation just computed, to obtain the result.
● Example: find all customers who have both a loan and an account
● Example: Find the names of all customers who have a loan at the bank, along
with the loan number and the loan amount.
● Example: Find the names of all branches with customers who have an
account in the bank and who live in Harrison
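Queries of this kind combine tuples from two relations that agree on their shared attributes, which is what a natural join does. The sketch below is one simple way to write it, applied to the "customers with a loan, along with loan number and amount" example; the helper name natural_join and the rows are assumptions.

```python
def natural_join(r, s):
    """Combine tuples of r and s that agree on all attributes they share."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result


# Illustrative rows for the borrower and loan relations.
borrower = [
    {"customer-name": "Hayes", "loan-number": "L-15"},
    {"customer-name": "Smith", "loan-number": "L-11"},
    {"customer-name": "Smith", "loan-number": "L-23"},
]
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
]

joined = natural_join(borrower, loan)   # matched on the shared loan-number attribute
answer = [(t["customer-name"], t["loan-number"], t["amount"]) for t in joined]
print(answer)
```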
● Generalized Projection
● Aggregate Functions
● Outer Join
● Null Values∗∗
Generalized Projection
● Example: Consider the credit-info relation. Find how much more each person
can spend.
● Result
Aggregate Functions
● Example: find out the total sum of salaries of all the part-time employees in
the bank
● Example: find the total salary sum of all part-time employees at each branch
of the bank separately. First, partition the relation pt-works into groups based on
the branch, and then apply the aggregate function on each group.
● Example: find the maximum salary for part-time employees at each branch
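The partition-then-aggregate idea can be sketched as below. The helper name aggregate_by and the pt-works rows are illustrative assumptions, not data from the notes.

```python
from collections import defaultdict


def aggregate_by(relation, group_attr, value_attr, func):
    """Partition relation on group_attr, then apply func to each group's values."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[group_attr]].append(t[value_attr])
    return {group: func(values) for group, values in groups.items()}


# Illustrative pt-works relation.
pt_works = [
    {"employee-name": "Adams", "branch-name": "Perryridge", "salary": 1500},
    {"employee-name": "Brown", "branch-name": "Perryridge", "salary": 1300},
    {"employee-name": "Gopal", "branch-name": "Perryridge", "salary": 5300},
    {"employee-name": "Johnson", "branch-name": "Downtown", "salary": 1500},
    {"employee-name": "Loreena", "branch-name": "Downtown", "salary": 1300},
    {"employee-name": "Peterson", "branch-name": "Downtown", "salary": 2500},
    {"employee-name": "Rao", "branch-name": "Austin", "salary": 1500},
    {"employee-name": "Sato", "branch-name": "Austin", "salary": 1600},
]

total_salary = sum(t["salary"] for t in pt_works)                     # one value for the whole relation
sum_by_branch = aggregate_by(pt_works, "branch-name", "salary", sum)  # one value per branch
max_by_branch = aggregate_by(pt_works, "branch-name", "salary", max)

print(total_salary, sum_by_branch, max_by_branch)
```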
Outer Join
● Example: to generate a single relation with all the information (street, city,
branch name, and salary) about full-time employees, natural join is applied. The natural
join, however, drops any employee who appears in only one of the two relations being joined.
● We can use the outer-join operation to avoid this loss of information. There
are actually three forms of the operation: left outer join, denoted ⟕; right
outer join, denoted ⟖; and full outer join, denoted ⟗.
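A minimal sketch of the left outer join follows; the right outer join is obtained here by swapping the operands. The helper name, the relation names employee and ft_works, and all rows are assumptions for illustration, and None stands in for null.

```python
def left_outer_join(r, s, s_attributes):
    """Natural join of r and s that also keeps unmatched tuples of r,
    padding the attributes that come only from s with None (null)."""
    result = []
    for t in r:
        matches = [u for u in s if all(t[a] == u[a] for a in t.keys() & u.keys())]
        if matches:
            result.extend({**t, **u} for u in matches)
        else:
            padding = {a: None for a in s_attributes if a not in t}
            result.append({**t, **padding})
    return result


# Illustrative relations: Smith has no full-time job record, Gates has no address record.
employee = [
    {"employee-name": "Coyote", "street": "Toon", "city": "Hollywood"},
    {"employee-name": "Rabbit", "street": "Tunnel", "city": "Carrotville"},
    {"employee-name": "Smith", "street": "Revolver", "city": "Death Valley"},
]
ft_works = [
    {"employee-name": "Coyote", "branch-name": "Mesa", "salary": 1500},
    {"employee-name": "Rabbit", "branch-name": "Mesa", "salary": 1300},
    {"employee-name": "Gates", "branch-name": "Redmond", "salary": 5300},
]
employee_attrs = ["employee-name", "street", "city"]
ft_works_attrs = ["employee-name", "branch-name", "salary"]

left = left_outer_join(employee, ft_works, ft_works_attrs)    # keeps Smith
right = left_outer_join(ft_works, employee, employee_attrs)   # operands swapped: keeps Gates

print([t["employee-name"] for t in left], [t["employee-name"] for t in right])
```

A plain natural join of these two relations would return only Coyote and Rabbit; the outer joins keep Smith or Gates and fill the missing attributes with nulls.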
Null Values∗∗
● The special value null indicates “value unknown or nonexistent,” and any arithmetic operation
(such as +, −, ∗, /) involving null values must return a null result.
● Any comparisons (such as <,<=,>,>=, =) involving a null value evaluate to special value
unknown; we cannot say for sure whether the result of the comparison is true or false, so we
say that the result is the new truth value unknown.
● Three Boolean operations are defined to deal with the truth value unknown.
1. and: (true and unknown) = unknown;
(false and unknown) = false;
(unknown and unknown) = unknown.
2. or: (true or unknown) = true;
(false or unknown) = unknown;
(unknown or unknown) = unknown.
3. not: (not unknown) = unknown.
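The truth tables above can be written out directly. In this sketch Python's None is used to stand for both null and the truth value unknown, which is a modelling assumption; the function names are hypothetical.

```python
UNKNOWN = None   # stands for the truth value "unknown"


def and3(a, b):
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True


def or3(a, b):
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False


def not3(a):
    return UNKNOWN if a is UNKNOWN else (not a)


def greater_than(x, y):
    """Comparison that yields unknown when either operand is null (None)."""
    if x is None or y is None:
        return UNKNOWN
    return x > y


print(and3(True, UNKNOWN), and3(False, UNKNOWN), or3(True, UNKNOWN), or3(False, UNKNOWN))
```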
Relational Calculus
● There exists construct, denoted by ‘∃’: ∃ t ∈ r (Q(t)) means “there exists a tuple t in relation r such
that predicate Q(t) is true.”
● For all construct, denoted by ‘∀’: ∀ t ∈ r (Q(t)) means “Q is true for all tuples t in relation r.”
● Example: Find the loan number for each loan of an amount greater than
$1200. Tuple variable t is defined on only the loan-number attribute,
since that is the only attribute having a condition specified for t.
Thus, the result is a relation on (loan-number).
● Explanation: The set of all tuples t such that there exists a tuple s in relation
loan for which the values of t and s for the loan-number attribute are equal, and
the value of s for the amount attribute is greater than $1200.
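The explanation above translates almost word for word into a Python set comprehension, with any() playing the role of ∃. The calculus lets t range over all possible tuples; the sketch restricts t to loan numbers that actually occur, which is an assumption made so the code terminates. The rows are illustrative.

```python
def exists(relation, predicate):
    """There exists a tuple s in relation such that predicate(s) is true."""
    return any(predicate(s) for s in relation)


# Illustrative loan relation.
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
]

# Candidate values for t[loan-number].
loan_numbers = {s["loan-number"] for s in loan}

# { t | there exists s in loan (t[loan-number] = s[loan-number] and s[amount] > 1200) }
result = {
    n for n in loan_numbers
    if exists(loan, lambda s: s["loan-number"] == n and s["amount"] > 1200)
}
print(sorted(result))
```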
● Example: Find the names of all customers who have a loan from the Perryridge
branch. Here t is a free variable, while tuple variables s and u
are said to be bound variables.
● Example: Find all customers who have an account at the bank but do not
have a loan from the bank
● Example: Find all customers who have an account at all branches located in
Brooklyn
● Explanation: The set of all customers (that is, (customer-name) tuples t) such
that, for all tuples u in the branch relation, if the value of u on attribute branch-
city is Brooklyn, then the customer has an account at the branch whose name
appears in the branch-name attribute of u.
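The "for all ... if ... then ..." reading above can be sketched with all() for ∀ and any() for ∃. To keep the example short, the account and depositor information is merged into one hypothetical has_account relation, and candidate customers are limited to names that occur in it; both choices are assumptions, as are the rows.

```python
def for_all(relation, predicate):
    return all(predicate(u) for u in relation)


def exists(relation, predicate):
    return any(predicate(s) for s in relation)


def implies(p, q):
    return (not p) or q


# Illustrative relations.
branch = [
    {"branch-name": "Brighton", "branch-city": "Brooklyn"},
    {"branch-name": "Downtown", "branch-city": "Brooklyn"},
    {"branch-name": "Perryridge", "branch-city": "Horseneck"},
]
has_account = [
    {"customer-name": "Johnson", "branch-name": "Brighton"},
    {"customer-name": "Johnson", "branch-name": "Downtown"},
    {"customer-name": "Smith", "branch-name": "Brighton"},
    {"customer-name": "Hayes", "branch-name": "Perryridge"},
]

customers = {t["customer-name"] for t in has_account}

# For every branch u: if u is in Brooklyn, then the customer has an account at u.
result = {
    c for c in customers
    if for_all(branch, lambda u: implies(
        u["branch-city"] == "Brooklyn",
        exists(has_account, lambda s: s["customer-name"] == c
                                      and s["branch-name"] == u["branch-name"]),
    ))
}
print(result)
```

Smith is excluded because of the missing Downtown account, and Hayes because neither Brooklyn branch is covered. Note that if no branch were located in Brooklyn, every candidate customer would satisfy the condition.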
● A tuple-relational-calculus formula is built up out of atoms. An atom has one of the following forms:
1. s ∈ r, where s is a tuple variable and r is a relation (we do not allow use of the ∉ operator)
2. s[x] Θ u[y], where s and u are tuple variables, x is an attribute on which s is defined, y is an attribute on which u is
defined, and Θ is a comparison operator (<, ≤, =, ≠, >, ≥); we require that attributes x and y have domains whose
members can be compared by Θ
3. s[x] Θ c, where s is a tuple variable, x is an attribute on which s is defined, Θ is a comparison operator, and c is a
constant in the domain of attribute x
● We build up formulae from atoms by using the following rules:
1. An atom is a formula.
2. If P1 is a formula, then so are ¬P1 and (P1).
3. If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
4. If P1(s) is a formula containing a free tuple variable s, and r is a relation, then ∃ s ∈ r (P1(s)) and ∀ s ∈ r (P1(s)) are
also formulae.
● As we could for the relational algebra, we can write equivalent expressions that are not identical in appearance. In the tuple
relational calculus, these equivalences include the following three rules:
1. P1 ∧ P2 is equivalent to ¬ (¬(P1) ∨ ¬(P2)).
2. ∀ t ∈ r (P1(t)) is equivalent to ¬ ∃ t ∈ r (¬P1(t)).
3. P1 ⇒ P2 is equivalent to ¬(P1) ∨ P2.
● The domain relational calculus uses domain variables that take on values from an attribute’s domain, rather
than values for an entire tuple.
● Domain relational calculus serves as the theoretical basis of the widely used
QBE language.
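● An expression in the domain relational calculus is of the form {< x1, x2, . . . , xn > | P(x1, x2, . . . , xn)}, where x1, x2, . . . , xn are domain variables and P is a formula composed of atoms.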
● An atom in the domain relational calculus has one of the following forms:
1. < x1, x2, . . . , xn > ∈ r, where r is a relation on n attributes and x1, x2, . . . , xn are
domain variables or domain constants.
2. x Θ y, where x and y are domain variables and Θ is a comparison operator (<, ≤, =, ≠,
>, ≥). We require that attributes x and y have domains that can be compared by Θ.
3. x Θ c, where x is a domain variable, Θ is a comparison operator, and c is a constant in
the domain of the attribute for which x is a domain variable.
● We build up formulae from atoms by using the following rules:
1. An atom is a formula.
2. If P1 is a formula, then so are ¬P1 and (P1).
3. If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
4. If P1(x) is a formula in x, where x is a domain variable, then ∃ x (P1(x)) and ∀ x (P1(x))
are also formulae.
● As a notational shorthand, we write ∃ a, b, c (P(a, b, c)) for ∃ a (∃ b (∃ c (P(a, b, c))))
● Example: Find the loan number, branch name, and amount for loans of over
$1200:
● Example: Find all loan numbers for loans with an amount greater than $1200:
● Example: Find the names of all customers who have a loan from the
Perryridge branch and find the loan amount:
Thank you