Database Systems Notes

The document outlines the key concepts and components of Database Management Systems (DBMS), including database models, normalization, SQL, and transaction management. It discusses the advantages and disadvantages of traditional file processing systems compared to the database approach, emphasizing data integration, sharing, and consistency. Additionally, it covers the evolution of databases, the 3-schema architecture, entity-relationship modeling, and distributed database systems.

Uploaded by

francismungangu
Copyright
© All Rights Reserved

Database Management Systems

Course Outline
• Concepts of DBMS
• File systems and databases
• Database models
• 3-schema architecture
• Entity relationship modeling
• Normalization
• Introduction to SQL
• Transaction management and concurrency control
• Distributed database systems
• Database security and error recovery
Concepts and Definitions:
• Database – an organized collection of logically
related data
• Data – facts, text, graphics, images, sound and
video segments that have meaning in the user's
environment.
• Information – data that has been processed
so as to increase the knowledge of the person
using it.
• Metadata – data that describes the properties
or characteristics of other data.
• Metadata allows designers to understand what the
data means and helps to distinguish between
seemingly similar data.
• Database management system (DBMS) – a software
application used to create, maintain and
provide controlled access to user databases.
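The difference between data and metadata can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. The student table and its columns are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a DBMS (illustrative choice).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES (1, 'Amina')")

# Data: the facts stored about a student.
data = conn.execute("SELECT stud_id, name FROM student").fetchall()

# Metadata: data describing the properties of that data
# (column name, declared type, NOT NULL flag, primary-key flag).
metadata = [
    (row[1], row[2], bool(row[3]), bool(row[5]))
    for row in conn.execute("PRAGMA table_info(student)")
]

print(data)
print(metadata)
```

The first query returns the stored facts, while the PRAGMA query returns a description of the table itself, which is what a data dictionary holds.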
Traditional file processing systems:
• Computer file processing systems were used
before databases came to be.
• As business applications grew and became more
complex, the file systems experienced several
limitations:
1. Program-data dependence
- File descriptions are stored within each application
program that accesses a file
- A change to a file structure requires changes to the
file descriptions in all programs that access that file
- It is difficult to locate all the affected programs
- Errors are often introduced while making the changes
2. Duplication of data
- Applications are developed independently, hence
data duplication results
- Leads to loss of data integrity, additional storage
space and more effort in updating
3. Limited data sharing
- Each application has its own private files
- Users have little opportunity to share data
outside their own applications
4. Lengthy development times
- Each new application requires the developer to
design new file formats and descriptions
from scratch
- Little opportunity to benefit from previous
development efforts
5. Excessive program maintenance
- Results from all of the above limitations
The Database approach:
• Lays emphasis on the integration and sharing of
data throughout the organization.
• Advantages:
- Program-data independence – data descriptions are
separated from the application programs that use them
- Minimal data redundancy – separate data files are
integrated into a single logical structure
- Improved data sharing – data is a corporate resource
that authorized users can share, typically through views
- Improved data consistency – achieved by eliminating or
controlling redundancy, which also simplifies data
updating and saves storage space
- Increased productivity of application development –
due to re-use
- Enforcement of standards – a single point of authority
in administration establishes and enforces standards,
e.g. naming conventions and uniform procedures for
access, update and protection
- Reduced program maintenance – due to data
independence
• Disadvantages:
- Need for skilled personnel – initial skill in database
design and implementation, plus constant training to
cope with technological advancement
- Installation and management cost – cost of a multi-user
DBMS, possible upgrades to existing hardware and
training of staff
- Need for explicit back-up and recovery
- Organizational conflict – shared databases require
consensus on data definitions and ownership
Evolution of databases:
• 1970s – hierarchical and network
• 1980s – relational
• 1990s – object-oriented, object-relational
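3-schema architecture for database development:
[Figure: three levels. User view 1, User view 2, ... User view n (external schemas) at the top, the Conceptual schema in the middle and the Physical schema at the bottom.]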
External schema:
• Logical description of a portion of a database
required by a user to perform some task.
• Independent of the database technology
• A subset of the associated conceptual schema
relevant to a particular user or group of users
• Often corresponds to a business transaction,
e.g. a form or a report
Conceptual schema:
• Detailed specification of the overall structure
of organizational data
• Defines the whole database without reference
to how data will be stored
• Independent of the database technology
• Mainly expressed in graphical formats, e.g. ER
notation, OM notation
Physical schema:
• Contains the specification of how data from a
conceptual schema are stored in a computer's
secondary memory
• It is technology dependent, e.g. written in the
SQL of the chosen DBMS
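As a rough illustration of the three levels, the sketch below uses Python's sqlite3 module. The employee table, the staff_directory view and the index are hypothetical names. Mapping the levels onto a table, a view and an index is a simplification, not the only way to realize the architecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the overall structure of the (hypothetical) employee data.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Physical level: an index is a storage decision and changes no query results.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# External level: a user view exposing only what one group of users needs.
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")

conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Otieno", "Sales", 50000.0), (2, "Wanjiru", "Finance", 62000.0)],
)

directory = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()
print(directory)
```

Users of staff_directory never see the salary column, and the index could be dropped or changed without affecting either the view or the table definition.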
ENTITY-RELATIONSHIP MODEL
• E-R model
A detailed logical representation of the data for an
organization or a business area. It is expressed in
terms of entities, attributes and relationships.
• E-R diagram
A graphical representation of the E-R model
1. Entities
• Entity – a person, place, object, event or concept
in the user environment about which an organization
wishes to maintain data
E.g.
• Person – student, patient
• Place – country, city
• Object – building, machine
• Event – sale, registration
• Concept – account, course
• Entity type – a collection of entities that share
common characteristics. Usually expressed in
singular and its name written in capital letters.
• Entity instance – a single occurrence of an entity
type.
There are two kinds of entity type:
• Strong entity type – an entity type that exists
independently of other entity types, e.g.
EMPLOYEE
• Instances of a strong entity type always have a
unique characteristic called an identifier.
• Weak entity type – an entity type whose existence
depends on some other entity type. It does not
have its own identifier.
• The entity type on which a weak entity type
depends is called the identifying owner, while
the relationship between them is called the
identifying relationship
2. Attributes
• Attribute – a characteristic of an entity type that
is of interest to the organization.
Kinds of attributes:
• Composite attribute – one that can be broken down
into component parts, e.g. address
• Simple attribute – one that cannot be broken
down into smaller parts
• Multi-valued attribute – one that can take more
than one value for a given entity instance, e.g. skill
• Derived attribute – one whose values can be
calculated from related attribute values
• Identifier – an attribute that uniquely identifies
individual instances of an entity type.
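The kinds of attributes listed above can be sketched with Python dataclasses. The Employee and Address classes, their fields and the sample values are hypothetical and serve only to show one attribute of each kind.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    # Composite attribute: it has component parts.
    street: str
    city: str


@dataclass
class Employee:
    emp_id: int                 # identifier
    name: str                   # simple attribute
    address: Address            # composite attribute
    date_employed: date         # simple attribute
    skills: list[str] = field(default_factory=list)  # multi-valued attribute

    def years_employed(self, today: date) -> int:
        """Derived attribute: calculated from date_employed, not stored."""
        years = today.year - self.date_employed.year
        if (today.month, today.day) < (self.date_employed.month, self.date_employed.day):
            years -= 1
        return years


emp = Employee(1, "Achieng", Address("Moi Avenue", "Nairobi"), date(2015, 6, 1), ["SQL", "Java"])
print(emp.years_employed(date(2020, 5, 31)))
```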
3. Relationships
• Relationship type – a meaningful association
between entity types. Verbs are mainly used to
name them.
• Degree of a relationship – the number of entity
types that participate in that relationship:
• 1. Unary – a relationship between instances of a
single entity type
• 2. Binary – a relationship between instances of
two entity types
• 3. Ternary – a simultaneous relationship among
instances of three entity types
• Cardinality constraint
• Suppose there are two entities, A and B, connected
by a relationship. A cardinality constraint specifies
the number of instances of entity B that can be
associated with each instance of entity A.
• Minimum cardinality of a relationship is the minimum
number of instances of entity B that may be
associated with each instance of entity A.
• Maximum cardinality of a relationship is the maximum
number of instances of entity B that may be associated
with a single occurrence of entity A.
• By maximum cardinality, relationships are:
• One to one
• One to many
• Many to many
• Participation in a relationship may be optional or
mandatory for the entities involved:
• If the minimum cardinality is zero, participation is optional
• If the minimum cardinality is one, participation is mandatory
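Minimum and maximum cardinality can be checked mechanically against a set of relationship instances. The sketch below is only an illustration. The function name, the physician identifiers and the "admits" pairs are hypothetical.

```python
from collections import Counter


def satisfies_cardinality(a_instances, pairs, minimum, maximum=None):
    """Check that every instance of A is linked to between `minimum` and
    `maximum` instances of B. maximum=None stands for 'many'."""
    counts = Counter(a for a, _ in pairs)
    for a in a_instances:
        n = counts[a]
        if n < minimum or (maximum is not None and n > maximum):
            return False
    return True


physicians = ["P1", "P2", "P3"]
# (physician, patient) pairs for a hypothetical "admits" relationship.
admits = [("P1", "X"), ("P1", "Y"), ("P2", "Z")]

optional_many = satisfies_cardinality(physicians, admits, minimum=0)
mandatory_many = satisfies_cardinality(physicians, admits, minimum=1)
optional_one = satisfies_cardinality(physicians, admits, minimum=0, maximum=1)
print(optional_many, mandatory_many, optional_one)
```

With these pairs only the optional, many-valued constraint holds: P3 admits no patient, so participation is not mandatory, and P1 admits two patients, so the maximum is not one.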
Associative entity:
• An entity type that associates instances of one or
more entity types and contains attributes that are
peculiar to the relationship between those entity
instances.
• Represented using a diamond symbol enclosed
within an entity box
• The verb naming the relationship changes to a noun
naming the associative entity
• Indicated by the presence of more than one attribute
on the relationship
• For a relationship to be converted to an
associative entity type, the following conditions
should exist:
• 1. All of the relationships for the participating
entity types should be ‘many’ relationships
• 2. The associative entity should have one or
more attributes in addition to the identifier
• 3. The resulting associative entity should have
independent meaning to end users
• Class example: A hospital has a large number of registered physicians. Attributes
of physicians include physician-ID and specialty. Patients are
admitted to the hospital by physicians. Attributes of patient
include patient-ID and patient-name. Any patient who is
admitted must have exactly one admitting physician. A physician
may optionally admit any number of patients. Once admitted, a
given patient is treated by at least one physician. A
particular physician may treat any number of patients. Whenever
a patient is treated, the hospital wishes to record the details of
treatment (Treatment-Detail). Components of Treatment-Detail
include: Date, Time and Results.
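One possible reading of the hospital example, sketched as tables in SQLite through Python's sqlite3 module. This is an assumption-laden sketch rather than the model answer: the table and column names are invented, Treatment-Detail is modelled as an associative entity, and the rule that each patient is treated by at least one physician is not enforced by the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE physician (
    physician_id TEXT PRIMARY KEY,
    specialty    TEXT
);
CREATE TABLE patient (
    patient_id   TEXT PRIMARY KEY,
    patient_name TEXT,
    -- NOT NULL: every admitted patient has exactly one admitting physician
    admitting_physician_id TEXT NOT NULL REFERENCES physician (physician_id)
);
-- Associative entity for the many-to-many "treats" relationship
CREATE TABLE treatment_detail (
    physician_id TEXT NOT NULL REFERENCES physician (physician_id),
    patient_id   TEXT NOT NULL REFERENCES patient (patient_id),
    treat_date   TEXT NOT NULL,
    treat_time   TEXT NOT NULL,
    results      TEXT,
    PRIMARY KEY (physician_id, patient_id, treat_date, treat_time)
);
""")

conn.execute("INSERT INTO physician VALUES ('D1', 'Cardiology')")
conn.execute("INSERT INTO patient VALUES ('P1', 'Jane Doe', 'D1')")
conn.execute(
    "INSERT INTO treatment_detail VALUES ('D1', 'P1', '2024-01-05', '09:30', 'Stable')"
)

try:
    # Physician 'D9' does not exist, so this admission should be refused.
    conn.execute("INSERT INTO patient VALUES ('P2', 'John Doe', 'D9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

treatments = conn.execute("SELECT COUNT(*) FROM treatment_detail").fetchone()[0]
print(rejected, treatments)
```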

NORMALIZATION
• It is the process of decomposing relations with
anomalies to produce smaller well structured
relations.
• Normalization is based on the analysis of
functional dependencies.
• A functional dependency is a constraint between
two attributes or two sets of attributes. For any
relation R, attribute B is functionally dependent on
attribute A if, for every valid instance of A, that
value of A uniquely determines the value of B
(written A → B).
Example:
STUDENTCOURSE(StudID, CourseName, DateCompleted)
The functional dependency is represented as
StudID, CourseName → DateCompleted
This implies that the date a course is completed is
completely determined by the identity of the student
and the name of the course.
It also implies that StudID and CourseName work in
combination as the candidate key for that relation
NB: A candidate key is always a determinant while a
determinant may not always be a candidate key.
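A functional dependency can be tested against sample rows, as in the sketch below. The helper name holds and the sample data are hypothetical. Note that sample data can only refute a dependency. Whether it holds for every valid instance is a matter of business rules, not of the rows that happen to be stored.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in `rows` (a list of
    dicts): equal determinant values must give equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical STUDENTCOURSE rows.
rows = [
    {"StudID": 1, "CourseName": "DBMS", "DateCompleted": "2023-06-30"},
    {"StudID": 1, "CourseName": "Networks", "DateCompleted": "2023-12-15"},
    {"StudID": 2, "CourseName": "DBMS", "DateCompleted": "2023-06-30"},
]

full_key_fd = holds(rows, ["StudID", "CourseName"], ["DateCompleted"])
stud_only_fd = holds(rows, ["StudID"], ["DateCompleted"])
print(full_key_fd, stud_only_fd)
```

In this sample StudID alone does not determine DateCompleted, because student 1 completed two courses on different dates.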
• A candidate key is an attribute or combination
of attributes that uniquely identifies a row in a
relation.
• It has two properties:
1) Unique identification – for every row, the
value of the key must uniquely identify that
row
2) Non-redundancy – no attribute in the key can
be deleted without destroying the property
of unique identification
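The two properties can be checked on sample rows, as sketched below. The function names and rows are hypothetical, and, as with any check on sample data, a passing result only shows that the rows at hand do not contradict the key.

```python
def is_unique(rows, attrs):
    """Property 1: no two rows share the same values for attrs."""
    keys = [tuple(r[a] for a in attrs) for r in rows]
    return len(keys) == len(set(keys))


def is_candidate_key(rows, attrs):
    """Unique identification plus non-redundancy: dropping any single
    attribute must destroy uniqueness."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, [a for a in attrs if a != dropped]) for dropped in attrs
    )


# Hypothetical STUDENTCOURSE rows.
rows = [
    {"StudID": 1, "CourseName": "DBMS", "DateCompleted": "2023-06-30"},
    {"StudID": 1, "CourseName": "Networks", "DateCompleted": "2023-12-15"},
    {"StudID": 2, "CourseName": "DBMS", "DateCompleted": "2023-06-30"},
]

composite_key = is_candidate_key(rows, ["StudID", "CourseName"])
padded_key = is_candidate_key(rows, ["StudID", "CourseName", "DateCompleted"])
print(composite_key, padded_key)
```

The three-attribute combination still identifies every row, but it fails the non-redundancy test because DateCompleted can be removed without losing uniqueness.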
• SSN → Name, Address, BirthDate
• Implies that a person's name, address and
birth date are functionally dependent on that
person's social security number.
• The attribute on the left-hand side of the arrow in a
functional dependency is called a determinant
Basic Normal forms:
• First normal form (1NF)
• A relation is in 1NF if it contains no multi-
valued attributes.
• A table with multi-valued attributes is
converted to a relation in the first normal form
by extending the data in each column to fill
cells that were empty.
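The conversion to 1NF described above can be sketched as follows. The function name and the employee rows are hypothetical. The multi-valued Skill column is held as a list, and each value gets its own row with the other columns repeated.

```python
def to_first_normal_form(rows, multi_valued):
    """Repeat the single-valued columns once per value of the multi-valued
    column so that every cell holds exactly one value."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)
            new_row[multi_valued] = value
            flat.append(new_row)
    return flat


# Hypothetical table with a multi-valued Skill attribute.
employees = [
    {"EmpID": 1, "Name": "Kamau", "Skill": ["SQL", "Java"]},
    {"EmpID": 2, "Name": "Njeri", "Skill": ["Python"]},
]

flat = to_first_normal_form(employees, "Skill")
for row in flat:
    print(row)
```

After the conversion EmpID alone no longer identifies a row, so the key of the 1NF relation becomes the combination of EmpID and Skill.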
• Second Normal Form (2NF):
• A relation is in 2NF if it is in first normal form and
every non-key attribute is fully functionally
dependent on the primary key.
• A relation that is in 1NF will be in 2NF if any one of
the following conditions applies:
• 1. The primary key consists of only one attribute
• 2. No non-key attributes exist in the relation i.e. all
of the attributes in the relation are components of
the primary key
• 3. Every non-key attribute is functionally
dependent on the full set of primary key attributes
• To convert a relation into 2NF, we decompose the
relation into new relations that satisfy one or
more of the conditions above

• Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and no transitive
dependencies exist.
• A transitive dependency is a functional
dependency between two or more non-key
attributes
• Example:
• Consider the relation:
• SALES(Cust-ID, Name, SalesPerson, Region)
• Functional dependencies:
Cust-ID → Name, SalesPerson, Region
SalesPerson → Region
• Cust-ID is the primary key and all remaining
attributes are functionally dependent on it.
• However, there is a transitive dependency: Region
is functionally dependent on SalesPerson, which is
in turn functionally dependent on Cust-ID.
• This introduces insertion, deletion and modification
anomalies.
• These anomalies can be removed by decomposing
the relation SALES into two relations:
• SALES(Cust-ID, Name, SalesPerson)
• SALESPERSON(SalesPerson, Region)
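The decomposition of SALES can be tried out on sample rows. In the sketch below the customer data is invented, and project and natural_join are small hypothetical helpers rather than library functions. Joining the two smaller relations on SalesPerson gives back the original rows, so no information is lost.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order kept)."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(left, right, on):
    """Join two lists of dicts on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


# Hypothetical rows of the unnormalized SALES relation.
sales_unnormalized = [
    {"Cust-ID": 1, "Name": "Anderson", "SalesPerson": "Smith", "Region": "South"},
    {"Cust-ID": 2, "Name": "Hobbs", "SalesPerson": "Smith", "Region": "South"},
    {"Cust-ID": 3, "Name": "Glad", "SalesPerson": "Hicks", "Region": "West"},
]

sales = project(sales_unnormalized, ["Cust-ID", "Name", "SalesPerson"])
salesperson = project(sales_unnormalized, ["SalesPerson", "Region"])
rejoined = natural_join(sales, salesperson, "SalesPerson")

print(salesperson)
print(rejoined == sales_unnormalized)
```

Smith's region is now stored once in SALESPERSON instead of once per customer, which is what removes the modification anomaly.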
• Exercise:
• For each of the following relations indicate the
normal form for that relation. If the relation is not
in third normal form, decompose it into 3NF
relations. Functional dependencies other than
those implied by the Primary key are shown
where applicable.
• CLASS(Course-No, Section-No)
• CLASS(Course-No, Section-No, Room)
• CLASS(Course-No, Section-No, Room, Capacity)
Room → Capacity
• CLASS(Course-No, Section-No, CourseName, Room, Capacity)
Course-No → CourseName
Room → Capacity
DISTRIBUTED DATABASE SYSTEMS
• Distributed database
• A single logical database that is spread physically
across computers in multiple locations that are
connected by a data communications network. It
requires multiple database management systems,
one running at each remote site.
• The degree to which these different DBMSs
cooperate, and whether there is a master site that
coordinates the multiple sites, distinguish the different
types of distributed database environment.
Decentralized database:
• A database that is stored on computers at
multiple locations, but the computers are not
interconnected by a network and users cannot
share data.
• Business conditions that encourage use of
distributed databases:
1. Distribution and autonomy of business units – departments
and facilities in modern organizations are geographically
distributed and each unit may have authority to
create its own information
2. Data sharing – business decisions require sharing data
across business units, and it must be convenient to
consolidate data across local databases on demand
3. Data communication costs and reliability – it is more
economical to locate data and applications close to
where they are needed than to transfer data across
networks. (Dependence on communication
networks can be risky.)
• Objectives and trade-offs of distributed
databases:
• Location transparency – a user or an application
program querying or updating data need not know the
location of the data. The user is unaware of data
distribution. A distributed database must
provide location transparency.
• Local autonomy – capability to administer a local
database and to operate independently when
connections to other nodes have failed. This
implies that there is no reliance on a central site
and each local site can administer security, log
transactions and recover when local failures occur.
• Synchronous versus asynchronous distributed
databases
• Synchronous – all data across the network are
continuously kept up to date, so that a user at
any site can access data anywhere on the network
and get the same answer.
• It ensures data integrity
• It minimizes the complexity of tracing the most
recent copy of data
• However, it may result in slow response times
• Asynchronous – copies of replicated data are kept
at different nodes so that local
servers can access data without reaching out
across the network.
• Updates take time to propagate across the
remote databases, which causes temporary
inconsistency
• However, it has better response time
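The trade-off between the two approaches can be seen in a toy simulation. The ReplicatedValue class below is hypothetical and ignores networking entirely. It only shows that an asynchronous write leaves the remote copy stale until propagation runs, whereas a synchronous write does not.

```python
class ReplicatedValue:
    """Two copies of one data item, held at hypothetical sites 'A' and 'B'."""

    def __init__(self, value):
        self.copies = {"A": value, "B": value}
        self.pending = []  # updates not yet propagated (asynchronous only)

    def write_sync(self, value):
        # Synchronous: every copy is updated before the write completes.
        for site in self.copies:
            self.copies[site] = value

    def write_async(self, site, value):
        # Asynchronous: update the local copy now, propagate later.
        self.copies[site] = value
        self.pending.append(value)

    def propagate(self):
        # Apply the queued updates to every copy, in the order they were made.
        for value in self.pending:
            for site in self.copies:
                self.copies[site] = value
        self.pending.clear()

    def read(self, site):
        return self.copies[site]


item = ReplicatedValue(100)
item.write_async("A", 120)
stale_read = item.read("B")      # site B has not yet seen the update
item.propagate()
fresh_read = item.read("B")
item.write_sync(150)
sync_read = item.read("B")
print(stale_read, fresh_read, sync_read)
```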
• Advantages of distributed databases over
centralized databases:
• Increased reliability and availability – no total
failure when one site goes down
• Local control – each site exercises greater control
over its own data
• Modular growth – it is easier to add a local node and
its data
• Lower communication costs – data is located close
to the point of use
• Disadvantages:
• Software costs and complexity – a DDBMS is required
• Processing overhead – from coordination
among sites
• Slow response
• Data integrity problems – from increased
complexity and the need for coordination
Options for distributing a database
• 1. Data replication
• Involves storing a separate copy of the database
at each site. It is common with asynchronous
distributed database technology.
• Advantages:
• Reliability – back-up copies exist at other sites
• Fast response
• Node decoupling – each transaction may proceed
without coordination across the network
• Reduced network traffic
• Disadvantages:
• Storage requirements
• Complexity and cost of updating
• When to use replication:
• Data timeliness – applications that can tolerate
out-of-date data are suitable for replication
• DBMS capabilities – replication suits a DBMS whose
capabilities for referencing data across nodes are limited
• Communication network capabilities – no dedicated
connection is required
• Heterogeneity – with different operating systems and
database designs, replication gets complicated
• 2. Horizontal partitioning
• Rows of a relation are distributed to many
sites

• 3. Vertical partitioning
• Distributing the columns of a relation into
separate files stored at various sites while
repeating the primary key in each of the files
Advantages of partitioning
• Efficiency – data stored close to point of use and
separate from data used by other apps
• Local optimization – optimized performance for
local access
• Security – data not relevant to a site is not made
available there
Disadvantages:
• Inconsistent access speed
• Back-up vulnerability
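Horizontal and vertical partitioning can be contrasted on a small relation. The customer rows, the choice of Branch as the partitioning column and the helper vertical are all hypothetical.

```python
# Hypothetical CUSTOMER relation with primary key CustID.
customers = [
    {"CustID": 1, "Name": "Ali", "Branch": "Nairobi", "Balance": 500},
    {"CustID": 2, "Name": "Beth", "Branch": "Mombasa", "Balance": 750},
    {"CustID": 3, "Name": "Chris", "Branch": "Nairobi", "Balance": 120},
]

# Horizontal partitioning: whole rows are assigned to sites, here by Branch.
horizontal = {}
for row in customers:
    horizontal.setdefault(row["Branch"], []).append(row)


def vertical(rows, key, columns):
    """Vertical partitioning: keep the chosen columns plus the primary key."""
    return [{key: r[key], **{c: r[c] for c in columns}} for r in rows]


names = vertical(customers, "CustID", ["Name", "Branch"])
balances = vertical(customers, "CustID", ["Balance"])

# The original relation is rebuilt by joining the fragments on the key.
by_id = {r["CustID"]: r for r in balances}
rebuilt = [{**n, **by_id[n["CustID"]]} for n in names]

print(sorted(horizontal))
print(rebuilt == customers)
```

Repeating the primary key in each vertical fragment is what makes the rebuild possible. Without it the columns could not be matched up again.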
• DDBMS Architecture
• Each site has a local DBMS that manages the
database stored there
• Each site also has a copy of the DDBMS and
associated data dictionary which contains the
location of all data in the network as well as data
definitions
• Requests from users or applications are first processed
by the DDBMS, which determines whether the
transaction is local or global
• The DDBMS then routes the requests accordingly
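The routing decision can be sketched as a lookup in the data dictionary. The dictionary contents, site names and the route function below are hypothetical, and a real DDBMS would also plan and execute the request rather than just classify it.

```python
# Hypothetical data dictionary: which site stores each table.
DATA_DICTIONARY = {
    "customer": "nairobi",
    "orders": "nairobi",
    "inventory": "mombasa",
}


def route(request_site, tables):
    """Classify a request as local or global and list the sites involved."""
    sites = {DATA_DICTIONARY[t] for t in tables}
    kind = "local" if sites == {request_site} else "global"
    return kind, sorted(sites)


local_request = route("nairobi", ["customer", "orders"])
global_request = route("nairobi", ["customer", "inventory"])
print(local_request)
print(global_request)
```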
DATABASE SECURITY
• It is the protection of data against accidental or
intentional threats to its integrity and access.
Potential threats to data security:
- Accidental losses – human error, or breaches caused
by software or hardware
- Theft or fraud
- Loss of privacy or confidentiality
- Loss of data integrity – invalid or corrupted data
- Loss of availability – sabotage of hardware, networks
or applications
Important security features in data
management include:
- Views/sub-schemas – restrict the user's view of the
database
- Authorization rules – identify users and restrict
their actions against the database
- User-defined procedures – define additional constraints
- Encryption procedures – encode data in an
unrecognizable format
- Authentication schemes – positively identify a user
- Back-up, journaling and checkpointing capabilities
– facilitate recovery
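Authorization rules are often pictured as a table of users, objects and permitted actions. The sketch below assumes such a table. The users, the orders table and the action names are hypothetical, and a real DBMS would enforce such rules internally rather than in application code.

```python
# Hypothetical authorization table: (user, table) -> permitted actions.
AUTHORIZATION_RULES = {
    ("clerk", "orders"): {"read", "insert"},
    ("manager", "orders"): {"read", "insert", "update", "delete"},
}


def is_authorized(user, table, action):
    """Allow an action only if a rule explicitly grants it."""
    return action in AUTHORIZATION_RULES.get((user, table), set())


clerk_can_read = is_authorized("clerk", "orders", "read")
clerk_can_delete = is_authorized("clerk", "orders", "delete")
guest_can_read = is_authorized("guest", "orders", "read")
print(clerk_can_read, clerk_can_delete, guest_can_read)
```

Anything not granted is refused, so an unknown user such as guest gets no access by default.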
DATABASE RECOVERY
• Mechanisms for restoring a database quickly and
accurately after loss or damage
• Basic recovery facilities:
• 1. Back-up facilities
• For large databases – dynamic back-up and
incremental back-up
• 2. Journaling facilities – maintain an audit trail of
transactions and database changes
• Transaction log – records of transactions processed
against the database
• Change log – before-images and after-images of
modified records
• Security log – alerts of security violations
• 3. Checkpoint facility
• Facility whereby the DBMS periodically refuses to
accept any new transactions and goes to a quiet
state in which the database and transaction logs are
synchronized.
• 4. Recovery manager
• Module that restores the database to a correct
condition when a failure occurs and resumes
processing user requests
• Recovery techniques
1. Backward recovery or rollback
• Backs out or undoes unwanted changes
• Before-images of records that have been changed
are applied to the database to return it to an
earlier state
• Used to reverse changes made by transactions
that aborted or terminated abnormally
2. Forward recovery or roll forward
• Starts with an earlier copy of the database. The
after-images, which are the results of good transactions,
are applied and the database is quickly moved forward
to a later state
• It is fast and accurate since only the most recent
‘good’ after-images need to be applied
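Both techniques can be sketched with a change log that stores a before-image and an after-image for each modified record. The account records, transaction names and log layout below are hypothetical, and the database is reduced to a Python dictionary.

```python
def rollback(database, change_log, aborted_txn):
    """Backward recovery: apply the before-images of the aborted
    transaction's changes, newest first."""
    for txn, key, before, _after in reversed(change_log):
        if txn == aborted_txn:
            database[key] = before
    return database


def roll_forward(backup, change_log, committed):
    """Forward recovery: start from an earlier copy and apply the
    after-images of committed ('good') transactions in log order."""
    database = dict(backup)
    for txn, key, _before, after in change_log:
        if txn in committed:
            database[key] = after
    return database


backup = {"acct_A": 100, "acct_B": 50}
# Change log entries: (transaction, record key, before-image, after-image)
log = [
    ("T1", "acct_A", 100, 80),
    ("T1", "acct_B", 50, 70),
    ("T2", "acct_A", 80, 0),
]

# State of the database after T1 committed and T2 made its change.
current = {"acct_A": 0, "acct_B": 70}

after_rollback = rollback(dict(current), log, "T2")
after_roll_forward = roll_forward(backup, log, {"T1"})
print(after_rollback)
print(after_roll_forward)
```

Here both routes arrive at the same state: undoing T2 from the current database, or redoing T1 from the back-up copy.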
DATABASE CONCURRENCY

• Process of managing simultaneous operations
against a database so that integrity is maintained
and operations do not interfere with each other
in a multi-user environment
• Two approaches:
• Versioning/optimistic
• Locking / pessimistic
• Versioning:
• Each transaction is restricted to a view of the
database as of the time it started and when it
modifies a record, the DBMS creates a new
record version instead of overwriting the old
record.
• Whenever a conflict in record updates arises,
the earlier changes are given priority, since
each transaction is time-stamped
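A minimal sketch of versioning for a single record, assuming integer timestamps supplied by the caller. The VersionedRecord class and its conflict rule (refuse a commit if another change was committed after the transaction started) are one simple interpretation of the description above, not the mechanism of any particular DBMS.

```python
class VersionedRecord:
    """Keeps every committed version of one record with its timestamp."""

    def __init__(self, value):
        self.versions = [(0, value)]  # (commit timestamp, value), oldest first

    def read(self, start_ts):
        """Return the latest version committed at or before start_ts."""
        visible = [v for v in self.versions if v[0] <= start_ts]
        return max(visible, key=lambda v: v[0])[1]

    def commit(self, start_ts, commit_ts, new_value):
        """Add a new version unless another transaction committed a change
        after this one started (the earlier change wins)."""
        latest_ts = self.versions[-1][0]
        if latest_ts > start_ts:
            return False  # conflict: the caller must restart the transaction
        self.versions.append((commit_ts, new_value))
        return True


record = VersionedRecord(100)
# T1 starts at time 1 and T2 at time 2. Both see the original value.
t1_view = record.read(1)
t2_view = record.read(2)
t1_committed = record.commit(start_ts=1, commit_ts=3, new_value=90)
t2_committed = record.commit(start_ts=2, commit_ts=4, new_value=80)
print(t1_view, t2_view, t1_committed, t2_committed, record.read(2), record.read(3))
```

The old version is never overwritten, so a transaction that started at time 2 still reads 100 even after the new version has been committed at time 3.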
• Locking mechanism
• Any data that are retrieved by a user for
updating must be locked or denied to other
users until the update is completed or aborted
• It enforces sequential updating process that
prevents erroneous updates
• Types of locks
1. Shared locks/S-locks/Read locks
• Allow transactions to read but not update a
record. A transaction should place a shared lock
on a record when it will only read but not update
that record.
• Placing a shared lock on a resource prevents
another user from placing an exclusive lock on
that resource.
2. Exclusive lock/X-lock/Write lock
• Prevents other transactions from reading,
and therefore from updating, a record until it is
unlocked.
• A transaction should place an exclusive lock on
a resource/record when it is about to update
that record.
• Placing an exclusive lock on a record prevents
another user from placing any other type of
lock on that record.
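The compatibility rules for shared and exclusive locks can be sketched as a small lock table. The LockManager class is hypothetical and deliberately incomplete: it has no waiting queue and does not handle a transaction upgrading its own shared lock.

```python
class LockManager:
    """Grants shared ('S') and exclusive ('X') locks on records."""

    def __init__(self):
        self.locks = {}  # record -> (mode, set of transactions holding it)

    def acquire(self, txn, record, mode):
        """Return True if the lock is granted, False if txn must wait."""
        held = self.locks.get(record)
        if held is None:
            self.locks[record] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only shared locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release(self, txn, record):
        _mode, holders = self.locks[record]
        holders.discard(txn)
        if not holders:
            del self.locks[record]


lm = LockManager()
r1 = lm.acquire("T1", "rec", "S")   # first reader
r2 = lm.acquire("T2", "rec", "S")   # second reader shares the lock
r3 = lm.acquire("T3", "rec", "X")   # writer must wait for the readers
lm.release("T1", "rec")
lm.release("T2", "rec")
r4 = lm.acquire("T3", "rec", "X")   # writer now gets the lock
r5 = lm.acquire("T1", "rec", "S")   # reader must wait for the writer
print(r1, r2, r3, r4, r5)
```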
DEADLOCK
• Deadlock is a situation that results when two or
more transactions have each locked a resource
that another needs, so that each must wait for
the other to unlock its resource.
• There are two main ways of handling deadlocks:
1. Deadlock prevention – user programs must lock
all the resources they require at the beginning
of a transaction rather than one at a time.
2. Deadlock resolution – allow deadlocks to occur
but build mechanisms into the DBMS for detecting
and breaking them
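One common way to detect a deadlock is to look for a cycle in a wait-for graph, in which each transaction points to the transactions it is waiting on. The notes do not say which detection mechanism a DBMS uses, so the sketch below is an assumption, and the function name and transaction names are hypothetical.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph (transaction -> set of transactions
    it is waiting on) contains a cycle."""
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True  # reached a transaction already on the current path
        visiting.add(txn)
        if any(visit(other) for other in waits_for.get(txn, ())):
            return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(waits_for))


# T1 waits for T2 and T2 waits for T1: neither can proceed.
deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
# T1 waits for T2, but T2 is not waiting on anything.
not_deadlocked = has_deadlock({"T1": {"T2"}, "T2": set()})
print(deadlocked, not_deadlocked)
```

Once a cycle is found, the DBMS can break the deadlock by backing out one of the transactions in the cycle.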
