Database Systems Notes
Course Outline
• Concepts of DBMS
• File systems and databases
• Database models
• 3-schema architecture
• Entity relationship modeling
• Normalization
• Introduction to SQL
• Transaction management and concurrency control
• Distributed database systems
• Database security and error recovery
Concepts and Definitions:
• Database – an organized collection of logically
related data
• Data – consists of facts, text, graphics, images,
sound, and video segments that have meaning
in the user's environment.
• Information – Data that has been processed
so as to increase the knowledge of the person
using it.
• Metadata – data that describes the properties
or characteristics of other data.
• It allows designers to understand what the data
means and helps to distinguish between
seemingly similar data (see the catalog query
after these definitions)
• Database management system – a s/w
application used to create, maintain and
provide controlled access to user databases.
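As a small illustration of the data/metadata distinction, most relational DBMSs (e.g. PostgreSQL, MySQL, SQL Server) expose their metadata through a queryable catalog such as INFORMATION_SCHEMA. The STUDENT table below is an invented example, and catalog details vary slightly between products.

-- Data: an ordinary user table (illustrative names)
CREATE TABLE STUDENT (
    student_no    INTEGER PRIMARY KEY,
    student_name  VARCHAR(60) NOT NULL,
    date_of_birth DATE
);

-- Metadata: the DBMS's own description of that table
SELECT column_name, data_type, is_nullable
FROM   information_schema.columns
WHERE  table_name = 'student';  -- stored case of the name varies by DBMS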
Traditional file processing systems:
• Computer file processing systems were used
before databases came to be.
• As business applications grew and became more
complex, the file systems experienced several
limitations:
1. Program-data dependence
- File descriptions are stored within each application
program that accesses a file
- A change to a file structure requires changes to the
descriptions in every program that accesses that file
- It's difficult to locate all the affected programs
- Errors are introduced in managing the changes
2. Duplication of data
- Applications are developed independently, hence
data duplication results
- Leads to loss of data integrity, additional storage
space and more effort in updating
3. Limited data sharing
- Each application has its own private files
- Users have little opportunity to share data
outside their applications
4. Lengthy development times
- Each new application requires the developer to start
the design of new file formats and descriptions
from scratch
- Little opportunity to benefit from previous
development efforts
5. Excessive program maintenance
- Results from all of the above limitations
The Database approach:
• Lays emphasis on the integration and sharing of
data throughout the org.
• Advantages
Program-data independence – allows for
separation of data descriptions from application
programs
Minimal data redundancy – databases integrate
separate data files into a single logical structure
Improved data sharing – the database is a corporate
resource that authorized users can share through
views (see the view sketch after this list)
Improved data consistency
• achieved by eliminating redundancy. Simplifies data
updating and saves on storage space
Increased productivity of application development
• Due to re-use.
Enforcement of standards
• Single point authority in administration which
establishes and enforces standards e.g. naming
conventions, uniform procedures for access, updates
and protection
• Reduced program maintenance – due to data
independence
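A minimal SQL sketch of two of these advantages, using invented CUSTOMER/ORDERS names: customer details are held once in a single integrated table (minimal redundancy), and a view gives one user group just the portion it needs (improved sharing with program-data independence).

-- One integrated customer table instead of a copy per application
CREATE TABLE CUSTOMER (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(60) NOT NULL,
    credit_limit  DECIMAL(10,2),
    phone         VARCHAR(20)
);

-- Other data references the single copy instead of duplicating it
CREATE TABLE ORDERS (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES CUSTOMER(customer_id),
    order_date  DATE NOT NULL
);

-- A view exposes only what the sales group needs; the underlying
-- storage can change without that group's applications changing
CREATE VIEW sales_contact AS
SELECT customer_id, customer_name, phone
FROM   CUSTOMER;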
• Disadvantages:
Need for skilled personnel
- Initial skill in database design and implementation
- Constant training to cope with technology
advancement
Installation and management cost
- Cost of a multi-user DBMS
- May require upgrades to existing hardware
- Training of staff
Need for explicit back-up and recovery
Organizational conflict – shared databases require
consensus on data definitions and ownership
Evolution of databases:
• 1970s – hierarchical and network
• 1980s – relational
• 1990s – object-oriented, object-relational
3-schema architecture for database
development:
[Diagram: external, conceptual and physical schemas
and the mappings between them]
External schema:
• Logical description of a portion of a database
required by a user to perform some task.
• Independent of the database technology
• A subset of associated conceptual schema
relevant to a particular user or group of users
• May be understood as a business transaction
e.g. form, report
Conceptual schema:
• Detailed specification of the overall structure
of organizational data
• Defines the whole database without reference
to how data will be stored
• independent of the database technology
• Mainly expressed in graphical formats e.g. ER
notation, OM notation
Physical schema:
• Contains the specification of how data from a
conceptual schema are stored in a computer's
secondary memory
• It's technology-dependent, e.g. SQL
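A rough way to see the three levels in SQL terms (the PATIENT table, view and index names are invented for illustration): a base table corresponds to part of the conceptual schema, a view plays the role of an external schema, and an index is a physical-schema decision that users of the other two levels never see.

-- Conceptual level: the table as the organization defines it
CREATE TABLE PATIENT (
    patient_no   INTEGER PRIMARY KEY,
    patient_name VARCHAR(60) NOT NULL,
    ward_no      INTEGER,
    insurance_no VARCHAR(20)
);

-- External level: a receptionist's restricted view of the same data
CREATE VIEW reception_view AS
SELECT patient_no, patient_name, ward_no
FROM   PATIENT;

-- Physical level: an access-path/storage decision, invisible above
CREATE INDEX idx_patient_ward ON PATIENT (ward_no);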
ENTITY- RELATIONSHIP MODEL
• E-R model
A detailed logical representation of data for an
org or a business area. It is expressed in terms of
entities, attributes, relationships
• ER diagram
A graphical representation of the ER model
1. Entities
• Entity – a person, place, object, event or concept
in the user environment about which an org
wishes to maintain data
E.g.
• person - student, patient
• Place- country, city
• Object – building, machine
• Event – sale, registration
• Concept – account, course
• Entity type – a collection of entities that share
common characteristics. Usually expressed in
singular and its name written in capital letters.
• Entity instance – a single occurrence of an entity
type.
There are two kinds:
• Strong entity type – an entity type that exists
independently of other entity types, e.g.
EMPLOYEE
• Instances of strong entity types always have a
unique characteristic called identifier.
• Weak entity type – an entity type whose existence
depends on some other entity type. It does not
have its own identifier.
• The entity type on which a weak entity type
depends is called the identifying owner, while
the relationship between them is called the
identifying relationship
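One possible SQL rendering of a strong/weak entity pair, assuming an invented DEPENDENT entity owned by EMPLOYEE: the weak entity has no identifier of its own, so its key combines its partial identifier with the key of its identifying owner.

-- Strong entity type: has its own identifier
CREATE TABLE EMPLOYEE (
    emp_no   INTEGER PRIMARY KEY,
    emp_name VARCHAR(60) NOT NULL
);

-- Weak entity type: identified only together with its owner
CREATE TABLE DEPENDENT (
    emp_no         INTEGER NOT NULL REFERENCES EMPLOYEE(emp_no),
    dependent_name VARCHAR(60) NOT NULL,  -- partial identifier
    date_of_birth  DATE,
    PRIMARY KEY (emp_no, dependent_name)  -- identifying relationship
);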
2. Attributes
• Attribute – is a characteristic of an entity type that
is of interest to an org.
Kinds of attributes:
• Composite – one that can be broken down into
component parts e.g. address
• Simple attribute – one that cannot be broken
down into smaller parts
• Multi-valued attribute – one that can take more
than one value for a given entity instance e.g. skill
• Derived attribute – one whose values can be
calculated from related attribute values
• Identifier – one that uniquely identifies
individual instances of an entity type.
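A sketch of how these attribute kinds typically end up in a relational design (column and table names invented; the exact date arithmetic varies by DBMS): a composite address is stored as its parts, a multi-valued skill becomes a separate table, and a derived age is computed in a query rather than stored.

CREATE TABLE EMPLOYEE (
    emp_no        INTEGER PRIMARY KEY,  -- identifier attribute
    emp_name      VARCHAR(60) NOT NULL, -- simple attribute
    street        VARCHAR(60),          -- composite attribute (address)
    city          VARCHAR(40),          --   stored as its component parts
    postal_code   VARCHAR(10),
    date_of_birth DATE                  -- basis for a derived attribute
);

-- Multi-valued attribute (skill): one row per value in a separate table
CREATE TABLE EMPLOYEE_SKILL (
    emp_no INTEGER NOT NULL REFERENCES EMPLOYEE(emp_no),
    skill  VARCHAR(40) NOT NULL,
    PRIMARY KEY (emp_no, skill)
);

-- Derived attribute (age): calculated when needed, not stored
SELECT emp_no,
       EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM date_of_birth)
           AS approx_age
FROM   EMPLOYEE;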
3. Relationships
• Relationship type – is a meaningful association
between entity types. Verbs are mainly used to
name them.
• Degree of a relationship:
• Degree of a relationship is the number of entity
types that participate in that relationship.
• 1. Unary – a relationship between instances of a single
entity type
• 2. Binary – a relationship between instances of two
entity types
• 3. Ternary – a simultaneous relationship among
instances of three entity types
• Cardinality constraint
• Suppose there are two entity types, A and B,
connected by a relationship. A cardinality
constraint specifies the number of instances of
entity B that can be associated with each
instance of entity A.
• Minimum cardinality of a relationship is the minimum
number of instances of entity B that may be
associated with each instance of entity A.
• Maximum cardinality of a relationship is the maximum
number of instances of entity B that may be
associated with a single occurrence of entity A
• We have:
• One to one
• One to many
• Many to many
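In relational terms (DEPARTMENT, EMPLOYEE and PROJECT are invented names), a one-to-many relationship is usually implemented as a foreign key on the 'many' side, while a many-to-many relationship becomes a separate table whose key combines both participants.

CREATE TABLE DEPARTMENT (
    dept_no   INTEGER PRIMARY KEY,
    dept_name VARCHAR(40) NOT NULL
);

-- One-to-many: one department has many employees
CREATE TABLE EMPLOYEE (
    emp_no   INTEGER PRIMARY KEY,
    emp_name VARCHAR(60) NOT NULL,
    dept_no  INTEGER REFERENCES DEPARTMENT(dept_no)  -- NULL allowed: minimum cardinality 0
);

CREATE TABLE PROJECT (
    proj_no   INTEGER PRIMARY KEY,
    proj_name VARCHAR(40) NOT NULL
);

-- Many-to-many: an employee works on many projects and vice versa
CREATE TABLE WORKS_ON (
    emp_no  INTEGER NOT NULL REFERENCES EMPLOYEE(emp_no),
    proj_no INTEGER NOT NULL REFERENCES PROJECT(proj_no),
    PRIMARY KEY (emp_no, proj_no)
);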
DISTRIBUTED DATABASE SYSTEMS
• 3. Vertical partitioning
• Distributing the columns of a relation into
separate files stored at various sites while
repeating the primary key in each of the files
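A sketch of vertical partitioning for an invented CUSTOMER relation: its columns are split across two tables (which could be stored at different sites), the primary key is repeated in each fragment, and a join reassembles the full rows when needed.

-- Fragment kept at the sales site
CREATE TABLE CUSTOMER_SALES (
    customer_id   INTEGER PRIMARY KEY,  -- key repeated in every fragment
    customer_name VARCHAR(60) NOT NULL,
    phone         VARCHAR(20)
);

-- Fragment kept at the finance site
CREATE TABLE CUSTOMER_FINANCE (
    customer_id  INTEGER PRIMARY KEY,   -- same key, different columns
    credit_limit DECIMAL(10,2),
    balance      DECIMAL(10,2)
);

-- Reconstructing the original relation when both fragments are needed
SELECT s.customer_id, s.customer_name, f.credit_limit, f.balance
FROM   CUSTOMER_SALES s
JOIN   CUSTOMER_FINANCE f ON f.customer_id = s.customer_id;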
Advantages of partitioning
• Efficiency – data stored close to point of use and
separate from data used by other apps
• Local optimization – optimized performance for
local access
• Security – data not relevant to a site is not made
available there
Disadvantages
• Inconsistent access speed
• Back-up vulnerability
• DDBMS Architecture
• Each site has a local DBMS that manages the
database stored there
• Each site also has a copy of the DDBMS and
associated data dictionary which contains the
location of all data in the network as well as data
definitions
• Requests from users or apps are first processed
by the DDBMS, which determines whether the
transaction is local or global
• The DDBMS then routes the requests accordingly
DATABASE SECURITY
• It’s the protection of data against accidental or
intentional threats to its integrity and access.
Potential threats to data security:
Accidental losses – human error, s/w- or h/w-caused
breaches
Theft or fraud
Loss of privacy or confidentiality
Loss of data integrity – invalid/corrupted data
Loss of availability – sabotage of h/w, networks or
apps
Generally important security features in data
management include:
Views/sub-schemas – restrict each user's view of
the database
Authorization rules – identify users and restrict
their actions against the database
User-defined procedures – additional constraints
Encryption procedures – encode data in an
unrecognizable format
Authentication schemes – positively identify a user
Back-up, journaling and check pointing capabilities
– facilitate recovery
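Two of these mechanisms map directly onto standard SQL: views as sub-schemas and GRANT/REVOKE statements as authorization rules. The table, view and role names below are made up, and the roles are assumed to have been created by the DBA.

CREATE TABLE EMPLOYEE (
    emp_no   INTEGER PRIMARY KEY,
    emp_name VARCHAR(60) NOT NULL,
    dept_no  INTEGER,
    salary   DECIMAL(10,2)
);

-- View as a sub-schema: sensitive columns are simply not exposed
CREATE VIEW employee_public AS
SELECT emp_no, emp_name, dept_no
FROM   EMPLOYEE;

-- Authorization rules (assumes roles clerk_role and hr_role exist)
GRANT SELECT         ON employee_public TO clerk_role;
GRANT SELECT, UPDATE ON EMPLOYEE        TO hr_role;
REVOKE UPDATE        ON EMPLOYEE        FROM hr_role;  -- rules can be withdrawn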
DATABASE RECOVERY
• Mechanisms of restoring a database quickly and
accurately after loss or damage
• Basic recovery facilities
• 1. Back-up facilities
For large databases – dynamic back-up and
incremental back-up
2. Journaling facilities – maintain an audit trail of
transactions and database changes
Transaction log – records of transactions processed
against the database
Change log – before and after images of modified
records
Security log – alerts of security violations
• 3. Check-point facility
• Facility whereby the DBMS periodically refuses to
accept new transactions and goes to a quiet
state in which the database and transaction logs are
synchronized.
4. Recovery manager
• Module that restores database to a correct
condition when failure occurs and resumes
processing user requests
• Recovery techniques
1. Backward recovery or rollback
• Backs out or undoes unwanted changes
• Before images of records that have been changed
are applied to the database to return it to an
earlier state
• Used to reverse changes made by transactions
that aborted or terminated abnormally
2. Forward recovery or roll forward
Starts with an earlier copy of the database; after-
images, which are the results of good transactions, are
applied and the database is quickly moved forward
to a later state
It's faster and accurate since only the most recent
‘good’ after-images need to be applied
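The rollback idea is visible at the level of a single SQL transaction: changes made inside the transaction only become permanent on COMMIT, and ROLLBACK undoes them using the before-images kept in the log. The ACCOUNT table and amounts are invented; BEGIN is written START TRANSACTION in some dialects.

CREATE TABLE ACCOUNT (
    account_no INTEGER PRIMARY KEY,
    balance    DECIMAL(10,2) NOT NULL
);
INSERT INTO ACCOUNT VALUES (1, 500.00), (2, 200.00);

BEGIN;  -- start a transaction
UPDATE ACCOUNT SET balance = balance - 100 WHERE account_no = 1;
UPDATE ACCOUNT SET balance = balance + 100 WHERE account_no = 2;
ROLLBACK;  -- before-images restore both rows; COMMIT would make the changes permanent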
DATABASE CONCURRENCY