Database Notes From Adada
3.5.1 Advantages of Normalisation Approach
3.5.2 Disadvantages
Exercise
6.2.1.1 Date's 12 rules of distribution
APPENDIX
Martin O. Adada
Revision Questions
1.0 OVERVIEW OF DATABASE SYSTEMS
Such a typical file-processing system has the drawback that more and
more files and application programs keep being added to the system
over time. Such a scheme has a number of major disadvantages:
ii. Difficulty in accessing - Suppose that one of the bank
officers needs to find out the names of all customers who
live within the city's 78 phone-code area. The officer would
ask the data processing department to generate such a list.
Such a request may not have been anticipated when the
system was originally designed, and the only options
available are:
Conclusion.
These difficulties, among others, have prompted the development of
DBMSs.
Unlike the file system, with its many separate and unrelated files,
the database consists of logically related data stored in a single
data repository. The problems inherent in file systems make using
a database system very desirable, and the database therefore
represents a change in the way end-user data are stored, accessed
and managed.
This is a database that supports multiple users at the same time.
When it serves a relatively small number of users, e.g. 50 users in
a department, the database is referred to as a workgroup database,
while one that supports many departments is called an enterprise
database.
iv. Distributed database system
4. Integrity - Centralized control can also ensure that adequate
checks are incorporated into the DBMS to provide data integrity.
Data integrity means that the data contained in the database is
both accurate and consistent, e.g. employee age must be
between 25 and 28 years.
Hardware
Workspace (disks for storage)
Migration (movement from traditional separate systems to
an integrated one)
2. Centralization Problems
Figure: File System Environment versus Database Environment. In
the file system environment, the Personnel, Sales and Accounts
departments each keep separate files. In the database environment,
the Personnel, Sales and Accounting departments share Employees,
Customers, Sales, Inventory and Accounts data held in one database
managed by the DBMS, forming an integrated system.
The current generation of DBMS software stores not only the data
structures in a central location but also the relationships between
the database components.
The DBMS also takes care of defining all the required access
paths to those components.
Hardware
This identifies all the system's physical devices, e.g. computers,
peripherals, storage devices etc.
Software
This is the collection of programs used by the computers within
the database system:
i. O.S (operating system) - manages all hardware components and
makes it possible for all other software to run on the
computers.
ii. The DBMS - manages the database within the database
system, e.g. Oracle, DB2, MS Access etc.
iii. Application programs and utilities used to access and
manipulate data in the DBMS.
People
These are all the users of the database system:
1. Systems administrator - Oversees the database system's general
operations.
2. Database administrator (DBA) - Manages the use of the DBMS and
ensures that the database is functioning properly. The DBA's
functions include:
DDL compiler or the data storage and definition language
compiler to generate modifications to the appropriate internal
system tables, e.g. the data dictionary.
iv. Granting authorization for data access - This regulates
which parts of the database various users can access.
v. Specifying integrity constraints - The database manager keeps
integrity constraints in a special system structure and checks
them whenever an update takes place in the system.
5. End users - These are the people who use the application
programs to run the organisation's daily operations. They fall
into the following classes:
Procedures
These are instructions and rules that govern the design and use
of the database system.
They enforce the standards by which business is conducted within
the organisation and with customers.
They also ensure that there is an organized way to monitor and
audit both the data that enter the database and the information
that is generated through the use of such data.
DATA
This covers the collection of facts stored in the database. Since
data is the raw material from which information is generated,
determining what data is to be stored in the database and how it
is to be organized is a vital part of the database designer's
job.
2.0 DATABASE ARCHITECTURE AND ENVIRONMENT
In a non-database environment, each application is responsible for
maintaining the currency of its own data and for handling any
change in a data item.
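Figure: The three levels of abstraction for an Employee record.
External views (User 1, User 2): each user sees only some Employee
fields, e.g. Employee.Address and Employee.Annual_Sal.
Conceptual view (maintained by the DBA): Employee.Name: String;
Employee.Soc_Sec_No: Integer; Employee.Address: String;
Employee.Annual_Sal: Double.
Internal view (maintained by the DBA):
Name: String, length 25, offset 0
Soc_Sec_No: Integer, length 9, offset 25
Address: String, length 5, offset 34
Salary: 9,2 decimal, offset 39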
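The internal view above places each Employee field at a fixed offset in a fixed-length record. The Python sketch below reproduces that layout under stated assumptions: every field is stored as fixed-width ASCII text, and the 9,2 decimal salary occupies 9 characters (neither detail is given in the notes). `LAYOUT`, `offsets`, `pack_employee` and `unpack_employee` are hypothetical names.

```python
# Hypothetical internal layout for the Employee record: (field, width in bytes).
LAYOUT = [("Name", 25), ("Soc_Sec_No", 9), ("Address", 5), ("Salary", 9)]
RECORD_LENGTH = sum(width for _, width in LAYOUT)


def offsets(layout):
    """Return {field: byte offset}, each offset being the sum of earlier widths."""
    result, pos = {}, 0
    for name, width in layout:
        result[name] = pos
        pos += width
    return result


def pack_employee(values):
    """Encode a dict of field values as one fixed-length, space-padded record."""
    parts = []
    for name, width in LAYOUT:
        text = str(values[name])
        if len(text) > width:
            raise ValueError(f"{name} does not fit in {width} bytes")
        parts.append(text.ljust(width).encode("ascii"))
    return b"".join(parts)


def unpack_employee(record):
    """Decode a fixed-length record back into a dict of field values."""
    start = offsets(LAYOUT)
    return {name: record[start[name]:start[name] + width].decode("ascii").rstrip()
            for name, width in LAYOUT}


print(offsets(LAYOUT))  # {'Name': 0, 'Soc_Sec_No': 25, 'Address': 34, 'Salary': 39}
rec = pack_employee({"Name": "Mary", "Soc_Sec_No": "123456789",
                     "Address": "Box 1", "Salary": "123456.78"})
print(len(rec), unpack_employee(rec))
```

Because applications work only with the conceptual Employee fields, a layout like this could be changed (for example, by widening Address) without rewriting them, provided the internal-to-conceptual mapping is updated.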
The view of each level is described by a schema, which is an
outline or plan that describes the records and relationships existing
in the view. It also describes the way in which entities at one level
of abstraction can be mapped onto the next level.
External Level (External or User view)
This is the highest level of database abstraction, where only
those portions of the database of concern to a user or
application program are included.
Internal View
This is the lowest level of abstraction, closest to the physical
storage method used.
It indicates how data will be stored and describes the data
structures and access methods to be used by the database. It is
defined by the internal schema.
Figure: The three-level architecture. At the external level, the
user/application views (View A, View B, View C); at the conceptual
level, the conceptual view; at the internal level, the internal view.
Mapping between views
Two mappings are required: one between the external and conceptual
views, and another between the conceptual and internal views.
Data Independence
This is the immunity of users and application programs to changes
in the storage structure and access mechanisms.
The three levels of abstraction, along with the mappings from
internal to conceptual and from conceptual to external, provide two
distinct levels of data independence, i.e.:
Logical Data Independence
Physical Data Independence
This indicates that the physical storage structures or devices used
for storing the data can be changed without necessitating a change
in the conceptual view or in any of the external views. Any change
is absorbed by the mapping between the conceptual and internal
views.
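A small illustration with Python's built-in `sqlite3` module, under the assumption that adding an index counts as a change to the physical storage structures: the hypothetical `employee` table gains a new access path, yet the application's query text and its result are unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, address TEXT, annual_sal REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Mary", "Nairobi", 50000.0),
    ("Maina", "Mombasa", 42000.0),
    ("Judy", "Nairobi", 61000.0),
])

# The application's query refers only to logical names, never to storage details.
QUERY = "SELECT name FROM employee WHERE address = ? ORDER BY name"
before = conn.execute(QUERY, ("Nairobi",)).fetchall()

# A change at the internal level: a new access path (index) on address.
conn.execute("CREATE INDEX idx_employee_address ON employee(address)")
after = conn.execute(QUERY, ("Nairobi",)).fetchall()

print(before == after)  # True: same query text, same result
```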
2.3 Components Of The DBMS
Almost all DBMSs use SQL, and they run on machines ranging from
microcomputers to large mainframes.
10. Data Dictionary (DD) - This provides the following
facilities:
Documentation of data items
Provision of standard definitions and names for data items
Data item descriptions
Removal of redundancy in the documentation of data items
Documentation of relationships between data items
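SQLite is one DBMS that keeps such metadata in its own catalog. In this sketch (the `loan` table is hypothetical), the table definition and column descriptions are read back from the catalog through the `sqlite_master` table and `PRAGMA table_info`, a small-scale analogue of the data dictionary facilities listed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, "
             "branch_name TEXT, amount REAL)")

# sqlite_master is SQLite's catalog of schema objects and their definitions.
for name, sql in conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
    print(name, "->", sql)

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(loan)")]
print(columns)
```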
ii. Data Storage Management - The DBMS creates the complex
structures required for data storage, thus relieving us of the
difficult task of defining and programming the physical data
characteristics. A modern DBMS provides storage not only for
data but also for related data entry forms or screen
definitions, report definitions, data validation rules,
procedural code, and structures to handle video and picture
formats etc.
ensure that multiple users can access the database concurrently
and still guarantee the integrity of the database.
The DBMS may provide communication functions to access the
database through the Internet, using web browsers such as
Netscape or Internet Explorer as the front-ends.
2.5 Overall System Structure
File Manager
This manages the allocation of space on disk storage and the
data structures used to represent information stored on disk. It
deals mainly with the physical aspects.
Database Manager
This provides the interface between the low-level data stored in
the database and the application programs and queries submitted
to the system.
Query Processor
This translates statements in a query language into low-level
instructions that the database manager understands. In addition,
the query processor attempts to transform a user request into an
equivalent but more efficient form, thus finding a good strategy for
executing the query.
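SQLite's `EXPLAIN QUERY PLAN` gives a glimpse of this strategy selection. In the sketch below (table and index names are hypothetical), the planner reports a full table scan until an index on `branch_name` exists and then switches to an index search; the exact plan wording differs between SQLite versions, so only key words are checked:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount REAL)")
QUERY = "SELECT loan_number FROM loan WHERE branch_name = 'Moi Avenue'"


def plan(sql):
    """Return the planner's chosen strategy; the last column of each row is its description."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before_plan = plan(QUERY)   # no index yet, so the table must be scanned
conn.execute("CREATE INDEX idx_loan_branch ON loan(branch_name)")
after_plan = plan(QUERY)    # the planner can now search through the index

print(before_plan)
print(after_plan)
```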
DML Pre-compiler
This converts DML statements embedded in an application program
into normal procedure calls in the host language. The pre-compiler
must interact with the query processor in order to generate the
appropriate code.
Figure: Major Components of a DBMS. Programmers, users and the DBA
work through the DML pre-processor, the query processor and the DDL
compiler, which operate on the database and system catalog via the
system buffers.
Database Life Cycle (DBLC)
2. Database Design
3. Implementation
5. Operation
3.0 CONCEPTUAL DATA MODEL
A database model is a collection of logical constructs used to
represent the data structure and relationships found within the
database.
These are models that are used to describe data at the lowest level.
They are very few in number and the two widely known ones are:
i. Unifying model
ii. Frame memory model
NB: Like the E-R model, the object-oriented model is based on a
collection of objects, where an object contains values stored in
instance variables within the object.
3.2.1 E-R Model Basic Concepts
The model employs the following components:
Entity sets
Relationship sets
Attributes
1. Entity sets
An entity is a thing or object in the real world that is
distinguishable from all other objects. It may be concrete, e.g. a
person or a book, or it may be abstract, e.g. a loan, a holiday or a
concept. An entity set is a set of entities of the same type that
share the same properties or attributes, e.g. the set of all persons
who are customers of a bank.
2. Relationship sets
An association between two or more entities is called a
relationship.
3. Attributes
They are descriptive properties or characteristics possessed by
each member of an entity set.
like dependent_name can take several values (from 0 to n), so it
is said to be multivalued.
3.2.3 Relationship Sets
A relationship is an association among several entities, while a
relationship set is a set of relationships of the same type. It is a
mathematical relation on $n \geq 2$ (possibly non-distinct) entity
sets. For example, consider two entity sets, loan and branch. A
relationship set loan_branch can be defined to denote the
association between a bank loan and the branch at which that loan
is obtained.
Example
Consider 2 entity sets Customer and loan.
A relationship set borrower can be defined to denote the
association between customers and the bank loans that the
customers have.
Types Of Relationships
Figure: one-to-one mapping between entities a1 to a4 of A and b1 to
b4 of B.
ii. One to Many relationship (1:M) - An entity in A is associated
with any number of entities in B, while an entity in B can be
associated with at most one entity in A.
Figure: one-to-many mapping between entities a1 to a5 of A and b1
to b5 of B.
iii. Many to One relationship (M:1) - An entity in A is associated
with at most one entity in B, while an entity in B can be associated
with any number of entities in A.
Figure: many-to-one mapping between entities a1 to a5 of A and b1
to b5 of B.
Figure: a further mapping between entities a1 to a4 of A and b1 to
b4 of B.
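The mapping definitions above can be checked against a sample of relationship instances. This hypothetical helper counts, in each direction, how many entities an entity is paired with; it also reports M:N (many to many) when both directions have many. It inspects only the pairs it is given, whereas a real cardinality constraint must hold in every legal state of the database.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify relationship instances, given as (a, b) pairs, as 1:1, 1:M, M:1 or M:N."""
    pairs = set(pairs)                          # a relationship set has no duplicates
    b_per_a = Counter(a for a, _ in pairs)      # number of b's each a is linked to
    a_per_b = Counter(b for _, b in pairs)      # number of a's each b is linked to
    a_has_many = any(n > 1 for n in b_per_a.values())
    b_has_many = any(n > 1 for n in a_per_b.values())
    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:M"      # an a may have many b's; each b has at most one a
    if b_has_many:
        return "M:1"      # each a has at most one b; a b may have many a's
    return "1:1"


print(mapping_cardinality([("a1", "b1"), ("a1", "b2"), ("a2", "b3")]))  # 1:M
```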
Existence Dependencies
Exercise.
Differentiate between super keys, primary keys and candidate
keys.
Exercise.
Draw an E-R diagram that shows the hospital environment: theatres,
patients (in-patients and out-patients), doctors, nurses, wards and
ward beds.
Strong Entity Set
This is an entity set that has a primary key. For a weak entity set
to be meaningful, it must be part of a one-to-many relationship set.
Specialization
An entity set may include sub-groupings of entities that are distinct
in some way from other entities in the set. This is called
specialization of the entity set, e.g. the entity set bank account
could have different types, such as credit, checking and savings
accounts, each of which may have its own attributes:
Savings account - interest rate
Checking account - overdraft amount
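Specialization corresponds closely to inheritance in object-oriented languages. A minimal Python sketch, assuming a hypothetical `Account` superclass with `account_number` and `balance` attributes: each subclass inherits these and adds the attribute listed above for its account type.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Attributes shared by every account (the higher-level entity set)."""
    account_number: str
    balance: float


@dataclass
class SavingsAccount(Account):
    """Specialised entity set: adds an attribute of its own."""
    interest_rate: float


@dataclass
class CheckingAccount(Account):
    """Specialised entity set: adds an attribute of its own."""
    overdraft_amount: float


accounts = [SavingsAccount("A-101", 500.0, 0.05),
            CheckingAccount("A-215", 700.0, 200.0)]
for acc in accounts:
    print(type(acc).__name__, acc)
```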
Aggregation
This is the abstraction through which relationships are treated as
higher-level entities, e.g. the relationship set borrower, together
with the entity sets customer and loan, can be treated as a
higher-level entity set called borrower as a whole.
3.4 Entity modeling (Diagrammatic representation)
Figure: example relationships between Student and Payment, and
between Lecturer and Student.
Exercise
A company consists of a number of departments, each having a
number of employees. Each department has a manager who must be
on a monthly payroll; other employees are on either a monthly or a
weekly payroll and are members of the sports club if they so wish.
Construct an entity-relationship diagram depicting the scenario.
relationships are shown by use of either a bar drawn across the
line or a continuous line.
Figure: an optional relationship (Family and Child) and a mandatory
relationship (Course and Student).
Representing Attributes
Exercise
Consider the entity relationship Student_Course that defines a
course undertaken by many students.
Generate a sample tabular representation of the above assuming
key attributes are course-code and stud-no respectively.
recorded about a surgeon includes name, address and phone
number.
Required.
3.5 DATA NORMALIZATION
1NF
A table or relation is said to be in first normal form if and only if
it contains no repeating groups, i.e. it has no repeated values for a
particular attribute within a single record. Any repeating groups of
attributes should be isolated to form a new entity.
2NF
A table is said to be in 2NF if and only if it is in 1NF and every
non-key attribute is fully dependent on the whole key, not just on
part of it. Attributes that are not fully dependent should be isolated
to form a new entity.
3NF
A table is said to be in 3NF if and only if it is in 2NF and no
non-key attribute is dependent on any other non-key attribute.
All non-key attributes that are dependent on other non-key
attributes should be isolated to form a new entity.
Example: An invoice
Figure: sample invoice form showing address lines, an amount field
and the closing line "Thank you."
Un-normalised data.
Invoice (Invoice no., Date, Customer, Cust_address, Deliv_To,
Product code, Quantity, Unit Price, Amount, Invoice amount)
INVOICE (Invoice number, Date, Customer address, Deliv_address,
Invoice_Amount)
PRODUCT (Product code, Invoice number, Product description,
Quantity, Unit price, Amount)
Corresponding ERD
Figure: the corresponding ERDs, first relating Invoice and Product,
then relating Customer, Invoice and Product.
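One possible decomposition of the invoice example into 3NF, sketched in Python. The sample values, the field names and the four resulting relations (CUSTOMER, INVOICE, INVOICE_LINE, PRODUCT) are assumptions for illustration, not the notes' own final answer; each relation is held as a set of tuples:

```python
# Hypothetical un-normalised invoice: the product lines form a repeating group.
invoice = {
    "invoice_no": 1001, "date": "2024-01-15", "customer": "C01",
    "cust_address": "Box 1, Nairobi", "deliv_to": "Box 1, Nairobi",
    "lines": [
        {"product_code": "P1", "description": "Pen", "quantity": 10, "unit_price": 20},
        {"product_code": "P2", "description": "Pad", "quantity": 2, "unit_price": 150},
    ],
}


def normalise(inv):
    """Split one invoice into four relations, each a set of tuples."""
    # 1NF: move the repeating group out, one row per (invoice_no, product_code).
    lines = {(inv["invoice_no"], ln["product_code"], ln["quantity"],
              ln["quantity"] * ln["unit_price"]) for ln in inv["lines"]}
    # 2NF: description and unit price depend only on product_code, which is
    # just part of the (invoice_no, product_code) key, so they move out.
    products = {(ln["product_code"], ln["description"], ln["unit_price"])
                for ln in inv["lines"]}
    # 3NF: cust_address depends on customer, a non-key attribute of the invoice.
    customers = {(inv["customer"], inv["cust_address"])}
    # The invoice amount is derived from the lines (some designs omit it).
    total = sum(row[3] for row in lines)
    invoices = {(inv["invoice_no"], inv["date"], inv["customer"],
                 inv["deliv_to"], total)}
    return {"CUSTOMER": customers, "INVOICE": invoices,
            "INVOICE_LINE": lines, "PRODUCT": products}


for name, rows in normalise(invoice).items():
    print(name, sorted(rows))
```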
3.5.2 Disadvantages
Exercise
A bank stores customer account details in a table with the following
structure. Normalise this data to 3NF.
Customer (branch_no, account_no, address, postcode, tel)
A hospital drug dispensing record requires that, for each
patient, the pharmacy must record the following information.
Total ____________
Paid ____________
Balance ____________
(b) Normalise the table to 3NF, showing clearly the results of each
stage.
4.0 RELATIONAL DATABASE SYSTEM
Motivation
1. To shield programmers and users from the structural
complexities of the database.
2. For conceptual simplicity
Properties of Relations
1. There are no duplicate tuples - The body of a relation is a
mathematical set, which by definition does not include duplicate
elements.
2. Tuples are unordered - Sets are unordered.
3. Attributes are unordered - The heading of a relation is also a set,
so it is unordered.
4. All attribute values are atomic, meaning that relations do not
contain repeating groups (they are normalized).
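Properties 1 to 3 follow directly from treating a relation as a set. In this sketch (hypothetical data), a tuple is modelled as a frozenset of (attribute, value) pairs and a relation as a frozenset of such tuples, so duplicates disappear and neither tuple order nor attribute order matters:

```python
def tup(**attrs):
    """Model a tuple as a set of (attribute, value) pairs, so attribute order is irrelevant."""
    return frozenset(attrs.items())


r1 = frozenset({tup(stud_id="001", name="Charles"),
                tup(name="Mary", stud_id="002"),
                tup(stud_id="002", name="Mary")})   # same tuple as the line above
r2 = frozenset({tup(stud_id="002", name="Mary"),
                tup(name="Charles", stud_id="001")})

print(len(r1))   # 2: the duplicate tuple is absorbed by the set
print(r1 == r2)  # True: neither tuple order nor attribute order matters
```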
Primary Keys
These are a special case of the more general construct, the candidate
key. A candidate key is a unique identifier, and each relation has at
least one candidate key. For a given relation, one of the candidate
keys is chosen to be the primary key and the rest are called alternate
keys.
Let $R$ be a relation with attributes $A_1, A_2, \ldots, A_n$. The set of
attributes $K = \{A_i, A_j, \ldots, A_k\}$ of $R$ is said to be a candidate
key of $R$ if it satisfies the following two time-independent properties:
i. Uniqueness - At any given time, no two distinct tuples of $R$
have the same values for $A_i, A_j, \ldots, A_k$.
ii. Minimality - None of $A_i, A_j, \ldots, A_k$ can be discarded
from $K$ without destroying the uniqueness property.
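The two properties can be tested on a particular relation instance. This is only a sketch: one instance can show that an attribute set is not a candidate key, but it cannot prove that it is one, because the properties must hold at all times. Rows are dictionaries, and `is_unique` and `is_candidate_key` are hypothetical names.

```python
def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in sorted(attrs))
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """Check uniqueness and minimality of attrs on one relation instance.

    Dropping one attribute at a time is enough to test minimality,
    because any superset of a unique attribute set is also unique.
    """
    attrs = set(attrs)
    if not attrs or not is_unique(rows, attrs):
        return False
    return all(not is_unique(rows, attrs - {a}) for a in attrs)


rows = [{"stud_id": "001", "name": "Charles", "code": "IMIS"},
        {"stud_id": "002", "name": "Mary", "code": "BIT"},
        {"stud_id": "003", "name": "Maina", "code": "BIT"}]
print(is_candidate_key(rows, {"stud_id"}))           # True
print(is_candidate_key(rows, {"stud_id", "code"}))   # False: not minimal
```

In this sample, `{"name"}` also passes the test, which is why candidate keys are chosen from the meaning of the data rather than from the rows present at one moment.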
4.2 Relational Database Language
SELECT
This corresponds to the projection operation of the relational
algebra. It is used to list the attributes desired in the result of a
query.
FROM
This corresponds to the Cartesian product operation of the
relational algebra. It lists the relations to be scanned in the
evaluation of the expression.
WHERE
This corresponds to the selection predicate of the relational
algebra. It consists of a predicate involving attributes of the
relations that appear in the FROM clause.
A typical SQL query has the form:
SELECT A1, A2, A3, ..., An
FROM r1, r2, r3, ..., rm
WHERE P
Each Ai represents an attribute, each ri a relation, and P is a
predicate.
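In relational-algebra terms, such a query is a projection on A1, ..., An of a selection by P over the Cartesian product of the listed relations. The Python sketch below (hypothetical `borrower` and `loan` data) evaluates one query both through SQLite and step by step as product, selection and projection. SQL keeps duplicate rows unless DISTINCT is used, so the two results are compared as sets:

```python
import sqlite3
from itertools import product

borrower = [("Adams", "L-16"), ("Jones", "L-17")]
loan = [("L-16", "Moi Avenue", 1300.0), ("L-17", "River Road", 1000.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount REAL)")
conn.executemany("INSERT INTO borrower VALUES (?, ?)", borrower)
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", loan)

sql_result = set(conn.execute(
    "SELECT customer_name, branch_name FROM borrower, loan "
    "WHERE borrower.loan_number = loan.loan_number"))

# FROM: Cartesian product; WHERE: selection; SELECT: projection.
cartesian = (b + l for b, l in product(borrower, loan))
selected = (t for t in cartesian if t[1] == t[2])   # borrower.loan_number = loan.loan_number
projected = {(t[0], t[3]) for t in selected}        # customer_name, branch_name

print(sql_result == projected)  # True
```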
Select clause
Example (i): find the names of all branches in the loan relation.
SELECT branch_name
FROM loan
STUDENT
Code   Stud.id   Name
IMIS   001       Charles
BIT    002       Mary
BIT    003       Maina
CIT    004       Judy

COURSE
Code   Title
IMIS   Info. Systems
BIT    Bachelor of IT
CIT    Cert in IT
DIT    Dip in IT
The select clause can also contain arithmetic expressions
involving the operators +, -, * and / operating on constants or
attributes of tuples, e.g.
SELECT branch_name, loan_number, amount * 100
FROM loan
Where Clause
This specifies a condition that has to be met. SQL uses the logical
connectives AND, OR and NOT in the WHERE clause. Their operands
can include the comparison operators <, <=, >, >=, = and <>. SQL also
includes a BETWEEN comparison operator, e.g.
(i) SELECT loan_number
FROM loan
(ii) SELECT loan_number
FROM loan
WHERE branch_name = 'River Road' AND amount BETWEEN
10000 AND 15000
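Query (ii) can be run against a small hypothetical `loan` table using SQLite through Python's `sqlite3` module. The sample amounts show that BETWEEN includes both end values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
    ("L-11", "River Road", 9000), ("L-14", "River Road", 10000),
    ("L-15", "River Road", 15000), ("L-16", "Moi Avenue", 12000)])

rows = conn.execute(
    "SELECT loan_number FROM loan "
    "WHERE branch_name = 'River Road' AND amount BETWEEN 10000 AND 15000 "
    "ORDER BY loan_number").fetchall()
print(rows)  # [('L-14',), ('L-15',)]: both boundary amounts are included
```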
From Clause
This specifies the source relations, whose Cartesian product is
formed. SQL uses the notation relation-name.attribute-name to avoid
ambiguity in cases where an attribute appears in the schema of
more than one relation, e.g.
Example
SELECT customer_name, borrower.loan_number
FROM borrower, loan
WHERE borrower.loan_number = loan.loan_number
AND branch_name = 'Moi Avenue'
This returns the name and loan number of every customer who has
a loan at the Moi Avenue branch.
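The same query can be run against hypothetical `borrower` and `loan` tables in SQLite. Leaving `loan_number` unqualified in a query over both relations makes SQLite reject it as ambiguous, which is exactly the case the qualified notation avoids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE borrower (customer_name TEXT, loan_number TEXT);
    CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount REAL);
    INSERT INTO borrower VALUES ('Adams', 'L-16'), ('Jones', 'L-17');
    INSERT INTO loan VALUES ('L-16', 'Moi Avenue', 1300), ('L-17', 'River Road', 1000);
""")

# loan_number appears in both relations, so it must be qualified.
rows = conn.execute("""
    SELECT customer_name, borrower.loan_number
    FROM borrower, loan
    WHERE borrower.loan_number = loan.loan_number
      AND branch_name = 'Moi Avenue'
""").fetchall()
print(rows)  # [('Adams', 'L-16')]

try:
    conn.execute("SELECT loan_number FROM borrower, loan")
    ambiguous_error = None
except sqlite3.OperationalError as err:   # SQLite reports an ambiguous column name
    ambiguous_error = str(err)
print(ambiguous_error)
```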
old_name AS new_name
Example: list all loans in descending order of amount, with ties
broken by descending loan number:
SELECT *
FROM loan
ORDER BY amount DESC, loan_number DESC
4.2.2 Aggregate Functions
Example
(i) SELECT branch_name, AVG(balance)
FROM account
GROUP BY branch_name
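A quick check with hypothetical `account` data in SQLite; an ORDER BY clause is added only so that the output order is predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-1", "Moi Avenue", 500.0), ("A-2", "Moi Avenue", 700.0),
    ("A-3", "River Road", 400.0)])

# One output row per branch, with the average balance of that branch's accounts.
rows = conn.execute("SELECT branch_name, AVG(balance) FROM account "
                    "GROUP BY branch_name ORDER BY branch_name").fetchall()
print(rows)  # [('Moi Avenue', 600.0), ('River Road', 400.0)]
```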
Null Values
Null values indicate the absence of information about the value of an
attribute. The predicate IS NULL tests for them, e.g.
SELECT loan_number
FROM loan
WHERE amount IS NULL
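A comparison such as `amount = NULL` is never true, because any comparison with a null value yields unknown; that is why the IS NULL predicate is needed. A sketch with hypothetical data in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L-11", 900), ("L-12", None), ("L-13", None)])  # None is stored as NULL

is_null = conn.execute("SELECT loan_number FROM loan WHERE amount IS NULL "
                       "ORDER BY loan_number").fetchall()
eq_null = conn.execute("SELECT loan_number FROM loan WHERE amount = NULL").fetchall()
print(is_null)  # [('L-12',), ('L-13',)]
print(eq_null)  # []: '= NULL' is never true
```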
The query to find the names of all branches that have assets greater
than those of at least one branch located in Brooklyn would be:
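The query itself is not given. One common formulation compares the branch relation with itself, using AS to give it two names; the sketch below runs it in SQLite on hypothetical branch data. Standard SQL also allows `assets > SOME (subquery)`, but SQLite does not support that form, hence the self-join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER)")
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)", [
    ("Brighton", "Brooklyn", 7000), ("Downtown", "Brooklyn", 9000),
    ("Round Hill", "Horseneck", 8000), ("Perryridge", "Horseneck", 1700)])

# T and S are two names for the same relation, introduced with AS.
rows = conn.execute("""
    SELECT DISTINCT T.branch_name
    FROM branch AS T, branch AS S
    WHERE T.assets > S.assets AND S.branch_city = 'Brooklyn'
    ORDER BY T.branch_name
""").fetchall()
print(rows)  # [('Downtown',), ('Round Hill',)]
```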
Patterns are case sensitive, i.e. uppercase characters do not match
lowercase characters and vice versa.
Examples
(i) 'Mary%' matches any string beginning with "Mary".
(ii) '%ry%' matches any string containing "ry" as a substring,
e.g. very, Mary, ary etc.
(iii) '___' matches any string of exactly three characters.
(iv) '___%' matches any string of at least three characters.
SELECT customer_name
FROM customer
WHERE customer_street LIKE '%main%'
Examples
LIKE 'ab\%cd%' ESCAPE '\' matches all strings beginning
with "ab%cd".
LIKE 'ab\\cd%' ESCAPE '\' matches all strings beginning
with "ab\cd".
Mismatches
SQL allows the search for mismatches using the NOT LIKE
comparison operator.
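The patterns above can be tried with SQLite through Python's `sqlite3` module (the sample strings are hypothetical). One caveat: SQLite's LIKE is case-insensitive for ASCII letters by default, unlike the case-sensitive behaviour described above, so these examples do not depend on case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (s TEXT)")
conn.executemany("INSERT INTO word VALUES (?)", [(w,) for w in
                 ["very", "ary", "Mary Jane", "ab", "abc", "ab%cd-x", "ab\\cd!"]])


def like(pattern, escape=None):
    """Return the sorted strings matching a LIKE pattern, with an optional ESCAPE character."""
    if escape is None:
        rows = conn.execute("SELECT s FROM word WHERE s LIKE ?", (pattern,))
    else:
        rows = conn.execute("SELECT s FROM word WHERE s LIKE ? ESCAPE ?",
                            (pattern, escape))
    return sorted(r[0] for r in rows)


print(like("%ry%"))             # contains "ry"
print(like("___"))              # exactly three characters
print(like("___%"))             # at least three characters
print(like(r"ab\%cd%", "\\"))   # begins with the literal text ab%cd
print(like(r"ab\\cd%", "\\"))   # begins with the literal text ab\cd
```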
4.2.5 SQL and Set Operations
The SQL operations UNION, INTERSECT and EXCEPT operate on relations
and correspond to the relational-algebra operations $\cup$, $\cap$ and $-$.
(i) Union
UNION eliminates duplicates automatically; to retain duplicates, use
UNION ALL.
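A sketch of the three operations in SQLite, using hypothetical `depositor` and `borrower` name lists; the duplicate 'Jones' row shows the difference between UNION and UNION ALL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer_name TEXT);
    CREATE TABLE borrower (customer_name TEXT);
    INSERT INTO depositor VALUES ('Adams'), ('Jones'), ('Jones');
    INSERT INTO borrower VALUES ('Jones'), ('Smith');
""")


def names(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


union = names("SELECT customer_name FROM depositor "
              "UNION SELECT customer_name FROM borrower")
union_all = names("SELECT customer_name FROM depositor "
                  "UNION ALL SELECT customer_name FROM borrower")
both = names("SELECT customer_name FROM depositor "
             "INTERSECT SELECT customer_name FROM borrower")
only_depositors = names("SELECT customer_name FROM depositor "
                        "EXCEPT SELECT customer_name FROM borrower")
print(union, union_all, both, only_depositors)
```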
(iii) The Except Operation
Example
SELECT loan_number
FROM loan
WHERE amount IS NULL
To test for the absence of a null value, use the predicate IS NOT
NULL.
4.4.6 VIEWS
Example
CREATE VIEW Customer AS
SELECT branch_name, customer_name
FROM depositor, account
WHERE depositor.account_number = account.account_number
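The corrected view can be tried in SQLite with hypothetical `depositor` and `account` data. The view stores no rows of its own: querying it after a new depositor row is added shows the change immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL);
    INSERT INTO depositor VALUES ('Adams', 'A-1'), ('Jones', 'A-2');
    INSERT INTO account VALUES ('A-1', 'Moi Avenue', 500), ('A-2', 'River Road', 700);
    CREATE VIEW customer AS
        SELECT branch_name, customer_name
        FROM depositor, account
        WHERE depositor.account_number = account.account_number;
""")

rows = conn.execute("SELECT * FROM customer ORDER BY customer_name").fetchall()

# The view is re-derived from the base relations each time it is queried.
conn.execute("INSERT INTO depositor VALUES ('Jones', 'A-1')")
rows_after = conn.execute("SELECT * FROM customer WHERE branch_name = 'Moi Avenue' "
                          "ORDER BY customer_name").fetchall()
print(rows)
print(rows_after)
```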
(i) Deletion
DELETE FROM r
WHERE P
P represents the predicate and r represents the relation.
The statement first finds all tuples t in r for which P(t) is true and
then deletes them from r.
The WHERE clause can be omitted, in which case all tuples in r are
deleted.
Example
DELETE FROM loan
- deletes all tuples from the loan relation.
To delete all loans with loan amounts between 1300 and 1500:
DELETE FROM loan
WHERE amount BETWEEN 1300 AND 1500
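Running that deletion in SQLite on hypothetical data; `rowcount` reports how many tuples were removed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L-11", 900), ("L-14", 1300), ("L-15", 1500), ("L-16", 1700)])

cur = conn.execute("DELETE FROM loan WHERE amount BETWEEN 1300 AND 1500")
print(cur.rowcount)  # 2: both boundary amounts are deleted
remaining = conn.execute("SELECT loan_number FROM loan ORDER BY loan_number").fetchall()
print(remaining)     # [('L-11',), ('L-16',)]
```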
(ii) Insertion
To insert data into a relation, either:
specify a tuple to be inserted, or
write a query whose result is a set of tuples to be inserted.
Tuples to be inserted must have the correct arity (number of
attributes).
Example
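A sketch of both forms of insertion in SQLite, using hypothetical `loan` and `account` tables; the second INSERT applies an invented rule (each Moi Avenue loan gets an account with a balance of 200) purely to show a query supplying the tuples. The last statement supplies the wrong number of values and is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER);
    CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER);
    INSERT INTO loan VALUES ('L-16', 'Moi Avenue', 1300), ('L-17', 'River Road', 1000);
""")

# Form 1: specify the tuple itself, values in attribute order.
conn.execute("INSERT INTO account VALUES ('A-9732', 'Moi Avenue', 1200)")

# Form 2: insert the tuples produced by a query (hypothetical rule).
conn.execute("""
    INSERT INTO account
    SELECT loan_number, branch_name, 200 FROM loan
    WHERE branch_name = 'Moi Avenue'
""")

rows = conn.execute("SELECT * FROM account ORDER BY account_number").fetchall()
print(rows)

# Wrong arity: account has three attributes but only two values are given.
try:
    conn.execute("INSERT INTO account VALUES ('A-1', 'Moi Avenue')")
    arity_error = None
except sqlite3.OperationalError as err:
    arity_error = str(err)
print(arity_error)
```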
(iii) Updates
Update Of A View
A modification is permitted through a view only if the view in
question is defined in terms of one relation of the actual
relational database, i.e. of the logical-level database.
Example
CREATE VIEW Branch_loan AS
SELECT branch_name, loan_number
FROM loan
INSERT INTO Branch_loan
VALUES ('Moi Avenue', 'Acc008')
Syntax
CREATE TABLE r (A1 D1, A2 D2, ..., An Dn,
<integrity-constraint 1>,
...,
<integrity-constraint k>)
where r is the name of the relation, each Ai is an attribute name and
Di is the domain (data type) of Ai.
Examples
(i) CREATE TABLE Customer
(Customer_name CHAR(20) NOT NULL,
Customer_street CHAR(30),
Customer_city CHAR(30),
PRIMARY KEY (customer_name))
CHECK (assets >= 0))
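The constraints can be seen at work in SQLite. The sketch uses the Customer table above and a hypothetical `branch` table whose first columns are assumed, since only its CHECK clause appears here; each insert that violates a constraint is rejected with an integrity error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_name   CHAR(20) NOT NULL,
        customer_street CHAR(30),
        customer_city   CHAR(30),
        PRIMARY KEY (customer_name));
    CREATE TABLE branch (            -- columns before CHECK are hypothetical
        branch_name CHAR(15),
        branch_city CHAR(30),
        assets      INTEGER,
        PRIMARY KEY (branch_name),
        CHECK (assets >= 0));
""")


def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


results = [
    try_insert("INSERT INTO branch VALUES ('Moi Avenue', 'Nairobi', 5000)"),
    try_insert("INSERT INTO branch VALUES ('River Road', 'Nairobi', -1)"),    # CHECK
    try_insert("INSERT INTO branch VALUES ('Moi Avenue', 'Nairobi', 10)"),    # duplicate key
    try_insert("INSERT INTO customer VALUES (NULL, 'Main St', 'Nairobi')"),   # NOT NULL
]
for r in results:
    print(r)
```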
5.0 TRANSACTIONS MANAGEMENT AND
CONCURRENCY CONTROL
(i) Atomicity
(ii) Consistency
The effects of a successfully completed (committed) transaction
are permanently recorded in the database and must not be lost
because of a subsequent failure.
These properties are usually referred to as the ACID properties.
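Atomicity can be illustrated with SQLite through Python's `sqlite3` module, whose connection object, used as a context manager, commits on success and rolls back if an exception escapes. In this sketch the account numbers, balances and the non-negative balance rule are hypothetical; the transfer credits before it debits, so a failing debit must undo an update that has already happened:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-1", 40), ("A-3", 30)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction: all or nothing."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE account_number = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE account_number = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:   # the CHECK constraint rejected the debit
        return False


def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))


print(transfer("A-3", "A-1", 10), balances())   # succeeds
print(transfer("A-3", "A-1", 100), balances())  # fails; the credit is rolled back too
```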
(i) Lost Update Problem
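Figure: the lost update schedule. Transactions A and B both fetch
record R; A updates R at time t3 and B updates R at time t4,
overwriting A's update, which is lost.
Figure: a second schedule in which transaction B updates R at time
t2, transaction A fetches R at time t3 and B rolls back at time t4, so
A has read a value that was never committed.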
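The lost update can be replayed with a tiny simulation. The starting value of R and the increments applied by A and B are hypothetical; each transaction writes back a value computed from its own earlier fetch, so the second update silently overwrites the first:

```python
def run(schedule, initial):
    """Replay (transaction, action, increment) steps against one shared record R.

    'fetch' copies the stored value of R into the transaction's workspace;
    'update' writes back that private copy plus the transaction's increment.
    """
    stored = initial
    workspace = {}
    for txn, action, increment in schedule:
        if action == "fetch":
            workspace[txn] = stored
        elif action == "update":
            stored = workspace[txn] + increment
    return stored


interleaved = [("A", "fetch", None), ("B", "fetch", None),
               ("A", "update", 10), ("B", "update", 20)]
serial = [("A", "fetch", None), ("A", "update", 10),
          ("B", "fetch", None), ("B", "update", 20)]

print(run(interleaved, 100))  # 120: A's +10 has been lost
print(run(serial, 100))       # 130: both updates survive
```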
Transactions that only read the database can obtain the wrong
result if they are allowed to read partial results of incomplete
transactions that have simultaneously updated the database.
Consider two transactions A and B operating on account records.
Transaction A is summing account balances, while transaction B is
transferring an amount of 10 from account 3 to account 1.
Transaction A   Time   Transaction B
---             t6     Update account 1 (40 + 10 = 50)
---             t7     Commit
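Replaying the scenario step by step makes the wrong total visible. Only account 1's starting balance of 40 and the transfer of 10 come from the notes; the balances of accounts 2 and 3, and the exact interleaving, are assumptions:

```python
accounts = {1: 40, 2: 50, 3: 30}      # accounts 2 and 3 are hypothetical
correct_total = sum(accounts.values())

total = 0
total += accounts[1]   # A reads account 1 (40)
total += accounts[2]   # A reads account 2 (50)
accounts[3] -= 10      # B debits account 3 (30 - 10 = 20)
accounts[1] += 10      # B credits account 1 (40 + 10 = 50) and commits
total += accounts[3]   # A reads account 3 after B's change (20)

print(correct_total, total)  # 120 110: A's sum is inconsistent
```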
A serial schedule is one in which all the reads and writes of each
transaction are run together, one transaction after another.
A schedule is said to be serialisable if its reads and writes can be
reordered so that each transaction's operations are grouped
together as in a serial schedule, and the net effect of executing this
reorganised schedule is the same as that of the original schedule.
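A practical test related to this definition is the precedence-graph check for conflict serialisability, a standard textbook technique that is slightly stricter than the definition above (every conflict-serialisable schedule is serialisable, but not conversely). A sketch, with schedules as lists of hypothetical `(transaction, operation, item)` steps:

```python
def conflict_serialisable(schedule):
    """Test a schedule of (txn, op, item) steps, where op is 'R' or 'W'.

    Builds the precedence graph: an edge Ti -> Tj whenever an operation of
    Ti conflicts with a later operation of Tj (same item, different
    transactions, at least one write). The schedule is conflict
    serialisable exactly when this graph has no cycle.
    """
    edges = {}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.setdefault(ti, set()).add(tj)

    visiting, done = set(), set()

    def has_cycle(node):
        visiting.add(node)
        for nxt in edges.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return not any(has_cycle(t) for t in list(edges) if t not in done)


lost_update = [("A", "R", "x"), ("B", "R", "x"), ("A", "W", "x"), ("B", "W", "x")]
print(conflict_serialisable(lost_update))  # False: edges A -> B and B -> A
```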
(i) Locking method
Shared Locks
These are used during read operations, since read operations
cannot conflict with one another. More than one transaction is
permitted to hold read locks on the same data item simultaneously.
2-Phase locking
To ensure serialisability, the two-phase locking protocol defines how
transactions acquire and relinquish locks. Two-phase locking
guarantees serialisability, but it does not prevent deadlocks. The two
phases are:
(ii) No unlock operation can precede a lock operation
in the same transaction.
(iii) No data is affected until all locks are obtained, i.e. until
the transaction is at its locked point.
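The rule that no lock may follow an unlock can be checked mechanically for one transaction's sequence of lock operations. A minimal sketch with a hypothetical `(kind, item)` action format:

```python
def obeys_two_phase_locking(actions):
    """Check one transaction's ("lock" | "unlock", item) steps against 2PL.

    Growing phase: locks may be acquired but none released.
    Shrinking phase: begins at the first unlock; no lock may follow it.
    """
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True
        elif kind == "lock" and shrinking:
            return False
    return True


print(obeys_two_phase_locking([("lock", "x"), ("lock", "y"),
                               ("unlock", "x"), ("unlock", "y")]))  # True
print(obeys_two_phase_locking([("lock", "x"), ("unlock", "x"),
                               ("lock", "y")]))                     # False
```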
Deadlocks
Techniques To Control Deadlocks
1. Deadlock Prevention
2. Deadlock Detection
3. Deadlock Avoidance
In deadlock avoidance, the transaction must obtain all the locks it
needs before it can be executed. This technique avoids the rollback
of conflicting transactions by requiring that locks be obtained in
succession, but the serial lock assignment increases action response
times.
Conclusion:
The best deadlock control method depends on the database
environment. If the probability of deadlocks is low, deadlock
detection is recommended; if it is high, deadlock prevention is
recommended; and if response time is not high on the system's
priority list, deadlock avoidance might be employed.
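Deadlock detection is commonly based on a wait-for graph: an edge from T1 to T2 means T1 is waiting for a lock held by T2, and a cycle means the transactions in it are deadlocked. A minimal sketch of the cycle check (the dictionary representation is an assumption); the DBMS would then abort one transaction in the cycle:

```python
def find_deadlock(wait_for):
    """Return the transactions forming a cycle in a wait-for graph, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    state = {}  # "active" while on the current search path, "done" afterwards

    def visit(node, path):
        state[node] = "active"
        path.append(node)
        for nxt in wait_for.get(node, ()):
            if state.get(nxt) == "active":
                return path[path.index(nxt):]   # the cycle found
            if nxt not in state:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        state[node] = "done"
        path.pop()
        return None

    for txn in wait_for:
        if txn not in state:
            cycle = visit(txn, [])
            if cycle:
                return cycle
    return None


print(find_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # ['T1', 'T2']
print(find_deadlock({"T1": ["T2"], "T2": []}))      # None
```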
stamp value uses an explicit order in which transactions are
submitted to the DBMS. The stamps must have two properties:
i. Uniqueness, which ensures that no two time stamp values
can be equal.
ii. Monotonicity, which ensures that time stamp values
always increase.
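A logical counter is one simple way to obtain both properties; a real DBMS might instead combine a clock reading with a counter. A sketch with a hypothetical `TimestampGenerator` class, where a lock makes each issue atomic so that concurrent callers never receive the same value:

```python
import itertools
import threading


class TimestampGenerator:
    """Issues unique, monotonically increasing transaction time stamps."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def issue(self):
        # Holding the lock makes each issue atomic across threads.
        with self._lock:
            return next(self._counter)


gen = TimestampGenerator()
stamps = [gen.issue() for _ in range(5)]
print(stamps)  # [1, 2, 3, 4, 5]
```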
Validation Phase
The transaction is validated to ensure that the changes made will
not affect the integrity and consistency of the database. If
validation fails, the transaction is restarted and its changes are
discarded.
Write Phase
The changes are permanently applied (written) to the database.
Conclusion
The optimistic approach is acceptable for mostly-read or query
database systems that require very few update transactions.
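The notes do not spell out the validation test. One common form, backward validation, is sketched below as an assumption: a transaction fails validation if any transaction that committed after it started wrote an item it has read. `Txn` and `validate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    start: int                                   # time its read phase began
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(txn, committed):
    """Backward validation against committed, a list of (commit_time, Txn).

    txn fails if a transaction that committed after txn started wrote an
    item that txn has read; txn would then be restarted.
    """
    for commit_time, other in committed:
        if commit_time > txn.start and other.write_set & txn.read_set:
            return False
    return True


writer = Txn("T1", start=1, write_set={"x"})
committed = [(5, writer)]
print(validate(Txn("T2", start=3, read_set={"x"}), committed))  # False: restart T2
print(validate(Txn("T3", start=6, read_set={"x"}), committed))  # True
```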
6.0 DISTRIBUTED DATABASES
6.2 Distributed Database Approaches
Client/Server
recovery are performed by the DBMS, which presents the results to
users. Now, however, as with file servers, applications can run on
network nodes separate from the servers: the client (or requestor)
nodes.
distributed processing" is reserved for client/server, and another
term is used for the other approach (see the next section). Users
are advised to exercise care, however, because both terms are
used rather indiscriminately.
optimization, which must now take network communication as well
as platform differences into account.
6.2.1 Distributed DBMS
are sometimes referred to as co-operative processing, but this too
is a broad term that also applies to client/server (the DBMS and
applications cooperate) as well as to certain types of hardware
architecture (the central processors cooperate).
6.2.1.1 Date’s 12 rules of distribution.
Transparency is the "Rule 0" of a DDBMS, as formulated by
Date:
Similar to Codd's 12 rules for relational databases, the objectives
can be used to evaluate DDBMS product claims: Any objective
that is not achieved defeats some transparency aspect, imposing
costs on users.
Products vary on which objectives they achieve and to what
degree, and users must understand the practical consequences of
the objectives and make decisions based on how consequential
each is for their specific environment.
It is, however, important to note that, like Codd's rules, the
objectives are not independent, and giving up any one of them will
have ramifications for the others.
2: No Reliance On A Central Site
3: Continuous Operation
4: Location Independence
Users should not have to know where data is physically stored and
should be able to behave - from a logical standpoint - as if the
data were all stored on their own local node.
5: Fragmentation Independence
6: Replication Independence
Users should be able to behave, from a logical standpoint, as if the
data were not, in fact, replicated at all.
The system should support distributed transactions.
Besides individual operations and retrievals, a DDBMS must
also support distributed transactions, that is, sets of multiple
operations spanning multiple nodes. As in the non-distributed
case, concurrency control and recovery are critical functions. The
two-phase commit (2PC) protocol is used to ensure that a
distributed transaction either commits at every participating node
or at none.
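A minimal in-process sketch of the two-phase commit decision rule, with hypothetical `Participant` objects standing in for nodes. Real implementations also write log records and handle timeouts and node failures, which are omitted here:

```python
class Participant:
    """A node taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):
        # Phase 1: the node votes; YES means it is ready and able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        # Phase 2: the node applies the coordinator's global decision.
        self.state = decision


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every node votes YES."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    decision = "committed" if all(votes) else "aborted"
    for p in participants:                        # phase 2: broadcast decision
        p.finish(decision)
    return decision


nodes = [Participant("Nairobi"), Participant("Kisumu", can_commit=False)]
print(two_phase_commit(nodes), [p.state for p in nodes])  # aborted everywhere
```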
9: Hardware Independence.
Users should be able to run the same DBMS on different hardware
systems, and have those systems all participate as equal partners
in a distributed system.
10: Software Independence.
Users should be able to run the same DBMS under different
operating systems, even on the same hardware.
7.0 DATABASE RECOVERY MANAGEMENT
Recovery restores a database from a given state, usually
inconsistent, to a previously consistent state. Recovery techniques
are based on the atomic transaction property.
Levels of Backups
1. A full backup, or dump, of the database.
2. A differential backup of the database, in which only the latest
modifications made to the database are copied.
3. A backup of the transaction log only. This level backs up all the
transaction log operations that are not reflected in a previous
backup copy of the database.
The database backup is stored in a secure place, usually in a
different building, and protected against dangers such as fire,
theft, flood and other potential calamities. The existence of backups
guarantees database recovery following system (hardware/software)
failures. Failures that plague databases and systems are generally
induced by software, hardware, programming exemptions,
transactions and external factors.
1. Software - Software-induced failures may be traceable to the
O.S, the DBMS software, application programs or viruses.
2. Hardware - Hardware-induced failures may include memory
chip errors, disk crashes, bad disk sectors, disk-full errors etc.
3. Programming exemptions - Application programs or end users
may roll back transactions when certain conditions are defined,
e.g. a recovery procedure may be initiated if a withdrawal of
funds is made when the customer's funds are at 0, or when an
end user has initiated an unintended keyboard error such as
pressing Ctrl+C.
4. Transactions - The system detects deadlocks and aborts one of
the transactions.
5. External factors - Backups are especially important when a
system suffers complete destruction due to fire, earthquakes,
floods etc.
The database recovery process generally follows a predictable
scenario in which you first determine the type and the extent of
the required recovery.
An entry is made in the local log file each time the following
commands are issued by a transaction:
Begin transaction
Write (insert, delete, update)
Commit transaction
Abort transaction
2. The type of log record i.e. as listed above
3. Identifier of the data object affected
4. Before-image of the data object
5. Log management information
Check Pointing
The recovery manager periodically takes checkpoints (dumps), so on
recovery it only has to go back as far as the last checkpoint.
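The before-image in each log record is what makes undo possible. A minimal sketch of undo-only recovery that scans the log backwards and stops at the last checkpoint. It assumes (hypothetically) that committed changes are already on disk, that aborted transactions were rolled back when they aborted, and that checkpoints are taken while no transaction is active; the log format is invented for illustration:

```python
def recover(log, db):
    """Undo unfinished transactions using before-images.

    Hypothetical log entries:
      ("checkpoint",), ("begin", T), ("write", T, item, before_image),
      ("commit", T), ("abort", T)
    """
    finished = set()   # transactions whose commit or abort has been seen
    for entry in reversed(log):
        kind = entry[0]
        if kind == "checkpoint":
            break                      # nothing earlier needs examining
        if kind in ("commit", "abort"):
            finished.add(entry[1])
        elif kind == "write" and entry[1] not in finished:
            _, _txn, item, before_image = entry
            db[item] = before_image    # restore the value before the write
    return db


log = [("begin", "T1"), ("write", "T1", "x", 100), ("commit", "T1"),
       ("checkpoint",),
       ("begin", "T2"), ("write", "T2", "y", 50),
       ("begin", "T3"), ("write", "T3", "z", 7), ("commit", "T3")]
print(recover(log, {"x": 150, "y": 80, "z": 9}))  # T2 never committed: y is reset to 50
```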
1. Ways in which a transaction may fail:
Transaction-induced abort, e.g. insufficient memory space or an
expired time slice.
Unforeseen transaction failure arising from bugs.
System-induced abort, e.g. when the transaction manager
explicitly aborts a transaction because it conflicts with another
transaction, or to break a deadlock.
7.3 Recovery Protocols
Salvation Program
When all other recovery techniques fail, a salvation program may
be used. This is a specially designed program that scans the
database after a failure to assess the damage and to restore a valid
state by rescuing whatever data are recognizable.
8.0 SECURITY, INTEGRITY AND CONTROL
This is the protection of data from accidental or deliberate threats
that might cause unauthorized modification, disclosure or
destruction of data, and the protection of the information system
from the degradation or non-availability of services.
Data integrity, in this context of security, is when data are the same
as in source documents and have not been accidentally or
intentionally altered, destroyed or disclosed.
Risks:
These are various dangers to information systems, the people,
hardware, software, data and other assets with which they are
associated.
The dangers include:
Natural disasters, thieves, industrial spies, disgruntled employees.
Risk therefore means the potential loss to the firm.
Threats:
Refer to people, actions, events or other situations that could
trigger losses, they are potential causes of loss.
Common Controls
Controls are countermeasures to threats. They are tools that are
used to counter risks from the variety of people, actions, events or
situations that can threaten an IS.
They are used to identify risks, prevent risks, reduce risks and
recover from actual losses.
Physical Controls
These are controls that use conventional physical protection
measures.
They might include door locks, keyboard locks, fire doors and sump
pumps.
They also cover control over access to and use of computer facilities
and equipment, and controls for the prevention of theft.
They include controls to reduce, contain or eliminate the damage
from natural disasters, power outages, humidity, dust, high
temperature and other conventional threats.
Electronic Controls
Are controls that use electronic measures to prevent or identify the
threats.
They might include intruder detection and biometric access
controls, e.g. log-on IDs, passwords, badges, and hand, voice or
retina print access controls.
Software Controls
These are program code and controls used in IS applications to prevent,
identify or recover from errors, unauthorized access and other
threats.
For example, programming code placed in a payroll application to prevent a
data entry clerk from entering an hourly rate of pay that is too high.
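A minimal sketch of such a software control in Python, assuming a hypothetical upper limit `MAX_HOURLY_RATE` (the notes give no figure): the check runs before the rate is accepted, so an out-of-range value is rejected at data entry rather than discovered later in the payroll run.

```python
# Hypothetical limit: the notes do not specify one.
MAX_HOURLY_RATE = 5000.0


def validate_hourly_rate(rate: float) -> float:
    """Reject an hourly pay rate outside the allowed range before it is stored."""
    if rate <= 0:
        raise ValueError(f"hourly rate must be positive, got {rate}")
    if rate > MAX_HOURLY_RATE:
        raise ValueError(f"hourly rate {rate} exceeds the limit of {MAX_HOURLY_RATE}")
    return rate


if __name__ == "__main__":
    print(validate_hourly_rate(450.0))          # accepted
    try:
        validate_hourly_rate(90000.0)           # keying error: far too high
    except ValueError as err:
        print("rejected:", err)
```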
Management Controls
These result from setting, implementing and enforcing policies and
procedures, e.g. requiring employees to back up or archive their
data at regular intervals and to take backup copies of data files to
secure, off-site locations for storage.
Common Threats
Natural disasters, unauthorized access (e.g. theft, vandalism,
invasion of privacy), computer crime and computer viruses.
Natural disasters
E.g. fire, floods, water damage, earthquakes, tornadoes,
hurricanes, mud slides, wind and storm damage.
Security planning should consider
Disaster prevention
Disaster containment
Disaster recovery
Employee errors
Ordinary carelessness or poor employee training e.g. formatting
the hard disk rather than drive A, keying incorrect data.
Industrial espionage
The theft of original data by competitors. It is also called economic
espionage.
Hacking
Also known as cracking, hacking is the unauthorized entry by a person
into a computer system or network.
Hackers are people who illegally gain access to the computer
systems of others.
They can insert viruses onto networks, steal data and software,
damage data or vandalize a system.
Toll Fraud
Swindling companies and organisations by false pretences, e.g. through
their telephone bills or by using slugs instead of real coins.
Toll hackers use maintenance ports, modem pools, voice mail
systems, automated attendants or other facilities of PBXs
(private branch exchanges), the computerized telephone
switches at customer sites.
Signs of frauds:
1. Numerous short calls
2. Simultaneous use of one telephone access code
3. Numerous calls after business hours
4. Large increases in direct inward system access (DISA) dialing
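The first three signs can be checked mechanically against call records. The Python sketch below is illustrative only: the `Call` record layout, the 30-second "short call" threshold and the 08:00 to 18:00 business hours are all assumptions, not figures from the notes.

```python
from dataclasses import dataclass


@dataclass
class Call:
    access_code: str
    start: int      # seconds since midnight
    duration: int   # seconds


# Assumed thresholds; the notes give no figures.
SHORT_CALL_SECONDS = 30
BUSINESS_START, BUSINESS_END = 8 * 3600, 18 * 3600


def fraud_signs(calls: list[Call]) -> dict[str, int]:
    """Count occurrences of three toll-fraud warning signs in a list of calls."""
    short = sum(1 for c in calls if c.duration < SHORT_CALL_SECONDS)
    after_hours = sum(1 for c in calls if not BUSINESS_START <= c.start < BUSINESS_END)

    # Simultaneous use: a call that starts while an earlier call on the
    # same access code is still in progress.
    by_code: dict[str, list[Call]] = {}
    for c in calls:
        by_code.setdefault(c.access_code, []).append(c)
    simultaneous = 0
    for group in by_code.values():
        group.sort(key=lambda c: c.start)
        latest_end = None
        for c in group:
            if latest_end is not None and c.start < latest_end:
                simultaneous += 1
            end = c.start + c.duration
            latest_end = end if latest_end is None else max(latest_end, end)

    return {"short_calls": short, "after_hours_calls": after_hours,
            "simultaneous_uses": simultaneous}
```

In practice such counts would be compared with a normal baseline for each site; a single short call proves nothing, but a sudden rise in all three is worth investigating.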
Data diddling
Use of a computer system by employees to forge documents or
change data in records for gain.
It appears to be performing a proper task but may actually
perform a variety of mischievous or criminal activities, e.g.
printing paychecks for employees or vendors who do not exist.
Trap doors
These are procedures or code that allow a person to bypass the usual
security procedures for using or accessing a system or its data.
Computer viruses
A computer virus is a hidden program that inserts itself into your
computer system and forces the system to clone the virus (i.e. it
replicates itself).
Viruses may cause serious damage by modifying data, erasing files
or formatting disks.
For example, a cruise or stealth virus might lie dormant until it can
capture financial information and transmit the data to thieves.
Antivirus programs (vaccination products) can be used against them.
Antivirus programs help in:
Preventing a virus program from inserting itself into your
system
Detecting a virus program so that you can take emergency
action
Controlling the damage a virus can do once it has
been detected.
Privacy violations
Privacy is the capacity of individuals or organizations to control
information about themselves.
Privacy rights imply that:
The type and amount of data that may be collected about
individuals or organizations are limited.
Individuals and organizations are able to access, examine
and correct the data stored about them.
Disclosure, use or dissemination of those data is
restricted.
Program bugs
Bugs are defects in programming code. They are prevalent in new
software and are normally discovered by users; software vendors
then provide "patches" to fix their code.
8.4 Protection.
To protect the database, measures must be taken at several levels:
A user may be assigned all, none or a combination of these types
of authorization. In addition to these forms of authorization for
access to data, a user may be granted authorization to modify the
database schema:
Index authorization. Allows the creation and deletion of
indices.
Resource authorization. Allows the creation of new
relations.
Alteration authorization. Allows the addition or deletion of
attributes in a relation.
Drop authorization. Allows the deletion of relations.
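The idea that each user holds all, none or some combination of authorizations can be sketched in Python as a set of privileges checked before each operation. The data-access privileges (read, insert, update, delete) are an assumption, since the list they belong to is not reproduced here; the user names and `drop_relation` helper are hypothetical.

```python
from enum import Enum, auto


class Privilege(Enum):
    # Data-access authorizations (assumed; the original list is not shown above)
    READ = auto()
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()
    # Schema-modification authorizations listed in the notes
    INDEX = auto()
    RESOURCE = auto()
    ALTERATION = auto()
    DROP = auto()


# Hypothetical users holding a combination, all, or none of the authorizations.
GRANTS: dict[str, set[Privilege]] = {
    "clerk": {Privilege.READ, Privilege.INSERT},
    "dba": set(Privilege),
    "guest": set(),
}


def check(user: str, needed: Privilege) -> bool:
    """True if the user has been granted the needed authorization."""
    return needed in GRANTS.get(user, set())


def drop_relation(user: str, relation: str) -> str:
    """Refuse to delete a relation unless the user holds drop authorization."""
    if not check(user, Privilege.DROP):
        raise PermissionError(f"{user} lacks drop authorization on {relation}")
    return f"{relation} dropped"
```

A real DBMS keeps these grants in its catalogue and checks them inside the engine; the dictionary here only stands in for that catalogue.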
Exercise
Make a list of security concerns for a bank. For each item on your
list, state whether this concern relates to physical security, human
security, operating system security or database security.
System Integrity
This refers to the system operation conforming to the design
specifications despite attempts to make it behave incorrectly.
9.0 QUERY OPTIMIZATION.
Optimization represents both a challenge and an opportunity for
relational systems: a challenge because optimization is required in
such a system to achieve acceptable performance, and an
opportunity because it is one of the strengths of the relational
approach. The advantage of system-managed optimization is not
just that users do not have to worry about how best to state their
queries, but the real possibility that the optimizer might actually do
better than a human programmer.
that is usually chosen is some kind of abstract syntax tree or
query tree.
Example. ( ( SP JOIN S ) WHERE P# = ‘P2’ ) [SNAME] stands
for ‘Get the names of suppliers who supply part P2’.
(Figure: Query tree for ‘Names of suppliers who supply part P2’. The
leaves SP and S are joined over S#, and the root is the final result.)
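To make the internal form concrete, the query tree for this example can be modelled in Python as nested node objects and evaluated bottom-up. The node classes and the sample S and SP rows are hypothetical; a real optimizer's tree also carries type and catalogue information.

```python
from dataclasses import dataclass
from typing import Union

Row = dict[str, str]


@dataclass
class Rel:          # leaf: a named base relation
    name: str


@dataclass
class Join:         # natural join of two subtrees
    left: "Node"
    right: "Node"


@dataclass
class Where:        # restriction: attr = value
    child: "Node"
    attr: str
    value: str


@dataclass
class Project:      # keep only the listed attributes
    child: "Node"
    attrs: list[str]


Node = Union[Rel, Join, Where, Project]


def evaluate(node: Node, db: dict[str, list[Row]]) -> list[Row]:
    """Evaluate a query tree from the leaves up to the root."""
    if isinstance(node, Rel):
        return db[node.name]
    if isinstance(node, Join):
        out = []
        for l in evaluate(node.left, db):
            for r in evaluate(node.right, db):
                if all(l[a] == r[a] for a in l.keys() & r.keys()):
                    out.append({**l, **r})
        return out
    if isinstance(node, Where):
        return [row for row in evaluate(node.child, db) if row[node.attr] == node.value]
    # Project: keep the listed attributes and drop duplicate rows.
    seen, out = set(), []
    for row in evaluate(node.child, db):
        key = tuple(row[a] for a in node.attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(node.attrs, key)))
    return out


# ( ( SP JOIN S ) WHERE P# = 'P2' ) [SNAME]
tree = Project(Where(Join(Rel("SP"), Rel("S")), "P#", "P2"), ["SNAME"])
db = {
    "S": [{"S#": "S1", "SNAME": "Smith"}, {"S#": "S2", "SNAME": "Jones"},
          {"S#": "S3", "SNAME": "Blake"}],
    "SP": [{"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"},
           {"S#": "S2", "P#": "P2"}, {"S#": "S3", "P#": "P4"}],
}
print(evaluate(tree, db))   # [{'SNAME': 'Smith'}, {'SNAME': 'Jones'}]
```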
Stage 2 : Convert to canonical form.
The optimizer performs a number of optimizations that are
‘guaranteed to be good’ regardless of the actual data values and access
paths in the stored database. This matters because relational languages
allow all but the simplest queries to be expressed in a
variety of ways that are at least superficially distinct.
Example
The expression
( A JOIN B ) WHERE restriction-on-A
can be transformed into the equivalent but more efficient
expression
( A WHERE restriction-on-A) JOIN B
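The claim that the two expressions are equivalent but that the second is cheaper can be checked on sample data. In this Python sketch, A (suppliers with a city) and B (supplier/part pairs) are hypothetical relations, and the join is a simple nested loop, so its work is proportional to the product of the input sizes.

```python
def natural_join(a: list[dict], b: list[dict]) -> list[dict]:
    """Nested-loop natural join: examines len(a) * len(b) pairs of rows."""
    out = []
    for x in a:
        for y in b:
            if all(x[k] == y[k] for k in x.keys() & y.keys()):
                out.append({**x, **y})
    return out


def where(rel: list[dict], pred) -> list[dict]:
    return [row for row in rel if pred(row)]


# Hypothetical data: 100 suppliers, 20 of them in London, 3 parts each.
A = [{"S#": f"S{i}", "CITY": "London" if i % 5 == 0 else "Paris"} for i in range(1, 101)]
B = [{"S#": f"S{i}", "P#": f"P{j}"} for i in range(1, 101) for j in range(1, 4)]


def in_london(row: dict) -> bool:   # plays the role of restriction-on-A
    return row["CITY"] == "London"


late = where(natural_join(A, B), in_london)   # ( A JOIN B ) WHERE restriction-on-A
early = natural_join(where(A, in_london), B)  # ( A WHERE restriction-on-A ) JOIN B

# Same result, but the join examines 100 * 300 = 30,000 pairs in the first
# form and only 20 * 300 = 6,000 pairs in the second.
print(len(late), len(early), late == early)   # prints: 60 60 True
```

The saving comes purely from applying the restriction first, which is why the optimizer can make this transformation without looking at the actual data values.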
For each low-level operation, the optimizer will have available to
it a set of predefined implementation procedures. Each procedure
will have a (parameterized) cost formula associated with it,
indicating its cost in terms of disk I/Os, processor utilization and so on.
The cost formulas are used in stage 4.
Using information from the catalogue about the current
state of the database, the optimizer will choose one or more
candidate procedures for implementing each of the low-level
operations in the query expression. This process is sometimes
referred to as access path selection.
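A minimal Python sketch of how parameterized cost formulas and catalogue statistics combine to pick a procedure for one restriction (attribute = value). Both formulas are simplified assumptions for illustration: a full scan reads every page, and an unclustered index lookup traverses the index and then fetches one page per matching tuple, with values assumed uniformly distributed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stats:                # figures an optimizer might read from the catalogue
    pages: int              # disk pages occupied by the relation
    rows: int               # number of tuples
    distinct_values: int    # distinct values of the restricted attribute
    index_height: int       # levels in an index on that attribute (0 = no index)


def full_scan_cost(s: Stats) -> float:
    return s.pages                          # read every page once


def index_lookup_cost(s: Stats) -> float:
    if s.index_height == 0:
        return float("inf")                 # not applicable without an index
    matching = s.rows / s.distinct_values   # uniform distribution assumed
    return s.index_height + matching        # one fetch per matching tuple


PROCEDURES: dict[str, Callable[[Stats], float]] = {
    "full scan": full_scan_cost,
    "index lookup": index_lookup_cost,
}


def choose_procedure(s: Stats) -> tuple[str, float]:
    """Evaluate every candidate's cost formula and return the cheapest."""
    costs = {name: formula(s) for name, formula in PROCEDURES.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Selective restriction: the index wins (3 + 50000/5000 = 13 page reads).
print(choose_procedure(Stats(pages=1000, rows=50000, distinct_values=5000, index_height=3)))
# Unselective restriction: scanning 1000 pages beats 25,003 index fetches.
print(choose_procedure(Stats(pages=1000, rows=50000, distinct_values=2, index_height=3)))
```

The same query therefore gets different plans as the catalogue statistics change, which is the point of choosing access paths from the current state of the database.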
9.2 Relational Algebra.
Relational algebra consists of a collection of operators (e.g. join) that
take relations as their operands and return relations as their result. It
consists of eight operators, in two groups of four each:
The traditional set operations: union, intersection,
difference and Cartesian product.
The special relational operations: restrict, project, join and
divide.
Restrict : Returns a relation consisting of all tuples from a specified relation that satisfy a specified condition.
Project : Returns a relation consisting of all tuples that remain as (sub)tuples in a specified relation after specified attributes have been eliminated.
Product : Returns a relation consisting of all possible tuples that are a combination of two tuples, one from each of two specified relations.
Union : Returns a relation consisting of all tuples appearing in either or both of two specified relations.
Intersect : Returns a relation consisting of all tuples appearing in both of two specified relations.
Difference : Returns a relation consisting of all tuples appearing in the first and not in the second of two specified relations.
Join : Returns a relation consisting of all possible tuples that are a combination of two tuples, one from each of two specified relations, such that the two tuples contributing to any given combination have a common value for the common attribute(s) of the two relations (and that common value appears just once, not twice, in the result tuple). This is a natural join.
Divide : Takes two relations, one binary and one unary, and returns a relation consisting of all values of one attribute of the binary relation that match (in the other attribute) all values in the unary relation.
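The eight operators can be modelled in a few lines of Python. This is an illustrative sketch, not how a DBMS stores data: a tuple is a frozenset of (attribute, value) pairs and a relation is a frozenset of such tuples, so duplicate tuples cannot arise. The helper `rel` and the sample data in the usage lines are hypothetical.

```python
from typing import Callable, Iterable


def rel(*rows: dict) -> frozenset:
    """Hypothetical helper: build a relation from dictionaries."""
    return frozenset(frozenset(row.items()) for row in rows)


def restrict(r: frozenset, cond: Callable[[dict], bool]) -> frozenset:
    return frozenset(t for t in r if cond(dict(t)))


def project(r: frozenset, attrs: Iterable[str]) -> frozenset:
    keep = set(attrs)
    return frozenset(frozenset((a, v) for a, v in t if a in keep) for t in r)


def product(r: frozenset, s: frozenset) -> frozenset:
    # Assumes the two headings have no attribute names in common.
    return frozenset(t | u for t in r for u in s)


# Union, intersect and difference assume both relations have the same heading.
def union(r: frozenset, s: frozenset) -> frozenset:
    return r | s


def intersect(r: frozenset, s: frozenset) -> frozenset:
    return r & s


def difference(r: frozenset, s: frozenset) -> frozenset:
    return r - s


def join(r: frozenset, s: frozenset) -> frozenset:
    """Natural join: combine tuples that agree on every common attribute."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)  # a shared (attribute, value) pair appears only once
    return frozenset(out)


def divide(binary: frozenset, unary: frozenset, x: str, y: str) -> frozenset:
    """Values of x in `binary` that are paired with every y value in `unary`."""
    ys = {dict(u)[y] for u in unary}
    pairs = {(dict(t)[x], dict(t)[y]) for t in binary}
    xs = {xv for xv, _ in pairs}
    return frozenset(frozenset({(x, xv)}) for xv in xs
                     if all((xv, yv) in pairs for yv in ys))


SP = rel({"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"}, {"S#": "S2", "P#": "P1"})
P = rel({"P#": "P1"}, {"P#": "P2"})
print(divide(SP, P, "S#", "P#") == rel({"S#": "S1"}))   # True: only S1 supplies both parts
```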
10.0 LATEST DEVELOPMENTS IN DATABASE TECHNOLOGY.
10.1 Developments
Integrating other types of information:
Images
Graphics
Video
Records
Text
Documents
Rules
Web technology
Geographical information systems (GIS)
Capturing, storing, checking, integrating, analysing and
displaying spatially referenced data about the earth. It includes
topographic maps, remote sensing images, photographic images,
geodetic data, etc.
Knowledge-based databases.
Databases that support logical rules of inference.
Computer aided manufacturing.
Data warehousing.
Data mart :
A subset of a data warehouse that supports the requirements of a
particular department or business function.
Data mining :
The process of extracting valid, previously unknown,
comprehensible and actionable information from large databases
and using it to make crucial business decisions.
Active databases :
Databases that are based on events, conditions and actions. They
are triggered upon the occurrence of certain events in the system.
Related to process control systems.
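The event-condition-action idea can be sketched in Python: when an event is signalled, each rule registered for that event tests its condition, and its action runs only if the condition holds. The stock-update event, the reorder level of 10 and the `ActiveDB` class are hypothetical; real active databases express such rules as triggers inside the DBMS.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    event: str                          # event that wakes the rule
    condition: Callable[[dict], bool]   # checked when the event occurs
    action: Callable[[dict], None]      # run only if the condition holds


@dataclass
class ActiveDB:
    rules: list[Rule] = field(default_factory=list)
    stock: dict[str, int] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def update_stock(self, item: str, qty: int) -> None:
        self.stock[item] = qty
        self.signal("update_stock", {"item": item, "qty": qty})   # raise the event

    def signal(self, event: str, data: dict) -> None:
        for rule in self.rules:
            if rule.event == event and rule.condition(data):
                rule.action(data)


db = ActiveDB()
db.rules.append(Rule(
    event="update_stock",
    condition=lambda d: d["qty"] < 10,                       # hypothetical reorder level
    action=lambda d: db.log.append(f"reorder {d['item']}"),
))
db.update_stock("bolts", 50)   # condition false: no action
db.update_stock("nuts", 4)     # condition true: action fires
print(db.log)                  # ['reorder nuts']
```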
Online analytical processing (OLAP) :
The dynamic synthesis, analysis and consolidation of large
volumes of multi-dimensional data.
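One basic OLAP consolidation is the roll-up: summing a measure over the dimensions you are not interested in. This Python sketch uses a hypothetical sales fact list with three dimensions (region, product, quarter) and one measure (amount).

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount)
FACTS = [
    ("Nairobi", "Tea", "Q1", 120),
    ("Nairobi", "Tea", "Q2", 150),
    ("Nairobi", "Coffee", "Q1", 80),
    ("Mombasa", "Tea", "Q1", 60),
    ("Mombasa", "Coffee", "Q2", 90),
]
DIMS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for *dims, amount in facts:
        totals[tuple(dims[i] for i in idx)] += amount
    return dict(totals)


print(roll_up(FACTS, ["region"]))              # {('Nairobi',): 350, ('Mombasa',): 150}
print(roll_up(FACTS, ["product", "quarter"]))  # totals per product and quarter
print(roll_up(FACTS, []))                      # {(): 500}, the grand total
```

Computing every such combination of kept dimensions gives the "data cube" that OLAP tools let users navigate.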
Digital publishing.
Computer aided software engineering.
Data Warehouses.
10.2 Applications
Decision-support Systems
As online availability of data has grown, businesses have begun to
exploit the available data to make better decisions about their
activities, such as what items to stock and how best to target
customers to increase sales.
Data analysis
Although complex statistical analysis is best left to statistics
packages, databases should support simple, commonly used forms
of data analysis. Since the data in databases are usually large
in volume, they need to be summarized in some fashion if we are to
derive information that humans can use. SQL aggregation
functionality is limited, so several extensions have been
implemented by different databases. For instance, although SQL
defines only a few aggregate functions, many database systems
provide a richer set of functions including variance, median, and
so on.
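The kind of grouped summary such extended aggregate functions provide can be sketched in Python with the standard `statistics` module. The (branch, balance) rows are hypothetical; in a DBMS that supports these functions, the same result would come from a GROUP BY query instead of application code.

```python
import statistics
from collections import defaultdict

# Hypothetical (branch, balance) rows
ROWS = [("Downtown", 500), ("Downtown", 900), ("Downtown", 700),
        ("Perryridge", 400), ("Perryridge", 1300)]


def summarize(rows):
    """Group balances by branch and compute standard and extended aggregates."""
    groups = defaultdict(list)
    for branch, balance in rows:
        groups[branch].append(balance)
    return {
        branch: {
            "count": len(v),                         # standard SQL aggregate
            "avg": statistics.mean(v),               # standard SQL aggregate
            "median": statistics.median(v),          # extended aggregate
            "variance": statistics.variance(v),      # sample variance; needs >= 2 values
        }
        for branch, v in groups.items()
    }


for branch, agg in summarize(ROWS).items():
    print(branch, agg)
```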
Data mining
The term data mining refers loosely to finding relevant
information, or “discovering knowledge” from a large volume of
data. Like knowledge discovery in artificial intelligence, data
mining attempts to discover statistical rules and patterns
automatically from data.
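One common data-mining technique for discovering such rules is association-rule mining, which reports rules "if X then Y" whose support (how often X and Y occur together) and confidence (how often Y occurs when X does) exceed chosen thresholds. This Python sketch considers only single-item rules; the market-basket data and both thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical market-basket transactions
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]


def support(itemset: set, baskets: list) -> float:
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def rules(baskets, min_support=0.5, min_confidence=0.6):
    """Find rules lhs -> rhs between single items that meet both thresholds."""
    items = sorted(set().union(*baskets))
    found = []
    for a, b in combinations(items, 2):
        s = support({a, b}, baskets)
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs}, baskets)
            if conf >= min_confidence:
                found.append((lhs, rhs, s, conf))
    return found


for lhs, rhs, s, conf in rules(BASKETS):
    print(f"{lhs} -> {rhs}: support {s:.2f}, confidence {conf:.2f}")
```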
Data warehousing
Large companies have presences at numerous sites, each of which
may generate a large volume of data. A data warehouse is a
repository (or archive) of information gathered from multiple
sources, stored under a unified schema at a single site. Once
gathered, the data are stored for a long time, permitting access to
historical data. Thus data warehouses provide the user a single
consolidated interface to data, making decision support queries
easier to write.
Multimedia databases
Recently there has been interest in databases that store multimedia data,
such as images, audio and video. Database functionality becomes
important when the number of multimedia objects stored is large.
Issues such as transaction updates, querying facilities, and
indexing then become important.
Multimedia objects often have descriptive attributes, such as those
indicating when they were created, who created them and to what
category they belong.
such as a file that is noted in the database but whose contents are
missing, and vice versa.
(ii) The development of a relatively low-cost wireless
digital communication infrastructure, based on
wireless local-area networks, cellular digital packet
networks, and other technologies.
Mobile computing has proved useful in many applications. Many
business travelers use laptop computers to enable them to work
and access data en route. Delivery services use mobile computers
to assist in package tracking. Emergency response services use
mobile computers at the scene of disasters, medical emergencies,
and the like to access information and to enter data pertaining to
the situation.
Appendix
Revision Questions.
Question 1
Define what you understand by the following terms:
i) Redundant data
ii) Duplicate data
iii) Conceptual model
iv) Physical model
v) Non-key attribute
vi) Domain
vii) Foreign Key
viii) Candidate key
ix) Alternate Key
x) Primary Key
Question 2.
(a) Briefly describe normalization.
(b) List and briefly describe advantages and disadvantages of
database approach to database systems.
(d) A Local Authority with several branch libraries wishes to
maintain a database of stocked books. It is assumed that
each book has a unique title and author but many copies of
the more popular books will be stocked at any branch.
Assuming the un-normalized entity BRANCH is given as:
BRANCH(Branch-no, Branch-address(Title,Author,
Publisher, No-of-copies))
i) Identify and normalize all the entities and their
attributes to third normal form (3NF).
ii) Draw the conceptual data model diagram for your
solution in (i), showing the entities and
relationships involved.
Question 3.
Some computer installations, particularly the larger, more
sophisticated ones, are using databases
in which to store the original data.
(a) Differentiate between a database and database
management system.
(b) Explain briefly three of the major problems
associated with the implementation and operation of a
comprehensive database.
(c) List and explain briefly five of the advantages
claimed for a well-designed database.
Question 4.
(a) Describe the different forms of security which might be
used in a PC system database to provide back-up facilities
or prevent unauthorized access to the computer files.
(b) A proposed database for a book order processing
system will have four relations.
Customers
Customer orders
Order details
Books
(i) Suggest what fields these tables might contain
(ii) Draw an entity relationship model for the books
order processing system
(c) Define the terms
(i) Entity instance
(ii) Schema
(iii) Alias
(iv) External schema
Question 5.
(a) Taking two transactions T1 and T2, illustrate the
following three common concurrent transaction problems.
(i) Lost update problem
(ii) Uncommitted dependency problem
(iii) Inconsistent analysis problem
(b) Differentiate the following.
(i) Shared locks and Exclusive locks
(ii) Growing phase and shrinking phase.
(c) Briefly explain the following techniques used to control
deadlocks.
(i) Deadlock prevention
(ii) Deadlock detection
Question 6.
(a) Discuss the various types of relationship cardinality
(b) Define the following
(i) Data integrity
(ii) Systems integrity
(c) Give reasons why there is a need for database
security?
(d) Outline main controls that an organization can use to
counter threats to its database.
Question 7.
(a) The following is an outfit table.
(iii) Write an SQL statement that returns a relation
where the colour of the outfit is blue or the price
is more than or equal to kshs. 1,200.
(iv) Write an SQL statement to delete the record for
the outfit Savco trouser from the table.
(b) (i) What does the phrase "query optimisation"
mean?
(ii) Write a canonical form for :
SELECT O_num, O_name
FROM Order
WHERE O_quantity BETWEEN 45 AND 95;
Question 8.
(a) Define the following terms as they relate to relational
database
(i) Relationship
(ii) Primary key
(iii) Domain
(iv) Tuple
(b) What is a transaction? Explain the ACID properties of
a transaction.
(c) Outline four advantages of object technology.
Question 9.
(a) Define distributed database systems. What is
distributed in this case
(b) Discuss eight of Date's 12 rules of distributed
databases.
Question 10.
(a) Define the terms:
(i) Entity instance
(ii) Schema
(iii) Alias
(iv) Identifying owner
me e t Date
101 John California Nov/2/1994
214 Paul Georgia Oct/8/1993
862 Harris California Jan/3/1995
918 Derek Florida Nov/2/1995
924 Cecil California Feb/1/1994
991 Sue Florida Jan/4/1994
Question 11.
(a) Discuss the basic search conditions in SQL.
A database engineer has embarked on this and has already
come up with the following:
UNF           UNF Level
CNum          1
Cname         1
Pnum          2
Paddress      2
RStartDate    2
RfinishDate   2
Ramount       2
ONum          2
Oname         2
Question 12.
(a) Discuss strategies for distributing databases. Give a brief
discussion of the potential problems caused by concurrency.
MKT Marketing AD
6 M2
T13 Smith Manag MKT Introduction Cecil AD C
50 ement 3 to Marketing M4