
Database Management Systems

This document discusses database management systems and the transition from flat file data storage to relational databases. It covers the problems with flat files like data redundancy and inconsistencies. The key benefits of relational databases are single data storage, single updates, and currency of information. Other topics include database design stages, distributed databases, data definition and manipulation languages, and data normalization to remove anomalies.

Uploaded by

Toxy Kayz
Copyright
© All Rights Reserved

Chapter 9

Database Management Systems


Objectives for Chapter 9
• Problems inherent in the flat file approach to data
management that gave rise to the database concept
• Relationships among the defining elements of the database
environment
• Anomalies caused by unnormalized databases and the need
for data normalization
• Stages in database design: entity identification, data
modeling, constructing the physical database, and preparing
user views
• Features of distributed databases and issues to consider in
deciding on a particular database configuration
Flat-File Versus Database Environments
• Computer processing involves two components: data and
instructions (programs)
• Conceptually, there are two methods for designing the
interface between program instructions and data:
• File-oriented processing: A specific data file is created for each
application.
• Data-oriented processing: Create a single data repository to support
numerous applications.
• Disadvantages of file-oriented processing include redundant
data and programs and varying formats for storing the
redundant data.
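Flat-File Environment
[Figure: Each user's transactions pass through a separate program with its own data file. User 1: Program 1, data A, B, C. User 2: Program 2, data X, B, Y. User 3: Program 3, data L, B, M. Data element B is stored in all three files.]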
Data Redundancy and Flat-File Problems
• Data Storage - creates excessive storage costs for
paper documents and/or magnetic media
• Data Updating - any changes or additions must be
performed multiple times
• Currency of Information - potential problem of
failing to update all affected files
• Task-Data Dependency - user’s inability to obtain
additional information as his or her needs change
Database Approach
[Figure: The transactions of Users 1, 2, and 3 pass through Programs 1, 2, and 3. All three programs reach a single shared database (A, B, C, X, Y, L, M) through the DBMS.]
Advantages of the Database Approach
Data sharing through a centralized database resolves the flat-file
problems:
• No data redundancy: Data is stored only once,
eliminating data redundancy and reducing storage costs.
• Single update: Because data is in only one place, it
requires only a single update, reducing the time and cost
of keeping the database current.
• Current values: A change to the database made by any
user yields current data values for all other users.
• Task-data independence: As users’ information needs
expand, the new needs can be more easily satisfied than
under the flat-file approach.
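The contrast between multiple updates and a single update can be sketched in a few lines of Python. This is only an illustration: the element names follow the A, B, C / X, B, Y / L, B, M labels used in the flat-file figure, while the address values and the `update_flat` helper are hypothetical.

```python
# Hypothetical data: element "B" (say, a customer address) is needed by three programs.
flat_files = {
    "program_1": {"A": 1, "B": "old address", "C": 3},
    "program_2": {"X": 7, "B": "old address", "Y": 9},
    "program_3": {"L": 4, "B": "old address", "M": 6},
}

def update_flat(files, key, value, programs):
    """Update the element only in the files owned by the listed programs."""
    for name in programs:
        files[name][key] = value

# Flat-file approach: program_3's file is missed, so the copies of B now disagree.
update_flat(flat_files, "B", "new address", ["program_1", "program_2"])
copies = {data["B"] for data in flat_files.values()}

# Database approach: B is stored once, so one update gives every user the current value.
database = {"A": 1, "B": "old address", "C": 3, "X": 7, "Y": 9, "L": 4, "M": 6}
database["B"] = "new address"
```

After the flat-file update, `copies` holds two different values for B, which is the currency-of-information problem; the shared store can never be in that state.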
Disadvantages of the Database Approach
• Can be costly to implement
• additional hardware, software, storage, and network resources are
required
• Can only run in certain operating environments
• may make it unsuitable for some system configurations
• Because it is so different from the file-oriented approach, the
database approach requires training users
• users may show inertia or resistance
Elements of the Database Environment
[Figure: Elements of the database environment. Labels: users, transactions, application programs, user queries, DBMS (data definition language, data manipulation language, query language), host operating system, physical database, database administrator, system development process, system requests.]
Internal Controls and DBMS
• The database management system (DBMS) stands between the
user and the database per se.
• Thus, commercial DBMSs (e.g., Access or Oracle) actually consist of
a database plus:
• software to manage the database, especially controlling access
and other internal controls
• software to generate reports, create data-entry forms, etc.
• The DBMS has special software that knows which data elements each
user is authorized to access and denies unauthorized requests for
data.
DBMS Features
• Program Development - supports user-created applications
• Backup and Recovery - keeps copies of the database
• Database Usage Reporting - captures statistics on database usage
(who, when, etc.)
• Database Access - authorizes access to sections of the database
• Also…
• User Programs - makes the presence of the DBMS transparent to the
user
• Direct Query - allows authorized users to access data without
programming
Data Definition Language (DDL)
• DDL is a programming language used to define the
database per se.
• It identifies the names and the relationship of all data elements,
records, and files that constitute the database.
• DDL defines the database on three viewing levels
• Internal view – physical arrangement of records (1 view)
• Conceptual view (schema) – representation of database (1 view)
• User view (subschema) – the portion of the database each user views
(many views)
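As a rough illustration of the viewing levels, the sketch below uses SQL DDL statements through Python's built-in `sqlite3` module. The `customer` table, its columns, and the `customer_sales_view` name are hypothetical. The `CREATE TABLE` statement stands in for the conceptual view (schema) and the `CREATE VIEW` statement for one user view (subschema); the internal view is left to the database engine. SQLite has no user accounts, so this shows only how a view narrows what is visible, not how a DBMS authorizes users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (conceptual view) and one subschema (user view).
conn.executescript("""
    CREATE TABLE customer (
        cust_id      INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        balance      REAL NOT NULL,
        credit_limit REAL NOT NULL
    );
    CREATE VIEW customer_sales_view AS
        SELECT cust_id, name FROM customer;
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme', 250.0, 1000.0)")

# A user restricted to the view sees only the two columns it exposes.
cursor = conn.execute("SELECT * FROM customer_sales_view")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```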
Data Manipulation Language (DML)
• DML is the proprietary programming language that a
particular DBMS uses to retrieve, process, and store data.
• Entire user programs may be written in the DML, or selected
DML commands can be inserted into programs written in universal
languages, such as COBOL and FORTRAN.
• Can be used to ‘patch’ third-party applications to the DBMS
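The idea of inserting DML commands into a host-language program can be sketched with Python as the host and SQL as the DML, using the standard `sqlite3` module. The `inventory` table and its contents are hypothetical, and SQL is used here for convenience rather than as an example of a proprietary DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (part_no INTEGER PRIMARY KEY, descr TEXT, qty INTEGER)"
)

# Store: the host program passes values to the DML through placeholders.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [(101, "Bolt", 500), (102, "Nut", 300)],
)

# Process: reduce the quantity on hand for one part.
conn.execute("UPDATE inventory SET qty = qty - ? WHERE part_no = ?", (50, 101))

# Remove a record that is no longer needed.
conn.execute("DELETE FROM inventory WHERE part_no = ?", (102,))

# Retrieve: read back what is left.
remaining = conn.execute("SELECT part_no, descr, qty FROM inventory").fetchall()
```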
Query Language
• The query capability permits end users and professional
programmers to access data in the database without the need
for conventional programs.
• Can be an internal control issue since users may be making an ‘end
run’ around the controls built into the conventional programs
• IBM’s structured query language (SQL) is a fourth-generation
language that has emerged as the standard query language.
• Adopted by ANSI as the standard language for all relational
databases
Functions of the DBA
Database Conceptual Models
• Refers to the particular method used to organize
records in a database
• A.k.a. “logical data structures”
• Objective: develop the database efficiently so that
data can be accessed quickly and easily
• There are three main models:
• hierarchical (tree structure)
• network
• relational
• Most existing databases are relational. Some legacy systems
use hierarchical or network databases.
The Relational Model
• The relational model portrays data in the form of
two-dimensional ‘tables’.
• Its strength is the ease with which tables may be linked to
one another.
• By contrast, linking is a major weakness of hierarchical and
network databases.
• The relational model is based on the relational algebra functions
of restrict, project, and join.
Relational Algebra
• RESTRICT – filters out rows
• PROJECT – filters out columns
• JOIN – builds a new table or data set from multiple existing tables

JOIN example:
Table 1    Table 2    Joined table
X1 Y1      Y1 Z1      X1 Y1 Z1
X2 Y2      Y2 Z2      X2 Y2 Z2
X3 Y1      Y3 Z3      X3 Y1 Z1
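The three relational algebra functions can be sketched in plain Python, with each table held as a list of dictionaries. The function names and the column names X, Y, and Z are illustrative; the data reproduces the X/Y/Z example above, and the join shown is a simple equality join on one shared column.

```python
def restrict(table, predicate):
    """RESTRICT: keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """PROJECT: keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in table:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def join(left, right, column):
    """JOIN: pair rows from both tables that agree on the shared column."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]

table_1 = [{"X": "X1", "Y": "Y1"}, {"X": "X2", "Y": "Y2"}, {"X": "X3", "Y": "Y1"}]
table_2 = [{"Y": "Y1", "Z": "Z1"}, {"Y": "Y2", "Z": "Z2"}, {"Y": "Y3", "Z": "Z3"}]

joined = join(table_1, table_2, "Y")
only_y1 = restrict(table_1, lambda row: row["Y"] == "Y1")
y_values = project(table_1, ["Y"])
```

Row Y3 Z3 of the second table has no partner in the first table, so it does not appear in `joined`.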
Associations and Cardinality
• Association – the labeled line connecting two entities or tables in
a data model
• Describes the nature of the association between them
• Represented with a verb, such as ships, requests, or receives
• Cardinality – the degree of association between two entities
• The number of possible occurrences in one table that are associated
with a single occurrence in a related table
• Used to determine primary keys and foreign keys
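“Crow’s Feet” Cardinalities
• (1:0,1) – one to zero or one
• (1:1) – one to one
• (1:0,M) – one to zero or many
• (1:M) – one to one or many
• (M:M) – many to many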
Properly Designed Relational Tables
• Each row in the table must be unique in at least one attribute,
which is the primary key.
• Tables are linked by embedding the primary key into the related
table as a foreign key.
• The attribute values in any column must all be of the same
class or data type.
• Each column in a given table must be uniquely named.
• Tables must conform to the rules of normalization, i.e., free
from structural dependencies or anomalies.
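The first two rules can be seen in action with a small sketch using Python's `sqlite3` module. The `customer` and `invoice` tables and the `try_insert` helper are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection; other DBMSs behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoice_no INTEGER PRIMARY KEY,
        cust_id    INTEGER NOT NULL REFERENCES customer(cust_id),
        amount     REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (5001, 1, 99.5)")

def try_insert(sql, params):
    """Return 'ok' if the row is accepted, 'rejected' if a key rule blocks it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# A second customer with the same primary key violates row uniqueness.
duplicate_key = try_insert("INSERT INTO customer VALUES (?, ?)", (1, "Other"))
# An invoice whose foreign key matches no customer cannot be linked.
orphan_invoice = try_insert("INSERT INTO invoice VALUES (?, ?, ?)", (5002, 99, 10.0))
```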
Three Types of Anomalies
• Insertion Anomaly: A new item cannot be added to
the table until at least one entity uses a particular
attribute item.
• Deletion Anomaly: If an attribute item used by only
one entity is deleted, all information about that
attribute item is lost.
• Update Anomaly: A modification on an attribute
must be made in each of the rows in which the
attribute appears.
• Anomalies can be corrected by creating additional
relational tables.
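A small, hypothetical inventory table makes the three anomalies concrete. In this sketch the table is keyed on `part_no` but also carries supplier data, so supplier facts are repeated in every row that uses the supplier. All names and values are invented for illustration.

```python
# Unnormalized table: the supplier address is repeated on every part row.
inventory = [
    {"part_no": 1, "descr": "Bolt",   "supplier_no": 22, "supplier_addr": "12 Elm St"},
    {"part_no": 2, "descr": "Nut",    "supplier_no": 22, "supplier_addr": "12 Elm St"},
    {"part_no": 3, "descr": "Washer", "supplier_no": 27, "supplier_addr": "9 Oak Ave"},
]

# Update anomaly: changing supplier 22's address in one row leaves a stale copy.
inventory[0]["supplier_addr"] = "40 Pine Rd"
addresses_for_22 = {r["supplier_addr"] for r in inventory if r["supplier_no"] == 22}

# Deletion anomaly: deleting part 3 also erases everything known about supplier 27.
inventory = [r for r in inventory if r["part_no"] != 3]
suppliers_left = {r["supplier_no"] for r in inventory}

# Insertion anomaly: a new supplier that supplies no part yet has no primary key
# value (part_no), so the row cannot be added.
new_supplier_row = {"part_no": None, "descr": None,
                    "supplier_no": 30, "supplier_addr": "1 Main St"}
can_insert = new_supplier_row["part_no"] is not None
```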
Advantages of Relational Tables
• Removes all three types of anomalies
• Various items of interest (customers,
inventory, sales) are stored in
separate tables.
• Space is used efficiently.
• Very flexible – users can form ad hoc
relationships
The Normalization Process
• A process which systematically splits unnormalized complex
tables into smaller tables that meet two conditions:
• all nonkey (secondary) attributes in the table are dependent on the
primary key
• all nonkey attributes are independent of the other nonkey attributes
• When unnormalized tables are split and reduced to third normal
form, they must then be linked together by foreign keys.
Steps in Normalization
1. Start with an unnormalized table with repeating groups
2. Remove repeating groups to reach first normal form (1NF)
3. Remove partial dependencies to reach second normal form (2NF)
4. Remove transitive dependencies to reach third normal form (3NF)
5. Remove remaining anomalies to reach higher normal forms
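The splitting of an unnormalized table can be sketched as follows. The data is hypothetical: each part record carries a repeating group of suppliers, and each supplier's address depends only on the supplier number. The sketch splits the records into a part table, a supplier table, and a link table whose composite key holds the two foreign keys. It is a simplified illustration of the result, not a general normalization algorithm.

```python
# Unnormalized: each part record holds a repeating group of (supplier_no, address).
unnormalized = [
    {"part_no": 1, "descr": "Bolt", "suppliers": [(22, "12 Elm St"), (24, "3 Bay Rd")]},
    {"part_no": 2, "descr": "Nut",  "suppliers": [(22, "12 Elm St")]},
]

parts = {}             # part_no -> descr            (key: part_no)
suppliers = {}         # supplier_no -> address      (key: supplier_no)
part_supplier = set()  # (part_no, supplier_no) link (foreign keys to both tables)

for record in unnormalized:
    parts[record["part_no"]] = record["descr"]
    for supplier_no, address in record["suppliers"]:
        suppliers[supplier_no] = address  # stored once per supplier
        part_supplier.add((record["part_no"], supplier_no))

# A supplier's address now lives in exactly one place, so one update suffices.
suppliers[22] = "40 Pine Rd"
```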
Accountants and Data Normalization
• Update anomalies can generate conflicting and obsolete
database values.
• Insertion anomalies can result in unrecorded transactions and
incomplete audit trails.
• Deletion anomalies can cause the loss of accounting records
and the destruction of audit trails.
• Accountants should understand the data normalization process
and be able to determine whether a database is properly
normalized.
Six Phases in Designing Relational Databases
1. Identify entities
• identify the primary entities of the
organization
• construct a data model of their
relationships
2. Construct a data model showing entity
associations
• determine the associations between
entities
• model associations into an ER diagram
3. Add primary keys and attributes
• assign primary keys to all entities in the model
to uniquely identify records
• every attribute should appear in one or more
user views
4. Normalize and add foreign keys
• remove repeating groups, partial and transitive
dependencies
• assign foreign keys to be able to link tables
5. Construct the physical database
• create physical tables
• populate tables with data
6. Prepare the user views
• normalized tables should support all required
views of system users
• user views restrict users from having access to
unauthorized data
Distributed Data Processing (DDP)
• Data processing is organized around several information
processing units (IPUs) distributed throughout the organization.
• Each IPU is placed under the control of the end user.
• DDP does not always mean total decentralization.
• IPUs in a DDP system are still connected to one another and
coordinated.
• Typically, DDP systems use a centralized database.
• Alternatively, the database can be distributed, similar to the
distribution of the data processing capability.
Distributed Data Processing
[Figure: A central site holding the centralized database, connected to remote Sites A, B, and C.]
Centralized Databases in DDP Environment
• The data is retained in a central location.
• Remote IPUs send requests for data.
• Central site services the needs of the remote IPUs.
• The actual processing of the data is performed at the remote IPU.
Advantages of DDP
• Cost reductions in hardware and data entry tasks
• Improved cost control responsibility
• Improved user satisfaction since control is closer to the user
level
• Backup of data can be improved through the use of multiple
data storage sites
Disadvantages of DDP
• Loss of control
• Mismanagement of resources
• Hardware and software incompatibility
• Redundant tasks and data
• Consolidating incompatible tasks
• Difficulty attracting qualified personnel
• Lack of standards
Data Currency
• A data currency problem occurs in DDP with a centralized database
• During transaction processing, data will
temporarily be inconsistent as records are
read and updated.
• Database lockout procedures are necessary
to keep IPUs from reading inconsistent data
and from writing over a transaction being
written by another IPU.
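A lockout procedure can be sketched with Python threads standing in for IPUs. This is an analogy under stated assumptions: the `Record` class, the `post` function, and the amounts are hypothetical, and a real DBMS applies locks to database records rather than to in-memory objects.

```python
import threading

class Record:
    """A single database record with its own lock."""
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def post(record, amount, times):
    """Simulate one IPU posting a series of transactions to the record."""
    for _ in range(times):
        with record.lock:              # lock out other IPUs
            current = record.balance   # read
            record.balance = current + amount  # update and write back
        # the lock is released here, so another IPU may now proceed

account = Record(1000)
ipus = [threading.Thread(target=post, args=(account, 1, 10_000)) for _ in range(3)]
for ipu in ipus:
    ipu.start()
for ipu in ipus:
    ipu.join()
```

Because the read and the write happen while the lock is held, no IPU can read a balance that another IPU is in the middle of changing, and no update is overwritten.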
Distributed Databases: Partitioning
• Splits the central database into segments that
are distributed to their primary users
• Advantages:
• users’ control is increased by having data stored at local sites
• transaction processing response time is improved
• volume of transmitted data between IPUs is reduced
• reduces the potential data loss from a disaster
The Deadlock Phenomenon
• Especially a problem with partitioned databases
• Occurs when multiple sites lock each other out of data that
they are currently using
• One site needs data locked by another site.
• Special software is needed to analyze and resolve conflicts.

• Transactions may be terminated and restarted.


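[Figure: Deadlock among three sites. The site holding A, B has locked A and is waiting for C. The site holding C, D has locked C and is waiting for E. The site holding E, F has locked E and is waiting for A.]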
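One way such conflict-resolving software might detect a deadlock is to follow the chain of "who is waiting for whom" and look for a cycle. The sketch below is a minimal illustration, not the method of any particular DBMS. The site names and the `find_deadlock` function are hypothetical; the locks mirror the three-site situation described above (A locked while waiting for C, C locked while waiting for E, E locked while waiting for A).

```python
holds = {"site_1": "A", "site_2": "C", "site_3": "E"}      # site -> data it has locked
waits_for = {"site_1": "C", "site_2": "E", "site_3": "A"}  # site -> data it needs

def find_deadlock(holds, waits_for):
    """Return the sites forming a wait cycle, or an empty list if there is none."""
    owner = {resource: site for site, resource in holds.items()}
    for start in waits_for:
        path, site = [], start
        # Follow the chain: each site waits on whoever owns the data it needs.
        while site in waits_for and site not in path:
            path.append(site)
            site = owner.get(waits_for[site])
        if site in path:
            return path[path.index(site):]
    return []

cycle = find_deadlock(holds, waits_for)

# Resolve by terminating one transaction (here, arbitrarily, the last in the cycle),
# which releases its lock; that transaction would then be restarted later.
victim = cycle[-1]
remaining_holds = {s: r for s, r in holds.items() if s != victim}
remaining_waits = {s: r for s, r in waits_for.items() if s != victim}
after = find_deadlock(remaining_holds, remaining_waits)
```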


Distributed Databases: Replication
• The duplication of the entire database
for multiple IPUs
• Effective for situations with a high
degree of data sharing, but no
primary user
• Supports read-only queries
• Data traffic between sites is reduced
considerably.
Concurrency Problems and Control Issues
• Database concurrency is the presence of
complete and accurate data at all IPU sites.
• With replicated databases, maintaining current
data at all locations is difficult.
• Time stamping is used to serialize transactions.
• Prevents and resolves conflicts created by updating
data at various IPUs
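The effect of time stamping can be sketched with two transactions that give different results depending on the order in which they are applied. Everything here is hypothetical: the tuple layout `(timestamp, site, operation, value)`, the operations, and the amounts. Each replica receives the same transactions in a different order; sorting by time stamp before applying them makes every replica arrive at the same value.

```python
def apply(balance, transactions):
    """Apply transactions in the order given."""
    for _timestamp, _site, operation, value in transactions:
        balance = balance + value if operation == "add" else balance * value
    return balance

def apply_serialized(balance, transactions):
    """Apply transactions in time-stamp order; the site name breaks ties."""
    return apply(balance, sorted(transactions, key=lambda t: (t[0], t[1])))

t1 = (1, "site_A", "add", 100)  # posted first
t2 = (2, "site_B", "mul", 2)    # posted second

arrival_at_a = [t1, t2]  # replica A receives them in posting order
arrival_at_c = [t2, t1]  # replica C receives them out of order

# Without serialization the replicas disagree.
unordered = (apply(1000, arrival_at_a), apply(1000, arrival_at_c))
# With time stamps both replicas reach the same balance.
serialized = (apply_serialized(1000, arrival_at_a), apply_serialized(1000, arrival_at_c))
```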
Distributed Databases and the Accountant
• The following database options impact the organization’s ability to
maintain database integrity, to preserve audit trails, and to have
accurate accounting records.
• Centralized or distributed data?
• If distributed, replicated or partitioned?
• If replicated, total or partial replication?
• If partitioned, what allocation of the data segments among the sites?
