Database Management Systems
[Figure: users (applications, transactions, programs, user queries); DBMS (Data Definition Language, Data Manipulation Language, Query Language); host operating system; physical database]
Internal Controls and DBMS
• The database management system (DBMS) stands between the
user and the database per se.
• Commercial DBMSs (e.g., Access or Oracle) actually consist of
a database plus…
• Plus software to manage the database, especially access controls
and other internal controls
• Plus software to generate reports, create data-entry forms, etc.
• The DBMS has special software to determine which data elements each
user is authorized to access and to deny unauthorized requests for
data.
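The authorization check described above can be sketched as a lookup against a table of permitted data elements. This is a minimal illustration under stated assumptions: the user names, data elements, `AUTHORIZATION_TABLE`, `AccessDenied`, and `check_request` are all hypothetical, and a real DBMS enforces this through its own privilege system rather than application code like this.

```python
# Hypothetical authorization table: user -> data elements that user may access.
AUTHORIZATION_TABLE = {
    "payroll_clerk": {"employee_id", "name", "pay_rate"},
    "sales_clerk": {"customer_id", "name", "credit_limit"},
}


class AccessDenied(Exception):
    """Raised when a request includes data elements the user may not access."""


def check_request(user: str, requested: set) -> set:
    """Grant the request only if every requested data element is authorized."""
    allowed = AUTHORIZATION_TABLE.get(user, set())  # unknown users get no access
    denied = requested - allowed
    if denied:
        raise AccessDenied(f"{user} may not access: {sorted(denied)}")
    return requested


print(sorted(check_request("payroll_clerk", {"name", "pay_rate"})))  # ['name', 'pay_rate']
```

A request such as `check_request("sales_clerk", {"pay_rate"})` raises `AccessDenied`, because pay rates are outside the sales clerk's authorized elements.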
DBMS Features
• Program Development - user-created applications
• Backup and Recovery - periodically makes backup copies of the database so that data can be restored after a loss
• Database Usage Reporting - captures statistics on database usage
(who, when, etc.)
• Database Access - authorizes access to sections of the database
• Also…
• User Programs - makes the presence of the DBMS transparent to the
user
• Direct Query - allows authorized users to access data without
programming
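The backup and recovery feature can be illustrated with SQLite, using `sqlite3.Connection.backup` from Python's standard library (Python 3.7+). This is a sketch, not how Access or Oracle perform backups: both databases are in memory, and the `accounts` table and its values are hypothetical.

```python
import sqlite3

# Live database (in memory for the sketch) with one table of records.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE accounts (acct_no INTEGER PRIMARY KEY, balance REAL)")
live.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
live.commit()

# Backup: copy the entire live database into a separate backup database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate a failure that destroys the data in the live database.
live.execute("DELETE FROM accounts")
live.commit()

# Recovery: restore the live database from the backup copy.
backup.backup(live)
restored = live.execute("SELECT acct_no, balance FROM accounts ORDER BY acct_no").fetchall()
print(restored)  # [(1, 100.0), (2, 250.0)]
```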
Data Definition Language (DDL)
• DDL is a programming language used to define the
database per se.
• It identifies the names of, and the relationships among, all data elements,
records, and files that constitute the database.
• DDL defines the database on three viewing levels:
• Internal view – physical arrangement of records (1 view)
• Conceptual view (schema) – representation of database (1 view)
• User view (subschema) – the portion of the database each user views
(many views)
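As a sketch of DDL at the conceptual (schema) level, the statements below use SQLite's dialect of SQL through Python's `sqlite3` module; the `customer` and `invoice` tables are hypothetical. They name each record type and data element, give data types, and declare a relationship between the tables. SQLite itself decides the internal (physical) arrangement of the records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements name each table (record type), its data elements, their
# data types, and the relationship between tables (via a foreign key).
conn.executescript("""
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    credit_limit REAL
);
CREATE TABLE invoice (
    invoice_no   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    amount       REAL NOT NULL
);
""")

# The DBMS stores this definition and can report it back.
columns = [row[1] for row in conn.execute("PRAGMA table_info(invoice)")]
print(columns)  # ['invoice_no', 'customer_id', 'amount']
```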
Data Manipulation Language (DML)
• DML is the proprietary programming language that a
particular DBMS uses to retrieve, process, and store data
to and from the database.
• Entire user programs may be written in the DML, or selected
DML commands can be inserted into programs written in general-purpose
languages, such as COBOL and FORTRAN.
• Can be used to ‘patch’ third-party applications to the DBMS
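The idea of inserting DML commands into a host-language program can be sketched with Python standing in for COBOL or FORTRAN and SQLite's SQL standing in for a proprietary DML. The `inventory` table and the `post_sale` function are hypothetical; the point is that the host language makes the decisions while the embedded commands retrieve and store the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_no INTEGER PRIMARY KEY, qty_on_hand INTEGER)")


def post_sale(conn: sqlite3.Connection, item_no: int, qty_sold: int) -> None:
    """Host-language logic with embedded DML: reduce stock for a sale."""
    # Embedded DML: retrieve the current record.
    row = conn.execute(
        "SELECT qty_on_hand FROM inventory WHERE item_no = ?", (item_no,)
    ).fetchone()
    if row is None or row[0] < qty_sold:
        raise ValueError("insufficient stock")  # decision made in the host language
    # Embedded DML: store the updated value.
    conn.execute(
        "UPDATE inventory SET qty_on_hand = qty_on_hand - ? WHERE item_no = ?",
        (qty_sold, item_no),
    )
    conn.commit()


conn.execute("INSERT INTO inventory VALUES (?, ?)", (101, 50))  # embedded DML: store
post_sale(conn, 101, 20)
print(conn.execute("SELECT qty_on_hand FROM inventory WHERE item_no = 101").fetchone())  # (30,)
```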
Query Language
• The query capability permits end users and professional
programmers to access data in the database without the need
for conventional programs.
• Can be an internal control issue since users may be making an ‘end
run’ around the controls built into the conventional programs
• IBM’s structured query language (SQL) is a fourth-generation
language that has emerged as the standard query language.
• Adopted by ANSI as the standard query language for relational
databases
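To illustrate querying without a conventional program, the sketch below answers a question with a single SQL statement run through SQLite; the `sales` table and its rows are hypothetical. The user states what is wanted and the DBMS works out how to retrieve it, which is also why such queries bypass any edit checks coded into application programs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (invoice_no INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "West", 80.0), (3, "East", 200.0)],
)

# An ad hoc SQL query: the user states WHAT is wanted, not HOW to get it.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
result = conn.execute(query).fetchall()
print(result)  # [('East', 320.0), ('West', 80.0)]
```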
Functions of the DBA
Database Conceptual Models
• Refers to the particular method used to organize
records in a database
• A.k.a. “logical data structures”
• Objective: develop the database efficiently so that
data can be accessed quickly and easily
• There are three main models:
• hierarchical (tree structure)
• network
• relational
• Most existing databases are relational. Some legacy systems
use hierarchical or network databases.
The Relational Model
• The relational model portrays data in the form of two-dimensional
‘tables’.
• Its strength is the ease with which tables may be linked to
one another.
• Such linking is a major weakness of hierarchical and network databases.
• The relational model is based on the relational algebra functions
of restrict, project, and join.
Relational Algebra
• RESTRICT – filters out rows
• PROJECT – filters out columns
• JOIN – builds a new table or data set from multiple existing tables; in the example below, rows are joined on matching values of column Y:
Table 1      Table 2      Joined table
X    Y       Y    Z       X    Y    Z
X1   Y1      Y1   Z1      X1   Y1   Z1
X2   Y2      Y2   Z2      X2   Y2   Z2
X3   Y1      Y3   Z3      X3   Y1   Z1
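The three relational algebra functions can be sketched in plain Python, modeling a table as a list of rows and each row as a dict of column to value (an illustrative assumption, not how a DBMS stores tables). The data reproduces the JOIN example on this slide. The join is a simple nested loop; real DBMSs typically use indexes or hash joins instead.

```python
# Tables are modeled as lists of rows; each row is a dict of column -> value.
TABLE_1 = [{"X": "X1", "Y": "Y1"}, {"X": "X2", "Y": "Y2"}, {"X": "X3", "Y": "Y1"}]
TABLE_2 = [{"Y": "Y1", "Z": "Z1"}, {"Y": "Y2", "Z": "Z2"}, {"Y": "Y3", "Z": "Z3"}]


def restrict(table, predicate):
    """RESTRICT: keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """PROJECT: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in table:
        projected = {col: row[col] for col in columns}
        if projected not in result:
            result.append(projected)
    return result


def join(left, right, on):
    """JOIN: combine each pair of rows whose values in column `on` match."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[on] == rrow[on]]


only_y1 = restrict(TABLE_1, lambda row: row["Y"] == "Y1")  # rows X1 and X3
y_values = project(TABLE_1, ["Y"])                         # [{'Y': 'Y1'}, {'Y': 'Y2'}]
joined = join(TABLE_1, TABLE_2, on="Y")
# joined == [{'X': 'X1', 'Y': 'Y1', 'Z': 'Z1'},
#            {'X': 'X2', 'Y': 'Y2', 'Z': 'Z2'},
#            {'X': 'X3', 'Y': 'Y1', 'Z': 'Z1'}]
```

Row Y3 in Table 2 has no match in Table 1, so it does not appear in the joined table.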
Associations and Cardinality
• Association – the labeled line connecting two entities or tables in
a data model
• Describes the nature of the relationship between them
• Represented with a verb, such as ships, requests, or receives
• Cardinality – the degree of association between two entities
• The number of possible occurrences in one table that are associated
with a single occurrence in a related table
• Used to determine primary keys and foreign keys
“Crow’s Feet” Cardinalities
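• (1:0,1) – one to zero or one
• (1:1) – one to exactly one
• (1:0,M) – one to zero or many
• (1:M) – one to one or many
• (M:M) – many to many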
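The way cardinality drives key placement can be sketched in SQLite through Python's `sqlite3` module; all table names are hypothetical. For a 1:M association, the primary key of the "one" side is embedded in the "many" side as a foreign key; an M:M association is commonly implemented with a separate link table that holds a foreign key to each side. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- (1:M): one customer places many orders, so the customer's primary key
-- is embedded in the order table as a foreign key.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_order (
    order_no    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

-- (M:M): an order lists many items and an item appears on many orders,
-- so a separate link table holds a foreign key to each side.
CREATE TABLE item (item_no INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE order_line (
    order_no INTEGER REFERENCES sales_order(order_no),
    item_no  INTEGER REFERENCES item(item_no),
    quantity INTEGER,
    PRIMARY KEY (order_no, item_no)  -- composite key built from both foreign keys
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO sales_order VALUES (10, 1)")  # accepted: customer 1 exists
try:
    conn.execute("INSERT INTO sales_order VALUES (11, 99)")  # no customer 99 exists
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```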
Properly Designed Relational Tables
• Each row in the table must be unique in at least one attribute,
which is the primary key.
• Tables are linked by embedding the primary key into the related
table as a foreign key.
• The attribute values in any column must all be of the same
class or data type.
• Each column in a given table must be uniquely named.
• Tables must conform to the rules of normalization, i.e., free
from structural dependencies or anomalies.
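Two of these rules can be seen enforced by a DBMS in the following SQLite sketch (the `vendor` and `bad_vendor` tables are hypothetical). A second row with an existing primary key value is rejected, and so is a table definition that repeats a column name. SQLite is loose about column data types by default, so the same-data-type rule is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (vendor_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO vendor VALUES (1, 'Ajax Supply')")

# Rule: each row must be unique in its primary key.
try:
    conn.execute("INSERT INTO vendor VALUES (1, 'Duplicate Vendor')")
    duplicate_key_accepted = True
except sqlite3.IntegrityError:
    duplicate_key_accepted = False

# Rule: each column in a given table must be uniquely named.
try:
    conn.execute("CREATE TABLE bad_vendor (vendor_no INTEGER, vendor_no TEXT)")
    duplicate_column_accepted = True
except sqlite3.OperationalError:
    duplicate_column_accepted = False

print(duplicate_key_accepted, duplicate_column_accepted)  # False False
```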
Three Types of Anomalies
• Insertion Anomaly: A new item cannot be added to
the table until at least one entity uses a particular
attribute item.
• Deletion Anomaly: If an attribute item used by only
one entity is deleted, all information about that
attribute item is lost.
• Update Anomaly: A change to an attribute value
must be made in every row in which that
value appears.
• Anomalies can be corrected by creating additional
relational tables.
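The anomalies can be reproduced with a small hypothetical unnormalized table in which supplier data is repeated on every part row. This plain-Python sketch shows an update anomaly and a deletion anomaly, then the correction of storing supplier facts once in a separate table, which also lets a new supplier be recorded before it supplies any part (avoiding the insertion anomaly).

```python
# Hypothetical unnormalized table: supplier data is repeated on every part row.
inventory = [
    {"part": "P1", "supplier": "S1", "supplier_address": "12 Elm St"},
    {"part": "P2", "supplier": "S1", "supplier_address": "12 Elm St"},
    {"part": "P3", "supplier": "S2", "supplier_address": "9 Oak Ave"},
]

# Update anomaly: changing S1's address on only one row leaves conflicting values.
inventory[0]["supplier_address"] = "40 Pine Rd"
s1_addresses = {r["supplier_address"] for r in inventory if r["supplier"] == "S1"}
print(sorted(s1_addresses))  # ['12 Elm St', '40 Pine Rd']

# Deletion anomaly: deleting part P3 (S2's only part) erases everything about S2.
inventory = [r for r in inventory if r["part"] != "P3"]
print(any(r["supplier"] == "S2" for r in inventory))  # False

# Correction: store supplier facts once, in a separate table keyed by supplier;
# parts refer to suppliers by key. Supplier S3 can be added with no parts yet.
suppliers = {"S1": "40 Pine Rd", "S2": "9 Oak Ave", "S3": "5 Birch Ln"}
parts = [{"part": "P1", "supplier": "S1"}, {"part": "P2", "supplier": "S1"}]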
Advantages of Relational Tables
• Removes all three types of anomalies
• Various items of interest (customers,
inventory, sales) are stored in
separate tables.
• Space is used efficiently.
• Very flexible – users can form ad hoc
relationships
The Normalization Process
• A process which systematically splits unnormalized complex
tables into smaller tables that meet two conditions:
• all nonkey (secondary) attributes in the table are dependent on the
primary key
• all nonkey attributes are independent of the other nonkey attributes
• When unnormalized tables are split and reduced to third normal
form, they must then be linked together by foreign keys.
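The splitting and relinking described above can be sketched on a hypothetical flat sales table. The dict-based tables are an illustrative assumption. `customer_name` depends on `customer_no` rather than on the invoice, so customer facts move to their own table, and each invoice keeps `customer_no` as a foreign key. Linking the smaller tables back through their keys reproduces the original rows, showing the split lost no information.

```python
# Hypothetical flat table: one row per invoice line.
flat = [
    {"invoice_no": 1, "customer_no": "C1", "customer_name": "Acme", "item": "bolt", "qty": 10},
    {"invoice_no": 1, "customer_no": "C1", "customer_name": "Acme", "item": "nut", "qty": 5},
    {"invoice_no": 2, "customer_no": "C2", "customer_name": "Baker", "item": "bolt", "qty": 3},
]

# customer_name depends on customer_no (a nonkey attribute of the invoice),
# so customer facts move to their own table keyed by customer_no.
customer = {r["customer_no"]: r["customer_name"] for r in flat}
# Each invoice keeps customer_no as a foreign key into the customer table.
invoice = {r["invoice_no"]: r["customer_no"] for r in flat}
# Line items are keyed by (invoice_no, item) and qty depends on that whole key.
line_item = {(r["invoice_no"], r["item"]): r["qty"] for r in flat}

# Linking the tables back together through the keys recovers the original rows.
rebuilt = [
    {
        "invoice_no": inv,
        "customer_no": invoice[inv],
        "customer_name": customer[invoice[inv]],
        "item": item,
        "qty": qty,
    }
    for (inv, item), qty in line_item.items()
]
print(rebuilt == flat)  # True
```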
Steps in Normalization
• Start: unnormalized table with repeating groups
• Remove repeating groups → first normal form (1NF)
• Remove partial dependencies → second normal form (2NF)
• Remove transitive dependencies → third normal form (3NF)
• Remove remaining anomalies → higher normal forms
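Finding partial and transitive dependencies means asking whether one set of columns determines another. The hypothetical helper `fd_holds` below checks whether a set of rows is consistent with such a dependency, meaning no two rows agree on the determinant but disagree on the dependent column. It is a sketch under one important caveat: sample data can refute a dependency but cannot prove one, because a dependency is a rule about all valid data.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on the `determinant` columns
    but disagree on the `dependent` column."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        # setdefault records the first dependent value seen for this key.
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


# Hypothetical 1NF table with composite primary key (invoice_no, item).
rows = [
    {"invoice_no": 1, "item": "bolt", "qty": 10, "price": 0.25, "customer_no": "C1"},
    {"invoice_no": 1, "item": "nut", "qty": 5, "price": 0.10, "customer_no": "C1"},
    {"invoice_no": 2, "item": "bolt", "qty": 3, "price": 0.25, "customer_no": "C2"},
]

# Partial dependency: price depends on item alone, only part of the composite key.
print(fd_holds(rows, ["item"], "price"))  # True
# Another partial dependency: customer_no depends on invoice_no alone.
print(fd_holds(rows, ["invoice_no"], "customer_no"))  # True
# qty needs the whole key: item alone does not determine it.
print(fd_holds(rows, ["item"], "qty"))  # False
```

Removing the two partial dependencies would move `price` to an item table and `customer_no` to an invoice table, leaving `qty` with the full composite key.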
Accountants and Data Normalization
• Update anomalies can generate conflicting and obsolete
database values.
• Insertion anomalies can result in unrecorded transactions and
incomplete audit trails.
• Deletion anomalies can cause the loss of accounting records
and the destruction of audit trails.
• Accountants should understand the data normalization process
and be able to determine whether a database is properly
normalized.
Six Phases in Designing Relational Databases
1. Identify entities
• identify the primary entities of the
organization
• construct a data model of their
relationships
2. Construct a data model showing entity
associations
• determine the associations between
entities
• model associations into an ER diagram
3. Add primary keys and attributes
• assign primary keys to all entities in the model
to uniquely identify records
• every attribute should appear in one or more
user views
4. Normalize and add foreign keys
• remove repeating groups, partial and transitive
dependencies
• assign foreign keys to be able to link tables
5. Construct the physical database
• create physical tables
• populate tables with data
6. Prepare the user views
• normalized tables should support all required
views of system users
• user views prevent users from accessing
unauthorized data
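Phases 5 and 6 can be sketched in SQLite: create and populate a physical table, then define a user view that exposes only the columns a particular user needs. The `employee` table and `employee_directory` view are hypothetical. SQLite has no user accounts or GRANT statements; in a multi-user DBMS the directory clerk would be given access to the view only, not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Phase 5: create the physical table and populate it with data.
CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO employee VALUES (1, 'Lee', 'Sales', 52000), (2, 'Ortiz', 'IT', 61000);

-- Phase 6: user view for a hypothetical directory clerk, without the salary column.
CREATE VIEW employee_directory AS
    SELECT emp_no, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_no")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)  # ['emp_no', 'name', 'dept']
print(rows)  # [(1, 'Lee', 'Sales'), (2, 'Ortiz', 'IT')]
```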
Distributed Data Processing (DDP)
• Data processing is organized around several information
processing units (IPUs) distributed throughout the organization.
• Each IPU is placed under the control of the end user.
• DDP does not always mean total decentralization.
• IPUs in a DDP system are still connected to one another and
coordinated.
• Typically, DDP systems use a centralized database.
• Alternatively, the database can be distributed, similar to the
distribution of the data processing capability.
Distributed Data Processing
[Figure: distributed data processing with a centralized database at the central site; other sites labeled A,B; C,D; and E,F]