0% found this document useful (0 votes)
128 views21 pages

Database Modelling & Design (Chap 1 - 3)

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
128 views21 pages

Database Modelling & Design (Chap 1 - 3)

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

Database Modeling and Design

Lecture Notes

Contents
I. Database Systems and the Life Cycle (Chapter 1)……………………2
Introductory concepts; objectives of database management 2
Relational database life cycle 3
Characteristics of a good database design process 7
II. Requirements Analysis (Chapter 3)………………………………….8

III. Entity-Relationship (ER) Modeling (Chapters 2-4)……………… 11


Basic ER modeling concepts 11
Schema integration methods 22
Entity-relationship 26
Transformations from ER diagrams to SQL Tables 29
IV. Normalization and normal forms (Chapter 5)………………………35
First normal form (1NF) to third normal form (3NF) and BCNF 35
3NF synthesis algorithm (Bernstein) 42
Fourth normal form (4NF) 47

V. Access Methods (Chapter 6)…………………………..………………50


Sequential access methods 50
Random access methods 52
Secondary Indexes 58
Denormalization 62
Join strategies 64
VI. Database Distribution Strategies (Chapter 8)……………………….66
Requirements of a generalized DDBMS: Date’s 12 Rules 68
Distributed database requirements 72
The non-redundant “ best fit” method 74
The redundant “all beneficial sites” method 77

Prepared by:
Eng. TANG Lineker

School of Engineering & Applied Technology


ZTF UNIVERSITY

1
I. Database Systems and the Life Cycle
Introductory Concepts
data—a fact, something upon which an inference is based (information or knowledge has
value, data has cost)
data item—smallest named unit of data that has meaning in the real world (examples: last
name, address, ssn, political party)
data aggregate (or group) -- a collection of related data items that form a
whole concept; a simple group is a fixed collection, e.g. date (month, day, year); a
repeating group is a variable length collection, e.g. a set of aliases.
record—group of related data items treated as a unit by an application program (examples:
presidents, elections, congresses)

file—collection of records of a single type (examples: president, election)

database—computerized collection of interrelated stored data that serves the needs of


multiple users within one or more organizations, i.e. interrelated collections of records
of potentially many types. Motivation for databases over files: integration for easy access
and update, non-redundancy, multi-access.

database management system (DBMS) -- a generalized software system for


manipulating databases. Includes logical view (schema, sub-schema), physical view
(access methods, clustering), data manipulation language, data definition language, utilities
- security, recovery, integrity, etc.

database administrator (DBA) -- person or group responsible for the effective use of
database technology in an organization or enterprise. Motivation: control over all phases of
the lifecycle.

Objectives of Database Management


1. Data availability—make an integrated collection of data available to a wide variety of
users
* at reasonable cost—performance in query update, eliminate or control data
redundancy
* in meaningful format—data definition language, data dictionary
* easy access—query language (4GL, SQL, forms, windows, menus);
embedded SQL, etc.; utilities for editing, report generation, sorting

2. Data integrity—insure correctness and validity


* checkpoint/restart/recovery
* concurrency control and multi-user updates
* accounting, audit trail (financial, legal)

3. Privacy (the goal) and security (the means)


* schema/sub-schema, passwords

4. Management control—DBA: lifecycle control, training, maintenance

2
5. Data independence (a relative term) -- avoids reprogramming of applications, allows
easier conversion and reorganization
* physical data independence—program unaffected by changes in the storage structure or
access methods

* logical data independence—program unaffected by changes in the schema

* Social Security Administration example (1980ís)


- changed benefit checks from $999.99 to $9999.99 format
- had to change 600 application programs
- 20,000 work hours needed to make the changes (10 work years)
* Student registration system—cannot go to a 4-digit or hexadecimal course numbering
system because of difficulty changing programs

*Y2K (year 2000) problem—many systems store 2-digit years (e.g. ‘02-OCT-98’) in their
programs and databases, that give incorrect results when used in date arithmetic (especially
subtraction), so that ‘00’ is still interpreted as 1900 rather than 2000. Fixing this problem
requires many hours of reprogramming and database alterations for many companies and
government agencies.

Relational Database Lifecycle


1. Requirements formulation and analysis
* natural data relationships (process-independent)
* usage requirements (process-dependent)
* hardware/software platform (OS, DBMS)
* performance and integrity constraints
* result: requirements specification document, data dictionary entries

2. Logical database design


2.1 ER modeling (conceptual design)
2.2 View integration of multiple ER models
2.3 Transformation of the ER model to SQL tables
2.4 Normalization of SQL tables (up to 3NF or BCNF)
*result: global database schema, transformed to table definitions
3. Physical database design
* index selection (access methods)
* clustering

4. Database distribution (if needed for data distributed over a network)


* data fragmentation, allocation, replication

5. Database implementation, monitoring, and modification

3
4
5
6
Characteristics of a Good Database Design Process
* iterative requirements analysis
- interview top-down
- use simple models for data flow and data relationships
- verify model

* stepwise refinement and iterative re-design

* well-defined design review process to reduce development costs review team


-database designers
-DBMS software group
-end users in the application areas when to review
- after requirements analysis & conceptual design
- after physical design
- after implementation (tuning) meeting format
- short documentation in advance
- formal presentation
- criticize product, not person
- goal is to locate problems, do solutions off line
- time limit is 1-2 hours

7
II. Requirements Analysis

Purpose - identify the real-world situation in enough detail


to be able to define database components. Collect two types of data: natural data (input to the
database) and processing data (output from the database).

Natural data requirements (what goes into the database)

1. Organizational objectives
- sell more cars this year
- move into to recreational vehicle market

2. Information system objectives


- keep track of competitors’ products and prices
- improve quality and timing of data to management regarding production schedule delays,
etc.
- keep track of vital resources needed to produce and market a product

3. Organizational structure/chart

4. Administrative and operational policies


- annual review of employees
- weekly progress reports
- monthly inventory check
- trip expense submission

5. Data elements, relationships, constraints, computing environment

Processing requirements (what comes out of the database)

1. Existing applications - manual, computerized

2. Perceived new applications

* quantifies how data is used by applications

* should be a subset of data identified in the natural relationships


(but may not be due to unforeseen applications)

* problem - many future applications may be unknown

8
Data and Process Dictionary Entries for Requirements Analysis
in the Database Design Lifecycle
Entity Description (possibly in a data dictionary)
Name customer
Reference-no 4201
Cardinality 10,000
Growth rate 100 per month
Synonyms user, buyer
Role (or description) someone who purchases or rents a
product made by the company.
Security level 0 (customer list is public)
Subtypes adults, minors
Key attribute(s) cust-no
Non-key attribute(s) cust-name, addr, phone, payment-status Relationship to
other entities salesperson, order, product
Used in which applications billing, advertising
Attribute description (data elements in a data dictionary)
Name cust-no
Reference-no 4202
Range of legal values 1 to 999,999
Synonyms cno, customer-number
Data type integer
Description customer id number set by the company.
Key or nonkey key
Source of data table of allowable id numbers
Used in applications billing
Attribute trigger /*describes actions that occur when a
data element is queried or updated*/
Relationship description
Name purchase
Reference-no 511037
Degree binary
Entities and connectivity customer(0,n), product(1,n)
Synonyms buy
Attributes (of the relationship) quantity, order-no
Assertions a customer must have purchased at
least one product, but some products
may not have been purchased as yet by
any customers.
Process (application) description
Name payroll
Reference-no 163
Frequency bi-weekly
Priority 10
Deadline noon Fridays
Data elements used emp-name, emp-salary
Entities used employee
Data volume (how many entities) implicit from entity cardinality

9
Interviews at different levels
Top management - business definition, plan/objectives, future plans

Middle management - functions in operational areas, technical areas, job-titles, job functions

Employees - individual tasks, data needed, data out

Specific end-users of a DBMS - applications and data of interest

Basic rules in interviewing

1. Investigate the business first

2. Agree with the interviewee on format for documentation (ERD, DFD, etc.)

3. Define human tasks and known computer applications

4. Develop and verify the flow diagram(s) and ER diagram(s)

5. Relate applications to data (this helps your programmers)

Example: order entry clerk

Function: Take customer orders and either fill them or make adjustments.
Frequency: daily

Task Def Volume Data Elements

1. Create order 2000 A, B, E, H


2. Validate order 2000 A, B, G, H, J
3. Fill out error form 25 A, C
4. Reserve item/price 6000 A, D, H
5. Request alternate items 75 A, E, I, K,M
6. Enter unit price 5925 A, F, J, N

10
III. Entity-Relationship (ER) Modeling

Basic ER Modeling Concepts

Entity - a class of real world objects having common characteristics and properties about
which we wish to record information.

Relationship - an association among two or more entities

* occurrence - instance of a relationship is the collective instances of the related entities

* degree - number of entities associated in the relationship (binary, ternary, other n-ary)

* connectivity - one-to-one, one-to-many, many-to-many

* existence dependency (constraint) - optional/mandatory

Attribute - a characteristic of an entity or relationship

* Identifier - uniquely determines an instance of an entity

* Identity dependence - when a portion of an identifier is inherited from another entity

* Multi-valued - same attribute having many values for one entity

* Surrogate - system created and controlled unique key (e.g. Oracle’s “create sequence”)

11
12
13
14
Super-class (super-type)/subclass (subtype) relationship
Generalization
* similarities are generalized to a super-class entity, differences are specialized to a subclass entity,
called an “ISA” relationship (“specialization” is the inverse relationship)

* disjointness constraint - there is no overlap among subclasses

* completeness constraint - constrains subclasses to be all-inclusive of the super-class or not (i.e.


total or partial coverage of the superclass)

* special property: hierarchical in nature

* special property: inheritance - subclass inherits the primary key of the super-class, super-class has
common nonkey attributes, each subclass has specialized non-key attributes

Aggregation
* “part-of” relationship among entities to a higher type aggregate entity (“contains” is the inverse
relationship)

* attributes within an entity, data aggregate (mo-day-year)

* entity clustering variation: membership or “is-member-of” relationship

15
Constraints

16
Constraints in ER modeling
* role - the function an entity plays in a relationship

* existence constraint (existence dependency) - weak entity

* exclusion constraint - restricts an entity to be related to only of several other

* entities at a given point in time

- mandatory/optional
- specifies lower bound of connectivity of entity instances
- participating in a relationship as 1 or 0

* uniqueness constraint – one-to-one functional dependency among key attributes


in a relationship: binary, ternary, or higher n-ary

17
18
19
20
21

You might also like