Part 1: Introduction
References:
Ramez Elmasri, Shamkant B. Navathe: Fundamentals of Database Systems, 3rd Ed., Ch. 16, Practical Database Design and Tuning.
Toby J. Teorey: Database Modeling & Design, 3rd Edition. Morgan Kaufmann, 1999, ISBN 1-55860-500-2.
Graeme C. Simsion, Graham C. Witt: Data Modeling Essentials, 2nd Edition. Coriolis, 2001, ISBN 1-57610-872-4, 459 pages.
Robert J. Muller: Database Design for Smarties Using UML for Data Modeling. Morgan Kaufmann, 1999, ISBN 1-55860-515-0, ca. $40.
Peter Koletzke, Paul Dorsey: Oracle Designer Handbook, 2nd Edition. ORACLE Press, 1998, ISBN 0-07-882417-6, 1075 pages, ca. $40.
Martin Fowler, Kendall Scott: UML Distilled, Second Edition. Addison-Wesley, 2000, ISBN 0-201-65783-X, 185 pages.
Grady Booch, James Rumbaugh, Ivar Jacobson: The Unified Modeling Language User Guide. Addison Wesley Longman, 1999, ISBN 0-201-57168-4, 482 pages.
Carlo Batini, Stefano Ceri, Shamkant B. Navathe: Conceptual Database Design. Benjamin/Cummings, 1992, ISBN 0-8053-0244-1, 470 pages.
Richard Barker: CASE*Method: Tasks and Deliverables. Addison-Wesley, 1990, ISBN 0201416972, ca. $69.
Rauh/Stickel: Konzeptuelle Datenmodellierung (in German). Teubner, 1997.
Udo Lipeck: Skript zur Vorlesung Datenbanksysteme (in German), Univ. Hannover, 1996.
1. Introduction
Objectives
After completing this chapter, you should be able to:
explain correctness and quality criteria for database schemas, and explain difficulties and risks.
enumerate what else, besides the mere schema design, needs to be done during a database project.
explain the relationship between application programs and database design.
explain the three phases of database design.
Why does one not directly start with a relational design?
Overview
1. The Task of Database Design
2. Users, Application Programs, Data
3. Phases of Database Design
4. System Development Lifecycle
5. Summary
This is done by selecting, aggregating and combining information that was previously entered. The system must know the structure of the data to support powerful queries.
Stefan Brass: Database Design, Universität Giessen, 2002
[Figure: DB Schema and DB State (Instance)]
Entering, modifying, or deleting information changes the state:
Old State --Update--> New State
During DB design, a formal model of some aspects of the real world (a mini-world) must be built.
The information which is needed to answer the required questions must be available.
A model needs to be structured. Though text may contain all of the necessary information, it cannot be used by a computer for answering questions (except with higher AI).
A legal question about the real world cannot be formulated as a query to the database.
The needed information is missing in the database.
Database states are possible which do not correspond to a legal state in the real world.
Constraints (1)
Two kinds of errors must be distinguished: Entering wrong data, i.e. the DB state corresponds to a different situation of the real world than the actual one.
E.g., 8 points given for Homework 1 in the DB vs. 10 in the real world. Then the DB state is wrong, but not the schema. What can be done to guard against such errors?
Constraints (2)
If the DB contains illegal/meaningless data, it becomes inconsistent with our general understanding of the real world. If a programmer assumes that the data fulfills some condition, but it actually does not, this can have all kinds of strange effects (including the loss of data).
E.g. the programmer assumes that a certain column cannot contain null values. So he/she uses no indicator variable when fetching data. As long as there are no null values, this works. But if the schema does not prevent this, after some time, somebody will enter a null value. Then the program will crash (with a user-unfriendly error message).
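The null-value scenario above can be reproduced in a few lines. This is only a sketch using Python's built-in sqlite3 module instead of embedded SQL with indicator variables; the table names results_loose and results_strict, the function total_points, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the schema does NOT forbid null values in "points".
conn.execute("CREATE TABLE results_loose (student TEXT, points INTEGER)")
conn.execute("INSERT INTO results_loose VALUES ('Ann', 8)")
conn.execute("INSERT INTO results_loose VALUES ('Bob', NULL)")  # accepted by the DBMS

def total_points(table):
    """Sum points in application code, assuming the column is never null."""
    total = 0
    for (points,) in conn.execute(f"SELECT points FROM {table}"):
        total += points  # raises TypeError as soon as points is None
    return total

try:
    total_points("results_loose")
    crashed = False
except TypeError:
    crashed = True  # the program fails long after the bad row was entered

# Same data, but the programmer's assumption is now part of the schema.
conn.execute(
    "CREATE TABLE results_strict (student TEXT NOT NULL, points INTEGER NOT NULL)"
)
conn.execute("INSERT INTO results_strict VALUES ('Ann', 8)")
try:
    conn.execute("INSERT INTO results_strict VALUES ('Bob', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the error surfaces at data entry, not inside the program

strict_total = total_points("results_strict")
```

With the NOT NULL declaration, the mistake is reported to the person entering the data, and the unchanged application code keeps working.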
Constraints (3)
Given only the structural definitions (e.g. tables, columns, column datatypes), there are usually still many database states which do not correspond to states of the real world. Additional conditions which database states have to satisfy should be specified. In this way, invalid states are excluded. Such conditions are called (integrity) constraints.
Constraints (4)
Each data model has special support for certain common kinds of constraints, e.g. the relational model and SQL offer:
Keys: Unique identification of rows.
Foreign keys: Dynamic domain defined by a key.
NOT NULL: Entries for a column cannot be empty.
CHECK: Conditions that refer only to single rows.
Arbitrary conditions can be specified as constraints (in natural language, logic, as SQL queries, programs, ...).
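The four kinds of constraints listed above might be declared as in the following sketch. It uses SQLite through Python's sqlite3 module; the tables students, exercises, and results, their columns, and the helper is_rejected are assumptions made for this example and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE students (
    sid   INTEGER PRIMARY KEY,                            -- key
    name  TEXT NOT NULL                                   -- NOT NULL
);
CREATE TABLE exercises (
    eno        INTEGER PRIMARY KEY,
    maxpoints  INTEGER NOT NULL CHECK (maxpoints > 0)     -- CHECK on a single row
);
CREATE TABLE results (
    sid     INTEGER NOT NULL REFERENCES students(sid),    -- foreign key
    eno     INTEGER NOT NULL REFERENCES exercises(eno),   -- foreign key
    points  INTEGER NOT NULL CHECK (points >= 0),
    PRIMARY KEY (sid, eno)                                -- composite key
);
INSERT INTO students  VALUES (101, 'Ann');
INSERT INTO exercises VALUES (1, 10);
INSERT INTO exercises VALUES (2, 10);
INSERT INTO results   VALUES (101, 1, 8);
""")

def is_rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

violations = {
    "key":         is_rejected("INSERT INTO students VALUES (101, 'Bob')"),
    "foreign key": is_rejected("INSERT INTO results VALUES (999, 1, 5)"),
    "NOT NULL":    is_rejected("INSERT INTO students VALUES (102, NULL)"),
    "CHECK":       is_rejected("INSERT INTO results VALUES (101, 2, -3)"),
}
result_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

Each of the four statements violates exactly one declared constraint, so the database state is unchanged after the attempts.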
Constraints (5)
Why specify constraints?
Some protection against data input errors.
Constraints document knowledge about DB states.
Enforcement of laws / company standards.
Protection against inconsistency if redundant data is stored.
Queries/programs become simpler if the programmer is not required to handle the most general cases (i.e., cases where the constraint is not satisfied).
E.g., if columns are known to be not null: no indicator variable.
Constraints (6)
Constraints and Exceptions: Constraints cannot have any exceptions. A good DBMS will reject any attempt to enter data which violates a specified constraint. One can expect that eventually there will be exceptional situations in which the DBS seems inflexible because of the specified constraints. Only conditions that are unquestionable should be defined as constraints.
The task of database design is certainly not complete when only a set of CREATE TABLE statements is delivered. Additional documentation is needed.
The documentation might include a small example DB state. There should be some explanation for every schema element (e.g. tables and columns).
These can be used later in the help files for input fields.
If two patients have bronchitis, is this counted as two different health problems or two instances of the same problem?
Overview
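1. The Task of Database Design
2. Users, Application Programs, Data
3. Phases of Database Design
4. System Development Lifecycle
5. Summary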
The database schema is smaller than the complete specification of the needed programs.
It can be understood as a concise representation of the essential functions of a large subset of the required programs.
It was difficult to use the data for purposes other than the one for which they were originally collected.
It is frustrating if one knows that the information is in there, but the new evaluation would be too difficult to program (or even require manual analysis). [Good from the data privacy standpoint ...]
Thus, data must be seen independently of a specific program. Vice versa, programs should not depend on the way the data is stored (data organisation/file format).
certain application programs are executed so often that a single disk cannot support the required number of accesses per second.
In relational DBMSs, indexes can be added or deleted without any change to an application program. SQL is a declarative language: One specifies only which conditions the result must satisfy, but not how it should be computed. The query optimizer automatically uses indexes.
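A small experiment can make this independence visible. The sketch below assumes SQLite via Python's sqlite3 module; the table results, the index name idx_results_student, and the generated rows are invented for illustration, and the text returned by EXPLAIN QUERY PLAN is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (student TEXT, exercise INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(f"s{i % 100}", i // 100, i % 11) for i in range(1000)],
)

# The application's query: it states conditions only, no access path.
QUERY = "SELECT exercise, points FROM results WHERE student = ? ORDER BY exercise"

def run_query(student):
    return conn.execute(QUERY, (student,)).fetchall()

def query_plan(student):
    """Ask SQLite how it would evaluate QUERY; the last column holds the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (student,)).fetchall()
    return " | ".join(row[-1] for row in rows)

rows_before, plan_before = run_query("s7"), query_plan("s7")

# A change to the physical design only; QUERY and run_query are not touched.
conn.execute("CREATE INDEX idx_results_student ON results (student)")

rows_after, plan_after = run_query("s7"), query_plan("s7")
```

The query text and its result are identical before and after; only the plan chosen by the optimizer changes from a table scan to a search using the new index.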
The Internal Schema (or Physical Schema) describes the way the data is actually stored.
E.g. relations plus indexes, disks, and many storage parameters.
Users can refer (in SQL queries) only to the conceptual schema.
However, these are not part of the SQL standard and are highly system-dependent.
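[Figure: three-schema architecture with the external schemas (up to External Schema n) on top, the Conceptual Schema in the middle, and the Internal Schema at the bottom. ANSI/SPARC 1978]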
Overview
1. The Task of Database Design
2. Users, Application Programs, Data
3. Phases of Database Design
4. System Development Lifecycle
5. Summary
DBMS features do not influence conceptual design, and only partially influence the logical design.
This ensures that the conceptual design is not invalidated if a different DBMS is later used.
In the conceptual schema, non-standard datatypes for the attributes can be used.
Of course, this makes the logical design more difficult. But object-relational systems do have an extensible type system.
Old DB designs are often heavily denormalized, which makes changes difficult and expensive.
Each piece of redundant data (that is not completely managed by the DBMS, like, e.g. an index) makes application programs more difficult and inconsistencies possible. Denormalization also means that certain pieces of information can only be stored together, which makes the schema less flexible.
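The risk of redundant data can be illustrated with a deliberately denormalized table. This is a sketch only, using SQLite via Python's sqlite3 module; the tables results_denorm, students, and results and the example addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical denormalized table: the email is repeated in every result row.
conn.execute(
    "CREATE TABLE results_denorm (student TEXT, email TEXT, exercise INTEGER, points INTEGER)"
)
conn.executemany(
    "INSERT INTO results_denorm VALUES (?, ?, ?, ?)",
    [("Ann", "ann@old.example", 1, 8), ("Ann", "ann@old.example", 2, 10)],
)

# An application program changes the email but reaches only one of the copies.
conn.execute(
    "UPDATE results_denorm SET email = 'ann@new.example' "
    "WHERE student = 'Ann' AND exercise = 1"
)
emails_denorm = {
    row[0]
    for row in conn.execute("SELECT email FROM results_denorm WHERE student = 'Ann'")
}

# Normalized alternative: the email is stored exactly once.
conn.execute("CREATE TABLE students (student TEXT PRIMARY KEY, email TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE results (student TEXT REFERENCES students(student), "
    "exercise INTEGER, points INTEGER)"
)
conn.execute("INSERT INTO students VALUES ('Ann', 'ann@old.example')")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [("Ann", 1, 8), ("Ann", 2, 10)])
conn.execute("UPDATE students SET email = 'ann@new.example' WHERE student = 'Ann'")
emails_norm = {
    row[0]
    for row in conn.execute(
        "SELECT s.email FROM results r JOIN students s ON s.student = r.student"
    )
}
```

After the partial update, the denormalized table holds two different addresses for the same student. It also cannot record an email for a student who has not yet handed in any exercise, which is the "can only be stored together" problem.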
Often, it is not possible to create the complete ER-schema in one step, because it would be very large. Then one starts with small ER-schemas which describe only the data necessary for one application or user (or a small group of related applications).
For each application/user, one such schema is developed.
These schemas must then be integrated to get the complete enterprise data model (i.e. the conceptual schema of the database).
[Figure: ER-diagram with the entity Exercise, the relationship "solved" (minimum cardinality 0 on both sides), and the attributes Name, Points, No, MaxPoints.]
This mini-world contains students and homework exercises (entities, objects). Students have a name and an email address (attributes, properties, data about objects).
Standard tool for conceptual design. Every professional DB designer must know it well. The graphical notation helps to establish a better overview; to see the structure of the data.
It is also useful for communicating with the future users. This notation was probably an important success factor.
Thus, a schema transformation into another data model is necessary. Many variants/extensions of the ER-model have been proposed. Several different graphical notations are used.
If you know one notation, it is easy to learn another one, since the basic concepts are the same.
In the ER-model, there is a distinction between entities and relationships. In the relational model, both are represented by relations.
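As a rough illustration of this point, the students/exercises mini-world could be translated as follows. The sketch uses SQLite via Python's sqlite3 module; the table and column names, the artificial key sid, the placement of Points on the relationship, and the sample rows are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity type Student becomes a relation.
CREATE TABLE student (
    sid    INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    email  TEXT
);
-- Entity type Exercise becomes a relation.
CREATE TABLE exercise (
    eno        INTEGER PRIMARY KEY,
    maxpoints  INTEGER NOT NULL
);
-- The relationship "solved" becomes a relation as well; it refers to both entities.
CREATE TABLE solved (
    sid     INTEGER NOT NULL REFERENCES student(sid),
    eno     INTEGER NOT NULL REFERENCES exercise(eno),
    points  INTEGER NOT NULL,
    PRIMARY KEY (sid, eno)
);
INSERT INTO student  VALUES (101, 'Ann', 'ann@example.org');
INSERT INTO student  VALUES (102, 'Bob', 'bob@example.org');
INSERT INTO exercise VALUES (1, 10);
INSERT INTO exercise VALUES (2, 10);
INSERT INTO solved   VALUES (101, 1, 8);
INSERT INTO solved   VALUES (101, 2, 10);
INSERT INTO solved   VALUES (102, 1, 9);
""")

# All three ER constructs are now plain tables.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# Following the relationship means joining the three relations.
rows = conn.execute("""
    SELECT s.name, e.eno, v.points, e.maxpoints
    FROM solved v
    JOIN student s  ON s.sid = v.sid
    JOIN exercise e ON e.eno = v.eno
    ORDER BY s.name, e.eno
""").fetchall()
```

In the relational schema, nothing except the foreign keys distinguishes the relationship table from the entity tables.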
Of course, if one uses a CASE tool for managing ER-diagrams, one has to stick to the notation supported by the tool.
Modern CASE tools have some support for user-defined extensions.
CASE-Tools (1)
CASE = Computer Aided Software Engineering.
In general, CASE tools support the development of software, e.g. by managing design documents, enforcing syntax rules, performing consistency/style checks, translating between different views of a system, and supporting project management and team work.
There are special CASE-Tools for database projects, e.g. Oracle Designer, ERwin, PowerDesigner, ER Studio. A specialized graphical editor for ER-diagrams is a standard component of such tools.
CASE-Tools (2)
Standard features of database CASE-Tools:
Support for different kinds of diagrams, e.g. ER-diagrams, diagrams of relational schemas, business process diagrams.
Repository for storing all design documents.
This should include version management and consistency checks. Normally, many ER-diagrams must be managed. A single one would be too big (could only be used as wallpaper).
Automatic translation from the ER-model into the relational model (and vice versa). Automatic generation of software prototypes.
UML (1)
Currently, the Unified Modeling Language (UML) is gaining more and more acceptance. UML is a system of notations for visualizing different aspects of an object-oriented software design. UML 1.1 was adopted as a standard by the OMG (Object Management Group) on Nov. 14, 1997. Current version: 1.3.
The UML project started in 1994, when Grady Booch, Ivar Jacobson, and James Rumbaugh, authors of previously competing object-oriented design methods, joined their efforts.
UML (2)
The UML has nine common types of diagrams:
Class Diagram, Object Diagram
Use Case Diagram
Sequence Diagram, Collaboration Diagram
Statechart Diagram, Activity Diagram
Component Diagram, Deployment Diagram
UML class diagrams are similar to ER-diagrams.
The ER-model is certainly not outdated by UML, only extended (and again, the notation is slightly changed).
UML (3)
One of the CASE-tools for UML is Rational Rose.
The three UML inventors work for/own the company Rational.
Probably, many future database projects will use UML. But: Its goal is software-design, not DB design. It is more object-oriented than might be good for relational systems.
E.g. it has no built-in notion of keys.
Oracle Designer does not support UML, but Oracle JDeveloper and Sybase PowerDesigner do.
Overview
1. The Task of Database Design
2. Users, Application Programs, Data
3. Phases of Database Design
4. System Development Lifecycle
5. Summary
Oracle CASE*Method
Strategy -> Analysis -> Design -> Build and User Documentation (in parallel) -> Transition -> Production
[Barker, 1990]
an hourly rate.
Then there is no incentive to ever finish the product.
Is the project feasible in the given limits? Prioritize the project goals: Not everything that would be nice to have is worth the effort.
If it should turn out later that time or budget is insufficient: What can be sacrificed and what is essential?
One method to estimate the complexity of a project is the Function Point Method.
See: Software engineering textbooks, http://www.ifpug.org/.
The final ER-diagrams are developed, including all attributes and business rules/constraints. The function hierarchy/business process diagrams are further developed. Data flow and entity usages are analyzed.
Describe required interfaces with other software. Collect information about the expected data volumes, function frequencies, and performance expectations.
Functions are mapped into modules (application programs) and manual procedures.
The database should be filled with example data of the same size as the production database will be.
Only in this way can performance be tested and tuned.
apparently small
Documentation (1)
Documentation should be an ongoing process occurring throughout the system development process. It should accompany the first prototype the user sees and every other software deliverable. [Koletzke/Dorsey]
We all know the nightmare stories of developers who come in to modify an existing system for which there is no documentation. [Koletzke/Dorsey]
Documentation (2)
By preparing careful system and user documentation throughout the life cycle of the project, developers are not left with a major task at the end. In addition, frequently no client money is left at this point to pay to extend the development process further. [Koletzke/Dorsey] System documentation will be mainly developed during the Design phase. User documentation (and the help system) can only be developed when the design is complete.
Certain tasks (e.g. copying data between systems) might need to be done manually (extra work, possible errors).
Users who have already switched to the new system may help in training users who still have to switch.
Overview
1. The Task of Database Design
2. Users, Application Programs, Data
3. Phases of Database Design
4. System Development Lifecycle
5. Summary
Business rules are what keeps the business from descending into chaos (not everybody can do what he/she wants).
It is also important which business rules are likely to change in the future and which ones are very stable.
I watched a large insurance company struggling to introduce a new product. The hold-up was the time required to develop a supporting information system. Meanwhile, one of the company's competitors was able to introduce a similar product, making use of an existing information system, and win a major share of the market. [Simsion/Witt, 2001]
Constraints
If each student must have an email address, this attribute must be NOT NULL. If there cannot be two students with the same first and last name, these two attributes form a key.
View Definitions
The weighting of points for a course is 30% homework, 35% project, 35% final exam.
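A business rule like this weighting could be captured once in a view definition rather than in every program. The following is a sketch, assuming SQLite via Python's sqlite3 module; the table scores, the view course_total, the percentage scale, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base table: each part is stored as a percentage (0..100).
CREATE TABLE scores (
    student   TEXT PRIMARY KEY,
    homework  REAL NOT NULL,
    project   REAL NOT NULL,
    final     REAL NOT NULL
);
INSERT INTO scores VALUES ('Ann', 80, 90, 70);
INSERT INTO scores VALUES ('Bob', 100, 60, 50);

-- The weighting rule lives in exactly one place: the view definition.
CREATE VIEW course_total AS
    SELECT student,
           0.30 * homework + 0.35 * project + 0.35 * final AS total
    FROM scores;
""")

# Application programs query the view and never repeat the weights.
totals = dict(conn.execute("SELECT student, total FROM course_total"))
```

If the weighting changes, only the view has to be redefined; programs that read course_total stay as they are.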
Non-Redundancy: Every relevant aspect of the real world should be represented only once.
The schema should be minimal, i.e. no schema element can be removed without violating the completeness requirement.
Stability/Flexibility/Extensibility: The schema can be easily adapted to changing requirements.
Simplicity and Elegance
A solution with fewer, more generic schema elements might be preferable to a larger schema.
Readability
Diagrams should be drawn in a grid, line crossings should be minimized, symmetric structures should be emphasized, related concepts should be near in the diagram.
Uniformity
Style, naming conventions, abbreviations should be uniform.
The employees do not like the new system. The workers' union protests against it. The system violates data privacy laws. Or the company gets a bad reputation because of questionable practices regarding personal data.