The Entity-Relationship Model: IS698 Min Song
Overview of Database Design
Conceptual design: (ER Model is used at this
stage.)
What are the entities and relationships in the
enterprise?
What information about these entities and
relationships should we store in the database?
What are the integrity constraints or business
rules that hold?
A database 'schema' in the ER Model can be
represented pictorially (ER diagrams).
An ER diagram can be mapped into a relational schema.
ER Model Basics
[Figure: entity set Employees with attributes ssn, name, lot]
Entity: Real-world object distinguishable
from other objects. An entity is described
(in DB) using a set of attributes.
Entity Set: A collection of similar entities.
E.g., all employees.
All entities in an entity set have the same set of
attributes. (Until we consider ISA hierarchies,
anyway!)
Each entity set has a key.
Each attribute has a domain.
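The definitions above can be sketched in a few lines of Python. This is only an illustration: the class names, the sample values, and the attribute domains (string, integer) are assumptions, since the slides name the attributes ssn, name, and lot but not their types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    ssn: str   # key attribute (domain assumed: string)
    name: str  # domain assumed: string
    lot: int   # domain assumed: integer; type hints document, not enforce


class EntitySet:
    """A collection of similar entities, indexed by its key attribute."""

    def __init__(self, key_attr):
        self.key_attr = key_attr
        self._by_key = {}

    def add(self, entity):
        key = getattr(entity, self.key_attr)
        if key in self._by_key:
            # Two entities may not share a key value.
            raise ValueError(f"duplicate key {key!r}")
        self._by_key[key] = entity

    def __len__(self):
        return len(self._by_key)


employees = EntitySet(key_attr="ssn")
employees.add(Employee("e1", "Ann", 48))  # made-up sample values
employees.add(Employee("e2", "Bob", 22))
```

Adding a second entity with ssn "e1" raises ValueError, which is what "each entity set has a key" means operationally.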
[Figure: ER diagram. Relationship set Works_In (attribute since)
between Employees (ssn, name, lot) and Departments (did, dname,
budget); relationship set Reports_To on Employees with roles
supervisor and subordinate]
Consider Works_In: An employee can work in many
departments; a dept can have many employees.
In contrast, each dept has at most one manager,
according to the key constraint on Manages.
[Figure: Employees, Manages, Departments; relationship types
1-to-1, 1-to-Many, Many-to-1, Many-to-Many]
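A minimal sketch of what the key constraint on Manages means for the stored data, assuming each relationship is kept as an (ssn, did) pair. The function name and the sample identifiers are hypothetical.

```python
from collections import Counter


def key_violations(pairs):
    """Return the dids that occur in more than one (ssn, did) pair."""
    counts = Counter(did for _ssn, did in pairs)
    return sorted(did for did, n in counts.items() if n > 1)


# Works_In is many-to-many: no key constraint applies to it.
works_in = [("e1", "d1"), ("e1", "d2"), ("e2", "d1")]

# Manages: each department has at most one manager.
manages_ok = [("e1", "d1"), ("e1", "d2")]   # e1 may manage two departments
manages_bad = [("e1", "d1"), ("e2", "d1")]  # d1 would have two managers
```

Running key_violations on works_in reports d1, which is acceptable for Works_In but would break the constraint if the same pairs were stored in Manages.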
Participation Constraints
Does every department have a manager?
If so, this is a participation constraint: the participation
of Departments in Manages is said to be total (vs.
partial).
Every did value in Departments table must appear in
a tuple of the Manages relation.
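[Figure: ER diagram of Employees (ssn, name, lot) and Departments
(did, dname, budget) with relationship sets, including Works_In,
that carry the attribute since]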
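Total participation of Departments in Manages can be checked directly from the statement that every did value must appear in some Manages tuple. The sketch below assumes Manages is a list of (ssn, did) pairs; the function name is hypothetical.

```python
def departments_without_manager(department_ids, manages):
    """Return the dids that appear in no Manages tuple (ssn, did)."""
    managed = {did for _ssn, did in manages}
    return sorted(set(department_ids) - managed)
```

An empty result means participation is total; any did returned is a department that violates the constraint.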
Weak Entities
A weak entity can be identified uniquely only by considering the
primary key of another (owner) entity.
Owner entity set and weak entity set must participate in a
one-to-many relationship set (one owner, many weak
entities).
Weak entity set must have total participation in this
identifying relationship set.
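One way to picture a weak entity set is as a table keyed by the owner's key together with a partial key of the weak entity. The sketch below is an assumption-laden illustration: the class name, the variable name dependents, and the sample values are hypothetical; pname and age are taken from the figure labels.

```python
class WeakEntitySet:
    """Weak entities identified by (owner key, partial key)."""

    def __init__(self, owner_keys):
        self.owner_keys = set(owner_keys)  # keys of existing owner entities
        self.rows = {}

    def add(self, owner_key, partial_key, **attrs):
        if owner_key not in self.owner_keys:
            # Total participation: a weak entity cannot exist without an owner.
            raise KeyError(f"no owner {owner_key!r}")
        full_key = (owner_key, partial_key)
        if full_key in self.rows:
            raise ValueError(f"duplicate weak entity {full_key!r}")
        self.rows[full_key] = attrs


dependents = WeakEntitySet(owner_keys={"e1", "e2"})
dependents.add("e1", "Joe", age=7)
dependents.add("e2", "Joe", age=9)  # same pname under another owner is fine
```

The partial key alone (pname) does not identify a row; only the pair with the owner's key does.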
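[Figure: weak entity example; surviving labels: ssn, name, lot, cost, pname, age]
As in C++, or other PLs, attributes are inherited.
[Figure: ISA hierarchy under Employees; surviving labels: hourly_wages, hours_worked, contractid]
Aggregation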
Design choices:
Should a concept be modeled as an entity
or an attribute?
Should a concept be modeled as an entity
or a relationship?
Identifying relationships: Binary or ternary?
Aggregation?
Entity vs. Attribute
Should address be an attribute of Employees or an
entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address
information, and the semantics of the data:
If we have several addresses per employee, address must be
an entity (since attributes cannot be set-valued).
Binary vs. Ternary Relationships
(Contd.)
Previous example illustrated a case when two
binary relationships were better than one ternary
relationship.
An example in the other direction: a ternary
relation Contracts relates entity sets Parts,
Departments and Suppliers, and has descriptive
attribute qty. No combination of binary
relationships is an adequate substitute:
S “can-supply” P, D “needs” P, and D “deals-
with” S does not imply that D has agreed to
buy P from S.
How do we record qty?
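The argument above can be checked on a small made-up instance. The sketch stores Contracts as a mapping from (part, dept, supplier) to qty, derives the three binary relationships from it, and then shows that the binary relationships admit a triple that was never contracted. All identifiers and quantities are invented for illustration.

```python
# Contracts as a ternary relationship: (part, dept, supplier) -> qty
contracts = {
    ("p1", "d1", "s1"): 100,
    ("p2", "d1", "s2"): 50,
    ("p1", "d2", "s2"): 20,
}

# The three binary relationships implied by the contracts.
can_supply = {(s, p) for (p, d, s) in contracts}
needs = {(d, p) for (p, d, s) in contracts}
deals_with = {(d, s) for (p, d, s) in contracts}

# Every triple that is consistent with all three binary relationships.
implied = {
    (p, d, s)
    for (s, p) in can_supply
    for (d, p2) in needs
    if p2 == p and (d, s) in deals_with
}

# Triples the binary relationships allow but no contract records.
spurious = implied - set(contracts)
```

Here s2 can supply p1, d1 needs p1, and d1 deals with s2, yet d1 never agreed to buy p1 from s2. The binary relationships also offer no place to record qty for a specific triple.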
Summary of Conceptual Design
Conceptual design follows requirements analysis,
Yields a high-level description of data to be stored
ER model popular for conceptual design
Constructs are expressive, close to the way people
think about their applications.
Basic constructs: entities, relationships, and
attributes (of entities and relationships).
Some additional constructs: weak entities, ISA
hierarchies, and aggregation.
Note: There are many variations on ER model.
Summary of ER (Contd.)
Several kinds of integrity constraints can be
expressed in the ER model: key constraints,
participation constraints, and overlap/covering
constraints for ISA hierarchies.
Some constraints (notably, functional
dependencies) cannot be expressed in the ER
model. (e.g., z = x + y)
Constraints play an important role in
determining the best database design for an
enterprise.
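Although a functional dependency cannot be drawn in an ER diagram, it is easy to test against a given table instance. The sketch below is a hedged illustration: the function name is hypothetical, and the sample rows are made up so that z = x + y holds, which makes x, y -> z true while x -> z fails.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The first right-hand value seen for a left-hand value must recur.
        if seen.setdefault(left, right) != right:
            return False
    return True


rows = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 3, "z": 4},
    {"x": 1, "y": 2, "z": 3},
]
```

Passing the check on one instance does not prove the dependency holds in general; it only shows that this instance does not violate it.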
Summary of ER (Contd.)
ER design is subjective. There are often many
ways to model a given scenario! Analyzing
alternatives can be tricky, especially for a
large enterprise. Common choices include:
Entity vs. attribute, entity vs. relationship,
binary or n-ary relationship, whether or not
to use ISA hierarchies, and whether or not
to use aggregation.
Ensuring good database design: resulting
relational schema should be analyzed and
refined further. FD information and
normalization techniques are especially useful.
Chapter 3
Data Storage and Access Methods
Approaches
Approach description, key concepts
Contributions (novelty, improved)
Assumptions
Problem Statement – R* Tree
Given
Data containing points and rectangles
Spatial queries (point, range query, insert, delete)
Find - An Access Method (Data Structure)
A hierarchical organization of rectangles
[Figure: example from Wikipedia]
Objectives
Efficiency of spatial queries
Constraints
Balanced tree
Each node is a disk page and holds at least m entries
(m = minimum number of entries).
Root has at least two children unless it is a leaf
Efficiency metric = number of disk-pages accessed
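To make the efficiency metric concrete, here is a minimal sketch of a range query over a hand-built two-level hierarchy of rectangles, counting one page access per visited node. The Node class, the tuple layout (xmin, ymin, xmax, ymax), and the sample tree are assumptions for illustration; insertion, deletion, and node splitting are not shown.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), meet."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries: (rect, object id). Directory entries: (rect, child Node).
    entries: list = field(default_factory=list)


def range_query(node, query, stats):
    """Collect ids of objects whose rectangle intersects query."""
    stats["pages"] += 1  # each visited node costs one disk-page access
    hits = []
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.is_leaf:
                hits.append(item)
            else:
                hits.extend(range_query(item, query, stats))
    return hits


leaf_a = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf_b = Node(True, [((8, 8, 9, 9), "c"), ((6, 7, 7, 9), "d")])
root = Node(False, [((0, 0, 3, 3), leaf_a), ((6, 7, 9, 9), leaf_b)])
```

A query such as (0, 0, 1.5, 1.5) touches only the root and leaf_a, because the directory rectangle of leaf_b does not intersect it. A point query is the same call with a degenerate rectangle.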
Why is this problem important?
Multi-dimensional Applications
Large geographic data, e.g., map objects like
countries occupy regions of non-zero size in two
dimensions.
Common real world usage: “Find all museums
within 2 miles of my current location”.
CAD
…
Many DBMS servers support spatial indices
Oracle, IBM DB2, …
Why is this problem hard?
B-tree split methods are ineffective in two dimensions
Ex. Sorting: B-tree splits rely on a total order of keys, which rectangles in two dimensions lack
Reference: A. Guttman, ‘R-trees: a dynamic index structure for spatial searching’, 1984
Performance Parameters beyond R-tree
(Q1) The area covered by a directory rectangle should be minimized.
Intuitions:
Reduce overlap between sibling nodes.
Reduce traversal of multiple branches for point query
Reinserting old data moves entries between neighboring nodes and thus
decreases overlap.
Due to more restructuring, fewer splits occur.
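The reinsertion idea can be sketched as a selection step: when a node overflows, take the entries whose centers lie farthest from the center of the node's bounding rectangle and hand them back to the insertion routine instead of splitting right away. The function names and the fraction parameter below are illustrative assumptions, not values stated in these slides, and the actual reinsertion into the tree is not shown.

```python
def center(rect):
    xmin, ymin, xmax, ymax = rect
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def bounding_rect(rects):
    """Smallest rectangle enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_for_reinsert(rects, fraction=0.3):
    """Split an overflowing node's rectangles into (kept, to_reinsert)."""
    cx, cy = center(bounding_rect(rects))

    def dist2(r):
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    ordered = sorted(rects, key=dist2)  # nearest to the node center first
    p = max(1, int(len(rects) * fraction))
    return ordered[:-p], ordered[-p:]


node_rects = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 0, 4, 2), (0, 2, 2, 4), (9, 0, 10, 1)]
kept, to_reinsert = pick_for_reinsert(node_rects)
```

In this made-up node the outlier (9, 0, 10, 1) is selected, so the bounding rectangle of the remaining entries shrinks from (0, 0, 10, 4) to (0, 0, 4, 4).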
Difference between R-tree and R*-tree
Minimization of area, margin, and overlap is crucial to
the performance of R-tree / R*-tree.
[Figure: alternative groupings of rectangles R1 to R5; one grouping
is preferred by R-tree, another by R*-tree]
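The three quantities being minimized can be written down for axis-aligned rectangles as follows. This is a small sketch assuming rectangles are stored as (xmin, ymin, xmax, ymax) tuples and that margin means the perimeter of the rectangle.

```python
def area(r):
    """Area of rectangle r = (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    """Perimeter of rectangle r."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    """Area of the intersection of a and b (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0
```

For example, (0, 0, 4, 1) and (0, 0, 2, 2) have the same area, but the square has the smaller margin, so a criterion based on area alone cannot tell them apart while margin can.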
Validation Methodology
Methodology
Experiments with simulated workloads
Evaluation of design decisions
Results
R*-tree outperforms variants of R-tree
and 2-level grid file.
R*-tree is robust against non-uniform
data distributions.
Summary
Paper’s focus
R*-tree – implementations and performance
Ideas
Heuristic Optimizations (p. 208)
Reduction of area, margin, and overlap of the directory
rectangles
Better Storage Utilization (p. 211)
Forced Reinsertion (splits can be prevented)
Experimental comparison
Using many data distributions
Assumptions, Rewrite today
Assumptions
Indexing data in two-dimensional space
Bulk load and bulk reorganization not available
Concurrency control and recovery costs are negligible
Reinserts during split!
Rewrite today
Bulk-load of rectangles
Compare with newer methods
R+ tree (disjoint sibling), Hilbert-R-tree
Analytical results
Formally compare R*-tree with alternatives