Database Systems
Betiglu Mengistu
2007
Jimma University
Faculty of Technology
Department of Electrical Engineering
EENG 477 - Database Systems 1st Semester, 2007/08 (2000 E.C.)
Course Objective: This course is intended for Electrical and Computer Engineering students to:
- familiarize them with the fundamentals of database systems and modeling techniques.
- provide foundational knowledge for the analysis, design and implementation of database systems.
- discuss issues related to storage and security.
- introduce distributed and parallel database concepts.
Course Outcome: By the end of this course the students are expected to:
- understand the fundamental concepts of database systems.
- identify different modeling levels and techniques and utilize them.
- analyze, design and implement a database system for a specific application.
Course Outline
5. Structured Query Language (SQL) (8Hrs)
- Introduction
- Schema Definition in SQL
- Simple Query Constructs and Syntax
- Nested Sub-queries and Complex Queries
- Views
- Embedded and Dynamic SQL
[Relational Database Design and Implementation Project]
References:
1. Database System Concepts: Silberschatz, Korth, Sudarshan; McGraw Hill; 4th Edition
2. Fundamentals of Database Systems: Elmasri, Navathe; Pearson; 4th Edition
3. Database Systems: The Complete Book: H. Garcia-Molina, J.D. Ullman, J. Widom; Prentice Hall; 1st Edition
4. Database Management Systems: Raghu Ramakrishnan, Johannes Gehrke; McGraw Hill; 2nd Edition
1. Fundamental Concepts of Database Systems
and third generation systems realized the sharing of an integrated database among many
users within an application environment.
- The fourth-generation database technology, namely relational database technology, arose
to address the lack of data independence and the tedious navigational access to the database in
the second and third generations. Relational database technology is characterized by the
notion of a declarative query.
- Fifth-generation database technology will be characterized by a richer data model and a
richer set of database facilities necessary to meet the requirements of applications beyond
the business data-processing applications for which the first four generations of database
technology have been developed.
- Consistency: It must ensure that the data itself is not only consistently stored but can be
retrieved and shared efficiently.
- Concurrency: It must enable multiple users and systems to all retrieve the data at the same
time and to do so logically and consistently.
Such database systems range from single-user systems that run on a single personal
computer to high-performance systems that run on a mainframe.
(Data Server).
- The user tier presents the user interface for the application, displays data and collects user
input. It also sends requests for data to the next tier. It is often known as the
presentation tier.
- The business tier incorporates the business rules for the application. It receives requests for
data from the user tier, evaluates them against the business rules and passes them on to the
data tier. It then receives data from the data tier and passes it back to the user tier. It is also
known as the business logic tier.
- Finally, at the base, the data tier comprises the data storage and a layer that passes data
from the data storage to the business tier and vice versa.
[Figure: three-tier architecture with Client, Web Server and Data Server]
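As a rough illustration of how the three tiers hand requests to one another, the sketch below
uses plain Python functions. The sample records, the employee-id format rule and all function
names are assumptions made for this example only; in a real deployment each tier would
normally run as a separate program or on a separate machine.

```python
# Data tier: the storage plus a thin access layer.
# An in-memory dict stands in for the data server (assumption for the sketch).
_STORAGE = {
    "E001": {"Name": "Alemu Girma"},
    "E004": {"Name": "Kelem Belete"},
}

def data_tier_get(emp_id):
    """Return the stored record for emp_id, or None if there is none."""
    return _STORAGE.get(emp_id)

# Business tier: applies the business rules before touching the data tier.
def business_tier_get(emp_id):
    # Hypothetical business rule: ids look like "E" followed by three digits.
    if not (len(emp_id) == 4 and emp_id[0] == "E" and emp_id[1:].isdigit()):
        raise ValueError(f"invalid employee id: {emp_id}")
    return data_tier_get(emp_id)

# User (presentation) tier: collects input and formats the answer for display.
def user_tier_show(emp_id):
    try:
        record = business_tier_get(emp_id)
    except ValueError as err:
        return f"Error: {err}"
    if record is None:
        return f"No employee with id {emp_id}"
    return f"{emp_id}: {record['Name']}"

print(user_tier_show("E001"))
print(user_tier_show("E009"))
print(user_tier_show("X1"))
```

Note that the user tier never reads the storage directly; every request passes through the
business tier, which is the point of the layering.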
- File manager: manages disk storage allocation and the data structures for stored data.
- Buffer manager: is responsible for fetching data from disk storage into main memory.
Query Processor: is a module that handles queries as well as requests for modification of the
data and metadata. Some of the components are:
Department of Electrical and Computer Engineering | AAU
- DDL interpreter (compiler): processes DDL statements for schema definition (metadata)
and records the definitions in the data dictionary.
- DML compiler: analyzes, translates and optimizes DML statements in a high-level query
language into an evaluation plan consisting of low-level instructions for the query
evaluation (execution) engine.
- Query evaluation engine: executes the low-level instructions generated by the DML compiler.
The components of a general database management system are summarized in the figure
shown below.
[Figure: components of a database management system. Top level: Application Interfaces,
Application Programs, Query Tools, Administration Tools. Storage Manager: Buffer Manager,
File Manager, Authorization and Integrity Manager, Transaction Manager]
Hierarchical Model
The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and
child data segments. This structure implies that a record can have repeating information,
generally in the child data segments. Data in a series of records will have a set of field values
attached to it. The model collects all the instances of a specific record together as a record type.
These record types are the equivalent of tables in the relational model, with the individual
records being the equivalent of rows. To create links between these record types, the hierarchical
model uses parent-child relationships. In a hierarchical database the parent-child relationship is
one-to-many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs
were popular from the late 1960s, with the introduction of IBM's Information Management
System (IMS) DBMS, through the 1970s.
Compiled By: Betiglu
Network Model
Some data may naturally be modeled with more than one parent per child. So, the network model
permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data
Systems Languages (CODASYL) formally defined the network model. The basic data modeling
construct in the network model is the set construct. A set consists of an owner record type, a set
name, and a member record type. A member record type can have that role in more than one set,
hence the multi-parent concept is supported. An owner record type can also be a member or
owner in another set. The data model is a simple network, and link and intersection record types
may exist, as well as sets between them.
Relational Model
The history of the relational database began with E.F. Codd's 1970 paper, A Relational Model of
Data for Large Shared Data Banks. The concept derives from his principles of relational algebra.
Most of the database systems in use today are based on the relational model and are known as
Relational Database Management Systems (RDBMS).
The model allows the definition of data structures, storage and retrieval operations and
integrity constraints. In such a database the data and the relations between them are organized in
tables. A table is a collection of records and each record in a table contains the same fields
organized in columns. The records in the table form the rows of the table.
Properties of Relational Tables:
- Values Are Atomic
- Each Row is Unique
- Column Values Are of the Same Kind
- The Sequence of Columns is Insignificant
- The Sequence of Rows is Insignificant
- Each Column Has a Unique Name
The above three models (the so-called legacy data models, namely the network and hierarchical
models, and the relational model) are categorized as implementation (or representational) data
models, which are closer to the physical structure of the database. Implementation data models
provide concepts that users can understand but that are not too far removed from the way data
is organized within the computer.
The other category of data model is the high-level (or conceptual) data model.
Object-Oriented Model
The advancement of Object-Oriented Programming (OOP) led to a new kind of database
management system, the Object DBMS (ODBMS). The object data model is a way of
modeling a database in an ODBMS. It can be regarded as a high-level implementation data model
that is closer to the conceptual model. It is based on object-oriented concepts, mainly for
ODBMS implementation, but can also be used in the data model of an RDBMS implementation.
This combination of the object-oriented data model with the relational model leads to a data
model known as the object-relational data model.
Requirement analysis
Requirement analysis of a database design determines the data, information, system components,
data processing and analysis functions required by the system. It involves the process of
identifying and documenting the data required by users to meet present and future information
needs.
Requirements are determined by interviewing producers and users of data and producing a
formal requirements specification. The specification includes the data required for processing,
natural data relationships, constraints with respect to performance, integrity and security.
The requirements analysis should address the following questions:
- What user views are required (present and future)?
- What data elements are required in these user views?
- What are the primary keys that uniquely identify entities in the organization?
- What are the relationships between data elements?
- What are the operational requirements such as security, integrity, and response time?
Steps in requirements analysis for database design:
1. Identify the scope of the design effort.
2. Establish metadata collection standards: who to interview, what to collect, how to
structure the interviews.
3. Identify user views, extracted by reviewing user tasks and types of decisions. Forms,
reports, graphs and maps can be useful information for defining views.
User view: a subset of data used by a user in a specific context.
4. Build a data dictionary: define and describe each item in detail (name, description, type,
length, range and relationships).
5. Identify data volumes and usage patterns: how much data is used and how frequently the
data changes.
6. Identify operational (functional) requirements.
The output of the requirement analysis can be broadly classified into two categories: data
requirements and functional requirements.
Design
Design of a database involves three design steps:
Conceptual Design: Synthesis of information from requirements analysis according to
semantic rules. The outcome is a conceptual model. The conceptual model describes entities,
attributes and relationships among entities independent of implementation details.
Implementation (Logical) Design: Transforms the conceptual data model into an internal
model, a schema that can be processed by a particular DBMS. An example is the mapping
from the E/R model to the relational model.
Physical Design: Involves the design of internal storage structures, record formats, access
methods, record blocking and so on. [Requires a higher level study]
Implementation
Implementation of a database is simply translating the implementation design into a schema of
the chosen database management system. That is, writing/developing the entities and/or the
objects in the database schema together with their relationships and constraints.
The steps in the database design can be summarized in the following diagram.
[Figure: steps in database design. Labels: Problem; Requirement Analysis; Functional Analysis;
Conceptual Design; Implementation (Logical) Design; Application Program Design; DBMS
Dependent (Logical) Schema; Physical Structure Design; Internal Schema (Low-level Data Model);
Application Program Implementation]
Entity: represents existing real-world objects or concepts, such as places, objects, events,
persons, orders, customers, and so on.
Relationship: represents associations between objects, such as the fact that a customer
may place an order.
2. Entity - Relationship (E/R) Data Model
Attribute: describes the entity, such as the invoice date or the customer first name.
- The projects have a start date, due date, completion date and
status that describe their progress. Every project is led by a senior
manager and is organized into teams of five to eight programmers
coordinated by a team leader.
Entity Sets
Entities are the principal data objects about which information is to be collected in the E/R model.
Entities are usually recognizable concepts, either concrete or abstract, such as persons, places,
things, or events which have relevance to the database. An entity set is then a set of
entities of the same type that share the same properties.
Consider the case study; some specific examples of entity sets are then:
EMPLOYEES, PROJECTS, CUSTOMERS …
The candidate entities from the requirement statements are the nouns and the adjective-noun
phrases. The “EMPLOYEES” entity set represents the set of all employees and the “PROJECTS”
entity set represents the set of all projects.
Entities are classified as independent (strong) or dependent (weak).
- A strong (independent) entity is one that does not rely on other entities for identification.
- A weak (dependent) entity is one that relies on other entities for identification.
An individual occurrence of an entity set is also known as an instance (object).
Attributes
Attributes are descriptive properties that are associated with an entity. A set of attributes
describe an entity.
A particular instance of an attribute is called a value.
For example, “Employee Id” and “Name” are the attributes of the “EMPLOYEES” entity set; and
“Kevin Jones” is one value of the attribute “Name”.
The domain of an attribute is the collection of all possible values an attribute can have. The
domain of “Name” is a character string.
Attributes can be classified as identifiers or descriptors.
- Identifiers: more commonly called keys, uniquely identify an instance of an entity.
Example: “Employee Id” uniquely identifies an employee entity in the entity
set.
- Descriptors: describe a non-unique characteristic of an entity instance.
Example: “Name” is a descriptor for the “EMPLOYEES” entity set.
- Composite Attributes: are attributes that are composed of smaller subparts, each of which
is itself an attribute.
Example: “Address” of the “EMPLOYEES” entity set can be divided into
“City”, “Home Address”, “Phone”, and “P.O. Box”.
[Figure: hierarchical composite attribute “Address”]
Another classification of attributes is based on the number of values they can hold:
single-valued and multi-valued attributes.
- Single-valued Attributes: are attributes having only one possible value at any time.
Example: “Name” and “Gender” of the “EMPLOYEES” entity set.
- Multi-valued Attributes: are attributes that may have more than one value.
Example: “Address” of the “EMPLOYEES” entity set.
Attributes can also be categorized as stored and derived attributes.
- Derived Attributes: are attributes that can be calculated from related stored
attributes, entities or general states.
- Stored Attributes: on the other hand are attributes that cannot be calculated from
other stored attributes.
Example: “Birth Date” of the “EMPLOYEES” entity set is a stored attribute,
whereas “Age” is a derived attribute that can be calculated from the “Birth Date”
and the “Current Date”.
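The stored versus derived distinction can be illustrated with a short Python sketch in which
“Age” is never stored but computed on demand from the stored “Birth Date”. The function name
`age_on` and the sample employee record are assumptions made for this illustration.

```python
from datetime import date

def age_on(birth_date: date, current_date: date) -> int:
    """Derive Age (in whole years) from the stored Birth Date and a Current Date."""
    years = current_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the current year.
    if (current_date.month, current_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

# Hypothetical employee record: only BDate is stored, Age is derived when needed.
employee = {"EmpId": "E001", "Name": "Alemu Girma", "BDate": date(1970, 10, 1)}

print(age_on(employee["BDate"], date(2007, 9, 30)))  # day before the 37th birthday
print(age_on(employee["BDate"], date(2007, 10, 1)))  # on the 37th birthday
```

Because the age changes as the current date advances, storing it would require constant
updates; deriving it avoids that maintenance.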
Relationship Sets
A Relationship represents an association between two or more entities. An example of a
relationship would be:
- “EMPLOYEES” are Assigned to “TEAMS”
A Relationship Set is then a set of relationships of the same type. The entities involved in
the relationship are known as participating entities and the function an entity plays in a
relationship is called the entity’s role.
Example: In the Assigned relationship, the “EMPLOYEES” and “TEAMS” entity sets are the
participating entity sets, and an “EMPLOYEES” entity has the role of “Programmer” or
“Team Leader” in the relationship.
Relationships are classified in terms of degree, connectivity, cardinality, and existence.
Degree: The degree of a relationship is the number of entities associated with the relationship.
The n-ary (multi-way) relationship is the general form for degree n. Special cases are the binary
and ternary relationships, where the degree is 2 and 3, respectively.
Connectivity: The connectivity of a relationship describes the mapping of associated entity
instances in the relationship. The values of connectivity are “one” or “many”.
Cardinality: The cardinality of a relationship is the actual number of related occurrences for
each of the two entities. The basic types of connectivity for relationships are: one-to-one,
one-to-many, and many-to-many.
[Figure: example mappings between the instances a1 to a4 of entity set A and b1 to b4 of entity
set B for the three connectivity types]
Design Principles
E/R Diagram
The Entity Relationship (E/R) data model is a diagrammatic data model. The elements of the
E/R model are represented by:
- Rectangles - for the entity sets,
- Ellipses - for the attributes,
- Diamonds - for the relationship sets,
- Lines - for the links between the attributes and the entity sets and between the entity
sets and the relationships.
- Double border Rectangles - for the weak entity sets.
- Double border Ellipses - for the multi-valued attributes.
- Dashed border Ellipses - for the derived attributes.
- Arrow Head Line - for the link between an entity set and a one-to-one or many-to-one
relationship. The arrow points to the entity set on the “one” side.
Example
- “EMPLOYEES” are Assigned to “TEAMS”
- “CUSTOMERS” Owns “PROJECTS”
- “TEAMS” works on “PROJECTS”
Composite attributes are represented by linked ellipses as depicted in the above figure with the
attributes “Address” and “H Addrs”.
Relationship Attributes: Attribute(s) may be used in some relationships to describe the
relationship further. Consider the relationship “WorksOn” between the “TEAMS” and
“PROJECTS” entity sets. The relationship can be further described if an attribute “Task” is
added to it as follows.
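[Figure: E/R diagram for the example. TEAMS (Name, Descr), EMPLOYEES (EmpId, Name, BDate, Age,
Address), PROJECTS (ProjId, Name, SDate, DDate), CUSTOMERS (CustId, Name, Address); relationship
shown: Assigned between TEAMS and EMPLOYEES]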
Multi-way Relationship: Consider the three way relationship between the “PROJECTS”,
“TEAMS”, and “SOFTWARE” entity sets.
Cardinality Limits of a Relationship: The cardinality limit of a relationship is labeled as:
- 0..* or 0..∞ indicating zero or more participation of the entity in the relationship.
- 1..* or 1..∞ indicating one or more participation of the entity in the relationship.
- 0..1 indicating zero or one participation of the entity in the relationship.
- 1..1 indicating exactly one participation of the entity in the relationship.
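A small Python sketch can make the limit labels concrete by parsing a label such as “5..8” and
checking a participation count against it. The helper names `parse_limit` and `within_limit`
are hypothetical; the 5..8 limit reflects the case-study statement that teams have five to
eight programmers.

```python
def parse_limit(label):
    """Parse a label such as '0..*', '1..∞' or '5..8' into (lower, upper).

    An unbounded upper limit ('*' or '∞') is returned as None.
    """
    low, high = label.split("..")
    return int(low), (None if high in ("*", "∞") else int(high))

def within_limit(count, lower, upper=None):
    """Return True if count participations satisfy the limit lower..upper."""
    return count >= lower and (upper is None or count <= upper)

team_size_limit = parse_limit("5..8")
print(within_limit(6, *team_size_limit))   # six programmers is allowed
print(within_limit(4, *team_size_limit))   # four programmers is too few
print(within_limit(0, *parse_limit("0..*")))
```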
[Figure: E/R diagram showing the multi-way relationship with attribute “Task” and the
cardinality limit 5..8 on the link to EMPLOYEES]
Fig 5. Multi-way Relationships and cardinality limits
The multi-way (ternary) relationship shown in figure 5 above can be reduced to a binary
relationship with the use of an entity set in place of the relationship and having three new
relationships for the links in between the participating entity sets and the relationship.
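[Figure: E/R diagram in which the multi-way relationship is replaced by an entity set with
attribute “Task”, linked through binary relationships (such as “For”) to the participating
entity sets (such as PROJECTS)]
Fig 6. Multi-way Relationships to Binary Relationship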
If the multi-way relationship set that is transformed into the binary relationship had any
attributes, these are assigned to the entity set that replaces the relationship.
Entity Set Roles in a Relationship: In some relationships a single entity set may participate
more than once. In such a case a label on the link line from the entity set is used to
differentiate the participations of the entity set.
[Figure: E/R diagram of TEAMS, Assigned and EMPLOYEES with role labels on the link lines
(“Assigned By”, “Assigned For”) and the attribute “Task”]
Design Issues
The following are some useful principles to be followed in designing databases.
1. Faithfulness - first and foremost, the design should be faithful to the specifications. That
is, classes or entity sets and their attributes should reflect reality.
2. Avoiding Redundancy - be careful to say everything only once.
3. Simplicity - avoid introducing more elements into your design than are absolutely
necessary.
4. Picking the Right kind of Element - Sometimes we have options regarding the type of
design element used to represent a real-world concept.
Use of Entity Set versus Attributes: Generally, if something has more information associated
with it than just its name, it probably needs to be an entity set. However, if it has only its
name to contribute to the design, then it is probably better to make it an attribute.
Example: A “SOFTWARE” entity set may have a “Version” attribute, or
“VERSION” can be argued to be an entity set.
Entity versus Relationship Sets: Since relationships may represent events, it is not always
clear whether a concept should be modeled as an entity set or as a relationship set.
Binary versus n-ary Relationship Sets: Generally, if something has more information
associated with it than just its name, it probably needs to be an entity set. However, if it
has only its name to contribute to the design, then it is probably better to make it an
attribute.
Remarks on Designing
- Choose meaningful naming for the entities, attributes and relationships.
- Use short links.
- Cluster diagram if it has too many entities and relationships.
Keys
As described above, keys are attributes or sets of attributes that suffice to distinguish entities
from each other.
A super key, also known as a super set, is a set of one or more attributes that collectively
identify an entity uniquely within the entity set.
Example: Consider the “EMPLOYEES” entity set, then
- “EmpId”, “EmpId, Name”, “NationalId”, “NationalId, BDate”, … are super keys
The more interesting super key is the minimal one, which is referred to as the candidate key.
A candidate key is a set of attributes that is both sufficient and necessary to distinguish the
entities of an entity set.
Example: In the “EMPLOYEES” entity set
- “EmpId”, “NationalId”, “Name, BDate” (assuming that no two employees with the same
name are born on the same day) … are candidate keys
[Figure: E/R diagram fragment with the attributes Name, ProjId, CustId, Address, SDate and DDate]
The designer of the database is the one that makes the choice of the candidate keys for
implementation, but the choice has to be made carefully. Primary key is a term used to refer to
the candidate key that is selected by the designer for implementation.
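The difference between super keys and candidate keys can be explored with a small Python sketch
that tests attribute sets against sample data. The rows below are hypothetical, and the helper
names `is_superkey` and `candidate_keys` are assumptions. Note that a key is a statement about
every possible instance, so a check on one sample can only refute a proposed key; it cannot
prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal super keys of this sample, found by trying smaller sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip any set that contains an already found key: it is not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical EMPLOYEES sample; two employees share a name, two share a birth date.
ATTRS = ("EmpId", "NationalId", "Name", "BDate")
ROWS = [
    {"EmpId": "E001", "NationalId": "N100", "Name": "Alemu Girma", "BDate": "1970-10-01"},
    {"EmpId": "E002", "NationalId": "N200", "Name": "Alemu Girma", "BDate": "1972-03-15"},
    {"EmpId": "E003", "NationalId": "N300", "Name": "Kelem Belete", "BDate": "1970-10-01"},
]

print(is_superkey(ROWS, ("EmpId", "Name")))  # a super key, but not minimal
print(candidate_keys(ROWS, ATTRS))
```

The designer would then pick one of the reported candidate keys as the primary key.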
relationship set (one owner entity is associated with one or more weak entities, but each
weak entity has a single owner). This relationship set is called the identifying
relationship or supporting relationship of the weak entity set.
- The weak entity set must have total participation in the identifying relationship.
To distinguish weak entities that depend on one particular identifying entity (strong entity), an
attribute or set of attributes in the weak entity set is used. Such an attribute or set of
attributes is referred to as the discriminator.
The key (primary key) of a weak entity set consists of:
1. Zero or more of its own attributes (the discriminator), and
2. Key attributes of the owner (identifying) entity set.
Notation for Weak Entity Set
- Weak entity sets are represented by double border rectangles.
- The identifying many-to-one relationship is represented by double border diamonds.
- If the entity set has a discriminator, it is represented by underlining the
attribute(s).
Example:
- Consider the “TEAMS” entity set. Teams with the same name can be formed to work on
different projects. Thus neither “Name” nor “Description” can uniquely identify a
“TEAMS” entity. Rather, an entity is distinguished when it is related to a
“PROJECTS” entity. Note that the relationship in between is a many-to-one relationship
when seen from the “TEAMS” side.
[Figure: E/R diagram for the weak entity set example, with the attributes Name, ProjId, Descr,
SDate and DDate]
Generalization is a design process of bottom-up approach in which multiple entity sets are
synthesized into a higher level entity set based on their common features.
Specialization and generalization in E/R model are represented by a triangle labeled “ISA”
between the entities. The vertex of the triangle is towards the generalized (super class) entity
set.
Example:
- “FULL-TIME EMPLOYEES” and “PART-TIME EMPLOYEES” can be generalized
to form an entity set “EMPLOYEES”
- The “PROJECTS” entity set may be further specialized into “WEB-BASED PROJECTS”
and “WIN32 PROJECTS”
[Figure: ISA triangles attached to the PROJECTS and EMPLOYEES entity sets]
Aggregation
One deficiency in E/R modeling is the fact that a relationship is allowed only between entity
sets. But in some cases it may be advantageous to have a relationship between a relationship and
an entity set or a collection of entity sets.
Example:
- Consider the “WorksOn” relationship between “TEAMS” and “PROJECTS”. From the
case study it is to be noted that every project is led by a senior manager. Hence
the manager is responsible for managing the teams, the projects and the outcome of the
project, the software. Therefore the resulting E/R model would be as follows.
[Figure: E/R diagram with the “Manages” relationship and EMPLOYEES]
Fig 12. Redundancy in E/R model
As can be seen from the diagram above redundancy loop is introduced as the E/R model doesn’t
An alternative way to avoid the redundancy is the use of aggregation. Aggregation is
an abstraction through which a collection of related entity sets and relationships is treated as
a high-level entity. It allows a relationship set (identified through a box) to
participate in another relationship set.
Example:
- The previous example can be alternatively represented as follows
[Figure: E/R diagram with the “Manages” relationship and EMPLOYEES, using aggregation]
Fig 13. Aggregation in E/R model
Employees
EmpId | Name         | BDate    | Sub City | Kebele | Phone
E001  | Alemu Girma  | 01/10/70 | Bole     | 06     | 011-663-0712
E004  | Kelem Belete | 12/04/68 | Gulele   | 03     | 011-227-2525
Fig 1. Typical Employee relation instance
The columns in the table represent the attributes of the relation, and the rows (other
than the heading row) represent tuples (records) of the relation.
A relation in a relational model consists of:
- The relation schema: describes the column heads for the table, and
- The relation instance: the table with the set of tuples.
The set of relation schemas forms the schema for the relational database, called the database
schema (relational database schema).
In the relational model the relation schemas are described first. The schema specifies:
- The relation's name
- Name for each attribute (field or column)
- Domain of each attribute: - A domain is referred to in a relation schema by the domain
name and has a set of associated values.
Example
Properties of Relations
- Rows (tuples) in a single relation are unique (that is, no two tuples are identical).
- Relations are sets of tuples, not lists (that is, the order of tuples in a relation is immaterial).
- Attributes are atomic.
- The values that appear in a column must be drawn from the domain associated with that
column.
- The degree, also called arity, of a relation is the number of attributes in the relation.
- The relation names in a relational database are distinct.
Key Constraints
A key constraint is a statement that a certain minimal subset of the attributes of a relation is a
unique identifier for a tuple in the relation.
A set of attributes that uniquely identifies a tuple according to a key constraint is called a
candidate key for the relation, often abbreviated to just key.
Key attributes in the relational model are indicated by underlining the attributes in the
relation schema.
Example
- Employees (EmpId, Name, BDate, SubCity, Kebele, Phone)
- Projects (PrjId, Name, SDate, DDate, CDate)
- Teams (Name, Descr)
REMARK: Note that a key for a relation may not be directly inferred from the high-level
conceptual models in some cases.
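As a hedged illustration of how a key constraint behaves in practice, the sketch below declares
the Employees schema in SQLite through Python's standard sqlite3 module and shows that a second
tuple with the same key value is rejected. Choosing EmpId as the primary key and TEXT as the
column type for every attribute are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Relation schema: relation name, attribute names, domains, and the key constraint.
conn.execute("""
    CREATE TABLE Employees (
        EmpId   TEXT PRIMARY KEY,
        Name    TEXT NOT NULL,
        BDate   TEXT,
        SubCity TEXT,
        Kebele  TEXT,
        Phone   TEXT
    )
""")

conn.execute(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)",
    ("E001", "Alemu Girma", "01/10/70", "Bole", "06", "011-663-0712"),
)

# A second tuple with the same key value violates the key constraint.
try:
    conn.execute("INSERT INTO Employees (EmpId, Name) VALUES (?, ?)",
                 ("E001", "Kelem Belete"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(duplicate_rejected, employee_count)
```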
In the above example for the “WorkSchedule” to refer to the “Employees” relation instance, it
has an attribute ‘Employee’ of the same type as the ‘EmpId’ in the “Employees” relation which is
a primary key. The foreign key constraint is implemented through the ‘Employee’ attribute in
the referencing relation “WorkSchedule”.
(a) Referencing relation: WorkSchedule
… | Hours | Employee
… | 8     | E001
… | 6     | E004
… | 8     | E002
… | 4     | E004
(b) Referenced relation: Employees
EmpId | Name         | …
E001  | Alemu Girma  | …
E004  | Kelem Belete | …
E002  | Mulken Getu  | …
Fig 2. Foreign Key Constraint in Relational Model
NOTE: - A single tuple can be referenced by zero or more tuples in the referencing relation, but
a single tuple with a single foreign key attribute can only reference one tuple.
- A foreign key could refer to the same relation.
- A relational database consists of related relations through a foreign key.
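The foreign key example can be tried out with Python's standard sqlite3 module. This is a
minimal sketch: the column types are assumptions, only the Hours and Employee attributes of
WorkSchedule are modeled, and the id 'E999' is a made-up value used to provoke a violation.
SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the
connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Referenced relation.
conn.execute("CREATE TABLE Employees (EmpId TEXT PRIMARY KEY, Name TEXT)")
# Referencing relation: Employee must match some EmpId in Employees.
conn.execute("""
    CREATE TABLE WorkSchedule (
        Hours    INTEGER,
        Employee TEXT REFERENCES Employees(EmpId)
    )
""")

conn.executemany("INSERT INTO Employees VALUES (?, ?)", [
    ("E001", "Alemu Girma"), ("E004", "Kelem Belete"), ("E002", "Mulken Getu"),
])
# E004 is referenced twice: one tuple may be referenced by several tuples.
conn.executemany("INSERT INTO WorkSchedule VALUES (?, ?)", [
    (8, "E001"), (6, "E004"), (8, "E002"), (4, "E004"),
])

# A reference to a non-existent employee violates the foreign key constraint.
try:
    conn.execute("INSERT INTO WorkSchedule VALUES (5, 'E999')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

e004_rows = conn.execute(
    "SELECT COUNT(*) FROM WorkSchedule WHERE Employee = 'E004'").fetchone()[0]
print(dangling_rejected, e004_rows)
```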
[Figure: E/R diagram with TEAMS (Name, Descr), EMPLOYEES (EmpId, Name, BDate, Age, Address),
PROJECTS (ProjId, Name, SDate, DDate), CUSTOMERS (CustId, Name, Address) and the Assigned
relationship]
Then the relations from the strong entity sets having only simple and single valued
attributes are as follows
- Projects(ProjId, Name, SDate, DDate)
- Customers(CustId, Name, Address)
For the weak entity set (TEAMS) in figure 3 above the corresponding relation is:
- Teams(ProjId, Name, Descr)
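A possible SQLite rendering of the weak entity set relation is sketched below using Python's
standard sqlite3 module. The key of Teams combines the owner's key (ProjId) with the
discriminator (Name). The column types and the sample project and team values are assumptions
made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Owner (strong) entity set.
conn.execute(
    "CREATE TABLE Projects (ProjId TEXT PRIMARY KEY, Name TEXT, SDate TEXT, DDate TEXT)")
# Weak entity set: primary key = key of the owner + discriminator.
conn.execute("""
    CREATE TABLE Teams (
        ProjId TEXT NOT NULL REFERENCES Projects(ProjId),
        Name   TEXT NOT NULL,
        Descr  TEXT,
        PRIMARY KEY (ProjId, Name)
    )
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Projects (ProjId, Name) VALUES (?, ?)",
                 [("P1", "Payroll"), ("P2", "Inventory")])
# The same team name may appear under different projects.
conn.executemany("INSERT INTO Teams VALUES (?, ?, ?)",
                 [("P1", "Alpha", "UI team"), ("P2", "Alpha", "UI team")])

# The same team name twice under one project violates the composite key.
try:
    conn.execute("INSERT INTO Teams VALUES ('P1', 'Alpha', 'duplicate')")
    same_team_twice_rejected = False
except sqlite3.IntegrityError:
    same_team_twice_rejected = True

team_count = conn.execute("SELECT COUNT(*) FROM Teams").fetchone()[0]
print(same_team_twice_rejected, team_count)
```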
Example
From figure 3 above the corresponding relations for the relationship sets are:
- Assigned(EmpId, ProjId, TeamName)
- Owns(ProjId, CustId)
NOTE: Supporting relationships (for example WorksOn) need not be transformed to relations if
their purpose is solely for identifying a weak entity set by passing on the identifying
strong entity set’s primary key to the weak entity set; otherwise they will introduce
redundancy.
Suppose entity sets E and F are related through a many-to-one relationship R from E to F. Then
it is possible to join the relations for E and R that come out of this E/R model into a single
relation S with a schema consisting of:
- All attributes of the entity set E,
- The key attributes of the entity set F, and
- All attributes of the relationship R.
If the participation of E in R is total, it is also possible to include all attributes of F in the
relation S and have one single relation S in place of the three relations E, F and R.
The primary key for S would be the primary key of E.
Example
Consider the entity sets “PROJECTS” and “CUSTOMERS” and the corresponding
relationship “Owns”, then we can have:
- Projects(ProjId, Name, SDate, DDate, CustId) and Customers(CustId, Name, Address),
or
- Projects(ProjId, Name, SDate, DDate, CustId, CustName, Address), where the customer's
Name is renamed CustName so that column names stay unique
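The first option, folding the many-to-one relationship “Owns” into the relation for “PROJECTS”,
can be sketched in Python with relations held as lists of dictionaries. The function name
`merge_many_to_one` and all sample tuples are hypothetical; only the attribute names come from
the example.

```python
def merge_many_to_one(e_rows, r_rows, e_key, f_key):
    """Fold relation R (pairs of E-key and F-key) into relation E, giving S."""
    # Many-to-one from E to F: each E-key value appears at most once in R.
    f_of = {r[e_key]: r[f_key] for r in r_rows}
    # E tuples that do not participate in R get None (a NULL) for the F-key.
    return [{**row, f_key: f_of.get(row[e_key])} for row in e_rows]

# Hypothetical sample tuples.
projects = [
    {"ProjId": "P1", "Name": "Payroll", "SDate": "2007-09-01", "DDate": "2008-01-31"},
    {"ProjId": "P2", "Name": "Inventory", "SDate": "2007-10-15", "DDate": "2008-03-31"},
]
owns = [{"ProjId": "P1", "CustId": "C1"}, {"ProjId": "P2", "CustId": "C1"}]

merged = merge_many_to_one(projects, owns, "ProjId", "CustId")
for row in merged:
    print(row)
```

The merged relation keeps ProjId as its primary key, and CustId acts as a foreign key that
refers to the Customers relation.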
relation; instead all its attributes are passed to the relations of all immediate lower-level
entity sets.
2. Use of Nulls: One relation having a large set of attributes covering all the lower-level
entity sets and the higher-level entity set; entities have NULL in attributes that don’t belong
to them. This involves a large number of NULL values for disjoint generalization.
3. Object-Oriented Approach: One relation per subset of subclasses, with all relevant
attributes including:
- Attributes of the higher-level entity set, and
- Attributes of that lower-level entity set.
The primary key of the higher-level entity set becomes the primary key of each relation.
Example
Consider the entity sets “EMPLOYEES” and its lower-level entity sets, then
- FullTimeEmployees(EmpId, Salary, Saving, Allowance)
- PartTimeEmployees(EmpId, HourlyPay, ContractPeriod)
Dependencies
In database design the two most common pitfalls that result in a bad design are:
- Repetition of information, and
- Inability to represent certain information (loss of information).
Functional Dependencies
Functional dependency is a kind of constraint that helps to remove redundancy in relational
database design.
Definition: A functional dependency, denoted by X → A, is an assertion about a relation R that
whenever two tuples of R agree on all the attributes of X, then they must also agree
on the attribute A. We say that “X → A holds in R” or “X functionally determines A”.
Note that in the notation X → A, X represents a set of attributes and A represents a single
attribute. That is, A1 A2 A3 … An → B.
The functional dependency is a generalization of the notion of superkey.
Example:
- Consider the Teams relation: Teams(PrjId, Name, Descr), then
Transitivity Rule
1. Let X be the set of attributes that will eventually become the closure. First, we initialize X
to be the given set of attributes.
2. Now, we repeatedly search for some functional dependency B1 B2 … Bm → C such that all
of B1, B2, …, Bm are in the set of attributes X but C is not. We then add C to the set X.
3. Repeat step 2 as many times as necessary until no more attributes can be added to X.
4. The set X, after no more attributes can be added to it, is the closure set X+.
Example: Consider a relation with attributes A, B, C, D, E, and F. Suppose that this relation
has the functional dependencies AB → C, BC → AD, D → E, and CF → B. What is
the closure of {A, B}, that is {A, B}+?
Solution:
X = {A, B}
From the functional dependency AB → C, we add C to X, that is X = {A, B, C}
Similarly; BC → AD ⇒ X = {A, B, C, D}
D → E ⇒ X = {A, B, C, D, E}
No more changes in X are possible (CF → B cannot be applied because F is not in X).
Thus {A, B}+ = {A, B, C, D, E}
From the closure set it follows that AB → D
Exercise: Test whether D → A follows from the functional dependency set.
To test for D → A, first determine the closure set of {D}
X = {D}
From the functional dependency D → E, we add E to X, that is X = {D, E}
No more changes in X are possible. Thus {D}+ = {D, E}
Since A is not in the closure set, D → A does not hold.
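The closure procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the function name closure and the representation of each dependency as a pair of attribute sets are assumptions made here, not part of the course material. It recomputes the two closures worked out above.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (left, right) pairs; each side is a set of attribute names.
    """
    result = set(attrs)              # step 1: start from the given attributes
    changed = True
    while changed:                   # step 3: repeat until nothing new is added
        changed = False
        for left, right in fds:
            # step 2: the whole left side is in X, but part of the right side is not
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result                    # step 4: the closure


# Dependencies of the worked example: AB -> C, BC -> AD, D -> E, CF -> B
fds = [
    (set("AB"), set("C")),
    (set("BC"), set("AD")),
    (set("D"), set("E")),
    (set("CF"), set("B")),
]

ab_closure = closure("AB", fds)
d_closure = closure("D", fds)
print(sorted(ab_closure))
print(sorted(d_closure))
```

A dependency X → A follows from the set exactly when A is in the closure of X, so testing D → A reduces to checking whether "A" is in d_closure.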
Ñ Multivalued Dependencies
A multivalued dependency in a relation R is a constraint stating that when the values of one set of
attributes are fixed, the values of certain other attributes are independent of the values of all the
other attributes in R.
That is, for a multivalued dependency X →→ Y in R, where X and Y are subsets of the set of
attributes in R, if t and u are tuples in the relational instance r for the schema R that agree on all
the attributes of X, then there exists a third tuple v that agrees:
1. with both t and u on X’s,
& Reduce the number of columns in tables. The fewer columns a table has, the more
rows can fit on a single data page, which helps to boost read performance of the
RDBMS.
& Reduce the amount of SQL code. The less code there is, the less that has to run, speeding
up your application's performance.
& Maximize the use of clustered indexes. The more data is separated into multiple tables
because of normalization, the more clustered indexes become available to help speed
up data access.
& Reduce the total number of indexes. The fewer columns tables have, the less need there is
for multiple indexes to retrieve the data. And the fewer the indexes, the smaller the negative
performance effect of data insertion, modification and deletion.
Redundancy in a database design results in data anomalies classified as:
Insertion Anomalies
Deletion Anomalies
Modification Anomalies
Example: Consider the relation schemas for Employees and Teams combined in a single relation as
follows
Emp_Teams(EmpId, Name, BDate, Gender, TeamId, Project, TeamName)
It can easily be noted that there is redundancy of data in the “Emp_Teams” relation for
the Teams detail of the Employees. Consider the following instance for the relations
Employees
EmpId  Name          BDate     Gender  TeamId  Project  TeamName
E001   Alemu Girma   01/10/70  M       1       1        Programmer
E001   Alemu Girma   01/10/70  M       3       2        Programmer
E004   Kelem Belete  12/04/68  M       2       1        Tester
E005   Mulu Tasew    10/05/69  F       3       2        Programmer
E008   Belachew K    02/11/62  M       1       1        Programmer
E003   Almaz B       05/06/65  F       5       3        Programmer
E005   Mulu Tasew    10/05/69  F       2       1        Tester
- Insertion Anomalies: Suppose we want to insert a new employee that works in project 1
as a programmer, then the corresponding fields for the Team
detail has to be entered correctly. If data is entered incorrectly
the consistency will be violated.
- Deletion Anomalies: Suppose E003 is to be removed from the employees list, then
Team information of TeamId 5 will also be removed and vice
versa.
- Modification Anomalies: During data update the consistency may also be violated as in
the case of insertion.
Although normalization is a way to remove redundancy anomalies and preserve consistency,
integrity and maintainability, it may also lead to:
& Increase in storage space
& Complex queries (queries with many multiple joins of tables)
In such situations it may be desired to denormalize some of the tables in order to reduce storage
space and the number of required joins.
Denormalization is the process of selectively taking normalized tables and re-combining the data
in them. Sometimes the addition of a single column of redundant data to a table from another
table can reduce a 4-way join into a 2-way join, significantly boosting performance by reducing
the time it takes to perform the join.
Databases intended for Online Transaction Processing (OLTP) are normalized. By contrast,
databases intended for On Line Analytical Processing (OLAP) operations are primarily "read
only" databases and tend to extract historical data that has accumulated in the project for quite a
long time. For such databases, redundant or "denormalized" data may facilitate Business
Intelligence applications.
While denormalization can boost storage and query performance, it can also have negative
effects. For example, by adding redundant data to tables, you risk the following problems:
& More data means the RDBMS has to read more data pages than otherwise needed,
hurting performance.
& Redundant data can lead to data anomalies and bad data.
& In many cases, extra code will have to be written to keep redundant data in separate
tables in synch, which adds to database overhead.
Ñ Normal Forms
Normalization procedure provides:
& A framework for analyzing relation schemas based on functional and multivalued
dependencies.
& A series of normal form tests that can be carried out on individual relation schemas so
that the relational database can be normalized to any degree.
Normalization through decomposition needs to preserve the existence of two additional
A multivalued dependency is a special case of a join dependency where n = 2 (i.e. JD(R1, R2)
implies R1 ∩ R2 →→ (R1 − R2) and, using the complement property, R1 ∩ R2 →→ (R2 − R1)).
5NF also known as Project-Join Normal Form (PJNF) requires that there are no non-trivial join
dependencies that do not follow from the key constraints. A table is said to be in the 5NF if and
only if it is in 4NF and every join dependency in it is implied by the candidate keys.
That is, if JD(R1, R2, …, Rn) is a non-trivial join dependency in R, then
& Every Ri is a superkey of R.
A join dependency (JD), denoted by JD(R1, R2, …, Rn) and specified on a relational schema R, specifies
a constraint on the states r of R. The constraint states that every legal state r of R should have a
nonadditive join decomposition into R1, R2, …, Rn.
Although there are other higher-level normal forms such as DKNF and 6NF, most
relational database designs are sufficiently normalized at the BCNF level or even at 3NF.
4. Relational Algebra
Relational Algebra is a procedural query language that consists of a set of operations that take
one or two relations as input and produce a new relation as a result. The algebra operations
enable a user to specify retrieval requests on a relational model. The relation produced by an
operation can be further manipulated using other operations of the relational algebra. A sequence
of relational algebra operations that produces a new relation forms a relational algebra expression.
- Selection (σ)
- Projection (Π)
- Rename(ρ)
Binary Operators.
- Product (Cartesian Product) (×)
- Union (∪)
- Difference ( – )
The binary operators listed above are also known as set operators as they are derived from the
set theory.
Ñ Unary Operations
Select Operation
The select operation selects a subset of tuples from a relation instance that satisfies a given
predicate (condition).
It is denoted by
σ_C(R)
Example
- To extract Senior Managers from the “EMPLOYEES” relation, the selection operation can
be written as
- Employees(EmpId, Name, BDate, Age, Gender, Position, Salary)
Project Operation
While the select operation is picking certain rows from a relation, projection operation forms a
new relation by picking certain columns in the relation.
It is denoted by:
Π_A(R)
Where Π represents the PROJECT operator and A is a set of attributes in the relation R.
Example
- To extract Employees Name and Position only from the “EMPLOYEES” relation
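The two unary operations can be illustrated with a small Python sketch in which a relation is a list of dictionaries. The helper names select and project, the sample rows, and the predicate Position = "Senior Manager" are assumptions made for illustration only; they are not taken from the course material.

```python
# A tiny instance of Employees (hypothetical sample rows, reduced attribute list)
employees = [
    {"EmpId": "E001", "Name": "Alemu Girma", "Position": "Senior Manager", "Salary": 5000},
    {"EmpId": "E004", "Name": "Kelem Belete", "Position": "Tester", "Salary": 2500},
    {"EmpId": "E005", "Name": "Mulu Tasew", "Position": "Senior Manager", "Salary": 5200},
]


def select(relation, predicate):
    """Selection: keep the tuples (rows) that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection: keep the listed attributes (columns) and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:        # a relation is a set, so duplicates are removed
            result.append(row)
    return result


senior_managers = select(employees, lambda t: t["Position"] == "Senior Manager")
names_positions = project(employees, ["Name", "Position"])
positions = project(employees, ["Position"])
```

Projecting on Position alone returns two tuples rather than three, because the duplicate "Senior Manager" value is eliminated.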
Rename Operation
Unlike relations in the relational model, the new relations derived from a relational algebra
expression do not have names that would allow us to refer to them in other expressions. The
renaming operator can be used to explicitly rename the resulting relation of an expression.
It is denoted by:
ρ_S(A1, A2, …, An)(R)
Where ρ represents the RENAME operator, S is a name for the new relation, and A1,
A2, …, An are new names for the attributes in the relation R.
After the renaming the name of the relation and the attributes can be used as ordinary relation
Ñ Binary Operations
- Consider the Employees, EmpTeams and Teams relation and develop a relational
algebra expression that retrieves the name and position of Employees that work on
Project 1 as Programmers and rename the relation as Programmers1.
- Employees(EmpId, Name, BDate, Age, Gender, Position, Salary)
- EmpTeams(EmpId, TeamId)
- Teams(TeamId, PrjId, Name, Descr)
ρ_Programmers1(Name, Position) (Π_E.Name, E.Position (σ_T.PrjId=1 AND T.Name="Programmer" (
σ_ET.TeamId=T.TeamId (ρ_T(Teams) × σ_E.EmpId=ET.EmpId (
ρ_E(Employees) × ρ_ET(EmpTeams))))))
The expression tree for the above relational algebra expression is:
ρ_Programmers1(Name, Position)
  Π_E.Name, E.Position
    σ_T.PrjId=1 AND T.Name="Programmer"
      σ_ET.TeamId=T.TeamId
        ×
          ρ_T
            Teams
          σ_E.EmpId=ET.EmpId
            ×
              ρ_E
                Employees
              ρ_ET
                EmpTeams
Union Operation
The union operation on R and S, denoted by R ∪ S, results in a relation that includes all tuples either
in R or in S or in both. Duplicates are eliminated from the result.
Intersection Operation
The intersection operation on R and S, denoted by R ∩ S, results in a relation that includes all tuples
that are in both R and S.
Example
- Consider the following relations R and S, then R ∪ S and R – S are given as shown to
the right.
R        S        R ∪ S    R – S
A  B     C  D     A  B     A  B
1  2     1  2     1  2     3  4
3  4     4  3     3  4     5  6
5  6              4  3
                  5  6
- Find name and position of Employees that work on both Projects 1 and 2 as
Programmers.
Similar to the previous example, we can have
ρ_Programmers2(Name, Position) (Π_E.Name, E.Position (σ_T.PrjId=2 AND T.Name="Programmer" (
σ_ET.TeamId=T.TeamId (σ_E.EmpId=ET.EmpId (
ρ_E(Employees) × ρ_ET(EmpTeams)) × ρ_T(Teams)))))
The employees working in both projects 1 and 2 are then given by the relational
algebra expression:
Programmers1 ∩ Programmers2
ª Additional Operations
The set of relational algebra operations {σ, Π, ρ, ×, ∪, –} is a complete set, in that the other
original relational algebra operations such as intersection, join, division and assignment can be
expressed as a sequence of the fundamental operations. In situations where the use of the
fundamental operators results in complex and lengthy expressions, such operators are helpful to
minimize the complexity of queries.
- The previous example that retrieves the name and position of Employees that work on
Project 1 as Programmers from the modified relations below can be simplified as:
- Employees(EmpId, FullName, BDate, Age, Gender, Position, Salary)
- EmpTeams(EmpId, TeamId)
- Teams(TeamId, PrjId, TName, Descr)
Π_E.FullName, E.Position (σ_T.PrjId=1 AND T.TName="Programmer" (
(ρ_E(Employees) ⋈ ρ_ET(EmpTeams)) ⋈ ρ_T(Teams)))
union of the schemas of R and S. (That is, the operation does not eliminate repeated columns in
the two relations R and S if any).
Example
- Consider the following relations R and S, then the theta join R ⋈ S on the condition
R.B < S.B is given as shown to the right.
R        S           R ⋈ S (R.B < S.B)
A  B     B  C  D     A  R.B  S.B  C  D
1  2     2  a  x     1  2    4    b  y
3  4     4  b  y     1  2    5    c  z
         5  c  z     3  4    5    c  z
Assignment Operation
The assignment operation, denoted by ←, is similar to assignment in programming: it
assigns the result of the relational algebra expression on the right to a relation variable on the left.
Subsequent assignment operations can be used to develop complex sequential queries; the
intermediate assignment operations do not display any relation to the user.
Division Operation
The division operation denoted by ÷ is suited to queries that include “universal quantification” or
the phrase “for all”. A division operation is applied to two relations R(Z) ÷ S(X), where X ⊆ Z,
and the result is T(Y) where Y = Z − X. For a tuple t to appear in the result T, the values in t
must appear in R in combination with every tuple in S.
The division operation can be expressed using the sequence of the fundamental operators as:
T1 ← Π_Y(R)
T2 ← Π_Y((S × T1) − R)
T ← T1 − T2
Example
- Consider the following relations R and S, then R ÷ S is given as shown to the right.
R        S     R ÷ S
A  B     B     A
1  2     2     1
1  4     4     3
3  2
2  4
4  4
3  4
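The three-step expression of division through the fundamental operators can be checked with Python sets. This is a minimal sketch for the two-attribute case R(A, B) ÷ S(B) only; the function name divide and the set-of-tuples representation are assumptions made for illustration.

```python
# The instance of the example: R(A, B) and S(B)
R = {(1, 2), (1, 4), (3, 2), (2, 4), (4, 4), (3, 4)}
S = {2, 4}


def divide(r, s):
    """Division R(A, B) / S(B) built from projection, product and difference."""
    t1 = {a for (a, _) in r}                        # T1 <- projection of R on A
    missing = {(a, b) for a in t1 for b in s} - r   # (T1 x S) - R: combinations absent from R
    t2 = {a for (a, _) in missing}                  # T2 <- A values lacking some B of S
    return t1 - t2                                  # T  <- T1 - T2


result = divide(R, S)
print(sorted(result))
```

An A value survives only if it is paired in R with every value of S, which is the "for all" reading of the operation.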
- Retrieve all the projects that “Jhon” and “Dave” are jointly working as programmers.
ª Extended Operations
The basic relational algebra operations have been extended in several ways to enhance the
expressive power of the original relational algebra. Some of the extended operations are:
- Outer Join
- Extended Projection
- Duplicate Elimination
- Aggregation and Grouping, …
Keeps all tuples in both the left and right relations when no matching tuples are found,
padding them with NULL values as needed
Example
- Consider the following relations R and S, then the right outer join R ⟖ S is given as shown
to the right.
R        S           R ⟖ S
A  B     B  C  D     A     B  C  D
1  2     2  a  x     1     2  a  x
3  4     4  b  y     3     4  b  y
4  6     5  c  z     NULL  5  c  z
         7  d  w     NULL  7  d  w
- Write relational algebra that retrieves all the projects and the corresponding teams in
the projects.
- Projects(PrjId, PName, SDate, DDate, CDate)
- Teams(TeamId, PrjId, TName, Descr)
Π_P.PName, T.TName (ρ_P(Projects) ⟕ ρ_T(Teams))
Extended Projection Operation
The extended (or generalized) projection extends the projection operation by allowing
arithmetic functions to be used in the projection list to compute and produce new columns.
Example
- Write a relational algebra for calculating the net pay of employees.
Duplicate Elimination
Note that the projection operation and the set operations discussed so far are set operations; that
is, the resulting relation is a relation without duplicates. But there are cases where operations may
produce relations with duplicates (bag operations). In such situations the duplicate elimination
operator (δ) is used to eliminate duplicate tuples from the resulting relation. It is denoted by:
δ(R)
operators but they are used by the grouping operator (γ) that groups tuples according to their
values in one or more attributes. It is denoted by:
γ_L(R)
Where L is either the list of grouping attributes in order or list of aggregation functions
applied to the attributes of the relation R.
Example
- Write a relational algebra that determines the number of teams all the employees are
working in.
Sorting Operation
Sorting operator (τ) turns a relation into a list of sorted tuples according to one or more
attributes. The operator is used as a final step since all the other operators work on either a set
or a bag but not in a list. It is denoted by:
τ_S(R)
Where S is a list of attributes of R indicating the sort order of the resulting relation.
Example
- Write a relational algebra that presents net pay of employees in order.
{t | p(t )}
Department of Electrical and Computer Engineering | AAU
Where t is a tuple variable and p(t) is a predicate (condition) that is to be true for the tuple t.
Formulas in the predicate of the tuple calculus are composed of atoms, variables and the quantifiers
∃ (existential quantifier) and ∀ (universal quantifier).
Example
- Find all the employees that are working on projects having a due date before January 1,
2007.
{e.Name | Employees(e) AND ((∃et)(∃t)(∃p)(EmpTeams(et) AND et.EmpId = e.EmpId AND
          Teams(t) AND et.TeamId = t.TeamId AND
          Projects(p) AND t.PrjId = p.PrjId AND p.DDate < 1/1/2007))}
{yz | (∃x)(∃a)(∃b)(Teams(wxyz) AND Projects(abcde) AND b = "banking db" AND x = a)}
ª Introduction
Structured Query Language (SQL) is a query language that is standardized by the American
National Standards Institute (ANSI) for most commercial relational database management
systems (RDBMS). To retrieve or update information, users execute 'queries' (SQL statements)
that pull or modify the requested information from the database using criteria defined by
the user.
Unfortunately, there are many different versions of the SQL language, but to be in compliance
with the ANSI standard, they must support the same major keywords in a similar manner (such
as SELECT, UPDATE, DELETE, INSERT, WHERE, and others). Most SQL database
programs also have their own proprietary extensions in addition to the SQL standard, such as
T-SQL of Microsoft SQL Server and PL/SQL of Oracle.
SQL supports data definition, query and update through its Data Definition Language (DDL) and Data
Manipulation Language (DML).
Example:
- CREATE DATABASE SWPRJCT
<database_name> is the name of the new database.
The command also has different optional parameters in different RDBMS that helps in specifying
owner, file, growth, …
The primary key constraint in a relation is enforced by using the keyword PRIMARY KEY
following the key attribute or, in case of multiple attributes, it can be specified on a separate line as
shown in the Teams table above.
The referential integrity constraint in a relational database is implemented by the use of a
foreign key. If the referential integrity enforced using a FOREIGN KEY is violated, the default
behaviour is to reject the violating operation. However, by the use of the optional
referential triggered actions the designer can attach clauses to the foreign key constraint such as:
- ON DELETE {CASCADE | NO ACTION | SET DEFAULT | SET NULL}
- ON UPDATE {CASCADE | NO ACTION | SET DEFAULT | SET NULL}
The default case is NO ACTION, in which the violating action is rejected. The CASCADE option
with ON DELETE deletes all the referencing rows when the referenced row is deleted. SET DEFAULT and SET
NULL replace the foreign key column value in all the referencing rows by the default value or the null
value. (Microsoft SQL Server 2000 doesn’t support SET DEFAULT and SET NULL.)
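The behaviour of a foreign key with a referential triggered action can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite stands in for the RDBMS used in the course, the reduced table definitions and sample rows are invented for illustration, and SQLite enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: enable FK enforcement
conn.execute("CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Teams (
        TeamId INTEGER PRIMARY KEY,
        PrjId  INTEGER REFERENCES Projects(PrjId) ON DELETE CASCADE,
        Name   TEXT
    )""")
conn.execute("INSERT INTO Projects VALUES (1, 'Banking DB'), (2, 'Payroll')")
conn.execute("INSERT INTO Teams VALUES (1, 1, 'Programmers'), (2, 1, 'Testers'), "
             "(3, 2, 'Programmers')")

# A team that references a non-existent project violates the constraint.
try:
    conn.execute("INSERT INTO Teams VALUES (4, 99, 'Orphans')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting project 1 cascades to the two teams that reference it.
conn.execute("DELETE FROM Projects WHERE PrjId = 1")
remaining = conn.execute("SELECT TeamId FROM Teams ORDER BY TeamId").fetchall()
print(rejected, remaining)
```

With NO ACTION in place of CASCADE, the DELETE itself would be rejected while teams still reference the project.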
The ALTER TABLE command allows modification (adding, changing, or dropping) of a column
or constraint in a table.
The syntax for the command is:
ALTER TABLE <table_name>
[ALTER COLUMN <column_name> <new_data_type>] |
[ADD <column_definition> | <constraint>] |
[DROP <column_name> | < constraint>]
- <table_name> is the name of the table to be altered.
- The ALTER TABLE command takes either of the three optional actions ALTER
COLUMN, ADD or DROP. The ALTER COLUMN option modifies an existing column
definition, the ADD option adds a new column or constraint and the DROP option drops
existing column or constraint.
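As a small illustration of the ADD action, the following sketch uses Python's sqlite3 module. The table definition is a reduced, assumed version of Projects; SQLite is used only as a convenient stand-in, and it accepts a narrower ALTER TABLE than the general syntax above (it writes the action as ADD COLUMN and has no ALTER COLUMN action).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, Name TEXT)")

# Add a due-date column to the existing table.
conn.execute("ALTER TABLE Projects ADD COLUMN DDate TEXT")

# PRAGMA table_info is SQLite-specific; the second field of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Projects)")]
print(columns)
```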
Example 1:
Example 2:
The DROP command is used to drop an existing table, database or schema. The syntax for the
command is:
DROP TABLE <table_name>
DROP DATABASE <database_name>
DROP SCHEMA <schema_name>
Example:
DROP TABLE Projects
book. The same applies to a database index; it helps to find information about a specific row or
rows without having to search through the entire table.
An index for a table is managed by an external table which consists of the search key (index
attribute) and a pointer to the location of the data as columns.
Creating indexes is a straightforward process when done with the CREATE INDEX statements.
The basic CREATE INDEX statement is:
CREATE [CLUSTERED | NONCLUSTERED] INDEX <index_name>
ON {<table> | <view> } ( <column> [ ASC | DESC ] [ ,...n ] )
Example:
The following statement creates the DueDate nonclustered index on the Projects table:
CREATE INDEX DueDate ON Projects(DDate)
NOTE: A table can have only one clustered index. If a primary key constraint is created on a
table, a clustered index may be created to support the constraint.
The DROP command is also used with indexes to drop an existing index on a table. The
syntax for the command is:
DROP INDEX <index_name> [,...n ]
Example:
DROP INDEX DueDate
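Creating and dropping the DueDate index can be tried with Python's sqlite3 module. This is a sketch under assumptions: the reduced Projects definition is invented for illustration, SQLite does not use the CLUSTERED and NONCLUSTERED keywords, and the sqlite_master catalog table queried here is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, Name TEXT, DDate TEXT)")


def index_names():
    """List the indexes currently defined on Projects (SQLite catalog query)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Projects'")
    return [r[0] for r in rows]


conn.execute("CREATE INDEX DueDate ON Projects(DDate)")
after_create = index_names()

conn.execute("DROP INDEX DueDate")
after_drop = index_names()
print(after_create, after_drop)
```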
- <column_list> is the list of column names whose values are retrieved by the query.
- <table_list> is the list of table names required in the process.
- <condition> is a Boolean expression (conditional expression) that determines the rows to be
selected in the query. The expression is built from the logical comparison operators (=, >,
<, >=, <= and <>)
The column list in the SELECT clause can be replaced by an asterisk (*) to retrieve all the
columns in the participating tables.
The WHERE clause is an optional clause needed when a condition is to be set for retrieval of
rows, if the clause is not used in the statement, all the rows for the selected columns in the
specified tables will be retrieved.
Example:
A query to retrieve all the columns for all projects:
SELECT *
FROM Projects
A query to retrieve the name and due date of projects that are not yet completed:
SELECT Name, DDate
FROM Projects
WHERE CDate IS NULL
A query to retrieve the projects name and corresponding team names for projects that are
not yet completed:
SELECT Projects.Name, Teams.Name
FROM Projects, Teams
WHERE Projects.PrjId=Teams.PrjId AND CDate IS NULL
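One point worth checking when testing for a missing completion date: in standard SQL a comparison with NULL using = is never true, so the test is written with IS NULL. The sketch below, using Python's sqlite3 module with invented sample rows, shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, Name TEXT, DDate TEXT, CDate TEXT)")
conn.executemany("INSERT INTO Projects VALUES (?, ?, ?, ?)", [
    (1, "Banking DB", "2007-03-01", "2007-02-20"),   # completed project
    (2, "Payroll", "2007-06-15", None),              # not yet completed: CDate is NULL
])

# Comparison with NULL evaluates to unknown, so no row qualifies.
with_equals = conn.execute("SELECT Name FROM Projects WHERE CDate = NULL").fetchall()

# IS NULL is the test for a missing value.
with_is_null = conn.execute("SELECT Name FROM Projects WHERE CDate IS NULL").fetchall()
print(with_equals, with_is_null)
```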
In SQL queries it may happen that two participating tables have columns with identical names,
to avoid the ambiguity of the columns the name of the table is used together with the column
name as shown above. Ambiguity may also arise if a single table is to participate more than once
in a query, in such situations an alias may be used for the tables as shown in the following query.
Example:
SELECT p.Name, t.Name
FROM Projects AS p, Teams AS t
WHERE p.PrjId=t.PrjId AND p.CDate IS NULL
The SELECT statement by default returns a bag of rows rather than a set of rows (i.e. duplicate
rows may exist in the result). To remove duplicates and have a set of rows as a result,
one can use the DISTINCT keyword in the SELECT clause as follows:
SELECT DISTINCT <column_list>
FROM <table_list>
WHERE <condition>
Example:
A query to retrieve employees name, the projects they are participating and due date of
the project.
SELECT DISTINCT e.Name, p.Name, p.DDate
FROM Employees AS e, EmpTeams AS et, Teams AS t, Projects AS p
WHERE e.EmpId=et.EmpId AND et.TeamId=t.TeamId AND p.PrjId=t.PrjId
If the SELECT is not DISTINCT, the resulting table (view) will include identical
rows for an employee participating in different teams for the same project.
Strings in the WHERE clause can be compared with the use of the comparison operators (=, <,
>, <=, >= and <>) and also the LIKE operator that provides the capability to compare strings
on the basis of pattern match. The expression is of the form:
S LIKE P
Where S is the string or the column name to be compared and P is the pattern, which may contain
two special characters:
- _ : matches any one character in S, and
- % : matches any sequence of zero or more characters in S.
String constants in SQL are enclosed in single apostrophes. If the string contains an
apostrophe, an escape sequence with an apostrophe is used (i.e. two single apostrophes are used to
refer to a single apostrophe in a string constant).
The LIKE expression can also be used with the NOT operation as follows
S NOT LIKE P
Example:
A query to retrieve employees with a name starting with the letter ‘A’.
SELECT *
FROM Employees
WHERE Name LIKE 'A%'
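The pattern characters and the doubled apostrophe can be tried with Python's sqlite3 module. This is only a sketch: the reduced Employees table is assumed, three names are borrowed from the earlier Emp_Teams instance, and the name containing an apostrophe is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmpId TEXT PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [
    ("E001", "Alemu Girma"),
    ("E003", "Almaz B"),
    ("E004", "Kelem Belete"),
    ("E009", "O'Brien D"),       # hypothetical name containing an apostrophe
])


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


starts_with_a = names("SELECT Name FROM Employees WHERE Name LIKE 'A%' ORDER BY Name")
second_letter_e = names("SELECT Name FROM Employees WHERE Name LIKE '_e%' ORDER BY Name")
not_a = names("SELECT Name FROM Employees WHERE Name NOT LIKE 'A%' ORDER BY Name")
# Two apostrophes inside the constant stand for one apostrophe in the string.
with_apostrophe = names("SELECT Name FROM Employees WHERE Name = 'O''Brien D'")
```

Note that the = operator would compare Name with the literal three-character string A%, which is why the pattern match needs LIKE.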
INSERT
The INSERT statement adds one or more new rows to a table. In a simplified treatment,
INSERT has this form:
INSERT INTO <table_name>| <view_name> [(column_list)] data_values
- data_values are one or more rows to be inserted into the named table or view.
- column_list is a list of column names, separated by commas, that can be used to specify the
columns for which data is supplied.
If column_list is not specified, all the columns in the table or view receive data. When a
column_list does not name all the columns in a table or view, a value of NULL (or the default
value if a default is defined for the column) is inserted into any column not named in the list. All
columns not specified in the column list must either allow null values or have a default assigned.
The data values supplied must match the column list. The number of data values must be the
same as the number of columns, and the data type, precision, and scale of each data value must
match those of the corresponding column.
There are two ways to specify the data values:
& VALUES (<value_or_expression> [,..n])
& SELECT <subquery>
The VALUES clause inserts a single row with the column values <value_or_expression> in
the columns listed in the INSERT INTO column list. The SELECT subquery is a standard
query that produces a temporary table, and the resulting rows are inserted into the table
named in the INSERT INTO clause. The columns in the subquery need to match the columns in the
column list.
Example
INSERT INTO Projects(PrjId, Name, SDate)
VALUES (1, 'Test Project', '05-25-2006')
INSERT INTO Teams
VALUES (1, 1, 'Programmers Team 1', 'Programmers team for project 2.')
INSERT INTO Teams(TeamId, PrjId, Name)
UPDATE
The UPDATE statement changes the existing data in a table. The syntax for the UPDATE
command is:
UPDATE <table_name>| <view_name>
SET <column_name> = <value> [,..n]
WHERE <condition>
- <value> is new value to be assigned to the column <column_name>
- The WHERE clause specifies the <condition> for selecting the rows to be modified. If the
WHERE clause is not included the update will be done for all existing rows in the table
Example
UPDATE Teams
SET Descr = 'Programmers team for project2'
WHERE TeamId = 11
DELETE
The DELETE statement removes row(s) from a table. The syntax for the DELETE command
is:
DELETE FROM <table_name>| <view_name>
WHERE <condition>
- The WHERE clause specifies the <condition> for selecting the rows to be deleted. If the
WHERE clause is not included, all existing rows in the table will be deleted unless
there is a constraint that prevents the deletion of the rows.
Example
DELETE FROM Teams
WHERE TeamId=2
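The three data modification statements can be followed end to end with Python's sqlite3 module. This is a sketch with an assumed Teams definition and invented rows; it also shows that a column left out of the INSERT column list receives NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Teams (TeamId INTEGER PRIMARY KEY, PrjId INTEGER, Name TEXT, Descr TEXT)")


def teams():
    """Return the current content of Teams ordered by TeamId."""
    return conn.execute("SELECT TeamId, Name, Descr FROM Teams ORDER BY TeamId").fetchall()


# INSERT without a column list supplies every column; with a list, Descr is left NULL.
conn.execute("INSERT INTO Teams VALUES (1, 1, 'Programmers Team 1', 'Programmers of project 1')")
conn.execute("INSERT INTO Teams (TeamId, PrjId, Name) VALUES (2, 1, 'Testers Team 1')")
after_insert = teams()

# UPDATE changes only the rows selected by the WHERE clause.
conn.execute("UPDATE Teams SET Descr = 'Testers of project 1' WHERE TeamId = 2")
after_update = teams()

# DELETE removes only the rows selected by the WHERE clause.
conn.execute("DELETE FROM Teams WHERE TeamId = 1")
after_delete = teams()
```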
Ñ Nested Queries
SQL SELECT statements can be contained in the WHERE clause of another SQL statement to
form nested queries. The SELECT statement that contains the nested query is said to be the
outer query. Subqueries in a nested SQL statement can produce a scalar value (constant) or a table.
Subqueries returning a scalar value can be used in comparison expressions of the WHERE clause
similar to constant or column value comparisons. For subqueries that return a table, special
operators are used in the test expression, such as the operator IN, which tests the existence
of a scalar value in the resulting table.
Example:
Considering the following relations,
- Employees(EmpId, Name, BDate, SubCity, Kebele, Phone, Salary)
- Teams(TeamId, PrjId, Name, Descr)
- EmpTeams(EmpId, TeamId)
- Projects(PrjId, Name, SDate, DDate, CDate, CustId)
- Customers(CustId, Name, Address)
Write a query to retrieve all the projects that are owned by the customer ‘XYZ’. (Assume
name of a customer is unique)
SELECT Name, SDate, DDate, CDate
FROM Projects
WHERE CustId = (SELECT CustId
                FROM Customers
                WHERE Name = 'XYZ')
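Both kinds of subquery, one returning a scalar and one returning a table, can be run with Python's sqlite3 module. This is a sketch on reduced, assumed versions of Customers and Projects; all sample rows, including the customer addresses, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustId INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, Name TEXT, CustId INTEGER);
    INSERT INTO Customers VALUES (1, 'XYZ', 'Jimma'), (2, 'ABC', 'Addis Ababa'),
                                 (3, 'DEF', 'Jimma');
    INSERT INTO Projects VALUES (1, 'Banking DB', 1), (2, 'Payroll', 2),
                                (3, 'Inventory', 1), (4, 'Library', 3);
""")

# Scalar subquery: customer names are assumed unique, so it returns one CustId.
owned_by_xyz = [row[0] for row in conn.execute("""
    SELECT Name FROM Projects
    WHERE CustId = (SELECT CustId FROM Customers WHERE Name = 'XYZ')
    ORDER BY Name""")]

# Table subquery: several customers may qualify, so IN is used for the test.
owned_in_jimma = [row[0] for row in conn.execute("""
    SELECT Name FROM Projects
    WHERE CustId IN (SELECT CustId FROM Customers WHERE Address = 'Jimma')
    ORDER BY Name""")]
print(owned_by_xyz, owned_in_jimma)
```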
Write a query to retrieve employees name and phone that are participating on projects
between.
That is:
(Subquery 1)
UNION | INTERSECT | EXCEPT
(Subquery 2)
For the set operations, the two subqueries involved should select identical columns in the
<column-list> of their SELECT statements. If the need arises, an alias (rename) of columns can be
used to have a common set of attributes.
Example:
Write a query to retrieve all the projects that are owned by ‘XYZ’ and ‘Dave’ is not
participating.
(SELECT p.Name, SDate, DDate, CDate
FROM Projects AS p, Customers AS c
WHERE p.CustId = c.CustId AND c.Name = 'XYZ')
EXCEPT
Ñ Joined Tables
Most of the time queries are written to gather information from more than one table. In such
situation the FROM clause in the SELECT statement may consist of the tables lists and a
Cross Product
<left_table> CROSS JOIN <right_table>
The Cross product forms the Cartesian product set from the participating tables.
Natural Join
<left_table> NATURAL JOIN <right_table>
The natural join forms a join of rows with identical values in the common attributes of the
participating tables.
Theta Join
<left_table> JOIN <right_table> ON<condition>
The theta join forms the theta join on the joining condition specified by the ON clause.
Outer Join
<left_table> [NATURAL] {LEFT | RIGHT | FULL} [OUTER] JOIN
<right_table> [ON <condition>]
The outer join keeps all the rows of the right table (if RIGHT is used), or the
left table (if LEFT is used), or both the tables (if FULL is used) when joining to the other table on the
joining condition. When there is no match in the other table, NULL values are
placed in its selected columns. The NATURAL keyword may be used optionally in the
OUTER JOIN, in place of the ON clause, to join on the common attributes.
Example:
The query that retrieves employees name and phone that are participating on projects
that are owned by the customer ‘XYZ’ can be modified as follows using joined tables:
SELECT Name, Phone
FROM Employees
WHERE EmpId IN (SELECT EmpId
                FROM EmpTeams NATURAL JOIN Teams
                WHERE PrjId IN (SELECT PrjId
                                FROM Projects AS p JOIN Customers AS c
                                     ON p.CustId = c.CustId
                                WHERE c.Name = 'XYZ'))
A query to retrieve all the projects and the teams they consist of, if any:
SELECT p.Name, p.SDate, p.DDate, t.Name, t.Descr
FROM Projects AS p LEFT OUTER JOIN Teams AS t ON p.PrjId=t.PrjId
The query returns all the projects, and if a project has teams, the teams
will be joined with the project as well. Projects having more than one team will be
joined with each of their teams in the resulting table.
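The difference between the theta join and the left outer join can be seen on a small instance with Python's sqlite3 module. This is a sketch with reduced, assumed table definitions and invented rows; the project Inventory deliberately has no team.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Teams (TeamId INTEGER PRIMARY KEY, PrjId INTEGER, Name TEXT);
    INSERT INTO Projects VALUES (1, 'Banking DB'), (2, 'Payroll'), (3, 'Inventory');
    INSERT INTO Teams VALUES (1, 1, 'Programmers'), (2, 1, 'Testers'), (3, 2, 'Programmers');
""")

# Theta (inner) join: projects without a team do not appear.
inner = conn.execute("""
    SELECT p.Name, t.Name
    FROM Projects AS p JOIN Teams AS t ON p.PrjId = t.PrjId
    ORDER BY p.PrjId, t.TeamId""").fetchall()

# Left outer join: every project appears; a missing team shows up as NULL (None in Python).
left_outer = conn.execute("""
    SELECT p.Name, t.Name
    FROM Projects AS p LEFT OUTER JOIN Teams AS t ON p.PrjId = t.PrjId
    ORDER BY p.PrjId, t.TeamId""").fetchall()
print(inner)
print(left_outer)
```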
Summation
SUM (<column_name>)
Returns the sum of values for the numeric field (column) <column_name>.
Average
AVG (<column_name>)
Returns the average of values for the numeric field (column) <column_name>.
Minimum
MIN (<column_name>)
Returns the minimum of values for the numeric field (column) <column_name>.
Maximum
MAX (<column_name>)
Returns the maximum of values for the numeric field (column) <column_name>.
Count
COUNT (<column_name> | *)
Returns the number of non-null values in the column <column_name> or the number of rows in the
table if * is used.
Example:
A query that retrieves the number of employees
SELECT COUNT(*)
FROM Employees
NULL values are ignored by the aggregation functions. That is, an AVG function will not include
NULL values in the average calculation in any way, and the COUNT(A) function counts only
the non-null values of column A.
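The treatment of NULL by the aggregation functions can be verified with Python's sqlite3 module. This is a sketch on an assumed two-column Employees table; the three rows are invented, and one salary is deliberately left NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmpId TEXT PRIMARY KEY, Salary INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [
    ("E001", 1000),
    ("E004", 3000),
    ("E005", None),     # salary not known: stored as NULL
])

count_rows, count_salary, total, average, lowest, highest = conn.execute("""
    SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary), MIN(Salary), MAX(Salary)
    FROM Employees""").fetchone()
print(count_rows, count_salary, total, average, lowest, highest)
```

The average is taken over the two known salaries, not over the three rows.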
The GROUP BY clause is used after the WHERE clause to group the result of the SQL
statement. The syntax for the clause is:
GROUP BY <group_column>[,..n]
HAVING <condition>
- The GROUP BY clause specifies the column(s) on which the grouping is made.
- The HAVING clause sets a condition <condition> for the groups that are included in the
result of the grouping query.
Example:
A query that counts the number of teams a given Project is having
SELECT p.Name, COUNT(t.Name) AS NoOfTeams
FROM Projects p JOIN Teams t ON p.PrjId=t.PrjId
GROUP BY p.Name
To add a condition on the groups, such as keeping only the projects that have more
than 2 teams:
SELECT p.Name, COUNT(t.Name) AS NoOfTeams
FROM Projects p JOIN Teams t ON p.PrjId=t.PrjId
GROUP BY p.Name
HAVING COUNT(t.Name)>2
For only the projects that are having a name starting with ‘A’, the query can be written
as:
SELECT p.Name, COUNT(t.Name) AS NoOfTeams
FROM Projects p JOIN Teams t ON p.PrjId=t.PrjId
GROUP BY p.Name
HAVING p.Name LIKE 'A%'
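The three grouping queries can be run on a small instance with Python's sqlite3 module. This is a sketch with reduced, assumed table definitions; the project and team names are invented so that one project has more than two teams and two project names start with 'A'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Teams (TeamId INTEGER PRIMARY KEY, PrjId INTEGER, Name TEXT);
    INSERT INTO Projects VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Atlas');
    INSERT INTO Teams VALUES (1, 1, 'Programmers'), (2, 1, 'Testers'), (3, 1, 'Designers'),
                             (4, 2, 'Programmers'), (5, 3, 'Programmers');
""")

base = """
    SELECT p.Name, COUNT(t.Name) AS NoOfTeams
    FROM Projects p JOIN Teams t ON p.PrjId = t.PrjId
    GROUP BY p.Name
"""
per_project = conn.execute(base + " ORDER BY p.Name").fetchall()
more_than_two = conn.execute(base + " HAVING COUNT(t.Name) > 2 ORDER BY p.Name").fetchall()
starting_with_a = conn.execute(base + " HAVING p.Name LIKE 'A%' ORDER BY p.Name").fetchall()
print(per_project, more_than_two, starting_with_a)
```

Because p.Name is a grouping column, the last condition could equally be placed in a WHERE clause before the grouping, which filters the rows before any group is formed.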
SUMMARY
In summary the general form the SELECT statement is as follows
SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC | DESC] ]
The clauses enclosed in square brackets are optional clauses that can be omitted.
ª Views
A view in the context of SQL is a virtual table that is derived from one or more tables. A view
does not necessarily exist in physical form; rather, it presents frequently used data in a different
form and is used for retrieving data, as well as for updating or deleting rows in the underlying
tables.
Example:
It may be frequently required to retrieve employees and the projects they are
participating in. In such a case it would be advantageous to have a view that consists of the
employees and the projects instead of performing the join operation every time.
It should also be noted that as data in the original table changes, so does data in the view. This is
because a view isn't really a table itself, but only a way to look at part of the original table.
As views are generated from tables, some views are subject to restrictions that do not allow
data update, insert or delete. But if a view allows data update, rows updated or deleted in the
view are updated or deleted in the corresponding table as well.
The CREATE VIEW statement is used to create views and the syntax is as follows:
CREATE VIEW <view_name>
AS <select_statement>
Example:
Considering the following relations,
- Employees(EmpId, EmpName, BDate, SubCity, Kebele, Phone, Salary)
- Teams(TeamId, PrjId, TeamName, Descr)
- EmpTeams(EmpId, TeamId)
- Projects(PrjId, PrjName, SDate, DDate, CDate, CustId)
A view for retrieving employees and the projects they are participating in:
CREATE VIEW ProjectEmployees
AS SELECT p.*, e.*
FROM Projects AS p NATURAL JOIN Teams AS t
NATURAL JOIN EmpTeams AS et NATURAL JOIN Employees AS e
REMARK: All the columns in the SELECT statement must have valid unique names. Otherwise
an alias has to be used for the columns.
Views, like other database objects, can be dropped using the DROP statement as follows:
DROP VIEW <view_name>
Example:
To drop the ProjectEmployees view:
DROP VIEW ProjectEmployees
Views can be used in the FROM clause of a SELECT statement in the same way as tables.
Example:
To retrieve all the employees working on the 'XYZ' project.
SELECT EmpName
FROM ProjectEmployees
WHERE PrjName='XYZ'
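A minimal sketch of the view behaviour described above, run in SQLite through Python's sqlite3 module. The tables are cut down to the columns needed for the joins and the rows are hypothetical sample data; both are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmpId INTEGER PRIMARY KEY, EmpName TEXT);
CREATE TABLE Projects (PrjId INTEGER PRIMARY KEY, PrjName TEXT);
CREATE TABLE Teams (TeamId INTEGER PRIMARY KEY, PrjId INTEGER, TeamName TEXT);
CREATE TABLE EmpTeams (EmpId INTEGER, TeamId INTEGER);
INSERT INTO Employees VALUES (1, 'Abebe'), (2, 'Sara');
INSERT INTO Projects VALUES (10, 'XYZ');
INSERT INTO Teams VALUES (100, 10, 'Design');
INSERT INTO EmpTeams VALUES (1, 100);

CREATE VIEW ProjectEmployees AS
SELECT p.*, e.*
FROM Projects AS p NATURAL JOIN Teams AS t
     NATURAL JOIN EmpTeams AS et NATURAL JOIN Employees AS e;
""")

query = "SELECT EmpName FROM ProjectEmployees WHERE PrjName = 'XYZ' ORDER BY EmpName"

# The view is queried like a table.
before = [name for (name,) in conn.execute(query)]

# A change in a base table is visible through the view without redefining it.
conn.execute("INSERT INTO EmpTeams VALUES (2, 100)")
after = [name for (name,) in conn.execute(query)]

print(before, after)
```

The second query returns the newly assigned employee as well, which illustrates that the view stores no data of its own.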
& Seek Time: time taken for the read/write head to reach the proper track (cylinder). The typical
range for seek time is 7 to 10 milliseconds.
& Rotational Latency (Delay): time taken for the sector containing the first desired block to
rotate under the head. A typical disk completes one rotation in about 10 milliseconds.
& Transfer Time: time to transfer the data to memory.
Ñ Data Representation
Data is stored in the form of records, each consisting of a collection of related data items. The
data items or values form a sequence of bytes that corresponds to particular fields.
Data type representation:
- INTEGER – 4 Bytes
- FLOAT – 4 or 8 Bytes
- DATETIME – 8 Bytes
- CHAR(n) – n Bytes; pad character (┴) is used to fill in unused characters’ bytes.
- VARCHAR(n) – maximum of n+1 Bytes; unused characters’ bytes are ignored.
- Enumerated types – represented as integer codes with the required number of bytes.
Fixed Length Record Representation
Example:
Consider the Employees table:
Employees(EmpId, Name, BDate, Address, Salary)
- EmpId – INTEGER – 4 Bytes
- Name – CHAR(30) – 30 Bytes
- BDate – DATETIME – 8 Bytes
- Address – VARCHAR(50) – 51 Bytes
- Salary – FLOAT – 4 Bytes
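The field sizes above fix the length of every Employees record, which in turn determines how many records fit in a block. The sketch below adds up the sizes and computes the field offsets; the 4096-byte block and the 64-byte block header are assumed values chosen only for illustration.

```python
# Field sizes in bytes, taken from the Employees example above.
FIELD_SIZES = {"EmpId": 4, "Name": 30, "BDate": 8, "Address": 51, "Salary": 4}

BLOCK_SIZE = 4096   # assumed block size in bytes
HEADER_SIZE = 64    # assumed size of the block header in bytes

# In a fixed-length record each field starts at a known offset.
offsets = {}
position = 0
for field, size in FIELD_SIZES.items():
    offsets[field] = position
    position += size
record_length = position

# Number of whole records that fit in the space left after the header.
records_per_block = (BLOCK_SIZE - HEADER_SIZE) // record_length

print(record_length, offsets, records_per_block)
```

With these assumptions a record is 97 bytes long and 41 records fit in one block.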
Block Header | R1 | R2 | … | Rn
The block header may consist of:
- Link to one or more other blocks.
- Information about the block.
- Information about the relation.
- Directory for the offset of each record.
- Block ID.
- Time stamp for the block.
Deletion is expensive: similar to insertion, deletion may also involve large
data movement.
Update: may require data reorganization if the updated field is the ordering key;
otherwise update is a simple operation that requires reading the block, modifying the
record and writing the block back to disk.
Indexes
Indexes are data structures to organize records via trees or hashing. Like sorted files,
they speed up searches for a subset of records, based on values in certain “search key”
fields. Updates are much faster than in sorted files.
Hashing
A file organization based on hashing provides fast access to records on certain search
conditions. In hashing, a hash function h(), also known as a randomizing function, is applied to
the hash field value to yield the address of the disk block in which the record is to be
stored.
B-Trees
The B-Tree data structure can be used as the primary organization of the records. B-Tree
is also used in indexing.
Primary Index
An index is a file, separate from the data file, that consists of index records (or index entries).
Each entry holds a search key value and a pointer to one or more records. The index file is ordered
on the search key, and the pointer identifies a disk block and an offset within the block to locate
the record. The primary index is mostly built on the primary key, and in this context the records
have distinct values in the search key.
There are two types of ordered indexes namely dense index and sparse index.
Dense Index: has an index record for every search key in the data file. The number of entries in
a dense index is equal to the number of records in the data file.
Sparse (Non-dense) Index: has an index entry for only the first record in each block, known as
the anchor record of the block. The number of entries in the index file is equal to the number of
blocks in the data file. To locate a record with a sparse index, find the index entry with the
largest search key value that is less than or equal to the searched value, and read the block it
points to.
NOTE: A single data file can have only one primary or clustering index.
Example:
a) Dense index
Index File          Data File: Employees
Key        Ptr      Name        …   Salary
Adam        →       Adam            2500
Charles     →       Charles         1400
Dave        →       Dave            1200
Elisabeth   →       Elisabeth       1600
Helen       →       Helen           1500
John        →       John            1200
Kevin       →       Kevin           2300
Mary        →       Mary            1800
b) Sparse index (one entry per block; the index keys imply blocks of three records)
Index File          Data File: Employees
Key        Ptr      Name        …   Salary
Adam        →       Adam            2500
                    Charles         1400
                    Dave            1200
Elisabeth   →       Elisabeth       1600
                    Helen           1500
                    John            1200
Kevin       →       Kevin           2300
                    Mary            1800
Fig 3. a) Dense index b) Sparse Index for Primary Index
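The two lookups of Fig 3 can be sketched in a few lines of Python. The in-memory lists stand in for disk blocks and the function names are hypothetical; the block size of three records is an assumption read off the sparse index keys in the figure.

```python
from bisect import bisect_right

# Data file of Fig 3, ordered on Name and stored in blocks of three records.
blocks = [
    [("Adam", 2500), ("Charles", 1400), ("Dave", 1200)],
    [("Elisabeth", 1600), ("Helen", 1500), ("John", 1200)],
    [("Kevin", 2300), ("Mary", 1800)],
]

# Dense index: one entry per record -> (block number, offset within the block).
dense = {
    name: (b, i)
    for b, block in enumerate(blocks)
    for i, (name, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the key of the anchor record.
anchors = [block[0][0] for block in blocks]


def dense_lookup(key):
    """Follow the index pointer straight to the record."""
    position = dense.get(key)
    if position is None:
        return None
    b, i = position
    return blocks[b][i][1]


def sparse_lookup(key):
    """Pick the block whose anchor is the largest key <= the searched key, then scan it."""
    b = bisect_right(anchors, key) - 1
    if b < 0:
        return None
    for name, salary in blocks[b]:
        if name == key:
            return salary
    return None


print(dense_lookup("Helen"), sparse_lookup("Helen"))
```

The dense index has eight entries and the sparse index only three, but the sparse lookup has to scan the block it lands in.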
Secondary Index
Secondary indexes provide a secondary way of accessing the data file. Since the data file is not
ordered on the search key of the secondary index, a block anchor cannot be used to build a
sparse index. Hence a secondary index is necessarily a dense index. Secondary
indexes enhance the performance of queries that use keys other than the search key of the
primary index.
Multilevel Indexes
The main reason for having an index file is to allow a better search algorithm, such as binary
search, that reduces response time considerably. Binary search requires about log2(bi) block
accesses for an index file having bi blocks. For a large data file the index file also grows and may
not fit in memory, so searching it requires several disk block reads.
Example:
Consider a data file having 10,000,000 records, where a block holds 10 records for
data and 100 entries for the index (blocking factor). Determine the maximum number of block
accesses for a data search using:
a. A sequential search with no index.
b. A sequential search on the dense primary index search key.
c. A sequential search on the sparse primary index search key.
d. A binary search on a dense primary index search key, and
e. A binary search on a sparse primary index search key.
Solution
a. 10,000,000/10 = 1,000,000 blocks
b. 10,000,000/100 + 1 = 100,001 blocks
c. (10,000,000/10)/100 + 1 = 10,001 blocks
d. log2(100,000) + 1 = 17 + 1 = 18 blocks (log2 rounded up)
e. log2(10,000) + 1 = 14 + 1 = 15 blocks (log2 rounded up)
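The arithmetic of the solution can be reproduced with a short Python sketch. The variable names are hypothetical; the "+ 1" in cases b to e is the final read of the data block, and the logarithms are rounded up to whole block accesses.

```python
from math import ceil, log2

records = 10_000_000
bfr_data = 10      # data records per block
bfr_index = 100    # index entries per block

data_blocks = records // bfr_data                 # blocks in the data file
dense_blocks = ceil(records / bfr_index)          # one index entry per record
sparse_blocks = ceil(data_blocks / bfr_index)     # one index entry per data block

a = data_blocks                                   # scan the whole data file
b = dense_blocks + 1                              # scan dense index, then 1 data block
c = sparse_blocks + 1                             # scan sparse index, then 1 data block
d = ceil(log2(dense_blocks)) + 1                  # binary search on dense index
e = ceil(log2(sparse_blocks)) + 1                 # binary search on sparse index

print(a, b, c, d, e)
```

Since log2(100,000) is about 16.6, the binary search on the dense index needs 17 index block reads plus one data block read.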
REMARK: The blocking factor (bfr) is the ratio of the block size to the record size, for either
the data file or the index file. That is, the number of data or index records that can fit in a block.
To deal with an index file that is itself large, the primary index file (this also applies to a
secondary index) is treated as a data file and a sparse index is built on top of it. The result is a
multilevel index, which reduces the block accesses needed for reading the index file as well. The
index file on which the other index is built is referred to as the first-level index of the multilevel
index, the index on the first-level index is called the second-level index, and so on.
Example:
For the previous example consider a two-level index, then
a. A binary search for the dense primary index will result in:
10,000,000/100 = 100,000 → 100,000/100 = 1,000
log2(1,000) + 1 + 1 = 10 + 2 = 12 block accesses
b. A binary search for the sparse primary index will result in:
1,000,000/100 = 10,000 → 10,000/100 = 100
log2(100) + 1 + 1 = 7 + 2 = 9 block accesses
For a three-level sparse primary index the third level has 100/100 = 1 block, which is read
directly, so the search needs:
1 + 1 + 1 + 1 = 4 block accesses
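The multilevel counts can be reproduced with the following sketch. The function name and its parameters are hypothetical; it assumes a binary search on the top level, one block read for each lower index level and one read for the data block, with a single-block top level costing one read.

```python
from math import ceil, log2


def multilevel_accesses(first_level_entries, bfr_index, levels):
    """Block accesses for a search through a multilevel index with the given number of levels."""
    blocks = ceil(first_level_entries / bfr_index)   # blocks of the first-level index
    for _ in range(levels - 1):
        blocks = ceil(blocks / bfr_index)            # sparse index on the level below
    top_search = max(1, ceil(log2(blocks)))          # binary search on the top level
    return top_search + (levels - 1) + 1             # lower index levels + data block


dense_two_level = multilevel_accesses(10_000_000, 100, 2)
sparse_two_level = multilevel_accesses(1_000_000, 100, 2)
sparse_three_level = multilevel_accesses(1_000_000, 100, 3)
print(dense_two_level, sparse_two_level, sparse_three_level)
```

Each additional level divides the number of index blocks by the blocking factor, which is why the third level already fits in a single block here.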
Ñ B-Tree Index Files
B+-Tree Index Structure
The B+-tree index structure is a form of balanced tree in which every path from the root of the
tree to a leaf is of equal length. Each non-leaf node in the tree has between n/2 and n children.
The B+-tree index structure imposes performance overhead on insertion and deletion and adds
space overhead too; however it alleviates degradation on performance as the file grows, both for
index lookup and for sequential data scan.
A typical node of a B+-tree index structure is as follows:
P1 | K1 | P2 | K2 | P3 | … | Kn-1 | Pn
Fig 6. B+-Tree Node
K1, K2, … Kn-1 are the search keys and P1, P2, … Pn are pointers. If the node is a leaf, the
pointers point to file records; otherwise they point to nodes at the next level of the tree.
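Example:
[Figure: example B+-tree; only the key "John" is recoverable.]
(a) Leaf Node: P1 | K1 | P2 | K2 | P3 | … | Kn-1 | Pn
(b) Non-leaf Node: as in (a), with additional pointers B1, B2, B3, … stored alongside the keys
Fig 8. B-Tree Node
Example:
[Figure: example B-tree; only the keys "Dave" and "John" are recoverable.]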
Ñ Hash Index
The hashing technique for file organization avoids the need to access an index structure, which
may require more disk accesses (I/O operations). With a hash file organization the block of a
record is determined by computing a hash function on the search key. A unit of storage that can
hold one or more records having the same hash function result is referred to as a bucket.
The hash function takes the search keys and distributes the records uniformly and randomly over
the buckets.
& Uniform distribution: the hash function assigns each bucket the same number of search
key values from the set of all possible search key values.
& Random distribution: the hash value will not be correlated to any externally visible
ordering on the search key values; the hash function will appear to be random.
Example:
A hash function that finds the sum of the binary representations of the characters in the
search key value and takes it modulo the number of buckets.
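The example hash function can be sketched in Python as follows. The function name, the use of UTF-8 byte values as the "binary representation" and the choice of four buckets are assumptions made for illustration.

```python
def bucket_of(key: str, n_buckets: int) -> int:
    """Sum the byte values of the characters in the key, modulo the number of buckets."""
    return sum(key.encode("utf-8")) % n_buckets


names = ["Adam", "Charles", "Dave", "Elisabeth", "Helen", "John", "Kevin", "Mary"]

# Place every search key in the bucket selected by the hash function.
buckets = {b: [] for b in range(4)}
for name in names:
    buckets[bucket_of(name, 4)].append(name)

print(buckets)
```

A sum of character codes is easy to follow but is not a strong hash function: keys made of the same letters in a different order always fall into the same bucket.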
Bucket Overflow
The main reasons for bucket overflow are:
Insufficient Buckets: the number of buckets assigned may not be sufficient for the
current data size. The number of buckets (nB) must be chosen in such a way that it is
greater than the number of records (nT) divided by the number of records that can fit in a
bucket (fT). That is, nB > nT/fT.
Skew: some buckets may hold more records than others, and they may overflow while
the others still have space. The major reasons for skew are:
- Multiple records for the same search key,
- Non-uniform distribution of search keys by the hash function.
The best solution for the overflow of buckets is the use of dynamic hashing (example: extendable
hashing), which can be modified dynamically to accommodate the growth or the shrinkage of the
database. But if static hashing has to be used, then to limit the consequences of overflow one can
choose one of the following options:
- Choose a hash function based on the current size, or
- Choose a hash function based on the anticipated size of the file, or
- Periodically reorganize the hash structure in response to file growth or shrinkage.
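The bucket-count rule and the effect of skew can be illustrated with a small sketch. The function names are hypothetical, the hash is the character-sum function from the earlier example, and the key list is made-up data with one heavily repeated key.

```python
from collections import Counter


def min_buckets(n_records: int, records_per_bucket: int) -> int:
    """Smallest whole number of buckets nB that satisfies nB > nT / fT."""
    return n_records // records_per_bucket + 1


def overflowed(keys, n_buckets, records_per_bucket):
    """Return {bucket: load} for every bucket holding more records than fit in it."""
    load = Counter(sum(k.encode("utf-8")) % n_buckets for k in keys)
    return {b: n for b, n in load.items() if n > records_per_bucket}


# Skewed data: the search key "John" occurs five times.
keys = ["John"] * 5 + ["Adam", "Dave", "Mary"]

n_buckets = min_buckets(len(keys), 2)     # 8 records, 2 records per bucket
print(n_buckets, overflowed(keys, n_buckets, 2))
```

Even though the number of buckets satisfies nB > nT/fT, one bucket overflows because of the repeated key and the uneven hash, while most of the other buckets stay empty.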
Thereafter, “BasicSalary” can be used as the data type in a column definition.
Department of Electrical and Computer Engineering | AAU
EENG 447- Database Systems 2
7. Integrity and Security
The CHECK constraint can also be used in a table definition as a tuple-based constraint:
CHECK (<logical_expression>)
Example:
CREATE TABLE Employees (
:
CONSTRAINT EmpDate_Constraint CHECK (EmpDate <= GETDATE())
)
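The effect of a tuple-based CHECK constraint can be observed with the sketch below, run in SQLite through Python's sqlite3 module. The Salary column and the constraint name Salary_Constraint are hypothetical; a simple salary rule is used instead of the date rule above because GETDATE() is not available in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Employees (
    EmpId INTEGER PRIMARY KEY,
    Salary REAL,
    CONSTRAINT Salary_Constraint CHECK (Salary >= 0)
)
""")

# A row that satisfies the constraint is accepted.
conn.execute("INSERT INTO Employees VALUES (1, 1500)")

# A row that violates the constraint is rejected by the DBMS.
try:
    conn.execute("INSERT INTO Employees VALUES (2, -10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(rejected, row_count)
```

The constraint is checked for every inserted or updated tuple, so the invalid row never reaches the table.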
General Constraints
A general constraint, or user-defined constraint, is an assertion derived from the user
requirements. Domain and referential integrity constraints are special types of general
constraint.
The syntax for general assertion is:
CREATE ASSERTION <assertion_name> CHECK <predicate>
The <predicate> is a valid conditional expression similar to the <condition> in the WHERE
clause of the SELECT-FROM-WHERE statement.
Example:
Constraint on the number of employees in a team:
CREATE ASSERTION NumberOfTeamMembers CHECK
(8 >= ALL (SELECT COUNT(EmpId) FROM EmpTeams GROUP BY TeamId))
When an assertion is created, the system tests it for validity. If the assertion is valid, any
future modification to the database is allowed only if it does not cause the assertion to be violated.
Ñ Triggers
Triggers are statements that the database management system executes automatically in
response to a modification to the database. Triggers need to specify:
- The event that will cause or initiate the trigger execution,
- Condition to be specified for the trigger execution to proceed, and
- The action to be taken in response.
event → [condition?] → action
The trigger action may be used to inform respective administrators to take actions through
email, or it may execute some operation in response.
The trigger events are:
- INSERT, DELETE and UPDATE.
The actions for the triggers may be taken:
- After successful completion of the operation (event): AFTER
- Before the execution of the operation (event): BEFORE, or in place of it: INSTEAD OF
The syntax for the trigger statement is:
CREATE TRIGGER <trigger_name>
ON {<table>|<view>}
{FOR | AFTER | INSTEAD OF} {[INSERT] [,] [UPDATE] [,] [DELETE]}
AS
<SQL_Statement>
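The event, condition and action parts of a trigger can be seen in the following sketch, run in SQLite through Python's sqlite3 module. SQLite's trigger syntax (AFTER ... ON ... WHEN ... BEGIN ... END) differs from the syntax shown above, and the SalaryLog table and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmpId INTEGER PRIMARY KEY, Salary REAL);
CREATE TABLE SalaryLog (EmpId INTEGER, OldSalary REAL, NewSalary REAL);

CREATE TRIGGER LogSalaryChange
AFTER UPDATE OF Salary ON Employees          -- event
WHEN NEW.Salary <> OLD.Salary                -- condition
BEGIN
    INSERT INTO SalaryLog VALUES (OLD.EmpId, OLD.Salary, NEW.Salary);  -- action
END;

INSERT INTO Employees VALUES (1, 1000);
UPDATE Employees SET Salary = 1200 WHERE EmpId = 1;  -- condition true: logged
UPDATE Employees SET Salary = 1200 WHERE EmpId = 1;  -- condition false: not logged
""")

log = conn.execute("SELECT EmpId, OldSalary, NewSalary FROM SalaryLog").fetchall()
print(log)
```

Only the first update changes the salary, so the action runs once and the log holds a single row.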
- Human
Database System Security
Database system security can be implemented with the use of:
- Account and Role Creation,
- Privilege granting,
- Privilege revocation, and
- Security level assignment.
Ñ Authorization
Authorization levels in a database system fall into two broad categories:
Data Level Authorization
- Read
- Insert
- Update
- Delete
Schema Level Authorization
- Index
- Resource
- Alter
- Drop
Privilege Granting
The syntax for privilege granting is as follows:
GRANT <privilege_list> ON {<table>|<view>}
TO <account_list> [WITH GRANT OPTION]
<privilege_list> is a list of data level authorizations for the table or view, chosen from:
{SELECT | INSERT | UPDATE | DELETE | ALL}
To grant access to a specific column in a table
GRANT {SELECT | UPDATE } (<column>) ON {<table>|<view>}
TO <account_list> [WITH GRANT OPTION]
To grant access for the column to be referenced in a foreign key or view that requires schema
building use:
GRANT REFERENCES (<column>) ON {<table>|<view>}
TO <account_list> [WITH GRANT OPTION]
Privilege Revoking
The syntax for privilege revoking is as follows:
REVOKE <privilege_list> ON {<table>|<view>}
FROM <account_list> [RESTRICT | CASCADE]
To revoke grant option from an account:
REVOKE GRANT OPTION FOR <privilege_list> ON {<table>|<view>}
FROM <account_list>
Privilege Denying
The syntax to deny a privilege from an account list is:
DENY <privilege_list> ON {<table>|<view>}
TO <account_list> [CASCADE]
P ← C (decryption with key K)
Modern Cryptography systems can be broadly classified into Symmetric-key systems that use a
single key that both the sender and recipient have, and Asymmetric-key systems also known as
public-key systems that use two keys, a public key known to everyone and a private key that
only the recipient of messages uses.
Symmetric Key Algorithms
- DES (Data Encryption Standard)
- IDEA (International Data Encryption Algorithm)
Asymmetric Key Algorithms
- RSA (Rivest, Shamir and Adleman)
- DSA (Digital Signature Algorithm )
Ñ Authentication
Authentication is the process of verifying that a user is who they claim to be.
There are two ways of authenticating a user:
- Use of Password
- Challenge Response.
With the use of a password, a user is asked for a user name and password upon login to a
system.
In a challenge response, the system sends a challenge string to the user upon a login request; the
user then encrypts the challenge and sends the encrypted message back to the system. The system
verifies the user by comparing the originally sent challenge string with the decrypted message
received from the user. For the encryption process a symmetric-key or a public-key algorithm
may be used. With a symmetric-key algorithm the shared key is stored in the system, whereas with a
public-key algorithm the public key is the only key known by the system and the private key
remains secret with the user.
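The flow of a challenge response with a shared key can be sketched as follows. Python's standard library has no encryption cipher, so this sketch swaps in a keyed hash (HMAC-SHA256) for the encryption step: the system recomputes the expected response instead of decrypting it. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Shared secret known to both the user and the system (symmetric-key case).
shared_key = secrets.token_bytes(32)


def make_challenge() -> bytes:
    """System side: a fresh random challenge for every login request."""
    return secrets.token_bytes(16)


def respond(key: bytes, challenge: bytes) -> bytes:
    """User side: transform the challenge with the secret key."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """System side: recompute the expected response and compare in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
accepted = verify(shared_key, challenge, respond(shared_key, challenge))
impostor = verify(shared_key, challenge, respond(b"wrong key", challenge))
print(accepted, impostor)
```

The key itself never travels over the network, and because the challenge is random a recorded response cannot be reused for a later login.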
OOA views the world as objects with data structures and behaviors and events that trigger
operations, or object behavior changes, that change the state of objects. The idea that a system
can be viewed as a population of interacting objects, each of which is an atomic bundle of data
and functionality, is the foundation of object technology and provides an attractive alternative
for the development of complex systems. This is a radical departure from prior methods of
requirements specification, such as functional decomposition and structured analysis and design.
Object-oriented design (OOD) is concerned with developing an object-oriented model of a
software system to implement the identified requirements. Many OOD methods have been
described since the late 1980s. The most popular OOD methods include Booch, Buhr,
Wasserman, and the HOOD method developed by the European Space Agency. OOD can yield
the benefits of maintainability, reusability and productivity.
OOD builds on the products developed during Object-Oriented Analysis (OOA) by refining
candidate objects into classes, defining message protocols for all objects, defining data structures
and procedures, and mapping these into an object-oriented programming language (OOPL) (see
Object-Oriented Programming Languages). Several OOD methods (Booch, Shlaer-Mellor, Buhr,
Rumbaugh) describe these operations on objects, although none is an accepted industry standard.
Analysis and design are closer to each other in the object-oriented approach than in structured
analysis and design. For this reason, similar notations are often used during analysis and the
early stages of design. However, OOD requires the specification of concepts nonexistent in
analysis, such as the types of the attributes of a class, or the logic of its methods.
Design can be thought of in two phases. The first, called high-level design, deals with the
decomposition of the system into large, complex objects. The second phase is called low-level
design. In this phase, attributes and methods are specified at the level of individual objects. This
is also where a project can realize most of the reuse of object-oriented products, since it is
possible to guide the design so that lower-level objects correspond exactly to those in existing
object libraries or to develop objects with reuse potential. As in OOA, the OOD artifacts are
represented using CASE tools with object-oriented terminology.
3. Extent declaration: - name for the set of currently existing objects of the class.
8. Object Oriented Databases
Attributes in ODL
Attributes are (usually) elements with a type that does not involve classes. They can be of simple,
enumerated or structured type. The syntaxes for the three types are as follows, respectively:
attribute <type> <name>;
attribute enum <typename>
{<enumlist1>, <enumlist2>,…}<name>;
attribute struct <typename>
{<type> <name1>, <type> <name2>,…}<name>;
Example:
- Consider the “EMPLOYEES” class partial declaration.
class Employee {
attribute string empid;
attribute string name;
attribute integer age;
attribute enum Gender {Male, Female} gender;
attribute struct Address {string city, string hAddr, string phone} address;
};
REMARK: The names for the enumerated and structured data types are not necessary for the
declaration, but giving the name helps to refer to the type outside the class
declaration using the scoped name, such as “Employee::Gender” and
“Employee::Address”.
Relationships in ODL
Syntax for a relationship in ODL is as follows
relationship <type> <name>
inverse <relationship>;
The basic collection types of attributes and relationships in the ODL model are:
1. Set:- Set<T> denotes a relationship to class T with a finite number of associations between
the objects in the class. It is an unordered set of associations which does not allow
repetition.
2. Bag (Multiset):- It is similar to Set, but allows repeated associations to
one object.
3. List:- It is an association in which order matters.
4. Array:- Array<T, i> denotes a fixed number of associations to objects in the class T which
are indexed.
Inverse Relationships
Unlike E/R design the relationships in ODL model are only binary. Hence for every relationship
in class C there is an inverse relationship in the related class D. Suppose class C has a
relationship R to class D, then class D must have some relationship S to class C. R and S must
then be true inverses. That is; if object d is related to object c by R, then c must be related to d by
S.
Example:
- Consider the “EMPLOYEES” class and its relationship to “TEAMS” class.
class Employee {
relationship Set <Team> assigned;
};
class Team {
relationship Set <Employee> formed;
};
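The requirement that the two relationships be true inverses can be sketched with ordinary Python objects. This is only an analogy to the ODL declaration: the attribute names mirror the example, while the assign helper is hypothetical.

```python
class Team:
    def __init__(self, name):
        self.name = name
        self.formed = set()      # employees that form the team


class Employee:
    def __init__(self, name):
        self.name = name
        self.assigned = set()    # teams the employee is assigned to


def assign(employee, team):
    """Update both directions together so that the two sets stay true inverses."""
    employee.assigned.add(team)
    team.formed.add(employee)


abebe = Employee("Abebe")
design = Team("Design")
assign(abebe, design)

# If abebe is related to design by "assigned", design is related to abebe by "formed".
print(design in abebe.assigned, abebe in design.formed)
```

In an object database the system maintains this consistency itself once the inverse is declared, which is what the helper function imitates here.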
Multiplicity of Relationships
Multiplicity of relationships in ODL design is indicated by the type operators in the relationship
and the inverse relationship.
Many-to-many relationships: - are indicated by Set<…> for the type of the relationship
and its inverse.
One-to-many relationships: - have Set<…> in the relationship of the “one” and just the
class for the relationship of the “many.”
One-to-one relationships: - have classes as the type in both directions.
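REMARK: Note that the Set operator in the relationship type declaration is needed only where an
object can be related to many objects. It is omitted on the side that refers to a single object
in one-to-many relationships and on both sides in one-to-one relationships.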
Example:
- In the previous relationship declaration between “EMPLOYEES” class and the
“TEAMS” class there is a many-to-many relationship.
class Employee {
relationship Set <Team> assigned
inverse Team::formed;
};
class Team {
relationship Set <Employee> formed
inverse Employee::assigned;
};
NOTE: Recall that ODL does not support 3-way or higher relationships. Multiway relationships
in ODL may be simulated by a “connecting” class, whose objects represent tuples of
objects that will be connected by the multiway relationship.
Inheritance in ODL
Inheritance in ODL is similar to the usual object-oriented inheritance principle. It indicates a
relationship between a superclass and its subclasses. A subclass lists only the properties unique
to it and inherits the properties of its superclass.
Inheritance in ODL design is indicated by the colon operator as follows:
class <Subclass>:<Superclass> {
<list of element declarations>
};
Example:
A key consisting of more than one attribute needs additional parentheses around those
attributes.
Example:
class Employee (key EmpId, NationalId, (Name, BDate)) {
attribute string empid;
attribute string name;
:
relationship Set <Team> assigned
inverse Team::formed;
:
};
For each class in ODL there is an extent, the set of existing objects of that class. Extent is the
relation with that class as its schema (definition). It is indicated after the class name, along with
keys, as:
(extent <extent name> key <list of keys>)
Conventionally, singular nouns are used for class names and plural for the corresponding extent.
Example:
class Employee (extent Employees key EmpId, NationalId, (Name, BDate)) {
attribute string empid;
attribute string name;
:
relationship Set <Team> assigned
inverse Team::formed;
:
};
Set-valued attributes: Recall that attributes in an ODL class can be constructed from the type
constructors Set, Bag, List and Array. Such cases can be handled by three different methods:
1. By making one tuple for each value.
Example:
- Consider the “EMPLOYEES” class having a set of addresses:
class Employee (extent Employees key EmpId, NationalId, (Name, BDate)){
:
attribute Set<struct Address {string city, string hAddr, string phone}> address;
};
Employees
EmpId  Name          BDate     Sub City  Kebele  Phone
E001   Alemu Girma   01/10/70  Bole      06      011-663-0712
E001   Alemu Girma   01/10/70  Bole      06      011-663-0712
E004   Kelem Belete  12/04/68  Gulele    03      011-227-2525
};
class Team {
relationship Set <Employee> formed;
inverse Employee::assigned;
};
commonality in the architecture of the different OODBs because of three necessary components:
object managers, object servers, and object stores. Applications interact with object managers,
which work through object servers to gain access to object stores. OODBs provide the following
benefits:
- OODBs allow for the storage of complex data structures that cannot be easily stored
using conventional database technology.
- OODBs support all the persistence necessary when working with object-oriented
languages.
- OODBs contain active object servers that support not only the distribution of data but
also the distribution of work (in this context, relational database management systems
(DBMS) have limited capabilities).
In addition, OODBs were designed to be well integrated with object-oriented programming
languages such as C++ and Smalltalk. They use the same object model as these languages. With
OODBs, the programmer deals with transient (temporary) and persistent (permanent) objects in
a uniform manner. The persistent objects are in the OODB, and thus the conceptual walls
between programming and database are removed. As stated earlier, the employment of a unified
conceptual model greatly simplifies development.
The type of database application should dictate the choice of database management technology.
In general, database applications can be categorized into two different applications:
- Data collection applications focus on entering data into a database and providing queries
to obtain information about the data. Examples of these kinds of database applications are
accounts payable, accounts receivable, order processing, and inventory control. Because
these types of applications contain relatively simple data relationships and schema design,
relational database management systems (RDBMSs) are better suited for these
applications.
- Information analysis applications focus on providing the capability to navigate through
and analyze large volumes of data. Examples of these applications are CAD/CAM/CAE,
production planning, network planning, and financial engineering. These types of
applications are very dynamic and their database schemas are very complex. This type of
application requires a tightly-coupled language interface and the ability to handle the
creation and evolution of schema of arbitrary complexity without a lot of programmer
intervention. Object-oriented databases support these features to a great degree and are
therefore better suited for the information analysis type of applications.
OODBs are also used in applications handling BLOBs (binary large objects) such as images,
sound, video, and unformatted text. OODBs support diverse data types rather than only the
simple tables, columns and rows of relational databases.
work via an object layer that sits atop a conventional tabular relational engine. Vendors
integrate OO features into the databases via software modules (such as Informix’s
DataBlades or Oracle’s Cartridges), each designed to handle video, audio, text, or other
types of media. So, in addition to handling the numerical data generally used in relational
databases, OR databases can handle multimedia data types.
Performance. OO databases can store data sets in their entirety and thus typically run
faster than relational databases, which must break data sets into parts for storage within
tables and then reassemble them in response to queries. In addition, said Sun’s Cattell,
OO databases can automatically cache data in the client application’s memory, thereby
eliminating extra calls to the DBMS’s back end and speeding up responses. And OO
databases use optimizers that determine the best way to use a database’s indices and
physical layout to satisfy a query. However, relational databases have reduced OO
databases’ performance advantage with improved optimizers. The optimizers improve
ways of finding information within relational databases’ tables and indices.
Standardization. Relational databases use the long-established SQL (Structured Query
Language) standard, which has been adopted by the International Organization for
Standardization (ISO) and the American National Standards Institute (ANSI). SQL, used
for querying and updating a relational database, serves as a user interface and application
program interface to an RDBMS.
9. Introduction to Parallel and Distributed Database Systems
A database system can have various architectures, such as:
- Client – Server Database System: a system with tasks shared between server and client.
- Parallel Database System: speeds up processing within a system by the use of parallel
query processing.
- Distributed Database System: data are distributed across sites, keeping them closer to
where they are generated and needed often.
Ñ Client – Server System
In a client-server system the database functionalities are broadly divided into:
- Server, and
- Client
Ñ Parallel Database Systems
Parallel systems improve processing and I/O speeds. Parallel database systems use parallel
processing to improve query performance.
Important issues in parallel database systems (parallel systems) are:
- Speed Up – response time
- Scale Up – throughput
[Figure: parallel database architectures: Shared Memory, Shared Disk, Shared Nothing, Hierarchical]
I/O Parallelism
I/O parallelism in a parallel database system refers to reducing the time required to retrieve
relations (data) from disk by partitioning the relations across multiple disks.
Horizontal Partitioning is a method used in I/O parallelism that distributes the tuples of a
relation among the disks, so that each tuple resides on one disk. Some of the partitioning
techniques are:
- Round-Robin
- Hash
- Range
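The three partitioning techniques can be compared with a short sketch. The function names, the sample tuples, the use of "salary modulo number of disks" as the hash function and the partition vector [1300, 1600] are all assumptions made for illustration.

```python
from bisect import bisect_right

employees = [
    ("Adam", 2500), ("Charles", 1400), ("Dave", 1200),
    ("Elisabeth", 1600), ("Helen", 1500), ("John", 1200),
]


def round_robin(tuples, n_disks):
    """Send the i-th tuple to disk i mod n."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks


def hash_partition(tuples, n_disks, key):
    """Send each tuple to the disk chosen by a hash of its partitioning attribute."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[key(t) % n_disks].append(t)
    return disks


def range_partition(tuples, vector, key):
    """Send each tuple to the disk whose value range contains its partitioning attribute."""
    disks = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        disks[bisect_right(vector, key(t))].append(t)
    return disks


salary = lambda t: t[1]
rr = round_robin(employees, 3)
hp = hash_partition(employees, 3, salary)
rp = range_partition(employees, [1300, 1600], salary)
print([len(d) for d in rr], [len(d) for d in hp], [len(d) for d in rp])
```

Round-robin always balances the disks, hash partitioning sends equal attribute values to the same disk, and range partitioning keeps tuples with nearby values together, which helps range queries.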
Interquery Parallelism
In interquery parallelism different queries or transactions are executed in parallel with one
another. Its primary use is to scale up a transaction processing system.
It is harder to implement in shared-nothing and shared-disk architectures, as it introduces a
cache-coherency problem that can be handled by the use of a locking mechanism:
Lock the page – read/write – flush page – release lock.
Intraquery Parallelism
Intraquery parallelism refers to executing a single query or transaction in parallel on multiple
processors and disks. Its primary use is to speed up a long-running query.
Intraoperation Parallelism: parallelizing the execution of each individual operation,
such as: sort, select, …
Interoperation Parallelism: parallelizing the different operations in query execution.
Ñ Distributed Database Systems
The database is stored on several computers (known as sites or nodes) that communicate on a
network.
Main reasons for distributed database:
¾ Sharing: share data across sites.
¾ Autonomy: each site has a degree of control over the data shared locally.
¾ Availability: if one fails other sites will remain in service.
Implementation issues
¾ Atomicity
¾ Transaction Commit Protocols
¾ Concurrency Control
Locking, deadlock handling
Complexity
¾ Software development cost
¾ Greater potential for bug
¾ Increased processing overhead
Distributed databases are classified as:
- Homogenous (Unfederated)
- Heterogeneous (Federated)
Fragmentation
¾ Horizontal Fragmentation
A relation r is fragmented into subsets of its tuples r1, r2, r3, … rn, such that
r = r1 ∪ r2 ∪ r3 ∪ … ∪ rn
¾ Vertical Fragmentation
The schema of the relation, r(R), is fragmented into subset schemas R1, R2, R3, … Rn, such that
R = R1 ∪ R2 ∪ R3 ∪ … ∪ Rn
The fragmentation is done so that r can be reconstructed by the natural join of the fragments:
r = r1 ⋈ r2 ⋈ r3 ⋈ … ⋈ rn
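Both kinds of fragmentation and their reconstruction can be sketched with relations held as Python lists of dictionaries. The sample relation, the salary predicate used for the horizontal split and the helper functions are assumptions made for illustration; the vertical fragments share the key EmpId so that the natural join can put the tuples back together.

```python
r = [
    {"EmpId": 1, "Name": "Adam", "Salary": 2500},
    {"EmpId": 2, "Name": "Charles", "Salary": 1400},
    {"EmpId": 3, "Name": "Dave", "Salary": 1200},
]


def project(relation, attributes):
    """Keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


def natural_join(left, right):
    """Combine tuples that agree on all attributes the two relations share."""
    result = []
    for lt in left:
        for rt in right:
            if all(lt[a] == rt[a] for a in lt.keys() & rt.keys()):
                result.append({**lt, **rt})
    return result


def as_set(relation):
    """Order-independent form of a relation, used only for comparison."""
    return {tuple(sorted(t.items())) for t in relation}


# Horizontal fragmentation: split the tuples, reconstruct with union.
h1 = [t for t in r if t["Salary"] >= 1500]
h2 = [t for t in r if t["Salary"] < 1500]
horizontal_ok = as_set(h1 + h2) == as_set(r)

# Vertical fragmentation: split the schema, reconstruct with natural join.
v1 = project(r, ["EmpId", "Name"])
v2 = project(r, ["EmpId", "Salary"])
vertical_ok = as_set(natural_join(v1, v2)) == as_set(r)

print(horizontal_ok, vertical_ok)
```

If the vertical fragments did not share a key attribute, the natural join would pair every tuple with every other one and the original relation could not be recovered.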