Dbms 2
Introduction
The Entity-Relationship model (or ER model) is a way of graphically representing the logical
relationships of objects in order to create a database. Creation of an ER diagram is the first step
in designing a database. It helps the designer(s) to understand and to specify the desired
components of the database and the relationships among those components. An ER model is a
graphical representation which contains entities or "items", relationships among the entities and
attributes of the entities and relationships.
The three basic elements of the ER model are entities, attributes and relationships.
Let us take a University database as an example and see how its ER model is arrived at.
Example:
A university consists of a number of departments. Each department offers several courses. Each
course includes a number of modules. Students enroll in a particular course and study modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer teaches a group of students.
Entities
Entities are real world items or concepts that exist on their own and are represented as objects or
things of interest. An entity type is a collection of entities that share a common definition.
Identify all the nouns in our university example:
A university consists of a number of departments. Each department offers several courses. Each
course includes a number of modules. Students enroll in a particular course and study modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer teaches a group of students.
This scenario consists of students, lecturers, modules, courses and departments. Physical things
(things that exist in the world and that we can touch or see), such as students and lecturers, and
abstract things (ideas or concepts that cannot be physically touched, smelled, heard, tasted or
seen), such as modules and departments, each make an entity type. If we take Student as an entity
type, then each student in the university is an entity. Entities appear as nouns in the description
because they are objects or things.
An entity is a concrete instance, whereas an entity type is simply an idea. Student is an entity
type for physical things, while Scott, Nancy, Lindsey and Mackenzie (individual students) are
entities. Department is an entity type for abstract things, while IT, CSE, ECE and CIVIL are
entities.
Entity Diagrams
In an ER diagram, an entity type is drawn as a box (rectangle). The box is labeled with the name
of the entity type. The entities identified in our example are shown in Figure 2.1.
Weak Entity
If an entity depends on another existing entity, it is considered weak. A weak entity cannot
be identified by its own attributes alone. A weak entity is represented by a double rectangle in
an E-R diagram.
Example:
SubModule is a good example of a weak entity. A SubModule is meaningless without a
Module entity, so it depends on the existence of Module, as shown in Figure 2.2.
Attributes
Attributes represent properties, facts, aspects or details of an entity. Each entity is described
by its particular attributes.
In our University database each student will have a Student ID, Name, Course taken etc.
Similarly each lecturer will have his/her own properties such as ID, Name and Department.
An attribute has a name and an associated entity, and it describes a property of that entity.
Attributes are often nouns as well.
Attributes in ER diagram
Multivalued Attribute
A multivalued attribute is an attribute that has more than one value attached to it. For instance,
if phone number and graduating degree are attributes of an entity called Person, those
attributes could have multiple values, as a person could have multiple phone numbers or could
hold multiple degrees. We represent a multivalued attribute by a double oval in an E-R
diagram.
Single Valued Attribute: An attribute that holds a single value. In our example, the attributes of
Student such as Roll number, Age, Date of Birth and City can have only a single value.
In our example, a Student can have multiple phone numbers, and so Phone number is a
multivalued attribute.
Figure 2.4 : Multivalued Attributes
Relationships
The association between two or more entities is called a relationship. In our University database,
each Student studies several Modules and each Lecturer teaches several Students. Here the entity
types Student and Module, and Lecturer and Student, have a relationship. Verbs most often
describe relationships between entities.
Identify the verbs (relationships) in our University database example:
A university consists of a number of departments. Each department offers several courses. Each
course includes a number of modules. Students enroll in a particular course and study modules
towards the completion of that course. Each module is taught by a lecturer from the appropriate
department, and each lecturer teaches a group of students.
Each relationship has a name, a set of entities that participate in it, a degree and a cardinality
ratio. The degree is the number of entities that participate in the relationship; most
relationships have degree 2. For example, in Figure 2.3 each Lecturer teaches several Students,
so this relationship has degree 2 because two entities are related by it.
Relationships in an ER diagram
Relationships denote links between two entities.
The name of the relationship is given in a diamond box (for example, Belongs to, as
shown in Figure 5.1).
Cardinality Ratio
An entity can be involved in three types of relationships, as shown:
One to One (1:1)
Each student belongs to one University. We can illustrate this ratio by writing ones
on the lines indicating the relationship as shown in Figure 2.5.
Figure 2.5 : One-one Mapping
One to Many (1:n)
A lecturer teaches many students, and this One to Many relationship is illustrated in
Figure 2.7.
Figure 2.7 : One-Many
Many to Many (n:n)
Each student takes many modules, and each module is taken by many students as
shown in Figure 2.9.
Figure 2.9 : Many-Many
Entities
Attributes
Relationships
Cardinality ratios
Now let's see what an ER model looks like when all these elements are put together. The final
ER Model of our University database is shown in Figure 2.10. The figure shows the entities and
the relationships between them, which together depict the complete ER model of a
University. Here Department, Course, Module, Lecturer and Student are the entities.
The relationships in Figure 2.10 are defined as follows. A Department Offers many Courses
(One(1) To Many(n)). A Department Assigns many Lecturers (One(1) To Many(n)). Each Lecturer
Teaches many Students (One(1) To Many(n)). Every Student Takes several Modules (Many(n) To
Many(n)). Every Module is Included in many Courses (Many(n) To Many(n)). A Course is Enrolled
in by many Students (One(1) To Many(n)).
Summary
2.2. Normalization - First Normal Form, Second Normal Form and Third Normal Form
Normalization is the database design technique used to organize tables in a manner that reduces
redundancy and dependency of data. It is the systematic process of decomposing complex tables
(relations) into smaller and more easily manageable tables. Normalization helps keep the data in
a database accurate and consistent. Without normalization, database systems can be inaccurate,
redundant, slow and inefficient, and they might not produce the data that is expected. Listed
below are the advantages of normalization.
Advantages
Helps to avoid update anomalies. That is, it isolates data so that additions, deletions, and
modifications of a field can be made in just one table. The changes are then propagated to
the rest of the database through the defined relationships.
Theory of Normalization is still developing. For example, the discussions on 6th Normal Form
are in progress. However, in most practical applications normalization achieves its best in Third
Normal Form. The evolution of Normalization theories is illustrated below:
What is a KEY ?
A KEY is a value used to uniquely identify a row in a table. It could be a single column or a
combination of multiple columns.
Note: The columns in a table that are NOT used to uniquely identify a record or row in a table
are called non-key columns.
The primary key column in a table cannot have duplicate values. Each primary key value
must be unique.
The primary key values should not be modified once assigned.
The primary key column must have a value when a new record is inserted into the table.
Example:
The table below contains the details of students. Here StudentId is the Primary Key, which is
used to uniquely identify the details of a student in the table.
Composite Key
If two or more columns are used to uniquely identify a record, then the combination of those
columns constitutes a composite key.
In the Student table given below, we have StudentId, TestId and Mark. Here one student can take
multiple tests and one test can be taken by multiple students. In order to uniquely identify
the mark of a student in a test we therefore require both StudentId and TestId. Together they
form a composite key.
Student Table
Table 2.1
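The composite key described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name StudentTest and the sample marks are assumptions, since the contents of Table 2.1 are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: (StudentId, TestId) together form the composite key.
conn.execute(
    """
    CREATE TABLE StudentTest (
        StudentId INTEGER NOT NULL,
        TestId    TEXT    NOT NULL,
        Mark      INTEGER,
        PRIMARY KEY (StudentId, TestId)
    )
    """
)
# Illustrative rows: one student takes two tests, one test is taken by two students.
conn.executemany(
    "INSERT INTO StudentTest VALUES (?, ?, ?)",
    [(101, "T1", 78), (101, "T2", 85), (102, "T1", 64)],
)

# Repeating a StudentId or a TestId alone is allowed; repeating the pair is not.
try:
    conn.execute("INSERT INTO StudentTest VALUES (101, 'T1', 90)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM StudentTest").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither StudentId nor TestId is unique on its own in this sketch; only the pair is, which is why the second mark for (101, T1) is refused.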
Functional Dependency
In simple terms, functional dependency can be explained as follows. If knowing the value of one
attribute lets you determine the value of another attribute, then the second attribute is said to
be functionally dependent on the first. In the Student table given below, we can get the attribute
'Name' if we know the attribute 'StudentId', so Name is functionally dependent on StudentId. Here
StudentId is the determinant and Name is the dependent.
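One way to make the idea concrete is a small check over sample rows. The helper name determines and the rows below are hypothetical. Note that sample data can only refute a functional dependency; it cannot prove that the dependency holds for all possible data.

```python
def determines(rows, determinant, dependent):
    """Return True if, in these rows, each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        if key in seen and seen[key] != value:
            return False  # same determinant, two different dependents
        seen[key] = value
    return True


# Illustrative rows; the names and department numbers are assumptions.
students = [
    {"StudentId": 101, "Name": "Scott", "Dept_No": "D001"},
    {"StudentId": 102, "Name": "Nancy", "Dept_No": "D001"},
    {"StudentId": 101, "Name": "Scott", "Dept_No": "D001"},
]

name_depends_on_id = determines(students, "StudentId", "Name")
id_depends_on_dept = determines(students, "Dept_No", "StudentId")
print(name_depends_on_id, id_depends_on_dept)
```

Here StudentId determines Name, but Dept_No does not determine StudentId, because one department contains several students.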
For example, let's consider the Student table given below. Table 2.2 stores student
details (StudentId, Name, Languages Known), the student's department details (Dept_No,
Dept_Name) and lecturer details (LecturerInCharge, Designation) for students.
In this approach, we keep repeating the languages known and the department details for all the
students in the same table. This is called an unnormalized table. Instead of storing the same data
again and again, we could normalize the data and create related tables.
Let's see how we can normalize the Student table (which is not normalized), create related
tables and learn the normal forms along the way:
Table 2.2
Table 2.2 is not in 1NF since there are repeating groups (more than one value in a field). The
column "Languages Known" has (English, Hindi and Tamil) in row (tuple) 1 and (English
and Hindi) in row (tuple) 2. To satisfy 1NF we can create a separate row for each value in
Languages Known by duplicating the values in the remaining columns. Table 2.3 shows the
result.
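The row-splitting step described above can be sketched in a few lines of Python. The language values come from the text; the student IDs and names are assumptions, and the function name to_first_normal_form is made up for this illustration.

```python
# Unnormalized rows: "Languages Known" holds several values in one field.
unnormalized = [
    {"StudentId": 101, "Name": "Scott", "Languages Known": "English, Hindi, Tamil"},
    {"StudentId": 102, "Name": "Nancy", "Languages Known": "English, Hindi"},
]


def to_first_normal_form(rows, multi_valued_column):
    """Emit one row per value of the multi-valued column, copying the other columns."""
    flat = []
    for row in rows:
        for value in row[multi_valued_column].split(","):
            new_row = dict(row)
            new_row[multi_valued_column] = value.strip()
            flat.append(new_row)
    return flat


first_nf = to_first_normal_form(unnormalized, "Languages Known")
for row in first_nf:
    print(row)
```

Every field now holds a single atomic value, at the cost of repeating StudentId and Name on each row. The later normal forms deal with that repetition.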
1NF Rules
2NF Rules
A relation in 1NF will be in second normal form (2NF) if there are no partial
dependencies.
Partial dependency
It is the functional dependency on part of the primary key instead of the entire primary key.
We cannot bring our simple database into second normal form unless we partition the columns in
Table 2.3. Here, assume that StudentId and Dept_No together act as the key (composite key). As
per 2NF, all non-key attributes must be dependent on the whole key.
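As a rough sketch of removing partial dependencies, the rows can be projected onto smaller tables in which every non-key column depends on the whole key. The sample rows and the helper project are assumptions made for illustration; they are not the contents of Table 2.3.

```python
# Illustrative 1NF rows keyed by (StudentId, Dept_No).
# Name depends only on StudentId and Dept_Name only on Dept_No (partial dependencies).
rows_1nf = [
    {"StudentId": 101, "Name": "Scott", "Dept_No": "D001", "Dept_Name": "CSE"},
    {"StudentId": 102, "Name": "Nancy", "Dept_No": "D002", "Dept_Name": "ECE"},
    {"StudentId": 101, "Name": "Scott", "Dept_No": "D001", "Dept_Name": "CSE"},
]


def project(rows, columns):
    """Keep only the given columns and drop duplicate rows, preserving order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Student keeps Dept_No so that it can still be linked to Department.
student = project(rows_1nf, ["StudentId", "Name", "Dept_No"])
department = project(rows_1nf, ["Dept_No", "Dept_Name"])
print(student)
print(department)
```

After the split, each department name is stored once, and the Dept_No column kept in the student rows is what links the two tables.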
Student
Table 2.4
Department
Table 2.5
Languages
Table 2.6
Table 2.7
Figure 2.15 : Foreign Key
A foreign key refers to the primary key of another table. It helps to connect the two tables.
The foreign key ensures that a row in a table is mapped to a corresponding row in another
table.
A foreign key does not have to be unique; most often it is not unique.
Foreign Key
For example, consider Figure 2.16, where Dept_No in the Student table is a foreign key that
refers to Dept_No in the Department table. Let's try to add a student with StudentId "103" and
Dept_No "D003" to the Student table as shown below. The entry for Dept_No "D003" is not present
in the Department table, which means we have added a student to a department which does not
exist. This leads to inconsistency of data across related tables. Hence an RDBMS has the concept
of referential integrity, which does not allow a record to be added to the table that contains
the foreign key unless there is a corresponding record in the table to which it is linked.
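The scenario above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the department names and the student names are made up, and SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute(
    """
    CREATE TABLE Student (
        StudentId TEXT PRIMARY KEY,
        Name      TEXT,
        Dept_No   TEXT REFERENCES Department (Dept_No)
    )
    """
)
# Illustrative data: only D001 and D002 exist.
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("D001", "CSE"), ("D002", "ECE")],
)
conn.execute("INSERT INTO Student VALUES ('101', 'Scott', 'D001')")

# Student 103 points at department D003, which does not exist.
try:
    conn.execute("INSERT INTO Student VALUES ('103', 'Lindsey', 'D003')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(insert_rejected, student_count)
```

The insert for student 103 is refused, so the Student table never holds a row that refers to a missing department.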
Student
Table 2.8
Department
Table 2.9
Consider Table 2.9. Changing the non-key column Lecturer In Charge may change
Designation. Here Dept_No acts as the key and all other columns are non-key attributes. As per
3NF, non-key attributes should not be dependent on any other non-key attributes, but
'Designation' is dependent on 'Lecturer In Charge'. Both Lecturer In Charge and Designation are
non-key attributes, so this forms a transitive dependency. To satisfy 3NF, we will split the
table shortly.
Third Normal Form
Third normal form (3NF) is the third step in database normalization and it builds on the first
(1NF) and second (2NF) normal forms.
Third Normal Form (3NF) states that all column references in the referenced data that are not
dependent on the primary key should be removed. Another way of putting this is that
only foreign key columns should be used to reference another table, and the other columns from
the parent table should not exist in the referencing table.
Second Normal Form (2NF) deals with partial dependencies, which arise only with multi-column
primary keys. 3NF deals with the transitive functional dependencies mentioned above, which can
arise even when the key is a single column.
3NF Rules
Rule 2 - The table has no transitive functional dependencies, as explained above.
We need to divide our table to move it from second normal form (2NF) into third normal form
(3NF). In Table 2.9, Dept_No acts as the key and all other columns are non-key
attributes. As per third normal form, non-key attributes should not be dependent on any other
non-key attributes. 'Designation' is dependent on 'Lecturer In Charge', and both are non-key
attributes, so this forms a transitive dependency. To satisfy 3NF, split
the table as follows.
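A minimal sketch of this split, using Python's sqlite3 module, is given here. The starting table name Department2NF, the lecturer identifiers and the designations are all assumptions; only the column names come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical 2NF table: Designation depends on LecturerInCharge, not on Dept_No.
conn.execute(
    "CREATE TABLE Department2NF (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT, "
    "LecturerInCharge TEXT, Designation TEXT)"
)
conn.executemany(
    "INSERT INTO Department2NF VALUES (?, ?, ?, ?)",
    [
        ("D001", "CSE", "L01", "Professor"),
        ("D002", "ECE", "L02", "Lecturer"),
        ("D003", "IT", "L01", "Professor"),
    ],
)

# Move the transitively dependent column into its own Lecturer table.
conn.execute(
    "CREATE TABLE Lecturer AS "
    "SELECT DISTINCT LecturerInCharge, Designation FROM Department2NF"
)
# Department keeps LecturerInCharge as the link to Lecturer.
conn.execute(
    "CREATE TABLE Department AS "
    "SELECT Dept_No, Dept_Name, LecturerInCharge FROM Department2NF"
)

lecturers = conn.execute(
    "SELECT LecturerInCharge, Designation FROM Lecturer ORDER BY LecturerInCharge"
).fetchall()
department_columns = [col[1] for col in conn.execute("PRAGMA table_info(Department)")]
print(lecturers)
print(department_columns)
```

Each lecturer's designation is now stored once, so changing it is a single update in the Lecturer table.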
Student
Table 2.10
Department
Table 2.11
Lecturer
Table 2.12
Languages
Table 2.13
The example given above is not decomposed further, since 3NF is sufficient for it. Normally
only complex databases would need the next levels of normalization.
2.3. Joins
Note: If the join condition is omitted, the query returns the Cartesian product of the two tables
(a Cartesian product is all possible combinations of rows from all the tables). Each row of the
first table is joined with every row of the second table. For example, if the first table has 30
rows and the second table has 10 rows, the result will have 30 * 10, or 300 rows. On large tables
such a query can take a long time to execute.
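The 30 * 10 arithmetic can be checked with a short sqlite3 sketch. The tables A and B and their single integer columns are hypothetical stand-ins for any two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a_id INTEGER)")
conn.execute("CREATE TABLE B (b_id INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(30)])
conn.executemany("INSERT INTO B VALUES (?)", [(i,) for i in range(10)])

# No join condition: every row of A is paired with every row of B.
cartesian_rows = conn.execute("SELECT COUNT(*) FROM A, B").fetchone()[0]

# With a join condition only the matching pairs remain.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM A JOIN B ON A.a_id = B.b_id"
).fetchone()[0]

print(cartesian_rows, joined_rows)
```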
Let's use the two tables below to explain the join conditions.
Table "Student"
Table 2.14
Table "Department"
Table 2.15
In the above example the column that is common to both tables is Dept_No. Using
Dept_No, the Student and Department tables can be joined to combine data from both tables
as shown below.
Let's consider a scenario where we retrieve the details of students who belong to the 'CSE'
department. We have to join the two tables based on the common column present in both.
Figure 2.18 : Mapping data
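The CSE scenario can be sketched with sqlite3 as follows. The rows are assumptions, because the contents of Tables 2.14 and 2.15 are not reproduced here; only the table names and the Dept_No column come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Dept_No TEXT PRIMARY KEY, Dept_Name TEXT)")
conn.execute(
    "CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT, Dept_No TEXT)"
)
# Illustrative rows only.
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [("D001", "CSE"), ("D002", "ECE")],
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Scott", "D001"), (102, "Nancy", "D002"), (103, "Lindsey", "D001")],
)

# Join on the common column, then keep only the CSE department.
cse_students = conn.execute(
    """
    SELECT s.StudentId, s.Name, d.Dept_Name
    FROM Student AS s
    INNER JOIN Department AS d
    ON s.Dept_No = d.Dept_No
    WHERE d.Dept_Name = 'CSE'
    ORDER BY s.StudentId
    """
).fetchall()
print(cse_students)
```

The department name lives only in the Department table, so the join on Dept_No is what lets the query filter students by it.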
2.4. Summary
Normalization is the database design technique used to organize tables in a manner that reduces
redundancy and dependency of data.
The three normal forms covered here are First Normal Form (1NF), Second
Normal Form (2NF) and Third Normal Form (3NF).
A key is a value used to uniquely identify a row in a table. One or more columns could be
used to form a key for a table.
A primary key is a column, or a combination of columns, whose value identifies a database
record uniquely.
A composite key is a primary key derived by combining multiple columns and is used to
identify a record uniquely.
The field in a table which matches the primary key column of another table is called a
foreign key. Foreign keys are used to cross-reference tables.
First Normal Form - Multi-valued attributes (called repeating groups) should be
removed, i.e. repeating groups are eliminated. All attributes must be atomic.
Second Normal Form - Partial functional dependencies must be removed. The attributes
that are not part of the key should be dependent on the entire key for that entity.
Third Normal Form - States that all column references in referenced data that are not
dependent on the primary key (transitive dependencies) should be removed.
A join is a means of combining fields from two tables by using values common to both. It
allows data from more than one table to be combined into a single result set.
EXTRA:
Database Design
Goal of design is to generate a formal specification of the database schema
Methodology:
2. Then convert E-R diagram to SQL DDL, or whatever database model you are
using
The E-R Model is not SQL based and is not limited to any particular DBMS. It is a conceptual and
semantic model that captures meaning rather than an actual implementation
Entity - rectangle
Attribute - oval
Relationship - diamond
Link - line
Entity Type: set of similar objects or a category of entities; they are well defined
Attribute: describes one aspect of an entity type; usually [and best when] single valued and
indivisible (atomic)
May be composite (the attribute has further structure); also use an oval for a
composite attribute, with ovals for its components connected to it by lines
Entity Types
An entity type is named and is described by set of attributes
Note that the value for an attribute can be a set or list of values, sometimes
called "multi-valued" attributes
This is in contrast to the pure relational model which requires atomic values
Entity Schema:
The meta-information of entity type name, attributes (and associated domain), key constraints
Entity Types tend to correspond to nouns; attributes are also nouns albeit descriptions of the
parts of entities
May have null values for some entity attribute instances (no mapping to the domain for those
instances)
Keys
Superkey: an attribute or set of attributes that uniquely identifies an entity--there can be many of
these
Candidate key: a superkey such that no proper subset of its attributes is also a superkey
(minimal superkey has no unnecessary attributes)
Primary key: the candidate key chosen to be used for identifying entities and accessing records.
Unless otherwise noted "key" means "primary key"
Secondary key: attribute or set of attributes commonly used for accessing records, but not
necessarily unique
Foreign key: term used in relational databases (but not in the E-R model) for an attribute that
is the primary key of another table and is used to establish a relationship with that table where it
appears as an attribute also.
So a foreign key value occurs in the table and again in the other table. This conflicts with the
idea that a value is stored only once; the idea that a fact is stored once is not undermined.
Graphical Representation in E-R diagram
Rectangle -- Entity
Ellipses -- Attribute (underlined attributes are [part of] the primary key)
Dashed ellipses-- derived attribute, e.g. age is derivable from birthdate and current date.
[Drawing notes: keep all attributes above the entity. Lines have no arrows. Use straight lines
only]
Relationships
Relationship: connects two or more entities into an association/relationship
Relationships tend to be verbs or verb phrases; attributes of relationships are again nouns
[Drawing tips: relationship diamonds should connect off the left and right points; Dia can label
those points with cardinality; use Manhattan connecting line (horizontal/vertical zigzag)]
2000 describes the relationship - it's the value of the since attribute of
MajorsIn relationship type
The role of a relationship type names one of the related entities. The name of the entity is usually
the role name.
e.g., "John" is value of Student role, "CS" value of Department role of MajorsIn
relationship type
We do not have distinct names for the roles. It is not clear who reports to whom.
Solution: the role name of relationship type need not be same as name of entity type from which
participants are drawn
Values of Subordinate and Supervisor both drawn from entity type Employee
Roles are edges labeled with role names (omitted if role name = name of entity set). Most
attributes have been omitted.
Relationship Type
Relationship types are described by the set of roles (entities) and [optional] attributes
Think that entities are nouns; relationship types are often verbs
students and departments are the entities (nouns) and roles in relationship
types
Here we have equated the role name (Student) with the name of the entity type (Student) of the
participant in the relationship.
Degree of relationship
The number of roles in the relationship
Note: ternary relationships may sometimes be replaced by two binary relationships (see book
Figures 3.5 and 3.13). Semantic equivalence between a ternary relationship and two binary ones
is not guaranteed.
Cardinality of Relationships
Cardinality is the number of entity instances to which another entity set can map under the
relationship. This does not reflect a requirement that an entity has to participate in a relationship.
Participation is another concept.
One-to-one: X-Y is 1:1 when each entity in X is associated with at most one entity in Y, and
each entity in Y is associated with at most one entity in X.
One-to-many: X-Y is 1:M when each entity in X can be associated with many entities in Y, but
each entity in Y is associated with at most one entity in X.
Many-to-many: X:Y is M:M if each entity in X can be associated with many entities in Y, and
each entity in Y is associated with many entities in X ("many" =>one or more and sometimes
zero)
Relationship Participation Constraints
Total participation
Represented by double line from entity rectangle to relationship diamond
E.g., A Class entity cannot exist unless related to a Faculty member entity in
this example, not necessarily at Juniata.
Key constraint
Partial participation
Not every entity instance must participate
E.g., A Textbook entity can exist without being related to a Class or vice
versa.
Weak entity may have a partial key, called a discriminator, that distinguishes instances of the
weak entity that are related to the same strong entity
Use double rectangle for weak entity, with double diamond for relationship connecting it to its
associated strong entity
Note: not all existence-dependent entities are weak; the lack of a key is essential to the definition
Role names, Ri, and their corresponding entity sets. Roles must be single valued (the number of
roles is called its degree)
Attribute names, Aj, and their corresponding domains. Attributes in the E-R model may be set or
multi-valued.
Key: Minimum set of roles and attributes that uniquely identify a relationship
The most common type of join is the SQL INNER JOIN (simple join). An SQL INNER JOIN
returns all rows from multiple tables where the join condition is met.
OrderID  CustomerID  OrderDate
10308    2           1996-09-18
10309    37          1996-09-19
10310    77          1996-09-20
CustomerID  CustomerName  ContactName  Country
Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the
"Customers" table. The relationship between the two tables above is the "CustomerID" column.
Then, if we run the following SQL statement (that contains an INNER JOIN):
Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
INNER JOIN: Returns all rows when there is at least one match in BOTH tables
LEFT JOIN: Returns all rows from the left table, and the matched rows from the right table
RIGHT JOIN: Returns all rows from the right table, and the matched rows from the left table
FULL JOIN: Returns all rows from both tables, whether or not a match exists in the other table
or:
SELECT column_name(s)
FROM table1
JOIN table2
ON table1.column_name=table2.column_name;
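Demo Database
In this tutorial we will use the well-known Northwind sample database.
Customers table:
CustomerID  CustomerName         ContactName   Address        City    PostalCode  Country
1           Alfreds Futterkiste  Maria Anders  Obere Str. 57  Berlin  12209       Germany
Orders table:
OrderID  CustomerID  EmployeeID  OrderDate   ShipperID
10308    2           7           1996-09-18  3
10309    37          3           1996-09-19  1
10310    77          8           1996-09-20  2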
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
Note: The INNER JOIN keyword selects all rows from both tables as long as there is a match
between the columns. If there are rows in the "Customers" table that do not have matches in
"Orders", these customers will NOT be listed.
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
Note: The LEFT JOIN keyword returns all the rows from the left table (Customers), even if there
are no matches in the right table (Orders).
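The difference between the INNER JOIN and LEFT JOIN queries above can be seen side by side in a small sqlite3 sketch. The order rows and the customer "Alfreds Futterkiste" come from the demo data; the second customer name is a placeholder, and the tables are cut down to the columns the queries use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Alfreds Futterkiste"), (2, "Hypothetical Customer 2")],  # second name is made up
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10308, 2), (10309, 37), (10310, 77)],
)

# Same query text, only the join keyword changes.
query = """
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    {join} Orders
    ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName
"""
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

The customer without orders is missing from the INNER JOIN result but appears in the LEFT JOIN result with an empty (NULL, shown as None) OrderID.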
The FULL OUTER JOIN keyword combines the result of both LEFT and RIGHT joins.
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name=table2.column_name;
CustomerName          OrderID
Alfreds Futterkiste
                      10382
                      10351
Note: The FULL OUTER JOIN keyword returns all the rows from the left table (Customers),
and all the rows from the right table (Orders). If there are rows in "Customers" that do not have
matches in "Orders", or if there are rows in "Orders" that do not have matches in "Customers",
those rows will be listed as well.
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name=table2.column_name;
or:
SELECT column_name(s)
FROM table1
RIGHT OUTER JOIN table2
ON table1.column_name=table2.column_name;
Example
SELECT Orders.OrderID, Employees.FirstName
FROM Orders
RIGHT JOIN Employees
ON Orders.EmployeeID=Employees.EmployeeID
ORDER BY Orders.OrderID;
Note: The RIGHT JOIN keyword returns all the rows from the right table (Employees), even if
there are no matches in the left table (Orders).
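A RIGHT JOIN returns the same rows as a LEFT JOIN with the two tables swapped, which is handy when the database at hand does not accept RIGHT JOIN (some SQLite versions do not). The sketch below uses that equivalent form. The order rows come from the demo data, while the employee names and the extra employee 9 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER)")
# Illustrative employees; employee 9 has no orders.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(3, "Janet"), (7, "Robert"), (8, "Laura"), (9, "Anne")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10308, 7), (10309, 3), (10310, 8)],
)

# Equivalent to: Orders RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
rows = conn.execute(
    """
    SELECT Orders.OrderID, Employees.FirstName
    FROM Employees
    LEFT JOIN Orders
    ON Orders.EmployeeID = Employees.EmployeeID
    """
).fetchall()

for order_id, first_name in rows:
    print(order_id, first_name)
```

Every employee is listed, and the employee with no orders is paired with an empty (NULL) OrderID.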
Notice that each SELECT statement within the UNION must have the same number of columns.
The columns must also have similar data types. Also, the columns in each SELECT statement
must be in the same order.
Note: The UNION operator selects only distinct values by default. To allow duplicate values, use
the ALL keyword with UNION.
PS: The column names in the result-set of a UNION are usually equal to the column names in
the first SELECT statement in the UNION.
Demo Database
In this tutorial we will use the well-known Northwind sample database.
Suppliers table:
SupplierID  SupplierName   ContactName       Address         City     PostalCode  Country
1           Exotic Liquid  Charlotte Cooper  49 Gilbert St.  Londona  EC1 4SD     UK
Example
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;
Note: UNION cannot be used to list ALL cities from the two tables. If several customers and
suppliers share the same city, each city will only be listed once. UNION selects only distinct
values. Use UNION ALL to also select duplicate values!
SQL UNION ALL Example
The following SQL statement uses UNION ALL to select all (duplicate values also) cities from
the "Customers" and "Suppliers" tables:
Example
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;
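The effect of UNION versus UNION ALL can be checked with a short sqlite3 sketch. The tables are reduced to a single City column, and the rows are assumptions chosen so that one city appears in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT)")
conn.execute("CREATE TABLE Suppliers (City TEXT)")
# Illustrative rows: Berlin appears twice, Londona appears in both tables.
conn.executemany("INSERT INTO Customers VALUES (?)", [("Berlin",), ("Londona",), ("Berlin",)])
conn.executemany("INSERT INTO Suppliers VALUES (?)", [("Londona",)])

union_cities = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City"
    )
]
union_all_cities = [
    row[0]
    for row in conn.execute(
        "SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers ORDER BY City"
    )
]

print(union_cities)
print(union_all_cities)
```

UNION lists each city once, while UNION ALL keeps every row from both SELECT statements, duplicates included.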
Example
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;