Unit 2


Data Modelling
Contents
• Data Modeling Using the E-R Model: Entity Types, Entity Sets,
Attributes, and Keys, Relationships, Relationship Types, Roles, and
Structural Constraints, Weak Entity Types, Refining the ER Design for
the COMPANY Database, ER Diagrams, Naming Conventions, and
Design Issues
• The Relational Data Model: Relational Model Concepts, Relational
Constraints and Relational Database Schemas, Update Operations and
Dealing with Constraint Violations, Basic Relational Algebra
Operations, Additional Relational Operations
Entity and entity types
Entity
An entity is a real-world thing that can be distinctly identified, such as a person,
place or concept. It is an object that is distinguishable from others; if we
cannot distinguish it from others, then it is an object but not an entity. An
entity can be of two types:
• Tangible Entity: Tangible Entities are those entities which exist in the real
world physically. Example: Person, car, etc.
• Intangible Entity: Intangible Entities are those entities which exist only
logically and have no physical existence. Example: Bank Account, etc.
• Example: If we have a table of Students (Roll_no, Student_name, Age,
Mobile_no), then each student in that table is an entity and can be uniquely
identified by their roll number, i.e., Roll_no.
Entity and entity types
Entity Type
• The entity type is a collection of entities having similar attributes.
• In the STUDENT table example above, each row is an entity, and all rows
share the same attributes: each row has its own value for Roll_no, Age,
Student_name and Mobile_no.
• So we can define the STUDENT table as an entity type, because it is a
collection of entities having the same attributes.
• Each row of the table stores the data of a different entity (a different
student).
Entity and entity types
Types of Entity types
• Strong Entity Type: Strong entity types have a key attribute. The primary key
identifies each entity uniquely. In the STUDENT example, Roll_no uniquely
identifies each row of the table, so STUDENT is a strong entity type. A strong
entity type is represented by a rectangle.
• Weak Entity Type: A weak entity type does not have a key attribute of its own,
so it cannot be identified on its own. It depends on some other (strong) entity
for its distinct identity. For example, there can be children only if the parent
exists; children have no independent existence. Similarly, there can be a room
only if the building exists; a room has no independent existence. A weak entity
type is represented by a double-outlined rectangle. The relationship between a
weak entity type and its strong entity type is called an identifying relationship
and is shown with a double-outlined diamond instead of a single-outlined diamond.
Entity and entity types
Entity Set
• An entity set is a collection of entities of the same entity type. In the
STUDENT example above, a collection of entities from the STUDENT entity type
forms an entity set. We can say that the entity type is a superset of the
entity set, since every entity is included in the entity type.
• Example 1: The two entities E1 (2, Angel, 19, 8709054568) and
E2 (4, Analisa, 21, 9847852156) form an entity set.
Attributes
• An attribute describes the facts, details or characteristics of an entity.
• In a relational database, each attribute corresponds to a column (field) of a
table, and each row holds one value for that attribute.
Database Keys
• A DBMS key is an attribute or set of attributes that helps you identify a
row (tuple) in a relation (table). Keys also let you express the relationship
between two tables. A key uniquely identifies a row in a table by a
combination of one or more of its columns.
• In the above-given example, employee ID is a primary key because it
uniquely identifies an employee record. In this table, no other
employee can have the same employee ID.
Database Keys
Why do we need keys?
• Keys help you identify any row of data in a table. In a real-world
application, a table could contain thousands of records, and the records
could be duplicated. Keys ensure that you can uniquely identify a table
record despite these challenges.
• They allow you to establish relationships between tables and to identify
how tables are related.
• They help you enforce identity and integrity in relationships.
Database Keys
Types of keys
• Super Key
• Primary Key
• Candidate Key
• Alternate Key
• Foreign Key
• Compound Key
• Composite Key
• Surrogate Key
Database Keys
Super key
• A superkey is a set of one or more attributes that uniquely identifies rows
in a table. A superkey may contain additional attributes that are not
needed for unique identification.
• In the above-given example, EmpSSN and EmpNum name are
superkeys.
Database Keys
Primary key
• A PRIMARY KEY is a column or group of columns in a table that uniquely
identifies every row in that table. Primary key values cannot be duplicated:
the same value cannot appear more than once in the table. A table cannot
have more than one primary key.
Rules for defining Primary Keys
• Two rows can't have the same primary key value.
• Every row must have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column should not be modified or updated while
any foreign key refers to that primary key.
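A minimal sketch of how a DBMS enforces the first three rules, using Python's built-in sqlite3 module with an in-memory database. The Employee table, its columns and its rows are hypothetical. NOT NULL is declared explicitly because SQLite, for historical reasons, does not imply it for non-integer primary key columns.

```python
import sqlite3

# In-memory database used only for this illustration (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmpID   TEXT NOT NULL PRIMARY KEY,
        EmpName TEXT
    )
""")
conn.execute("INSERT INTO Employee VALUES ('E1', 'Asha')")

def try_insert(emp_id, name):
    """Return True if the row was accepted, False if a key rule rejected it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?)", (emp_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("E2", "Ben"))   # new key value: accepted
print(try_insert("E1", "Chen"))  # duplicate key value: rejected
print(try_insert(None, "Dev"))   # NULL key value: rejected
```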
Database Keys
Alternate key
• An ALTERNATE KEY is a column or group of columns in a table that
uniquely identifies every row in that table. A table can have several
candidate keys that could serve as the primary key, but only one can be
chosen. All the candidate keys that are not chosen as the primary key are
called alternate keys.
• In this table, StudID, Roll No and Email are all qualified to become the
primary key. Since StudID is chosen as the primary key, Roll No and Email
become alternate keys.
Database Keys
Candidate key
• A CANDIDATE KEY is a set of attributes that uniquely identifies tuples in a
table. A candidate key is a minimal super key: no attribute can be removed
from it while still keeping uniqueness. The primary key should be selected
from the candidate keys. Every table must have at least one candidate key.
A table can have multiple candidate keys but only a single primary key.
Properties of Candidate key
• It must contain unique values.
• It may consist of multiple attributes.
• It must not contain null values.
• It should contain the minimum number of attributes needed to ensure uniqueness.
• It uniquely identifies each record in a table.
• Example: In the given table Stud ID, Roll No, and email are candidate
keys which help us to uniquely identify the student record in the
table.
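The definitions of superkey and candidate key can be checked mechanically: a set of attributes is a superkey if no two rows agree on all of them, and a candidate key is a superkey with no smaller superkey inside it. The sketch below uses hypothetical student rows with the StudID, Roll No and Email attributes of the example. Note that sample data can only show which keys hold for this particular instance; real keys come from the meaning of the data.

```python
from itertools import combinations

# Hypothetical STUDENT rows; attribute names follow the example table.
students = [
    {"StudID": 1, "RollNo": 11, "Email": "a@uni.example", "Name": "Asha"},
    {"StudID": 2, "RollNo": 12, "Email": "b@uni.example", "Name": "Ben"},
    {"StudID": 3, "RollNo": 13, "Email": "c@uni.example", "Name": "Asha"},
]

def is_superkey(rows, attrs):
    """attrs is a superkey if no two rows agree on all of those attributes."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

def candidate_keys(rows):
    """Return the minimal superkeys, smallest first."""
    attrs = list(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # A combination that contains a key already found is a superkey,
            # but not a minimal one, so it cannot be a candidate key.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

print(candidate_keys(students))  # [('StudID',), ('RollNo',), ('Email',)]
```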
Database Keys
Foreign key
• A FOREIGN KEY is a column that creates a relationship between two tables.
The purpose of foreign keys is to maintain data integrity and allow
navigation between two different instances of an entity. It acts as a
cross-reference between two tables because it references the primary key of
another table.
• In this example, we have two tables, Teacher and Department, in a school.
However, there is no way to see which teacher works in which department.
• By adding Deptcode to the Teacher table as a foreign key that references
Department, we can create a relationship between the two tables.
• Keeping every such foreign key value valid is known as Referential Integrity.
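A minimal sketch of the Teacher/Department relationship in SQLite (via Python's sqlite3 module). The TeacherID and DeptName columns and all rows are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so the sketch switches it on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE Department (Deptcode TEXT PRIMARY KEY, DeptName TEXT)")
conn.execute("""
    CREATE TABLE Teacher (
        TeacherID   INTEGER PRIMARY KEY,
        TeacherName TEXT,
        Deptcode    TEXT REFERENCES Department(Deptcode)  -- the foreign key
    )
""")
conn.execute("INSERT INTO Department VALUES ('D1', 'Science')")

def add_teacher(teacher_id, name, deptcode):
    """Try to insert a teacher; report whether the foreign key allowed it."""
    try:
        conn.execute("INSERT INTO Teacher VALUES (?, ?, ?)",
                     (teacher_id, name, deptcode))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(add_teacher(1, "Asha", "D1"))  # D1 exists in Department: accepted
print(add_teacher(2, "Ben", "D9"))   # no department D9: rejected
print(add_teacher(3, "Chen", None))  # a NULL foreign key is allowed
```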
Database Keys
Compound key
• A COMPOUND KEY has two or more attributes that together allow you to
uniquely recognize a specific record. Each column may not be unique by
itself within the table; however, when combined with the other column or
columns, the combination becomes unique. The purpose of a compound key is
to uniquely identify each record in the table.
• In this example, neither OrderNo nor ProductID can be a primary key on its
own, because neither uniquely identifies a record. However, a compound key
of OrderNo and ProductID can be used, because together they uniquely
identify each record.
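A minimal sketch of a key made of two columns in SQLite: each column repeats on its own, but the pair (OrderNo, ProductID) may not. The table name OrderLine, the Quantity column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderLine (
        OrderNo   INTEGER NOT NULL,
        ProductID INTEGER NOT NULL,
        Quantity  INTEGER,
        PRIMARY KEY (OrderNo, ProductID)   -- key made of two columns
    )
""")

def add_line(order_no, product_id, qty):
    """Return True if accepted, False if the (OrderNo, ProductID) pair exists."""
    try:
        conn.execute("INSERT INTO OrderLine VALUES (?, ?, ?)",
                     (order_no, product_id, qty))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_line(1, 100, 2))  # True
print(add_line(1, 200, 1))  # True: same OrderNo, different ProductID
print(add_line(2, 100, 5))  # True: same ProductID, different OrderNo
print(add_line(1, 100, 9))  # False: the pair (1, 100) already exists
```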
Database Keys
Composite key
• A COMPOSITE KEY is a combination of two or more columns that uniquely
identifies rows in a table. The combination of columns guarantees
uniqueness, though the individual columns do not. Hence, they are
combined to uniquely identify records in a table.
• The difference between a compound key and a composite key is that any
part of a compound key can be a foreign key, whereas the columns of a
composite key may or may not be part of a foreign key.
Database Keys
Surrogate key
• An artificial key that aims to uniquely identify each record is called a
surrogate key. It is typically generated by the system when there is no
suitable natural primary key, and it carries no meaning about the data in
the table. A surrogate key is usually an integer.
• In the example, the shift timings of different employees are shown. Here, a
surrogate key is needed to uniquely identify each record.
Surrogate keys are used when
• No attribute qualifies as a primary key.
• The natural primary key is too big or complicated.
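A minimal sketch of a surrogate key in SQLite: an `INTEGER PRIMARY KEY` column that the database fills in when no value is given. The Shift table, its columns and the rows are hypothetical; without a date column, the two identical "Asha" rows cannot be told apart by their natural attributes, which is the situation a surrogate key addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ShiftID is a surrogate key: SQLite assigns a value automatically
# to an INTEGER PRIMARY KEY column when the INSERT omits it.
conn.execute("""
    CREATE TABLE Shift (
        ShiftID  INTEGER PRIMARY KEY,
        EmpName  TEXT,
        StartsAt TEXT,
        EndsAt   TEXT
    )
""")
rows = [("Asha", "09:00", "17:00"),
        ("Asha", "09:00", "17:00"),   # same natural values as the row above
        ("Ben", "13:00", "21:00")]
conn.executemany("INSERT INTO Shift (EmpName, StartsAt, EndsAt) VALUES (?, ?, ?)", rows)

for row in conn.execute("SELECT ShiftID, EmpName, StartsAt FROM Shift ORDER BY ShiftID"):
    print(row)  # every row now has its own ShiftID
```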
Database Keys
Difference between Primary key and Foreign Key
Relationship types
One to One Relationship
• This type of relationship allows only one record on each side of the
relationship. A primary key value relates to only one record (or none) in
another table. An employee can work in at most one department, and
a department can have at most one employee.
One to Many Relationship
• A one-to-many relationship allows a single record in one table to be
related to multiple records in another table. An employee can work in
many departments (>=0), but a department can have at most one
employee.
Relationship types
Many to One Relationship
• An employee can work in at most one department (<=1), and a
department can have several employees.

Many to Many Relationship
• This is a complex relationship in which many records in a table can
link to many records in another table. An employee can work in many
departments (>=0), and a department can have several employees.
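In a relational schema, a many-to-many relationship is commonly implemented with a separate junction table that holds one row per pairing. A minimal SQLite sketch, where the WorksIn table, the column names and the rows are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee   (EmpID  INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
    -- Junction table: one row per (employee, department) pairing.
    CREATE TABLE WorksIn (
        EmpID  INTEGER REFERENCES Employee(EmpID),
        DeptID INTEGER REFERENCES Department(DeptID),
        PRIMARY KEY (EmpID, DeptID)
    );
    INSERT INTO Employee   VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO Department VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO WorksIn    VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee, many departments...
depts_of_asha = conn.execute("""
    SELECT d.DeptName FROM WorksIn w JOIN Department d ON w.DeptID = d.DeptID
    WHERE w.EmpID = 1 ORDER BY d.DeptName
""").fetchall()
# ...and one department, many employees.
staff_of_sales = conn.execute("""
    SELECT e.Name FROM WorksIn w JOIN Employee e ON w.EmpID = e.EmpID
    WHERE w.DeptID = 10 ORDER BY e.Name
""").fetchall()
print(depts_of_asha)   # [('IT',), ('Sales',)]
print(staff_of_sales)  # [('Asha',), ('Ben',)]
```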
Constraints in relationships
Relational Constraints
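• The main types are the domain constraint, tuple uniqueness constraint, key
constraint, entity integrity constraint and referential integrity constraint.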
Constraints in relationships
Domain Constraint
• Domain constraint defines the domain or set of values for an
attribute.
• It specifies that each value taken by the attribute must be an atomic
value from its domain.
• For example, the value 'A' is not allowed for the age attribute, since age
can only take integer values.
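A domain constraint can be expressed as a membership test applied to every value of the attribute. In this sketch the domain for Age is assumed (for illustration only) to be whole numbers from 0 to 150; the list of ages is hypothetical.

```python
def in_age_domain(value):
    """Assumed domain for Age: whole numbers from 0 to 150."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 150

ages = [19, 21, "A", 20]
violations = [v for v in ages if not in_age_domain(v)]
print(violations)  # ['A']
```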
Constraints in relationships
Tuple Uniqueness Constraint
• The tuple uniqueness constraint specifies that all the tuples in a relation
must be unique.
• Table 1 satisfies the tuple uniqueness constraint, since all of its tuples
are unique.
• Table 2 does not satisfy the tuple uniqueness constraint, since some of its
tuples are duplicates.
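Checking tuple uniqueness amounts to comparing the number of tuples with the number of distinct tuples. A minimal sketch, reusing the entity values from Example 1 as hypothetical table contents:

```python
def satisfies_tuple_uniqueness(relation):
    """True if no tuple appears more than once in the relation."""
    return len(relation) == len(set(relation))

table1 = [(2, "Angel", 19), (4, "Analisa", 21)]
table2 = [(2, "Angel", 19), (2, "Angel", 19)]   # the same tuple twice
print(satisfies_tuple_uniqueness(table1))  # True
print(satisfies_tuple_uniqueness(table2))  # False
```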
Constraints in relationships
Key Constraint
In any relation,
• All the values of the primary key must be unique.
• The value of the primary key must not be null.
This relation does not satisfy the key constraint, because some of its
primary key values are repeated.
Constraints in relationships
Entity Integrity Constraint
• The entity integrity constraint specifies that no attribute of the primary
key may contain a null value in any relation.
• This is because a null value in the primary key would make it impossible to
identify that tuple uniquely.
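The key constraint and the entity integrity constraint can be checked together for a given primary key. A minimal sketch; the rows (Ravi, Mina and the repeated Roll_no 4) are hypothetical, and `None` stands in for a null value.

```python
def check_primary_key(rows, key):
    """List key-constraint and entity-integrity violations for `key`.

    rows: list of dicts; key: tuple of attribute names forming the primary key.
    """
    problems = []
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in key)
        if any(v is None for v in value):
            problems.append(("null in key", value))    # entity integrity
        elif value in seen:
            problems.append(("duplicate key", value))  # key constraint
        else:
            seen.add(value)
    return problems

students = [
    {"Roll_no": 2, "Student_name": "Angel"},
    {"Roll_no": 4, "Student_name": "Analisa"},
    {"Roll_no": 4, "Student_name": "Ravi"},
    {"Roll_no": None, "Student_name": "Mina"},
]
print(check_primary_key(students, ("Roll_no",)))
# [('duplicate key', (4,)), ('null in key', (None,))]
```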
Constraints in relationships
Referential Integrity Constraint
• This constraint is enforced when a foreign key references the primary
key of a relation.
• It specifies that every value taken by the foreign key must either be
available in the relation of the primary key or be null.
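The referential integrity rule can be checked by looking for child rows whose non-null foreign key matches no primary key in the parent relation (these are the "orphan" records discussed later). A minimal sketch with hypothetical Teacher and Department rows:

```python
def orphan_rows(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key is non-null and matches no parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys]

departments = [{"Deptcode": "D1"}, {"Deptcode": "D2"}]
teachers = [
    {"Name": "Asha", "Deptcode": "D1"},
    {"Name": "Ben", "Deptcode": None},   # allowed: the foreign key may be null
    {"Name": "Chen", "Deptcode": "D7"},  # violation: there is no department D7
]
print(orphan_rows(teachers, "Deptcode", departments, "Deptcode"))
# [{'Name': 'Chen', 'Deptcode': 'D7'}]
```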
Developing Diagrams
ER diagrams
• An Entity Relationship Diagram (ERD) is a visual representation of the
different entities within a system and how they relate to each other.
• ERDs are widely used to design relational databases: the entities and
attributes in the ER schema are converted into the tables and columns of the
database schema. Since they can be used to visualize database tables and
their relationships, they are also commonly used for database troubleshooting.
• Entity relationship diagrams are used in software engineering during the
planning stages of a software project. They help to identify the different
system elements and their relationships with each other. They are often used
as the basis for data flow diagrams (DFDs).
Developing Diagrams
ER diagrams Symbols and notations
Entity
• An entity can be a person, place, event, or object that is relevant to a given
system. For example, a school system may include students, teachers,
major courses, subjects, fees, and other items. Entities are represented in
ER diagrams by a rectangle and named using singular nouns.
Weak Entity
• A weak entity is an entity that depends on the existence of another entity.
In more technical terms, it is an entity that cannot be identified by its own
attributes. It uses a foreign key combined with its own attributes to form its
primary key. An order item is a good example: an order item is meaningless
without an order, so it depends on the existence of the order.
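A minimal SQLite sketch of the order item as a weak entity: LineNo alone is only a partial key, so the owner's key (OrderID) is borrowed as a foreign key and combined with it. The table and column names are hypothetical ("Orders" avoids the reserved word ORDER), and `ON DELETE CASCADE` is one possible design choice for expressing the existence dependency, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for REFERENCES / CASCADE in SQLite
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE OrderItem (
        OrderID INTEGER NOT NULL REFERENCES Orders(OrderID) ON DELETE CASCADE,
        LineNo  INTEGER NOT NULL,        -- partial key: unique only within one order
        Product TEXT,
        PRIMARY KEY (OrderID, LineNo)    -- owner's key + partial key
    )
""")
conn.execute("INSERT INTO Orders VALUES (1), (2)")
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?)",
                 [(1, 1, "pen"), (1, 2, "ink"), (2, 1, "pen")])

# Deleting the owning order also removes the items that depend on it.
conn.execute("DELETE FROM Orders WHERE OrderID = 1")
print(conn.execute("SELECT * FROM OrderItem").fetchall())  # [(2, 1, 'pen')]
```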
Attribute
• An attribute is a property, trait, or characteristic of an entity, relationship, or
another attribute. For example, the attribute Inventory Item Name is an attribute
of the entity Inventory Item. An entity can have as many attributes as necessary.
Meanwhile, attributes can also have their own specific attributes. For example,
the attribute “customer address” can have the attributes number, street, city, and
state. These are called composite attributes.
Multivalued Attribute
• If an attribute can have more than one value it is called a multi-valued attribute.
It is important to note that this is different from an attribute having its own
attributes. For example, a teacher entity can have multiple subject values.
Derived Attribute
• An attribute whose value is derived from another attribute. Derived
attributes are found rarely in ER diagrams. For example, for a circle, the
area can be derived from the radius.
Relationship
• A relationship describes how entities interact. For example, the entity
“Carpenter” may be related to the entity “table” by the relationship
“builds” or “makes”. Relationships are represented by diamond
shapes and are labeled using verbs.
Recursive Relationship
• If the same entity participates more than once in a relationship it is
known as a recursive relationship. In the below example an employee
can be a supervisor and be supervised, so there is a recursive
relationship.
Cardinality and Ordinality
• These two further define relationships between entities by placing
the relationship in the context of numbers. In an email system, for
example, one account can have multiple contacts. The relationship, in
this case, follows a “one to many” model.
How to draw ER Diagram
1. Identify all the entities in the system. An entity should appear only
once in a particular diagram. Create rectangles for all entities and
name them properly.
2. Identify relationships between entities. Connect them using a line
and add a diamond in the middle describing the relationship.
3. Add attributes for entities. Give meaningful attribute names so
they can be understood easily.
Developing Diagrams
Example Questions
1. Construct an E-R diagram for a car-insurance company
whose customers own one or more cars each. Each car has
associated with it zero to any number of recorded accidents.

2. Construct an E-R diagram for a hospital with a set of
patients and a set of medical doctors. Associate with each
patient a log of the various tests and examinations conducted.
Relational Model Concept
• A database relation is not the same thing as a relational database. It
does not imply a relationship between tables, despite its name.
• Rather, a database relation refers to an individual table in a relational
database.
• In a relational database, the table is a relation because it stores the
relation between data in its column-row format. The columns are the
table's attributes, and the rows represent the data records. A single
row is known as a tuple.
Properties of a Relation
• Its name must be unique in the database: A database cannot contain multiple tables of the same
name.
• Each relation must have a set of columns (attributes): It must also have a set of rows to contain
the data. As with the table names, no attributes can have the same name.
• No tuple (row) can be a duplicate: In practice, a database might contain duplicate rows, but
practices should be in place to avoid this, such as the use of unique primary keys.
• A relation must contain at least one attribute (column) that identifies each tuple (row)
uniquely: This is usually the primary key. This primary key cannot be duplicated. This means that
no tuple can have the same unique, primary key. The key cannot have a NULL value, which means
that the value must be known.
• Each cell (field) must contain a single value: For example, you can't enter something like "Tom
Smith" and expect the database to understand that you have a first and last name. Rather, the
database will understand that the value of that cell is exactly what has been entered.
• All values of an attribute (column) must come from the same domain: In other words, they must
have the same data type. You can't mix strings and numbers in a single column.
Relation Keys
Note:
• These were discussed above.
• These include Super key, Candidate key, Primary key, Composite key,
Compound key, Secondary or Alternate key, Non-key attribute,
Non-prime attribute, Foreign key, Simple key and Artificial key.
Relational Constraint: Relation Integrity
• Refers to the accuracy and consistency of data within a relationship.
• A relational database concept, which states that table relationships
must always be consistent.
• Referential integrity/ Relation Integrity requires that a foreign key
must have a matching primary key or it must be null. This constraint is
specified between two tables (parent and child); it maintains the
correspondence between rows in these tables. It means the
reference from a row in one table to another table must be valid.
• So, referential integrity requires that, whenever a foreign key value is
used it must reference a valid, existing primary key in the parent
table.
Relational Constraint: Relation Integrity
Example
• For example, if we delete record number 15 in a primary table, we
need to be sure that there’s no foreign key in any related table with
the value of 15.
• We should only be able to delete a primary key if there are no
associated records. Otherwise, we would end up with an orphaned
record.
• To ensure that there are no orphan records, we need to enforce
referential integrity. An orphan record is one whose foreign key (FK)
value is not found in the corresponding entity, that is, the entity where
the primary key (PK) is located. Recall that a typical join is between a PK and an FK.
So referential integrity will prevent users from:
• Adding records to a related table if there is no associated record in
the primary table.
• Changing values in a primary table that result in orphaned records in
a related table.
• Deleting records from a primary table if there are matching related records.
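A minimal SQLite sketch of the last two protections: changing or deleting a primary key value that a related row still refers to is refused, while deleting an unreferenced row is allowed. Record number 15 follows the example above; the Company/Product table names and their rows are hypothetical, and the behaviour shown is SQLite's default (no ON DELETE/ON UPDATE action declared).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE Company (CompanyID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Product (
        ProductID INTEGER PRIMARY KEY,
        CompanyID INTEGER REFERENCES Company(CompanyID)
    )
""")
conn.execute("INSERT INTO Company VALUES (15, 'Acme'), (16, 'Globex')")
conn.execute("INSERT INTO Product VALUES (1, 15)")  # product 1 refers to company 15

def attempt(sql):
    """Run one statement and report whether referential integrity allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "blocked"

print(attempt("UPDATE Company SET CompanyID = 20 WHERE CompanyID = 15"))  # blocked
print(attempt("DELETE FROM Company WHERE CompanyID = 15"))                # blocked
print(attempt("DELETE FROM Company WHERE CompanyID = 16"))                # ok: nothing refers to 16
```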
Relational Constraint: Relation Integrity
Consequences of a lack of referential integrity
• A lack of referential integrity in a database can lead to incomplete data
being returned, usually with no indication of an error. This could result in
records being “lost” in the database, because they’re never returned in
queries or reports.
• It could also result in strange results appearing in reports (such as
products without an associated company).
• Or worse yet, it could result in customers not receiving products they paid
for.
• Worse still, it could affect life and death situations, such as a hospital
patient not receiving the correct treatment, or a disaster relief team not
receiving the correct supplies or information.
Update Operations
Anomalies in a relational model
1. Update Anomaly
• These happen when the person charged with keeping all the records
current and accurate is asked, for example, to change an employee's
title due to a promotion.
• If the data is stored redundantly in the same table, and the person
misses any of them, then there will be multiple titles associated with
the employee. The end user has no way of knowing which is the
correct title.
• If a branch changes address, such as the Round Hill branch in Figure
10.3, we need to update all rows referring to that branch. Changing
existing information incorrectly is called an update anomaly.
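A minimal sketch of how redundant storage produces an update anomaly. The rows below are hypothetical (they are not the data of Figure 10.3): the branch address is repeated on every account row, and only one copy gets updated.

```python
# Hypothetical denormalized rows: the branch address is repeated on each account.
accounts = [
    {"AccountNo": "A-401", "Branch": "Round Hill", "BranchAddress": "1 Old Rd"},
    {"AccountNo": "A-402", "Branch": "Round Hill", "BranchAddress": "1 Old Rd"},
]

# The branch moves, but only the first copy of its address is changed.
accounts[0]["BranchAddress"] = "9 New St"

addresses = {row["BranchAddress"] for row in accounts if row["Branch"] == "Round Hill"}
print(len(addresses))  # 2: one branch now has two conflicting addresses
```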
2. Insertion Anomaly
• These happen when inserting vital data into the database is not
possible because other data is not already there.
• For example, if a system is designed to require that a customer be on
file before a sale can be made to that customer, but you cannot add a
customer until they have bought something, then you have an insert
anomaly.
• An insertion anomaly also occurs when inconsistent information is inserted
into a table. When we insert a new record, such as account no. A-306 in
Figure 10.2, we need to check that the branch data is consistent with
existing rows.
3. Deletion Anomaly
• These happen when the deletion of unwanted information causes
desired information to be deleted as well.
• For example, if a single database record contains information about a
particular product along with information about a salesperson for the
company and the salesperson quits, then information about the
product is deleted along with salesperson information.
• A deletion anomaly occurs when you delete a record that may contain
attributes that shouldn’t be deleted. For instance, if we remove
information about the last account at a branch, such as account A-101
at the Downtown branch in Figure 10.4, all of the branch information
disappears.
Basic Relational Algebra Operations
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It uses
operators to perform queries. An operator can be either unary or binary.
They accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate
results are also considered relations.
• The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename
• We will discuss all these operations in the following sections.
Basic Relational Algebra Operations
The Select Operation (σ)
• It selects the tuples of a relation that satisfy a given predicate.
Notation − σ_p(r)
• Where σ stands for selection, p is the selection predicate and r is the relation. p is a propositional logic formula
which may use connectors like and, or, and not. Its terms may use relational operators like =, ≠, ≥, <, >, ≤.
• For example −
σ_{subject = "database"}(Books)
• Output − Selects tuples from Books where subject is 'database'.
σ_{subject = "database" and price = "450"}(Books)
• Output − Selects tuples from Books where subject is 'database' and price is 450.
σ_{subject = "database" and price = "450" or year > "2010"}(Books)
• Output − Selects tuples from Books where subject is 'database' and price is 450, or those books published after 2010.
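A minimal sketch of σ in Python, representing a relation as a list of rows (dicts) and the predicate p as a function. The Books rows are hypothetical, and prices and years are stored as numbers rather than the quoted strings used in the notation above. As in the third example, "and" binds more tightly than "or".

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # one tuple: attribute name -> value

def select(relation: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """σ_p(r): keep the tuples of r for which p is true."""
    return [row for row in relation if predicate(row)]

books = [
    {"title": "DB Basics", "subject": "database", "price": 450, "year": 2008},
    {"title": "SQL Deep Dive", "subject": "database", "price": 600, "year": 2015},
    {"title": "Graph Theory", "subject": "maths", "price": 450, "year": 2005},
]

# σ_{subject = "database" and price = 450 or year > 2010}(Books)
result = select(books, lambda b: (b["subject"] == "database" and b["price"] == 450)
                or b["year"] > 2010)
print([b["title"] for b in result])  # ['DB Basics', 'SQL Deep Dive']
```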
Basic Relational Algebra Operations
Select Operation
Example 1
σ topic = "Database" (Tutorials)
• Output - Selects tuples from Tutorials where topic = 'Database'.

Example 2
σ topic = "Database" and author = "guru99"( Tutorials)
• Output - Selects tuples from Tutorials where the topic is 'Database' and 'author' is guru99.

Example 3
σ sales > 50000 (Customers)
• Output - Selects tuples from Customers where sales is greater than 50000
Basic Relational Algebra Operations
Project Operation (∏)
• It keeps only the specified columns (attributes) of a relation.
Notation − ∏_{A1, A2, ..., An}(r)
• Where A1, A2, ..., An are attribute names of relation r.
• Duplicate rows are automatically eliminated, as a relation is a set.
• For example −
∏_{subject, author}(Books)
• Output − Selects and projects the columns named subject and author
from the relation Books.
Basic Relational Algebra Operations
Projection (π)
Example of Projection:
• Consider the following Customers table:
CustomerID | CustomerName | Status
1 | Google | Active
2 | Amazon | Active
3 | Apple | Inactive
4 | Alibaba | Active
• Here, the projection of CustomerName and Status,
Π_{CustomerName, Status}(Customers), will give:
CustomerName | Status
Google | Active
Amazon | Active
Apple | Inactive
Alibaba | Active
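A minimal sketch of Π in Python using the Customers table above. Projecting only Status shows why duplicates disappear: three customers share the value Active, but a relation is a set, so that tuple appears once.

```python
def project(relation, attrs):
    """Π_{attrs}(r): keep only the listed attributes, dropping duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:  # a relation is a set, so no tuple repeats
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

customers = [
    {"CustomerID": 1, "CustomerName": "Google", "Status": "Active"},
    {"CustomerID": 2, "CustomerName": "Amazon", "Status": "Active"},
    {"CustomerID": 3, "CustomerName": "Apple", "Status": "Inactive"},
    {"CustomerID": 4, "CustomerName": "Alibaba", "Status": "Active"},
]

print(len(project(customers, ("CustomerName", "Status"))))  # 4
print(project(customers, ("Status",)))  # [{'Status': 'Active'}, {'Status': 'Inactive'}]
```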
Basic Relational Algebra Operations
Union Operation (∪)
• It performs the binary union of two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s }
Notation − r ∪ s
• Where r and s are either database relations or relation result sets (temporary relations).
• For a union operation to be valid, the following conditions must hold −
• r and s must have the same number of attributes.
• Attribute domains must be compatible.
• Duplicate tuples are automatically eliminated.
• For example −
∏ author (Books) ∪ ∏ author (Articles)
• Output − Projects the names of the authors who have either written a book or an article
or both.
Basic Relational Algebra Operations
Union Operation (∪)
Example
• Consider the following tables.
Table A:
Column 1 | Column 2
1 | 1
1 | 2
Table B:
Column 1 | Column 2
1 | 1
1 | 3
• A ∪ B gives:
Column 1 | Column 2
1 | 1
1 | 2
1 | 3
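A minimal sketch of ∪ in Python, with relations as lists of tuples and the Table A and Table B values from the example. The compatibility check here only compares the number of attributes; checking that the attribute domains match is left out of this sketch.

```python
def union(r, s):
    """r ∪ s for relations given as lists of tuples of the same arity."""
    # Only the number of attributes is checked; domain compatibility is not.
    if r and s and len(r[0]) != len(s[0]):
        raise ValueError("relations are not union-compatible")
    # dict.fromkeys keeps the first occurrence of each tuple, in order.
    return list(dict.fromkeys(r + s))

table_a = [(1, 1), (1, 2)]
table_b = [(1, 1), (1, 3)]
print(union(table_a, table_b))  # [(1, 1), (1, 2), (1, 3)]
```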
Basic Relational Algebra Operations
Set Difference (−)
• The result of the set difference operation is the set of tuples that are
present in one relation but not in the second relation.
Notation − r − s
• Finds all the tuples that are present in r but not in s.
• For example −
∏ author (Books) − ∏ author (Articles)
• Output − Provides the name of authors who have written books but
not articles.
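A minimal sketch of r − s in Python. The author names stand in for hypothetical results of ∏ author (Books) and ∏ author (Articles).

```python
def difference(r, s):
    """r − s: tuples that are in r but not in s."""
    s_set = set(s)
    return [t for t in dict.fromkeys(r) if t not in s_set]

book_authors = [("Ana",), ("Raj",), ("Lee",)]   # stands in for ∏ author (Books)
article_authors = [("Raj",), ("Kim",)]          # stands in for ∏ author (Articles)
print(difference(book_authors, article_authors))  # [('Ana',), ('Lee',)]
```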
Basic Relational Algebra Operations
Cartesian Product (Χ)
• Combines the information of two different relations into one.
Notation − r × s
• Where r and s are relations, and their output is defined as −
• r × s = { q t | q ∈ r and t ∈ s }
• For example −
σ_{author = 'tutorialspoint'}(Books × Articles)
• Output − Yields a relation which shows all the books and articles
written by tutorialspoint.
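A minimal sketch of r × s in Python: every tuple of r is concatenated with every tuple of s (the "q t" in the definition), and a selection is then applied to the combined tuples. The Books and Articles tuples, laid out as (id, author), are hypothetical.

```python
from itertools import product

def cartesian_product(r, s):
    """r × s: every tuple of r concatenated with every tuple of s."""
    return [q + t for q, t in product(r, s)]

books = [("B1", "tutorialspoint"), ("B2", "other")]
articles = [("A1", "tutorialspoint")]

pairs = cartesian_product(books, articles)
print(len(pairs))  # 2 × 1 = 2 combined tuples
# A selection on the product: keep pairs whose book author (position 1) matches.
print([p for p in pairs if p[1] == "tutorialspoint"])
# [('B1', 'tutorialspoint', 'A1', 'tutorialspoint')]
```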
Basic Relational Algebra Operations
Rename Operation (ρ)
• The results of relational algebra are also relations but without any
name. The rename operation allows us to rename the output
relation. 'rename' operation is denoted with small Greek letter rho ρ.
Notation − ρ x (E)
• Where the result of expression E is saved with name of x.
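A minimal sketch of ρ in Python: the unnamed result of an expression is given a name, and (optionally) its attributes are renamed too. Renaming attributes corresponds to the commonly used extended form of rename, which goes beyond the ρ x (E) notation shown here; all names in the example are hypothetical.

```python
def rename(relation, new_name, attr_map=None):
    """ρ_x(E): name the result; optionally rename attributes via attr_map."""
    attr_map = attr_map or {}
    rows = [{attr_map.get(a, a): v for a, v in row.items()} for row in relation]
    return new_name, rows

result = [{"author": "Ana"}, {"author": "Lee"}]  # an unnamed intermediate result
name, rows = rename(result, "BookOnlyAuthors", {"author": "writer"})
print(name, rows)  # BookOnlyAuthors [{'writer': 'Ana'}, {'writer': 'Lee'}]
```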

• Additional operations are −
• Set intersection
• Natural join
Basic Relational Algebra Operations
Intersection
• An intersection is denoted by the symbol ∩.
• A ∩ B
• Defines a relation consisting of the set of all tuples that are in both A and
B. However, A and B must be union-compatible.
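A minimal sketch of A ∩ B in Python, reusing the Table A and Table B values from the union example.

```python
def intersection(r, s):
    """r ∩ s: tuples present in both (union-compatible) relations."""
    s_set = set(s)
    return [t for t in dict.fromkeys(r) if t in s_set]

table_a = [(1, 1), (1, 2)]
table_b = [(1, 1), (1, 3)]
print(intersection(table_a, table_b))  # [(1, 1)]
```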
Basic Relational Algebra Operations
A JOIN clause is used to combine rows from two or more tables, based
on a related column between them.
Different Types of SQL JOINs
• Here are the different types of the JOINs in SQL:
• (INNER) JOIN: Returns records that have matching values in both tables
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched
records from the right table
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table
• FULL (OUTER) JOIN: Returns all records when there is a match in either left or
right table
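A minimal sketch of INNER JOIN and LEFT JOIN using SQLite from Python, with hypothetical Customers and Orders tables. Only these two join types are shown because RIGHT and FULL OUTER JOIN require a recent SQLite (3.39 or later), which an older Python build may not include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Google'), (2, 'Amazon'), (3, 'Apple');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()

# LEFT JOIN: every customer; unmatched ones get NULL (None) for order columns.
left = conn.execute("""
    SELECT c.CustomerName, o.OrderID
    FROM Customers c LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
    ORDER BY c.CustomerID, o.OrderID
""").fetchall()

print(inner)  # [('Google', 10), ('Google', 11), ('Amazon', 12)]
print(left)   # Apple appears once, paired with None
```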
Basic Relational Algebra Operations
These joins will be done as a practical.
Basic Relational Algebra Operations
Summary
Operation | Purpose
Select (σ) | Selects the subset of tuples that satisfy a given selection condition.
Projection (π) | Eliminates all attributes of the input relation except those mentioned in the projection list.
Union (∪) | Includes all tuples that are in A or in B.
Set Difference (−) | A − B is a relation that includes all tuples that are in A but not in B.
Intersection (∩) | Defines a relation consisting of the set of all tuples that are in both A and B.
Cartesian Product (×) | Merges the columns of two relations by pairing every tuple of one with every tuple of the other.
Inner Join | Includes only those tuples that satisfy the matching criteria.
Theta Join (θ) | The general case of the JOIN operation, denoted by the symbol θ.
Equi Join | A theta join that uses only equality conditions.
Natural Join (⋈) | Can only be performed if there is a common attribute (column) between the relations.
Outer Join | Includes the tuples that satisfy the matching criteria, together with some or all of the tuples that do not.
Left Outer Join (⟕) | Keeps all tuples of the left relation.
Right Outer Join (⟖) | Keeps all tuples of the right relation.
Full Outer Join (⟗) | Includes all tuples from both relations, irrespective of the matching condition.
