L1-5 Merged

Database Concepts

Presented by
Dr. Amira M. Gaber
Lecture 1, 2
Data and Database
Data are raw facts and the heart of a database; they are processed to produce useful information. Good, timely, relevant information is the key to decision making, and good decision making is the key to organizational survival.
A database is a repository of data designed to support efficient data storage, retrieval, and maintenance. Multiple types of databases exist to suit various industry requirements. A database may be specialized to store binary files, documents, images, videos, relational data, multidimensional data, transactional data, analytic data, or geographic data, to name a few.
Types of Database

Data can be stored in various forms:

• Relational database: data is stored in tabular form.
• Hierarchical database: data is organized in a tree structure.
• Network database: data is stored as graphs representing relationships between objects.
In this book, we focus on relational databases.
Applications of Database
• Multimedia databases can store pictures, video clips, and sound messages.
• Geographic information systems (GISs) can store and analyze maps, weather data, and satellite images.
• Data warehouses and online analytical processing (OLAP) systems are used in many companies to extract and analyze useful business information from very large databases to support decision making.
• Real-time and active database technology is used to control industrial and manufacturing processes.
• Database search techniques are being applied to the World Wide Web to improve the search for information that is needed by users browsing the Internet.
Database Management System (DBMS)

A DBMS is a computerized system that enables users to create and maintain a database. It is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.
Importance of Database Management System (DBMS)
A DBMS lets multiple users insert, update, and delete data in the same data file without making a "mess" of it. This means that a DBMS should:
• ensure that different users do not cause the data to become inconsistent;
• ensure that no data is accidentally lost through these operations;
• provide a standard interface for data access, and tools for data backup, restore, and recovery;
• handle huge volumes of data and large numbers of users.
Objectives of DBMS

• The main objectives of a database management system are data availability, data integrity, data security, and data independence.
• Data availability: the data are available to a wide variety of users in a meaningful format that is easy to access.
• Data integrity: the data in the database are correct and reliable.
• Data security: the data in the database can be accessed only by authorized users, for example by using passwords.
• Data independence: user applications are immune to changes made in the definition and organization of the data.
File-based approach

Data redundancy
• Often, within an organization, files and applications are created by different programmers from various departments over long periods of time. This practice can lead to data redundancy, which causes several problems:
• Inconsistency in data format
• The same information being kept in several different places (files)
• Data inconsistency, a situation where various copies of the same data conflict
• Wasted storage space and duplicated effort
Data isolation
• In transaction processing, isolation is the property that determines when and how changes made by one operation become visible to other concurrent users and systems. In a file-based system, the related problem is that data are scattered across separate files, often in different formats. This leads to:
• Difficulty for new applications in retrieving the appropriate data, which might be stored in various files.
Integrity problems
• Data integrity is the maintenance and assurance that the data in a database are correct and consistent. Factors to consider when addressing this issue are:
• Data values must satisfy certain consistency constraints that are specified in the application programs.
• It is difficult to make changes to the application programs in order to enforce new constraints.
Security problems
• Security can be a problem with a file-based approach because:
• There are constraints regarding access privileges.
• Application requirements are added to the system in an ad hoc manner, so it is difficult to enforce constraints.
Concurrent access
• Concurrency is the ability of the database to allow multiple users access to the same record without adversely affecting transaction processing. In a file-based system, the application programs themselves must manage, or prevent, concurrency. Typically, when an application opens a file, that file is locked, so no one else has access to the file at the same time.
[Figure: DBMS, DB, and raw data]
Advantages of DBMS
1. Improved data sharing
2. Improved data security
3. Better data integration
4. Minimized data inconsistency
5. Improved data access
6. Improved decision making
Data Dictionary
• Data dictionary
A data dictionary, also known as a "system catalog," is a centralized store of information about the database. It contains information about the tables, the fields the tables contain, data types, primary keys, indexes, the joins that have been established between those tables, referential integrity, cascade update, cascade delete, etc. The information stored in the data dictionary is called "metadata."
• Metadata
Information (data) about the data in a database is called metadata.
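To make the idea of metadata concrete, the following is a minimal sketch that reads a system catalog using Python's built-in `sqlite3` module. SQLite is only one possible DBMS, and the `Employee` table and its columns are hypothetical, chosen for illustration.

```python
import sqlite3

# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    " EID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Birthdate TEXT)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(name, col_type, bool(pk))
           for _cid, name, col_type, _notnull, _default, pk
           in conn.execute("PRAGMA table_info(Employee)")]

print(tables)   # table names recorded in the catalog
print(columns)  # (column name, declared type, is part of primary key)
```

Other systems expose the same kind of information through their own catalog tables or views; the queries differ, but the principle of "data about data" is the same.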
Types of DBMS architecture
[Figures: client-server architecture; client-server web-development architecture]


Database Administrator (DBA)
• A database administrator (DBA) is responsible for the maintenance,
performance, integrity and security of a database. Additional role
requirements are likely to include planning, development and
troubleshooting.
Database Administrator (DBA) Roles
• Establishing the needs of users and monitoring user access and security;
• Monitoring performance and managing parameters to provide fast query responses
to front-end users;
• Mapping out the conceptual design for a planned database in outline;
• Taking into account both the back-end organization of data and front-end accessibility for end users;
• Refining the logical design so that it can be translated into a specific data model;
• Further refining the physical design to meet system storage requirements;
• Installing and testing new versions of the database management system
(DBMS);
• Maintaining data standards, including adherence to the Data Protection Act;
• Writing database documentation, including data standards, procedures and
definitions for the data dictionary (metadata);
• Controlling access permissions and privileges;
• Developing, managing and testing backup and recovery plans;
• Ensuring that storage, archiving, backup and recovery procedures are functioning
correctly;
• Capacity planning;
• Working closely with IT project managers, database programmers and Web
developers;
• Communicating regularly with technical, applications and operational staff to ensure
database integrity and security;
• Commissioning and installing new applications.
Lecture 3
Database Design Phases
1. Conceptual Design
• Conceptual modeling is an important phase in designing a successful database application. It sketches out the entities to be represented and determines what kinds of relationships exist between them. It deals with the scope of the database to be created and defines the general rules that need to be considered.

2. Logical Design
• The logical phase of database design is also called the data model mapping phase. Its result is a set of relation schemas, derived from the ER diagram or the class diagram.
• Creating the relation schemas is a mainly mechanical operation: there are rules for transferring the ER model or class diagram to relation schemas.

3. Normalization
• Normalization is, in fact, the last piece of the logical design puzzle. Its main purpose is to remove redundancy and the anomalies that redundancy can cause during updates.
• Normalization changes the relation schemas to reduce redundancy. Each normalization step typically splits a relation, so new tables are added to the database.

4. Physical Design
• The last phase of database design is the physical design phase, in which the database design is implemented. Here, a specific DBMS (Database Management System) must be chosen.
Entity Relationship Diagram (ERD)
• ERD stands for entity relationship diagram. These diagrams are also called ER diagrams or entity relationship models. An ERD visualizes the relationships between entities such as people, things, or concepts in a database, and often the attributes of these entities as well.
• By defining the entities and their attributes and showing the relationships between them, an ER diagram can illustrate the logical structure of a database.
The Importance of the ERD

• Document an existing database structure
• Debug, troubleshoot, and analyze
• Design a new database
• Gather design requirements
• Business process re-engineering (BPR)
Common ERD Symbols
• An ER diagram has three main components: entities, relationships, and attributes, connected by lines.
• Entities are represented by rectangles. An entity is an object or concept about which you want to store information. An entity should have more than one attribute and more than one instance.
• Entity set: a set of entities of the same type that share the same properties. A noun is used to name an entity set.
[Figure: entity symbol (rectangle)]

• A weak entity is an entity that must be defined by a foreign key relationship with another entity, as it cannot be uniquely identified by its own attributes alone.
[Figure: weak entity symbol]

• Attributes are represented by ovals. A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute.
[Figure: attribute symbol (oval)]

• Attributes come in several types:
• A multivalued attribute can have more than one value. For example, an employee entity can have multiple skill values.
[Figure: multivalued attribute symbol]
• A derived attribute is based on another attribute. For example, an employee's monthly salary is based on the employee's annual salary.
[Figure: derived attribute symbol]
• A composite attribute is an attribute whose values can be further subdivided into meaningful sub-parts.
[Figure: composite attribute Name with sub-parts First Name and Last Name]
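The three attribute types above can be illustrated with a small Python sketch. The class and field names (`Employee`, `Name`, `skills`, `annual_salary`) are hypothetical and simply mirror the examples in the text; this is an analogy in program objects, not a database definition.

```python
from dataclasses import dataclass, field


@dataclass
class Name:
    # Composite attribute: subdivided into meaningful sub-parts.
    first: str
    last: str


@dataclass
class Employee:
    name: Name
    annual_salary: float
    # Multivalued attribute: an employee may have several skill values.
    skills: list[str] = field(default_factory=list)

    @property
    def monthly_salary(self) -> float:
        # Derived attribute: computed from annual_salary rather than stored.
        return self.annual_salary / 12


emp = Employee(Name("Ada", "Lovelace"), 60000.0, ["SQL", "Python"])
print(emp.monthly_salary)  # 5000.0
```

Keeping the derived value as a computed property means it cannot drift out of step with the attribute it is based on.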
• Mandatory attributes must have a value. For example, in most businesses that track personal information, Name is required.
• Optional attributes may have a value or be left null.
• Unique identifier: this type of attribute distinguishes one entity from another. For example, in a classroom, you can distinguish between one student and another using a student ID. This is known as a key.
• A key is a field or a set of fields that has a unique value for each record in the relation. You need a key to ensure that there are no redundant records within a relation.
There are several types of keys:
• Candidate key – A candidate key is an attribute or set of attributes that uniquely identifies a record in a relation.
• Primary key – A primary key is one of the candidate keys from a relation.
Every relation must have a primary key. A primary key shall be at least:
Common ERD Symbols
• Simple keys – these keys have a single attribute.
• Composite keys – these keys have multiple attributes.
• Foreign keys – these keys usually exist when there are two or more relations. An attribute from one relation has to exist in the other relation(s). Relationship sets exist between the two attributes.

• Relationships are represented by diamond shapes and show how entities share information in the database. The number of entities that participate in a relationship is called its degree. There are three types of relationships:
• Recursive relationship: an entity is linked to itself.
• Binary relationship: two entities are connected directly (degree two).
• Ternary relationship: three entities are connected directly (degree three).

• Connecting lines are solid lines that connect attributes to entities and show the relationships between entities in the diagram.
• The name of a relationship should be a verb.

• Cardinality specifies how many instances of one entity can be associated with instances of another entity in a relationship. It can be one-to-one, one-to-many, many-to-one, or many-to-many.
Constraints
• Every business has restrictions on which attribute values and which relationships are allowed. In the conceptual data model, constraints are used to handle these restrictions. A constraint is a requirement that entity sets must satisfy in a relationship. Constraints may refer to a single attribute of an entity set or to a relationship between entities.

• Domain integrity: A domain restricts the values of attributes in the relation and is a constraint of the relational model. However, there are real-world semantics for data that cannot be specified with domain constraints alone. We need more specific ways to state which data values are or are not allowed and which format is suitable for an attribute. For example, the Employee ID (EID) must be unique, or the employee Birthdate must be in the range [Jan 1, 1950, Jan 1, 2000].
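As a rough sketch of how the two example rules could be declared, the code below uses Python's built-in `sqlite3` module. The table layout, the ISO date format, and the helper `try_insert` are assumptions made for illustration; other DBMSs offer similar `CHECK` and key constraints with their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    " EID INTEGER PRIMARY KEY,"          # EID must be unique
    " Birthdate TEXT NOT NULL"           # ISO dates compare correctly as text
    " CHECK (Birthdate BETWEEN '1950-01-01' AND '2000-01-01'))"
)


def try_insert(eid, birthdate):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?)", (eid, birthdate))
        return True
    except sqlite3.IntegrityError:
        return False


ok_row = try_insert(1, "1985-06-15")        # inside the allowed range
out_of_range = try_insert(2, "2005-03-01")  # violates the CHECK constraint
duplicate_id = try_insert(1, "1990-01-01")  # violates uniqueness of EID
print(ok_row, out_of_range, duplicate_id)
```

Declaring the rules in the schema means every application that writes to the table is held to them, rather than each program re-implementing the checks.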
• Entity integrity: To ensure entity integrity, every table must have a primary key (PK). Neither the PK nor any part of it can contain null values, because a null primary key value would mean we cannot identify some rows. For example, in the EMPLOYEE table, Phone cannot be a primary key since some people may not have a telephone.
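A minimal sketch of entity integrity, again assuming SQLite via Python's `sqlite3` module; the column names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written explicitly: unlike the SQL standard, SQLite accepts NULL
# in a non-INTEGER PRIMARY KEY column unless NOT NULL is declared.
conn.execute(
    "CREATE TABLE Employee ("
    " EID TEXT NOT NULL PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Phone TEXT)"  # optional attribute, so it cannot serve as the key
)

# A missing phone number is fine because Phone is not part of the key.
conn.execute("INSERT INTO Employee VALUES ('E1', 'Sara', NULL)")

try:
    conn.execute("INSERT INTO Employee VALUES (NULL, 'Omar', '555-0100')")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

print(null_key_rejected)
```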

• Referential integrity: A foreign key must either have a matching primary key value or be null. This constraint is specified between two tables (parent and child) and maintains the correspondence between rows in these tables: a reference from a row in one table to another table must be valid.
• Example of a referential integrity constraint in the Customer/Order database of the Company, where CustID in Order is a foreign key that refers to CustID in Customer:
• Customer(CustID, CustName)
• Order(OrderID, CustID, OrderDate)
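The Customer/Order example can be tried out with the sketch below, which assumes SQLite through Python's `sqlite3` module. The column types, the sample rows, and the helper `order_accepted` are illustrative assumptions; only the table and column names come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Customer (
    CustID   INTEGER PRIMARY KEY,
    CustName TEXT NOT NULL
);
CREATE TABLE "Order" (              -- quoted because ORDER is an SQL keyword
    OrderID   INTEGER PRIMARY KEY,
    CustID    INTEGER REFERENCES Customer(CustID),
    OrderDate TEXT
);
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Acme')")


def order_accepted(order_id, cust_id):
    """Return True if the order row satisfies referential integrity."""
    try:
        conn.execute('INSERT INTO "Order" VALUES (?, ?, ?)',
                     (order_id, cust_id, "2024-01-15"))
        return True
    except sqlite3.IntegrityError:
        return False


valid_parent = order_accepted(10, 1)      # matching Customer row exists
null_parent = order_accepted(11, None)    # a null foreign key is allowed
missing_parent = order_accepted(12, 99)   # no Customer 99, so it is rejected
print(valid_parent, null_parent, missing_parent)
```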

ERD Models
• Conceptual ERD or data model: This model has the most abstraction and the least detail, so it is appropriate for large projects that need a higher-level view, such as the view used by business analysts. A typical conceptual ERD contains entities and relationships but offers no details on specific database columns or cardinalities. It is a general, high-level view of the database design.
• Logical ERD or data model: This model adds more detail to the conceptual
model by defining additional entities that are operational and transactional.
• Physical ERD or data model: This model serves as the actual design or
blueprint of the database with lots of technical details including defining
cardinality and showing primary and foreign keys of entities instead of just
their abstract semantic names. For this type of ERD, attributes will often be
listed to represent the columns of the real database table.
Lecture 4: Logical Model (Mapping)
Logical Model Design

The logical phase of database design is also called the data model mapping phase. Its result is a set of relation schemas, derived from the ER diagram or the class diagram. Creating the relation schemas is a mainly mechanical operation: there are rules for transferring the ER model or class diagram to relation schemas.
Mapping Rules

Mapping Regular Entities
• Each entity becomes a relation schema (a table).
• Each simple attribute becomes a column in the relation schema.
Mapping Composite Attributes
• Each component of a composite attribute becomes a separate column.
Mapping Multivalued Attributes
• Each multivalued attribute becomes a separate relation with a foreign key to the original entity, and is treated as a 1:M relationship between the original entity and the new relation.
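As a sketch of the multivalued-attribute rule, the code below maps a hypothetical `Employee` entity with a multivalued `Skill` attribute to two SQLite tables using Python's `sqlite3` module. The table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EID  INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
-- The multivalued attribute becomes its own relation; each row holds one
-- skill and points back to its employee (a 1:M relationship).
CREATE TABLE EmployeeSkill (
    EID   INTEGER NOT NULL REFERENCES Employee(EID),
    Skill TEXT NOT NULL,
    PRIMARY KEY (EID, Skill)
);
INSERT INTO Employee VALUES (1, 'Mona');
INSERT INTO EmployeeSkill VALUES (1, 'SQL'), (1, 'Python');
""")

skills = [row[0] for row in conn.execute(
    "SELECT Skill FROM EmployeeSkill WHERE EID = 1 ORDER BY Skill")]
print(skills)
```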
Mapping Weak Entities
• A weak entity becomes a separate relation with a foreign key taken from the superior (strong) entity.
• Its primary key is composed of:
• the partial identifier of the weak entity
• the primary key of the identifying (strong) relation
Mapping Binary Relationships
• One-to-Many: The primary key on the one side becomes a foreign key on the many side.
• Many-to-Many: Create a new relation with the primary keys of the two entities as its primary key.
• One-to-One: The primary key on the mandatory side becomes a foreign key on the optional side.
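The one-to-many and many-to-many rules can be sketched as follows, assuming SQLite through Python's `sqlite3` module. The entities `Department`, `Employee`, and `Project` and the relationship `WorksOn` are hypothetical examples, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One-to-many: the key of the "one" side (Department) becomes a foreign key
-- on the "many" side (Employee).
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EID    INTEGER PRIMARY KEY,
    DeptID INTEGER REFERENCES Department(DeptID)
);
-- Many-to-many: a new relation whose primary key combines both keys.
CREATE TABLE Project (ProjID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE WorksOn (
    EID    INTEGER REFERENCES Employee(EID),
    ProjID INTEGER REFERENCES Project(ProjID),
    PRIMARY KEY (EID, ProjID)
);
INSERT INTO Department VALUES (1, 'Research');
INSERT INTO Employee VALUES (10, 1), (11, 1);
INSERT INTO Project VALUES (100, 'Catalog'), (101, 'Billing');
INSERT INTO WorksOn VALUES (10, 100), (10, 101), (11, 100);
""")

# Count how many projects each employee works on.
projects_per_employee = dict(conn.execute(
    "SELECT EID, COUNT(*) FROM WorksOn GROUP BY EID ORDER BY EID"))
print(projects_per_employee)
```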
Mapping Associative Entities
• Identifier not assigned: The default primary key for the association relation is composed of the primary keys of the two entities (as in an M:N relationship).
• Identifier assigned: An assigned identifier is used as the primary key when it is natural and familiar to end users, or when the default identifier may not be unique.
Mapping Recursive Relationships
• One-to-Many: Use a recursive foreign key in the same relation.
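A recursive foreign key can be sketched as below, assuming SQLite through Python's `sqlite3` module. The "manages" relationship and the column name `ManagerID` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EID       INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    ManagerID INTEGER REFERENCES Employee(EID)  -- recursive foreign key
);
INSERT INTO Employee VALUES (1, 'Hana', NULL);  -- top of the hierarchy
INSERT INTO Employee VALUES (2, 'Karim', 1), (3, 'Laila', 1);
""")

# Self-join: pair each employee with the name of their manager.
reports = conn.execute(
    "SELECT e.Name, m.Name FROM Employee e"
    " JOIN Employee m ON e.ManagerID = m.EID ORDER BY e.EID").fetchall()
print(reports)
```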
Mapping Recursive Relationships
• Many-to-Many: Create two relations, one for the entity type and one for an associative relation whose primary key has two attributes, both taken from the primary key of the entity.
Mapping Ternary (and n-ary) Relationships
• Create one relation for each entity and one for the associative entity.
• The associative entity has foreign keys to each entity in the relationship.
Lecture 5: Conceptual and Logical Model (Mapping)
A university consists of several faculties. Within each faculty
there are several departments. Each department may run a
number of courses. All teaching staff are attached to departments,
each staff member belonging to exactly one department. Every
course is composed of sub-courses. Some sub-courses are part of
more than one course. Staff may teach on many sub-courses and
each sub-course may be taught by a number of staff.
Lecture 5: Normalization
Normalization

Normalization is a procedure in relational database design that aims to convert relational schemas into a more desirable form. The goal is to remove redundancy in relations and the problems that follow from it, namely insertion, deletion, and update anomalies.
Data Redundancy

Data redundancy means that the same data is found in more than one location within database tables. Redundancy makes a relational schema a non-optimal design because it leads to the following problems:
• Insertion anomalies
• Deletion anomalies
• Update anomalies
Insertion Anomalies
An insertion anomaly happens when the insertion of a data record is not
possible unless we also add some additional unrelated data to the record.
Deletion Anomalies
A deletion anomaly happens when deletion of a data record results in losing
some unrelated information that was stored as part of the record that was
deleted from a table.
Update Anomalies
An update anomaly occurs when updating data for an entity in one place leads to
inconsistency with the existing redundant data in another place.
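The anomalies can be seen in a small, purely illustrative Python sketch. The relation below is hypothetical: it stores each college's dean redundantly in every student row.

```python
# Hypothetical denormalized relation: every row repeats the college's dean.
enrolment = [
    {"student_id": 1, "student": "Ali",  "college": "Science", "dean": "Dr. Nabil"},
    {"student_id": 2, "student": "Mona", "college": "Science", "dean": "Dr. Nabil"},
    {"student_id": 3, "student": "Omar", "college": "Arts",    "dean": "Dr. Salma"},
]

# Update anomaly: changing the dean in only one row leaves conflicting copies.
enrolment[0]["dean"] = "Dr. Hoda"
deans_of_science = {r["dean"] for r in enrolment if r["college"] == "Science"}

# Deletion anomaly: removing the only Arts student also loses the Arts dean.
enrolment = [r for r in enrolment if r["student_id"] != 3]
colleges_left = {r["college"] for r in enrolment}

# Insertion anomaly: a new college with no students cannot be recorded here
# without inventing a student row for it.
print(deans_of_science, colleges_left)
```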
Decompositions

Decomposition is used to overcome the above problems caused by data redundancy. In relational database design, decomposition means breaking down a relational schema into smaller and simpler relations that avoid redundancy. The idea is to be able to query the smaller relations for any information that we were previously able to retrieve from the original relational schema.

While breaking down a given relational schema helps to avoid redundancy, one should be careful not to lose information. That is, it should be possible to reconstruct the original relation from the smaller relations. Functional dependencies guide us in achieving this reversibility.
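The following sketch illustrates a decomposition that loses no information. The sample relation and the assumed dependency "college determines dean" are hypothetical; the point is only that joining the two smaller relations gives back the original rows.

```python
# Hypothetical relation: (student_id, student, college, dean).
original = {
    (1, "Ali", "Science", "Dr. Nabil"),
    (2, "Mona", "Science", "Dr. Nabil"),
    (3, "Omar", "Arts", "Dr. Salma"),
}

# Decompose on the assumed dependency college -> dean.
student_rel = {(sid, name, college) for sid, name, college, _dean in original}
college_rel = {(college, dean) for _sid, _name, college, dean in original}

# Natural join on the shared attribute (college) rebuilds the original rows.
rejoined = {(sid, name, college, dean)
            for sid, name, college in student_rel
            for college2, dean in college_rel
            if college == college2}

print(rejoined == original)  # the decomposition is reversible
print(len(college_rel))      # each dean is now stored once per college
```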
Functional Dependencies

A functional dependency (FD) is a type of integrity constraint that extends the idea of a super key. It defines a dependency between subsets of attributes of a given relation.

Example

• Using the Student schema of our first example, shown in Table 4.7, the FD STUDENT_ID → COLLEGE holds on the Student schema, while the FD STUDENT → COLLEGE does not hold over the relation schema, because there may be students with the same name in two different colleges.
The following set of FDs also holds:
{STUDENT_ID → COLLEGE, STUDENT_ID → STUDENT,
STUDENT_ID → STUDENT → COLLEGE
STUDENT_ID → STUDENT → RANK}
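One way to explore functional dependencies is to test them against sample rows. The helper `fd_holds` and the rows below are hypothetical; note that a check like this can only show that an FD is violated or not violated in the given data, whereas an FD on a schema is a statement about all valid data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows: two students share the name "Ali".
students = [
    {"STUDENT_ID": 1, "STUDENT": "Ali", "COLLEGE": "Science"},
    {"STUDENT_ID": 2, "STUDENT": "Ali", "COLLEGE": "Arts"},
    {"STUDENT_ID": 3, "STUDENT": "Mona", "COLLEGE": "Science"},
]

id_determines_college = fd_holds(students, ["STUDENT_ID"], ["COLLEGE"])
name_determines_college = fd_holds(students, ["STUDENT"], ["COLLEGE"])
print(id_determines_college, name_determines_college)
```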
Normalization – Normal Forms
The normal forms progress towards an optimal design. Normalization is a stepwise process in which each step transforms the relational schemas into a higher normal form. Each normal form satisfies all the previous normal forms and adds some further optimization over them.

• First Normal Form (1st NF)
• Second Normal Form (2nd NF)
• Third Normal Form (3rd NF)
• Boyce-Codd Normal Form (BCNF)
First Normal Form (1st NF)
A relation is in first normal form when every attribute holds only atomic values. The idea of atomic values ensures that there are no "repeating groups". This is because a relational database management system can store only a single value at the intersection of a row and a column. A repeating group arises when we attempt to store multiple values at the intersection of a row and a column; a table that contains such a value is not strictly relational.
A table is in 1NF if and only if it satisfies the following five conditions:
• There is no top-to-bottom ordering to the rows.
• There is no left-to-right ordering to the columns.
• There are no duplicate rows.
• Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
• All columns are regular (i.e., rows have no hidden components such as row IDs, object IDs, or hidden timestamps).
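Removing a repeating group can be sketched in a few lines of Python. The employee data is hypothetical; the `skills` list stands in for several values stored at a single row-and-column intersection.

```python
# Not in 1NF: the "skills" cell holds a repeating group.
unnormalized = [
    {"eid": 1, "name": "Mona", "skills": ["SQL", "Python"]},
    {"eid": 2, "name": "Omar", "skills": ["Java"]},
]

# 1NF: one atomic value per row-and-column intersection, so each skill
# gets its own row.
first_nf = [(e["eid"], e["name"], skill)
            for e in unnormalized
            for skill in e["skills"]]

for row in first_nf:
    print(row)
```

The flattened relation now repeats the employee name for each skill, which is the kind of redundancy the later normal forms address.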
Second Normal Form (2nd NF)
• A relation is in second normal form when it is in 1NF and no non-key attribute depends on only part of a candidate key; every non-key attribute depends on the entire candidate key.
• It follows from the above definition that a relation that has a single attribute as its candidate key is always in 2NF.
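A partial dependency and its removal can be sketched as follows. The relation is hypothetical: its key is assumed to be (student_id, course_id), and the student name depends on student_id alone.

```python
# Hypothetical relation with key (student_id, course_id).
# student_name depends only on student_id: a partial dependency.
grades = [
    (1, "DB101", "Ali", "A"),
    (1, "OS201", "Ali", "B"),
    (2, "DB101", "Mona", "A"),
]

# 2NF decomposition: move the partially dependent attribute to a relation
# keyed by the part of the key it depends on.
student_rel = {(sid, name) for sid, _course, name, _grade in grades}
grade_rel = {(sid, course, grade) for sid, course, _name, grade in grades}

print(sorted(student_rel))  # each student name is stored once
print(sorted(grade_rel))    # grade depends on the whole key
```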
Third Normal Form (3rd NF)

A relation is in third normal form if:

• It is in 2NF.
• No non-key attribute depends transitively on the candidate key. That is, every non-key attribute depends directly on the primary key and not through a transitive chain in which an attribute Z depends on a non-key attribute Y and Y in turn depends on the primary key X.
First Normal Form (1st NF)
"Remove repeating groups"
Second Normal Form (2nd NF)
"Remove partial dependencies"
A partial dependency exists when a non-key attribute depends on only part of the primary key.
Third Normal Form (3rd NF)
"Remove transitive dependencies"
A transitive dependency exists when a non-key attribute depends on another non-key attribute.
