Presented by
Dr. Amira M. Gaber
Lectures 1, 2
Data and Database
Data are raw facts and the heart of a database; they are used to produce useful
information. Good, timely, relevant information is the key to decision making.
Good decision making is the key to organization survival.
A database is a repository of data designed to support efficient data storage,
retrieval and maintenance. Multiple types of databases exist to suit various
industry requirements. A database may be specialized to store binary files,
documents, images, videos, relational data, multidimensional data, transactional
data, analytic data, or geographic data, to name a few.
Types of Database
• The main objectives of a database management system are data availability, data
integrity, data security and data independence.
• Data availability: refers to the data being available to a wide variety of users in a
meaningful format that is easy to access.
• Data integrity: refers to the correctness and reliability of the data in the database.
• Data security: refers to the data in the database being accessible only by
authorized users, for example by using passwords.
• Data independence: refers to the immunity of user applications to changes made
in the definition and organization of data.
File-based approach
Data redundancy
• Often, within an organization, files and applications are created by different
programmers from various departments over long periods of time. This can lead
to data redundancy, which in turn causes several problems, such as:
• Inconsistency in data format
• The same information being kept in several different places (files)
• Data inconsistency, a situation where various copies of the same data conflict
• Wasted storage space and duplicated effort
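To make the inconsistency problem concrete, here is a small Python sketch. The two dictionaries are hypothetical stand-ins for files kept by two departments (sales_file, billing_file and find_inconsistencies are illustrative names, not from the lecture); the function reports every field whose copies disagree.

```python
# Hypothetical data: two departments each keep their own copy of customer records.
sales_file = {
    "C001": {"name": "Mona Ali", "address": "12 Nile St."},
    "C002": {"name": "Omar Hassan", "address": "5 Tahrir Sq."},
}
billing_file = {
    "C001": {"name": "Mona Ali", "address": "40 Pyramids Rd."},  # updated here only
    "C002": {"name": "Omar Hassan", "address": "5 Tahrir Sq."},
}


def find_inconsistencies(file_a, file_b):
    """Return (key, field, value_a, value_b) for every field whose copies disagree."""
    conflicts = []
    for key in sorted(file_a.keys() & file_b.keys()):  # records stored in both files
        for field in sorted(file_a[key].keys() & file_b[key].keys()):
            if file_a[key][field] != file_b[key][field]:
                conflicts.append((key, field, file_a[key][field], file_b[key][field]))
    return conflicts


print(find_inconsistencies(sales_file, billing_file))
# [('C001', 'address', '12 Nile St.', '40 Pyramids Rd.')]
```

Because each department updates only its own copy, nothing in the file-based design prevents the two addresses from drifting apart.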
File-based approach
Data isolation
• In a file-based system, data isolation means that data are scattered across
various files, which may be in different formats. This
problem leads to
• Difficulty for new applications to retrieve the appropriate data, which might
be stored in various files.
File-based approach
Integrity problems
• Data integrity is the maintenance and assurance that the data in
a database are correct and consistent. Factors to consider when addressing
integrity problems in a file-based system are:
• Data values must satisfy certain consistency constraints that are specified in
the application programs.
• It is difficult to make changes to the application programs in order to
enforce new constraints.
File-based approach
Security problems
• Security can be a problem with a file-based approach because:
• Not every user should be able to access all the data, but access privileges
are difficult to enforce.
• Application requirements are added to the system in an ad-hoc manner, so it
is difficult to enforce constraints.
File-based approach
Concurrency access
• Concurrency is the ability of the database to allow multiple users to access the
same record without adversely affecting transaction processing. In a file-based
system, concurrency must be managed, or prevented, by the application programs.
Typically, in a file-based system, when an application opens a file, that file is
locked. This means that no one else can access the file at the same time.
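The whole-file locking behaviour described above can be imitated in Python with a single threading.Lock that guards the entire file. This is only an analogy under stated assumptions: file_lock and try_open_for_update are hypothetical names, and a real file-based system would rely on operating-system file locks rather than an in-process lock.

```python
import threading

# Hypothetical stand-in for a whole-file lock: one lock guards the entire file,
# not individual records, so a second application cannot even read other records.
file_lock = threading.Lock()


def try_open_for_update(app_name):
    """Try to open the shared file; fail immediately if another app holds it."""
    acquired = file_lock.acquire(blocking=False)
    print(f"{app_name}: {'opened file' if acquired else 'file is locked, access denied'}")
    return acquired


payroll_ok = try_open_for_update("Payroll app")  # first application locks the file
hr_ok = try_open_for_update("HR app")            # second application is shut out
file_lock.release()                              # payroll closes the file
hr_retry = try_open_for_update("HR app")         # now HR can open it
file_lock.release()
```

A DBMS avoids this bottleneck by locking at a finer granularity (for example, individual records) and coordinating transactions itself.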
Advantages of DBMS
1. Improved data sharing
2. Improved data security
3. Better data integration
4. Minimized data inconsistency
5. Improved data access
6. Improved decision making
Data Dictionary
• Data Dictionary
A data dictionary, also known as a “system catalog,” is a centralized store of
information about the database. It contains information about the tables, the fields the
tables contain, data types, primary keys, indexes, the joins which have been established
between those tables, referential integrity, cascade update, cascade delete, etc. This
information stored in the data dictionary is called “Metadata.”
• Metadata
The information (data) about the data in a database is called Metadata
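SQLite keeps its own system catalog, so it can serve as a small illustration of metadata. The sketch below assumes Python's built-in sqlite3 module and an in-memory database (the Customer table is illustrative); it creates one table and then reads its metadata back from the catalog table sqlite_master and from PRAGMA table_info.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustID INTEGER PRIMARY KEY, CustName TEXT NOT NULL)")

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['Customer']

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
columns = list(conn.execute("PRAGMA table_info(Customer)"))
for cid, name, col_type, notnull, dflt_value, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "nullable", "PK" if pk else "")
```

Other DBMSs expose the same kind of information through their own catalogs (for example, an INFORMATION_SCHEMA), though the exact names differ by product.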
Types of DBMS architecture
Client-Server
Entity Relationship Diagram (ERD)
• ERD stands for entity relationship diagram. People also call these types of
diagrams ER diagrams and Entity Relationship Models. An ERD visualizes
the relationships between entities like people, things, or concepts in a database.
An ERD will also often visualize the attributes of these entities.
• By defining the entities, their attributes, and showing the relationships between
them, an ER diagram can illustrate the logical structure of databases.
The Importance of the ERD
• Entity set: a set of entities of the same type that share the same properties.
A noun is used to name an entity set.
Common ERD Symbols
[Figure: ERD symbols for entity, weak entity, attribute, and derived attribute]
Common ERD Symbols
• A composite attribute is an attribute where the values of that attribute can be
further subdivided into meaningful sub-parts.
[Figure: composite attribute Name, subdivided into First name and Last name]
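A composite attribute maps naturally onto a nested record type. In this Python sketch (the Employee and Name classes and the to_row helper are hypothetical), Name is one attribute whose value has two sub-parts, and to_row shows how the sub-parts typically become separate simple columns in a relational table.

```python
from dataclasses import dataclass


@dataclass
class Name:
    """Composite attribute: its value splits into meaningful sub-parts."""
    first_name: str
    last_name: str


@dataclass
class Employee:
    eid: int
    name: Name  # one attribute whose value has sub-parts


def to_row(emp: Employee) -> dict:
    """Flatten the composite attribute into simple columns, one per sub-part."""
    return {"EID": emp.eid, "FirstName": emp.name.first_name, "LastName": emp.name.last_name}


print(to_row(Employee(1, Name("Sara", "Ahmed"))))
# {'EID': 1, 'FirstName': 'Sara', 'LastName': 'Ahmed'}
```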
Common ERD Symbols
• Mandatory attributes - Mandatory attributes must have a value. For example,
in most businesses that track personal information, Name is required.
• Primary key – A primary key is one of the candidate keys from a relation.
Every relation must have a primary key. A primary key shall be at least:
Common ERD Symbols
• Simple keys – these keys have a single attribute.
• Composite keys – these keys have multiple attributes.
• Foreign keys – these keys usually exist when there are two or more relations.
An attribute in one relation refers to the primary key of another relation, so its
values must exist in that other relation. A relationship exists between the two attributes.
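The difference between a simple key and a composite key can be seen by letting a DBMS enforce them. This sketch assumes Python's sqlite3 module with an in-memory database; the Student and Enrollment tables and the try_insert helper are hypothetical. StudentID alone identifies a student, while an enrollment is identified only by the pair (StudentID, CourseID).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simple key: one attribute (StudentID) identifies each student.
conn.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT)")
# Composite key: only the pair (StudentID, CourseID) identifies an enrollment.
conn.execute("""
    CREATE TABLE Enrollment (
        StudentID INTEGER NOT NULL,
        CourseID  TEXT    NOT NULL,
        Grade     TEXT,
        PRIMARY KEY (StudentID, CourseID)
    )""")


def try_insert(sql, params):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


try_insert("INSERT INTO Student VALUES (?, ?)", (1, "Sara"))
student_dup = try_insert("INSERT INTO Student VALUES (?, ?)", (1, "Omar"))  # key value repeated

add_enrollment = "INSERT INTO Enrollment VALUES (?, ?, ?)"
results = [
    try_insert(add_enrollment, (1, "DB101", "A")),  # accepted
    try_insert(add_enrollment, (1, "OS201", "B")),  # accepted: same student, other course
    try_insert(add_enrollment, (1, "DB101", "C")),  # rejected: pair (1, 'DB101') exists
]
print(student_dup)
print(results)
```

Note that StudentID may repeat in Enrollment; only the full combination must be unique.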
Constraints
• Domain Integrity: Domain restricts the values of attributes in the relation and
is a constraint of the relational model. However, there are real-world semantics
for data that cannot be specified if used only with domain constraints. We need
more specific ways to state what data values are or are not allowed and which
format is suitable for an attribute. For example, the Employee ID (EID) must
be unique, or the employee Birthdate must be in the range [Jan 1, 1950, Jan 1, 2000].
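Both example rules can be declared in the schema so the DBMS enforces them. A minimal sketch, assuming Python's sqlite3 module and an illustrative Employee table: uniqueness comes from the primary key, and the birthdate range from a CHECK constraint. Dates are stored as ISO-8601 text ('YYYY-MM-DD'), an assumption that makes text comparison match date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EID       INTEGER PRIMARY KEY,   -- EID must be unique
        Name      TEXT NOT NULL,
        -- ISO-8601 date strings compare correctly as text
        Birthdate TEXT NOT NULL CHECK (Birthdate BETWEEN '1950-01-01' AND '2000-01-01')
    )""")


def try_insert(row):
    """Insert one employee row and report whether the constraints allowed it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


results = [
    try_insert((1, "Sara", "1985-06-15")),  # accepted: inside the range
    try_insert((2, "Omar", "2003-02-01")),  # rejected: violates the CHECK constraint
    try_insert((1, "Nour", "1990-01-01")),  # rejected: EID 1 already exists
]
print(results)
```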
Constraints
• Entity integrity: To ensure entity integrity, it is required that every table have a
primary key. Neither the PK nor any part of it can contain null values. This is
because null values for the primary key mean we cannot identify some rows.
For example, in the EMPLOYEE table, Phone cannot be a primary key since
some people may not have a telephone.
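Entity integrity can be checked the same way. This sketch assumes Python's sqlite3 module and an illustrative Employee table. One SQLite-specific detail is labeled in the code: NOT NULL is written explicitly because SQLite, for historical reasons, otherwise allows NULL in a PRIMARY KEY column that is not an INTEGER PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: without it, SQLite would accept NULL in this
# non-INTEGER primary key column (a long-standing SQLite quirk).
conn.execute("""
    CREATE TABLE Employee (
        EID   TEXT PRIMARY KEY NOT NULL,
        Name  TEXT,
        Phone TEXT   -- may be NULL, so it cannot serve as the primary key
    )""")


def try_insert(row):
    """Insert one employee row and report whether entity integrity allowed it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


results = [
    try_insert(("E1", "Sara", None)),        # accepted: a missing phone is fine
    try_insert((None, "Omar", "555-0100")),  # rejected: NULL primary key
]
print(results)
```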
Constraints
• Referential integrity: Referential integrity requires that a foreign key must have a
matching primary key or it must be null. This constraint is specified between two
tables (parent and child); it maintains the correspondence between rows in these
tables. It means the reference from a row in one table to another table must be valid.
• Customer(CustID, CustName)
• Order(OrderID, CustID, OrderDate)
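The Customer/Order pair above can be declared with a foreign key so the DBMS checks referential integrity. This is a sketch assuming Python's sqlite3 module; the sample rows and the attempt helper are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and "Order" must be quoted because ORDER is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Customer (CustID INTEGER PRIMARY KEY, CustName TEXT)")
conn.execute("""
    CREATE TABLE "Order" (
        OrderID   INTEGER PRIMARY KEY,
        CustID    INTEGER REFERENCES Customer(CustID),  -- foreign key to the parent table
        OrderDate TEXT
    )""")
conn.execute("INSERT INTO Customer VALUES (10, 'Sara')")


def attempt(sql, params=()):
    """Run one statement and report whether referential integrity allowed it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


add_order = 'INSERT INTO "Order" VALUES (?, ?, ?)'
results = [
    attempt(add_order, (1, 10, "2024-03-01")),          # accepted: customer 10 exists
    attempt(add_order, (2, 99, "2024-03-02")),          # rejected: no customer 99
    attempt(add_order, (3, None, "2024-03-03")),        # accepted: NULL foreign key allowed
    attempt("DELETE FROM Customer WHERE CustID = 10"),  # rejected: order 1 still refers to it
]
print(results)
```

The third insert shows the "or it must be null" part of the rule; the failed delete shows that the parent row cannot disappear while a child row still references it (unless a cascade action is declared).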
ERD Models
• Conceptual ERD or data model: This model has the most abstraction and
least amount of detail, as such it's appropriate for large projects that need a
higher level view used by business analysts. A typical conceptual ERD will
contain entities and relationships, but offer no details on specific database
columns or cardinalities. It's a general, high-level view of database design.
ERD Models
• Logical ERD or data model: This model adds more detail to the conceptual
model by defining additional entities that are operational and transactional.
• Physical ERD or data model: This model serves as the actual design or
blueprint of the database with lots of technical details including defining
cardinality and showing primary and foreign keys of entities instead of just
their abstract semantic names. For this type of ERD, attributes will often be
listed to represent the columns of the real database table.
Database Concepts
Lecture 4: Logical Model (Mapping)
Database Design Phases
1. Conceptual design
Conceptual modeling is an important phase in designing a successful database
application. It sketches out the entities to be represented and determines what
kinds of relationships exist between them. It deals with the scope of the
database to be created and defines the general rules that need to be considered.
Database Design Phases
2. Logical Design
• The logical phase of database design is also called the data modeling
mapping phase. This phase gives us a result of relation schemas. The basis
for these schemas is the ER or the Class Diagram.
• Creating the relation schemas is mainly a mechanical operation. There are
rules for transferring the ER model or class diagram to relation schemas.
Database Design Phases
3. Normalization
• Normalization is, in fact, the last piece of the logical design puzzle. The main
purpose of normalization is to remove redundancy and the anomalies it can cause
during updates.
• Normalization in database design is a way to change the relation schema to
reduce redundancy. Each normalization step usually decomposes a relation into
smaller ones, so new tables are typically added to the database.
Database Design Phases
4. Physical Design
• The last phase of database design is the physical design phase. In this phase,
we implement the database design. Here, the DBMS (Database Management
System) to be used must be chosen.
Mapping Rules
Data redundancy implies finding the same data in more than one location
within database tables. Redundancy in a relational schema is a non-optimal
relational database design because of the following problems:
Insertion Anomalies
Deletion Anomalies
Update Anomalies
Insertion Anomalies
An insertion anomaly happens when the insertion of a data record is not
possible unless we also add some additional unrelated data to the record.
Deletion Anomalies
A deletion anomaly happens when deletion of a data record results in losing
some unrelated information that was stored as part of the record that was
deleted from a table.
Update Anomalies
An update anomaly occurs when updating data for an entity in one place leads to
inconsistency with the redundant copies of the same data stored in another place.
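The update and deletion anomalies can be reproduced with a small Python sketch over a hypothetical redundant Enrollment table, in which an instructor's office is repeated in every enrollment row (the table, its columns and the offices_of helper are illustrative assumptions).

```python
# Hypothetical redundant table: the instructor's office is repeated in every row.
enrollment = [
    {"StudentID": 1, "Course": "DB101", "Instructor": "Dr. Adel", "Office": "B12"},
    {"StudentID": 2, "Course": "DB101", "Instructor": "Dr. Adel", "Office": "B12"},
    {"StudentID": 3, "Course": "OS201", "Instructor": "Dr. Hoda", "Office": "C07"},
]


def offices_of(rows, instructor):
    """All office values recorded for one instructor; more than one means inconsistency."""
    return {r["Office"] for r in rows if r["Instructor"] == instructor}


# Update anomaly: the office changes, but only one of the redundant copies is updated.
enrollment[0]["Office"] = "B20"
adel_offices = offices_of(enrollment, "Dr. Adel")
print(adel_offices)  # two different offices for the same instructor

# Deletion anomaly: removing the only OS201 enrollment also loses Dr. Hoda's office.
enrollment = [r for r in enrollment if r["StudentID"] != 3]
print(offices_of(enrollment, "Dr. Hoda"))  # set()
```

The insertion anomaly follows from the same design: assuming (StudentID, Course) is the key, a new instructor's office cannot be recorded until some student enrolls in that instructor's course.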
Decompositions
Example
• Using the Student schema of our first example, shown in Table 4.7, the
FD STUDENT_ID → COLLEGE holds on the Student schema, while the
FD STUDENT → COLLEGE does not hold over this relation schema because
there may be students with the same name in two different colleges.
The following set of FDs also holds on the Student schema:
{STUDENT_ID → COLLEGE, STUDENT_ID → STUDENT,
{STUDENT_ID, STUDENT} → COLLEGE,
{STUDENT_ID, STUDENT} → RANK}
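Whether an FD X → Y is violated by a given table can be tested mechanically: any two rows that agree on X must also agree on Y. The Python sketch below (fd_holds and the sample rows are hypothetical) checks the FDs from the example. Keep in mind that a table instance can only disprove an FD; an FD holding on the schema is a statement about all valid instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no pair of rows in this instance violates the FD lhs -> rhs.

    lhs and rhs are tuples of attribute names. The FD is violated when two rows
    agree on every lhs attribute but differ on some rhs attribute.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:  # same lhs, different rhs
            return False
    return True


# Hypothetical instance of the Student schema: two students share the name "Ali".
student = [
    {"STUDENT_ID": 1, "STUDENT": "Ali", "COLLEGE": "Engineering", "RANK": 3},
    {"STUDENT_ID": 2, "STUDENT": "Ali", "COLLEGE": "Science", "RANK": 1},
    {"STUDENT_ID": 3, "STUDENT": "Mona", "COLLEGE": "Science", "RANK": 2},
]

print(fd_holds(student, ("STUDENT_ID",), ("COLLEGE",)))         # True
print(fd_holds(student, ("STUDENT",), ("COLLEGE",)))            # False
print(fd_holds(student, ("STUDENT_ID", "STUDENT"), ("RANK",)))  # True
```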
Normalization – Normal Forms
The normal forms progress towards obtaining an optimal design. Normalization is a step-wise
process, where each step transforms the relational schemas into a higher normal form. Each normal
form includes all the conditions of the previous normal forms and adds further restrictions.
A relation is in third normal form (3NF) if:
• It is in 2NF.
• No non-key attribute depends transitively on a candidate key.
That is, every non-key attribute depends directly on the primary key and not through a
transitive relation in which an attribute Z depends on a non-key attribute Y, and Y
in turn depends on the primary key X.
First Normal Form (1st NF)
“Remove Repeating Group”
Second Normal Form (2nd NF)
“Remove Partial Dependency”
A non-key attribute depends on only part of a composite primary key
Third Normal Form (3rd NF)
“Remove Transitive Dependency”
A non-key attribute depends on another non-key attribute
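Removing a transitive dependency means moving the dependent attribute into its own relation. The Python sketch below uses a hypothetical Employee relation with EID → DeptID → DeptName, decomposes it into 3NF with a projection, and rejoins the parts to show nothing is lost. project and natural_join are illustrative helpers, not library functions. The lossless rejoin shown here is guaranteed in general because the shared attribute DeptID is the key of the new Department relation.

```python
# Hypothetical relation with a transitive dependency: EID -> DeptID -> DeptName.
employee = [
    {"EID": 1, "Name": "Sara", "DeptID": "D1", "DeptName": "Sales"},
    {"EID": 2, "Name": "Omar", "DeptID": "D1", "DeptName": "Sales"},
    {"EID": 3, "Name": "Nour", "DeptID": "D2", "DeptName": "IT"},
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(left, right, on):
    """Combine every pair of rows that agree on the shared attribute `on`."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[on] == rrow[on]]


# 3NF decomposition: DeptName moves to a relation keyed by DeptID.
emp_3nf = project(employee, ["EID", "Name", "DeptID"])
dept_3nf = project(employee, ["DeptID", "DeptName"])
print(dept_3nf)  # [{'DeptID': 'D1', 'DeptName': 'Sales'}, {'DeptID': 'D2', 'DeptName': 'IT'}]

# Joining the parts on DeptID gives back exactly the original rows.
rejoined = natural_join(emp_3nf, dept_3nf, "DeptID")
print(sorted(rejoined, key=lambda r: r["EID"]) == employee)  # True
```

After the decomposition, "Sales" is stored once, so renaming a department updates a single row instead of one row per employee.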