
ADE Unit-1

ENTITY RELATIONAL MODEL:


Entities:
1. An object in the mini-world about which information is to be
stored. Examples: persons, books, courses.
2. It must be possible to distinguish entities from each other, i.e.,
objects must have some identity.
Examples: a book entity identified by its ISBN; a vacation entity
identified by the travel agency's booking number.
Relationship:
A relation (not in the strict relational-model sense) between pairs
of entities (a binary relationship).
In software engineering, an entity-relationship model (ERM) is an
abstract and conceptual representation of data.
Entity-relationship modeling is a database modeling method,
used to produce a type of conceptual schema or semantic
data model of a system, often a relational database, and its
requirements in a top-down fashion. Diagrams created by
this process are called entity-relationship diagrams, ER
diagrams, or ERDs.
The first stage of information system design uses these
models during the requirements analysis to describe
information needs or the type of information that is to be
stored in a database.
The data modeling technique can be used to describe any
ontology (i.e., an overview and classification of the terms used and
their relationships) for a certain area of interest.

In the case of the design of an information system that is based


on a database, the conceptual data model is, at a later stage
(usually called logical design), mapped to a logical data model,
such as the relational model; this in turn is mapped to a physical
model during physical design.
The building blocks: entities, relationships, and attributes
Two related entities:
An entity with an attribute
A relationship with an attribute

PRIMARY KEY:
An entity may be defined as a thing which is recognized as being
capable of an independent existence and which can be uniquely
identified; an entity is an abstraction from the complexities of
some domain. The attribute (or minimal set of attributes) whose
values uniquely identify each entity is chosen as the primary key
of the corresponding relation.
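As a minimal sketch (not taken from these notes), the book entity from the examples above can be declared as a relation whose ISBN serves as the primary key; the table name, columns, and sample values are illustrative assumptions.

import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the sketch

# The ISBN uniquely identifies each book entity, so it is chosen as the primary key.
conn.execute("""
    CREATE TABLE book (
        isbn   TEXT PRIMARY KEY,   -- unique identifier for the entity
        title  TEXT NOT NULL,
        author TEXT
    )
""")

conn.execute("INSERT INTO book VALUES ('111-1-11-111111-1', 'Sample Title', 'Sample Author')")

# Inserting a second row with the same ISBN violates the primary-key constraint.
try:
    conn.execute("INSERT INTO book VALUES ('111-1-11-111111-1', 'Duplicate entry', 'Anon')")
except sqlite3.IntegrityError as err:
    print("Rejected duplicate key:", err)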

NORMALIZATION:
Database normalization is the process of removing
redundant data from your tables in order to improve storage
efficiency, data integrity, and scalability.
In the relational model, formal criteria exist for judging how
well a database design avoids redundancy and update anomalies.
These classifications are called normal forms (or NF), and there
are algorithms for converting a given database design from one
normal form to another.
Normalization generally involves splitting existing tables into
multiple ones, which must be re-joined or linked each time a
query is issued.
The Purpose of Normalization:
1. Normalization is a technique for producing a set of relations
with desirable properties, given the data requirements of an
enterprise.
2. The process of normalization is a formal method that identifies
relations based on their primary or candidate keys and the
functional dependencies among their attributes.
Definition of 1NF:
First Normal Form is a relation in which the intersection of each
row and column contains one and only one value. There are two
approaches to removing repeating groups from unnormalized
tables:
1. Remove the repeating groups by entering appropriate data in
the empty columns of the rows containing the repeating data.
2. Remove the repeating group by placing the repeating data,
along with a copy of the original key attribute(s), in a separate
relation; a primary key is identified for the new relation (a sketch
of this approach follows below).
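A small sketch of the second approach (the table and column names are assumptions, not taken from these notes): a customer holding a repeating group of phone numbers is split into a separate relation that carries a copy of the original key.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized idea: customer(cust_id, name, phone1, phone2, phone3)  -- repeating group

    -- 1NF: the repeating group moves to its own relation, keyed by
    -- (cust_id, phone), keeping a copy of the original key cust_id.
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE customer_phone (
        cust_id INTEGER REFERENCES customer(cust_id),
        phone   TEXT,
        PRIMARY KEY (cust_id, phone)
    );
""")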
Second normal form (2NF):

It is a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
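As a hedged illustration of the 2NF rule (the relation and attribute names are assumptions): in enrolment(student_id, course_id, student_name, grade), student_name depends only on student_id, i.e., on part of the composite key, so the relation is decomposed until every non-primary-key attribute is fully functionally dependent on the whole key.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 2NF: student_name depends only on student_id,
    -- a proper subset of the composite key (student_id, course_id).
    -- enrolment(student_id, course_id, student_name, grade)

    -- 2NF decomposition:
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT
    );
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT,
        grade      TEXT,                 -- fully dependent on the whole key
        PRIMARY KEY (student_id, course_id)
    );
""")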
Boyce-Codd normal form (BCNF):
A relation is in BCNF, if and only if, every determinant is a
candidate key.
Multi-valued dependency (MVD):
It represents a dependency between attributes (for example, A, B,
and C) in a relation, such that for each value of A there is a set of
values for B and a set of values for C. However, the sets of values
for B and C are independent of each other.

QUERY PROCESSING:

Validate and translate the query:
o Good syntax.
o All referenced relations exist.
o Translate the SQL to relational algebra.
Optimize:
o Make it run faster.
Evaluate:
Three Steps of Query Processing
1) Parsing and translation: first translate the query into its
internal form and then into relational algebra, and verify that the
referenced relations exist.
2) Optimization: find the most efficient evaluation plan for the
query, because there can be more than one way to evaluate it.
3) Evaluation: the query-execution engine takes a query-evaluation
plan, executes that plan, and returns the answers to the query.
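A small sketch of the three steps seen from the outside (assuming SQLite via Python's sqlite3; the table, index, and query are illustrative): the engine parses and validates the SQL, chooses a plan, and only then evaluates it. EXPLAIN QUERY PLAN exposes the chosen plan without running the query.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.execute("CREATE INDEX emp_dept ON emp(dept)")

query = "SELECT id, salary FROM emp WHERE dept = ?"

# Parsing/translation and optimization: ask only for the evaluation plan.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Sales",)):
    print(row)   # e.g. a row describing 'SEARCH emp USING INDEX emp_dept (dept=?)'

# Evaluation: the execution engine runs the chosen plan and returns answers.
print(conn.execute(query, ("Sales",)).fetchall())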
Selection Operation (primary index):

Algorithm A2 (binary search). Applicable if the selection is an
equality comparison on the attribute on which the file is ordered.
Assume that the blocks of the relation are stored contiguously.
Cost estimate (number of disk blocks to be scanned):
* log2(br): cost of locating the first tuple by a binary search on the blocks
* SC(A, r): number of records that will satisfy the selection
* SC(A, r)/fr: number of blocks that these records will occupy
* Equality condition on a key attribute: SC(A, r) = 1; the estimate
reduces to EA2 = log2(br)
Index scan: search algorithms that use an index; the selection
condition is on the search-key of the index.
A3 (primary index on candidate key, equality). Retrieve a single
record that satisfies the corresponding equality condition.
EA3 = HTi + 1
A4 (primary index on non-key attribute, equality). Retrieve
multiple records. Let the search-key attribute be A.
A5 (equality on search-key of secondary index). Retrieve a single
record if the search-key is a candidate key:
EA5 = HTi + 1
Retrieve multiple records (each may be on a different block) if the
search-key is not a candidate key:
EA5 = HTi + SC(A, r)
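A minimal sketch that turns the estimates above into arithmetic, using the textbook-style symbols from these notes (br = number of blocks of r, fr = blocking factor, SC(A, r) = selection cardinality, HTi = height of index i). The combined A2 formula with its ceilings and the -1 adjustment is one common textbook formulation, assumed here rather than spelled out in the notes.

import math

def cost_a2_binary_search(b_r, sc, f_r):
    """A2: binary search on an ordered file.
    ceil(log2(br)) block reads to locate the first matching tuple,
    plus the ceil(SC(A,r)/fr) blocks the matching records occupy
    (the first of which was already read)."""
    return math.ceil(math.log2(b_r)) + math.ceil(sc / f_r) - 1

def cost_a5_secondary_index(ht_i, sc, key=True):
    """A5: equality on the search-key of a secondary index.
    HTi + 1 for a candidate key; HTi + SC(A, r) otherwise,
    since each matching record may sit on a different block."""
    return ht_i + 1 if key else ht_i + sc

# Example: 1,000-block file, equality on a key attribute (SC = 1)
print(cost_a2_binary_search(1000, sc=1, f_r=20))          # reduces to ceil(log2(1000)) = 10
print(cost_a5_secondary_index(ht_i=3, sc=40, key=False))  # 3 + 40 = 43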
Join operation:
Compute the theta join r ⋈θ s:
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if they satisfy the join condition;
        if they do, add tr · ts to the result
    end
end
r is called the outer relation and s the inner relation of the join.
The algorithm requires no indices and can be used with any kind
of join condition. It is expensive, since it examines every pair of
tuples in the two relations. If the smaller relation fits entirely in
main memory, use that relation as the inner relation.
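A direct, runnable transcription of the nested-loop algorithm above (plain Python, with tuples standing in for records; the sample relations and the theta condition are assumptions made for illustration).

def nested_loop_join(r, s, theta):
    """Nested-loop theta join: r is the outer relation, s the inner relation.
    Every pair (tr, ts) is tested against the join condition theta."""
    result = []
    for tr in r:                        # outer relation
        for ts in s:                    # inner relation
            if theta(tr, ts):           # test the join condition
                result.append(tr + ts)  # concatenate the matching tuples
    return result

# Example: employee(id, dept_id) joined with department(dept_id, name)
employee   = [(1, 10), (2, 20), (3, 10)]
department = [(10, "Sales"), (20, "HR")]

print(nested_loop_join(employee, department,
                       theta=lambda tr, ts: tr[1] == ts[0]))
# [(1, 10, 10, 'Sales'), (2, 20, 20, 'HR'), (3, 10, 10, 'Sales')]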

QUERY OPTIMIZATION:

A sight-seeing trip through query optimization:
Start: a SQL query
End: an execution plan
Intermediate stopovers:
query trees
logical tree transforms
strategy selection
What happens after the journey?
Execution plan is executed
Query answer returned

Query Trees: (example query-tree diagram omitted)

Logical Transformation and Spatial Queries


Traditional logical transform rules were developed for relational
queries with simple data types and operations, where CPU costs
are much smaller than I/O costs. They need to be reviewed for
spatial queries:
o complex data types and operations
o CPU cost is higher

Example: (diagram of the query processing and optimizer process omitted)
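For concreteness, a hedged example of a traditional logical transform rule (not taken from these notes): a selection whose predicate p references only attributes of R can be pushed below the join, which usually shrinks one input before the expensive join is evaluated. In LaTeX notation:

% Selection pushdown: if predicate p mentions only attributes of R, then
\sigma_{p}(R \bowtie S) \;\equiv\; (\sigma_{p}(R)) \bowtie S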

Transaction processing:
Transaction processing is designed to maintain a computer
system (typically a database or some modern filesystems) in a
known, consistent state, by ensuring that any operations carried
out on the system that are interdependent are either all
completed successfully or all canceled successfully.
Transaction processing allows multiple individual operations
to be linked together automatically as a single, indivisible
transaction.
The transaction-processing system ensures that either all
operations in a transaction are completed without error, or
none of them are.
If some of the operations are completed but errors occur
when the others are attempted, the transaction-processing
system rolls back all of the operations of the transaction
(including the successful ones), thereby erasing all traces of
the transaction and restoring the system to the consistent,
known state that it was in before processing of the
transaction began.
If all operations of a transaction are completed successfully, the
transaction is committed by the system, and all changes to the
database are made permanent; the transaction cannot be rolled
back once this is done.
Transaction processing guards against hardware and software
errors that might leave a transaction partially completed, with the
system left in an unknown, inconsistent state. If the computer
system crashes in the middle of a transaction, the transaction
processing system guarantees that all operations in any
uncommitted (i.e., not completely processed) transactions are
cancelled.
Transactions are processed in a strict chronological order. If
transaction n+1 intends to touch the same portion of the
database as transaction n, transaction n+1 does not begin until
transaction n is committed. Before any transaction is committed,
all other transactions affecting the same part of the system must
also be committed; there can be no holes in the sequence of
preceding transactions.
Methodology:
The basic principles of all transaction-processing systems are the
same. However, the terminology may vary from one transaction-processing
system to another, and the terms used below are not
necessarily universal.
Rollback:
Transaction-processing systems ensure database integrity by
recording intermediate states of the database as it is
modified, then using these records to restore the database to
a known state if a transaction cannot be committed.
For example, copies of information on the database prior to
its modification by a transaction are set aside by the system
before the transaction can make any modifications (this is
sometimes called a before image). If any part of the
transaction fails before it is committed, these copies are
used to restore the database to the state it was in before the
transaction began.
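The before-image mechanism described above is internal to the DBMS; from the application side the effect is visible as a rollback. A minimal sketch (assuming SQLite via Python's sqlite3; the accounts table and amounts are illustrative): if any operation in the transfer fails, the whole transaction is rolled back and the database returns to its prior state.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance REAL NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    # Both updates belong to one transaction: credit account 2, debit account 1.
    conn.execute("UPDATE account SET balance = balance + 500 WHERE id = 2")  # succeeds
    conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")  # violates CHECK
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()   # undoes the successful credit as well, erasing all traces

# Balances are unchanged: [(1, 100.0), (2, 50.0)]
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())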
Rollforward:
It is also possible to keep a separate journal of all modifications to
a database (sometimes called after images); this is not required
for rollback of failed transactions, but it is useful for updating the
database in the event of a database failure, so some transaction-processing
systems provide it.
If the database fails entirely, it must be restored from the most
recent back-up. The back-up will not reflect transactions
committed since the back-up was made.
However, once the database is restored, the journal of after
images can be applied to the database (rollforward) to bring the
database up to date. Any transactions in progress at the time of
the failure can then be rolled back.
The result is a database in a consistent, known state that includes
the results of all transactions committed up to the moment of
failure.

Deadlocks:
In some cases, two transactions may, in the course of their
processing, attempt to access the same portion of a database at
the same time, in a way that prevents them from proceeding.
For example, transaction A may access portion X of the
database, and transaction B may access portion Y of the
database.
If, at that point, transaction A then tries to access portion Y of the
database while transaction B tries to access portion X, a deadlock
occurs, and neither transaction can move forward.
Concurrency control:
Concurrency control in database management systems, other
transactional objects, and related distributed applications (e.g.,
Grid computing and Cloud computing) ensures that database
transactions are performed concurrently without violating the
data integrity of the respective databases.
Thus concurrency control is an essential element for correctness
in any system where two or more database transactions,
executed with time overlap, can access the same data, i.e.,
virtually any general-purpose database system.
Consequently, a vast body of related research has accumulated
since database systems emerged in the early 1970s.

A well-established concurrency control theory for database
systems is serializability theory, which makes it possible to
effectively design and analyze concurrency control methods and
mechanisms.
An alternative theory for concurrency control of atomic
transactions over abstract data types is presented in (Lynch
et al. 1993) and is not utilized below. This theory is more
refined and complex, has a wider scope, and has been less
utilized in the database literature than the classical theory
above.
Each theory has its pros and cons, emphasis and insight. To
some extent they are complementary, and their merging
may be useful.

Database transaction and the ACID rules


The concept of a database transaction (or atomic transaction) has
evolved in order to enable both well-understood database
system behavior in a faulty environment, where crashes can
happen at any time, and recovery from a crash to a well-understood
database state.
A database transaction is a unit of work, typically encapsulating a
number of operations over a database (e.g., reading a database
object, writing, acquiring lock, etc.), an abstraction supported in
database and also other systems.
Each transaction has well defined boundaries in terms of which
program/code executions are included in that transaction
(determined by the transaction's programmer via special
transaction commands).
Every database transaction obeys the following rules (by support
in the database system, i.e., a database system is designed to
guarantee them for the transactions it runs):
Atomicity - Either the effects of all or none of its operations remain
("all or nothing" semantics) when a transaction is completed
(committed or aborted, respectively).
In other words, to the outside world a committed transaction
appears (by its effects on the database) to be indivisible and
atomic, and an aborted transaction does not leave effects on
the database at all, as if it had never existed.
Consistency - Every transaction must leave the database in a
consistent (correct) state, i.e., maintain the predetermined
integrity rules of the database (constraints upon and among the
database's objects).
A transaction must transform a database from one consistent
state to another consistent state (however, it is the responsibility
of the transaction's programmer to make sure that the transaction
itself is correct, i.e., performs correctly what it intends to perform
(from the application's point of view) while the predefined
integrity rules are enforced by the DBMS). Thus, since a database
can normally be changed only by transactions, all the database's
states are consistent. An aborted transaction does not change the
database state from which it started, as if it had never existed
(atomicity above).
Isolation - Transactions cannot interfere with each other (as an
end result of their executions). Moreover, usually (depending on
concurrency control method) the effects of an incomplete
transaction are not even visible to another transaction. Providing
isolation is the main goal of concurrency control.

Durability - Effects of successful (committed) transactions must
persist through crashes (typically by recording the transaction's
effects and its commit event in non-volatile memory).

RECOVERY:

Recovery after failure?


Distributed recovery maintains atomicity and durability
What happens then?
Abort transactions affected by the failure
Including all subtransactions
Flag the site as failed
Check for recovery or wait for message to confirm
On restart, abort partial transactions which were active
at the time of the failure
Perform local recovery
Update copy of database to be consistent with
remainder of the system
Recovery Protocol
Protocols at the failed site to complete all transactions outstanding
at the time of failure.
Classes of failures

1. Site failure
2. Lost messages
3. Network partitioning
4. Byzantine failures
Effects of failures
1. Inconsistent database
2. Transaction processing is blocked
3. Failed component unavailable
Independent Recovery
A recovering site makes a transition directly to a final state
without communicating with other sites.
Lemma:
For a protocol, if a local state's concurrency set contains both an
abort and a commit, the protocol is not resilient to an arbitrary
failure of a single site:
Si cannot be committed, because other sites may be in the abort state;
Si cannot be aborted, because other sites may be in the commit state.
Rule 1 (for an intermediate state S):
If C(S) contains a commit, add a failure transition from S to commit;
otherwise, add a failure transition from S to abort.

Database Tuning:
Database tuning describes a group of activities used to
optimize and homogenize the performance of a database.
It usually overlaps with query tuning, but also refers to the design
of the database files and the selection of the database management
system (DBMS), operating system, and CPU the DBMS runs
on.
The goal is to maximize the use of system resources to perform
work as efficiently and rapidly as possible.
Most systems are designed to manage work efficiently, but it
is possible to greatly improve performance by customizing
settings and the configuration for the database and the
DBMS being tuned.
I/O tuning:
Hardware and software configuration of disk subsystems are
examined: RAID levels and configuration, block and stripe
size allocation, and the configuration of disks, controller cards,
storage cabinets, and external storage systems such as a SAN.
Transaction logs and temporary spaces are heavy consumers of
I/O, and affect performance for all users of the database. Placing
them appropriately is crucial.
Frequently joined tables and indexes are placed so that as they
are requested from file storage, they can be retrieved in parallel
from separate disks simultaneously. Frequently accessed tables
and indexes are placed on separate disks to balance I/O and
prevent read queuing.
DBMS tuning:
DBMS tuning refers to tuning of the DBMS and the configuration
of the memory and processing resources of the computer running
the DBMS. This is typically done through configuring the DBMS,
but the resources involved are shared with the host system.
Tuning the DBMS can involve setting the recovery interval (time
needed to restore the state of data to a particular point in time),
assigning parallelism (the breaking up of work from a single
query into tasks assigned to different processing resources), and
network protocols used to communicate with database
consumers.
Memory is allocated for data, execution plans, procedure cache,
and work space. It is much faster to access data in memory than
data on storage, so maintaining a sizable cache of data makes
activities perform faster. The same consideration is given to work
space. Caching execution plans and procedures means that they
are reused instead of recompiled
when needed. It is important to take as much memory as
possible, while leaving enough for other processes and the OS to
use without excessive paging of memory to storage.
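As a small, hedged illustration (SQLite via Python's sqlite3; the file name and sizes are arbitrary): even an embedded DBMS exposes knobs for the page cache and working memory, the kind of memory allocation this paragraph describes.

import sqlite3

conn = sqlite3.connect("tuned.db")   # illustrative file name

# In SQLite, a negative cache_size means "this many KiB of page cache".
conn.execute("PRAGMA cache_size = -65536")   # roughly 64 MiB of page cache
conn.execute("PRAGMA temp_store = MEMORY")   # keep temporary tables/indices in RAM

print(conn.execute("PRAGMA cache_size").fetchone())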

Processing resources are sometimes assigned to specific activities
to improve concurrency. On a server with eight processors, six
could be reserved for the DBMS to maximize available processing
resources for the database.
Database maintenance:
Database maintenance includes backups, column statistics
updates, and defragmentation of data inside the database files.
On a heavily used database, the transaction log grows rapidly.
Transaction log entries must be removed from the log to make
room for future entries. Frequent transaction log backups are
smaller, so they interrupt database activity for shorter periods of
time.
DBMSs use statistics histograms to find data in a range against a
table or index. Statistics updates should be scheduled frequently
and should sample as much of the underlying data as possible.
Accurate and up-to-date statistics allow query engines to make
good decisions about execution plans, as well as to efficiently
locate data.
Defragmentation of table and index data increases efficiency in
accessing data. The amount of fragmentation depends on the
nature of the data, how it is changed over time, and the amount
of free space in database pages to accept inserts of data without
creating additional pages.
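A brief sketch of the two maintenance tasks above on an embedded DBMS (SQLite via Python's sqlite3; the file name is illustrative): ANALYZE refreshes the statistics the query planner uses, and VACUUM rebuilds the database file, removing fragmentation and reclaiming free pages.

import sqlite3

conn = sqlite3.connect("app.db")   # illustrative file name
conn.isolation_level = None        # autocommit, so VACUUM is not wrapped in a transaction

conn.execute("ANALYZE")   # recompute table/index statistics for the optimizer
conn.execute("VACUUM")    # rebuild the file: defragment pages and reclaim free space
conn.close()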
