
RSR RUNGTA COLLEGE OF ENGINEERING AND TECHNOLOGY, BHILAI

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Subject Notes Subject Name: ADBMS

Course/Semester: MCA-I

UNIT-5

Triggers and Active Databases in DBMS

A trigger is a procedure which is automatically invoked by the DBMS in response to changes to the
database, and is specified by the database administrator (DBA). A database with a set of associated
triggers is generally called an active database.

Parts of trigger

A trigger's description contains three parts, which are as follows −

 Event − A change to the database which activates the trigger.
 Condition − A query or test that is run when the trigger is activated.
 Action − A procedure that is executed when the trigger is activated and its condition
is true.

Use of trigger

Triggers may be used for any of the following reasons −

 To implement a complex business rule that cannot be implemented using integrity
constraints.
 To audit processes, for example to keep track of changes made to a table.
 To perform an automatic action when another related action takes place.

Types of triggers
The different types of triggers are explained below −
 Statement-level trigger − It is fired only once per DML statement, irrespective of the
number of rows affected by the statement. Statement-level triggers are the default type of
trigger.
 Before-triggers − At the time of defining a trigger, we can specify whether the trigger
is to be fired before a command like INSERT, DELETE, or UPDATE is executed or
after the command is executed. Before-triggers are commonly used to check the
validity of data before the action is performed. For instance, we can use a before-trigger
to prevent deletion of rows if deletion should not be allowed in a given case.
 After-triggers − It is fired after the triggering action is completed. For example, if the
trigger is associated with the INSERT command, then it is fired after the row is inserted
into the table.
 Row-level triggers − It is fired for each row that is affected by the DML command. For
example, if an UPDATE command updates 150 rows, then a row-level trigger is fired
150 times whereas a statement-level trigger is fired only once.

Create database trigger

To create a database trigger, we use the CREATE TRIGGER command. The details to be given at the
time of creating a trigger are as follows −

 Name of the trigger.
 Table to be associated with it.
 When the trigger is to be fired: before or after.
 Command that invokes the trigger: UPDATE, DELETE, or INSERT.
 Whether it is a row-level trigger or not.
 Condition to filter rows.
 The PL/SQL block to be executed when the trigger is fired.
The syntax to create database trigger is as follows −

CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER}
{DELETE | INSERT | UPDATE [OF columns]} ON table_name
[REFERENCING [OLD AS old] [NEW AS new]]
[FOR EACH ROW [WHEN (condition)]]
BEGIN
PL/SQL block
END;
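
As a minimal, hedged sketch of this syntax (not part of the original notes; the emp table, its salary column, and the salary_audit table are assumed purely for illustration), the following Oracle-style trigger logs every salary change before an UPDATE is applied:

CREATE OR REPLACE TRIGGER trg_salary_audit
BEFORE UPDATE OF salary ON emp
FOR EACH ROW
WHEN (new.salary <> old.salary)
BEGIN
  -- record the old and new salary for auditing
  INSERT INTO salary_audit (emp_id, old_salary, new_salary, changed_on)
  VALUES (:old.empno, :old.salary, :new.salary, SYSDATE);
END;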

Entity ECA Rules


Entity ECA (EECA) rules can be used to trigger actions to run when data is modified or searched. They are
useful for maintaining entity fields (database columns) that are based on other entity fields, or for
updating data in a separate system based on data in this system. EECA rules should not generally be
used for triggering business processes because the rules are applied too widely. Service ECA rules are
a better tool for triggering processes.
For example, here is an EECA rule from the Work.eecas.xml file in Mantle Business Artifacts that
calls a service to update the total time worked on a task (WorkEffort) when a TimeEntry is created,
updated, or deleted:

<eeca entity="mantle.work.time.TimeEntry" on-create="true" on-update="true" on-delete="true" get-entire-entity="true">
    <condition><expression>workEffortId</expression></condition>
    <actions><service-call name="mantle.work.TaskServices.update#TaskFromTime" in-map="context"/></actions>
</eeca>

An ECA (event-condition-action) rule is a specialized type of rule to conditionally run actions based
on events. For Entity ECA rules the events are the various find and modify operations you can do with
a record. Set any of these attributes (of the eeca element) to true to trigger the EECA rule on the
corresponding operation: on-create, on-update, on-delete, on-find-one, on-find-list, on-find-iterator,
and on-find-count.
By default the EECA rule will run after the entity operation. To have it run before, set the run-before
attribute to true. There is also a run-on-error attribute which defaults to false; if set to true, the
EECA rule will be triggered even if there is an error in the entity operation.
When the actions run, the context will be whatever context the service was run in, plus the entity field
values passed into the operation, for convenience in using the values. There are also special context
fields added:
 entityValue: A Map with the field values passed into the entity operation. This may not
include all field values that are populated in the database for the record. To fill in the field
values that are not passed in from the database record, set the eeca.get-entire-entity
attribute to true.
 originalValue: If the eeca.get-original-value attribute is set to true and the EECA rule
runs before the entity operation (run-before=true), this will be an EntityValue object
representing the original (current) value in the database.
 eecaOperation: A String representing the operation that triggered the EECA rule, basically
the on-* attribute name without the "on-".
The condition element is the same condition as used in XML Actions and may contain expression and
compare elements, combined as needed with or, and, and not elements.
The actions element is the same as actions elements in service definitions, screens, forms, etc. It
contains an XML Actions script. See the Overview of XML Actions section for more information.

Data Warehousing
A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports
analytical reporting, structured and/or ad hoc queries, and decision making. The following sections
explain the necessary concepts of data warehousing step by step.

Background
A Database Management System (DBMS) stores data in the form of tables, uses the ER model, and
aims at the ACID properties. For example, the DBMS of a college has tables for students, faculty, etc.
A data warehouse is separate from a DBMS; it stores a huge amount of data, which is typically
collected from multiple heterogeneous sources like files, DBMSs, etc. The goal is to produce
statistical results that may help in decision making. For example, a college might want to quickly see
different results, such as how the placement of CS students has improved over the last 10 years in
terms of salaries, counts, etc.
Need for Data Warehouse
An ordinary database can store MBs to GBs of data, and that too for a specific purpose. For storing
data of TB size, the storage shifts to a data warehouse. Besides this, a transactional database
does not lend itself to analytics. To effectively perform analytics, an organization keeps a central
data warehouse to closely study its business by organizing, understanding, and using its historic
data for taking strategic decisions and analyzing trends.
Benefits of Data Warehouse:
1. Better business analytics: A data warehouse plays an important role in every business by
storing and helping analyze all the past data and records of the company, which further
improves the company's understanding and analysis of its data.
2. Faster queries: A data warehouse is designed to handle large queries, so it runs analytical
queries faster than an operational database.
3. Improved data quality: Data gathered from different sources is stored and analyzed in the
warehouse without the warehouse altering it or adding data of its own, so the quality of the
data is maintained; any data quality issues are resolved by the data warehouse team.
4. Historical insight: The warehouse stores all historical data, which contains details
about the business, so that one can analyze it at any time and extract insights from it.
Example Applications of Data Warehousing
Data Warehousing can be applied anywhere where we have a huge amount of data and we want to
see statistical results that help in decision making.

 Social Media Websites: Social networking websites like Facebook, Twitter,
LinkedIn, etc. are based on analyzing large data sets. These sites gather data related to
members, groups, locations, etc., and store it in a single central repository. Because of
the large amount of data, a data warehouse is needed to implement this.
 Banking: Most banks these days use warehouses to see the spending patterns of
account/card holders. They use this to provide them with special offers, deals, etc.
 Government: Governments use a data warehouse to store and analyze tax payments,
which is used to detect tax theft.
There can be many more applications in different sectors like e-commerce, telecommunications,
transportation services, marketing and distribution, healthcare, and retail.

Data Warehouse Architecture


A data warehouse is a heterogeneous collection of different data sources organised under a unified
schema. There are two approaches for constructing a data warehouse: the top-down approach and the
bottom-up approach, explained below.
1. Top-down approach:

The essential components are discussed below:


1. External Sources –
An external source is a source from which data is collected, irrespective of the type of data.
Data can be structured, semi-structured or unstructured.

2. Staging Area –
Since the data extracted from the external sources does not follow a particular format, it
needs to be validated before being loaded into the data warehouse. For this purpose, it is
recommended to use an ETL tool (a small SQL sketch of these steps follows this list).
 E (Extract): Data is extracted from the external data source.
 T (Transform): Data is transformed into the standard format.
 L (Load): Data is loaded into the data warehouse after transforming it into the
standard format.

3. Data warehouse –
After cleansing, the data is stored in the data warehouse as the central repository. It
actually stores the metadata, while the actual data gets stored in the data marts. Note that
the data warehouse stores the data in its purest form in this top-down approach.

4. Data Marts –
A data mart is also a part of the storage component. It stores the information of a particular
function of an organisation which is handled by a single authority. There can be as many
data marts in an organisation as there are functions. We can also say
that a data mart contains a subset of the data stored in the data warehouse.

5. Data Mining –
The practice of analysing the big data present in the data warehouse is data mining. It is
used to find the hidden patterns present in the database or in the data warehouse
with the help of data mining algorithms.
This approach is defined by Inmon as follows: the data warehouse is a central repository for the
complete organisation, and data marts are created from it after the complete
data warehouse has been created.
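
A minimal SQL sketch of the extract-transform-load steps described above; the staging_sales, src.sales, and dw.fact_sales tables and all column names are illustrative assumptions, not part of the original notes:

-- Extract: copy raw rows from the external source system into a staging table
INSERT INTO staging_sales (sale_id, sale_date, amount_text, region)
SELECT sale_id, sale_date, amount_text, region FROM src.sales;

-- Transform: standardise the format (cast text amounts to numbers, normalise codes)
UPDATE staging_sales
SET amount = CAST(amount_text AS DECIMAL(10,2)),
    region = UPPER(TRIM(region));

-- Load: move the cleansed rows into the warehouse fact table
INSERT INTO dw.fact_sales (sale_id, sale_date, amount, region)
SELECT sale_id, sale_date, amount, region FROM staging_sales;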

Advantages of Top-Down Approach –


1. Since the data marts are created from the data warehouse, it provides a consistent
dimensional view of the data marts.

2. Also, this model is considered the strongest model for business changes, which is why
big organisations prefer to follow this approach.

3. Creating a data mart from the data warehouse is easy.

Disadvantages of Top-Down Approach –

1. The cost and time taken in designing it and its maintenance are very high.

2. Bottom-up approach:
1. First, the data is extracted from external sources (the same as in the top-down
approach).

2. Then, the data goes through the staging area (as explained above) and is loaded into data
marts instead of the data warehouse. The data marts are created first and provide reporting
capability; each addresses a single business area.

3. These data marts are then integrated into the data warehouse.

This approach is given by Kimball as follows: data marts are created first and provide a thin view for
analysis, and the data warehouse is created after the complete set of data marts has been created.
Advantages of Bottom-Up Approach –
1. As the data marts are created first, reports are generated quickly.

2. More data marts can be accommodated here, and in this way the data warehouse
can be extended.

3. Also, the cost and time taken in designing this model are comparatively low.

Disadvantage of Bottom-Up Approach –

1. This model is not as strong as the top-down approach, as the dimensional view of the data
marts is not as consistent as it is in the approach above.

SQL Trigger
Trigger: A trigger is a stored procedure in the database which is automatically invoked whenever a
special event occurs in the database. For example, a trigger can be invoked when a row is inserted
into a specified table or when certain table columns are updated.
Syntax:
create trigger [trigger_name]

[before | after]
{insert | update | delete}
on [table_name]

[for each row]

[trigger_body]
Explanation of syntax:
1. create trigger [trigger_name]: Creates or replaces an existing trigger with the
trigger_name.
2. [before | after]: This specifies when the trigger will be executed.
3. {insert | update | delete}: This specifies the DML operation.
4. on [table_name]: This specifies the name of the table associated with the trigger.
5. [for each row]: This specifies a row-level trigger, i.e., the trigger will be executed for
each row being affected.
6. [trigger_body]: This provides the operation to be performed when the trigger is fired.
BEFORE and AFTER of Trigger:
BEFORE triggers run the trigger action before the triggering statement is run. AFTER triggers run
the trigger action after the triggering statement is run.
Example:
Consider a Student Report database in which students' marks are recorded. In such a schema,
create a trigger so that the total and percentage of the specified marks are automatically inserted
whenever a record is inserted.
Here, as the trigger must be invoked before the record is inserted, the BEFORE option is used.

Suppose the database Schema –


mysql> desc Student;

+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| tid   | int(4)      | NO   | PRI | NULL    | auto_increment |
| name  | varchar(30) | YES  |     | NULL    |                |
| subj1 | int(2)      | YES  |     | NULL    |                |
| subj2 | int(2)      | YES  |     | NULL    |                |
| subj3 | int(2)      | YES  |     | NULL    |                |
| total | int(3)      | YES  |     | NULL    |                |
| per   | int(3)      | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)

SQL trigger for the problem statement (in MySQL, the row being inserted is referenced via NEW):

create trigger stud_marks
before insert
on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = NEW.total * 60 / 100;

The above SQL statement creates a trigger in the student database so that whenever subject
marks are entered, the trigger computes the total and the percentage before the row is inserted
and stores them along with the entered values, i.e.,

mysql> insert into Student values(0, "ABCDE", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)

mysql> select * from Student;


+-----+-------+-------+-------+-------+-------+------+
| tid | name  | subj1 | subj2 | subj3 | total | per  |
+-----+-------+-------+-------+-------+-------+------+
| 100 | ABCDE |    20 |    20 |    20 |    60 |   36 |
+-----+-------+-------+-------+-------+-------+------+
1 row in set (0.00 sec)

In this way, triggers can be created and executed in databases.

Concurrency Control

Concurrency Control is the management procedure that is required for controlling concurrent execution
of the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent execution.

Concurrent Execution in DBMS

o In a multi-user system, multiple users can access and use the same database at one time, which
is known as concurrent execution of the database. It means that the same database is
accessed and operated on simultaneously by different users on a multi-user system.
o While working on database transactions, there arises the requirement of using the database
by multiple users for performing different operations, and in that case concurrent execution of
the database is performed.
o The simultaneous execution should be done in an interleaved manner, and no operation
should affect the other executing operations, thus maintaining the consistency of the
database. However, concurrent execution of transaction operations gives rise to several
challenging problems that need to be solved.

Problems with Concurrent Execution

In a database transaction, the two main operations are the READ and WRITE operations. These two
operations need to be managed during concurrent execution of the transactions, because if they
are not performed in a properly controlled interleaved manner, the data may become inconsistent.
The following problems occur with concurrent execution of the operations:

Problem 1: Lost Update Problems (W - W Conflict)

The problem occurs when two different database transactions perform read/write operations on the
same database items in an interleaved manner (i.e., concurrent execution) in a way that makes the
values of the items incorrect, hence making the database inconsistent.

For example:

Consider the following schedule, where two transactions TX and TY are performed on the same
account A whose balance is $300.
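
A compact timeline of this schedule, reconstructed from the description below (the original diagram is not included):

Time   TX                                        TY
t1     reads A = $300
t2     A := A - $50 = $250 (not yet written)
t3                                               reads A = $300
t4                                               A := A + $100 = $400 (not yet written)
t6     writes A = $250
t7                                               writes A = $400  (TX's value $250 is overwritten and lost)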

o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A, which becomes $250 (only deducted and
not updated/written).
o Alternately, at time t3, transaction TY reads the value of account A, which will still be $300,
because TX has not written the value yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only added but not
updated/written).
o At time t6, transaction TX writes the value of account A, which is updated to $250, as
TY has not written its value yet.
o Similarly, at time t7, transaction TY writes the value of account A, writing the value computed
at time t4, i.e., $400. The value written by TX is lost, i.e., $250 is lost.

Hence the data becomes incorrect, and the database is left inconsistent.

Dirty Read Problems (W-R Conflict)


The dirty read problem occurs when one transaction updates an item of the database and then the
transaction fails; before the update is rolled back, the updated database item is accessed by another
transaction. This is the read-write conflict between the two transactions.

For example:

Consider two transactions TX and TY performing read/write operations on
account A, where the available balance in account A is $300:

o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A, which will be read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value changes back to
$300 (as initially).
o But transaction TY has already read the value $350, which was never committed; this is a dirty
read and is therefore known as the Dirty Read Problem.

Unrepeatable Read Problem (R-W Conflict)

Also known as the Inconsistent Retrievals Problem, it occurs when, within a single transaction, two
different values are read for the same database item.

For example:

Consider two transactions, TX and TY, performing read/write operations on account A, which has
an available balance of $300:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, which then becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, which will be
read as $400.
o It means that within the same transaction TX, two different values of account A are read, i.e.,
$300 initially and, after the update made by transaction TY, $400. This is an unrepeatable
read and is therefore known as the Unrepeatable Read Problem.

Thus, in order to maintain consistency in the database and avoid such problems in
concurrent execution, management is needed, and that is where the concept of concurrency control
comes into play.

Concurrency Control

Concurrency Control is the working concept that is required for controlling and managing the
concurrent execution of database operations and thus avoiding the inconsistencies in the database. Thus,
for maintaining the concurrency of the database, we have the concurrency control protocols.

Concurrency Control Protocols

The concurrency control protocols ensure the atomicity, consistency, isolation, durability, and
serializability of the concurrent execution of database transactions. These
protocols are categorized as:

o Lock Based Concurrency Control Protocol


o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

We will understand and discuss each protocol one by one in our next sections.
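
As a minimal, hedged illustration of the lock-based idea (not part of the original notes; the accounts table and its columns are assumed), the lost-update schedule above can be prevented in SQL by locking the row before modifying it:

-- Transaction TX (and likewise TY) locks the row before reading it, so the
-- read-modify-write sequence cannot be interleaved with another writer.
START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 'A' FOR UPDATE;  -- acquires a row lock
UPDATE accounts SET balance = balance - 50 WHERE account_id = 'A';
COMMIT;  -- releases the lock; TY's own SELECT ... FOR UPDATE waits until this point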
Query Processing in DBMS
Query processing is the activity performed to extract data from the database. Query processing
takes several steps to fetch the data from the database. The steps involved are:

1. Parsing and translation


2. Optimization
3. Evaluation

The query processing works in the following way:

Parsing and Translation

Query processing includes certain activities for data retrieval. Initially, the given user query, written in
a high-level database language such as SQL, gets translated into expressions that can be
used further at the physical level of the file system. After this, the actual evaluation of the query and
a variety of query-optimizing transformations take place. Before processing a query, a
computer system needs the query to be expressed in a language readable and understandable by
humans, and SQL (Structured Query Language) is the most suitable choice for this. However, it is not
perfectly suitable for the internal representation of the query inside the system; relational algebra is
well suited for the internal representation of a query. The translation process in query processing is
similar to parsing a query. When a user executes a query, in order to generate the internal form of the
query, the parser in the system checks the syntax of the query, verifies the names of the relations in
the database, the tuples, and finally the required attribute values. The parser creates a tree of the query,
known as a 'parse tree', and then translates it into the form of relational algebra, replacing all uses of
views in the query with their definitions.

The working of query processing can be understood through the following example.

Suppose a user executes a query. As we have seen, there are various methods of extracting
data from the database. Suppose, in SQL, a user wants to fetch the salaries of employees whose salary
is greater than 10000. For doing this, the following query is used:
select salary from Employee where salary > 10000;

Thus, to make the system understand the user query, it needs to be translated into the form of relational
algebra. We can bring this query into relational algebra form in two equivalent ways:

o σsalary>10000 (πsalary (Employee))
o πsalary (σsalary>10000 (Employee))

After translating the given query, we can execute each relational algebra operation by using different
algorithms. So, in this way, a query processing begins its working.

Evaluation

In addition to the relational algebra translation, it is required to annotate the translated
relational algebra expression with instructions specifying how to evaluate each operation.
Thus, after translating the user query, the system executes a query evaluation plan.

Query Evaluation Plan

o In order to fully evaluate a query, the system needs to construct a query evaluation plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the particular
index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.
o A query execution engine is responsible for generating the output of the given query. It takes
the query execution plan, executes it, and finally makes the output for the user query.
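
As an illustrative, hedged sketch (not part of the original notes), most SQL systems expose the chosen evaluation plan through an EXPLAIN statement; for the Employee query above, a MySQL-style session might look like this, assuming an index named idx_salary exists on the salary column:

EXPLAIN SELECT salary FROM Employee WHERE salary > 10000;
-- The resulting plan shows, for example, whether the optimizer chose a full table
-- scan or a range scan over the idx_salary index, i.e., which evaluation primitive
-- will be used for the selection σsalary>10000.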

Optimization

o The cost of the query evaluation can vary for different types of queries. Although the system is
responsible for constructing the evaluation plan, the user does not need to write the query
in its most efficient form.
o Usually, a database system generates an efficient query evaluation plan, which minimizes its
cost. This task, performed by the database system, is known as query optimization.
o For optimizing a query, the query optimizer should have an estimated cost analysis of each
operation. This is because the overall operation cost depends on the memory allocated to several
operations, execution costs, and so on.

Finally, after selecting an evaluation plan, the system evaluates the query and produces the output of
the query.

Database Recovery Techniques in DBMS


Database systems, like any other computer system, are subject to failures, but the data stored in them
must be available as and when required. When a database fails, it must possess the facilities for fast
recovery. It must also have atomicity, i.e. either transactions are completed successfully and committed
(the effect is recorded permanently in the database) or the transaction should have no effect on the
database. There are both automatic and non-automatic ways of backing up data and recovering
from failure situations. The techniques used to recover the data lost due to system crashes,
transaction errors, viruses, catastrophic failures, incorrect command execution, etc. are database
recovery techniques. So, to prevent data loss, recovery techniques based on deferred update and
immediate update, or backing up data, can be used. Recovery techniques are heavily dependent upon
the existence of a special file known as a system log. It contains information about the start and end
of each transaction and any updates which occur during the transaction. The log keeps track of all
transaction operations that affect the values of database items. This information is needed to recover
from transaction failure.
The log is kept on disk. The main types of log entries are:
 start_transaction(T): This log entry records that transaction T starts its execution.
 read_item(T, X): This log entry records that transaction T reads the value of database
item X.
 write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of database item X from old_value to new_value. The old value is
sometimes known as the before-image of X, and the new value is known as the after-image
of X.
 commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
 abort(T): This records that transaction T has been aborted.
 checkpoint: A checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently on a storage disk. A checkpoint declares a point before
which the DBMS was in a consistent state, and all the transactions were committed.
A transaction T reaches its commit point when all its operations that access the database have been
executed successfully, i.e. the transaction has reached the point at which it will not abort (terminate
without completing). Once committed, the transaction is permanently recorded in the database.
Commitment always involves writing a commit entry to the log and writing the log to disk. At the
time of a system crash, the log is searched backwards for all transactions T that have written a
start_transaction(T) entry into the log but have not yet written a commit(T) entry; these transactions
may have to be rolled back to undo their effect on the database during the recovery process.
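
A small illustrative log (not from the original notes; the transactions, item names, and values are assumed) shows how this decision is made after a crash:

[start_transaction, T1]
[write_item, T1, X, old_value=100, new_value=80]
[commit, T1]
[start_transaction, T2]
[write_item, T2, Y, old_value=50, new_value=70]
<-- system crash occurs here -->

T1 has a commit entry, so its change to X is kept (redone if necessary). T2 has no commit entry, so its change to Y must be undone, restoring Y to 50.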
 Undoing – If a transaction crashes, then the recovery manager may undo the transaction,
i.e. reverse the operations of the transaction. This involves examining the log for each
entry write_item(T, X, old_value, new_value) of the transaction and setting the value of item X in the
database to old_value. There are two major techniques for recovery from non-
catastrophic transaction failures: deferred updates and immediate updates.
 Deferred update – This technique does not physically update the database on disk until
a transaction has reached its commit point. Before reaching commit, all transaction
updates are recorded in the local transaction workspace. If a transaction fails before
reaching its commit point, it will not have changed the database in any way, so UNDO
is not needed. It may be necessary to REDO the effect of the operations that are
recorded in the local transaction workspace, because their effect may not yet have been
written in the database. Hence, a deferred update is also known as the NO-UNDO/REDO
algorithm.
 Immediate update – In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However,
these operations are recorded in a log on disk before they are applied to the database,
making recovery still possible. If a transaction fails to reach its commit point, the effect
of its operations must be undone, i.e. the transaction must be rolled back; hence we
require both undo and redo. This technique is known as the UNDO/REDO algorithm.
 Caching/Buffering – In this technique, one or more disk pages that include the data items
to be updated are cached into main memory buffers and then updated in memory before
being written back to disk. A collection of in-memory buffers called the DBMS cache is
kept under the control of the DBMS for holding these buffers. A directory is used to keep
track of which database items are in the buffers. A dirty bit is associated with each buffer,
which is 0 if the buffer has not been modified and 1 if it has.
 Shadow paging – It provides atomicity and durability. A directory with n entries is
constructed, where the ith entry points to the ith database page on disk. When a
transaction begins executing, the current directory is copied into a shadow directory.
When a page is to be modified, a shadow page is allocated in which the changes are made,
and when it is ready to become durable, all pages that refer to the original are updated
to refer to the new replacement page.
 Backward Recovery – The terms "rollback" and "UNDO" can also refer to backward
recovery. When a backup of the data is not available and previous modifications need to
be undone, this technique can be helpful. With the backward recovery method, uncommitted
modifications are removed and the database is returned to its prior condition. All
adjustments made during the previous transaction are reversed during backward
recovery. In other words, it reprocesses valid transactions and undoes the erroneous
database updates.
 Forward Recovery – "Roll forward" and "REDO" refer to forward recovery.
When a database needs to be updated with all verified changes, this forward recovery
technique is helpful.
Changes of committed transactions recorded in the log are applied to the database to roll those
modifications forward. In other words, the database is restored using the preserved data
and the valid transactions counted from their past saves.
Some of the backup techniques are as follows (a short SQL sketch of all three follows this list):

 Full database backup – In this, the full database, including the data and the database meta
information needed to restore the whole database (including full-text catalogs), is
backed up at predefined points in time.
 Differential backup – It stores only the data changes that have occurred since the last
full database backup. When some data has changed many times since the last full database
backup, a differential backup stores the most recent version of the changed data. To restore
from it, the full database backup must be restored first.
 Transaction log backup – In this, all events that have occurred in the database, i.e. a
record of every single statement executed, are backed up. It is the backup of the transaction
log entries and contains all transactions that have happened to the database. Through
this, the database can be recovered to a specific point in time. It is even possible to
perform a backup from the transaction log if the data files are destroyed, without losing a
single committed transaction.
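
As a hedged illustration of these backup types (the database name SalesDB and the file paths are assumptions; the syntax shown is SQL Server's T-SQL, one common dialect that supports all three):

-- Full database backup
BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB_full.bak';

-- Differential backup: only the changes since the last full backup
BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB_diff.bak' WITH DIFFERENTIAL;

-- Transaction log backup: enables point-in-time recovery
BACKUP LOG SalesDB TO DISK = 'D:\backups\SalesDB_log.trn';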
What is a Web Database?
A web database is a system for storing information that can then be accessed via a website. For
example, an online community may have a database that stores the username, password, and other
details of all its members. The most commonly used database system for the internet is MySQL, due to
its integration with PHP, one of the most widely used server-side programming languages.
At its simplest level, a web database is a set of one or more tables that contain data. Each table has
different fields for storing information of various types. These tables can then be linked together in
order to manipulate data in useful or interesting ways. In many cases, a table will use a primary key,
which must be unique for each entry and allows for unambiguous selection of data.

A web database can be used for a range of different purposes. Each field in a table has to have a
defined data type. For example, numbers, strings, and dates can all be inserted into a web database.
Proper database design involves choosing the correct data type for each field in order to reduce
memory consumption and increase the speed of access. Although for small databases this often isn't
so important, big web databases can grow to millions of entries and need to be well designed to work
effectively.
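
A minimal sketch of one such table in MySQL (the members table and its columns are illustrative assumptions, not from the original text):

-- One table of a simple web database: each field has a defined data type,
-- and the primary key uniquely identifies every member.
CREATE TABLE members (
    member_id     INT AUTO_INCREMENT PRIMARY KEY,
    username      VARCHAR(30) NOT NULL UNIQUE,
    password_hash CHAR(60)    NOT NULL,   -- store a hash, never the raw password
    joined_on     DATE
);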

Web Server and Its Type


Web Server: A web server is a program which processes the network requests of users and serves
them the files that make up web pages. This exchange takes place using the Hypertext Transfer Protocol
(HTTP).
Basically, web servers are computers used to store the files which make up a website, and when a
client requests a certain website, the server delivers the requested website to the client. For example, you
want to open Facebook on your laptop and enter the URL in the address bar of the browser. The
laptop then sends an HTTP request to view the Facebook webpage to another computer known as the
web server. This computer (web server) contains all the files which make
up the website, like HTML pages, text, images, GIF files, etc. After processing the request, the web
server sends the requested website-related files to your computer, and then you can reach the website.
Different websites can be stored on the same or different web servers, but that doesn't affect the
actual website that you are seeing on your computer. The web server can be any software or
hardware but is usually software running on a computer. One web server can handle multiple
users at any given time, which is a necessity; otherwise there would have to be a web server for each
user, which, considering the current world population, is practically impossible. A web server is never
disconnected from the internet, because if it were, it would not be able to receive any requests and
therefore could not process them.
There are many web servers available in the market both free and paid. Some of them are described
below:
 Apache HTTP Server: It is the most popular web server, and about 60 percent of the
world's web server machines run it. The Apache HTTP web server was
developed by the Apache Software Foundation. It is open-source software, which
means that we can access and make changes to its code and mould it according to our
preferences. The Apache web server can be installed and operated easily on almost all
operating systems like Linux, macOS, Windows, etc.

 Microsoft Internet Information Services (IIS): IIS is
a high-performing web server developed by Microsoft. It is strongly integrated with the
operating system and is therefore relatively easier to administer. As it is developed by
Microsoft, it has a good customer support system, which is easy to access if we
encounter any issue with the server. It has all the features of the Apache HTTP Server
except that it is not open-source software, and therefore its code is inaccessible, which
means that we cannot make changes in the code to suit our needs. It can be easily
installed on any Windows device.

 Lighttpd: Lighttpd is pronounced as ‘lighty’. It currently runs about 0.1 percent of the
world’s websites. Lighttpd has a small CPU load and is therefore comparatively easier
to run. It has a low memory footprint and hence, in comparison to the other web servers,
requires less memory to run, which is always an advantage. It also has speed
optimizations, which means that we can optimize or tune its speed according to our
requirements. It is open-source software, which means that we can access its code
and add changes to it according to our needs and then upload our own module (the
changed code).

 Jigsaw Server: Jigsaw has been written in the Java language, and it can run CGI
(Common Gateway Interface) scripts as well as PHP programs. It is not a full-fledged
server and was developed as an experimental server to demonstrate new web
protocols. It is open-source software, which means that we can access its code and
add changes to it according to our needs and then upload our own module (the changed
code). It can be installed on any device, provided that the device supports the Java
language and modifications in Java.

 Sun Java System: The Sun Java System web server supports various languages, scripts,
and technologies required for Web 2.0 such as Python, PHP, etc. It is not open-source
software, and therefore its code is inaccessible, which means that we cannot make
changes in the code to suit our needs.

XML Database
An XML database is a data persistence software system used for storing huge amounts of information in
XML format. It provides a secure place to store XML documents.

You can query the stored data using XQuery, and export and serialize it into a desired format. XML
databases are usually associated with document-oriented databases.

Types of XML databases

There are two types of XML databases.

1. XML-enabled database
2. Native XML database (NXD)

XML-enabled Database

An XML-enabled database works just like a relational database. It is like an extension provided for the
conversion of XML documents. In this database, data is stored in tables, in the form of rows and columns.

Native XML Database

A native XML database is used to store large amounts of data. Instead of a table format, a native XML
database is based on a container format. You can query the data using XPath expressions.

A native XML database is preferred over an XML-enabled database because it is highly capable of
storing, maintaining and querying XML documents.

Let's take an example of XML database:

<?xml version="1.0"?>
<contact-info>
    <contact1>
        <name>Vimal Jaiswal</name>
        <company>SSSIT.org</company>
        <phone>(0120) 4256464</phone>
    </contact1>
    <contact2>
        <name>Mahesh Sharma</name>
        <company>SSSIT.org</company>
        <phone>09990449935</phone>
    </contact2>
</contact-info>

In the above example, a root element named contact-info is created and holds the contacts (contact1 and
contact2). Each one contains three child elements: name, company, and phone.
Data Warehouse Architecture
A data warehouse architecture is a method of defining the overall architecture of data communication,
processing, and presentation that exists for end-client computing within the enterprise. Each data
warehouse is different, but all are characterized by standard vital components.

Production applications such as payroll, accounts payable, product purchasing, and inventory control
are designed for online transaction processing (OLTP). Such applications gather detailed data from
day-to-day operations.

Data warehouse applications are designed to support users' ad-hoc data requirements, an activity
recently dubbed online analytical processing (OLAP). These include applications such as forecasting,
profiling, summary reporting, and trend analysis.

Production databases are updated continuously, either by hand or via OLTP applications. In contrast,
a warehouse database is updated from the operational systems periodically, usually during off-hours. As
OLTP data accumulates in production databases, it is regularly extracted, filtered, and then loaded into
a dedicated warehouse server that is accessible to users. As the warehouse is populated, it must be
restructured: tables de-normalized, data cleansed of errors and redundancies, and new fields and keys
added to reflect the needs of the user for sorting, combining, and summarizing data.

Data warehouses and their architectures vary depending upon the elements of an organization's
situation.

Three common architectures are:

o Data Warehouse Architecture: Basic


o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts

Data Warehouse Architecture: Basic


Operational System

In data warehousing, an operational system refers to a system that is used to
process the day-to-day transactions of an organization.

Flat Files

A Flat file system is a system of files in which transactional data is stored, and every file in the
system must have a different name.

Meta Data

Metadata is a set of data that defines and gives information about other data.

Metadata is used in a data warehouse for a variety of purposes, including:

Metadata summarizes necessary information about data, which can make finding and working with
particular instances of data easier. For example, author, date built, date changed, and
file size are examples of very basic document metadata.

Metadata is used to direct a query to the most appropriate data source.

Lightly and highly summarized data

This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated)
data generated by the warehouse manager.

The goal of the summarized information is to speed up query performance. The summarized data
is updated continuously as new information is loaded into the warehouse.

End-User access Tools


The principal purpose of a data warehouse is to provide information to the business managers for
strategic decision-making. These customers interact with the warehouse using end-client access tools.

The examples of some of the end-user access tools can be:

o Reporting and Query Tools


o Application Development Tools
o Executive Information Systems Tools
o Online Analytical Processing Tools
o Data Mining Tools

Data Warehouse Architecture: With Staging Area

We must clean and process operational data before putting it into the warehouse.

We can do this programmatically, although most data warehouses use a staging area (a place where
data is processed before entering the warehouse).

A staging area simplifies data cleansing and consolidation for operational data coming from multiple
source systems, especially for enterprise data warehouses where all relevant data of an enterprise is
consolidated.

The data warehouse staging area is a temporary location where records from the source systems are copied.
Data Warehouse Architecture: With Staging Area and Data Marts

We may want to customize our warehouse's architecture for multiple groups within our organization.

We can do this by adding data marts. A data mart is a segment of a data warehouse that provides
information for reporting and analysis on a section, unit, department or operation in the company, e.g.,
sales, payroll, production, etc.

For example, purchasing, sales, and stock can be separated into different data marts. In such a setup, a
financial analyst might want to analyze historical data for purchases and sales, or mine historical
information to make predictions about customer behavior.

Properties of Data Warehouse Architectures

The following architecture properties are necessary for a data warehouse system:
1. Separation: Analytical and transactional processing should be kept apart as much as possible.

2. Scalability: Hardware and software architectures should be simple to upgrade as the data volume,
which has to be managed and processed, and the number of users' requirements, which have to be met,
progressively increase.

3. Extensibility: The architecture should be able to accommodate new operations and technologies
without redesigning the whole system.

4. Security: Monitoring of accesses is necessary because of the strategic data stored in the data
warehouse.

5. Administerability: Data warehouse management should not be complicated.

Types of Data Warehouse Architectures

Single-Tier Architecture
Single-tier architecture is not frequently used in practice. Its purpose is to minimize the amount of
data stored; to reach this goal, it removes data redundancies.

In this architecture, the only layer physically available is the source layer. Data warehouses
are virtual: the data warehouse is implemented as a multidimensional view of
operational data created by specific middleware, or an intermediate processing layer.

The vulnerability of this architecture lies in its failure to meet the requirement for separation between
analytical and transactional processing. Analysis queries are issued against operational data after the
middleware interprets them. In this way, queries affect transactional workloads.

Two-Tier Architecture

The requirement for separation plays an essential role in defining the two-tier architecture for a data
warehouse system.
Although it is typically called a two-layer architecture to highlight the separation between physically
available sources and the data warehouse, it in fact consists of four subsequent data flow stages:

1. Source layer: A data warehouse system uses heterogeneous sources of data. The data is stored
initially in corporate relational databases or legacy databases, or it may come from an
information system outside the corporate walls.
2. Data staging: The data stored in the sources should be extracted, cleansed to remove
inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one standard
schema. The so-named Extraction, Transformation, and Loading (ETL) tools can combine
heterogeneous schemata, extract, transform, cleanse, validate, filter, and load source data into
a data warehouse.
3. Data warehouse layer: Information is saved to one logically centralized individual repository:
the data warehouse. The data warehouse can be accessed directly, but it can also be used as a
source for creating data marts, which partially replicate data warehouse contents and are
designed for specific enterprise departments. Meta-data repositories store information on
sources, access procedures, data staging, users, data mart schemas, and so on.
4. Analysis: In this layer, integrated data is efficiently and flexibly accessed to issue reports,
dynamically analyze information, and simulate hypothetical business scenarios. It should
feature aggregate information navigators, complex query optimizers, and customer-friendly
GUIs.

Three-Tier Architecture

The three-tier architecture consists of the source layer (containing multiple source systems), the
reconciled layer and the data warehouse layer (containing both data warehouses and data marts). The
reconciled layer sits between the source data and the data warehouse.
The main advantage of the reconciled layer is that it creates a standard reference data model for a
whole enterprise. At the same time, it separates the problems of source data extraction and integration
from those of data warehouse population. In some cases, the reconciled layer is also used directly to
better accomplish some operational tasks, such as producing daily reports that cannot be satisfactorily
prepared using the corporate applications, or generating data flows to feed external processes
periodically so as to benefit from cleaning and integration.

This architecture is especially useful for extensive, enterprise-wide systems. A disadvantage of this
structure is the extra file storage space used by the extra redundant reconciled layer. It also moves
the analytical tools a little further away from being real-time.

MultiDimensional Data Model


The multidimensional data model is a method used for organizing data in the database, along
with good arrangement and assembly of the contents of the database.
The multidimensional data model allows customers to ask analytical questions associated
with market or business trends, unlike relational databases which allow customers to access data only
in the form of queries. It allows users to rapidly receive answers to their requests by
creating and examining the data comparatively fast.
OLAP (online analytical processing) and data warehousing use multidimensional databases. The model
is used to show multiple dimensions of the data to users.
It represents data in the form of data cubes. Data cubes allow us to model and view the data from many
dimensions and perspectives. A cube is defined by dimensions and facts and is represented by a fact
table. Facts are numerical measures, and fact tables contain the measures of the related dimension
tables or the names of the facts.
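
A minimal SQL sketch of how facts and dimensions can be laid out as tables (the sales example and all table and column names are illustrative assumptions, not part of the original notes):

-- Dimension tables describe the perspectives (time, item, location).
CREATE TABLE dim_time     (time_id INT PRIMARY KEY, quarter VARCHAR(2), sales_year INT);
CREATE TABLE dim_item     (item_id INT PRIMARY KEY, item_name VARCHAR(30));
CREATE TABLE dim_location (loc_id  INT PRIMARY KEY, city VARCHAR(30));

-- The fact table holds the numerical measure (sales amount) keyed by the dimensions.
CREATE TABLE fact_sales (
    time_id INT REFERENCES dim_time(time_id),
    item_id INT REFERENCES dim_item(item_id),
    loc_id  INT REFERENCES dim_location(loc_id),
    amount  DECIMAL(12,2)
);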
Working on a Multidimensional Data Model

The multidimensional data model works on the basis of pre-decided steps.
The following stages should be followed by every project for building a multidimensional data
model :

Stage 1 : Assembling data from the client : In the first stage, a multidimensional data model collects
the correct data from the client. Mostly, software professionals give the client clarity about the
range of data which can be obtained with the selected technology and collect the complete data in detail.
Stage 2 : Grouping different segments of the system : In the second stage, the multidimensional
data model recognizes and classifies all the data into the respective sections they belong to, and also
makes the model problem-free to apply step by step.
Stage 3 : Noticing the different proportions : The third stage is the basis on which the design
of the system rests. In this stage, the main factors are recognized according to the user's point of
view. These factors are also known as "dimensions".
Stage 4 : Preparing the actual-time factors and their respective qualities : In the fourth stage, the
factors which were recognized in the previous step are used further for identifying the related qualities.
These qualities are also known as "attributes" in the database.
Stage 5 : Finding the actuality of factors which are listed previously and their qualities : In the
fifth stage, a multidimensional data model separates and differentiates the facts from the
factors which were collected by it. These facts play a significant role in the arrangement of a
multidimensional data model.
Stage 6 : Building the Schema to place the data, with respect to the information collected from
the steps above : In the sixth stage, on the basis of the data which was collected previously, a schema
is built.
For Example :
1. Let us take the example of a firm. The revenue cost of a firm can be recognized on the basis of
different factors such as geographical location of firm’s workplace, products of the firm,
advertisements done, time utilized to flourish a product, etc.
2. Let us take the example of the data of a factory which sells products per quarter in Bangalore.
The data is represented in the table given below :

2D factory data
In the above given presentation, the factory’s sales for Bangalore are, for the time dimension, which
is organized into quarters and the dimension of items, which is sorted according to the kind of item
which is sold. The facts here are represented in rupees (in thousands).

Now, if we desire to view the data of the sales in a three-dimensional table, then it is represented in
the diagram given below. Here the data of the sales is represented as a two dimensional table. Let
us consider the data according to item, time and location (like Kolkata, Delhi, Mumbai). Here is the
table :

3D data representation as 2D
This data can also be represented conceptually in the form of three dimensions, as shown in the
image below:
3D data representation

Advantages of Multi Dimensional Data Model

The following are the advantages of a multi-dimensional data model :


 A multidimensional data model is easy to handle.
 It is easy to maintain.
 Its performance is better than that of normal databases (e.g. relational databases).
 The representation of data is better than in traditional databases, because multidimensional
databases are multi-viewed and carry different types of factors.
 It is workable on complex systems and applications, contrary to simple one-
dimensional database systems.
 The compatibility of this type of database is an advantage for projects having lower
bandwidth for maintenance staff.

Disadvantages of Multi Dimensional Data Model

The following are the disadvantages of a Multi Dimensional Data Model :


 The multidimensional data model is slightly complicated in nature, and it requires
professionals to recognize and examine the data in the database.
 While a multidimensional data model is in operation, if the system crashes, there is a
great effect on the working of the system.
 It is complicated in nature, due to which the databases are generally dynamic in design.
 The path to achieving the end product is complicated most of the time.
 As the multidimensional data model involves complicated systems with a large
number of databases, the system is very insecure when there is a security
breach.

OLAP Operations in DBMS


OLAP stands for Online Analytical Processing. It is a software technology that allows users
to analyze information from multiple database systems at the same time. It is based on the
multidimensional data model and allows the user to query multidimensional data (e.g. Delhi ->
2018 -> Sales data). OLAP databases are divided into one or more cubes, and these cubes are known
as hypercubes.

OLAP operations:

There are five basic analytical operations that can be performed on an OLAP cube:

1. Drill down: In the drill-down operation, less detailed data is converted into more highly
detailed data. It can be done by:
 Moving down in the concept hierarchy
 Adding a new dimension
In a cube with Time, Item and Location dimensions, the drill-down operation is performed
by moving down in the concept hierarchy of the Time dimension (Quarter -> Month).

2. Roll up: It is just the opposite of the drill-down operation. It performs aggregation on the
OLAP cube. It can be done by:
 Climbing up in the concept hierarchy
 Reducing the dimensions
In such a cube, the roll-up operation is performed by climbing up in the concept
hierarchy of the Location dimension (City -> Country); an SQL sketch of this operation
follows this list.

3. Dice: It selects a sub-cube from the OLAP cube by selecting two or more dimensions.
In the example cube, a sub-cube is selected by selecting the following
dimensions with criteria:
 Location = “Delhi” or “Kolkata”
 Time = “Q1” or “Q2”
 Item = “Car” or “Bus”

4. Slice: It selects a single dimension from the OLAP cube, which results in the creation of a
new sub-cube. In the example cube, a slice is performed on the
dimension Time = “Q1”.

5. Pivot: It is also known as the rotation operation, as it rotates the current view to get a new
view of the representation. In the sub-cube obtained after the slice operation,
performing the pivot operation gives a new view of it.
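
As a hedged SQL sketch of the roll-up idea (the fact_sales and dimension tables from the multidimensional data model section above are assumed), MySQL's GROUP BY ... WITH ROLLUP aggregates sales from the city level up to a grand total:

SELECT l.city, t.quarter, SUM(f.amount) AS total_sales
FROM fact_sales f
JOIN dim_location l ON f.loc_id  = l.loc_id
JOIN dim_time     t ON f.time_id = t.time_id
GROUP BY l.city, t.quarter WITH ROLLUP;
-- Rows with NULL in quarter (or in both grouping columns) are the rolled-up
-- subtotals per city and the grand total, i.e. a climb up the concept hierarchy.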
