Unit 5 ADBMS
Course/Semester: MCA-I
UNIT-5
A trigger is a procedure which is automatically invoked by the DBMS in response to changes to the
database, and is specified by the database administrator (DBA). A database with a set of associated
triggers is generally called an active database.
Parts of trigger
A trigger has three parts: the event (the database operation, such as INSERT, UPDATE, or DELETE, that activates the trigger), the condition (an optional test that must hold for the action to run), and the action (the procedure executed when the trigger fires).
Use of trigger
Triggers are commonly used to enforce integrity rules that cannot be expressed as simple declarative constraints, to maintain derived or audit data automatically, and to log or restrict changes to sensitive tables.
Types of triggers
The different types of triggers are explained below −
Statement-level trigger − It is fired only once per DML statement, irrespective of the number of rows affected by the statement. Statement-level triggers are the default type of trigger.
Before-trigger − At the time of defining a trigger we can specify whether the trigger is to be fired before a command like INSERT, DELETE, or UPDATE is executed, or after the command is executed. Before-triggers are typically used to check the validity of data before the action is performed. For instance, we can use a before-trigger to prevent deletion of rows if deletion should not be allowed in a given case.
After-trigger − It is fired after the triggering action is completed. For example, if the trigger is associated with the INSERT command then it is fired after the row is inserted into the table.
Row-level trigger − It is fired once for each row that is affected by the DML command. For example, if an UPDATE command updates 150 rows then a row-level trigger is fired 150 times, whereas a statement-level trigger is fired only once. A sketch contrasting statement-level and row-level triggers appears after this list.
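A minimal sketch contrasting the two granularities, written in Oracle-style SQL; the employee and audit_log tables and their columns are illustrative assumptions, not part of these notes:

-- Statement-level trigger: fires once per UPDATE statement,
-- no matter how many rows the statement touches.
CREATE OR REPLACE TRIGGER emp_stmt_audit
AFTER UPDATE ON employee
BEGIN
    INSERT INTO audit_log(message, logged_at)
    VALUES ('employee table updated', SYSDATE);
END;
/

-- Row-level trigger: FOR EACH ROW makes it fire once per affected row.
CREATE OR REPLACE TRIGGER emp_row_audit
AFTER UPDATE ON employee
FOR EACH ROW
BEGIN
    INSERT INTO audit_log(message, logged_at)
    VALUES ('row for employee ' || :OLD.emp_id || ' updated', SYSDATE);
END;
/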
To create a database trigger, we use the CREATE TRIGGER command. The details to be given at the time of creating a trigger (its name, timing, triggering event, table, and body) are covered under the CREATE TRIGGER syntax in the SQL Trigger section later in this unit.
An ECA (event-condition-action) rule is a specialized type of rule that conditionally runs actions based on events. For Entity ECA rules the events are the various find and modify operations you can do with a record. Set any of these attributes (of the eeca element) to true to trigger the EECA rule on the corresponding operation: on-create, on-update, on-delete, on-find-one, on-find-list, on-find-iterator, on-find-count.
By default the EECA rule will run after the entity operation. To have it run before, set the run-before attribute to true. There is also a run-on-error attribute which defaults to false; if set to true, the EECA rule will be triggered even if there is an error in the entity operation.
When the actions run, the context will be whatever context the service was run in, plus the entity field values passed into the operation, for convenience in using the values. There are also special context fields added:
entityValue: A Map with the field values passed into the entity operation. This may not include all field values that are populated in the database for the record. To fill in the field values that are not passed in from the database record, set the eeca.get-entire-entity attribute to true.
originalValue: If the eeca.get-original-value attribute is set to true and the EECA rule runs before the entity operation (run-before=true), this will be an EntityValue object representing the original (current) value in the database.
eecaOperation: A String representing the operation that triggered the EECA rule, basically the on-* attribute name without the "on-".
The condition element is the same condition as used in XML Actions and may contain expression and
compare elements, combined as needed with or, and, and not elements.
The actions element is the same as the actions elements in service definitions, screens, forms, etc. It contains an XML Actions script. See the Overview of XML Actions section for more information.
Data Warehousing
A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports
analytical reporting, structured and/or ad hoc queries and decision making. This tutorial adopts a step-
by-step approach to explain all the necessary concepts of data warehousing.
Background
A Database Management System (DBMS) stores data in the form of tables, is usually designed with the ER model, and its goal is to support day-to-day transactions with the ACID properties. For example, the DBMS of a college has tables for students, faculty, etc.
A Data Warehouse is separate from a DBMS: it stores a huge amount of data, typically collected from multiple heterogeneous sources like files, DBMSs, etc. The goal is to produce statistical results that may help in decision making. For example, a college might want quick answers to questions such as how the placement of CS students has improved over the last 10 years, in terms of salaries, counts, etc.
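As an illustration, such a question could be answered with an aggregate query over the warehouse; the placements table and its columns here are assumptions made for this sketch:

-- Number of CS students placed and their average salary, per year
SELECT year,
       COUNT(*)    AS students_placed,
       AVG(salary) AS avg_salary
FROM placements
WHERE branch = 'CS'
  AND year BETWEEN 2014 AND 2023      -- an illustrative ten-year window
GROUP BY year
ORDER BY year;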
Need for Data Warehouse
An ordinary database can store MBs to GBs of data, and that too for a specific purpose. For storing data of TB size, storage shifts to a Data Warehouse. Besides this, a transactional database does not lend itself to analytics. To perform analytics effectively, an organization keeps a central Data Warehouse to closely study its business by organizing, understanding, and using its historic data for taking strategic decisions and analyzing trends.
Benefits of Data Warehouse:
1. Better business analytics: A data warehouse plays an important role in every business by storing and analysing all the past data and records of the company, which can further improve the company's understanding and analysis of its data.
2. Faster queries: A data warehouse is designed to handle large queries, so it runs such queries faster than an operational database.
3. Improved data quality: In the data warehouse, the data gathered from different sources is stored and analysed; the warehouse does not interfere with or add data by itself, so the quality of the data is maintained, and if any data-quality issue arises, the data warehouse team will resolve it.
4. Historical insight: The warehouse stores all the historical data, containing details about the business, so that one can analyse it at any time and extract insights from it.
Data Warehouse vs DBMS
A DBMS (an OLTP system) holds current, detailed, normalized data and is optimized for many small read/write transactions, while a data warehouse (an OLAP system) holds historical, integrated, often de-normalized data from many sources and is optimized for complex, read-only analytical queries.
Example Applications of Data Warehousing
Data Warehousing can be applied anywhere where we have a huge amount of data and we want to
see statistical results that help in decision making.
Social Media Websites: Social networking websites like Facebook, Twitter, LinkedIn, etc. are based on analyzing large data sets. These sites gather data related to members, groups, locations, etc., and store it in a single central repository. Because of the large amount of data, a data warehouse is needed to implement this.
Banking: Most banks these days use warehouses to see the spending patterns of account/card holders. They use this to provide them with special offers, deals, etc.
Government: Governments use data warehouses to store and analyze tax payments, which are used to detect tax evasion.
There can be many more applications in different sectors like E-Commerce, telecommunications,
Transportation Services, Marketing and Distribution, Healthcare, and Retail.
1. Top-down approach:
1. External Sources –
External sources are the sources from which the data is gathered, irrespective of its type; the data can be structured, semi-structured or unstructured.
2. Stage Area –
Since the data extracted from the external sources does not follow a particular format, it needs to be validated before being loaded into the datawarehouse. For this purpose, it is recommended to use an ETL tool (see the sketch after this list).
E (Extract): Data is extracted from the external data source.
T (Transform): Data is transformed into the standard format.
L (Load): Data is loaded into the datawarehouse after it has been transformed into the standard format.
3. Data-warehouse –
After cleansing, the data is stored in the datawarehouse as a central repository. It actually stores the meta data, and the actual data gets stored in the data marts. Note that the datawarehouse stores the data in its purest form in this top-down approach.
4. Data Marts –
A data mart is also a part of the storage component. It stores the information of a particular function of an organisation, handled by a single authority. There can be as many data marts in an organisation as there are functions. We can also say that a data mart contains a subset of the data stored in the datawarehouse.
5. Data Mining –
The practice of analysing the big data present in the datawarehouse is data mining. It is used to find hidden patterns present in the database or in the datawarehouse with the help of data mining algorithms.
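A minimal SQL sketch of the load and data-mart steps described above; the staging_sales and warehouse_sales tables and their columns are assumptions made for illustration:

-- Load (the L of ETL): move cleansed, transformed rows from the
-- staging area into the central warehouse table.
INSERT INTO warehouse_sales (sale_date, region, product, amount)
SELECT CAST(sale_date AS DATE),          -- transform: normalise the date format
       UPPER(TRIM(region)),              -- transform: standardise region codes
       product,
       amount
FROM staging_sales
WHERE amount IS NOT NULL;                -- cleanse: drop incomplete records

-- A data mart as a subset of the warehouse, for a single business function.
CREATE VIEW sales_mart_north AS
SELECT sale_date, product, amount
FROM warehouse_sales
WHERE region = 'NORTH';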
This approach is described by Inmon as: the datawarehouse is a central repository for the complete organisation, and data marts are created from it after the complete datawarehouse has been created.
Advantages of Top-Down Approach –
1. Since the data marts are created from the datawarehouse, they provide a consistent dimensional view of the data.
2. Also, this model is considered the strongest model for business changes. That is why big organisations prefer to follow this approach.
2. Bottom-up approach:
1. First, the data is extracted from external sources (the same as happens in the top-down approach).
2. Then, the data goes through the staging area (as explained above) and is loaded into data marts instead of the datawarehouse. The data marts are created first and provide reporting capability; each addresses a single business area.
This approach is described by Kimball as: data marts are created first and provide a thin view for analysis, and the datawarehouse is created after the complete set of data marts has been created.
Advantages of Bottom-Up Approach –
1. As the data marts are created first, reports are generated quickly.
2. We can accommodate more data marts here, and in this way the datawarehouse can be extended.
3. Also, the cost and time taken in designing this model are comparatively low.
SQL Trigger
Trigger: A trigger is a stored procedure in a database which is automatically invoked whenever a special event occurs in the database. For example, a trigger can be invoked when a row is inserted into a specified table or when certain table columns are being updated.
Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
Explanation of syntax:
1. create trigger [trigger_name]: Creates or replaces an existing trigger with the
trigger_name.
2. [before | after]: This specifies when the trigger will be executed.
3. {insert | update | delete}: This specifies the DML operation.
4. on [table_name]: This specifies the name of the table associated with the trigger.
5. [for each row]: This specifies a row-level trigger, i.e., the trigger will be executed for
each row being affected.
6. [trigger_body]: This provides the operation to be performed when the trigger is fired.
BEFORE and AFTER of Trigger:
BEFORE triggers run the trigger action before the triggering statement is run. AFTER triggers run
the trigger action after the triggering statement is run.
Example:
Given Student Report Database, in which student marks assessment is recorded. In such schema,
create a trigger so that the total and percentage of specified marks is automatically inserted
whenever a record is insert.
Here, as trigger will invoke before record is inserted so, BEFORE Tag can be used.
(Structure of the Student table: 7 fields, namely an id, the student's name, three subject-mark columns, and the computed total and per columns.)
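The CREATE TRIGGER statement itself is not reproduced in these notes; a sketch that is consistent with the output shown below (the column names subj1, subj2, subj3, total and per are assumptions) is:

create trigger stud_marks
before insert on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 60 / 100;

With such a trigger in place, the insert below results in total = 60 and per = 36 without the application supplying them.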
A trigger like this, created in the student database, means that whenever subject marks are entered, before the data is inserted into the database, the trigger computes those two values and inserts them along with the entered values, i.e.,
mysql> insert into Student values(0, "ABCDE", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)
+-----+-------+-------+-------+-------+-------+------+
| 100 | ABCDE | 20 | 20 | 20 | 60 | 36 |
+-----+-------+-------+-------+-------+-------+------+
1 row in set (0.00 sec)
Concurrency Control
Concurrency Control is the management procedure that is required for controlling concurrent execution
of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent execution.
o In a multi-user system, multiple users can access and use the same database at one time, which
is known as the concurrent execution of the database. It means that the same database is
executed simultaneously on a multi-user system by different users.
o While working on the database transactions, there occurs the requirement of using the database
by multiple users for performing different operations, and in that case, concurrent execution of
the database is performed.
o The point is that this simultaneous execution should be done in an interleaved manner, and no operation should affect the other executing operations, thus maintaining the consistency of the database. However, when transaction operations are executed concurrently, several challenging problems arise that need to be solved.
In a database transaction, the two main operations are READ and WRITE. These two operations need to be managed in the concurrent execution of transactions, because if they are interleaved without control, the data may become inconsistent. The following problems occur with the concurrent execution of operations:
Lost Update Problem
The problem occurs when two different database transactions perform read/write operations on the same database items in an interleaved manner (i.e., concurrent execution) in such a way that the values of the items become incorrect, making the database inconsistent.
For example:
Consider the below diagram where two transactions TX and TY are performed on the same account A, where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A, which becomes $250 (only deducted and not updated/written).
o Alternately, at time t3, transaction TY reads the value of account A, which will still be $300 because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only added but not updated/written).
o At time t6, transaction TX writes the value of account A, which will be updated as $250, as TY didn't write its value yet.
o Similarly, at time t7, transaction TY writes the value of account A, so it writes the value computed at time t4, i.e., $400. It means the value written by TX is lost, i.e., $250 is lost.
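The same lost-update schedule can be sketched as two interleaved SQL sessions (the account table and its columns are assumptions for illustration):

START TRANSACTION;                                  -- TX begins
SELECT balance FROM account WHERE id = 'A';         -- TX reads 300
START TRANSACTION;                                  -- TY begins (in a second session)
SELECT balance FROM account WHERE id = 'A';         -- TY also reads 300
UPDATE account SET balance = 250 WHERE id = 'A';    -- TX writes 300 - 50
COMMIT;                                             -- TX commits
UPDATE account SET balance = 400 WHERE id = 'A';    -- TY writes 300 + 100, overwriting TX
COMMIT;                                             -- TY commits: TX's update ($250) is lost

Locking the row when it is first read (SELECT ... FOR UPDATE) or using a stricter isolation level forces TY to wait for TX, which prevents the lost update.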
Dirty Read Problem
This problem occurs when one transaction reads a value written by another transaction that has not yet committed; if the writing transaction later fails and rolls back, the reading transaction has used a value that never really existed in the database.
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations on account A, where the available balance in account A is $300.
Unrepeatable Read Problem
Also known as the Inconsistent Retrievals Problem, it occurs when, within a single transaction, two different values are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account A, having
an available balance = $300. The diagram is shown below:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available balance, which then becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, which will be read as $400.
o It means that within the same transaction TX, it reads two different values of account A, i.e., $300 initially and, after the update made by transaction TY, $400. It is an unrepeatable read and is therefore known as the Unrepeatable Read problem.
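A sketch of how the application can ask the DBMS's concurrency control to prevent this, by requesting a stricter isolation level (MySQL-style syntax; the account table and columns are the same assumptions as in the earlier sketch):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT balance FROM account WHERE id = 'A';   -- TX reads 300
-- even if TY updates and commits a new balance in the meantime ...
SELECT balance FROM account WHERE id = 'A';   -- TX still sees 300: the read is repeatable
COMMIT;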
Thus, in order to maintain consistency in the database and avoid such problems that take place in
concurrent execution, management is needed, and that is where the concept of Concurrency Control
comes into role.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing the
concurrent execution of database operations and thus avoiding the inconsistencies in the database. Thus,
for maintaining the concurrency of the database, we have the concurrency control protocols.
We will understand and discuss each protocol one by one in our next sections.
Query Processing in DBMS
Query Processing is the activity performed in extracting data from the database. In query processing, it
takes various steps for fetching the data from the database. The steps involved are:
Query processing includes certain activities for data retrieval. Initially, the user's query is given in a high-level database language such as SQL. It then gets translated into expressions that can be used at the physical level of the file system. After this, the actual evaluation of the query and a variety of query-optimizing transformations take place. SQL (Structured Query Language) is well suited for humans, but it is not perfectly suitable for the internal representation of the query inside the system; relational algebra is well suited for that internal representation. The translation step in query processing is similar to parsing a query. When a user executes a query, the parser in the system checks the syntax of the query, verifies the names of the relations in the database, the tuples, and finally the required attribute values, in order to generate the internal form of the query. The parser creates a tree of the query, known as a 'parse tree', and then translates it into relational algebra form. In doing so, it also replaces all uses of views appearing in the query.
Thus, we can understand the working of a query processing in the below-described diagram:
Suppose a user executes a query. As we have learned that there are various methods of extracting the
data from the database. In SQL, a user wants to fetch the records of the employees whose salary is
greater than or equal to 10000. For doing this, the following query is undertaken:
select emp_name from Employee where salary>10000;
Thus, to make the system understand the user query, it needs to be translated into relational algebra. We can write this query in relational algebra form as:
π emp_name (σ salary > 10000 (Employee))
After translating the given query, we can execute each relational algebra operation by using different
algorithms. So, in this way, a query processing begins its working.
Evaluation
For this, in addition to the relational algebra translation, it is required to annotate the translated relational algebra expression with the instructions used for specifying and evaluating each operation.
Thus, after translating the user query, the system executes a query evaluation plan.
o In order to fully evaluate a query, the system needs to construct a query evaluation plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the particular
index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.
o A query execution engine is responsible for generating the output of the given query. It takes
the query execution plan, executes it, and finally makes the output for the user query.
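In practice, most SQL systems expose the chosen query evaluation (execution) plan to the user; a minimal sketch, whose exact output format varies by system:

EXPLAIN
SELECT emp_name FROM Employee WHERE salary > 10000;
-- The output describes the plan: the access method chosen for Employee,
-- whether an index on salary is used, and the estimated number of rows.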
Optimization
o The cost of query evaluation can vary for different types of queries. Since the system is responsible for constructing the evaluation plan, the user does not need to write the query efficiently.
o Usually, a database system generates an efficient query evaluation plan, which minimizes its cost. This type of task, performed by the database system, is known as Query Optimization.
o For optimizing a query, the query optimizer should have an estimated cost analysis of each
operation. It is because the overall operation cost depends on the memory allocations to several
operations, execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and produces the output of
the query.
Database Backup
Full database backup – In this, the full database, including the data and the database meta information needed to restore the whole database (including full-text catalogs), is backed up on a predefined schedule.
Differential backup – It stores only the data changes that have occurred since the last full database backup. When some data has changed many times since the last full database backup, a differential backup stores only the most recent version of the changed data. To restore from it, we first need to restore the full database backup.
Transaction log backup – In this, all events that have occurred in the database, i.e., a record of every single statement executed, are backed up. It is a backup of the transaction log entries and contains all transactions that have happened in the database. Through this, the database can be recovered to a specific point in time. It is even possible to back up the transaction log if the data files are destroyed, so that not even a single committed transaction is lost.
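As an illustration, in SQL Server-style syntax the three kinds of backup are taken with commands like the following (the database name and file paths are assumptions):

-- Full database backup
BACKUP DATABASE CollegeDB TO DISK = 'D:\backup\CollegeDB_full.bak';

-- Differential backup: only changes since the last full backup
BACKUP DATABASE CollegeDB TO DISK = 'D:\backup\CollegeDB_diff.bak' WITH DIFFERENTIAL;

-- Transaction log backup: enables point-in-time recovery
BACKUP LOG CollegeDB TO DISK = 'D:\backup\CollegeDB_log.trn';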
What is a Web Database?
A web database is a system for storing information that can then be accessed via a website. For
example, an online community may have a database that stores the username, password, and other
details of all its members. The most commonly used database system for the internet is MySQL due to its integration with PHP, one of the most widely used server-side programming languages.
At its most simple level, a web database is a set of one or more tables that contain data. Each table has
different fields for storing information of various types. These tables can then be linked together in
order to manipulate data in useful or interesting ways. In many cases, a table will use a primary key,
which must be unique for each entry and allows for unambiguous selection of data.
A web database can be used for a range of different purposes. Each field in a table has to have a
defined data type. For example, numbers, strings, and dates can all be inserted into a web database.
Proper database design involves choosing the correct data type for each field in order to reduce
memory consumption and increase the speed of access. Although for small databases this often isn't
so important, big web databases can grow to millions of entries and need to be well designed to work
effectively.
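A minimal sketch of such a web database table in MySQL; the table and column names are assumptions made for illustration:

CREATE TABLE members (
    member_id     INT AUTO_INCREMENT PRIMARY KEY,  -- unique key for unambiguous selection
    username      VARCHAR(30) NOT NULL,            -- string field
    password_hash CHAR(60)    NOT NULL,            -- store a hash rather than the raw password
    joined_on     DATE        NOT NULL              -- date field
);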
Examples of Web Servers
Lighttpd: Lighttpd is pronounced 'lighty'. It currently runs about 0.1 percent of the world's websites. Lighttpd has a small CPU load and is therefore comparatively easy to run. It has a low memory footprint and hence, in comparison to other web servers, requires less memory to run, which is always an advantage. It also has speed optimizations, which means that we can optimize or change its speed according to our requirements. It is open-source software, which means that we can access its code, change it according to our needs, and then upload our own module (the changed code).
Jigsaw Server: Jigsaw has been written in the Java language and can run CGI (Common Gateway Interface) scripts as well as PHP programs. It is not a full-fledged server; it was developed as an experimental server to demonstrate new web protocols. It is open-source software, which means that we can access its code, change it according to our needs, and then upload our own module (the changed code). It can be installed on any device provided that the device supports Java and modifications in Java.
Sun Java System: The Sun Java System supports various languages, scripts, and technologies required for Web 2.0, such as Python, PHP, etc. It is not open-source software; its code is therefore inaccessible, which means that we cannot change the code to suit our needs.
XML Database
An XML database is a data persistence software system used for storing a huge amount of information in XML format. It provides a secure place to store XML documents.
You can query your stored data using XQuery, and export and serialize it into the desired format. XML databases are usually associated with document-oriented databases.
There are two major types of XML databases:
1. XML-enabled database
2. Native XML database (NXD)
XML-enabled Database
An XML-enabled database works just like a relational database. It is like an extension provided for the conversion of XML documents; in this database, data is stored in tables, in the form of rows and columns.
Native XML Database (NXD)
A Native XML database is used to store large amounts of data. Instead of a table format, a Native XML database is based on a container format. You can query the data with XPath expressions.
A Native XML database is preferred over an XML-enabled database because it is highly capable of storing, maintaining and querying XML documents. For example:
<?xml version="1.0"?>
<contact-info>
    <contact1>
        <name>Vimal Jaiswal</name>
        <company>SSSIT.org</company>
        <phone>(0120) 4256464</phone>
    </contact1>
    <contact2>
        <name>Mahesh Sharma</name>
        <company>SSSIT.org</company>
        <phone>09990449935</phone>
    </contact2>
</contact-info>
In the above example, a document with the root element contact-info holds the contacts (contact1 and contact2). Each one contains 3 elements: name, company and phone.
Data Warehouse Architecture
A data warehouse architecture is a method of defining the overall architecture of data communication, processing, and presentation that exists for end-client computing within the enterprise. Each data warehouse is different, but all are characterized by standard vital components.
Production applications such as payroll, accounts payable, product purchasing, and inventory control are designed for online transaction processing (OLTP). Such applications gather detailed data from day-to-day operations.
Data Warehouse applications are designed to support the user ad-hoc data requirements, an activity
recently dubbed online analytical processing (OLAP). These include applications such as forecasting,
profiling, summary reporting, and trend analysis.
Production databases are updated continuously, either by hand or via OLTP applications. In contrast, a warehouse database is updated from operational systems periodically, usually during off-hours. As OLTP data accumulates in production databases, it is regularly extracted, filtered, and then loaded into a dedicated warehouse server that is accessible to users. As the warehouse is populated, it must be restructured: tables de-normalized, data cleansed of errors and redundancies, and new fields and keys added to reflect the needs of the users for sorting, combining, and summarizing data.
Data warehouses and their architectures vary depending upon the elements of an organization's situation.
Operational System
An operational system is a term used in data warehousing to refer to a system that is used to process the day-to-day transactions of an organization.
Flat Files
A Flat file system is a system of files in which transactional data is stored, and every file in the
system must have a different name.
Meta Data
A set of data that defines and gives information about other data.
Meta data summarizes necessary information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, and file size are examples of very basic document metadata.
Lightly and Highly Summarized Data
This area of the data warehouse saves all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager.
The goal of the summarized information is to speed up query performance. The summarized record is updated continuously as new information is loaded into the warehouse.
We must clean and process our operational information before putting it into the warehouse. We can do this programmatically, although most data warehouses use a staging area (a place where data is processed before entering the warehouse).
A staging area simplifies data cleansing and consolidation for operational data coming from multiple source systems, especially for enterprise data warehouses where all relevant data of an enterprise is consolidated.
Data Warehouse Staging Area is a temporary location where a record from source systems is copied.
Data Warehouse Architecture: With Staging Area and Data Marts
We may want to customize our warehouse's architecture for multiple groups within our organization.
We can do this by adding data marts. A data mart is a segment of a data warehouse that provides information for reporting and analysis on a section, unit, department or operation in the company, e.g., sales, payroll, production, etc.
The figure illustrates an example where purchasing, sales, and stocks are separated. In this example, a
financial analyst wants to analyze historical data for purchases and sales or mine historical information
to make predictions about customer behavior.
The following architecture properties are necessary for a data warehouse system:-
1. Separation: Analytical and transactional processing should be kept apart as much as possible.
2. Scalability: Hardware and software architectures should be easy to upgrade as the data volume, which has to be managed and processed, and the number of users' requirements, which have to be met, progressively increase.
3. Extensibility: The architecture should be able to perform new operations and technologies without
redesigning the whole system.
4. Security: Monitoring of accesses is necessary because of the strategic data stored in the data warehouse.
Single-Tier Architecture
Single-tier architecture is not frequently used in practice. Its purpose is to minimize the amount of data stored; to reach this goal, it removes data redundancies.
The figure shows the only layer physically available is the source layer. In this method, data warehouses
are virtual. This means that the data warehouse is implemented as a multidimensional view of
operational data created by specific middleware, or an intermediate processing layer.
The vulnerability of this architecture lies in its failure to meet the requirement for separation between analytical and transactional processing. Analysis queries are submitted to operational data after the middleware interprets them. In this way, queries affect transactional workloads.
Two-Tier Architecture
The requirement for separation plays an essential role in defining the two-tier architecture for a data
warehouse system, as shown in fig:
Although it is typically called a two-layer architecture to highlight the separation between physically available sources and the data warehouse, it in fact consists of four subsequent data flow stages:
1. Source layer: A data warehouse system uses heterogeneous sources of data. That data is stored initially in corporate relational databases or legacy databases, or it may come from an information system outside the corporate walls.
2. Data Staging: The data stored in the sources should be extracted, cleansed to remove inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one standard schema. The so-called Extraction, Transformation, and Loading (ETL) tools can combine heterogeneous schemata, extract, transform, cleanse, validate, filter, and load source data into a data warehouse.
3. Data Warehouse layer: Information is saved to one logically centralized individual repository: a data warehouse. The data warehouse can be directly accessed, but it can also be used as a source for creating data marts, which partially replicate data warehouse contents and are designed for specific enterprise departments. Meta-data repositories store information on sources, access procedures, data staging, users, data mart schema, and so on.
4. Analysis: In this layer, integrated data is efficiently and flexibly accessed to issue reports, dynamically analyze information, and simulate hypothetical business scenarios. It should feature aggregate information navigators, complex query optimizers, and customer-friendly GUIs.
Three-Tier Architecture
The three-tier architecture consists of the source layer (containing multiple source systems), the reconciled layer and the data warehouse layer (containing both data warehouses and data marts). The reconciled layer sits between the source data and the data warehouse.
The main advantage of the reconciled layer is that it creates a standard reference data model for a
whole enterprise. At the same time, it separates the problems of source data extraction and integration
from those of data warehouse population. In some cases, the reconciled layer is also directly used to better accomplish some operational tasks, such as producing daily reports that cannot be satisfactorily prepared using the corporate applications, or generating data flows to feed external processes periodically in order to benefit from cleaning and integration.
This architecture is especially useful for extensive, enterprise-wide systems. A disadvantage of this structure is the extra file storage space used because of the extra redundant reconciled layer. It also makes the analytical tools a little further away from being real-time.
Multidimensional Data Model
The Multidimensional Data Model works on the basis of pre-decided steps. The following stages should be followed by every project for building a Multidimensional Data Model:
Stage 1 : Assembling data from the client : In the first stage, a Multidimensional Data Model collects correct data from the client. Mostly, software professionals make clear to the client the range of data which can be gained with the selected technology, and collect the complete data in detail.
Stage 2 : Grouping different segments of the system : In the second stage, the Multidimensional Data Model recognizes and classifies all the data according to the respective section it belongs to, and also makes it problem-free to apply step by step.
Stage 3 : Noticing the different proportions : The third stage is the basis on which the design of the system rests. In this stage, the main factors are recognized according to the user's point of view. These factors are also known as "dimensions".
Stage 4 : Preparing the actual-time factors and their respective qualities : In the fourth stage, the factors which were recognized in the previous step are used further for identifying their related qualities. These qualities are also known as "attributes" in the database.
Stage 5 : Finding the actuality of factors which are listed previously and their qualities : In the fifth stage, a Multidimensional Data Model separates and differentiates the actuality (the facts) from the factors collected earlier. These facts play a significant role in the arrangement of a Multidimensional Data Model.
Stage 6 : Building the Schema to place the data, with respect to the information collected from the steps above : In the sixth stage, on the basis of the data collected previously, a schema is built.
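A minimal sketch of what such a schema might look like for the factory-sales example below, written as a star schema; all table and column names here are assumptions made for illustration:

-- Dimension tables: time, item and location
CREATE TABLE dim_time     (time_id INT PRIMARY KEY, quarter VARCHAR(2), year INT);
CREATE TABLE dim_item     (item_id INT PRIMARY KEY, item_type VARCHAR(20));
CREATE TABLE dim_location (loc_id  INT PRIMARY KEY, city VARCHAR(20), country VARCHAR(20));

-- Fact table: one row per (time, item, location) combination
CREATE TABLE fact_sales (
    time_id INT REFERENCES dim_time(time_id),
    item_id INT REFERENCES dim_item(item_id),
    loc_id  INT REFERENCES dim_location(loc_id),
    sales_amount DECIMAL(12,2)        -- the measure (fact), in thousands of rupees
);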
For Example :
1. Let us take the example of a firm. The revenue cost of a firm can be recognized on the basis of different factors such as the geographical location of the firm's workplace, the products of the firm, the advertisements done, the time utilized to flourish a product, etc.
Example 1
2. Let us take the example of the data of a factory which sells products per quarter in Bangalore.
The data is represented in the table given below :
2D factory data
In the above presentation, the factory's sales for Bangalore are shown for the time dimension, which is organized into quarters, and for the item dimension, which is sorted according to the kind of item sold. The facts here are represented in rupees (in thousands).
Now, if we desire to view the sales data in three dimensions, it is represented as in the diagram given below, where the 3-D data is laid out as a set of two-dimensional tables. Let us consider the data according to item, time and location (like Kolkata, Delhi, Mumbai). Here is the table:
3D data representation as 2D
This data can be represented in the form of three dimensions conceptually, which is shown in the
image below :
3D data representation
OLAP operations:
There are five basic analytical operations that can be performed on an OLAP cube:
1. Drill down: In drill-down operation, the less detailed data is converted into highly
detailed data. It can be done by:
Moving down in the concept hierarchy
Adding a new dimension
In the cube given in the overview section, the drill-down operation is performed by moving down in the concept hierarchy of the Time dimension (Quarter -> Month).
2. Roll up: It is just the opposite of the drill-down operation. It performs aggregation on the
OLAP cube. It can be done by:
Climbing up in the concept hierarchy
Reducing the dimensions
In the cube given in the overview section, the roll-up operation is performed by
climbing up in the concept hierarchy of Location dimension (City -> Country).
3. Dice: It selects a sub-cube from the OLAP cube by selecting two or more dimensions.
In the cube given in the overview section, a sub-cube is selected by selecting following
dimensions with criteria:
Location = “Delhi” or “Kolkata”
Time = “Q1” or “Q2”
Item = “Car” or “Bus”
4. Slice: It selects a single dimension from the OLAP cube which results in a new sub-
cube creation. In the cube given in the overview section, Slice is performed on the
dimension Time = “Q1”.
5. Pivot: It is also known as the rotation operation, as it rotates the current view to get a new view of the representation. In the sub-cube obtained after the slice operation, performing a pivot operation gives a new view of it.
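These operations can be approximated with SQL over the star schema sketched earlier (all table and column names remain assumptions):

-- Roll up: climb the Location hierarchy from city to country
SELECT l.country, t.quarter, SUM(f.sales_amount) AS sales
FROM fact_sales f
JOIN dim_location l ON f.loc_id  = l.loc_id
JOIN dim_time     t ON f.time_id = t.time_id
GROUP BY l.country, t.quarter;

-- Slice: fix a single dimension value (Time = 'Q1')
SELECT l.city, i.item_type, SUM(f.sales_amount) AS sales
FROM fact_sales f
JOIN dim_location l ON f.loc_id  = l.loc_id
JOIN dim_item     i ON f.item_id = i.item_id
JOIN dim_time     t ON f.time_id = t.time_id
WHERE t.quarter = 'Q1'
GROUP BY l.city, i.item_type;

-- Dice: restrict two or more dimensions to select a sub-cube
SELECT l.city, t.quarter, i.item_type, SUM(f.sales_amount) AS sales
FROM fact_sales f
JOIN dim_location l ON f.loc_id  = l.loc_id
JOIN dim_item     i ON f.item_id = i.item_id
JOIN dim_time     t ON f.time_id = t.time_id
WHERE l.city IN ('Delhi', 'Kolkata')
  AND t.quarter IN ('Q1', 'Q2')
  AND i.item_type IN ('Car', 'Bus')
GROUP BY l.city, t.quarter, i.item_type;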