PGDM BA 04 - Data Mining
Online Analytical Processing (OLAP) refers to a set of software tools used to analyze
data in order to make business decisions. OLAP provides a platform for gaining
insights from data retrieved from multiple database systems at the same time. It
is based on a multidimensional data model, which enables users to extract and view
data from various perspectives. A multidimensional database is used to store OLAP
data. Many Business Intelligence (BI) applications rely on OLAP technology.
Example dimensions of an OLAP cube: product type, location, and time.
Data engineers build a multidimensional OLAP system that consists of the following elements.
Data warehouse
A data warehouse collects information from different sources, including applications, files, and
databases. It processes the information using various tools so that the data is ready for analytical
purposes. For example, the data warehouse might collect information from a relational database that
stores data in tables of rows and columns.
ETL tools
Extract, transform, and load (ETL) tools are database processes that automatically retrieve, change,
and prepare the data to a format fit for analytical purposes. Data warehouses use ETL to convert
and standardize information from various sources before making it available to OLAP tools.
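The conversion and standardization step can be sketched in a few lines of Python. This is only an illustration under assumptions: the two source systems, their field names, and the sample records are made up, and real ETL tools handle far more cases.

```python
from datetime import datetime

# Extract: records as two hypothetical source systems might deliver them.
crm_rows = [{"Customer": " Acme ", "SaleDate": "03/01/2024", "Total": "1,200.50"}]
shop_rows = [{"customer_name": "globex", "date": "2024-01-05", "amount": 650.0}]

def transform_crm(row):
    """Standardize a CRM record: trim names, use ISO dates, parse the amount."""
    return {
        "customer": row["Customer"].strip().lower(),
        "date": datetime.strptime(row["SaleDate"], "%d/%m/%Y").date().isoformat(),
        "amount": float(row["Total"].replace(",", "")),
    }

def transform_shop(row):
    """Standardize a web-shop record into the same three fields."""
    return {
        "customer": row["customer_name"].strip().lower(),
        "date": row["date"],
        "amount": float(row["amount"]),
    }

# Load: both sources end up in one list with a single, consistent format.
warehouse = [transform_crm(r) for r in crm_rows] + [transform_shop(r) for r in shop_rows]
for record in warehouse:
    print(record)
```

After the transform step, every record has the same keys and the same date and number formats, so analytical tools do not need to know which source a record came from.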
OLAP server
An OLAP server is the underlying machine that powers the OLAP system. It uses ETL tools to
transform the information in the relational databases and prepare it for OLAP operations.
OLAP database
An OLAP database is a separate database that connects to the data warehouse. Data engineers
sometimes use an OLAP database to prevent the data warehouse from being burdened by OLAP
analysis. They also use an OLAP database to make it easier to create OLAP data models.
OLAP cubes
A data cube is a model representing a multidimensional array of information. While it’s easier to
visualize it as a three-dimensional data model, most data cubes have more than three dimensions.
An OLAP cube, or hypercube, is the term for data cubes in an OLAP system. OLAP cubes are rigid
because you can't change the dimensions and underlying data once you have modeled the cube. For
example, if you add the warehouse dimension to a cube with product, location, and time dimensions,
you have to remodel the entire cube.
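One way to picture a data cube in code is a mapping from coordinate tuples to values. The sketch below is a simplified illustration, not how an OLAP product stores cubes; the products, cities, figures, and the helper names cell and add_dimension are assumptions.

```python
# A tiny cube with three dimensions: (product, location, month) -> sales amount.
# All names and figures are made up for illustration.
cube = {
    ("laptop", "New York", "2024-01"): 1200.0,
    ("laptop", "London", "2024-01"): 900.0,
    ("phone", "New York", "2024-02"): 650.0,
}

def cell(cube, product, location, month):
    """Look up one cell; combinations with no data count as zero sales."""
    return cube.get((product, location, month), 0.0)

def add_dimension(cube, default):
    """Return a new cube whose keys carry one extra coordinate.

    Every existing cell has to be re-keyed, which mirrors the cost of
    remodeling the entire cube when a dimension is added.
    """
    return {key + (default,): value for key, value in cube.items()}

# Adding a warehouse dimension touches every cell of the cube.
cube4 = add_dimension(cube, "warehouse-A")

print(cell(cube, "laptop", "London", "2024-01"))
print(sorted(cube4))
```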
An OLAP system typically works in the following steps:
1. The OLAP server collects data from multiple data sources, including relational databases and data
warehouses.
2. Then, the extract, transform, and load (ETL) tools clean, aggregate, precalculate, and store data in
an OLAP cube according to the number of dimensions specified.
3. Business analysts use OLAP tools to query and generate reports from the multidimensional data in
the OLAP cube.
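The three steps can be sketched in plain Python. This is a minimal illustration under assumptions: the sample rows, the rule for dropping rows without an amount, and the function name build_cube are made up for the example.

```python
from collections import defaultdict

# Step 1: rows as they might arrive from the source systems (made-up data).
source_rows = [
    {"product": "laptop", "city": "New York", "month": "2024-01", "amount": 1200.0},
    {"product": "laptop", "city": "New York", "month": "2024-01", "amount": 800.0},
    {"product": "phone", "city": "London", "month": "2024-01", "amount": 650.0},
    {"product": "phone", "city": "London", "month": "2024-02", "amount": None},
]

def build_cube(rows, dimensions):
    """Step 2: clean the rows, then precalculate one total per cube cell."""
    cube = defaultdict(float)
    for row in rows:
        if row["amount"] is None:  # cleaning: skip rows with no amount
            continue
        key = tuple(row[d] for d in dimensions)
        cube[key] += row["amount"]  # aggregate into the matching cell
    return dict(cube)

cube = build_cube(source_rows, ("product", "city", "month"))

# Step 3: a report reads the precalculated total instead of rescanning the rows.
print(cube[("laptop", "New York", "2024-01")])  # 2000.0
```

Because the totals are computed once while the cube is built, a report query becomes a simple lookup.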
OLAP uses Multidimensional Expressions (MDX) to query the OLAP cube. MDX is a query language,
like SQL, that provides a set of instructions for querying and manipulating multidimensional data.
Data modeling is the representation of data in data warehouses or online analytical processing
(OLAP) databases. Data modeling is essential in relational online analytical processing
(ROLAP) because a ROLAP system analyzes data straight from the relational database, where
multidimensional data is stored as a star or snowflake schema.
Star schema
The star schema consists of a fact table and multiple dimension tables. The fact table is a data
table that contains numerical values related to a business process, and the dimension table
contains values that describe each attribute in the fact table. The fact table refers to dimension
tables through foreign keys, which are identifiers that match each fact row to the corresponding
row in the dimension table.
In a star schema, a fact table connects to several dimension tables so the data model looks like a
star. The following is an example of a fact table for product sales:
Product ID
Location ID
Salesperson ID
Sales amount
The product ID tells the database system to retrieve information from the product dimension
table, which might look as follows:
Product ID
Product name
Product type
Product cost
Likewise, the location ID points to a location dimension table, which could consist of the
following:
Location ID
Country
City
In the same way, the salesperson ID points to a salesperson dimension table, for example:
Salesperson ID
First name
Last name
Email
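The star schema above can be tried out with Python's built-in sqlite3 module. The sketch below is an illustration only: the table and column names are snake_case versions of the fields listed above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY, product_name TEXT,
    product_type TEXT, product_cost REAL);
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY, country TEXT, city TEXT);
CREATE TABLE salesperson (
    salesperson_id INTEGER PRIMARY KEY, first_name TEXT,
    last_name TEXT, email TEXT);
-- Fact table: one row per sale, with a foreign key into each dimension.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    location_id INTEGER REFERENCES location(location_id),
    salesperson_id INTEGER REFERENCES salesperson(salesperson_id),
    sales_amount REAL);
INSERT INTO product VALUES (1, 'Laptop', 'Computer', 700.0), (2, 'Phone', 'Mobile', 300.0);
INSERT INTO location VALUES (1, 'US', 'New York'), (2, 'UK', 'London');
INSERT INTO salesperson VALUES (1, 'Ana', 'Lopez', 'ana@example.com');
INSERT INTO sales VALUES (1, 1, 1, 1200.0), (2, 1, 1, 650.0), (1, 2, 1, 900.0);
""")

# Join the fact table to two dimension tables and total the sales.
rows = conn.execute("""
    SELECT p.product_name, l.country, SUM(s.sales_amount)
    FROM sales AS s
    JOIN product AS p ON p.product_id = s.product_id
    JOIN location AS l ON l.location_id = s.location_id
    GROUP BY p.product_name, l.country
    ORDER BY p.product_name, l.country
""").fetchall()

for row in rows:
    print(row)
```

Each dimension is exactly one join away from the fact table, which is what gives the schema its star shape.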
Snowflake schema
The snowflake schema is an extension of the star schema. Some dimension tables might lead to
one or more secondary dimension tables. This results in a snowflake-like shape when the
dimension tables are put together.
For example, the product dimension table might contain the following fields:
Product ID
Product name
Product type ID
Product cost
The product type ID connects to another dimension table as shown in the following example:
Product type ID
Type name
Version
Variant
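A snowflake schema adds one more join to reach the secondary dimension table. The following sqlite3 sketch assumes snake_case table and column names based on the fields above, a reduced fact table, and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Secondary dimension table, reached through the product dimension.
CREATE TABLE product_type (
    product_type_id INTEGER PRIMARY KEY, type_name TEXT,
    version TEXT, variant TEXT);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY, product_name TEXT,
    product_type_id INTEGER REFERENCES product_type(product_type_id),
    product_cost REAL);
-- Fact table reduced to two columns to keep the example short.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    sales_amount REAL);
INSERT INTO product_type VALUES (10, 'Computer', 'v2', 'Pro'), (20, 'Mobile', 'v1', 'Lite');
INSERT INTO product VALUES (1, 'Laptop', 10, 700.0), (2, 'Desktop', 10, 500.0), (3, 'Phone', 20, 300.0);
INSERT INTO sales VALUES (1, 1200.0), (2, 800.0), (3, 650.0);
""")

# Two joins: fact table -> product dimension -> product type dimension.
sales_by_type = conn.execute("""
    SELECT t.type_name, SUM(s.sales_amount)
    FROM sales AS s
    JOIN product AS p ON p.product_id = s.product_id
    JOIN product_type AS t ON t.product_type_id = p.product_type_id
    GROUP BY t.type_name
    ORDER BY t.type_name
""").fetchall()

print(sales_by_type)
```

Compared with the star schema, the type details are stored once per product type instead of once per product, at the cost of an extra join in each query that needs them.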
Business analysts perform several basic analytical operations with a multidimensional online
analytical processing (MOLAP) cube.
Roll up
In roll up, the online analytical processing (OLAP) system summarizes the data for specific
attributes. In other words, it shows less-detailed data. For example, you might view product sales
for New York, California, London, and Tokyo. A roll-up operation would provide a
view of the sales data based on countries, such as the US, the UK, and Japan.
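A roll up can be sketched as an aggregation along a hierarchy. In this illustration the sales figures and the region-to-country mapping are assumptions; only the region and country names come from the example above.

```python
from collections import defaultdict

# Detailed view: sales per region (made-up figures).
sales_by_region = {
    "New York": 1200.0,
    "California": 800.0,
    "London": 900.0,
    "Tokyo": 650.0,
}

# Hypothetical hierarchy that maps each region to its country.
region_to_country = {
    "New York": "US",
    "California": "US",
    "London": "UK",
    "Tokyo": "Japan",
}

def roll_up(cells, hierarchy):
    """Sum each detailed value into its parent level in the hierarchy."""
    totals = defaultdict(float)
    for member, amount in cells.items():
        totals[hierarchy[member]] += amount
    return dict(totals)

sales_by_country = roll_up(sales_by_region, region_to_country)
print(sales_by_country)  # {'US': 2000.0, 'UK': 900.0, 'Japan': 650.0}
```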
Drill down
Drill down is the opposite of the roll-up operation. Business analysts move downward in the
concept hierarchy and extract the details they require. For example, they can move from viewing
sales data by year to viewing it by month.
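Drilling down from years to months can be sketched as follows. The monthly figures and the function names by_year and drill_down are assumptions made for the illustration.

```python
from collections import defaultdict

# Most detailed level kept for this example: sales per month (made-up figures).
monthly_sales = {
    "2023-11": 400.0,
    "2023-12": 600.0,
    "2024-01": 500.0,
    "2024-02": 700.0,
}

def by_year(monthly):
    """Summary view: total sales per year."""
    totals = defaultdict(float)
    for month, amount in monthly.items():
        totals[month[:4]] += amount  # the first four characters are the year
    return dict(totals)

def drill_down(monthly, year):
    """Detailed view: the monthly figures behind one year's total."""
    return {m: a for m, a in monthly.items() if m.startswith(year)}

yearly = by_year(monthly_sales)
detail_2024 = drill_down(monthly_sales, "2024")

print(yearly)       # {'2023': 1000.0, '2024': 1200.0}
print(detail_2024)  # {'2024-01': 500.0, '2024-02': 700.0}
```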
Slice
Data engineers use the slice operation to create a two-dimensional view from the OLAP cube.
For example, a MOLAP cube organizes data by products, cities, and months. By slicing the
cube, data engineers can create a spreadsheet-like table consisting of products and cities for a
specific month.
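A slice fixes one dimension at a single value and keeps the others. The sketch below assumes a cube stored as a dictionary keyed by (product, city, month); the data and the function name slice_month are made up.

```python
# Cube keyed by (product, city, month) -> sales amount (made-up figures).
cube = {
    ("laptop", "New York", "2024-01"): 1200.0,
    ("laptop", "London", "2024-01"): 900.0,
    ("phone", "New York", "2024-01"): 650.0,
    ("phone", "London", "2024-02"): 300.0,
}

def slice_month(cube, month):
    """Fix the month and return a two-dimensional table keyed by (product, city)."""
    return {
        (product, city): amount
        for (product, city, m), amount in cube.items()
        if m == month
    }

january = slice_month(cube, "2024-01")
for (product, city), amount in sorted(january.items()):
    print(product, city, amount)
```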
Dice
Data engineers use the dice operation to create a smaller subcube from an OLAP cube. They
determine the required dimensions and build a smaller cube from the original hypercube.
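A dice keeps all dimensions but restricts each one to a chosen set of values. This sketch assumes a cube stored as a dictionary keyed by (product, city, month); the data and the function name dice are made up.

```python
# Cube keyed by (product, city, month) -> sales amount (made-up figures).
cube = {
    ("laptop", "New York", "2024-01"): 1200.0,
    ("laptop", "London", "2024-01"): 900.0,
    ("phone", "New York", "2024-01"): 650.0,
    ("phone", "London", "2024-02"): 300.0,
}

def dice(cube, products, cities, months):
    """Keep only the cells whose coordinates fall inside the chosen value sets."""
    return {
        key: amount
        for key, amount in cube.items()
        if key[0] in products and key[1] in cities and key[2] in months
    }

# Subcube: both products, New York only, January only.
subcube = dice(cube, {"laptop", "phone"}, {"New York"}, {"2024-01"})
print(subcube)
```

Unlike a slice, the result still has three dimensions; it is simply a smaller cube.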
Pivot
The pivot operation involves rotating the OLAP cube along one of its dimensions to get a
different perspective on the multidimensional data model. For example, a three-dimensional
OLAP cube has the following dimensions on the respective axes:
X-axis: product
Y-axis: location
Z-axis: time
After a pivot, the OLAP cube has the following configuration:
X-axis: location
Y-axis: time
Z-axis: product
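With a cube stored as a dictionary keyed by coordinate tuples, a pivot amounts to reordering the coordinates in every key. The data and the function name pivot below are assumptions for the illustration.

```python
# Cube keyed by (product, location, time) -> sales amount (made-up figures).
cube = {
    ("laptop", "New York", "2024-01"): 1200.0,
    ("laptop", "London", "2024-01"): 900.0,
    ("phone", "New York", "2024-02"): 650.0,
}

def pivot(cube, order):
    """Reorder the axes of every key; the stored values do not change.

    order lists, for each new axis, the position of the old axis to use.
    """
    return {tuple(key[i] for i in order): amount for key, amount in cube.items()}

# (product, location, time) -> (location, time, product)
pivoted = pivot(cube, (1, 2, 0))
for key in sorted(pivoted):
    print(key, pivoted[key])
```

The pivoted cube holds exactly the same numbers; only the order in which the dimensions are presented has changed.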
Data mining
Data mining is an analytics technology that processes large volumes of historical data to find
patterns and insights. Business analysts use data-mining tools to discover relationships within the
data and to predict future trends.
Online analytical processing (OLAP) is a database analysis technology that involves querying,
extracting, and studying summarized data. On the other hand, data mining involves looking
deeply into unprocessed information. For example, marketers could use data-mining tools to
analyze user behaviors from records of every website visit. They might then use OLAP software
to inspect those behaviors from various angles, such as duration, device, country, language, and
browser type.
OLTP
Online transaction processing (OLTP) is a data technology that stores information quickly and
reliably in a database. Data engineers use OLTP tools to store transactional data, such as
financial records, service subscriptions, and customer feedback, in a relational database. OLTP
systems involve creating, updating, and deleting records in relational tables.
OLTP is well suited to handling and storing multiple streams of transactions in databases. However,
it is not designed to run complex analytical queries efficiently. Therefore, business analysts use an
OLAP system to analyze multidimensional data. For example, data scientists connect an OLTP database
to a cloud-based OLAP cube to perform compute-intensive queries on historical data.
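The difference between the two workloads can be illustrated with Python's built-in sqlite3 module. This is a toy sketch: a single in-memory database plays both roles, and the orders table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: short transactions that create, update, or delete single records.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders VALUES (1, 'acme', 2023, 100.0)")
    conn.execute("INSERT INTO orders VALUES (2, 'acme', 2024, 250.0)")
    conn.execute("INSERT INTO orders VALUES (3, 'globex', 2024, 400.0)")
with conn:
    conn.execute("UPDATE orders SET amount = 300.0 WHERE order_id = 2")
    conn.execute("DELETE FROM orders WHERE order_id = 1")

# OLAP-style work: one read-only query that scans and summarizes many rows.
totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(totals)  # [(2024, 700.0)]
```

In production, the analytical query would usually run against a separate system so that long scans do not slow down the transactional workload.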
Amazon Redshift is a cloud data warehouse designed specifically for online analytical processing.
Amazon Relational Database Service (Amazon RDS) is a relational database with OLAP
functionality. Data engineers use Amazon RDS with Oracle OLAP to perform complex queries on
dimensional cubes.
Amazon Aurora is a MySQL- and PostgreSQL-compatible cloud relational database. It is designed
mainly for OLTP workloads, although data engineers also use it to run complex OLAP queries.