CH 1
Figure: business operations (manual system / IT system) generate raw data
What is business intelligence
Objective of BI
BI projects: deliver a certain amount of value in a predefined period
Some BI tools
Business intelligence software helps companies gain
insight into their overall growth, sales trends, and
customer behavior.
Microsoft BI platform
Oracle BI
Pentaho
SAP business intelligence
WebFOCUS
The Benefits of BI
Time savings
Single version of truth
Improved strategies and plans
Improved tactical decisions
More efficient processes
Cost savings
Faster, more accurate reporting
Improved decision making
Improved customer service
Increased revenue
Data Analysis Problems
The same data are found in many different systems. Example:
customer data across different stores and departments
The same concept is defined differently
Heterogeneous sources
Relational DBMS, On-Line Transaction Processing (OLTP)
Unstructured data in files (e.g., MS Word)
Data quality is bad
Missing data, imprecise data, different use of systems
Data are “volatile”
Data are deleted in operational systems (e.g., after 6 months)
Data change over time – no historical information
Data warehouses are constructed via a process of data cleaning, data integration,
data transformation, data loading, and periodic data refreshing.
To facilitate decision making, the data in a data warehouse are organized around
major subjects (e.g., customer, item, supplier, and activity). The data are stored to
provide information from a historical perspective, such as in the past 6 to 12
months, and are typically summarized.
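The cleaning and integration steps can be illustrated with a minimal sketch. It assumes two hypothetical operational systems that store the same customer under different field names and formats; all record layouts and helper names (`store_system`, `web_system`, `clean_name`, `integrate`) are invented for illustration and do not come from any particular tool.

```python
# Hypothetical customer records from two operational systems.
# The same customer (id 17) appears in both, with different field names.
store_system = [
    {"cust_id": "17", "name": " Alice Smith ", "city": "Oslo"},
    {"cust_id": "18", "name": "Bob Jones", "city": None},
]
web_system = [
    {"customer": 17, "full_name": "alice smith", "town": "Oslo"},
    {"customer": 19, "full_name": "Carol White", "town": "Bergen"},
]


def clean_name(raw):
    """Data cleaning: trim whitespace and normalise capitalisation."""
    return " ".join(part.capitalize() for part in raw.split())


def integrate(store_rows, web_rows):
    """Data integration: map both sources onto one customer schema."""
    customers = {}
    for row in store_rows:
        customers[int(row["cust_id"])] = {
            "name": clean_name(row["name"]),
            "city": row["city"] or "UNKNOWN",  # make missing data explicit
        }
    for row in web_rows:
        # Keep the record already seen; only add customers that are new.
        customers.setdefault(int(row["customer"]), {
            "name": clean_name(row["full_name"]),
            "city": row["town"] or "UNKNOWN",
        })
    return customers


warehouse_customers = integrate(store_system, web_system)
for cust_id, record in sorted(warehouse_customers.items()):
    print(cust_id, record)
```

Keeping the first record seen is only one possible conflict rule; a real warehouse load would define which source wins per attribute.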
Characteristics of Data Warehouse
Subject oriented. A data warehouse can be used to analyze a
particular subject area. For example, sales can be a particular
subject.
Time variant. Data are not current but normally time series.
Summarized. Operational data are mapped into a decision-usable format.
Large volume. Time series data sets are normally quite large.
Data Mining
Knowledge discovery from data. Supports association,
classification, clustering and presenting data mining results
using visualization tools.
Middle Tier-
In the middle tier, we have the OLAP Server that can be implemented in either
of the following ways:
Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
Relational OLAP (ROLAP), which is an extended relational database
management system. The ROLAP maps the operations on multidimensional
data to standard relational operations.
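As a rough illustration of how ROLAP maps multidimensional operations onto relational ones, the sketch below slices a small sales cube on one year and totals the amount per city. It uses Python's built-in sqlite3 module as a stand-in relational engine; the `sales` table, its columns, and the function `slice_and_aggregate` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, city TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, "Oslo", "phone", 100.0),
        (2023, "Oslo", "laptop", 250.0),
        (2023, "Bergen", "phone", 80.0),
        (2024, "Oslo", "phone", 120.0),
    ],
)


def slice_and_aggregate(conn, year):
    """Slice the cube on one year, then total the amount per city.

    In relational terms the slice becomes a WHERE clause and the
    aggregation becomes GROUP BY with SUM.
    """
    query = (
        "SELECT city, SUM(amount) FROM sales "
        "WHERE year = ? GROUP BY city ORDER BY city"
    )
    return conn.execute(query, (year,)).fetchall()


totals_2023 = slice_and_aggregate(conn, 2023)
print(totals_2023)
```

A MOLAP server would answer the same question from a precomputed multidimensional array rather than by generating SQL.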
Top-Tier-
This tier is the front-end client layer. This layer holds the query tools and
reporting tools, analysis tools and data mining tools.
Data Mart
A data mart is a simple form of a data warehouse. It is
focused on a single subject. A data mart draws data from only
a few sources. These sources may be a central data
warehouse, internal operational systems, or external data
sources.
It is subject-oriented, and it is designed to meet the needs
of a specific group of users.
A data mart is a segment of a data warehouse that can
provide data for reporting and analysis on a section, unit,
department or operation in the company, e.g. sales, payroll,
production.
Data Warehouse vs Data Mart
Data warehouse:
Holds multiple subject areas
Holds very detailed information
Data mart:
Often holds only one subject area, for example, Finance or Sales
May hold more summarized data (although many hold full detail)
It helps to take tactical decisions for the business.
Concentrates on integrating information from a given subject area
or set of source systems
Is built focused on a dimensional model using a star schema.
What is tactical and strategic decision?
Tactical decisions are known as short-term decisions because the
alternatives are selected within a limited time frame, whereas
strategic decisions are generally long-term decisions because the
selection of an alternative is done between different strategies.
Reasons for creating a data mart
Easy access to frequently needed data
Creates a collective view for a group of users
Improves end-user response time
Ease of creation
Lower cost than implementing a full data warehouse
Potential users are more clearly defined than in a full
data warehouse
Contains only business essential data and is less
cluttered.
Types of Data Marts
There are three main types
of data marts: dependent,
independent, and hybrid.
Dependent: A dependent
data mart sources the
organization's data from a
single data warehouse.
Independent: Independent data
mart is created without the use of
a central data warehouse.
This kind of Data Mart is an ideal
option for smaller groups within
an organization.
An independent data mart has
neither a relationship with the
enterprise data warehouse nor
with any other data mart. In an
independent data mart, the data is
input separately, and its analyses
are also performed autonomously.
Hybrid data mart:
A hybrid data mart combines input from a data warehouse with input
from other sources.
This could be helpful when you want ad-hoc integration, for example
after a new group or product is added to the organization.
It is best suited for multiple database environments and fast
implementation turnaround for any organization.
It also requires the least data cleansing effort. A hybrid data mart also
supports large storage structures, and it is best suited for flexible
smaller data-centric applications.
Steps to Implement Data Mart
Designing
Gathering the business and technical requirements
Identifying data sources
Selecting the appropriate subset of data
Designing the logical and physical structure of the data mart
Constructing
This step includes creating the physical database and the logical
structures associated with the data mart to provide fast and efficient
access to the data. This step involves the following tasks:
Creating the physical database and storage structures, such as
tablespaces, associated with the data mart
Creating the schema objects, such as tables and indexes defined in
the design step
Determining how best to set up the tables and the access structures
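The constructing tasks can be sketched in a few lines. This example uses Python's built-in sqlite3 module, with an in-memory database standing in for the physical database; SQLite has no tablespaces, so only the table and index creation is shown. The table `sales_mart`, its columns, and the index name are hypothetical.

```python
import sqlite3

# An in-memory database stands in for the physical database of the mart.
mart = sqlite3.connect(":memory:")

# Schema objects from a hypothetical design step: one table and one index.
mart.executescript(
    """
    CREATE TABLE sales_mart (
        sale_id   INTEGER PRIMARY KEY,
        sale_date TEXT NOT NULL,
        store     TEXT NOT NULL,
        amount    REAL NOT NULL
    );
    -- Access structure: speeds up queries that filter or group by store.
    CREATE INDEX idx_sales_mart_store ON sales_mart (store);
    """
)

# List the schema objects that now exist.
objects = mart.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(objects)
```

Which columns deserve an index depends on the queries the mart is expected to serve, which is why this decision belongs to the constructing step.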
Populating
The populating step covers all of the tasks related to getting the
data from the source, cleaning it up, modifying it to the right format
and level of detail, and moving it into the data mart.
Mapping data sources to target data structures
Extracting data
Cleansing and transforming the data
Loading data into the data mart
Creating and storing metadata
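The populating tasks above can be sketched as a small load script. It is a simplified illustration, assuming a hypothetical source extract with fields `ORD_DT`, `STORE_CD`, and `AMT`, a target table `sales_mart`, and a `load_metadata` table that records the field mapping; none of these names come from a real system.

```python
import sqlite3
from datetime import date

# Hypothetical extract: raw rows pulled from an operational system.
source_rows = [
    {"ORD_DT": "2024-01-05", "STORE_CD": "osl", "AMT": "100.50"},
    {"ORD_DT": "2024-01-05", "STORE_CD": "OSL", "AMT": "20"},
    {"ORD_DT": "2024-01-06", "STORE_CD": "brg", "AMT": ""},  # missing amount
]

# Mapping of source fields to target columns (also stored as metadata).
field_mapping = {"ORD_DT": "sale_date", "STORE_CD": "store", "AMT": "amount"}


def transform(row):
    """Cleanse and transform one source row; return None if unusable."""
    if not row["AMT"]:
        return None  # reject rows with a missing amount
    target = {field_mapping[src]: value for src, value in row.items()}
    target["store"] = target["store"].upper()   # unify store codes
    target["amount"] = float(target["amount"])  # text to number
    return target


mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE sales_mart (sale_date TEXT, store TEXT, amount REAL)")
mart.execute(
    "CREATE TABLE load_metadata "
    "(source_field TEXT, target_column TEXT, loaded_on TEXT)"
)

# Load the cleansed rows, then record how the load was mapped.
clean_rows = [r for r in map(transform, source_rows) if r is not None]
mart.executemany(
    "INSERT INTO sales_mart VALUES (:sale_date, :store, :amount)", clean_rows
)
mart.executemany(
    "INSERT INTO load_metadata VALUES (?, ?, ?)",
    [(src, tgt, date.today().isoformat()) for src, tgt in field_mapping.items()],
)
mart.commit()

loaded = mart.execute(
    "SELECT store, SUM(amount) FROM sales_mart GROUP BY store"
).fetchall()
print(loaded)
```

Rejecting incomplete rows is only one cleansing policy; an alternative is to load them with an explicit "unknown" marker and report them.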
Accessing
The accessing step involves putting the data to use: querying the data,
analyzing it, creating reports, charts, and graphs, and publishing them.
This step involves the following tasks:
Set up an intermediate layer for the front-end tool to use. This layer,
the metalayer, translates database structures and object names into
business terms, so that the end user can interact with the data mart
using terms that relate to the business function.
Maintain and manage these business interfaces.
Set up and manage database structures, like summarized tables,
that help queries submitted through the front-end tool execute
quickly and efficiently.
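One way to picture the metalayer is as a lookup from business terms to physical names. The sketch below is an assumption-laden illustration: the dictionary `METALAYER`, the cryptic table and column names, and the function `business_query` are all hypothetical, and real front-end tools implement this layer in their own ways.

```python
import sqlite3

# Hypothetical metalayer: business terms mapped to physical names.
METALAYER = {
    "table": "sls_mart_t",
    "columns": {"Store": "str_cd", "Revenue": "amt"},
}


def business_query(group_by_term, measure_term):
    """Translate business terms into SQL over the physical structures.

    Only names taken from METALAYER are placed in the SQL text, so an
    unknown business term fails with a KeyError instead of reaching
    the database.
    """
    cols = METALAYER["columns"]
    group_col = cols[group_by_term]
    measure_col = cols[measure_term]
    return (
        f'SELECT {group_col} AS "{group_by_term}", '
        f'SUM({measure_col}) AS "{measure_term}" '
        f'FROM {METALAYER["table"]} GROUP BY {group_col} ORDER BY {group_col}'
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sls_mart_t (str_cd TEXT, amt REAL)")
conn.executemany(
    "INSERT INTO sls_mart_t VALUES (?, ?)",
    [("OSL", 100.0), ("OSL", 50.0), ("BRG", 30.0)],
)

sql = business_query("Store", "Revenue")
result = conn.execute(sql).fetchall()
print(sql)
print(result)
```

The end user asks for "Revenue by Store" and never sees `str_cd` or `amt`, which is the point of the intermediate layer.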
Managing
This step involves managing the data mart over its lifetime. In this
step, you perform management tasks such as the following:
Providing secure access to the data
Managing the growth of the data
Optimizing the system for better performance
Ensuring the availability of data even with system failures
Metadata in Data Warehouse
Metadata is simply defined as data about data. The data that is
used to represent other data is known as metadata. For example,
the index of a book serves as metadata for the contents of the
book.
Choose the business process. The basics in the design build on the actual
business process which the data warehouse should cover. This could, for
instance, be a sales situation in a retail store.
Declare the grain. The grain of the model is the exact description of what the
dimensional model should be focusing on. This could, for instance, be “An
individual line item on a customer slip from a retail store”.
Identify the dimensions. Dimensions are the foundation of the fact table, and
are where the data for the fact table is collected. Typically dimensions are
nouns like date, store, inventory, etc. These dimensions are where all the data
is stored. For example, the date dimension could contain data such as year,
month and weekday.
Identify the facts. This step is to identify the numeric facts that will populate
each fact table row. This step is closely related to the business users of the
system, since this is where they get access to data stored in the data warehouse.
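To make the grain, dimension, and fact ideas concrete, here is a minimal sketch that derives one date dimension row and one fact row at the grain "an individual line item on a customer slip". The key format, the field names, and the sample values are assumptions chosen for illustration.

```python
from datetime import date

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def date_dimension_row(day):
    """Derive the descriptive attributes of one date dimension row."""
    return {
        "date_key": day.year * 10000 + day.month * 100 + day.day,
        "year": day.year,
        "month": day.month,
        "weekday": WEEKDAYS[day.weekday()],  # weekday() returns 0 for Monday
    }


date_row = date_dimension_row(date(2024, 1, 5))

# One fact row at the declared grain: keys into the dimensions
# plus the numeric facts measured for that line item.
line_item_fact = {
    "date_key": date_row["date_key"],
    "store_key": 7,         # hypothetical key into a store dimension
    "quantity": 2,          # numeric fact
    "sales_amount": 19.98,  # numeric fact
}

print(date_row)
print(line_item_fact)
```

The descriptive attributes live in the dimension row, while the fact row carries only keys and numbers.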
Data Warehouse schema
Schema is a logical description of the entire database.
Data warehouse uses:
Star
Snowflake
Fact Constellation schema.
Star Schema
Each dimension in a star schema is represented with only one
dimension table.
This dimension table contains the set of attributes.
The following diagram shows the sales data of a company with
respect to the four dimensions, namely time, item, branch, and
location.
Usually the fact tables in a star schema are in third normal form
(3NF) whereas dimensional tables are de-normalized.
Despite the fact that the star schema is the simplest architecture,
it is most commonly used nowadays and is recommended by
Oracle.
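A star schema for the sales example, with the four dimensions time, item, branch, and location, might look like the sketch below. It uses Python's built-in sqlite3 module; the table names, column names, and sample rows are assumptions, since the original diagram is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per dimension, each holding that dimension's attributes.
    CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE dim_branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- Central fact table: a foreign key to every dimension plus the measures.
    CREATE TABLE fact_sales (
        time_key     INTEGER REFERENCES dim_time(time_key),
        item_key     INTEGER REFERENCES dim_item(item_key),
        branch_key   INTEGER REFERENCES dim_branch(branch_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        units_sold   INTEGER,
        dollars_sold REAL
    );

    INSERT INTO dim_time VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
    INSERT INTO dim_item VALUES (1, 'phone', 'Acme');
    INSERT INTO dim_branch VALUES (1, 'Main');
    INSERT INTO dim_location VALUES (1, 'Oslo', 'Norway');
    INSERT INTO fact_sales VALUES (1, 1, 1, 1, 3, 300.0), (2, 1, 1, 1, 2, 200.0);
    """
)

# A typical star query: join the fact table to the dimensions it needs.
rows = conn.execute(
    """
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_key = f.time_key
    JOIN dim_location AS l ON l.location_key = f.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter
    """
).fetchall()
print(rows)
```

Each query joins the fact table directly to the dimension tables it needs, with no joins between dimension tables, which is what gives the schema its star shape.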
Snowflake Schema
OLAP operations
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following
ways:
By climbing up a concept hierarchy for a dimension
By dimension reduction
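Both forms of roll-up can be sketched on a tiny cube held in a Python dictionary. The cube contents, the city-to-country hierarchy, and the function names `roll_up_location` and `drop_time` are hypothetical and serve only to illustrate the two ways listed above.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter) -> sales amount.
cube = {
    ("Oslo", "Q1"): 100,
    ("Bergen", "Q1"): 40,
    ("Toronto", "Q1"): 70,
    ("Oslo", "Q2"): 120,
}

# Concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Oslo": "Norway", "Bergen": "Norway", "Toronto": "Canada"}


def roll_up_location(cells):
    """Climb the location hierarchy from city to country."""
    result = defaultdict(int)
    for (city, quarter), amount in cells.items():
        result[(CITY_TO_COUNTRY[city], quarter)] += amount
    return dict(result)


def drop_time(cells):
    """Dimension reduction: aggregate away the time dimension."""
    result = defaultdict(int)
    for (place, _quarter), amount in cells.items():
        result[place] += amount
    return dict(result)


by_country = roll_up_location(cube)
by_city_only = drop_time(cube)
print(by_country)
print(by_city_only)
```

In the first case the cube keeps two dimensions but at a coarser level; in the second it loses a dimension altogether.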