CH 1

Business intelligence (BI) tools allow organizations to analyze large amounts of data generated every day and gain insights to make better decisions. As data volumes grow exponentially, BI helps transform petabytes of data into valuable products and services. BI technologies gather, store, access, and analyze data to help users make informed decisions. A data warehouse integrates data from multiple sources to support analysis and decision making. It provides a single, consistent view of business data over time to help analyze trends and strategies. Data marts contain a subset of data focused on a particular subject like sales or finance to support tactical decision making.

Uploaded by

gauravkhunt110

1.

Data, Information, Knowledge


 Data
 Items that are the most elementary descriptions of things, events, activities, and transactions
 May be internal or external
 Information
 Organized data that has meaning and value
 Knowledge
 Processed data or information that conveys understanding, experience, or learning applicable to a problem or activity
Life cycle of data
Why do we need BI?
 The 21st century is regarded as the age of information technology, so the ability to use data and information in real time has become key to the success of every organization.
 With the recent revolution in internet technologies, the amount of information generated every second is enormous.
 The real power, however, does not lie in the data and information themselves. The key lies in turning those petabytes of data into valuable products and services, and that is where business intelligence tools are powerful.
What is business intelligence
 Business Intelligence (BI) is –
 New technology for understanding the past & predicting
the future
 A broad category of technologies that allows for
gathering, storing, accessing & analyzing data to help
business users make better decisions
 The process of BI is based on the transformation of data
to information, then to decisions, and finally to actions.

Figure: business operations generate raw data through manual or IT systems.
Objective of BI

 BI projects: to deliver a certain amount of value in a predefined period
Some BI tools
 Business intelligence software helps companies gain insight into their overall growth, sales trends, and customer behavior.
 Microsoft BI platform
 Oracle BI
 Pentaho
 SAP business intelligence
 WebFOCUS
The Benefits of BI
 Time savings
 Single version of truth
 Improved strategies and plans
 Improved tactical decisions
 More efficient processes
 Cost savings
 Faster, more accurate reporting
 Improved decision making
 Improved customer service
 Increased revenue
Data Analysis Problems
 The same data is found in many different systems
 Example: customer data across different stores and departments
 The same concept is defined differently
 Heterogeneous sources
 Relational DBMS, On-Line Transaction Processing (OLTP)
 Unstructured data in files (e.g., MS Word)
 Data quality is bad
 Missing data, imprecise data, different use of systems
 Data are “volatile”
 Data are deleted from operational systems (e.g., after 6 months)
 Data change over time, and no historical information is kept

A good DW is a prerequisite for successful BI


Data warehousing
 A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries, and decision making.
 Data warehouse systems provide multidimensional data analysis capabilities, collectively referred to as online analytical processing (OLAP).
 Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
 Data in a data warehouse are stored under a unified schema.
 To facilitate decision making, the data in a data warehouse are organized around major subjects (e.g., customer, item, supplier, and activity). The data are stored to provide information from a historical perspective, such as the past 6 to 12 months, and are typically summarized.
Characteristics of Data Warehouse
 Subject oriented. A data warehouse can be used to analyze a particular subject area. For example, sales can be a particular subject.
 Integrated. All inconsistencies in naming conventions and value representations are removed.
 Nonvolatile. Data are stored in read-only format and do not change over time.
 Time variant. Data are historical rather than only current, and are normally kept as time series.
 Summarized. Operational data are mapped into a decision-usable format.
 Large volume. Time series data sets are normally quite large.
 Not normalized. DW data can be, and often are, redundant.
 Metadata. Data about data are stored.
 Data sources. Data come from internal and external unintegrated operational systems.
Data Warehouse
“A DW is a
 subject-oriented,
 integrated,
 time-varying,
 non-volatile
collection of data that is used primarily in organizational
decision making.”

-- Bill Inmon, Building the Data Warehouse 1996


Data Warehouse vs. Operational DBMS
Parameter           OLTP                        OLAP
User                Clerk, IT professional      Knowledge worker
Function            Day-to-day operations       Decision support
DB design           Application-oriented        Subject-oriented
Data                Current, isolated           Historical, consolidated
View                Detailed, flat relational   Summarized, multidimensional
Usage               Structured, repetitive      Ad hoc
Unit of work        Short, simple transaction   Complex query
Access              Read/write                  Read mostly
Operations          Index/hash on primary key   Lots of scans
# Records accessed  Tens                        Millions
# Users             Thousands                   Hundreds
DB size             100 MB to GB                100 GB to TB
Metric              Transaction throughput      Query throughput, response
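The contrast in the "Unit of work" and "Operations" rows can be sketched with Python's built-in sqlite3 module. This is only an illustration: the orders table, its columns, and its rows are hypothetical, and a real warehouse would keep OLTP and OLAP workloads in separate systems.

```python
import sqlite3

# Hypothetical single-table example; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, city TEXT, quarter TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Toronto", "Q1", 100.0),
        (2, "Toronto", "Q2", 150.0),
        (3, "Vancouver", "Q1", 200.0),
        (4, "Vancouver", "Q2", 50.0),
    ],
)

# OLTP-style unit of work: a short read/write transaction on one row,
# located through the primary key.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE order_id = ?", (1,))
oltp_row = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style unit of work: a read-only query that scans many rows and
# returns a summarized view.
olap_rows = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
```

The first query touches a single record, while the second reads every row to produce one total per city.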
Advantages of Data Warehousing
 Timely access to data
 Enhanced data consistency and quality
 Enhanced business intelligence
 Advanced query processing
 Retention of data history
 Disaster recovery implications
Data warehousing usage
 Information processing: supports querying, basic statistical analysis, and reporting.
 Analytical processing: multidimensional analysis of data warehouse data. Supports OLAP operations.
 Data mining: knowledge discovery from data. Supports association, classification, and clustering, and presents data mining results using visualization tools.
 Data warehousing provides the enterprise with a memory. Data mining provides the enterprise with intelligence.
Components of Data Warehouse
 Operational Source System
 The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing.
 Data Staging Area
 The data staging area is both a storage area and a set of ETL processes that extract data from the source systems. It is everything between the source systems and the data warehouse.
 The data staging area is never used for reporting. Data is extracted from the source systems, then stored, cleansed, and transformed in the staging area before being loaded into the data warehouse.
 Data Presentation Area
 The data presentation area is what is generally called the data warehouse. It is where cleaned, transformed data is stored in a dimensionally structured warehouse and made available for analysis.
 Data Access Tools
 Once data is available in the presentation area, it is accessed using data access tools such as Business Objects.
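The flow through these components can be sketched in plain Python. This is a minimal sketch under stated assumptions: the source rows, the function names (extract, cleanse, transform, load), and the cleansing rule are all hypothetical, and real ETL tools do far more.

```python
# Hypothetical rows as extracted from operational source systems.
source_rows = [
    {"store": "S1", "product": " modem ", "amount": "120.50"},
    {"store": "S2", "product": "Mobile", "amount": None},  # missing amount
    {"store": "S1", "product": "MOBILE", "amount": "300"},
]

def extract(rows):
    """Copy source rows into the staging area unchanged."""
    return [dict(row) for row in rows]

def cleanse(staged):
    """Drop rows with a missing amount (one possible cleansing rule)."""
    return [row for row in staged if row["amount"] is not None]

def transform(staged):
    """Standardize product names and convert amounts to numbers."""
    return [
        {
            "store": row["store"],
            "product": row["product"].strip().lower(),
            "amount": float(row["amount"]),
        }
        for row in staged
    ]

def load(warehouse, rows):
    """Append the prepared rows to the presentation area."""
    warehouse.extend(rows)

staging_area = extract(source_rows)
presentation_area = []
load(presentation_area, transform(cleanse(staging_area)))
```

Reports would read only from presentation_area; the staging area still holds the raw, uncleansed rows and is not queried directly.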
Three tier data warehouse architecture
 Bottom Tier
The bottom tier of the architecture is the data warehouse database server. It performs the extract, clean, load, and refresh functions.
 Middle Tier
In the middle tier, we have the OLAP server, which can be implemented in either of the following ways:
 Multidimensional OLAP (MOLAP), which directly implements multidimensional data and operations.
 Relational OLAP (ROLAP), which is an extended relational database management system. ROLAP maps operations on multidimensional data to standard relational operations.
 Top Tier
This tier is the front-end client layer. It holds the query tools, reporting tools, analysis tools, and data mining tools.
Data Mart
 A data mart is a simple form of a Data Warehouse. It is
focused on a single subject. Data Mart draws data from only
a few sources. These sources may be central Data
warehouse, internal operational systems, or external data
sources.
 It is subject-oriented, and it is designed to meet the needs
of a specific group of users.
 A data mart is a segment of a data warehouse that can
provide data for reporting and analysis on a section, unit,
department or operation in the company, e.g. sales, payroll,
production.
Data Warehouse vs Data Mart
 Data warehouse:
 Holds multiple subject areas
 Holds very detailed information
 Helps in making strategic decisions
 Works to integrate all data sources
 Does not necessarily use a dimensional model, but feeds dimensional models
 Data mart:
 Often holds only one subject area, for example Finance or Sales
 May hold more summarized data (although many hold full detail)
 Helps in making tactical decisions for the business
 Concentrates on integrating information from a given subject area or set of source systems
 Is built around a dimensional model using a star schema
What are tactical and strategic decisions?
 Tactical decisions are short-term decisions, because the alternatives are selected within a limited time frame.
 Strategic decisions are generally long-term decisions, because the alternative is selected from among different strategies.
Reasons for creating a data mart
 Easy access to frequently needed data
 Creates collective view by a group of users
 Improves end-user response time
 Ease of creation
 Lower cost than implementing a full data warehouse
 Potential users are more clearly defined than in a full
data warehouse
 Contains only business essential data and is less
cluttered.
Types of Data Marts
 There are three main types of data marts:
 Dependent: A dependent data mart sources the organization's data from a single data warehouse.
 Independent: An independent data mart is created without the use of a central data warehouse.
 This kind of data mart is an ideal option for smaller groups within an organization.
 An independent data mart has no relationship with the enterprise data warehouse or with any other data mart. Its data is input separately, and its analyses are also performed autonomously.
 Hybrid: A hybrid data mart combines input from the data warehouse with input from other sources.
 This can be helpful when you want ad hoc integration, for example after a new group or product is added to the organization.
 It is best suited for multiple-database environments and for organizations that need fast implementation turnaround.
 It also requires the least data cleansing effort. A hybrid data mart also supports large storage structures, and it is best suited for flexible, smaller data-centric applications.
Hybrid Data Marts
Steps to Implement Data Mart
 Designing
 Gathering the business and technical requirements
 Identifying data sources
 Selecting the appropriate subset of data
 Designing the logical and physical structure of the data mart
 Constructing
 This step includes creating the physical database and the logical
structures associated with the data mart to provide fast and efficient
access to the data. This step involves the following tasks:
 Creating the physical database and storage structures, such as
tablespaces, associated with the data mart
 Creating the schema objects, such as tables and indexes defined in
the design step
 Determining how best to set up the tables and the access structures
 Populating
 The populating step covers all of the tasks related to getting the
data from the source, cleaning it up, modifying it to the right format
and level of detail, and moving it into the data mart.
 Mapping data sources to target data structures
 Extracting data
 Cleansing and transforming the data
 Loading data into the data mart
 Creating and storing metadata
 Accessing
 The accessing step involves putting the data to use: querying the data,
analyzing it, creating reports, charts, and graphs, and publishing these
 Set up an intermediate layer for the front-end tool to use. This layer,
the metalayer, translates database structures and object names into
business terms, so that the end user can interact with the data mart
using terms that relate to the business function.
 Maintain and manage these business interfaces.
 Set up and manage database structures, like summarized tables,
that help queries submitted through the front-end tool execute
quickly and efficiently.
 Managing
 This step involves managing the data mart over its lifetime. In this
step, you perform management tasks such as the following:
 Providing secure access to the data
 Managing the growth of the data
 Optimizing the system for better performance
 Ensuring the availability of data even with system failures
Metadata in Data Warehouse
 Metadata is simply defined as data about data. Data that is used to represent other data is known as metadata. For example, the index of a book serves as metadata for the contents of the book.
 In other words, metadata is the summarized data that leads us to the detailed data.
 In terms of a data warehouse, we can define metadata as follows:
 Metadata is the road map to a data warehouse.
 Metadata in a data warehouse defines the warehouse objects.
 Metadata acts as a directory. This directory helps the decision support system locate the contents of a data warehouse.
Categories of Metadata
 Metadata can be broadly grouped into three categories:
 Business Metadata - It has the data ownership information, business definitions, and changing policies.
 Technical Metadata - It includes database system names, table and column names and sizes, data types, and allowed values. Technical metadata also includes structural information such as primary and foreign key attributes and indices.
 Operational Metadata - It includes information on whether the data is active or archived.
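One way to picture the three categories is as a single metadata record per warehouse table. The sketch below is only illustrative: the record layout, the sales_fact table, and the describe helper are hypothetical and do not follow any metadata standard.

```python
# Hypothetical metadata record for one warehouse table; all names and
# values are illustrative.
sales_fact_metadata = {
    "business": {
        "owner": "Sales department",
        "definition": "One row per line item sold",
    },
    "technical": {
        "table": "sales_fact",
        "columns": {"date_key": "INTEGER", "item_key": "INTEGER", "amount": "REAL"},
        "primary_key": ["date_key", "item_key"],
    },
    "operational": {"status": "active"},
}

def describe(metadata):
    """Use the metadata as a directory entry to locate and describe a table."""
    tech = metadata["technical"]
    columns = ", ".join(f"{name} {dtype}" for name, dtype in tech["columns"].items())
    return f"{tech['table']}({columns}) [{metadata['operational']['status']}]"

summary = describe(sales_fact_metadata)
```

Here the technical part tells a tool where the data lives and how it is typed, while the business and operational parts tell a person what the data means and whether it is still in use.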
 Role of Metadata
Challenges for Metadata Management
 Metadata helps drive the accuracy of reports, validates data transformation, and ensures the accuracy of calculations. Metadata also enforces the definitions of business terms for business end users. Along with these uses, metadata also has its challenges.
 Metadata in a big organization is scattered across the organization, spread over spreadsheets, databases, and applications.
 Metadata may be present in text files or multimedia files. To use this data for information management solutions, it has to be correctly defined.
 There are no industry-wide accepted standards. Data management solution vendors have a narrow focus.
 There are no easy and accepted methods of passing metadata.
Dimensional Modeling
 The dimensional data model is the one most often used in data warehousing systems.
 Dimensional modeling always uses the concepts of facts (measures) and dimensions (context).
 Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions.
Dimensional modeling process

 Choose the business process


 Declare the grain
 Identify the dimensions
 Identify the fact
 Choose the business process: the design builds on the actual business process that the data warehouse should cover. This could, for instance, be a sales situation in a retail store.
 Declare the grain: the grain of the model is the exact description of what the dimensional model should focus on. This could, for instance, be “an individual line item on a customer slip from a retail store”.
 Identify the dimensions: dimensions are the foundation of the fact table and are where the data for the fact table is collected. Typically, dimensions are nouns like date, store, inventory, etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month, and weekday.
 Identify the facts: this step identifies the numeric facts that will populate each fact table row. It is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse.
Data Warehouse schema
 Schema is a logical description of the entire database.
 Data warehouse uses:
 Star
 Snowflake
 Fact Constellation schema.
Star Schema
 Each dimension in a star schema is represented by only one dimension table.
 This dimension table contains the set of attributes.
 The following diagram shows the sales data of a company with respect to the four dimensions, namely time, item, branch, and location.
 Usually the fact tables in a star schema are in third normal form (3NF), whereas dimension tables are denormalized.
 Although the star schema is the simplest architecture, it is the most commonly used nowadays and is recommended by Oracle.
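A star schema over the four dimensions named above (time, item, branch, location) could look roughly like the following sqlite3 sketch. All table names, column names, and rows are assumptions made for illustration; they are not taken from the missing diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, supplier_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Mobile', 'Acme'), (2, 'Modem', 'Acme');
INSERT INTO branch_dim VALUES (1, 'B1');
INSERT INTO location_dim VALUES (1, 'Toronto', 'Canada'), (2, 'Vancouver', 'Canada');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 10, 1000.0),
                              (1, 2, 1, 2, 5, 250.0),
                              (2, 1, 1, 1, 4, 400.0);
""")

# Each dimension is reached from the fact table with a single join.
star_rows = conn.execute("""
    SELECT i.item_name, l.city, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN item_dim i ON i.item_key = f.item_key
    JOIN location_dim l ON l.location_key = f.location_key
    GROUP BY i.item_name, l.city
    ORDER BY i.item_name, l.city
""").fetchall()
```

Note that item_dim repeats the supplier name on every item row; that redundancy is what keeps the dimension in a single, denormalized table.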
Snowflake Schema

 Some dimension tables in the snowflake schema are normalized.
 The normalization splits up the data into additional tables.
 Because of this normalization, the snowflake schema reduces redundancy.
 However, the snowflake structure can reduce the effectiveness of browsing, since more joins are needed to execute a query.
 Unlike in the star schema, the dimension tables in a snowflake schema are normalized. For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely the item and supplier tables.
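The item and supplier split described above can be sketched with sqlite3 as follows. The table names, columns, and rows are hypothetical; the point is only that reaching the supplier name now takes two joins from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier attributes are moved out of the item dimension into their own table.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL
);
INSERT INTO supplier_dim VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO item_dim VALUES (1, 'Mobile', 1), (2, 'Modem', 1), (3, 'Router', 2);
INSERT INTO sales_fact VALUES (1, 1000.0), (2, 250.0), (3, 300.0), (1, 400.0);
""")

# Sales per supplier: fact -> item -> supplier, one more join than in a
# star schema that stores the supplier name directly on the item row.
snowflake_rows = conn.execute("""
    SELECT s.supplier_name, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN item_dim i ON i.item_key = f.item_key
    JOIN supplier_dim s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_name
    ORDER BY s.supplier_name
""").fetchall()
```

Each supplier name is stored once, which is the reduced redundancy, at the cost of the extra join.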
 Fact Constellation Schema
 A fact constellation has multiple fact tables. This kind of schema can be viewed as a collection of stars, and hence it is also known as a galaxy schema.
 The following diagram shows two fact tables, namely sales and shipping.
Types of OLAP Servers

We have four types of OLAP servers:


 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
 Specialized SQL Servers
Relational OLAP
 Relational On-Line Analytical Processing (ROLAP) works mainly with data that resides in a relational database, where the base data and dimension tables are stored as relational tables. ROLAP servers are placed between the relational back-end server and the client front-end tools.
 It is efficient in managing both numeric and textual data.
 ROLAP applications show slower performance than other styles of OLAP tools.
 Multidimensional OLAP
 In MOLAP, data are pre-summarized and stored in an optimized format in a multidimensional cube, instead of in a relational database.
 In this type of model, data are structured into proprietary formats in accordance with a client’s reporting requirements, with the calculations pre-generated on the cubes.
 This is probably the best OLAP tool for producing analysis reports, since it lets users easily reorganize or rotate the cube structure to view different aspects of the data. This is done by slicing and dicing.
 MOLAP analytic tools are also capable of performing complex calculations. Since calculations are predefined when the cube is created, computed data is returned faster.
 One primary weakness is that a MOLAP tool is less scalable than a ROLAP tool.
 The MOLAP approach also introduces data redundancy.
 Hybrid OLAP (HOLAP)
 Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP and the faster computation of MOLAP.
 Specialized SQL Servers
 Specialized SQL servers provide advanced query language
and query processing support for SQL queries over star and
snowflake schemas in a read-only environment.
OLAP operations
 What is OLAP?
 Online Analytical Processing (OLAP) is based on the multidimensional data model, which allows users to extract and view data from different points of view. OLAP data is stored in a multidimensional database.
 Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on multidimensional data.

 Here is the list of OLAP operations:

 Roll-up
 Drill-down
 Slice and dice
 Pivot (rotate)
Roll-up
 Roll-up performs aggregation on a data cube in either of the following ways:
 By climbing up a concept hierarchy for a dimension
 By dimension reduction
 In the example, roll-up is performed by climbing up a concept hierarchy for the dimension location.
 On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
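Rolling up from city to country can be sketched in plain Python. The cube cells, the numbers in them, and the city_to_country mapping below are hypothetical sample data, not values from the original diagram.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Vancouver", "Q1", "Mobile"): 825,
    ("Chicago", "Q1", "Mobile"): 440,
    ("New York", "Q1", "Mobile"): 1087,
}

# Concept hierarchy for the location dimension: city -> country.
city_to_country = {
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

def roll_up_location(cells, hierarchy):
    """Aggregate cells by replacing each city with its parent country."""
    rolled = defaultdict(int)
    for (city, quarter, item), units in cells.items():
        rolled[(hierarchy[city], quarter, item)] += units
    return dict(rolled)

by_country = roll_up_location(cube, city_to_country)
```

The result has fewer, coarser cells: one per country instead of one per city.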
Drill-down

 Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
 By stepping down a concept hierarchy for a dimension
 By introducing a new dimension
 In the example, drill-down is performed by stepping down a concept hierarchy for the dimension time.
 On drilling down, the time dimension is descended from the level of quarter to the level of month.
 It navigates from less detailed data to more detailed data.
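Drilling down from quarter to month can be sketched as follows. The sketch assumes the month-level detail is still available, since drill-down cannot recreate detail that was never stored; the data and the month_to_quarter mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail data at the month level: (month, item) -> units sold.
monthly = {
    ("Jan", "Mobile"): 150, ("Feb", "Mobile"): 100, ("Mar", "Mobile"): 150,
    ("Apr", "Mobile"): 200, ("May", "Mobile"): 120, ("Jun", "Mobile"): 180,
}

# Concept hierarchy for the time dimension: month -> quarter.
month_to_quarter = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}

def quarterly_view(cells):
    """The summarized (rolled-up) view at the quarter level."""
    view = defaultdict(int)
    for (month, item), units in cells.items():
        view[(month_to_quarter[month], item)] += units
    return dict(view)

def drill_down(cells, quarter):
    """Step down from one quarter to its month-level cells."""
    return {
        key: units
        for key, units in cells.items()
        if month_to_quarter[key[0]] == quarter
    }

quarters = quarterly_view(monthly)
q1_months = drill_down(monthly, "Q1")
```

The quarterly view shows one total per quarter, and drilling into Q1 expands that single number back into its three monthly values.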
Slice
 The slice operation selects one particular dimension from a given cube and provides a new sub-cube.
 Here, slice is performed for the dimension "time" using the criterion time = "Q1".
 It forms a new sub-cube by fixing a single value on the selected dimension.
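A slice on time = "Q1" can be sketched in plain Python. The cube contents and the slice_time helper are hypothetical; the sketch drops the time dimension from the result because it has been fixed to a single value.

```python
# Hypothetical cube cells: (location, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Mobile"): 680,
    ("Vancouver", "Q1", "Modem"): 825,
    ("Vancouver", "Q2", "Modem"): 952,
}

def slice_time(cells, quarter):
    """Fix the time dimension to one value and drop it from the keys."""
    return {
        (location, item): units
        for (location, time, item), units in cells.items()
        if time == quarter
    }

q1_slice = slice_time(cube, "Q1")
```

The three-dimensional cube becomes a two-dimensional table of location by item for the first quarter.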
Dice

 Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.
 The dice operation on the cube based on the following selection criteria involves three dimensions:
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
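The three selection criteria above can be applied in plain Python as a filter over cube cells. The cube contents are hypothetical sample data, and dice is an illustrative helper name.

```python
# Hypothetical cube cells: (location, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q3", "Mobile"): 812,
    ("Vancouver", "Q2", "Modem"): 952,
    ("Chicago", "Q1", "Modem"): 440,
    ("Vancouver", "Q1", "Phone"): 14,
}

def dice(cells, locations, quarters, items):
    """Keep only the cells whose value on every dimension is in the allowed set."""
    return {
        key: units
        for key, units in cells.items()
        if key[0] in locations and key[1] in quarters and key[2] in items
    }

sub_cube = dice(
    cube,
    locations={"Toronto", "Vancouver"},
    quarters={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
```

Unlike a slice, the result keeps all three dimensions; it is simply a smaller cube.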
Pivot
 The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of the data.
 In the example, the item and location axes of a 2-D slice are rotated.
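Rotating the item and location axes of a 2-D slice amounts to swapping rows and columns. In this sketch the slice is a nested dictionary with hypothetical values, and pivot is an illustrative helper name.

```python
# Hypothetical 2-D slice: rows are locations, columns are items.
slice_2d = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 680, "Modem": 952},
}

def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

by_item = pivot(slice_2d)
```

No values change; only the presentation does, and pivoting twice returns the original table.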


APPLICATION OF OLAP:
 OLAP is widely used in several realms of data management. Some of these applications include:
 Financial Applications
 Activity-based costing (resource allocation)
 Budgeting
 Marketing/Sales Applications
 Market Research Analysis
 Sales Forecasting
 Promotions Analysis
 Customer Analyses
 Market/Customer Segmentation
 Business modeling
 Simulating business behavior
 Extensive, real-time decision support system for managers.
