Mod 4 Insights - Instructor

Business Insights in Business Intelligence

Uploaded by

Devika

Business Intelligence:

Data Mining
Module 4: Insights and Decisions
Data Mining
Data Mining Introduction

Data mining is used to improve decision making by finding useful patterns and insights
from data.

It is an analytic process that examines large amounts of data from different perspectives
and summarizes the data in such a way that useful patterns and relationships are
discovered.

Supervised: Improve a decision model.

Unsupervised: Find patterns in the data.

Source: BABOK v3.


Unsupervised Learning

• With unsupervised learning we don’t know the answer in advance: we ask the algorithm to
find patterns in the data of which we are unaware.
• Cluster analysis and association rules are based on unsupervised learning.
• For example, we may have 10,000 customers and want to discover whether there are
segments in the data that would enable the company to market more effectively
to 5-8 ‘types’ of customers. The algorithm assigns each customer to one
of the segments.

Source: bigdata-madesimple.com
Unsupervised Learning II
• K-means Clustering:
[Figure: scatter plot, "Spending of 10 Customers"]
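
As a rough illustration of k-means, the sketch below groups ten customers by two spending amounts. The data points, the choice of k = 2, and the function name kmeans are assumptions made for illustration; they are not read from the chart. The algorithm alternates between assigning each customer to the nearest cluster center and moving each center to the average of its customers.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels)."""
    # Assumption: use the first k points as starting centroids so the
    # result is deterministic (real tools usually start randomly).
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical spending of 10 customers on two product categories.
customers = [
    (100, 150), (700, 800), (120, 100), (650, 750), (150, 200),
    (720, 700), (90, 180), (680, 820), (130, 130), (750, 780),
]

centroids, labels = kmeans(customers, k=2)
```

With this made-up data the customers fall into a low-spending and a high-spending segment; the number of segments (k) has to be chosen by the analyst.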
Unsupervised Learning III

[Figure: scatter plot, "Spending of 10 Customers"]
Unsupervised Learning IV
Supervised Learning

• Supervised learning uses data mining methods to predict an outcome from a set of input
variables, or features.
• The model is fitted to data for which the outcome is already defined.
• For example, a linear regression is a form of supervised learning.
• Another form of supervised learning is used in image recognition, where the algorithm is
given a selection of ‘correct’ images and then has to find new ‘correct’ images in a set.
The algorithm uses correct and incorrect images to better determine the features
that differentiate one type of image from another.

Source: bigdata-madesimple.com
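
To make the linear regression example concrete, here is a minimal sketch of a one-variable least-squares fit. The data (advertising spend versus sales) and the function name fit_line are hypothetical. The known outcomes in the training data are what make this supervised.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable.

    Returns (slope, intercept) of the best-fitting straight line.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: the outcome (sales) is known for each input.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)

# Predict the outcome for an input value the model has not seen.
predicted_sales = slope * 6 + intercept
```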
Supervised Learning II

New Customer

K-NN: Use observations from the past that are most like the new observation to classify
or predict the value of the target variable.

Regression analysis captures the relationship between two or more variables.
Predict an outcome of a target variable based on several input variables.
Jaggia et al, Business Analytics © 2021 McGraw Hill
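
The K-NN idea can be sketched in a few lines. Everything in the example is assumed for illustration: the two features (age, monthly spend), the "yes"/"no" target, the value k = 3, and the function name knn_classify. The new customer receives the label that is most common among the k most similar past customers.

```python
import math
from collections import Counter

def knn_classify(past, new_point, k=3):
    """Label new_point by majority vote of its k nearest past observations."""
    nearest = sorted(past, key=lambda item: math.dist(item[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical past customers: ((age, monthly spend), responded to offer?)
past_customers = [
    ((25, 40), "no"),
    ((30, 60), "no"),
    ((35, 80), "no"),
    ((45, 150), "yes"),
    ((48, 140), "yes"),
    ((50, 160), "yes"),
]

label_a = knn_classify(past_customers, (47, 145))
label_b = knn_classify(past_customers, (28, 50))
```

In practice the features would normally be rescaled first so that no single variable dominates the distance.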
Reinforcement Learning

• This can build on unsupervised learning and adds a feedback mechanism through which the
algorithm works toward its optimal performance.
• The software itself makes decisions on which path to pursue and learns through
some feedback mechanism.
• For example, Amazon suggests new products to you based on your purchase
history (association rules – unsupervised learning). It could use the feedback
(did you buy the product or not?) to improve future suggestions.

Source: bigdata-madesimple.com
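
The feedback idea can be illustrated with a deliberately simplified sketch. It is not how any real recommender works; the product names, the starting scores, the learning rate, and the function update_score are all assumptions. Each suggestion's score moves up after a purchase and down after a skip, so later suggestions favor what has worked.

```python
def update_score(score, bought, learning_rate=0.2):
    """Nudge a score toward 1 after a purchase and toward 0 after a skip."""
    reward = 1.0 if bought else 0.0
    return score + learning_rate * (reward - score)

# Hypothetical starting scores for two suggested products.
scores = {"headphones": 0.5, "phone case": 0.5}

# Hypothetical feedback: (suggested product, did the customer buy it?)
feedback = [("headphones", True), ("phone case", False), ("headphones", True)]

for product, bought in feedback:
    scores[product] = update_score(scores[product], bought)

# The next suggestion is the product with the highest learned score.
next_suggestion = max(scores, key=scores.get)
```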
Mar I/O
Mar I/O Questions

• What type of machine learning took place?
• Supervised, Unsupervised or Reinforcement?
• What are the main differences between a human learning to play a
video game and Machine Learning?
The Forms of ML

Source: meconferences.com
Data Mining Applications

1. Banking: financial forecasting, credit risk, fraud detection.
2. Target Marketing: market segmentation, customer classification, optimizing
campaign performance.
3. Insurance: categorize groups of customers to determine pricing, fraud,
customer retention.
4. Telecommunications: reduce customer churn, bundling.
5. Operations Management: planning and scheduling, quality control.
6. Retail Sales Forecasting: customer profiling, market basket analysis.
7. Systems Diagnosis: predict faults for preventative maintenance.

Can you think of data mining applications from your life?

How does this relate to social media data collection and privacy?

Source: Sabherwal and Becerra-Fernandez, Business Intelligence. Wiley, 2011.


CRISP-DM Model

CRoss-Industry Standard Process for Data Mining.

This is a standard process used in agile environments.

In building a BI project as part of a Business Analysis solution, keep in mind that
these process steps will need to be followed.
CRISP-DM
Data Mining

• Data mining is used as an input in human decision-making.
• This can be presented in the form of visual dashboards and reports.
• It can also be used in automated decision-making systems (e.g. Amazon
recommendations).

BI Type      BI description
Descriptive  Such as clustering: find patterns in groups of customers (what segments exist in our customer set?).
Diagnostic   Such as decision trees (what characteristics define a customer segment?).
Predictive   Such as classification (what marketing activity should we engage in with this customer segment?).
Decision Tree
This tree follows the structure described:

- Start with the Age feature.
- For Age <= 30, check the Job feature.
    - If the Job is "student," predict "Yes."
    - Otherwise, predict "No."
- For Age > 30, check the Balance feature.
    - If the Balance is > 5000, check the Previous Campaign feature.
        - If Previous Campaign is > 1, predict "Yes."
        - Otherwise, predict "No."
    - If the Balance is <= 5000, check the Marital Status feature.
        - If Marital Status is "single," predict "No."
        - Otherwise, predict "Yes."
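
The tree described above translates directly into nested conditions. In the sketch below the thresholds and outcomes come from the slide; the function name predict and the argument names are assumptions chosen for readability.

```python
def predict(age, job, balance, previous_campaign, marital_status):
    """Walk the decision tree from the root (Age) down to a "Yes"/"No" leaf."""
    if age <= 30:
        # Younger customers: the Job feature decides.
        return "Yes" if job == "student" else "No"
    # Age > 30: the Balance feature decides which branch to follow.
    if balance > 5000:
        return "Yes" if previous_campaign > 1 else "No"
    # Balance <= 5000: the Marital Status feature decides.
    return "No" if marital_status == "single" else "Yes"

# Example: a 25-year-old student ends at a "Yes" leaf.
example = predict(age=25, job="student", balance=0,
                  previous_campaign=0, marital_status="single")
```

Each path from the root to a leaf reads as a rule, which is why decision trees are useful for explaining what characterizes a segment.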
Common Mistakes in Deploying Data Mining Technologies

1. User expectations are too high.
2. Putting the right tools in the wrong hands.
• The right stakeholder level is the line manager.
3. Providing data that users need to figure out how to use.
4. Training users only at the beginning of the project.
5. Going for a quick win rather than planning for the long haul.
6. The organization goes for the big bang.
7. Data roles and governance are not adequately addressed.
8. The organization fails to demonstrate value.

Source: Sabherwal and Becerra-Fernandez, Business Intelligence. Wiley, 2011.


Business Intelligence:
OLAP
Module 4: Insights and Decisions
Online Analytical Processing (OLAP)
• OLAP enables the fast querying of data using a simplified process.
• OLAP stores the data in a more friendly format (called a cube) that is set up
specifically to help support data retrieval in an analytical context.
• It is an interactive solution: you see the results of your query immediately,
to the point that users may not even notice they are querying data.
• OLAP is also supported in Microsoft Excel. It is easy to build PowerPivot
solutions that enable users to select and reformat datasets. However, there
are limits and ‘power users’ may become frustrated.
• OLAP organizes data by measures (numbers) and dimensions (categories),
allowing the user to ‘slice’ categories. Typical categories include region,
product, customer, date. The actual data might be sales dollars or units.
• A slice is a subset of data from a multidimensional array (usually 2
dimensions) corresponding to a single value set for one or more of the
dimensions in the subset.
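
Measures, dimensions, and slices can be sketched with a tiny in-memory 'cube'. The regions, products, years, and sales figures are hypothetical, and the helper names slice_by_year and roll_up are assumptions; a real OLAP engine would pre-aggregate and index this data.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, product, year)
# and one measure (sales dollars).
facts = [
    ("East", "Shirts", 2016, 100),
    ("East", "Shoes", 2016, 150),
    ("West", "Shirts", 2016, 200),
    ("West", "Shoes", 2016, 50),
    ("East", "Shirts", 2017, 120),
    ("West", "Shoes", 2017, 80),
]

def slice_by_year(rows, year):
    """Slice: fix the year dimension, leaving a region-by-product view."""
    view = defaultdict(int)
    for region, product, row_year, sales in rows:
        if row_year == year:
            view[(region, product)] += sales
    return dict(view)

def roll_up(rows, dimension_index):
    """Aggregate the sales measure over one dimension (0=region, 1=product, 2=year)."""
    view = defaultdict(int)
    for row in rows:
        view[row[dimension_index]] += row[3]
    return dict(view)

slice_2016 = slice_by_year(facts, 2016)
sales_by_region = roll_up(facts, 0)
```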
The OLAP Cube

Question:

What other type of data storage solution reminds you of the OLAP cube, in terms of
its purpose and logical location in the data framework?

Source: Business Intelligence, Analytics and Data Science, Pearson, 2018.


The OLAP Cube (page 2)

Size          Units   Price    Sales
Small           403   $19.99   $8055.97
Medium         1840   $19.99   $36781.60
Large          2756   $19.99   $55092.44
Extra Large     311   $21.99   $6838.89

Source: Business Intelligence, Analytics and Data Science, Pearson, 2018.
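
The Sales column in the table is a computed measure: Units multiplied by Price. The sketch below reproduces it from the table's own figures and then aggregates across the Size dimension; the variable names are assumptions, and Decimal is used only to keep the cents exact.

```python
from decimal import Decimal

# Rows taken from the table above: (size, units, unit price).
rows = [
    ("Small", 403, Decimal("19.99")),
    ("Medium", 1840, Decimal("19.99")),
    ("Large", 2756, Decimal("19.99")),
    ("Extra Large", 311, Decimal("21.99")),
]

# Computed measure for each member of the Size dimension.
sales_by_size = {size: units * price for size, units, price in rows}

# Aggregating over all sizes gives the totals one level up.
total_units = sum(units for _, units, _ in rows)
total_sales = sum(sales_by_size.values())
```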


The OLAP Cube (page 3)

Source: Business Intelligence, Analytics and Data Science, Pearson, 2018.


Using OLAP
Drill Down / Drill Up
OLAP enables a user to interact with the data to drill down (go down
one level of detail) and drill up (aggregate several levels of detail).
The levels being drilled through are called hierarchies or parent/child
relationships. A two-dimensional view is a ‘slice’, and a view with more
than two dimensions is a ‘dice’.
Using OLAP (page 2)
Active Calculations
For example, there may be a computed field for year-over-year growth
as a percentage. A product line’s sales growth might be 20% overall.
When you drill down into the data, the growth percentage is recalculated
to reflect the dimensions the user is looking at.

Growth % = ((CY$ - LY$) / LY$) x 100
where CY$ is current-year sales dollars and LY$ is last-year sales dollars.
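
A small worked example shows how an active calculation changes with the level of detail. The product names and sales figures are hypothetical, chosen so that the product line grows 20% overall, and growth_pct is an assumed helper name.

```python
def growth_pct(current_year, last_year):
    """Year-over-year growth as a percentage: ((CY - LY) / LY) x 100."""
    return (current_year - last_year) / last_year * 100

# Hypothetical sales dollars per product: (last year, current year).
product_sales = {"Product A": (100, 130), "Product B": (100, 110)}

# Product-line level: total the sales first, then apply the formula.
total_ly = sum(ly for ly, _ in product_sales.values())
total_cy = sum(cy for _, cy in product_sales.values())
overall_growth = growth_pct(total_cy, total_ly)

# Drill down one level: the same formula is recalculated per product.
growth_by_product = {
    name: growth_pct(cy, ly) for name, (ly, cy) in product_sales.items()
}
```

Here the line grows 20% overall, while drilling down shows 30% for one product and 10% for the other. The percentage is recomputed from the underlying dollars at each level rather than averaged.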


Using OLAP (page 3)
Mix/Match Dimensions
OLAP makes it easy to move things around: do you want years at the top of
your view or down the side? Do you want to see the data as product by
country, or country by product?

[Figure: example views by year, 2010 to 2018]
Module 4: Insights and Decisions
BI User Groups
BI User Models

• So far, we have talked a lot about data – but how do we start to make this data
available to users?

Business Intelligence is the link between the data warehouse and the user.

Source: BABOK v3.


Querying vs Reporting

What?
Querying vs Reporting (page 2)

• Users will access the data by performing queries (asking questions) or getting
reports (usually automated).
• The same tool (e.g. Tableau, Excel) can be used for both. The main difference is
flexibility (queries) versus ease of use (reports, which can be customized and
delivered in an email).
• Queries can be simple, using a graphical tool such as Tableau (these rely on
metadata). In more sophisticated environments SQL queries can be used directly
with tables, or through a statistical software language like R. The intended use
and user capability will be an important requirement for a BI project. Also, do not
assume that all users will have the same level of capability. The solution will need
to be designed to the lowest common denominator.
• A report is like a repeated query over time, based on the user requirement. For
example, a CFO may want to see a report showing daily sales by business unit,
product line, and country. The format is set up once, and then the report is
updated with new data every day and emailed to the CFO.
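
The idea that a report is a query that is defined once and then rerun can be sketched with Python's built-in sqlite3 module. The table layout, column names, sample rows, and the function daily_sales_report are all hypothetical; emailing the result is left out.

```python
import sqlite3

# Hypothetical sales table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, business_unit TEXT, "
    "product_line TEXT, country TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01", "Retail", "Shoes", "CA", 100.0),
        ("2024-01-01", "Retail", "Shoes", "CA", 50.0),
        ("2024-01-01", "Online", "Shirts", "US", 75.0),
        ("2024-01-02", "Retail", "Shoes", "CA", 200.0),
    ],
)

# The report format is set up once as a parameterized query.
REPORT_SQL = """
    SELECT business_unit, product_line, country, SUM(amount)
    FROM sales
    WHERE sale_date = ?
    GROUP BY business_unit, product_line, country
    ORDER BY business_unit, product_line, country
"""

def daily_sales_report(connection, sale_date):
    """Rerun the same query for a given day's data."""
    return connection.execute(REPORT_SQL, (sale_date,)).fetchall()

report_day1 = daily_sales_report(conn, "2024-01-01")
report_day2 = daily_sales_report(conn, "2024-01-02")
```

An ad hoc query, by contrast, would let the user change the SELECT, WHERE, or GROUP BY parts freely each time.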
Types of Users

Who?
Types of Users (page 2)
Data Consumers
These users need little beyond reporting. They may use BI software to design
simple reports, often based on some sort of template. However, the primary use will be
to ‘check in’ on key data points at regular intervals, using a pre-designed report or
dashboard.

Advanced Users
These users will have some idea of the data (usually a data mart so that the context is
simplified). For example, a salesperson may want to manipulate the view to understand
which customers are buying certain products, growing the fastest in a certain geography,
etc.

Power Users
These users will be developing reports, perhaps arranging for data access, and would be
able to run queries, perhaps using a language such as SQL. For example, the sales department
may have a power user who defines and designs standard reporting for the sales staff.
Four Decision Points

How?
Four Decision Points (page 2)

Cyclical Reports
Regular reports sent out with data updated over time. Non-interactive.

Ad Hoc Queries
This is the key innovation in BI. It provides users with access to the data directly. This
is where data needed for decision making becomes self-service.

Interactive Dashboards
A combination of the above two decision points: dashboards offer the value of a
standardized format, but users can also ‘drill down’ into the data or reorganize
variables to investigate questions.

Conditional Alerts
A type of ‘exception reporting’. When certain conditions are met, an alert is sent to the
user. For example, when the daily sales on a top account fall below a certain level.
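
A conditional alert is essentially a rule checked against fresh data. In the sketch below the account names, thresholds, message format, and the function sales_alerts are assumptions; delivery of the alert (email, dashboard flag) is not shown.

```python
def sales_alerts(daily_sales, thresholds):
    """Return an alert message for each watched account below its minimum."""
    alerts = []
    for account, sales in daily_sales.items():
        minimum = thresholds.get(account)
        # Only accounts with a defined threshold are monitored.
        if minimum is not None and sales < minimum:
            alerts.append(
                f"ALERT: {account} daily sales {sales} fell below {minimum}"
            )
    return alerts

# Hypothetical minimum daily sales for the top accounts.
thresholds = {"Acme": 1000, "Globex": 500}

# Hypothetical sales figures for today.
today = {"Acme": 850, "Globex": 700, "Initech": 10}

alerts = sales_alerts(today, thresholds)
```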
BI Model

[Figure: BI model]
Sources (CRM, POS, ERP) -> ETL -> Data Warehouse -> Data Marts
Data Mart -> Report -> Data Consumer
Data Mart -> Query, Report -> Advanced User
Data Mart -> Query, Report -> Power User

Pictures: openclipart.org
Next Week

• Business Intelligence Strategy
