Mod 4 Insights - Instructor
Data Mining
Module 4: Insights and Decisions
Data Mining Introduction
• With unsupervised learning we don’t know the answer – we ask the algorithm to
find patterns in the data of which we are unaware.
• Cluster analysis and association rules are based on unsupervised learning.
• For example, we may have 10,000 customers and are trying to discover if there are
certain segments in the data that would enable the company to more effectively
market to 5-8 ‘types’ of customers. The algorithm assigns each customer to one
of the segments.
Source: bigdata-madesimple.com
Unsupervised Learning II
• K-means Clustering:
[Figure: scatter plot, Spending of 10 Customers]
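The clustering idea can be sketched in a few lines of Python. This is a minimal illustration of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points), not the tool used in the course. The ten customer spending values, the choice of two clusters, and the starting centroids are all made up for illustration.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical spending of 10 customers in two categories (made-up values).
customers = [
    (100, 150), (150, 100), (200, 200), (120, 220), (180, 130),
    (600, 700), (650, 650), (700, 800), (620, 780), (680, 720),
]

# Assumption: two segments, seeded with the first and last customer.
labels, centroids = kmeans(customers, [customers[0], customers[-1]])
print(labels)
print(centroids)
```

Note that nobody tells the algorithm which segment a customer belongs to; the groups emerge from the distances alone. In practice the number of clusters and the starting centroids are choices the analyst has to make.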
Unsupervised Learning III
[Figure: scatter plot, Spending of 10 Customers]
Unsupervised Learning IV
Supervised Learning
• Supervised learning uses data mining methods to predict an outcome from a set of input
variables, or features.
• We fit a model to data for which the outcome is already known.
• For example, a linear regression is a form of supervised learning.
• Another form of supervised learning is used in image recognition – where the algorithm is
given a selection of ‘correct’ images, and then has to find new ‘correct’ images from a set.
The algorithm uses correct and incorrect images in order to better determine the features
that differentiate one type of image from another.
Source: bigdata-madesimple.com
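As a small illustration of supervised learning, the sketch below fits a simple linear regression by ordinary least squares and then predicts the outcome for a new input. The feature (store visits), the outcome (spend), and all values are hypothetical and chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: the outcome (spend) is known for every input (visits).
visits = [1, 2, 3, 4, 5]
spend = [30, 50, 70, 90, 110]

slope, intercept = fit_line(visits, spend)

# Predict the outcome for a new customer with 6 visits.
predicted = slope * 6 + intercept
print(slope, intercept, predicted)
```

The key difference from clustering is that the known outcomes guide the fit: the model is judged by how well it reproduces answers we already have.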
Supervised Learning II
New Customer
Source: bigdata-madesimple.com
Mar I/O
Mar I/O Questions
Source: meconferences.com
Data Mining Applications
CRoss InduStry Process for Data Mining (CRISP-DM).
BI Type | BI description
Descriptive | Such as clustering – find patterns in groups of customers (what segments exist in our customer set?).
Diagnostic | Such as decision trees (what characteristics define a customer segment?).
Predictive | Such as classification (what marketing activity should we engage in with this customer segment?).
Decision Tree
This tree follows the structure described:
Question:
Active Calculations
For example, there may be a computed field for year-over-year growth
as a percentage. Sales growth for a product line might be 20% overall.
When you drill down into the data, the growth percentage is recalculated
to reflect the dimensions the user is looking at.
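A rough sketch of an active calculation: the same growth formula is re-evaluated on whatever slice of the data the user has drilled into. The field names, the `yoy_growth` helper, and the sales figures are hypothetical and are not taken from any particular OLAP tool.

```python
FIELDS = ("year", "product_line", "country", "amount")

# Made-up sales facts: (year, product_line, country, amount).
sales = [
    (2016, "Bikes", "US", 100), (2017, "Bikes", "US", 130),
    (2016, "Bikes", "CA", 100), (2017, "Bikes", "CA", 110),
]

def yoy_growth(rows, prev_year, year, **filters):
    """Year-over-year growth in percent for the rows matching the filters."""
    def total(target_year):
        result = 0
        for row in rows:
            rec = dict(zip(FIELDS, row))
            if rec["year"] == target_year and all(
                rec[key] == value for key, value in filters.items()
            ):
                result += rec["amount"]
        return result

    prev, cur = total(prev_year), total(year)
    return (cur - prev) * 100 / prev

overall = yoy_growth(sales, 2016, 2017, product_line="Bikes")
us_only = yoy_growth(sales, 2016, 2017, product_line="Bikes", country="US")
ca_only = yoy_growth(sales, 2016, 2017, product_line="Bikes", country="CA")
print(overall, us_only, ca_only)
```

With these made-up numbers the product line grows 20% overall, but drilling down by country shows 30% in one market and 10% in the other.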
Mix/Match Dimensions
OLAP makes it easy to move things around – do you want years at the top of
your view or down the side? Do you want to see the data as product by
country, or country by product?
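Swapping dimensions amounts to pivoting the same facts along different axes. The sketch below is a minimal illustration using plain dictionaries; the `pivot` helper, the record fields, and the amounts are assumptions made for the example.

```python
from collections import defaultdict

def pivot(rows, row_dim, col_dim):
    """Sum 'amount' into a nested dict keyed by row dimension, then column dimension."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in rows:
        table[rec[row_dim]][rec[col_dim]] += rec["amount"]
    return {row: dict(cols) for row, cols in table.items()}

# Made-up sales records.
records = [
    {"product": "Bikes", "country": "US", "amount": 130},
    {"product": "Bikes", "country": "CA", "amount": 110},
    {"product": "Helmets", "country": "US", "amount": 40},
]

by_product = pivot(records, "product", "country")  # product by country
by_country = pivot(records, "country", "product")  # country by product
print(by_product)
print(by_country)
```

The underlying data never changes; only the arrangement of rows and columns does.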
BI User Groups
BI User Models
• So far, we have talked a lot about data – but how do we start to make this data
available to users?
Business Intelligence is the link between the data warehouse and the user.
What?
Querying vs Reporting (page 2)
• Users will access the data by performing queries (asking questions) or getting
reports (usually automated).
• The same tool (e.g. Tableau, Excel) can be used for both. The main difference is
the level of flexibility versus ease of use (reports can be customized and delivered
in an email).
• Queries can be simple, using a graphical tool such as Tableau (these rely on
metadata). In more sophisticated environments SQL queries can be used directly
with tables, or through a statistical software language like R. The intended use
and user capability will be an important requirement for a BI project. Also, do not
assume that all users will have the same level of capability. The solution will need
to be designed to the lowest common denominator.
• A report is like a repeated query over time, based on the user requirement. For
example, a CFO may want to see a report showing daily sales by business unit,
product line, and country. The format is set up once, and then the report is
updated with new data every day and emailed to the CFO.
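The idea of a report as a repeated query can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample rows below are hypothetical; the point is that the SQL is written once and only the date parameter changes from run to run. Emailing the result is left out.

```python
import sqlite3

# The report format is defined once...
REPORT_SQL = """
SELECT business_unit, product_line, country, SUM(amount) AS total_sales
FROM sales
WHERE sale_date = ?
GROUP BY business_unit, product_line, country
ORDER BY business_unit, product_line, country
"""

def daily_sales_report(conn, day):
    """...and re-run each day against the latest data."""
    return conn.execute(REPORT_SQL, (day,)).fetchall()

# Made-up data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, business_unit TEXT, "
    "product_line TEXT, country TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01", "Retail", "Bikes", "US", 100.0),
        ("2024-01-01", "Retail", "Bikes", "US", 50.0),
        ("2024-01-01", "Retail", "Helmets", "CA", 20.0),
        ("2024-01-02", "Retail", "Bikes", "US", 75.0),
    ],
)

report_day1 = daily_sales_report(conn, "2024-01-01")
report_day2 = daily_sales_report(conn, "2024-01-02")
print(report_day1)
print(report_day2)
```

An ad hoc query, by contrast, would let the user change the SQL itself (the grouping, the filters, the tables) rather than just the date.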
Types of Users
Who?
Types of Users (page 2)
Data Consumers
These users need little more than reporting. They may use BI software to design
simple reports, often based on some sort of template. However, the primary use will be
to ‘check in’ on key data points at regular intervals, using a pre-designed report or
dashboard.
Advanced Users
These users will have some idea of the data (usually a data mart so that the context is
simplified). For example, a salesperson may want to manipulate the view to understand
which customers are buying certain products, growing the fastest in a certain geography,
etc.
Power Users
These users will be developing reports, perhaps arranging for data access, and would be
able to run queries, perhaps using a language such as SQL. For example, the sales department
may have a power user who defines and designs standard reporting for the sales staff.
Four Decision Points
How?
Four Decision Points (page 2)
Cyclical Reports
Regular reports sent out with data updated over time. Non-interactive.
Ad Hoc Queries
This is the key innovation in BI. It provides users with access to the data directly. This
is where data needed for decision making becomes self-service.
Interactive Dashboards
A combination of the two decision points above: dashboards offer the value of a standardized
format, but users can also ‘drill down’ into the data or reorganize variables to investigate
questions.
Conditional Alerts
A type of ‘exception reporting’. When certain conditions are met, an alert is sent to the
user. For example, when the daily sales on a top account fall below a certain level.
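A conditional alert is essentially a rule checked against fresh data. The sketch below assumes a hypothetical per-account minimum for daily sales and simply collects alert messages; the account names, thresholds, and the `sales_alerts` helper are made up, and actual delivery (email, SMS) is omitted.

```python
def sales_alerts(daily_sales, thresholds):
    """Return an alert message for each account whose daily sales fall below its minimum."""
    alerts = []
    for account, minimum in thresholds.items():
        amount = daily_sales.get(account, 0)  # an account with no sales counts as 0
        if amount < minimum:
            alerts.append(f"ALERT: {account} daily sales {amount} below {minimum}")
    return alerts

# Made-up thresholds for top accounts and today's figures.
thresholds = {"Acme": 1000, "Globex": 500}
today = {"Acme": 800, "Globex": 650}

alerts = sales_alerts(today, thresholds)
print(alerts)
```

Only the exception is reported: the account that met its threshold produces no message at all.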
BI Model
[Diagram: source systems (CRM, POS, ERP) are loaded through ETL into the Data Warehouse, which feeds three Data Marts; each Data Mart serves Reports and Queries to one type of user (Data Consumer, Advanced User, Power User).]
Pictures: openclipart.org
Next Week