Topic 1 Etw3482

The general procedures for data mining include: 1) Asking a business question to define the goal. 2) Collecting and preparing relevant data from multiple sources. 3) Exploring the data to identify patterns and anomalies and determine appropriate variables. 4) Modeling the data using supervised or unsupervised learning methods to develop and validate predictive models.


Topic 1

Introduction to Data Mining

Definition of Data Mining

By Ts Dr Lee How Chinh


What is Data Mining?
DIKW Pyramid

Wisdom

Knowledge

Information

Data

Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: Concepts, drivers & techniques. Boston:
Prentice Hall, ServiceTech Press.
Data Mining
• A synonym for “Knowledge Discovery From Data” or KDD.
• An interdisciplinary subject of computer science and statistics.
• It contains the knowledge discovery steps.
• The process of discovering interesting patterns and knowledge
from large amounts of data.
• Consists of many analytics methods, but they can be categorised
into two broad categories:
o Pattern Discovery
o Predictive Modelling.
What types of data can be mined?
• Data that are meaningful for the application.
• Basic forms of data are database data, data warehouse
data and transactional data.
• Other forms of data, like data streams, sequence data,
graph or networked data, spatial data, text data, etc.
DIKW Pyramid in Business Architecture

Wisdom: Strategic (CSFs); Judgement (Constraints)
Knowledge: Tactical (KPIs); Action (Adjustments)
Information: Operational (PIs/Metrics); Experience (Results)
Data: Events

Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: Concepts, drivers & techniques. Boston: Prentice Hall, ServiceTech Press.
Lesson Summary

• DIKW pyramid
• Data mining is a knowledge discovery
process
• There are two groups of analytics
methods in data mining: pattern discovery
and predictive modelling.
Why is data mining becoming popular?
The reasons data mining is becoming important
The reasons can be categorised as
• Technical reasons
• Business operational reasons.
Technical reasons
• Connectivity is becoming a norm
o Mobile internet, Wi-Fi, etc.
• The popularity of smart devices
• Data storage is getting cheaper and easier to access.
Technical reasons

These factors make high-quality data readily available and easy to obtain and store.
Business operational reasons
• Globalisation in business
• The complexity of business operations
• The competitive business environment
• Changes in customers’ expectations
• The influence of information on customers’ behaviour
• ...
Business Operational Reasons:
Globalisation in business

• Opportunities to globalise a business are more accessible.
• The outcome is an increased flow of goods, services, capital, people, and ideas.
Business Operational Reasons:
The complexity in business operations
• With the advancement of technologies, business
opportunities are limitless.
• Customers’ preferences and technologies change rapidly, which
makes it challenging to anticipate trends in products and services.
• Business operations such as marketing, sales, inventory
management and order management need to be highly
sensitive to markets and flexible to change.
Business Operational Reasons:
The competitive business environment
• The data, resources and opportunities are easily
accessible by competitors.
• Customers can easily access and compare the products
and services.
• The abundance of information makes it harder to
meet customers’ expectations.
• The cost to retain a customer is increasing.
Business operational reasons
Decision-makers need to make faster and more accurate decisions,
and the cost of making a wrong decision is getting higher.
Lesson Summary

• Technical factors make data readily available and easy to store.
• Business operational pressures create the need for data mining in order to survive.
Can you identify examples of data mining
applications in business from your own experience?

How does it benefit the business?

And how does it benefit you as a customer?


The benefits and business applications of data mining
The benefits of data mining in business
Some specific benefits associated with successful data
mining applications in business are listed below:
• Increase customer acquisition and retention
• Uncover and reduce fraud
• Improve production quality and minimise production losses in
manufacturing
• Increase upselling and cross-selling
• Sell products and services in preferred combinations
Data mining applications in business
Data mining has many different applications; some are
listed below:
• Product recommendation
• Detecting fraudulent credit card transactions
• Customer churn prediction
• Employee attrition prediction
• Customer segmentation
• Targeted marketing
• the list goes on...
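As a rough illustration of the first application in the list, product recommendation can be sketched as counting which products appear together in past baskets. The baskets, the product names and the `recommend` helper are all hypothetical and chosen only for this example; production recommenders use far richer methods.

```python
from collections import Counter
from itertools import permutations

# Hypothetical transaction data: each set is one customer's basket.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "cereal"},
    {"bread", "butter", "milk"},
]

# Count how often each ordered pair (a, b) appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for a, b in permutations(sorted(basket), 2):
        pair_counts[(a, b)] += 1


def recommend(product, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    scored = [(count, other) for (p, other), count in pair_counts.items() if p == product]
    # Sort by descending count, then alphabetically so ties are deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in scored[:top_n]]


print(recommend("bread"))
```

Here butter appears with bread in three baskets and milk in two, so those two products are suggested first for a customer buying bread.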
Lesson Summary

• The benefits of applying data mining in business
are minimising costs, generating revenue, and
improving customer experiences.
• There are many data mining applications in business;
the only limit is our creativity.
The General Procedures for Data Mining
Data Mining Process (a repeating cycle)
1. Ask a Business Question
2. Collect and Prepare the data
3. Explore the data
4. Model the data
5. Implement, Act and Evaluate
Data Mining Process: Ask a Business Question
• What is the goal?
• What do you need? Find associations? Classify? Estimate? Describe?
• Do you have the proper data?
• What actions are planned?


The Business Question for Common Data Mining Applications

Business Question | Application | What is Predicted?
How to better target product/service offers? | Profiling and segmentation | Customer behaviours and needs by segment
Which product/service to recommend? | Cross-sell and up-sell | Probable customer purchases
How to grow and maintain valuable customers? | Acquisition and retention | Customer preferences and purchase patterns
How to direct the right offer to the right person at the right time? | Campaign management | The success of customer communications
How to minimise operational disruptions and maintenance costs? | Asset maintenance | The real drivers of asset or equipment failure
How to decrease fraud losses and lower false positives? | Fraud management and cybersecurity | Unknown fraud cases and future risks
Data Mining Process: Collect and Prepare the data
• Which data are relevant?
• How many data sources are involved?
• Do you have access to the data?
• Do you have privacy issues?
• Will the data be available?


Data Mining Process: Explore the data
• Are there anomalies or patterns?
• How does the data look?
• Do you have too many or too few variables?
• Do you need to impute or transform the data?
• Do you need to aggregate or create the data?
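Two of the exploration questions above, imputing missing values and spotting anomalies, can be sketched with the Python standard library. The spend figures are made up, and the cutoff of five median absolute deviations is an arbitrary assumption for illustration, not a rule from the slides.

```python
from statistics import median

# Hypothetical monthly spend per customer; None marks a missing value.
monthly_spend = [120.0, 135.0, None, 128.0, 131.0, 990.0, 125.0, None, 133.0]

observed = [x for x in monthly_spend if x is not None]

# Impute: fill missing values with the median, which is robust to outliers.
fill_value = median(observed)
imputed = [fill_value if x is None else x for x in monthly_spend]

# Anomaly check: flag values far from the median, measured in units of the
# median absolute deviation (MAD). The factor 5 is an assumed cutoff.
mad = median([abs(x - fill_value) for x in observed])
anomalies = [x for x in imputed if abs(x - fill_value) > 5 * mad]

print(fill_value, mad, anomalies)
```

The median is used instead of the mean because the single extreme value (990.0) would otherwise distort both the imputed value and the anomaly threshold.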
Data Mining Process: Model the data
• Which methods do you need? Supervised or unsupervised learning methods?
• Train different models (algorithms and approaches)
• Validate and test all the models
• Select the best model according to the question
• Score the champion model
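The modelling steps above (train several models, validate them, select a champion, score new data) can be sketched in plain Python. Everything here is an assumption for illustration: the churn records are invented, the two candidate "models" are deliberately trivial, and only a validation set is used, with no separate test set.

```python
# Hypothetical records: (months_inactive, churned) where churned is 1 or 0.
train = [(0, 0), (1, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1), (2, 1), (0, 0)]
valid = [(0, 0), (1, 0), (3, 1), (4, 1), (2, 0)]


def train_majority(rows):
    """Baseline: always predict the most common label in the training data."""
    ones = sum(label for _, label in rows)
    majority = 1 if ones * 2 >= len(rows) else 0
    return lambda x: majority


def train_stump(rows):
    """One-feature threshold rule: predict 1 when x >= the best threshold."""
    def acc(t):
        return sum((x >= t) == bool(y) for x, y in rows) / len(rows)
    best = max(sorted({x for x, _ in rows}), key=acc)
    return lambda x: 1 if x >= best else 0


def accuracy(model, rows):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Train the candidates, validate each one, and pick the champion.
candidates = {"majority": train_majority(train), "stump": train_stump(train)}
scores = {name: accuracy(model, valid) for name, model in candidates.items()}
champion_name = max(scores, key=scores.get)
champion = candidates[champion_name]

# Score new, unseen customers with the champion model.
new_scores = [champion(x) for x in [0, 2, 6]]
print(champion_name, scores, new_scores)
```

Selecting on validation accuracy is only one possible criterion; the slide's point that the best model depends on the business question still applies.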


Data Mining Process: Implement, Act and Evaluate
• What did you learn?
• Can you explain the answer with the model?
• Can you tell a story based on the finding?
• Can you deploy the model in time?
• Can you refine the data collection?
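Business Data Analytics Framework
(Figure: business value plotted against complexity; the stages below are listed from lowest to highest on both axes.)
• Query and Report: “What happened?”
• OLAP & Diagnostic Analytics: “Why did it happen?”
• Data Mining: “What will happen?”
• Optimisation: “What to offer?”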
Lesson Summary

• Data mining process
o Ask a business question
o Collect and prepare the data
o Explore the data
o Model the data
o Implement, act and evaluate
• Business data analytics framework
What is SAS Enterprise Miner?

How does SAS Enterprise Miner integrate the data mining process?
SEMMA: The SAS Data Mining
SAS Enterprise Miner
• SAS Enterprise Miner is one of the interfaces in the
SAS software suite.
• It offers secure analysis management and provides a wide
variety of tools with a consistent graphical interface.
• The strength of SAS Enterprise Miner is data mining.
• Complex mining techniques are carried out in a totally code-free
environment, so you can focus on exploring the data,
applying models and algorithms, discovering new patterns,
and finding new questions to ask.
SAS Enterprise Miner
SEMMA
• SAS considers a data mining solution to be a process rather than a
set of analytical tools.
• The acronym SEMMA refers to a methodology that clarifies this
process.
▪ Sample the data by extracting a portion of a large dataset that is big enough to
contain the significant information, yet small enough to manipulate quickly.
▪ Explore the data by searching for unanticipated trends and anomalies in order
to gain understanding and ideas.
▪ Modify the data by creating, selecting, and transforming the variables to focus
the model selection process.
▪ Model the data by allowing the software to search automatically for a
combination of data that reliably predicts a desired outcome.
▪ Assess the data by evaluating the usefulness and reliability of the findings from
the data mining process.
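SAS Enterprise Miner performs these steps without code, but the Sample step can be sketched outside SAS to show the idea. This is a minimal Python sketch, not SAS functionality: the population of customer IDs, the 5% fraction and the `draw_sample` helper are assumptions for illustration.

```python
import random

# Hypothetical large dataset: 10,000 customer IDs.
population = list(range(10_000))


def draw_sample(rows, fraction, seed=42):
    """Return a simple random sample of the rows, drawn without replacement."""
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    size = max(1, int(len(rows) * fraction))
    return rng.sample(rows, size)


sample = draw_sample(population, 0.05)
print(len(sample))
```

A fixed seed means the same sample can be drawn again later, which helps when the Explore, Modify, Model and Assess steps need to be repeated on identical data.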
Lesson Summary

• The strength of SAS Enterprise Miner in the
analytics workflow, as one of the interfaces
of the SAS software suite.
• SEMMA as a data mining solution process.
Are there any problems when models
are deployed into an operational
system?
What is Operationalising Analytics?
Operationalising Analytics
• Building a powerful model is not enough. The model must be
embedded into operational decision-making processes to drive
business results.
• According to a survey from SAS, organisations deploy less than
50% of their best models and take more than three months to
deploy 90% of their models.
• A model’s value is based on how quickly it is deployed.
Model Deployment Issues
• Data Scientist / Data Analytics team: Produce Models
• IT team: Deploy Models

The two teams have:
• Different skill sets
• Different focus
• Different working environments
Model Deployment Common Issues
• Operational and development environments are different
• Lack of transparency in model construction
• No common model repository
• Lack of well-established processes
• No performance monitoring
• Deployment architecture scalability
• Inability to comply with regulatory requirements
ModelOps
ModelOps provides an organisational philosophy and practice that support
the continual flow of analytics.

The goal of ModelOps is to create a shared approach to the creation and
deployment of models.
ModelOps
ModelOps must be embedded into operational decision-making
processes to put models into production.

An additional business-dimension loop is added to ensure
the model is affecting business decisions and results.
The Seven Key Steps for Operationalising Analytics
1. Register
▪ Register all the models in a centralised model repository.
▪ This ensures that all models and related components are easily traceable and
governable.
2. Deploy
▪ The analytical models are integrated into a production environment and
produce results.
▪ Models from various sources, both commercial and open source, are
combined and compared; the champion model is then selected for deployment.
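The Register and Deploy steps can be pictured with a toy in-memory registry. This is only a sketch under stated assumptions: the `ModelRecord` and `ModelRegistry` classes, the model names and the validation scores are hypothetical, and a real centralised repository would also store the model artefacts, lineage and approvals.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ModelRecord:
    name: str
    version: int
    source: str               # e.g. "commercial" or "open source"
    validation_score: float   # higher is better in this sketch
    registered_on: date = field(default_factory=date.today)


class ModelRegistry:
    """A toy in-memory stand-in for a centralised model repository."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Store a model under (name, version) so it stays traceable."""
        key = (record.name, record.version)
        if key in self._records:
            raise ValueError(f"{key} is already registered")
        self._records[key] = record

    def champion(self):
        """Return the registered model with the highest validation score."""
        return max(self._records.values(), key=lambda r: r.validation_score)


registry = ModelRegistry()
registry.register(ModelRecord("logistic_regression", 1, "open source", 0.81))
registry.register(ModelRecord("decision_tree", 1, "commercial", 0.78))
registry.register(ModelRecord("gradient_boosting", 2, "open source", 0.86))

champion = registry.champion()
print(champion.name, champion.version)
```

Keeping every candidate in one place is what makes the comparison across commercial and open-source models possible before one is chosen for deployment.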
The Seven Key Steps for Operationalising Analytics
3. Decide
▪ A decisioning engine is connected to the deployment step.
▪ This allows business users to integrate analytical models and business rules into
a decision flow.
4. Act
▪ Act is where the decisions are published to the operational business
processes.
▪ Decision makers must be able to access the most up-to-date information while
the decision is executing.
▪ Users should be able to publish decision flows across various channels and
processes.
▪ The decision can be executed in batch or in real time.
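A decision flow that combines a model score with business rules might look like the sketch below. The scoring function, the thresholds, the field names and the action labels are all hypothetical; they stand in for whatever a real decisioning engine would be configured with.

```python
def churn_score(customer):
    """Hypothetical stand-in for a deployed analytical model (score in 0..1)."""
    return min(1.0, customer["months_inactive"] / 6)


def decide(customer):
    """Combine business rules with the model score to pick one action."""
    # Business rule: never contact customers who have opted out.
    if customer["opted_out"]:
        return "no_action"
    score = churn_score(customer)
    # Business rule: high-value customers at risk get a personal call.
    if score >= 0.5 and customer["annual_spend"] >= 1000:
        return "personal_call"
    if score >= 0.5:
        return "discount_email"
    return "no_action"


def run_batch(customers):
    """Batch execution: apply the same decision flow to many customers."""
    return [decide(c) for c in customers]


customers = [
    {"months_inactive": 4, "annual_spend": 2500, "opted_out": False},
    {"months_inactive": 3, "annual_spend": 300, "opted_out": False},
    {"months_inactive": 5, "annual_spend": 5000, "opted_out": True},
    {"months_inactive": 1, "annual_spend": 800, "opted_out": False},
]

print(decide(customers[0]))   # one decision at a time, as in real-time use
print(run_batch(customers))   # the same flow executed in batch
```

Because `decide` handles a single customer, the same flow can serve a real-time channel and a nightly batch job without being rewritten.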
The Seven Key Steps for Operationalising Analytics
5. Measure
▪ Understand how well the decision flows are performing.
▪ Record and track information about a decision in real time or during the
execution of the decision flow.
▪ Contact and response results can be collected and fed back into the
model monitoring process to improve model performance.
6. Monitor
▪ After the decision flow creates results and value, monitoring of the models’ ongoing
performance begins.
▪ Models are evaluated to see whether they are still behaving as expected based
on the market conditions, business requirements and new data.
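Measuring and monitoring can be sketched as recording each outcome and comparing recent accuracy with a baseline. The `PerformanceMonitor` class, the window of five outcomes and the 0.10 tolerance are assumptions made for this example, not values from the slides.

```python
from collections import deque


class PerformanceMonitor:
    """Track recent prediction outcomes and flag degradation."""

    def __init__(self, baseline_accuracy, window=5, tolerance=0.10):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the latest results

    def record(self, predicted, actual):
        """Measure: store whether a prediction matched the observed response."""
        self.outcomes.append(predicted == actual)

    def recent_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        """Monitor: has recent accuracy fallen well below the baseline?"""
        # Only judge once the window is full, to avoid reacting to noise.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.recent_accuracy() < self.baseline - self.tolerance


monitor = PerformanceMonitor(baseline_accuracy=0.9)
for predicted, actual in [(1, 1), (0, 0), (1, 1)]:
    monitor.record(predicted, actual)
early_flag = monitor.degraded()   # window not yet full

for predicted, actual in [(1, 0), (0, 1), (1, 0)]:
    monitor.record(predicted, actual)
late_flag = monitor.degraded()    # three recent misses in a window of five

print(early_flag, late_flag, monitor.recent_accuracy())
```

Accuracy is used here for simplicity; which metric to monitor depends on the business requirement.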
The Seven Key Steps for Operationalising Analytics
7. Retrain
▪ If a model’s performance degrades, the organisation should take one of three
approaches:
  ▪ Retrain the existing model using new data.
  ▪ Revise the model with new techniques, feature engineering, or new data elements,
  etc.
  ▪ Replace the model with a better model.
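The first of the three approaches, retraining the existing model on new data, can be sketched with a one-feature threshold rule. The data sets and helper functions are hypothetical, and the revise and replace options are not shown.

```python
def accuracy(threshold, rows):
    """Share of rows where 'predict 1 when x >= threshold' matches the label."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)


def fit_threshold(rows):
    """Fit the rule by picking the threshold with the best training accuracy."""
    return max(sorted({x for x, _ in rows}), key=lambda t: accuracy(t, rows))


# Hypothetical data: customer behaviour has shifted since the model was built.
old_data = [(1, 0), (2, 0), (3, 1), (4, 1)]
new_data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1)]
recent_holdout = [(2, 0), (4, 0), (5, 1), (6, 1)]

current = fit_threshold(old_data)     # the model currently in production
retrained = fit_threshold(new_data)   # same technique, refitted on new data

# Keep the retrained model only if it does better on recent, unseen data.
current_acc = accuracy(current, recent_holdout)
retrained_acc = accuracy(retrained, recent_holdout)
champion = retrained if retrained_acc > current_acc else current

print(current, retrained, current_acc, retrained_acc, champion)
```

Comparing both versions on the same recent holdout keeps the retrained model from being promoted automatically when it is not actually better.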
Lesson Summary

• ModelOps embedded in the decision-making
process can ensure that the model is
deployed efficiently and that the results
impact business decisions.
If you are a leader, what can you do to
help operationalise analytics in your
organisation?
