ANL303 - Week - 1 - Jan 2023 Includes Course Overview

This document outlines the course structure and assessment for a course on fundamentals of data mining. The course consists of six weekly seminars covering various data mining topics. Assessment includes quizzes, assignments, participation, and a final exam. Quizzes cover material from two study units and assess prerequisite knowledge. Assignments include an individual tutor-marked assignment, a group-based assignment applying data mining software, and participation based on attendance and contributions. The final exam evaluates all course topics.

Uploaded by

syed ali

ANL303

Fundamentals of Data Mining


Study Unit 1
Course Structure & Assessment

Week    Topic                                                       Seminar Time
Week 1  Overview of Data Mining                                     31 Jan 23 (Tue) 7pm – 10pm
Week 2  Cross Industry Standard Process for Data Mining (CRISP-DM)  7 Feb 23 (Tue) 7pm – 10pm
Week 3  Data Exploration and Preprocessing                          14 Feb 23 (Tue) 7pm – 10pm
Week 4  Association and Clustering                                  21 Feb 23 (Tue) 7pm – 10pm
Week 5  Predictive Modelling: Decision Trees                        28 Feb 23 (Tue) 7pm – 10pm
Week 6  Introduction to Cloud Analytics                             7 Mar 23 (Tue) 7pm – 10pm

• There will be hands-on exercises using IBM SPSS Modeler. Please ensure that you have installed the software.

Course Structure & Assessment
Six (6) weekly seminars of three (3) hour duration

• Course Assessment:

[Chart: OCAS 50% (Graded Quizzes 6%, Tutor-Marked Assignment (TMA) 18%, Group-Based Assignment (GBA) 20%, Participation 6%); OES 50% (Final Examination 50%)]
Course Structure & Assessment
Six (6) weekly seminars of three (3) hour duration
• Course Assessment:
• OCAS
• 3 Quizzes (6%) (includes 1 compulsory Pre-Course Quiz)
• 1 Tutor-Marked Assignment (18%)
• 1 Group-Based Assignment (20%) and
• Participation (6%)
• OES
• Exam (50%)
• Attendance
• CET students must mark attendance before each class/session via the QR code, which CCPE will send to their email.
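The component weights listed above can be combined into an overall score as in the sketch below. It assumes the overall score is a simple weighted sum of component marks out of 100; the slides do not state the exact formula or any per-component pass requirement, and the example marks are hypothetical.

```python
# Component weights (percent of the overall grade) as listed on the slide.
WEIGHTS = {
    "quizzes": 6,
    "tma": 18,
    "gba": 20,
    "participation": 6,
    "exam": 50,
}


def overall_score(marks):
    """Weighted overall score, given per-component marks out of 100.

    Assumption: a plain weighted sum; the course may apply other rules.
    """
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks, for illustration only.
example_marks = {
    "quizzes": 80,
    "tma": 70,
    "gba": 75,
    "participation": 100,
    "exam": 60,
}

# OCAS is everything except the exam (OES).
ocas_weight = sum(w for name, w in WEIGHTS.items() if name != "exam")

print("OCAS weight:", ocas_weight)
print("Overall score:", overall_score(example_marks))
```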
Course Assessment
• Quizzes
– Pre-Course Quiz 1 (PCOQ1, 2%)
• Course coverage: Units 1 & 2
• Must score at least 60 marks to remain in the course
– Pre-Class Quiz 1 (PCQ01, 2%)
• Course coverage: Units 3 & 4
– Pre-Class Quiz 2 (PCQ02, 2%)
• Course coverage: Units 5 & 6

Assessment components
Tutor-Marked Assignment (TMA) (18%)

• Covers topics from SU1 to SU2
• Submit one written report in MS Word document format
• Individual work
• Submission Date: 13 February 23, 11.55pm

Assessment components
Group-Based Assignment (GBA) (20%)

• Covers topics from SU1 to SU4
• Requires the use of IBM SPSS Modeler
• Submit one written report in MS Word document format
• GBA groups should have a maximum of 4 students. A team of 3 students is allowed only if a group of 4 is not possible
• Only the Group Leader submits the report
• Submission Date: 28 February 23, 11.55pm

Assessment components
Participation (6%)

• 2 components – class attendance (30%) + participation (70%)


• If you are not able to attend, you can still earn the participation marks by doing the following:
1. Inform the instructor in advance (or within a day after the seminar) of your inability to attend that seminar.
2. Prepare either a summary or answers to the seminar's activities:
• a two-page summary of the recorded seminar session that you missed (the summary cannot be a simple copy-and-paste of the PowerPoint slides; it should cover the key points presented by the instructor during the session), OR
• two pages of answers to the seminar's activities.
3. Submit either the summary or the activities’ answers to the instructor within one (1) week from the date of the seminar that you missed.

Assessment components
Exam (50%)

• Exam date: Refer to the Student Portal
• Duration: 2 hours
• Covers ALL topics in Study Units 1 to 6

Unit 1 Overview & Activities

Study Unit 1
Overview of Data Mining
Key Learning Objectives for this unit include:

• Outline the key concept of data mining;
• Differentiate the various types of data mining;
• Describe tasks to be done at various stages in a data mining process;
• Identify possible applications of data mining in different business contexts.
What is Data?
Data by themselves have no meaning because they are without context and interpretation. Data alone do not support decision-making.
[Figure: the DIKW pyramid, from bottom to top: Data, Information, Knowledge, Wisdom]
The Data, Information, Knowledge, Wisdom (DIKW) hierarchy as presented by Rowley (2007)
Rowley, J. (2007). The wisdom hierarchy: Representations of the DIKW hierarchy. Journal of Information Science, 33(2), 163-180.
What is Data?
Information is processed data (i.e., data with meaning). Information supports decision-making.
What is Data?
Data mining is the transformation of data into information for decision-making.
How to increase the sales of this book?

Transactional data: What kind of information can we get by looking at the transactional data?
Beer and Diapers?

If you are the manager of the supermarket, what decisions will you make based on this piece of information?

Data mining is the process of finding previously unknown and valid information from data and using that information to take actions or make decisions.
[Figure: Beer and Diaper]
Types of Data Mining
Based on data mining functionalities, data mining methods can be categorised into:
• Descriptive data mining
• Predictive data mining

Descriptive Data Mining
• Focuses on what has already happened in the past
• Explores patterns and relationships that may exist in data

• Example 1: to present general characteristics of a dataset (i.e., summarisation)

Exploration techniques such as summary statistics and visualisation can be used
(to be discussed in Study Unit 3)
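
As a minimal sketch of summarisation, the snippet below computes a few summary statistics with Python's standard `statistics` module. The income values are the ones that appear in the customer example later in this unit; the choice of statistics is an assumption made for illustration and is not the output that IBM SPSS Modeler produces.

```python
import statistics

# Income values as they appear in the customer example later in this unit.
incomes = [4000, 2500, 1500, 4500, 2800, 7000, 6000]

# A small set of descriptive statistics that summarise the variable.
summary = {
    "count": len(incomes),
    "min": min(incomes),
    "max": max(incomes),
    "mean": statistics.mean(incomes),
    "median": statistics.median(incomes),
    "stdev": statistics.stdev(incomes),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```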

• Example 2: to identify relationships among variables from the dataset (i.e., association)

Association rule mining can be used
E.g., Item A and Item B are usually purchased by customers at the same time.
(to be discussed in Study Unit 4)
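
To make "usually purchased at the same time" concrete, the sketch below computes the three measures commonly used to judge an association rule: support, confidence and lift. The baskets are hypothetical and the function names are chosen for this illustration only; Study Unit 4 covers the actual technique.

```python
# Hypothetical market baskets; each set is one customer transaction.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk", "beer"},
    {"bread", "butter"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to how often the consequent occurs overall (above 1 suggests a positive association)."""
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"beer"}, {"diapers"})
print(f"support    = {support(rule[0] | rule[1]):.2f}")
print(f"confidence = {confidence(*rule):.2f}")
print(f"lift       = {lift(*rule):.2f}")
```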

• Example 3: to group data into classes of similar objects (i.e., clustering)

[Figure: scatter plot of Age against Income]
Cluster analysis can be used
(to be discussed in Study Unit 4)
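
The sketch below groups customers by age and income with a plain k-means procedure written in standard Python. It is one possible clustering method, not necessarily the one used in the course; the customer values, the choice of two clusters and the starting centroids are all assumptions made for illustration.

```python
import math

# Hypothetical customers: (age, income). Values are illustrative only.
customers = {
    "A": (23, 1500),
    "B": (27, 2500),
    "C": (30, 2800),
    "D": (40, 4500),
    "E": (45, 6000),
    "F": (50, 7000),
}


def min_max_scale(points):
    """Rescale each dimension to [0, 1] so that income does not dominate age."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def k_means(points, centroids, max_iterations=100):
    """Assign each point to its nearest centroid, then recompute the centroids, until stable."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


names = list(customers)
scaled = min_max_scale(list(customers.values()))
# Assumption: use the first and last customer as the initial centroids.
labels = k_means(scaled, [scaled[0], scaled[-1]])

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

Scaling matters here because income is measured in thousands while age is in tens; without it, distance would be driven almost entirely by income.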

Descriptive Data Mining
Examples of association analysis and clustering
Clustering can be used to group customers based on their similarities in terms of age and income.
[Figure: scatter plot of customers by Age and Income]
Customer Age Income Chips Bread Milk Butter
Amy 4000  
Ben 2500  
Cindy 1500  
David 40 4500 
Evan 2800  
Flora 7000 
Gloria 45 6000   
Association analysis can be used to identify the relationship among items purchased by the customers.
Predictive Data Mining
Decision trees can be used (to be discussed in Study Unit 5)

The term prediction refers to classification and estimation

• Classification refers to the prediction of an output variable that is categorical in nature
– Example: predicting whether a customer is a buyer or a non-buyer

• Estimation refers to the prediction of an output variable that is quantitative in nature
– Example: predicting the amount a customer spends in an order
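
The difference between the two kinds of prediction can be sketched with a one-split decision tree (a decision stump): the same split yields a category for classification and a number for estimation. The records, the income split point and the function names below are hypothetical, and this is not the tree algorithm used by IBM SPSS Modeler.

```python
from collections import Counter
from statistics import mean

# Hypothetical training records: (income, buyer label, amount spent per order).
records = [
    (1500, "non-buyer", 20.0),
    (2500, "non-buyer", 30.0),
    (2800, "non-buyer", 40.0),
    (4000, "buyer", 80.0),
    (4500, "buyer", 100.0),
    (6000, "buyer", 120.0),
]

THRESHOLD = 3500  # assumed split point on income, chosen by eye for this sketch

low = [r for r in records if r[0] < THRESHOLD]
high = [r for r in records if r[0] >= THRESHOLD]

# Classification: each leaf predicts the majority class of its training records.
class_leaf = {
    "low": Counter(r[1] for r in low).most_common(1)[0][0],
    "high": Counter(r[1] for r in high).most_common(1)[0][0],
}

# Estimation: each leaf predicts the mean amount of its training records.
amount_leaf = {
    "low": mean(r[2] for r in low),
    "high": mean(r[2] for r in high),
}


def leaf_for(income):
    return "low" if income < THRESHOLD else "high"


def classify(income):
    """Categorical output: buyer or non-buyer."""
    return class_leaf[leaf_for(income)]


def estimate(income):
    """Quantitative output: expected amount spent in an order."""
    return amount_leaf[leaf_for(income)]


print(classify(5000), estimate(5000))
print(classify(2000), estimate(2000))
```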

Summary
• Descriptive data mining:
– Summarisation
– Association
– Clustering
• Predictive data mining:
– Classification
– Estimation
Some data mining techniques can do one or more of these.

Data Mining Process
[Figure: three stages: Problem definition, Data quality assessment, Data mining technique evaluation]
Problem Definition
1. Identification of a business problem

2. Translation of the business problem into a data mining application

In this example, what is the business problem?

Business problem: sales of the product are declining
Business objective: to improve book sales
In this example, what is the data mining objective?

Data mining objective: to identify books that are usually purchased by customers at the same time, and then recommend them to customers to increase sales (or bundle them together at a discounted price, etc.)
Data Quality Assessment
1. Collection of data

2. Assessment of data quality

3. Preparation of data for mining
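
The three steps above can be sketched on a tiny dataset: collect records, count missing and implausible values, then keep only the usable rows. The records, field names and plausibility ranges below are hypothetical, and dropping problem rows is just one simple preparation choice among several (imputation is another).

```python
# Step 1 (collection): hypothetical customer records; None marks a missing value.
records = [
    {"customer": "C1", "age": 34, "income": 4000},
    {"customer": "C2", "age": None, "income": 2500},
    {"customer": "C3", "age": 29, "income": -1500},
    {"customer": "C4", "age": 240, "income": 4500},
    {"customer": "C5", "age": 45, "income": None},
]

# Assumed plausibility ranges for each field.
VALID_RANGES = {"age": (0, 120), "income": (0, 1_000_000)}


def assess_quality(rows):
    """Step 2 (assessment): count missing and out-of-range values per field."""
    report = {field: {"missing": 0, "out_of_range": 0} for field in VALID_RANGES}
    for row in rows:
        for field, (low, high) in VALID_RANGES.items():
            value = row.get(field)
            if value is None:
                report[field]["missing"] += 1
            elif not low <= value <= high:
                report[field]["out_of_range"] += 1
    return report


def prepare(rows):
    """Step 3 (preparation): keep rows whose fields are all present and plausible."""
    return [
        row
        for row in rows
        if all(
            row.get(field) is not None and low <= row[field] <= high
            for field, (low, high) in VALID_RANGES.items()
        )
    ]


report = assess_quality(records)
clean_rows = prepare(records)
print(report)
print([row["customer"] for row in clean_rows])
```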

In this example, what data will you collect?

Data Mining Technique Evaluation
1. Identification of appropriate data mining techniques

2. Construction of models

3. Assessment of model performance

4. Identification of the final best model
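
The four steps above can be sketched as a small model comparison: build candidate models on training data, score each on separate validation data, and keep the best. The data, the two candidate models and the use of accuracy as the performance measure are assumptions for illustration only.

```python
# Hypothetical labelled data: (income, label), where 1 = buyer and 0 = non-buyer.
train = [(1500, 0), (2500, 0), (2800, 0), (4000, 1), (4500, 1), (7000, 1)]
validation = [(2000, 0), (3000, 0), (3800, 1), (5000, 1), (6000, 1)]


def accuracy(model, rows):
    """Share of rows for which the model predicts the correct label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def threshold_model(cutoff):
    """Predict buyer when income is at or above the cutoff."""
    return lambda income: 1 if income >= cutoff else 0


def fit_majority(rows):
    """Baseline: always predict the most common class in the training data."""
    ones = sum(label for _, label in rows)
    prediction = 1 if ones * 2 >= len(rows) else 0
    return lambda income: prediction


def fit_threshold(rows):
    """Pick the income cutoff (midpoint between neighbours) with the best training accuracy."""
    values = sorted(x for x, _ in rows)
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_cutoff = max(cutoffs, key=lambda c: accuracy(threshold_model(c), rows))
    return threshold_model(best_cutoff)


# Steps 1 and 2: choose candidate techniques and construct the models.
candidates = {
    "majority baseline": fit_majority(train),
    "income threshold": fit_threshold(train),
}

# Step 3: assess each model on data it was not built on.
scores = {name: accuracy(model, validation) for name, model in candidates.items()}

# Step 4: identify the best model.
best = max(scores, key=scores.get)
print(scores)
print("Best model:", best)
```

Scoring on validation data rather than training data gives a fairer view of how each model would perform on new customers.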

Data Mining Applications
Examples:

1. Customer Relationship Management

2. Credit Scoring

3. Fraud Detection

4. Retailing

Customer Relationship Management
[Figure: customer value (profit/loss) over time, moving from acquisition to cross-/up-selling to retention]
• Better acquisition: Clustering. E.g., market segmentation for target marketing
• Better cross-/up-selling: Association. E.g., identify products that are usually purchased together by customers
• Better retention: Predictive modelling. E.g., predict customer churn
Credit Scoring
• Predictive modelling can be used to:
– Identify factors related to at-risk customers
– Assess the risk of granting a loan to an applicant, based on the characteristics of that applicant

• Models and insights can be used to:


– Increase profits by offering loans to high-value and low-risk customers

Fraud Detection
• Predictive modelling can be used to:
– Identify suspicious cases that may warrant further investigation

• Models and insights can be used to:


– Understand how frauds are attempted by the fraudsters
– Implement new controls to prevent similar fraud cases from happening in the future

Retailing
• Data mining can be used to:
– Analyse buying patterns of customers

• Models and insights can be used to:


– Rearrange the storage locations of products in stores
– Design advertising and promotion strategies
– Recommend products to online shoppers based on the products they search for on the website

Advantages and Disadvantages of Data Mining
Advantages of Data Mining
• Provides a range of powerful analytical tools for organisations to outperform their competitors

• Transforms large amounts of data into insights for better decision making

• Can be applied in many sectors such as banking and finance, manufacturing, marketing and retail

Disadvantages of Data Mining
• The quality of data mining results and applications depends on the availability and quality of data

• Data mining is not perfect, and acting on wrongly discovered or “random” patterns can have adverse consequences

• Successful data mining requires users to be knowledgeable in both the business domain and data mining tools

• IT expertise is also necessary for extraction and preparation of data, as well as model deployment

• Substantial investment of resources in data mining may be required

Case Discussion (30 mins)
Case Discussion: Tammy, the product manager
Background

• July 2021: The telco market in Singapore is intensely competitive. With the entrance of new digital-only operators, the full-service incumbents are feeling the pressure. With average revenue per user (ARPU) falling due to competition, Tammy, the product manager of one of the incumbent telcos, is thinking of offering a new plan in Q4 of 2021. The idea is to bundle unlimited outgoing local calls with unlimited data and then promote it to customers to prevent them from churning.

Case Discussion: Tammy, the product manager
Your task

• Based on the background, discuss among your group:
1. What is the business problem?
2. Propose one data mining application
3. Describe how the proposed application can help solve the business problem

• Post your answers (in point-form) in the Discussion Forum (Go to Canvas > Discussions > SU1 – Case Discussion)

*Please remember to write down the names of all group members in your post

End of Study Unit 1
See you next week!
