Course Outline - Data Mining

From SMU SIS

Instructor: Steven Hoi

School of Information Systems


Singapore Management University
Email: [email protected]
Course Logistics
• Instructor
– Assoc Prof. Steven Hoi
– Office: SIS-level4-052
– Email: [email protected]
– Tel: (+65) 6808 7949
– Office hours: by appointment
• Teaching Assistant
– LAN Yunshi
– E-mail: [email protected]
– Office hours: by appointment
• Course Schedule
– Time: 15:30 – 18:45, every Thursday
– Venue: SIS Seminar Rm 2.4

©Steven Hoi IS424 Data Mining & Business Analytics 2


What is Data Mining?
• “Data Mining (the analysis step of the knowledge
discovery in databases process, or KDD), is the
process of discovering new patterns from large data
sets involving methods at the intersection of artificial
intelligence, machine learning, statistics and
database systems, etc.
• The goal of data mining is to extract knowledge from
a data set in a human-understandable structure”
[Wikipedia]


Why Data Mining?
Top 10 most wanted job skills of 2016 in the US, according to LinkedIn:
• Cloud and distributed computing
• Statistical analysis and data mining
• Mobile development
• Network and information security
• Middleware and integration software
• Storage systems and management
• User interface design
• Algorithm design
• Java development
• Web architecture and development frameworks


Course Objective & Learning Outcomes
• Understand fundamental data mining concepts and
techniques
– Why and how, with a glimpse of what is under the hood
– Lectures, in-class discussions, exercises and exams

• Understand the connections to business context


– How to set goals, prepare data, choose parameters, etc.
– Lectures, in-class discussions, and final projects

• Hands-on exploration with real data


– Result presentation, visualization and evaluation
– Lab assignments, and final project


Course Topics
• Introduction to Data Mining
• Data: Understanding data characteristics
• Data Exploration: making sense of data
• Classification
• Clustering
• Association Analysis
• Other Topics in Data Mining


Course Components & Grading

• Course Components and Grading


– Classroom participation 10%
– Exercises 10%
– Lab assignments 20%
– Midterm exam 30%
– Course project 30%
• Proposal 5%
• Final Report & Presentation 25%


Grading – Classroom Participation (10%)
• Class attendance component
– You should attend at least 10 out of 12 weeks
(excluding recess week 8)
– Marks will be pro-rated for those with fewer than 10
weeks of attendance
• Class activity component
– Based on how a student actively participates in the
classes, e.g., answer questions, raise good
questions, spot critical typos/mistakes in slides, etc.
– Active participation in e-learn discussions


Grading – Exercises (10%)
• Motivation
– Get students well prepared before the lectures
– Encourage students to develop their self-learning ability
– Check whether students understand the concepts
• Grading
– Students should complete each of the exercises
after each lecture and submit solutions via E-learn
– Effort may also be taken into account in grading,
in addition to correctness


Grading – Midterm Exam (30%)
• Mid-term Exam (30%)
– Week 9 in class, 2.5 hours
– Cheat-Sheet
• One-page, Single-side, A4
– No computers, mobile devices, or Internet access allowed


Grading – Project (30%)
• Objective:
– To complement in-class learning, by getting students to apply
the techniques, or investigate selected topics in greater depth or
breadth.

• Team (4 – 5 members)
– Workload should be evenly distributed.
– Peer evaluation adjustments apply.
– Form a team by Week 3
– Choose your project topic by Week 3

• Start early!
• Proposal due in Week 6
• Proposal presentation in Week 7


Project Types
A Data Mining Task / A Start-up Analytics Application
• Kaggle Competitions: https://www.kaggle.com/
• Real company data & tasks, give reference and obtain permission
• Real-world data sets publicly available:
UCI machine learning repository http://archive.ics.uci.edu/ml/
• Singapore open data sets https://data.gov.sg/
• SAS data sets, other public data sets, Twitter, Wikipedia, DBLP
• Data mining tasks with interesting results
– Classification, cluster analysis, association mining, etc.
• Result presentation
– Project report, visualization of result, analysis and summary


Grading – Project (30%)
• Deliverables
– Proposal Report & Presentation
• 2 pages (max) hard copy
• Lists group members
• Scope of problem, data sources, analysis that needs to be done,
project planning and team management
• Simple group presentation, emphasis on content, not
showmanship
– Project Final Report
• 20 pages (max) all inclusive, hardcopy, stapled, no jackets
– Final Oral Presentation
• Order of presentation will be determined later


Grading – Project (30%)
• Grading Guidelines
– 5% => Proposal Report & Presentation
– 25% => Final Report & Presentation
– Areas that will be important:
• Problem originality, modeling elegance and
innovativeness
• Ability to reveal problem structure and apply analysis
• Appropriate use of techniques and tools
• Clarity, completeness and accuracy of report
• Good writing skills
• Give references!


Grading – Project (30%)

• Wk 1-3: Form a team of 4-5 persons (max 5) and choose topic
• Wk 4: Submit team names and Project Topic
• Wk 4-6: Investigation for Project Proposal
• Wk 7: Submit proposal report (2 pages max, hardcopy) and 5-min proposal presentation
• Wk 12: Submission of Final report (20 pages max, hardcopy)
• Wk 13&14: Final oral presentation, 20 minutes


Course Schedule
Week | Lecture Topic | Lab / Project Milestones | Reading
1 | Course Overview; Introduction to data mining | | B1.Ch 1
2 | What is data? | Project spec release | B1.Ch 2
3 | Data Exploration | Submit Project team and topic | B1.Ch 3
4 | Classification 1: Decision tree | Lab 1: Data Exploration Due | B1.Ch 4
5 | Classification 2: Alternatives | NA | B1.Ch 5
6 | Clustering 1: K-means clustering | Submit Project Proposal; Lab 2: Classification Due | B1.Ch 8
7 | Project Proposal Presentation | NA | B1.Ch 8, B1.Ch 6
8 | Session break (Recess) | (Recess Week) |
9 | Midterm Exam | NA |
10 | Clustering 2: Hierarchical clustering | NA | B2.Ch10.4, B2.Ch10.5
11 | Association Mining | Lab 3: Clustering Due | [P08,P09]
12 | NA | Lab 4: Association Due |
13 | Project FINAL Presentation 1 | Submit Project FINAL Report |
14 | Project FINAL Presentation 2 | NA |
15 | NA | |


Class Software for Lab/Project
• SAS Enterprise Miner
– Campus license \\fs21\Applications\SAS_Installer
– Software \\fs21\Applications\SAS_Installer\SAS_TM_EM
– Please follow lab instructions
• KDDLabs (SMU initiative)
– Web-based interactive Data Mining platform (IS480 project)
– Towards scalable data mining and machine learning
• Open-source Software Toolbox (Optional)
– Scikit-learn: machine learning in Python (http://scikit-learn.org/)
– WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
– TensorFlow (https://www.tensorflow.org/)


Course Book Materials
• Textbook
– Tan, Steinbach, Kumar, “Introduction to Data Mining,” Pearson
Addison Wesley, 2006 (Book B1)

• Reference Books
– Jiawei Han and Micheline Kamber (2011), “Data Mining: Concepts and
Techniques,” Morgan Kaufmann Publishers, 3rd edition
– Ian H. Witten, Eibe Frank, Mark A. Hall (2011), “Data Mining: Practical
Machine Learning Tools and Techniques,” Morgan Kaufmann Publishers, 3rd edition
– Heikki Mannila, Padhraic Smyth, David Hand (2001), “Principles of
Data Mining,” MIT Press
– Mamdouh Refaat, “Data Preparation for Data Mining using SAS”,
Morgan Kaufmann
