CS 412: Introduction To Data Mining Course Syllabus

This document is the syllabus for a course on data mining. It outlines the course goals and structure. The course is divided into two parts - the first part focuses on pattern discovery and the second part focuses on cluster analysis. Some of the major topics that will be covered include frequent pattern mining, sequential pattern mining, graph pattern mining, pattern-based classification, k-means clustering, hierarchical clustering, and density-based clustering. The course aims to introduce students to fundamental concepts and methods in these areas of data mining.

Uploaded by

Ahsan Asim


Note: This is the syllabus from Spring 2017. The Spring 2018 syllabus is under revision and will
include content on "Classification". The final version of the Spring 2018 syllabus will be made
available prior to the start of the Spring 2018 semester.

CS 412: Introduction to Data Mining Course Syllabus

Course Description
This course is an introductory course on data mining. It introduces the basic concepts, principles,
methods, implementation techniques, and applications of data mining, with a focus on two major
data mining functions: (1) pattern discovery and (2) cluster analysis.

In the first part of the course, which focuses on pattern discovery, you will learn why pattern
discovery is important, what the major tricks are for efficient pattern mining, and how to apply pattern
discovery in some interesting applications. The course gives you the opportunity to learn the
concepts, principles, and skills needed to practice scalable pattern discovery methods on
massive data; discuss pattern evaluation measures; study methods for mining diverse kinds of
frequent patterns, sequential patterns, and sub-graph patterns; and study constraint-based pattern
mining and pattern-based classification, along with their applications.

In the second part of the course, which focuses on cluster analysis, you will learn concepts and
methodologies for cluster analysis, which is also known as clustering, data segmentation, or
unsupervised learning. We will introduce the basic concepts of cluster analysis and then study a set
of typical clustering methodologies, algorithms, and applications. This includes partitioning methods,
such as k-means, hierarchical methods, such as BIRCH, density-based methods, such as DBSCAN,
and grid-based methods, such as CLIQUE. We will also discuss methods for clustering validation.
The learning will be enhanced by clustering software and programming assignments.

The technical contents of the course are based on the textbook Data Mining: Concepts and
Techniques (3rd ed), as well as the on-campus course CS 412 – Introduction to Data Mining, which is
offered in the Department of Computer Science at the University of Illinois. Please note that several
themes covered in the textbook are not covered in this online course, including (1) data
preprocessing and preparation, (2) data warehouse and data cube technology, and (3) classification.
These themes have been covered or will be covered, possibly in greater depth, in several other
courses offered in the Data Science Online Master program. Therefore,
this course will focus on the in-depth study of the two major data mining functions described above.

Course Goals and Objectives

Upon successful completion of this course, for pattern discovery, you will be able to:
• Recall important pattern discovery concepts, methods, and applications, in particular, the basic
concepts of pattern discovery, such as frequent pattern, closed pattern, max-pattern, and
association rules.
• Identify efficient pattern mining methods, such as Apriori, ECLAT, and FPgrowth.
• Compare pattern evaluation issues, especially several popularly used measures, such as lift, chi-
square, cosine, Jaccard, and Kulczynski, and their comparative strengths.
• Compare mining diverse patterns, including methods for mining multi-level, multi-dimensional
patterns, qualitative patterns, negative correlations, compressed and redundancy-aware top-k
patterns, and mining long (colossal) patterns.
• Learn well-known sequential pattern mining methods, such as GSP, SPADE, PrefixSpan, and
CloSpan.
• Learn graph pattern mining, including methods for subgraph pattern mining, such as gSpan,
CloseGraph, graph indexing methods, mining top-k large structural patterns in a single large
network, and graph mining applications, such as graph indexing and similarity search in graph
databases.
• Learn constraint-based pattern mining, including methods for pushing different kinds of constraints,
such as data and pattern-based constraints, anti-monotone, monotone, succinct, convertible, and
multiple constraints.
• Learn pattern-based classifications, including CBA, CMAR, PatClass, and DPClass.
• Enjoy various pattern mining applications, such as mining spatiotemporal and trajectory patterns
and mining quality phrases.
• Explore further topics on pattern analysis, such as pattern mining in data streams, software bug
mining, pattern discovery for image analysis, and privacy-preserving data mining.
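
As a small illustration of the pattern evaluation measures named above, the sketch below computes lift, cosine, Jaccard, and Kulczynski for a pair of items. It uses the standard support-based definitions of these measures; the toy transaction list and the function names `support_count` and `measures` are assumptions made for this example and are not taken from the course materials.

```python
from math import sqrt

# Hypothetical toy transaction database (illustration only).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter"},
]


def support_count(itemset, db):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)


def measures(a, b, db):
    """Return four interestingness measures for the item pair (a, b)."""
    n = len(db)
    sa = support_count({a}, db)      # transactions containing a
    sb = support_count({b}, db)      # transactions containing b
    sab = support_count({a, b}, db)  # transactions containing both
    return {
        # lift = P(A and B) / (P(A) * P(B)); 1 suggests independence
        "lift": (sab / n) / ((sa / n) * (sb / n)),
        # cosine = sup(AB) / sqrt(sup(A) * sup(B))
        "cosine": sab / sqrt(sa * sb),
        # Jaccard = sup(AB) / (sup(A) + sup(B) - sup(AB))
        "jaccard": sab / (sa + sb - sab),
        # Kulczynski = average of the two conditional probabilities
        "kulczynski": 0.5 * (sab / sa + sab / sb),
    }


result = measures("milk", "bread", transactions)
```

On this toy data, milk and bread each appear in 3 of 5 transactions and together in 2, so Jaccard is 0.5, cosine and Kulczynski are both 2/3, and lift is about 1.11.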

For cluster analysis, you will be able to:

• Recall basic concepts, methods, and applications of cluster analysis, including the concept of
clustering, the requirements and challenges of cluster analysis, a multi-dimensional categorization
of cluster analysis, and an overview of typical clustering methodologies.
• Learn multiple distance or similarity measures for cluster analysis, including Euclidean and
Minkowski distances; proximity measures for symmetric and asymmetric binary variables; distance
measures between categorical attributes, ordinal attributes, and mixed types; proximity measures
between two vectors – cosine similarity; and correlation measures between two variables –
covariance and correlation coefficient.
• Learn popular distance-based partitioning algorithms for cluster analysis, including K-Means, K-
Medians, K-Medoids, and the Kernel K-Means algorithms.
• Learn hierarchical clustering algorithms, including basic agglomerative and divisive clustering
algorithms, BIRCH, a micro-clustering-based approach, CURE, which explores well-scattered
representative points, CHAMELEON, which explores graph partitioning on the KNN Graph of the
data, and a probabilistic hierarchical clustering approach.
• Learn the density-based approach to cluster analysis, which can group dense regions of arbitrary
shape, with methods such as DBSCAN and OPTICS.
• Learn the grid-based approach, which organizes individual regions of the data space into a grid-like
structure, with methods such as STING and CLIQUE.
• Study concepts and methods for clustering evaluation and validation by introducing clustering
validation using external measures and internal measures, and the measures for evaluating cluster
stability and clustering tendency.
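
To make two of the objectives above concrete, here is a minimal sketch of the Minkowski distance (which reduces to Euclidean distance at p = 2), cosine similarity, and a plain Lloyd-style K-Means loop. The function names, the sample points, and the choice of caller-supplied initial centroids are assumptions made for this example; the course assignments may use different conventions.

```python
from math import sqrt


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def k_means(points, centroids, iterations=10):
    """Run a fixed number of assign/update rounds from given initial centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for pt in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: minkowski(pt, centroids[i]),
            )
            clusters[nearest].append(pt)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical sample data: two well-separated groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
```

With these starting centroids, the loop settles after the first round: the three points near the origin form one cluster and the three points near (8, 8) form the other. K-Means in general is sensitive to initialization, which is one reason the sketch takes the initial centroids as an explicit argument.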

Textbook and Readings

Although the lectures are designed to be self-contained, it is recommended (but not required) to
reference the textbook: Han, J., Kamber, M., & Pei, J. (2011). Data mining: Concepts and
techniques (3rd ed.). Waltham: Morgan Kaufmann.

You can download a free PDF version of Chapters 1, 6, and 7 and Chapters 2, 10, 11, and 13 of Data
mining: Concepts and techniques (3rd ed.). These are all the chapters related to the topics
covered in this course, so the free PDF version of the chapters is sufficient for this course.

If you would like to purchase the entire textbook, the publisher has an exclusive offer just for
Coursera students. You can save 30% on either the print or eBook version of Data Mining: Concepts
and Techniques, 3rd Edition and receive free shipping on all orders. Here is how it works:

• Add the book to your cart.
• Enter code COMP317 and click Apply.
• The discount will be applied to the list price and cannot be combined with other promotions.

Course Outline
This 4-credit hour course is 16 weeks long. You should invest 6–8 hours every week in this course.

The course is composed of two parts. Part 1 of the course, Week 1 to Week 9, focuses on pattern
discovery. Part 2 of the course, Week 10 to Week 16, focuses on cluster analysis. All of the course
content will be released on the first day of class, with the exception of the 2 proctored exams, which
will not be released until the day of each exam (for more information on the proctored exams, read
the section Elements of This Course below). Although all content (except for exams) is made
available to the entire class on the first day, the course follows a schedule (see the table below).

Week  Duration    Topics
1     1/17–1/22   Course Orientation; Course Part 1 Pattern Discovery Overview; Pattern Discovery Basic Concepts; Efficient Pattern Mining Methods; Pattern Discovery Programming Assignment 1
2     1/23–1/29   Pattern Evaluation; Mining Diverse Frequent Patterns
3     1/30–2/5    Sequential Pattern Mining; Pattern Mining Applications: Mining Spatiotemporal and Trajectory Patterns
4     2/6–2/12    Constraint-Based Mining
5     2/13–2/19   Graph Pattern Mining
6     2/20–2/26   Pattern-Based Classification
7     2/27–3/5    Pattern Mining Applications: Mining Quality Phrases from Text Data; Advanced Topics on Pattern Discovery
8     3/6–3/12    Pattern Discovery Programming Assignment 2; Preparation for Part 1 Exam
9     3/13–3/19   Course Part 1 Exam on Pattern Discovery
10    3/20–3/26   Spring break
11    3/27–4/2    Course Part 2 Cluster Analysis Overview; Cluster Analysis Introduction; Similarity Measures for Cluster Analysis
12    4/3–4/9     Partitioning-Based Clustering Methods; Hierarchical Clustering Methods
13    4/10–4/16   Hierarchical Clustering Methods (continued); Density-Based and Grid-Based Clustering Methods; Cluster Analysis Programming Assignment 1
14    4/17–4/23   Methods for Clustering Validation; Cluster Analysis Programming Assignment 2
15    4/24–4/30   Preparation for Part 2 Exam
16    5/1–5/7     Course Part 2 Exam on Cluster Analysis

MOOC Version and CS 412 Content Mapping

If you have taken the MOOC version of the course, namely Pattern Discovery and Cluster Analysis,
below is how the content in those two MOOCs maps to this course.

MOOC Equivalent          CS 412
Pattern Discovery MOOC   Week 1–3, 7, and 8
No MOOC equivalent       Week 4, 5, 6, and 9
Cluster Analysis MOOC    Week 11–14
No MOOC equivalent       Week 15 and 16

Assignment Deadlines
For all assignment deadlines, please refer to the Course Assignment Deadlines, Late Policy, and
Academic Calendar page.

Elements of This Course

The course consists of the following elements:

• Lecture Videos. In each week, the concepts you need to know will be presented through a
collection of short video lectures. You may stream these videos for playback within the browser by
clicking on their titles or download the videos. You may also download the slides that go along with
the videos.
• In-Video Questions. Some lecture videos have questions associated with them to help verify your
understanding of the topics. These questions will automatically appear while watching the video if
you stream the video through your browser. These questions do not contribute toward your final
score in the class.

• Lesson Quizzes. Each week may contain one or multiple lessons. A lesson is a series of videos
on a certain topic, which concludes with a lesson quiz. You will be allowed 2 attempts for each quiz.
There is no time limit on how long you take to complete each attempt at the quiz. Each attempt may
present a different selection of questions to you. Your highest score will be used when calculating
your final score in the class.
• Programming Assignments. There are 4 programming assignments in total in this course: 2 are
designed around the topic of pattern discovery and the other 2 around cluster analysis. For more
information about the programming assignments, please read the programming assignment
instructions in the respective weeks.
• Proctored Exams. There are 2 proctored exams in this class. The Part 1 Exam will be released
during Week 9. The Part 2 Exam will be released during Week 16. Both exams will be proctored via
a proctoring service called ProctorU. For more information about ProctorU and the proctored exams,
read the Proctored Exam page.

Grading Distribution and Scale

Grading Distribution

Assignment                        Frequency   Percentage Weight of Final Grade
Lesson Quizzes                    17          17 x 2% per quiz = 34%
Programming Assignments (or MP)   4           4 x 4% per MP = 16%
Course Part 1 Exam                1           30%
Course Part 2 Exam                1           20%

Grading Scale

Letter Grade   Percent Needed   Letter Grade   Percent Needed   Letter Grade   Percent Needed
A+             95%              B+             80%              C              65%
A              90%              B              75%              D              60%
A-             85%              B-             70%              F              Below 60%
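
As a rough illustration of how the weights and the scale above combine, the sketch below computes a weighted final percentage and maps it to a letter grade. It assumes that each component score is an average expressed as a percentage from 0 to 100 and that each cutoff is inclusive; neither detail is stated in this syllabus, and the names `WEIGHTS`, `SCALE`, `final_percent`, and `letter_grade` are made up for the example.

```python
# Weights from the Grading Distribution table (they sum to 100%).
WEIGHTS = {
    "quizzes": 0.34,      # 17 quizzes x 2%
    "programming": 0.16,  # 4 programming assignments x 4%
    "exam1": 0.30,
    "exam2": 0.20,
}

# Cutoffs from the Grading Scale table, highest first.
# Assumption: a score equal to a cutoff earns that grade.
SCALE = [
    (95, "A+"), (90, "A"), (85, "A-"),
    (80, "B+"), (75, "B"), (70, "B-"),
    (65, "C"), (60, "D"),
]


def final_percent(scores):
    """Weighted sum of component averages, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percent):
    """Return the first grade whose cutoff the percentage meets."""
    for cutoff, grade in SCALE:
        if percent >= cutoff:
            return grade
    return "F"


# Hypothetical student record (illustration only).
example = {"quizzes": 90, "programming": 100, "exam1": 80, "exam2": 85}
total = final_percent(example)
grade = letter_grade(total)
```

For the hypothetical record shown, the weighted total is 30.6 + 16 + 24 + 17 = 87.6, which falls in the A- band under the assumptions above.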

View Grades

You can view your grade on each assignment by clicking the Assignments tab on the left menu bar.
