CS 412: Introduction To Data Mining Course Syllabus
In the first part of the course, which focuses on pattern discovery, you will learn why pattern
discovery is important, what the major techniques are for efficient pattern mining, and how to apply
pattern discovery in some interesting applications. The course gives you the opportunity to learn
the concepts, principles, and skills needed to apply scalable pattern discovery methods to
massive data; discuss pattern evaluation measures; study methods for mining diverse kinds of
frequent patterns, sequential patterns, and sub-graph patterns; and study constraint-based pattern
mining and pattern-based classification, along with their applications.
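To make the idea of a frequent pattern concrete, here is a minimal sketch that counts how often each itemset appears in a set of transactions and keeps those that meet a minimum support. The function name `frequent_itemsets`, the `max_size` limit, and the toy transactions are all illustrative assumptions, not course material. The brute-force enumeration is only for illustration; the scalable methods studied in the course avoid enumerating every candidate.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (as sorted tuples) that occur in at least min_support transactions.

    Brute-force counting for illustration only; it does not prune candidates.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within one transaction
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy transactions (assumption: not taken from the course).
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
result = frequent_itemsets(transactions, min_support=2)
```

With a minimum support of 2, the pair `("beer", "diapers")` is kept because it occurs in two transactions, while `("beer", "bread")` occurs only once and is dropped.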
In the second part of the course, which focuses on cluster analysis, you will learn concepts and
methodologies for cluster analysis, which is also known as clustering, data segmentation, or
unsupervised learning. We will introduce the basic concepts of cluster analysis and then study a set
of typical clustering methodologies, algorithms, and applications. These include partitioning methods,
such as k-means; hierarchical methods, such as BIRCH; density-based methods, such as DBSCAN;
and grid-based methods, such as CLIQUE. We will also discuss methods for clustering validation.
The learning will be enhanced by clustering software and programming assignments.
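As a rough illustration of a partitioning method, the sketch below implements the basic k-means loop (assign each point to its nearest centroid, then recompute each centroid as the mean of its members). It is a minimal sketch under stated assumptions: seeding the centroids with the first k points is a simplification chosen to make the run deterministic, and the function name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Basic k-means: alternate assignment and centroid update until assignments stop changing."""
    # Assumption: seed with the first k points (real implementations usually seed randomly).
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid by Euclidean distance.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the loop settles after a few passes, with the three points near the origin in one cluster and the three points near (8.5, 9) in the other.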
The technical contents of the course are based on the textbook Data Mining: Concepts and
Techniques (3rd ed), as well as the on-campus course CS 412 – Introduction to Data Mining, which is
offered in the Department of Computer Science at the University of Illinois. Please note that several
themes covered in the textbook are not covered in this online course, including (1) data
preprocessing and preparation, (2) data warehouse and data cube technology, and (3) classification.
This is because these themes have been covered or will be covered, with possible in-depth
treatment, in several other courses offered in the Data Science Online Master program. Therefore,
this course will focus on the in-depth study of the two major data mining functions described above:
pattern discovery and cluster analysis.
• Recall basic concepts, methods, and applications of cluster analysis, including the concept of
clustering, the requirements and challenges of cluster analysis, a multi-dimensional categorization
of cluster analysis, and an overview of typical clustering methodologies.
• Learn multiple distance or similarity measures for cluster analysis, including Euclidean and
Minkowski distances; proximity measures for symmetric and asymmetric binary variables; distance
measures between categorical attributes, ordinal attributes, and mixed types; proximity measures
between two vectors – cosine similarity; and correlation measures between two variables –
covariance and correlation coefficient.
• Learn popular distance-based partitioning algorithms for cluster analysis, including the K-Means,
K-Medians, K-Medoids, and Kernel K-Means algorithms.
• Learn hierarchical clustering algorithms, including basic agglomerative and divisive clustering
algorithms; BIRCH, a micro-clustering-based approach; CURE, which explores well-scattered
representative points; CHAMELEON, which explores graph partitioning on the KNN graph of the
data; and a probabilistic hierarchical clustering approach.
• Learn the density-based approach to cluster analysis, which can group dense regions of arbitrary
shape, through methods such as DBSCAN and OPTICS.
• Learn the grid-based approach, which organizes individual regions of the data space into a grid-like
structure, through methods such as STING and CLIQUE.
• Study concepts and methods for clustering evaluation and validation, including validation with
external and internal measures, and measures for evaluating cluster stability and clustering
tendency.
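Two of the proximity measures named in the goals above can be sketched in a few lines. The Minkowski distance of order p between vectors x and y is (sum over i of |x_i - y_i|^p)^(1/p), which gives the Manhattan distance for p = 1 and the Euclidean distance for p = 2. Cosine similarity is the dot product divided by the product of the vector lengths. The function names and sample vectors below are illustrative assumptions.

```python
import math


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical sample vectors.
x = (1.0, 2.0, 3.0)
y = (4.0, 6.0, 3.0)

manhattan = minkowski(x, y, p=1)   # |1-4| + |2-6| + |3-3| = 7
euclidean = minkowski(x, y, p=2)   # sqrt(9 + 16 + 0) = 5
similarity = cosine_similarity(x, y)
```

The distance functions grow as the vectors move apart, whereas cosine similarity depends only on direction: scaling either vector by a positive constant leaves it unchanged.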
You can download a PDF version of Chapters 1, 6, 7 and 2, 10, 11, 13 of Data Mining:
Concepts and Techniques (3rd ed.) for free. These are all the chapters related to the topics
covered in this course, so the free PDF version of the chapters is sufficient for this course.
If you would like to purchase the entire textbook, the publisher has an exclusive offer just for
Coursera students. You can save 30% on either the print or eBook version of Data Mining: Concepts
and Techniques, 3rd Edition and receive free shipping on all orders. Here is how it works:
Course Outline
This 4-credit-hour course is 16 weeks long. You should invest 6–8 hours every week in this course.
The course is composed of two parts. Part 1 of the course, Week 1 to Week 9, focuses on pattern
discovery. Part 2 of the course, Week 10 to Week 16, focuses on cluster analysis. All of the course
content will be released on the first day of class, with the exception of the 2 proctored exams, which
will not be released until the day of each exam (for more information on the proctored exams, read
the section Elements of This Course below). Although all content (except for exams) is made
available to the entire class on the first day, the course follows a schedule (see the table below).
Week  Dates      Topic
2     1/23–1/29  Pattern Evaluation; Mining Diverse Frequent Patterns
5     2/13–2/19  Graph Pattern Mining
6     2/20–2/26  Pattern-Based Classification
9     3/13–3/19  Course Part 1 Exam on Pattern Discovery
10    3/20–3/26  Spring break
15    4/24–4/30  Preparation for Part 2 Exam
If you have taken the MOOC versions of the course, namely Pattern Discovery and Cluster Analysis,
below is how the content in those two MOOCs maps to this course.
Assignment Deadlines
For all assignment deadlines, please refer to the Course Assignment Deadlines, Late Policy, and
Academic Calendar page.
Elements of This Course
• Lecture Videos. In each week, the concepts you need to know will be presented through a
collection of short video lectures. You may stream these videos for playback within the browser by
clicking on their titles or download the videos. You may also download the slides that go along with
the videos.
• In-Video Questions. Some lecture videos have questions associated with them to help verify your
understanding of the topics. These questions will automatically appear while watching the video if
you stream the video through your browser. These questions do not contribute toward your final
score in the class.
• Lesson Quizzes. Each week may contain one or multiple lessons. A lesson is a series of videos
on a certain topic, which concludes with a lesson quiz. You will be allowed 2 attempts for each quiz.
There is no time limit on how long you take to complete each attempt at the quiz. Each attempt may
present a different selection of questions to you. Your highest score will be used when calculating
your final score in the class.
• Programming Assignments. There are 4 programming assignments in this course: 2 are
designed around the topic of pattern discovery and the other 2 around cluster analysis. For more
information about the programming assignments, please read the programming assignment
instructions in the respective weeks.
• Proctored Exams. There are 2 proctored exams in this class. The Part 1 Exam will be released
during Week 9. The Part 2 Exam will be released during Week 16. Both exams will be proctored via
a proctoring service called ProctorU. For more information about ProctorU and the proctored exams,
read the Proctored Exam page.
Grading Distribution
Grading Scale
A-  85%
B-  70%
F   Below 60%
View Grades
You can view your grade on each assignment by clicking the Assignments tab on the left menu bar.