IT-416 Data Mining
Text Books / Reference Books:
1. J. Han, M. Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann Publishers, 2006
2. M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, IEEE Press, 2003
3. M.H. Dunham, S. Sridhar, Data Mining: Introductory and Advanced Topics, Pearson Education, 2006
Course Introduction & Description: Data mining refers to a family of techniques used to detect interesting nuggets of relationships/knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of many databases, their size and the high complexity of many techniques present interesting computational challenges.
Course Objectives: Introduce students to the basic concepts and techniques of Data Mining.
Develop skills in using recent data mining software to solve practical problems.
Course Outcomes: After completing this course, the student should demonstrate the knowledge and ability
to:
Explain the importance of Data Mining as a new discipline in the IT field.
Understand and demonstrate the various kinds of data mining tasks.
Use data mining to solve real-life problems.
Class Policies: Attendance at lectures is compulsory. Students who attend fewer than 75% of the lectures will be barred from taking the Final Exam.
If you are absent from a lecture due to sickness, a medical certificate is required; in case of an emergency, a letter from your guardian is required.
There will be no make-up quizzes.
A make-up Mid Term exam will only be given to students with a STRONG, VALID reason and with the prior approval of the Head of Department.
Cheating and Plagiarism will not be tolerated and will be penalized accordingly.
There will be 5-7 assignments in addition to in-class exercises. Assignments must be submitted before the deadline. If you have questions or doubts, contact us in our offices during visiting hours or by email.
Course Outline:
01 Introduction
02 Data Preprocessing
Data representation
Data summarization
Data cleaning
Data integration and transformation
03 Association Rule Mining
The Apriori algorithm
Practical Assignment 1
Quiz
06 Classification
Introduction
Decision Trees
07 Classification
Decision tree induction
Attribute Selection Measures
Decision tree measures
Practical Assignment 3
Quiz
08 Classification
K-Nearest Neighbor
Naïve Bayes
Viva of Assignments
09 Mid Term Week
10 Classification
Belief Networks
Neural Networks
11 Clustering
Introduction
K-Means
Quiz
Practical Assignment 4
12 Clustering
Hierarchical
13 Clustering
DBSCAN
Quiz
Practical Assignment 5
14 Feature Selection
Introduction
Filter methods
Wrapper methods
15 Web Mining
Introduction
Fingerprinting
HITS algorithm
Quiz
Viva of Assignments
16 Course wrap-up
Viva of Assignments
Final Examination
Grading Policy:
1 Assignments 10%
2 Quizzes 5%
3 Presentations 10%
4 Mid Term 25%
Important notes:
4-5 quizzes will be held in class to measure the students' learning progress. These quizzes may be announced or unannounced.
Plagiarism Policy:
During this course, a strict zero-tolerance plagiarism policy will be adopted. While collaboration in this course is highly encouraged, you must ensure that you do not claim other people's work or ideas as your own. Plagiarism occurs when the words, ideas, assertions, theories, figures, images, or programming code of others are presented as your own work. Failure to comply with the plagiarism policy will lead to strict penalties, including zero marks on assignments.