Data Mining Syllabus
Unit 4: Clustering
4.1 Introduction, Clustering
4.2 Cluster Analysis
4.3 Clustering Methods: K-means, Hierarchical clustering
4.4 Agglomerative clustering, Divisive clustering
4.5 Clustering and segmentation software, evaluating clusters
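As an illustration of the agglomerative (bottom-up) approach listed in this unit, the following is a minimal sketch, not a prescribed implementation. It assumes Euclidean distance and single linkage; the function name agglomerative and the sample points are hypothetical.

```python
from math import dist

def agglomerative(points, num_clusters):
    """Merge the two closest clusters (single linkage) until num_clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the two closest members
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical 2-D points forming two visible groups
points = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.3, 7.9), (1.2, 0.8)]
result = agglomerative(points, 2)
print(result)
```

Divisive clustering works in the opposite direction: it starts from one cluster holding all points and splits repeatedly.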
Instructions for paper setting: Seven questions are to be set in total. The first question will be
conceptual, covering the entire syllabus, and will be compulsory. Three questions will be set
from each of Part A and Part B (one from each unit). Students need to attempt two questions out of
three from each part. Each question will carry 20 marks.
Evaluation Tools:
Assignment/Tutorials
Sessional tests
Surprise questions during lectures/Class Performance
End Sem examination
Program 1: Use the Boston House Price dataset, i.e. housing.arff. Apply all preprocessing algorithms and
create a version of the initial data set in which the categorical data are converted into numerical data.
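One common way to convert a categorical attribute into numeric form is one-hot (indicator) encoding. The sketch below is only an illustration in plain Python; the function name one_hot_encode and the attribute names rooms and near_river are hypothetical and are not taken from housing.arff.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # copy every attribute except the one being encoded
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

# Hypothetical records with one categorical attribute
rows = [
    {"rooms": 6.5, "near_river": "yes"},
    {"rooms": 5.9, "near_river": "no"},
]
encoded = one_hot_encode(rows, "near_river")
print(encoded)
```

In WEKA, the NominalToBinary filter performs a comparable conversion.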
Program 2: Use all the above algorithms to classify weather data from the “weather.arff” file.
Perform initial preprocessing and create a version of the initial dataset in which all numeric attributes
are converted to categorical data.
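Converting a numeric attribute to categorical data is usually done by discretization. The following is a minimal sketch of equal-width binning, assuming three bins; the function name equal_width_bins and the temperature values are illustrative only.

```python
def equal_width_bins(values, num_bins):
    """Map each numeric value to a label such as 'bin1' using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins or 1.0  # fall back to 1.0 if all values are equal
    labels = []
    for v in values:
        # the maximum value would land in bin num_bins + 1, so clamp it to the last bin
        index = min(int((v - low) / width), num_bins - 1)
        labels.append(f"bin{index + 1}")
    return labels

temperatures = [64, 65, 68, 70, 72, 75, 80, 85]  # illustrative values
labels = equal_width_bins(temperatures, 3)
print(labels)
```

In WEKA, the unsupervised Discretize filter offers equal-width binning of this kind.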
Program 3: Apply the k-means algorithm to the bank data from the “bank.arff” file. Perform initial
preprocessing and create a version of the initial data set in which the ID field is removed and
the "children" attribute is converted to categorical data.
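The k-means procedure itself can be sketched in a few lines. This is a simplified illustration, assuming Euclidean distance and using the first k points as initial centroids (real tools normally pick them at random). The records, their ID values and the function name k_means are hypothetical.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Lloyd's algorithm; the first k points serve as the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # assignment step: each point goes to its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments

# Hypothetical records: (ID, feature 1, feature 2)
records = [
    ("ID01", 1.0, 1.0), ("ID02", 9.0, 9.0), ("ID03", 1.0, 2.0),
    ("ID04", 8.0, 9.0), ("ID05", 2.0, 1.0), ("ID06", 9.0, 8.0),
]
points = [r[1:] for r in records]  # drop the ID field before clustering
centroids, assignments = k_means(points, 2)
print(centroids, assignments)
```

The ID field is dropped first because an identifier carries no information about similarity and would only distort the distances.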
Program 4: Use the Apriori algorithm to generate association rules for the Iris data from the “iris.arff” file.
Perform preprocessing to convert the numeric attributes into categorical data (Apriori requires nominal attributes) and analyze the results.
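Apriori works on categorical items, so the sketch below assumes the Iris measurements have already been discretized into labels. It is a minimal illustration of frequent-itemset generation only; the function name apriori, the item labels and the support threshold are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # join step: combine frequent itemsets into candidates of the next size
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # prune step: every (size - 1)-subset of a candidate must itself be frequent
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
            and support(c) >= min_support
        }
    return frequent

# Hypothetical discretized records, one set of attribute=value items per flower
transactions = [
    {"petal=short", "class=setosa"},
    {"petal=short", "class=setosa"},
    {"petal=long", "class=virginica"},
    {"petal=long", "class=virginica"},
    {"petal=long", "class=versicolor"},
]
frequent = apriori(transactions, min_support=0.4)

# Confidence of the rule petal=long -> class=virginica
rule_support = frequent[frozenset({"petal=long", "class=virginica"})]
confidence = rule_support / frequent[frozenset({"petal=long"})]
print(frequent, confidence)
```

Association rules are then read off the frequent itemsets by keeping those whose confidence exceeds a chosen minimum.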
Program 5: Use the “vote.arff” file to apply various attribute selection algorithms and
evaluate various performance measures.
Program 6: Use “diabetes.arff” file to evaluate various performance parameters for any three
classifiers. Then generate and display the comparison graph for all the above performance
parameters through various charts or graphs.
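The usual performance parameters for a two-class problem can be computed from the confusion-matrix counts. The sketch below is illustrative only; the function name binary_metrics and the label lists are hypothetical and are not results from diabetes.arff.

```python
def binary_metrics(actual, predicted, positive):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical true labels and one classifier's predictions
actual = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]
metrics = binary_metrics(actual, predicted, positive="pos")
print(metrics)
```

Running the same function on the predictions of each of the three classifiers gives the values needed for a comparison chart.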
Program 7: Design and create an ensemble model using more than one classification algorithm
(Bagging and Boosting).
Program 8: Design and create an ensemble model using more than two classification algorithms
(Boosting).
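The idea behind bagging can be sketched as follows: train several models on bootstrap samples of the data and combine them by majority vote. This is a simplified illustration covering bagging only, not boosting. It assumes a single numeric feature, labels 0 and 1, and a decision stump as the base learner; the names train_stump and bagging and the data are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Pick the threshold on one numeric feature that misclassifies the fewest samples."""
    best = None  # (errors, threshold, label at or below threshold, label above)
    for threshold, _ in samples:
        for low_label, high_label in ((0, 1), (1, 0)):
            errors = sum(
                (low_label if x <= threshold else high_label) != y for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low_label, high_label)
    _, threshold, low_label, high_label = best
    return lambda x: low_label if x <= threshold else high_label

def bagging(samples, num_models, seed=0):
    """Train num_models stumps on bootstrap samples and return a majority-vote predictor."""
    rng = random.Random(seed)
    models = []
    for _ in range(num_models):
        bootstrap = [rng.choice(samples) for _ in samples]  # sample with replacement
        models.append(train_stump(bootstrap))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical (feature, label) pairs
data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
predict = bagging(data, num_models=25)
print(predict(0.5), predict(9.5))
```

An odd number of models avoids tied votes for a two-class problem. Boosting differs in that models are trained sequentially, with each one giving more weight to the examples its predecessors misclassified.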
Program 9: Design a project for the analysis and identification of handwritten digits using a neural
network model.
Program 10: Design a project for the analysis and prediction of student behavior using suitable
algorithms.
Software required/Weblinks:
WEKA 3.8.3
www.cs.waikato.ac.nz
http://wekatutorial.com
www.tutorialspoint.com
Evaluation Tools:
Experiments in lab
File work/Class Performance
Viva (Question and answers in lab)
End Term Practical Exam