Guidelines-Datamining-I - UGCF-BA-major-sem 3 - July 24

B.A. with Computer Science as Major discipline
Undergraduate Programme of study with Computer Science discipline as one of the two Core Disciplines

DISCIPLINE SPECIFIC CORE COURSE - Data Mining-I (Guidelines)
Sem III (July 2024 Onwards)

Sr. No. | Unit | Chapters (Text Book) | No. of Hours

1 | Unit 1: Introduction to Data Mining: Motivation and challenges for data mining, types of data mining tasks, applications of data mining, data measurements, data quality, supervised vs. unsupervised techniques | 1.1-1.4, 2.1-2.2 | 8

2 | Unit 2: Data Pre-Processing: Data aggregation, sampling, dimensionality reduction, feature subset selection, feature creation, variable transformation | 2.3.1, 2.3.2, 2.3.3 (introduction), 2.3.4 (introduction), 2.3.5 (introduction), 2.3.6 (Binarization and Discretization of Continuous attributes), 2.3.7, 2.4.2, 2.4.3 (excluding properties) | 9

3 | Unit 3: Cluster Analysis: Basic concepts of clustering, measure of similarity, types of clusters and clustering methods, K-means algorithm, measures for cluster validation, determining the optimal number of clusters | 5.1.1, 5.1.2, 5.1.3 (well-separated and density-based), 5.2 (up to Data in Euclidean Space), 5.5.1, 5.5.5 | 11

4 | Unit 4: Association Rule Mining: Transaction data-set, frequent itemset, support measure, rule generation, confidence of association rule, Apriori algorithm, Apriori principle | 4 (up to 4.2.2), 4.3 (introduction, 4.3.1) | 8

5 | Unit 5: Classification: Naive Bayes classifier, Nearest Neighbour classifier, decision tree, overfitting, confusion matrix, evaluation metrics and model evaluation | 3 (up to 3.3.3), 3.4 (introduction), 3.6, 6.3, 6.4, 6.11 (introduction, 6.11.2) | 9

Text Book:
1. Tan P.N., Steinbach M, Karpatne A. and Kumar V. Introduction to Data Mining, Second
edition, Sixth Impression, Pearson, 2023.

Additional References:
1. Han J., Kamber M. and Pei J. Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann Publishers, 2011.
2. Zaki M. J. and Meira J. Jr. Data Mining and Machine Learning: Fundamental Concepts and Algorithms, 2nd edition, Cambridge University Press, 2020.
3. Aggarwal C. C. Data Mining: The Textbook, Springer, 2015.
4. Soman K. P., Diwakar Shyam and Ajay V. Insight into Data Mining: Theory and Practice, PHI, 2006.
Datasets may be downloaded from:
1. https://archive.ics.uci.edu/datasets
2. https://www.kaggle.com/datasets?fileType=csv
3. https://data.gov.in/
4. https://ieee-dataport.org/datasets
Suggested Practical Exercises
1. Apply data cleaning techniques on any dataset (e.g., the Paper Reviews dataset in the UCI repository). Techniques may include handling missing values, outliers and inconsistent values. A set of validation rules can be prepared based on the dataset and the validations performed.
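The cleaning steps above can be sketched in plain Python. The records, field names, plausible-age rule, and mean imputation below are all illustrative assumptions, not taken from any particular dataset:

```python
# Hypothetical records with a missing value and a rule violation.
records = [
    {"age": 25, "score": 7.0},
    {"age": None, "score": 9.5},   # missing value
    {"age": 130, "score": 6.0},    # inconsistent value (violates the rule)
]

# Example validation rule: age must be known and within a plausible range.
def is_valid(rec):
    return rec["age"] is not None and 0 <= rec["age"] <= 120

# Impute missing ages with the mean of the observed ones.
known_ages = [r["age"] for r in records if r["age"] is not None]
mean_age = sum(known_ages) / len(known_ages)
for r in records:
    if r["age"] is None:
        r["age"] = mean_age

# Drop records that still violate the validation rule.
clean = [r for r in records if is_valid(r)]
```

In a real exercise the validation rules would be derived from the dataset's documentation, and a library such as pandas would typically replace the hand-rolled loops.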
2. Apply data pre-processing techniques such as standardization/normalization, transformation, aggregation, discretization/binarization, sampling etc. on any dataset.
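Two of the listed pre-processing steps, min-max normalization and equal-width discretization/binarization, can be sketched as follows; the values are invented for demonstration:

```python
values = [2.0, 4.0, 6.0, 10.0]  # illustrative attribute values

# Min-max normalization: rescale the attribute to [0, 1].
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Equal-width discretization into 2 bins, followed by binarization:
# 0 if the value falls in the lower half of the range, 1 otherwise.
mid = (lo + hi) / 2
binarized = [0 if v < mid else 1 for v in values]
```

Standardization (z-score) would follow the same pattern using the mean and standard deviation instead of the min and max.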
3. Run the Apriori algorithm to find frequent itemsets and association rules on two real datasets, and use appropriate evaluation measures to assess the correctness of the obtained patterns:
a) Use minimum support of 50% and minimum confidence of 75%.
b) Use minimum support of 60% and minimum confidence of 60%.
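A compact, library-free sketch of Apriori with the thresholds from part (a); the five market-basket transactions are invented for illustration, and a real run would load one of the datasets above (or use a library such as mlxtend):

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
n = len(transactions)
min_support = 0.5      # part (a)
min_confidence = 0.75  # part (a)

def count(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions)

# Level-wise candidate generation (Apriori principle: every subset of a
# frequent itemset must itself be frequent, so join only frequent sets).
items = sorted({i for t in transactions for i in t})
level = {frozenset([i]) for i in items if count({i}) / n >= min_support}
all_frequent = set(level)
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = {c for c in candidates if count(c) / n >= min_support}
    all_frequent |= level

# Rule generation: for X -> Y, confidence = support(X U Y) / support(X).
rules = []
for itemset in all_frequent:
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(sorted(itemset), r)):
            confidence = count(itemset) / count(lhs)
            if confidence >= min_confidence:
                rules.append((set(lhs), set(itemset - lhs), confidence))
```

On this toy data, {bread, milk} is the only frequent 2-itemset, yielding the rules bread -> milk and milk -> bread, each with confidence 0.75.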
4. Use Naive Bayes, K-nearest neighbour, and decision tree classification algorithms to build classifiers on any two datasets. Pre-process the datasets using the techniques specified in Q2. Compare the Accuracy, Precision, Recall and F1 measure reported for each dataset by each of the above classifiers under the following situations:
i. Using the Holdout method (random sampling):
a) Training set = 80%, Test set = 20%
b) Training set = 66.6% (2/3rd of total), Test set = 33.3%
ii. Using Cross-Validation:
a) 10-fold
b) 5-fold
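The four evaluation metrics named above can be computed from a confusion matrix as sketched below; the actual/predicted label vectors are invented placeholders, where a real exercise would take them from a classifier evaluated on holdout or cross-validation splits (e.g. with scikit-learn):

```python
# Hypothetical binary labels: 1 = positive class, 0 = negative class.
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]

# Confusion-matrix cells.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))

accuracy  = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Comparing classifiers then amounts to tabulating these four numbers per dataset, per classifier, and per evaluation scheme (holdout vs. 10-fold and 5-fold cross-validation).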
5. Apply the simple K-means algorithm to cluster any dataset. Compare the performance of the clusterings obtained by varying the algorithm parameters. For a given set of parameters, plot a line graph depicting the MSE obtained after each iteration.
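A minimal K-means sketch that records the MSE after each iteration, ready to plot as a line graph; the 1-D points, k = 2, and fixed initial centroids are illustrative assumptions to keep the run deterministic:

```python
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]  # invented 1-D data
k = 2
centroids = [1.0, 8.0]  # fixed initial centroids (normally chosen at random)

mse_per_iteration = []
for _ in range(10):
    # Assignment step: attach each point to its nearest centroid.
    clusters = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]
    # MSE after this iteration: mean squared distance to the assigned centroid.
    sse = sum((p - centroids[j]) ** 2 for j, c in enumerate(clusters) for p in c)
    mse_per_iteration.append(sse / len(points))
```

Plotting `mse_per_iteration` against the iteration number gives the required line graph; with real data the curve typically drops steeply and then flattens as the centroids converge.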
Project: Students should be encouraged to take up a project using a dataset downloaded from any of the websites given above, with the dataset verified by the teacher. Pre-processing steps and at least one data mining technique should be demonstrated on the selected dataset. This will give students practical knowledge of how to apply the various skills learnt in the subject to a single problem/project.

Prepared by:
1. Dr Anamika Gupta (Shaheed Sukhdev College of Business Studies)
2. Dr Manju Bhardwaj (Maitreyi College)
3. Dr Sarabjeet Kaur (Indraprastha College For Women)
4. Prof. Sharanjit Kaur (Acharya Narendra Dev College)
