Data Mining Assignment 1 2023 Preprocessing and Frequent Pattern

This document outlines an assignment for a data mining course which requires students to explore, preprocess, and conduct frequent pattern mining on a dataset of graduate university students. Students must analyze the dataset, identify patterns and interesting rules, and provide recommendations based on the rules. The assignment is divided into tasks of data exploration, preprocessing, frequent pattern mining using association rules, and answering questions about insights gained from the analysis.

Uploaded by

Asma MSCS 2022 FAST NU LHR


Assignment 1

Data Exploration, Pre-processing, and Frequent Pattern Mining

Data Mining, Fall 2023

Due Date: 20th September 2023

Submission Location: Submit a Word document on Google Classroom. The name of the document should be your
roll number.

Question:

In this assignment, you have to pre-process the data and identify interesting patterns in the given dataset of
Graduate University Students.

1) Data Exploration
Pre-process and explore the given dataset using WEKA. Report your findings in a Word document
and upload the document on Google Classroom.

a. [20 marks] Explore the dataset:

a. For each attribute, report the following: type, mean, median, mode, range, and variance. These
measures of central tendency and dispersion help to analyze the attribute.

b. For each attribute, identify data quality issues such as missing values, inconsistency, noise, and outliers.
Suggest an appropriate response if any of these problems exist in specific data
attributes. For example, state how you intend to handle missing values and outliers.

c. Analyze the attributes based on the above information. (Don't just give numerical values; also explain
in simple English what information it gave you regarding the attribute)
i. How is an attribute distributed? (normal, skewed) and
ii. Find other insights, such as which attributes can be eliminated because of little or no change
in variance (Low variance filter).
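The measures requested above can be cross-checked outside WEKA. The following is a minimal sketch in Python using only the standard library; the function name `describe`, the attribute name `study_hours`, and the sample values are assumptions made for illustration and are not taken from the assignment dataset. It uses the 1.5 x IQR rule to flag outliers and Pearson's second skewness coefficient as a rough skew indicator, which are common choices rather than requirements of the assignment.

```python
import statistics

def describe(values):
    """Summary statistics for one numeric attribute; None marks a missing value."""
    xs = sorted(v for v in values if v is not None)
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    stdev = statistics.stdev(xs)              # sample standard deviation
    q1, _, q3 = statistics.quantiles(xs, n=4)  # quartiles
    iqr = q3 - q1
    return {
        "missing": len(values) - len(xs),
        "mean": mean,
        "median": median,
        "mode": statistics.multimode(xs),      # list, since ties are possible
        "range": xs[-1] - xs[0],
        "variance": statistics.variance(xs),   # sample variance
        # Pearson's second skewness coefficient: > 0 suggests right skew
        "skew": 3 * (mean - median) / stdev if stdev else 0.0,
        # 1.5 * IQR rule for flagging potential outliers
        "outliers": [x for x in xs if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr],
    }

# Hypothetical values for an attribute such as weekly study hours
study_hours = [2, 3, 3, 4, 5, None, 40]
summary = describe(study_hours)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean far above the median, as in this made-up sample, points to a right-skewed attribute, and a variance close to zero would mark an attribute as a candidate for the low variance filter.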

b. [5 marks] Explore correlation among different attributes.

a. Analyze which attributes are positively related and which are negatively related.
b. Use graphs like scatter plots to get insights.
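The sign of the Pearson correlation coefficient is one simple way to tell positively related attributes from negatively related ones. The sketch below computes it by hand; the function name `pearson` and the attribute values are hypothetical and serve only to illustrate the calculation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical attribute values, not taken from the assignment dataset
study_hours = [1, 2, 3, 4, 5]
commute_hours = [5, 4, 3, 2, 1]
cgpa = [2.0, 2.5, 3.0, 3.5, 4.0]

r_study = pearson(study_hours, cgpa)      # positive: attributes rise together
r_commute = pearson(commute_hours, cgpa)  # negative: one rises as the other falls
print(r_study, r_commute)
```

A coefficient near zero only rules out a linear relationship, so a scatter plot is still worth inspecting for curved patterns.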

c. [5 marks] Discuss the new insights you found from visualizing and exploring the data, the techniques you tested,
and the results you obtained. You can include the different graphs and plots you have used for visualization,
but do explain them in plain English.

2) Data Pre-processing and Frequent Pattern Mining

After data exploration, your task is to pre-process the given dataset and find trends and patterns using
association rule mining. Pre-processing includes data discretization (binning), data reduction, data smoothing,
and feature selection. Explain your choices, such as why you selected equal frequency or width binning. Also,
explain your choices for normalization and data reduction.
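The difference between equal-width and equal-frequency binning is easiest to see on a skewed attribute. The following sketch assumes hypothetical function names and made-up values; it is not how WEKA's Discretize filter is implemented, only an illustration of the two ideas.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:                      # all values identical
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it to the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * k // len(values)
    return bins

# Hypothetical skewed attribute with one extreme value
values = [1, 2, 3, 4, 5, 100]
width_bins = equal_width_bins(values, 3)
freq_bins = equal_frequency_bins(values, 3)
print(width_bins)
print(freq_bins)
```

With equal width, the extreme value stretches the intervals so that almost everything falls into the first bin, while equal frequency keeps the bins balanced. Note that this simple rank-based version can place tied values in different bins.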

NOTE: Data pre-processing and frequent pattern mining is an iterative process. You may need to pre-process
data multiple times to identify exciting and valuable rules that give new insights.
Experiment with different parameters to extract strong rules (e.g., rules with high lift and confidence, which at
the same time have relatively good support). Convert the dataset into a form suitable for Association Rule
Mining. Pre-process the attributes so you can see some patterns in data and extract rules using Apriori.
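To make the conversion step concrete, the sketch below turns discretized records into sets of `attribute=value` items and mines frequent itemsets with a small level-wise Apriori search. It is an illustrative sketch, not WEKA's implementation; the function names, attribute names, and records are all assumptions.

```python
from itertools import combinations

def to_transactions(records):
    """Turn each record (dict of nominal attributes) into a set of 'attribute=value' items."""
    return [frozenset(f"{k}={v}" for k, v in r.items()) for r in records]

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        level = {c for c in candidates
                 if all(frozenset(s) in result for s in combinations(c, k - 1))
                 and support(c) >= min_support}
    return result

# Hypothetical discretized records, not taken from the assignment dataset
records = [
    {"job": "yes", "cgpa": "low"},
    {"job": "yes", "cgpa": "low"},
    {"job": "no", "cgpa": "high"},
    {"job": "no", "cgpa": "high"},
    {"job": "yes", "cgpa": "high"},
]
transactions = to_transactions(records)
itemsets = frequent_itemsets(transactions, min_support=0.4)
for itemset, sup in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

Lowering `min_support` lets rarer itemsets through at the cost of many more candidates, which is one of the parameters worth experimenting with.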

1. [10 points] Use confidence as an interestingness measure of an association rule. Rank the top 10
association rules for at least three different combinations of support and confidence. Explain the rules
and why you consider them interesting and valuable. Furthermore, also give recommendations based on
the discovered rules that might help the user.

2. [10 points] Use interest (also called lift) as an interestingness measure of an association rule. Rank the top 10 association
rules for at least three combinations of support and interest. Explain the rules and why you consider them
interesting and useful. Furthermore, also give recommendations based on the discovered rules that might
help the user.

3. [10 points] Try to formulate some questions that you want to ask of your rule extraction system.
Select the attributes that will be required to answer your questions. Run Association rule mining to
extract interesting patterns. Show at least 10 rules. Explain the rules and why you consider them
interesting and useful. Explain what insight you got regarding your questions.
a. For example, one may want to find the effect of the number of study hours, job, marital status,
and highly educated parents on CGPA. To figure this out, select the appropriate attributes,
pre-process them, and run Apriori. You can set the class attribute in Weka to find rules about a
particular attribute.
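The three measures used in the tasks above can be computed directly from item counts, which helps when checking the rules reported by WEKA. The sketch below assumes hypothetical transactions and a hypothetical function name `rule_metrics`; interest is computed as lift, that is, the rule's confidence divided by the support of the consequent.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift (interest) of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(a <= t for t in transactions)
    count_c = sum(c <= t for t in transactions)
    count_ac = sum((a | c) <= t for t in transactions)
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return {"support": count_ac / n, "confidence": confidence, "lift": lift}

# Hypothetical transactions, not taken from the assignment dataset
transactions = [
    frozenset({"job=yes", "cgpa=low"}),
    frozenset({"job=yes", "cgpa=low"}),
    frozenset({"job=no", "cgpa=high"}),
    frozenset({"job=no", "cgpa=high"}),
    frozenset({"job=yes", "cgpa=high"}),
]

# Candidate rules whose consequent is the attribute of interest (here cgpa)
candidates = [
    (["job=yes"], ["cgpa=low"]),
    (["job=no"], ["cgpa=high"]),
    (["job=yes"], ["cgpa=high"]),
]
scored = [(a, c, rule_metrics(transactions, a, c)) for a, c in candidates]
ranked = sorted(scored, key=lambda rule: rule[2]["confidence"], reverse=True)
for a, c, m in ranked:
    print(a, "->", c, m)
```

A lift above 1 means the antecedent and consequent occur together more often than they would if independent, while a lift below 1 means less often. Ranking the same rules by lift instead of confidence only requires changing the sort key.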

Note: The top 5 most interesting rules are most likely not the top 5 in the result set of the Apriori algorithm.
They are rules that, in addition to having high support, lift, and confidence, also give some non-trivial, useful
information based on the underlying business objectives.

Submission: Do include the different graphs and plots that you have used for visualization.

Note: The following Weka and Data Mining tutorial is helpful:

http://facweb.cs.depaul.edu/mobasher/classes/ect584/WEKA/index.html
