Data Mining Assignment 1 2023 Preprocessing and Frequent Pattern

This document outlines an assignment for a data mining course which requires students to explore, preprocess, and conduct frequent pattern mining on a dataset of graduate university students. Students must analyze the dataset, identify patterns and interesting rules, and provide recommendations based on the rules. The assignment is divided into tasks of data exploration, preprocessing, frequent pattern mining using association rules, and answering questions about insights gained from the analysis.

Assignment 1

Data Exploration, Pre-processing, and Frequent Pattern Mining


Data Mining, Fall 2023

Due Date: 20th September 2023

Submission Location: Submit a Word document on Google Classroom. The name of the document should be your
roll number.

Question:

In this assignment, you have to pre-process the data and identify interesting patterns in the given dataset of
Graduate University Students.

1) Data Exploration
Explore and pre-process the given dataset using WEKA. Report your findings in a Word document
and upload the document to Google Classroom.

a. [20 marks] Explore the dataset:


a. For each attribute, report the following: type, mean, median, mode, range, and variance. These
measures of central tendency and dispersion help to analyze the attribute.

b. For each attribute, identify data quality issues such as missing values, inconsistencies, noise, and outliers.
Suggest an appropriate response if any of these problems exist in a specific
attribute. For example, explain how you intend to handle missing values or outliers.

c. Analyze the attributes based on the above information. (Don't just give numerical values; also explain
in simple English what they tell you about the attribute.)
i. How is an attribute distributed? (normal, skewed) and
ii. Find other insights, such as which attributes can be eliminated because of little or no change
in variance (Low variance filter).

b. [5 marks] Explore correlation among different attributes.


a. Analyze which attributes are positively related and which are negatively related.
b. Use graphs like scatter plots to get insights.

c. [5 marks] Discuss the new insights you found from visualizing and exploring the data, the techniques you tested,
and the results you obtained. You can include the different graphs and plots you have used for visualization,
but do explain them in plain English.
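
The measures requested in this section are read off the WEKA interface, but it can help to see how they are computed. The following is a minimal sketch in Python, using only the standard library. The attribute names and values (study_hours, cgpa, absences) are made up for illustration and are not taken from the assignment dataset.

```python
import math
import statistics

def summarize(values):
    """Return the summary measures requested for one numeric attribute."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, one entry per student (assumption, not real data).
study_hours = [2, 4, 4, 5, 6, 7, 30]
cgpa = [2.4, 2.9, 2.8, 3.1, 3.3, 3.5, 3.8]
absences = [14, 11, 12, 9, 7, 5, 1]

summary = summarize(study_hours)
print(summary)

# A mean well above the median hints at right skew; here the value 30 is a
# likely outlier that pulls the mean upward.
print("right-skewed" if summary["mean"] > summary["median"] else "not right-skewed")

# A coefficient above 0 means the attributes rise together; below 0 means
# one falls as the other rises.
print(pearson(study_hours, cgpa))
print(pearson(absences, cgpa))
```

A variance close to zero in such a summary marks an attribute as a candidate for the low variance filter, although variance depends on the attribute's scale, so compare attributes only after normalization.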

2) Data Pre-processing and Frequent Pattern Mining

After data exploration, your task is to pre-process the given dataset and find trends and patterns using
association rule mining. Pre-processing includes data discretization (binning), data reduction, data smoothing,
and feature selection. Explain your choices, such as why you selected equal frequency or width binning. Also,
explain your choices for normalization and data reduction.
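
To make the binning choice concrete, here is a small sketch of the two strategies in Python. It is an illustration only and does not reproduce WEKA's discretization filter; the function names and the sample values are assumptions.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign bin indices so that each bin holds roughly len(values) / k values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

# Hypothetical weekly study hours with two unusually large values.
hours = [2, 3, 4, 5, 28, 30]

print(equal_width_bins(hours, 3))      # the middle bin stays empty
print(equal_frequency_bins(hours, 3))  # two values per bin
```

With skewed data, equal-width binning can leave some bins empty and crowd most records into one bin, while equal-frequency binning balances the bin sizes at the cost of uneven interval widths. Note that this simple equal-frequency version may place identical values in different bins.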

NOTE: Data pre-processing and frequent pattern mining is an iterative process. You may need to pre-process
data multiple times to identify exciting and valuable rules that give new insights.
Experiment with different parameters to extract strong rules (e.g., rules with high lift and confidence, which at
the same time have relatively good support). Convert the dataset into a form suitable for Association Rule
Mining. Pre-process the attributes so you can see some patterns in data and extract rules using Apriori.
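
As a reminder of what support, confidence, and lift measure, the sketch below computes them for a single rule in Python. Each student record is represented as a set of "attribute=value" items, which is the form the data takes after discretization. The records and item names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur at least once.
    """
    sup_a = support(transactions, antecedent)
    sup_c = support(transactions, consequent)
    sup_rule = support(transactions, set(antecedent) | set(consequent))
    confidence = sup_rule / sup_a
    lift = confidence / sup_c
    return {"support": sup_rule, "confidence": confidence, "lift": lift}

# Hypothetical pre-processed student records (assumption, not real data).
transactions = [
    {"study=high", "job=no", "cgpa=high"},
    {"study=high", "job=no", "cgpa=high"},
    {"study=high", "job=yes", "cgpa=medium"},
    {"study=low", "job=yes", "cgpa=low"},
    {"study=low", "job=no", "cgpa=medium"},
]

metrics = rule_metrics(transactions, {"study=high"}, {"cgpa=high"})
print(metrics)
```

A lift above 1 means the antecedent and consequent occur together more often than they would if they were independent; a lift near 1 suggests the rule carries little information even when its confidence is high.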

1. [10 points] Use confidence as an interestingness measure of an association rule. Rank the top 10
association rules for at least three different combinations of support and confidence. Explain the rules
and why you consider them interesting and valuable. Furthermore, also give recommendations based on
the discovered rules that might help the user.

2. [10 points] Use interest as an interestingness measure of an association rule. Rank the top 10 association
rules for at least three different combinations of support and interest. Explain the rules and why you consider them
interesting and useful. Furthermore, also give recommendations based on the discovered rules that might
help the user.

3. [10 points] Formulate some questions that you want your rule extraction system to answer.
Select the attributes that will be required to answer your questions. Run Association rule mining to
extract interesting patterns. Show at least 10 rules. Explain the rules and why you consider them
interesting and useful. Explain what insight you got regarding your questions.
a. For example, one may want to find the effect of the number of study hours, job, marital status,
and highly educated parents on CGPA. To figure this out, select the appropriate attributes,
pre-process them, and run Apriori. You can set the class attribute in Weka to find rules about a
particular attribute.
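
The idea of mining rules about one particular attribute can be sketched as follows. This is a simplified level-wise search in the style of Apriori, written in Python for illustration. It is not WEKA's implementation, and the records, item names, and thresholds are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Keep the candidates of size k that meet the support threshold.
        level = {}
        for cand in current:
            sup = sum(cand <= t for t in transactions) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def rules_for_target(transactions, target, min_support, min_confidence):
    """Rules of the form X -> {target}, ranked by confidence, then support."""
    freq = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, sup in freq.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            conf = sup / freq[antecedent]  # every subset of a frequent set is frequent
            if conf >= min_confidence:
                rules.append((sorted(antecedent), target, sup, conf))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))

# Hypothetical pre-processed student records (assumption, not real data).
transactions = [
    {"study=high", "job=no", "cgpa=high"},
    {"study=high", "job=no", "cgpa=high"},
    {"study=high", "job=yes", "cgpa=medium"},
    {"study=low", "job=yes", "cgpa=low"},
    {"study=low", "job=no", "cgpa=medium"},
]

rules = rules_for_target(transactions, "cgpa=high", min_support=0.3, min_confidence=0.6)
for antecedent, target, sup, conf in rules:
    print(antecedent, "->", target, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Restricting the consequent to one item answers questions of the form "what leads to a high CGPA?" and keeps the rule list short enough to interpret by hand.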

Note: The top 5 most interesting rules are most likely not the top 5 in the result set of the Apriori algorithm.
They are rules that, in addition to having high support, lift, and confidence, also give some non-trivial, useful
information based on the underlying business objectives.

Submission: Do include the different graphs and plots that you have used for visualization.

Note: The following Weka and Data Mining tutorial is helpful:

http://facweb.cs.depaul.edu/mobasher/classes/ect584/WEKA/index.html
