Assign 1
Submission: Submit a hard copy of the assignment in the second Data Mining class of the week (23 or 24 Nov. 2023).
1. (20 points)
Apply your basic data mining knowledge to compare students’ performance in the midterm exam of a course
across two years, 2020 and 2021 (result_20_21.xls). Support your comments and comparison with statistical
descriptions of the data (e.g., mean, median, mode, variance, five-number summary) and plots (e.g., boxplot,
histogram). (A 2- to 3-page report is required.)
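As a starting point for the statistical description, the measures listed above can be sketched with Python's standard library. This is only an illustrative sketch: the function name describe and the two score lists are made-up placeholders, not values from result_20_21.xls, and the "inclusive" quartile method is one of several common conventions.

```python
import statistics


def describe(scores):
    """Return basic descriptive statistics for a list of numeric scores."""
    # Quartiles via the "inclusive" method (an assumption; other
    # conventions give slightly different Q1 and Q3 values).
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "modes": statistics.multimode(scores),    # every most-frequent value
        "variance": statistics.variance(scores),  # sample variance (n - 1)
        "five_number": (min(scores), q1, q2, q3, max(scores)),
    }


# Made-up placeholder scores, NOT taken from result_20_21.xls.
scores_2020 = [12, 15, 15, 18, 20, 22, 25]
scores_2021 = [10, 14, 16, 16, 19, 23, 28]

for year, scores in (("2020", scores_2020), ("2021", scores_2021)):
    print(year, describe(scores))
```

Running the same function on both years gives directly comparable summaries; boxplots and histograms would then be drawn with a plotting tool of your choice.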
2. (20 points)
Download the DryBean dataset from the UCI Machine Learning Repository. Read the dataset’s description and report
the following (use any language or tool of your choice to solve this problem):
a. The types of the attributes (continuous [interval, ratio], categorical [nominal, ordinal]). Also identify which
attribute(s) are input attribute(s) and which are class attribute(s) (if any).
b. Compute the five-number summary for any two continuous attributes. Compute the mode for categorical
attributes.
c. Compute the mean and standard deviation for the two continuous attributes selected in (b).
d. Generate the quantile (percentile) plots for two attributes of the dataset.
e. Generate the histogram or distribution plot for each of the two attributes selected in (b).
f. Generate the scatter plots for the two attributes selected in (d).
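Parts (b) to (d) can be sketched in a few lines of Python. This is a hedged illustration only: the helper names (quantile_plot_points, mode_of, plot_quantiles) and the sample values are hypothetical and do not come from the DryBean dataset. The quantile plot here pairs each sorted value x_i with f_i = (i - 0.5) / n, which is one common textbook convention; matplotlib is assumed only for the optional plotting helper.

```python
import statistics
from collections import Counter


def quantile_plot_points(values):
    """Pair each sorted value x_i with f_i = (i - 0.5) / n, for i = 1..n."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]


def mode_of(labels):
    """Return the most frequent label(s) of a categorical attribute."""
    counts = Counter(labels)
    top = max(counts.values())
    return sorted(label for label, count in counts.items() if count == top)


def plot_quantiles(values, name):
    """Draw the quantile plot (requires matplotlib, imported lazily)."""
    import matplotlib.pyplot as plt

    fs, xs = zip(*quantile_plot_points(values))
    plt.plot(fs, xs, marker="o")
    plt.xlabel("f-value")
    plt.ylabel(name)
    plt.show()


# Made-up placeholder values, NOT taken from the DryBean dataset.
attribute_values = [30.0, 10.0, 20.0, 40.0]
class_labels = ["A", "B", "A", "C"]

points = quantile_plot_points(attribute_values)
mean = statistics.mean(attribute_values)
std_dev = statistics.stdev(attribute_values)  # sample standard deviation
print(points, mode_of(class_labels), mean, std_dev)
```

Calling plot_quantiles(attribute_values, "attribute name") would display the plot; a histogram or scatter plot for parts (e) and (f) can be produced with the same plotting library.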
3. (10 points)
Download and install Weka, a data mining tool, on your system. Explore the tool and the datasets provided
with the installation. Submit a report containing basic statistics and plots (e.g., scatter plot matrix) for the Iris
dataset, produced using Weka. (A 2- to 3-page report is required.)
https://sourceforge.net/projects/weka/
https://www.cs.waikato.ac.nz/ml/weka/
https://waikato.github.io/weka-wiki/downloading_weka/