The Handwritten Solutions To The First Five Questions, and The Report of Last Question
Submission Instructions: Submit a single PDF file (or zipped folder) on LMS containing
the handwritten solutions to the first five questions and the report for the last question.
1. Given the following data (in ascending order) for the attribute age: 12, 15, 16, 16,
19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52,
72. [10 Marks]
(a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
Illustrate your steps. Comment on the effect of this technique for the given data.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
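As an illustration of part (a), the following is a minimal Python sketch of smoothing by bin means. It assumes the data are already sorted and are split into consecutive equal-depth bins; the function name smooth_by_bin_means is a hypothetical helper, not part of any library, and the sketch is not a substitute for the handwritten steps.

```python
def smooth_by_bin_means(sorted_values, depth):
    """Replace each value with the mean of its equal-depth bin.

    Assumes sorted_values is already in ascending order. A final bin
    shorter than `depth` (if any) is averaged over its own length.
    """
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every member of the bin is replaced by the bin mean.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


ages = [12, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 72]

smoothed_ages = smooth_by_bin_means(ages, depth=3)
print([round(v, 2) for v in smoothed_ages])
```

With 27 values and a depth of 3, the sketch produces nine bins, and each value is replaced by the mean of the three values it shares a bin with.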
3. Use the following methods to normalize this group of data: 200, 300, 400,
600, 1000 [10 Marks]
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
(c) normalization by decimal scaling
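The three normalization methods above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the z-score uses the population standard deviation (dividing by n), whereas some texts divide by n - 1; decimal scaling picks the smallest integer j such that the largest absolute scaled value is below 1. All function names are hypothetical helpers.

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values onto [new_min, new_max]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]


def z_score(values):
    """Centre on the mean and scale by the population standard deviation."""
    mean = sum(values) / len(values)
    # Assumption: population variance (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


data = [200, 300, 400, 600, 1000]
print(min_max(data))
print([round(z, 4) for z in z_score(data)])
print(decimal_scaling(data))
```

Note that the choice between population and sample standard deviation changes the z-scores, so state which one is used in a handwritten solution.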
4. Using the data for age given in question 1, answer the following: [10 Marks]
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0,
1.0].
(b) Use z-score normalization to transform the value 35 for age.
(c) Use normalization by decimal scaling to transform the value 35 for age.
(d) Comment on which method you would prefer to use for the given data, giving
reasons as to why.
5. Suppose a group of 12 sales price records has been sorted as follows: [10 Marks]
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.
Partition them into three bins by each of the following methods:
(a) equal-frequency (equal-depth) partitioning
(b) equal-width partitioning
(c) clustering
(d) Perform numerosity reduction on these data to obtain a 50% data reduction.
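Parts (a) and (b) can be sketched in Python as shown below. This is only one possible reading of the two methods: the equal-frequency helper assumes the number of records divides evenly by the number of bins, and the equal-width helper assumes the maximum value is placed in the last bin. Both function names are hypothetical.

```python
def equal_frequency_bins(sorted_values, n_bins):
    """Split sorted values into n_bins bins holding the same number of items.

    Assumes len(sorted_values) is divisible by n_bins.
    """
    size = len(sorted_values) // n_bins
    return [sorted_values[i * size:(i + 1) * size] for i in range(n_bins)]


def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # Clamp the index so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(v)
    return bins


prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(equal_frequency_bins(prices, 3))
print(equal_width_bins(prices, 3))
```

Here the range 5 to 215 gives a bin width of 70, so the equal-width bins are far less balanced than the equal-frequency ones for these skewed prices.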
a) Load the Iris dataset (available in the data folder of the Weka installation). Explore
the different filters in the Preprocess tab and demonstrate the use of the following:
attribute: Discretize, Normalize, Standardize, MathExpression
instance: Resample, Randomize
Assignment 3: Data Pre-processing Due Date: 24 April, 2020 Marks: 100
b) Apply attribute selection (from the Select attributes tab) and report the resulting
attributes, using the following methods:
CfsSubsetEval
PrincipalComponents
c) Apply dimensionality reduction to the SGPA dataset you created in Assignment 1
for prediction of your Spring 2020 SGPA. Report the original dimensions and the
reduced dimensions obtained.