
Id No, Inst, Time, Status, Age, Sex, Ph. Ecog, Ph. Karno, Pat. Karno, Meal Cal, Wt Loss

1. The document sets exercises on data normalization and smoothing: applying min-max normalization, z-score normalization, and binning to several datasets.
2. For the age data, it asks for min-max normalization, z-score normalization, and decimal scaling of the value 35. For the sales data, it asks for partitioning into equal-frequency and equal-width bins.
3. It also asks to discretize age into categories and to sketch examples of sampling techniques (SRSWOR, SRSWR, cluster sampling, and stratified sampling) on the age data.
4. Additionally, it asks for R programs to find the factorial of a number and to create a student data frame.

Uploaded by

likhith

(5 × 10 = 50)

1. Use the two methods below to normalize the following data:
20, 35, 40, 63, 17, 29
(a) What is the mean of the data? What is the median? (2)
(b) min-max normalization by setting min = 0 and max = 1 (4)
(c) z-score normalization (4)
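
As a rough illustration of what parts (a) to (c) compute, here is a minimal Python sketch. The helper names min_max and z_scores are hypothetical, and the z-score helper assumes the population standard deviation (dividing by n); using the sample standard deviation instead would change the z-scores.

```python
from statistics import mean, median, pstdev

data = [20, 35, 40, 63, 17, 29]


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale every value linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_scores(values):
    """Subtract the mean and divide by the population standard deviation (assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


print("mean:", mean(data))
print("median:", median(data))
print("min-max:", [round(v, 3) for v in min_max(data)])
print("z-score:", [round(v, 3) for v in z_scores(data)])
```

After min-max normalization the smallest value (17) maps to 0 and the largest (63) maps to 1; after z-score normalization the values have mean 0.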

2. Suppose that the data for analysis include the attribute age. The age values
for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35,
35, 36, 40, 45, 46, 52, 70
Smooth the data using each of the following methods, with a bin depth of 3:
(a) smoothing by bin means (3)
(b) smoothing by bin medians (3)
(c) smoothing by bin boundaries (4)
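
The three smoothing methods can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and for smoothing by bin boundaries it assumes that a value exactly halfway between the two boundaries is moved to the lower one.

```python
from statistics import mean, median

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def bins_of_depth(values, depth):
    """Split the sorted values into consecutive equal-frequency bins."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth(values, depth, how):
    """Replace each value by its bin mean, bin median, or nearest bin boundary."""
    smoothed = []
    for b in bins_of_depth(values, depth):
        if how == "means":
            smoothed.extend([mean(b)] * len(b))
        elif how == "medians":
            smoothed.extend([median(b)] * len(b))
        elif how == "boundaries":
            lo, hi = b[0], b[-1]
            # Assumption: ties go to the lower boundary.
            smoothed.extend(lo if v - lo <= hi - v else hi for v in b)
        else:
            raise ValueError("unknown smoothing method: " + how)
    return smoothed


for method in ("means", "medians", "boundaries"):
    print(method, smooth(ages, 3, method))
```

With 27 values and a depth of 3, the data fall into nine bins; the first bin is 13, 15, 16.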

3. In real-world data, tuples with missing values for some attributes are a
common occurrence. Fill in the missing values in the following data using an
appropriate method, and justify why you chose that method. (10 M)
Id no  Inst  time  status  age  sex  ph.Ecog  ph.karno  pat.karno  meal.Cal  wt.loss
1 3 306 2 74 1 1 90 100 1175
2 3 455 2 68 1 0 90 90 1225 15
3 3 1 56 1 0 90 90 15
4 5 210 57 1 1 90 60 1150 11
5 1 883 2 60 1 0 90 0
6 12 1022 1 1 1 50 80 513 0
7 7 310 2 68 2 2 70 60 384 10
8 11 2 71 2 2 60 80 1
9 1 218 2 53 1 1 70 80 825 16
10 7 166 2 61 2 70 271 34
4. Write a program in R to print a square pattern with # character.
Sample Output:
Print a pattern like square with # character:
Input the number of characters for a side: 4
#
##
###
####

5. State the difference between a vector and a list in R. Explain with examples.


1. Using the following data for age: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25,
25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70,
answer the following:
(a) Use min-max normalization to transform the value 35 for age onto the
range [0.0, 1.0]. (3)
(b) Use z-score normalization to transform the value 35 for age, where the
standard deviation of age is 12.94 years. (3)
(c) Use normalization by decimal scaling to transform the value 35 for
age. (3)
(d) Comment on which method you would prefer to use for the given data,
giving reasons as to why. (1)
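
For reference, the three transformations in parts (a) to (c) can be sketched in Python for a single value. The function names are hypothetical; the standard deviation of 12.94 is the one given in part (b), and decimal scaling is taken here to mean dividing by the smallest power of ten that brings every absolute value below 1.

```python
from statistics import mean

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
STD_AGE = 12.94  # standard deviation of age, as given in part (b)


def min_max_value(v, values, new_min=0.0, new_max=1.0):
    """Map v linearly from [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


def z_score_value(v, values, std):
    """Distance of v from the mean, measured in standard deviations."""
    return (v - mean(values)) / std


def decimal_scale_value(v, values):
    """Divide v by 10**j, with j the smallest integer so that max(|x|) / 10**j < 1."""
    max_abs = max(abs(x) for x in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return v / 10 ** j


print(round(min_max_value(35, ages), 3))
print(round(z_score_value(35, ages, STD_AGE), 3))
print(decimal_scale_value(35, ages))
```

Here the minimum is 13 and the maximum is 70, so min-max normalization gives (35 - 13) / (70 - 13), and decimal scaling divides by 100.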

2. Suppose a group of 12 sales price records has been sorted as follows:


5, 10, 11, 13, 15, 35, 50, 55, 72, 92
Partition them into three bins by each of the following methods:
(a) equal-frequency (equal-depth) partitioning, bin depth=3 (5)
(b) equal-width partitioning, bin width = 10 (5)
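
A minimal Python sketch of the two partitioning methods, applied to the ten values as listed above. The function names are hypothetical, and the equal-width version assumes half-open intervals [start, start + width) that begin at the smallest value; other conventions for placing the interval edges are equally valid.

```python
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92]


def equal_frequency_bins(values, depth):
    """Consecutive bins holding `depth` sorted values each (the last may be shorter)."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def equal_width_bins(values, width):
    """Group values into intervals of fixed width, starting at the minimum (assumption)."""
    lo = min(values)
    bins = {}
    for v in sorted(values):
        start = lo + ((v - lo) // width) * width
        bins.setdefault((start, start + width), []).append(v)
    return bins


print(equal_frequency_bins(prices, 3))
for interval, members in equal_width_bins(prices, 10).items():
    print(interval, members)
```

Only non-empty intervals are reported by equal_width_bins; empty intervals between them are skipped.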
3. Using the data for age given in Question 1:
(a) Discretize the age attribute into “youth,” “middle-aged,” and “senior.”
(b) Sketch examples of each of the following sampling techniques:
SRSWOR, SRSWR, cluster sampling, stratified sampling. Use samples of
size 5 and the strata “youth,” “middle-aged,” and “senior.”
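
The discretization and the four sampling schemes can be sketched in Python as below. Several things here are assumptions made only for illustration: the age cut-offs (20 and 50), the cluster size of 5, the stratum allocation of 1, 3, and 1, and all function names.

```python
import random

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
tuples = list(enumerate(ages, start=1))  # (tuple id, age)


def label(age):
    """Hypothetical cut-offs: youth <= 20, middle-aged 21 to 50, senior > 50."""
    if age <= 20:
        return "youth"
    if age <= 50:
        return "middle-aged"
    return "senior"


def srswor(population, n, rng):
    """Simple random sample without replacement: no tuple is drawn twice."""
    return rng.sample(population, n)


def srswr(population, n, rng):
    """Simple random sample with replacement: a tuple may be drawn again."""
    return [rng.choice(population) for _ in range(n)]


def cluster_sample(population, cluster_size, n_clusters, rng):
    """Split into consecutive clusters, then keep whole clusters chosen at random."""
    clusters = [population[i:i + cluster_size]
                for i in range(0, len(population), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [t for cluster in chosen for t in cluster]


def stratified_sample(population, allocation, rng):
    """Draw the requested number of tuples, without replacement, from each stratum."""
    sample = []
    for stratum, k in allocation.items():
        members = [t for t in population if label(t[1]) == stratum]
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(srswor(tuples, 5, rng))
print(srswr(tuples, 5, rng))
print(cluster_sample(tuples, 5, 1, rng))  # one cluster; the last cluster is shorter
print(stratified_sample(tuples, {"youth": 1, "middle-aged": 3, "senior": 1}, rng))
```

The allocation of 1, 3, and 1 is roughly proportional to the stratum sizes under the assumed cut-offs, while still drawing at least one tuple from each stratum.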

4. Write an R program to find the factorial of a given number.

5. Create a student data frame using R language.
