PS03 Descriptive Statistics
I. Concept Questions
1. If all values in a sample are the same constant (say c), what is the standard deviation? What is
the mean? Does the mode exist?
2. The arithmetic mean of the 15 customer orders is 54. Find the new (combined) arithmetic
mean in each of the following situations:
(a) X = [5, 2, 3, 2, 5, 5]
(b) Y = [70, 65, 90, 70, 80, 75, 95]
6. Suppose that the values for a given set of data are grouped into intervals. The intervals and
corresponding frequencies are as follows.
CS40003: Data Analytics ©DSamanta, IIT Kharagpur
Age       Frequency
1-5       200
5-15      450
15-20     300
20-50     1500
50-80     700
80-110    44
7. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal,
etc.).
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
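One way to check hand calculations for Question 7 is sketched below using Python's standard `statistics` module. The quartile convention is an assumption: `statistics.quantiles` defaults to the "exclusive" method, and other textbook conventions can give slightly different values of Q1 and Q3.

```python
import statistics

# Age values exactly as listed in Question 7 (already in increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
        33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# multimode() returns every value tied for the highest frequency,
# so the length of the result indicates the modality.
modes = statistics.multimode(ages)

# With 27 values, the median is the 14th value in sorted order.
median = statistics.median(ages)

# quantiles(n=4) returns the three cut points Q1, Q2 and Q3.
# Assumption: the default "exclusive" method is an acceptable convention here.
q1, q2, q3 = statistics.quantiles(ages, n=4)

print("modes:", modes)
print("median:", median)
print("Q1:", q1, "Q3:", q3)
```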
8. In many applications, new data sets are incrementally added to the existing large data sets.
Thus, an important consideration in computing a descriptive data summary is whether a
measure can be computed efficiently in an incremental manner. Use count, standard deviation,
and median as examples to show that a distributive or algebraic measure facilitates efficient
incremental computation, whereas a holistic measure does not.
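The idea behind Question 8 can be sketched in Python as follows. The `Summary` class, its field names, and the two data chunks are hypothetical, introduced only for illustration. The sketch keeps a fixed-size summary (count, sum, sum of squares) per chunk, merges summaries by addition, and derives the population standard deviation from the merged summary. The sum-of-squares formula is chosen for brevity and can lose precision on data with a large mean and small spread.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Summary:
    """Fixed-size summary of a chunk: count, sum, and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    @classmethod
    def of(cls, values):
        return cls(len(values), sum(values), sum(v * v for v in values))

    def merge(self, other):
        # Distributive step: each component of the merged summary is the
        # sum of the corresponding components of the two parts.
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    def std(self):
        # Algebraic step: population standard deviation computed from a
        # bounded number of distributive components.
        mean = self.total / self.n
        return math.sqrt(max(self.total_sq / self.n - mean * mean, 0.0))


old_chunk = [4.0, 8.0, 15.0]     # hypothetical existing data
new_chunk = [16.0, 23.0, 42.0]   # hypothetical newly added data

# Only the two small summaries are needed, not the raw values.
merged = Summary.of(old_chunk).merge(Summary.of(new_chunk))
print(merged.n, merged.std())

# The median has no such fixed-size summary: in general the stored values
# themselves are needed to recompute it after new data arrives.
combined_median = statistics.median(old_chunk + new_chunk)
print(combined_median)
```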
9. Classify each of the following measures of central tendency as a) distributive, b) algebraic or
c) holistic:
(a) Mean
(b) Median
(c) Mode
10. Give three situations, one each in which the AM, the GM and the HM is the most appropriate
measure of central tendency.
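As a starting point for Question 10, the three means can be computed with Python's standard `statistics` module. The data below are hypothetical and the pairing of each mean with a kind of quantity is a common rule of thumb, not the only possible answer.

```python
import statistics

# Hypothetical figures, chosen only for illustration.
scores = [60, 70, 80]          # additive quantities: arithmetic mean
growth = [1.10, 1.20, 0.90]    # multiplicative yearly factors: geometric mean
speeds = [40, 60]              # km/h over two legs of equal distance: harmonic mean

am = statistics.mean(scores)
gm = statistics.geometric_mean(growth)
hm = statistics.harmonic_mean(speeds)

print("AM:", am)
print("GM:", gm)
print("HM:", hm)
```

For any set of positive values, the three means satisfy AM >= GM >= HM, with equality only when all values are the same.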
11. Suppose the frequency distributions of two samples are shown in the following graphs:
Locate the position of 1) the mean, 2) the median and 3) the mode in each of the above-mentioned
graphs.
13. What will the box plot look like for each of the following types of samples?
(a) Symmetric
(b) Positively skewed
(c) Negatively skewed
(d) In-variate data
14. The variance of a sample X = {x1, x2, x3, ..., xn} is calculated using the following formula:
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - x’)^2
where x’ is mean(x).
15. Is it possible for the standard deviation of a sample X to be zero? If so, what does
it mean? For what type of distribution of the data in X is it possible? Give an example.
16. Give an example of X for which the standard deviation takes its maximum possible value.
17. From the tabulation of marks of students who participated in four courses c1, c2, c3 and c4,
box plots are shown in the following figure:
18. The following measurements were recorded for the drying time, in hours, of a certain brand
of latex paint
(c) Calculate the sample median.
(d) Compute the 20% trimmed mean for the above data set.
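A trimmed mean of the kind asked for in part (d) can be sketched in Python as below. The function name `trimmed_mean` and the sample values are hypothetical (they are not the problem's measurements), and the sketch assumes that the number of values dropped at each end is rounded down when the proportion does not give a whole number.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    # Number of values to drop at each end (assumption: rounded down).
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical drying times in hours, not the data from the problem.
times = [2.5, 2.8, 3.0, 3.1, 3.4, 3.6, 3.7, 4.0, 4.4, 5.6]

# With 10 values and a 20% trim, the 2 smallest and 2 largest are dropped.
print(trimmed_mean(times, 0.20))
```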
19. A tire manufacturer wants to determine the inner diameter of a certain grade of tire. Ideally,
the diameter would be 570 mm. The data are as follows:
II. Objective Questions
1. The scores of eight persons in an IQ test were:
(a) 107
(b) 110
(c) 112
(d) 104
(e) None of the above.
(e) a nominal variable
(f) an ordinal variable
(g) an interval variable
(h) a ratio variable.
9. A measurable characteristic of a population is:
(a) a parameter
(b) a statistic
(c) a sample
(d) an experiment.
10. What is the primary characteristic of a set of data for which the standard deviation is
zero?
(a) All values of the variable appear with equal frequency.
(b) All values of the variable have the same value.
(c) The mean of the values is also zero.
(d) All of the above are correct.
(e) None of the above is correct.
11. Let X be the distance, in miles, between the present homes of individuals at a class reunion
and the residences they lived in during high school. Then X is:
(a) a population
(b) a statistic
(c) a sample
(d) none of the above.
13. The median is a better measure of central tendency than the mean if:
14. A small sample of automobile owners at IIT Kharagpur produced the following number
of parking tickets during a particular year: 4, 0, 3, 2, 5, 1, 2, 1, 0. The mean number of
tickets (rounded to the nearest tenth) is:
(a) 1.7
(b) 2.0
(c) 2.5
(d) 3.0
15. A set of data points follows a simple linear relation y = 3x + 2, where x is an integer. The
mean of the values of y for all values of x in the range [1 ... 100] is
(a) 50
(b) 50.5
(c) 152
(d) 152.5
16. Suppose the frequency distributions of two samples (I and II) are shown in the following figure:
[Figure: frequency f plotted against X for samples I and II, with X1, X2 and X3 marked on the X axis]
(a) The means, medians and modes for both I and II will be located at X2.
(b) The means of both I and II are at X1 and median and mode of II are at X1 and X3,
respectively.
(c) The means of both I and II are at X1 and mode and median of II are at X1 and X3,
respectively.
(d) Data II has neither a median nor a mean.
17. The number of wickets taken by a bowler in 10 Test matches is shown in the following table.
Number of wickets         0   1   2   3   4
Number of Test matches    1   3   4   1   1
(a) 1
(b) 2
(c) 3
(d) 4