Assignment
1. In a survey, citizens of selected cities were asked to select their preferred mode of
transportation from the options: car, bus, train, bicycle, and walking. Is the data collected
quantitative or qualitative? Explain why.
2. Representatives of the insurance industry aimed to investigate the monetary loss resulting
from earthquake damage to single-family dwellings in KPK, in October 2005. From the set
of all single-family homes in KPK, 100 homes were selected for inspection. Point out the
population and sample for this problem.
3. A survey was performed to gather information about students' exam scores. The scores
obtained by five students in a particular subject are as follows: 80, 75, 90, 85, and 95.
Identify the elements, variables, and observations in this dataset.
4. A survey aims to gather information about the consumption habits of university students. The
researcher randomly selects 300 students from a university with a total student population of
10,000. Determine the population and the sample in this scenario.
5. A census is conducted to gather data on the income levels of households in a town. The
census covers all 5000 households residing in the town. Identify the population and explain
why it is a census rather than a sample.
6. A questionnaire asked the participants to rate their agreement with a statement on a scale
from 1 to 5, where 1 represents "strongly disagree" and 5 represents "strongly agree."
Determine the scale of measurement for these agreement ratings.
7. A survey collects data on the opinions of participants using a Likert scale from 1 to 7.
Explain why this scale is considered ordinal and not interval or ratio.
8. What are the implications of using different scales of measurement?
9. What are different methods of summarizing qualitative and quantitative data? Explain.
10. Describe a scenario where a variable could be measured on a nominal scale, an ordinal scale,
and an interval scale.
11. Explain the concepts of a bar graph and a histogram with suitable examples.
12. A histogram is constructed using data on the ages of participants, with the x-axis representing
age intervals and the y-axis representing frequency. Describe how you would interpret the
height of a bar in the histogram.
13. In a dataset, the mode occurs more than once. What does this indicate about the distribution
of the data?
14. A random sample of visitors at a Pakistan tourist information center was asked a number of
questions. For each question below, describe the type of data obtained.
a. Are you staying overnight in Pakistan?
b. How many times have you visited Pakistan previously?
c. Which of the following attractions have you visited? (Mohenjo Daro, Harappa,
Lahore Fort, Shah Faisal Mosque, Minar-e-Pakistan, Northern Areas)
d. How likely are you to visit Pakistan again in the next 24 months: (1) unlikely, (2)
likely, (3) very likely?
e. How would you rate the services you received in your stay on a scale from 1 (very
poor) to 5 (very good)?
15. In a study measuring students' scores, the recorded scores of a sample of 50 students are as
follows: 165, 168, 170, 172, 175, 178, 180, 182, 185, 188, 190, 193, 196, 199, 202, 205, 208,
210, 213, 216, 219, 222, 225, 228, 230, 233, 236, 239, 242, 245, 248, 250, 253, 256, 259,
262, 265, 268, 270, 273, 276, 279, 282, 285, 288, 290, 293, 296, 299, 302, 305. Calculate the
following measures of central tendency for the sample of scores: mean, median, and mode.
16. An article summarized results from a survey of 1001 adults on the use of profanity. When
asked “How many times do you use swear words in conversations?” 46% responded a few or
more times per week, 32% responded a few times a month or less, and 21% responded never.
Use the given information to construct a segmented bar chart.
17. A study mentioned in the book described the results of a survey of 1006 adults who were
asked about three technologies. The table below summarizes the responses to
questions about how essential these technologies were:

                                   Relative Frequency
Response                           Computer   Mobile   DVD player
Can't imagine living without       0.46       0.41     0.19
Would miss but could do without    0.28       0.25     0.35
Could definitely live without      0.26       0.34     0.46

Construct a comparative bar chart that shows the distribution of responses for the three different
technologies.
18. Students in Punjab are required to pass an exit exam in order to graduate from high school.
The pass rate for Rawalpindi Government High School has been rising, as have the rates for
Jhelam School and the high school of Shahpur. The percentage of students that passed the
test was as follows:
a. Construct a comparative bar chart that allows the change in the pass rate for each
group to be compared.
b. Is the change the same for each group? Comment on any difference observed
19. Unhealthy diet in adolescents and adults increases the risk of heart disease. In a study of
3110 adolescents and 2205 adults, researchers found 33.6% of adolescents and 13.9% of
adults were unfit; the percentage was similar in adolescent males (32.9%) and females
(34.4%), but was higher in adult females (16.2%) than in adult males (11.8%).
a. Summarize this information using a comparative bar graph that shows
differences between males and females within the two different age groups.
b. Comment on the interesting features of your graphical display
20. A survey conducted on 1001 adults asked “How accurate are the weather forecasts in your
area?”. The responses are summarized in the table below:
Extremely 4%
Very 27%
Somewhat 53%
Not too 11%
Not at all 4%
Not sure 1%
a. Do you think this is an effective use of a pie chart? Why or why not?
b. Construct a bar chart to show the distribution of deaths by object struck. Is this
display more effective than the pie chart in summarizing this data set? Explain
22. Turkey provided the following information, based on a survey, regarding cell phone use for
men and women:

                                            Relative Frequency
Average Number of Minutes Used per Month    Men     Women
0 to < 200                                  0.56    0.61
200 to < 400                                0.18    0.18
400 to < 600                                0.10    0.13
600 to < 800                                0.16    0.08

a. Construct a relative frequency histogram for average number of minutes used per month
for men. How would you describe the shape of this histogram?
b. Construct a relative frequency histogram for average number of minutes used per month
for women. Is the distribution for average number of minutes used per month similar for
men and women? Explain.
c. What proportion of men average less than 400 minutes per month?
d. Estimate the proportion of men that average less than 500 minutes per month.
e. Estimate the proportion of women that average 450 minutes or more per month
23. An exam is given to 2nd-semester students in an introductory statistics course. What is
likely to be true of the shape of the histogram of scores if:
a. The exam is quite easy?
b. The exam is quite difficult?
c. Half the students in the class have had calculus, the other half have had no prior
college math courses, and the exam emphasizes mathematical manipulation?
Explain your reasoning in each case.
24. A clearness index was calculated for each of the 365 days of a particular year. The
accompanying table summarizes the resulting data:
a. Determine the relative frequencies and draw the corresponding histogram. (Be careful
here—the intervals do not all have the same width.)
b. Cloudy days are those with a clearness index smaller than 0.35. What proportion of the
days was cloudy?
c. Clear days are those for which the index is at least 0.65. What proportion of the days was
clear?
25. By using the 5 class intervals 100 to 120, 120 to 140, . . . , 180 to 200, devise a frequency
distribution based on 70 observations whose histogram could be described as follows:
a. symmetric
b. positively skewed
c. bimodal
d. negatively skewed
26. Take a dataset of your own choice and make a scatter plot with a trend line. Also identify the
type of relationship depicted by the diagram.
27. Given a scatter plot, how would you differentiate the negative and positive correlation
between the two variables? Provide an example.
28. An article in the Journal of Energy Engineering gave data on various characteristics of
subdivisions that could be used in deciding whether to provide electrical power using
overhead lines or underground lines. Data on the variable x = total length of streets within a
subdivision are as follows:
1280 5320 4390 2100 1240 3060 4770 1050 360 3330 3380 340 1000 960 1320 530 3350 540
3870 1250 2400 960 1120 2120 450 2250 2320 2400 3150 5700 5220 500 1850 2460 5850 2700
2730 1670 100 5770 3150 1890 510 240 396 1419 2109 5770
a. Construct a stem-and-leaf display for these data using the thousands digit as the stem.
Comment on the various features of the display.
b. Construct a histogram using class boundaries of 0 to 1000, 1000 to 2000, and so on.
How would you describe the shape of the histogram?
c. What proportion of subdivisions has total length less than 2000? Between 2000 and
4000?
29. A survey is conducted among 500 students. The following data show the number of hours
they spend studying per week: 2, 5, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6,
7, 8, 9, 10. Calculate the frequency and relative frequency for each distinct value of study
hours.
30. A survey was conducted to determine the favorite color among 400 participants. The
responses were as follows: red = 75, blue = 105, green = 85, yellow = 55, orange = 40,
purple = 30, pink = 25, brown = 45. Calculate the relative frequency for each color.
31. Suppose that data on x = poverty rate (%) and y = high school dropout rate (%) for the 50
U.S. states were used to construct the following scatterplot:
Interpret this scatterplot. Would you describe the relationship between poverty rate and dropout
rate as positive, negative, or as having no discernible relationship between x and y? Explain
32. To examine the relationship between study hours and exam scores among college students,
a researcher conducted a study. The following data were collected:
Study Hours: 2, 4, 5, 6, 7, 8, 9, 10
Exam Scores: 60, 70, 75, 80, 85, 90, 95, 100
a. Create a scatter plot using the given data.
b. Describe the pattern observed in the scatter plot.
c. Calculate the correlation coefficient between study hours and exam scores. Interpret the
result.
d. Based on the scatter plot, can we conclude that increased study hours lead to higher exam
scores? Explain your reasoning.
33. Consider the following 13 real estate prices (in $): 389,950; 230,500; 158,000; 479,000; 639,000;
114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,800; 1,095,000. Calculate the
IDR and examine whether there are any outliers in the data.
34. The weights (in kilograms) of a sample of 50 participants are as follows: 70, 75, 80, 65, 70,
75, 80, 85, 90, 75, 80, 85, 70, 75, 80, 65, 70, 75, 80, 85, 70, 75, 80, 65, 70, 75, 80, 85, 70, 75,
80, 65, 70, 75, 80, 85, 70, 75, 80, 65, 70, 75, 80, 85, 70, 75, 80, 65, 70, 75. Calculate the
mean, median, and mode of the weights.
35. The Transportation Department reported the number of speeding-related crash fatalities for
the 20 dates that had the highest number of these fatalities between 1994 and 2003. The data
is compiled below:
36. An experiment to study the lifetime (in hours) for a certain type of component involved
putting 10 components into operation and observing them for 100 hr. Eight of the
components failed during that period, and those lifetimes were recorded. The lifetimes of the
two components still functioning after 100 hr are recorded as 100+. The resulting sample
observations were: 48, 79, 100+, 35, 92, 86, 57, 100+, 17, 29. Which of the measures of center
can be calculated, and what are the values of those measures?
37. The monthly expenses of a household for the year 2022 are as follows: 2000, 2200, 2300,
2500, 2100, 2000, 2200, 2300, 2500, 2100, 2000, 2200, 2300, 2500, 2100, 2000, 2200, 2300,
2500, 2100, 2000, 2200, 2300, 2500, 2100, 2000, 2200, 2300, 2500, and 2100. Note that
expenses are in dollars. Calculate the range, interquartile range, mean absolute deviation
(MAD), and coefficient of variation of the monthly expenses.
38. Consider two datasets, Dataset A and Dataset Z, both containing 100 observations. The mean
of Dataset A is 50 with a standard deviation of 10, while the mean of Dataset Z is 50 with a
standard deviation of 5. Which dataset exhibits greater variability? Explain your reasoning.
39. The data below represent the average study time (in hours) and GPA for a given
sample of students:
41. The following table shows the ages of competitors in a football event in the US:
Age Percent
18-24 18.26
25-34 16.25
35-44 25.88
45-54 19.26
55+ 20.35
42. The following is a random sample of five (x, y) pairs of data points: (12, 20), (30, 60),
(15, 27), (24, 50), (14, 21).
a. Compute the covariance.
b. Compute the correlation coefficient.
43. A sample consisting of four pieces of luggage was selected from among those checked at an
airline counter, yielding the following data on x = weight (in pounds):
x1 = 33.5, x2 = 27.3, x3 = 36.7, x4 = 30.5
Suppose that one more piece is selected; denote its weight by x5. Find a value of x5 such that
the sample mean equals the sample median.
44. A survey gave the following data on sodium content (in milligrams per kilogram) of
chocolate pudding made from instant mix:
3099 3112 2401 2824 2682 2510 2297 3959 3068 3700
Compute the mean, the standard deviation, and the interquartile range for sodium content of
these chocolate puddings.
45. A paper in the literature gave data on blood lead level (in micrograms per
deciliter) for a sample of whites and a sample of African Americans. The data are provided
below:
Whites: 8.3, 0.9, 2.9, 5.6, 5.8, 5.4, 1.2, 1.0, 1.4, 2.1, 1.3, 5.3, 8.8, 6.6, 5.2, 3.0, 2.9, 2.7, 6.7,
3.2
African Americans: 4.8, 1.4, 0.9, 10.8, 2.4, 0.4, 5.0, 5.4, 6.1, 2.9, 5.0, 2.1, 7.5, 3.4, 13.8, 1.4,
3.5, 3.3, 14.8, 3.7
a. Compute the values of the mean and the median for blood lead level for the sample of
African Americans. Which of the mean or the median is larger? What characteristic of
the data set explains the relative values of the mean and the median?
b. Construct a comparative boxplot for blood lead level for the two samples. Write a few
sentences comparing the blood lead level distributions for the two samples.
46. A student took two national aptitude tests. The national average and standard deviation were
475 and 100, respectively, for the first test and 30 and 8, respectively, for the second test. The
student scored 625 on the first test and 45 on the second test. Use z scores to determine on
which exam the student performed better relative to the other test takers.
47. A survey of a group of 162 college students reported that the average number of
responses changed from the correct answer to an incorrect answer on a test containing 80
multiple-choice items was 1.4. The corresponding standard deviation was reported to be 1.5.
Based on this mean and standard deviation, what can you tell about the shape of the
distribution of the variable number of answers changed from right to wrong? What can you
say about the number of students who changed at least six answers from correct to incorrect?
48. Suppose that the following data values are 2021 per capita expenditures on public parks for
50 selected districts of Pakistan:
29.48, 24.45, 23.64, 23.34, 22.10, 21.16, 19.83, 18.01, 17.95, 17.23, 16.53, 16.29, 15.89,
15.85, 13.64, 13.37, 13.16, 13.09, 12.66, 12.37, 11.93, 10.99, 10.55, 10.24, 10.06, 9.84, 9.65,
8.94, 7.70, 7.56, 7.46, 7.04, 6.58, 5.98, 19.81, 19.25, 19.18, 18.62, 14.74, 14.53, 14.46, 13.83,
11.85, 11.71, 11.53, 11.34, 8.72, 8.22, 8.13, 8.01
a. Summarize this data set with a frequency distribution. Construct the corresponding
histogram.
b. Use the histogram in Part (a) to find approximate values of the following percentiles:
i. 50th
ii. 70th
iii. 90th
iv. 10th
v. 40th
49. Consider the following data:
62 23 27 56 52 34 42 40 68 45 83
50. A sample of n = 5 college students yielded the following observations on number of traffic
citations for a moving violation during the previous year:
x1 = 1, x2 = 0, x3 = 0, x4 = 3, x5 = 2
Calculate s^2 and s.
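
A hand calculation for the last problem can be checked with a short script. The sketch below is only an illustration, assuming Python and the usual sample (n - 1) definition of variance; the variable names are chosen here and are not part of the assignment.

```python
import math
import statistics

# Traffic citation counts from problem 50 (n = 5).
citations = [1, 0, 0, 3, 2]

n = len(citations)
mean = sum(citations) / n

# Sample variance: sum of squared deviations divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in citations]
s2 = sum(squared_deviations) / (n - 1)

# Sample standard deviation is the square root of the sample variance.
s = math.sqrt(s2)

# Cross-check against the standard library, which also uses n - 1.
assert math.isclose(s2, statistics.variance(citations))
assert math.isclose(s, statistics.stdev(citations))

print(f"mean = {mean:.2f}, s^2 = {s2:.2f}, s = {s:.4f}")
```

The same pattern (compute deviations from the mean, square, sum, divide by n - 1) applies to the other standard deviation problems in this assignment.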