Data Visualization
Data Interpretation Practice 1
Scenario:
A bar chart shows the annual revenue (in millions) for three companies over four years:
[Figure: grouped bar chart "Annual Revenue for Companies A, B, and C"; x-axis: year (2019, 2020, 2021, 2022); y-axis: Revenue (millions), 0 to 60; one bar each for Company A, Company B, and Company C per year.]
Questions:
(a) Which company had the highest revenue in 2021?
(b) Calculate the absolute revenue growth for Company A from 2019 to 2022.
Solution:
(a) In 2021, the revenues are: A = 35, B = 30, C = 18. Therefore, Company A had
the highest revenue.
(b) Company A grew from 20 (2019) to 40 (2022), an increase of 40 − 20 = 20 million.
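Both answers come down to a lookup and a subtraction. A minimal Python sketch, assuming only the chart readings quoted in the solution (the 2021 revenues and Company A's 2019 and 2022 revenues, in millions); the variable names are illustrative:

```python
# Chart readings quoted in the solution (revenue in millions).
revenue_2021 = {"A": 35, "B": 30, "C": 18}
revenue_a = {2019: 20, 2022: 40}

# (a) Company with the highest 2021 revenue.
top_company = max(revenue_2021, key=revenue_2021.get)

# (b) Absolute growth = later value minus earlier value.
growth_a = revenue_a[2022] - revenue_a[2019]

print(top_company, growth_a)  # A 20
```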
Data Interpretation Practice 2
Scenario:
A box plot displays daily sales (in units) for two shops over a month.
[Figure: side-by-side box plots of daily sales for Shop X and Shop Y; y-axis: Daily Sales (units), ticks at 40, 50, and 60.]
Questions:
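(a) Which shop has a higher median, and by how many units?
(b) Compare the interquartile ranges (IQR) of the two shops.
(c) Which shop has the larger overall range?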
Solution:
(a) Shop X median = 45; Shop Y median = 50. Shop Y has the higher median, by 50 − 45 = 5 units.
(b) IQR for Shop X = 50 − 40 = 10; IQR for Shop Y = 55 − 45 = 10. Both shops have the same IQR of 10 units.
(c) Range for Shop X = 55 − 35 = 20; Range for Shop Y = 65 − 40 = 25. So, Shop Y
has a larger overall range.
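The values used in the solution imply each shop's five-number summary (minimum, Q1, median, Q3, maximum). The sketch below assumes those readings, Shop X = (35, 40, 45, 50, 55) and Shop Y = (40, 45, 50, 55, 65), and derives all three answers from them; `summarize` is a hypothetical helper name:

```python
# Five-number summaries (min, Q1, median, Q3, max) in units,
# as implied by the values used in the solution.
shops = {
    "X": (35, 40, 45, 50, 55),
    "Y": (40, 45, 50, 55, 65),
}


def summarize(five):
    """Return median, IQR, and range from a five-number summary."""
    low, q1, med, q3, high = five
    return {"median": med, "iqr": q3 - q1, "range": high - low}


stats = {name: summarize(five) for name, five in shops.items()}
median_gap = stats["Y"]["median"] - stats["X"]["median"]

for name, s in stats.items():
    print(f"Shop {name}: {s}")
print("Median gap (Y - X):", median_gap)
```

Working from the five-number summary mirrors how a box plot is read: the box edges give Q1 and Q3, the line inside gives the median, and the whisker ends give the minimum and maximum (assuming no points are plotted separately as outliers).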
Data Interpretation Practice 3
Scenario:
A pie chart shows the market shares of four named companies and an "Others" category:
[Figure: pie chart of market share: A-Plus 25%, Cube Inc 25%, Tech-4 20%, SoftPro 15%, Others 15%.]
Questions:
(a) Which companies hold the largest market share?
(b) If the total market is $200 million, how many dollars does Tech-4 hold?
(c) What percentage of the market is held by companies other than A-Plus and Cube Inc?
Solution:
(a) A-Plus and Cube Inc both hold 25%.
(b) Tech-4 holds 20% of $200 million, i.e., $200 million × 0.20 = $40 million.
(c) Combined share of A-Plus and Cube Inc = 25% + 25% = 50%. Thus, the remaining
market share is 100% − 50% = 50%.
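All three answers follow from the slice percentages. A minimal Python sketch, assuming the slice labels pair with the percentages as shown on the chart (SoftPro 15% and Others taking the remaining 15%); the dictionary layout is illustrative:

```python
# Market shares read off the pie chart (percent).
shares = {"A-Plus": 25, "Cube Inc": 25, "Tech-4": 20, "SoftPro": 15, "Others": 15}
total_market = 200  # $ millions

# A pie chart must account for the whole market.
assert sum(shares.values()) == 100

# (a) Every company tied for the largest slice.
largest = max(shares.values())
leaders = [name for name, pct in shares.items() if pct == largest]

# (b) Dollar value of Tech-4's share of the $200 million market.
tech4_dollars = total_market * shares["Tech-4"] / 100

# (c) Share held by everyone other than A-Plus and Cube Inc.
rest_pct = 100 - shares["A-Plus"] - shares["Cube Inc"]

print(leaders, tech4_dollars, rest_pct)  # ['A-Plus', 'Cube Inc'] 40.0 50
```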
Data Interpretation Practice 4
Scenario:
Consider the data set: {10, 12, 13, 14, 15}.
Questions:
(a) Compute the mean and median of this data set.
(b) Now, add an outlier 50 to form {10, 12, 13, 14, 15, 50}. Compute the new mean and
median.
(c) Explain how the outlier affects the mean and median.
Solution:
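(a) Original Data: {10, 12, 13, 14, 15}:
• Mean = (10 + 12 + 13 + 14 + 15)/5 = 64/5 = 12.8.
• Median = 13.
(b) New Data: {10, 12, 13, 14, 15, 50}:
• Mean = (10 + 12 + 13 + 14 + 15 + 50)/6 = 114/6 = 19.
• Median = (13 + 14)/2 = 13.5 (the average of the two middle values).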
(c) Explanation: The mean increased from 12.8 to 19 because it is sensitive to extreme
values, while the median changed only slightly (from 13 to 13.5), showing that the
median is less affected by outliers.
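The effect of the outlier can be checked directly with Python's standard `statistics` module. A minimal sketch using the two data sets from the scenario:

```python
from statistics import mean, median

original = [10, 12, 13, 14, 15]
with_outlier = original + [50]  # add the outlier 50

results = {
    "original": (mean(original), median(original)),
    "with outlier": (mean(with_outlier), median(with_outlier)),
}
for label, (m, md) in results.items():
    print(f"{label}: mean = {m}, median = {md}")

# The mean moves by about 6.2 units; the median by only 0.5.
mean_shift = results["with outlier"][0] - results["original"][0]
median_shift = results["with outlier"][1] - results["original"][1]
```

Note that `median` averages the two middle values when the data set has an even number of elements, which is why the new median is 13.5.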
Data Interpretation Practice 5
Scenario:
The following table shows the frequency of test scores:
Score 50 60 70 80 90
Frequency 2 3 5 4 1
Questions:
(a) Compute the mean test score.
Solution:
(a) Mean: Multiply each score by its frequency and sum:
Total Sum = 50(2) + 60(3) + 70(5) + 80(4) + 90(1) = 100 + 180 + 350 + 320 + 90 = 1040.
Total Frequency = 2 + 3 + 5 + 4 + 1 = 15, so Mean = 1040/15 ≈ 69.33.
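The frequency-weighted mean divides the score-times-frequency total by the total number of students. A minimal Python sketch using the table above:

```python
scores = [50, 60, 70, 80, 90]
freqs = [2, 3, 5, 4, 1]

# Sum of score × frequency, i.e. the total of all individual scores.
total_sum = sum(s * f for s, f in zip(scores, freqs))
n = sum(freqs)  # number of students
mean_score = total_sum / n

print(total_sum, n, round(mean_score, 2))  # 1040 15 69.33
```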
Data Interpretation Practice 6
Scenario:
A box plot for monthly sales (in $1000s) of a company shows:
• Q1 = 40,
• Median = 45,
• Q3 = 55.
Questions:
(a) Calculate the interquartile range (IQR).
(b) Using the 1.5 × IQR rule, calculate the lower and upper fences.
Solution:
(a) IQR = Q3 − Q1 = 55 − 40 = 15.
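(b) Lower fence = Q1 − 1.5 × IQR = 40 − 1.5 × 15 = 40 − 22.5 = 17.5.
Upper fence = Q3 + 1.5 × IQR = 55 + 1.5 × 15 = 55 + 22.5 = 77.5.
Any month with sales below $17.5k or above $77.5k would be flagged as a potential outlier.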
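The 1.5 × IQR rule (often called Tukey's fences) is easy to wrap in a small function. A minimal Python sketch, assuming the quartiles Q1 = 40 and Q3 = 55 used in the solution; `iqr_fences` and `flag` are hypothetical helper names:

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) for the k × IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


# Quartiles of monthly sales in $1000s, as used in the solution.
iqr, lower, upper = iqr_fences(40, 55)
print(iqr, lower, upper)  # 15 17.5 77.5


def flag(value):
    """True if a monthly sales figure (in $1000s) falls outside the fences."""
    return value < lower or value > upper
```

Keeping `k` as a parameter makes it easy to try a stricter or looser rule (for example, k = 3 is sometimes used to mark "extreme" outliers).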
Data Interpretation Practice 7
Scenario:
A histogram of annual incomes (in $1000s) for a sample shows most incomes are between
$30k and $50k, with a long tail toward higher incomes.
Questions:
(a) What type of skewness does the distribution show?
(b) What does the skewness imply about the relationship between the mean and the median?
Solution:
(a) The long tail toward higher incomes indicates the distribution is right-skewed.
(b) In a right-skewed distribution, the mean is typically greater than the median,
because the high values pull the average upward.
Data Interpretation Practice 8
Scenario:
Dataset A: {40, 45, 50, 55, 60}
Dataset B: {35, 45, 50, 55, 75}
Questions:
(a) Compute the mean of each dataset.
(b) Based on the means and data spread, which dataset appears to be more right-skewed?
Explain.
Solution:
(a) Dataset A: Mean = (40 + 45 + 50 + 55 + 60)/5 = 250/5 = 50.
Dataset B: Mean = (35 + 45 + 50 + 55 + 75)/5 = 260/5 = 52.
(b) In Dataset A, the values are evenly spaced around 50, so the mean and median are both 50.
In Dataset B, the median is also 50, but the highest value (75) lies 25 units above it while
the lowest value (35) lies only 15 units below it. This long upper tail pulls the mean up to 52
while the median stays at 50. Thus, Dataset B appears to be more right-skewed.
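Comparing the mean with the median, and the distance from the median to each extreme, makes the asymmetry visible. A minimal Python sketch using the two datasets above; note that "mean above median" is only a rough heuristic for right skew, not a formal skewness measure:

```python
from statistics import mean, median

datasets = {
    "A": [40, 45, 50, 55, 60],
    "B": [35, 45, 50, 55, 75],
}

summary = {}
for name, data in datasets.items():
    m, md = mean(data), median(data)
    upper_tail = max(data) - md  # distance from median to largest value
    lower_tail = md - min(data)  # distance from median to smallest value
    summary[name] = (m, md)
    print(f"Dataset {name}: mean = {m}, median = {md}, "
          f"upper tail = {upper_tail}, lower tail = {lower_tail}")
```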
Data Interpretation Practice 9
Scenario:
Below is a grouped frequency distribution of exam scores:
Score Range Frequency Midpoint
40 – 50 4 45
50 – 60 6 55
60 – 70 8 65
70 – 80 5 75
80 – 90 3 85
Questions:
(a) Estimate the mean exam score using the class midpoints.
Solution:
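(a) Estimated Mean: Multiply each midpoint by its frequency and sum:
Total Sum = 45(4) + 55(6) + 65(8) + 75(5) + 85(3) = 180 + 330 + 520 + 375 + 255 = 1660.
Total Frequency = 4 + 6 + 8 + 5 + 3 = 26, so Estimated Mean = 1660/26 ≈ 63.85.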
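With grouped data the individual scores are unknown, so each score is assumed to sit at its class midpoint; the result is therefore an estimate. A minimal Python sketch using the class boundaries and frequencies from the table:

```python
# (lower bound, upper bound, frequency) for each score class
classes = [(40, 50, 4), (50, 60, 6), (60, 70, 8), (70, 80, 5), (80, 90, 3)]

# Assume every score in a class equals the class midpoint.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

total = sum(m * f for m, f in zip(midpoints, freqs))
n = sum(freqs)
estimated_mean = total / n

print(total, n, round(estimated_mean, 2))  # 1660.0 26 63.85
```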
Data Interpretation Practice 10
Scenario:
A histogram of exam scores shows two peaks. The first peak is around 55 and the second
peak is around 75. Most students score between 45 and 65 or between 65 and 85, with a dip
in frequency around 65.
Questions:
(a) What does a bimodal distribution indicate about the data?
(b) How might the mean and median compare in a bimodal distribution?
(c) Suggest one possible explanation for a bimodal exam score distribution.
Solution:
(a) A bimodal distribution has two distinct peaks, indicating there are two groups
within the data with different common scores.
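(b) Both the mean and the median usually fall somewhere between the two peaks, often near
the dip where relatively few scores lie, so neither describes a typical student well. Their
exact positions depend on the sizes and spreads of the two groups: if one group is larger,
both measures shift toward it, and in general the mean and median need not be equal.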
(c) One possible explanation is that the exam was taken by two distinct groups of
students (e.g., one group that prepared well and one that did not), resulting in two
clusters of scores.
Summary
These problems cover a range of data interpretation tasks, from reading bar charts, box
plots, and pie charts to calculating means and variances from grouped data, and interpreting
distribution shapes and skewness. The questions move from basic to moderately tough,
helping you develop a deeper understanding of data summary and interpretation.