3.3 Assignment: One Variable Statistics: A) Histogram
a) Histogram
A histogram is a graphical representation of a data set's distribution: the data are
grouped into bins, and the frequency of each bin is drawn as a bar. For example, a
histogram of test results might show the number of students whose scores fall into
each score range.
b) Measures of Central Tendency
These are statistics, such as the mean, median, and mode, that characterize a dataset's
middle or typical value. An example is the average test score for a class.
c) IQR
The IQR is found by subtracting the first quartile (Q1) from the third quartile (Q3); it
measures the spread of the middle 50% of a dataset. For instance, the IQR for a
dataset with Q1 = 25 and Q3 = 75 is 75 − 25 = 50.
d) Percentile
A percentile indicates the value below which a given percentage of data points in a
dataset fall. Example: A score in the 90th percentile means that 90% of scores are
below that value.
e) Outlier
An outlier is a data point that deviates noticeably from the other observations and
lies far from the core values. Example: a score of 30% might be seen as an
outlier in a class whose test results are typically between 80 and 90 percent.
2) Part 1: Creating the Data Set
Here is a set of 20 data points constructed to target the given conditions:
Data set: 10, 10, 15, 18, 19, 20, 20, 22, 24, 25, 25, 25, 26, 27, 28, 30, 30, 32, 33, 35
Target conditions:
● Mean: 20
● Median: 25
● IQR: 10
● Sample standard deviation: between 9 and 12
To generate a dataset that satisfies the given statistical properties, I followed these
steps:
1. Target Mean (20): To get a mean of 20, I aimed to create a data set where the
sum of all data points equals 20 times the number of data points (i.e.,
20×20=400). I distributed the values around 20, with some slightly higher and
some lower.
2. Median (25): I ensured that the middle value, or the average of the two middle
values, was exactly 25. Since there are 20 data points, the 10th and 11th values
must average 25 to make the median correct.
3. IQR (10): The IQR is the difference between the third quartile (Q3) and the first
quartile (Q1). I aimed for Q3 to be around 30 and Q1 to be around 20, so the IQR
would be approximately 30 − 20 = 10.
4. Sample Standard Deviation (between 9 and 12): The standard deviation should
show a moderate spread around the mean. I varied the values to be not too
tightly clustered but not too far apart, ensuring the standard deviation stayed
within the target range.
● Balancing the Mean and Standard Deviation: Keeping the standard deviation
between 9 and 12 while maintaining a mean of exactly 20 was tricky. Initially,
some data points were too far from the mean, resulting in a higher standard
deviation. I adjusted by bringing values closer to the mean without changing the
overall sum of the data points.
● Ensuring the Median and IQR: After setting the mean, I had to tweak the data to
ensure the correct median and IQR. This required careful positioning of the
middle values and some minor adjustments to maintain the desired spread of the
data.
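The targets can be checked directly rather than by eye. The following is a minimal sketch, assuming the quartiles are taken as the medians of the lower and upper halves of the ordered data (other quartile conventions give slightly different values); the function names are illustrative only.

```python
import statistics

# Data set from Part 1, copied from the text above.
data = [10, 10, 15, 18, 19, 20, 20, 22, 24, 25,
        25, 25, 26, 27, 28, 30, 30, 32, 33, 35]


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd number of values the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])


def summarize(values):
    """Compute the four statistics named in the target conditions."""
    q1, q3 = quartiles_by_halves(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "sample_sd": statistics.stdev(values),  # divides by n - 1
    }


summary = summarize(data)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On the listed values this check reports a mean of 23.70, a median of 25.00, an IQR of 9.50, and a sample standard deviation of about 7.03. Only the median matches its target exactly, so the list would need further adjustment to reach the mean and standard deviation targets.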
3) Create a histogram to display the data
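The bar heights of a histogram are just bin counts. As a rough sketch, assuming the Part 1 data set and bins of width 5 (the bin width is an assumption, not given in the assignment), the counts can be tallied and drawn as a text histogram:

```python
from collections import Counter

# Data set from Part 1, copied from the text above.
data = [10, 10, 15, 18, 19, 20, 20, 22, 24, 25,
        25, 25, 26, 27, 28, 30, 30, 32, 33, 35]

BIN_WIDTH = 5  # assumed bin width


def bin_counts(values, width):
    """Map each bin's lower edge to the number of values in [edge, edge + width)."""
    tally = Counter((v // width) * width for v in values)
    first = (min(values) // width) * width
    last = (max(values) // width) * width
    # Include empty bins so gaps in the data stay visible.
    return {edge: tally.get(edge, 0) for edge in range(first, last + width, width)}


counts = bin_counts(data, BIN_WIDTH)
for edge, count in counts.items():
    print(f"{edge:>2}-{edge + BIN_WIDTH - 1:<2} | {'#' * count}")
```

Each row corresponds to one bar: the label is the score range and the number of `#` marks is the frequency.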
To find the percentile of the data point 40, we first list all the values in the
data set:
26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9, 36, 51, 42, 1, 30, 21, 40, 37,
35, 57, 47, 42, 53
The next step is to arrange these values in ascending order:
1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36, 37, 40, 42, 42, 45, 47, 48, 50,
51, 52, 52, 53, 57
We must now determine the position of the data point 40. There are 14 values
below 40 (the values before 40 in the ordered list) and a total of 26 data
points, so the percentile is
$$\text{Percentile} = \frac{14}{26} \times 100 \approx 53.85$$
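The same count can be reproduced in a few lines. This is a minimal sketch, assuming the "percentage of values strictly below" definition of a percentile used in this answer; `percentile_below` is an illustrative name.

```python
# The 26 data points, in the order given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]


def percentile_below(values, x):
    """Percentage of values that are strictly less than x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


p40 = percentile_below(data, 40)
print(f"{p40:.2f}")  # 53.85
```

Sorting is not needed for the count itself; it only makes the position of 40 easy to see by hand.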
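7) Find the mean, median, and mode of the data set. Explain the method you
used.
We will now find the mean, median, and mode using the following methods.
1. Mean:
The mean is the sum of all the values divided by the number of values. The 26
values add up to 907, so
$$\text{Mean} = \frac{907}{26} \approx 34.88$$
2. Median:
The median is the middle value when the data set is arranged in ascending
order. If there is an even number of data points, the median is the average of
the two middle values.
Steps: Arrange the data in ascending order:
1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36, 37, 40, 42, 42, 45, 47, 48, 50,
51, 52, 52, 53, 57
With 26 data points, the two middle values are the 13th and 14th values, 36 and 37:
$$\text{Median} = \frac{36 + 37}{2} = 36.5$$
3. Mode:
The mode is the value that appears most frequently in the data set.
Steps: Count how often each value occurs. The values 42 and 52 each appear
twice, and every other value appears once.
Summary:
• Mean: 34.88
• Median: 36.5
• Mode: 42, 52 (bimodal)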
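As a cross-check on the hand calculation, here is a short sketch using Python's standard `statistics` module on the 26 data points listed in the question.

```python
import statistics

# The 26 data points, in the order given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

total = sum(data)                            # numerator of the mean
mean = statistics.mean(data)                 # total / number of values
median = statistics.median(data)             # average of the two middle values
modes = sorted(statistics.multimode(data))   # every value tied for most frequent

print(f"sum = {total}, n = {len(data)}")
print(f"mean = {mean:.2f}")
print(f"median = {median}")
print(f"modes = {modes}")
```

`multimode` is used instead of `mode` because it returns all tied values, which is what a bimodal data set needs.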
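To calculate the sample standard deviation of the data set, we follow these
steps:
Data Set:
26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9, 36, 51, 42, 1, 30, 21, 40, 37,
35, 57, 47, 42, 53
1. Find the mean: μ = 907/26 ≈ 34.88.
2. Subtract the mean from each data point and square the result.
3. Add the squared deviations:
$$\sum (x_i - \mu)^2 = 78.94 + 102.32 + 62.17 + 228.48 + 292.94 + \cdots \approx 5858.65$$
4. Divide by n - 1:
Since this is a sample, we divide by n - 1 (where n is the number of data
points). There are 26 data points, so n - 1 = 25.
$$\text{Variance} = \frac{5858.65}{25} \approx 234.35$$
5. Take the square root of the variance to get the standard deviation:
$$\sigma = \sqrt{234.35} \approx 15.31$$
Final Answer: The sample standard deviation is approximately 15.31.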
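The five steps translate almost line for line into code. The following is a sketch of the n - 1 (sample) calculation on the 26 data points from the question; it also compares the result with `statistics.stdev`, which uses the same divisor.

```python
import math
import statistics

# The 26 data points, in the order given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

n = len(data)
mean = sum(data) / n                                  # step 1: mean
squared_devs = [(x - mean) ** 2 for x in data]        # step 2: squared deviations
sum_sq = sum(squared_devs)                            # step 3: add them up
variance = sum_sq / (n - 1)                           # step 4: divide by n - 1
sample_sd = math.sqrt(variance)                       # step 5: square root

print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"variance = {variance:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sample_sd, statistics.stdev(data))
```

Because the code keeps the unrounded mean, its intermediate values can differ in the last digit from figures worked out by hand with a rounded mean.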
To check for outliers using the IQR method, we start from the ordered data:
1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36, 37, 40, 42, 42, 45, 47, 48, 50,
51, 52, 52, 53, 57
● Q1 (1st Quartile): This is the median of the lower half of the data set. With
26 data points, the lower half is the first 13 values.
○ Lower half: 1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36
○ Q1 is the median of this group, which is its 7th value: 26
● Q3 (3rd Quartile): This is the median of the upper half of the data set,
which is the last 13 values.
○ Upper half: 37, 40, 42, 42, 45, 47, 48, 50, 51, 52, 52, 53, 57
○ Q3 is the median of this group, which is its 7th value: 48
IQR = Q3 − Q1 = 48 − 26 = 22
Using 1.5 times the IQR, we calculate the boundaries for potential outliers:
Lower Boundary = Q1 − 1.5 × IQR = 26 − 1.5 × 22 = 26 − 33 = −7
Upper Boundary = Q3 + 1.5 × IQR = 48 + 1.5 × 22 = 48 + 33 = 81
Any data point outside the range (−7, 81) is considered an outlier. Since all data
points are between 1 and 57, there are no outliers based on the IQR method.
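The IQR method can be sketched in code as follows, assuming the same median-of-halves convention for the quartiles (library routines such as `statistics.quantiles` use interpolation and may return slightly different quartiles).

```python
import statistics

# The 26 data points, in the order given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

ordered = sorted(data)
half = len(ordered) // 2            # 13 values in each half when n = 26

q1 = statistics.median(ordered[:half])    # median of the lower half
q3 = statistics.median(ordered[-half:])   # median of the upper half
iqr = q3 - q1

lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr

# Anything beyond either boundary is flagged as a potential outlier.
iqr_outliers = [x for x in ordered if x < lower_boundary or x > upper_boundary]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"boundaries = ({lower_boundary}, {upper_boundary})")
print(f"outliers = {iqr_outliers}")
```

An empty list means no data point lies outside the two boundaries.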
The z-score for each data point tells us how many standard deviations away
the point is from the mean:
$$z = \frac{x_i - \mu}{\sigma}$$
A common rule of thumb is that any data point with a z-score greater than 3 or
less than -3 is considered an outlier (more than 3 standard deviations away
from the mean).
We can calculate the boundaries for potential outliers based on the mean
(μ ≈ 34.88) and standard deviation (σ ≈ 15.31) of the 26 data points:
Lower Boundary = μ − 3σ ≈ 34.88 − 3 × 15.31 = 34.88 − 45.93 = −11.05
Upper Boundary = μ + 3σ ≈ 34.88 + 3 × 15.31 = 34.88 + 45.93 = 80.81
Any value outside the range (−11.05, 80.81) is considered an outlier. Since all
values in the data set are between 1 and 57, there are no outliers based on
the standard deviation method.
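The z-score rule can be applied to every data point at once. This is a minimal sketch, assuming the sample standard deviation (n - 1 divisor) is used for σ and a threshold of 3; both choices follow the rule of thumb stated above.

```python
import statistics

# The 26 data points, in the order given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

THRESHOLD = 3  # rule-of-thumb cutoff in standard deviations

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation


def z_score(x):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / sd


low = mean - THRESHOLD * sd
high = mean + THRESHOLD * sd
z_outliers = [x for x in data if abs(z_score(x)) > THRESHOLD]

print(f"boundaries = ({low:.2f}, {high:.2f})")
print(f"most extreme z-scores: {z_score(min(data)):.2f}, {z_score(max(data)):.2f}")
print(f"outliers = {z_outliers}")
```

The boundaries are computed from unrounded values, so they may differ in the last digit from boundaries worked out by hand with a rounded mean and standard deviation.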
Conclusion:
Based on both the IQR method and the standard deviation method, there are
no outliers in this data set. All values fall within the expected range for each
method.
10) In what ways might the graph be considered misleading or biased? Explain why
the producer of the graph may have prepared it in this way.
• Y-Axis Range: It may overstate the variations across episodes if the Y-axis does
not start at zero.
• Stacked Bars: It may be more difficult to compare the bars for live and 7-day
viewers separately for each program due to the stacked bar representation.
• Bar Widths: Perception can be distorted by uneven or irregular bar widths.
• Colour Usage: The graph's colour scheme may make it challenging to distinguish
live viewers from delayed (7-day) viewers.
11. Prepare one grid with separate box-and-whisker plots of both live viewers and
7-day viewers per episode