
3.3 Assignment: One Variable Statistics: A) Histogram

sociology

Uploaded by

Young Stars
Copyright
© © All Rights Reserved

3.3 Assignment: One variable statistics

a) Histogram

A histogram is a graphical representation of a data set's distribution: the data are
grouped into bins, and the frequency of each bin is shown as a bar. For example, a
histogram of test results could show the number (or proportion) of students falling
into each score range.

b) Measure of central tendency

This refers to statistics, such as mean, median, and mode, that characterize a dataset's
middle or typical value. An example is the average test score for a class.

c) IQR

The IQR is found by subtracting the first quartile (Q1) from the third quartile (Q3); it
measures the spread of the middle 50% of a dataset. For instance, the IQR for a
dataset having Q1 = 25 and Q3 = 75 would be 75 − 25 = 50.

d) Percentile

A percentile indicates the value below which a given percentage of data points in a
dataset fall. Example: A score in the 90th percentile means that 90% of scores are
below that value.

e) Outlier

An outlier is a data point that deviates noticeably from the other observations and
lies far from the bulk of the values. Example: a score of 30% might be seen as an
outlier in a class whose test results are typically between 80 and 90 percent.
2) Part 1: Creating the Data Set

Here is a set of 20 data points built to target the conditions below. Data set: 10, 10, 15, 18,
19, 20, 20, 22, 24, 25, 25, 25, 26, 27, 28, 30, 30, 32, 33, 35

● Mean: 20
● Median: 25
● IQR: 10
● Sample standard deviation: Between 9 and 12

Plan to Create the Data Set

To generate a dataset that satisfies the given statistical properties, I followed these
steps:

1. Target Mean (20): To get a mean of 20, I aimed to create a data set where the
sum of all data points equals 20 times the number of data points (i.e.,
20×20=400). I distributed the values around 20, with some slightly higher and
some lower.
2. Median (25): I ensured that the middle value, or the average of the two middle
values, was exactly 25. Since there are 20 data points, the 10th and 11th values
must be around 25 to make the median correct.
3. IQR (10): The IQR is the difference between the third quartile (Q3) and the first
quartile (Q1). I aimed for Q3 to be around 30 and Q1 to be around 20, so the IQR
would be approximately 30 − 20 = 10.
4. Sample Standard Deviation (between 9 and 12): The standard deviation should
show a moderate spread around the mean. I varied the values to be not too
tightly clustered but not too far apart, ensuring the standard deviation stayed
within the target range.
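
The four targets in this plan can be checked mechanically. The sketch below is one possible way to do that in Python with the standard `statistics` module. The helper names `quartiles` and `summarize` are illustrative, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (other quartile conventions give slightly different values).

```python
from statistics import mean, median, stdev

# The 20 values listed in Part 1.
data = [10, 10, 15, 18, 19, 20, 20, 22, 24, 25,
        25, 25, 26, 27, 28, 30, 30, 32, 33, 35]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values the middle value is left
    out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])

def summarize(values):
    """Collect the four statistics named in the plan."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "iqr": q3 - q1,
        "stdev": stdev(values),  # sample standard deviation (divides by n - 1)
    }

summary = summarize(data)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Running it after every adjustment is a quick way to see which targets the listed values actually meet and which still need tuning.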

Challenges and Adjustments

● Balancing the Mean and Standard Deviation: Keeping the standard deviation
between 9 and 12 while maintaining a mean of exactly 20 was tricky. Initially,
some data points were too far from the mean, resulting in a higher standard
deviation. I adjusted by bringing values closer to the mean without changing the
overall sum of the data points.
● Ensuring the Median and IQR: After setting the mean, I had to tweak the data to
ensure the correct median and IQR. This required careful positioning of the
middle values and some minor adjustments to maintain the desired spread of the
data.
3) Create a histogram to display the data

4) Create a relative frequency polygon for the data.


5) Create a box-and-whisker plot for the data. If you use technology copy and
paste the graph into your solutions document. If you prepare it by hand use a
ruler and take a clear, well-lit photograph or scan and insert it as an image into
your document.

6) What is the percentile of the data point 40 in the data set?

To find the percentile of the data point 40, we first organise the data by listing
all the values in the data set in ascending order.

Here is the data provided:

26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9, 36, 51, 42, 1, 30, 21, 40, 37, 35, 57,
47, 42, 53

The next step is to arrange these values in ascending order:

1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36, 37, 40, 42, 42, 45, 47, 48, 50, 51, 52,
52, 53, 57

We must now determine the position of the data point 40.

The data point 40 is in the 15th position in this ordered list.

Then we proceed to use the percentile formula:

$$\text{Percentile} = \frac{\text{number of values below the data point}}{\text{total number of values}} \times 100$$

In this question there are 14 values below 40 (the values before 40 in the
ordered list). There are a total of 26 data points, so the percentile is

$$\text{Percentile} = \frac{14}{26} \times 100 \approx 53.85$$

The data point 40 is at approximately the 54th percentile.
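
The same count-and-divide calculation can be sketched in Python. The function name `percentile_rank` is illustrative, and it uses the "values strictly below" definition applied here (other textbooks also count half of the tied values).

```python
# The 26 values given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

def percentile_rank(values, x):
    """Percentage of values that are strictly below x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100

rank_40 = percentile_rank(data, 40)
print(f"{rank_40:.2f}")  # 53.85
```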

7) Find the mean, median, and mode of the data set. Explain the method you
used.

1. Mean:

The mean is the average of the data points. The values are added together and
then divided by the number of data points.

Steps:

• Add the values:

Sum of values = 26 + 45 + 27 + 50 + 52 + 12 + 28 + 48 + 52 + 14 + 20 + 32 + 9
+ 36 + 51 + 42 + 1 + 30 + 21 + 40 + 37 + 35 + 57 + 47 + 42 + 53 = 907

• Divide by the number of values, which is 26:

$$\text{Mean} = \frac{907}{26} \approx 34.88$$

2. Median:

The median is the middle value when the data set is arranged in ascending
order. If there is an even number of data points, the median is the average of
the two middle values.

Steps:

• Arrange the data in ascending order:

1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36, 37, 40, 42, 42, 45, 47, 48, 50,
51, 52, 52, 53, 57

• Since there are 26 values (an even number), the median is the average of the
13th and 14th values in the ordered list, which are 36 and 37.

$$\text{Median} = \frac{36 + 37}{2} = 36.5$$

3. Mode:

The mode is the value that appears most frequently in the data set.

Steps:

• Look for any values that repeat.

Observation:
The value 52 appears twice, and so does 42. These are the most frequent
values.

Mode = 42, 52 (bimodal, as both appear twice)

Summary:

• Mean: 34.88
• Median: 36.5
• Mode: 42, 52 (bimodal)
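
As a cross-check on the hand calculation, the three measures can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# The 26 values given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

data_sum = sum(data)                  # total of all values
data_mean = mean(data)                # sum divided by the number of values
data_median = median(data)            # average of the two middle values (n is even)
data_modes = sorted(multimode(data))  # every value tied for the highest count

print("sum:", data_sum)
print("mean:", round(data_mean, 2))
print("median:", data_median)
print("modes:", data_modes)
```

`multimode` is used rather than `mode` because it returns all tied values, which matters for a bimodal data set.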

8) What is the sample standard deviation of the data set?

To calculate the sample standard deviation of the data set, we follow these
steps:

Data Set:

26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9, 36, 51, 42, 1, 30, 21, 40, 37, 35, 57,
47, 42, 53

Steps to Calculate Sample Standard Deviation:

1. Find the mean (average):
The sum of the 26 values is 907, so the mean is 907 / 26 ≈ 34.88.
2. Subtract the mean from each data point and square the result:
For each data point $x_i$, calculate $(x_i - \mu)^2$, where $\mu$ is the mean.
• For 26, $(26 - 34.88)^2 = (-8.88)^2 \approx 78.85$
• For 45, $(45 - 34.88)^2 = (10.12)^2 \approx 102.41$
• For 27, $(27 - 34.88)^2 = (-7.88)^2 \approx 62.09$
• For 50, $(50 - 34.88)^2 = (15.12)^2 \approx 228.61$
• For 52, $(52 - 34.88)^2 = (17.12)^2 \approx 293.09$
• Continue this for each value…
3. Sum all the squared differences:
The sum of all 26 squared differences is:

$$\sum (x_i - \mu)^2 = 78.85 + 102.41 + 62.09 + 228.61 + 293.09 + \dots \approx 5858.65$$

4. Divide by n - 1:
Since this is a sample, we divide by n - 1 (where n is the number of data
points). There are 26 data points, so n - 1 = 25.

$$\text{Variance} = \frac{5858.65}{25} \approx 234.35$$

5. Take the square root of the variance to get the standard deviation:

$$\text{Sample Standard Deviation} = \sqrt{234.35} \approx 15.31$$

Final Answer:

The sample standard deviation of the data set is approximately 15.31.
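
The five steps can be followed line by line in a short Python sketch. The variable names are illustrative, and the last line compares the result with the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

# The 26 values given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

n = len(data)

# Step 1: mean
mu = sum(data) / n

# Step 2: squared difference from the mean for every data point
squared_diffs = [(x - mu) ** 2 for x in data]

# Step 3: sum of the squared differences
sum_sq = sum(squared_diffs)

# Step 4: sample variance (divide by n - 1)
variance = sum_sq / (n - 1)

# Step 5: sample standard deviation
sd = math.sqrt(variance)

print(f"mean={mu:.2f}  sum_sq={sum_sq:.2f}  variance={variance:.2f}  sd={sd:.2f}")
print("matches statistics.stdev:", math.isclose(sd, stdev(data)))
```

Because the code keeps the unrounded mean, its intermediate values can differ in the last decimal place from a hand calculation that rounds the mean first.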


9) Identifying outliers

1. Using the IQR Method:

Step 1: Sort the Data

First, we need to sort the data in ascending order:

1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36, 37, 40, 42, 42, 45, 47, 48, 50,
51, 52, 52, 53, 57

Step 2: Find Quartiles

● Q1 (1st Quartile): This is the median of the lower half of the data set. With
26 values, the lower half is the first 13 values.
○ Lower half: 1, 9, 12, 14, 20, 21, 26, 27, 28, 30, 32, 35, 36
○ Q1 is the median of this group, which is its 7th value: Q1 = 26
● Q3 (3rd Quartile): This is the median of the upper half of the data set. With
26 values, the upper half is the last 13 values.
○ Upper half: 37, 40, 42, 42, 45, 47, 48, 50, 51, 52, 52, 53, 57
○ Q3 is the median of this group, which is its 7th value: Q3 = 48

Step 3: Calculate the IQR

The Interquartile Range (IQR) is:

IQR = Q3 − Q1 = 48 − 26 = 22

Step 4: Determine the Outlier Boundaries

Using 1.5 times the IQR, we calculate the boundaries for potential outliers:

Lower Boundary = Q1 − 1.5 × IQR = 26 − 1.5 × 22 = 26 − 33 = −7
Upper Boundary = Q3 + 1.5 × IQR = 48 + 1.5 × 22 = 48 + 33 = 81

Any data point outside the range [−7, 81] is considered an outlier. Since all data
points are between 1 and 57, there are no outliers based on the IQR method.
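
A minimal Python sketch of the IQR method follows. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, as described in Step 2. The function name `iqr_fences` and the parameter `k` are illustrative.

```python
from statistics import median

# The 26 values given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

def iqr_fences(values, k=1.5):
    """Return (lower, upper) outlier boundaries: Q1 - k*IQR and Q3 + k*IQR."""
    s = sorted(values)
    half = len(s) // 2
    q1 = median(s[:half])    # median of the lower half
    q3 = median(s[-half:])   # median of the upper half
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = iqr_fences(data)
outliers = [x for x in data if x < low or x > high]

print("boundaries:", low, high)
print("outliers:", outliers)
```

Statistical software often uses an interpolating quartile rule instead, so its boundaries may differ slightly from the ones produced by this halves-based rule.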

2. Using the Standard Deviation Method:


The mean is μ = 907/26 ≈ 34.88 and the sample standard deviation is
σ ≈ 15.31.

Step 1: Calculate the Z-Scores

The z-score for each data point tells us how many standard deviations away
the point is from the mean:

$$Z = \frac{x_i - \mu}{\sigma}$$

A common rule of thumb is that any data point with a z-score greater than 3 or
less than -3 is considered an outlier (more than 3 standard deviations away
from the mean).

Step 2: Check for Outliers

We can calculate the boundaries for potential outliers based on the mean and
standard deviation:

Lower Boundary = μ − 3σ = 34.88 − 3(15.31) = 34.88 − 45.93 = −11.05

Upper Boundary = μ + 3σ = 34.88 + 3(15.31) = 34.88 + 45.93 = 80.81

Any value outside the range [−11.05, 80.81] is considered an outlier. Since all
values in the data set are between 1 and 57, there are no outliers based on
the standard deviation method.
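
The z-score rule can be sketched the same way. The threshold of 3 is the rule of thumb quoted above. The names `z_score` and `flagged` are illustrative.

```python
from statistics import mean, stdev

# The 26 values given in the question.
data = [26, 45, 27, 50, 52, 12, 28, 48, 52, 14, 20, 32, 9,
        36, 51, 42, 1, 30, 21, 40, 37, 35, 57, 47, 42, 53]

mu = mean(data)
sigma = stdev(data)  # sample standard deviation (divides by n - 1)

def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Flag any value more than 3 standard deviations from the mean.
flagged = [x for x in data if abs(z_score(x)) > 3]
largest = max(abs(z_score(x)) for x in data)

print("flagged:", flagged)
print(f"largest |z| in the data: {largest:.2f}")
```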

Conclusion:

Based on both the IQR method and the standard deviation method, there are
no outliers in this data set. All values fall within the expected range for each
method.
10) In what ways might the graph be considered misleading or biased? Explain why
the producer of the graph may have prepared it in this way

• Y-Axis Range: It may overstate the variations across episodes if the Y-axis does
not start at zero.
• Stacked Bars: It may be more difficult to compare the bars for live and 7-day
viewers separately for each program due to the stacked bar representation.
• Bar Widths: Perception can be distorted by uneven or irregular bar widths.
• Colour Usage: The graph's colour scheme may make it challenging to distinguish
between viewers who are live and those who are delayed.

11) Prepare one grid with separate box-and-whisker plots of both live viewers and
7-day viewers per episode
