
Transforming Datas

This document investigates how statistical parameters such as mean, median, standard deviation, and quartiles are affected by translations and enlargements of data. It analyzes adding, subtracting, and multiplying a constant value to a dataset and examines the patterns in how these transformations influence the mean, standard deviation, and interquartile range. Graphs and calculations are provided to demonstrate the effects on the data distribution and parameters.

Uploaded by

26086
Copyright
© All Rights Reserved

Transforming Data

Introduction:
In this investigation, I will examine how translations and enlargements of data affect statistical parameters such as the mean, the median, the quartiles and the standard deviation. To do this, I will apply different transformations (addition, subtraction and multiplication) to the data set and observe how each one changes these parameters. The investigation will first test adding, subtracting and multiplying each data score by a constant value, to find how these changes affect the mean and standard deviation and to identify the pattern. It will then consider how the quartiles and the interquartile range (IQR) are affected. Throughout, the constant value will be denoted by the letter "a", and I will analyze the connection between the value of a and the pattern of changes in the mean and standard deviation. For instance, I will compare the cases a > 0 and a < 0 for addition, and a > 1, 0 < a < 1 and a < 0 for multiplication. Through this I will be able to develop patterns related to cumulative frequency and to the graphical characteristics of the interquartile range. The investigation will also deepen the understanding of quartiles, such as the median and the interquartile range, and of the methods used to calculate them.

For this investigation, we were given the heights of 60 students.

In this investigation, all values will be rounded to 3 d.p. for concision. However, when calculating differences, the full decimal values will be used to give more accurate results.

Parameter | Equation or symbol | Function
Mean | $\bar{x} = \frac{\sum x}{n}$ | The average of a set of values: the total sum of the data values divided by the number of data scores.
Median | $\frac{n+1}{2}$th term of the ordered data | The middle value of a data set arranged in order of size. For example, for 2, 3, 4 the median is 3.
Standard deviation | $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ | A measure of the amount of variation of the data about its mean.
Sigma | $\sum$ | The summation notation ("sum of").
IQR | $Q_3 - Q_1$ | IQR stands for interquartile range. It is the distance from the first quartile ($Q_1$) to the third quartile ($Q_3$) in a data set.

Table 1 - The parameters


The table below shows the heights, in centimeters, of the 60 students.
Data set:
177 175 137 155 150 166
132 146 179 140 169 177
141 148 130 179 135 130
157 172 178 143 143 136
132 166 130 151 145 178
131 171 160 140 179 166
145 142 177 176 132 135
164 179 161 145 134 169
139 149 135 142 172 148
159 160 137 130 130 164
Table 2 - the heights (cm) of the 60 students

1. Finding the mean and standard deviation of the data

I. Mean
To calculate the mean of the given data, first sum all the values and then divide by the number of values.

The sum of the values: $\sum x = 9168$
The number of values: $n = 60$
$\bar{x} = 9168 \div 60 = 152.8$
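The hand calculation above can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original method; `HEIGHTS` is a name of our own for the values of Table 2, typed in row by row.

```python
# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

total = sum(HEIGHTS)   # Σx, the sum of all the heights
n = len(HEIGHTS)       # number of data scores
mean = total / n       # x̄ = Σx / n

print(f"sum = {total}, n = {n}, mean = {mean}")
# prints: sum = 9168, n = 60, mean = 152.8
```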
II. Standard deviation
To calculate the standard deviation, use the formula $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$.
Here, x is each value in the given data, n is the total number of values, $\bar{x}$ is the mean of the data and $\sum$ is the symbol for "sum of".

Substituting the values into the equation:

$\sigma = \sqrt{\frac{\sum (x - 152.8)^2}{60}}$

$= \sqrt{\frac{(130 - 152.8)^2 + (130 - 152.8)^2 + \cdots + (172 - 152.8)^2 + (179 - 152.8)^2}{60}} = 17.09542236 \approx 17.095$
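The standard deviation formula can be sketched in the same way. The helper `population_sd` below (a name of our own) implements $\sqrt{\sum (x - \bar{x})^2 / n}$ exactly as written above; `sample_sd` divides by n − 1 instead and is included only for comparison. Run on Table 2 as transcribed here, the divide-by-n version comes out at roughly 16.95 and the divide-by-(n − 1) version at roughly 17.10, so the 17.095 quoted in this investigation sits much closer to the n − 1 version (the "sample" standard deviation that many calculators also report). This may be worth checking against the original data.

```python
import math

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

def mean(values):
    return sum(values) / len(values)

def population_sd(values):
    """σ = sqrt(Σ(x − x̄)² / n), the formula written in the text."""
    x_bar = mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / len(values))

def sample_sd(values):
    """s = sqrt(Σ(x − x̄)² / (n − 1)), shown only for comparison."""
    x_bar = mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

sigma = population_sd(HEIGHTS)
s = sample_sd(HEIGHTS)
print(f"divide by n:     {sigma:.3f}")
print(f"divide by n - 1: {s:.3f}")
```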
Graph 1
Graph 1 is a dot plot of the height data, with frequency on the y-axis and height on the x-axis. A dot is plotted at every whole centimeter from 130 to 180. Where a dot lies on the x-axis (y = 0), no student has that height.

Investigating how the mean and standard deviation are affected by a change to the data set

[Addition] : When 5 cm is added to each height in the data set (a > 0)

Mean : When 5 cm is added to each height in the data set, the sum of the data becomes 9468 (9168 + 60 × 5). The calculation for the mean is:

9468 (new sum) ÷ 60 = 157.8

Original value | When 5 is added | Comparison with the original value
152.8 | 157.8 | $\lvert 152.8 - 157.8 \rvert = 5$

Table 3 - the change in the mean when 5 is added

As you can see, the mean has increased by 5 when 5 cm is added to each score in the data set. We can conclude that when a constant is added to every value in a data set, the mean changes by the same amount.
Standard deviation : When 5 cm is added to each height in the data set, the calculation for the standard deviation is:

$\sigma = \sqrt{\frac{\sum \{(x + 5) - (\bar{x} + 5)\}^2}{60}} = \sqrt{\frac{\sum (x + 5 - 157.8)^2}{60}}$

I have added 5 to each x value, since all the values are increased equally by 5.

$= \sqrt{\frac{(130 + 5 - 157.8)^2 + \cdots + (172 + 5 - 157.8)^2 + (179 + 5 - 157.8)^2}{60}} = 17.09542236 \approx 17.095$

Original value | When 5 is added | Comparison with the original value
17.09542236 | 17.09542236 | $\lvert 17.09542236 - 17.09542236 \rvert = 0$

Table 4 - the change in the standard deviation when 5 is added
As you can see, the standard deviation has not changed when 5 cm is added to each score in the data set.

Graph 2
Graph 2 compares the dot plot before and after 5 is added to each score. The blue dots represent the original values and the red dots represent the data after 5 is added to each value. As you can observe, the dot plot has shifted by 5 in the positive x direction without changing its shape. This parallel shift shows that adding a constant value (a > 0) moves the entire data set by the same amount. It is also possible to observe that the range of the data does not change when it is translated.
[Subtraction] : When 12 cm is subtracted from each value in the data set ( a < 0 )

Mean : When 12 cm is subtracted from each height in the data set, the sum of the data becomes 8448 (9168 − 60 × 12). The calculation for the mean is:

8448 (new sum) ÷ 60 = 140.8

Original value | When 12 is subtracted | Comparison with the original value
152.8 | 140.8 | $\lvert 152.8 - 140.8 \rvert = 12$

Table 5 - the change in the mean when 12 is subtracted
As you can see, the mean has decreased by 12 when 12 cm is subtracted from each score in the data set.
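Subtracting 12 cm from every height can be checked directly: every score drops by 12, so the sum drops by 60 × 12 = 720 and the mean by exactly 12. A minimal sketch (`HEIGHTS` and `a` are names of our own):

```python
# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

a = -12                                # constant added to every score (here: subtract 12 cm)
shifted = [x + a for x in HEIGHTS]     # translated data set

new_sum = sum(shifted)                 # Σ(x + a) = Σx + n·a
new_mean = new_sum / len(shifted)

print(f"new sum = {new_sum}, new mean = {new_mean}")
# prints: new sum = 8448, new mean = 140.8
```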

Standard deviation : When 12 cm is subtracted from each height in the data set, the calculation for the standard deviation is:

$\sigma = \sqrt{\frac{\sum \{(x - 12) - (\bar{x} - 12)\}^2}{60}} = \sqrt{\frac{\sum (x - 12 - 140.8)^2}{60}}$

I have subtracted 12 from each x value, since all the values are decreased equally by 12.

$= \sqrt{\frac{(130 - 12 - 140.8)^2 + \cdots + (172 - 12 - 140.8)^2 + (179 - 12 - 140.8)^2}{60}} = 17.09542236 \approx 17.095$

Original value | When 12 is subtracted | Comparison with the original value
17.09542236 | 17.09542236 | $\lvert 17.09542236 - 17.09542236 \rvert = 0$

Table 6 - the change in the standard deviation when 12 is subtracted
As you can see, the standard deviation has not changed when 12 cm is subtracted from each score in the data set.
Graph 3
Graph 3 compares the dot plot before and after 12 is subtracted from each score (equivalently, −12 is added). The blue dots represent the original values and the red dots represent the data after 12 is subtracted from each value. As you can observe, the dot plot has shifted by 12 in the negative x direction. This parallel shift shows that subtracting a constant value (a < 0) moves the entire data set by the same amount. It is also possible to observe that the range of the data does not change when it is translated.

Testing with different constant a values

Range | Value of a | Mean | Change in mean | Standard deviation | Change in standard deviation
a > 0 | 1 is added | 153.8 | +1 | 17.095 | 0
a > 0 | 3 is added | 155.8 | +3 | 17.095 | 0
a < 0 | 2 is subtracted | 150.8 | −2 | 17.095 | 0
a < 0 | 5 is subtracted | 147.8 | −5 | 17.095 | 0

Table 7 - the changes in the mean and standard deviation when different constant values are added to or subtracted from every score in the data set.
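Table 7 can be reproduced for any set of constants with a short loop. This sketch assumes Python 3.8 or later and uses the standard-library `statistics` module (`fmean` for the mean, `pstdev` for the divide-by-n standard deviation); the list of constants is our own choice and also includes the 5 and −12 used above.

```python
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

base_mean = statistics.fmean(HEIGHTS)
base_sd = statistics.pstdev(HEIGHTS)    # population SD, divides by n

changes = {}
for a in (1, 3, -2, -5, 5, -12):
    shifted = [x + a for x in HEIGHTS]  # add a to every score
    change_in_mean = statistics.fmean(shifted) - base_mean
    change_in_sd = statistics.pstdev(shifted) - base_sd
    changes[a] = (change_in_mean, change_in_sd)
    print(f"a = {a:+d}: mean changes by {change_in_mean:+.3f}, "
          f"SD changes by {change_in_sd:+.3f}")
```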
Conclusion of the first investigation [Addition and Subtraction]

Why does the mean change?


In conclusion, the mean changes because the sum of the values in the distribution changes while that sum is still divided by the same number of values. When a positive constant a is added, the mean increases because the sum of the data increases. For instance, in the experiment where 5 was added to each value in the data set, the mean also increased by 5. On the other hand, when a constant is subtracted from every value in the data set, the mean decreases because the sum of the data also decreases. For instance, in the experiment where 12 was subtracted from each value in the data set, the mean decreased by 12. As a result, you can observe that the change in the mean is identical to the change made to each score. This is because the mean represents the central tendency of the data set and is calculated by summing all the values and dividing by the number of values. When a constant value a is added to or subtracted from each score, the sum increases or decreases by n × a, which leads to a parallel shift of the entire data set.
This can be further explained with an equation. Let k be the original mean, a the constant value and u the new mean:

$u = \frac{\sum (x + a)}{n} = \frac{\sum x + na}{n} = \frac{\sum x}{n} + a = k + a$

$u - k = a$

As you can see, the new mean minus the original mean is equal to the constant value that was added.

Why does the standard deviation not change?


However, the standard deviation remains unchanged when a is added or subtracted. This is because the standard deviation measures how dispersed the data is in relation to the mean. It is the square root of the variance, and after the shift each value's distance from the new mean is the same as its original distance from the original mean, so the variance does not change. This agrees with the parallel shift described above: all the values change by the same amount, so the spread of the data (including its range) does not change. Therefore, since the variance has not changed, the standard deviation remains the same.

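This can be further justified through the equation.

The equation of the standard deviation is $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$.

When a constant value a is added to the data set, the equation becomes $\sqrt{\frac{\sum \{(x + a) - (\bar{x} + a)\}^2}{n}}$.

Simplifying the numerator, $(x + a) - (\bar{x} + a) = x - \bar{x}$, so the expression becomes $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, which is equal to the original standard deviation.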
[Multiplication] When each value in the data set is multiplied by 5 ( a > 1 )

Mean : When each height in the data set is multiplied by 5, the sum of the data becomes 45840 (9168 × 5). The calculation for the mean is:

45840 (new sum) ÷ 60 = 764

Original value | When multiplied by 5 | Comparison with the original value
152.8 | 764 | $\frac{764}{152.8} = 5$

Table 8 - the change in the mean when each score is multiplied by 5

As you can see, the new mean is 5 times the original value (152.8 × 5 = 764) when each of the scores in the data set is multiplied by 5. We can conclude that when every value in a data set is multiplied by a constant, the mean is multiplied by the same constant.

Standard Deviation: When each height in the data set is multiplied by 5, the calculation for the standard deviation is:

$\sigma = \sqrt{\frac{\sum (5x - 5\bar{x})^2}{60}} = \sqrt{\frac{\sum (5x - 764)^2}{60}}$

$= \sqrt{\frac{(650 - 764)^2 + \cdots + (860 - 764)^2 + (895 - 764)^2}{60}} = 85.4771118 \approx 85.477$

Original value | When multiplied by 5 | Comparison with the original value
17.095 | 85.477 | $\frac{85.4771118}{17.09542236} = 5$

Table 9 - the change in the standard deviation when each score is multiplied by 5

As you can observe, the standard deviation has been multiplied by 5 (17.09542236 × 5 = 85.4771118). This shows that when each score is multiplied by a constant, the standard deviation is multiplied by the same constant, just like the mean. Therefore, when the multiplier a is greater than 1 (a > 1), the standard deviation increases.
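Multiplying every score by 5 can be checked the same way. This minimal sketch (again assuming Python 3.8+ and the standard-library `statistics` module; `HEIGHTS` and `a` are names of our own) confirms the new sum and mean, and shows that the ratio of the new standard deviation to the old one equals the multiplier. Because it is a ratio, the check does not depend on whether the standard deviation divides by n or n − 1.

```python
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

a = 5
scaled = [a * x for x in HEIGHTS]          # multiply every score by a

new_sum = sum(scaled)                      # Σ(ax) = a·Σx
new_mean = new_sum / len(scaled)           # a·x̄
sd_ratio = statistics.pstdev(scaled) / statistics.pstdev(HEIGHTS)

print(f"new sum = {new_sum}, new mean = {new_mean}, SD ratio = {sd_ratio:.6f}")
# prints: new sum = 45840, new mean = 764.0, SD ratio = 5.000000
```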
Graph 4.
Graph 4 compares the data before and after each score is multiplied by 5, using a plot of height against cumulative frequency. Unlike the dot plots used when testing addition and subtraction, I have used a different type of graph here, because multiplication spreads the data over a much wider range and the previous dot plot would lose its readability. To keep the investigation concise and clear, I have therefore changed the format of the graph. The green dots represent the original values and the red dots represent the data after each value is multiplied by 5. As you can observe, not only has the graph shifted up along the y-axis, but the vertical gaps between the scores have also increased. From this, you can see that multiplying a data set by a constant changes its standard deviation, since the standard deviation measures the spread of the data about the mean.

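[Multiplication] When each value in the data set is multiplied by 0.2 ( 0 < a < 1 )
0.2 is a decimal that can be converted to the fraction $\frac{1}{5}$, and multiplying by $\frac{1}{5}$ is the same as dividing by 5. Therefore, this section investigates how the mean and standard deviation are affected when the values are divided by a constant.

Mean : When each height in the data set is multiplied by 0.2, the sum of the data becomes 1833.6 (9168 × 0.2). The calculation for the mean is:

1833.6 (new sum) ÷ 60 = 30.56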


Original value | When multiplied by 0.2 | Comparison with the original value
152.8 | 30.56 | $\frac{30.56}{152.8} = \frac{1}{5} = 0.2$

Table 10 - the change in the mean when each score is multiplied by 0.2

As you can see, the new mean is 0.2 times the original value (152.8 × 0.2 = 30.56) when each of the scores in the data set is multiplied by 0.2. Again, when every value in a data set is multiplied by a constant, the mean is multiplied by the same constant.

Standard Deviation: When each height in the data set is multiplied by 0.2, the calculation for the standard deviation is:

$\sigma = \sqrt{\frac{\sum (0.2x - 0.2\bar{x})^2}{60}} = \sqrt{\frac{\sum (0.2x - 30.56)^2}{60}}$

$= \sqrt{\frac{(26 - 30.56)^2 + \cdots + (34.4 - 30.56)^2 + (35.8 - 30.56)^2}{60}} = 3.419084472 \approx 3.419$

Original value | When multiplied by 0.2 | Comparison with the original value
17.095 | 3.419 | $\frac{3.419084472}{17.09542236} = \frac{1}{5} = 0.2$

Table 11 - the change in the standard deviation when each score is multiplied by 0.2

As you can observe, the standard deviation has been multiplied by 0.2. This shows that when each score is multiplied by a constant, the standard deviation is multiplied by the same constant, so it decreases when the multiplier is between 0 and 1 (0 < a < 1).
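A sketch of the 0.2 case, under the same assumptions (Python 3.8+, standard-library `statistics`, variable names of our own). It also multiplies the standard deviation quoted earlier in the text, 17.09542236, by 0.2: since the standard deviation of the scaled data is exactly 0.2 times the original, the expected value is about 3.419.

```python
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

a = 0.2                                    # same as dividing by 5
scaled = [a * x for x in HEIGHTS]

new_mean = statistics.fmean(scaled)
mean_ratio = new_mean / statistics.fmean(HEIGHTS)
sd_ratio = statistics.pstdev(scaled) / statistics.pstdev(HEIGHTS)

REPORTED_SD = 17.09542236                  # standard deviation quoted in the text
print(f"new mean = {new_mean:.3f}")        # 152.8 × 0.2
print(f"mean ratio = {mean_ratio:.3f}, SD ratio = {sd_ratio:.3f}")
print(f"0.2 × reported SD = {a * REPORTED_SD:.3f}")
```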
Graph 5.
Graph 5 compares the data before and after each score is multiplied by 0.2, using the same plot of height against cumulative frequency. The green dots represent the original values and the red dots represent the data after each value is multiplied by 0.2. As you can observe, not only has the graph shifted down along the y-axis, but the vertical gaps between the scores have also decreased. From this, you can see that multiplying a data set by a constant changes its standard deviation, since the standard deviation measures the spread of the data about the mean.

[Multiplication] When each value in the data set is multiplied by a value lower than 0 ( a < 0 )
When multiplied by −1

Mean : When each height is multiplied by −1, the sum of the data becomes −9168, so the mean is −9168 ÷ 60 = −152.8.

Original value | When multiplied by −1 | Comparison with the original value
152.8 | −152.8 | $\frac{-152.8}{152.8} = -1$

Table 12 - the change in the mean when each score is multiplied by −1

As you can observe, when a < 0 the values become negative, which is impossible for real height measurements. However, it is still possible to calculate the mean, as shown in the table above. You can further observe that the new mean is the original mean multiplied by −1, which shows that the same rule applies as when a > 0.
Standard deviation:

$\sigma = \sqrt{\frac{\sum \{(-1)x - (-1)\bar{x}\}^2}{60}} = \sqrt{\frac{\sum (-x + 152.8)^2}{60}}$

$= \sqrt{\frac{(-130 + 152.8)^2 + \cdots + (-172 + 152.8)^2 + (-179 + 152.8)^2}{60}} = 17.09542236 \approx 17.095$

Original value | When multiplied by −1 | Comparison with the original value
17.095 | 17.095 | $\frac{17.095}{17.095} = 1$

Table 13 - the change in the standard deviation when each score is multiplied by −1


As you can see, even though each score in the data set is multiplied by −1, the standard deviation does not change; it is the same as when the data is multiplied by 1 (no change). This is because multiplying by −1 reflects the data about 0 without changing the distances between the values. In addition, in the standard deviation formula each deviation is squared before the square root is taken, so the result can never be negative. This explains why the standard deviation stays positive even when the scores are multiplied by a negative number, and why the factor by which it changes is positive.
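A sketch of the −1 case, under the same assumptions as before (Python 3.8+, standard-library `statistics`, names of our own): the mean changes sign, while the standard deviation, which depends only on squared deviations, stays the same.

```python
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

reflected = [-x for x in HEIGHTS]          # multiply every score by -1

mean_before = statistics.fmean(HEIGHTS)
mean_after = statistics.fmean(reflected)
sd_before = statistics.pstdev(HEIGHTS)
sd_after = statistics.pstdev(reflected)

print(f"mean: {mean_before} -> {mean_after}")
print(f"SD:   {sd_before:.3f} -> {sd_after:.3f}")
```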

When multiplied by −2
Mean : When each height is multiplied by −2, the sum of the data becomes −18336 (9168 × (−2)).
−18336 (new sum) ÷ 60 = −305.6

Original value | When multiplied by −2 | Comparison with the original value
152.8 | −305.6 | $\frac{-305.6}{152.8} = -2$

Table 14 - the change in the mean when each score is multiplied by −2

As you can observe from the comparison between the original mean and the new mean, the mean has been multiplied by −2. Therefore, when each score is multiplied by −2, the mean is also multiplied by −2.

Standard Deviation: When each height in the data set is multiplied by −2, the calculation for the standard deviation is:

$\sigma = \sqrt{\frac{\sum \{(-2)x - (-2)\bar{x}\}^2}{60}} = \sqrt{\frac{\sum (-2x + 305.6)^2}{60}}$

$= \sqrt{\frac{(-260 + 305.6)^2 + \cdots + (-344 + 305.6)^2 + (-358 + 305.6)^2}{60}} = 34.19084472 \approx 34.191$

As you can observe, even though the data is multiplied by a negative number, −2, the standard deviation stays positive.

Original value | When multiplied by −2 | Comparison with the original value
17.095 | 34.191 | $\frac{34.19084472}{17.09542236} = 2$

Table 15 - the change in the standard deviation when each score is multiplied by −2

As you can see from Table 15, even when each score is multiplied by −2, the ratio between the new standard deviation and the original one is 2, a positive number.

Testing with different constant a values

Range | Value of a | Mean | Change in mean | Standard deviation | Change in standard deviation
a = 0 | multiplied by 0 | 0 | ×0 | 0 | ×0
a > 1 | multiplied by e | 152.8e | ×e | 46.470 | ×e
0 < a < 1 | multiplied by 0.5 | 76.4 | ×0.5 | 8.548 | ×0.5
0 < a < 1 | multiplied by 0.7 | 106.960 | ×0.7 | 11.967 | ×0.7
a < 0 | multiplied by −3 | −458.400 | ×(−3) | 51.286 | ×3
a < 0 | multiplied by −5 | −764 | ×(−5) | 85.477 | ×5

Table 16 - the changes in the mean and standard deviation when every score in the data set is multiplied by different constant values.
As you can see, every change in the standard deviation is a positive factor, whether a > 0 or a < 0: the standard deviation is multiplied by |a|. For instance, when the scores are multiplied by −3, the mean is multiplied by −3 but the standard deviation is only multiplied by 3.
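Table 16 can be regenerated with a loop over the multipliers. This is a sketch under the same assumptions (Python 3.8+, standard-library `statistics` and `math`; variable names are our own). For each a it checks the two rules stated above: the new mean equals a·x̄ and the new standard deviation equals |a|·σ. The printed standard deviations are based on σ as computed by `pstdev` from Table 2, so they may differ slightly from the values in Table 16, which use 17.095; the ratios are what matter.

```python
import math
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

base_mean = statistics.fmean(HEIGHTS)
base_sd = statistics.pstdev(HEIGHTS)

rows = []
for a in (0, math.e, 0.5, 0.7, -3, -5):
    scaled = [a * x for x in HEIGHTS]      # multiply every score by a
    new_mean = statistics.fmean(scaled)
    new_sd = statistics.pstdev(scaled)
    rows.append((a, new_mean, new_sd))
    print(f"a = {a:7.4f}: mean = {new_mean:9.3f} (a * mean = {a * base_mean:9.3f}), "
          f"SD = {new_sd:7.3f} (|a| * SD = {abs(a) * base_sd:7.3f})")
```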
Conclusion of the first investigation [Multiplication]

Why does the mean change?


In conclusion, the mean changes because the sum of the data changes while the number of values stays the same. When each score is multiplied by a value a > 1, the mean increases by the same factor. For example, when each score is multiplied by 5, the mean also becomes 5 times larger. On the other hand, when each score is multiplied by a value with 0 < a < 1, the mean decreases by that factor. For example, when each score is multiplied by 0.2, the mean is also multiplied by 0.2. This is because multiplying every value by a stretches (or compresses) the whole distribution by the factor a without changing its shape, so the mean is stretched or compressed by the same factor: $\frac{\sum ax}{n} = a \times \frac{\sum x}{n} = a\bar{x}$. If you multiply all scores by a value greater than 1, the mean increases; if you multiply by a value between 0 and 1, the mean decreases. When a < 0, the mean is multiplied by the same factor and therefore changes sign. For instance, when the scores are multiplied by 5 the mean is 764, and when they are multiplied by −5 the mean becomes −764. This occurs because the mean is the sum of all the scores divided by the number of scores: when a negative multiplier is applied to each score, the entire sum of the data set is multiplied by this negative value. Mathematically, if the original mean is positive, multiplying by a negative value results in a negative mean.

Why does the standard deviation change?


Unlike addition and subtraction, multiplying each score by a constant changes the standard deviation. For example, when each score is multiplied by 5, the standard deviation is also multiplied by 5, so when a > 1 the standard deviation increases proportionally. However, when each score is multiplied by 0.2, the standard deviation does not increase but is multiplied by 0.2, so when 0 < a < 1 the standard deviation decreases proportionally. This happens because the standard deviation measures the spread, or dispersion, of the data, and multiplying every value by a constant multiplies every gap between values by that constant. For instance, 130 and 131 become 390 and 393 when multiplied by 3, so the difference between them changes from 1 to 3. Because of this property, the standard deviation is multiplied by the absolute value of the constant. The same reasoning applies when each value is multiplied by a negative number: since the standard deviation measures the spread of the data points around the mean, it cannot be negative. For instance, when each score is multiplied by −2, the standard deviation is 34.19084472, which is equal to the value when each score is multiplied by 2.
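The two multiplication rules can also be written as a short derivation. This is a sketch in the notation of Table 1, where a is any real constant; the only extra fact it uses is $\sqrt{a^2} = |a|$.

```latex
\begin{aligned}
\text{new mean} &= \frac{\sum ax}{n} = a \cdot \frac{\sum x}{n} = a\bar{x} \\
\text{new SD}   &= \sqrt{\frac{\sum (ax - a\bar{x})^2}{n}}
                 = \sqrt{\frac{a^2 \sum (x - \bar{x})^2}{n}}
                 = \sqrt{a^2}\,\sqrt{\frac{\sum (x - \bar{x})^2}{n}}
                 = |a|\,\sigma
\end{aligned}
```

The absolute value in the last step is exactly why the standard deviation is multiplied by 3, not −3, when the scores are multiplied by −3.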
[Cumulative frequency] Grouping the data into intervals and graphing it through the use of
technology

Height (cm) | Frequency | Cumulative frequency
120 < Height ≤ 130 | 5 | 5
130 < Height ≤ 140 | 14 | 19
140 < Height ≤ 150 | 13 | 32
150 < Height ≤ 160 | 6 | 38
160 < Height ≤ 170 | 8 | 46
170 < Height ≤ 180 | 14 | 60

Table 17 - the grouped frequencies used for the cumulative frequency graph of the height data.
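Table 17 can be rebuilt from the raw data, which is a useful check on the frequency counts. This is a minimal sketch using only the Python standard library (`itertools.accumulate` for the running total); `HEIGHTS` and `INTERVALS` are names of our own, and each interval includes its upper bound but not its lower bound, as in the table.

```python
from itertools import accumulate

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

# (lower, upper) class boundaries: lower < height <= upper
INTERVALS = [(120, 130), (130, 140), (140, 150), (150, 160), (160, 170), (170, 180)]

frequencies = [sum(1 for h in HEIGHTS if lower < h <= upper) for lower, upper in INTERVALS]
cumulative = list(accumulate(frequencies))      # running total of the frequencies

for (lower, upper), f, cf in zip(INTERVALS, frequencies, cumulative):
    print(f"{lower} < height <= {upper}: frequency {f:2d}, cumulative {cf:2d}")
```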

Graph 6
Graph 6 is the cumulative frequency graph of the height data from Table 17. From this graph you can read off the first, second and third quartiles of the data. The first quartile, $Q_1$, is also known as the lower quartile; it is the 25th percentile, meaning the lowest 25% of the data lies below this point. The second quartile, $Q_2$, is the median, the middle of the entire data set; it is the 50th percentile, so the lowest 50% of the data lies below it. The third quartile, $Q_3$, also known as the upper quartile, is the 75th percentile, meaning the lowest 75% of the data lies below this value.

Finding Median Q2
To find the median, we can use the median position formula, the $\frac{n+1}{2}$th term. In this formula, n is the number of values in the data set, so n = 60. The formula tells us which term of the ordered data set is the median:

$\frac{60 + 1}{2} = 30.5$

Therefore, the median is the 30.5th term of the data, halfway between the 30th term (148) and the 31st term (149) of the ordered data.

Reading the cumulative frequency graph at y = 30.5 gives x ≈ 143.83, but this is only an estimate, because the graph is drawn from grouped data. Using the ordered data, the median of the height data is (148 + 149) ÷ 2 = 148.5.
Q2 : 148.5 cm
Finding Lower quartile Q1
To find the lower quartile, we can use the lower quartile position formula, the $\frac{1}{4}(n + 1)$th term. In this formula, n is the number of values in the data set, so n = 60. The formula tells us which term of the ordered data set is the lower quartile:

$\frac{1}{4}(60 + 1) = 15.25$

Therefore, the lower quartile is the 15.25th term of the data. Both the 15th and the 16th terms of the ordered data are 137.

Reading the cumulative frequency graph at y = 15.25 gives x ≈ 132.32, but as with the median this is only an estimate from grouped data. Using the ordered data, the lower quartile of the height data is 137.
Q1 : 137 cm
Finding Upper quartile Q3
To find the upper quartile, we can use the upper quartile position formula, the $\frac{3}{4}(n + 1)$th term. In this formula, n is the number of values in the data set, so n = 60. The formula tells us which term of the ordered data set is the upper quartile:

$\frac{3}{4}(60 + 1) = 45.75$

Therefore, the upper quartile is the 45.75th term of the data. Both the 45th and the 46th terms of the ordered data are 169.

Reading the cumulative frequency graph at y = 45.75 gives x ≈ 169, which agrees with the ordered data. So the upper quartile of the height data is 169.
Q3 : 169 cm

Finding Interquartile range (Q3 − Q1)

The interquartile range is a measure of the spread of the data and is calculated as Q3 − Q1. It is therefore the difference between the 75th and 25th percentiles of the data.
Q3 − Q1 = IQR
169 − 137 = 32
IQR = 32 cm
In a table, the results are:

Parameter | Original data
Q1 (lower quartile) | 137
Q2 (median) | 148.5
Q3 (upper quartile) | 169
IQR (interquartile range) | 32

Table 18 - The quartiles of the height data set.
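The position rules used above (the $\frac{1}{4}(n+1)$th, $\frac{n+1}{2}$th and $\frac{3}{4}(n+1)$th terms) can be written as one small function that interpolates between neighbouring terms of the ordered data when the position is not a whole number. This is a sketch; `quartile_by_position` is a hypothetical helper of our own. Python's `statistics.quantiles(data, n=4, method="exclusive")` (Python 3.8+) positions quartiles with the same (n + 1) rule, so it is used as a cross-check.

```python
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

def quartile_by_position(values, p):
    """Value at the p(n + 1)th position of the ordered data, interpolating
    linearly between neighbouring terms (hypothetical helper)."""
    ordered = sorted(values)
    position = p * (len(ordered) + 1)   # e.g. 0.25 * 61 = 15.25
    k = int(position)                   # whole part: the kth term (1-based)
    fraction = position - k             # how far towards the (k + 1)th term
    if k < 1:
        return ordered[0]
    if k >= len(ordered):
        return ordered[-1]
    lower, upper = ordered[k - 1], ordered[k]
    return lower + fraction * (upper - lower)

q1 = quartile_by_position(HEIGHTS, 0.25)
q2 = quartile_by_position(HEIGHTS, 0.50)
q3 = quartile_by_position(HEIGHTS, 0.75)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
# prints: Q1 = 137.0, Q2 = 148.5, Q3 = 169.0, IQR = 32.0

# Cross-check with the standard library
print(statistics.quantiles(HEIGHTS, n=4, method="exclusive"))
```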

Investigating how the median and IQR are affected by changes to the values of the data set

[Addition] : When 5 cm is added to each height in the data set (a > 0)

Graph 7
Graph 7 compares two cumulative frequency graphs: the green graph is the original data and the red graph is the data after 5 is added to each score. The graph has shifted 5 to the right, in the positive x direction. The quartiles have shifted in the same way, so the new Q1, Q2 and Q3 are the original values increased by 5. On the other hand, the interquartile range stays the same, because adding a constant to every data point shifts the entire data set but does not change the spacing between the quartiles.

This can also be proven with an equation:

The new Q1: $Q_1 + 5$
The new Q3: $Q_3 + 5$
The new IQR is then $(Q_3 + 5) - (Q_1 + 5) = Q_3 - Q_1 = 32$

Parameter | Original data | When +5
Q1 (lower quartile) | 137 | 142
Q2 (median) | 148.5 | 153.5
Q3 (upper quartile) | 169 | 174
IQR (interquartile range) | 32 | 32

Table 19 - The changes in the quartiles of the height data when 5 is added to each score.
[Subtraction] : When 12 cm is subtracted from each value in the data set ( a < 0 )

Graph 8
Graph 8 compares two cumulative frequency graphs: the green graph is the original data and the red graph is the data after 12 is subtracted from each score. The graph has shifted 12 to the left, in the negative x direction. The quartiles have shifted in the same way, so the new Q1, Q2 and Q3 are the original values decreased by 12. On the other hand, the interquartile range stays the same, because subtracting a constant from every data point shifts the entire data set but does not change the spacing between the quartiles.

Parameter | Original data | When −12 | Change in the quartiles
Q1 (lower quartile) | 137 | 125 | −12
Q2 (median) | 148.5 | 136.5 | −12
Q3 (upper quartile) | 169 | 157 | −12
IQR (interquartile range) | 32 | 32 | 0

Table 20 - The changes in the quartiles of the height data when 12 is subtracted from each score.
Why does the median change?
In conclusion, the median and the other quartiles change when a constant value is added (for both a > 0 and a < 0) because adding a constant translates the cumulative frequency graph parallel to the x-axis, so every quartile increases or decreases by exactly that constant.

Why does the IQR not change?


On the other hand, the interquartile range does not change when a constant value is added or subtracted, because the distance between Q3 and Q1 is not affected when the graph is translated. Since the entire graph shifts by the same amount, the spread stays constant.
The new Q1: $Q_1 + a$
The new Q3: $Q_3 + a$
The new IQR is then $(Q_3 + a) - (Q_1 + a) = Q_3 - Q_1 = 32$ (the a cancels out).

Testing the changes in the median and IQR with various a values

Value of a | Median | Change in median | IQR | Change in IQR
1 is added | 149.5 | +1 | 32 | 0
2 is added | 150.5 | +2 | 32 | 0
3 is added | 151.5 | +3 | 32 | 0
1 is subtracted | 147.5 | −1 | 32 | 0
2 is subtracted | 146.5 | −2 | 32 | 0
3 is subtracted | 145.5 | −3 | 32 | 0
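The rule behind this table (the median shifts by a while the IQR stays the same) can be checked for any constant with a short loop. A sketch, assuming Python 3.8+ and `statistics.quantiles` with `method="exclusive"`, which positions quartiles at the (n + 1)p th term as in the text; the constants are our own choice and include the +5 and −12 cases above.

```python
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

def quartiles(values):
    # Q1, Q2, Q3 using the (n + 1)p position rule
    return statistics.quantiles(values, n=4, method="exclusive")

q1, q2, q3 = quartiles(HEIGHTS)

results = {}
for a in (5, -12, 1, 2, 3, -1, -2, -3):
    n1, n2, n3 = quartiles([x + a for x in HEIGHTS])   # translate every score by a
    results[a] = (n2, n3 - n1)
    print(f"a = {a:+d}: median {q2} -> {n2}, IQR {q3 - q1} -> {n3 - n1}")
```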
[Multiplication] When each value in the data set is multiplied by 5 ( a > 1 )

Graph 9
Graph 9 shows the cumulative frequency graph when each score in the data set is multiplied by 5. The graph is stretched along the x-axis by a factor of 5, so it moves in the positive x direction: the smallest value has changed from 130 to 650. The quartiles Q1, Q2 and Q3 have also changed, and, unlike for addition, the interquartile range has changed as well.

Parameter | Original data | When ×5 | Change in the quartiles
Q1 (lower quartile) | 137 | 685 | ×5
Q2 (median) | 148.5 | 742.5 | ×5
Q3 (upper quartile) | 169 | 845 | ×5
IQR (interquartile range) | 32 | 160 | ×5

Table 21 - The changes in the quartiles of the height data when each score is multiplied by 5.

As you can see from the changes in the quartiles, all of the values Q1, Q2, Q3 and the IQR have been multiplied by 5 when each score in the data set is multiplied by 5.

This can also be proven with an equation:

The new Q1: $5Q_1$
The new Q3: $5Q_3$
The new IQR is then $5Q_3 - 5Q_1$.

Here, 5 is a common factor, so

$5Q_3 - 5Q_1 = 5(Q_3 - Q_1) = 5 \times 32 = 160$
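A quick numerical check of the ×5 case, under the same assumptions as the earlier quartile sketches (Python 3.8+, `statistics.quantiles` with the exclusive method, `HEIGHTS` as our own name for Table 2):

```python
import statistics

# Table 2: heights (cm) of the 60 students, copied row by row
HEIGHTS = [
    177, 175, 137, 155, 150, 166,
    132, 146, 179, 140, 169, 177,
    141, 148, 130, 179, 135, 130,
    157, 172, 178, 143, 143, 136,
    132, 166, 130, 151, 145, 178,
    131, 171, 160, 140, 179, 166,
    145, 142, 177, 176, 132, 135,
    164, 179, 161, 145, 134, 169,
    139, 149, 135, 142, 172, 148,
    159, 160, 137, 130, 130, 164,
]

scaled = [5 * x for x in HEIGHTS]          # multiply every score by 5
q1, q2, q3 = statistics.quantiles(scaled, n=4, method="exclusive")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
# prints: Q1 = 685.0, Q2 = 742.5, Q3 = 845.0, IQR = 160.0
```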

[Multiplication] When each value in the data set is multiplied by 0.2 ( 0 < a < 1 )

Graph 10
Graph 10 shows the cumulative frequency graph when each score in the data set is multiplied by 0.2. The graph is compressed along the x-axis by a factor of 0.2, so it moves in the negative x direction: the smallest value has changed from 130 to 26. The quartiles Q1, Q2 and Q3 have also changed, and the interquartile range has changed as well.

Parameters | Original data | When × 0.2 | Changes in the quartiles
Q1 (lower quartile) | 137 | 27.4 | ×0.2
Q2 (median) | 148.5 | 29.7 | ×0.2
Q3 (upper quartile) | 169 | 33.8 | ×0.2
IQR (interquartile range) | 32 | 6.4 | ×0.2


Table 22. Changes in the quartiles of the cumulative frequency graph of height when each value in the data set
is multiplied by 0.2.
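A quick numerical check that multiplying by 0.2 scales every quartile, and therefore the IQR, by 0.2. The sketch assumes a small hypothetical data set rather than the investigation's 60 heights, and uses the "inclusive" interpolation rule of Python's `statistics.quantiles`, which may not match quartiles read from a cumulative frequency graph exactly.

```python
import math
import statistics

def quartiles(data):
    """Q1, Q2, Q3 using the 'inclusive' interpolation rule of statistics.quantiles."""
    return statistics.quantiles(data, n=4, method="inclusive")

# Hypothetical heights in cm, for illustration only.
heights = [130, 134, 137, 141, 148, 149, 155, 162, 169, 175, 179]
a = 0.2

original = quartiles(heights)
scaled = quartiles([a * x for x in heights])

for before, after in zip(original, scaled):
    # Each quartile of the scaled data should be 0.2 times the original quartile.
    print(f"{before:.2f} -> {after:.2f} (ratio {after / before:.2f})")

iqr_before = original[2] - original[0]
iqr_after = scaled[2] - scaled[0]
print(f"IQR: {iqr_before:.2f} -> {iqr_after:.2f}")
```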
[Multiplication] When each value in the data set is multiplied by -0.3 (a < 0)

Graph 11
Graph 11 shows the cumulative frequency graph when each score in the data set is multiplied by -0.3.
Multiplying by a negative number reflects the values in the y-axis as well as compressing them by a factor of 0.3,
so the data move to the negative side of the x-axis: for example, the value 130 changes to -39. The quartiles Q1,
Q2 and Q3 also take new values, and the interquartile range changes as well.

Parameters | Original data | When × -0.3 | Changes in the quartiles
Q1 (lower quartile) | 137 | -41.1 | ×-0.3
Q2 (median) | 148.5 | -44.55 | ×-0.3
Q3 (upper quartile) | 169 | -50.7 | ×-0.3
IQR (interquartile range) | 32 | -9.6 | ×-0.3


Table 23. Changes in the quartiles of the cumulative frequency graph of height when each value in the data set
is multiplied by -0.3.
Testing with various a values
Different a values | Median | Changes in median | IQR | Changes in IQR
When 0 is multiplied | 0 | ×0 | 0 | ×0
When 1 is multiplied | 137 | ×1 | 32 | ×1
When 2 is multiplied | 274 | ×2 | 64 | ×2
When 3 is multiplied | 411 | ×3 | 96 | ×3
When -1 is multiplied | -137 | ×-1 | -32 | ×-1
When -2 is multiplied | -274 | ×-2 | -64 | ×-2
When -3 is multiplied | -411 | ×-3 | -96 | ×-3

Conclusion of the second investigation [Multiplication by a, a > 0 and a < 0]

Changes in the quartiles and median

When each score in the data set is multiplied by a constant a, the quartiles are multiplied by the same constant.
For example, when the data were multiplied by 5, Q1, Q3 and the IQR were all multiplied by 5, and the median
was multiplied by 5 in the same way. This shows that when a > 1, the quartile values increase in proportion to a.
When 0 < a < 1, the quartile values decrease, as the experiment with 0.2 shows: Q1, Q3 and the IQR were all
multiplied by 0.2, and so was the median. Lastly, when a < 0, the quartiles are again multiplied by a. For example,
when the data were multiplied by -0.3, Q1, Q3 and the median were each multiplied by -0.3, so they all became
negative.

Changes in the IQR
As Tables 21 to 23 show, when each value in the data set was multiplied by a constant, the IQR changed as well.
When the data were multiplied by 5 (a > 1), the IQR was multiplied by 5 and increased; when they were multiplied
by 0.2 (0 < a < 1), the IQR was multiplied by 0.2 and decreased. So when a > 0, the IQR is multiplied by a.
However, multiplying by a < 0 causes a problem in the IQR calculation. Normally Q3 is greater than Q1, but
multiplying by a negative number reverses the order of the data, so a × Q3 becomes smaller than a × Q1 and
calculating a × Q3 - a × Q1 gives a negative IQR (-9.6 in Table 23). But according to research (‘Why can’t IQR
be negative’ Study, 2022), the IQR is always non-negative because it represents the range of the middle 50% of
the data, reflecting the spread of the data, not its absolute values. The negative result therefore indicates an
error in the calculation rather than a negative spread: after multiplying by a < 0, the new lower quartile is
a × Q3 and the new upper quartile is a × Q1, so the IQR is a × Q1 - a × Q3 = |a| × (Q3 - Q1). For a = -0.3 this
gives 0.3 × 32 = 9.6. The IQR, as a measure of statistical dispersion, is inherently non-negative.
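The reordering argument for a < 0 can be sketched in a few lines of Python, using the original quartiles Q1 = 137 and Q3 = 169 from Table 23. The variable names are illustrative only; the point is that sorting the two transformed quartiles restores the correct lower and upper quartile before subtracting.

```python
import math

# Quartiles of the original data (Table 23).
q1, q3 = 137, 169
a = -0.3

# Naive approach: multiply each quartile and keep the old labels.
naive_iqr = a * q3 - a * q1          # about -9.6, a negative "IQR"

# Multiplying by a negative number reverses the order of the data,
# so the old Q3 becomes the new lower quartile and vice versa.
new_q1, new_q3 = sorted((a * q1, a * q3))
corrected_iqr = new_q3 - new_q1      # about 9.6

print(round(naive_iqr, 2), round(corrected_iqr, 2))
# The corrected IQR equals |a| times the original IQR.
print(math.isclose(corrected_iqr, abs(a) * (q3 - q1)))
```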

[Conclusion]

In conclusion, this investigation shows that when a constant value a (a > 0 or a < 0) is added to each data point
in a data set, both the mean and the median are shifted by a. However, the standard deviation and the IQR remain
unchanged by this addition, because they are measures of spread or dispersion in the data, and adding a constant
to each data point does not alter the relative distances between the data points. Thus, while the measures of
central tendency (mean and median) change by the constant shift, the measures of variability (standard deviation
and IQR) remain unchanged. When each data point is multiplied by a constant a, the mean and median are
multiplied by a. If a > 1, the mean and median increase; if 0 < a < 1, they decrease; and if a < 0, their signs
change. The standard deviation and IQR also change when each data point is multiplied by a constant: multiplying
by a > 1 increases both proportionally, and multiplying by 0 < a < 1 decreases both proportionally. This is
because the standard deviation and IQR measure the spread of the data points, and multiplying by a constant
scales the distances between data points by the same factor. Multiplying by a negative value a scales the
standard deviation and IQR by |a|, because both are inherently non-negative.
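The whole conclusion can be summarized as one check on the general transformation y = a·x + b: the mean and median become a·(old value) + b, while the standard deviation and IQR become |a|·(old value). The sketch below is a hedged illustration on hypothetical data, not the investigation's 60 heights. It uses the population standard deviation (`statistics.pstdev`, dividing by n as this investigation does) and the "inclusive" quartile rule, both of which are assumptions about method.

```python
import math
import statistics

def summary(data):
    """Mean, median, population SD and IQR ('inclusive' quartiles)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (statistics.fmean(data), statistics.median(data),
            statistics.pstdev(data), q3 - q1)

# Hypothetical heights in cm, for illustration only.
heights = [130, 134, 137, 141, 148, 149, 155, 162, 169, 175, 179]
mean, median, sd, iqr = summary(heights)

for a, b in [(1, 3), (1, -3), (5, 0), (0.2, 0), (-0.3, 0)]:
    new_mean, new_median, new_sd, new_iqr = summary([a * x + b for x in heights])
    # The centre follows the transformation; the spread only scales by |a|.
    assert math.isclose(new_mean, a * mean + b)
    assert math.isclose(new_median, a * median + b)
    assert math.isclose(new_sd, abs(a) * sd)
    assert math.isclose(new_iqr, abs(a) * iqr)
    print(f"a={a}, b={b}: mean {new_mean:.2f}, median {new_median:.2f}, "
          f"SD {new_sd:.2f}, IQR {new_iqr:.2f}")
```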

Further Investigation

1. Transforming the given set of data so that it has a mean of 0

To convert the mean of a data set to 0, the mean of the original data set, which we calculated in the introduction
of this investigation, should be subtracted from each score. This works because subtracting a constant a from
every score translates the dot plot parallel to the x-axis, so the mean decreases by exactly a. For example, in this
investigation, when 12 was subtracted from the data set, the mean decreased by 12. Therefore, by subtracting
152.8, the mean of the original data set, the new mean becomes 0.

To calculate the mean, the sum of the data is divided by the number of data values (60), so:

$$\frac{(130-152.8) + (130-152.8) + \cdots + (175-152.8) + (179-152.8)}{60} = 0$$
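Centring can be sketched in Python as follows. The data below are hypothetical, because the investigation's 60 heights are not reproduced in this section; the sketch shows that subtracting the mean gives a mean of 0 (up to floating-point rounding) while leaving the standard deviation unchanged, as expected for a translation.

```python
import statistics

# Hypothetical heights in cm; the investigation's own 60 heights are not reproduced here.
heights = [130, 134, 137, 141, 148, 149, 155, 162, 169, 175, 179]

mean = statistics.fmean(heights)
centred = [x - mean for x in heights]   # subtract the mean from every score

# The mean of the centred data is 0, up to floating-point rounding.
print(statistics.fmean(centred))
# Subtracting a constant is a translation, so the spread is unchanged.
print(statistics.pstdev(heights), statistics.pstdev(centred))
```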
2. Transform the given set of data so that it has a standard deviation of 1

To convert the standard deviation of a data set to 1, each score should be divided by the standard deviation of
the original data set, 17.1. Dividing every data point by the standard deviation scales all the distances between
data points by the same factor, so the new standard deviation is 17.1 ÷ 17.1 = 1. This standardization makes it
simpler to compare and analyze different data sets by ensuring consistent variability.

To calculate the standard deviation:

$$\sigma = \sqrt{\frac{\sum_{n=1}^{60}\left(\frac{x_n}{17.1} - \frac{\sum_{n=1}^{60} x_n \div 17.1}{60}\right)^2}{60}} = \sqrt{\frac{(7.6-8.94)^2 + (7.6-8.94)^2 + \cdots + (10.24-8.94)^2 + (10.24-8.94)^2}{60}} = \sqrt{\frac{60}{60}} = 1$$
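Scaling by the standard deviation can be sketched the same way, again on hypothetical data and using the population standard deviation (dividing by n, as in the formula above). Dividing by the standard deviation makes the new standard deviation 1 but also divides the mean by the standard deviation instead of moving it to 0; with this investigation's values, 152.8 ÷ 17.1 ≈ 8.94, which is the mean that appears in the formula above.

```python
import statistics

# Hypothetical heights in cm, for illustration only.
heights = [130, 134, 137, 141, 148, 149, 155, 162, 169, 175, 179]

sd = statistics.pstdev(heights)   # population SD: divides by n
scaled = [x / sd for x in heights]

# The new standard deviation is 1, up to floating-point rounding.
print(statistics.pstdev(scaled))
# Dividing by sd also divides the mean by sd; it does not move it to 0.
print(statistics.fmean(scaled), statistics.fmean(heights) / sd)
```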

3. Transform the given set of data so that it has a mean of 0 and a standard deviation of 1.

To transform the data set so that it has a mean of 0 and a standard deviation of 1, we can combine the two
methods used in parts 1 and 2: subtract 152.8 from each score and then divide the result by 17.1.

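Therefore, the transformed data set will be

$$\frac{130-152.8}{17.1},\ \frac{130-152.8}{17.1},\ \ldots,\ \frac{177-152.8}{17.1},\ \frac{177-152.8}{17.1}$$

Since the mean of the transformed data is 0, each deviation from the mean is simply the transformed value itself,
so the standard deviation is

$$\sigma = \sqrt{\frac{\sum_{n=1}^{60}\left(\frac{x_n-152.8}{17.1} - \frac{\sum_{n=1}^{60} (x_n-152.8) \div 17.1}{60}\right)^2}{60}} = \sqrt{\frac{(-1.3)^2 + (-1.3)^2 + \cdots + (1.4)^2 + (1.4)^2}{60}} = \sqrt{\frac{60}{60}} = 1$$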
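Combining both steps gives z-scores. The sketch below assumes hypothetical data and the population standard deviation; the helper name `standardize` is illustrative only. The order matters: subtracting the mean first and then dividing gives a mean of 0 and a standard deviation of 1, whereas dividing first would require subtracting 152.8 ÷ 17.1 instead of 152.8.

```python
import statistics

def standardize(data):
    """Return z-scores: subtract the mean, then divide by the population SD."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return [(x - mean) / sd for x in data]

# Hypothetical heights in cm, for illustration only.
heights = [130, 134, 137, 141, 148, 149, 155, 162, 169, 175, 179]
z = standardize(heights)

# Mean about 0 and standard deviation about 1, up to floating-point rounding.
print(statistics.fmean(z), statistics.pstdev(z))
# Using this investigation's summary values (mean 152.8, SD 17.1):
print(round((130 - 152.8) / 17.1, 2))  # -1.33
```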
