
2-Statistical Measures of Data

The document discusses various statistical measures used to analyze data including measures of central tendency, measures of variation, Chebyshev's theorem, and z-scores. It defines key terms like mean, median, mode, range, variance, standard deviation, and introduces formulas and examples for calculating each measure.

Uploaded by

jasonaguilon99

Topic 2
STATISTICAL MEASURES OF DATA
Parameter and Statistic
Measures of Central Tendency
Measures of Variation
Chebyshev’s Theorem
Z-scores
Parameter and Statistic
A measure computed on the
basis of data obtained from a sample
is termed a statistic.

A parameter is a measure
computed on the basis of data
obtained from an entire population.
The sample statistic is
presumed to be an estimate
of the population parameter.
Example

The average height of a selected sample of
grade one pupils in a certain school is a
statistic. The average height of all grade
one pupils in the school is a parameter.
Measures of Central Tendency

Any measure indicating the
center of a set of data arranged
in increasing or decreasing
order of magnitude.
Measures of Central Tendency

MEAN
MEDIAN
MODE
FOR UNGROUPED DATA (< 30 values)

MEAN

The mean is the


average. It is computed by
summing the inputs and
dividing by the number of
inputs.
Formula:

\[ \bar{X} = \frac{\sum_{i=1}^{N} x_i}{N} \]
Example
1. The number of employees at 5 different drug
stores are 3, 5, 6, 4, and 6. Treating the data
as population, find the mean number of
employees for the 5 stores.
2. A food inspector examined a random sample
of 7 cans of a certain brand of tuna to
determine the percent of foreign impurities.
The following data were recorded: 1.8, 2.1,
1.7, 1.6, 0.9, 2.7, and 1.8. Compute the
sample mean.
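The two examples above can be checked with a short Python sketch. The variable and function names are illustrative assumptions, not part of the original slides; the function simply applies the definition (sum the inputs, divide by the number of inputs).

```python
# Example 1: employees at 5 drug stores, treated as a population.
employees = [3, 5, 6, 4, 6]
# Example 2: percent of foreign impurities in a sample of 7 tuna cans.
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]


def arithmetic_mean(values):
    """Sum the inputs and divide by the number of inputs."""
    return sum(values) / len(values)


population_mean = arithmetic_mean(employees)  # 24 / 5
sample_mean = arithmetic_mean(impurities)     # 12.6 / 7

print(f"Mean number of employees: {population_mean:.1f}")
print(f"Mean percent impurities:  {sample_mean:.1f}")
```

The computation is the same for a population and a sample; only the symbol used for the result differs.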
MEDIAN
The median is the midpoint
of a set of numbers. The
numbers must be arranged in
order from lowest to highest or
vice-versa.
If there is an odd number of
inputs, the median is the middle
input.
If there is an even number
of inputs, the median is the
average of the two inputs in the
middle.
Example

1. On 5 term tests in sociology, a student
has made grades of 82, 93, 86, 92, and
79. Find the median for this population
of grades.
2. The nicotine contents for a random
sample of 6 cigarettes of a certain
brand are found to be 2.3, 2.7, 2.5,
2.9, 3.1, and 1.9 milligrams. Find the
median.
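A minimal sketch of the odd/even rule described above, applied to both examples. The function name and variable names are assumptions chosen for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle
    values when the number of inputs is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


grades = [82, 93, 86, 92, 79]               # odd number of inputs
nicotine = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]   # even number of inputs

print("Median grade:", median(grades))
print(f"Median nicotine content: {median(nicotine):.1f} mg")
```

Sorting first is what makes the "middle" meaningful; the original order of the observations does not matter.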
MODE
The mode is the input
that appears the most times.
There can be more than one
mode.
Example

1. The donations from the residents of
Fairway Forest towards the Virginia Lung
Association are recorded as 9, 10, 5, 9, 9,
7, 8, 6, 10, and 11. Find the mode.
2. The number of movies attended last
month by a random sample of 12 high
school students were recorded as follows:
2, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, and 4. Find
the mode.
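Because a data set can have more than one mode, a sketch in Python can use `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest count. The variable names are illustrative assumptions.

```python
from statistics import multimode

donations = [9, 10, 5, 9, 9, 7, 8, 6, 10, 11]
movies = [2, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, 4]

# multimode returns a list, so one mode and several modes are
# handled the same way.
donation_modes = multimode(donations)
movie_modes = multimode(movies)

print("Mode(s) of donations:", donation_modes)
print("Mode(s) of movies attended:", movie_modes)
```

The second data set illustrates the bimodal case: two values occur equally often and more often than any other.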
Notation:

Measure of Central Tendency    Statistic    Parameter
Mean                           X̄            μ
Median                         m_d          M_d
Mode                           m_o          M_o
Problem:

The following data show the


amount of phosphates per
load of laundry, in grams, for a
random sample of various
types of detergents used
according to the prescribed
directions.
Laundry Detergent Phosphates per Load (g)
A&P Blue Sail 48
Dash 47
Concentrated All 42
Cold Water All 42
Breeze 41
Oxydol 34
Ajax 31
Sears 30
Fab 29
Cold Power 29
Bold 29
Rinso 26
For the given data, find the
mean
median
mode
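One way to check a hand solution to this problem is Python's standard `statistics` module. This is only a sketch; the dictionary layout and variable names are assumptions, while the values are taken from the table above.

```python
from statistics import mean, median, multimode

# Phosphates per load (grams), from the table above.
phosphates = {
    "A&P Blue Sail": 48, "Dash": 47, "Concentrated All": 42,
    "Cold Water All": 42, "Breeze": 41, "Oxydol": 34,
    "Ajax": 31, "Sears": 30, "Fab": 29,
    "Cold Power": 29, "Bold": 29, "Rinso": 26,
}

values = list(phosphates.values())

phosphate_mean = mean(values)
phosphate_median = median(values)
phosphate_modes = multimode(values)

print(f"Mean:   {phosphate_mean:.2f} g")
print(f"Median: {phosphate_median} g")
print(f"Mode:   {phosphate_modes}")
```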
Measures of Variability

A measure of variability describes
the dispersion of the observations
about the mean, that is, how the
observations spread out along
the scale of distribution.
Measures of Variability

Consider the following measurements, in liters, for


two samples of orange juice bottled by companies A
and B.
Sample A 0.97 1.00 0.94 1.03 1.06
Sample B 1.06 1.01 0.88 0.91 1.14

• Both samples have the same mean, 1.00 liters.


• Company A bottles orange juice with a more
uniform content than company B.
• The variability or dispersion of the observations
from the average is less for sample A than for
sample B.
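The comparison above can be illustrated numerically. The sketch below uses the range and the sample standard deviation, both of which are defined later in this topic, as two possible yardsticks; the choice of these two measures and the variable names are assumptions made for illustration.

```python
from statistics import mean, stdev

sample_a = [0.97, 1.00, 0.94, 1.03, 1.06]
sample_b = [1.06, 1.01, 0.88, 0.91, 1.14]

for name, sample in (("A", sample_a), ("B", sample_b)):
    spread = max(sample) - min(sample)  # range: largest minus smallest
    print(
        f"Sample {name}: mean = {mean(sample):.2f} L, "
        f"range = {spread:.2f} L, s = {stdev(sample):.3f} L"
    )
```

Both samples report the same mean, but every measure of spread is smaller for sample A, which matches the statement that company A fills its bottles more uniformly.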
Measures of Variation

Range
Variance
Standard Deviation
FOR UNGROUPED DATA (< 30 values)

RANGE
The range of a set of data
is the difference between the
largest and smallest number in
the set.
\[ R = x_H - x_L \]
Example:

The IQs of 5 members of a


family are 108, 112, 127, 118,
and 113. Find the range.
\[ R = x_H - x_L = 127 - 108 = 19 \]
VARIANCE

The variance is the sum


of squared deviations from
the mean of a distribution
divided by the number of
cases. It is the square of the
standard deviation.
For a population,

\[ \text{variance, } \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

For a sample,

\[ \text{variance, } s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Example:
Assuming that two sets A and B are populations,
calculate their variance.
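Set A 3 4 5 6 8 9 10 12 15
Set B 3 7 7 7 8 8 8 9 15

For Set A:

\[ \mu = \frac{\sum_{i=1}^{9} x_i}{9} = 8 \]

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{9} (x_i - 8)^2}{9} \]

\[ = \frac{(-5)^2 + (-4)^2 + (-3)^2 + (-2)^2 + (0)^2 + (1)^2 + (2)^2 + (4)^2 + (7)^2}{9} \]

\[ = \frac{124}{9} \approx 13.78 \]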
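The slide works out Set A only. A short sketch of the population formula, applied to both sets, is given below; the function name is an assumption, and the code divides by N (not n - 1) because the example treats both sets as populations.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]

var_a = population_variance(set_a)  # 124 / 9
var_b = population_variance(set_b)  # 78 / 9

print(f"Variance of Set A: {var_a:.2f}")
print(f"Variance of Set B: {var_b:.2f}")
```

Both sets have the same mean (8) and the same range (12), yet Set B has the smaller variance because most of its values sit close to the mean.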
STANDARD DEVIATION

The standard deviation is


a measure of the spread or
dispersion of scores from the
mean in a distribution.
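Since the variance is the square of the standard deviation, the standard deviation formulas follow directly as the square roots of the population and sample variance formulas. They are written out here for reference, using the same symbols as the variance formulas:

```latex
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
```

Taking the square root returns the measure to the original units of the data, which is why the standard deviation is usually easier to interpret than the variance.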
Problem:

A certain city in the south has a total of
nine industrial factories. These factories
reported the following number of days
their operations were stopped because of
strikes: 20, 19, 18, 16, 15, 14, 13, 12, and
8. Determine the following: range,
variance, and standard deviation.
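A hand computation for this problem can be cross-checked with Python's standard `statistics` module. The sketch below assumes the nine factories are the whole population (the problem says the city has "a total of nine"), so it uses the population functions `pvariance` and `pstdev`; the variable names are illustrative.

```python
from statistics import pstdev, pvariance

# Days of stopped operation reported by the nine factories.
days = [20, 19, 18, 16, 15, 14, 13, 12, 8]

data_range = max(days) - min(days)   # largest minus smallest
variance = pvariance(days)           # divides by N
std_dev = pstdev(days)               # square root of the variance

print(f"Range:              {data_range}")
print(f"Variance:           {variance:.2f}")
print(f"Standard deviation: {std_dev:.2f}")
```

If the nine values were instead a sample from a larger group of factories, `variance` and `stdev` (which divide by n - 1) would be the matching functions.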
Solution:

Computing for the range:

R = 20 - 8 = 12
Computing for the MEAN:

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{20 + 19 + 18 + 16 + 15 + 14 + 13 + 12 + 8}{9} = \frac{135}{9} \]

\[ \mu = 15 \]
Computing for the variance (μ = 15):

x_i    x_i − μ    (x_i − μ)²
20      5         25
19      4         16
18      3          9
16      1          1
15      0          0
14     −1          1
13     −2          4
12     −3          9
 8     −7         49

Σ|x_i − μ| = 26        Σ(x_i − μ)² = 114

From the table: Σ(x_i − μ)² = 114 and N = 9

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{114}{9} = 12.67 \]

Computing for the Standard Deviation:

\[ \sigma = \sqrt{\sigma^2} = \sqrt{12.67} \]

\[ \sigma \approx 3.56 \]
Chebyshev’s Theorem
The Russian mathematician
P. L. Chebyshev discovered that the
fraction of the measurements falling
between any two values symmetric
about the mean is related to the
standard deviation.
Chebyshev’s Theorem

“At least the fraction 1 − 1/k² of
the measurements of any set
of data must lie within k
standard deviations of the
mean.”
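The bound in the theorem is easy to tabulate. The following sketch evaluates 1 − 1/k² for a few values of k; the function name is an assumption, and the guard reflects the fact that the bound only says something useful when k is greater than 1 (for smaller k it is zero or negative).

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set lying within k standard
    deviations of the mean, according to Chebyshev's theorem."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the formula gives a value <= 0, which carries no
    # information, so the result is floored at 0.
    return max(0.0, 1 - 1 / k**2)


for k in (1.5, 2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the data")
```

The theorem makes no assumption about the shape of the distribution, so these percentages are guaranteed minimums rather than estimates.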
Problem:

A coffee-maker is regulated so that it
takes an average of 5.8 minutes to brew a
cup of coffee with a standard deviation of
0.6 minute. Using Chebyshev's theorem,
determine at least what percentage of the
times this coffee-maker is used the
brewing time will take anywhere from 4.6
minutes to 7.0 minutes.
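Solution:

\[ \bar{x} = 5.8, \quad s = 0.6, \quad \text{upper limit of the interval} = 7.0 \]

\[ \text{Interval} = \bar{x} \pm ks \]

\[ 7.0 = 5.8 + k(0.6) \]

\[ k(0.6) = 1.2 \]

\[ k = 2 \]

\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = \frac{3}{4} = 75\% \]

The lower limit agrees with the same k: 5.8 − 2(0.6) = 4.6. At least 75% of the
brewing times fall between 4.6 and 7.0 minutes.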
Problem
If the IQs of a random sample of 1080
students at a large university have a mean
score of 120 and a standard deviation of 8,
use Chebyshev’s theorem to determine the
interval containing at least 810 of the IQs in
the sample. In what range can we be sure
that no more than 120 of the scores fall?
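One possible approach to this problem runs the theorem in reverse: turn the required count into a fraction, solve 1 − 1/k² = fraction for k, then build the interval mean ± k(standard deviation). The sketch below follows that approach; the function and variable names are assumptions, and `Fraction` is used only to keep the arithmetic exact.

```python
import math
from fractions import Fraction


def chebyshev_k(fraction_inside):
    """Solve 1 - 1/k**2 = fraction_inside for k."""
    return math.sqrt(1 / (1 - fraction_inside))


mean_iq, sd_iq, n_students = 120, 8, 1080

# At least 810 of 1080 scores inside the interval.
k_inner = chebyshev_k(Fraction(810, n_students))
inner = (mean_iq - k_inner * sd_iq, mean_iq + k_inner * sd_iq)

# No more than 120 of 1080 scores outside the interval,
# i.e. at least 960 of 1080 inside it.
k_outer = chebyshev_k(1 - Fraction(120, n_students))
outer = (mean_iq - k_outer * sd_iq, mean_iq + k_outer * sd_iq)

print(f"k = {k_inner}: at least 810 IQs lie in {inner}")
print(f"k = {k_outer}: no more than 120 IQs lie outside {outer}")
```

The second part of the question refers to the region outside the wider interval, that is, scores below its lower limit or above its upper limit.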
Measures of Relative Variation

If the units are different between


two or more distributions and/or
their means are different, then, the
measures of relative variation are
more appropriate in comparing the
variability among these
distributions.
Measures of Relative Variation

They are:
the standard score, Z
the coefficient of variation, V
Standard Score

The standard score tells the
relative location of a
particular raw score with
regard to the mean of all
scores in a series.
Formula:

\[ z = \frac{x - \mu}{\sigma} \]
Coefficient of Variation

The coefficient of variation


expresses the standard deviation
as a percentage of the mean.
Formula:

\[ V = \frac{s}{\bar{x}} \times 100\% \]

OR

\[ V = \frac{\sigma}{\mu} \times 100\% \]
Problem:

An automobile salesman made a profit of $245 on a
subcompact model for which the average profit has
been $200 with a standard deviation of $50.
Later on the same day, he made a profit of $620 on a
large luxury model for which the average profit has
been $500 with a standard deviation of $150.
For which of these two models is the salesman's profit
relatively higher?
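Solution:

\[ z = \frac{x - \mu}{\sigma} \]

For Subcompact Model:

\[ x_1 = 245, \quad \mu_1 = 200, \quad \sigma_1 = 50 \]

\[ z_1 = \frac{245 - 200}{50} = 0.9 \]

For Luxury Model:

\[ x_2 = 620, \quad \mu_2 = 500, \quad \sigma_2 = 150 \]

\[ z_2 = \frac{620 - 500}{150} = 0.8 \]

Profit for the subcompact model is
relatively higher.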
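The same comparison can be sketched in a few lines of Python. The function name `z_score` and the dictionary layout are assumptions for illustration; the figures come from the problem above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma


# (profit on this sale, average profit, standard deviation), in dollars
sales = {
    "subcompact": (245, 200, 50),
    "luxury": (620, 500, 150),
}

scores = {model: z_score(*figures) for model, figures in sales.items()}
best = max(scores, key=scores.get)

for model, z in scores.items():
    print(f"{model}: z = {z:.1f}")
print("Relatively higher profit:", best)
```

Although the luxury sale earned more dollars, its profit is fewer standard deviations above its own average, which is why the z-score rather than the raw profit decides the comparison.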
Problem:

The mean closing price of stock A


over the past year is P120 with a
standard deviation of P15. In the
case of stock B, the mean is P100
with a standard deviation of P8.
Which stock varies more in price?
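Solution:

\[ V = \frac{s}{\bar{x}} \times 100\% \]

For Stock A:

\[ \bar{x} = 120, \quad s = 15 \]

\[ V = \frac{15}{120} \times 100\% = 12.5\% \]

For Stock B:

\[ \bar{x} = 100, \quad s = 8 \]

\[ V = \frac{8}{100} \times 100\% = 8.0\% \]

Stock A has a greater variability in
price. Therefore, stock A is more risky
than B.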
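A minimal sketch of the same calculation in Python follows. The function name and dictionary layout are illustrative assumptions; prices are in pesos as given in the problem.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


# (mean closing price, standard deviation), in pesos
stocks = {"A": (120, 15), "B": (100, 8)}

cv = {
    name: coefficient_of_variation(sd, mean)
    for name, (mean, sd) in stocks.items()
}
more_variable = max(cv, key=cv.get)

for name, value in cv.items():
    print(f"Stock {name}: V = {value:.1f}%")
print("Greater relative variability: stock", more_variable)
```

Because V is a percentage of the mean, it stays comparable even when the two stocks trade at different price levels.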
Student Activity
Answer the following:

1. What is the median of the following set


of data?
1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7
2. Which of the following sets of data has
the highest variability relative to the
mean?
a. 10, 15, 19, 26, 28, 34, 48, 55
b. 62, 65, 68, 71, 75, 76, 79, 80
c. 120, 120, 124, 125, 128, 128
3. Listed below are the average monthly
incomes in pesos of Filipino families for
each of the 13 regions in the country:
Determine the ungrouped mean,
median and modal monthly income.
Which of the three measures best
describes the data? Why?
1249.50 7500.00 20,000.00
2250.00 7249.00 10,250.00
5249.00 8250.00 2250.00
6250.00 9249.00 6250.00
6500.00
4. During the dry season, when there is
a peak of construction activity,
shortages of cement and consequent
price fluctuations occur. A materials
purchaser for a construction firm
makes a random check on 15 cement
dealers and obtains the following
prices in pesos per 45 kg bag of
cement:
105, 105, 105, 110, 110, 115, 115, 120,
120, 120, 125, 125, 130, 130, 135

• What is the current price range of a 45
kg bag of cement?
• Compute for the standard and mean
deviations of the given data.
5. The IQ distribution of first year
Engineering students has a mean of
120 and a standard deviation of 8.
Use Chebyshev’s theorem to
determine:

• the interval containing at least
three-fourths of the IQs
• at least what percentage of these scores
must lie between 104 and 136
6. A real estate broker is in charge of
selling lots in two new subdivisions.
The mean selling price of a lot in
Jade Park Village is P250,000 with a
standard deviation of P15,000. In
Ruby Valley Subdivision, the mean
price is P300,000 and the standard
deviation is P25,000. Which
subdivision has the larger variation
in lot prices?
7. Shown below are John’s scores,
mean, and the standard deviation of
each of the three tests given to 1000
students. On which test did John
stand highest? Lowest?

Test Mean Std. Deviation John’s Score


Math 47.2 4.8 53
English 64.2 8.3 71
Physics 75.4 11.7 72
8. By stopwatch method, the completion
time for a particular task in an
assembly line is measured for a
sample of 40 workers. The results are
summarized below. The production
manager has determined that a
median completion time of 3.75
minutes or less is acceptable. He
would however pull out workers for
further training if the median time is
found to be greater than 3.75
minutes. Will training be necessary?
Completion Time(min) Number of Workers
5.9 – 5.5 2
5.4 – 5.0 5
4.9 – 4.5 7
4.4 – 4.0 10
3.9 – 3.5 8
3.4 – 3.0 5
2.9 – 2.5 2
2.4 – 2.0 1
