
Statistics 101

The chapter discusses various statistical concepts including measures of central tendency, dispersion, and probabilities. It aims to teach students how to calculate and interpret the mean, median, mode, range, variance, and standard deviation of data sets. Several examples are provided to demonstrate how statistical tools can be used to solve problems in real-world scenarios.

Uploaded by

Carlo Carlo
Copyright
© © All Rights Reserved

CHAPTER 5

STATISTICS

“Statistics is a grammar of Science” - Karl Pearson

Statistics is used almost every day in our lives, and it is especially useful in the field of research. Below are some examples of the application of Statistics in real life:

• You are commissioned by Poro Point Management Corporation (PPMC) to find out how satisfied their clients are with the services provided to them.
• Gina and Jane are watching the championship game between the Ateneo de Manila University (ADMU) and University of the Philippines (UP) Women’s Volleyball teams. Out of the blue, Gina offers Jane a bet that UP will win the game. Should Jane take the bet that Gina is offering her?
• Rico wants to conduct a survey on whether his barangay should use its funds to build a new chapel or a new health center. How many respondents does he need to consider in his survey? How can he make sure that his survey is fair?

In this chapter the following lessons will be discussed: Data Management (Data Gathering and Organizing); Measures of Central Tendency; Measures of Dispersion; Measures of Relative Position; Probabilities and Normal Distributions; Linear Regression; and Linear Correlation Coefficient.

Learning Objectives

At the end of the chapter, the students will be able to:

1. recall the different terms and basic concepts in Statistics;
2. compute and interpret the mean, median, and mode of ungrouped and grouped data;
3. solve and interpret the different measures of position;
4. calculate and interpret the range, mean absolute deviation (MAD), variance, and standard deviation of ungrouped data; and
5. employ the different statistical tools in solving real-life problems.

LESSON 1 Data Collection and Management

What is STATISTICS?

Statistics involves the collection, organization, summarization, presentation, and interpretation of data.

Branches of Statistics

Statistics may be subdivided into two fields: descriptive statistics and inferential statistics.

STATISTICAL PROCESS

1. Collection of data: Gathering of information or data through surveys, interviews, or testing.
2. Organization and presentation of data: Refers to the process of grouping or classifying the collected data according to characteristics. The data could be summarized through textual, graphical, and tabular forms.
3. Analysis of data: Involves the description of the organized data through the use of some statistical methods and treatment.
4. Interpretation of data: Process of drawing out generalizations and conclusions based on the data collected, presented, and analyzed.

Data Collection Methods

Interview: A face-to-face interaction where one person (the interviewer) asks a series of questions to another (the interviewee) in order to gather necessary information.
Survey and Questionnaire: A written copy of questions is given to the respondents (people who participate in the survey or research).
Census: Data collection where data is gathered from all members of the population.
Observation: Data gathering that makes use of the senses.
Experimentation: Most commonly used for scientific research where causality is being tested.
Documents or records review: Consists of scrutinizing and studying accessible data from databases, anecdotes, diaries, reports, grades, and other related sources that would help in answering the research questions at hand.

Data can be gathered through sampling: either random sampling (such as the fishbowl or lottery method, and stratified random sampling) or purposive sampling.

Data Organization and Presentation

Organization and presentation of data is the next phase of the statistical process after data
collection. This may be done through a variety of formats: stem-and-leaf plots, tables, and graphs.

Stem-and-leaf plots

A stem-and-leaf plot is a way to organize numerical data using a two-column arrangement. The
first column (stem) consists of the digits other than the ones digit. The ones digits are listed in
order on the second column (leaf).

Example:

Below are the times (in minutes) taken by 30 students to finish a 20-item statistics quiz. Create the
stem-and-leaf plot of the times in minutes.

12 23 41 20 16 38 52 27 31 19
32 18 38 25 29 50 33 24 11 45
41 28 15 34 21 38 55 49 38 32

Solution: Because all figures are two-digit numbers, the tens digits will be placed on the stem
column and the ones digits will be on the leaf column.

Stem Leaf
1 1,2,5,6,8,9
2 0,1,3,4,5,7,8,9
3 1,2,2,3,4,8,8,8,8
4 1,1,5,9
5 0,2,5

Note that the units digits for each stem must be arranged from the lowest value to the highest. Also note that if there are one-digit numbers in the set of data, their stem is 0, since one-digit numbers have no tens digit. In addition, if there are three-digit figures, the first two digits will be on the stem.
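
The construction above can be sketched in Python. This is only an illustrative sketch, not part of the original lesson: the function name `stem_and_leaf` is an assumption, and it assumes non-negative whole numbers whose ones digit is the leaf and whose remaining digits form the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (all digits except the ones digit).

    Hypothetical helper: returns a dict mapping each stem to its sorted leaves.
    """
    plot = defaultdict(list)
    for value in sorted(values):        # sorting keeps the leaves in order
        plot[value // 10].append(value % 10)
    return dict(plot)


# Quiz times (in minutes) from the example above
times = [12, 23, 41, 20, 16, 38, 52, 27, 31, 19,
         32, 18, 38, 25, 29, 50, 33, 24, 11, 45,
         41, 28, 15, 34, 21, 38, 55, 49, 38, 32]

plot = stem_and_leaf(times)
for stem in sorted(plot):
    print(stem, "|", ",".join(str(leaf) for leaf in plot[stem]))
```

Each printed row shows one stem followed by its leaves, in the same layout as the table in the solution.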

Frequency Distribution Table

This is another way to organize and present statistical data. The values in the set of data
are grouped into classes, and the number of cases in each class is recorded. The table consists of the class
intervals, class limits, class size or class width, frequency, class boundaries, and class mark.

CHECK YOUR PROGRESS 1

The following table lists the ages of customers who purchased a cruise. Construct a stem-and-
leaf diagram for the data.

Ages of Customers Who Purchased a Cruise

32 45 66 21 62 68 72

61 55 23 38 44 77 64

46 50 33 35 42 45 51

51 28 40 41 52 52 33

LESSON 2 Measure of Central Tendency


A measure of central tendency, or measure of location, is a summary measure that describes a whole
set of data with a single quantity that represents the middle or center of its distribution, that is, the central value around which the data cluster.
NOTE: Data are usually collected from a small portion of a large group.
Population: the entire group under consideration
Sample: any subset of the population

Characteristics of Measure of Central Tendency

According to Nature of Computation

Mean: Computational or calculated average
Median: Rank or positional average
Mode: Inspectional or commercial average

According to Sensitivity to Other Data

Mean: Easily affected by an increase or decrease in the number of data
Median: May or may not be affected by extreme values
Mode: May or may not be affected by the introduction of other data
Mean ($\bar{x}$)

• The mean is the average or arithmetic average of the scores or data.
• It is said to be the most reliable measure of central tendency and has the least probable error, but it does not supply information about the homogeneity of the distribution.

To compute the mean of ungrouped data, we can use the formula:

Mean ($\bar{x}$) for Ungrouped Data

$$\bar{x} = \frac{\sum x}{n}$$

that is, the sum of the numbers divided by $n$. Here $\bar{x}$ denotes the sample mean and $\mu$ (lowercase mu) denotes the population mean.

EXAMPLE 1
Six students in a Biology class of 20 students receive test grades of 92, 84, 65, 76, 88, and 90. Find the
mean of these test scores.

$$\bar{x} = \frac{\sum x}{n} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5$$

The mean of the test scores of the six students is 82.5.
CHECK YOUR PROGRESS 2
1. A doctor ordered 4 separate blood tests to measure a patient’s blood cholesterol level. The test
results were: 245, 235, 220, and 210. Find the mean of the test results.
2. Find the mean of the scores in a Math quiz: 20, 27, 23, 28, 23, 25

Weighted Mean

The weighted mean is a mean calculated by giving values in a data set more influence according to some attribute of the data. It is an average in which each quantity to be averaged is assigned a weight.
The formula for the weighted mean is

$$WM = \frac{\sum wx}{\sum w},$$

where $w$ is the weight of each value and $x$ is the matching value.
EXAMPLE
Compute the general weighted average (GWA) of Carla in her four major subjects in the first semester.

Subject          Grade (x)   Units (w)
Statistics       1.75        4
Linear Algebra   2.75        3
Calculus 1       2.50        4
Solid Geometry   1.75        3

$$GWA = \frac{\sum wx}{\sum w} = \frac{(4)(1.75) + (3)(2.75) + (4)(2.50) + (3)(1.75)}{4 + 3 + 4 + 3} = \frac{30.5}{14} \approx 2.18$$

Carla’s GWA for the first semester is 2.18.
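
As a quick check of the weighted-mean formula, here is a minimal Python sketch. The function name `weighted_mean` is an assumption made for illustration; the grades and units are the ones from Carla's example.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for matching lists of values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grades = [1.75, 2.75, 2.50, 1.75]   # x: grade in each subject
units = [4, 3, 4, 3]                # w: units of each subject

gwa = weighted_mean(grades, units)
print(round(gwa, 2))  # 2.18
```

Subjects with more units pull the average toward their grade, which is why the result differs from the plain mean of the four grades.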
CHECK YOUR PROGRESS 3

At MJR fitness and health society, 60% of the members are women and 40% are men. What is the
average age of all the members if the average age of women is 35 and the average age of the men is 30?

Mean of Grouped Data using Class Mark

$$\bar{x} = \frac{\sum (fx)}{\sum f}$$

Where: $\bar{x}$ = mean of grouped data
$f$ = frequency of each class
$x$ = class mark of each class
Example
Consider the frequency distribution below:

Class Interval   Frequency (f)
75-79            5
70-74            7
65-69            8
60-64            10
55-59            7
50-54            9
45-49            4

Determine the mean of the distribution.
Solution:

First, get the midpoint or class mark of each class interval by adding the lower limit and the upper limit, then dividing by 2. For example, with upper limit 79 and lower limit 75, the class mark is $\frac{79 + 75}{2} = 77$. Next, multiply the frequency of each class by the corresponding class mark. Then, get the sum of the products.

Class Interval   Frequency (f)   Class mark (x)   fx
75-79            5               77               385
70-74            7               72               504
65-69            8               67               536
60-64            10              62               620
55-59            7               57               399
50-54            9               52               468
45-49            4               47               188
                 N = 50                           ∑fx = 3100

From the values in the table, we can compute the value of the mean by substituting into the formula:

$$\bar{x} = \frac{\sum (fx)}{\sum f} = \frac{3100}{50} = 62$$

The mean of the data is 62.
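
The class-mark method can be sketched in Python as follows. This is an illustrative sketch only: the function name `grouped_mean` and the tuple layout are assumptions, and the frequencies used are the ones consistent with the totals N = 50 and ∑fx = 3100 stated in the solution.

```python
def grouped_mean(classes):
    """Mean of grouped data using class marks.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    """
    total_f = sum(f for _, _, f in classes)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    # class mark = (lower limit + upper limit) / 2
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


# Frequencies chosen to match N = 50 and sum(fx) = 3100 from the example
classes = [(75, 79, 5), (70, 74, 7), (65, 69, 8), (60, 64, 10),
           (55, 59, 7), (50, 54, 9), (45, 49, 4)]

mean = grouped_mean(classes)
print(mean)  # 62.0
```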

CHECK YOUR PROGRESS 4

The heights of 40 grade 6 pupils in a certain grade school are presented in a frequency
distribution as shown below:
Height of a class of 40 Students
Class Interval Frequency(f)
48-52 4
53-57 7
58-62 7
63-67 8
68-72 6
73-77 6
78-82 2
N=40
Determine the average height of the students using the midpoint method.

Median ($\tilde{x}$) of Ungrouped Data

The median is the middle value in a set of quantities; it falls in the middlemost position of
the whole data set and separates an ordered set of data into two equal parts.
The median of ungrouped data is determined by first arranging the values from lowest to highest or vice versa.
* If there is an odd number of values, the median is the middlemost value, with
the same number of values below and above it.
* If there is an even number of values in the list, the middle pair must be determined,
added together, and divided by 2 to find the median.
The median can be used to determine an approximate average.

Example:
Find the median of the data in the following lists.
a. 4, 8, 1, 14, 9, 21, 12
b. 46, 23, 92, 89, 77, 108

Solution

a. The list 4, 8, 1, 14, 9, 21, 12 contains 7 numbers. Ranking the numbers from smallest to largest
gives 1, 4, 8, 9, 12, 14, 21. The middle number is 9, thus the median is 9.

b. The list 46, 23, 92, 89, 77, 108 contains 6 numbers. Ranking the numbers from smallest to largest
gives 23, 46, 77, 89, 92, 108. The two middle numbers are 77 and 89. Thus

$$\tilde{x} = \frac{77 + 89}{2} = \frac{166}{2} = 83,$$

and the median is 83.
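
The two cases (odd and even number of values) can be sketched in Python. The function name `median` below is an illustrative choice, not something defined in the lesson.

```python
def median(values):
    """Middle value of the sorted data; mean of the middle pair if the count is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd count: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle pair


median_a = median([4, 8, 1, 14, 9, 21, 12])
median_b = median([46, 23, 92, 89, 77, 108])
print(median_a, median_b)  # 9 83.0
```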

CHECK YOUR PROGRESS 5

Find the median of the data in the following list.

1. 14,27,3,82,64,34,8,51
2. 21.3, 37.4, 11.6, 82.5, 17.2

Median ($\tilde{x}$) of Grouped Data

$$\tilde{x} = LB_{MC} + \left[\frac{\frac{N}{2} - {<}cf_b}{f_{MC}}\right] i$$

Where: $\tilde{x}$ = median of grouped data
$LB_{MC}$ = lower boundary of the median class
$N$ = sum of the frequencies
${<}cf_b$ = less-than cumulative frequency of the class before the median class
$f_{MC}$ = frequency of the median class
$i$ = class size

Steps:
a. Compute the less-than cumulative frequency (<cf) of the data.
b. Compute the value of $\frac{N}{2}$.
c. Locate the computed value of $\frac{N}{2}$ in the <cf column: the median class is the first interval whose <cf is greater than or equal to $\frac{N}{2}$.
d. Look at the <cf corresponding to the median class, then get the <cf of the class before the median class (${<}cf_b$).
e. Subtract ${<}cf_b$ from $\frac{N}{2}$.
f. Divide the answer in step e by the frequency of the median class.
g. Multiply the answer in step f by the value of $i$. To determine the value of $i$, subtract the lower
limit from the upper limit in any of the class intervals, then add 1.
h. Add the answer in step g to the lower boundary ($LB_{MC}$) of the median class. The answer in
this step is the median value of the data set.

Example
The record of 21 people in a 100m race is summarized in the given frequency table:
Time (in seconds) Frequency
51-55 2
56-60 7
61-65 8
66-70 4
N=21
Determine the median of the given data.

Solution
a. Compute the less-than cumulative frequency (<cf) of the data.

Time (in seconds)   Frequency   <cf
51-55               2           2
56-60               7           9
61-65               8           17
66-70               4           21
                    N = 21

b. Compute the value of $\frac{N}{2}$: $\frac{N}{2} = \frac{21}{2} = 10.5$
c. Locate the computed value of $\frac{N}{2}$ in the <cf column. The interval corresponding to that <cf value is the median class.
• Looking at the <cf column, 10.5 is greater than 9 but does not exceed 17, so it falls in the class whose <cf is 17. The interval that corresponds to 17 is 61-65, which is the median class.
d. Look at the <cf corresponding to the median class, then get the <cf before the median class.
• The ${<}cf_b$ (<cf before the median class) is 9.
e. Subtract ${<}cf_b$ from $\frac{N}{2}$: $\frac{N}{2} - {<}cf_b = 10.5 - 9 = 1.5$
f. Divide the answer in step e by the frequency of the median class: $\frac{1.5}{8} = 0.1875$
g. Multiply the answer in step f by the value of $i$. To determine the value of $i$, subtract the lower limit from the upper limit in any of the class intervals, then add 1.
• $i = (65 - 61) + 1 = 5$
• $0.1875 \times 5 = 0.9375$
h. Add the answer in step g to the lower boundary ($LB_{MC}$) of the median class. The answer in this step is the median value of the data set.
• 60.5 is the lower boundary of the median class.
• $60.5 + 0.9375 = 61.4375 \approx 61.44$

Using the formula, we have:

$$\tilde{x} = LB_{MC} + \left[\frac{\frac{N}{2} - {<}cf_b}{f_{MC}}\right] i = 60.5 + \left[\frac{\frac{21}{2} - 9}{8}\right] 5 = 60.5 + \left[\frac{10.5 - 9}{8}\right] 5 = 60.5 + 0.9375 = 61.4375 \approx 61.44$$

Hence, the median of the data set in the problem is 61.44 seconds.
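
The steps for the grouped median can be sketched in Python. This is a sketch under stated assumptions: the name `grouped_median` and the tuple layout are illustrative, the classes are listed in ascending order, and the class limits are whole numbers so that the lower boundary is the lower limit minus 0.5.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: (lower_limit, upper_limit, frequency) tuples in ascending order,
    with integer class limits (assumed, so boundary = lower limit - 0.5).
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = n / 2
    cf_before = 0                         # <cf of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cf_before + f >= target:
            lower_boundary = lower - 0.5
            class_size = upper - lower + 1
            return lower_boundary + (target - cf_before) / f * class_size
        cf_before += f
    raise ValueError("median class not found")


race_times = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
median_time = grouped_median(race_times)
print(round(median_time, 2))  # 61.44
```

The loop keeps a running cumulative frequency, so no separate <cf column has to be built first.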

CHECK YOUR PROGRESS 6

The heights of 40 grade 6 pupils in a certain grade school are presented in a frequency
distribution as shown below:
Height of a class of 40 Students
Class Interval Frequency(f)
48-52 4
53-57 7
58-62 7
63-67 8
68-72 6
73-77 6
78-82 2
N=40
Determine the median of the height of grade six pupils.

Mode ($\hat{x}$) of Ungrouped Data

The mode is the value with the highest frequency. If no value is repeated in
the list, then there is no mode.
• Unimodal distribution: contains only one mode
• Bimodal distribution: contains two modes
• Trimodal distribution: contains three modes
Example:
Find the mode of the data in the following lists.
a. 18, 15, 21, 16, 15, 14, 15, 21: the mode is 15.
b. 2, 5, 8, 9, 11, 4, 7, 23: there is no mode, since no number is repeated.

CHECK YOUR PROGRESS 7


Find the mode of the data in the following lists.
a.3,3,3,3,4,4,5,5,5,8 b.12,34,12,71,48,93,71
Mode (𝒙̂) of Grouped Data

$$\hat{x} = LB_{MoC} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$

Where: $\hat{x}$ = mode of grouped data
$LB_{MoC}$ = lower boundary of the modal class
$i$ = class size
$\Delta_1$ = difference between the frequency of the modal class and the frequency of the class just before it (the next lower interval)
$\Delta_2$ = difference between the frequency of the modal class and the frequency of the class just after it (the next higher interval)

Steps
a. Identify the modal class by determining the interval with the highest frequency.
b. Determine the lower boundary ($LB_{MoC}$) of the modal class.
c. Calculate $\Delta_1$ and $\Delta_2$.
d. Determine the value of $i$ by subtracting the lower limit from the upper limit in any of the class
intervals, then adding 1.
e. Substitute the values into the formula.

Example

The record of 21 people in a 100m race is summarized in the given frequency table:
Time (in seconds) Frequency
51-55 2
56-60 7
61-65 8
66-70 4
N=21
Determine the mode of the given data.
Solution
a. Identify the modal class by determining the interval with the highest frequency.
• The highest frequency is 8 and the corresponding interval is 61-65, which means that this is the modal class.
b. Determine the lower boundary ($LB_{MoC}$) of the modal class: $LB_{MoC} = 60.5$
c. Calculate $\Delta_1$ and $\Delta_2$.
• $\Delta_1 = 8 - 7 = 1$
• $\Delta_2 = 8 - 4 = 4$
d. Determine the value of $i$ by subtracting the lower limit from the upper limit in any of the class intervals, then adding 1.
• $i = (65 - 61) + 1 = 5$
e. Substitute the values into the formula.

$$\hat{x} = LB_{MoC} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i = 60.5 + \left(\frac{1}{1 + 4}\right) 5 = 60.5 + \left(\frac{1}{5}\right) 5 = 60.5 + 1 = 61.5$$

Therefore, the mode of the data is 61.5.
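
A minimal Python sketch of the grouped-mode steps follows. The name `grouped_mode` is illustrative, and several choices are assumptions not stated in the lesson: classes are in ascending order with whole-number limits, a missing neighbouring class is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(classes):
    """Mode of grouped data.

    classes: (lower_limit, upper_limit, frequency) tuples in ascending order.
    Assumptions: integer limits; a missing neighbour counts as frequency 0;
    the first class with the highest frequency is used as the modal class.
    """
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                    # position of the modal class
    lower, upper, f_modal = classes[m]
    f_before = freqs[m - 1] if m > 0 else 0
    f_after = freqs[m + 1] if m < len(freqs) - 1 else 0
    delta1 = f_modal - f_before
    delta2 = f_modal - f_after
    if delta1 + delta2 == 0:
        raise ValueError("formula is undefined when delta1 + delta2 = 0")
    class_size = upper - lower + 1
    return (lower - 0.5) + delta1 / (delta1 + delta2) * class_size


race_times = [(51, 55, 2), (56, 60, 7), (61, 65, 8), (66, 70, 4)]
mode_time = grouped_mode(race_times)
print(mode_time)  # 61.5
```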
CHECK YOUR PROGRESS 8

The heights of 40 grade 6 pupils in a certain grade school are presented in a frequency
distribution as shown below:
Height of a class of 40 Students
Class Interval Frequency(f)
48-52 4
53-57 7
58-62 7
63-67 8
68-72 6
73-77 6
78-82 2
N=40
Determine the mode of the height of grade six pupils.

LESSON 3 Measure of Relative Position


Measures of relative position are used to locate the relative position of an observation in a
set of data, and they are said to be natural extensions of the median. The common measures of
relative position are quartiles, deciles, percentiles, and standard scores (z-scores).

1. Quartiles divide a distribution into four equal parts. The lower quartile $Q_1$ (first quartile) is the value of the variable below which 25% of the cases lie. The formulas are as follows:

Ungrouped Data

$$Q_k = \frac{k(N + 1)}{4}\text{th item}$$

Grouped Data

$$Q_k = LB_{Q_k} + \left[\frac{\frac{kN}{4} - {<}cf_b}{f_{Q_k}}\right] i$$

Where: $Q_k$ = kth quartile
$LB_{Q_k}$ = lower boundary of the kth quartile class
${<}cf_b$ = less-than cumulative frequency of the class before the kth quartile class
$f_{Q_k}$ = frequency of the kth quartile class
$i$ = class size
$N$ = total number of observations

This means that 25%, 50%, or 75% of the observations lie below $Q_1$, $Q_2$, or $Q_3$, respectively.

2. Deciles are natural extensions of the median that divide a distribution into ten equal parts.
The lower decile $D_1$ (first decile) is the value of the variable below which 10% of the
cases lie, and so on. The formulas are as follows:

Ungrouped Data

$$D_k = \frac{k(N + 1)}{10}\text{th item}$$

Grouped Data

$$D_k = LB_{D_k} + \left[\frac{\frac{kN}{10} - {<}cf_b}{f_{D_k}}\right] i$$

Where: $D_k$ = kth decile
$LB_{D_k}$ = lower boundary of the kth decile class
${<}cf_b$ = less-than cumulative frequency of the class before the kth decile class
$f_{D_k}$ = frequency of the kth decile class
$i$ = class size
$N$ = total number of observations

This means that 10%, 20%, 30%, ..., or 90% of the observations lie below $D_1$, $D_2$, $D_3$, ..., or $D_9$, respectively.

3. Percentiles are natural extensions of the median that divide a distribution into 100 equal
parts. There are 99 percentiles, denoted by $P_1, P_2, \ldots, P_{99}$. They are generally used to
characterize values according to the percentage of observations below them.

Ungrouped Data

$$P_k = \frac{k(N + 1)}{100}\text{th item}$$

Grouped Data

$$P_k = LB_{P_k} + \left[\frac{\frac{kN}{100} - {<}cf_b}{f_{P_k}}\right] i$$

Where: $P_k$ = kth percentile
$LB_{P_k}$ = lower boundary of the kth percentile class
${<}cf_b$ = less-than cumulative frequency of the class before the kth percentile class
$f_{P_k}$ = frequency of the kth percentile class
$i$ = class size
$N$ = total number of observations

This means that 1%, 2%, 3%, ..., or 99% of the observations lie below $P_1$, $P_2$, $P_3$, ..., or $P_{99}$, respectively.

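Example for measure of position of ungrouped data

Find $Q_3$, $D_9$, and $P_{77}$ of the following data set. Make a boxplot for the distribution.
Scores of students in the Math Quiz: 45, 59, 52, 46, 41, 26, 36, 34, 38, 41, 39, 38, 30, 49, 46, 51

Solution

First arrange the scores from lowest to highest:
26, 30, 34, 36, 38, 38, 39, 41, 41, 45, 46, 46, 49, 51, 52, 59

The formulas for ungrouped data give the position of the item in this ordered list, not the score itself. When the position is not a whole number, the score is found by interpolating between the two neighboring items.

a. Find $Q_3$ (k = 3, N = 16)

$$Q_3 = \frac{3(16 + 1)}{4} = \frac{3(17)}{4} = \frac{51}{4} = 12.75\text{th item}$$

The 12.75th item lies between the 12th score (46) and the 13th score (49), so $Q_3 = 46 + 0.75(49 - 46) = 48.25$.
This means that 75% of the students have scores below 48.25.

b. Find $D_9$ (k = 9, N = 16)

$$D_9 = \frac{9(16 + 1)}{10} = \frac{9(17)}{10} = \frac{153}{10} = 15.3\text{th item}$$

The 15.3th item lies between the 15th score (52) and the 16th score (59), so $D_9 = 52 + 0.3(59 - 52) = 54.1$.
This means that 90% of the students have scores below 54.1.

c. Find $P_{77}$ (k = 77, N = 16)

$$P_{77} = \frac{77(16 + 1)}{100} = \frac{77(17)}{100} = \frac{1309}{100} = 13.09\text{th item}$$

The 13.09th item lies between the 13th score (49) and the 14th score (51), so $P_{77} = 49 + 0.09(51 - 49) = 49.18$.
This means that 77% of the students have scores below 49.18.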
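
The position formula k(N + 1)/m for ungrouped data can be sketched in Python. Two things here are assumptions for illustration: the function name `quantile_ungrouped`, and the use of linear interpolation between neighboring ordered items to turn a fractional position into a score (positions outside 1..N are clamped to the ends).

```python
def quantile_ungrouped(values, k, parts):
    """Return (position, value) of the k-th quantile using position = k(N + 1)/parts.

    parts is 4 for quartiles, 10 for deciles, 100 for percentiles.
    Assumption: fractional positions are resolved by linear interpolation.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / parts            # 1-based position in the ordered list
    clamped = min(max(position, 1), n)        # keep the position inside the data
    lower_index = int(clamped)                # whole part of the position
    fraction = clamped - lower_index
    if lower_index >= n:
        return position, ordered[-1]
    lower_item = ordered[lower_index - 1]
    upper_item = ordered[lower_index]
    return position, lower_item + fraction * (upper_item - lower_item)


scores = [45, 59, 52, 46, 41, 26, 36, 34, 38, 41, 39, 38, 30, 49, 46, 51]

q3_pos, q3 = quantile_ungrouped(scores, 3, 4)       # position 12.75
d9_pos, d9 = quantile_ungrouped(scores, 9, 10)      # position 15.3
p77_pos, p77 = quantile_ungrouped(scores, 77, 100)  # position 13.09
print(round(q3, 2), round(d9, 2), round(p77, 2))
```

Other interpolation conventions exist, so software packages may report slightly different quantile values for the same data.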

Example for measure of position for grouped data

Thirty students in a class took a Math test. The results are recorded in groups. The
data is shown in the table that follows:

Score   Frequency   <cf
70-79   2           30
60-69   3           28
50-59   2           25
40-49   7           23
30-39   9           16
20-29   7           7
        N = 30

Find the 1st quartile, 3rd decile, and 68th percentile.

Solution
1st quartile (k = 1, N = 30)
• Find the first quartile class: $\frac{kN}{4} = \frac{1 \cdot 30}{4} = 7.5$
• Locate 7.5 in the <cf column. It falls within <cf = 16, which corresponds to the interval 30-39.
• Next, solve using the formula. So we have,

$$Q_k = LB_{Q_k} + \left[\frac{\frac{kN}{4} - {<}cf_b}{f_{Q_k}}\right] i$$

$$Q_1 = 29.5 + \left[\frac{7.5 - 7}{9}\right] 10 = 29.5 + 0.5555\ldots \approx 30.06$$

This means that 25% of the students scored below 30.06 points.

3rd decile (k = 3, N = 30)
• Find the third decile class: $\frac{kN}{10} = \frac{3 \cdot 30}{10} = \frac{90}{10} = 9$
• Locate 9 in the <cf column. It falls within <cf = 16, which corresponds to the interval 30-39.
• Next, solve using the formula. So we have,

$$D_k = LB_{D_k} + \left[\frac{\frac{kN}{10} - {<}cf_b}{f_{D_k}}\right] i$$

$$D_3 = 29.5 + \left[\frac{9 - 7}{9}\right] 10 = 29.5 + 2.2222\ldots \approx 31.72$$

This means that 30% of the students scored below 31.72 points.

68th percentile (k = 68, N = 30)
• Find the 68th percentile class: $\frac{kN}{100} = \frac{68 \cdot 30}{100} = \frac{2040}{100} = 20.4$
• Locate 20.4 in the <cf column. It falls within <cf = 23, which corresponds to the interval 40-49.
• Next, solve using the formula. So we have,

$$P_{68} = 39.5 + \left[\frac{20.4 - 16}{7}\right] 10 = 39.5 + 6.2857\ldots \approx 45.79$$

This means that 68% of the students scored below 45.79 points.
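
The grouped-data formulas for quartiles, deciles, and percentiles share one pattern, which the following Python sketch illustrates. The name `grouped_quantile` and the tuple layout are assumptions; the sketch also assumes classes in ascending order with whole-number limits, so that the lower boundary is the lower limit minus 0.5.

```python
def grouped_quantile(classes, k, parts):
    """k-th quantile of grouped data: LB + ((kN/parts - <cf_b) / f) * i.

    classes: (lower_limit, upper_limit, frequency) tuples in ascending order.
    parts is 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = k * n / parts
    cf_before = 0                      # <cf of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cf_before + f >= target:
            class_size = upper - lower + 1
            return (lower - 0.5) + (target - cf_before) / f * class_size
        cf_before += f
    raise ValueError("quantile class not found")


# Math test scores from the example, listed from the lowest class upward
test_scores = [(20, 29, 7), (30, 39, 9), (40, 49, 7),
               (50, 59, 2), (60, 69, 3), (70, 79, 2)]

q1 = grouped_quantile(test_scores, 1, 4)
d3 = grouped_quantile(test_scores, 3, 10)
p68 = grouped_quantile(test_scores, 68, 100)
print(round(q1, 2), round(d3, 2), round(p68, 2))  # 30.06 31.72 45.79
```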

The standard score (z-score) specifies how many standard deviations an observation is from (above
or below) the mean. It can be calculated using the formula

$$z = \frac{X - \mu}{\sigma} \quad \text{or} \quad z = \frac{X - \bar{X}}{s}$$

where $z$ is the z-score, $X$ is the value of the element, $\mu$ is the population mean, and $\sigma$ is the
population standard deviation; for a sample, $\bar{X}$ is the sample mean and $s$ is the sample standard deviation. Z-scores are also a way to compare results from a test to a normal
population.

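Examples:
Convert the following scores to z-scores, where $\mu = 65$ and $\sigma = 15$:
a.) 55 b.) 70 c.) 85

a. $z = \frac{X - \mu}{\sigma} = \frac{55 - 65}{15} \approx -0.67$
b. $z = \frac{X - \mu}{\sigma} = \frac{70 - 65}{15} \approx 0.33$
c. $z = \frac{X - \mu}{\sigma} = \frac{85 - 65}{15} \approx 1.33$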
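
The three conversions above can be checked with a short Python sketch. The function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Scores from the example, with mu = 65 and sigma = 15
z_values = [round(z_score(x, 65, 15), 2) for x in (55, 70, 85)]
print(z_values)  # [-0.67, 0.33, 1.33]
```

A negative z-score means the score is below the mean; a positive one means it is above.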

CHECK YOUR PROGRESS 9

I.Compute the 3rd decile, 9th decile and 75th percentile.

20,27,23,28,23,25

II.Thirty students in a class took a Math test. The results are recorded in groups. The data is shown in
the table that follows:
Score frequency <cf
70-79 2 30
60-69 3 28
50-59 2 25
40-49 7 23
30-39 9 16
20-29 7 7
N=30
Find the 2nd quartile, 4th decile, and 80th percentile.

LESSON 4 Measure of Variability


Measure of Variability

Measures of variability, also called measures of spread or measures of dispersion, are the measures used to determine how clustered together or how far apart the values in a set of
data are with reference to the center (i.e., the mean) of the distribution.

Significance of measures of variability

It determines the homogeneity and heterogeneity of the set of data.

Homogeneity- the quality or state of all being the same

Heterogeneity-the quality or state of being diverse

Interpretation of the result

• A smaller value of variability indicates that the scores in the data are clustered more closely
around the mean. There is data consistency, homogeneity, and uniformity of
distribution.
• A bigger measure of variability suggests heterogeneity and non-uniformity of distribution.
Four ways to describe the variability of a set of data

Range, Mean absolute deviation, Variance, Standard deviation

Range

The range is the distance between the highest and the lowest values in a set of data. It is an unreliable
measure of variability because only two values are considered.

R=Highest Score – Lowest Score

Example

Determine the range of the following set of data.

a. 76, 90,98,82,79,88,82,97,75 Range=98-75=23

b. 3,4,12,9,15,9,11,13 Range=15-3=12

c. 178,183,123,118,127,192, 200 Range=200-118=82

Mean Absolute Deviation (MAD)

It is one of the measures of variability that utilizes all values in a data set. It is defined as the
average of the absolute deviations of each score from the mean.
The formula for the mean absolute deviation is:

$$MAD = \frac{\sum |x - \bar{x}|}{N}$$

Where:
$x$ = individual score from the data set,
$\bar{x}$ = mean of the data set,
$\sum |x - \bar{x}|$ = sum of the absolute values of the differences of each score from the mean, and
$N$ = total number of observations in the data set
Steps:
1. Find the mean.
2. Calculate the distance of each score from the mean by getting the absolute value of
their difference.
3. Plug the appropriate values in the formula and solve.

Example:
Find the mean absolute deviation.
1.The shoe sizes of 8 students in a classroom are 6,6,7,8,8,9,10, and 10.

Solution
• The mean of the shoe sizes is $\frac{6 + 6 + 7 + 8 + 8 + 9 + 10 + 10}{8} = \frac{64}{8} = 8$

x (shoe size)   x − x̄   |x − x̄|
6               -2       2
6               -2       2
7               -1       1
8               0        0
8               0        0
9               1        1
10              2        2
10              2        2
TOTAL                    ∑|x − x̄| = 10

Thus,

$$MAD = \frac{\sum |x - \bar{x}|}{N} = \frac{10}{8} = 1.25$$

Therefore, the average distance of each shoe size from the mean is 1.25.
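
The three steps for the mean absolute deviation can be sketched in Python. The function name `mean_absolute_deviation` is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations of each value from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("data set must not be empty")
    mean = sum(values) / n                                  # step 1: find the mean
    total_distance = sum(abs(x - mean) for x in values)     # step 2: sum of |x - mean|
    return total_distance / n                               # step 3: divide by N


shoe_sizes = [6, 6, 7, 8, 8, 9, 10, 10]
mad = mean_absolute_deviation(shoe_sizes)
print(mad)  # 1.25
```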

Variance and Standard Deviation

Variance

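The variance is the basic measure of dispersion. It is the average of the squared deviations of each score
from the mean.

Standard Deviation

The standard deviation, the most commonly used measure of dispersion, is the square root of the variance.

Variance and Standard Deviation Formulas

For population variance

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

For sample variance

$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$$

For population standard deviation

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

For sample standard deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$

Here $\mu$ is the population mean, $\bar{x}$ is the sample mean, and $N$ is the number of observations.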
Example:
Compute the range, mean absolute deviation, variance, and standard deviation of the daily rates
(in pesos) of 7 employees in a company:
352, 388, 402, 391, 339, 362, and 426

Solution

a. Range = HS − LS = 426 − 339 = 87

b. Mean Absolute Deviation

$$\bar{x} = \frac{352 + 388 + 402 + 391 + 339 + 362 + 426}{7} = \frac{2660}{7} = 380$$

x     x − x̄   |x − x̄|   (x − x̄)²
352   -28      28         784
388   8        8          64
402   22       22         484
391   11       11         121
339   -41      41         1681
362   -18      18         324
426   46       46         2116
               ∑|x − x̄| = 174   ∑(x − x̄)² = 5574

$$MAD = \frac{\sum |x - \bar{x}|}{N} = \frac{174}{7} \approx 24.86$$

c. Sample Variance

$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = \frac{5574}{7 - 1} = 929$$

d. Sample Standard Deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}} = \sqrt{\frac{5574}{7 - 1}} = \sqrt{929} \approx 30.48$$
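
The four measures for the daily-rate data can be recomputed with a short Python sketch. The function names are illustrative assumptions. Note that the lowest rate in the list is 339, so the range is 426 − 339.

```python
import math


def data_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


daily_rates = [352, 388, 402, 391, 339, 362, 426]

rate_range = data_range(daily_rates)
variance = sample_variance(daily_rates)
std_dev = sample_std_dev(daily_rates)
print(rate_range, variance, round(std_dev, 2))  # 87 929.0 30.48
```

For a whole population, the divisor `n - 1` would be replaced by `n`, matching the population formulas.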

CHECK YOUR PROGRESS 10


Compute the range, mean absolute deviation, sample variance, and sample standard deviation.
1. The scores of the students in a Math quiz are: 43, 44, 50, 37, 29, 43, 34, and 48.
2. The number of minutes of phone calls by 9 random students from BSIT: 16.4, 15.2, 18.9,
12.5, 14.3, 20.7, 10.8, 11.4, and 10.9.
LESSON 5 Normal Distribution

A normal distribution forms a bell-shaped curve that is symmetric about a vertical
line through the mean of the data. A graph of a normal distribution is shown above.

Properties of Normal Distribution

Every normal distribution has the following properties.

• The graph is symmetric about a vertical line through the mean of the distribution.
• The mean, median, and mode are equal.
• The y-value of each point on the curve is the percent (expressed as a decimal) of the data
at the corresponding x-value.
• Areas under the curve that are symmetric about the mean are equal.
• The total area under the curve is 1.

QUESTION What is the area under the curve to the right of the mean for a
normal distribution?
Empirical Rule for a Normal Distribution

In a normal distribution, approximately

• 68% of the data lie within 1 standard deviation of the mean.
• 95% of the data lie within 2 standard deviations of the mean.
• 99.7% of the data lie within 3 standard deviations of the mean.
Example
A survey of 1000 Philippine gas stations found that the price charged
for a gallon of regular gas could be closely approximated by a normal distribution
with a mean of Php 3.10 and a standard deviation of Php 0.18. How many of the
stations charge
a. between Php 2.74 and Php 3.46 for a gallon of regular gas?
b. less than Php 3.28 for a gallon of regular gas?
c. more than Php 3.46 for a gallon of regular gas?

Solution

a. The Php 2.74 per gallon price is 2 standard deviations below the mean.
The Php 3.46 price is 2 standard deviations above the mean. In a normal
distribution, 95% of all data lie within 2 standard deviations of the mean.
Therefore approximately

(95%)(1000) = (0.95)(1000) = 950 of the stations charge between Php 2.74
and Php 3.46 for a gallon of regular gas.

b. The Php 3.28 price is 1 standard deviation above the mean. In a
normal distribution, 34% of all data lie between the mean and 1
standard deviation above the mean. Thus approximately

(34%)(1000) = (0.34)(1000) = 340 of the stations charge between Php 3.10
and Php 3.28 for a gallon of regular gas. Half of the 1000 stations,
or 500 stations, charge less than the mean. Therefore about
340 + 500 = 840 of the stations charge less than Php 3.28 for a gallon of
regular gas.

c. The Php 3.46 price is 2 standard deviations above the mean. In a normal
distribution, 95% of all data are within 2 standard deviations of the
mean. This means that the other 5% of the data lie either more than
2 standard deviations above the mean or more than 2 standard deviations
below the mean. We are interested only in the data that are more
than 2 standard deviations above the mean, which is 1/2 of 5%, or 2.5%, of
the data. Thus about (2.5%)(1000) = (0.025)(1000) = 25 of the stations
charge more than Php 3.46 for a gallon of regular gas.
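
The three counts in this example follow directly from the Empirical Rule, as the Python sketch below illustrates. The names `EMPIRICAL_WITHIN`, `count_within`, `count_below`, and `count_above` are assumptions made for illustration, and the percentages are the approximate 68%, 95%, and 99.7% figures rather than exact normal probabilities.

```python
# Approximate share of the data within k standard deviations of the mean
EMPIRICAL_WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}


def count_within(total, k):
    """Approximate count within k standard deviations of the mean."""
    return round(total * EMPIRICAL_WITHIN[k])


def count_below(total, k):
    """Approximate count below (mean + k sd): lower half plus half of the central band."""
    return round(total * (0.5 + EMPIRICAL_WITHIN[k] / 2))


def count_above(total, k):
    """Approximate count above (mean + k sd): half of what lies outside the central band."""
    return round(total * (1 - EMPIRICAL_WITHIN[k]) / 2)


stations = 1000
between_274_and_346 = count_within(stations, 2)   # within 2 sd of Php 3.10
below_328 = count_below(stations, 1)              # Php 3.28 is 1 sd above the mean
above_346 = count_above(stations, 2)              # Php 3.46 is 2 sd above the mean
print(between_274_and_346, below_328, above_346)  # 950 840 25
```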

The Standard Normal Distribution
The standard normal distribution is the normal distribution that has a
mean of 0 and a standard deviation of 1.

Linear Correlation Coefficient

HISTORICAL NOTE

Karl Pearson
Karl Pearson spent most of his career as a mathematics professor
at University College, London. Some of his major contributions
concerned the development of statistical procedures such as
regression analysis and correlation. He was particularly
interested in applying statistical concepts to the study of heredity.
The term standard deviation was introduced by Pearson, and
because of his work in the area of correlation, the formal name
given to the linear correlation coefficient is the Pearson product-moment
correlation coefficient. He was also a co-founder of the statistical
journal Biometrika.

To determine the strength of a linear relationship between two variables, statisticians use a statistic
called the linear correlation coefficient, which is denoted by the variable r and is defined as
follows.

Linear Correlation Coefficient

For the n ordered pairs $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$, the linear correlation coefficient r is given by

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$
 If the linear correlation coefficient r is positive, the relationship between the variables has
a positive correlation. In this case, if one variable increases, the other variable tends to
increase.
 If r is negative, the linear relationship between the variables has a negative correlation. In
this case, if one variable increases, the other variable tends to decrease.

Strength of Relationship

Values of Correlation r Interpretation


±.80 to ±.99 Very High correlation
±.60 to ±.79 High correlation
±.40 to ±.59 Moderate correlation
±.20 to ±.39 Low correlation
±.01 to ±.19 Negligible correlation

Example
Find the value of correlation and the strength of relationship of the hours
spent playing basketball and Algebra test scores.
Hours spent playing 4 5 7 8 10
basketball(x)
Weekly Algebra test 52 60 72 79 83
score(y)

83
Solution
hours spent playing Weekly Algebra Test Score(y) 𝑥𝑦 𝑥2 𝑦2
basketball(x)
4 52 208 16 2704
5 60 300 25 3600
7 72 504 49 5184
8 79 632 64 6241
10 83 830 100 6889
∑ 𝑥 =34 ∑ 𝑦 =346 2
∑ 𝑥𝑦 =2474 ∑ 𝑥 =254 2
∑ 𝑦 =24618
n=5
Applying the formula, we have

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$

$$r = \frac{5(2474) - (34)(346)}{\sqrt{[5(254) - (34)^2][5(24618) - (346)^2]}}$$

$$r = \frac{12370 - 11764}{\sqrt{(1270 - 1156)(123090 - 119716)}}$$

$$r = \frac{606}{\sqrt{(114)(3374)}}$$

$$r = \frac{606}{\sqrt{384636}}$$

$$r \approx 0.98$$
The value of the correlation is 0.98, which means that there is a very high positive relationship
between the students’ Algebra test scores and the time they spent playing basketball.
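
The arithmetic in this example can be checked with a short Python sketch. It uses only the five (x, y) pairs from the example; the variable names are illustrative.

```python
from math import sqrt

hours = [4, 5, 7, 8, 10]        # x: hours spent playing basketball
scores = [52, 60, 72, 79, 83]   # y: weekly Algebra test scores

n = len(hours)
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 34 346 2474 254 24618
print(numerator, round(r, 2))                # 606 0.98
```

Printing the column totals alongside r makes it easy to compare each intermediate value with the solution table before trusting the final result.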

CHAPTER SUMMARY

LESSON 1 Data Collection and Management

Statistics: Statistics involves the collection, organization, summarization, presentation, and interpretation of data.

Branches of Statistics: Descriptive and inferential statistics.

Data Collection Methods: The data collection methods are interview; survey and questionnaire; census; observation; experimentation; and documents or records review.

Stem-and-Leaf Plots: A stem-and-leaf plot is a way to organize numerical data using a two-column arrangement. The first column (stem) consists of the digits other than the ones digit. The ones digits are listed in order in the second column (leaf).

LESSON 2 Measure of Central Tendency

Measure of Central Tendency: A measure of central tendency, or measure of location, is a summary measure that describes a whole set of data with a single quantity that represents the middle or center of its distribution, the central value around which the data cluster.

Mean: The mean is the arithmetic average of the scores or data.

Median: The median is the middle value in a set of quantities. It falls in the middlemost position of the ordered data and separates the ordered set into two equal parts.

Mode: The mode is the quantity with the highest frequency. If no number in the list is repeated, then there is no mode.

LESSON 3 Measure of Relative Position

Quartiles: Quartiles are natural extensions of the median that divide a distribution into four equal parts. The lower quartile $Q_1$ (first quartile) is the value of the variable below which 25% of the cases lie.

Deciles: Deciles are natural extensions of the median that divide a distribution into ten equal parts. The lower decile $D_1$ (first decile) is the value of the variable below which 10% of the cases lie, and so on.

Percentiles: Percentiles are natural extensions of the median that divide a distribution into 100 equal parts. There are 99 percentiles, denoted by $P_1, P_2, \ldots, P_{99}$. They are generally used to characterize values according to the percentage below them.

LESSON 4 Measure of Variability

Measure of Variability: Measures of variability are used to determine how clustered together or how far apart the values in a set of data are with reference to the center (i.e., the mean) of the distribution.

Range: The distance between the highest and the lowest values in a set of data.

Mean Absolute Deviation (MAD): The average of the absolute deviations of each score from the mean.

Variance: The basic measure of dispersion; the average of the squared deviations of each score from the mean.

Standard Deviation: The most commonly used measure of dispersion; the square root of the variance.

LESSON 5 Normal Distribution

Normal Distribution: A normal distribution forms a bell-shaped curve that is symmetric about a vertical line through the mean of the data.

Standard Normal Distribution: The standard normal distribution is the normal distribution that has a mean of 0 and a standard deviation of 1.

Linear Correlation Coefficient: The linear correlation coefficient, denoted by the variable r, is a statistic used to determine the strength of the linear relationship between two variables.
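
The measures summarized in Lessons 2 and 4 can be sketched with Python's standard `statistics` module. The data are the weekly Algebra test scores reused from the correlation example. One assumption is labeled in the code: the variance here divides by n, matching the summary's wording "the average of the squared deviations"; if the sample variance (dividing by n − 1) is intended, `statistics.variance` and `statistics.stdev` would be used instead.

```python
from collections import Counter
from statistics import mean, median, pstdev, pvariance

# Weekly Algebra test scores reused from the correlation example.
scores = [52, 60, 72, 79, 83]

center = mean(scores)    # arithmetic average
middle = median(scores)  # middle value of the ordered data

# Mode: values with the highest frequency; empty list when nothing repeats.
counts = Counter(scores)
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest and highest > 1]

data_range = max(scores) - min(scores)
mad = mean([abs(score - center) for score in scores])  # mean absolute deviation

# Assumption: population formulas (divide by n), per the summary's wording.
variance = pvariance(scores)
std_dev = pstdev(scores)

print(center, middle, modes, data_range)
print(round(mad, 2), round(variance, 2), round(std_dev, 2))
```

The mode is computed by hand rather than with `statistics.mode`, because the summary states that a list with no repeated number has no mode, which the empty list represents here.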

CHAPTER TEST

Test I. Find the mean, median, and mode(s), if any, for the given data. Round noninteger
means to the nearest tenth. (Note: Copy and answer)

     Given                                        Mean    Median    Mode
1.   2, 7, 5, 7, 14
2.   8, 3, 3, 17, 9, 22, 19
3.   11, 8, 2, 5, 17, 39, 52, 42
4.   101, 88, 74, 60, 12, 94, 74, 85
5.   255, 178, 192, 154, 202, 188, 178, 201

Test II. This table shows the overall achievement required for different awards in a tertiary
science subject. (Solution and answer only)
AWARD               ACHIEVEMENT
High Distinction    80% and above
Distinction         70%-79%
Credit              60%-69%
Satisfactory        50%-59%
Unsatisfactory      Below 50%

The science subject has three assessment tasks. Each task is weighted as follows:
Assessment Task 1: weight 60%
Assessment Task 2: weight 30%
Assessment Task 3: weight 10%

Alex’s result for each task was: Assessment Task 1: 70%; Assessment Task 2: 80%; and
Assessment Task 3: 90%. What is Alex’s award for science?

A. High Distinction B. Distinction C. Credit D. Satisfactory

Test III. Find the mean, median, mode, $Q_1$, $D_6$, and $P_{58}$ of the following grouped data.

Weight of 50 women in a fitness club. (Solution and answer only)

Weight in lbs    Frequency
129-136          2
121-128          7
113-120          6
105-112          5
97-104           10
89-96            12
81-88            8

Test IV. Fast-food Calories

A survey of 10 fast-food restaurants noted the number of calories in a mid-sized hamburger. The
results are given in the table below.

Calories in a mid-sized hamburger


514 507 502 498 496 506 458 478 461 514

Find the range, mean absolute deviation, variance, and standard deviation. Round the answers
to the nearest hundredth. (Solution and answer only)

Test V. Use z-Scores

A customer group tested a sample of 100 light bulbs. It found that the mean life span of the bulbs
was 842 h, with a standard deviation of 90 h. One particular light bulb from the DuraBright
Company had a z-score of 1.2. What was the life span of this light bulb?

Test VI. Construct a stem-and-leaf plot of the number of prescriptions a doctor wrote each
day over a 36-day period. (Answer only)

Number of Written Prescriptions per Day


8 12 14 10 9 16
7 14 10 7 11 16
11 12 8 14 13 10
9 14 15 12 10 8
10 14 8 7 12 15
14 10 9 13 10 12
Test VII. A psychologist wants to determine whether there is a relationship between how long it
takes a subject to complete a manual task and the number of hours of sleep the subject had the
night before. The results from a study of 10 people are given in the following table.

Hours of sleep                 6.2   8.1   7.5   8.4   5.0   6.2   4.8   8.0   3.8    5.9
Minutes to complete a task     9.0   8.6   8.4   8.6   10    9.3   9.9   8.9   10.4   9.1

Find the linear correlation coefficient for the data and the strength of the relationship. Round your
answer to the nearest hundredth.
