Lecture 9
Mathematics in the
Modern World
STATISTICS – PART I
WHAT IS STATISTICS?
Statistics is the mathematical discipline of collecting,
summarizing, analyzing, and drawing conclusions
from data.
• Descriptive Statistics – describing data via graphs, tables, or other statistical measures.
Applicable when you have a lot of data and want to summarize it appropriately.
• Inferential Statistics – inferring/estimating population characteristics from sample data.
If a sample represents a given population accurately, then analyzing the sample can lead
POPULATION AND SAMPLE
COMMON SAMPLING TECHNIQUES
Probability Sampling
• Simple Random – every member has an
equal chance of being selected. Selection
can be done with a random number
generator or by drawing lots.
• Systematic – every member of the
population is listed with a number and
individuals are chosen at regular intervals.
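The two techniques above can be sketched in Python. This is a minimal illustration, assuming a hypothetical population numbered 1 to 100; the function names and the `seed` parameter are not from the lecture.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick every k-th member of the numbered list after a random start."""
    k = len(population) // n          # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start within the first interval
    return population[start::k][:n]


students = list(range(1, 101))        # hypothetical population numbered 1..100
print(simple_random_sample(students, 10, seed=1))
print(systematic_sample(students, 10, seed=1))
```

With 100 members and a sample of 10, the systematic sample takes every 10th member, so consecutive picks are always 10 apart.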
COMMON SAMPLING TECHNIQUES
Probability Sampling
• Stratified – dividing the population into subpopulations
(strata) that may differ in important ways, based on
COMMON SAMPLING TECHNIQUES
Non-Probability Sampling
• Convenience – selects the individuals who
happen to be the most accessible to the
researcher.
• Purposive – also known as judgment
sampling; the researcher uses their
expertise to select a sample that is
most useful to the research.
• Snowball – recruit participants via other
participants.
• Quota – relies on the non-random
selection of a predetermined number or
proportion of units, known as a quota. The
population is divided into mutually
exclusive subgroups (strata), and participants
are then recruited until the quota is reached.
MEASURES OF CENTRAL TENDENCY
• Used to find the central value of a set of
data. This single value represents the entire data set.
• Three measures of central tendency:
1. Mean – average of all data
2. Median – middlemost value in the ordered
arrangement of values in the dataset
3. Mode – most frequently occurring
item/value in a dataset.
Mean
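The formula for this slide did not survive extraction. Assuming the lecture uses the standard arithmetic mean (consistent with the worked exercises, which divide the sum of the observations by their count), it can be written as follows, where $x_i$ are the observations and $n$ is their number:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```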
Median
• The middle value of a set of observations arranged in increasing or
decreasing order of magnitude, denoted by $\tilde{x}$.
• It is a positional value unaffected by the presence of extreme values
or outliers. Thus, it is more appropriate to report than the mean
when outliers are present.
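• Odd number of observations n: the median is the value in position (n + 1)/2.
• Even number of observations n: the median is the mean of the values in positions n/2 and n/2 + 1.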
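The odd and even cases for locating the median can be sketched in Python as follows. The function name `median_by_position` is illustrative and not from the lecture; the data are the two exercise sets used later in this lecture.

```python
def median_by_position(values):
    """Return the median using the positional rule for odd and even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2 (1-based).
        return ordered[mid]
    # Even n: the mean of the values in positions n/2 and n/2 + 1 (1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position([89, 79, 87, 97, 94, 80, 85]))              # 87
print(median_by_position([34, 42, 27, 34, 45, 32, 31, 44, 33, 31]))  # 33.5
```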
Mode
• The value that occurs the greatest number of times or with the
highest frequency.
• Appropriate for nominal or categorical type of data.
• If observations occur with equal frequencies, then there is no modal
value for the data set.
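A minimal Python sketch of the mode, including the "no modal value" rule above. The function name `modes` is an assumption for illustration. It returns every value tied for the highest frequency, and an empty list when all distinct values occur equally often.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if all values tie."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Several distinct values with identical frequencies: no modal value.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([34, 42, 27, 34, 45, 32, 31, 44, 33, 31]))  # [31, 34]
print(modes([89, 79, 87, 97, 94, 80, 85]))              # []
print(modes(["red", "blue", "red", "green"]))           # ['red']
```

Because it only counts occurrences, the same function works for nominal or categorical data such as the color labels in the last call.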
Exercise
A student listed 10 of his classmates' scores in GE 4 prelims:
34, 42, 27, 34, 45, 32, 31, 44, 33 and 31.
Arranged in increasing order: 27, 31, 31, 32, 33, 34, 34, 42, 44, 45.
Calculate the following:
a. Mean: $\bar{x} = \frac{34 + 42 + 27 + 34 + 45 + 32 + 31 + 44 + 33 + 31}{10} = \frac{353}{10} = 35.30$
b. Median: $\tilde{x} = \frac{33 + 34}{2} = 33.50$
Exercise: 89, 79, 87, 97, 94, 80, 85
Arranged in increasing order: 79, 80, 85, 87, 89, 94, 97
Calculate the following:
a. Mean: $\bar{x} = \frac{89 + 79 + 87 + 97 + 94 + 80 + 85}{7} = \frac{611}{7} \approx 87.29$
b. Median: $\tilde{x} = 87$
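The two exercises can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

prelim_scores = [34, 42, 27, 34, 45, 32, 31, 44, 33, 31]
statistics_grades = [89, 79, 87, 97, 94, 80, 85]

for label, data in (("prelims", prelim_scores), ("grades", statistics_grades)):
    mean = statistics.mean(data)      # sum of the values divided by their count
    median = statistics.median(data)  # middle value of the ordered data
    print(label, round(mean, 2), median)
# prelims 35.3 33.5
# grades 87.29 87
```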
MEASURES OF RELATIVE POSITION
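1. Percentiles: position of $P_k = \frac{kN}{100}$
2. Quartiles: position of $Q_k = \frac{kN}{4}$
where:
N – total number of data
k – index of the percentile or quartile (kth position)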
Exercise
Given the following scores in an Algebra test:
SCORES: 34, 42, 27, 34, 45, 32, 31, 44, 33, 31
Calculate the following:
a. Q3
b. D6
c. P40
Exercise: ranked scores (R = rank)
R     SCORES
1     27
2     31
3     31
4     32
5     33
6     34
7     34
8     42
9     44
10    45
a.) Position of $Q_3 = \frac{3(10)}{4} = 7.5$, so $Q_3 = 34 + 0.5(42 - 34) = 38$
b.) Position of $D_6 = \frac{6(10)}{10} = 6$, so $D_6 = 34$
c.) Position of $P_{40} = \frac{40(10)}{100} = 4$, so $P_{40} = 32$
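The worked solution locates each measure at position kN/4, kN/10, or kN/100 in the ranked data and interpolates when the position is fractional. A minimal Python sketch of that procedure follows. The function names and the clamping of positions below rank 1 are assumptions for illustration. Other textbooks and software use slightly different position rules, so results can differ from this method.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional rank, by linear interpolation."""
    position = min(max(position, 1), len(ordered))  # assumption: clamp to valid ranks
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def quantile(data, k, parts):
    """k-th of `parts` divisions: parts=4 quartile, 10 decile, 100 percentile."""
    ordered = sorted(data)
    position = k * len(ordered) / parts
    return value_at_position(ordered, position)


scores = [34, 42, 27, 34, 45, 32, 31, 44, 33, 31]
print(quantile(scores, 3, 4))     # Q3  -> 38.0
print(quantile(scores, 6, 10))    # D6  -> 34
print(quantile(scores, 40, 100))  # P40 -> 32
```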
Exercise
Given the following grades in Statistics:
SCORES: 89, 79, 87, 97, 94, 80, 85
Calculate the following:
a. Q1
b. D8
c. P65
• A percentile cuts the data into two parts:
Ø P% of the data lie below it.
Ø (100 − P)% of the data lie above it.
• Quartiles are three numbers, Q1, Q2, and
Q3, that divide the data set into fourths.
• The median is Q2, which divides the data
into lower and upper sets: Q2 = D5 = P50.
• Q1 is the median of the lower set and Q3
is the median of the upper set.
• The interquartile range (IQR) is the difference between Q3 and Q1. It helps
identify potential outliers: a value is a potential outlier if it lies more than
1.5 IQR below the first quartile or more than 1.5 IQR above the third quartile.
Outliers can be errors or some abnormality in the data set.
Example:
Below are 11 salaries in US dollars. Calculate the IQR and determine any
potential outliers.
$33,000; $64,500; $28,000; $54,000; $72,000; $68,500;
$69,000; $42,000; $54,000; $120,000; $40,500
Arranged in order: $28,000; $33,000; $40,500; $42,000; $54,000;
$54,000; $64,500; $68,500; $69,000; $72,000; $120,000
Median or Q2 = $54,000
Q1 = $40,500 Q3 = $69,000
IQR = Q3 – Q1 = $69,000 - $40,500 = $28,500
1.5 (IQR) = (1.5)($28,500) = $42,750
Q1 – 1.5 (IQR) = $40,500 – $42,750 = -$2,250
Q3 + 1.5 (IQR) = $69,000 + $42,750 = $111,750
No salary is below -$2,250, and only $120,000 is above $111,750.
Thus, $120,000 is a potential outlier.
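The salary example can be reproduced with a short Python sketch. It assumes the quartile rule used in this lecture (Q1 and Q3 are the medians of the lower and upper sets, excluding the overall median when the number of values is odd). The function names are illustrative.

```python
import statistics


def quartiles_by_halves(data):
    """Q1, Q2, Q3 where Q1 and Q3 are medians of the lower and upper sets."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value (the median) belongs to neither set.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


def potential_outliers(data):
    """Values more than 1.5 IQR below Q1 or above Q3, plus the two fences."""
    q1, _, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high], (low, high)


salaries = [33000, 64500, 28000, 54000, 72000, 68500,
            69000, 42000, 54000, 120000, 40500]
outliers, fences = potential_outliers(salaries)
print(quartiles_by_halves(salaries))  # (40500, 54000, 69000)
print(fences)                         # (-2250.0, 111750.0)
print(outliers)                       # [120000]
```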
BOX AND WHISKERS PLOT
• A five-number summary is used to construct a box plot:
{Xmin, Q1, Q2, Q3, Xmax}
• Each number is represented by a vertical line segment. A box is formed using
the segments at Q1 and Q3 as its two vertical sides, and two horizontal lines
(the whiskers) extend from the segments marking Q1 and Q3 to the adjacent
extreme values, Xmin and Xmax.
Example:
Construct a box plot with the given five-number summary:
Xmin = 1.39
Q1 = 1.90
Q2 = 2.62
Q3 = 3.33
Xmax = 4.00
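As a rough text-only illustration of the construction, the sketch below draws the box plot for this five-number summary as a single line of characters. The function name `ascii_boxplot`, the 60-character width, and the marker characters are all assumptions made for this example. A real plot would normally be drawn by hand or with plotting software.

```python
def ascii_boxplot(xmin, q1, q2, q3, xmax, width=60):
    """Return a one-line text box plot for a five-number summary."""
    if not xmin <= q1 <= q2 <= q3 <= xmax:
        raise ValueError("five-number summary must be in increasing order")
    if xmax == xmin:
        raise ValueError("xmax must be greater than xmin")

    def column(value):
        # Map a data value to a character column between 0 and width - 1.
        return round((value - xmin) / (xmax - xmin) * (width - 1))

    row = ["-"] * width                        # whiskers span Xmin..Xmax
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                           # the box spans Q1..Q3
    for value in (xmin, q1, q3, xmax):
        row[column(value)] = "|"               # vertical segments
    row[column(q2)] = ":"                      # median marker
    return "".join(row)


plot = ascii_boxplot(1.39, 1.90, 2.62, 3.33, 4.00)
print(plot)
```

The `|` characters mark Xmin, Q1, Q3, and Xmax, the `=` run is the box, the `-` runs are the whiskers, and `:` marks the median Q2.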
Example: Given the data on the table, construct a box plot.
R     SCORES
1     79
2     80
3     85
4     87
5     89
6     94
7     97
Xmin = 79
Q1 = 80
Q2 = 87
Q3 = 94
Xmax = 97
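The five-number summary in this example can be reproduced with the sketch below. It assumes Q1 and Q3 are taken as the medians of the lower and upper sets, which matches the values shown here. The function name is illustrative.

```python
import statistics


def five_number_summary(data):
    """Return (Xmin, Q1, Q2, Q3, Xmax) using medians of the two halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, leave the middle value out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0],
            statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper),
            ordered[-1])


grades = [89, 79, 87, 97, 94, 80, 85]
print(five_number_summary(grades))  # (79, 80, 87, 94, 97)
```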
MEASURES OF DISPERSION/VARIABILITY
• Measures how the data spread out from the central location.
• Used in comparing two sets of data. The smaller the value, the closer the
observations are to the central value.
Four measures of dispersion/variability:
1. Range – difference between highest and lowest
2. Variance – considers deviation of each observation from the mean
3. Standard Deviation (SD) – positive square root of the variance.
4. Mean absolute deviation (MAD) – average distance between each data point
and the mean.
Range
• The difference between the highest and the lowest value: $R = HV - LV$
Where:
R – range
HV – highest value
LV – lowest value
Variance
• Considers the deviation of each observation from the mean.
• The formulas are given below:
Population Variance
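The formula on this slide did not survive extraction. Assuming the lecture uses the standard definition, the population variance averages the squared deviations from the population mean $\mu$ over all $N$ members:

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```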
Sample Variance
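The formula on this slide did not survive extraction. Assuming the standard definition, the sample variance divides the sum of squared deviations from the sample mean $\bar{x}$ by $n - 1$ rather than $n$:

```latex
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```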
Standard Deviation (SD)
• Positive square root of the variance.
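In symbols, assuming the usual notation of $\sigma^2$ for the population variance and $s^2$ for the sample variance (the slide's own formula was lost in extraction):

```latex
\sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2}
```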
Mean Absolute Deviation (MAD)
• Average distance between each data point and the mean. It gives an
idea of the variability in a data set.
$MAD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$
where
$x_i$ – each observation in the data set
$\bar{x}$ – mean
n – sample size or total number of
observations
Example: Consider the test results from a sample of students in a class.
SCORES: 34, 42, 27, 34, 45, 32, 31, 44, 33, 31
Calculate the following:
a. Range
b. Mean Absolute Deviation
c. Variance
d. Standard Deviation
a. Range: $R = 45 - 27 = 18$
b. Mean Absolute Deviation
Mean = $\bar{x}$ = 35.30
$x_i$    $|x_i - \bar{x}|$
34     1.3
42     6.7
27     8.3
34     1.3
45     9.7
32     3.3
31     4.3
44     8.7
33     2.3
31     4.3
Sum of $|x_i - \bar{x}|$ = 50.2
MAD = 50.2 / 10 = 5.02
Calculate the following:
c. Variance
d. Standard Deviation
$x_i$    $|x_i - \bar{x}|$    $(x_i - \bar{x})^2$
34     1.3     1.69
42     6.7     44.89
27     8.3     68.89
34     1.3     1.69
45     9.7     94.09
32     3.3     10.89
31     4.3     18.49
44     8.7     75.69
33     2.3     5.29
31     4.3     18.49
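The four measures for this example can be computed with the sketch below. Because the scores are described as a sample, it assumes the sample variance, which divides by n − 1. The function name `dispersion` is illustrative and the variance and standard deviation results are computed here, not taken from the slides.

```python
import math


def dispersion(sample):
    """Return (range, MAD, sample variance, sample standard deviation)."""
    n = len(sample)
    mean = sum(sample) / n
    value_range = max(sample) - min(sample)
    mad = sum(abs(x - mean) for x in sample) / n
    # Sample variance: squared deviations divided by n - 1 (assumption).
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return value_range, mad, variance, math.sqrt(variance)


scores = [34, 42, 27, 34, 45, 32, 31, 44, 33, 31]
r, mad, var, sd = dispersion(scores)
print(r, round(mad, 2), round(var, 2), round(sd, 2))  # 18 5.02 37.79 6.15
```

The squared deviations in the table sum to 340.10, so the sample variance is 340.10 / 9 ≈ 37.79 and the standard deviation is about 6.15.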
MEAN AND STANDARD DEVIATION
For a normal distribution (bell-shaped and symmetric):
• ~ 68% of the data is within 1 SD
of the mean.
• ~ 95% of the data is within 2 SD
of the mean.
• ~99.7% of the data is within 3 SD
of the mean.
68-95-99.7 RULE
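The three percentages can be checked against the standard normal distribution using `statistics.NormalDist` from Python's standard library. The helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```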