Lecture 9


GE 4

Mathematics in the
Modern World

Instructor: ENGR. MARC FRANCIS M. LABATA, PH.D.
Lecture 9

TOPIC: STATISTICS – PART I

2
WHAT IS STATISTICS?
Statistics is the science of collecting, analyzing, and
drawing conclusions from data. It is a mathematical
discipline to collect and summarize data.

Two Branches of Statistics:

• Descriptive Statistics – describing data via graphs, tables, or other statistical measures.
Applicable when you have a lot of data and want to summarize it appropriately.

• Inferential Statistics – inferring or estimating population characteristics from sample data.
If a sample represents a given population accurately, then analyzing the sample can lead to significant conclusions about the population as a whole.

3
POPULATION AND SAMPLE

Population – the entire group from which a statistical sample is drawn. If a representative sample is drawn, a meaningful hypothesis can be developed about the entire population.

Sample – a set of data collected from a specific population by a defined procedure.

4
COMMON SAMPLING TECHNIQUES
Probability Sampling
• Simple Random – every member has an
equal chance of being selected. Random
number generators or drawing lots can be
used.
• Systematic – every member of the
population is listed with a number and
individuals are chosen at regular intervals.
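The two techniques above can be sketched in Python. This is a minimal illustration rather than part of the lecture: the function names, the optional `seed` parameter, the random starting point for systematic sampling, and the numbered population of 100 members are all assumptions made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)  # the seed is optional; it only makes runs repeatable
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick n members at a regular interval k from a numbered list.

    Assumes 1 <= n <= len(population). Starting at a random point inside
    the first interval is a common convention, assumed here.
    """
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]


# Hypothetical population: members numbered 1 to 100.
members = list(range(1, 101))
print(simple_random_sample(members, 10, seed=1))
print(systematic_sample(members, 10, seed=1))
```

With 100 members and a sample of 10, the systematic sample takes every 10th member after the starting point.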

5
COMMON SAMPLING TECHNIQUES
Probability Sampling
• Stratified – dividing the population into subpopulations (strata) that may differ in important ways, based on relevant characteristics. Random or systematic sampling is then employed to select a sample from each subgroup.

• Cluster – dividing the population into subgroups (clusters), each of which should have characteristics similar to the whole population. Entire subgroups are then randomly selected.
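The difference between the two techniques can be sketched as follows: stratified sampling draws some members from every subgroup, while cluster sampling keeps every member of a few randomly chosen subgroups. The function names, the `stratum_of` callback, and the student and section data are hypothetical, introduced only for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(members, stratum_of, per_stratum, seed=None):
    """Draw per_stratum members at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)   # group members by stratum
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, seed=None):
    """Choose n_clusters whole clusters at random and keep all their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical data: 12 students as (id, year level), and 4 class sections.
students = [(i, "1st year" if i <= 6 else "2nd year") for i in range(1, 13)]
sections = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]}

print(stratified_sample(students, lambda s: s[1], per_stratum=2, seed=1))
print(cluster_sample(sections, n_clusters=2, seed=1))
```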

6
COMMON SAMPLING TECHNIQUES
Non-Probability Sampling
• Convenience – selecting the individuals who happen
to be the most accessible to the
researcher.
• Purposive – also known as judgment
sampling, involves the researcher using
their expertise to select a sample that is
most useful to the research.

7
COMMON SAMPLING TECHNIQUES
Non-Probability Sampling
• Snowball – recruiting participants via other
participants.
• Quota – relies on the non-random
selection of a predetermined number or
proportion of units, known as a quota. The
population is divided into mutually
exclusive subgroups (strata), and units are
recruited from each until the quota is reached.

8
MEASURES OF CENTRAL TENDENCY
• Used to find the central value of a set of
data, a single value that represents the entire data set.
• Three measures of central tendency:
1. Mean – average of all data
2. Median – middlemost value in the ordered
arrangement of values in the dataset
3. Mode – most frequently occurring
item/value in a dataset.

9
MEASURES OF CENTRAL TENDENCY
Mean
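As a reference, the standard definition of the arithmetic mean can be written with the symbols used elsewhere in this lecture ($x_i$ for an observation, $n$ for the number of observations). The summation notation is an assumption here, since the lecture text only describes the mean as the average of all data.

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}
```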

10
MEASURES OF CENTRAL TENDENCY
Median
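• The middle value of a set of observations arranged in increasing or
decreasing order of magnitude, denoted by $\tilde{x}$.
• It is a positional value unaffected by the presence of extreme values
or outliers. Thus, it is more appropriate to report than the mean
when outliers are present.
• ODD number of observations $n$: the median is the value in position $(n+1)/2$.
• EVEN number of observations $n$: the median is the average of the values in positions $n/2$ and $n/2 + 1$.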
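The odd and even cases can be sketched as a small function. The function name and the sample lists are illustrative assumptions; the rule is the usual one, where an odd count takes the single middle value and an even count averages the two middle values.

```python
def median(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                           # odd count: one middle value
    return (ordered[middle - 1] + ordered[middle]) / 2   # even count: average of two


print(median([5, 1, 9]))        # odd count -> 5
print(median([5, 1, 9, 100]))   # even count -> 7.0
```

Note that adding the extreme value 100 moves the median only from 5 to 7, which illustrates why the median is preferred when outliers are present.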

11
MEASURES OF CENTRAL TENDENCY
Mode
• The value that occurs the greatest number of times or with the
highest frequency.
• Appropriate for nominal or categorical type of data.
• If observations occur with equal frequencies, then there is no modal
value for the data set.
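A minimal sketch of the rules above, assuming a hypothetical helper named `modes` that returns every value tied for the highest frequency and an empty list when all values occur equally often. Treating a data set with a single distinct value as having that value as its mode is an assumption of this sketch.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # If several distinct values all share the same frequency, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes(["red", "blue", "red", "green"]))   # categorical data -> ['red']
print(modes([1, 2, 3, 4]))                      # equal frequencies -> []
```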

12
MEASURES OF CENTRAL TENDENCY
Exercise
A student listed 10 of his classmates' scores in GE 4 prelims:
34, 42, 27, 34, 45, 32, 31, 44, 33 and 31.

Calculate the following:


a. Mean
b. Median
c. Mode

13
MEASURES OF CENTRAL TENDENCY
Exercise: 34, 42, 27, 34, 45, 32, 31, 44, 33 and 31.
Arranged in order: 27, 31, 31, 32, 33, 34, 34, 42, 44, 45.

a. Mean
$$\bar{x} = \frac{34 + 42 + 27 + 34 + 45 + 32 + 31 + 44 + 33 + 31}{10} = \frac{353}{10} = 35.30$$

b. Median
$$\tilde{x} = \frac{33 + 34}{2} = 33.50$$

c. Mode
Mode = 31, 34
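The three answers can be cross-checked with Python's standard `statistics` module. This is only a verification sketch; the variable name `scores` is an assumption.

```python
from statistics import mean, median, multimode

scores = [34, 42, 27, 34, 45, 32, 31, 44, 33, 31]

print(sum(scores))                 # numerator of the mean: all ten scores added
print(mean(scores))                # a. mean
print(median(scores))              # b. median
print(sorted(multimode(scores)))   # c. every value tied for the highest frequency
```

`multimode` returns all tied values, which suits this data set because 31 and 34 each occur twice.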


14
MEASURES OF CENTRAL TENDENCY
Exercise
Given the following grades in Math:
89, 79, 87, 97, 94, 80, 85

Calculate the following:


a. Mean
b. Median
c. Mode

15
MEASURES OF CENTRAL TENDENCY
Exercise: 89, 79, 87, 97, 94, 80, 85
Arranged in order: 79, 80, 85, 87, 89, 94, 97

a. Mean
$$\bar{x} = \frac{89 + 79 + 87 + 97 + 94 + 80 + 85}{7} = \frac{611}{7} \approx 87.29$$

b. Median
$$\tilde{x} = 87$$

c. Mode
No mode (every grade occurs exactly once)


16
MEASURES OF RELATIVE POSITION
• The significance of one observed value in a data set strongly
depends on how the value compares to other observed values in a
data set.
• The data must be ordered from smallest to largest.
• Three measures of relative position:
1. Percentiles – divides ordered data into hundredths
2. Quartiles – data set divided into quarters
3. Deciles – data is separated into 10 groups

17
MEASURES OF RELATIVE POSITION
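1. Percentiles: $P_k = \dfrac{kN}{100}$
2. Quartiles: $Q_k = \dfrac{kN}{4}$
3. Deciles: $D_k = \dfrac{kN}{10}$

where:
N – total number of data
k – kth position

The calculated number is the position of the desired value in the ordered data. If the position is not a whole number, interpolate between the two neighboring values.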
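The position rule used in this lecture's exercises (position = kN divided by 4, 10, or 100, with linear interpolation when the position is fractional) can be sketched as below. The function name `value_at_position`, its parameters, and the clamping of positions that fall outside the data are assumptions made for this illustration.

```python
import math


def value_at_position(data, k, parts):
    """Return the k-th quartile (parts=4), decile (parts=10) or percentile (parts=100)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("data must not be empty")
    position = k * n / parts           # 1-based position in the ordered data
    lower = math.floor(position)
    fraction = position - lower
    # Assumption: positions before the first or at/after the last value are clamped.
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)   # linear interpolation


scores = [34, 42, 27, 34, 45, 32, 31, 44, 33, 31]
print(value_at_position(scores, 3, 4))      # Q3
print(value_at_position(scores, 6, 10))     # D6
print(value_at_position(scores, 40, 100))   # P40
```

Statistical software often uses slightly different position rules, so library functions may return values that differ from this hand method.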

18
MEASURES OF RELATIVE POSITION
Exercise
Given the following scores in an Algebra test:

SCORES: 34, 42, 27, 34, 45, 32, 31, 44, 33, 31

Calculate the following:
a. Q3
b. D6
c. P40

19
MEASURES OF RELATIVE POSITION
Exercise

R     SCORES
1     27
2     31
3     31
4     32
5     33
6     34
7     34
8     42
9     44
10    45

a.) Position: $Q_3 = \dfrac{3(10)}{4} = 7.5$
    Value: $Q_3 = 34 + 0.5(42 - 34) = 38$

b.) Position: $D_6 = \dfrac{6(10)}{10} = 6$
    Value: $D_6 = 34$

c.) Position: $P_{40} = \dfrac{40(10)}{100} = 4$
    Value: $P_{40} = 32$

20
MEASURES OF RELATIVE POSITION
Exercise
Given the following grades in Statistics:

SCORES: 89, 79, 87, 97, 94, 80, 85

Calculate the following:
a. Q1
b. D8
c. P65

21
MEASURES OF RELATIVE POSITION
Exercise

R    SCORES
1    79
2    80
3    85
4    87
5    89
6    94
7    97

a.) Position: $Q_1 = \dfrac{1(7)}{4} = 1.75$
    Value: $Q_1 = 79 + 0.75(80 - 79) = 79.75$

b.) Position: $D_8 = \dfrac{8(7)}{10} = 5.6$
    Value: $D_8 = 89 + 0.6(94 - 89) = 92$

c.) Position: $P_{65} = \dfrac{65(7)}{100} = 4.55$
    Value: $P_{65} = 87 + 0.55(89 - 87) = 88.10$

22
MEASURES OF RELATIVE POSITION
• The Pth percentile cuts the data into two parts:
  - P% of the data lie below it.
  - (100-P)% of the data lie above it.
• Quartiles are three numbers, Q1, Q2, and
Q3, that divide the data set into fourths.
• The median is Q2, which divides the data
into lower and upper sets.
• Q1 is the median of the lower set and Q3
is the median of the upper set.
Q2 = D5 = P50

23
MEASURES OF RELATIVE POSITION
• The interquartile range (IQR) is the difference between Q3 and Q1. It
can help identify potential outliers: a value is a potential outlier if it lies
more than 1.5 IQR below the first quartile or more than 1.5 IQR above the
third quartile. Outliers can be errors or some abnormality in the data set.
Example:
Below are 11 salaries in US dollars. Calculate the IQR and determine any
potential outliers.
$33,000; $64,500; $28,000; $54,000; $72,000; $68,500;
$69,000; $42,000; $54,000; $120,000; $40,500

24
MEASURES OF RELATIVE POSITION
Example: Arrange in order: $28,000; $33,000; $40,500; $42,000; $54,000;
$54,000; $64,500; $68,500; $69,000; $72,000; $120,000
Median or Q2 = $54,000
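Q1 = $40,500 (median of the five values below Q2)
Q3 = $69,000 (median of the five values above Q2)
IQR = Q3 – Q1 = $69,000 - $40,500 = $28,500
1.5 (IQR) = (1.5)($28,500) = $42,750
Q1 – 1.5 (IQR) = $40,500 – $42,750 = -$2,250
Q3 + 1.5 (IQR) = $69,000 + $42,750 = $111,750
Thus, $120,000 is a potential outlier because it is above $111,750. No salary is below -$2,250.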
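The salary example can be reproduced with a short sketch. It assumes the quartile method used in this example, where Q1 and Q3 are the medians of the lower and upper halves and the overall median is left out of both halves when the count is odd. The function names are hypothetical, and the data set needs at least two values.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def potential_outliers(data):
    """Return values more than 1.5 IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


salaries = [33000, 64500, 28000, 54000, 72000, 68500,
            69000, 42000, 54000, 120000, 40500]
print(quartiles_by_halves(salaries))   # (Q1, Q2, Q3)
print(potential_outliers(salaries))
```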

25
BOX AND WHISKERS PLOT

• A box and whisker plot (or box plot) is a
convenient way of visually displaying a data
distribution through its quartiles.
• Lines extending from the box are
known as the “whiskers”, which indicate
variability outside the upper and lower quartiles.
• Outliers are plotted as individual dots that are
in line with the whiskers.
• It can be useful when comparing distributions
between many groups or datasets.

26
BOX AND WHISKERS PLOT
• A five-number summary is used to construct a box plot:
{Xmin, Q1, Q2, Q3, Xmax}
• Each number is represented by a vertical line segment. A box is formed using
the segments Q1 and Q3 as its two vertical sides and two horizontal lines are
extended from the vertical segments marking Q1 and Q3 to the adjacent
extreme values.

27
BOX AND WHISKERS PLOT
Example:
Construct a box plot with the given five-number summary:
Xmin = 1.39
Q1 = 1.90
Q2 = 2.62
Q3 = 3.33
Xmax = 4.00
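One way to see how the five numbers map onto a box plot is a rough text rendering. The sketch below is an assumption-laden illustration, not a plotting standard: the function name, the 60-character width, and the characters used for the whiskers (`-`), the box (`=`), and the vertical segments (`|`) are all arbitrary choices.

```python
def ascii_box_plot(xmin, q1, q2, q3, xmax, width=60):
    """Draw a one-line text box plot; assumes xmin < xmax and width >= 2."""
    def column(value):
        # Scale a data value to a character column between 0 and width - 1.
        return round((value - xmin) / (xmax - xmin) * (width - 1))

    row = [" "] * width
    for i in range(column(xmin), column(xmax) + 1):
        row[i] = "-"                     # whiskers span Xmin to Xmax
    for i in range(column(q1), column(q3) + 1):
        row[i] = "="                     # box spans Q1 to Q3
    for value in (xmin, q1, q2, q3, xmax):
        row[column(value)] = "|"         # one vertical segment per summary number
    return "".join(row)


print(ascii_box_plot(1.39, 1.90, 2.62, 3.33, 4.00))
```

The printed line shows the box between Q1 and Q3 with the median segment inside it, and a whisker on each side reaching the extreme values.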

28
BOX AND WHISKERS PLOT
Example: Given the data on the table, construct a box plot.

R    SCORES
1    79
2    80
3    85
4    87
5    89
6    94
7    97

Xmin = 79
Q1 = 80
Q2 = 87
Q3 = 94
Xmax = 97

29
MEASURES OF DISPERSION/VARIABILITY
• Measures how the data spread from the central location.
• Used in comparing two sets of data. The smaller the value, the closer the
observations are to the central value.
Four measures of dispersion/variability:
1. Range – difference between the highest and lowest values
2. Variance – considers the deviation of each observation from the mean
3. Standard Deviation (SD) – positive square root of the variance
4. Mean absolute deviation (MAD) – average distance between each data point
and the mean

30
MEASURES OF DISPERSION/VARIABILITY
Range
• The difference between the highest and the lowest value.

$$R = HV - LV$$

Where:
R – range
HV – highest value
LV – lowest value

31
MEASURES OF DISPERSION/VARIABILITY
Variance
• Considers the deviation of each observation from the mean.
• The formulas are given below:

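Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean and $N$ is the population size.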

32
MEASURES OF DISPERSION/VARIABILITY
Variance (continued)

Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean and $n$ is the sample size.

33
MEASURES OF DISPERSION/VARIABILITY
Standard Deviation (SD)
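• Positive square root of the variance.

$$\sigma = \sqrt{\sigma^2} \quad \text{(population)}, \qquad s = \sqrt{s^2} \quad \text{(sample)}$$

where $\sigma^2$ is the population variance and $s^2$ is the sample variance.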

34
MEASURES OF DISPERSION/VARIABILITY
Mean Absolute Deviation (MAD)
• Average distance between each data point and the mean. It gives an
idea of the variability in a data set.

$$\text{MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

where
$x_i$ – each observation or data point
$\bar{x}$ – mean
n – sample size or total number of
observations
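The definition translates directly into a few lines of code. This is a minimal sketch with a hypothetical function name; the data are the ten test scores used in this lecture's worked example.

```python
def mean_absolute_deviation(values):
    """Average of the absolute distances between each value and the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


scores = [34, 42, 27, 34, 45, 32, 31, 44, 33, 31]
print(round(mean_absolute_deviation(scores), 2))   # 5.02
```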

35
MEASURES OF DISPERSION/VARIABILITY
Example: Consider the test results from a sample of students in a class.

SCORES: 34, 42, 27, 34, 45, 32, 31, 44, 33, 31

Calculate the following:
a. Range
b. Mean Absolute Deviation
c. Variance
d. Standard Deviation

36
MEASURES OF DISPERSION/VARIABILITY
Example: Consider the test results from a sample of students in a class.

SCORES: 34, 42, 27, 34, 45, 32, 31, 44, 33, 31

Calculate the following:
a. Range

R = 45 – 27 = 18

37
MEASURES OF DISPERSION/VARIABILITY
Example: Consider the test results from a sample of students in a class.

Calculate the following:
b. Mean Absolute Deviation

Mean = $\bar{x}$ = 35.30

xi    |xi - x̄|
34    1.3
42    6.7
27    8.3
34    1.3
45    9.7
32    3.3
31    4.3
44    8.7
33    2.3
31    4.3

$\sum |x_i - \bar{x}| = 50.2$

$\text{MAD} = \dfrac{50.2}{10} = 5.02$

38
MEASURES OF DISPERSION/VARIABILITY
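Example: Consider the test results from a sample of students in a class.

Calculate the following:
c. Variance
d. Standard Deviation

xi    |xi - x̄|    (xi - x̄)²
34    1.3          1.69
42    6.7          44.89
27    8.3          68.89
34    1.3          1.69
45    9.7          94.09
32    3.3          10.89
31    4.3          18.49
44    8.7          75.69
33    2.3          5.29
31    4.3          18.49

$\sum (x_i - \bar{x})^2 = 340.10$

c. Variance (the data are a sample, so the divisor is n − 1): $s^2 = \dfrac{340.10}{9} \approx 37.79$

d. Standard Deviation: $s = \sqrt{37.79} \approx 6.15$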
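The variance and standard deviation for these scores can be computed with Python's standard `statistics` module. Because the example describes the data as a sample, the sketch treats the sample versions (divisor n − 1) as the answer; that reading is an assumption, so the population versions (divisor n) are printed as well for comparison.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

scores = [34, 42, 27, 34, 45, 32, 31, 44, 33, 31]

# Sum of squared deviations from the mean (last column of the table).
x_bar = mean(scores)
sum_sq = sum((x - x_bar) ** 2 for x in scores)
print(round(sum_sq, 2))

# Sample versions divide by n - 1.
print(round(variance(scores), 2), round(stdev(scores), 2))

# Population versions divide by n.
print(round(pvariance(scores), 2), round(pstdev(scores), 2))
```

The sample variance is slightly larger than the population variance because it divides the same sum of squares by 9 instead of 10.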

39
MEAN AND STANDARD DEVIATION
For a normal distribution (bell-shaped and symmetric):
• ~ 68% of the data is within 1 SD
of the mean.
• ~ 95% of the data is within 2 SD
of the mean.
• ~99.7% of the data is within 3 SD
of the mean.

68-95-99.7 RULE
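The three percentages can be checked numerically. For a normal distribution, the share of data within k standard deviations of the mean equals erf(k / √2), where erf is the error function available in Python's `math` module. The function name below is a hypothetical choice for this sketch.

```python
import math


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))   # 68.27, 95.45, 99.73
```

The rule rounds these values to 68%, 95%, and 99.7%.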

40
