Bsem 34 Chapter 1 Complete
Advanced Statistics
Prepared by: Samantha N. Ame, LPT
01
Introduction
- Definition
- Nature of statistics
- Population and sample
- Variables and measurements
Statistics
Plural sense - statistics are the numbers being studied (the data themselves) or numbers derived from the data.
Variables - a characteristic that can take on different values within a population or sample (e.g., height, weight, income).
Measurements - the specific value assigned to a variable for a particular individual or item (e.g., John's height is 180 cm).
SCALES OF MEASUREMENTS/ HIERARCHY
OF MEASUREMENTS
NOMINAL/ CLASSIFICATORY
- lowest in the hierarchy
-assigns names/labels of unspecified value (arbitrary) or
order
-brands, gender, color, religion
ORDINAL/ RANKING
-has characteristics of nominal but can be ordered
- can be classified which is superior with which
- relative position of one case is known
- honor roll, rating of best movies, typhoon signal number
Weakness: we can't measure the degree of the ranking; we don't know the difference between each point in the ranking.
The first column on the left of the table describes the data in the given row.
Imagine you have a collection of books. An array is like lining them up on a shelf, where
you can easily grab any specific book by its position. A frequency table is like sorting them
by genre, giving you an overall picture of how many horror, romance, and non-fiction books
you have, without needing to know the exact order on the shelf.
STEPS IN CONSTRUCTING FDT
1. Determine the range: Range = |highest value - lowest value|
2. Determine the number of classes, K (Sturges' rule: K = 1 + 3.322 log n)
3. Determine the class size: C = R ÷ K
4. Set up the class intervals and tally the frequency of each class.
5. Determine the class boundaries: lower limit - 0.5 and upper limit + 0.5
6. <CF: start with the frequency of the 1st class, then add up the frequencies of each class.
7. >CF: start with the total number of frequencies, then subtract the frequency of the first class, and so on.
- Class mark: the average of the upper limit and lower limit, i.e. (lower limit + upper limit) ÷ 2
- Relative frequency: (frequency of a value or group) ÷ (total number of data) × 100%; relative frequency always adds up to 100% for the entire data set.
- Cumulative frequency: tells you how many observations lie above or below a certain value in a dataset.
Raw scores of students in a 200 item test
144 112 156 122 168 172 141 159 127 154
156 145 134 137 123 149 144 160 136 139
142 138 159 151 147 150 126 152 147 136
135 132 146 133 150 122 139 149 152 129
131 155 116 140 145 135 160 125 172 163
1. Range= |172-112|= 60
2. K= 1+ 3.322 log 50 = 6.643978 ≈ 7 ; always round up to the next whole number
3. Class size= R ÷ K= 60/ 7= 8.571428571 ≈ 9 ; always round up to the next whole (ODD) number
4. The lower limit of the first class is the lowest value from the raw dataset, which is 112.
5. The first class interval starts at 112, and the next class starts at 121 because 112 + 9 (the class size) = 121.
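Steps 1-3 above can be checked with a short Python sketch (Sturges' rule, as used in step 2):

```python
import math

scores = [
    144, 112, 156, 122, 168, 172, 141, 159, 127, 154,
    156, 145, 134, 137, 123, 149, 144, 160, 136, 139,
    142, 138, 159, 151, 147, 150, 126, 152, 147, 136,
    135, 132, 146, 133, 150, 122, 139, 149, 152, 129,
    131, 155, 116, 140, 145, 135, 160, 125, 172, 163,
]

# Step 1: range
R = abs(max(scores) - min(scores))                  # |172 - 112| = 60

# Step 2: number of classes (Sturges' rule), always rounded UP
K = math.ceil(1 + 3.322 * math.log10(len(scores)))  # 6.64... -> 7

# Step 3: class size, rounded up to the next whole number
C = math.ceil(R / K)                                # 8.57... -> 9

print(R, K, C)  # 60 7 9
```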
CLASS      f    Class Mark   Class Bound.   Rel. Freq.   <CF   >CF
112-120    2
121-129    7
130-138    10
139-147    12
148-156    11
157-165    5
166-174    3
Total      50
Class mark is the average of the upper limit and lower limit: (lower limit + upper limit) ÷ 2

CLASS      f    Class Mark
112-120    2    116          (112 + 120) ÷ 2 = 232 ÷ 2 = 116
Total      50
Class boundaries: lower limit - 0.5, upper limit + 0.5

CLASS      Class Bound.
157-165    156.5 - 165.5
166-174    165.5 - 174.5
Relative frequency: (frequency of a value or group) ÷ (total number of data) × 100%

CLASS      f    Rel. Freq.
112-120    2    4%           2 ÷ 50 = 0.04 × 100 = 4%
157-165    5    10%
166-174    3    6%

Tip: you can just move the decimal point two places to the right so you don't have to multiply by 100.
Relative frequency always adds up to 100% for the entire data set.
<CF: start with the frequency of the 1st class, then add up the frequencies of each class.

CLASS      f    <CF
112-120    2    2
121-129    7    9     2 + 7 = 9
130-138    10   19    9 + 10 = 19
139-147    12   31    19 + 12 = 31
148-156    11   42    31 + 11 = 42
157-165    5    47    42 + 5 = 47
166-174    3    50    47 + 3 = 50
Total      50

The last <CF should be the same as the total frequency.
>CF: start with the total number of frequencies, then subtract the frequency of the first class, and so on.

CLASS      f    >CF
112-120    2    50    50
121-129    7    48    50 - 2 = 48
130-138    10   41    48 - 7 = 41
139-147    12   31    41 - 10 = 31
148-156    11   19    31 - 12 = 19
157-165    5    8     19 - 11 = 8
166-174    3    3     8 - 5 = 3
Total      50

The last >CF should be the same as the frequency of its own class; subtracting that frequency would leave zero.
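All of the derived columns can be generated programmatically. A minimal sketch, using the class limits and frequencies from the tables above:

```python
classes = [(112, 120), (121, 129), (130, 138), (139, 147),
           (148, 156), (157, 165), (166, 174)]
freqs = [2, 7, 10, 12, 11, 5, 3]
n = sum(freqs)                               # 50

class_marks = [(lo + hi) / 2 for lo, hi in classes]
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]
rel_freq = [f / n * 100 for f in freqs]      # in percent; sums to 100%

# <CF: running total from the first class down
less_cf, running = [], 0
for f in freqs:
    running += f
    less_cf.append(running)

# >CF: start at the total frequency, subtract each class frequency in turn
greater_cf, running = [], n
for f in freqs:
    greater_cf.append(running)
    running -= f

print(less_cf)     # [2, 9, 19, 31, 42, 47, 50]
print(greater_cf)  # [50, 48, 41, 31, 19, 8, 3]
```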
DESCRIPTIVE STATISTICS:
MEASURES OF CENTRAL TENDENCY
Measures of Central Tendency
- These are the points around which scores tend to cluster; the center of concentration of the scores.
- They give single numbers which describe the totality of the collected data
For ungrouped data: sum of all the values in a data ÷ number of values
For grouped data: sum of the multiplied classmark and frequency of all classes,
divided by the total frequency/population
CLASS    f    Class Mark (x)   fx
5-9      7    7                49
10-14    10   12               120
15-19    13   17               221
20-24    18   22               396
25-29    8    27               216
30-34    5    32               160
35-39    3    37               111
Total    64                    1273

Mean = 1273 ÷ 64 ≈ 19.89
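A sketch of the grouped-mean computation. Note the frequencies of the 10-14 and 15-19 classes are inferred from the slide's totals (n = 64, Σfx = 1273), since those rows did not carry over:

```python
# grouped mean = sum(f * class mark) / total frequency
marks = [7, 12, 17, 22, 27, 32, 37]   # class marks of 5-9, 10-14, ..., 35-39
freqs = [7, 10, 13, 18, 8, 5, 3]      # 10-14 and 15-19 rows inferred from the totals
n = sum(freqs)                         # 64
mean = sum(f * x for f, x in zip(freqs, marks)) / n
print(mean)                            # 1273 / 64 = 19.890625
```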
A weighted mean is a type of average that assigns different weights to
different data points. (process is same as computing mean for grouped data)
MEDIAN
- It's the middle value when the data is arranged in ascending order.
- Resistant to Outliers: The median is less affected by outliers because it only depends on
the middle value, not all values.
- Best for Skewed Data: When data is skewed, the median is often a better measure of
central tendency than the mean.
- Denoted by Md
For ungrouped data: if n is odd, take the middle value; if even, take the average of the two middle values.
For grouped data:
For ungrouped data: ALWAYS ARRANGE IN ASCENDING ORDER
If odd: just take the middlemost value
If even: take the average of two middle values
1. 149, 161, 150, 163, 152, 165, 155, 170, 160, 175
2. 89, 75, 90, 85, 78, 87, 80
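The ungrouped rules can be sketched and checked against the two exercises above:

```python
def median(values):
    s = sorted(values)                  # always arrange in ascending order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                      # odd: take the middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2    # even: average the two middle values

print(median([149, 161, 150, 163, 152, 165, 155, 170, 160, 175]))  # 160.5
print(median([89, 75, 90, 85, 78, 87, 80]))                        # 85
```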
For grouped data: determine the median class - the class where the cumulative frequency (<CF) first reaches or exceeds half of the total number of observations (n/2). The standard formula is Md = LB + ((n/2 - cf_b) ÷ f) × c, where LB is the lower boundary of the median class, cf_b the <CF of the preceding class, f the frequency of the median class, and c the class size.
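A sketch of the grouped-data median on the earlier 50-score FDT. The slide's own formula image did not carry over, so this assumes the usual textbook form Md = LB + ((n/2 - cf_b)/f) × c:

```python
classes = [(112, 120), (121, 129), (130, 138), (139, 147),
           (148, 156), (157, 165), (166, 174)]
freqs = [2, 7, 10, 12, 11, 5, 3]
n = sum(freqs)                     # 50, so n/2 = 25
c = 9                              # class size

cf = 0                             # <CF of the class before the current one
for (lo, hi), f in zip(classes, freqs):
    if cf + f >= n / 2:            # first class whose <CF reaches n/2
        Md = (lo - 0.5) + ((n / 2 - cf) / f) * c
        break
    cf += f

print(Md)                          # 138.5 + (25 - 19)/12 * 9 = 143.0
```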
MODE
- Most Frequent Value: It's the value that occurs most often in the data set.
- Can Have Multiple Modes: A dataset can have no mode (if no value repeats), one mode,
or multiple modes (multiple values with the same highest frequency).
- Best for Categorical Data: The mode is often used for categorical data (like colors or
brands) where a numerical mean or median might not make sense.
- Denoted by Mo
For ungrouped data: observe the value that occurs most often.
For grouped data:
For ungrouped data:
1. 1, 2, 2, 2, 8, 1, 4, 10
2. 170, 165, 155, 160, 150
3. Color of cars owned by faculty: 40 white, 2 blue, 10 red
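A small sketch of mode-finding for ungrouped data, covering the no-mode and categorical cases from the examples above:

```python
from collections import Counter

def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                   # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([1, 2, 2, 2, 8, 1, 4, 10]))                      # [2]
print(modes([170, 165, 155, 160, 150]))                      # [] (no mode)
print(modes(["white"] * 40 + ["blue"] * 2 + ["red"] * 10))   # ['white']
```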
For grouped data: refer to your book (page 100)
Test Scores
87, 65, 92, 78, 80, 73, 88, 60, 95, 81,
70, 84, 68, 75, 90, 72, 85, 79, 93, 67,
82, 74, 91, 76, 89, 63, 86, 71, 94, 77,
83, 66, 96, 69, 62, 97, 61, 98, 64, 99
Create a frequency distribution table out of this
dataset and identify the mean, median, and mode
DESCRIPTIVE STATISTICS:
MEASURES OF DISPERSION
Ungrouped data:
- Range: R = |HV - LV|
- Variance
- Standard deviation
example
Scores of 7 students in a quiz: 4, 7, 8, 2, 2, 9, 3
Variance:
Standard Deviation:
example
Scores of students in a quiz: 4, 7, 8, 2, 2, 8, 9, 2, 5, 7
Variance:
Standard Deviation:
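The variance and standard-deviation formulas were slide images and did not carry over; the sketch below computes both the population (÷ n) and sample (÷ n - 1) versions of the first quiz example, so check which convention the course uses:

```python
import math

def variance(values, sample=True):
    """Sample variance divides by n - 1; population variance by n.
    (The slide's formula image was lost, so both conventions are shown.)"""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

quiz = [4, 7, 8, 2, 2, 9, 3]                    # mean = 5, SS = 52
print(variance(quiz, sample=False))             # 52/7  = 7.4285...
print(variance(quiz, sample=True))              # 52/6  = 8.6666...
print(math.sqrt(variance(quiz, sample=False)))  # population SD = 2.7255...
```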
Grouped data:
- Range: R = |ULHC - LLLC| (upper limit of the highest class - lower limit of the lowest class)
Coefficient of Variation
CV = (standard deviation ÷ mean) × 100%
It should only be used for data measured on a ratio scale. This means the scale has a true zero, so dividing by the mean makes sense.
You're a teacher who wants to compare the variability of scores on a recent science exam in
two different classes (Class A and Class B).
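A sketch of comparing the two classes by CV. The Class A / Class B scores here are made up for illustration, since the slide's data table did not carry over:

```python
import statistics

def cv(values):
    # coefficient of variation, in percent: (sample SD / mean) * 100
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical exam scores -- illustrative only; both classes share the
# same mean (80), but Class B's scores are far more spread out.
class_a = [70, 75, 80, 85, 90]
class_b = [50, 65, 80, 95, 110]
print(round(cv(class_a), 2))   # 9.88
print(round(cv(class_b), 2))   # 29.65
```

Because CV is unit-free, it lets you compare variability even when the two groups have different means or different scales.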
Percentile
Percentiles - are values that divide the array into 100 equal parts. Thus,
P1, read as first percentile, is the value below which 1% of the values fall.
P2, read as second percentile, is the value below which 2% of the values fall.
…
P99, read as ninety-ninth percentile, is the value below which 99% of the
values fall.
Decile
Deciles - are values that divide the array into 10 equal parts. Thus,
D1, read as first decile, is the value below which 10% of the values fall.
D2, read as second decile, is the value below which 20% of the values fall.
.
.
.
D9, read as ninth decile, is the value below which 90% of the values fall.
Quartile
Quartiles - are values that divide the array into 4 equal parts. Thus,
Q1, read as first quartile, is the value below which 25% of the values fall.
Q2, read as second quartile, is the value below which 50% of the values fall.
Q3, read as third quartile, is the value below which 75% of the values fall.
P10= D1
P20= D2
P25 =Q1
P30= D3
P40= D4
P50= D5 = Q2
P60= D6
P70= D7
P75 =Q3
P80= D8
P90= D9
P100=D10 = Q4
Find P10, P25, P50, D1, D5, Q1, Q2
The scores of ten students in a 20-point Math quiz are as follows:
6, 12, 18, 8, 9, 10, 9, 15, 17, 15
Locator formulas: Pk is at position k × N ÷ 100; Dk at k × N ÷ 10; Qk at k × N ÷ 4.
1. Arrange the scores in ascending order: 6, 8, 9, 9, 10, 12, 15, 15, 17, 18
2. Find P10 = 10 × 10 ÷ 100 = 1; the value at the 1st position is 6. This means that 10% of the students scored at or below 6, and 90% scored at or above 6.
3. Find P25 = 25 × 10 ÷ 100 = 2.5 ≈ 3 (always round up to the next whole number if decimal); the value at the 3rd position is 9. This means that 25% of the students scored at or below 9.
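The locator method from the worked example can be sketched as follows. Note it follows the slides' rule exactly (round a decimal position up, otherwise take the value at that position); some books instead average the k-th and (k+1)-th values when the position comes out whole:

```python
import math

scores = [6, 12, 18, 8, 9, 10, 9, 15, 17, 15]

def percentile(values, p):
    """Locator method from the slides: position = p * n / 100,
    rounded up when it is a decimal, then take that value from
    the sorted data (positions are 1-based)."""
    s = sorted(values)
    pos = p * len(s) / 100
    idx = math.ceil(pos)            # round up to the next whole number
    return s[idx - 1]

print(percentile(scores, 10))       # position 1   -> 6
print(percentile(scores, 25))       # position 2.5 -> 3rd value -> 9
```

Deciles and quartiles reuse the same locator: Dk is `percentile(values, 10 * k)` and Qk is `percentile(values, 25 * k)`.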
Find P30, P50, D1, D5, Q1, Q2
50 57 63 69 72 74 77 80 82 84 87
50 59 65 69 72 75 77 80 82 84 87
50 59 66 69 72 75 77 80 82 85 88
50 60 66 69 72 75 77 81 83 86 89
50 60 68 70 73 75 78 81 83 86 89
50 60 68 71 73 75 79 81 84 86 91
Thanks!
Do you have any questions?