
Bsem 34 Chapter 1 Complete

This document provides an overview of advanced statistics, covering definitions, types of data, population and sample concepts, and measurement scales. It explains descriptive and inferential statistics, as well as methods for data collection and presentation, including frequency distribution tables. Additionally, it discusses measures of central tendency such as mean, median, and mode, along with their calculations for both ungrouped and grouped data.

Uploaded by

Chrisha Sarah

BSEM 34

Advanced
Statistics
Prepared by: Samantha N. Ame, LPT
01
Introduction
-definition
-nature of statistics
-population and sample
-variables and measurements
Statistics
Plural Sense - statistics are the numbers being studied (the data themselves) or numbers derived from the data.

Singular Sense - statistics is the science of developing and applying the most effective methods for planning experiments, collecting, organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
Types of Data
Qualitative
The data values are non-numeric categories.
Example:
- Gender
- Blood type
- Open-ended surveys
- Interviews

Quantitative
The data values are counts or numerical measurements. It can either be discrete or continuous.
Discrete: number of buildings, population, number of books
Continuous: temperature, weight, time
Nature of Statistics
● Statistics is descriptive: It summarizes the characteristics of data by calculating measures like mean, median, and mode.

● Statistics is inferential: It allows us to draw conclusions about a larger population based on a smaller sample.

● Statistics is probabilistic: It helps us understand the likelihood of events occurring, acknowledging there's always some level of uncertainty.
Population and
Sample
● Population: The entire collection of
individuals or items we are interested in
studying. (Imagine all students in a
university)
● Sample: A subset of the population
chosen to represent the larger group.
(Imagine a group of students selected
for a survey)
Types of Samples
Non-probability samples - the samples are drawn from the population deliberately.
- Convenience sampling
- Judgement sampling
- Snowball sampling
- Purposive sampling
- Quota sampling

Probability samples - the samples are drawn from the population by following a set of rules wherein each unit in the population has a known chance (an equal chance, in simple random sampling) to be chosen as a sample.
- Simple Random Sampling
- Stratified Random Sampling
- Systematic Sampling
- Cluster Sampling
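As a rough illustration of three of the probability sampling methods listed above, here is a minimal Python sketch. The function names, the student ID numbers, and the two year-level strata are hypothetical and are not taken from the slides; the sketch only assumes Python's standard `random` module.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th unit (k = N // n)."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

students = list(range(1, 101))  # hypothetical student ID numbers 1..100
year_levels = {"first_year": students[:50], "second_year": students[50:]}  # hypothetical strata

print(simple_random_sample(students, 10, seed=1))
print(systematic_sample(students, 10, seed=1))
print(stratified_sample(year_levels, 5, seed=1))
```

Fixing the seed only makes the illustration repeatable; in an actual survey the seed would normally be left unset.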
Variables and Measurements

Variables
A characteristic that can take on different values within a population or sample. (e.g., Height, weight, income)

Measurements
The specific value assigned to a variable for a particular individual or item. (e.g., John's height is 180 cm)
SCALES OF MEASUREMENTS/ HIERARCHY
OF MEASUREMENTS

NOMINAL/ CLASSIFICATORY
- lowest in the hierarchy
- assigns names/labels that carry no numeric value or order (the labels are arbitrary)
-brands, gender, color, religion
ORDINAL/ RANKING
- has the characteristics of nominal but can be ordered
- categories can be compared to tell which is higher or lower
- relative position of one case is known
- honor roll, rating of best movies, typhoon signal number
Weakness: we can't measure the degree of the ranking; we don't know the size of the difference between one rank and the next
INTERVAL (zero has a value; it is an arbitrary point, not an absence)

- Categories can now be numerically differentiated
- Describes a variable whose attributes are ranked and have equal distances between adjacent attributes (distances between positions are equal)
- IQ score, temperature (Celsius and Fahrenheit)
***weight and age are not interval scale; both have a true zero, so they are ratio scale
RATIO (zero means absence)

- Highest level of measurement
- Extends the interval scale by adding a true zero point
- birth rate, unemployment rate, weight, height, distance, temperature in Kelvin (0 K = no thermal energy)
SCALES OF MEASUREMENTS
1. Number of eggs laid by chickens
2. Amount of fertilizer given to plants
3. Weight of Pechay
4. Speed of car
5. Tomato plant variety
6. Color of alcohol packaging
7. Educational level of parents (high school grad, college grad, MS,
PhD)
8. Online seller satisfaction rating (1- 5 stars)
9. Number of Covid-19 positive cases
10. Test scores
02
Collection and
Presentation
of Data
DIFFERENT WAYS TO PRESENT DATA
Textual presentations are best suited for presenting complex
data and ideas in a detailed and nuanced way.
Tabular presentations are best suited for presenting large
amounts of data in a clear and organized way.
Graphical presentations are best suited for visualizing
patterns and trends in the data in a way that is easy to
understand.
Parts of a table:
- The column heads describe the data in each column.
- The first column on the left of the table describes the data in each row.

Source Note: International Telecommunication Union. (2023). ICT Facts and Figures 2023. https://www.itu.int/pub/D-PREF-BB.REG_OUT01-2023/en
WAYS OF ORGANIZING COLLECTED DATA
ARRAY - arrangement of raw data according to order of magnitude
- used if the data set is small and well-defined

FREQUENCY DISTRIBUTION TABLE - condensed version of an array
- categorizes data into intervals/classes
- used if the dataset is large
- order of data is not important; the counts matter more than the sequence

Imagine you have a collection of books. An array is like lining them up on a shelf, where
you can easily grab any specific book by its position. A frequency table is like sorting them
by genre, giving you an overall picture of how many horror, romance, and non-fiction books
you have, without needing to know the exact order on the shelf.
STEPS IN CONSTRUCTING FDT
01 Determine the RANGE
   Range = |highest value - lowest value|
02 NUMBER OF CLASSES
   K = 1 + 3.322 log N, where N = total number of values/population
03 CLASS SIZE
   C = R/K
04 LOWER LIMIT
   Lowest value is always the lowest limit
05 CLASS INTERVALS AND FREQUENCY
STEPS IN CONSTRUCTING FDT
06 CLASS MARK
   - average of the upper limit and lower limit
   (Lower limit + upper limit) ÷ 2
07 CLASS BOUNDARIES
   lower limit - 0.5
   upper limit + 0.5
08 RELATIVE FREQUENCY
   (Frequency of a value or group) / (Total number of data) * 100%
   Relative frequency always adds up to 100% for the entire data set.
CUMULATIVE FREQUENCY tells you how many observations lie above or below a certain value in a dataset.
09 < CF
   Start with the frequency of the 1st class, then add up the frequencies of each class.
10 > CF
   Start with the total frequency, then subtract the frequency of the first class, and so on…
Raw scores of students in a 200 item test
144 112 156 122 168 172 141 159 127 154
156 145 134 137 123 149 144 160 136 139
142 138 159 151 147 150 126 152 147 136
135 132 146 133 150 122 139 149 152 129
131 155 116 140 145 135 160 125 172 163

1. Range= |172-112|= 60
2. K= 1+ 3.322 log 50 = 6.643978 ≈ 7 ; always round up to the next whole number
3. Class size= R ÷ K= 60/ 7= 8.571428571 ≈ 9 ; always round up to the next whole (ODD) number
4. The lower limit is the lowest value in the raw dataset, which is 112.
5. The class intervals start at 112, and the next class starts at 121 because 112 + 9 (class size) = 121.
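The five steps above can be sketched in Python as a check on the hand computation. This is only an illustration under the slide's own rules (round K up, round the class size up to an odd whole number, start at the lowest score); the variable names are assumptions, not part of the source.

```python
import math

scores = [
    144, 112, 156, 122, 168, 172, 141, 159, 127, 154,
    156, 145, 134, 137, 123, 149, 144, 160, 136, 139,
    142, 138, 159, 151, 147, 150, 126, 152, 147, 136,
    135, 132, 146, 133, 150, 122, 139, 149, 152, 129,
    131, 155, 116, 140, 145, 135, 160, 125, 172, 163,
]

# Step 1: range = |highest value - lowest value|
data_range = max(scores) - min(scores)

# Step 2: K = 1 + 3.322 log N, rounded up to the next whole number
k = math.ceil(1 + 3.322 * math.log10(len(scores)))

# Step 3: class size = R / K, rounded up; bumped to the next odd number if even
class_size = math.ceil(data_range / k)
if class_size % 2 == 0:
    class_size += 1

# Steps 4-5: start at the lowest value and tally the scores in each class
classes = []
lower = min(scores)
while lower <= max(scores):
    upper = lower + class_size - 1
    frequency = sum(lower <= s <= upper for s in scores)
    classes.append((lower, upper, frequency))
    lower += class_size

print(data_range, k, class_size)
for lower, upper, frequency in classes:
    print(f"{lower}-{upper}: {frequency}")
```

Running it gives a range of 60, 7 classes, a class size of 9, and the frequencies 2, 7, 10, 12, 11, 5, 3.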
CLASS                         f     Class Mark   Class Bound.   Rel. Freq.   <CF   >CF
(Lower limit - Upper lim.)
112-120                       2
121-129                       7
130-138                       10
139-147                       12
148-156                       11
157-165                       5
166-174                       3
Total                         50
Class mark is the average of the upper limit and lower limit: (Lower limit + upper limit) ÷ 2

CLASS      f     Class Mark   Computation
112-120    2     116          (112 + 120) ÷ 2 = 232 ÷ 2 = 116
121-129    7     125          (121 + 129) ÷ 2 = 250 ÷ 2 = 125
130-138    10    134          (130 + 138) ÷ 2 = 268 ÷ 2 = 134
139-147    12    143          (139 + 147) ÷ 2 = 286 ÷ 2 = 143
148-156    11    152          (148 + 156) ÷ 2 = 304 ÷ 2 = 152
157-165    5     161          (157 + 165) ÷ 2 = 322 ÷ 2 = 161
166-174    3     170          (166 + 174) ÷ 2 = 340 ÷ 2 = 170
Total      50
Class boundaries: lower limit - 0.5 and upper limit + 0.5

CLASS      Class Bound.    Computation
112-120    111.5-120.5     112 - 0.5 = 111.5; 120 + 0.5 = 120.5
121-129    120.5-129.5     121 - 0.5 = 120.5; 129 + 0.5 = 129.5
130-138    129.5-138.5     . .
139-147    138.5-147.5     . .
148-156    147.5-156.5     . .
157-165    156.5-165.5
166-174    165.5-174.5
Relative frequency = (Frequency of a value or group) / (Total number of data) * 100%
(or you can just move the decimal point 2 places to the right so you don't have to multiply by 100)

CLASS      f     Rel. Freq. (%)   Computation
112-120    2     4                2/50 = 0.04 * 100 = 4
121-129    7     14               7/50 = 0.14 * 100 = 14
130-138    10    20               10/50 = 0.20 * 100 = 20
139-147    12    24               12/50 = 0.24 * 100 = 24
148-156    11    22
157-165    5     10
166-174    3     6
Total      50    100              4 + 14 + 20 + 24 + 22 + 10 + 6 = 100

Relative frequency always adds up to 100% for the entire data set.

<CF: start with the frequency of the 1st class, then add up the frequencies of each class.

CLASS      f     <CF    Computation
112-120    2     2      2
121-129    7     9      2 + 7 = 9
130-138    10    19     9 + 10 = 19
139-147    12    31     19 + 12 = 31
148-156    11    42     31 + 11 = 42
157-165    5     47     42 + 5 = 47
166-174    3     50     47 + 3 = 50
Total      50

The last row should be the same as the total frequency.
>CF: start with the total frequency, then subtract the frequency of the first class, and so on…

CLASS      f     >CF    Computation
112-120    2     50     50
121-129    7     48     50 - 2 = 48
130-138    10    41     48 - 7 = 41
139-147    12    31     41 - 10 = 31
148-156    11    19     31 - 12 = 19
157-165    5     8      19 - 11 = 8
166-174    3     3      8 - 5 = 3
Total      50

The last row should be the same as the frequency of its class, hence their difference should be zero.
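The remaining columns of the frequency distribution table can be generated from the class limits and frequencies alone. The sketch below is one possible way to do it in Python; the dictionary keys are illustrative names, and the 0.5 adjustment for boundaries assumes whole-number scores, as in this example.

```python
# Class limits and frequencies of the 50 test scores (lower limit, upper limit, f).
classes = [(112, 120, 2), (121, 129, 7), (130, 138, 10), (139, 147, 12),
           (148, 156, 11), (157, 165, 5), (166, 174, 3)]
total = sum(f for _, _, f in classes)

rows = []
less_than_cf = 0          # <CF accumulates from the first class downward
greater_than_cf = total   # >CF starts at the total and shrinks
for lower, upper, f in classes:
    less_than_cf += f
    rows.append({
        "class": f"{lower}-{upper}",
        "f": f,
        "class_mark": (lower + upper) / 2,
        "boundaries": (lower - 0.5, upper + 0.5),
        "rel_freq": f * 100 / total,   # percent
        "<CF": less_than_cf,
        ">CF": greater_than_cf,
    })
    greater_than_cf -= f

for row in rows:
    print(row)
```

The two checks described on the slides hold here: the last <CF equals the total frequency (50), and the last >CF equals the frequency of the last class (3).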
DESCRIPTIVE STATISTICS:
MEASURES OF CENTRAL TENDENCY
Measures of Central Tendency
- These are the points around which scores tend to cluster: the center of concentration of the scores.
- They give single numbers which describe the totality of the collected data

Mean The average

Median The middlemost

Mode The most frequent

mean is calculated, median is found, mode is observed


MEAN
- It is the most popular measure of central tendency
- Affected by Extremes: The mean is sensitive to outliers because it includes
every value in its calculation.
- Best for Symmetrical Data: When data is evenly distributed, the mean
often represents the typical value well.
- mu (μ) for population mean and x bar for sample mean

For ungrouped data: sum of all the values in the data set ÷ number of values
For grouped data: sum of the products of class mark and frequency over all classes, divided by the total frequency/population
For ungrouped data: sum of all the values in the data set ÷ number of values

The following are the scores of seven students during the


final examination. Compute for the mean score.

89, 75, 90, 85, 78, 87, and 80

x̄= (89+75+90+85+78+87+80) ÷ 7 = 83.43


For grouped data: sum of the products of class mark and frequency over all classes, divided by the total frequency/population

CLASS    f     C.M. (X)   fX
5-9      7     7          49
10-14    10    12         120
15-19    13    17         221
20-24    18    22         396
25-29    8     27         216
30-34    5     32         160
35-39    3     37         111
Total    64               1273

Mean = 1273 ÷ 64 = 19.89
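The grouped-data computation above can be reproduced with a few lines of Python. This is only a sketch of the same arithmetic; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table above
classes = [(5, 9, 7), (10, 14, 10), (15, 19, 13), (20, 24, 18),
           (25, 29, 8), (30, 34, 5), (35, 39, 3)]

total_f = sum(f for _, _, f in classes)
# Multiply each class mark by its frequency, then add the products.
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
grouped_mean = sum_fx / total_f

print(total_f, sum_fx, round(grouped_mean, 2))  # 64 1273.0 19.89
```

The same pattern (sum of weight times value, divided by the sum of weights) is what a weighted mean uses, with units in place of frequencies.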
A weighted mean is a type of average that assigns different weights to different data points. (The process is the same as computing the mean for grouped data.)

Subject    Units (W)   Grade (X)   WX
GNED 03    3           2.25        6.750
GNED 05    3           1.75        5.250
GNED 11    3           2.00        6.000
EDUC 50    3           2.00        6.000
EDUC 60    3           1.50        4.500
FITT 1     2           1.25        2.500
Total      17                      31.000

Weighted mean = 31 ÷ 17 = 1.82
MEDIAN
- It's the middle value when the data is arranged in ascending order.
- Resistant to Outliers: The median is less affected by outliers because it only depends on
the middle value, not all values.
- Best for Skewed Data: When data is skewed, the median is often a better measure of
central tendency than the mean.
- Denoted by Md

For ungrouped data: if the number of values is odd, take the middle value; if even, take the average of the 2 middle values.
For grouped data:
For ungrouped data: ALWAYS ARRANGE IN ASCENDING ORDER
If odd: just take the middlemost value
If even: take the average of two middle values

1. 149, 161, 150, 163, 152, 165, 155, 170, 160, 175
2. 89, 75, 90, 85, 78, 87, 80
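The odd/even rule can be written as a short Python function and applied to the two data sets above. The function name is illustrative; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    ordered = sorted(values)          # always arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd: the middlemost value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average of the two middle values

print(median([149, 161, 150, 163, 152, 165, 155, 170, 160, 175]))  # 160.5
print(median([89, 75, 90, 85, 78, 87, 80]))                        # 85
```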
For grouped data: Determine the class that contains the middle value. This
is the class where the cumulative frequency first reaches or exceeds half of the
total number of observations (n/2).
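The slide describes how to locate the median class but does not show a formula for the median itself. The sketch below assumes the commonly taught interpolation formula, Md = LB + ((n/2 - cf_before) / f_median) * c, where LB is the lower boundary of the median class, cf_before is the cumulative frequency before it, and c is the class size. Treat that formula and the function name as assumptions rather than the course's prescribed method.

```python
def grouped_median(classes, class_size):
    """classes: list of (lower limit, upper limit, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_before = 0
    for lower, upper, f in classes:
        # The median class is the first one whose <CF reaches or exceeds n/2.
        if cf_before + f >= half:
            lower_boundary = lower - 0.5   # assumes whole-number class limits
            return lower_boundary + (half - cf_before) / f * class_size
        cf_before += f

# Frequency distribution of the 50 test scores from the earlier example
fdt = [(112, 120, 2), (121, 129, 7), (130, 138, 10), (139, 147, 12),
       (148, 156, 11), (157, 165, 5), (166, 174, 3)]
print(grouped_median(fdt, 9))  # 143.0
```

Here n/2 = 25, the <CF first reaches 25 in the class 139-147 (<CF = 31), so the estimate is 138.5 + (25 - 19)/12 * 9 = 143.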
MODE
- Most Frequent Value: It's the value that occurs most often in the data set.
- Can Have Multiple Modes: A dataset can have no mode (if no value repeats), one mode,
or multiple modes (multiple values with the same highest frequency).
- Best for Categorical Data: The mode is often used for categorical data (like colors or
brands) where a numerical mean or median might not make sense.
- Denoted by Mo

For ungrouped data: observe the value that occurs most often.
For grouped data:
For ungrouped data:

1. 1, 2, 2, 2, 8, 1, 4, 10
2. 170, 165, 155, 160, 150
3. Color of cars owned by faculty: 40 white, 2 blue, 10 red
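For ungrouped data, finding the mode is a matter of counting. The following sketch uses `collections.Counter` from the Python standard library; the function name `modes` and the convention of returning an empty list when no value repeats are choices made for this illustration.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # no value repeats, so there is no mode
    # Every value tied for the highest count is a mode.
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 2, 2, 8, 1, 4, 10]))      # [2]
print(modes([170, 165, 155, 160, 150]))      # []
car_colors = ["white"] * 40 + ["blue"] * 2 + ["red"] * 10
print(modes(car_colors))                     # ['white']
```

The third case shows why the mode suits categorical data: a mean or median of car colors has no meaning, but the most frequent color does.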
For grouped data: refer to your book (page 100)
Test Scores

87, 65, 92, 78, 80, 73, 88, 60, 95, 81,
70, 84, 68, 75, 90, 72, 85, 79, 93, 67,
82, 74, 91, 76, 89, 63, 86, 71, 94, 77,
83, 66, 96, 69, 62, 97, 61, 98, 64, 99
Create a frequency distribution table out of this
dataset and identify the mean, median, and mode
DESCRIPTIVE STATISTICS:
MEASURES OF DISPERSION
Ungrouped data:
RANGE R= |HV- LV|

VARIANCE

STANDARD
DEVIATION
example
Scores of 7 students in a quiz: 4, 7, 8, 2, 2, 9, 3

Range: R= |HV- LV|

Variance:

Standard Deviation:
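The formulas for variance and standard deviation did not survive in this copy of the slides, so the sketch below shows both common conventions using Python's standard `statistics` module: the population version divides the sum of squared deviations by N, and the sample version divides by N - 1. Which one the course intends is an assumption the reader should confirm.

```python
import statistics

scores = [4, 7, 8, 2, 2, 9, 3]   # quiz scores of 7 students

data_range = max(scores) - min(scores)          # |HV - LV| = 9 - 2 = 7

# The mean is 5, and the squared deviations from it sum to 52.
pop_var = statistics.pvariance(scores)          # 52 / 7, about 7.43
sample_var = statistics.variance(scores)        # 52 / 6, about 8.67

pop_sd = statistics.pstdev(scores)              # square root of pop_var
sample_sd = statistics.stdev(scores)            # square root of sample_var

print(data_range)
print(round(pop_var, 2), round(pop_sd, 2))
print(round(sample_var, 2), round(sample_sd, 2))
```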
example
Scores of students in a quiz: 4, 7, 8, 2, 2, 8, 9, 2, 5, 7

Range: R= |HV- LV|

Variance:

Standard Deviation:
grouped data:
RANGE
R= |ULHC- LLLC|

ULHC- Upper Limit of the Highest Class


LLLC- Lower Limit of the Lowest Class
grouped data:
VARIANCE
Range:
Variance:
Standard Deviation:
Coefficient of Variation
Also called the relative standard deviation.
It expresses the standard deviation of your data as a percentage of the mean.
This allows you to compare the variability of two datasets, even if they're measured in different units.

CV = (standard deviation / mean) * 100

A higher CV indicates that your data points are further spread out from the mean, and vice versa.

It should only be used for data measured on a ratio scale. This means the scale has a true zero, so dividing by the mean makes sense.
Coefficient of Variation
You're a teacher who wants to compare the variability of scores on a recent science exam in two different classes (Class A and Class B).

                           Class A   Class B
Mean                       54.9      46.9
Standard Deviation         12.1      10.8
Coefficient of Variation   22.04%    23.03%

Lower CV = data points are clustered closer to the mean
Higher CV = data points are further spread out from the mean
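The two CV values in the table can be checked with a short Python sketch of the formula CV = (standard deviation / mean) * 100. The function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(12.1, 54.9)
cv_b = coefficient_of_variation(10.8, 46.9)
print(f"Class A: {cv_a:.2f}%  Class B: {cv_b:.2f}%")  # Class A: 22.04%  Class B: 23.03%
```

Class B has the smaller standard deviation but the larger CV, so relative to its mean its scores are slightly more spread out than those of Class A.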
Measures of Position
Measures of position are values below which a specified fraction or percentage of the observations in a given set must fall.
Percentile
Percentiles are values that divide a set of observations in an array into 100 equal parts.
Thus,

P1, read as first percentile, is the value below which 1% of the values fall.

P2, read as second percentile, is the value below which 2% of the values fall.

P99, read as ninety-ninth percentile, is the value below which 99% of the
values fall.
Decile
Deciles - are values that divide the array into 10 equal parts. Thus,

D1, read as first decile, is the value below which 10% of the values fall.

D2, read as second decile, is the value below which 20% of the values fall.
.
.
.
D9, read as ninth decile, is the value below which 90% of the values fall.
Quartile
Quartiles - are values that divide the array into 4 equal parts. Thus,

Q1, read as first quartile, is the value below which 25% of the values fall.

Q2, read as second quartile, is the value below which 50% of the values fall.

Q3, read as third quartile, is the value below which 75% of the values fall.
P10= D1
P20= D2
P25 =Q1
P30= D3
P40= D4
P50= D5 = Q2
P60= D6
P70= D7
P75 =Q3
P80= D8
P90= D9
P100=D10 = Q4
Find P10, P25, P50, D1, D5, Q1, Q2
The scores of ten students in a 20-point Math quiz are as follows:
6, 12, 18, 8, 9, 10, 9, 15, 17, 15

Position formulas: percentile (P*N/100), decile (D*N/10), quartile (Q*N/4)

1. Arrange in ascending order: 6, 8, 9, 9, 10, 12, 15, 15, 17, 18

2. Find P10 = 10*10/100 = 1; the value is located at the 1st position, which is 6. This means that 10% of the students got scores equal to or below 6, and 90% got scores equal to or above 6.

3. Find P25 = 25*10/100 = 2.5 ≈ 3 (always round up to the next whole number if decimal); the value is located at the 3rd position in the dataset, which is 9. This means that 25% of the students got scores equal to or below 9.
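The position rule used in this example (compute k*N/parts, round up if it is a decimal, then read the value at that position) can be sketched in Python as follows. The function name is illustrative, and the rule is the one stated on the slide; other textbooks and software packages interpolate between positions and may give slightly different values.

```python
def value_at_position(sorted_values, k, parts):
    """k-th percentile (parts=100), decile (parts=10), or quartile (parts=4), for k >= 1."""
    n = len(sorted_values)
    position = -(-k * n // parts)      # k*n/parts rounded up to the next whole number
    return sorted_values[position - 1] # positions are counted starting from 1

scores = sorted([6, 12, 18, 8, 9, 10, 9, 15, 17, 15])

print("P10 =", value_at_position(scores, 10, 100))  # 6
print("P25 =", value_at_position(scores, 25, 100))  # 9
print("P50 =", value_at_position(scores, 50, 100))  # 10
print("D1  =", value_at_position(scores, 1, 10))    # 6
print("D5  =", value_at_position(scores, 5, 10))    # 10
print("Q1  =", value_at_position(scores, 1, 4))     # 9
print("Q2  =", value_at_position(scores, 2, 4))     # 10
```

The results agree with the equivalences P10 = D1, P25 = Q1, and P50 = D5 = Q2.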
Find P30, P50, D1, D5, Q1, Q2

50 57 63 69 72 74 77 80 82 84 87

50 59 65 69 72 75 77 80 82 84 87

50 59 66 69 72 75 77 80 82 85 88

50 60 66 69 72 75 77 81 83 86 89

50 60 68 70 73 75 78 81 83 86 89

50 60 68 71 73 75 79 81 84 86 91
Thanks!
Do you have any questions?
