
Module 0. Review On Statistics

The document provides an overview of basic statistics, covering key concepts such as data types (quantitative and qualitative), population and sample definitions, and the importance of statistics in data collection and analysis. It distinguishes between descriptive and inferential statistics, outlines measurement types and properties, and discusses various data collection methods and sampling techniques. Additionally, it explains measures of central tendency, including mean, median, and mode, along with practical activities for applying these concepts.

REVIEW ON BASIC STATISTICS

Part 1
Data (Datum)

• Items in a record or report are facts expressed
in numbers or described by their quality or
kind. These facts are called data (singular: datum).
The major concern of Statistics is data and how to
deal with them.
• Ex. color of the eyes, scores, class size, height
Two Types of Data
• Quantitative
– can be measured and expressed in numbers
• Qualitative
– are facts for which no numerical measure exists
and are usually expressed in categories or kinds,
such as color of the skin, ideas, opinions, and verbal
answers to open-ended questions
Population, Sample:

Population – a collection of all the units
from which data are to be collected.

Sample – a subset or a representative
part of the population.
Census, sampling
• Census - the process where information is
gathered from all the units in the population
• Sampling - the process where information is
gathered from only a part of the population
• The information derived from the sample is
used to make some generalizations or
inferences about the whole population.
• Errors are unavoidable when these
generalizations are made.
• The role of statistics is to provide the
procedures so as to minimize these errors.
Statistics (Singular and plural sense)
• In singular sense: Statistics is a branch of
Science that deals with the development of
methods for a more effective way of
collecting, organizing, presenting, and
analyzing data.
• In plural sense: statistics can mean the data
themselves or some numerical values computed
from the data.
2 Major Areas of Statistics

1. Descriptive Statistics
- deals largely with summary calculations, graphical
displays and describing important features of a set
of data. It does not attempt to draw
conclusions/insights about anything that pertains to
more than the data themselves.
2. Inductive / Inferential Statistics
- is concerned with making generalizations from
information gathered from a small group of
observations (sample) to a bigger group of
observations (population).
- It is equipped with a large number of
analytical tools that allow the investigator to
better understand the population from
which the sample data were gathered, based only on the
information contained in the sample.
Measurement

– The process of assigning a number or a numerical
value to a characteristic of the object that is being
measured.
– Another way of looking at data is by the way they
are measured. Measurements are always
associated with numbers that have some
interesting properties or characteristics.
4 Properties of Numbers

1. Identity
– it enables a person to distinguish one number
from the other.
– They are identified by their shapes or the way they
are written.
– This is the simplest property of numbers.
2. Order
– it refers to the way numbers are arranged in a
sequence.
– It is an established convention that 1 comes
before 2, 2 comes before 3, and so on. We also say
that “7 is greater than 6” or “3 is less than 5”.
3. Additivity / equality of scale

– the property that allows us to add numbers.
When we say “3g + 5g = 8g”, we are confident
that we are correct because the same scale
(equal units) is used throughout.
4. Absolute zero
– a value of zero means that the object has none of
the characteristic that is being measured. In
temperature, the characteristic that is being
measured is ‘amount of heat’. A temperature of zero
degrees Celsius does not mean that the object
has no heat, so the Celsius scale has no absolute
zero. A length of zero meters does mean that the
object has ‘no length’, so length has an absolute zero.
Four Types of measurement
• Nominal – property: identity only
• Ordinal – properties: order and identity
• Interval – properties: equality of scale, order
and identity
• Ratio – properties: absolute zero, equality of
scale, order and identity
• A large number of statistical analysis tools are
available for each type of measurement. It is
important that the statistical user have a good
understanding of the type of data that is to be
processed in order that the chosen statistical
tool is used properly.
Types of Variables
• Variables are the characteristics or properties
measured from objects, persons or things.
• Types:
– Discrete variables: can be counted, and thus take
whole-number values.
• Ex. Number of passers or failures in the LET
– Continuous variables: can be measured using some
unit of measurement, and may take
decimal values.
• Ex. Weights, heights, ages
Activity 1. Get ¼ sheet of paper. Find a partner. Write
your names. Answer in the same format. (5 minutes
only)

1. What type of data are we getting from the following? Write
‘qt’ if quantitative, ‘ql’ if qualitative on the space provided.
___ width ___ scores ___ mass ___ temperature in °C
___ ratings in % ___ gender ___ income ___ height in ft
___ dean’s list ___ preferences ___ perceptions ___ civil status
2. What type of measurement are we getting from the following? Write N if
nominal, O if ordinal, I if interval, or R if ratio:
___ width ___ scores ___ mass ___ temperature in °C
___ ratings in % ___ gender ___ income ___ height in ft
___ dean’s list ___ preferences ___ perceptions ___ civil status
3.
Four Methods of Collecting Data
• By Interview
• By Questionnaires
• By Direct Observation
• By utilizing Existing Records
– published or unpublished,
– primary (first-hand, not yet subjected to
any transcription or condensation) or secondary
(transcribed or compiled from original sources)
Sampling Techniques
• Doing a census, that is, studying the entire
population, is not always feasible because of
limited resources like money and time.
• Researchers often resort to sample
surveys instead.
• To make reliable inferences about the
population from which the sample was taken, one
should select a sample that is a good
representation of the population, that is, an unbiased
sample.
2 major Sampling Techniques
• Probability/Random Sampling
– A kind of sample selection where each member of
the target population has a known non-zero
chance of being selected
• Non-probability /Non-random Sampling
Probability/Random Sampling

– Simple random
– Systematic
– Stratified
– Cluster
– Multi-stage
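The first three probability techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 student IDs, the two sections used as strata, the 10% sampling fraction, and the function names are all hypothetical and do not come from the slides. Cluster and multi-stage sampling are left out for brevity.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit, where k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction from every stratum (proportional allocation)."""
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample

rng = random.Random(0)            # fixed seed so the sketch is repeatable
students = list(range(1, 101))    # hypothetical population: 100 student IDs

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)
strata = {"section A": students[:60], "section B": students[60:]}
stratified = stratified_sample(strata, 0.10, rng)

print(srs)
print(systematic)
print(stratified)
```

In each case every member of the population has a known, non-zero chance of selection, which is what separates these techniques from the non-probability ones.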
Non-probability /Non-random Sampling

– Convenience/availability
– Quota
– Purposive/Judgmental
– Snowball/Referral
– Panel
Score Data
• After checking your students’ test papers you
now have a set of data. They need to be
organized. Statistical organization of scores is a
systematic arrangement or grouping of scores.
The purpose is to determine their significant
meaning.
Presentation of data
• Tabulating
• Ordering
• Ranking
• Grouping
• Graphing
Tabulating: The talligram
• Scores:
86 74 66 70 75 56 69
70 73 66 74 81 60 76
80 81 61 67 63 68 73
63 75 71 58 72 83 69
79 67 68 64 69 73 69
78 88 62 76 72 65 66
70 73 61 78 84 77
(Rows: tens digit of the score. Columns: units digit. T: totals.)

      0     1     2     3     4     5     6     7     8     9     T
8     1     11          1     1           1           1           7
7     111   1     11    1111  11    11    11    1     11    1     20
6     1     11    1     11    1     1     111   11    11    1111  19
5                                         1           1           2
T     5     5     3     7     4     3     7     3     6     5     48
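The talligram above can be reproduced programmatically. The following is a minimal sketch, assuming each two-digit score is split into a tens digit (row) and a units digit (column); the function name `talligram` is made up for this illustration.

```python
from collections import Counter

scores = [
    86, 74, 66, 70, 75, 56, 69,
    70, 73, 66, 74, 81, 60, 76,
    80, 81, 61, 67, 63, 68, 73,
    63, 75, 71, 58, 72, 83, 69,
    79, 67, 68, 64, 69, 73, 69,
    78, 88, 62, 76, 72, 65, 66,
    70, 73, 61, 78, 84, 77,
]

def talligram(values):
    """Count scores by (tens digit, units digit)."""
    return Counter(divmod(v, 10) for v in values)

cells = talligram(scores)
tens_digits = sorted({t for t, _ in cells}, reverse=True)
row_totals = {t: sum(cells[(t, u)] for u in range(10)) for t in tens_digits}
col_totals = [sum(cells[(t, u)] for t in tens_digits) for u in range(10)]

# Print one tally mark ("1") per score, as in the slide.
print("    " + " ".join(f"{u:>4}" for u in range(10)) + "    T")
for t in tens_digits:
    marks = " ".join(f"{'1' * cells[(t, u)]:>4}" for u in range(10))
    print(f"{t:>3} {marks} {row_totals[t]:>4}")
print("  T " + " ".join(f"{c:>4}" for c in col_totals) + f" {sum(col_totals):>4}")
```

The row and column totals both add up to n = 48, which is a quick check that no score was missed.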
Ordering
– The data is arranged in descending (highest to lowest)
or ascending (lowest to highest) order writing each
score as many times as it occurs. Ordered
arrangement of scores is a prerequisite to ranking of
scores
Ranking
• Arrange the scores in descending order (from highest to lowest).
Write each score as many times as it occurs in one column.
This is the first column.
• Number each score consecutively from 1 to n, where n equals
the number of scores. This is the second column.
• In the third column, write the rank of each score.
– The rank of a score occurring once is the same as its consecutive
number.
– To find the rank of a score occurring two or more times, add the first
and the last consecutive numbers of the score and divide the sum by
two. The result is the rank.
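The three-column ranking procedure can be sketched as follows. The eight scores in the usage line are hypothetical, chosen so that one score occurs twice and another three times; `rank_scores` is a name introduced here for illustration.

```python
def rank_scores(scores):
    """Return (score, consecutive number, rank) rows, highest score first."""
    ordered = sorted(scores, reverse=True)
    # Record the first and last consecutive number of each distinct score.
    first, last = {}, {}
    for position, score in enumerate(ordered, start=1):
        first.setdefault(score, position)
        last[score] = position
    # Rank = (first + last) / 2; for a score occurring once this is its own number.
    return [(s, i, (first[s] + last[s]) / 2)
            for i, s in enumerate(ordered, start=1)]

rows = rank_scores([90, 85, 85, 80, 75, 75, 75, 60])  # hypothetical scores
for score, number, rank in rows:
    print(score, number, rank)
```

Here the two 85s occupy positions 2 and 3 and share rank 2.5, while the three 75s occupy positions 5 to 7 and share rank 6.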
Activity 2:
• Make a talligram out of the following scores in
CPE105 final exam and find their ranks.
94 76 56 79 80 87 68 75 70
95 93 90 76 87 67 76 87 87
96 68 93 56 76 87 94 51 60
97 51 56 76 67 87 90 56 88
98 85 59 57
Grouping: Class frequency distribution
• Find the range: R = Hs – Ls (highest score minus
lowest score).
• Determine/estimate the number of
intervals/classes, k: k = √n
• Find the class width (c), or the width of the
interval: c = R/k
• Find the lowest limit and the other limits of the
classes.
• Tally the scores.
• Write the frequencies, class boundaries, and
cumulative frequencies.
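The first three steps are simple arithmetic and can be sketched as below. The rounding conventions are assumptions, since the slides do not state them: k is rounded to the nearest whole number and c is rounded up so that the classes cover the whole range. The usage line takes the highest score (88), the lowest score (56), and n = 48 from the scores listed under the talligram.

```python
import math

def class_plan(highest, lowest, n):
    """Return the range R, number of classes k, and class width c."""
    r = highest - lowest          # R = Hs - Ls
    k = round(math.sqrt(n))       # k = sqrt(n), rounded (assumed convention)
    c = math.ceil(r / k)          # c = R / k, rounded up (assumed convention)
    return r, k, c

r, k, c = class_plan(highest=88, lowest=56, n=48)
print(r, k, c)  # 32 7 5
```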
Ex: Class Frequency Distribution Table

Class      Tally          f    Class          Midpts   Cum F   Cum F
Interval                       Boundaries     (cm)     (<)     (>)
87 – 91    1              1    86.5 – 91.5    89       48      1
82 – 86    111            3    81.5 – 86.5    84       47      4
77 – 81    1111111        7    76.5 – 81.5    79       44      11
72 – 76    111111111111   12   71.5 – 76.5    74       37      23
67 – 71    1111111111     10   66.5 – 71.5    69       25      33
62 – 66    11111111       8    61.5 – 66.5    64       15      41
57 – 61    1111111        7    56.5 – 61.5    59       7       48
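The derived columns of the table (class boundaries, midpoints, and the two cumulative frequencies) follow mechanically from the class limits and frequencies. A minimal sketch, assuming whole-number class limits so that boundaries lie 0.5 below and above the limits; the function and key names are made up for illustration.

```python
from itertools import accumulate

def frequency_table(lower_limits, freqs, width):
    """Build table rows for classes listed from lowest to highest."""
    less_than = list(accumulate(freqs))                      # cf<
    greater_than = list(accumulate(reversed(freqs)))[::-1]   # cf>
    rows = []
    for low, f, cf_lt, cf_gt in zip(lower_limits, freqs, less_than, greater_than):
        high = low + width - 1
        rows.append({
            "class": (low, high),
            "f": f,
            "boundaries": (low - 0.5, high + 0.5),
            "midpoint": (low + high) / 2,
            "cf<": cf_lt,
            "cf>": cf_gt,
        })
    return rows

# Class limits and frequencies from the example table, lowest class first.
rows = frequency_table([57, 62, 67, 72, 77, 82, 87], [7, 8, 10, 12, 7, 3, 1], 5)
for row in reversed(rows):  # print highest class first, as in the slide
    print(row)
```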
Graphing: Graphical Presentation of Class Frequency Distributions

• Frequency polygon: midpts, f
• Bar Chart: class limits, f
• Histogram: class boundaries, f
• Frequency Ogive: upper class boundaries, cf< (or
lower class boundaries, cf>)
• Pie Chart: frequencies – percentages – angles
• Pictograph: pictures
Descriptive Measures:

• Measures of Central Tendency
• Other Measures of location
• Measures of Variability
• Measures of Divergence from Normality
• Measures of Relationship/Association
Measures of Central Tendency
• Mean
• Median
• Mode
1. Mean – indicates the center of gravity of
the distribution; each score contributes
to its determination.
- is the most stable measure of
central tendency. However, it is easily
affected by extreme scores; when the
data contain extreme scores, the
MEDIAN is used instead.
The Mean (ungrouped data)

$$\bar{X} = \frac{\sum x}{n}$$

where: x are the scores,
n is the total number of scores
∑ is the symbol for summation
The Weighted Mean:

$$\bar{X} = \frac{\sum x_i w_i}{\sum w_i}$$

where: xi’s are the scores,
wi’s are the weights of each score
∑ is the symbol for summation
The Mean (grouped data)
$$\bar{X} = \frac{\sum f_i x_i}{n}$$

where: fi is the class frequency of the ith class
interval
xi is the class mark or the midpoint of
the ith class interval
n is the total number of scores
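The three mean formulas translate directly into code. The sketch below uses function names invented for illustration. The ungrouped scores are the Math 1 scores from the Examples slide, the grouped frequencies and midpoints are those of the example class frequency distribution table, and the grades and unit weights in the weighted-mean line are hypothetical.

```python
def mean(scores):
    """Ungrouped mean: sum of the scores divided by n."""
    return sum(scores) / len(scores)

def weighted_mean(scores, weights):
    """Weighted mean: sum of x_i * w_i divided by the sum of the weights."""
    return sum(x * w for x, w in zip(scores, weights)) / sum(weights)

def grouped_mean(freqs, midpoints):
    """Grouped mean: sum of f_i * x_i divided by n, where n = sum of f_i."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

math1 = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
print(round(mean(math1), 2))                         # 47.36

# Hypothetical grades weighted by hypothetical course units.
print(weighted_mean([85, 90, 78], [3, 2, 5]))        # 82.5

f = [1, 3, 7, 12, 10, 8, 7]
x = [89, 84, 79, 74, 69, 64, 59]
print(round(grouped_mean(f, x), 2))                  # 70.77
```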
2. Median – is the middlemost score when the scores
are arranged in order of magnitude.
If the number of scores is odd, the
median is the middlemost score. If even, the
median is the average of the 2
middlemost scores.
- this is the point on a scale of
measurement above which and below
which 50% of the observations fall.
The Median (ungrouped data)

$$\tilde{X} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & \text{if } n \text{ is even} \end{cases}$$

where the xi’s are the scores arranged from lowest
to highest.
The Median (grouped data)
$$\tilde{X} = L_m + \left(\frac{n/2 - cf_<}{f}\right)c$$

where Lm = lower class boundary of the median class
n = total frequency or total number of observations
cf< = cumulative frequency equal to or next lower than n/2
c = class interval (class width)
f = frequency of the median class
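Both median formulas can be sketched as follows. The function names are hypothetical. For grouped data the classes are assumed to be listed from lowest to highest, and the median class is taken to be the first class whose cumulative frequency exceeds n/2, which makes cf< the cumulative frequency equal to or next lower than n/2.

```python
def median(scores):
    """Ungrouped median of the scores."""
    xs = sorted(scores)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # the ((n+1)/2)-th score
    return (xs[mid - 1] + xs[mid]) / 2    # average of the two middle scores

def grouped_median(lower_boundaries, freqs, width):
    """Grouped median: Lm + ((n/2 - cf<) / f) * c."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    cf_below = 0                          # cf< of the class being examined
    for lm, f in zip(lower_boundaries, freqs):
        if cf_below + f > n / 2:          # this is the median class
            return lm + (n / 2 - cf_below) / f * width
        cf_below += f

# Math 1 scores from the Examples slide.
math1 = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
print(median(math1))                      # 48.5

# Lower class boundaries and frequencies of the example table, lowest first.
boundaries = [56.5, 61.5, 66.5, 71.5, 76.5, 81.5, 86.5]
freqs = [7, 8, 10, 12, 7, 3, 1]
print(grouped_median(boundaries, freqs, 5))
```

For the table, n/2 = 24 falls in the class 67 – 71 (cf< = 15, f = 10), so the grouped median is 66.5 + (9/10)(5) = 71.0.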
The Mode (ungrouped data)

- It indicates the center of
concentration in the distribution
- the most frequently occurring score
in a set of data
- the score with the highest frequency.
The Mode (grouped data)

Crude mode: $$\hat{X} = L_m + \frac{c}{2}$$

where c is the class interval and
Lm is the lower class boundary of the modal class

Refined mode: $$\hat{X} = 3\tilde{X} - 2\bar{X}$$

(three times the median minus twice the mean)
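A short sketch of the three mode computations follows. The function names are invented for this example. The ungrouped mode is returned as a list because a data set can have more than one most frequent score; the median and mean passed to the refined-mode line are hypothetical values.

```python
from collections import Counter

def modes(scores):
    """Ungrouped mode(s): the score(s) with the highest frequency."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)

def crude_mode(lower_boundaries, freqs, width):
    """Crude mode: midpoint of the modal class, Lm + c/2."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    return lower_boundaries[i] + width / 2

def refined_mode(median, mean):
    """Refined mode: 3 * median - 2 * mean."""
    return 3 * median - 2 * mean

math1 = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
print(modes(math1))                                    # [38]

# Lower class boundaries and frequencies of the example table, lowest first.
boundaries = [56.5, 61.5, 66.5, 71.5, 76.5, 81.5, 86.5]
freqs = [7, 8, 10, 12, 7, 3, 1]
print(crude_mode(boundaries, freqs, 5))                # 74.0

print(refined_mode(median=70, mean=68))                # 74 (hypothetical inputs)
```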
Importance of the central measure:

1. It gives a concise description of
the performance of the group as a
whole.
2. It enables us to compare 2 or more
groups in terms of typical
performance.
Examples
1. Given the following test scores in Math 1, find the mean, median and mode.
Scores: 71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25
Solution:

$$\bar{X} = \frac{71 + 68 + 68 + 58 + 55 + 52 + 52 + 45 + 38 + 38 + 38 + 30 + 25 + 25}{14} = \frac{663}{14} = 47.36$$
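The solution above works out only the mean. As a cross-check, and to obtain the median and mode that the problem also asks for, the same scores can be passed to Python's standard `statistics` module. This is a convenience check, not the method used in the slides.

```python
import statistics

scores = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]

mean = statistics.mean(scores)      # 663 / 14
median = statistics.median(scores)  # n = 14 is even: average of 7th and 8th scores
mode = statistics.mode(scores)      # most frequent score

print(round(mean, 2), median, mode)  # 47.36 48.5 38
```

The median is the average of the two middlemost scores, (45 + 52) / 2 = 48.5, and the mode is 38, which occurs three times.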
ACTIVITY 3: BY TWOS. Find the mean, median, and
mode.

1. Ungrouped Scores

70 42 30 27
70 42 30 26
68 34 30 26
52 34 30 19
52 34 30 19
2. Grouped Scores

Class      fi   xi
85 – 89    2    87
80 – 84    1    82
75 – 79    3    77
70 – 74    4    72
65 – 69    5    67
60 – 64    7    62
55 – 59    5    57
50 – 54    2    52
45 – 49    4    47
40 – 44    3    42
35 – 39    2    37
30 – 34    2    32

Highest Score: 87
Lowest Score: 34
Other Measures of location
• Other measures of location, which describe or
locate non-central positions in a set of data,
are referred to as quantiles or fractiles. The most
common fractiles are percentiles,
deciles, and quartiles.
Percentiles
• are values that divide an ordered set of
observations into 100 equal parts denoted by
P1, P2, …, P99 such that 1% of the data falls
below P1, 2% of the data falls below P2, …,
and 99% of the data falls below P99.
Deciles
• are values that divide an ordered set of
observations into 10 equal parts denoted by D1,
D2, …, D9 such that 10% of the data falls
below D1, 20% of the data falls below D2, …,
and 90% of the data falls below D9.
Quartiles
• are values that divide an ordered set of
observations into 4 equal parts denoted by Q1,
Q2, and Q3 such that 25% of the data falls
below Q1, 50% of the data falls below Q2 and
75% of the data falls below Q3.
To solve for percentiles, deciles or quartiles
from ungrouped data
• 1. Arrange data in increasing order of magnitude
(ascending order).
• 2. Solve for the value of L where
L = mn / 100 for percentiles
L = mn / 10 for deciles
L = mn / 4 for quartiles
where m is the index of the desired percentile,
decile, or quartile (e.g. m = 63 for P63)
n is the number of observations
• 3. If L is an integer, the desired fractile is
the average of the Lth and the (L+1)th observations.
If L is fractional, round L up to the
next higher integer to find the required
location. The fractile is the value in
that location.
• Example: Find P63, D8, and Q1 in the following set of score data in
Bio 1: 95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78

Solution:
The data arranged in ascending order:
34, 39, 45, 56, 56, 58, 67, 76, 78, 87, 91, 95. n = 12

P63: L = 63(12) / 100 = 7.56 → 8.
This means that the 8th value in the set of data is
the 63rd percentile. Therefore, P63 = 76. This means
that 63% of the data falls below 76.

D8: L = 8(12) / 10 = 9.6 → 10.
This means that the 10th value in the data is
the 8th decile. Therefore, D8 = 87, which means that 80%
of the data falls below 87.

Q1: L = 1(12) / 4 = 3.
This means that the 1st quartile is the average
of the 3rd and the 4th values in the data. Hence, Q1
= (45 + 56) / 2 = 50.5. This further means that 25% of
the data falls below 50.5.
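The location rule used in this example can be sketched as a single function. The name `fractile` and its `parts` parameter (100 for percentiles, 10 for deciles, 4 for quartiles) are introduced here for illustration; m is assumed to be a whole number between 1 and parts − 1.

```python
import math

def fractile(scores, m, parts):
    """m-th fractile of ungrouped data, using L = m * n / parts."""
    xs = sorted(scores)
    n = len(xs)
    if (m * n) % parts == 0:
        # L is an integer: average the L-th and (L+1)-th observations.
        loc = m * n // parts
        return (xs[loc - 1] + xs[loc]) / 2
    # L is fractional: round up and take the value in that location.
    loc = math.ceil(m * n / parts)
    return xs[loc - 1]

bio1 = [95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78]
print(fractile(bio1, 63, 100))  # P63 = 76
print(fractile(bio1, 8, 10))    # D8  = 87
print(fractile(bio1, 1, 4))     # Q1  = 50.5
```

Other textbooks and software packages use different interpolation rules for fractiles, so results from other tools may differ slightly from this rule.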
From Grouped Data:
• a.) Percentile: $$P_m = L_m + \left(\frac{mn/100 - cf_<}{f_m}\right)c$$

• b.) Decile: $$D_m = L_m + \left(\frac{mn/10 - cf_<}{f_m}\right)c$$

• c.) Quartile: $$Q_m = L_m + \left(\frac{mn/4 - cf_<}{f_m}\right)c$$
Example:
1.) Using the data given in Table 2, compute
for the following:
a.)P43
b.)D9
c.)Q3
Table 2.
Class      f     Class          Cum Freq
                 Boundaries     (cf<)
87 – 91    1     86.5 – 91.5    48
82 – 86    3     81.5 – 86.5    47
77 – 81    7     76.5 – 81.5    44
72 – 76    12    71.5 – 76.5    37
67 – 71    10    66.5 – 71.5    25
62 – 66    8     61.5 – 66.5    15
57 – 61    7     56.5 – 61.5    7
n = 48
Solution:
a.) P43: mn/100 = 43(48)/100 = 20.64 → class (67 – 71)
cf< = 15
Lm = 66.5
fm = 10
c = 5

$$P_{43} = 66.5 + \left(\frac{20.64 - 15}{10}\right)5 = 69.32$$

This means that 43% of the data falls below 69.32.

b.) D9: mn/10 = 9(48)/10 = 43.2 → class (77 – 81)
cf< = 37
Lm = 76.5
fm = 7
c = 5

$$D_9 = 76.5 + \left(\frac{43.2 - 37}{7}\right)5 = 80.93$$

This means that 90% of the data falls below 80.93.

c.) Q3: mn/4 = 3(48)/4 = 36 → class (72 – 76)
cf< = 25
Lm = 71.5
fm = 12
c = 5

$$Q_3 = 71.5 + \left(\frac{36 - 25}{12}\right)5 = 76.08$$

This means that 75% of the data falls below 76.08.
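The three grouped-data solutions share one pattern, which the sketch below captures in a single function. The name `grouped_fractile` and its `parts` parameter (100, 10, or 4) are assumptions made for this illustration; the classes are taken from Table 2 and listed from lowest to highest.

```python
def grouped_fractile(lower_boundaries, freqs, width, m, parts):
    """Grouped fractile: Lm + ((m*n/parts - cf<) / fm) * c."""
    n = sum(freqs)
    target = m * n / parts
    cf_below = 0                       # cf< of the class being examined
    for lm, f in zip(lower_boundaries, freqs):
        if cf_below + f > target:      # first class whose cf exceeds the target
            return lm + (target - cf_below) / f * width
        cf_below += f
    raise ValueError("m must be smaller than parts")

# Table 2, lowest class first.
boundaries = [56.5, 61.5, 66.5, 71.5, 76.5, 81.5, 86.5]
freqs = [7, 8, 10, 12, 7, 3, 1]

p43 = grouped_fractile(boundaries, freqs, 5, 43, 100)
d9 = grouped_fractile(boundaries, freqs, 5, 9, 10)
q3 = grouped_fractile(boundaries, freqs, 5, 3, 4)
print(round(p43, 2), round(d9, 2), round(q3, 2))  # 69.32 80.93 76.08
```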


MEASURES OF VARIABILITY
1. Range: R = Hs – Ls
2. Quartile Deviation: Q = (Q3 – Q1) / 2
3. Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
4. Standard Deviation: $s = \sqrt{s^2}$
- this is the most stable measure of
variability. When there are extreme scores,
we use the Quartile Deviation to measure
spread or variability.
5. Coefficient of Variation: $CV = s / \bar{x}$
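The five measures can be computed together, as in the sketch below. The eight scores are hypothetical, and the function names are invented for illustration. The quartiles are located with the ungrouped-data rule given earlier (L = mn/4; average two observations when L is an integer, otherwise round up), which is one of several conventions in use.

```python
import math

def quartile(scores, m):
    """m-th quartile of ungrouped data, using L = m * n / 4."""
    xs = sorted(scores)
    n = len(xs)
    if (m * n) % 4 == 0:
        loc = m * n // 4
        return (xs[loc - 1] + xs[loc]) / 2
    return xs[math.ceil(m * n / 4) - 1]

def variability(scores):
    """Return the range, quartile deviation, variance, SD, and CV."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # divisor n - 1
    sd = math.sqrt(variance)
    return {
        "range": max(scores) - min(scores),
        "quartile_deviation": (quartile(scores, 3) - quartile(scores, 1)) / 2,
        "variance": variance,
        "sd": sd,
        "cv": sd / mean,
    }

summary = variability([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical scores
for name, value in summary.items():
    print(name, round(value, 4))
```

For these scores the mean is 5, the sum of squared deviations is 32, and the variance is 32/7 ≈ 4.5714.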
Ex. Measures of Dispersion/Variability
50 Boys          50 Girls         Interpretations
X̄ = 34.6         X̄ = 34.5         No significant difference
R = 51 – 15      R = 45 – 19      The boys’ scores are more variable than the girls’.
  = 36             = 26           The boys’ scores are more heterogeneous, while
                                  the girls’ are more homogeneous.
Activity 4 : By twos
Given the following sets of score data,
History 5 Sec A: 54, 45, 67, 89, 68, 53, 78
History 5 Sec B: 75, 46, 79, 56, 67, 35, 56

1. Find the mean of the scores. Which section is
performing better than the other?
2. Find the standard deviation of the scores.
Which section is more variable than the
other?
Measures of Divergence from Normality

1. Skewness – a distribution is said to be skewed
when it is bent to the right or to the left; that
is, when the scores and the center of gravity of the
distribution are shifted to one side or the other.
The reference point is 0. The nearer the index of
skewness is to 0, the more normal the
distribution; the farther it is from 0, the
more skewed the distribution.
a. Negatively skewed – the scores pile up at the upper-score
end and the curve is bent to the left (its tail extends
toward the low scores). It represents a bright
class or a very easy test.
b. Positively skewed – the scores pile up at the
lower-score end and the curve is bent to the
right (its tail extends toward the high scores).
It represents a dull class or a very difficult test.


2. Kurtosis – refers to the “peakedness” or flatness of a
frequency distribution as compared with the normal.
a. Leptokurtic – more peaked than
normal
b. Platykurtic – flatter than normal
c. Mesokurtic – when normal
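The slides do not say which index of skewness or kurtosis is meant. The sketch below assumes one common choice, the moment coefficients: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 − 3, where mk is the k-th central moment. Under this assumption both indices are 0 for a normal distribution; positive excess kurtosis indicates a leptokurtic shape and negative a platykurtic one. The data sets are hypothetical.

```python
def shape_indices(scores):
    """Return (moment skewness, excess kurtosis) of the scores."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]       # hypothetical: no skew
right_tail = [1, 1, 1, 2, 10]     # hypothetical: scores pile up at the low end
left_tail = [1, 9, 10, 10, 10]    # hypothetical: scores pile up at the high end

for data in (symmetric, right_tail, left_tail):
    skew, kurt = shape_indices(data)
    print(data, round(skew, 2), round(kurt, 2))
```

The symmetric set gives a skewness of 0, the set piled up at the low end gives a positive value, and the set piled up at the high end gives a negative value, matching the descriptions of positive and negative skew above.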
NORMAL CURVE
Measures of Correlation/Association
• Spearman Rank Correlation
• Pearson r
• Spearman Brown Formula
• Kuder-Richardson Formula 20/21
