Intro To Variability
A Course Presentation in
STATISTICS
Course Content
• Basic Concepts
• Measures of Central Tendency
• Measures of Variability
• Measures of Correlation
• Non-parametric Test
• Parametric Tests of Hypothesis
• One-Way ANOVA
Introduction to
Statistics
• Definition
• The Nature of Data
• Uses of Statistics
• Methods of Sampling
• Statistics and Computers
What is
Statistics?
Statistics
Definition
Statistics is a collection of
methods for planning
experiments, obtaining data and
then organizing, summarizing,
presenting, analyzing,
interpreting and drawing
conclusions based on the data.
Nature of Data
Population vs. Sample
A population is the complete
collection of elements (scores,
people, measurements, and so on)
to be studied.
A sample is a sub-collection of
elements drawn from a population.
Parameter vs. Statistic
A parameter is a numerical
measurement describing some
characteristic of a population.
A statistic is a numerical
measurement describing some
characteristic of a sample.
Qualitative vs.
Quantitative Data
Qualitative (or categorical or
attribute) data can be separated into
different categories that are
distinguished by some non-numerical
characteristic.
Quantitative data consist of numbers
representing counts or measurements.
Discrete vs. Continuous Data
Discrete data result from either a
finite number of possible values or a
countable number of possible values.
(That is, the possible values can be
counted: 0, 1, 2, and so on.)
Continuous data result from
infinitely many possible values that
can be associated with points on a
continuous scale in such a way that
there are no gaps or interruptions.
Levels of Measurement
Nominal Level of
Measurement
The nominal level of measurement is
characterized by data that consist
of names, labels, or categories only.
The data cannot be arranged in an
ordering scheme.
Example: gender, civil status,
nationality, religion, etc.
Ordinal Level of
Measurement
The ordinal level of measurement
involves data that may be arranged in
some order, but differences between
data values either cannot be
determined or are meaningless.
Example: good, better or best
speakers; 1 star, 2 star, 3 star movie;
employee rank
Interval Level of
Measurement
The interval level of measurement is
like the ordinal level, with the
additional property that meaningful
amounts of differences between data
can be determined. However, there
is no inherent (natural) zero starting
point.
Example: body temperature, year
(1955, 1843, 1776, 1123, etc.)
Ratio Level of
Measurement
The ratio level of measurement is
the interval level modified to include the
inherent zero starting point. For
values at this level, differences and
ratios are meaningful.
Example: weights of plastic, lengths
of movies, distances traveled by cars
Determining Adequate
Sample Size
Sampling Formula
(Slovin’s)
$$n = \frac{N}{1 + Ne^{2}}$$
where n is the sample size, N is the population size and e is the margin of error.
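A minimal Python sketch of Slovin's formula. The function name `slovin_sample_size` and the convention of rounding the result up to the next whole respondent are assumptions for illustration, not part of the original slide.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up (assumed convention)."""
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    # Round to 9 decimals first so floating-point noise (e.g. 200.0000000001)
    # does not push an exact whole number up by one.
    return math.ceil(round(raw, 9))

# Hypothetical population of 1,000 with a 5% margin of error.
print(slovin_sample_size(1000, 0.05))  # 286
```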
Probability Sampling
Non-Probability Sampling
Probability Sampling
The sample is a proportion (a certain
percent) of the population, and the
sample is selected from the
population in a systematic way in
which every element of the
population has a known chance of
being included in the sample.
Non-Probability Sampling
The sample is not a proportion of the
population and there is no system in
selecting the sample. The selection is
dependent on the situation from
which the sample is taken.
Types of Non-Probability
Sampling are…
Accidental Sampling
Quota Sampling
Convenience Sampling
Accidental Sampling
The sample elements are
selected by chance.
Example: the researcher stands
on a street corner and interviews
everyone who passes by
Quota Sampling
A specified number of
elements of certain types
are included in the sample.
Example: the number of
viewers of a TV show
Convenience Sampling
A process of picking out
elements to constitute a
sample in the most convenient
and fastest way.
Example: samples to get
reactions to hot and
controversial issues.
Types of Probability
Sampling are…
Pure Random Sampling
Systematic Sampling
Stratified Sampling
Purposive Sampling
Cluster Sampling
Random Sampling
Random sampling is a sampling technique
where members of the population are
selected in such a way that each member
has an equal chance of being selected.
It is also called the lottery or raffle type
of sampling, and it may use a table of
random numbers.
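A minimal sketch of simple random sampling in Python, assuming a seeded pseudo-random generator is an acceptable stand-in for a table of random numbers. The population of student IDs and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)  # seeded generator stands in for a random-number table
    return rng.sample(population, n)

students = list(range(1, 101))  # hypothetical population: student IDs 1 to 100
print(simple_random_sample(students, 10, seed=42))
```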
Stratified Sampling
With stratified sampling, the
population is subdivided into at least
two different subpopulations (or
strata) whose members share the same
characteristic (such as gender), and
then a sample is drawn from each
stratum.
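A short sketch of stratified sampling. The slide only says a sample is drawn from each stratum; drawing the same fraction from every stratum (proportional allocation) is an assumption made here, and the strata below are hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from every stratum (proportional allocation, assumed)."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # this stratum's share of the sample
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata by gender: 60 male and 40 female respondents.
strata = {
    "male": [f"M{i}" for i in range(1, 61)],
    "female": [f"F{i}" for i in range(1, 41)],
}
picked = stratified_sample(strata, fraction=0.10, seed=1)
print({name: len(members) for name, members in picked.items()})  # {'male': 6, 'female': 4}
```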
Systematic Sampling
In systematic sampling, one chooses
a starting point and then selects every
kth (such as every 5th) element in
the population.
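A minimal sketch of systematic sampling, assuming the interval k is the population size divided by the desired sample size (rounded down) and the starting point is chosen at random among the first k elements. The household list is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start within the first k elements, then every kth element."""
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random starting point, 0 to k - 1
    return population[start::k][:n]

households = list(range(1, 101))  # hypothetical list of 100 household numbers
chosen = systematic_sample(households, 20, seed=7)
print(chosen)
```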
Purposive Sampling
In purposive sampling, the respondents are
chosen on the basis of their knowledge of
the information desired.
Ex: If research is to be conducted on
the history of a place, the old people of
the place must be consulted and included
in the sample.
Cluster Sampling
In cluster sampling, the population
area is divided into sections (or
clusters), a few of those sections are
randomly selected, and then all the
members from the selected sections
are chosen as samples.
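A small sketch of cluster sampling: whole clusters are chosen at random and every member of a chosen cluster is kept. The barangay names and households are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each chosen one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical barangays (clusters) and their households.
clusters = {
    "Barangay 1": ["H1", "H2", "H3"],
    "Barangay 2": ["H4", "H5"],
    "Barangay 3": ["H6", "H7", "H8", "H9"],
    "Barangay 4": ["H10", "H11"],
}
sample = cluster_sample(clusters, 2, seed=3)
print(sample)
```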
Measures of Central
Tendency
• Mean
• Median
• Mode
Mean
• The most reliable and the most
sensitive measure of position.
• It is the most widely used
measure.
• It is commonly known as the
“average” although the median and
the mode are also known as
averages.
Mean:
Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
How to compute the simple
mean:
• The simple mean is obtained by
adding all the values
(observations) of a certain
variable and dividing the sum by
the total number of values,
cases or observations.
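The rule above can be sketched in Python with the fat-content table for meals A to E. The helper name `simple_mean` is hypothetical.

```python
def simple_mean(values):
    """Sum of the observations divided by the number of observations."""
    values = list(values)
    return sum(values) / len(values)

# Fat content (tsp) of fast-food meals A to E.
fat_tsp = {"A": 14, "B": 18, "C": 22, "D": 10, "E": 16}
print(simple_mean(fat_tsp.values()))  # 16.0
```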
Example (5 pt. Likert Scale, n = 50):
$$\text{Mean} = \frac{85 + 44 + 27 + 24 + 1}{50} = \frac{181}{50} = 3.62 \quad (\text{Strongly Agree})$$
Table of Interpretation
(5 pt. Likert Scale)
4.20 – 5.00 Very Strongly Agree
3.40 – 4.19 Strongly Agree
2.60 – 3.39 Agree
1.80 – 2.59 Disagree
1.00 – 1.79 Strongly Disagree
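A minimal sketch of reading a Likert mean against the table of interpretation, applied to the mean 181 / 50 = 3.62. It assumes each band is read by its lower limit, so a value such as 4.195 that falls between two printed bands goes to the lower band; the function name is hypothetical.

```python
def interpret_likert(mean):
    """Map a 5-point Likert mean to the verbal interpretation in the table."""
    if not 1.00 <= mean <= 5.00:
        raise ValueError("mean must be between 1.00 and 5.00")
    bands = [
        (4.20, "Very Strongly Agree"),
        (3.40, "Strongly Agree"),
        (2.60, "Agree"),
        (1.80, "Disagree"),
        (1.00, "Strongly Disagree"),
    ]
    for lower_limit, label in bands:  # checked from the highest band down
        if mean >= lower_limit:
            return label

likert_mean = 181 / 50
print(likert_mean, interpret_likert(likert_mean))  # 3.62 Strongly Agree
```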
The Median
What is
the
Median?
The median is . . .
• A positional measure that divides
the set of data into two equal
parts.
• It is the score/observation that is
centrally located between the
highest and the lowest observation.
• It is determined by first arranging the data
into an array (ascending or descending order).
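Median for Odd Sample
$$\tilde{X} = X_{\left(\frac{n+1}{2}\right)}$$
Median for Even Sample
$$\tilde{X} = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$$
where $X_{(k)}$ denotes the kth item in the array.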
Using the data
in Example 1,
find the
median fat
content of the
5 meals.
Example 1:
A study was done on 5 typical fast-food
meals in Metro Manila. The following table
shows the amount of fat, in number of
teaspoons, present in each meal. Calculate
the mean amount of fat for these 5 fast-
food meals.
Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
Median for Odd Sample
Odd???
The array for the data in Example 1 is:
10, 14, 16, 18, 22
• To obtain the median fat
content of the 5 meals, we have
to use the median formula for an
odd sample since n = 5.
• Position of the median = (n + 1)/2
• Position of the median = (5 + 1)/2 = 3
• Median = 3rd item = 16
Median for
Even Sample
What is
even?
The following are sample scores
obtained from a 75-item summative test:
(n = 12) 48, 53, 63, 65, 45, 47, 52, 48,
63, 54, 63, 53
Array: 45, 47, 48, 48, 52, 53, 53, 54, 63, 63, 63, 65
• Since n = 12 (even), the median is the mean of the 6th and 7th items.
• Median = (6th item + 7th item)/2
• Median = (53 + 53)/2 = 53
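The odd and even median rules can be sketched in Python, checked against the fat-content data of Example 1 and the 12 test scores above. The function name `median` is hypothetical; Python's own `statistics.median` gives the same results.

```python
def median(data):
    """Median from the array (sorted data), using the odd/even position rules."""
    array = sorted(data)
    n = len(array)
    if n % 2 == 1:
        # Odd n: the item in position (n + 1) / 2, counting from 1.
        return array[(n + 1) // 2 - 1]
    # Even n: mean of the items in positions n/2 and n/2 + 1, counting from 1.
    return (array[n // 2 - 1] + array[n // 2]) / 2

fat_tsp = [14, 18, 22, 10, 16]
scores = [48, 53, 63, 65, 45, 47, 52, 48, 63, 54, 63, 53]
print(median(fat_tsp))  # 16
print(median(scores))   # 53.0
```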
Find the median for
Exercise #2.
Mode
The mode is …
The most popular score.
The score having the highest
frequency.
The most frequently occurring score.
The least reliable measure of position.
Determined by inspection.
A set of data is said to
be …
• Unimodal or monomodal if it
has only one mode.
• Example: 33, 35, 35, 38,
40, 46
• Its mode is 35.
A set of data is said to
be …
• Bimodal if it has two modes.
• Example: 33, 35, 35, 38,
40, 40, 46
• Its modes are 35 and 40.
A set of data is said to be …
• Multimodal if it has more than
two modes.
• Example: 33, 35, 35, 38, 40,
40, 46, 46, 51, 58, 58, 60
• Its modes are 35, 40, 46 and
58.
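A short sketch that finds the mode(s) by counting frequencies, checked against the unimodal, bimodal and multimodal examples above. The function name `modes` is hypothetical. If every value occurs only once, this sketch returns all of them; the slides do not cover that case.

```python
from collections import Counter

def modes(data):
    """Return every value that shares the highest frequency, in ascending order."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([33, 35, 35, 38, 40, 46]))      # [35]  unimodal
print(modes([33, 35, 35, 38, 40, 40, 46]))  # [35, 40]  bimodal
print(modes([33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60]))  # [35, 40, 46, 58]  multimodal
```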
Assignment #1: Find the mean,
median and mode of the following:
1. 85, 82, 83, 88, 85, 87, 89, 90
2. 12, 14, 20, 19, 23, 22, 28
3. 24, 34, 27, 27, 34, 24
4. 102, 100, 111, 100, 106, 102
5. 75, 86, 78, 84, 88, 86, 84, 85, 81, 84, 80
Grouped
Data
What is a Frequency
Distribution?
• A Frequency
Distribution is a tabular
representation of data
consisting of intervals
and their respective
frequencies.
Other ways of
presenting
data are . . .
BAR CHART
(Figure: East, West and North values plotted by quarter, 1st Qtr to 4th Qtr.)
LINE GRAPH
(Figure: East, West and North series across the 1st Qtr to 4th Qtr.)
PIE CHART
(Figure: shares of the 1st, 2nd, 3rd and 4th Qtr.)
Scatter Plot
(Figure: East, West and North points plotted against an x-axis from 0 to 5.)
How to construct a
Frequency Distribution:
• Determine the range, R = HO - LO
(highest observation minus lowest observation).
• Determine the class size (c) using
the formula c = (R + 1)/#CI, where #CI
is the number of class intervals.
• Construct the class intervals.
• Tally the data and determine the
frequency for each interval.
The class intervals in a
frequency distribution must:
• Not overlap.
• Be complete, so that each observation
can be tallied in one of the intervals.
• Have a uniform class size.
• Number not less than 7 and not
more than 15.
Data:
77 77 85 72 69 80 75 69 80 64
72 68 48 60 44 87 52 74 72 76
63 81 56 71 54 76 81 78 55 74
82 59 40 73 61 80 58 75 63 48
46 51 80 42 65 54 79 57 72 67
Total = 3342     Mean = 3342/50 = 66.84
Frequency Distribution
Class Interval    f      %      CF<    CF< (%)
82-87             3      6%     50     100%
76-81            12     24%     47      94%
70-75            10     20%     35      70%
64-69             6     12%     25      50%
58-63             6     12%     19      38%
52-57             6     12%     13      26%
46-51             4      8%      7      14%
40-45             3      6%      3       6%
Total            50    100%
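The construction steps can be checked against this table with a short Python sketch. It assumes 8 class intervals (as in the table), starts the first interval at the lowest observation, and rounds c up when (R + 1)/#CI is not a whole number; these are assumed conventions. With this data, R = 87 - 40 = 47 and c = 48/8 = 6.

```python
import math

# The 50 observations listed under "Data:".
DATA = [
    77, 77, 85, 72, 69, 80, 75, 69, 80, 64,
    72, 68, 48, 60, 44, 87, 52, 74, 72, 76,
    63, 81, 56, 71, 54, 76, 81, 78, 55, 74,
    82, 59, 40, 73, 61, 80, 58, 75, 63, 48,
    46, 51, 80, 42, 65, 54, 79, 57, 72, 67,
]

def frequency_distribution(data, num_intervals):
    """Return (lower, upper, f, cf_less_than) rows, highest interval first."""
    lo, hi = min(data), max(data)
    r = hi - lo                                # range R = HO - LO
    c = math.ceil((r + 1) / num_intervals)     # class size c = (R + 1)/#CI, rounded up
    rows, cf = [], 0
    for i in range(num_intervals):
        lower = lo + i * c                     # intervals start at the lowest observation
        upper = lower + c - 1
        f = sum(lower <= x <= upper for x in data)  # tally
        cf += f                                # "less than" cumulative frequency
        rows.append((lower, upper, f, cf))
    return rows[::-1]

table = frequency_distribution(DATA, 8)
n = len(DATA)
for lower, upper, f, cf in table:
    print(f"{lower}-{upper}  f={f:2d}  {100 * f / n:3.0f}%  CF<={cf:2d}  {100 * cf / n:3.0f}%")
```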
$$\text{Mean} = \frac{\sum fM}{n}$$
Class Interval    f     Midpoint (M)    fM
82-87             3     84.5            253.5
76-81            12     78.5            942
70-75            10     72.5            725
64-69             6     66.5            399
58-63             6     60.5            363
52-57             6     54.5            327
46-51             4     48.5            194
40-45             3     42.5            127.5
Total            50                     3331
$$\text{Mean} = \frac{3331}{50} = 66.62$$
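A minimal sketch of the grouped-data mean using the class intervals and frequencies from the table. The grouped mean (66.62) differs slightly from the mean of the raw data (66.84) because each class midpoint stands in for the actual values in that class.

```python
# (lower limit, upper limit, frequency) from the frequency distribution.
classes = [
    (82, 87, 3), (76, 81, 12), (70, 75, 10), (64, 69, 6),
    (58, 63, 6), (52, 57, 6), (46, 51, 4), (40, 45, 3),
]

def grouped_mean(classes):
    """Mean = sum(f * M) / n, where M is each class midpoint."""
    n = sum(f for _, _, f in classes)
    sum_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fm, n, sum_fm / n

sum_fm, n, mean = grouped_mean(classes)
print(sum_fm, n, mean)  # 3331.0 50 66.62
```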
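X       |X − x̄|     (X − x̄)²
18        4.9         24.01
12       10.9        118.81
…
x̄ = 22.9     Σ|X − x̄| = 41.2     Σ(X − x̄)² = 256.9
$$CV = \frac{s}{\bar{x}} \times 100 = \frac{5.34}{22.9} \times 100 = 23.32\%$$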
Do the same computation
for Group B…
Problem:
Two seemingly equally excellent BS
Psychology students are vying for an
academic honor, but only one can
be chosen to receive the award.
The following are their grades used
as the basis for the award:
Franzen: 91, 90, 94, 93, 92
Rico: 92, 92, 90, 94, 92
Who do you think deserves
the award?
Guiding Principle
The smaller the value of the
measure of variability, the more
consistent, the more homogeneous
and the less scattered the
observations in the set of
data are.
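A minimal sketch applying the guiding principle to the Franzen and Rico grades. The slides do not say which measure of variability to use, so this assumes the range, the sample standard deviation (n - 1 divisor) and the coefficient of variation; the function name `variability` is hypothetical.

```python
import statistics

def variability(grades):
    """Range, sample standard deviation and coefficient of variation of a list."""
    mean = statistics.mean(grades)
    sd = statistics.stdev(grades)  # sample SD, n - 1 in the denominator
    return {
        "mean": mean,
        "range": max(grades) - min(grades),
        "sd": sd,
        "cv": sd / mean * 100,     # CV = (s / mean) x 100, in percent
    }

grades = {
    "Franzen": [91, 90, 94, 93, 92],
    "Rico": [92, 92, 90, 94, 92],
}
for name, g in grades.items():
    m = variability(g)
    print(f"{name}: mean={m['mean']}, range={m['range']}, "
          f"sd={m['sd']:.2f}, cv={m['cv']:.2f}%")
```

Both students have the same mean (92) and the same range (4), so the range alone cannot separate them; the standard deviation and CV can.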