Statistics WT Lab
St. Paul University Philippines
A Course Presentation in
STATISTICS
Definition
Statistics is a science that
deals with the collection,
presentation, analysis and
interpretation of data.
Definition
Statistics is a collection of
methods for planning
experiments, obtaining data
and then organizing,
summarizing, presenting,
analyzing, interpreting and
drawing conclusions based on
the data.
Divisions of
Statistics
Descriptive and
Inferential Statistics
Descriptive Statistics is a
statistical procedure
concerned with describing
the characteristics and
properties of a group of
persons, places or things.
Example
•A teacher computes the
average grade of her
students and then
determines the top ten
students.
Inferential Statistics is a
statistical procedure that is used
to draw inferences or information
about the properties or
characteristics of people, places or
things on the basis of the
information obtained from a small
portion of a large group.
Example
•A dermatologist tests the
relative effectiveness of
a new brand of medicine
in curing pimples and
other skin diseases.
Basic
Terminologies
Population vs. Sample
A population is the complete
collection of elements (scores,
people, measurements, and so on)
to be studied.
A sample is a sub-collection of
elements drawn from a population.
Parameter vs. Statistic
A parameter is a numerical
measurement describing some
characteristic of a population.
A statistic is a numerical
measurement describing some
characteristic of a sample.
Data
• Data are facts, or a set of
information or observations under
study. More specifically, data are
gathered by the researcher from a
population or from a sample. Data
may be classified into two
categories: qualitative and
quantitative data.
Nature of Data
Qualitative vs.
Quantitative Data
Qualitative (or categorical or
attribute) data can be separated into
different categories that are
distinguished by some non-numerical
characteristic.
Quantitative data consist of
numbers representing counts or
measurements.
Discrete vs. Continuous
Data
Discrete data result from either a
finite number of possible values or
a countable number of possible
values. (That is, the number of
possible values is 0, 1, 2 or more)
Continuous data result from
infinitely many possible values that
can be associated with points on a
continuous scale in such a way that
there are no gaps or interruptions.
Levels of
Measurement
Nominal Level of
Measurement
• The nominal level of measurement
is characterized by data that
consists of names, labels, or
categories only. The data cannot be
arranged in an ordering scheme.
This is used when we want to
distinguish one object from another
for identification purposes.
Ordinal Level of
Measurement
• The ordinal level of measurement
involves data that may be
arranged in some order, but
differences between data values
either cannot be determined or
are meaningless.
Interval Level of
Measurement
The interval level of measurement is
like the ordinal level, with the additional
property that meaningful amounts of
differences between data can be
determined. However, there is no
inherent (natural) zero starting point.
Example: body temperature, year
(1955, 1843, 1776, 1123, etc.)
Ratio Level of
Measurement
The ratio level of measurement is
the interval level modified to include an
inherent zero starting point. For
values at this level, differences and
ratios are meaningful.
Example: weights of plastic, lengths
of movies, distances traveled by cars
Data Gathering
Techniques
The main objective of
Statistics
To help us in making
wise decisions.
Decision-making is an
important part of our lives.
Everybody makes decisions
almost every day.
• For instance, students
decide on what course
they would take in college
that could give them a high
salary and a better
future.
• Mothers decide on what
brand of milk to buy.
• Business-minded
people think about whether
to put their money in
the bank or to open a
business or a factory.
Collecting Data
• In conducting a study or
research, collection of data
is the first step. Data may
be gathered from primary
or secondary sources.
Two Sources of
Data
Primary Sources of
Data
Primary sources of statistical
data are government
institutions, business agencies,
and other organizations. Examples
include the National Statistics
Office (NSO) and information
derived from personal
interviews.
Secondary Sources of
Data
• Secondary sources are books,
encyclopedias, journals, magazines,
and research or studies conducted
by other individuals.
Different Ways of
Collecting Data
The Direct or Interview
Method
• In this method, the researcher has
direct contact with the
interviewee. The researcher obtains
the information needed by asking
the interviewee questions.
This method is usually
used in business research.
• For example, a business firm would
interview residents of a certain
barangay regarding their favorite brand
of toothpaste, soap or shoes. TV
personnel would ask televiewers about
their favorite noontime show. Even
political analysts use this method to
determine public opinion or preferences
for candidates in upcoming elections.
• Using this method, the researcher
can get more accurate answers or
responses since clarifications can
be made if the interviewee or
respondent does not understand
the question. However, this
method is costly and
time-consuming.
The Indirect or
Questionnaire Method
• This method makes use of a written
questionnaire. The researcher gives or
distributes the questionnaire to the
respondents either by personal delivery or
by mail. Using this method, the researcher
can save a lot of time and money in
gathering the information needed because
questionnaires can be given to a large
number of respondents at the same time.
• However, the researcher cannot
expect that all distributed
questionnaires will be retrieved
because some respondents simply
ignore the questionnaires. In
addition, clarifications cannot be
made if the respondent does not
understand the question.
The Registration
Method
• This method of collecting data is
governed by laws. For example, births
and deaths are registered with the
National Statistics Office for records
and future use. The number of
registered cars can be found at the
Land Transportation Office (LTO).
How to solve for the
simple mean:
• The simple mean is obtained
by adding all the values/
observations of a certain
variable and dividing the sum
by the total number of
values, cases or observations.
Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
Mean = (85 + 44 + 27 + 24 + 1) / 50 = 181 / 50 = 3.62 (Strongly Agree)
Table of Interpretation
(5 pt. Likert Scale)
4.20 – 5.00 Very Strongly Agree
3.40 – 4.19 Strongly Agree
2.60 – 3.39 Agree
1.80 – 2.59 Disagree
1.00 – 1.79 Strongly Disagree
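The Likert mean above is a weighted mean: each rating is multiplied by the number of respondents who chose it, and the sum is divided by the number of respondents. The sketch below reproduces the computation. The frequencies (17, 11, 9, 12, 1) are not given in the slides; they are an assumption inferred from the products 85, 44, 27, 24 and 1 on a 5-point scale, and they do add up to the 50 respondents. The function names are illustrative only.

```python
# Hypothetical frequencies inferred from the products 85, 44, 27, 24 and 1;
# the slides show only the products and the total of 50 respondents.
responses = {5: 17, 4: 11, 3: 9, 2: 12, 1: 1}  # rating -> number of respondents

# Interpretation bands copied from the table above: (lower bound, label).
BANDS = [
    (4.20, "Very Strongly Agree"),
    (3.40, "Strongly Agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly Disagree"),
]


def weighted_mean(freq):
    """Sum of rating * count, divided by the total count."""
    total = sum(rating * count for rating, count in freq.items())
    n = sum(freq.values())
    return total / n


def interpret(mean_value):
    """Return the label of the first band whose lower bound is reached."""
    for lower, label in BANDS:
        if mean_value >= lower:
            return label
    raise ValueError("mean is below the scale minimum of 1.00")


likert_mean = weighted_mean(responses)
print(round(likert_mean, 2), interpret(likert_mean))
```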
The Median
What is the Median?
The median is . . .
• A positional measure that divides
the set of data into two
equal parts.
• It is the score/observation that is
centrally located between the
highest and the lowest observation.
• Determined by rearranging the
data into an array.
Example 1:
A study was done on 5 typical fast-
food meals in Metro Manila. The
following table shows the amount of
fat, in number of teaspoons, present in
each meal. Calculate the mean amount
of fat for these 5 fast-food meals.
Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
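As a quick check of Example 1, the simple mean can be computed by adding the five values and dividing by 5. This is a minimal sketch; the variable names are illustrative, and the standard-library `statistics.mean` is used only to cross-check the manual computation.

```python
from statistics import mean

# Fat content (in teaspoons) of the five fast-food meals in Example 1.
fat_tsp = {"A": 14, "B": 18, "C": 22, "D": 10, "E": 16}

values = list(fat_tsp.values())
total = sum(values)          # 14 + 18 + 22 + 10 + 16
n = len(values)              # number of observations
simple_mean = total / n      # sum divided by the number of observations

# Cross-check against the standard library.
assert simple_mean == mean(values)
print(total, n, simple_mean)
```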
Median for Odd Sample
Odd???
The array for the data is:
10, 14, 16, 18, 22
• To obtain the median fat
content of the 5 meals, we
have to use the median
formula for an odd sample, since
n = 5.
• Median = the [(n + 1)/2]th score
• Median = the [(5 + 1)/2]th score
• Median = 3rd score = 16
Median for Even Sample
What is even?
The following are sample scores
obtained from a 75-item summative
test:
(n = 12) 48, 53, 63, 65, 45, 47, 52,
48, 63, 54, 63, 55
Array: 45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65
• Since n = 12 (even), the median is the average of the two middle scores.
• Median = (6th score + 7th score)/2
• Median = (53 + 54)/2 = 53.5
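Both median rules can be written as one small function: sort the data into an array, then take the middle score when n is odd, or the average of the two middle scores when n is even. The sketch below applies it to the fast-food data and to the array of 12 test scores shown above; the function name is illustrative, and `statistics.median` serves only as a cross-check.

```python
from statistics import median


def median_by_position(values):
    """Median via the positional rules for odd and even samples."""
    ordered = sorted(values)          # rearrange the data into an array
    n = len(ordered)
    if n % 2 == 1:
        # Odd sample: the [(n + 1)/2]th score (minus 1 for 0-based indexing).
        return ordered[(n + 1) // 2 - 1]
    # Even sample: average of the (n/2)th and (n/2 + 1)th scores.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


fat_tsp = [14, 18, 22, 10, 16]
score_array = [45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65]

odd_median = median_by_position(fat_tsp)
even_median = median_by_position(score_array)

# Cross-check against the standard library.
assert odd_median == median(fat_tsp)
assert even_median == median(score_array)
print(odd_median, even_median)
```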
Mode
The mode is …
The most favorite score.
The score having the highest
frequency.
The most frequently occurring
score.
The least reliable measure of
position
Determined by way of inspection.
A set of data is said to
be …
• Unimodal or monomodal
if it has only one mode.
• Example: 33, 35, 35, 38,
40, 46
• Its mode is 35.
A set of data is said to
be …
• Bimodal if it has two
modes.
• Example: 33, 35, 35, 38,
40, 40, 46
• Its modes are 35 and 40.
A set of data is said to be
…
• Multimodal if it has more
than two modes.
• Example: 33, 35, 35, 38,
40, 40, 46, 46, 51, 58, 58,
60
• Its modes are 35, 40, 46
and 58.
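Finding the mode "by inspection" amounts to counting how often each score occurs and keeping the score or scores with the highest frequency. A minimal sketch using the three examples above follows; the helper names are illustrative. Note that when every score occurs exactly once, this sketch returns all of them, a case the slides do not cover.

```python
from collections import Counter


def modes(values):
    """Return every score that has the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)


def modality(values):
    """Classify a data set by its number of modes."""
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


unimodal_data = [33, 35, 35, 38, 40, 46]
bimodal_data = [33, 35, 35, 38, 40, 40, 46]
multimodal_data = [33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60]

for data in (unimodal_data, bimodal_data, multimodal_data):
    print(modes(data), modality(data))
```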
Grouped Data
What is a Frequency
Distribution?
•A Frequency
Distribution is a tabular
representation of data
consisting of intervals
and their respective
frequencies.
Other ways of presenting data are . . .
BAR CHART
[Figure: bar chart of the East, West and North series for the 1st to 4th quarters]
LINE GRAPH
[Figure: line graph of the East, West and North series for the 1st to 4th quarters]
PIE CHART
[Figure: pie chart with one slice each for the 1st, 2nd, 3rd and 4th quarters]
SCATTER PLOT
[Figure: scatter plot of the East, West and North series; x-axis from 0 to 5, y-axis from 0 to 100]
How to construct a
Frequency Distribution:
• Determine the range, R = HO – LO
(highest observation minus lowest observation).
• Determine the ideal class interval
(ICI).
• Determine the class size (i) using the
formula, i = R/ICI.
• Construct the intervals.
• Tally the data and determine the
frequency for each interval.
The class interval in a
frequency distribution
must:
• Not overlap.
• Be relatively complete,
so that each data value can be
tallied in one of the
intervals.
• Have a uniform class size.
Data:
77 77 85 72 69 80 75 69 80 64
72 68 48 60 44 87 52 74 72 76
63 81 56 71 54 76 81 78 55 74
82 59 40 73 61 80 58 75 63 48
46 51 80 42 65 54 79 57 72 67
Total = 3342; Mean = 3342/50 = 66.84
Frequency Distribution
Class Interval   f    %
40-45            3    6%
46-51            4    8%
52-57            6    12%
58-63            6    12%
64-69            6    12%
70-75            10   20%
76-81            12   24%
82-87            3    6%
Total            50   100%
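The construction steps can be sketched in code using the 50 scores listed above. The class size of 6 is read off the table; the slides do not state which ideal class interval was used, so that value is treated as given rather than derived. Starting the first interval at the lowest score (40) is also an assumption taken from the table. With a uniform class size of 6, the last interval comes out as 82-87, which is what is needed to hold the highest score, 87.

```python
scores = [
    77, 77, 85, 72, 69, 80, 75, 69, 80, 64,
    72, 68, 48, 60, 44, 87, 52, 74, 72, 76,
    63, 81, 56, 71, 54, 76, 81, 78, 55, 74,
    82, 59, 40, 73, 61, 80, 58, 75, 63, 48,
    46, 51, 80, 42, 65, 54, 79, 57, 72, 67,
]

class_size = 6                      # taken from the table (assumed, not derived)
lowest, highest = min(scores), max(scores)
value_range = highest - lowest      # R = HO - LO

# Build non-overlapping intervals of uniform size that cover every score.
intervals = []
lower = lowest
while lower <= highest:
    intervals.append((lower, lower + class_size - 1))
    lower += class_size

# Tally: count the scores that fall inside each interval (bounds inclusive).
frequencies = [sum(lo <= s <= hi for s in scores) for lo, hi in intervals]
percentages = [100 * f / len(scores) for f in frequencies]

for (lo, hi), f, pct in zip(intervals, frequencies, percentages):
    print(f"{lo}-{hi}  {f:>2}  {pct:.0f}%")
print("Total", sum(frequencies), "Mean", sum(scores) / len(scores))
```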
Uses of the Measures
of Central Tendency
The Mean is used…
For interval and ratio measurements
When there are no extreme values in a
distribution since it is easily affected by
extremely high or extremely low scores
When higher statistical computations
are wanted
When the greatest reliability of the
measure of central tendency is wanted
since its computations include all the
given values
The Median is used…
For ordinal and ranked measurements
When there are extreme values, thus
the distribution is markedly skewed
For an open-end distribution; that is, the
lowest or the highest class interval or
both are open-ended (e.g., 50 and below or
100 and above)
When one desires to know whether the
cases fall within the upper half or the
lower half of a distribution.
The Mode is used…
For nominal and categorical data
When a rough or quick estimate of
a central value is wanted
When the most popular or the most
typical case or value in a
distribution is wanted
Limitations of the
Measures of Central
Tendency
The Limitations of the
Mean…
It is the most widely used average, since
it is the most familiar. However, it is
often misused. It cannot be used if the
clustering of values or items is not
substantial.
If the given values do not tend to cluster
around a central value, the mean is a
poor measure of central location.
It is easily affected by extremely large or
small values. One small value can easily
pull down the mean.
The mean alone cannot be used to compare
distributions since the means of 2 or
more distributions may be the same but
their other characteristics may be
entirely different. The means of
distribution A, whose values are 80, 85
and 90, and distribution B, whose values
are 86, 85 and 84, are both 85. We cannot
conclude, however, that both distributions
possess the same characteristics since
their patterns of dispersion or
variation are markedly different despite
having the same mean.
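The comparison of distributions A and B can be checked numerically. The sketch below computes the mean, the range and the sample standard deviation of each; using the sample (n - 1) standard deviation as the measure of dispersion is an assumption made here for illustration.

```python
from statistics import mean, stdev

dist_a = [80, 85, 90]
dist_b = [86, 85, 84]

for name, values in (("A", dist_a), ("B", dist_b)):
    spread = max(values) - min(values)      # range = highest - lowest
    print(name, mean(values), spread, stdev(values))
```

Both means are 85, yet the values in A spread over a range of 10 while those in B spread over a range of only 2.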
The Limitations of the
Median…
It is easily affected by the number of
items in a distribution.
It cannot be determined if the given
values are not arranged according to
magnitude.
If several values are contained in a
distribution, it becomes a laborious task to
arrange them according to magnitude.
Its value is not as accurate as the mean
since it is just an ordinal statistic.
The Limitations of the
Mode…
It is seldom or rarely used since it
does not always exist.
Its value is just a rough estimate of
the center of concentration of a
distribution.
It is very unstable since its value
easily changes depending on the
approaches used in finding it.
Measures of
Variability
• The statistical tool used to
describe the degree to
which scores/
observations are
scattered/dispersed.
• It is also used to determine
the degree of consistency/
homogeneity of scores.
Measures of Variability
Range (R) = HO - LO
Mean Absolute Deviation
(MAD)
Standard Deviation (s)
Variance (s²)
Coefficient of Variation (CV)
The following are the scores obtained
by two groups of 2nd year ASHE
students in N101:
Group A Group B
30 30
28 20
27 18
25 16
25 15
23 15
21 14
20 13
18 12
12 12
Computation for Group A:
Range = 30 – 12 = 18
X        |X - Mean|   (X - Mean)²
30       7.1          50.41
...      ...          ...
12       10.9         118.81
Mean = 22.9   Sum = 41.2   Sum = 256.9
Standard dev’n = √(256.9/9) = 5.34
CV = (5.34/22.9) × 100 = 23.32%
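The measures of variability listed earlier can be computed for both groups with a short function. This is a sketch under two assumptions: the standard deviation and variance use the sample (n - 1) form, which is what reproduces the 5.34 used in the CV above, and the mean absolute deviation divides the sum of absolute deviations by n. The function and key names are illustrative.

```python
from statistics import mean, stdev, variance

group_a = [30, 28, 27, 25, 25, 23, 21, 20, 18, 12]
group_b = [30, 20, 18, 16, 15, 15, 14, 13, 12, 12]


def variability(scores):
    """Range, MAD, variance, standard deviation and CV of a list of scores."""
    m = mean(scores)
    sd = stdev(scores)                       # sample standard deviation (n - 1)
    return {
        "range": max(scores) - min(scores),  # HO - LO
        "mean": m,
        "mad": sum(abs(x - m) for x in scores) / len(scores),
        "variance": variance(scores),        # sample variance, s squared
        "sd": sd,
        "cv": sd / m * 100,                  # coefficient of variation, in percent
    }


result_a = variability(group_a)
result_b = variability(group_b)
for name, result in (("Group A", result_a), ("Group B", result_b)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

The CV printed for Group A differs slightly in the second decimal place from the hand computation because the code does not round the standard deviation to 5.34 before dividing.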
Problem:
Two seemingly equally excellent
BSN students are vying for an
academic honor, and only one
can be chosen to get the
award. The following are their
grades used as the basis for the award:
Franzen : 91, 90, 94, 93, 92
Rico : 92, 92, 90, 94, 92
Who do you think deserves to get
the award?
Guiding
Principle
The smaller the value of the
measure of variability, the more
consistent, the more
homogeneous and the less
scattered are the
observations in the set of
data.
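One way to apply the guiding principle to the Problem above is to compare the spread of the two students' grades. The slides do not say which measure of variability should decide the award; the sketch below uses the sample standard deviation as an assumption, and the variable names are illustrative.

```python
from statistics import mean, stdev

grades = {
    "Franzen": [91, 90, 94, 93, 92],
    "Rico": [92, 92, 90, 94, 92],
}

for name, marks in grades.items():
    spread = max(marks) - min(marks)
    print(name, mean(marks), spread, round(stdev(marks), 2))

# The student with the smaller standard deviation has the more consistent grades.
most_consistent = min(grades, key=lambda name: stdev(grades[name]))
print("Most consistent:", most_consistent)
```

Both students have the same mean and the same range, so neither of those separates them; the standard deviation does, because it takes every grade into account.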