Statistics WT Lab

The document provides an overview of statistics, defining it as the science of data collection, analysis, and interpretation. It discusses the two main divisions of statistics: descriptive and inferential, along with key concepts such as population vs. sample, and qualitative vs. quantitative data. Additionally, it covers data gathering techniques, sampling methods, and measures of central tendency, emphasizing the importance of statistics in decision-making.

St. Paul University Philippines

A Course Presentation in
STATISTICS
Definition
Statistics is a science that
deals with the collection,
presentation, analysis and
interpretation of data.
Definition
Statistics is a collection of
methods for planning
experiments, obtaining data
and then organizing,
summarizing, presenting,
analyzing, interpreting and
drawing conclusions based on
the data.
Divisions of
Statistics
Descriptive and
Inferential Statistics
Descriptive Statistics is a
statistical procedure
concerned with describing
the characteristics and
properties of a group of
persons, places or things.
Example
•A teacher computes the
average grade of her
students and then
determines the top ten
students.
Inferential Statistics is a
statistical procedure that is used
to draw inferences or information
about the properties or
characteristics of people, places or
things on the basis of the
information obtained from a small
portion of a large group.
Example
•A dermatologist tests the
relative effectiveness of
a new brand of medicine
in curing pimples and
other skin diseases.
Basic
Terminologies
Population vs. Sample
A population is the complete
collection of elements (scores,
people, measurements, and so on)
to be studied.
A sample is a sub-collection of
elements drawn from a population.
Parameter vs. Statistic
• A parameter is a numerical
measurement describing some
characteristic of a population.
• A statistic is a numerical
measurement describing some
characteristic of a sample.
Data
• Data are facts, or sets of
information or observations under
study. More specifically, data are
gathered by the researcher from a
population or from a sample. Data
may be classified into two
categories, qualitative and
quantitative data.
Nature of Data
Qualitative vs.
Quantitative Data
Qualitative (or categorical or
attribute) data can be separated into
different categories that are
distinguished by some non-numerical
characteristic.
Quantitative data consist of
numbers representing counts or
measurements.
Discrete vs. Continuous
Data
• Discrete data result from either a
finite number of possible values or
a countable number of possible
values. (That is, the number of
possible values is 0, or 1, or 2, and so on.)
• Continuous data result from
infinitely many possible values that
can be associated with points on a
continuous scale in such a way that
there are no gaps or interruptions.
Levels of
Measurement
Nominal Level of
Measurement
• The nominal level of measurement
is characterized by data that
consists of names, labels, or
categories only. The data cannot be
arranged in an ordering scheme.
This is used when we want to
distinguish one object from another
for identification purposes.
Ordinal Level of
Measurement
• The ordinal level of measurement
involves data that may be
arranged in some order, but
differences between data values
either cannot be determined or
are meaningless.
Interval Level of
Measurement
The interval level of measurement is
like the ordinal level, with the additional
property that meaningful amounts of
differences between data can be
determined. However, there is no
inherent (natural) zero starting point.
Example: body temperature, year
(1955, 1843, 1776, 1123, etc.)
Ratio Level of
Measurement
The ratio level of measurement is
the interval level modified to include an
inherent zero starting point. For
values at this level, differences and
ratios are meaningful.
Example: weights of plastic, lengths
of movies, distances traveled by cars
Data Gathering
Techniques
The main objective of
Statistics
To help us make
wise decisions.
Decision-making is an
important part of our lives.
Everybody makes decisions
almost every day.
• For instance, students
decide on what course
to take in college
that could give them a high
salary and a better
future.
• Mothers decide on what
brand of milk to buy.
• Business-minded
people decide whether
to put their money in
the bank or to open a
business or a factory.
Collecting Data

• In conducting a study or
research, collection of data
is the first step. Data may
be gathered from primary
or secondary sources.
Two Sources of
Data
Primary Sources of
Data
Primary sources of statistical
data are government
institutions, business agencies,
and other organizations.
Examples are the National Statistics
Office (NSO) and information
derived from personal
interviews.
Secondary Sources of
Data
• Secondary sources are books,
encyclopedias, journals, magazines,
and research or studies conducted
by other individuals.
Different Ways of Collecting Data
The Direct or Interview
Method
• In this method, the researcher has
direct contact with the
interviewee. The researcher obtains
the information needed by asking
the interviewee questions.
This method is usually
used in business research.
The Direct or Interview
Method
• For example, a business firm would
interview residents of a certain
barangay regarding their favorite brand
of toothpaste, soap or shoes. TV
personnel would ask televiewers about
their favorite noontime show. Even
political analysts use this method to
determine public opinion or preferences
for candidates in upcoming elections.
• Using this method, the researcher
can get more accurate answers or
responses since clarifications can
be made if the interviewee or
respondent does not understand
the question. However, this
method is costly and time-consuming.
The Indirect or
Questionnaire Method
• This method makes use of a written
questionnaire. The researcher
distributes the questionnaire to the
respondents either by personal delivery or
by mail. Using this method, the researcher
can save a lot of time and money in
gathering the information needed because
questionnaires can be given to a large
number of respondents at the same time.
• However, the researcher cannot
expect that all distributed
questionnaires will be retrieved
because some respondents simply
ignore them. In
addition, clarifications cannot be
made if the respondent does not
understand the question.
The Registration
Method
• This method of collecting data is
governed by law. For example, births
and deaths are registered with the
National Statistics Office for records
and future use. The number of
registered cars can be found at the
Land Transportation Office (LTO).
The Registration
Method

• The list of registered voters in
the Philippines is found at the
Commission on Elections
(COMELEC). This method of
gathering data is perhaps the
most reliable because it is
enforced by law.
The Experimental
Method
• This method is usually used
to find out cause-and-effect
relationships.
Scientific researchers often
use this method.
The Experimental
Method
• For example, agriculturists
would like to know the effect
of a new brand of fertilizer
on the growth of plants. The
new fertilizer will be
applied to ten sets of plants.
Determining Adequate Sample Size
In research we seldom use the
entire population because of the cost
and time involved. In fact, most
researchers do not use the population
in their study. Instead, a sample,
which is a small representative part of the
population, is used. The characteristics
of the entire population are
described using the characteristics
observed from the sample.
To determine the
sample size from a
given population size,
Slovin’s formula is
used.
Sampling Formula
(Slovin’s)
$$n = \frac{N}{1 + N e^{2}}$$
where n = sample size
N = population size
e = margin of error
• Observe that there is a margin
of error. When we use a
sample, we do not get the
actual value but just an
estimate of the parameter.
Hence, there is an error
associated with using a
sample.
Examples in finding the
sample size
1. A group of researchers will conduct a
survey to find out the opinion of
residents of a particular community
regarding the oil price hike. If there
are 10,000 residents in the community
and the researchers plan to use a
sample with a 10% margin of error,
what should the sample size be?
Example for Slovin’s
Formula
Solution: Here, N = 10 000 and e
= 10% or 0.10. Substituting the
given values in the formula, we
have
$$n = \frac{10\,000}{1 + (0.10)^{2}(10\,000)} = \frac{10\,000}{1 + (0.01)(10\,000)} = \frac{10\,000}{101} \approx 99.01 \text{ or } 99$$
Example 2.
• Suppose that in Example
1, the researchers would
like to use a 5% margin
of error. What should be
the size of the sample?
Example 2
Solution: Here, N = 10 000 and e =
5% or 0.05. Substituting the given
values in the formula, we have
$$n = \frac{10\,000}{1 + (0.05)^{2}(10\,000)} = \frac{10\,000}{1 + (0.0025)(10\,000)} = \frac{10\,000}{1 + 25} = \frac{10\,000}{26} \approx 384.62 \text{ or } 385$$
• What do you observe about the
sample size as we reduce the
margin of error?
• If you want a more
accurate result, are you going to
consider a larger sample?
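The two worked examples can be checked with a short Python sketch. The function name `slovin_sample_size` and its input checks are illustrative assumptions, not part of the presentation; the formula itself is the one given above, and the result is rounded to the nearest whole number as in the worked solutions.

```python
def slovin_sample_size(population_size: int, margin_of_error: float) -> float:
    """Return the unrounded sample size n = N / (1 + N * e^2)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a proportion between 0 and 1")
    return population_size / (1 + population_size * margin_of_error ** 2)


# The two worked examples: N = 10 000 with e = 10% and e = 5%.
for e in (0.10, 0.05):
    n = slovin_sample_size(10_000, e)
    print(f"e = {e:.2f}: n = {n:.2f}, rounded to {round(n)}")
```

Running the loop with other margins of error shows the pattern the questions above point to: the smaller the margin of error, the larger the required sample.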
SAMPLING TECHNIQUES
Definition
• Sampling may be defined as
measuring a small portion of
something and then making a
general statement about the
whole thing (Bradfield &
Moredock, 1957)
Why do we need sampling?
Why we need
sampling…
• Sampling makes possible the
study of a large, heterogeneous
population.
• Sampling is for economy, speed,
and accuracy.
• Sampling saves the sources of
data from being entirely consumed.
General Types of Sampling
There are two general
types of sampling…
Probability Sampling
Non-Probability Sampling
Probability Sampling
• The sample is a proportion (a
certain percent) of the population,
and it is selected from
the population in some
systematic way in which every
element of the population has a
chance of being included in the
sample.
Non-Probability
Sampling
• The sample is not a proportion of
the population and there is no
system in selecting the sample.
The selection depends on the
situation from which the sample is
taken. This technique lacks
objectivity in selection. It is
sometimes called subjective
sampling.
Types of Non-
Probability Sampling
are…
Convenience Sampling
Quota Sampling
Purposive Sampling
Convenience Sampling
This is used because of the
convenience it offers to the
researchers.
Example: A researcher who
wishes to investigate the most
popular noontime show may
just interview the respondents
by telephone.
Quota Sampling
In this type of sampling,
the proportions of the
various subgroups in the
population are
determined and the
sample is drawn to have
the same percentage in
it.
Quota Sampling
Example: Suppose we want
to determine teenagers’
favorite brand of T-shirt.
If there are 1000
female and 1000 male
teenagers and we want to
draw 150 members
for our sample, we can
select 75 female and 75
male teenagers from the
population without using
randomization.
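The quota computation in this example can be sketched in Python. The helper name `quota_allocation` is a hypothetical choice for illustration; it only computes how many members to take from each subgroup, and the members themselves would still be picked without randomization, as the slide describes.

```python
def quota_allocation(subgroup_sizes: dict[str, int], sample_size: int) -> dict[str, int]:
    """Split sample_size across subgroups in proportion to their population share."""
    total = sum(subgroup_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in subgroup_sizes.items()
    }


# 1000 female and 1000 male teenagers, sample of 150 members.
quotas = quota_allocation({"female": 1000, "male": 1000}, 150)
print(quotas)
```

With unequal subgroups the rounded quotas may not add up exactly to the requested sample size, so a real study would need a rule for adjusting them.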
Purposive Sampling
This is based on certain
criteria laid down by the
researcher. People who
satisfy the criteria are
interviewed.
Purposive Sampling
In purposive sampling, the
respondents are chosen on the basis
of their knowledge of the information
desired.
Ex: If research is to be conducted
on the history of a place, the old
people of the place must be
consulted and included in the
sample.
Purposive Sampling
Example: Suppose the
target is to find out the
effectiveness of a certain kind
of shampoo. Of course, bald
fellows will not be included
in the sample.
Types of Probability
Sampling are…
Simple Random
Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
RANDOM SAMPLING
• Simple Random Sampling is a sampling
technique where members of the
population are selected in such a way
that each member has an equal chance
of being selected.
• It is also called the lottery or raffle type
of sampling.
Stratified Sampling
• With stratified sampling, the
population is subdivided into at
least two different
subpopulations (or strata) that share
the same characteristics (such as
gender), and then a sample is drawn
from each stratum.
Systematic Sampling
In systematic sampling, one
chooses a starting point and then
selects every kth (such as every
5th) element in the population.
Cluster Sampling
In cluster sampling, the population
area is divided into sections (or
clusters), a few of those sections
are randomly selected, and then
all the members from the selected
sections are chosen as samples.
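The four probability sampling techniques can be contrasted in a small Python sketch. Everything concrete here is an assumption made for illustration: the population of 100 numbered IDs, the odd/even strata, the blocks of ten used as clusters, the fixed seed, and the function names.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population: IDs 1..100


def simple_random_sample(items, n):
    """Every member has an equal chance of selection (lottery method)."""
    return rng.sample(items, n)


def systematic_sample(items, k):
    """Pick a random starting point, then take every k-th element."""
    start = rng.randrange(k)
    return items[start::k]


def stratified_sample(strata, n_per_stratum):
    """Draw a simple random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters):
    """Randomly select whole clusters and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical strata (odd vs. even IDs) and clusters (blocks of 10 IDs).
strata = {"odd": population[0::2], "even": population[1::2]}
clusters = {f"block{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

print(simple_random_sample(population, 5))
print(systematic_sample(population, 5))
print(stratified_sample(strata, 3))
print(cluster_sample(clusters, 2))
```

The difference between the last two is the one the definitions stress: stratified sampling takes some members from every stratum, while cluster sampling takes every member from only a few clusters.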
Measures of Central Tendency
Mean
• The most reliable and the most
sensitive measure of position.
• It is the most widely used
measure.
• It is commonly known as the
“average” although the median
and the mode are also known as
averages.
Mean:
• It comes in 2 different
forms:
1) Simple Mean
2) Weighted Mean
Example 1:
A study was done on 5 typical fast-
food meals in Metro Manila. The
following table shows the amount of
fat, in number of teaspoons, present in
each meal. Calculate the mean amount
of fat for these 5 fast-food meals.

Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
How to solve the
simple mean:
• The simple mean is obtained
by adding all the values/
observations of a certain
variable and dividing the sum
by the total number of
values, cases or observations.
Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
• To obtain the simple mean
amount of fat for the 5 fast-
food meals:
• Mean = (14+18+22+10+16)/5
• Mean = 80/5 = 16
• That is, the mean
fat content of the 5 fast-food
meals is 16 teaspoons, which is too much.
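The same computation can be written as a few lines of Python. The variable names are illustrative; the data are the five fat values from the table.

```python
# Fat content (in teaspoons) of the five fast-food meals A to E.
fat_tsp = {"A": 14, "B": 18, "C": 22, "D": 10, "E": 16}

total = sum(fat_tsp.values())        # add all the observations
simple_mean = total / len(fat_tsp)   # divide by the number of observations

print(f"Sum = {total}, n = {len(fat_tsp)}, mean = {simple_mean}")
```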
Example 2:
• The following represents the final
grades obtained by a nursing
student one summer term:
• Anatomy (5 units) - - - 93
• Chemistry (3 units) - - - 88
• SOT 2 (2 units) - - - 89
– Find the weighted average of the
student.
To solve for the weighted average of the student we have...
$$\text{Mean} = \frac{\sum w_i x_i}{\sum w_i}$$
$$\text{Mean} = \frac{93(5) + 88(3) + 89(2)}{10} = \frac{465 + 264 + 178}{10} = \frac{907}{10} = 90.7 \text{ (Excellent)}$$
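A minimal Python sketch of the weighted mean, using the grades as the values and the units as the weights. The function name `weighted_mean` is an assumption made for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grades = [93, 88, 89]  # Anatomy, Chemistry, SOT 2
units = [5, 3, 2]      # number of units for each subject

print(weighted_mean(grades, units))
```

Because Anatomy carries 5 of the 10 units, its grade of 93 pulls the weighted average above the simple mean of the three grades, which is 90.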
Example 3:
• The following represents the responses
of 50 randomly chosen respondents in
one item of a research questionnaire:
• Very Strongly Agree (5) - - - 17
• Strongly Agree (4) - - - 11
• Agree (3) - - - 9
• Disagree (2) - - - 12
• Strongly Disagree (1) - - - 1
– Find the weighted response of the
respondents.
To solve for the weighted response we have...
$$\text{Mean} = \frac{\sum w_i x_i}{\sum w_i}$$
$$\text{Mean} = \frac{5(17) + 4(11) + 3(9) + 2(12) + 1(1)}{50} = \frac{85 + 44 + 27 + 24 + 1}{50} = \frac{181}{50} = 3.62 \text{ (Strongly Agree)}$$
Table of Interpretation
(5 pt. Likert Scale)
4.20 – 5.00 Very Strongly Agree
3.40 – 4.19 Strongly Agree
2.60 – 3.39 Agree
1.80 – 2.59 Disagree
1.00 – 1.79 Strongly Disagree
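The weighted response and its interpretation can be sketched in Python. The names `BANDS`, `weighted_response`, and `interpret` are hypothetical; the band limits are the lower bounds from the table of interpretation, and the counts come from Example 3.

```python
# Scale value -> number of respondents choosing it (from Example 3).
responses = {5: 17, 4: 11, 3: 9, 2: 12, 1: 1}

# Lower bound of each band in the interpretation table, highest first.
BANDS = [
    (4.20, "Very Strongly Agree"),
    (3.40, "Strongly Agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly Disagree"),
]


def weighted_response(counts):
    """Mean scale value, weighting each scale value by its frequency."""
    total = sum(counts.values())
    return sum(scale * f for scale, f in counts.items()) / total


def interpret(score):
    """Return the verbal interpretation of a mean score on the 5-point scale."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise ValueError("score is below the scale minimum of 1.00")


score = weighted_response(responses)
print(round(score, 2), interpret(score))
```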
The Median
What is the Median?
The median is . . .
• A positional measure that divides
the set of data exactly into two
parts.
• It is the score/observation that is
centrally located between the
highest and the lowest observation.
• Determined by rearranging the
data into an array.
Example 1:
A study was done on 5 typical fast-
food meals in Metro Manila. The
following table shows the amount of
fat, in number of teaspoons, present in
each meal. Calculate the median amount
of fat for these 5 fast-food meals.

Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
Median for Odd Sample
Odd?
The array for the data is:
10, 14, 16, 18, 22
• To obtain the median fat
content of the 5 meals, we
use the median
formula for an odd sample, since
n = 5.
• Median = the [(n + 1)/2]th item
• Position = (5 + 1)/2 = 3
• Median = 3rd item = 16
Median for Even Sample
What is even?
The following are sample scores
obtained from a 75-item summative
test:
(n = 12) 48, 53, 63, 65, 45, 47, 52,
48, 63, 54, 63, 55

Array: 45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65

• Since n = 12 (even):
• Median = (6th score + 7th score)/2
• Median = (53 + 54)/2 = 53.5
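The odd and even cases can be combined in one small Python function. This is a sketch under the positional rules above; the function name `median` is illustrative, and the second data set is the array exactly as listed on the slide.

```python
def median(values):
    """Median via the positional rule on the sorted array."""
    array = sorted(values)
    n = len(array)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item, i.e. index mid when counting from 0.
        return array[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th items.
    return (array[mid - 1] + array[mid]) / 2


fat_tsp = [14, 18, 22, 10, 16]
scores_array = [45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65]

print(median(fat_tsp), median(scores_array))
```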
Mode
The mode is …
The most favorite score.
The score having the highest
frequency.
The most frequently occurring
score.
The least reliable measure of
position
Determined by way of inspection.
A set of data is said to
be …
• Unimodal or monomodal
if it has only one mode.
• Example: 33, 35, 35, 38,
40, 46
• Its mode is 35.
A set of data is said to
be …
• Bimodal if it has two
modes.
• Example: 33, 35, 35, 38,
40, 40, 46
• Its modes are 35 and 40.
A set of data is said to
be …
• Multimodal if it has more
than two modes.
• Example: 33, 35, 35, 38,
40, 40, 46, 46, 51, 58, 58,
60
• Its modes are 35, 40, 46
and 58.
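The three examples can be checked with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value that shares the highest frequency. The dictionary keys are only labels for this sketch.

```python
from statistics import multimode

examples = {
    "unimodal": [33, 35, 35, 38, 40, 46],
    "bimodal": [33, 35, 35, 38, 40, 40, 46],
    "multimodal": [33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60],
}

for label, data in examples.items():
    modes = multimode(data)  # all values tied for the highest frequency
    print(f"{label}: modes = {modes}")
```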
Grouped Data
What is a Frequency
Distribution?
•A Frequency
Distribution is a tabular
representation of data
consisting of intervals
and their respective
frequencies.
Other ways of presenting data are ...
BAR CHART
[Figure: bar chart of the East, West, and North series by quarter (1st Qtr to 4th Qtr).]
LINE GRAPH
[Figure: line graph of the East, West, and North series by quarter (1st Qtr to 4th Qtr).]
PIE CHART
[Figure: pie chart with slices for 1st Qtr, 2nd Qtr, 3rd Qtr, and 4th Qtr.]
Scatter Plot
[Figure: scatter plot of the East, West, and North series; horizontal axis from 0 to 5, vertical axis from 0 to 100.]
How to construct a
Frequency Distribution:
• Determine the range, R = HO - LO
(highest observation minus lowest observation).
• Determine the ideal class interval
(ICI).
• Determine the class size (i) using the
formula, i = R/ICI.
• Construct the intervals.
• Tally the data and determine the
frequency for each interval.
The class intervals in a
frequency distribution
must:
• Not overlap.
• Be complete, so that
every data value can be
tallied in one of the
intervals.
• Have a uniform class size.
Data:
77 77 85 72 69 80 75 69 80 64
72 68 48 60 44 87 52 74 72 76
63 81 56 71 54 76 81 78 55 74
82 59 40 73 61 80 58 75 63 48
46 51 80 42 65 54 79 57 72 67
Total 3342 Mean 66.84
Frequency Distribution
Class Interval   f    %
40-45            3    6%
46-51            4    8%
52-57            6    12%
58-63            6    12%
64-69            6    12%
70-75            10   20%
76-81            12   24%
82-87            3    6%
Total            50   100%
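The construction steps can be traced in Python on the 50 scores. One value is an assumption: the ideal class interval (ICI) is taken as 8, since the slides do not state it but the table has 8 classes; the class size is then rounded up to a whole number.

```python
import math

scores = [
    77, 77, 85, 72, 69, 80, 75, 69, 80, 64,
    72, 68, 48, 60, 44, 87, 52, 74, 72, 76,
    63, 81, 56, 71, 54, 76, 81, 78, 55, 74,
    82, 59, 40, 73, 61, 80, 58, 75, 63, 48,
    46, 51, 80, 42, 65, 54, 79, 57, 72, 67,
]

highest, lowest = max(scores), min(scores)
value_range = highest - lowest                            # R = HO - LO
ideal_class_count = 8                                     # assumed ICI
class_size = math.ceil(value_range / ideal_class_count)   # i = R / ICI, rounded up

# Build non-overlapping intervals of uniform size and tally each one.
table = []
lower = lowest
while lower <= highest:
    upper = lower + class_size - 1
    f = sum(lower <= s <= upper for s in scores)
    table.append((lower, upper, f, 100 * f / len(scores)))
    lower = upper + 1

for lower, upper, f, pct in table:
    print(f"{lower}-{upper}  {f:2d}  {pct:.0f}%")
print(f"Total {sum(scores)}  Mean {sum(scores) / len(scores):.2f}")
```

The last interval produced is 82-87, which is the interval needed to hold the highest score, 87.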
Uses of the Measures of Central Tendency
The Mean is used…
• For interval and ratio measurements
• When there are no extreme values in a
distribution since it is easily affected by
extremely high or extremely low scores
• When higher statistical computations
are wanted
• When the greatest reliability of the
measure of central tendency is wanted
since its computations include all the
given values
The Median is used…
• For ordinal and ranked measurements
• When there are extreme values, so that
the distribution is markedly skewed
• For an open-end distribution; that is, the
lowest or the highest class interval or
both are open-ended (e.g., 50 and below or
100 and above)
• When one desires to know whether the
cases fall within the upper half or the
lower half of a distribution.
The Mode is used…
For nominal and categorical data
When a rough or quick estimate of
a central value is wanted
When the most popular or the most
typical case or value in a
distribution is wanted
Limitations of the Measures of Central Tendency
The Limitations of the
Mean…
• It is the most widely used average, since
it is the most familiar. However, it is
often misused. It cannot be used if the
clustering of values or items is not
substantial.
• If the given values do not tend to cluster
around a central value, the mean is a
poor measure of central location.
• It is easily affected by extremely large or
small values. One small value can easily
pull down the mean.
The Limitations of the
Mean…
• The mean alone cannot be used to compare
distributions since the means of 2 or
more distributions may be the same while
their other characteristics are
entirely different. The means of
distribution A, whose values are 80, 85
and 90, and distribution B, whose values
are 86, 85 and 84, are both 85. We cannot
conclude, however, that both distributions
possess the same characteristics since
their patterns of dispersion or
variation are markedly different despite
having the same mean.
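A short Python check of distributions A and B makes the point concrete. The sample standard deviation from the standard library (`statistics.stdev`) is used here as one possible measure of dispersion; that choice is an assumption of this sketch, not something the slide specifies.

```python
from statistics import mean, stdev

distribution_a = [80, 85, 90]
distribution_b = [86, 85, 84]

for name, values in (("A", distribution_a), ("B", distribution_b)):
    # Same mean, but the spread around it differs.
    print(f"Distribution {name}: mean = {mean(values)}, s = {stdev(values):.2f}")
```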
The Limitations of the
Median…
• It is easily affected by the number of
items in a distribution.
• It cannot be determined if the given
values are not arranged according to
magnitude.
• If a distribution contains many values,
it becomes a laborious task to
arrange them according to magnitude.
• Its value is not as accurate as the mean
since it is just an ordinal statistic.
The Limitations of the
Mode…
It is seldom or rarely used since it
does not always exist.
Its value is just a rough estimate of
the center of concentration of a
distribution.
It is very unstable since its value
easily changes depending on the
approaches used in finding it.
Measures of
Variability
• The statistical tool used to
describe the degree to
which scores/observations are
scattered/dispersed.
• It is also used to determine
the degree of consistency/homogeneity
of scores.
Measures of Variability
Range (R) = HO - LO
Mean Absolute Deviation
(MAD)
Standard Deviation (s)
Variance (s²)
Coefficient of Variation (CV)
The following are the scores obtained
by two groups of 2nd year ASHE
students in N101:
Group A Group B
30 30
28 20
27 18
25 16
25 15
23 15
21 14
20 13
18 12
12 12
GROUP A
X      |X - Mean|   (X - Mean)²
30     7.1          50.41
28     5.1          26.01
27     4.1          16.81
25     2.1          4.41
25     2.1          4.41
23     0.1          0.01
21     1.9          3.61
20     2.9          8.41
18     4.9          24.01
12     10.9         118.81
Mean = 22.9   Sum = 41.2   Sum = 256.9

Range = 30 – 12 = 18
Mean Absolute Dev’n = 41.2/10 = 4.12
$$s^{2} = \frac{256.9}{10 - 1} \approx 28.54$$
$$s = \sqrt{\frac{256.9}{10 - 1}} = \sqrt{28.54} \approx 5.34$$
CV = (5.34/22.9) × 100 = 23.32%
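The Group A computations can be reproduced with a short Python sketch. The function name `variability` and the dictionary keys are illustrative assumptions; the variance divides by n - 1, as in the worked solution.

```python
import math


def variability(scores):
    """Range, mean absolute deviation, sample variance, SD and CV of the scores."""
    n = len(scores)
    mean = sum(scores) / n
    abs_dev = [abs(x - mean) for x in scores]
    sq_dev = [(x - mean) ** 2 for x in scores]
    variance = sum(sq_dev) / (n - 1)
    sd = math.sqrt(variance)
    return {
        "range": max(scores) - min(scores),
        "mad": sum(abs_dev) / n,
        "variance": variance,
        "sd": sd,
        "cv": sd / mean * 100,
    }


group_a = [30, 28, 27, 25, 25, 23, 21, 20, 18, 12]
for name, value in variability(group_a).items():
    print(f"{name}: {value:.2f}")
```

The CV printed by this sketch is about 23.33 rather than 23.32, because the code carries the unrounded standard deviation while the worked solution rounds it to 5.34 first. The same function can be applied to the Group B scores for comparison.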
Problem:
• Two seemingly equally excellent
BSN students are vying for an
academic honor for which only one
can be chosen to get the
award. The following are their
grades used as basis for the award:
Franzen : 91, 90, 94, 93, 92
Rico : 92, 92, 90, 94, 92
Who do you think deserves to get
the award?
Guiding
Principle
• The lesser the value of the
measure, the more
consistent, the more
homogeneous and the less
scattered are the
observations in the set of
data.
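As an illustration of the guiding principle, the grades of the two students in the problem can be compared in Python. Using the sample standard deviation (`statistics.stdev`) as the measure of variability is an assumption of this sketch; any of the listed measures could be used in the same way.

```python
from statistics import mean, stdev

grades = {
    "Franzen": [91, 90, 94, 93, 92],
    "Rico": [92, 92, 90, 94, 92],
}

# Mean and sample standard deviation for each student.
summary = {name: (mean(g), stdev(g)) for name, g in grades.items()}
for name, (m, s) in summary.items():
    print(f"{name}: mean = {m}, s = {s:.2f}")

# The student whose grades are least scattered.
most_consistent = min(summary, key=lambda name: summary[name][1])
print("Smaller standard deviation:", most_consistent)
```

Both students have the same mean, so the mean alone cannot separate them; the measure of variability shows whose grades are more consistent.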
