Statistical Analysis With Software Application
• Determine the level of measurement of a variable.
Let's break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
• Statistics is important because it enables people to make decisions based on empirical evidence.
• Statistics provides us with the tools needed to convert massive data into pertinent information that can be used in decision making.
• Statistics can provide us with information that we can use to make sensible decisions.
What information is referred to in the definition?
The information referred to in the definition is data. According to the Merriam-Webster dictionary, data are "factual information used as a basis for reasoning, discussion, or calculation".
Data can be numerical, as in height, or nonnumerical, as in gender. In either case, data describe characteristics of an individual.
Field of Statistics
A. Mathematical Statistics - The study and development of statistical theory and methods in the abstract.
B. Applied Statistics - The application of statistical methods to solve real problems involving randomly generated data and the development of new statistical methodology motivated by real problems. Example branches of Applied Statistics: psychometrics, econometrics, and biostatistics.
Limitation of Statistics
1. Statistics is not suitable for the study of qualitative phenomena.
2. Statistics does not study individuals.
4. Statistics may be misused.
5. Statistics is only one of the methods of studying a problem.
Definitions:
• The universe is the set of all entities under study.
• A population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
• An individual is a person or object that is a member of the population being studied.
• A statistic is a numerical summary of a sample.
• A sample is a subset of the population.
• Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
• Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.
• A parameter is a numerical summary of a population.
Example: Consider the following scenario.
You are walking down the street and notice that a person walking in front of you drops PHP100. Nobody seems to notice the PHP100 except you. Since you could keep the money without anyone knowing, would you keep the money or return it to the owner?
In the PHP100 study presented, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.
Suppose 39 of the 50 students stated that they would return the money to the owner. We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78% (39/50 = 0.78). This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population. So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.
If we extend the results of our sample to the population, we are performing inferential statistics. The generalization contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to
2. Collect the information needed to answer the questions.
Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.
Example:
A research objective is presented. For each research objective, identify the population and sample in the study.
1. The Philippine Mental Health Association contacts 1,028 teenagers who are 13 to 17 years of age and live in Antipolo City and asks whether or not they had been prescribed medications for any mental disorders, such as depression or anxiety.
Population: Teenagers 13 to 17 years of age who live in Antipolo City
Sample: 1,028 teenagers 13 to 17 years of age who live in Antipolo City
1. A farmer wanted to learn about the weight of his soybean crop. He randomly sampled 100 plants and weighed the soybeans on each plant.
Population: Entire soybean crop
Sample: 100 selected soybean plants
3. Organize and summarize the information.
Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.
4. Draw conclusions from the information.
In this step the information collected from the sample is generalized to the population.
Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.
Take Note!
If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.
Example:
For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.
1. A badminton player wants to know his average score for the past 10 games. (Descriptive Statistics)
2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries. (Inferential Statistics)
3. Janine wants to determine the variability of her six exam scores in Algebra. (Descriptive Statistics)
4. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years. (Inferential Statistics)
5. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)
DISTINCTION BETWEEN QUALITATIVE AND QUANTITATIVE VARIABLES
Variables are the characteristics of the individuals within the population. For example, recently my mother and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My mom noted that the tomatoes had different weights even though they came from the same plant. She discovered that variables such as weight may vary.
If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability so that we can learn to grow plants that yield the best tomatoes.
It is helpful to divide variables into different types, as different statistical methods are applicable to each. The main division is into qualitative (or categorical) and quantitative (or numerical) variables.
Variables can be classified into two groups:
1. Qualitative (categorical) variables yield categorical responses. Each value is a word or a code that represents a class or category.
2. Quantitative (numeric) variables take on numerical values representing an amount or quantity.
Example:
Determine whether the following variables are qualitative or quantitative.
- Food Preferences
- Stage of Disease
- Socioeconomic Class (First, Middle, Lower)
- Severity of Pain
possible values. If you count to get the value of a quantitative variable, it is discrete.
2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.
Example:
Determine whether the following quantitative variables are discrete or continuous.
1. The number of heads obtained after flipping a coin five times. (Discrete)
2. The number of cars that arrive at a McDonald's drive-through between 12:00 P.M. and 1:00 P.M. (Discrete)
Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also known as numerical discrete and numerical continuous.
Example:
Categorize each of the following as nominal, ordinal, interval or ratio measurement.
1. Ranking of college athletic teams. (Ordinal)
2. Employee number. (Nominal)
3. Number of vehicles registered. (Ratio)
4. Brands of soft drinks. (Nominal)
5. Number of car passers along C5 on a given day. (Ratio)
6. Zip code. (Nominal)
7. Degree of pain. (Ordinal)
ACTIVITIES/ASSESSMENTS:
Read each item carefully. Write the answer on the yellow paper. Answers Only.
I. A research objective is presented. For each, identify the (A) population and (B) sample in the study.
1. A polling organization contacts 2141 male university graduates who have a white-collar job and asks whether or not they had received a raise at work during the past 4 months.
A. ______________________________
B. ______________________________
2. Every year the PSA releases the Current Population Report based on a survey of 50,000 households. The goal of this report is to learn the demographic characteristics, such as income, of all households within the Philippines.
A. ______________________________
B. ______________________________
3. Researchers want to determine whether or not higher folate intake is associated with a lower risk of hypertension (high blood pressure) in women (27 to 44 years of age). To make this determination, they look at 7373 cases of hypertension in these women and find that those who consume at least 1000 micrograms per day of total folate had a decreased risk of hypertension compared with those who consume less than 200.
A. ______________________________
B. ______________________________
II. Indicate whether the following statements require the use of descriptive or inferential statistics.
______________1. A teacher wants to know the attitudes of all students towards abortion.
______________2. A market analyst of a sales firm draws a chart showing the sales figures of a given product for the period 2006-2007.
______________3. A forecaster predicts the results of an election using the number of votes cast in 15 out of 25 barangays.
______________4. Men are better in math than women.
______________5. Forty percent of the employees of an organization were recorded tardy for at least 15 working days.
______________10. Brands of soft drinks
______________11. Socioeconomic status
DATA COLLECTION AND BASIC CONCEPTS IN SAMPLING DESIGN
Objectives:
After successful completion of this independent module, you should be able to:
• Determine the sources of data (primary and secondary data).
• Distinguish the different methods of data collection under primary and secondary data.
• Determine the appropriate sample size.
• Differentiate various sampling techniques.
• Know the sources of errors in sampling.
It is a common practice that people receive large quantities of information every day through conversations, television, computers, radio, newspapers, posters, notices and instructions. It is precisely because there is so much information available that people need to be able to absorb, select and reject it. In everyday life, in business and industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it.
Analysis of data can lead to powerful results. Data can be used to offset anecdotal claims, such as the suggestion that cellular telephones cause brain cancer. Anecdotal means that the information being conveyed is based on casual observation, not scientific research. Because data are powerful, they can be dangerous when misused. The misuse of data usually occurs when data are incorrectly obtained or analyzed. For example, radio or television talk shows regularly ask poll questions for which respondents must call in or use the Internet to supply their vote. Most likely, the individuals who are going to call in are those who have a strong opinion about the topic. This group is not likely to be representative of people in general, so the results of the poll are not meaningful. Whenever we look at data, we should be mindful of where the data come from.
5. Collect data.
Without proper planning for data collection, a number of problems can occur. If the data collection steps and processes are not properly planned, the research project can ultimately end up with a data set that does not serve the purpose for which it was intended. For example, if more than one person is involved in the data collection, but the data collectors do not follow consistent data collection practices, they can end up with data with different units, collection processes, and variable names.
Consequences of Improperly Collected Data
• Inability to answer research questions accurately.
• Inability to repeat and validate the study.
• Distorted findings resulting in wasted resources.
• Misleading other researchers to pursue fruitless avenues of investigation.
• Compromising decisions for public policy.
• Causing harm to human participants and animal subjects.
Steps in Data Gathering
1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
Choosing the Method of Data Collection
Decision-makers need information that is relevant, timely, accurate and usable. The cost of obtaining, processing and analyzing these data is high. The challenge is to find ways which lead to information that is cost-effective, relevant, timely and important for immediate use. Some methods pay attention to timeliness and reduction in cost. Others pay attention to accuracy and the strength of the method in using scientific approaches.
The statistical data may be classified under two categories, depending upon the sources: Primary Data and Secondary Data.
SOURCES OF DATA
Whether conducting research in the social sciences, humanities, arts, or natural sciences, the ability to distinguish between primary and secondary sources is essential.
Primary Sources - Provide a first-hand account of an event or time period and are considered to be authoritative. They represent original thinking, reports on discoveries or events, or they can share new information. Often these sources are created at the time the events occurred, but they can also include sources that are created later. They are usually the first formal appearance of original research.
Primary Data - are data documented by the primary source. The data collectors documented the data themselves.
The first-hand information obtained by the investigator is more reliable and accurate since the investigator can extract the correct information by removing doubts, if any, in the minds of the respondents regarding certain questions. High response rates might be obtained since the answers to various questions are obtained on the spot. It permits explanation of questions concerning difficult subject matter.
Secondary Sources - offer an analysis, interpretation or a restatement of primary sources and are considered to be persuasive. They often involve generalization, synthesis, interpretation, commentary or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to describe or explain primary sources.
Secondary Data - are data documented by a secondary source. The data collectors had the data documented by other sources.
In secondary data, data are primary data for the agency that collected them, and become secondary for someone else who uses these data for his own purposes.
Secondary data are less expensive to collect, both in money and time. These data can also be better utilized, and sometimes the quality of such data may be better because these might have been collected by persons who were specially trained for that purpose.
On the other hand, such data must be used with great care, because such data may also be full of errors due to the fact that the purpose of the collection of the data by the primary agency may have been different from the purpose of the user of these secondary data. Secondly, there may have been bias introduced, the size of the sample may have been inadequate, or there may have been arithmetic or definition errors; hence, it is necessary to critically investigate the validity of the secondary data.
The primary data can be collected by the following five methods:
1. Direct personal interviews - The researcher has direct contact with the interviewee. The researcher gathers information by asking questions to the interviewee.
2. Indirect/Questionnaire Method - This method of data collection involves sourcing and accessing existing data that were originally collected for the purpose of the study.
Designing good "questioning tools" forms an important and time-consuming phase in the development of most research proposals.
Once the decision has been made to use these techniques, the following questions should be considered before designing our tools:
• What exactly do we want to know, according to the objectives and variables we identified earlier? Is questioning the right technique to obtain all answers, or do we need additional techniques, such as observations or analysis of records?
• Of whom will we ask questions and what techniques will we use? Do we understand the topic sufficiently to design a questionnaire, or do we need some loosely structured interviews with key informants or a focus group discussion first to orient ourselves?
• Are our informants mainly literate or illiterate? If illiterate, the use of self-administered questionnaires is not an option.
• How large is the sample that will be interviewed? Studies with many respondents often use shorter, highly structured questionnaires, whereas smaller studies allow more flexibility and may use questionnaires with a number of open-ended questions.
Key Design Principles of a Good Questionnaire
1. Keep the questionnaire as short as possible.
7. Write special instructions for interviewers or respondents.
9. Always test your questions before conducting the survey. (Pre-test)
An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent's own words.
Example:
- Can you describe exactly what the traditional birth attendant did when your labor started?
- What do you think are the reasons for a high drop-out rate of village health committee members?
A closed-ended question is a type of question that includes a list of response categories from which the respondent will select his answer. It is useful if the range of possible responses is known. This type of question is usually appropriate for collecting objective data.
Question wording and question order have a large effect on the responses obtained.
Two surveys were taken in late 1993/early 1994 about Elvis Presley.
One survey asked: "In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don't you think so?"
The second survey asked: "A recent television show examined various theories about Elvis Presley's death. Do you think it is possible that Elvis is alive or not?"
8% of the respondents to the first question said it is possible that Elvis is still alive, and 16% of respondents to the second question said it is possible that Elvis is still alive.
3. A focus group is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.
4. Experiment is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.
Bear in mind that the experimental method has several limitations that you should be aware of.
- Unrealistic Controlled Environments
- Inability to Control for All Variables
5. Observation is a technique that involves systematically selecting, watching and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of gaining specified information. It includes all methods from simple visual observations to the use of high-level machines and measurements, sophisticated equipment or facilities such as:
- Radiographic
- Biochemical
- X-ray machines
- Microscope
- Clinical examinations
- Microbiological examinations
The secondary data can be collected by the following five methods:
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of government departments.
5. Information from official publications.
Take Note!
• Always investigate the validity and reliability of the data by examining the collection method employed by your source.
SAMPLE SIZE
The sample size is typically denoted by n, and it is always a positive integer. No exact sample size can be given here; it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.
Take Note!
- Representativeness, not size, is the more important consideration.
- Use no fewer than 30 subjects if possible.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).
The following factors determine the appropriate sample size:
1. Level of Precision
Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
3. Degree of Variability
$$ n \ge \left( \frac{Z\sigma}{e} \right)^2 $$
where:
Z is the z-score corresponding to the level of confidence.
For example, at 95% confidence (Z = 1.96) with σ = 0.5 and e = 0.03:
$$ n \ge \left( \frac{1.96(0.5)}{0.03} \right)^2 = 1067.11 $$
We need a sample of 1,068 for our study.
When p = 0.5, p(1 − p) takes its maximum value of 0.25. This is called the most conservative estimate, since it gives the largest possible estimate of n.
• Estimating Proportion (Infinite Population)
The sample size needed to construct a confidence interval for p with a specified margin of error e is given by
$$ n \ge \left( \frac{Z}{e} \right)^2 p(1 - p) $$
$$ n \ge \left( \frac{2.58}{0.01} \right)^2 0.5(1 - 0.5) = 16{,}641 $$
The conservative formula, obtained by setting p = 0.5:
$$ n \ge \frac{1}{4} \left( \frac{Z}{e} \right)^2 \approx 385 $$
where the confidence level is 95% (Z = 1.96) and e = 0.05.
$$ n \ge \frac{N}{1 + Ne^2} $$
where N is the population size and e is the margin of error.
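The sample-size formulas in this section can be checked numerically. The sketch below is a minimal Python illustration, not part of the original module: the function names (`n_mean`, `n_proportion`, `n_slovin`, `n_fpc`) are hypothetical, and it assumes the convention used in the text of rounding up to the next whole subject (1067.11 becomes 1068). The last formula above is commonly called Slovin's formula.

```python
import math

def _round_up(x):
    # Round up to the next whole subject; round() first strips floating-point noise
    return math.ceil(round(x, 9))

def n_mean(z, sigma, e):
    """n >= (Z * sigma / e)^2"""
    return _round_up((z * sigma / e) ** 2)

def n_proportion(z, p, e):
    """n >= (Z / e)^2 * p(1 - p); p = 0.5 gives the conservative estimate"""
    return _round_up((z / e) ** 2 * p * (1 - p))

def n_slovin(N, e):
    """n >= N / (1 + N e^2) for a population of known size N"""
    return _round_up(N / (1 + N * e ** 2))

def n_fpc(n0, N):
    """Finite population correction: n >= n0 / (1 + (n0 - 1) / N)"""
    return _round_up(n0 / (1 + (n0 - 1) / N))

print(n_mean(1.96, 0.5, 0.03))        # 1068
print(n_proportion(2.58, 0.5, 0.01))  # 16641
print(n_proportion(1.96, 0.5, 0.05))  # 385
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the target e.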
Example:
The researcher needs to survey 286 BS Stat students.
• Finite Population Correction
If the population is small, then the sample size can be reduced slightly:
$$ n \ge \frac{n_0}{1 + \frac{n_0 - 1}{N}} $$
where n0 is the sample size computed for an infinite population and N is the population size.
- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If a sample is not representative, it is biased. You cannot generalize to the population from your statistical data.
Some definitions are needed to make the notion of a good sample more precise.
Definitions:
• Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.
• Target population - The complete collection of observations we want to study.
• Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
• Sample - A subset of a population.
• Sampling unit - A unit that can be selected for a sample. We may want to study individuals but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
• Sampling frame - A list, map, or other specification of sampling units in the population from which a sample may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street addresses.
• Sampling technique/Sampling Strategies - A plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.
• Sampling Bias - This involves problems in your sampling which reveal that your sample is not representative of your population.
The following examples indicate some ways in which selection bias can occur:
- Deliberately or purposively selecting a "representative" sample.
- Misspecifying the target population.
- Failing to include all of the target population in the sampling frame, called undercoverage.
- Including population units in the sampling frame that are not in the target population, called overcoverage.
- Having multiple listings of the same unit in the sampling frame.
- Substituting a convenient member of a population for a designated member who is not readily available.
- Failing to obtain responses from all of the chosen sample. (Nonresponse)
- Allowing the sample to consist entirely of volunteers.
Advantages of Sampling Over Complete Enumeration
- Less Labor
- Reduced Cost
- Greater Speed
- Greater Scope
- Greater Efficiency and Accuracy
- Convenience
- Ethical Considerations
Two Types of Samples
1. Probability Sample
- Samples are obtained using some objective chance mechanism, thus involving randomization.
- They require the use of a complete listing of the elements of the universe, called the sampling frame.
- The probabilities of selection are known.
- Most basic method of drawing a probability sample.
- Assigns equal probabilities of selection to each possible sample.
Sampling Procedure
$$ k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}} $$
Example:
We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked from the sampling frame.
Solution:
$$ k = \frac{500}{50} = 10 $$
We start with a randomly chosen unit i and then take every kth unit after it. Suppose the random number i is 6; then we select units 6, 16, 26, 36, 46, ...
Advantage: Drawing the sample is easy. It is easy to administer in the field, and the sample is spread evenly over the population.
Disadvantage: It may give poor precision when unsuspected periodicity is present in the population.
• Stratified Random Sampling
- It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.
- The individuals within each stratum should be homogeneous (or similar) in some way.
Example:
A sample of 50 students is to be drawn from a population consisting of 500 students belonging to two institutions, A and B. The number of students in institution A is 200 and in institution B is 300. How will you draw the sample using proportional allocation?
Given: n = 50, N = 500, N1 = 200, N2 = 300
$$ n_1 = \left( \frac{n}{N} \right) N_1 = \left( \frac{50}{500} \right) 200 = 20 $$
$$ n_2 = \left( \frac{n}{N} \right) N_2 = \left( \frac{50}{500} \right) 300 = 30 $$
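The systematic and proportional-allocation steps above can be sketched in Python as follows. This is an illustrative sketch, not the module's own procedure: the function names are hypothetical, positions are numbered from 1 in the sampling frame, the random start is assumed to be drawn from 1 to k, and k = N // n assumes N is a multiple of n (as with 500 and 50).

```python
import random

def systematic_sample(N, n, start=None):
    """Positions i, i + k, i + 2k, ... (1-based) with k = N // n."""
    k = N // n
    if start is None:
        start = random.randint(1, k)  # assumed: random start between 1 and k
    return [start + j * k for j in range(n)]

def proportional_allocation(n, strata_sizes):
    """n_h = (n / N) * N_h for each stratum h."""
    N = sum(strata_sizes)
    return [n * N_h / N for N_h in strata_sizes]

positions = systematic_sample(500, 50, start=6)
print(positions[:5])                            # [6, 16, 26, 36, 46]
print(proportional_allocation(50, [200, 300]))  # [20.0, 30.0]
```

When n·N_h/N is not a whole number, the allocations have to be rounded so that they still add up to n; the sketch leaves that step out.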
Example:
Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households that are far apart.
Example:
Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.
1. Organize the sampling process into stages where the unit of analysis is systematically grouped.
• Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. It is used to determine the target population of those who will be taken for the study.
• Judgement Sampling - selects the sample in accordance with an expert's judgment.
Cases wherein Non-Probability Sampling is Useful
- Only a few are willing to be interviewed
- Extreme difficulties in locating or identifying subjects
ACTIVITIES/ASSESSMENTS:
I. Determine if the source would be a primary or a secondary source.
______________1. Government Records
______________2. Dictionary
______________3. Artifact
MODULE 3: DESCRIPTIVE STATISTICS
OBJECTIVES:
After successful completion of this module, you should be able to:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
Age        Number (in thousands)
25 - 34    14,482
35 - 44    14,156
45 - 54    13,801
55 - 64    12,123
65 - 74    7,010
Scores Frequency
10 - 19 25
20 - 29 36
30 - 39 40
40 and over 12
$$ cw = \frac{x_{max} - x_{min}}{n_c} $$
where n_c is the number of classes.
Scores Frequency
1 - 10 5
11 - 20 9
21 - 30 10
31 - 40 12
41 - 50 24
Total 60
Ungrouped data without a frequency distribution
x      f
0      7
1      15
2      12
3      4
4      5
5      2
Total  45
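Sample Mean
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \bar{x} = \frac{\sum_{i=1}^{r} f_i x_i}{n} $$
Population Mean
$$ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \mu = \frac{\sum_{i=1}^{r} f_i x_i}{N} $$
(The right-hand forms apply to data grouped into r classes with frequencies f_i.)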
$$ \tilde{x} = LB + \frac{\frac{n}{2} - {<}cf}{f}\, i $$
$$ \hat{x} = LB + \frac{d_1}{d_1 + d_2}\, i $$
Data Set I 108 112 116 120 124
Data Set II 108 112 116 120 205
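The module does not state what these two data sets are for. Assuming they are meant to contrast the mean and the median, the short Python check below (using the standard-library `statistics` module) shows that replacing 124 with 205 pulls the mean from 116 to 132.2 while the median stays at 116.

```python
from statistics import mean, median

data_sets = {
    "Data Set I": [108, 112, 116, 120, 124],
    "Data Set II": [108, 112, 116, 120, 205],
}

for name, values in data_sets.items():
    # The single large value 205 shifts the mean but not the median
    print(f"{name}: mean = {mean(values):.1f}, median = {median(values)}")
```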
Class Interval Frequency
55 - 59 55
50 - 54 23
45 - 49 37
40 - 44 37
35 - 39 48
30 - 34 42
25 - 29 27
Class Interval   Frequency (f)   x    fx
55 - 59          3               57   171
50 - 54          6               52   312
45 - 49          7               47   329
40 - 44          9               42   378
35 - 39          6               37   222
30 - 34          4               32   128
25 - 29          5               27   135
Total            n = 40               Σfx = 1,675
The class mark x is the midpoint of each class, where LL and UL are the lower and upper class limits:
$$ x = \frac{LL + UL}{2}, \qquad x = \frac{55 + 59}{2} = 57, \qquad x = \frac{50 + 54}{2} = 52 $$
$$ \bar{x} = \frac{\sum_{i=1}^{7} f_i x_i}{n} = \frac{1{,}675}{40} = 41.88 $$
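The grouped-mean computation can be reproduced with a few lines of Python. This is a minimal sketch, assuming integer class limits so that each class mark is (lower limit + upper limit) / 2; the variable names are illustrative.

```python
# Class limits and frequencies from the worked example
classes = [(55, 59), (50, 54), (45, 49), (40, 44), (35, 39), (30, 34), (25, 29)]
freqs = [3, 6, 7, 9, 6, 4, 5]

marks = [(lower + upper) / 2 for lower, upper in classes]  # class marks x
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, marks))
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)  # 40 1675.0 41.875
```

The unrounded mean is 41.875, which the worked example reports as 41.88.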
Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40
Each lower boundary is the lower class limit minus 0.5 (55 − 0.5 = 54.5, 50 − 0.5 = 49.5, 45 − 0.5 = 44.5, ...). The cumulative frequencies are accumulated from the lowest class: 5, 5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40.
$$ \frac{n}{2} = \frac{40}{2} = 20 $$
The median class is 40 - 44, the first class whose <cf reaches 20. In the formula, <cf is the cumulative frequency of the class just below it (15), f = 9, and i = 5.
$$ \tilde{x} = LB + \frac{\frac{n}{2} - {<}cf}{f}\, i = 39.5 + \frac{(20 - 15)}{9}(5) = 42.28 $$
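The median formula can be sketched as a small Python function. It is illustrative only: it assumes the classes are listed from lowest to highest (the reverse of the table's order) with integer limits, so that LB = lower limit − 0.5 and i = upper limit − lower limit + 1; the function name is hypothetical.

```python
def grouped_median(classes, freqs):
    """LB + ((n/2 - <cf) / f) * i, where <cf counts the classes below the median class."""
    n = sum(freqs)
    half = n / 2
    cf_below = 0
    for (lower, upper), f in zip(classes, freqs):
        if cf_below + f >= half:          # first class whose <cf reaches n/2
            lb = lower - 0.5              # lower class boundary
            width = upper - lower + 1     # class size i
            return lb + (half - cf_below) / f * width
        cf_below += f
    raise ValueError("frequencies must not be empty")

classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]
freqs = [5, 4, 6, 9, 7, 6, 3]
print(round(grouped_median(classes, freqs), 2))  # 42.28
```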
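For the mode of the same distribution, the modal class is 40 - 44, the class with the highest frequency (f = 9). Here d1 is the difference between its frequency and that of the class just below it (35 - 39), and d2 is the difference between its frequency and that of the class just above it (45 - 49).
$$ d_1 = 9 - 6 = 3 \qquad d_2 = 9 - 7 = 2 $$
$$ \hat{x} = LB + \frac{d_1}{d_1 + d_2}\, i = 39.5 + \frac{3}{3 + 2}(5) = 42.5 $$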
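A matching sketch for the mode, under the same assumptions as the median sketch (classes listed from lowest to highest, integer limits). It takes the first class with the highest frequency as the modal class and, as an added assumption, treats a missing neighbour at either end of the table as having frequency 0; the function name is hypothetical.

```python
def grouped_mode(classes, freqs):
    """LB + (d1 / (d1 + d2)) * i for the modal class."""
    j = max(range(len(freqs)), key=lambda idx: freqs[idx])  # modal class index
    lower, upper = classes[j]
    f_below = freqs[j - 1] if j > 0 else 0                # class just below
    f_above = freqs[j + 1] if j + 1 < len(freqs) else 0   # class just above
    d1 = freqs[j] - f_below
    d2 = freqs[j] - f_above
    return (lower - 0.5) + d1 / (d1 + d2) * (upper - lower + 1)

classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]
freqs = [5, 4, 6, 9, 7, 6, 3]
print(grouped_mode(classes, freqs))  # 42.5
```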
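Quantiles of grouped data (left) and the position of the quantile in ordered ungrouped data (right):
$$ Q_k = LB + \frac{\frac{nk}{4} - {<}cf}{f}\, i \qquad Q_{class} = \frac{nk}{4} + 0.5 $$
$$ D_k = LB + \frac{\frac{nk}{10} - {<}cf}{f}\, i \qquad D_{class} = \frac{nk}{10} + 0.5 $$
$$ P_k = LB + \frac{\frac{nk}{100} - {<}cf}{f}\, i \qquad P_{class} = \frac{nk}{100} + 0.5 $$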
Month      Hours Lost (x)
January 55
February 23
March 37
April 37
May 48
June 42
July 27
August 20
September 30
October 32
November 24
December 40
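Data arranged in ascending order, with positions:
x:          20  23  24  27  30  32  37  37  40  42  48  55
Position:    1   2   3   4   5   6   7   8   9  10  11  12

$$Q_{\text{class}} = \frac{(12)(3)}{4} + 0.5 = 9.5$$
Position 9.5 lies halfway between the 9th value (40) and the 10th value (42):
$$Q_3 = 40 + 0.5(42 - 40) = 41$$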
$$D_{\text{class}} = \frac{(12)(4)}{10} + 0.5 = 5.3$$
Position 5.3 lies between the 5th value (30) and the 6th value (32):
$$D_4 = 30 + 0.3(32 - 30) = 30.6$$
$$P_{\text{class}} = \frac{(12)(55)}{100} + 0.5 = 7.1$$
Position 7.1 lies between the 7th value (37) and the 8th value (37):
$$P_{55} = 37 + 0.1(37 - 37) = 37$$
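The position method used for Q3, D4 and P55 can be written as one Python function, since only the divisor (4, 10 or 100) changes. `fractile_ungrouped` is a hypothetical name; clamping positions that fall before the first value or after the last value to the ends of the data is an assumption, not part of the module's method.

```python
# Monthly hours lost from the table above.
hours_lost = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]


def fractile_ungrouped(data, k, parts):
    """k-th fractile of ungrouped data (parts = 4, 10 or 100).

    Position = n*k/parts + 0.5 (1-based, on the sorted data); a fractional
    position is interpolated between the two neighbouring values.
    Clamping positions outside 1..n to the ends is an assumption.
    """
    xs = sorted(data)
    n = len(xs)
    pos = n * k / parts + 0.5
    whole = int(pos)                   # integer part of the position
    frac = pos - whole                 # decimal part, used as the weight
    if whole < 1:
        return xs[0]
    if whole >= n:
        return xs[-1]
    lower, upper = xs[whole - 1], xs[whole]
    return lower + frac * (upper - lower)


q3 = fractile_ungrouped(hours_lost, 3, 4)
d4 = fractile_ungrouped(hours_lost, 4, 10)
p55 = fractile_ungrouped(hours_lost, 55, 100)
print(q3, round(d4, 2), p55)
```

With the monthly data it gives Q3 = 41, D4 = 30.6 and P55 = 37.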
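Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40

For Q1, k = 1: $\frac{nk}{4} = \frac{(40)(1)}{4} = 10$. The Q1 class is 35 - 39, the first class whose <cf (15) reaches 10; the <cf of the class below it is 9.
$$Q_1 = LB + \left( \frac{\frac{nk}{4} - {<}cf}{f} \right) i = 34.5 + \frac{(10 - 9)(5)}{6} = 35.33$$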
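Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40

For D7, k = 7: $\frac{nk}{10} = \frac{(40)(7)}{10} = 28$. The D7 class is 45 - 49, the first class whose <cf (31) reaches 28; the <cf of the class below it is 24.
$$D_7 = LB + \left( \frac{\frac{nk}{10} - {<}cf}{f} \right) i = 44.5 + \frac{(28 - 24)(5)}{7} = 47.36$$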
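Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40

For P10, k = 10: $\frac{nk}{100} = \frac{(40)(10)}{100} = 4$. The P10 class is 25 - 29, the first class whose <cf (5) reaches 4; no class lies below it, so <cf = 0.
$$P_{10} = LB + \left( \frac{\frac{nk}{100} - {<}cf}{f} \right) i = 24.5 + \frac{(4 - 0)(5)}{5} = 28.5$$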
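The grouped formulas for Qk, Dk and Pk differ only in the divisor, so one sketch covers all three. `grouped_fractile` is a hypothetical helper that assumes integer class limits and <cf accumulated from the lowest class upward, as in the table above.

```python
# (lower limit, upper limit, f) from the table above.
classes = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
           (35, 39, 6), (30, 34, 4), (25, 29, 5)]


def grouped_fractile(classes, k, parts):
    """k-th fractile of grouped data: LB + ((nk/parts - <cf) / f) * i.

    parts = 4 (quartiles), 10 (deciles) or 100 (percentiles). Assumes integer
    class limits (LB = lower limit - 0.5) and <cf built from the lowest class.
    """
    ordered = sorted(classes)          # ascending by lower limit
    n = sum(f for _, _, f in ordered)
    target = n * k / parts
    cf_below = 0
    for lower, upper, f in ordered:
        if cf_below + f >= target:     # first class whose <cf reaches nk/parts
            i = upper - lower + 1      # class width
            return (lower - 0.5) + (target - cf_below) / f * i
        cf_below += f
    raise ValueError("k/parts must not exceed 1")


q1 = grouped_fractile(classes, 1, 4)
d7 = grouped_fractile(classes, 7, 10)
p10 = grouped_fractile(classes, 10, 100)
print(round(q1, 2), round(d7, 2), round(p10, 2))
```

It reproduces Q1 ≈ 35.33, D7 ≈ 47.36 and P10 = 28.5.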
Class Interval   f
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Class Interval   f    LB     <cf
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Total            n =

LB = lower class limit − 0.5:  18 − 0.5 = 17.5,  25 − 0.5 = 24.5,  32 − 0.5 = 31.5, and so on.
Class Interval   f    LB     <cf
18 - 24          28   17.5   28
25 - 31          54   24.5
32 - 38          38   31.5
39 - 45          20   38.5
46 - 52          17   45.5
53 - 59          3    52.5
Total            n = 160
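Class Interval   f    LB     <cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160

For Q2, k = 2: $\frac{nk}{4} = \frac{(160)(2)}{4} = 80$. The Q2 class is 25 - 31, the first class whose <cf (82) reaches 80; the <cf of the class below it is 28, and i = 7.
$$Q_2 = LB + \left( \frac{\frac{nk}{4} - {<}cf}{f} \right) i = 24.5 + \frac{(80 - 28)(7)}{54} = 31.24$$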
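Class Interval   f    LB     <cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160

For D5, k = 5: $\frac{nk}{10} = \frac{(160)(5)}{10} = 80$. The D5 class is 25 - 31, the first class whose <cf (82) reaches 80; the <cf of the class below it is 28, and i = 7.
$$D_5 = LB + \left( \frac{\frac{nk}{10} - {<}cf}{f} \right) i = 24.5 + \frac{(80 - 28)(7)}{54} = 31.24$$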
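Class Interval   f    LB     <cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160

For P50, k = 50: $\frac{nk}{100} = \frac{(160)(50)}{100} = 80$. The P50 class is 25 - 31, the first class whose <cf (82) reaches 80; the <cf of the class below it is 28, and i = 7.
$$P_{50} = LB + \left( \frac{\frac{nk}{100} - {<}cf}{f} \right) i = 24.5 + \frac{(80 - 28)(7)}{54} = 31.24$$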
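Because (160)(2)/4, (160)(5)/10 and (160)(50)/100 all equal 80, Q2, D5 and P50 fall at the same point, the median. The sketch below checks this for the second data set with a hypothetical `grouped_fractile` helper, assuming integer class limits and <cf built from the lowest class upward.

```python
# Second data set: (lower limit, upper limit, f).
classes = [(18, 24, 28), (25, 31, 54), (32, 38, 38),
           (39, 45, 20), (46, 52, 17), (53, 59, 3)]


def grouped_fractile(classes, k, parts):
    """k-th fractile of grouped data: LB + ((nk/parts - <cf) / f) * i."""
    ordered = sorted(classes)          # ascending by lower limit
    n = sum(f for _, _, f in ordered)
    target = n * k / parts
    cf_below = 0
    for lower, upper, f in ordered:
        if cf_below + f >= target:     # first class whose <cf reaches nk/parts
            i = upper - lower + 1      # class width (7 here)
            return (lower - 0.5) + (target - cf_below) / f * i
        cf_below += f
    raise ValueError("k/parts must not exceed 1")


q2 = grouped_fractile(classes, 2, 4)
d5 = grouped_fractile(classes, 5, 10)
p50 = grouped_fractile(classes, 50, 100)
print(round(q2, 2), round(d5, 2), round(p50, 2))
```

All three values are about 31.24.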
Figure 1 Figure 2
$$R = X_{\max} - X_{\min}$$
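Sample Standard Deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \qquad s = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \bar{x})^2}{n - 1}}$$
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{r} f (x_i - \mu)^2}{N}}$$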
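Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad s^2 = \frac{\sum_{i=1}^{r} f (x_i - \bar{x})^2}{n - 1}$$
Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad \sigma^2 = \frac{\sum_{i=1}^{r} f (x_i - \mu)^2}{N}$$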
55 - 59 55
50 - 54 23
45 - 49 37
40 - 44 37
35 - 39 48
30 - 34 42
25 - 29 27
Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =        $\sum_{i=1}^{7} f x_i$ =        $\sum_{i=1}^{7} f (x_i - \bar{x})^2$ =
Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3   57   171   228.61
50 - 54          6   52   312   102.41
45 - 49          7   47   329   26.21
40 - 44          9   42   378   0.01
35 - 39          6   37   222   23.81
30 - 34          4   32   128   97.61
25 - 29          5   27   135   221.41
Total            n = 40     $\sum_{i=1}^{7} f x_i = 1{,}675$     $\sum_{i=1}^{7} f (x_i - \bar{x})^2$ =
Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3   57   171   228.61      685.83
50 - 54          6   52   312   102.41      614.46
45 - 49          7   47   329   26.21       183.47
40 - 44          9   42   378   0.01        0.09
35 - 39          6   37   222   23.81       142.86
30 - 34          4   32   128   97.61       390.44
25 - 29          5   27   135   221.41      1107.05
Total            n = 40     $\sum_{i=1}^{7} f x_i = 1{,}675$     $\sum_{i=1}^{7} f (x_i - \bar{x})^2 = 3{,}124.20$

$f(x_1 - \bar{x})^2 = 3(228.61) = 685.83$
$f(x_2 - \bar{x})^2 = 6(102.41) = 614.46$
$f(x_3 - \bar{x})^2 = 7(26.21) = 183.47$
Class Interval   (xi − x̄)²   f(xi − x̄)²
55 - 59          228.61      685.83
50 - 54          102.41      614.46
45 - 49          26.21       183.47
40 - 44          0.01        0.09
35 - 39          23.81       142.86
30 - 34          97.61       390.44
25 - 29          221.41      1107.05
Total                        $\sum_{i=1}^{7} f (x_i - \bar{x})^2 = 3{,}124.20$

$$s = \sqrt{\frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3{,}124.20}{40 - 1}} = 8.95$$
$$s^2 = \frac{\sum_{i=1}^{7} f (x_i - \bar{x})^2}{n - 1} = \frac{3{,}124.20}{40 - 1} = 80.11$$
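The sample variance and standard deviation of the grouped data can be checked with a short Python sketch. `grouped_sample_variance` is a hypothetical helper; it uses class midpoints as x and divides by n − 1, as in the formulas above.

```python
import math

# (lower limit, upper limit, f) from the table above.
classes = [(55, 59, 3), (50, 54, 6), (45, 49, 7), (40, 44, 9),
           (35, 39, 6), (30, 34, 4), (25, 29, 5)]


def grouped_sample_variance(classes):
    """Return (sum of f(x - x̄)², s²) for grouped data, with x = class midpoint."""
    n = sum(f for _, _, f in classes)
    mids = [((lower + upper) / 2, f) for lower, upper, f in classes]
    x_bar = sum(f * x for x, f in mids) / n        # unrounded mean, 41.875
    ss = sum(f * (x - x_bar) ** 2 for x, f in mids)
    return ss, ss / (n - 1)


ss, s2 = grouped_sample_variance(classes)
s = math.sqrt(s2)
print(f"sum of f(x - x̄)² = {ss:.2f}, s² = {s2:.2f}, s = {s:.2f}")
```

Because the sketch keeps x̄ = 41.875 unrounded and does not round the squared deviations, the sum comes out to about 3,124.38 rather than 3,124.20; s² and s still round to 80.11 and 8.95.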
Data Set 15 13 20 19 14
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0 \quad \text{whereas} \quad \sum_{i=1}^{n} (x_i - \bar{x})^2 \neq 0$$
$\bar{x} - s = 16.2 - 3.11 = 13.09$
$\bar{x} + s = 16.2 + 3.11 = 19.31$
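The two facts illustrated by this data set (deviations from the mean sum to zero, while squared deviations do not) can be checked numerically. The values 16.2 and 3.11 match the mean and sample standard deviation of 15, 13, 20, 19, 14, so the sketch below reads 13.09 and 19.31 as x̄ − s and x̄ + s; that reading is an inference from the arithmetic.

```python
import math

data = [15, 13, 20, 19, 14]
n = len(data)
x_bar = sum(data) / n                        # 81 / 5 = 16.2
deviations = [x - x_bar for x in data]
sum_dev = sum(deviations)                    # 0, up to floating-point rounding
sum_sq = sum(d ** 2 for d in deviations)     # positive, so it can measure spread
s = math.sqrt(sum_sq / (n - 1))              # sample standard deviation

print(f"x̄ = {x_bar}, Σ(x - x̄) = {sum_dev:.10f}, Σ(x - x̄)² = {sum_sq:.2f}")
print(f"s = {s:.2f}, x̄ - s = {x_bar - s:.2f}, x̄ + s = {x_bar + s:.2f}")
```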
[Figure: distribution shapes for skewness < 0 (skewed to the left), skewness > 0 (skewed to the right), and skewness = 0 (symmetric)]
$$Sk = \frac{\bar{x} - \hat{x}}{s} \qquad Sk = \frac{3(\bar{x} - \tilde{x})}{s}$$
$$k = \frac{QD}{P_{90} - P_{10}}$$
where QD = (Q3 − Q1)/2 is the quartile deviation.
Data Set A 40 38 42 40 39 39 43 40 39 40
Data Set B 46 37 40 33 42 36 40 47 34 45
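The module does not show here what is computed for Data Sets A and B. Since both sets have the same mean (40) and they follow the kurtosis formula, one plausible use is to compare their percentile coefficient of kurtosis. The sketch below assumes that reading, uses the position method (nk/parts + 0.5) from the ungrouped quartile examples, and takes QD = (Q3 − Q1)/2; the helper names are hypothetical.

```python
data_a = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
data_b = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]


def fractile(data, k, parts):
    """k-th fractile of ungrouped data, at position n*k/parts + 0.5 (1-based)."""
    xs = sorted(data)
    n = len(xs)
    pos = n * k / parts + 0.5
    whole = int(pos)
    frac = pos - whole
    if whole < 1:                       # clamping to the ends is an assumption
        return xs[0]
    if whole >= n:
        return xs[-1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


def percentile_kurtosis(data):
    """k = QD / (P90 - P10), with QD = (Q3 - Q1) / 2."""
    qd = (fractile(data, 3, 4) - fractile(data, 1, 4)) / 2
    return qd / (fractile(data, 90, 100) - fractile(data, 10, 100))


k_a = percentile_kurtosis(data_a)
k_b = percentile_kurtosis(data_b)
print("A", round(k_a, 3), "B", round(k_b, 3))
```

It gives k = 0.125 for Data Set A and about 0.346 for Data Set B. For a normal distribution this coefficient is about 0.263, so under this reading A is more peaked (leptokurtic) and B flatter (platykurtic) than a normal curve.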
Normal Distribution
Normal Curve
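Properties of the Normal Curve
[Figure: normal curve with μ − σ, μ, and μ + σ marked on the horizontal axis; the total area under the curve is 1, with an area of 0.50 on each side of μ]
[Figure: pairs of normal curves with μ1 = μ2, σ1 < σ2; μ1 < μ2, σ1 < σ2; and μ1 < μ2, σ1 = σ2]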
$$z = \frac{x - \mu}{\sigma}$$
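A z-score is a one-line computation; the sketch below also adds the standard normal cumulative probability P(Z < z) via Python's `math.erf`, which gives the same values a z table lists, to table precision. The function names are illustrative, and the sample values μ = 38.72 and σ = 3.17 are those used in the worked examples that follow.

```python
import math


def z_score(x, mu, sigma):
    """Standard score: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """P(Z < z) for the standard normal distribution, via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z = z_score(35, 38.72, 3.17)
print(round(z, 2), round(standard_normal_cdf(1.17), 4))
```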
Standard Normal Distribution Table 1 (Positive Side): P(Z < z)
[Figures: shaded regions under the standard normal curve showing how to read areas from the table, including the cases labeled "Area", "1 − Area", "0.50 − Area", "Area = 0.50", and "Area = 1", on either side of 0]
Area = P(X > 560)
[Figure: normal curve for X with 450, 510, and 570 marked on the axis; the shaded area lies to the right of 560]
Using Table 1
Using Table 2
[Figure: normal curve for X with 35.55, 38.72, and 41.89 marked on the axis; the shaded area lies to the left of 35]
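Using Table 1
$$\begin{aligned} P(X < 35) &= P\left(Z < \frac{35 - 38.72}{3.17}\right) \\ &= P(Z < -1.17) \\ &= 1 - P(Z \geq -1.17) \\ &= 1 - 0.8790 \\ &= 0.1210 \end{aligned}$$
[Figure: standard normal curve with the area to the left of −1.17 shaded]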
Using Table 2
$$P(X < 35) = P\left(Z < \frac{35 - 38.72}{3.17}\right) = P(Z < -1.17) = 0.1210$$
[Figure: standard normal curve with the area to the left of −1.17 shaded]
[Figure: normal curve for X with 35.55, 38.72, and 41.89 marked on the axis; the shaded area lies between 35 and 40]
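Using Table 1
$$\begin{aligned} P(35 \leq X \leq 40) &= P\left(\frac{35 - 38.72}{3.17} \leq Z \leq \frac{40 - 38.72}{3.17}\right) \\ &= P(-1.17 \leq Z \leq 0.40) \\ &= P(Z \leq 0.40) - [1 - P(Z \geq -1.17)] \\ &= 0.6554 - [1 - 0.8790] \\ &= 0.6554 - 0.1210 \\ &= 0.5344 \end{aligned}$$
[Figure: standard normal curve with the area between −1.17 and 0.40 shaded, Area = P(−1.17 ≤ Z ≤ 0.40)]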
Using Table 2
[Figure: standard normal curve with the area between −1.17 and 0.40 shaded]
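The two worked probabilities can be reproduced in Python with `math.erf` in place of Table 1 or Table 2. Rounding z to two decimals mimics reading the table; this is a sketch, and the helper name is illustrative.

```python
import math


def standard_normal_cdf(z):
    """P(Z < z) for the standard normal distribution, via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 38.72, 3.17
# Round z to two decimals, as is done when reading a z table.
z35 = round((35 - mu) / sigma, 2)     # -1.17
z40 = round((40 - mu) / sigma, 2)     # 0.4

p_below_35 = standard_normal_cdf(z35)
p_35_to_40 = standard_normal_cdf(z40) - standard_normal_cdf(z35)
print(f"P(X < 35) = {p_below_35:.4f}")
print(f"P(35 <= X <= 40) = {p_35_to_40:.4f}")
```

The printed values round to 0.1210 and 0.5344, matching the table-based results.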
ACTIVITIES/ASSESSMENTS:
A.
B.
ACTIVITIES/ASSESSMENTS:
Origin / Rating   Poor   Needs Improvement   Satisfactory   Very Good   Excellent   Total
ACTIVITIES/ASSESSMENTS:
37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31
Scores Frequency Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total
ACTIVITIES/ASSESSMENTS:
MODULE 4: INFERENTIAL STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:
α
Null Hypothesis:
Alternative Hypothesis:
α
α
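[Figure: rejection regions under a test-statistic curve, axis marked −2, 0, 2]
One-tailed, left-tailed test (Ha: μ1 < μ2): the rejection region is in the left tail.
One-tailed, right-tailed test (Ha: μ1 > μ2): the rejection region is in the right tail.
Two-tailed test (Ha: μ1 ≠ μ2): the rejection regions are in both tails.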
To determine whether the data follow a normal distribution, we can use either a graphical or a numerical method.
STEP 1:
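STEP 2:
$$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
STEP 3:
$$b = \sum_{i=1}^{m} a_i \left( x_{n+1-i} - x_i \right)$$
where $m = \frac{n}{2}$ if n is even and $m = \frac{n-1}{2}$ if n is odd, and the coefficients $a_i$ are read from the Shapiro-Wilk Table.
STEP 4:
$$W = \frac{b^2}{SS}$$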
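Steps 2 to 4 can be sketched in Python. The coefficients a_i must still be read from the Shapiro-Wilk Table for the given n, so the sketch takes them as an argument; it also assumes that Step 1 is arranging the data in ascending order. `shapiro_wilk_w` is a hypothetical helper, not a library function.

```python
def shapiro_wilk_w(data, a):
    """Return W = b^2 / SS for the given data.

    `a` holds the coefficients a_1..a_m for this sample size, read from a
    Shapiro-Wilk table (the table itself is not reproduced here).
    """
    xs = sorted(data)                  # assumed Step 1: arrange in ascending order
    n = len(xs)
    m = n // 2                         # n/2 if n is even, (n - 1)/2 if n is odd
    if len(a) != m:
        raise ValueError(f"expected {m} coefficients for n = {n}, got {len(a)}")
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)              # Step 2
    # Step 3: b = sum of a_i * (x_(n+1-i) - x_i); with 0-based index j = i - 1,
    # x_(n+1-i) is xs[n - 1 - j] and x_i is xs[j].
    b = sum(a[j] * (xs[n - 1 - j] - xs[j]) for j in range(m))
    return b ** 2 / ss                                   # Step 4


# Tiny check with n = 3, whose single tabled coefficient is a_1 = 0.7071.
w = shapiro_wilk_w([1, 2, 3], [0.7071])
print(round(w, 4))
```

The n = 3 check gives W very close to 1 for the evenly spaced values 1, 2, 3.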
STEP 5:
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2
H0: μ1 ≥ μ2
Ha: μ1 < μ2
α = 0.05
Reject H0 if p-value ≤ α.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
α = 0.05
Reject H0 if p-value ≤ α.
H0: μ1 = μ2 = . . . = μk
Ha: at least one population mean differs from the others
α = 0.10
Reject H0 if p-value ≤ α.
• Unit-free
• Ranges from −1 to 1
• The closer to -1, the stronger the negative
linear relationship.
• The closer to 1, the stronger the positive
linear relationship.
• The closer to 0, the weaker the linear
relationship.
[Figure: scatter plots of Y against X for r = −1, r = −0.6, r = 0, r = 0.6, and r = 1]
Test Statistic:
$$t = r \sqrt{\frac{df}{1 - r^2}}$$
where:
df = degrees of freedom
r = correlation coefficient of P
Note:
df = n − 2
α = 0.05
$$t = r \sqrt{\frac{df}{1 - r^2}}, \qquad df = n - 2$$
Reject H0 if p-value ≤ α.
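The test statistic can be sketched in Python for any paired data. The data below are hypothetical and the helper names are illustrative; the p-value still has to come from a t table or statistical software with df = n − 2, since the Python standard library has no t distribution.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """Test statistic t = r * sqrt(df / (1 - r^2)) with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df


# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t, df = correlation_t(r, len(x))
print(f"r = {r:.4f}, t = {t:.4f}, df = {df}")
```

For these hypothetical values r ≈ 0.7746, r² = 0.6 and t = √4.5 ≈ 2.1213 with df = 3.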
Chi-Square Distribution
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Example:
α = 0.05
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
Reject H0 if p-value ≤ α.
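The expected-frequency formula and the chi-square statistic can be combined into one sketch for a test of independence. The 2 × 2 table below is hypothetical, the helper name is illustrative, and df = (rows − 1)(columns − 1) is the usual degrees of freedom for this test (it is not stated in this module).

```python
def chi_square_independence(observed):
    """Chi-square statistic for a contingency table of observed counts.

    Expected count E = (row total)(column total) / grand total for each cell,
    and chi2 = sum of (O - E)^2 / E over all cells.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    expected = []
    for r_total, row in zip(row_totals, observed):
        expected_row = []
        for c_total, o in zip(col_totals, row):
            e = r_total * c_total / grand_total
            expected_row.append(e)
            chi2 += (o - e) ** 2 / e
        expected.append(expected_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Hypothetical 2 x 2 table of counts, for illustration only.
table = [[10, 20],
         [20, 10]]
chi2, df, expected = chi_square_independence(table)
print(f"chi2 = {chi2:.4f}, df = {df}, expected = {expected}")
```

Every expected count is 30 × 30 / 60 = 15, so χ² = 4 × 25/15 ≈ 6.67 with df = 1.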
ACTIVITIES/ASSESSMENTS:
Men (in $)   Women (in $)
107.48 125.98
143.61 45.53
90.19 56.35
125.53 80.62
70.7 46.37
83 44.34
129.63 75.21
154.22 68.48
93.8 85.82
126.11
ACTIVITIES/ASSESSMENTS:
Height (inches)   Head Circumference (inches)
27.75 17.5
24.5 17.1
25.5 17.1
26 17.3
25 16.9
27.75 17.6
26.5 17.3
27 17.5
26.75 17.3
26.75 17.5
27.5 17.5
ACTIVITIES/ASSESSMENTS: