
Statistical Analysis With Software Application

Uploaded by

Jim Root
Copyright
© © All Rights Reserved

MODULE 1: INTRODUCTION TO THE STATISTICAL CONCEPTS

Objectives:
After successful completion of this module, you should be able to:

• Define statistics.
• Enumerate the importance and limitations of statistics.
• Explain the process of statistics.
• Know the difference between descriptive and inferential statistics.
• Distinguish between qualitative and quantitative variables.
• Distinguish between discrete and continuous variables.
• Determine the level of measurement of a variable.

DEFINITION OF STATISTICS

Statistics plays a major role in many aspects of our lives. It is used in sports, for example, to help a general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about various policies. And statistics is used in medicine to help determine the effectiveness of new drugs. Used appropriately, statistics can enhance our understanding of the world around us. Used inappropriately, it can lend support to inaccurate beliefs. Understanding statistical methods will provide you with the ability to analyze and critique studies and the opportunity to become an informed consumer of information. Understanding statistical methods will also enable you to distinguish solid analysis from bogus "facts."

Many people say that statistics is numbers. After all, we are bombarded by numbers that supposedly represent how we feel and who we are. Certainly, statistics has a lot to do with numbers, but this definition is only partially correct. Statistics is also about where the numbers come from (that is, how they were obtained) and how closely the numbers reflect reality.

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

Let's break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
• Statistics is important because it enables people to make decisions based on empirical evidence.

• Statistics provides us with tools needed to convert massive data into pertinent information that can be used in decision making.

• Statistics can provide us information that we can use to make sensible decisions.

What information is referred to in the definition?

The information referred to in the definition is the data. According to the Merriam-Webster dictionary, data are "factual information used as a basis for reasoning, discussion, or calculation".

Data can be numerical, as in height, or nonnumerical, as in gender. In either case, data describe characteristics of an individual.

Field of Statistics

A. Mathematical Statistics - The study and development of statistical theory and methods in the abstract.

B. Applied Statistics - The application of statistical methods to solve real problems involving randomly generated data and the development of new statistical methodology motivated by real problems. Example branches of Applied Statistics: psychometrics, econometrics, and biostatistics.

Limitation of Statistics

1. Statistics is not suitable to the study of qualitative phenomena.

2. Statistics does not study individuals.

3. Statistical laws are not exact.

4. Statistical tables may be misused.

5. Statistics is only one of the methods of studying a problem.

Definitions:

• Universe is the set of all entities under study.

• A population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.

• An individual is a person or object that is a member of the population being studied.

• A statistic is a numerical summary of a sample.

• A sample is a subset of the population.

• Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.

• Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.

• A parameter is a numerical summary of a population.

Example: Consider the Scenario.

You are walking down the street and notice that a person walking in front of you drops PHP100. Nobody seems to notice the PHP100 except you. Since you could keep the money without anyone knowing, would you keep the money or return it to the owner?
Suppose you wanted to use this scenario as a gauge of the morality of students at your school by determining the percent of students who would return the money. How might you do this? You could attempt to present the scenario to every student at the school, but this would be difficult or impossible if the student body is large. A second possibility is to present the scenario to 50 students and use the results to make a statement about all the students at the school.

In the PHP100 study presented, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.

Suppose 39 of the 50 students stated that they would return the money to the owner. We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78%. This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population. So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.

If we extend the results of our sample to the population, we are performing inferential statistics. The generalization contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to account for the variability in our results. One goal of inferential statistics is to use statistics to estimate parameters.

PROCESS OF STATISTICS

1. Identify the research objective.

A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.

2. Collect the information needed to answer the questions.

Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.

Example:

A research objective is presented. For each research objective, identify the population and sample in the study.

1. The Philippine Mental Health Association contacts 1,028 teenagers who are 13 to 17 years of age and live in Antipolo City and asks whether or not they have been prescribed medications for any mental disorders, such as depression or anxiety.

Population: Teenagers 13 to 17 years of age who live in Antipolo City

Sample: 1,028 teenagers 13 to 17 years of age who live in Antipolo City
2. A farmer wanted to learn about the weight of his soybean crop. He randomly sampled 100 plants and weighed the soybeans on each plant.

Population: Entire soybean crop

Sample: The 100 selected soybean plants

3. Organize and summarize the information.

Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.

4. Draw conclusions from the information.

In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Take Note!

If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Example:

For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.

1. A badminton player wants to know his average score for the past 10 games. (Descriptive Statistics)

2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries. (Inferential Statistics)

3. Janine wants to determine the variability of her six exam scores in Algebra. (Descriptive Statistics)

4. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years. (Inferential Statistics)

5. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)

DISTINCTION BETWEEN QUALITATIVE AND QUANTITATIVE VARIABLES

Variables are the characteristics of the individuals within the population. For example, recently my mother and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My mom noted that the tomatoes had different weights even though they came from the same plant. She discovered that variables such as weight may vary.

If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability so that we can learn to grow plants that yield the best tomatoes.
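To make the idea of variability concrete, the short Python sketch below summarizes a handful of tomato weights. The weights are hypothetical values invented for illustration (they are not data from this module), and the function name `summarize_weights` is likewise an assumption.

```python
from statistics import mean, stdev

# Hypothetical tomato weights in grams (illustrative values, not real data).
weights_g = [112.0, 98.5, 105.2, 120.3, 101.0, 95.8]


def summarize_weights(values):
    """Return (mean, sample standard deviation, is_constant) for a list of numbers."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation; 0 only if all values are equal
    return center, spread, spread == 0


center, spread, is_constant = summarize_weights(weights_g)
print(f"mean = {center:.1f} g, standard deviation = {spread:.1f} g")
print("constant" if is_constant else "varies from tomato to tomato")

# If every tomato weighed the same, a single observation would describe them all.
_, constant_spread, constant_flag = summarize_weights([100.0, 100.0, 100.0])
```

A standard deviation of zero corresponds to the "constant" case described above, where knowing one weight tells us every weight.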
It is helpful to divide variables into different types, as different statistical methods are applicable to each. The main division is into qualitative (or categorical) and quantitative (or numerical) variables.

Variables can be classified into two groups:

1. Qualitative variables (Categorical) - A qualitative variable yields categorical responses. It is a word or a code that represents a class or category.

2. Quantitative variables (Numeric) - A quantitative variable takes on numerical values representing an amount or quantity.

Example:

Determine whether the following variables are qualitative or quantitative.

1. Hair color (Qualitative)

2. Temperature (Quantitative)

3. Stages of breast cancer (Qualitative)

4. Number of hamburgers sold (Quantitative)

5. Number of children (Quantitative)

6. Zip code (Qualitative)

7. Place of birth (Qualitative)

8. Degree of pain (Qualitative)

DISTINCTION BETWEEN DISCRETE AND CONTINUOUS

Quantitative variables may be further classified into:

1. A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. If you count to get the value of a quantitative variable, it is discrete.

2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.

Example:

Determine whether the following quantitative variables are discrete or continuous.

1. The number of heads obtained after flipping a coin five times. (Discrete)

2. The number of cars that arrive at a McDonald's drive-through between 12:00 P.M. and 1:00 P.M. (Discrete)

3. The distance a 2005 Toyota Prius can travel in city conditions with a full tank of gas. (Continuous)

4. Number of words correctly spelled. (Discrete)

5. Time of a runner to finish one lap. (Continuous)

LEVELS OF MEASUREMENT

[Figure: Levels of Measurement]
It is important to know which type of scale is represented by your data since different statistics are appropriate for different scales of measurement. A characteristic may be measured using nominal, ordinal, interval and ratio scales.

1. Nominal Level - They are sometimes called categorical scales or categorical data. Such a scale classifies persons or objects into two or more categories. Whatever the basis for classification, a person can only be in one category, and members of a given category have a common set of characteristics.

Example:

- Method of payment (cash, check, debit card, credit card)
- Type of school (public vs. private)
- Eye Color (Blue, Green, Brown)

2. Ordinal Level - This involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. An ordinal scale not only classifies subjects but also ranks them in terms of the degree to which they possess a characteristic of interest. In other words, an ordinal scale puts the subjects in order from highest to lowest, from most to least. Although ordinal scales indicate that some subjects are higher or lower than others, they do not indicate how much higher or how much better.

Example:

- Food Preferences
- Stage of Disease
- Socioeconomic Class (First, Middle, Lower)
- Severity of Pain

3. Interval Level - This is a measurement level that not only classifies and orders the measurements, but also specifies that the differences between values of the variable have meaning. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.

Example:

- Temperature on Fahrenheit/Celsius Thermometer
- Trait anxiety (e.g., high anxious vs. low anxious)
- IQ (e.g., high IQ vs. average IQ vs. low IQ)

4. Ratio Level - A ratio scale represents the highest, most precise, level of measurement. It has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

Example:

- Height and weight
- Time
- Time until death

Operations that make sense for variables of different scales.
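The way each level of measurement builds on the one below it can be sketched in Python as a small lookup table. The addition/subtraction and multiplication/division entries follow the descriptions of the interval and ratio levels in this section; the entries for the nominal and ordinal levels (counting, mode, ranking, median) are commonly cited summaries added here as an assumption, and the names `LEVELS`, `NEW_OPERATIONS`, and `allowed_operations` are hypothetical.

```python
# Levels of measurement in increasing order of precision.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level (a simplified summary).
NEW_OPERATIONS = {
    "nominal": ["count", "mode"],
    "ordinal": ["rank", "median"],
    "interval": ["add", "subtract"],
    "ratio": ["multiply", "divide"],
}


def allowed_operations(level):
    """Return every operation that makes sense at `level`, including lower levels."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    allowed = []
    for name in LEVELS[: LEVELS.index(level) + 1]:
        allowed.extend(NEW_OPERATIONS[name])
    return allowed


for level in LEVELS:
    print(f"{level:>8}: {', '.join(allowed_operations(level))}")
```

For example, `allowed_operations("interval")` includes subtraction but not division, which matches the point that a temperature difference in degrees Celsius is meaningful while a ratio of two Celsius temperatures is not.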
Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also known as numerical discrete and numerical continuous.

Example:

Categorize each of the following as nominal, ordinal, interval or ratio measurement.

1. Ranking of college athletic teams. (Ordinal)

2. Employee number. (Nominal)

3. Number of vehicles registered. (Ratio)

4. Brands of soft drinks. (Nominal)

5. Number of cars passing along C5 on a given day. (Ratio)

6. Zip code (Nominal)

7. Degree of pain (Ordinal)

ACTIVITIES/ASSESSMENTS:

Read each item carefully. Write the answer on the yellow paper. Answers Only.

I. A research objective is presented. For each, identify the (A) population and (B) sample in the study.

1. A polling organization contacts 2141 male university graduates who have a white-collar job and asks whether or not they had received a raise at work during the past 4 months.

A. ______________________________

B. ______________________________

2. Every year the PSA releases the Current Population Report based on a survey of 50,000 households. The goal of this report is to learn the demographic characteristics, such as income, of all households within the Philippines.

A. ______________________________

B. ______________________________

3. Researchers want to determine whether or not higher folate intake is associated with a lower risk of hypertension (high blood pressure) in women (27 to 44 years of age). To make this determination, they look at 7373 cases of hypertension in these women and find that those who consume at least 1000 micrograms per day of total folate had a decreased risk of hypertension compared with those who consume less than 200.

A. ______________________________

B. ______________________________

II. Indicate whether the following statements require the use of descriptive or inferential statistics.

______________1. A teacher wants to know the attitudes of all students towards abortion.

______________2. A market analyst of a sales firm draws a chart showing the sales figures of a given product for the period 2006-2007.

______________3. A forecaster predicts the results of an election using the number of votes cast in 15 out of 25 barangays.

______________4. Men are better in math than women.
______________5. Forty percent of the employees of an organization were recorded tardy for at least 15 working days.

______________6. There are very few gender-related occupations.

______________7. An accountant predicts the accuracy rate of a client's financial resources.

______________8. A quality control manager wishes to check production output.

______________9. Records indicated that 75% of the faculty in the graduate school are doctoral degree holders.

______________10. There is no relationship between educational qualification of parents and academic achievement of their children.

III. Identify the qualitative and quantitative variables and indicate the highest level of measurement required in each. If quantitative, classify whether discrete or continuous.

______________1. Occupation

______________2. Number of government officials

______________3. Favorite color

______________4. Temperature in degrees Celsius

______________5. Type of school

______________6. Volume of mineral water sold daily

______________7. Employee number

______________8. Civil status

______________9. Equity accounts

______________10. Brands of soft drinks

______________11. Socioeconomic status

______________12. Employment status

______________13. Number of missing teeth

______________14. Number of vehicles registered

______________15. Jersey number

______________16. Number of employees collecting retirement benefits from GSIS

______________17. Duration of a seizure

______________18. Cause of death

______________19. Dividends

______________20. Current assets list

______________21. Number of heart attacks

______________22. Accounts receivable

______________23. Clothing size

______________24. Blood type

______________25. Ethnic group

REFERENCES:

Statistics: Informed Decisions Using Data by Michael Sullivan, III. Fifth Edition

Sampling: Design and Analysis by Sharon L. Lohr. Second Edition

MODULE 2: DATA COLLECTION AND BASIC CONCEPTS IN SAMPLING DESIGN

Objectives:
After successful completion of this module, you should be able to:

• Determine the sources of data (primary and secondary data).
• Distinguish the different methods of data collection under primary and secondary data.
• Determine the appropriate sample size.
• Differentiate various sampling techniques.
• Know the sources of errors in sampling.

DATA COLLECTION

People commonly receive large quantities of information every day through conversations, television, computers, radio, newspapers, posters, notices and instructions. Because so much information is available, people need to be able to absorb, select and reject it. In everyday life, in business and industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it.

Analysis of data can lead to powerful results. Data can be used to offset anecdotal claims, such as the suggestion that cellular telephones cause brain cancer. Anecdotal means that the information being conveyed is based on casual observation, not scientific research. Because data are powerful, they can be dangerous when misused. The misuse of data usually occurs when data are incorrectly obtained or analyzed. For example, radio or television talk shows regularly ask poll questions for which respondents must call in or use the Internet to supply their vote. Most likely, the individuals who are going to call in are those who have a strong opinion about the topic. This group is not likely to be representative of people in general, so the results of the poll are not meaningful. Whenever we look at data, we should be mindful of where the data come from.

Even when data tell us that a relation exists, we need to investigate. For example, a study showed that breast-fed children have higher IQs than those who were not breast-fed. Does this study mean that a mother who breast-feeds her child will increase the child's IQ? Not necessarily. It may be that some factor other than breast-feeding contributes to the IQ of the children. In this case, it turns out that mothers who breast-feed generally have higher IQs than those who do not. Therefore, it may be genetics that leads to the higher IQ, not breast-feeding.
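The breast-feeding example describes what is usually called a lurking (confounding) variable. The Python simulation below is a minimal sketch of that idea under stated assumptions: every number in it is hypothetical, the function name `simulate_child` is invented for illustration, and breast-feeding is deliberately given no effect on the child's IQ. A gap between the two groups still appears, because the mother's IQ influences both who breast-feeds and the child's IQ.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable


def simulate_child(rng=random):
    """Simulate one mother-child pair; breast-feeding has NO effect on child IQ here."""
    mother_iq = rng.gauss(100, 15)
    # Assumption: higher-IQ mothers are more likely to breast-feed.
    p_breastfeed = 0.8 if mother_iq > 100 else 0.4
    breastfed = rng.random() < p_breastfeed
    # Assumption: child IQ depends only on the mother's IQ plus random noise.
    child_iq = 50 + 0.5 * mother_iq + rng.gauss(0, 10)
    return breastfed, child_iq


pairs = [simulate_child() for _ in range(20_000)]
breastfed_iq = [iq for fed, iq in pairs if fed]
not_breastfed_iq = [iq for fed, iq in pairs if not fed]

# Difference in average child IQ between the two groups.
gap = mean(breastfed_iq) - mean(not_breastfed_iq)
print(f"mean IQ gap (breast-fed minus not breast-fed): {gap:.1f} points")
```

The simulated gap comes entirely from the built-in link between mother IQ and breast-feeding, which is why an observed relation alone does not establish that one variable causes the other.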
Without proper planning for data collection, a number of problems can occur. If the data collection steps and processes are not properly planned, the research project can ultimately end up with a data set that does not serve the purpose for which it was intended. For example, if more than one person is involved in the data collection, but data collectors do not follow consistent data collection practices, they can end up with data with different units, collection processes, and variable names.

Consequences from Improperly Collected Data

• Inability to answer research questions accurately.

• Inability to repeat and validate the study.

• Distorted findings resulting in wasted resources.

• Misleading other researchers to pursue fruitless avenues of investigation.

• Compromising decisions for public policy.

• Causing harm to human participants and animal subjects.

Steps in Data Gathering

1. Set the objectives for collecting data.

2. Determine the data needed based on the set objectives.

3. Determine the method to be used in data gathering and define the comprehensive data collection points.

4. Design data gathering forms to be used.

5. Collect data.

Choosing of Method of Data Collection

Decision-makers need information that is relevant, timely, accurate and usable. The cost of obtaining, processing and analyzing these data is high. The challenge is to find ways which lead to information that is cost-effective, relevant, timely and important for immediate use. Some methods pay attention to timeliness and reduction in cost. Others pay attention to accuracy and the strength of the method in using scientific approaches.

The statistical data may be classified under two categories, depending upon the sources: Primary Data and Secondary Data.

SOURCES OF DATA

Whether conducting research in the social sciences, humanities, arts, or natural sciences, the ability to distinguish between primary and secondary sources is essential.

Primary Sources - Provide a first-hand account of an event or time period and are considered to be authoritative. They represent original thinking, reports on discoveries or events, or they can share new information. Often these sources are created at the time the events occurred but they can also include sources that are created later. They are usually the first formal appearance of original research.
Primary Data - are data documented by the primary source. The data collectors documented the data themselves.

The first-hand information obtained by the investigator is more reliable and accurate since the investigator can extract the correct information by removing doubts, if any, in the minds of the respondents regarding certain questions. High response rates might be obtained since the answers to various questions are obtained on the spot. It permits explanation of questions concerning difficult subject matter.

Secondary Sources - offer an analysis, interpretation or a restatement of primary sources and are considered to be persuasive. They often involve generalisation, synthesis, interpretation, commentary or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to describe or explain primary sources.

Secondary Data - are data documented by a secondary source. The data collectors had the data documented by other sources.

Secondary data are primary data for the agency that collected them, and become secondary for someone else who uses these data for his own purposes.

Secondary data are less expensive to collect both in money and time. These data can also be better utilized and sometimes the quality of such data may be better because these might have been collected by persons who were specially trained for that purpose.

On the other hand, such data must be used with great care, because such data may also be full of errors. First, the purpose of the collection of the data by the primary agency may have been different from the purpose of the user of these secondary data. Secondly, there may have been bias introduced, the size of the sample may have been inadequate, or there may have been arithmetic or definition errors. Hence, it is necessary to critically investigate the validity of the secondary data.

The primary data can be collected by the following five methods:

1. Direct personal interviews - The researcher has direct contact with the interviewee. The researcher gathers information by asking questions to the interviewee.

2. Indirect/Questionnaire Method - This method of data collection involves sourcing and accessing existing data that were originally collected for the purpose of the study.

Designing good "questioning tools" forms an important and time-consuming phase in the development of most research proposals. Once the decision has been made to use these techniques, the following questions should be considered before designing our tools:

• What exactly do we want to know, according to the objectives and variables we identified earlier? Is questioning the right technique to obtain all answers, or do we need additional techniques, such as observations or analysis of records?

• Of whom will we ask questions and what techniques will we use? Do we understand the topic sufficiently to design a questionnaire, or do we need some loosely structured interviews with key informants or a focus group discussion first to orient ourselves?
• Are our informants mainly literate or illiterate? If illiterate, the use of self-administered questionnaires is not an option.

• How large is the sample that will be interviewed? Studies with many respondents often use shorter, highly structured questionnaires, whereas smaller studies allow more flexibility and may use questionnaires with a number of open-ended questions.

Key Design Principles of a Good Questionnaire

1. Keep the questionnaire as short as possible.

2. Decide on the type of questionnaire (Open Ended or Closed Ended).

3. Write the questions properly.

4. Order the questions appropriately.

5. Avoid questions that prompt or motivate the respondent to say what you would like to hear.

6. Write an introductory letter or an introduction.

7. Write special instructions for interviewers or respondents.

8. Translate the questions if necessary.

9. Always test your questions before taking the survey. (Pre-test)

An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent's own words.

Example:

- Can you describe exactly what the traditional birth attendant did when your labor started?

- What do you think are the reasons for a high drop-out rate of village health committee members?

A closed-ended question is a type of question that includes a list of response categories from which the respondent will select his answer. It is useful if the range of possible responses is known. This type of question is usually appropriate for collecting objective data.

Example:

Did you eat any of the following foods yesterday?

• Fish or meat      Yes   No
• Eggs              Yes   No
• Milk or cheese    Yes   No

Take Note!

Question wording and question order have a large effect on the responses obtained.

Example:

Two surveys were taken in late 1993/early 1994 about Elvis Presley.

One survey asked: "In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don't you think so?"
The second survey asked: "A recent television show examined various theories about Elvis Presley's death. Do you think it is possible that Elvis is alive or not?"

8% of the respondents to the first question said it is possible that Elvis is still alive and 16% of respondents to the second question said it is possible that Elvis is still alive.

3. A focus group is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.

4. Experiment is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.

Bear in mind that the experimental method has several limitations that you should be aware of.

- Ethical, Moral, and Legal Concerns
- Unrealistic Controlled Environments
- Inability to Control for All Variables

5. Observation is a technique that involves systematically selecting, watching and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of gaining specified information. It includes all methods from simple visual observations to the use of high-level machines and measurements, sophisticated equipment or facilities such as:

- Radiographic
- Biochemical
- X-ray machines
- Microscope
- Clinical examinations
- Microbiological examinations

Observation gives relatively more accurate data on behavior and activities, but it is subject to the investigator's or observer's own biases, prejudices, and desires, and it needs more resources and skilled manpower when high-level machines are used.

The secondary data can be collected by the following five methods:

1. Published reports in newspapers and periodicals.

2. Financial data reported in annual reports.

3. Records maintained by the institution.

4. Internal reports of the government departments.

5. Information from official publications.

Take Note!

• Always investigate the validity and reliability of the data by examining the collection method employed by your source.

• Do not use inappropriate data for your research.

• The choice of methods of data collection is largely based on the accuracy of the information they yield.

SAMPLE SIZE

"How many participants should be chosen for a survey?"

One of the most frequent problems in statistical analysis is the determination of the appropriate sample size. One may ask why sample size is so important. The answer to this is that an appropriate sample size is required for validity. If the sample size is too small, it will not yield valid results, and the results from the small sample will be questionable. An appropriate sample size can produce accurate results. A sample size that is too large will result in wasting money and time, because an adequate sample will normally give an accurate result.

The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note!

- Representativeness, not size, is the more important consideration.

- Use no less than 30 subjects if possible.

- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).

[Figure: Representative Sample]
Choosing the sample size depends on non-statistical considerations and statistical considerations.

• Non-statistical considerations – These may include availability of resources, manpower, budget, ethics, and sampling frame.

• Statistical considerations – These include the desired precision of the estimate.

Three criteria need to be specified to determine the appropriate sample size:

1. Level of Precision

Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.

2. Confidence Level

It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range. For example, a confidence level of 90% means that results of an action will probably meet expectations 90% of the time.

To find the right z-score to use, refer to the table:

Desired Confidence Level    Z-Score
80%                         1.28
85%                         1.44
90%                         1.65
95%                         1.96
99%                         2.58

3. Degree of Variability

Depending upon the target population and attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to get an optimum level of precision.

Methods in Determining the Sample Size

• Estimating the Mean or Average

The sample size required to estimate the population mean µ at a given level of confidence with a specified margin of error e is given by

$n \ge \left( \frac{Z\sigma}{e} \right)^2$

where:

Z is the z-score corresponding to the level of confidence.

σ is the population standard deviation.

e is the level of precision.

Take Note:

When σ is unknown, it is common practice to conduct a preliminary survey to determine s and use it as an estimate of σ, or to use results from previous studies to obtain an estimate of σ. When using this approach, the size of the sample should be at least 30. The formula for the sample standard deviation s is
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Example:

A soft drink machine is regulated so that the amount of drink dispensed is approximately normally distributed with a standard deviation equal to 0.5 ounce. Determine the sample size needed if we wish to be 95% confident that our sample mean will be within 0.03 ounce of the true mean.

Solution: The z-score for confidence level 95% in the z-table is 1.96.

$n \ge \left( \frac{1.96(0.5)}{0.03} \right)^2 = 1067.11$

We need a sample of 1,068 for our study.

• Estimating Proportion (Infinite Population)

The sample size required to obtain a confidence interval for p with specified margin of error e is given by

$n \ge \left( \frac{Z}{e} \right)^2 p(1 - p)$

Where:

Z is the z-score corresponding to the level of confidence.

e is the level of precision.

p is the population proportion.

There is a dilemma in this formula. It depends on

$p = \frac{x}{N}$

which we know only after we have taken the sample.

There are two ways to solve this dilemma:

1. We could determine a preliminary value for p based on a pilot study or an earlier study.

Example:

If last month 37% of all voters thought that state taxes are too high, then it is likely that the proportion with that opinion this month will not be dramatically different, and we would use the value 0.37 for p in the formula.

2. Simply replace p in the formula by 0.5.

When p = 0.5, p(1 − p) attains its maximum value of 0.25. This is called the most conservative estimate, since it gives the largest possible estimate of n.

The conservative formula, obtained by setting p(1 − p) = 0.25, is

$n \ge \frac{1}{4} \left( \frac{Z}{e} \right)^2 \approx 385$

Where:

Confidence level is 95%.

The level of precision is 0.05.

Example:

Suppose we are doing a study on the inhabitants of a large town, and want to find out how many households serve breakfast in the mornings. We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability. So p = 0.5. We want 99% confidence and at least 1% precision.
Solution: The z-score for confidence level 99% in the z-table is 2.58.

$n \ge \left( \frac{2.58}{0.01} \right)^2 0.5(1 - 0.5) = 16,641$

We need a sample of 16,641 for our study.

• Slovin’s Formula

Slovin’s formula is used to calculate the sample size n given the population size and error. It is computed as

$n \ge \frac{N}{1 + Ne^2}$

Where:

N is the total population.

e is the level of precision.

Example:

A researcher plans to conduct a survey about the food preferences of BS Stat students. If the population of students is 1000, find the sample size if the error is 5%.

Solution:

$n \ge \frac{1000}{1 + 1000(0.05)^2} = 285.71$

The researcher needs to survey 286 BS Stat students.

• Finite Population Correction

If the population is small, then the sample size can be reduced slightly:

$n \ge \frac{n_0}{1 + \frac{n_0 - 1}{N}}$

Where:

$n_0$ is Cochran’s sample size recommendation.

N is the population size.

These are links to online sample size calculators:

https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/

https://www.calculator.net/sample-size-calculator.html

BASIC SAMPLING DESIGN

The goal in sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.

Reason for Sampling

- It is important that the individuals included in a sample represent a cross section of individuals in the population.

- If a sample is not representative, it is biased. You cannot generalize to the population from your statistical data.

Some definitions are needed to make the notion of a good sample more precise.
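Before moving on to the definitions, the sample-size formulas of the preceding section (estimating a mean, estimating a proportion, Slovin's formula, and the finite population correction) can be checked with a short Python sketch. The function names are illustrative and not part of the module, and rounding up to the next whole subject is assumed because the worked examples do so.

```python
import math


def _round_up(x):
    # Remove floating-point noise first, then round up to a whole subject.
    return math.ceil(round(x, 9))


def sample_size_mean(z, sigma, e):
    """n >= (z * sigma / e)^2, for estimating a mean."""
    return _round_up((z * sigma / e) ** 2)


def sample_size_proportion(z, e, p=0.5):
    """n >= (z / e)^2 * p * (1 - p); p = 0.5 is the conservative choice."""
    return _round_up((z / e) ** 2 * p * (1 - p))


def slovin(N, e):
    """n >= N / (1 + N * e^2)."""
    return _round_up(N / (1 + N * e ** 2))


def finite_population_correction(n0, N):
    """n >= n0 / (1 + (n0 - 1) / N)."""
    return _round_up(n0 / (1 + (n0 - 1) / N))


print(sample_size_mean(1.96, 0.5, 0.03))    # soft drink example: 1068
print(sample_size_proportion(1.96, 0.05))   # conservative formula: 385
print(sample_size_proportion(2.58, 0.01))   # breakfast example: 16641
print(slovin(1000, 0.05))                   # BS Stat example: 286
```

The small rounding step before `math.ceil` is a design choice of this sketch: it keeps a result such as 16641.000000000004 from being pushed up to 16,642 by floating-point error.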
Definitions:

• Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.

• Target population - The complete collection of observations we want to study.

• Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.

• Sample - A subset of a population.

• Sampling unit - A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.

• Sampling frame - A list, map, or other specification of sampling units in the population from which a sample may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street addresses.

• Sampling technique/Sampling Strategies - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.

• Sampling Bias - This involves problems in your sampling, which reveal that your sample is not representative of your population.

The following examples indicate some ways in which selection bias can occur:

- Deliberately or purposively selecting a “representative” sample.

- Misspecifying the target population.

- Failing to include all of the target population in the sampling frame, called undercoverage.

- Including population units in the sampling frame that are not in the target population, called overcoverage.

- Having multiplicity of listings in the sampling frame.

- Substituting a convenient member of a population for a designated member who is not readily available.

- Failing to obtain responses from all of the chosen sample. (Nonresponse)

- Allowing the sample to consist entirely of volunteers.

Advantage of Sampling Over Complete Enumeration

- Less Labor
- Reduced Cost
- Greater Speed
- Greater Scope
- Greater Efficiency and Accuracy
- Convenience
- Ethical Considerations

Two Types of Samples

1. Probability Sample

- Samples are obtained using some objective chance mechanism, thus involving randomization.
- They require the use of a complete listing of the elements of the universe called the sampling frame.

- The probabilities of selection are known.

- They are generally referred to as random samples.

- They allow drawing of valid generalizations about the universe/population.

2. Non-probability Sample

- Samples are obtained haphazardly, selected purposively, or are taken as volunteers.

- The probabilities of selection are unknown.

- They should not be used for statistical inference.

Sampling Procedure

- Identify the population.
- Determine if the population is accessible.
- Select a sampling method.
- Choose a sample that is representative of the population.
- Ask the question: can I generalize to the general population from the accessible population?

Sampling techniques can be grouped according to how selections of items are made, into probability sampling and non-probability sampling.

Basic Sampling Technique of Probability Sampling

• Simple Random Sampling

- Most basic method of drawing a probability sample.

- Assigns equal probabilities of selection to each possible sample.

- Results in a simple random sample.

Advantage: It is very simple and easy to use.

Disadvantage: The sample chosen may be distributed over a wide geographic area.

When to use: This is preferable if the population is not widely spread geographically. It is also more appropriate if the population is more or less homogeneous with respect to the characteristics under study.

[Figure: Simple Random Sampling]

• Systematic Random Sampling

- It is obtained by selecting every kth individual from the population.

- The first individual selected corresponds to a random number between 1 and k.
Obtaining a Systematic Random Sample

1. Decide on a method of assigning a unique serial number, from 1 to N, to each one of the elements in the population.

2. Compute the sampling interval

$k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}}$

3. Select a number from 1 to k using a randomization mechanism. The element in the population assigned to this number is the first element of the sample. If this number is r, the other elements of the sample are those assigned to the numbers r + k, r + 2k, r + 3k, and so on until you get a sample of size n.

Example:

We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked from the sampling frame.

Solution:

$k = \frac{500}{50} = 10$

We start the sample at a random number i and take every kth unit after it. Suppose the random number i is 6; then we select 6, 16, 26, 36, 46, ...

Advantage: Drawing of the sample is easy. It is easy to administer in the field, and the sample is spread evenly over the population.

Disadvantage: May give poor precision when unsuspected periodicity is present in the population.

When to use: This is advisable to use if the ordering of the population is essentially random and when stratification with numerous data is used.

[Figure: Systematic Random Sampling]

• Stratified Random Sampling

- It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.

- The individuals within each stratum should be homogeneous (or similar) in some way.

Example:

A sample of 50 students is to be drawn from a population consisting of 500 students belonging to two institutions A and B. The number of students in institution A is 200 and in institution B is 300. How will you draw the sample using proportional allocation?

Solution:

There are two strata in this case.

Given:

N1 = 200    N2 = 300    N = 500    n = 50

$n_1 = \left( \frac{n}{N} \right) N_1 = \left( \frac{50}{500} \right) 200 = 20$

$n_2 = \left( \frac{n}{N} \right) N_2 = \left( \frac{50}{500} \right) 300 = 30$

The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be selected by simple random sampling.
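The systematic selection (N = 500, n = 50, k = 10) and the proportional allocation worked out above can be sketched in Python as follows. The function names and the `seed` parameter are illustrative assumptions, and the serial numbers 1 to N stand in for the sampling frame.

```python
import random


def systematic_sample(N, n, seed=None):
    """Return the serial numbers picked by systematic random sampling."""
    k = N // n                      # sampling interval k = N / n
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random start between 1 and k, inclusive
    return [start + j * k for j in range(n)]


def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n across strata as n_h = (n / N) * N_h."""
    N = sum(strata_sizes.values())
    return {name: round(n * size / N) for name, size in strata_sizes.items()}


picked = systematic_sample(500, 50, seed=1)
print(picked[:5])                   # first five serial numbers, 10 apart

print(proportional_allocation({"A": 200, "B": 300}, 50))  # {'A': 20, 'B': 30}
```

When the allocations are not whole numbers, simple rounding may make the stratum sizes add up to slightly more or less than n, so the totals should be checked and adjusted by hand in that case.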

Advantage: Stratification of respondents is advantageous in terms of precision of the estimates of the characteristics of the population. Sampling designs may vary by stratum to adjust for the differences in the conditions across strata. It is easy to use as a random sampling design.

Disadvantage: Values of the stratification variable may not be easily available for all units in the population, especially if the characteristic of interest is homogeneous. It is possible that one or two strata end up with no representatives. Also, transportation costs can be high if the population covers a wide geographic area.

When to use: This is appropriate if the characteristics under consideration are concentrated in small, scattered segments of the population. It is preferred if precise estimates are desired for stratified parts of the population and if sampling problems differ in the various strata of the population.

[Figure: Stratified Random Sampling]

• Cluster Sampling

- You take the sample from naturally occurring groups in your population.

- The clusters are constructed such that the sampling units are heterogeneous within the cluster and homogeneous among the clusters.
Obtaining a Cluster Sample

1. Divide the population into non-overlapping clusters.

2. Number the clusters in the population from 1 to N.

3. Select n distinct numbers from 1 to N using a randomization mechanism. The selected clusters are the clusters associated with the selected numbers.

4. The sample will consist of all the elements in the selected clusters.

Example:

A researcher wants to survey the academic performance of high school students in MIMAROPA.

1. He/She can divide the entire population into different clusters.

2. Then the researcher selects a number of clusters, depending on the research, through simple or systematic random sampling.

3. Then, from the selected clusters, the researcher can either include all the high school students as subjects or select a number of subjects from each cluster through simple or systematic random sampling.

Advantage: There is no need to come up with a list of units in the population; all that is needed is a list of the clusters. It is also less costly since the elements are physically closer together.

Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households distantly apart.

When to use: This is preferable if the population can be grouped into clusters where individual population elements are known to be different with respect to the characteristics under study.

[Figure: Cluster Sampling]

• Multi-Stage Sampling

- Selection of the sample is done in two or more steps or stages, with sampling units varying in each stage.

- The population is first divided into a number of first-stage sampling units from which a sample is drawn. Smaller units, called the secondary sampling units, comprising the selected first-stage units then serve as the sampling units for the next stage. If needed, additional stages may be added until the units of observation for the survey are clearly identified. The units comprising the samples selected from the previous stage constitute the frame for the next stage.

Obtaining a Multi-Stage Sample

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.

2. Select a sampling technique for each stage.

3. Systematically apply the sampling technique to each stage until the unit of analysis has been selected.

Example:

Suppose we wish to study the expenditure patterns of households in NCR. We can select a sample of households for this study using simple three-stage sampling.

- First, divide NCR into smaller cities/municipalities, and collect a random sample of these cities/municipalities.

- Second, a random sample of smaller areas such as barangays is taken from within each of the cities/municipalities chosen in the first stage.

- Third, a random sample of even smaller units such as households is taken from within each of the areas chosen in the second stage.

Advantage: It is easier to generate adequate sampling frames. Transportation costs are greatly reduced since there is some form of clustering among the ultimate or final samples; i.e., they are in the same lower-stage units.

Disadvantage: Its complexity in theory may be difficult to apply in the field. Estimation procedures may be difficult for non-statisticians to follow.

When to use: If no population list is available and if the population covers a wide area.

Take Note!

Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.

[Figure: Multi-Stage Sampling]

Basic Sampling Technique of Non-Probability Sampling

• Accidental Sampling - There is no system of selection; the sample consists only of those whom the researcher or interviewer meets by chance.

• Quota Sampling - A specified number of persons of certain types is included in the sample. The researcher is aware of categories within the population and draws samples from each category. The size of each categorical sample is proportional to the proportion of the population that belongs in that category.

• Convenience Sampling - It is a process of picking out people in the most convenient and fastest way to get reactions immediately. This method can be done by telephone interview to get the immediate reactions of a certain group for a certain issue.

• Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. It is used to determine the target population of those who will be taken for the study.
• Judgement Sampling - Selects the sample in accordance with an expert’s judgment.

Cases wherein Non-Probability Sampling is Useful

- Only few are willing to be interviewed
- Extreme difficulties in locating or identifying subjects
- Probability sampling is more expensive to implement
- Cannot enumerate the population elements

Sources of Errors in Sampling

1. Non-sampling Error

- Errors that result from the survey process.

- Any errors that cannot be attributed to the sample-to-sample variability.

Sources of Non-Sampling Error

1. Non-responses
2. Interviewer Error
3. Misrepresented Answers
4. Data entry errors
5. Questionnaire Design
6. Wording of Questions
7. Selection Bias

2. Sampling Error

- Error that results from taking one sample instead of examining the whole population.

- Error that results from using sampling to estimate information regarding a population.

ACTIVITIES/ASSESSMENTS:

I. Determine if the source would be a primary or a secondary source.

______________1. Government Records

______________2. Dictionary

______________3. Artifact

______________4. A TV show explaining what happened in the Philippines.

______________5. Autobiography about Rodrigo Duterte.

______________6. Enrile diary describing what he thought about World War II.

______________7. Audio and video recordings

______________8. Speeches

______________9. Newspaper

______________10. Review Articles

II. Determine the sample size of the following problems. Show your solution.

1. A dermatologist wishes to estimate the proportion of young adults who apply sunscreen regularly before going out in the sun in the summer. Find the minimum sample size required to estimate the proportion with precision of 3% and 90% confidence.

2. The administration at a college wishes to estimate the proportion of all its entering freshmen who graduate within four years, with 95% confidence. Estimate the minimum size sample required. Assume that the population standard deviation is σ = 1.3 and the precision level is 0.05.

3. A government agency wishes to estimate the proportion of drivers aged 16–24 who have been involved in a traffic accident in the last year. It wishes to make the estimate to within 1% error and at 90% confidence. Find the minimum sample size required, using the information that several years ago the proportion was 0.12.

4. An internet service provider wishes to estimate, to within one percentage error, the current proportion of all email that is spam, with 85% confidence. Last year the proportion that was spam was 71%. Estimate the minimum size sample required if the total email that is spam is 10,000.

III. Determine the type of sampling. (ex. Simple Random Sampling, Purposive Sampling)

______________1. To determine customer opinion of its boarding policy, Southwest Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flights.

______________2. A member of Congress wishes to determine her constituency’s opinion regarding estate taxes. She divides her constituency into three income classes: low-income households, middle-income households, and upper-income households. She then takes a simple random sample of households from each income class.

______________3. The presider of a guest-lecture series at a university stands outside the auditorium before a lecture begins and hands every fifth person who arrives, beginning with the third, a speaker evaluation survey to be completed and returned at the end of the program.

______________4. 24 Hour Fitness wants to administer a satisfaction survey to its current members. Using its membership roster, the club randomly selects 40 club members and asks them about their level of satisfaction with the club.

______________5. A radio station asks its listeners to call in their opinion regarding the use of U.S. forces in peacekeeping missions.

______________6. A tax auditor selects every 1000th income tax return that is received.

______________7. For a survey, a sample of municipalities was selected from every province in the country and included all child laborers in the selected municipalities.

______________8. To determine his DSL Internet connection speed, Shawn divides up the day into four parts: morning, midday, evening, and late night. He then measures his Internet connection speed at 5 randomly selected times during each part of the day.

______________9. A college official divides the student population into five classes: freshman, sophomore, junior, senior, and graduate student. The official takes a simple random sample from each class and asks the members’ opinions regarding student services.

______________10. In the game of lotto, 6 balls are selected from a container with 42 balls.

IV. Using proportional allocation, determine the sample size needed for every school. The total population of students is 10,679, and the minimum sample is 2,450.
School                               Population per School    Sample
Antipolo National High School        3,360
Bagong Nayon National High School    2,540
Dela Paz National High School        2,122
Sta. Cruz National High School       1,290
Tubigan National High School         1,367
Total                                10,679

MODULE 3: DESCRIPTIVE STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
Number
Age (in thousands)
25 - 34 14,482
35 - 44 14,156
45 - 54 13,801
55 - 64 12,123
65 - 74 7,010

Scores Frequency
10 - 19 25
20 - 29 36
30 - 39 40
40 and over 12

$cw = \frac{x_{max} - x_{min}}{nc}$
Scores Frequency
1 - 10 5
11 - 20 9
21 - 30 10
31 - 40 12
41 - 50 24
Total 60

Ungrouped data without a frequency distribution

Ungrouped data with a frequency distribution

No. of Television Sets    Frequency
0                         7
1                         15
2                         12
3                         4
4                         5
5                         2
Total                     45
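Sample Mean

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

$\bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n}$

Population Mean

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

$\mu = \frac{\sum_{i=1}^{r} f x_i}{N}$

Median (grouped data)

$\tilde{x} = LB + \left( \frac{\frac{n}{2} - {<}cf}{f} \right) i$

Mode (grouped data)

$\hat{x} = LB + \left( \frac{d_1}{d_1 + d_2} \right) i$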
Data Set I 108 112 116 120 124
Data Set II 108 112 116 120 205
Class Interval Frequency

55 - 59 55
50 - 54 23
45 - 49 37
40 - 44 37
35 - 39 48
30 - 34 42
25 - 29 27

Class Interval    Frequency (f)    x     fx
55 - 59           3                57    171
50 - 54           6                52    312
45 - 49           7                47    329
40 - 44           9                42    378
35 - 39           6                37    222
30 - 34           4                32    128
25 - 29           5                27    135
Total             n = 40                 $\sum_{i=1}^{7} f x_i = 1,675$

Class mark:

$x = \frac{LC + UP}{2}$

$x = \frac{55 + 59}{2} = 57$

$x = \frac{50 + 54}{2} = 52$

Mean:

$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1,675}{40} = 41.88$
Class Interval    f    LB      <cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5
Total             n = 40

Lower boundaries: 55 − 0.5 = 54.5, 50 − 0.5 = 49.5, 45 − 0.5 = 44.5, and so on.

Cumulative frequencies, from the lowest class: 5, 5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40

$\frac{n}{2} = \frac{40}{2} = 20$

$\tilde{x} = LB + \left( \frac{\frac{n}{2} - {<}cf}{f} \right) i$

$\tilde{x} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$
Class Interval    f    LB      <cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5

$d_1 = 9 - 6 = 3$

$d_2 = 9 - 7 = 2$

$\hat{x} = LB + \left( \frac{d_1}{d_1 + d_2} \right) i$

$\hat{x} = 39.5 + \left( \frac{3}{3 + 2} \right) 5 = 42.5$
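The grouped-data mean, median, and mode computed above (41.88, 42.28, and 42.5) can be reproduced with the following Python sketch. It assumes classes are given in ascending order as (lower limit, upper limit, frequency), that class limits are whole numbers so boundaries lie 0.5 below the lower limit, and that there is a single modal class; the function names are illustrative.

```python
# Frequency distribution from the worked example, lowest class first.
CLASSES = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
           (45, 49, 7), (50, 54, 6), (55, 59, 3)]


def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    # Class mark x = (lower limit + upper limit) / 2
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def grouped_median(classes):
    n = sum(f for _, _, f in classes)
    cum = 0                                  # <cf: cumulative frequency below the class
    for lo, hi, f in classes:
        if cum + f >= n / 2:                 # first class whose cf reaches n/2
            i = hi - lo + 1                  # class size
            return (lo - 0.5) + (n / 2 - cum) / f * i
        cum += f


def grouped_mode(classes):
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))              # modal class
    lo, hi, f = classes[m]
    d1 = f - (freqs[m - 1] if m > 0 else 0)                # vs. class below
    d2 = f - (freqs[m + 1] if m < len(freqs) - 1 else 0)   # vs. class above
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)


print(round(grouped_mean(CLASSES), 2))
print(round(grouped_median(CLASSES), 2))
print(round(grouped_mode(CLASSES), 2))
```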
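$Q_k = LB + \left( \frac{\frac{nk}{4} - {<}cf}{f} \right) i$

$Q_{class} = \frac{nk}{4} + 0.5$

$D_k = LB + \left( \frac{\frac{nk}{10} - {<}cf}{f} \right) i$

$D_{class} = \frac{nk}{10} + 0.5$

$P_k = LB + \left( \frac{\frac{nk}{100} - {<}cf}{f} \right) i$

$P_{class} = \frac{nk}{100} + 0.5$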
Month Hour Lost (x)
January 55
February 23
March 37
April 37
May 48
June 42
July 27
August 20
September 30
October 32
November 24
December 40

Sorted data with ranks:

Value    20   23   24   27   30   32   37   37   40   42   48   55
Rank      1    2    3    4    5    6    7    8    9   10   11   12

Third quartile:

$Q_{class} = \frac{(12)(3)}{4} + 0.5 = 9.5$

$Q_3 = 40 + 0.5(42 - 40) = 41$

Fourth decile:

$D_{class} = \frac{(12)(4)}{10} + 0.5 = 5.3$

$D_4 = 30 + 0.3(32 - 30) = 30.6$

55th percentile:

$P_{class} = \frac{(12)(55)}{100} + 0.5 = 7.1$

$P_{55} = 37 + 0.1(37 - 37) = 37$
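The positional rule used above for ungrouped data (position = nk/parts + 0.5, followed by linear interpolation between neighboring ranked values) can be sketched in Python. The function name is illustrative, and clamping the position to the range 1 to n is an assumption of this sketch for extreme values of k.

```python
import math

HOURS_LOST = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]


def quantile_by_position(data, k, parts):
    """k-th quantile when the data are split into `parts` equal parts
    (4 = quartiles, 10 = deciles, 100 = percentiles)."""
    values = sorted(data)
    n = len(values)
    pos = n * k / parts + 0.5          # 1-based position in the ranked data
    pos = min(max(pos, 1), n)          # assumption: clamp to valid ranks
    lower = math.floor(pos)
    if lower >= n:
        return values[-1]
    frac = pos - lower                 # how far to move toward the next value
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])


print(quantile_by_position(HOURS_LOST, 3, 4))     # Q3
print(quantile_by_position(HOURS_LOST, 4, 10))    # D4
print(quantile_by_position(HOURS_LOST, 55, 100))  # P55
```

Statistical packages use several different conventions for quantile positions, so their results may differ slightly from the rule shown here.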

Class Interval Frequency


55 - 59 55
50 - 54 23
45 - 49 37
40 - 44 37
35 - 39 48
30 - 34 42
25 - 29 27

Class Interval    f    LB      <cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5
Total             n = 40

Lower boundaries: 55 − 0.5 = 54.5, 50 − 0.5 = 49.5, 45 − 0.5 = 44.5, and so on.

Cumulative frequencies, from the lowest class: 5, 5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40

First quartile:

$\frac{nk}{4} = \frac{(40)(1)}{4} = 10$

$Q_k = LB + \left( \frac{\frac{nk}{4} - {<}cf}{f} \right) i$

$Q_1 = 34.5 + \frac{(10 - 9)5}{6} = 35.33$

Seventh decile:

$\frac{nk}{10} = \frac{(40)(7)}{10} = 28$

$D_k = LB + \left( \frac{\frac{nk}{10} - {<}cf}{f} \right) i$

$D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36$

Tenth percentile:

$\frac{nk}{100} = \frac{(40)(10)}{100} = 4$

$P_k = LB + \left( \frac{\frac{nk}{100} - {<}cf}{f} \right) i$

$P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5$

Class Interval Frequency

18 - 24 28
25 - 31 54
32 - 38 38
39 - 45 20
46 - 52 17
53 - 59 3

Class Interval    f     LB      <cf
18 - 24           28    17.5    28
25 - 31           54    24.5    82
32 - 38           38    31.5    120
39 - 45           20    38.5    140
46 - 52           17    45.5    157
53 - 59           3     52.5    160
Total             n = 160

Lower boundaries: 18 − 0.5 = 17.5, 25 − 0.5 = 24.5, 32 − 0.5 = 31.5, and so on.

Cumulative frequencies: 28, 28 + 54 = 82, 82 + 38 = 120, 120 + 20 = 140, 140 + 17 = 157, 157 + 3 = 160

Second quartile:

$\frac{nk}{4} = \frac{(160)(2)}{4} = 80$

$Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$

Fifth decile:

$\frac{nk}{10} = \frac{(160)(5)}{10} = 80$

$D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$

50th percentile:

$\frac{nk}{100} = \frac{(160)(50)}{100} = 80$

$P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
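The grouped-data formula for quartiles, deciles, and percentiles differs only in the divisor (4, 10, or 100), so a single Python function can cover all three. This is a minimal sketch under the same assumptions as the tables above: classes in ascending order, whole-number class limits, and boundaries 0.5 below the lower limit. The names are illustrative.

```python
def grouped_quantile(classes, k, parts):
    """LB + ((n*k/parts - <cf) / f) * i for classes given as
    (lower limit, upper limit, frequency), lowest class first."""
    n = sum(f for _, _, f in classes)
    target = n * k / parts
    cum = 0                                   # <cf of the class being examined
    for lo, hi, f in classes:
        if cum + f >= target:                 # class containing the quantile
            i = hi - lo + 1                   # class size
            return (lo - 0.5) + (target - cum) / f * i
        cum += f


first = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
         (45, 49, 7), (50, 54, 6), (55, 59, 3)]
second = [(18, 24, 28), (25, 31, 54), (32, 38, 38),
          (39, 45, 20), (46, 52, 17), (53, 59, 3)]

print(round(grouped_quantile(first, 1, 4), 2))      # Q1 of the first table
print(round(grouped_quantile(first, 7, 10), 2))     # D7 of the first table
print(round(grouped_quantile(second, 50, 100), 2))  # P50 of the second table
```

Because Q2, D5, and P50 all use the same target nk/parts = n/2, the function returns the same value for each, which matches the three identical results in the second example.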

Figure 1 Figure 2
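$R = X_{max} - X_{min}$

Sample Standard Deviation

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

$s = \sqrt{\frac{\sum_{i=1}^{r} f(x_i - \bar{x})^2}{n - 1}}$

Population Standard Deviation

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

$\sigma = \sqrt{\frac{\sum_{i=1}^{r} f(x_i - \mu)^2}{N}}$

Sample Variance

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

$s^2 = \frac{\sum_{i=1}^{r} f(x_i - \bar{x})^2}{n - 1}$

Population Variance

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

$\sigma^2 = \frac{\sum_{i=1}^{r} f(x_i - \mu)^2}{N}$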

Class Interval Frequency

55 - 59 55
50 - 54 23
45 - 49 37
40 - 44 37
35 - 39 48
30 - 34 42
25 - 29 27
Class Interval    f    x     fx     (xi − x̄)²    f(xi − x̄)²
55 - 59           3    57    171    228.61       685.83
50 - 54           6    52    312    102.41       614.46
45 - 49           7    47    329    26.21        183.47
40 - 44           9    42    378    0.01         0.09
35 - 39           6    37    222    23.81        142.86
30 - 34           4    32    128    97.61        390.44
25 - 29           5    27    135    221.41       1107.05
Total             n = 40     1,675               3,124.20

$\bar{x} = \frac{1,675}{40} = 41.88$

Squared deviations:

$(x_1 - \bar{x})^2 = (57 - 41.88)^2 = 228.61$

$(x_2 - \bar{x})^2 = (52 - 41.88)^2 = 102.41$

$(x_3 - \bar{x})^2 = (47 - 41.88)^2 = 26.21$

Weighted squared deviations:

$f(x_1 - \bar{x})^2 = 3(228.61) = 685.83$

$f(x_2 - \bar{x})^2 = 6(102.41) = 614.46$

$f(x_3 - \bar{x})^2 = 7(26.21) = 183.47$

Standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{7} f(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3,124.20}{40 - 1}} = 8.95$

Variance:

$s^2 = \frac{\sum_{i=1}^{7} f(x_i - \bar{x})^2}{n - 1} = \frac{3,124.20}{40 - 1} = 80.11$
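The grouped variance and standard deviation above can be checked with a few lines of Python. This sketch uses the unrounded mean 41.875, so its sum of squared deviations (3,124.375) differs slightly from the table's 3,124.20, which was built from the rounded mean 41.88; the final results agree to two decimal places. The function name and the `sample` flag are illustrative.

```python
import math

CLASSES = [(25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
           (45, 49, 7), (50, 54, 6), (55, 59, 3)]


def grouped_variance(classes, sample=True):
    """Variance of grouped data using class marks; divides by n - 1
    for a sample and by n for a population."""
    n = sum(f for _, _, f in classes)
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(f * x for x, f in marks) / n
    ss = sum(f * (x - mean) ** 2 for x, f in marks)
    return ss / (n - 1 if sample else n)


variance = grouped_variance(CLASSES)
print(round(variance, 2))              # sample variance
print(round(math.sqrt(variance), 2))   # sample standard deviation
```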

Data Set 15 13 20 19 14

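$\sum_{i=1}^{n} (x_i - \bar{x}) = 0 \rightarrow \sum_{i=1}^{n} (x_i - \bar{x})^2 \ne 0$

16.2 − 3.11 = 13.09
16.2 + 3.11 = 19.31

[Figure: three distributions labeled Skewness < 0, Skewness > 0, and Skewness = 0]

$Sk = \frac{\bar{x} - \hat{x}}{s}$

$Sk = \frac{3(\bar{x} - \tilde{x})}{s}$

$k = \frac{QD}{P_{90} - P_{10}}$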

Data Set A 40 38 42 40 39 39 43 40 39 40
Data Set B 46 37 40 33 42 36 40 47 34 45
Normal Distribution

Normal Curve

50 100 150
Properties of Normal Curve

Inflection point Inflection point

μ−σ μ μ+σ

area = 1

0.50 0.50
μ1 = μ2, σ1 < σ2 μ1 < μ2, σ1 < σ2

μ1 < μ2, σ1 = σ2
$z = \frac{x - \mu}{\sigma}$

Standard Normal Distribution Table 1 (Positive Side, P(Z < z))

Standard Normal Distribution Table 2 (Negative Side, P(Z < −z))

[Figures: standard normal curves showing areas relative to 0, z1, and z2; the panel labels include 1 − Area, Area = 1, Area = 0.50, and 0.50 − Area.]
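Area = P(X > 560)

[Figure: normal curve of X with axis ticks at 450, 510, and 570; 560 is marked.]

Using Table 1

$P(X > 560) = P(Z > z)$
$= P\left(Z > \frac{560 - 510}{60}\right)$
$= P(Z > 0.83)$
$= 1 - P(Z \le 0.83)$
$= 1 - 0.7967$
$= 0.2033$

[Figure: standard normal curve of Z with axis ticks from −2 to 2 and 0.83 marked; Area = P(Z > 0.83) = 0.2033.]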

Using Table 2

$P(X > 560) = P(Z > z)$
$= P\left(Z > \frac{560 - 510}{60}\right)$
$= P(Z > 0.83)$
$= 0.2033$

[Figure: standard normal curve of Z with axis ticks from −2 to 2 and 0.83 marked; Area = P(Z > 0.83) = 0.2033.]
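The table lookups for P(X > 560) can be cross-checked with `statistics.NormalDist` from the Python standard library. This is a sketch that assumes μ = 510 and σ = 60, as used in the worked solution, and rounds z to two decimals to mimic reading a printed z-table. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 510, 60
x = 560

# Standardize, then round to two decimals as a printed z-table would require.
z = round((x - mu) / sigma, 2)

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Table 1 route: 1 - P(Z <= z)
p_table1 = 1 - standard_normal.cdf(z)

# Table 2 route: by symmetry, P(Z > z) = P(Z < -z)
p_table2 = standard_normal.cdf(-z)

print(f"z = {z}")
print(f"P(X > {x}) = {p_table1:.4f} (Table 1 route)")
print(f"P(X > {x}) = {p_table2:.4f} (Table 2 route)")
```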
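Area = P(X < 35)

[Figure: normal curve of X with axis ticks at 35.55, 38.72, and 41.89; 35 is marked.]

Using Table 1

$P(X < 35) = P(Z < z)$
$= P\left(Z < \frac{35 - 38.72}{3.17}\right)$
$= P(Z < -1.17)$
$= 1 - P(Z \ge -1.17)$
$= 1 - 0.8790$
$= 0.1210$

[Figure: standard normal curve of Z with axis ticks from −2 to 2 and −1.17 marked; Area = P(Z < −1.17) = 0.1210.]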
Using Table 2

$P(X < 35) = P(Z < z)$
$= P\left(Z < \frac{35 - 38.72}{3.17}\right)$
$= P(Z < -1.17)$
$= 0.1210$

[Figure: standard normal curve of Z with axis ticks from −2 to 2 and −1.17 marked; Area = P(Z < −1.17) = 0.1210.]

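Area = P(35 ≤ X ≤ 40)

[Figure: normal curve of X with axis ticks at 35.55, 38.72, and 41.89; 35 and 40 are marked.]

Using Table 1

$P(35 \le X \le 40) = P(z_1 \le Z \le z_2)$
$= P\left(\frac{35 - 38.72}{3.17} \le Z \le \frac{40 - 38.72}{3.17}\right)$
$= P(-1.17 \le Z \le 0.40)$
$= P(Z \le 0.40) - [1 - P(Z \ge -1.17)]$
$= 0.6554 - [1 - 0.8790]$
$= 0.6554 - 0.1210$
$= 0.5344$

[Figure: standard normal curve of Z with axis ticks from −2 to 2 and −1.17 and 0.40 marked; Area = P(−1.17 ≤ Z ≤ 0.40).]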

Using Table 2

$P(35 \le X \le 40) = P(z_1 \le Z \le z_2)$
$= P\left(\frac{35 - 38.72}{3.17} \le Z \le \frac{40 - 38.72}{3.17}\right)$
$= P(-1.17 \le Z \le 0.40)$
$= [0.50 - P(Z \ge 0.40)] + [0.50 - P(Z \le -1.17)]$
$= [0.50 - 0.3446] + [0.50 - 0.1210]$
$= 0.1554 + 0.3790$
$= 0.5344$

[Figure: standard normal curve of Z with axis ticks from −2 to 2 and −1.17 and 0.40 marked; Area = P(−1.17 ≤ Z ≤ 0.40).]
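The two results for the distribution with μ = 38.72 and σ = 3.17, P(X < 35) and P(35 ≤ X ≤ 40), can be verified the same way. The sketch below again uses `statistics.NormalDist` and rounds each z-score to two decimals to match a printed table. The helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

mu, sigma = 38.72, 3.17
standard_normal = NormalDist()  # mean 0, standard deviation 1


def z_score(x, mu, sigma):
    """Standardize x and round to two decimals, as a printed z-table requires."""
    return round((x - mu) / sigma, 2)


z1 = z_score(35, mu, sigma)
z2 = z_score(40, mu, sigma)

# Left-tail probability P(X < 35) = P(Z < z1)
p_below = standard_normal.cdf(z1)

# Probability between two values: P(Z <= z2) - P(Z <= z1)
p_between = standard_normal.cdf(z2) - standard_normal.cdf(z1)

print(f"z1 = {z1}, z2 = {z2}")
print(f"P(X < 35) = {p_below:.4f}")
print(f"P(35 <= X <= 40) = {p_between:.4f}")
```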
ACTIVITIES/ASSESSMENTS:

A.

B.

ACTIVITIES/ASSESSMENTS:

Origin / Rating    Poor    Needs Improvement    Satisfactory    V Good    Excellent    Total
External           0%      2%                   12%             19%       9%           41%
Internal           4%      8%                   15%             23%       9%           59%
Grand Total        4%      10%                  27%             41%       17%          100%
ACTIVITIES/ASSESSMENTS:

Salary Frequency Percentage


41,000 - 50,000 1 1%
51,000 - 60,000 20 13%
61,000 - 70,000 53 35%
71,000 - 80,000 43 29%
81,000 - 90,000 26 17%
91,000 - 100,000 6 4%
101,000 - 110,000 1 1%
Total 150 100%

ACTIVITIES/ASSESSMENTS:

37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31
Scores Frequency Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total
ACTIVITIES/ASSESSMENTS:

MODULE 4: INFERENTIAL STATISTICS
OBJECTIVES:
After successful completion of this module, you should be
able to:

Null Hypothesis:

Alternative Hypothesis:
One-tailed and Left-tailed
Ha : μ1 < μ2
[Figure: curve with the rejection region in the left tail.]

One-tailed and Right-tailed
Ha : μ1 > μ2
[Figure: curve with the rejection region in the right tail.]

Two-tailed
Ha : μ1 ≠ μ2
[Figure: curve with rejection regions in both tails.]
To determine whether the data follow a normal distribution, we can use a graphical or a numerical method.

A histogram plots the observed values against their frequencies and gives a visual indication of whether the distribution is bell-shaped.

Q-Q probability plots display the observed values against normally distributed data (represented by the line).
The hypotheses used are:
Ho: The sample data follows a normal distribution.
Ha: The sample data does not follow a normal
distribution.

When we are testing normality:

• If P value > alpha, we do not reject Ho: the data are consistent with a normal distribution.
• If P value ≤ alpha, we reject Ho: the data are NOT normal.

STEP 1:
STEP 2: $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$
STEP 3: $b = \sum_{i=1}^{m} a_i (x_{n+1-i} - x_i)$
$m = \frac{n}{2}$ if n is even
$m = \frac{n-1}{2}$ if n is odd
Shapiro-Wilk Table
STEP 4: $W = \frac{b^2}{SS}$
STEP 5:
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2
Ho : μ1 ≥ μ2

Ha : μ1 < μ2

α = 0.05
pvalue ≤ α
H0 : μ1 ≥ μ2 and Ha : μ1 < μ2
H0 : μ1 ≤ μ2 and Ha : μ1 > μ2
H0 : μ1 = μ2 and Ha : μ1 ≠ μ2
Ho : μ1 = μ2

Ha : μ1 ≠ μ2

α = 0.05
pvalue ≤ α
pvalue ≤ α
Ho : μ1 = μ2 = . . . = μk
Ha :
α = 0.10
pvalue ≤ α
pvalue ≤ α
pvalue ≤ α
pvalue ≤ α
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative
linear relationship.
• The closer to 1, the stronger the positive
linear relationship.
• The closer to 0, the weaker the linear
relationship.
[Figure: scatter plots of Y against X illustrating r = −1, r = −0.6, r = 0, r = 0.6, and r = 1.]
Test Statistic:
$t = r\sqrt{\frac{df}{1 - r^2}}$
where:
df = degrees of freedom
r = correlation coefficient of P
Note:
df = n − 2
α = 0.05
$t = r\sqrt{\frac{df}{1 - r^2}}$

df = n − 2

pvalue ≤ α
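The test statistic for a correlation coefficient, t = r√(df / (1 − r²)) with df = n − 2, is simple to compute directly. The sketch below uses a hypothetical sample (r = 0.6, n = 27) that does not come from the module, and the function name `correlation_t` is also an assumption. The p-value would still be read from a t table or statistical software.

```python
import math


def correlation_t(r, n):
    """Return (t, df) for testing whether a correlation differs from zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Hypothetical example values, chosen only for illustration.
t, df = correlation_t(r=0.6, n=27)
print(f"t = {t:.2f} with df = {df}")
```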
Chi-Square Distribution

χ²

Chi-Square: Test for Independence

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$

Observed and Expected Frequencies

Observed Values Low Medium High Row Total


Some College 20 35 20 80
Bachelor's Degree 17 33 25 70
Masters Degree 11 18 21 50
Column Total 48 86 66 200
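The expected frequencies and the chi-square statistic for the observed table can be sketched in plain Python. The cell counts are copied from the table. The row and column totals are recomputed from those cells rather than taken from the printed totals, so the recomputed row totals (75, 75, 50) may not match the printed Row Total column. The variable names are illustrative, and the p-value would still come from a chi-square table or statistical software.

```python
# Observed counts: rows = Some College, Bachelor's Degree, Masters Degree;
# columns = Low, Medium, High.
observed = [
    [20, 35, 20],
    [17, 33, 25],
    [11, 18, 21],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total)(column total) / grand total, one value per cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

print("row totals:", row_totals)
print("column totals:", col_totals)
print(f"chi-square = {chi_square:.3f} with df = {df}")
```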

1. There are 2 variables, and both are measured as categories, usually at the nominal level.
2. The two variables should consist of two or more categorical, independent groups.
3. The data in the cells should be frequencies, or counts of cases rather than percentages or some other transformation of the data.
4. For a 2 by 2 table, all expected frequencies > 5.
5. For a larger table, all expected frequencies > 1 and no more than 20% of all cells may have expected frequencies < 5.
Example:
α = 0.05
$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$
pvalue ≤ α
ACTIVITIES/ASSESSMENTS:

Normal Bone Osteopenia Osteoporosis


1200 1000 890
1000 1100 650
980 700 1100
900 800 900
750 500 400
800 700 350

ACTIVITIES/ASSESSMENTS:
Men Women
(in $) (in $)
107.48 125.98
143.61 45.53
90.19 56.35
125.53 80.62
70.7 46.37
83 44.34
129.63 75.21
154.22 68.48
93.8 85.82
126.11
ACTIVITIES/ASSESSMENTS:

Case Before After Case Before After


1 85 95 11 89 97
2 84 98 12 87 98
3 86 97 13 82 95
4 87 92 14 81 95
5 89 96 15 86 92
6 82 93 16 89 91
7 80 94 17 89 94
8 84 95 18 84 95
9 86 90 19 85 96
10 82 82 20 88 97

ACTIVITIES/ASSESSMENTS:
Height (inches)    Head Circumference (inches)
27.75 17.5
24.5 17.1
25.5 17.1
26 17.3
25 16.9
27.75 17.6
26.5 17.3
27 17.5
26.75 17.3
26.75 17.5
27.5 17.5

ACTIVITIES/ASSESSMENTS:

No. of Years       Smoking Status
of Education       Current    Former    Never
Less than 12 178 88 208
12 137 69 143
13 - 15 44 25 44
16 or more 34 33 51