
MS5 6

The document describes conducting a hypothesis test to determine whether Filipino women have lower average cholesterol levels than American women. Specifically, it describes blood tests performed on 19 Filipino women ages 20-39 that found a sample mean cholesterol level of 181.52 mg/dl and standard deviation of 40 mg/dl. It assumes the cholesterol levels in American women of the same age range are normally distributed with a mean of 90 mg/dl. The hypothesis test will use a significance level of alpha = 0.05.

1. Juanita Lopez, a production supervisor at a chemical company, wants to be sure that the Super-Duper can is filled with an average of 16 oz of product. If the mean volume is significantly less than 16 oz, customers will likely complain, prompting undesirable publicity. The physical size of the can doesn't allow a mean volume significantly above 16 oz. A random sample of 36 cans shows a sample mean of 15.7 oz. Assuming σ is 0.2 oz, conduct a hypothesis test with α = 0.01.

2. We want to compare fasting serum cholesterol levels of Filipino women to those of American women. Assume the cholesterol levels of 20-to-39-year-old women in the United States are normally distributed with μ = 90 mg/dl. Blood tests performed on 19 female Filipinos in this age range rendered a sample mean cholesterol level of 181.52 mg/dl and a standard deviation of 40 mg/dl. Conduct a test of hypothesis to determine whether Filipino women have a lower average cholesterol level than their American counterparts. Use α = 0.05.

3. In a study of air-bag effectiveness, it was found that in 821 crashes of midsize cars equipped with air bags, 46 of the crashes resulted in hospitalization of the drivers. Use a 0.01 level of significance to test the claim that the air-bag hospitalization rate is lower than the 7.8% rate for crashes of midsize cars equipped with automatic safety belts.

4. Suppose that a teacher at a school claims that the average weight of the student population is greater than 140 lb, and we want to test the truth of this claim. We have the weights of a random sample of 6 students from the student population. Use a 0.10 level of significance.
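The test statistics for Problems 1 and 3 can be sketched as follows. This is a minimal illustration, not the handout's worked solution: the function names are hypothetical, Problem 1 is assumed to be two-tailed (the problem statement says the mean should be neither below nor above 16 oz), and Problem 3 is assumed to be left-tailed.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    return (xbar - mu0) / (sigma / sqrt(n))

def one_proportion_z(successes, n, p0):
    """z statistic for a single proportion against a hypothesized p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

std_normal = NormalDist()  # standard normal distribution

# Problem 1 (assumed two-tailed): H0: mu = 16, H1: mu != 16, alpha = 0.01
z_can = one_sample_z(xbar=15.7, mu0=16, sigma=0.2, n=36)
crit_can = std_normal.inv_cdf(1 - 0.01 / 2)   # upper critical value
reject_can = abs(z_can) > crit_can

# Problem 3 (assumed left-tailed): H0: p = 0.078, H1: p < 0.078, alpha = 0.01
z_airbag = one_proportion_z(successes=46, n=821, p0=0.078)
crit_airbag = std_normal.inv_cdf(0.01)        # lower critical value
reject_airbag = z_airbag < crit_airbag

print(f"Problem 1: z = {z_can:.2f}, reject H0: {reject_can}")
print(f"Problem 3: z = {z_airbag:.3f}, reject H0: {reject_airbag}")
```

Under these assumptions both null hypotheses are rejected, although the air-bag statistic falls only slightly beyond its critical value.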
CONSTRUCTING QUESTIONNAIRES

STEPS IN CONSTRUCTING QUESTIONNAIRE

1. Purpose
2. Pre-existing Questionnaire
3. Domains and Types of Questions
4. Consider the Audience
5. Write Questions
6. Ordering
7. Cover letter, Instructions, and Layout
8. Pretest and Validation

1. Determine the purpose

A. What do I need to know?
B. Why do I need to know it?
C. What will happen as a result of this questionnaire?

2. Pre-existing questionnaires

In the course of your literature review, pay careful attention to how others are measuring the concept you want to measure. They may have already tested the reliability and validity of the measure.

3.1. Domains of questions

3.2. Types of questions

CLOSED-ENDED

a. Two-way questions
b. Multiple-choice questions
c. Checklist
d. Ranking
e. Rating scale
- Odd-numbered
- Even-numbered

ADVANTAGES
• Finite responses
• Easy and quick to answer
• Easy to code

DISADVANTAGES
• Bias
• Limited information

OPEN-ENDED does not include response categories.

ADVANTAGES
• Express ideas
• Add new information

DISADVANTAGES
• Difficult to answer and analyze
• Require effort and time
• Illegibility in handwriting

CONTINGENCY QUESTIONS (filter questions) – a special type of closed-ended question, because it applies only to a subgroup of respondents.

4. Consider the audience

A. Who should I ask?
B. Choose an appropriate data collection method:
o Mailed
o Personal (face-to-face) interview
o Telephone
o Web-based

5. Write the questions

1. Watch your ranges.
2. Avoid abstract terms or jargon.
3. Include clarifying detail.
4. Avoid double-barreled questions.
5. Avoid double negative wording.
6. Use an appropriate scale.
7. Avoid bias.
8. Avoid hidden assumptions and contingencies.

6. Ordering

FUNNEL SEQUENCE

• Progressively narrower scope
• To ascertain something about the respondent's frame of reference on a topic
• To prevent further specific questions from biasing the initial overall view of the respondent

INVERTED FUNNEL SEQUENCE

• Specific questions on a topic are asked first, and these eventually lead to a more general question.
• Lets the respondent think through his or her attitude before reaching an overall evaluation.

7. Cover letter, Instructions, and Layout

• Include the title.
• Consider using a "booklet" format so it stands out from just "paper."
• Use quality reproduction.
• Proofread.
• Limit matrix/grid questions.

8. Pretest and Validation

Review and pilot test the survey. Talk through the survey questions with potential respondents, ask colleagues to review them, and/or select a few potential respondents and ask them to complete the survey and provide feedback on the content.

INFERENCE ABOUT TWO MEANS

To perform inference on the difference of two population means, we must first determine whether the data come from an independent or dependent sample.

• A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample.
• A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample.

Example: Determine whether the sample is independent or dependent.

1. A researcher wants to know if the mean length of stay in for-profit hospitals is different from the mean length of stay in not-for-profit hospitals. He randomly selected 20 individuals in the for-profit hospital and matched them with 20 individuals in the not-for-profit hospital by diagnosis.
2. An urban economist believes that commute times to work in the South are less than commute times to work in the Midwest. He randomly selects 40 employed individuals in the South and 45 employed individuals in the Midwest and determines their commute times.
3. In an experiment conducted in biology class, Prof. Rhea measured the time required for 12 students to catch a falling meter stick using their dominant hand and nondominant hand. The goal of the study was to determine whether the reaction time in an individual's dominant hand is different from the reaction time in the nondominant hand.

TWO INDEPENDENT MEANS

• Allows researchers to evaluate or to compare the mean difference between two populations using the data from two separate samples.
• Used to test whether population means are significantly different from each other, using the means from randomly drawn samples.

Note:

Use the z-distribution to conduct the test if you have two independent samples taken from normally distributed populations and if you know both population standard deviations or both sample sizes exceed 30.

ASSUMPTIONS

1. Your dependent variable should be measured on a continuous scale (i.e., it is measured at the interval or ratio level).
2. Your independent variable should consist of two categorical, independent groups.
3. You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed for each group of the independent variable.
6. There needs to be homogeneity of variances [equality of population variances].

HYPOTHESES

INDEPENDENT SAMPLE z-TEST

REJECTION REGION

INDEPENDENT SAMPLE t-TEST

REJECTION REGION

Note:
• If both population standard deviations are unknown and the sample sizes are small, use the t-distribution. However, you first need to use the F-test to determine whether the variances are equal.
• If the F-test fails to reject the null hypothesis, use the t-distribution Case 1, which means that the variances are equal.
• If the F-test rejects the null hypothesis, use the t-distribution Case 2, which means that the variances are not equal.
F-TEST

• For the comparison of two variances or standard deviations, an F-test is used.
• The sampling distribution of the ratio of two sample variances is called the F-distribution.

Note:

• The values of F cannot be negative.
• The distribution is positively skewed.
• The F-distribution is a family of curves based on the degrees of freedom of the numerator and the degrees of freedom of the denominator.

ASSUMPTIONS

1. The populations from which the samples were obtained must be normally distributed.
2. The samples must be independent of each other.
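The decision rule in the notes (run the F-test first, then pick Case 1 or Case 2 of the t-test) can be sketched as follows. This is an illustration under stated assumptions: the function names and the two small samples are hypothetical, the larger sample variance is placed in the numerator of F, the Case 2 degrees of freedom use the conservative min(n1 − 1, n2 − 1) rule (the handout's own formula is not recoverable), and the critical F value is assumed to be read from an F table.

```python
from math import sqrt
from statistics import mean, variance

def f_ratio(sample_a, sample_b):
    """Return (F, df_num, df_den) with the larger sample variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

def t_equal_variances(sample_a, sample_b):
    """Case 1: pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled_var = ((n1 - 1) * variance(sample_a)
                  + (n2 - 1) * variance(sample_b)) / (n1 + n2 - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def t_unequal_variances(sample_a, sample_b):
    """Case 2: separate-variance t statistic, conservative df (assumption)."""
    n1, n2 = len(sample_a), len(sample_b)
    se = sqrt(variance(sample_a) / n1 + variance(sample_b) / n2)
    t = (mean(sample_a) - mean(sample_b)) / se
    return t, min(n1 - 1, n2 - 1)

# Hypothetical data, not taken from the handout
group_a = [2, 4, 6, 8]
group_b = [1, 2, 3, 4, 5]
f_critical = 9.98  # table value for alpha/2 = 0.025 with 3 and 4 df

f_value, df_num, df_den = f_ratio(group_a, group_b)
if f_value > f_critical:
    case = 2
    t_value, df = t_unequal_variances(group_a, group_b)
else:
    case = 1
    t_value, df = t_equal_variances(group_a, group_b)

print(f"F = {f_value:.3f} (df {df_num}, {df_den}) -> Case {case}")
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The resulting t statistic is then compared with the critical t value for the chosen degrees of freedom and significance level.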

Example:

An agricultural research institute is studying two new varieties of palay, both of which are reputedly high-yielding varieties. There are a few studies which suggest that the difference in the yield per hectare may be significant. The head of the institute decides to find out if there is, in fact, a significant difference in yield. Forty hectares are planted to variety A and thirty hectares to variety B. Both varieties are grown under identical laboratory conditions.

At harvest time, the results are:

At the 1% level of significance, is there a significant difference in the yield of the two palay varieties?

Solution:

Example:

Suppose we put people on 2 diets: "the fruit diet" and "the bread diet". Participants are randomly assigned to either 7 days of eating exclusively fruits or 7 weeks of eating exclusively bread. At the end of the day, we measure the weight gain of each participant. Does the bread diet cause more weight gain than the fruit diet? Test the claim using a 10% level of significance.

Solution:
Example:

An apartment rental agent tells the personnel manager of a firm thinking of building a plant in the agent's city that the mean rental rates for two-bedroom apartments are the same in sectors A and B of the city. To test this claim, the personnel manager randomly samples apartment complexes in each sector and obtains the following data.

What can the personnel manager conclude about the agent's claim at the 0.05 level?

Solution:

Example:

Consider the case of the two experimental diets designed to add weight to malnourished third-world children. The given table presents the weight gains made by 8 children who were fed diet A and 9 children who received diet B.

Test whether the children on the two diets have the same population mean weight gain at the 0.05 level.

Solution:

PAIRED SAMPLE t-TEST

The dependent sample t-test (also called the paired t-test or paired-samples t-test) compares the means of two related groups to determine whether there is a statistically significant difference between these means.

ASSUMPTIONS

1. Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous).
2. Your independent variable should consist of two categorical, "related groups" or "matched pairs".
3. There should be no significant outliers in the differences between the two related groups.
4. The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed.

HYPOTHESES

REJECTION REGION

Example:

An industrial engineer is evaluating a new technique to assemble air compressors. If there is a difference in the number of compressors that can be assembled when the existing procedure is used and when the new technique is followed, she will recommend that the company use the approach that results in the greatest worker productivity. A sample of 8 employees is selected at random, and the number of compressors each assembles in 1 week using the existing procedure is recorded. The same 8 workers are then trained to use the new technique, and their output for 1 week is then noted:

Solution:
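A paired-sample t statistic can be sketched as follows. The output figures for the 8 workers are hypothetical (the handout's table did not survive), and the function name is illustrative. The statistic is the mean of the paired differences divided by its standard error, with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df) for paired observations, using d = second - first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)          # mean of the differences
    s_d = stdev(diffs)           # sample standard deviation of the differences
    return d_bar / (s_d / sqrt(n)), n - 1

# Hypothetical weekly output of the same 8 workers (not the handout's data)
existing = [10, 12, 9, 11, 13, 10, 12, 11]
new_tech = [12, 13, 11, 12, 15, 11, 13, 13]

t_value, df = paired_t(existing, new_tech)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

For a two-tailed test, |t| is compared with the critical t value for n − 1 degrees of freedom at the chosen significance level.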
TWO SAMPLE PROPORTION TEST

• A two-proportion z-test allows you to compare two proportions to see if they are the same.
• Used when testing a hypothesis made about two population proportions, such as the proportion of cured patients in a population given some treatment and in a second population given a placebo.

TWO-PROPORTION z-TEST

ASSUMPTIONS
1. We have two independent sets of randomly selected sample data.
2. For both samples, the conditions np ≥ 5 and np(1 − p) ≥ 5 are satisfied.

HYPOTHESES

REJECTION REGION

Example:

Johns Hopkins researchers conducted a study of pregnant IBM employees. Among 30 employees who worked with glycol ethers, 10 (or 33.3%) had miscarriages, but among 750 who were not exposed to glycol ethers, 120 (or 16.0%) had miscarriages. At the 0.01 significance level, test the claim that the miscarriage rate is greater for women exposed to glycol ethers.

We stipulate that sample 1 is the group that worked with glycol ethers and sample 2 is the group not exposed, so the sample statistics can be summarized as shown here:

Solution:
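One way to carry out the computation for the glycol ethers example is sketched below. It assumes the usual pooled-proportion form of the two-proportion z statistic and a right-tailed test (H1: p1 > p2); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Sample 1: exposed to glycol ethers; sample 2: not exposed (from the example)
z_value = two_proportion_z(x1=10, n1=30, x2=120, n2=750)

alpha = 0.01
z_critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - NormalDist().cdf(z_value)
reject_h0 = z_value > z_critical

print(f"z = {z_value:.3f}, critical z = {z_critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Under these assumptions the statistic exceeds the critical value, which supports the claim that the miscarriage rate is greater for the exposed group.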
LINEAR CORRELATION ANALYSIS

Used to measure the degree of relationship between two variables x and y by means of a single number called the correlation coefficient.

• Only concerned with the strength of the relationship.
• No causal effect is implied.

Note:

• The value of the correlation coefficient, denoted by the symbol "r", ranges from -1 to 1.
• The correlation between the variables may show either a direct or an inverse relationship.

FEATURES OF r

• Unit free
• Ranges between -1 and 1
• The closer to -1, the stronger the negative linear relationship.
• The closer to 1, the stronger the positive linear relationship.
• The closer to 0, the weaker the linear relationship.

Caution!

• A correlation of 70% does not mean that 70% of the points are clustered around a line. Nor should we claim that we have twice as much linear association as a set of points with a correlation of 35%.
• Correlation does not imply causation.
• The presence of outliers easily affects the correlation of a set of data.
- In some situations, we ought to remove these outliers from the data set and redo the correlation analysis.
- In other cases, these outliers ought not to be removed, as there will always be some points detached from the rest of the data.

PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT

• Commonly called the Pearson r.
• It measures the linear relationship between two variables.
• The level of measurement of the data for the two variables is either interval or ratio scale.

Where:

• x = the observed data for the independent variable
• y = the observed data for the dependent variable
• n = no. of samples

QUALITATIVE INTERPRETATION

Note:

• If r is negative, this means that for every increase in one variable, there is a corresponding decrease in the second variable, or that there is an inverse relationship between variables x and y.
• If r is positive, this means that for every increase in one variable, there is a corresponding increase in the second variable, or that there is a direct relationship between variables x and y.

HYPOTHESES

Example:

The Rip-off Vending Machine Company operates coffee vending machines in office buildings. The company wants to study the relationship, if any, between the number of cups sold per day and the number of persons working in each building. Sample data collected by the company are presented below. Test the significance at the 0.05 level.

No. of persons working at location    5   6  14  19  15  11  18  22
No. of cups of coffee sold           10  20  30  40  30  20  40  40

Solution:
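The Pearson r for the vending machine data can be computed with the standard computational formula, r = [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]), as sketched below. The significance test is assumed here to be the usual t-test, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The function names are hypothetical and the critical value is assumed to come from a t table.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product moment correlation via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Data from the vending machine example
persons = [5, 6, 14, 19, 15, 11, 18, 22]
cups = [10, 20, 30, 40, 30, 20, 40, 40]

r = pearson_r(persons, cups)
t_value = correlation_t(r, len(persons))
t_critical = 2.447  # two-tailed table value for alpha = 0.05, df = 6

print(f"r = {r:.4f}, t = {t_value:.3f}")
print("Significant" if abs(t_value) > t_critical else "Not significant")
```

With these data r is close to 1, a strong direct relationship, and the t statistic is well beyond the critical value.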
REGRESSION ANALYSIS

• Regression analysis is used primarily to model causality and provide prediction.
• Predicts the value of a dependent (response) variable based on the value of at least one independent (explanatory) variable.
• A dependency of one variable on the other.

POPULATION LINEAR REGRESSION

The population regression line is a straight line that describes the dependence of the average value of one variable on the other.

SAMPLE LINEAR REGRESSION

The sample regression line provides an estimate of the population regression line as well as a predicted value of Y.

Note:

• b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals.
• b0 provides an estimate of β0.
• b1 provides an estimate of β1.

INTERPRETATION OF THE SLOPE AND THE INTERCEPT

Note:

• When b1>0, Y increases as X increases. In this case, we say that Y is directly or positively related to X.
• When b1<0, Y decreases as X increases, and we say that Y is inversely or negatively related to X.
• When b1=0, Y is a constant and is equal to the y-intercept. Y does not change whatever the value of X is, which implies that variables X and Y have no linear relationship.

Example:

Given the following information, determine whether the square footage of a store affects its annual sales.

Solution:

The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The model estimates that for each increase of one square foot in the size of the store, the expected annual sales are predicted to increase by $1487.

INFERENCE ABOUT THE SLOPE: t-TEST

• t-test for a population slope.
• Is there a linear dependency of Y on X?

INFERENCE ABOUT THE SLOPE: F-TEST

• F-test for a population slope.
• Is there a linear dependency of Y on X?

CONFIDENCE INTERVAL OF THE SLOPE

RELATIONSHIP BETWEEN t-TEST AND F-TEST

Example:

Given the following information, determine whether the square footage of a store affects its annual sales.

Solution:

Since the p-value (0.0003) is less than the level of significance 0.05, we reject the null hypothesis. Therefore, the annual sales of produce stores depend linearly on their size in square feet, and there is evidence that square footage affects annual sales.

CALCULATION OF THE REGRESSION EQUATION

A simple technique for prediction is linear regression analysis, which utilizes an equation of the form ŷ = b0 + b1x.

Example:

Find an equation that describes the relationship between the output of a sample of Tackey Toy employees and their aptitude test scores.

Solution:

Suppose, for example, that the unfortunate Hiram Ramos, personnel manager for Tackey Toy, is considering hiring an applicant who scored a 4 on the aptitude test. The supervisor of the department wants someone hired who can produce an average of 30 dozen units. Of course, it is not possible to tell exactly what the applicant's future production might be. By substituting 4 for x in the regression equation, we have,

Therefore, the manager should not hire the applicant who scored 4, because the predicted output is only 21.58 dozen units, below the required 30 dozen.
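The calculation of a regression equation and a prediction from it can be sketched as follows. The aptitude scores and outputs below are hypothetical (the Tackey Toy table is not available), the function name is illustrative, and the coefficients use the standard least-squares formulas b1 = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] and b0 = ȳ − b1x̄.

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Hypothetical data, not the handout's Tackey Toy figures
aptitude = [1, 2, 3, 4, 5]
output = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(aptitude, output)
predicted_at_4 = b0 + b1 * 4   # substitute x = 4 into y-hat = b0 + b1*x

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
print(f"Predicted output for a score of 4: {predicted_at_4:.2f}")
```

The same substitution step is what the Tackey Toy solution describes: the predicted value at x = 4 is compared with the output the supervisor requires.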

RESIDUAL ANALYSIS

• Purposes
- Examine linearity
- Evaluate violations of assumptions
• Plot residuals vs. Xi, Yi, and time
- Graphical analysis of residuals

PITFALLS OF REGRESSION ANALYSIS

• Lacking an awareness of the assumptions underlying least-squares regression.
• Not knowing how to evaluate assumptions.
• Not knowing the alternatives to classical regression if some assumption is violated.
• Using a regression model without knowledge of the subject matter.

STRATEGIES FOR AVOIDING PITFALLS OF REGRESSION

• Start with a scatter plot of X on Y to observe possible relationship.
• Perform residual analysis to check the assumptions.
- Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality.
• If there is violation of any assumption, use alternative methods to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression).
• If there is no evidence of assumption violation, then test for the significance of the regression coefficients.
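The residuals used in residual analysis can be obtained as sketched below. The data and function names are hypothetical. Each residual is the observed Y minus the value predicted by the fitted line, and listing the residuals against X is a text stand-in for the residual plot described above.

```python
def fit_line(x, y):
    """Least-squares intercept and slope (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(x, y, b0, b1):
    """Observed minus predicted value for every data point."""
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

# A pattern in these values (for example a curve) would suggest non-linearity
for x_i, e_i in zip(xs, res):
    print(f"x = {x_i}: residual = {e_i:+.2f}")
```

For a least-squares line with an intercept, the residuals sum to zero, so a check of that sum is a quick sanity test on the fit.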
