DSME 2011 G, I and J Assignment 6: Sampling Distributions, One- and Two-Sample Tests, Regression

This document discusses sampling distributions, one-sample and two-sample tests, and regression. It provides instructions for a group assignment on these topics that is due on December 5th. The assignment includes a mini-case on what went wrong with election polls in 2016 and multiple choice questions for exam preparation. The central limit theorem and its implications for sampling distributions are important concepts covered in the questions.

Uploaded by

mikotgg

Sampling Distributions, One- and Two-Sample Tests, Regression

DSME 2011 G, I and J


Assignment 6

This group assignment is due by Dec 5 (M) 5pm. Please send a hardcopy to the drop box in 2/F, CYT. Put down the
group name and all members’ names and SIDs on the first page. Late submission will lead to a deduction in points.

NOTE: I was planning to use Part 2 of Good Luck Lemon Tea (Part I appeared in Assignment 3), but the final exam
came sooner than expected. So I use a simpler case here and some MC questions for your exam preparation. There
will not be a separate Review Question as announced before.

Section I: Mini-Case
What Went Wrong in the Election Polls?

Polling and the Trump Big Fail


COMMENTARY by Alan Murray
Fortune, November 14, 2016, 8:08 AM EST

If nothing else, Election 2016 proved that the old models are broken
Last week’s Big Fail by the press and punditocracy in predicting Donald Trump’s election has been attributed by
some to rampant bias. That’s an understandable charge: Trump supporters in media or among the political class
were few and far between.
But the real problem was the polls, which misled analysts of all stripes. Their failure raises the question: is public
opinion polling, one of the great innovations of the 20th century, dying in the 21st? That has implications for
business as well as politics.
My former colleagues at the Pew Research Center have done a good job summarizing the three leading theories
behind the poll miss in a blog post here.
The first is the most fundamental – known as “non-response bias.” The science of polling is based on an
understanding that you can accurately infer the attitudes of a large population by polling a relatively small random
sample. Random selection of phone numbers thus became the heart of modern polling, and the best pollsters – like
Pew – extended that to cell phones. But these days, most people won’t take the time to answer a pollster’s
questions. Response rates have fallen below 10%. And while statistical methods are used to ensure the polling
sample accurately represents the broader population, there’s still the nagging question of whether some inherent
bias separates those who will talk to pollsters from those who won’t. It’s a big issue for polling generally. But I’m
not convinced it has much to do with the Trump fail.
Second is the “shy Trumpers” theory – the notion that some Trump supporters weren’t willing to admit their
support to telephone pollsters. But comparing telephone polls with online polls suggests that difference was pretty
small. So again, not persuasive.
That leads to the third, and most important, reason. Pollsters may have done a pretty good job measuring the
attitudes of the American public. Where they failed was in accurately predicting who would show up at the polls.
Their “likely voter” models were based largely on history; and there was nothing in recent history like Election
2016.

Bottom line: today’s polls, while challenged by the difficulties of obtaining truly random samples, are nevertheless
still pretty good at divining public attitudes. But what they can’t do is predict public behavior. “The weakest point of
all election polls is likely voter modeling,” says my friend Jon Cohen, chief research officer at SurveyMonkey. “That’s
now completely broken.”

Most election polls predicted that Hillary Clinton would be the next US president, but it was Donald Trump who won, by a
significant margin (in terms of electoral votes). What went wrong in the opinion polls?

1 According to the article above, election polls are not very useful in predicting the winner. There can be
“non-response bias” and other errors. Briefly explain what they are.

2 Find two more articles that discuss why most election polls gave a wrong prediction. For each of these
articles, list the reasons they provide. Attach the articles to your answer.

3 Can you think of other reason(s) that may explain what went wrong in the election polls?

4 Suggest way(s) to improve on the process of election polls.

Section II: MULTIPLE CHOICE


Choose the one alternative that best completes the statement or answers the question.

1) For air travelers, one of the biggest complaints is of the waiting time between when the airplane taxis away from
the terminal until the flight takes off. This waiting time is known to have a right skewed distribution with a mean of
10 minutes and a standard deviation of 8 minutes. Suppose 100 flights have been randomly sampled. Describe the
sampling distribution of the mean waiting time between when the airplane taxis away from the terminal until the
flight takes off for these 100 flights.
A) Distribution is approximately normal with mean = 10 minutes and standard error = 0.8 minutes.
B) Distribution is right skewed with mean = 10 minutes and standard error = 8 minutes.
C) Distribution is approximately normal with mean = 10 minutes and standard error = 8 minutes.
D) Distribution is right skewed with mean = 10 minutes and standard error = 0.8 minutes.
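
As a rough check on the reasoning behind Question 1, the sketch below simulates many samples of 100 waiting times and compares the spread of the sample means with σ/√n. The question only says the population is right skewed, so the gamma distribution used here (and its shape and scale values) is an assumption chosen to match the stated mean of 10 and standard deviation of 8.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

mu, sigma, n = 10.0, 8.0, 100      # values stated in the question
shape = (mu / sigma) ** 2           # ASSUMED gamma shape (gives a right-skewed population)
scale = sigma ** 2 / mu             # ASSUMED gamma scale; mean = shape * scale = 10

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.gammavariate(shape, scale) for _ in range(n))
    for _ in range(2000)
]

theoretical_se = sigma / math.sqrt(n)   # standard error of the mean
print("theoretical standard error:", theoretical_se)
print("mean of sample means:", statistics.fmean(sample_means))
print("std dev of sample means:", statistics.stdev(sample_means))
```

A histogram of `sample_means` would be one way to judge how close to normal the sampling distribution looks at this sample size.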

2) Suppose a sample of n = 50 items is selected from a population of manufactured products and the weight, X, of
each item is recorded. Prior experience has shown that the weight has a probability distribution with μ = 6 ounces
and σ = 2.5 ounces. Which of the following is true about the sampling distribution of the sample mean if a sample
of size 15 is selected?
A) The shape of the sampling distribution is approximately normal.
B) The standard deviation of the sampling distribution is 2.5 ounces.
C) The mean of the sampling distribution is 6 ounces.
D) All of the above are correct.

3) The Central Limit Theorem is important in statistics because
A) for a large n, it says the sampling distribution of the sample mean is approximately normal, regardless of the
shape of the population.
B) for any population, it says the sampling distribution of the sample mean is approximately normal, regardless of
the sample size.
C) for any sized sample, it says the sampling distribution of the sample mean is approximately normal.
D) for a large n, it says the population is approximately normal.

4) Major league baseball salaries averaged $3.26 million with a standard deviation of $1.2 million in a certain year
in the past. Suppose a sample of 100 major league players was taken. Find the approximate probability that the
mean salary of the 100 players was no more than $3.0 million.
A) 0.9849
B) Approximately 0
C) 0.0151
D) Approximately 1
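
One way to work through Question 4 is to treat the sample mean as approximately normal with mean μ and standard error σ/√n, then read off the lower-tail probability. The sketch below does this with Python's standard library; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 3.26, 1.2, 100          # population mean and sd in $ million, sample size
se = sigma / math.sqrt(n)              # standard error of the sample mean

# Approximate sampling distribution of the mean (Central Limit Theorem).
sampling_dist = NormalDist(mu, se)

z = (3.0 - mu) / se                    # standardized value for a mean of $3.0 million
p_at_most_3 = sampling_dist.cdf(3.0)   # P(sample mean <= 3.0)

print("z =", round(z, 2))
print("P(mean <= 3.0) =", round(p_at_most_3, 4))
```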

5) For sample size 16, the sampling distribution of the mean will be approximately normally distributed
A) if the sample is normally distributed.
B) if the shape of the population is symmetrical.
C) if the sample standard deviation is known.
D) regardless of the shape of the population.

6) A university dean is interested in determining the proportion of students who receive financial aid. Rather than
examine the records for all students, the dean randomly selects 200 students and finds that 118 of them are
receiving financial aid. The 95% confidence interval for π is 0.59 ± 0.07. Interpret this interval.
A) We are 95% confident that the true proportion of all students with financial aid is between 0.52 and 0.66.
B) We are 95% confident that 59% of the students are on financial aid.
C) We are 95% confident that between 52% and 66% of the sampled students receive financial aid.
D) 95% of the students get between 52% and 66% of their tuition paid for by financial aid.
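
The interval quoted in Question 6 can be reproduced with the usual large-sample formula p ± z·√(p(1−p)/n). The sketch below assumes that this normal-approximation interval is the one intended; the question itself does not name the method.

```python
import math
from statistics import NormalDist

successes, n = 118, 200
p_hat = successes / n                         # sample proportion

z = NormalDist().inv_cdf(0.975)               # critical value for 95% confidence
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print("p_hat =", p_hat)
print("margin =", round(margin, 2))
print("interval =", (round(lower, 2), round(upper, 2)))
```

The interval is a statement about the population proportion π, not about the 200 sampled students, whose proportion is known exactly.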

7) A major department store chain is interested in estimating the mean amount its credit card customers spent on
their first visit to the chain's new store in the mall. Fifteen credit card accounts were randomly sampled and
analyzed with the following results: X̄ = $50.50 and S = 20. Assuming the distribution of the amount spent on their
first visit is approximately normal, what is the shape of the sampling distribution of the sample mean that will be
used to create the desired confidence interval for μ?
A) A t distribution with 14 degrees of freedom
B) A t distribution with 15 degrees of freedom
C) A standard normal distribution
D) Approximately normal with a mean of $50.50

8) An economist is interested in studying the incomes of consumers in a particular country. The population
standard deviation is known to be $1,000. A random sample of 50 individuals resulted in a mean income of
$15,000. What is the upper end point in a 99% confidence interval for the average income?
A) $15,364
B) $15,052
C) $15,330
D) $15,141
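
Because the population standard deviation is given in Question 8, the interval can be sketched as X̄ ± z·σ/√n with a standard normal critical value. The code below is a minimal illustration of that calculation.

```python
import math
from statistics import NormalDist

sigma, n, x_bar = 1000.0, 50, 15000.0
confidence = 0.99

# Two-sided critical value: leave (1 - confidence) / 2 in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sigma / math.sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print("z =", round(z, 3))
print("upper end point =", round(upper))
```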

9) Suppose a 95% confidence interval for μ turns out to be (1,000, 2,100). What is meant by "95% confident"?
A) 95% of the observations in the entire population fall in the given interval.
B) In repeated sampling, the population parameter would fall in the given interval 95% of the time.
C) In repeated sampling, 95% of the intervals constructed would contain the population mean.
D) 95% of the observations in the sample fall in the given interval.

10) A pizza chain is considering opening a new store in an area without any such stores. The chain will open if there
is evidence that more than 5,000 of the 20,000 households in the area have a favorable view of its chain. It
conducts a telephone poll of 300 randomly selected households in the area and finds that 96 have a favorable view.
The pizza chain's conclusion from the hypothesis test using a 5% level of significance is:
A) to delay opening a new store until additional evidence is collected.
B) We cannot tell what the decision should be from the information given.
C) to open a new store.
D) not to open a new store.
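
Question 10 can be set up as an upper-tail Z test for a proportion, with H0: π ≤ 0.25 (5,000 of 20,000 households) against H1: π > 0.25. The sketch below assumes that setup and uses the hypothesized proportion in the standard error, as is usual for this test.

```python
import math
from statistics import NormalDist

favorable, n = 96, 300
pi_0 = 5000 / 20000                       # hypothesized proportion under H0
p_hat = favorable / n                     # sample proportion

se_0 = math.sqrt(pi_0 * (1 - pi_0) / n)   # standard error computed under H0
z = (p_hat - pi_0) / se_0
p_value = 1 - NormalDist().cdf(z)         # upper-tail p-value

alpha = 0.05
reject_h0 = p_value < alpha
print("z =", round(z, 2), "p-value =", round(p_value, 4), "reject H0:", reject_h0)
```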

11) If we are performing a two-tail test of whether μ = 100, the probability of detecting a shift of the mean to 105
will be ________ the probability of detecting a shift of the mean to 110.
A) less than
B) equal to
C) not comparable to
D) greater than

12) The marketing manager for an automobile manufacturer is interested in determining the proportion of new
compact-car owners who would have purchased a GPS navigation system if it had been available for an extra cost
of $300. The manager believes from previous information that the proportion is 0.30. Suppose that a survey of 200
new compact-car owners is selected and 79 indicate that they would have purchased the GPS system. If you were
to conduct a test to determine whether there is evidence that the proportion is different from 0.30 and decided not
to reject the null hypothesis, what conclusion could you reach?
A) There is not sufficient evidence that the proportion is not 0.30.
B) There is sufficient evidence that the proportion is 0.30.
C) There is not sufficient evidence that the proportion is 0.30.
D) There is sufficient evidence that the proportion is not 0.30.

13) The quality control engineer for a furniture manufacturer is interested in the mean amount of force necessary
to produce cracks in stressed oak furniture. She performs a two-tail test of the null hypothesis that the mean for
the stressed oak furniture is 650. The calculated value of the Z test statistic is a positive number that leads to a
p-value of 0.080 for the test. Suppose the engineer had decided that the alternative hypothesis to test was that the
mean was greater than 650. What would be the p-value of this one-tail test?
A) 0.160
B) 0.840
C) 0.040
D) 0.960
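
The relationship between two-tail and one-tail p-values in Question 13 can be illustrated by recovering the Z statistic implied by the two-tail p-value and then computing each tail separately. This is a sketch for a Z test with a positive statistic, as the question describes.

```python
from statistics import NormalDist

std_normal = NormalDist()
p_two_tail = 0.080

# A positive Z with two-tail p-value 0.080 leaves 0.040 in the upper tail.
z = std_normal.inv_cdf(1 - p_two_tail / 2)

p_upper_tail = 1 - std_normal.cdf(z)   # p-value if H1 is "mean > 650"
p_lower_tail = std_normal.cdf(z)       # p-value if H1 were "mean < 650"

print("z =", round(z, 3))
print("upper-tail p-value =", round(p_upper_tail, 3))
print("lower-tail p-value =", round(p_lower_tail, 3))
```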

14) The t test for the difference between the means of 2 independent populations assumes that the respective
A) sample variances are equal.
B) populations are approximately normal.
C) sample sizes are equal.
D) All of the above.

15) A researcher randomly sampled 30 graduates of an MBA program to study the effect of gender on starting
salaries. The result of the pooled-variance t test of the mean salaries of the females (Population 1) and males
(Population 2) in the sample is given below.

Suppose the researcher was attempting to show statistically that the female MBA graduates have a significantly lower
mean starting salary than the male MBA graduates. What is the alternative hypothesis?
A) H1: μ females ≠ μ males
B) H1: μ females > μ males
C) H1: μ females = μ males
D) H1: μ females < μ males

16) A large national bank charges local companies for using its services. A bank official reported the results of a
regression designed to predict the bank's charges (Y), measured in dollars per month, for services rendered to local
companies. One independent variable used to predict service charges to a company is the company's sales revenue
(X), measured in millions of dollars. Data for 21 companies that use the bank's services were used to fit the model:
Yi = β0 + β1Xi + εi
The results of the simple linear regression are provided below.
Ŷ = -2,700 + 20X, S_YX = 65, two-tail p-value = 0.034 (for testing β1)

A 95% confidence interval for β1 is (15, 30). Interpret the interval.


A) You are 95% confident that mean service charge (Y) will increase between $15 and $30 for every $1 million
increase in sales revenue (X).
B) You are 95% confident that the sales revenue (X) will increase between $15 and $30 million for every $1 increase
in service charge (Y).
C) At the α = 0.05 level, there is no evidence of a linear relationship between service charge (Y) and sales revenue
(X).
D) You are 95% confident that the mean service charge will fall between $15 and $30 per month.

SCENARIO 13-12
The manager of the purchasing department of a large saving and loan organization would like to develop a model
to predict the amount of time (measured in hours) it takes to record a loan application. Data are collected from a
sample of 30 days, and the number of applications recorded and completion time in hours is recorded. Below is the
regression output:

17) Referring to Scenario 13-12, the value of the measured t test statistic to test whether the amount of time
depends linearly on the number of loan applications recorded is
A) 15.2388
B) 3.2559
C) 0.8924
D) 232.2200

18) The standard error of the mean


A) measures the variability of the mean from sample to sample.
B) decreases as the sample size increases.
C) is never larger than the standard deviation of the population.
D) All of the above.

19) At a computer manufacturing company, the actual size of computer chips is normally distributed with a mean of
1 centimeter and a standard deviation of 0.1 centimeter. A random sample of 12 computer chips is taken. What is
the standard error for the sample mean?
A) 0.029
B) 0.050
C) 0.120
D) 0.091

20) A confidence interval was used to estimate the proportion of statistics students who are female. A random
sample of 72 statistics students generated the following 90% confidence interval: (0.438, 0.642). Using the
information above, what total size sample would be necessary if we wanted to estimate the true proportion to
within ±0.08 using 95% confidence?
A) 420
B) 105
C) 597
D) 150
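
For Question 20, one common approach is to take the midpoint of the earlier interval as the planning estimate of the proportion and apply n = z²·p(1−p)/e², rounding up. The sketch below follows that approach; using the interval midpoint as the estimate is an assumption about how the question is meant to be read.

```python
import math
from statistics import NormalDist

ci_lower, ci_upper = 0.438, 0.642
p_est = (ci_lower + ci_upper) / 2      # midpoint of the earlier interval = sample proportion

error = 0.08                           # desired sampling error
z = NormalDist().inv_cdf(0.975)        # critical value for 95% confidence

n_exact = z ** 2 * p_est * (1 - p_est) / error ** 2
n_required = math.ceil(n_exact)        # always round a sample size up

print("planning estimate of proportion:", round(p_est, 2))
print("required sample size:", n_required)
```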

21) An appliance manufacturer claims to have developed a compact microwave oven that consumes a mean of no
more than 250 W. From previous studies, it is believed that power consumption for microwave ovens is normally
distributed with a population standard deviation of 15 W. A consumer group has decided to try to discover if the
claim appears true. They take a sample of 20 microwave ovens and find that they consume a mean of 257.3 W.
What are the appropriate hypotheses to determine if the manufacturer's claim appears reasonable?
A) H0: μ ≤ 250 versus H1: μ > 250.
B) H0: μ ≥ 257.3 versus H1: μ < 257.3.
C) H0: μ ≥ 250 versus H1: μ < 250.
D) H0: μ = 250 versus H1: μ ≠ 250.

22) Major league baseball salaries averaged $3.26 million with a standard deviation of $1.2 million in a certain year
in the past. Suppose a sample of 100 major league players was taken. Find the approximate probability that the
mean salary of the 100 players exceeded $4.0 million.
A) 0.0228
B) Approximately 0
C) Approximately 1
D) 0.9772

23) When determining the sample size for a proportion for a given level of confidence and sampling error, the
closer to 0.50 that π is estimated to be, the sample size required ________.
A) can be smaller, larger or unaffected
B) is larger
C) is smaller
D) is not affected

24) Which of the following statements is not true about the level of significance in a hypothesis test?
A) The significance level is also called the α level.
B) The level of significance is the maximum risk we are willing to accept in making a Type I error.
C) The significance level is another name for Type II error.
D) The larger the level of significance, the more likely you are to reject the null hypothesis.

25) A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their
product. To do this, the company randomly chooses 6 small cities and offers the candy bar at different prices. Using
candy bar sales as the dependent variable, the company will conduct a simple linear regression on the data below:

City          Price ($)   Sales
River Falls   1.30        100
Hudson        1.60        90
Ellsworth     1.80        90
Prescott      2.00        40
Rock Elm      2.40        38
Stillwater    2.90        32

What is the estimated slope for the candy bar price and sales data?
A) 0.784
B) 161.386
C) -3.810
D) -48.193
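
The least-squares slope in Question 25 can be checked by hand with b1 = SSXY / SSX and b0 = Ȳ − b1·X̄. The sketch below applies those formulas to the six cities in the table; the variable names are illustrative.

```python
prices = [1.30, 1.60, 1.80, 2.00, 2.40, 2.90]   # X: price in dollars
sales = [100, 90, 90, 40, 38, 32]               # Y: candy bar sales

n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(sales) / n

# Sums of cross-products and squares around the means.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, sales))
ss_x = sum((x - x_bar) ** 2 for x in prices)

slope = ss_xy / ss_x                  # b1: change in sales per $1 change in price
intercept = y_bar - slope * x_bar     # b0

print("slope =", round(slope, 3))
print("intercept =", round(intercept, 3))
```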