STAR Rando Questions Stats
Why look at regression statistics?
• Regression is used when a researcher wants to establish a relationship between variables as a basis for PREDICTION
--> Ex. Good for clinical decision making and predicting quantifiable clinical outcomes (prognosis)
2 types of regression analysis:
1) Linear regression
2) Nonlinear regression
1) NONLINEAR REGRESSION
NOTE: applying a linear regression line to curvilinear data will not allow us to get a full hold on all the data. We will miss some of the "moments".
How do we adjust for nonlinear regression?
We use a different formula! It is called the QUADRATIC EQUATION, which defines the quadratic curve.
Quadratic equation (def)
Y-hat = a + b1X + b2X^2
polynomial regression (def)
process of deriving the quadratic equation
Researchers must decide to use a linear or curvilinear regression model based on visual inspection of the scatter plot and then doing an analysis of variance to determine if their pick was the right model for the set of data. After doing the analysis of variance for each equation, what variable is used to determine which one is better?
lowest p-value!
1) do an analysis of variance for both linear and nonlinear regression lines
2) lowest p-value will tell you which line to use
linear regression line (def)
• In a perfect correlation, all data points will fall on a straight line
When r < 1, 3 things occur:
2) Some data points will be above the regression line and some below. Some will fall far from the line and some will fall closely
the line that gives you the SMALLEST sum of squares is the line of best fit
why do we want the line that gives us the SMALLEST sum of squares?
the smaller the sum of squares, the closer the data points will be to the line of best fit
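The "smallest sum of squares" idea can be sketched in a few lines of Python. This is only an illustration: the ages and BP values are hypothetical numbers made up for the example (they are not from the notes), and `fit_line` and `sum_of_squares` are helper names chosen here.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustration data (assumed, not from the notes)
ages = [30, 40, 50, 60, 70]
bp = [118, 125, 131, 142, 149]

a, b = fit_line(ages, bp)
best = sum_of_squares(ages, bp, a, b)
shifted = sum_of_squares(ages, bp, a + 2, b)      # same slope, moved up by 2
tilted = sum_of_squares(ages, bp, a, b + 0.05)    # same intercept, steeper
print(a, b, best, shifted, tilted)
```

Any other line, such as the shifted or tilted one, gives a larger sum of squares than the fitted line, which is why the fitted line is called the line of best fit.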
ASSUMPTIONS FOR REGRESSION ANALYSIS
What 2 assumptions do we make:
1) For any given value of X, a distribution of Y scores exists
--> MEANING, if we sampled more subjects, we'd see a distribution of Y scores for each value of X
--> ex. The BP score in our data set is a random observation from the larger population distribution of all possible BP scores for that age
REGRESSION
Study online at https://quizlet.com/_190zv4
• Dependent: clinical performance
• Covariate: GPA
need to adjust means of the covariates to look at the true regression relationship
• After making the covariate means the same, we can use regression lines to predict what the mean score for the X and Y variables would be if the covariate were not present, by comparing the covariate to the dependent variable
why do we compare the covariate to the dependent variable?
the dependent variable is the one that will make the largest shift if a covariate is affecting the results of the data.
the independent variable stands alone and will not be affected
• 1) Linearity of the covariate
• 2) Homogeneity of slopes
2) Homogeneity of slopes
--> Testing for homogeneity of slopes should be done BEFORE the ANCOVA
3) Independence of the covariate
• Do not want the indep variable to influence the value of the covariate
how many covariates can we examine at once?
as many as we want!
• They must all be related to the dependent variable and independent of the indep variable
• Ex. If we wanted to compare strength @ different ranges --> covariates may = height, weight, limb girth, BMI, etc
When is ANCOVA run?
after ANOVA and initial regression line has been calculated
• Although ANCOVA does increase power, it does not provide a safeguard against problems in study design --> the ANCOVA is not a substitute for using/not using randomization and a good study design
chi square
Study online at https://quizlet.com/_5xslur
The actual counts in the cells of a contingency table are referred to as the expected cell frequencies.
Answer: False
When we carry out a chi-square test for independence, the null hypothesis states that the two relevant classifications ___.
Answer: D. are statistically independent.
Which of the following statements about the chi-square test of independence is false?
Answer: The alternative hypothesis states that the two classifications are statistically independent. (It is the null hypothesis that states independence.)
NOT:
• If r_i is the row total for row i and c_j is the column total for column j, then the estimated expected cell frequency corresponding to row i and column j equals r_i c_j / n.
• The chi-square test is valid if all of the estimated expected cell frequencies are at least 5.
• The chi-square statistic is based on (r-1)(c-1) degrees of freedom, where r and c denote the number of rows and columns respectively in the contingency table.
• None of the above.
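The expected-frequency rule (row total times column total, divided by n) and the (rows - 1)(columns - 1) degrees of freedom can be sketched as below. The 2x3 table is hypothetical and was made up for the example, and `chi_square_independence` is a helper name chosen here, not a library function.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for row i, column j is (row total i * column total j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Hypothetical 2x3 table of observed counts (assumed, not from the cards)
observed = [[20, 30, 50],
            [30, 30, 40]]
stat, df, expected = chi_square_independence(observed)
print(round(stat, 4), df, expected)
```

If the observed counts equal the expected counts in every cell, each squared difference is zero, so the statistic is zero.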
In a contingency table, when all the expected frequencies equal the observed frequencies the calculated chi-square statistic equals zero.
Answer: True
When using chi-square goodness of fit test with multinomial probabilities, the rejection of the null hypothesis indicates that at least one of the multinomial probabilities is not equal to the value stated in the null hypothesis.
Answer: True
A fastener manufacturing company uses chi-square goodness of fit test to determine if the population of all lengths of 1/4 inch bolts it manufactures is distributed according to a normal distribution. If we reject the null hypothesis, it is reasonable to assume that the population distribution is at least approximately normally distributed.
Answer: False
When using the chi-square goodness of fit test, if the value of the chi-square statistic is large enough, we reject the null hypothesis
Answer: True
In performing a chi-square test of independence, as the difference between the respective observed and expected frequencies decreases, the probability of concluding that the row variable is independent of the column variable
Answer: increases
NOT: decreases; may decrease or increase depending on the number of rows and columns; remains the same.
A chi-square goodness of fit test is considered to be valid if each of the expected cell frequencies is ___________________.
Answer: E. at least 5.
NOT: greater than zero; less than 5; between 0 and 5; at most 1
The chi-square statistic from a contingency table with 6 rows and five columns will have _____ degrees of freedom
Answer: 20
NOT: 30, 24, 5, 25
The chi-square goodness of fit test is _________ a one-tailed test with the rejection point in the right tail.
Answer: always
NOT: sometimes or never
The chi-square statistic is used to test whether the assumption of normality is reasonable for a given population distribution. The sample consists of 5000 observations and is divided into 6 categories (intervals). The degrees of freedom for the chi-square statistic is:
Answer: 3
NOT: 4999, 6, 5, 4
The process of using sample statistics to draw conclusions about true population parameters is called:
Answer: Statistical inference
The classification of student class designation (freshman, sophomore, junior, senior) is an example of:
Answer: Categorical random variable
To monitor campus security, the campus police office is taking a survey of the number of students in a parking lot each 30 minutes of a 24-hour period with the goal of determining when patrols of the lot would serve the most students. If X is the number of students in the lot each period of time, then X is an example of:
Answer: a discrete random variable
The personnel director at a large company studied the eating habits of the company's employees. The director noted whether employees brought their own lunches to work, ate at the company cafeteria, or went out to lunch. The goal of the study was to improve the food service at the company cafeteria. This type of data collection would best be considered as:
Answer: an observational study
A sample of 200 students at a Big-Ten university was taken after
the midterm to ask them whether or not they studied for the exam
before the midterm, and whether they did well or poorly on the
midterm. The following table contains the result.
Referring to the above table, of those who did not study for the
exam, _______ percent of them did well on the midterm.
An insurance company evaluates many numerical variables about a person before deciding on an appropriate rate for automobile insurance. A representative from a local insurance agency selected a random sample of insured drivers and recorded, X, the number of claims each made in the last 3 years, with the following results.
X f
Answer: 50
Which measure of central tendency can be used for both numerical and categorical variables?
Answer: mode
Which of the following is NOT a measure of central tendency?
Answer: interquartile range
A business venture can result in the following outcomes (with their corresponding chance of occurring in parentheses): Highly Successful (10%), Successful (25%), Break Even (25%), Disappointing (20%), and Highly Disappointing (?). If these are the only outcomes possible for the business venture, what is the chance that the business venture will be considered Highly Disappointing?
Answer: 20%
The employees of a company were surveyed on questions regarding their educational background and marital status. Of the 600 employees, 400 had college degrees, 100 were single, and 60 were single college graduates. The probability that an employee of the company is single or has a college degree is:
Answer: 0.733
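The 0.733 result follows from the general addition rule, P(A or B) = P(A) + P(B) - P(A and B). A minimal sketch using the counts from the employee card (`prob_a_or_b` is a helper name chosen here):

```python
def prob_a_or_b(p_a, p_b, p_both):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both


# Employee card: 600 employees, 400 with degrees, 100 single, 60 both
p_single_or_degree = prob_a_or_b(400 / 600, 100 / 600, 60 / 600)
print(round(p_single_or_degree, 3))
```

Subtracting the overlap keeps the 60 single college graduates from being counted twice.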
Thirty-six of the staff of 80 teachers at a local school are certified in CPR. In 180 days of school, about how many days can we expect that the teacher on bus duty will likely be certified?
Answer: 81
If n=10 and p=0.70, then the mean of the binomial distribution is: 7.00
A professor receives, on average, 24.7 e-mails from her students per 24 hour period on the day before the midterm exam. To compute the probability of receiving at least 10 e-mails on such a day, what type of probability distribution should be used?
Answer: Poisson distribution
For some positive value of Z, the probability that a standard normal variable is between 0 and Z is 0.3770. Therefore the value of Z is:
Answer: 1.16
For some positive value of X, the probability that a standard normal variable is between 0 and +1.5X is equal to 0.4332. Therefore the value of X is:
Answer: 1.00
If a particular batch of data is approximately normally distributed, we would find that approximately
• 2 of every 3 observations would fall between 1 standard deviation above and below the mean.
• 4 of every 5 observations would fall between 1.28 standard deviations above and below the mean.
• 19 of every 20 observations would fall between 2 standard deviations above and below the mean.
Answer: ALL OF THE ABOVE
A company that sells annuities must base the annual payout on the probability distribution of the length of life of the participants in the plan. Suppose the probability distribution of the lifetimes of the participants is approximately normally distributed with a mean of 68 years of age and a standard deviation of 3.5 years. What proportion of the plan recipients would receive payments beyond age 75?
Answer: 0.0228
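The normal-curve cards above (a Z value from an area, and the proportion beyond age 75) can be checked with Python's standard-library `statistics.NormalDist` instead of a printed Z table. This is a sketch of the lookup, not a replacement for knowing how to read the table.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between 0 and Z is 0.3770, so the area to the left of Z is 0.5 + 0.3770
z_value = standard.inv_cdf(0.5 + 0.3770)

# Annuity card: lifetimes ~ Normal(mean=68, sd=3.5); proportion beyond age 75
lifetimes = NormalDist(mu=68, sigma=3.5)
p_beyond_75 = 1 - lifetimes.cdf(75)

print(round(z_value, 2), round(p_beyond_75, 4))
```

Age 75 is (75 - 68) / 3.5 = 2 standard deviations above the mean, so the second result is the right-tail area beyond Z = 2.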
You work at Smuckers in Orrville, Ohio and are in charge of the grape jelly production line. The weight of jars of grape jelly and the number of jars at each weight for the last month (4 weeks) are as follows:
Weight    # Jars
8.0 oz.   300
8.5 oz.   287
8.9 oz.   128
9.0 oz.   67
Answer: 8.42 oz
You work at Smuckers in Orrville, Ohio and are in charge of the grape jelly production line. The weight of jars of grape jelly and the number of jars at each weight for the last month (4 weeks) are as follows:
Answer: 0.1366
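The 8.42 oz answer is consistent with a weighted (grouped-data) mean of the jar weights, although the card itself does not say what is being asked. A sketch under that assumption:

```python
# Weights (oz) and jar counts from the table above
weights = [8.0, 8.5, 8.9, 9.0]
jars = [300, 287, 128, 67]

total_jars = sum(jars)
# Each weight counts once per jar, so multiply by its frequency before summing
mean_weight = sum(w * f for w, f in zip(weights, jars)) / total_jars

print(total_jars, round(mean_weight, 2))
```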
There are 5 starting players on the Bulldog basketball team. Each player wanted to improve her shooting percentage, especially Hannah. The number of baskets Hannah made in the last 10 games are listed below:
12,13,21,18,19,8,4,22,12,27
Calculate the mean, median, mode, variance and standard deviation and range and list them in that order.
Answer: 15.6, 15.5, 12, 49.16, 7.01, 23
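The six summary values for Hannah's baskets can be reproduced with Python's standard `statistics` module. The variance and standard deviation here are the sample versions (dividing by n - 1), which is what matches the listed answer.

```python
import statistics

baskets = [12, 13, 21, 18, 19, 8, 4, 22, 12, 27]

summary = (
    statistics.mean(baskets),
    statistics.median(baskets),
    statistics.mode(baskets),
    round(statistics.variance(baskets), 2),  # sample variance (n - 1)
    round(statistics.stdev(baskets), 2),     # sample standard deviation
    max(baskets) - min(baskets),             # range
)
print(summary)
```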
The width of a confidence interval estimate for a proportion will be: narrower for 90% confidence than for 95% confidence
If you were constructing a 99% confidence interval of the population mean based on a sample of n=25 where the standard deviation of the sample s=0.05, the critical value of t will be:
Answer: 2.7969
A confidence interval was used to estimate the proportion of statistics students who are females. A random sample of 72 students generated a 90% confidence interval of 0.438 to 0.642. Therefore, it is reasonable to assume that the population of female students should be above 50%.
Answer: False
Suppose a 95% confidence interval turns out to be 1,000 to 2,100. If the researcher wishes to reduce the width of the interval, one way he can do that is to increase the sample size.
Answer: True
A major department store chain is interested in estimating the average amount its credit card customers spent on their first visit to the store. 85 accounts were randomly selected. The 95% confidence interval was $56.89 to $145.78. Therefore, the store manager can be 95% confident that
Answer: most of his customers are spending more than $50
If the calculated Z-score falls in the rejection region (in the tail of the bell curve), then the null hypothesis is:
Answer: rejected
A Type II error is committed when we reject a null hypothesis that is true.
Answer: False
How many Kleenex should a Kimberly Clark Corp. package of tissues contain? Researchers determined that, on average, a person uses 60 tissues during a cold, and the population standard deviation is 22. Suppose a random sample of 100 Kleenex users yielded a mean of 52. Test the following hypothesis at the 5% significance level.
Answer: Critical Value = 1.96; Calculated Z = -3.64 (3.64 in absolute value); Reject the null hypothesis
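The Kleenex test can be worked as a one-sample Z test. The card does not show the hypotheses, so this sketch assumes a two-tailed test of a population mean of 60, which is what the 1.96 critical value suggests. The calculated Z comes out negative because the sample mean is below 60; its magnitude is the 3.64 reported on the card.

```python
from math import sqrt
from statistics import NormalDist

# Values from the card; mu_0 = 60 as the null value is an assumption
mu_0, sigma, n, sample_mean = 60, 22, 100, 52

z_calc = (sample_mean - mu_0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, 5% level
reject_null = abs(z_calc) > z_crit

print(round(z_calc, 2), round(z_crit, 2), reject_null)
```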
Given your conclusion in question #8, you should recommend that Kimberly Clark Corp. put more than 60 tissues in each box.
Answer: False
In a sample of 265 subjects, the average score on an examination was 63.8. Historically, the population standard deviation has been σ = 3.08. What is the 95% confidence interval estimate for µ?
Answer: [63.43 to 64.17]
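Because the population standard deviation is given, the interval on this card is a Z interval: sample mean plus or minus z times sigma over the square root of n. A minimal sketch (`z_interval` is a helper name chosen here):

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


low, high = z_interval(mean=63.8, sigma=3.08, n=265)
print(round(low, 2), round(high, 2))
```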
When testing for independence in a contingency table with 3 rows and 4 columns, there are ______ degrees of freedom.
Answer: 6
If we wish to determine whether there is evidence that the proportion of items of interest is the same in group 1 as in group 2, the appropriate tests to use are the Z test and the chi-square test
Answer: True
A manufacturing company produces bike frames and makes 1200 per year. These bike frames can be produced using three different processes. Each process has a unique cost associated with it. The CEO wants to know if the manufacturing processes are related to the overall cost of bike frame production. The following contingency table gives the number of bike frames produced with each process and cost level. At a significance level of .05, perform the chi-square test of independence to determine if production process is related to production cost.
Answer: 5.99 CRITICAL VALUE; 12.56 CHI SQUARE VALUE
In the past, of all the students enrolled in "Basic Business Statistics" 10% earned A's, 20% earned B's, 30% earned C's, 20% earned D's and the remaining 20% either failed or withdrew from the course.
Use α = .05, and determine if the grade distribution for Dr. Johnson's class is different than the historical grade distribution.
In a contingency table, when all the expected frequencies equal the observed frequencies the calculated Chi-square statistic equals zero.
Answer: True
The actual counts in the cells of a contingency table are referred to as the expected cell frequencies
Answer: False
In general, the degrees of freedom in a contingency table are equal to:
Answer: (rows -1)(columns-1)
The chi-square goodness of fit is always a one-tailed test with the rejection region in the right tail.
Answer: True
When conducting the chi-square goodness of fit test for your Integrated Research Project, you will be using "Agree, Disagree and No opinion" for your survey responses. Therefore the degrees of freedom used to obtain the critical value is 2.
Answer: False
The Dalton Baseball Team is trying to analyze the number of hits their team got last season in order to train in the off season and improve their record for next season. The number of hits each of the 15 players received are as follows:
Baseball Player   # Hits
1    34
2    21
3    14
4    13
5    15
6    16
7    19
8    21
9    9
10   11
11   18
12   21
13   14
14   21
15   16
Answer choices:
18, 21, 15
21, 16.53, 23
X 17.53, 16, 21
The weather is bad for the Northern Ohio area and storms are being forecasted. If the probability of a thunderstorm hitting Canton, Ohio, is 67%, and the probability of hail is 34%, and the probability of both a thunderstorm and hail hitting is 28%, what then is the probability of either a thunderstorm or hail hitting the Canton, Ohio area today?
Answer: 73%
The t distribution approaches the _______________ as the sample size ___________.
Answer: Z, increases
When the sample size and sample standard deviation remain the same, a 99% confidence interval for a population mean, µ will be _________________ the 95% confidence interval for µ
Answer: Wider than
When the population is normally distributed, population standard deviation σ is unknown, and the sample size is n = 15; the confidence interval for the population mean µ is based on:
Answer: The t distribution
In a manufacturing process a random sample of 36 bolts manufactured has a mean length of 3 inches with a standard deviation of .3 inches. What is the 99% confidence interval for the true mean length of the bolt?
Answer: 2.865 to 3.136
The chi-square goodness of fit test for multinomial probabilities with 5 categories has _____ degrees of freedom
Answer: 4
A U.S. based internet company offers an on-line proficiency course in basic accounting. Completion of this online course satisfies the "Fundamentals of Accounting" course requirement in many MBA programs. In the first semester 315 students have enrolled in the course. The marketing research manager divided the country into seven regions of approximately equal populations. The course enrollment values in each of the seven regions are given below. The management wants to know if there is equal interest in the course across all regions. 45,60,30,40,50,55,35
Answer: 15.56
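The 15.56 value is the chi-square goodness-of-fit statistic, the sum of (observed - expected)^2 / expected, with "equal interest" meaning the same expected count in every region. A minimal sketch (`chi_square_gof` is a helper name chosen here):

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


enrollment = [45, 60, 30, 40, 50, 55, 35]
# Equal interest: 315 students spread evenly over 7 regions, 45 per region
expected = [sum(enrollment) / len(enrollment)] * len(enrollment)

stat = chi_square_gof(enrollment, expected)
df = len(enrollment) - 1  # categories minus 1

print(round(stat, 2), df)
```

The statistic is then compared with the right-tail critical value for 6 degrees of freedom at the chosen significance level.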
You have diligently collected your surveys and crunched all of the numbers for your primary research on your IRP. One of the questions you asked was as follows:
The Orrville Public Library would see an increase in number of customers if they extended their evening hours to stay open until 11:00 pm on weekend nights.
You had 57 surveys collected from library administrators and staff, with 39 Agree and 10 disagree. What is the calculated Chi-Square value for this test?
Answer: 17.16
You have diligently collected your surveys and crunched all of the numbers for your primary research on your IRP. One of the questions you asked was as follows:
The Orrville Public Library would see an increase in number of customers if they extended their evening hours to stay open until 11:00 pm on weekend nights.
You had 57 surveys collected from library administrators and staff, with 39 Agree and 10 disagree. Based on the calculated Chi-square value from above, what is the critical value you will compare that to and is this data now proven to be significant?
Answer: The critical value is 3.84, and yes it is significant
Statistics 101
Data Analysis and Statistical Inference
In-class problems on confidence intervals
Decide whether the following statements are true or false. Explain your reasoning.
Problems:
a) For a given standard error, lower confidence levels produce wider confidence intervals.
False. To get higher confidence, we need to make the interval wider. This is evident in the multiplier,
which increases with confidence level.
b) If you increase sample size, the width of confidence intervals will increase.
False. Increasing the sample size decreases the width of confidence intervals, because it decreases the standard
error.
c) The statement, "the 95% confidence interval for the population mean is (350, 400)", is equivalent to the
statement, "there is a 95% probability that the population mean is between 350 and 400".
False. 95% confidence means that we used a procedure that works 95% of the time to get this interval. That is,
95% of all intervals produced by the procedure will contain their corresponding parameters. For any one
particular interval, the true population mean is either inside the interval or outside the interval. In this
case, it is either in between 350 and 400, or it is not in between 350 and 400. Hence, the probability that the
population mean is in between those two exact numbers is either zero or one.
d) To reduce the width of a confidence interval by a factor of two (i.e., in half), you have to quadruple the
sample size.
True, as long as we're talking about a CI for a population percentage. The standard error for a population
percentage has the square root of the sample size in the denominator. Hence, increasing the sample size by a
factor of 4 (i.e., multiplying it by 4) is equivalent to multiplying the standard error by 1/2. Hence, the interval
will be half as wide. This also works approximately for population averages as long as the multiplier from the t-
curve doesn't change much when increasing the sample size (which it won't if the original sample size is large).
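The square-root relationship in (d) can be checked numerically. This sketch uses the usual standard error for a sample proportion, sqrt(p(1 - p)/n); the value p = 0.5 and the sample sizes 400 and 1600 are hypothetical numbers chosen only for illustration.

```python
from math import sqrt


def se_proportion(p, n):
    """Standard error of a sample proportion."""
    return sqrt(p * (1 - p) / n)


p_hat = 0.5  # hypothetical sample proportion (assumed for the example)
se_small = se_proportion(p_hat, 400)
se_large = se_proportion(p_hat, 1600)  # sample size multiplied by 4

ratio = se_large / se_small
print(se_small, se_large, ratio)
```

With the same multiplier, the interval width is proportional to the standard error, so a ratio of one half means the interval is half as wide.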
e) Assuming the central limit theorem applies, confidence intervals are always valid.
By "valid," we mean that the confidence interval procedure has a 95% chance of producing an interval that
contains the population parameter.
False. The central limit theorem is needed for confidence intervals to be valid. However, it is also necessary
that the data be collected from random samples. Confidence intervals will not remedy poorly collected data.
f) The statement, "the 95% confidence interval for the population mean is (350, 400)" means that 95% of the
population values are between 350 and 400.
False. The confidence interval is a range of plausible values for the population average. It does not provide a
range for 95% of the data values from the population. To find the percentage of values in the population
between 350 and 400, we need to look at a histogram of the data values and determine what percentage of
observations are between 350 and 400.
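g) If you take large random samples over and over again from the same population, and make 95% confidence
intervals for the population average, about 95% of the intervals should contain the population average.
True. This is what 95% confidence means: the procedure produces intervals that contain the population average
about 95% of the time.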
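The repeated-sampling idea in (g) can be illustrated with a small simulation. Everything here is an assumption made for the sketch: a normal population with mean 375 and standard deviation 50, samples of size 100, 2000 repetitions, a fixed random seed, and a Z interval that treats the population standard deviation as known.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(trials=2000, n=100, mu=375.0, sigma=50.0, seed=1):
    """Fraction of simulated 95% intervals that contain the population mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)
    margin = z * sigma / sqrt(n)  # known-sigma Z interval (an assumption)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage()
print(rate)
```

The printed rate should land close to 0.95, and each individual interval either contains 375 or it does not.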
h) If you take large random samples over and over again from the same population, and make 95% confidence
intervals for the population average, about 95% of the intervals should contain the sample average.
False. The confidence interval is a range for the population average, not for the sample average. In fact, every
confidence interval contains its corresponding sample average, because CIs are of the form: sample avg. +/-
multiplier × SE. So, the sample average is right in the middle of the CI.
i) It is necessary that the distribution of the variable of interest follows a normal curve.
False. It is necessary that the distribution of the sample average follows a normal curve. The data values of the
variable, however, need not follow a normal curve, because if the sample size is large enough the central limit
theorem for the sample average will apply.
j) A 95% confidence interval obtained from a random sample of 1000 people has a better chance of containing
the population percentage than a 95% confidence interval obtained from a random sample of 500 people.
False. All 95% confidence intervals have the property that they come from a procedure that has a 95% chance of
yielding an interval that contains the true value. The confidence interval method automatically accounts for
sample size in the standard error. A 95% CI with n=1000 will be narrower than a 95% CI with n=500, but both
CIs will have 95% confidence of containing the population percentage.
k) If you go through life making 99% confidence intervals for all sorts of population means, about 1% of
the time the intervals won't cover their respective population means.
True. Since 99% of the intervals should contain the corresponding population mean, 1% of them will not.