Comparison of Means: Hypothesis Testing


Hypothesis Testing

In hypothesis testing, an analyst tests a statistical sample, with the goal of providing evidence on the plausibility of the null hypothesis.

Statistical analysts test a hypothesis by measuring and examining a random sample of the population being analyzed. All analysts use a random population sample to test two different hypotheses: the null hypothesis and the alternative hypothesis.

The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is effectively the opposite of a null hypothesis (e.g., the population mean return is not equal to zero). Thus, the two hypotheses are mutually exclusive and exhaustive: only one of them can be true, and one of them will always be true.

Comparison of Means
There are many cases in statistics where you’ll want to compare means for two populations or
samples. Which technique you use depends on what type of data you have and how that data is
grouped together.

The four major ways of comparing means from data that is assumed to be normally distributed are:
1. Independent Samples T-Test. Use the independent samples t-test when you want to compare means for two data sets that are independent from each other (a short code sketch follows this list).
2. One Sample T-Test. Choose this when you want to compare the mean of one data set to a specified constant (like the mean from a hypothetical normal distribution).
3. Paired Samples T-Test. Use this test if you have one group tested at two different times. In other words, you have two measurements on the same item, person, or thing. The groups are "paired" because there are intrinsic connections between them (i.e. they are not independent). This comparison of means is often used for groups of patients before treatment and after treatment, or for students tested before remediation and after remediation.
4. One-Way Analysis of Variance (ANOVA). Although not really a test for comparison of means, ANOVA is the main option when you have more than two levels of an independent variable. For example, if your independent variable were "brand of coffee", your levels might be Starbucks, Peet's and Trader Joe's. Use this test when you have a group of individuals randomly split into smaller groups and completing different tasks (like drinking different coffees).
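
As a rough illustration of the first case, here is a minimal Python sketch of an independent-samples t-test (assuming SciPy is available; the two groups below are made-up scores, not data from these notes):

```python
# A minimal sketch of an independent-samples t-test using SciPy.
# The two arrays are made-up illustrative scores.
from scipy import stats

group_a = [23.1, 25.4, 22.8, 26.0, 24.5, 23.9]
group_b = [27.2, 26.8, 28.1, 25.9, 27.5, 26.3]

# ttest_ind compares the means of two independent samples.
# equal_var=False gives Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# If p < 0.05, reject the null hypothesis that the two population means are equal.
```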

Testing a correlation for significance:

• Null Hypothesis: H0: ρ = 0
• Alternate Hypothesis: Ha: ρ ≠ 0
• The symbol for the population correlation coefficient is ρ, the Greek letter "rho."
• ρ = population correlation coefficient (unknown)
• r = sample correlation coefficient (known; calculated from sample data)

Testing a proportion:

The One-Sample Proportion Test is used to assess whether a population proportion (P1) is
significantly different from a hypothesized value (P0). This is called the hypothesis of inequality. The
hypotheses may be stated in terms of the proportions, their difference, their ratio, or their odds
ratio, but all four hypotheses result in the same test statistics.
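
A minimal sketch of the one-sample proportion test, assuming the statsmodels package is available; the counts (58 successes out of 100 trials, tested against P0 = 0.5) are made up for illustration:

```python
# A minimal sketch of a one-sample proportion test with statsmodels.
from statsmodels.stats.proportion import proportions_ztest

count, nobs, p0 = 58, 100, 0.5
z_stat, p_value = proportions_ztest(count, nobs, value=p0)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
# A small p-value suggests the population proportion differs from the hypothesized value P0.
```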

If the test concludes that the correlation coefficient is significantly different from zero,
we say that the correlation coefficient is “significant.”

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant."
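
A minimal sketch of testing H0: ρ = 0 against Ha: ρ ≠ 0 with SciPy; the paired x and y values are made up:

```python
# pearsonr returns the sample correlation r and the two-sided p-value for H0: rho = 0.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
# If p < 0.05 we call the correlation coefficient "significant".
```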

The one-sample t-test can be used only when the data are normally distributed. The one-sample t-test is used to compare the mean of one sample to a known standard (or theoretical/hypothetical) mean (μ).

What Is a T-Test?
A t-test is a type of inferential statistic used to determine if there is a significant
difference between the means of two groups, which may be related in certain
features. It is mostly used when the data sets, like the data set recorded as
the outcome from flipping a coin 100 times, would follow a normal distribution
and may have unknown variances. A t-test is used as a hypothesis testing
tool, which allows testing of an assumption applicable to a population.

• Calculating a t-test requires three key data values. They include the
difference between the mean values from each data set (called the
mean difference), the standard deviation of each group, and the number
of data values of each group.
• There are several different types of t-test that can be performed
depending on the data and type of analysis required.
• A large t-score indicates that the groups are different.
• A small t-score indicates that the groups are similar.

Z-TEST

• A z-test is a statistical test to determine whether two population means are different when the variances are known and the sample size is large.
• It can be used to test hypotheses in which the test statistic follows a normal distribution.

• A z-statistic, or z-score, is a number representing the result from the z-
test.
• Z-tests are closely related to t-tests, but t-tests are best performed when
an experiment has a small sample size.
• Also, t-tests assume the standard deviation is unknown, while z-tests
assume it is known.
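
A minimal sketch of a two-sample z-test using statsmodels' ztest helper; the samples are simulated, and in practice a z-test presumes large samples with a known variance:

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(0)
sample_1 = rng.normal(loc=100, scale=15, size=200)
sample_2 = rng.normal(loc=103, scale=15, size=200)

# H0: mean1 - mean2 = 0
z_stat, p_value = ztest(sample_1, sample_2, value=0)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
```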

F TEST :

An F-test is any statistical test in which the test statistic has an F-distribution under the null
hypothesis. It is most often used when comparing statistical models that have been fitted to a
data set, in order to identify the model that best fits the population from which the data were
sampled.

Formula of one-sample t-test

The t-statistic can be calculated as follows:

t = (m − μ) / (s / √n)

where,
· m is the sample mean
· n is the sample size
· s is the sample standard deviation with n − 1 degrees of freedom
· μ is the theoretical value
We can compute the p-value corresponding to the absolute value of the t-test statistic (|t|) for the degrees of freedom df = n − 1.
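
A minimal sketch that applies the formula above by hand and checks it against SciPy's built-in one-sample t-test; the sample values and the theoretical mean μ = 50 are made up:

```python
import numpy as np
from scipy import stats

sample = np.array([51.2, 49.8, 52.4, 50.9, 48.7, 53.1, 50.2, 51.6])
mu = 50.0

m, s, n = sample.mean(), sample.std(ddof=1), len(sample)
t_manual = (m - mu) / (s / np.sqrt(n))              # t = (m - mu) / (s / sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value with df = n - 1

t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=mu)
print(t_manual, p_manual)   # matches t_scipy, p_scipy
```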

A Simple Introduction to ANOVA

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher.

A technique that compares samples on the basis of their means is called ANOVA. Analysis of variance is a statistical technique that is used to check if the means of two or more groups are significantly different from each other. ANOVA checks the impact of one or more factors by comparing the means of different samples.

Another measure to compare the samples is called a t-test. When we have only
two samples, t-test and ANOVA give the same results. However, using a t-test
would not be reliable in cases where there are more than 2 samples. If we
conduct multiple t-tests for comparing more than two samples, it will have a
compounded effect on the error rate of the result.

There are two kinds of means that we use in ANOVA calculations: the separate sample means and the grand mean. The grand mean is the mean of the sample means, or the mean of all observations combined, irrespective of the samples.
The null hypothesis in ANOVA is valid when all the sample means are equal, or they don't have any significant difference. On the other hand, the alternate hypothesis is valid when at least one of the sample means is different from the rest of the sample means. In mathematical form, they can be represented as:

H0: μ1 = μ2 = … = μk
Ha: at least one sample mean differs from the rest

The one-way ANOVA compares the means between the groups you are interested in and determines whether any of those means are statistically significantly different from each other. Specifically, it tests the null hypothesis:

H0: µ1 = µ2 = µ3 = … = µk

where µ = group mean and k = number of groups. If, however, the one-way ANOVA returns a statistically significant result, we accept the alternative hypothesis (HA), which is that there are at least two group means that are statistically significantly different from each other.

At this point, it is important to realize that the one-way ANOVA is an omnibus test statistic and cannot tell you which specific groups were statistically significantly different from each other, only that at least two groups were. To determine which specific groups differed from each other, you need to use a post hoc test.
In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way
ANOVA that examines the influence of two different categorical independent variables on
one continuous dependent variable. The two-way ANOVA not only aims at assessing
the main effect of each independent variable but also if there is any interaction between them.

ANOVA with interaction effects:

Interaction effects represent the combined effects of factors on the dependent measure. When an
interaction effect is present, the impact of one factor depends on the level of the other factor. Part
of the power of ANOVA is the ability to estimate and test interaction effects.

F-Statistic

The statistic which measures whether the means of different samples are significantly different or not is called the F-ratio. The lower the F-ratio, the more similar the sample means are. In that case, we cannot reject the null hypothesis.

F = Between group variability / Within group variability
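
A minimal sketch of a one-way ANOVA with SciPy; the three coffee-brand groups are made-up ratings:

```python
from scipy import stats

starbucks   = [7.1, 6.8, 7.4, 6.9, 7.2]
peets       = [6.2, 6.5, 6.1, 6.7, 6.4]
trader_joes = [6.9, 7.0, 6.6, 7.3, 6.8]

# f_oneway computes the F-ratio: between-group variability / within-group variability.
f_stat, p_value = stats.f_oneway(starbucks, peets, trader_joes)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A large F (small p) suggests at least one group mean differs; a post hoc test says which.
```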

When the outcome or dependent variable (in our case the test scores) is affected by two independent variables/factors, we use a slightly modified technique called two-way ANOVA. That's why a two-way ANOVA can have up to three hypotheses, which are as follows:

Two null hypotheses will be tested if we have placed only one observation in
each cell. For this example, those hypotheses will be:
H1: All the music treatment groups have equal mean score.
H2: All the age groups have equal mean score.

For multiple observations in cells, we would also be testing a third hypothesis:

H3: The factors are independent, or the interaction effect does not exist.

An F-statistic is computed for each hypothesis we are testing.
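
A minimal sketch of a two-way ANOVA with an interaction term using the statsmodels formula API; the column names (score, music, age_group) and the data frame are invented for illustration, with two observations per cell so the interaction can be tested:

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "score":     [70, 72, 68, 75, 80, 78, 82, 79, 65, 66, 64, 67],
    "music":     ["none", "none", "pop", "pop", "classical", "classical",
                  "none", "pop", "classical", "none", "pop", "classical"],
    "age_group": ["young", "old", "young", "old", "young", "old",
                  "young", "old", "young", "old", "young", "old"],
})

# C(...) marks categorical factors; '*' expands to both main effects plus the interaction.
model = ols("score ~ C(music) * C(age_group)", data=df).fit()
print(anova_lm(model, typ=2))  # one F-statistic per hypothesis (each main effect and the interaction)
```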

What is Data Mining?


Data Mining is defined as extracting information from huge sets of data. In other words, we can say that data mining is the procedure of mining knowledge from data. The information or knowledge extracted can be used for any of the following applications:
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration

Cross tabulation is a method to quantitatively analyze the relationship between multiple variables. Also known as contingency tables or cross tabs, cross tabulation groups variables to understand the correlation between different variables. It also shows how correlations change from one variable grouping to another.
One simple way to do cross tabulations is Microsoft Excel’s pivot table feature.
Pivot tables are a great way to search for patterns as they help in easily
grouping raw data.
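
Besides Excel pivot tables, a cross tabulation can be produced in one line with pandas; the small survey data frame below is made up:

```python
import pandas as pd

df = pd.DataFrame({
    "gender":   ["M", "F", "F", "M", "F", "M", "F", "M"],
    "response": ["yes", "yes", "no", "no", "yes", "yes", "no", "yes"],
})

# Counts of each gender/response combination, plus row and column totals.
table = pd.crosstab(df["gender"], df["response"], margins=True)
print(table)
```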

Ordinary Least Sum of Squares Model:

Ordinary least squares (OLS) regression is a statistical method of analysis that estimates the relationship between one or more independent variables and a dependent variable; the method estimates the relationship by minimizing the sum of the squares in the difference between the observed and predicted values of the dependent variable configured as a straight line. In this entry, OLS regression will be discussed in the context of a bivariate model, that is, a model in which there is only one independent variable (X) predicting a dependent variable (Y). However, the logic of OLS regression is easily extended to the multivariate model in which there are two or more independent variables.
Linear Regression:
Regression analysis is a very widely used statistical tool to establish a relationship model between two variables. One of these variables is called the predictor variable, whose value is gathered through experiments. The other variable is called the response variable, whose value is derived from the predictor variable.
In linear regression these two variables are related through an equation, where the exponent (power) of both variables is 1. Mathematically, a linear relationship represents a straight line when plotted as a graph. A non-linear relationship, where the exponent of any variable is not equal to 1, creates a curve.
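
A minimal sketch of fitting a straight line y = b0 + b1·x by ordinary least squares with statsmodels; x and y are made-up predictor and response values:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1])

X = sm.add_constant(x)          # adds the intercept column
results = sm.OLS(y, X).fit()    # minimizes the sum of squared residuals
print(results.params)           # [intercept, slope]
print(results.rsquared)
```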

What Is Multiple Linear Regression (MLR)?

Multiple linear regression (MLR), also known simply as multiple regression, is a
statistical technique that uses several explanatory variables to predict the
outcome of a response variable. The goal of multiple linear regression (MLR) is
to model the linear relationship between the explanatory (independent)
variables and response (dependent) variable.

In essence, multiple regression is the extension of ordinary least-squares (OLS) regression because it involves more than one explanatory variable.
Assumptions of Linear Regression:
• Linear relationship
• Multivariate normality
• No or little multicollinearity
• No auto-correlation
• Homoscedasticity
• There should be a linear and additive relationship between dependent (response)
variable and independent (predictor) variable(s). A linear relationship suggests
that a change in response Y due to one unit change in X¹ is constant, regardless
of the value of X¹. An additive relationship suggests that the effect of X¹ on Y is
independent of other variables.
• There should be no correlation between the residual (error) terms. Absence of
this phenomenon is known as Autocorrelation.
• The independent variables should not be correlated. Absence of this phenomenon
is known as multicollinearity.
• The error terms must have constant variance. This phenomenon is known as homoskedasticity. The presence of non-constant variance is referred to as heteroskedasticity.
• The error terms must be normally distributed.

Obtaining the Best fit line


A line of best fit is a straight line that is the best approximation of the given set of data. A more accurate way of finding the line of best fit is the least squares method. The line of best fit (or trendline) is an educated guess about where a linear equation might fall in a set of data plotted on a scatter plot.

Outliers and Influential Observations

▪ An outlier is a data point whose response y does not follow the general
trend of the rest of the data.

▪ A data point has high leverage if it has "extreme" predictor x values.
With a single predictor, an extreme x value is simply one that is
particularly high or low. With multiple predictors, extreme x values may
be particularly high or low for one or more predictors, or may be
"unusual" combinations of predictor values (e.g., with two predictors
that are positively correlated, an unusual combination of predictor
values might be a high value of one predictor paired with a low value of
the other predictor).
A data point is influential if it unduly influences any part of a regression
analysis, such as the predicted responses, the estimated slope coefficients, or
the hypothesis test results. Outliers and high leverage data points have
the potential to be influential, but we generally have to investigate further to
determine whether or not they are actually influential.

Outliers

Data points that diverge in a big way from the overall pattern are
called outliers. There are four ways that a data point might be considered an
outlier.

§ It could have an extreme X value compared to other data points.

§ It could have an extreme Y value compared to other data points.

§ It could have extreme X and Y values.

▪ It might be distant from the rest of the data, even without extreme X or
Y values.

Influential Points

An influential point is an outlier that greatly affects the slope of the regression
line. One way to test the influence of an outlier is to compute the regression
equation with and without the outlier.

If your data set includes an influential point, here are some things to consider.

▪ An influential point may represent bad data, possibly the result of measurement error. If possible, check the validity of the data point.

▪ Compare the decisions that would be made based on regression
equations defined with and without the influential point. If the
equations lead to contrary decisions, use caution.

Test Your Understanding

In the context of regression analysis, which of the following statements are true?

I. When the data set includes an influential point, the data set is nonlinear.
II. Influential points always reduce the coefficient of determination.
III. All outliers are influential data points.

(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

Solution

The correct answer is (E).


Multicollinearity refers to a situation in which more than two explanatory variables in a multiple regression model are highly linearly related. We have perfect multicollinearity if, for example, the correlation between two independent variables is equal to 1 or −1.
• Multicollinearity is a statistical concept where independent variables in a
model are correlated.
• Multicollinearity among independent variables will result in less reliable
statistical inferences.
• It is better to use independent variables that are not correlated or
repetitive when building multiple regression models that use two or
more variables.
Multicollinearity occurs when independent variables in a regression model are correlated. This correlation is a problem because independent variables should be independent. If the degree of correlation between variables is high enough, it can cause problems when you fit the model and interpret the results.
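
One common way to check for multicollinearity is the variance inflation factor (VIF). Below is a minimal sketch with statsmodels; the predictor names are invented and the data are simulated so that two predictors are strongly correlated. A common rule of thumb flags VIF values above about 5 to 10:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
minutes = rng.normal(30, 5, size=100)
calories = 10 * minutes + rng.normal(0, 5, size=100)   # deliberately correlated with minutes
age = rng.normal(40, 8, size=100)

X = sm.add_constant(pd.DataFrame({"minutes": minutes, "calories": calories, "age": age}))
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
# "minutes" and "calories" get very large VIFs; "age" stays near 1.
```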

Dimension Reduction Techniques


Dimension reduction is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.
Concept of latent dimensions
In statistics, latent variables (from Latin: present participle of lateo (“lie
hidden”), as opposed to observable variables) are variables that are not
directly observed but are rather inferred (through a mathematical model) from
other variables that are observed (directly measured)
A latent variable model is a statistical model that relates a set of observable
variables (so-called manifest variables) to a set of latent variables.

Common Dimensionality Reduction Techniques

Dimensionality reduction can be done in two different ways:

• By only keeping the most relevant variables from the original dataset
(this technique is called feature selection)
• By finding a smaller set of new variables, each being a combination of
the input variables, containing basically the same information as the
input variables (this technique is called dimensionality reduction)

Why is Dimensionality Reduction required?

Here are some of the benefits of applying dimensionality reduction to a dataset:

• Space required to store the data is reduced as the number of dimensions comes down
• Fewer dimensions lead to less computation/training time
• Some algorithms do not perform well when we have a large number of dimensions, so reducing these dimensions needs to happen for the algorithm to be useful

• It takes care of multicollinearity by removing redundant features. For
example, you have two variables – ‘time spent on treadmill in minutes’
and ‘calories burnt’. These variables are highly correlated as the more
time you spend running on a treadmill, the more calories you will burn.
Hence, there is no point in storing both as just one of them does what
you require
• It helps in visualizing data. As discussed earlier, it is very difficult to
visualize data in higher dimensions so reducing our space to 2D or 3D
may allow us to plot and observe patterns more clearly
WHAT IS PRINCIPAL COMPONENT ANALYSIS?

Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of the information in the large set.
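
A minimal sketch of PCA with scikit-learn; the input matrix is random illustrative data, and in practice the variables are usually standardized first:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))          # 200 observations, 10 original variables

pca = PCA(n_components=3)               # keep the first 3 principal components
X_reduced = pca.fit_transform(X)        # 200 x 3 representation
print(pca.explained_variance_ratio_)    # share of total variance retained by each component
```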

Factor Analysis

Factor analysis is a technique that is used to reduce a large number of variables into a smaller number of factors. This technique extracts maximum common variance from all variables and puts them into a common score. As an index of all variables, we can use this score for further analysis.

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus "error" terms.

Probability:
Probability is the branch of mathematics concerning numerical descriptions of
how likely an event is to occur, or how likely it is that a proposition is true.
The probability of an event is a number between 0 and 1, where, roughly
speaking, 0 indicates impossibility of the event and 1 indicates certainty.

• The probability of an event can only be between 0 and 1 and can also be written as a percentage.
• The probability of event A is often written as P(A).
• If P(A) > P(B), then event A has a higher chance of occurring than event B.
• If P(A) = P(B), then events A and B are equally likely to occur.

Types of Probability
There are three major types of probabilities:
• Theoretical Probability
• Experimental Probability
• Axiomatic Probability

Theoretical Probability
It is based on the possible chances of something to happen. The theoretical
probability is mainly based on the reasoning behind probability. For example, if
a coin is tossed, the theoretical probability of getting a head will be ½.

Experimental Probability
It is based on the observations of an experiment. The experimental probability is calculated as the number of times an event occurs divided by the total number of trials. For example, if a coin is tossed 10 times and heads is recorded 6 times, then the experimental probability for heads is 6/10, or 3/5.

Axiomatic Probability
In axiomatic probability, a set of rules or axioms is set out which applies to all types. These axioms were set by Kolmogorov and are known as Kolmogorov's three axioms. With the axiomatic approach to probability, the chances of occurrence or non-occurrence of the events can be quantified.

First axiom
The probability of an event is a non-negative real number: P(E) ≥ 0 for every event E.

Second axiom
This is the assumption of unit measure: the probability that at least one of the elementary events in the entire sample space will occur is 1, i.e. P(Ω) = 1.

Third axiom
This is the assumption of σ-additivity: any countable sequence of disjoint sets (synonymous with mutually exclusive events) E1, E2, … satisfies P(E1 ∪ E2 ∪ …) = P(E1) + P(E2) + …

Conditional Probability is the likelihood of an event or outcome occurring based on the occurrence of a previous event or outcome.

Mutually Exclusive events:


Mutually exclusive is a statistical term describing two or more events that
cannot happen simultaneously. It is commonly used to describe a situation
where the occurrence of one outcome supersedes the other.
In logic and probability theory, two events are mutually exclusive or disjoint if
they cannot both occur at the same time. A clear example is the set of
outcomes of a single coin toss, which can result in either heads or tails, but not
both
Dependent and Independent Events
Two events are said to be dependent if the occurrence of one event changes the probability of another event. Two events are said to be independent if the probability of one event does not affect the probability of the other. If two events are mutually exclusive, they are not independent. Also, independent events cannot be mutually exclusive.
Marginal probability is the probability of an event irrespective of the outcome
of another variable. Conditional probability is the probability of one event
occurring in the presence of a second event.

Conditional Probability

We may be interested in the probability of an event given the occurrence of another event.

The probability of one event given the occurrence of another event is called
the conditional probability. The conditional probability of one to one or more
random variables is referred to as the conditional probability distribution.

For example, the conditional probability of event A given event B is written
formally as:

• P(A given B)

The “given” is denoted using the pipe “|” operator; for example:

• P(A | B)

The conditional probability for events A given event B is calculated as follows:

• P(A given B) = P(A and B) / P(B)

Marginal Probability

We may be interested in the probability of an event for one random variable, irrespective of the outcome of another random variable.

For example, the probability of X=A for all outcomes of Y.

The probability of one event in the presence of all (or a subset of) outcomes of
the other random variable is called the marginal probability or the marginal
distribution. The marginal probability of one random variable in the presence
of additional random variables is referred to as the marginal probability
distribution.

It is called the marginal probability because if all outcomes and probabilities for the two variables were laid out together in a table (X as columns, Y as rows), then the marginal probability of one variable (X) would be the sum of probabilities for the other variable (Y rows) on the margin of the table.

There is no special notation for the marginal probability; it is just the sum or
union over all the probabilities of all events for the second variable for a given
fixed event for the first variable.

• P(X=A) = sum P(X=A, Y=yi) for all y

This is another important foundational rule in probability, referred to as the "sum rule."

The marginal probability is different from the conditional probability
(described next) because it considers the union of all events for the second
variable rather than the probability of a single event.

Bayes' Theorem

Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.[2] For example, if the risk of developing health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately (by conditioning it on their age) than simply assuming that the individual is typical of the population as a whole.

KEY TAKEAWAYS

• Bayes' theorem allows you to update predicted probabilities of an event by incorporating new information.
• Bayes' theorem was named after 18th-century mathematician Thomas
Bayes.
• It is often employed in finance in updating risk evaluation.
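
A minimal worked example of Bayes' theorem, P(A|B) = P(B|A)·P(A) / P(B), with made-up numbers for a screening-test scenario (A = "has the condition", B = "tests positive"):

```python
p_a = 0.01              # prior: 1% prevalence
p_b_given_a = 0.95      # sensitivity of the test
p_b_given_not_a = 0.05  # false positive rate

# Total probability of a positive test.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(condition | positive test) = {p_a_given_b:.3f}")  # roughly 0.16 despite 95% sensitivity
```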

A probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment.[1][2] It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space).[3]

Probability distributions are generally divided into two classes:
1) discrete probability distributions
2) continuous probability distributions

· A discrete probability distribution is applicable to scenarios where the set of possible outcomes is discrete (e.g. a coin toss, a roll of a die), and the probabilities are here encoded by a discrete list of the probabilities of the outcomes, known as the probability mass function.

On the other hand, continuous probability distributions are applicable to scenarios where the set of possible outcomes can take on values in a continuous range (e.g. real numbers), such as the temperature on a given day. In this case, probabilities are typically described by a probability density function.[4][6][10]

What is Normal Distribution?


Normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the
mean are more frequent in occurrence than data far from the mean. In graph
form, normal distribution will appear as a bell curve.

KEY TAKEAWAYS

• A normal distribution is the proper term for a probability bell curve.
• In a standard normal distribution the mean is zero and the standard deviation is 1; a normal distribution has zero skew and a kurtosis of 3.
• Normal distributions are symmetrical, but not all symmetrical distributions are normal.
• In reality, most pricing distributions are not perfectly normal.

A probability distribution whose sample space is one-dimensional (for example real numbers, list of labels, ordered labels or binary) is called univariate, while a distribution whose sample space is a vector space of dimension 2 or more is called multivariate.

Continuous Probability Distributions


The relationship between the events for a continuous random variable and
their probabilities is called the continuous probability distribution and is
summarized by a probability density function, or PDF for short.

Normal Distribution
The normal distribution is also called the Gaussian distribution (named for Carl
Friedrich Gauss) or the bell curve distribution.
The distribution covers the probability of real-valued events from many
different problem domains, making it a common and well-known distribution,
hence the name “normal.” A continuous random variable that has a normal
distribution is said to be “normal” or “normally distributed.”
Some examples of domains that have normally distributed events include:

• The heights of people.


• The weights of babies.
• The scores on a test.
Central Limit Theorem (CLT)

In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution (also known as a "bell curve") as the sample size becomes larger, assuming that all samples are identical in size, and regardless of the population distribution shape.
• The central limit theorem (CLT) states that the distribution of sample
means approximates a normal distribution as the sample size gets larger.
• Sample sizes equal to or greater than 30 are considered sufficient for the
CLT to hold.
• A key aspect of CLT is that the average of the sample means and
standard deviations will equal the population mean and standard
deviation.
• A sufficiently large sample size can predict the characteristics of a
population accurately.
Central Limit Theorem exhibits a phenomenon where the average of the
sample means and standard deviations equal the population mean and
standard deviation, which is extremely useful in accurately predicting the
characteristics of populations.

The Central Limit Theorem states that the sampling distribution of the sample
means approaches a normal distribution as the sample size gets larger — no
matter what the shape of the population distribution. This fact holds especially
true for sample sizes over 30.
All this is saying is that as you take more samples, especially large ones, your graph of the sample means will look more like a normal distribution.
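
A minimal simulation of the CLT with NumPy: sample means drawn from a clearly non-normal (exponential) population behave as the theorem describes as n grows. All numbers are simulated:

```python
import numpy as np

rng = np.random.default_rng(3)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, clearly non-normal population

for n in (2, 30, 200):
    sample_means = [rng.choice(population, size=n).mean() for _ in range(2_000)]
    print(f"n={n:3d}  mean of sample means={np.mean(sample_means):.3f}  "
          f"std of sample means={np.std(sample_means):.3f}")
# The mean of the sample means stays near the population mean (2.0), their spread shrinks
# roughly like sigma/sqrt(n), and a histogram of them looks more and more bell-shaped.
```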

What is Discrete Distribution?

A discrete distribution is a distribution of data in statistics that has discrete values. Discrete values are countable, finite, non-negative integers, such as 1, 10, 15, etc.

What Is a Poisson Distribution?


In statistics, a Poisson distribution is a probability distribution that can be used
to show how many times an event is likely to occur within a specified period of
time. In other words, it is a count distribution. Poisson distributions are often
used to understand independent events that occur at a constant rate within a
given interval of time. It was named after French mathematician Siméon
Denis Poisson.

The Poisson distribution is a discrete function, meaning that the variable can
only take specific values in a (potentially infinite) list. Put differently, the
variable cannot take all values in any continuous range. For the Poisson
distribution (a discrete distribution), the variable can only take the values 0, 1,
2, 3, etc., with no fractions or decimals.

KEY TAKEAWAYS
• A Poisson distribution can be used to measure how many times an event
is likely to occur within "X" period of time, named after mathematician
Siméon Denis Poisson.
• Poisson distributions, therefore, are used when the factor of interest is a
discrete count variable.
• Many economic and financial data appear as count variables, such as
how many times a person becomes unemployed in a given year, thus
lending itself to analysis with a Poisson distribution.
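
A minimal sketch of a Poisson distribution with SciPy; the rate of 3 events per interval is a made-up example value:

```python
from scipy import stats

lam = 3.0
poisson = stats.poisson(mu=lam)

for k in range(6):
    print(k, round(poisson.pmf(k), 4))            # probability of exactly k events in the interval
print("P(X >= 5) =", round(1 - poisson.cdf(4), 4))
```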
What is a Binomial Distribution?
A binomial distribution can be thought of as simply the probability of a
SUCCESS or FAILURE outcome in an experiment or survey that is repeated
multiple times. The binomial is a type of distribution that has two possible
outcomes (the prefix “bi” means two, or twice). For example, a coin toss has
only two possible outcomes: heads or tails and taking a test could have two
possible outcomes: pass or fail.
· The first variable in the binomial formula, n, stands for the number of times the experiment runs.
· The second variable, p, represents the probability of one specific outcome.
Binomial distributions must also meet the following three criteria:

1. The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. This is common sense: if you toss a coin once, your probability of getting a tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very close to 100%.
2. Each observation or trial is independent. In other words, none
of your trials have an effect on the probability of the next trial.
3. The probability of success (tails, heads, fail or pass) is exactly
the same from one trial to another.
What is a Binomial Distribution? The Bernoulli Distribution.
The binomial distribution is closely related to the Bernoulli distribution.
According to Washington State University, “If each Bernoulli trial is
independent, then the number of successes in Bernoulli trails has a binomial
Distribution. On the other hand, the Bernoulli distribution is the Binomial
distribution with n=1.”
A Bernoulli distribution is a set of Bernoulli trials. Each Bernoulli trial has one of two possible outcomes, S (success) or F (failure). In each trial, the
probability of success, P(S) = p, is the same. The probability of failure is just 1
minus the probability of success: P(F) = 1 – p. (Remember that “1” is the total
probability of an event occurring…probability is always between zero and 1).
Finally, all Bernoulli trials are independent from each other and the probability
of success doesn’t change from trial to trial, even if you have information
about the other trials’ outcomes.
The Binomial Distribution Formula

A Binomial Distribution shows either (S)uccess or (F)ailure.

The binomial distribution formula is:

b(x; n, P) = nCx · P^x · (1 − P)^(n − x)

Where:
b = binomial probability
x = total number of "successes" (pass or fail, heads or tails etc.)
P = probability of a success on an individual trial
n = number of trials
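
A minimal sketch that evaluates the formula above by hand and checks it against SciPy; n = 10 tosses of a fair coin and x = 6 heads are made-up example values:

```python
from math import comb
from scipy import stats

n, P, x = 10, 0.5, 6
b_manual = comb(n, x) * P**x * (1 - P)**(n - x)   # nCx * P^x * (1 - P)^(n - x)
b_scipy = stats.binom.pmf(x, n, P)
print(b_manual, b_scipy)   # both approximately 0.2051
```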

Predictive Modeling
Predictive modeling is a process that uses data and statistics to predict outcomes with data models. These models can be used to predict anything from sports outcomes and TV ratings to technological advances and corporate earnings.

Predictive modeling is also often referred to as:

• Predictive analytics
• Predictive analysis
• Machine learning
Concept of Multiple Linear regression
Multiple linear regression is used to estimate the relationship between two or
more independent variables and one dependent variable. You can use
multiple linear regression when you want to know:
1. How strong the relationship is between two or more independent
variables and one dependent variable (e.g. how rainfall, temperature,
and amount of fertilizer added affect crop growth).
2. The value of the dependent variable at a certain value of the
independent variables (e.g. the expected yield of a crop at certain levels
of rainfall, temperature, and fertilizer addition).
Multiple linear regression (MLR), also known simply as multiple regression, is a
statistical technique that uses several explanatory variables to predict the
outcome of a response variable. The goal of multiple linear regression (MLR) is
to model the linear relationship between the explanatory (independent)
variables and response (dependent) variable.
In essence, multiple regression is the extension of ordinary least-squares
(OLS) regression because it involves more than one explanatory variable.
KEY TAKEAWAYS
• Stepwise regression is a method that iteratively examines the statistical
significance of each independent variable in a linear regression model.

• The forward selection approach starts with nothing and adds each new
variable incrementally, testing for statistical significance.
• The backward elimination method begins with a full model loaded with
several variables and then removes one variable to test its importance
relative to overall results.
• Stepwise regression has its downsides, however, as it is an approach
that fits data into a model to achieve the desired result.

Three approaches to stepwise regression:


1. Forward selection begins with no variables in the model, tests each
variable as it is added to the model, then keeps those that are deemed
most statistically significant—repeating the process until the results are
optimal.
2. Backward elimination starts with a set of independent variables,
deleting one at a time, then testing to see if the removed variable is
statistically significant.
3. Bidirectional elimination is a combination of the first two methods that
test which variables should be included or excluded.
Dummy Variable
In statistics and econometrics, particularly in regression analysis, a dummy
variable[a] is one that takes only the value 0 or 1 to indicate the absence or
presence of some categorical effect that may be expected to shift the
outcome.[2][3] They can be thought of as numeric stand-ins for qualitative facts
in a regression model, sorting data into mutually exclusive categories (such as
smoker and non-smoker).[4]

A dummy variable is a variable that takes values of 0 and 1, where the values
indicate the presence or absence of something (e.g., a 0 may indicate a
placebo and 1 may indicate a drug). Where a categorical variable has more
than two categories, it can be represented by a set of dummy variables, with
one variable for each category. Numeric variables can also be dummy
coded to explore nonlinear effects. Dummy variables are also known
as indicator variables, design variables, contrasts, one-hot coding, and binary
basis variables.
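
A minimal sketch of dummy (indicator) coding with pandas; the smoker column is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({"smoker": ["yes", "no", "no", "yes", "no"]})

# drop_first=True keeps one dummy for a two-category variable (0 = no, 1 = yes),
# which avoids perfect multicollinearity with an intercept in a regression model.
dummies = pd.get_dummies(df["smoker"], prefix="smoker", drop_first=True)
print(dummies)
```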

Logistic regression:

Logistic regression is a statistical model that in its basic form uses
a logistic function to model a binary dependent variable, although many more
complex extensions exist. In regression analysis, logistic regression (or logit
regression) is estimating the parameters of a logistic model (a form of
binary regression).
Logistic regression is the appropriate regression analysis to conduct when the
dependent variable is dichotomous (binary). Like all regression analyses, the
logistic regression is a predictive analysis. Logistic regression is used to
describe data and to explain the relationship between one dependent binary
variable and one or more nominal, ordinal, interval or ratio-level independent
variables.

A logistic regression model predicts a dependent data variable by analyzing the relationship between one or more existing independent variables. For example, a logistic regression could be used to predict whether a political candidate will win or lose an election or whether a high school student will be admitted to a particular college.

Logistic regression vs. linear regression


The main difference between logistic regression and linear regression is that logistic regression produces a categorical (discrete) outcome, while linear regression produces a continuous output.

In logistic regression, the outcome, i.e. the dependent variable, has only a limited number of possible values. However, in linear regression, the outcome is continuous, which means that it can have any one of an infinite number of possible values.

Logistic regression is used when the response variable is categorical, such as yes/no, true/false and pass/fail. Linear regression is used when the response variable is continuous, such as number of hours, height and weight.

Odds are determined from probabilities and range between 0 and infinity.
Odds are defined as the ratio of the probability of success and the probability
of failure.
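
A minimal sketch of a logistic regression with statsmodels; the predictor and the 0/1 outcome are simulated, and the exponentiated coefficients illustrate the odds interpretation above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=300)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))      # true logistic relationship
y = rng.binomial(1, p)                      # binary (dichotomous) outcome

X = sm.add_constant(x)
result = sm.Logit(y, X).fit(disp=0)
print(result.params)            # estimated intercept and slope on the log-odds scale
print(np.exp(result.params))    # exponentiated coefficients are odds ratios
```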

What is a Likelihood-Ratio Test?

The Likelihood-Ratio test (sometimes called the likelihood-ratio chi-squared
test) is a hypothesis test that helps you choose the “best” model between
two nested models. “Nested models” means that one is a special case of the
other. For example, you might want to find out which of the following models
is the best fit:
· Model One has four predictor variables (height, weight, age,
sex),
· Model Two has two predictor variables (age,sex). It is “nested”
within model one because it has just two of the predictor variables
(age, sex).
This theory can also be applied to matrices. For example, a scaled identity matrix is nested within a more complex compound symmetry matrix.

In statistics, the likelihood-ratio test assesses the goodness of fit of two competing statistical models based on the ratio of their likelihoods, specifically one found by maximization over the entire parameter space and another found after imposing some constraint. If the constraint (i.e., the null hypothesis) is supported by the observed data, the two likelihoods should not differ by more than sampling error.[1] Thus the likelihood-ratio test tests whether this ratio is significantly different from one, or equivalently whether its natural logarithm is significantly different from zero.
The likelihood-ratio test, also known as Wilks test,[2] is the oldest of the three
classical approaches to hypothesis testing, together with the Lagrange
multiplier test and the Wald test.[3] In fact, the latter two can be
conceptualized as approximations to the likelihood-ratio test, and are
asymptotically equivalent.[4][5][6] In the case of comparing two models each of
which has no unknown parameters, use of the likelihood-ratio test can be
justified by the Neyman–Pearson lemma. The lemma demonstrates that the
test has the highest power among all competitors.[7]
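
A minimal sketch of a likelihood-ratio test between two nested logit models fitted by maximum likelihood with statsmodels; the variables (age, height) and all data are simulated stand-ins for the nested-model example above:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
age = rng.normal(40, 10, 500)
height = rng.normal(170, 8, 500)
p = 1 / (1 + np.exp(-(-2 + 0.05 * age)))
y = rng.binomial(1, p)

full = sm.Logit(y, sm.add_constant(np.column_stack([age, height]))).fit(disp=0)
reduced = sm.Logit(y, sm.add_constant(age)).fit(disp=0)   # nested: drops "height"

lr_stat = 2 * (full.llf - reduced.llf)                    # -2 * log(likelihood ratio)
p_value = stats.chi2.sf(lr_stat, df=1)                    # df = number of restricted parameters
print(lr_stat, p_value)
```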

What is a Pseudo R-squared?



When running an ordinary least squares (OLS) regression, one common metric to assess model fit is the R-squared (R²). The R² metric is calculated as follows.

· R² = 1 − [Σi(yi − ŷi)²] / [Σi(yi − ȳ)²]

The dependent variable is y, the predicted value from the OLS regression is ŷ,
and the average value of y across all observations is ȳ. The index for observations
is omitted for brevity.

One can interpret the R2 metric a variety of ways.

1. R-squared as explained variability – The denominator of the ratio can be thought of as the total variability in the dependent variable, or how much y varies from its mean. The numerator of the
variable, or how much y varies from its mean. The numerator of the
ratio can be thought of as the variability in the dependent variable
that is not predicted by the model. Thus, this ratio is the proportion
of the total variability unexplained by the model. Subtracting this
ratio from one results in the proportion of the total variability
explained by the model. The more variability explained, the better
the model.
2. R-squared as improvement from null model to fitted model –
The denominator of the ratio can be thought of as the sum of
squared errors from the null model–a model predicting the
dependent variable without any independent variables. In the null
model, each y value is predicted to be the mean of the y values.
Consider being asked to predict a y value without having any
additional information about what you are predicting. The mean of
the y values would be your best guess if your aim is to minimize the
squared difference between your prediction and the actual y value.
The numerator of the ratio would then be the sum of squared
errors of the fitted model. The ratio is indicative of the degree to
which the model parameters improve upon the prediction of the
null model. The smaller this ratio, the greater the improvement and
the higher the R-squared.
3. R-squared as the square of the correlation – The term “R-
squared” is derived from this definition. R-squared is the square of
the correlation between the model’s predicted values and the
actual values. This correlation can range from -1 to 1, and so the
square of the correlation then ranges from 0 to 1. The greater the
magnitude of the correlation between the predicted values and the
actual values, the greater the R-squared, regardless of whether the
correlation is positive or negative.

So then what is a pseudo R-squared? When running a logistic regression, many
people would like a similar goodness of fit metric. An R-squared value does not
exist, however, for logit regressions since these regressions rely on “maximum
likelihood estimates arrived at through an iterative process. They are not
calculated to minimize variance, so the OLS approach to goodness-of-fit does
not apply.” However, there are a few variations of a pseudo R-squared which
are analogs to the OLS R-squared. For instance:

· Efron's Pseudo R-Squared: R² = 1 − [Σi(yi − π̂i)²] / [Σi(yi − ȳ)²], where π̂i are the model's predicted probabilities.
· McFadden's Pseudo R-Squared: R² = 1 − [ln LL(M_full)] / [ln LL(M_intercept)]. This approach is one minus the ratio of two log likelihoods. The numerator is the log likelihood of the logit model selected and the denominator is the log likelihood if the model just had an intercept. McFadden's Pseudo R-Squared is the approach used as the default for a logit regression in Stata.
· McFadden's Pseudo R-Squared (adjusted): R²adj = 1 − [ln LL(M_full) − K] / [ln LL(M_intercept)]. This approach is similar to the above but penalizes a model for including too many predictors, where K is the number of regressors in the model. This adjustment, however, makes it possible to have negative values for McFadden's adjusted Pseudo R-squared.
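
A minimal sketch of McFadden's pseudo R-squared computed from a statsmodels logit fit; the data are simulated, and statsmodels also exposes the same quantity as the prsquared attribute:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=400)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.0 * x))))

fitted = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
intercept_only = sm.Logit(y, np.ones((len(y), 1))).fit(disp=0)

mcfadden = 1 - fitted.llf / intercept_only.llf      # 1 - ln LL(M_full) / ln LL(M_intercept)
print(mcfadden, fitted.prsquared)                   # the two values agree
```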
ROC
A receiver operating characteristic curve, or ROC curve, is a graphical plot that
illustrates the diagnostic ability of a binary classifier system as its
discrimination threshold is varied. The method was originally developed for
operators of military radar receivers, which is why it is so named.
The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection[10] in machine learning. The false-positive rate is also known as the probability of false alarm.

A useful tool when predicting the probability of a binary outcome is the Receiver Operating Characteristic curve, or ROC curve.
It is a plot of the false positive rate (x-axis) versus the true positive rate (y-axis)
for a number of different candidate threshold values between 0.0 and 1.0. Put
another way, it plots the false alarm rate versus the hit rate.

The true positive rate is calculated as the number of true positives divided by
the sum of the number of true positives and the number of false negatives. It
describes how good the model is at predicting the positive class when the
actual outcome is positive.

True Positive Rate = True Positives / (True Positives + False Negatives)

The true positive rate is also referred to as sensitivity.

Sensitivity = True Positives / (True Positives + False Negatives)


The false positive rate is calculated as the number of false positives divided by
the sum of the number of false positives and the number of true negatives.

It is also called the false alarm rate as it summarizes how often a positive class
is predicted when the actual outcome is negative.
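
A minimal sketch of an ROC curve with scikit-learn; the true labels and predicted scores are made up:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.65, 0.7, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # false alarm rate vs. hit rate per threshold
print(list(zip(fpr.round(2), tpr.round(2))))
print("AUC =", roc_auc_score(y_true, y_score))      # area under the ROC curve
```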

Classification table:

The Classification Table (aka the Confusion Matrix) compares the predicted
number of successes to the number of successes actually observed and
similarly the predicted number of failures compared to the number actually
observed.

We have four possible outcomes:

True Positives (TP) = the number of cases which were correctly classified to be
positive, i.e. were predicted to be a success and were actually observed to be a
success

False Positives (FP) = the number of cases which were incorrectly classified as
positive, i.e. were predicted to be a success but were actually observed to be a
failure

True Negatives (TN) = the number of cases which were correctly classified to
be negative, i.e. were predicted to be a failure and were actually observed to
be a failure

False Negatives (FN) = the number of cases which were incorrectly classified as
negative, i.e. were predicted to be a failure but were actually observed to be a
success
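
A minimal sketch of the classification table using scikit-learn's confusion_matrix; the actual and predicted labels are made up:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# With labels=[0, 1] the rows are actual (0, 1) and the columns are predicted (0, 1):
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_actual, y_predicted, labels=[0, 1]))
```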

Discriminant Function
Linear discriminant analysis (LDA), normal discriminant analysis (NDA),
or discriminant function analysis is a generalization of Fisher's linear
discriminant, a method used in statistics and other fields, to find a linear
combination of features that characterizes or separates two or more classes of
objects or events. The resulting combination may be used as a linear classifier,
or, more commonly, for dimensionality reduction before later classification.

LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements.[1][2] However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable.
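
A minimal sketch of LDA with scikit-learn on simulated two-class data, used both for dimension reduction and as a linear classifier:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])  # continuous predictors
y = np.array([0] * 50 + [1] * 50)                                      # categorical class label

lda = LinearDiscriminantAnalysis(n_components=1)
scores = lda.fit_transform(X, y)        # projection onto the discriminant axis (dimension reduction)
print(lda.score(X, y))                  # classification accuracy when used as a linear classifier
```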

The data for a time series is stored in an R object called a time-series object. It is also an R data object, like a vector or data frame. The time series object is created by using the ts() function.
The gap between the actual data and the trend line is known as the seasonal variation. Seasonal variation can be described as the difference between the trend of data and the actual figures for the period in question. A seasonal variation can be a numerical value (additive) or a percentage (multiplicative).

Time-series analysis involves looking at what has happened in the recent past
to help predict what will happen in the near future.
Seasonal variation
A Seasonal Variation (SV) is a regularly repeating pattern over a fixed number
of months
Trend
A Trend (T) is a long-term movement in a consistent direction. Trends can be
hard to spot because of the confusing impact of the SV. The easiest way to
spot the Trend is to look at the months that hold the same position in each set
of three period patterns.

The term 'seasonal' is applied to a time period, not necessarily a traditional season (summer, autumn etc.). For example, sales may be a lot higher for a store around Christmas, but lower in January.

The decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns.[1] There are two principal types of decomposition, which are outlined below.

1) Decomposition based on rates of change

2) Decomposition based on predictability

Decomposition of time series


Plotting time series data is an important first step in analyzing their various
components. Beyond that, however, we need a more formal means for
identifying and removing characteristics such as a trend or seasonal
variation. As discussed in lecture, the decomposition model reduces a time
series into 3 components: trend, seasonal effects, and random errors. In
turn, we aim to model the random errors as some form of stationary
process.
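
A minimal sketch of decomposing a simulated monthly series into trend, seasonal and random components with statsmodels; the series is built as trend + seasonality + noise to mirror the three components described above:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(8)
months = pd.date_range("2018-01-01", periods=48, freq="MS")
values = (np.linspace(100, 160, 48)                       # trend
          + 10 * np.sin(2 * np.pi * np.arange(48) / 12)   # seasonal effect (period 12)
          + rng.normal(0, 2, 48))                         # random error
series = pd.Series(values, index=months)

result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head(12))   # the repeating monthly seasonal variation
```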
What is Autocorrelation?

Autocorrelation is a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals. It is the same as calculating the correlation between two different time series, except autocorrelation uses the same time series twice: once in its original form and once lagged one or more time periods.

Understanding Autocorrelation

Autocorrelation can also be referred to as lagged correlation or serial correlation, as it measures the relationship between a variable's current value
and its past values. When computing autocorrelation, the resulting output can
range from 1 to negative 1, in line with the traditional correlation statistic. An
autocorrelation of +1 represents a perfect positive correlation (an increase
seen in one time series leads to a proportionate increase in the other time
series). An autocorrelation of negative 1, on the other hand, represents
perfect negative correlation (an increase seen in one time series results in a
proportionate decrease in the other time series). Autocorrelation measures
linear relationships; even if the autocorrelation is minuscule, there may still be
a nonlinear relationship between a time series and a lagged version of itself.

• Autocorrelation represents the degree of similarity between a given time series and a lagged version of itself over successive time intervals.
• Autocorrelation measures the relationship between a variable's current
value and its past values.
• An autocorrelation of +1 represents a perfect positive correlation, while
an autocorrelation of negative 1 represents a perfect negative
correlation.
• Technical analysts can use autocorrelation to see how much of an
impact past prices for a security have on its future price.

In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. It contrasts with the autocorrelation function, which does not control for other lags.
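
A minimal sketch of the ACF and PACF with statsmodels, computed on a simulated AR(1) series:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(9)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.normal()   # AR(1) with coefficient 0.7

print(acf(x, nlags=5).round(2))    # decays geometrically for an AR(1) process
print(pacf(x, nlags=5).round(2))   # cuts off after lag 1 once shorter lags are controlled for
```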

What is Exponential Smoothing?


Exponential smoothing of time series data assigns exponentially decreasing
weights for newest to oldest observations. In other words, the older the data,
the less priority (“weight”) the data is given; newer data is seen as more
relevant and is assigned more weight. Smoothing parameters (smoothing
constants)— usually denoted by α— determine the weights for observations.
Exponential smoothing is usually used to make short term forecasts, as longer
term forecasts using this technique can be quite unreliable.

· Simple (single) exponential smoothing uses a weighted moving average with exponentially decreasing weights.
· Holt's trend-corrected double exponential smoothing is usually more reliable for handling data that shows trends, compared to the single procedure.
· Triple exponential smoothing (also called multiplicative Holt-Winters) is usually more reliable for parabolic trends or data that shows trends and seasonality.
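
A minimal sketch of single and triple (Holt-Winters) exponential smoothing with statsmodels on a simulated monthly-style series; the smoothing level α = 0.3 is an arbitrary illustrative choice:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

rng = np.random.default_rng(10)
t = np.arange(36)
data = pd.Series(50 + 0.8 * t + 8 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1.5, 36))

simple = SimpleExpSmoothing(data).fit(smoothing_level=0.3)       # single smoothing, alpha = 0.3
hw = ExponentialSmoothing(data, trend="add", seasonal="add",
                          seasonal_periods=12).fit()             # triple (Holt-Winters) smoothing

print(simple.forecast(3))   # flat short-term forecast
print(hw.forecast(3))       # forecast that also extrapolates trend and seasonality
```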

Autoregressive Moving Average Models


In the statistical analysis of time series, autoregressive–moving-
average (ARMA) models provide a parsimonious description of a (weakly) stationary stochastic
process in terms of two polynomials, one for the autoregression (AR) and the second for
the moving average (MA).
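
For reference, a standard way of writing an ARMA(p, q) model for a stationary series X_t (this formulation is not quoted from the lecture) is:

X_t = c + φ1·X_{t-1} + ... + φp·X_{t-p} + ε_t + θ1·ε_{t-1} + ... + θq·ε_{t-q},

where the φ coefficients form the autoregressive (AR) polynomial, the θ coefficients form the moving-average (MA) polynomial, and ε_t is white noise.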

In one-way analysis of variance, the null hypothesis assumes that at least two of the population
means are different. FALSE

One-way analysis of variance requires that the sample sizes for the levels be equal to one
another. FALSE

All analysis of variance models require that the data measurement be at least categorical level.
FALSE

When calculating an ANOVA table for Two-Factor Analysis of Variance with Replication, we consider
the following sources of variation except: Between the blocks

When performing a Two-Factor Analysis of Variance with Replication, in order to measure the
interaction effect, the sample size for each combination of Factor A and Factor B must be greater
than or equal to 2

In order to develop a relative frequency distribution, each frequency count must be divided by:

Your Answer: the total number of data values.

The relative frequency is the number of items in each category divided by the total number of data values.

2. The most effective technique to display


either continuous variables or discrete
variables that have many possible
outcomes is

Your Answer: a pie chart.


Correct Answer: a grouped data
frequency
distribution.

A pie chart is a graph in which each slice


of the circle represents a category to be
displayed. This would not be an effective
technique to display a discrete variable
with many possible outcomes.

3. Use the 2^k ≥ n guideline to determine the suggested number of classes when the number of data values is 62.

Your Answer: 6

Setting k = 6 and n = 62 yields 2^6 = 64 ≥ 62.
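
As a small sketch (our own helper, not from the text), the same guideline can be applied in code:

def suggested_classes(n):
    # Smallest k such that 2**k >= n (the 2^k >= n guideline)
    k = 1
    while 2 ** k < n:
        k += 1
    return k

print(suggested_classes(62))  # 6, since 2**6 = 64 >= 62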

4. The beginnings of a cumulative frequency distribution are presented below. What is the next number in the Cumulative Frequency column?

Classes        Frequency    Cumulative Frequency
6.1 to 8       1            1
8.1 to 10      2
10.1 to 12     3

Your Answer: 1
Correct Answer: 3

Each term in the cumulative frequency column represents the sum of all the frequency entries up to that point. Thus the next term is 3 (i.e., 1 + 2).

5. All of the following are criteria for


constructing classes in a grouped
frequency distribution except

Your Answer: Mutually exclusive


classes
Correct Answer: All of the above are
criteria for
constructing classes
in a grouped
frequency
distribution.

The four criteria are:

1. Mutually exclusive classes


2. All inclusive classes
3. Equal width
4. Avoid empty classes

6. Classes in a frequency distribution that


do not overlap so that a data value can
be placed in only one class are said to
be

Your Answer: (blank)

7. The data for an ogive is found in which


distribution?

Your Answer: (blank)

8. Look at the Excel Output in Figure 2-7 of
your text. How many males have a
balance from 990 to 1139?

Your Answer: (blank)

9. Which of the following is not a


characteristic of bar charts?

Your Answer: They are graphical


representations of
categorical data.
Correct Answer: Multiple variables
must be graphed on
separate graphs.

They represent categorical data, but it is


not true that multiple variables must be
graphed on separate graphs.

10. Which of the following is not a


characteristic of stem & leaf diagrams.

Your Answer: Stem values are single


digit numbers.

Stem values need not be single digit.

11. Which of the following is true about line


charts?

Your Answer: Straight lines connect


consecutive points.

The lines connecting consecutive points


indicate the changes over the given
period of time.

12. On a scatter diagram, what values are


placed on the horizontal axis?

Your Answer: The dependent


variable

Correct Answer: The independent
variable

The independent variable is placed on


the horizontal axis, and the dependent
variable is placed on the vertical axis.

1. Having an investment increase in value and having the same


investment decrease in value during the same reporting period are
mutually exclusive events.

Your Answer: True

An investment cannot increase and decrease in value at the same


time.

2. Two consecutive customers placing an order at a fast-food restaurant


are independent events.

Your Answer: (blank)

3. Classical probability assessments are based on actual observations.

Your Answer: True


Correct Answer: False

Relative frequency of occurrence is based on actual observations.


Classical probability is the ratio of the number of ways the event of
interest can occur to the number of ways any event can occur when
the events are equally likely.

4. Subjective probability assessment reflects a decision-maker's state of


mind regarding the chances an event will occur.

Your Answer: False


Correct Answer: True

If the decision maker's state of mind determines the probability


assessment, the method is subjective.

5. Adding the probability of an event to the probability of that event's


complement must always equal 1.0.

Your Answer: True

Since P(not E) = 1 − P(E), then P(E) + P(not E) = 1.

6. The joint probability of two events E1 and E2 is expressed as P(E1 or


E2).

Your Answer: True


Correct Answer: False

The joint probability of two events E1 and E2 is expressed as


P(E1 and E2).

7. The probability that an event will occur given that some other event
has already happened is known as joint probability.

Your Answer: False

The probability that an event will occur given that some other event
has already happened is known as conditional probability.

8. The probability that either events E1 or E2 will occur can always be


calculated by adding the probabilities of E1 and E2 together.

Your Answer: True


Correct Answer: False

P(E1 or E2) = P(E1) + P(E2) only for mutually exclusive events.

9. The probability of event E1 occurring given that event E2 has already


occurred equals the probability of E1 when E1 and E2 are mutually
exclusive.

Your Answer: False

The probability of event E1 occurring given that event E2 has already


occurred equals the probability of E1 when E1 and E2 are independent.

10. Revising probabilities based on new information can be performed


using Bayes' Theorem.

Your Answer: True

Revising probabilities based on new information can be performed


using Bayes' Theorem.
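
For reference (a standard statement of the theorem, with made-up numbers for illustration): Bayes' Theorem revises a prior probability P(A) into a posterior probability P(A | B) once event B is observed:

P(A | B) = P(B | A) · P(A) / P(B), where P(B) = P(B | A) · P(A) + P(B | A') · P(A').

For example, if P(A) = 0.3, P(B | A) = 0.8, and P(B | A') = 0.2, then P(B) = 0.8(0.3) + 0.2(0.7) = 0.38 and P(A | B) = 0.24 / 0.38 ≈ 0.63.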

1. If Ho is μ ≥ 20, which of the following represents a Type I error?

Your Answer: μ = 22; reject Ho

Ho is true and is rejected. This is a Type I error.

2. Which of the following is true about hypotheses?

Your Answer: The null hypothesis is the statement the researcher


wishes to show to be true.
Correct Answer: The null hypothesis represents the condition that will
be assumed to exist unless sufficient evidence is
presented to show the condition has changed.

The null hypothesis represents the condition that is assumed to be true.


The researcher seeks to show that the alternative hypothesis is true.

3. If Ho is μ ≥ 10, which of the following represents a Type II error?

Your Answer: μ = 10; reject Ho


Correct Answer: μ = 9; do not reject Ho

Ho is false (μ = 9 does not satisfy μ ≥ 10) and is not rejected. This is a Type II error: failing to reject a false null hypothesis.

4. The probability (assuming the null hypothesis is true) of obtaining a test


statistic at least as extreme as the test statistic that was calculated from
the sample is known as the

Your Answer: power.


Correct Answer: p-value.

The p-value is the probability (assuming the null hypothesis is true) of


obtaining a test statistic at least as extreme as the test statistic that was
calculated from the sample.

5. Find the critical z-value for the hypothesis test calculated at α = 5% when

H0 : μ ≥ 20 and HA : μ < 20; σ = 0.8; x̄ = 19.8; and n = 50.

Your Answer: -1.47


Correct Answer: -1.645

When the tail area under the standard normal curve is 5%, the z-value is -
1.645.

6. Find the z-value for the test statistic for the hypothesis test calculated at α

= 5% when H0 : μ ≥ 20; σ = 0.8; x̄ = 19.8; and n = 50.

Your Answer: -1.65


Correct Answer: -1.768
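
As a check on questions 5 and 6 (a sketch assuming SciPy is installed), both the critical value and the test statistic can be computed directly:

from math import sqrt
from scipy.stats import norm

sigma, n, x_bar, mu_0, alpha = 0.8, 50, 19.8, 20, 0.05

z_critical = norm.ppf(alpha)                 # lower-tail critical value, about -1.645
z_stat = (x_bar - mu_0) / (sigma / sqrt(n))  # (-0.2) / 0.1131..., about -1.768
print(round(z_critical, 3), round(z_stat, 3))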

7. The manufacturer of headache pills assumes that each contains 200 mg of


active ingredient and being under or being over is undesirable. If they are
going to test at the 10% significance level, what should be their decision
rule for rejecting the null hypothesis?

Your Answer:
If , reject the null hypothesis.
Correct Answer:
If or , reject the null
hypothesis.

Reject the null hypothesis if the test statistic falls outside the region bordered by the two critical values, one in each tail.

8. The maximum allowable probability of committing a Type I statistical error


is known as the

Your Answer: significance level.

Significance level is the maximum allowable probability of committing a


Type I statistical error.

9. In order to test the claim that the proportion of republican voters in a


particular city is less than 60 percent, a random sample of 150 voters was

selected and found to consist of 54 percent Republicans. What is the p-
value for this sample?

Your Answer: 0.0025


Correct Answer: 0.0668
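
The 0.0668 can be reproduced with a short calculation (a sketch assuming SciPy; the hypothesized proportion 0.60 is used in the standard error):

from math import sqrt
from scipy.stats import norm

p_hat, p0, n = 0.54, 0.60, 150
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # (-0.06) / 0.04 = -1.5
p_value = norm.cdf(z)                       # lower-tailed test: P(Z < -1.5) ≈ 0.0668
print(round(z, 2), round(p_value, 4))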

10. Which of the following is not one of the conditions for using the t-
distribution.

Your Answer: σ is unknown.


Correct Answer: The hypothesis test is one-tailed.

The t-distribution must be used if the sample size is small and σ is


unknown but cannot be used if the population is not normal. Whether or
not the hypothesis is one-tailed is of no consequence.

11. Which of the following can be used to decrease both α and β?

Your Answer: Decrease the desired confidence.


Correct Answer: Increase the sample size.

Increasing the sample size will decrease both α and β.

12. The probability that a hypothesis test will reject the null hypothesis when
the null hypothesis is false is called

Your Answer: error.


Correct Answer: power.

Power is the probability that the hypothesis test will reject the null
hypothesis when the null hypothesis is false

13. What is the z-value for the test statistic for the following hypothesis

test? ; n = 60; the sample proportion is


0.085 and α = 0.10.

Your Answer: -0.1353

In one-way analysis of variance, the null hypothesis assumes that at least two of the population means are different.

Your Answer: False

In one-way analysis of variance, it is the alternative hypothesis that assumes that at least two of the population means are different.

2. One-way analysis of variance requires that the sample sizes for the levels be equal to one another.

Your Answer: True

Correct Answer: False

One-way analysis of variance does not require that the sample sizes for the levels be equal to one another.

3. The following table is a random sample of daily withdrawal of


cash (in $1000), from four branches of a bank, located in
different areas of a city.

Branch
A B C D
113 120 132 122
121 127 130 118
117 125 129 125
110 129 135 125

The results of a one-way ANOVA are reported below.

ANOVA
Source of variation    SS        df    MS          F
Between                544.25     3    181.4167
Within                 167.5     12    13.95833
Total                  711.75    15

If we wish to conduct a one-way ANOVA, the proper test


statistic will be a t test of all pair-wise means in the table.

Your Answer: True


Correct Answer: False

4. In Question 3, the value of the test statistic is F = SSB / SSW = 544.25 / 167.5 = 3.249.

Your Answer: True


Correct Answer: False

The test statistic F = MSB/ MSW


F = 181.4167 / 13.95833 = 12.997.

5. In Question 3, the critical value of F (D1 = 3, D2 = 12, α = 0.05) is 3.490.

Your Answer: False


Correct Answer: True

D1 = k - 1 = 4 - 1 = 3, D2 = n − k = 16 − 4 = 12;
at α = 0.05, the critical F = 3.490.

6. In question 3, because the test statistic F = 12.997 > critical


value = 3.490, we reject the null hypothesis, and conclude
that all means are different.

Your Answer: True


Correct Answer: False

By rejecting the null hypothesis, we conclude that not all the


populations have the same mean.
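
Questions 4 through 6 can be verified with a short sketch (assuming SciPy is available); the four lists below are the branch withdrawals from the table in Question 3:

from scipy.stats import f_oneway, f

branch_a = [113, 121, 117, 110]
branch_b = [120, 127, 125, 129]
branch_c = [132, 130, 129, 135]
branch_d = [122, 118, 125, 125]

f_stat, p_value = f_oneway(branch_a, branch_b, branch_c, branch_d)
f_critical = f.ppf(0.95, dfn=3, dfd=12)  # upper 5% point of F with (3, 12) degrees of freedom

# f_stat is about 12.997 and f_critical about 3.490, so the null hypothesis is rejected
print(round(f_stat, 3), round(f_critical, 3), round(p_value, 4))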

7. A randomized block ANOVA was performed on the differences
in price of a gallon of gasoline in three cities (A, B, and C),
where blocks represent the type of gasoline (regular, special,
extra, and super).

City
A B C
Regular 1.58 1.60 1.59
Special 1.84 1.68 1.90
Extra 1.44 1.45 1.50
Super 1.33 1.50 1.55

The results of the randomized block design test are shown below.

ANOVA
Source      SS         df    MS         F          P-value
Rows        .238467     3    .079489    13.108     .00482
Columns     .01835      2    .009175    1.513055   .2937
Error       .036383     6    .006064
Total       .2932      11

Was blocking necessary? Yes, because the p-value for rows


(block) is 0.00482.

Your Answer: (blank)

8. In Question 7, we can conclude that we have sufficient


evidence that the average prices of gasoline in the three cities
differ.

Your Answer: False

9. The Tukey-Kramer multiple comparisons procedure is used to


determine where the population differences occur for a
randomized block ANOVA design.

Your Answer: True


Correct Answer: False

Tukey-Kramer is used to determine where the population
differences occur for a one-way ANOVA design. Fisher’s Least
Significant Difference is used for a randomized block ANOVA
design.

10. All analysis of variance models require that the data


measurement be at least categorical level.

Your Answer: True


Correct Answer: False

Analysis of variance models require that the data


measurement be interval or ratio level.

When performing a chi-square


goodness-of-fit test, a large
value of the chi-square
statistic provides evidence
that the null hypothesis should
be rejected.

Your Answer: False


Correct Answer: True

The rejection region is always in the upper


tail.

2. The null hypothesis for a chi-square


goodness-of-fit test states that the
population data does not follow the
hypothesized distribution.

Your Answer: False

The null hypothesis for a chi-square


goodness-of-fit test states that the
population does follow the hypothesized
distribution.

3. The degrees of freedom for a chi-square


goodness-of-fit test are calculated as N-1.

Your Answer: True


Correct Answer: False

The degrees of freedom are equal to k-1
where k is the number of categories or
observed cell frequencies.

4. The chi-square goodness-of-fit test is used


to test discrete distributions only.

Your Answer: True


Correct Answer: False

The chi-square goodness-of-fit test can be


used with both discrete and continuous data
but the continuous data must be grouped
into categories before the statistic can be
calculated.

5. The Jerome Light Bulb Company recently


conducted a statistical test to determine
whether the number of hours that light bulbs
last is normally distributed with a mean of
500 and a standard deviation of 20. A
sample of 300 light bulbs was divided into 8
groups to form a grouped data frequency
distribution. The degrees of freedom for the
test will be 299.

Your Answer: False

The degrees of freedom are k - 1 = 8 - 1 = 7.

6. Contingency analysis helps to make


decisions when multiple means are involved.

Your Answer: True


Correct Answer: False

Contingency analysis helps to make


decisions when multiple proportions are
involved.

7. In contingency analysis, we expect the


actual frequencies in each cell to
approximately match the corresponding
expected cell frequencies when the
characteristics are independent.

Your Answer: True

The values for the expected cell frequencies


assume that the characteristics are
independent.

8. If a contingency analysis test is performed


with a 4 × 5 design, the number of degrees of freedom for
determining the chi-square critical value is 20 – 1 = 19.

Your Answer: False

Degrees of freedom are calculated as (r-


1)×(c-1) =(4-1)×(5-1)=12.

9. If any of the expected cell frequencies are


less than 5, categories can be combined so
that all expected frequencies are at least 5.

Your Answer: True

If any of the expected cell frequencies are


less than 5, categories can be combined so
that all expected frequencies are at least 5.

10. A survey was conducted in which males and


females were asked whether they owned a
laptop personal computer. The following data
were observed.

To test whether having a laptop is


independent of gender, the expected cell
frequency for males who have a laptop is
106.67.

Your Answer: True

The expected frequency is (Row


total)(Column total)/(Grand total) =
(200)(160)/300=106.67.
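
The arithmetic in this answer is simply (row total × column total) / grand total; a one-line check (using the totals quoted above, since the full table is not reproduced here):

row_total, column_total, grand_total = 200, 160, 300
expected = row_total * column_total / grand_total  # expected count under independence
print(round(expected, 2))  # 106.67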

A high correlation between two independent variables such that the two variables contribute
redundant information to the model is known as

Your Answer: multicollinearity .

Which of the following choices stepwise regression procedures offers a


means of observing multicollinearity problems because we can see how
the regression model changes as each new variable is added to it?

Your Answer: Forward selection


Correct Answer: Standard stepwise regression

Consider the following stepwise regression procedure. All variables are forced into the model to
begin the process. Variables are removed one at a time until no more insignificant variables are
found. Once a variable has been removed from the model, it cannot be reentered. This procedure
is known as

Your Answer: forward selection.


Correct Answer: backward elimination.

A forecasting model of the following form was developed:

Which of the following best describes the form of this model?

Your Answer: 3rd order polynomial model

Interaction exists in a multiple regression model when

Your Answer: one independent variable affects the relationship between another independent
variable and a dependent variable.
Which of the following methods is used to help assess whether the regression model meets the assumption of having normally distributed residuals?

Develop a histogram of the standardized residuals.

Develop a normal probability plot of the residuals.

Develop a histogram of the residuals.

All of the above.

Which of the following statements is true?
Dummy variables are always assigned the value zero or one.

The number of dummy variables is always one fewer than the number of
categories.

Dummy variables are used to incorporate categorical variables into a


regression model.

All of the above

In a multiple regression model, the regression slope coefficients measure the average change in
the dependent variable for a one-unit change in all the independent variables.

Your Answer: False

The regression slope coefficients measure the average change in the dependent variable for a one-
unit change in the independent variable, while all other independent variables remain constant.

In a multiple regression model, the sample size must be at least one greater than the number of
independent variables. However, it is recommended that the sample size should be at least four
times the number of independent variables.

Correlation coefficients are the quantitative measure used to determine the strength of the linear
relationship between two variables.

The variance inflation factor measures multicollinearity in the regression model. The analysis of
variance F-test is a method for testing whether the overall model is significant.
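
For reference (a standard definition, not quoted from the text): the variance inflation factor for predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the remaining predictors; values far above 1 (rules of thumb often use 5 or 10) flag multicollinearity. A minimal sketch assuming statsmodels and pandas are installed, with purely synthetic data:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + rng.normal(scale=0.1, size=100)  # nearly a copy of x1, so highly collinear
X = pd.DataFrame({"const": 1.0, "x1": x1, "x2": x2, "x3": x3})

for i, name in enumerate(X.columns):
    # VIFs for x1 and x3 will be large; x2 stays near 1
    print(name, round(variance_inflation_factor(X.values, i), 2))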

The variance inflation factor is an indication of the significance of the regression model.

Your Answer: True


Correct Answer: False

The R-Squared value is a measure of the percentage of explained variation in the dependent
variable that takes into account the relationship between the sample size and the number of
independent variables in the regression model.

Your Answer: False

The Adjusted R-Squared value is a measure of the percentage of explained variation in the
dependent variable that takes into account the relationship between the sample size and the
number of independent variables in the regression model.

A complete polynomial model contains terms of all orders less than or equal to the pth order.

Your Answer: True


A second-order regression model produces a parabola which opens up or down. -- TRUE

The coefficient of partial determination is a measure of the marginal contribution of each


independent variable, given that other independent variables are in the model.

Your Answer: True

The least squares method minimizes which of the following?

Your Answer: total sum of squares


Correct Answer: sum of squared residuals

In the regression model, both the x and y variable are considered to be random variables.

Your Answer: False

The range of possible values for the correlation coefficient is 0 to 1.

Your Answer: True


Correct Answer: False

If the dependent variable decreases as the independent variable increases, the


correlation coefficient is negative. The range is -1 to 1.

The t-test for determining whether the population correlation is


significantly different from 0.0 requires that the data are at
least categorical level.

Your Answer: True


Correct Answer: False

The t-test for determining whether the population correlation is significantly


different from 0.0 requires that the data are at least interval or ratio level.
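
For reference, the test statistic behind this t-test (a standard formula, shown with made-up numbers) is t = r·sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom:

from math import sqrt
from scipy.stats import t

r, n = 0.45, 30  # hypothetical sample correlation and sample size
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
p_value = 2 * t.sf(abs(t_stat), df=n - 2)  # two-tailed p-value for H0: rho = 0
print(round(t_stat, 3), round(p_value, 4))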

The null hypothesis in a two-tailed significance test for the correlation is H0: ρ = 0.

Your Answer: True

Researchers tested to see if there is a correlation between drinking Beverage A and cancer. If they
found the correlation to be 0.9, they can assume that drinking Beverage A causes cancer.

Your Answer: True


Correct Answer: False

In a linear regression model, the actual y values for each level of x are
uniformly distributed around the mean of y.

Your Answer: True


Correct Answer: False

The actual y values are normally distributed around the mean of y.

The regression line passes through the point (x̄, ȳ).

Your Answer: True

β0 and β1 are considered sample statistics.

Your Answer: True


Correct Answer: False

β0 and β1 are considered population


parameters. b0 and b1 are sample statistics.

If the correlation coefficient is negative, the slope of the
regression line will also be negative.

Your Answer: False


Correct Answer: True

The sign of the correlation coefficient and the slope will


always be the same.

The residual in a regression model is defined as the


difference between the average value and the predicted
value of the dependent variable for a given level of the
independent variable.

Your Answer: True


Correct Answer: False

The residual in a regression model is defined as the difference between the actual value and the
predicted value of the dependent variable for a given level of the independent variable.

The chi-square goodness-of-fit test can be used to determine whether the sample data come from
a normally distributed population.

a uniformly distributed population.

a binomially distributed population.

all of the above.

A researcher is interested in determining whether or not a set of data follows the


normal distribution but he is uncertain of the mean and standard deviation and
will estimate the mean and standard deviation using the sample data. He has
decided to group the data into six categories. The degrees of freedom are equal
to ____ .

Your Answer: 6
Correct Answer: 3

For every parameter that is estimated using sample data, you lose one additional degree of
freedom. In this case both the mean and the standard deviation must be estimated from the
sample data, so the degrees of freedom are k - 1 - 2 = 6 - 1 - 2 = 3, where k = 6 is the number of
categories.

When performing a chi-


square goodness-of-fit
test, a large value of the
chi-square statistic
provides evidence that
the null hypothesis
should be rejected.

Your Answer: False


Correct Answer: True

The rejection region is always in the upper tail.

2. The null hypothesis for a chi-square


goodness-of-fit test states that the
population data does not follow the
hypothesized distribution.

Your Answer: True


Correct Answer: False

The alternative hypothesis for a chi-square


goodness-of-fit test states that the

population does not follow the hypothesized
distribution.

3. The degrees of freedom for a chi-square


goodness-of-fit test are calculated as N-1.

Your Answer: True


Correct Answer: False

The degrees of freedom are equal to k-1


where k is the number of categories or
observed cell frequencies.

4. The chi-square goodness-of-fit test is used


to test discrete distributions only.

Your Answer: True


Correct Answer: False

The chi-square goodness-of-fit test can be


used with both discrete and continuous data
but the continuous data must be grouped
into categories before the statistic can be
calculated.

5. The Jerome Light Bulb Company recently


conducted a statistical test to determine
whether the number of hours that light bulbs
last is normally distributed with a mean of
500 and a standard deviation of 20. A
sample of 300 light bulbs was divided into 8
groups to form a grouped data frequency
distribution. The degrees of freedom for the
test will be 299.

Your Answer: False

The degrees of freedom are k - 1 = 8 - 1 = 7.

6. Contingency analysis helps to make


decisions when multiple means are involved.

Your Answer: False

Contingency analysis helps to make
decisions when multiple proportions are
involved.

7. In contingency analysis, we expect the


actual frequencies in each cell to
approximately match the corresponding
expected cell frequencies when the
characteristics are independent.

Your Answer: True

The values for the expected cell frequencies


assume that the characteristics are
independent.

8. If a contingency analysis test is performed


with a 4 × 5 design, the number of degrees of freedom for
determining the chi-square critical value is 20 – 1 = 19.

Your Answer: True


Correct Answer: False

Degrees of freedom are calculated as (r-


1)×(c-1) =(4-1)×(5-1)=12.

9. If any of the expected cell frequencies are


less than 5, categories can be combined so
that all expected frequencies are at least 5.

Your Answer: False


Correct Answer: True

If any of the expected cell frequencies are


less than 5, categories can be combined so
that all expected frequencies are at least 5.

10. A survey was conducted in which males and


females were asked whether they owned a
laptop personal computer. The following data
were observed.

To test whether having a laptop is
independent of gender, the expected cell
frequency for males who have a laptop is
106.67.

Your Answer: True

The expected frequency is (Row


total)(Column total)/(Grand total) =
(200)(160)/300=106.67.
