
Chapter 5

Analysis of Variance (ANOVA)

Upon completion of this unit, you will be able to explain:

 What is ANOVA?

 How does ANOVA work?

 Types of ANOVA

 ANOVA assumptions

 Why use ANOVA?

 Benefits of ANOVA for businesses

 ANOVA examples: When might you use it?

 How to conduct an ANOVA test

 ANOVA analysis

 What are the limitations of ANOVA?

Introduction to Analysis of Variance

In this lesson, we introduce Analysis of Variance, or ANOVA.


ANOVA is a statistical method that analyzes variances to determine
whether the means of more than two populations are the same. In other
words, we have a quantitative response variable and a categorical
explanatory variable with more than two levels. In ANOVA, the
categorical explanatory variable is typically referred to as the factor.

What is analysis of variance (ANOVA)?

Analysis of Variance (ANOVA) is a statistical method used to
compare variances across the means (or averages) of different
groups. It is used in a range of scenarios to determine whether
there is any difference between the means of the groups, and it is
a powerful tool for businesses.

ANOVA, or Analysis of Variance, is a test used to determine
differences between research results from three or more unrelated
samples or groups.

You might use ANOVA when you want to test a particular
hypothesis between groups, determining – when using one-way
ANOVA – the relationship between one independent variable and one
quantitative dependent variable.

An example could be examining how the level of employee training
impacts customer satisfaction ratings. Here the independent
variable is the level of employee training; the quantitative dependent
variable is customer satisfaction.

You would use ANOVA to compare how employees of different
training levels – for example, beginner, intermediate and
advanced – perform, with the null hypothesis for the test being that
they have the same customer satisfaction ratings. If there is a
statistically significant result, the null hypothesis is
rejected – meaning the employee groups performed differently.
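
To make this concrete, here is a minimal sketch of the training-level test in Python, assuming SciPy is available; the satisfaction ratings are invented for illustration.

from scipy import stats

# Invented customer satisfaction ratings for each training level
beginner = [3.1, 3.4, 2.9, 3.2, 3.0]
intermediate = [3.8, 3.6, 3.9, 3.5, 3.7]
advanced = [4.5, 4.2, 4.4, 4.6, 4.3]

# H0: all three training levels have the same mean satisfaction rating
f_stat, p_value = stats.f_oneway(beginner, intermediate, advanced)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject H0: the training-level groups differ in mean satisfaction.")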

The key word in ‘Analysis of Variance’ is the last one. ‘Variance’
represents the degree to which the numerical values of a particular
variable deviate from its overall mean.

ANOVA Terminology

Dependent variable: This is the item being measured that is
theorized to be affected by the independent variables.

Independent variable: These are the items that may have an effect
on the dependent variable.

A null hypothesis (H0): This states that there is no difference
between the groups or means. Depending on the result of the ANOVA
test, the null hypothesis will either be rejected or not rejected.

An alternative hypothesis (H1): This states that there is a
difference between the groups or means.

Factors and levels: In ANOVA terminology, an independent variable
that affects the dependent variable is called a factor. A level
denotes one of the different values of the independent variable used
in an experiment.

Types of ANOVA

There are various approaches to using ANOVA for your data
analysis. Here’s an introduction to some of the most common ones.

One-way ANOVA

One-way ANOVA is the simplest form of the test – examining differences
between three or more groups based on one independent variable.
For example, comparing the sales performance of different stores in
a retail chain.

Two-way ANOVA

Used when there are two independent variables, two-way ANOVA
allows for the evaluation of the individual and joint effects of the
variables. For example, it could be used to understand the impact
of both advertising spend and product placement on sales revenue.

What’s the difference between one-way and two-way ANOVA tests?

This is defined by how many independent variables are included in
the ANOVA test. One-way means the analysis of variance has one
independent variable; two-way means the test has two independent
variables.
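
As a rough illustration, the following Python sketch fits a two-way ANOVA with statsmodels on invented advertising-spend and product-placement data; the column names and effect sizes are assumptions made up for this example.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Build an invented dataset: 10 sales observations per cell
rng = np.random.default_rng(0)
rows = []
for spend in ["low", "high"]:
    for placement in ["shelf", "endcap"]:
        base = 100 + (20 if spend == "high" else 0) + (10 if placement == "endcap" else 0)
        for _ in range(10):
            rows.append({"spend": spend, "placement": placement,
                         "sales": base + rng.normal(0, 5)})
df = pd.DataFrame(rows)

# "C(spend) * C(placement)" fits both main effects and their interaction
model = ols("sales ~ C(spend) * C(placement)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))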

ANOVA assumptions

Like other types of statistical methods, ANOVA compares the means
of different groups and shows you whether there are any statistical
differences between the means. ANOVA is classified as an omnibus
test statistic. This means that it can’t tell you which specific groups
were statistically significantly different from each other, only that at
least two of the groups were.

ANOVA relies on three main assumptions that must be met for the
test results to be valid.

Normality

The first assumption is that the groups each fall into what is called
a normal distribution. This means that the groups should have a
bell-curve distribution with few or no outliers.

Homogeneity of variance

Also known as homoscedasticity, this means that the variances
of the groups are the same.

Independence

The final assumption is that each observation is independent of the
others. This means, for example, that unlike in a conjoint analysis,
the same person shouldn’t be measured multiple times.
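
Before running an ANOVA, the first two assumptions can be checked informally in code. Below is a minimal sketch using SciPy's Shapiro-Wilk and Levene tests on invented group data; independence cannot be tested numerically and must come from the study design.

from scipy import stats

group_a = [3.1, 3.4, 2.9, 3.2, 3.0]
group_b = [3.8, 3.6, 3.9, 3.5, 3.7]
group_c = [4.5, 4.2, 4.4, 4.6, 4.3]

# Normality: Shapiro-Wilk per group (H0: the group is normally distributed)
for name, g in [("A", group_a), ("B", group_b), ("C", group_c)]:
    _, p = stats.shapiro(g)
    print(f"group {name}: Shapiro-Wilk p = {p:.3f}")

# Homogeneity of variance: Levene's test (H0: all variances are equal)
_, p = stats.levene(group_a, group_b, group_c)
print(f"Levene p = {p:.3f}")
# Independence is a property of the study design (each subject measured
# once) and cannot be verified from the numbers alone.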
Why use ANOVA?

ANOVA is a versatile and powerful statistical technique, and an
essential tool when researching multiple groups or categories. A
one-way ANOVA can help you know whether or not there are
significant differences between the means of the groups of your
independent variable.

Why is that useful? Because when you understand how the means
of each group in your independent variable differ, you can begin to
understand which of them has a connection to your dependent
variable (such as landing page clicks) and begin to learn what is
driving that behavior.

You could also repeat this test multiple times to see whether or not
a single independent variable (such as temperature) affects multiple
dependent variables (such as purchase rates of sun cream,
attendance at outdoor venues and likelihood to hold a cook-out)
and if so, which ones.

Benefits of ANOVA for businesses

ANOVA has a wide range of applications in research across
numerous fields, from social sciences to medicine, and from industrial
research to marketing.

Its unique benefits make ANOVA particularly valuable to
businesses. Here are its three main use cases in the business
world.
Informing decision making

Businesses can use ANOVA to inform decisions about product
development, marketing strategies and more.

Allocating resources

By identifying which variables have the most significant impact on a
particular outcome, businesses can better allocate resources to
those areas.

Understanding different variables

ANOVA doesn’t just tell you that differences exist between groups –
it can also reveal the interaction between different variables. This
can help businesses better understand complex relationships and
dynamics, leading to more effective interventions and strategies.

ANOVA application

ANOVA can be used in situations where the researcher is
interested in the differences in sample means across three or more
categories.

ANOVA examples: When might you use it?

Here’s how different types of ANOVA test can be used to solve
different questions a business could face.

Does the geographical region have an effect on the sales
performance of a retail chain?

A one-way ANOVA can be used to answer this question, as you have
one independent variable (region) and one dependent variable (sales
performance).

You’ll need to collect data for the different geographical regions where
your retail chain operates – for example, the USA’s Northeast,
Southeast, Midwest, Southwest and West regions. A one-way
ANOVA can then assess the effect of these regions on your
dependent variable (sales performance) and determine whether
there is a significant difference in sales performance across these
regions.

Does the time of year and type of product have an effect on the
sales of a company?

To answer this question, a two-way ANOVA can be used, as you
have two independent variables (time of year and product type) and
one dependent variable (sales).

A two-way ANOVA can simultaneously assess the effect of
these variables on your dependent variable (sales) and determine
whether there is an interaction effect between the time of the year
and the type of product on the company’s sales.

ANOVA Formula

Analysis of variance, or ANOVA, is a powerful statistical technique
that is used to show the difference between two or more means or
components through significance tests. It also shows us a way to
make multiple comparisons of several population means. The
ANOVA test is performed by comparing two types of variation: the
variation between the sample means and the variation within
each of the samples.

F = MSB / MSW

where:

F = ANOVA coefficient (the F-statistic),

MSB = mean sum of squares between the groups,

MSW = mean sum of squares within the groups.
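
The following sketch computes F = MSB / MSW by hand for three invented groups and confirms the result against SciPy's built-in one-way ANOVA.

import numpy as np
from scipy import stats

groups = [np.array([3.1, 3.4, 2.9, 3.2, 3.0]),
          np.array([3.8, 3.6, 3.9, 3.5, 3.7]),
          np.array([4.5, 4.2, 4.4, 4.6, 4.3])]

k = len(groups)                     # number of groups
n = sum(len(g) for g in groups)     # total number of observations
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups and within groups
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

msb = ssb / (k - 1)   # mean sum of squares between the groups
msw = ssw / (n - k)   # mean sum of squares within the groups
print(f"manual F = {msb / msw:.3f}")

f_scipy, _ = stats.f_oneway(*groups)
print(f"scipy  F = {f_scipy:.3f}")  # should match the manual value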


Why does ANOVA work?

Some people question the need for ANOVA; after all, mean values
can be assessed just by looking at them. But ANOVA does more
than simply compare means.

Even though the mean values of various groups may appear to be
different, this could be due to sampling error rather than the
effect of the independent variable on the dependent variable. If it is
due to sampling error, the difference between the group means is
meaningless. ANOVA helps to find out if the difference in the mean
values is statistically significant.

Questions that ANOVA helps to answer

Organizations use ANOVA to make decisions about which
alternative to choose among many possible options. For example,
ANOVA can help to:

 Compare the yield of two different wheat varieties under three
different fertilizer brands.

 Compare the effectiveness of various social media advertisements
on the sales of a particular product.

 Compare the effectiveness of different lubricants in different types
of vehicles.

Limitations of ANOVA

ANOVA can only tell you whether there is a significant difference
between the means of at least two groups; it can’t explain which pair
differs in their means. If more granular detail is needed, follow-up
statistical procedures, known as post-hoc tests, can identify which
groups differ in mean value. Typically, ANOVA is used in
combination with other statistical methods.

ANOVA also assumes that the data in each group are normally
distributed with equal variances, as it compares means only.
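
A common follow-up is a post-hoc test such as Tukey's HSD, which identifies which specific pairs of groups differ. Here is a hedged sketch using statsmodels on the invented training-level data from earlier; the group labels are illustrative.

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Invented ratings, five per training level
values = np.array([3.1, 3.4, 2.9, 3.2, 3.0,    # beginner
                   3.8, 3.6, 3.9, 3.5, 3.7,    # intermediate
                   4.5, 4.2, 4.4, 4.6, 4.3])   # advanced
labels = ["beginner"] * 5 + ["intermediate"] * 5 + ["advanced"] * 5

# Prints one row per pair of groups, with a reject=True/False column
print(pairwise_tukeyhsd(values, labels, alpha=0.05))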
Chapter 6
Correlation and Simple Linear Regression
Introduction

Correlation and regression are statistical methods that are
commonly used in the literature to associate two or
more variables. Although frequently confused, they are quite
different. Correlation measures the association between two
variables and quantifies the strength of their relationship.
Correlation evaluates only the existing data. Regression uses the
existing data to define a mathematical equation which can be used
to predict the value of one variable based on the value of one or
more other variables, and can therefore be used to interpolate
between the existing data. The regression equation can therefore
be used to predict the outcome of observations not previously seen
or tested.

CORRELATION

Correlation provides a numerical measure of the linear or “straight-
line” relationship between two continuous variables X and Y. The
resulting correlation coefficient, or “r value”, is more formally known
as the Pearson product moment correlation coefficient, after the
mathematician who first described it. X is known as the
independent or explanatory variable, while Y is known as the
dependent or response variable. A significant advantage of the
correlation coefficient is that it does not depend on the units of X
and Y and can therefore be used to compare any two variables
regardless of their units.

An essential first step in calculating a correlation coefficient is to
plot the observations in a “scattergram” or “scatter plot” to visually
evaluate the data for a potential relationship or the presence of
outlying values. It is frequently possible to visualize a smooth curve
through the data and thereby identify the type of relationship
present. The independent variable is usually plotted on the X-axis
while the dependent variable is plotted on the Y-axis. A “perfect”
correlation between X and Y (Figure a) has an r value of 1 (or -1). As
X changes, Y increases (or decreases) in exact proportion,
and we would conclude that X is responsible for 100% of the
change in Y. If X and Y are not related at all (i.e., no correlation)
(Figure b), their r value is 0, and we would conclude that X is
responsible for none of the change in Y.

Figure: a) perfect linear correlation (r = 1); b) no correlation (r = 0); c) positive correlation (0 < r < 1); d) negative correlation (-1 < r < 0)

Types of Correlations
If the data points assume an oval pattern, the r value is somewhere
between 0 and 1, and a moderate relationship is said to exist. A
positive correlation (Figure c) occurs when the dependent variable
increases as the independent variable increases. A negative
correlation (Figure d) occurs when the dependent variable increases
as the independent variable decreases or vice versa.

Perfect correlations (r value = 1 or -1) are rare, especially in
medicine, where physiologic changes are due to multiple
interdependent variables as well as inherent random biologic
variation. Further, the presence of a correlation between two
variables does not necessarily mean that a change in one variable
causes the change in the other variable. Correlation
does not necessarily imply causation.
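
As a small illustration, the Pearson r value for two variables can be computed with SciPy; the X and Y values below are invented.

from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")
# r near +1 indicates a strong positive linear relationship;
# r near -1 a strong negative one; r near 0 no linear relationship.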

REGRESSION

Regression analysis mathematically describes the dependence of
the Y variable on the X variable and constructs an equation which
can be used to predict any value of Y for any value of X. It is more
specific and provides more information than does correlation.
Unlike correlation, however, regression is not scale independent,
and the derived regression equation depends on the units of each
variable involved. As with correlation, regression assumes that each
of the variables is normally distributed with equal variance. In
addition to deriving the regression equation, regression analysis
also draws a line of best fit through the data points of the
scattergram. These “regression lines” may be linear, in which case the
relationship between the variables fits a straight line, or nonlinear,
in which case a polynomial equation is used to describe the
relationship.
Regression (also known as simple regression, linear regression, or
least squares regression) fits a straight line equation of the
following form to the data:

Y = a + bX
where Y is the dependent variable, X is the single independent
variable, a is the Y-intercept of the regression line, and b is the
slope of the line (also known as the regression coefficient).

Once the equation has been derived, it can be used to predict the
change in Y for any change in X. It can therefore be used to
interpolate between the existing data points as well as predict
results which have not been previously observed or tested.
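
As an illustration, here is a minimal sketch that fits Y = a + bX by least squares with SciPy's linregress and uses the derived equation to predict a new, unobserved value; the data are invented.

from scipy import stats

# Invented paired observations of X and Y
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

fit = stats.linregress(x, y)
print(f"Y = {fit.intercept:.2f} + {fit.slope:.2f} * X")  # the fitted equation

# Predict Y for an X value outside the observed data
x_new = 7.0
y_pred = fit.intercept + fit.slope * x_new
print(f"predicted Y at X = {x_new}: {y_pred:.2f}")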

A t test is used to determine whether there is a significant
relationship between X and Y, as in correlation, by testing whether
the regression coefficient, b, is different from the null hypothesis of
zero (no relationship).

Along with the regression equation, slope, and intercept, regression
analysis provides another useful statistic: the standard error of the
slope. Just as the standard error of the mean is an estimate of how
closely the sample mean approximates the population mean, the
standard error of the slope is an estimate of how closely the
measured slope approximates the true slope. It is a measure of the
“goodness of fit” of the regression line to the data and is calculated
using the standard deviation of the residuals.

Residuals represent the difference between the observed value of Y
and that which is predicted by X using the regression equation. If
the regression line fits the data well, the residuals will be small.
Large residuals may point to the presence of outlying data which,
as in correlation, can significantly affect the validity of the
regression equation.
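
A small sketch of inspecting residuals from the fit above, assuming the same invented data: each residual is the observed Y minus the Y predicted by the regression equation.

import numpy as np
from scipy import stats

# Same invented data as in the fitting sketch above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)  # observed minus predicted
print("residuals:", np.round(residuals, 3))

# ddof=2 because two parameters (a and b) were estimated from the data
print(f"std dev of residuals = {residuals.std(ddof=2):.3f}")
print(f"standard error of the slope = {fit.stderr:.3f}")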

The steps in calculating a regression equation are similar to those
for calculating a correlation coefficient. First, a scattergram is
plotted to determine whether the data assume a linear or
nonlinear pattern. If outliers are present, the need for nonlinear
regression, transformation of the data, or non-parametric methods
should be considered. Assuming the data are normally distributed,
the regression equation is calculated. The residuals are then
checked to confirm that the regression line fits the data well. If the
residuals are high, the possibility of non-normally distributed data
should be reconsidered.

When reporting the results of a regression analysis, one should
report not only the regression equation, regression coefficients, and
their significance levels, but also the standard deviation or variance
of each regression coefficient and the variance of the residuals. A
common practice is to “standardize” the regression coefficients by
converting them to the standard normal (z) distribution. This allows
regression coefficients calculated on different scales to be compared
with one another, such that conclusions can be made independent
of the units involved. Confidence bands (similar to confidence
intervals) can also be calculated and plotted along either side of the
regression line to demonstrate the potential variability in the line
based on the standard error of the slope.
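
One way to standardize coefficients, sketched below on invented data, is to z-score both variables before fitting; in simple linear regression the standardized slope then equals Pearson's r, which makes coefficients comparable across units.

import numpy as np
from scipy import stats

# Invented data; z-scoring puts both variables on the standard normal scale
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

fit = stats.linregress(zx, zy)
print(f"standardized slope = {fit.slope:.3f}")
print(f"Pearson r          = {stats.pearsonr(x, y)[0]:.3f}")  # should match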
