
Multiple Linear Regression

Content

• Testing for Overall Significance (Overall Fit of the Model)
• Testing for Individual Regression Coefficients
Important terms
› R²
› Hypothesis Testing
› Significance level
› Degrees of freedom
› One-tailed and Two-tailed Tests
R²
› It is the coefficient of determination.
› It measures how good your model is compared to a baseline model that
always predicts the mean of the dependent variable.
› R-squared is a statistical measure in a regression model that determines the
proportion of variance in the dependent variable that can be explained by
the independent variables. In other words, R-squared shows how well the data fit
the regression model (the goodness of fit).
› The most common interpretation of R-squared is how well the regression model
explains the observed data. For example, an R-squared of 60% indicates that 60% of
the variability observed in the target variable is explained by the regression
model. Generally, a higher R-squared means more of the variability is explained by
the model.
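As a minimal sketch, R-squared can be computed directly from observed values and model predictions; the NumPy arrays below are made-up toy numbers, not data from these slides:

import numpy as np

# Hypothetical observed values and model predictions (toy data).
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(f"R-squared: {r_squared:.3f}")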
R²
› The goodness of fit of regression models can be analyzed on the
basis of R-squared. The closer the value of R-squared is to 1, the
better the model.
› The value of R-squared can be negative when the fitted model is
worse than simply predicting the mean of the dependent variable.
R-Squared vs Adjusted R-Squared
› Adjusted R-squared is a modified version of R-squared that takes
the number of independent variables into account.
› The main problem with R-squared is that its value always
increases when an independent variable is added, regardless of
whether that variable actually contributes to the model or not.
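A small sketch of this correction, using the standard formula R²_adj = 1 − (1 − R²)(N − 1)/(N − k − 1); the numbers passed in are hypothetical:

def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Adding variables can raise R-squared slightly yet lower adjusted R-squared.
print(adjusted_r_squared(0.60, n=50, k=2))  # ~0.583
print(adjusted_r_squared(0.61, n=50, k=5))  # ~0.566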
Hypothesis Testing
Hypothesis testing is done to confirm our observations
about the population using sample data, within the
desired error level. Through hypothesis testing, we can
determine whether we have enough statistical evidence
to conclude that the hypothesis about the population is
true or not.

When we fit a straight line through a linear regression
model, we get the slope and intercept of the line.
Hypothesis testing is used to confirm whether our beta
coefficients are significant in a linear regression model.

The key steps to perform a hypothesis test are as follows
(a worked sketch appears after this list):

• Formulate a hypothesis
• Determine the significance level
• Determine the type of test
• Calculate the test statistic and the p-value
• Make a decision
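These steps map naturally onto fitting a regression with, for example, statsmodels; the data below is simulated purely for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # two independent variables
y = 1.5 + 2.0 * X[:, 0] + rng.normal(size=100)   # only the first one matters

model = sm.OLS(y, sm.add_constant(X)).fit()      # fit with an intercept

print(model.tvalues)   # test statistic for each beta coefficient
print(model.pvalues)   # p-values, compared against the significance level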

Formulating Hypotheses
› One of the key steps is to formulate the following two hypotheses:
› The null hypothesis, represented as H₀, is the initial claim that is based on the
prevailing belief about the population.
› The alternative hypothesis, represented as H₁, is the challenge to the null
hypothesis. It is the claim we would like to prove true.

› In a regression setting, the null claim is that there is no relationship between
the dependent variable y and the independent variable xᵢ, i.e. the regression
coefficient βᵢ is zero. This is the null hypothesis in an individual regression
coefficient test: H₀: βᵢ = 0.
Significance level
› In regression analysis, the significance level (often denoted α) is a threshold
used to determine whether the coefficients of the independent variables in the
model are statistically significant. It tells you the probability of
incorrectly rejecting the null hypothesis when it is actually true, i.e. alpha
represents an acceptable probability of a Type I error.
› For example, if we choose a significance level of 0.05 (commonly used), it means
we are willing to accept a 5% chance of incorrectly rejecting the null hypothesis.
› So, if the absolute value of the t statistic is greater than the critical value
corresponding to α = 0.05, we reject the null hypothesis and conclude that the
coefficient is statistically significant (see the sketch below).
› In practice, the most commonly used alpha values are 0.01, 0.05, and 0.1, which
represent a 1%, 5%, and 10% chance of a Type I error, respectively.
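A minimal sketch of this decision rule with SciPy; the t statistic and degrees of freedom below are hypothetical values, not computed from real data:

from scipy import stats

alpha = 0.05
df = 8          # degrees of freedom of the model (hypothetical)
t_stat = 2.9    # t statistic for a coefficient (hypothetical)

# Two-tailed critical value: reject H0 if |t| exceeds it.
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(f"critical value: {t_crit:.3f}")  # ~2.306 for df = 8
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")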
P-Value
› A p-value is a metric that expresses the likelihood that a difference at least as
large as the one observed could have occurred by chance if the null hypothesis
were true. As the p-value decreases, the statistical significance of the observed
difference increases. If the p-value falls below the significance level, you
reject the null hypothesis.
› E.g. you are trying to test whether a new advertising campaign has increased
the product's sales. The null hypothesis states that there is no change in
sales due to the new advertising campaign.
› If the p-value is 0.30, then there is a 30% chance of observing a sales change
this large even if the campaign had no effect. If the p-value is 0.03, there is
only a 3% chance of observing such a change under the null hypothesis. As you
can see, the lower the p-value, the stronger the evidence against the null
hypothesis, i.e. the stronger the evidence that the new advertising campaign
changed sales.
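Continuing the hypothetical numbers from the previous sketch, the two-tailed p-value can be computed from the t distribution:

from scipy import stats

t_stat, df = 2.9, 8  # hypothetical values
# Two-tailed p-value: probability of a statistic at least this extreme under H0.
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"p-value: {p_value:.4f}")  # ~0.02, below alpha = 0.05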
Degrees of Freedom
› Degrees of freedom refer to the maximum number of logically
independent values that may vary in a data sample.
› In regression analysis, the degrees of freedom (df) represent the number
of independent pieces of information available for estimating a
parameter.
› The degrees of freedom in regression analysis depend on the number of
observations (N), the number of independent variables in the model (k),
and any constraints imposed on the model.
Degrees of Freedom
› Suppose we have a simple linear regression model:
› Yᵢ = β₀ + β₁Xᵢ + εᵢ
• Yᵢ is the dependent variable.
• Xᵢ is the independent variable.
• β₀ and β₁ are the intercept and slope coefficients, respectively.
• εᵢ is the error term.
› In this example, there are two parameters to estimate: β₀ and β₁.
• The degrees of freedom in this case would be df = N − k − 1, where N is the
number of observations and k is the number of independent variables (excluding
the intercept).
Degrees of Freedom
› Suppose we have data on the heights (X) and weights (Y) of 10
individuals. We want to fit a simple linear regression model to predict
weight from height.
• Number of observations (N) = 10
• Number of independent variables (k) = 1 (height)
• Intercept (β₀) and slope (β₁) are the parameters to estimate.
› Therefore, the degrees of freedom would be df = 10 − 1 − 1 = 8.
› This means that in this regression model, there are 8 degrees of freedom
available for estimating the parameters. It is essentially the number of
data points that provide independent information for parameter
estimation after accounting for the constraints imposed by the model.
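The residual degrees of freedom can also be read off a fitted model; a quick check with statsmodels, where the heights and weights are simulated stand-ins for the example above:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
height = rng.normal(170, 10, size=10)              # N = 10 observations
weight = 0.9 * height + rng.normal(0, 5, size=10)

model = sm.OLS(weight, sm.add_constant(height)).fit()  # k = 1 plus intercept
print(model.df_resid)  # 8.0 = N - k - 1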
One-tailed and Two-tailed Tests
› A one-tailed test and a two-tailed test are two types of hypothesis tests
used in statistical analysis to assess the significance of a relationship or
difference in a population parameter.
› A one-tailed test results from an alternative hypothesis that specifies
a direction, i.e. the alternative hypothesis states that the parameter
is either bigger or smaller than the value specified in the null
hypothesis.
› A two-tailed test results from an alternative hypothesis that does not
specify a direction.
One-tailed Tests
› A one-tailed test may be either left-tailed or right-tailed.
› A left-tailed test is used when the alternative hypothesis states that the
true value of the parameter is less than the null hypothesis claims.
› A right-tailed test is used when the alternative hypothesis states that the
true value of the parameter is greater than the null hypothesis claims.
› E.g. a light bulb manufacturer is only interested in whether the
mean lifetime of an energy-saving light bulb is less than 60 days.
– H₀: The mean lifetime of an energy-saving light bulb is 60 days.
– H₁: The mean lifetime of an energy-saving light bulb is less than
60 days.
› We have a "less than" in the alternative hypothesis. This means that we
will perform a left-tailed test (a code sketch follows below).
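As an illustration, this left-tailed test can be run as a one-sample t test with SciPy (the alternative argument requires SciPy 1.6 or newer); the lifetimes below are invented:

import numpy as np
from scipy import stats

lifetimes = np.array([57, 62, 55, 58, 60, 54, 59, 61, 56, 53])  # hypothetical

# Left-tailed test: H1 says the mean lifetime is less than 60 days.
t_stat, p_value = stats.ttest_1samp(lifetimes, popmean=60, alternative='less')
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")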
Two-tailed Tests
› The main difference between one-tailed and two-tailed tests is that one-
tailed tests will only have one critical region whereas two-tailed tests will
have two critical regions.

› E.g. a light bulb manufacturer claims that its energy-saving light bulbs
last an average of 60 days. Set up a hypothesis test to check this claim
and comment on what sort of test we need to use.
› So we have:
– H₀: The mean lifetime of an energy-saving light bulb is 60 days.
– H₁: The mean lifetime of an energy-saving light bulb is not 60 days.
› Because of the "is not" in the alternative hypothesis, we have to consider
both the possibility that the lifetime of the energy-saving light bulb is
greater than 60 days and that it is less than 60 days. This means we have to
use a two-tailed test.
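The same invented lifetimes, tested two-tailed this time; extreme values in either direction now count as evidence against H₀:

import numpy as np
from scipy import stats

lifetimes = np.array([57, 62, 55, 58, 60, 54, 59, 61, 56, 53])  # hypothetical

# Two-tailed test: H1 says the mean lifetime is not 60 days.
t_stat, p_value = stats.ttest_1samp(lifetimes, popmean=60,
                                    alternative='two-sided')
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")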
One-tailed Tests & Two-tailed Tests
Testing for Significance
› The F test is used to determine whether a significant
relationship exists between the dependent variable and the set
of all the independent variables; we will refer to the F test as the
test for overall significance.
› If the F test shows overall significance, the t test is then used to
determine whether each of the individual independent variables is
significant.
› A separate t test is conducted for each of the independent
variables in the model; we refer to each of these t tests as a test
for individual significance.
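A sketch of both tests on one fitted model with statsmodels; everything here is simulated for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = 4.0 + 1.2 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()

# Overall significance: one F test on all coefficients jointly.
print(f"F = {model.fvalue:.2f}, p = {model.f_pvalue:.4g}")

# Individual significance: a separate t test per coefficient.
print(model.pvalues)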
Testing for Significance
F Statistic
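The standard definition of this statistic (written out here from the usual convention, not reproduced from the slide) is

F = MSR / MSE = (SSR / k) / (SSE / (n − k − 1))

where SSR is the regression (explained) sum of squares, SSE is the error (residual) sum of squares, k is the number of independent variables, and n is the number of observations. Under H₀: β₁ = β₂ = … = βₖ = 0, F follows an F distribution with k and n − k − 1 degrees of freedom, so a large F value is evidence of overall significance.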
