Tute - 04
CPM: 19601
MC: 94269
A scatter diagram is a basic statistical tool for displaying the relationship between two variables.
It is frequently combined with a simple linear regression line to fit a model between the two variables.
Even though scatterplots can appear to be a mess, we can often spot trends in the data. The two graphs
on the left, for instance, appear to roughly follow a line: the top graph appears to follow a line with a
positive slope, while the bottom graph appears to follow a line with a negative slope. When we say that
the data in a scatterplot appears to follow a trend, what we're really saying is that it appears to follow
some line, or some other kind of curve, such as an exponential curve or a sinusoidal curve.
No matter the shape of the curve that the data follows, we call it the approximating curve, and the
process of finding the equation of the approximating curve is called curve fitting.
Regression line: Once the points are plotted, it is natural to begin searching for trends in the
scatterplots. In fact, when using scatterplots, we probably spend most of our time identifying trends.
The plot by itself isn't all that useful, but if we can use the plot to spot a pattern in the data, we might
be able to use that pattern to infer meaning from the data or make predictions about it.
A regression line is used most of the time to accomplish this. It is the line in a scatterplot that most
effectively depicts the trend in the data. The terms best-fit line, line of best fit, and least-squares line
all refer to regression lines. The regression line is a trend line that we use to model a linear trend that
we see in a scatterplot, but we must be aware that not all data will exhibit a linear relationship. The
regression curve would be parabolic, for instance, if the relationship resembled a parabola. We'll
primarily concentrate on linear regression for the remainder of this lesson.
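As a minimal sketch of curve fitting in the linear case, the following Python computes the slope and intercept of a least-squares line from the standard closed-form formulas. The function name `fit_line` and the data points are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scatterplot data with an upward trend
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```

A positive slope corresponds to an upward trend like the top graph described above, and a negative slope to a downward trend like the bottom graph.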
2. Discuss 4 practical situations where regression analysis may be used in business decision-making.
One of the most popular statistical methods is linear regression. The relationship between one or more
predictor variables and a response variable is quantified using this method. Simple linear regression, the
most fundamental type of linear regression, is employed to quantify the relationship between a single
predictor variable and a single response variable. The relationship between multiple predictor variables
and a response variable can be quantified using multiple linear regression if we have more than one
predictor variable.
Businesses often use linear regression to understand the relationship between advertising spending
and revenue.
revenue = β0 + β1(ad spending)
For instance, a business might fit a simple linear regression model with revenue as the response
variable and advertising spending as the predictor variable. In this model, the coefficient β0
would represent the expected revenue when ad spending is zero. The coefficient β1 would
represent the average change in revenue when ad spending is increased by one unit (e.g., one
dollar). If β1 is negative, higher ad spending is associated with lower revenue. If β1 is close to
zero, ad spending has little effect on revenue. And if β1 is positive, higher ad spending is
associated with higher revenue. Depending on the value of β1, a company may decide to either
decrease or increase its ad spending.
Medical researchers often use linear regression to understand the relationship between drug dosage
and the blood pressure of patients.
blood pressure = β0 + β1(dosage)
For instance, researchers may give patients different dosages of a specific medication and track
how their blood pressure changes. They might use dosage as the predictor variable and blood
pressure as the response variable in a simple linear regression model. In this model, the
coefficient β0 would represent the expected blood pressure when the dosage is zero. The
coefficient β1 would represent the average change in blood pressure when the dosage is
increased by one unit. If β1 is negative, an increase in dosage is associated with a decrease in
blood pressure. If β1 is close to zero, an increase in dosage is not associated with a change in
blood pressure. If β1 is positive, an increase in dosage is associated with an increase in blood
pressure. Depending on the value of β1, researchers may decide to change the dosage given to
a patient.
Agricultural scientists often use linear regression to measure the effect of fertilizer and water on
crop yields.
crop yield = β0 + β1(amount of fertilizer) + β2(amount of water)
For instance, researchers may vary the water and fertilizer applications in various fields to
observe the effects on crop yield. They might fit a multiple linear regression model with crop
yield as the response variable and fertilizer and water as the predictor variables. The expected
crop yield in the absence of fertilizer or water would be represented by the coefficient β0 in the
regression model. If the amount of water doesn't change, the coefficient β1 would represent the
typical change in crop yield when fertilizer is increased by one unit. The coefficient β2 would
represent the average change in crop yield when water is increased by one unit, assuming the
amount of fertilizer remains unchanged. Depending on the values of β1 and β2, the scientists
may change the amount of fertilizer and water used to maximize the crop yield.
Data scientists for professional sports teams often use linear regression to measure the effect that
different training regimens have on player performance.
points scored = β0 + β1(yoga sessions) + β2(weightlifting sessions)
For instance, data scientists in the NBA may examine how various frequencies of yoga and
weightlifting sessions each week affect a player's point total. They could fit a multiple linear
regression model with yoga and weightlifting sessions as the predictor variables and total
points scored as the response variable. In this model, the coefficient β0 would represent the
expected points scored by a player who does neither yoga nor weightlifting. The coefficient β1
would represent the average change in points scored when weekly yoga sessions are increased
by one, assuming the number of weekly weightlifting sessions stays the same. The coefficient
β2 would represent the average change in points scored when weekly weightlifting sessions
are increased by one, assuming the number of weekly yoga sessions stays the same.
Depending on the values of β1 and β2, data scientists may advise a player to do more or fewer
weekly yoga and weightlifting sessions to maximize the points scored.
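The multiple regression models above can be estimated by solving the least-squares normal equations. The sketch below does this in plain Python for the crop yield example. The function names and the data are hypothetical, and the yields were constructed to follow yield = 2 + 0.5(fertilizer) + 0.3(water) exactly so that the recovered coefficients are easy to check.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(predictors, response):
    """Return [b0, b1, ..., bk] that minimize the sum of squared errors."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 for the intercept
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical field trials: (fertilizer, water) and the resulting crop yield
fields = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
yields = [2.0, 2.5, 2.3, 2.8, 3.3, 3.1]

b0, b1, b2 = fit_multiple_regression(fields, yields)
print(f"crop yield = {b0:.2f} + {b1:.2f}(fertilizer) + {b2:.2f}(water)")
```

With real data the fit would not be exact, and a statistics package would normally be used instead of hand-written elimination.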
3. What is the difference between a simple regression analysis and a multiple regression analysis?
Give examples of the two in functional form (describe your independent and dependent
variables clearly).
E.g.
y = bx + a
Where,
y is the dependent variable we need to find and x is the independent variable. The constants “a”
(the intercept) and “b” (the slope) determine the equation. By definition, multiple regression takes
several independent variables, so its equation has multiple x terms, for example y = b1x1 + b2x2 + a.
Suppose the performance of students decreases only because they lack additional technology
support for attending online classes. Then we need the relationship between only two variables,
the decrease in performance (p) and the lack of technology support (t):
p = bt + a
Multiple regression analysis
p= decrease in performance
Interpretation of function
y = bx+ a
4. Regression analysis is the most useful and common method of demand estimation. Do you
agree? Explain.
Yes, I do agree.
In general, there are many ways to estimate the demand of a firm, such as consumer surveys.
However, researchers have preferred the regression technique, and in many instances have
even complemented the other techniques with regression, because of its advantages,
such as:
Objective nature: the results from a regression will not differ (drastically) from one
researcher to another and will provide consistent analysis of data.
More complete information: when grounded in economic theory, regression analysis
can help test hypothesized cause-and-effect relations and support or refute certain
prevailing theoretical notions.
Lower cost: conducting a regression is less costly and can be done relatively quickly
using basic statistical software.
5. Explain the rationale behind deriving the regression line using the ordinary least squares (OLS)
method?
The scatter plot shows the observed combinations of the dependent and independent variables,
and most observed points do not lie exactly on the fitted regression line. The vertical distance
between an observed point and the regression line is referred to as the vertical deviation, or the
error, of the observation.
The basic goal of the ordinary least squares approach is to keep these deviations or errors (denoted
by e, and in some sources by u) to a minimum.
Since the sum of deviations equals zero (i.e., $\sum_{t=1}^{n} e_t = 0$), we overcome this by taking
squared deviations and aiming to minimize them using differentiation.
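The differentiation step can be sketched as follows for a simple regression line. The notation is an assumption made for illustration: the fitted line is written as $\hat{Y}_t = a + bX_t$, $S$ denotes the sum of squared deviations, and $\bar{X}$ and $\bar{Y}$ are the sample means.

```latex
\begin{align*}
e_t &= Y_t - \hat{Y}_t = Y_t - a - bX_t \\
S(a, b) &= \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (Y_t - a - bX_t)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{t=1}^{n} (Y_t - a - bX_t) = 0 \\
\frac{\partial S}{\partial b} &= -2 \sum_{t=1}^{n} X_t (Y_t - a - bX_t) = 0 \\
b &= \frac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (X_t - \bar{X})^2},
\qquad a = \bar{Y} - b\bar{X}
\end{align*}
```

The first condition is the statement that the deviations sum to zero, which is why the raw sum cannot be used as the quantity to minimize.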
6. Fill in the blanks of the following equation and briefly explain what each term represents
1. The dependent (response) variable and the independent (predictor) variable(s) should have
a linear and additive relationship. A linear relationship implies that the change in the response Y
caused by a one-unit change in X1 is constant, regardless of the value of X1. An additive
relationship implies that the effect of X1 on Y does not depend on the values of the other predictors.
2. No correlation should exist between the residual (error) terms. The presence of such
correlation is referred to as autocorrelation.
3. There should be no correlation between the independent variables. The presence of such
correlation is referred to as multicollinearity.
4. The variance of the error terms must be constant. This is referred to as homoscedasticity.
Heteroscedasticity is the existence of non-constant variance.
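Two of these assumptions can be screened with simple statistics. The sketch below is illustrative only. It uses the Durbin-Watson statistic, a common check that the text above does not name, to look for correlation between successive residuals, and the Pearson correlation between two predictors as a rough check for multicollinearity. The function names and data are hypothetical.

```python
import math


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests little first-order autocorrelation,
    toward 0 positive autocorrelation, toward 4 negative autocorrelation."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical residuals from a fitted model, in time order
residuals = [0.5, -0.3, 0.2, -0.4, 0.1]
print("Durbin-Watson:", durbin_watson(residuals))

# Hypothetical predictors: a correlation near +1 or -1 signals multicollinearity
fertilizer = [1, 2, 3, 4]
water = [2, 4, 6, 8]
print("Predictor correlation:", pearson_correlation(fertilizer, water))
```

A pairwise correlation only detects multicollinearity between two predictors at a time, so it is a first look rather than a complete diagnostic.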
8. Develop a hypothetical regression function to determine the sales revenue of a business and
interpret the equation developed (You may also incorporate the error term in your answer).
𝑌̂𝑡=10.4+6.25𝑋𝑡
Where,
Y = Sales Revenue (Rs. million)
X = Advertising Expenditure (Rs. million)
When the spending on advertising is zero, the sales revenue is expected to be
Rs. 10.4 million.
For each increase in advertising of Rs. 1 million, the sales revenue is expected to rise
by Rs. 6.25 million.
If the advertising expenses observed to obtain this equation are all far from Rs. 0, then
the value of autonomous sales (Rs. 10.4 million) has no practical meaning, because
Rs. 0 lies outside the range of the observed advertising expenses.
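As a small worked illustration, the hypothetical function above can be evaluated in Python. The coefficients come from the equation in this answer. The function names, the advertising level of Rs. 2 million, and the observed revenue of Rs. 24 million are assumptions made only for the example.

```python
INTERCEPT = 10.4  # expected revenue (Rs. million) when advertising is zero
SLOPE = 6.25      # change in revenue (Rs. million) per Rs. 1 million of advertising


def predicted_sales_revenue(advertising):
    """Predicted sales revenue (Rs. million) for a given advertising spend (Rs. million)."""
    return INTERCEPT + SLOPE * advertising


def residual(observed_revenue, advertising):
    """Error term e_t: observed revenue minus the revenue predicted by the line."""
    return observed_revenue - predicted_sales_revenue(advertising)


forecast = predicted_sales_revenue(2.0)  # advertising of Rs. 2 million
error = residual(24.0, 2.0)              # hypothetical observed revenue of Rs. 24 million

print(f"Predicted revenue: Rs. {forecast:.2f} million")
print(f"Error term: Rs. {error:.2f} million")
```

The error term is the part of the observed revenue that the regression line does not explain, which is how it would enter the equation as Y = 10.4 + 6.25X + e.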