Linear Regression

The document discusses the application of linear regression in statistics, detailing its advantages, disadvantages, assumptions, and limitations. It highlights various fields where linear regression is utilized, such as epidemiology, finance, econometrics, and environmental science. The paper emphasizes the importance of meeting specific assumptions for effective linear regression analysis and outlines the types of linear regression, including simple and multiple linear regression.


International Journal of Statistics and Applied Mathematics 2023; 8(6): 133-137

ISSN: 2456-1452
© 2023 Stats & Maths
https://www.mathsjournal.com
Received: 09-08-2023
Accepted: 16-09-2023
DOI: https://dx.doi.org/10.22271/maths.2023.v8.i6b.1463

Application of linear regression with their advantages, disadvantages, assumption and limitations

P Anandhi and Dr. E Nathiya

P Anandhi
Assistant Professor of Statistics, Department of Mathematics, Sona College of arts and science, Salem, Tamil Nadu, India

Dr. E Nathiya
Assistant Professor, Department of Statistics, Government Arts College for Women, Salem, Tamil Nadu, India

Abstract
Regression analysis is one of the most commonly used techniques in statistics. The basic purpose of regression analysis is to fit a model that best describes the relationship between one or more predictor variables and a response variable. Regression techniques are among the most widely used statistical techniques and are employed on a wide variety of optimization problems in applied research. The basic types of linear regression techniques are reviewed along with their applications, advantages, and disadvantages, in order to suggest a way of choosing regression techniques for specific kinds of optimization problems.

Keywords: Linear regression, simple linear regression, multiple linear regression

Introduction
This paper presents the applications, advantages, assumptions, limitations and disadvantages of the following linear regression techniques used in research. A linear regression method was used to predict rainfall in India in reference [1]. Reference [2] used the power regression method to examine the effect of accumulated oxygen deficit. Based on the exponential power distribution, reference [3] developed a Bayesian analysis for the linear regression model with a random error distribution. In reference [4], linear and exponential regression were compared for estimating diffusive CH4 with closed chambers. Reference [5] presented bearing Residual Useful Life (RUL) estimation by proposing a new method that combines data-driven and model-based techniques. Estimation for bulk power systems using a linear regression-based disturbance magnitude approach was presented in reference [6]. In reference [7], a multiple linear regression method was used to forecast building energy performance. Reference [8] presented the use of multiple linear regression with interactions to model and forecast hourly electric load. In reference [9], the energy efficiency of commercial buildings was modelled using the fuzzy linear regression approach.

Corresponding Author: P Anandhi, Assistant Professor of Statistics, Department of Mathematics, Sona College of arts and science, Salem, Tamil Nadu, India

Linear regression
Linear regression is a method of analyzing data that predicts the value of unknown data using another related and known data value. It mathematically models the unknown or dependent variable and the known or independent variable as a linear equation. Regression is a statistical technique applied in engineering, business, finance, health care, and other disciplines with the aim of finding the relationship between one dependent variable and a series of other independent variables. Numerous regression techniques described in the literature are used for research purposes.
For example, one might want to relate the weights of individuals to their heights using linear regression. Before trying to fit a linear model to the observed data, one should first check whether or not there is a relationship of interest between the variables. To assess the strength of the relationship between variables, a scatter plot can be a useful tool. The canonical expression used for linear regression is given by equation (1):

y = a + bx    (1)
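As a minimal sketch of how a line of the form y = a + bx can be fitted, the Python snippet below estimates the intercept a and the slope b by ordinary least squares. The function name fit_simple_linear and the height and weight values are hypothetical and invented purely for illustration; they are not data from this paper.

```python
import numpy as np


def fit_simple_linear(x, y):
    """Return least-squares estimates (a, b) for the line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the fitted line through the point of means.
    a = y_mean - b * x_mean
    return a, b


# Hypothetical heights (cm) and weights (kg), made up for this example.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 66, 73, 81]

a, b = fit_simple_linear(heights, weights)
# Use the fitted line to predict the weight at a height of 175 cm.
predicted_weight_175 = a + b * 175
```

Here a is the predicted value of y when x = 0 and b is the change in y per unit change in x, so the fitted line can be used directly for prediction as in the last line.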

In equation (1), y is the dependent variable, x is the independent variable, a is the intercept (the value of y when x = 0), and b is the slope of the line. Figure 1 shows the linear regression line. Linear regression is a machine learning concept used to build or train models (mathematical models or equations) for solving supervised learning problems that involve predicting continuous numerical values. Supervised learning problems are the class of problems in which the values (data) of the independent or predictor variables (features) and of the dependent or response variable are already known. The known values of the dependent and independent variables are used to derive a mathematical model or formula, also called a linear regression equation, which is later used to predict or estimate the output given the values of the input features (the independent variables). In machine learning tasks, linear regression is used to predict numerical values from a set of input values. The following is an example of a univariate linear regression analysis representing the relationship between height and weight in adults using the regression line. The regression line is superimposed over the height-to-weight scatter diagram to illustrate the linear relationship.

Advantages of Linear Regression
1. Linear regression performs well when the relationship in the data is approximately linear. It can be used to find the nature of the relationship among the variables.
2. Linear regression is easy to implement and interpret, and very efficient to train.
3. Linear regression is prone to overfitting, but this can be avoided using dimensionality reduction techniques, regularization (L1 and L2) techniques and cross-validation.
4. After linear regression, exponential regression is an easy method to understand and apply, because only three pieces of data are required for exponential regression.
5. It produces accurate forecasts. A forecast is accurate when the difference between the projections and what actually occurred is small.
6. When the relationship between the independent and dependent variables is known to be linear, this algorithm is the best one to use because it is less complex than other algorithms.

Disadvantages of Linear Regression
The main limitation of linear regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, data are rarely linear. The method assumes a straight-line relationship between the dependent and independent variables, which is often incorrect.
1. Outliers can have large effects on the regression, and boundaries are linear in this method.
2. Linear regression assumes a linear, that is straight-line, relationship between the dependent and independent variables. It also assumes independence between attributes.
3. Linear regression looks only at the relationship between the mean of the dependent variable and the independent variables. Just as the mean is not a complete description of a single variable, linear regression is not a complete description of the relationships among variables.

Implementation of linear regression
1. Epidemiology
Evidence relating smoking to mortality and illness came from observational studies that used linear regression analysis. For example, consider a linear regression model in which cigarette smoking is the explanatory variable and the dependent variable is the lifespan of an individual, measured in years.

2. Finance
Linear regression and the concept of beta are used to assess and analyze the systematic risk of an investment. This comes directly from the beta coefficient of the linear regression model that relates the return on the investment to the return on all risky assets.

3. Econometrics
Linear regression is applied in economics as an optimization tool. In modern econometrics, fitting a line through data points that reflect paired values of the independent and dependent variables can be accomplished using a linear regression estimating model.

4. Environmental Science
Environmental science finds a wide variety of applications for linear regression. Environmental effects monitoring, which uses fish and benthic surveys to estimate the effect of metal mines or pulp and paper mills on the aquatic environment, uses linear regression techniques.

Assumption of Linear Regression
Linear regression is a useful statistical method for understanding the relationship between two variables, x and y. However, before conducting linear regression, we must first make sure that four assumptions are met:

1. Linear relationship: There exists a linear relationship between the independent variable, x, and the dependent variable, y.
2. Independence: The residuals are independent. In particular, there is no correlation between consecutive residuals in time series data.
3. Homoscedasticity: The residuals have constant variance at every level of x.
4. Normality: The residuals of the model are normally distributed.

If one or more of these assumptions are violated, the results of the linear regression may be unreliable or even misleading. Each assumption is explained here, together with how to determine whether it is met and what to do if it is violated.
The fitted line is always a straight line: there is no curve or grouping factor in a linear regression. There is a linear relationship between the variables (dependent variable and independent variable). If the data fail the assumptions of homoscedasticity or normality, a nonparametric test may be used instead (for example, the Spearman rank test). An example of data that fail to meet the assumptions: one might think that cured meat consumption and the incidence of colorectal cancer in the U.S. have a linear relationship. It later turns out, however, that there is a very large difference in range between the data collected for the two variables. Since the homoscedasticity assumption is violated here, a linear regression test cannot be applied. However, a Spearman rank test can be performed to learn about the relationship between the given variables.
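The independence and homoscedasticity assumptions, and the Spearman rank alternative mentioned above, can be checked numerically. The Python sketch below is one possible approach under stated assumptions and is not the authors' procedure: the function names residual_checks and spearman_rank, the synthetic data, and the split-half variance ratio used as a rough homoscedasticity indicator are all choices made here for illustration.

```python
import numpy as np


def residual_checks(x, y):
    """Fit y = a + b*x and return simple diagnostics computed on the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit with degree 1 returns [slope, intercept].
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    # Independence: correlation between consecutive residuals (lag 1).
    lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # Homoscedasticity: residual variance for large x over that for small x.
    # A ratio far from 1 suggests the variance is not constant.
    order = np.argsort(x)
    half = len(x) // 2
    var_ratio = resid[order][half:].var() / resid[order][:half].var()
    return {"intercept": a, "slope": b,
            "lag1_autocorr": lag1, "variance_ratio": var_ratio}


def spearman_rank(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


# Synthetic data in which the noise spread grows with x, so the
# homoscedasticity assumption is deliberately violated.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1 * x)

checks = residual_checks(x, y)
rho = spearman_rank(x, y)
```

A variance ratio well above 1 on data like these flags heteroscedasticity, in which case the rank-based correlation rho still describes the strength of the monotonic relationship without relying on constant variance or normal residuals.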
Limitations of the Linear Regression
Linear regression cannot be applied blindly to any dataset. The data must satisfy certain constraints before a linear regression algorithm can be applied to it. The conditions that need to be satisfied are:

• Linearity
• Constant Error Variance
• No autocorrelation of the residuals
• Normal Errors
• Multicollinearity
• Exogeneity or Omitted Variable Bias

Linearity
The relationship between the target variable and the independent variable should be linear.
[Figure: Linear relationship vs no relationship between independent and dependent variables]
Sometimes a straight line may not be the right fit for the data, and we may need to choose a function such as a polynomial, square root or log to fit the data.

Constant Error Variance (Homoscedasticity or no Heteroskedasticity)
Homoscedasticity describes a situation in which the variance of the error term is the same across all values of the independent variables. If the spread, or variance, of the data increases as X increases, there is a problem, and linear regression is not the best choice in such cases. In other words, the residuals should not follow any pattern. We can first draw a scatter plot of the dependent and independent variables. To check the data for heteroskedasticity, we then plot the residuals; the expected result is that the points are randomly spread out, with no visible pattern.

Independent Error Terms or No autocorrelation of the residuals
A residual term should not depend on the previous residual term. In other words, y(x+1) should not depend on y(x). This assumption matters when dealing with time series data. Consider stock prices, where the current price depends on the previous price. This violates the assumption of independent error terms.

Normal Errors
The residuals should follow a bell-shaped distribution with a mean of 0. In other words, a histogram of the residuals should be a bell-shaped curve with a mean close to 0 and a constant standard deviation.
[Figure: Residual terms following a normal distribution]
The normality assumption of the errors is important because, when predicting individual data points, the confidence interval around the prediction assumes that the residuals are normally distributed. 'Generalised Linear Models' should be used if we want to relax the normality assumption.

Multicollinearity
Multicollinearity occurs when an independent variable X depends on another independent variable. In a model with correlated variables, it is difficult to determine the true relationship between the independent and dependent variables. In other words, it becomes difficult to find out which independent variable is actually contributing to the prediction of the dependent variable. Additionally, with correlated variables, the coefficient of an independent variable depends on the other variables present in the dataset. If this happens, we end up with an incorrect conclusion about which independent variables contribute to the prediction of the dependent variable. A good way to check for multicollinearity is to plot a heat map of the correlations; variables with high correlations are multicollinear.
[Figure: Heat map of the sklearn Boston dataset]

Exogeneity or Omitted Variable Bias
Before discussing exogeneity, it is important to understand that when we fit a linear regression line there is an error term associated with it (this is not the same as the residual):

y = ax1 + bx2 + ... + nxn + ε

where ε represents all the factors that affect the target variable and are not included in the model. Consider a feature A that is not included in the model, so that it is part of the error term. Suppose, moreover, that A is highly correlated with both x2 and y. This makes the coefficient b biased, so that it is no longer the true coefficient (i.e. the sample estimate does not reflect the population value). In other words, the variable A is correlated with an independent variable in the model and with the error term. Suppose the true model to be estimated is

yi = β0 + β1xi + β2zi + ui

but we omit zi when we run our regression. Then zi is absorbed into the error term and we actually estimate

yi = β0 + β1xi + εi (where εi = β2zi + ui).

If the correlation of xi and zi is not 0 and zi separately affects yi, then xi is correlated with the error term. Omitted variable bias, a violation of exogeneity, therefore occurs when a statistical model leaves out one or more relevant variables. The usual way to deal with endogeneity problems is through instrumental variables (IV) techniques, and the most common IV estimator is Two Stage Least Squares (TSLS).
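The effect of leaving out a relevant, correlated variable can be seen in a small simulation. The Python sketch below is an illustration under assumed values, not a result from this paper: the coefficients (1.0, 2.0, 3.0), the strength of the link between x and z, the sample size and the helper name ols are all hypothetical. It fits the model once with z included and once with z omitted, and also computes the correlation matrix of the regressors that a multicollinearity heat map would display.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y on X, with an intercept column prepended."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


rng = np.random.default_rng(42)
n = 5000
z = rng.normal(size=n)               # the variable that will be omitted
x = 0.8 * z + rng.normal(size=n)     # x is correlated with z
u = rng.normal(size=n)               # pure noise
y = 1.0 + 2.0 * x + 3.0 * z + u      # assumed true model

# Full model: both x and z are included, so the coefficient on x is near 2.
beta_full = ols(np.column_stack([x, z]), y)

# Short model: z is omitted and absorbed into the error term, which is
# then correlated with x, so the coefficient on x is biased upward.
beta_short = ols(x, y)

# Pairwise correlations between the regressors (the numbers behind a heat map).
corr = np.corrcoef(np.column_stack([x, z]), rowvar=False)
```

Comparing beta_full[1] with beta_short[1] shows the bias directly: the short regression attributes part of the effect of z to x, because x and z move together.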
Types of Linear Regression
Typically, linear regression is divided into two types: multiple linear regression and simple linear regression. For better understanding, these types are discussed in detail below.

Simple Linear Regression: Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables. One can use simple linear regression to establish:
1. How strongly two variables are related to one another (for example, how rainfall and soil erosion are related).
2. The value of the dependent variable at a specific value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).

Advantages of simple linear regression
The biggest advantage of linear regression models is their linearity: the estimation procedure is easy to understand and follow on a modular level. Additionally, these equations are straightforward to interpret, making them easier to comprehend than nonlinear models.

Disadvantages of simple linear regression
It relies on the assumption of linearity between the dependent and independent variables. It is often quite prone to noise and overfitting. Linear regression is quite sensitive to outliers. It is prone to multicollinearity.

Applications of simple linear regression
• Marks scored by students based on the number of hours studied (ideally): here, marks scored in examinations are the dependent variable and the number of hours studied is the independent variable.
• Predicting crop yields based on the amount of rainfall: yield is the dependent variable, while the amount of rainfall is the independent variable.
• Predicting the salary of a person based on years of experience: experience is the independent variable, while salary is the dependent variable.

Multiple Linear Regression
Multiple linear regression refers to a statistical technique that is used to predict the outcome of a variable based on the values of two or more variables. It is sometimes known simply as multiple regression, and it is an extension of linear regression. The variable that we want to predict is known as the dependent variable, while the variables we use to predict the value of the dependent variable are known as independent or explanatory variables.

Multiple Linear Regression Formula

yi = β0 + β1xi1 + β2xi2 + … + βpxip + ϵ

Where,
• yi is the dependent or predicted variable
• β0 is the y-intercept, i.e., the value of y when both xi1 and xi2 are 0.
• β1 and β2 are the regression coefficients representing the change in y relative to a one-unit change in xi1 and xi2, respectively.
• βp is the slope coefficient for each independent variable
• ϵ is the model's random error (residual) term.

Advantages of Multiple linear regressions
There are two main advantages to analyzing data using a multiple regression model. The first is the ability to determine the relative influence of one or more predictor variables on the criterion value. A real estate agent, for example, may find that the size of the homes and the number of bedrooms have a strong correlation with the price of a home, while proximity to schools has no correlation at all, or even a negative correlation if it is primarily a retirement community. The second advantage is the ability to identify outliers, or anomalies.

Disadvantages of Multiple Regressions
Any disadvantage of using a multiple regression model usually comes down to the data being used. Two examples of this are using incomplete data and falsely concluding that a correlation is a causation. Linear regression also performs poorly when the relationships are non-linear.

Applications of Multiple Regressions
It can be used to predict the relationship between reckless driving and the total number of road accidents caused by a driver or, to use a business example, the effect on sales of spending a certain amount of money on advertising. Regression is one of the most common models in machine learning.

Conclusion
Regression techniques are predictive modelling techniques that investigate the relationship between two variables, one dependent and the other independent. Many regression techniques have been developed and many more are being developed. This paper has discussed linear regression. Linear regression is simple to implement but does not always give accurate results. Regression techniques are useful statistical methods that can be used to estimate the degree to which independent variables affect dependent variables. These regression techniques should be applied according to the constraints of the given data set. One of the best ways to decide which regression technique to apply to a problem is to check the family of the variables involved in that problem.

References
1. Wi YM, Joo SK, Song KB. Holiday load forecasting using fuzzy polynomial regression with weather feature selection and adjustment. IEEE Trans Power Syst. 2011;27(2):596–603.
2. Hu Z, Gao J. Uncertain Gompertz regression model with imprecise observations. Soft Comput. 2020;24(4):2543–2549.
3. Roush W, Dozier W, Branton S. Comparison of Gompertz and neural network models of broiler growth. Poult Sci. 2006;85(4):794–797.
4. Gabriel JM, Moris YG, Moral CP, Calvo CM, Beltrá RL. A Gompertz regression model for fern spore germination. Anales del Jardín Botánico de Madrid. 2015;72(1):1–8.
5. Wu L, You S, Dong J, Liu Y, Bilke T. Multiple linear regression-based disturbance magnitude estimations for bulk power systems. In: 2018 IEEE Power & Energy Society General Meeting (PESGM). IEEE; c2018. p. 1–5.
6. Ciulla G, Amico AD. Building energy performance forecasting: A multiple linear regression approach. Appl Energy. 2019;253:113500.
7. Hong T, Gui M, Baran ME, Willis HL. Modelling and forecasting hourly electric load by multiple linear regression with interactions. In: IEEE PES General Meeting. IEEE; c2010. [Specify page range].
8. Chung W. Using the fuzzy linear regression method to benchmark the energy efficiency of commercial buildings. Appl Energy. 2012;95:45–49.
