Linear Regression Report
2019-2020
INTRODUCTION
Numerical methods are techniques that formulate mathematical problems so that they can be solved with arithmetic operations, producing numerical results instead of closed-form results [1]. Although numerical methods are diverse in nature, they share one common characteristic: they inevitably require large numbers of repetitive arithmetic calculations [1]. It is therefore little wonder that the role of numerical methods in engineering problem solving has increased dramatically in recent years with the advent of fast, powerful digital computers [1].
Today, computers and numerical methods provide an alternative for such complicated calculations. Using computer power to obtain solutions directly, you can approach these calculations without recourse to simplifying assumptions or time-intensive techniques. Although analytical solutions are still extremely valuable both for problem solving and for providing insight, numerical methods represent alternatives that greatly enlarge your capabilities to confront and solve problems. As a result, more time is available for the use of your creative skills. Thus, more emphasis can be placed on problem formulation and solution interpretation and the incorporation of total system, or "holistic," awareness [1].
Since the late 1940s the widespread availability of digital computers has led to a veritable explosion in the use and development of numerical methods. At first, this growth was somewhat limited by the cost of access to large mainframe computers, and, consequently, many engineers continued to use simple analytical approaches in a significant portion of their work. Needless to say, the recent evolution of inexpensive personal computers has given us ready access to powerful computational capabilities [1].
We live in the era of vast quantities of data, powerful computers and artificial intelligence [2]. Data
science and machine learning are driving the identification of photos, the development of autonomous
vehicles, decisions in the financial and energy sectors, developments in medicine, social networks and
more. Linear regression is an important part of this [2].
There are many applications of numerical methods in computer science; among the most widely used techniques are mathematical modeling and problem solving, interpolation, optimization, and linear regression. Linear regression is one of the basic techniques in numerical methods and machine learning. Whether you're looking to do math, machine learning, or scientific computing, there's a fair chance you'll need it [2].
Linear Regression
Regression analysis is one of the main areas in statistics and machine learning [3]. There are plenty of
regression methods. Linear regression is one of them. The term regression is used when attempting to find
the relation between variables [3]. The relationship is used in Machine Learning, and in statistical modeling,
to predict the outcome of future events.
Regression analysis is used to estimate the relationship between a dependent variable and one or more
independent variables. This technique is widely applied to predict outputs, forecast data, analyze time series, and find causal dependencies between variables. There are
several types of regression techniques at hand based on the number of independent variables, the
dimensionality of the regression line, and the type of dependent variable. Out of these, the two most popular
regression techniques are linear regression and logistic regression.
Researchers use regression to indicate the strength of the impact of multiple independent variables on a
dependent variable on different scales. Regression has numerous applications. For example, consider a data
set consisting of weather information recorded over the past few decades. Using that data, we could forecast
weather for the next couple of years. Regression is also widely used in organizations and businesses to
assess risk and growth based on previously recorded data.
Linear regression is perhaps one of the most important and widely used regression techniques. It is among the simplest regression methods [4], and one of its key benefits is the ease of interpreting results. Linear regression uses the relationship between the data points to draw a straight line through them [4]. This line can then be used to predict future values. Linear regression is essentially a linear method for modeling the relationship between the independent variables and the dependent variable [4]. In other words, given a scatter plot with some points on it, the aim of linear regression is to find a line that is as close as possible to all of the points.
Applications of Linear Regression
There are many applications of linear regression, for example in machine learning, trend estimation, and economics. Linear regression is among the most common supervised machine learning algorithms because of its simplicity and the fact that it has been around for a long time [4]. Trend estimation also makes extensive use of linear regression, since regression predicts results with a continuous output [4]. In economics, many quantities are predicted using linear regression, such as labor demand and supply, consumption spending, and so on [4].
Applications of linear regression in machine learning:
Linear regression is one of the most commonly used algorithms in machine learning, and it helps businesses prepare for a volatile and dynamic environment. Machine learning needs to be supervised for computers to use their time and effort effectively and efficiently, and linear regression is one of the main ways to do so.
Simply put, machines need to be supervised in order to learn new things effectively. The great ability of machines is that they can learn about a problem and execute solutions seamlessly, which greatly reduces human error.
Linear regression is also used to find the relationship between forecasts and variables. A task is performed based on a dependent variable by analyzing the impact of an independent variable on it. Those proficient in a programming language such as Python can use the scikit-learn library to import a linear regression model, or create their own custom algorithm, before applying it to the machines. This means that linear regression is highly customizable and easy to learn. Organizations across the world are investing heavily in linear regression training for their employees in order to prepare the workforce for the future.
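As a small, purely illustrative sketch of this point (the numbers below are made up), importing and fitting scikit-learn's linear regression model can be as short as:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one independent variable
y = np.array([2.1, 4.0, 6.2, 7.9])           # dependent variable

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)         # estimated intercept and slope
print(model.predict([[5.0]]))                # predict an unseen value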
The main benefits of linear regression in machine learning are as follows.
Forecasting
A main advantage of using a linear regression model in machine learning is the ability to forecast trends and make feasible predictions. Data scientists can use these predictions to make further deductions based on machine learning [5]. It is quick, efficient, and accurate, predominantly because machines process large volumes of data with minimal human intervention [5]. Once the algorithm is established, the process of learning becomes simple.
Beneficial to small businesses
By altering one or two variables, machines can understand the impact on sales. Since deploying linear regression is cost-effective, it is greatly advantageous to small businesses, which can use it to make short-term and long-term sales forecasts [5]. This means that small businesses can plan their resources well and create a growth trajectory for themselves [5]. They will also be able to understand the market and its preferences and learn about supply and demand [5].
Preparing Strategies
Since machine learning enables prediction, one of the biggest advantages of a linear regression model is the ability to prepare a strategy for a given situation well in advance and to analyze various outcomes [5]. Meaningful information can be derived from the regression-based forecasts, helping companies plan strategically and make executive decisions [5].
Types of Linear Regression
Linear regression is generally classified into three types: simple linear regression, multiple linear regression, and polynomial regression.
1. Simple Linear Regression
Simple linear regression has only one predictor variable and one dependent variable. The general equation for linear regression is $Y_i = \beta_0 + \beta_1 X_i$. If there is only one predictor available, the model is known as simple linear regression (SLR).
The equation for SLR is $Y_i = \beta_0 + \beta_1 X_{1i}$, where $\beta_1$ is the coefficient of the variable $X_1$ and $\beta_0$ is the intercept.
While executing the prediction, there is an error term associated with the equation:
$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon$, where $\varepsilon$ is the error term associated with each predicted value.
The goal of the SLR model is to find the estimated values of $\beta_1$ and $\beta_0$ that keep the error term ($\varepsilon$) to a minimum.
Case study: implementing Simple Linear Regression in Python
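As a minimal sketch of such an implementation (on made-up data), the closed-form least-squares estimates of β0 and β1 can be computed directly in Python:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # predictor (made-up values)
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])      # response (made-up values)

# Closed-form least-squares estimates for simple linear regression
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

print(beta0, beta1)                          # fitted intercept and slope
print(beta0 + beta1 * 6.0)                   # prediction for a new x value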
2. Multiple Linear Regression
Multiple linear regression (MLR) has more than one predictor variable:
$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$
where $\beta_1$ is the coefficient for variable $X_1$, $\beta_2$ the coefficient for $X_2$, $\beta_3$ the coefficient for $X_3$, and so on, and $\beta_0$ is the intercept (constant term). While doing the prediction, there is an error term associated with the equation:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \varepsilon_i$, where $\varepsilon_i$ is the error term associated with each predicted value.
The goal of the MLR model is to find the estimated values of $\beta_0, \beta_1, \beta_2, \beta_3, \dots$ that keep the error term ($\varepsilon_i$) to a minimum.
Broadly speaking, supervised machine learning algorithms are classified into two types: regression, used to predict a continuous variable, and classification, used to predict a discrete variable.
Assumptions for Multiple Linear Regression
1. Linearity: There should be a linear relationship between the dependent and independent variables.
2. Multicollinearity: There should not be a high correlation between two or more independent variables. Multicollinearity can be checked using a correlation matrix, Tolerance, or the Variance Inflation Factor (VIF); a short VIF sketch is given after this list.
3. Homoscedasticity: If the variance of the errors is constant across the independent variables, the errors are said to be homoscedastic; the residuals should satisfy this. A plot of standardized residuals versus predicted values is commonly used to check homoscedasticity, and the Breusch-Pagan and White tests are well-known formal tests for it. (Q-Q plots, in contrast, are mainly used to check the normality of the residuals.)
4. Multivariate Normality: Residuals should be normally distributed.
5. Categorical Data: Any categorical data present should be converted into dummy variables.
6. Minimum records: There should be at least 20 records per independent variable.
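As referenced in the multicollinearity assumption above, here is a short sketch (the DataFrame of independent variables is assumed) of computing the VIF of each predictor with statsmodels:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    # VIF is computed against a design matrix that includes an intercept;
    # values above roughly 5-10 are commonly read as problematic multicollinearity.
    exog = sm.add_constant(X)
    return pd.DataFrame({
        "variable": X.columns,
        "VIF": [variance_inflation_factor(exog.values, i + 1) for i in range(X.shape[1])],
    })

# Example usage (column names are assumptions): print(vif_table(df[["cylinders", "horsepower", "weight"]]))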
Mathematical formulation of Multiple Linear Regression
In linear regression, we try to find a linear relationship between the independent and dependent variables by fitting a linear equation to the data.
The equation of a straight line is $Y = mx + c$, where $m$ is the slope and $c$ is the intercept.
In linear regression we are trying to find the best values of $m$ and $c$ for the dependent variable $Y$ and the independent variable $x$. We consider many candidate lines, take the line that gives the least possible error, and use its $m$ and $c$ values to predict $y$.
The same concept is used in multiple linear regression, where we have several independent variables $x_1, x_2, x_3, \dots, x_n$ and the equation becomes
$Y = m_1 x_1 + m_2 x_2 + m_3 x_3 + \dots + m_n x_n + c$.
This equation no longer describes a line but a hyperplane in multiple dimensions.
Model Evaluation:
The model can be evaluated using the following metrics.
Mean absolute error (MAE): the mean of the absolute values of the errors,
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$.
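A tiny numerical check of this formula on made-up numbers, compared against scikit-learn's implementation:

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae_manual = np.mean(np.abs(y_true - y_pred))
print(mae_manual, mean_absolute_error(y_true, y_pred))   # both print 0.75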
Applications
1. The effect of an independent variable on the dependent variable can be quantified.
2. It is used to predict trends.
3. It is used to find how much change can be expected in the dependent variable for a given change in an independent variable.
Case study: implementing Multiple Linear Regression in Python (the worked example appears in the section "Multiple Linear Regression With scikit-learn" below)
3. Polynomial Regression
Polynomial regression fits a non-linear relationship: the dependent variable is modeled as an nth-degree polynomial of the independent variable, while the model remains linear in its coefficients.
Equation of polynomial regression:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \beta_4 X_i^4 + \beta_5 X_i^5 + \dots + \beta_n X_i^n + \varepsilon \quad (i = 1, 2, 3, \dots, n)$
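A short sketch of polynomial regression in Python (made-up data; the degree of 3 is an arbitrary illustrative choice): expand the input into polynomial features, then fit an ordinary linear model on them.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() - 0.3 * x.ravel() ** 2 + rng.normal(0.0, 1.0, 50)

poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(x, y)
print(poly_model.predict([[4.5]]))           # prediction from the fitted polynomial curve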
Underfitting and Overfitting
When we fit a model, we try to find the optimised, best-fit line, which can describe the impact of the
change in the independent variable on the change in the dependent variable by keeping the error term
minimum. While fitting the model, there can be two events which will lead to the bad performance of the
model. These events are
• Underfitting
• Overfitting
Underfitting
Underfitting is the condition in which the model cannot fit the data well enough. An under-fitted model has low accuracy because it is unable to capture the relationship, trend or pattern in the training data. Underfitting can be avoided by using more data or by optimising the parameters of the model.
Overfitting
Overfitting is the opposite case: the model predicts very well on training data but is not able to predict well on test or validation data. The main reason for overfitting is that the model memorises the training data and is unable to generalise to unseen data. Overfitting can be reduced by doing feature selection or by using regularisation techniques.
For the simple linear regression case study, a scatter plot of the data shows a clear negative linear relationship between horsepower and miles per gallon (mpg): as horsepower increases, mpg decreases.
Now, let's perform the simple linear regression.
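A sketch of how such a fit could be produced with statsmodels; the data file name is an assumption, and the column names "horsepower" and "mpg" are taken from the discussion above:

import numpy as np
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("auto-mpg.csv")           # assumed file name
X = sm.add_constant(data["horsepower"])      # add the intercept column
slr = sm.OLS(data["mpg"], X).fit()

print(slr.params)                            # intercept and slope (the report obtains ~39.94 and ~-0.16)
print(slr.rsquared)                          # R-squared
print(np.sqrt(np.mean(slr.resid ** 2)))      # RMSE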
From the output of the SLR model, the equation of the best fit line is
mpg = 39.94 + (-0.16) * horsepower
Comparing this with the SLR model equation $Y_i = \beta_0 + \beta_1 X_i$ gives $\beta_0 = 39.94$ and $\beta_1 = -0.16$.
Now, check the model's relevance by looking at its R² and RMSE values.
The R² and RMSE (root mean square error) values are 0.6059 and 4.89 respectively. This means that about 60% of the variance in mpg is explained by horsepower. For a simple linear regression model this result is acceptable, but not especially good, since other variables such as cylinders and acceleration could also have an effect. The RMSE value is reasonably low. Let's check how the line fits the data.
From the graph, we can infer that the best fit line is able to explain the effect of horsepower on mpg.
Multiple Linear Regression With scikit-learn
Since the data is already loaded, we can move on to multiple linear regression. The data set has 5 independent variables and 1 dependent variable (mpg).
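A sketch of this fit with scikit-learn (again, the data file name is an assumption; the feature names match the variables listed in the fitted equation below):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

data = pd.read_csv("auto-mpg.csv")           # assumed file name
features = ["cylinders", "displacement", "horsepower", "weight", "acceleration"]
X = data[features].values
y = data["mpg"].values

mlr = LinearRegression().fit(X, y)
pred = mlr.predict(X)

print("intercept:", mlr.intercept_)
print("coefficients:", dict(zip(features, mlr.coef_)))
print("R2:", r2_score(y, pred))                          # the report obtains about 0.707
print("RMSE:", np.sqrt(mean_squared_error(y, pred)))     # about 4.21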
The best fit line for multiple linear regression is
mpg = 46.26 - 0.4*cylinders - 8.313e-05*displacement - 0.045*horsepower - 0.01*weight - 0.03*acceleration
Comparing this with $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \dots$ gives
$\beta_0$ (intercept) = 46.26, $\beta_1$ = -0.4, $\beta_2$ = -8.313e-05, $\beta_3$ = -0.045, $\beta_4$ = -0.01, $\beta_5$ = -0.03.
Now, let's check the R² and RMSE values.
The R² and RMSE (root mean square error) values are 0.707 and 4.21 respectively. This means that about 71% of the variance in mpg is explained by the predictors, which indicates a good model. Compared with the simple linear regression results, R² is higher and RMSE is lower, which means that adding more variables to the model improved its performance. In general, the higher the R² and the lower the RMSE, the better the model.
Multiple Linear Regression: implementation using Python. Let us take a small data set and try building a model using Python.
The data set contains the variables Temp and Sel; we are trying to predict Sel (the dependent variable) from the independent variable Temp. We first check the assumptions on this data set.
1. Check for Linearity
A scatter plot of Sel against Temp shows a linear relationship between the two variables.
2. Check for Multicollinearity
General Linear Least Squares
Nonlinear Regression
The two main metrics we use to evaluate linear regression models are accuracy and error. For a model
to be highly accurate with minimum error, we need to achieve low bias and low variance. We partition the
data into training and testing data sets to keep bias in check and ensure accuracy.
A DEEP DIVE INTO LINEAR REGRESSION
Before we build a supervised machine learning model, all we have is data comprising inputs and outputs. To estimate the dependency between them using linear regression, we pick two random initial values for the model parameters: the slope m (often called the weight) and the intercept c (the bias). We then take a tuple from the data set, feed the input value into the equation y = mx + c, and predict a new value. Finally, we calculate the loss incurred by the predicted value using a loss function.
The values of m and c are picked randomly, but they must be updated to minimize the error. We therefore use the loss function as a metric to evaluate the model, and our goal is to obtain the line that best reduces the error. The most common loss function is the mean squared error, mathematically represented as
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
If we did not square the error, the positive and negative errors would cancel each other out.
When we train the model to find the ideal values of m and c, different values yield different errors. Out of all the candidate values, there will be one point where the error is minimized, and the parameters corresponding to that point give the optimal solution. This is where gradient descent comes into the picture.
Gradient descent is an optimization algorithm that finds the values of the parameters (coefficients) of a function f that minimize a cost function. The learning rate defines the rate at which the parameters are updated: it controls how strongly we adjust the weights with respect to the loss gradient. The lower its value, the smaller the steps we take down the slope as the weights are updated.
Both the m and c values are updated as follows:
$m \leftarrow m - \alpha \frac{\partial \mathrm{MSE}}{\partial m}, \qquad c \leftarrow c - \alpha \frac{\partial \mathrm{MSE}}{\partial c}$,
where $\frac{\partial \mathrm{MSE}}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i (y_i - \hat{y}_i)$, $\frac{\partial \mathrm{MSE}}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)$, and $\alpha$ is the learning rate.
Once the model is trained and achieves a minimum error, we can fix the values of m and c. Ultimately, the best fit line is the line that passes as close as possible to all the data points. Linear regression relies on the following assumptions:
1. Linearity: The relationship between independent variables and the mean of the dependent variable
is linear.
2. Homoscedasticity: The variance of the residuals should be constant.
3. Independence: Observations are independent of each other.
4. Normality: For any fixed value of an independent variable, the dependent variable is normally
distributed.
As such, linear regression was developed in the field of statistics and is studied as a model for
understanding the relationship between input and output numerical variables, but has been borrowed by
machine learning. It is both a statistical algorithm and a machine learning algorithm.
Linear regression is an attractive model because its representation is so simple: a linear equation that combines a specific set of input values (x), the solution of which is the predicted output for that set of input values (y). As such, both the input values (x) and the output value (y) are numeric.
Linear Regression Learning the Model
Learning a linear regression model means estimating the values of the coefficients used in the
representation with the data that we have available.
In this section we will take a brief look at four techniques to prepare a linear regression model. This is
not enough information to implement them from scratch, but enough to get a flavor of the computation and
trade-offs involved.
There are many more techniques because the model is so well studied. Take note of Ordinary Least
Squares because it is the most common method used in general. Also take note of Gradient Descent as it is
the most common technique taught in machine learning classes.
1. Simple Linear Regression
With simple linear regression when we have a single input, we can use statistics to estimate the
coefficients. This requires that you calculate statistical properties from the data such as means, standard
deviations, correlations and covariance. All of the data must be available to traverse and calculate statistics.
2. Ordinary Least Squares
When we have more than one input we can use Ordinary Least Squares to estimate the values of the
coefficients. The Ordinary Least Squares procedure seeks to minimize the sum of the squared residuals.
This means that given a regression line through the data we calculate the distance from each data point to
the regression line, square it, and sum all of the squared errors together. This is the quantity that ordinary
least squares seeks to minimize. This approach treats the data as a matrix and uses linear algebra operations
to estimate the optimal values for the coefficients. It means that all of the data must be available and you
must have enough memory to fit the data and perform matrix operations. It is unusual to implement the
Ordinary Least Squares procedure yourself unless as an exercise in linear algebra. It is more likely that you
will call a procedure in a linear algebra library. This procedure is very fast to calculate.
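As a sketch of what "calling a procedure in a linear algebra library" can look like in Python (made-up data), the coefficients can be obtained with a least-squares solver:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                # two input variables
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Stack a column of ones so the first coefficient becomes the intercept,
# then solve min ||X_design @ b - y||^2.
X_design = np.column_stack([np.ones(len(X)), X])
coef, residuals, rank, sv = np.linalg.lstsq(X_design, y, rcond=None)
print(coef)                                  # approximately [3.0, 1.5, -2.0]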
3. Gradient Descent
When there are one or more inputs you can use a process of optimizing the values of the coefficients by
iteratively minimizing the error of the model on your training data. This operation is called Gradient
Descent and works by starting with random values for each coefficient. The sum of the squared errors is
calculated for each pair of input and output values. A learning rate is used as a scale factor and the
coefficients are updated in the direction towards minimizing the error. The process is repeated until a
minimum sum squared error is achieved or no further improvement is possible. When using this method,
you must select a learning rate (alpha) parameter that determines the size of the improvement step to take
on each iteration of the procedure. Gradient descent is often taught using a linear regression model because
it is relatively straightforward to understand. In practice, it is useful when you have a very large dataset
either in the number of rows or the number of columns that may not fit into memory.
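A small from-scratch sketch of this procedure for a single input (made-up data; the learning rate and iteration count are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
y = 4.0 + 2.5 * x + rng.normal(scale=1.0, size=200)

m, c = rng.normal(), rng.normal()            # random starting values for the coefficients
alpha = 0.01                                 # learning rate
for _ in range(5000):
    y_hat = m * x + c
    grad_m = -2.0 * np.mean(x * (y - y_hat))   # gradient of the mean squared error w.r.t. m
    grad_c = -2.0 * np.mean(y - y_hat)         # gradient w.r.t. c
    m -= alpha * grad_m
    c -= alpha * grad_c

print(m, c)                                  # should approach the true values 2.5 and 4.0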
4. Regularization
There are extensions of the training of the linear model called regularization methods. These seek both to minimize the sum of the squared errors of the model on the training data (using ordinary least squares) and to reduce the complexity of the model (for example, the number or the absolute size of all the coefficients in the model). Two popular examples of regularization procedures for linear regression are:
• Lasso Regression: where Ordinary Least Squares is modified to also minimize the absolute sum of the coefficients (called L1 regularization).
• Ridge Regression: where Ordinary Least Squares is modified to also minimize the sum of the squared coefficients (called L2 regularization).
These methods are effective to use when there is collinearity in your input values and ordinary least
squares would overfit the training data.
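A short sketch of both variants with scikit-learn (made-up data with a deliberately collinear column; the alpha values are arbitrary illustrative penalty strengths):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=100)        # nearly collinear with the first column
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can shrink some coefficients to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients towards zero
print(lasso.coef_)
print(ridge.coef_)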
Preparing Data for Linear Regression
Linear regression has been studied at great length, and there is a lot of literature on how your data must be structured to make the best use of the model.
As such, there is a lot of sophistication in these requirements and expectations, which can be intimidating. In practice, you can use the following rules more as rules of thumb when using Ordinary Least Squares regression, the most common implementation of linear regression.
• Linear Assumption. Linear regression assumes that the relationship between your input and output
is linear. It does not support anything else. This may be obvious, but it is good to remember when
you have a lot of attributes. You may need to transform data to make the relationship linear (e.g. log
transform for an exponential relationship).
• Remove Noise. Linear regression assumes that your input and output variables are not noisy.
Consider using data cleaning operations that let you better expose and clarify the signal in your data.
This is most important for the output variable and you want to remove outliers in the output variable
(y) if possible.
• Remove Collinearity. Linear regression will over-fit your data when you have highly correlated
input variables. Consider calculating pairwise correlations for your input data and removing the most
correlated.
• Gaussian Distributions. Linear regression will make more reliable predictions if your input and output variables have a Gaussian distribution. You may get some benefit from using transforms (e.g. log or Box-Cox) on your variables to make their distribution more Gaussian-looking.
• Rescale Inputs: Linear regression will often make more reliable predictions if you rescale input
variables using standardization or normalization.
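A brief sketch of two of these steps in Python (a made-up DataFrame; the column names are purely illustrative): a log transform for a skewed variable and standardization of the inputs.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "income": rng.lognormal(mean=10.0, sigma=1.0, size=500),   # skewed, roughly exponential-looking
    "age": rng.normal(40.0, 12.0, size=500),
})

df["log_income"] = np.log(df["income"])      # log transform makes the distribution more Gaussian

scaler = StandardScaler()                    # rescale inputs to zero mean and unit variance
X_scaled = scaler.fit_transform(df[["log_income", "age"]])
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))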
Advantages and disadvantages of Linear Regression
Advantages:
1. Linear regression performs well when the relationship in the data is close to linear. We can use it to find the nature of the relationship among the variables.
2. Linear regression is easy to implement and interpret, and very efficient to train.
3. Linear regression handles over-fitting reasonably well using dimensionality reduction techniques, regularization, and cross-validation.
4. Its most common use is to predict results for a given data set.
5. It can also be used to extrapolate beyond a specific data set.
Disadvantages:
1. It assumes linearity between the dependent and independent variables.
2. It is often quite prone to noise and overfitting.
3. Linear regression is quite sensitive to outliers.
4. It is prone to multicollinearity.
Linear Regression Use Cases
• Sales Forecasting
• Risk Analysis
• Housing applications, to predict prices and other factors
• Finance applications, to predict stock prices, evaluate investments, etc.
The basic idea behind linear regression is to find the relationship between the dependent and independent
variables. It is used to get the best fitting line that would predict the outcome with the least error. We can
use linear regression in simple real-life situations, like predicting the SAT scores with regard to the number
of hours of study and other decisive factors.
Bibliography
[1] S. C. Chapra and R. P. Canale, Numerical Methods for Engineers, New York: McGraw-Hill Education, 2015.
[2] M. Stojiljkovic, "Linear Regression in Python," Real Python, 2012-2020. [Online]. Available: https://realpython.com/linear-regression-in-python/. [Accessed 6 May 2020].
[3] "Machine Learning - Linear Regression," w3schools, [Online]. Available: https://www.w3schools.com/python/python_ml_linear_regression.asp. [Accessed 6 May 2020].
[4] A. Adib, "Basics of linear regression," Medium, 6 November 2018. [Online]. Available: https://medium.com/datadriveninvestor/basics-of-linear-regression-9b529aeaa0a5. [Accessed 6 May 2020].
[5] Imarticus, "Linear Regression and Its Applications in Machine Learning," Imarticus, 19 March 2019. [Online]. Available: https://blog.imarticus.org/linear-regression-and-its-applications-in-machine-learning/. [Accessed 9 May 2020].