Linear Regression
Linear regression uses the relationship between the data points to draw a straight line through them.
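How is that line found? The usual approach, and the one SciPy's linregress() uses, is a least-squares fit: pick the slope and intercept that minimize the squared vertical distances between the line and the data points. The sketch below is not part of the tutorial; it computes the two values directly from the standard formulas, on a small made-up data set, while the examples that follow let SciPy do this work:

def least_squares(xs, ys):
  # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
  # intercept = mean_y - slope * mean_x
  n = len(xs)
  mean_x = sum(xs) / n
  mean_y = sum(ys) / n
  numerator = sum((xv - mean_x) * (yv - mean_y) for xv, yv in zip(xs, ys))
  denominator = sum((xv - mean_x) ** 2 for xv in xs)
  slope = numerator / denominator
  intercept = mean_y - slope * mean_x
  return slope, intercept

# Made-up data that roughly follows y = 2x + 1
print(least_squares([1, 2, 3, 4, 5], [3, 5, 7, 9, 12]))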
In the example below, the x-axis represents age, and the y-axis represents speed. We have registered the age and speed of 13 cars as they were passing a
tollbooth. Let us see if the data we collected could be used in a linear regression:
import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)
plt.show()
Result: a scatter plot of the thirteen data points.
Example
Import the modules you need and draw the line of linear regression:
import matplotlib.pyplot as plt
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
  return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Result: the scatter plot with the line of linear regression drawn through it.
Example Explained
Import the modules you need.
You can learn about the Matplotlib module in our Matplotlib Tutorial.
You can learn about the SciPy module in our SciPy Tutorial.
import matplotlib.pyplot as plt
from scipy import stats
Create the arrays that represent the values of the x and y axis:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
Execute a method that returns some important key values of Linear Regression:

slope, intercept, r, p, std_err = stats.linregress(x, y)
Create a function that uses the slope and intercept values to return a new value. This new value represents where on the y-axis the corresponding x value will
be placed:
def myfunc(x):
  return slope * x + intercept
Run each value of the x array through the function. This will result in a new array with new values for the y-axis:

mymodel = list(map(myfunc, x))
Draw the original scatter plot, draw the line of linear regression, and display the diagram:

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
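The stats.linregress() call actually returns five values, not just the slope and intercept used above. A minimal sketch that prints all of them for the same data, with a short note on each (the notes summarize what SciPy documents these values to be):

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

print("slope:", slope)          # how much y changes for each step of one unit in x
print("intercept:", intercept)  # where the line crosses the y-axis (the value of y when x is 0)
print("r:", r)                  # coefficient of correlation (explained in the next section)
print("p:", p)                  # p-value for a test whose null hypothesis is that the slope is zero
print("std_err:", std_err)      # standard error of the estimated slope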
R for Relationship
It is important to know how strong the relationship between the values of the x-axis and the values of the y-axis is. If there is no relationship, linear regression cannot be used to predict anything.
This relationship is measured with a value called r, the coefficient of correlation.
The r value ranges from -1 to 1, where 0 means no relationship, and 1 (and -1) means 100% related.
Python and the SciPy module will compute this value for you; all you have to do is feed it the x and y values.
Example
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

print(r)
Note: The result -0.76 shows that there is a relationship. It is not a perfect one, but it indicates that we could use linear regression in future predictions.
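As a side note, the r that linregress() returns is the Pearson correlation coefficient, so NumPy can compute the same number. A minimal sketch using numpy.corrcoef (not part of the tutorial's example); for this data it should print the same value of about -0.76:

import numpy

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
print(numpy.corrcoef(x, y)[0, 1])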
Predict Future Values
Now we can use the information we have gathered to predict future values. For example, let us try to predict the speed of a 10-year-old car. To do so, we need the same myfunc() function from the example above:
def myfunc(x):
  return slope * x + intercept
Example
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
  return slope * x + intercept

speed = myfunc(10)
print(speed)
The example predicted a speed of 85.6, which we could also read from the diagram above.
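If you want predictions for several ages at once, you can run a whole list of values through the same function. This is a small extension of the example, not part of the original tutorial, and the ages chosen here are arbitrary:

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
  return slope * x + intercept

# Predict the speed for a few different car ages
for age in [2, 5, 10, 15]:
  print(age, myfunc(age))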
Bad Fit?
Let us create an example where linear regression would not be the best method to predict future values.
Example
These values for the x- and y-axis should result in a very bad fit for linear regression:
import matplotlib.pyplot as plt
from scipy import stats

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
  return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Result: a scatter plot where the data points clearly do not follow the fitted line.
And the r value?
Example
You should get a very low r value:
from scipy import stats

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

slope, intercept, r, p, std_err = stats.linregress(x, y)

print(r)

The result should be very close to 0, which tells us that this data set is not suitable for linear regression.
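One practical way to use r is as a sanity check before trusting a model. The sketch below is only an illustration: the fit_line() helper and the 0.5 cut-off are made up for this example, not something the tutorial or SciPy prescribes:

from scipy import stats

def fit_line(x, y, min_abs_r=0.5):
  # Refuse to return a prediction function when the relationship is too weak.
  # The 0.5 threshold is an arbitrary choice for illustration.
  slope, intercept, r, p, std_err = stats.linregress(x, y)
  if abs(r) < min_abs_r:
    raise ValueError("relationship too weak for linear regression (r = %.3f)" % r)
  return lambda value: slope * value + intercept

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

# For this data set the call is expected to raise, because the r value above is very low.
predict = fit_line(x, y)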