Polynomial Regression From Scratch in Python - by Rashida Nasrin Sucky - Towards Data Science
Photo by redcharlie on Unsplash
Learn to implement polynomial regression from scratch with some simple Python code
Polynomial Regression Formula
Linear regression can perform well only if there is a linear correlation
between the input variables and the output variable. As I mentioned before,
polynomial regression is built on top of linear regression. If you need a
refresher on linear regression, here is the link:
Linear Regression Algorithm in Python
Learn the concepts of linear regression and develop a complete linear regression algorithm from scratch in Python
towardsdatascience.com
Polynomial regression can find the relationship between the input features and
the output variable in a better way, even if the relationship is not linear. It
uses the same formula as linear regression:
Y = BX + C
Here, we get X and Y from the dataset. X is the input feature and Y is the
output variable. The coefficients (B and C above, the theta values in the code) are initialized randomly.
We are adding more terms here. We use the same input feature and take different
powers of it to make more features. With powers up to three, the formula becomes:
Y = theta0 + theta1*X + theta2*X^2 + theta3*X^3
That way, our algorithm will be able to learn the data better.
Cost Function And Gradient Descent
The cost function gives an idea of how far the predicted hypothesis is from the
actual values. For m training examples, it is the usual squared-error cost:
J = sum((h - Y)^2) / (2m)
where h is the hypothesis, that is, the predicted output. Gradient descent then
updates each theta value in every iteration:
theta_j = theta_j - (alpha/m) * sum((h - Y) * X_j)
Here, alpha is the learning rate. You choose the value of alpha.
Python Implementation of Polynomial Regression
Here is the step-by-step implementation of polynomial regression.
1. We will use a simple dummy dataset for this example that contains the
salaries for different position levels. Import the dataset:
import pandas as pd
import numpy as np
df = pd.read_csv('position_salaries.csv')
df.head()
2. Add the bias column for theta 0. This bias column will contain only 1s,
because multiplying a number by 1 does not change it.
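The code cell for this step did not survive the extraction. A minimal sketch of adding a column of ones (the column name 'bias' is my choice, not necessarily the author's):
# Add a column of 1s so that the first theta acts as the intercept term
df.insert(0, 'bias', 1)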
3. Drop the 'Position' column, since the numeric 'Level' column already carries
the same information:
df = df.drop(columns='Position')
4. Define our input variable X and the output variable y. In this example,
'Level' is the input feature and 'Salary' is the output variable. We want to
predict the salary for each level.
y = df['Salary']
X = df.drop(columns = 'Salary')
X.head()
5. Take the second and third powers of the 'Level' column to make the 'Level1'
and 'Level2' columns.
X['Level1'] = X['Level']**2
X['Level2'] = X['Level']**3
X.head()
6. Now, normalize the data. Divide each column by the maximum value of that
column, so that the values of each column range from 0 to 1. The algorithm
should work even without normalization, but it helps the algorithm converge
faster. Also, calculate m, the length of the dataset.
m = len(X)
X = X/X.max()
7. Define the hypothesis function, which uses X and theta to predict 'y'.
8. Define the cost function, using our cost formula from above:
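The original cost-function code is also not shown. A sketch that implements the squared-error cost above, reusing the hypothesis function and the dataset length m defined earlier:
def cost(X, y, theta):
    # J = (1/2m) * sum((predicted - actual)^2)
    y1 = hypothesis(X, theta)
    return float(np.sum((y1 - y) ** 2) / (2 * m))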
9. Write the function for gradient descent. We will keep updating the theta
values until we find the optimum cost. In each iteration, we will also record
the cost for later analysis.
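The gradient descent code did not come through either. A sketch consistent with the later call gradientDescent(X, y, theta, 0.05, 700), returning the per-epoch cost list J along with the final theta:
def gradientDescent(X, y, theta, alpha, epochs):
    J = []  # cost of each epoch, kept for analysis
    for _ in range(epochs):
        y1 = hypothesis(X, theta)
        # Update every theta value at once: theta_j -= (alpha/m) * sum((y1 - y) * x_j)
        theta = theta - (alpha / m) * np.dot(X.T, y1 - y)
        J.append(cost(X, y, theta))
    return J, theta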
10. All the functions are defined. Now, initialize theta. I am initializing an
array of zeros; you can use any other random values. I am choosing an alpha of
0.05, and I will iterate the theta values for 700 epochs.
theta = np.array([0.0]*len(X.columns))
J, theta = gradientDescent(X, y, theta, 0.05, 700)
11. We got our final theta values, and the cost from each iteration as well.
Let's find the salary predictions using our final theta.
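The prediction snippet is not included in the extracted text; with the hypothesis function sketched above, it is simply:
# Predicted salaries from the learned theta values; y_hat is used in the plot below
y_hat = hypothesis(X, theta)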
12. Now plot the original salary and our predicted salary against the levels.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(x=X['Level'],y= y)
plt.scatter(x=X['Level'], y=y_hat)
plt.show()
Our prediction does not follow the trend of the salaries exactly, but it is
close. Linear regression can only return a straight line, while polynomial
regression gives us a curved line like this one. Even when the data does not
form a nice smooth curve, polynomial regression can learn more complex trends
as well.
13. Let’s plot the cost we calculated in each epoch in our gradient descent
function.
plt.figure()
plt.scatter(x=list(range(0, 700)), y=J)
plt.show()
The cost fell drastically in the beginning and then the fall slowed down. In a
good machine learning algorithm, the cost should keep going down until
convergence. Please feel free to try it with a different number of epochs and
different learning rates (alpha).
Follow this link for the full working code: Polynomial Regression
Recommended reading:
Interactive Geospatial Data Visualization in Python
Map specific parts of the world, present the events on it and navigate around
towardsdatascience.com
Similar Texts Search In Python With A Few Lines Of Code: An NLP Project
Find similar Wikipedia profiles using count-vectorizer and nearest-neighbor method in Python, a simple and useful...
medium.com
Logistic Regression in Python To Detect Heart Disease
Important equations to develop a logistic regression algorithm and how to develop a logistic regression algorithm with...
towardsdatascience.com
Build A Neural Network From Scratch In Python
Detailed explanation and step by step implementation of a Neural Network
medium.com
Build A Recommendation System Using Simple Codes in Python
How to build a movie recommendation system in Python
medium.com