Fisseha Berhane: Analytical and Numerical Solutions, With R, To Linear Regression Problems
library(ggplot2)      # plotting
library(data.table)   # fast file import (fread)
library(magrittr)     # the %>% pipe
library(caret)        # machine-learning utilities
library(fields)       # image and contour helpers
library(plot3D)       # 3-D surface plots
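The code that reads in the first dataset did not survive the page extraction. A minimal sketch, assuming a headerless comma-separated file ex1data1.txt (the file name and format are assumptions):

ex1data1 = fread('ex1data1.txt', col.names = c('population', 'profit')) %>%
  as.data.frame()   # hypothetical load step; fread comes from data.table
head(ex1data1)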
  population  profit
1     6.1101 17.5920
2     5.5277  9.1302
3     8.5186 13.6620
4     7.0032 11.8540
5     5.8598  6.8233
6     8.3829 11.8860
ex1data1 %>% ggplot(aes(x = population, y = profit)) +
  geom_point(color = "blue", size = 4, alpha = 0.5) +
  ylab('Profit in $10,000s') +
  xlab('Population of City in 10,000s') +
  ggtitle('Figure 1: Scatter plot of training data') +
  theme(plot.title = element_text(size = 16, colour = "red"))
Gradient Descent
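The pages introducing the model and the cost function were lost in extraction. For reference, the squared-error cost minimized here is J(θ) = (1/2m) Σᵢ (θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾)², and a minimal computeCost consistent with the calls below is:

computeCost = function(X, y, theta){
  m = nrow(X)                   # number of training examples
  residuals = (X %*% theta) - y
  sum(residuals^2) / (2 * m)    # half the mean squared error
}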
X = cbind(1, ex1data1$population)   # prepend a column of ones for the intercept
y = ex1data1$profit
head(X)

     [,1]   [,2]
[1,]    1 6.1101
[2,]    1 5.5277
[3,]    1 8.5186
[4,]    1 7.0032
[5,]    1 5.8598
[6,]    1 8.3829
theta = matrix(rep(0, ncol(X)))   # initialize both parameters to zero
round(computeCost(X, y, theta), 2)

32.07
Gradient descent
gradientDescent = function(X, y, theta, alpha, iters){
  gd = list()
  cost = rep(0, iters)
  for(k in 1:iters){
    z = rep(0, ncol(X))             # gradient, one entry per parameter
    for(i in 1:ncol(X)){
      for(j in 1:nrow(X)){          # sum the error term over all examples
        z[i] = z[i] + (((X[j,] %*% theta) - y[j]) * X[j,i])
      }
    }
    theta = theta - ((alpha / nrow(X)) * z)   # simultaneous parameter update
    cost[k] = computeCost(X, y, theta)        # record the cost at each step
  }
  gd$theta = theta
  gd$cost = cost
  gd
}
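The triple loop mirrors the update θ := θ − (α/m) Σᵢ (θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾) x⁽ⁱ⁾ term by term. As an aside (not part of the original article), the same update vectorizes to a single matrix expression per iteration, which is much faster in R:

gradientDescentVec = function(X, y, theta, alpha, iters){
  m = nrow(X)
  cost = rep(0, iters)
  for(k in 1:iters){
    theta = theta - (alpha / m) * (t(X) %*% ((X %*% theta) - y))  # full gradient step
    cost[k] = computeCost(X, y, theta)
  }
  list(theta = theta, cost = cost)
}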
iterations = 1500
alpha = 0.01
theta = matrix(rep(0, ncol(X)))
gradientDescent_results = gradientDescent(X, y, theta, alpha, iterations)
theta = gradientDescent_results$theta
theta

          [,1]
[1,] -3.630291
[2,]  1.166362
data.frame(Cost = gradientDescent_results$cost, Iterations = 1:iterations) %>%
  ggplot(aes(x = Iterations, y = Cost)) + geom_line(color = "blue") +
  ggtitle("Cost as a function of the number of iterations") +
  theme(plot.title = element_text(size = 16, colour = "red"))
ex1data1 %>% ggplot(aes(x = population, y = profit)) +
  geom_point(color = "blue", size = 3, alpha = 0.5) +
  ylab('Profit in $10,000s') +
  xlab('Population of City in 10,000s') +
  ggtitle('Figure 2: Training data with the fitted regression line') +
  geom_abline(intercept = theta[1], slope = theta[2], col = "red", show.legend = TRUE) +
  theme(plot.title = element_text(size = 16, colour = "red")) +
  annotate("text", x = 12, y = 20,
           label = paste0("Profit = ", round(theta[1], 4), " + ", round(theta[2], 4), " * Population"))
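As a quick usage example (added here, not in the original), θ can predict profit for a new city:

as.numeric(c(1, 7) %*% theta) * 10000   # city of 70,000 people: about $45,342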
Visualizing J(θ)
Intercept = seq(from = -10, to = 10, length = 100)
Slope = seq(from = -1, to = 4, length = 100)
Cost = matrix(0, nrow = length(Intercept), ncol = length(Slope))   # grid of cost values
for(i in 1:length(Intercept)){
  for(j in 1:length(Slope)){
    t = c(Intercept[i], Slope[j])
    Cost[i,j] = computeCost(X, y, t)
  }
}
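The surface and contour figures of J(θ) did not survive the extraction. A sketch that would reproduce them using persp3D from the plot3D package loaded above (the original plotting calls are unknown):

persp3D(x = Intercept, y = Slope, z = Cost,          # 3-D surface over the grid
        xlab = 'Intercept', ylab = 'Slope', zlab = 'Cost',
        main = 'Surface plot of the cost function')
contour(Intercept, Slope, Cost, nlevels = 25,        # contour view of the same grid
        xlab = 'Intercept', ylab = 'Slope',
        main = 'Contour plot of the cost function')
points(theta[1], theta[2], col = 'red', pch = 4, lwd = 2)   # mark the gradient-descent estimate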
Normal Equation
θ = (XᵀX)⁻¹ Xᵀ y
theta2 = solve(t(X) %*% X) %*% t(X) %*% y
theta2

          [,1]
[1,] -3.895781
[2,]  1.193034
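A small numerical aside (not in the original): it is cheaper and more stable to solve the normal equations as a linear system than to form the explicit inverse:

theta2 = solve(t(X) %*% X, t(X) %*% y)   # solves (XᵀX) θ = Xᵀ y directly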
After 1500 iterations, gradient descent has not yet fully converged to the analytical solution:

theta

          [,1]
[1,] -3.630291
[2,]  1.166362

Re-running gradient descent, continuing from these estimates, brings them to the normal-equation values:

gradientDescent_results = gradientDescent(X, y, theta, alpha, iterations)
theta = gradientDescent_results$theta
theta

          [,1]
[1,] -3.895781
[2,]  1.193034
R's built-in lm() gives the same coefficients (the call itself was lost in extraction; presumably a fit of profit on population, e.g. lm(profit ~ population, data = ex1data1)):

(Intercept)  -3.89578087831187
population    1.1930336441896

Painless!!
Feature Normalization

The second dataset, ex1data2, is a data.frame of house sizes, numbers of bedrooms, and prices. Because the features differ by orders of magnitude, size and bedrooms are standardized before running gradient descent.
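The normalization code did not survive the extraction; the checks below show size and bedrooms standardized to mean 0 and standard deviation 1, with price left in dollars. A sketch that reproduces this, keeping the training means and standard deviations for later predictions (an assumed reconstruction):

mu    = sapply(ex1data2[, c('size', 'bedrooms')], mean)   # training means
sigma = sapply(ex1data2[, c('size', 'bedrooms')], sd)     # training standard deviations
ex1data2[, c('size', 'bedrooms')] = scale(ex1data2[, c('size', 'bedrooms')])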
round(apply(ex1data2, 2, mean), 10)

    size  bedrooms             price
       0         0  340412.659574468

apply(ex1data2, 2, sd)

    size  bedrooms             price
       1         1  125039.899586401
Gradient Descent
X = cbind(1, ex1data2$size, ex1data2$bedrooms)   # intercept plus the two normalized features
y = ex1data2$price
head(X)

     [,1]        [,2]       [,3]
[1,]    1  0.13000987 -0.2236752
[2,]    1 -0.50418984 -0.2236752
[3,]    1  0.50247636 -0.2236752
[4,]    1 -0.73572306 -1.5377669
[5,]    1  1.25747602  1.0904165
[6,]    1 -0.01973173  1.0904165
iterations = 6000
alpha = 0.01
theta = matrix(rep(0, ncol(X)))
gradientDescent_results = gradientDescent(X, y, theta, alpha, iterations)
theta = gradientDescent_results$theta
theta

           [,1]
[1,] 340412.660
[2,] 110631.050
[3,]  -6649.474
Normal Equation
theta2 = solve(t(X) %*% X) %*% t(X) %*% y
theta2

           [,1]
[1,] 340412.660
[2,] 110631.050
[3,]  -6649.474

Once again, lm() agrees (the call was lost in extraction; presumably a fit of price on size and bedrooms, e.g. lm(price ~ size + bedrooms, data = ex1data2)):

(Intercept)  340412.659574468
size         110631.050278846
bedrooms      -6649.4742708198
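A usage example (added here, not in the original): to price a 1,650 square-foot, three-bedroom house, the new inputs must first be scaled with the training mu and sigma from the normalization step:

x_new = c(1, (1650 - mu['size']) / sigma['size'],
             (3 - mu['bedrooms']) / sigma['bedrooms'])
as.numeric(x_new %*% theta)   # roughly $293,000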
Summary

We solved the same linear regression problems numerically, with batch gradient descent, and analytically, with the normal equation, and checked both against R's built-in lm(). With normalized features and enough iterations, gradient descent converges to the exact least-squares solution that the normal equation and lm() return directly.