
3-LinearRegression Formula Based

The document analyzes a dataset containing head size and brain weight measurements for 237 individuals. Simple linear regression is performed to find the relationship between head size and brain weight, yielding a slope of 0.2634 and intercept of 325.57. The coefficient of determination (R^2) is calculated as 0.6393, indicating a moderately strong linear relationship.
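For reference, the closed-form ordinary least-squares estimates that the notebook computes are the standard textbook formulas:

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$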


In [23]: %matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Download the HeadBrain dataset from Kaggle

In [24]: data = pd.read_csv('headbrain.csv')
print(data.shape)
data.head()

(237, 4)

Out[24]:
   Gender  Age Range  Head Size(cm^3)  Brain Weight(grams)
0       1          1             4512                 1530
1       1          1             3738                 1297
2       1          1             4261                 1335
3       1          1             3777                 1282
4       1          1             4177                 1590

In [25]: X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values

In [26]: # Mean of X and Y
mean_x = np.mean(X)
mean_y = np.mean(Y)

# Total number of values
n = len(X)

# Least-squares slope (b1) and intercept (b0)
numer = 0
denom = 0
for i in range(n):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2
b1 = numer / denom
b0 = mean_y - (b1 * mean_x)

print(b1, b0)

0.26342933948939945 325.57342104944223
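As a side note (not part of the original notebook), the same estimates can be computed without an explicit loop using vectorized NumPy operations; this sketch gives identical values:

# Vectorized equivalent of the loop above
numer = np.sum((X - mean_x) * (Y - mean_y))
denom = np.sum((X - mean_x) ** 2)
b1 = numer / denom
b0 = mean_y - b1 * mean_x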

In [29]: # Let's plot the regression line over the scatter of the raw data
max_x = np.max(X) + 100
min_x = np.min(X) - 100  # extend the line slightly beyond the observed range

x = np.linspace(min_x, max_x, 1000)
y = b0 + b1 * x

plt.plot(x, y, label='Regression Line')
plt.scatter(X, Y, label='Scatter Plot')

plt.xlabel('Head Size (cm^3)')
plt.ylabel('Brain Weight (grams)')
plt.legend()
plt.show()

In [30]: # To judge how good the model is, compute the coefficient of determination (R^2)
# ss_t is the total sum of squares
# ss_r is the residual sum of squares
ss_t = 0
ss_r = 0
for i in range(n):
    y_pred = b0 + b1 * X[i]
    ss_t += (Y[i] - mean_y) ** 2
    ss_r += (Y[i] - y_pred) ** 2
r2 = 1 - (ss_r / ss_t)
print(r2)

0.6393117199570003
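Again as a sketch rather than part of the original notebook, R^2 can also be computed without a loop while X is still a 1-D array:

Y_hat = b0 + b1 * X                # predictions for every observation
ss_r = np.sum((Y - Y_hat) ** 2)    # residual sum of squares
ss_t = np.sum((Y - mean_y) ** 2)   # total sum of squares
r2 = 1 - ss_r / ss_t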

In [32]: # Now let's see how the same model is built with scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# scikit-learn expects a 2-D feature array, so reshape the 1-D
# (rank-1) array into a column vector of shape (n, 1)
X = X.reshape((n, 1))

# Create and fit the model
reg = LinearRegression()
reg = reg.fit(X, Y)

Y_pred = reg.predict(X)

mse = mean_squared_error(Y, Y_pred)
rmse = np.sqrt(mse)
r2 = reg.score(X, Y)  # renamed from r2_score to avoid shadowing sklearn.metrics.r2_score

print(rmse)
print(r2)
72.1206213783709
0.639311719957
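Once fitted, the model can predict brain weight for unseen head sizes; a hypothetical usage sketch (the 4000 cm^3 value is illustrative, not taken from the dataset):

new_head_size = np.array([[4000]])  # input must be 2-D: (n_samples, n_features)
print(reg.predict(new_head_size))   # about 325.57 + 0.2634 * 4000 ≈ 1379 grams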
