Linear Regression Python Sklearn Numpy P PDF

This document provides a tutorial on performing linear regression in Python. It introduces two YouTube videos about installing Anaconda and explaining linear regression in Python. It then shows code for importing libraries, loading sample data, preprocessing the data by removing missing values, fitting a linear regression model using Scikit-Learn, and plotting the regression line along with the data points. The code calculates the slope and intercept of the regression line and outputs the linear regression equation.
Linear Regression Python

December 23, 2015

Linear Regression Python Tutorial by Michael Galarnyk


YouTube video on how to install Anaconda on Mac OS X:
https://www.youtube.com/watch?v=B6d5LrA8bNE
YouTube video explaining linear regression using Python (this notebook):
https://www.youtube.com/watch?v=dSYJVbj4Eew

In [4]: import numpy as np
        import pandas as pd
        from sklearn.linear_model import LinearRegression
        %pylab inline
        import matplotlib.pyplot as plt

Populating the interactive namespace from numpy and matplotlib

In [5]: raw_data = pd.read_csv("linear.csv")  # any dataset will work; the data is available on my GitHub:
        # https://github.com/mGalarnyk/Linear_Regression
        raw_data.head(3)

Out[5]:            x           y
        0  82.583220  134.907414
        1  73.922466  134.085180
        2  34.887445         NaN

1) Preprocess the data to remove any points with a missing y value

In [6]: filtered_data = raw_data[~np.isnan(raw_data["y"])]  # keeps only the rows whose y value is not NaN
        filtered_data.head(3)

Out[6]:            x           y
        0  82.583220  134.907414
        1  73.922466  134.085180
        3  61.839983  114.530638
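
The same filter can also be written with pandas alone. The sketch below is only an illustration: `demo` is a small made-up frame shaped like the first rows shown above (rounded placeholder values, not the contents of linear.csv), and `by_mask` and `by_dropna` are names introduced here. It compares the boolean-mask approach from cell 6 with `DataFrame.dropna(subset=["y"])`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for raw_data (placeholder values, not linear.csv)
demo = pd.DataFrame({
    "x": [82.58, 73.92, 34.89, 61.84],
    "y": [134.91, 134.09, np.nan, 114.53],
})

# Approach used in the notebook: boolean mask built with np.isnan
by_mask = demo[~np.isnan(demo["y"])]

# pandas-only alternative: drop rows whose y value is missing
by_dropna = demo.dropna(subset=["y"])

print(by_mask.equals(by_dropna))  # both approaches keep the same rows
print(list(by_dropna.index))      # original row labels are preserved
```

Both approaches keep the original row labels, which is why the filtered output above jumps from index 1 to index 3.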

2) Fit a linear regression model using scikit-learn's LinearRegression class

In [7]: # Recent scikit-learn releases reject np.matrix input, so plain arrays are used here
        X = filtered_data[["x"]].to_numpy()  # 2-D feature array, shape (n, 1)
        Y = filtered_data["y"].to_numpy()    # 1-D target array
        mdl = LinearRegression().fit(X, Y)   # either this or the next line
        #mdl = LinearRegression().fit(filtered_data[['x']], filtered_data.y)
        m = mdl.coef_[0]    # slope
        b = mdl.intercept_  # intercept
        print("formula: y = {0:.7f}x + {1:.7f}".format(m, b))  # following slope-intercept form

formula: y = 1.5831968x + 4.4701969
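
One way to sanity-check a fitted slope and intercept is to compare them with the closed-form least-squares solution, m = cov(x, y) / var(x) and b = mean(y) - m * mean(x). The sketch below does this on synthetic data; the generating line y = 1.5x + 4, the noise level, and the seed are assumptions chosen for illustration and are unrelated to linear.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not linear.csv)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
y = 1.5 * x + 4.0 + rng.normal(0, 5, size=200)

# scikit-learn expects a 2-D feature array, hence the reshape
mdl = LinearRegression().fit(x.reshape(-1, 1), y)
m_sklearn, b_sklearn = mdl.coef_[0], mdl.intercept_

# Closed-form simple linear regression:
#   m = cov(x, y) / var(x),  b = mean(y) - m * mean(x)
m_closed = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b_closed = y.mean() - m_closed * x.mean()

print("sklearn:     y = {0:.4f}x + {1:.4f}".format(m_sklearn, b_sklearn))
print("closed form: y = {0:.4f}x + {1:.4f}".format(m_closed, b_closed))
```

The two printed lines should agree up to floating-point error, since `LinearRegression` solves the same ordinary least-squares problem.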

In [8]: plt.scatter(X, Y, color='blue')
        plt.plot([0, 100], [b, m*100 + b], 'r')
        plt.title('Linear Regression Example', fontsize=20)
        plt.xlabel('X', fontsize=15)
        plt.ylabel('Y', fontsize=15)

Out[8]: <matplotlib.text.Text at 0x10ba4e250>
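
The regression line in cell 8 is drawn between x = 0 and x = 100, which suits this dataset. For data with a different range, one option is to take the endpoints from the data and let the model compute the matching y values. This is a minimal sketch under assumptions: the four points are made up to lie exactly on y = 2x + 1, and `x_line` and `y_line` are names introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points lying exactly on y = 2x + 1 (illustration only)
X = np.array([[10.0], [20.0], [40.0], [70.0]])
Y = 2.0 * X.ravel() + 1.0

mdl = LinearRegression().fit(X, Y)

# Line endpoints taken from the observed x range instead of a fixed [0, 100]
x_line = np.array([[X.min()], [X.max()]])
y_line = mdl.predict(x_line)  # same values as m * x + b at each endpoint

print(x_line.ravel(), y_line)
```

The resulting pair can then be passed to `plt.plot(x_line.ravel(), y_line, 'r')` in place of the hard-coded endpoints.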

1 Official documentation:
http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
