
2 Regression

Uploaded by metapi5906
Copyright © All Rights Reserved

CHAPTER 2: REGRESSION

1. CHECKING LINEARITY : [pg.no:21-22]

PROGRAM :

from pandas import DataFrame
import matplotlib.pyplot as plt

Stock_Market = {
    'Year': [2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
             2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
    'Month': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2, 2,
                      1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    # None marks a missing value: pandas stores it as NaN and the scatter plot skips it
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, None, 5.5, None, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130,
                          1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]}

df = DataFrame(Stock_Market, columns=['Year', 'Month', 'Interest_Rate',
                                      'Unemployment_Rate', 'Stock_Index_Price'])

# Scatter plot 1: a roughly straight-line pattern suggests a linear relationship
plt.scatter(df['Interest_Rate'], df['Stock_Index_Price'], color='red')
plt.title('Stock Index Price Vs Interest Rate', fontsize=14)
plt.xlabel('Interest Rate', fontsize=14)
plt.ylabel('Stock Index Price', fontsize=14)
plt.grid(True)
plt.show()

# Scatter plot 2: stock index price against unemployment rate
plt.scatter(df['Unemployment_Rate'], df['Stock_Index_Price'], color='green')
plt.title('Stock Index Price Vs Unemployment Rate', fontsize=14)
plt.xlabel('Unemployment Rate', fontsize=14)
plt.ylabel('Stock Index Price', fontsize=14)
plt.grid(True)
plt.show()

OUTPUT:
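The scatter plots give a visual check of linearity. As a numeric companion (an addition, not part of the original program), Pearson's correlation coefficient measures how close each relationship is to a straight line: values near +1 or -1 indicate a strong linear relationship, values near 0 a weak one. The minimal sketch below rebuilds the three relevant columns of Stock_Market and calls pandas' DataFrame.corr(), which excludes the missing (None/NaN) unemployment values pair by pair.

```python
import pandas as pd

# The three columns of the Stock_Market data used in the scatter plots above
data = pd.DataFrame({
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2, 2,
                      1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, None, 5.5, None, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130,
                          1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719],
})

# Pearson correlation matrix; missing values are dropped for each column pair
corr = data.corr()
print(corr.round(3))

r_interest = corr.loc['Interest_Rate', 'Stock_Index_Price']
r_unemployment = corr.loc['Unemployment_Rate', 'Stock_Index_Price']
print('Interest rate vs price:', round(r_interest, 3))
print('Unemployment rate vs price:', round(r_unemployment, 3))
```

A strong positive r for the interest rate and a negative r for the unemployment rate are consistent with the straight-line patterns the plots are meant to reveal. Correlation only captures linear association, so the plots are still worth checking for curvature or outliers.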

2. SIMPLE LINEAR REGRESSION:[pg.no:22-23]

PROGRAM :

from pandas import DataFrame
from sklearn import linear_model

Stock_Market = {
    'Year': [2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
             2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
    'Month': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2, 2,
                      1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    # Unemployment_Rate (including its missing values) is not used in this program
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, None, 5.5, None, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130,
                          1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]}

df = DataFrame(Stock_Market, columns=['Year', 'Month', 'Interest_Rate',
                                      'Unemployment_Rate', 'Stock_Index_Price'])

# Here we have 1 variable for linear regression
X = df[['Interest_Rate']]   # double brackets keep X two-dimensional, as sklearn expects
Y = df['Stock_Index_Price']

# Model fitting with sklearn linear regression
regr = linear_model.LinearRegression()
regr.fit(X, Y)

# Displaying intercept and coefficients
print('Intercept:\n', regr.intercept_)
print('\nCoefficients:\n', regr.coef_)

# Prediction with sklearn; a one-row DataFrame keeps the feature name used in fitting
new_interest_rate = 2.75
new_X = DataFrame({'Interest_Rate': [new_interest_rate]})
print('Predicted Stock Index Price:\n', regr.predict(new_X))

OUTPUT:
INTERPRETATION:

Simple linear regression is of the form $y = w_0 + w_1 x$. The output shows $w_0$ (intercept) as $-99.46431881371655$ and $w_1$ (coefficient) as $564.20389249$. According to the above example, the equation becomes

$\text{Stock\_Index\_Price} = w_0 + w_1 \times \text{Interest\_Rate}$

i.e., $\text{Stock\_Index\_Price} = -99.46431881371655 + 564.20389249 \times \text{Interest\_Rate}$

For Interest_Rate = 2.75, $\text{Stock\_Index\_Price} = -99.46431881371655 + 564.20389249 \times 2.75 \approx 1452.09638554$, which is exactly the predicted stock index price.
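The intercept and coefficient reported by sklearn can be reproduced by hand from the closed-form least-squares formulas $w_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $w_0 = \bar{y} - w_1 \bar{x}$. The sketch below applies them with NumPy to the Interest_Rate and Stock_Index_Price columns of the Stock_Market data; it is meant to show what the fit computes, not how sklearn implements it internally.

```python
import numpy as np

# Interest_Rate (x) and Stock_Index_Price (y) from the Stock_Market dictionary above
x = np.array([2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2, 2,
              1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75])
y = np.array([1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130,
              1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719],
             dtype=float)

# Closed-form least squares for one predictor
x_mean, y_mean = x.mean(), y.mean()
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
w0 = y_mean - w1 * x_mean

# Plugging in the new interest rate, as in the program
prediction = w0 + w1 * 2.75
print('w0 =', w0)
print('w1 =', w1)
print('prediction for 2.75 =', prediction)
```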

3. READING FROM A CSV FILE AND PREDICTING A SET OF DEPENDENT VARIABLES :[pg.no:24-25]

PROGRAM:

import pandas as pd
from sklearn import linear_model

# Reading the input data from a csv file
# (stock.csv must contain Interest_Rate and Stock_Index_Price columns)
df = pd.read_csv("stock.csv")

# Here we have 1 variable for linear regression
X = df[['Interest_Rate']]
Y = df['Stock_Index_Price']

# Model fitting with sklearn linear regression
regr = linear_model.LinearRegression()
regr.fit(X, Y)

# Displaying intercept and coefficients
print('Intercept:\n', regr.intercept_)
print('Coefficients:\n', regr.coef_)

# Prediction with sklearn for all the interest rates in the file
new_interest_rate = df[['Interest_Rate']]
df1 = pd.DataFrame(regr.predict(new_interest_rate))
print('Predicted Stock Index Price:\n', df1)

Output:
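The program expects a file named stock.csv in the working directory, whose contents are not shown here. Assuming it holds the same monthly data as the Stock_Market dictionary used earlier, a file with the two columns the program actually reads (Interest_Rate and Stock_Index_Price) can be created with pandas' DataFrame.to_csv, as in this sketch.

```python
import pandas as pd

# Assumption: stock.csv carries the Interest_Rate and Stock_Index_Price values
# from the Stock_Market dictionary used in the earlier programs
stock = pd.DataFrame({
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2, 2,
                      1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130,
                          1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719],
})

# index=False avoids writing the row index as an extra unnamed column
stock.to_csv('stock.csv', index=False)

# Reading it back the same way the program does
check = pd.read_csv('stock.csv')
print(check.head())
```

With this file, the program fits the same model as the previous program, so the prediction in the first row (interest rate 2.75) should match the earlier 1452.09638554.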

4. MULTIPLE LINEAR REGRESSION :[pg.no:25-27]

PROGRAM:

from pandas import DataFrame
from sklearn import linear_model
import statsmodels.api as sm

Stock_Market = {
    'Year': [2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
             2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016],
    'Month': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Interest_Rate': [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2, 2,
                      1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    'Unemployment_Rate': [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                          6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2],
    'Stock_Index_Price': [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130,
                          1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]}

df = DataFrame(Stock_Market, columns=['Year', 'Month', 'Interest_Rate',
                                      'Unemployment_Rate', 'Stock_Index_Price'])

# Here we have 2 variables for multiple regression
X = df[['Interest_Rate', 'Unemployment_Rate']]
Y = df['Stock_Index_Price']

# Model fitting with sklearn linear regression
regr = linear_model.LinearRegression()
regr.fit(X, Y)

# Displaying intercept and coefficients
print('Intercept:\n', regr.intercept_)
print('Coefficients:\n', regr.coef_)

# Prediction with sklearn (columns in the same order as in X)
new_interest_rate = 2.75
new_unemployment_rate = 5.3
new_X = DataFrame({'Interest_Rate': [new_interest_rate],
                   'Unemployment_Rate': [new_unemployment_rate]})
print('Stock Index Price:')
print(regr.predict(new_X))

# Fitting the same model with statsmodels
X = sm.add_constant(X)  # adds the 'const' column used for the intercept
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print(model.summary())

Output:

INTERPRETATION OF RESULTS:

This output includes the intercept and coefficients. We can use this information to build the multiple linear regression equation as follows:

$\text{Stock\_Index\_Price} = (\text{Intercept}) + (\text{Interest\_Rate coef}) \times X_1 + (\text{Unemployment\_Rate coef}) \times X_2$

Substituting the values of the intercept and coefficients, we get

$\text{Stock\_Index\_Price} = 1798.4040 + 345.5401 \times X_1 - 250.1466 \times X_2$

Let Interest_Rate = 2.75 (i.e., $X_1 = 2.75$) and Unemployment_Rate = 5.3 (i.e., $X_2 = 5.3$). Substituting these values into the regression equation gives exactly the predicted result displayed:

$1798.4040 + 345.5401 \times 2.75 - 250.1466 \times 5.3 \approx 1422.86$

The OLS Regression Results table generated by statsmodels contains detailed statistical information about the fit. Some of the important entries are:

• Adj. R-squared reflects the fit of the model, adjusted for the number of predictors.
• R-squared values range from 0 to 1, where a higher value generally indicates a better fit, assuming certain conditions are met.
• The const coefficient is the Y-intercept. It means that if both Interest_Rate and Unemployment_Rate are zero, the expected output (i.e., Y) equals the const coefficient.
• The Interest_Rate coefficient represents the change in the output Y due to a change of one unit in the interest rate (everything else held constant).
• The Unemployment_Rate coefficient represents the change in the output Y due to a change of one unit in the unemployment rate (everything else held constant).
• std err reflects the precision of each coefficient estimate. The lower it is, the more precise the estimate.
• P>|t| is the p-value. A p-value below 0.05 is conventionally considered statistically significant.
• The [0.025 0.975] columns give the 95% confidence interval, the range in which each coefficient is likely to fall (with a likelihood of 95%).

Notice that the coefficients captured in this table (highlighted) match the coefficients generated by sklearn. We got consistent results by applying both sklearn and statsmodels.
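Both sklearn and statsmodels solve the same ordinary least-squares problem. As a cross-check, the sketch below fits the two-variable model with NumPy's np.linalg.lstsq on the data from this program, using a design matrix whose first column of ones plays the role of the const term added by sm.add_constant. It also computes R-squared as $1 - SS_{res}/SS_{tot}$ and adjusted R-squared as $1 - (1 - R^2)(n - 1)/(n - p - 1)$, where $n$ is the number of observations and $p$ the number of predictors. These are the standard definitions; the printed values should agree with the summary table up to rounding.

```python
import numpy as np

# Interest_Rate, Unemployment_Rate and Stock_Index_Price from this program's data
interest = np.array([2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2, 2,
                     1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75])
unemployment = np.array([5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                         6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2])
price = np.array([1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130,
                  1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719],
                 dtype=float)

# Design matrix: a column of ones (the constant) followed by the two predictors
A = np.column_stack([np.ones_like(interest), interest, unemployment])

# Ordinary least squares: minimise ||A @ w - price||^2
w, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, b_interest, b_unemployment = w

# R-squared and adjusted R-squared from the residuals
residuals = price - A @ w
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
n, p = A.shape[0], A.shape[1] - 1   # observations, predictors (excluding the constant)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print('const, Interest_Rate, Unemployment_Rate:', intercept, b_interest, b_unemployment)
print('R-squared:', r2, 'Adj. R-squared:', adj_r2)
```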

5. LINEAR REGRESSION:[pg.no:29-30]

PROGRAM :

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Visualizing the Linear Regression results
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Linear Regression')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Output:

Explanation:

In this example, we have used 4 libraries, namely numpy, pandas, matplotlib and sklearn. We first imported the libraries and loaded the dataset. The dataset is a table containing all the values in our csv file. X is the 2nd column, which contains the Years of Experience values, and y is the last column, which contains the Salary values. X is selected with the slice 1:2 rather than the index 1 so that it stays a two-dimensional array, which sklearn expects for its features. We then split the dataset into a training set and a test set (each with its own X and y values).

test_size=0.2: We have split our dataset (10 observations) into 2 parts (training set and test set), and the ratio of the test set to the whole dataset is 0.2, so 2 observations are put into the test set. Writing 1/5 instead of 0.2 gives the same 20%. We should not make the test set too big: if it is too big, we will lack data to train on. Normally, we pick somewhere around 5% to 30%.

train_size: If test_size is already given, the rest of the data is automatically assigned to the training set.

random_state: This is the seed for the random number generator; an instance of numpy's RandomState class can be passed as well. If we leave it as None (the default), the global RandomState instance used by np.random is used, so the split can change from run to run; a fixed integer such as 0 makes the split reproducible. We now have the training set and test set, and we have built the linear regression model. Next, we will build a polynomial regression model and visualize it.
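The sizing and seeding rules described for train_test_split can be illustrated without sklearn. The helper shuffle_split below is hypothetical (it is not an sklearn function): it gives the test set ceil(test_size × n) rows, which is how sklearn sizes a fractional test_size, and shuffles the row indices with a seeded NumPy RandomState so that the same seed always produces the same split.

```python
import math
import numpy as np

def shuffle_split(n_samples, test_size, seed):
    """Hypothetical helper: return (train_idx, test_idx) for a shuffled split.

    The test set gets ceil(test_size * n_samples) rows; the rest go to training.
    """
    n_test = math.ceil(test_size * n_samples)
    rng = np.random.RandomState(seed)      # seeded generator gives a repeatable shuffle
    perm = rng.permutation(n_samples)      # shuffled row indices 0..n_samples-1
    return perm[n_test:], perm[:n_test]

# 10 observations with test_size=0.2, as in the example
train_idx, test_idx = shuffle_split(10, 0.2, seed=0)
print(len(train_idx), len(test_idx))       # 8 2
print('test rows:', test_idx)

# Same seed, same split
again_train, again_test = shuffle_split(10, 0.2, seed=0)
print(np.array_equal(test_idx, again_test))  # True
```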

6. POLYNOMIAL REGRESSION :[pg.no:30-31]

PROGRAM :

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
# (the split is not used below: the polynomial model is fitted on all rows)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fitting polynomial regression to the dataset
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)   # columns: 1, x, x^2, x^3, x^4
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)

# Visualizing the Polynomial Regression results
def viz_polynomial():
    plt.scatter(X, y, color='red')
    # transform (not fit_transform): reuse the already-fitted feature expansion
    plt.plot(X, lin_reg.predict(poly_reg.transform(X)), color='blue')
    plt.title('Polynomial Regression')
    plt.xlabel('Years of Experience')
    plt.ylabel('Salary')
    plt.show()

viz_polynomial()

OUTPUT:
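PolynomialFeatures(degree=4) turns a single input column x into the five columns 1, x, x², x³, x⁴, and an ordinary LinearRegression is then fitted on those columns; the model stays linear in its coefficients even though the curve is not linear in x. The sketch below reproduces that idea with NumPy only: np.vander(..., increasing=True) builds the same five columns and np.linalg.lstsq fits them. The data are made up (a known degree-4 polynomial evaluated at levels 1 to 10) so that the recovered coefficients can be checked.

```python
import numpy as np

# Hypothetical data: 10 levels and a known degree-4 relationship
x = np.arange(1, 11, dtype=float)
true_coefs = np.array([3.0, 2.0, -0.5, 0.0, 0.1])   # c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4

# Columns 1, x, x^2, x^3, x^4: the expansion PolynomialFeatures(degree=4) gives for one column
X_poly = np.vander(x, 5, increasing=True)
y = X_poly @ true_coefs

# Fitting an ordinary linear model on the expanded columns
coefs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(coefs, 6))
```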

7. LOGISTIC REGRESSION:[pg.no:32-33]

PROGRAM FOR CONFUSION MATRIX:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'y_Predicted': [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
        'y_Actual':    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]}

df = pd.DataFrame(data, columns=['y_Actual', 'y_Predicted'])

# Creating confusion matrix (margins=True adds the 'All' row and column totals)
confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'],
                               rownames=['Actual'], colnames=['Predicted'], margins=True)

# Generating heatmap and displaying it
ax = sn.heatmap(confusion_matrix, annot=True)
plt.show()

# Printing the confusion matrix table
print(confusion_matrix)

Output:
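Each cell of the confusion matrix counts one of four outcomes: true positives (actual 1, predicted 1), true negatives (actual 0, predicted 0), false positives (actual 0, predicted 1) and false negatives (actual 1, predicted 0). The plain-Python sketch below counts them for the same two lists and derives accuracy, precision and recall from the counts; these are the standard definitions, added here for illustration.

```python
# Same lists as in the data dictionary of the program
y_actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]

pairs = list(zip(y_actual, y_predicted))
tp = pairs.count((1, 1))   # actual 1, predicted 1
tn = pairs.count((0, 0))   # actual 0, predicted 0
fp = pairs.count((0, 1))   # actual 0, predicted 1
fn = pairs.count((1, 0))   # actual 1, predicted 0

accuracy = (tp + tn) / len(pairs)   # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted 1s that are actually 1
recall = tp / (tp + fn)             # share of actual 1s that were predicted as 1
print(tp, tn, fp, fn)                                               # 4 5 2 1
print(round(accuracy, 3), round(precision, 3), round(recall, 3))    # 0.75 0.667 0.8
```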

8. PROGRAM:[pg.no:33-35]

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt

candidates = {
    'gmat': [780, 750, 690, 710, 680, 730, 690, 720, 740, 690, 610, 690, 710, 680,
             770, 610, 580, 650, 540, 590, 620, 600, 550, 550, 570, 670, 660, 580,
             650, 660, 640, 620, 660, 660, 680, 650, 670, 580, 590, 690],
    'gpa': [4, 3.9, 3.3, 3.7, 3.9, 3.7, 2.3, 3.3, 3.3, 1.7, 2.7, 3.7, 3.7, 3.3,
            3.3, 3, 2.7, 3.7, 2.7, 2.3, 3.3, 2, 2.3, 2.7, 3, 3.3, 3.7, 2.3,
            3.7, 3.3, 3, 2.7, 4, 3.3, 3.3, 2.3, 2.7, 3.3, 1.7, 3.7],
    'work experience': [3, 4, 3, 5, 4, 6, 1, 4, 5, 1, 3, 5, 6, 4,
                        3, 1, 4, 6, 2, 3, 2, 1, 4, 1, 2, 6, 4, 2,
                        6, 5, 1, 2, 4, 6, 5, 1, 2, 1, 4, 5],
    'admitted': [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
                 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
                 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]}

df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work experience', 'admitted'])

X = df[['gmat', 'gpa', 'work experience']]
y = df['admitted']

# Splitting the dataset into training (75%) and testing (25%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fitting logistic regression to the training data and predicting the test set
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)

# Creating confusion matrix
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'],
                               margins=True)

# Generating heatmap and displaying it
ax = sn.heatmap(confusion_matrix, annot=True)
plt.show()
print(confusion_matrix)

# Displaying accuracy
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))

Output:
9. PROGRAM:[pg.no:37-38]

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt

candidates = {
    'gmat': [780, 750, 690, 710, 680, 730, 690, 720, 740, 690, 610, 690, 710, 680,
             770, 610, 580, 650, 540, 590, 620, 600, 550, 550, 570, 670, 660, 580,
             650, 660, 640, 620, 660, 660, 680, 650, 670, 580, 590, 690],
    'gpa': [4, 3.9, 3.3, 3.7, 3.9, 3.7, 2.3, 3.3, 3.3, 1.7, 2.7, 3.7, 3.7, 3.3,
            3.3, 3, 2.7, 3.7, 2.7, 2.3, 3.3, 2, 2.3, 2.7, 3, 3.3, 3.7, 2.3,
            3.7, 3.3, 3, 2.7, 4, 3.3, 3.3, 2.3, 2.7, 3.3, 1.7, 3.7],
    'work experience': [3, 4, 3, 5, 4, 6, 1, 4, 5, 1, 3, 5, 6, 4,
                        3, 1, 4, 6, 2, 3, 2, 1, 4, 1, 2, 6, 4, 2,
                        6, 5, 1, 2, 4, 6, 5, 1, 2, 1, 4, 5],
    'admitted': [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
                 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
                 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]}

df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work experience', 'admitted'])

X = df[['gmat', 'gpa', 'work experience']]
y = df['admitted']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)

# New candidates the model has not seen (same column names as the training data)
new_candidates = {'gmat': [590, 740, 680, 610, 710],
                  'gpa': [2, 3.7, 3.3, 2.3, 3],
                  'work experience': [3, 4, 6, 1, 5]}

df2 = pd.DataFrame(new_candidates, columns=['gmat', 'gpa', 'work experience'])

# Predicting admission (1) or rejection (0) for each new candidate
y_pred = logistic_regression.predict(df2)
print(df2)
print(y_pred)

Output:
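LogisticRegression models the probability of admission as the logistic (sigmoid) function of a linear score, $p = 1/(1 + e^{-z})$ with $z = b_0 + b_1 \cdot \text{gmat} + b_2 \cdot \text{gpa} + b_3 \cdot \text{work experience}$. For a binary problem like this one, scikit-learn's predict returns class 1 when $z > 0$, which is the same as $p > 0.5$. The sketch below applies that rule to the five new candidates; the intercept and weights are hypothetical round numbers chosen for illustration, not the coefficients the fitted model learns (those are available as logistic_regression.intercept_ and logistic_regression.coef_).

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical intercept and weights for gmat, gpa and work experience;
# NOT the coefficients learned by the fitted model in the program above
intercept = -12.0
weights = np.array([0.01, 1.0, 0.5])

# The five new candidates: gmat, gpa, work experience
new_candidates = np.array([
    [590, 2.0, 3],
    [740, 3.7, 4],
    [680, 3.3, 6],
    [610, 2.3, 1],
    [710, 3.0, 5],
])

scores = intercept + new_candidates @ weights   # linear score z for each candidate
probabilities = sigmoid(scores)                 # estimated P(admitted = 1)
predictions = (scores > 0).astype(int)          # z > 0 is equivalent to p > 0.5
print(np.round(probabilities, 3))
print(predictions)
```

Because the weights here are invented, these predictions need not match the y_pred printed by the program; the point is the score, probability and threshold chain.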
