2 Regression
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt

# Sample data; each value list is truncated in the source, so only the
# surviving fragments are shown. The 'Interest_Rate' list is missing entirely.
Stock_Market = {'Year': [2017, 2017, 2017, 2017, 2017, 2017, 2016, 2016, 2016, 2016],   # truncated
                'Unemployment_Rate': [5.4, 5.6, None, 5.5, None, 5.6, 5.7, 5.9, 6, 5.9, 5.8, 6.1],   # truncated
                'Stock_Index_Price': [1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075, 1047,
                                      965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]}
df = pd.DataFrame(Stock_Market)

plt.scatter(df['Interest_Rate'], df['Stock_Index_Price'], color='red')
plt.title('Stock Index Price vs Interest Rate', fontsize=14)
plt.grid(True)
plt.show()

plt.scatter(df['Unemployment_Rate'], df['Stock_Index_Price'], color='green')
plt.title('Stock Index Price vs Unemployment Rate', fontsize=14)
plt.grid(True)
plt.show()
OUTPUT:
PROGRAM:
import pandas as pd
from sklearn import linear_model

# Same Stock_Market dictionary as above (value lists truncated in the source)
df = pd.DataFrame(Stock_Market)

X = df[['Interest_Rate']]
Y = df['Stock_Index_Price']

regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept:\n', regr.intercept_)
print('\nCoefficients:\n', regr.coef_)

new_interest_rate = 2.75
print(regr.predict([[new_interest_rate]]))
OUTPUT:
INTERPRETATION:
Simple linear regression is of the form y = w0 + w1x. The output shows w0 (Intercept) and w1 (Coefficient). Substituting the new interest rate of 2.75 into this equation gives Stock_Index_Price = 1452.09638554, which is exactly the predicted stock index price.
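The relationship above can be checked with a minimal, self-contained sketch. The data here is synthetic (y = 1 + 2x), not the Stock_Market values, so the learned w0 and w1 are known in advance:

```python
# Synthetic data following y = 1 + 2x exactly, so w0 = 1 and w1 = 2.
from sklearn import linear_model

X = [[1.0], [2.0], [3.0], [4.0]]   # single feature
y = [3.0, 5.0, 7.0, 9.0]

regr = linear_model.LinearRegression()
regr.fit(X, y)

w0, w1 = regr.intercept_, regr.coef_[0]

# The model's prediction is just w0 + w1 * x.
x_new = 2.75
manual = w0 + w1 * x_new
predicted = regr.predict([[x_new]])[0]
print(round(w0, 6), round(w1, 6))          # ~1.0 ~2.0
print(abs(manual - predicted) < 1e-9)      # the two values agree
```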
3. READING FROM A CSV FILE AND PREDICTING A SET OF DEPENDENT VARIABLES :[pg.no:24-25]
PROGRAM:
import pandas as pd
from pandas import DataFrame
from sklearn import linear_model

df = pd.read_csv("stock.csv")

X = df[['Interest_Rate']]
Y = df['Stock_Index_Price']

regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept:\n', regr.intercept_)
print('Coefficients:\n', regr.coef_)

# Predict the dependent variable for every interest rate in the file
new_interest_rate = df[['Interest_Rate']]
df1 = DataFrame(regr.predict(new_interest_rate))
print(df1)
Output:
4. MULTIPLE LINEAR REGRESSION:
PROGRAM:
import pandas as pd
from sklearn import linear_model
import statsmodels.api as sm

# Same Stock_Market dictionary as in the earlier programs (value lists
# truncated in the source)
df = pd.DataFrame(Stock_Market)

X = df[['Interest_Rate', 'Unemployment_Rate']]
Y = df['Stock_Index_Price']

# Multiple linear regression with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)
print('Intercept:\n', regr.intercept_)
print('Coefficients:\n', regr.coef_)

new_interest_rate = 2.75
new_unemployment_rate = 5.3
print(regr.predict([[new_interest_rate, new_unemployment_rate]]))

# The same regression with statsmodels; an intercept must be added explicitly
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print(model.summary())
Output:
INTERPRETATION OF RESULT:
This output includes the intercept and coefficients. We can use this information to build the multiple linear regression equation:
Stock_Index_Price = Intercept + (Interest_Rate coefficient)*X1 + (Unemployment_Rate coefficient)*X2
The OLS Regression Results table generated by statsmodels displays comprehensive statistical information. Following are some important pieces of information from the OLS Regression Results table.
Notice that the coefficients captured in this table (highlighted) match the coefficients generated by sklearn. We got consistent results by applying both sklearn and statsmodels.
5. LINEAR REGRESSION:[pg.no:29-30]
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Visualising the fitted line against the data
plt.scatter(X, y, color='red')
plt.plot(X, regressor.predict(X), color='blue')
plt.title('Linear Regression')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
Output:
Explanation:
In this example we have used 4 libraries, namely numpy, pandas, matplotlib and sklearn. We imported the libraries and loaded the dataset first. The dataset is a table which contains all the values in our csv file. X is the 2nd column, which contains the Years of Experience values, and y is the last column, which contains the Salary values. We split our dataset to get a training set and a testing set (both X and y values for each set).
test_size=0.2: We split our dataset (10 observations) into 2 parts (training set, test set), and the ratio of the test set to the whole dataset is 0.2, so 2 observations are put into the test set. We could write 1/5 instead to get 20%, or 0.2; they are the same. We should not make the test set too big: if it is, we will lack data for training. Normally, we pick around 5% to 30%.
train_size: If the test size is already given, the rest of the data automatically becomes the train set, so train_size need not be specified.
random_state: This is the seed for the random number generator; we can pass an instance of the RandomState class as well. A fixed integer such as 0 makes the split reproducible, while leaving it unset gives a different split on each run.
We have already built the train set, the test set, and the linear regression model. Now we will build a polynomial regression model on the same data.
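The splitting behaviour described above can be verified with a small sketch (the 10-observation array here is illustrative, not position_salaries.csv):

```python
# With 10 observations and test_size=0.2, exactly 2 observations go to
# the test set; a fixed random_state makes the split reproducible.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)   # 10 observations, one feature
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))   # 8 2

# The same random_state yields the identical split on every run
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
print((X_test == X_test2).all())   # True
```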
6. POLYNOMIAL REGRESSION:
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt

dataset = pd.read_csv('position_salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Expand X to polynomial terms of degree 4 and fit a linear model on them
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
def viz_polynomial():
plt.scatter(X, y, color='red')
plt.plot(X, lin_reg.predict(poly_reg.fit_transform(X)),
color='blue')
plt.title('Polynomial Regression')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
return
viz_polynomial()
OUTPUT:
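As a quick illustration of the PolynomialFeatures step used above, degree=4 maps a single feature x to the columns [1, x, x^2, x^3, x^4] (the inputs below are illustrative):

```python
# PolynomialFeatures(degree=4) includes the bias column (1) by default,
# so a single feature becomes 5 columns: 1, x, x^2, x^3, x^4.
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform([[2.0], [3.0]])
print(X_poly)          # rows: [1, 2, 4, 8, 16] and [1, 3, 9, 27, 81]
print(X_poly.shape)    # (2, 5)
```

This is why the fitted LinearRegression above has one coefficient per polynomial term rather than a single slope.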
7. LOGISTIC REGRESSION:[pg.no:32-33]
PROGRAM:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

# confusion_matrix is the Actual-vs-Predicted crosstab produced by a fitted
# logistic regression model (the full program appears in the next section)
ax = sn.heatmap(confusion_matrix, annot=True)
plt.show()

# Getting the statistics of the confusion matrix
print(confusion_matrix)
Output:
8. PROGRAM:[pg.no:33-35]
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Candidate data; the 'gmat' and 'work_experience' lists are truncated in
# the source, and the 'gpa' list is missing entirely
candidates = {
    'gmat': [780, 750, 690, 710, 680, 730, 690, 720, 740,
             690, 610, 690, 710, 680, 770, 610, 580, 650, 540,
             660, 640, 620, 660, 660, 680, 650, 670, 580, 590, 690],   # truncated
    'work_experience': [4, 3, 1, 4, 6, 2, 3, 2, 1, 4, 1, 2, 6, 4, 2, 6, 5, 1, 2, 4,
                        6, 5, 1, 2, 1, 4, 5],   # leading values missing in source
    'admitted': [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
                 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1,
                 1, 0, 0, 0, 0, 1]}
df = pd.DataFrame(candidates)

X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)

confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
ax = sn.heatmap(confusion_matrix, annot=True)
plt.show()
print(confusion_matrix)

# Displaying accuracy
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))
Output:
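The confusion-matrix and accuracy computation above can be sketched in isolation with tiny hand-made label arrays (illustrative, not the candidates data):

```python
# Build a confusion matrix with pd.crosstab and compute accuracy from
# small hand-made actual/predicted label arrays.
import pandas as pd
from sklearn import metrics

y_test = pd.Series([1, 0, 1, 1, 0, 0, 1, 0], name='Actual')
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 0], name='Predicted')

# Rows are actual classes, columns are predicted classes
confusion_matrix = pd.crosstab(y_test, y_pred)
print(confusion_matrix)

# Accuracy = correct predictions / total predictions
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy:', acc)   # 0.75 (6 of 8 labels match)
```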
9. PROGRAM:[pg.no:37-38]
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Candidate data; every value list except 'admitted' is truncated in the source
candidates = {
    'gmat': [780, 750, 690, 710, 680, 730, 690, 720, 740,
             690, 610, 690, 710, 680, 770, 610, 580, 650, 540, 590, 620,
             600, 550, 550, 570, 670, 660, 580, 650, 660, 640, 620, 660],   # truncated
    # The 'gpa' list survives only as its final value, 3.7, and the
    # 'work_experience' list only as the fragment
    # [... 4, 3, 1, 4, 6, 2, 3, 2, 1, 4, 1, 2, 6, 4, 2, 6, 5, 1, 2, 4, 6, 5, 1, 2, 1, 4, 5]
    'admitted': [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1,
                 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]}
df = pd.DataFrame(candidates)

X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)

# New candidates to classify; their value lists are missing from the source
new_candidates = {}   # 'gmat', 'gpa' and 'work_experience' lists missing
df2 = pd.DataFrame(new_candidates, columns=['gmat', 'gpa', 'work_experience'])

y_pred = logistic_regression.predict(df2)
print(df2)
print(y_pred)
Output: