Supervised Learning
Example Log Transformation
Import Libraries
import seaborn as sns
import numpy as np
import pandas as pd
Load Data Set and make a copy
tips = sns.load_dataset('tips')
tips1 = tips.copy()   # .copy() so the log transform below does not modify the original dataset
tips1
Create Box plot to check outliers
sns.boxplot(data=tips1, x='day', y='total_bill')
Create dist plot
sns.histplot(tips1['total_bill'], kde=True)   # distplot is deprecated in recent seaborn versions
Apply log Transformation to address outliers
tips1['total_bill'] = np.log10(tips1['total_bill'])
Create box plot and check outlier again
sns.boxplot(data=tips1, x='day', y='total_bill')
Create dist plot
sns.histplot(tips1['total_bill'], kde=True)   # distplot is deprecated in recent seaborn versions
Save the result as .xlsx
tips1.to_excel('C:\\Noble\\Training\\DS Temporary Files\\tips.xlsx')
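Because the transformation is base-10, the original bill amounts can be recovered with the inverse 10**x. A minimal sketch, assuming tips1['total_bill'] currently holds the log-transformed values and using a hypothetical column name total_bill_original:
tips1['total_bill_original'] = np.power(10, tips1['total_bill'])   # undo the log10 transform
tips1[['total_bill', 'total_bill_original']].head()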
Simple Linear Regression
Import the Libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
Load the Data Set
os.chdir('C:\\Noble\\Training\\Top Mentor\\Training\\Data Set\\')
os.getcwd()
df1= pd.read_csv('Salary_Data.csv')
print (df1)
Create the graph to check the trend
plt.plot(df1["YearsExperience"], df1["Salary"])
plt.show()
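A scatter plot can make the trend easier to judge than a connected line, since the rows are not sorted by experience. A minimal sketch, reusing df1 from above:
plt.scatter(df1["YearsExperience"], df1["Salary"])   # one point per record
plt.xlabel("YearsExperience")
plt.ylabel("Salary")
plt.show()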
Split the data into x and y (independent and dependent variables)
x = df1.iloc[:,:-1].values
print (x)
y = df1.iloc[:,1].values
print (y)
Split the Data: Train-Test Split
from sklearn.model_selection import train_test_split
x_train, x_test,y_train,y_test = train_test_split(x,y,test_size=0.2)
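The split is random, so the predictions and the R² score below change on every run. A minimal sketch of a reproducible variant with a fixed random_state (the later sections use random_state=42 in the same way):
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)   # fixed seed gives the same split every run
print(x_train.shape, x_test.shape)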
Model fitting
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x_train, y_train)
Prediction
y_pred= reg.predict(x_test)
print (y_pred)
y = mx + c (coefficient and intercept values): m is the slope given by reg.coef_ and c is the intercept given by reg.intercept_
from sklearn.metrics import r2_score
print ('Coefficient', reg.coef_)
print ('Intercept', reg.intercept_)
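As a quick check, the fitted line can be evaluated by hand from the coefficient and intercept. A minimal sketch, reusing reg and x_test from above:
m = reg.coef_[0]        # slope of the fitted line
c = reg.intercept_      # intercept of the fitted line
manual_pred = m * x_test[:, 0] + c
print(manual_pred)      # matches reg.predict(x_test)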
Accuracy of the model (R² score)
r2_score(y_test,y_pred)
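r2_score computes R² = 1 - SS_res / SS_tot. A minimal sketch that reproduces the value directly with numpy:
ss_res = np.sum((y_test - y_pred) ** 2)              # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)     # total sum of squares
print(1 - ss_res / ss_tot)                           # same value as r2_score(y_test, y_pred)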
Final Result in Data Frame
x_final = pd.DataFrame(x,columns= ['Experience'])
y_final = pd.DataFrame(y,columns= ['Salary'])
y_pred_final = pd.DataFrame(y_pred,columns= ['Salary Prediction'])
result = pd.concat([x_final,y_final,y_pred_final], axis =1)
print (result)
result.to_excel("C:\\Noble\\Training\\DS Temporary Files\\Simple
Regression.xlsx")
Create a Graph with predicted numbers
plt.scatter(x_train,y_train)
plt.plot(x_train, reg.predict(x_train), color='red')
Create the predicted graph on test data
plt.scatter(x_test,y_test)
plt.plot(x_train, reg.predict(x_train), color='red')
Prediction for new set of data
y_pred = reg.predict([[12], [9.6], [8.5], [2.5]])
print (y_pred)
Linear Regression Prediction with Data Frame
Import Libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
Change directory
os.chdir('C:\\Noble\\Training\\Top Mentor\\Training\\Data Set\\')
os.getcwd()
Load Data Set
df1= pd.read_csv('Salary_Data.csv')
print (df1)
Plot Graph
plt.plot(df1["YearsExperience"], df1["Salary"])
plt.show()
X and Y as Data Frame
x = df1.iloc[:,:-1]
print (x)
y = df1.iloc[:,1]
print (y)
Train Test Split
from sklearn.model_selection import train_test_split
x_train, x_test,y_train,y_test = train_test_split(x,y,test_size=0.2)
Linear Regression
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x_train, y_train)
Prediction
y_pred= reg.predict(x_test)
print (y_pred)
Coefficient and Intercept
print ('Coefficient', reg.coef_)
print ('Intercept', reg.intercept_)
Accuracy
from sklearn.metrics import r2_score
r2_score(y_test,y_pred)
Export data to Excel
y_pred_final = pd.DataFrame(reg.predict(x),columns= ['Salary Prediction'])
result = pd.concat([x,y,y_pred_final], axis =1)
print (result)
result.to_excel("C:\\Noble\\Training\\DS Temporary Files\\Simple
Regression.xlsx")
Multiple Linear Regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from sklearn.metrics import r2_score
Load Data Set
os.chdir('C:\\Noble\\Training\\Top Mentor\\Training\\Data Set\\')
df1=pd.read_csv('50_Startups.csv')
df1
Split x and y
x = df1.iloc[:,:-1].values
print (x)
y = df1.iloc[:,4].values
print (y)
Label Encoding
from sklearn.preprocessing import LabelEncoder
Label = LabelEncoder()
x[:,3]= Label.fit_transform(x[:,3])
print (x)
One Hot Encoding
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
x = np.array(ct.fit_transform(x))
print (x)
Print X as Data Frame
print (pd.DataFrame(x))
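The transformed array places the encoded State dummies first and the passthrough columns after them, and the dummy codes 0/1/2 come from LabelEncoder's alphabetical ordering of the states. A minimal sketch that labels the columns, assuming a recent scikit-learn where ColumnTransformer has get_feature_names_out:
feature_names = ct.get_feature_names_out()     # generic x0-style names, because x is a plain numpy array
print(pd.DataFrame(x, columns=feature_names))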
Split the data: Train-Test Split
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
Create the Model
from sklearn.linear_model import LinearRegression
reg= LinearRegression()
reg.fit(x_train,y_train)
Predictions
y_pred= reg.predict(x_test)
print (y_pred)
Print Result
result = pd.concat([pd.DataFrame(y_pred),pd.DataFrame(y_test)], axis =1)
print (result)
Print Y and Prediction in one data frame - Concat
y_pre= pd.DataFrame(y_pred, columns =['Prediction'])
y_te = pd.DataFrame(y_test,columns= ['Actual'])
x_te = pd.DataFrame(x_test, columns=['California', 'Florida', 'New York', 'R&D Spend', 'Administration', 'Marketing Spend'])
result = pd.concat([x_te,y_te,y_pre], axis =1)
print (result)
Accuracy
r2_score(y_test, y_pred)
Regression Coefficient
reg.coef_
Regression Intercept
reg.intercept_
Ordinary Least Squares (OLS) Method
x=x.astype('float64')
import statsmodels.api as sm
reg_ols = sm.OLS (endog = y, exog = x)
reg_ols = reg_ols.fit()
print (reg_ols.summary())
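Note that sm.OLS does not add an intercept by itself, so the summary above is for a model through the origin. A minimal sketch of a variant with an explicit constant, dropping one State dummy so the constant is not collinear with the dummies (the dummy variable trap):
x_const = sm.add_constant(x[:, 1:])               # drop the first dummy, prepend a column of ones
reg_ols_const = sm.OLS(endog=y, exog=x_const).fit()
print(reg_ols_const.summary())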
Tune the model by removing columns whose p-value is greater than 0.05
Print the Data Frame
pd.DataFrame(x)
Create the OLS model after removing the variable with the highest p-value: remove column 4 (Administration)
x_opt=x[:,[0,1,2,3,5]]
import statsmodels.api as sm
reg_ols = sm.OLS (endog = y, exog =x_opt)
reg_ols = reg_ols.fit()
print (reg_ols.summary())
Create the OLS model after again removing the variable with the highest p-value: remove the last column (Marketing Spend)
x_opt=x[:,[0,1,2,3]]
import statsmodels.api as sm
reg_ols = sm.OLS (endog = y, exog =x_opt)
reg_ols = reg_ols.fit()
print (reg_ols.summary())
With every variable whose p-value is greater than 0.05 removed, create the model again with the reduced set of columns
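The manual elimination above can also be automated. A minimal sketch of backward elimination at the 0.05 level, reusing x and y and repeatedly dropping the column with the highest p-value:
cols = list(range(x.shape[1]))                       # indices of the columns still in the model
while True:
    ols_model = sm.OLS(endog=y, exog=x[:, cols]).fit()
    worst = int(np.argmax(ols_model.pvalues))        # position of the highest p-value
    if ols_model.pvalues[worst] <= 0.05:
        break                                        # every remaining column is significant
    cols.pop(worst)                                  # drop the least significant column
print('Columns kept:', cols)
print(ols_model.summary())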
Train test Split
from sklearn.model_selection import train_test_split
xopt_train, xopt_test, y_train, y_test = train_test_split(x_opt, y, test_size=0.2, random_state=42)
Create Model
from sklearn.linear_model import LinearRegression
reg= LinearRegression()
reg.fit(xopt_train,y_train)
Prediction
yopt_pred= reg.predict(xopt_test)
print (yopt_pred)
Print Result
result = pd.concat([pd.DataFrame(yopt_pred),pd.DataFrame(y_test)], axis =1)
print (result)
Print Original Data Frame with Predicted Value
yopt_pre= pd.DataFrame(yopt_pred, columns =['Prediction'])
y_te = pd.DataFrame(y_test,columns= ['Actual'])
x_te = pd.DataFrame(x_test, columns=['California', 'Florida', 'New York', 'R&D Spend', 'Administration', 'Marketing Spend'])
result = pd.concat([x_te,y_te,yopt_pre], axis =1)
print (result)
Check Accuracy
r2_score(y_test, yopt_pred)
Prediction for All 50 records
yfull_pred= reg.predict(x_opt)
print (yfull_pred)
Accuracy
r2_score(y, yfull_pred)
Create the model with only the R&D Spend column
x_opt=x[:,3:4]
x_opt
Train Test Split
from sklearn.model_selection import train_test_split
xopt_train, xopt_test, y_train, y_test = train_test_split(x_opt, y, test_size=0.2, random_state=42)
Print Shape
print (xopt_train.shape)
Create Model with one column
from sklearn.linear_model import LinearRegression
freg= LinearRegression()
freg.fit(xopt_train,y_train)
Prediction and Check accuracy
yone_pred= freg.predict(x_opt)
r2_score(y, yone_pred)
Plot the result as a graph
import seaborn as sns
sns.regplot(x=yone_pred, y=y, scatter_kws={"color": "b"}, line_kws={"color": "r"}, ci=None)
Prediction for New Data Set
Load new Data Set
df_Predict=pd.read_csv('50_Startups_Predictions.csv')
df_Predict
Count Number of Records
df_Predict.count()
Create Array
x_Predict = df_Predict.values
print (x_Predict)
Label Encoding
Label_Predict = LabelEncoder()
x_Predict[:,3]= Label_Predict.fit_transform(x_Predict[:,3])
print (x_Predict)
One Hot Encoding
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
x_Predict = np.array(ct.fit_transform(x_Predict))
print (x_Predict)
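Refitting LabelEncoder and the ColumnTransformer on the prediction file only gives the same dummy-column layout if the new file happens to contain every state, because OneHotEncoder learns only the categories it sees. A safer variant fits the encoders on the training data and only calls transform on the new records; a minimal sketch reusing df1, Label and df_Predict from above (ct_train is a name introduced here for illustration):
x_train_raw = df1.iloc[:, :-1].values                   # original training features (State still as strings)
x_train_raw[:, 3] = Label.transform(x_train_raw[:, 3])  # same label mapping as the training section
ct_train = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
ct_train.fit(x_train_raw)                               # fit on training data only
x_new = df_Predict.values
x_new[:, 3] = Label.transform(x_new[:, 3])              # reuse the training label mapping
x_new = np.array(ct_train.transform(x_new))             # transform only; no refit on new data
print(pd.DataFrame(x_new))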
Print X Values
print (pd.DataFrame(x_Predict))
Generate Predicted Values
xone_Predict= x_Predict[:,3:4]
yone_Predict= freg.predict(xone_Predict)
print (yone_Predict)
Display the result as a Data Frame with X
yone_Predict= pd.DataFrame(yone_Predict, columns =['Prediction'])
x_Predict = pd.DataFrame(x_Predict, columns=['California', 'Florida', 'New York', 'R&D Spend', 'Administration', 'Marketing Spend'])
result = pd.concat([x_Predict,yone_Predict], axis =1)
print (result)
Display the result with Actual Input Data Set
result = pd.concat([df_Predict, yone_Predict], axis=1)   # yone_Predict is already a DataFrame from the previous step
print (result)