
Assignment 4 Simple Linear Regression

The document discusses building simple linear regression models to predict delivery time using sorting time and to predict salary hike using years of experience. It performs EDA, builds linear regression models, evaluates the models, and makes predictions for both tasks. For delivery time prediction, it imports data, visualizes distributions, renames columns, checks correlations, builds an OLS model, evaluates coefficients and metrics, and makes manual and automatic predictions. A similar process is followed for salary hike prediction using years of experience data.

Uploaded by

alka aswar

1) Delivery_time -> Predict delivery time using sorting time

2) Salary_hike -> Build a prediction model for Salary_hike

------------------------------------------------------------

Build a simple linear regression model by performing EDA, applying any necessary transformations, and selecting the best model, using R or Python.
Q.1) Delivery_time -> Predict delivery time using sorting time
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# import dataset
data=pd.read_csv('delivery_time.csv')
data

#EDA and Data Visualization
data.info()

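# sns.distplot is deprecated in recent seaborn; histplot with kde=True draws a histogram plus a density curve
sns.histplot(data['Delivery Time'], kde=True)
plt.show()
sns.histplot(data['Sorting Time'], kde=True)
plt.show()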
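The histograms give a visual impression of each distribution's shape; a numeric skewness value can help decide whether a transformation (log, square root) is worth trying. The sketch below computes the Fisher-Pearson coefficient with NumPy on small hypothetical inputs; the helper name `sample_skewness` is illustrative. pandas' `Series.skew()` applies a small-sample bias correction, so its values will differ slightly from this one.

```python
import numpy as np

def sample_skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (biased, no small-sample correction)."""
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # second central moment (population variance)
    m3 = np.mean(deviations ** 3)  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical values for illustration only, not the assignment's data
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(round(sample_skewness(symmetric), 6))  # 0.0
print(sample_skewness(right_skewed) > 0)     # True
```

Values far from zero (in either direction) suggest a transformation may straighten the relationship.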

# Renaming Columns
dataset=data.rename({'Delivery Time':'delivery_time', 'Sorting Time':'sorting_time'},axis=1)
dataset
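Renaming to snake_case keeps the formula strings passed to `smf.ols` simple, since column names containing spaces would otherwise need quoting inside the formula (for example patsy's `Q("Sorting Time")`). A minimal sketch of a hypothetical helper that derives such names for any list of columns:

```python
import re

def to_snake_case(name):
    """Hypothetical helper: turn a label like 'Sorting Time' into 'sorting_time'."""
    name = name.strip().lower()
    # replace every run of non-alphanumeric characters with a single underscore
    return re.sub(r"[^0-9a-z]+", "_", name).strip("_")

columns = ["Delivery Time", "Sorting Time"]
mapping = {col: to_snake_case(col) for col in columns}
print(mapping)  # {'Delivery Time': 'delivery_time', 'Sorting Time': 'sorting_time'}
```

With pandas, the resulting mapping could be passed as `data.rename(columns=mapping)`.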

#Correlation Analysis
dataset.corr()

sns.regplot(x=dataset['sorting_time'],y=dataset['delivery_time'])
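With a single predictor, the squared Pearson correlation between `sorting_time` and `delivery_time` equals the R-squared of the fitted least-squares line, so `dataset.corr()` already indicates how well the model can fit. A small NumPy sketch of the correlation formula, checked against `np.corrcoef` and against the R-squared of a straight-line fit on hypothetical toy data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations over the root of the squared-deviation sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Hypothetical toy data, not the delivery-time dataset
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
r = pearson_r(x, y)
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True

# R-squared of the least-squares line y = intercept + slope * x
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.isclose(r ** 2, r2))  # True
```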

#Model Building
model=smf.ols("delivery_time~sorting_time",data=dataset).fit()

model.summary()
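For one predictor, the OLS coefficients reported by `model.summary()` have a closed form: the slope is b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and the intercept b0 = ȳ - b1·x̄ makes the line pass through the point of means. A NumPy sketch of that formula, assuming plain arrays rather than a DataFrame; `fit_simple_ols` is an illustrative name:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    b1 = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)  # Sxy / Sxx
    b0 = y.mean() - b1 * x.mean()                        # line passes through (x̄, ȳ)
    return float(b0), float(b1)

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
print(fit_simple_ols(x, y))  # (1.0, 2.0)
```

On real data the result should agree with `model.params` up to rounding, and with `np.polyfit(x, y, 1)`, which returns the slope first.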

#Model Testing
# Finding Coefficient parameters
model.params

# Finding tvalues and pvalues
model.tvalues , model.pvalues
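Each t-value is a coefficient divided by its standard error. For the slope, the standard error is sqrt(s² / Sxx), where s² = SSres / (n - 2) is the residual variance and Sxx = Σ(x - x̄)². The sketch below recomputes the slope's t-value on hypothetical data. The matching two-sided p-value would come from a Student t distribution with n - 2 degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df=n - 2)`), which is left out to keep the example NumPy-only.

```python
import numpy as np

def slope_t_value(x, y):
    """t-statistic of the slope in y = b0 + b1 * x (residual df = n - 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = x - x.mean()
    sxx = np.sum(dx ** 2)
    b1 = np.sum(dx * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)
    s2 = np.sum(residuals ** 2) / (n - 2)  # residual variance estimate
    se_b1 = np.sqrt(s2 / sxx)              # standard error of the slope
    return float(b1 / se_b1)

# Hypothetical data: slope 0.6, standard error sqrt(0.08)
print(slope_t_value([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```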

# Finding Rsquared Values
model.rsquared , model.rsquared_adj
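R-squared compares the residual sum of squares with the total sum of squares, R² = 1 - SSres / SStot, and adjusted R-squared penalises it for the number of predictors p: adj R² = 1 - (1 - R²)(n - 1) / (n - p - 1). A NumPy sketch of both formulas, assuming the model includes an intercept as the OLS model here does; `r_squared` is an illustrative name:

```python
import numpy as np

def r_squared(y, y_hat, n_predictors=1):
    """Return (R^2, adjusted R^2) for a fitted model with an intercept and n_predictors slopes."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)     # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return float(r2), float(adj)

# Hypothetical observations and least-squares fitted values (line 2.2 + 0.6x at x = 1..5)
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(r_squared(y, y_hat))
```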

#Model Predictions
# Manual prediction for say sorting time 5
delivery_time = (6.582734) + (1.649020)*(5)
delivery_time
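The manual calculation is simply the fitted equation evaluated at one input. Wrapping it in a function with the same hardcoded coefficients makes it easy to cross-check against `model.predict` for several sorting times; in practice, reading `model.params['Intercept']` and `model.params['sorting_time']` avoids copying numbers by hand.

```python
# Coefficients as hardcoded in the manual prediction
INTERCEPT = 6.582734
SLOPE = 1.649020

def predict_delivery_time(sorting_time):
    """Fitted line: delivery_time = intercept + slope * sorting_time."""
    return INTERCEPT + SLOPE * sorting_time

for t in (5, 8):
    print(t, round(predict_delivery_time(t), 6))
# 5 14.827834
# 8 19.774894
```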

# Automatic Prediction for say sorting time 5, 8
new_data=pd.Series([5,8])
new_data
data_pred=pd.DataFrame(new_data,columns=['sorting_time'])
data_pred

model.predict(data_pred)
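The assignment asks for necessary transformations and selection of the best model, while the code above fits only the untransformed model. One hedged way to make that comparison is to refit the line with transformed predictors and compare R-squared, since the response stays the same. The sketch uses `np.polyfit` on hypothetical data generated from a log relationship; the function names are illustrative. Transforming the response (for example log(y)) changes its scale, so that model's R-squared is not directly comparable and is left out here. The log and square-root transforms require strictly positive x.

```python
import numpy as np

def r2_of_fit(x, y):
    """R^2 of a straight-line least-squares fit of y on x (x, y as arrays)."""
    slope, intercept = np.polyfit(x, y, 1)  # polyfit returns highest degree first
    y_hat = intercept + slope * x
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def compare_transformations(x, y):
    """Fit y against several transformations of x and return each R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    candidates = {
        "linear": x,
        "log(x)": np.log(x),    # needs x > 0
        "sqrt(x)": np.sqrt(x),  # needs x >= 0
    }
    return {name: float(r2_of_fit(tx, y)) for name, tx in candidates.items()}

# Hypothetical data that follows y = 1 + 3 * log(x) exactly
x = np.arange(1, 11)
y = 1 + 3 * np.log(x)
results = compare_transformations(x, y)
best = max(results, key=results.get)
print(best)  # log(x)
```

With the assignment's data this could be called as `compare_transformations(dataset['sorting_time'], dataset['delivery_time'])`, keeping in mind that R-squared alone does not check residual assumptions.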

Q.2) Salary_hike -> Build a prediction model for Salary_hike
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# import dataset
data=pd.read_csv('Salary_Data.csv')
data

#EDA and Data Visualization
data.info()

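# sns.distplot is deprecated in recent seaborn; histplot with kde=True draws a histogram plus a density curve
sns.histplot(data['YearsExperience'], kde=True)
plt.show()
sns.histplot(data['Salary'], kde=True)
plt.show()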

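# Renaming Columns (the renamed copy is only used for EDA; the model keeps the original column names)
dataset1=data.rename({'YearsExperience':'Experience in years'},axis=1)
dataset1

#Correlation Analysis
dataset1.corr()

sns.regplot(x=dataset1['Experience in years'],y=dataset1['Salary'])

#Model Building
# fit on `data` so the formula and the prediction DataFrame can use the original name YearsExperience
model=smf.ols("Salary~YearsExperience",data=data).fit()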

model.summary()

#Model Testing
# Finding Coefficient parameters
model.params

# Finding tvalues and pvalues
model.tvalues , model.pvalues

# Finding Rsquared Values
model.rsquared , model.rsquared_adj
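R-squared is unitless; the root mean squared error expresses the typical prediction error in the units of `Salary`, which can be easier to judge when comparing candidate models. A minimal NumPy sketch; with the fitted statsmodels model it could be called as `rmse(data['Salary'], model.fittedvalues)`.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error, in the same units as y."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Hypothetical values: one prediction off by 2, so RMSE = sqrt(4 / 3)
print(rmse([1, 2, 3], [1, 2, 5]))
```

This divides by n; `np.sqrt(model.mse_resid)` instead divides by the residual degrees of freedom (n - 2 here), so it will be slightly larger.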

#Model Predictions
# Manual prediction for say 3 Years Experience
Salary = (25792.200199) + (9449.962321)*(3)
Salary
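Because the model is linear, the slope is the predicted salary change per additional year of experience, so the predictions for 5 and 8 years differ by exactly 3 × slope regardless of the intercept. A quick check with the coefficients hardcoded in the manual prediction:

```python
# Coefficients as hardcoded in the manual prediction
INTERCEPT = 25792.200199
SLOPE = 9449.962321

def predict_salary(years_experience):
    """Fitted line: Salary = intercept + slope * years of experience."""
    return INTERCEPT + SLOPE * years_experience

print(round(predict_salary(3), 6))                      # 54142.087162
print(round(predict_salary(8) - predict_salary(5), 6))  # 28349.886963, i.e. 3 * SLOPE
```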

# Automatic Prediction for say 5 and 8 Years Experience
new_data=pd.Series([5,8])
new_data

data_pred=pd.DataFrame(new_data,columns=['YearsExperience'])
data_pred
model.predict(data_pred)
