assignment2

The document outlines a data analysis workflow using a dataset containing temperature and revenue information. It includes data visualization with a scatter plot, a linear regression model to predict revenue from temperature, and evaluation metrics such as Mean Squared Error and R-Squared. The scikit-learn model achieves an R-squared of about 0.974 on the held-out test set, and a statsmodels OLS fit without an intercept reports an uncentered R-squared of 0.997, both indicating a strong linear relationship between temperature and revenue.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the temperature/revenue dataset
df = pd.read_csv("Datasets/SalesData.csv")

df.head()

Temperature Revenue
0 24.566884 534.799028
1 26.005191 625.190122
2 27.790554 660.632289
3 20.595335 487.706960
4 11.503498 316.240194

sns.scatterplot(x = df["Temperature"], y = df["Revenue"])

<Axes: xlabel='Temperature', ylabel='Revenue'>

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df[["Temperature"]], df[["Revenue"]], test_size=0.2, random_state=2)
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()

lin_reg.fit(X_train, y_train)

LinearRegression()

print(f"Coefficient: {lin_reg.coef_}")
print(f"Intercept: {lin_reg.intercept_}")

Coefficient: [[21.38145125]]
Intercept: [46.72052514]
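As a quick sanity check, the fitted line can be evaluated by hand from the printed coefficient and intercept (roughly Revenue ≈ 21.38 × Temperature + 46.72). A minimal sketch, using a hypothetical temperature value of 25:

# Manual prediction from the fitted parameters (temp = 25 is a made-up example value)
temp = 25.0
manual_pred = lin_reg.coef_[0][0] * temp + lin_reg.intercept_[0]
print(f"Predicted revenue at temperature {temp}: {manual_pred:.2f}")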

y_pred = lin_reg.predict(X_test)

from sklearn.metrics import mean_squared_error

print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")

Mean Squared Error: 636.1533670417468
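Since the MSE is expressed in squared revenue units, it can help to also report the root mean squared error, which is on the same scale as Revenue. A small sketch, reusing the numpy import from the top of the notebook:

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Root Mean Squared Error: {rmse:.2f}")  # about 25.2 revenue units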

from sklearn.metrics import r2_score

print(f"R-Squared: {r2_score(y_test, y_pred)}")

R-Squared: 0.973546292060864
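The same value can be recovered directly from the definition R² = 1 − SS_res / SS_tot, which makes explicit that the score compares the model's squared errors against those of a mean-only baseline. A minimal sketch:

# R-squared from its definition: 1 - (residual sum of squares / total sum of squares)
y_true = np.asarray(y_test)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(f"R-Squared (manual): {1 - ss_res / ss_tot}")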

plt.scatter(X_train, y_train, color = "blue")
plt.plot(X_train, lin_reg.predict(X_train), color = "red")
plt.title("Revenue vs Temperature (Training Set)")
plt.xlabel("Temperature")
plt.ylabel("Revenue")
plt.show()

plt.scatter(X_test, y_test, color = "blue")
plt.plot(X_train, lin_reg.predict(X_train), color = "red")
plt.title("Revenue vs Temperature (Testing Set)")
plt.xlabel("Temperature")
plt.ylabel("Revenue")
plt.show()
import statsmodels.api as sm

X = df["Temperature"]
y = df[["Revenue"]]

model = sm.OLS(y, X).fit()


print(model.summary())

                                 OLS Regression Results
=======================================================================================
Dep. Variable:                Revenue   R-squared (uncentered):                   0.997
Model:                            OLS   Adj. R-squared (uncentered):              0.997
Method:                 Least Squares   F-statistic:                          1.756e+05
Date:                Thu, 13 Feb 2025   Prob (F-statistic):                        0.00
Time:                        21:13:20   Log-Likelihood:                         -2398.1
No. Observations:                 500   AIC:                                      4798.
Df Residuals:                     499   BIC:                                      4802.
Df Model:                           1
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Temperature   23.2244      0.055    419.007      0.000      23.116      23.333
==============================================================================
Omnibus:                        3.228   Durbin-Watson:                   2.022
Prob(Omnibus):                  0.199   Jarque-Bera (JB):                3.426
Skew:                           0.080   Prob(JB):                        0.180
Kurtosis:                       3.372   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
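As note [1] explains, the 0.997 figure is an uncentered R² because the model was fitted without an intercept, so it is not directly comparable to the scikit-learn score above. A minimal sketch of the same fit with an intercept added via sm.add_constant, which would report the ordinary (centered) R-squared:

X_const = sm.add_constant(df["Temperature"])   # adds an intercept column
model_const = sm.OLS(df["Revenue"], X_const).fit()
print(model_const.summary())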
