Python_Codes_Regression - Jupyter Notebook

Simple Regression
In [1]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import warnings
        warnings.filterwarnings('ignore')
        import statsmodels.formula.api as smf
        import statsmodels.api as sm

In [2]: data = pd.DataFrame({'RDE': [2,3,5,4,11,5], 'AP': [20,25,34,30,40,31]})
        data.plot('RDE', 'AP', kind='scatter')
        plt.title("Annual Profit against R&D Expenditure")
        plt.xlabel("R&D Expenditure (Millions)")
        plt.ylabel("Annual Profit (Millions)")
Out[2]: Text(0, 0.5, 'Annual Profit (Millions)')


In [3]: df = pd.DataFrame({'RDE': [2,3,5,4,11,5], 'AP': [20,25,34,30,40,31]})
        df.plot('RDE', 'AP', kind='scatter')
        lm = smf.ols("AP ~ RDE", data=df).fit()
        xmin = df.RDE.min()
        xmax = df.RDE.max()
        X = np.linspace(xmin, xmax, 100)
        # params[0] is the intercept (w₀)
        # params[1] is the slope (w₁)
        Y = lm.params[0] + lm.params[1] * X
        plt.plot(X, Y, color="darkgreen")
        plt.xlabel("R&D Expenditure (Millions)")
        plt.ylabel("Annual Profit (Millions)")
Out[3]: Text(0, 0.5, 'Annual Profit (Millions)')
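The fitted line can equally be drawn by letting the model evaluate its own predictions, which avoids indexing params by position. A minimal sketch, assuming the df, lm, xmin and xmax defined above:

    # Equivalent way to draw the fitted line: evaluate lm.predict on a grid
    X_line = pd.DataFrame({'RDE': np.linspace(xmin, xmax, 100)})
    plt.plot(X_line['RDE'], lm.predict(X_line), color="darkgreen")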

In [4]: df = pd.DataFrame({'RDE': [2,3,5,4,11,5,10,8], 'AP': [20,25,34,30,40,31,39,37]})
        # create and fit the linear model
        lm = smf.ols(formula='AP ~ RDE', data=df).fit()
        print(lm.params)

Intercept 20.157895
RDE 1.973684
dtype: float64
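These coefficients can be checked against the closed-form least-squares formulas. A minimal sketch, using the same df as in In [4]:

    # slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
    x, y = df['RDE'], df['AP']
    slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    print(intercept, slope)   # 20.157895, 1.973684, matching lm.params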

In [5]: # use the fitted model for prediction
        lm.predict({'RDE': 10})
        # Expected Annual Profit (Millions) for R&D Expenditure of 10 (Millions)

Out[5]: 0 39.894737
dtype: float64
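The same number falls out of the fitted equation ŷ = w₀ + w₁·x by hand; a quick check against lm.predict:

    # 20.157895 + 1.973684 * 10 = 39.894737
    y_hat = lm.params['Intercept'] + lm.params['RDE'] * 10
    print(y_hat)   # 39.894737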


In [6]: df_rd = pd.read_excel("R&D_Profit.xlsx")
        df_rd

Out[6]:
   R&D Expenditure (Millions)  Annual Profit (Millions)
0                           2                        20
1                           3                        25
2                           5                        34
3                           4                        30
4                          11                        40
5                           5                        31

In [7]: X = df_rd['R&D Expenditure (Millions)']
        y = df_rd['Annual Profit (Millions)']
        # Add a constant to the X variable for the intercept term
        X = sm.add_constant(X)
        # Fit the model
        model = sm.OLS(y, X).fit()
        # Print model summary
        print(model.summary())

                             OLS Regression Results
==============================================================================
Dep. Variable:     Annual Profit (Millions)   R-squared:                 0.826
Model:                                  OLS   Adj. R-squared:            0.783
Method:                       Least Squares   F-statistic:               19.05
Date:                      Wed, 23 Oct 2024   Prob (F-statistic):       0.0120
Time:                              18:18:33   Log-Likelihood:          -14.351
No. Observations:                         6   AIC:                       32.70
Df Residuals:                             4   BIC:                       32.29
Df Model:                                 1
Covariance Type:                  nonrobust
==============================================================================
                                coef   std err         t     P>|t|    [0.025    0.975]
--------------------------------------------------------------------------------------
const                        20.0000     2.646     7.559     0.002    12.654    27.346
R&D Expenditure (Millions)    2.0000     0.458     4.364     0.012     0.728     3.272
==============================================================================
Omnibus:                  nan   Durbin-Watson:             1.500
Prob(Omnibus):            nan   Jarque-Bera (JB):          0.327
Skew:                  -0.000   Prob(JB):                  0.849
Kurtosis:               1.857   Cond. No.                   11.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


C:\Users\91941\anaconda3\lib\site-packages\statsmodels\stats\stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 6 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "

In [8]: lm = smf.ols(formula='AP ~ RDE', data=df).fit()
        lm.summary()

Out[8]:
                        OLS Regression Results
==============================================================================
Dep. Variable:                 AP   R-squared:                  0.871
Model:                        OLS   Adj. R-squared:             0.849
Method:             Least Squares   F-statistic:                40.42
Date:            Wed, 23 Oct 2024   Prob (F-statistic):      0.000710
Time:                    18:48:59   Log-Likelihood:           -18.166
No. Observations:               8   AIC:                        40.33
Df Residuals:                   6   BIC:                        40.49
Df Model:                       1
Covariance Type:        nonrobust
==============================================================================
               coef   std err        t    P>|t|   [0.025   0.975]
------------------------------------------------------------------
Intercept   20.1579     2.094    9.626    0.000   15.034   25.282
RDE          1.9737     0.310    6.358    0.001    1.214    2.733
==============================================================================
Omnibus:          0.039   Durbin-Watson:      1.564
Prob(Omnibus):    0.980   Jarque-Bera (JB):   0.151
Skew:            -0.053   Prob(JB):           0.927
Kurtosis:         2.336   Cond. No.            15.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
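Rather than reading values off the printed summary, each statistic is available as an attribute of the fitted results object. A small sketch, using the lm fitted in In [8]:

    print(lm.rsquared)     # 0.871
    print(lm.params)       # Intercept 20.1579, RDE 1.9737
    print(lm.pvalues)      # p-values for each coefficient
    print(lm.conf_int())   # 95% confidence intervals, as in the summary table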

In [9]: data = pd.read_excel("Store_Data.xlsx")
        data.head()

Out[9]:
   Bars  Price  Promotion
0  4141     59        200
1  3842     59        200
2  3056     59        200
3  3519     59        200
4  4226     59        400


In [10]: data.describe()

Out[10]:
              Bars      Price   Promotion
count    34.000000  34.000000   34.000000
mean   3098.676471  77.823529  388.235294
std    1256.422018  16.286210  162.862102
min     675.000000  59.000000  200.000000
25%    2125.250000  59.000000  200.000000
50%    3430.500000  79.000000  400.000000
75%    3968.750000  99.000000  600.000000
max    5120.000000  99.000000  600.000000

In [11]: lm = smf.ols(formula='Bars ~ Price + Promotion', data=data).fit()
         print(lm.params)

Intercept 5837.520759
Price -53.217336
Promotion 3.613058
dtype: float64
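For a multiple regression, these coefficients solve the normal equations w = (XᵀX)⁻¹Xᵀy. A minimal sketch that recovers them from the design matrix statsmodels stores on the fitted model:

    # lm.model.exog holds the design matrix [1, Price, Promotion];
    # lm.model.endog holds the response (Bars)
    w, *_ = np.linalg.lstsq(lm.model.exog, lm.model.endog, rcond=None)
    print(w)   # [5837.52, -53.22, 3.61], matching lm.params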


In [12]: lm.summary()

Out[12]:
                        OLS Regression Results
==============================================================================
Dep. Variable:               Bars   R-squared:                  0.758
Model:                        OLS   Adj. R-squared:             0.742
Method:             Least Squares   F-statistic:                48.48
Date:            Wed, 23 Oct 2024   Prob (F-statistic):      2.86e-10
Time:                    18:52:31   Log-Likelihood:           -266.26
No. Observations:              34   AIC:                        538.5
Df Residuals:                  31   BIC:                        543.1
Df Model:                       2
Covariance Type:        nonrobust
==============================================================================
                 coef   std err        t    P>|t|     [0.025     0.975]
------------------------------------------------------------------------
Intercept   5837.5208   628.150    9.293    0.000   4556.400   7118.642
Price        -53.2173     6.852   -7.766    0.000    -67.193    -39.242
Promotion      3.6131     0.685    5.273    0.000      2.216      5.011
==============================================================================
Omnibus:          1.418   Durbin-Watson:      2.282
Prob(Omnibus):    0.492   Jarque-Bera (JB):   0.486
Skew:            -0.034   Prob(JB):           0.784
Kurtosis:         3.582   Cond. No.        2.45e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.45e+03. This might indicate that there are strong multicollinearity or other numerical problems.

In [13]: # Predicted average/mean sales for a price of 79 cents and promotional expenditure of 400
         lm.predict({'Price': 79, 'Promotion': 400})

Out[13]: 0 3078.574405
dtype: float64
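This is just the fitted plane evaluated at the given inputs; a quick hand check against lm.predict:

    # 5837.520759 - 53.217336 * 79 + 3.613058 * 400 = 3078.574...
    b = lm.params
    print(b['Intercept'] + b['Price'] * 79 + b['Promotion'] * 400)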
