
Multiple Regression

March 14, 2024

1 Multiple regression
• What if more than one variable influences the thing you are interested in?
• Example: predicting the price of a car based on its various attributes.
• If there are also multiple dependent variables - multiple things being predicted - that is multivariate regression (a minimal sketch follows below).
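Multiple regression (many predictors, one target) is what the rest of this notebook demonstrates; the multivariate case can be handled in much the same way. A minimal sketch with made-up numbers, assuming scikit-learn's LinearRegression (not part of the original notebook):

# Illustrative sketch only - the feature values and the second target
# ("resale value") are invented for the example.
import numpy as np
from sklearn.linear_model import LinearRegression

# Two predictors per car: mileage and number of cylinders
X = np.array([[20000, 4], [40000, 4], [15000, 6], [60000, 8]])
# Two targets predicted at once: price and resale value
Y = np.array([[21000, 15000], [18000, 12000], [26000, 19000], [30000, 21000]])

model = LinearRegression().fit(X, Y)
print(model.coef_)               # one row of coefficients per target
print(model.predict([[30000, 6]]))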

1.0.1 Still uses least squares


• The only difference is that you now get a separate coefficient for each factor.
• These coefficients indicate how important each factor really is, provided the data are normalized.
• You can drop variables that have no influence.
• You can still measure the fit with r-squared.
• You need to assume that the different factors are not dependent on each other (a quick check is sketched after this list).
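That last assumption can be checked before fitting. A minimal sketch (not part of the original notebook) using variance inflation factors from statsmodels on the same cars.xls data used below:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_excel('cars.xls')
X = sm.add_constant(df[['Mileage', 'Cylinder', 'Doors']])

# A VIF near 1 means a factor is nearly independent of the others;
# values above roughly 5-10 suggest problematic collinearity.
for i, name in enumerate(X.columns):
    if name != 'const':
        print(name, variance_inflation_factor(X.values, i))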

1.1 Practice
[2]: import pandas as pd

df = pd.read_excel('cars.xls')

[3]: %matplotlib inline

import numpy as np

# Bucket Mileage into 10,000-mile bins and plot the mean Price per bin
df1 = df[['Mileage','Price']]
bins = np.arange(0,50000,10000)
groups = df1.groupby(pd.cut(df1['Mileage'],bins)).mean()
print(groups.head())
groups['Price'].plot.line()

Mileage Price
Mileage
(0, 10000] 5588.629630 24096.714451
(10000, 20000] 15898.496183 21955.979607
(20000, 30000] 24114.407104 20278.606252
(30000, 40000] 33610.338710 19463.670267
/tmp/ipykernel_12254/679127490.py:5: FutureWarning: The default of
observed=False is deprecated and will be changed to True in a future version of
pandas. Pass observed=False to retain current behavior or observed=True to adopt
the future default and silence this warning.
  groups = df1.groupby(pd.cut(df1['Mileage'],bins)).mean()

[3]: <Axes: xlabel='Mileage'>

[4]: import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler

# Standardize the features so their coefficients are comparable in magnitude
scale = StandardScaler()

X = df[['Mileage', 'Cylinder', 'Doors']]
y = df['Price']

X[['Mileage', 'Cylinder', 'Doors']] = scale.fit_transform(X[['Mileage', 'Cylinder', 'Doors']].values)

# Add the intercept column and fit an ordinary least squares model
X = sm.add_constant(X)
print(X)

est = sm.OLS(y, X).fit()
print(est.summary())

const Mileage Cylinder Doors
0 1.0 -1.417485 0.52741 0.556279
1 1.0 -1.305902 0.52741 0.556279
2 1.0 -0.810128 0.52741 0.556279
3 1.0 -0.426058 0.52741 0.556279
4 1.0 0.000008 0.52741 0.556279
.. … … … …
799 1.0 -0.439853 0.52741 0.556279
800 1.0 -0.089966 0.52741 0.556279
801 1.0 0.079605 0.52741 0.556279
802 1.0 0.750446 0.52741 0.556279
803 1.0 1.932565 0.52741 0.556279

[804 rows x 4 columns]


OLS Regression Results
==============================================================================
Dep. Variable: Price R-squared: 0.360
Model: OLS Adj. R-squared: 0.358
Method: Least Squares F-statistic: 150.0
Date: Thu, 14 Mar 2024 Prob (F-statistic): 3.95e-77
Time: 16:21:34 Log-Likelihood: -8356.7
No. Observations: 804 AIC: 1.672e+04
Df Residuals: 800 BIC: 1.674e+04
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.134e+04 279.405 76.388 0.000 2.08e+04 2.19e+04
Mileage -1272.3412 279.567 -4.551 0.000 -1821.112 -723.571
Cylinder 5587.4472 279.527 19.989 0.000 5038.754 6136.140
Doors -1404.5513 279.446 -5.026 0.000 -1953.085 -856.018
==============================================================================
Omnibus: 157.913 Durbin-Watson: 0.069
Prob(Omnibus): 0.000 Jarque-Bera (JB): 257.529
Skew: 1.278 Prob(JB): 1.20e-56
Kurtosis: 4.074 Cond. No. 1.03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
/tmp/ipykernel_12254/1575598944.py:8: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X[['Mileage', 'Cylinder', 'Doors']] = scale.fit_transform(X[['Mileage', 'Cylinder', 'Doors']].values)
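The warning is harmless here, but a small adjustment (not in the original notebook) avoids it by scaling an explicit copy of the slice instead of a view of df:

# Operate on an explicit copy so the scaled values are written into a new
# DataFrame rather than into a view of df.
X = df[['Mileage', 'Cylinder', 'Doors']].copy()
X[['Mileage', 'Cylinder', 'Doors']] = scale.fit_transform(X.values)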

[5]: y.groupby(df.Doors).mean()

[5]: Doors
2 23807.135520
4 20580.670749
Name: Price, dtype: float64

[11]: # Predict the price of a car with 45,000 miles, 8 cylinders and 4 doors:
# scale the raw features, prepend a 1 for the intercept, then predict
scaled = scale.transform([[45000, 8, 4]])
scaled = np.insert(scaled[0], 0, 1)
print(scaled)
predicted = est.predict(scaled)
print(predicted)

[1. 3.07256589 1.96971667 0.55627894]
[27658.15707316]
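The same prediction can also be made while keeping the column labels, which makes the intercept and feature order explicit. A sketch assuming the scale and est objects fitted above:

# Not part of the original notebook: build the new observation as a DataFrame
# so its columns line up with the design matrix used to fit the model.
new_car = pd.DataFrame([[45000, 8, 4]], columns=['Mileage', 'Cylinder', 'Doors'])
new_car[['Mileage', 'Cylinder', 'Doors']] = scale.transform(new_car.values)
new_car = sm.add_constant(new_car, has_constant='add')  # force the intercept column
print(est.predict(new_car))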
