Intermediate Regression With Statsmodels in Python

Linear regression and logistic regression are the two most widely used statistical models and act like master keys, unlocking the secrets hidden in datasets. In this course, you’ll build on the skills you gained in "Introduction to Regression in Python with statsmodels", as you learn about linear and logistic regression with multiple explanatory variables.


Parallel slopes linear regression

Maarten Van den Broeck, Content Developer at DataCamp
The previous course
This course assumes knowledge from Introduction to Regression with statsmodels in Python

From simple regression to multiple regression
Multiple regression is a regression model with more than one explanatory variable.

More explanatory variables can give more insight and better predictions.
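This claim can be sketched with plain NumPy least squares: on a hypothetical toy dataset (not the course's fish data), adding a second explanatory variable never worsens the in-sample fit.

```python
import numpy as np

# Hypothetical toy data: y depends on both x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 3.0 * x1 + 2.0 * x2

def sum_sq_resid(X, y):
    """Sum of squared residuals of the least squares fit of y on X."""
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coeffs
    return float(resid @ resid)

# Simple regression: intercept + x1 only.
X_simple = np.column_stack([np.ones_like(x1), x1])
# Multiple regression: intercept + x1 + x2.
X_multiple = np.column_stack([np.ones_like(x1), x1, x2])

rss_simple = sum_sq_resid(X_simple, y)
rss_multiple = sum_sq_resid(X_multiple, y)
# Adding x2 cannot worsen the in-sample fit; here it captures y exactly.
```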

The course contents
Chapter 1: "Parallel slopes" regression

Chapter 2: Interactions; Simpson's Paradox

Chapter 3: More explanatory variables; How linear regression works

Chapter 4: Multiple logistic regression; The logistic distribution; How logistic regression works

The fish dataset
mass_g  length_cm  species
242.0   23.2       Bream
5.9     7.5        Perch
200.0   30.0       Pike
40.0    12.9       Roach

Each row represents a fish. mass_g is the response variable. There is 1 numeric and 1 categorical explanatory variable.
One explanatory variable at a time
from statsmodels.formula.api import ols

mdl_mass_vs_length = ols("mass_g ~ length_cm",
                         data=fish).fit()
print(mdl_mass_vs_length.params)

Intercept   -536.223947
length_cm     34.899245
dtype: float64

1 intercept coefficient, 1 slope coefficient

mdl_mass_vs_species = ols("mass_g ~ species + 0",
                          data=fish).fit()
print(mdl_mass_vs_species.params)

species[Bream]    617.828571
species[Perch]    382.239286
species[Pike]     718.705882
species[Roach]    152.050000
dtype: float64

1 intercept coefficient for each category

Both variables at the same time
mdl_mass_vs_both = ols("mass_g ~ length_cm + species + 0",
data=fish).fit()

print(mdl_mass_vs_both.params)

species[Bream] -672.241866
species[Perch] -713.292859
species[Pike] -1089.456053
species[Roach] -726.777799
length_cm 42.568554
dtype: float64

1 slope coefficient

1 intercept coefficient for each category

Comparing coefficients
print(mdl_mass_vs_length.params)

Intercept   -536.223947
length_cm     34.899245

print(mdl_mass_vs_species.params)

species[Bream]    617.828571
species[Perch]    382.239286
species[Pike]     718.705882
species[Roach]    152.050000

print(mdl_mass_vs_both.params)

species[Bream]    -672.241866
species[Perch]    -713.292859
species[Pike]    -1089.456053
species[Roach]    -726.777799
length_cm           42.568554

Visualization: 1 numeric explanatory variable
import matplotlib.pyplot as plt
import seaborn as sns

sns.regplot(x="length_cm",
y="mass_g",
data=fish,
ci=None)

plt.show()

Visualization: 1 categorical explanatory variable
sns.boxplot(x="species",
y="mass_g",
data=fish,
showmeans=True)

Visualization: both explanatory variables
coeffs = mdl_mass_vs_both.params
print(coeffs)

species[Bream]    -672.241866
species[Perch]    -713.292859
species[Pike]    -1089.456053
species[Roach]    -726.777799
length_cm           42.568554

ic_bream, ic_perch, ic_pike, ic_roach, sl = coeffs

sns.scatterplot(x="length_cm",
                y="mass_g",
                hue="species",
                data=fish)

plt.axline(xy1=(0, ic_bream), slope=sl, color="blue")
plt.axline(xy1=(0, ic_perch), slope=sl, color="green")
plt.axline(xy1=(0, ic_pike), slope=sl, color="red")
plt.axline(xy1=(0, ic_roach), slope=sl, color="orange")

Let's practice!

Predicting parallel slopes
The prediction workflow
import pandas as pd
import numpy as np

expl_data_length = pd.DataFrame(
    {"length_cm": np.arange(5, 61, 5)})
print(expl_data_length)

    length_cm
0           5
1          10
2          15
3          20
4          25
5          30
6          35
7          40
8          45
9          50
10         55
11         60

The prediction workflow
[A, B, C] x [1, 2] ==> [A1, B1, C1, A2, B2, C2]

from itertools import product
product(["A", "B", "C"], [1, 2])

length_cm = np.arange(5, 61, 5)
species = fish["species"].unique()

p = product(length_cm, species)

expl_data_both = pd.DataFrame(p,
                              columns=['length_cm',
                                       'species'])
print(expl_data_both)

    length_cm species
0           5   Bream
1           5   Roach
2           5   Perch
3           5    Pike
4          10   Bream
5          10   Roach
6          10   Perch
...
41         55   Roach
42         55   Perch
43         55    Pike
44         60   Bream
45         60   Roach
46         60   Perch
47         60    Pike

The prediction workflow
Predict mass_g from length_cm only:

prediction_data_length = expl_data_length.assign(
    mass_g = mdl_mass_vs_length.predict(expl_data_length)
)

    length_cm     mass_g
0           5  -361.7277
1          10  -187.2315
2          15   -12.7353
3          20   161.7610
4          25   336.2572
5          30   510.7534
...  # number of rows: 12

Predict mass_g from both explanatory variables:

prediction_data_both = expl_data_both.assign(
    mass_g = mdl_mass_vs_both.predict(expl_data_both)
)

    length_cm species     mass_g
0           5   Bream  -459.3991
1           5   Roach  -513.9350
2           5   Perch  -500.4501
3           5    Pike  -876.6133
4          10   Bream  -246.5563
5          10   Roach  -301.0923
...  # number of rows: 48

Visualizing the predictions
plt.axline(xy1=(0, ic_bream), slope=sl, color="blue")
plt.axline(xy1=(0, ic_perch), slope=sl, color="green")
plt.axline(xy1=(0, ic_pike), slope=sl, color="red")
plt.axline(xy1=(0, ic_roach), slope=sl, color="orange")

sns.scatterplot(x="length_cm",
y="mass_g",
hue="species",
data=fish)

sns.scatterplot(x="length_cm",
y="mass_g",
color="black",
data=prediction_data)

Manually calculating predictions for linear regression
coeffs = mdl_mass_vs_length.params
print(coeffs)

Intercept   -536.223947
length_cm     34.899245

intercept, slope = coeffs

explanatory_data = pd.DataFrame(
    {"length_cm": np.arange(5, 61, 5)})

prediction_data = explanatory_data.assign(
    mass_g = intercept + slope * explanatory_data["length_cm"]
)

print(prediction_data)

    length_cm       mass_g
0           5  -361.727721
1          10  -187.231494
2          15   -12.735268
3          20   161.760959
4          25   336.257185
5          30   510.753412
...
9          50  1208.738318
10         55  1383.234545
11         60  1557.730771

Manually calculating predictions for multiple regression
coeffs = mdl_mass_vs_both.params
print(coeffs)

species[Bream] -672.241866
species[Perch] -713.292859
species[Pike] -1089.456053
species[Roach] -726.777799
length_cm 42.568554

ic_bream, ic_perch, ic_pike, ic_roach, slope = coeffs

np.select()
conditions = [
condition_1,
condition_2,
# ...
condition_n
]

choices = [list_of_choices] # same length as conditions

np.select(conditions, choices)
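A minimal runnable sketch, reusing rounded intercepts from the fitted model earlier in the chapter (the example species array is hypothetical):

```python
import numpy as np

species = np.array(["Bream", "Perch", "Pike", "Roach", "Perch"])

conditions = [
    species == "Bream",
    species == "Perch",
    species == "Pike",
    species == "Roach",
]
# Intercepts rounded from the fitted model earlier in the chapter.
choices = [-672.24, -713.29, -1089.46, -726.78]

# Each element takes the choice of the first condition that is True.
intercept = np.select(conditions, choices)
print(intercept)
```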

Choosing an intercept with np.select()
conditions = [
    explanatory_data["species"] == "Bream",
    explanatory_data["species"] == "Perch",
    explanatory_data["species"] == "Pike",
    explanatory_data["species"] == "Roach"
]

choices = [ic_bream, ic_perch, ic_pike, ic_roach]

intercept = np.select(conditions, choices)
print(intercept)

[ -672.24  -726.78  -713.29 -1089.46  -672.24  -726.78  -713.29 -1089.46
 ...
  -672.24  -726.78  -713.29 -1089.46]

The final prediction step
prediction_data = explanatory_data.assign(
    intercept = np.select(conditions, choices),
    mass_g = intercept + slope * explanatory_data["length_cm"])

print(prediction_data)

    length_cm species  intercept     mass_g
0           5   Bream  -672.2419  -459.3991
1           5   Roach  -726.7778  -513.9350
2           5   Perch  -713.2929  -500.4501
3           5    Pike -1089.4561  -876.6133
4          10   Bream  -672.2419  -246.5563
5          10   Roach  -726.7778  -301.0923
6          10   Perch  -713.2929  -287.6073
7          10    Pike -1089.4561  -663.7705
8          15   Bream  -672.2419   -33.7136
...
40         55   Bream  -672.2419  1669.0286
41         55   Roach  -726.7778  1614.4927
42         55   Perch  -713.2929  1627.9776
43         55    Pike -1089.4561  1251.8144
44         60   Bream  -672.2419  1881.8714
45         60   Roach  -726.7778  1827.3354
46         60   Perch  -713.2929  1840.8204
47         60    Pike -1089.4561  1464.6572

Compare to .predict()
mdl_mass_vs_both.predict(explanatory_data)

0    -459.3991
1    -513.9350
2    -500.4501
3    -876.6133
4    -246.5563
5    -301.0923
...
43 1251.8144
44 1881.8714
45 1827.3354
46 1840.8204
47 1464.6572

Let's practice!

Assessing model performance
Model performance metrics
Coefficient of determination (R-squared): how well the linear regression line fits the
observed values. Larger is better.

Residual standard error (RSE): the typical size of the residuals. Smaller is better.

Getting the coefficient of determination
print(mdl_mass_vs_length.rsquared)

0.8225689502644215

print(mdl_mass_vs_species.rsquared)

0.25814887709499157

print(mdl_mass_vs_both.rsquared)

0.9200433561156649

Adjusted coefficient of determination
More explanatory variables increase R².
Too many explanatory variables cause overfitting.

The adjusted coefficient of determination penalizes more explanatory variables:

R̄² = 1 − (1 − R²) × (n_obs − 1) / (n_obs − n_var − 1)

The penalty is noticeable when R² is small, or when n_var is a large fraction of n_obs.

In statsmodels, it's contained in the rsquared_adj attribute.
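The formula can be checked numerically against the printed values below (a sketch; the observation count of 128 is inferred from those numbers, not stated on the slide):

```python
def adjusted_r_squared(r_squared, n_obs, n_var):
    """Penalize R-squared by the number of explanatory variables."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_var - 1)

# Length-only model: 1 explanatory variable, assuming 128 observations.
rsq_adj = adjusted_r_squared(0.8225689502644215, 128, 1)
print(rsq_adj)
```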

Getting the adjusted coefficient of determination
print("rsq_length: ", mdl_mass_vs_length.rsquared)
print("rsq_adj_length: ", mdl_mass_vs_length.rsquared_adj)

rsq_length: 0.8225689502644215
rsq_adj_length: 0.8211607673300121

print("rsq_species: ", mdl_mass_vs_species.rsquared)


print("rsq_adj_species: ", mdl_mass_vs_species.rsquared_adj)

rsq_species: 0.25814887709499157
rsq_adj_species: 0.24020086605696722

print("rsq_both: ", mdl_mass_vs_both.rsquared)


print("rsq_adj_both: ", mdl_mass_vs_both.rsquared_adj)

rsq_both: 0.9200433561156649
rsq_adj_both: 0.9174431400543857

Getting the residual standard error
rse_length = np.sqrt(mdl_mass_vs_length.mse_resid)
print("rse_length: ", rse_length)

rse_length: 152.12092835414788

rse_species = np.sqrt(mdl_mass_vs_species.mse_resid)
print("rse_species: ", rse_species)

rse_species: 313.5501156682592

rse_both = np.sqrt(mdl_mass_vs_both.mse_resid)
print("rse_both: ", rse_both)

rse_both: 103.35563303966488

Let's practice!

Models for each category
Four categories
print(fish["species"].unique())

array(['Bream', 'Roach', 'Perch', 'Pike'], dtype=object)

Splitting the dataset
bream = fish[fish["species"] == "Bream"]
perch = fish[fish["species"] == "Perch"]
pike = fish[fish["species"] == "Pike"]
roach = fish[fish["species"] == "Roach"]

Four models
mdl_bream = ols("mass_g ~ length_cm", data=bream).fit()
print(mdl_bream.params)

Intercept   -1035.3476
length_cm      54.5500

mdl_perch = ols("mass_g ~ length_cm", data=perch).fit()
print(mdl_perch.params)

Intercept   -619.1751
length_cm     38.9115

mdl_pike = ols("mass_g ~ length_cm", data=pike).fit()
print(mdl_pike.params)

Intercept   -1540.8243
length_cm      53.1949

mdl_roach = ols("mass_g ~ length_cm", data=roach).fit()
print(mdl_roach.params)

Intercept   -329.3762
length_cm     23.3193

Explanatory data
explanatory_data = pd.DataFrame(
    {"length_cm": np.arange(5, 61, 5)})
print(explanatory_data)

    length_cm
0           5
1          10
2          15
3          20
4          25
5          30
6          35
7          40
8          45
9          50
10         55
11         60

Making predictions
prediction_data_bream = explanatory_data.assign(
    mass_g = mdl_bream.predict(explanatory_data),
    species = "Bream")

prediction_data_perch = explanatory_data.assign(
    mass_g = mdl_perch.predict(explanatory_data),
    species = "Perch")

prediction_data_pike = explanatory_data.assign(
    mass_g = mdl_pike.predict(explanatory_data),
    species = "Pike")

prediction_data_roach = explanatory_data.assign(
    mass_g = mdl_roach.predict(explanatory_data),
    species = "Roach")

Concatenating predictions
prediction_data = pd.concat([prediction_data_bream,
                             prediction_data_roach,
                             prediction_data_perch,
                             prediction_data_pike])

    length_cm       mass_g species
0           5  -762.597660   Bream
1          10  -489.847756   Bream
2          15  -217.097851   Bream
3          20    55.652054   Bream
4          25   328.401958   Bream
5          30   601.151863   Bream
...
3          20  -476.926955    Pike
4          25  -210.952626    Pike
5          30    55.021703    Pike
6          35   320.996032    Pike
7          40   586.970362    Pike
8          45   852.944691    Pike
9          50  1118.919020    Pike
10         55  1384.893349    Pike
11         60  1650.867679    Pike

Visualizing predictions
sns.lmplot(x="length_cm",
y="mass_g",
data=fish,
hue="species",
ci=None)
plt.show()

Adding in your predictions
sns.lmplot(x="length_cm",
y="mass_g",
data=fish,
hue="species",
ci=None)

sns.scatterplot(x="length_cm",
                y="mass_g",
                data=prediction_data,
                hue="species",
                legend=False)

plt.show()

Coefficient of determination
mdl_fish = ols("mass_g ~ length_cm + species",
               data=fish).fit()
print(mdl_fish.rsquared_adj)

0.917

print(mdl_bream.rsquared_adj)

0.874

print(mdl_perch.rsquared_adj)

0.917

print(mdl_pike.rsquared_adj)

0.941

print(mdl_roach.rsquared_adj)

0.815

Residual standard error
print(np.sqrt(mdl_fish.mse_resid))

103

print(np.sqrt(mdl_bream.mse_resid))

74.2

print(np.sqrt(mdl_perch.mse_resid))

100

print(np.sqrt(mdl_pike.mse_resid))

120

print(np.sqrt(mdl_roach.mse_resid))

38.2

Let's practice!

One model with an interaction
What is an interaction?
In the fish dataset
Different fish species have different mass to length ratios.

The effect of length on the expected mass is different for different species.

More generally
The effect of one explanatory variable on the expected response changes depending on the
value of another explanatory variable.

Specifying interactions
No interactions
response ~ explntry1 + explntry2
mass_g ~ length_cm + species

With interactions (implicit)
response ~ explntry1 * explntry2
mass_g ~ length_cm * species

With interactions (explicit)
response ~ explntry1 + explntry2 + explntry1:explntry2
mass_g ~ length_cm + species + length_cm:species

Running the model
mdl_mass_vs_both = ols("mass_g ~ length_cm * species", data=fish).fit()

print(mdl_mass_vs_both.params)

Intercept -1035.3476
species[T.Perch] 416.1725
species[T.Pike] -505.4767
species[T.Roach] 705.9714
length_cm 54.5500
length_cm:species[T.Perch] -15.6385
length_cm:species[T.Pike] -1.3551
length_cm:species[T.Roach] -31.2307

Easier to understand coefficients
mdl_mass_vs_both_inter = ols("mass_g ~ species + species:length_cm + 0", data=fish).fit()

print(mdl_mass_vs_both_inter.params)

species[Bream] -1035.3476
species[Perch] -619.1751
species[Pike] -1540.8243
species[Roach] -329.3762
species[Bream]:length_cm 54.5500
species[Perch]:length_cm 38.9115
species[Pike]:length_cm 53.1949
species[Roach]:length_cm 23.3193

Familiar numbers
print(mdl_mass_vs_both_inter.params)

species[Bream]             -1035.3476
species[Perch]              -619.1751
species[Pike]              -1540.8243
species[Roach]              -329.3762
species[Bream]:length_cm      54.5500
species[Perch]:length_cm      38.9115
species[Pike]:length_cm       53.1949
species[Roach]:length_cm      23.3193

print(mdl_bream.params)

Intercept   -1035.3476
length_cm      54.5500

Let's practice!

Making predictions with interactions
The model with the interaction
mdl_mass_vs_both_inter = ols("mass_g ~ species + species:length_cm + 0",
data=fish).fit()

print(mdl_mass_vs_both_inter.params)

species[Bream] -1035.3476
species[Perch] -619.1751
species[Pike] -1540.8243
species[Roach] -329.3762
species[Bream]:length_cm 54.5500
species[Perch]:length_cm 38.9115
species[Pike]:length_cm 53.1949
species[Roach]:length_cm 23.3193

The prediction flow
from itertools import product

length_cm = np.arange(5, 61, 5)
species = fish["species"].unique()

p = product(length_cm, species)

explanatory_data = pd.DataFrame(p,
                                columns=["length_cm",
                                         "species"])

prediction_data = explanatory_data.assign(
    mass_g = mdl_mass_vs_both_inter.predict(explanatory_data))

print(prediction_data)

    length_cm species     mass_g
0           5   Bream  -762.5977
1           5   Roach  -212.7799
2           5   Perch  -424.6178
3           5    Pike -1274.8499
4          10   Bream  -489.8478
5          10   Roach   -96.1836
6          10   Perch  -230.0604
7          10    Pike -1008.8756
8          15   Bream  -217.0979
...
40         55   Bream  1964.9014
41         55   Roach   953.1833
42         55   Perch  1520.9556
43         55    Pike  1384.8933
44         60   Bream  2237.6513
45         60   Roach  1069.7796
46         60   Perch  1715.5129
47         60    Pike  1650.8677

Visualizing the predictions
sns.lmplot(x="length_cm",
y="mass_g",
data=fish,
hue="species",
ci=None)

sns.scatterplot(x="length_cm",
y="mass_g",
data=prediction_data,
hue="species")

plt.show()

Manually calculating the predictions
coeffs = mdl_mass_vs_both_inter.params
print(coeffs)

species[Bream]             -1035.3476
species[Perch]              -619.1751
species[Pike]              -1540.8243
species[Roach]              -329.3762
species[Bream]:length_cm      54.5500
species[Perch]:length_cm      38.9115
species[Pike]:length_cm       53.1949
species[Roach]:length_cm      23.3193

(ic_bream, ic_perch, ic_pike, ic_roach,
 slope_bream, slope_perch, slope_pike, slope_roach) = coeffs

Manually calculating the predictions
conditions = [
explanatory_data["species"] == "Bream",
explanatory_data["species"] == "Perch",
explanatory_data["species"] == "Pike",
explanatory_data["species"] == "Roach"
]

ic_choices = [ic_bream, ic_perch, ic_pike, ic_roach]


intercept = np.select(conditions, ic_choices)

slope_choices = [slope_bream, slope_perch, slope_pike, slope_roach]


slope = np.select(conditions, slope_choices)

Manually calculating the predictions
prediction_data = explanatory_data.assign(
    mass_g = intercept + slope * explanatory_data["length_cm"])

print(prediction_data)

    length_cm species     mass_g
0           5   Bream  -762.5977
1           5   Roach  -212.7799
2           5   Perch  -424.6178
3           5    Pike -1274.8499
4          10   Bream  -489.8478
5          10   Roach   -96.1836
...
43         55    Pike  1384.8933
44         60   Bream  2237.6513
45         60   Roach  1069.7796
46         60   Perch  1715.5129
47         60    Pike  1650.8677

This matches the .predict() results:

prediction_data = explanatory_data.assign(
    mass_g = mdl_mass_vs_both_inter.predict(explanatory_data))

Let's practice!

Simpson's Paradox
A most ingenious paradox!
Simpson's Paradox occurs when the trend of a model on the whole dataset is very different
from the trends shown by models on subsets of the dataset.

trend = slope coefficient

Synthetic Simpson data
       x        y  group
62.24344 70.60840      D
52.33499 14.70577      B
56.36795 46.39554      C
66.80395 66.17487      D
66.53605 89.24658      E
62.38129 91.45260      E

5 groups of data, labeled "A" to "E"

https://www.rdocumentation.org/packages/datasauRus/topics/simpsons_paradox

Linear regressions
Whole dataset:

mdl_whole = ols("y ~ x",
                data=simpsons_paradox).fit()
print(mdl_whole.params)

Intercept   -38.554
x             1.751

By group:

mdl_by_group = ols("y ~ group + group:x + 0",
                   data=simpsons_paradox).fit()
print(mdl_by_group.params)

 groupA    groupB    groupC    groupD    groupE
32.5051   67.3886   99.6333  132.3932  123.8242
groupA:x  groupB:x  groupC:x  groupD:x  groupE:x
-0.6266   -1.0105   -0.9940   -0.9908   -0.5364

Plotting the whole dataset
sns.regplot(x="x",
y="y",
data=simpsons_paradox,
ci=None)

Plotting by group
sns.lmplot(x="x",
y="y",
data=simpsons_paradox,
hue="group",
ci=None)

Reconciling the difference
Good advice
If possible, try to plot the dataset.

Common advice
You can't choose the best model in general – it depends on the dataset and the question you
are trying to answer.

More good advice


Articulate a question before you start modeling.

Test score example

Infectious disease example

Reconciling the difference
Usually (but not always) the grouped model contains more insight.

Are you missing explanatory variables?

Context is important.

Simpson's paradox in real datasets
The paradox is usually less obvious.

You may see a zero slope rather than a complete change in direction.

It may not appear in every group.

Let's practice!

Two numeric explanatory variables
Visualizing three numeric variables
3D scatter plot

2D scatter plot with response as color

Another column for the fish dataset
species mass_g length_cm height_cm
Bream 1000 33.5 18.96
Bream 925 36.2 18.75
Roach 290 24.0 8.88
Roach 390 29.5 9.48
Perch 1100 39.0 12.80
Perch 1000 40.2 12.60
Pike 1250 52.0 10.69
Pike 1650 59.0 10.81

3D scatter plot

2D scatter plot, color for response
sns.scatterplot(x="length_cm",
y="height_cm",
data=fish,
hue="mass_g")

Modeling with two numeric explanatory variables
mdl_mass_vs_both = ols("mass_g ~ length_cm + height_cm",
data=fish).fit()

print(mdl_mass_vs_both.params)

Intercept -622.150234
length_cm 28.968405
height_cm 26.334804

The prediction flow
from itertools import product

length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)

p = product(length_cm, height_cm)

explanatory_data = pd.DataFrame(p,
                                columns=["length_cm",
                                         "height_cm"])

prediction_data = explanatory_data.assign(
    mass_g = mdl_mass_vs_both.predict(explanatory_data))

print(prediction_data)

     length_cm  height_cm       mass_g
0            5          2  -424.638603
1            5          4  -371.968995
2            5          6  -319.299387
3            5          8  -266.629780
4            5         10  -213.960172
..         ...        ...          ...
115         60         12  1431.971694
116         60         14  1484.641302
117         60         16  1537.310909
118         60         18  1589.980517
119         60         20  1642.650125

[120 rows x 3 columns]

Plotting the predictions
sns.scatterplot(x="length_cm",
y="height_cm",
data=fish,
hue="mass_g")

sns.scatterplot(x="length_cm",
y="height_cm",
data=prediction_data,
hue="mass_g",
legend=False,
marker="s")

plt.show()

Including an interaction
mdl_mass_vs_both_inter = ols("mass_g ~ length_cm * height_cm",
data=fish).fit()

print(mdl_mass_vs_both_inter.params)

Intercept 159.107480
length_cm 0.301426
height_cm -78.125178
length_cm:height_cm 3.545435

The prediction flow with an interaction
length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)

p = product(length_cm, height_cm)

explanatory_data = pd.DataFrame(p,
columns=["length_cm",
"height_cm"])

prediction_data = explanatory_data.assign(
mass_g = mdl_mass_vs_both_inter.predict(explanatory_data))

Plotting the predictions
sns.scatterplot(x="length_cm",
y="height_cm",
data=fish,
hue="mass_g")

sns.scatterplot(x="length_cm",
y="height_cm",
data=prediction_data,
hue="mass_g",
legend=False,
marker="s")

plt.show()

Let's practice!

More than two explanatory variables
From last time
sns.scatterplot(x="length_cm",
y="height_cm",
data=fish,
hue="mass_g")

Faceting by species
grid = sns.FacetGrid(data=fish,
col="species",
hue="mass_g",
col_wrap=2,
palette="plasma")

grid.map(sns.scatterplot,
"length_cm",
"height_cm")

plt.show()

Faceting by species
It's possible to use more than one categorical variable for faceting.

Beware of faceting overuse.

Plotting becomes harder with an increasing number of variables.

Different levels of interaction
No interactions:

ols("mass_g ~ length_cm + height_cm + species + 0", data=fish).fit()

Two-way interactions between pairs of variables:

ols("mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + 0",
    data=fish).fit()

Three-way interaction between all three variables:

ols("mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + "
    "length_cm:height_cm:species + 0",
    data=fish).fit()

All the interactions
ols("mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + "
    "length_cm:height_cm:species + 0",
    data=fish).fit()

same as

ols("mass_g ~ length_cm * height_cm * species + 0",
    data=fish).fit()

Only two-way interactions
ols("mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + 0",
    data=fish).fit()

same as

ols("mass_g ~ (length_cm + height_cm + species) ** 2 + 0",
    data=fish).fit()

The prediction flow
mdl_mass_vs_all = ols(
    "mass_g ~ length_cm * height_cm * species + 0",
    data=fish).fit()

length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)
species = fish["species"].unique()

p = product(length_cm, height_cm, species)

explanatory_data = pd.DataFrame(p,
                                columns=["length_cm",
                                         "height_cm",
                                         "species"])

prediction_data = explanatory_data.assign(
    mass_g = mdl_mass_vs_all.predict(explanatory_data))

     length_cm  height_cm species       mass_g
0            5          2   Bream  -570.656437
1            5          2   Roach    31.449145
2            5          2   Perch    43.789984
3            5          2    Pike   271.270093
4            5          4   Bream  -451.127405
..         ...        ...     ...          ...
475         60         18    Pike  2690.346384
476         60         20   Bream  1531.618475
477         60         20   Roach  2621.797668
478         60         20   Perch  3041.931709
479         60         20    Pike  2926.352397

[480 rows x 4 columns]

Let's practice!

How linear regression works
The standard simple linear regression plot

Visualizing residuals

A metric for the best fit
The simplest idea (which doesn't work)
Take the sum of all the residuals.

Some residuals are negative.

The next simplest idea (which does work)


Take the square of each residual, and add up those squares.

This is called the sum of squares.
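With hypothetical numbers, the cancellation problem and the sum-of-squares fix look like this:

```python
# Hypothetical observed values and fitted values from a candidate line.
y_actual = [8.0, 9.0, 18.0, 19.0, 28.0]
y_fitted = [6.4, 11.4, 16.4, 21.4, 26.4]

residuals = [a - f for a, f in zip(y_actual, y_fitted)]

# Raw residuals cancel: positive and negative errors hide each other.
sum_resid = sum(residuals)

# Squaring first keeps every residual's contribution positive.
sum_sq = sum(r ** 2 for r in residuals)
```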

A detour into numerical optimization
A line plot of a quadratic equation

x = np.arange(-4, 5, 0.1)
y = x ** 2 - x + 10

xy_data = pd.DataFrame({"x": x,
"y": y})

sns.lineplot(x="x",
y="y",
data=xy_data)

Using calculus to solve the equation
y = x² − x + 10

dy/dx = 2x − 1

Setting the derivative to zero: 0 = 2x − 1, so x = 0.5.

y = 0.5² − 0.5 + 10 = 9.75

Not all equations can be solved like this.

You can let Python figure it out.

Don't worry if this doesn't make sense, you won't need it for the exercises.

minimize()
from scipy.optimize import minimize

def calc_quadratic(x):
    y = x ** 2 - x + 10
    return y

minimize(fun=calc_quadratic,
         x0=3)

      fun: 9.75
 hess_inv: array([[0.5]])
      jac: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 6
      nit: 2
     njev: 3
   status: 0
  success: True
        x: array([0.49999998])

A linear regression algorithm
Define a function to calculate the sum of squares metric.

def calc_sum_of_squares(coeffs):
    intercept, slope = coeffs
    # More calculation!

Call minimize() to find coefficients that minimize this function.

minimize(
    fun=calc_sum_of_squares,
    x0=[0, 0]
)
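Putting the pieces together, here is a minimal end-to-end sketch on made-up data generated from the line y = 2 + 3x (the names x_actual and y_actual are illustrative, not from the course datasets):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up, noise-free data from the line y = 2 + 3x
x_actual = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_actual = 2 + 3 * x_actual

def calc_sum_of_squares(coeffs):
    intercept, slope = coeffs
    y_pred = intercept + slope * x_actual
    # Sum of squared residuals: the metric minimize() will drive down
    return np.sum((y_pred - y_actual) ** 2)

result = minimize(fun=calc_sum_of_squares, x0=[0, 0])
print(result.x)  # approximately [2. 3.]
```

Because the data were generated from a known line, the optimizer recovers the intercept and slope that were used to create them.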



Let's practice!
Multiple logistic regression
Bank churn dataset
has_churned  time_since_first_purchase  time_since_last_purchase
          0                  0.3993247                -0.5158691
          1                 -0.4297957                 0.6780654
          0                  3.7383122                 0.4082544
          0                  0.6032289                -0.6990435
        ...                        ...                       ...
   response     length of relationship       recency of activity

1 https://www.rdocumentation.org/packages/bayesQR/topics/Churn



logit()
from statsmodels.formula.api import logit

logit("response ~ explanatory", data=dataset).fit()

logit("response ~ explanatory1 + explanatory2", data=dataset).fit()

logit("response ~ explanatory1 * explanatory2", data=dataset).fit()



The four outcomes
              predicted false   predicted true
actual false          correct   false positive
actual true    false negative          correct

conf_matrix = mdl_logit.pred_table()

print(conf_matrix)

[[102. 98.]
[ 53. 147.]]
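The counts above can be turned into summary metrics. This sketch uses the same confusion matrix with standard definitions of accuracy, sensitivity, and specificity (these metrics are not shown in the slides):

```python
import numpy as np

# Confusion matrix from pred_table(): rows = actual, columns = predicted
conf_matrix = np.array([[102., 98.],
                        [53., 147.]])

TN, FP = conf_matrix[0]  # actual false: correct, false positive
FN, TP = conf_matrix[1]  # actual true: false negative, correct

accuracy = (TN + TP) / conf_matrix.sum()
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)

print(accuracy)     # 0.6225
print(sensitivity)  # 0.735
print(specificity)  # 0.51
```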



Prediction flow
from itertools import product

explanatory1 = some_values
explanatory2 = some_values

p = product(explanatory1, explanatory2)
explanatory_data = pd.DataFrame(p,
columns=["explanatory1",
"explanatory2"])
prediction_data = explanatory_data.assign(
    has_churned = mdl_logit.predict(explanatory_data))
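To make the product() step concrete, here is a small sketch with hypothetical grids of values (explanatory1 and explanatory2 are placeholders, not course variables):

```python
import pandas as pd
from itertools import product

# Hypothetical grids of explanatory values
explanatory1 = [0, 1, 2]
explanatory2 = [-1, 1]

# product() yields every combination: 3 x 2 = 6 rows
p = product(explanatory1, explanatory2)
explanatory_data = pd.DataFrame(p, columns=["explanatory1",
                                            "explanatory2"])
print(explanatory_data.shape)  # (6, 2)
```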



Visualization
prediction_data["most_likely_outcome"] = np.round(prediction_data["has_churned"])

sns.scatterplot(...
data=churn,
hue="has_churned",
...)

sns.scatterplot(...
data=prediction_data,
hue="most_likely_outcome",
...)



Let's practice!
The logistic distribution
Gaussian probability density function (PDF)
from scipy.stats import norm

x = np.arange(-4, 4.05, 0.05)

gauss_dist = pd.DataFrame({
"x": x,
"gauss_pdf": norm.pdf(x)}
)

sns.lineplot(x="x",
y="gauss_pdf",
data=gauss_dist)



Gaussian cumulative distribution function (CDF)
x = np.arange(-4, 4.05, 0.05)

gauss_dist = pd.DataFrame({
"x": x,
"gauss_pdf": norm.pdf(x),
"gauss_cdf": norm.cdf(x)}
)

sns.lineplot(x="x",
y="gauss_cdf",
data=gauss_dist)





Gaussian inverse CDF
p = np.arange(0.001, 1, 0.001)

gauss_dist_inv = pd.DataFrame({
"p": p,
"gauss_inv_cdf": norm.ppf(p)}
)

sns.lineplot(x="p",
y="gauss_inv_cdf",
data=gauss_dist_inv)
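As a quick sanity check (not in the slides), the inverse CDF undoes the CDF:

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.5, 0.0, 2.0])

# ppf() is the inverse of cdf(), so the round trip returns x
round_trip = norm.ppf(norm.cdf(x))
print(np.allclose(round_trip, x))  # True
```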



Logistic PDF
from scipy.stats import logistic

x = np.arange(-4, 4.05, 0.05)

logistic_dist = pd.DataFrame({
"x": x,
"log_pdf": logistic.pdf(x)}
)

sns.lineplot(x="x",
y="log_pdf",
data=logistic_dist)



Logistic distribution
Logistic distribution CDF is also called the logistic function.

cdf(x) = 1 / (1 + exp(−x))

Logistic distribution inverse CDF is also called the logit function.

inverse_cdf(p) = log(p / (1 − p))
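Both formulas can be checked numerically against scipy.stats.logistic (a quick verification sketch, not part of the exercises):

```python
import numpy as np
from scipy.stats import logistic

x = np.array([-2.0, 0.0, 1.5])
p = np.array([0.1, 0.5, 0.9])

# The CDF matches the logistic function 1 / (1 + exp(-x))
cdf_matches = np.allclose(logistic.cdf(x), 1 / (1 + np.exp(-x)))

# The inverse CDF matches the logit function log(p / (1 - p))
inv_cdf_matches = np.allclose(logistic.ppf(p), np.log(p / (1 - p)))

print(cdf_matches, inv_cdf_matches)  # True True
```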



Let's practice!
How logistic regression works
Sum of squares doesn't work
np.sum((y_pred - y_actual) ** 2)

y_actual is always 0 or 1.

y_pred is between 0 and 1.

There is a better metric than sum of squares.



Likelihood
y_pred * y_actual

y_pred * y_actual + (1 - y_pred) * (1 - y_actual)

np.sum(y_pred * y_actual + (1 - y_pred) * (1 - y_actual))

When y_actual = 1

y_pred * 1 + (1 - y_pred) * (1 - 1) = y_pred

When y_actual = 0

y_pred * 0 + (1 - y_pred) * (1 - 0) = 1 - y_pred
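The two cases can be verified with a few made-up predictions (these values are illustrative, not from the churn dataset):

```python
import numpy as np

# Made-up predicted probabilities and actual outcomes
y_pred = np.array([0.2, 0.8, 0.6])
y_actual = np.array([0, 1, 1])

# Each term picks y_pred when y_actual is 1, and 1 - y_pred when it is 0
likelihoods = y_pred * y_actual + (1 - y_pred) * (1 - y_actual)
print(likelihoods)  # [0.8 0.8 0.6]
```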



Log-likelihood
Computing likelihood involves adding many very small numbers, leading to numerical error.

Log-likelihood is easier to compute.

log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)

Both equations give the same answer.
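"Same answer" here means the log of each likelihood term equals the corresponding log-likelihood term, which a quick check confirms (again with made-up values):

```python
import numpy as np

y_pred = np.array([0.2, 0.8, 0.6])
y_actual = np.array([0, 1, 1])

likelihood = y_pred * y_actual + (1 - y_pred) * (1 - y_actual)
log_likelihood = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)

# Taking the log of each likelihood term gives the log-likelihood terms
print(np.allclose(np.log(likelihood), log_likelihood))  # True
```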



Negative log-likelihood
Maximizing log-likelihood is the same as minimizing negative log-likelihood.

-np.sum(log_likelihoods)



Logistic regression algorithm
def calc_neg_log_likelihood(coeffs):
intercept, slope = coeffs
# More calculation!

from scipy.optimize import minimize

minimize(
fun=calc_neg_log_likelihood,
x0=[0, 0]
)
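Here is a minimal end-to-end sketch of this algorithm on made-up binary data (x_actual and y_actual are illustrative, not course data; the logistic function fills in the "More calculation!" step):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up binary data: the outcome is mostly 1 for larger x
x_actual = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y_actual = np.array([0, 0, 1, 0, 1, 1])

def calc_neg_log_likelihood(coeffs):
    intercept, slope = coeffs
    # The logistic function maps the linear predictor to a probability
    y_pred = 1 / (1 + np.exp(-(intercept + slope * x_actual)))
    log_likelihoods = np.log(y_pred) * y_actual + np.log(1 - y_pred) * (1 - y_actual)
    # Minimizing the negative sum maximizes the log-likelihood
    return -np.sum(log_likelihoods)

result = minimize(fun=calc_neg_log_likelihood, x0=[0, 0])
intercept, slope = result.x
print(slope > 0)  # True: the predicted probability rises with x
```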



Let's practice!
Congratulations!
You learned things
Chapter 1
Fit/visualize/predict/assess parallel slopes

Chapter 2
Interactions between explanatory variables
Simpson's Paradox

Chapter 3
Extend to many explanatory variables
Implement linear regression algorithm

Chapter 4
Logistic regression with multiple explanatory variables
Logistic distribution
Implement logistic regression algorithm



There is more to learn
Training and testing sets

Cross validation

P-values and significance



Advanced regression
Generalized Linear Models in Python

Introduction to Predictive Analytics in Python

Linear Classifiers in Python

Machine Learning with Tree-Based Models in Python



Have fun regressing!