
Commit e1d7b6b (1 parent: 78263f7)

Pushing the docs to dev/ for branch: main, commit de084fc3f00b7f1e790ce841fff7f484b254fa33
File tree

1,259 files changed: +4,729 −4,715 lines


dev/_downloads/34b53ad148e36f98b6de8ddc15e3dfd3/plot_causal_interpretation.py

+3 −2
@@ -124,8 +124,7 @@
 ax = coef.plot.barh()
 ax.set_xlabel("Coefficient values")
 ax.set_title("Coefficients of the linear regression including the ability features")
-plt.tight_layout()
-plt.show()
+_ = plt.tight_layout()
 
 # %%
 # Income prediction with partial observations
@@ -158,6 +157,8 @@
 ax = coef.plot.barh()
 ax.set_xlabel("Coefficient values")
 _ = ax.set_title("Coefficients of the linear regression excluding the ability feature")
+plt.tight_layout()
+plt.show()
 
 # %%
 # To compensate for the omitted variable, the model inflates the coefficient of
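For context on the change above: sphinx-gallery renders the repr of the last expression in each code block of an example, so scikit-learn examples conventionally bind that value to `_` (as in `_ = plt.tight_layout()` in the hunk above) to keep stray output such as `Text(0.5, 1.0, '...')` off the generated page. A minimal sketch of the idiom, with invented plot data:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.barh(["experience", "education"], [0.4, 0.2])  # illustrative values only
ax.set_xlabel("Coefficient values")
# set_title returns a matplotlib Text object; as the last expression of a
# sphinx-gallery block its repr would be echoed, so bind it to `_`.
_ = ax.set_title("Example coefficients")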

dev/_downloads/521b554adefca348463adbbe047d7e99/plot_linear_model_coefficient_interpretation.py

+23 −25
@@ -3,25 +3,35 @@
 Common pitfalls in the interpretation of coefficients of linear models
 ======================================================================
 
-In linear models, the target value is modeled as
-a linear combination of the features (see the :ref:`linear_model` User Guide
-section for a description of a set of linear models available in
-scikit-learn).
-Coefficients in multiple linear models represent the relationship between the
-given feature, :math:`X_i` and the target, :math:`y`, assuming that all the
-other features remain constant (`conditional dependence
-<https://en.wikipedia.org/wiki/Conditional_dependence>`_).
-This is different from plotting :math:`X_i` versus :math:`y` and fitting a
-linear relationship: in that case all possible values of the other features are
-taken into account in the estimation (marginal dependence).
+In linear models, the target value is modeled as a linear combination of the
+features (see the :ref:`linear_model` User Guide section for a description of a
+set of linear models available in scikit-learn). Coefficients in multiple linear
+models represent the relationship between the given feature, :math:`X_i` and the
+target, :math:`y`, assuming that all the other features remain constant
+(`conditional dependence
+<https://en.wikipedia.org/wiki/Conditional_dependence>`_). This is different
+from plotting :math:`X_i` versus :math:`y` and fitting a linear relationship: in
+that case all possible values of the other features are taken into account in
+the estimation (marginal dependence).
 
 This example will provide some hints in interpreting coefficient in linear
 models, pointing at problems that arise when either the linear model is not
 appropriate to describe the dataset, or when features are correlated.
 
+.. note::
+
+    Keep in mind that the features :math:`X` and the outcome :math:`y` are in
+    general the result of a data generating process that is unknown to us.
+    Machine learning models are trained to approximate the unobserved
+    mathematical function that links :math:`X` to :math:`y` from sample data. As
+    a result, any interpretation made about a model may not necessarily
+    generalize to the true data generating process. This is especially true when
+    the model is of bad quality or when the sample data is not representative of
+    the population.
+
 We will use data from the `"Current Population Survey"
-<https://www.openml.org/d/534>`_ from 1985 to predict
-wage as a function of various features such as experience, age, or education.
+<https://www.openml.org/d/534>`_ from 1985 to predict wage as a function of
+various features such as experience, age, or education.
 
 .. contents::
    :local:
@@ -729,18 +739,6 @@
 # See the :ref:`sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`
 # for a simulated case of ability OVB.
 #
-# Warning: data and model quality
-# -------------------------------
-#
-# Keep in mind that the outcome `y` and features `X` are the product
-# of a data generating process that is hidden from us. Machine
-# learning models are trained to approximate the unobserved
-# mathematical function that links `X` to `y` from sample data. As a
-# result, any interpretation made about a model may not necessarily
-# generalize to the true data generating process. This is especially
-# true when the model is of bad quality or when the sample data is
-# not representative of the population.
-#
 # Lessons learned
 # ---------------
 #
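A side note on the conditional-versus-marginal distinction drawn in the rewrapped docstring above. The sketch below is not part of the commit (the coefficients, the correlation strength, and the names `x1`/`x2` are invented for illustration): with correlated features, the coefficient of `x1` in a multivariate fit (conditional on `x2`) differs from the slope of a univariate fit of `y` on `x1` alone (marginal dependence).

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)  # x2 strongly correlated with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=n)  # invented true model

# Conditional dependence: coefficient of x1 holding x2 fixed (~1.0).
conditional = LinearRegression().fit(np.c_[x1, x2], y).coef_[0]
# Marginal dependence: x2 varies along with x1 in the background (~2.6).
marginal = LinearRegression().fit(x1[:, None], y).coef_[0]
print(f"conditional: {conditional:.2f}, marginal: {marginal:.2f}")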

dev/_downloads/cf0f90f46eb559facf7f63f124f61e04/plot_linear_model_coefficient_interpretation.ipynb

+2 −2
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Common pitfalls in the interpretation of coefficients of linear models\n\nIn linear models, the target value is modeled as\na linear combination of the features (see the `linear_model` User Guide\nsection for a description of a set of linear models available in\nscikit-learn).\nCoefficients in multiple linear models represent the relationship between the\ngiven feature, $X_i$ and the target, $y$, assuming that all the\nother features remain constant ([conditional dependence](https://en.wikipedia.org/wiki/Conditional_dependence)).\nThis is different from plotting $X_i$ versus $y$ and fitting a\nlinear relationship: in that case all possible values of the other features are\ntaken into account in the estimation (marginal dependence).\n\nThis example will provide some hints in interpreting coefficient in linear\nmodels, pointing at problems that arise when either the linear model is not\nappropriate to describe the dataset, or when features are correlated.\n\nWe will use data from the [\"Current Population Survey\"](https://www.openml.org/d/534) from 1985 to predict\nwage as a function of various features such as experience, age, or education.\n   :depth: 1\n"
+"\n# Common pitfalls in the interpretation of coefficients of linear models\n\nIn linear models, the target value is modeled as a linear combination of the\nfeatures (see the `linear_model` User Guide section for a description of a\nset of linear models available in scikit-learn). Coefficients in multiple linear\nmodels represent the relationship between the given feature, $X_i$ and the\ntarget, $y$, assuming that all the other features remain constant\n([conditional dependence](https://en.wikipedia.org/wiki/Conditional_dependence)). This is different\nfrom plotting $X_i$ versus $y$ and fitting a linear relationship: in\nthat case all possible values of the other features are taken into account in\nthe estimation (marginal dependence).\n\nThis example will provide some hints in interpreting coefficient in linear\nmodels, pointing at problems that arise when either the linear model is not\nappropriate to describe the dataset, or when features are correlated.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Keep in mind that the features $X$ and the outcome $y$ are in\n    general the result of a data generating process that is unknown to us.\n    Machine learning models are trained to approximate the unobserved\n    mathematical function that links $X$ to $y$ from sample data. As\n    a result, any interpretation made about a model may not necessarily\n    generalize to the true data generating process. This is especially true when\n    the model is of bad quality or when the sample data is not representative of\n    the population.</p></div>\n\nWe will use data from the [\"Current Population Survey\"](https://www.openml.org/d/534) from 1985 to predict wage as a function of\nvarious features such as experience, age, or education.\n   :depth: 1\n"
 ]
 },
 {
@@ -693,7 +693,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We observe that the AGE and EXPERIENCE coefficients are varying a lot\ndepending of the fold.\n\n## Wrong causal interpretation\n\nPolicy makers might want to know the effect of education on wage to assess\nwhether or not a certain policy designed to entice people to pursue more\neducation would make economic sense. While Machine Learning models are great\nfor measuring statistical associations, they are generally unable to infer\ncausal effects.\n\nIt might be tempting to look at the coefficient of education on wage from our\nlast model (or any model for that matter) and conclude that it captures the\ntrue effect of a change in the standardized education variable on wages.\n\nUnfortunately there are likely unobserved confounding variables that either\ninflate or deflate that coefficient. A confounding variable is a variable that\ncauses both EDUCATION and WAGE. One example of such variable is ability.\nPresumably, more able people are more likely to pursue education while at the\nsame time being more likely to earn a higher hourly wage at any level of\neducation. In this case, ability induces a positive [Omitted Variable Bias](https://en.wikipedia.org/wiki/Omitted-variable_bias) (OVB) on the EDUCATION\ncoefficient, thereby exaggerating the effect of education on wages.\n\nSee the `sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`\nfor a simulated case of ability OVB.\n\n## Warning: data and model quality\n\nKeep in mind that the outcome `y` and features `X` are the product\nof a data generating process that is hidden from us. Machine\nlearning models are trained to approximate the unobserved\nmathematical function that links `X` to `y` from sample data. As a\nresult, any interpretation made about a model may not necessarily\ngeneralize to the true data generating process. This is especially\ntrue when the model is of bad quality or when the sample data is\nnot representative of the population.\n\n## Lessons learned\n\n* Coefficients must be scaled to the same unit of measure to retrieve\n  feature importance. Scaling them with the standard-deviation of the\n  feature is a useful proxy.\n* Coefficients in multivariate linear models represent the dependency\n  between a given feature and the target, **conditional** on the other\n  features.\n* Correlated features induce instabilities in the coefficients of linear\n  models and their effects cannot be well teased apart.\n* Different linear models respond differently to feature correlation and\n  coefficients could significantly vary from one another.\n* Inspecting coefficients across the folds of a cross-validation loop\n  gives an idea of their stability.\n* Coefficients are unlikely to have any causal meaning. They tend\n  to be biased by unobserved confounders.\n* Inspection tools may not necessarily provide insights on the true\n  data generating process.\n\n"
+"We observe that the AGE and EXPERIENCE coefficients are varying a lot\ndepending of the fold.\n\n## Wrong causal interpretation\n\nPolicy makers might want to know the effect of education on wage to assess\nwhether or not a certain policy designed to entice people to pursue more\neducation would make economic sense. While Machine Learning models are great\nfor measuring statistical associations, they are generally unable to infer\ncausal effects.\n\nIt might be tempting to look at the coefficient of education on wage from our\nlast model (or any model for that matter) and conclude that it captures the\ntrue effect of a change in the standardized education variable on wages.\n\nUnfortunately there are likely unobserved confounding variables that either\ninflate or deflate that coefficient. A confounding variable is a variable that\ncauses both EDUCATION and WAGE. One example of such variable is ability.\nPresumably, more able people are more likely to pursue education while at the\nsame time being more likely to earn a higher hourly wage at any level of\neducation. In this case, ability induces a positive [Omitted Variable Bias](https://en.wikipedia.org/wiki/Omitted-variable_bias) (OVB) on the EDUCATION\ncoefficient, thereby exaggerating the effect of education on wages.\n\nSee the `sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`\nfor a simulated case of ability OVB.\n\n## Lessons learned\n\n* Coefficients must be scaled to the same unit of measure to retrieve\n  feature importance. Scaling them with the standard-deviation of the\n  feature is a useful proxy.\n* Coefficients in multivariate linear models represent the dependency\n  between a given feature and the target, **conditional** on the other\n  features.\n* Correlated features induce instabilities in the coefficients of linear\n  models and their effects cannot be well teased apart.\n* Different linear models respond differently to feature correlation and\n  coefficients could significantly vary from one another.\n* Inspecting coefficients across the folds of a cross-validation loop\n  gives an idea of their stability.\n* Coefficients are unlikely to have any causal meaning. They tend\n  to be biased by unobserved confounders.\n* Inspection tools may not necessarily provide insights on the true\n  data generating process.\n\n"
 ]
 }
 ],
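The ability OVB described in the cell above is easy to reproduce numerically. A minimal sketch, independent of the linked plot_causal_interpretation example (the coefficients and noise levels are invented): omitting the confounder `ability` inflates the `education` coefficient.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n = 10_000
ability = rng.normal(size=n)                    # unobserved confounder
education = 0.5 * ability + rng.normal(size=n)  # able people study more
wage = 1.0 * education + 1.0 * ability + rng.normal(size=n)

full = LinearRegression().fit(np.c_[education, ability], wage)
omitted = LinearRegression().fit(education[:, None], wage)
print(f"with ability:    {full.coef_[0]:.2f}")     # ~1.0, the true effect
print(f"without ability: {omitted.coef_[0]:.2f}")  # ~1.4, positively biased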

dev/_downloads/ff3bc184e1a2d8d99b77058ba52b764f/plot_causal_interpretation.ipynb

+2 −2
@@ -105,7 +105,7 @@
 },
 "outputs": [],
 "source": [
-"import matplotlib.pyplot as plt\n\nmodel_coef = pd.Series(regressor_with_ability.coef_, index=features_names)\ncoef = pd.concat(\n    [true_coef[features_names], model_coef],\n    keys=[\"Coefficients of true generative model\", \"Model coefficients\"],\n    axis=1,\n)\nax = coef.plot.barh()\nax.set_xlabel(\"Coefficient values\")\nax.set_title(\"Coefficients of the linear regression including the ability features\")\nplt.tight_layout()\nplt.show()"
+"import matplotlib.pyplot as plt\n\nmodel_coef = pd.Series(regressor_with_ability.coef_, index=features_names)\ncoef = pd.concat(\n    [true_coef[features_names], model_coef],\n    keys=[\"Coefficients of true generative model\", \"Model coefficients\"],\n    axis=1,\n)\nax = coef.plot.barh()\nax.set_xlabel(\"Coefficient values\")\nax.set_title(\"Coefficients of the linear regression including the ability features\")\n_ = plt.tight_layout()"
 ]
 },
 {
@@ -141,7 +141,7 @@
 },
 "outputs": [],
 "source": [
-"model_coef = pd.Series(regressor_without_ability.coef_, index=features_names)\ncoef = pd.concat(\n    [true_coef[features_names], model_coef],\n    keys=[\"Coefficients of true generative model\", \"Model coefficients\"],\n    axis=1,\n)\nax = coef.plot.barh()\nax.set_xlabel(\"Coefficient values\")\n_ = ax.set_title(\"Coefficients of the linear regression excluding the ability feature\")"
+"model_coef = pd.Series(regressor_without_ability.coef_, index=features_names)\ncoef = pd.concat(\n    [true_coef[features_names], model_coef],\n    keys=[\"Coefficients of true generative model\", \"Model coefficients\"],\n    axis=1,\n)\nax = coef.plot.barh()\nax.set_xlabel(\"Coefficient values\")\n_ = ax.set_title(\"Coefficients of the linear regression excluding the ability feature\")\nplt.tight_layout()\nplt.show()"
 ]
 },
 {

dev/_downloads/scikit-learn-docs.zip

322 Bytes
Binary file not shown.

dev/_sources/auto_examples/applications/plot_cyclical_feature_engineering.rst.txt

+1 −1

dev/_sources/auto_examples/applications/plot_digits_denoising.rst.txt

+1 −1

dev/_sources/auto_examples/applications/plot_face_recognition.rst.txt

+5 −5

dev/_sources/auto_examples/applications/plot_model_complexity_influence.rst.txt

+15 −15
