  3 |   3 |  Common pitfalls in the interpretation of coefficients of linear models
  4 |   4 |  ======================================================================
  5 |   5 |
  6 |     | -In linear models, the target value is modeled as
  7 |     | -a linear combination of the features (see the :ref:`linear_model` User Guide
  8 |     | -section for a description of a set of linear models available in
  9 |     | -scikit-learn).
 10 |     | -Coefficients in multiple linear models represent the relationship between the
 11 |     | -given feature, :math:`X_i` and the target, :math:`y`, assuming that all the
 12 |     | -other features remain constant (`conditional dependence
 13 |     | -<https://en.wikipedia.org/wiki/Conditional_dependence>`_).
 14 |     | -This is different from plotting :math:`X_i` versus :math:`y` and fitting a
 15 |     | -linear relationship: in that case all possible values of the other features are
 16 |     | -taken into account in the estimation (marginal dependence).
    |   6 | +In linear models, the target value is modeled as a linear combination of the
    |   7 | +features (see the :ref:`linear_model` User Guide section for a description of a
    |   8 | +set of linear models available in scikit-learn). Coefficients in multiple linear
    |   9 | +models represent the relationship between the given feature, :math:`X_i` and the
    |  10 | +target, :math:`y`, assuming that all the other features remain constant
    |  11 | +(`conditional dependence
    |  12 | +<https://en.wikipedia.org/wiki/Conditional_dependence>`_). This is different
    |  13 | +from plotting :math:`X_i` versus :math:`y` and fitting a linear relationship: in
    |  14 | +that case all possible values of the other features are taken into account in
    |  15 | +the estimation (marginal dependence).
 17 |  16 |
 18 |  17 |  This example will provide some hints in interpreting coefficient in linear
 19 |  18 |  models, pointing at problems that arise when either the linear model is not
 20 |  19 |  appropriate to describe the dataset, or when features are correlated.
 21 |  20 |
    |  21 | +.. note::
    |  22 | +
    |  23 | +   Keep in mind that the features :math:`X` and the outcome :math:`y` are in
    |  24 | +   general the result of a data generating process that is unknown to us.
    |  25 | +   Machine learning models are trained to approximate the unobserved
    |  26 | +   mathematical function that links :math:`X` to :math:`y` from sample data. As
    |  27 | +   a result, any interpretation made about a model may not necessarily
    |  28 | +   generalize to the true data generating process. This is especially true when
    |  29 | +   the model is of bad quality or when the sample data is not representative of
    |  30 | +   the population.
    |  31 | +
 22 |  32 |  We will use data from the `"Current Population Survey"
 23 |     | -<https://www.openml.org/d/534>`_ from 1985 to predict
 24 |     | -wage as a function of various features such as experience, age, or education.
    |  33 | +<https://www.openml.org/d/534>`_ from 1985 to predict wage as a function of
    |  34 | +various features such as experience, age, or education.
 25 |  35 |
 26 |  36 |  .. contents::
 27 |  37 |     :local:
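The docstring's distinction between conditional dependence (a coefficient in a multiple linear model) and marginal dependence (the slope of :math:`y` against a single :math:`X_i`) can be checked on simulated data. The sketch below is a hypothetical illustration and is not part of the scikit-learn example. It assumes two correlated features and a made-up data generating process. It also uses `numpy.linalg.lstsq` for ordinary least squares, which fits the same model as scikit-learn's `LinearRegression`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulated data (not the CPS dataset): x2 is correlated with x1,
# and the least-squares slope of x2 on x1 is 0.8 by construction.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)

# Assumed data generating process: y depends on both features.
y = 1.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

# Multiple regression (intercept, x1, x2): the coefficient of x1 is its effect
# with x2 held constant, i.e. conditional dependence.
X_both = np.column_stack([np.ones(n), x1, x2])
coef_multi, *_ = np.linalg.lstsq(X_both, y, rcond=None)

# Simple regression of y on x1 alone: the slope also absorbs the part of x2
# that moves together with x1, i.e. marginal dependence.
X_x1 = np.column_stack([np.ones(n), x1])
coef_simple, *_ = np.linalg.lstsq(X_x1, y, rcond=None)

print(f"conditional coefficient of x1: {coef_multi[1]:.2f}")
print(f"marginal slope of x1:          {coef_simple[1]:.2f}")
```

With the assumed correlation, the marginal slope should land near 1 + 2 × 0.8 = 2.6. The conditional coefficient should stay near the true value of 1. Only the conditional coefficient answers "what happens to y if x1 changes and x2 does not".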
729 | 739 |  # See the :ref:`sphx_glr_auto_examples_inspection_plot_causal_interpretation.py`
730 | 740 |  # for a simulated case of ability OVB.
731 | 741 |  #
732 |     | -# Warning: data and model quality
733 |     | -# -------------------------------
734 |     | -#
735 |     | -# Keep in mind that the outcome `y` and features `X` are the product
736 |     | -# of a data generating process that is hidden from us. Machine
737 |     | -# learning models are trained to approximate the unobserved
738 |     | -# mathematical function that links `X` to `y` from sample data. As a
739 |     | -# result, any interpretation made about a model may not necessarily
740 |     | -# generalize to the true data generating process. This is especially
741 |     | -# true when the model is of bad quality or when the sample data is
742 |     | -# not representative of the population.
743 |     | -#
744 | 742 |  # Lessons learned
745 | 743 |  # ---------------
746 | 744 |  #
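The caution about the data generating process can also be made concrete. This is the point of the removed "Warning: data and model quality" section, which reappears as the docstring note. The sketch below assumes a hypothetical quadratic process, y = x², that a linear model cannot represent. It also assumes a population spread over [-1, 1] but a sample that only covers [0, 1]. Ordinary least squares is again computed with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)


def ols_slope(x, y):
    """Slope b of an ordinary least squares fit y ~ a + b * x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def generate(x):
    """Hypothetical data generating process, hidden from the modeller."""
    return x**2 + 0.05 * rng.normal(size=x.shape)


# Assumed population covers [-1, 1]; the collected sample only covers [0, 1].
x_population = rng.uniform(-1, 1, size=10_000)
x_sample = rng.uniform(0, 1, size=10_000)

slope_sample = ols_slope(x_sample, generate(x_sample))
slope_population = ols_slope(x_population, generate(x_population))

print(f"slope fitted on the sample:     {slope_sample:.2f}")
print(f"slope fitted on the population: {slope_population:.2f}")
```

A straight line fitted on the sample reports a slope near 1. The same model fitted on the whole population reports a slope near 0. Neither coefficient describes the true process: each reflects the misspecified model and the data it was trained on.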