{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Partial Dependence and Individual Conditional Expectation Plots\n\nPartial dependence plots show the dependence between the target function [2]_\nand a set of features of interest, marginalizing over the values of all other\nfeatures (the complement features). Due to the limits of human perception, the\nsize of the set of features of interest must be small (usually, one or two)\nthus they are usually chosen among the most important features.\n\nSimilarly, an individual conditional expectation (ICE) plot [3]_\nshows the dependence between the target function and a feature of interest.\nHowever, unlike partial dependence plots, which show the average effect of the\nfeatures of interest, ICE plots visualize the dependence of the prediction on a\nfeature for each :term:`sample` separately, with one line per sample.\nOnly one feature of interest is supported for ICE plots.\n\nThis example shows how to obtain partial dependence and ICE plots from a\n:class:`~sklearn.neural_network.MLPRegressor` and a\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` trained on the\nCalifornia housing dataset. The example is taken from [1]_.\n\n.. [1] T. Hastie, R. Tibshirani and J. Friedman, \"Elements of Statistical\n Learning Ed. 2\", Springer, 2009.\n\n.. [2] For classification you can think of it as the regression score before\n the link function.\n\n.. [3] :arxiv:`Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015).\n \"Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of\n Individual Conditional Expectation\". Journal of Computational and\n Graphical Statistics, 24(1): 44-65 <1309.6392>`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## California Housing data preprocessing\n\nCenter target to avoid gradient boosting init bias: gradient boosting\nwith the 'recursion' method does not account for the initial estimator\n(here the average target, by default).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\nfrom sklearn.datasets import fetch_california_housing\nfrom sklearn.model_selection import train_test_split\n\ncal_housing = fetch_california_housing()\nX = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)\ny = cal_housing.target\n\ny -= y.mean()\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1-way partial dependence with different models\n\nIn this section, we will compute 1-way partial dependence with two different\nmachine-learning models: (i) a multi-layer perceptron and (ii) a\ngradient-boosting. 
With these two models, we illustrate how to compute and\ninterpret both partial dependence plots (PDP) and individual conditional\nexpectation (ICE) plots.\n\n### Multi-layer perceptron\n\nLet's fit a :class:`~sklearn.neural_network.MLPRegressor` and compute\nsingle-variable partial dependence plots.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from time import time\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import QuantileTransformer\nfrom sklearn.neural_network import MLPRegressor\n\nprint(\"Training MLPRegressor...\")\ntic = time()\nest = make_pipeline(\n    QuantileTransformer(),\n    MLPRegressor(\n        hidden_layer_sizes=(30, 15),\n        learning_rate_init=0.01,\n        early_stopping=True,\n        random_state=0,\n    ),\n)\nest.fit(X_train, y_train)\nprint(f\"done in {time() - tic:.3f}s\")\nprint(f\"Test R2 score: {est.score(X_test, y_test):.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We configured a pipeline to scale the numerical input features and tuned the\nneural network size and learning rate to get a reasonable compromise between\ntraining time and predictive performance on a test set.\n\nImportantly, this tabular dataset has very different dynamic ranges for its\nfeatures. Neural networks tend to be very sensitive to features with varying\nscales, and forgetting to preprocess the numeric features would lead to a very\npoor model.\n\nIt would be possible to get even higher predictive performance with a larger\nneural network, but the training would also be significantly more expensive.\n\nNote that it is important to check that the model is accurate enough on a\ntest set before plotting the partial dependence, since there would be little\nuse in explaining the impact of a given feature on the prediction function of\na poor model.\n\nWe will plot the partial dependence, both individual (ICE) and averaged one\n(PDP).
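\n\nAs a small numerical sanity check (a sketch, not part of the original example),\nthe averaged curve that the MedInc panel will display can also be computed\ndirectly with :func:`~sklearn.inspection.partial_dependence`:\n\n```python\nfrom sklearn.inspection import partial_dependence\n\n# Averaged partial dependence of the fitted pipeline on MedInc only; the\n# result is a Bunch holding the evaluation grid and one curve per output.\npd_medinc = partial_dependence(\n    est, X_train, features=[\"MedInc\"], kind=\"average\", grid_resolution=20\n)\nprint(pd_medinc[\"average\"].shape)  # (1, 20): one output, 20 grid points\n```\n\n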
We limit the number of ICE curves to 50 so as not to overcrowd the plot.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.inspection import PartialDependenceDisplay\n\ncommon_params = {\n    \"subsample\": 50,\n    \"n_jobs\": 2,\n    \"grid_resolution\": 20,\n    \"centered\": True,\n    \"random_state\": 0,\n}\n\nprint(\"Computing partial dependence plots...\")\ntic = time()\ndisplay = PartialDependenceDisplay.from_estimator(\n    est,\n    X_train,\n    features=[\"MedInc\", \"AveOccup\", \"HouseAge\", \"AveRooms\"],\n    kind=\"both\",\n    **common_params,\n)\nprint(f\"done in {time() - tic:.3f}s\")\ndisplay.figure_.suptitle(\n    \"Partial dependence of house value on non-location features\\n\"\n    \"for the California housing dataset, with MLPRegressor\"\n)\ndisplay.figure_.subplots_adjust(hspace=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gradient boosting\n\nLet's now fit a :class:`~sklearn.ensemble.HistGradientBoostingRegressor` and\ncompute the partial dependence on the same features.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.ensemble import HistGradientBoostingRegressor\n\nprint(\"Training HistGradientBoostingRegressor...\")\ntic = time()\nest = HistGradientBoostingRegressor(random_state=0)\nest.fit(X_train, y_train)\nprint(f\"done in {time() - tic:.3f}s\")\nprint(f\"Test R2 score: {est.score(X_test, y_test):.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we used the default hyperparameters for the gradient boosting model\nwithout any preprocessing, as tree-based models are naturally robust to\nmonotonic transformations of numerical features.\n\nNote that on this tabular dataset, Gradient Boosting Machines are both\nsignificantly faster to train and more accurate than neural networks. It is\nalso significantly cheaper to tune their hyperparameters (the defaults tend\nto work well, which is often not the case for neural networks).\n\nWe will plot the partial dependence, both individual (ICE) and averaged one\n(PDP). We limit the number of ICE curves to 50 so as not to overcrowd the\nplot.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Computing partial dependence plots...\")\ntic = time()\ndisplay = PartialDependenceDisplay.from_estimator(\n    est,\n    X_train,\n    features=[\"MedInc\", \"AveOccup\", \"HouseAge\", \"AveRooms\"],\n    kind=\"both\",\n    **common_params,\n)\nprint(f\"done in {time() - tic:.3f}s\")\ndisplay.figure_.suptitle(\n    \"Partial dependence of house value on non-location features\\n\"\n    \"for the California housing dataset, with Gradient Boosting\"\n)\ndisplay.figure_.subplots_adjust(wspace=0.4, hspace=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analysis of the plots\n\nWe can clearly see on the PDPs (dashed orange lines) that the median house price\nshows a linear relationship with the median income (top left) and that the\nhouse price drops when the average occupants per household increases (top\nmiddle).
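\n\nAs an aside, the curves in these plots can also be inspected numerically: the\ndisplay object stores the computed partial dependence results in its\n`pd_results` attribute. A minimal sketch, relying only on the `display` object\nfitted in the previous cell:\n\n```python\n# `pd_results` holds one Bunch per requested feature, in the same order as\n# the `features` argument; entry 0 corresponds to \"MedInc\".\nmedinc_res = display.pd_results[0]\nprint(medinc_res[\"average\"].shape)  # the averaged PDP curve\nprint(medinc_res[\"individual\"].shape)  # one ICE curve per subsampled row\n```\n\n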
Moving to the remaining panels, the top right plot shows that the house age in\na district does not have a strong influence on the (median) house price; nor\ndoes the average number of rooms per household.\n\nThe ICE curves (light blue lines) complement the analysis: we can see that\nthere are some exceptions (better highlighted with the option\n`centered=True`) where the house price remains constant with respect to\nvariations of the median income and of the average occupancy.\nOn the other hand, while the house age (top right) does not have a strong\ninfluence on the median house price on average, there seem to be a number\nof exceptions where the house price increases for house ages between 15 and\n25. Similar exceptions can be observed for the average number of rooms\n(bottom left). Therefore, ICE plots reveal some individual effects that are\nattenuated by taking the average.\n\nIn all plots, the tick marks on the x-axis represent the deciles of the\nfeature values in the training data.\n\nWe also observe that :class:`~sklearn.neural_network.MLPRegressor` has much\nsmoother predictions than\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor`.\n\nHowever, it is worth noting that we are creating potentially meaningless\nsynthetic samples if features are correlated.\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2D interaction plots\n\nPDPs with two features of interest enable us to visualize interactions among\nthem. However, ICE curves cannot easily be plotted, and therefore interpreted,\nfor more than one feature. Another consideration is the cost of computing the\nPDPs: with tree-based models, when only PDPs are requested, they can be\ncomputed in an efficient way using the `'recursion'` method.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nprint(\"Computing partial dependence plots...\")\ntic = time()\n_, ax = plt.subplots(ncols=3, figsize=(9, 4))\n\n# Note that we could have called the method `from_estimator` three times and\n# provided one feature, one kind of plot, and one axis for each call.\ndisplay = PartialDependenceDisplay.from_estimator(\n    est,\n    X_train,\n    features=[\"AveOccup\", \"HouseAge\", (\"AveOccup\", \"HouseAge\")],\n    kind=[\"both\", \"both\", \"average\"],\n    ax=ax,\n    **common_params,\n)\n\nprint(f\"done in {time() - tic:.3f}s\")\ndisplay.figure_.suptitle(\n    \"Partial dependence of house value on non-location features\\n\"\n    \"for the California housing dataset, with Gradient Boosting\"\n)\ndisplay.figure_.subplots_adjust(wspace=0.4, hspace=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two-way partial dependence plot shows the dependence of median house\nprice on joint values of house age and average occupants per household.
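\n\nTo go beyond the visual impression, the underlying two-way grid can also be\ncomputed directly. Here is a minimal sketch using the fast `'recursion'`\nmethod mentioned above (applicable because only the averaged dependence is\nrequested, with no ICE curves):\n\n```python\nfrom sklearn.inspection import partial_dependence\n\n# With tree ensembles, method=\"recursion\" traverses the trees instead of\n# evaluating the model on synthetic samples; it only yields the average.\npd_2d = partial_dependence(\n    est,\n    X_train,\n    features=(\"AveOccup\", \"HouseAge\"),\n    kind=\"average\",\n    method=\"recursion\",\n    grid_resolution=20,\n)\nprint(pd_2d[\"average\"].shape)  # (1, 20, 20): a surface over the 2D grid\n```\n\n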
Visually, we can clearly see an interaction between the two features: for an\naverage occupancy greater than two, the house price is nearly independent of\nthe house age, whereas for values less than two there is a strong dependence\non age.\n\n## 3D interaction plots\n\nLet's make the same partial dependence plot for the two-feature interaction,\nthis time in 3 dimensions.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\nfrom mpl_toolkits.mplot3d import Axes3D\nfrom sklearn.inspection import partial_dependence\n\nfig = plt.figure()\n\nfeatures = (\"AveOccup\", \"HouseAge\")\npdp = partial_dependence(\n    est, X_train, features=features, kind=\"average\", grid_resolution=10\n)\nXX, YY = np.meshgrid(pdp[\"values\"][0], pdp[\"values\"][1])\n# Transpose the averaged grid so that it matches the meshgrid orientation.\nZ = pdp.average[0].T\nax = Axes3D(fig)\nfig.add_axes(ax)\n\nsurf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1, cmap=plt.cm.BuPu, edgecolor=\"k\")\nax.set_xlabel(features[0])\nax.set_ylabel(features[1])\nax.set_zlabel(\"Partial dependence\")\n# pretty initial view angle\nax.view_init(elev=22, azim=122)\nplt.colorbar(surf)\nplt.suptitle(\n    \"Partial dependence of house value on median\\n\"\n    \"age and average occupancy, with Gradient Boosting\"\n)\nplt.subplots_adjust(top=0.9)\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 0 }