Commit 6de9741

Pushing the docs to 1.1/ for branch: 1.1.X, commit 17df37aee774720212c27dbc34e6f1feef0e2482
1 parent 721b1ec · commit 6de9741

1,631 files changed (+34,829 / -26,430 lines)


1.1/.buildinfo (+1 -1)

@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 8c6ff21e847d280e934fd16d253894de
+config: 2ef931295e3d55971b9853cfa92577eb
 tags: 645f666f9bcd5a90fca523b33c5a78b7

1.1/_downloads/0486bf9e537e44cedd2a236d034bcd90/plot_pcr_vs_pls.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Principal Component Regression vs Partial Least Squares Regression\n\nThis example compares `Principal Component Regression\n<https://en.wikipedia.org/wiki/Principal_component_regression>`_ (PCR) and\n`Partial Least Squares Regression\n<https://en.wikipedia.org/wiki/Partial_least_squares_regression>`_ (PLS) on a\ntoy dataset. Our goal is to illustrate how PLS can outperform PCR when the\ntarget is strongly correlated with some directions in the data that have a\nlow variance.\n\nPCR is a regressor composed of two steps: first,\n:class:`~sklearn.decomposition.PCA` is applied to the training data, possibly\nperforming dimensionality reduction; then, a regressor (e.g. a linear\nregressor) is trained on the transformed samples. In\n:class:`~sklearn.decomposition.PCA`, the transformation is purely\nunsupervised, meaning that no information about the targets is used. As a\nresult, PCR may perform poorly in some datasets where the target is strongly\ncorrelated with *directions* that have low variance. Indeed, the\ndimensionality reduction of PCA projects the data into a lower dimensional\nspace where the variance of the projected data is greedily maximized along\neach axis. Despite them having the most predictive power on the target, the\ndirections with a lower variance will be dropped, and the final regressor\nwill not be able to leverage them.\n\nPLS is both a transformer and a regressor, and it is quite similar to PCR: it\nalso applies a dimensionality reduction to the samples before applying a\nlinear regressor to the transformed data. The main difference with PCR is\nthat the PLS transformation is supervised. Therefore, as we will see in this\nexample, it does not suffer from the issue we just mentioned.\n"
+"\n# Principal Component Regression vs Partial Least Squares Regression\n\nThis example compares [Principal Component Regression](https://en.wikipedia.org/wiki/Principal_component_regression) (PCR) and\n[Partial Least Squares Regression](https://en.wikipedia.org/wiki/Partial_least_squares_regression) (PLS) on a\ntoy dataset. Our goal is to illustrate how PLS can outperform PCR when the\ntarget is strongly correlated with some directions in the data that have a\nlow variance.\n\nPCR is a regressor composed of two steps: first,\n:class:`~sklearn.decomposition.PCA` is applied to the training data, possibly\nperforming dimensionality reduction; then, a regressor (e.g. a linear\nregressor) is trained on the transformed samples. In\n:class:`~sklearn.decomposition.PCA`, the transformation is purely\nunsupervised, meaning that no information about the targets is used. As a\nresult, PCR may perform poorly in some datasets where the target is strongly\ncorrelated with *directions* that have low variance. Indeed, the\ndimensionality reduction of PCA projects the data into a lower dimensional\nspace where the variance of the projected data is greedily maximized along\neach axis. Despite them having the most predictive power on the target, the\ndirections with a lower variance will be dropped, and the final regressor\nwill not be able to leverage them.\n\nPLS is both a transformer and a regressor, and it is quite similar to PCR: it\nalso applies a dimensionality reduction to the samples before applying a\nlinear regressor to the transformed data. The main difference with PCR is\nthat the PLS transformation is supervised. Therefore, as we will see in this\nexample, it does not suffer from the issue we just mentioned.\n"
 ]
 },
 {
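
A minimal sketch of the PCR-vs-PLS setup described in the notebook text above, not part of this commit; the synthetic dataset and the component count are assumptions made purely for illustration:

    # Illustrative sketch only: PCR as a PCA + linear-regression pipeline,
    # compared against supervised PLS on a synthetic regression problem.
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    X, y = make_regression(n_samples=500, n_features=10, noise=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # PCR: the PCA step is unsupervised, so components are picked by variance alone.
    pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X_train, y_train)
    # PLS: the projection is supervised, i.e. it uses y when choosing directions.
    pls = PLSRegression(n_components=2).fit(X_train, y_train)

    print(f"PCR R^2 on test data: {pcr.score(X_test, y_test):.3f}")
    print(f"PLS R^2 on test data: {pls.score(X_test, y_test):.3f}")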

1.1/_downloads/05ca8a4e90b4cc2acd69f9e24b4a1f3a/plot_classifier_chain_yeast.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Classifier Chain\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<https://www.openml.org/d/40597>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_similarity_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
+"\n# Classifier Chain\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the [yeast](https://www.openml.org/d/40597) dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_similarity_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
 ]
 },
 {
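
A minimal sketch of the chain-plus-ensemble idea described in the notebook text above, not part of this commit; a small synthetic multilabel dataset stands in for the yeast data, and the sizes are assumptions for illustration:

    # Illustrative sketch only: ten randomly ordered classifier chains whose
    # averaged binary predictions are thresholded at 0.5 and scored with Jaccard.
    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import jaccard_score
    from sklearn.model_selection import train_test_split
    from sklearn.multioutput import ClassifierChain

    X, Y = make_multilabel_classification(n_samples=300, n_classes=5, random_state=0)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

    chains = [
        ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=i)
        for i in range(10)
    ]
    for chain in chains:
        chain.fit(X_train, Y_train)

    # Ensemble: average the chains' binary predictions and apply a 0.5 threshold.
    Y_pred = np.mean([chain.predict(X_test) for chain in chains], axis=0) >= 0.5
    print(jaccard_score(Y_test, Y_pred, average="samples"))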

1.1/_downloads/06cfc926acb27652fb2aa5bfc583e7cb/plot_hashing_vs_dict_vectorizer.ipynb (+1 -1)

@@ -87,7 +87,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Observe in particular that the repeated token `\"is\"` is counted twice for\ninstance.\n\nBreaking a text document into word tokens, potentially losing the order\ninformation between the words in a sentence is often called a `Bag of Words\nrepresentation <https://en.wikipedia.org/wiki/Bag-of-words_model>`_.\n\n"
+"Observe in particular that the repeated token `\"is\"` is counted twice for\ninstance.\n\nBreaking a text document into word tokens, potentially losing the order\ninformation between the words in a sentence is often called a [Bag of Words\nrepresentation](https://en.wikipedia.org/wiki/Bag-of-words_model).\n\n"
 ]
 },
 {
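
The bag-of-words point made above (a repeated token such as "is" gets a count of 2) can be seen with a tiny sketch; the notebook itself compares DictVectorizer and FeatureHasher, but CountVectorizer and the toy sentence below are used here purely as an illustrative assumption:

    # Illustrative sketch only: token counts in a bag-of-words representation.
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["This document is short and this document is simple."]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)

    # Word order is discarded; repeated tokens such as "is" are counted twice.
    print(dict(zip(vectorizer.get_feature_names_out(), counts.toarray()[0])))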
(Binary file changed; contents not shown.)

1.1/_downloads/1c4a422dfa5bd721501d19a2b7e2499b/plot_species_kde.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Kernel Density Estimate of Species Distributions\nThis shows an example of a neighbors-based query (in particular a kernel\ndensity estimate) on geospatial data, using a Ball Tree built upon the\nHaversine distance metric -- i.e. distances over points in latitude/longitude.\nThe dataset is provided by Phillips et. al. (2006).\nIf available, the example uses\n`basemap <https://matplotlib.org/basemap/>`_\nto plot the coast lines and national boundaries of South America.\n\nThis example does not perform any learning over the data\n(see `sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py` for\nan example of classification based on the attributes in this dataset). It\nsimply shows the kernel density estimate of observed data points in\ngeospatial coordinates.\n\nThe two species are:\n\n - `\"Bradypus variegatus\"\n <https://www.iucnredlist.org/species/3038/47437046>`_ ,\n the Brown-throated Sloth.\n\n - `\"Microryzomys minutus\"\n <http://www.iucnredlist.org/details/13408/0>`_ ,\n also known as the Forest Small Rice Rat, a rodent that lives in Peru,\n Colombia, Ecuador, Peru, and Venezuela.\n\n## References\n\n * `\"Maximum entropy modeling of species geographic distributions\"\n <http://rob.schapire.net/papers/ecolmod.pdf>`_\n S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,\n 190:231-259, 2006.\n"
+"\n# Kernel Density Estimate of Species Distributions\nThis shows an example of a neighbors-based query (in particular a kernel\ndensity estimate) on geospatial data, using a Ball Tree built upon the\nHaversine distance metric -- i.e. distances over points in latitude/longitude.\nThe dataset is provided by Phillips et. al. (2006).\nIf available, the example uses\n[basemap](https://matplotlib.org/basemap/)\nto plot the coast lines and national boundaries of South America.\n\nThis example does not perform any learning over the data\n(see `sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py` for\nan example of classification based on the attributes in this dataset). It\nsimply shows the kernel density estimate of observed data points in\ngeospatial coordinates.\n\nThe two species are:\n\n - [\"Bradypus variegatus\"](https://www.iucnredlist.org/species/3038/47437046) ,\n the Brown-throated Sloth.\n\n - [\"Microryzomys minutus\"](http://www.iucnredlist.org/details/13408/0) ,\n also known as the Forest Small Rice Rat, a rodent that lives in Peru,\n Colombia, Ecuador, Peru, and Venezuela.\n\n## References\n\n * [\"Maximum entropy modeling of species geographic distributions\"](http://rob.schapire.net/papers/ecolmod.pdf)\n S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,\n 190:231-259, 2006.\n"
 ]
 },
 {
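
A minimal sketch of the kind of query described above, not part of this commit: a kernel density estimate over latitude/longitude points with the haversine metric and a Ball Tree. The coordinates and bandwidth below are made up for illustration rather than taken from the species dataset:

    # Illustrative sketch only: KDE on geospatial points using the haversine
    # metric, which expects (latitude, longitude) given in radians.
    import numpy as np
    from sklearn.neighbors import KernelDensity

    rng = np.random.default_rng(0)
    latlon_deg = np.column_stack(
        [rng.uniform(-20, 0, size=200), rng.uniform(-80, -60, size=200)]
    )
    latlon_rad = np.radians(latlon_deg)

    kde = KernelDensity(
        bandwidth=0.02, metric="haversine", kernel="gaussian", algorithm="ball_tree"
    )
    kde.fit(latlon_rad)
    print(np.exp(kde.score_samples(latlon_rad[:5])))  # density at a few points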

1.1/_downloads/215c560d29193ab9b0a495609bc74802/plot_monotonic_constraints.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Monotonic Constraints\n\nThis example illustrates the effect of monotonic constraints on a gradient\nboosting estimator.\n\nWe build an artificial dataset where the target value is in general\npositively correlated with the first feature (with some random and\nnon-random variations), and in general negatively correlated with the second\nfeature.\n\nBy imposing a positive (increasing) or negative (decreasing) constraint on\nthe features during the learning process, the estimator is able to properly\nfollow the general trend instead of being subject to the variations.\n\nThis example was inspired by the `XGBoost documentation\n<https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html>`_.\n"
+"\n# Monotonic Constraints\n\nThis example illustrates the effect of monotonic constraints on a gradient\nboosting estimator.\n\nWe build an artificial dataset where the target value is in general\npositively correlated with the first feature (with some random and\nnon-random variations), and in general negatively correlated with the second\nfeature.\n\nBy imposing a positive (increasing) or negative (decreasing) constraint on\nthe features during the learning process, the estimator is able to properly\nfollow the general trend instead of being subject to the variations.\n\nThis example was inspired by the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html).\n"
 ]
 },
 {
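
A minimal sketch of the constraint described above, not part of this commit; the feature/target relationship is a made-up assumption for illustration:

    # Illustrative sketch only: +1 enforces a monotonically increasing effect
    # of the first feature, -1 a decreasing effect of the second.
    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(1000, 2))
    y = 5 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=0.5, size=1000)

    model = HistGradientBoostingRegressor(monotonic_cst=[1, -1]).fit(X, y)
    print(model.score(X, y))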

1.1/_downloads/2402de18d671ce5087e3760b2540184f/plot_grid_search_stats.ipynb (+2 -2)

@@ -284,7 +284,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Pairwise comparison of all models: frequentist approach\n\nWe could also be interested in comparing the performance of all our models\nevaluated with :class:`~sklearn.model_selection.GridSearchCV`. In this case\nwe would be running our statistical test multiple times, which leads us to\nthe `multiple comparisons problem\n<https://en.wikipedia.org/wiki/Multiple_comparisons_problem>`_.\n\nThere are many possible ways to tackle this problem, but a standard approach\nis to apply a `Bonferroni correction\n<https://en.wikipedia.org/wiki/Bonferroni_correction>`_. Bonferroni can be\ncomputed by multiplying the p-value by the number of comparisons we are\ntesting.\n\nLet's compare the performance of the models using the corrected t-test:\n\n"
+"## Pairwise comparison of all models: frequentist approach\n\nWe could also be interested in comparing the performance of all our models\nevaluated with :class:`~sklearn.model_selection.GridSearchCV`. In this case\nwe would be running our statistical test multiple times, which leads us to\nthe [multiple comparisons problem](https://en.wikipedia.org/wiki/Multiple_comparisons_problem).\n\nThere are many possible ways to tackle this problem, but a standard approach\nis to apply a [Bonferroni correction](https://en.wikipedia.org/wiki/Bonferroni_correction). Bonferroni can be\ncomputed by multiplying the p-value by the number of comparisons we are\ntesting.\n\nLet's compare the performance of the models using the corrected t-test:\n\n"
 ]
 },
 {
@@ -341,7 +341,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-".. topic:: References\n\n .. [1] Dietterich, T. G. (1998). `Approximate statistical tests for\n comparing supervised classification learning algorithms\n <http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf>`_.\n Neural computation, 10(7).\n .. [2] Nadeau, C., & Bengio, Y. (2000). `Inference for the generalization\n error\n <https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf>`_.\n In Advances in neural information processing systems.\n .. [3] Bouckaert, R. R., & Frank, E. (2004). `Evaluating the replicability\n of significance tests for comparing learning algorithms\n <https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf>`_.\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n .. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). `Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis\n <http://www.jmlr.org/papers/volume18/16-305/16-305.pdf>`_.\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper `here\n <https://github.com/janezd/baycomp>`_.\n .. [5] Diebold, F.X. & Mariano R.S. (1995). `Comparing predictive accuracy\n <http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf>`_\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
+".. topic:: References\n\n .. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n Neural computation, 10(7).\n .. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n In Advances in neural information processing systems.\n .. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n .. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper [here](https://github.com/janezd/baycomp).\n .. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
 ]
 }
 ],
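
The Bonferroni step described in the first hunk above is just arithmetic on the uncorrected p-values; a small sketch with made-up values, not part of this commit:

    # Illustrative sketch only: Bonferroni-correct p-values from pairwise
    # model comparisons by multiplying each one by the number of comparisons.
    import numpy as np

    p_values = np.array([0.012, 0.034, 0.20])  # hypothetical uncorrected p-values
    n_comparisons = len(p_values)
    p_corrected = np.minimum(p_values * n_comparisons, 1.0)  # cap at 1.0
    print(p_corrected)  # [0.036 0.102 0.6]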

1.1/_downloads/26998096b90db15754e891c733ae032c/plot_iris_dataset.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# The Iris Dataset\nThis data sets consists of 3 different types of irises'\n(Setosa, Versicolour, and Virginica) petal and sepal\nlength, stored in a 150x4 numpy.ndarray\n\nThe rows being the samples and the columns being:\nSepal Length, Sepal Width, Petal Length and Petal Width.\n\nThe below plot uses the first two features.\nSee `here <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_ for more\ninformation on this dataset.\n"
+"\n# The Iris Dataset\nThis data sets consists of 3 different types of irises'\n(Setosa, Versicolour, and Virginica) petal and sepal\nlength, stored in a 150x4 numpy.ndarray\n\nThe rows being the samples and the columns being:\nSepal Length, Sepal Width, Petal Length and Petal Width.\n\nThe below plot uses the first two features.\nSee [here](https://en.wikipedia.org/wiki/Iris_flower_data_set) for more\ninformation on this dataset.\n"
 ]
 },
 {
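
A minimal sketch of loading the 150x4 array described above and plotting its first two features, not part of this commit and shown only as an illustration:

    # Illustrative sketch only: the iris data as a 150x4 array, scattered on
    # its first two features (sepal length and sepal width).
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target
    print(X.shape)  # (150, 4)

    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.show()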

1.1/_downloads/26f110ad6cff1a8a7c58b1a00d8b8b5a/plot_column_transformer_mixed_types.ipynb (+1 -1)

@@ -159,7 +159,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<div class=\"alert alert-info\"><h4>Note</h4><p>In practice, you will have to handle yourself the column data type.\n If you want some columns to be considered as `category`, you will have to\n convert them into categorical columns. If you are using pandas, you can\n refer to their documentation regarding `Categorical data\n <https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html>`_.</p></div>\n\n"
+"<div class=\"alert alert-info\"><h4>Note</h4><p>In practice, you will have to handle yourself the column data type.\n If you want some columns to be considered as `category`, you will have to\n convert them into categorical columns. If you are using pandas, you can\n refer to their documentation regarding [Categorical data](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html).</p></div>\n\n"
 ]
 },
 {
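
The note above is about converting columns to pandas' category dtype yourself; a minimal sketch with a made-up dataframe (column names are assumptions, not from the example):

    # Illustrative sketch only: explicitly marking a column as categorical so
    # that downstream transformers can treat it as such.
    import pandas as pd

    df = pd.DataFrame({"embarked": ["S", "C", "S"], "age": [22.0, 38.0, 26.0]})
    df["embarked"] = df["embarked"].astype("category")
    print(df.dtypes)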
