Commit 6de9741

Pushing the docs to 1.1/ for branch: 1.1.X, commit 17df37aee774720212c27dbc34e6f1feef0e2482
1 parent 721b1ec · commit 6de9741

1,631 files changed (+34,829 / -26,430 lines)


1.1/.buildinfo (+1 -1)

@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 8c6ff21e847d280e934fd16d253894de
+config: 2ef931295e3d55971b9853cfa92577eb
 tags: 645f666f9bcd5a90fca523b33c5a78b7

1.1/_downloads/0486bf9e537e44cedd2a236d034bcd90/plot_pcr_vs_pls.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Principal Component Regression vs Partial Least Squares Regression\n\nThis example compares `Principal Component Regression\n<https://en.wikipedia.org/wiki/Principal_component_regression>`_ (PCR) and\n`Partial Least Squares Regression\n<https://en.wikipedia.org/wiki/Partial_least_squares_regression>`_ (PLS) on a\ntoy dataset. Our goal is to illustrate how PLS can outperform PCR when the\ntarget is strongly correlated with some directions in the data that have a\nlow variance.\n\nPCR is a regressor composed of two steps: first,\n:class:`~sklearn.decomposition.PCA` is applied to the training data, possibly\nperforming dimensionality reduction; then, a regressor (e.g. a linear\nregressor) is trained on the transformed samples. In\n:class:`~sklearn.decomposition.PCA`, the transformation is purely\nunsupervised, meaning that no information about the targets is used. As a\nresult, PCR may perform poorly in some datasets where the target is strongly\ncorrelated with *directions* that have low variance. Indeed, the\ndimensionality reduction of PCA projects the data into a lower dimensional\nspace where the variance of the projected data is greedily maximized along\neach axis. Despite them having the most predictive power on the target, the\ndirections with a lower variance will be dropped, and the final regressor\nwill not be able to leverage them.\n\nPLS is both a transformer and a regressor, and it is quite similar to PCR: it\nalso applies a dimensionality reduction to the samples before applying a\nlinear regressor to the transformed data. The main difference with PCR is\nthat the PLS transformation is supervised. Therefore, as we will see in this\nexample, it does not suffer from the issue we just mentioned.\n"
+"\n# Principal Component Regression vs Partial Least Squares Regression\n\nThis example compares [Principal Component Regression](https://en.wikipedia.org/wiki/Principal_component_regression) (PCR) and\n[Partial Least Squares Regression](https://en.wikipedia.org/wiki/Partial_least_squares_regression) (PLS) on a\ntoy dataset. Our goal is to illustrate how PLS can outperform PCR when the\ntarget is strongly correlated with some directions in the data that have a\nlow variance.\n\nPCR is a regressor composed of two steps: first,\n:class:`~sklearn.decomposition.PCA` is applied to the training data, possibly\nperforming dimensionality reduction; then, a regressor (e.g. a linear\nregressor) is trained on the transformed samples. In\n:class:`~sklearn.decomposition.PCA`, the transformation is purely\nunsupervised, meaning that no information about the targets is used. As a\nresult, PCR may perform poorly in some datasets where the target is strongly\ncorrelated with *directions* that have low variance. Indeed, the\ndimensionality reduction of PCA projects the data into a lower dimensional\nspace where the variance of the projected data is greedily maximized along\neach axis. Despite them having the most predictive power on the target, the\ndirections with a lower variance will be dropped, and the final regressor\nwill not be able to leverage them.\n\nPLS is both a transformer and a regressor, and it is quite similar to PCR: it\nalso applies a dimensionality reduction to the samples before applying a\nlinear regressor to the transformed data. The main difference with PCR is\nthat the PLS transformation is supervised. Therefore, as we will see in this\nexample, it does not suffer from the issue we just mentioned.\n"
 ]
 },
 {
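
A minimal sketch of the PCR-vs-PLS setup described in the notebook text above, not part of this commit; the synthetic dataset and the component count are assumptions made purely for illustration:

    # Illustrative sketch only: PCR as a PCA + linear-regression pipeline,
    # compared against supervised PLS on a synthetic regression problem.
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    X, y = make_regression(n_samples=500, n_features=10, noise=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # PCR: the PCA step is unsupervised, so components are picked by variance alone.
    pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X_train, y_train)
    # PLS: the projection is supervised, i.e. it uses y when choosing directions.
    pls = PLSRegression(n_components=2).fit(X_train, y_train)

    print(f"PCR R^2 on test data: {pcr.score(X_test, y_test):.3f}")
    print(f"PLS R^2 on test data: {pls.score(X_test, y_test):.3f}")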

1.1/_downloads/05ca8a4e90b4cc2acd69f9e24b4a1f3a/plot_classifier_chain_yeast.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Classifier Chain\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<https://www.openml.org/d/40597>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_similarity_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
+"\n# Classifier Chain\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the [yeast](https://www.openml.org/d/40597) dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_similarity_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
 ]
 },
 {
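
A minimal sketch of the chain-plus-ensemble idea described in the notebook text above, not part of this commit; a small synthetic multilabel dataset stands in for the yeast data, and the sizes are assumptions for illustration:

    # Illustrative sketch only: ten randomly ordered classifier chains whose
    # averaged binary predictions are thresholded at 0.5 and scored with Jaccard.
    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import jaccard_score
    from sklearn.model_selection import train_test_split
    from sklearn.multioutput import ClassifierChain

    X, Y = make_multilabel_classification(n_samples=300, n_classes=5, random_state=0)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

    chains = [
        ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=i)
        for i in range(10)
    ]
    for chain in chains:
        chain.fit(X_train, Y_train)

    # Ensemble: average the chains' binary predictions and apply a 0.5 threshold.
    Y_pred = np.mean([chain.predict(X_test) for chain in chains], axis=0) >= 0.5
    print(jaccard_score(Y_test, Y_pred, average="samples"))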

1.1/_downloads/06cfc926acb27652fb2aa5bfc583e7cb/plot_hashing_vs_dict_vectorizer.ipynb (+1 -1)

@@ -87,7 +87,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Observe in particular that the repeated token `\"is\"` is counted twice for\ninstance.\n\nBreaking a text document into word tokens, potentially losing the order\ninformation between the words in a sentence is often called a `Bag of Words\nrepresentation <https://en.wikipedia.org/wiki/Bag-of-words_model>`_.\n\n"
+"Observe in particular that the repeated token `\"is\"` is counted twice for\ninstance.\n\nBreaking a text document into word tokens, potentially losing the order\ninformation between the words in a sentence is often called a [Bag of Words\nrepresentation](https://en.wikipedia.org/wiki/Bag-of-words_model).\n\n"
 ]
 },
 {
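
The bag-of-words point made above (a repeated token such as "is" gets a count of 2) can be seen with a tiny sketch; the notebook itself compares DictVectorizer and FeatureHasher, but CountVectorizer and the toy sentence below are used here purely as an illustrative assumption:

    # Illustrative sketch only: token counts in a bag-of-words representation.
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["This document is short and this document is simple."]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)

    # Word order is discarded; repeated tokens such as "is" are counted twice.
    print(dict(zip(vectorizer.get_feature_names_out(), counts.toarray()[0])))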
(Binary file changed; contents not shown.)

1.1/_downloads/1c4a422dfa5bd721501d19a2b7e2499b/plot_species_kde.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Kernel Density Estimate of Species Distributions\nThis shows an example of a neighbors-based query (in particular a kernel\ndensity estimate) on geospatial data, using a Ball Tree built upon the\nHaversine distance metric -- i.e. distances over points in latitude/longitude.\nThe dataset is provided by Phillips et. al. (2006).\nIf available, the example uses\n`basemap <https://matplotlib.org/basemap/>`_\nto plot the coast lines and national boundaries of South America.\n\nThis example does not perform any learning over the data\n(see `sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py` for\nan example of classification based on the attributes in this dataset). It\nsimply shows the kernel density estimate of observed data points in\ngeospatial coordinates.\n\nThe two species are:\n\n - `\"Bradypus variegatus\"\n <https://www.iucnredlist.org/species/3038/47437046>`_ ,\n the Brown-throated Sloth.\n\n - `\"Microryzomys minutus\"\n <http://www.iucnredlist.org/details/13408/0>`_ ,\n also known as the Forest Small Rice Rat, a rodent that lives in Peru,\n Colombia, Ecuador, Peru, and Venezuela.\n\n## References\n\n * `\"Maximum entropy modeling of species geographic distributions\"\n <http://rob.schapire.net/papers/ecolmod.pdf>`_\n S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,\n 190:231-259, 2006.\n"
+"\n# Kernel Density Estimate of Species Distributions\nThis shows an example of a neighbors-based query (in particular a kernel\ndensity estimate) on geospatial data, using a Ball Tree built upon the\nHaversine distance metric -- i.e. distances over points in latitude/longitude.\nThe dataset is provided by Phillips et. al. (2006).\nIf available, the example uses\n[basemap](https://matplotlib.org/basemap/)\nto plot the coast lines and national boundaries of South America.\n\nThis example does not perform any learning over the data\n(see `sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py` for\nan example of classification based on the attributes in this dataset). It\nsimply shows the kernel density estimate of observed data points in\ngeospatial coordinates.\n\nThe two species are:\n\n - [\"Bradypus variegatus\"](https://www.iucnredlist.org/species/3038/47437046) ,\n the Brown-throated Sloth.\n\n - [\"Microryzomys minutus\"](http://www.iucnredlist.org/details/13408/0) ,\n also known as the Forest Small Rice Rat, a rodent that lives in Peru,\n Colombia, Ecuador, Peru, and Venezuela.\n\n## References\n\n * [\"Maximum entropy modeling of species geographic distributions\"](http://rob.schapire.net/papers/ecolmod.pdf)\n S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,\n 190:231-259, 2006.\n"
 ]
 },
 {
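
A minimal sketch of the kind of query described above, not part of this commit: a kernel density estimate over latitude/longitude points with the haversine metric and a Ball Tree. The coordinates and bandwidth below are made up for illustration rather than taken from the species dataset:

    # Illustrative sketch only: KDE on geospatial points using the haversine
    # metric, which expects (latitude, longitude) given in radians.
    import numpy as np
    from sklearn.neighbors import KernelDensity

    rng = np.random.default_rng(0)
    latlon_deg = np.column_stack(
        [rng.uniform(-20, 0, size=200), rng.uniform(-80, -60, size=200)]
    )
    latlon_rad = np.radians(latlon_deg)

    kde = KernelDensity(
        bandwidth=0.02, metric="haversine", kernel="gaussian", algorithm="ball_tree"
    )
    kde.fit(latlon_rad)
    print(np.exp(kde.score_samples(latlon_rad[:5])))  # density at a few points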

1.1/_downloads/215c560d29193ab9b0a495609bc74802/plot_monotonic_constraints.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Monotonic Constraints\n\nThis example illustrates the effect of monotonic constraints on a gradient\nboosting estimator.\n\nWe build an artificial dataset where the target value is in general\npositively correlated with the first feature (with some random and\nnon-random variations), and in general negatively correlated with the second\nfeature.\n\nBy imposing a positive (increasing) or negative (decreasing) constraint on\nthe features during the learning process, the estimator is able to properly\nfollow the general trend instead of being subject to the variations.\n\nThis example was inspired by the `XGBoost documentation\n<https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html>`_.\n"
+"\n# Monotonic Constraints\n\nThis example illustrates the effect of monotonic constraints on a gradient\nboosting estimator.\n\nWe build an artificial dataset where the target value is in general\npositively correlated with the first feature (with some random and\nnon-random variations), and in general negatively correlated with the second\nfeature.\n\nBy imposing a positive (increasing) or negative (decreasing) constraint on\nthe features during the learning process, the estimator is able to properly\nfollow the general trend instead of being subject to the variations.\n\nThis example was inspired by the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html).\n"
 ]
 },
 {
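
A minimal sketch of the constraint described above, not part of this commit; the feature/target relationship is a made-up assumption for illustration:

    # Illustrative sketch only: +1 enforces a monotonically increasing effect
    # of the first feature, -1 a decreasing effect of the second.
    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(1000, 2))
    y = 5 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=0.5, size=1000)

    model = HistGradientBoostingRegressor(monotonic_cst=[1, -1]).fit(X, y)
    print(model.score(X, y))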

1.1/_downloads/2402de18d671ce5087e3760b2540184f/plot_grid_search_stats.ipynb (+2 -2)

@@ -284,7 +284,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Pairwise comparison of all models: frequentist approach\n\nWe could also be interested in comparing the performance of all our models\nevaluated with :class:`~sklearn.model_selection.GridSearchCV`. In this case\nwe would be running our statistical test multiple times, which leads us to\nthe `multiple comparisons problem\n<https://en.wikipedia.org/wiki/Multiple_comparisons_problem>`_.\n\nThere are many possible ways to tackle this problem, but a standard approach\nis to apply a `Bonferroni correction\n<https://en.wikipedia.org/wiki/Bonferroni_correction>`_. Bonferroni can be\ncomputed by multiplying the p-value by the number of comparisons we are\ntesting.\n\nLet's compare the performance of the models using the corrected t-test:\n\n"
+"## Pairwise comparison of all models: frequentist approach\n\nWe could also be interested in comparing the performance of all our models\nevaluated with :class:`~sklearn.model_selection.GridSearchCV`. In this case\nwe would be running our statistical test multiple times, which leads us to\nthe [multiple comparisons problem](https://en.wikipedia.org/wiki/Multiple_comparisons_problem).\n\nThere are many possible ways to tackle this problem, but a standard approach\nis to apply a [Bonferroni correction](https://en.wikipedia.org/wiki/Bonferroni_correction). Bonferroni can be\ncomputed by multiplying the p-value by the number of comparisons we are\ntesting.\n\nLet's compare the performance of the models using the corrected t-test:\n\n"
 ]
 },
 {
@@ -341,7 +341,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-".. topic:: References\n\n .. [1] Dietterich, T. G. (1998). `Approximate statistical tests for\n comparing supervised classification learning algorithms\n <http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf>`_.\n Neural computation, 10(7).\n .. [2] Nadeau, C., & Bengio, Y. (2000). `Inference for the generalization\n error\n <https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf>`_.\n In Advances in neural information processing systems.\n .. [3] Bouckaert, R. R., & Frank, E. (2004). `Evaluating the replicability\n of significance tests for comparing learning algorithms\n <https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf>`_.\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n .. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). `Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis\n <http://www.jmlr.org/papers/volume18/16-305/16-305.pdf>`_.\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper `here\n <https://github.com/janezd/baycomp>`_.\n .. [5] Diebold, F.X. & Mariano R.S. (1995). `Comparing predictive accuracy\n <http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf>`_\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
+".. topic:: References\n\n .. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n Neural computation, 10(7).\n .. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n In Advances in neural information processing systems.\n .. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n .. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper [here](https://github.com/janezd/baycomp).\n .. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
 ]
 }
 ],
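
The Bonferroni step described in the first hunk above is just arithmetic on the uncorrected p-values; a small sketch with made-up values, not part of this commit:

    # Illustrative sketch only: Bonferroni-correct p-values from pairwise
    # model comparisons by multiplying each one by the number of comparisons.
    import numpy as np

    p_values = np.array([0.012, 0.034, 0.20])  # hypothetical uncorrected p-values
    n_comparisons = len(p_values)
    p_corrected = np.minimum(p_values * n_comparisons, 1.0)  # cap at 1.0
    print(p_corrected)  # [0.036 0.102 0.6]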

1.1/_downloads/26998096b90db15754e891c733ae032c/plot_iris_dataset.ipynb (+1 -1)

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# The Iris Dataset\nThis data sets consists of 3 different types of irises'\n(Setosa, Versicolour, and Virginica) petal and sepal\nlength, stored in a 150x4 numpy.ndarray\n\nThe rows being the samples and the columns being:\nSepal Length, Sepal Width, Petal Length and Petal Width.\n\nThe below plot uses the first two features.\nSee `here <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_ for more\ninformation on this dataset.\n"
+"\n# The Iris Dataset\nThis data sets consists of 3 different types of irises'\n(Setosa, Versicolour, and Virginica) petal and sepal\nlength, stored in a 150x4 numpy.ndarray\n\nThe rows being the samples and the columns being:\nSepal Length, Sepal Width, Petal Length and Petal Width.\n\nThe below plot uses the first two features.\nSee [here](https://en.wikipedia.org/wiki/Iris_flower_data_set) for more\ninformation on this dataset.\n"
 ]
 },
 {
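
A minimal sketch of loading the 150x4 array described above and plotting its first two features, not part of this commit and shown only as an illustration:

    # Illustrative sketch only: the iris data as a 150x4 array, scattered on
    # its first two features (sepal length and sepal width).
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target
    print(X.shape)  # (150, 4)

    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.show()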

1.1/_downloads/26f110ad6cff1a8a7c58b1a00d8b8b5a/plot_column_transformer_mixed_types.ipynb (+1 -1)

@@ -159,7 +159,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"<div class=\"alert alert-info\"><h4>Note</h4><p>In practice, you will have to handle yourself the column data type.\n If you want some columns to be considered as `category`, you will have to\n convert them into categorical columns. If you are using pandas, you can\n refer to their documentation regarding `Categorical data\n <https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html>`_.</p></div>\n\n"
+"<div class=\"alert alert-info\"><h4>Note</h4><p>In practice, you will have to handle yourself the column data type.\n If you want some columns to be considered as `category`, you will have to\n convert them into categorical columns. If you are using pandas, you can\n refer to their documentation regarding [Categorical data](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html).</p></div>\n\n"
 ]
 },
 {
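
The note above is about converting columns to pandas' category dtype yourself; a minimal sketch with a made-up dataframe (column names are assumptions, not from the example):

    # Illustrative sketch only: explicitly marking a column as categorical so
    # that downstream transformers can treat it as such.
    import pandas as pd

    df = pd.DataFrame({"embarked": ["S", "C", "S"], "age": [22.0, 38.0, 26.0]})
    df["embarked"] = df["embarked"].astype("category")
    print(df.dtypes)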
