
Commit 81482cf

Pushing the docs to dev/ for branch: main, commit f6d6929f32e3eb7206719c4abde8addddd8a8f08
1 parent 32dc498 commit 81482cf

File tree

1,648 files changed, +7020 −7388 lines changed


dev/_downloads/02a1306a494b46cc56c930ceec6e8c4a/plot_species_kde.py

+11 −11

@@ -18,22 +18,22 @@
 
 The two species are:
 
- - `"Bradypus variegatus"
-   <https://www.iucnredlist.org/species/3038/47437046>`_ ,
-   the Brown-throated Sloth.
+- `"Bradypus variegatus"
+  <https://www.iucnredlist.org/species/3038/47437046>`_ ,
+  the Brown-throated Sloth.
 
- - `"Microryzomys minutus"
-   <http://www.iucnredlist.org/details/13408/0>`_ ,
-   also known as the Forest Small Rice Rat, a rodent that lives in Peru,
-   Colombia, Ecuador, Peru, and Venezuela.
+- `"Microryzomys minutus"
+  <http://www.iucnredlist.org/details/13408/0>`_ ,
+  also known as the Forest Small Rice Rat, a rodent that lives in Peru,
+  Colombia, Ecuador, Peru, and Venezuela.
 
 References
 ----------
 
- * `"Maximum entropy modeling of species geographic distributions"
-   <http://rob.schapire.net/papers/ecolmod.pdf>`_
-   S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,
-   190:231-259, 2006.
+- `"Maximum entropy modeling of species geographic distributions"
+  <http://rob.schapire.net/papers/ecolmod.pdf>`_
+  S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,
+  190:231-259, 2006.
 
 """ # noqa: E501
 
 # Authors: The scikit-learn developers
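A minimal sketch of the haversine kernel density estimate that this example's docstring describes, using scikit-learn's KernelDensity backed by a Ball Tree; the coordinates and bandwidth below are illustrative, not taken from the example:

import numpy as np
from sklearn.neighbors import KernelDensity

# Illustrative latitude/longitude observations in degrees (not the Phillips et al. data).
latlon_deg = np.array([[-5.0, -60.0], [-4.5, -61.2], [-6.1, -59.8]])

# The haversine metric expects [latitude, longitude] in radians.
X = np.radians(latlon_deg)

# Kernel density estimate on the sphere; the bandwidth is also in radians here.
kde = KernelDensity(bandwidth=0.04, metric="haversine", kernel="gaussian", algorithm="ball_tree")
kde.fit(X)

# score_samples returns the log-density; exponentiate to get a density value.
grid = np.radians(np.array([[-5.2, -60.5]]))
print(np.exp(kde.score_samples(grid)))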

dev/_downloads/1c4a422dfa5bd721501d19a2b7e2499b/plot_species_kde.ipynb

+1 −1

@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "\n# Kernel Density Estimate of Species Distributions\nThis shows an example of a neighbors-based query (in particular a kernel\ndensity estimate) on geospatial data, using a Ball Tree built upon the\nHaversine distance metric -- i.e. distances over points in latitude/longitude.\nThe dataset is provided by Phillips et. al. (2006).\nIf available, the example uses\n[basemap](https://matplotlib.org/basemap/)\nto plot the coast lines and national boundaries of South America.\n\nThis example does not perform any learning over the data\n(see `sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py` for\nan example of classification based on the attributes in this dataset). It\nsimply shows the kernel density estimate of observed data points in\ngeospatial coordinates.\n\nThe two species are:\n\n - [\"Bradypus variegatus\"](https://www.iucnredlist.org/species/3038/47437046) ,\n the Brown-throated Sloth.\n\n - [\"Microryzomys minutus\"](http://www.iucnredlist.org/details/13408/0) ,\n also known as the Forest Small Rice Rat, a rodent that lives in Peru,\n Colombia, Ecuador, Peru, and Venezuela.\n\n## References\n\n * [\"Maximum entropy modeling of species geographic distributions\"](http://rob.schapire.net/papers/ecolmod.pdf)\n S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,\n 190:231-259, 2006.\n"
+    "\n# Kernel Density Estimate of Species Distributions\nThis shows an example of a neighbors-based query (in particular a kernel\ndensity estimate) on geospatial data, using a Ball Tree built upon the\nHaversine distance metric -- i.e. distances over points in latitude/longitude.\nThe dataset is provided by Phillips et. al. (2006).\nIf available, the example uses\n[basemap](https://matplotlib.org/basemap/)\nto plot the coast lines and national boundaries of South America.\n\nThis example does not perform any learning over the data\n(see `sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py` for\nan example of classification based on the attributes in this dataset). It\nsimply shows the kernel density estimate of observed data points in\ngeospatial coordinates.\n\nThe two species are:\n\n- [\"Bradypus variegatus\"](https://www.iucnredlist.org/species/3038/47437046) ,\n the Brown-throated Sloth.\n\n- [\"Microryzomys minutus\"](http://www.iucnredlist.org/details/13408/0) ,\n also known as the Forest Small Rice Rat, a rodent that lives in Peru,\n Colombia, Ecuador, Peru, and Venezuela.\n\n## References\n\n- [\"Maximum entropy modeling of species geographic distributions\"](http://rob.schapire.net/papers/ecolmod.pdf)\n S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling,\n 190:231-259, 2006.\n"
   ]
  },
  {

dev/_downloads/1ed4d16a866c9fe4d86a05477e6d0664/plot_svm_scale_c.ipynb

+1 −1

@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "\n# Scaling the regularization parameter for SVCs\n\nThe following example illustrates the effect of scaling the regularization\nparameter when using `svm` for `classification <svm_classification>`.\nFor SVC classification, we are interested in a risk minimization for the\nequation:\n\n\n\\begin{align}C \\sum_{i=1, n} \\mathcal{L} (f(x_i), y_i) + \\Omega (w)\\end{align}\n\nwhere\n\n - $C$ is used to set the amount of regularization\n - $\\mathcal{L}$ is a `loss` function of our samples\n and our model parameters.\n - $\\Omega$ is a `penalty` function of our model parameters\n\nIf we consider the loss function to be the individual error per sample, then the\ndata-fit term, or the sum of the error for each sample, increases as we add more\nsamples. The penalization term, however, does not increase.\n\nWhen using, for example, `cross validation <cross_validation>`, to set the\namount of regularization with `C`, there would be a different amount of samples\nbetween the main problem and the smaller problems within the folds of the cross\nvalidation.\n\nSince the loss function dependens on the amount of samples, the latter\ninfluences the selected value of `C`. The question that arises is \"How do we\noptimally adjust C to account for the different amount of training samples?\"\n"
+    "\n# Scaling the regularization parameter for SVCs\n\nThe following example illustrates the effect of scaling the regularization\nparameter when using `svm` for `classification <svm_classification>`.\nFor SVC classification, we are interested in a risk minimization for the\nequation:\n\n\n\\begin{align}C \\sum_{i=1, n} \\mathcal{L} (f(x_i), y_i) + \\Omega (w)\\end{align}\n\nwhere\n\n- $C$ is used to set the amount of regularization\n- $\\mathcal{L}$ is a `loss` function of our samples and our model parameters.\n- $\\Omega$ is a `penalty` function of our model parameters\n\nIf we consider the loss function to be the individual error per sample, then the\ndata-fit term, or the sum of the error for each sample, increases as we add more\nsamples. The penalization term, however, does not increase.\n\nWhen using, for example, `cross validation <cross_validation>`, to set the\namount of regularization with `C`, there would be a different amount of samples\nbetween the main problem and the smaller problems within the folds of the cross\nvalidation.\n\nSince the loss function dependens on the amount of samples, the latter\ninfluences the selected value of `C`. The question that arises is \"How do we\noptimally adjust C to account for the different amount of training samples?\"\n"
   ]
  },
  {
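As a rough, hedged illustration of the question this notebook poses, the sketch below fits LinearSVC on subsets of different sizes with a fixed C and with a C rescaled by the subset size; the synthetic data and the 1/n rescaling are illustrative assumptions, not the example's exact procedure:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_C = 1.0
for n in (100, 300, 1000):
    X_sub, y_sub = X_train[:n], y_train[:n]
    # Fixed C: the data-fit term grows with n while the penalty term does not.
    fixed = LinearSVC(C=base_C, max_iter=10_000).fit(X_sub, y_sub)
    # Rescaled C: one way to keep the effective regularization comparable across n.
    scaled = LinearSVC(C=base_C / n, max_iter=10_000).fit(X_sub, y_sub)
    print(n, fixed.score(X_test, y_test), scaled.score(X_test, y_test))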

dev/_downloads/21a6ff17ef2837fe1cd49e63223a368d/plot_unveil_tree_structure.py

+11 −14

@@ -59,20 +59,17 @@
 #
 # Among these arrays, we have:
 #
-# - ``children_left[i]``: id of the left child of node ``i`` or -1 if leaf
-#   node
-# - ``children_right[i]``: id of the right child of node ``i`` or -1 if leaf
-#   node
-# - ``feature[i]``: feature used for splitting node ``i``
-# - ``threshold[i]``: threshold value at node ``i``
-# - ``n_node_samples[i]``: the number of training samples reaching node
-#   ``i``
-# - ``impurity[i]``: the impurity at node ``i``
-# - ``weighted_n_node_samples[i]``: the weighted number of training samples
-#   reaching node ``i``
-# - ``value[i, j, k]``: the summary of the training samples that reached node i for
-#   output j and class k (for regression tree, class is set to 1). See below
-#   for more information about ``value``.
+# - ``children_left[i]``: id of the left child of node ``i`` or -1 if leaf node
+# - ``children_right[i]``: id of the right child of node ``i`` or -1 if leaf node
+# - ``feature[i]``: feature used for splitting node ``i``
+# - ``threshold[i]``: threshold value at node ``i``
+# - ``n_node_samples[i]``: the number of training samples reaching node ``i``
+# - ``impurity[i]``: the impurity at node ``i``
+# - ``weighted_n_node_samples[i]``: the weighted number of training samples
+#   reaching node ``i``
+# - ``value[i, j, k]``: the summary of the training samples that reached node i for
+#   output j and class k (for regression tree, class is set to 1). See below
+#   for more information about ``value``.
 #
 # Using the arrays, we can traverse the tree structure to compute various
 # properties. Below, we will compute the depth of each node and whether or not
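The arrays listed above can be walked directly once a tree is fitted; a minimal sketch of such a traversal (the iris data, max_depth, and the printed summary are illustrative):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree = clf.tree_

# Iterative depth-first walk over the array representation of the tree.
stack = [(0, 0)]  # (node id, depth), starting at the root
while stack:
    node, depth = stack.pop()
    is_leaf = tree.children_left[node] == tree.children_right[node]
    if is_leaf:
        print(f"node {node} at depth {depth} is a leaf, impurity={tree.impurity[node]:.3f}")
    else:
        print(
            f"node {node} at depth {depth} splits on feature {tree.feature[node]} "
            f"at threshold {tree.threshold[node]:.3f}"
        )
        stack.append((tree.children_left[node], depth + 1))
        stack.append((tree.children_right[node], depth + 1))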

dev/_downloads/23e3d7fa2388aef4e9a60c4a6caf166d/plot_face_recognition.ipynb

+1 −1

@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "\n# Faces recognition example using eigenfaces and SVMs\n\nThe dataset used in this example is a preprocessed excerpt of the\n\"Labeled Faces in the Wild\", aka LFW_:\n\n http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)\n\n"
+    "\n# Faces recognition example using eigenfaces and SVMs\n\nThe dataset used in this example is a preprocessed excerpt of the\n\"Labeled Faces in the Wild\", aka LFW_:\nhttp://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)\n\n"
   ]
  },
  {
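A hedged sketch of the eigenfaces-plus-SVM idea this notebook's title refers to, assuming scikit-learn's LFW fetcher and illustrative parameter values rather than the example's tuned settings:

from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Downloads the LFW data on first use; min_faces_per_person and resize are illustrative.
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X_train, X_test, y_train, y_test = train_test_split(lfw.data, lfw.target, random_state=0)

# PCA extracts the "eigenfaces"; the SVC then classifies in that reduced space.
model = make_pipeline(
    PCA(n_components=150, whiten=True, random_state=0),
    SVC(kernel="rbf", class_weight="balanced"),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))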

dev/_downloads/3c9b7bcd0b16f172ac12ffad61f3b5f0/plot_stack_predictors.ipynb

+4 −4

@@ -22,7 +22,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Download the dataset\n\n We will use the `Ames Housing`_ dataset which was first compiled by Dean De Cock\n and became better known after it was used in Kaggle challenge. It is a set\n of 1460 residential homes in Ames, Iowa, each described by 80 features. We\n will use it to predict the final logarithmic price of the houses. In this\n example we will use only 20 most interesting features chosen using\n GradientBoostingRegressor() and limit number of entries (here we won't go\n into the details on how to select the most interesting features).\n\n The Ames housing dataset is not shipped with scikit-learn and therefore we\n will fetch it from `OpenML`_.\n\n\n"
+    "## Download the dataset\n\nWe will use the `Ames Housing`_ dataset which was first compiled by Dean De Cock\nand became better known after it was used in Kaggle challenge. It is a set\nof 1460 residential homes in Ames, Iowa, each described by 80 features. We\nwill use it to predict the final logarithmic price of the houses. In this\nexample we will use only 20 most interesting features chosen using\nGradientBoostingRegressor() and limit number of entries (here we won't go\ninto the details on how to select the most interesting features).\n\nThe Ames housing dataset is not shipped with scikit-learn and therefore we\nwill fetch it from `OpenML`_.\n\n\n"
   ]
  },
  {
@@ -40,7 +40,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Make pipeline to preprocess the data\n\n Before we can use Ames dataset we still need to do some preprocessing.\n First, we will select the categorical and numerical columns of the dataset to\n construct the first step of the pipeline.\n\n"
+    "## Make pipeline to preprocess the data\n\nBefore we can use Ames dataset we still need to do some preprocessing.\nFirst, we will select the categorical and numerical columns of the dataset to\nconstruct the first step of the pipeline.\n\n"
   ]
  },
  {
@@ -105,7 +105,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Stack of predictors on a single data set\n\n It is sometimes tedious to find the model which will best perform on a given\n dataset. Stacking provide an alternative by combining the outputs of several\n learners, without the need to choose a model specifically. The performance of\n stacking is usually close to the best model and sometimes it can outperform\n the prediction performance of each individual model.\n\n Here, we combine 3 learners (linear and non-linear) and use a ridge regressor\n to combine their outputs together.\n\n .. note::\n Although we will make new pipelines with the processors which we wrote in\n the previous section for the 3 learners, the final estimator\n :class:`~sklearn.linear_model.RidgeCV()` does not need preprocessing of\n the data as it will be fed with the already preprocessed output from the 3\n learners.\n\n"
+    "## Stack of predictors on a single data set\n\nIt is sometimes tedious to find the model which will best perform on a given\ndataset. Stacking provide an alternative by combining the outputs of several\nlearners, without the need to choose a model specifically. The performance of\nstacking is usually close to the best model and sometimes it can outperform\nthe prediction performance of each individual model.\n\nHere, we combine 3 learners (linear and non-linear) and use a ridge regressor\nto combine their outputs together.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Although we will make new pipelines with the processors which we wrote in\n    the previous section for the 3 learners, the final estimator\n    :class:`~sklearn.linear_model.RidgeCV()` does not need preprocessing of\n    the data as it will be fed with the already preprocessed output from the 3\n    learners.</p></div>\n\n"
   ]
  },
  {
@@ -156,7 +156,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Measure and plot the results\n\n Now we can use Ames Housing dataset to make the predictions. We check the\n performance of each individual predictor as well as of the stack of the\n regressors.\n\n"
+    "## Measure and plot the results\n\nNow we can use Ames Housing dataset to make the predictions. We check the\nperformance of each individual predictor as well as of the stack of the\nregressors.\n\n"
   ]
  },
  {
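For the stacking idea discussed in this notebook, a minimal sketch with three base learners and a RidgeCV final estimator; the learners and the synthetic data are placeholders, not the example's tuned pipelines:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("lasso", LassoCV()),
        ("knn", KNeighborsRegressor(n_neighbors=10)),
        ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
    ],
    # The final estimator only sees the base learners' predictions,
    # so it needs no preprocessing of the raw features.
    final_estimator=RidgeCV(),
)
stack.fit(X, y)
print(stack.score(X, y))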

dev/_downloads/3d58721191491072eecc520f0a45cdb3/plot_lasso_and_elasticnet.py

+4 −4

@@ -7,9 +7,9 @@
 signal obtained from sparse and correlated features that are further corrupted
 with additive gaussian noise:
 
-- a :ref:`lasso`;
-- an :ref:`automatic_relevance_determination`;
-- an :ref:`elastic_net`.
+- a :ref:`lasso`;
+- an :ref:`automatic_relevance_determination`;
+- an :ref:`elastic_net`.
 
 It is known that the Lasso estimates turn to be close to the model selection
 estimates when the data dimensions grow, given that the irrelevant variables are
@@ -244,6 +244,6 @@
 # References
 # ----------
 #
-# .. [1] :doi:`"Lasso-type recovery of sparse representations for
+# .. [1] :doi:`"Lasso-type recovery of sparse representations for
 # high-dimensional data" N. Meinshausen, B. Yu - The Annals of Statistics
 # 2009, Vol. 37, No. 1, 246-270 <10.1214/07-AOS582>`
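A hedged sketch of fitting the three estimators this docstring names on a sparse-signal problem; the data generation and hyperparameters are illustrative, not the example's setup:

from sklearn.datasets import make_regression
from sklearn.linear_model import ARDRegression, ElasticNetCV, LassoCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Sparse ground truth: only 10 of 100 features carry signal, plus Gaussian noise.
X, y = make_regression(
    n_samples=200, n_features=100, n_informative=10, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Lasso": LassoCV(),
    "ARD": ARDRegression(),
    "ElasticNet": ElasticNetCV(l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, r2_score(y_test, model.predict(X_test)))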

dev/_downloads/51bc3899ceeec0ecf99c5f72ff1fd241/wikipedia_principal_eigenvector.py

+2 −8

@@ -6,10 +6,7 @@
 A classical way to assert the relative importance of vertices in a
 graph is to compute the principal eigenvector of the adjacency matrix
 so as to assign to each vertex the values of the components of the first
-eigenvector as a centrality score:
-
-https://en.wikipedia.org/wiki/Eigenvector_centrality
-
+eigenvector as a centrality score: https://en.wikipedia.org/wiki/Eigenvector_centrality.
 On the graph of webpages and links those values are called the PageRank
 scores by Google.
 
@@ -18,10 +15,7 @@
 this eigenvector centrality.
 
 The traditional way to compute the principal eigenvector is to use the
-power iteration method:
-
-https://en.wikipedia.org/wiki/Power_iteration
-
+`power iteration method <https://en.wikipedia.org/wiki/Power_iteration>`_.
 Here the computation is achieved thanks to Martinsson's Randomized SVD
 algorithm implemented in scikit-learn.
 
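A short sketch of the power iteration method mentioned above, on a toy adjacency matrix; NumPy only, and the matrix is illustrative:

import numpy as np

# Toy symmetric adjacency matrix of a 4-vertex graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Power iteration: repeatedly apply A and renormalize until the vector stabilizes.
v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
for _ in range(100):
    w = A @ v
    w /= np.linalg.norm(w)
    if np.allclose(w, v, atol=1e-10):
        break
    v = w

print("eigenvector centrality scores:", v)

The example itself obtains the same quantity at scale via the randomized SVD routine noted in the surrounding context lines.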

dev/_downloads/521b554adefca348463adbbe047d7e99/plot_linear_model_coefficient_interpretation.py

+1 −6

@@ -32,11 +32,6 @@
 We will use data from the `"Current Population Survey"
 <https://www.openml.org/d/534>`_ from 1985 to predict wage as a function of
 various features such as experience, age, or education.
-
-.. contents::
-   :local:
-   :depth: 1
-
 """
 
 # Authors: The scikit-learn developers
@@ -98,7 +93,7 @@
 # at the pairwise relationships between them. Only numerical
 # variables will be used. In the following plot, each dot represents a sample.
 #
-# .. _marginal_dependencies:
+# .. _marginal_dependencies:
 
 train_dataset = X_train.copy()
 train_dataset.insert(0, "WAGE", y_train)
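A minimal, hedged sketch of the setup this docstring describes: fetching the 1985 Current Population Survey data from OpenML and fitting a simple linear model whose coefficients can then be inspected. The bare-bones preprocessing here is a stand-in for the example's full pipeline:

from sklearn.compose import make_column_transformer
from sklearn.datasets import fetch_openml
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# OpenML dataset 534 is the 1985 CPS wage survey referenced above.
survey = fetch_openml(data_id=534, as_frame=True)
X, y = survey.data, survey.target

numeric = X.select_dtypes("number").columns
categorical = X.columns.difference(numeric)

preprocess = make_column_transformer(
    (StandardScaler(), numeric),
    (OneHotEncoder(handle_unknown="ignore"), categorical),
)
model = make_pipeline(preprocess, Ridge(alpha=1.0)).fit(X, y)
print(model[-1].coef_[:5])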

dev/_downloads/59fa6c0bdd3e601dd83ea28de5feeffd/plot_random_multilabel_dataset.py

+11 −11

@@ -9,17 +9,17 @@
 
 Points are labeled as follows, where Y means the class is present:
 
-===== ===== ===== ======
-  1     2     3   Color
-===== ===== ===== ======
-  Y     N     N   Red
-  N     Y     N   Blue
-  N     N     Y   Yellow
-  Y     Y     N   Purple
-  Y     N     Y   Orange
-  Y     Y     N   Green
-  Y     Y     Y   Brown
-===== ===== ===== ======
+===== ===== ===== ======
+  1     2     3   Color
+===== ===== ===== ======
+  Y     N     N   Red
+  N     Y     N   Blue
+  N     N     Y   Yellow
+  Y     Y     N   Purple
+  Y     N     Y   Orange
+  Y     Y     N   Green
+  Y     Y     Y   Brown
+===== ===== ===== ======
 
 A star marks the expected sample for each class; its size reflects the
 probability of selecting that class label.
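A hedged sketch of generating such a random multilabel dataset with three classes; the parameters are illustrative, not the example's exact values:

from sklearn.datasets import make_multilabel_classification

# 2D points with 3 possible labels per sample; each row of Y is a 0/1 indicator
# for classes 1, 2 and 3, matching the combinations in the table above.
X, Y = make_multilabel_classification(
    n_samples=100, n_features=2, n_classes=3, n_labels=1,
    allow_unlabeled=False, random_state=0,
)
print(X.shape, Y.shape)
print(Y[:5])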

dev/_downloads/65b807da1fd0f3cbb60c1425fddba026/plot_multilabel.ipynb

+1 −1

@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "\n# Multilabel classification\n\nThis example simulates a multi-label document classification problem. The\ndataset is generated randomly based on the following process:\n\n - pick the number of labels: n ~ Poisson(n_labels)\n - n times, choose a class c: c ~ Multinomial(theta)\n - pick the document length: k ~ Poisson(length)\n - k times, choose a word: w ~ Multinomial(theta_c)\n\nIn the above process, rejection sampling is used to make sure that n is more\nthan 2, and that the document length is never zero. Likewise, we reject classes\nwhich have already been chosen. The documents that are assigned to both\nclasses are plotted surrounded by two colored circles.\n\nThe classification is performed by projecting to the first two principal\ncomponents found by PCA and CCA for visualisation purposes, followed by using\nthe :class:`~sklearn.multiclass.OneVsRestClassifier` metaclassifier using two\nSVCs with linear kernels to learn a discriminative model for each class.\nNote that PCA is used to perform an unsupervised dimensionality reduction,\nwhile CCA is used to perform a supervised one.\n\nNote: in the plot, \"unlabeled samples\" does not mean that we don't know the\nlabels (as in semi-supervised learning) but that the samples simply do *not*\nhave a label.\n"
+    "\n# Multilabel classification\n\nThis example simulates a multi-label document classification problem. The\ndataset is generated randomly based on the following process:\n\n- pick the number of labels: n ~ Poisson(n_labels)\n- n times, choose a class c: c ~ Multinomial(theta)\n- pick the document length: k ~ Poisson(length)\n- k times, choose a word: w ~ Multinomial(theta_c)\n\nIn the above process, rejection sampling is used to make sure that n is more\nthan 2, and that the document length is never zero. Likewise, we reject classes\nwhich have already been chosen. The documents that are assigned to both\nclasses are plotted surrounded by two colored circles.\n\nThe classification is performed by projecting to the first two principal\ncomponents found by PCA and CCA for visualisation purposes, followed by using\nthe :class:`~sklearn.multiclass.OneVsRestClassifier` metaclassifier using two\nSVCs with linear kernels to learn a discriminative model for each class.\nNote that PCA is used to perform an unsupervised dimensionality reduction,\nwhile CCA is used to perform a supervised one.\n\nNote: in the plot, \"unlabeled samples\" does not mean that we don't know the\nlabels (as in semi-supervised learning) but that the samples simply do *not*\nhave a label.\n"
   ]
  },
  {
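A minimal sketch of the classification step this notebook describes, projecting with PCA and fitting a one-vs-rest linear SVC; the generated data and parameters are illustrative:

from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import PCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, Y = make_multilabel_classification(
    n_classes=2, n_labels=1, allow_unlabeled=False, random_state=1
)

# Unsupervised 2D projection for visualisation, then one linear SVC per class.
X_2d = PCA(n_components=2).fit_transform(X)
clf = OneVsRestClassifier(SVC(kernel="linear"))
clf.fit(X_2d, Y)
print(clf.predict(X_2d[:5]))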
