
Commit 73b4e24 (parent 70b6b74)

Pushing the docs to dev/ for branch: main, commit 4fcbbc01be6406500f55b07c7b86278f17cb2f20

File tree

1,317 files changed: +5846 / -5745 lines


dev/.buildinfo
+1 -1

@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 6cd47419ab6a7583ffff2c1811ceb650
+config: 5744e0ab801bd7d341c57909164479c2
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/21a6ff17ef2837fe1cd49e63223a368d/plot_unveil_tree_structure.py
+31 -4

@@ -44,7 +44,8 @@
 #
 # The decision classifier has an attribute called ``tree_`` which allows access
 # to low level attributes such as ``node_count``, the total number of nodes,
-# and ``max_depth``, the maximal depth of the tree. It also stores the
+# and ``max_depth``, the maximal depth of the tree. The tree_.compute_node_depths()
+# method computes the depth of each node in the tree. `tree_` also stores the
 # entire binary tree structure, represented as a number of parallel arrays. The
 # i-th element of each array holds information about the node ``i``. Node 0 is
 # the tree's root. Some of the arrays only apply to either leaves or split
@@ -63,6 +64,10 @@
 # - ``n_node_samples[i]``: the number of training samples reaching node
 #   ``i``
 # - ``impurity[i]``: the impurity at node ``i``
+# - ``weighted_n_node_samples[i]``: the weighted number of training samples
+#   reaching node ``i``
+# - ``value[i, j, k]``: the summary of the training samples that reached node i for
+#   class j and output k.
 #
 # Using the arrays, we can traverse the tree structure to compute various
 # properties. Below, we will compute the depth of each node and whether or not
@@ -73,6 +78,7 @@
 children_right = clf.tree_.children_right
 feature = clf.tree_.feature
 threshold = clf.tree_.threshold
+values = clf.tree_.value
 
 node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
 is_leaves = np.zeros(shape=n_nodes, dtype=bool)
@@ -100,13 +106,13 @@
 for i in range(n_nodes):
     if is_leaves[i]:
         print(
-            "{space}node={node} is a leaf node.".format(
-                space=node_depth[i] * "\t", node=i
+            "{space}node={node} is a leaf node with value={value}.".format(
+                space=node_depth[i] * "\t", node=i, value=values[i]
             )
         )
     else:
         print(
-            "{space}node={node} is a split node: "
+            "{space}node={node} is a split node with value={value}: "
             "go to node {left} if X[:, {feature}] <= {threshold} "
             "else to node {right}.".format(
                 space=node_depth[i] * "\t",
@@ -115,9 +121,30 @@
                 feature=feature[i],
                 threshold=threshold[i],
                 right=children_right[i],
+                value=values[i],
             )
         )
 
+# %%
+# What is the values array used here?
+# -----------------------------------
+# The `tree_.value` array is a 3D array of shape
+# [``n_nodes``, ``n_classes``, ``n_outputs``] which provides the count of samples
+# reaching a node for each class and for each output. Each node has a ``value``
+# array which is the number of weighted samples reaching this
+# node for each output and class.
+#
+# For example, in the above tree built on the iris dataset, the root node has
+# ``value = [37, 34, 41]``, indicating there are 37 samples
+# of class 0, 34 samples of class 1, and 41 samples of class 2 at the root node.
+# Traversing the tree, the samples are split and as a result, the ``value`` array
+# reaching each node changes. The left child of the root node has ``value = [37, 0, 0]``
+# because all 37 samples in the left child node are from class 0.
+#
+# Note: In this example, `n_outputs=1`, but the tree classifier can also handle
+# multi-output problems. The `value` array at each node would just be a 2D
+# array instead.
+#
 ##############################################################################
 # We can compare the above output to the plot of the decision tree.
 
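The hunk above documents the ``tree_.value`` array. A minimal sketch, not part of the commit, of what inspecting that array looks like on iris; ``max_leaf_nodes=3`` and ``random_state=0`` are assumptions for a small tree, not values taken from the example script:

```python
# Minimal sketch (not part of the commit): inspecting ``clf.tree_.value``
# on the iris dataset. The tree parameters here are illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X_train, y_train)

values = clf.tree_.value
# One entry per node; here there is a single output and three iris classes.
# Depending on the scikit-learn version, entries are raw class counts or
# weighted per-class fractions, so we print rather than hard-code them.
print(values.shape)
print(values[0])  # per-class summary at the root node
```
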


dev/_downloads/f7a387851c5762610f4e8197e52bbbca/plot_unveil_tree_structure.ipynb
+9 -2

@@ -40,7 +40,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Tree structure\n\nThe decision classifier has an attribute called ``tree_`` which allows access\nto low level attributes such as ``node_count``, the total number of nodes,\nand ``max_depth``, the maximal depth of the tree. It also stores the\nentire binary tree structure, represented as a number of parallel arrays. The\ni-th element of each array holds information about the node ``i``. Node 0 is\nthe tree's root. Some of the arrays only apply to either leaves or split\nnodes. In this case the values of the nodes of the other type is arbitrary.\nFor example, the arrays ``feature`` and ``threshold`` only apply to split\nnodes. The values for leaf nodes in these arrays are therefore arbitrary.\n\nAmong these arrays, we have:\n\n - ``children_left[i]``: id of the left child of node ``i`` or -1 if leaf\n   node\n - ``children_right[i]``: id of the right child of node ``i`` or -1 if leaf\n   node\n - ``feature[i]``: feature used for splitting node ``i``\n - ``threshold[i]``: threshold value at node ``i``\n - ``n_node_samples[i]``: the number of training samples reaching node\n   ``i``\n - ``impurity[i]``: the impurity at node ``i``\n\nUsing the arrays, we can traverse the tree structure to compute various\nproperties. Below, we will compute the depth of each node and whether or not\nit is a leaf.\n\n"
+    "## Tree structure\n\nThe decision classifier has an attribute called ``tree_`` which allows access\nto low level attributes such as ``node_count``, the total number of nodes,\nand ``max_depth``, the maximal depth of the tree. The tree_.compute_node_depths()\nmethod computes the depth of each node in the tree. `tree_` also stores the\nentire binary tree structure, represented as a number of parallel arrays. The\ni-th element of each array holds information about the node ``i``. Node 0 is\nthe tree's root. Some of the arrays only apply to either leaves or split\nnodes. In this case the values of the nodes of the other type is arbitrary.\nFor example, the arrays ``feature`` and ``threshold`` only apply to split\nnodes. The values for leaf nodes in these arrays are therefore arbitrary.\n\nAmong these arrays, we have:\n\n - ``children_left[i]``: id of the left child of node ``i`` or -1 if leaf\n   node\n - ``children_right[i]``: id of the right child of node ``i`` or -1 if leaf\n   node\n - ``feature[i]``: feature used for splitting node ``i``\n - ``threshold[i]``: threshold value at node ``i``\n - ``n_node_samples[i]``: the number of training samples reaching node\n   ``i``\n - ``impurity[i]``: the impurity at node ``i``\n - ``weighted_n_node_samples[i]``: the weighted number of training samples\n   reaching node ``i``\n - ``value[i, j, k]``: the summary of the training samples that reached node i for\n   class j and output k.\n\nUsing the arrays, we can traverse the tree structure to compute various\nproperties. Below, we will compute the depth of each node and whether or not\nit is a leaf.\n\n"
    ]
   },
   {
@@ -51,7 +51,14 @@
    },
    "outputs": [],
    "source": [
-    "n_nodes = clf.tree_.node_count\nchildren_left = clf.tree_.children_left\nchildren_right = clf.tree_.children_right\nfeature = clf.tree_.feature\nthreshold = clf.tree_.threshold\n\nnode_depth = np.zeros(shape=n_nodes, dtype=np.int64)\nis_leaves = np.zeros(shape=n_nodes, dtype=bool)\nstack = [(0, 0)]  # start with the root node id (0) and its depth (0)\nwhile len(stack) > 0:\n    # `pop` ensures each node is only visited once\n    node_id, depth = stack.pop()\n    node_depth[node_id] = depth\n\n    # If the left and right child of a node is not the same we have a split\n    # node\n    is_split_node = children_left[node_id] != children_right[node_id]\n    # If a split node, append left and right children and depth to `stack`\n    # so we can loop through them\n    if is_split_node:\n        stack.append((children_left[node_id], depth + 1))\n        stack.append((children_right[node_id], depth + 1))\n    else:\n        is_leaves[node_id] = True\n\nprint(\n    \"The binary tree structure has {n} nodes and has \"\n    \"the following tree structure:\\n\".format(n=n_nodes)\n)\nfor i in range(n_nodes):\n    if is_leaves[i]:\n        print(\n            \"{space}node={node} is a leaf node.\".format(\n                space=node_depth[i] * \"\\t\", node=i\n            )\n        )\n    else:\n        print(\n            \"{space}node={node} is a split node: \"\n            \"go to node {left} if X[:, {feature}] <= {threshold} \"\n            \"else to node {right}.\".format(\n                space=node_depth[i] * \"\\t\",\n                node=i,\n                left=children_left[i],\n                feature=feature[i],\n                threshold=threshold[i],\n                right=children_right[i],\n            )\n        )"
+    "n_nodes = clf.tree_.node_count\nchildren_left = clf.tree_.children_left\nchildren_right = clf.tree_.children_right\nfeature = clf.tree_.feature\nthreshold = clf.tree_.threshold\nvalues = clf.tree_.value\n\nnode_depth = np.zeros(shape=n_nodes, dtype=np.int64)\nis_leaves = np.zeros(shape=n_nodes, dtype=bool)\nstack = [(0, 0)]  # start with the root node id (0) and its depth (0)\nwhile len(stack) > 0:\n    # `pop` ensures each node is only visited once\n    node_id, depth = stack.pop()\n    node_depth[node_id] = depth\n\n    # If the left and right child of a node is not the same we have a split\n    # node\n    is_split_node = children_left[node_id] != children_right[node_id]\n    # If a split node, append left and right children and depth to `stack`\n    # so we can loop through them\n    if is_split_node:\n        stack.append((children_left[node_id], depth + 1))\n        stack.append((children_right[node_id], depth + 1))\n    else:\n        is_leaves[node_id] = True\n\nprint(\n    \"The binary tree structure has {n} nodes and has \"\n    \"the following tree structure:\\n\".format(n=n_nodes)\n)\nfor i in range(n_nodes):\n    if is_leaves[i]:\n        print(\n            \"{space}node={node} is a leaf node with value={value}.\".format(\n                space=node_depth[i] * \"\\t\", node=i, value=values[i]\n            )\n        )\n    else:\n        print(\n            \"{space}node={node} is a split node with value={value}: \"\n            \"go to node {left} if X[:, {feature}] <= {threshold} \"\n            \"else to node {right}.\".format(\n                space=node_depth[i] * \"\\t\",\n                node=i,\n                left=children_left[i],\n                feature=feature[i],\n                threshold=threshold[i],\n                right=children_right[i],\n                value=values[i],\n            )\n        )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What is the values array used here?\nThe `tree_.value` array is a 3D array of shape\n[``n_nodes``, ``n_classes``, ``n_outputs``] which provides the count of samples\nreaching a node for each class and for each output. Each node has a ``value``\narray which is the number of weighted samples reaching this\nnode for each output and class.\n\nFor example, in the above tree built on the iris dataset, the root node has\n``value = [37, 34, 41]``, indicating there are 37 samples\nof class 0, 34 samples of class 1, and 41 samples of class 2 at the root node.\nTraversing the tree, the samples are split and as a result, the ``value`` array\nreaching each node changes. The left child of the root node has ``value = [37, 0, 0]``\nbecause all 37 samples in the left child node are from class 0.\n\nNote: In this example, `n_outputs=1`, but the tree classifier can also handle\nmulti-output problems. The `value` array at each node would just be a 2D\narray instead.\n\n"
    ]
   },
   {
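The notebook prose above also mentions ``tree_.compute_node_depths()``. A minimal sketch, not part of the commit, of the stack-based depth computation from the example, with an optional cross-check against that method; the tree parameters are assumptions for a small tree:

```python
# Minimal sketch (not part of the commit): compute each node's depth by
# the same stack-based traversal the example uses.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X, y)
tree = clf.tree_

# Manual traversal: pop a node, record its depth, push its children.
node_depth = np.zeros(tree.node_count, dtype=np.int64)
stack = [(0, 0)]  # (node id, depth), starting at the root
while stack:
    node_id, depth = stack.pop()
    node_depth[node_id] = depth
    # Split nodes have distinct children; leaves have children_left == children_right.
    if tree.children_left[node_id] != tree.children_right[node_id]:
        stack.append((tree.children_left[node_id], depth + 1))
        stack.append((tree.children_right[node_id], depth + 1))

print(node_depth)  # one depth per node; the deepest equals tree.max_depth

# Only call the helper if this scikit-learn version provides it.
if hasattr(tree, "compute_node_depths"):
    print(tree.compute_node_depths())
```
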

dev/_downloads/scikit-learn-docs.zip
-755 Bytes (binary file not shown)

dev/_sources/auto_examples/applications/plot_cyclical_feature_engineering.rst.txt  +1 -1
dev/_sources/auto_examples/applications/plot_digits_denoising.rst.txt  +1 -1
dev/_sources/auto_examples/applications/plot_face_recognition.rst.txt  +4 -4
dev/_sources/auto_examples/applications/plot_model_complexity_influence.rst.txt  +15 -15
