
Random Forests

One of the best known classifiers is the random forest. It is very simple and effective but
there is still a large gap between theory and practice. Basically, a random forest is an average
of tree estimators.

These notes rely heavily on Biau and Scornet (2016) as well as the other references at the
end of the notes.

1 Partitions and Trees

We begin by reviewing trees. As with nonparametric regression, simple and interpretable
classifiers can be derived by partitioning the range of $X$. Let $\Pi_n = \{A_1, \ldots, A_N\}$ be
a partition of $\mathcal{X}$ and let $A_j$ be the partition element that contains $x$. Then $\hat h(x) = 1$ if
$\sum_{X_i \in A_j} Y_i \ge \sum_{X_i \in A_j} (1 - Y_i)$ and $\hat h(x) = 0$ otherwise. This is nothing other than the plug-in
classifier based on the partition regression estimator
$$
\hat m(x) = \sum_{j=1}^{N} \overline{Y}_j \, I(x \in A_j)
$$

where $\overline{Y}_j = n_j^{-1} \sum_{i=1}^n Y_i I(X_i \in A_j)$ is the average of the $Y_i$'s in $A_j$ and $n_j = \#\{X_i \in A_j\}$.
(We define $\overline{Y}_j$ to be 0 if $n_j = 0$.)

Recall from the results on regression that if $m \in H_1(1, L)$ and the binwidth $b$ of a regular
partition satisfies $b \asymp n^{-1/(d+2)}$ then
$$
\mathbb{E}\,\|\hat m - m\|_P^2 \le \frac{c}{n^{2/(d+2)}}. \tag{1}
$$
We conclude that the corresponding classification risk satisfies $R(\hat h) - R(h^*) = O(n^{-1/(d+2)})$.

Regression trees and classification trees (also called decision trees) are partition classifiers
where the partition is built recursively. For illustration, suppose there are two covariates,
X1 = age and X2 = blood pressure. Figure 1 shows a classification tree using these variables.

The tree is used in the following way. If a subject has Age ≥ 50 then we classify him as
Y = 1. If a subject has Age < 50 then we check his blood pressure. If systolic blood pressure
is < 100 then we classify him as Y = 1, otherwise we classify him as Y = 0. Figure 2 shows
the same classifier as a partition of the covariate space.

Here is how a tree is constructed. First, suppose that there is only a single covariate $X$. We
choose a split point $t$ that divides the real line into two sets $A_1 = (-\infty, t]$ and $A_2 = (t, \infty)$.
Let $\overline{Y}_1$ be the mean of the $Y_i$'s in $A_1$ and let $\overline{Y}_2$ be the mean of the $Y_i$'s in $A_2$.

[Figure 1: A simple classification tree. The root splits on Age at 50; the Age < 50 branch splits on Blood Pressure at 100.]

[Figure 2: Partition representation of the classification tree, with Age on the horizontal axis and Blood Pressure on the vertical axis.]

For continuous $Y$ (regression), the split is chosen to minimize the training error. For binary
$Y$ (classification), the split is chosen to minimize a surrogate for the classification error. A
common choice is the impurity defined by $I(t) = \sum_{s=1}^{2} \gamma_s$ where
$$
\gamma_s = 1 - \left[\overline{Y}_s^{\,2} + (1 - \overline{Y}_s)^2\right]. \tag{2}
$$
This particular measure of impurity is known as the Gini index. If a partition element $A_s$
contains all 0's or all 1's, then $\gamma_s = 0$; otherwise $\gamma_s > 0$. We choose the split point $t$ to
minimize the impurity. Other indices of impurity besides the Gini index can be used, such
as the entropy. The reason for using impurity rather than classification error is that impurity
is a smooth function and hence is easy to minimize.
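
Below is a short sketch of the one-covariate split search that minimizes $I(t) = \gamma_1 + \gamma_2$ from (2). Many practical implementations weight the child impurities by their sizes; this sketch follows the unweighted sum used in the text, and the data are illustrative.

```python
import numpy as np

def gini(y):
    """Gini impurity 1 - [p^2 + (1 - p)^2] for binary labels y."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def best_split(x, y):
    """Split point t minimizing the summed impurity of A1 = (-inf, t], A2 = (t, inf)."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_t, best_impurity = None, np.inf
    # Candidate splits: midpoints between consecutive distinct x values.
    for i in range(len(x_sorted) - 1):
        if x_sorted[i] == x_sorted[i + 1]:
            continue
        t = 0.5 * (x_sorted[i] + x_sorted[i + 1])
        impurity = gini(y_sorted[:i + 1]) + gini(y_sorted[i + 1:])
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = (x > 0.3).astype(int)          # labels change at 0.3
print(best_split(x, y))            # recovers a split near 0.3 with impurity 0
```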

Now we continue recursively splitting until some stopping criterion is met. For example, we
might stop when every partition element has fewer than $n_0$ data points, where $n_0$ is some
fixed number. The bottom nodes of the tree are called the leaves. Each leaf has an estimate
$\hat m(x)$, which is the mean of the $Y_i$'s in that leaf. For classification, we take $\hat h(x) = I(\hat m(x) > 1/2)$.
When there are several covariates, we choose whichever covariate and split point lead to the
lowest impurity.

The result is a piecewise constant estimator that can be represented as a tree.

2 Example

The following data are from simulated images of gamma ray events for the Major Atmo-
spheric Gamma-ray Imaging Cherenkov Telescope (MAGIC) in the Canary Islands. The
data are from archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope. The telescope
studies gamma ray bursts, active galactic nuclei and supernova remnants. The goal is to
predict whether an event is a real gamma-ray event or background (a hadronic shower). There
are 10 predictors that are numerical summaries of the images. We randomly selected 400
training points (200 positive and 200 negative) and 1000 test cases (500 positive and 500
negative). The results of various methods are in Table 1. See Figures 3, 4, 5 and 6.
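
Here is a hedged sketch of how the tree fit in this example could be reproduced with scikit-learn. The file name magic04.data, its column layout, the class coding ('g' for gamma, 'h' for hadron) and the stopping rule min_samples_leaf=10 are assumptions about the UCI distribution and a stand-in for the cross-validated pruning used in the notes.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

cols = [f"V{i}" for i in range(1, 11)] + ["cls"]
df = pd.read_csv("magic04.data", header=None, names=cols)   # assumed file name/layout
y = (df["cls"] == "g").astype(int).to_numpy()               # 1 = gamma (signal), 0 = hadron
X = df[cols[:-1]].to_numpy()

rng = np.random.default_rng(0)

def balanced_sample(label, size):
    # Draw `size` indices with the given label, without replacement.
    idx = np.flatnonzero(y == label)
    return rng.choice(idx, size=size, replace=False)

train = np.concatenate([balanced_sample(1, 200), balanced_sample(0, 200)])
rest = np.setdiff1d(np.arange(len(y)), train)
test = np.concatenate([rng.choice(rest[y[rest] == 1], 500, replace=False),
                       rng.choice(rest[y[rest] == 0], 500, replace=False)])

tree = DecisionTreeClassifier(min_samples_leaf=10, random_state=0)
tree.fit(X[train], y[train])
print("tree test error:", np.mean(tree.predict(X[test]) != y[test]))
```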

3 Bagging

Trees are useful for their simplicity and interpretability. But the prediction error can be
reduced by combining many trees. A common approach, called bagging, is as follows.

Suppose we draw $B$ bootstrap samples and each time we construct a classifier. This gives
tree classifiers $h_1, \ldots, h_B$. (The same idea applies to regression.)

Method Test Error
Logistic regression 0.23
SVM (Gaussian Kernel) 0.20
Kernel Regression 0.24
Additive Model 0.20
Reduced Additive Model 0.20
11-NN 0.25
Trees 0.20

Table 1: Various methods on the MAGIC data. The reduced additive model is based on
using the three most significant variables from the additive model.
[Figure 3: Estimated functions for the additive model.]

[Figure 4: Test error versus k for the nearest neighbor estimator.]

[Figure 5: Full tree. The splits are on the standardized predictors xtrain.V1–xtrain.V10.]

[Figure 6: Classification tree. The size of the tree was chosen by cross-validation.]

We now classify by combining them by majority vote:
$$
\hat h(x) =
\begin{cases}
1 & \text{if } \frac{1}{B} \sum_{j=1}^{B} h_j(x) \ge \frac{1}{2} \\
0 & \text{otherwise.}
\end{cases}
$$
This is called bagging, which stands for bootstrap aggregation. A variation is sub-bagging,
where we use subsamples instead of bootstrap samples.
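
A minimal sketch of bagged classification trees follows; the base learner, the number of bootstrap samples and the toy data are illustrative. Sub-bagging corresponds to drawing the index sets without replacement and with size smaller than $n$.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees_predict(X_train, y_train, X_test, B=100, seed=0):
    """Majority vote over B trees, each fit on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros(len(X_test))
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # bootstrap sample (with replacement)
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes += tree.predict(X_test)
    return (votes / B >= 0.5).astype(int)         # h(x) = 1 if the average vote >= 1/2

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] ** 2 + 0.5 * rng.normal(size=400) > 1).astype(int)
X_new = rng.normal(size=(200, 5))
print(bagged_trees_predict(X, y, X_new)[:10])
```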

To get some intuition about why bagging is useful, consider this example from Buhlmann
and Yu (2002). Suppose that $x \in \mathbb{R}$ and consider the simple decision rule $\hat\theta_n = I(\overline{Y}_n \le x)$.
Let $\mu = \mathbb{E}[Y_i]$ and for simplicity assume that $\mathrm{Var}(Y_i) = 1$. Suppose that $x$ is close to $\mu$
relative to the sample size. We can model this by setting $x \equiv x_n = \mu + c/\sqrt{n}$. Then $\hat\theta_n$
converges to $I(Z \le c)$ where $Z \sim N(0, 1)$. So the limiting mean and variance of $\hat\theta_n$ are
$\Phi(c)$ and $\Phi(c)(1 - \Phi(c))$. Now the bootstrap distribution of $\overline{Y}^*$ (conditional on $Y_1, \ldots, Y_n$)
is approximately $N(\overline{Y}, 1/n)$. That is, $\sqrt{n}(\overline{Y}^* - \overline{Y}) \approx N(0, 1)$. Let $\mathbb{E}^*$ denote the average
with respect to the bootstrap randomness. Then, if $\tilde\theta_n$ is the bagged estimator, we have
$$
\tilde\theta_n = \mathbb{E}^*\left[ I(\overline{Y}^* \le x_n) \right]
= \mathbb{E}^*\left[ I\!\left( \sqrt{n}(\overline{Y}^* - \overline{Y}) \le \sqrt{n}(x_n - \overline{Y}) \right) \right]
= \Phi\!\left( \sqrt{n}(x_n - \overline{Y}) \right) + o(1) = \Phi(c + Z) + o(1)
$$
where $Z \sim N(0, 1)$, and we used the fact that $\overline{Y} \approx N(\mu, 1/n)$.

To summarize, $\hat\theta_n \approx I(Z \le c)$ while $\tilde\theta_n \approx \Phi(c + Z)$, which is a smoothed version of $I(Z \le c)$.

In other words, bagging is a smoothing operator. In particular, suppose we take $c = 0$.
Then $\hat\theta_n$ converges to a Bernoulli with mean 1/2 and variance 1/4. The bagged estimator
converges to $\Phi(Z) \sim \mathrm{Unif}(0, 1)$, which has mean 1/2 and variance 1/12. The reduction in
variance is due to the smoothing effect of bagging.
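
A quick Monte Carlo check of the $c = 0$ case is sketched below; the sample size, the number of bootstrap replicates and the number of repetitions are arbitrary choices.

```python
import numpy as np

# The plain rule I(Ybar <= mu) has limiting mean 1/2 and variance 1/4;
# the bagged (bootstrap-smoothed) rule has limiting variance roughly 1/12.
rng = np.random.default_rng(0)
n, B, reps, mu, c = 200, 200, 1000, 0.0, 0.0
x_n = mu + c / np.sqrt(n)

plain, bagged = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = rng.normal(loc=mu, scale=1.0, size=n)
    plain[r] = float(y.mean() <= x_n)
    # Average the indicator over bootstrap resamples of the data.
    boot_means = rng.choice(y, size=(B, n), replace=True).mean(axis=1)
    bagged[r] = np.mean(boot_means <= x_n)

print("plain  mean/var:", plain.mean(), plain.var())    # approx 1/2 and 1/4
print("bagged mean/var:", bagged.mean(), bagged.var())  # approx 1/2 and 1/12
```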

4 Random Forests

Finally we get to random forests. These are bagged trees except that we also choose random
subsets of features for each tree. The estimator can be written as
$$
\hat m(x) = \frac{1}{M} \sum_{j=1}^{M} \hat m_j(x)
$$
where $\hat m_j$ is a tree estimator based on a subsample (or bootstrap sample) of size $a$ using $p$ randomly
selected features. The trees are usually required to have some number $k$ of observations in
the leaves. There are three tuning parameters: $a$, $p$ and $k$. You could also think of $M$ as a
tuning parameter, but generally we can think of $M$ as tending to $\infty$.

For each tree, we can estimate the prediction error on the unused data. (The tree is built
on a subsample.) Averaging these prediction errors gives an estimate called the out-of-bag
error estimate.
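
The sketch below shows how the tuning parameters map onto scikit-learn's implementation. The correspondence is approximate: scikit-learn draws max_features candidate features at every split rather than once per tree, max_samples is used together with bootstrap sampling, and oob_score_ reports out-of-bag $R^2$ rather than mean squared error. The data and parameter values are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 10))
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.normal(size=1000)

# Rough correspondence with the notation in the notes:
#   a (subsample size)        -> max_samples   (with bootstrap=True)
#   p (features per split)    -> max_features
#   k (observations per leaf) -> min_samples_leaf
#   M (number of trees)       -> n_estimators
forest = RandomForestRegressor(
    n_estimators=500,
    max_samples=0.5,        # a = n/2
    max_features=3,         # p = 3 of d = 10 features
    min_samples_leaf=5,     # k = 5
    bootstrap=True,
    oob_score=True,         # out-of-bag estimate on the unused observations
    random_state=0,
).fit(X, y)

print("OOB score (R^2):", forest.oob_score_)
print("prediction at x = (0.5, ..., 0.5):", forest.predict(np.full((1, 10), 0.5)))
```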

Unfortunately, it is very difficult to develop theory for random forests since the splitting
is done using greedy methods. Much of the theoretical analysis is done using simplified
versions of random forests. For example, the centered forest is defined as follows. Suppose
the data are on $[0, 1]^d$. Choose a random feature and split at the center of the current cell.
Repeat until there are $k$ leaves. This defines one tree. Now we average $M$ such trees.
Breiman (2004) and Biau (2012) proved the following.

Theorem 1 If each feature is selected with probability $1/d$, $k = o(n)$ and $k \to \infty$, then
$$
\mathbb{E}\left[ |\hat m(X) - m(X)|^2 \right] \to 0
$$
as $n \to \infty$.
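
The centered forest is simple enough to implement directly. The sketch below is one way to realize the construction (the leaf to be split is chosen uniformly at random, which is an implementation choice, and the splits never look at the responses); the data are illustrative.

```python
import numpy as np

def centered_tree(X, n_leaves, rng):
    """Grow one centered tree on [0, 1]^d: pick a leaf, pick a random feature,
    and split the leaf at the midpoint of that side. The splits ignore Y."""
    d = X.shape[1]
    # Each leaf: (lower bounds, upper bounds, indices of training points inside).
    leaves = [(np.zeros(d), np.ones(d), np.arange(len(X)))]
    while len(leaves) < n_leaves:
        lo, hi, idx = leaves.pop(rng.integers(len(leaves)))
        j = rng.integers(d)                      # random feature
        mid = 0.5 * (lo[j] + hi[j])              # split at the center of the cell
        left, right = idx[X[idx, j] <= mid], idx[X[idx, j] > mid]
        hi_l, lo_r = hi.copy(), lo.copy()
        hi_l[j], lo_r[j] = mid, mid
        leaves += [(lo, hi_l, left), (lo_r, hi, right)]
    return leaves

def centered_forest_predict(x, X, y, M=200, n_leaves=64, seed=0):
    """Average the leaf means of M independently grown centered trees."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(M):
        for lo, hi, idx in centered_tree(X, n_leaves, rng):
            if np.all(x >= lo) and np.all(x <= hi):
                preds.append(y[idx].mean() if len(idx) else 0.0)
                break
    return float(np.mean(preds))

rng = np.random.default_rng(1)
X = rng.uniform(size=(2000, 2))
y = np.sin(np.pi * X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=2000)
x0 = np.array([0.3, 0.7])
print(centered_forest_predict(x0, X, y), "true value:", np.sin(np.pi * 0.3) * 0.7)
```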

Under stronger assumptions we can say more:

Theorem 2 Suppose that $m$ is Lipschitz, that $m$ depends only on a subset $S$ of the
features, and that the probability of selecting $j \in S$ is $(1/|S|)(1 + o(1))$. Then
$$
\mathbb{E}\,|\hat m(X) - m(X)|^2 = O\left( \left(\frac{1}{n}\right)^{\frac{3}{4|S|\log 2 + 3}} \right).
$$

This is better than the usual Lipschitz rate $n^{-2/(d+2)}$ when $|S| \le d/2$. But the condition that
we select relevant variables with high probability is very strong, and proving that it holds
is a research problem.

A significant step forward was made by Scornet, Biau and Vert (2015). Here is their result.

Theorem 3 Suppose that $Y = \sum_j m_j(X(j)) + \epsilon$ where $X \sim \mathrm{Uniform}[0, 1]^d$, $\epsilon \sim N(0, \sigma^2)$
and each $m_j$ is continuous. Assume that the split is chosen using the maximum drop in sums
of squares. Let $t_n$ be the number of leaves on each tree and let $a_n$ be the subsample size. If
$t_n \to \infty$, $a_n \to \infty$ and $t_n (\log a_n)^9 / a_n \to 0$, then
$$
\mathbb{E}\left[ |\hat m(X) - m(X)|^2 \right] \to 0
$$
as $n \to \infty$.

Again, the theorem has strong assumptions but it does allow a greedy split selection. Scornet,
Biau and Vert (2015) provide another interesting result. Suppose that (i) there is a subset
S of relevant features, (ii) p = d, (iii) mj is not constant on any interval for j ∈ S. Then
with high probability, we always split only on relevant variables.

5 Connection to Nearest Neighbors

Lin and Jeon (2006) showed that there is a connection between random forests and $k$-NN
methods. We say that $X_i$ is a layered nearest neighbor (LNN) of $x$ if the hyper-rectangle
defined by $x$ and $X_i$ contains no data points other than $X_i$. Note that if a tree is grown until
each leaf has one point, then $\hat m(x)$ is simply a weighted average of the LNN's. More generally,
Lin and Jeon (2006) call $X_i$ a $k$-potential nearest neighbor ($k$-PNN) if there are fewer than
$k$ samples in the hyper-rectangle defined by $x$ and $X_i$. If we restrict to random forests
whose leaves have $k$ points, then it follows easily that $\hat m(x)$ is some weighted average of the
$k$-PNN's.
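
Here is a brute-force sketch that finds the layered nearest neighbors of a point by checking, for each $X_i$, whether the hyper-rectangle spanned by $x$ and $X_i$ contains any other sample point; the data are illustrative.

```python
import numpy as np

def layered_nearest_neighbors(x, X):
    """Indices i such that the hyper-rectangle spanned by x and X[i]
    contains no other sample point (the LNNs of x)."""
    lnn = []
    for i in range(len(X)):
        lo = np.minimum(x, X[i])
        hi = np.maximum(x, X[i])
        inside = np.all((X >= lo) & (X <= hi), axis=1)
        inside[i] = False                      # ignore X[i] itself
        if not inside.any():
            lnn.append(i)
    return np.array(lnn)

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)
x0 = np.array([0.5, 0.5])
lnn = layered_nearest_neighbors(x0, X)
print(len(lnn), y[lnn].mean())   # the number of LNNs grows like (log n)^{d-1}
```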

Let us now return to LNN's. Let $\mathcal{L}_n(x)$ denote the set of LNN's of $x$ and let $L_n(x) = |\mathcal{L}_n(x)|$. We
could directly define
$$
\hat m(x) = \frac{1}{L_n(x)} \sum_i Y_i I(X_i \in \mathcal{L}_n(x)).
$$
Biau and Devroye (2010) showed that, if $X$ has a continuous density,
$$
\frac{(d-1)! \, \mathbb{E}[L_n(x)]}{2^d (\log n)^{d-1}} \to 1.
$$
Moreover, if $Y$ is bounded and $m$ is continuous then, for all $p \ge 1$,
$$
\mathbb{E}\,|\hat m_n(X) - m(X)|^p \to 0
$$
as $n \to \infty$. Unfortunately, the rate of convergence is slow. Suppose that $\mathrm{Var}(Y \mid X = x) = \sigma^2$
is constant. Then
$$
\mathbb{E}\,|\hat m_n(X) - m(X)|^p \ge \frac{\sigma^2}{\mathbb{E}[L_n(x)]} \sim \frac{\sigma^2 (d-1)!}{2^d (\log n)^{d-1}}.
$$

If we use $k$-PNN's, with $k \to \infty$ and $k = o(n)$, then the results of Lin and Jeon (2006) show that
the estimator is consistent and has variance of order $O\!\left(1/(k (\log n)^{d-1})\right)$.

As an aside, Biau and Devroye (2010) also show that if we apply the usual 1-NN rule to
subsamples of size $k$ and then average over subsamples, then, provided $k \to \infty$ and $k = o(n)$,
for all $p \ge 1$ and all distributions $P$ we have $\mathbb{E}\,|\hat m(X) - m(X)|^p \to 0$. So bagged
1-NN is universally consistent. But at this point we have wandered quite far from random
forests.

6 Connection to Kernel Methods

There is also a connection between random forests and kernel methods (Scornet 2016). Let
$A_j(x)$ be the cell containing $x$ in the $j$th tree. Then we can write the forest estimator as
$$
\hat m(x) = \frac{1}{M} \sum_j \sum_i \frac{Y_i I(X_i \in A_j(x))}{N_j(x)} = \frac{1}{M} \sum_j \sum_i W_{ij} Y_i
$$
where $N_j(x)$ is the number of data points in $A_j(x)$ and $W_{ij} = I(X_i \in A_j(x))/N_j(x)$. This
suggests that points in a cell $A_j(x)$ with low density (and hence small $N_j(x)$) get high weight. Based
on this observation, Scornet (2016) defined the kernel-based random forest (KeRF) by
$$
\hat m(x) = \frac{\sum_j \sum_i Y_i I(X_i \in A_j(x))}{\sum_j N_j(x)}.
$$

With this modification, $\hat m(x)$ is an average of the $Y_i$'s, each weighted by how often $X_i$ falls in
the same cell as $x$ across the trees. The KeRF can be written as
$$
\hat m(x) = \frac{\sum_i Y_i K_n(x, X_i)}{\sum_s K_n(x, X_s)}
$$
where
$$
K_n(x, z) = \frac{1}{M} \sum_j I(z \in A_j(x)).
$$

The trees are random, so let us write the $j$th tree as $T_j = T(\Theta_j)$ for some random quantity
$\Theta_j$. The forest is then built from $T(\Theta_1), \ldots, T(\Theta_M)$, and we can write $A_j(x)$ as $A(x, \Theta_j)$.
Then $K_n(x, z)$ converges almost surely (as $M \to \infty$) to $\kappa_n(x, z) = P_\Theta(z \in A(x, \Theta))$, which is
just the probability that $x$ and $z$ are connected, in the sense that they fall in the same cell.
Under some assumptions, Scornet (2016) showed that KeRF's and forests are close to each
other, thus providing a kernel interpretation of forests.
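
The connection kernel $K_n$ can be computed from a fitted scikit-learn forest via apply(), which returns the leaf index of every point in every tree. The sketch below ignores the resampling weights (it counts all training points sharing a cell, whether or not they were in that tree's bootstrap sample), so it is only an approximation to the KeRF defined above; the data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 5))
y = np.sum(np.sin(np.pi * X), axis=1) + 0.2 * rng.normal(size=1000)

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                               random_state=0).fit(X, y)

# Leaf membership of every training point in every tree: shape (n, M).
train_leaves = forest.apply(X)

def kerf_predict(x):
    """KeRF-style prediction: weight Y_i by how often X_i shares a leaf with x."""
    x_leaves = forest.apply(x.reshape(1, -1))[0]       # leaf of x in each tree
    K = (train_leaves == x_leaves).mean(axis=1)        # K_n(x, X_i)
    return np.sum(y * K) / np.sum(K)

x0 = np.full(5, 0.5)
print("forest:", forest.predict(x0.reshape(1, -1))[0], " KeRF:", kerf_predict(x0))
```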

Recall the centered forest we discussed earlier. This is a stylized forest, quite different
from the forests used in practice, but it provides a nice way to study the properties
of forests. For the centered KeRF, Scornet (2016) shows that if $m$ is Lipschitz and
$X \sim \mathrm{Unif}([0, 1]^d)$ then
$$
\mathbb{E}\left[ (\hat m(x) - m(x))^2 \right] \le C (\log n)^2 \left(\frac{1}{n}\right)^{\frac{1}{3 + d \log 2}}.
$$

This is slower than the minimax rate $n^{-2/(d+2)}$, but this probably reflects the difficulty of
analyzing forests.

7 Variable Importance

Let $\hat m$ be a random forest estimator. How important is feature $X(j)$?

LOCO. One way to answer this question is to fit the forest with all the data and fit it
again without using $X(j)$. When we construct a forest, we randomly select features for each
tree, so this second forest can be obtained by simply averaging the trees where feature $j$ was
not selected. Call this $\hat m^{(-j)}$. Let $H$ be a hold-out sample of size $m$. Then let
$$
\hat\Delta_j = \frac{1}{m} \sum_{i \in H} W_i
$$
where
$$
W_i = (Y_i - \hat m^{(-j)}(X_i))^2 - (Y_i - \hat m(X_i))^2.
$$
Then $\hat\Delta_j$ is a consistent estimate of the inflation in prediction risk that occurs from not having
access to $X(j)$. Formally, if $T$ denotes the training data, then
$$
\mathbb{E}[\hat\Delta_j \mid T] = \mathbb{E}\left[ (Y - \hat m^{(-j)}(X))^2 - (Y - \hat m(X))^2 \,\middle|\, T \right] \equiv \Delta_j.
$$

In fact, since $\hat\Delta_j$ is simply an average, we can easily construct a confidence interval. This
approach is called LOCO (Leave-Out-COvariates). Of course, it is easily extended to sets
of features. The method is explored in Lei, G'Sell, Rinaldo, Tibshirani and Wasserman (2017)
and Rinaldo, Tibshirani and Wasserman (2015).
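
Below is a hold-out sketch of LOCO with a normal confidence interval for $\Delta_j$. For simplicity it refits the forest without feature $j$ rather than averaging the trees that never selected $X(j)$, as the notes describe; the data and forest settings are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 6))
y = 4 * X[:, 0] + np.sin(2 * np.pi * X[:, 1]) + 0.3 * rng.normal(size=2000)

# Split into training data T and a hold-out sample H.
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.5, random_state=0)
full = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

def loco(j):
    """Hold-out LOCO estimate for feature j with a normal 95% interval.
    Refits the forest without feature j (a simplification of the scheme above)."""
    drop = RandomForestRegressor(n_estimators=300, random_state=0).fit(
        np.delete(X_tr, j, axis=1), y_tr)
    W = ((y_ho - drop.predict(np.delete(X_ho, j, axis=1))) ** 2
         - (y_ho - full.predict(X_ho)) ** 2)
    delta = W.mean()
    se = W.std(ddof=1) / np.sqrt(len(W))
    return delta, (delta - 1.96 * se, delta + 1.96 * se)

for j in range(3):
    print(f"feature {j}: Delta_hat and 95% CI =", loco(j))
```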

Permutation Importance. A different approach is to permute the values of $X(j)$ for the
out-of-bag observations, separately for each tree. Let $O_t$ be the out-of-bag observations for
tree $t$ (with $m_t = |O_t|$), and let $O_t^*$ be the same observations with $X(j)$ permuted. Define
$$
\hat\Gamma_j = \frac{1}{M} \sum_{t=1}^{M} W_{tj}
$$
where
$$
W_{tj} = \frac{1}{m_t} \sum_{i \in O_t^*} (Y_i - \hat m_t(X_i))^2 - \frac{1}{m_t} \sum_{i \in O_t} (Y_i - \hat m_t(X_i))^2.
$$
This avoids using a hold-out sample. It is estimating
$$
\Gamma_j = \mathbb{E}[(Y - \hat m(X_j'))^2] - \mathbb{E}[(Y - \hat m(X))^2]
$$
where $X_j'$ has the same distribution as $X$ except that $X_j'(j)$ is an independent draw from the
distribution of $X(j)$. This is a lot like LOCO but its meaning is less clear. Note that the trees
are not refit when $X(j)$ is permuted. Gregorutti, Michel and Saint Pierre (2013) show that if
$(X, \epsilon)$ is Gaussian, $\mathrm{Var}(X) = (1 - c)I + c\,\mathbf{1}\mathbf{1}^T$ and $\mathrm{Cov}(Y, X(j)) = \tau$ for all $j$, then
$$
\Gamma_j = 2 \left( \frac{\tau}{1 - c + dc} \right)^2.
$$
It is not clear how this connects to the actual importance of $X(j)$. In the case where
$Y = \sum_j m_j(X(j)) + \epsilon$ with $\mathbb{E}[\epsilon \mid X] = 0$ and $\mathbb{E}[\epsilon^2 \mid X] < \infty$, they show that $\Gamma_j = 2\,\mathrm{Var}(m_j(X(j)))$.
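
A sketch of permutation importance follows. The notes compute it per tree on the out-of-bag observations; scikit-learn does not publicly expose per-tree out-of-bag indices, so this version permutes a column of a single hold-out set instead (in the spirit of sklearn.inspection.permutation_importance). The data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 6))
y = 4 * X[:, 0] + np.sin(2 * np.pi * X[:, 1]) + 0.3 * rng.normal(size=2000)

X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.5, random_state=0)
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
base_mse = np.mean((y_ho - forest.predict(X_ho)) ** 2)

def permutation_importance_holdout(j, n_repeats=20):
    """Increase in hold-out MSE when column j is shuffled (the forest is not refit)."""
    gains = []
    for _ in range(n_repeats):
        X_perm = X_ho.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        gains.append(np.mean((y_ho - forest.predict(X_perm)) ** 2) - base_mse)
    return np.mean(gains)

for j in range(6):
    print(f"Gamma_hat[{j}] = {permutation_importance_holdout(j):.3f}")
```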

8 Inference

Using the theory of infinite order $U$-statistics, Mentch and Hooker (2015) showed that
$\sqrt{n}\,(\hat m(x) - \mathbb{E}[\hat m(x)])/\sigma$ converges to a $N(0, 1)$ distribution, and they show how to estimate $\sigma$.

Wager and Athey (2017) show asymptotic normality if we use sample splitting: part of the
data is used to build the tree and part is used to estimate the averages in the leaves of the
tree. Under a number of technical conditions, including the requirement that we use subsamples
of size $s = n^\beta$ with $\beta < 1$, they show that $(\hat m(x) - m(x))/\sigma_n(x) \rightsquigarrow N(0, 1)$, and they show
how to estimate $\sigma_n(x)$. Specifically,
$$
\hat\sigma_n^2(x) = \frac{n-1}{n} \left( \frac{n}{n-s} \right)^2 \sum_{i=1}^{n} \mathrm{Cov}(\hat m_j(x), N_{ij})^2
$$
where the covariance is taken with respect to the trees in the forest, and $N_{ij} = 1$ if $(X_i, Y_i)$ was in
the $j$th subsample and $N_{ij} = 0$ otherwise.
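
Here is a rough sketch of the variance estimate above for a forest built on subsamples of size $s = n^\beta$, ignoring the honesty/sample-splitting requirement for brevity; $\beta = 0.7$, the tree settings and the data are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.uniform(size=(n, d))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.normal(size=n)
x0 = np.full((1, d), 0.5)

M = 1000
s = int(n ** 0.7)                    # subsample size s = n^beta with beta < 1
preds = np.empty(M)                  # per-tree predictions m_hat_j(x0)
N = np.zeros((n, M))                 # N_ij = 1 if observation i is in subsample j

for j in range(M):
    idx = rng.choice(n, size=s, replace=False)
    N[idx, j] = 1.0
    tree = DecisionTreeRegressor(min_samples_leaf=5).fit(X[idx], y[idx])
    preds[j] = tree.predict(x0)[0]

# Covariance of m_hat_j(x) with the membership indicators N_ij, over the M trees,
# plugged into the variance formula from the text.
cov = (N - N.mean(axis=1, keepdims=True)) @ (preds - preds.mean()) / M
sigma2 = (n - 1) / n * (n / (n - s)) ** 2 * np.sum(cov ** 2)

print("forest estimate:", preds.mean(), " estimated std. error:", np.sqrt(sigma2))
```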

9 Summary

Random forests are considered one of the best all-purpose classifiers. But it is still a mystery
why they work so well. The situation is very similar to deep learning. We have seen that
there are now many interesting theoretical results about forests. But the results make strong
assumptions that create a gap between practice and theory. Furthermore, there is no theory
to say why forests outperform other methods. The gap between theory and practice is due
to the fact that forests — as actually used in practice — are complex functions of the data.

10 References

Biau, G., Devroye, L. and Lugosi, G. (2008). Consistency of random forests and other averaging
classifiers. Journal of Machine Learning Research (JMLR).

Biau, G. and Scornet, E. (2016). A random forest guided tour. TEST, 25(2), 197-227.

Biau, G. (2012). Analysis of a Random Forests Model. arXiv:1005.0208.

Buhlmann, P., and Yu, B. (2002). Analyzing bagging. Annals of Statistics, 927-961.

Gregorutti, Michel, and Saint Pierre. (2013). Correlation and variable importance in random
forests. arXiv:1310.5726.

Lei, J., G'Sell, M., Rinaldo, A., Tibshirani, R.J. and Wasserman, L. (2017). Distribution-free
predictive inference for regression. Journal of the American Statistical Association.

Lin, Y. and Jeon, Y. (2006). Random Forests and Adaptive Nearest Neighbors. Journal of
the American Statistical Association, 101, p 578.

Mentch, L. and Hooker, G. (2015). Ensemble trees and CLTs: Statistical inference for
supervised learning. Journal of Machine Learning Research.

Rinaldo, A., Tibshirani, R. and Wasserman, L. (2015). Uniform asymptotic inference and the
bootstrap after model selection. arXiv:1506.06266.

Scornet, E. (2016). Random forests and kernel methods. IEEE Transactions on Information
Theory, 62(3), 1485-1500.

Wager, S. (2014). Asymptotic Theory for Random Forests. arXiv:1405.0352.

Wager, S. (2015). Uniform convergence of random forests via adaptive concentration. arXiv:1503.06388.

Wager, S. and Athey, S. (2017). Estimation and inference of heterogeneous treatment effects
using random forests. Journal of the American Statistical Association.
