
Chapter 1

Rise of the Machines


Larry Wasserman

On the 50th anniversary of the Committee of Presidents of Statistical Societies I reflect on the rise of the field of Machine Learning and what it means for Statistics. Machine Learning offers a plethora of new research areas, new application areas and new colleagues to work with. Our students now compete with Machine Learning students for jobs. I am optimistic that visionary Statistics departments will embrace this emerging field; those that ignore or eschew Machine Learning do so at their own risk and may find themselves in the rubble of an outdated, antiquated field.

1.1 Introduction
Statistics is the science of learning from data. Machine Learning (ML) is the
science of learning from data. These fields are identical in intent although they
differ in their history, conventions, emphasis and culture.
There is no denying the success and importance of the field of Statistics for
science and, more generally, for society. I’m proud to be a part of the field. The
focus of this essay is on one challenge (and opportunity) to our field: the rise of
Machine Learning.
During my twenty-five year career I have seen Machine Learning evolve from a collection of rather primitive (yet clever) classification methods into a sophisticated science that is rich in theory and applications.
A quick glance at The Journal of Machine Learning Research (jmlr.csail.mit.edu) and NIPS (books.nips.cc) reveals papers on a variety of topics that will be familiar to Statisticians, such as:
conditional likelihood, sequential design, reproducing kernel Hilbert
spaces, clustering, bioinformatics, minimax theory, sparse regres-
sion, estimating large covariance matrices, model selection, density
estimation, graphical models, wavelets, nonparametric regression.

These could just as well be papers in our flagship statistics journals.


This sampling of topics should make it clear that researchers in Machine
Learning — who were at one time somewhat unaware of mainstream statistical
methods and theory — are now not only aware of, but actively engaged in,
cutting edge research on these topics.
On the other hand, there are statistical topics that are active areas of research in Machine Learning but are virtually ignored in Statistics. To avoid becoming irrelevant, we Statisticians need to (i) stay current on research areas in ML, (ii) change our outdated model for disseminating knowledge, and (iii) revamp our graduate programs.

1.2 The Conference Culture


ML moves at a much faster pace than Statistics. At first, ML researchers devel-
oped expert systems that eschewed probability. But very quickly they adopted
advanced statistical concepts like empirical process theory and concentration
of measure. This transition happened in a matter of a few years. Part of the
reason for this fast pace is the conference culture. The main venue for research
in ML is refereed conference proceedings rather than journals.
Graduate students produce a stream of research papers and graduate with
hefty CV’s. One of the reasons for the blistering pace is, again, the conference
culture.
The process of writing a typical statistics paper goes like this: you have an
idea for a method, you stew over it, you develop it, you prove some results about it, and eventually you write it up and submit it. Then the refereeing process
starts. One paper can take years.
In ML, the intellectual currency is conference publications. There are a
number of deadlines for the main conference (NIPS, AISTAT, ICML, COLT).
The threat of a deadline forces one to quit ruminating and start writing. Most
importantly, all faculty members and students are facing the same deadline so
there is a synergy in the field that has mutual benefits. No one minds if you
cancel a class right before the NIPS deadline. And then, after the deadline,
everyone is facing another deadline: refereeing each other's papers and doing so
in a timely manner. If you have an idea and don’t submit a paper on it, then
you may be out of luck because someone may scoop you.
This pressure is good; it keeps the field moving at a fast pace. If you think
this leads to poorly written papers or poorly thought out ideas, I suggest you
look at nips.cc and read some of the papers. There are some substantial, deep
papers. There are also a few bad papers. Just like in our journals. The papers
are refereed and the acceptance rate is comparable to our main journals. And
if an idea requires more detailed followup, then one can always write a longer
journal version of the paper.
Absent this stream of constant deadlines, a field moves slowly. This is a
problem for Statistics not only for its own sake but also because it now competes
with ML.
Of course, there are disadvantages to the conference culture. Work is done
in a rush, and ideas are often not fleshed out in detail. But I think that the
advantages outweigh the disadvantages.

1.3 Neglected Research Areas


There are many statistical topics that are dominated by ML and mostly ignored
by Statistics. This is a shame because Statistics has much to offer in all these
areas. Examples include semisupervised inference, computational topology, on-
line learning, sequential game theory, hashing, active learning, deep learning,
differential privacy, random projections and reproducing kernel Hilbert spaces.
Ironically, some of these — like sequential game theory and reproducing kernel
Hilbert spaces — started in Statistics.

1.4 Case Studies


I’m lucky. I am at an institution which has a Machine Learning Department
(within the School of Computer Science) and, more importantly, the ML de-
partment welcomes involvement by Statisticians. So I’ve been fortunate to work
with colleagues in ML, attend their seminars, work with ML students and teach
courses in the ML department.
There are a number of topics I've worked on at least partly due to my association with ML. These include statistical topology, graphical models, semisupervised inference, conformal prediction, and differential privacy.

Figure 1.1: Labeled data.


Since this paper is supposed to be a personal reflection, let me now briefly
discuss two of these ML problems that I have had the good fortune to work on.
The point of these examples is to show how statistical thinking can be useful
for Machine Learning.

1.4.1 Case Study I: Semisupervised Inference


Suppose we observe data (X1 , Y1 ), . . . , (Xn , Yn ) and we want to predict Y from
X. If Y is discrete, this is a classification problem. If Y is real-valued, this is a
regression problem. Further, suppose we observe more data Xn+1 , . . . , XN with-
out the corresponding Y values. We thus have labeled data L = {(X1 , Y1 ), . . . , (Xn , Yn )}
and unlabeled data U = {Xn+1 , . . . , XN }. How do we use the unlabeled data
in addition to the labeled data to improve prediction? This is the problem of
semisupervised inference.
Consider Figure 1.1. The covariate is x = (x1 , x2 ) ∈ R2 . The outcome in
this case is binary as indicated by the circles and squares. Finding the decision
boundary using only the labeled data is difficult. Figure 1.2 shows the labeled
data together with some unlabeled data. We clearly see two clusters. If we
make the additional assumption that P (Y = 1|X = x) is smooth relative to the
clusters, then we can use the unlabeled data to nail down the decision boundary
accurately.
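To make the cluster assumption concrete, here is a minimal sketch (not the method analyzed in [1]) of a generic graph-based semisupervised method, label spreading, on synthetic two-cluster data. It assumes NumPy and scikit-learn are available; the dataset, the number of labels and the neighborhood size are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import LabelSpreading

# Synthetic two-cluster data: 1,000 points, only 10 of them labeled.
rng = np.random.RandomState(0)
X, y_true = make_moons(n_samples=1000, noise=0.08, random_state=0)
labeled_idx = rng.choice(len(X), size=10, replace=False)

y_partial = np.full(len(X), -1)                  # -1 marks an unlabeled point
y_partial[labeled_idx] = y_true[labeled_idx]

# Supervised baseline: uses only the 10 labeled points.
knn = KNeighborsClassifier(n_neighbors=1).fit(X[labeled_idx], y_true[labeled_idx])
acc_supervised = (knn.predict(X) == y_true).mean()

# Semisupervised: propagate the few labels along a neighborhood graph of all
# points, exploiting the assumption that P(Y = 1 | X = x) is smooth over the clusters.
semi = LabelSpreading(kernel="knn", n_neighbors=10).fit(X, y_partial)
acc_semisupervised = (semi.transduction_ == y_true).mean()

print(f"supervised accuracy:     {acc_supervised:.3f}")
print(f"semisupervised accuracy: {acc_semisupervised:.3f}")
```

With so few labels the supervised baseline typically mislabels many points near the cluster boundary, while the graph-based fit usually recovers the two clusters almost exactly, which is the intuition behind Figures 1.1 and 1.2.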
There are copious papers with heuristic methods for taking advantage of un-
labeled data. To see how useful these methods might be, consider the following
example. We download one-million webpages with images of cats and dogs. We
randomly select 100 pages and classify them by hand. Semisupervised methods allow us to use the other 999,900 webpages to construct a good classifier.

Figure 1.2: Labeled and unlabeled data.


But does semisupervised inference work? Or, to put it another way, under
what conditions does it work? In [1], we showed the following (which I state
informally here).
Suppose that $X_i \in \mathbb{R}^d$. Let $\mathcal{S}_n$ denote the set of supervised estimators; these estimators use only the labeled data. Let $\mathcal{SS}_N$ denote the set of semisupervised estimators; these estimators use the labeled data and unlabeled data. Let $m$ be the number of unlabeled data points and suppose that $m \ge n^{2/(2+\xi)}$ for some $0 < \xi < d - 3$. Let $f(x) = \mathbb{E}(Y \mid X = x)$. There is a large, nonparametric class of distributions $\mathcal{P}_n$ such that the following is true:
1. There is a semisupervised estimator $\hat{f}$ such that
$$\sup_{P \in \mathcal{P}_n} R_P(\hat{f}) \le \left(\frac{C}{n}\right)^{\frac{2}{2+\xi}} \tag{1.1}$$
where $R_P(\hat{f}) = \mathbb{E}\bigl(\hat{f}(X) - f(X)\bigr)^2$ is the risk of the estimator $\hat{f}$ under distribution $P$.
2. For supervised estimators $\mathcal{S}_n$ we have
$$\inf_{\hat{f} \in \mathcal{S}_n} \sup_{P \in \mathcal{P}_n} R_P(\hat{f}) \ge \left(\frac{C}{n}\right)^{\frac{2}{d-1}}. \tag{1.2}$$
3. Combining these two results we conclude that
$$\frac{\inf_{\hat{f} \in \mathcal{SS}_N} \sup_{P \in \mathcal{P}_n} R_P(\hat{f})}{\inf_{\hat{f} \in \mathcal{S}_n} \sup_{P \in \mathcal{P}_n} R_P(\hat{f})} \le \left(\frac{C}{n}\right)^{\frac{2(d-3-\xi)}{(2+\xi)(d-1)}} \to 0 \tag{1.3}$$
and hence, semisupervised estimation dominates supervised estimation.
The class Pn consists of distributions such that the marginal for X is highly
concentrated near some lower dimensional set and such that the regression func-
tion is smooth on this set. We have not proved that the class must be of this
form for semisupervised inference to improve on supervised inference but we
suspect that is indeed the case. Our framework includes a parameter α that
characterizes the strength of the semisupervised assumption. We showed that,
in fact, one can use the data to adapt to the correct value of α.

1.4.2 Case Study II: Statistical Topology


Computational topologists and researchers in Machine Learning have developed
methods for analyzing the shape of functions and data. Here I’ll briefly review
some of our work on estimating manifolds ([6, 7, 8]).
Suppose that $M$ is a manifold of dimension $d$ embedded in $\mathbb{R}^D$. Let $X_1, \ldots, X_n$ be a sample from a distribution $P$ supported on $M$. We observe
$$Y_i = X_i + \epsilon_i, \quad i = 1, \ldots, n \tag{1.4}$$
where $\epsilon_1, \ldots, \epsilon_n \sim \Phi$ are noise variables.
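For concreteness, here is a small NumPy sketch of this sampling model for a one-dimensional manifold, the unit circle in $\mathbb{R}^2$, under the three noise settings discussed below (no noise, noise perpendicular to $M$, and full Gaussian noise); the circle and the value of $\sigma$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 500, 0.1

# X_i: points on the manifold M, here the unit circle (d = 1) embedded in R^D with D = 2.
theta = rng.uniform(0, 2 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Three versions of the observation model Y_i = X_i + eps_i:
Y_noiseless = X                                         # no noise: data fall on M
Y_perp = X * (1 + sigma * rng.normal(size=(n, 1)))      # radial noise, perpendicular to M
Y_gauss = X + sigma * rng.normal(size=(n, 2))           # full Gaussian noise in R^D
```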


Machine Learning researchers have derived many methods for estimating the
manifold M . But this leaves open an important statistical question: how well
do these estimators work? One approach to answering this question is to find
the minimax risk under some loss function. Let M c b an estimator of M . A
natural loss function for this problem is Hausdorff loss:
n o
H(M, Mc) = inf  : M ⊂ M c ⊕  and M
c⊂M ⊕ . (1.5)

Let $\mathcal{P}$ be a set of distributions. The parameter of interest is $M = \mathrm{support}(P)$, which we assume is a $d$-dimensional manifold. The minimax risk is
$$R_n = \inf_{\hat{M}} \sup_{P \in \mathcal{P}} \mathbb{E}_P\bigl[H(\hat{M}, M)\bigr]. \tag{1.6}$$
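When $M$ and $\hat{M}$ are represented by finite point sets (for example, an estimator returned as a dense sample of points), the Hausdorff distance in (1.5) can be computed directly. Here is a minimal NumPy sketch of that computation; it illustrates the loss only, not any particular manifold estimator.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between finite point sets A (n x D) and B (m x D).

    This is the smallest epsilon such that every point of A lies within epsilon
    of B and every point of B lies within epsilon of A, matching (1.5) for
    finite sets.
    """
    diff = A[:, None, :] - B[None, :, :]          # pairwise differences, (n, m, D)
    dists = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise Euclidean distances
    return max(dists.min(axis=1).max(), dists.min(axis=0).max())

# Example: a circle versus a slightly perturbed copy of it.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
perturbed = circle + 0.05 * np.random.default_rng(0).normal(size=circle.shape)
print(hausdorff(circle, perturbed))   # roughly the size of the largest perturbation
```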

Of course, the risk depends on what conditions we assume on $M$ and on the noise $\Phi$.
Our main findings are as follows. When there is no noise — so the data fall on the manifold — we get $R_n \asymp n^{-2/d}$. When the noise is perpendicular to $M$, the risk is $R_n \asymp n^{-2/(2+d)}$. When the noise is Gaussian the rate is $R_n \asymp 1/\log n$.
The latter is not surprising when one considers the similar problem of estimating
a function when there are errors in variables.
The implication for Machine Learning is that the best these algorithms can do depends strongly on the particulars of the type of noise.
How do we actually estimate these manifolds in practice? In ([8]) we take the
following point of view: if the noise is not too large, then the manifold should
be close to a d-dimensional hyper-ridge in the density p(y) for Y . Ridge finding
is an extension of mode finding, which is a common task in computer vision.

Figure 1.3: The Mean Shift Algorithm. The data points move along trajectories during iterations until they reach the two modes marked by the two large asterisks.

Let $p$ be a density on $\mathbb{R}^D$. Suppose that $p$ has $k$ modes $m_1, \ldots, m_k$. An integral curve, or path of steepest ascent, is a path $\pi : \mathbb{R} \to \mathbb{R}^D$ such that
$$\pi'(t) = \frac{d}{dt}\pi(t) = \nabla p(\pi(t)). \tag{1.7}$$
Under weak conditions, the paths π partition the space and are disjoint except
at the modes [9, 2].
The mean shift algorithm ([5, 3]) is a method for finding the modes of a
density by following the steepest ascent paths. The algorithm starts with a
mesh of points and then moves the points along gradient ascent trajectories
towards local maxima. A simple example is shown in Figure 1.3.
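As an illustration of the idea (a bare-bones sketch, not the exact implementations of [5, 3]), here is mean shift with a Gaussian kernel in NumPy; the bandwidth, iteration count and synthetic two-mode data are arbitrary choices.

```python
import numpy as np

def mean_shift(X, h=0.5, n_iter=100):
    """Move each point uphill toward a mode of the Gaussian-kernel density estimate.

    X : (n, D) data defining the density estimate. Points that start in the
    same basin of attraction end up (numerically) at the same mode.
    """
    Z = X.copy()
    for _ in range(n_iter):
        # Gaussian kernel weights between the current positions and the data.
        d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * h ** 2))
        # Mean shift update: weighted average of the data around each position.
        Z = (W @ X) / W.sum(axis=1, keepdims=True)
    return Z

# Two Gaussian blobs: every trajectory should end near one of the two means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], 0.4, size=(100, 2)),
               rng.normal([+2.0, 0.0], 0.4, size=(100, 2))])
Z = mean_shift(X)
print(np.round(Z[:3], 3))    # rows close to the mode near (-2, 0)
print(np.round(Z[-3:], 3))   # rows close to the mode near (+2, 0)
```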
Given a function $p : \mathbb{R}^D \to \mathbb{R}$, let $g(x) = \nabla p(x)$ denote the gradient at $x$ and let $H(x)$ denote the Hessian matrix. Let
$$\lambda_1(x) \ge \lambda_2(x) \ge \cdots \ge \lambda_D(x) \tag{1.8}$$
denote the eigenvalues of $H(x)$ and let $\Lambda(x)$ be the diagonal matrix whose diagonal elements are the eigenvalues. Write the spectral decomposition of $H(x)$ as $H(x) = U(x)\Lambda(x)U(x)^T$. Fix $0 \le d < D$ and let $V(x)$ be the last $D - d$ columns of $U(x)$ (that is, the columns corresponding to the $D - d$ smallest eigenvalues), and let $V_\diamond(x)$ be the remaining $d$ columns, so that $U(x) = [V_\diamond(x) : V(x)]$ and $H(x) = [V_\diamond(x) : V(x)]\,\Lambda(x)\,[V_\diamond(x) : V(x)]^T$. Let $L(x) = V(x)V(x)^T$ be the projector onto the linear space defined by the columns of $V(x)$. Define the projected gradient
$$G(x) = L(x)\,g(x). \tag{1.9}$$
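To make the construction concrete, here is a minimal NumPy sketch that evaluates the projected gradient $G(x)$ for a Gaussian-kernel density estimate (normalizing constants are dropped since they rescale $g$ and $H$ by the same factor and leave the projection unchanged). The bandwidth and the choice $d = 1$ are arbitrary; this illustrates the definition above, not the estimator studied in [8].

```python
import numpy as np

def projected_gradient(x, X, h=0.3, d=1):
    """Projected gradient G(x) = L(x) g(x) for a Gaussian-kernel KDE built on X.

    x : (D,) evaluation point;  X : (n, D) data;  d : dimension of the ridge.
    """
    diff = X - x                                           # (n, D)
    k = np.exp(-(diff ** 2).sum(axis=1) / (2 * h ** 2))    # unnormalized kernel weights
    g = (k[:, None] * diff).sum(axis=0) / h ** 2           # gradient of the KDE at x
    outer = (k[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0)
    H = outer / h ** 4 - k.sum() * np.eye(len(x)) / h ** 2  # Hessian of the KDE at x
    # eigh returns eigenvalues in ascending order, so the first D - d columns
    # are the eigenvectors of the D - d smallest eigenvalues, i.e. V(x).
    eigval, eigvec = np.linalg.eigh(H)
    V = eigvec[:, : len(x) - d]
    L = V @ V.T                                            # projector onto span(V(x))
    return L @ g

# Noisy circle in R^2: its density has a 1-dimensional ridge near the circle.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.normal(size=(500, 2))
print(projected_gradient(np.array([1.3, 0.0]), X))  # should point roughly back toward the circle
```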

If the vector field $G(x)$ is Lipschitz then, by Theorem 3.39 of [9], $G$ defines a global flow as follows. The flow is a family of functions $\phi(x, t)$ such that $\phi(x, 0) = x$, $\phi'(x, 0) = G(x)$ and $\phi(s, \phi(t, x)) = \phi(s + t, x)$. The flow lines,
or integral curves, partition the space and at each x where G(x) is non-null,
there is a unique integral curve passing through x. The intuition is that the
flow passing through x is a gradient ascent path moving towards higher values
of p. Unlike the paths defined by the gradient g which move towards modes,
the paths defined by G move towards ridges.
The paths can be parameterized in many ways. One commonly used parameterization is to use $t \in [-\infty, \infty]$ where large values of $t$ correspond to higher values of $p$. In this case $t = \infty$ will correspond to a point on the ridge. In this parameterization we can express each integral curve in the flow as follows. A map $\pi : \mathbb{R} \to \mathbb{R}^D$ is an integral curve with respect to the flow of $G$ if
$$\pi'(t) = G(\pi(t)) = L(\pi(t))\,g(\pi(t)). \tag{1.10}$$

Definition: The ridge $R$ consists of the destinations of the integral curves: $y \in R$ if $\lim_{t \to \infty} \pi(t) = y$ for some $\pi$ satisfying (1.10).

As mentioned above, the integral curves partition the space and for each $x \notin R$, there is a unique path $\pi_x$ passing through $x$. The ridge points are zeros of the projected gradient: $y \in R$ implies that $G(y) = (0, \ldots, 0)^T$. [10] derived an extension of the mean-shift algorithm, called the subspace constrained mean shift algorithm, which finds ridges and can be applied to the kernel density estimator. Our results can be summarized as follows:

1. Stability. We showed that if two functions are sufficiently close together then their ridges are also close together (in Hausdorff distance).

2. We constructed an estimator $\hat{R}$ such that
$$H(R, \hat{R}) = O_P\!\left(\left(\frac{\log n}{n}\right)^{\frac{2}{D+8}}\right) \tag{1.11}$$
where $H$ is the Hausdorff distance. Further, we showed that $\hat{R}$ is topologically similar to $R$. We also construct an estimator $\hat{R}_h$ for $h > 0$ that satisfies
$$H(R_h, \hat{R}_h) = O_P\!\left(\left(\frac{\log n}{n}\right)^{\frac{1}{2}}\right) \tag{1.12}$$
where $R_h$ is a smoothed version of $R$.

3. Suppose the data are obtained by sampling points on a manifold and adding noise with small variance $\sigma^2$. We showed that the resulting density $p$ has a ridge $R_\sigma$ such that
$$H(M, R_\sigma) = O\bigl(\sigma^2 \log^3(1/\sigma)\bigr) \tag{1.13}$$
and $R_\sigma$ is topologically similar to $M$. Hence when the noise $\sigma$ is small, the ridge is close to $M$. It then follows that
$$H(M, \hat{R}) = O_P\!\left(\left(\frac{\log n}{n}\right)^{\frac{2}{D+8}}\right) + O\bigl(\sigma^2 \log^3(1/\sigma)\bigr). \tag{1.14}$$

Figure 1.4: Simulated cosmic web data.

An example can be found in Figures 1.4 and 1.5. I believe that Statistics has much to offer to this area, especially in terms of making the assumptions precise and clarifying how accurate the inferences can be.

Figure 1.5: Ridge finder applied to simulated cosmic web data.

1.5 Computational Thinking


There is another interesting difference that is worth pondering. Consider the
problem of estimating a mixture of Gaussians. In Statistics we think of this
as a solved problem. You use, for example, maximum likelihood which is im-
plemented by the EM algorithm. But the EM algorithm does not solve the
problem. There is no guarantee that the EM algorithm will actually find the
MLE; it’s a shot in the dark. The same comment applies to MCMC methods.
In ML, when you say you’ve solved the problem, you mean that there is
a polynomial time algorithm with provable guarantees. There is, in fact, a
rich literature in ML on estimating mixtures that does provide polynomial time
algorithms. Furthermore, they come with theorems telling you how many ob-
servations you need if you want the estimator to be a certain distance from the
truth, with probability at least 1 − δ. This is typical for what is expected of
an estimator in ML. You need to provide a provable polynomial time algorithm
and a finite sample (non-asymptotic) guarantee on the estimator.
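As a small illustration of the point about EM, the sketch below uses scikit-learn's generic EM fit for Gaussian mixtures (not one of the provable polynomial-time ML algorithms mentioned above) and simply compares the log-likelihoods reached from different random initializations; the data and settings are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three well-separated one-dimensional Gaussian components.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-5, 1, 300),
                    rng.normal(0, 1, 300),
                    rng.normal(6, 1, 300)]).reshape(-1, 1)

# Run EM once from each of several random initializations.
scores = [GaussianMixture(n_components=3, n_init=1, init_params="random",
                          random_state=s).fit(X).score(X)
          for s in range(10)]
print(np.round(scores, 3))   # values may differ: EM can stop at different local optima

# The usual practical hedge: many restarts, keep the best local optimum found.
best = GaussianMixture(n_components=3, n_init=20, init_params="random",
                       random_state=0).fit(X)
print(round(best.score(X), 3))
```

Even when the restarts happen to agree, nothing in the EM machinery certifies that the best of them is the MLE, which is exactly the gap the ML literature on provable mixture estimation addresses.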


ML puts heavier emphasis on computational thinking. Consider, for example, the difference between P and NP problems. This is at the heart of theoretical Computer Science and ML. Running an MCMC on an NP-hard problem is often meaningless. Instead, it is usually better to approximate the NP problem with a simpler problem. How often do we teach this to our students?

1.6 The Evolving Meaning of Data


For most of us in Statistics, data means numbers. But data now includes images,
documents, videos, web pages, twitter feeds and so on. Traditional data — num-
bers from experiments and observational studies — are still of vital importance
but they represent a tiny fraction of the data out there. If we take the union
of all the data in the world, what fraction is being analyzed by statisticians? I
think it is a small number.
This comes back to education. If our students can’t analyze giant datasets
like millions of twitter feeds or millions of web pages then other people will
analyze those data. We will end up with a small cut of the pie.

1.7 Education and Hiring


The goal of a graduate student in Statistics is to find an advisor and write a
thesis. They graduate with a single data point: their thesis work.
The goal of a graduate student in ML is to find a dozen different research
problems to work on and publish many papers. They graduate with a rich data
set: many papers on many topics with many different people.

Having been on hiring committees for both Statistics and ML I can say that
the difference is striking. It is easy to choose candidates to interview in ML.
You have a lot of data on each candidate and you know what you are getting.
In Statistics, it is a struggle. You have little more than a few papers that bear
their advisor’s footprint.
The ML conference culture encourages publishing many papers on many
topics which is better for both the students and their potential employers. And
now, Statistics students are competing with ML students, putting Statistics
students at a significant disadvantage.
There are a number of topics that are routinely covered in ML that we rarely
teach in Statistics. Examples are: Vapnik-Chervonenkis theory, concentration of
measure, random matrices, convex optimization, graphical models, reproducing
kernel Hilbert spaces, support vector machines, and sequential game theory. It
is time to get rid of antiques like UMVUE, complete statistics, and so on, and
teach modern ideas.

1.8 If You Can’t Beat Them, Join Them


I don’t want to leave the reader with the impression that we are in some sort
of competition with ML. Instead, we should feel blessed that a second group of
Statisticians has appeared. Working with ML and adopting some of their ideas
enriches both fields.
ML has much to offer Statistics. And Statisticians have a lot to offer ML.
For example, we put much emphasis on quantifying uncertainty (standard er-
rors, confidence intervals, posterior distributions), an emphasis that is perhaps
lacking in ML. And sometimes, statistical thinking casts new light on existing
ML methods. A good example is the statistical view of boosting given in [4]. I
hope we will see collaboration and cooperation between the two fields thrive in
the years to come.

Acknowledgements: I'd like to thank Kathryn Roeder, Rob Tibshirani, Ryan Tibshirani and Isa Verdinelli for reading a draft of this essay and providing helpful suggestions.

Bibliography

[1] M. Azizyan, A. Singh, and L. Wasserman. Density-sensitive semisupervised inference. The Annals of Statistics, 2013.

[2] J. E. Chacón. Clusters and water flows: a novel approach to modal clustering through Morse theory. arXiv preprint arXiv:1212.1384, 2012.
[3] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, May 2002.

[4] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic
regression: a statistical view of boosting (with discussion and a rejoinder
by the authors). The Annals of Statistics, 28(2):337–407, 2000.
[5] Keinosuke Fukunaga and Larry D. Hostetler. The estimation of the gradi-
ent of a density function, with applications in pattern recognition. IEEE
Transactions on Information Theory, 21:32–40, 1975.
[6] Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and
Larry Wasserman. Manifold estimation and singular deconvolution under
Hausdorff loss. The Annals of Statistics, 40:941–963, 2012.

[7] Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Minimax manifold estimation. Journal of Machine Learning Research, pages 1263–1291, 2012.
[8] C.R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman. Non-
parametric ridge estimation. arXiv preprint arXiv:1212.5156, 2012.

[9] M.C. Irwin. Smooth dynamical systems, volume 94. Academic Press, 1980.
[10] U. Ozertem and D. Erdogmus. Locally defined principal curves and surfaces. Journal of Machine Learning Research, 12:1249–1286, 2011.
