Front Matter
Advisors:
George Casella, Stephen Fienberg, Ingram Olkin
Springer Texts in Statistics
Alfred: Elements of Statistics for the Life and Social Sciences
Berger: An Introduction to Probability and Stochastic Processes
Bilodeau and Brenner: Theory of Multivariate Statistics
Blom: Probability and Statistics: Theory and Applications
Brockwell and Davis: Introduction to Time Series and Forecasting, Second
Edition
Chow and Teicher: Probability Theory: Independence, Interchangeability,
Martingales, Third Edition
Christensen: Advanced Linear Modeling: Multivariate, Time Series, and
Spatial Data—Nonparametric Regression and Response Surface
Maximization, Second Edition
Christensen: Log-Linear Models and Logistic Regression, Second Edition
Christensen: Plane Answers to Complex Questions: The Theory of Linear
Models, Third Edition
Creighton: A First Course in Probability Models and Statistical Inference
Davis: Statistical Methods for the Analysis of Repeated Measurements
Dean and Voss: Design and Analysis of Experiments
du Toit, Steyn, and Stumpf: Graphical Exploratory Data Analysis
Durrett: Essentials of Stochastic Processes
Edwards: Introduction to Graphical Modelling, Second Edition
Finkelstein and Levin: Statistics for Lawyers
Flury: A First Course in Multivariate Statistics
Jobson: Applied Multivariate Data Analysis, Volume I: Regression and
Experimental Design
Jobson: Applied Multivariate Data Analysis, Volume II: Categorical and
Multivariate Methods
Kalbfleisch: Probability and Statistical Inference, Volume I: Probability,
Second Edition
Kalbfleisch: Probability and Statistical Inference, Volume II: Statistical
Inference, Second Edition
Karr: Probability
Keyfitz: Applied Mathematical Demography, Second Edition
Kiefer: Introduction to Statistical Inference
Kokoska and Nevison: Statistical Tables and Formulae
Kulkarni: Modeling, Analysis, Design, and Control of Stochastic Systems
Lange: Applied Probability
Lehmann: Elements of Large-Sample Theory
Lehmann: Testing Statistical Hypotheses, Second Edition
Lehmann and Casella: Theory of Point Estimation, Second Edition
Lindman: Analysis of Variance in Experimental Design
Lindsey: Applying Generalized Linear Models
Larry Wasserman
All of Nonparametric Statistics
With 52 Illustrations
Larry Wasserman
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213-3890
USA
Editorial Board
George Casella, Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA
Stephen Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
Ingram Olkin, Department of Statistics, Stanford University, Stanford, CA 94305, USA
ISBN-10: 0-387-25145-6
ISBN-13: 978-0387-25145-5
springeronline.com
To Isa
Preface
The book has a mixture of methods and theory. The material is meant
to complement more method-oriented texts such as Hastie et al. (2001) and
Ruppert et al. (2003).
After the Introduction in Chapter 1, Chapters 2 and 3 cover topics related to
the empirical cdf such as the nonparametric delta method and the bootstrap.
Chapters 4 to 6 cover basic smoothing methods. Chapters 7 to 9 have a higher
theoretical content and are more demanding. The theory in Chapter 7 lays the
foundation for the orthogonal function methods in Chapters 8 and 9. Chapter
10 surveys some of the omitted topics.
I assume that the reader has had a course in mathematical statistics such
as Casella and Berger (2002) or Wasserman (2004). In particular, I assume
that the following concepts are familiar to the reader: distribution functions,
convergence in probability, convergence in distribution, almost sure convergence,
likelihood functions, maximum likelihood, confidence intervals, the
delta method, bias, mean squared error, and Bayes estimators. These background
concepts are reviewed briefly in Chapter 1.
Data sets and code can be found at:
www.stat.cmu.edu/~larry/all-of-nonpar
I need to make some disclaimers. First, the topics in this book fall under
the rubric of “modern nonparametrics.” The omission of traditional methods
such as rank tests and so on is not intended to belittle their importance. Second,
I make heavy use of large-sample methods. This is partly because I think
that statistics is, largely, most successful and useful in large-sample situations,
and partly because it is often easier to construct large-sample, nonparametric
methods. The reader should be aware that large-sample methods can, of
course, go awry when used without appropriate caution.
I would like to thank the following people for providing feedback and suggestions:
Larry Brown, Ed George, John Lafferty, Feng Liang, Catherine Loader,
Jiayang Sun, and Rob Tibshirani. Special thanks to some readers who provided
very detailed comments: Taeryon Choi, Nils Hjort, Woncheol Jang,
Chris Jones, Javier Rojo, David Scott, and one anonymous reader. Thanks
also go to my colleague Chris Genovese for lots of advice and for writing the
LaTeX macros for the layout of the book. I am indebted to John Kimmel,
who has been supportive and helpful and did not rebel against the crazy title.
Finally, thanks to my wife Isabella Verdinelli for suggestions that improved
the book and for her love and support.
Larry Wasserman
Pittsburgh, Pennsylvania
July 2005
Contents
1 Introduction
1.1 What Is Nonparametric Inference?
1.2 Notation and Background
1.3 Confidence Sets
1.4 Useful Inequalities
1.5 Bibliographic Remarks
1.6 Exercises
5 Nonparametric Regression
5.1 Review of Linear and Logistic Regression
5.2 Linear Smoothers
5.3 Choosing the Smoothing Parameter
5.4 Local Regression
5.5 Penalized Regression, Regularization and Splines
5.6 Variance Estimation
5.7 Confidence Bands
5.8 Average Coverage
5.9 Summary of Linear Smoothing
5.10 Local Likelihood and Exponential Families
5.11 Scale-Space Smoothing
5.12 Multiple Regression
5.13 Other Issues
5.14 Bibliographic Remarks
5.15 Appendix
5.16 Exercises
Bibliography
Index