Advanced R
The R Series
“The author has become one of the foremost authorities on this
topic and is well known and appreciated throughout the entire R
community. This is the great strength of the book and the primary
reason it deserves to be published. It addresses a topic where there
is already a growing number of books, but few have the depth, the
technical accuracy, and the authority of this one.”
—Bill Venables, CSIRO
Advanced R presents useful tools and techniques for attacking
many types of R programming problems, helping you avoid mistakes
and dead ends. With more than ten years of experience programming
in R, the author illustrates the elegance, beauty, and flexibility
at the heart of R.
The book develops the necessary skills to produce quality code that
can be used in a variety of circumstances. You will learn:
• The fundamentals of R, including standard data types and
functions
• Functional programming as a useful framework for solving wide
classes of problems
• The positives and negatives of metaprogramming
• How to write fast, memory-efficient code
This book not only helps current R users become R programmers
but also shows existing programmers what’s special about R.
Intermediate R programmers can dive deeper into R and learn new
strategies for solving diverse problems while programmers from other
languages can learn the details of R and understand why R works
the way it does.
Hadley Wickham
K20319
www.crcpress.com
Series Editors
John M. Chambers Torsten Hothorn
Department of Statistics Division of Biostatistics
Stanford University University of Zurich
Stanford, California, USA Switzerland
Customer and Business Analytics: Applied Data Mining for Business Decision
Making Using R, Daniel S. Putler and Robert E. Krider
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To Jeff, who makes me happy, and who made
sure I had a life outside this book.
Contents
1 Introduction 1
1.1 Who should read this book . . . . . . . . . . . . . . . . 3
1.2 What you will get out of this book . . . . . . . . . . . . 3
1.3 Meta-techniques . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Recommended reading . . . . . . . . . . . . . . . . . . 5
1.5 Getting help . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . 6
1.7 Conventions . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8 Colophon . . . . . . . . . . . . . . . . . . . . . . . . . . 8
I Foundations 11
2 Data structures 13
2.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Atomic vectors . . . . . . . . . . . . . . . . . . . 15
2.1.1.1 Types and tests . . . . . . . . . . . . . 16
2.1.1.2 Coercion . . . . . . . . . . . . . . . . . 16
2.1.2 Lists . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Attributes . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.0.1 Names . . . . . . . . . . . . . . . . . . 20
2.2.1 Factors . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.2 Exercises . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Matrices and arrays . . . . . . . . . . . . . . . . . . . . 24
2.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Data frames . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Creation . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.2 Testing and coercion . . . . . . . . . . . . . . . . 28
2.4.3 Combining data frames . . . . . . . . . . . . . . 28
2.4.4 Special columns . . . . . . . . . . . . . . . . . . . 29
2.4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . 30
2.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Subsetting 33
3.1 Data types . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.1 Atomic vectors . . . . . . . . . . . . . . . . . . . 34
3.1.2 Lists . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.3 Matrices and arrays . . . . . . . . . . . . . . . . 37
3.1.4 Data frames . . . . . . . . . . . . . . . . . . . . . 38
3.1.5 S3 objects . . . . . . . . . . . . . . . . . . . . . . 39
3.1.6 S4 objects . . . . . . . . . . . . . . . . . . . . . . 39
3.1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Subsetting operators . . . . . . . . . . . . . . . . . . . . 40
3.2.1 Simplifying vs. preserving subsetting . . . . . . . 41
3.2.2 $ . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.3 Missing/out of bounds indices . . . . . . . . . . . 44
3.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . 45
3.3 Subsetting and assignment . . . . . . . . . . . . . . . . 45
3.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.1 Lookup tables (character subsetting) . . . . . . . 46
3.4.2 Matching and merging by hand (integer subsetting) . . . 47
3.4.3 Random samples/bootstrap (integer subsetting) 48
3.4.4 Ordering (integer subsetting) . . . . . . . . . . . 49
4 Vocabulary 57
4.1 The basics . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Common data structures . . . . . . . . . . . . . . . . . 59
4.3 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 Working with R . . . . . . . . . . . . . . . . . . . . . . 61
4.5 I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 Style guide 63
5.1 Notation and naming . . . . . . . . . . . . . . . . . . . 63
5.1.1 File names . . . . . . . . . . . . . . . . . . . . . 63
5.1.2 Object names . . . . . . . . . . . . . . . . . . . . 64
5.2 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.1 Spacing . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.2 Curly braces . . . . . . . . . . . . . . . . . . . . 66
5.2.3 Line length . . . . . . . . . . . . . . . . . . . . . 67
5.2.4 Indentation . . . . . . . . . . . . . . . . . . . . . 67
5.2.5 Assignment . . . . . . . . . . . . . . . . . . . . . 67
5.3 Organisation . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3.1 Commenting guidelines . . . . . . . . . . . . . . 68
6 Functions 69
6.1 Function components . . . . . . . . . . . . . . . . . . . 71
6.1.1 Primitive functions . . . . . . . . . . . . . . . . . 71
6.1.2 Exercises . . . . . . . . . . . . . . . . . . . . . . 72
6.2 Lexical scoping . . . . . . . . . . . . . . . . . . . . . . . 73
6.2.1 Name masking . . . . . . . . . . . . . . . . . . . 74
6.2.2 Functions vs. variables . . . . . . . . . . . . . . . 75
6.2.3 A fresh start . . . . . . . . . . . . . . . . . . . . 76
6.2.4 Dynamic lookup . . . . . . . . . . . . . . . . . . 77
6.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . 78
6.3 Every operation is a function call . . . . . . . . . . . . 79
6.4 Function arguments . . . . . . . . . . . . . . . . . . . . 81
6.4.1 Calling functions . . . . . . . . . . . . . . . . . . 81
6.4.2 Calling a function given a list of arguments . . . 83
6.4.3 Default and missing arguments . . . . . . . . . . 83
6.4.4 Lazy evaluation . . . . . . . . . . . . . . . . . . . 84
6.4.5 ... . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . 89
6.5 Special calls . . . . . . . . . . . . . . . . . . . . . . . . 89
6.5.1 Infix functions . . . . . . . . . . . . . . . . . . . 90
6.5.2 Replacement functions . . . . . . . . . . . . . . . 91
6.5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . 93
6.6 Return values . . . . . . . . . . . . . . . . . . . . . . . 94
6.6.1 On exit . . . . . . . . . . . . . . . . . . . . . . . 97
6.6.2 Exercises . . . . . . . . . . . . . . . . . . . . . . 97
6.7 Quiz answers . . . . . . . . . . . . . . . . . . . . . . . . 98
7 OO field guide 99
7.1 Base types . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.2 S3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.2.1 Recognising objects, generic functions, and methods . . . 102
7.2.2 Defining classes and creating objects . . . . . . . 105
7.2.3 Creating new methods and generics . . . . . . . 106
7.2.4 Method dispatch . . . . . . . . . . . . . . . . . . 107
7.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . 109
7.3 S4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.3.1 Recognising objects, generic functions, and methods . . . 111
7.3.2 Defining classes and creating objects . . . . . . . 113
7.3.3 Creating new methods and generics . . . . . . . 115
7.3.4 Method dispatch . . . . . . . . . . . . . . . . . . 115
7.3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . 116
7.4 RC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.4.1 Defining classes and creating objects . . . . . . . 117
7.4.2 Recognising objects and methods . . . . . . . . . 119
7.4.3 Method dispatch . . . . . . . . . . . . . . . . . . 119
7.4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . 120
7.5 Picking a system . . . . . . . . . . . . . . . . . . . . . . 120
7.6 Quiz answers . . . . . . . . . . . . . . . . . . . . . . . . 121
8 Environments 123
8.1 Environment basics . . . . . . . . . . . . . . . . . . . . 124
8.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 130
8.2 Recursing over environments . . . . . . . . . . . . . . . 130
8.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 132
8.3 Function environments . . . . . . . . . . . . . . . . . . 133
11 Functionals 199
11.1 My first functional: lapply() . . . . . . . . . . . . . . . 201
11.1.1 Looping patterns . . . . . . . . . . . . . . . . . . 203
11.1.2 Exercises . . . . . . . . . . . . . . . . . . . . . . 204
11.2 For loop functionals: friends of lapply() . . . . . . . . . 205
11.2.1 Vector output: sapply and vapply . . . . . . . . . 205
11.2.2 Multiple inputs: Map (and mapply) . . . . . . . . . 207
11.2.3 Rolling computations . . . . . . . . . . . . . . . 209
11.2.4 Parallelisation . . . . . . . . . . . . . . . . . . . . 212
11.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . 213
11.3 Manipulating matrices and data frames . . . . . . . . . 214
11.3.1 Matrix and array operations . . . . . . . . . . . . 214
11.3.2 Group apply . . . . . . . . . . . . . . . . . . . . 216
11.3.3 The plyr package . . . . . . . . . . . . . . . . . . 217
14 Expressions 281
14.1 Structure of expressions . . . . . . . . . . . . . . . . . . 282
14.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 286
14.2 Names . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
14.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 287
14.3 Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
14.3.1 Modifying a call . . . . . . . . . . . . . . . . . . 289
14.3.2 Creating a call from its components . . . . . . . 290
IV Performance 329
16 Performance 331
16.1 Why is R slow? . . . . . . . . . . . . . . . . . . . . . . 332
16.2 Microbenchmarking . . . . . . . . . . . . . . . . . . . . 333
16.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 334
16.3 Language performance . . . . . . . . . . . . . . . . . . 335
16.3.1 Extreme dynamism . . . . . . . . . . . . . . . . . 335
16.3.2 Name lookup with mutable environments . . . . 337
16.3.3 Lazy evaluation overhead . . . . . . . . . . . . . 339
16.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . 340
16.4 Implementation performance . . . . . . . . . . . . . . . 341
16.4.1 Extracting a single value from a data frame . . . 341
16.4.2 ifelse(), pmin(), and pmax() . . . . . . . . . . . 342
16.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . 344
16.5 Alternative R implementations . . . . . . . . . . . . . . 344
18 Memory 377
18.1 Object size . . . . . . . . . . . . . . . . . . . . . . . . . 378
18.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 382
18.2 Memory usage and garbage collection . . . . . . . . . . 383
18.3 Memory profiling with lineprof . . . . . . . . . . . . . . 385
18.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 388
18.4 Modification in place . . . . . . . . . . . . . . . . . . . 389
18.4.1 Loops . . . . . . . . . . . . . . . . . . . . . . . . 392
18.4.2 Exercises . . . . . . . . . . . . . . . . . . . . . . 393
Index 451