Getting Started With R: An Introduction For Biologists. ISBN 9780198787846, 978-0198787846
Getting Started with R
An Introduction for Biologists
Second Edition
ANDREW P. BECKERMAN
DYLAN Z. CHILDS
OWEN L. PETCHEY
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Andrew Beckerman, Dylan Childs, & Owen Petchey 2017
The moral rights of the authors have been asserted
First Edition published in 2012
Second Edition published in 2017
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2016946804
ISBN 978–0–19–878783–9 (hbk.)
ISBN 978–0–19–878784–6 (pbk.)
DOI 10.1093/acprof:oso/9780198787839.001.0001
Printed and bound by
CPI Litho (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Contents
Preface
Introduction to the second edition
What this book is about
How the book is organized
Why R?
Updates
Acknowledgements
Index
Preface
our focus on helping you get started using R. We love doing this and we’ve
been teaching this for 15 years. Not surprisingly, many of you are also find-
ing that this getting-started book is great for undergraduate and graduate
teaching. We thank you all for your feedback!
Second, we have substantially revised how we use, and thus suggest you
use, R. Our changes and suggestions take advantage of some new and very
cool, efficient, and straightforward tools. We think these changes will help
you focus even more on your data and questions. This is good.
If you compare this second edition with the first, you will find sev-
eral differences. We no longer rely on base R tools and graphics for data
manipulation and figure making, instead introducing dplyr and ggplot2.
We’ve also expanded the set of basic statistics we introduce to you, includ-
ing new examples of a simple regression and a one-way and a two-way
ANOVA, in addition to the old ANCOVA example. Third, we provide an
entire new chapter on the generalized linear model. Oh, yes, and we have
added an author, Dylan.
But the tools and their syntax were designed a long time ago. Many em-
ploy a rather idiosyncratic set of symbols and syntax to accomplish tasks.
For example, square brackets are used for selecting parts of datasets, and
dollar signs for referring to particular variables. Sometimes different tools
that perform similar tasks work in very different ways. This makes for ra-
ther idiosyncratic instructions that are not so easy for people to read or to
remember how to write.
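To make the classic syntax mentioned above concrete, here is a minimal sketch of square brackets and dollar signs in base R. The data frame `limpets` and its columns are invented purely for illustration; they are not one of the book's datasets.

```r
# A tiny, made-up data frame (hypothetical values, for illustration only)
limpets <- data.frame(
  site   = c("A", "A", "B", "B"),
  length = c(10.2, 11.5, 9.8, 12.1)
)

# Dollar sign: refer to one variable (column) by name
limpets$length

# Square brackets: select parts of the dataset as [rows, columns]
limpets[limpets$site == "B", ]   # all columns, only rows where site is "B"
limpets[1:2, "length"]           # rows 1 and 2 of the 'length' column
```

Note how the same brackets do quite different jobs depending on what goes before and after the comma, which is part of what makes these instructions hard to read.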
So after much deliberation, and some good experiences, we decided
that in this second edition we would introduce a popular and new set of
tools contributed by Sir¹ Hadley Wickham and many key collaborators
(http://had.co.nz). These new tools introduce a quite standardized
and coherent syntax and exist in a set of add-on packages; you
will learn exactly what these are and how to use them later. And you will
also learn some base R. In fact, you will learn a great deal of base R.
We decided to teach this new way of using R because:
• The tools use a more ‘natural language’ that is easier for humans to
work with.
• The standardization and coherence among the tools make them easy
to learn and use.
• The tools work very well for simple and small problems, but also scale
very intuitively and naturally to quite complex and large problems.
• There are tools for every part of the workflow, from data management
to statistical analysis and making beautiful graphs.
• Each of us independently migrated to this new set of tools, giving
us greater confidence that it’s the way forward. (Well, Andrew was
forced a bit.)
Though we are confident that teaching newcomers these new tools is the
right thing to do, there are some risks and, in particular, people taught only
these new tools may not be able to work easily with people or code using
the classic way. Furthermore, some colleagues have questioned the wisdom
of teaching this ‘modern’ approach to entry-level students (i.e. those
with no or little previous experience with R), especially if taught in the
absence of the classic approach (funnily enough, many of these ‘concerned’
colleagues don’t use R at all!). Certainly the risks mentioned above are real,
and for that reason we provide a short appendix in Chapter 3 (the chapter
on Data management) that links the classic and new methods. The classic
way can still sometimes be the best way. And old dogs don’t often agree to
learning new tricks.
¹ Unofficial knighthood for contributions to making our R-lives so much easier and
beautiful.
Another concern voiced asks why we’re teaching ‘advanced R’ at entry
level, with the idea that the use of new tools and add-on packages im-
plies ‘advanced’. After all, why wouldn’t the ‘base’ R distribution contain
everything an entry-level user needs? Well, it does, but we’ve found the
standardization and syntax in the add-on packages to be valuable even for
us as seasoned users. And one should not read ‘base’ R distribution as ‘ba-
sic’ R distribution, or ‘add-on’ package as ‘advanced’ package. The ‘base’
distribution contains many advanced tools, and many add-on packages
contain very basic tools.
We hope you enjoy this new Getting Started with R.
There are a few things that you need to know to make this book, and our
ideas, work for you. Many of you already know how to do most of these
things, having been in the Internet age for long enough now, but just to
be sure:
1. You need to know how to download things from the Internet. If you
use Windows, Macintosh, or Linux, the principles are the same, but
the details are different. Know your operating system. Know your
browser and know your mouse/trackpad.
2. You need to know how to make folders on your computer and save
files to them. This is essential for being organized and efficient.
3. It is useful, though not essential, to understand what a ‘path’ is on
your computer. This is the address of a folder or a file (i.e. the path
to a file). On Windows, depending on the type you are using, this in-
volves a drive name, a colon (:), and slashes (\ or /). On a Macintosh
and Linux/Unix, this requires the names of your hard drive, the
name of your home directory, a tilde (~), the names of folders, and
slashes (/).
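If the idea of a path in point 3 feels abstract, the following sketch shows how R itself can build and take apart a path. The folder name `my_project` and the file name `data.csv` are hypothetical; nothing in this book requires them to exist on your computer.

```r
# '~' is shorthand for your home directory; path.expand() shows the full path
home <- path.expand("~")

# file.path() joins folder and file names with forward slashes,
# which R accepts on Windows, Macintosh, and Linux alike
# ("my_project" and "data.csv" are hypothetical names)
data_path <- file.path(home, "my_project", "data.csv")
print(data_path)

# Split a path back into its folder part and its file part
dirname(data_path)
basename(data_path)

# Ask whether anything actually exists at that address
file.exists(data_path)
```

The printed path will differ between operating systems, which is exactly the point: the principle is the same, the details are not.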
In this book, we will show you how to use R in the context of everyday re-
search in biology (or, indeed, in many other disciplines). Our philosophy
assumes that you have some data and would like to derive some under-
standing from it. Typically you need to manage your data, explore your
data (e.g. by plotting it), and then analyse your data. Before any attempt at
analysis, we suggest (no, demand!) that you always plot your data. As al-
ways, analysing (modelling) your data involves first developing a model
that accurately reflects your question, and then testing critical assump-
tions associated with the statistical method (model). Only after this do
you attempt interpretation. Our focus is on developing a rigorous and ef-
ficient routine (workflow) and a template for using R for data exploration,
visualization, and analysis. We believe that this will give you a functional
approach to using R, in which you always have the goal (understanding
your data, answering your question) in mind.
Chapter 1 is about getting R and getting acquainted with it. The chap-
ter is a bit like when you first meet someone who might be your friend,
or might not, so you take some time to get to know each other. We also
introduce you to another friend, RStudio, and strongly recommend that
you get to know this one, as well as R. RStudio is just great. You will fall in
love with it.
Chapter 2 is about getting your data ready for R, getting it into R, and
checking it got into R correctly. Not many courses cover data preparation
as much as in this chapter, but it’s really essential for an efficient experience
with R. Good preparation makes for great performance. We give tips about
what can go wrong here, how to recognize this, and how to fix it.
Chapter 3 focuses on how you work with data once it’s in R. Usually
you’ll need to do some data manipulation before making a graph or doing a
statistical analysis. You might need to subset your data, or want to calculate
mean ± SE. We walk you through some very efficient and clear methods
for doing all kinds of data manipulations.
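As a rough idea of what subsetting and calculating a mean ± SE involve, here is a minimal sketch that uses only base R so that it runs without add-on packages. It is not the method taught in Chapter 3, and the data frame `growth`, its columns, and the helper function `se` are all invented for illustration.

```r
# Made-up data (hypothetical values, for illustration only)
growth <- data.frame(
  diet = c("control", "control", "control", "added", "added", "added"),
  gain = c(4.0, 5.0, 6.0, 7.0, 9.0, 11.0)
)

# Subset: keep only the rows for one diet
control_only <- subset(growth, diet == "control")

# Standard error of the mean: standard deviation / sqrt(sample size)
se <- function(x) sd(x) / sqrt(length(x))

# Mean and SE of 'gain' for each diet
means <- tapply(growth$gain, growth$diet, mean)
ses   <- tapply(growth$gain, growth$diet, se)
data.frame(diet = names(means), mean = as.vector(means), se = as.vector(ses))
```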
Chapter 4 is about visualizing your data, and comes before the chapters
about statistical analyses because we always visualize our data before we do
any statistics (you will hear that again and again throughout this book).
We introduce you to scatterplots, histograms, and box-and-whisker plots.
In later chapters, we also introduce you to plots of means and standard
errors. (But we do not introduce you to bar charts with error bars, because
they are evil².)
Chapters 5, 6, and 7 finally get on to doing some statistics. Chapter 5
introduces ‘basic’ statistical tests (t-test, chi-squared contingency table
analyses, simple linear regression, and the one-way ANOVA). Chapter 6
covers slightly more complex tests (two-way ANOVA and ANCOVA). And
Chapter 7 takes us to new territory, where we introduce about the
simplest generalized linear model around: a Poisson regression. As we said,
we are introducing how to do stuff in R and we’re not aiming to cover
lots of statistics in great detail, but along the way we try to ensure that
your understanding of statistics maps onto the output you can get from
using R. We’ve added this ‘getting started with generalized linear models’
chapter because so many types of question in the biological sciences
demand it. Our goal is that you should have seen enough variety of analysis
methods to be comfortable and confident in moving forward and learning
more yourself.
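For a first taste of what a Poisson regression looks like in R, here is a minimal sketch using the base function `glm()`. The data are simulated, and the variable names `body_size` and `offspring` are assumptions made up for this example rather than anything from Chapter 7.

```r
# Simulated count data (made up for illustration; not from the book)
set.seed(1)
body_size <- seq(1, 10, length.out = 50)
offspring <- rpois(50, lambda = exp(0.2 + 0.15 * body_size))
counts <- data.frame(body_size, offspring)

# Poisson regression: a generalized linear model for count data,
# which uses a log link by default
fit <- glm(offspring ~ body_size, data = counts, family = poisson)
summary(fit)
```

In a real analysis you would plot `offspring` against `body_size` before fitting anything, and check the model's assumptions before interpreting the summary.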
Chapter 8 comes back to figures and graphs. It is about how to make
your graphs look even more beautiful than they were during the previous
chapters. Put another way, it’s about pimping your graphs. Making the
labels, symbols, colours, shading, sizes, and everything else you might like
to change look beautiful, coordinated, and clear, so readers are amazed by
the clarity with which they can see your findings. It will also give you the
skills and flexibility to make atrocious graphs . . . be careful.
² http://dx.doi.org/10.1371/journal.pbio.1002128
The final chapter, Chapter 9, wraps all this up and provides encouragement. It is
brief. We figure that by this point, you’ll have had enough of us, and will be
raring to get your own data into R. And that is great, because that is when
you’ll really solidify your learning.
Why R?
If you’ve got this far, you probably know you want to learn R. Some of you
will have established research careers based around using a variety of stat-
istical and graphing packages. Some of you will be starting your research
career and wondering whether you should use some of the packages and
applications that your supervisor/research group uses, or jump ship to R.
Perhaps your group already uses R and you are just looking for that ‘get-
ting started’ book that answers what you think are embarrassing questions.
Regardless of your stage or background, we think an informal but struc-
tured introduction to an approach and routine for using R will help. And
regardless of the motivation, we finish the Preface here by introducing a
Updates
RStudio evolves quickly, so don’t worry if what you see on your computer
is a little different from what’s printed in this book. For example, as this
book went to press, RStudio started using a new method for importing
data. We quickly updated the most important parts of the book, but for
a full account of this change, and any others, look on the book web site
www.r4all.org/the-book.
Acknowledgements
Thanks to our wives, Sophie, Amanda, and Sara, for everything. After all
these years, they know about R too. Many thanks to Ian Sherman and Lucy
Nash at OUP for their guidance, support and encouragement, to Douglas
Meekison for excellent copy-editing, and Philip Alexander for patiently
dealing with countless “final” fixes!
1 Getting and Getting Acquainted with R
Getting Started with R Second Edition. Andrew Beckerman, Dylan Childs, & Owen Petchey:
Oxford University Press (2017). © Andrew Beckerman, Dylan Childs, & Owen Petchey.
DOI 10.1093/acprof:oso/9780198787839.001.0001
We will first walk you through getting and installing R and getting and
installing RStudio. While for many this will be trivial, our experience
suggests that many of you probably need a tiny bit of hand-holding every once
in a while.
1.2 Getting R
Figure 1.1 The CRAN website front page, from where you can find the links to
download the R application.
The top box on CRAN provides access to the three major classes of op-
erating systems. Simply click on the link for your operating system. As we
mentioned in the Preface, R remains freely available.
You’ll hear our next recommendation quite a bit throughout the book:
read the instructions. The instructions will take you through the pro-
cesses of downloading R and installing it on your computer. It might also
make sense to examine some of the Frequently Asked Questions found at
the bottom of the web page. R has been around quite a long time now,
and these FAQs reflect more than a decade of beginners like you asking
questions about how R works, etc. Go on . . . have a look!
1.2.1 LINUX/UNIX
Moving along now, the Linux link takes you to several folders for flavours
of Linux and Unix. Within each of those is a set of instructions. We’ll as-
sume that if you know enough to have a Linux or Unix machine under
your fine fingertips, you can follow these instructions and take advantage
of the various tools.
1.2.2 WINDOWS
The Windows link takes you to a page with three more links. The link
you want to focus on is ‘base’. You will also notice that there is a link to
the aforementioned R FAQs and an additional R for Windows FAQs. Go
on . . . have a look! There is a tonne of good stuff in there about the various
ways R works on Windows NT, Vista, 8, 10, etc. The base link moves you
further on to instructions and the installer, as shown in Figure 1.2.
1.2.3 MACINTOSH
The (Mac) OS X link takes you to a page with several links as well
(Figure 1.3). Unless you are on a super-old machine, the first link is the
one on which you want to focus. It will download the latest version of R
for several recent distributions of OS X and offer, via a .dmg installer, to
put everything where it needs to be. Note that while not required for
‘getting started’, getting the XQuartz X11 windowing system is a good idea;
a link is provided just below the paragraph describing the installer (see
Figure 1.3). As with Windows, the R FAQs and an additional R for OS X
FAQs are provided . . . they are good things.
So, at this stage, you should have downloaded and installed R. Well done!
However, we are not going to use R directly. Our experience suggests that
you will enjoy your R-life a lot more if you interact with R via a different
program, also freely available: the software application RStudio. RStudio is
a lovely, cross-platform application that makes interacting with R quite a
bit easier and more pleasurable. Among other things, it makes importing
data a breeze, has a standardized look and feel on all platforms, and has
several tools that make it much easier to keep track of the instructions you
have to give R to make the magic happen.
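Once R is installed, one simple way to confirm that it is working is to type a couple of instructions at the R console (whether inside RStudio or not). This is only a suggested check, not a step the installers require; both items used here are part of base R.

```r
# Which version of R is running?
R.version.string

# More detail: platform, operating system, and attached packages
sessionInfo()
```

If both produce output rather than an error, R is installed and talking to you.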
Figure 1.4 The RStudio website front page, from where you can find the links
to download the RStudio application. (Note: you must (as you have done) also
download the R application from the CRAN website.)