JSS Journal of Statistical Software
April 2017, Volume 77, Book Review 1. doi: 10.18637/jss.v077.b01

Reviewer: Christopher J. Lortie

York University and NCEAS

R for Data Science

Hadley Wickham, Garrett Grolemund

O’Reilly, Canada, 2016.
ISBN 978-1-4919-1039-9. 522 pp. USD 39.11 (P).
http://r4ds.had.co.nz/

Data science is a complex domain, and decisions associated with wrangling big and little data
are non-trivial (Gandomi and Haider 2015; Peters, Havstad, Cushing, Tweedie, Fuentes, and
Villanueva-Rosales 2014; Marx 2013). This book is written as a general resource for R that
provides a complete data science workflow, i.e., a sequence of steps, each supported by specific
packages. This workflow anchors the book and is laid out immediately in the preface: import,
tidy, iterate through transform, visualize, and model, and finally communicate. The workflow is
described in text and illustrated, and sets of chapters are linked to it throughout the book.
This structure provides an excellent backbone to the content and facilitates its use as a
resource, because one can easily revisit a specific chapter for reference when working through
a real problem. The meaning of each step is self-explanatory but is nonetheless well defined
and demonstrated by worked examples throughout the book.
The efficacy of communication for this statistical software and implementation book was
evaluated using the following criteria: clarity of writing, supporting visuals that make complex
data science concepts accessible, and an appropriate balance between detail and general
understanding of process. ‘R for Data Science’ was successful in all three dimensions of
communication. The writing is direct. Most chapters lead with code and examples, and the
description follows. This exposes the reader more rapidly to the material needed to grasp
and do the data science. The book is primarily written in a show-then-tell format, and this
approach reduces the need for the reader to process large chunks of description (introductions
are very brief in each chapter). Telling the reader how to do something, rather than showing it
directly, can of course be appropriate in some contexts, and readers have different learning
styles. Nonetheless, showing the data science first engages and challenges the reader to read
the R code and learn its grammar. Reading code others have written is an important skill,
and considering a problem before seeing the solution stimulates deeper learning. If anything,
the problem-solution model could have been developed even further in the writing, but I
recognize that this can sometimes come at the cost of clarity and can tax the patience of
readers at different levels. Exercises are provided to consolidate learning, and they are
pitched at the right level for each chapter. The supporting visuals excel (but not
Excel, pardon the pun) at visualizing the layered grammar of graphics in ggplot2, relational
data with dplyr, and subsetting with vectors. Visual learners will appreciate the concepts
illustrated, the use of color, and a certain favorite, the pepper shaker with a pepper packet
in it and pepper in the packet, used to illustrate subsetting of lists of lists. Most chapters
balance detail and general understanding of process well. This is not to say that the details
of coding were never a challenge to reconcile with the big picture; many data science and
coding concepts are complex. The ‘Iteration with purrr’ chapter was a challenge in merging and
contrasting the details of different options such as ‘for loops’ versus functionals. However,
later chapters, such as those in the model section, struck a better balance. This difference may
in part be due to an audience experience bias, such as one’s background in statistics versus
data science, which suggests that different audiences will capitalize on the show-then-tell
approach to different degrees depending on their experience. The book is thus well pitched for
beginner to intermediate data scientists and likely for statisticians with an intermediate level
of experience with data science concepts and approaches. The communication and writing
style is accessible and not unduly technical for all readers.
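Two of the points above can be sketched briefly in R. The first part mirrors the pepper-shaker analogy for subsetting lists of lists with `[` versus `[[`; the second contrasts a for loop with a functional, assuming the purrr package (for `map_dbl()`) is installed. All data and object names are hypothetical and chosen only for illustration; this is not code taken from the book.

```r
library(purrr)

# --- Subsetting a list of lists (the pepper-shaker analogy) ---
# Hypothetical shaker holding two packets, each holding some pepper.
shaker <- list(
  packet1 = list(pepper = "black"),
  packet2 = list(pepper = "white")
)

one_packet_shaker <- shaker[1]        # `[` returns a smaller shaker: still a list
packet            <- shaker[[1]]      # `[[` takes the packet out of the shaker
pepper            <- shaker[[1]][[1]] # a second `[[` takes the pepper out of the packet

# --- For loop versus functional ---
# Hypothetical numeric data frame; compute the mean of each column.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(10, 20, 30))

# For loop: preallocate the output, then fill it one column at a time.
means_loop <- vector("double", ncol(df))
for (i in seq_along(df)) {
  means_loop[i] <- mean(df[[i]])
}
names(means_loop) <- names(df)

# Functional: map_dbl() applies mean() to each column and
# returns a named double vector, with no manual bookkeeping.
means_map <- map_dbl(df, mean)

print(pepper)
print(means_map)
```

Both approaches give the same named vector of column means; the loop makes preallocation, indexing, and assignment explicit, while the functional hides them, which is the kind of trade-off an iteration chapter has to weigh for readers.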
There is extensive support for R available in the form of documentation (documentation for
R directly and reference manuals and vignettes for CRAN packages), FAQs, StackOverflow,
blogs, webinars, workshops, and many books (many of them free). Too much information, not
too little, is most likely the challenge for data scientists and statisticians working in R.
For the R community in particular, the breadth and scope of packages, discussion, and
documentation are unparalleled. Typically, this is a benefit in solving a problem, and
frequently there is not one single solution but many. However, processing and parsing
responses, solutions, and code from different sources is time consuming and, at times,
overwhelming. ‘R for Data Science’ is a logical, contemporary entry point that compiles a
relatively consistent set of current R packages into a clean data science workflow appropriate
for many purposes. The book is built on extensive package development, and both R and its
packages will continue to evolve. The book reframes and updates the ggplot2 book (Wickham
2009) and complements its second edition (Wickham 2016). It explains the philosophy and
grammar of this package succinctly. It also further develops the concept of ‘tidy data’ (i.e.,
columns as variables, rows as observations; Wickham 2014). This mapping of data is not
unique to the ‘tidyverse’, but this ecosystem offers functions that easily deal with some
frequent types of inconvenient data and that readily specify what constitutes a variable and
an observation, so that the concept of tidy data makes sense in practice. Tidy data thus sets
up data frames for more efficient processing. Through the book, this ecosystem of packages,
its grammar, and its way of thinking are better situated within the domain of data science.
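As a minimal sketch of what "columns as variables, rows as observations" means in practice, the snippet below reshapes a small, hypothetical table in which one variable (year) is hidden in the column names. It assumes tidyr version 1.0.0 or later, where `pivot_longer()` supersedes the `gather()` function used in the 2016 edition of the book; the site names, years, and counts are invented for illustration.

```r
library(tidyr)

# Hypothetical untidy table: one row per site, one column per year,
# so the variable "year" is stored in the column names.
untidy <- data.frame(
  site   = c("A", "B"),
  `2015` = c(10, 7),
  `2016` = c(12, 9),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Tidy form: every column is a variable (site, year, count) and
# every row is one observation (one site in one year).
tidy <- pivot_longer(
  untidy,
  cols      = c(`2015`, `2016`),
  names_to  = "year",
  values_to = "count"
)

print(tidy)
```

Once the data are in this shape, year can be filtered, grouped, or mapped to a plot aesthetic directly, rather than being recovered from column names.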
The novelty in this book is a coherent workflow across different concepts and packages. It
is a solid foundation for the statistician interested in learning and improving data handling
skills. For the data scientist versed in the extensive resources distributed online for R, it is
an integrated set of resources and sample code that can readily provide and affirm a literate,
reproducible philosophy of data science. It is not about efficient programming or coding in
R; it is about efficient data science.
‘R for Data Science’ is an excellent resource. Even if you are already familiar with this
ecosystem of packages and ideas, it is still valuable. You may be reading about many of the
approaches and tools you already use or have seen, but in seeing them organized and described,
in many instances by the authors of the packages, you gain novel insights. Even if you do not
agree with the assumptions in full, the documentation and logic described provide
a more complete sense of how data science needs, package development in R, and the goal of
integration are useful for statistical languages. Open science development can rapidly provide
us with new packages, but sometimes connecting and understanding them is a challenge.
This book is thus an excellent example of the value of documentation beyond vignettes that
facilitates deeper learning and appreciation of the landscape and not just the details of the
moment. When using R, it is not uncommon to be in the midst of a problem, rapidly look
up a solution online (from whatever resource works), and move on. The solution may or may
not come from a book, and if it does, one captures the relevant code or explanation from
the snippet only. This raises the question of whether to invest in a complete book. For this book, I
recommend the investment: time you enjoy wasting (on a technical book like this one) is not
wasted.

References

Gandomi A, Haider M (2015). “Beyond the Hype: Big Data Concepts, Methods, and
Analytics.” International Journal of Information Management, 35(2), 137–144.
doi:10.1016/j.ijinfomgt.2014.10.007.

Marx V (2013). “Biology: The Big Challenges of Big Data.” Nature, 498(7453), 255–260.
doi:10.1038/498255a.

Peters DPC, Havstad KM, Cushing J, Tweedie C, Fuentes O, Villanueva-Rosales N (2014).
“Harnessing the Power of Big Data: Infusing the Scientific Method with Machine Learning
to Transform Ecology.” Ecosphere, 5(6), 1–15. doi:10.1890/es13-00359.1.

Wickham H (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.

Wickham H (2014). “Tidy Data.” Journal of Statistical Software, 59, 1–23.
doi:10.18637/jss.v059.i10.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. 2nd edition.
Springer-Verlag, New York.

Reviewer:
Christopher J. Lortie
York University and NCEAS
Biology
Toronto, Canada, M3J1P3
E-mail: [email protected]
URL: http://www.christopherlortie.info/

Journal of Statistical Software http://www.jstatsoft.org/

published by the Foundation for Open Access Statistics http://www.foastat.org/
April 2017, Volume 77, Book Review 1 Published: 2017-04-03
doi:10.18637/jss.v077.b01
