Chapter 35 Introduction To Productivity Tools - Introduction To Data Science

This chapter introduces productivity tools for data science. It recommends using scripting languages like R instead of point-and-click tools for more flexibility and reproducibility. It discusses organizing files systematically, automating tasks, and minimizing mouse use. Specific tools introduced are the Unix shell for file management, Git for version control, GitHub for hosting code, and R Markdown for reproducible reports combining text and code. An example project on US gun murders is used to demonstrate these tools.

Chapter 35 Introduction to productivity tools

Generally speaking, we do not recommend using point-and-click approaches for data analysis. Instead,
we recommend scripting languages, such as R, since they are more flexible and greatly facilitate
reproducibility. Similarly, we recommend against the use of point-and-click approaches to organizing files
and document preparation. In this chapter, we demonstrate alternative approaches. Specifically, we will
learn to use freely available tools that, although at first may seem cumbersome and non-intuitive, will
eventually make you a much more efficient and productive data scientist.

Three general guiding principles that motivate what we learn here are 1) be systematic when organizing
your filesystem, 2) automate when possible, and 3) minimize the use of the mouse. As you become more
proficient at coding, you will find that 1) you want to minimize the time you spend remembering what you
called a file or where you put it, 2) if you find yourself repeating the same task over and over, there is
probably a way to automate, and 3) anytime your fingers leave the keyboard, it results in loss of
productivity.

A data analysis project is not always just a dataset and a script. A typical data analysis challenge may involve
several parts, each involving several data files, including files containing the scripts we use to analyze
data. Keeping all this organized can be challenging. We will learn to use the Unix shell as a tool for
managing files and directories on your computer system. Using the Unix shell will permit you to use the keyboard,
rather than the mouse, when creating folders, moving from directory to directory, and renaming, deleting,
or moving files. We also provide specific suggestions on how to keep the filesystem organized.
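
As a rough illustration of the keyboard-only file management described above, the following is a minimal sketch using standard Unix commands. The folder and file names (murders, data, rdas, analysis.R, notes.txt) are hypothetical placeholders chosen for this example, and everything happens inside a temporary directory so that nothing else on your system is affected.

```shell
# Work in a throwaway directory (an assumption made for this sketch).
project_root="$(mktemp -d)"
cd "$project_root"

# Create a project folder with two subfolders (hypothetical names).
mkdir -p murders/data murders/rdas

# Move from directory to directory.
cd murders

# Create an empty script file, then rename it.
touch analysis.R
mv analysis.R download-data.R

# Create a scratch file, move it into a subfolder, then delete it.
touch notes.txt
mv notes.txt data/
rm data/notes.txt

# List what remains in the project folder.
ls
```

Each command is typed rather than clicked, so the same sequence can later be saved in a script and rerun.
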

The data analysis process is also iterative and adaptive. As a result, we are constantly editing our scripts
and reports. In this chapter, we introduce you to the version control system Git, which is a powerful tool
for keeping track of these changes. We also introduce you to GitHub [113], a service that permits you to
host and share your code. We will demonstrate how you can use this service to facilitate collaboration.
Keep in mind that another benefit of using GitHub is that you can easily showcase your
work to potential employers.

Finally, we learn to write reports in R Markdown, which permits you to incorporate text and code into a
single document. We will demonstrate how, using the knitr package, we can write reproducible and
aesthetically pleasing reports by running the analysis and generating the report simultaneously.

We will put all this together using the powerful integrated development environment RStudio [114]. Throughout
the chapter we will be building up an example on US gun murders. The final project, which includes
several files and folders, can be seen here: https://github.com/rairizarry/murders. Note that one of the files
in that project is the final report: https://github.com/rairizarry/murders/blob/master/report.md.

113. http://github.com

114. https://www.rstudio.com/
