Chapter 35 Introduction To Productivity Tools - Introduction To Data Science

This chapter introduces productivity tools for data science. It recommends using scripting languages like R instead of point-and-click tools for more flexibility and reproducibility. It discusses organizing files systematically, automating tasks, and minimizing mouse use. Specific tools introduced are the Unix shell for file management, Git for version control, GitHub for hosting code, and R Markdown for reproducible reports combining text and code. An example project on US gun murders is used to demonstrate these tools.


Chapter 35 Introduction to productivity tools

Generally speaking, we do not recommend using point-and-click approaches for data analysis. Instead,
we recommend scripting languages, such as R, since they are more flexible and greatly facilitate
reproducibility. Similarly, we recommend against point-and-click approaches to file organization
and document preparation. In this chapter, we demonstrate alternative approaches. Specifically, we will
learn to use freely available tools that, although they may at first seem cumbersome and non-intuitive, will
eventually make you a much more efficient and productive data scientist.

Three general guiding principles motivate what we learn here: 1) be systematic when organizing
your filesystem, 2) automate when possible, and 3) minimize the use of the mouse. As you become more
proficient at coding, you will find that 1) you want to minimize the time you spend remembering what you
called a file or where you put it, 2) if you find yourself repeating the same task over and over, there is
probably a way to automate it, and 3) any time your fingers leave the keyboard, you lose
productivity.

A data analysis project is not always a dataset and a script. A typical data analysis challenge may involve
several parts, each involving several data files, including files containing the scripts we use to analyze
data. Keeping all this organized can be challenging. We will learn to use the Unix shell as a tool for
managing files and directories on your computer system. Using Unix will permit you to use the keyboard,
rather than the mouse, when creating folders, moving from directory to directory, and renaming, deleting,
or moving files. We also provide specific suggestions on how to keep the filesystem organized.
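
As a concrete illustration of keyboard-driven file management, here is a minimal shell sketch. It assumes a Unix-like system with the standard `mkdir`, `touch`, `mv`, `rmdir`, and `ls` utilities. The folder and file names (`murders`, `data`, `rdas`, `figs`, `script1.R`, `analysis.R`, `murders.csv`, `scratch`) are illustrative placeholders, not necessarily the layout of the final project.

```shell
#!/usr/bin/env bash
# Minimal sketch: organize a hypothetical project without touching the mouse.
set -eu

# Work inside a fresh temporary directory so the sketch is safe to run anywhere.
cd "$(mktemp -d)"

# Create a project folder with subfolders for raw data, saved R objects, and figures.
mkdir -p murders/data murders/rdas murders/figs
cd murders

# Create a script with a vague name, then rename it to something descriptive.
touch script1.R
mv script1.R analysis.R

# Create a data file at the top level, then move it into the data folder.
touch murders.csv
mv murders.csv data/

# Create a scratch folder and delete it once it is no longer needed.
mkdir scratch
rmdir scratch

# Show the resulting tree.
ls -R
```

Each command replaces a mouse action: `mkdir` creates folders, `cd` moves between directories, `mv` renames or moves files, and `rmdir` (or `rm` for files) deletes them.
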

The data analysis process is also iterative and adaptive. As a result, we are constantly editing our scripts
and reports. In this chapter, we introduce you to the version control system Git, which is a powerful tool
for keeping track of these changes. We also introduce you to GitHub[113], a service that permits you to
host and share your code. We will demonstrate how you can use this service to facilitate collaborations.
Another benefit of using GitHub is that you can easily showcase your work to potential employers.
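
The basic Git cycle of editing a file, staging it, and committing a snapshot can be sketched as follows. This assumes Git is installed. The repository name, the file contents, and the user name and email are placeholders; the identity is set only for this repository so that the commits succeed even on a machine with no global Git configuration.

```shell
#!/usr/bin/env bash
# Minimal sketch: record two versions of a script with Git.
set -eu

cd "$(mktemp -d)"

# Create a new repository in a folder called murders (a placeholder name).
git init -q murders
cd murders

# Placeholder identity, stored only in this repository's configuration.
git config user.name "Example User"
git config user.email "user@example.com"

# First version of the script: stage it and record a commit.
echo '# Analysis of US gun murders' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# Edit the script, then stage and record a second commit.
echo 'library(dslabs)' >> analysis.R
git add analysis.R
git commit -q -m "Load the dslabs package"

# Review the history and exactly what changed between the two commits.
git log --oneline
git diff HEAD~1 HEAD
```

To share the repository on GitHub, you would create an empty repository there, connect it with `git remote add origin <URL>`, and upload the commits with `git push -u origin <branch>`. Those steps need a GitHub account and network access, so the sketch leaves them out.
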

Finally, we learn to write reports in R Markdown, which permits you to incorporate text and code into a
single document. We will demonstrate how, using the knitr package, we can write reproducible and
aesthetically pleasing reports by running the analysis and generating the report simultaneously.
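
To make the idea of one document that mixes text and code concrete, the following shell sketch writes a minimal R Markdown file. Only if R, the rmarkdown package, and Pandoc are all available does it render the file with `rmarkdown::render()`, which uses knitr to run the code chunk and then builds the report. The file name `report.Rmd`, the title, the `html_document` output format, and the use of R's built-in `cars` dataset are placeholders, not the contents of the book's report.

```shell
#!/usr/bin/env bash
# Minimal sketch: write a tiny R Markdown report and render it if tools allow.
set -eu

cd "$(mktemp -d)"

# A YAML header, a line of prose, and one R code chunk, all in a single file.
printf '%s\n' \
  '---' \
  'title: "US gun murders"' \
  'output: html_document' \
  '---' \
  '' \
  'The summary below is computed each time the report is rendered.' \
  '' \
  '```{r}' \
  'summary(cars)' \
  '```' > report.Rmd

cat report.Rmd

# Render only when R, the rmarkdown package, and Pandoc are all available.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) rmarkdown::render("report.Rmd")'
fi
```

In RStudio, the Knit button runs the same rendering step.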

We will put all this together using the powerful integrated development environment (IDE) RStudio[114]. Throughout
the chapter, we will build up an example project on US gun murders. The final project, which includes
several files and folders, can be seen here: https://github.com/rairizarry/murders. Note that one of the files
in that project is the final report: https://github.com/rairizarry/murders/blob/master/report.md.

[113] http://github.com

[114] https://www.rstudio.com/
