R Module 4 - Data_IO

Uploaded by

lowtarhkM


Data Input/Output

Andrew Jaffe

January 4, 2016

Before We Get Started: Working Directories

- R looks for files on your computer relative to the “working” directory
- It’s always safer to set the working directory at the beginning of your script. Note that setting the working directory through the RStudio menus generates the corresponding setwd() code, which you can copy into your script.
- Example of help file

## get the working directory
getwd()
# setwd("~/winterR_2016/Lectures")
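
A minimal sketch of checking and temporarily changing the working directory. It relies on the fact that setwd() invisibly returns the previous directory, so you can always switch back. The session's temporary folder (tempdir()) is only a stand-in for a real project folder such as the one in the commented setwd() line above.

```r
# Remember where the session started
start_dir <- getwd()

# setwd() changes the working directory and invisibly returns the
# previous one; tempdir() stands in for a project folder here
previous <- setwd(tempdir())
print(getwd())   # now inside the session's temporary folder

# Switch back to where we started
setwd(previous)
print(identical(getwd(), start_dir))
```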

Setting a Working Directory

- Setting the directory can sometimes be finicky
- Windows: The default directory structure uses single backslashes (“\”), but R interprets these as “escape” characters. So you must replace each backslash with a forward slash (“/”) or with two backslashes (“\\”)
- Mac/Linux: The default is forward slashes, so you are okay
- Typical Linux/DOS directory structure syntax applies
  - “..” goes up one level
  - “./” is the current directory
  - “~” is your home directory
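
The path rules above can be checked directly in R. This sketch uses a hypothetical Windows folder (C:/Users/me/winterR_2016) purely as a string; nothing is read from disk. It shows why "\\" is needed inside R strings, and how file.path() and path.expand() avoid slash and "~" problems.

```r
# A hypothetical Windows folder written two equivalent ways.
# Inside an R string, "\\" is ONE backslash character, because a
# single "\" starts an escape sequence (e.g. "\n" is a newline).
win_fwd  <- "C:/Users/me/winterR_2016"
win_back <- "C:\\Users\\me\\winterR_2016"

print(nchar("\\"))   # 1: the two typed characters make one backslash

# Swapping each backslash for "/" gives the forward-slash spelling
same_path <- identical(gsub("\\", "/", win_back, fixed = TRUE), win_fwd)
print(same_path)

# file.path() joins path pieces with "/" on every operating system
rel <- file.path("..", "data", "Monuments.csv")
print(rel)           # "../data/Monuments.csv"

# "~" is expanded to your home directory
print(path.expand("~"))
```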

Working Directory

Note that the dir() function interfaces with your operating system and can show you which files are in your current working directory. You can try some directory navigation:

dir("./") # shows directory contents

[1] "Data_IO.html"           "Data_IO.pdf"
[3] "Data_IO.R"              "Data_IO.Rmd"
[5] "monuments_newNames.csv"

dir("..")

[1] "lab"     "lecture"
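
To try the same navigation without depending on the course folders, here is a small sketch that builds a throwaway folder tree under tempdir(). The names io_demo, sub, and example.csv are hypothetical; the point is that dir("./") lists the current folder and dir("..") lists its parent.

```r
start_dir <- getwd()

# Build a throwaway folder tree under the session's temp directory
# ("io_demo", "sub", and "example.csv" are hypothetical names)
demo <- file.path(tempdir(), "io_demo")
dir.create(file.path(demo, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines("a,b", file.path(demo, "example.csv"))

setwd(demo)
here <- dir("./")      # contents of the current directory
print(here)

setwd("sub")
up <- dir("..")        # ".." refers to the parent directory
print(up)              # same listing, seen from one level down

setwd(start_dir)       # always return to where you started
```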

Working Directory

- Copy the code to set your working directory from the History tab in RStudio (top right)
- Confirm the directory contains “day1.R” using dir()
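
A quick sketch of that check, assuming the course script is named day1.R as above. Both lines answer the same question; file.exists() avoids printing the whole directory listing.

```r
# TRUE if day1.R is visible from the current working directory
has_script <- "day1.R" %in% dir()
print(has_script)

# Same question, asked directly for a single file
print(file.exists("day1.R"))
```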

Data Input

- ‘Reading in’ data is the first step of any real project/analysis
- R can read almost any file format, especially via add-on packages
- We are going to focus on simple delimited files first
  - tab delimited (e.g. ‘.txt’)
  - comma separated (e.g. ‘.csv’)
  - Microsoft Excel (e.g. ‘.xlsx’)
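
To see that tab-delimited and comma-separated files differ only in their separator, this sketch parses the same small, hypothetical table in both forms. It uses the text= argument of read.table() so no files are needed.

```r
# Tab-delimited and comma-separated text holding the same small,
# hypothetical table; text= lets read.table() parse a string
tab_text <- "id\tcity\n1\tBaltimore\n2\tTowson"
csv_text <- "id,city\n1,Baltimore\n2,Towson"

from_tab <- read.table(text = tab_text, header = TRUE, sep = "\t")
from_csv <- read.csv(text = csv_text)

print(from_tab)
print(identical(from_tab, from_csv))   # same data, different delimiter
```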

Data Aside

- Everything we do in class will use real, publicly available data; there will be few ‘toy’ example datasets or ‘simulated’ data
- OpenBaltimore and Data.gov will be sources for the first few days

Data Input

Monuments Dataset: “This data set shows the point location of Baltimore City monuments. However, the completness and currentness of these data are uncertain.”

- Download data from http://www.aejaffe.com/winterR_2016/data/Monuments.csv
- Save it (or move it) to the same folder as your day1.R script
- Within RStudio: Session -> Set Working Directory -> To Source File Location
- (data downloaded from https://data.baltimorecity.gov/Community/Monuments/cpxf-kxp3)

Data Input

RStudio features some nice “drop-down” support, where you can run some tasks by selecting them from the toolbar. For example, you can easily import text datasets using the “Tools -> Import Dataset” command. Selecting this brings up a new screen that lets you specify the formatting of your text file. After importing a dataset, you get the corresponding R commands, which you can enter in the console if you want to re-import the data.

Data Input

So what is going on “behind the scenes”?

read.table(): Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.

# the four I've put at the top are the most important inputs
# (argument list abridged; see ?read.table for all of them)
read.table(file,                      # filename
           header = FALSE,            # are there column names?
           sep = "",                  # what separates columns?
           as.is = !stringsAsFactors, # keep character columns as strings?
           quote = "\"'", dec = ".", row.names, col.names,
           na.strings = "NA", nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#",
           stringsAsFactors = default.stringsAsFactors())
# (since R 4.0.0, stringsAsFactors defaults to FALSE)

# for example, a tab-delimited file with column names:
# read.table("file.txt", header = TRUE, sep = "\t")
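
A sketch of how several of these arguments work together. The file below is hypothetical: it is semicolon-separated, has two note lines above the header, and uses "-" for missing values. It is written to a temporary file so the example is self-contained.

```r
# A small hypothetical file: two note lines, a header row,
# semicolon separators, and "-" marking a missing value
raw <- c("exported from a hypothetical tool",
         "two note lines precede the header",
         "name;zip;district",
         "Monument A;21201;11",
         "Monument B;-;7")

path <- tempfile(fileext = ".txt")
writeLines(raw, path)

dat <- read.table(path,
                  header = TRUE,          # first non-skipped line has names
                  sep = ";",              # columns separated by semicolons
                  skip = 2,               # drop the two note lines
                  na.strings = "-",       # treat "-" as missing (NA)
                  stringsAsFactors = FALSE)
str(dat)
```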

Data Input

- The filename is the path to your file, in quotes
- The function will look in your “working directory” if no absolute file path is given
- Note that the filename can also be a path to a file on a website (e.g. ‘www.someurl.com/table1.txt’)
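
A sketch contrasting a relative filename (looked up in the working directory) with an absolute path (works from anywhere). The file tiny.csv is hypothetical and lives in tempdir(). Reading from a URL works the same way but needs a network connection, so it is left out here.

```r
start_dir <- getwd()

# Write a tiny CSV into a scratch folder (hypothetical file name)
setwd(tempdir())
write.csv(data.frame(x = 1:3), "tiny.csv", row.names = FALSE)

relative <- read.csv("tiny.csv")                       # found via getwd()
absolute <- read.csv(file.path(getwd(), "tiny.csv"))   # full path
print(identical(relative, absolute))

# After leaving the folder, only the absolute path still works
setwd(start_dir)
again <- read.csv(file.path(tempdir(), "tiny.csv"))
print(identical(again, relative))
```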

Data Input

There is a ‘wrapper’ function for reading CSV files:

read.csv

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
    dec = dec, fill = fill, comment.char = comment.char, ...)
<bytecode: 0x0000000014afdcd0>
<environment: namespace:utils>

Note: the ... designates extra/optional arguments that can be passed to read.table() if needed
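
Because read.csv() only changes defaults, calling read.table() with those same defaults spelled out should give an identical result. This sketch checks that on a two-row string built from monument names shown in this lecture.

```r
csv_text <- 'name,zip\n"The Battle Monument",21202\n"Calvert Statue",21202'

a <- read.csv(text = csv_text)

# The same call with read.csv()'s defaults written out explicitly
b <- read.table(text = csv_text, header = TRUE, sep = ",", quote = "\"",
                dec = ".", fill = TRUE, comment.char = "")

print(identical(a, b))
```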

Data Input

- Here is how you would read in the data from the command line, specifying the file path:

mon = read.csv("../../data/Monuments.csv", header = TRUE, as.is = TRUE)
head(mon)

                              name zipCode neighborhood councilDistrict
1           James Cardinal Gibbons   21201     Downtown              11
2              The Battle Monument   21202     Downtown              11
3 Negro Heroes of the U.S Monument   21202     Downtown              11
4              Star Bangled Banner   21202     Downtown              11
5  Flame at the Holocaust Monument   21202     Downtown              11
6                   Calvert Statue   21202     Downtown              11
  policeDistrict Location.1
1        CENTRAL 408 CHARLES ST\nBaltimore, MD\n
2        CENTRAL
3        CENTRAL
4        CENTRAL 100 HOLLIDAY ST\nBaltimore, MD\n

Data Input

colnames(mon) # column names

[1] "name"            "zipCode"         "neighborhood"    "councilDistrict"
[5] "policeDistrict"  "Location.1"

head(mon$zipCode) # first few values of the zipCode column

[1] 21201 21202 21202 21202 21202 21202
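
If you do not have Monuments.csv at hand, this sketch builds a three-row data frame from values shown in the head(mon) output above (a small stand-in, not the real file) and applies the same accessors. It also shows [[ ]], which extracts a single column just like $.

```r
# Three rows taken from the head(mon) output above, so this runs
# without the Monuments.csv file
mon_small <- data.frame(
  name = c("James Cardinal Gibbons", "The Battle Monument",
           "Negro Heroes of the U.S Monument"),
  zipCode = c(21201L, 21202L, 21202L),
  neighborhood = "Downtown",           # recycled to all three rows
  stringsAsFactors = FALSE
)

print(colnames(mon_small))           # column names
print(head(mon_small$zipCode, 2))    # first two values of one column
print(mon_small[["zipCode"]])        # [[ ]] extracts a column like $
```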

Data Input

The read.table() function returns a data.frame, which is the primary data format for most data cleaning and analyses

str(mon) # structure of an R object

'data.frame': 84 obs. of 6 variables:

$ name : chr "James Cardinal Gibbons" "The Batt
$ zipCode : int 21201 21202 21202 21202 21202 2120
$ neighborhood : chr "Downtown" "Downtown" "Downtown" "
$ councilDistrict: int 11 11 11 11 11 11 11 7 14 14 ...
$ policeDistrict : chr "CENTRAL" "CENTRAL" "CENTRAL" "CEN
$ Location.1 : chr "408 CHARLES ST\nBaltimore, MD\n"
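
The chr columns in this str() output come from as.is = TRUE in the read.csv() call. A sketch of the difference on a small hypothetical table: with as.is = FALSE, text columns become factors instead. (Since R 4.0.0, character columns are kept as strings by default, so passing as.is = TRUE mainly matters on older R versions.)

```r
csv_text <- "district,zip\nCENTRAL,21201\nCENTRAL,21202\nSOUTHERN,21230"

kept <- read.csv(text = csv_text, as.is = TRUE)    # strings stay character
facs <- read.csv(text = csv_text, as.is = FALSE)   # strings become factors

str(kept)
str(facs)
print(sapply(kept, class))   # class of every column at once
```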

Data Input

Changing variable names in data.frames works using the names() function, which is analogous to colnames() for data frames (they can be used interchangeably)

names(mon)[1] = "Name"
names(mon)

[1] "Name"            "zipCode"         "neighborhood"    "councilDistrict"
[5] "policeDistrict"  "Location.1"

names(mon)[1] = "name"
names(mon)

[1] "name"            "zipCode"         "neighborhood"    "councilDistrict"
[5] "policeDistrict"  "Location.1"
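
Renaming by position (names(mon)[1]) breaks silently if the column order changes. A sketch of renaming by matching the current name instead, on a hypothetical one-row data frame that mimics a few of the Monuments columns:

```r
mon_demo <- data.frame(name = "The Battle Monument", zipCode = 21202L,
                       Location.1 = "", stringsAsFactors = FALSE)

# Rename by position, as on the slide ...
names(mon_demo)[1] <- "Name"

# ... or by matching the current name, which keeps working even if
# the columns are reordered
names(mon_demo)[names(mon_demo) == "Location.1"] <- "Location"

print(names(mon_demo))
print(identical(names(mon_demo), colnames(mon_demo)))  # same for data frames
```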

Data Output

While it’s nice to be able to read in a variety of data formats, it’s equally important to be able to output data somewhere.

write.table(): prints its required argument x (after converting it to a data.frame if it is neither a data.frame nor a matrix) to a file or connection.

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
            eol = "\n", na = "NA", dec = ".", row.names = TRUE,
            col.names = TRUE, qmethod = c("escape", "double"),
            fileEncoding = "")
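
A sketch of a write.table() round trip: write a small hypothetical table as a tab-delimited file in tempdir(), look at the raw lines, then read it back with matching settings.

```r
# A small hypothetical table to write out
out <- data.frame(name = c("Calvert Statue", "Star Bangled Banner"),
                  zipCode = c(21202L, 21202L),
                  stringsAsFactors = FALSE)

path <- tempfile(fileext = ".txt")
write.table(out, file = path,
            sep = "\t",          # tab-delimited
            quote = FALSE,       # no quotes around strings
            row.names = FALSE)   # no extra column of row numbers

cat(readLines(path), sep = "\n")   # the raw file: header plus two rows

back <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
print(identical(back, out))
```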

Data Output

x: the R data.frame or matrix you want to write

file: the file name where you want the R object written. It can be an absolute path, or a filename (which writes the file to your working directory)

sep: what character separates the columns?

- “,” = .csv - Note there is also a write.csv() function
- “\t” = tab delimited

row.names: I like setting this to FALSE because I email these files to collaborators who open them in Excel

Data Output

For example, we can write back out the Monuments dataset with the new column name:

names(mon)[6] = "Location"
write.csv(mon, file = "monuments_newNames.csv", row.names = FALSE)

Note that row.names = TRUE would make the first column contain the row names, here just the numbers 1:nrow(mon), which is not very useful in Excel. Row names can be useful/informative in R if they contain information (but once written out, they just become a separate column).
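
A sketch of what row.names = TRUE actually puts in the file, using a hypothetical two-row table. When such a file is read back, the unnamed first column shows up as an extra column called X.

```r
dat <- data.frame(zipCode = c(21201L, 21202L))

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(dat, with_rn)                        # row.names = TRUE (default)
write.csv(dat, without_rn, row.names = FALSE)

print(readLines(with_rn))      # first field of each row is a row number
print(readLines(without_rn))

# Reading the first file back produces an extra, unnamed column "X"
print(names(read.csv(with_rn)))
print(names(read.csv(without_rn)))
```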

Data Input - Excel

Many data analysts collaborate with researchers who use Excel to enter and curate their data. Often, this is the input data for an analysis. You therefore have two options for getting this data into R:

- Saving the Excel sheet as a .csv file, and using read.csv()
- Using an add-on package, like xlsx, readxl, or openxlsx

For single-worksheet .xlsx files, I often just save the spreadsheet as a .csv file (because I often have to strip off additional summary data from the columns).
For an .xlsx file with multiple well-formatted worksheets, I use the xlsx, readxl, or openxlsx package for reading in the data.
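
For the multi-worksheet case, a sketch using readxl (one of the packages named above). It assumes readxl is installed and uses the example workbook that ships with it, datasets.xlsx, instead of a real project file; each worksheet becomes one data frame in a named list.

```r
# Runs only if the readxl package is installed
if (requireNamespace("readxl", quietly = TRUE)) {
  # readxl ships an example workbook with several worksheets
  path <- readxl::readxl_example("datasets.xlsx")
  sheets <- readxl::excel_sheets(path)
  print(sheets)

  # Read every worksheet into a named list of data frames
  all_sheets <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s))
  names(all_sheets) <- sheets
  print(sapply(all_sheets, nrow))
} else {
  message("Install readxl with install.packages(\"readxl\") to run this example.")
}
```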

Data Input - Other Software

- haven package (https://cran.r-project.org/web/packages/haven/index.html) reads in SAS, SPSS, and Stata formats
- readxl package - the read_excel function can read Excel sheets easily
- readr package - has read_csv/write_csv and read_table functions similar to read.csv/write.csv and read.table. It has different defaults, but can read very large data sets much faster
- sas7bdat reads .sas7bdat files
- foreign package - can read the same formats as haven. It has been around longer (i.e. more testing), but is not as actively maintained (bad for the future).
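
A sketch of a haven round trip, assuming the haven package is installed. It writes a small hypothetical table to a temporary Stata (.dta) file and reads it back; haven's read_sas() and read_sav() follow the same pattern for SAS and SPSS files.

```r
# Runs only if the haven package is installed
if (requireNamespace("haven", quietly = TRUE)) {
  dat <- data.frame(zipCode = c(21201, 21202),
                    district = c("CENTRAL", "CENTRAL"),
                    stringsAsFactors = FALSE)

  path <- tempfile(fileext = ".dta")
  haven::write_dta(dat, path)     # write a Stata file
  back <- haven::read_dta(path)   # read it back (returns a tibble)
  print(back)
} else {
  message("Install haven with install.packages(\"haven\") to run this example.")
}
```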
