R Module 4 - Data_IO
Andrew Jaffe
January 4, 2016
Before we get Started: Working Directories
dir("..")
- Copy the code to set your working directory from the History
  tab in RStudio (top right)
- Confirm the directory contains “day1.R” using dir()
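As a minimal sketch of the same check done from code rather than the History tab: the folder name r_module_demo is hypothetical, and a temporary directory with an empty placeholder “day1.R” stands in for your real course folder.

```r
# Remember where we started so we can restore it afterwards
old_wd <- getwd()

# Hypothetical project folder; a temporary directory stands in for it here
proj_dir <- file.path(tempdir(), "r_module_demo")
dir.create(proj_dir, showWarnings = FALSE)

# Empty placeholder script so there is something to list
file.create(file.path(proj_dir, "day1.R"))

setwd(proj_dir)            # change the working directory
files_here <- dir()        # list files in the current working directory
print(files_here)
print("day1.R" %in% files_here)

setwd(old_wd)              # restore the original working directory
```

With your own folder you would replace proj_dir with its path and skip the dir.create() and file.create() lines.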
Data Input
RStudio features some nice “drop down” support, where you can
run some tasks by selecting them from the toolbar.
For example, you can easily import text datasets using the “Tools
–> Import Dataset” command. Selecting this will bring up a new
screen that lets you specify the formatting of your text file.
After importing a dataset, RStudio shows the corresponding R commands,
which you can enter in the console if you want to re-import the data.
Data Input
So what is going on “behind the scenes”?
read.table(): Reads a file in table format and creates a data
frame from it, with cases corresponding to lines and variables to
fields in the file.
# the four arguments at the top are the most important inputs
read.table( file, # filename
header = FALSE, # are there column names?
sep = "", # what separates columns?
as.is = !stringsAsFactors, # keep character columns as characters?
quote = "\"'", dec = ".", row.names, col.names,
na.strings = "NA", nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#",
stringsAsFactors = default.stringsAsFactors())
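To see the four main arguments in action, here is a small sketch that writes a toy tab-delimited file to a temporary location and reads it back. The file contents and the object name dat are made up for illustration; only read.table() and its arguments come from the slide above.

```r
# Write a tiny tab-delimited file to a temporary location (toy data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("name\tcount\tscore",
             "alpha\t1\t2.5",
             "beta\tNA\t3.0",
             "gamma\t3\t4.25"),
           tmp)

dat <- read.table(tmp,
                  header = TRUE,       # first line holds the column names
                  sep = "\t",          # columns are separated by tabs
                  as.is = TRUE,        # keep strings as character, not factor
                  na.strings = "NA")   # treat the text NA as missing

str(dat)             # one character, one integer, one numeric column
sum(is.na(dat))      # number of missing values in the whole table
```

Leaving sep at its default of "" would also work here, since that splits on any whitespace, but naming the separator explicitly is safer when fields can contain spaces.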
read.csv
mon = read.csv("../../data/Monuments.csv",header=TRUE,as.is
head(mon)
names(mon)[1] = "Name"
names(mon)
names(mon)[1] = "name"
names(mon)
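The same read-and-rename pattern, sketched on a toy CSV so it runs without the Monuments file. The column names and values below are invented; they are not the real Monuments columns.

```r
# Toy CSV with a header that contains a space (made-up data)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("monument name,year",
             "Example A,1901",
             "Example B,1950"),
           tmp_csv)

toy <- read.csv(tmp_csv, header = TRUE, as.is = TRUE)

# check.names = TRUE (the default) turns the space into a dot
orig_names <- names(toy)
print(orig_names)

# Rename only the first column, leaving the others alone
names(toy)[1] = "name"
print(names(toy))
```

Indexing names() with [1] changes a single column name in place, which is handy when only one header came in messy.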
For example, we can write back out the Monuments dataset with
the new column name:
names(mon)[6] = "Location"
write.csv(mon, file="monuments_newNames.csv", row.names=FALSE)
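A quick sketch of why row.names = FALSE is usually what you want when writing a file back out. The data frame toy and the temporary file names are made up for illustration.

```r
# Small made-up data frame
toy <- data.frame(name = c("Example A", "Example B"),
                  year = c(1901L, 1950L),
                  stringsAsFactors = FALSE)

# Write without row names, then read the file back in
out <- tempfile(fileext = ".csv")
write.csv(toy, file = out, row.names = FALSE)
back <- read.csv(out, header = TRUE, as.is = TRUE)
print(all.equal(back, toy))   # the round trip preserves the data

# With the default row.names = TRUE, an extra column of row labels appears
out_rn <- tempfile(fileext = ".csv")
write.csv(toy, file = out_rn)
with_rn <- read.csv(out_rn, header = TRUE, as.is = TRUE)
print(ncol(toy))
print(ncol(with_rn))
```

The extra column is harmless but tends to pile up if a file is written and re-read several times.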
For single worksheet .xlsx files, I often just save the spreadsheet as a
.csv file (because I often have to strip off additional summary data
from the columns)
For an .xlsx file with multiple well-formatted worksheets, I use the
xlsx, readxl, or openxlsx package to read in the data.
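As a rough sketch of the multi-worksheet case using readxl (one of the three packages named above): this assumes readxl is installed and uses the example workbook that ships with the package, not a file from this course. The code is guarded so it does nothing if the package is missing.

```r
# Will hold one data frame per worksheet, keyed by sheet name
sheets <- list()

if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook bundled with readxl (stand-in for your own .xlsx file)
  path <- readxl::readxl_example("datasets.xlsx")

  # List the worksheet names, then read each sheet into its own data frame
  sheet_names <- readxl::excel_sheets(path)
  sheets <- lapply(sheet_names, function(s) readxl::read_excel(path, sheet = s))
  names(sheets) <- sheet_names

  print(sheet_names)
  print(sapply(sheets, nrow))   # number of rows read from each sheet
}
```

For your own workbook, replace path with the file name; a single sheet can be read directly with read_excel(path, sheet = "SheetName").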
Data Input - Other Software