Unit 1 R Reading-Writing Files
R> library(help="datasets")
R-ready data sets have a corresponding help file where you can find
important details about the data and how it’s organized. For example, one
of the built-in data sets is named ChickWeight. Entering ?ChickWeight at
the prompt opens its help file, and you can view the first 15 records as
follows:
R> ChickWeight[1:15,]
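As a minimal sketch of other ways to inspect a built-in data set, the following uses only functions from base R. The variable name first15 is an illustrative choice and is not part of the original material.

```r
# ChickWeight ships with the datasets package, so no extra install is needed.
first15 <- head(ChickWeight, n = 15)  # the first 15 records of the data set

dim(ChickWeight)    # number of rows and columns
names(ChickWeight)  # column names
str(ChickWeight)    # compact overview of each column's type
```

Using str before printing is a convenient habit with larger data sets, because it summarizes the structure without filling the console.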
R> library("tseries")
'tseries' version: 0.10-32
'tseries' is a package for time series analysis and
computational finance.
See 'library(help="tseries")' for details.
Now you can enter library(help="tseries") to see the list of data sets in this
package, and you can enter ?ice.river to find more details about the data set
you want to work with here. The help file describes ice.river as a “time
series object” comprising river flow, precipitation, and temperature
measurements (data initially reported in Tong, 1990). To access the
object itself, you must explicitly load it using the data function. Then you
can work with ice.river in your workspace as usual. Here are the first five
records:
R> data(ice.river)
R> ice.river[1:5,]
     flow.vat flow.jok prec temp
[1,]    16.10     30.2  8.1  0.9
[2,]    19.20     29.0  4.4  1.6
[3,]    14.50     28.4  7.0  0.1
[4,]    11.00     27.8  0.0  0.6
[5,]    13.60     27.8  0.0  2.0
Table-format files are best thought of as plain-text files with three key
features that fully define how R should read the data: the header, the
delimiter, and the missing-value marker.
Header: If a header is present, it’s always the first line of the file. This
optional feature is used to provide names for each column of data. When
importing a file into R, you need to tell the software whether a header is
present so that it knows whether to treat the first line as variable names
or, alternatively, observed data values.
Delimiter: The character that separates the entries in each line.
Missing value: The character string used to denote a missing entry.
Suppose you’re handed a plain-text file, mydatafile.txt, for data analysis
in R. In this file, the first line is the header, the values are delimited
with a single space, and missing values are denoted with an asterisk (*).
Each new record must start on a new line. The ready-to-use command
read.table imports table-format files, producing a data frame object, as
follows:
R> mydatafile <- read.table(file="mydatafile.txt",
                            header=TRUE, sep=" ", na.strings="*",
                            stringsAsFactors=FALSE)
R> mydatafile
  person age sex funny age.mon
1  Peter  NA   M  High     504
2   Lois  40   F  <NA>     480
3    Meg  17   F   Low     204
4  Chris  14   M   Med     168
5 Stewie   1   M  High      NA
6  Brian  NA   M   Med      NA
In a call to read.table, file takes a character string with the filename and
folder location (using forward slashes), header is a logical value telling R
whether file has a header (TRUE in this case), sep takes a character string
providing the delimiter (a single space, " ", in this case), and na.strings
requests the characters used to denote missing values ("*" in this case).
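The following is a self-contained sketch of the same import that can be run without the original file. The file contents are an assumption, reconstructed from the printed data frame above, and the names tmp and demo are illustrative only.

```r
# Write an assumed copy of the table-format file to a temporary location.
# Contents are reconstructed from the printed output, not taken from the
# original file.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "person age sex funny age.mon",  # header line
  "Peter * M High 504",            # "*" marks a missing value
  "Lois 40 F * 480",
  "Meg 17 F Low 204",
  "Chris 14 M Med 168",
  "Stewie 1 M High *",
  "Brian * M Med *"
), tmp)

# Import it with the same arguments described above.
demo <- read.table(file = tmp, header = TRUE, sep = " ",
                   na.strings = "*", stringsAsFactors = FALSE)

str(demo)              # column types after import
colSums(is.na(demo))   # number of missing values per column
```

Because the asterisks are declared through na.strings, the age and age.mon columns are still read as numbers rather than as text.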
Data from a text file can also be loaded using Import Dataset in the
Environment pane of RStudio, which generates default instructions such as:
mydatafile <- read.csv(
  "B:/data mining/data science elective/r programming/course material/mydatafile.txt",
  sep="")
View(mydatafile)
Spreadsheet Workbooks
Next, let’s examine some ubiquitous spreadsheet software file formats. The
standard file format for Microsoft Office Excel is .xls or .xlsx.
To read this spreadsheet with R, you should first convert it to a table format.
In Excel, File → Save As... provides a wealth of options. Save the spreadsheet
as a comma-separated file, called spreadsheetfile.csv. R has a shortcut version
of read.table, read.csv, for these files.
R> spread <- read.csv(file="spreadsheetfile.csv",
                      header=FALSE, stringsAsFactors=TRUE)
Here, the file argument again specifies the desired file, which has no
header, so header=FALSE. You set stringsAsFactors=TRUE because you
do want to treat the sex variable (the only non-numeric variable) as a factor.
There are no missing values, so you don’t need to specify na.strings
(though if there were, this argument is simply used in the same way as
earlier), and by definition, .csv files are comma-delimited, which read.csv
correctly implements by default, so you don’t need the sep argument. The
resulting data frame, spread, can then be printed in your R console.
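To see the effect of header=FALSE and stringsAsFactors=TRUE without the original spreadsheet, here is a small sketch. The three records are hypothetical values invented for illustration, as are the names tmp_csv and spread_demo.

```r
# Create a small comma-separated file with no header line.
# The values below are hypothetical and do not come from the original
# spreadsheet.
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "55,161,female",
  "85,185,male",
  "75,174,male"
), tmp_csv)

spread_demo <- read.csv(file = tmp_csv, header = FALSE,
                        stringsAsFactors = TRUE)

names(spread_demo)         # R supplies default names V1, V2, V3
is.factor(spread_demo$V3)  # the text column was converted to a factor
levels(spread_demo$V3)     # the distinct categories found in the file
```

When a file has no header, you can assign meaningful column names afterward with names(spread_demo) <- c(...).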
If the data set is in an Excel sheet, it can be imported directly using
Import Dataset in the Environment pane of RStudio, which generates default
instructions such as:
library(readxl)
data <- read_excel("data.xlsx")
View(data)
Writing out new files from data frame objects with R is just as easy as
reading in files. R’s vector-oriented behavior is a fast and convenient way
to recode data sets, so it’s perfect for reading in data, restructuring it, and
writing it back out to a file.
Data Sets
R> write.table(mydatafile,"newfile.txt")
R> write.table(x=mydatafile, file="newfile.txt",
               sep="@", na="??", quote=FALSE,
               row.names=FALSE)
You provide file with the folder location, ending in the filename you want
for your new data file. If you give only a filename, as here, the file is
written to the current working directory. This command creates a new
table-format file called newfile.txt, delimited by @ and with missing values
denoted with ?? (because you’re actually creating a new file, the file.choose
command doesn’t tend to be used here). Since mydatafile has variable names,
these are automatically written to the file as a header. The optional logical
argument quote determines whether to encapsulate each non-numeric entry in
double quotes; request no quotes by setting the argument to FALSE. Another
optional logical argument, row.names, asks whether to include the row names
of mydatafile (in this example, this would just be the numbers 1 to 6), which
you also omit with FALSE. The resulting file can be opened in a text editor.
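As a rough sketch of a full round trip, the following writes a small data frame with the same write.table arguments and then reads it back. The data frame df is a shortened, illustrative stand-in for mydatafile, and out and back are hypothetical names.

```r
# A small stand-in data frame (illustrative, not the full mydatafile).
df <- data.frame(person = c("Peter", "Lois", "Meg"),
                 age = c(NA, 40, 17),
                 stringsAsFactors = FALSE)

# Write it to a temporary file using the arguments discussed above.
out <- tempfile(fileext = ".txt")
write.table(x = df, file = out, sep = "@", na = "??",
            quote = FALSE, row.names = FALSE)

readLines(out)  # raw lines of the new file: header first, then one record per line

# Read it back, telling read.table about the custom delimiter and NA marker.
back <- read.table(file = out, header = TRUE, sep = "@",
                   na.strings = "??", stringsAsFactors = FALSE)
back
```

Reading the file back with matching sep and na.strings values is a quick way to confirm that nothing was lost when writing it out.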