
Unit 1 R Reading-Writing Files

This document discusses reading and writing data in R. It covers: 1) Built-in and contributed R datasets that are ready to use, and how to access their documentation. 2) Importing data from external text and spreadsheet files by specifying the file, delimiter, header, missing values, and converting spreadsheets to csv format. 3) Writing data frames out to new text files while customizing the delimiter, missing values, and inclusion of row names.

Uploaded by

Shreya Yadav

Unit 1: Reading/Writing Data, Reading Files: R-Ready Data Sets and Reading in External Data Files, Import/Export Data


R-Ready Data Sets
First, let’s take a brief look at some of the data sets that are built into the
software or are part of user-contributed packages. These data sets are useful
samples to practice with and to experiment with functionality. Enter data()
at the prompt to bring up a window listing these ready-to-use data sets
along with a one-line description. These data sets are organized in
alphabetical order by name and grouped by package.
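The same listing can also be inspected programmatically instead of in a window. The following is a minimal sketch using only base R; the variable names listing and catalog are illustrative and not part of the original notes.

```r
# List the data sets shipped with the built-in 'datasets' package.
listing <- data(package = "datasets")

# The 'results' component is a character matrix with one row per data set;
# the "Item" column holds the name and "Title" the one-line description.
catalog <- as.data.frame(listing$results[, c("Item", "Title")],
                         stringsAsFactors = FALSE)

# Show the first few entries.
print(head(catalog))
```

Dropping the package argument, as in data(), lists the data sets of every package that is currently loaded.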
Built-in Data Sets
There are a number of data sets contained within the built-in, automatically
loaded package datasets. To see a summary of the data sets contained in the
package, you can use the library function as follows:

R> library(help="datasets")

R-ready data sets have a corresponding help file where you can find
important details about the data and how it’s organized. For example, one
of the built-in data sets is named ChickWeight. Entering ?ChickWeight
at the prompt opens its help file.

Figure: The help file for the ChickWeight data set


As you can see, this file explains the variables and their values; it notes that
the data are stored in a data frame with 578 rows and 4 columns. Since the
objects in datasets are built in, all you have to do to access ChickWeight is
enter its name at the prompt. Let’s look at the first 15 records.

R> ChickWeight[1:15,]
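Before printing records, it can help to check the size and layout of a data set against its help file. This is a small sketch using base R functions only; the names dims, vars, and first15 are illustrative.

```r
# ChickWeight ships with the automatically loaded 'datasets' package,
# so no library() or data() call is needed.
dims <- dim(ChickWeight)          # number of rows and columns
vars <- names(ChickWeight)        # column names
first15 <- head(ChickWeight, 15)  # same records as ChickWeight[1:15, ]

print(dims)
print(vars)
str(ChickWeight)                  # compact overview of each column's type
```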

Contributed Data Sets


There are many more R-ready data sets that come as part of contributed
packages. To access them, first install and load the relevant package.
Consider the data set ice.river, which is in the contributed package tseries
by Trapletti and Hornik (2013). First, you have to install the package,
which you can do by running the line install.packages("tseries") at the
prompt. Then, to access the components of the package, load it using
library:

R> library("tseries")
'tseries' version: 0.10-32
'tseries' is a package for time series analysis and
computational finance.
See 'library(help="tseries")' for details.

Now you can enter library(help="tseries") to see the list of data sets in this
package, and you can enter ?ice.river to find more details about the data set
you want to work with here. The help file describes ice.river as a “time
series object” comprised of river flow, precipitation, and temperature
measurements—data initially reported in Tong (1990). To access this
object itself, you must explicitly load it using the data function. Then you
can work with ice.river in your workspace as usual. Here are the first five
records:

R> data(ice.river)
R> ice.river[1:5,]
     flow.vat flow.jok prec temp
[1,]    16.10     30.2  8.1  0.9
[2,]    19.20     29.0  4.4  1.6
[3,]    14.50     28.4  7.0  0.1
[4,]    11.00     27.8  0.0  0.6
[5,]    13.60     27.8  0.0  2.0

Reading in External Data Files


R has a variety of functions for reading characters from stored files and
making sense of them. You’ll look at how to read table-format files,
which are among the easiest for R to read and import.
The Table Format

Table-format files are best thought of as plain-text files with three key
features that fully define how R should read the data.

Header If a header is present, it’s always the first line of the file. This
optional feature is used to provide names for each column of data. When
importing a file into R, you need to tell the software whether a header is
present so that it knows whether to treat the first line as variable names
or, alternatively, observed data values.

Delimiter The all-important delimiter is a character used to separate the
entries in each line. The delimiter character cannot be used for anything else
in the file. This tells R when a specific entry begins and ends (in other
words, its exact position in the table).

Missing value This is another unique character string used exclusively to
denote a missing value. When reading the file, R will turn these entries
into the form it recognizes: NA.

Typically, these files have a .txt extension (highlighting the plain-text
style) or .csv (for comma-separated values).

Figure A plain-text table-format file

Note that the first line is the header, the values are delimited with a single
space, and missing values are denoted with an asterisk (*). Also, note that
each new record is required to start on a new line. Suppose you’re handed
this plain-text file for data analysis in R. The ready-to-use command
read.table imports table-format files, producing a data frame object, as
follows:
R> mydatafile <- read.table(file="mydatafile.txt",
                            header=TRUE, sep=" ", na.strings="*",
                            stringsAsFactors=FALSE)

R> mydatafile
  person age sex funny age.mon
1  Peter  NA   M  High     504
2   Lois  40   F  <NA>     480
3    Meg  17   F   Low     204
4  Chris  14   M   Med     168
5 Stewie   1   M  High      NA
6  Brian  NA   M   Med      NA

In a call to read.table, file takes a character string with the filename and
folder location (using forward slashes), header is a logical value telling R
whether file has a header (TRUE in this case), sep takes a character string
providing the delimiter (a single space, " ", in this case), and na.strings
requests the characters used to denote missing values ("*" in this case).
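To try these arguments without needing a file on disk, the sketch below first writes a tiny table-format file to a temporary location and then reads it back. The data values and the names tmp and people are hypothetical and only for illustration.

```r
# Build a small table-format file in a temporary location (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("person age sex",
             "Ann 31 F",
             "Bob * M",
             "Cy 25 *"),
           tmp)

# header=TRUE : the first line holds the column names
# sep=" "     : entries are separated by a single space
# na.strings  : "*" marks a missing value and is converted to NA
people <- read.table(file = tmp, header = TRUE, sep = " ",
                     na.strings = "*", stringsAsFactors = FALSE)

print(people)
str(people)
```

Because the asterisks are declared as missing values, the age column is still read as numeric rather than as text.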
Data from a text file can also be loaded using Import Dataset in the
Environment pane of RStudio. It generates default instructions such as:
mydatafile <- read.csv("B:/data mining/data science elective/r programming/course material/mydatafile.txt", sep="")
View(mydatafile)
Spreadsheet Workbooks
Next, let’s examine some ubiquitous spreadsheet software file formats. The
standard file format for Microsoft Office Excel is .xls or .xlsx.

Figure: A spreadsheet file of the data

To read this spreadsheet with R, you should first convert it to a table format.
In Excel, File → Save As... provides a wealth of options. Save the spreadsheet
as a comma-separated file, called spreadsheetfile.csv. R has a shortcut version
of read.table, read.csv, for these files.

R> spread <- read.csv(file="spreadsheetfile.csv",
                      header=FALSE, stringsAsFactors=TRUE)

Here, the file argument again specifies the desired file, which has no
header, so header=FALSE. You set stringsAsFactors=TRUE because you
do want to treat the sex variable (the only non-numeric variable) as a factor.
There are no missing values, so you don’t need to specify na.strings
(though if there were, this argument is simply used in the same way as
earlier), and by definition, .csv files are comma-delimited, which read.csv
correctly implements by default, so you don’t need the sep argument. The
resulting data frame, spread, can then be printed in your R console.
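The effect of header=FALSE and stringsAsFactors=TRUE can be seen on a small stand-in file. This is a sketch under stated assumptions: the three records below are hypothetical and are not the contents of the spreadsheet in the figure, and the names tmp_csv and spread_demo are illustrative.

```r
# Write a small headerless comma-separated file (hypothetical values).
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("55,161,female",
             "85,185,male",
             "75,174,male"),
           tmp_csv)

# With header=FALSE, R supplies the default column names V1, V2, V3.
# stringsAsFactors=TRUE turns the one text column into a factor.
spread_demo <- read.csv(file = tmp_csv, header = FALSE,
                        stringsAsFactors = TRUE)

print(spread_demo)
print(levels(spread_demo$V3))
```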
If the data set is in an Excel workbook, it can also be imported directly
using Import Dataset in the Environment pane of RStudio. It generates
default instructions such as:
library(readxl)
data <- read_excel("data.xlsx")
View(data)

Writing Out Data Files

Writing out new files from data frame objects with R is just as easy as
reading in files. R’s vector-oriented behavior is a fast and convenient way
to recode data sets, so it’s perfect for reading in data, restructuring it, and
writing it back out to a file.

Data Sets

The function for writing table-format files to your computer is write.table.
You supply a data frame object as x, and this function writes its contents
to a new file with a specified name, delimiter, and missing value string.
For example, the following lines take the mydatafile object read in
earlier and write it to a file:

R> write.table(mydatafile, "newfile.txt")

R> write.table(x=mydatafile, file="newfile.txt",
               sep="@", na="??", quote=FALSE,
               row.names=FALSE)
You provide file with the folder location, ending in the filename you want
for your new data file. This command creates a new table-format file called
newfile.txt (in the current working directory, since no folder is given),
delimited by @ and with missing values denoted with ?? (because you’re
actually creating a new file, the file.choose command doesn’t tend to be
used here). Since mydatafile has variable names, these are automatically
written to the file as a header. The optional logical argument quote
determines whether to encapsulate each non-numeric entry in double quotes;
request no quotes by setting the argument to FALSE. Another optional logical
argument, row.names, asks whether to include the row names of mydatafile (in
this example, this would just be the numbers 1 to 6), which you also omit
with FALSE. The resulting file can be opened in a text editor.
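To see exactly what these arguments put on disk, the sketch below writes a two-row data frame to a temporary file and reads the raw lines back. The data frame df and the names out and written are hypothetical and used only for illustration.

```r
# A small data frame with one missing value (hypothetical data).
df <- data.frame(person = c("Ann", "Bob"), age = c(31, NA),
                 stringsAsFactors = FALSE)

# Write it with "@" as the delimiter, "??" for missing values,
# no quotes around text entries, and no row names.
out <- tempfile(fileext = ".txt")
write.table(x = df, file = out, sep = "@", na = "??",
            quote = FALSE, row.names = FALSE)

# Inspect the raw lines as a text editor would show them.
written <- readLines(out)
print(written)

# The same delimiter and missing-value string read the file back in.
reread <- read.table(out, header = TRUE, sep = "@", na.strings = "??",
                     stringsAsFactors = FALSE)
print(reread)
```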

Like read.csv, write.csv is a shortcut version of the write.table function
designed specifically for .csv files.
write.csv(spread,"output.csv")
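By default write.csv also writes the row names as an unnamed first column, which shows up as an extra column when the file is read back. A minimal round-trip sketch, assuming a small hypothetical data frame (df, csv_out, and round_trip are illustrative names):

```r
# A small data frame to export (hypothetical data).
df <- data.frame(id = 1:3, sex = c("F", "M", "F"),
                 stringsAsFactors = FALSE)
csv_out <- tempfile(fileext = ".csv")

# row.names=FALSE drops the 1, 2, 3 row labels, which write.csv would
# otherwise write as an unnamed first column.
write.csv(df, file = csv_out, row.names = FALSE)

# Read the file back to confirm the columns survive the round trip.
round_trip <- read.csv(csv_out, stringsAsFactors = FALSE)
print(round_trip)
```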
