Data
Usually you will want to import data from a file associated with a homework problem.
Such a file will usually have the extension .txt or .dat.
The data files for this course will always be available on the CD that comes with the text
and/or on the course web page. A data file will consist of columns of numbers, with
nothing separating the columns but “white space.” If each column has a title on top
describing what the data in the column represents (e.g., “age,” “weight,” “income,” etc.),
we will say that the file has a “header.”
The easiest way to import the data into R and have it readily available for the current and
future sessions is to first save the data file into the R-2.5.1 working directory. When you
installed R, this directory was placed on your hard drive. In Windows, it will usually be
C:\Program Files\R\R-2.5.1. To put files there from the book CD, locate the
file on the CD, then copy it into the R-2.5.1 directory. Within R, you can find out what the
current working directory is by choosing Change dir… from the File menu.
Alternatively, you can load the file from the CD or the course web page.
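If you prefer typing to menus, the working directory can also be checked from the command prompt. The following is a minimal sketch using the standard functions getwd() and list.files(); the names wd and data_files are chosen only for illustration.

```r
# Show the directory R currently treats as its working directory.
wd <- getwd()
print(wd)

# List the .txt and .dat files R can see in that directory.
# The result may be empty if no data files have been copied there yet.
data_files <- list.files(wd, pattern = "\\.(txt|dat)$")
print(data_files)
```

If the file you copied does not appear in this list, it was probably saved into a different directory.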
Suppose you want to work with the data from Problem 19 of Chapter 1, which is in a file
named CH01PR19.txt which you have saved from the CD or the course web page into
your R working directory. Assume the file has no header. You will want to create a
Table object in R (R itself calls this a "data frame") containing this data. First choose
an appropriate name for the table. Assume you choose to name it Data. Then at the R
command prompt type
> Data <- read.table("CH01PR19.txt")
and hit the “Enter” key (but don’t type the leading > symbol; it’s already there!). Then
there will be a Table object in R named Data containing the data in rows and columns.
To view it, you would type
> Data
at the command prompt and hit Enter. However, if it is a large file, you might not be able
to view the whole table at once. In that case, try
> head(Data)
Note that, in the absence of a header, the columns will be named V1, V2, etc., and the
rows will be numbered.
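To try these steps without the course files, you can build a small stand-in file yourself. The sketch below writes three made-up rows (not the real CH01PR19 data) to a temporary file and reads them back; the names tmp and Example are hypothetical.

```r
# Create a small stand-in data file in a temporary location.
# The values are invented for illustration only.
tmp <- tempfile(fileext = ".txt")
writeLines(c("3.9 21", "3.1 14", "2.5 28"), tmp)

# Read the file; it has no header line.
Example <- read.table(tmp)

names(Example)    # default column names: "V1" "V2"
dim(Example)      # 3 rows and 2 columns
head(Example, 2)  # show only the first two rows
```

By default head() shows the first six rows; a second argument, as in head(Example, 2), changes how many are shown.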
Now if the file does have a header (which you may have added yourself), you need to
change the above command to
> Data <- read.table("CH01PR19.txt", header=TRUE)
In this case, when you view the table you will see the title for each column at the top of
each column instead of V1, V2, etc. R regards these titles as names for the columns, and
not as data.
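The effect of the header setting can be seen on a small made-up file. This is only a sketch: the file contents and the names tmp, WithHeader and NoHeader are assumptions for illustration.

```r
# A stand-in file whose first line holds the column titles (invented values).
tmp <- tempfile(fileext = ".txt")
writeLines(c("GPA ACT", "3.9 21", "3.1 14"), tmp)

# With header=TRUE the first line supplies the column names.
WithHeader <- read.table(tmp, header = TRUE)
names(WithHeader)   # "GPA" "ACT"
nrow(WithHeader)    # 2 data rows

# Without header=TRUE the title line is read as a row of data,
# so the columns are stored as text rather than numbers.
NoHeader <- read.table(tmp)
nrow(NoHeader)      # 3 rows, including the title line
is.numeric(NoHeader$V1)
```

If a column that should hold numbers turns out not to be numeric, a forgotten header=TRUE is a likely cause.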
If you want to load the data file from some other directory, or if you wish to load it from
the CD, you need to type the full path name in the read.table() command. For
instance, if you want to load the data for problem 19 in chapter one (which has no header)
directly from the CD in the D:/ drive into a table called Data, you would use the lengthy
command:
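> Data <- read.table("D:/KutnerData/Chapter 1 Data Sets/CH01PR19.txt")
(Note: Inside the quotes, write the path with forward slashes, even in Windows. R treats
a single backslash as an escape character, so a Windows-style path would need each
backslash doubled, as in D:\\KutnerData.) As you can see, it would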
be easier to just copy this file into the R working directory first. If you want to obtain the
data from a web page, put its complete URL within the quotes. For example:
> Data <- read.table("http://www.stat.ucdavis.edu/sta108/CH01PR19.txt")
(Note: this URL doesn’t actually exist). Again, if the file has a header you must add the
header=TRUE setting within the read.table() command (after a comma).
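One way to avoid mistyping a long path is to let R assemble it and to check that the file is there before reading it. The sketch below uses the standard functions file.path() and file.exists(); the variable name cd_path is hypothetical, and the drive letter and folder names are simply those of the CD example above.

```r
# Build the full path from its parts; file.path() joins them with "/".
cd_path <- file.path("D:", "KutnerData", "Chapter 1 Data Sets", "CH01PR19.txt")
print(cd_path)

# Read the file only if R can actually find it at that location.
if (file.exists(cd_path)) {
  Data <- read.table(cd_path)
} else {
  message("File not found: ", cd_path)
}
```

If the message appears, check the drive letter and the spelling of each folder name.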
Now suppose the table Data has two columns, and the first column is the GPA, while the
second column is the ACT score. If you would like to rename the columns in your R data
table so that each column has a descriptive title, you could give the R command:
> names(Data) <- c("GPA", "ACT")
Then when you view the table the titles of the columns will have the new names you
assigned. Note that you can also give the columns these titles in the data file before you
load it into R, and then use the header=TRUE setting when loading. Also, to avoid
errors, you should never include a space in the title of any column.
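Once the columns have names, each one can be referred to by its name. The sketch below uses a made-up two-row table (the values and the names tmp and Example are assumptions) to show the renaming and one reason to avoid spaces in titles.

```r
# Stand-in data with no header line (invented values).
tmp <- tempfile(fileext = ".txt")
writeLines(c("3.9 21", "3.1 14"), tmp)
Example <- read.table(tmp)

# Replace the default names V1 and V2 with descriptive titles.
names(Example) <- c("GPA", "ACT")
names(Example)      # "GPA" "ACT"

# A column can now be picked out with the $ sign and its name.
Example$GPA
mean(Example$ACT)

# A title containing a space would have to be wrapped in backquotes
# every time it is used, e.g. Example$`ACT score`, which is easy to get wrong.
```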
If you want to see which R objects are currently in your R environment, you can type
> ls()
If you no longer need one or more of these objects, you can remove them. For instance,
if you are done with Data, you can type
> rm(Data)
Then Data will no longer be in your current R environment. When you quit R, if you
wish to keep all the new objects in your current R environment, be sure to answer “Yes”
when asked, “Save workspace image?”
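The listing and removing of objects can be tried safely on throwaway objects. In this sketch the objects x and y are hypothetical and exist only for the demonstration.

```r
# Create two throwaway objects.
x <- 1:3
y <- "temporary"

# Both names now appear in the list of objects.
"x" %in% ls()
"y" %in% ls()

# Remove y; x is left untouched.
rm(y)
"y" %in% ls()
```

Note that rm() does not ask for confirmation, so check the name carefully before removing an object you may still need.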