Data Structure (Data Frame)

Uploaded by

Md Ruhul Amin
Statistical Computing III: R Programming

Md. Ershadul Haque


Associate Professor
Department of Statistics
University of Dhaka
Advanced Data Structures
Sometimes data requires more complex storage than simple vectors
and thankfully R provides a host of data structures. The most common
are the data.frame, list and matrix followed by the array. Of
these, the data.frame will be most familiar to anyone who has used a
spreadsheet, the matrix to people familiar with matrix math and the
list to programmers.
data.frame
• Perhaps one of the most useful features of R is the data.frame. It
is one of the most often cited reasons for R’s ease of use.
• On the surface a data.frame is just like an Excel spreadsheet in
that it has columns and rows. In statistical terms, each column is a
variable and each row is an observation.
• In terms of how R organizes data.frames, each column is actually
a vector, and all of these vectors have the same length. Because the
columns are separate vectors, each column can hold a different type
of data, while every value within a single column has the same type.
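A minimal sketch of the point above, using a small made-up data frame. The names people, name, age and smoker are illustrative and do not come from the slides.

```r
# Hypothetical example data: three columns of three different types
people <- data.frame(
  name   = c("A", "B", "C"),       # character column
  age    = c(23L, 35L, 41L),       # integer column
  smoker = c(TRUE, FALSE, FALSE),  # logical column
  stringsAsFactors = FALSE         # keep text as character in older R versions too
)

# Each column is a vector with its own type
col_types <- sapply(people, class)
print(col_types)

# All columns share the same length, which equals the number of rows
col_lengths <- sapply(people, length)
print(col_lengths)
```

Setting stringsAsFactors = FALSE explicitly is a choice made here so that the sketch behaves the same before and after R 4.0.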
data.frame (cont…)
• There are numerous ways to construct a data.frame, the simplest
being to use the data.frame function.
x <- 10:1
y <- -4:5
u <- LETTERS[1:10]
v <- letters[1:10]
dataf <- data.frame(x, y, u, v)
dataf
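As a small extension of the construction above, column names can be set with named arguments and the result checked with dim() and names(). The object name dataf2 and the column names First, Second, Upper and Lower are assumptions made for illustration.

```r
x <- 10:1
y <- -4:5
u <- LETTERS[1:10]
v <- letters[1:10]

# Named arguments set the column names explicitly
# (First, Second, Upper, Lower are illustrative names)
dataf2 <- data.frame(First = x, Second = y, Upper = u, Lower = v)

dims <- dim(dataf2)    # number of rows, then number of columns
cols <- names(dataf2)  # column names
print(dims)
print(cols)
```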
data.frame (cont…)
• To access multiple columns by column number, make the column
argument a numeric vector of the column numbers.
dataf[, c(1, 4)]
• To access multiple columns by name, make the column argument a
character vector of the names.
dataf[, c("x", "y")]
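The same bracket notation also covers single columns, rows and individual cells. The sketch below rebuilds dataf so that it runs on its own. The object names col_vec, col_df, row3 and cell are illustrative.

```r
dataf <- data.frame(x = 10:1, y = -4:5, u = LETTERS[1:10], v = letters[1:10])

col_vec  <- dataf$x                     # one column as a plain vector
col_vec2 <- dataf[["x"]]                # the same column, using [[ ]]
col_df   <- dataf[, "x", drop = FALSE]  # one column, kept as a data.frame
row3     <- dataf[3, ]                  # third row, all columns
cell     <- dataf[3, 2]                 # row 3, column 2

print(col_vec)
print(class(col_df))
print(row3)
print(cell)
```

Selecting a single column with dataf[, "x"] returns a vector, so drop = FALSE is used here when a one-column data.frame is wanted.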
Reading Data into R
There are numerous ways in R to get data; the most common is probably
reading comma separated values (CSV) files. Of course there are many
other options that we will cover as well.
Reading CSVs
• A CSV file can be read with read.table by setting header = TRUE
and sep = ",". The read.csv function is a convenience wrapper: all it
does is call read.table with these and a few other arguments preset.
Either way, the result is a data.frame.
• The first argument to read.table is the path of the file to be
loaded, either absolute or relative to the working directory. The
file can be sitting on disk or even on the Web, in which case the
argument is a URL.
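The relationship between read.table and read.csv can be checked on a small file. The sketch below writes a made-up CSV to a temporary location (the column names id, weight and group are invented for the example) and reads it both ways.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,weight,group",
             "1,60.5,a",
             "2,72.0,b",
             "3,55.3,a"), csv_path)

# read.table has to be told about the header row and the comma separator
via_table <- read.table(csv_path, header = TRUE, sep = ",")

# read.csv presets those same arguments
via_csv <- read.csv(csv_path)

print(via_table)
print(all.equal(via_table, via_csv))
```

For a simple file like this one the two calls are expected to give the same data.frame. Files with quotes or ragged rows may differ, because read.csv also presets other arguments such as quote and fill.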
Reading Data into R (cont…)
Reading CSVs
• Read the .csv file into R using read.csv
mydata<- read.csv("nutrition_data.csv")
Reading Data into R (cont…)
Excel Data
• To read Excel files (xls/xlsx) one can use the package readxl. This
package provides the function read_excel, which reads both .xls and
.xlsx files.
library(readxl)
data <- read_excel("C:/path/file_name.xlsx", sheet = 1)
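A hedged sketch of a typical readxl workflow, assuming the readxl package is installed and that its bundled example workbook datasets.xlsx is available through readxl_example(). The code is skipped when the package is missing.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook shipped with readxl (assumed to be present)
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets before deciding which one to read
  sheet_names <- readxl::excel_sheets(xlsx_path)
  print(sheet_names)

  # Read a sheet by position; a sheet name would work as well
  first_sheet <- readxl::read_excel(xlsx_path, sheet = 1)
  print(head(first_sheet))
}
```

read_excel returns a tibble, which can be used like a data.frame or converted with as.data.frame().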
Reading Data from Other Sources
• R is capable of reading data from most formats, such as Excel,
SAS, Stata and SPSS.
• The foreign package in R can be used to read data stored as
SPSS SAV files, Stata DTA files, or SAS XPORT libraries.
install.packages("foreign")
library(foreign)
• The read.spss() function will read data from SPSS.
data <- read.spss("gssnet.sav", to.data.frame = TRUE)
• To read the numeric values instead of factor levels, add the
argument use.value.labels = FALSE:
data <- read.spss("gssnet.sav", to.data.frame = TRUE, use.value.labels = FALSE)
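The effect of use.value.labels can be seen by reading the same file twice. This sketch assumes the sample file electric.sav that is distributed with the foreign package. If the package or the file is not found, the code does nothing.

```r
if (requireNamespace("foreign", quietly = TRUE)) {
  # Sample SPSS file assumed to ship with the foreign package
  sav_path <- system.file("files", "electric.sav", package = "foreign")

  if (nzchar(sav_path)) {
    # Default: labelled values are converted to factors
    labelled <- foreign::read.spss(sav_path, to.data.frame = TRUE)

    # Keep the underlying numeric codes instead
    numeric_codes <- foreign::read.spss(sav_path, to.data.frame = TRUE,
                                        use.value.labels = FALSE)

    # Compare how each column is stored in the two versions
    print(sapply(labelled, class))
    print(sapply(numeric_codes, class))
  }
}
```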
Reading Data from Other Sources (cont…)
• The function to read data from Stata is read.dta().
dataSTATA <- read.dta("C:/path/file_name.dta")
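Since no .dta file accompanies these slides, the sketch below first writes the built-in mtcars data to a temporary Stata file with write.dta() from the same package and then reads it back. The names dta_path and cars_dta are illustrative.

```r
if (requireNamespace("foreign", quietly = TRUE)) {
  # Create a Stata file to read, using a temporary path
  dta_path <- tempfile(fileext = ".dta")
  foreign::write.dta(mtcars, dta_path)

  # Read the file back into a data.frame
  cars_dta <- foreign::read.dta(dta_path)
  print(dim(cars_dta))
  print(names(cars_dta))
}
```

read.dta() handles the older Stata formats (versions 5 to 12). Files saved by newer Stata versions usually need a different reader, for example read_dta() from the haven package.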
R Built-in Data Sets
• To see the list of pre-loaded data, type
data()
• To load a data set named mtcars and print its first rows (six by
default, or as many as requested), type:
data(mtcars)
head(mtcars)
head(mtcars, 10)
• To print the last rows of the mtcars data set
tail(mtcars)
• To access a data set from an installed package, load that package
library(MASS)
data(birthwt)
• To access a data set from a package that is not yet installed,
first install that package
install.packages("lmtest")
library(lmtest)
data(wages)
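A short sketch of the commands above with an explicit row count, plus one way to load a single data set from a package without attaching the whole package. It assumes MASS is installed. The object names first_three and last_three are illustrative.

```r
data(mtcars)                     # mtcars comes from the datasets package

first_three <- head(mtcars, 3)   # first 3 rows
last_three  <- tail(mtcars, 3)   # last 3 rows
print(first_three)
print(last_three)

# data(package = "MASS") would list the data sets available in MASS.
# Load one data set from a package without calling library()
if (requireNamespace("MASS", quietly = TRUE)) {
  data(birthwt, package = "MASS")
  print(dim(birthwt))
}
```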
R Built-in Data Sets (cont…)
• You can assign a name for the data set mtcars:
data <- mtcars
• The class() function will give the type of data structure:
class(data)
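class() can be applied to the whole data set or to a single column, and str() gives a compact overview of every column. The name mycars is an illustrative choice.

```r
mycars <- mtcars

kind     <- class(mycars)      # type of the whole object
col_kind <- class(mycars$mpg)  # type of one column
print(kind)
print(col_kind)

str(mycars)                    # one line per column: type and first values
```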
Avoiding $-notation (attach and detach)
• The notation for accessing variables in data frames gets rather
heavy if you repeatedly have to write longish commands like
birthwt$bwt[birthwt$smoke == 1] after loading the data:
library(MASS)
data(birthwt)
• Fortunately, you can make R look for objects among the variables in
a given data frame, for example birthwt. You write
attach(birthwt)
attach and detach (cont…)
• then birthwt’s variables are available without the clumsy $-notation.
• When you are done, detach(birthwt) removes the data frame from the
search path again.
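A minimal sketch of the attach and detach cycle with the birthwt data from MASS, where bwt is the birth weight column. The with() call at the end is an alternative that is not covered in the slides and is shown only for comparison.

```r
library(MASS)
data(birthwt)

attach(birthwt)              # put birthwt's columns on the search path
mean_attached <- mean(bwt)   # bwt is now found without birthwt$
detach(birthwt)              # remove birthwt from the search path again

# with() evaluates an expression inside the data frame without attaching it
mean_with <- with(birthwt, mean(bwt))

print(mean_attached)
print(mean_with)
```

Calling detach() as soon as the work is done avoids confusion when an attached column has the same name as another object.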
