R Chapter4

Uploaded by

Akshay Hebbar
Copyright
© All Rights Reserved

R Programming

UNIT I- Chapter 4

LISTS and Data Frames

In R, a list is a data structure that can hold elements of different types, such as vectors, matrices,
data frames, or even other lists. Lists are very flexible and can be used to store heterogeneous
data. You can create lists using the list() function or by combining different objects into a list.
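A minimal sketch of creating a list with list(). The name foo matches the text that follows; the specific values are assumptions chosen for illustration.

```r
# A list holding three components of different types.
# The values below are illustrative assumptions.
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),  # 2 x 2 numeric matrix
            c(TRUE, FALSE, TRUE, TRUE),              # logical vector
            "hello")                                 # character string
foo  # components are printed in order, labeled [[1]], [[2]], [[3]]
```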

In the list foo, you’ve stored a 2 x 2 numeric matrix, a logical vector, and a character string.
These are printed in the order they were supplied to list(). Just as with vectors, you can use the
length() function to check the number of components in a list.
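A short sketch of length() applied to a list, assuming the same illustrative values for foo:

```r
# foo is rebuilt here with assumed values so the example runs on its own.
foo <- list(matrix(1:4, nrow = 2), c(TRUE, FALSE, TRUE, TRUE), "hello")
length(foo)       # 3: counts the top-level components only
length(foo[[2]])  # 4: the length of the logical vector itself
```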

You can retrieve components from a list using indexes, which are entered in double square
brackets.

Accessing List Components
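A minimal sketch of retrieving single components with double square brackets, assuming the illustrative values for foo:

```r
# foo is rebuilt here with assumed values so the example runs on its own.
foo <- list(matrix(1:4, nrow = 2), c(TRUE, FALSE, TRUE, TRUE), "hello")
foo[[1]]        # the matrix
foo[[3]]        # the character string
foo[[1]][2, 1]  # indexes can be chained: row 2, column 1 of the matrix
```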

V Sem BCA R-Programming- Chapter-04 Lists and Data Frames 1


Accessing 2nd and 3rd component
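Several components can be taken at once with single square brackets, which return a list rather than the components themselves. A sketch under the same assumed values; the name bar is hypothetical:

```r
# foo is rebuilt here with assumed values so the example runs on its own.
foo <- list(matrix(1:4, nrow = 2), c(TRUE, FALSE, TRUE, TRUE), "hello")
bar <- foo[c(2, 3)]  # list slicing: keeps the 2nd and 3rd components
class(bar)           # "list"
length(bar)          # 2
```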

Naming

You can name list components to make the elements more recognizable and easier to work
with. A name is an R attribute.
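A minimal sketch of assigning names with the names() function. The values of foo are assumed; the three names are the ones used in the text that follows.

```r
# foo is rebuilt here with assumed values so the example runs on its own.
foo <- list(matrix(1:4, nrow = 2), c(TRUE, FALSE, TRUE, TRUE), "hello")
names(foo) <- c("mymatrix", "mylogicals", "mystring")
foo  # components are now labeled $mymatrix, $mylogicals, $mystring
```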



This has changed how the object is printed to the console. Where earlier it printed [[1]], [[2]],
and [[3]] before each component, now it prints the names you specified: $mymatrix,
$mylogicals, and $mystring. You can now perform member referencing using these names and
the dollar operator, rather than the double square brackets.
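For instance, the matrix component could be retrieved by name as sketched here (values of foo assumed):

```r
# foo is rebuilt here with assumed values so the example runs on its own.
foo <- list(matrix(1:4, nrow = 2), c(TRUE, FALSE, TRUE, TRUE), "hello")
names(foo) <- c("mymatrix", "mylogicals", "mystring")
foo$mymatrix
```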

This is the same as calling foo[[1]].

To name the components of a list as it’s being created, assign a label to each component in
the list() command. Using some components of foo, create a new, named list.
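A sketch of naming components at creation time. The list name baz is taken from the later text; the component names flags, greeting, and doubled are hypothetical, as are the values of foo.

```r
# foo is rebuilt here with assumed values so the example runs on its own.
foo <- list(mymatrix = matrix(1:4, nrow = 2),
            mylogicals = c(TRUE, FALSE, TRUE, TRUE),
            mystring = "hello")
# Each component is labeled with name = value inside list().
baz <- list(flags = foo$mylogicals,
            greeting = foo$mystring,
            doubled = foo$mymatrix * 2)
names(baz)  # "flags" "greeting" "doubled"
```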



List Nesting

Note that you can add components to any existing list by using the dollar operator and a new
name. Here’s an example using foo and baz from earlier:
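A minimal sketch of nesting, assuming the illustrative foo and baz; the new component name inner is hypothetical:

```r
# foo and baz are rebuilt here with assumed values so the example runs on its own.
foo <- list(mymatrix = matrix(1:4, nrow = 2),
            mylogicals = c(TRUE, FALSE, TRUE, TRUE),
            mystring = "hello")
baz <- list(flags = foo$mylogicals, greeting = foo$mystring,
            doubled = foo$mymatrix * 2)
baz$inner <- foo    # a new name with the dollar operator adds a component
length(baz)         # 4
baz$inner$mystring  # "hello": references chain through the nested list
```

The whole of foo is stored as a single component of baz, so member references are chained from the outer list inward.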

Data Frames

In R, a data frame is a two-dimensional tabular data structure similar to a spreadsheet or a table.
It is one of the most commonly used data structures for data manipulation and analysis in R.
Data frames store data in rows and columns, where each column holds values of a single type
but different columns can hold different types.

A data frame is R’s most natural way of presenting a data set with a collection of recorded
observations for one or more variables. Like lists, data frames have no restriction on the data
types of the variables; you can store numeric data, factor data, and so on. The R data frame can
be thought of as a list with some extra rules attached. The most important distinction is that in
a data frame (unlike a list), the members must all be vectors of equal length. The data frame is
one of the most important and frequently used tools in R for statistical data analysis.
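The "list with extra rules" view can be checked directly. This is a small sketch with hypothetical data (the name df and its columns are assumptions):

```r
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))  # hypothetical data
is.list(df)         # TRUE: a data frame is built on a list
sapply(df, length)  # every member has the same length

# Members whose lengths cannot be reconciled are rejected.
res <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(res, "try-error")  # TRUE
```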

Creating Data Frame

To create a data frame from scratch, use the data.frame function. You supply your data, grouped
by variable, as vectors of the same length—the same way you would construct a named list.
Consider the following example data set:



R> mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"),
                        age=c(42,40,17,14,1),
                        gender=factor(c("M","F","F","M","M")))
R> mydata
  person age gender
1  Peter  42      M
2   Lois  40      F
3    Meg  17      F
4  Chris  14      M
5 Stewie   1      M

You can extract portions of the data by specifying row and column index positions (much as
with a matrix). Here’s an example:
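A sketch of row and column extraction on mydata; the particular positions chosen are an assumption consistent with the description that follows:

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")))
mydata[2, 2]    # row 2, column 2: the age of Lois
mydata[3:5, 3]  # rows 3 to 5 of column 3
```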

This returns a factor vector with the gender of Meg, Chris, and Stewie. The following extracts
the entire third and first columns (in that order):

R> mydata[,c(3,1)]
  gender person
1      M  Peter
2      F   Lois
3      F    Meg
4      M  Chris
5      M Stewie

This results in another data frame giving the sex and then the name of each person. You can
also use the names of the vectors that were passed to data.frame to access variables even if you
don’t know their column index positions, which can be useful for large data sets. You use the
same dollar operator you used for member-referencing named lists.
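A minimal sketch of accessing a variable by name:

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")))
mydata$age     # the whole age column as a numeric vector
mydata$age[2]  # 40: the result can be indexed like any vector
```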

You can report the size of a data frame—the number of records and variables—just as you’ve
seen for the dimensions of a matrix.
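The size can be queried with nrow(), ncol(), and dim(), sketched here on the five-record mydata:

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")))
nrow(mydata)  # 5 records
ncol(mydata)  # 3 variables
dim(mydata)   # 5 3
```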



Adding Data Columns and Combining Data Frames

You can add data to an existing data frame, either as new variables (adding to the number of
columns) or as new records (adding to the number of rows).
The rbind and cbind functions extend data frames in an intuitive way. For example,
suppose you have another record to include in mydata: the age and gender of another individual,
Brian. The first step is to create a new data frame that contains Brian’s information.

R> newrecord <- data.frame(person="Brian",age=7,
                           gender=factor("M",levels=levels(mydata$gender)))
R> newrecord
  person age gender
1  Brian   7      M

Now, you can simply call the following:

R> mydata <- rbind(mydata,newrecord)
R> mydata
  person age gender
1  Peter  42      M
2   Lois  40      F
3    Meg  17      F
4  Chris  14      M
5 Stewie   1      M
6  Brian   7      M

Adding a variable/column to a data frame is also quite straightforward. Let’s say you’re now
given data on the classification of how funny these six individuals are, defined as a “degree of
funniness.” The degree of funniness can take three possible values: Low, Med (medium), and
High. Suppose Peter, Lois, and Stewie have a high degree of funniness, Chris and Brian have
a medium degree of funniness, and Meg has a low degree of funniness.
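These observations could be recorded as a factor, in the same row order as mydata. The name funny is the one used in the cbind call; the level ordering Low, Med, High is an assumption.

```r
# One value per person, in row order: Peter, Lois, Meg, Chris, Stewie, Brian.
funny <- factor(c("High", "High", "Low", "Med", "High", "Med"),
                levels = c("Low", "Med", "High"))  # assumed level order
funny
```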



Now, you can simply use cbind to append this factor vector as a column to the existing
mydata.

R> mydata <- cbind(mydata,funny)
R> mydata
  person age gender funny
1  Peter  42      M  High
2   Lois  40      F  High
3    Meg  17      F   Low
4  Chris  14      M   Med
5 Stewie   1      M  High
6  Brian   7      M   Med

One alternative for adding a variable is to use the dollar operator, much like adding a new
member to a named list.

Suppose now you want to add another variable to mydata by including a column with the age
of the individuals in months, not years, calling this new variable age.mon.
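A minimal sketch of this step. Here mydata is rebuilt in one call in the state described so far, and multiplying the age in years by 12 is the assumed conversion.

```r
# mydata rebuilt with all six records and the funny column.
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     funny = factor(c("High", "High", "Low", "Med", "High", "Med"),
                                    levels = c("Low", "Med", "High")))
# Assigning to a new name with the dollar operator appends a column.
mydata$age.mon <- mydata$age * 12
names(mydata)
```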

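  person age gender funny age.mon
1  Peter  42      M  High     504
2   Lois  40      F  High     480
3    Meg  17      F   Low     204
4  Chris  14      M   Med     168
5 Stewie   1      M  High      12
6  Brian   7      M   Med      84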

Logical Record Subsets

You can compare a particular column of a data frame with a value. For example, gender can be
compared with "M" or "F" to get a logical result, as follows.

R> mydata$gender=="M"
[1] TRUE FALSE FALSE TRUE TRUE TRUE

This flags the male records. You can use this with the matrix-like syntax to get the male-only
subset.

R> mydata[mydata$gender=="M",]
  person age gender funny age.mon
1  Peter  42      M  High     504
4  Chris  14      M   Med     168
5 Stewie   1      M  High      12
6  Brian   7      M   Med      84

R> mydata[mydata$gender=="M",-3]

Or

R> mydata[mydata$gender=="M", c("person","age","funny","age.mon")]

Both statements above return the male records while omitting the third column, gender.
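The equivalence of the two forms can be checked with a short sketch. Here mydata is rebuilt in its final state, and the names males, by_position, and by_name are hypothetical.

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     funny = factor(c("High", "High", "Low", "Med", "High", "Med"),
                                    levels = c("Low", "Med", "High")))
mydata$age.mon <- mydata$age * 12

males <- mydata$gender == "M"  # logical flag vector, one entry per row
by_position <- mydata[males, -3]  # drop column 3 by negative index
by_name <- mydata[males, c("person", "age", "funny", "age.mon")]  # keep by name
all.equal(by_position, by_name)  # TRUE
```

Selecting by name is usually the safer choice when column positions might change.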

___________
