R Chapter4
UNIT I- Chapter 4
In R, a list is a data structure that can hold elements of different types, such as vectors, matrices,
data frames, or even other lists. Lists are very flexible and can be used to store heterogeneous
data. You can create lists using the list() function or by combining different objects into a list.
In the list foo, you’ve stored a 2 × 2 numeric matrix, a logical vector, and a character string.
These are printed in the order they were supplied to list. Just as with vectors, you can use the
length function to check the number of components in a list.
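A list matching this description can be sketched as follows. The specific values are assumptions chosen only to fit the description (a 2 × 2 numeric matrix, a logical vector, and a character string).

```r
# Hypothetical contents: only the three component types come from the text.
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")

foo          # components print in the order they were supplied
length(foo)  # number of components in the list
```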
You can retrieve components from a list using indexes, which are entered in double square
brackets.
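As a minimal sketch of double-bracket indexing, assuming foo holds the three components described above (the values are hypothetical):

```r
# Hypothetical list with a matrix, a logical vector, and a string.
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")

foo[[1]]        # first component: the matrix
foo[[3]]        # third component: the character string
foo[[1]][2, 1]  # an extracted component can be indexed further
```

Once a component has been extracted with double square brackets, it behaves like any other object of its type, so the usual vector or matrix indexing applies to it.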
Naming
You can name list components to make the elements more recognizable and easier to work
with. A name is an R attribute.
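One way to name the components of an existing list is the names function. In this sketch the list contents and the labels mymatrix, mylogicals, and mystring are illustrative assumptions.

```r
# Hypothetical list with a matrix, a logical vector, and a string.
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")

# Assign one label per component, in order.
names(foo) <- c("mymatrix", "mylogicals", "mystring")

foo$mymatrix    # access a component by name with the dollar operator
foo[["mystring"]]  # the same idea with double square brackets
```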
To name the components of a list as it’s being created, assign a label to each component in
the list command. Using some components of foo, create a new, named list.
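A minimal sketch of naming components at creation time, assuming foo as described earlier. The labels flags and greeting are hypothetical.

```r
# Hypothetical list with a matrix, a logical vector, and a string.
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")

# Label each component inside the call to list().
baz <- list(flags = foo[[2]], greeting = foo[[3]])

names(baz)     # the labels supplied above
baz$greeting   # retrieve a component by its name
```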
Note that you can add components to any existing list by using the dollar operator and a new
name. Here’s an example using foo and baz from earlier:
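The original example is not reproduced in these notes. The following sketch shows the idea under assumed contents for foo and baz; the component name nested is hypothetical.

```r
# Hypothetical lists standing in for foo and baz.
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")
baz <- list(flags = foo[[2]], greeting = foo[[3]])

# Assigning to a new name with the dollar operator appends a component.
baz$nested <- foo

length(baz)       # baz now has one more component
baz$nested[[3]]   # a list stored inside a list can be indexed in turn
```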
Data Frames
A data frame is R’s most natural way of presenting a data set with a collection of recorded
observations for one or more variables. Like lists, data frames have no restriction on the data
types of the variables; you can store numeric data, factor data, and so on. The R data frame can
be thought of as a list with some extra rules attached. The most important distinction is that in
a data frame (unlike a list), the members must all be vectors of equal length. The data frame is
one of the most important and frequently used tools in R for statistical data analysis.
To create a data frame from scratch, use the data.frame function. You supply your data, grouped
by variable, as vectors of the same length—the same way you would construct a named list.
Consider the following example data set:
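The data set itself is not reproduced in these notes. A sketch consistent with the output shown later follows: the names and genders come from that output, while the ages are hypothetical placeholders and storing gender as a factor is an assumption.

```r
# One vector per variable, all of the same length.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),                 # hypothetical ages
  gender = factor(c("M", "F", "F", "M", "M"))    # stored as a factor
)

mydata  # prints one row per record, one column per variable
```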
You can extract portions of the data by specifying row and column index positions (much as
with a matrix). Here’s an example:
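The example itself is not reproduced in these notes. One call that fits the description, assuming mydata has the columns person, age, and gender in that order (the ages are hypothetical), is:

```r
# Assumed layout of mydata; ages are hypothetical.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  gender = factor(c("M", "F", "F", "M", "M"))
)

# Rows 3 to 5 of column 3.
mydata[3:5, 3]
```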
This returns a factor vector with the gender of Meg, Chris, and Stewie. The following extracts
the entire third and first columns (in that order):
R> mydata[,c(3,1)]
  gender person
1      M  Peter
2      F   Lois
3      F    Meg
4      M  Chris
5      M Stewie
This results in another data frame giving the gender and then the name of each person. You can
also use the names of the vectors that were passed to data.frame to access variables even if you
don’t know their column index positions, which can be useful for large data sets. You use the
same dollar operator you used for member-referencing named lists.
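A short sketch of access by variable name, assuming the mydata layout used in these notes (the ages are hypothetical):

```r
# Assumed layout of mydata; ages are hypothetical.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  gender = factor(c("M", "F", "F", "M", "M"))
)

mydata$age       # the whole age variable as a numeric vector
mydata$age[2]    # the result can be indexed like any vector
mydata$person    # no column position needed, only the name
```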
You can report the size of a data frame—the number of records and variables—just as you’ve
seen for the dimensions of a matrix.
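The functions dim, nrow, and ncol report these sizes. The sketch below assumes the five-record mydata used in these notes, with hypothetical ages.

```r
# Assumed layout of mydata; ages are hypothetical.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  gender = factor(c("M", "F", "F", "M", "M"))
)

dim(mydata)   # number of records, then number of variables
nrow(mydata)  # number of records only
ncol(mydata)  # number of variables only
```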
You can add data to an existing data frame. This could mean adding variables (increasing the
number of columns) or adding records (increasing the number of rows).
The rbind and cbind functions can be used to extend data frames intuitively. For example,
suppose you had another record to include in mydata: the age and gender of another individual,
Brian. The first step is to create a new data frame that contains Brian’s information.
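A sketch of both steps, assuming the mydata layout used in these notes. All ages, including Brian's, are hypothetical; his gender is taken as "M", which agrees with the logical output shown later in this section.

```r
# Assumed layout of mydata; ages are hypothetical.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
  age    = c(42, 40, 17, 14, 1),
  gender = factor(c("M", "F", "F", "M", "M"))
)

# Step 1: a one-row data frame with the same variable names.
newrecord <- data.frame(
  person = "Brian",
  age    = 7,                                        # hypothetical age
  gender = factor("M", levels = levels(mydata$gender))
)

# Step 2: append the new record as an extra row.
mydata <- rbind(mydata, newrecord)
mydata
```

The new data frame uses the same column names as mydata so that rbind can match the variables.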
Adding a variable/column to a data frame is also quite straightforward. Let’s say you’re now
given data on the classification of how funny these six individuals are, defined as a “degree of
funniness.” The degree of funniness can take three possible values: Low, Med (medium), and
High. Suppose Peter, Lois, and Stewie have a high degree of funniness, Chris and Brian have
a medium degree of funniness, and Meg has a low degree of funniness.
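One way to add this variable is with cbind. The sketch assumes the six-record mydata in the order Peter, Lois, Meg, Chris, Stewie, Brian, with hypothetical ages; the variable name funny is also an assumption.

```r
# Assumed six-record mydata; ages are hypothetical.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
  age    = c(42, 40, 17, 14, 1, 7),
  gender = factor(c("M", "F", "F", "M", "M", "M"))
)

# Degree of funniness per person, with the levels in a fixed order.
funny <- factor(c("High", "High", "Low", "Med", "High", "Med"),
                levels = c("Low", "Med", "High"))

# Append the factor as a new column.
mydata <- cbind(mydata, funny)
mydata
```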
One alternative for adding a variable is to use the dollar operator, much like adding a new
member to a named list.
Suppose now you want to add another variable to mydata by including a column with the age
of the individuals in months, not years, calling this new variable age.mon.
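A sketch of adding age.mon with the dollar operator, assuming the six-record mydata with hypothetical ages in years:

```r
# Assumed six-record mydata; ages are hypothetical.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
  age    = c(42, 40, 17, 14, 1, 7),
  gender = factor(c("M", "F", "F", "M", "M", "M"))
)

# Assigning to a new name creates the column; 12 months per year.
mydata$age.mon <- mydata$age * 12

mydata
```

Because the calculation is vectorized, a single assignment fills in the new column for every record.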
You can compare a particular column of a data frame with a value. For example, the gender
column can be compared with "M" or "F" to get a logical vector, as follows:
R> mydata$gender=="M"
[1] TRUE FALSE FALSE TRUE TRUE TRUE
This flags the male records. You can use this with the matrix-like syntax to get the male-only
subset.
R> mydata[mydata$gender=="M",]
R> mydata[mydata$gender=="M",-3]
Or
___________