R Language C2
Working with Logical Vectors
You can construct vectors that contain only logical values and use them as
an argument for the index functions.
Comparing values
To build logical vectors, you’d better know how to compare values, and R
contains a set of operators that you can use for this purpose.
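As a minimal sketch of these comparison operators, the code below builds a few logical vectors. The basket counts are assumed values, chosen to be consistent with the outputs shown in this section.

```r
# Assumed per-game basket counts (not stated explicitly in the text)
baskets.of.Granny <- c(12, 4, 5, 6, 9, 3)
baskets.of.Geraldine <- c(5, 3, 2, 2, 12, 9)

more.than.five <- baskets.of.Granny > 5    # TRUE FALSE FALSE TRUE TRUE FALSE
exactly.four   <- baskets.of.Granny == 4   # FALSE TRUE FALSE FALSE FALSE FALSE
not.four       <- baskets.of.Granny != 4   # TRUE FALSE TRUE TRUE TRUE TRUE
at.most.five   <- baskets.of.Granny <= 5   # FALSE TRUE TRUE FALSE FALSE TRUE

# Comparing two vectors works element by element
granny.ahead <- baskets.of.Granny > baskets.of.Geraldine
granny.ahead                               # TRUE TRUE TRUE TRUE FALSE FALSE
```

Each comparison returns a logical vector of the same length as the vector being compared.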
To find out in which games Granny scored more than five baskets, use the
which() function, which is especially handy for long vectors.
> which(baskets.of.Granny > 5)
[1] 1 4 5
The which() function takes a logical vector as an argument. Hence, you can
save the result of a comparison in an object and pass that object to
which():
> the.best <- baskets.of.Geraldine < baskets.of.Granny
> the.best
[1] TRUE TRUE TRUE TRUE FALSE FALSE
> which(the.best)
[1] 1 2 3 4
Using logical vectors as indices
The index function doesn’t take only numerical vectors as arguments; it
also works with logical vectors. If you use a logical vector to index, R
returns a vector with only the values for which the logical vector is TRUE.
> baskets.of.Granny[the.best]
[1] 12 4 5 6
> baskets.of.Granny[baskets.of.Granny > baskets.of.Geraldine]
[1] 12 4 5 6
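You can’t test for missing values by comparing with NA directly:
> x == NA
That won’t work, because the result is NA for every element. You need to use
is.na() instead.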
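A short sketch of the difference, using an assumed example vector x with one missing value:

```r
x <- c(3, 6, 2, NA, 1)             # assumed example vector

compared   <- x == NA              # NA NA NA NA NA: any comparison with NA is NA
missing    <- is.na(x)             # FALSE FALSE FALSE TRUE FALSE
missing.at <- which(is.na(x))      # 4
complete   <- x[!is.na(x)]         # 3 6 2 1
```

Because is.na() returns a logical vector, you can use it directly as an index or pass it to which().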
You have an easy way to figure out whether any value in a logical vector is TRUE
with the function any(). To ask R whether Granny was better than Geraldine in any
game, use this code:
> any(the.best)
[1] TRUE
To find out whether Granny was always better than Geraldine, use the following
code:
> all(the.best)
[1] FALSE
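Related to any() and all(), you can also count the TRUE values, because R treats TRUE as 1 and FALSE as 0 in arithmetic. The sketch below recreates the.best by hand so that it runs on its own.

```r
# the.best as shown in the output above, typed in directly
the.best <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)

any.better   <- any(the.best)    # TRUE
always       <- all(the.best)    # FALSE
games.better <- sum(the.best)    # 4 games in which Granny was better
share.better <- mean(the.best)   # proportion of games, about 0.67
```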
Powering Up Your Math
Vectorization is the Holy Grail for every R programmer. Using indices
and vectorized operators can save you a lot of coding and calculation
time.
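To illustrate what vectorization saves, the sketch below doubles every value in a vector twice: once with an explicit loop and once with a vectorized operator. The basket counts are assumed values.

```r
baskets.of.Granny <- c(12, 4, 5, 6, 9, 3)   # assumed values

# Loop version: handle one element at a time
doubled.loop <- numeric(length(baskets.of.Granny))
for (i in seq_along(baskets.of.Granny)) {
  doubled.loop[i] <- baskets.of.Granny[i] * 2
}

# Vectorized version: one expression for the whole vector
doubled.vec <- baskets.of.Granny * 2

identical(doubled.loop, doubled.vec)        # TRUE
```

Both versions give the same result, but the vectorized one is shorter and leaves the looping to R.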
Using arithmetic vector operations
A third set of arithmetic functions consists of functions in which the
outcome is dependent on more than one value in the vector. Often, the
idea behind these operations requires some form of looping over the
different values in a vector.
Calculations with missing values always return NA as a result. The same is
true for vector operations as well. R, however, gives you a way to simply
discard the missing values by setting the argument na.rm to TRUE.
> x <- c(3, 6, 2, NA, 1)
> sum(x)
[1] NA
> sum(x, na.rm = TRUE)
[1] 12
This argument works in sum(), prod(), min(), and max().
If you have a vector that contains only missing values and you set the
argument na.rm to TRUE, R calculates the result for an empty vector: the sum
is 0, the product is 1, the minimum is Inf, and the maximum is -Inf.
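A quick check of these four results, as a sketch. The vector name all.missing is made up for the example, and suppressWarnings() is used because min() and max() also warn when no non-missing values remain.

```r
all.missing <- c(NA_real_, NA_real_)   # hypothetical vector with only NA values

empty.sum  <- sum(all.missing, na.rm = TRUE)                     # 0
empty.prod <- prod(all.missing, na.rm = TRUE)                    # 1
empty.min  <- suppressWarnings(min(all.missing, na.rm = TRUE))   # Inf
empty.max  <- suppressWarnings(max(all.missing, na.rm = TRUE))   # -Inf
```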
Cumulating operations
Suppose that after every game, you want to update the total number of
baskets that Granny made during the season. After the second game, that’s
the total of the first two games; after the third game, it’s the total of the first
three games; and so on. In other words, you want to calculate the cumulative
sum of the baskets Granny scored. You can make this calculation easily by
using the function cumsum() as in the following example:
> cumsum(baskets.of.Granny)
[1] 12 16 21 27 36 39
In a similar way, cumprod() gives you the cumulative product. You also can
get the cumulative minimum and maximum with the related functions
cummin() and cummax(). To find the maximum number of baskets Geraldine
scored up to any given game, you can use the following code:
> cummax(baskets.of.Geraldine)
[1] 5 5 5 5 12 12
These functions don’t have an extra argument to remove missing values.
Missing values are propagated through the vector, as shown in the following
example:
> cummin(x)
[1] 3 3 2 NA NA
Working with vectors with NA elements
> d
[1] 3 NA 5 7 NA 10
> index.clean.d <- which(!is.na(d))
> cumsum(d[index.clean.d])
[1] 3 8 15 25
> cumprod(d[index.clean.d])
[1] 3 15 105 1050
> cummin(d[index.clean.d])
[1] 3 3 3 3
> cummax(d[index.clean.d])
[1] 3 5 7 10
Calculating differences
You can calculate the difference in the number of baskets between every
two games Granny played by using the following code:
> diff(baskets.of.Granny)
[1] -8 1 1 3 -6
The vector returned by diff() is always one element shorter
than the original vector you gave as an argument.
The rule about missing values applies here, too. When your vector
contains a missing value, the result from that calculation will be NA.
> diff(d)
[1] NA NA 2 NA NA
Just like the cumulative functions, the diff() function doesn’t have an
argument to eliminate the missing values.
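One possible workaround, sketched below, is to drop the missing values yourself before calling diff(). Note that the differences are then taken between the remaining neighbors, which may no longer be consecutive observations.

```r
d <- c(3, NA, 5, 7, NA, 10)

raw.diff   <- diff(d)          # NA NA 2 NA NA
clean.d    <- d[!is.na(d)]     # 3 5 7 10
clean.diff <- diff(clean.d)    # 2 2 3
```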
Recycling arguments
Each time you combine a vector with multiple values and one with a single value in a
function, R applies the function using that single value for every value in the vector.
More generally, R repeats the shorter vector as often as necessary to carry out the
task you asked it to perform. This is called recycling.
Suppose you split up the number of baskets Granny made into two‐pointers and
three‐pointers:
> Granny.pointers <- c(10, 2, 4, 0, 4, 1, 4, 2, 7, 2, 1, 2)
You arrange the numbers in such a way that for every game, first the number of
two-pointers is given, followed by the number of three-pointers. Now Granny wants to
know how many points she’s actually scored this season. You can calculate that
easily with the help of recycling:
> points <- Granny.pointers * c(2, 3)
> points
[1] 20 6 8 0 8 3 8 6 14 6 2 6
> sum(points)
[1] 87
If the length of the longer vector isn’t exactly a multiple of the length of the shorter
vector, R still recycles the shorter vector, cuts it off partway through, and issues a
warning, so you can get unexpected results.
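The sketch below shows this case with made-up numbers: a vector of length 5 multiplied by a vector of length 2. The handler only captures the warning so that both the result and the warning can be inspected.

```r
warning.text <- NULL

# Length 5 is not a multiple of length 2, so c(2, 3) is recycled as 2 3 2 3 2
result <- withCallingHandlers(
  c(1, 2, 3, 4, 5) * c(2, 3),
  warning = function(w) {
    warning.text <<- conditionMessage(w)   # keep the warning message
    invokeRestart("muffleWarning")         # continue without printing it
  }
)

result         # 2 6 6 12 10
warning.text   # the warning about mismatched lengths
```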
Now Granny wants to know how much she improved from game to game, as a
percentage of the previous game’s total.
> round(diff(baskets.of.Granny) / baskets.of.Granny[1:5] * 100)
2nd 3rd 4th 5th 6th
-67 25 20 50 -67
Getting Started with Reading and Writing
You assign text to variables. You manipulate these variables in many different
ways, including finding text within text and concatenating different pieces of
text into a single vector. You also use R functions to sort text and to find words
in text with some powerful pattern search functions, called regular
expressions. Finally, you work with factors, the R way of representing
categories (or categorical data, as statisticians call it).
Using Character Vectors for Text Data
Text in R is represented by character vectors. In the world of computer
programming, text often is referred to as a string. Here text refers to a single
element of a vector. Each element of a character vector is a bit of text,
also known as a string.
Named vectors are vectors in which each element has a name. This is useful
because you can then refer to the elements by name as well as position.
Assigning a value to a character vector
You assign a value to a character vector by using the assignment operator
(<-), the same way you do for all other variables. You test whether a variable
is of class character, for example, by using the is.character() function as
follows:
> x <- "Hello world!"
> is.character(x)
[1] TRUE
Notice that x is a character vector of length 1. To find out how many characters
are in the text, use nchar():
> length(x)
[1] 1
> nchar(x)
[1] 12
The results tell you that x has length 1 and that the single element in x has 12
characters.
Creating a character vector with more than one element
To create a character vector with more than one element, use the combine
function, c():
> x <- c("Hello", "world!")
> length(x)
[1] 2
> nchar(x)
[1] 5 6
Notice that this time, R tells you that your vector has length 2 and that the first
element has five characters and the second element has six characters.
Extracting a subset of a vector
You use the same indexing rules for character vectors that you use for
numeric vectors (or for vectors of any type). The process of referring to a
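subset of a vector through indexing its elements is also called subsetting. In
other words, subsetting is the process of extracting a subset of a vector.
R has two built-in character vectors, letters and LETTERS, that contain the
lowercase and uppercase letters of the alphabet. Use these built-in vectors
whenever you need to make lists of things.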
> letters[10]
[1] "j"
> LETTERS[24:26]
[1] "X" "Y" "Z"
You can use the tail() function to display the trailing elements of a vector. To
get the last five elements of LETTERS, try:
> tail(LETTERS, 5)
[1] "V" "W" "X" "Y" "Z"
Similarly, you can use the head() function to get the first elements of a
variable. By default, both head() and tail() return six elements.
> head(letters, 10)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
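A brief sketch of the default behavior next to an explicit count. The variable names are made up for the example.

```r
first.six   <- head(letters)      # "a" "b" "c" "d" "e" "f"
last.six    <- tail(LETTERS)      # "U" "V" "W" "X" "Y" "Z"
first.three <- head(letters, 3)   # "a" "b" "c"
```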
Naming the values in your vectors
You can use named vectors in R to associate text values (names) with any
other type of value. Then you can refer to these values by name in addition
to position in the vector. This format has a wide range of applications; for
example, named vectors make it easy to create lookup tables. The built-in
dataset islands is one such named vector:
> str(islands)
Named num [1:48] 11506 5500 16988 2968 16...
 - attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia"...
Because each element in the vector has a value as well as a name, now
you can subset the vector by name. To retrieve the sizes of Asia, Africa, and
Antarctica, use the following:
> islands[c("Asia", "Africa", "Antarctica")]
      Asia     Africa Antarctica
     16988      11506       5500
You use the names() function to retrieve the names of a named vector:
> names(islands)[1:9]
To get the names of the six largest islands, sort the vector in decreasing
order first:
> names(sort(islands, decreasing = TRUE)[1:6])
[1] "Asia" "Africa" "North America"
[4] "South America" "Antarctica" "Europe"
Creating and assigning named vectors
You use the assignment operator (<-) to assign names to vectors in much
the same way that you assign values to character vectors.
Imagine you want to create a named vector with the number of days in
each month. First, create a numeric vector containing the number of days
in each month. Then use the built-in constant month.name for the month
names, as follows:
> month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
> names(month.days) <- month.name
> month.days
  January  February     March     April
       31        28        31        30
      May      June      July    August
       31        30        31        31
September   October  November  December
       30        31        30        31
Now you can use this vector to find the names of the months with 31 days:
> names(month.days[month.days == 31])
[1] "January" "March" "May"
[4] "July" "August" "October"
[7] "December"
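A named vector like this can also serve as a simple lookup table. In the sketch below, trip.months is a hypothetical list of months to look up; indexing by name returns the matching number of days for each entry, including repeats.

```r
month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
names(month.days) <- month.name

# Hypothetical months to look up (repeats are allowed)
trip.months <- c("February", "July", "July")

days.per.month <- month.days[trip.months]    # 28 31 31, named by month
total.days     <- sum(days.per.month)        # 90
feb.days       <- unname(month.days["February"])   # 28, without the name
```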
Splitting text
> pangram <- "The quick brown fox jumps over the lazy dog"
> pangram
[1] "The quick brown fox jumps over the lazy dog"
To split this text at the word boundaries (spaces), you can use strsplit() as
follows:
> strsplit(pangram, " ")
[[1]]
[1] "The" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog"
Notice that strsplit() returns a list, which is why the output starts with [[1]].
Lists allow you to combine all kinds of variables.
In the preceding example, the list has only a single component, a vector of
words. To extract a component from a list, you have to use double square
brackets. Split your pangram into words, and assign the first component to
a new variable called words, using double-square-brackets ([[ ]]) subsetting.
> words <- strsplit(pangram, " ")[[1]]
> words
[1] "The" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog"
To find the unique elements of a vector, including a vector of text, you use
the unique() function.
> unique(tolower(words)) # tolower() converts to lowercase
[1] "the" "quick" "brown" "fox" "jumps" "over" "lazy"
[8] "dog"
> colores <- strsplit(c("Los arboles son verdes", "El Mar es azul",
+                       "El sol es amarillo"), ",")
> oracion.color1 <- colores[[1]]
> oracion.color1
[1] "Los arboles son verdes"
> color1.separadas <- strsplit(oracion.color1, " ")
> color1.separadas
[[1]]
[1] "Los" "arboles" "son" "verdes"
> color1.separadas <- strsplit(oracion.color1, "")
> color1.separadas
[[1]]
[1] "L" "o" "s" " " "a" "r" "b" "o" "l" "e" "s" " " "s" "o" "n" " " "v" "e" "r"
[20] "d" "e" "s"
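When you pass strsplit() a vector with several strings, the list has one component per string. The sketch below reuses the three sentences from above; the variable names are made up for the example.

```r
sentences <- c("Los arboles son verdes", "El Mar es azul", "El sol es amarillo")

# One list component per sentence, each a vector of words
words.by.sentence <- strsplit(sentences, " ")

n.sentences <- length(words.by.sentence)     # 3
n.words     <- lengths(words.by.sentence)    # 4 4 4
second      <- words.by.sentence[[2]]        # "El" "Mar" "es" "azul"

# Flatten the list to one vector and keep each word once
all.words    <- unlist(words.by.sentence)
unique.words <- unique(tolower(all.words))   # 10 distinct words
```

Here length() counts the list components, while lengths() counts the words inside each component.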