R - Chapter 3

Uploaded by

Akshay Hebbar

R Programming

UNIT I- Chapter 3

NON-NUMERIC VALUES

In this chapter, we’ll consider three important non-numeric data types: logicals, characters, and
factors. These data types play an important role in effective use of R.

Logical Values
Logical values (also simply called logicals) are based on a simple premise: a logical-valued
object can only be either TRUE or FALSE. These can be interpreted as yes/no, one/zero,
satisfied/not satisfied, and so on.

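Logical values in R are written fully as TRUE and FALSE, but they are frequently abbreviated
as T or F. The abbreviated versions behave identically provided that T and F have not been
reassigned; unlike TRUE and FALSE, they are ordinary variables rather than reserved words.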
Assigning logical values to an object is the same as assigning numeric values.
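
As a minimal sketch of logical assignment (the object names foo, bar, and baz are
illustrative and not taken from the original examples):

```r
# Logical values are assigned exactly like numeric values.
foo <- TRUE
bar <- F                              # abbreviation; relies on F not having been reassigned
baz <- c(TRUE, FALSE, FALSE, TRUE)    # a logical vector

foo           # TRUE
bar           # FALSE
length(baz)   # 4
```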

V Sem BCA R-Programming- Chapter-03 Non-Numeric Values 1


A Logical Outcome: Relational Operators

Logicals are commonly used to check relationships between values. For example, you might
want to know whether some number a is greater than a predefined threshold b. For this, you
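use the standard relational operators shown in the table, which produce logical values as
results: == (equal to), != (not equal to), > (greater than), < (less than), >= (greater than or
equal to), and <= (less than or equal to).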

Examples
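
The following sketch shows the relational operators applied to single values and,
element-wise, to two vectors of equal length. The vectors foo and bar hold hypothetical
values chosen for illustration.

```r
# Comparisons of single values
1 == 2     # FALSE
1 != 2     # TRUE
1 > 2      # FALSE
1 <= 2     # TRUE

# Element-wise comparison of two vectors (hypothetical values)
foo <- c(3, 2, 1, 4, 1, 2, 1, -1, 0, 3)
bar <- c(4, 1, 2, 1, 1, 0, 0, 3, 0, 4)

eq <- foo == bar   # TRUE only at positions 5 and 9
lt <- foo < bar    # TRUE only at positions 1, 3, 8, and 10
```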

Vector Comparisons

Vector recycling also applies to logicals. Let’s use foo from earlier, along with a shorter
vector, baz.
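
A sketch of recycling in a comparison, assuming hypothetical values for foo and taking baz
to be two of its elements:

```r
foo <- c(3, 2, 1, 4, 1, 2, 1, -1, 0, 3)   # hypothetical values
baz <- foo[c(10, 3)]                      # shorter vector: 3 1

# baz is recycled five times (3 1 3 1 3 1 3 1 3 1) to match the length of foo
res <- foo > baz
res   # FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
```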

Now let’s rewrite the contents of foo and bar as 5x2 column-filled matrices.

The same element-wise behavior applies here; if you compare the matrices, you get a matrix
of the same size filled with logicals.
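
A sketch of the matrix case, again using hypothetical values for foo and bar:

```r
foo <- c(3, 2, 1, 4, 1, 2, 1, -1, 0, 3)   # hypothetical values
bar <- c(4, 1, 2, 1, 1, 0, 0, 3, 0, 4)    # hypothetical values

# matrix() fills column by column by default
foo.mat <- matrix(foo, nrow = 5, ncol = 2)
bar.mat <- matrix(bar, nrow = 5, ncol = 2)

cmp <- foo.mat <= bar.mat   # a 5x2 matrix of logicals
dim(cmp)                    # 5 2
```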

any and all functions

There are two useful functions you can use to quickly inspect a collection of logical values:
any and all. When examining a vector, any returns TRUE if any of the logicals in the vector
are TRUE and returns FALSE otherwise.
The function all returns a TRUE only if all of the logicals are TRUE, and returns FALSE
otherwise. As a quick example, let’s work with two of the logical vectors formed by the
comparisons of foo and bar from the beginning of this section.
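
A sketch of any and all on two logical vectors built from hypothetical foo and bar values
(the names qux and quux are illustrative):

```r
foo <- c(3, 2, 1, 4, 1, 2, 1, -1, 0, 3)   # hypothetical values
bar <- c(4, 1, 2, 1, 1, 0, 0, 3, 0, 4)    # hypothetical values

qux <- foo == bar          # TRUE at two positions only
any(qux)                   # TRUE: at least one entry is TRUE
all(qux)                   # FALSE: not every entry is TRUE

quux <- foo <= (bar + 10)  # TRUE everywhere
any(quux)                  # TRUE
all(quux)                  # TRUE
```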

Multiple Comparisons: Logical Operators

Logicals are especially useful when you want to examine whether multiple conditions are
satisfied. Often you’ll want to perform certain operations only if a number of different
conditions have been met. The previous section looked at relational operators, used to compare
the literal values (that is, numeric or otherwise) of stored R objects.

This section looks at logical operators, which are used to compare two TRUE or FALSE
objects. These operators are based on the statements AND and OR, along with NOT for negation.
The table below summarizes the R syntax and the behavior of logical operators.

Examples
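
A short sketch of the single-value operators && (AND), || (OR), and ! (NOT). The result
names r1 to r5 are illustrative.

```r
r1 <- TRUE && FALSE                          # FALSE: AND needs both sides TRUE
r2 <- TRUE || FALSE                          # TRUE: OR needs at least one side TRUE
r3 <- !TRUE                                  # FALSE: NOT flips the value
r4 <- (3 < 5) && (2 >= 2)                    # TRUE: both relational checks hold
r5 <- (TRUE && FALSE) || (FALSE || TRUE)     # TRUE: the second bracket is TRUE
```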

Elementwise Comparison
There are short (&, |) and long (&&, ||) versions of the AND and OR operators. The short
versions are meant for element-wise comparisons, where you have two logical vectors and you
want multiple logicals as a result. The long versions, which you’ve been using so far, are meant
for comparing two individual values and will return a single logical value.
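
A sketch contrasting the two forms, using two hypothetical logical vectors x and y:

```r
x <- c(TRUE, FALSE, TRUE, FALSE)
y <- c(TRUE, TRUE, FALSE, FALSE)

and.vec <- x & y        # element-wise AND: TRUE FALSE FALSE FALSE
or.vec  <- x | y        # element-wise OR:  TRUE TRUE  TRUE  FALSE

single <- x[1] && y[1]  # long form on two single values: TRUE

# Note: to the best of my knowledge, R 4.3.0 and later raise an error if && or ||
# is given a vector longer than one, so the long forms are kept to single values here.
```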

Logicals Are Numbers

Because of the binary nature of logical values, they’re often represented with TRUE as 1 and
FALSE as 0. In fact, in R, if you perform elementary numeric operations on logical values,
TRUE is treated like 1, and FALSE is treated like 0.
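
A brief sketch of this numeric coercion (the vector flags is illustrative):

```r
TRUE + TRUE      # 2
FALSE - TRUE     # -1
TRUE * 5         # 5

flags <- c(TRUE, FALSE, TRUE, TRUE)
n.true <- sum(flags)      # 3: counts the TRUE entries
prop.true <- mean(flags)  # 0.75: proportion of TRUE entries
```

Summing a logical vector is a common way to count how many elements satisfy a condition.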

Logical Subsetting and Extraction

Logicals can also be used to extract and subset elements in vectors and other objects, in the
same way as you’ve done so far with index vectors. Rather than entering explicit indexes in
the square brackets, you can supply logical flag vectors, where an element is extracted if the
corresponding entry in the flag vector is TRUE. As such, logical flag vectors should be the
same length as the vector that’s being accessed.

Examples
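
A sketch of logical subsetting, assuming a hypothetical numeric vector myvec:

```r
myvec <- c(5, -2.3, 4, 4, 4, 6, 8, 10, 40221, -8)   # hypothetical values

flags <- myvec < 0            # flag vector, same length as myvec
neg <- myvec[flags]           # -2.3 -8.0

# Flags can be built from several conditions with the element-wise operators
mid <- myvec[myvec > 0 & myvec < 10]   # 5 4 4 4 6 8
```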

which() function

The which() function in R is used to determine the indices or positions of elements in a vector
or array that satisfy a certain condition. It is a very useful function for finding the location of
specific values or conditions within your data.

The R function which takes in a logical vector as the argument x and returns the indexes
corresponding to the positions of any and all TRUE entries.
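
A minimal sketch of which, first on an explicit logical vector and then on a condition
applied to a hypothetical numeric vector myvec:

```r
flags <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
pos <- which(x = flags)            # 1 4 5

myvec <- c(5, -2.3, 4, 4, 4, 6, 8, 10, 40221, -8)   # hypothetical values
neg.pos <- which(x = myvec < 0)    # 2 10: positions of the negative entries
```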

To omit the negative entries of myvec, you could execute the following:
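
One possible way to do this, assuming hypothetical values for myvec, is to pass the
negated indexes returned by which:

```r
myvec <- c(5, -2.3, 4, 4, 4, 6, 8, 10, 40221, -8)   # hypothetical values

# Negative indexes delete the positions found by which()
kept <- myvec[-which(x = myvec < 0)]   # 5 4 4 4 6 8 10 40221

# The same result using a logical flag vector directly
kept2 <- myvec[!(myvec < 0)]
```

The flag-vector form is the safer of the two: if no entry is negative, which returns an
empty vector and the first form returns an empty result instead of the full vector.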

Logicals with matrices
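
As a sketch, assume a hypothetical 3x3 matrix A and look for the entries greater than 25:

```r
# Hypothetical matrix, filled column by column
A <- matrix(c(0.3, 4.5, 55.3, 91, 0.1, 105.5, -4.2, 8.2, 27.9), nrow = 3, ncol = 3)

big <- A[A > 25]            # the matching values: 55.3 91.0 105.5 27.9
idx <- which(x = A > 25)    # 3 4 6 9
idx
```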

This returns the four indexes of the elements that satisfied the relational check, but they are
provided as scalar values rather than as row and column positions.

The which function essentially treats the multidimensional object as a single vector (laid out
column after column) and then returns the vector of corresponding indexes. Say the matrix A was
arranged as a vector by stacking the columns first through third, using c(A[,1],A[,2],A[,3]).
Then the indexes returned make more sense.

The which function can also return dimension-specific indexes using the optional argument
arr.ind (array indexes). By default, this argument is set to FALSE, resulting in the
vector-converted indexes. Setting arr.ind to TRUE, on the other hand, treats the object as a
matrix or array rather than a vector, providing you with the row and column positions of the
elements you requested.
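
The two behaviors can be compared in a short sketch, assuming a hypothetical 3x3 matrix A:

```r
# Hypothetical matrix, filled column by column
A <- matrix(c(0.3, 4.5, 55.3, 91, 0.1, 105.5, -4.2, 8.2, 27.9), nrow = 3, ncol = 3)

# Stacking the columns reproduces the vector that which() works on by default
stacked <- c(A[, 1], A[, 2], A[, 3])
vec.idx <- which(x = stacked > 25)             # 3 4 6 9

# arr.ind = TRUE gives one row per match, with columns named "row" and "col"
arr.idx <- which(x = A > 25, arr.ind = TRUE)
arr.idx   # matches at (3,1), (1,2), (3,2), and (3,3)
```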

Characters

Character strings are another common data type, and are used to represent text.

Creating a String
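
A minimal sketch of creating a string (the object name foo and its contents are
illustrative). A string is written between double quotes and is stored as a single
element, however many characters it holds.

```r
foo <- "This is a character string!"
foo

n.elem <- length(x = foo)   # 1: one string is one element
n.char <- nchar(x = foo)    # 27: number of characters, spaces included
class(foo)                  # "character"
```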

Strings can be compared in several ways, the most common comparison being a check for
equality.

Other relational operators work as you might expect. For example, R considers letters that
come later in the alphabet to be greater than letters that come earlier.
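
A sketch of string comparisons using illustrative lowercase strings:

```r
same <- "alpha" == "alpha"                    # TRUE
diff <- "alpha" != "beta"                     # TRUE
hits <- c("alpha", "beta", "gamma") == "beta" # FALSE TRUE FALSE (element-wise)
less <- "alpha" <= "beta"                     # TRUE: a comes before b
```

How uppercase and lowercase letters are ordered relative to each other can depend on the
locale R is running in, so these examples stay with lowercase strings.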

Concatenation

There are two main functions used to concatenate one or more strings: cat and paste. The
difference between the two lies in how their contents are returned. The first function, cat, sends
its output directly to the console screen and doesn’t formally return anything. The paste
function concatenates its contents and then returns the final character string as a usable
R object. This is useful when the result of a string concatenation needs to be passed to another
function or used in some secondary way, as opposed to just being displayed. Consider the
following vector of character strings:

Examples
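
For illustration, assume a vector named qux holding three hypothetical strings:

```r
qux <- c("awesome", "R", "is")   # hypothetical contents
length(x = qux)                  # 3
qux[2]                           # "R"
```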

When calling cat or paste, you pass arguments to the function in the order you want them
combined. The following lines show identical usage yet different types of output from the two
functions:
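
A sketch of the difference, assuming a hypothetical vector qux of three strings:

```r
qux <- c("awesome", "R", "is")   # hypothetical contents

# cat prints to the console and returns nothing usable
cat(qux[2], qux[3], "totally", qux[1], "!")
# console shows: R is totally awesome !

# paste returns the combined string, which can be stored and reused
res <- paste(qux[2], qux[3], "totally", qux[1], "!")
res   # "R is totally awesome !"
```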

These two functions have an optional argument, sep, that’s used as a separator between strings
as they’re concatenated. You pass sep a character string, and it will place this string between
all other strings you’ve provided to paste or cat. For example:
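
A sketch of sep in use, again with a hypothetical vector qux. The default separator is a
single space; setting sep to an empty string removes it.

```r
qux <- c("awesome", "R", "is")   # hypothetical contents

dashed <- paste(qux[2], qux[3], "totally", qux[1], "!", sep = "---")
dashed   # "R---is---totally---awesome---!"

joined <- paste(qux[2], qux[3], "totally", qux[1], "!", sep = "")
joined   # "Ristotallyawesome!"

# With sep = "", any spaces you want must be written into the strings themselves
spaced <- paste(qux[2], " ", qux[3], " totally ", qux[1], "!", sep = "")
spaced   # "R is totally awesome!"
```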

With Numeric values:
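
Numeric values passed to paste or cat are converted to strings automatically, and
calculations are carried out before the conversion. A sketch with two illustrative
objects a and b:

```r
a <- 3
b <- 4.4

msg1 <- paste("The value stored as 'a' is ", a, ".", sep = "")
msg1   # "The value stored as 'a' is 3."

msg2 <- paste("The result of a+b is ", a, "+", b, "=", a + b, ".", sep = "")
msg2   # "The result of a+b is 3+4.4=7.4."
```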

Escape Sequence

An escape sequence lets you enter characters that control the format and spacing of the string,
rather than being interpreted as normal text. Table below describes some of the most common
escape sequences.

Examples
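
A sketch of a few common escape sequences: \n (new line), \t (tab), \\ (a single
backslash), and \" (a double quote inside a string). The strings s1 and s2 are
illustrative. Use cat to see the sequences take effect.

```r
s1 <- "here is a string\nsplit\tacross two lines\n"
cat(s1)
# here is a string
# split   across two lines      (the gap after "split" is a tab)

s2 <- "I really want a backslash: \\\nand a double quote: \"\n"
cat(s2)
# I really want a backslash: \
# and a double quote: "

nchar("a\tb")   # 3: an escape sequence counts as one character
```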

Substrings and Matching

Pattern matching lets you check a given string to identify smaller strings within it. The function
substr takes a string x and extracts the part of the string between two character positions
(inclusive), indicated with numbers passed as start and stop arguments.

Example
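
A minimal sketch of substr on an illustrative string foo:

```r
foo <- "This is a character string!"

# Characters 21 through 27, both ends included
part <- substr(x = foo, start = 21, stop = 27)
part   # "string!"
```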

The function substr can also be used with the assignment operator to directly substitute in a
new set of characters. In this case, the replacement string should contain the same number of
characters as the selected area.
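
A sketch of substitution with substr, replacing four characters with four new ones in an
illustrative string foo:

```r
foo <- "This is a character string!"

# Overwrite characters 1 to 4 in place
substr(x = foo, start = 1, stop = 4) <- "Here"
foo   # "Here is a character string!"
```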

If the replacement string is longer than the number of characters indicated by start and stop,
then replacement still takes place, beginning at start and ending at stop. It cuts off any
characters that overrun the number of characters you’re replacing. If the string is shorter than
the number of characters you’re replacing, then replacement ends when the string is fully
inserted, leaving the original characters up to stop untouched.
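
Both cases can be seen in a short sketch, using two copies of an illustrative string:

```r
long.case <- "This is a character string!"
# Five replacement characters for four positions: the trailing "e" is cut off
substr(x = long.case, start = 1, stop = 4) <- "Those"
long.case    # "Thos is a character string!"

short.case <- "This is a character string!"
# Two replacement characters for four positions: characters 3 and 4 are untouched
substr(x = short.case, start = 1, stop = 4) <- "It"
short.case   # "Itis is a character string!"
```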

Substitution is more flexible using the functions sub and gsub. The sub function searches a
given string x for a smaller string pattern contained within. It then replaces the first instance
with a new string, given as the argument replacement. The gsub function does the same thing,
but it replaces every instance of pattern. Here’s an example:
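
A sketch of the two functions on an illustrative string bar:

```r
bar <- "How much wood could a woodchuck chuck"

first.only <- sub(pattern = "chuck", replacement = "hurl", x = bar)
first.only   # "How much wood could a woodhurl chuck"

every.one <- gsub(pattern = "chuck", replacement = "hurl", x = bar)
every.one    # "How much wood could a woodhurl hurl"
```

Neither function changes bar itself; each returns a new string.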

Factors

Factors in R are data structures used to represent categorical data, storing each observation
as one of a fixed set of levels. Categorical data like this can play an important role in
data science.

Identifying Categories

To see how factors work, let’s start with a simple data set. Suppose you find eight people and
record their first name, sex, and month of birth.

There’s really only one sensible way to represent the name of each person in R—as a vector
of character strings.
R> firstname <- c("Liz","Jolene","Susan","Boris","Rochelle","Tim","Simon","Amy")

You have more flexibility when it comes to recording gender, however. Coding females as 0
and males as 1, a numeric option would be as follows:

R> gender.num <- c(0,0,0,1,0,1,1,0)

Or, as a character vector:

R> gender.char <- c("female","female","female","male","female","male","male","female")

Factors are typically created from a numeric or a character vector. To create a factor vector,
use the function factor, as in this example working with gender.num and gender.char:

R> gender.num.fac <- factor(x=gender.num)
R> gender.num.fac
[1] 0 0 0 1 0 1 1 0
Levels: 0 1
R> gender.char.fac <- factor(x=gender.char)
R> gender.char.fac
[1] female female female male   female male   male   female
Levels: female male

You can extract the levels as a vector of character strings using the levels function.

R> levels(x=gender.num.fac)
[1] "0" "1"
R> levels(x=gender.char.fac)
[1] "female" "male"

You can also relabel a factor using levels. Here’s an example:

R> levels(x=gender.num.fac) <- c("1","2")
R> gender.num.fac
[1] 1 1 1 2 1 2 2 1
Levels: 1 2

Defining and Ordering Levels

The gender factor represents the simplest kind of factor variable: there are only two possible levels with no
ordering, in that one level is not intuitively considered “higher than” or “following” the other. Month of birth
(MOB), by contrast, is a factor whose levels can be logically ordered: there are 12 levels with a natural order.
Let’s store the observed MOB data from earlier as a character vector.
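
The original observations are not reproduced here, so the sketch below assumes eight
hypothetical MOB values and compares the second entry with the third:

```r
# Hypothetical month-of-birth values for the eight people
mob <- c("Apr", "Jan", "Dec", "Sep", "Nov", "Jul", "Jul", "Jun")

mob[2]                      # "Jan"
mob[3]                      # "Dec"
jan.first <- mob[2] < mob[3]
jan.first                   # FALSE
```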

Alphabetically, this result is of course correct: J doesn’t occur before D. But in terms of the
order of the calendar months, which is what we’re interested in, the FALSE result is incorrect.
If you create a factor object from these values, you can deal with both problems (months that
were never observed, and the calendar ordering) by supplying additional arguments to the factor
function. You can define additional levels by supplying a character vector of all possible
values to the levels argument and then instruct R to order the values precisely as they appear
in levels by setting the argument ordered to TRUE.
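
A sketch of this, assuming hypothetical MOB values and a helper vector ms that lists the
months in calendar order:

```r
# Hypothetical month-of-birth values
mob <- c("Apr", "Jan", "Dec", "Sep", "Nov", "Jul", "Jul", "Jun")

# All possible values, in the order the levels should take
ms <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

mob.fac <- factor(x = mob, levels = ms, ordered = TRUE)
mob.fac

jan.first <- mob.fac[2] < mob.fac[3]   # TRUE: Jan now precedes Dec
n.lev <- nlevels(mob.fac)              # 12
```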

Here, the mob.fac vector contains the same individual entries at the same index positions as
the mob vector from earlier. But notice that this variable has 12 levels, even though not all
of them have been observed.

Combining and Cutting
Suppose you now observe three more individuals with MOB values "Oct", "Feb", and
"Feb", which are stored as a factor object, as follows.
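
A sketch of the new observations, assuming the same 12 ordered levels (held here in a
helper vector ms) and an illustrative object name new.values:

```r
ms <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

new.values <- factor(x = c("Oct", "Feb", "Feb"), levels = ms, ordered = TRUE)
new.values
```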

Now you can create a new factor object that includes the three additional values, as follows.
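
One way to combine the two factors, sketched here with hypothetical data, is to convert
both to character strings, join them, and rebuild the factor with the same levels. This is
an illustrative approach rather than necessarily the one used in the original example; it
is chosen because the behavior of c() applied directly to factors has differed between R
versions.

```r
ms <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Hypothetical existing data and new observations
mob.fac <- factor(x = c("Apr", "Jan", "Dec", "Sep", "Nov", "Jul", "Jul", "Jun"),
                  levels = ms, ordered = TRUE)
new.values <- factor(x = c("Oct", "Feb", "Feb"), levels = ms, ordered = TRUE)

# Join the labels, then rebuild an ordered factor with all 12 levels
mob.new <- factor(x = c(as.character(mob.fac), as.character(new.values)),
                  levels = ms, ordered = TRUE)
mob.new
length(mob.new)   # 11
```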

cut() Function

In R, you can use the cut() function to divide a numeric vector into intervals or bins. This is
often useful for creating categorical variables from continuous data. The cut() function allows
you to specify the number of bins or the breakpoints for the bins. Here's how you can use it:

Consider the following numeric vector of length 10:
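
The original values are not reproduced here, so assume the following hypothetical
vector Y:

```r
# Ten hypothetical observations between 0 and 6
Y <- c(0.53, 5.4, 1.5, 3.33, 0.45, 0.01, 2, 4.2, 1.99, 1.01)
length(x = Y)   # 10
```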

Suppose you want to bin the data as follows: Small refers to observations in the interval [0,2),
Medium refers to [2,4), and Large refers to [4,6]. A square bracket refers to inclusion of its
nearest value, and a parenthesis indicates exclusion, so an observation y will fall in the Small
interval if 0 <= y < 2, in Medium if 2 <= y < 4, or in Large if 4 <= y <= 6. For this you’d use
cut and supply your desired break intervals to the breaks argument:
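
A sketch with hypothetical data Y and a breaks vector named br:

```r
Y <- c(0.53, 5.4, 1.5, 3.33, 0.45, 0.01, 2, 4.2, 1.99, 1.01)   # hypothetical values
br <- c(0, 2, 4, 6)

binned <- cut(x = Y, breaks = br)
binned
levels(binned)   # "(0,2]" "(2,4]" "(4,6]"
```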

This gives you a factor, with each observation now assigned an interval. However, notice that
your boundary intervals are back-to-front—you want the boundary levels on the left like [0,2),
rather than the right as they appear by default, (0,2]. You can fix this by setting the logical
argument right to FALSE.
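
A sketch of the effect of right=FALSE, using the same hypothetical data. Note how the
observation equal to 2 moves from the first interval to the second.

```r
Y <- c(0.53, 5.4, 1.5, 3.33, 0.45, 0.01, 2, 4.2, 1.99, 1.01)   # hypothetical values
br <- c(0, 2, 4, 6)

binned <- cut(x = Y, breaks = br, right = FALSE)
binned
levels(binned)   # "[0,2)" "[2,4)" "[4,6)"
```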

But there’s still a problem: the final interval currently excludes 6, and you want this maximum
value to be included in the highest level. You can fix this with another logical argument:
include.lowest.
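
Despite its name, include.lowest closes the highest interval when right is FALSE. A sketch
with the same hypothetical data:

```r
Y <- c(0.53, 5.4, 1.5, 3.33, 0.45, 0.01, 2, 4.2, 1.99, 1.01)   # hypothetical values
br <- c(0, 2, 4, 6)

binned <- cut(x = Y, breaks = br, right = FALSE, include.lowest = TRUE)
levels(binned)   # "[0,2)" "[2,4)" "[4,6]"

# A value of exactly 6 is NA without include.lowest and falls in [4,6] with it
without.arg <- cut(x = 6, breaks = br, right = FALSE)
with.arg <- cut(x = 6, breaks = br, right = FALSE, include.lowest = TRUE)
```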

Finally, you can add labels to the categories, rather than using the interval levels that R
applies by default, by passing a character string vector to the labels argument. The order of
labels must match the order of the levels in the factor object.
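
Putting the pieces together in a sketch, with the same hypothetical data and an
illustrative labels vector lab:

```r
Y <- c(0.53, 5.4, 1.5, 3.33, 0.45, 0.01, 2, 4.2, 1.99, 1.01)   # hypothetical values
br <- c(0, 2, 4, 6)
lab <- c("Small", "Medium", "Large")   # one label per interval, lowest first

binned <- cut(x = Y, breaks = br, right = FALSE, include.lowest = TRUE, labels = lab)
binned
levels(binned)   # "Small" "Medium" "Large"
```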

_________
