Unit 2 in R Updated-HN
1. if
The if-else statement in R enables conditional execution of code. It is an important part of
R’s decision-making capability: it lets us choose which code to run based on the result of
a condition. The if statement contains a condition that evaluates to a single logical value
(TRUE or FALSE).
CODE
a <- 5  # illustrative values so the example runs on its own
b <- 3
if (a > b) {
  print("a is greater than b")
} else {
  print("a is not greater than b")
}
2. ifelse() Function
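The ifelse() function is a vectorized counterpart of the if-else structure: it tests every
element of a logical vector and picks the matching element from one of two alternatives.
The following is the syntax of the ifelse() function in R: ifelse(test, yes, no)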
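As a minimal sketch of how ifelse() works element by element, the example below labels a vector of scores. The variable names and the pass mark of 50 are illustrative assumptions, not part of the original notes.

```r
# Hypothetical exam scores; the pass mark of 50 is an illustrative choice.
scores <- c(72, 45, 90, 38)

# ifelse() checks the condition for every element and returns a vector
# of the same length, taking "pass" where TRUE and "fail" where FALSE.
result <- ifelse(scores >= 50, "pass", "fail")
print(result)  # "pass" "fail" "pass" "fail"
```

Unlike a plain if statement, which expects a single TRUE or FALSE, ifelse() handles the whole vector in one call.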
3. switch
The switch is an easier way to choose between multiple alternatives than multiple
if-else statements. The R switch takes a single input argument and executes a
particular code based on the value of the input. Each possible value of the input is
called a case.
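The sketch below shows one way switch() can replace a chain of if-else statements. The function name describe_day() and the case labels are hypothetical, chosen only for illustration.

```r
# describe_day() is a hypothetical helper, not a base R function.
describe_day <- function(day) {
  switch(day,
         "sat" = ,                    # empty case: falls through to the next one
         "sun" = "weekend",
         "mon" = "start of the week",
         "some other weekday")        # unnamed last argument acts as the default
}

print(describe_day("sat"))  # "weekend"
print(describe_day("mon"))  # "start of the week"
print(describe_day("tue"))  # "some other weekday"
```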
4. for loops
The for loop in R iterates over a sequence to perform repeated tasks. It uses a loop
variable that takes each value of the sequence in turn. The following is the syntax of for
loops in R: for (variable in sequence) { statements }
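A small illustration of the for loop, assuming we want the sum of the squares of 1 to 5 (the task and variable names are made up for this sketch):

```r
total <- 0
for (i in 1:5) {
  # i takes the values 1, 2, 3, 4, 5 in turn
  total <- total + i^2
}
print(total)  # 55
```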
5. while Loops
6. break Statement
The break statement exits a loop early. Imagine a loop searching for a specific
element in a sequence. The loop needs to keep going until either it finds the element
or it reaches the end of the sequence. If it finds the element early, further looping is not
needed. In such a case, the R break statement can “break” us out of the loop early.
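The search scenario described above can be sketched as follows. The vector values and the target are illustrative assumptions.

```r
values <- c(4, 8, 15, 16, 23, 42)  # illustrative data
target <- 15
found_at <- NA                      # stays NA if the target is never found

for (i in seq_along(values)) {
  if (values[i] == target) {
    found_at <- i
    break                           # stop looping as soon as we have a match
  }
}
print(found_at)  # 3
```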
8. repeat loop
The repeat loop in R initiates an infinite loop from the get-go. The only way to get
out of the loop is to use the break statement. The repeat loop is useful when you don’t
know the required number of iterations.
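A minimal sketch of a repeat loop, assuming we want to keep doubling a number until it exceeds 100 (the threshold and names are arbitrary choices for illustration):

```r
n <- 1
steps <- 0
repeat {
  n <- n * 2
  steps <- steps + 1
  if (n > 100) break   # the only exit from a repeat loop
}
print(n)      # 128
print(steps)  # 7
```

Without the break, the loop would never terminate, so every repeat loop needs a reachable exit condition.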
2.2 Function
A function is a set of statements organized together to perform a specific task. R has a
large number of in-built functions and the user can create their own functions.
2.2.1 Function Definition
An R function is created by using the keyword function. The basic syntax of an R
function definition is as follows:
function_name <- function(arg_1, arg_2, ...) {
  # function body
}
Function Components
The different parts of a function are:
Function Name: This is the actual name of the function. It is stored in the R
environment as an object with this name.
Arguments: An argument is a placeholder. When a function is invoked, you
pass a value to the argument. Arguments are optional; that is, a function may
contain no arguments. Arguments can also have default values.
Function Body: The function body contains a collection of statements that
defines what the function does.
Return Value: The return value of a function is the last expression in the
function body to be evaluated.
R has many in-built functions which can be called directly in a program without
defining them first. We can also create and use our own functions, referred to as
user-defined functions.
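The components listed above can be seen in a small user-defined function. The name power_sum() and its arguments are hypothetical, invented for this sketch.

```r
# power_sum() is a hypothetical user-defined function.
# x is a required argument; p has a default value of 2.
power_sum <- function(x, p = 2) {
  sum(x^p)   # the last evaluated expression is the return value
}

print(power_sum(c(1, 2, 3)))         # 14, uses the default p = 2
print(power_sum(c(1, 2, 3), p = 3))  # 36

# sum() is an in-built function and can be called without defining it first.
print(sum(c(1, 2, 3)))               # 6
```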
Dates
R has a special representation for dates and times. Dates are represented by
the Date class and times are represented by the POSIXct or the POSIXlt class. Dates
are stored internally as the number of days since 1970-01-01 while times are stored
internally as the number of seconds since 1970-01-01.
Times
Times are represented by the POSIXct or the POSIXlt class. POSIXct is just a very
large number under the hood. It is a useful class when you want to store times in
something like a data frame. POSIXlt is a list underneath and it stores a bunch of
other useful information like the day of the week, day of the year, month, and day of the
month. This is useful when you need that kind of information.
There are a number of generic functions that work on dates and times to help you
extract pieces of dates and/or times.
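The internal representations described above can be inspected directly. This is a minimal sketch; the specific dates are arbitrary, and the time zone is fixed to UTC so that the results do not depend on the local machine.

```r
# A Date is stored as the number of days since 1970-01-01.
d <- as.Date("1970-01-10")
days_since_epoch <- as.numeric(d)
print(days_since_epoch)   # 9

# A POSIXct time is stored as the number of seconds since 1970-01-01.
t_ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
secs_since_epoch <- as.numeric(t_ct)
print(secs_since_epoch)   # 60

# A POSIXlt time is a list with separate components.
t_lt <- as.POSIXlt("2024-01-01 12:30:00", tz = "UTC")
print(t_lt$mday)  # 1  (day of the month)
print(t_lt$mon)   # 0  (months after January)
print(t_lt$yday)  # 0  (days after 1 January)
print(t_lt$wday)  # 1  (Monday; 0 would be Sunday)
print(t_lt$hour)  # 12

# format() is one of the generic functions for extracting pieces.
print(format(d, "%Y-%m"))  # "1970-01"
```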
2.5 Introduction to Functions
2.5.1 Preview of Some Important R Data Structures
A data structure is a particular way of organizing data in a computer so that it can be
used effectively. The idea is to reduce the space and time complexities of different
tasks. Data structures in R programming are tools for holding multiple values.
R’s base data structures are often organized by their dimensionality (1D, 2D, or nD)
and whether they’re homogeneous (all elements must be of the identical type) or
heterogeneous (the elements are often of various types). This gives rise to the six data
types which are most frequently utilized in data analysis.
The most essential data structures used in R include:
Vectors
Lists
Dataframes
Matrices
Arrays
Factors
2.6 Vectors in R
A vector is a basic data structure in R that contains elements of the same type. The
data type of a vector can be logical, integer, double, character, complex or raw.
The function typeof() reports the data type of a vector.
Another significant property of an R vector is its length. The
function length() returns the number of elements in the vector.
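A short sketch of typeof() and length() on a few vectors. The vector names and contents are illustrative.

```r
v_num <- c(1.5, 2, 3)
v_int <- c(1L, 2L)            # the L suffix creates integers
v_chr <- c("a", "b", "c", "d")
v_lgl <- c(TRUE, FALSE)

print(typeof(v_num))  # "double"
print(typeof(v_int))  # "integer"
print(typeof(v_chr))  # "character"
print(typeof(v_lgl))  # "logical"

print(length(v_chr))  # 4
```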
Adding and Deleting Vector Elements
Vectors are stored like arrays in C, contiguously, and thus you cannot insert or delete
elements in place, something you may be used to if you are a Python programmer. The size of
a vector is determined at its creation, so if you wish to add or delete elements, you need
to reassign the vector.
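A minimal sketch of adding and deleting elements by building a new vector and assigning it back to the same name. The values are illustrative.

```r
x <- c(88, 5, 12, 13)

# "Insert" 168 before the last element: build a new vector and reassign it.
x <- c(x[1:3], 168, x[4])
print(x)          # 88 5 12 168 13

# "Delete" the second element: keep everything except position 2.
x <- x[-2]
print(x)          # 88 12 168 13
print(length(x))  # 4
```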
Obtaining the Length of a Vector
We can obtain the length of a vector by using the length() function.
Character strings are another common data type, used to represent text. In R,
character strings (or simply "strings") are indicated by double quotation marks. To
create a string, just enter text between a pair of these quotes.
Most characters can be used in a string, with a couple of exceptions, one being
the backslash character, "\". This character is called the escape character and is used
to insert characters that would otherwise be difficult to add. For example, without an
escape character, adding a double quote inside a string would pose a problem, as R
would assume that you meant the string to end upon seeing the double quote. With an
escape character, however, adding a double quote inside your string is easy: you simply
precede the double quote with the backslash. The table below shows some of the other
characters that can be "escaped" in this way.
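The following sketch shows a few common escape sequences in R strings. The strings themselves are made-up examples.

```r
# \" puts a double quote inside a double-quoted string.
quote_text <- "She said \"hi\""
cat(quote_text, "\n")      # She said "hi"
print(nchar(quote_text))   # 13: the backslashes are not part of the string

# \n is a newline and \t is a tab.
cat("first\nsecond\n")
cat("name\tvalue\n")

# \\ is a single literal backslash.
path <- "C:\\temp"
cat(path, "\n")            # C:\temp
print(nchar(path))         # 7
```

Note that print() shows the escaped form of a string, while cat() shows the characters as they will actually appear.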
2.8 Matrices in R
Matrices are two-dimensional structures that hold homogeneous data in a tabular format of
rows and columns. We can perform arithmetic operations on some elements of the matrix or on
the whole matrix itself in R. Matrices are special cases of a more general R type of object:
arrays. Arrays can be multidimensional. For example, a three-dimensional array would consist
of rows, columns, and layers, not just rows and columns as in the matrix case.
Matrices can represent the binding of two or more vectors of equal length. Analogous
operations can be used to change the size of a matrix. For instance, the rbind() (row
bind) and cbind() (column bind) functions let you add rows or columns to a matrix.
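A minimal sketch of building a matrix from vectors of equal length and then growing it with rbind() and cbind(). The numbers are illustrative.

```r
# Bind two vectors of equal length as rows: a 2 x 3 matrix.
m <- rbind(c(1, 2, 3), c(4, 5, 6))

# Add a row, then a column. Each call creates a new matrix.
m <- rbind(m, c(7, 8, 9))        # now 3 x 3
m <- cbind(m, c(10, 11, 12))     # now 3 x 4
print(dim(m))                    # 3 4
print(m)

# Arithmetic on the whole matrix, or on part of it.
doubled <- m * 2
first_row_plus_one <- m[1, ] + 1
print(first_row_plus_one)        # 2 3 4 11
```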
Applying Functions to Matrix Rows and Columns
One of the most famous and most used features of R is the *apply() family of
functions, such as apply(), tapply(), and lapply(). Here, we’ll look at apply(), which
instructs R to call a user-specified function on each of the rows or each of the columns
of a matrix.
Using the apply() Function
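This is the general form of apply() for matrices: apply(X, MARGIN, FUN, ...), where X is
the matrix, MARGIN is 1 to work on rows or 2 to work on columns, FUN is the function to
apply, and ... holds any optional extra arguments passed on to FUN.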
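A short sketch of apply() on the rows and columns of a small matrix. The matrix and the anonymous range function are illustrative choices.

```r
m <- matrix(1:6, nrow = 2)   # filled column by column: rows are 1 3 5 and 2 4 6

row_sums <- apply(m, 1, sum)   # 1 = apply the function to each row
print(row_sums)                # 9 12

col_max <- apply(m, 2, max)    # 2 = apply the function to each column
print(col_max)                 # 2 4 6

# A user-specified function: the spread (max minus min) of each row.
row_spread <- apply(m, 1, function(r) max(r) - min(r))
print(row_spread)              # 4 4
```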
2.9 Lists in R
A list is an R data type that stores a collection of objects of differing lengths and types;
it is created with the list() function. In contrast to a vector, in which all elements must be
of the same mode, R’s list structure can combine objects of different types. The list plays a
central role in R, forming the basis for data frames, object-oriented programming, and so on.
Creating Lists
Technically, a list is a vector. Ordinary vectors (those of the type we’ve been
using so far) are termed atomic vectors, since their components cannot be
broken down into smaller components. In contrast, lists are referred to as recursive
vectors.
Let’s consider an employee database. For each employee, we wish to store the name,
salary, and a Boolean indicating union membership. Since we have three different
modes here—character, numeric, and logical—it’s a perfect place for using lists. Our
entire database might then be a list of lists, or some other kind of list such as a data
frame, though we won’t pursue that here.
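The employee record described above can be sketched as a list with three components. The employee name and salary are invented values.

```r
# One employee: a character, a numeric, and a logical component.
j <- list(name = "Joe", salary = 55000, union = TRUE)

print(j$name)         # "Joe"
print(j[["salary"]])  # 55000
print(j$union)        # TRUE

# Each component keeps its own type.
component_types <- sapply(j, typeof)
print(component_types)  # name: "character", salary: "double", union: "logical"
```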
2.10 Data Frames
A data frame represents a data set as a collection of vectors, one per column. The number
and order of observations must be the same in every vector of the data frame.
The first, second and third entries in each vector, for example, must represent the
observations collected from the first, second and third sampling units respectively.
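A minimal sketch of a data frame in which each row is one sampling unit. The column names and values are hypothetical.

```r
# Three vectors of equal length; position i in each refers to the same unit.
survey <- data.frame(
  unit    = c("A", "B", "C"),
  height  = c(1.2, 1.5, 1.1),
  treated = c(TRUE, FALSE, TRUE)
)

print(nrow(survey))      # 3
print(ncol(survey))      # 3
print(survey[2, ])       # all observations for the second sampling unit
print(survey$height[2])  # 1.5
```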
Programming in R
R has a large collection of built-in functions, libraries, and add-on tools, and it
continues to grow at an incredible rate. Yet programs sometimes need to perform a task for
which no function exists. Since R is itself a programming language, its functionality can be
extended to accommodate such procedures; how easy this is depends on the complexity of the
procedure and the level of R proficiency of the user.
2.11 Classes Vectors: Generating sequences
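The sequence() function in R is used to create a vector of sequenced elements. For a single
number x it returns the integers 1, 2, ..., x; for a vector of numbers it concatenates one
such sequence per element. It is similar to the seq() function, which additionally lets you
specify the length of the vector and the difference between consecutive elements.
Syntax: sequence(x)
Parameters:
x: Maximum element of the vector (or a vector of such maxima)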
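A short sketch comparing sequence() with seq(). The particular arguments are illustrative.

```r
s1 <- sequence(5)           # 1 2 3 4 5
s2 <- sequence(c(3, 2))     # 1 2 3 1 2: one run per element of the argument

s3 <- seq(1, 10, by = 2)    # 1 3 5 7 9: fixed difference between elements
s4 <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00: fixed length

print(s1)
print(s2)
print(s3)
print(s4)
```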
Logical index
Another really useful way to extract data from a vector is to use a logical expression
as an index. For example, to extract all elements with a value greater than 4 in the
vector my_vec, we write my_vec[my_vec > 4].
Here, the logical expression is my_vec > 4 and R will only extract those elements
that satisfy this logical condition. So how does this actually work? If we look at the
output of just the logical expression without the square brackets, we can see that R
returns a vector containing either TRUE or FALSE, which corresponds to whether the
logical condition is satisfied for each element. In this case only the 4th and
8th elements return TRUE, as their values are greater than 4.
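The mechanism can be sketched as follows. The contents of my_vec are an assumption: the notes do not show them, so the values below were chosen only so that the 4th and 8th elements are the ones greater than 4.

```r
# Assumed contents of my_vec (not given in the notes).
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

# The logical expression on its own gives one TRUE/FALSE per element.
mask <- my_vec > 4
print(mask)         # FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE

# Used as an index, it keeps only the elements where the mask is TRUE.
big <- my_vec[mask]
print(big)          # 6 7

# which() reports the positions that satisfied the condition.
print(which(mask))  # 4 8
```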
The basic arithmetic operations can all be performed on multi-dimensional arrays, and
act on the arrays element-by-element. Consequently, the runtime scales linearly with the
number of elements in the multi-dimensional array, because the arithmetic operation is
performed on each individual index. For example, the runtime for adding a pair
of M×N matrices scales as O(MN).
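A minimal sketch of element-by-element arithmetic on two small matrices in R. The matrices are illustrative.

```r
A <- matrix(1:4, nrow = 2)               # rows: 1 3 and 2 4
B <- matrix(c(10, 20, 30, 40), nrow = 2) # rows: 10 30 and 20 40

# Each operation pairs up elements at the same position.
S <- A + B
P <- A * B     # element-wise product, not the matrix product
print(S)       # rows: 11 33 and 22 44
print(P)       # rows: 10 90 and 40 160
```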
In R, array multiplication in the linear-algebra sense is performed with the %*% operator,
which takes two inputs x and y and returns their matrix product. It constructs each entry of
the product by summing over the columns of x and the rows of y (a plain vector is treated as a
row or column vector, whichever makes the shapes agree). This may sound like a complicated
rule, but you should be able to convince yourself that it corresponds to the appropriate type
of multiplication operation for the most common cases encountered in linear algebra: the
inner product of two vectors, a matrix times a vector, and a matrix times a matrix.
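In R, the matrix product is written with the %*% operator rather than a function named dot. The sketch below covers the three common cases: matrix times matrix, matrix times vector, and vector times vector. The matrices and vector are illustrative.

```r
A <- matrix(1:4, nrow = 2)              # rows: 1 3 and 2 4
B <- matrix(c(0, 1, 1, 0), nrow = 2)    # swaps the two columns of A
v <- c(1, 1)

AB <- A %*% B    # matrix times matrix
print(AB)        # rows: 3 1 and 4 2

Av <- A %*% v    # matrix times vector, returned as a 2 x 1 matrix
print(Av)        # 4 and 6

vv <- v %*% v    # inner product, returned as a 1 x 1 matrix
print(vv)        # 2
```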
2.21 VECTOR INDEXING:
Vector elements are accessed using indexing vectors, which can be numeric,
character or logical vectors. You can access an individual element of a vector by its
position (or "index"), indicated using square brackets. In R, the first element has an
index of 1.
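The three kinds of indexing vector can be sketched on a small named vector. The names and values are illustrative.

```r
x <- c(a = 10, b = 20, c = 30)

first  <- x[1]                      # numeric index: positions start at 1
by_name <- x["b"]                   # character index: select by element name
by_mask <- x[c(TRUE, FALSE, TRUE)]  # logical index: keep where TRUE

print(first)    # a: 10
print(by_name)  # b: 20
print(by_mask)  # a: 10, c: 30
```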
Vectors are the most basic data structure in R. Even a single value is
stored as a vector of length one. Vectors are similar to arrays as defined in other
languages. Vectors contain a sequence of homogeneous data. If mixed values
are given, R automatically converts them to a common type, following the precedence
from logical to integer to double to character. There are various
operations that can be performed on vectors in R.
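A brief sketch of the automatic conversion that happens when values of different types are combined. The values are illustrative.

```r
print(typeof(c(TRUE, 2L)))   # "integer": logical is promoted to integer
print(typeof(c(1L, 2.5)))    # "double": integer is promoted to double
print(typeof(c(1, "a")))     # "character": everything becomes text

mixed <- c(TRUE, 2.5, "x")
print(mixed)                 # "TRUE" "2.5" "x"

# Even a single value is a vector of length one.
print(length(42))            # 1
```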
1. Combining Vector in R
The c() function is used to combine vectors. In order to combine two vectors in R, we
will create two new vectors ‘n’ and ‘s’. Then, we will create another vector that
combines these two using c(n,s). Because n is numeric and s is character, the combined
vector is a character vector:
For example:
> #Author DataFlair
> n = c(1, 2, 3, 4)
> s = c("Hadoop", "Spark", "HIVE", "Flink")
> c(n,s)
2. Arithmetic Operations on Vectors in R
Arithmetic operations on vectors can be performed member-by-member.
For example:
Suppose we have two vectors a and b (the values here are illustrative):
> a = c(1, 3, 5, 7)
> b = c(2, 4, 6, 8)
> a - b #Subtraction
For division:
> a / b #Division
For remainder operation:
> a %% b #Remainder
For example:
> #Author DataFlair
> S = c("bb", "cc")
> L = c(TRUE, TRUE) #Defining our Logical Vector
> S[L] #This will return elements of vector S that correspond to TRUE in logical vector L
3. Numeric Index
To index by a numeric value in R, we specify the index between square brackets [ ]. If
our index is negative, then R returns all the values except the one at the position that we
have specified. For example, specifying [-2] prompts R to take the absolute value of -2
and drop the element that occupies that position, returning all the others.
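A minimal sketch of positive and negative numeric indexes, reusing the character vector s from the combining example earlier in this section:

```r
s <- c("Hadoop", "Spark", "HIVE", "Flink")

second <- s[2]               # positive index: the element at position 2
print(second)                # "Spark"

without_second <- s[-2]      # negative index: everything except position 2
print(without_second)        # "Hadoop" "HIVE" "Flink"

middle <- s[-c(1, 4)]        # drop several positions at once
print(middle)                # "Spark" "HIVE"
```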
5. Duplicate Index
The index vector allows duplicate values. Hence, the following retrieves a member
twice in one operation.
For example:
> # Author DataFlair
> s[c(2,3,3)]
6. Range Indexes
To produce a vector slice between two indexes, we can use the colon operator ":". It is
convenient for situations involving large vectors.
For example:
> # Author DataFlair
> s[1:3]