Unit 2 in R Updated-HN

The document provides an overview of control structures and vectors in R programming, detailing various control statements such as if-else, loops, and functions. It explains the importance of scoping rules, the representation of dates and times, and the fundamental data structures in R, including vectors, lists, matrices, and data frames. Additionally, it covers how to manipulate these data structures and the concept of lazy evaluation in functions.

Uploaded by

sathyav
Copyright © All Rights Reserved

UNIT 2 CONTROL STRUCTURES AND VECTORS

2.1 CONTROL STRUCTURES


Control statements are expressions used to control the execution and flow of the
program based on the conditions provided in the statements. In R, there are decision-
making structures like if-else that control execution of the program conditionally.
There are also looping structures that loop or repeat code sections based on certain
conditions and state. These structures are used to make a decision after assessing the
variable.
In R programming, there are 8 types of control statements as follows:
• if condition
• if-else condition
• for loop
• nested loops
• while loop
• repeat and break statement
• return statement
• next statement

1. if
The if and if-else statements in R enforce conditional execution of code. They are an
important part of R’s decision-making capability: they let us choose what to run based
on the result of a condition. The if statement contains a condition that evaluates to a
single logical value (TRUE or FALSE).

CODE
a <- 5   # example values so that the snippet runs on its own
b <- 3
if (a > b) {
  print("a is greater than b")
} else {
  print("a is not greater than b")
}

2. ifelse() Function
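The ifelse() function is the vectorized counterpart of the if-else structure: it tests
each element of the condition and returns the matching element of exp_if_true or
exp_if_false. The following is the syntax of the ifelse() function in R: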

ifelse(condition, exp_if_true, exp_if_false)
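
To illustrate how ifelse() works element by element, here is a minimal sketch. The scores vector and the pass mark of 50 are made up for illustration only.

```r
# Hypothetical exam scores, used only for illustration.
scores <- c(35, 72, 50, 90)

# Each element is tested separately; the result has the same length as scores.
result <- ifelse(scores >= 50, "pass", "fail")
print(result)
```

Because the test is applied to every element, no explicit loop is needed.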

3. switch
The switch is an easier way to choose between multiple alternatives than multiple
if-else statements. The R switch takes a single input argument and executes a
particular code based on the value of the input. Each possible value of the input is
called a case.
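
A short sketch of switch() with character input follows. The helper name day_type and the day codes are hypothetical, chosen only to show the cases and the default.

```r
# day_type() is a hypothetical helper used only to demonstrate switch().
day_type <- function(day) {
  switch(day,
         "sat" = ,                 # an empty case falls through to the next one
         "sun" = "weekend",
         "mon" = "start of week",
         "weekday")                # unnamed final argument acts as the default
}

print(day_type("sat"))
print(day_type("mon"))
print(day_type("tue"))
```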

4. for loops
The for loop in R iterates over a sequence to perform a repeated task. A loop
variable takes each value of the sequence in turn. The following is the syntax of for
loops in R:

for (variable in sequence) {
  statements
}
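
As a small example of a for loop, the sketch below sums the integers from 1 to 5. The variable names are arbitrary.

```r
total <- 0
for (i in 1:5) {
  total <- total + i   # i takes the values 1, 2, 3, 4, 5 in turn
}
print(total)
```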

5. while Loops

The while loop in R evaluates a condition. If the condition evaluates to TRUE it
runs the enclosed code block and then tests the condition again, whereas if the
condition evaluates to FALSE it exits the loop. The loop therefore keeps running as
long as the condition is TRUE. If the condition never becomes FALSE the result is an
infinite loop, which is something to avoid.
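
The following sketch shows a while loop whose condition eventually becomes FALSE. The starting value of 100 is an arbitrary choice for illustration.

```r
n <- 100
count <- 0
while (n > 1) {
  n <- n / 2           # the loop ends because n shrinks on every pass
  count <- count + 1
}
print(count)           # number of halvings needed to bring n to 1 or below
```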

6. break Statement
The break statement can break out of a loop. Imagine a loop searching for a specific
element in a sequence. The loop needs to keep going until either it finds the element
or it reaches the end of the sequence. If it finds the element early, further looping is
not needed. In such a case, the R break statement can “break” us out of the loop early.
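
The search described above can be sketched as follows. The values vector and the target are invented for the example.

```r
values <- c(4, 8, 15, 16, 23, 42)   # hypothetical data
target <- 15
found_at <- NA

for (i in seq_along(values)) {
  if (values[i] == target) {
    found_at <- i
    break              # stop looping as soon as the target is found
  }
}
print(found_at)
```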

8. repeat loop
The repeat loop in R initiates an infinite loop from the get-go. The only way to get
out of the loop is to use the break statement. The repeat loop is useful when you don’t
know the required number of iterations.
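
A minimal repeat loop, assuming we want to double a number until it exceeds 1000 (an arbitrary threshold), might look like this:

```r
x <- 1
steps <- 0
repeat {
  x <- x * 2
  steps <- steps + 1
  if (x > 1000) break   # the only exit from a repeat loop
}
print(x)
print(steps)
```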

2.2 Function
A function is a set of statements organized together to perform a specific task. R has a
large number of in-built functions and the user can create their own functions.
2.2.1 Function Definition
An R function is created by using the keyword function. The basic syntax of an R
function definition is as follows −
function_name <- function(arg_1, arg_2, ...) {
Function body
}
Function Components
The different parts of a function are −
• Function Name − This is the actual name of the function. It is stored in the R
environment as an object with this name.
• Arguments − An argument is a placeholder. When a function is invoked, you
pass a value to the argument. Arguments are optional; that is, a function may
contain no arguments. Arguments can also have default values.
• Function Body − The function body contains a collection of statements that
defines what the function does.
• Return Value − The return value of a function is the last expression in the
function body to be evaluated.
R has many built-in functions which can be called directly in a program without
defining them first. We can also create and use our own functions, referred to as
user-defined functions.
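
To make the components concrete, here is a small user-defined function. The name rect_area and its arguments are hypothetical.

```r
# rect_area is a hypothetical user-defined function.
rect_area <- function(width, height = 1) {   # height has a default value
  width * height                             # last expression is the return value
}

print(rect_area(3))      # height falls back to its default
print(rect_area(3, 4))   # both arguments supplied
```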

Lazy Evaluation of Function


Arguments to functions are evaluated lazily, which means they are evaluated only
when needed by the function body.
# Create a function with arguments.
new.function <- function(a, b) {
print(a^2)
print(a)
print(b)
}

# Evaluate the function without supplying one of the arguments.


new.function(6)
When we execute the above code, it produces the following result. The first two
print() calls succeed because b is not evaluated until print(b) is reached −
[1] 36
[1] 6
Error in print(b) : argument "b" is missing, with no default
2.3 Scoping Rules
The scoping rules of a language determine how a value is associated with a free
variable in a function. Free variables are not formal arguments and are not local
variables (assigned inside the function body). R uses lexical scoping, also called
static scoping. An alternative to lexical scoping is dynamic scoping, which is
implemented by some languages. Lexical scoping turns out to be particularly useful for
simplifying statistical computations.
What is an environment?
An environment is a collection of (symbol, value) pairs, i.e. x is a symbol
and 3.14 might be its value. Every environment has a parent environment and it is
possible for an environment to have multiple “children”. The only environment without
a parent is the empty environment.

2.3.1 Lexical Scoping: Why Does It Matter?


Typically, a function is defined in the global environment, so that the values of free
variables are just found in the user’s workspace. This behavior is logical for most
people and is usually the “right thing” to do. However, in R you can have functions
defined inside other functions (languages like C don’t let you do this). Now things get
interesting—in this case the environment in which a function is defined is the body of
another function!
Here is an example of a function that returns another function as its return value.
Remember, in R functions are treated like any other object and so this is perfectly
valid.
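
The example itself is not reproduced in this copy. A minimal sketch of a function that returns another function is given below; the names make_power, square and cube are chosen here for illustration and are not taken from the source.

```r
# make_power() returns a new function; n is a free variable inside pow().
make_power <- function(n) {
  pow <- function(x) {
    x^n          # n is found in the environment where pow was defined
  }
  pow
}

square <- make_power(2)
cube <- make_power(3)
print(square(3))
print(cube(3))

# The defining environment of cube still holds n.
print(get("n", envir = environment(cube)))
```

Each call to make_power() creates a fresh environment, so square and cube each remember their own value of n.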
2.3.2 Lexical vs. Dynamic Scoping
With lexical scoping the value of y in the function g is looked up in the environment
in which the function was defined, in this case the global environment, so the value
of y is 10. With dynamic scoping, the value of y is looked up in the environment from
which the function was called (sometimes referred to as the calling environment). In R
the calling environment is known as the parent frame. In this case, the value of y would
be 2.
When a function is defined in the global environment and is subsequently called from
the global environment, then the defining environment and the calling environment are
the same. This can sometimes give the appearance of dynamic scoping.
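
The function definitions that this passage refers to are missing from this copy. The sketch below is one set of definitions consistent with the description (y is 10 in the global environment and 2 inside the caller); the function bodies and the name f are assumptions.

```r
y <- 10

f <- function(x) {
  y <- 2           # local to f; g does not see it under lexical scoping
  y^2 + g(x)
}

g <- function(x) {
  x * y            # y is a free variable, looked up where g was defined
}

result <- f(3)     # 2^2 + 3 * 10 under lexical scoping
print(result)
```

Under dynamic scoping g would instead see y = 2 from its caller, and the same call would give 2^2 + 3 * 2.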
2.4 Dates And Times

Dates
R has a special representation for dates and times. Dates are represented by
the Date class and times are represented by the POSIXct or the POSIXlt class. Dates
are stored internally as the number of days since 1970-01-01 while times are stored
internally as the number of seconds since 1970-01-01.
Times
Times are represented by the POSIXct or the POSIXlt class. POSIXct is just a single
large number under the hood. It is a useful class when you want to store times in
something like a data frame. POSIXlt is a list underneath and it stores a bunch of
other useful information like the day of the week, day of the year, month, and day of
the month. This is useful when you need that kind of information.
There are a number of generic functions that work on dates and times to help you
extract pieces of dates and/or times.
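
The internal representations described above can be inspected directly. The dates below are arbitrary, and the time zone is fixed to UTC so that the output does not depend on the local machine.

```r
d <- as.Date("1970-01-10")
print(unclass(d))                 # days since 1970-01-01

t_ct <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
print(as.numeric(t_ct))           # seconds since 1970-01-01 00:00:00 UTC

t_lt <- as.POSIXlt("2024-01-15 10:30:00", tz = "UTC")
print(t_lt$wday)                  # day of the week, 0 = Sunday
print(t_lt$yday)                  # day of the year, counted from 0
print(t_lt$mday)                  # day of the month

print(format(d, "%Y"))            # extract the year as text
```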
2.5 Introduction to Functions
2.5.1 Preview Of Some Important R Data Structures
A data structure is a particular way of organizing data in a computer so that it can be
used effectively. The idea is to reduce the space and time complexities of different
tasks. Data structures in R programming are tools for holding multiple values.
R’s base data structures are often organized by their dimensionality (1D, 2D, or nD)
and whether they’re homogeneous (all elements must be of the identical type) or
heterogeneous (the elements are often of various types). This gives rise to the six data
structures which are most frequently utilized in data analysis.
The most essential data structures used in R include:
• Vectors
• Lists
• Dataframes
• Matrices
• Arrays
• Factors
2.6 Vectors in R
In R, a vector is a basic data structure that contains elements of the same type. The
type can be logical, integer, double, character, complex or raw.
The typeof() function reports the data type of a vector.
Another significant property of a vector is its length. The
function length() returns the number of elements in the vector.
Adding and Deleting Vector Elements
Vectors are stored like arrays in C, contiguously, and thus you cannot insert or delete
elements in place, something you may be used to if you are a Python programmer. The
size of a vector is determined at its creation, so if you wish to add or delete elements,
you need to reassign the vector.
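
Adding and deleting by reassignment can be sketched as follows. The values are arbitrary; each step builds a new vector and stores it under the old name.

```r
x <- c(88, 5, 12, 13)

# "Insert" 168 before the last element by building a new vector.
x <- c(x[1:3], 168, x[4])
print(x)

# "Delete" the second element by keeping everything except position 2.
x <- x[-2]
print(x)
```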
Obtaining the Length of a Vector
We can obtain the length of a vector by using the length() function; for example,
length(c(1, 2, 4)) returns 3.

2.7 Characters Strings in R

Character strings are another common data type, used to represent text. In R,
character strings (or simply "strings") are indicated by double quotation marks. To
create a string, just enter text between a pair of these quotes.

Most characters can be used in a string, with a couple of exceptions, one being
the backslash character, "\". This character is called the escape character and is used
to insert characters that would otherwise be difficult to add. For example, without an
escape character, adding a double quote inside a string would pose a problem, as R
would assume that you meant the string to end upon seeing the double quote. With an
escape character, however, adding a double quote inside your string is easy: you simply
precede the double quote with a backslash. Other characters, such as the newline, the
tab and the backslash itself, can be "escaped" in the same way.
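
A few escaped characters in action are shown below; the strings are invented for the example. Note that cat() prints the string as it is stored, while nchar() counts each escape sequence as a single character.

```r
quote_str <- "She said \"hi\""    # escaped double quotes
cat(quote_str, "\n")
print(nchar(quote_str))

two_lines <- "first\nsecond"      # \n is a newline
cat(two_lines, "\n")

path <- "C:\\temp"                # \\ is a single backslash
cat(path, "\n")
print(nchar(path))
```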
2.8 Matrices in R
Matrices are two-dimensional structures that hold homogeneous data in a tabular
format (unlike data frames, whose columns may differ in type).
We can perform arithmetic operations on some elements of the matrix or the whole
matrix itself in R. Matrices are special cases of a more general R type of object: arrays.
Arrays can be multidimensional. For example, a three-dimensional array would consist
of rows, columns, and layers, not just rows and columns as in the matrix case.
Matrices can represent the binding of two or more vectors of equal length. Analogous
operations can be used to change the size of a matrix. For instance, the rbind() (row
bind) and cbind() (column bind) functions let you add rows or columns to a matrix.
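
The sketch below builds a matrix from vectors and then grows it with rbind() and cbind(). All values are arbitrary.

```r
m <- cbind(c(1, 2, 3), c(4, 5, 6))   # bind two vectors as columns: 3 x 2
m <- rbind(m, c(7, 8))               # add a row: 4 x 2
m <- cbind(m, c(0, 0, 0, 0))         # add a column: 4 x 3
print(dim(m))

doubled <- m * 2                     # arithmetic on the whole matrix
print(doubled)
```

As with vectors, rbind() and cbind() create a new matrix; the result has to be assigned back to keep it.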
Applying Functions to Matrix Rows and Columns
One of the most famous and most used features of R is the *apply() family of
functions, such as apply(), tapply(), and lapply(). Here, we’ll look at apply(), which
instructs R to call a user-specified function on each of the rows or each of the columns
of a matrix.
Using the apply() Function
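This is the general form of apply for matrices: apply(X, MARGIN, FUN, ...), where X
is the matrix, MARGIN is 1 to work on rows or 2 to work on columns, FUN is the
function to apply, and ... holds optional extra arguments passed on to FUN.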
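
For example, assuming a small 3-by-2 matrix z (made up for illustration), apply() can compute column means and row sums:

```r
z <- matrix(1:6, nrow = 3)         # columns are (1,2,3) and (4,5,6)

col_means <- apply(z, 2, mean)     # MARGIN = 2: one result per column
row_sums  <- apply(z, 1, sum)      # MARGIN = 1: one result per row

print(col_means)
print(row_sums)
```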

2.9 Lists in R
Lists are R data structures that store collections of objects of differing lengths and
types; they are created with the list() function. In contrast to a vector, in which all
elements must be of the same mode, R’s list structure can combine objects of different
types. The list plays a central role in R, forming the basis for data frames,
object-oriented programming, and so on.
Creating Lists
Technically, a list is a vector. Ordinary vectors, those of the type we’ve been
using so far, are termed atomic vectors, since their components cannot be
broken down into smaller components. In contrast, lists are referred to as recursive
vectors.
Let’s consider an employee database. For each employee, we wish to store the name,
salary, and a Boolean indicating union membership. Since we have three different
modes here—character, numeric, and logical—it’s a perfect place for using lists. Our
entire database might then be a list of lists, or some other kind of list such as a data
frame, though we won’t pursue that here.
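
One employee record, and a tiny database of such records, can be sketched like this. The names and salaries are invented.

```r
# One employee: three components of three different modes.
j <- list(name = "Joe", salary = 55000, union = TRUE)
print(j$salary)        # access a component by name
print(j[["name"]])     # equivalent double-bracket form

# A list of lists as a very small employee database.
employees <- list(
  j,
  list(name = "Ann", salary = 61000, union = FALSE)
)
print(employees[[2]]$name)
```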
2.10 Data Frames
A data frame represents a data set as a collection of equal-length vectors, one per
variable. The sequence and number of observations must be the same for each
vector in the data frame.
The first, second and third entries in each vector, for example, must represent the
observations collected from the first, second and third sampling units respectively.
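
A minimal data frame with three sampling units might be built as follows. The column names and values are hypothetical.

```r
df <- data.frame(
  plot      = c("A", "B", "C"),
  height    = c(12.5, 9.8, 14.1),
  flowering = c(TRUE, FALSE, TRUE)
)

print(nrow(df))      # one row per sampling unit
print(df[2, ])       # all observations for the second unit
print(df$height)     # one variable across all units
```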
Programming in R
There are many built-in functions, libraries and add-on tools available for R, and their
number continues to grow at an incredible rate. Yet programs sometimes need to
perform a task for which no function exists. Since R is itself a programming language,
its functionality can be extended to accommodate new procedures; how easy this is
depends on the complexity of the procedure and the level of R proficiency of the user.
2.11 Classes Vectors: Generating sequences
The sequence() function in R creates a vector of sequenced elements. Given a single
number x it returns the integers from 1 to x; given a vector, it concatenates one such
sequence per element. It is similar to the seq() function, which additionally lets you
specify the start, the end, the length and the difference between elements.
Syntax: sequence(x)
Parameters:
x: Maximum element of vector
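
The calls below compare sequence() with seq(). The arguments are arbitrary examples.

```r
s1 <- sequence(5)                  # 1 2 3 4 5
s2 <- sequence(c(3, 2))            # 1 2 3 followed by 1 2
s3 <- seq(2, 10, by = 2)           # fixed difference between elements
s4 <- seq(0, 1, length.out = 5)    # fixed number of elements

print(s1)
print(s2)
print(s3)
print(s4)
```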

2.13 Extracting elements of a vector using subscripts


To extract (also known as indexing or subscripting) one or more values (more
generally known as elements) from a vector we use the square bracket [ ] notation.
The general approach is to name the object you wish to extract from, then a set of
square brackets with an index of the element you wish to extract contained within the
square brackets. This index can be a position or the result of a logical test.
Positional index
To extract elements based on their position we simply write the position inside
the [ ]. For example, my_vec[3] extracts the 3rd value of my_vec.

Logical index
Another really useful way to extract data from a vector is to use a logical expression
as an index. For example, my_vec[my_vec > 4] extracts all elements with a value
greater than 4 in the vector my_vec.

Here, the logical expression is my_vec > 4 and R will only extract those elements
that satisfy this logical condition. So how does this actually work? If we look at the
output of just the logical expression without the square brackets you can see that R
returns a vector containing either TRUE or FALSE which correspond to whether the
logical condition is satisfied for each element. In this case only the 4th and
8th elements return a TRUE as their value is greater than 4.
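
The contents of my_vec are not shown in this copy. The sketch below assumes a vector whose 4th and 8th elements are the only ones greater than 4, which matches the description above; the actual values are an assumption.

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)   # assumed values

print(my_vec[3])             # positional index: the 3rd value
print(my_vec > 4)            # the logical vector produced by the test
print(my_vec[my_vec > 4])    # logical index: only elements where the test is TRUE
print(which(my_vec > 4))     # positions of the TRUE values
```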

2.14 Working with logical subscripts


When you subscript with a logical vector, you are selecting the elements that
correspond to TRUE.
That is, the logical vector doing the subscripting is the same length as the original
vector, and it is the result of some comparison operation. A logical subscript is
similar to a negative number subscript. They both leave the elements of the result in
the same order as the original with some of the elements not there.

2.15 Scalars - Vectors - Arrays - and Matrices


Four common object types that store data are:
1. Scalars: store a single numeric value.
2. Strings: store a set of one or more characters.
3. Vectors: store several scalar or string elements.
4. Data Frames: store several vectors (meaning that they contain several rows and
columns).
Scalars
A scalar data structure is the most basic data type that holds only a single atomic
value at a time. Using scalars, more complex data types can be constructed.
Vectors
A vector object is just a combination of several scalars stored as a single object. For
example, the numbers from one to ten could be a vector of length 10, and the
characters in the English alphabet could be a vector of length 26. Like scalars,
vectors can be either numeric or character (but not both!).
Matrices
Matrices are special cases of a more general R type of object: arrays. Arrays can be
multidimensional. For example, a three-dimensional array would consist of rows,
columns, and layers, not just rows and columns as in the matrix case. When a matrix
is created with matrix() and all of its entries are supplied, we do not need to specify
both ncol and nrow; just nrow or ncol is enough, because R can infer the other from
the number of entries. With four entries and nrow = 2, for instance, the matrix must
have two columns.
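
The example referred to is not included in this copy. A minimal sketch with four entries, where only nrow is given, could be:

```r
y <- matrix(c(1, 2, 3, 4), nrow = 2)   # ncol is inferred as 2
print(y)
print(dim(y))
print(y[1, 2])                          # entries are filled column by column
```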

Extended Example: Generating a Covariance Matrix


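This example demonstrates R’s row() and col() functions, whose arguments are
matrices. For a matrix a, row(a) returns a matrix of the same shape whose entries are
row numbers, so row(a)[2,8] is 2; col(a) does the same for column numbers. Knowing
that the element a[2,8] is in row 2 is hardly news, but these functions become useful
when a condition has to be applied to every position of a matrix at once.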
Let’s consider an example. When writing simulation code for multivariate normal
distributions—for instance, using mvrnorm() from the MASS library—we need to
specify a covariance matrix. The key point for our purposes here is that the matrix is
symmetric; for example, the element in row 1, column 2 is equal to the element in
row 2, column 1.
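
A sketch of how row() and col() can build such a symmetric matrix is given below. The function name makecov and the simplifying assumption that every variance is 1 and every pairwise covariance is the same value rho are illustrative choices.

```r
# makecov() builds an n x n matrix with 1 on the diagonal and rho elsewhere.
makecov <- function(rho, n) {
  m <- matrix(nrow = n, ncol = n)
  # row(m) == col(m) is TRUE exactly on the diagonal.
  m <- ifelse(row(m) == col(m), 1, rho)
  m
}

m <- makecov(0.2, 3)
print(m)
```

A matrix like this could then be passed as the covariance argument when simulating multivariate normal data.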
Arrays
Arrays are the R data objects which can store data in more than two dimensions.
For example − If we create an array of dimension (2, 3, 4) then it creates 4
rectangular matrices each with 2 rows and 3 columns. Arrays can store only one data
type. An array is created using the array() function. It takes vectors as input and uses
the values in the dim parameter to create an array.
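
The (2, 3, 4) array described above can be created and inspected as follows; the values 1 to 24 are arbitrary fillers.

```r
a <- array(1:24, dim = c(2, 3, 4))

print(dim(a))        # 2 rows, 3 columns, 4 layers
print(a[, , 1])      # the first 2 x 3 matrix
print(a[2, 3, 4])    # last row, last column, last layer
```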
2.16 Adding and Deleting Vector Elements
Vectors are one-dimensional arrays that can hold numeric data, character data, or
logical data. The combine function c() or colon : is used to form the vector.

2.17 Obtaining the Length of a Vector:


We can obtain the length of a vector by using the length() function. It returns the
number of elements in a vector.

2.18 Matrices and Arrays as Vectors


Arrays and matrices (and even lists, in a sense) are actually vectors too. They merely
have extra class attributes. For example, matrices have the number of rows and
columns.

The 2-by-2 matrix m with rows (1,2) and (3,4) is stored as a four-element vector,
column-wise, as (1,3,2,4). Adding (10,11,12,13) to it yields (11,14,14,17), but R
remembers that we are working with matrices and thus gives a 2-by-2 result.
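
The computation described here is not shown in this copy; it can be reconstructed as the following sketch, assuming m has rows (1,2) and (3,4) as implied by its column-wise storage.

```r
m <- rbind(c(1, 2), c(3, 4))
print(as.vector(m))      # column-wise storage: 1 3 2 4

result <- m + 10:13      # the vector is added element by element
print(result)            # still a 2-by-2 matrix
```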
Recycling
When applying an operation to two vectors that requires them to be the same length, R
automatically recycles, or repeats, the shorter one, until it is long enough to match the
longer one. For example, c(1, 2) + c(10, 20, 30, 40) is evaluated as
c(1, 2, 1, 2) + c(10, 20, 30, 40).
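
A short sketch of recycling, using lengths 2 and 4 so that the shorter vector is repeated a whole number of times:

```r
short <- c(1, 2)
long  <- c(10, 20, 30, 40)

recycled <- short + long            # short is treated as c(1, 2, 1, 2)
explicit <- c(1, 2, 1, 2) + long    # the same computation written out
print(recycled)
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.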
2.19 Arithmetic Operations

The basic arithmetic operations can all be performed on multi-dimensional arrays, and
act on the arrays element-by-element. As a consequence, the runtime scales linearly
with the number of elements in the multi-dimensional array, because the arithmetic
operation is performed on each individual index. For example, the runtime for adding a
pair of M×N matrices scales as O(MN).
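
Element-by-element arithmetic on two small matrices (values chosen arbitrarily) can be sketched as:

```r
a <- matrix(1:6, nrow = 2)
b <- matrix(6:1, nrow = 2)

print(a + b)    # each entry of a is added to the matching entry of b
print(a * b)    # * is also element-wise, not a matrix product
print(a / b)
```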

2.20 The logical Operation

The most commonly used operation for array multiplication is the matrix product, or
"dot product". Base R has no function named dot; the product is written with the
%*% operator, which takes two inputs x and y. For matrices it constructs the product
by summing over the last index (the columns) of x and the first index (the rows) of y;
a plain vector is treated as a row or column vector as needed, so two vectors of equal
length give their inner product. This may sound like a complicated rule, but you
should be able to convince yourself that it corresponds to the appropriate type of
multiplication operation for the most common cases encountered in linear algebra.
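
In R this kind of product is available through the %*% operator. The sketch below contrasts it with element-wise multiplication; the matrices are arbitrary.

```r
x <- matrix(1:4, nrow = 2)               # rows (1,3) and (2,4)
y <- matrix(c(0, 1, 1, 0), nrow = 2)     # multiplying by y swaps the columns of x

xy <- x %*% y                            # matrix product
print(xy)

u <- c(1, 2, 3)
v <- c(4, 5, 6)
inner <- u %*% v                         # inner product, returned as a 1 x 1 matrix
print(inner)

elementwise <- x * x                     # element-wise, for comparison
print(elementwise)
```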

2.21 VECTOR INDEXING:

Vector elements are accessed using indexing vectors, which can be numeric,
character or logical vectors. You can access an individual element of a vector by its
position (or "index"), indicated using square brackets. In R, the first element has an
index of 1.

You can access multiple elements of a vector by specifying a vector of element
indices inside the square brackets. All the methods that we learned about in the last
section can be used to generate these indexing vectors.
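
The three kinds of indexing vector can be sketched on a small named vector. The names and values are invented for the example.

```r
v <- c(a = 10, b = 20, c = 30, d = 40)

print(v[1])            # numeric index: the first element (indexing starts at 1)
print(v[c(1, 3)])      # several positions at once
print(v[2:4])          # positions generated with the colon operator
print(v["b"])          # character index: select by name
print(v[v > 15])       # logical index
```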
2.22 Common Operations on Vectors

Vectors are the most basic data type in R. Even a single value is stored as a vector
of length one. Vectors resemble arrays in other languages: they contain a sequence
of values of one type. If values of mixed types are given, R automatically converts
them to a common type according to a fixed precedence (for example, numbers become
character strings when combined with strings). There are various operations that can
be performed on vectors in R.
1. Combining Vectors in R
The c() function is used to combine vectors. To combine two vectors in R, we
create two vectors ‘n’ and ‘s’. Then we create another vector that combines these
two using c(n,s) as follows:

For example:
> #Author DataFlair
> n = c(1, 2, 3, 4)
> s = c("Hadoop", "Spark", "HIVE", "Flink")
> c(n,s)
2. Arithmetic Operations on Vectors in R
Arithmetic operations on vectors can be performed member-by-member.

For example:
Suppose we have two vectors a and b:

> #Author DataFlair
> a = c(1, 3)
> b = c(1, 3)
> a + b #Addition
For subtraction:

> a - b #Subtraction
For division:

> a / b #Division
For remainder operation:

> a %% b #Remainder Operation


3. Logical Index Vector in R
By using a logical index vector in R, we can form a new vector from a given vector.
The index vector has the same length as the original vector. Its members are TRUE
where the corresponding members of the original vector are to be included in the
slice, and FALSE otherwise.

For example:
> #Author DataFlair
> S = c("bb", "cc")
> L = c(TRUE, TRUE) #Defining our Logical Vector
> S[L] #This will return elements of vector S that correspond to TRUE in logical vector L
4. Numeric Index

To index a vector by position in R, we specify the index between square brackets [ ].
If our index is negative, then R will return all the values except the one at the index
that we have specified. For example, specifying [-2] prompts R to take the absolute
value of -2 and return every element except the one that occupies that index.
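
A short sketch of positive and negative numeric indexes, using the same kind of character vector as the other examples in this section:

```r
s <- c("aa", "bb", "cc", "dd", "ee")

print(s[2])             # positive index: the element at position 2
print(s[-2])            # negative index: everything except position 2
print(s[-c(1, 5)])      # several positions can be excluded at once
```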
5. Duplicate Index

The index vector allows duplicate values. Hence, the following retrieves a member
twice in one operation.

For example:
> # Author DataFlair

> s = c("aa", "bb", "cc", "dd", "ee")

> s[c(2,3,3)]

6. Range Indexes

To produce a vector slice between two indexes, we can use the colon operator “:“. It is
convenient for situations involving large vectors.
For example:
> # Author DataFlair

> s = c("aa", "bb", "cc", "dd", "ee")

> s[1:3]
