R Notes Based On Text Module 1
Based on the syllabus, the first module, titled "Introduction to R", aims to introduce learners to
the R programming environment and fundamental concepts. This module covers the following
key areas:
● The R Environment: This includes understanding the Command Line Interface and
Batch processing in R. It also introduces R Packages, which are collections of functions
and data sets that extend the capabilities of base R. According to "R in a Nutshell 2nd
edition", R is a software environment for statistical computing and graphics,
encompassing a language, an interpreter, a graphics system, and applications for different
operating systems. Chapter 1 of "R in a Nutshell 2nd edition" discusses getting and
installing R, while Chapter 2 provides an overview of the R user interface. The R console
is an interactive environment where expressions are entered and evaluated.
● Data Types: This module introduces fundamental data types in R. While the syllabus
doesn't explicitly list them, "R in a Nutshell 2nd edition" details various data types such
as numeric vectors, character vectors, logical values (TRUE, FALSE), NA (Not Available
for missing values), Inf and -Inf (infinity), NaN (Not a Number), and NULL
(representing an empty object).
● Vectors: A significant part of the first module focuses on vectors, including vector
operations and factor vectors. In R, a vector is a one-dimensional array that can hold
numeric, character, or logical values. "R in a Nutshell 2nd edition" explains how to create
and manipulate vectors, including performing operations on them. Factor vectors are used
to represent categorical data.
KTU - AIT 362 – Programming in R (2019 Scheme) | Based on prescribed text and syllabus.
Prepared by Mr. Nisanth P, Asst. Professor, Department of AI & ML , VAST | +91 90372 22822
● List: The module also covers lists and their operations. A list in R is a versatile data
structure that can contain elements of different types, such as vectors, other lists, data
frames, and functions.
● Data Frames: Data frames are another key data structure introduced in this module. A
data frame is a tabular data structure with rows and columns, where each column can
have a different data type. They are fundamental for data analysis in R.
● Matrices and arrays: Module 1 also introduces matrices (two-dimensional arrays) and
multi-dimensional arrays in R. "R in a Nutshell 2nd edition" provides examples of
creating and accessing elements in arrays.
● Functions: Finally, the first module introduces functions, including how to use
functions as arguments to other functions and the concept of named arguments. In R, a
function is an object that performs a specific task. You can define your own functions and
use built-in functions. Functions can take other functions as arguments, and you can
specify arguments by their names when calling a function.
R Environment
R Packages
Module 1 of the "AIT 362 Programming in R" course, as outlined in the syllabus, focuses on
introducing the "R Environment". According to "R in a Nutshell 2nd edition", R is a software
environment for statistical computing and graphics. This environment encompasses a
computer language, an interpreter that executes the code, a graphics system, and applications that
run on various operating systems.
The syllabus specifically mentions the following aspects of the R Environment that are covered
in Module 1:
○ The R Console is described as the most crucial tool for interacting with R. It
allows users to type commands (expressions) and see the R system's responses.
The R interpreter reads these expressions and provides a result or an error
message. When R is ready for a command, it displays a prompt, typically ">".
○ "R in a Nutshell 2nd edition" details how to get started with the R console on
different operating systems like Windows, Mac OS X, and Linux. It also mentions
that on Linux, R can start directly on the command line.
○ The text also briefly touches upon Batch Mode as another way to
run R. In batch mode, R executes commands from a script without interactive
input. You can typically invoke batch mode from the command line using a
command like R CMD BATCH my_script.R, which by default writes the output to
my_script.Rout. This specific command is not in the source text, but it is the
standard way to run R in batch mode.
● R Packages:
○ When installing a package for the first time in a session, R might ask you to choose a CRAN
mirror.
○ It's also possible to create custom packages by organizing R code, data, and
documentation in a specific directory structure. The package.skeleton() function
can help create the basic structure for a new package. Building and checking
packages can be done using command-line tools like R CMD check and R CMD
build.
In summary, Module 1 introduces the learner to the R environment as a powerful platform for
statistical analysis and visualization, emphasizing how to interact with R through the command
line interface (R console and batch processing) and how to extend R's capabilities using R
packages.
Variables in R
● Assignment:
○ Values are assigned to variables using the assignment operator <-. For example,
x <- 1 assigns the numeric value 1 to the variable named x.
○ "R in a Nutshell 2nd edition" notes that the assignment operator is usually
pronounced as "gets".
○ You can also use = for assignment in most contexts, although <- is the
conventional choice. Inside a function call, = names an argument rather than
assigning a variable.
○ It's mentioned that in some programming contexts, an alternative assignment
using -> might help write clearer code, where the value is placed on the left and
the variable on the right (e.g., 4 -> y).
● Variable Names as Symbols:
○ Symbols can consist of letters, numbers, and the dot (.) or underscore (_)
characters. A name must start with a letter, or with a dot that is not followed by a number.
○ Case matters in R; x1 and X1 are different variables.
○ Some symbols contain special syntax; to refer to these objects, you enclose
them in backquotes (`). For example, to get help on the assignment operator, you
would use ?`<-`. You could even define a symbol with special characters using
backquotes, like `1+2=3` <- "hello".
○ However, not all words are valid as symbols; some words are reserved in R,
such as if, else, function, for, TRUE, FALSE, NULL, Inf, NaN, and NA.
● Variables as Objects:
○ Like most other languages, R allows you to assign values to variables and refer to
them by name.
○ Crucially, in R, a variable points to an object. Everything in R is an object. This
means that a variable can hold not just simple data values (like numbers or
characters) but also more complex data structures such as vectors, lists, data
frames, and even functions.
○ For example, b <- c(1, 2) creates a vector object c(1, 2) and assigns it to the
variable b.
○ Similarly, f <- function(x,y) {c(x+1, y+1)} defines a function object and assigns it
to the variable f.
● Scope and Environments:
○ The sources briefly mention that there are subtleties to how variables are
evaluated and refer to variable scope and environments, which are discussed in
more detail in Chapter 8 of "R in a Nutshell 2nd edition". Environments define
the context in which variables exist and can be accessed. The global environment
is the default workspace where variables you define directly are stored.
● Variables in Functions and Formulas:
○ When you define a function, the arguments listed within the function() definition
are symbol names that will become variables within the body of the function.
○ In formulas, which are a special type of object in R used to describe relationships
between variables (often in statistical modeling and graphics), variable names are
used to specify the dependent and independent variables. For example, y ~ x1 +
x2 indicates that the variable y is modeled as a function of the variables x1 and
x2.
In summary, R variables are symbolic names that hold references to objects. They are created
through assignment, are case-sensitive, and adhere to certain naming rules. The fact that
variables can hold various types of objects is a key characteristic of R and contributes to its
flexibility in data analysis and programming.
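The points in this section can be tried directly at the console. The following is a minimal sketch; the names x, y, z, x1, X1, f, and fm are illustrative only.

```r
# Assignment with the three operators described in this section
x <- 1          # "x gets 1"
y = 2           # = also assigns at the top level
3 -> z          # value on the left, variable on the right

# Names are case-sensitive: x1 and X1 are different variables
x1 <- "lower"
X1 <- "upper"
identical(x1, X1)            # FALSE

# Backquotes let a name contain special syntax
`1+2=3` <- "hello"
`1+2=3`                      # "hello"

# A variable can refer to any object, including a function
f <- function(x, y) {c(x + 1, y + 1)}
f(1, 2)                      # 2 3

# A formula is also an object; its variables are not evaluated on creation
fm <- y ~ x1 + x2
class(fm)                    # "formula"
```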
Data Types in R
In R, all code manipulates objects; an object is simply a "thing" that the computer represents. Every
object in R has a type that defines how it is stored, and each object is also a member of a class
that defines what information it contains and how it can be used.
Here's an overview of the main aspects of data types in R, according to the sources:
● Primitive Object Types (Basic Vectors): These are vectors that contain a single type of
value. The sources identify the following basic vector types:
○ integer: Naturally produced from sequences and can be coerced using the
as.integer() function. For example, typeof(1:1) returns "integer".
○ numeric (often double-precision floating-point numbers): This is the default type
for numbers in R. You can combine arbitrary sets of numbers into a numeric
vector using the c() function. typeof(1) returns "double" by default.
○ complex: Used for complex numbers.
○ character: Vectors containing text. They are created using quotes, e.g., "Hello
world.". This is called a character vector in R.
○ logical: Represents Boolean values, either TRUE or FALSE.
○ raw: A vector containing raw bytes, useful for encoding objects from outside the
R environment.
● The c() function can combine elements into a vector. If you combine elements of different
types, R will attempt to coerce them to a single type. The ordering of coercion is roughly:
logical < integer < numeric < complex < character < list.
● You can check the type of an object using the typeof() function and its class using the
class() function.
● Compound Objects: These objects are containers for basic vectors and other objects.
○ factors: Used to compactly represent vectors of categorical values. They are
ordered collections of items, and the distinct values are called levels. You can
create a factor using the factor() function.
○ data frames: Tabular data structures with columns that can be of different types
but must have the same length (number of rows). They are implemented as lists
with the class "data.frame". You can create them using data.frame().
○ formulas: A special notation in R to describe relationships between variables,
often used in statistical modeling and graphics. For example, y ~ x1 + x2 is a
formula object.
○ time series: Sequences of measurements of a quantity over time, typically taken
at equally spaced intervals. R has a specific ts class for this.
○ dates and times: R includes classes like Date (for dates only) and
POSIXct/POSIXlt (for dates and times) to represent temporal data.
○ connections: Objects used for receiving data from or sending data to applications
or files outside the R environment, like files or URLs.
Understanding these data types is fundamental to working with R effectively. The str() function
can be used to compactly display the internal structure of an R object, giving information about
its type and structure.
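As a rough illustration of typeof(), class(), coercion in c(), and str(), the sketch below uses made-up values; the variable names mixed1, mixed2, mixed3, and d are assumptions for the example.

```r
# Basic vector types reported by typeof()
typeof(1:3)          # "integer"
typeof(1)            # "double"
typeof(2+3i)         # "complex"
typeof("Hello")      # "character"
typeof(TRUE)         # "logical"
typeof(as.raw(255))  # "raw"

# c() coerces mixed arguments to the most general type present
mixed1 <- c(TRUE, 2L)        # logical + integer   -> integer
mixed2 <- c(1L, 2.5)         # integer + double    -> double
mixed3 <- c(1.5, "a")        # double + character  -> character
typeof(mixed1); typeof(mixed2); typeof(mixed3)

# typeof() describes storage, class() describes behaviour
d <- as.Date("2024-01-15")   # hypothetical date
typeof(d)            # "double"
class(d)             # "Date"

# str() gives a compact summary of an object's structure
str(list(a = 1:3, b = "text"))
```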
Vectors in R
● integer: Vectors containing whole numbers. They are naturally produced from sequences
(e.g., 1:5), can be created with the integer() function, and can be coerced with as.integer().
● numeric (often double-precision floating-point numbers): The default type for numbers
in R. You can create numeric vectors using functions like c(). For example, c(0, 1, 1, 2, 3,
5, 8) creates a numeric vector.
● complex: Vectors for representing complex numbers.
● character: Vectors containing text, also known as character arrays. They are created
using quotes, e.g., "hello".
● logical: Vectors holding Boolean values, TRUE or FALSE.
● raw: Vectors containing raw bytes, useful for low-level operations.
You can create vectors using the c() function, which combines its arguments into a vector. For
example, v <- c(.295, .300, .250, .287, .215) creates a numeric vector. The c() function will
coerce all its arguments to a single type if they are different. The typical order of coercion is
logical < integer < numeric < complex < character < list.
Other ways to create vectors include the sequence operator : (e.g., 1:10 creates a vector of
integers from 1 to 10) and the seq() function, which provides more flexibility in specifying the
sequence (e.g., seq(from=5, to=25, by=5)). You can also query or change the length of a vector using
the length() function.
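The creation methods described here can be sketched as follows. The vector v and the seq() call come from the text; the names ints and fives are illustrative.

```r
v <- c(.295, .300, .250, .287, .215)     # numeric vector from the text
ints <- 1:10                             # integer sequence 1, 2, ..., 10
fives <- seq(from = 5, to = 25, by = 5)  # 5 10 15 20 25

length(v)          # 5

# Assigning to length() truncates or pads the vector
length(v) <- 3     # keeps the first three elements
v                  # 0.295 0.300 0.250
length(v) <- 5     # pads the new positions with NA
v                  # 0.295 0.300 0.250 NA NA
```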
Vector Operations
Factor Vectors
A factor is a special type of vector used to represent categorical data. According to "R in a
Nutshell," a factor is an ordered collection of items, and the distinct values that the factor can
take are called levels. The syllabus also mentions "factor vectors".
Factors are particularly useful for representing data with a fixed set of possible values, such as
eye color (brown, blue, green) or treatment groups (control, treatment A, treatment B). While
you could represent these as character vectors, factors can be more efficient in terms of storage
and are often required by statistical modeling functions.
You can create a factor from another vector (typically a character vector) using the factor()
function. For example:
eye.colors.factor <- factor(c("brown", "blue", "blue", "green", "brown")) # illustrative values
eye.colors.factor
Printing the factor shows its values followed by its levels. By default the levels are the
sorted unique values of the input, here blue, brown, and green. The underlying representation
of the factor is a vector of integers corresponding to the levels, along with a separate
attribute storing the actual levels.
● Levels: The unique categories present in the data. You can access the levels of a factor
using the levels() function.
● Order: Factors can be either unordered (nominal categories) or ordered (ordinal
categories, where there is a meaningful order among the levels). You can specify ordered
factors using the ordered = TRUE argument in the factor() function and defining the
order of levels.
● Efficiency: For categorical data with many repetitions of the same values, factors can be
more memory-efficient than character vectors.
● Statistical Modeling: Many statistical functions in R automatically treat factor variables
appropriately for analysis (e.g., creating dummy variables for regression models).
In a data frame, columns containing categorical data are often stored as factors.
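A short sketch of levels, the underlying integer codes, and ordered factors follows. The data values and the names eye.colors, eye.factor, and sizes are hypothetical.

```r
# Hypothetical eye-colour observations
eye.colors <- c("brown", "blue", "blue", "green", "brown")
eye.factor <- factor(eye.colors)

levels(eye.factor)        # "blue" "brown" "green" (sorted by default)
as.integer(eye.factor)    # 2 1 1 3 2: integer codes that index the levels
table(eye.factor)         # counts per level

# An ordered factor with an explicit level order (hypothetical survey data)
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]       # TRUE, because small comes before large
```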
In summary, vectors are the basic building blocks for storing sequences of data of the same type
in R, supporting element-wise operations and flexible indexing. Factor vectors are a specialized
type of vector designed for representing categorical data efficiently and are crucial for statistical
analysis in R.
List- operations in R
In R, a list is a versatile data structure that holds an ordered collection of objects. Unlike vectors,
which must contain elements of the same basic type, lists can contain a heterogeneous
selection of objects. This means a single list can include numbers, character strings, vectors,
other lists, data frames, functions, and more. According to "R in a Nutshell," lists are a built-in
data type for mixing objects of different types. The syllabus for "Programming in R" also
identifies "List- operations" as a key data type.
Creating Lists
You can create a list using the list() function. You simply provide the objects you want to include
in the list as arguments to the function. You can also assign names to the elements during
creation using the name = object syntax.
my_list <- list(name = "Alice", age = 30, grades = c(85, 92, 78), is_student = TRUE)
my_list
another_list
List Operations
R provides several ways to manipulate and access elements within a list. The "Programming in
R" syllabus explicitly mentions "List- operations", and a question in the model paper asks about
general list operations.
● By Index (Position): Use single square brackets [] or double square brackets [[]] along
with the index of the element.
○ list[n]: Returns a sublist containing the nth element.
○ list[[n]]: Returns the content of the nth element. This is often what you want
when working with individual elements.
● By Name: If the elements in the list have names, you can access them using the $ operator
followed by the name of the element.
my_list$name # Returns "Alice"
You can also use double square brackets with the name as a character string:
list[["name"]].
● Partial Matching by Name: With [[ ]], you can use the exact = FALSE option to allow partial
matching of names.
dairy <- list(milk = "1 gallon", butter = "1 pound")
dairy[["mil", exact = FALSE]] # Returns "1 gallon"
You can modify elements in a list by assigning new values using the same access methods:
my_list[[4]] <- FALSE # Change the fourth element
my_list
Adding: You can add new elements to a list by assigning a value to a new name or index:
my_list$city <- "New York" # Add a new element with name "city"
my_list
Combining Lists
You can combine multiple lists into a single list using the c() function.
combined_list
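As a minimal sketch of combining lists with c(), assume two small hypothetical lists named person and address (these are not from the text):

```r
# Two hypothetical lists
person  <- list(name = "Alice", age = 30)
address <- list(city = "New York", zip = "10001")

combined <- c(person, address)   # one flat list with four elements
length(combined)                 # 4
names(combined)                  # "name" "age" "city" "zip"

# Wrapping in list() nests the inputs instead of concatenating them
nested <- list(person, address)
length(nested)                   # 2
```

The contrast with list() is worth noting: c() flattens one level, while list() keeps each input as a single element.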
lapply(): This function takes a list and a function as arguments and returns a list where each
element is the result of applying the function to the corresponding element of the input list.
numbers <- list(1, 2, 3, 4)
squared_numbers <- lapply(numbers, function(x) x^2) # A list containing 1, 4, 9, 16
squared_numbers
sapply(): This function works similarly to lapply() but attempts to simplify the output to a vector
or matrix if possible.
numbers <- list(1, 2, 3, 4)
squared_numbers_vector <- sapply(numbers, function(x) x^2) # A numeric vector: 1 4 9 16
squared_numbers_vector
It's important to note that data frames in R are implemented as lists with the class attribute set
to "data.frame". This means that many of the operations applicable to lists, especially accessing
elements using $ and [[ ]], also work for data frames.
In summary, lists are a fundamental and flexible data structure in R for storing collections of
diverse objects. Understanding how to create and manipulate lists using these various operations
is crucial for effective data handling and programming in R.
Data Frames in R
In R, a data frame is a fundamental data structure used to represent tabular data. It's designed
to resemble a spreadsheet or a database table. According to "R in a Nutshell," a data frame is a
list that contains multiple named vectors of the same length. The "Programming in R Syllabus"
also lists "Data Frames" as one of the key data types in R. Data frames are particularly
well-suited for representing experimental data where you have individual observations, each with
several different measurements that might have different dimensions or be qualitative rather than
quantitative. They are also commonly used to represent business data kept in database tables.
Structure
Relationship to Lists
Crucially, data frames are implemented as lists with the class attribute set to "data.frame". This
means that many of the ways you access elements in lists also work for data frames. A data
frame is a list of equal-length vectors. Each vector in the list represents a column of the data
frame.
You can create a data frame using the data.frame() function. You provide vectors as arguments to
this function, and these vectors become the columns of the data frame. The vectors should be of
the same length; a shorter vector is recycled only if its length divides the number of rows evenly,
and otherwise R returns an error. You can also assign names to the columns during
creation using the name = vector syntax.
l <- c(70, 73, 77, 90, 102)
nleast <- data.frame(teams, w, l) # teams and w are five-element vectors defined like l
nleast
This example from "R in a Nutshell" demonstrates the creation of a data frame named nleast with
three columns: teams, w, and l.
top.5.salaries
Accessing Elements
Because data frames are lists, you can access their components (columns) using list-like syntax:
By Name using $: You can access a specific column by using the $ operator followed by the
column name.
nleast$teams # Access the 'teams' column
By Name using [[""]]: You can also access a column using double square brackets [[]] and the
column name as a character string.
nleast[["w"]]
top.5.salaries[["position"]]
By Index using []: You can access columns by their numerical index within single square
brackets. This will return a data frame containing the selected column(s).
nleast[1] # Returns a data frame with the first column ('teams')
nleast[2:3] # Returns a data frame with the second and third columns ('w' and 'l')
By Index using [[]]: Using double square brackets with a single numerical index will return the
column as a vector.
nleast[[2]] # Returns the 'w' column as a numeric vector
Accessing Rows and Specific Cells using []: You can access specific rows or individual cells
using single square brackets with row and column indices (e.g., data.frame[row_index,
column_index]). Omitting an index will select all rows or all columns.
nleast[1, ] # Returns the first row
nleast[3, 1] # Returns the element in the third row and first column
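The access forms described in this section can be tried on a small self-contained data frame. The sketch below uses a hypothetical data frame named scores with made-up values, not the baseball data from the text.

```r
# Hypothetical data frame with one character and two numeric columns
scores <- data.frame(
  student = c("A", "B", "C"),
  maths   = c(80, 65, 92),
  physics = c(75, 70, 88),
  stringsAsFactors = FALSE
)

scores$maths          # column as a vector: 80 65 92
scores[["physics"]]   # column by name: 75 70 88
scores[2]             # data frame holding only the second column
scores[[2]]           # second column as a numeric vector
scores[1, ]           # first row, all columns
scores[3, 1]          # row 3, column 1: "C"

# Logical indexing: the physics score of student B
scores$physics[scores$student == "B"]   # 70

is.list(scores)       # TRUE, a data frame is a list of columns
```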
Key Properties
Usage Examples
● The syllabus mentions "List- operations" and "Data Frames" as distinct data types,
implying that while related, they are used differently.
● The baseball data example shows a typical use case for data frames to store structured,
mixed-type data. You can then access specific information, such as the number of losses
for the Florida team using nleast$l[nleast$teams=="FLA"].
● The top.5.salaries example further illustrates the creation of a data frame with different
data types (character and numeric).
Data frames are frequently created by importing data from external files such as text files (using
functions like read.table, read.csv, read.delim) or from databases (using packages like RODBC
or DBI). Conversely, you can export data frames to text files using functions like write.table or
write.csv.
While data frames can hold different data types, matrices in R are restricted to holding elements
of a single basic type (numeric, character, or logical). If you try to combine different data types
in a matrix, R will coerce them to the same type.
● names: This attribute stores the column names. You can access and modify it using the
names() function.
● row.names: This attribute stores the row names.
● class: This attribute indicates that the object belongs to the "data.frame" class.
● dim: The dim() function returns the dimensions of the data frame (number of rows and
columns). Unlike a matrix, a data frame does not store these as an attribute.
● dimnames: The dimnames() function returns a list containing the row names and column names.
● Subsetting: You can extract subsets of rows and columns based on conditions using the
[] operator or the subset() function.
● Merging: You can combine two data frames based on common columns using the
merge() function.
● Binding: You can combine data frames row-wise using rbind() or column-wise using
cbind().
● Transformations: You can create new columns or modify existing ones using various
functions.
● Summarization: Functions like summary(), aggregate(), and tapply() can be used to
calculate summary statistics for data frame columns.
● Sorting: You can sort the rows of a data frame based on the values in one or more
columns by indexing with the result of the order() function.
● Reshaping: Functions like reshape() and packages like reshape2 (using melt() and dcast())
allow you to change the structure of your data frame between "wide" and "long" formats.
● Data Cleaning: R provides tools for finding and removing duplicate rows using unique()
or duplicated(), and for handling missing values (NA).
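Several of the operations listed above can be sketched with base R alone. The data frames sales and managers, and all their values, are hypothetical and chosen only for illustration.

```r
# Hypothetical data
sales <- data.frame(
  region = c("N", "S", "N", "S"),
  units  = c(10, 4, 7, 12),
  stringsAsFactors = FALSE
)
managers <- data.frame(
  region  = c("N", "S"),
  manager = c("Asha", "Ben"),
  stringsAsFactors = FALSE
)

big     <- subset(sales, units > 5)               # rows meeting a condition
sorted  <- sales[order(sales$units), ]            # rows sorted by units
joined  <- merge(sales, managers, by = "region")  # join on the common column
totals  <- aggregate(units ~ region, data = sales, FUN = sum)  # sum per region
stacked <- rbind(sales,
                 data.frame(region = "E", units = 3,
                            stringsAsFactors = FALSE))  # append a row
deduped <- unique(rbind(sales, sales[1, ]))       # duplicated row removed again
```

Note that sorting is done by indexing with order(), which returns the row positions in sorted order.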
Data Editor
R provides a built-in data editor that allows you to view and edit data frames (and matrices) in a
graphical user interface. You can open the data editor using the edit() function. While convenient
for inspection or minor edits, it's generally recommended to use scripting for more serious data
entry or manipulation.
In summary, data frames are a cornerstone of data analysis in R, providing a flexible and
structured way to store and work with tabular data of mixed types. Their list-based nature allows
for powerful and intuitive ways to access and manipulate the data.
Matrices in R
● Mathematical operations can be performed on matrices. The operator %*% is used for
matrix multiplication. The function t() returns the transpose of a matrix. solve() can be
used to solve systems of linear equations involving matrices.
● The crossprod() function can compute matrix cross-products.
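The matrix functions named above can be illustrated with a small example. The matrix A and vector b below are made-up values used only for this sketch.

```r
# matrix() fills column by column, so A has rows (1, 3) and (2, 4)
A <- matrix(c(1, 2, 3, 4), nrow = 2)
b <- c(5, 6)

t(A)               # transpose: rows (1, 2) and (3, 4)
A %*% A            # matrix product (not element-wise multiplication)
x <- solve(A, b)   # solves the linear system A %*% x = b
A %*% x            # reproduces b, up to floating-point rounding
A_inv <- solve(A)  # with one argument, solve() returns the inverse
crossprod(A)       # equivalent to t(A) %*% A
```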
Arrays in R
● Both matrices and arrays are used to store collections of data elements of a single data
type.
● Both are built upon an underlying vector.
● Elements in both are accessed using indices within square brackets.
● A matrix is a special case of an array with exactly two dimensions.
● Arrays can have more than two dimensions, while matrices are strictly
two-dimensional.
Accessing Elements
● For a matrix m, m[i, j] accesses the element at the i-th row and j-th column. m[i,]
accesses the i-th row, and m[,j] accesses the j-th column.
● For an array a with dimensions c(d1, d2, d3), a[i, j, k] accesses the element at the i-th
position of the first dimension, j-th of the second, and k-th of the third. a[i,,] would access
the i-th "slice" across the first dimension, containing all combinations of the other two
dimensions.
These data structures, matrices and arrays, are fundamental for various mathematical and
statistical computations in R.
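The indexing rules for matrices and arrays can be checked with a short sketch. The objects m and a are illustrative and simply filled with consecutive integers.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3 matrix, filled by column
m[2, 3]        # 6
m[1, ]         # first row: 1 3 5
m[, 2]         # second column: 3 4

a <- array(1:24, dim = c(2, 3, 4))     # 2 x 3 x 4 array
a[2, 3, 4]     # 24, the last element
dim(a[1, , ])  # 3 4: a slice across the first dimension
dim(a[, , 1])  # 2 3: the first 2 x 3 layer
```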
Control Statements in R
Control statements in R allow you to manage the flow of execution in your programs, enabling
you to make decisions and repeat code blocks. The "AIT 362 Programming in R Syllabus"
mentions "Branching and looping - For loops, While loops, Controlling loops" as a key topic.
"R in a Nutshell 2nd edition" provides detailed explanations of these control structures.
Branching (Conditional Statements)
Conditional statements in R allow you to execute specific blocks of code based on whether
certain conditions are true or false.
○ if (condition) true_expression
○ if (condition) true_expression else false_expression
The condition is a logical expression. If it evaluates to TRUE, the true_expression is
executed. If it evaluates to FALSE in the second form, the false_expression is executed. It's
crucial to understand that the if statement in R is not a vector operation: the condition
must be a single logical value. In versions of R before 4.2.0, a condition vector with more
than one element was accepted, only its first element was used, and a warning was issued;
since R 4.2.0 such a condition is an error.
● ifelse() function: For element-wise conditional operations on vectors, you should use the
ifelse() function. The syntax is: ifelse(condition, value_if_true, value_if_false) Here,
condition is a logical vector. The function returns a vector of the same length where each
element is taken from value_if_true if the corresponding element in condition is TRUE,
and from value_if_false if it is FALSE. For example, ifelse(c(TRUE, FALSE, TRUE,
FALSE, TRUE), a, b) will create a new vector by picking elements from vector a where
the condition is TRUE and from vector b where it is FALSE.
● switch() function: switch is another conditional control structure in R (mentioned in the
teaching plan as "switch, if else"). It takes an expression as its first argument and evaluates
one of the remaining arguments: the one at that position if the expression is a number, or the
one with the matching name if the expression is a character string.
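The three branching constructs can be compared side by side in a short sketch. The values and the helper function describe() are hypothetical.

```r
# if / else works on a single logical value
x <- 7   # hypothetical value
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}

# ifelse() is vectorised: one result per element of the condition
v <- c(3, 8, 5, 10)
labels <- ifelse(v > 5, "high", "low")   # "low" "high" "low" "high"

# switch() selecting by name; the unnamed last argument acts as the default
describe <- function(kind) {   # hypothetical helper
  switch(kind,
         circle = "round",
         square = "four sides",
         "unknown shape")
}
describe("square")    # "four sides"
describe("hexagon")   # "unknown shape"
```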
Looping
Loops in R provide a way to repeatedly execute a block of code. R offers three main types of
loops:
repeat loop: This loop repeatedly executes an expression indefinitely.
repeat expression
To exit a repeat loop, you must use the break keyword within the loop. The break statement immediately
terminates the loop and transfers control to the statement following the loop. You can also use
the next command within a repeat loop to skip the current iteration and proceed to the next one.
An example of using repeat with break to print multiples of 5 up to 25 is provided:
i <- 5
repeat {if (i > 25) break else {print(i); i <- i + 5}}
If you don't include a break statement, a repeat loop will become an infinite loop.
while loop: This loop executes an expression as long as a specified condition is TRUE. The
syntax is: while (condition) expression. The condition is evaluated before each iteration. If
it is TRUE, the expression (which can be a block of code enclosed in curly braces {}) is
executed. The loop continues until the condition becomes FALSE. As with repeat loops, you can
use break to exit a while loop early and next to skip to the next iteration. The multiples of 5
example can be rewritten using a while loop:
i <- 5
while (i <= 25) {print(i); i <- i + 5}
for loop: This loop iterates over each item in a vector (or a list) and executes an expression
for each item. The syntax is: for (var in sequence) expression. The variable var takes on the
value of each element in the sequence (typically a vector or a list) in turn, and the
expression is executed once for each value. You can again use break and next within for loops.
The multiples of 5 example can also be implemented with a for loop:
for (i in seq(from=5, to=25, by=5)) print(i)
Two properties of loops are worth remembering:
1. Results are not automatically printed inside a loop unless you explicitly use the
print() function.
2. The variable var set by a for loop remains in the calling environment, holding its
last value, after the loop finishes.
The source notes that the looping functions (repeat, while, and for) have the type "special"
(as reported by typeof), because the expression within them is not necessarily evaluated.
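These points can be checked with a short sketch. The variable names n and loop_types are illustrative only.

```r
# The loop variable keeps its last value after the loop ends
for (n in seq(from = 5, to = 25, by = 5)) {
  n            # evaluated, but nothing is printed inside a loop
}
print(n)

# The looping constructs report the type "special"
loop_types <- c(typeof(`repeat`), typeof(`while`), typeof(`for`))
print(loop_types)
```

The backticks are needed to refer to repeat, while, and for as objects rather than start a loop.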
Controlling Loops
The break and next statements are the primary mechanisms for controlling the flow within loops
in R.
● break: When encountered inside a loop (repeat, while, or for), the break statement
immediately terminates the innermost loop and execution continues with the statement
immediately following the loop.
● next: When encountered inside a loop, the next statement stops the execution of the
current iteration and immediately jumps to the beginning of the next iteration of the loop.
For for loops, it moves to the next element in the sequence; for while loops, it
re-evaluates the condition; and for repeat loops, it simply starts the next cycle.
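A minimal sketch of both keywords, assuming nothing beyond base R. The names pairs, kept, i, j, and k are illustrative.

```r
# break leaves only the innermost loop
pairs <- character(0)
for (i in 1:3) {
  for (j in 1:3) {
    if (j > i) break            # ends the inner loop; the outer loop continues
    pairs <- c(pairs, paste(i, j))
  }
}
print(pairs)

# next in a while loop: the condition is re-evaluated after the skip
k <- 0
kept <- integer(0)
while (k < 6) {
  k <- k + 1
  if (k %% 3 == 0) next         # skip multiples of 3
  kept <- c(kept, k)
}
print(kept)
```

In the while example the counter is incremented before next is reached; incrementing it after the skip would leave the condition unchanged and loop forever.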
Looping Extensions
The sources also briefly mention that while R doesn't have built-in iterators or foreach loops like
some other languages, these functionalities are available through add-on packages, such as those
from Revolution Computing (now part of Microsoft) and available on CRAN. Packages like
iterators and foreach provide more advanced and often more elegant ways to handle iterations in
R.
In summary, R provides a set of control statements (if, ifelse, switch for branching and repeat,
while, for for looping) along with control keywords (break, next) to manage the execution flow
of your R programs. Understanding how these statements work, especially the vector nature of
ifelse versus the scalar nature of if, and the use of break and next in loops, is crucial for writing
effective and correct R code.
Functions as Arguments
In R, functions are treated as objects, just like any other data type. This fundamental property
allows you to do some powerful things, including passing functions as arguments to other
functions.
● Many functions in R are designed to take other functions as arguments. For example,
many modeling functions might have an argument that specifies a function for handling
missing values.
The sapply function is a clear example of a function that takes another function as an argument.
sapply iterates through each element of a vector and applies the function provided as an
argument to each element, returning the results.
a <- 1:7
sapply(a, sqrt)
● In this example, sqrt is passed as an argument to sapply, and sapply applies the sqrt
function to each element of the vector a.
● This capability is particularly useful when you have a function that doesn't work
element-wise on a vector. sapply provides a way to extend the functionality of such
functions to work on vectors.
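As an illustration of the last point, the sketch below uses a hypothetical function, sign_label, that handles only one value at a time because if tests a single condition. sapply applies it to each element in turn.

```r
# A hypothetical scalar-only function: if() inspects a single value
sign_label <- function(x) {
  if (x < 0) "negative" else "non-negative"
}

values <- c(-2, 0, 3)
labels <- sapply(values, sign_label)
print(labels)
```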
Anonymous functions are often used when passing functions as arguments. Because functions
are objects, you can create functions without giving them a name, and these anonymous
functions can be directly passed as arguments to other functions.
apply.to.three <- function(f) {f(3)}
apply.to.three(function(x) {x * 7}) # passes an anonymous function; returns 21
You can also define an anonymous function and apply it directly to an argument by enclosing the
function definition in parentheses:
(function(x) {x+1})(1)
● The parentheses are necessary because function calls (f(arguments)) have very high
precedence in R.
● The use of functions as arguments, especially with functions like sapply and anonymous
functions, can be a good alternative to control structures like loops for performing
operations on collections of data. For instance, calculating the square of each element in a
vector can be done using sapply with an anonymous function instead of a for loop.
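The comparison mentioned above might look like this. The names v, squares_loop, and squares_sapply are illustrative.

```r
v <- 1:5

# Loop version: preallocate the result, then fill it element by element
squares_loop <- numeric(length(v))
for (i in seq_along(v)) {
  squares_loop[i] <- v[i]^2
}

# sapply version with an anonymous function
squares_sapply <- sapply(v, function(x) x^2)

print(squares_loop)
print(squares_sapply)
```

For this particular task v^2 is simpler still, since ^ is already vectorized; sapply earns its place when the function applied is not.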
Named Arguments
When defining a function in R, you assign names to its arguments. When calling a function, you
can specify the values for these arguments in a few different ways, including by using their
names.
When calling a function, you can provide arguments by their exact names. This explicitly tells
R which value corresponds to which argument, regardless of the order in which they are listed in
the function call.
Consider a function defined as:
addTheLog <- function(first, second) {first + log(second)}
addTheLog(second=exp(4), first=1) # Arguments matched by name, so order does not matter
● In this case, first is assigned the value 1, and second is assigned the result of exp(4).
If you provide the arguments in the default order defined in the function, you can omit the
names.
Using the same addTheLog function:
addTheLog(1, exp(4)) # Order matches the definition (first, second)
● Here, 1 is assigned to first, and exp(4) is assigned to second because of their position in
the function call.
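A short sketch comparing the calling styles on the same function. The result names by_position, by_name, and mixed are illustrative.

```r
addTheLog <- function(first, second) {first + log(second)}

by_position <- addTheLog(1, exp(4))                 # matched by position
by_name     <- addTheLog(second = exp(4), first = 1) # matched by exact name
mixed       <- addTheLog(second = exp(4), 1)         # unnamed 1 fills 'first'

print(c(by_position, by_name, mixed))
```

All three calls compute 1 + log(exp(4)), so they agree with one another (the value is 5, up to floating-point rounding).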
R also supports partially matching names for arguments. This means you can use an
abbreviated version of the argument name as long as it uniquely identifies the argument.
However, the source describes partial name matching as a deprecated feature because it can
lead to confusion. It still works in R, but exact names are the safer choice.
Using addTheLog again:
addTheLog(s=exp(4), f=1) # 's' partially matches 'second', 'f' partially matches 'first'
● Using exact argument names is generally considered good practice. While it might
require more typing, it makes your code easier to read and removes potential ambiguity,
especially in functions with many arguments or arguments with similar names.
● If you call a function with an ambiguous partial name (a partial name that could match
more than one argument), R raises an error.
For example, if you have a function f <- function(arg1=10, arg2=20) {...} and you call it
with f(arg=1), you will get an error because arg partially matches both arg1 and arg2.
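The ambiguity can be observed safely by catching the error. In this sketch the function body (arg1 + arg2) is an assumption made for illustration, since the original elides it.

```r
# Hypothetical body: the source only shows the argument list
f <- function(arg1 = 10, arg2 = 20) {arg1 + arg2}

# 'arg' matches both arg1 and arg2, so the call fails; capture the message
result <- tryCatch(f(arg = 1), error = function(e) conditionMessage(e))
print(result)

# Using the exact name removes the ambiguity
exact <- f(arg1 = 1)
print(exact)
```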
Understanding how to use functions as arguments and how to call functions with named
arguments provides flexibility and clarity in your R programming. Passing functions as
arguments enables higher-order functions and functional programming paradigms, while named
arguments improve code readability and reduce the reliance on remembering the exact order of
arguments.