Chapter 1 Notes
Introduction to R
R is an interpreted programming language widely used for statistical analysis and the graphical representation of data. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in the early 1990s.
R is popular in the field of data science among data analysts, researchers, statisticians, and others. Today, R is one of the most important tools used by researchers, data analysts, statisticians, and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data.
R is an open-source programming language, available on widely used platforms such as Windows, Linux, and macOS.
R allows us not only to do branching and looping but also to do modular programming using functions. R also allows integration with procedures written in C, C++, .NET, Python, and FORTRAN to improve efficiency.
Features of R Programming Language:
The R Language is renowned for its extensive features that make it a powerful tool for data
analysis, statistical computing, and visualization. Here are some of the key features of R:
1. Comprehensive Statistical Analysis:
R language provides a wide array of statistical techniques, including linear and nonlinear
modelling, classical statistical tests, time-series analysis, classification, and clustering.
2. Advanced Data Visualization:
With packages like ggplot2, plotly, and lattice, R excels at creating complex and aesthetically
pleasing data visualizations, including plots, graphs, and charts.
3. Extensive Packages and Libraries:
The Comprehensive R Archive Network (CRAN) hosts thousands of packages that extend R’s
capabilities in areas such as machine learning, data manipulation, bioinformatics, and more.
4. Open Source and Free:
R is free to download and use, making it accessible to everyone. Its open-source nature
encourages community contributions and continuous improvement.
5. Platform Independence:
R is platform-independent, running on various operating systems, including Windows, macOS,
and Linux, which ensures flexibility and ease of use across different environments.
Here, we have created a simple variable called message. We have initialized this variable with a
simple message string called "Hello World!". On execution, this program prints the message
stored inside the variable.
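When R prints a vector, each line of output starts with a number in square brackets, such as [1]. This number is the index of the first element displayed on that line. A long vector that wraps onto several lines therefore shows a new index at the start of each line.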
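The program described above is not reproduced in these notes. Below is a minimal sketch, assuming it simply assigns the string to message and prints it. The second call uses a longer vector (1:30, chosen only for illustration) to show the [n] prefix on wrapped lines. Where the lines break depends on the console width.

```r
# Store a string in a variable, then print it
message <- "Hello World!"
print(message)   # shows: [1] "Hello World!"

# A longer vector wraps across lines; every line begins with the
# index of its first element (the split depends on console width)
print(1:30)
```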
R Script File
An R script file is another way to write programs. We save the code in a text file and then execute it at the command prompt with Rscript, R's command-line interpreter. Create a text file containing the following code and save it with the .R extension as:
Hello.R
msg <- "Hello World!"
print(msg)
To execute this file, run the following command at the command prompt. The process is the same on Windows and other operating systems:
Rscript Hello.R
The script prints [1] "Hello World!".
Comments
In R programming, comments are programmer-readable explanations in the source code of an R program. Their purpose is to make the source code easier to understand, and the interpreter ignores them.
R supports only single-line comments; it has no dedicated multi-line comment syntax. To comment out several lines at once, a common workaround is to place them inside an if (FALSE) { ... } block. Such a block is never executed, although its lines must still be valid R code.
The # symbol starts a single-line comment: everything from # to the end of the line is ignored.
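The original comment example is missing. The following minimal sketch shows both a full-line comment and a trailing comment, plus the if (FALSE) workaround for commenting out several lines. The variable name x is chosen only for illustration.

```r
# This whole line is a comment and is ignored by R
x <- 5  # a comment can also follow code on the same line
print(x)

# Workaround for a multi-line comment: this block is never run,
# but its contents must still be syntactically valid R code
if (FALSE) {
  print("this line is never executed")
  print("neither is this one")
}
```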
1. class( ) function
This built-in function returns the class of the object passed to it.
Example:
var1 = "hello"
print(class(var1))
Output:
[1] "character"
2. ls( ) function
This built-in function lists the names of all variables currently defined in the workspace. It is helpful when dealing with a large number of variables at once, and it helps prevent accidentally overwriting any of them.
Syntax :
ls( )
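The ls() example is missing from these notes. Below is a minimal sketch with two illustrative variables. The exact output depends on what else is already defined in your workspace.

```r
var1 <- "hello"
var2 <- 42
# ls() returns a character vector with the names of the variables
# defined in the current environment (here, the workspace)
print(ls())
```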
3. rm( ) function
This built-in function deletes an unwanted variable from the workspace. Doing so frees the memory allocated to variables that are no longer in use. The name of the variable to be deleted is passed as an argument.
Syntax :
rm(variable)
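Example:
# using the equal-sign operator
var1 = "hello"
# using the leftward operator
var2 <- "hello"
# using the rightward operator
"hello" -> var3
# delete var3 from the workspace
rm(var3)
# var3 no longer exists, so this line raises an error
print(var3)
Output:
An error is raised because var3 no longer exists; R reports that object 'var3' was not found.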
typeof( ): the typeof() function returns the internal storage type of the object passed as its argument.
Syntax: typeof(x)
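A short sketch of typeof() on a few literal values follows. Plain numeric literals are stored as doubles, and the L suffix makes an integer.

```r
print(typeof(5))      # "double": numeric literals are stored as doubles
print(typeof(5L))     # "integer": the L suffix requests an integer
print(typeof("R"))    # "character"
print(typeof(TRUE))   # "logical"
print(typeof(2+3i))   # "complex"
```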
Scope of Variables in R programming:
Global Variables
Global variables are those variables that exist throughout the execution of a program.
As the name suggests, global variables can be accessed from any part of the program.
They are available throughout the lifetime of a program.
They are declared outside all functions.
Note that assigning to a global variable's name with = or <- inside a function creates a new local variable instead. To change the global variable from inside a function, use the superassignment operator <<-.
Example:
# global variable
global = 5
# global variable accessed from within a function
display = function(){
print(global)
}
display()
# changing value of global variable
global = 10
display()
Output:
[1] 5
[1] 10
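To see the difference between ordinary assignment and <<- inside a function, consider the sketch below. The function names set_local and set_global are hypothetical and chosen only for illustration.

```r
global = 5
# Plain assignment inside a function creates a new local variable;
# the global variable is left unchanged
set_local = function(){
  global = 100
}
# <<- assigns to the existing variable in an enclosing environment,
# here the global workspace
set_global = function(){
  global <<- 100
}
set_local()
print(global)   # still 5
set_global()
print(global)   # now 100
```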
Local Variables
Local variables are those variables that exist only within a certain part of a program, such as a function, and are released when the function call ends.
Local variables do not exist outside the block in which they are declared, i.e. they cannot be accessed or used outside that block.
Example:
func = function(){
# this variable is local to the function
age = 18
print(age)
}
cat("Age is:\n")
func()
Output:
Age is:
[1] 18
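After the call, the local variable is gone. The sketch below checks this with exists(). Passing inherits = FALSE restricts the search to the current environment.

```r
func = function(){
  age = 18
  print(age)
}
func()
# age was local to func(), so it does not exist at the top level
print(exists("age", inherits = FALSE))   # FALSE
```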
Data type | Example | Description
Integer | 3L, 66L, 2346L | The L suffix tells R to store the value as an integer.
Complex | z = 1+2i, t = 7+3i | A complex value has a real part and an imaginary part; i denotes the imaginary unit.
Character | 'a', "good", "TRUE", '35.4' | A character value represents a string. Objects can be converted to character values with the as.character() function.
Raw | | A raw value holds raw bytes, displayed as hexadecimal numbers (for text, the ASCII code of each character).
Example:
Output:
TRUE
The data type of variable_logical is logical
3532
The data type of variable_numeric is numeric
133
The data type of variable_integer is integer
3+2i
The data type of variable_complex is complex
Learning r programming
The data type of variable_char is character
4c 65 61 72 6e 69 6e 67 20 72 20 70 72 6f 67 72 61 6d 6d 69 6e 67
The data type of variable_char is raw
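The code for this example is not shown in the notes. Below is a hypothetical reconstruction that produces output of the same form. The values are taken from the output above, and charToRaw() is assumed to have been used for the raw value.

```r
# One variable of each basic data type (values taken from the output above)
variable_logical <- TRUE
variable_numeric <- 3532
variable_integer <- 133L
variable_complex <- 3 + 2i
variable_char    <- "Learning r programming"
variable_raw     <- charToRaw("Learning r programming")

vars <- list(variable_logical = variable_logical,
             variable_numeric = variable_numeric,
             variable_integer = variable_integer,
             variable_complex = variable_complex,
             variable_char    = variable_char,
             variable_raw     = variable_raw)

# Print each value, then its class
for (name in names(vars)) {
  cat(vars[[name]], "\n")
  cat("The data type of", name, "is", class(vars[[name]]), "\n")
}
```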
R – Keywords
Keywords are reserved words in R, each of which has a specific meaning to the language, so they cannot be used as ordinary variable names. In R, one can view these keywords by using either help(reserved) or ?reserved. Because R is case-sensitive, the keywords must be written exactly as shown. Here is the list of keywords in R:
if else repeat
while function for
in next break
TRUE FALSE NULL
Inf NaN NA
NA_integer_ NA_real_ NA_complex_
NA_character_
Values read from the keyboard with readline() are always character strings, so the input usually has to be converted to the required type. For example, the string "255" must be converted to the integer 255. R provides several conversion functions:
as.integer(n) - converts to integer
as.numeric(n) - converts to numeric (double)
as.complex(n) - converts to a complex number (e.g. 3+2i)
as.Date(n) - converts to a date
as.logical(n) - converts to logical
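The sketch below applies these conversions to string inputs. It assumes the string "255" stands in for a value typed at a readline() prompt; the other literals are illustrative.

```r
input <- "255"             # e.g. a value returned by readline()
n <- as.integer(input)
print(n + 1L)              # arithmetic works after conversion: 256
print(as.numeric("3.14"))
print(as.complex("3+2i"))
print(as.logical("TRUE"))
print(as.Date("2024-01-15"))
```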
Syntax
var = readline(prompt = " ")
Parameter
prompt: an optional message that tells the user what kind of input the program is expecting.
Return type: a character vector of length 1 (a string).
Example:
Output:
Enter name::kle
[1] "kle"
[1] "character"
[1] "character"
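The code for this example is not shown. A minimal sketch that produces output of this form is below, assuming the prompt text "Enter name::". In non-interactive use (for example under Rscript), readline() does not wait for input and returns "".

```r
# Ask for a name; readline() always returns a character string
name <- readline(prompt = "Enter name::")
print(name)
print(class(name))   # "character"
print(paste("Entered name is", name))
```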
Syntax:
x = scan( )
scan() reads input continuously and, by default, expects numeric values. To terminate the input, press Enter twice, that is, enter an empty line.
Example:
x = scan()
# print the inputted values
print(x)
Output:
Enter name::KLE
[1] "KLE"
[1] "Entered name is KLE"
Entered name is:: kle
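To try scan() without typing at the console, its text argument can supply the input as a string. The inputs below are illustrative. quiet = TRUE suppresses the "Read n items" message, and what = "character" makes scan() return strings instead of numbers.

```r
# Numbers separated by spaces are read as a numeric vector
x <- scan(text = "3 5 7", quiet = TRUE)
print(x)

# what = "character" reads each space-separated word as a string
y <- scan(text = "alpha beta", what = "character", quiet = TRUE)
print(y)
```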
Operators in R
In computer programming, an operator is a symbol that represents an action: it tells the interpreter to perform a specific logical or mathematical manipulation. R is very rich in built-in operators.
In R programming, there are different types of operators, and each operator performs a different task. For data manipulation, R also provides some advanced operators, such as the model formula operator (~) and the list-indexing operators ($ and [[ ]]).
There are the following types of operators used in R:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
Arithmetic Operators
Arithmetic operators are the symbols used to perform arithmetic operations. They act on each element of a vector. R supports the following arithmetic operators: + (addition), - (subtraction), * (multiplication), / (division), %% (remainder), %/% (integer division), and ^ (exponent).
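The operator table is missing from these notes. The sketch below applies each arithmetic operator to two illustrative vectors, element by element.

```r
a <- c(2, 5.5, 6)
b <- c(8, 3, 4)
print(a + b)    # 10, 8.5, 10
print(a - b)    # -6, 2.5, 2
print(a * b)    # 16, 16.5, 24
print(a / b)    # 0.25, 1.833333, 1.5
print(a %% b)   # remainder: 2, 2.5, 2
print(a %/% b)  # integer division: 0, 1, 1
print(a ^ b)    # power: 256, 166.375, 1296
```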
Relational Operators
A relational operator is a symbol which defines some kind of relation between two entities. These include numerical equalities and inequalities. A relational operator compares each element of the first vector with the corresponding element of the second vector, and the result of each comparison is a logical value (TRUE or FALSE). R supports the following relational operators: >, <, ==, <=, >=, and !=.
1. >
This operator returns TRUE for each position where the element of the first vector is greater than the corresponding element of the second vector.
Example:
a <- c(1, 3, 5)
b <- c(2, 4, 6)
print(a>b)
Output:
[1] FALSE FALSE FALSE
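The remaining relational operators work the same way. The vectors in the sketch below are illustrative, chosen so that each comparison gives a mix of TRUE and FALSE.

```r
a <- c(1, 4, 5)
b <- c(2, 4, 3)
print(a > b)    # FALSE FALSE  TRUE
print(a < b)    #  TRUE FALSE FALSE
print(a == b)   # FALSE  TRUE FALSE
print(a <= b)   #  TRUE  TRUE FALSE
print(a >= b)   # FALSE  TRUE  TRUE
print(a != b)   #  TRUE FALSE  TRUE
```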
Logical Operators
The logical operators allow a program to make a decision on the basis of multiple conditions. Each operand is treated as a condition that evaluates to TRUE or FALSE, and these values determine the overall value of the expression op1 operator op2.
Logical operators are applicable to vectors whose type is logical, numeric, or complex; zero is treated as FALSE and any nonzero value as TRUE.
The element-wise operators & and | compare each element of the first vector with the corresponding element of the second vector.
R supports the following logical operators:
1. &
This operator is known as the logical AND operator. It compares the vectors element-wise and returns TRUE for each position where both elements are TRUE.
Example:
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a&b)
Output:
[1] TRUE FALSE TRUE TRUE
2. |
This operator is known as the logical OR operator. It compares the vectors element-wise and returns TRUE for each position where at least one of the elements is TRUE.
Example:
print(a|b)
Output:
[1] TRUE TRUE TRUE TRUE
3. !
This operator is known as the logical NOT operator. It negates each element, so nonzero values become FALSE and zero becomes TRUE.
Example:
print(!a)
Output:
[1] FALSE TRUE FALSE FALSE
4. &&
This operator compares a single element from each side and gives TRUE only if both are TRUE. Since R 4.3.0, both operands must have length 1, so the first elements are selected explicitly.
Example:
print(a[1] && b[1])
Output:
[1] TRUE
5. ||
This operator compares a single element from each side and gives TRUE if either of them is TRUE. As with &&, both operands must have length 1.
Example:
print(a[1] || b[1])
Output:
[1] TRUE
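if condition
The statements inside an if block are executed only when the test expression evaluates to TRUE.
Example:
x <- 100
if(x > 10){
print(paste(x, "is greater than 10"))
}
Output:
[1] "100 is greater than 10"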
if-else condition
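It is similar to the if condition, but when the test expression in the if condition is FALSE, the statements in the else block are executed.
Syntax:
if(expression){
statements
....
....
} else {
statements
....
....
}
The else keyword must be on the same line as the closing brace of the if block. If else starts a new line at the top level, R treats the if statement as already complete and reports an error at else.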
Example:
x <- 5
# Check value is less than or greater than 10
if(x > 10){
print(paste(x, "is greater than 10"))
}else{
print(paste(x, "is less than 10"))
}
Output:
[1] "5 is less than 10"
for loop
A for loop executes a block of statements repeatedly, once for each element of a vector.
Syntax:
for(value in vector){
statements
....
....
}
Example:
var =c(10,20,30)
for( i in var)
{
print(i)
}
Output:
[1] 10
[1] 20
[1] 30
while loop
The while loop repeats a block of statements as long as a condition remains TRUE. The test expression is checked before each execution of the loop body.
Syntax:
while(expression){
statement
....
....
}
Example:
x = 1
# Print 1 to 5
while(x <= 5){
print(x)
x = x + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
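repeat loop
The repeat loop executes a block of statements again and again. It has no built-in exit condition, so a break statement is needed to leave the loop.
Example:
x = 1
# Print 1 to 5
repeat{
print(x)
x = x + 1
# exit the loop once x exceeds 5
if(x > 5){
break
}
}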
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
return statement
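The return statement ends a function, returns the function's result, and passes control back to the calling code.
Syntax:
return(expression)
Example:
# returns the sign of x as a string
func = function(x){
if(x > 0) return("Positive")
if(x < 0) return("Negative")
return("Zero")
}
print(func(1))
print(func(0))
print(func(-1))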
Output:
[1] "Positive"
[1] "Zero"
[1] "Negative"
next statement
The next statement skips the remaining statements of the current iteration and continues with the next iteration, without terminating the loop.
x = 1
repeat {
# Break if x == 4
if ( x == 4) {
break
}
# Skip if x == 2
if ( x == 2 ) {
# Increment x by 1 and skip the print below
x = x + 1
next
}
# Print x, then move to the next value
print(x)
x = x + 1
}
Output
[1] 1
[1] 3
Coercion in R:
Coercion means type conversion, that is, changing data of one type into another type. There are two types of coercion:
1. Implicit Coercion
2. Explicit Coercion
Explicit Coercion:
In explicit coercion, we change one data type to another by applying a conversion function such as as.character() or as.numeric().
We create an object "x" which stores integer values from 1 to 6.
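The code for this example is missing. A minimal sketch of explicit coercion on x <- 1:6 follows; the target types are chosen for illustration.

```r
x <- 1:6
print(class(x))                 # "integer"
y <- as.character(x)
print(y)                        # "1" "2" "3" "4" "5" "6"
z <- as.numeric(x)
print(class(z))                 # "numeric"
print(as.logical(c(0, 1, 2)))   # FALSE TRUE TRUE: nonzero becomes TRUE
```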
Implicit Coercion:
When logical and numeric data are combined in one object, R converts the logical values to numeric implicitly: TRUE becomes 1 and FALSE becomes 0.
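The following sketch shows implicit coercion when c() combines values of different types. R picks the most general type present, so logical becomes numeric and numeric becomes character.

```r
v <- c(TRUE, 5, FALSE)
print(v)          # 1 5 0: TRUE and FALSE become 1 and 0
print(class(v))   # "numeric"

w <- c(1, "a")
print(w)          # "1" "a": the number becomes a string
```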
The exponential function, $f(x) = e^{x}$, is often written as exp(x) and represents the inverse of the natural log, such that $\exp(\log_e x) = \log_e(\exp x) = x$.
The R command for the exponential function is exp:
R> exp(x=3)
[1] 20.08554
The default behavior of log is to assume the natural log:
R> log(x=20.08554)
[1] 3
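The sketch below checks the inverse relationship numerically and shows log()'s base argument, along with the log10() and log2() shortcuts from base R.

```r
print(log(exp(3)))               # back to 3
print(log(x = 100, base = 10))   # 2
print(log10(100))                # 2
print(log2(8))                   # 3
```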
Data Structures
Vectors:
A vector is the basic data structure in R; it stores data of the same type.
R vectors are similar to arrays in the C language, which hold multiple data values of the same type. One key point is that in R, vector indexing starts from 1, not from 0. We can create numeric vectors and character vectors as well.
Suppose we need to record the age of 5 employees. Instead of creating 5 separate variables, we
can simply create a vector.
[Figure: Elements of a Vector]
Creating a Vector in R
In R, we use the c( ) function to create a vector.
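The following sketch applies c() to the employee example above. The five ages and the names are hypothetical values.

```r
# ages of 5 employees (hypothetical values) in a single vector
ages <- c(25, 31, 42, 28, 36)
print(ages)
print(length(ages))   # 5

# c() also builds character vectors
staff <- c("Ana", "Ben", "Chen")
print(staff)
```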
Types of R vectors
Vectors are of different types which are used in R. Following are some of the types of vectors:
Numeric vectors: Numeric vectors contain numeric values, stored either as decimal (double) values or as integers.
Output
[1] "double"
[1] "integer"
Character vectors: Character vectors in R contain alphanumeric values and special characters.
Output
[1] "character"
Logical vectors: Logical vectors in R contain the Boolean values TRUE and FALSE, and NA for missing values.
Output
[1] "logical"
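The code behind these outputs is not shown. A minimal sketch that yields the same four type names with typeof() follows; the vector contents are illustrative.

```r
num_double  <- c(1.5, 2, 3.25)
num_integer <- c(1L, 2L, 3L)
chars       <- c("R", "data-2024", "#tag")
logicals    <- c(TRUE, FALSE, NA)
print(typeof(num_double))    # "double"
print(typeof(num_integer))   # "integer"
print(typeof(chars))         # "character"
print(typeof(logicals))      # "logical"
```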
In R, each element in a vector is associated with a number known as the vector index. We can access the elements of a vector using the index number (1, 2, 3, ...). We can also modify an element by assigning a new value to its index.
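The sketch below shows indexing and modification on an illustrative vector. A negative index drops an element instead of selecting it.

```r
ages <- c(25, 31, 42, 28, 36)
print(ages[1])         # 25: indexing starts at 1
print(ages[c(2, 4)])   # 31 28
print(ages[-1])        # all but the first: 31 42 28 36
ages[3] <- 40          # modify the third element
print(ages)            # 25 31 40 28 36
```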
Sequences are extremely useful, but sometimes you may simply want to repeat a certain value. You do this using rep( ).
R> rep(x=1,times=4)
[1] 1 1 1 1
R> rep(x=c(3,62,8.3),times=3)
[1] 3.0 62.0 8.3 3.0 62.0 8.3 3.0 62.0 8.3
R> rep(x=c(3,62,8.3),each=2)
[1] 3.0 3.0 62.0 62.0 8.3 8.3
R> rep(x=c(3,62,8.3),times=3,each=2)
[1] 3.0 3.0 62.0 62.0 8.3 8.3 3.0 3.0 62.0 62.0 8.3 8.3 3.0 3.0 62.0
[16] 62.0 8.3 8.3
Sorting with sort:
The sort() function sorts values in ascending order by default, or in descending order with decreasing = TRUE.
R> x= c(8, 2, 7, 1, 11, 2)
R> x
[1] 8 2 7 1 11 2
R> sort(x)
[1] 1 2 2 7 8 11
R> sort(x,decreasing = TRUE)
[1] 11 8 7 2 2 1
Finding a Vector Length with length:
The length( ) function returns how many entries exist in the vector given as the argument x.
R> length(x=c(3,2,8,1))
[1] 4
R> length(x=5:13)
[1] 9
Deleting a vector
A vector can be deleted by assigning NULL to it.
X <- c(5, 2, 1, 6)
# Deleting a vector
X <- NULL
print('Deleted vector')
print(X)
Output
[1] "Deleted vector"
NULL
Arithmetic operations
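We can perform arithmetic operations between two vectors. These operations are performed element-wise, so both vectors should normally have the same length. If the lengths differ, R recycles the shorter vector and warns when the longer length is not a multiple of the shorter one.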
# Creating Vectors
X <- c(5, 2, 5, 1, 51, 2)
Y <- c(7, 9, 1, 5, 2, 1)
# Addition
Z <- X + Y
print('Addition')
print(Z)
# Subtraction
S <- X - Y
print('Subtraction')
print(S)
# Multiplication
M <- X * Y
print('Multiplication')
print(M)
# Division
D <- X / Y
print('Division')
print(D)
Output
[1] "Addition"
[1] 12 11 6 6 53 3
[1] "Subtraction"
[1] -2 -7 4 -4 49 1
[1] "Multiplication"
[1] 35 18 5 5 102 2
[1] "Division"
[1] 0.7142857 0.2222222 5.0000000 0.2000000 25.5000000 2.0000000
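The sketch below shows what happens when the lengths differ: R recycles the shorter vector. The vectors are illustrative.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)
print(x + y)   # y is recycled as 10 20 10 20: 11 22 13 24

# Lengths 3 and 2: R still recycles but issues a warning
z <- c(1, 2, 3) + c(10, 20)
print(z)       # 11 22 13
```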
Creating a Matrix:
To create a matrix in R, use the function matrix(). Its arguments are the vector of elements, the number of rows, and the number of columns you want the matrix to have.
Note: By default, matrices are filled in column-wise order.
The syntax of the matrix() function is
matrix(vector, nrow, ncol, byrow = FALSE)
Here,
vector - the data items of the same type
nrow - number of rows
ncol - number of columns
byrow (optional) - if TRUE, the matrix is filled row-wise. By default, the matrix is filled column-wise.
data=c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# By default matrices are filled column-wise;
# byrow = TRUE fills the matrix row by row instead
A = matrix(data, nrow = 3, ncol = 3, byrow = TRUE)
# Naming rows
rownames(A) = c("a", "b", "c")
# Naming columns
colnames(A) = c("c", "d", "e")
print(A)
Output
c d e
a 1 2 3
b 4 5 6
c 7 8 9
Matrix Dimensions:
Another useful function, dim( ), provides the dimensions of a matrix stored in your workspace.
R> mymat <- rbind(c(1,3,4),5:3,c(100,20,90),11:13)
R> mymat
[,1] [,2] [,3]
[1,] 1 3 4
[2,] 5 4 3
[3,] 100 20 90
[4,] 11 12 13
R> dim(mymat)
[1] 4 3
R> nrow(mymat)
[1] 4
R> ncol(mymat)
[1] 3
R> dim(mymat)[2]
[1] 3
Subsetting:
Extracting and subsetting elements from matrices in R is much like extracting elements from
vectors. The only complication is that you now have an additional dimension. Element extraction
still uses the square-bracket operator, but now it must be performed with both a row and a
column position, given strictly in the order of [row,column]. Let’s start by creating a 3 × 3
matrix,
R> A <- matrix(c(0.3,4.5,55.3,91,0.1,105.5,-4.2,8.2,27.9),nrow=3,ncol=3)
R> A
[,1] [,2] [,3]
[1,] 0.3 91.0 -4.2
[2,] 4.5 0.1 8.2
[3,] 55.3 105.5 27.9
R> A[3,2]
[1] 105.5
Row, Column, and Diagonal Extractions:
To extract an entire row or column from a matrix, you simply specify the desired row or column
number and leave the other value blank.
R> A <- matrix(c(0.3,4.5,55.3,91,0.1,105.5,-4.2,8.2,27.9),nrow=3,ncol=3)
R> A
[,1] [,2] [,3]
[1,] 0.3 91.0 -4.2
[2,] 4.5 0.1 8.2
[3,] 55.3 105.5 27.9
R> A[,2]
[1] 91.0 0.1 105.5
R> A[1,]
[1] 0.3 91.0 -4.2
You can also identify the values along the diagonal of a square matrix (that is, a matrix with an
equal number of rows and columns) using the diag command.
R> diag(x=A)
[1] 0.3 0.1 27.9
Omitting and Overwriting:
To delete or omit elements from a matrix, you again use square brackets, but this time with
negative indexes.
R> A <- matrix(c(0.3,4.5,55.3,91,0.1,105.5,-4.2,8.2,27.9),nrow=3,ncol=3)
R> A
[,1] [,2] [,3]
[1,] 0.3 91.0 -4.2
[2,] 4.5 0.1 8.2
[3,] 55.3 105.5 27.9
R> A[,-2]
[,1] [,2]
[1,] 0.3 -4.2
[2,] 4.5 8.2
[3,] 55.3 27.9
In R, the transpose of a matrix is found with the function t( ). Let’s create a new matrix and then
transpose it.
R> A <- rbind(c(2,5,2),c(6,1,4))
R> A
[,1] [,2] [,3]
[1,] 2 5 2
[2,] 6 1 4
R> t(A)
[,1] [,2]
[1,] 2 6
[2,] 5 1
[3,] 2 4
If you “transpose the transpose” of A, you’ll recover the original matrix.
R> t(t(A))
[,1] [,2] [,3]
[1,] 2 5 2
[2,] 6 1 4
Identity Matrix
The identity matrix, written as $I_m$, is a particular kind of matrix used in mathematics. It is a square m × m matrix with ones on the diagonal and zeros elsewhere.
You can create an identity matrix of any dimension using the standard matrix function, but there's a quicker approach using diag.
R> A <- diag(x=3)
R> A
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
Scalar multiplication of a matrix is carried out using the standard arithmetic * operator. As you might expect, R performs it element-wise, multiplying every element of the matrix by the scalar.
R> A <- rbind(c(2,5,2),c(6,1,4))
R> a <- 2
R> a*A
[,1] [,2] [,3]
[1,] 4 10 4
[2,] 12 2 8
Matrix Addition and Subtraction:
Addition or subtraction of two matrices of equal size is also performed in an element-wise
fashion. Corresponding elements are added or subtracted from one another, depending on the
operation.
You can add or subtract any two equally sized matrices with the standard + and - symbols.
R> A <- cbind(c(2,5,2),c(6,1,4))
R> A
[,1] [,2]
[1,] 2 6
[2,] 5 1
[3,] 2 4
R> B <- cbind(c(-2,3,6),c(8.1,8.2,-9.8))
R> B
[,1] [,2]
[1,] -2 8.1
[2,] 3 8.2
[3,] 6 -9.8
R> A-B
[,1] [,2]
[1,] 4 -2.1
[2,] 2 -7.2
[3,] -4 13.8
Matrix Multiplication:
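In order to multiply two matrices A and B of size m × n and p × q, it must be true that n = p. The resulting matrix A · B will have the size m × q. In R, matrix multiplication is performed with the %*% operator.
R> A <- rbind(c(2,5,2),c(6,1,4))
R> dim(A)
[1] 2 3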
R> B <- cbind(c(3,-1,1),c(-3,1,5))
R> dim(B)
[1] 3 2
R> A%*%B
[,1] [,2]
[1,] 3 9
[2,] 21 3
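Matrix Inversion:
Some square matrices can be inverted. The inverse of a matrix A is denoted $A^{-1}$. An invertible matrix A satisfies $A \cdot A^{-1} = I_m$, where $I_m$ is the m × m identity matrix. In R, the inverse is computed with solve().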
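The sketch below inverts an illustrative 2 × 2 matrix with solve() and confirms that the product with the original matrix is the identity, up to floating-point rounding.

```r
A <- matrix(c(3, 1, 4, 2), nrow = 2, ncol = 2)   # rows: (3, 4) and (1, 2)
A_inv <- solve(A)
print(A_inv)                                # rows: (1, -2) and (-0.5, 1.5)
print(A %*% A_inv)                          # the 2 x 2 identity matrix
print(all.equal(A %*% A_inv, diag(2)))      # TRUE
```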
R Array
An array is a data structure that can store data of the same type in more than two dimensions.
The difference between vectors, matrices, and arrays is:
Vectors are one-dimensional arrays
Matrices are two-dimensional arrays
Arrays can have more than two dimensions
In R, we use the array( ) function to create an array.
The syntax of the array( ) function is
array(vector, dim = c(nrow, ncol, nmat))
Here,
vector - the data items of the same type
nrow - number of rows
ncol - number of columns
nmat - number of matrices of nrow × ncol dimension
print(array1)
Output
, , 1
, , 2
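The definition of array1 is not shown. The sketch below assumes array1 <- array(1:12, dim = c(2, 3, 2)), which is consistent with the 12 elements and the element value 11 reported later in these notes.

```r
# 12 values arranged as 2 rows x 3 columns x 2 matrices
array1 <- array(1:12, dim = c(2, 3, 2))
print(array1)        # printed one matrix at a time: , , 1 then , , 2
print(dim(array1))   # 2 3 2
```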
To access an element of an array, use its index positions:
array[n1, n2, mat_level]
Here,
n1 - specifies the row position
n2 - specifies the column position
mat_level - specifies the matrix level
Output
Desired Element: 11
Output
[1] TRUE
[1] FALSE
Length of Array in R
In R, we can use the length( ) function to find the number of elements present inside the array.
Output
Total Elements: 12
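The sketch below covers element access, membership checks with %in%, and length(). It again assumes array1 <- array(1:12, dim = c(2, 3, 2)), which matches the outputs shown above.

```r
array1 <- array(1:12, dim = c(2, 3, 2))
# element in row 1, column 3 of the second matrix
cat("Desired Element:", array1[1, 3, 2], "\n")   # 11
# %in% checks whether a value occurs anywhere in the array
print(11 %in% array1)   # TRUE
print(44 %in% array1)   # FALSE
cat("Total Elements:", length(array1), "\n")     # 12
```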
NON-NUMERIC VALUES:
Logical Values:
Logical values (also simply called logicals) are based on a simple premise: a logical-valued
object can only be either TRUE or FALSE. These can be interpreted as yes/no, one/zero,
satisfied/not satisfied, and so on.
Logical values in R are written fully as TRUE and FALSE, but they are frequently abbreviated
as T or F.
Assigning logical values to an object is the same as assigning numeric values.
R> foo <- TRUE
R> foo
[1] TRUE
Logical Vector:
R> var <- c(T,F,F,F,T,F,T,T,T,F,T,F)
R> var
[1] TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE
R> length(x=var)
[1] 12
Logical Matrix:
R> qux <- matrix(data=var,nrow=3,ncol=4,byrow=foo)
R> qux
[,1] [,2] [,3] [,4]
[1,] TRUE FALSE FALSE FALSE
[2,] TRUE FALSE TRUE TRUE
[3,] TRUE FALSE TRUE FALSE
Logical Outcome: Relational Operators
Logicals are commonly used to check relationships between values.
R> 1==2
[1] FALSE
R> 1>2
[1] FALSE
R> (2-1)<=2
[1] TRUE
R> 1!=(2+3)
[1] TRUE
R> foo <- c(3,2,1,4,1,2,1,-1,0,3)
R> bar <- c(4,1,2,1,1,0,0,3,0,4)
R> length(x=foo)==length(x=bar)
[1] TRUE
R> foo==bar
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
R> foo<bar
[1] TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
R> foo<=bar
[1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE
R> foo<=(bar+10)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Multiple Comparisons: Logical Operators:
Logicals are especially useful when you want to examine whether multiple conditions are
satisfied.
R> FALSE||((T&&TRUE)||FALSE)
[1] TRUE
R> !TRUE&&TRUE
[1] FALSE
R> (T&&(TRUE||F))&&FALSE
[1] FALSE
R> (6<4)||(3!=1)
[1] TRUE
R> foo <- c(T,F,F,F,T,F,T,T,T,F,T,F)
R> bar <- c(F,T,F,T,F,F,F,F,T,T,T,T)
R> foo&bar
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
R> foo |bar
[1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
Logicals Are Numbers!
Because of the binary nature of logical values, they’re often represented with TRUE as 1 and
FALSE as 0. In fact, in R, if you perform elementary numeric operations on logical values,
TRUE is treated like 1, and FALSE is treated like 0.
R> TRUE+TRUE
[1] 2
R> FALSE-TRUE
[1] -1
R> T+T+F+T+F+F+T
[1] 4
R> 1&&1
[1] TRUE
R> 1||0
[1] TRUE
R> 0&&1
[1] FALSE
Characters:
Character strings are another common data type and are used to represent text. In R, strings are often used to specify folder locations or software options.
Creating a String
Character strings are indicated by double quotation marks, " (single quotes, ', also work). To create a string, just enter text between a pair of quotes.
R> foo <- "This is a character string!"
R> foo
[1] "This is a character string!"
R> length(x=foo)
[1] 1
R treats the string as a single entity. In other words, foo is a vector of length 1 because R counts
only the total number of distinct strings rather than individual words or characters. To count the
number of individual characters, you can use the nchar( ) function. Here’s an example using foo:
R> nchar(x=foo)
[1] 27
Strings can be compared in several ways, the most common comparison being a check for
equality.
R> "alpha"=="alpha"
[1] TRUE
R> "alpha"!="beta"
[1] TRUE
R> c("alpha","beta","gamma")=="beta"
[1] FALSE TRUE FALSE
Other relational operators work as you might expect. For example, R considers letters that come
later in the alphabet to be greater than earlier letters, meaning it can determine whether one string
of letters is greater than another with respect to alphabetical order.
R> "alpha"<="beta"
[1] TRUE
R> "gamma">"Alpha"
[1] TRUE
Concatenation:
There are two main functions used to concatenate (or glue together) one or more strings: cat and
paste. The difference between the two lies in how their contents are returned. The first function,
cat, sends its output directly to the console screen and doesn’t formally return anything. The
paste function concatenates its contents and then returns the final character string as a usable R
object. This is useful when the result of a string concatenation needs to be passed to another
function or used in some secondary way, as opposed to just being displayed. Consider the
following vector of character strings:
R> qux <- c("awesome","R","is")
R> cat(qux[2],qux[3],"totally",qux[1],"!")
R is totally awesome !
R> paste(qux[2],qux[3],"totally",qux[1],"!")
[1] "R is totally awesome !"
Escape Sequences:
Escape sequences add flexibility to the display of character strings, which can be useful for
summaries of results and plot annotations.
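No escape sequences are listed here. The sketch below shows four common ones. cat() displays the interpreted characters, while print() shows the string with its escapes.

```r
cat("Line one\nLine two\n")      # \n starts a new line
cat("Name:\tAda\n")              # \t inserts a tab
cat("She said \"hello\"\n")      # \" prints a double quote
cat("C:\\Users\\data\n")         # \\ prints a single backslash
print("She said \"hello\"")      # print() shows the escaped form
```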
Factors:
Factors in R are data structures used to represent categorical data. Each value is stored as one of a fixed set of categories, called levels.
The R factor accepts only a restricted number of distinct values. For example, a data field such as
gender may contain values only from female, male, or transgender.
In the above example, all the possible cases are known early and are predefined. These distinct
values are known as levels. After a factor is created it only consists of levels that are by default
sorted alphabetically.
Suppose a data field such as marital status may contain only values from single, married,
separated, divorced, or widowed.
In such a case, we know the possible values beforehand and these predefined, distinct values are
called levels of a factor.
Create a Factor in R
In R, we use the factor() function to create a factor. Once a factor is created, it can only contain
predefined set values called levels.
The syntax for creating a factor is
factor(vector)
# Creating a vector
x <-c("female", "male", "male", "female")
print(x)
Output
[1] "female" "male" "male" "female"
# create a factor
gender <- factor(c("male", "female", "male", "transgender", "female"))
# access the first and fourth elements
print(gender[1])
print(gender[4])
Output
[1] male
Levels: female male transgender
[1] transgender
Levels: female male transgender
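The sketch below inspects the levels of the gender factor. It also shows that a value outside the predefined levels cannot be stored until it is added as a level. The level "other" is illustrative.

```r
gender <- factor(c("male", "female", "male", "transgender", "female"))
print(levels(gender))    # sorted alphabetically
print(nlevels(gender))   # 3
print(table(gender))     # count of each level

# Assigning a value that is not a level gives NA (with a warning)
gender[2] <- "other"
print(gender[2])

# To allow a new value, add it to the levels first
levels(gender) <- c(levels(gender), "other")
gender[2] <- "other"
print(gender[2])
```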
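Lists:
A list is a generic object that can hold components of different types, such as vectors, numbers, and even other lists.
# numeric vector of employee IDs
empId = c(1, 2, 3, 4)
# character vector of employee names
empName = c("Debi", "Sandeep", "Subham", "Shiba")
# number of employees
numberOfEmp = 4
# create an unnamed list of the three components
empList = list(empId, empName, numberOfEmp)
print(empList)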
Output
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"
[[3]]
[1] 4
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
print(empList)
Output
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
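1. Access components by names: the components of a named list can be accessed with the $ operator followed by the component name, for example empList$Names.
Output
Accessing name components using $ command
[1] "Debi" "Sandeep" "Subham" "Shiba"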
2. Access components by indices: We can also access the components of an R list using indices. To access the top-level components of a list, use the double square-bracket operator [[ ]]. To access lower or inner-level components, add a single square bracket [ ] after it, as in empList[[2]][1].
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
print(empList)
# Accessing a top level components by indices
cat("Accessing name components using indices\n")
print(empList[[2]])
Output
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
Accessing name components using indices
[1] "Debi" "Sandeep" "Subham" "Shiba"
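The sketch below shows inner-level access on the same named list. Single brackets return a sub-list, and backticks are needed for a name that contains a space.

```r
empList = list(
  "ID" = c(1, 2, 3, 4),
  "Names" = c("Debi", "Sandeep", "Subham", "Shiba"),
  "Total Staff" = 4
)
print(empList[[2]][1])         # "Debi": first element of the second component
print(empList[["Names"]][2])   # "Sandeep": [[ ]] also accepts a name
print(empList$`Total Staff`)   # backticks because the name contains a space
print(empList[2])              # single brackets return a sub-list
```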
Data Frames:
A data frame is a two-dimensional data structure which can store data in tabular format.
Data frames have rows and columns, and each column is a vector; different columns can hold different data types.
An R data frame is made up of three principal components: the data, rows, and columns.
The syntax for creating a data frame is
dataframe1 <- data.frame(
first_col = c(val1, val2, ...),
second_col = c(val1, val2, ...),
...
)
Here,
first_col - a vector with values val1, val2, ... of the same data type
second_col - another vector with values val1, val2, ... of the same data type, and so on
print(dataframe1)
Output
There are different ways to extract columns from a data frame. We can use [ ], [[ ]], or $ to access a specific column.
Output
Name
1 kiran
2 Abhi
3 Raju
[1] "Kiran" "Raju" "Abhi"
print(updated)
Output
Name Age
1 Kiran 22
2 Akash 15
3 Ravi 46
4 Raju 89
Output
Total Elements: 3
Here, we have used length() to find the total number of columns in dataframe1. Since there are 3
columns, the length() function returns 3.
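The code for these data frame examples is not shown. A minimal sketch with a hypothetical three-column data frame follows; the names, ages, and the Vote column are illustrative. It covers column extraction, adding a row with rbind(), and length().

```r
dataframe1 <- data.frame(
  Name = c("Kiran", "Akash", "Ravi"),
  Age  = c(22, 15, 46),
  Vote = c(TRUE, FALSE, TRUE)
)
print(dataframe1)
print(dataframe1["Name"])     # [ ] returns a one-column data frame
print(dataframe1[["Name"]])   # [[ ]] returns the column as a vector
print(dataframe1$Age)         # $ also returns a vector

# rbind() appends a row whose column names match
updated <- rbind(dataframe1, data.frame(Name = "Raju", Age = 89, Vote = TRUE))
print(updated)

# length() of a data frame is its number of columns
cat("Total Elements:", length(dataframe1), "\n")   # 3
```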
SPECIAL VALUES, CLASSES, AND COERCION:
Many situations in R call for special values. For example, when a data set has missing
observations or when a practically infinite number is calculated, the software has some unique
terms that it reserves for these situations. These special values can be used to mark missing
values in vectors, arrays, or other data structures.
In general, R supports:
NULL
NA
NaN
Inf / -Inf
NULL is an object and is returned when an expression or function results in an undefined value.
In R language, NULL (capital letters) is a reserved word and can also be the product of
importing data with unknown data type.
NA is a logical constant of length 1 that indicates a missing value. NA (capital letters) is a reserved word and can be coerced to any other vector type except raw. NA and "NA" (the string) are not interchangeable. NA stands for Not Available.
NaN stands for Not a Number. It is a numeric (double) value of length 1 that applies to numerical values, as well as to the real and imaginary parts of complex values, but not to values of integer vectors. NaN is a reserved word.
Inf and -Inf stand for positive and negative infinity. They result from numbers too large to represent or from dividing a nonzero number by zero. Inf is a reserved word and is, in most cases, the product of computations in R, so it rarely comes from data import. Inf also tells you that the value is a number and is not missing.
Missing values are denoted by NA, and NaN denotes the result of undefined mathematical operations such as 0/0. They can be detected with:
is.na()
is.nan()
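The sketch below produces each special value and tests for it. Note that is.na() is also TRUE for NaN, whereas is.nan() is TRUE only for NaN.

```r
print(1 / 0)          # Inf
print(-1 / 0)         # -Inf
print(0 / 0)          # NaN
x <- c(1, NA, NaN, 4)
print(is.na(x))       # FALSE  TRUE  TRUE FALSE
print(is.nan(x))      # FALSE FALSE  TRUE FALSE
print(length(NULL))   # 0: NULL is an empty object
print(typeof(NaN))    # "double"
```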
In general, you can think of attributes as either explicit or implicit. Explicit attributes are
immediately visible to the user, while R determines implicit attributes internally. You can print
explicit attributes for a given object with the attributes( ) function, which takes any object and
returns a named list. Consider, for example, the following 3 × 3 matrix:
R> foo <- matrix(data=1:9,nrow=3,ncol=3)
R> foo
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
R> attributes(foo)
$dim
[1] 3 3
Object Class
R is an object-oriented programming language: entities are stored as objects, and methods act upon them.
Every object you create is identified, either implicitly or explicitly, with at least one class. A class is a blueprint or prototype from which objects are made; it encapsulates data members and functions. An object is a data structure with attributes and methods that act upon those attributes.
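You can inspect the class of any object with class(). The first few lines show classes that R assigns implicitly. The last lines set a class explicitly; the class name "my_class" is hypothetical.

```r
print(class(c(1.5, 2, 3)))       # "numeric"
print(class(1:3))                # "integer"
print(class("hello"))            # "character"
print(class(TRUE))               # "logical"
print(class(list(a = 1)))        # "list"
print(class(mean))               # "function"
print(class(data.frame(a = 1)))  # "data.frame"

# Setting a class explicitly (hypothetical class name)
y <- 1:3
class(y) <- "my_class"
print(class(y))                  # "my_class"
```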
Class System in R
While most programming languages have a single class system, R has three class systems:
1. S3 Class
2. S4 Class
3. Reference Class
S3 Class in R
S3 is the most popular class system in R, and most of the classes that come predefined in R are of this type.
An S3 object is basically a list with a class attribute set to a name. The member variables of the object are the components of the list.
Creating an S3 object takes two main steps:
1. Create a list (say x) with the required components.
2. Assign a class name to the list with class(x) <- "name".
For example,
Output
$name
[1] "Peter"
$age
[1] 21
$role
[1] "Developer"
attr(,"class")
[1] "Employee_Info"
In the above example, we created a list named employee1 with three components. Employee_Info is the name of the class. To make employee1 an object of this class, we assigned that name to the list with class().
The result is an object of the Employee_Info class called employee1.
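The code for this example does not appear in the notes. The following is a minimal reconstruction, consistent with the output and the explanation above.

```r
# Step 1: create a list with the required components
employee1 <- list(name = "Peter", age = 21, role = "Developer")

# Step 2: assign a class name to the list
class(employee1) <- "Employee_Info"

print(employee1)                             # components plus attr(,"class")
print(class(employee1))                      # "Employee_Info"
print(inherits(employee1, "Employee_Info"))  # TRUE
print(employee1$name)                        # components are accessed with $
```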
S4 Class in R
S4 is an improvement over S3. S4 classes have a formally defined structure, which keeps objects of the same class consistent with one another.
In R, we use the setClass() function to define an S4 class. For example, we can create a class named Student_Info with three slots (member variables): name, age, and GPA.
To create an object, we use the new() function, passing the class name "Student_Info" and a value for each of the three slots. Assigning the result to student1 creates an object named student1.
Example: S4 Class in R
Output
Slot "age":
[1] 21
Slot "GPA":
[1] 3.5
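The code behind this example is not shown in the notes. Below is a minimal sketch consistent with the Student_Info class described above and with the slot values in the output. The value "Alice" for the name slot is a hypothetical placeholder, because the output does not show it. The last lines show the formal structure being enforced, using a hypothetical bad value.

```r
library(methods)  # setClass() and new() come from the methods package

# Define an S4 class with three typed slots (member variables)
setClass("Student_Info",
         slots = list(name = "character", age = "numeric", GPA = "numeric"))

# Create an object; "Alice" is a hypothetical name
student1 <- new("Student_Info", name = "Alice", age = 21, GPA = 3.5)

student1          # prints each slot
print(student1@age)   # slots are accessed with @

# The formal definition is enforced: a character value for age is rejected
bad <- try(new("Student_Info", name = "Bob", age = "twenty", GPA = 3.0),
           silent = TRUE)
print(inherits(bad, "try-error"))  # TRUE
```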
Reference Class in R:
Defining a reference class is similar to defining an S4 class, but instead of setClass() we use the setRefClass() function. For example,
Output
[1] 3.5
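The code for this example is missing from the notes. The sketch below is consistent with the output [1] 3.5. The generator name StudentRef and the values "Alice" and 3.8 are hypothetical. Unlike S3 and S4 objects, reference class objects are modified in place. Assigning one to a new variable does not copy it, so an independent copy needs $copy().

```r
library(methods)  # setRefClass() comes from the methods package

# Define a reference class (hypothetical name) with three fields
StudentRef <- setRefClass("StudentRef",
  fields = list(name = "character", age = "numeric", GPA = "numeric"))

student1 <- StudentRef$new(name = "Alice", age = 21, GPA = 3.5)
print(student1$GPA)   # [1] 3.5, fields are accessed with $

# Reference semantics: student2 refers to the same object as student1
student2 <- student1
student2$GPA <- 3.8
print(student1$GPA)   # [1] 3.8

# $copy() creates an independent object
student3 <- student1$copy()
student3$GPA <- 4.0
print(student1$GPA)   # still 3.8
```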
BASIC PLOTTING:
One particularly popular feature of R is its incredibly flexible plotting tools for data and model
visualization.
Using plot with Coordinate Vectors:
The easiest way to think about generating plots in R is to treat your screen as a blank, two-dimensional canvas on which you plot points and lines using x- and y-coordinates. Coordinates are usually written as a pair: (x value, y value).
The R function plot( ) takes two vectors, one of x locations and one of y locations, and opens a graphics device that displays the result.
For example, let’s say you wanted to plot the points (1.1,2), (2,2.2), (3.5,−1.3), (3.9,0), and
(4.2,0.2). In plot, you must provide the vector of x locations first, and the y locations second.
Let’s define these as foo and bar, respectively:
R> foo <- c(1.1,2,3.5,3.9,4.2)
R> bar <- c(2,2.2,-1.3,0,0.2)
R> plot(foo,bar)
Graphical Parameters
There are a wide range of graphical parameters that can be supplied as arguments to the plot()
function. These parameters invoke simple visual enhancements, like coloring the points and
adding axis labels, and can also control technical aspects of the graphics device.
Some of the most commonly used graphical parameters are listed here:
type: Tells R how to plot the supplied coordinates (for example, as stand-alone points, joined by lines, or both dots and lines).
main, xlab, ylab: Options to include a plot title, the horizontal axis label, and the vertical axis label, respectively.
col: Color (or colors) to use for plotting points and lines.
pch: Stands for point character. This selects which character to use for plotting individual points.
cex: Stands for character expansion. This controls the size of plotted point characters.
lty: Stands for line type. This specifies the type of line used to connect the points (for example, solid, dotted, or dashed).
lwd: Stands for line width. This controls the thickness of plotted lines.
xlim, ylim: These provide limits for the horizontal range and vertical range (respectively) of the plotting region.
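The sketch below combines the parameters listed above in a single plot() call, using the foo and bar points from the earlier example. So that it also runs outside an interactive session, it draws into a temporary PDF file with pdf() and dev.off(). On screen, you would call plot() directly. The particular colors, symbols, and limits are arbitrary choices.

```r
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)

out <- tempfile(fileext = ".pdf")  # temporary file for the plot
pdf(out)
plot(foo, bar, type = "b",
     main = "Graphical parameters demo", xlab = "x", ylab = "y",
     col = "blue",             # color of points and lines
     pch = 19,                 # filled circle as the point character
     cex = 1.5,                # points 1.5 times the default size
     lty = 2,                  # dashed connecting line
     lwd = 2,                  # line twice the default width
     xlim = c(0, 5), ylim = c(-2, 3))
dev.off()                      # close the device and write the file
```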
Automatic Plot Types:
By default, the plot function plots individual points. To get a different plot type, supply a single character value to the type argument of plot().
Here are some of the most commonly used types:
Value Description
"p" Points Plot (Default)
"l" Line Plot
"b" Both Line and Points
"s" Step Plot
"n" No Plotting
"h" Histogram-like Plot
# draw a line
foo <- c(1.1,2,3.5,3.9,4.2)
bar <- c(2,2.2,-1.3,0,0.2)
plot(foo,bar,type = "l")
# draw both points and lines, with a title and axis labels
foo <- c(1.1,2,3.5,3.9,4.2)
bar <- c(2,2.2,-1.3,0,0.2)
plot(foo,bar,type="b",main="My plot",xlab="x axis label",
     ylab="location y")
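To compare the type values from the table side by side, the following sketch draws one panel per type on a single page using par(mfrow = c(2, 3)). As in the other plotting sketches here, it writes to a temporary PDF so that it runs non-interactively.

```r
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 9, height = 6)
par(mfrow = c(2, 3))  # 2 rows x 3 columns of panels on one page
for (plot_type in c("p", "l", "b", "s", "n", "h")) {
  plot(foo, bar, type = plot_type,
       main = paste0('type = "', plot_type, '"'))
}
dev.off()             # closing the device also discards the par() settings
```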
Color:
You can set colors with the col parameter in a number of ways. The simplest options are to use an integer selector or a character string. R recognizes many color names, which you can list by entering colors() at the prompt. The default color is integer 1 or the character string "black".
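The sketch below lists some recognized color names, converts names to their RGB components with col2rgb(), and shows several ways of passing col to plot(). It uses an integer selector (an index into the current palette()), a color name, a hexadecimal string, and a vector with one color per point. The specific colors are arbitrary. Output goes to a temporary PDF so the code runs non-interactively.

```r
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)

print(head(colors()))              # first few built-in color names
print("steelblue" %in% colors())   # TRUE
print(col2rgb(c("black", "red")))  # red/green/blue components (0 to 255)

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(foo, bar, pch = 19, col = 2, main = "col = 2 (palette entry)")
plot(foo, bar, pch = 19, col = "steelblue", main = 'col = "steelblue"')
plot(foo, bar, pch = 19, col = "#FF8800", main = 'col = "#FF8800"')
plot(foo, bar, pch = 19, col = c("red", "green", "blue", "orange", "purple"),
     main = "one color per point")
dev.off()
```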
File handling in R covers reading files and writing files. The four basic file operations are:
1. Creation of files
2. Writing to files
3. Reading data from a file
4. Checking whether a file exists
1. Creation of a file
Creating a file is usually the first operation performed in file handling. The file.create() function creates a file at run time in a specified location and returns TRUE if it succeeds.
Syntax
file.create("file-name-with-extension")
if (file.create("Demo.txt")) {
  print('Congrats! Your file has been created.')
} else {
  print('Unable to create file')
}
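2. Writing to a file
The write.table() function writes a data frame or matrix to a text file.
Syntax
write.table(x, file)
Parameters
x: the data to be written
file: the name of the file to write to
write.table(x = ToothGrowth[1:10, ], file = "Demo.txt")
3. Reading data from a file
The read.table() function reads a file in table format and returns a data frame.
Syntax
read.table("file-name-to-read-with-extension")
data <- read.table("Demo.txt")
print(data) # printing data
4. Check the existing status of a file
The file.exists() function returns TRUE if the file exists and FALSE otherwise.
Syntax
file.exists("file-name")
if (file.exists("Demo.txt")) {
  print('File `Demo.txt` exists!')
} else {
  print('File `Demo.txt` is unavailable')
}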
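The four operations can be combined into one round trip. This sketch works on a file in R's temporary directory instead of the working directory, so it does not overwrite an existing Demo.txt. It removes the file at the end with file.remove(). It uses the built-in ToothGrowth data set, as in the notes.

```r
path <- file.path(tempdir(), "Demo.txt")

# 1. Create the file (write.table() would also create it if missing)
created <- file.create(path)

# 2. Write the first 10 rows of ToothGrowth (row names are written too)
write.table(x = ToothGrowth[1:10, ], file = path)

# 3. Read it back; read.table() detects that the first line is a header
#    because it has one field fewer than the rows below it
data <- read.table(path)
print(data)

# 4. Check that the file exists, then clean up
print(file.exists(path))   # TRUE
file.remove(path)
print(file.exists(path))   # FALSE
```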
FUNCTIONS:
A function is a block of code that you can call and run from any part of your program. Functions break code into simple parts and avoid repeated code.
The function Command:
To define a function, use the function command and assign the results to an object name. Once
you’ve done this, you can call the function using that object name just like any other built-in or
contributed function in the workspace.
Function Creation
A function definition follows this standard format:
Syntax:
function_name <- function(parameter1, parameter2, ...) {
  # body of the function
}
For example, a function called power can take two parameters, a and b, and include code in its body that prints the value of a raised to the power b.
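The code for power() is not included in the notes. A minimal sketch, assuming the body prints a message built with paste(), is:

```r
# power() prints the value of a raised to the power b
power <- function(a, b) {
  print(paste("a raised to the power b is:", a^b))
}

power(2, 3)   # [1] "a raised to the power b is: 8"
```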
Call the Function
After you have defined the function, you can call it by its name followed by the arguments in parentheses, for example power(2, 3).
Return Values
You can use the return() function to return a value from a function. If a function has no explicit return(), it returns the value of the last expression evaluated in its body.
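A short sketch of both styles follows. It uses a version of power() that returns a^b instead of printing it. The second function, power2, is a hypothetical name used only to show the implicit return.

```r
# Explicit return()
power <- function(a, b) {
  return(a^b)
}
result <- power(2, 3)
print(result)        # [1] 8

# Implicit return: the last evaluated expression is the result
power2 <- function(a, b) {
  a^b
}
print(power2(2, 3))  # [1] 8
```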
Named Arguments:
In the above call of the power() function, arguments are matched to parameters by position, so they must be passed in the same order as the parameters in the function definition.
This means that when we call power(2, 3), the value 2 is assigned to a and 3 is assigned to b. If you want to pass the arguments in a different order, you can use named arguments.
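The sketch below uses the printing version of power() to contrast positional and named arguments. With names, the order of the arguments no longer matters.

```r
power <- function(a, b) {
  print(paste("a raised to the power b is:", a^b))
}

power(2, 3)          # positional: a = 2, b = 3 -> 8
power(b = 3, a = 2)  # named: same result, order does not matter -> 8
power(3, 2)          # positional: a = 3, b = 2 -> 9
```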
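Default Arguments:
A parameter can be given a default value in the function definition, and a call that omits that argument then uses the default. Assuming power() is defined with a default value for a, both calls below work:
power(2, 3)
# call function with default arguments
power(b=3)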
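The following minimal sketch shows a power() with a default argument. The default value a = 2 is an assumption, because the notes do not show the definition.

```r
# a defaults to 2 (assumed value) when the caller does not supply it
power <- function(a = 2, b) {
  print(paste("a raised to the power b is:", a^b))
}

power(2, 3)    # -> 8
power(b = 3)   # a takes its default value 2 -> 8
power(3, 3)    # the default is overridden -> 27
```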
R> myfibrec2(6)
[1] 8
R> myfibrec2(-3)
[1] 2
Warning message:
In myfibrec2(-3) :
Assuming you meant 'n' to be positive -- doing that instead
R> myfibrec2(0)
Error in myfibrec2(0) : 'n' is uninterpretable at 0
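The definition of myfibrec2 is not included in these notes. A minimal sketch consistent with the output above follows. It assumes the function computes the nth Fibonacci number recursively, and that it uses warning() for negative n and stop() for n = 0, with the same messages as shown.

```r
myfibrec2 <- function(n) {
  if (n < 0) {
    # recoverable problem: warn and carry on with the positive value
    warning("Assuming you meant 'n' to be positive -- doing that instead")
    n <- n * -1
  } else if (n == 0) {
    # unrecoverable problem: stop with an error
    stop("'n' is uninterpretable at 0")
  }
  if (n == 1 || n == 2) {
    return(1)
  } else {
    return(myfibrec2(n - 1) + myfibrec2(n - 2))
  }
}

myfibrec2(6)   # [1] 8
```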
For example, if you call the myfibrec2 function from earlier and pass it 0, the function throws
an error and terminates. But watch what happens when you pass that function call as the first
argument to try:
R> attempt1 <- try(myfibrec2(0),silent=TRUE)
Nothing seems to happen. What’s happened to the error? In fact, the error has still occurred, but
try has suppressed the printing of an error message to the console because you passed it the
argument silent set to TRUE.
The error information is now stored in the object attempt1, which is of class "try-error". To see
the error, simply print attempt1 to the console:
R> attempt1
[1] "Error in myfibrec2(0) : 'n' is uninterpretable at 0\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in myfibrec2(0): 'n' is uninterpretable at 0>
You would have seen this printed to the console if you’d left silent set to FALSE. Catching an
error this way can be really handy, especially when a function produces the error in the body
code of another function. Using try, you can handle the error without terminating that parent
function.
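As an illustration of that last point, the hypothetical function safe_sqrt_sum below calls try() on each element inside a loop. It checks the result with inherits(..., "try-error") and skips the failing element instead of terminating. The error message is recovered from the "condition" attribute shown above.

```r
# Hypothetical example: sum the square roots of non-negative values,
# skipping (rather than failing on) negative ones
safe_sqrt_sum <- function(values) {
  total <- 0
  for (v in values) {
    res <- try(if (v < 0) stop("negative input") else sqrt(v), silent = TRUE)
    if (inherits(res, "try-error")) {
      message("Skipping ", v, ": ", conditionMessage(attr(res, "condition")))
      next
    }
    total <- total + res
  }
  total
}

safe_sqrt_sum(c(4, -1, 9))   # 5, with a message about skipping -1
```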
TIMINGS AND VISIBILITY:
R is often used for lengthy numeric exercises, such as simulation or random variate generation.
For these complex, time-consuming operations, it’s often useful to keep track of progress or see
how long a certain task took to complete.
For example, you may want to compare the speed of two different programming approaches to a
given problem.
A progress bar shows how far along R is as it executes a set of operations. To show how this
works, you need to run code that takes a while to execute, which you’ll do by making R sleep.
The Sys.sleep() command makes R pause for a specified amount of time, in seconds, before
continuing.
R> Sys.sleep(3)
If you run this code, R will pause for three seconds before you can continue using the console.
To use Sys.sleep in a more common fashion, consider the following:
sleep_test <- function(n)
{
  result <- 0
  for(i in 1:n){
    result <- result + 1
    Sys.sleep(0.5)
  }
  return(result)
}
The sleep_test function is basic: it takes a positive integer n and adds 1 to the result value for n iterations. At each iteration, the loop also sleeps for half a second. Because of that sleep command, executing the following code takes about four seconds (8 × 0.5 seconds) to return a result:
R> sleep_test(8)
[1] 8
Now, say you want to track the progress of this type of function as it executes. You can
implement a textual progress bar with three steps: initialize the bar object with txtProgressBar,
update the bar with setTxtProgressBar, and terminate the bar with close. The next function,
prog_test, modifies sleep_test to include those three commands.
prog_test <- function(n){
  result <- 0
  progbar <- txtProgressBar(min=0,max=n,style=1,char="=")
  for(i in 1:n){
    result <- result + 1
    Sys.sleep(0.5)
    setTxtProgressBar(progbar,value=i)
  }
  close(progbar)
  return(result)
}
R> prog_test(8)
================================================================
[1] 8
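txtProgressBar() also supports style = 3, which draws the bar between bars at the ends and shows the percentage completed. The variant below is a hypothetical prog_test3. It differs from prog_test only in the style and a shorter sleep.

```r
prog_test3 <- function(n) {
  result <- 0
  progbar <- txtProgressBar(min = 0, max = n, style = 3)  # shows a percentage
  for (i in 1:n) {
    result <- result + 1
    Sys.sleep(0.1)
    setTxtProgressBar(progbar, value = i)   # advance the bar to step i
  }
  close(progbar)                            # finish the bar's output line
  return(result)
}

prog_test3(5)
```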
Measuring Completion Time:
If you want to know how long a computation takes to complete, you can use the Sys.time( )
command. This command outputs an object that details current date and time information based
on your system.
R> Sys.time()
[1] "2016-03-06 16:39:27 NZDT"
You can store objects like these before and after some code and then compare them to see how
much time has passed.
R> t1 <- Sys.time()
R> Sys.sleep(3)
R> t2 <- Sys.time()
R> t2-t1
Time difference of 3.012889 secs
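The same idea can compare two approaches to one problem, as suggested at the start of this section. The sketch below times a hypothetical loop-based loop_sum() against the built-in vectorized sum(). The first approach is timed with Sys.time() and difftime(). The second uses system.time(), a base R function that reports elapsed time directly. Timings depend on your machine, so no values are shown.

```r
# Approach 1: explicit loop (hypothetical helper)
loop_sum <- function(x) {
  total <- 0
  for (v in x) total <- total + v
  total
}

x <- runif(1e6)

t1 <- Sys.time()
s1 <- loop_sum(x)
t2 <- Sys.time()
print(difftime(t2, t1, units = "secs"))   # time taken by the loop

# Approach 2: vectorized sum(), timed with system.time()
timing <- system.time(s2 <- sum(x))
print(timing["elapsed"])                  # elapsed seconds

# Both approaches agree (all.equal() tolerates tiny rounding differences)
print(all.equal(s1, s2))
```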