Unit I - R Programming
UNIT I: INTRODUCTION
VARIABLES
Variables are used to store the information to be manipulated and referenced in the R program.
The R variable can store an atomic vector, a group of atomic vectors, or a combination of many R
objects.
R supports two ways of assigning a value to a variable:
1. Using the equal operator (=): The equal sign assigns the value on its right to the variable on
its left.
Syntax: variable_name = value
Ex: x = 10
2. Using the leftward operator (<-): The leftward arrow assigns the value on its right to the
variable on its left, so data is copied from right to left.
Syntax: variable_name <- value
Ex: x <- 20
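The two assignment forms can be tried out with a short sketch. The names x, y and is_less are illustrative only. The sketch also shows why the arrow must be written without a space: "y < -20" is a comparison, not an assignment.

```r
# Both operators bind a value to a name.
x = 10
y <- 20

# With a space inside the arrow, R compares y with -20 instead of assigning.
is_less <- y < -20

print(x)        # 10
print(y)        # 20
print(is_less)  # FALSE
```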
DATA TYPES
R data types are used in computer programming to specify the kind of data that can be stored in a
variable. The operating system allocates memory based on the data type of the variable and decides
what can be stored in the reserved memory.
The following data types are used in R programming:
1. Integer: This data type is used to store whole-number values. The suffix L marks a value as an
integer.
Ex: 3L, 66L, 2346L
2. Numeric: Decimal values are called numeric in R, and numeric is the default computational data
type. A number written without the L suffix is numeric.
Ex : 12, 32, 112, 54.32
3. Complex: A complex value in R consists of a real part and an imaginary part written with the
suffix i.
Ex : Z = 1+2i, t = 7+3i
4. Logical: It is a special data type for data with only two possible values, which can be construed
as true/false.
Ex : TRUE and FALSE
5. Character: In R programming, the character type is used to represent string values.
Ex : 'a', "good", "TRUE", '35.4'
OPERATORS
An operator is a symbol that tells the R interpreter to perform a specific logical or mathematical
manipulation. In R programming, there are different types of operators, and each operator
performs a different task.
They are as follows:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
1. Arithmetic Operators
The arithmetic operators are used to perform arithmetic operations such as addition, subtraction,
multiplication, division and modulo. An arithmetic expression is one that comprises
arithmetic operators and variables or constants. Here the variables and constants are called
operands. The arithmetic operators are as follows.
+ : addition
- : subtraction
* : multiplication
/ : division
%% : modulo
^ : power
Ex : a+b, a-b etc.
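A minimal sketch of each arithmetic operator applied to two numbers. The values 17 and 5 are chosen only for illustration.

```r
a <- 17
b <- 5

print(a + b)   # 22
print(a - b)   # 12
print(a * b)   # 85
print(a / b)   # 3.4
print(a %% b)  # 2, the remainder of 17 divided by 5
print(a ^ 2)   # 289
```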
2. Relational Operators
Relational operators are used to construct relational expressions, which are used to compare
two quantities. A relational expression is of the form operand1 operator operand2. The relational
operators are as follows.
< : is less than
> : is greater than
>= : is greater than or equal to
<= : is less than or equal to
== : is equal to
!= : is not equal to
Ex: a < b, a == 10
3. Logical Operators
These are used to construct compound conditional expressions. The logical operators && and
|| combine two single logical values to make a decision, & and | combine logical vectors
element by element, and ! negates a conditional expression.
The logical operators are
&& : Logical AND (single values)
|| : Logical OR (single values)
! : Logical NOT
& : Element-wise logical AND
| : Element-wise logical OR
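The difference between the element-wise and the single-value operators can be sketched as follows. The vectors u and v and the variable in_range are hypothetical names used only for this illustration.

```r
u <- c(TRUE, FALSE, TRUE)
v <- c(TRUE, TRUE, FALSE)

# Element-wise operators return one result per pair of elements.
both   <- u & v   # TRUE FALSE FALSE
either <- u | v   # TRUE TRUE TRUE
negated <- !u     # FALSE TRUE FALSE

# The long forms combine two single logical values into one result.
a <- 5
in_range <- (a > 0) && (a < 10)   # TRUE

print(both)
print(either)
print(negated)
print(in_range)
```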
4. Assignment Operators
Assignment operators are used to assign the result of an expression to a variable.
Syntax: variable <- expression
<- : Left assignment operator
= : Equal operator
Ex: a <- 20, C = 90
5. Miscellaneous Operators
Miscellaneous operators are used for a special and specific purpose. These operators are not
used for general mathematical or logical computation.
: The colon operator creates a sequence of consecutive numbers as a vector.
Ex: v <- 1:8
print(v)
Output:
1 2 3 4 5 6 7 8
class() FUNCTION
class() is a built-in function that returns the data type (class) of the variable passed to it.
Syntax: class(variable)
Ex: var1 = "hello"
print(class(var1))
Output: "character"
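As a further illustration, class() can be applied to one value of each data type listed under DATA TYPES. The name types is used only for this sketch.

```r
# One literal of each basic data type, in the order used under DATA TYPES.
types <- c(class(3L), class(54.32), class(1+2i), class(TRUE), class("good"))

print(types)
# "integer" "numeric" "complex" "logical" "character"
```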
VECTORS
In R, a sequence of elements that share the same data type is known as a vector. Vectors are
classified into two kinds: atomic vectors and lists.
CREATION OF VECTOR
1) Using the c() function
A vector can be created by using the c() function. This function returns a one-dimensional array, or
simply a vector.
Syntax: Vector_Name <- c(List of Elements)
Ex: Myvec <- c (1,3,1,4,2)
print(Myvec)
Output: 1 3 1 4 2
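2) Using the colon (:) operator
The colon operator creates a vector of consecutive numbers.
Ex: print(1:10)
Output: 1 2 3 4 5 6 7 8 9 10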
3) Using the seq() function
The seq() function creates a sequence of elements as a vector. The step size is set with the
'by' parameter.
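A minimal sketch of seq() with the by parameter. The second call uses the length.out parameter, which is not mentioned above and is shown only as an alternative way to control the sequence; the names s1 and s2 are illustrative.

```r
# Step through 1..10 in steps of 2.
s1 <- seq(1, 10, by = 2)
print(s1)   # 1 3 5 7 9

# Ask for a fixed number of evenly spaced values instead of a step size.
s2 <- seq(0, 1, length.out = 5)
print(s2)   # 0.00 0.25 0.50 0.75 1.00
```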
Numeric vector: A vector which contains numeric elements is known as a numeric vector. If
we assign a decimal value to any variable, then that variable will become a numeric type.
Ex: num_vec<-c(10.1, 10.2, 33.2)
print(num_vec )
class(num_vec)
Integer vector
A numeric value with no fractional part can be stored as integer data. A value is made an
integer by appending L to it.
Ex: int_vec1<-c(1L,2L,3L,4L,5L)
print(int_vec1)
class(int_vec1)
Output: 1 2 3 4 5
"integer"
Logical vector
The logical data type has only two values, TRUE and FALSE. These values indicate whether a
condition is satisfied. A vector that contains Boolean values is known as a logical
vector.
Ex: d<- 5
e<- 6
f<- 7
log_vec<-c(d<e, d<f, e<d,e<f,f<d,f<e)
print(log_vec)
class(log_vec)
VECTOR OPERATION
1) Combining vectors
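Combining two or more vectors with c() forms a new vector that contains all the elements of
each vector. If the vectors have different data types, the elements are coerced to a common type;
in the example below the numbers become character strings.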
Ex: p <- c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r <- c (p, q)
print (r)
Output: "1" "2" "4" "5" "7" "8" "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
2) Arithmetic operations
We can perform all the arithmetic operation on vectors. The arithmetic operations are performed
member-by-member on vectors.
Ex: a<-c(1,3,5,7)
b<-c(2,4,6,8)
print (a+b)
print (a-b)
print (a*b)
print (a%%b)
Output: 3 7 11 15
-1 -1 -1 -1
2 12 30 56
1 3 5 7
MATRICES
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by
passing a vector to the matrix() function.
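A minimal sketch of matrix creation, assuming the values 3 to 14 are filled row by row and that row and column names are supplied through dimnames. The names P, row_names and col_names are placeholders chosen for this sketch; the result is consistent with the output shown next.

```r
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")

# byrow = TRUE fills the matrix one row at a time instead of column by column.
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(row_names, col_names))
print(P)
```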
Output:
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14
Ex: For the matrix R created above, elements are accessed as follows
#Accessing the element in the 3rd row and 2nd column
print(R[3,2])
Output:
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 20 13
row4 14 15 16
Output:
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
17 18 19
Output:
col1 col2 col3
row1 5 6 7 17
row2 8 9 10 18
row3 11 12 13 19
row4 14 15 16 20
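The three outputs above are consistent with changing one element, appending a row, and appending a column of a 4 x 3 matrix that holds the values 5 to 16. A minimal sketch under that assumption follows; the names changed, with_row and with_col are hypothetical.

```r
R <- matrix(5:16, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

# Replace the element in the 3rd row and 2nd column.
changed <- R
changed[3, 2] <- 20
print(changed)

# rbind() appends a row; cbind() appends a column.
with_row <- rbind(R, c(17, 18, 19))
with_col <- cbind(R, c(17, 18, 19, 20))
print(with_row)
print(with_col)
```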
MATRIX OPERATIONS
In R, we can perform the mathematical operations on a matrix such as addition, subtraction,
multiplication, etc.
Ex: R <- matrix(c(5:16), nrow = 4,ncol=3)
S <- matrix(c(1:12), nrow = 4,ncol=3)
sum<-R+S
print(sum)
sub<-R-S
print(sub)
mul<-R*S
print(mul)
div<-R/S
print(div)
Shruthi S, Asst. Professor, GSSS SSFGC, Mysuru Page 11
Statistical Analysis and R Programming 2024-25
Output:
[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28
ARRAYS
In R, arrays are data objects that store data in more than two dimensions. An array is created
using the array() function. This function takes a vector as input and uses the values in the dim
parameter to create the array.
Ex: An array of dimension (2, 3, 4) consists of 4 rectangular matrices, each with
2 rows and 3 columns.
Syntax:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
Where data: the input vector given to the array.
row_size: the number of rows in each matrix.
column_size: the number of columns in each matrix.
matrices: the number of matrices in the array.
dim_names: a list used to change the default names of rows, columns and matrices.
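A minimal sketch of the syntax above, assuming two input vectors whose nine values are recycled to fill two 3 x 3 matrices. The names res and named and the dimension labels are chosen to be consistent with the outputs that follow, but they are assumptions of this sketch.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)

# Nine values are recycled to fill 3 * 3 * 2 = 18 cells.
res <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res)

# The same array with names for rows, columns and matrices.
named <- array(c(vec1, vec2), dim = c(3, 3, 2),
               dimnames = list(c("Row1", "Row2", "Row3"),
                               c("Col1", "Col2", "Col3"),
                               c("Matrix1", "Matrix2")))
print(named)
```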
Output:
,,1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
,,2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
Output:
, , Matrix1
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15
, , Matrix2
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15
MANIPULATION OF ELEMENTS
An array is made up of matrices in multiple dimensions, so operations on the elements of an
array are carried out by accessing the elements of the matrices.
Ex: #Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)
res1 <- array(c(vec1,vec2),dim=c(3,3,1))
print(res1)
vec1 <-c(8,4,7)
vec2 <-c(16,73,48,46,36,73)
res2 <- array(c(vec1,vec2),dim=c(3,3,1))
print(res2)
#Extracting the matrix from each array and adding them element by element
mat1 <- res1[,,1]
mat2 <- res2[,,1]
res3 <- mat1+mat2
print(res3)
Output:
,,1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
,,1
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73
LISTS
In R, a list is a data structure whose components can be of mixed data types. Lists are R objects
that can contain elements of different types, such as numbers, vectors, strings and even another
list.
The function used to create a list in R is list().
Ex: list_1<-list(1,2,3)
list_2<-list("Shubham","Arpita","Vaishali")
print(list_1)
print(list_2)
Output:
1
2
3
"Shubham"
"Arpita"
"Vaishali"
1. Creating a list.
2. Assign a name to the list elements with the names() function.
3. Print the list data.
Ex: list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow= 2),
list("BCA","MCA","B.tech"))
names (list_data) <- c("Students", "Marks", "Course")
print(list_data)
Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
Output:
"Shubham" "Arpita" "Nishka"
Output:
$Student
"Shubham" "Arpita" "Nishka"
Output:
"Moradabad"
$<NA>
NULL
$Course
"Masters of computer applications"
Output: 1 2 3 4 5
10 11 12 13 14
11 13 15 17 19
MERGING LISTS
R allows us to merge two or more lists into one list. To merge lists, pass all of them to the
c() function; it returns a single list that contains all the elements present in the input lists.
Passing the lists to list() instead produces a list whose elements are the original lists.
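A minimal sketch of merging, using two hypothetical lists named even_list and odd_list. It contrasts c(), which flattens the inputs into one list, with list(), which nests them.

```r
even_list <- list(2, 4, 6, 8, 10)
odd_list  <- list(1, 3, 5, 7, 9)

# c() joins the elements of both lists into a single list of 10 elements.
merged <- c(even_list, odd_list)
print(length(merged))   # 10

# list() keeps the two lists as separate elements of a new list.
nested <- list(even_list, odd_list)
print(length(nested))   # 2
```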
7
9
DATA FRAMES
A data frame is a two-dimensional array-like structure or a table in which each column contains
the values of one variable, and each row contains one set of values from each column.
A data frame is a special case of a list in which each component has equal length. A matrix can
contain only one type of data, but a data frame can contain different data types such as numeric,
character, factor, etc.
A data frame has the following characteristics.
The column names should be non-empty.
The row names should be unique.
The data stored in a data frame can be of factor, numeric, or character type.
Each column contains the same number of data items.
In R, data frames are created with the data.frame() function. This function takes vectors
of any type such as numeric, character, or integer.
Ex: Create a data frame that contains employee id (integer vector), employee name (character
vector), salary (numeric vector), and starting date (Date vector).
Ex: emp.data <- data.frame(employee_id = c(1:5),
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,915.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")))
print(emp.data)
Output:
employee_id employee_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 915.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27
Output:
emp.data.employee_id emp.data.sal
1 623.30
2 515.20
3 611.00
4 729.00
5 843.25
Output:
employee_id employee_name sal starting_date
1 Shubham 623.3 2012-01-01
Output:
employee_id starting_date
2 2013-09-23
3 2014-11-15
Output:
employee_id employee_name sal starting_date Address
1 Shubham 623.30 2012-01-01 Moradabad
2 Arpita 515.20 2013-09-23 Lucknow
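The first three outputs above look like the results of extracting columns, a single row, and a block of rows and columns from emp.data. A minimal sketch under that assumption follows, with the data frame rebuilt from the values in the example code so that the block runs on its own. The names cols, first_row and block are hypothetical.

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 915.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27"))
)

# Extract two columns with the $ operator and combine them into a new data frame.
cols <- data.frame(emp.data$employee_id, emp.data$sal)
print(cols)

# Extract the first row (all columns).
first_row <- emp.data[1, ]
print(first_row)

# Extract rows 2 and 3 of columns 1 and 4.
block <- emp.data[c(2, 3), c(1, 4)]
print(block)
```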
NON-NUMERIC VALUES
LOGICAL VALUES
Logical values can take only two values: TRUE or FALSE.
Logical values represent binary states such as yes/no and one/zero.
Logical values are used to indicate whether a condition has been met or not.
TRUE and FALSE Notation
Logical values are represented as TRUE and FALSE.
Assigning Logical values
Ex: b1 <- TRUE
b2 <- FALSE
Output:
TRUE
all(): Checks whether all elements in a vector meet a specific condition. It returns TRUE if all
elements satisfy the condition; otherwise, it returns FALSE
Ex: vector2 <- c(1, 2, 3, 4, 5)
result <- all(vector2 > 0)
Output:
TRUE
ii) Long versions are for comparing individual values. Long versions return a single
logical value.
Ex: `&&`, `||`
In R versions before 4.3.0, the long versions evaluate only the first pair of logicals when given
two vectors. From R 4.3.0 onward, passing a vector with more than one element to `&&` or `||`
is an error.
Output:
24
STRING
A string is a value of the character data type.
It is used to represent text or character data.
Strings can consist of almost any combination of characters, including numbers.
Strings are commonly used for storing and manipulating textual information,
for example names, sentences, and text data extracted from files or databases.
Strings can be created by using single or double quotation marks.
Ex: single_quoted <- 'This is a single-quoted string.'
double_quoted <- "This is a double-quoted string."
nchar(): It is used to determine the number of characters in a given string. It calculates and returns
the length of a string in terms of the number of characters
Ex: my_string <- "Hello, World!"
string_length <- nchar(my_string)
cat("The length of the string is:", string_length)
Output:
The length of the string is: 13
CONCATENATION
Two main functions are used for concatenating strings: `cat` and `paste`.
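The difference between the two functions can be sketched briefly: cat() prints its arguments and returns nothing useful, while paste() returns a new string that can be stored. The sep argument and the paste0() variant are standard R, but the names p1, p2 and p3 are hypothetical.

```r
# cat() writes the joined text to the console without quotation marks.
cat("Hello", "World", "\n")

# paste() returns the joined string; the default separator is a single space.
p1 <- paste("Hello", "World")
p2 <- paste("Hello", "World", sep = ", ")

# paste0() joins with no separator at all.
p3 <- paste0("Hello", "World")

print(p1)   # "Hello World"
print(p2)   # "Hello, World"
print(p3)   # "HelloWorld"
```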
1) Using the cat() Function
cat() concatenates its arguments and prints the result, with an optional separator. The printed
text is not wrapped in quotation marks.
Ex: cat("Hello", "World")
Output:
Hello World
Output:
"Hello World"
Output:
"Hello, World"
ESCAPE SEQUENCES
The backslash (\) is used to invoke an escape sequence.
Escape sequences make it possible to enter characters that control the format and spacing of the string.
Ex: `\n` starts a newline.
`\t` represents a horizontal tab.
`\b` invokes a backspace.
`\\` is used to include a single backslash.
`\"` includes a double quote.
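A minimal sketch of these escape sequences inside one string. The string contents and the names msg and tab_len are made up for illustration.

```r
# \t inserts a tab, \n a newline, \\ a backslash and \" a double quote.
msg <- "Name:\tR\nPath:\tC:\\data\nQuote:\t\"hi\""
cat(msg, "\n")

# Each escape sequence counts as a single character.
tab_len <- nchar("a\tb")
print(tab_len)   # 3
```

cat() interprets the escape sequences when printing, whereas print() displays the string with the escapes still written out.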
Output:
"Hello"
2. sub(): It is used for replacing the first occurrence of a pattern within a string
Ex: text <- "I like apples, but apples are red."
new_text <- sub("apple", "banana", text)
Output:
I like bananas, but apples are red.
Output:
I like bananas, but bananas are red.
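The second output above, where every occurrence is replaced, matches what gsub() produces. A minimal sketch comparing the two functions on the same text follows; the names first_only and every_one are hypothetical.

```r
text <- "I like apples, but apples are red."

# sub() replaces only the first match of the pattern.
first_only <- sub("apple", "banana", text)

# gsub() replaces every match of the pattern.
every_one <- gsub("apple", "banana", text)

print(first_only)   # "I like bananas, but apples are red."
print(every_one)    # "I like bananas, but bananas are red."
```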
SPECIAL VALUES
When a data set has missing observations, or when a practically infinite number is calculated, R
has some special terms reserved for these situations.
They are
Inf and -Inf: When a number is too large for R to represent, the value is given as infinite.
Ex: 1 / 0
Inf+1
Output: Inf
NaN (Not a Number): In some situations it is impossible to express the result of a calculation
as a number; in such cases NaN is given as the output.
Ex: -Inf+Inf
Inf/Inf
0/0
Output: NaN
NA (Not Available): If a value is not defined or is missing, for example when an index is out of
range, NA is printed as the output.
Ex: X <- c(1,2,3)
X[4]
Output: NA
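Special values can be detected with R's built-in checking functions. These functions are not covered in the text above, so the sketch below is offered only as a supplementary illustration; the vector x is hypothetical.

```r
x <- c(1, NA, NaN, Inf, -Inf)

print(is.na(x))        # FALSE  TRUE  TRUE FALSE FALSE (NaN also counts as NA)
print(is.nan(x))       # FALSE FALSE  TRUE FALSE FALSE
print(is.infinite(x))  # FALSE FALSE FALSE  TRUE  TRUE
print(is.finite(x))    #  TRUE FALSE FALSE FALSE FALSE
```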
COERCION
In R programming, converting one object or data type to another object or data type is referred
to as coercion.
There are two types of coercion
1. Implicit coercion: This type of coercion occurs automatically.
Ex: The logical value TRUE is treated as 1 and FALSE is treated as 0.
2. Explicit coercion: This type of coercion is done with the As-Dot Coercion Functions (such as
as.numeric()), while the Is-Dot Object-Checking Functions (such as is.vector()) test what type
an object has.
is.vector(num.vec1) # TRUE
is.logical(num.vec1) # FALSE
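Both kinds of coercion can be sketched as follows. The vector num.vec1 is not defined in the text, so a small numeric vector is assumed here; total and mixed are hypothetical names.

```r
# Implicit coercion: TRUE counts as 1 and FALSE as 0 in arithmetic.
total <- TRUE + TRUE + FALSE
print(total)   # 2

# Implicit coercion: mixed types in c() are converted to a common type.
mixed <- c(1, "a", TRUE)
print(mixed)   # "1" "a" "TRUE"

# Explicit coercion with as-dot functions (num.vec1 is an assumed example vector).
num.vec1 <- c(1.5, 2.7, 3.2)
print(as.integer(num.vec1))    # 1 2 3 (fractional parts are dropped)
print(as.character(num.vec1))  # "1.5" "2.7" "3.2"
print(as.numeric("35.4"))      # 35.4

# Is-dot functions only check the type; they do not convert.
print(is.vector(num.vec1))   # TRUE
print(is.logical(num.vec1))  # FALSE
```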
GRAPHICAL PARAMETERS
A wide range of graphical parameters can be supplied as arguments to the plot
function.
type : The type parameter tells R how to plot the supplied coordinates.
The default value for type is "p", which can be interpreted as "points only". type="l"
means "lines only", "b" means both points and lines, and "o" means overplotting the points with
lines. The option type="n" results in no points or lines being plotted.
Ex: foo <- c(1.1, 2, 3.5, 3.9 ,4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)
plot(foo,bar,type="b")
main, xlab, ylab : Options to include the plot title, the horizontal axis label, and the vertical axis
label, respectively.
Ex: plot(foo,bar,type="b",main="My lovely plot", xlab="x axis label", ylab="location y")
plot(foo,bar,type="b",main="My lovely plot\ntitle on two lines",xlab="", ylab="")
col : The color to use for plotting points and lines. The simplest options are to use an
integer selector or a character string. The default color is integer 1 or the character string
"black". There are eight possible integer values and around 650 character strings to specify
color. Colors can also be specified using RGB (red, green, and blue) levels.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="",col=2)
pch : pch stands for point character. This parameter controls the character used to plot
individual data points. You can supply a single character to use for each point, or specify a
value between 1 and 25. The symbols corresponding to each integer are shown below.
Ex: foo <- c(1.1, 2, 3.5, 3.9 ,4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)
plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="", col=4,pch=8)
cex : It stands for character expansion. This controls the size of plotted point characters.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="", col=4, pch=8, cex=2.3)
lty : It stands for line type. This specifies the type of line to use to connect the points (for
example, solid, dotted, or dashed). It takes the values 1 through 6. These options are shown
in the figure below.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="", col=4,pch=8,lty=2)
lwd: It stands for line width. This controls the thickness of plotted lines.
Ex: plot(foo,bar,type="b",main="My lovely plot", xlab="",ylab="",col=4, pch=8,lty=2,
cex=2, lwd=3.3)
xlim, ylim : These provide limits for the horizontal range and vertical range (respectively)
of the plotting region.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="", col=6, pch=15, lty=3,
cex=0.7,lwd=2, xlim=c(3,5), ylim=c(-0.5,0.2))
DISADVANTAGES
Basic Security: R lacks basic security features, which are an essential part of most programming
languages. Because of this, R is difficult to embed in a web application.
Lesser Speed: R can be much slower than other programming languages
such as MATLAB and Python. R packages are also often slower than comparable tools in other
languages.
Complicated Language: People who do not have prior knowledge or programming
experience may find it difficult to learn R.