R Content

R is a programming language and software environment designed for statistical analysis and graphics, created in 1993 by Ross Ihaka and Robert Gentleman. It features a variety of data types, including vectors, lists, and data frames, and supports conditional statements and loops for decision making and iteration. R allows dynamic typing, meaning variables can change data types, and provides extensive tools for data manipulation and visualization.

Uploaded by

Ankit Sharma
R Language

➢ R is a programming language and software environment for statistical analysis, graphics representation and reporting.

➢ R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in 1993 and is currently developed by the R Development Core Team.

➢ The core of R is an interpreted computer language which allows branching and looping as well as modular programming using functions. R allows integration with procedures written in C, C++, .NET, Python or FORTRAN for efficiency.


Features of R Language
➢ R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
➢ R has an effective data handling and storage facility.
➢ R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
➢ R provides a large, coherent and integrated collection of tools for data analysis.
➢ R provides graphical facilities for data analysis and display, either directly on the screen
or printed on paper.
Comments

Comments are explanatory text in your R program; the interpreter ignores them while
executing the program. A single-line comment begins with #, as follows:

# My first program in R Programming


Data Types in R
In contrast to other programming languages like C and Java, the variables in R are not
declared as some data type. The variables are assigned with R-Objects and the data type of
the R-object becomes the data type of the variable. There are many types of R-objects. The
frequently used ones are −

➢ Vectors
➢ Lists
➢ Matrices
➢ Arrays
➢ Factors
➢ Data Frames
Data Type    Example
Logical      TRUE, FALSE
Numeric      12.3, 5, 999
Integer      2L, 34L, 0L
Complex      3 + 2i
Character    'a', "good", "TRUE", '23.4'
Raw          "Hello" is stored as 48 65 6c 6c 6f
Raw Data Types

v <- charToRaw("Hello")

print(class(v))
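As a small illustration of the raw type, the sketch below prints the bytes behind a string and converts them back. The variable names are chosen only for this example.

```r
# Convert a string to its raw bytes (printed in hexadecimal).
raw_bytes <- charToRaw("Hello")
print(raw_bytes)          # 48 65 6c 6c 6f
print(class(raw_bytes))   # "raw"

# Convert the bytes back to a character string.
round_trip <- rawToChar(raw_bytes)
print(round_trip)         # "Hello"
```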
Logical
x = 1; y = 2
z = x > y          # is x larger than y?
print(z)           # FALSE
print(class(z))    # "logical"

u = TRUE; v = FALSE
z = u & v          # logical AND; u | v is OR and !u is negation
print(z)           # FALSE
Numeric
Decimal values are called numerics in R. Numeric is the default computational data type. If we
assign a decimal value to a variable x as follows, x will be of numeric type.
> x = 10.5       # assign a decimal value
> x              # print the value of x
[1] 10.5
> class(x)       # print the class name of x
[1] "numeric"

Even if we assign a whole number to a variable k, it is still stored as a numeric value. The
fact that k is not an integer can be confirmed with the is.integer function.

> k = 1
> is.integer(k)  # is k an integer?
[1] FALSE
Integer
In order to create an integer variable in R, we invoke the as.integer function. We can be
assured that y is indeed an integer by applying the is.integer function.
> y = as.integer(3)
> y              # print the value of y
[1] 3
> class(y) # print the class name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE
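The following sketch shows two related points that the text above implies but does not spell out: an integer can also be written directly with the L suffix, and as.integer() coerces other kinds of values. The variable name n is used only for this example.

```r
# Declare an integer literal directly with the L suffix.
n <- 3L
print(class(n))            # "integer"

# as.integer() drops the fractional part of a numeric value.
print(as.integer(3.14))    # 3

# A character string holding a number is parsed and truncated as well.
print(as.integer("5.27"))  # 5

# Logical values map to 1 (TRUE) and 0 (FALSE).
print(as.integer(TRUE))    # 1
```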
Complex Number
A complex value in R is defined via the pure imaginary value i.
> z = 1 + 2i     # create a complex number
> z              # print the value of z
[1] 1+2i
> class(z)       # print the class name of z
[1] "complex"
The following returns NaN with a warning, because -1 is not a complex value.
> sqrt(-1)       # square root of -1
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
Instead, we have to use the complex value -1 + 0i.
> sqrt(-1+0i)    # square root of -1+0i
[1] 0+1i
An alternative is to coerce -1 into a complex value.
> sqrt(as.complex(-1))
[1] 0+1i
In R programming, the most basic R-objects are vectors, which hold elements of the classes
shown above. Note that the number of classes in R is not confined to the six types above.
For example, we can combine many atomic vectors into an array, whose class is array.


A variable provides us with named storage that our programs can manipulate. A variable in R
can store an atomic vector, a group of atomic vectors or a combination of many R-objects. A
valid variable name consists of letters, numbers and the dot or underscore characters. The
name must start with a letter, or with a dot that is not followed by a number.
Variable Name          Validity   Reason
var_name2.             valid      Has letters, numbers, dot and underscore.
var_name%              invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name              invalid    Starts with a number.
.var_name, var.name    valid      Can start with a dot (.), but the dot should not be followed by a number.
.2var_name             invalid    The starting dot is followed by a number.
_var_name              invalid    Starts with an underscore (_).
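One way to check these rules programmatically is sketched below. It assumes that a name counts as valid when base R's make.names() returns it unchanged, since make.names() rewrites anything that is not a syntactic name (it also alters reserved words such as if). The names candidates and is_valid are hypothetical.

```r
# Candidate names taken from the table above.
candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# make.names() turns any string into a syntactically valid name,
# so a string that comes back unchanged was already valid.
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
print(is_valid)
```

The printed vector should agree with the Validity column of the table.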
Variable Assignment
Variables can be assigned values using the leftward (<-), rightward (->) or equal (=) operator.
The values of the variables can be printed using the print() or cat() function. The cat()
function combines multiple items into a continuous print output.

# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3
print(var.1)
cat("var.1 is ", var.1, "\n")
cat("var.2 is ", var.2, "\n")
cat("var.3 is ", var.3, "\n")
Data Type of a Variable

In R, a variable itself is not declared with any data type; rather, it gets the data type of
the R-object assigned to it. So R is called a dynamically typed language, which means that we
can change the data type of the same variable again and again when using it in a program.
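A minimal sketch of dynamic typing follows. The variable name var_x is chosen only for illustration; the same variable is reassigned three times and its class follows the assigned object.

```r
var_x <- "Hello"
cat("The class of var_x is", class(var_x), "\n")   # character

var_x <- 34.5
cat("Now the class of var_x is", class(var_x), "\n")   # numeric

var_x <- 27L
cat("Next the class of var_x becomes", class(var_x), "\n")   # integer
```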
Finding Variables

To list all the variables currently available in the workspace, we use the ls() function.

print(ls())

The ls() function can also use patterns to match the variable names.

# List the variables whose names contain the pattern "var".
print(ls(pattern = "var"))

Variables starting with a dot (.) are hidden; they can be listed by passing the
"all.names = TRUE" argument to the ls() function.

print(ls(all.names = TRUE))
Deleting Variables

Variables can be deleted by using the rm() function. Below we delete the variable var.3.
Printing the variable afterwards throws an error.

rm(var.3)
print(var.3)

When we execute the above code, it produces the following result −

Error in print(var.3) : object 'var.3' not found



Decision Making

Decision making is an important part of programming. This can be achieved in R
programming using the conditional if...else statement.

if statement

The syntax of if statement is:

if (test_expression) {
  statement
}

If the test_expression is TRUE, the statement gets executed. But if it's FALSE, nothing
happens.

Here, test_expression should be a single logical or numeric value. Older versions of R used
only the first element of a longer vector (with a warning); since R 4.2.0, a condition of
length greater than one is an error.

In the case of a numeric value, zero is taken as FALSE and any other number as TRUE.

Example of if...else statement
x <- -5
if (x >= 0) {
  print("Non-negative number")
} else {
  print("Negative number")
}

The same test can be written on a single line:

if (x >= 0) print("Non-negative number") else print("Negative number")


Nested if
if (test_expression1) {
  statement1
} else if (test_expression2) {
  statement2
} else if (test_expression3) {
  statement3
} else {
  statement4
}
Only one statement gets executed, depending on the test_expressions.
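The ladder can be sketched with a concrete case. The grading thresholds and the names score and grade below are hypothetical, chosen only to show that the first matching branch wins and the rest are skipped.

```r
score <- 72

# The tests are tried from top to bottom; only the first TRUE branch runs.
if (score >= 90) {
  grade <- "A"
} else if (score >= 75) {
  grade <- "B"
} else if (score >= 60) {
  grade <- "C"
} else {
  grade <- "D"
}

print(grade)   # "C"
```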
ifelse() function

ifelse(test_expression,x,y)

Here, test_expression must be a logical vector (or an object that can be coerced to logical).
The return value is a vector with the same length as test_expression.

This returned vector has an element from x if the corresponding value of test_expression is
TRUE, or from y if the corresponding value of test_expression is FALSE.

Example of ifelse function

a = c(5,7,2,9)
ifelse(a %% 2 == 0, "even", "odd")   # returns "odd" "odd" "even" "odd"
Looping in R

Syntax of for loop

for (val in sequence) {
  statement
}

Here, sequence is a vector and val takes on each of its values during the loop. In each
iteration, statement is evaluated.

Example of for loop

# Count the even numbers in x.
x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
  if (val %% 2 == 0) count = count + 1
}
print(count)   # 3
While loop

while (test_expression) {
  statement
}

Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE.
The statements inside the loop are executed and the flow returns to evaluate the
test_expression again. This repeats until test_expression evaluates to FALSE.
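A minimal sketch of a while loop follows, using a hypothetical counter i. The loop prints the numbers 1 to 5 and stops once the test i < 6 becomes FALSE.

```r
i <- 1

# The test is checked before every iteration.
while (i < 6) {
  print(i)
  i <- i + 1   # without this update the loop would never end
}
```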
Break Statement

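A break statement inside a loop stops the loop immediately, and control passes to the first
statement after the loop.

Example 1: break statement

x <- 1:5
for (val in x) {
  if (val == 3) {
    break
  }
  print(val)
}

Here the loop prints 1 and 2, then exits when val reaches 3.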
next Statement

A next statement is useful when we want to skip the current iteration of a loop without
terminating it. On encountering next, R skips the rest of the current iteration and starts
the next iteration of the loop.

x <- 1:5
for (val in x) {
  if (val == 3) {
    next
  }
  print(val)
}

Here the loop prints 1, 2, 4 and 5; the value 3 is skipped.
repeat loop

A repeat loop is used to iterate over a block of code multiple times. There is no condition
check in a repeat loop to exit the loop.

We must put a condition explicitly inside the body of the loop ourselves and use the break
statement to exit the loop. Failing to do so results in an infinite loop.

repeat {
  statement
}
Example of repeat

x <- 1
repeat {
  print(x)
  x = x + 1
  if (x == 6) {
    break
  }
}
Sequence Generation

The seq() function generates regular sequences. seq is a generic function with several forms:

➢ seq(from, to)
➢ seq(from, to, by=)
➢ seq(from, to, length.out=)
➢ seq(along.with=)
Details
The first form generates the sequence from, from+1, ..., to.

The second form generates from, from+by, ..., without going past to.

The third generates a sequence of length.out equally spaced values from from to to.

The last generates the sequence 1, 2, ..., length(along.with).
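The four forms can be sketched as follows. The variable names s1 to s4 are hypothetical. The full argument names length.out and along.with are used here; shorter spellings such as length= and along= also work through R's partial argument matching, and seq_along() is a dedicated function for the last form.

```r
s1 <- seq(1, 5)                          # 1 2 3 4 5
s2 <- seq(1, 10, by = 3)                 # 1 4 7 10
s3 <- seq(0, 1, length.out = 5)          # 0.00 0.25 0.50 0.75 1.00
s4 <- seq(along.with = c("a", "b", "c")) # 1 2 3, same as seq_along()

print(s1)
print(s2)
print(s3)
print(s4)
```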


Vectors
A vector is a sequence of data elements of the same basic type. Members in a vector are
officially called components.

To create a vector with more than one element, use the c() function, which combines the
elements into a vector.

# Create a vector.
apple <- c('red','green',"yellow")
print(apple)

# Get the class of the vector.
print(class(apple))   # "character" is printed
And here is a vector of logical values.
> c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1] TRUE FALSE TRUE FALSE FALSE

A vector can contain character strings.
> c("aa", "bb", "cc", "dd", "ee")
[1] "aa" "bb" "cc" "dd" "ee"

Incidentally, the number of members in a vector is given by the length function.
> length(c("aa", "bb", "cc", "dd", "ee"))
[1] 5
Combining Vectors
Vectors can be combined via the function c. For example, the following two vectors n and s
are combined into a new vector containing elements from both vectors.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> c(n, s)
[1] "2" "3" "5" "aa" "bb" "cc" "dd" "ee"

Value Coercion
In the code snippet above, notice how the numeric values are being coerced into character
strings when the two vectors are combined. This is necessary so as to maintain the same
primitive data type for members in the same vector.
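Coercion also applies to other type combinations. The sketch below assumes the usual ordering logical, integer, numeric, character, in which the result of c() takes the most general type present. The variable names are hypothetical.

```r
mixed_1 <- c(TRUE, 2L)     # logical + integer   -> integer
mixed_2 <- c(2L, 3.5)      # integer + numeric   -> numeric
mixed_3 <- c(3.5, "aa")    # numeric + character -> character

print(class(mixed_1))      # "integer"
print(class(mixed_2))      # "numeric"
print(class(mixed_3))      # "character"
print(mixed_1)             # 1 2 (TRUE became 1)
```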
Vector Arithmetics

Arithmetic operations of vectors are performed member-by-member, i.e., memberwise.

For example, suppose we have two vectors a and b.
> a = c(1, 3, 5, 7)
> b = c(1, 2, 4, 8)
Then, if we multiply a by 5, we would get a vector with each of its members multiplied by 5.
> 5 * a
[1]  5 15 25 35
And if we add a and b together, the sum would be a vector whose members are the sum of
the corresponding members from a and b.
> a + b
[1]  2  5  9 15
Similarly for subtraction, multiplication and division, we get new vectors via memberwise
operations.
> a - b
[1]  0  1  1 -1
> a * b
[1]  1  6 20 56
> a / b
[1] 1.000 1.500 1.250 0.875
Recycling Rule
If two vectors are of unequal length, the shorter one will be recycled in order to match the
longer vector. For example, the following vectors u and v have different lengths, and their
sum is computed by recycling values of the shorter vector u.
> u = c(10, 20, 30)
> v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> u + v
[1] 11 22 33 14 25 36 17 28 39
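In the example above the longer length (9) is a multiple of the shorter one (3). The sketch below, with hypothetical variable names, contrasts that case with lengths that do not divide evenly; R still recycles but raises a warning, which is captured here with tryCatch() so that it can be printed.

```r
# Length 4 is a multiple of length 2: the shorter vector is reused silently.
even_fit <- c(10, 20) + c(1, 2, 3, 4)
print(even_fit)   # 11 22 13 24

# Length 4 is not a multiple of length 3: the addition raises a warning.
warning_text <- tryCatch({
  c(10, 20, 30) + c(1, 2, 3, 4)
  "no warning"
}, warning = function(w) conditionMessage(w))
print(warning_text)   # prints the text of the warning
```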
Vector Index
We retrieve values in a vector by giving an index inside the single square bracket
operator "[]".
For example, the following shows how to retrieve a vector member. Since the vector index is
1-based, we use the index position 3 for retrieving the third member.
> s = c("aa", "bb", "cc", "dd", "ee")
> s[3]
[1] "cc"
Unlike other programming languages, the square bracket operator returns more than just
individual members. In fact, the result of the square bracket operator is another vector,
and s[3] is a vector slice containing a single member "cc".
Negative Index
If the index is negative, it would strip the member whose position has the same absolute
value as the negative index. For example, the following creates a vector slice with the third
member removed.
> s[-3]
[1] "aa" "bb" "dd" "ee"
Out-of-Range Index
If an index is out-of-range, a missing value will be reported via the symbol NA.
> s[10]
[1] NA
Numeric Index Vectors
A new vector can be sliced from a given vector with a numeric index vector, which consists
of member positions of the original vector to be retrieved.
Here it shows how to retrieve a vector slice containing the second and third members of a
given vector s.
> s = c("aa", "bb", "cc", "dd", "ee")
> s[c(2, 3)]
[1] "bb" "cc"
Duplicate Indexes
The index vector allows duplicate values. Hence the following retrieves a member twice in
one operation.
> s[c(2, 3, 3)]
[1] "bb" "cc" "cc"
Out-of-Order Indexes
The index vector can even be out-of-order. Here is a vector slice with the order of first and
second members reversed.
> s[c(2, 1, 3)]
[1] "bb" "aa" "cc"
Range Index
To produce a vector slice between two indexes, we can use the colon operator ":". This can
be convenient for situations involving large vectors.
> s[2:4]
[1] "bb" "cc" "dd"
Logical Index Vector
A new vector can be sliced from a given vector with a logical index vector, which has the
same length as the original vector. Its members are TRUE if the corresponding members in
the original vector are to be included in the slice, and FALSE if otherwise.
For example, consider the following vector s of length 5.
> s = c("aa", "bb", "cc", "dd", "ee")
To retrieve the second and fourth members of s, we can define a logical vector L of the same
length, and have its second and fourth members set as TRUE.
> L = c(FALSE, TRUE, FALSE, TRUE, FALSE)
> s[L]
[1] "bb" "dd"
The code can be abbreviated into a single line.
> s[c(FALSE, TRUE, FALSE, TRUE, FALSE)]
[1] "bb" "dd"
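In practice the logical index vector is often produced by a comparison instead of being typed out. A minimal sketch, using a hypothetical numeric vector scores:

```r
scores <- c(12, 7, 30, 4, 18)

# The comparison yields one TRUE/FALSE per member of scores.
is_large <- scores > 10
print(is_large)           # TRUE FALSE TRUE FALSE TRUE

# Use the logical vector as the index.
print(scores[is_large])   # 12 30 18
```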
Named Vector Members
We can assign names to vector members.
For example, the following variable v is a character string vector with two members.
> v = c("Mary", "Sue")
> v
[1] "Mary" "Sue"
We now name the first member as First, and the second as Last.
> names(v) = c("First", "Last")
> v
 First   Last
"Mary"  "Sue"
Then we can retrieve the first member by its name.
> v["First"]
[1] "Mary"
Lists
A list is an R-object which can contain many different types of elements inside it like vectors,
functions and even another list inside it.

A list is a generic vector containing other objects.


For example, the following variable x is a list containing copies of three vectors n, s, b, and a
numeric value 3.
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3) # x contains copies of n, s, b
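The members of such a list can be retrieved with single or double square brackets. The sketch below rebuilds the list x from the example above; slice and member are hypothetical names. It also shows that x holds copies, so changing a member of x leaves the original vector s untouched.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
b <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
x <- list(n, s, b, 3)

# Single brackets return a list slice.
slice <- x[2]
print(class(slice))    # "list"

# Double brackets return the member itself.
member <- x[[2]]
print(member)          # "aa" "bb" "cc" "dd" "ee"

# Modifying the copy inside the list does not change s.
x[[2]][1] <- "ta"
print(x[[2]][1])       # "ta"
print(s[1])            # "aa"
```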
The members of a list can be of different types, for example a numeric vector, a single
number and a function:
# Create a list.
list1 <- list(c(2,5,3), 21.3, sin)
# Print the list.
print(list1)

When we execute the above code, it produces the following result −

[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x)  .Primitive("sin")
Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to
the matrix function.

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
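To make the effect of byrow visible, the sketch below rebuilds M and compares it with a second matrix filled column by column. The name N is hypothetical. Elements are addressed as [row, column].

```r
values <- c('a', 'a', 'b', 'c', 'b', 'a')

# byrow = TRUE fills the first row, then the second row.
M <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)
print(dim(M))     # 2 3
print(M[1, ])     # "a" "a" "b"
print(M[2, 1])    # "c"

# The default fills the first column, then the second, and so on.
N <- matrix(values, nrow = 2, ncol = 3)
print(N[, 1])     # "a" "a"
print(N[2, 2])    # "c"
```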
Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions.
The array function takes a dim argument which sets the size of each dimension. In the
example below we create an array made of two 3x3 matrices.

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
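A short sketch of how the array above can be inspected follows. An element is addressed as [row, column, matrix], and the two input values are recycled to fill all 18 cells in column order.

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

print(dim(a))        # 3 3 2
print(length(a))     # 18

# First row, second column of the first matrix.
print(a[1, 2, 1])    # "yellow"

# The whole second 3x3 matrix.
print(a[, , 2])
```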
Data Frames
Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can
contain a different mode of data: the first column can be numeric while the second column
can be character and the third column can be logical. A data frame is a list of vectors of
equal length.

Data Frames are created using the data.frame() function.

# Create the data frame.
BMI <- data.frame(
  gender = c("Male", "Male","Female"),
  height = c(152, 171.5, 165),
  weight = c(81,93, 78),
  Age = c(42,38,26)
)
print(BMI)
When we execute the above code, it produces the following result −

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
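Once a data frame exists, its columns and rows can be picked out much like list members and matrix cells. The sketch below rebuilds BMI from the example above; the name tall and the 160 threshold are hypothetical.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

# A column is an ordinary vector.
print(BMI$height)      # 152.0 171.5 165.0

# A row is selected with [row, ]; leaving the column part empty keeps all columns.
print(BMI[2, ])

# A logical condition on a column selects the matching rows.
tall <- BMI[BMI$height > 160, ]
print(nrow(tall))      # 2
```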
Functions in R
func_name <- function(argument) {
  statement
}

pow <- function(x, y) {
  # function to print x raised to the power y
  result <- x^y
  print(paste(x, "raised to the power", y, "is", result))
}

# Calling the function
pow(8, 2)   # prints "8 raised to the power 2 is 64"
Named Arguments

In the above function call, the formal arguments are matched to the actual arguments in
positional order.

This means that, in the call pow(8, 2), the formal arguments x and y are assigned 8 and 2
respectively.

We can also call the function using named arguments.

When calling a function in this way, the order of the actual arguments doesn't matter. For
example, all of the function calls given below are equivalent.
pow(8, 2)
pow(x = 8, y = 2)
pow(y = 2, x = 8)
We can also mix named and unnamed arguments in a single call.
pow(x = 8, 2)
pow(2, x = 8)
Default Values for Arguments
We can assign default values to arguments in a function in R.

This is done by providing an appropriate value to the formal argument in the function
declaration.
Here is the above function with a default value for y.

pow <- function(x, y = 2) {
  # function to print x raised to the power y
  result <- x^y
  print(paste(x, "raised to the power", y, "is", result))
}
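The effect of the default can be sketched by calling the function with and without a second argument. The definition is repeated so that the example runs on its own; the names squared and first_power are hypothetical. Because print() returns its argument invisibly, the printed message is also the return value of pow().

```r
pow <- function(x, y = 2) {
  # function to print x raised to the power y
  result <- x^y
  print(paste(x, "raised to the power", y, "is", result))
}

# y is omitted, so the default y = 2 is used.
squared <- pow(3)          # prints "3 raised to the power 2 is 9"

# An explicit value overrides the default.
first_power <- pow(3, 1)   # prints "3 raised to the power 1 is 3"
```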
Function return

check <- function(x) {
  if (x > 0) {
    result <- "Positive"
  }
  else if (x < 0) {
    result <- "Negative"
  }
  else {
    result <- "Zero"
  }
  return(result)
}
Function without return
If there are no explicit returns from a function, the value of the last evaluated expression is
returned automatically in R.

check <- function(x) {
  if (x > 0) {
    result <- "Positive"
  }
  else if (x < 0) {
    result <- "Negative"
  }
  else {
    result <- "Zero"
  }
  result
}
Multiple Returns

The return() function can return only a single object. If we want to return multiple values in
R, we can use a list (or other objects) and return it.

Following is an example.

multi_return <- function() {
  my_list <- list("color" = "red", "size" = 20, "shape" = "round")
  return(my_list)
}
Calling of this function
a <- multi_return()
a$color   # to print the color
CSV Files
In R, we can read data from files stored outside the R environment. We can also write data
into files which will be stored and accessed by the operating system. R can read and write
into various file formats like csv, excel, xml etc.

Input as CSV File


A CSV file is a text file in which the values in the columns are separated by commas. Let's
consider the following data present in the file named input.csv.
You can create this file using Windows Notepad by copying and pasting this data. Save the
file as input.csv using the Save As option with the file type set to All Files (*.*).
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
Reading from CSV Files
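Following is a simple example of the read.csv() function reading a CSV file available in your
current working directory −

data <- read.csv("input.csv")
print(data)
When we execute the above code, it produces the following result −

  id     name salary start_date       dept
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5 NA     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance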
Analyzing the CSV File

By default the read.csv() function gives the output as a data frame. This can be easily
checked as follows. Also we can check the number of columns and rows.

data <- read.csv("input.csv")

print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
When we execute the above code, it produces the following result −

[1] TRUE
[1] 5
[1] 8
Once we read data in a data frame, we can apply all the functions applicable to data frames as
explained in subsequent section.

Get the maximum salary


# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)
Get the details of the person with max salary
We can fetch rows meeting specific filter criteria, similar to a SQL WHERE clause.

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)

# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)
When we execute the above code, it produces the following result −

  id name salary start_date    dept
5 NA Gary 843.25 2015-03-27 Finance
Get all the people working in IT department
# Create a data frame.
data <- read.csv("input.csv")

retval <- subset(data, dept == "IT")
print(retval)
Get the persons in IT department whose salary is greater than 600
# Create a data frame.
data <- read.csv("input.csv")

info <- subset(data, salary > 600 & dept == "IT")
print(info)
Get the people who joined in or after 2014
# Create a data frame.
data <- read.csv("input.csv")

# Use >= so that a start date of 2014-01-01 itself is included.
retval <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
print(retval)
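Writing into CSV file

# Actual and Predicted are assumed to be existing vectors of equal length.
# row.names = TRUE also writes the row names as the first column.
write.csv(data.frame(Actual, Predicted), file = "SampleData.csv", row.names = TRUE)

# The file is written to the current working directory, which getwd() reports.
getwd()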
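A self-contained round trip is sketched below. The values in Actual and Predicted are made-up placeholders, and the file is written to a temporary directory (an assumption of this sketch) so that nothing is left behind in the working directory. row.names = FALSE is used here so that the file contains only the two data columns.

```r
# Placeholder data: any two vectors of equal length would do.
Actual <- c(10.5, 12.0, 9.8)
Predicted <- c(10.1, 12.4, 9.5)

# Write the data frame to a CSV file in a temporary directory.
out_file <- file.path(tempdir(), "SampleData.csv")
write.csv(data.frame(Actual, Predicted), file = out_file, row.names = FALSE)

# Read the file back to confirm what was written.
check <- read.csv(out_file)
print(check)
```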
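Combining Data Frames
# Create vector objects.
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
# Zip codes are stored as strings so that a leading zero (06161) is kept.
zipcode <- c("33602","98104","06161","80294")

# Combine above three vectors into one data frame.
# (cbind() on plain vectors would return a character matrix instead.)
addresses <- data.frame(city, state, zipcode)

# Print a header.
cat("# # # # The First data frame\n")

# Print the data frame.
print(addresses)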
# Create another data frame with similar columns
new.address <- data.frame(
  city = c("Lowry","Charlotte"),
  state = c("CO","FL"),
  zipcode = c("80230","33949")
)

# Print a header.
cat("# # # The Second data frame\n")

# Print the data frame.
print(new.address)
# Combine rows from both the data frames
all.addresses <- rbind(addresses, new.address)

# Print a header.
cat("# # # The combined data frame\n")

# Print the result.
print(all.addresses)
Merging Data Frames

We can merge two data frames by using the merge() function. The data frames must have the
same column names on which the merging happens.

In the example below, we consider the data sets about diabetes in Pima Indian women
available in the package named "MASS". We merge the two data sets based on the values of
blood pressure ("bp") and body mass index ("bmi"). On choosing these two columns for
merging, the records where the values of these two variables match in both data sets are
combined together to form a single data frame.

library(MASS)
merged.Pima <- merge(x = Pima.te, y = Pima.tr,
  by.x = c("bp", "bmi"),
  by.y = c("bp", "bmi")
)
print(merged.Pima)
nrow(merged.Pima)
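The same idea can be seen on a much smaller scale. The two data frames below, staff and pay, are hypothetical and share only the id column; merge() keeps the rows whose id appears in both.

```r
# Hypothetical data: ids 2 and 3 appear in both data frames.
staff <- data.frame(id = c(1, 2, 3), name = c("Rick", "Dan", "Michelle"))
pay   <- data.frame(id = c(2, 3, 4), salary = c(515.2, 611, 729))

# Merge on the shared column; unmatched ids (1 and 4) are dropped.
merged <- merge(x = staff, y = pay, by = "id")
print(merged)
print(nrow(merged))   # 2
```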
