Introduction to R

The document outlines the advantages of using R programming, including its cost-effectiveness, cross-platform compatibility, and advanced statistical capabilities. It details various data structures in R such as vectors, lists, matrices, data frames, and factors, along with their creation and manipulation. Additionally, it covers control flow statements and loops in R, providing examples of their syntax and usage.

Uploaded by

amarnath

Why R?

■ It's free!
■ It runs on a variety of platforms, including Windows, Unix, and macOS.
■ It provides an unparalleled platform for programming new statistical
methods in an easy and straightforward manner.
■ It contains advanced statistical routines not yet available in other packages.
■ It has state-of-the-art graphics capabilities.

DATA STRUCTURES IN R PROGRAMMING


A data structure is a particular way of organizing data in a computer so that it can
be used effectively.

The most essential data structures used in R include:


● Vectors
● Lists
● Data frames
● Matrices
● Arrays
● Factors

Vectors
A vector is an ordered collection of values of a basic data type, with a given
length. The key requirement is that all the elements of a vector must be of the
same data type; that is, vectors are homogeneous data structures. Vectors are
one-dimensional data structures.
You can perform addition, subtraction, multiplication, and division on vectors;
these operations are applied element by element.
Vectors are created by using the c() function. If the elements passed are of
different data types, they are coerced to the most general type present, in the
order logical, integer, double, character.
The typeof() function is used to check the data type of the vector, and the class()
function is used to check the class of the vector.
Vec1 <- c(44, 25, 64, 96, 30)
Vec2 <- c(1, FALSE, 9.8, "hello world")
typeof(Vec1)
typeof(Vec2)
Output:
[1] "double"
[1] "character"
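To discard the contents of a vector, assign NULL to it. The variable then holds
NULL; to remove the variable itself, use the rm() function.
Vec1 <- NULL
Vec2 <- NULL
The elements of a vector are accessed by position using square brackets; positions
start at 1. For example: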
x <- c("Jan","Feb","March","Apr","May","June","July")
y <- x[c(3,2,7)]
print(y)
Output:
[1] "March" "Feb" "July"
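
The element-wise arithmetic and the coercion order described in this section can be illustrated with a short sketch. The vectors a and b are made-up values chosen only for this illustration.

```r
# Two numeric vectors of equal length (illustrative values only)
a <- c(10, 20, 30)
b <- c(1, 2, 3)

# Arithmetic operators work element by element
sums      <- a + b   # 11 22 33
diffs     <- a - b   # 9 18 27
products  <- a * b   # 10 40 90
quotients <- a / b   # 10 10 10

print(sums)
print(products)

# Coercion follows the order logical -> integer -> double -> character
t1 <- typeof(c(TRUE, 2L))        # "integer"
t2 <- typeof(c(TRUE, 2L, 3.5))   # "double"
t3 <- typeof(c(TRUE, 2L, "x"))   # "character"
print(c(t1, t2, t3))
```

Mixing types in c() never raises an error; the result is silently converted, so it is worth checking typeof() when a vector is built from mixed input.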

Lists
A list is a non-homogeneous data structure, which implies that it can contain
elements of different data types. It accepts numbers, characters, lists, and even
matrices and functions inside it. It is created by using the list() function.
For example:
list1<- list("Sam", "Green", c(8,2,67), TRUE, 51.99, 11.78,FALSE)
print(list1)
Output:
[[1]]
[1] "Sam"
[[2]]
[1] "Green"
[[3]]
[1] 8 2 67
[[4]]
[1] TRUE
[[5]]
[1] 51.99
[[6]]
[1] 11.78
[[7]]
[1] FALSE

Accessing the Elements of a List


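The elements of a list can be accessed by using the indices of those elements.
Single square brackets return a list that contains the selected element, which is
why each part of the output below begins with [[1]].
For example: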
list2 <- list(matrix(c(3,9,5,1,-2,8), nrow = 2), c("Jan","Feb","Mar"), list(3,4,5))
print(list2[1])
print(list2[2])
print(list2[3])
Output:
[[1]]
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
(First element of the list)
[[1]]
[1] "Jan" "Feb" "Mar"
(Second element of the list)
[[1]]
[[1]][[1]]
[1] 3
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 5
(Third element of the list)
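
The difference between single and double square brackets is easy to miss, so here is a minimal sketch. The list person and its element names are hypothetical and are used only for illustration.

```r
# A hypothetical list with named elements (names are illustrative only)
person <- list(name = "Sam", scores = c(8, 2, 67), active = TRUE)

sub_list <- person[2]       # single brackets: a list of length 1
scores   <- person[[2]]     # double brackets: the numeric vector itself
by_name  <- person$scores   # $ extracts an element by its name

print(class(sub_list))   # "list"
print(class(scores))     # "numeric"
print(scores[3])         # 67
```

Use double brackets (or $ for named elements) when the element itself is needed, for example to do arithmetic on it.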

Matrices
A matrix is a two-dimensional data structure that is homogeneous, meaning that it
only accepts elements of the same data type. Coercion takes place if elements of
different data types are passed. It is created by using the matrix() function.
The basic syntax to create a matrix is given below:
matrix(data, nrow, ncol, byrow, dimnames)
where,
data = the input elements of the matrix, given as a vector.
nrow = the number of rows to be created.
ncol = the number of columns to be created.
byrow = a logical value; if TRUE, the elements are arranged row-wise instead of
column-wise (the default is FALSE).
dimnames = a list of two vectors giving the names of the rows and of the columns.

For example:
M1 <- matrix(c(1:9), nrow = 3, ncol =3, byrow= TRUE)
print(M1)

Output:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
M2 <- matrix(c(1:9), nrow = 3, ncol =3, byrow= FALSE)
print(M2)

Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

By using row and column names, a matrix can be created as follows:

row_names <- c("row1", "row2", "row3")
col_names <- c("col1", "col2", "col3")

M3 <- matrix(c(1:9), nrow = 3, byrow = TRUE, dimnames = list(row_names, col_names))

print(M3)

Output:
     col1 col2 col3
row1    1    2    3
row2    4    5    6
row3    7    8    9
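
As a small sketch of how a matrix is read back once it has been created, the following indexes single elements, rows, and columns, and shows the coercion mentioned above. The variable names M and mixed are chosen only for this illustration.

```r
# A 3 x 3 matrix filled row by row
M <- matrix(1:9, nrow = 3, byrow = TRUE)

print(dim(M))      # 3 3 (rows, columns)
print(M[2, 3])     # element in row 2, column 3: 6
print(M[1, ])      # the whole first row: 1 2 3
print(M[, 2])      # the whole second column: 2 5 8

# Coercion: mixing numbers and strings produces a character matrix
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
print(typeof(mixed))   # "character"
```

Leaving an index empty, as in M[1, ], selects everything along that dimension.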

Factor
Factors are used in data analysis for statistical modeling. They are used to
categorize unique values in columns, such as “Male”, “Female”, “TRUE”,
“FALSE”, etc., and store them as levels. They can store both strings and integers.
They are useful in columns that have a limited number of unique values.
Factors can be created using the factor() function and they take vectors as inputs.

For example:
data <- c("Male","Female","Male","Child","Child","Male","Female","Female")
print(data)
factor.data <- factor(data)
print(factor.data)

Output:
[1] "Male"   "Female" "Male"   "Child"  "Child"  "Male"   "Female" "Female"
[1] Male   Female Male   Child  Child  Male   Female Female
Levels: Child Female Male
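In versions of R before 4.0.0, data.frame() treated text columns as categorical
data and converted them to factors by default. Since R 4.0.0 the default is
stringsAsFactors = FALSE, so text columns remain character vectors unless you
pass stringsAsFactors = TRUE or convert them with factor().

For example: For the emp.finaldata data frame (created in the Data Frame section
below), with empdept stored as a factor:
print(is.factor(emp.finaldata$empdept))
print(emp.finaldata$empdept)

Output:
[1] TRUE
[1] Sales Marketing HR R & D IT Operations Finance
Levels: HR Marketing R & D Sales Finance IT Operations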
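
The following sketch shows a few ways to inspect a factor, and one way to obtain factor columns in a data frame on current versions of R. The data frame df and its dept column are hypothetical and exist only for this example.

```r
gender <- factor(c("Male", "Female", "Male", "Child",
                   "Child", "Male", "Female", "Female"))

print(levels(gender))       # "Child" "Female" "Male" (sorted alphabetically)
print(nlevels(gender))      # 3
print(table(gender))        # number of observations per level
print(as.integer(gender))   # underlying integer codes: 3 2 3 1 1 3 2 2

# Explicitly requesting factor columns in a (hypothetical) data frame
df <- data.frame(dept = c("Sales", "HR", "Sales"), stringsAsFactors = TRUE)
print(is.factor(df$dept))   # TRUE
```

Internally a factor stores integer codes plus the vector of level labels, which is why as.integer() returns positions in levels() rather than the original strings.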

Data Frame
A data frame is a two-dimensional, table-like structure in which each column
contains the values of one variable and each row contains one set of values, one
from each column.
A data frame has the following characteristics:
● The column names of a data frame should not be empty.
● The row names of a data frame should be unique.
● The data stored in a data frame can be a numeric, factor, or character
type.
● Each column should contain the same number of data items.

Creating a Data Frame


You can use the following syntax for creating a data frame in R programming:
empid <- c(1:4)
empname <- c("Sam","Rob","Max","John")
empdept <- c("Sales","Marketing","HR","R & D")
emp.data <- data.frame(empid,empname,empdept)
print(emp.data)

Output:
  empid empname   empdept
1     1     Sam     Sales
2     2     Rob Marketing
3     3     Max        HR
4     4    John     R & D

Extracting Columns or Rows from a Data Frame


To extract specific columns from a data frame, use the following syntax:
result <- data.frame(emp.data$empname,emp.data$empdept)
print(result)

Output:
  emp.data.empname emp.data.empdept
1              Sam            Sales
2              Rob        Marketing
3              Max               HR
4             John            R & D

To extract specific rows from a data frame, use the following syntax:
result <- emp.data[1:2,]
print(result)

Output:
  empid empname   empdept
1     1     Sam     Sales
2     2     Rob Marketing
The following code extracts the first and third rows, keeping only the second and
third columns.
result <- emp.data[c(1,3),c(2,3)]
print(result)

Output:
  empname empdept
1     Sam   Sales
3     Max      HR

Adding a Column to a Data Frame


To add a salary column to the above data frame, you can use the following syntax:
emp.data$salary <- c(20000,30000,40000,27000)
n <- emp.data
print(n)

Output:
  empid empname   empdept salary
1     1     Sam     Sales  20000
2     2     Rob Marketing  30000
3     3     Max        HR  40000
4     4    John     R & D  27000


Adding a Row to a Data Frame
To add a new row(s) to an existing data frame, you need to create a new data frame
that contains the new row(s), and then merge it with the existing data frame using
the rbind() function.

Creating a New Data Frame


emp.newdata <- data.frame(
  empid = c(5:7),
  empname = c("Frank","Tony","Eric"),
  empdept = c("IT","Operations","Finance"),
  salary = c(32000,51000,45000)
)

Merging the New Data Frame with the Existing Data Frame
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

Output:
  empid empname    empdept salary
1     1     Sam      Sales  20000
2     2     Rob  Marketing  30000
3     3     Max         HR  40000
4     4    John      R & D  27000
5     5   Frank         IT  32000
6     6    Tony Operations  51000
7     7    Eric    Finance  45000
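
Rows can also be selected with a logical condition instead of row numbers. The sketch below rebuilds a small data frame so that it runs on its own; the name emp and the salary threshold of 25000 are assumptions made only for this illustration.

```r
# A small stand-alone data frame (illustrative values)
emp <- data.frame(
  empid   = 1:4,
  empname = c("Sam", "Rob", "Max", "John"),
  salary  = c(20000, 30000, 40000, 27000)
)

print(nrow(emp))   # number of rows: 4
print(ncol(emp))   # number of columns: 3
str(emp)           # compact summary of column names and types

# Keep only the rows whose salary exceeds the threshold
high_paid <- emp[emp$salary > 25000, ]
print(high_paid)
```

The condition emp$salary > 25000 produces a logical vector with one value per row, and only the rows marked TRUE are kept.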


Arrays
Arrays are data structures that store multiple items of the same type together.
The items are stored at contiguous memory locations, and the collection as a whole
is referred to by the array name. The position of an element in memory can be
calculated simply by adding an offset to the base address. In R, arrays are
created with the array() function.

(Figure omitted: an example array holding 12 elements.)

Array Structure
An array consists of the following:
Array Index: The array index identifies the location of the element. In R, array
indexes start at 1.
Array Element: Array elements are the items that are stored in the array.
Array Length: The array length is the number of elements that the array can
store. In the example shown in the omitted figure, the array length is 12.
There are two types of arrays:
● One-dimensional Arrays
● Multi-dimensional Arrays

One-dimensional Arrays
One- or single-dimensional arrays are the types of arrays that have array elements
stored in a sequence and can be accessed in the same order. The figure given above
is an example of a one-dimensional array.

Multi-dimensional Arrays
Multi-dimensional arrays are arrays that have elements stored in more than one
dimension. They can have two, three, or more dimensions, and each element is
addressed by one index per dimension, such as a row index and a column index.

(Figure omitted: an example of a multi-dimensional array.)
Accessing the Elements of an Array


The elements of an array can be accessed using the following syntax, with one
index for each dimension of the array:
Syntax:
arrayName[index]
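
A minimal sketch of creating and indexing arrays with the array() function follows. The dimensions and the names arr and one_d are chosen only for illustration.

```r
# A three-dimensional array: 2 rows, 3 columns, 2 layers (12 elements)
arr <- array(1:12, dim = c(2, 3, 2))

print(dim(arr))       # 2 3 2
print(length(arr))    # 12
print(arr[1, 2, 1])   # row 1, column 2, layer 1: 3
print(arr[2, 3, 2])   # row 2, column 3, layer 2: 12

# A one-dimensional array; the first element is at index 1
one_d <- array(1:12)
print(one_d[1])       # 1
```

Values fill the array column by column within each layer, which is why arr[1, 2, 1] is 3 rather than 2.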

In-built Functions in R programming


Built-in functions are provided by the R environment and can be called directly,
without defining them first.

Some examples of frequently used built-in functions are as follows:

i). #seq() To create a sequence of numbers


print(seq(1,9))

Output:
[1] 1 2 3 4 5 6 7 8 9

ii). #sum() To find the sum of numbers


print(sum(25,50))

Output:
[1] 75

iii). #mean() To find the mean of numbers


print(mean(41:68))

Output: [1] 54.5


iv). #paste() To combine vectors after converting them to characters

paste(1,"sam",2,"rob",3,"max")

Output:
[1] "1 sam 2 rob 3 max"

paste(1,"sam",2,"rob",3,"max", sep = ',')

Output:
[1] "1,sam,2,rob,3,max"

#collapse joins the elements of the resulting vector into a single string
paste(1:3, c("sam","rob","max"), sep = '-', collapse = " and ")

Output:
[1] "1-sam and 2-rob and 3-max"
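
The same functions accept optional arguments that are often useful. The sketch below uses the by and length.out arguments of seq() and the na.rm argument of sum() and mean(); the input values are made up for illustration.

```r
# seq() with a step size, and with a fixed number of points
odds  <- seq(1, 9, by = 2)           # 1 3 5 7 9
steps <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# sum() and mean() can skip missing values (NA) with na.rm = TRUE
values  <- c(25, 50, NA)
total   <- sum(values, na.rm = TRUE)    # 75
average <- mean(values, na.rm = TRUE)   # 37.5

print(odds)
print(steps)
print(total)
print(average)
```

Without na.rm = TRUE, both sum(values) and mean(values) would return NA here.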

CONTROL FLOW STATEMENTS IN R PROGRAMMING

If Statement
An If statement is one of the control statements in R programming; it consists of
a Boolean expression and a set of statements. If the Boolean expression evaluates
to TRUE, the set of statements is executed. If it evaluates to FALSE, the set is
skipped and execution continues with the statements after the end of the If
statement.
The basic syntax for the If statement is given below:
if(Boolean_expression) {
  # This block of code executes if the Boolean expression returns TRUE.
}
For example:
x <- "Intellipaat"
if(is.character(x)) {
  print("X is a Character")
}
Output:
[1] "X is a Character"

Else Statement
In the If-Else statement, an If statement is followed by an Else statement, which
contains a block of code to be executed when the Boolean expression in the If
statement evaluates to FALSE.
The basic syntax of it is given below:
if(Boolean_expression) {
  # This block of code executes if the Boolean expression returns TRUE.
} else {
  # This block of code executes if the Boolean expression returns FALSE.
}
For example:
x <- c("Intellipaat","R","Tutorial")
if("Intellipaat" %in% x) {
  print("Intellipaat")
} else {
  print("Not found")
}
Output:
[1] "Intellipaat"

Else If Statement
An Else If statement is placed between the If and Else statements. Multiple Else
If statements can be included after an If statement. Once an If statement or an
Else If statement evaluates to TRUE, none of the remaining Else If or Else
statements are evaluated.
The basic syntax of it is given below:
if(Boolean_expression1) {
  # This block of code executes if Boolean expression 1 returns TRUE.
} else if(Boolean_expression2) {
  # This block of code executes if Boolean expression 2 returns TRUE.
} else if(Boolean_expression3) {
  # This block of code executes if Boolean expression 3 returns TRUE.
} else {
  # This block of code executes if none of the Boolean expressions returns TRUE.
}
For example:
x <- c("Intellipaat","R","Tutorial")
if("Intellipaat" %in% x) {
  print("Intellipaat")
} else if ("Tutorial" %in% x) {
  print("Tutorial")
} else {
  print("Not found")
}
Output:
[1] "Intellipaat"
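
As a further sketch of an If, Else If, Else chain, the hypothetical function describe_sign() below classifies a number. The function name and its return strings are assumptions made for this illustration only.

```r
# Hypothetical helper: classify a single number by its sign
describe_sign <- function(n) {
  if (n > 0) {
    "positive"
  } else if (n < 0) {
    "negative"
  } else {
    "zero"   # reached only when neither condition above is TRUE
  }
}

print(describe_sign(12))   # "positive"
print(describe_sign(-3))   # "negative"
print(describe_sign(0))    # "zero"
```

Only the first branch whose condition is TRUE runs, so the order of the conditions matters when they can overlap.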

Switch Statement
The switch statement is one of the control statements in R programming; it tests
an expression against a set of values and returns the result for the value that
matches. Each value is called a case.
The basic syntax for a switch statement is as follows:
switch(expression, case1, case2, case3, ...)
For example:
x <- switch(
3,
"Intellipaat",
"R",
"Tutorial",
"Beginners"
)
print(x)
Output:
[1] "Tutorial"
If the value passed as the expression is not a character string, as in the example
above, it is coerced to an integer and used as the index of the case to return.
y <- "12"
x <- switch(
y,
"9"= "Good Morning",
"12"= "Good Afternoon",
"18"= "Good Evening",
"21"= "Good Night"
)
print(x)
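Output:
[1] "Good Afternoon"
If the expression evaluates to a character string, as in the example above, it is
matched (exactly) against the names of the cases in the switch statement.
■ If there is more than one match, the first matching
element is returned.
■ When matching by name, one unnamed argument placed after the named cases
serves as the default. When matching by integer index, no default is
available, and switch() returns NULL if the index is out of range.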
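
The behaviour when nothing matches can be sketched as follows. The function greet() and its strings are hypothetical; the unnamed last argument acts as the fallback when the expression is a character string.

```r
# Hypothetical helper: look up a greeting by hour, given as a string
greet <- function(hour) {
  switch(
    hour,
    "9"  = "Good Morning",
    "12" = "Good Afternoon",
    "Hello"   # unnamed last argument: returned when no name matches
  )
}

print(greet("12"))   # "Good Afternoon"
print(greet("15"))   # "Hello"

# With an integer index that is out of range, switch() returns NULL
out_of_range <- switch(5, "a", "b")
print(is.null(out_of_range))   # TRUE
```

Because the out-of-range result is NULL, it is worth checking with is.null() before using it.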

Loops
A looping statement executes a block of code several times. Together with the
loop-control statements, loops allow for more complicated execution paths than
plain sequential execution.
The types of loops in R are as follows:

Repeat Loop
A repeat loop is one of the control statements in R programming; it executes a set
of statements in a loop until the exit condition specified inside the loop
evaluates to TRUE.
The basic syntax for a repeat loop is given below:
repeat {
  statements
  if(exit_condition) {
    break
  }
}
For example:
v <- 9
repeat {
  print(v)
  v <- v - 1
  if(v < 1) {
    break
  }
}
Output:
[1] 9
[1] 8
[1] 7
[1] 6
[1] 5
[1] 4
[1] 3
[1] 2
[1] 1

While Loop
A while loop is one of the control statements in R programming; it executes a
set of statements in a loop as long as the condition (the Boolean expression)
evaluates to TRUE. The loop ends as soon as the condition evaluates to FALSE.
The basic syntax of a while loop is given below:
while (Boolean_expression) {
  statement
}
For example:
v <- 9
while(v > 5) {
  print(v)
  v <- v - 1
}
Output:
[1] 9
[1] 8
[1] 7
[1] 6
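
A while loop is convenient when the number of iterations is not known in advance. As a sketch, the loop below keeps adding successive integers until a running total reaches a limit; the limit of 20 and the variable names are assumptions for illustration.

```r
total <- 0
i <- 1

# Keep adding 1, 2, 3, ... while the running total is below 20
while (total < 20) {
  total <- total + i
  i <- i + 1
}

print(total)   # 21 (1 + 2 + 3 + 4 + 5 + 6)
print(i)       # 7, the next integer that would have been added
```

The condition is checked before each pass, so the body does not run at all if the condition is FALSE at the start.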

For Loop
A for loop is one of the control statements in R programming; it executes a set of
statements once for each element of the vector provided to it.
The basic syntax of a for loop is given below:
for (value in vector) {
  statements
}
For example:
v <- c(1:5)
for (i in v) {
  print(i)
}

Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
We can also use the break statement inside a for loop to exit it early.
For example:
v <- c(1:5)
for (i in v) {
  if(i == 3){
    break
  }
  print(i)
}
Output:
[1] 1
[1] 2
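
When both the position and the value of each element are needed, a common idiom is to loop over seq_along() of the vector. The vectors months and labels below are made up for this sketch.

```r
months <- c("Jan", "Feb", "Mar")

# Pre-allocate the result, then fill it one position at a time
labels <- character(length(months))
for (i in seq_along(months)) {
  labels[i] <- paste(i, months[i], sep = "-")
}

print(labels)   # "1-Jan" "2-Feb" "3-Mar"
```

seq_along() is preferred over 1:length(x) because it yields an empty sequence for an empty vector, so the loop body is skipped.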

Loop-control Statements
Loop-control statements are the control statements in R programming that change
the execution of a loop from its normal sequence.
There are two loop-control statements in R: break and next.

Break Statement
A break statement terminates a loop immediately; execution resumes at the next
statement following the loop. Unlike in C-like languages, break is not needed to
end a case in switch(): in R, switch() is a function that returns the value of
the selected case.
For example:
v <- c(0:6)
for (i in v) {
  if(i == 3){
    break
  }
  print(i)
}

Output:
[1] 0
[1] 1
[1] 2

Next Statement
A next statement is one of the control statements in R programming that is used to
skip the current iteration of a loop without terminating the loop. Whenever a next
statement is encountered, further evaluation of the code is skipped and the next
iteration of the loop starts.
For example:
v <- c(0:6)
for (i in v) {
  if(i == 3){
    next
  }
  print(i)
}
Output:
[1] 0
[1] 1
[1] 2
[1] 4
[1] 5
[1] 6
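
The two loop-control statements can be combined in one loop. In this sketch, next skips the even numbers and break stops the loop once the value exceeds 7; the limits and the name odd_sum are chosen only for illustration.

```r
odd_sum <- 0

for (i in 1:10) {
  if (i %% 2 == 0) {
    next    # skip even numbers and move on to the next iteration
  }
  if (i > 7) {
    break   # leave the loop entirely once i exceeds 7
  }
  odd_sum <- odd_sum + i
}

print(odd_sum)   # 16 (1 + 3 + 5 + 7)
```

Here 8 is skipped by next, and 9 triggers break, so 9 and 10 are never added.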

Exercises

Write R programs to:
1. Find the sum of 2 numbers.
2. Determine whether a given number is odd or even.
3. Find the greatest of 3 numbers.
4. Find the sum of n numbers.
5. Generate the first 10 Fibonacci numbers.
6. Find the maximum and the minimum value of a given vector.
7. Create a data frame for maintaining patient information in a hospital (assume
the number of patients is 5).
8. Compute the sum, mean, and product of the elements of a given vector.
