Sandeep G
Introduction
• R is a programming language used for statistical
computing and graphics.
• It is an open-source software, which means that it is
freely available for anyone to use and modify.
• R provides a wide range of statistical and graphical
techniques, and has a large user community with many
contributed packages.
• R was created by Ross Ihaka and Robert Gentleman at
the University of Auckland, New Zealand in the early
1990s.
• The first version of R was released in 1995.
• R is free and open source, which makes it accessible to everyone.
• R has a large and active user community, which provides a wealth of
resources, including documentation, tutorials, and support forums.
• R provides a wide range of statistical and graphical techniques, many
of them through contributed packages that extend the base system.
• R is highly customizable, which means that you can adapt it to your
specific needs and preferences.
• R can have a steep learning curve for beginners who are not familiar
with programming or statistical concepts.
Integrated Development Environment (IDE)
• In R, an Integrated Development Environment (IDE) is a software
application designed to make the process of writing, testing, and
debugging R code more efficient and organized. Some popular IDEs
for R include RStudio, Jupyter Notebook, and Visual Studio Code.
• A typical IDE for R provides a code editor with syntax highlighting,
autocompletion, and code formatting features to make writing code
easier and more efficient. It also offers a console window where R
commands can be executed and the output can be viewed in real time.
RStudio
• RStudio is an integrated development environment (IDE) for R.
• It provides a user-friendly interface for coding, debugging, and testing
R code.
• Batch processing - refers to the automated execution of a series of R
commands or scripts without user interaction.
RStudio window panes (figure): Script Window, Workspace, Console, Graphs/Packages
Installing R:
• Go to the official R website (https://www.r-project.org/) and
download the appropriate version of R for your operating system.
• Follow the installation instructions to install R on your computer.
Installing RStudio:
• Go to the official RStudio website
(https://www.rstudio.com/products/rstudio/download/) and
download the appropriate version of RStudio for your operating
system.
• Follow the installation instructions to install RStudio on your
computer.
Arithmetic Operations in R
• R can be used as a calculator for simple arithmetic operations
• Addition: 2 + 3 evaluates to 5
• Subtraction: 10 - 5 evaluates to 5
• Multiplication: 4 * 5 evaluates to 20
• Division: 20 / 4 evaluates to 5
• Integer Division: 20 %/% 3 evaluates to 6 (the quotient when 20 is
divided by 3)
• Modulus: 20 %% 3 evaluates to 2 (the remainder when 20 is
divided by 3)
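The operators listed above can be tried directly in the console. A minimal sketch using the same numbers; the last line is an added check (not from the slides) that the quotient and remainder rebuild the original value:

```r
# Basic arithmetic with the values from the list above
2 + 3      # addition
10 - 5     # subtraction
4 * 5      # multiplication
20 / 4     # division
20 %/% 3   # integer division: the quotient
20 %% 3    # modulus: the remainder

# Quotient times divisor plus remainder gives back the original number
(20 %/% 3) * 3 + (20 %% 3)
```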
Declare variables in R console
• In R, you can declare variables to store values using the assignment
operator (<- or =).
• To declare a variable, simply assign a value to a name of your choice.
• Here are some examples:
• Declare a numeric variable:
• x <- 5
• The variable x now contains the value 5.
• Declare a character variable:
• name <- "abc"
• The variable name now contains the string "abc".
• Declare a logical variable:
• is_raining <- TRUE
• The variable is_raining now contains the logical value TRUE.
• To view the value of a variable, simply type the name of the variable
in the console and press enter.
Example: x will print the value of the variable x.
• Note that variable names in R are case-sensitive. A name cannot start
with a number or an underscore, and cannot contain spaces or special
characters other than the period (.) and the underscore (_).
• Declaring variables is a fundamental concept in programming, and it's
important to master it before moving on to more advanced topics in
R.
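The naming rules can be illustrated with a short sketch. The variable names here (total, Total, my.value, my_value) are made up for illustration, and make.names() is a base R helper that shows how R would repair an invalid name:

```r
# Names are case-sensitive: these are two different variables
total <- 10
Total <- 20
total == Total          # FALSE

# Periods and underscores are allowed inside a name
my.value <- 1
my_value <- 2

# make.names() shows how R would turn an invalid name into a valid one
make.names("2nd value")
```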
Listing Variables in R
• The ls() function is used to list the variables that are currently defined
in the R workspace.
• Here's how to use the ls() function:
• Type ls() in the console and press enter.
• The console will display a list of all the variables that are currently defined in
the workspace.
• ls(pattern = "x") will list only the variables whose names contain the letter "x".
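As a small sketch of ls() in action, assuming three illustrative variables (x, xy_total and name, none of which are required by ls() itself):

```r
# Define a few variables to list
x <- 5
xy_total <- 12
name <- "abc"

ls()                 # every variable currently defined
ls(pattern = "x")    # only the names that contain "x"
```

The pattern argument is treated as a regular expression, so "x" matches both x and xy_total but not name.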
Writing R Scripts with Comments
• In addition to the R commands, you can also include comments in
your script to explain what the code is doing and why.
• Comments are lines of text that are not executed as R commands, but
are instead meant for human readers.
• To write a comment in an R script, simply start the line with the #
character.
• The # character tells R to ignore everything that follows it on that line.
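A minimal commented script might look like the following; the circle-area calculation is only an illustrative example:

```r
# Compute the area of a circle with radius 2
radius <- 2
area <- pi * radius^2   # a comment can also follow code on the same line
area
```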
Managing Working Directory in R
• The working directory is the folder where R looks for files and saves
outputs by default.
• In R, you can check the current working directory using the getwd()
function.
• For example, if you want to know the current working directory in
your R console or script, type the following command:
• getwd()
• This will return the path of your current working directory.
• You can also change your working directory using the setwd()
function.
• For example, if you want to set your working directory to a folder
called "mydata" on your desktop, type the following command:
• setwd("C:/Users/LENOVO/Desktop/mydata")
• This will set your working directory to the "mydata" folder on your
desktop.
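Because a path such as the one above exists only on one particular computer, the sketch below uses tempdir(), R's per-session temporary folder, as a stand-in target. It also shows one way to restore the original working directory afterwards; old_dir is an illustrative name:

```r
old_dir <- getwd()     # remember the current working directory
setwd(tempdir())       # switch to the session's temporary folder
getwd()                # now reports the temporary folder
setwd(old_dir)         # restore the original working directory
```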
Clearing Workspace in R with rm()
• In R, you can clear your entire workspace using the rm(list = ls())
command. This will remove all variables from your workspace.
• Be careful when using rm(list = ls()), as it will remove all variables from your
workspace, including any important data that you have stored.
• Alternatively, you can selectively remove variables using the rm() function.
• For example, if you have a variable called "var" that you want to remove,
type the following command:
rm(var)
• You can also remove multiple variables at once by including them as
arguments in the rm() function.
• For example, if you want to remove both "mydata" and "mylist"
variables, type the following command:
rm(mydata, mylist)
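The removal commands can be checked by listing the workspace before and after. A short sketch with three throwaway variables whose values are arbitrary:

```r
# Create three variables to remove
mydata <- c(1, 2, 3)
mylist <- list("a", 2)
var <- 10

rm(var)                 # remove a single variable
"var" %in% ls()         # FALSE: var is no longer defined in the workspace

rm(mydata, mylist)      # remove several variables at once
ls()                    # neither name appears any more
```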
Determine type of variable
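• class() to determine the type of a variable
class(5L)
The is.*() functions check whether a given input is of a particular type,
such as numeric or integer; these are known as the is-dot functions. The
matching as.*() functions convert an input to that type.
• is.numeric() as.numeric()
• is.integer() as.integer()
• is.character() as.character()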
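A brief sketch of the type-checking and conversion functions, using the value 5 in different forms as an illustration:

```r
class(5L)            # "integer": the L suffix marks an integer
class(5)             # "numeric": plain numbers are stored as doubles
class("5")           # "character"

is.integer(5)        # FALSE: 5 without the L suffix is not an integer
is.numeric(5L)       # TRUE: integers also count as numeric

as.numeric("5")      # converts the string "5" to the number 5
as.character(5L)     # converts the integer 5 to the string "5"
```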
Basic data types
1. Vectors are the most basic R data objects. A vector is a sequence
of data elements of the same data type.
The c() function is used to create a vector in R by combining
values into a single vector.
a) X<-c(11,12,11,13)
b) #Print vector
X
c) #Attach labels to the vector elements-option-1
names(X)<-c("A", "B", "C", "D")
d) #Print X
X
#Attach labels to the vector elements-option-2
t<-c(A=11, B=12,C=11,D=13)
Vector Length
length(t)
Coercion of vector elements
• A vector in R can only hold elements of the same type, which means
that users cannot have a vector that contains both logical and
numeric data types. If the user wants to build a mixed vector that
contains both integers and characters, then automatically, R performs
coercion to make sure that the vector contains elements of same
type.
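Coercion can be observed by mixing types inside c(). In this sketch the names mixed and nums are illustrative:

```r
# Mixing numbers, strings and logicals: everything becomes character
mixed <- c(1, "a", TRUE)
mixed
class(mixed)          # "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
nums <- c(1, TRUE, FALSE)
nums
class(nums)           # "numeric"
```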
Vector Arithmetic
• a<-c(50,100,30)
• a*3
• earnings<-c(50,100,80)
• expense<-c(30,40,30)
earnings - expense
earnings + c(10,20,30)
earnings*c(1,2,3)
earnings/c(1,2,3)
• #Calculate the sum of the elements in a vector
• z<-c(5,10,15)
• sum(z)
Vector subsetting
• It is used to break vectors into selected parts and derive a new vector
known as a subset of the original vector.
• W<-c(10,12,14,15)
• W[1]
• W[c(2,4)]
• W<-c(10,12,14,15)
• W[-1]
• W[-c(2,3)]
• Negative (minus) indexing does not work with names
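The limitation with names can be seen in a short sketch. The named vector w is illustrative, and the last line shows one possible workaround (comparing names), which is not taken from the slides:

```r
w <- c(A = 10, B = 12, C = 14, D = 15)

w[-1]                  # drops the first element by position
w[c("B", "D")]         # selects elements by name

# w[-"A"] is an error, because minus cannot be applied to a string.
# One workaround is to keep every element whose name is not "A":
w[names(w) != "A"]
```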
Matrices
• Matrices are the R objects in which the elements are arranged in a
two-dimensional rectangular layout.
• A matrix contains elements of the same atomic type
• To build a matrix, we use the matrix() function
# Create a 2 by 3 matrix with values 1 to 6 and 2 rows
> matrix(1:6, nrow=2)
# Create a 3 by 2 matrix with values 1 to 6 and 2 columns
> matrix(1:6, ncol=2)
• # Fill up the matrix in row-wise fashion
• > matrix(1:6, nrow=2, byrow=TRUE)
• # Pass a vector containing the values 1 to 3 to the matrix function, with 2 rows
and 3 columns; the values are recycled to fill the matrix
• > matrix(1:3, nrow=2, ncol=3)
• # Matrix built from a vector whose length does not fit evenly into the matrix
(R recycles the values and issues a warning)
• > matrix(1:4, nrow=2, ncol=3)
• # Matrix with character elements
• char<-matrix(LETTERS[1:6], nrow=4, ncol=3)
• # Create a matrix whose elements are 12 random numbers between 1 and 15,
nrow=3
r<-matrix(sample(1:15,12),nrow = 3)
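To make the fill order concrete, the sketch below builds the same values column-wise (the default) and row-wise and compares their first rows. The names by_col, by_row and recycled are illustrative:

```r
by_col <- matrix(1:6, nrow = 2)                  # default: filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)    # filled row by row

by_col[1, ]    # first row is 1 3 5
by_row[1, ]    # first row is 1 2 3

# A shorter vector is recycled until all six cells are filled
recycled <- matrix(1:3, nrow = 2, ncol = 3)
dim(recycled)  # 2 rows and 3 columns
```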
Matrices – cbind() and rbind() functions
• > cbind(1:3, 1:3)
• > rbind(1:3, 1:3)
• # Matrix ‘m’, containing the elements 1 to 6
• m<-matrix(1:6, byrow=TRUE, nrow=2)
• # Add the values 7, 8, 9 as a new row using the rbind function
• > rbind(m,7:9)
• # Add the values 10, 11 as a new column using the cbind function
• > cbind(m,c(10,11))
Number of rows and columns in a matrix
• m<-matrix(1:6, nrow=2)
• #number of rows
• nrow(m)
• #number of columns
• ncol(m)
Matrices – naming the matrix
• We use two functions – rownames() and colnames()
• # Matrix m, containing (1:6, byrow=TRUE, nrow=2)
• > m<-matrix(1:6, byrow=TRUE, nrow=2)
• > rownames(m)<-c("row1", "row2")
• > colnames(m)<-c("col1","col2","col3")
• We use one-liner ways of naming matrices while we are creating it.
• We can use dimnames argument of the matrix function
• We need to specify a list that has a vector of row names as the first
element and a vector of columns names as the second element
• #matrix ‘m’, containing the elements 1 to 6
• m<-matrix(1:6, byrow=TRUE, nrow=2, dimnames=list(c("row1",
"row2"), c("col1", "col2", "col3")))
Matrix Subsetting
• # Fill a 3 by 4 matrix with 12 randomly selected values from 1 to 15
• m<-matrix(sample(1:15,12),nrow=3)
• #Select all elements in row 3
• m[3,]
• #Select all elements in col 3
• m[,3]
• #Select the element in row 1 and col 3
• m[1,3]
• #what happens when we decide not to include a comma to clearly
discern between column and row indices
• m[5]
• #select multiple elements
• m[c(1,2), c(2,3)]
• m[c(1,3), c(1,3,4)]
• m[2,c(2,3)]
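Since the matrix above is filled with random values, the sketch below uses a fixed 3 by 4 matrix so the results are predictable. It shows what a single index without a comma returns, and adds one detail not covered on the slides: the drop = FALSE argument, which keeps the result as a matrix:

```r
m <- matrix(1:12, nrow = 3)    # 3 by 4 matrix, filled column by column

m[5]                           # a single index counts down the columns
m[2, 2]                        # the same element, addressed by row and column

m[2, c(2, 3)]                  # selecting from one row returns a plain vector
m[2, c(2, 3), drop = FALSE]    # drop = FALSE keeps it as a 1 by 2 matrix
```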
• rownames(m)<-c("r1", "r2", "r3")
• colnames(m)<-c("c1", "c2", "c3", "c4")
• #Subsetting by names
• m["r2", "c3"] #similar to m[2,3]
• m[2, "c3"]
• m[3, c("c3", "c4")]
Transpose of Matrix
• #t() function can be used
• m<-matrix(1:6,nrow=3)
• t(m)
Practice Questions
• 1. Declare a variable x and assign it the value 5.
• 2. Declare a variable y and assign it the value
"hello".
• 3. List all the variables in the current R workspace
• 4. Clear the current R workspace.
• 5. Create a vector x with the values 1, 2, 3, 4, and 5.
What is the length of x?
• 6. Create a vector y with the values 6, 7, 8, 9, and 10. What is
the sum of the first and last elements of y?
• 7. Create a vector z with the values 11, 12, 13, 14, and 15.
Use subsetting to select the second, fourth, and fifth elements
of z.
• 8. Create a vector A with the values 3, 6, 9, and 12. Use
subsetting and arithmetic operations to select the first and
third elements of A and multiply them together.
• 9. Create a vector b with the values "apple", "banana",
"cherry", "dates", and "oranges". Use subsetting to select the
third and fifth elements of b.
• 10. Create a vector x with the values 1, 2, 3, 4, 5, 6, 7, 8, 9, and
10. Create a new vector y that contains all the elements of x
except the 2nd, 5th, and 7th elements. What is the value of the
4th element of y?
• 11. Create a 2x3 matrix ‘m’ with the values 1, 2, 3, 4, 5, and 6. What
is the number of columns in m?
• 12. Create a 4x4 matrix n with the values 1, 2, 3, ..., 16 in row-major
order (i.e., filling the matrix by rows). Use the ncol function to find
the number of columns in n.
• 13. Create a 3x3 matrix p with the values 1, 2, 3, 4, 5, 6, 7, 8, and 9.
Use subsetting to select the first two rows of p.
• 14. Create a 2x2 matrix q with the values 1, 2, 3, and 4. Use
subsetting to select the second column of q.
• 15. Create a 3x3 matrix r with the values 1, 2, 3, 4, 5, 6, 7, 8, and 9.
Use subsetting and arithmetic operations to select the second and
third columns of r and multiply them element-wise.
• 16. Create a 2x2 matrix S with the values 1, 2, 3, and 4. Transpose S
using the t function.
• 17. Create a 3x3 matrix t with the values 1, 2, 3, 4, 5, 6, 7, 8, and 9.
Use the cbind function to add a fourth column to t with the values 10,
11, and 12.
• 18. Create a 2x2 matrix u with the values 1, 2, 3, and 4. Use the rbind
function to add a third row to u with the values 5 and 6.
• 19. Create a square matrix ‘mat’ with the first nine numbers. Create
another square matrix ‘new’ with elements from 10 to 18. Then
replace the diagonal elements of ‘new’ with those of ‘mat’.
• 20. In the above question, find the off-diagonal elements.
Factors
• In R, the factor() function creates a categorical or factor variable.
Factors represent categorical data, such as levels of a categorical
variable, where the values are discrete and unordered. The factor()
function is commonly used to convert character or numeric variables
into factors.
• The basic syntax for the factor() function is as follows:
factor(x, levels, labels, ordered = FALSE)
x: The input vector that you want to convert into a factor.
levels: An optional argument that specifies the levels of the factor. If not
provided, the unique values of x will be used as levels, sorted in
alphabetical or numerical order.
labels: An optional argument that allows you to assign custom labels to
the levels. If not provided, the levels will be used as labels.
ordered: A logical value indicating whether the factor should be ordered.
By default, it is set to FALSE, indicating an unordered factor.
# Example 1: Creating a factor from a character vector
colors <- c("red", "blue", "green", "red", "blue")
color_factor <- factor(colors)
color_factor
# Example 2: Creating a factor with custom levels and labels
temperature <- c("low", "medium", "high", "low", "high")
temp_factor <- factor(temperature, levels = c("low", "medium", "high"), labels = c("L", "M", "H"))
temp_factor
# Display the structure of the factor
str(temp_factor)
# Summarizing a Factor
summary() is a generic function used to produce summaries of objects;
for a factor, it returns the number of observations at each level.
summary(temp_factor)
table(temp_factor)
Ordered Factors
# Create an ordered factor
temperature <- c("low", "medium", "high", "low", "high")
temp_factor <- factor(temperature, levels = c("low", "medium", "high"),
ordered = TRUE)
temp_factor
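One practical effect of ordering is that levels can be compared. The sketch below repeats the temperature example under the illustrative name temp_ord and contrasts it with the default alphabetical levels:

```r
temperature <- c("low", "medium", "high", "low", "high")
temp_ord <- factor(temperature,
                   levels = c("low", "medium", "high"),
                   ordered = TRUE)

temp_ord[1] < temp_ord[3]     # TRUE: "low" comes before "high"
max(temp_ord)                 # the highest level present: high
table(temp_ord)               # counts are reported in level order

# Without the levels argument, the levels are sorted alphabetically
levels(factor(temperature))   # "high" "low" "medium"
```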
Data Frame
• A data frame is a fundamental data structure for storing data sets. It is
similar to a spreadsheet with rows and columns, where each column is a
vector and different columns can hold different data types.
• It can include numbers, character strings, logical values, and so on.
Creating a data frame
• To create a data frame in R, we can use the data.frame() function. The
basic syntax for creating a data frame is as follows:
data.frame(..., row.names = NULL)
# Create a data frame
id<-1:5
name <- c("A", "B", "C", "D", "E")
age <- c(25, 32, 28, 21, 43)
course <- c("Maths", "Statistics", "Demography", "Economics", "Geography")
df <- data.frame(Id = id, Name = name, Age = age, Course = course)
nrow() returns the number of rows in a data frame or matrix.
ncol() returns the number of columns in a data frame or matrix.
names(df)
Subsetting of Data Frames
#Age of “B”
df[2,3]
df[2,"Age"]
#Retrieve the data of age column
df$Age
The above command returns the age vector inside the data frame.
Further, the double brackets notation with a name or index can also be
used. In all cases, the result is a vector.
df[["Age"]]
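The different ways of extracting a column can be compared on a small data frame. This sketch uses a three-row stand-in for the data frame built earlier, and adds one contrast not shown on the slides: single brackets with a name return a data frame rather than a vector:

```r
df <- data.frame(Id = 1:3,
                 Name = c("A", "B", "C"),
                 Age = c(25, 32, 28))

df$Age            # a vector
df[["Age"]]       # the same vector, using double brackets with a name
df[[3]]           # the same vector, using double brackets with an index

df["Age"]         # single brackets: a data frame with one column
class(df["Age"])  # "data.frame"
```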
Extending Data Frames
height<-c(163, 177, 163, 162, 157)
#Add height column to the data frame
df$height<-height
#Add a row to data frame
new<-data.frame(Id=6, Age=22, Name="F", Course="Sociology",
height=159)
df<-rbind(df,new)
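When rows are added with rbind(), the column names of the new row must match those of the data frame exactly, including case, although the column order may differ. A small sketch with an illustrative two-column data frame:

```r
df <- data.frame(Id = 1:2, Name = c("A", "B"))

# Columns may be given in a different order; they are matched by name
new_row <- data.frame(Name = "C", Id = 3)
df <- rbind(df, new_row)
df

# A mismatched name such as ID instead of Id makes rbind() fail
result <- try(rbind(df, data.frame(ID = 4, Name = "D")), silent = TRUE)
inherits(result, "try-error")   # TRUE
```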
Delete Row and Column
#Delete a column by index (here the fifth column, height)
df<-df[,-5]
# Delete a row by index (RowNumber is a placeholder for the row to remove)
df <- df[-RowNumber, ]
Updating Row
# Update values in the third row
df[3, "Name"] <- "NewName"
df[3, "Age"] <- 30
Updating the name of a column
# Updating the name of a column
names(df)[names(df) == "Course"] <- "Subject"
# Update the names of two columns
colnames(df)[names(df) %in% c("Name", "Subject")] <-
c("FullName", "Course")
Lists
• A list in R can contain different kinds of objects, such as strings,
numbers, and vectors. It can also include another list within it.
• A list can be created using the list() function, which takes in different
R objects and stores them together in a single object.
#Create a list containing strings, numbers, vectors and a logical value
list_data<-list("Red", "Green", c(21,32,13), TRUE, 51.23, 119.1)
#Create a list containing a vector, matrix and a list
list_new<-list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8),nrow =
2),list("green", 12.3))
#Give names to the elements in the list
names(list_new)<-c("1st quarter", "A_Matrix", "A Inner list")
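Once a list has names, its elements can be retrieved by name or by position. The sketch below rebuilds list_new from the example above; the access patterns are an added illustration, not part of the slides:

```r
list_new <- list(c("Jan", "Feb", "Mar"),
                 matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                 list("green", 12.3))
names(list_new) <- c("1st quarter", "A_Matrix", "A Inner list")

list_new$A_Matrix                  # $ works for names without spaces
list_new[["1st quarter"]]          # names with spaces need [[ ]] and quotes
list_new[["A Inner list"]][[2]]    # second element of the inner list: 12.3

list_new[1]                        # single brackets return a list of length 1
```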