An Introduction to R
Sandip Mukhopadhyay
What is R?
R is a scripting/programming language and
environment for statistical computing, data
science and graphics.
R is a successor of the proprietary statistical
computing programming language S.
It is an important tool for computational
statistics, visualization and data science.
WHAT IS R?
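GNU project; R implements the S language, which was developed by John Chambers and colleagues at Bell Labs
R itself was created by Ross Ihaka and Robert Gentleman at the University of Auckland
Free software environment for statistical computing and graphics
Functional programming language written primarily in C, Fortran and R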
HISTORY AND EVOLUTION OF R
R has developed from the S language
S Version 1
S Version 2
S Version 3
S Version 4
Developed 30 years ago for research
applied to the high-tech industry
HISTORY AND EVOLUTION OF R
The regular development of R
1990’s: R developed
concurrently with S
1993: R made public
Acceleration of R development
R-help and R-devel mailing lists
Creation of the R Core Group
HISTORY AND EVOLUTION OF R
Growing number of packages
2001: ~100 packages
Today: Over 10152 packages
2000: R version 1.0.1
Today: R version 3.6.1
Source: R Journal Vol 1/2
Reasons to learn R
• Free, Open source
• Preferred option in academia and research
• Great visualization
• Advanced statistics
Reasons to learn R
• Supportive open source community
• Easy extensibility via packages
• Many find it easier to learn compared to
Python
Limitations of R
• Limited scalability: by default, data sets are held entirely in memory
• Less widely adopted in industrial applications
than its peer, Python
• R is used mainly for data science,
while Python has wider usage
• Python is easier to deploy in commercial
settings
RStudio
RStudio is a widely used IDE for writing, testing and executing R
code. A typical RStudio screen has the following parts:
Console: where commands are run and the output is shown
Syntax editor: where we can write the code
Workspace tab: where users can see the active objects created by the code run in
the console
History tab: shows a history of the commands that have been run
Files tab: shows the folders and files in the default workspace
Plots tab: shows graphs
Packages tab: shows the add-ons and packages required for running specific
processes
Help tab: contains information on the IDE, commands, etc.
[Figure: RStudio window with four panes labelled Syntax editor, History, Console, and Help / Viewer]
Packages in R
A package in R is the fundamental unit of
shareable code. It is a collection of the
following:
• Functions
• Data sets
• Compiled code
• Documentation for the package and for the functions
inside
Packages which are not part of core R need to be installed once.
An installed package also needs to be loaded in every new session before it is used:
library("ggplot2")
A few commands to get started
packageDescription("ggplot2")
help(package = "ggplot2")
find.package("ggplot2")
install.packages("ggplot2")
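A minimal sketch of the install-then-load pattern. It uses the "stats" package only because it ships with R, so nothing is downloaded; the variable names pkg, desc and pkg_path are illustrative assumptions, and any package name could be substituted.

```r
# Hypothetical choice of package: "stats" is bundled with R.
pkg <- "stats"

# Install only if the package is not already available.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package for the current session.
library(pkg, character.only = TRUE)

# Inspect the installed package.
desc <- packageDescription(pkg)   # metadata such as name and version
pkg_path <- find.package(pkg)     # folder where the package is installed
desc$Version
```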
Some basics about R coding
• R statements or commands can be separated by a semicolon (;) or a
new line.
• The assignment operator in R is "<-" (Although "=" also works)
• All characters after # are treated as comments.
• Single quotes ' ' and double quotes " " work
similarly for writing strings
• Parentheses ( ) enclose function arguments and group expressions, square brackets [ ]
index into objects, and curly braces { } group statements in function bodies, loops
and conditionals.
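The basics above can be sketched in a few lines. All variable names here are made up for illustration.

```r
x <- 5; y = 10              # two statements on one line; both <- and = assign
total <- x + y              # everything after # is a comment

s1 <- 'hello'               # single quotes
s2 <- "hello"               # double quotes
same <- identical(s1, s2)   # both create the same string

v <- c(10, 20, 30)          # parentheses: function call
second <- v[2]              # square brackets: indexing

if (total > 10) {           # curly braces: group statements
  msg <- "large"
} else {
  msg <- "small"
}
```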
Functions and Help in R
• There are over 1,000 functions at the core of R, and new R functions
are created all the time.
• Each R function comes with its own help page. To access a function’s
help page, type a question mark followed by the function’s name in
the console.
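A short sketch of looking up help, using mean() as an arbitrary example function. The interactive commands are shown as comments because they open a help viewer; the two lines below them are assumptions about how one might inspect a function from a script.

```r
# In an interactive session, either line opens the help page for mean():
#   ?mean
#   help("mean")

# Script-friendly ways to inspect the same function:
mean_args <- names(formals(mean))   # argument names of mean()
help_paths <- help("mean")          # help topic lookup, not displayed when assigned
mean_args
```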
Reference materials / other R resources
1. R-blogs: https://www.r-bloggers.com
2. R tutorials: https://www.programiz.com/r-programming/
3. R video book: https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/
4. Stack Overflow
5. RPubs
Operators in R
Commonly used functions in R
Summary : what we have learnt
• Four types of operators in R are arithmetic,
relational, logical, and assignment.
• Two types of conditional statements in R are
if…else and nested if…else.
• Three types of loops in R are for loop, while
loop, and repeat loop.
• The commonly used functions in R
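The operator, conditional and loop types listed in the summary can be sketched as follows. The values of a and b and all variable names are illustrative assumptions.

```r
a <- 7; b <- 2                                   # assignment operator

# Arithmetic operators: + - * / %% (remainder) %/% (integer division) ^
arith <- c(a + b, a - b, a * b, a / b, a %% b, a %/% b, a^b)

rel  <- a > b                                    # relational operator
logi <- (a > 5) & (b > 5)                        # logical operator

# Nested if...else
if (a > b) {
  larger <- "a"
} else {
  if (a < b) {
    larger <- "b"
  } else {
    larger <- "equal"
  }
}

# for loop
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while loop
i <- 1; total_while <- 0
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat loop: runs until break is reached
i <- 1; total_repeat <- 0
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) break
}
```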
Types of Data Structure in R
• Scalars – single values; in R a scalar is a vector of length one
• Vectors – an ordered sequence of values of the same type; one dimensional
• Matrices - These are two-dimensional data structures
• Arrays - Similar to matrices; these can have more than two
dimensions.
• Data frames - These are the most commonly used data structures in
R. A data frame is similar to a general matrix, but it can contain
different modes of data, such as a number and character.
• Lists - These are the most complex data structures. A list may contain
a combination of vectors, matrices, data frames, and even other lists.
Types of Data Structure in R : Scalar
Scalars – single values; in R a scalar is a vector of length one
Example:
f <- 3 # numeric
f
g <- "US" # text
g
h <- TRUE # logical
h
Types of Data Structure in R : Vector
Vectors – an ordered sequence of values of the same type; one dimensional.
Example:
a <- c(1, 2, 5, 3, 6, -2, 4)
a
b <- c("one", "two", "three")
b
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
c
Vectors
• Vectors are stored like arrays in C
• Vector indices begin at 1
• All vector elements must have the same mode, such as logical, integer,
numeric (floating point number), complex or character (string).
Create a vector of numbers
The c function (c is short for combine) creates a new vector consisting of three
values: 4, 7, and 8.
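A minimal sketch of the call described above; the variable name x is an assumption.

```r
x <- c(4, 7, 8)   # combine three values into one numeric vector
x                 # print the vector
length(x)         # number of elements
```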
Vectors
A vector cannot hold values of different data types.
Consider the example below. We are trying to place
integer, string and boolean values together in a
vector.
Note: All the values are converted to the same data
type, i.e. “character”.
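The conversion can be sketched as follows; the values and the variable name mixed are illustrative.

```r
# An integer, a string and a logical value placed in one vector
mixed <- c(1L, "two", TRUE)
mixed          # every element is now a character string
class(mixed)   # "character"
```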
Vectors
Accessing the value(s) in the vector
Create a variable named “VariableSeq” and assign to it a vector
consisting of string values.
• To access values in a vector, specify the indices at which the values are
present in the vector. Indices start at 1.
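A small sketch of index-based access. The strings stored in VariableSeq are hypothetical, since the original values are not shown.

```r
VariableSeq <- c("alpha", "beta", "gamma", "delta")   # hypothetical values

first_value <- VariableSeq[1]         # single element; indices start at 1
picked      <- VariableSeq[c(2, 4)]   # elements at positions 2 and 4
middle      <- VariableSeq[2:3]       # a range of positions
```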
Types of Data Structure in R : Matrices
Matrices - These are two-dimensional data structures
Example:
vector <- c(1,2,3,4)
f <- matrix(vector, nrow=2, ncol=2)
f
     [,1] [,2]
[1,]    1    3
[2,]    2    4
Matrices
To access the 2nd column of the matrix, simply provide the column number and
omit the row number.
To access the 2nd and 3rd columns of the matrix, simply provide the column
numbers and omit the row number.
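Column access can be sketched with a small 2 x 3 matrix; the matrix m and its contents are assumptions made for illustration.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column

col2   <- m[, 2]     # 2nd column: row index omitted
cols23 <- m[, 2:3]   # 2nd and 3rd columns, returned as a 2 x 2 matrix
```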
Types of Data Structure in R : Arrays
Arrays - Similar to matrices; these can have more than two dimensions.
a <- matrix(c(1,1,1,1) , 2, 2)
b <- matrix(c(2,2,2,2) , 2, 2)
x <- array(c(a,b), c(2,2,2))
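As a sketch of how the resulting array can be inspected, the same objects are rebuilt below and indexed with three subscripts (row, column, layer). The names second_layer and one_value are illustrative.

```r
a <- matrix(c(1, 1, 1, 1), 2, 2)
b <- matrix(c(2, 2, 2, 2), 2, 2)
x <- array(c(a, b), c(2, 2, 2))

dim(x)                       # 2 2 2
second_layer <- x[, , 2]     # the 2 x 2 layer that came from b
one_value    <- x[1, 2, 1]   # row 1, column 2 of the first layer
```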
Types of Data Structure in R : Data frames
Data frames - These are the most commonly used data structures in R.
A data frame is similar to a general matrix, but it can contain different
modes of data, such as a number and character.
name <- c("Ram", "Laxman", "Sita", "Urmila")
gender <- c("M", "M", "F", "F")
age <- c(27, 26, 25, 24)
df <- data.frame(name, gender, age)
df
Data Frames
Think of a data frame as something akin to a database table or an Excel
spreadsheet.
Create a data frame
• First create three vectors, “EmpNo”, “EmpName” and “ProjName”
• Then create a data frame, “Employee”
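The two steps can be sketched as follows. The employee numbers, names and project names are invented placeholder values.

```r
# Step 1: three vectors of equal length (hypothetical values)
EmpNo    <- c(1001, 1002, 1003)
EmpName  <- c("Asha", "Ravi", "Meena")
ProjName <- c("Alpha", "Beta", "Gamma")

# Step 2: combine them column-wise into a data frame
Employee <- data.frame(EmpNo, EmpName, ProjName)
Employee
```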
Types of Data Structure in R : Lists
Lists - These are the most complex data structures. A list may contain a
combination of vectors, matrices, data frames, and even other lists.
Example:
vec <- c(1,2,3,4)
mat <- matrix(vec,2,2)
x <- list (vec, mat)
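A sketch of reading items back out of the list. Double square brackets return the stored object itself; the named list at the end is an added illustration, not part of the original example.

```r
vec <- c(1, 2, 3, 4)
mat <- matrix(vec, 2, 2)
x <- list(vec, mat)

first_item <- x[[1]]         # the vector
mat_value  <- x[[2]][1, 2]   # row 1, column 2 of the matrix

# Items can also be named and then accessed with $
named <- list(numbers = vec, grid = mat)
named$numbers
```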
Data Frame Access
There are two ways to access the content of data frames:
• By providing the index number in square brackets.
Example:
• By providing the column name as a string in double
square brackets.
Example:
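Both access styles can be sketched on a small data frame. The data frame below is a cut-down, illustrative version of the earlier example.

```r
name <- c("Ram", "Laxman", "Sita", "Urmila")
age  <- c(27, 26, 25, 24)
df   <- data.frame(name, age)

by_index <- df[2]          # index in square brackets: a one-column data frame
by_name  <- df[["age"]]    # column name in double square brackets: a vector
cell     <- df[1, 2]       # row 1, column 2
```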
A few R functions for understanding data in data frames
• dim()
dim() function is used to obtain the dimensions of a data frame.
• nrow()
nrow() function returns the number of rows in a data frame.
• ncol()
ncol() function returns the number of columns in a data frame.
• str()
str() function compactly displays the internal structure of R objects.
• summary()
summary() function returns result summaries for each column of the
dataset.
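These functions can be tried on a small data frame; the one below is an illustrative assumption.

```r
df <- data.frame(
  name   = c("Ram", "Laxman", "Sita", "Urmila"),
  gender = c("M", "M", "F", "F"),
  age    = c(27, 26, 25, 24)
)

dims   <- dim(df)    # rows and columns together
n_rows <- nrow(df)   # number of rows
n_cols <- ncol(df)   # number of columns
str(df)              # column types and first few values
summary(df)          # per-column summaries
```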
A few R functions for understanding data in data frames
• head()
head() function is used to obtain the first n observations, where n is set to 6 by
default.
• tail()
tail() function is used to obtain the last n observations, where n is set to 6 by
default.
• edit()
edit() function invokes a text editor on the R object.
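A sketch of head() and tail() on a ten-row data frame made up for illustration. edit() is left out because it opens an interactive editor.

```r
df <- data.frame(id = 1:10, value = (1:10)^2)

first_six  <- head(df)       # default n = 6
last_three <- tail(df, 3)    # explicit n = 3
```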
Import and export of data in R
• Importing data from and exporting data to .csv files
• Two very important functions
• read.csv() – reads a .csv file from a specified path into a data frame
• write.csv() – writes a data frame to a .csv file (in the working directory unless a path is given)
• read.csv() is a special case of read.table()
• write.csv() is a special case of write.table()
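A round trip through a .csv file can be sketched as follows. The file name people.csv and the data are assumptions, and the file is written to a temporary folder so that nothing in the working directory is touched.

```r
df <- data.frame(name = c("Ram", "Sita"), age = c(27, 25))

# Hypothetical file location inside the session's temporary folder
csv_path <- file.path(tempdir(), "people.csv")

write.csv(df, csv_path, row.names = FALSE)   # export without row numbers
df_back <- read.csv(csv_path)                # import into a new data frame
df_back
```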
Reading Spreadsheets
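read.xlsx("filename", ...)
where the filename argument defines the path of the file to be read; the
dots "..." stand for the other optional arguments.
read.xlsx() is not part of base R; it is provided by add-on packages such as xlsx and openxlsx.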
Working with directory
getwd()
getwd() command returns the absolute filepath of the current working
directory.
setwd()
setwd() command resets the current working directory to another
location as per users’ preference.
dir()
This function returns a character vector of the names of files or
directories in the named directory.
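The three directory functions can be sketched together. The temporary folder and the file name example.txt are assumptions chosen so that the example leaves the original working directory unchanged.

```r
old_dir <- getwd()            # remember the current working directory

setwd(tempdir())              # switch to the session's temporary folder
file.create("example.txt")    # hypothetical file, created so dir() has something to list
files <- dir()                # names of files and folders in the working directory

setwd(old_dir)                # switch back to the original location
files
```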
version
Use version to view the version of R that is running.