R Language 1st Unit Deep
UNIT-I:
Introduction, How to run R, R Sessions and Functions, Basic Math, Variables, Data
Types, Vectors, Conclusion, Advanced Data Structures, Data Frames, Lists, Matrices,
Arrays, Classes
Introduction:
R is a programming language and software environment for statistical computing, data analysis,
scientific research, graphical representation and reporting.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand,
and is currently developed by the R Development Core Team.
R is an interpreted programming language: code is executed one line at a time, which makes debugging
easy.
R allows integration with procedures written in C, C++, .NET, Python or FORTRAN for
efficiency.
R is free, open-source software distributed under a GNU-style copyleft licence, and is an official part of the GNU project, where it is called
GNU S.
Features of R:
R is a programming language and software environment for statistical computing, data analysis,
scientific research, graphical representation and reporting.
Why use R
• R is an open source programming language and software environment for statistical computing
and graphics.
• R is an object-oriented programming environment, much more so than most other statistical
software packages.
• R is a comprehensive statistical platform offering all manner of data-analytic techniques; almost any type of
data analysis can be done in R.
• R has state-of-the-art graphics capabilities for visualizing complex data.
• R is a powerful platform for interactive data analysis and exploration.
• R can bring data from multiple sources into a usable form.
• R functionality can be integrated into applications written in other languages, including C++, Java,
Python, PHP, SAS and SPSS.
• R runs on a wide array of platforms, including Windows, Unix and Mac OS X.
• R is extensible: it can be expanded by installing “packages”.
STATISTICS WITH R PROGRAMMING
UNIT- 1: Getting Started with R &Advanced Data Structures
1. Data Science
Harvard Business Review named data scientist the "sexiest job of the 21st century". Glassdoor
named it the "best job of the year" for 2016. With the advent of IoT devices creating terabytes
and terabytes of data that can be used to make better decisions, data science is a field that has no
other way to go but up.
Most courses on data science include R in their curriculum because it is the data scientist’s
favourite tool.
2. Statistical computing
R is the most popular programming language among statisticians. In fact, it was initially built by
statisticians for statisticians. It has a rich package repository with more than 9100 packages with
every statistical function you can imagine.
3. Machine Learning
R has found a lot of use in predictive analytics and machine learning. It has various packages for
common ML tasks like linear and non-linear regression, decision trees, linear and non-linear
classification and many more.
Everyone from machine learning enthusiasts to researchers uses R to implement machine learning
algorithms in fields like finance, genetics research, retail, marketing and health care.
Alternatives to R programming:
Downloading and Installing R
• R is freely available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org
• Precompiled binaries are available for Linux, Mac OS X and Windows.
• The latest R release at the time of writing is R-3.4.0.
• Installing R on Windows and Mac is just like installing any other program.
• Install RStudio, a free IDE for R, from http://www.rstudio.com/
• If we install both R and RStudio, we only need to run RStudio.
• R is case-sensitive.
• R scripts are simply text files with a .R extension.
R also provides the help.search() function for a search-engine style search of the documentation. The
?? operator is a shortcut for it.
> help.search("histograms")
> ??"histograms"
R sessions
Starting an R session
R programming can be done in two ways. We can either type commands interactively
inside an "R session", or save the commands in a "script" file and execute the whole file inside
R. First we will learn about the R session.
To start an R session, type 'R' at the command line in Windows or Linux. For example, from the shell
prompt '$' in Linux, type
$ R
This generates the following output before showing the '>' prompt of R:
R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
At the '>' prompt we can type R commands. For example, if we type
> help("if")
then the help page for the "if" statement is displayed.
When we work in R, the R objects we create and load are stored in a portion of memory
called the workspace. When we quit R (for example with q()), R asks whether to save the workspace. If we say 'no', all these objects are wiped out from
the workspace memory. If we say 'yes', they are saved into a file called ".RData", which is written to the
present working directory.
In Linux, this "working directory" is generally the directory from which R was started with the
command 'R'. In Windows, it can be either "My Documents" or the user's home directory.
When we start R in the same directory next time, the workspace and all the created objects are
restored automatically from this ".RData" file.
The current working directory can be displayed with the getwd() function:
> getwd()
[1] "/home/user"
Similarly, we can set the current working directory by calling the setwd() function:
> setwd("/home/user/prog")
In the Windows version of R, the working directory can also be set from the menu in the R window.
If we need information on a specific file, we can use the file.info("filename") command. This prints the
file's size, modification time and other details on the screen.
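The three functions above can be tried together in a short sketch. The file name demo.txt and the use of the session's temporary directory (tempdir()) are illustrative assumptions, chosen so that the example does not touch any real project folder.

```r
# Sketch: inspect and change the working directory, then look up a file.
# tempdir() and "demo.txt" are illustrative choices, not part of the notes.
old_dir <- getwd()                 # remember where we started
setwd(tempdir())                   # switch to the session's temporary directory
writeLines("hello", "demo.txt")    # create a small file to inspect
info <- file.info("demo.txt")      # one-row data frame of file attributes
print(info$size)                   # size in bytes
print(info$isdir)                  # FALSE: it is a regular file
setwd(old_dir)                     # restore the original working directory
```

Restoring the old directory at the end keeps later commands from silently reading or writing in the wrong place.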
Comments
Comments are explanatory text in an R program; they are ignored by the interpreter while
executing the program. A single-line comment is written using # at the beginning of the statement as
follows:
# My first program in R Programming
R does not support multi-line comments, but a common workaround is to wrap the text in an if (FALSE) { } block, which is never executed.
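A minimal sketch of the multi-line comment workaround follows. The variable name greeting is only an illustration. Note that the text inside the block must still be a valid R string (or valid R code), because the parser reads it even though it is never run.

```r
# Workaround sketch: the block below is parsed but never executed.
if (FALSE) {
  "This text acts as a multi-line comment.
   It must be a valid R string or valid R code,
   because the parser still reads it."
}
greeting <- "Hello, World!"
print(greeting)
```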
> # If there are more than 1 item, we can concatenate using paste()
> print(paste("How","are","you?"))
[1] "How are you?"
R Variables and Constants
Variables in R
A variable is a name for a memory location that stores data, and its value can be
changed as needed. The unique name given to a variable (or to a function or object)
is called an identifier.
Valid identifiers in R
1. total, 2. Sum, 3. .fine, 4. with.dot, 5. this_is_acceptable, 6. Number5
Invalid identifiers in R
1. tot@l, 2. .5um, 3. _fine, 4. TRUE, 5. .4ne
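The rule behind these lists is that an identifier may contain letters, digits, periods and underscores, must start with a letter or with a period that is not followed by a digit, and must not be a reserved word. One way to check this in code is sketched below. The helper is_valid_name() is hypothetical (it is not a base R function); it relies on make.names(), which rewrites any string that is not a syntactically valid name.

```r
# is_valid_name() is a hypothetical helper, not a base R function.
# make.names() rewrites anything that is not a syntactically valid name,
# so a name is valid exactly when make.names() leaves it unchanged.
is_valid_name <- function(x) x == make.names(x)

valid   <- c("total", "Sum", ".fine", "with.dot", "this_is_acceptable", "Number5")
invalid <- c("tot@l", ".5um", "_fine", "TRUE", ".4ne")

print(is_valid_name(valid))     # all TRUE
print(is_valid_name(invalid))   # all FALSE
```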
Constants in R:
Constants refer to fixed values. They are also called literals. The basic types of constants are:
1. Numeric Constants
2. Character Constants
3. Built-in Constants
Numeric Constants
All numbers fall under this category. They can be of type integer, double or complex.
The type can be checked with the typeof() function (class() reports "numeric" for doubles).
Numeric constants followed by L are regarded as integer and those followed by i are regarded
as complex.
> a = 5
> class(a)
[1] "numeric"
> b = 5L
> class(b)
[1] "integer"
> c = 10+3i
> class(c)
[1] "complex"
Numeric constants preceded by 0x or 0X are interpreted as hexadecimal numbers.
> 0xff
[1] 255
Character Constants
Character constants are written inside single quotes (') or double quotes (").
> class("a")
[1] "character"
Built-in Constants
Some of the built-in constants defined in R, along with their values, are shown below.
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> pi
[1] 3.141593
> month.name
[1] "January" "February" "March" "April" "May" "June"
[7] "July" "August" "September" "October" "November" "December"
> month.abb
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
But it is not good to rely on these, as they are implemented as variables whose values
can be changed.
> pi
[1] 3.141593
> pi <- 56
> pi
[1] 56
R - Data Types
Variables are simply names for reserved memory locations that store values. In R, variables are not
declared with a data type. Instead, a variable is assigned an R-object, and the data type of the
R-object becomes the data type of the variable. There are many types of R-objects. The frequently used
ones are
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
R has 5 basic atomic vector classes (data types); a sixth, raw, is rarely needed:
logical (e.g., TRUE, FALSE)
numeric (real or decimal) (e.g., 2, 2.0, pi)
integer (e.g., 2L, as.integer(3))
complex (e.g., 1 + 0i, 1 + 4i)
character (e.g., "a", "swc")
Complex (e.g., 3 + 2i)
> v <- 2+5i
> print(class(v))
[1] "complex"
Character (e.g., 'a', "good", "TRUE", '23.4')
> v <- "TRUE"
> print(class(v))
[1] "character"
Raw (e.g., "Hello" is stored as 48 65 6c 6c 6f)
> v <- charToRaw("Hello")
> print(class(v))
[1] "raw"
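As a quick check of all six atomic types in one place, the sketch below prints both class() and typeof() for one sample value of each. The sample values are arbitrary illustrations.

```r
# One sample value per atomic type (the raw value has length 5, the others length 1).
values <- list(TRUE, 23.5, 2L, 2+5i, "TRUE", charToRaw("Hello"))
for (v in values) {
  cat(class(v), "/", typeof(v), "\n")
}
```

The only row where the two functions disagree is the second one: class() reports "numeric" while typeof() reports "double".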
Type conversion:
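Converting data from one data type to another is called type
conversion. There are two kinds:
1. Implicit type conversion (coercion), which R performs automatically when values of different types are combined
2. Explicit type conversion, which we request with the as.*() functions
> x = c(1,2,3,4,5,6)
> as.numeric(x)
[1] 1 2 3 4 5 6
> as.logical(x)
[1] TRUE TRUE TRUE TRUE TRUE TRUE
> as.character(x)
[1] "1" "2" "3" "4" "5" "6"
> as.complex(x)
[1] 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i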
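The difference between the two kinds of conversion can be sketched as follows. The variable names are illustrative. The general ordering used by implicit coercion is logical, then integer, then double, then complex, then character.

```r
# Implicit conversion: c() coerces everything to the most general type present.
mixed_num <- c(TRUE, 2L, 3.5)     # logical and integer become double
mixed_chr <- c(1, "a", TRUE)      # everything becomes character
print(class(mixed_num))           # "numeric"
print(mixed_chr)                  # "1" "a" "TRUE"

# Explicit conversion: as.*() functions; values that cannot be converted become NA.
print(as.integer("42"))                      # 42
print(suppressWarnings(as.numeric("abc")))   # NA
print(as.logical(c(0, 1, 2)))                # FALSE TRUE TRUE
```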
Vectors are created by combining values with the c() function.
EX:
> x<-c(TRUE, TRUE, FALSE, FALSE)
> print(x)
[1] TRUE TRUE FALSE FALSE
> class(x)
[1] "logical"
> length(x)
[1] 4
> str(x)
logi [1:4] TRUE TRUE FALSE FALSE
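(Avoid the abbreviations T and F; they are ordinary variables that can be reassigned.)
A sequence with a fixed step can be created with the seq() function:
> seq(1, 10, by = 0.1)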
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4
[16] 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9
[31] 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4
[46] 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
[61] 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4
[76] 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9
[91] 10.0
How to access Elements of a Vector?
Elements of a vector can be accessed using vector indexing. The vector
used for indexing can be logical, integer or character vector.
Using integer vector as index
Vector indexing in R starts from 1, unlike most programming languages, where indexing
starts from 0.
> x<-c(1,2,3,4,5,6,7,8,9)
> print(x)
[1] 1 2 3 4 5 6 7 8 9
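> x[-1]   # access all elements except the 1st
[1] 2 3 4 5 6 7 8 9
> x[c(2, 4)]   # access the 2nd and 4th elements
[1] 2 4
Using logical vector as index
When a logical vector is used for indexing, the positions where it is TRUE are returned.
> x<-c(1,2,3,4,5,6,7,8,9)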
> print(x)
[1] 1 2 3 4 5 6 7 8 9
> x[c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE)]
[1] 1 3 5 7
> x[x > 0]
[1] 1 2 3 4 5 6 7 8 9
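The integer and logical indexing forms described above can be compared side by side in a small sketch. The vector marks is hypothetical data.

```r
marks <- c(90, 75, 82, 68, 55)    # hypothetical data

print(marks[2])               # single position: 75
print(marks[c(1, 3)])         # several positions: 90 82
print(marks[-1])              # all but the first: 75 82 68 55
print(marks[2:4])             # a range: 75 82 68
print(marks[marks > 70])      # a condition builds a logical index: 90 75 82
print(marks[c(TRUE, FALSE)])  # a short logical index is recycled: 90 82 55
```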
Using character vector as index
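This type of indexing is useful when dealing with named vectors. We can name each
element of a vector. (In the example below the number 4 is coerced to the string "4", because a vector holds values of only one type.)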
> x<-c("name"="sailu","age"=4)
> names(x)
[1] "name" "age"
> x["name"]
name
"sailu"
> x["age"]
age
"4"
How to modify a vector in R?
We can modify a vector using the assignment operator together with the indexing methods shown above.
If we want to truncate the vector, we can reassign it to a subset of itself.
> x<-c(1,2,3,4,5,6)
> print(x)
[1] 1 2 3 4 5 6
> print(x[2])
[1] 2
> x[2]<-200
> print(x)
[1] 1 200 3 4 5 6
A vector can be deleted by assigning NULL to it.
> x<-c(1,2,3,4,5,6,7,8,9)
> print(x)
[1] 1 2 3 4 5 6 7 8 9
> x<-NULL
> print(x)
NULL
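A short sketch of the modification patterns mentioned in this section (replace, conditional replace, truncate, append). The values are illustrative.

```r
x <- c(10, 20, 30, 40, 50, 60)
x[2] <- 200            # replace one element
x[x < 35] <- 0         # replace every element that meets a condition
print(x)               # 0 200 0 40 50 60
x <- x[1:4]            # truncate: keep only the first four elements
print(x)               # 0 200 0 40
x <- c(x, 99)          # append a value at the end
print(length(x))       # 5
```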
R Matrix:
In this section you will learn how to work with matrices in R: how to create and modify a matrix, and how to
access matrix elements.
1. A matrix is a two-dimensional data structure in R programming.
2. A matrix is similar to a vector but additionally contains the dimension attribute. All
attributes of an object can be checked with the attributes() function (the dimension can
be checked directly with the dim() function).
3. We can check whether a variable is a matrix with the class() function.
A matrix is created with the matrix() function:
> matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
We can see that the matrix is filled column-wise. This can be reversed to row-wise
filling by passing TRUE to the argument byrow.
> matrix(1:9, nrow=3, byrow=TRUE) # fill matrix row-wise
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
It is possible to name the rows and columns of matrix during creation by passing a 2
element list to the argument dimnames.
> x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"),
c("A","B","C")))
> x   # or print(x)
  A B C
X 1 4 7
Y 2 5 8
Z 3 6 9
These names can be accessed or changed with two helpful
functions colnames() and rownames().
> colnames(x)
[1] "A" "B" "C"
> rownames(x)
[1] "X" "Y" "Z"
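It is also possible to change the names.
> colnames(x) <- c("C1","C2","C3")
> rownames(x) <- c("R1","R2","R3")
> x
   C1 C2 C3
R1  1  4  7
R2  2  5  8
R3  3  6  9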
Another way of creating a matrix is by using functions cbind() and rbind() as in
column bind and row bind.
> cbind(c(1,2,3),c(4,5,6))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> rbind(c(1,2,3),c(4,5,6))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
Finally, you can also create a matrix from a vector by setting its dimension
using dim().
> x <- c(1,2,3,4,5,6)
> x
[1] 1 2 3 4 5 6
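> class(x)
[1] "numeric"
> dim(x) <- c(2,3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> class(x)
[1] "matrix"
(In R 4.0.0 and later, class(x) returns "matrix" "array".)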
How to access Elements of a matrix?
We can access elements of a matrix using the square bracket [ indexing
method. Elements can be accessed as var[row, column]. Here rows and columns
are vectors.
Using integer vector as index
We specify the row numbers and column numbers as vectors and use it for indexing.
If any field inside the bracket is left blank, it selects all.
We can use negative integers to specify rows or columns to be excluded.
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[,]   # leaving both the row and column fields blank selects the entire matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
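The examples that follow use a different matrix:
> x <- matrix(c(4,6,1,8,0,2,3,7,9), nrow = 3)
> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9
> x[1:4]   # a single index treats the matrix as a vector, column by column
[1] 4 6 1 8
Using logical vector as index
> x[c(TRUE,FALSE,TRUE),c(TRUE,TRUE,FALSE)]
     [,1] [,2]
[1,]    4    8
[2,]    1    2
> x[c(TRUE,FALSE),c(2,3)]   # the two-element logical vector is recycled to three elements
     [,1] [,2]
[1,]    8    3
[2,]    2    9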
It is also possible to index using a single logical vector where recycling takes place if
necessary.
> x[c(TRUE, FALSE)]
[1] 4 1 0 3 9
In the above example, the matrix x is treated as vector formed by stacking columns of
the matrix one after another, i.e., (4,6,1,8,0,2,3,7,9).
The indexing logical vector is also recycled and thus alternating elements are
selected. This property is utilized for filtering of matrix elements as shown below.
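> x[x>5]   # select elements greater than 5
[1] 6 8 7 9
Using character vector as index
Indexing by name is possible for a matrix with named rows or columns.
> colnames(x) <- c("A","B","C")
> x[,"A"]
[1] 4 6 1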
> x[TRUE,c("A","C")]
A C
[1,] 4 3
[2,] 6 7
[3,] 1 9
> x[2:3,c("A","C")]
A C
[1,] 6 7
[2,] 1 9
How to modify a matrix in R?
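We can combine the assignment operator with the methods learned above for accessing
elements of a matrix to modify it.
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
The dimensions of a matrix can also be changed with dim(), as long as the number of elements stays the same. For example, with a matrix of six elements:
> x <- matrix(1:6, nrow = 2)
> dim(x) <- c(1,6)   # change to 1X6 matrix
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6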
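Element-level modification of a matrix can be sketched as below, starting from the 3x3 matrix of the numbers 1 to 9. The particular values assigned are illustrative; cbind(), rbind() and t() are standard base R functions.

```r
x <- matrix(1:9, nrow = 3)
x[2, 2] <- 10                  # modify a single element
x[x < 5] <- 0                  # modify every element less than 5
print(x)
print(t(x))                    # transpose (rows become columns)
x <- cbind(x, c(1, 2, 3))      # add a column
x <- rbind(x, c(1, 2, 3, 4))   # add a row
x <- x[1:2, ]                  # remove rows by keeping only the first two
print(dim(x))                  # 2 4
```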
Matrix Computations:
Various mathematical operations can be performed on matrices using the R
operators. The matrices involved must have the same dimensions, and the result of the operation is also a matrix.
Matrix Addition & Subtraction
# Create two 2x3 matrices.
matrix1<- matrix(c(3,9,-1,4,2,6),nrow=2)
print(matrix1)
matrix2 <- matrix(c(5,2,0,9,3,4),nrow=2)
print(matrix2)
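# Add the matrices.
a <- matrix1 + matrix2
print(a)
Result of addition
     [,1] [,2] [,3]
[1,]    8   -1    5
[2,]   11   13   10
# Subtract the matrices.
print(matrix1 - matrix2)
Result of subtraction
     [,1] [,2] [,3]
[1,]   -2   -1   -1
[2,]    7   -5    2
Matrix Multiplication & Division
The * and / operators work element by element.
# Multiply the matrices.
print(matrix1 * matrix2)
Result of multiplication
     [,1] [,2] [,3]
[1,]   15    0    6
[2,]   18   36   24
# Divide the matrices.
print(matrix1 / matrix2)
Result of division
     [,1]      [,2]      [,3]
[1,]  0.6      -Inf 0.6666667
[2,]  4.5 0.4444444 1.5000000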
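Because the multiplication result above is computed element by element, it is not the matrix product of linear algebra. The sketch below contrasts the two, using the %*% operator for the true matrix product. The names A and B are illustrative and hold the same values as matrix1 and matrix2; since both are 2x3, B is transposed so that the product is defined.

```r
A <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)   # 2 x 3, same values as matrix1
B <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)    # 2 x 3, same values as matrix2

elementwise <- A * B        # 2 x 3: each entry is A[i, j] * B[i, j]
product     <- A %*% t(B)   # 2 x 2: matrix product of A and the transpose of B

print(elementwise)
print(product)
print(dim(product))         # 2 2
```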
R Lists
A list is a data structure whose components can be of mixed (different) data types.
A vector whose elements are all of the same type is called an atomic vector, whereas a
vector whose elements may be of different types is called a list.
A list is created with the list() function:
> x <- list(2.5, TRUE, 1:3)
> x
[[1]]
[1] 2.5
[[2]]
[1] TRUE
[[3]]
[1] 1 2 3
ACCESS,MODIFY,DELETE:
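The three operations named in this heading can be sketched on a small named list. The component names and values are hypothetical.

```r
x <- list(name = "John", age = 19, speaks = c("English", "French"))

# Access: [[ ]] or $ returns the component itself, [ ] returns a sub-list.
print(x$name)
print(x[["speaks"]][2])
print(class(x["age"]))      # "list"

# Modify: reassign an existing component or add a new one.
x$name <- "Clair"
x[["married"]] <- FALSE

# Delete: assign NULL to the component.
x$age <- NULL
print(names(x))             # "name" "speaks" "married"
```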
R Data Frame
A data frame is a two-dimensional data structure in R. It is a special case of a list in which
every component has the same length.
Each component forms a column, and the contents of the components form the rows.
How to create a Data Frame in R?
We can create a data frame using the data.frame() function.
# Create the data frame.
x <- data.frame(emp_id = c(1,2,3,4,5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25))
> print(x)
  emp_id emp_name salary
1      1     Rick 623.30
2      2      Dan 515.20
3      3 Michelle 611.00
4      4     Ryan 729.00
5      5     Gary 843.25
Accessing components in data frame
Components of a data frame can be accessed like a list or like a matrix.
We can use either the [ ] or the $ operator to access columns of a data frame.
> x["emp_name"]
  emp_name
1     Rick
2      Dan
3 Michelle
4     Ryan
5     Gary
> x["salary"]
  salary
1 623.30
2 515.20
3 611.00
4 729.00
5 843.25
( or )
> x$emp_name
[1] "Rick"     "Dan"      "Michelle" "Ryan"     "Gary"
( or )
> x[3]
  salary
1 623.30
2 515.20
3 611.00
4 729.00
5 843.25
How to modify a Data Frame in R?
Data frames can be modified like we modified matrices through reassignment.
x<- data.frame( emp_id = c (1,2,3,4,5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25))
> x
  emp_id emp_name salary
1      1     Rick 623.30
2      2      Dan 515.20
3      3 Michelle 611.00
4      4     Ryan 729.00
5      5     Gary 843.25
> x[3] <- c(100,200,300,400,500)
> print(x)
  emp_id emp_name salary
1      1     Rick    100
2      2      Dan    200
3      3 Michelle    300
4      4     Ryan    400
5      5     Gary    500
Adding Components
Rows can be added to a data frame using the rbind() function, and columns using the cbind() function.
Both return a new data frame and leave x unchanged unless the result is assigned back to x.
> rbind(x, list(6,"Paul",600))
  emp_id emp_name salary
1      1     Rick    100
2      2      Dan    200
3      3 Michelle    300
4      4     Ryan    400
5      5     Gary    500
6      6     Paul    600
> cbind(x, State=c("AP","TS","MH","TN","K&j"))
  emp_id emp_name salary State
1      1     Rick    100    AP
2      2      Dan    200    TS
3      3 Michelle    300    MH
4      4     Ryan    400    TN
5      5     Gary    500   K&j
Deleting Component
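Data frame columns can be deleted by assigning NULL to them. For example, if a State column has been added to x:
> x$State <- NULL   # only the State column is deleted
Assigning NULL to the data frame itself removes all of its contents:
> x <- NULL
> print(x)
NULL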
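The create, add, modify and delete steps of this section can be put together in one sketch. The data frame emp and its values are hypothetical and only mirror the columns used above. stringsAsFactors = FALSE is passed explicitly so that the example behaves the same in old and new versions of R.

```r
# Hypothetical employee data, mirroring the columns used above.
emp <- data.frame(
  emp_id   = 1:3,
  emp_name = c("Rick", "Dan", "Michelle"),
  salary   = c(623.3, 515.2, 611.0),
  stringsAsFactors = FALSE        # keep names as plain character strings
)

# Add a row; the result must be assigned back to keep it.
emp <- rbind(emp, data.frame(emp_id = 4L, emp_name = "Ryan", salary = 729.0,
                             stringsAsFactors = FALSE))
emp$state <- c("AP", "TS", "MH", "TN")        # add a column
emp[emp$emp_name == "Dan", "salary"] <- 550   # modify one cell
emp$state <- NULL                             # delete a column

print(emp[emp$salary > 600, ])    # rows with a salary above 600
str(emp)
```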
R Factors: A factor is a data structure used for fields that take only a predefined, finite
number of values (categorical data). A vector acts as the input to a factor.
We can create a factor using the function factor().
How to create Factor
> x <- factor(c("single", "married", "married", "single"));
> x
[1] single married married single
Levels: married single
> x <- factor(c("single", "married", "married", "single"), levels = c("single",
"married", "divorced"));
> x
[1] single married married single
Levels: single married divorced
We can see from the above example that levels may be predefined even if not used.
Factors are closely related with vectors. In fact, factors are stored as integer vectors.
This is clearly seen from its structure.
> x <- factor(c("single","married","married","single"))
> str(x)
Factor w/ 2 levels "married","single": 2 1 1 2
We see that levels are stored in a character vector and the individual elements are
actually stored as indices.
Factors are also created when we read non-numerical columns into a data
frame. In versions of R before 4.0.0, the data.frame() function converts character vectors into factors by default. To
suppress this behavior, we pass the argument stringsAsFactors = FALSE (which is the default from R 4.0.0 onwards).
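How to access components of a factor?
Accessing components of a factor is very similar to accessing those of a vector.
> x
[1] single  married married single
Levels: married single
> x[3]   # access the 3rd element
[1] married
Levels: married single
How to modify a factor?
Components of a factor are modified by assignment, but only values that are among its levels can be assigned.
> x <- factor(c("single","married","married","single"), levels = c("single","married","divorced"))
> x[2] <- "divorced"   # modify the second element
> x[3] <- "widowed"    # not a level: NA is generated, with a warning
> x
[1] single   divorced <NA>     single
Levels: single married divorced
A workaround to this is to add the value to the level first.
> levels(x) <- c(levels(x), "widowed") # add new level
> x[3] <- "widowed"
> x
[1] single   divorced widowed  single
Levels: single married divorced widowed
A factor is deleted by assigning NULL to it.
> x <- NULL
> x
NULL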
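The statement that factors are stored as integer codes plus a character vector of levels can be inspected directly, as in the sketch below. The variable name status is illustrative.

```r
status <- factor(c("single", "married", "married", "single"),
                 levels = c("single", "married", "divorced"))

print(levels(status))        # "single" "married" "divorced"
print(as.integer(status))    # 1 2 2 1: the stored codes
print(table(status))         # counts per level, including the unused one
print(status[c(2, 4)])       # indexing works as for vectors

status <- droplevels(status) # drop the unused "divorced" level
print(nlevels(status))       # 2
```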
R- Arrays
Arrays are the R data objects which can store data in more than two dimensions.
For example - If we create an array of dimension (2, 3, 4) then it creates 4
rectangular matrices each with 2 rows and 3 columns. An array is created using
the array() function. It takes vectors as input and uses the values in the dim
parameter to create an array.
> x <- array(1:9)
> x
[1] 1 2 3 4 5 6 7 8 9
> x <- array(1:9,c(3,3))
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
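> x <- 1:18
> dim(x) <- c(3,3,2)
> print(x)
, , 1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
, , 2
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18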
Example
The example above creates an array of two 3x3 matrices,
each with 3 rows and 3 columns.
This array has three dimensions. Notice that, although the rows are given as the first
dimension, the tables are filled column-wise. So, for arrays, R fills the columns, then
the rows, and then the rest.
CHANGE THE DIMENSIONS OF A VECTOR IN R
Alternatively, you could just add the dimensions using the dim() function. This is a little
hack that goes a bit faster than using the array() function; it’s especially useful if you
have your data already in a vector. (This little trick also works for creating matrices, by
the way, because a matrix is nothing more than an array with only two dimensions.)
Say you already have a vector with the numbers 1 through 24, like this:
> my.vector <- 1:24
You can easily convert that vector to an array exactly like my.array (an array of the same numbers created with array(1:24, dim = c(3,4,2))) simply by
assigning the dimensions, like this:
> dim(my.vector) <- c(3,4,2)
If you check what my.vector looks like now, you see there is no difference from the
array my.array.
> identical(my.array, my.vector)
[1] TRUE
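The comparison can be reproduced in full with the sketch below. The definition of my.array is an assumption here: it is taken to be the 3x4x2 array of the numbers 1 to 24 built with array(), which is what the comparison with my.vector implies.

```r
my.array  <- array(1:24, dim = c(3, 4, 2))   # assumed definition of my.array
my.vector <- 1:24
dim(my.vector) <- c(3, 4, 2)                 # turn the vector into an array

print(identical(my.array, my.vector))   # TRUE
print(my.array[2, 3, 1])                # row 2, column 3 of the first table: 8
print(my.array[, , 2])                  # the whole second table (13 to 24)
```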
R Objects and Classes: Introduction and Types
We can do object-oriented programming in R. In fact, everything in R is an
object.
A class is a blueprint for an object.
An object is also called an instance of a class, and the process of creating this
object is called instantiation.
R has three class systems, which differ in how objects are created:
S3 class: objects are created by setting the class attribute.
S4 class: objects are created using new().
Reference class: objects are created using generator functions.
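For example, an S3 object is created from a list by setting its class attribute:
> s <- list(name = "John", age = 21, GPA = 3.5)
> class(s) <- "student"
> s
$name
[1] "John"
$age
[1] 21
$GPA
[1] 3.5
attr(,"class")
[1] "student"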
R S4 Class: Unlike S3 classes, S4 classes have a formal definition, which is created with setClass(). Objects are
created with new(), and their components are called slots. Printing an object s of an S4 class "student"
shows its slots:
> s
An object of class "student"
Slot "name":
[1] "John"
Slot "age":
[1] 21
Slot "GPA":
[1] 3.5
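An object like the one printed above could be defined and created as in the following sketch. The slot types are assumptions inferred from the printed values; setClass(), new(), slot() and is() come from the methods package that ships with R.

```r
library(methods)

# Define an S4 class with three typed slots (types inferred from the output above).
setClass("student", slots = c(name = "character", age = "numeric", GPA = "numeric"))

s <- new("student", name = "John", age = 21, GPA = 3.5)
print(isVirtualClass("student"))   # FALSE: objects can be created from it
print(s@name)                      # slots are read with @
s@GPA <- 3.7                       # and modified by reassignment
print(slot(s, "GPA"))              # slot() is the functional form of @
print(is(s, "student"))            # TRUE
```

Assigning a value of the wrong type to a slot, such as a string to age, raises an error. This is the main practical difference from S3.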
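Slots of an S4 object are accessed with the @ operator:
> s@name
[1] "John"
R Reference Class: Fields of a reference class object are accessed with the $ operator. For a reference class object s of class "student":
> s$age
[1] 21
> s$GPA
[1] 3.5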
Similarly, it is modified by reassignment.
> s$name <- "Paul"
> s
Reference class object of class "student"
Field "name":
[1] "Paul"
Field "age":
[1] 21
Field "GPA":
[1] 3.5
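A reference class matching the output above could be sketched as follows. The generator name Student and the method birthday() are hypothetical additions for illustration; setRefClass() is part of the methods package.

```r
library(methods)

# setRefClass() returns a generator object; its $new() method creates objects.
Student <- setRefClass("student",
  fields = list(name = "character", age = "numeric", GPA = "numeric"),
  methods = list(
    birthday = function() {
      age <<- age + 1      # <<- modifies the field of this object
      invisible(.self)
    }
  )
)

s <- Student$new(name = "John", age = 21, GPA = 3.5)
s$name <- "Paul"           # fields are modified in place with $
s$birthday()
print(s$age)               # 22

t2 <- s                    # reference semantics: t2 and s are the same object
t2$GPA <- 4.0
print(s$GPA)               # 4
u <- s$copy()              # copy() makes an independent object
u$name <- "Anna"
print(s$name)              # still "Paul"
```

Unlike S3 and S4 objects, reference class objects are not copied on assignment, which is why changing t2 also changes s.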