
R Language 1st Unit Deep


STATISTICS WITH R PROGRAMMING

UNIT-1: Getting Started with R & Advanced Data Structures

UNIT-I:
Introduction, How to run R, R Sessions and Functions, Basic Math, Variables, Data
Types, Vectors, Conclusion, Advanced Data Structures, Data Frames, Lists, Matrices,
Arrays, Classes
Introduction:
 R is a programming language and software environment for statistical computing, data analysis,
scientific research, graphics and reporting.
 R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand,
and is currently developed by the R Development Core Team.
 R is an interpreted programming language: code is executed one line at a time, which makes debugging
easy.
 R allows integration with procedures written in C, C++, .NET, Python or FORTRAN for
efficiency.
 R is free, open-source software distributed under a GNU-style copyleft licence, and is an official part of the GNU project,
where it is called GNU S.

Features of R:
 R is a programming language and software environment for statistical computing, data analysis,
scientific research, graphics and reporting.
 R is a well-developed, simple and effective programming language which includes conditionals,
loops, user-defined recursive functions, and input and output facilities.
 R has an effective data handling and storage facility.
 R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
 R provides a large, coherent and integrated collection of tools for data analysis.
 R provides graphical facilities for data analysis and display, either directly on the screen or printed
on paper.
In conclusion, R is one of the world's most widely used statistical programming languages. It is a popular choice among
data scientists and is supported by a vibrant and talented community of contributors. R is taught in
universities and deployed in mission-critical business applications.

Things to Know Before Start Learning R

Why use R
• R is an open-source programming language and software environment for statistical computing
and graphics.
• R is an object-oriented programming environment, much more so than most other statistical
software packages.
• R is a comprehensive statistical platform, offering all manner of data-analytic techniques: almost any type of
data analysis can be done in R.
• R has state-of-the-art graphics capabilities for visualizing complex data.
• R is a powerful platform for interactive data analysis and exploration.
• R can import data from multiple sources and get it into a usable form.
• R functionality can be integrated into applications written in other languages, including C++, Java,
Python, PHP, SAS and SPSS.
• R runs on a wide array of platforms, including Windows, Unix and Mac OS X.
• R is extensible: it can be expanded by installing "packages".


Applications of R Programming in Real World

1. Data Science
Harvard Business Review named data scientist the "sexiest job of the 21st century". Glassdoor
named it the "best job of the year" for 2016. With the advent of IoT devices creating terabytes
and terabytes of data that can be used to make better decisions, data science is a field that has no
other way to go but up.
Most courses on data science include R in their curriculum because it is the data scientist’s
favourite tool.
2. Statistical computing
R is the most popular programming language among statisticians. In fact, it was initially built by
statisticians for statisticians. It has a rich package repository with more than 9100 packages with
every statistical function you can imagine.
3. Machine Learning
R has found a lot of use in predictive analytics and machine learning. It has various packages for
common ML tasks like linear and non-linear regression, decision trees, linear and non-linear
classification and many more.

Everyone from machine learning enthusiasts to researchers uses R to implement machine learning
algorithms in fields like finance, genetics research, retail, marketing and health care.

Alternatives to R programming:

Python - Popular general purpose language


Python is a very powerful high-level, object-oriented programming language with an easy-to-use and simple
syntax. Python is extremely popular among data scientists and researchers. Most of the packages in R have
equivalent libraries in Python as well. While R is the first choice of statisticians and mathematicians, professional
programmers prefer implementing new algorithms in a programming language they already know. The choice
between R and Python also depends on what you are trying to accomplish with your code. If you are trying to
analyze a dataset and present the findings in a research paper, then R is probably a better choice. But if you are
writing a data analysis program that runs in a distributed system and interacts with lots of other components, it
would be preferable to work with Python.

SAS (Statistical Analysis System)


SAS is a powerful software package that has been the first choice of private enterprise for analytics needs for a long
time. Its GUI and comprehensive documentation, coupled with reliable technical support, make it a very good tool
for companies. While R is the undisputed champion in academics and research, SAS is extremely popular in
commercial analytics. But R and Python are gaining momentum in the enterprise space, and companies are also
trying to move towards open-source technologies. Time will tell if SAS will continue its dominance or R/Python
will take over.

SPSS - Software package for statistical analysis


SPSS is another popular statistical tool. It is used most commonly in the social sciences and is considered the
easiest to learn among enterprise statistical tools. SPSS is loved by non-statisticians because it is similar to Excel, so
those who are already familiar with Excel will find SPSS very easy to use. SPSS has the same downside as SAS: it is
expensive. SPSS was acquired by IBM in 2009 for a reported $1.2 billion.
Downloading and Installing R
• R is freely available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org
• Precompiled binaries are available for Linux, Mac OS X and Windows.
• The latest release of R at the time of writing is R-3.4.0.
• Installing R on Windows and Mac is just like installing any other program.
• Install RStudio, a free IDE for R, from http://www.rstudio.com/
• If we install both R and RStudio, then we only need to run RStudio.
• R is case-sensitive.
• R scripts are simply text files with a .R extension.


Run R Programming on Your Computer


You will find the easiest way to run R programming on your system (Windows) in this section.
Run R Programming in Windows
Go to (https://
Click on the CRAN link on the left sidebar
Select a mirror
Click "Download R for Windows"
Click on the link that downloads the base distribution
Run the file and follow the steps in the instructions to install R.
Should I install the 32-bit version or the 64-bit version?
Most people don’t need to worry about this. Obviously the 64-bit version of R won’t work on a 32-bit machine but bo
You might want to consider installing 32-bit version of R if your production environment is 32-bit because some pack
Getting help in R
To get help on specific topics, we can use the help() function along with the topic we want to
search. We can also use the ? operator for this.
> help(Syntax)
> ?Syntax

We also have the help.search() function to do a search engine type of search. We could use
the ?? operator for this.
> help.search("histograms")
> ??"histograms"


R sessions
Starting an R session
R programming can be done in two ways. We can either type commands one line at a time
inside an "R session", or save the commands in a "script" file and execute the whole file inside
R. First we will learn about the R session.

To start an R session, type 'R' at the command line in Windows or Linux. For example, from the shell
prompt '$' in Linux, type
$ R
This prints the following startup message before showing the '>' prompt of R:
R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]


Working with R session
Once we are inside the R session, we can directly execute R language commands by typing them line by
line. Pressing the Enter key ends the command and brings back the > prompt. In the example
session below, we create 2 variables 'a' and 'b' with values 5 and 6 respectively, and assign their sum to
another variable called 'c':
> a=5
> b=6
> c=a+b
> c
The value of the variable 'c' is printed as
[1] 11
In an R session, typing a variable name prints its value on the screen.
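The same session can also be written with the `<-` assignment operator, which much R code uses in place of `=`. The following is a minimal sketch; the variable name `total` is only illustrative and does not come from the notes above.

```r
# Assign values with the <- operator
a <- 5
b <- 6
total <- a + b

# At the prompt, typing a name prints its value; inside a script, use print()
print(total)
```

Both `=` and `<-` work for top-level assignment, so either form can be used in the examples that follow.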

Get help inside R session


To get help on any function of R, type help(function-name) in R prompt. For example, if we need help
on "if" logic, type,

> help("if")
then, help lines for the "if" statement are printed.

Exit the R session


To exit the R session, type quit() at the R prompt, and answer 'n' (no) when asked whether to save the workspace image. This
means we do not want to keep the variables and objects created in the current session:
> quit()
Save workspace image? [y/n/c]: n
Saving the R session
Note that by not saving the current session, we lose all the
variables and objects created in it when we exit the R prompt.

When we work in R, the R objects we create and load are stored in a portion of memory
called the workspace. When we say 'no' to saving the workspace, all these objects are wiped out from
the workspace memory. If we say 'yes', they are saved into a file called ".RData", which is written to the
present working directory.

In Linux, this "working directory" is generally the directory from which R was started with the
command 'R'. In Windows, it can be either "My Documents" or the user's home directory.

When we start R in the same directory next time, the workspace and all the created objects are
restored automatically from this ".RData" file.
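Objects can also be saved and restored by hand, without quitting R. The sketch below uses the base functions save() and load(); the file name `demo_workspace.RData` and its location in the temporary directory are assumptions made only for this example.

```r
# Hypothetical file location inside R's per-session temporary directory
workspace_file <- file.path(tempdir(), "demo_workspace.RData")

a <- 5
b <- 6

# Write the two objects to the file (save.image() would write every object)
save(a, b, file = workspace_file)

# Remove them from the workspace, then restore them from the file
rm(a, b)
load(workspace_file)

print(a + b)
```

This is roughly what happens when we answer 'y' on exit and later start R in the same directory, except that here the file name is chosen explicitly.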

Listing the objects in the current R session


We can list the names of the objects in the current R session by ls() command. For example, start R
session fresh and proceed as follows:
> a=5
> b=6
> c=8
> sum = a+b+c
> sum
[1] 19
> ls()
[1] "a" "b" "c" "sum"

Here, the objects we created have been listed.


Removing objects from the current R session
Specific objects created in the current session can be removed using rm() command. If we specify the
name of an object, it will be removed. If we just say rm(list = ls()) , all objects created so far will be
removed. See below.
> a=5
> b=6
> c=8
> sum = a+b+c
> sum
[1] 19
> ls()
[1] "a" "b" "c" "sum"
> rm(list=c("sum"))
> ls()
[1] "a" "b" "c"
> rm(list = ls())
> ls()
character(0)
Getting and setting the current working directories
From R prompt, we can get information about the current working directory using getwd()
command:

> getwd()
[1] "/home/user"
Similarly, we can set the current working directory by calling the setwd() function:

> setwd("/home/user/prog")

After this, "/home/user/prog" will be the working directory.

In Windows version of R, the working directory can be set from menu in R window.
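A common pattern is to remember the old working directory before changing it, so that it can be restored afterwards. This is a small sketch under the assumption that the temporary directory returned by tempdir() is an acceptable place to switch to; any existing folder would work.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Switch to R's temporary directory (used here only as a safe example target)
setwd(tempdir())
print(getwd())

# Go back to where we started
setwd(old_dir)
```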

Getting file information from R session


When we are at the R prompt, operating system commands are not recognised by R. If we want
to list the names of the files in the directory in which R was started, we should
use the list.files() command. This lists all the files in the current directory.

In case we need information on a specific file, use the file.info("filename") command. This prints
information about the file, such as its size and modification time, on the screen.
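The two commands can be tried safely on a throwaway file. In this sketch the folder name `file_info_demo` and the file name `notes.txt` are made up for illustration.

```r
# Create a small example file in a fresh temporary folder (names are illustrative)
demo_dir <- file.path(tempdir(), "file_info_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "notes.txt")
writeLines("hello", demo_file)

# List the files in that folder
files <- list.files(demo_dir)
print(files)

# Look up the size, type and modification time of one file
info <- file.info(demo_file)
print(info[, c("size", "isdir", "mtime")])
```

Calling list.files() with no argument lists the current working directory, as described above.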
Comments
Comments are explanatory text in your R program; they are ignored by the interpreter while
executing your actual program. A single-line comment is written using # at the beginning of the statement as
follows:
# My first program in R Programming
R does not support multi-line comments, but you can get the same effect by starting each line with #:

# add the two numbers
# 10 is assigned to a
a=10
# 28 is assigned to b
b=28
# a is added to b
c=a+b
print(c)
[1] 38
R Reserved Words(key words)
Reserved words in R programming are a set of words that have special meaning and
cannot be used as an identifier (variable name, function name etc.).
Here is the list of reserved words in R's parser.
Reserved words in R

if            else       repeat        while            function
for           in         next          break            TRUE
FALSE         NULL       Inf           NaN              NA
NA_integer_   NA_real_   NA_complex_   NA_character_    ...

This list can be viewed by typing help(reserved) or ?reserved at the R command
prompt.

Example: Hello World Program


> # We can use the print() function
> print("Hello World!")
[1] "Hello World!"

> # Quotes can be suppressed in the output


> print("Hello World!", quote = FALSE)
[1] Hello World!

> # If there are more than 1 item, we can concatenate using paste()
> print(paste("How","are","you?"))
[1] "How are you?"
R Variables and Constants
Variables in R
A variable is a named memory location used to store data; its value can be
changed as needed. The unique name given to a variable (or to a function or other object)
is called an identifier.

Rules for writing Identifiers in R

1. Identifiers can be a combination of letters, digits, period (.) and underscore (_).
2. It must start with a letter or a period. If it starts with a period, it cannot be followed by a digit.
3. Reserved words in R cannot be used as identifiers.
4. Identifiers are case-sensitive: upper case differs from lower case.

Valid identifiers in R
total, Sum, .fine, with.dot, this_is_acceptable, Number5

Invalid identifiers in R
tot@l, .5um, _fine, TRUE, .4ne
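One way to check these rules from inside R is the base function make.names(), which rewrites any string that is not a syntactically valid name. The sketch below assumes that "unchanged by make.names()" is a good enough test of validity for the examples listed above; the variable names `candidates` and `is_valid` are illustrative.

```r
# Candidate names taken from the valid and invalid lists above
candidates <- c("total", "Sum", ".fine", "with.dot", "this_is_acceptable",
                "Number5", "tot@l", ".5um", "_fine", "TRUE", ".4ne")

# make.names() alters anything that is not a syntactically valid name,
# so a candidate passes this check when make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```

The first six names pass the check and the last five fail, matching the two lists.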

Constants in R:

Constants refer to fixed values. They are also called literals. The basic types of constant are:

1. Numeric Constants
2. Character Constants
3. Built-in Constants

Numeric Constants
All numbers fall under this category. They can be of type integer, double or complex.
The type can be checked with the typeof() function; class() reports "numeric" for doubles.
Numeric constants followed by L are regarded as integer and those followed by i are regarded
as complex.
> a = 5
> class(a)
[1] "numeric"

> b = 5L
> class(b)
[1] "integer"

> c = 10+3i
> class(c)
[1] "complex"
Numeric constants preceded by 0x or 0X are interpreted as hexadecimal numbers.
> 0xff
[1] 255

> 0XF + 1
[1] 16
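The difference between class() and typeof() for numeric constants can be seen side by side. This is a small sketch; the variable name `z` is used for the complex value only to avoid reusing the name of the c() function.

```r
a <- 5        # no suffix: stored as a double
b <- 5L       # L suffix: stored as an integer
z <- 10 + 3i  # i suffix: complex number

# class() and typeof() agree except for doubles ("numeric" vs "double")
print(c(class(a), typeof(a)))
print(c(class(b), typeof(b)))
print(c(class(z), typeof(z)))

# Hexadecimal constants are ordinary numbers
print(0xff == 255)
```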


Character Constants
Character constants can be represented using either single quotes (') or double
quotes (") as delimiters.
> a = 'sailu'
> a
[1] "sailu"

> class(a)
[1] "character"

Built-in Constants
Some of the built-in constants defined in R, along with their values, are shown below.
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"

> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"

> pi
[1] 3.141593

> month.name
[1] "January" "February" "March" "April" "May" "June"
[7] "July" "August" "September" "October" "November" "December"

> month.abb
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

But it is not good to rely on these, as they are implemented as variables whose values
can be changed.
> pi
[1] 3.141593

> pi <- 56
> pi
[1] 56
R - Data Types
Variables are simply names for memory locations that store values. In R, variables are not
declared with a data type. A variable is assigned an R object, and the data type of that
object becomes the data type of the variable. There are many types of R objects. The frequently used
ones are
 Vectors
 Lists
 Matrices
 Arrays
 Factors
 Data Frames
R has 5 basic atomic vector classes (data types), plus the rarely used raw type:
 logical (e.g., TRUE, FALSE)
 numeric (real or decimal) (e.g., 2, 2.0, pi)
 integer (e.g., 2L, as.integer(3))
 complex (e.g., 1 + 0i, 1 + 4i)
 character (e.g., "a", "swc")
Data Type   Example                       Verify

Logical     TRUE, FALSE                   v <- TRUE
                                          print(class(v))
                                          [1] "logical"

Numeric     12.3, 5, 999                  v <- 23.5
                                          print(class(v))
                                          [1] "numeric"

Integer     2L, 34L, 0L                   v <- 2L
                                          print(class(v))
                                          [1] "integer"

Complex     3 + 2i                        v <- 2+5i
                                          print(class(v))
                                          [1] "complex"

Character   'a', "good", "TRUE", '23.4'   v <- "TRUE"
                                          print(class(v))
                                          [1] "character"

Raw         "Hello" is stored as          v <- charToRaw("Hello")
            48 65 6c 6c 6f                print(class(v))
                                          [1] "raw"
Type conversion:
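Converting data from one data type to another is called type
conversion. There are two kinds:
1. Implicit type conversion
2. Explicit type conversion

1. Implicit type conversion:

When values of different types are combined or compared, R automatically converts them to the more general type, in this order:

logical -> integer -> numeric -> complex -> character.

In the comparisons below, the number is converted to a character string, so the two values are compared as strings:

> 1 < "2"
[1] TRUE
> "1" > 2
[1] FALSE
> 1 < "a"
[1] TRUE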
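The same implicit conversion happens when values of different types are combined with c(). The following sketch illustrates three steps of the hierarchy; the variable names are illustrative.

```r
# Mixing types in c() coerces everything to the most general type present
lgl_int <- c(TRUE, 2L)    # logical + integer   -> integer
int_num <- c(2L, 3.5)     # integer + numeric   -> numeric (double)
num_chr <- c(3.5, "a")    # numeric + character -> character

print(class(lgl_int))
print(class(int_num))
print(class(num_chr))

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
print(sum(c(TRUE, FALSE, TRUE)))
```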

2. Explicit type conversion

Explicit conversion is done with the as.<class_name>() functions, for example:

as.numeric()
as.character()
When you coerce an existing numeric vector with as.numeric(), it does nothing.

> x=c(0,1,2,3,4,5,6)
> as.numeric(x)

[1] 0 1 2 3 4 5 6

> as.logical(x)

[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE

> as.character(x)

[1] "0" "1" "2" "3" "4" "5" "6"

> as.complex(x)

[1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i

Sometimes coercions, especially nonsensical ones, won't work: the result is NA.

> x <- c("a", "b", "c")
> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion
> as.logical(x)
[1] NA NA NA
Data Structures
R has many data structures. These include
1. Vectors
2. Matrices
3. Arrays
4. Factors
5. Data Frames
Vectors
A vector is the most common and basic data structure in R. Vectors can be of
two types:
 atomic vectors
 lists
Atomic Vectors: An atomic vector is a collection of elements that all belong to the same data type,
such as character, logical, integer or numeric.

Vectors are created by combining values with the c() function.

How to create vector:


1. Numeric vector:
> x <- c(1,2,3)
> print(x)
[1] 1 2 3
> class(x)
[1] "numeric"
> length(x)
[1] 3
> str(x)
 num [1:3] 1 2 3
x is a numeric vector. These are the most common kind (all elements have the same data type).
They are numeric objects and are treated as double precision real numbers.

Integer vector: To explicitly create integers, add an L at the end.

> x1 <- c(1L, 2L, 3L)
> print(x1)
[1] 1 2 3
> class(x1)
[1] "integer"
> str(x1)
 int [1:3] 1 2 3
> length(x1)
[1] 3
logical vectors: contains only TRUE,FALSE.

EX:
> x<-c(TRUE, TRUE, FALSE, FALSE)
> print(x)
[1] TRUE TRUE FALSE FALSE
> class(x)
[1] "logical"
> length(x)
[1] 4
> str(x)
logi [1:4] TRUE TRUE FALSE FALSE
(Don't use T and F in place of TRUE and FALSE: they are ordinary variables whose values can be changed.)

Character vectors: contain a collection of character strings, each enclosed in either " " or ' '.


> x<-c("Alec", "Dan", "Rob", "Rich")
> print(x)
[1] "Alec" "Dan" "Rob" "Rich"
> class(x)
[1] "character"
> length(x)
[1] 4
> str(x)
chr [1:4] "Alec" "Dan" "Rob" "Rich"

HOW TO Add Elements


> x=c(1,2,3)
> y=c(x,4)
> print(y)
[1] 1 2 3 4
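Besides c(), base R provides append(), which can also insert a value in the middle of a vector through its `after` argument. A minimal sketch follows; the value 99 is arbitrary.

```r
x <- c(1, 2, 3)

# Add a value at the end (same result as c(x, 4))
y <- append(x, 4)

# Insert a value after the first element
z <- append(x, 99, after = 1)

print(y)
print(z)
```

Neither call changes x itself; the result has to be assigned to a name, as is done here with y and z.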
You can also create vectors as a sequence of numbers:
> x<-1:10
> seq(10)
[1] 1 2 3 4 5 6 7 8 9 10
> seq(1, 10, by = 0.1)

[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4
[16] 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9
[31] 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4
[46] 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
[61] 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4
[76] 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9
[91] 10.0
How to access Elements of a Vector?
Elements of a vector can be accessed using vector indexing. The vector
used for indexing can be a logical, integer or character vector.
Using integer vector as index
Vector indexing in R starts from 1, unlike many programming languages where indexing
starts from 0.
> x<-c(1,2,3,4,5,6,7,8,9)
> print(x)
[1] 1 2 3 4 5 6 7 8 9

> print(x[2]) # access 2nd element
[1] 2
or
> x[2]
[1] 2

> x[c(2,4)] # access 2nd and 4th elements
[1] 2 4

> x[-1] # access all elements except the 1st
[1] 2 3 4 5 6 7 8 9

> x[c(2.4, 3.54)] # real numbers are truncated to integers
[1] 2 3
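The sketch below repeats these integer-indexing rules on a vector whose values differ from their positions, which makes it easier to see what is being selected. The values and variable names are illustrative.

```r
x <- c(10, 20, 30, 40, 50)

second       <- x[2]             # positions start at 1
first_three  <- x[1:3]           # a range of positions
without_ends <- x[-c(1, 5)]      # negative positions drop elements
truncated    <- x[c(2.4, 3.54)]  # fractional positions are truncated to 2 and 3

print(second)
print(first_three)
print(without_ends)
print(truncated)
```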

Using logical vector as index


When we use a logical vector for indexing, the elements at the positions where the logical vector
is TRUE are returned.
This useful feature helps us filter a vector, as shown below.

> x<-c(1,2,3,4,5,6,7,8,9)
> print(x)
[1] 1 2 3 4 5 6 7 8 9
> x[c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE)]
[1] 1 3 5 7
> x[x > 0]
[1] 1 2 3 4 5 6 7 8 9
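Filtering becomes more interesting when the condition excludes some elements. The following sketch shows the intermediate logical vector, a combined condition, and the base function which(), which returns positions rather than values; the variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# A comparison returns a logical vector of the same length as x
is_big <- x > 5
print(is_big)

# Using it as an index keeps only the elements where it is TRUE
big_values <- x[is_big]

# Conditions can be combined with & (and) and | (or)
even_and_big <- x[x > 5 & x %% 2 == 0]

# which() gives the positions instead of the values
big_positions <- which(x > 5)

print(big_values)
print(even_and_big)
print(big_positions)
```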
Using character vector as index
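This type of indexing is useful when dealing with named vectors. We can name each
element of a vector. In the example below, the number 4 is stored as the string "4", because all elements of an atomic vector must have the same type.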
> x<-c("name"="sailu","age"=4)
> names(x)
[1] "name" "age"
> x["name"]
name
"sailu"
> x["age"]
age
"4"
How to modify a vector in R?
We can modify a vector using the assignment operator.
If we want to truncate the vector, we can reassign a subset of it to the same name.

> x<-c(1,2,3,4,5,6)
> print(x)
[1] 1 2 3 4 5 6
> print(x[2])
[1] 2
> x[2]<-200
> print(x)
[1] 1 200 3 4 5 6
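Assignment can change several elements at once, change elements that satisfy a condition, or truncate the vector. This is a minimal sketch continuing from a vector like the one above; the replacement values are arbitrary.

```r
x <- c(1, 200, 3, 4, 5, 6)

# Replace several elements at once
x[c(1, 3)] <- c(100, 300)

# Conditional replacement: set every value below 5 to 0
x[x < 5] <- 0

# Truncate: keep only the first four elements
x <- x[1:4]
print(x)
```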

How to delete a Vector?


We can delete the contents of a vector by assigning NULL to it (rm(x) removes the variable itself).

> x<-c(1,2,3,4,5,6,7,8,9)
> print(x)
[1] 1 2 3 4 5 6 7 8 9
> x<-NULL
> print(x)
NULL
R Matrix:
This section shows how to work with matrices in R. You will learn to create and modify a matrix, and
access matrix elements.
1. A matrix is a two-dimensional data structure in R programming.
2. A matrix is similar to a vector but additionally contains the dimension attribute. All
attributes of an object can be checked with the attributes() function (the dimension can
be checked directly with the dim() function).
3. We can check whether a variable is a matrix with the class() function (or with is.matrix()).

How to create a matrix in R programming?

1. A matrix can be created using the matrix() function.
2. The dimensions of the matrix can be defined by passing appropriate
values for the arguments nrow and ncol.
3. Providing values for both dimensions is not necessary. If one
dimension is provided, the other is inferred from the length of the data.
> matrix(1:9, nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

> # same result is obtained by providing only one dimension
> matrix(1:9, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
We can see that the matrix is filled column-wise. This can be reversed to row-wise
filling by passing TRUE to the argument byrow.
> matrix(1:9, nrow=3, byrow=TRUE) # fill matrix row-wise
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
It is possible to name the rows and columns of matrix during creation by passing a 2
element list to the argument dimnames.
> x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"),
c("A","B","C")))

> x # or print(x)
  A B C
X 1 4 7
Y 2 5 8
Z 3 6 9
These names can be accessed or changed with two helpful
functions colnames() and rownames().
> colnames(x)
[1] "A" "B" "C"
> rownames(x)
[1] "X" "Y" "Z"

> # It is also possible to change names


> colnames(x) <- c("C1","C2","C3")
> rownames(x) <- c("R1","R2","R3")

> x
C1 C2 C3
R1 1 4 7
R2 2 5 8
R3 3 6 9
Another way of creating a matrix is by using functions cbind() and rbind() as in
column bind and row bind.
> cbind(c(1,2,3),c(4,5,6))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6

> rbind(c(1,2,3),c(4,5,6))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
Finally, you can also create a matrix from a vector by setting its dimension
using dim().
> x <- c(1,2,3,4,5,6)
> x
[1] 1 2 3 4 5 6
> class(x)
[1] "numeric"

> dim(x) <- c(2,3)


> x
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> class(x)
[1] "matrix"
(In R 4.0.0 and later, class() of a matrix returns "matrix" "array".)
How to access Elements of a matrix?
We can access elements of a matrix using the square bracket [ indexing
method. Elements can be accessed as var[row, column]. Here rows and columns
are vectors.
Using integer vector as index
We specify the row numbers and column numbers as vectors and use it for indexing.
If any field inside the bracket is left blank, it selects all.
We can use negative integers to specify rows or columns to be excluded.
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> x[c(1,2),c(2,3)] # select rows 1 & 2 and columns 2 & 3
     [,1] [,2]
[1,]    4    7
[2,]    5    8

> x[c(3,2),] # leaving column field blank will select entire columns
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    2    5    8

> x[,] # leaving row as well as column field blank will select entire matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> x[-1,] # select all rows except first
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
One thing to notice here is that, if the matrix returned after indexing is a row matrix or
column matrix, the result is given as a vector.
> x[1,]
[1] 1 4 7
> class(x[1,])
[1] "integer"
This behavior can be avoided by using the argument drop = FALSE while indexing.
> x[1,,drop=FALSE] # now the result is a 1X3 matrix rather than a vector
     [,1] [,2] [,3]
[1,]    1    4    7
> class(x[1,,drop=FALSE])
[1] "matrix"
It is possible to index a matrix with a single vector.
While indexing in such a way, it acts like a vector formed by stacking columns of the
matrix one after another. The result is returned as a vector.
> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9

> x[1:4]
[1] 4 6 1 8

> x[c(3,5,7)]
[1] 1 0 3

Using logical vector as index


Two logical vectors can be used to index a matrix. In such a situation, the rows and columns
where the value is TRUE are returned. These indexing vectors are recycled if necessary
and can be mixed with integer vectors.
> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9

> x[c(TRUE,FALSE,TRUE),c(TRUE,TRUE,FALSE)]
     [,1] [,2]
[1,]    4    8
[2,]    1    2

> x[c(TRUE,FALSE),c(2,3)] # the 2 element logical vector is recycled to 3 element vector
     [,1] [,2]
[1,]    8    3
[2,]    2    9
It is also possible to index using a single logical vector where recycling takes place if
necessary.
> x[c(TRUE, FALSE)]
[1] 4 1 0 3 9
In the above example, the matrix x is treated as vector formed by stacking columns of
the matrix one after another, i.e., (4,6,1,8,0,2,3,7,9).
The indexing logical vector is also recycled and thus alternating elements are
selected. This property is utilized for filtering of matrix elements as shown below.
> x[x>5]# select elements greater than 5
[1] 6 8 7 9

> x[x%%2 == 0] # select even elements
[1] 4 6 8 0 2
Using character vector as index
Indexing with character vector is possible for matrix with named row or column. This
can be mixed with integer or logical indexing.
> x
     A B C
[1,] 4 8 3
[2,] 6 0 7
[3,] 1 2 9

> x[,"A"]
[1] 4 6 1

> x[TRUE,c("A","C")]
A C
[1,] 4 3
[2,] 6 7
[3,] 1 9

> x[2:3,c("A","C")]
A C
[1,] 6 7
[2,] 1 9
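When the rows are named as well, both positions can be given by name. The sketch below builds the same matrix with hypothetical row names `r1`, `r2` and `r3`, which do not appear in the notes above.

```r
# Same values as above; the row names r1..r3 are made up for this example
x <- matrix(c(4, 6, 1, 8, 0, 2, 3, 7, 9), nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("A", "B", "C")))

col_a   <- x[, "A"]                       # one column, returned as a named vector
cell    <- x["r2", "C"]                   # a single element by row and column name
sub_mat <- x[c("r1", "r3"), c("A", "C")]  # a 2x2 sub-matrix

print(col_a)
print(cell)
print(sub_mat)
```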
How to modify a matrix in R?
We can combine assignment operator with the above learned methods for accessing
elements of a matrix to modify it.
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> x[2,2] <- 10 # modify a single element
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2   10    8
[3,]    3    6    9

> x[x<5] <- 0 # modify elements less than 5
> x
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
A common operation with a matrix is to transpose it. This can be done with
the function t().
> t(x) # transpose a matrix
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0   10    6
[3,]    7    8    9
We can add row or column using rbind() and cbind() function respectively.
Similarly, it can be removed through reassignment.
> cbind(x, c(1, 2, 3)) # add column
     [,1] [,2] [,3] [,4]
[1,]    0    0    7    1
[2,]    0   10    8    2
[3,]    0    6    9    3

> rbind(x,c(1,2,3)) # add row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
[4,]    1    2    3

> x <- x[1:2,] # remove last row
> x
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
Dimension of matrix can be modified as well, using the dim() function.
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

> dim(x) <- c(3,2) # change to 3X2 matrix
> x
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

> dim(x) <- c(1,6) # change to 1X6 matrix
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6

Matrix Computations:
Various mathematical operations are performed on matrices using the R
operators. The operators +, -, * and / work element by element, so both matrices must have the same dimensions.
The result of the operation is also a matrix.
Matrix Addition & Subtraction
# Create two 2x3 matrices.
matrix1 <- matrix(c(3,9,-1,4,2,6),nrow=2)
print(matrix1)
matrix2 <- matrix(c(5,2,0,9,3,4),nrow=2)
print(matrix2)
# Add the matrices.
a <- matrix1 + matrix2
print(a)
     [,1] [,2] [,3]
[1,]    8   -1    5
[2,]   11   13   10
# Subtract the matrices.
print(matrix1 - matrix2)
Result of subtraction
     [,1] [,2] [,3]
[1,]   -2   -1   -1
[2,]    7   -5    2
Matrix Multiplication & Division
# Multiply the matrices element by element.
print(matrix1 * matrix2)
Result of multiplication
     [,1] [,2] [,3]
[1,]   15    0    6
[2,]   18   36   24
# Divide the matrices element by element.
print(matrix1 / matrix2)
Result of division
     [,1]      [,2]      [,3]
[1,]  0.6      -Inf 0.6666667
[2,]  4.5 0.4444444 1.5000000
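The multiplication shown above is element by element. The algebraic matrix product uses the separate `%*%` operator, as sketched below with the same two matrices. Multiplying by the transpose of matrix2 is a choice made here only so that the dimensions match (2x3 times 3x2).

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# Element-wise product: both matrices must have the same dimensions
elementwise <- matrix1 * matrix2

# Matrix product: columns of the left operand must equal rows of the right,
# so the 2x3 matrix1 is multiplied by the 3x2 transpose of matrix2
product <- matrix1 %*% t(matrix2)

print(elementwise)
print(product)
```

The element-wise result is 2x3, like its inputs, while the matrix product is 2x2.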
R Lists
A list is a data structure whose components can have mixed (different) data types.
A vector having all elements of the same type is called an atomic vector, but a
vector having elements of different types is called a list.

How to create a list in R programming?

A list can be created using the list() function.
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
Here, we create a list x of three components with
data types double, logical and integer vector
respectively. Its structure can be examined with the
str() function, and its contents printed with print().
> print(x)
$a
[1] 2.5

$b
[1] TRUE

$c
[1] 1 2 3
ACCESS,MODIFY,DELETE:
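The original examples for this heading are not present in these notes, so the following is only a sketch of the usual base R operations on the list x created above: access with `$`, `[[ ]]` and `[ ]`, modification by assignment, and deletion by assigning NULL. The new component name `d` is illustrative.

```r
x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)

# Access: $ and [[ ]] return the component itself, [ ] returns a smaller list
a_value  <- x$a
c_value  <- x[["c"]]
sub_list <- x["b"]

# Modify an existing component and add a new one
x[["a"]] <- 10
x$d <- "new"

# Delete a component by assigning NULL to it
x$b <- NULL

print(names(x))
str(x)
```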

R Data Frame
A data frame is a two-dimensional data structure in R. It is a special case of a list in which
every component has the same length.
Each component forms a column, and the contents of the components form the rows.
How to create a Data Frame in R?
We can create a data frame using the data.frame() function.
# Create the data frame.
> x <- data.frame(emp_id = c(1,2,3,4,5),
                  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
                  salary = c(623.3,515.2,611.0,729.0,843.25))

> print(x)
  emp_id emp_name salary
1      1     Rick 623.30
2      2      Dan 515.20
3      3 Michelle 611.00
4      4     Ryan 729.00
5      5     Gary 843.25
Accessing components in data frame
Components of a data frame can be accessed like a list or like a matrix.
We can use either the [ ] or the $ operator to access columns of a data frame.
> x["emp_name"]
  emp_name
1     Rick
2      Dan
3 Michelle
4     Ryan
5     Gary
> x["salary"]
  salary
1 623.30
2 515.20
3 611.00
4 729.00
5 843.25
( or )
> x$emp_name
[1] "Rick"     "Dan"      "Michelle" "Ryan"     "Gary"
( or )
> x[3]
  salary
1 623.30
2 515.20
3 611.00
4 729.00
5 843.25
How to modify a Data Frame in R?
Data frames can be modified like we modified matrices, through reassignment.
> x <- data.frame(emp_id = c(1,2,3,4,5),
                  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
                  salary = c(623.3,515.2,611.0,729.0,843.25))
> x
  emp_id emp_name salary
1      1     Rick 623.30
2      2      Dan 515.20
3      3 Michelle 611.00
4      4     Ryan 729.00
5      5     Gary 843.25
> x[3] <- c(100,200,300,400,500)
> print(x)
  emp_id emp_name salary
1      1     Rick    100
2      2      Dan    200
3      3 Michelle    300
4      4     Ryan    400
5      5     Gary    500

Adding Components
Rows can be added to a data frame using the rbind() function, and columns using the cbind() function.
Both return a new data frame and leave x unchanged unless the result is assigned back to x.
> rbind(x, list(6,"Paul",600))
  emp_id emp_name salary
1      1     Rick    100
2      2      Dan    200
3      3 Michelle    300
4      4     Ryan    400
5      5     Gary    500
6      6     Paul    600

> cbind(x, State=c("AP","TS","MH","TN","K&j"))
  emp_id emp_name salary State
1      1     Rick    100    AP
2      2      Dan    200    TS
3      3 Michelle    300    MH
4      4     Ryan    400    TN
5      5     Gary    500   K&j

Deleting Component
A data frame column can be deleted by assigning NULL to it.
> x$State <- NULL   # only the State column is deleted from the above table
Assigning NULL to the variable itself discards the whole data frame.
> x <- NULL
> print(x)
NULL
R Factors: A factor is a data structure used for fields that take only a predefined, finite
number of values (categorical data). A vector acts as the input to a factor.
We can create a factor using the function factor().
How to create a factor?
> x <- factor(c("single", "married", "married", "single"))
> x
[1] single married married single
Levels: married single
> x <- factor(c("single", "married", "married", "single"),
+             levels = c("single", "married", "divorced"))
> x
[1] single married married single
Levels: single married divorced
We can see from the above example that levels may be predefined even if not used.
Factors are closely related with vectors. In fact, factors are stored as integer vectors.
This is clearly seen from its structure.
> x <- factor(c("single","married","married","single"))
> str(x)
Factor w/ 2 levels "married","single": 2 1 1 2
We see that levels are stored in a character vector and the individual elements are
actually stored as indices.
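The integer storage described above can be inspected directly. This is a small sketch using the same four-element factor; it uses only the base functions levels(), as.integer() and table().

```r
x <- factor(c("single", "married", "married", "single"))

levels(x)       # the character vector of levels, sorted alphabetically
as.integer(x)   # the stored indices into levels(x)

# Looking the indices up in the levels recovers the original strings.
restored <- levels(x)[as.integer(x)]

# table() counts how often each level occurs.
counts <- table(x)
```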
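Factors can also be created when we read non-numerical columns into a data
frame. In versions of R before 4.0.0, the data.frame() function converts character
vectors into factors by default; to suppress this behavior, we have to pass the argument
stringsAsFactors = FALSE. From R 4.0.0 onwards the default is FALSE, so we pass
stringsAsFactors = TRUE when we do want factors.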
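Because the default depends on the R version, passing stringsAsFactors explicitly gives the same result everywhere. A minimal sketch, with illustrative variable names names_vec, df_chr and df_fct:

```r
names_vec <- c("Rick", "Dan", "Michelle")

# Keep the column as a character vector.
df_chr <- data.frame(emp_name = names_vec, stringsAsFactors = FALSE)

# Ask for conversion to a factor.
df_fct <- data.frame(emp_name = names_vec, stringsAsFactors = TRUE)

class(df_chr$emp_name)   # "character"
class(df_fct$emp_name)   # "factor"
```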
How to access components of a factor?
Accessing components of a factor is very similar to accessing those of a vector.
> x
[1] single married married single
Levels: married single
> x[3]                              # access 3rd element
[1] married
Levels: married single
> x[c(2, 4)]                        # access 2nd and 4th element
[1] married single
Levels: married single
> x[-1]                             # access all but 1st element
[1] married married single
Levels: married single
> x[c(TRUE, FALSE, FALSE, TRUE)]    # using logical vector
[1] single single
Levels: married single
How to modify a factor?
Components of a factor can be modified using simple assignments. However, we
cannot choose values outside of its predefined levels.
> x
[1] single married married single
Levels: single married divorced
> x[2] <- "divorced"    # modify second element
> x
[1] single divorced married single
Levels: single married divorced
> x[3] <- "widowed"     # cannot assign values outside levels
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "widowed") :
  invalid factor level, NA generated
> x
[1] single divorced <NA> single
Levels: single married divorced
A workaround to this is to add the value to the levels first.
> levels(x) <- c(levels(x), "widowed")    # add new level
> x[3] <- "widowed"
> x
[1] single divorced widowed single
Levels: single married divorced widowed

How to delete a factor?
A factor can be discarded by assigning NULL to its variable.
> x <- NULL
> x
NULL
R Arrays

Arrays are the R data objects which can store data in more than two dimensions.
For example, if we create an array of dimension (2, 3, 4) then it creates 4
rectangular matrices, each with 2 rows and 3 columns. An array is created using
the array() function. It takes vectors as input and uses the values in the dim
parameter to create an array.
> x <- array(1:9)
> x
[1] 1 2 3 4 5 6 7 8 9
> x <- array(1:9, c(3,3))
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x <- 1:18
> dim(x) <- c(3,3,2)
> print(x)
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18
Example
The following example creates an array of two matrices,
each with 3 rows and 3 columns.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim=c(3,3,2))
print(result)

When we execute the above code, it produces the following result:
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15
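Both matrices in the result above are identical because the two vectors supply only 9 values for 18 cells, and array() recycles its data to fill the rest. The sketch below checks this; the variable name values is illustrative.

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
values  <- c(vector1, vector2)

length(values)                     # 9 values supplied
result <- array(values, dim = c(3, 3, 2))
length(result)                     # 18 cells in the array

# The second matrix is filled by starting again from the first value.
same <- identical(result[, , 1], result[, , 2])
same
```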

Naming Columns and Rows

We can give names to the rows, columns and matrices in the
array by using the dimnames parameter. The names are supplied in the same
order as the dimensions: rows first, then columns, then matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim=c(3,3,2),dimnames =
list(row.names,column.names,matrix.names))
print(result)
When we execute the above code, it produces the following
result:
, , Matrix1

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

, , Matrix2

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15
Accessing Array Elements
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim=c(3,3,2),dimnames =
list(row.names,column.names,matrix.names))
# Print the third row of the second matrix of the array.
print(result[3,,2])
# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])
# Print the 2nd matrix.
print(result[,,2])

When we execute the above code, it produces the following result:
COL1 COL2 COL3
   3   12   15
[1] 13
     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15
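Once an array has dimnames, its elements can also be selected by name instead of by position. A minimal sketch, assuming the names are given in row, column, matrix order:

```r
result <- array(
  c(5, 9, 3, 10, 11, 12, 13, 14, 15),
  dim = c(3, 3, 2),
  dimnames = list(
    c("ROW1", "ROW2", "ROW3"),   # row names
    c("COL1", "COL2", "COL3"),   # column names
    c("Matrix1", "Matrix2")      # matrix names
  )
)

third_row <- result["ROW3", , "Matrix2"]       # third row of the second matrix
one_cell  <- result["ROW1", "COL3", "Matrix1"] # a single element
dimnames(result)[[3]]                          # the matrix names
```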
Manipulating Array Elements
As an array is made up of matrices in multiple dimensions, operations on the
elements of an array are carried out by accessing elements of the matrices.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim=c(3,3,2))
# Create two more vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
# The 12 values are recycled to fill the 18 cells of the array.
array2 <- array(c(vector3,vector4),dim=c(3,3,2))
# Create matrices from these arrays.
matrix1 <- array1[,,2]
matrix2 <- array2[,,2]
# Add the matrices.
result <- matrix1+matrix2
print(result)

When we execute the above code, it produces the following result:
     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26
Calculations across the elements of an array can be done with the apply() function.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim=c(3,3,2))
print(new.array)
# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)

When we execute the above code, it produces the following result:
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

[1] 56 68 60
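The second argument of apply() selects the dimension (margin) that is kept; everything else is collapsed by the function. The sketch below applies sum over different margins of the same array. The result names row_sums, matrix_sums and cell_sums are illustrative.

```r
new.array <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

# Margin 1: one sum per row, taken across all columns and both matrices.
row_sums <- apply(new.array, 1, sum)

# Margin 3: one sum per matrix.
matrix_sums <- apply(new.array, 3, sum)

# Margins 1 and 2: one sum per (row, column) cell, taken across the matrices.
cell_sums <- apply(new.array, c(1, 2), sum)
```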
You can create an array easily with the array() function, where you give the data as
the first argument and a vector with the sizes of the dimensions as the second
argument. The number of dimension sizes in that argument gives you the number of
dimensions. For example, you make an array with three rows, four columns, and two
"tables" like this:
> my.array <- array(1:24, dim=c(3,4,2))
> my.array
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

This array has three dimensions. Notice that, although the rows are given as the first
dimension, the tables are filled column-wise. So, for arrays, R fills the columns, then
the rows, and then the rest.
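The fill order can be checked by comparing three-index access with single-index access into the underlying vector. This is a small sketch; the index values and the name pos are chosen only for illustration.

```r
my.array <- array(1:24, dim = c(3, 4, 2))

# With 3 rows and 4 columns, the element in row i, column j, table k
# sits at position i + (j - 1) * 3 + (k - 1) * 3 * 4 of the data.
i <- 2; j <- 3; k <- 2
pos <- i + (j - 1) * 3 + (k - 1) * 3 * 4

my.array[i, j, k]   # three-index access
my.array[pos]       # the same element through its linear position
```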
CHANGE THE DIMENSIONS OF A VECTOR IN R
Alternatively, you could just add the dimensions using the dim() function. This is a little
hack that goes a bit faster than using the array() function; it’s especially useful if you
have your data already in a vector. (This little trick also works for creating matrices, by
the way, because a matrix is nothing more than an array with only two dimensions.)
Say you already have a vector with the numbers 1 through 24, like this:
> my.vector <- 1:24
You can easily convert that vector to an array exactly like my.array simply by
assigning the dimensions, like this:
> dim(my.vector) <- c(3,4,2)
If you check what my.vector looks like now, you see there is no difference from the
array my.array that you created before.
> identical(my.array, my.vector)
[1] TRUE
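The same approach works for matrices, as noted above, and assigning NULL to the dimensions turns the object back into a plain vector. A minimal sketch with an illustrative variable m:

```r
m <- 1:6
dim(m) <- c(2, 3)          # now a matrix with 2 rows and 3 columns

is_mat   <- is.matrix(m)
same_mat <- identical(m, matrix(1:6, nrow = 2))

dim(m) <- NULL             # remove the dimensions again
back     <- identical(m, 1:6)
```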
R Objects and Classes: Introduction and Types
We can do object oriented programming in R. In fact, everything in R is an
object.
Class is a blueprint for the object.
An object is also called an instance of a class and the process of creating this
object is called instantiation.

R has three class systems:
1. S3
2. S4
3. Reference classes (the most recent)
They have their own features and peculiarities, and choosing one over the other is a
matter of preference. Below, we give a brief introduction to them.

Comparison of S3, S4 and Reference Class

S3 Class | S4 Class | Reference Class
Lacks formal definition | Class defined using setClass() | Class defined using setRefClass()
Objects are created by setting the class attribute | Objects are created using new() | Objects are created using generator functions
Attributes are accessed using $ | Attributes are accessed using @ | Attributes are accessed using $
Methods belong to generic function | Methods belong to generic function | Methods belong to the class
Follows copy-on-modify semantics | Follows copy-on-modify semantics | Does not follow copy-on-modify semantics
R S3 Class
In this section, you will learn to work with S3 classes (one of the three class
systems in R).

The S3 class is the most popular and prevalent class system in the R programming language.
Most of the classes that come predefined in R are of this type, because it is simple
and easy to implement.
How to define S3 class and create S3 objects?
S3 class has no formal, predefined definition.
Basically, a list with its class attribute set to some class name, is an S3 object. The
components of the list become the member variables of the object.
Following is a simple example of how an S3 object of class student can be created.
> # create a list with required components
> s <- list(name = "John", age = 21, GPA = 3.5)
> # name the class appropriately
> class(s) <- "student"
> # That's it! We now have an object of class "student"
> s
$name
[1] "John"

$age
[1] 21

$GPA
[1] 3.5

attr(,"class")
[1] "student"
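In S3, methods belong to generic functions: a function named generic.class is picked whenever the generic is called on an object of that class. The following sketch adds a print method for the student class; the output format inside print.student is an illustrative choice.

```r
s <- list(name = "John", age = 21, GPA = 3.5)
class(s) <- "student"

# A method for the built-in generic print(), named <generic>.<class>.
print.student <- function(x, ...) {
  cat("Student:", x$name, "| age:", x$age, "| GPA:", x$GPA, "\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

print(s)                 # dispatches to print.student
inherits(s, "student")   # TRUE
```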
R S4 Class: In this section, you'll learn about S4 classes in R: how to
define them, create objects from them, and access their slots.

How to define an S4 class?
An S4 class is defined using the setClass() function.
In R terminology, member variables are called slots. While defining a class, we need
to set its name and the slots (along with the class of each slot) it is going to have.
Example 1: Definition of S4 class
setClass("student", slots=list(name="character", age="numeric",
GPA="numeric"))
In the above example, we defined a new class called student with three slots:
name, age and GPA.
There are other optional arguments of setClass() which you can explore in the help
section with ?setClass.

How to create S4 objects?


S4 objects are created using the new() function.
Example 2: Creation of S4 object
> # create an object using new()
> # provide the class name and value for slots
> s <- new("student",name="John", age=21, GPA=3.5)

> s
An object of class "student"
Slot "name":
[1] "John"

Slot "age":
[1] 21

Slot "GPA":
[1] 3.5

How to access and modify a slot?
Just as components of a list are accessed using $, slots of an object are accessed
using @.
Accessing slot:
> s@name
[1] "John"
> s@GPA
[1] 3.5
> s@age
[1] 21
Modifying slot directly
A slot can be modified through reassignment.
> # modify GPA
> s@GPA <- 3.7
> s
An object of class "student"
Slot "name":
[1] "John"

Slot "age":
[1] 21

Slot "GPA":
[1] 3.7
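Two consequences of the formal S4 definition can be shown briefly: slot values are checked against the declared classes, and methods are attached to generic functions. This is a sketch only; the generic name describe is hypothetical and not part of the notes.

```r
library(methods)

setClass("student", slots = list(name = "character", age = "numeric",
                                 GPA = "numeric"))
s <- new("student", name = "John", age = 21, GPA = 3.5)

# A value of the wrong class for a slot is rejected by new().
bad <- tryCatch(
  new("student", name = "John", age = "twenty-one", GPA = 3.5),
  error = function(e) "rejected"
)

# Define a generic ('describe' is a hypothetical name) and a method for student.
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "student", function(object) {
  paste(object@name, "has GPA", object@GPA)
})

describe(s)
```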
R Reference Class
In this section, you will learn to work with reference classes, one of the three class
systems in R (the other two are S3 and S4).

Reference classes in R are similar to the object oriented programming we are used to
seeing in common languages like C++, Java and Python.
Unlike S3 and S4 classes, methods belong to the class rather than to generic functions.
Reference classes are internally implemented as S4 classes with an environment added
to them.
How to define a reference class?
Defining a reference class is similar to defining an S4 class. Instead of setClass() we
use the setRefClass() function.
> setRefClass("student")
Member variables of a class, if defined, need to be included in the class definition.
Member variables of a reference class are called fields (analogous to slots in S4
classes).
Following is an example that defines a class called student with 3
fields: name, age and GPA.
> setRefClass("student", fields = list(name = "character", age =
"numeric", GPA = "numeric"))
How to create reference objects?
The function setRefClass() returns a generator function which is used to create
objects of that class.
> student <- setRefClass("student",
fields = list(name = "character", age = "numeric", GPA =
"numeric"))
> # now student() is our generator function which can be used to
> # create new objects
> s <- student(name = "John", age = 21, GPA = 3.5)
> s
Reference class object of class "student"
Field "name":
[1] "John"
Field "age":
[1] 21
Field "GPA":
[1] 3.5
How to access and modify fields?
Fields of the object can be accessed using the $ operator.
> s$name
[1] "John"

> s$age
[1] 21

> s$GPA
[1] 3.5
Similarly, a field is modified by reassignment.
> s$name <- "Paul"

> s
Reference class object of class "student"
Field "name":
[1] "Paul"
Field "age":
[1] 21
Field "GPA":
[1] 3.5
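The comparison earlier says that reference class methods belong to the class and that objects do not follow copy-on-modify semantics. Both points can be seen in the sketch below; the method name birthday and the variable names a, b and c2 are illustrative.

```r
library(methods)

student <- setRefClass("student",
  fields = list(name = "character", age = "numeric", GPA = "numeric"),
  methods = list(
    # 'birthday' is an illustrative method; <<- changes the object's own field.
    birthday = function() {
      age <<- age + 1
      invisible(.self)
    }
  )
)

a  <- student(name = "John", age = 21, GPA = 3.5)
b  <- a          # b refers to the same object as a
c2 <- a$copy()   # copy() creates an independent object

a$birthday()

a$age    # changed by the method
b$age    # changed as well, because a and b are the same object
c2$age   # unchanged, because c2 is a separate copy
```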
