Nirula R Programming Lab Manual (1)

The R Programming Lab Manual for MBA students outlines the course objectives, outcomes, and structure, focusing on the installation and use of R for programming tasks. It covers various topics including data types, built-in functions, and statistical analysis techniques, with exercises designed to enhance practical skills in R. The manual also includes a detailed index and references for further reading.

R PROGRAMMING

LAB MANUAL

MASTER OF BUSINESS
ADMINISTRATION
PROGRAM: MBA, I Year II-Sem
COURSE NAME & CODE: R Programming LAB – C 207
L-T-P STRUCTURE: 0-0-2
COURSE CREDITS: 2

Pre-requisite:

• No pre-requisite

Objectives:

This course provides the knowledge to install and use R for simple programming
tasks and to extend R with libraries and packages. It helps students develop R
programs using looping constructs and R mathematical functions that can
be used for data exploration in R.

Course Outcomes:

At the end of this course, the student will be able to:

CO1: Master the use of the R interactive environment.
CO2: Expand R by installing R packages.
CO3: Develop loop constructs in R.
CO4: Use R for descriptive statistics.
CO5: Use R for inferential statistics.

COURSE ARTICULATION MATRIX (Correlation between COs & POs):

Course Code: C207
POs: PO1 PO2 PO3 PO4 PO5
CO1: 1 1 1 2
CO2: 1 1 1
CO3: 1 2 1
CO4: 1 2 1
CO5: 1 2 1

1 = Slight (Low) 2 = Moderate (Medium) 3 = Substantial (High)


TEXT BOOKS:
T1: The Art of R Programming, Norman Matloff, Cengage Learning
T2: R for Everyone, Lander, Pearson

REFERENCE BOOKS:
R1: R Cookbook, Paul Teetor, O'Reilly.
R2: R in Action, Rob Kabacoff, Manning.


INDEX
S.No Program Description
1. History of R
2. Installing R and packages in R.
3. Programs on data types in R.
4. Built-in Functions in R
5. Creating and manipulating a vector in R.
6. Creating matrix and manipulating matrix in R.
7. Creating and operations on Factors in R.
8. Viva Voce Questions and Answers – Cycle-I
9. Operations on Data Frames in R.
10. Operations on Lists in R.
11. Programs on Operators in R.
12. Comparison of Matrices and Vectors in R.
13. Programs on If – else statements in R.
14. Programs on For Loops in R.
15. Programs on While Loops in R.
16. Customizing and Saving to Graphs in R.
17. Viva Voce Questions and Answers – Cycle-II

Additional Experiments

1. PLOT Function in R to customize graphs.
2. 3D PLOT in R to customize graphs.


History of R

R is a programming language and free software environment for statistical
computing and graphics that is supported by the R Foundation for Statistical
Computing. The R language is widely used among statisticians and data
miners for developing statistical software and data analysis.
R is an implementation of the S programming language combined with
lexical scoping semantics inspired by Scheme. S was created by John
Chambers in 1976, while at Bell Labs. There are some important differences,
but much of the code written for S runs unaltered.
R was created by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand, and is currently developed by the R Development
Core Team, of which Chambers is a member. R is named partly after the first
names of the first two R authors and partly as a play on the name of S. The
project was conceived in 1992, with an initial version released in 1995 and
a stable beta version in 2000.
R and its libraries implement a wide variety of statistical and graphical
techniques, including linear and nonlinear modelling, classical statistical
tests, time-series analysis, classification, clustering, and others. R is easily
extensible through functions and extensions, and the R community is noted
for its active contributions in terms of packages. Many of R's standard
functions are written in R itself, which makes it easy for users to follow the
algorithmic choices made.
R is an interpreted language; users typically access it through a command-
line interpreter. If a user types 2+2 at the R command prompt and presses
enter, the computer replies with 4, as shown below:
> 2+2
[1] 4
Features of R
As stated earlier, R is a programming language and software environment
for statistical analysis, graphics representation and reporting. The
following are the important features of R:
• R is a well-developed, simple and effective programming language which
includes conditionals, loops, user-defined recursive functions and input
and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists,
vectors and matrices.
• R provides a large, coherent and integrated collection of tools for
data analysis.
• R provides graphical facilities for data analysis and display, either
directly on the computer screen or printed on paper.


Exercise – II
To Install R and R Packages

1. Open an internet browser and go to www.r-project.org.
2. Click the "download R" link in the middle of the page under "Getting
Started."
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the "Download R for Windows" link at the top of the page (or the
link for macOS or Linux if that is your system).
5. Click on the file containing the latest version of R under "Files."
6. Save the installer file (.exe on Windows, .pkg on macOS), double-click it
to open, and follow the installation instructions.
7. Now that R is installed, you need to download and install RStudio.

To Install RStudio

1. Go to www.rstudio.com and click on the "Download RStudio" button.
2. Click on "Download RStudio Desktop."
3. Click on the version recommended for your system and save the installer
on your computer. On Windows, run the installer and follow the instructions;
on macOS, double-click the .dmg file to open it and then drag and drop
RStudio to your Applications folder.

To Install R Packages

The capabilities of R are extended through user-created packages, which
allow specialized statistical techniques, graphical devices, import/export
capabilities, reporting tools (knitr, Sweave), etc. These packages are
developed primarily in R, and sometimes in Java, C, C++, and Fortran. The R
packaging system is also used by researchers to create compendia to
organise research data, code and report files in a systematic way for sharing
and public archiving.
A core set of packages is included with the installation of R, with more than
12,500 additional packages (as of May 2018) available at the
Comprehensive R Archive Network (CRAN).

Packages are collections of R functions, data, and compiled code in a well-
defined format. The directory where packages are stored is called the
library. R comes with a standard set of packages. Others are available for
download and installation. Once installed, they have to be loaded into the
session to be used.

.libPaths()   # get library location
library()     # see all packages installed
search()      # see packages currently loaded

Adding R Packages

You can expand the types of analyses you do by adding other packages. A
complete list of contributed packages is available from CRAN.

Follow these steps:

1. Download and install a package (you only need to do this once).
2. To use the package, invoke the library(package) command to load it into
the current session. (You need to do this once in each session, unless you
customize your environment to automatically load it each time.)

Installing and Loading Packages

It turns out the ability to estimate ordered logistic or probit regression is
included in the MASS package.
To install this package you run the following command:
> install.packages("MASS")

You will be asked to pick a CRAN mirror from which to download
(generally the closer the faster) and R will install the package to your
library. Installing a package does not make its functions available: you
still have to load the package each time you start an R session, like so:

> library("MASS")

R now knows all the functions that are provided by the MASS package. To see
what functions are implemented in the MASS package, type:
> library(help = "MASS")
Maintaining your Library

Packages are frequently updated. Depending on the developer this could
happen very often. To keep your packages updated enter this every once in
a while:
> update.packages()
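
The install-once, load-every-session workflow described above can be sketched as follows. The helper name ensure_package is hypothetical (it is not part of R), and the sketch assumes a CRAN mirror is reachable whenever a package actually has to be installed.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # needed once per machine
  }
  library(pkg, character.only = TRUE)   # needed once per session
  invisible(pkg %in% loadedNamespaces())
}

# 'stats' ships with R, so this call downloads nothing.
ensure_package("stats")
```

Replacing "stats" with "MASS" would install MASS on first use and simply load it afterwards.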

The Workspace
The workspace is your current R working environment and includes any
user-defined objects (vectors, matrices, data frames, lists, functions). At the
end of an R session, the user can save an image of the current workspace
that is automatically reloaded the next time R is started. Commands are
entered interactively at the R user prompt. Up and down arrow keys
scroll through your command history.

You will probably want to keep different projects in different physical
directories. Here are some standard commands for managing your
workspace.
getwd()                  # print the current working directory
ls()                     # list the objects in the current workspace
setwd(mydirectory)       # change to mydirectory
setwd("c:/docs/mydir")   # note / instead of \ in Windows
# view and set options for the session
help(options)            # learn about available options
options()                # view current option settings
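
A short sketch of these workspace commands in use. The object names x and msg are made up for illustration, and a temporary directory stands in for a real project directory.

```r
# Create two objects, list them, then remove one.
x <- 1:5
msg <- "hello"
ls()                            # object names in the workspace, including "x" and "msg"
rm(x)                           # delete x from the workspace
exists("x", inherits = FALSE)   # FALSE once x has been removed

# Change the working directory and change it back again.
old <- getwd()                  # remember the current working directory
setwd(tempdir())                # switch to a temporary directory
setwd(old)                      # return to where we started
```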
Exercise – III
DATA TYPES
You may like to store information of various data types like character, wide
character, integer, floating point, double floating point, Boolean etc. Based
on the data type of a variable, the operating system allocates memory and
decides what can be stored in the reserved memory.
Variables are assigned R objects, and the data type of the R object
becomes the data type of the variable. There are many types of R objects. The
frequently used ones are:

• Vectors: A basic data structure of R containing the same type of data.
• Matrices: A matrix is a two-dimensional rectangular data set. It can be
created using a vector input to the matrix function.
• Factors: Factors are R objects created from a vector. A factor stores
the vector along with the distinct values of its elements as labels. The
labels are always character, irrespective of whether the input vector is
numeric, character or Boolean. Factors are useful in statistical
modelling.
• Data Frames: Data frames are tabular data objects. Unlike in a matrix,
each column of a data frame can contain a different mode of data: the first
column can be numeric while the second can be character and the third
logical. A data frame is a list of vectors of equal length.
• Lists: A list is an R object which can contain many different types of
elements, such as vectors, functions and even another list.
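
As an illustration of the five object types listed above, the following sketch creates one of each. All variable names and values are made up for the example.

```r
v  <- c(10, 20, 30)                      # vector: all elements share one type
m  <- matrix(1:6, nrow = 2)              # matrix: 2 rows by 3 columns
f  <- factor(c("low", "high", "low"))    # factor: categorical values with levels
df <- data.frame(id = 1:3,               # data frame: columns of different types
                 name = c("a", "b", "c"),
                 passed = c(TRUE, FALSE, TRUE))
l  <- list(v, m, mean)                   # list: a vector, a matrix and a function

# Ask R what kind of object each one is.
class(v); class(m); class(f); class(df); class(l)
```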
Modes
All objects have a certain mode. Some objects can only deal with one mode
at a time, others can store elements of multiple modes. R distinguishes the
following modes:
1. integer: integers (e.g. 1, 2 or -69)
2. numeric: real numbers (e.g 2.336, -0.35)
3. complex: complex or imaginary numbers
4. character: elements made up of text-strings (e.g. "text", "Hello World!",
or "123")
5. logical: data containing logical constants (i.e. TRUE and FALSE)
A vector is called atomic when it holds data of a single type only. Examples
of each type:
• character: "a", "swc"
• numeric: 2, 15.5
• integer: 2L (the L tells R to store this as an integer)
• logical: TRUE, FALSE
• complex: 1+4i (complex numbers with real and imaginary parts)

R provides many functions to examine features of vectors and other
objects, for example:

• class() - what kind of object is it (high-level)?
• typeof() - what is the object's data type (low-level)?
• length() - how long is it? What about two-dimensional objects?
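
A small sketch of these inspection functions, including the two-dimensional case raised above. The values are chosen only for illustration.

```r
x <- c(1.5, 2, 3)
class(x)     # "numeric"
typeof(x)    # "double"
length(x)    # 3

n <- 2L
class(n)     # "integer"
typeof(n)    # "integer"

m <- matrix(1:6, nrow = 2)
length(m)    # 6: the total number of elements
dim(m)       # 2 3: rows and columns
```

For a two-dimensional object, length() counts every element, so dim() is the function that reports its shape.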

1. Use R to calculate the following:
I. 31 * 78
Sol: > 31*78
[1] 2418
II. 697 / 41
Sol: > 697 / 41
[1] 17
2. Assign the value of 39 to x
Sol: > x<-39
> x
[1] 39
3. Assign the value of 22 to y
Sol: > y<-22
> y
[1] 22
4. Make z the value of x - y
Sol: > z<- x - y
5. Display the value of z in the console
Sol: > z
[1] 17
6. Calculate the square root of 2345, and perform a log2 transformation on
the result.
Sol: > log2(sqrt(2345))
[1] 5.597686
7. Type the following code, which assigns numbers to objects x and y.
x <- 10
y <- 20
I. Calculate the product of x and y.
Sol: > x<-10
> y<-20
> x*y
[1] 200
II. Store the result in a new object called z.
Sol: > z<-x*y
> z
[1] 200
8. Calculate the following quantities:
I. The sum of 100.1, 234.9 and 12.01.
Sol: > 100.1+234.9+12.01
[1] 347.01
II. The square root of 256.
Sol: > sqrt(256)
[1] 16
III. Calculate the 10-based logarithm of 100, and multiply the result
with the cosine of π. Hint: see ?log and ?pi.
Sol: > log10(100)*cos(pi)
[1] -2
Exercise – IV
Built-in Functions
Almost everything in R is done through functions. Here I'm only referring to
numeric and character functions that are commonly used in creating or
recoding variables.
Numeric Functions

Function                 Description
abs(x)                   absolute value
sqrt(x)                  square root
ceiling(x)               ceiling(3.475) is 4
floor(x)                 floor(3.475) is 3
trunc(x)                 trunc(5.99) is 5
round(x, digits=n)       round(3.475, digits=2) is 3.48
signif(x, digits=n)      signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)   also acos(x), cosh(x), acosh(x), etc.
log(x)                   natural logarithm
log10(x)                 common logarithm
exp(x)                   e^x
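
Several of the functions in the table can be tried directly at the prompt. The inputs below are arbitrary illustrative values.

```r
abs(-3.475)        # 3.475
sqrt(16)           # 4
ceiling(3.475)     # 4
floor(3.475)       # 3
trunc(5.99)        # 5
signif(3.475, 2)   # 3.5
log10(1000)        # 3
exp(log(5))        # 5, up to floating-point rounding
```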

1. Calculate the cumulative sum ('running total') of the numbers 2, 3, 4,
5, 6. Hint: use the cumsum() function.
Sol: > sum(2:6)
[1] 20
> cumsum(2:6)
[1] 2 5 9 14 20
2. Print the numbers 1 to 10 in reverse order. Hint: use the rev function.
Sol:
> rev(1:10)
[1] 10 9 8 7 6 5 4 3 2 1
3. Calculate the cumulative sum of those numbers, but in reverse
order.
Sol: > rev(cumsum(1:10))
[1] 55 45 36 28 21 15 10 6 3 1
4. Find 10 random numbers between 0 and 100. (Hint: you can use the
sample() function)
Sol: > sample(1:100)
[1] 92 86 59 88 19 2 37 23 89 29 18 87 15 30 32 63 14 75
[19] 12 49 72 66 24 20 54 68 48 69 5 99 22 61 83 90 7 94
[37] 81 3 84 43 26 82 80 53 41 27 71 9 38 1 47 10 51 40
[55] 46 44 13 45 100 34 42 79 6 96 4 97 57 28 73 95 91 65
[73] 93 58 39 8 16 17 78 60 36 35 74 85 55 31 76 25 98 70
[91] 33 77 21 56 52 67 50 62 11 64
Note that sample(1:100) returns a random permutation of all 100 values. To
draw only 10 of them, pass the sample size as the second argument:
sample(1:100, 10). The output differs on every run.
5. Calculate and verify the value of x where x = 5, 5*x -> x, x
Sol: > x<-5
> 5*x->x
> x
[1] 25
6. Compute log to the base 10 (log10) of the sqrt of 100. Do not use
variables.
Sol: > log10(sqrt(100))
[1] 1
Exercise – V

Vectors: A basic data structure of R containing the same type of data.

Creating Vector
Vectors are generally created using the c() function.
Since a vector must have elements of the same type, this function will try
to coerce elements to the same type if they are different.
Coercion goes from lower to higher types: from logical to integer to double
to character.

> x <- c(1, 5, 4, 9, 0)
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x <- c(1, 5.4, TRUE, "hello")
> x
[1] "1" "5.4" "TRUE" "hello"
> typeof(x)
[1] "character"
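
The coercion order stated above (logical, then integer, then double, then character) can be checked one step at a time. This is a small sketch with made-up values.

```r
typeof(c(TRUE, 2L))    # "integer": logical is promoted to integer
typeof(c(2L, 3.5))     # "double": integer is promoted to double
typeof(c(3.5, "a"))    # "character": double is promoted to character

# Logical values behave as 1 and 0 in arithmetic.
sum(c(TRUE, FALSE, TRUE))    # 2

# Coercion can also be requested explicitly.
as.numeric(c("1", "5.4"))    # 1.0 5.4
```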

If we want to create a vector of consecutive numbers, the : operator is very
helpful.
Example 1: Creating a vector using : operator

> x <- 1:7; x

[1] 1 2 3 4 5 6 7

> y <- 2:-2; y

[1] 2 1 0 -1 -2

More complex sequences can be created using the seq() function, like
defining number of points in an interval, or the step size.

Example 2: Creating a vector using seq() function

> seq(1, 3, by=0.2) # specify step size

[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0

> seq(1, 5, length.out=4) # specify length of the vector

[1] 1.000000 2.333333 3.666667 5.000000

VECTORS EXERCISE - I
1. Consider two vectors, x, y
x=c(4,6,5,7,10,9,4,15)
y=c(0,10,1,8,2,3,4,1)
What is the value of: x*y and x+y
Sol: > x<-c(4,6,5,7,10,9,4,15)
> y<-c(0,10,1,8,2,3,4,1)
> x
[1] 4 6 5 7 10 9 4 15
> y
[1] 0 10 1 8 2 3 4 1
> x*y
[1] 0 60 5 56 20 27 16 15
> x+y
[1] 4 16 6 15 12 12 8 16

2. Consider two vectors, a, b
a=c(1,5,4,3,6)
b=c(3,5,2,1,9)
What is the value of: a<=b
Sol:
> a<-c(1,5,4,3,6)
> b<-c(3,5,2,1,9)
> a<=b
[1] TRUE TRUE FALSE FALSE TRUE

3. If x=c(1:12)
What is the value of: dim(x)
What is the value of: length(x)
Sol:
> x<-c(1:12)
> dim(x)
NULL
> length(x)
[1] 12

4. If a=c(12:5)
What is the value of: is.numeric(a)
Sol:
> a<-c(12:5)
> typeof(a)
[1] "integer"
> is.numeric(a)
[1] TRUE

5. Consider two vectors, x, y
x=letters[1:10]
y=letters[15:24]
What is the value of: x<y
Sol:
> x<-letters[1:10]
> y<-letters[15:24]
> x
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> y
[1] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
> x<y
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

6. If x=c('blue', 'red', 'green', 'yellow')
What is the value of: is.character(x)
Sol:
> x<-c('blue', 'red', 'green', 'yellow')
> typeof(x)
[1] "character"
> is.character(x)
[1] TRUE

7. If x=c('blue',10,'green',20)
What is the value of: is.character(x)
Sol:
> x<-c('blue',10,'green',20)
> typeof(x)
[1] "character"
> is.character(x)
[1] TRUE

8. Consider two vectors, a, b
a=c(10,2,4,15)
b=c(3,12,4,11)
What is the value of: rbind(a,b)
Sol:
> a<-c(10,2,4,15)
> b<-c(3,12,4,11)
> a
[1] 10 2 4 15
> b
[1] 3 12 4 11
> rbind(a,b)
  [,1] [,2] [,3] [,4]
a   10    2    4   15
b    3   12    4   11

9. Consider two vectors, a, b
a=c(1,2,4,5,6)
b=c(3,2,4,1,9)
What is the value of: cbind(a,b)
Sol:
> a=c(1,2,4,5,6)
> b=c(3,2,4,1,9)
> cbind(a,b)
     a b
[1,] 1 3
[2,] 2 2
[3,] 4 4
[4,] 5 1
[5,] 6 9
Exercise – VI

VECTORS EXERCISE - II

1. The numbers below are the first ten days of rainfall amounts in 1996.
Read them into a vector using the c() function: 0.1, 0.6, 33.8, 1.9, 9.6,
4.3, 33.7, 0.3, 0.0, 0.1
Sol:
> rainfall<-c(0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1)
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
2. Inspect the rainfall data and answer the following questions:
I. What was the mean rainfall, how about the standard deviation?
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> mean(rainfall)
[1] 8.44
> sd(rainfall)
[1] 13.66473

II. Calculate the cumulative rainfall ('running total') over these ten days.
Confirm that the last value of the vector that this
produces is equal to the total sum of the rainfall.
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> cumsum(rainfall)
[1] 0.1 0.7 34.5 36.4 46.0 50.3 84.0 84.3 84.3 84.4
> all.equal(sum(rainfall), cumsum(rainfall)[10])
[1] TRUE
The comparison must use the last element of the cumulative sum, not
rainfall[10], which is only the tenth day's rainfall (0.1). all.equal() is
used instead of == so that floating-point rounding does not affect the result.
III. Which day saw the highest rainfall? Hint: which.max()
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> max(rainfall)
[1] 33.8
> which.max(rainfall)
[1] 3
3. Compute sum((x - mean(x))^2) for x = 1:10.
Sol:
> x<-c(1:10)
> sum((x - mean(x))^2)
[1] 82.5

4. The weights of five people before and after a diet programme are
given in the table.

Read the `before' and `after' values into two different vectors called
before and after. Use R to evaluate the amount of weight lost for each
participant. What is the average amount of weight lost?

Sol:
> before<-c(78, 72, 78, 79, 105)
> after<-c(67, 65, 79, 70, 93)
> before
[1] 78 72 78 79 105
> after
[1] 67 65 79 70 93
> weightlost<-before-after
> weightlost
[1] 11 7 -1 9 12
> mean(weightlost)
[1] 7.6
Exercise – VII

MATRICES EXERCISE - I

• Matrices: A matrix is a two-dimensional rectangular data set. It can
be created using a vector input to the matrix function.

Creating Matrices
To create matrices we will use the matrix() function. The matrix() function
takes the following arguments:
• data: an R object (this could be a vector).
• nrow: the desired number of rows.
• ncol: the desired number of columns.
• byrow: a logical value; TRUE fills the matrix row by row, FALSE (the
default) fills it column by column.
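
The effect of the byrow argument is easiest to see by building the same data both ways. The names by_col and by_row are chosen only for this sketch.

```r
by_col <- matrix(1:6, nrow = 2)                 # default: fill column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row by row

by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```

Only nrow is given here; R works out ncol from the length of the data.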
Creation of matrix
a) matrix1 <- matrix ( data = 1, nrow = 3, ncol = 3)
Sol:
> matrix1 <- matrix ( data = 1, nrow = 3, ncol = 3)
> matrix1
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
b) vector8 <- 1:12; matrix3 <- matrix ( data = vector8 , nrow = 4)
Sol:
> vector8 <- c(1:12)
> vector8
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> matrix3 <- matrix ( data = vector8 , nrow = 4)
> matrix3
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
c) v1<- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
Sol:
> v1<- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
> v1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
d) v2<- matrix(1:8, ncol = 2)
Sol:
> v2<- matrix(1:8, ncol = 2)
> v2
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8
e) matrix1 = matrix(1:9, nrow = 3); matrix1 + 2
Sol:
> matrix1 = matrix(1:9, nrow = 3)
> matrix1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> matrix1+2
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    4    7   10
[3,]    5    8   11

Manipulation of Matrix
f) matrix1
Sol:
> matrix1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
g) matrix1[1, 3]
Sol:
> matrix1[1, 3]
[1] 7
h) matrix1[ 2, ]
Sol:
> matrix1[ 2, ]
[1] 2 5 8
i) matrix1[,-2]
Sol:
> matrix1[,-2]
     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9

j) matrix1[1, 1] = 15
Sol:
> matrix1[1, 1] = 15
> matrix1
     [,1] [,2] [,3]
[1,]   15    4    7
[2,]    2    5    8
[3,]    3    6    9
k) matrix1[ ,2 ] = 1
Sol:
> matrix1[ ,2 ] = 1
> matrix1
     [,1] [,2] [,3]
[1,]   15    1    7
[2,]    2    1    8
[3,]    3    1    9
l) matrix1[ ,2:3 ] = 2
Sol:
> matrix1[ ,2:3 ] = 2
> matrix1
     [,1] [,2] [,3]
[1,]   15    2    2
[2,]    2    2    2
[3,]    3    2    2
Exercise – VIII

MATRICES EXERCISE - II

Mathematical Operations
R can do matrix arithmetic. Below is a list of some basic operations we can
do.
• + - * / standard scalar or element-by-element operations
• %*% matrix multiplication
• t() transpose
• solve() inverse
• det() determinant
• chol() Cholesky decomposition
• eigen() eigenvalues and eigenvectors
• crossprod() cross product

> B<-matrix(nrow=3,ncol=3,data=c(1,2,3,4,2,6,-3,-1,-3) , byrow=TRUE)
> B
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    2    6
[3,]   -3   -1   -3
> B%*%B%*%B
     [,1] [,2] [,3]
[1,]   -6    0    0
[2,]    0   -6    0
[3,]    0    0   -6
a) Create the matrix m.
Sol:
> m<-matrix(nrow=2,ncol=4,data=c(1,3,5,7,2,4,6,8) , byrow=TRUE)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
b) Calculate Transpose.
Sol:
> t(m)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8
c) Calculate Inverse.
Sol:
> solve(m)
Error in solve.default(m) : 'a' (2 x 4) must be square
Only a square matrix can be inverted, so m is redefined as a 3 x 3 matrix for
the remaining parts.
> m<-matrix(nrow=3,ncol=3,data=c(1,3,5,7,2,4,6,8,9) , byrow=TRUE)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    7    2    4
[3,]    6    8    9
d) Calculate Determinant.
Sol:
> det(m)
[1] 89
e) Calculate the Multiplication of the matrix.
Sol:
> m1<-m%*%m
> m1
     [,1] [,2] [,3]
[1,]   52   49   62
[2,]   45   57   79
[3,]  116  106  143
f) Construct a matrix with 10 columns and 10 rows, all filled with
random numbers between 0 and 100.
Sol:
> m <- matrix(runif(100), ncol=10)
Note that runif(100) draws values between 0 and 1. To draw values between 0
and 100 as the question asks, use runif(100, min=0, max=100). The outputs
shown in part g) were produced with the 0 to 1 version.
g) Calculate the row means of this matrix (Hint: use rowMeans). Also
calculate the standard deviation across the row means (now also use
sd()).
Sol:
> m1<-rowMeans(m)
> m1
[1] 0.3885344 0.6758386 0.4342555 0.5735385 0.5112892 0.4370579 0.4852983
[8] 0.6234814 0.6275129 0.7056754
> sd(m1)
[1] 0.1104536

h) Now remake the above matrix with 100 columns, and 10 rows. Then
calculate the column means (using, of course, colMeans).
Sol:
> m <- matrix(runif(1000), ncol=100,nrow=10)
> m1<-colMeans(m)
> m1
Exercise – IX

MATRICES EXERCISE - III

Q.1

Consider A=matrix(c(2,0,1,3), ncol=2) and B=matrix(c(5,2,4,-1), ncol=2).


a) Find A + B
b) Find A – B

Q.2

Scalar multiplication. Find the solution for aA where a=3 and A is the same
as in the previous question.

Q.3

Using the diag function, build a diagonal matrix of size 4 with the
following values in the diagonal: 4,1,2,3.

Q.4

Find the solution for Ab, where A is the same as in question 1
and b=c(7,4).

Q.5

Find the solution for AB, where B is the same as in question 1.

Q.6

Find the transpose matrix of A.

Q.7

Find the inverse matrix of A.


Q.8

Find the value of x on Ax=b.

Q.9

Using the function eigen find the eigenvalue for A.

Q.10

Find the eigenvalues and eigenvectors of A’A. Hint: Use crossprod to
compute A’A.
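
The manual gives no solutions for this exercise. One possible way to work through the questions is sketched below. It assumes that A in Q.4 to Q.10 is the 2 x 2 matrix from Q.1, which is the only reading that fits b=c(7,4). The expected values in the comments were worked out by hand.

```r
A <- matrix(c(2, 0, 1, 3), ncol = 2)    # rows: (2, 1) and (0, 3)
B <- matrix(c(5, 2, 4, -1), ncol = 2)   # rows: (5, 4) and (2, -1)
b <- c(7, 4)

A + B                  # Q.1a: rows (7, 5) and (2, 2)
A - B                  # Q.1b: rows (-3, -3) and (-2, 4)
3 * A                  # Q.2: rows (6, 3) and (0, 9)
diag(c(4, 1, 2, 3))    # Q.3: 4 x 4 matrix with 4, 1, 2, 3 on the diagonal
A %*% b                # Q.4: column vector (18, 12)
A %*% B                # Q.5: rows (12, 7) and (6, -3)
t(A)                   # Q.6: rows (2, 0) and (1, 3)
solve(A)               # Q.7: inverse of A
solve(A, b)            # Q.8: x = (17/6, 4/3)
eigen(A)$values        # Q.9: 3 and 2
eigen(crossprod(A))    # Q.10: eigenvalues 7 + sqrt(13) and 7 - sqrt(13)
```

Calling solve(A, b) solves Ax=b directly, which is usually preferable to computing solve(A) %*% b.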
Exercise – X

Factors

A factor is a data structure used for fields that take only a predefined,
finite number of values (categorical data).
For example, a data field such as marital status may contain only values
from single, married, separated, divorced, or widowed.
In such a case, we know the possible values beforehand, and these
predefined, distinct values are called levels. Following is an example of a
factor in R.

> x
[1] single married married single
Levels: married single

Here, we can see that factor x has four elements and two levels. We can
check whether a variable is a factor using the class() function.
Similarly, the levels of a factor can be checked using the levels() function.

> class(x)
[1] "factor"
> levels(x)
[1] "married" "single"

Creating a factor in R

We can create a factor using the function factor(). Levels of a factor are
inferred from the data if not provided.

> x <- factor(c("single", "married", "married", "single"))
> x
[1] single married married single
Levels: married single

> x <- factor(c("single", "married", "married", "single"), levels = c("single", "married", "divorced"))
> x
[1] single married married single
Levels: single married divorced

We can see from the above example that levels may be predefined even if
not used.
Factors are closely related to vectors. In fact, factors are stored as integer
vectors. This is clearly seen from the structure of a factor.

> x <- factor(c("single","married","married","single"))
> str(x)
Factor w/ 2 levels "married","single": 2 1 1 2
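
The integer storage shown by str() can also be inspected directly. This short sketch reuses the marital status example from above.

```r
x <- factor(c("single", "married", "married", "single"))

as.integer(x)               # underlying integer codes: 2 1 1 2
levels(x)                   # "married" "single"
levels(x)[as.integer(x)]    # codes mapped back to their labels
table(x)                    # count per level: married 2, single 2
```

Each code is the position of the element's label in levels(x), which is why "single" is stored as 2.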

Q.1 If x = c(1, 2, 3, 3, 5, 3, 2, 4, NA), what are the levels of factor(x)?
Sol:
> x = c(1, 2, 3, 3, 5, 3, 2, 4, NA)
> levels(factor(x))
[1] "1" "2" "3" "4" "5"

Q.2 Let x <- c(11, 22, 47, 47, 11, 47, 11). If the R expression factor(x,
levels=c(11, 22, 47), ordered=TRUE) is executed, what will be the 4th
element in the output?
Sol:
> x <- c(11, 22, 47, 47, 11, 47, 11)
> factor(x, levels=c(11, 22, 47), ordered=TRUE)
[1] 11 22 47 47 11 47 11
Levels: 11 < 22 < 47
The 4th element is 47.

Q.3 If z <- c("p", "a" , "g", "t", "b"), which R expression will
replace the third element in z with "b"?
Sol:
> z <- c("p", "a" , "g", "t", "b")
> z[3] <- "b"
> z
[1] "p" "a" "b" "t" "b"

Q.4 If z <- factor(c("p", "q", "p", "r", "q")) and the levels of z are "p", "q", "r",
write an R expression that will change the level "p" to "w" so that z is
equal to: "w", "q", "w", "r", "q".
Sol:
> z <- factor(c("p", "q", "p", "r", "q"))
> levels(z)[1] <- "w"
> z
[1] w q w r q
Levels: w q r

Q.5 If s1 <- factor(sample(letters, size=5, replace=TRUE)) and s2 <-
factor(sample(letters, size=5, replace=TRUE)), write an R
expression that will concatenate s1 and s2 in a single factor with 10
elements.
Sol:
> s1 <- factor(sample(letters, size=5, replace=TRUE))
> s2 <- factor(sample(letters, size=5, replace=TRUE))
> factor(c(levels(s1)[s1], levels(s2)[s2]))
[1] c e q l t v b k t c
Levels: b c e k l q t v

Q.6 Consider the factor responses <- factor(c("Agree", "Agree",
"Strongly Agree", "Disagree", "Agree")), with the following output:
Sol:
> responses <- factor(c("Agree", "Agree", "Strongly Agree", "Disagree", "Agree"))
> responses
[1] Agree Agree Strongly Agree Disagree Agree
Levels: Agree Disagree Strongly Agree

Q.7 If x <- factor(c("high", "low", "medium", "high", "high", "low",
"medium")), write an R expression that will provide unique numeric
values for various levels of x with the following output:
Sol:
> x <- factor(c("high", "low", "medium", "high", "high", "low", "medium"))
> data.frame(levels = unique(x), value = as.numeric(unique(x)))
  levels value
1   high     1
2    low     2
3 medium     3
Viva Voce Questions and Answers - Cycle-I

1. What is R?
R is a programming language used for developing statistical
software and for data analysis. It is increasingly deployed for
machine learning applications as well.
2. How are comments written in R?
A comment is written by putting # at the start of the text, for example
#division. R ignores everything after the # on that line.
3. What is t.test() in R?
The t.test() function is used to determine whether the means of two
groups are equal.
4. What are the disadvantages of R Programming?
The disadvantages are:
• Lack of a standard GUI.
• Not good for big data.
• Does not provide a spreadsheet view of data.
5. What is the use of the with() and by() functions in R?
The with() function applies an expression to a dataset.
#with(data, expression)
The by() function applies a function to each level of a factor or factors.
#by(data, factorlist, function)
6. In R programming, how are missing values represented?
In R, missing values are represented by NA, which must be in capital
letters.
7. What is the use of the subset() and sample() functions in R?
subset() is used to select variables and observations, and the sample()
function is used to generate a random sample of size n from a
dataset.
8. Explain what transpose is.
Transpose is used for reshaping data before analysis: it swaps the rows
and columns. Transpose is performed by the t() function.
9. What are the advantages of R?
The advantages are:
• It is used for managing and manipulating data.
• No license restrictions.
• Free and open source software.
• The graphical capabilities of R are good.
• It runs on many operating systems and different hardware, on both
32-bit and 64-bit processors.
10. What is the function used for adding datasets in R?
For adding two datasets the rbind() function is used, but the columns of
the two datasets must be the same.
Syntax: rbind(x1, x2, ...) where x1, x2 are vectors, matrices or data frames.
11. How can you produce correlations and covariances?
Correlations are produced by the cor() function and covariances by the
cov() function.
12. What is the difference between a matrix and a data frame?
A data frame can contain different types of data, but a matrix can contain
only one type of data.
13. What is the difference between lapply and sapply?
lapply returns its output as a list, whereas sapply simplifies the
output to a vector or matrix where possible.
14. What is the difference between seq(4) and seq_along(4)?
seq(4) is the vector from 1 to 4, c(1,2,3,4), whereas seq_along(4)
is a vector as long as its argument; the argument 4 has length 1, so the
result is c(1).
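
The answers to questions 13 and 14 can be checked at the prompt with a short sketch. The list scores is made up for the example.

```r
scores <- list(a = 1:3, b = 4:6)

lapply(scores, mean)    # a list with elements a = 2 and b = 5
sapply(scores, mean)    # a named numeric vector: a = 2, b = 5

seq(4)                         # 1 2 3 4
seq_along(4)                   # 1, because the vector c(4) has length 1
seq_along(c("x", "y", "z"))    # 1 2 3
```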
15. Explain how you can start the R commander GUI.
rcmdr command is used to start the R commander GUI.
16. What is the memory limit of R?
In 32 bit system memory limit is 3Gb but most versions limited to
2Gb and in 64 bit system memory limit is 8Tb.
17. How many data structures R has?
There are 5 data structure in R i.e. vector, matrix, array which are of
homogenous type and other two are list and data frame which are
heterogeneous.
18. Explain how data is aggregated in R.
There are two methods that is collapsing data by using one or more BY
variable and other is aggregate() function in which BY variable should
be in list.
19. How many sorting algorithms are available?
There are 5 types of sorting algorithms are used which are:-
 Bubble Sort
 Selection Sort
 Merge Sort
 Quick Sort
 Bucket Sort
20. How to create new variable in R programming?
For creating new variable assignment operator ‘< -’ is used
For e.g. mydata$sum <- mydata$x1 + mydata$x2
21. What are R packages?
Packages are the collections of data, R functions and compiled code in a
well-defined format and these packages are stored in library. One of the
strengths of R is the user-written function in R language.
22. What is the workspace in R?
Workspace is the current R working environment which includes any
user defined objects like vector, lists etc.
23. What is the function which is used for merging of data frames
horizontally in R?
Merge()function is used to merge two data frames
Eg. Sum<-merge(data frame1,data frame 2,by=’ID’)
24. What is the function which is used for merging of data frames
vertically in R?
rbind() function is used to merge two data frames vertically. E.g.
Sum <- rbind(dataframe1, dataframe2)
25. What is power analysis?
It is used in experimental design. It determines the sample size needed to
detect an effect of a given size with a given degree of confidence.
26. Which method is used for exporting the data in R?
There are many ways to export data into other formats such as
SPSS, SAS, Stata and Excel spreadsheets.
27. Which packages are used for exporting of data?
For Excel the xlsReadWrite package is used, and for SAS, SPSS and Stata the
foreign package is used.
28. How impossible values are represented in R?
In R NaN is used to represent impossible values.
29. Which command is used for storing R object into a file?
save command is used for storing R objects into a file. Syntax:
> save(z, file = "z.Rdata")
30. Which command is used for restoring R object from a file?
load command is used for restoring R objects from a file.
Syntax: > load("z.Rdata")
31. What is the use of coin package in R?
The coin package is used to carry out re-randomization or permutation
based statistical tests.

32. Which function is used for sorting in R?
sort() sorts a vector directly; order() returns the index permutation that
sorts it, which is how the rows of a data frame are sorted.
33. What is the difference between data frame and a matrix in R?
Data frame can contain heterogeneous inputs while a matrix cannot. In
matrix only similar data types can be stored whereas in a data frame
there can be different data types like characters, integers or other data
frames.
34. How can you add datasets in R?
rbind() function can be used to add datasets in R language provided the
columns in the datasets are the same.
35. What are factor variables in R language?
Factor variables are categorical variables that hold either string or
numeric values. Factor variables are used in various types of graphics
and particularly for statistical modelling where the correct number of
degrees of freedom is assigned to them.
36. What is the memory limit in R?
8TB is the memory limit for 64-bit system memory and 3GB is the
limit for 32-bit system memory.
37. What are the data types in R on which binary operators can be
applied?
Scalars, Matrices and Vectors.
38. How do you create log linear models in R language?
Using the loglm() function from the MASS package
39. What will be the class of the resulting vector if you concatenate a
number and NA?
numeric
40. What will be the class of the resulting vector if you concatenate a
number and a character?
character
41. Write code to build an R function powered by C?
If you want to know all the values in c(1, 3, 5, 7, 10) that are not in c(1,
5, 10, 12, 14), which in-built function in R can be used to do this? Also,
how can this be achieved without using the in-built function?
Using in-built function - setdiff(c(1, 3, 5, 7, 10), c(1, 5, 10, 12, 14))
Without using in-built function - c(1, 3, 5, 7, 10)[!c(1, 3, 5, 7, 10)
%in% c(1, 5, 10, 12, 14)]
42. How can you debug and test R programming code?
R code can be tested using Hadley's testthat package.
43. What will be the class of the resulting vector if you concatenate a
number and a logical?
numeric
44. Write a function in R language to replace the missing value in a
vector with the mean of that vector.
mean_impute <- function(x) {x[is.na(x)] <- mean(x, na.rm = TRUE); x}
45. What happens if the application object is not able to handle an
event?
The event is dispatched to the delegate for processing.
46. How do you write R comments?
A comment in R language begins with a hash symbol (#).
47. How can you verify if a given object "X" is a matrix data object?
If the function call is.matrix(X) returns TRUE then X can be termed as
a matrix data object.
48. How can you verify if a given object "X" is a matrix data object?
If the function call is.matrix(X) returns TRUE then X can be considered as
a matrix data object, otherwise not.
49. How will you measure the probability of a binary response
variable in R language?
Logistic regression can be used for this and the function glm () in R
language provides this functionality.
50. What is the use of sample and subset functions in R
programming language?
sample() function can be used to select a random sample of size 'n'
from a huge dataset.
subset() function is used to select variables and observations from a
given dataset.
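The last answer can be tried out with a short sketch. The data frame scores and its values are made up for illustration; only sample() and subset() come from the answer above.

```r
# Illustrative data frame; the column names and values are made up for this sketch.
scores <- data.frame(id = 1:10,
                     group = rep(c("A", "B"), each = 5),
                     marks = c(52, 67, 71, 48, 90, 85, 39, 77, 64, 58))

set.seed(1)                           # make the random draw reproducible
picked <- sample(nrow(scores), 3)     # 3 row numbers drawn without replacement
scores[picked, ]

# subset() keeps the rows that satisfy a condition and the selected columns
high_a <- subset(scores, group == "A" & marks > 50, select = c(id, marks))
high_a
```

Here sample() picks row numbers, which are then used to index the data frame, while subset() filters rows and columns in one call.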
Exercise – XI

Data Frames
Data frame is a two dimensional data structure in R. It is a special case of a
list in which each component has equal length.
Each component forms a column and the contents of the components form the
rows.
Creating Data Frame in R
We can create a data frame using the data.frame() function.
For example, a small data frame can be created as follows.

> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))

> str(x) # structure of x

'data.frame': 2 obs. of 3 variables:

$ SN : int 1 2

$ Age : num 21 15

$ Name: Factor w/ 2 levels "Dora","John": 2 1

Notice above that the third column, Name, is of type factor instead of a
character vector.
In R versions before 4.0.0, the data.frame() function converts character
vectors into factors by default; from R 4.0.0 onwards they are kept as
character vectors.
To suppress the conversion in older versions, we can pass the argument
stringsAsFactors=FALSE.

> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John", "Dora"),
+ stringsAsFactors = FALSE)

> str(x) # now the third column is a character vector

'data.frame': 2 obs. of 3 variables:


$ SN : int 1 2

$ Age : num 21 15

$ Name: chr "John" "Dora"

Many data input functions of R, like read.table(), read.csv(), read.delim() and
read.fwf(), also read data into a data frame.
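As a small illustration of this point, the sketch below reads CSV data with read.csv(). It assumes inline text in place of a file on disk (passed through the text argument) so that it runs without any external file; the values mirror the SN/Age/Name example above.

```r
# Inline CSV text stands in for a file on disk (an assumption for this sketch).
csv_text <- "SN,Age,Name
1,21,John
2,15,Dora"

x <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(x)   # the result is a data frame
str(x)
```

With a real file, the call would take the file path as its first argument instead of text.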
Access Components of a Data Frame
Components of data frame can be accessed like a list or like a matrix.

Accessing like a list

We can use either [, [[ or $ operator to access columns of data frame.

> x["Name"]

Name

1 John

2 Dora

> x$Name

[1] "John" "Dora"

> x[["Name"]]

[1] "John" "Dora"

> x[[3]]

[1] "John" "Dora"


Accessing with [[ or $ is similar. However, it differs for [ in that, indexing
with [ will return us a data frame but the other two will reduce it into a
vector.
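The difference between the three operators, and the matrix-style access mentioned earlier, can be checked with a minimal sketch that rebuilds the same small data frame:

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

class(x["Name"])     # single brackets keep the data frame
class(x[["Name"]])   # double brackets extract the column as a vector
class(x$Name)        # $ behaves like [[

# Accessing like a matrix: a row index followed by a column index
x[2, "Age"]          # value in row 2 of column Age
x[x$Age > 18, ]      # all columns of the rows where Age exceeds 18
```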

Question 1
Create the following data frame, afterwards invert Sex for all individuals.

Answer:
Name <- c("Alex", "Lilly", "Mark", "Oliver", "Martha", "Lucas", "Caroline")
Age <- c(25, 31, 23, 52, 76, 49, 26)
Height <- c(177, 163, 190, 179, 163, 183, 164)
Weight <- c(57, 69, 83, 75, 70, 83, 53)
Sex <- as.factor(c("F", "F", "M", "M", "F", "M", "F"))
df <- data.frame (row.names = Name, Age, Height, Weight, Sex)
levels(df$Sex) <- c("M", "F")
df

Question 2
Create this data frame (make sure you import the variable working as
character and not factor).

Add this data frame column-wise to the previous one.


a) How many rows and columns does the new data frame have?
b) What class of data is in each column?
Answer:
Name <- c("Alex", "Lilly", "Mark", "Oliver", "Martha", "Lucas", "Caroline")
Working <- c("Yes", "No", "No", "Yes", "Yes", "No", "Yes")
dfa <- data.frame(row.names = Name, Working, stringsAsFactors = FALSE)

a)
dfa <- cbind(df, dfa)
dim(dfa)
## [1] 7 5
# or:
nrow(dfa)
## [1] 7
ncol(dfa)
## [1] 5

b)
sapply(dfa, class)
##         Age      Height      Weight         Sex     Working
##   "numeric"   "numeric"   "numeric"    "factor" "character"
str(dfa) # alternative solution

Question 3
Create a simple data frame from 3 vectors. Order the entire data frame by
the first column.
# Example vectors
v <- c(45:41, 30:33)
b <- LETTERS[rep(1:3, 3)]
n <- round(rnorm(9, 65, 5))

df <- data.frame(Age = v, Class = b, Grade = n)

df[with (df, order(Age)),]

OR
df [order(df$Age), ]

Question 4
Create a data frame from a matrix of your choice, change the row names so
every row says id_i (where i is the row number) and change the column
names to variable_i (where i is the column number). I.e., for column 1 it will
say variable_1, and for row 2 will say id_2 and so on.
Answer:
matr <- matrix(1:20, ncol = 5) # Example matrix
df <- as.data.frame(matr)
colnames(df) <- paste0("variable_", 1:ncol(df))
rownames(df) <- paste0("id_", 1:nrow(df))
df

Question 5
Taking the data frame from Question 1 into consideration, answer the following questions.
a) What are the names of the students?
b) Find the mean height and the mean weight of the students.
c) Find the standard deviation of the height and weight of the students.
d) Find the number of male and female students.
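One possible way to approach these four parts is sketched below. It rebuilds the Question 1 data under the assumed name students and uses the Sex values as originally entered (before they are inverted); other approaches are equally valid.

```r
Name <- c("Alex", "Lilly", "Mark", "Oliver", "Martha", "Lucas", "Caroline")
students <- data.frame(row.names = Name,
                       Age = c(25, 31, 23, 52, 76, 49, 26),
                       Height = c(177, 163, 190, 179, 163, 183, 164),
                       Weight = c(57, 69, 83, 75, 70, 83, 53),
                       Sex = factor(c("F", "F", "M", "M", "F", "M", "F")))

rownames(students)                              # a) names of the students
colMeans(students[, c("Height", "Weight")])     # b) mean height and weight
sapply(students[, c("Height", "Weight")], sd)   # c) standard deviations
table(students$Sex)                             # d) count of each sex
```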
Question 6
The command data.frame() is used to create a data frame, each argument
representing a column.
> books <- data.frame(author=c("Ripley", "Cox", "Snijders", "Cox"),
+ year=c(1980, 1979, 1999, 2006),
+ publisher=c("Wiley", "Chapman", "Sage", "CUP"))
> books
author year publisher
1 Ripley 1980 Wiley
2 Cox 1979 Chapman
3 Snijders 1999 Sage
4 Cox 2006 CUP
(a) Create a small data frame representing a database of films. It should
contain the fields title, director, year, country, and at least three films.
(b) Create a second data frame of the same format as above, but containing
just one new film.
(c) Merge the two data frames using rbind().
(d) Try sorting the titles using sort(): what happens?
Exercise – XII

LISTS
List is a data structure having components of mixed data types.
A vector having all elements of the same type is called atomic vector but a
vector having elements of different type is called list.
We can check if it’s a list with typeof() function and find its length using
length().

Creating a list

List can be created using the list() function.

> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)

Here, we create a list x, of three components with data types double, logical
and integer vector respectively.
Its structure can be examined with the str() function.

> str(x)

We can create the same list without the tags as follows. In such scenario,
numeric indices are used by default.

> x <- list(2.5,TRUE,1:3)

>x

Q.1 If: p <- c(2,7,8), q <- c("A", "B", "C") and x <- list(p, q), then what is
the value of x[2]?
Sol:
p <- c(2,7,8)
q <- c("A", "B", "C")
x <- list(p, q)
x[2]
[[1]]
[1] "A" "B" "C"
Q.2 If: w <- c(2, 7, 8), v <- c("A", "B", "C") and x <- list(w, v), then which R
statement will replace "A" in x with "K"?

Sol:
w <- c(2, 7, 8)
v <- c("A", "B", "C")
x <- list(w, v)
x[[2]][1] <- "K"
x

[[1]]
[1] 2 7 8

[[2]]
[1] "K" "B" "C"

Q.3 If a <- list ("x"=5, "y"=10, "z"=15), which R statement will give the
sum of all elements in a?
Sol:
a <- list ("x"=5, "y"=10, "z"=15)
sum(unlist(a))

[1] 30

Q.4 If Newlist <- list(a=1:10, b="Good morning", c="Hi"), write an R statement
that will add 1 to each element of the first vector in Newlist.
Sol:
Newlist <- list(a=1:10, b="Good morning", c="Hi")
Newlist$a <- Newlist$a + 1
Newlist

$a
[1] 2 3 4 5 6 7 8 9 10 11

$b
[1] "Good morning"
$c
[1] "Hi"

Q.5 If b <- list(a=1:10, c="Hello", d="AA"), write an R expression that
will give all elements, except the second, of the first vector of b.
Sol:
b <- list(a=1:10, c="Hello", d="AA")
b$a[-2]

[1] 1 3 4 5 6 7 8 9 10

Q.6 Let x <- list(a=5:10, c="Hello", d="AA"), write an R statement to
add a new item z = "NewItem" to the list x.
Sol:
x <- list(a=5:10, c="Hello", d="AA")
x$z <- "NewItem"
x
$a
[1] 5 6 7 8 9 10
$c
[1] "Hello"
$d
[1] "AA"
$z
[1] "NewItem"

Q.7 Consider y <- list("a", "b", "c"), write an R statement that will
assign new names "one", "two" and "three" to the elements of y.
Sol:
y <- list("a", "b", "c")
names(y) <- c("one", "two", "three")
y
$one
[1] "a"
$two
[1] "b"

$three
[1] "c"
Q.8 If x <- list(y=1:10, t="Hello", f="TT", r=5:20), write an R statement
that will give the length of vector r of x.
Sol:
x <- list(y=1:10, t="Hello", f="TT", r=5:20)
length(x$r)
[1] 16

Q.9 Let string <- "Grand Opening", write an R statement to split this
string into two and return the following output:
Sol:
> string <- "Grand Opening"
> a <- strsplit(string," ")
> list(a[[1]][1], a[[1]][2])
[[1]]
[1] "Grand"

[[2]]
[1] "Opening"

Q.10 Let: y <- list ("a", "b", "c") and q <- list ("A", "B", "C", "a", "b", "c").
Write an R statement that will return all elements of q that are not in
y, with the following result:
Sol:
> y <- list("a", "b", "c")
> q <- list("A", "B", "C", "a", "b", "c")
> setdiff(q, y)

[[1]]
[1] "A"

[[2]]
[1] "B"

[[3]]
[1] "C"
Exercise – XIII
Operators in R
R has many operators to carry out different mathematical and logical
operations. Operators in R can mainly be classified into the following
categories.

Type of operators in R

Arithmetic operators

Relational operators

Logical operators

Assignment operators

R Arithmetic Operators
These operators are used to carry out mathematical operations like
addition and multiplication. Here is a list of arithmetic operators available
in R.

Arithmetic Operators in R

Operator Description

+ Addition

- Subtraction

* Multiplication

/ Division
^ Exponent

%% Modulus (Remainder from division)

%/% Integer Division

R Relational Operators
Relational operators are used to compare between values. Here is a list of
relational operators available in R.

Relational Operators in R

Operator Description

< Less than

> Greater than

<= Less than or equal to

>= Greater than or equal to

== Equal to

!= Not equal to

Operation on Vectors
The above mentioned operators work on vectors. The variables used above
were in fact single element vectors.
We can use the function c() (as in concatenate) to make vectors in R.
All operations are carried out in element-wise fashion. Here is an example.
> x <- c(2,8,3)
> y <- c(6,4,1)
> x+y
[1] 8 12 4
> x>y
[1] FALSE TRUE TRUE
When there is a mismatch in length (number of elements) of operand
vectors, the elements of the shorter one are recycled in a cyclic manner to match
the length of the longer one.
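A minimal sketch of recycling, using made-up vectors, shows both the clean case and the case where the lengths do not divide evenly:

```r
x <- c(2, 1, 8, 3)
y <- c(9, 4)

x + y        # y is reused as 9, 4, 9, 4
x - 1        # a single value is recycled across every element

# When the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning (suppressed here).
z <- suppressWarnings(c(1, 2, 3) + c(10, 20))
z
```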

R Logical Operators
Logical operators are used to carry out Boolean operations like AND, OR
etc.

Logical Operators in R

Operator Description

! Logical NOT

& Element-wise logical AND

&& Logical AND

| Element-wise logical OR

|| Logical OR

Operators & and | perform element-wise operation producing result having
length of the longer operand.
But && and || work on single values only and return a single length logical
vector. Older versions of R silently examined only the first element of
longer operands; from R 4.3.0 onwards, an operand of length greater than
one is an error.
Zero is considered FALSE and non-zero numbers are taken as TRUE. An
example run.
> x <- c(TRUE,FALSE,0,6)
> y <- c(FALSE,TRUE,FALSE,TRUE)
> !x
[1] FALSE TRUE TRUE FALSE
> x&y
[1] FALSE FALSE FALSE TRUE
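The following sketch extends the example run to the remaining operators. It assumes the same x and y as above; the variable n is introduced only to show short-circuit evaluation.

```r
x <- c(TRUE, FALSE, 0, 6)
y <- c(FALSE, TRUE, FALSE, TRUE)

x | y                 # element-wise OR

# && and || expect single values, so reduce a vector first
any(x & y)            # is any position TRUE in both vectors?
all(x | y)            # is every position TRUE in at least one vector?

# && short-circuits: the right side is skipped when the left side is FALSE
n <- 0
ok <- (n != 0) && (10 / n > 1)
ok
```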
R Assignment Operators
These operators are used to assign values to variables.

Assignment Operators in R

Operator Description

<-, <<-, = Leftwards assignment

->, ->> Rightwards assignment

The operators <- and = can be used, almost interchangeably, to assign to a
variable in the same environment.
The <<- operator is used for assigning to variables in the parent
environments (more like global assignments). The rightward assignments,
although available, are rarely used.
> x <- 5
> x
[1] 5
> x = 9
> x
[1] 9
> 10 -> x
> x
[1] 10
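The example run does not show <<-. A minimal sketch of how it differs from <- inside a function is given below; make_counter and add_local are hypothetical helper names used only for illustration.

```r
make_counter <- function() {
  count <- 0                 # lives in the environment of make_counter
  function() {
    count <<- count + 1      # updates count in the parent environment
    count
  }
}

counter <- make_counter()
counter()
counter()

total <- 5
add_local <- function() {
  total <- total + 1         # creates a local copy; the outer total is untouched
  total
}
add_local()
total
```

The counter keeps its state between calls because <<- modifies the existing variable, whereas <- inside add_local only creates a local one.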

Q.1 Basic Operations There are two main types of interest, simple
and compound. To start, let's create 4 variables: S = 100 (initial investment),
i1 = 0.1 (annual simple interest), i2 = 0.09 (annual compound interest), n = 2
(years that the investment will last).

Simple Interest Define a variable called simple equal to S x (1 + i1 x n)

Compound Interest Define a variable called compound equal to S x (1 + i2)^n

SOL :
S <- 100
i1 <- 0.1
i2 <- 0.09
n <- 2

simple <- S*(1 + i1*n)

compound <- S*(1 + i2)^n

Q.2 It's natural to ask which type of interest, for these values, gives more
money after 2 years (n = 2). Using the logical operators <, >, == check
which variable is bigger between simple and compound

SOL :

simple>compound
## [1] TRUE

Q.3 Using the logical operators <, >, >=, ==, |, & find out if simple or compound is
at least 120.
Then, using the same operators, find out if both simple and compound are at
least 120.
SOL :

simple>=120|compound>=120
## [1] TRUE
simple>=120&compound>=120
## [1] FALSE

Q.4 Formulas can deal with vectors, so let's define a vector and use it in one
of the formulas we defined earlier. Let's define S as a vector with the
following values 100, 95. Remember that c() is the function that allows us to
define vectors.

Apply to S the simple interest formula and store the value of the vector in
simple

SOL :
S <- c(100,95)
simple <- S*(1 + i1*n)

Q.5 Using logical functions <,>, == check if any of the simple values is
smaller or equal to compound

SOL :
simple<=compound
## [1] FALSE TRUE

Q.6 Using the function %/% find out how many $20 candies can you buy
with the money stored in compound

SOL :
compound%/%20
## [1] 5

Q.7 Using the function %% find out how much money is left after buying
the candies.

SOL :
compound%%20
## [1] 18.81
Q.8 Let's create two new variables, one defined as rational=1/3 and the other as
decimal=0.33. Using the logical operator != verify if these two values are
different.

SOL :
rational <- 1/3
decimal <- 0.33

rational!=decimal
## [1] TRUE

Q.9 There are other functions that can help us compare two variables.

Use the logical function == verify if rational and decimal are the same.

Use the logical function isTRUE verify if rational and decimal are the same.

Use the logical function identical verify if rational and decimal are the same.

SOL :
rational==decimal
## [1] FALSE
isTRUE(rational==decimal)
## [1] FALSE
identical(rational,decimal)
## [1] FALSE

Q.10 Using the help of the logical functions of the previous exercise find the
approximation that R uses for 1/3. Hint: It is not the value that R prints
when you define 1/3

SOL :
1/3==0.3333333333333333
## [1] TRUE
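Exact comparison with == is fragile for decimal fractions. As a hedged sketch, the stored approximation can be inspected by printing more digits, and all.equal() (a base R function not used in the exercises above) compares values within a tolerance:

```r
rational <- 1/3

# Print more digits than the default 7 to see the stored approximation
print(rational, digits = 17)
sprintf("%.20f", rational)

# == demands exact equality; all.equal() allows a small relative tolerance
rational == 0.333333333
isTRUE(all.equal(rational, 0.333333333))             # within the default tolerance
isTRUE(all.equal(rational, 0.33))                    # difference is too large
isTRUE(all.equal(rational, 0.33, tolerance = 0.05))  # accepted with a looser tolerance
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a description of the difference, not FALSE, when the values differ.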
Exercise- XIV
Logical operations and Factors.
1. Create a logical vector
x <- seq(-3,3,length=200) > 0
2. negate this vector
!x
3. Compute the truth table for logical AND
c(T,T,F,F) & c(T,F,F,T)
4. Explore arithmetic with logical and numeric
1:3 + c(T,F,T)
5. Compute the intersection of {1,2,...,10} and {5,6,...,15}

intersect(1:10,5:15)
6. Create a factor
drinks <- factor(c("beer","beer","wine","water"))
7. Examine the representation of the factor
unclass(drinks)
8. Compute the truth table for logical OR. The function xor() computes the
logical EXCLUSIVE-OR. What is the difference between the two?

9. Consider the vector 1:K, where K is a positive integer. Write an R
command that determines how many elements in the vector are
exactly divisible by 3.

K <- 100
a <- 1:K
sum(a %% 3 == 0)
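Item 8 above is left without a worked answer. One way to lay out both truth tables side by side, assuming the column names p, q, or and xor chosen here for illustration, is:

```r
p <- c(TRUE, TRUE, FALSE, FALSE)
q <- c(TRUE, FALSE, TRUE, FALSE)

truth <- data.frame(p = p, q = q, or = p | q, xor = xor(p, q))
truth
```

The two columns differ only in the first row: OR is TRUE when both inputs are TRUE, while EXCLUSIVE-OR is FALSE.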
Exercise- XV
Use R to find all the numbers between 1 and n which are multiples of
some m
You will need the operators : and %%
> 1:5
[1] 1 2 3 4 5
> 18 %% 5
[1] 3
> 4 %% 2
[1] 0

1. Use R to find all the numbers between 1 and 2000 which are
multiples of 317.
Sol:
> a <- 1:2000
> a [ a%%317 ==0]
[1] 317 634 951 1268 1585 1902

2. Find all the words with less than 6 or more than 8 characters in the
given vector.
Sol:
> a<- c("Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
"Mississippi", "Missouri", "Montana")
> a [nchar(a) >8 | nchar(a)<6]
[1] "Maine" "Massachusetts" "Minnesota" "Mississippi"
3. R has a wide range of built‐in functions.
length(c(4,2,9))
[1] 3
max(c(1,2,4,2,5,-1,1))
[1] 5
sum(c(1,2,3,4))
[1] 10
mean(c(0.5, 3.4, 8.9, 4,4, 6.7))
[1] 4.583333
sd(c(0.5, 3.4, 8.9, 4,4, 6.7))
[1] 2.893729
4. Create a list, and examine elements
x.lis <- list(a=1:10,b=letters[1:3],c=matrix(1:10,ncol=2))
x.lis$a
x.lis[[2]]
5. Element-wise arithmetic with matrices
x.mat <- matrix(1:10,ncol=2)
x.mat+1
x.mat + x.mat
6. Matrix multiplication
x.mat %*% t(x.mat)
7. Compute row and column sums of a matrix
apply (x.mat,1,sum)
apply (x.mat,2,sum)
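The matrix items above can be run together as a short sketch. rowSums() and colSums() are base R shortcuts for the two apply() calls; they are added here for comparison and are not part of the original exercise.

```r
x.mat <- matrix(1:10, ncol = 2)

x.mat * x.mat        # element-wise product
x.mat %*% t(x.mat)   # matrix product of a 5 x 2 and a 2 x 5 matrix

rowSums(x.mat)       # same result as apply(x.mat, 1, sum)
colSums(x.mat)       # same result as apply(x.mat, 2, sum)
```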
Exercise- XVI
CONDITIONAL CONTROL STRUCTURES

R if statement

The syntax of if statement is:

if (test_expression) {
  statement
}

If the test_expression is TRUE, the statement gets executed. But if it's
FALSE, nothing happens.
Here, test_expression must evaluate to a single logical or numeric value.
Older versions of R used only the first element of a longer vector (with a
warning); from R 4.2.0 onwards a condition of length greater than one is an
error.
In the case of a numeric value, zero is taken as FALSE, rest as TRUE.

Flowchart of if statement
Example: if statement

x <- 5
if(x > 0){
  print("Positive number")
}

Output

[1] "Positive number"
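The programs that follow use if together with else. A minimal sketch of the full if ... else if ... else form, wrapped in a hypothetical helper named sign_label, is:

```r
sign_label <- function(x) {   # hypothetical helper name for this sketch
  if (x > 0) {
    "Positive number"
  } else if (x < 0) {
    "Negative number"
  } else {
    "Zero"
  }
}

sign_label(5)
sign_label(-3)
sign_label(0)
```

Note that else must appear on the same line as the closing brace of the preceding block.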

Develop programs on if-else in R.


1. Program to check the leap year or not.

# Program to check if the input year is a leap year or not

year = as.integer(readline(prompt="Enter a year: "))


if((year %% 4) == 0) {
if((year %% 100) == 0) {
if((year %% 400) == 0) {
print(paste(year,"is a leap year"))
} else {

print(paste(year,"is not a leap year"))


}
} else {
print(paste(year,"is a leap year"))
}
} else {
print(paste(year,"is not a leap year"))
}

Output 1

Enter a year: 1900

[1] "1900 is not a leap year"
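The same rule can also be written as a single logical expression. The sketch below uses a hypothetical function name, is_leap_year, and works on a whole vector of years at once because it uses the element-wise operators & and |:

```r
is_leap_year <- function(year) {      # hypothetical helper name
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

is_leap_year(c(1900, 2000, 2023, 2024))
```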

2. Find the Factorial of a given Number.

# take input from the user

num = as.integer(readline(prompt="Enter a number: "))

factorial = 1

# check if the number is negative, positive or zero

if(num < 0) {

print("Sorry, factorial does not exist for negative numbers")

} else if(num == 0) {

print("The factorial of 0 is 1")

} else {
for(i in 1:num) {

factorial = factorial * i
}
print(paste("The factorial of", num ,"is",factorial))
}

Output

Enter a number: 8

[1] "The factorial of 8 is 40320"

3. Check whether the given number is Even or Odd.

# Program to check if the input number is odd or even.


# A number is even if division by 2 give a remainder of 0.
# If remainder is 1, it is odd.
num = as.integer(readline(prompt="Enter a number: "))
if((num %% 2) == 0) {
print(paste(num,"is Even"))
} else {
print(paste(num,"is Odd"))
}

Output 1

Enter a number: 89

[1] "89 is Odd"
Exercise- XVII
ITERATIVE CONTROL STRUCTURES
FOR LOOP
A for loop is used to iterate over a vector in R programming.

Syntax of for loop

for (val in sequence)

statement

Here, sequence is a vector and val takes on each of its value during the loop.
In each iteration, statement is evaluated.
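A short sketch of the two common ways to write the loop, over values and over positions, is given below. The vector marks and its values are made up for illustration.

```r
marks <- c(Maths = 72, Physics = 65, English = 80)   # illustrative values

# Loop over the values directly
total <- 0
for (m in marks) {
  total <- total + m
}
total

# Loop over positions when the index or the name is also needed
for (i in seq_along(marks)) {
  print(paste(names(marks)[i], "=", marks[i]))
}
```

seq_along() is preferred to 1:length(x) because it yields an empty sequence, and so no iterations, when the vector is empty.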

Flowchart of for loop


1. Program to count the number of even numbers in a vector.

x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
  if(val %% 2 == 0) count = count+1
}
print(count)

Output

[1] 3

2. Program to Check Whether the given number is prime or not.

# Program to check if the input number is prime or not
# take input from the user
num = as.integer(readline(prompt="Enter a number: "))
flag = 0
# prime numbers are greater than 1
if(num > 1) {
  # check for factors
  flag = 1
  for(i in 2:(num-1)) {
    if ((num %% i) == 0) {
      flag = 0
      break
    }
  }
}
# for num = 2 the sequence 2:(num-1) counts down (2, 1), so reset the flag
if(num == 2) flag = 1
if(flag == 1) {
  print(paste(num,"is a prime number"))
} else {
  print(paste(num,"is not a prime number"))
}

Output 1

Enter a number: 25

[1] "25 is not a prime number"


3. Program to display multiplication table.

# R Program to find the multiplication table (from 1 to 10)
# take input from the user
num = as.integer(readline(prompt = "Enter a number: "))
# use for loop to iterate 10 times
for(i in 1:10) {
  print(paste(num,'x', i, '=', num*i))
}

Output

Enter a number: 7
[1] "7 x 1 = 7"
[1] "7 x 2 = 14"
[1] "7 x 3 = 21"
[1] "7 x 4 = 28"
[1] "7 x 5 = 35"
[1] "7 x 6 = 42"
[1] "7 x 7 = 49"
[1] "7 x 8 = 56"
[1] "7 x 9 = 63"
[1] "7 x 10 = 70"
Exercise- XVIII
ITERATIVE CONTROL STRUCTURES
WHILE LOOP
In R programming, while loops are used to loop until a specific condition is
met.
Syntax of while loop

while (test_expression)

statement

Here, test_expression is evaluated and the body of the loop is entered if the
result is TRUE.
The statements inside the loop are executed and the flow returns to
evaluate the test_expression again.
This is repeated each time until test_expression evaluates to FALSE, in
which case, the loop exits.

Flowchart of while Loop


Example of while Loop

i <- 1
while (i < 6) {
  print(i)
  i = i+1
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

1. Check whether the given number is an Armstrong number or not.

# take input from the user
num = as.integer(readline(prompt="Enter a number: "))
# initialize sum
sum = 0
# find the sum of the cube of each digit
temp = num
while(temp > 0) {
  digit = temp %% 10
  sum = sum + (digit ^ 3)
  temp = floor(temp / 10)
}
# display the result


if(num == sum) {
print(paste(num, "is an Armstrong number"))
} else {
print(paste(num, "is not an Armstrong number"))
}

Output 1

Enter a number: 23

[1] "23 is not an Armstrong number"

2. Find sum of natural numbers without formula.

# take input from the user
num = as.integer(readline(prompt = "Enter a number: "))
if(num < 0) {
  print("Enter a positive number")
} else {
  sum = 0
  # use while loop to iterate until zero
  while(num > 0) {
    sum = sum + num
    num = num - 1
  }
  print(paste("The sum is", sum))
}

Output

Enter a number: 10

[1] "The sum is 55"

3. Program to print the Fibonacci Series

# take input from the user
nterms = as.integer(readline(prompt="How many terms? "))
# first two terms
n1 = 0
n2 = 1
count = 2
# check if the number of terms is valid
if(nterms <= 0) {
  print("Please enter a positive integer")
} else {
  if(nterms == 1) {
    print("Fibonacci sequence:")
    print(n1)
  } else {
    print("Fibonacci sequence:")
    print(n1)
    print(n2)
    while(count < nterms) {
      nth = n1 + n2
      print(nth)
      # update values
      n1 = n2
      n2 = nth
      count = count + 1
    }
  }
}

Output

How many terms? 7

[1] "Fibonacci sequence:"

[1] 0

[1] 1

[1] 1

[1] 2

[1] 3

[1] 5

[1] 8
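The programs above read their input with readline(), which only works in an interactive session. As a hedged alternative, the same while-loop logic can be wrapped in a function that returns the terms as a vector; fib_terms is a hypothetical name chosen for this sketch.

```r
fib_terms <- function(nterms) {      # hypothetical helper name
  stopifnot(nterms >= 1)
  terms <- numeric(nterms)           # pre-allocated with zeros; the first term is 0
  if (nterms >= 2) terms[2] <- 1
  i <- 3
  while (i <= nterms) {
    terms[i] <- terms[i - 1] + terms[i - 2]
    i <- i + 1
  }
  terms
}

fib_terms(7)
```

Returning a vector instead of printing inside the loop makes the result easy to reuse, for example in sum() or a plot.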
Exercise- XIX
R Bar Plot
Bar plots can be created in R using the barplot() function.
We can supply a vector or matrix to this function. If we supply a vector, the
plot will have bars with their heights equal to the elements in the vector.
Let us suppose, we have a vector of maximum temperatures (in degree
Celsius) for seven days as follows.

max.temp <- c(22, 27, 26, 24, 23, 26, 28)

Now we can make a bar plot out of this data.

barplot(max.temp)

This function can take a lot of arguments to control the way our data is
plotted. You can read about them in the help section ?barplot.
Some of the frequently used ones are, main to give the title, xlab and ylab to
provide labels for the axes, names.arg for naming each bar, col to define
color etc.
We can also plot bars horizontally by providing the argument horiz = TRUE.

# barchart with added parameters


barplot(max.temp,

main = "Maximum Temperatures in a Week",

xlab = "Degree Celsius",

ylab = "Day",

names.arg = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),

col = "darkred",

horiz = TRUE)
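The text mentions that barplot() also accepts a matrix, but only the vector case is shown. A minimal sketch of the matrix case follows; the matrix temps and its values are made up for illustration.

```r
# Illustrative values: rows are two weeks, columns are days (made up for this sketch)
temps <- matrix(c(22, 27, 26, 24,
                  20, 25, 28, 23),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Week 1", "Week 2"),
                                c("Sun", "Mon", "Tue", "Wed")))

# beside = TRUE draws the rows side by side; the default stacks them
mids <- barplot(temps,
                beside = TRUE,
                main = "Maximum Temperatures by Week",
                xlab = "Day",
                ylab = "Degree Celsius",
                col = c("darkred", "orange"),
                legend.text = rownames(temps))
```

Each column of the matrix becomes one group of bars, and barplot() returns the bar midpoints, which is useful when adding labels with text().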
Plotting Categorical Data

Sometimes we have to plot the count of each item as bar plots from
categorical data. For example, here is a vector of age of 10 college
freshmen.

age <- c(17,18,18,17,18,19,18,16,18,18)

Simply doing barplot(age) will not give us the required plot. It will plot 10
bars with height equal to each student's age. But we want to know the
number of students in each age category.
This count can be quickly found using the table() function, as shown below.

> table(age)

age

16 17 18 19

1 2 6 1

Now plotting this data will give our required bar plot. Note below, that we
define the argument density to shade the bars.

barplot(table(age),

main="Age Count of 10 Students",

xlab="Age",

ylab="Count",

border="red",
col="blue",

density=10

)
Exercise- XX

R Pie Chart

Pie chart is drawn using the pie() function in R programming . This function
takes in a vector of non-negative numbers.

> expenditure

Housing Food Cloths Entertainment Other

600 300 150 100 200

Let us consider that the above data represents the monthly expenditure
breakdown of an individual.
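The expenditure object printed above is never created in the text. One way to build it, assuming it is a named numeric vector with the labels and amounts shown, is:

```r
expenditure <- c(Housing = 600, Food = 300, Cloths = 150,
                 Entertainment = 100, Other = 200)
expenditure

# Share of each category in the total, as a percentage
round(100 * expenditure / sum(expenditure), 1)
```

The names of the vector are what pie() uses as slice labels by default.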

Example: Simple pie chart using pie()

Now let us draw a simple pie chart out of this data using the pie() function.

pie(expenditure)

We can see above that a pie chart was plotted with 5 slices. The chart was
drawn in anti-clockwise direction using pastel colors.
Example 2: Pie chart with additional parameters

pie(expenditure,

labels=as.character(expenditure),

main="Monthly Expenditure Breakdown",

col=c("red","orange","yellow","blue","green"),

border="brown",

clockwise=TRUE

)

As seen in the above figure, we have used the actual amount as labels. Also,
the chart is drawn in clockwise fashion.
Since the human eye is relatively bad at judging angles, other types of
charts are often more appropriate than pie charts.
This is also stated in the R documentation: pie charts are a very bad way
of displaying information.
Voice Questions and Answers - Cycle-II

1. Explain the significance of transpose in R language.
Transpose t() is the easiest method for reshaping the data before analysis.
2. What are with() and by() functions used for?
The with() function is used to evaluate an expression on a given dataset and the by()
function is used for applying a function to each level of a factor.
3. dplyr package is used to speed up data frame management code.
Which package can be integrated with dplyr for large fast tables?
data.table
4. In base graphics system, which function is used to add elements to a
plot?
text() (along with points() and lines()); boxplot() starts a new plot rather than adding to one.
5. What is the command used to store R objects in a file?
save(x, file = "x.Rdata")
6. What is the best way to use Hadoop and R together for analysis?
HDFS can be used for storing the data for long-term. MapReduce jobs
submitted from either Oozie, Pig or Hive can be used to encode, improve and
sample the data sets from HDFS into R. This helps to leverage complex
analysis tasks on the subset of data prepared in R.
7. What will be the output of log (-5.8) when executed on R console?
Executing the above on R console returns NaN (Not a Number) together with a
warning, because it is not possible to take the log of a
negative number.
8. How is a Date object represented internally in R language?
As the number of days since 1970-01-01, which can be seen with unclass(as.Date("2016-10-05"))
9. Which package in R supports the exploratory analysis of genomic
data?
adegenet
10. What is the difference between data frame and a matrix in R?
Data frame can contain heterogeneous inputs while a matrix cannot. In
matrix only similar data types can be stored whereas in a data frame there
can be different data types like characters, integers or other data frames.
11. What are the data types in R on which binary operators can be
applied?
Scalars, Matrices and Vectors.
12. How do you create log linear models in R language?
Using the loglm () function
13. What will be the class of the resulting vector if you concatenate a
number and NA?
numeric
14. How can you debug and test R programming code?
R code can be tested using Hadley’s testthat package.
15. What will be the class of the resulting vector if you concatenate a
number and a logical?
numeric
16. Write a function in R language to replace the missing value in a
vector with the mean of that vector.
mean_impute <- function(x) {x[is.na(x)] <- mean(x, na.rm = TRUE); x}

17. What happens if the application object is not able to handle an
event?
The event is dispatched to the delegate for processing.
18. How will you read a .csv file in R language?
read.csv() function is used to read a .csv file in R language. Below is a
simple example –
filecontent <- read.csv("sample.csv")
print(filecontent)
19. Which function helps you perform sorting in R language?
order()
20. How will you list all the data sets available in all R packages?
Using the below line of code-
data(package = .packages(all.available = TRUE))

21. Which function is used to create a histogram visualisation in R
programming language?
hist()
22. Write the syntax to set the path for current working directory in
R environment.
setwd("dir_path")
23. What is the difference between rnorm and runif functions ?
rnorm function generates "n" normal random numbers based on the mean
and standard deviation arguments passed to the function.
Syntax of rnorm function -
rnorm(n, mean = , sd = )
runif function generates "n" uniform random numbers in the interval of
minimum and maximum values passed to the function.
Syntax of runif function -
runif(n, min = , max = )
24. What will be the output on executing the following R
programming code –
mat<-matrix(rep(c(TRUE,FALSE),8),nrow=4)
sum(mat)
8
25. How will you combine multiple different strings like
"Data", "Science", "in", "R", "Programming" as a single string
"Data_Science_in_R_Programming"?
paste("Data", "Science", "in", "R", "Programming", sep="_")
26. Write a function to extract the first name from the string "Mr.
Tom White".
substr("Mr. Tom White", start=5, stop=7)

27. How to request an input from the user through keyboard
and monitor?
In R, there are a series of functions that can be used to request an input
from the user, including readline() and scan(), with cat() used to display
the prompt. The readline() function is usually the simplest choice for this task.
28. How to read data from the keyboard?
To read data from the keyboard we use scan() and readline(); print() is
then used to display what was read (it does not itself read input):
scan()
readline()
print()
29. How many ways are there to read and write files?
There are three common ways to read and write files:
Reading a data frame or matrix from a file, e.g. with read.table()
Reading a single file one line at a time, e.g. with readLines()
Writing a table to a file, e.g. with write.table()
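The three approaches can be sketched as follows. The data frame is made up, and a temporary file is used so the example does not depend on any particular path:

```r
# Hypothetical data and a temporary file path
df   <- data.frame(id = 1:3, score = c(88.5, 92.0, 79.5))
path <- tempfile(fileext = ".txt")

# Writing a table to a file
write.table(df, file = path, sep = "\t", row.names = FALSE, quote = FALSE)

# Reading a data frame back from the file
df_back <- read.table(path, header = TRUE, sep = "\t")

# Reading the same file one line at a time through a connection
con <- file(path, open = "r")
first_line  <- readLines(con, n = 1)   # header line
second_line <- readLines(con, n = 1)   # first data row
close(con)

unlink(path)  # remove the temporary file
```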
30. Explain how you can start the R commander GUI?
Typing the command library("Rcmdr") into the R console starts the R
Commander GUI (the Rcmdr package must already be installed).
31. In R how you can import Data?
You can use R Commander to import data in R, and there are three ways
through which you can enter data into it:
You can enter data directly via Data > New Data Set
Import data from a plain text (ASCII) file or other files (SPSS, Minitab, etc.)
Read a data set either by typing the name of the data set or selecting the
data set in the dialog box
32. Mention what the R language does not do?
Though R can easily connect to a DBMS, it is not itself a database
R does not include a full-featured graphical user interface of its own
Though it connects to Excel/Microsoft Office easily, R language does not
provide any spreadsheet view of data
33. Explain how R comments are written?
In R, a comment can be placed anywhere in the program by prefacing the
text with a # sign, for example
# subtraction
# division
# note order of operations exists.

34. How can you save your data in R?
There are many ways to save data in R, but the easiest way in R Commander
is to go to Data > Active Data Set > Export Active Data Set. A dialogue box
will appear; when you click OK, the dialogue box lets you save your data in
the usual way.
35. Mention how you can produce correlations and covariances?
You can use the cor() function to produce correlations and the cov()
function to produce covariances.
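A minimal sketch with made-up numbers, where score is exactly twice hours so the relationship is perfectly linear:

```r
# Hypothetical data: score is exactly 2 * hours
hours <- c(1, 2, 3, 4, 5)
score <- c(2, 4, 6, 8, 10)

r_value   <- cor(hours, score)   # correlation of a perfect linear relation is 1
cov_value <- cov(hours, score)   # 2 * var(hours) = 5

# Both functions also accept a data frame and return a matrix
cor_matrix <- cor(data.frame(hours, score))
print(cor_matrix)
```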
36. Explain what t-tests are in R?
In R, the t.test() function produces a variety of t-tests. The t-test is one
of the most common tests in statistics and is used to determine whether the
means of two groups are equal to each other.
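As an illustration with two made-up samples, a two-sample test can be run as below. With two vectors and no other arguments, t.test() performs a Welch test, which does not assume equal variances.

```r
# Hypothetical measurements for two groups
group_a <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
group_b <- c(6.2, 6.8, 7.1, 6.5, 6.9, 7.3)

result <- t.test(group_a, group_b)   # Welch two-sample t-test by default

result$estimate   # the two sample means
result$p.value    # a small p-value suggests the group means differ
```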
37. Explain what the with() and by() functions in R are used for?
The with() function is similar to DATA in SAS; it applies an expression to a
dataset.
The by() function applies a function to each level of a factor. It is
similar to BY processing in SAS.
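A small sketch using a made-up sales data frame: with() evaluates an expression using the columns of the data frame, and by() applies a function separately for each level of a factor.

```r
# Hypothetical data
sales <- data.frame(
  region  = factor(c("North", "South", "North", "South")),
  revenue = c(100, 150, 120, 170)
)

# with(): refer to columns directly, same as mean(sales$revenue)
avg_all <- with(sales, mean(revenue))                    # 135

# by(): mean revenue for each region
avg_by_region <- by(sales$revenue, sales$region, mean)   # North 110, South 160
print(avg_by_region)
```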
38. What are the data structures in R that are used to
perform statistical analyses and create graphs?
R has data structures like
Vectors
Matrices
Arrays
Data frames
39. In R how are missing values represented?
In R missing values are represented by NA (Not Available), while impossible
values are represented by the symbol NaN (not a number).
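The distinction can be checked with is.na() and is.nan(), as in this sketch with a made-up vector. Note that is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.

```r
x <- c(10, NA, 30, 0/0)   # 0/0 evaluates to NaN

na_flags  <- is.na(x)     # FALSE TRUE FALSE TRUE
nan_flags <- is.nan(x)    # FALSE FALSE FALSE TRUE

# na.rm = TRUE drops both NA and NaN before computing
clean_mean <- mean(x, na.rm = TRUE)   # 20
```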
40. What is the function used for adding datasets in R?
rbind function can be used to join two data frames (datasets). The two data
frames must have the same variables, but they do not have to be in the same
order.
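For example, in this sketch with two made-up data frames, the columns appear in a different order but are matched by name when the rows are stacked:

```r
# Hypothetical data frames with the same columns in a different order
df1 <- data.frame(id = 1:2, name = c("A", "B"))
df2 <- data.frame(name = c("C", "D"), id = 3:4)

combined <- rbind(df1, df2)   # columns are matched by name
print(combined)
```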
41. What is the use of subset() function and sample() function in R?
In R, the subset() function helps you to select variables and observations,
while the sample() function lets you choose a random sample of size n from
a dataset.
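A brief sketch with a made-up data frame and an arbitrary seed: subset() filters rows by a condition and keeps chosen columns, while sample() picks random row positions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data
df <- data.frame(id  = 1:10,
                 age = c(23, 35, 41, 29, 52, 38, 27, 45, 33, 60))

# Select observations (rows) and variables (columns)
over_35 <- subset(df, age > 35, select = c(id, age))

# Choose a random sample of 3 rows without replacement
picked_rows <- df[sample(nrow(df), size = 3), ]
```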
42. How to convert a factor variable to numeric.
Applied directly to a factor, the as.numeric() function returns the internal
integer codes of the levels and not the original values. Hence, it is
required to convert a factor variable to character before converting it to
numeric.
a <- factor(c(5, 6, 7, 7, 5))
a1 = as.numeric(as.character(a))
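The difference between the two conversions can be seen side by side in this short sketch:

```r
a <- factor(c(5, 6, 7, 7, 5))

codes  <- as.numeric(a)                 # internal codes: 1 2 3 3 1
values <- as.numeric(as.character(a))   # original values: 5 6 7 7 5

# An equivalent idiom: convert the levels once, then index by the factor
values2 <- as.numeric(levels(a))[a]
```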
43. How to extract first 3 characters from a word.

The substr() function is used to extract substrings from a character vector.
The syntax of the substr function is substr(character_vector,
starting_position, end_position)

x = "AXZ2016"
substr(x,1,3)
44. How to remove leading and trailing spaces .

The trimws() function is used to remove leading and trailing spaces.
a = " David Banes "
trimws(a)
It returns "David Banes".
45. How to generate random numbers between 1 and 100.

The runif() function is used to generate random numbers.

rand = runif(100, min = 1, max = 100)

46. How to remove all the objects.


rm(list=ls())

47. How to add 3 months to a date.

library(lubridate)  # mydate + months(3) needs the months() period helper from lubridate
mydate <- as.Date("2015-09-02")
mydate + months(3)
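The expression mydate + months(3) relies on a months() period helper such as the one provided by the lubridate package. As a sketch of an alternative that needs only base R, seq() can step a date forward by calendar months:

```r
mydate <- as.Date("2015-09-02")

# seq() returns the start date and the date 3 months later; keep the second
plus_3_months <- seq(mydate, by = "3 months", length.out = 2)[2]
print(plus_3_months)   # 2015-12-02
```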
ADDITIONAL EXPERIMENTS
R PLOT FUNCTION
The most used plotting function in R programming is the plot() function. It
is a generic function, meaning, it has many methods which are called
according to the type of object passed to plot().
In the simplest case, we can pass in a vector and we will get a scatter plot of
magnitude vs index. But generally, we pass in two vectors and a scatter plot
of these points is plotted.
For example, the command plot(c(1,2),c(3,5)) would plot the points (1,3)
and (2,5).
Here is a more concrete example where we plot a sine function over the
range -pi to pi.

x <- seq(-pi,pi,0.1)

plot(x, sin(x))
Adding Titles and Labelling Axes

We can add a title to our plot with the parameter main. Similarly, xlab and
ylab can be used to label the x-axis and y-axis respectively.

plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)")

Changing Color and Plot Type

We can see above that the plot uses circular points drawn in black. These
are the default point type and color.
We can change the plot type with the argument type. It accepts the
following strings and has the given effect.
"p" - points

"l" - lines

"b" - both points and lines

"c" - empty points joined by lines

"o" - overplotted points and lines

"s" and "S" - stair steps

"h" - histogram-like vertical lines

"n" - does not produce any points or lines

Similarly, we can define the color using col.

plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)",
type="l",
col="blue")
R 3D PLOTS
There are many functions in R programming for creating 3D plots. In this
section, we will discuss the persp() function, which can be used to create
3D surfaces in perspective view.
This function mainly takes in three variables, x, y and z, where x and y are
vectors defining the location along the x- and y-axis. The height of the
surface (z-axis) is given by the matrix z.
As an example, let's plot a cone. A simple right circular cone can be
obtained with the following function.

cone <- function(x, y){
sqrt(x^2+y^2)
}

Now let’s prepare our variables.

x <- y <- seq(-1, 1, length= 20)

z <- outer(x, y, cone)

We used the function seq() to generate a vector of equally spaced numbers.
Then, we used the outer() function to apply the function cone at every
combination of x and y.
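To see what outer() produces, here is a small sketch that uses a coarser grid of 5 points (an assumption made only to keep the matrix readable) instead of the 20 used in this section. Entry z[i, j] holds cone(x[i], y[j]).

```r
cone <- function(x, y) {
  sqrt(x^2 + y^2)
}

x <- y <- seq(-1, 1, length.out = 5)   # -1, -0.5, 0, 0.5, 1
z <- outer(x, y, cone)                 # 5 x 5 matrix of heights

dim(z)     # 5 5
z[3, 3]    # cone(0, 0) = 0, the apex of the cone
z[1, 3]    # cone(-1, 0) = 1
```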
Finally, plot the 3D surface as follows.

persp(x, y, z)
Adding Titles and Labelling Axes to Plot

We can add a title to our plot with the parameter main.


Similarly, xlab, ylab and zlab can be used to label the three axes.

Rotational angles

We can define the viewing direction using parameters theta and phi.
By default theta, azimuthal direction, is 0 and phi, colatitude direction, is
15.
Colouring and Shading Plot
Colouring of the plot is done with parameter col.
Similarly, we can add shading with the parameter shade.

persp(x, y, z,
main="Perspective Plot of a Cone",
zlab = "Height",
theta = 30, phi = 15,
col = "springgreen", shade = 0.5)
