Brief Introduction To R Kaustav Banerjee: Decision Sciences Area, IIM Lucknow

R is a programming language and software environment for statistical analysis. This document provides a brief introduction to using R through R Studio. It describes the four main windows of the R Studio interface - the editor, console, workspace, and viewer. Basic functions like plotting, help pages, and running code are explained. Examples demonstrate using R for calculations, simulations, creating variables and accessing elements of vectors and matrices. Data can be stored in R in vectors, matrices, and data frames.

Brief Introduction to R

Kaustav Banerjee
Decision Sciences Area, IIM Lucknow

1 Introduction
This is a minimal introduction to R via R Studio: we discuss only what is necessary to get
started in our course, and we will learn more about R as the course progresses.
First install R from the CRAN home page, then install R Studio (free version). When
you open R Studio and open an R script file, the interface looks like this:

Figure 1: R Studio interface

Editor This top-left editor window is for writing your code: it is the input device. Code is written
in R script files. You can run only a part of your code by using the ‘Run Region’ commands,
available in the ‘Code’ menu drop-down list.

Console This bottom-left window is the output device: you get your output in the console.

Workspace This top-right window keeps track of your workspace/history for your R session.

Viewer This bottom-right window has tabs for viewing the plot/help page/package/file etc.

Remember
1. Save your R script file as filename.R, with .R extension.
2. Do not double-click your R script file in the working directory to open it in R Studio.
Instead, open R Studio and open the script file from the File menu.
3. When you quit R Studio, a message window (Figure 2) asks whether you want to save the R
session. By default, it will save the session history. You should always select ‘Don’t Save’.

Figure 2: Quit R Studio message

Help files: Say you are trying to get the median of a data set and you need help. You can try
either of the following: (a) Go to the ‘Help’ tab located in the Viewer window and type ‘median’
in the search box; it will take you to the related help page. (b) Type help(median) in the console
window and press ‘Enter’; it will take you to the same help page.
Reading a help page is itself a task for beginners. The good thing is that each help page contains
several sample codes. You can copy-paste them into the console/editor and check how they work.
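
A few other base R commands reach the same information from the console. The lines below are a minimal sketch using median as the example; which of them is most convenient is a matter of taste.

```r
?median            # shorthand for help(median)
args(median)       # show the arguments the function accepts
example(median)    # run the sample codes from the help page
```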

2 Your first code


Suppose you need to look at the binomial probability distribution for n = 10, θ = 1/2. You type

> plot(x = 0:10, y = dbinom(0:10,10,prob = 0.5), type = "h", col = "red", lwd = 2)

Now you may feel that the outer box is unnecessary and that the axes should have proper names. To
get that, add a few more options and obtain Figure 3:

> plot(x = 0:10, y = dbinom(0:10,10,prob = 0.5), type = "h", col = "red", lwd = 2,
bty = "n", xlab = "Outcome", ylab = "Probability")

Figure 3: Binomial probability distribution. Left panel: default plot, with axes labelled 0:10 and
dbinom(0:10, 10, prob = 0.5). Right panel: plot without the outer box, with axes labelled Outcome
and Probability.

For further details on how to control different plotting options, go to the help page for ‘plot’.
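
To see what is being plotted, the heights of the bars can be computed directly. The sketch below compares dbinom with the binomial formula choose(n, x) θ^x (1 − θ)^(n − x) for n = 10, θ = 1/2; the names x, p and p.formula are ours, introduced only for illustration.

```r
x = 0:10
p = dbinom(x, size = 10, prob = 0.5)               # probabilities used as bar heights
p.formula = choose(10, x) * 0.5^x * 0.5^(10 - x)   # same values from the formula
all.equal(p, p.formula)                            # TRUE
sum(p)                                             # the probabilities add up to 1
x[which.max(p)]                                    # 5: the most likely outcome
```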

3 First few examples
3.1 R as calculator
R can be used as a calculator. To see this, go to the console window, type the following and press ‘Enter’.

> 10^3 + 500
[1] 1500

The [1] before 1500 is the index of the first element printed on that line: the output is a vector
whose first member is 1500. To simulate 5 observations from Normal(µ = 0, σ = 1), type the following
in the console window and press ‘Enter’.

> rnorm(n = 5, mean = 0, sd = 1)
[1] -0.93299939 1.79357651 -0.43180958 0.03244752 -0.27536938

Notice that you will get different numbers in your case, as this is a simulation exercise. Also, while typing
the code in the console window you may have noticed a help tab (Figure 4) appearing automatically,
telling you the right way to do things. This is one of the big advantages of R Studio. When the help
tab appears, press the ‘Tab’ key and it will guide you through the entire code so that you don’t go
wrong.

Figure 4: Help tab in R Studio

If, for the sake of reproducibility, you need to obtain the same simulated data again, fix the seed of
the random number generator with set.seed before simulating:

> set.seed(5)
> rnorm(n = 5, mean = 0, sd = 1)
[1] -0.84085548 1.38435934 -1.25549186 0.07014277 1.71144087
> set.seed(5)
> rnorm(n = 5, mean = 0, sd = 1)
[1] -0.84085548 1.38435934 -1.25549186 0.07014277 1.71144087
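
The effect of set.seed can also be checked without comparing the printed numbers by eye. This is a small sketch; the names x1, x2 and x3 are ours.

```r
set.seed(5)
x1 = rnorm(n = 5, mean = 0, sd = 1)
set.seed(5)                            # reset the generator to the same state
x2 = rnorm(n = 5, mean = 0, sd = 1)
identical(x1, x2)                      # TRUE: same seed, same numbers
x3 = rnorm(n = 5, mean = 0, sd = 1)    # no reset: the generator moves on
identical(x1, x3)                      # FALSE
```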

3.2 Workspace
You can also give numbers a name. By doing so, they become so-called variables which can be
used later. For example, you can type in the console:

> a = rnorm(n = 5, mean = 0, sd = 1)
> a
[1] -0.6029080 -0.4721664 -0.6353713 -0.2857736 0.1381082
> mean(a); var(a)
[1] -0.3716222
[1] 0.1000902

Check that ‘a’ now appears in the Workspace window, which means that R remembers what ‘a’ is. To
remove all such variables from R memory, type the following or use the clear icon of the Workspace
window. Note that clearing the History entries only removes the record of past commands, not the variables.
> rm(list = ls())
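
Variables can also be listed and removed one at a time. A minimal sketch follows; the variable z is hypothetical, created only to have something to remove.

```r
a = rnorm(n = 5, mean = 0, sd = 1)
z = 10             # a second variable, only for illustration
ls()               # names of all variables in the workspace
rm(z)              # remove a single variable
exists("z")        # FALSE
rm(list = ls())    # remove everything that is left
exists("a")        # FALSE
```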

4 Storing and accessing data


R organizes data in scalar (a single number – 0-dimensional), vector (a row of numbers, also called
an array – 1-dimensional) and matrix (like a table – 2-dimensional) formats. There are other formats
as well, such as data frame and list. We start with vectors.

4.1 Data in vector format


Let us create a small data set first by using the concatenate function c:
a = c(2,4,0,29,3)
Now we access the elements of the vector ‘a’ to do a number of things, as follows:
length(a) # How many elements?
a[3] # Get 3rd element
a[-3] # Get all but 3rd element
a[2:4] # Get 2nd, 3rd & 4th element
a[c(1,5)] # Get 1st and 5th element
a[a > 2] # Get all elements greater than 2
a[a < 3 | a > 4] # Get elements less than 3 or greater than 4
which(a == max(a)) # What is the position of the maximum value?
a[(length(a)-2):length(a)] # Get last 3 elements
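
Conditions such as a > 2 work because the comparison itself returns a vector of TRUE/FALSE values, and indexing keeps the elements where it is TRUE. The sketch below shows the intermediate results for the same vector; which.max and tail are base R shortcuts not used in the listing above.

```r
a = c(2,4,0,29,3)
a > 2            # FALSE TRUE FALSE TRUE TRUE
a[a > 2]         # 4 29 3: elements where the condition is TRUE
a < 3 | a > 4    # TRUE FALSE TRUE TRUE FALSE
sum(a > 2)       # 3: number of elements greater than 2
which.max(a)     # 4: position of the (first) maximum
tail(a, 3)       # 0 29 3: the last 3 elements
```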

4.2 Data in matrix format


In the following we have two matrices: in the second one, data are stored row-wise unlike the first.
> b = matrix(1:6, nrow = 2, ncol = 3)
> b
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> b = matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
> b
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
To access a particular row or column we write:
> b[2,]; b[,2]
[1] 4 5 6
[1] 2 5
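
Single elements and several columns at once are accessed with the same [row, column] notation. A short sketch for the row-wise matrix:

```r
b = matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
dim(b)          # 2 3: number of rows and columns
b[2, 3]         # 6: element in row 2, column 3
b[, c(1, 3)]    # columns 1 and 3, still a matrix
t(b)            # transpose: a 3 x 2 matrix
```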

When we have a matrix with many rows or columns (by convention, columns stand for variables
and rows stand for outcomes or cases), we can first create a matrix of zeros and then fill it column by column:
> b = matrix(0, nrow = 5, ncol = 2)
> b[,1] = rnorm(n = 5, mean = 0, sd = 1)
> b[,2] = rnorm(n = 5, mean = 0, sd = 1)
> b
           [,1]       [,2]
[1,]  1.2276303 -0.1389861
[2,] -0.8017795 -0.5973131
[3,] -1.0803926 -2.1839668
[4,] -0.1575344  0.2408173
[5,] -1.0717600 -0.2593554
Now we have two independent samples drawn from Normal(µ = 0, σ = 1) stored in a matrix.
Alternatively, we can create two vectors (which must be of the same length) and bind them
column-wise to create a matrix.
> b1 = rnorm(n = 5, mean = 0, sd = 1)
> b2 = rnorm(n = 5, mean = 0, sd = 1)
> b = cbind(b1,b2)
> b
            b1         b2
[1,] 0.9005119 -0.2934818
[2,] 0.9418694  1.4185891
[3,] 1.4679619  1.4987738
[4,] 0.7067611 -0.6570821
[5,] 0.8190089 -0.8527954
If we have two matrices (with the same number of columns), we can also join them row-wise:
> b1 = matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
> b2 = matrix(7:12, nrow = 2, ncol = 3, byrow = TRUE)
> b = rbind(b1, b2)
> b
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
To do some matrix operations, the function apply is very useful. Say we need the maximum value
from each row or column. MARGIN = 1 applies the function to each row, MARGIN = 2 to each column:
> apply(b, MARGIN = 1, max)
[1] 3 6 9 12
> apply(b, MARGIN = 2, max)
[1] 10 11 12
The third argument, which is the built-in function max here, can be any built-in function available
in R. It could also be a user-defined function.
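
As a sketch of the user-defined case, the function my.range below (a hypothetical helper, not part of R) returns the difference between the largest and the smallest value, and is passed to apply in the same way as max.

```r
b = rbind(matrix(1:6, nrow = 2, byrow = TRUE),
          matrix(7:12, nrow = 2, byrow = TRUE))
my.range = function(x) max(x) - min(x)        # hypothetical helper
apply(b, MARGIN = 1, my.range)                # 2 2 2 2
apply(b, MARGIN = 2, my.range)                # 9 9 9
apply(b, MARGIN = 2, function(x) sum(x^2))    # 166 214 270, using an unnamed function
```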

4.3 Data in data frame format
Often we see data presented in a tabular format similar to a spreadsheet. R has a flexible way of
handling such data: it is called a data frame. Let us create the following:

> height = c(65, 61, 70, 65)
> gender = c("F","F","M","F")
> study = data.frame(height, gender, stringsAsFactors = TRUE) # needed since R 4.0.0 to store gender as a factor
> study
  height gender
1     65      F
2     61      F
3     70      M
4     65      F

There are a number of ways to access a data frame. A few are shown below; all of them produce
the same output, which is given at the end.

study[, "gender"] # Access as array by column name
study[, 2] # Access as array by column number
study$gender # Access as list by column name
study[[2]] # Access as list by column number
[1] F F M F
Levels: F M

Notice that the column gender is categorical in nature: R stores it as a ‘factor’. One simple way
to deal with factors is to use the table function:

> table(study$gender)
F M
3 1
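
Factors are also convenient for group-wise summaries. The sketch below rebuilds the same data frame and uses the base R function tapply, which is not covered above, to get the mean height for each gender.

```r
height = c(65, 61, 70, 65)
gender = c("F","F","M","F")
study = data.frame(height, gender, stringsAsFactors = TRUE)
levels(study$gender)                        # "F" "M"
tapply(study$height, study$gender, mean)    # mean height for F and for M
study[study$gender == "F", ]                # rows with gender F only
```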

4.4 Data in list format


A list is sometimes useful in creating a larger object which is composed of smaller objects. Suppose
we have two vectors (of unequal length) to be stored. A matrix or a data frame does not help here,
since both need columns of equal length. So we use a list, as follows.

> b1 = rnorm(n = 6, mean = 0, sd = 1)
> b2 = rnorm(n = 5, mean = 0, sd = 1)
> b = list(sample1 = b1, sample2 = b2)
> b
$sample1
[1] -1.1365828 0.8548304 -0.5783704 0.4963615 -0.7600579 -0.3413863

$sample2
[1] -2.1023291 -0.3017023 -1.2723834 -0.2796661 -0.2040973

We should now be able to access these objects individually.
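
A minimal sketch of the usual ways to access the components of such a list, by name or by position:

```r
b1 = rnorm(n = 6, mean = 0, sd = 1)
b2 = rnorm(n = 5, mean = 0, sd = 1)
b = list(sample1 = b1, sample2 = b2)
b$sample1            # access by name
b[["sample2"]]       # access by name, given as a string
b[[1]]               # access by position
sapply(b, length)    # 6 5: length of each component
```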

4.5 Data import/export
Say we have a data set in a spreadsheet such as MS-Excel. To import the data into R, do the following.
1. Save the Excel file in comma separated value (csv) format (e.g. filename.csv); you will find this
option in the drop-down list of the save-as window.
2. Suppose your file is located in the ‘Data’ folder on the ‘K’ drive.
3. Run the command, storing the result under a name of your choice (mydata here)
mydata = read.csv("K:/Data/filename.csv", header = TRUE)
4. In case you have data in txt format, run the command
mydata = read.table("K:/Data/filename.txt", header = TRUE)
5. Check whether your data set is stored in data frame format, e.g. with is.data.frame(mydata).
Suppose we need to export a random sample created in R to the ‘Data’ folder on the ‘K’ drive. Run
the following commands:

sample = rnorm(10, 0, 1)
normal.sample = data.frame(data = sample)
write.csv(normal.sample, "K:/Data/normal sample.csv", row.names = FALSE) # column names are always written; write.csv ignores col.names

Ideally, your data set should remain separated from your R code.
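
The export and import steps can be tried together without a ‘K’ drive. The sketch below assumes a temporary file created by tempfile() in place of a real path; the names f and imported are ours.

```r
f = tempfile(fileext = ".csv")     # hypothetical location, stands in for a real path
normal.sample = data.frame(data = rnorm(10, 0, 1))
write.csv(normal.sample, f, row.names = FALSE)       # export
imported = read.csv(f, header = TRUE)                # import
is.data.frame(imported)                              # TRUE
dim(imported)                                        # 10 1
all.equal(normal.sample$data, imported$data)         # TRUE
```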

5 Writing your first function


Let us try a simple one: compute the sum of a given data set. We do it this way:

my.sum = function(arg1)
{
  s = 0 # running total
  for (i in seq_along(arg1)) # seq_along also handles an empty vector correctly
  {
    s = s + arg1[i]
  }
  return(s)
}
sample1 = 1:10
my.sum(sample1); sum(sample1) # Check my.sum against the built-in sum function

If your function needs more arguments, separate them with commas: arg1, arg2, ... and so on.
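
As a sketch of a function with two arguments, the hypothetical my.weighted.sum below multiplies each value in arg1 by the matching weight in arg2 before adding. The default value for arg2 (all weights equal to 1) is our assumption, chosen so that the function reduces to a plain sum when no weights are given.

```r
my.weighted.sum = function(arg1, arg2 = rep(1, length(arg1)))
{
  s = 0
  for (i in seq_along(arg1))
  {
    s = s + arg1[i] * arg2[i]   # value times its weight
  }
  return(s)
}
my.weighted.sum(1:3, c(2, 0, 1))   # 1*2 + 2*0 + 3*1 = 5
my.weighted.sum(1:10)              # default weights: 55
```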
