BASIC R WORKSHOP

January 2016 2

Topics Covered
1. The R Environment
2. The R Language
a) Expressions
b) Workspace objects
c) Function call
3. Data Types and Structures
4. Data Import/Export and Manipulation
5. Plots in R

Session 1: The R Environment


• Several software options are available:
• RGui (http://www.r-project.org/)
• RStudio (http://www.rstudio.com)
• Tinn-R (http://sourceforge.net/projects/tinn-r/)

RGui: R Console
Command line

The symbol “>” is known as the prompt.

We can type R code on the command line and run it by pressing “Enter”.

RGui: R Editor
• The editor allows you to
type and edit your R code.
• There are 3 ways to run the code from the editor:
1. Pressing Ctrl+R will run the line of code that your text cursor is at.
2. Highlight the line(s) of code that you want to run and press Ctrl+R.
3. Highlight the line(s) of code that you want to run, click the right mouse button, and choose “Run line or selection”.

RGui: R Editor
1. Start a new script (Ctrl-N).

2. It is always good to keep track of what you are doing in any of your programming files. In the editor, type the following:

#Filename: MyFirstR.R
#Purpose: Basic R Workshop Exercises

RGui: R Editor
Exercise:
Type the following lines in your R editor and run the lines.
Observe what happens in the R console.

#This is a comment

a = 1+2+3
a

RGui: Setting the Work Directory


• A work directory is where you place all your
working files, for example, data sets, figures, etc.

• It is normally handy to set the path to the work directory you want to work in before commencing any coding work.

• This enables you to read/write the necessary files within the working directory immediately.

RGui: Setting the Work Directory


There are two approaches to set your work directory:
1. Command line

setwd("C:/Users/gloriateng/Dropbox/RWorkshop")
getwd()

*Note that the forward slash (/) is used instead of the backslash (\). If you do use backslashes in an R path, each one must be doubled (\\).

2. Point-and-Click
a. Click on any empty spot of your R console.
b. Go to “File” – “Change dir”.
c. Choose the directory path where your working
directory is.
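
The command-line approach can be tried safely with a throwaway folder. The sketch below is only an illustration: the folder name RWorkshop inside tempdir() is an assumption chosen so that the code runs on any machine, and you would normally use your own path instead.

```r
# Assumption: a temporary folder stands in for your own workshop folder.
workDir <- file.path(tempdir(), "RWorkshop")
dir.create(workDir, showWarnings = FALSE)

oldDir <- setwd(workDir)   # setwd() returns the previous directory invisibly
getwd()                    # shows the new working directory

setwd(oldDir)              # switch back to where we started
```

Storing the value returned by setwd() makes it easy to return to the previous directory afterwards.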

RGui: Setting the Work Directory


Exercise:

1. Create a folder with your name.

2. Using either of the approaches given, go back to RGui and set your working directory to the path of the folder that you have created.

3. Check that R has indeed set its path to your working directory by typing getwd() in the command line.

RGui: Saving Your Work


Saving your code in the editor:
1. Click on any empty space of the R editor.
2. Press Ctrl+S to save your code.
3. Save your code in a file named “MyFirstR.R”.
4. Go to the working directory that you created earlier. Check that your code is indeed saved as “MyFirstR.R”.

Saving the workspace:

1. The workspace consists of all the objects (variables and results) created in your R session. The commands typed in the R console are kept separately as the command history.
2. Quit R. A box will appear asking if you would like to save your workspace.
3. Press “Yes”. Your workspace is now saved.
4. Go to your working directory and you will notice that you have 3 files: “MyFirstR.R”, “.RData”, and “.Rhistory”.

RGui: Loading your code and workspace


Exercise:
1. Start RGui.
2. Open the file “MyFirstR.R”.
3. Run the line
setwd("C:/Users/gloriateng/Dropbox/RWorkshop")
4. Click on any empty space on your R console.
5. Click “File” – “Load workspace” and open the workspace “.RData” that was saved previously.
6. This line will appear in your R console:
load("C:\\Users\\gloriateng\\Dropbox\\RWorkshop\\.RData")
7. Now type “a” and the number 6 should appear on your console.

RGui: Loading your code and workspace


• The workspace retains all the values and results that
were run previously. Thus, when you type “a”, R
remembers that the value 6 was assigned to “a”
previously and generates the value 6.

• Normally it is recommended to only save your R code (in a .R file) and not your workspace.
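
To see what saving and loading a workspace does, the same round trip can be done by hand with save() and load(). This is a minimal sketch: the temporary file is an assumption standing in for the “.RData” file in your working directory.

```r
# Assumption: a temporary file stands in for ".RData" in your working directory.
workspaceFile <- tempfile(fileext = ".RData")

a <- 1 + 2 + 3                  # the object from the earlier exercise
save(a, file = workspaceFile)   # write the object to disk
rm(a)                           # remove it from the workspace

load(workspaceFile)             # restore the saved object
a                               # 6
```

Because a script can always recreate its objects by being run again, keeping the .R file is usually enough.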

Recap
• RGui: software that allows you to edit and run R code.

• R console: allows you to type and run code from the command line.

• R editor: allows you to type, edit, and run code. Code written in the R editor can be saved as a .R file.

• Setting the work directory: setwd(…)

• Saving your code/work: it is normally recommended to save the code written in the R editor rather than the workspace.

Session 2: The R Language


• Current version of the R language: 3.2.3
(“Wooden Christmas-Tree”, released 10 December 2015).

• Unfortunately, R does not update itself automatically…



Session 2: The R Language


• This session will cover:

• Expressions
• Constant values
• Arithmetic
• Functions
• Workspace objects
• Symbols and assignments
• Conditions
• Keywords
• Variables
• Getting help



Expressions
Type the expressions given (without the comments) in your editor and run your
code.

1+2+3 #this is an example of an expression

600 #this is also an expression, and is known as a constant

"apple" #this is a character value

1+2-4*3/2^2 #an example of arithmetic.
#try calculating this expression by hand. Does your answer match the answer given by R?

pi #running this symbol/variable would return the value 3.141593.
#The value 3.141593 is predefined in R and is assigned to the symbol pi.

Expressions
Type the expressions given (without the comments) in your editor and run your
code.

#you can assign values to your own symbols/variables as well
a = 4*3
b = a*4

a
b

#Note that naming variables/symbols is an art in itself. A good
#programmer avoids names that are too terse to be meaningful. Try to be
#descriptive yet succinct when naming variables. For example,
#dateOfBirth or dob.

a > b #this is an example of a condition.
a < b
a == b

Expressions: Conditions
==          Equality
> and >=    Greater than (or equal to)
< and <=    Less than (or equal to)
!=          Inequality
&&          Logical and
||          Logical or
!           Logical not
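
The operators in the table can be combined as in the short sketch below, which reuses the values of a and b from the earlier exercise. The particular comparisons are chosen only for illustration.

```r
a <- 4 * 3   # 12, as in the earlier exercise
b <- a * 4   # 48

a == b             # FALSE
a != b             # TRUE
a < b && b > 40    # TRUE: both conditions hold
a > b || a == 12   # TRUE: the second condition holds
!(a < b)           # FALSE: the opposite of a TRUE condition
```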

Expressions: Keywords
These symbols normally appear when R is unable to evaluate an expression, or when a dataset has missing values.

NA      Missing or unknown value (normally appears in datasets with missing values)
Inf     Infinity (e.g., 1/0)
NaN     An arithmetic result that is undefined (e.g., 0/0)
NULL    An empty result
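
A few lines are enough to see each keyword appear. In this sketch the vector scores is hypothetical data invented for illustration.

```r
1 / 0    # Inf
0 / 0    # NaN

scores <- c(2, NA, 4)         # hypothetical data with one missing value
is.na(scores)                 # FALSE TRUE FALSE
mean(scores)                  # NA: the missing value propagates
mean(scores, na.rm = TRUE)    # 3: the missing value is dropped first

length(NULL)                  # 0: NULL is an empty object
```

Many summary functions accept na.rm = TRUE, so it is worth checking a function's help file when a result unexpectedly comes back as NA.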

Workspace Objects
• When a value is assigned to a symbol/variable, it is
remembered as an object in R’s memory.
• Recall that you currently have two variables, namely “a”
and “b” in memory.
• To find out which objects are currently in your workspace, type ls().
• To clear all the objects in your workspace, type rm(list = ls(all=TRUE)).
• Now type ls() again. You have now removed all the objects in your workspace.
• To clear the console, press Ctrl-L.

Expressions: Function Calls


• A function call can be thought of as an
instruction given to R to perform a task.
• A typical R function:
functionname(argument1, argument2, …)
• Examples:
• sum(1:3)
• mean(1:3)
• sd(1:3)
• rm(list = ls(all=TRUE))
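
The first three examples can be run as below. The sketch also shows that arguments may be given by position or by name; round() is used here only as an illustration and is not part of the original examples.

```r
values <- 1:3    # the integers 1, 2, 3

sum(values)      # 6
mean(values)     # 2
sd(values)       # 1

# Arguments can be matched by position or by name.
round(3.14159, 2)                 # by position
round(x = 3.14159, digits = 2)    # by name, same result
```

Naming the arguments makes a call easier to read, especially for functions with many arguments.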

Expressions: Function Calls

• rm(list = ls(all=TRUE))

Getting Help
• help(functionname)

• A typical R documentation (or help file) consists of the following:
• Description
• Usage
• Arguments
• Details
• See also
• Examples

• An example: help(sd)
• And don’t forget, Google is always one’s best friend!

Recap
• Expressions: constant values, arithmetic,
conditions, assigning values to
symbols/variables, keywords, functions
• Use ls() to list the variables in your current workspace.
• Use rm(list = ls(all=TRUE)) to remove
all variables in your current workspace.
• Ctrl-L clears the console.
• Learning to read the R documentation properly
helps you to understand a function better.

• It is very common to blindly copy-and-paste code without having the right understanding of the syntax and semantics. This is not a good way to learn programming.

• Take the time to understand the argument(s) for each function used.

• Learn to read the documentation and understand how each function is used. It is okay if you do not understand all the details of the functions.

• Try out the examples given in the documentation.

• If the code is not running, stay calm, and check your code. Most of the time it could be a spelling mistake (especially upper/lower case letters), a forgotten parenthesis, or wrong usage of a function argument.

• Practice, practice, practice!



Session 3: Data Types and Structures


• Data types:
• Numeric values
• Character values

• Data structures:
• Vectors
• Factors
• Matrices
• Data frames
• Lists

Data Structures: Vectors


• One-dimensional.
• A collection of values with the same data type.
• An entry of a vector is known as an “element”.

A numeric vector.    A character vector.
1                    A
2                    B
3                    C

Data Structures: Vectors


Exercise

1. Create a numeric vector to store the following values:
2 4 6 2 8

counts = c(2, 4, 6, 2, 8)
counts

2. Create a character vector to store the following values:
Apple Orange Grape Apple Apple

fruits = c("Apple", "Orange", "Grape", "Apple", "Apple")
fruits

See https://www.stat.auckland.ac.nz/~paul/ItDT/HTML/node64.html for more examples on how to create vectors.
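
Once the two vectors exist, a few common operations are worth trying. The sketch below is an illustration only; the vector mixed is a hypothetical example added to show what happens when data types are combined.

```r
counts <- c(2, 4, 6, 2, 8)
fruits <- c("Apple", "Orange", "Grape", "Apple", "Apple")

length(counts)        # 5 elements
class(counts)         # "numeric"
class(fruits)         # "character"

counts[2]             # the second element: 4
counts[counts > 3]    # elements greater than 3: 4 6 8
counts * 2            # arithmetic is applied to every element

# A vector holds one data type only, so the number becomes a character string.
mixed <- c(1, "Apple")
class(mixed)          # "character"
```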



Data Structures: Factors


• One-dimensional.
• A collection of values that come from a fixed set of possible values (levels).
• Can be used to represent categorical variables.

Gender = {M, F}    Gender = {0, 1}
M                  0
F                  1
M                  0

Question: What are the levels for the variable “Gender”?

Data Structures: Factors


Exercise

1. Create a character vector to store the following values:
M F M F M
gender = c("M", "F", "M", "F", "M")

2. Create a numeric vector to store the following values:
0 1 0 1 0
genderN = c(0, 1, 0, 1, 0)

3. Convert the vector genderN to a factor.
genderF = factor(genderN)
genderF = factor(genderN, levels = c(0,1))

Think-Tank Time: What is the difference between genderN and genderF?
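
One way to explore the question is to ask R how it treats each object. The sketch below is a hint rather than a full answer; the labels "M" and "F" in the last step are an assumption (0 = M, 1 = F) based on the coding shown on the earlier slide.

```r
genderN <- c(0, 1, 0, 1, 0)
genderF <- factor(genderN, levels = c(0, 1))

class(genderN)     # "numeric"
class(genderF)     # "factor"
levels(genderF)    # "0" "1"

mean(genderN)      # 0.4: the values are treated as numbers
table(genderF)     # counts per level: three 0s and two 1s

# Assumption for illustration: 0 stands for M and 1 stands for F.
genderLabelled <- factor(genderN, levels = c(0, 1), labels = c("M", "F"))
genderLabelled
```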



Data Structures: Data Frames


• Two-dimensional.
• Many columns.
• A collection of vectors/factors of the same length.

counts  fruits  gender  genderF
2       Apple   M       0
4       Orange  F       1
6       Grape   M       0
2       Apple   F       1
8       Apple   M       0

Data Structures: Data Frames


Exercise

1. Create a data frame for the variables counts, fruits, gender, and
genderF.

myframe = data.frame(counts, fruits, gender, genderF)

2. Observe what happens when you run the following commands given:

• myframe
• dimnames(myframe)
• dim(myframe)
• nrow(myframe)
• ncol(myframe)
• names(myframe)
• myframe[1,1]
• myframe[1,]
• myframe[1]
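
The indexing commands above differ in what they return. The sketch below rebuilds myframe so that it can be run on its own; myframe[[1]], the $ form, and str() are additions not listed in the exercise.

```r
counts  <- c(2, 4, 6, 2, 8)
fruits  <- c("Apple", "Orange", "Grape", "Apple", "Apple")
gender  <- c("M", "F", "M", "F", "M")
genderF <- factor(c(0, 1, 0, 1, 0))
myframe <- data.frame(counts, fruits, gender, genderF)

dim(myframe)      # 5 rows and 4 columns
myframe[1, 1]     # a single value: row 1, column 1
myframe[1, ]      # the whole first row (still a data frame)
myframe[1]        # the first column, kept as a one-column data frame
myframe[[1]]      # the first column as a plain vector
myframe$fruits    # a column selected by name
str(myframe)      # compact summary of each column's type
```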

Data Structures: Lists


• A collection of vectors. The vectors can be of different lengths.
• A list consists of components, and each component can be any type of data structure.

> myframe
  counts fruits gender genderF
1      2  Apple      M       0
2      4 Orange      F       1
3      6  Grape      M       0
4      2  Apple      F       1
5      8  Apple      M       0

> dimnames(myframe)
[[1]]                               (component index)
[1] "1" "2" "3" "4" "5"

[[2]]
[1] "counts"  "fruits"  "gender"  "genderF"

Data Structures: Lists


Exercise

1. Create a list containing the row names and column names of myframe.

mylist = list(rownames=rownames(myframe),
colnames=colnames(myframe))
mylist
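
Components of a list can be pulled out by name or by index. In this sketch the row and column names are typed in directly, rather than taken from myframe, so that the code runs on its own.

```r
mylist <- list(rownames = c("1", "2", "3", "4", "5"),
               colnames = c("counts", "fruits", "gender", "genderF"))

length(mylist)          # 2 components, of lengths 5 and 4
mylist$colnames         # a component selected by name
mylist[["colnames"]]    # the same component, using double brackets
mylist[[2]]             # the same component, by index
mylist[2]               # single brackets return a list of length 1
```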

Data Structures: Matrices and Arrays


Matrices
• Two-dimensional.
• All values must be of the same type.

1 5 9
2 6 10
3 7 11
4 8 12
A 4 x 3 matrix.

Data Structures: Matrices and Arrays


Arrays
• Can be more than two dimensions.
• For example, a three-dimensional array corresponds to a data cube.

Image taken from the web



Data Structures: Matrices and Arrays


Exercise

1. Create a matrix with 2 rows and 3 columns for the values 1, 2, 3, 4, 5, 6.

matrix(1:6, ncol=3)
matrix(1:6, nrow = 2, ncol=3)

Observe the results from these two lines of code. Are they the same or different?

2. Create a 2 x 2 x 2 array for the values 1, 2, 3, 4, 5, 6, 7, 8.

array(1:8, dim=c(2,2,2))
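
By default R fills a matrix or array column by column. The sketch below illustrates this; the byrow = TRUE variant and the object names are additions for illustration.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)                     # filled column by column
mByRow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row

m[1, 2]         # 3
mByRow[1, 2]    # 2
dim(m)          # 2 rows and 3 columns

cube <- array(1:8, dim = c(2, 2, 2))
dim(cube)       # 2 2 2
cube[1, 2, 2]   # 7: row 1, column 2 of the second layer
```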

Recap
•Vectors

•Factors

•Data frames

•Lists

•Matrices and Arrays



Session 4: Data Import/Export and Manipulation
• Reading Excel files (save as .csv and use
function read.csv)

• Reading .txt files and writing to .txt files

• Data manipulation
• Extracting rows/columns from datasets
• The “apply” function

Writing and Reading .txt Files


Exercise:
1. Write the data frame myframe to a .txt file and save it as myframe.txt.
write.table(myframe, file = "myframe.txt", sep = "\t",
            row.names = TRUE, col.names = TRUE)

2. Read the file mycar.txt.
mycar = read.table("mycar.txt", sep = "\t", header = TRUE)
mycar = read.table(file.choose(), sep = "\t", header = TRUE) #an alternative
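
A write/read round trip can be tested without any workshop files. This sketch rests on two assumptions: the built-in cars data set (columns speed and dist) stands in for mycar.txt, which is not included with these notes, and a temporary file is used instead of the working directory.

```r
# Assumption: the built-in cars data set stands in for the contents of mycar.txt.
txtFile <- tempfile(fileext = ".txt")

# Write a tab-separated file with a header row and no row names.
write.table(cars, file = txtFile, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Read it back; header = TRUE tells R that the first line holds column names.
mycar <- read.table(txtFile, sep = "\t", header = TRUE)

names(mycar)    # "speed" "dist"
nrow(mycar)     # 50
head(mycar)     # the first six rows
```

The row names are left out here so that the file has the same number of fields on every line, which makes it easier to open in other software.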

Data Manipulation
Exercise:
1. Extract the column “speed”.

mycar #not advisable when you have a large dataset
names(mycar)
speed #Observe what happens here.
mycar$speed

attach(mycar)
speed #Observe what happens here.
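
A column can also be extracted without attaching the data frame, which avoids clashes with other objects that happen to have the same name. In this sketch the built-in cars data set is assumed to stand in for mycar.

```r
mycar <- cars    # assumption: cars stands in for the data read from mycar.txt

mycar$speed[1:5]        # first five speeds, using the $ sign
mycar[["speed"]][1:5]   # the same values, using double brackets
mycar[1:5, "speed"]     # the same values, using row and column indexing

# with() evaluates an expression using the columns of mycar,
# without leaving anything attached afterwards.
with(mycar, mean(speed))

head(mycar, 3)          # a safer way to peek at a large dataset
```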

Data Manipulation
Exercise:
2. Find the sum, mean, and variance for the variables speed and
dist.

#the second argument selects rows (1) or columns (2)
apply(mycar, 1, sum) #not meaningful
apply(mycar, 1, mean) #not meaningful
apply(mycar, 1, var) #not meaningful

apply(mycar, 2, sum)
apply(mycar, 2, mean)
apply(mycar, 2, var)

*Check out tapply, lapply, sapply too!
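
The related functions can be compared side by side. In this sketch the built-in cars data set is assumed to stand in for mycar, and the grouping variable speedGroup is hypothetical, created only to give tapply something to group by.

```r
mycar <- cars    # assumption: cars stands in for the data read from mycar.txt

columnMeans <- apply(mycar, 2, mean)   # mean of each column
sapply(mycar, mean)                    # same result, one column at a time
colMeans(mycar)                        # a dedicated shortcut for column means
lapply(mycar, mean)                    # same values, returned as a list

# Hypothetical grouping: label each car as "fast" or "slow".
speedGroup <- ifelse(mycar$speed > 15, "fast", "slow")
tapply(mycar$dist, speedGroup, mean)   # mean stopping distance per group
```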



Recap
• read.table(), write.table()

• Data manipulation
• Extracting rows/columns from datasets: using
the $ sign or attaching the object
• The “apply” function

Session 5: Plots in R
Exercise 1:
1. Read the documentation for the function plot (run help(plot)).
2. Type and run the following lines. Observe the difference in each line and what
happens when you run the code.

plot(mycar)

plot(dist, speed)

plot(dist, speed, main = "Speed and Stopping Distances",
     xlab = "Distance (ft)", ylab = "Speed (mph)")

plot(dist, speed, main = "Speed and Stopping Distances",
     xlab = "Distance (ft)", ylab = "Speed (mph)", pch=20)

plot(dist, speed, main = "Speed and Stopping Distances",
     xlab = "Distance (ft)", ylab = "Speed (mph)", pch=20, col="blue")

Session 5: Plots in R
Exercise 1 (continued):

abline(h = 10)
abline(v=50)

text(dist, speed, 1:nrow(mycar), cex = 0.8, offset = 0.2, pos = 2)

plot(dist, speed, main = "Speed and Stopping Distances",
     xlab = "Distance (ft)", ylab = "Speed (mph)", pch=20, col="blue")
m = which.max(speed)
m

text(dist[m], speed[m], m, cex = 0.8, offset = 0.2, pos = 2)
mycar[m, 1:2]

Session 5: Plots in R
Exercise 2 (extra stuff in video, please watch it!)
1. Generate 100 random values from the standard normal distribution.
2. Plot a histogram with the estimated density on the left, and the empirical
cumulative distribution function on the right.
n = 100 #sample size
x = rnorm(n) #generate n random values from the standard normal distribution

par(mfrow = c(1,2)) #divide the graphic window into 1 row and 2 columns

#draw a histogram (if probability = TRUE, the density is drawn; otherwise
#the count of observations is given)
hist(x, xlab = "x", ylab = "Density", main = "",
col = "grey", border = "white", probability = TRUE)

#add lines onto the graph
u = seq(min(x), max(x), by = 0.01) #read up the documentation for seq
lines(u, dnorm(u, mean(x), sd(x)), lty = 2)

#add a legend
legend("topleft", c("Histogram", "Normal density"), cex=0.8,
col = c("grey", "black"), lwd = c(NA,1),
lty = c(NA,2), pch = c(15, NA), bty = "n")

Session 5: Plots in R
Exercise 2 (continued):

#Now plot the empirical cumulative distribution function
F.empirical <- function(y) mean(x <= y)
#this creates a function with input y that outputs the proportion of
#the x values that are less than or equal to y

#read the help file for Vectorize
plot(u, Vectorize(F.empirical)(u), type = "s", lwd = 2, col = "grey",
xlab = "x", ylab = "Cumulative probability", main = "", axes = FALSE)
axis(1); axis(2)
lines(u, pnorm(u, mean(x), sd(x)),lty=2)

#the function locator(1) allows you to choose the location of the legend
#using a mouse click
legend(locator(1), c("Empirical c.d.f.", "Normal c.d.f."), cex=0.8,
col = c("grey", "black"), lwd = c(2,1),
lty = c(1,2), bty = "n")
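
R also provides a built-in function, ecdf(), for the empirical cumulative distribution function, which can be used to check the hand-written version. The sketch below is self-contained; the call to set.seed() is an assumption added only to make the random values reproducible.

```r
set.seed(1)    # hypothetical seed, only so that the sketch is reproducible
n <- 100
x <- rnorm(n)
u <- seq(min(x), max(x), by = 0.01)

F.empirical <- function(y) mean(x <= y)   # proportion of x values <= y

byHand  <- Vectorize(F.empirical)(u)      # apply F.empirical to each value in u
builtIn <- ecdf(x)(u)                     # ecdf(x) returns a function; call it on u

all.equal(byHand, builtIn)                # compare the two versions
F.empirical(max(x))                       # 1: every value is <= the maximum
```

Vectorize is needed for the hand-written version because mean(x <= y) expects a single value of y, whereas the function returned by ecdf() already accepts a whole vector.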

Misc: Installing/Loading R Packages


• Reading Excel files into R requires specific packages to be installed.
• An R package can be thought of as a bundle/library containing specific functions for specific tasks. Some packages contain in-built datasets as well.
• The standard R installation includes about 25 packages. When a package is required to run a specific function, do the following:
install.packages("NameOfLibrary")

Note that you will need an internet connection to perform the installation.

• To load the library, type library(NameOfLibrary)
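
Before installing, it can be useful to check whether a package is already available. This is a minimal sketch: the package name MASS is an assumption, chosen because it normally ships with a standard R installation.

```r
pkg <- "MASS"    # assumption: an example package that normally ships with R

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise.
isInstalled <- requireNamespace(pkg, quietly = TRUE)

if (isInstalled) {
  # character.only = TRUE is needed because pkg holds the name as a string.
  library(pkg, character.only = TRUE)
} else {
  message("Package not found; run install.packages(\"", pkg, "\") first.")
}
```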



Misc: Copying R output to Word

• Saving images

• Use font Courier New or Lucida Console for R output.
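
Both kinds of output can also be written to files from code and then inserted into Word. The sketch below rests on several assumptions: the built-in cars data set is used as example data, tempdir() stands in for your working directory, and the file names are hypothetical.

```r
outDir <- tempdir()    # assumption: replace with your working directory

# Text output: capture what summary() prints and write it to a text file.
summaryLines <- capture.output(summary(cars))
summaryFile  <- file.path(outDir, "summary.txt")
writeLines(summaryLines, summaryFile)

# Image: open a PNG device, draw the plot, then close the device to write the file.
plotFile <- file.path(outDir, "cars.png")
png(plotFile, width = 800, height = 600)
plot(cars$dist, cars$speed, xlab = "Distance (ft)", ylab = "Speed (mph)", pch = 20)
dev.off()
```

Text pasted from summary.txt keeps its column alignment only in a fixed-width font, which is why Courier New or Lucida Console is suggested.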

Moving on
1. Statistics in R
2. Linear regression and time series (Applied Statistical
Models)
3. for, while loops, creating your own functions
4. S4 objects
5. Read the documentation
6. Google with discernment
7. Practice, practice, practice, practice!

References
1. An interactive online learning tool: http://tryr.codeschool.com/

2. Notes written by Paul Murrell (University of Auckland, New Zealand): https://www.stat.auckland.ac.nz/~paul/ItDT/HTML/node1.html

3. R Tutorial: http://www.r-tutor.com/

4. R-bloggers: http://www.r-bloggers.com/

5. Quick-R: http://www.statmethods.net/

6. Flowing Data: http://flowingdata.com/tag/r/

Image references:
(pg. 23) http://www.alabamarivers.org/press-room/media-relations/action-alert-protect-impaired-streams

(pg. 35) https://courses.cs.washington.edu/courses/csep573/01sp/lectures/class1/sld038.htm

(pg. 46) http://metamorphicliving.files.wordpress.com/2012/09/footsteps.jpg
