
Introduction To R

R is an open-source programming language widely used for statistical analysis and graphics, featuring over 1,800 packages for various applications. It provides effective data handling, graphical facilities, and a well-developed language for programming tasks. R is particularly useful for data visualization, statistical computing, and data manipulation, making it a popular choice among data scientists and statisticians.

Uploaded by manu
Copyright © All Rights Reserved
What is R and why do we use it?

• Open source, most widely used for statistical analysis and graphics
• Extensible via dynamically loadable add-on packages
• >1,800 packages on CRAN
> v = rnorm(256)
> A = matrix(v, 16, 16)
> summary(A)
> library(fields)
> image.plot(A)
> …
> dyn.load("foo.so")
> .C("foobar")
> dyn.unload("foo.so")
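The console session above reshapes 256 random numbers into a 16 x 16 matrix. The sketch below shows why matrix() is needed for that step rather than as.matrix(). It assumes only base R: image() stands in for fields::image.plot(), so the fields package is not required, and set.seed(1) is added only to make the run reproducible.

```r
set.seed(1)                           # make the random draws reproducible
v <- rnorm(256)                       # 256 draws from a standard normal distribution
dim(as.matrix(v))                     # 256 1: as.matrix() only makes a one-column matrix
A <- matrix(v, nrow = 16, ncol = 16)  # reshape into 16 x 16, filled column by column
dim(A)                                # 16 16
summary(as.vector(A))                 # one overall summary; summary(A) gives one per column
image(A)                              # base-graphics heat map of the matrix
```

fields::image.plot() draws much the same picture and adds a colour legend.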
Why R?

• Statistics & Data Mining
• Commercial
• Technical computing
• Matrix and vector formulations
• Data Visualization and analysis platform
• Image processing, vector computing

• Statistical computing and graphics
• http://www.r-project.org
• Developed by R. Gentleman & R. Ihaka
• Expanded by community as open source
• Statistically rich
The Programmer’s Dilemma
What programming language to use & why?

High-level languages
• Scripting (R, MATLAB, IDL)
• Object Oriented (C++, Java)
• Procedural languages (C, Fortran)
• Assembly
Low-level languages
Features of R

R is an integrated suite of software for data manipulation, calculation, and graphical display

• Effective data handling
• Various operators for calculations on arrays/matrices
• Graphical facilities for data analysis
• Well-developed language including conditionals, loops,
recursive functions and I/O capabilities.
Basic usage: arithmetic in R

• You can use R as a calculator
• Typed expressions will be evaluated and printed out
• Main operations: +, -, *, /, ^
• Obeys the usual order of operations
• Use parentheses to group expressions
• More complex operations appear as functions
– sqrt(2)
– sin(pi/4), cos(pi/4), tan(pi/4), asin(1), acos(1), atan(1)
– exp(1), log(2), log10(10)
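The sketch below shows how order of operations and the function-style operations work out in practice. The variable names are illustrative only.

```r
# ^ binds tightest, then * and /, then + and -
no_parens   <- 2 + 3 * 4    # 14: the multiplication happens first
with_parens <- (2 + 3) * 4  # 20: parentheses force the addition first
power       <- 2^10         # 1024
neg_square  <- -2^2         # -4: ^ is applied before the unary minus

# More complex operations are functions; pi is a built-in constant
root2 <- sqrt(2)
s45   <- sin(pi / 4)        # equals cos(pi / 4) up to rounding
right <- asin(1)            # pi / 2, in radians
e     <- exp(1)             # Euler's number, about 2.718
ln2   <- log(2)             # log() is the natural logarithm
one   <- log10(10)          # base-10 logarithm
print(c(no_parens, with_parens, power, neg_square))
```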
Getting help
• help(function_name)
– help(prcomp)
• ?function_name
– ?prcomp
• help.search("topic")
– ??topic or ??"topic"
• Search CRAN
– http://www.r-project.org
• From R GUI: Help → Search help…
• CRAN Task Views (packages grouped by topic)
– http://cran.cnr.berkeley.edu/web/views/

Variables and assignment

• Use variables to store values
• Three ways to assign variables
– a = 6
– a <- 6
– 6 -> a
• Update variables by using the current value in an assignment
– x = x + 1
• Naming rules
– Can include letters, numbers, ., and _
– Names are case sensitive
– Must start with a letter or ., and a leading . cannot be followed by a number
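A short sketch of the three assignment forms and the naming rules. All variable names here are made up for illustration.

```r
a = 6            # assignment with =
b <- 6           # assignment with <-, the most common style
6 -> d           # rightward assignment
x <- 1
x <- x + 1       # update x from its current value; x is now 2

my.var <- 10     # letters, numbers, . and _ are allowed in names
my_var2 <- 20
My.var <- 30     # a different variable from my.var: names are case sensitive
.hidden <- 5     # a leading . is allowed, and ls() hides such names by default
print(ls())                   # .hidden is not listed
print(ls(all.names = TRUE))   # .hidden is listed
```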
R Commands
• Commands can be expressions or assignments
• Separate by semicolon or new line
• Can split across multiple lines
– R will change the prompt to + if a command is not finished
• Useful commands for variables
– ls(): List all stored variables
– rm(x): Delete one or more variables
– class(x): Describe what type of data a variable stores
– save(x, file = "filename"): Store variable(s) to a binary file
– load("filename"): Load all variables from a binary file
– save() and load() use the current working directory by default (check it with getwd(), change it with setwd(); e.g. My Documents on Windows)
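The sketch below strings these commands together. To keep the working directory clean, the file is written to a temporary directory; the file name vars.RData is hypothetical.

```r
x <- c(1.5, 2, 3); y <- "text"   # two commands on one line, separated by ;
total <- x[1] +
  x[2]                           # one command split over two lines
ls()                             # names of stored variables, including x, y and total
class(x)                         # "numeric"
class(y)                         # "character"

f <- file.path(tempdir(), "vars.RData")  # hypothetical file name
save(x, y, file = f)             # write x and y to a binary file
rm(x, y)                         # delete both variables from the workspace
exists("x", inherits = FALSE)    # FALSE
load(f)                          # bring x and y back from the file
exists("x", inherits = FALSE)    # TRUE
```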
Vectors and vector operations

To create a vector:
# c() command to create vector x
x=c(12,32,54,33,21,65)
# c() to add elements to vector x
x=c(x,55,32)
# seq() command to create a sequence of numbers
years=seq(1990,2003)
# sequence in steps of 0.5
a=seq(3,5,.5)
# can use : to step by 1
years=1990:2003
# rep() command to create data that follow a regular pattern
b=rep(1,5)
c=rep(1:2,4)

To access vector elements:
# 2nd element of x
x[2]
# first five elements of x
x[1:5]
# all but the 3rd element of x
x[-3]
# values of x that are < 40
x[x<40]
# values of y such that x is < 40 (y must have the same length as x)
y=c(3,2,4,3,7,6,1,1)
y[x<40]

To perform operations:
# mathematical operations on vectors
x+y; 2*y; x*y; x/y; y^2
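Here is a small sketch of how logical indexing works, reusing the slide's numbers. It assumes x has been extended with c(x,55,32), so x has the same length (8) as y. which() is an extra base function, used here only to show which positions the mask selects.

```r
x <- c(12, 32, 54, 33, 21, 65)
x <- c(x, 55, 32)              # x now has 8 elements, like y
y <- c(3, 2, 4, 3, 7, 6, 1, 1)

mask <- x < 40                 # one TRUE/FALSE per element of x
print(mask)                    # TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE
print(x[mask])                 # keeps the TRUE positions: 12 32 33 21 32
print(y[mask])                 # same positions taken from y: 3 2 3 7 1
print(which(mask))             # positions selected: 1 2 4 5 8

print(x + y)                   # element-wise: 15 34 58 36 28 71 56 33
```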
Matrices & matrix operations

To create a matrix:
# matrix() command to create matrix A with rows and cols
A=matrix(c(54,49,49,41,26,43,49,50,58,71),nrow=5,ncol=2)
# B has the same shape as A, so element-by-element ops work
B=matrix(1,nrow=5,ncol=2)

To access matrix elements:
# matrix_name[row_no, col_no]
A[2,1]          # 2nd row, 1st column element
A[3,]           # 3rd row
A[,2]           # 2nd column of the matrix
A[2:4,c(2,1)]   # submatrix: rows 2-4 of the 2nd and 1st columns
A["KC",]        # access row by name, "KC" (requires rownames(A) to include "KC")

Statistical operations:
rowSums(A)
colSums(A)
rowMeans(A)
colMeans(A)
# max of each column
apply(A,2,max)
# min of each row
apply(A,1,min)

Element by element ops:
2*A+3; A+B; A*B; A/B
Matrix/vector multiplication:
A %*% t(B)      # needs ncol(A) == nrow(t(B)): (5x2) %*% (2x5) gives 5x5
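The sketch below checks how matrix() fills A and which shapes each operation needs. The row names are hypothetical labels, added only so that a row can be accessed by name.

```r
# matrix() fills column by column unless byrow = TRUE is given
A <- matrix(c(54, 49, 49, 41, 26, 43, 49, 50, 58, 71), nrow = 5, ncol = 2)
print(A[, 1])                 # first column: 54 49 49 41 26
print(dim(A))                 # 5 2

rownames(A) <- c("KC", "LA", "NY", "SF", "DC")  # hypothetical row labels
print(A["KC", ])              # same as A[1, ]: 54 43
print(colSums(A))             # 219 271
print(apply(A, 1, min))       # smallest value in each row

# Element-wise ops need equal dimensions; %*% needs ncol(left) == nrow(right)
B <- matrix(1, nrow = 5, ncol = 2)
E <- A * B                    # 5 x 2, element by element
P <- A %*% t(B)               # (5 x 2) %*% (2 x 5) gives 5 x 5
print(dim(P))                 # 5 5
```

Because every entry of B is 1, each column of P equals rowSums(A).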
Useful functions for vectors and matrices
• Find # of elements or dimensions
– length(v), length(A), dim(A)
• Transpose
– t(v), t(A)
• Matrix inverse
– solve(A) (A must be square and non-singular)
• Sort vector values
– sort(v)
• Statistics
– min(), max(), mean(), median(), sum(), sd(), quantile()
– These treat a matrix as a single vector of all its elements (so does sort())
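A sketch applying these functions to a small example vector and a 2 x 2 matrix. Both objects are made up; the matrix was picked because it is invertible.

```r
v <- c(4, 1, 3, 2)
M <- matrix(c(2, 1, 1, 3), nrow = 2)   # small invertible 2 x 2 example

length(v)          # 4
length(M)          # 4: every element of the matrix is counted
dim(M)             # 2 2
t(M)               # transpose (M happens to be symmetric)
Minv <- solve(M)   # inverse: M %*% Minv is the 2 x 2 identity
sort(v)            # 1 2 3 4
sort(M)            # 1 1 2 3: the matrix is treated as one vector
mean(M)            # 1.75, the mean of all four elements
quantile(v)        # minimum, quartiles and maximum of v
```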
Graphical display and plotting

• Most common plotting function is plot()
– plot(x,y) plots y vs x
– plot(x) plots x vs 1:length(x)
– plot() has many options for labels, colors, symbol, size, etc.
– Check help with ?plot
• Use points(), lines(), or text() to add to an existing plot
• Use dev.new() (or x11() on systems with X11) to start a new output window
• Save plots with png(), jpeg(), tiff(), or bmp(), then call dev.off() to finish writing the file
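The sketch below builds a plot in layers and saves it to a PNG file. The output file name and the sine/cosine data are illustrative assumptions. The key step is that nothing is written to the file until dev.off() closes the device.

```r
x <- seq(0, 2 * pi, length.out = 50)
y <- sin(x)

out <- file.path(tempdir(), "sine.png")   # hypothetical output file
png(out, width = 600, height = 400)       # open a PNG file device
plot(x, y, type = "l", col = "blue",
     main = "sin(x) and cos(x)", xlab = "x", ylab = "value")
points(x, cos(x), col = "red", pch = 16)  # add points to the existing plot
lines(x, rep(0, length(x)), lty = 2)      # add a dashed line at y = 0
text(pi, 0.5, "added with text()")        # add a label at (pi, 0.5)
invisible(dev.off())                      # close the device, writing the file
```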
R Packages
• R functions and datasets are organized into packages
• Packages base and stats include many of the built-in functions in R
• CRAN provides thousands of packages contributed by R users
• Package contents are only available when loaded
• Load a package with library(pkgname)
• Packages must be installed before they can be loaded
• Use library() to see installed packages
• Use install.packages("pkgname") to install a package and update.packages() to update installed packages
• Can also run R CMD INSTALL pkgname.tar.gz from the command line if you have downloaded the package source
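A sketch of the install/load distinction. It uses splines because that package ships with R but is not attached by default. The CRAN install line is left as a comment because it needs internet access.

```r
"package:splines" %in% search()   # FALSE in a fresh session: installed, not loaded
library(splines)                  # load (attach) the package
"package:splines" %in% search()   # TRUE: its functions are now available
exists("bs")                      # TRUE: bs() is one of the functions splines provides

# install.packages("fields")     # one-time download from CRAN (needs internet access)
```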
Exploring the iris data
• Load iris data into your R session:
– data (iris);
– help (data);
• Check that iris was indeed loaded:
– ls ();
• Check the class that the iris object belongs to:
– class (iris);
• Read Sections 3.4 and 6.3 in “Introduction to R”
• Print the content of iris data:
– iris;
• Check the dimensions of the iris data:
– dim (iris);
• Check the names of the columns:
– names (iris);
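The commands above return the following for iris. The sketch also uses head(), str() and table(), which are extra base functions that give shorter views than printing all 150 rows.

```r
data(iris)            # copy the built-in iris data set into the workspace
class(iris)           # "data.frame"
dim(iris)             # 150 5: 150 rows (flowers), 5 columns
names(iris)           # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
head(iris, 3)         # first 3 rows only
str(iris)             # compact overview of each column's type
table(iris$Species)   # 50 flowers of each of the 3 species
```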
Exploring the iris data (cont.)
• Plot Petal.Width (y axis) against Petal.Length (x axis):
– plot (iris[ , 3], iris[ , 4]);
– example(plot)
• Exercise: create a plot similar to this figure:

Src: Figure is from Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, et al.
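The referenced figure is not reproduced here, so this is only one possible starting point for the exercise, not a copy of it. It draws the same petal scatter plot, colours the points by species, and adds axis labels and a legend. The colour choices are arbitrary.

```r
data(iris)
cols <- c("red", "darkgreen", "blue")            # arbitrary colours, one per species
point_col <- cols[as.integer(iris$Species)]      # map each flower's species to a colour
plot(iris$Petal.Length, iris$Petal.Width,        # same as plot(iris[, 3], iris[, 4])
     col = point_col, pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     main = "Iris petals by species")
legend("topleft", legend = levels(iris$Species), col = cols, pch = 19)
```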
Reading data from files

• Large data sets are better loaded through the file input interface in R
• Reading a table of data can be done using the read.table() command:
– a <- read.table("a.txt")
• The values are read into R as an object of type data frame (a sort of matrix in which different columns can have different types). Options control how headers and other metadata are read or skipped (e.g. header = TRUE, skip = n).
• A more primitive but universal file-reading function exists, called scan()
– b = scan("input.dat");
– scan() returns a vector of the data read
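A self-contained sketch of both readers. It writes two small, made-up files to temporary locations first, so nothing here depends on a.txt or input.dat existing.

```r
path <- tempfile(fileext = ".txt")         # made-up table with a header line
writeLines(c("name height weight",
             "ann 160 55",
             "bob 175 72",
             "cat 168 60"), path)
a <- read.table(path, header = TRUE)       # header = TRUE: first line holds column names
class(a)                                   # "data.frame"
sapply(a, class)                           # name: character (factor before R 4.0.0); height, weight: integer

nums <- tempfile(fileext = ".dat")         # made-up numbers spread over two lines
writeLines(c("1.5 2.5", "3.5"), nums)
b <- scan(nums)                            # every value in one numeric vector: 1.5 2.5 3.5
```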
Programming in R
• The following slides assume a basic understanding of programming concepts
• For more information, please see chapters 9 and 10 of the "An Introduction to R" manual:
http://cran.r-project.org/doc/manuals/R-intro.html

Additional resources
• Beginning R: An Introduction to Statistical Programming
by Larry Pace
• Introduction to R webpage on APSnet:
http://www.apsnet.org/edcenter/advanced/topics/ecologyandepidemiologyinr/introductiontor/Pages/default.aspx
• The R Inferno:
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Conditional statements

• Perform different commands in different situations
• if (condition) command_if_true
– Can add else command_if_false to end
– Group multiple commands together with braces {}
– if (cond1) {cmd1; cmd2;} else if (cond2) {cmd3; cmd4;}
• Conditions use relational operators
– ==, !=, <, >, <=, >=
– Do not confuse = (assignment) with == (equality)
– = is a command, == is a question
• Combine single TRUE/FALSE conditions with && (and) and || (or)
– Use & and | for vectors of length > 1 (element-wise)
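A sketch of an if / else if / else chain, followed by the difference between && and &. The temperature thresholds and labels are made-up values.

```r
temp <- 15
if (temp < 0) {
  label <- "freezing"
} else if (temp < 20) {
  label <- "mild"
} else {
  label <- "warm"
}
print(label)                           # "mild"

a <- 6
if (a == 6 && a > 0) print("a is six") # && and || combine single TRUE/FALSE values

v <- c(-2, 5, 30)
in_range <- (v > 0) & (v < 20)         # & works element by element
print(in_range)                        # FALSE TRUE FALSE
```

Each else is kept on the same line as the preceding closing brace. At the top level, R would otherwise treat the if as already complete and report an error at the else.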
Loops
• Most common type of loop is the for loop
– for (x in v) { loop_commands; }
– v is a vector, commands repeat for each value in v
– Variable x becomes each value in v, in order
– Example: adding the numbers 1-10
– total = 0; for (x in 1:10) total = total + x;
• Other type of loop is the while loop
– while (condition) { loop_commands; }
– The condition works the same as in an if statement
– Commands are repeated until the condition is false
– Might execute commands 0 times if the condition is already false
– while loops are useful when you don’t know the number of iterations in advance
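A sketch of both loop types. The while example, doubling a number until it exceeds 1000, was chosen only because its number of iterations is not written anywhere in the code.

```r
total <- 0
for (x in 1:10) total <- total + x   # x takes the values 1, 2, ..., 10 in turn
print(total)                         # 55

n <- 1
steps <- 0
while (n <= 1000) {                  # checked before every pass
  n <- n * 2
  steps <- steps + 1
}
print(c(n, steps))                   # 1024 10
```

For this particular sum, the vectorised sum(1:10) gives the same result without a loop. That style is usually preferred in R.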
Scripting in R

• A script is a sequence of R commands that perform some common task
– E.g., defining a specific function, performing some analysis routine, etc.
• Save R commands in a plain text file
– Usually have extension of .R
• Run scripts with source():
– source("filename.R")
• To save command output to a file, use sink():
– sink("output.Rout")
– Calling sink() with no arguments restores output to the console
– Can be used with or outside of a script
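A self-contained sketch of source() and sink(). It first writes a tiny script to a temporary file; the script name square.R and its contents are hypothetical.

```r
script <- file.path(tempdir(), "square.R")     # hypothetical script file
writeLines(c("square <- function(x) x^2",
             "result <- square(1:5)"), script)

source(script)          # runs every command in the file
print(result)           # 1 4 9 16 25

log_file <- file.path(tempdir(), "output.Rout")
sink(log_file)          # printed output now goes to the file
print(summary(result))
sink()                  # restore output to the console
readLines(log_file)     # the captured text
```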
Lists

• Objects containing an ordered collection of objects
• Components do not have to be of same type
• Use list() to create a list:
– a <- list("hello", c(4,2,1), "class");
• Components can be named:
– a <- list(string1="hello", num=c(4,2,1), string2="class")
• Use [[position#]] or $name to access list elements
– E.g., for the named list, a[[2]] and a$num are equivalent
• Running the length() command on a list gives the number of top-level components
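A sketch using the named list from the slide. It shows the access forms and what length() counts, and contrasts single with double brackets; the component extra is made up for illustration.

```r
a <- list(string1 = "hello", num = c(4, 2, 1), string2 = "class")
a[[2]]          # 4 2 1
a$num           # the same component, accessed by name
a[["num"]]      # also by name, using a string
length(a)       # 3: top-level components, not the 5 values inside them
class(a[2])     # "list": single brackets return a sub-list
class(a[[2]])   # "numeric": double brackets return the component itself
a$extra <- TRUE # add a new named component (hypothetical name)
```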
Writing your own functions

• A function is defined by an assignment like:
– a <- function(arg1, arg2) { function_commands; }
• Functions are R objects of type "function"
• Functions can be written in C/FORTRAN and called via .C() or .Fortran()
• Arguments may have default values
– Example: my.pow <- function(base, pow = 2) {return(base^pow)}
– Arguments with default values become optional and should usually appear at the end of the argument list (though this is not required)
• Arguments are untyped
– Allows multipurpose functions that depend on argument type
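A sketch that uses the slide's my.pow() with its default argument, then defines a hypothetical describe() function. describe() illustrates the point about untyped arguments by branching on the type of what it receives.

```r
my.pow <- function(base, pow = 2) {
  return(base^pow)
}
my.pow(3)                  # 9: pow takes its default value 2
my.pow(2, 10)              # 1024
my.pow(pow = 3, base = 2)  # 8: named arguments can come in any order
my.pow(1:4)                # 1 4 9 16: arguments are untyped, so vectors work too

describe <- function(x) {  # hypothetical multipurpose function
  if (is.numeric(x)) {
    paste("numeric of length", length(x))
  } else if (is.character(x)) {
    paste("text:", x[1])
  } else {
    paste("object of class", class(x)[1])
  }
}
describe(c(1, 2))          # "numeric of length 2"
describe("hi")             # "text: hi"
class(my.pow)              # "function"
```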
Useful R links
• R Home: http://www.r-project.org/
• R’s CRAN package distribution:
http://cran.cnr.berkeley.edu/
• Introduction to R manual:
http://cran.cnr.berkeley.edu/doc/manuals/R-intro.pdf
• Writing R extensions:
http://cran.cnr.berkeley.edu/doc/manuals/R-exts.pdf
• Other R documentation:
http://cran.cnr.berkeley.edu/manuals.html
