R Prog

1. The document provides an overview of R programming, including its data structures, functions, and use for data visualization and statistics. 2. R can handle data, perform matrix algebra and statistical analysis, and create graphics. It connects to databases and other programs. 3. R is widely used for statistical analysis and modeling through its many add-on packages that provide state-of-the-art statistical and machine learning methods.
R Programming

Overview
1. The Basics
2. R Data Structures
3. Data Input/Output
4. In-Built Functions
5. Data Visualization
What R does and does not
R does:
• data handling and storage: numeric, textual
• matrix algebra
• hash tables and regular expressions
• high-level data analytic and statistical functions
• classes (“OO”)
• graphics
• programming language: loops, branching, subroutines
R does not (limitations):
• is not a database, but connects to DBMSs
• has no graphical user interfaces, but connects to Java, TclTk
• language interpreter can be very slow, but allows calling your own C/C++ code
• no spreadsheet view of data, but connects to Excel/MsOffice
• no professional / commercial support
R and statistics
• Packaging: a crucial infrastructure to efficiently produce, load
and keep consistent software libraries from (many) different
sources / authors
• Statistics: most packages deal with statistics and data analysis
• State of the art: many statistical researchers provide their
methods as R packages
R as a Calculator
> 1550+2000
[1] 3550
or various calculations in the same row
> 2+3; 5*9; 6-6
[1] 5
[1] 45
[1] 0
> log2(32)
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2*pi, length=100)))
[Figure: plot of sin(seq(0, 2 * pi, length = 100)) against Index; x axis 0 to 100, y axis -1.0 to 1.0]
Variables

> i = 81
> sqrt(i)                                     # numeric
[1] 9

> prov = "All that Glitters are not Gold"     # character string
> sub("Glitters","Glisters",prov)
[1] "All that Glisters are not Gold"

> 1>2                                         # logical
[1] FALSE
Object orientation

primitive (or: atomic) data types in R are:

• numeric (integer, double, complex)
• character
• logical
• function (a basic object type, though not an atomic vector)
Numbers in R: NaN and NA

• NaN (not a number)
• NA (missing value)
– Basic handling of missing values
> x
[1] 1 2 3 4 5 6 7 8 NA
> mean(x)
[1] NA
> mean(x,na.rm=TRUE)
[1] 4.5
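A minimal sketch of how the two special values differ, reusing the vector from the example above. The variable names (undefined, n_missing, x_mean) are illustrative only.

```r
# NaN comes from undefined arithmetic; NA marks a missing value
undefined <- 0 / 0            # NaN
x <- c(1:8, NA)               # same values as the example vector

is.nan(undefined)             # TRUE
is.na(undefined)              # TRUE: is.na() also flags NaN
is.nan(NA)                    # FALSE: a missing value is not NaN

n_missing <- sum(is.na(x))    # count the missing entries
x_mean <- mean(x, na.rm = TRUE)  # drop NA before averaging
```

Testing with is.na() rather than comparing to NA matters because x == NA itself evaluates to NA.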
Objects in R
• Objects in R obtain values by assignment.
• This is conventionally done with the gets arrow, <-. The
equal sign, =, also works at the top level (as in some
examples here), but <- is the preferred form.
• Objects can be of different kinds.
R Data Structures

• Vector
• Matrix
• Array
• Factor
• Data Frame
• List
Vectors
• vector: an ordered collection of data of the same type

> a = c(1,2,3)
> a*2
[1] 2 4 6

• In R, a single number is the special case of a vector with 1
element.
• Other vector types: character strings, logical
Vectors
• Create a vector
> x <- 1:10
• Give the elements some names
> names(x) <- c("first","second","third","fourth","fifth")

• Select elements based on another vector
> i <- c(1,5)
> x[i]
first fifth
    1     5
> x[-c(i,8)]
second  third fourth   <NA>   <NA>   <NA>   <NA>
     2      3      4      6      7      9     10
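The selection rules above can be sketched in one place. This is an illustrative summary, not part of the original slides; selecting by name and by condition are additions, and the result names (by_position and so on) are made up.

```r
x <- 1:10
# Only five names are supplied, so the remaining five become NA
names(x) <- c("first", "second", "third", "fourth", "fifth")

by_position  <- x[c(1, 5)]              # positive indices select
by_exclusion <- x[-c(1, 5, 8)]          # negative indices drop
by_name      <- x[c("second", "fourth")]  # names select as well
by_condition <- x[x > 7]                # logical vector keeps TRUE positions
```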
Matrices

• matrix: a rectangular table of data of the same type
• array: 3-, 4-, ... dimensional matrix
• example: the red and green foreground and background
values for 20000 spots on 120 chips: a 4 x 20000 x 120 (3D)
array.
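A scaled-down sketch of such a 3D array, assuming 4 values, 5 spots and 3 chips instead of 4 x 20000 x 120. The numbers are placeholders, not microarray data.

```r
# Small stand-in: 4 values x 5 spots x 3 chips
vals <- array(1:60, dim = c(4, 5, 3))

dim(vals)                    # the three dimension sizes
one_chip  <- vals[, , 2]     # all values and spots for chip 2: a 4 x 5 matrix
one_value <- vals[1, 3, 2]   # value 1, spot 3, chip 2
```

Leaving an index empty keeps that whole dimension, which is how a single chip is pulled out as an ordinary matrix.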
Matrices
• Create an array
> x <- array(1:10, dim = c(2, 5))
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> attributes(x)
$dim
[1] 2 5
> dim(x)
[1] 2 5
Matrices
• Set column or row names
> colnames(x) <- c("col1", "col2", "col3", "col4", "col5")
> x
     col1 col2 col3 col4 col5
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> colnames(x)[1] <- "column1"
> x
     column1 col2 col3 col4 col5
[1,]       1    3    5    7    9
[2,]       2    4    6    8   10
Matrix
• Set row and column names using dimnames
> dimnames(x) <- list(c("first", "second"), colnames(x))
> x
       column1 col2 col3 col4 col5
first        1    3    5    7    9
second       2    4    6    8   10
• Setting dimension names
> dimnames(x) <- list(my.rows = c("first", "second"), my.cols = NULL)
> x
        my.cols
my.rows  [,1] [,2] [,3] [,4] [,5]
  first     1    3    5    7    9
  second    2    4    6    8   10
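Once a matrix has row and column names, they can be used for indexing. A minimal sketch, assuming column names col1 to col5 built with paste0; the names cell and row_first are illustrative.

```r
x <- array(1:10, dim = c(2, 5))
dimnames(x) <- list(c("first", "second"), paste0("col", 1:5))

cell      <- x["second", "col3"]   # one element, addressed by name
row_first <- x["first", ]          # a whole row as a named vector
```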
Lists
• vector: an ordered collection of data of the same type.
> a = c(7,5,1)
> a[2]
[1] 5

• list: an ordered collection of data of arbitrary types.
> doe = list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28

• Typically, vector elements are accessed by their index (an integer),
list elements by their name (a character string). But both types
support both access methods.
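Both access methods on both types can be sketched as follows. The named vector a is a variation on the earlier example, with names added for illustration.

```r
doe <- list(name = "john", age = 28, married = FALSE)

by_name   <- doe$age         # list element by name
by_index  <- doe[[2]]        # the same element by position
by_string <- doe[["name"]]   # by name given as a string
sub_list  <- doe[1:2]        # single brackets return a shorter list

a <- c(first = 7, second = 5, third = 1)   # a vector with names
a_by_name  <- a[["second"]]  # vector element by name
a_by_index <- a[2]           # vector element by position (keeps its name)
```

Double brackets extract one element; single brackets return a container of the same kind as the original.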
Data frames
• data frame: is supposed to represent the typical data table that
researchers come up with, like a spreadsheet.
• It is a rectangular table with rows and columns; data within
each column has the same type (e.g. number, text, logical), but
different columns may have different types.

Example:
> a
      localisation tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE
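One way the example table could be built, as a sketch. The slides do not show how a was created, so the data.frame() call is an assumption; the values are those printed above.

```r
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(FALSE, TRUE, FALSE),
  row.names    = c("XX348", "XX234", "XX987")
)

col_types <- sapply(a, class)   # one type per column
big <- a[a$tumorsize > 7, ]     # keep rows that meet a condition
```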
Factors

• A character string can contain arbitrary text. Sometimes it is useful
to use a limited vocabulary, with a small number of allowed
words. A factor is a variable that can only take such a limited
number of values, which are called levels.
• Example
• a family of two girls (1) and four boys (0),
> kids = factor(c(1,0,1,0,0,0),levels=c(0,1), labels=c("boy","girl"))
> kids
[1] girl boy  girl boy  boy  boy
Levels: boy girl
> class(kids)
[1] "factor"
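A short sketch of what can be done with the factor once it exists. The calls to levels(), table() and as.integer() are additions to the slide's example.

```r
kids <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1), labels = c("boy", "girl"))

lv     <- levels(kids)      # the allowed values
counts <- table(kids)       # how often each level occurs
codes  <- as.integer(kids)  # underlying codes: 1 = first level, 2 = second
```

The codes index into the levels, so "boy" is stored as 1 and "girl" as 2 regardless of the original 0/1 coding.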
Data Input/Output
Directory management
• dir() list files in directory
• setwd(path) set working directory
• getwd() get working directory
• ?files File and Directory Manipulation

Standard ASCII Format
• read.csv read comma-delimited file
• write.csv write comma-delimited file
Reading

> sets <- read.csv("Sets_All.csv", header = TRUE)
> sets$Ordered.Year <- ordered(sets$Year)
> sets$SpotCd.Fac <- factor(sets$SpotCd, exclude = NULL)
> spotted.sets <- sets[sets$Sp1Cd == 2, ]

> write.csv(spotted.sets, file = "spotted.txt", row.names = FALSE)
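Since Sets_All.csv is not available here, the same steps can be tried as a self-contained round trip through a temporary file. The column names follow the example above; the values are invented stand-ins.

```r
# Hypothetical stand-in for Sets_All.csv (values are made up)
sets <- data.frame(Year   = c(1999, 2001, 2000),
                   SpotCd = c("A", NA, "B"),
                   Sp1Cd  = c(2, 1, 2))

path <- tempfile(fileext = ".csv")
write.csv(sets, file = path, row.names = FALSE)   # write without row names

sets_in <- read.csv(path, header = TRUE)          # read it back
sets_in$Ordered.Year <- ordered(sets_in$Year)     # ordered factor of years
spotted <- sets_in[sets_in$Sp1Cd == 2, ]          # subset rows by condition
```

Passing row.names = FALSE keeps write.csv from adding an extra unnamed column that would appear when the file is read back.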


Data Visualization
• plot() is the main graphing function
• Automatically produces simple plots for
vectors, functions or data frames
Sample Data Set
Plotting a Vector
• plot(v) will plot the elements of the vector v
against their index
# Plot height for each observation
> plot(dataset$Height)
# Plot values against their ranks
> plot(sort(dataset$Height))
Common Parameters for plot()
• Specifying labels:
– main: provides a title
– xlab: label for the x axis
– ylab: label for the y axis
• Specifying range limits
– xlim – 2-element vector gives range for x axis
– ylim – 2-element vector gives range for y axis
• Example
– plot(sort(dataset$Height), ylim = c(120,200),
ylab = "Height (in cm)", xlab = "Rank", main = "Distribution of Heights")
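A runnable sketch of these parameters. The sample data set is not reproduced in this text, so the heights below are hypothetical; pdf(NULL) is used only so the code runs without a display, and par("usr") reads back the axis ranges that were actually used.

```r
# Hypothetical heights in cm (not the sample data set)
dataset <- data.frame(Height = c(172, 158, 181, 165, 190, 149))

pdf(NULL)                        # null device: nothing is written to disk
plot(sort(dataset$Height),
     xlim = c(0, 7),             # range of the x axis
     ylim = c(120, 200),         # range of the y axis
     xlab = "Rank", ylab = "Height (in cm)",
     main = "Distribution of Heights")
usr <- par("usr")                # c(x_min, x_max, y_min, y_max) of the plot region
dev.off()
```

The plot region extends slightly beyond the requested limits, which is R's default axis padding.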
Plotting Two Vectors
• plot() can pair elements from 2 vectors to
produce x-y coordinates
• plot() and pairs() can also produce composite
plots that pair all the variables in a data frame.
• Example
– plot(dataset$Hip, dataset$Waist, xlab = "Hip",
ylab = "Waist", main = "Circumference (in cm)",
pch = 2, col = "blue")
Histograms
• Generated by the hist() function
• The parameter breaks is key
– Specifies the number of categories to plot
or
– Specifies the breakpoints for each category
• The xlab, ylab, xlim, ylim options work as
expected
• Example
– hist(dataset$bp.sys, col = "lightblue",
xlab = "Systolic Blood Pressure", main = "Blood Pressure")
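The two uses of breaks can be compared without drawing anything by passing plot = FALSE, which returns the computed bins. The blood pressure values below are hypothetical, not taken from the sample data set.

```r
# Hypothetical systolic blood pressure values
bp <- c(112, 118, 121, 125, 130, 134, 138, 142, 150, 165)

# A single number is treated as a suggestion for the number of bins
h_n <- hist(bp, breaks = 5, plot = FALSE)

# A vector gives the exact breakpoints
h_b <- hist(bp, breaks = seq(100, 180, by = 20), plot = FALSE)

h_b$breaks   # the breakpoints as supplied
h_b$counts   # observations falling in each bin
```

With explicit breakpoints every value must fall inside the range they span, otherwise hist() stops with an error.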
