Introduction To R
# Simple operations
12 + 64 * 2
# Variable assignment
x <- 1
y <- 2
z <- x + y
# Simple vector
v <- c(1, 2, 3)
# length of the vector (function call)
length(v)
# vector indexing
v[1]
v[2]
v[3]
# Indexing a vector
# create a new vector v
v <- c(10, 8, 15)
# first element of v
v[1]
# math operations
sum(v)
max(v)
Data types in R:
- Vectors
- Lists
- Matrices
- Arrays
- Factors
- Data Frames
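As a minimal sketch, each of the structures listed above can be created with a base R constructor. The variable names and values below are illustrative assumptions, not taken from the slides.

```r
# Vector: ordered values of a single type
v <- c(10, 8, 15)

# List: ordered collection whose elements may have different types
l <- list(name = "sample1", values = v)

# Matrix: two-dimensional structure holding a single type
m <- matrix(1:6, nrow = 2)

# Array: like a matrix, but with any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))

# Factor: categorical values drawn from a fixed set of levels
f <- factor(c("A", "A", "B"))

# Data frame: table whose columns may have different types
d <- data.frame(count = c(8, 21), cat = c("A", "B"))

# str() prints a compact description of any of these objects
str(d)
```

Calling `class()` or `str()` on an unfamiliar object is usually the quickest way to find out which of these structures it is.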
Indexing a data.frame
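A minimal sketch of common ways to index a data.frame, using the same small table as the ordering example in these slides. Which of these forms the original slide showed is not known, so treat the selection as an assumption.

```r
d <- data.frame(
  count = c(8, 21, 32, 12, 4),
  cat = c("A", "A", "B", "B", "B"))

# A single column, returned as a vector
d$count
d[["cat"]]

# Rows and columns by position or name: d[rows, columns]
d[2, ]            # second row, all columns
d[2, "count"]     # one cell
d[, c("cat", "count")]  # columns in a chosen order

# Rows selected by a logical condition
d[d$count > 10, ]
```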
Loading and saving a data.frame
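One possible round trip, sketched with base R only: `write.csv()` / `read.csv()` for plain text and `saveRDS()` / `readRDS()` for R's own binary format. The temporary file paths are placeholders; a real analysis would use its own file names.

```r
d <- data.frame(
  count = c(8, 21, 32, 12, 4),
  cat = c("A", "A", "B", "B", "B"))

# CSV: readable by other tools, but column types are re-guessed on load
csv_file <- tempfile(fileext = ".csv")
write.csv(d, csv_file, row.names = FALSE)
d_csv <- read.csv(csv_file)

# RDS: stores the R object as is, including column types
rds_file <- tempfile(fileext = ".rds")
saveRDS(d, rds_file)
d_rds <- readRDS(rds_file)
```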
Ordering a data.frame
> # data
> d <- data.frame(
+ count = c(8, 21, 32, 12, 4),
+ cat = c("A", "A", "B", "B", "B"))
> # get the order of each element in d$count
> order(d$count)
[1] 5 1 4 2 3
> order(d$count, decreasing = TRUE)
[1] 3 2 4 1 5
> # create a reordered data.frame
> d[order(d$count),]
  count cat
5     4   B
1     8   A
4    12   B
2    21   A
3    32   B
> # order the data.frame by cat, in decreasing order
> d[order(d$cat, decreasing = TRUE),]
  count cat
3    32   B
4    12   B
5     4   B
1     8   A
2    21   A
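`order()` also accepts several keys, which it uses in turn to break ties. The sketch below, an extension that is not part of the original slides, sorts by `cat` and then by decreasing `count` within each category.

```r
d <- data.frame(
  count = c(8, 21, 32, 12, 4),
  cat = c("A", "A", "B", "B", "B"))

# First key: cat (ascending); second key: count (descending, via negation)
idx <- order(d$cat, -d$count)
idx  # 2 1 3 4 5

d_sorted <- d[idx, ]
d_sorted
```

Negating a key only works for numeric columns; for other types, `decreasing` can be given as a vector together with `method = "radix"`.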
Factor
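A factor stores categorical data as integer codes plus a set of labels (levels). The following is a small sketch with made-up values; the variable name `category` is an assumption.

```r
category <- factor(c("A", "A", "B", "B", "B"))

levels(category)      # the distinct categories: "A" "B"
as.integer(category)  # the underlying integer codes: 1 1 2 2 2
table(category)       # number of observations per level

# Levels can be given an explicit order when the factor is created
size <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(size)          # "low" "high"
```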
Function
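As a minimal sketch of defining and calling a function in R: `range_width` is a hypothetical helper invented for this illustration, not something from the slides.

```r
# Hypothetical helper: difference between the largest and smallest value
range_width <- function(x, na.rm = FALSE) {
  # the value of the last expression is returned
  max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
}

v <- c(10, 8, 15)
range_width(v)  # 15 - 8 = 7

# Arguments can be passed by name; na.rm = TRUE ignores missing values
range_width(c(v, NA), na.rm = TRUE)
```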
apply() – used to apply a function to the rows or columns of matrices or data frames
apply(x, MARGIN, FUN)
x = matrix, data frame or array
MARGIN = 1 indicates rows, 2 indicates columns, c(1,2) indicates rows and columns
FUN = function to be applied
apply(data, 2, sum)
[1] 6 15 24
apply(data, 1, sum)
[1] 12 15 18
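The definition of `data` is not shown with the calls above. The outputs are consistent with a 3 x 3 matrix filled column-wise with 1 to 9, so the sketch below assumes that matrix.

```r
# Assumed input: reproduces the sums shown above
data <- matrix(1:9, nrow = 3)
data

col_sums <- apply(data, 2, sum)  # one value per column
row_sums <- apply(data, 1, sum)  # one value per row

# MARGIN = c(1, 2) applies the function to every cell
doubled <- apply(data, c(1, 2), function(cell) cell * 2)
```

For plain sums, `colSums(data)` and `rowSums(data)` give the same numbers more directly.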
lapply() – used to apply a function to each element of a list
lapply(x, FUN)
x = list
FUN = function
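# Get the sum of each list item
data <- list(item1 = 1:5,
item2 = seq(4,36,8),
item4 = c(1,3,5,7,9))
data
$item1
[1] 1 2 3 4 5
$item2
[1] 4 12 20 28 36
$item4
[1] 1 3 5 7 9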
lapply(data, sum)
$item1
[1] 15
$item2
[1] 100
$item4
[1] 25
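`lapply()` is not limited to built-in functions such as `sum`. The sketch below, which uses the same three-item list as these slides, passes an anonymous function and an extra argument; the result names are illustrative.

```r
data <- list(item1 = 1:5,
             item2 = seq(4, 36, 8),
             item4 = c(1, 3, 5, 7, 9))

# Any function of one element works; the result is always a list
means <- lapply(data, mean)

# Anonymous function defined in place
doubled <- lapply(data, function(x) x * 2)

# Extra arguments after FUN are passed on to it (here: head(x, n = 2))
firsts <- lapply(data, head, n = 2)
```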
sapply() – like lapply(), but simplifies the result into a vector or matrix when possible
# Get the sum of each list item and simplify the result into a vector
data <- list(item1 = 1:5,
item2 = seq(4,36,8),
item4 = c(1,3,5,7,9))
data
$item1
[1] 1 2 3 4 5
$item2
[1] 4 12 20 28 36
$item4
[1] 1 3 5 7 9
sapply(data, sum)
item1 item2 item4 
   15   100    25 
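To make the difference in return types concrete, this sketch runs `lapply()` and `sapply()` on the same list and inspects what comes back. The variable names are illustrative.

```r
data <- list(item1 = 1:5,
             item2 = seq(4, 36, 8),
             item4 = c(1, 3, 5, 7, 9))

as_list <- lapply(data, sum)    # a named list
as_vector <- sapply(data, sum)  # a named numeric vector

is.list(as_list)       # TRUE
is.numeric(as_vector)  # TRUE

# When FUN returns several values per element, sapply() builds a matrix
ranges <- sapply(data, range)
dim(ranges)  # 2 3: min and max in rows, one column per list item
```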
tapply() – breaks the data set into groups and applies a function to each group
tapply(x, INDEX, FUN, simplify)
x = a vector
INDEX = a grouping factor or a list of factors
FUN = the function to be applied
simplify = return simplified result if set to TRUE. Default is TRUE
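As a minimal sketch, `tapply()` can sum `count` within each `cat` group of the small data.frame used in the ordering example.

```r
d <- data.frame(
  count = c(8, 21, 32, 12, 4),
  cat = c("A", "A", "B", "B", "B"))

# x = values, INDEX = grouping variable, FUN = summary per group
group_sums <- tapply(d$count, d$cat, sum)
group_sums  # A = 8 + 21 = 29, B = 32 + 12 + 4 = 48

# simplify = FALSE keeps the result as a list-like array
group_list <- tapply(d$count, d$cat, sum, simplify = FALSE)
```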
grep() = searches for matches of a character pattern in a vector of character strings and returns the indices of the elements that matched
grepl() = searches for matches of a character pattern in a vector of character strings and returns a logical vector indicating which elements contained a match
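The difference between the two return types is easiest to see side by side. The strings below are made up for the illustration.

```r
fruits <- c("apple", "banana", "grape", "cherry")

# Indices of the elements that contain "ap"
hits <- grep("ap", fruits)
hits  # 1 3

# One TRUE/FALSE per element
flags <- grepl("ap", fruits)
flags  # TRUE FALSE TRUE FALSE

# value = TRUE returns the matching strings instead of their indices
matched <- grep("ap", fruits, value = TRUE)
```

The logical vector from `grepl()` can be used directly for subsetting, for example `fruits[flags]`.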
aggregate() = splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form
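A small sketch of `aggregate()` with its formula interface, using the same data.frame as the ordering example. The formula `count ~ cat` reads as "summarise count for each value of cat".

```r
d <- data.frame(
  count = c(8, 21, 32, 12, 4),
  cat = c("A", "A", "B", "B", "B"))

# Result is a data.frame with one row per group
totals <- aggregate(count ~ cat, data = d, FUN = sum)
totals
```

Unlike `tapply()`, which returns a named vector or array, `aggregate()` returns a data.frame, which is convenient for further indexing or ordering.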
Libraries
> # load the library
> # install.packages("package_name")
> library(stringr)
> # use a function from the library
> str_to_upper("aaaa")
[1] "AAAA"
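One common pattern, sketched here under the assumption that installing on demand is acceptable, is to check whether a package is available before loading it. The example uses `stats`, which ships with R, so it runs without installing anything.

```r
pkg <- "stats"  # illustrative choice; any package name works the same way

# requireNamespace() returns FALSE instead of failing when a package is missing
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# package::function calls a function without attaching the whole package
middle <- stats::median(c(10, 8, 15))
middle  # 10
```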
Bioconductor
- provides tools for the analysis and comprehension of high-throughput genomic data.
- uses the R statistical programming language and is open source and open development.
xy plot
Scatter plot
Histogram
Boxplot
Barplots
Heatmap
Multiplots
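The plot types named above each have a base R graphics function. The sketch below uses simulated data, since the figures from the slides are not available; all variable names and values are assumptions.

```r
set.seed(1)
x <- 1:20
y <- 2 * x + rnorm(20, sd = 3)
groups <- rep(c("A", "B"), each = 10)
m <- matrix(rnorm(25), nrow = 5)

plot(x, y, type = "l", main = "xy plot")   # points joined by a line
plot(x, y, main = "Scatter plot")          # one point per observation
hist(y, main = "Histogram")                # distribution of one variable
boxplot(y ~ groups, main = "Boxplot")      # distribution per group
barplot(table(groups), main = "Barplot")   # counts per category
heatmap(m, main = "Heatmap")               # matrix shown as colours

# Multiplots: par(mfrow = c(rows, cols)) splits the device into panels
old_par <- par(mfrow = c(1, 2))
plot(x, y)
hist(y)
par(old_par)  # restore the previous layout
```

`heatmap()` manages its own layout, so it is called on its own rather than inside a `par(mfrow = ...)` grid.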