STTN 225 R Summary
R-Notes
Getting help in R:
"?" and the known function
> ?var
Or just use the built-in help function.
"??" and the idea you are looking for
> ??regression
Data types in R:
Scalars:
Just a 1 x 1 vector
> B = 192.90
Vectors:
Create these with the “c()” function: c for Combine
> D = c(12,34,45)
Matrices:
Create these with the “matrix()” function:
> G = matrix(c(1,2,3,4,5,6), ncol=3, byrow=TRUE)
ncol : number of columns
byrow=TRUE : Filling in by row.
byrow=FALSE : Filling in by column.
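To see what byrow does, a small sketch (the names Grow and Gcol are made up for this illustration) fills the same six numbers both ways:

```r
# Same six values, filled row-wise and column-wise
Grow = matrix(c(1,2,3,4,5,6), ncol=3, byrow=TRUE)
Gcol = matrix(c(1,2,3,4,5,6), ncol=3, byrow=FALSE)
print(Grow)   # rows are (1 2 3) and (4 5 6)
print(Gcol)   # columns are (1 2), (3 4) and (5 6)
dim(Grow)     # number of rows and columns
```

Both matrices have 2 rows, because 6 values spread over 3 columns leave 2 values per column.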
List:
Basically a vector that can have anything as its elements.
Create these with the “list()” function:
> I = list(X=X, D=D, G=G) Name the elements so that $ indexing (e.g. I$X) works
Data Frame:
A data frame is used for storing data tables.
It is a list of vectors of equal length.
> L = data.frame(Xdata=c(1,2,3), Ydata=c(22,33,44))
Indexing in R:
Vector indexing:
One index only: [...]
> D[2] The 2nd element
> D[1:2] The 1st through 2nd elements
Matrix indexing:
Two indexes: [..., ...]
> G[2,3] Element in the 2nd row and the 3rd column
> G[2,] All elements in the 2nd row
> G[,3] All elements in the 3rd column
List:
$ indexing and [[...]] indexing:
> I$X
> I[[2]]
Data frames:
All of the above types of indexing:
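> L[2] The 2nd column (still a data frame)
> L[1,2] Element in the 1st row and the 2nd column
> L$Ydata The column named Ydata (as a vector)
> L[[2]] The 2nd column (as a vector)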
Logical indexing:
Uses a vector of TRUEs and FALSEs to select elements:
> D[D>20] All elements of D that are larger than 20
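A short sketch of logical indexing and of the difference between [ ] and [[ ]] on a data frame. The objects D and L are re-created here so that the lines run on their own; the name keep is made up for this illustration.

```r
D = c(12,34,45)
L = data.frame(Xdata=c(1,2,3), Ydata=c(22,33,44))

keep = D > 20            # logical vector: FALSE TRUE TRUE
D[keep]                  # only the elements where keep is TRUE: 34 45

L[L$Ydata > 25, ]        # rows of L where Ydata is larger than 25
class(L[2])              # "data.frame": single brackets keep the table structure
class(L[[2]])            # "numeric": double brackets extract the column itself
```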
Reading Data into R:
Use “read.table” to get a text file into R.
> data1 = read.table("Rdat1.txt", header=TRUE, skip=0)
"Rdat1.txt": file name (use straight quotes in R code)
header=TRUE: Use this if the first line of the text file contains the column names
skip=0: The number of lines skipped before reading the data
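To try read.table without an existing data file, the following sketch first writes a small text file and then reads it back. The file contents and the name fname are made up for this illustration; tempfile() is used so that no real file is overwritten.

```r
# Write a small text file with a header line
fname = tempfile(fileext = ".txt")
writeLines(c("Xdata Ydata", "1 22", "2 33", "3 44"), fname)

# Read it back: the first line supplies the column names
data1 = read.table(fname, header=TRUE, skip=0)
print(data1)
str(data1)        # a data frame with 3 rows and 2 columns
```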
Basic operators in R:
+ Addition
- Subtraction
* Multiplication
/ Division
^ or ** Exponent
%*% Matrix multiplication
%/% Integer division
%% Modulus (Remainder from division)
t(⋅) Transpose of a matrix or vector
solve(⋅) Inverse of a square matrix
%in% Determines if one thing is an element of another
: Sequence
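A few of these operators in action. The matrix A is a made-up example, chosen to be invertible.

```r
7 %/% 2                        # integer division: 3
7 %% 2                         # remainder: 1
2^3                            # exponent: 8
1:5                            # sequence 1 2 3 4 5
3 %in% c(1,2,3)                # TRUE, since 3 is an element of the vector

A = matrix(c(2,1,0,4), ncol=2) # 2 x 2 matrix, filled by column
Ainv = solve(A)                # inverse of A
A %*% Ainv                     # matrix product: the 2 x 2 identity matrix (up to rounding)
t(A)                           # transpose: rows and columns swapped
```

Note that A * Ainv would multiply element by element, which is not the same as the matrix product A %*% Ainv.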
Sequence Functions
Order Functions
Query Functions
t distribution: R name t, parameter df
For example:
Suppose that 𝑋∼𝑁(0, 9²), with distribution function 𝐹(𝑥) = 𝑃(𝑋 < 𝑥) and
density function 𝑓(𝑥).
To determine 𝐹(2.9) = 𝑃(𝑋 < 2.9) we type:
> pnorm(2.9,0,9)
To determine 𝑓(2.9) we type:
> dnorm(2.9,0,9)
To determine 𝐹⁻¹(0.95) we type:
> qnorm(0.95,0,9)
To generate 1000 observations from 𝑋, type:
> rnorm(1000,0,9)
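The four functions fit together, which the following sketch checks numerically. The seed and the names p and x are arbitrary choices for this illustration; pt() is used at the end as an example of the same naming pattern for the t distribution.

```r
p = pnorm(2.9, 0, 9)      # F(2.9)
qnorm(p, 0, 9)            # the quantile function undoes pnorm: gives back 2.9

set.seed(1)               # arbitrary seed, so that the numbers can be reproduced
x = rnorm(1000, 0, 9)     # 1000 simulated observations
mean(x < 2.9)             # proportion of observations below 2.9, should be close to p
sd(x)                     # sample standard deviation, should be close to 9

pt(2, df = 5)             # P(T < 2) for a t distribution with 5 degrees of freedom
```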
Example:
> if(X > 8) {
> print("X larger than 8")
> } else if (X < 8) {
> print("X smaller than 8")
> } else {
> print("otherwise")
> }
Note: "else" must be on the same line as the closing "}", otherwise R gives an error at the console.
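A variation of the example, wrapped in a function so that it can be tried with several values. The function name compare8 is made up for this illustration, and the last branch is used for the case X equal to 8.

```r
# compare8 is a hypothetical helper, not a built-in R function
compare8 = function(X) {
  if (X > 8) {
    "X larger than 8"
  } else if (X < 8) {
    "X smaller than 8"
  } else {
    "X equal to 8"
  }
}

compare8(10)   # "X larger than 8"
compare8(3)    # "X smaller than 8"
compare8(8)    # "X equal to 8"
```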
Plots in R:
The basic histogram in R is
> hist(X)
MonteCarlo(10000,10) Bootstrap(X,10000,10)
Likelihood
𝑙𝑖𝑘(𝜃|𝑋) = ∏ 𝑓(𝑋𝑖|𝜃) (product over 𝑖 = 1, …, 𝑛)
ℓ(𝜃|𝑋) := log(𝑙𝑖𝑘(𝜃|𝑋))
For a given set of data 𝑋1 , … , 𝑋𝑛 , we want to find the value of 𝜃 that makes
the likelihood function as large as possible (maximum).
The 𝜃 value that produces this maximum is called the
Maximum Likelihood Estimator (MLE).
This is typically "difficult" to do analytically (i.e., with math).
However, it can be done numerically quite easily by using a "brute force"
approach.
The “Brute Force” Approach
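Suppose we want to find the value of 𝜃 that produces the largest 𝑙𝑖𝑘(𝜃).
Create a "grid" of 𝜃 values and evaluate the function at every grid value.
The grid value of 𝜃 that produces the maximum 𝑙𝑖𝑘(𝜃) is your "optimal" 𝜃, i.e. the approximate MLE.
Because log is an increasing function, the same 𝜃 also maximizes the log-likelihood ℓ(𝜃).
Note: The finer the grid, the closer this value is to the true MLE!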
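x = Data #Data = the observed sample
G = 100000
grid = seq(0.001,10,length = G) #grid of theta values
lik = numeric(G) #lik = likelihood
i = 1
for(thetaj in grid)
{
#likj = product(f(Xi|thetaj))
lik[i] = prod(dchisq(x,thetaj)) ##product of the densities
i = i + 1
}
indx = which.max(lik) #position of the largest likelihood value
grid[indx] #the (approximate) MLE
plot(grid,lik,type = "l")
abline(v = grid[indx], lty = 2)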
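x = Data #Data = the observed sample
G = 100000
grid = seq(0.001,10,length = G) #grid of theta values
l = numeric(G) #l = log-likelihood
i = 1
for(thetaj in grid)
{
#log-likj = sum(log(f(Xi|thetaj)))
l[i] = sum(log(dchisq(x,thetaj))) ##Note the change: sum of logs instead of a product
i = i + 1
}
indx = which.max(l) #position of the largest log-likelihood value
grid[indx] #the (approximate) MLE
plot(grid,l,type = "l")
abline(v = grid[indx], lty = 2)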
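The grid search can be tried on simulated data, where the true parameter is known. The sketch below is only an illustration under assumptions: the data are simulated from a chi-square distribution with 4 degrees of freedom, the seed and the sample size are arbitrary, and the names loglik, mle_grid and mle_opt are made up. It replaces the loop with sapply() and compares the grid answer with R's built-in one-dimensional optimizer optimize().

```r
set.seed(123)                          # arbitrary seed
x = rchisq(50, df = 4)                 # simulated data, true theta = 4

# log-likelihood of a chi-square model with theta degrees of freedom
loglik = function(theta) sum(dchisq(x, theta, log = TRUE))

# brute force: evaluate the log-likelihood at every grid value
grid = seq(0.001, 10, length = 10000)
l = sapply(grid, loglik)
mle_grid = grid[which.max(l)]

# built-in optimizer on the same interval, for comparison
mle_opt = optimize(loglik, interval = c(0.001, 10), maximum = TRUE)$maximum

c(grid = mle_grid, optimize = mle_opt) # the two answers should nearly agree
```

Using dchisq(..., log = TRUE) instead of log(dchisq(...)) gives the same quantity but is numerically more stable when the densities are very small.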
Bootstrap Basic Percentile Confidence Interval
BootCI = function(x,alpha,B)
{
n = length(x)
mstar = numeric(B)
for(b in 1:B)
{
xstar = sample(x,n,replace=TRUE) #resample the data with replacement
mstar[b] = median(xstar) #or mean
}
mstar = sort(mstar)
r = floor(B*alpha/2)
s = floor(B*(1 - alpha/2))
basicCI = c(mstar[r],mstar[s])
hybridCI = c(2*median(x)-mstar[s],2*median(x)-mstar[r])
return(list(basicCI = basicCI, hybridCI = hybridCI))
}
X = Data #Data = the observed sample
BootCI(X, 0.05, 1000)
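A compact variation of the percentile interval, shown on simulated data so that it runs on its own. This is a sketch under assumptions, not the function above: the data are simulated from an exponential distribution, the seed is arbitrary, and replicate() and quantile() are swapped in for the loop and the sorted-index step. Because quantile() interpolates between order statistics, its end points can differ slightly from mstar[r] and mstar[s].

```r
set.seed(42)                     # arbitrary seed
X = rexp(30, rate = 1)           # simulated data (assumption for this demo)
B = 2000                         # number of bootstrap resamples
alpha = 0.05

# replicate() repeats the resample-and-take-the-median step B times
mstar = replicate(B, median(sample(X, length(X), replace = TRUE)))

# percentile interval: the alpha/2 and 1 - alpha/2 quantiles of the bootstrap medians
percCI = quantile(mstar, c(alpha/2, 1 - alpha/2))
percCI
median(X)                        # the sample median, for comparison
```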