
A Brief Introduction To R

R is a free and open-source language and environment for statistical analysis and graphics. It contains thousands of pre-programmed statistical functions and can import data from various formats like Excel, SPSS, and STATA. R allows users to easily perform data manipulation, generate publication-quality graphs, conduct statistical tests, and develop statistical models.


11/14/2016


Dr. Norberto E. Milla

What is R?
• R is a language and environment for statistical
computing and graphics
• R is a free, open-source implementation of the S language (S-PLUS is a commercial implementation of S)
• Initially developed by Robert Gentleman and Ross
Ihaka of the University of Auckland (early 1990s)
• R is written by statisticians for statisticians (and the
rest of us)
• An environment: a huge library of algorithms for data
access, data manipulation, analysis and graphics
• A community
–Thousands of contributors, 2 million users
–Resources and help in every domain


Awesome thing #1: It's FREE!


• Open Source, licensed under GPL (like Linux!)
–Free as in freedom
• Flexible and runs on a wide array of platforms,
including Windows, Unix, and Mac OS X
• Open for integration
–Data (SAS, SPSS, Stata, Excel, …)
• Broad user-base
–De facto standard for data analysis and
teaching statistics

Awesome thing #2: Language


• Programming, not dialogs or cell formulas
–Freedom to combine methods
–Repeatable results
–Reliable and reusable
• Language designed for data analysis
–Object-oriented: vector, matrix, model, …
–Built-in library of algorithms
• Get more done, faster


Awesome thing #3: Graphics


• Functions for standard graphs
–Scatterplot, boxplot, histogram, smoothing
–Bar plot, pie chart, dot chart, …
–Image plot, 3-D surface, map, …
• Customize without limits
–Combine graph types
–Create entirely new graphics
–Use of colors

Awesome thing #4: Statistics


• All standard statistical methods built in
–Mean, median, covariance, distributions, …
–Regression, ANOVA, cross-tabulations, …
–Survival, nonlinear mixed effects, GLM, …
–Neural networks, trees, GAM, …
• Object-oriented functions
–Access all parts of the analysis results
–Combine analytic methods
• Over 3,000 contributed packages for specialized
applications (as of 2011)


Caveat

“Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive.”
--François Pinard

Downloading and installing R


Step 1: Go to the R homepage: https://www.r-project.org

Step 2: Select a CRAN mirror site

Step 3: Select the installer for your operating system

Step 4: Select the “base” installer

Step 5: Download the R installer

Step 6: Double-click the downloaded R installer

Step 7: On the pop-up menu, click OK.

Step 8: Click Next on the next pop-up window and continue answering the pop-up windows until you reach the Finish window.

The R console (figure: screenshot of the R console window)

Data types in R
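• R has several data structures: vectors (a “scalar” is simply a vector of length 1), matrices, arrays, data frames and lists
• A vector is a single entity consisting of an ordered collection of elements of the same type (numeric, character or logical)
• A matrix is a vector with two dimensions, so each element is indexed by a row and a column subscript; arrays extend this to more than two indices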
• Data frames are matrix-like structures, in which the
columns can be of different types.
• Data frames are ‘data matrices’ with one row per
observational unit but with (possibly) both
numerical and categorical variables
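Here is a minimal sketch of these structures at the console. The object names (s, v, m, df, lst) are hypothetical and chosen only for illustration.

```r
# A "scalar" in R is simply a vector of length 1
s <- 3.5
length(s)            # 1

v <- c(2, 4, 6)                  # numeric vector
m <- matrix(1:6, nrow = 2)       # 2 x 3 matrix: a vector with two dimensions
df <- data.frame(id = 1:3,
                 passed = c(TRUE, FALSE, TRUE))  # columns of different types
lst <- list(v, m, df)            # a list can hold objects of any type

dim(m)               # 2 3
is.vector(v)         # TRUE
is.data.frame(df)    # TRUE
length(lst)          # 3
```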

Vector
• R is case-sensitive
• Assignment operators in R: <-, =
a <- c(1, 2, 5, 3, 6, -2, 4)     # numeric vector
b = c("one", "two", "three")     # character vector
d = c(TRUE, FALSE, TRUE, TRUE)   # logical vector (avoid naming it c, which shadows the name of c())
• Elements of a vector can be referred to using
subscripts
• The following command will display the 2nd and 4th
elements of vector a
a[c(2, 4)]
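Subscripts can also be negative, to drop elements, or logical, to filter. A short sketch using the vector a from above:

```r
a <- c(1, 2, 5, 3, 6, -2, 4)

a[c(2, 4)]   # 2nd and 4th elements: 2 3
a[-1]        # every element except the 1st
a[a > 2]     # logical indexing: 5 3 6 4
a[2:4]       # a range of positions: 2 5 3
```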

Matrix
• All columns in a matrix must have the same mode
(numeric, character, etc.) and the same length
mymatrix = matrix(vector, nrow = r, ncol = c, byrow = FALSE,
                  dimnames = list(char_vector_rownames, char_vector_colnames))

Example:
matrix1=matrix(1:20, 4, 5) #generates a 4x5 matrix

x = 1:9
rownames = c("r1", "r2", "r3")
colnames = c("c1", "c2", "c3")
matrix2 = matrix(x, 3, 3, byrow = TRUE,
                 dimnames = list(rownames, colnames))   # 3x3 matrix filled row by row
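The sketch below indexes matrix2 by row and column names and shows the effect of byrow. The name bycol is hypothetical.

```r
matrix2 <- matrix(1:9, 3, 3, byrow = TRUE,
                  dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

matrix2["r2", "c3"]   # 6: row 2, column 3 (filled row by row)
matrix2[, "c1"]       # the whole first column: 1 4 7
matrix2[2, ]          # the whole second row

# Without byrow = TRUE the same numbers are filled column by column
bycol <- matrix(1:9, 3, 3)
bycol[2, 3]           # 8
```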

Data Frame
• In a data frame different columns can have different
modes
• Similar to SAS and SPSS data sets
• Example:
x = c(1, 2, 3, 4)
y = c("red", "white", "red", NA)
z = c(TRUE, TRUE, FALSE, FALSE)
mydata = data.frame(x, y, z)   # creates the data frame
mydata                         # prints it
names(mydata) = c("ID", "Color", "Passed")   # assigns column names to mydata
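Once the data frame exists, a few common ways to inspect it and pull out columns or rows might look like this. It reuses the example data above.

```r
x <- c(1, 2, 3, 4)
y <- c("red", "white", "red", NA)
z <- c(TRUE, TRUE, FALSE, FALSE)
mydata <- data.frame(x, y, z)
names(mydata) <- c("ID", "Color", "Passed")

str(mydata)               # one line per column, with its type
mydata$Color              # a single column, returned as a vector
mydata[mydata$Passed, ]   # only the rows where Passed is TRUE
nrow(mydata)              # 4
```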

R built-in data editor


• One can enter data interactively into R using its
built-in spreadsheet
mydata = data.frame()   # creates an empty data frame
mydata = edit(mydata)   # opens the spreadsheet for data entry
• Example: (figure: screenshot of the data editor)

Importing data from Excel


• For Excel 2003 or earlier, save the file in CSV format and use either of the following commands to import it into R
mydata = read.csv("D:/DMPS/R Training/QUICK-R/import1.csv", header=TRUE, sep=",")
or,

mydata = read.table("D:/DMPS/R Training/QUICK-R/import1.csv", header=TRUE, sep=",")
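The sketch below round-trips a small data frame through a temporary CSV file, which stands in for an Excel export. The data and file are hypothetical. read.csv() already defaults to header = TRUE and sep = ",", so both calls should give the same result.

```r
# Write a small data frame to a temporary CSV file (stand-in for an Excel export)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(88, 92, 75)), tmp, row.names = FALSE)

mydata  <- read.csv(tmp)                              # header/sep defaults
mydata2 <- read.table(tmp, header = TRUE, sep = ",")  # spelled out explicitly

str(mydata)
all.equal(mydata, mydata2)   # the two imports should match
```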

Importing data from Excel


• For Excel 2007 or 2010 (.xlsx files), first load the xlsx package (which requires Java) using the following command
library(xlsx)

• Then use the following command to import the file into R
mydata = read.xlsx("D:/DMPS/R Training/QUICK-R/import2.xlsx", sheetIndex=1)
or, simply
mydata = read.xlsx("D:/DMPS/R Training/QUICK-R/import2.xlsx", 1)

Importing data from SPSS


• There are two packages that can be used to import SPSS data sets into R: foreign and Hmisc
• Load the foreign package
library(foreign)
• Use the following command to import the data into R
myspssdata = read.spss("D:/DMPS/R Training/QUICK-R/ched_complete.sav", use.value.labels=TRUE,
                       to.data.frame=TRUE)

Importing data from SPSS


• Save the SPSS data set in portable (*.por) format
• Load the Hmisc package
library(Hmisc)
• Use the following command to import the data into R
myspssdata = spss.get("D:/DMPS/R Training/QUICK-R/ched_complete.por", use.value.labels=TRUE,
                      to.data.frame=TRUE)

Importing data from STATA


• Load the foreign package
library(foreign)
• Use the following command to import the data into R
mystatadata = read.dta("D:/DMPS/R Training/QUICK-R/statadata.dta", convert.factors=TRUE)
• Note that read.dta() only reads files saved in Stata versions 5 to 12
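The sketch below writes a small Stata file with foreign's write.dta() and reads it back, so the call can be tried without a real .dta file. The data frame is hypothetical. Factors are stored as Stata value labels, and convert.factors = TRUE turns them back into R factors.

```r
library(foreign)   # ships with R as a recommended package

# Hypothetical example data written to a temporary Stata file
tmp <- tempfile(fileext = ".dta")
df <- data.frame(id = 1:3,
                 sex = factor(c("male", "female", "male")))
write.dta(df, tmp)   # factors are written as value labels by default

mystatadata <- read.dta(tmp, convert.factors = TRUE)
str(mystatadata)     # sex comes back as a factor
```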

Variable labels
• Using the edit() function, we can type the variable names directly into the R spreadsheet

• An alternative is the following command:

names(mydata)[3] = "age"   # assigns "age" as the name of the 3rd column of mydata

Value labels
• Use the factor() function for nominal data and the
ordered() function for ordinal data
• Suppose the variable v1 is coded 1, 2 or 3 and we
want to attach value labels 1=red, 2=blue and
3=green

mydata$v1 = factor(mydata$v1, levels=c(1,2,3),
                   labels=c("red", "blue", "green"))
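This is a minimal sketch of what factor() does to a coded vector. The vector v1 here is hypothetical. The labels replace the codes for display and counting, while the integer codes are kept underneath.

```r
v1 <- c(1, 3, 2, 1, 3)   # hypothetical coded responses
v1f <- factor(v1, levels = c(1, 2, 3),
              labels = c("red", "blue", "green"))

v1f               # red green blue red green
table(v1f)        # counts per label: red 2, blue 1, green 2
as.integer(v1f)   # underlying codes: 1 3 2 1 3
```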

Value labels
• Suppose the variable y is coded 1, 3 or 5 and we
want to attach value labels 1=Low, 3=Medium, and
5=High

mydata$y = ordered(mydata$y, levels=c(1,3,5),
                   labels=c("Low", "Medium", "High"))
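Unlike a plain factor, an ordered factor supports comparisons and min/max. A sketch with a hypothetical coded vector y:

```r
y <- c(5, 1, 3, 3, 5)   # hypothetical coded values
yo <- ordered(y, levels = c(1, 3, 5),
              labels = c("Low", "Medium", "High"))

yo            # High Low Medium Medium High, with Low < Medium < High
yo > "Low"    # comparisons follow the level order: TRUE FALSE TRUE TRUE TRUE
max(yo)       # High
```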

Creating new variables


• There are three ways to create new variables from existing variables in an R data set; one of them uses attach()
• Suppose the R data set mydata has two variables x1 and x2, and we want to create two new variables holding the sum and the mean of x1 and x2
• This can be accomplished as follows:
attach(mydata)
mydata$sum = x1 + x2
mydata$mean = (x1 + x2) / 2
detach(mydata)
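Two approaches that avoid attach() could look like the sketch below: naming each column explicitly with $, or using transform(). The data values and the name mydata2 are hypothetical.

```r
mydata <- data.frame(x1 = c(2, 4, 6), x2 = c(8, 6, 4))   # hypothetical data

# Refer to the columns explicitly with $
mydata$sum  <- mydata$x1 + mydata$x2
mydata$mean <- (mydata$x1 + mydata$x2) / 2

# transform() evaluates the expressions inside the data frame
mydata2 <- transform(mydata[c("x1", "x2")],
                     sum  = x1 + x2,
                     mean = (x1 + x2) / 2)
```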

Recoding variables
• Suppose we want to categorize age as follows:
age > 75 = Old, 45 < age <= 75 = Middle Aged, and age <= 45 = Young
• This can be done as follows:
attach(mydata)
mydata$agecat = NA   # create the new variable first
mydata$agecat[age <= 45] = "Young"
mydata$agecat[age > 45 & age <= 75] = "Middle Aged"
mydata$agecat[age > 75] = "Old"
detach(mydata)
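Base R's cut() can do the same recoding in one call. This is swapped in as an alternative to the approach above. By default each interval is closed on the right, so the breaks below give the categories (-Inf, 45], (45, 75] and (75, Inf]. The ages are hypothetical.

```r
age <- c(30, 45, 46, 75, 80, NA)   # hypothetical ages

agecat <- cut(age,
              breaks = c(-Inf, 45, 75, Inf),   # right-closed intervals
              labels = c("Young", "Middle Aged", "Old"))
agecat   # Young Young Middle Aged Middle Aged Old <NA>
```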

Renaming variables
• There are many ways to do this
• The simplest is the fix() function, which opens mydata in the data editor
fix(mydata)   # changes are saved when the editor is closed
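A non-interactive alternative is to assign to names(), either by position or by the current name. This is a sketch with a hypothetical data frame.

```r
mydata <- data.frame(v1 = 1:2, v2 = c("a", "b"))   # hypothetical data

names(mydata)[2] <- "label"                       # rename by position
names(mydata)[names(mydata) == "v1"] <- "id"      # rename by current name

names(mydata)   # "id" "label"
```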

Merging data sets


• We can merge data sets horizontally (adding columns) using the merge() function
newdata = merge(data1, data2, by="id")   # assuming id is common to data1 and data2
• Vertical merging (adding rows) can be done using the rbind() function
newdata = rbind(data1, data2)   # assuming data1 and data2 have the same variables
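Here is a small sketch of both kinds of merge, using hypothetical data frames data1, data2 and data3. merge() matches rows on the key and, by default, sorts the result by it.

```r
data1 <- data.frame(id = c(1, 2, 3), age = c(25, 40, 31))
data2 <- data.frame(id = c(3, 1, 2), score = c(70, 85, 90))

# Horizontal: rows are matched on the shared key column "id"
newdata <- merge(data1, data2, by = "id")
newdata          # id age score, sorted by id

# Vertical: stack data frames that have the same columns
data3 <- data.frame(id = 4, age = 52)
stacked <- rbind(data1, data3)
nrow(stacked)    # 4
```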

Selecting variables
• The following commands can be used to select variables
newdata = mydata[c("v1", "v3", "v15")]   # selects variables v1, v3 and v15 in mydata

Or,

newdata = mydata[5:10]   # selects the 5th through the 10th variables in mydata
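Both forms can be tried on a hypothetical data frame with ten numbered columns:

```r
# Hypothetical data frame with columns v1 ... v10
mydata <- as.data.frame(matrix(1:30, nrow = 3,
                               dimnames = list(NULL, paste0("v", 1:10))))

by_name     <- mydata[c("v1", "v3", "v10")]   # select by name
by_position <- mydata[5:10]                   # 5th through 10th columns

names(by_name)      # "v1" "v3" "v10"
ncol(by_position)   # 6
```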

Excluding/removing variables
• The following command can be used to exclude variables from the analysis
newdata = mydata[c(-1, -3)]   # removes the 1st and 3rd variables of mydata

Or,

mydata$v1 = mydata$v3 = NULL   # deletes the variables v1 and v3 from mydata
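Columns can also be dropped by name with a logical index, which keeps working if the column order changes. A sketch with hypothetical data; the helper vector drop is also hypothetical.

```r
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12)   # hypothetical

drop <- c("v1", "v3")
newdata <- mydata[!(names(mydata) %in% drop)]
names(newdata)   # "v2" "v4"

# Setting a column to NULL removes it from mydata itself
mydata$v2 <- NULL
names(mydata)    # "v1" "v3" "v4"
```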

Selecting observations
• Use the following commands to select observations
newdata = mydata[1:5, ]   # selects the first 5 observations in mydata

attach(mydata)
newdata = mydata[which(gender == "male" & age >= 65), ]   # selects males aged 65 and over
detach(mydata)
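The same row selection can be written without attach(), either with explicit $ references or with subset(), which evaluates the condition inside the data frame. The data below is hypothetical.

```r
mydata <- data.frame(gender = c("male", "female", "male", "male"),
                     age = c(70, 66, 40, 65))   # hypothetical data

newdata  <- mydata[which(mydata$gender == "male" & mydata$age >= 65), ]
newdata2 <- subset(mydata, gender == "male" & age >= 65)

nrow(newdata)   # 2 (ages 70 and 65)
```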
