Introduction To R

Uploaded by

Yared Fikadu
Introduction to R

SAMSON LETA
We’ll Cover

What is R

How to obtain and install R

Packages in R

How to read and export data

How to do basic statistical analyses

LM and GLM models in R


What is R

Software for Statistical Data Analysis

- written by Robert Gentleman and Ross Ihaka

Based on S

Programming Environment

Data Storage, Analysis, Graphing


Brief Introduction to R

Available at www.r-project.org
R is Free and Open Source Software
Runs on a wide variety of platforms: UNIX, Windows and MacOS.
R allows you to carry out statistical analyses in an
interactive mode, as well as allowing simple
programming
Current Version: R-3.5.0
Strengths and Weaknesses

Strengths
• Free and Open Source
• Strong User Community
• Highly extensible, flexible
• Implementation of high end statistical methods
• Flexible graphics and intelligent defaults
Weaknesses
• Steep learning curve
• Slow for large datasets
Installing R

• To use R, you first need to install the R program on your computer.
• To install R on a Windows PC, download it from the Comprehensive R Archive Network (CRAN): http://cran.r-project.org


Starting R

On Windows, double-click the R desktop icon


R Working Area

This is the area where all commands are issued and where non-graphical output appears when R is run interactively
Installing an R package

Sometimes we need functionality beyond that offered by the core R library.

You can install additional packages from CRAN
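
A minimal sketch of installing and loading a package. MASS is used here only because a later slide mentions it; the variable name `pkg` is illustrative, and install.packages() needs an internet connection and a CRAN mirror.

```r
# Package to install; "MASS" is just an example taken from a later slide
pkg <- "MASS"

# Install from CRAN only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package so its functions can be used in this session
library(MASS)
```

Installation is needed once per computer; library() has to be called in every new session.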
Installing RStudio

RStudio (http://www.rstudio.com/products/rstudio/download/) is an integrated development environment (IDE) for R.
It includes a console, a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
RStudio
Basics

• Highly Functional
  • Everything is done through functions
  • Functions take named arguments
  • Abbreviations in arguments are OK (partial matching of argument names; T for TRUE)
• Object Oriented
  • Everything is an object
  • "<-" is the assignment operator
  • "X <- 5": X GETS the value 5
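
A short sketch of these basics, using the base function seq() as an arbitrary example; the variable names are illustrative only.

```r
# Assign the value 5 to X
X <- 5

# The same call with arguments matched by position and by name
by_position <- seq(1, 10, 2)
by_name     <- seq(from = 1, to = 10, by = 2)

# Abbreviated argument name: "len" is completed to "length.out"
abbreviated <- seq(0, 1, len = 5)

# Everything is an object, including functions
print(class(X))
print(class(seq))
```

Abbreviations save typing at the console, but full argument names and TRUE/FALSE are safer in scripts because T and F are ordinary variables that can be overwritten.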
Getting Help in R

• From Documentation:
  • ?WhatIWantToKnow
  • help("WhatIWantToKnow")
  • help.search("WhatIWantToKnow")
  • help.start()
  • getAnywhere("WhatIWantToKnow")
  • example("WhatIWantToKnow")
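
As an illustration, here is the same idea with a real topic, "mean", standing in for WhatIWantToKnow. apropos() is not on the slide; it is added here as a related base function for finding names you only partly remember.

```r
# Look up the help page for mean(); typing ?mean at the console does the same
help_page <- help("mean")

# apropos() lists objects whose names match a pattern (here: names starting with "read.")
read_functions <- apropos("^read\\.")
print(read_functions)
```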
Familiarizing with R
R comes with extensive documentation
help.start()
R objects - Data Structures

• Supports virtually any type of data
• Numbers, characters, logicals (TRUE/FALSE)
• Arrays of virtually unlimited sizes
• Simplest: Vectors and Matrices
• Lists: can contain mixed-type variables
• Data Frame: rectangular data set
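
A minimal sketch of each structure; all object names and values are made up for illustration.

```r
# Vectors: every element has the same type
ages    <- c(2, 5, 3)              # numeric
species <- c("cattle", "sheep")    # character
treated <- c(TRUE, FALSE, TRUE)    # logical

# Matrix: a two-dimensional array of one type (filled column by column)
m <- matrix(1:6, nrow = 2)

# List: elements may have different types and lengths
animal <- list(id = 1L, species = "cattle", weights = c(250.5, 260.1))

# Data frame: rectangular data set, one column per variable
herd <- data.frame(age = ages, treated = treated)

str(herd)
```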


In an R Session… A to Z

• First, read data from other sources
• Use packages, libraries, and functions
• Write functions wherever necessary
• Conduct Statistical Data Analysis
• Save outputs to files, write tables
• Save R workspace if necessary (exit prompt)
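
A compact sketch of the later steps of such a session, using the built-in mtcars data set in place of real data. The helper standard_error() and the file names are hypothetical, and files are written to a temporary directory.

```r
# Write a function where base R does not provide one (hypothetical helper)
standard_error <- function(x) sd(x) / sqrt(length(x))

# Conduct a small analysis and collect the results in a table
results <- data.frame(
  variable = "mpg",
  mean     = mean(mtcars$mpg),
  se       = standard_error(mtcars$mpg)
)

# Save the output table to a file
out_file <- file.path(tempdir(), "results.csv")
write.csv(results, out_file, row.names = FALSE)

# Save the workspace (all objects in the global environment)
workspace_file <- file.path(tempdir(), "session.RData")
save.image(workspace_file)
```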
Reading data into R

• R is not well suited for data preprocessing
• Preprocess data elsewhere (Excel, SPSS, etc.)
• Easiest form of data to input: text, csv file
• Read from other systems:
  • Use the library "foreign": library(foreign)
  • Can import from SAS, SPSS, Epi Info and STATA
Reading Data into R

Read TXT files with read.delim() or read.table()

Read comma-delimited files with read.csv() or read.table()

Read Excel files with read_excel() from the readxl package
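
A small round-trip sketch: it first writes a made-up data set to temporary files so that the reading functions have something to read. The data and file names are illustrative only.

```r
# Made-up example data
animals <- data.frame(
  id      = 1:3,
  species = c("cattle", "sheep", "goat"),
  weight  = c(250.5, 45.2, 38.0)
)

# Comma-delimited file: read.csv(), or read.table() with explicit options
csv_file <- file.path(tempdir(), "animals.csv")
write.csv(animals, csv_file, row.names = FALSE)
from_csv   <- read.csv(csv_file)
from_table <- read.table(csv_file, header = TRUE, sep = ",")

# Tab-delimited text file: read.delim()
txt_file <- file.path(tempdir(), "animals.txt")
write.table(animals, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)
from_delim <- read.delim(txt_file)

str(from_csv)
```

read.csv() and read.delim() are wrappers around read.table() with defaults suited to each file type.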
Operators and Expressions

The following table shows the standard arithmetic, logical and relational operators you may use in expressions:

Operator    Description
+           addition
-           subtraction
*           multiplication
/           division
^ or **     exponentiation
Operators and Expressions

Relational and Logical Operators
Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
!x          not x
x|y         x OR y
x&y         x AND y
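
A few expressions using these operators; the vector x is made up for illustration. The operators work element by element on vectors.

```r
# Arithmetic
power_caret <- 2 ^ 3
power_stars <- 2 ** 3     # ** is accepted as an alternative to ^
quotient    <- 7 / 2

# Relational and logical operators applied to a vector
x <- c(1, 5, 10)
at_least_five <- x >= 5
in_between    <- x > 1 & x < 10
at_the_ends   <- x == 1 | x == 10
not_five      <- x != 5
not_large     <- !(x > 5)

print(in_between)
```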
Operators and Expressions

Functions
R has a large number of functions; here are a few frequently used mathematical functions:
abs(x)      the absolute value of x
exp(x)      the exponential function of x
log(x)      the natural logarithm of x (for x > 0)
log10(x)    the log base 10 of x (for x > 0)
round(x)    x rounded to the nearest whole number
sqrt(x)     the square root of x (for x >= 0)
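
These functions can be tried directly at the console; the input values below are arbitrary.

```r
absolute    <- abs(-3.2)
exponential <- exp(1)          # Euler's number
natural_log <- log(exp(2))     # log() is the natural logarithm
log_base_10 <- log10(1000)
rounded     <- round(3.7)
square_root <- sqrt(16)

print(c(absolute, exponential, natural_log, log_base_10, rounded, square_root))
```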
Statistical Functions

• Descriptive Statistics
• Statistical Modeling
  • Regressions
  • Survival
  • Time series
• Multivariate Functions
• Built-in packages, contributed packages


Descriptive Statistics

• Has functions for all common statistics
• summary() gives the minimum, first quartile, median, mean, third quartile and maximum for numeric variables
• table() gives a tabulation of categorical variables
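
A quick illustration with mtcars, a data set that ships with R and is used here only as a stand-in for real data.

```r
# Six-number summary of a numeric variable (miles per gallon)
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Frequency table of a categorical variable (number of cylinders)
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```

Calling summary() on a whole data frame, for example summary(mtcars), summarises every column at once.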
Data description

> summary()
• Displaying data using plotting functions
> plot()
> hist()
> boxplot()
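
A sketch of the three plotting functions on the built-in mtcars data; the axis labels and titles are illustrative. When run non-interactively, R writes the plots to a PDF file instead of opening a window.

```r
# Scatter plot of two continuous variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Histogram of one continuous variable; the returned object holds the bin counts
h <- hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Box plots of a continuous variable by group; the returned object holds group sizes
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Cylinders", ylab = "Miles per gallon")
```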
Statistical Modeling

• Over 400 functions
  • lm, glm, aov, t.test
• Numerous libraries & packages
  • lattice, MASS, survival, …
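
Two of the listed functions in use, as a sketch on the built-in mtcars data; the choice of variables is arbitrary.

```r
# Two-sample t-test: does mpg differ between automatic (am = 0) and manual (am = 1) cars?
mpg_by_gearbox <- t.test(mpg ~ am, data = mtcars)
print(mpg_by_gearbox)

# One-way analysis of variance: does mpg differ between cylinder groups?
mpg_by_cyl <- aov(mpg ~ factor(cyl), data = mtcars)
print(summary(mpg_by_cyl))
```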


Regressions

Linear models (lm)

Generalized linear models (glm)


Regressions

Fitting a linear model

• Simple
• Multiple
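
A minimal sketch of both cases with lm(), again on mtcars; the outcome and predictors are chosen only for illustration.

```r
# Simple linear regression: one explanatory variable
simple_fit <- lm(mpg ~ wt, data = mtcars)
print(coef(simple_fit))

# Multiple linear regression: several explanatory variables
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(multiple_fit))
```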
Regressions

How to model
Specify your model like this:

• y ~ xi + ci, where
  • y = outcome variable, xi = main explanatory variables, ci = covariates
• Operators have special meanings
  • + = add terms, : = interactions, / = nesting, and so on…
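
One way to see what a formula expands to is to list its terms. The helper term_labels() and the variable names y, x, c1, a and b are hypothetical; the * operator is not on the slide and is included as a common shorthand.

```r
# Hypothetical helper: return the terms a model formula expands to
term_labels <- function(f) attr(terms(f), "term.labels")

additive    <- term_labels(y ~ x + c1)          # main effects only
interaction <- term_labels(y ~ x + c1 + x:c1)   # main effects plus interaction
crossed     <- term_labels(y ~ x * c1)          # shorthand for the line above
nested      <- term_labels(y ~ a / b)           # b nested within a

print(list(additive, interaction, crossed, nested))
```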
Regressions

How to model
• Modeling is object oriented
  • each modeling procedure produces objects
  • classes and functions for each object
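
A sketch of what this means in practice: the fitted model is stored as an object, and generic functions such as coef() or predict() pick the right method for its class. The models below are arbitrary examples on mtcars.

```r
# Each modeling function returns an object with a class
lm_fit  <- lm(mpg ~ wt, data = mtcars)
glm_fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(class(lm_fit))
print(class(glm_fit))

# Generic functions extract information from the model object
estimates  <- coef(lm_fit)
errors     <- residuals(lm_fit)
prediction <- predict(lm_fit, newdata = data.frame(wt = 3))
print(prediction)
```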
Regressions

Model simplification

• Comparing nested models (anova)
• Stepwise variable elimination (stepAIC, in library MASS)
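
A sketch of both approaches on mtcars, assuming the MASS package is installed; the candidate predictors are chosen arbitrarily.

```r
library(MASS)  # provides stepAIC()

# Compare two nested models with an F test
small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + hp, data = mtcars)
comparison <- anova(small, big)
print(comparison)

# Backward stepwise elimination by AIC, starting from a larger model
full    <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- stepAIC(full, direction = "backward", trace = FALSE)
print(formula(reduced))
```

A small p-value in the anova() table suggests that the extra term in the larger model improves the fit.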
Regressions

Model diagnosis
• plot(): general
• Normality: hist(), qqnorm()/qqline(), shapiro.test()
• Homoscedasticity: plot the standardised residuals against the predicted values; ncvTest() in library(car)
• Linearity: plot each continuous variable against the residuals
    plot(age, res)
    lines(lowess(age, res))
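
A base-R sketch of these checks on an example model; wt takes the place of age from the slide because mtcars has no age variable. ncvTest() is left out because it needs the car package.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- rstandard(fit)            # standardised residuals

# Normality of residuals
hist(res)
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)
print(normality)

# Homoscedasticity: standardised residuals against predicted values
plot(fitted(fit), res, xlab = "Predicted", ylab = "Standardised residual")
abline(h = 0, lty = 2)

# Linearity: residuals against a continuous predictor, with a smoothed trend
plot(mtcars$wt, res)
lines(lowess(mtcars$wt, res))
```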
Regressions

Assessment of individual observations

• Outliers: outlierTest(), qqPlot()
• Leverage: observations with large X-values
• Influential observations: cooks.distance(), influencePlot()
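
A sketch using base R only. outlierTest(), qqPlot() and influencePlot() come from the car package and are not called here; rstudent() and hatvalues() are base-R substitutes not named on the slide, and the cut-offs are common rules of thumb, not fixed standards.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Outliers: studentised residuals (large absolute values are suspicious)
student <- rstudent(fit)

# Leverage: hat values measure how unusual an observation's X-values are
leverage <- hatvalues(fit)
high_leverage <- names(leverage)[leverage > 2 * mean(leverage)]

# Influence: Cook's distance combines residual size and leverage
cooks <- cooks.distance(fit)
influential <- names(cooks)[cooks > 4 / nrow(mtcars)]

print(high_leverage)
print(influential)
```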
Generalized linear models -glm

Model                  Family / random component   Link       Explanatory variables / systematic component
Linear regression      Normal                      Identity   Continuous
ANOVA                  Normal                      Identity   Categorical
Logistic regression    Binomial                    Logit      Mixed
Poisson regression     Poisson                     Log        Mixed
Generalized linear models -glm

?family
o binomial(link = "logit")
o gaussian(link = "identity")
o Gamma(link = "inverse")
o inverse.gaussian(link = "1/mu^2")
o poisson(link = "log")
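
A sketch of two of these families with glm() on mtcars; the outcome and predictor choices are for illustration only.

```r
# Logistic regression: binary outcome (am: 0 = automatic, 1 = manual)
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
print(summary(logit_fit))

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(coef(logit_fit))
print(odds_ratios)

# Poisson regression: count outcome (carb: number of carburetors)
pois_fit <- glm(carb ~ wt, data = mtcars, family = poisson(link = "log"))
print(coef(pois_fit))
```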
For more resources, check out…

R home page
http://www.r-project.org
R discussion group
http://www.stat.math.ethz.ch/mailman/listinfo/r-help

Search Google for R and Statistics


The End
THANK YOU
