
Introduction To R Notes

The document provides an introduction to R, including what R is, its origins and capabilities, how to obtain and install R and RStudio, and basic workflows and conventions for using R. R is an open source programming language and software environment for statistical analysis and graphics. It is widely used among data scientists and researchers for data manipulation, calculation, and creating graphical representations of data.

Uploaded by

amsouseful
Copyright
© © All Rights Reserved

Introduction to R notes

R is a powerful and flexible environment for research computing. Written by Ross Ihaka, Robert
Gentleman (hence the name “R”), the R Core Development Team, and an army of volunteers, R
provides one of the widest ranges of analytical and graphical commands of any software. The fact
that this level of power is available free of charge has dramatically changed the landscape of
research software.
R is a variation of the S language, developed by John Chambers, Rick Becker, and others at Bell
Labs. The Association for Computing Machinery presented John Chambers with a Software
System Award and said that the S language “. . . will forever alter the way people analyze,
visualize, and manipulate data. . . ” and went on to say that it is “. . . an elegant, widely accepted,
and enduring software system, with conceptual integrity. . . .” The original S language is still
commercially available as Tibco Spotfire S+. Most programs written in the S language will run in
R.

There are many good reasons to use R:


• it is open source (free download at http://cran.r-project.org/),
• it is available for MS Windows, Mac OS X, and Linux,
• it is expandable with more than 3000 packages (also free to download),
• it is expandable to any kind of new method we might want to implement (possibility of
building our own functions),
• there is a huge amount of shared information and experience about R on the web,
• ... many more... !

The R Environment
R is three things: a project, a language, and a software environment. As a project, R is part
of the GNU free software project (www.gnu.org), an international effort to share software
on a free basis. R is distributed under the GNU General Public License, so it does not cost
the user anything to use. The development and licensing of R are done under the philosophy
that software should be free and not proprietary. This is good for the user, although there
are some disadvantages, mainly that “R is free software and comes with ABSOLUTELY NO
WARRANTY.” This statement appears on the screen every time you start R. There is no
software company with a quality control team regulating R as a product.

The R project is largely an academic endeavor, and most of the contributors are statisticians.
The R project was started in 1995 by a group of statisticians at the University of Auckland and
has continued to grow ever since. Because statistics is a cross-disciplinary science, R has
appealed to academic researchers in various fields of applied statistics. R users are found in
many niches, including environmental statistics, econometrics, medical and public health
applications, and bioinformatics, among others. The URL for the R project is
http://www.r-project.org/.

As a language, R is a dialect of the S language, an object-oriented
statistical programming language developed in the late 1980s by AT&T’s Bell Labs.

A flexible environment
R is a great tool for data manipulation, calculation and graphical display:
• the data handling is easy and effective,
• there are many built in functions that allow data manipulation (from simple operations to
complex modeling),
• graphical facilities for data analysis are very well developed,
• the programming language (called “S”) is simple and effective at the same time,
• documentation on how to exploit the many R functionalities can be found on the web
(guides, package user manuals, ...).

Obtaining and Installing R


The most recent version of R for all operating systems is always available from
http://www.r-project.org/index.html. Go directly to http://lib.stat.cmu.edu/R/CRAN/, and download the R
version for your operating system. Then, install R.

To operate R, you should rely on writing R scripts. We will write these scripts in RStudio.
Download RStudio from http://www.rstudio.org. Then, install it on your computer. Some text
editors also offer integration with R, so that you can send code directly to R. RStudio is
generally the best solution for running R and maintaining a reproducible workflow.

Opening RStudio

Upon opening for the first time, RStudio will look like the figure below.
The window on the left is named “Console”. The point next to the blue “greater than” sign > is the
“command line”. You can tell R to perform actions by typing commands into this command line.

Once a script is open, the console moves to the bottom left panel. Here you can type code directly to be sent to R.
The top left is called the R Script, and is basically a text editor that color-codes your code and sends
commands easily to R. Using a separate R Script is nice because you can save only the code that
works, making it easy to rerun and edit in the future, as opposed to the R console, in which you
would also have to save all your mistakes and all the output. We recommend always saving your
R Scripts so you have the commands easily accessible and editable for future use. Code can be
sent from the R Script to the console either by highlighting it and clicking the Run button, or else by
pressing CTRL+ENTER with the cursor on the line. Different R Scripts can be saved in different tabs.

The top right is your Workspace and is where you will see objects (such as datasets and variables).
Clicking on the name of a dataset in your workspace will bring up a spreadsheet of the data.

The bottom right serves many purposes. It is where plots will appear, where you manage your files
(including importing files from your computer), where you install packages, and where the help
information appears. Use the tabs to toggle back and forth between these screens as needed.

The R Console
When we open R we get a simple window (the R console, in red color) with the standard prompt
“>”. Getting started with R is very straightforward: all we have to do
is interact with the prompt, which is there waiting for us to type in our input commands.
Everything can be done by typing commands on this console. Some things can also be done by
clicking on the menu bar (setting the working directory, installing packages, ...).

Quitting R
We can quit R by:
• clicking on the x in the R window.
• or typing the command q() on the R console.

Workflows and conventions


There are many resources on how to structure your R workflow, and I encourage you to
search for and maintain a consistent approach to working with R. It will make your life much,
much easier with regard to collaboration, replication, and general efficiency. A few really
important points that you might want to consider as you start using R:
• Never type commands into the R command line. Always use a script file, from which you can
send (via RStudio, Emacs, ...) commands to R, or at least copy and paste them into R.

• Comment your script files! Comments are indicated by the # sign:

> # This is a comment

• Save your script files in a project-specific working directory.

Error messages
R will return error messages when a command is incorrect or when it cannot execute a
command. Often, these error messages are informative by themselves. You can often get more
information by simply searching for an error message on the web. Here, I try to add 1 and the
letter a, which does not (yet) make sense:
> 1 + a
## Error in eval(expr, envir, enclos): object 'a' not found
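
When a script should keep running after a failed command, the error can be caught instead of stopping the run. The following is a minimal sketch using base R's tryCatch function; the helper environment clean_env is an assumption added here so that the object a is guaranteed to be undefined:

```r
# Evaluate in a clean environment so that 'a' is guaranteed to be undefined
clean_env <- new.env(parent = baseenv())

# tryCatch() runs the expression and passes any error to the handler
msg <- tryCatch(
  eval(quote(1 + a), envir = clean_env),
  error = function(e) conditionMessage(e)  # keep only the message text
)

# The message can now be inspected, logged, or searched for on the web
print(msg)
```

The text stored in msg is the part of the error message that is usually worth pasting into a web search.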

As your code becomes more complex, you may forget to complete a particular command.
For example, here I want to add 1 and the product of 2 and 4. But I forget to close the
parentheses around the product:
> 1 + (2 * 4
+ )
## [1] 9
You will notice that the little > on the left changes into a +. This means that R is offering you a
new line to finish the original command. If I type a right parenthesis, R returns the result of my
operation.

Setting Your Working Directory


Your working directory is the location R uses to retrieve or store files, if you do not otherwise
specify the full path for filenames. On Windows, the default working directory is My
Documents. On Windows XP or earlier, that is C:\Documents and Settings\username\My
Documents. On Windows Vista or later, that is C:\Users\Yourname\My Documents. On
Macintosh, the default working directory is /Users/username.

The getwd function will tell you the current location of your working directory:

> getwd()

On any operating system, you can change the working directory with the setwd function.
Simply provide the full path between the quotes:

setwd("/myRfolder")

> setwd("D:/Ngari_drive D/Moses/Teaching/TUM/Statistical_modeling/Data")
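
A common pattern is to remember the current working directory before changing it, so that it can be restored afterwards. This is a small sketch, not a required workflow; tempdir() stands in for a real project folder and example.txt is a made-up file name:

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# tempdir() is used here only as a stand-in for a real project folder
setwd(tempdir())
print(getwd())

# Files named without a path are now written to and read from that folder
writeLines("hello", "example.txt")
print(file.exists("example.txt"))

# Clean up and return to the original directory
file.remove("example.txt")
setwd(old_wd)
```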

Note that R uses the forward slash “/” in paths, even on computers running Windows.
That is because within strings, R uses “\t,” “\n” and “\\” to represent the single characters
tab, newline, and backslash, respectively. In general, a backslash followed by another
character may have a special meaning.
So when using R on Windows, always specify the paths with either a single forward slash or
two backslashes in a row.
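
The escape rules described above can be checked directly in R. The sketch below uses a hypothetical Windows path (the folder and file names are made up) to show that the two spellings describe the same location:

```r
# Inside a string, "\t" and "\\" each stand for a single character
print(nchar("\t"))   # a tab
print(nchar("\\"))   # one backslash

# Two equivalent ways to write the same (hypothetical) Windows path
path_forward  <- "C:/Users/Yourname/Documents/data.csv"
path_backward <- "C:\\Users\\Yourname\\Documents\\data.csv"

# Converting backslashes to forward slashes shows they name the same file
converted <- gsub("\\", "/", path_backward, fixed = TRUE)
print(converted == path_forward)

# file.path() builds a path from its parts using forward slashes
built <- file.path("C:", "Users", "Yourname", "Documents", "data.csv")
print(built)
```

Building paths with file.path avoids typing separators by hand, which is one way to sidestep the backslash problem entirely.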

Set a default working directory

The default working directory is the folder RStudio starts in every time you open it. You can
change the default working directory from the RStudio menu under: Tools –> Global options –>
click on “Browse” to select the default working directory you want.

Read more here: http://www.sthda.com/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming

Using RStudio Projects

RStudio projects make it straightforward to divide your work into multiple contexts, each with
their own working directory, workspace, history, and source documents.

Creating Projects

RStudio projects are associated with R working directories. You can create an RStudio project:

• In a brand new directory


• In an existing directory where you already have R code and data
• By cloning a version control (Git or Subversion) repository

To create a new project in the RStudio IDE, use the Create Project command (available on the
Projects menu and on the global toolbar):

When a new project is created RStudio:

1. Creates a project file (with an .Rproj extension) within the project directory. This file contains
various project options (discussed below) and can also be used as a shortcut for opening the
project directly from the filesystem.
2. Creates a hidden directory (named .Rproj.user) where project-specific temporary files (e.g. auto-
saved source documents, window-state, etc.) are stored. This directory is also automatically
added to .Rbuildignore, .gitignore, etc. if required.
3. Loads the project into RStudio and displays its name in the Projects toolbar (which is located on
the far right side of the main toolbar).

Working with Projects


Opening Projects

There are several ways to open a project:

1. Using the Open Project command (available from both the Projects menu and the Projects
toolbar) to browse for and select an existing project file (e.g. MyProject.Rproj).
2. Selecting a project from the list of most recently opened projects (also available from both the
Projects menu and toolbar).
3. Double-clicking on the project file within the system shell (e.g. Windows Explorer, OSX Finder,
etc.).

When a project is opened within RStudio the following actions are taken:

• A new R session (process) is started


• The .Rprofile file in the project's main directory (if any) is sourced by R

• The .RData file in the project's main directory is loaded (if project options indicate that it should
be loaded).
• The .Rhistory file in the project's main directory is loaded into the RStudio History pane (and
used for Console Up/Down arrow command history).
• The current working directory is set to the project directory.
• Previously edited source documents are restored into editor tabs
• Other RStudio settings (e.g. active tabs, splitter positions, etc.) are restored to where they were
the last time the project was closed.

Quitting a Project

When you are within a project and choose to either Quit, close the project, or open another
project the following actions are taken:

• .RData and/or .Rhistory are written to the project directory (if current options indicate they
should be)
• The list of open source documents is saved (so it can be restored next time the project is
opened)
• Other RStudio settings (as described above) are saved.
• The R session is terminated.

Getting Started with RStudio


Basic Commands

Entering Commands
Commands can be entered directly into the R console (bottom left), following the > prompt, and
sent to the computer by pressing enter. For example, typing 1 + 2 and pressing enter will output
the result 3:

> 1+2
[1] 3

Your entered code always follows the > prompt, and output always follows a number in square
brackets. Each command should take its own line of code, or else a line of code should be
continued with { }. It is possible to press enter before the line of code is completed, and often R
will recognize this. For example, if you were to type 1 + but then press enter before typing 2,
R knows that 1+ by itself doesn’t make any sense, so prompts for you to continue the line with
a + sign. At this point you could continue the line by pressing 2 then enter. This commonly
occurs if you forget to close parentheses or brackets. If you keep pressing enter and keep seeing a
+ sign rather than the regular > prompt that allows you to type new code, and if you can’t figure
out why, often the easiest option is to simply press ESC, which will get you back to the normal >
prompt and allow you to enter a new line of code.

You can also enter this code into the R Script and run it from there. Create a new R Script with File
> New > R Script. Now you can type in the R Script (top left), and then send your code to the
console either by clicking the Run button or by pressing CTRL+ENTER. Try typing 1+2 in the R Script
and sending it to the console.

Capitalization and punctuation need to be exact in R, but spacing doesn’t matter. If you get errors
when entering code, you may want to check for these common mistakes:

- Did you start your line of code with a fresh prompt (>)? If not, press ESC.
- Are your capitalization and punctuation correct?
- Are all your parentheses and brackets closed? For every forward (, {, or [, make sure there is a
corresponding backwards ), }, or ].
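
The points about capitalization and spacing can be tried out directly. This is a small illustration with made-up object names; it also shows one caveat, namely that a space placed inside the assignment arrow <- changes the meaning of the command:

```r
# Names that differ only in capitalization are different objects
total <- 10
Total <- 25
print(total == Total)

# Spacing around operators does not change the result
print(1+2 == 1 + 2)

# Caveat: a space inside the arrow turns an assignment into a comparison
x <- 5
print(x < -3)   # asks "is x less than -3?" and leaves x unchanged
print(x)
```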

The basic arithmetic commands are pretty straightforward. For example, 1 + (2*3) would return
7. You can also name the result of any command with a name of your choosing with =. For
example
x = 3*4
sets x to equal the result of 3*4, or equivalently sets x = 12. The choice of x is arbitrary - you can
name it whatever you want. If you type x into the console now you will see 12 as the output:
> x
[1] 12
Naming objects and arithmetic works not just with numbers, but with more complex objects like
variables. To get a little fancier, suppose you have variables called Weight (measured in
pounds) and Height (measured in inches), and want to create a new variable for body mass
index, which you decide to name BMI. You can do this with the following code:
BMI = Weight/(Height^2) * 703

If you want to create your own variable or set of numbers, you can collect numbers together into
one object with c( ) and the numbers separated by commas inside the parentheses. For
example, to create your own variable Weight out of the weights 125, 160, 183, and 137, you
would type
Weight = c(125, 160, 183, 137)
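
Putting the two ideas together, here is a short worked example of the BMI calculation. The weights are the ones given in the text; the heights are made-up values added only so that the code runs:

```r
# Weights in pounds (from the text) and heights in inches (made-up values)
Weight <- c(125, 160, 183, 137)
Height <- c(64, 70, 72, 66)

# Arithmetic on vectors works element by element, giving one BMI per person
BMI <- Weight / (Height^2) * 703

# Round to one decimal place for display
print(round(BMI, 1))
```

Because Weight and Height have the same length, the first weight is paired with the first height, the second with the second, and so on.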
To get more information on any built-in R commands, simply type ? followed by the command
name, and this will bring up a separate help page.

R packages
Many useful and important functions in R are provided via packages that need to be installed
separately.
You can do this by using the Package Installer in the menu (Packages & Data > Package
Installer in R or Tools > Install Packages... in RStudio), or by typing

> install.packages("foreign")

in the R command line. Next, every time you use R, you need to load the packages you want to
use: type

> library(foreign)

in the R command line.

If you prefer to use a function instead of the menus, you can use the install.packages function.
For example, to download and install Frank Harrell’s Hmisc package, start R and enter this
function call:

install.packages("Hmisc", dependencies=TRUE)

The argument dependencies=TRUE tells R to install any packages that this package “depends” on
and those that its author “suggests” as useful. R will then prompt you to choose the closest mirror
site and the package you need.

install.packages("foreign")
install.packages("xlsx")
install.packages("dplyr")
install.packages("reshape2")
install.packages("ggplot2")
install.packages("GGally")
install.packages("vcd")

You can also install packages from the menu (bottom right-hand side, select the Packages tab).

Note: you need to be connected to the internet to install packages (they are stored online).
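
To avoid reinstalling packages that are already on your computer, you can first check which ones are missing. The following is one possible sketch using base R functions; the variable names wanted and missing_pkgs are made up for this example, and the install step is left commented out because it needs an internet connection:

```r
# Packages we would like to have (a few names from the list above)
wanted <- c("foreign", "dplyr", "ggplot2")

# Keep only those that are not installed yet
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(wanted, installed)
print(missing_pkgs)

# Install the missing ones; commented out because it needs an
# internet connection and changes your R library
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```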

Loading an Add-on Package


Once installed, a package is on your computer’s hard drive but not quite ready to use. Each time
you start R, you also have to load the package from the library before you can use it. You can see
what packages are installed and ready to load with the library function.

library()

That causes the window in the Fig. below to appear, showing the packages you have installed.
The similar installed.packages function lists your installed packages along with the version and
location of each.
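
As a brief illustration of installed.packages, the sketch below pulls out the name, version, and location columns. It uses the built-in stats package for the single lookup, since that package is present in every R installation:

```r
# installed.packages() returns a character matrix, one row per package
ip <- installed.packages()

# Show name, version, and library location for the first few packages
print(head(ip[, c("Package", "Version", "LibPath")], 5))

# Look up the version of a single package, here the built-in 'stats'
print(ip["stats", "Version"])

# packageVersion() reports the same information for one package
print(packageVersion("stats"))
```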

You can then load a package you need with the menu selection, Packages >Load packages. It will
show you the names of all packages that you have installed but have not yet loaded. You can
then choose one from the list. Alternatively, you can use the library function. Here I am loading
the Hmisc package. Since the Linux version lacks menus, this function is the only way to load
packages.

library("Hmisc")

Many packages load without any messages; you will just see the “>” prompt again. When trying
to load a package, you may see the error message below. It means you have either mistyped the
package name (remember capitalization is important) or you have not installed the package
before trying to load it. In this case, Jim Lemon and Philippe Grosjean’s prettyR package name is
typed accurately, so the cause must be that we have not yet installed it.

> library("prettyR")
Error in library(prettyR) :
there is no package called ’prettyR’

To see what packages you have loaded, use the search function.
> search()
[1] ".GlobalEnv" "package:foreign" "package:boot"
[4] "package:Hmisc" "package:ggplot2" "package:Formula"
[7] "package:survival" "package:splines" "package:lattice"
[10] "package:grid" "package:stats" "package:graphics"
[13] "package:grDevices" "package:utils" "package:datasets"
[16] "package:methods" "Autoloads" "package:base"

Standard packages
The standard (or base) packages are considered part of the R source code. They contain the
basic functions that allow R to work, and the datasets and standard statistical and graphical
functions. They should be automatically available in any R installation.

Contributed packages and CRAN


There are thousands of contributed packages for R, written by many different authors. Some
of these packages implement specialized statistical methods, others give access to data or
hardware, and others are designed to complement textbooks. Some (the recommended
packages) are distributed with every binary distribution of R. Most are available for
download from CRAN (http://CRAN.R-project.org/ and its mirrors) and other repositories
such as Bioconductor (http://www.bioconductor.org/) and Omegahat (http://www.omegahat.org/). The R FAQ
contains a list of CRAN packages current at the time of release, but the collection of available
packages changes very frequently.

Since there are so many packages written by users, two packages will occasionally have functions
with the same name. That can be very confusing until you realize what is happening. For example,
the Hmisc and prettyR packages both have a describe function that does similar things. In such a
case, the package you load last will mask the function(s) in the package you loaded earlier. For
example, I loaded the Hmisc package first, and am now loading the prettyR package (having
installed it in the meantime). The following message results:

> library("Hmisc")
> library("prettyR")
Attaching package: ’prettyR’

The following object(s) are masked from package:Hmisc :

describe

You can avoid such conflicts by detaching each package with the detach function as soon as you
are done using it. For example, the following function call will detach the prettyR package:

detach("package:prettyR")

One approach that avoids conflicts is to load a package from the library right before using it and
then detach it immediately as in the following example:

> library("Hmisc")
> describe(mydata)

---output would appear here---


> detach("package:Hmisc")
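
Another way to sidestep masking, not covered in the text above, is the :: operator, which calls a function from a named package without relying on the search order. The sketch below uses only base packages so that it runs anywhere; the user-defined median is a deliberately misleading stand-in for a masking function:

```r
# A user-defined function that masks stats::median (for illustration only)
median <- function(x) "masked version"

print(median(c(1, 5, 9)))          # calls the masking function
print(stats::median(c(1, 5, 9)))   # calls the original in package 'stats'

# Remove the masking function so the original is found again
rm(median)
print(median(c(1, 5, 9)))
```

With both Hmisc and prettyR installed, the same idea would read Hmisc::describe(mydata) or prettyR::describe(mydata), which makes it explicit which version is meant.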

Updating Your Installation


To check for updates to your installed packages, run the update.packages function in R.

> update.packages()

The R language
The R language is very simple, and easy to understand and use. It is case sensitive (upper-case and
lower-case letters represent different objects). We work by typing lines of instructions at the
prompt. These instructions can be broadly divided into two kinds (very important distinction!):

• expressions: they are just evaluated, printed, and the value is then lost,
• assignments: an expression is evaluated, its value is not printed but assigned to an object,
which is automatically saved in the workspace.

We can evaluate 2+2 as an expression, or assign its value to an object (in the following, the R
Console output will be considered):

> ##### Expressions vs assignments
>
> ### this is an expression
> 2+2
[1] 4
>
> ### this is an assignment
> a = 2+2 # or a <- 2+2
> a
[1] 4
>
> ### this is an assignment split in two rows
> ### (notice that R asks for the rest of the command,
> ### showing "+" instead of ">")
> b = (2+2)*(3-
+ 2)
> b
[1] 4

We can also recall and correct previous commands directly from the console, by pressing the up
arrow (the previous command(s) will appear after the prompt). Any expression that we want to
evaluate can be of three types:
• our own written expression (2+2 or (2+2)*(3-2)),
• a function of R, either built in or created by us (mean() or log() or myfunction()),
• a mix of the previous two (2+2+log(2)).

The functions available in R are built by putting together expressions and assignments (we will
see how), and returning an output. The output is printed if the function is called as an expression
and is not printed if the function is called as an assignment. For instance:
> log(2)
[1] 0.6931472
> 2+2+log(2)
[1] 4.693147
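
To make the idea of a user-written function concrete, here is a minimal sketch. The function myfunction and what it computes are invented for this illustration; only log() is a built-in R function:

```r
# A hypothetical user-written function: adds 2 to x, then takes the log
myfunction <- function(x) {
  y <- x + 2   # assignment inside the function: nothing is printed
  log(y)       # the last expression is the value the function returns
}

myfunction(2)        # called as an expression: the value is printed
z <- myfunction(2)   # called as an assignment: nothing is printed
print(z)             # the stored value is the same, log(4)
```

The body combines one assignment and one expression, which is the pattern described above: the assignment stores an intermediate value and the final expression supplies the output.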
