R Studio Basics
www.jcu.edu.au/students/learning-centre
1. Setting Up R Studio
The R Studio window is divided into the following panes:
① The Console: commands will be run and the results output here. (R scripts and dataframes will also be displayed in the top half of this box.)
② The Environment: an organised list of the variables you have created will appear here.
③ Your Plots: your graphs and plots will be displayed here. The same pane holds three other useful tabs:
• Files: shows the computer’s file system
• Packages: shows R packages installed or available to install
• Help: provides help and documentation for all commands
④ Your Script: on creating a new R script (see Using R Scripts in Section 2), the script pane will open. You write your R code here and it will be run in the console.
Start a New Project in R
When starting to work with a new dataset, a New Project should be created.
Creating a New Directory makes a default working directory and a logical place to store all
associated files such as raw data spreadsheets.
Any associated Excel documents or text files can be saved into this new folder and easily accessed from within R. You can then perform data analysis or produce visualisations with your imported data.
Directions:
1. Open R Studio
2. Create a New Project (File > New Project) using a New Directory in a location of your choosing
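Once the project is open, its folder becomes the working directory. A quick way to check this from the console, using two base R functions:

```r
# Run these in the console once the new project has opened
getwd()        # prints the path of the working directory (the project folder)
list.files()   # lists the files saved in that folder, eg. your .csv files
```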
2. Basic Commands in R Studio
Run a Command
Base R Cheat Sheet: https://www.rstudio.com/resources/cheatsheets/
Commands in R can be entered directly into the console at the bottom left of the screen.
Entering 3 or more characters of a command into the console or a script will open the suggested command menu. This menu suggests the commands or variable names you may be intending to type, alongside a description and suggested use. Entering ? followed by any R command will open a help page in the Help tab in the bottom right hand corner of the screen (eg. ?log10 opens the log10 help page). The help page describes the settings (arguments) and usage of the command, and usually includes examples.
On completing a command and pressing Enter, R will immediately run the code, print the output and move to a new line. Pressing the ↑ key recalls the last command entered into the console so that it can be run again or edited first.
At its heart, R is a calculator and can evaluate any mathematical calculation entered directly into the console, using operators such as + (add), - (subtract), * (multiply), / (divide) and ^ (power), and functions such as log10( ) and sum( ). For example, try:
1. log10(90+5*2)
2. sum(50, 3, 5)
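Each of these lines can be typed straight into the console. The results in the comments follow standard arithmetic, and the last two lines are the log10 and sum examples:

```r
5 + 3              # addition: 8
10 - 4             # subtraction: 6
6 * 7              # multiplication: 42
10 / 4             # division: 2.5
2 ^ 3              # power: 8
log10(90 + 5 * 2)  # 5 * 2 is done first, so this is log10(100): 2
sum(50, 3, 5)      # adds all of its arguments: 58
```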
Using R Scripts
Commands can be run directly from the console, but creating an R Script allows you to edit
and reuse previous commands and to create more complicated lists of commands. A script
also allows you to save your commands to be reopened later.
Multiple commands can be entered into a script, one after the other across multiple lines.
For example, a script could create a new variable called variable1 with the value “This is a character variable”, then count the number of characters in the variable, and finally return the value of variable1.
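The original screenshot of this script is not reproduced here. A sketch of the same three steps, assuming nchar( ) is used to count the characters and <- to assign the value:

```r
# Create a new character variable
variable1 <- "This is a character variable"
# Count its characters (spaces included): 28
nchar(variable1)
# Return the value stored in variable1
variable1
```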
To run the script one line at a time, place the cursor on the appropriate line and press CTRL + Enter. To run all commands in the script from the start, press CTRL + Shift + Enter.
Directions:
1. Create a new script (File > New File > R Script)
(Source: https://sydney-informatics-hub.github.io/lessonbmc/02-BMC_R_Day1_B/index.html)
[Figure: a variable assignment, with the variable name and its value labelled]
Creating a Vector
A variable can hold one value or many values. A vector is used to store more than one value
of the same type. The combine function, c() (type ?c in the console for further information), joins the values into a single list of values.
Example:
A class of 10 students have been surveyed and their heights recorded:
150cm, 150cm, 142cm, 154cm, 168cm, 153cm, 151cm, 153cm, 142cm and 151cm
To begin analysing this data in R, the values must be stored in a variable. In the following,
the variable height has been created to contain our values:
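The code for this step is not reproduced in the text. A minimal reconstruction, using the ten heights listed above and the c( ) function:

```r
# Store the ten recorded heights (in cm) in a numeric vector called height
height <- c(150, 150, 142, 154, 168, 153, 151, 153, 142, 151)
# Typing the variable name prints all of its values
height
```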
The vector can then be recalled using the variable name height. Vectors for colour (text strings) and survey (logical values) have also been created.
To recall a single value from the vector, use the variable name followed by the position of the value in the list, surrounded by [ ] brackets (eg. height[3] returns 142).
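A short sketch of position-based recall on the height vector. Passing several positions inside c( ) returns several values at once:

```r
height <- c(150, 150, 142, 154, 168, 153, 151, 153, 142, 151)
height[3]         # the 3rd value: 142
height[c(1, 5)]   # the 1st and 5th values: 150 168
```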
Useful statistical analysis can be performed on a single set of values stored in a vector:
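A few base R summary functions applied to the height vector. This is a sketch; the original handout may have shown a different selection of functions:

```r
height <- c(150, 150, 142, 154, 168, 153, 151, 153, 142, 151)
length(height)   # number of values: 10
mean(height)     # average height: 151.4
median(height)   # middle value: 151
range(height)    # smallest and largest values: 142 168
sd(height)       # standard deviation
summary(height)  # minimum, quartiles, median, mean and maximum together
```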
Combining Vectors into a Dataframe
A dataset will often contain more than 1 variable of interest. To combine vectors of the same length into a table or “dataframe” (similar to an Excel table), we use the data.frame() command. Here, the 3 vectors colour, height and survey, containing 10 values each, have been combined into a dataframe and assigned to the variable table.
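A sketch of this step. The height values come from the survey above, but the text does not list the colour and survey values (it only shows that the 2nd colour is green), so the ones below are hypothetical placeholders:

```r
height <- c(150, 150, 142, 154, 168, 153, 151, 153, 142, 151)
# Hypothetical values: only colour[2] = "green" is given in the text
colour <- c("blue", "green", "red", "blue", "red", "green", "blue", "red", "green", "blue")
survey <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
# Combine the three equal-length vectors into one dataframe
table <- data.frame(colour, height, survey)
table
```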
[Figure: the dataframe table, with its columns, header row and values labelled]
The top line of the dataframe is called the header row and contains a descriptive name for
the values in each column.
Below the header row, values can be referred to individually or in whole rows and columns.
To retrieve a single value from a dataframe assigned to a variable, the variable name is used
then followed by the coordinates of the value within [row, column] square brackets (eg.
table[2,1] returns green and table[3,2] returns 142). Whole rows can be returned
individually using coordinates (eg. table[3,] returns all values for row 3). Single columns can
be returned by either referring to them using a coordinate with the row blank (eg. table[,3]
returns all values in 3rd column), or by using the variable name followed by a $ and the
column header (eg. table$height returns all “height” values from the dataframe “table”).
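The lookups above, written out as a sketch. The table is rebuilt first so the block runs on its own, with the same hypothetical colour and survey values as in the data.frame() example:

```r
height <- c(150, 150, 142, 154, 168, 153, 151, 153, 142, 151)
colour <- c("blue", "green", "red", "blue", "red", "green", "blue", "red", "green", "blue")  # hypothetical
survey <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)                # hypothetical
table <- data.frame(colour, height, survey)

table[2, 1]    # row 2, column 1: "green"
table[3, 2]    # row 3, column 2: 142
table[3, ]     # every value in row 3
table[, 3]     # every value in column 3 (survey)
table$height   # every value in the "height" column
```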
Note: You can find out what type of data is stored within a variable or a column in a
dataframe using the class( ) function.
Some examples of classes are: “matrix”, “data.frame”, “array”, “factor”, “numeric”, “character” and “logical”.
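A quick sketch of class( ) on a few short example vectors (the values here are illustrative only):

```r
height <- c(150, 150, 142)
survey <- c(TRUE, FALSE, TRUE)
class(height)                          # "numeric"
class(survey)                          # "logical"
class("This is a character variable")  # "character"
class(data.frame(height, survey))      # "data.frame"
```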
Directions:
1. Create vectors
2. Create a dataframe
3. Use coordinates to find cells
4. Use coordinates to find rows/columns
5. Use variable names
6. Use coordinates to produce a table WITHOUT a row or column
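Direction 6 can be done with negative coordinates, which leave a row or column out instead of selecting it. A sketch, rebuilding the table with the same hypothetical colour and survey values as before:

```r
height <- c(150, 150, 142, 154, 168, 153, 151, 153, 142, 151)
colour <- c("blue", "green", "red", "blue", "red", "green", "blue", "red", "green", "blue")  # hypothetical
survey <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)                # hypothetical
table <- data.frame(colour, height, survey)

table[-3, ]   # the table without row 3
table[, -2]   # the table without column 2 (height)
```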
Importing an Excel File into R
In Excel, simplify your table to basic column titles with no unnecessary data, as R will attempt to read everything in the sheet into columns. Save your Excel file as *.csv (MS-DOS) into your R Project folder. In R Studio, use the read.csv function with your file name in quotation marks (“ ”) to read it:
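With a real file you would write something like mydata <- read.csv("yourfile.csv"), where yourfile.csv and mydata are hypothetical names. To keep the sketch runnable on its own, it first writes a small example CSV to a temporary file:

```r
# Create a small example CSV (stands in for a file saved from Excel)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("colour,height,survey",
             "blue,150,TRUE",
             "green,150,FALSE",
             "red,142,TRUE"), csv_file)
# Read it into a dataframe; with your own file use read.csv("yourfile.csv")
mydata <- read.csv(csv_file)
mydata
```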
The read.csv function can include settings or parameters that may need to be set for the file to be read correctly (see ?read.csv for more info). Parameters are entered after the file name and separated by commas. Some of the more useful parameters are:
• header: does the first row of your table contain the header row? (header = TRUE or header = FALSE)
• dec: does your data use a “.” or a “,” for decimals? (dec = "." or dec = ",")
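A sketch of these parameters on a hypothetical file with no header row and “,” as the decimal mark. Such files usually separate columns with “;”, so the sep parameter is also set here (an assumption about the file layout):

```r
csv_file <- tempfile(fileext = ".csv")
writeLines(c("blue;150,5", "green;149,0", "red;142,3"), csv_file)
# No header row, ";" between columns, "," for decimals
mydata <- read.csv(csv_file, header = FALSE, sep = ";", dec = ",")
mydata   # without a header row, R names the columns V1 and V2
```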
While numeric variables generally cause few problems, logical and factor variables often do have issues because R decides each variable’s type itself when importing the file.
• Problem: One of the logical values is in lower case (eg. TRUE, TRUE,
false, FALSE), so all values are interpreted as a character type variable
(“TRUE”, “TRUE”, “false”, “FALSE”). R interprets logical variables only if
all values are in CAPITAL letters. This prevents any analysis that relies on
a comparison between the TRUE and FALSE values.
Solution: The problem is most easily fixed in Excel before the data is imported. In R, the values can be converted to capitals with the toupper( ) function and then to logical values with as.logical( ).
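A sketch of the conversion, assuming a hypothetical dataframe whose eaten column was read in as character because of the lower-case value:

```r
# Hypothetical imported data: "eaten" is a character column
mydata <- data.frame(eaten = c("TRUE", "TRUE", "false", "FALSE"))
class(mydata$eaten)   # "character"
# Capitalise every value, then turn the text into logical values
mydata$eaten <- as.logical(toupper(mydata$eaten))
class(mydata$eaten)   # "logical"
```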
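The package ggplot2 simplifies data visualisation. You provide the data, how the variables should be interpreted, and graphical options if required, while ggplot2 handles the details. The package must first be installed with install.packages("ggplot2") (only needed once) and then loaded in each session with library(ggplot2). To load the data that you would like to visualise, the ggplot(data = yourdata) function is used.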
The next step is choosing a visualisation type or “geom”. This is added after the ggplot function, and the two are separated by a +. There are many different types of graph, and the choice of which to use depends on your data. The most common types are geom_histogram for a histogram, geom_bar for a bar chart, geom_boxplot for a box and whisker plot, geom_point for a scatterplot, and geom_smooth for a line of best fit or fitted curve. Many of these graphs can be overlaid with other visualisations, and each requires you to define the variable/s to display, usually inside the aes( ) function. Many also take extra parameters, all of which are described in the help documentation (eg. ?geom_boxplot).
The following visualisation examples will use the following dataframe:
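The example dataframe itself is not reproduced in the text. As a hypothetical stand-in, a dataframe with the columns named in the examples (colour, weight, taste, eaten) plus an assumed numeric column, length, so that there are two numeric variables for a scatterplot. All values are invented for illustration:

```r
# Hypothetical values; "length" is an assumed extra numeric column
mydata <- data.frame(
  colour = c("red", "green", "red", "yellow", "green", "yellow", "red", "green"),
  weight = c(152, 138, 160, 121, 145, 118, 157, 141),
  length = c(7.9, 7.1, 8.3, 6.4, 7.5, 6.2, 8.0, 7.3),
  taste  = c("sweet", "sour", "sweet", "sweet", "sour", "sour", "sweet", "sour"),
  eaten  = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)
str(mydata)   # lists each column with its type
```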
A box and whisker plot representing the “weight” for each “colour”:
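A sketch of this plot with ggplot2, using hypothetical colour and weight values. The categorical variable goes on the x axis and the numeric one on the y axis, giving one box per colour:

```r
library(ggplot2)
# Hypothetical data with the two columns this plot needs
mydata <- data.frame(
  colour = c("red", "green", "red", "yellow", "green", "yellow", "red", "green"),
  weight = c(152, 138, 160, 121, 145, 118, 157, 141)
)
p <- ggplot(data = mydata, aes(x = colour, y = weight)) +
  geom_boxplot()
p   # draws the plot in the Plots pane
```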
A scatterplot of 2 variables, and the same scatterplot with a line of best fit added using a linear model:
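A sketch of both steps, with hypothetical weight and length values. geom_point draws the scatterplot of the 2 variables, and geom_smooth(method = "lm") overlays the line of best fit from a linear model. The shaded band is a confidence interval, which se = FALSE would remove:

```r
library(ggplot2)
# Hypothetical data: two numeric variables
mydata <- data.frame(
  weight = c(152, 138, 160, 121, 145, 118, 157, 141),
  length = c(7.9, 7.1, 8.3, 6.4, 7.5, 6.2, 8.0, 7.3)
)
p <- ggplot(data = mydata, aes(x = length, y = weight)) +
  geom_point() +               # the scatterplot
  geom_smooth(method = "lm")   # adds the linear line of best fit
p
```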
A bar chart representing the frequency of “taste” and the ratio of “eaten” for each:
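A sketch with hypothetical taste and eaten values. geom_bar counts the rows for each taste, and mapping eaten to fill splits each bar by whether the item was eaten. Adding position = "fill" inside geom_bar( ) would show each bar as proportions instead of counts:

```r
library(ggplot2)
# Hypothetical data with the two columns this plot needs
mydata <- data.frame(
  taste = c("sweet", "sour", "sweet", "sweet", "sour", "sour", "sweet", "sour"),
  eaten = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)
p <- ggplot(data = mydata, aes(x = taste, fill = eaten)) +
  geom_bar()   # bar height = number of rows for each taste
p
```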