R Basics and Essentials
https://moderndive.netlify.app/1-getting-started.html#r-rstudio
What is R and RStudio?
Why R?
● R is free to install, use, update, clone, modify, redistribute, and even sell.
● It works on all major operating systems.
● It has a strong package ecosystem and powerful charting tools.
● Clean, analyse, plot, and communicate, all from the same place
● Keep track of your steps -> Reproducibility
● Reduce processing time -> Automation
● Everything you do in the analysis, from deleting outliers to interpreting
results, is contained in your code
● State-of-the-art graphics
● Maintained by a large community: great support!
● Most tools for digital soil mapping are written in R
Additional learning material
Using R for Digital Soil Mapping
https://link.springer.com/book/10.1007/978-3-319-44327-0
Installation
1. Download and install R and Rtools (Rtools is only needed on Windows)
https://cloud.r-project.org/
https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html
General workflow in R
https://r4ds.had.co.nz/explore-intro.html
RStudio interface
R script:
We write code (scripts) here
Environment:
Shows the saved objects
Console:
The real interface of R: commands are executed here
Files, plots, packages, and help!
Open the script 0.Introduction_to_R.R
Main parts of R syntax
● Functions: mean(), read_csv(), plot()
○ Arguments in functions: mean(x = 1:10), which is the same as mean(1:10)
● Objects
○ Vectors: concatenated values
○ Data frames: tables, like spreadsheets
○ Lists: groups of any kind of objects, like a folder
○ Others
● Types of data in objects
○ Numeric: 1, 2, 3, 4.5, 6.02
○ Character: "a", "b", "1", "this is a character"
○ Factors: characters with levels
● Packages:
○ Add-ons that contain functions for a specific use: "base", "tidyverse", "raster", etc.
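The object and data types listed above can be sketched in a few lines of base R. The object names (ph, soil, stuff, texture) and the values are hypothetical and only serve as illustration.

```r
# A function call with a named and an unnamed argument (same result)
mean(x = 1:10)
mean(1:10)

# Vector: values concatenated with c()
ph <- c(5.5, 6.1, 7.3)

# Data frame: a table with one column per variable
soil <- data.frame(id = c("a", "b", "c"), ph = ph, stringsAsFactors = FALSE)

# List: a container for objects of any kind
stuff <- list(values = ph, table = soil, note = "hypothetical data")

# Types of data
class(ph)        # "numeric"
class(soil$id)   # "character"

# Factor: characters with a fixed set of levels
texture <- factor(c("clay", "sand", "clay"))
levels(texture)  # "clay" "sand"
```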
Main parts of R syntax
● Operators
○ Assign: <- (= also works)
○ Equal: a == b
○ Not equal: a != b
○ Column: $, e.g. dataframe$column_1
○ Subset: []
■ vector[1] is the first element of the vector
■ dataframe[row, column], e.g. dataframe[1, ] is the first row of the dataframe and dataframe[, 1:5] are its
first five columns
■ list[[1]] is the first object of the list
○ Chain functions: %>% (pipe operator)
○ Other symbols with the expected behavior: +, -, *, /, >, <, ^ (power), & (and), | (or)
● R is case sensitive
○ "THIS" is not the same as "this"
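A minimal base R sketch of the operators above. All object names and values are hypothetical.

```r
a <- 3            # assign
b <- 5
a == b            # FALSE
a != b            # TRUE

v <- c(10, 20, 30)
v[1]              # first element of the vector: 10

df <- data.frame(col_1 = 1:3, col_2 = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
df$col_1          # one column, selected by name
df[1, ]           # first row
df[, 1:2]         # first two columns

l <- list(v, df)
l[[1]]            # first object of the list (the vector v)

"THIS" == "this"  # FALSE, because R is case sensitive
```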
Other logical operators
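● %in% allows you to verify whether an element is part of another object
● Unlike ==, which compares values position by position, %in% returns TRUE if the element appears anywhere in the other object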
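The difference between == and %in% can be seen with a small vector. The soil type names are hypothetical example values.

```r
soil_types <- c("Cambisol", "Luvisol", "Vertisol")

# == compares the value with every element, one by one
"Luvisol" == soil_types                  # FALSE TRUE FALSE

# %in% asks whether the value appears anywhere in the vector
"Luvisol" %in% soil_types                # TRUE

# With several values on the left, %in% answers once per value
c("Luvisol", "Podzol") %in% soil_types   # TRUE FALSE
```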
We will use the following packages
● Tabular-data management: tidyverse
● Plotting:
○ Tabular data: ggplot2
○ Maps: mapview and tmap
● Spatial data:
○ Rasters: raster and terra
○ Shapefiles: sf and terra
○ Access to remote sensing data: rgee
○ Other GIS tools: rgdal
● Modelling: caret
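One possible way to check which of these packages still need to be installed is sketched below. The vector pkgs is only a subset chosen for illustration, and the install and load calls are left commented out.

```r
# Packages to check (illustrative subset of the list above)
pkgs <- c("tidyverse", "mapview", "tmap", "terra", "sf", "caret")

# Keep only the ones that are not installed yet
to_install <- pkgs[!pkgs %in% rownames(installed.packages())]
to_install

# install.packages(to_install)   # uncomment to install the missing ones
# library(tidyverse)             # load a package before using it
```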
Tidyverse
https://www.tidyverse.org/
https://r4ds.had.co.nz/ (Chapter 5: Data transformation)
■ Chain functions with the pipe (%>%).
■ Pick variables by their names (select()).
■ Pick observations by their values (filter()).
■ Create new variables with functions of existing variables (mutate() & transmute()).
■ Group table by key variable(s) (group_by()).
■ Collapse many values down to a single summary (summarise()).
■ Reshape the structure of the table (pivot_longer(), pivot_wider()).
■ Join relational tables (left_join(), right_join(), full_join()).
Tidyverse: Pipe %>%
● Data %>% subproduct 1 %>% subproduct 2 %>% result
● Each %>% passes the result on its left as the first argument of the function on its right
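A small sketch of the same computation written with nested calls and with the pipe. The vector x is hypothetical, and dplyr is loaded only because it makes %>% available.

```r
library(dplyr)

x <- c(1, 4, 9)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls are read from left to right
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

nested
piped
```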
Tidyverse: select()
■ Pick variables by their names (select()).
Exercise
● Select the variables id_prof, id_hor, top, bottom, ph_h2o, and cec,
and save them as a new object called dat_1
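A minimal sketch of select() on a made-up table. The data frame dat and its values are hypothetical stand-ins for the course data. Only the column names follow the exercise, and clay is an extra column added for illustration.

```r
library(dplyr)

# Hypothetical horizon table
dat <- data.frame(
  id_prof = c(1, 1, 2),
  id_hor  = c(1, 2, 1),
  top     = c(0, 20, 0),
  bottom  = c(20, 50, 30),
  ph_h2o  = c(6.5, 7.0, 5.8),
  cec     = c(55, 30, 62),
  clay    = c(30, 35, 20)
)

# Keep only the columns named inside select()
dat_sel <- dat %>%
  select(id_prof, id_hor, top, bottom, ph_h2o, cec)

names(dat_sel)
```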
Tidyverse: filter()
■ Pick observations by their values (filter()).
Exercise
● Filter the observations of dat_1 with a cec greater than
50 cmolc/kg and save the result as dat_2
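A minimal sketch of filter(), assuming a hypothetical table with a cec column.

```r
library(dplyr)

# Hypothetical data
dat <- data.frame(id_prof = c(1, 1, 2), cec = c(55, 30, 62))

# Keep only the rows where the condition is TRUE
high_cec <- dat %>%
  filter(cec > 50)

high_cec
```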
Tidyverse: mutate()
■ Create new variables with functions of existing variables (mutate() & transmute()).
Exercise
● Create a column with the thickness of the horizon, named
"thickness". Save the new object as dat_3. How does
transmute() differ from mutate()?
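A minimal sketch comparing mutate() and transmute() on hypothetical data. It assumes that thickness means bottom minus top.

```r
library(dplyr)

# Hypothetical horizons with top and bottom depths
dat <- data.frame(id_hor = 1:3, top = c(0, 20, 50), bottom = c(20, 50, 90))

# mutate() adds the new column and keeps the existing ones
with_mutate <- dat %>%
  mutate(thickness = bottom - top)

# transmute() keeps only the columns created in the call
with_transmute <- dat %>%
  transmute(thickness = bottom - top)

names(with_mutate)
names(with_transmute)
```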
Tidyverse: group_by() & summarise()
■ Group table by key variable(s) (group_by()).
■ Collapse many values down to a single summary (summarise()).
Exercise
● Compute the mean pH and the mean cec for each soil
profile (pid) using dat_3 as input. Save the result as dat_4
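A minimal sketch of group_by() followed by summarise(). The table and the output column names (mean_ph, mean_cec) are hypothetical.

```r
library(dplyr)

# Hypothetical data: two profiles with two horizons each
dat <- data.frame(
  pid    = c(1, 1, 2, 2),
  ph_h2o = c(6, 7, 5, 6),
  cec    = c(10, 20, 30, 50)
)

# One row per profile, with the mean of each property
prof_means <- dat %>%
  group_by(pid) %>%
  summarise(mean_ph = mean(ph_h2o), mean_cec = mean(cec))

prof_means
```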
Tidyverse: pivot
■ Reshape the structure of the table
(pivot_longer(), pivot_wider()).
Exercise
● Using dat_3, put the names of the variables ph_h2o, cec and thickness
in the column soil_property, and the values of the soil property
in the value column. Keep the rest of the table. Save in dat_5
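A minimal sketch of reshaping with pivot_longer() and back with pivot_wider(). The table is hypothetical, and the column names soil_property and value follow the exercise.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one column per soil property
dat <- data.frame(pid = c(1, 2), ph_h2o = c(6.5, 5.8), cec = c(55, 62))

# Wide to long: property names go to one column, values to another
long <- dat %>%
  pivot_longer(cols = c(ph_h2o, cec),
               names_to = "soil_property",
               values_to = "value")

# Long to wide: the inverse operation
wide <- long %>%
  pivot_wider(names_from = soil_property, values_from = value)

long
wide
```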
Tidyverse: join
■ Join relational tables (left_join(), right_join(),
full_join())
Exercise
● Load site.csv (in the 01-Data folder) and join
its columns with dat_3. Use pid as key.
Save the result as dat_6
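A minimal sketch of left_join() with pid as key. Both tables are hypothetical, and in practice the site table would be read from a file, for example with read_csv().

```r
library(dplyr)

# Hypothetical horizon data and site data sharing the key pid
horizons <- data.frame(pid = c(1, 1, 2, 3), cec = c(55, 30, 62, 41))
sites    <- data.frame(pid = c(1, 2), x = c(100, 200), y = c(10, 20))

# left_join() keeps every row of the left table and adds the
# matching columns of the right table (NA when there is no match)
joined <- left_join(horizons, sites, by = "pid")

joined
```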
Data Visualization with ggplot2
ggplot2: One Dimension
Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…()
Exercise
● Make a histogram (geom_histogram())
of cec and another of pH using dat_3
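A minimal sketch of a histogram, using simulated values as a hypothetical stand-in for dat_3. The number of bins is an arbitrary choice.

```r
library(ggplot2)

# Simulated cec values (hypothetical)
set.seed(1)
dat <- data.frame(cec = rnorm(100, mean = 30, sd = 8))

# One variable on the x axis, counts are computed by the geom
p <- ggplot(data = dat, aes(x = cec)) +
  geom_histogram(bins = 20)

p
```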
ggplot2: Two Dimensions
Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…()
Exercise
● Make a scatterplot (geom_point())
of cec in x and pH in y using dat_3
● Then, add a fitted line (geom_smooth())
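A minimal sketch of a scatterplot with a fitted line, on simulated (hypothetical) data. The choice of method = "lm" (a straight line) is an assumption, and geom_smooth() offers other smoothers.

```r
library(ggplot2)

# Simulated data (hypothetical)
set.seed(1)
dat <- data.frame(cec = runif(50, min = 5, max = 60))
dat$ph_h2o <- 5 + 0.03 * dat$cec + rnorm(50, sd = 0.3)

# Points plus a linear fit with its confidence band
p <- ggplot(data = dat, aes(x = cec, y = ph_h2o)) +
  geom_point() +
  geom_smooth(method = "lm")

p
```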
ggplot2: Three Dimensions
Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…()
Exercise
● Make a scatterplot (geom_point())
of cec in x and pH in y using dat_3
● Add cec as the color and size arguments inside aes()
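A minimal sketch of mapping a variable to color and size, on simulated (hypothetical) data. The same aesthetics accept any other column of the table.

```r
library(ggplot2)

# Simulated data (hypothetical)
set.seed(1)
dat <- data.frame(cec = runif(50, min = 5, max = 60))
dat$ph_h2o <- 5 + 0.03 * dat$cec + rnorm(50, sd = 0.3)

# color and size inside aes() are driven by the data
p <- ggplot(data = dat, aes(x = cec, y = ph_h2o, color = cec, size = cec)) +
  geom_point()

p
```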
ggplot2: Facets
Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…() +
facet_wrap(~variable)
Exercise
● Make a scatterplot (geom_point())
of cec in x and pH in y using dat_3
● Add cec as the color and size arguments inside aes()
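A minimal sketch of facet_wrap(), on simulated data. The grouping column texture is hypothetical and only illustrates how one panel per category is created.

```r
library(ggplot2)

# Simulated data with a categorical column (hypothetical)
set.seed(1)
dat <- data.frame(
  cec     = runif(60, min = 5, max = 60),
  texture = rep(c("clay", "loam", "sand"), each = 20)
)
dat$ph_h2o <- 5 + 0.03 * dat$cec + rnorm(60, sd = 0.3)

# One panel per value of texture
p <- ggplot(data = dat, aes(x = cec, y = ph_h2o, color = cec, size = cec)) +
  geom_point() +
  facet_wrap(~texture)

p
```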
R Basics
Spatial data
Geospatial data in R
https://geocompr.robinlovelace.net/index.html
Main concepts
Geospatial data is data with spatial coordinates represented in a coordinate
reference system (CRS)
Vector data
Raster data
Digital soil mapping
terra package
To become familiar with handling spatial data in R, we will focus on:
● Loading a raster with rast() and exploring its attributes
● Loading a vector with vect() and exploring its attributes
● Transforming their coordinate reference system (project())
● Cropping (crop()) and masking (mask()) a raster
● Replacing values in a raster by filtering its cells
● Rasterizing (rasterize()) a vector layer
● Extracting raster values (extract()) using points
● Computing zonal statistics (zonal()) using polygons and rasters
terra: rast() & vect()
Exercise
● Load 01-Data/covs/grass.tif using rast()
function, then plot it
● Load 01-Data/soil map/SoilTypes.shp using
vect() function and plot it
● Explore the attributes of these layers
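A minimal sketch of creating and inspecting terra objects. To stay self-contained it builds a small raster and a polygon in memory instead of reading the course files, so the extent, values, and CRS are hypothetical. With files, rast() and vect() take the file path as first argument.

```r
library(terra)

# Small in-memory raster (instead of rast("path/to/file.tif"))
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
          ymin = 0, ymax = 10, crs = "EPSG:4326")
values(r) <- 1:ncell(r)
names(r) <- "grass"

# Polygon built from WKT (instead of vect("path/to/file.shp"))
v <- vect("POLYGON ((2 2, 8 2, 8 8, 2 8, 2 2))")
crs(v) <- "EPSG:4326"

# Explore the attributes
r                        # printed summary
dim(r)                   # rows, columns, layers
res(r)                   # cell size
ext(r)                   # extent
crs(r, describe = TRUE)  # CRS name and EPSG code
geomtype(v)              # geometry type of the vector

plot(r)
plot(v, add = TRUE)
```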
terra: project()
Exercise
● Check the current CRS (EPSG) of the raster
and the vector.
● Find a *projected* CRS in http://epsg.io
for Macedonia and copy the number
● Check which arguments of the function project()
(?project) need to be defined
● Save the new objects as r_proj and v_proj
● Plot both objects (use add=TRUE in the plot
function)
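A minimal sketch of reprojecting a raster and a vector with project(). The layers are built in memory and are hypothetical. EPSG:32634 (UTM zone 34N) is used here only as an example of a projected CRS and is an assumption, not necessarily the code expected in the exercise.

```r
library(terra)

# Hypothetical layers in geographic coordinates (EPSG:4326)
r <- rast(nrows = 10, ncols = 10, xmin = 20, xmax = 23,
          ymin = 40, ymax = 42, crs = "EPSG:4326")
values(r) <- 1:ncell(r)
v <- vect("POLYGON ((21 40.5, 22 40.5, 22 41.5, 21 41.5, 21 40.5))")
crs(v) <- "EPSG:4326"

# Check the current CRS
crs(r, describe = TRUE)$code
crs(v, describe = TRUE)$code

# Reproject: rasters are resampled, so a method is chosen
r_proj <- project(r, "EPSG:32634", method = "bilinear")
v_proj <- project(v, "EPSG:32634")

plot(r_proj)
plot(v_proj, add = TRUE)
```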
terra: crop() and mask()
Exercise
● Compute the area of the polygons in v_proj
(search for a function) and assign the values
to a new column named area
● Select the largest polygon using [], $, ==
and max(), and save it as pol
○ Use it as if it were a dataframe, for example
df[df$col == max(df$col), ]
● Crop the raster with pol using the crop()
function and save it as r_pol
● Mask the raster r_pol with the polygon pol
and save it with the same name
● Plot each result
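A minimal sketch of the crop and mask steps on hypothetical in-memory layers. It uses expanse() to compute polygon areas, which is one possible function for that step. The coordinates and the CRS (EPSG:32634) are assumptions made for illustration.

```r
library(terra)

# Hypothetical raster with 1 km cells in a projected CRS
r <- rast(nrows = 10, ncols = 10, xmin = 500000, xmax = 510000,
          ymin = 4600000, ymax = 4610000, crs = "EPSG:32634")
values(r) <- 1:ncell(r)

# Two hypothetical polygons: a square and a larger triangle
v <- vect(c(
  "POLYGON ((501000 4601000, 504000 4601000, 504000 4604000, 501000 4604000, 501000 4601000))",
  "POLYGON ((505000 4602000, 509000 4602000, 507000 4609000, 505000 4602000))"
))
crs(v) <- "EPSG:32634"

# Area of each polygon (square metres) stored in a new column
v$area <- expanse(v)

# Select the largest polygon, as with a data frame
pol <- v[v$area == max(v$area), ]

# crop() cuts the raster to the extent of the polygon,
# mask() sets the cells outside the polygon to NA
r_pol <- crop(r, pol)
r_pol <- mask(r_pol, pol)

plot(r_pol)
plot(pol, add = TRUE)
```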
terra: replace cell values
Exercise
● Explore the following link to understand
how terra manages cell values
● https://rspatial.org/terra/pkg/4-algebra.html
● Replace values lower than 5 in r_pol by 0
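A minimal sketch of replacing cell values with a logical condition, on a tiny hypothetical raster.

```r
library(terra)

# Hypothetical 2 x 3 raster
r <- rast(nrows = 2, ncols = 3, vals = c(1, 7, 3, 9, 4, 6))

# r < 5 is itself a raster of TRUE/FALSE cells, used here as an index
r[r < 5] <- 0

values(r)
```

An alternative with the same effect on the original raster is ifel(r < 5, 0, r), which returns a new raster instead of modifying r in place.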
terra: rasterize()
Exercise
● Use rasterize() function to convert v_proj
to raster
● Use r_proj as reference raster
● Use field Symbol to assign cell values, and
plot the new map
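A minimal sketch of rasterize() on hypothetical in-memory layers. The field name Symbol follows the exercise, while its values ("A", "B") are made up.

```r
library(terra)

# Reference raster: defines extent, resolution and CRS of the output
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
          ymin = 0, ymax = 10, crs = "EPSG:4326")

# Two hypothetical polygons with an attribute column
v <- vect(c("POLYGON ((0 0, 5 0, 5 10, 0 10, 0 0))",
            "POLYGON ((5 0, 10 0, 10 10, 5 10, 5 0))"))
crs(v) <- "EPSG:4326"
v$Symbol <- c("A", "B")

# Each cell takes the value of Symbol of the polygon covering it
v_rast <- rasterize(v, r, field = "Symbol")

plot(v_rast)
freq(v_rast)   # number of cells per class
```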
terra: extract()
Exercise
● Convert dat_6 to spatial points using the vect()
function (check the help of vect())
● Note that the EPSG number is 6204
● Save the points as s
● Plot s and r_proj together in the same map
(argument add=TRUE)
● Extract the values of the raster using the
extract() function (check the help)
● Remove the ID column of the extracted
values
● Merge the extracted data with s using the
cbind() function
● Convert s to a dataframe
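A minimal sketch of extracting raster values at points. The table, coordinates, and CRS (EPSG:32634) are hypothetical. The call is written as terra::extract() because tidyr also exports a function named extract(), which can mask it when the tidyverse is loaded.

```r
library(terra)

# Hypothetical raster named after the covariate
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
          ymin = 0, ymax = 10, crs = "EPSG:32634")
values(r) <- 1:ncell(r)
names(r) <- "grass"

# Hypothetical table with coordinates in columns x and y
dat <- data.frame(pid = 1:3, x = c(0.5, 5.5, 9.5), y = c(9.5, 4.5, 0.5))

# Data frame to spatial points
s <- vect(dat, geom = c("x", "y"), crs = "EPSG:32634")

plot(r)
plot(s, add = TRUE)

# Extract raster values at the points, drop the ID column,
# and attach the values to the points
x <- terra::extract(r, s)
x$ID <- NULL
s <- cbind(s, x)

# Back to a regular data frame
s_df <- as.data.frame(s)
s_df
```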
terra: zonal statistics
Exercise
● Use the extract() function to estimate the
mean value of r_proj in each polygon
● Use the fun= argument (check the help)
● Use the cbind() function to merge v_proj and
the extracted values
● Convert v_proj to a dataframe
● Create a ggplot boxplot (geom_boxplot())
with x=Symbol and y=grass
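A minimal sketch of zonal statistics with extract() and fun=, on hypothetical in-memory layers. The last line shows zonal() as an alternative for the same summary. The plotting step of the exercise is left out.

```r
library(terra)

# Hypothetical raster: left half has value 2, right half has value 4
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
          ymin = 0, ymax = 10, crs = "EPSG:32634")
values(r) <- rep(rep(c(2, 4), each = 5), times = 10)
names(r) <- "grass"

# Two hypothetical polygons covering each half
v <- vect(c("POLYGON ((0 0, 5 0, 5 10, 0 10, 0 0))",
            "POLYGON ((5 0, 10 0, 10 10, 5 10, 5 0))"))
crs(v) <- "EPSG:32634"
v$Symbol <- c("A", "B")

# Mean raster value inside each polygon
means <- terra::extract(r, v, fun = mean)
means$ID <- NULL

# Attach the means to the polygons and convert to a data frame
v <- cbind(v, means)
v_df <- as.data.frame(v)
v_df

# Alternative: zonal() returns the summary table directly
z <- zonal(r, v, fun = "mean")
```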
terra package
https://rspatial.github.io/terra/reference/terra-package.html
https://rspatial.org/terra/index.html