R Basics and Essentials

R is a programming language for statistical analysis and graphics. RStudio is an integrated development environment that provides a user-friendly interface for R. The document introduces R and RStudio, explains why R is useful, and provides information about installing R and RStudio as well as the general workflow in R and RStudio.

Uploaded by

Waleed Ramzan

R Basics

Essentials for Getting Started


GSP-Secretariat
Isabel Luotto & Marcos Angelini
What is R?

https://moderndive.netlify.app/1-getting-started.html#r-rstudio
What is R and RStudio?

● R is an object-based programming language that runs computations

● RStudio is an integrated development environment (IDE) that provides an interface to R by adding many convenient features and tools

Just as having a speedometer, rearview mirrors, and a navigation system makes driving much easier, RStudio’s interface makes using R much easier

https://moderndive.netlify.app/1-getting-started.html#r-rstudio
Why R?
● R is free to install, use, update, clone, modify, redistribute, even sell.
● It runs on all major operating systems.
● It has a strong package ecosystem and charting capabilities
● Clean, analyse, plot, and communicate all from the same place
● Keep track of your steps -> Reproducibility
● Reduce processing time -> Automation
● Everything you do in the analysis, from deleting outliers to interpreting
results, is contained in your code
● State-of-the-art graphics
● Maintained by its large community: great support!
● Most tools for digital soil mapping are written in R

Additional learning material
Using R for Digital Soil Mapping
https://link.springer.com/book/10.1007/978-3-319-44327-0

Soil Organic Carbon Cookbook
http://www.fao.org/3/I8895EN/i8895en.pdf

Introduction to the R Project for Statistical Computing for use at ITC by D G Rossiter
https://cran.r-project.org/doc/contrib/Rossiter-RIntro-ITC.pdf

YouTube channel: MarinStatsLectures- R Programming & Statistics
https://www.youtube.com/channel/UCaNIxVagLhqupvUiDK01Mgg

Installation
1. Download and install R and Rtools (Rtools is for Windows)
https://cloud.r-project.org/
https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html

2. Download and install RStudio
https://www.rstudio.com/products/rstudio/download/

General workflow in R

https://r4ds.had.co.nz/explore-intro.html
RStudio interface

R script: where we write text (scripts/code)
Environment: shows the saved objects
Console: the real interface of R
Files, plots, packages, and help

Open the script 0.Introduction_to_R.R

Main parts of R syntax
● Functions: mean(), read_csv(), plot()
○ Arguments in functions: mean(x = 1:10), which is the same as mean(1:10)
● Objects
○ Vectors: concatenated values
○ Dataframes: spreadsheets
○ Lists: group of any objects, like a folder
○ Others
● Type of data in objects
○ Numeric: 1, 2, 3, 4.5, 6.02
○ Character: “a”, “b”, “1”, “this is a character”
○ Factors: characters with levels
● Packages:
○ Add-ons that contain functions for a specific use: “base”, “tidyverse”, “raster”, etc.
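
The parts listed above can be tried in the console with a few lines of base R. This is only a sketch: the object names (ph, soil, things, texture) and their values are made up for illustration.

```r
# Functions take arguments, either by name or by position
x <- 1:10
mean_named <- mean(x = x)
mean_positional <- mean(x)      # same result as mean(x = x)

# Vector: concatenated values of one type
ph <- c(5.5, 6.1, 7.2)

# Dataframe: a table whose columns are vectors of equal length
soil <- data.frame(
  id = c("a", "b", "c"),
  ph = ph,
  stringsAsFactors = FALSE      # keep id as character in every R version
)

# List: a container for objects of any kind
things <- list(numbers = x, table = soil, label = "demo")

# Types of data
class(ph)                       # "numeric"
class(soil$id)                  # "character"
texture <- factor(c("clay", "sand", "clay"))
levels(texture)                 # "clay" "sand"
```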

Main parts of R syntax
● Operators
○ Assign: <- (=)
○ Equal: a == b
○ Different: a != b
○ Column: $, dataframe$column_1
○ Subset: [],
■ vector[1] is the first element of the vector
■ dataframe[row, column], e.g. dataframe[1,] is the first row of the dataframe, dataframe[,1:5] are the first five columns of the dataframe
■ list[[1]] is the first object of the list
○ Chain functions: %>% (pipe symbol)
○ Other symbols with expected behavior: +, -, *, /, >, <, ^ (power), & (and), | (or)

● R is case sensitive
○ “THIS” is not the same as “this”

● Many packages -> many different ways to do the same thing
○ Some packages require a specific syntax, such as ggplot2
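
A short base R sketch of these operators follows. The objects a, b, v, df and l are hypothetical and exist only to show the syntax.

```r
a <- 3                # assign with <-
b <- 5
a == b                # FALSE
a != b                # TRUE

v <- c(10, 20, 30)
v[1]                  # 10, the first element of the vector

df <- data.frame(col_1 = 1:3, col_2 = c("x", "y", "z"))
df$col_1              # the column col_1 as a vector
df[1, ]               # first row
df[, 1:2]             # first two columns

l <- list(v, df)
l[[1]]                # first object of the list (the vector v)

2^3                   # 8
a > 2 & b > 2         # TRUE: both conditions hold
a > 4 | b > 4         # TRUE: at least one condition holds

"THIS" == "this"      # FALSE: R is case sensitive
```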

Other logical operators
● %in% allows you to verify whether an element is part of another object
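
The difference between == and %in% is easiest to see side by side. In this sketch the vector horizons is a made-up example.

```r
horizons <- c("A", "B", "C")

# == compares element by element and returns one value per element
eq <- "A" == horizons            # TRUE FALSE FALSE

# %in% asks whether the left-hand value occurs anywhere in the right-hand object
in_a <- "A" %in% horizons        # TRUE
in_e <- "E" %in% horizons        # FALSE

# With several values on the left, %in% returns one answer per value
in_both <- c("A", "E") %in% horizons   # TRUE FALSE
```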

We will use the following packages
● Tabular-data management: tidyverse
● Plotting:
○ Tabular data: ggplot2
○ Maps: mapview and tmap
● For spatial data
○ Rasters: raster and terra
○ Shape files: sf and terra
○ Access remote sensing data: rgee
○ Other GIS tools: rgdal
● Modelling: caret

Tidyverse
https://www.tidyverse.org/

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

Tidyverse
https://r4ds.had.co.nz/

Chapter 5: Data transformation

Tidyverse
■ Chain functions with the pipe (%>%).
■ Pick variables by their names (select()).
■ Pick observations by their values (filter()).
■ Create new variables with functions of existing variables (mutate() & transmute()).
■ Group table by key variable(s) (group_by()).
■ Collapse many values down to a single summary (summarise()).
■ Reshape the structure of the table (pivot_longer(), pivot_wider()).
■ Join relational tables (left_join(), right_join(), full_join()).
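
As a rough sketch of how several of these verbs combine, the code below runs them on a small made-up horizon table. The column names follow the exercises, but the values, the extra colour column and the cec threshold of 20 are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical horizon table (values invented for the example)
dat <- data.frame(
  id_prof = c(1, 1, 2, 2),
  id_hor  = c(1, 2, 1, 2),
  top     = c(0, 20, 0, 15),
  bottom  = c(20, 50, 15, 60),
  ph_h2o  = c(5.5, 6.0, 7.1, 7.4),
  cec     = c(12, 55, 30, 62),
  colour  = c("brown", "red", "grey", "grey")
)

result <- dat %>%
  select(id_prof, id_hor, top, bottom, ph_h2o, cec) %>%  # keep columns by name
  filter(cec > 20) %>%                                   # keep rows by value
  mutate(thickness = bottom - top) %>%                   # add a new column
  group_by(id_prof) %>%                                  # one group per profile
  summarise(mean_ph = mean(ph_h2o), mean_cec = mean(cec))

result
```

Each verb takes a table as its first argument and returns a table, which is what makes them easy to chain.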

Tidyverse: Pipe %>%
● Data
○ %>%
■ subproduct 1
● %>%
○ subproduct 2
■ %>%
● result
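
The chain above can be written out with a small numeric example. The vector values is hypothetical; %>% becomes available when dplyr (or the tidyverse) is loaded.

```r
library(dplyr)

values <- c(4, 9, 16)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(values)), 1)

# The pipe passes the result on its left as the first argument
# of the function on its right, so the steps read top to bottom
piped <- values %>%
  sqrt() %>%      # subproduct 1
  mean() %>%      # subproduct 2
  round(1)        # result

identical(nested, piped)
```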

Tidyverse: select()
■ Pick variables by their names (select()).

Exercise
● Select variables id_prof, id_hor, top, bottom, ph_h2o, cec
and save as a new object called dat_1

Tidyverse: filter()
■ Pick observations by their values (filter()).

Exercise
● Filter observations from dat_1 with cec greater than 50 cmolc/kg and save the result as dat_2

Tidyverse: mutate()
■ Create new variables with functions of existing variables (mutate() & transmute()).

Exercise
● Create a column with the thickness of the horizon, named “thickness”. Save the new object as dat_3. How does transmute() differ from mutate()?
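
One way to explore the question is to run both functions on the same small table and compare the column names. The table hor below is made up and is not the exercise data.

```r
library(dplyr)

hor <- data.frame(top = c(0, 20), bottom = c(20, 50))

# mutate() adds the new column and keeps the existing ones
with_mutate <- hor %>% mutate(thickness = bottom - top)

# transmute() returns only the columns created in the call
with_transmute <- hor %>% transmute(thickness = bottom - top)

names(with_mutate)      # "top" "bottom" "thickness"
names(with_transmute)   # "thickness"
```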
Tidyverse: group_by() & summarise()
■ Group table by key variable(s) (group_by()).
■ Collapse many values down to a single summary (summarise()).

Exercise
● Compute the mean of pH and mean of cec for each soil
profile (pid) using dat_3 as input. Save as dat_4

Tidyverse: pivot
■ Reshape the structure of the table
(pivot_longer(), pivot_wider()).

Exercise
● Using dat_3, put the names of the variables ph_h2o, cec and thickness in the column soil_property, and the values of the soil property in the value column. Keep the rest of the table. Save in dat_5
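
The reshaping can be sketched with tidyr on a two-row table. The table wide and its values are assumptions; only the column names come from the exercise.

```r
library(tidyr)

wide <- data.frame(
  id_prof   = c(1, 2),
  ph_h2o    = c(5.5, 7.1),
  cec       = c(12, 30),
  thickness = c(20, 15)
)

# Wide to long: one row per profile and soil property
long <- pivot_longer(
  wide,
  cols      = c(ph_h2o, cec, thickness),
  names_to  = "soil_property",
  values_to = "value"
)

# Long back to wide: one column per soil property again
back <- pivot_wider(long, names_from = soil_property, values_from = value)
```

Columns not listed in cols (here id_prof) are kept and repeated for each new row.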
Tidyverse: join
■ Join relational tables (left_join(), right_join(),
full_join())

Exercise
● Load site.csv (in 01-Data folder) and join
its columns with dat_3. Use pid as key.
Save the result as dat_6
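
A join by key can be sketched as follows. The two tables are invented stand-ins for the horizon data and site.csv; only the key name pid is taken from the exercise.

```r
library(dplyr)

horizons <- data.frame(pid = c(1, 1, 2, 3), cec = c(12, 55, 30, 8))
sites    <- data.frame(pid = c(1, 2), x = c(21.5, 22.1), y = c(41.6, 41.9))

# left_join() keeps every row of the left table and adds the matching
# columns of the right table; rows without a match get NA
joined <- left_join(horizons, sites, by = "pid")

joined
```

Here profile 3 has no entry in sites, so its x and y are NA. right_join() and full_join() differ only in which table's unmatched rows are kept.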
Data Visualization with ggplot2

ggplot2: One Dimension

Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…()

Exercise
● Make a histogram (geom_histogram())
of cec and pH using dat_3
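
For one variable, only the x aesthetic is needed. The sketch below uses simulated cec values rather than dat_3, and the choice of 20 bins is arbitrary.

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(cec = rnorm(100, mean = 30, sd = 8))  # simulated values

p <- ggplot(data = toy, aes(x = cec)) +
  geom_histogram(bins = 20)

print(p)   # draws the plot
```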

ggplot2: Two Dimensions

Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…()

Exercise
● Make a scatterplot (geom_point())
of cec in x and pH in y using dat_3
● Then, add a fitted line (geom_smooth())
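
A minimal sketch with simulated data shows how layers are added with +. The linear relation between cec and pH is invented, and method = "lm" is just one possible choice for the fitted line.

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(cec = runif(50, min = 5, max = 60))
toy$ph_h2o <- 5 + 0.03 * toy$cec + rnorm(50, sd = 0.3)  # simulated relation

p <- ggplot(data = toy, aes(x = cec, y = ph_h2o)) +
  geom_point() +                 # one point per observation
  geom_smooth(method = "lm")     # straight-line fit with confidence band

print(p)
```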

ggplot2: Three Dimensions

Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…()

Exercise
● Make a scatterplot (geom_point())
of cec in x and pH in y using dat_3
● Add cec as argument color and size

ggplot2: Facets

Basic structure
ggplot(data=..., aes(x=..., y=...)) +
geom_…() +
facet_wrap(~variable)

Exercise
● Make a scatterplot (geom_point())
of cec in x and pH in y using dat_3
● Add cec as argument color and size
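
The sketch below maps cec to colour and size and splits the plot into panels. The data are simulated, and the grouping column landuse is a hypothetical variable added only so that there is something to facet by.

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(
  cec     = runif(60, min = 5, max = 60),
  landuse = rep(c("cropland", "forest", "grassland"), each = 20)
)
toy$ph_h2o <- 5 + 0.03 * toy$cec + rnorm(60, sd = 0.3)

p <- ggplot(data = toy, aes(x = cec, y = ph_h2o, colour = cec, size = cec)) +
  geom_point() +
  facet_wrap(~landuse)   # one panel per level of landuse

print(p)
```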

R Basics
Spatial data
Geospatial data in R
https://geocompr.robinlovelace.net/index.html

Main concepts
Geospatial data are data with spatial coordinates represented in a coordinate reference system (CRS)

Search for the CRS of your country at https://epsg.io/

Vector data

● Vector data represent discrete spatial objects, such as regions, roads, rivers, cities, etc.
● Simple features (sf) is an open standard (ISO 19125-1:2004) developed and endorsed by the Open Geospatial Consortium (OGC). The package sf has been developed to manage this type of data
● Attributes are typically stored in data.frame objects. Geometries are also stored in a data.frame column.
● sf objects can be treated as a data.frame
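
To illustrate the last two points, the sketch below builds an sf object from a plain table. The points, their coordinates and the use of EPSG:4326 are assumptions made for the example.

```r
library(sf)

# Hypothetical sampling points with longitude (x) and latitude (y)
pts_df <- data.frame(
  pid = 1:3,
  cec = c(12, 30, 8),
  x   = c(21.4, 21.7, 22.0),
  y   = c(41.6, 41.9, 42.0)
)

pts <- st_as_sf(pts_df, coords = c("x", "y"), crs = 4326)

class(pts)    # "sf" "data.frame"
names(pts)    # "pid" "cec" "geometry": the geometries live in their own column

# data.frame-style subsetting works and keeps the geometry
high_cec <- pts[pts$cec > 10, ]
```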

Raster data

● The spatial raster data model represents the world with a continuous grid of cells
● It is an appropriate format for continuous variables, such as remote sensing data, terrain attributes, etc., although categorical data can also be represented

Raster data

● The main package for handling raster data was called “raster” and has recently been replaced by “terra”. However, many packages still use raster as their raster data manager
● The main attributes of a raster
model are:
○ Number of rows and columns
○ Number of layers
○ Pixel size (or resolution)
○ CRS
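
These attributes can be inspected with terra on a raster built in memory. The dimensions, extent and the CRS code EPSG:32634 below are arbitrary choices for the sketch.

```r
library(terra)

# A two-layer raster of 4 rows by 6 columns covering 600 m by 400 m
r <- rast(nrows = 4, ncols = 6, nlyrs = 2,
          xmin = 0, xmax = 600, ymin = 0, ymax = 400,
          crs = "EPSG:32634")
values(r) <- runif(ncell(r) * nlyr(r))   # fill every cell of every layer

nrow(r)    # 4 rows
ncol(r)    # 6 columns
nlyr(r)    # 2 layers
res(r)     # 100 100: pixel size in x and y, in CRS units
crs(r, describe = TRUE)   # name and code of the CRS
```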

Raster data

● terra can also handle vector data, which is especially recommended when rasters and vectors need to be combined in any way (convert vector to raster, zonal statistics, etc.)

Digital soil mapping

● In digital soil mapping we mostly work with data in table format and then rasterize these data so that we can make a continuous map

terra package
To become familiar with handling spatial data in R, we will focus on:
● Load a raster with rast() and explore its attributes
● Load a vector with vect() and explore its attributes
● Transform their coordinate system (project())
● Cropping (crop()) and masking (mask()) a raster
● Replace values in a raster by filtering their cells
● Rasterize (rasterize()) a vector layer
● Extracting raster values (extract()) using points
● Zonal statistics (zonal()) using polygons and rasters

terra: rast() & vect()
Exercise
● Load 01-Data/covs/grass.tif using rast()
function, then plot it
● Load 01-Data/soil map/SoilTypes.shp using
vect() function and plot it
● Explore the attributes of these layers
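
The sketch below follows the same load-and-explore steps without needing the course files: it first writes a small made-up raster and polygon to temporary files and then reads them back with rast() and vect(). The layer name grass and the field Symbol mirror the exercise; everything else is an assumption.

```r
library(terra)

# Create and save toy layers so that there is something to load
r0 <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
           crs = "EPSG:4326", vals = 1:100, names = "grass")
v0 <- vect("POLYGON ((2 2, 8 2, 8 8, 2 8, 2 2))", crs = "EPSG:4326")
v0$Symbol <- "A"
tif <- tempfile(fileext = ".tif")
shp <- tempfile(fileext = ".shp")
writeRaster(r0, tif)
writeVector(v0, shp)

# Load the layers from file
r <- rast(tif)
v <- vect(shp)

# Printing an object summarises its attributes
r
v
names(r)   # layer names of the raster
names(v)   # attribute columns of the vector

plot(r)
plot(v, add = TRUE)
```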

terra: project()
Exercise
● Check the current CRS (EPSG) of the raster and the vector.
● Find a *projected* CRS for Macedonia at http://epsg.io and copy the number
● Check the arguments of the function project() (?project) that need to be defined
● Save the new objects as r_proj and v_proj
● Plot both objects (use add=TRUE in the plot function)
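
A reprojection with terra could look like the sketch below. The layers are built in memory, and EPSG:32634 (WGS 84 / UTM zone 34N) is used only as one example of a projected CRS; the exercise asks you to look up your own.

```r
library(terra)

# Toy layers in geographic coordinates (longitude/latitude)
r <- rast(nrows = 20, ncols = 20, xmin = 21, xmax = 22, ymin = 41, ymax = 42,
          crs = "EPSG:4326")
values(r) <- runif(ncell(r))
v <- vect("POLYGON ((21.2 41.2, 21.8 41.2, 21.8 41.8, 21.2 41.8, 21.2 41.2))",
          crs = "EPSG:4326")

crs(r, describe = TRUE)   # check the current CRS

# project() takes the object and the target CRS
r_proj <- project(r, "EPSG:32634")
v_proj <- project(v, "EPSG:32634")

is.lonlat(r)        # TRUE: degrees
is.lonlat(r_proj)   # FALSE: metres

plot(r_proj)
plot(v_proj, add = TRUE)
```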

terra: crop() and mask()
Exercise
● Compute the area of the polygons in v_proj (search for a function) and assign the values to a new column named area
● Select the largest polygon using [], $, == and the max() function, and save it as pol
○ Use it as if it were a dataframe, for example df[df$col == max(df$col), ]
● Crop the raster with pol using the crop() function and save it as r_pol
● Mask the raster r_pol with the polygon pol and save it with the same name
● Plot each result
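
The sequence can be sketched on toy layers as below. expanse() is one terra function that returns polygon areas; the raster, the two polygons and the CRS are invented for the example.

```r
library(terra)

r <- rast(nrows = 10, ncols = 10,
          xmin = 500000, xmax = 501000, ymin = 4600000, ymax = 4601000,
          crs = "EPSG:32634", vals = 1:100)

# Two hypothetical polygons: a small square and a larger triangle
p1 <- vect("POLYGON ((500100 4600100, 500300 4600100, 500300 4600300, 500100 4600300, 500100 4600100))",
           crs = "EPSG:32634")
p2 <- vect("POLYGON ((500400 4600400, 500900 4600400, 500650 4600900, 500400 4600400))",
           crs = "EPSG:32634")
v <- rbind(p1, p2)

v$area <- expanse(v)                  # area of each polygon in square metres
pol <- v[v$area == max(v$area), ]     # keep the largest polygon

r_pol <- crop(r, pol)                 # cut the raster to the extent of pol
r_pol <- mask(r_pol, pol)             # set cells outside pol to NA

plot(r_pol)
plot(pol, add = TRUE)
```

crop() only trims to the bounding box, which is why mask() is still needed to blank the cells that fall outside the polygon itself.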
terra: replace cell values
Exercise
● Explore the following link to understand how terra manages cell values
● https://rspatial.org/terra/pkg/4-algebra.html
● Replace values lower than 5 in r_pol by 0
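
Two common ways to replace cells by condition are sketched below on a tiny made-up raster: indexing with a logical raster, and ifel().

```r
library(terra)

r <- rast(nrows = 2, ncols = 3, vals = c(1, 4, 5, 7, 2, 9))

# Option 1: a logical raster (r0 < 5) used as an index
r0 <- r
r0[r0 < 5] <- 0

# Option 2: ifel(test, value if TRUE, value if FALSE)
r1 <- ifel(r < 5, 0, r)

values(r0)[, 1]   # 0 0 5 7 0 9
```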

terra: rasterize()
Exercise
● Use rasterize() function to convert v_proj
to raster
● Use r_proj as reference raster
● Use field Symbol to assign cell values, and
plot the new map
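
A minimal sketch of rasterize() with a reference raster follows. The two polygons and their Symbol values are hypothetical; r_ref plays the role of r_proj.

```r
library(terra)

# Reference raster: defines extent, resolution and CRS of the output
r_ref <- rast(nrows = 10, ncols = 10,
              xmin = 500000, xmax = 501000, ymin = 4600000, ymax = 4601000,
              crs = "EPSG:32634")

# Two polygons covering the western and eastern half of the raster
west <- vect("POLYGON ((500000 4600000, 500500 4600000, 500500 4601000, 500000 4601000, 500000 4600000))",
             crs = "EPSG:32634")
east <- vect("POLYGON ((500500 4600000, 501000 4600000, 501000 4601000, 500500 4601000, 500500 4600000))",
             crs = "EPSG:32634")
v <- rbind(west, east)
v$Symbol <- c("A", "B")

# Each cell receives the Symbol of the polygon that covers its centre
r_sym <- rasterize(v, r_ref, field = "Symbol")

plot(r_sym)
```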

terra: extract()
Exercise
● Convert dat_6 to spatial points using the vect() function (check the help of vect())
● Note that the EPSG number is 6204
● Save the points as s
● Plot s and r_proj together in the same map (argument add=TRUE)
● Extract the values of the raster using the extract() function (check the help)
● Remove the ID column of the extracted values
● Merge the extracted data with s using the cbind() function
● Convert s to a dataframe
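
The steps can be sketched on toy data as below. The table dat, its coordinates and the CRS EPSG:32634 are assumptions (the exercise data use EPSG 6204), and r stands in for r_proj.

```r
library(terra)

r <- rast(nrows = 10, ncols = 10,
          xmin = 500000, xmax = 501000, ymin = 4600000, ymax = 4601000,
          crs = "EPSG:32634", vals = 1:100, names = "grass")

# Hypothetical table with coordinates in columns x and y
dat <- data.frame(
  pid = 1:3,
  x   = c(500150, 500550, 500950),
  y   = c(4600150, 4600550, 4600950),
  cec = c(12, 30, 8)
)

# Table to spatial points: name the coordinate columns and the CRS
s <- vect(dat, geom = c("x", "y"), crs = "EPSG:32634")

plot(r)
plot(s, add = TRUE)

ex <- extract(r, s)     # one row per point: ID plus one column per layer
ex$ID <- NULL           # drop the ID column
s <- cbind(s, ex)       # attach the raster values to the points
s_df <- as.data.frame(s)
```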
terra: zonal statistics
Exercise
● Use the extract() func. to estimate the
mean value of r_proj at each polygon
● Use the fun= argument (check the help)
● Use the cbind() func. to merge v_proj and
the extracted values
● Convert v_proj to a dataframe
● Create a ggplot boxplot (geom_boxplot)
with x=Symbol and y=grass
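
A sketch of the polygon summary with extract() and fun = is shown below, again on invented layers. The raster r stands in for r_proj and the two polygons for v_proj.

```r
library(terra)

r <- rast(nrows = 10, ncols = 10,
          xmin = 500000, xmax = 501000, ymin = 4600000, ymax = 4601000,
          crs = "EPSG:32634", vals = 1:100, names = "grass")

west <- vect("POLYGON ((500000 4600000, 500500 4600000, 500500 4601000, 500000 4601000, 500000 4600000))",
             crs = "EPSG:32634")
east <- vect("POLYGON ((500500 4600000, 501000 4600000, 501000 4601000, 500500 4601000, 500500 4600000))",
             crs = "EPSG:32634")
v <- rbind(west, east)
v$Symbol <- c("A", "B")

# fun = summarises the cells of each polygon into one value per layer
ex <- extract(r, v, fun = mean, na.rm = TRUE)
ex$ID <- NULL

v <- cbind(v, ex)          # attach the means to the polygons
v_df <- as.data.frame(v)   # plain table with Symbol and grass
v_df
```

The resulting v_df has one row per polygon and can be passed to ggplot() like any other dataframe.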

terra package
https://rspatial.github.io/terra/reference/terra-package.html

https://rspatial.org/terra/index.html