
Module 4 Script

Module 4: The argoFloats R Package


Module 4 Overview
Slide 1: Title Slide
In this section you will learn about Argo floats, the need for the argoFloats package, the steps of the argoFloats workflow, and how to apply quality control within the package.

M4 Lesson 1: Argo Floats Data


Slide 1: Title Slide
Slide 2: Solution: Access to Long-Term, Large Spatial Coverage Data
Until now, we've identified that the goal of this workshop is to take ocean data and turn it into ocean information. In doing so, we've identified two barriers: the first is the ability to analyse a variety of data types. We addressed this barrier using the oce package, where we first learned the tools of the package and then applied those tools to a real-world example, looking at changing temperatures in the waters near Bermuda.

The next thing we're going to do is look at that second barrier: the need for access to long-term, large-spatial-coverage data. The solution is Argo floats. Argo floats provide large spatial coverage and long time series of chemical, physical, and biological measurements. By long time series, we mean data collected since about the year 2000. They also provide real-time, quality-controlled data, measuring down to about 2000 m in general, or to depths of 6000 m in the case of Deep Argo floats. The data are also publicly available.

Slide 3: Benefits Compared to Other Sampling Methods


To get a visual sense of the benefits compared to other sampling methods, on the left we have all of the ship-based CTD measurements taken in the year 2018. If you compare that to the right, which shows all of the Argo samples taken in 2018, we can see there is clearly greater sampling coverage.

Slides 4-11: How do Argo Floats Work?


Argo floats are deployed from a ship, where they adjust their buoyancy to descend to a drifting depth of about 1000 m. They remain at this drifting depth for approximately 10 days. They then adjust their buoyancy again and descend to a depth of 2000 m if they are a core or biogeochemical (BGC) float, or 6000 m if they are a Deep Argo. After that, they become more buoyant and ascend to the surface while taking a series of measurements. Once they reach the surface, they transmit their data to be uploaded to the Argo servers and then begin their next cycle. This whole process takes about 10 days. Each Argo float performs this cycle about 150 times in its lifetime and lasts for approximately 4-5 years.

Slide 12: What do Argo Floats Measure?


So during that sampling, what do they measure? If it's a core Argo, it measures temperature, salinity, and pressure. If it's a biogeochemical (BGC) float, it can also measure things such as oxygen, pH, nitrate, downward irradiance, chlorophyll fluorescence, and optical backscattering. And if it's a Deep Argo (the float that goes down to 6000 m), it measures temperature, salinity, pressure, and oxygen.

Slide 13: Real-Time vs Delayed-Mode Data


As I mentioned, once the float reaches the surface of the water, it transmits its data within 24 to 48 hours. This is known as real-time data. Real-time data goes through a series of automatic quality control tests that assess the quality of the data. In addition to the real-time data that is uploaded within 24 to 48 hours, there is also delayed-mode data. Delayed-mode data can appear on the server up to two years after sampling, and it has undergone additional quality control tests completed by scientists.

Slide 14: End of Lesson

M4 Lesson 2: The argoFloats R Package


Slide 1: Title Slide
Slide 2: How do we Analyse Argo floats Data?
Now that we've identified the benefits of Argo data, the question is how do we get access to this data and how do we analyse it. The answer is the argoFloats R package, an R package for analysing Argo data. It builds on the oce package that we discussed earlier; it identifies, downloads, caches, and analyses oceanographic data collected by Argo profiling floats while handling quality control, and it performs some operations that are simply not possible with oce alone.

Slide 3: argoFloats: an R package for Analysing Oceanographic Data


Increasingly, oceanographers are relying on the R language for their data analysis. There are five main
reasons for this. First, R offers a comfortable interactive environment, supported by easy linkage with
compiled languages for speed and with other interpreted languages, such as Python, for convenience.
Second, R provides a vast and unrivalled suite of statistical functions, supported with detailed
documentation provided within R itself and in dozens of textbooks. Third, R packages are tested
stringently, both when they are first accepted to the system and also at regular intervals thereafter,
with the testing being carried out across co-dependent packages, on multiple versions of R and on
multiple computing systems. Fourth, R is an inherently cross-disciplinary tool, frequently used by
chemical and biological oceanographers. And, fifth, the oce package supports many aspects of
oceanographic analysis, with specialised functions for computing seawater properties, for reading
many native instrument formats, and for creating specialised graphical representations that meet
oceanographic conventions.

See Kelley and Richards (2020) for more on oce and Kelley (2018) for more details of general
oceanographic analysis with R.

Slide 4: argoFloats Resources


You do have access to this PowerPoint, and I recommend that, in your own time, you click on these links and look at the resources for the argoFloats package.

Slide 5: Visualisation Tools


Now, because there is growing interest in Argo data, a number of excellent visualisation tools have been created to give better access to this data. Two examples are Argovis and Fleet monitor.

Slide 6: Barriers for Argo data


However, even with these visualisation tools, some barriers still exist. These barriers include locating the data and caching the data. For example, we are dealing with data going back to the year 2000; as I mentioned, each float takes about 150 profiles in its lifespan, and there are about 4,000 floats currently in the world's oceans. As you can imagine, that's a lot of data, so we need to figure out a way to speed up our analysis. There are also barriers around downloading the data (how do you actually get it onto your computer?), handling quality control, and having the freedom of analysis to look at different parameters.

Slide 7: Argo Software Packages


Because of these additional barriers, there have been a number of software packages that have been
created to address some of these issues. Some of these include:

- argodata: an R package that uses the data frame as its primary data structure rather than multidimensional arrays (aimed at intermediate to expert coders);

- rnoaa: an R package that provides limited access to a subset of Argo data from the North Atlantic;

- oce: provides some tools for dealing with Argo data;

- argopy: a Python package used to easily fetch and manipulate measurements from Argo floats; and

- argoFloats: an R package that identifies, downloads, caches, and analyses oceanographic data
collected by Argo profiling floats, while handling quality control.

Slide 8: End of Lesson

M4 Lesson 3: Turning Argo Floats Data into Information


Slide 1: Title Slide
Slide 2: Built-in Datasets
Similar to the oce package, the argoFloats package has several built-in data sets to help the user
become familiar with the format of the files.

Slide 3: Workflow Overview


However, any real-world analysis would follow this workflow that I’m presenting here. So our first step is
to get an index of profiles from an Argo server, and this is done using the getIndex() function. We then
focus on a subset of profiles using the subset() function. We then get profile data files from the server
using getProfiles(). Then we read profile data files using readProfiles(), and then we process and analyse
the data. So we’ll go through and talk about what each function does in a moment.
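As a quick orientation, here is a minimal sketch of those five steps in R. The rectangle limits are illustrative placeholders only, and the exact arguments are explained in the following lessons.

    library(argoFloats)
    # Step 1: get an index of available profiles from the Argo server
    indexAll <- getIndex()
    # Step 2: focus on a subset of profiles (illustrative limits only)
    index <- subset(indexAll, rectangle = list(longitude = c(105, 160),
                                               latitude = c(-45, -10)))
    # Step 3: download the profile data files listed in the subset
    profiles <- getProfiles(index)
    # Step 4: read the downloaded profile files into R
    argos <- readProfiles(profiles)
    # Step 5: process and analyse, e.g. a temperature-salinity diagram
    plot(argos, which = "TS")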

Slide 4: Application
The idea for this workshop is that we are going to follow that workflow and apply it to a real-world example. In other words, we are going to analyse Argo data using the R package argoFloats to determine whether the ocean temperature near Australia is warming. It's very similar to what we did at the beginning, when we were looking at the BATS (Bermuda Atlantic Time-series Study) data. The difference now is that with Argo floats we simply chose, somewhat arbitrarily, to look around Australia, but we could have chosen anywhere because of the coverage that Argo floats provide. That's very different from the CTD BATS data: we had to look at that data because it's very rare to have long-term CTD data, but that's not the case for Argo data.

Slide 5: Turning Argo Data into Information


If you remember, I mentioned that after each 10-day cycle, the Argo float sends all of its data to the server; well, this is that server. The idea is that we're taking all of this valuable data from the server and actually turning it into information.

Slide 6: Setup
For this section of the workshop, there are some additional setup steps. It's important to remember that our goal is to determine whether the ocean temperature near Australia is warming, using Argo data.

The first two lines load the required packages, and in line 3 we're simply accessing the coastlineWorld dataset, which shows where the world's coastlines are. In line 5 we're creating a folder in our top directory named "data", and in line 6 we're going into that folder and creating another folder named "argo". We will explain why we did this later on. In line 8, we're using getIndex(), which is the first step in our argoFloats workflow.
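The setup code itself is not reproduced on the slide, so here is a minimal sketch of what those lines might look like. The object name ai and the exact arguments are assumptions based on the description above; the directory layout matches the getIndex() default of ~/data/argo.

    library(argoFloats)                                # line 1: load argoFloats
    library(oce)                                       # line 2: load oce
    data(coastlineWorld)                               # line 3: world coastline dataset
    dir.create("~/data", showWarnings = FALSE)         # line 5: folder "data" in the top directory
    dir.create("~/data/argo", showWarnings = FALSE)    # line 6: folder "argo" inside it
    ai <- getIndex(age = 0)                            # line 8: first workflow step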

Slide 7: End of Lesson

M4 Lesson 4: argoFloats Workflow


Slide 1: Title Slide
Slide 2: Workflow Steps - getIndex()
getIndex() gets an index of available Argo float profiles. This function was created to eliminate the
barrier of locating and caching Argo data.

Slide 3: Demonstration
In the help pages we can also see the names of the arguments for the getIndex() function. The first is filename. If the user scrolls down in the help pages, they can see a number of filename values they can give. This argument specifies what type of index file the user wants to get, and by default it gets the core data, including temperature, salinity, and pressure. If, for example, the user was interested in studying changes in oxygen, they would instead get the bgc filename. The second argument is the server. The server argument tells R where to pull the data from. For Argo data, there are two main servers: USGODAE, based in the USA, and Ifremer, based in France. By default, argoFloats uses the USA server, but the user always has the option to change that. Next is destdir, the destination directory argument. This argument tells getIndex() where to store the Argo data on your computer. By default it looks for ~/data/argo, which is why we created those directories, but the user always has the option of storing it wherever they want. The last one we'll talk about, as the others are more for developers, is the age argument. The age argument is in days. If, for example, the user set age to 7 and had already downloaded the index within the last 7 days, it would not be downloaded again. In our case, we have age=0, which means we want to fetch the most recent available Argo index regardless of when we last retrieved it. With all of that being said, in line 8 we're getting the most recent available core index from the default server (USGODAE) and storing it in the default folder (~/data/argo).
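Putting those arguments together, a hedged version of line 8 might look like the following. The argument values simply spell out the defaults described above; server name abbreviations and defaults may differ between package versions.

    ai <- getIndex(filename = "core",         # core index: temperature, salinity, pressure
                   server   = "usgodae",      # USA server (the alternative is "ifremer")
                   destdir  = "~/data/argo",  # where the index is cached on disk
                   age      = 0)              # 0 days: always download a fresh index
    # For an oxygen (or other BGC) study, one would use filename = "bgc" instead.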

Slide 4: Workflow Steps - subset()


Now that we've used getIndex() to get the full index of available Argo profiles, the next step in the workflow is to subset. The argoFloats package provides tools to sift through profiles based on ID, time, geography, variable, institution, etc. This helps with caching the data.

Slide 5: Demonstration
Now what we're going to do is type ?`subset,argoFloats-method` in our console to see how to subset in a variety of different ways, with explanations and examples. In our case, we're going to first subset by rectangle around Australia. According to the help documentation, to subset by rectangle the user needs to provide a list named rectangle that has elements named longitude and latitude. These longitude and latitude elements are simply the limits of the rectangle around Australia. In line 11 that's exactly what we're doing, followed by line 12, which simply plots our study region.

For our analysis, we're not interested in all of the floats around Australia, but rather only the floats in the south-eastern part. This means our next step is to subset by polygon. The reason we're doing this is that at the end we're going to compare our results to another study that looked at the change in temperature, and that study focused solely on that area. In the help pages, if the user looks at subset by polygon, it says that it takes a list named polygon that has elements named longitude and latitude, which are numeric vectors specifying a polygon within which profiles will be retained. This means any Argo profiles within the specified polygon will be kept. To give the user a visual, in line 19 I provide the specified polygon, whose points were chosen somewhat arbitrarily in the south-east part of Australia. In line 24, we simply draw the lines around the polygon; this shows that we will keep anything that is within that red line. That is what is occurring in line 25, where we do the subset, and in line 26 we simply plot our new study region.
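A sketch of lines 11-26 is given below. The rectangle and polygon coordinates are illustrative placeholders, not the exact values used in the workshop, and the object names rect and sePoly are assumptions.

    # Lines 11-12: subset by rectangle around Australia, then map the study region
    rect <- subset(ai, rectangle = list(longitude = c(105, 160),
                                        latitude  = c(-45, -10)))
    plot(rect, which = "map")
    # Lines 19-26: define a polygon in the south-east, outline it, subset, and replot
    lonPoly <- c(145, 155, 155, 145, 145)
    latPoly <- c(-42, -42, -33, -33, -42)
    lines(lonPoly, latPoly, col = "red")       # draw the red polygon outline on the map
    sePoly <- subset(rect, polygon = list(longitude = lonPoly,
                                          latitude  = latPoly))
    plot(sePoly, which = "map")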

Slide 6: mapApp()
Before we move on to the next step of the workflow, I first want to mention the app that we created, mapApp(). mapApp() is a graphical user interface (GUI) that permits specifying a spatial and temporal region of interest, as well as selecting the set of float types to show. You can then simply click the "Code" button to reveal the argoFloats function calls needed to isolate the data.

Slide 7: Demonstration
Let me show you why I'm mentioning this now. To access the app that we created, all you have to do is type "mapApp()" in the console. Now I am going to highlight in and around Australia, which zooms in. There is also a Help button which, if clicked, provides an explanation of how to use this app; I encourage you to look at that in your own time. If I click the Code button, it will show me all of the code required to do the subset by rectangle that I just zoomed in on, so the user has the option to copy and paste this code into the console. Recently, we have also added the ability to subset by polygon. To do that, you click around the map to choose your polygon; you would then click Code, and this would show you the code required to look into that specific area. I encourage you to try that yourself, as it is a great way to get familiar with the code and with the package.
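Launching the app requires nothing more than the call shown below (assuming the package is loaded).

    library(argoFloats)
    mapApp()   # opens the interactive map; the "Code" button reveals equivalent argoFloats calls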

Slide 8: Workflow Steps - getProfiles()


So again, we've used getIndex(), we've subsetted by rectangle around Australia and then by polygon to look into that smaller area, and we've demonstrated that you can also use mapApp() to gain a little more information about the floats. Now we're going to move on to step three of the workflow, which is getProfiles(). getProfiles() looks at the index we created — in our case, just our subset — and downloads the corresponding profile data files, returning a list of files that will eventually be read.

Slide 9: Demonstration
So as of now, we are just creating that list of files. You'll notice that, in my script, this line is commented out; I'll explain why in the next video.
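For reference, step three in code is a single call like the one below; in the workshop script this line is commented out because the download can take some time. The object name sePoly follows the earlier subset sketch and is an assumption.

    # Download the data files for each profile in our polygon subset
    profiles <- getProfiles(sePoly)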

Slide 10: Workflow Steps - readProfiles()


Until now, we've used getIndex() to get all of our Argo data, we've used subset() to subset first by rectangle and then by polygon, and we've used getProfiles() to download a list of profile files. We're now moving on to step four of the workflow, which is readProfiles(). readProfiles() is the function used to read the list of files we previously downloaded with getProfiles().

In line 32, we're using readProfiles(). The user will notice in the code that this, along with a save() step, is commented out. That is because these operations, getProfiles() and readProfiles(), can take a while. For the sake of this workshop, these operations have already been completed and the output was saved in a file called argos.rda, which the user can find in their workshop_material. In other words, argos.rda contains the list of downloaded and read Argo files (the output from getProfiles() and readProfiles()). If we look at the length of this with length(argos[["argos"]]), we see that there are approximately 5000 Argo profiles in our subset of interest.
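A sketch of line 32 and the saved-output shortcut is shown below. The file name argos.rda and object name argos follow the workshop materials, while the save()/load() pattern is an assumption about how that shortcut was produced.

    # argos <- readProfiles(profiles)       # line 32: read the downloaded files (slow)
    # save(argos, file = "argos.rda")       # how the pre-computed output was likely saved
    load("argos.rda")                       # load the pre-computed result from workshop_material
    length(argos[["argos"]])                # roughly 5000 Argo profiles in our subset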

Slide 11: End of Lesson



M4 Lesson 5: Quality Control


Slide 1: Title Slide
Slide 2: Quality Control (QC)
Up until this point, we've used getIndex() to get our available index, we've subsetted both by rectangle and by polygon, and we've looked at how to take a similar approach with mapApp(). I then demonstrated how to use getProfiles() to download a list of profile files, followed by readProfiles() to actually read that list.

For the sake of this workshop, I have run getProfiles() and readProfiles() for you, saved the output, and we simply loaded it up. However, for any analysis that you do beyond this point, you would be required to run getProfiles() and readProfiles() yourself. So I do encourage you to look at that workflow image and make sure that you are following all of the proper steps.

At this point, we have a list of about 5000 Argo profiles in the south-eastern part of Australia. The final step of the workflow is to analyse the data. Before we analyse the data, we want to make sure that the data we're looking at are clean. So, in addition to the main workflow, we also created a workflow for quality control. The steps for quality control are: plotting the quality of the data, then using showQCTests() to look at which quality tests were performed, and then using the applyQC() function, which removes any low-quality data.

When we were talking about the oce package, I mentioned that oce has the ability to look at and deal with data flags. The argoFloats package takes a similar approach, and a lot of its functionality is built on oce. So just keep that in mind.

Slide 3: QC Steps - plot(which="QC")


To first demonstrate plot(which="QC"), you'll see that this plot shows the percentage of good data on top, compared with the mean value of the specified parameter on the bottom.

Slide 4: Demonstration
It's important to note that the QC plot can only plot one float ID at a time. For good practice, we should plot every unique ID in our polygon; for the sake of this demonstration, we're just going to arbitrarily choose the first ID.

In line 38, we're subsetting the full index, 'ai', which includes all of the Argo data (not just our subset area), for the arbitrarily chosen ID from our subset. In lines 39 and 40 we're doing getProfiles() and readProfiles() respectively, which were explained previously. In line 41, we're creating a new window because the QC output is a large plot, and in line 42 we're making a QC plot for our arbitrarily chosen ID. This plot shows the percentage of good temperature data on top and the mean temperature on the bottom. The user always has the option to specify which parameter they're interested in.
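A sketch of lines 38-42 follows. The way the arbitrary ID is extracted (unique(...)[1]) is an assumption, ai is the full index from getIndex(), and sePoly is the polygon subset from the earlier sketch.

    id <- unique(sePoly[["ID"]])[1]              # arbitrarily take the first float ID in the polygon
    indexTest <- subset(ai, ID = id)             # line 38: all cycles for that one float
    profilesTest <- getProfiles(indexTest)       # line 39: download its profile files
    argosTest <- readProfiles(profilesTest)      # line 40: read them into R
    dev.new()                                    # line 41: open a new, larger plotting window
    plot(argosTest, which = "QC", parameter = "temperature")  # line 42: QC plot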

This tells us that, for at least one of the float IDs in our polygon subset, the float had been going along and things seemed to be OK, but then the red dots appear, which are associated with bad points. This means that near the end of its life cycle something went wrong. The question now posed is, "what went wrong?"

Slide 5: QC Steps - showQCTests()


Now that we've seen how to get a visual of the quality of the data, we need to determine why certain data are flagged bad and what happened to the floats that have bad data. That's where the next step of the quality control workflow comes into play: showQCTests().

Slide 6: Demonstration
According to the help documentation (?showQCTests), showQCTests() prints a summary of quality control tests that were performed and/or failed on an Argo profile.

In line 46, we're creating a list of all of the cycles, and in line 47 we're creating a list of bad flags. According to the Argo User's Manual, anything flagged 0, 3, 4, or 9 is considered bad. Using this information, in the for loop we're looking through the list of roughly 5000 cycles and identifying whether there are any temperature values flagged 0, 3, 4, or 9. If there are, the code prints a message telling us which ID is bad and why it is flagged bad (i.e., it performs showQCTests() on that cycle). Something that's important to note with showQCTests() is that in some cases bad flags are identified, yet it also says "Passed all tests". This is because there is a downside to the showQCTests() function: Argo files only store which tests were performed and failed for the real-time data, not the delayed-mode data. This means that if a scientist goes back two years later and flags something as bad, there is no record of why that particular test was failed. This is the case for the ID that we arbitrarily chose for the QC plot. In our QC plot, we can see that cycle 220 was flagged bad; however, showQCTests() says "Passed all tests". This means that, for some reason, during the delayed-mode testing temperature was flagged bad, but we don't know why. Lastly, the user also has the option of typing showQCTests(argosTest[[220]], style="full") to determine which test each number is associated with.
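The loop described above might look roughly like the sketch below. Accessing flags via [["flags"]] follows oce conventions and is an assumption about the exact extraction used in the workshop script; the final commented line simply reproduces the call quoted above.

    cycles <- argos[["argos"]]          # line 46: list of individual Argo cycles
    badFlags <- c(0, 3, 4, 9)           # line 47: flag values considered bad
    for (i in seq_along(cycles)) {
        flags <- cycles[[i]][["flags"]][["temperature"]]
        if (any(flags %in% badFlags)) {
            cat("Cycle", i, "has bad temperature flags\n")
            showQCTests(cycles[[i]])    # report which real-time tests were performed/failed
        }
    }
    # Full descriptions of the test numbers for a single cycle, as mentioned above:
    # showQCTests(argosTest[[220]], style = "full")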

Slide 7: QC Steps - applyQC()


Now that we have a visual representation of the quality of the data, and have used showQCTests() to determine why data were flagged bad during the real-time quality control tests, we move on to the third and final step of the quality control workflow, which is applyQC(). applyQC() removes any data that has been flagged bad so that it is no longer used in future calculations or analysis.

Slide 8: Demonstration
In line 57, when we use applyQC(), we tell R to look at all of the profiles within our polygon and remove any data that are flagged bad. In line 58, we then redefine our cycles variable to consider only the cleaned-up data.
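A sketch of lines 57-58, assuming applyQC() follows the usual oce convention of replacing flagged values with NA:

    clean <- applyQC(argos)        # line 57: set any data flagged bad to NA
    cycles <- clean[["argos"]]     # line 58: redefine cycles using only cleaned-up data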

Slide 9: End of Lesson
