Module 4 Script
The next thing we're going to do is look at that second barrier. The second barrier is the need for access to long-term, large-spatial-coverage data. The solution to this is Argo floats. Argo floats provide large spatial coverage and long time series of chemical, physical, and biological measurements. By long time series, we're talking since about the year 2000. They also provide real-time, quality-controlled data, and they can measure down to depths of 6000 m in the case of deep Argo floats, though in general they measure down to about 2000 m. They also provide publicly available data.
See Kelley and Richards (2020) for more on oce and Kelley (2018) for more details on general oceanographic analysis with R. There are several software tools available for accessing and analysing Argo data:
- argodata: an R package that uses the data frame as the primary data structure rather than multidimensional arrays (aimed at moderate-to-expert coders);
- rnoaa: an R package that provides limited access to a subset of Argo data from the North Atlantic;
- argopy: a Python package used to easily fetch and manipulate measurements from Argo floats; and
- argoFloats: an R package that identifies, downloads, caches, and analyses oceanographic data collected by Argo profiling floats, while handling quality control.
Slide 4: Application
The idea for this workshop is that we are going to follow that workflow and apply it to a real-world example. In other words, we are going to analyse Argo data using the R package argoFloats to determine if the ocean temperature near Australia is warming. It's very similar to what we did at the beginning, when we were looking at BATS, the Bermuda Atlantic Time-series Study data. The difference now with Argo floats is that we just arbitrarily chose to look around Australia, but we could have chosen anywhere, because of the coverage that Argo floats provide. That's very different from the CTD BATS data: we had to look at that data because it's very rare to have long-term CTD data, but that's not the case for Argo data.
Slide 6: Setup
For this section of the workshop, there are some additional setup tools. It's important to remember that our goal is to determine if the ocean temperature near Australia is warming using Argo data.
The first two lines load the required packages, and in line 3 we're simply accessing the coastlineWorld dataset, which shows us where the world's coastlines are. In line 5 we're creating a folder in our top directory named "data", and in line 6 we're going into that folder and creating another folder named "argo". We will explain why we did this later on. In line 8, we're using getIndex(), which is the first step in our argoFloats workflow.
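Based on the narration above, the setup chunk might look something like the following sketch (the exact workshop code may differ; the dir.create() calls and the bare getIndex() call shown here are assumptions):

```r
library(argoFloats)    # Argo index and profile tools
library(oce)           # general oceanographic analysis
data(coastlineWorld)   # coastline dataset provided by oce

# Create ~/data and then ~/data/argo to hold downloaded Argo files
dir.create("~/data", showWarnings = FALSE)
dir.create(file.path("~/data", "argo"), showWarnings = FALSE)

# First step of the argoFloats workflow: download the index of profiles
ai <- getIndex()
```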
Slide 3: Demonstration
In the help pages we can also see the names of the arguments for the getIndex() function. The first is filename. If the user scrolls down in the help pages, they can see that there are a number of filename arguments they can give. This argument specifies what type of file the user wants to get, and by default it gets the core data, including temperature, salinity, and pressure. If, for example, the user was interested in studying changes in oxygen, they would instead get the bgc filename. The second argument is server. The server argument tells R where to pull the data from. For Argo data, there are two main servers: USGODAE, based out of the USA, and Ifremer, based out of France. By default, argoFloats uses the USA server, but the user always has the option to change that. Next is the destdir, or destination directory, argument. This argument tells the getIndex() function where to store the Argo data on your computer. By default, it looks for ~/data/argo, which is why we created those directories, but the user always has the option of storing it wherever they want. The last one we'll talk about, as the others are more for developers, is the age argument. The age argument is in days. If, for example, the user set the age argument to 7, then if the user went to get the index again but had already got it within the last 7 days, it wouldn't be downloaded again. In contrast, in our case we have age=0, which means that we want to download the most recent available Argo index regardless of when we last got it. With all of that being said, in line 8 we're getting the most recent available core index from the default server (USGODAE) and storing it in the default folder (~/data/argo).
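Putting those four arguments together, a call equivalent to the defaults just described might look like the following (the argument values are spelled out for illustration; check ?getIndex for the exact accepted strings):

```r
library(argoFloats)
# Core index (temperature, salinity, pressure) from the US GODAE server,
# cached in ~/data/argo, and re-downloaded regardless of cache age
ai <- getIndex(filename = "core",
               server   = "usgodae",
               destdir  = "~/data/argo",
               age      = 0)
# For an oxygen study, one would instead use filename = "bgc"
```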
Slide 5: Demonstration
Now what we're going to do is type ?`subset,argoFloats-method` in our console to see how to subset in a variety of different ways, with explanations and examples. In our case, we're going to first subset by rectangle around Australia. According to the help documentation, to subset by rectangle the user needs to provide a list named rectangle that has elements named longitude and latitude. These elements are just the limits of the longitude and latitude around Australia. In line 11 that's exactly what we're doing, followed by line 12, which simply plots up our study region.
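A sketch of that rectangle subset, with illustrative longitude and latitude limits around Australia (the workshop's actual limits may differ):

```r
# Keep only index entries inside a box around Australia
aiAus <- subset(ai, rectangle = list(longitude = c(105, 165),
                                     latitude  = c(-45, -10)))
# Plot the subsetted index to show the study region
plot(aiAus, which = "map")
```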
For our analysis, we're not interested in all of the floats around Australia, but rather solely the floats in the south-eastern part. This means our next step is to subset by polygon. The reason we're doing this is that at the end we're going to compare our results to another study that looked at the change in temperature, and that study focused solely on that area. In our help pages, if the user looks at the subset by polygon, it says that it's a list named polygon that has elements named longitude and latitude that are numeric vectors specifying a polygon within which profiles will be retained. This means any Argo floats within the specified polygon will be kept. To give the user a visual, in line 19 I provided you with the specified polygon by arbitrarily choosing points in the south-east part of Australia. In line 24, we're simply drawing the lines around the polygon. This shows that we will keep anything within that red line. That is what is occurring in line 25, where we're doing the subset, and in line 26 we're simply plotting up our new study region.
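Those steps might be sketched as follows, assuming an index subset aiAus from the rectangle step (the polygon coordinates here are illustrative, not the workshop's):

```r
# Arbitrary polygon in the south-east; the first and last
# points are repeated so the polygon closes on itself
lon <- c(145, 155, 155, 145, 145)
lat <- c(-45, -45, -30, -30, -45)
lines(lon, lat, col = "red")   # draw the red outline on the existing map

# Retain only the profiles that fall inside the polygon
aiSE <- subset(aiAus, polygon = list(longitude = lon, latitude = lat))
plot(aiSE, which = "map")      # plot the new study region
```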
Slide 6: mapApp()
Before we move on to the next step of the workflow, I want to first mention the app that we created, which is mapApp(). mapApp() is a graphical user interface (GUI) that permits specifying a spatio-temporal region of interest, as well as choosing a set of float types to show. Then you can simply click the "Code" button to reveal the argoFloats function calls needed to isolate the data.
Slide 7: Demonstration
So let me show you why I'm mentioning this now. To access the app that we created, all you have to do is type mapApp() in the console. Now I am going to highlight in and around Australia, which just zooms in. There is also a Help button which, if clicked, will provide an explanation of how you can use this app. I encourage you to look at that in your own time. If I click the Code button, it will show me all of the code required to do the subset by rectangle that I just zoomed in on. So the user has the option to copy and paste this code into the console. Recently, we have also added the ability to subset by polygon. To do that, you need to click around the map to choose your polygon; you would then click Code, and this would show you the code required to look into that specific area. I encourage you to try that by yourself, as it is a great way to get familiar with the code and with the package.
Slide 9: Demonstration
So as of now, we are just creating the list. You'll notice that, for me, this is commented out, and I'll explain why in the next video.
In line 32, we're using readProfiles(). The user will notice in the code that this, along with a save argument, is commented out. That is because these operations, getProfiles() and readProfiles(), can take a bit of time. For the sake of this workshop, these operations have been completed and the output was saved in a file called argos.rda, which the user can find in their workshop_material. In other words, argos.rda contains a list of downloaded and read Argo files (the output from getProfiles() and readProfiles()). If we look at the length of this by doing length(argos[["argos"]]), we see that there are approximately 5000 Argo profiles in our subset of interest.
For the sake of this workshop, I have done getProfiles() and readProfiles() for you, saved the result, and we just loaded it up. However, for any actual analysis that you do beyond this point, you would be required to do getProfiles() and readProfiles() yourself. So I do encourage you to look at that workflow image and make sure that you are following all of the proper steps.
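The download/read/save pattern just described could be sketched as follows (object and file names follow the narration; treat the details as assumptions):

```r
# Slow steps: download the NetCDF files, then read them into R
profiles <- getProfiles(aiSE)
argos    <- readProfiles(profiles)

# Save the result so the slow steps need not be repeated
save(argos, file = "argos.rda")

# In the workshop we skip straight to loading the saved output
load("argos.rda")
length(argos[["argos"]])   # approximately 5000 profiles
```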
Up until this point, we have a list of about 5000 Argo profiles in the south-eastern part of Australia. The final step of this workflow is to analyse the data. Before we analyse the data, we want to make sure that the data we're looking at is clean. So, in addition to the main workflow, we also created a workflow for quality control. The steps for quality control include plotting the quality of the data, then using showQCTests() to look at which quality tests were performed, and then using the applyQC() function, which removes any low-quality data.
When we were talking about the oce package, I mentioned that the oce package has the ability to look at and deal with data flags. The argoFloats package uses a similar approach to oce, and a lot of its functionality is based on the oce functionality. So just keep that in mind.
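The three quality-control steps just listed map onto three calls; a sketch, where argosTest stands for the single-float object used later in the demonstration:

```r
# 1. Plot the data quality for one float ID
plot(argosTest, which = "QC", parameter = "temperature")
# 2. Show which QC tests were performed and/or failed on one cycle
showQCTests(argosTest[[1]])
# 3. Discard any data flagged as bad
argosClean <- applyQC(argos)
```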
Slide 4: Demonstration
It's important to note that the QC plot can only plot one ID at a time. For good practice, we should be plotting all of the unique IDs in our polygon. For the sake of this demonstration, we're just going to arbitrarily choose the first ID.
In line 38, we're subsetting all of our data so that "ai" includes all of the Argo data, not just our subset area, and then subsetting for the arbitrarily chosen ID from our subset. In lines 39 and 40 we're doing getProfiles() and readProfiles() respectively, which were explained earlier. In line 41, we're creating a new window because the QC plot output is a big plot, and then in line 42 we're making a QC plot of our arbitrarily chosen ID. This plot shows the percentage of good temperature data on top and the mean temperature on the bottom. The user always has the option to specify which parameter they're interested in.
This tells us that for at least one of the float IDs in our polygon subset, the float had been going along and things seemed to be OK, but then the red dots indicate bad points. This means that near the end of its life cycle, something went wrong. The question is now posed: what went wrong?
Slide 6: Demonstration
According to the help documentation (?showQCTests), showQCTests() prints a summary of the quality control tests that were performed and/or failed on an Argo profile.
In line 46, we're creating a list of all of the cycles, and in line 47 we're creating a list of bad flags. According to the Argo User's Manual, anything that is flagged 0, 3, 4, or 9 is considered bad. Using this information, in the for loop we're looking through the list of 5000 cycles and identifying whether there are any temperature flags that are flagged 0, 3, 4, or 9. If there are, the code will print a message telling us which ID is bad and why it is flagged bad (i.e., it will perform showQCTests() on that cycle). Something important to note is that with showQCTests() we'll notice in some cases that bad flags are identified, but it also says "Passed all tests". This is because of a shortcoming of the showQCTests() function: Argo files only store which tests were performed and failed for the real-time data, not the delayed-mode data. This means that if a scientist goes back two years later and flags something as bad, there is no record of why that certain test was failed. This is the case for the ID that we arbitrarily chose for the QC plot. In our QC plot, we can see that cycle 220 was flagged bad; however, it says "Passed all tests". This means that for some reason, during the delayed-mode testing, temperature was flagged bad, but we don't know why. Lastly, the user also has the option of typing showQCTests(argosTest[[220]], style="full") to determine which test each number is associated with.
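The core of that for loop is the bad-flag test itself, which can be sketched in plain R with made-up flag vectors (a real loop would pull the temperature flags from each cycle object):

```r
badFlags <- c(0, 3, 4, 9)      # flags treated as bad, per the Argo User's Manual
flagsCycleA <- c(1, 1, 1, 1)   # hypothetical cycle: all points good
flagsCycleB <- c(1, 1, 4, 1)   # hypothetical cycle: one point flagged bad

any(flagsCycleA %in% badFlags) # FALSE: nothing reported for this cycle
any(flagsCycleB %in% badFlags) # TRUE: this cycle would trigger showQCTests()
```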
Slide 8: Demonstration
In line 57, when we use applyQC(), we tell R to look at all of the profiles within our polygon and remove any data that is flagged bad. In line 58, we're then redefining our cycles variable to consider only cleaned-up data.
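A sketch of those two lines, under the assumption that the cycles variable is rebuilt from the cleaned object:

```r
# Apply the quality-control flags: bad-flagged data are discarded
clean <- applyQC(argos)
# Rebuild the cycles list from the cleaned data only
cycles <- clean[["argos"]]
```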