
Citi 2020

The document discusses using temporal regression models to predict chlorophyll-a presence around the Galapagos Islands using open access data from the Copernicus space program. The authors aim to build a reliable temporal model for forecasting chlorophyll-a concentrations to help prevent illegal fishing. They explore applying a convolution vector transformation to input data to highlight the spatial and temporal components when extracting a regression model from spatial-temporal oceanographic data. Initial results from their approach could be used to design a more complex and implementable prediction system for real-time chlorophyll-a forecasting.

Uploaded by

Fernando Chavez

Temporal Aspects of Chlorophyll-a Presence Prediction Around Galapagos Islands

First Author1[0000-1111-2222-3333], Second Author2,3[1111-2222-3333-4444], and Third Author3[2222-3333-4444-5555]

1 Princeton University, Princeton NJ 08544, USA
2 Springer Heidelberg, Tiergartenstr. 17, 69121 Heidelberg, Germany
[email protected]
http://www.springer.com/gp/computer-science/lncs
3 ABC Institute, Rupert-Karls-University Heidelberg, Heidelberg, Germany
{abc,lncs}@uni-heidelberg.de

Abstract. Chlorophyll-a is a specific form of chlorophyll used in oxygenic
photosynthesis which has been linked to nutrient presence, and being able to
correctly predict its concentrations may turn out to be a key step in helping
to prevent and control illegal fishing activities in a certain area. In this
work, we consider open access data taken from the Copernicus space program
(currently used in the European Union for Earth observation and monitoring)
that include several physical and biochemical variables and measures of the
ocean surrounding the Galapagos Islands (Ecuador). We use such data in an
attempt to build a reliable temporal model that can be used to forecast the
presence of Chlorophyll-a, using a novel technique called temporal regression.
Our initial results can be used to design a more complex, reliable, and
implementable prediction model for real-time forecasting of Chlorophyll-a
presence.

Keywords: Chlorophyll-a concentrations · Temporal regression · Illegal fishing prevention

1 Introduction

Oceanography is an important Earth science that studies the physical and
biological aspects of the ocean, and it requires large amounts of data for
modelling, investigating, predicting, and explaining the different natural
phenomena. Such data are usually provided by scientific instruments like
satellites, oceanographic ships, buoys, and others. Because available data are
growing in quantity and in complexity, the need is emerging for a machine
learning approach to integrate, if not substitute, the more classical
statistical take in oceanographic research. Among the many applications of
oceanography to marine resources management, controlling and preventing illegal
fishing stands out as a very important one. Illegal, unregulated, and
unreported fishing is becoming more sophisticated, and, as it turns out,
oceanographic conditions are predominant predictors of the seasonal variations
in fishing effort [2]. Automatically identifying favourable fishing zones is
one possible strategy for helping to monitor and prevent illegal fishing
activity, and, to this end, automatically identifying favourable oceanic
conditions is a promising approach.
Chlorophyll-a is a specific form of chlorophyll used in oxygenic photosynthesis
which has been linked to nutrient presence in several different areas
[7, 19, 6]. It is known that certain kinds of satellite data can be used to
predict the presence of Chlorophyll-a in oceanic areas [6, 9]. In this work, we
consider open access data taken from the Copernicus space program, currently
used in the European Union for Earth observation and monitoring, in an attempt
to build a reliable spatial-temporal prediction model for Chlorophyll-a
presence around the Galapagos Islands, and, in particular, in the Galapagos
Marine Reserve (GMR). The purpose is to create the basis for an implementable,
cost-effective, and reliable model for potential fishing area prediction to be
used in illegal fishing control activities. Chlorophyll-a presence at a certain
geographical point, in combination with relevant physical, chemical, and
biological variables at the same point, can be thought of as a multivariate
spatial-temporal series, in which Chlorophyll-a plays the role of the dependent
variable. As such, multivariate spatial-temporal regression can be used to
estimate not only the functional model, but also the temporal component for
each predictor. In extracting a regression model from spatial-temporal data,
one can benefit from a suitable transformation of the data itself that
highlights the roles played by the spatial and by the temporal components. Such
a transformation is known as a convolution vector [8]. In this work we want to
assess the expected improvement that can be obtained by applying a convolution
vector to the original data, effectively paving the way toward a more
systematic exploration and optimization of the possible data transformations
that take into account both components, space and time.
This paper is organized as follows. In Section 2 we give a short account of
the current literature that concerns oceanographic data and learning. In
Section ?? we give some practical motivations for this work and the problem we
want to solve. Then, in Section ?? we describe the data that we have used and
the mathematical model that we have applied. Finally, in Section ?? we describe
our practical approach and its results, before concluding.

2 Related Work
As the quantity, the complexity, and the availability of oceanographic data
grow, machine learning-based approaches to their analysis are becoming ever
more common [1]. Typical applications range from climate prediction, habitat
modeling, and climate change analysis, to species distribution, species
identification, resource management, and environmental protection (see [18] for
a recent review). Examples of concrete applications include species
identification [10], automatic detection and classification of ocean pollution,
oil spills, algal blooms, and plastic pollution [5], as well as several fishing
control-related applications. Fishing control and connected activities are of
special interest in this work.
The surveillance of illegal fishing activities and the detection of abnormal
fishing vessel behaviors are critical issues for the management of marine
resources. Machine learning techniques were employed for fishing gear
recognition, starting from vessel monitoring system (in short, VMS) data, to
detect abnormal VMS patterns of fishing vessels in Indonesia [13]. Also, while
the coastal fisheries in national waters are closely monitored, at least by
some countries, in the high seas there is a lot of uncertainty. For the
automatic control of fishing activities in the high seas it is necessary to
understand the general behaviour of fishing fleets, to enforce fisheries
management and conservation measures worldwide. Satellite-based automatic
identification systems (in short, S-AISs) are now commonly installed on vessels
to track the ships' positions, and have been proposed as a tool to monitor the
movements of fishing fleets in near real-time. Using these data, models have
been developed to detect potential fishing activity from trawlers, longliners,
and purse seiners [17]. However, illegal fishing control is related to ocean
resource and habitat management, because it affects the conservation of fishery
resources, and taking the correct decisions is often a hard problem, due to the
non-availability of specific data; as observed in [18], machine learning
techniques have demonstrated the potential to eliminate data gaps, predict
future events, and increase the accuracy of the results. Satellite remote
sensing for marine applications started in the early 1960s, with the first
pictures of the Earth. Currently, satellites are being used in the indirect
detection of fish, via measuring water temperature, which is the most used
environmental parameter in investigations concerning the relationship between
environment and fish abundance [16]. Nevertheless, other oceanographic
variables exist that can be used to increase the accuracy of the prediction.
The lack of datasets that include such oceanographic variables taken directly
from permanent observation stations or from ships has contributed to satellite
remote sensing playing a pivotal role, considering that satellites offer the
opportunity to measure and monitor multiple oceanographic variables at the same
time [14]. Thus, remote sensing using satellite sensor systems has been applied
on large spatial scales with high temporal resolutions in coastal waters. While
ocean color satellites suffer from serious limitations, such as the low spatial
resolution of sensor systems, machine learning techniques such as artificial
neural networks and support vector machines have been used to develop an
optimal chlorophyll-a model, for example in [12].

Sea surface temperature and chlorophyll images are considered fundamental for
the identification of fishing zones, which in turn is essential for illegal
fishing detection. Features such as eddies, gyres, meanders, and upwelling are
indicative of fish abundance areas, and these can be derived from satellite
information. One of the elements that indicate the quality of the water is the
concentration of Chlorophyll-a, whose presence is highly correlated with the
phytoplankton biomass [6]. Phytoplankton is the base of the food chain in
marine ecosystems, and it is mainly responsible for primary production. A very
recent multivariate statistical approach to predict Chlorophyll-a levels in
coastal marine ecosystems, taking into account 20 variables from 64
observations, is reported in [9]. The main differences between [9] and our
approach are that the former uses coastal data extracted from sensors, instead
of high sea data from satellites, and, moreover, it does not include a
spatio-temporal study of the cause-effect relationships.

Fig. 1. Fishing ships around the Galapagos exclusive economic zone. Snapshot
taken in August 2019. Source: http://www.globalfishingwatch.org.

3 Motivation
The Galapagos Islands are located more than 1000 km west of Ecuador's
continental coast. In 1998, the Government of Ecuador created the Galapagos
Marine Reserve (in short, GMR) to preserve the resources of the islands. In
2001, Galapagos was declared a World Heritage Site by UNESCO. Due to their
location, the islands receive, from the east, the influence of two currents,
the cold Humboldt current and the warm Panama current, while, from the west,
that of the cold and deep Cromwell current. These currents carry nutrient-rich
waters from the bottom of the sea to the surface. As a result of this
combination, Galapagos has extremely high productivity areas with diverse
marine organisms [11]. Because of its rich seas, this ecosystem attracts
various species towards the exclusive economic zone of the Galapagos and its
surroundings, and, for this reason, it receives pressure from industrial
fishing, principally from Asia, as can be seen in Fig. 1. Often, the activities
around the maritime limits result in illegal fishing, which is very difficult
to control due to the immense maritime territory of the Galapagos. The
effective control of maritime spaces can only be carried out through satellite
monitoring; however, this is an extremely expensive solution. Other, less
expensive solutions require the use of VMSs and S-AISs, which are installed
onboard the fishing ships to monitor their position and fishing activities;
nevertheless, these systems can be disconnected by the ships when performing
illegal fishing activities.
An alternative solution to help illegal fishing control while reducing the cost
of surveillance is being able to predict the area where fishing activity may
take place. This prediction is related to oceanographic variables, chlorophyll
levels and sea temperatures being two of the most important ones. As a matter
of fact, it is known that the distribution and migration of species are
strongly influenced by these two variables [19]. Therefore, in this research we
try to develop a model that can predict chlorophyll levels in the open sea, in
an attempt to identify the areas where a high concentration of chlorophyll is
most likely to be found. With this information, law enforcement ships can
monitor these areas more closely, in the hope of intercepting ships during
illegal and unregulated fishing activities.

4 Chlorophyll Prediction
4.1 Data
The space program that is currently used in the European Union for Earth
observation and monitoring is called Copernicus, and it encompasses three
complete constellations, each one with two satellites, plus an additional
single satellite. This system provides 150 TB of open access data every day,
including fundamental measurements or estimates of several physical, chemical,
and biological oceanic variables. Ocean color information from the Sentinel
satellites is employed for monitoring water quality through chlorophyll-a and
phytoplankton analysis; other oceanic variables such as wave height, tide, sea
current, salinity, temperature, nutrients, and oxygen are used to develop
hydrodynamic models to forecast the evolution of ocean variables relevant for
aquaculture. Moreover, temperature, salinity, mixed layer thickness, wind, sea
currents, wave heights, chlorophyll, phytoplankton, zooplankton, and nutrients
are used to develop models related to oceanic conditions and the spatial
distribution of fish habitat [15].
Satellite data. Years. Granularity. Available columns. Basic statistical values
for the columns (mean, variance, skewness, kurtosis, normality, etc.). Missing
data. Preprocessing (only classical preprocessing, not the preprocessing
specific to this problem).
The study area of this research is the region around the Galapagos Islands
between 6° N and 10° S and between 85° W and 116° W. The study has been
conducted using the E.U. Copernicus Marine Service Information products [3] and
[4], namely Global Physical Analysis and Coupled System Forecasting (Global
Analysis Forecast-PHY-CPL-001-012) and Global Biogeochemical Analysis and
Forecast (Global Analysis Forecast-BIO-001-028). The first product provides
physical variables and the second provides biological and chemical variables.
The variables considered are summarized in Table 1 and Table 2. The dataset was
constructed with daily mean values of these variables for January 2018, with a
granularity of 1/4°, obtaining a total of 8124 samples per day for each
variable without missing data. Basic statistical values of the physical and
bio-chemical variables are given in Table 4 and Table 3, respectively. An
advantage of the Copernicus system is the availability of data at various
depths. A depth of 40 meters was selected because the two products use
different depth scales, which coincide at this depth.
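The summary statistics reported in Tables 3 and 4 could be computed along the following lines. This is a minimal sketch rather than the procedure actually used: the function name `describe`, the use of population (1/n) moments, and the synthetic log-normal sample standing in for one day of chlorophyll values are all assumptions made for illustration. Kurtosis is computed as excess kurtosis (zero for a Gaussian), which is consistent with the negative kurtosis entries in the tables.

```python
import numpy as np

def describe(values):
    """Summary statistics for one variable; NaNs (e.g. masked cells) are dropped."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    mean = x.mean()
    centered = x - mean
    m2 = np.mean(centered ** 2)  # population variance (assumption: 1/n, not 1/(n-1))
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    return {
        "mean": float(mean),
        "median": float(np.median(x)),
        "max": float(x.max()),
        "min": float(x.min()),
        "variance": float(m2),
        "std": float(np.sqrt(m2)),
        "skewness": float(m3 / m2 ** 1.5),
        "kurtosis": float(m4 / m2 ** 2 - 3.0),  # excess kurtosis
    }

# Synthetic stand-in for one variable (NOT Copernicus data).
rng = np.random.default_rng(0)
chl = rng.lognormal(mean=-1.0, sigma=0.6, size=8124)
stats = describe(chl)
```

A constant series has zero variance, so its skewness and kurtosis are undefined; the sketch does not guard against that case.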
Table 1. Bio-Chemical Variables [3]

Variable  Definition                                       Unit
Chl       Total Chlorophyll                                [mg m-3]
          Mass concentration of chlorophyll in sea water (40 meters)
Fe        Dissolved Iron                                   [mmol m-3]
          Mole concentration of dissolved iron in sea water
NO3       Nitrate                                          [mmol m-3]
          Mole concentration of nitrate in sea water
Nppv      Total Primary Production of Phytoplankton        [mg m-3 day-1]
          Net primary production of biomass expressed as carbon per unit
          volume in sea water
O2        Dissolved Oxygen                                 [mmol m-3]
          Mole concentration of dissolved molecular oxygen in sea water
pH        pH                                               [1]
          Sea water pH reported on total scale
Phyc      Total Phytoplankton                              [mg m-3]
          Mole concentration of phytoplankton expressed as carbon in sea water
PO4       Phosphate                                        [mmol m-3]
          Mole concentration of phosphate in sea water
Si        Dissolved Silicate                               [mmol m-3]
          Mole concentration of silicate in sea water
SPCO2     Surface CO2                                      [Pa]
          Surface partial pressure of carbon dioxide in sea water

Table 2. Physical Variables [4]

Variable  Definition                               Unit
SST       Sea water potential temperature (0 m)    [°C]
T-40m     Temperature (40 m)                       [°C]
so        Salinity                                 [1e-3]
zos       Sea surface height                       [m]
mlotst    Mixed layer depth                        [m]
Uo        Eastward sea water velocity              [m s-1]
vo        Northward sea water velocity             [m s-1]
Table 3. Statistical Values - Bio-chemical Variables

Statistic        SpCO2          O2         NO3         PO4          Si          pH          Fe        Nppv        Phyc       Chl-a
Mean           47.4768    177.7013      9.9591      1.1534      7.3785      7.9317  0.00009847      8.6971      1.6231      0.4479
Median         47.7869    188.6593      8.1943      1.0937      6.7654      7.9482  0.00007000      8.1185      1.3614      0.3199
Maximum        62.0582    230.2693     26.8773      2.0968     21.6457      8.0333  0.00064200     34.6280      5.3696      2.0252
Minimum        32.7676     57.9499      0.2017      0.4255      2.6331      7.6647  0.00000523      0.2895      0.4041      0.1276
Variance        9.7035   1630.8175     32.9802      0.0762      9.7871      0.0034  0.00000001     11.8383      0.4076      0.0937
Skewness       -0.2980     -0.6807      0.6253      0.5264      0.8179     -1.4098  1.54613039      1.4275      2.0830      1.6868
Kurtosis        0.7984     -0.6408     -0.6369     -0.0834     -0.0139      2.0112  2.20067769      3.2961      4.2917      2.4735
Std. Dev.       3.1150     40.3834      5.7428      0.2760      3.1284      0.0583  0.00008757      3.4407      0.6385      0.3060

Table 4. Statistical Values - Physical Variables

Statistic         Vo        Uo    mlotst        so       zos       SST     T-40m
Mean         -0.0532   -0.0260   13.2966   34.8830    0.2082   24.3530   21.1912
Median       -0.0310   -0.0300    9.0000   34.9630    0.2000   24.3590   21.9920
Maximum       1.1090    1.2640   87.3000   36.6770    0.4640   29.0980   28.1780
Minimum      -1.4190   -1.4050    2.9000   32.9000    0.0400   18.3470   11.4110
Variance      0.0518    0.0703  103.1637    0.1903    0.0037    1.8850    9.8673
Skewness     -0.5126   -0.0570    1.8289   -0.8854    0.6107   -0.0170   -0.5063
Kurtosis      1.8535    1.4522    3.7690    1.2080   -0.0056   -0.1140   -0.8191
Std. Dev.     0.2275    0.2652   10.1570    0.4362    0.0610    1.3730    3.1412
4.2 Temporal Predictors


What is lagged regression? Explain multivariate linear regression and the role
of lagged variables. Mathematical formulation. Initial experiments (on only a
small amount of data) to tune the amount of lag. Explain why autoregression is
not used.
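Since this section is still an outline, the following is only an illustrative sketch of one common reading of lagged regression, not the formulation of this paper: the target at day t is modelled as a linear function of the predictors at days t, t-1, ..., t-p, while past values of the target itself are deliberately left out (including them would make the model autoregressive). The helper names `lagged_design` and `fit_ols`, the maximum lag of 2, and the synthetic series are assumptions made for the example.

```python
import numpy as np

def lagged_design(X, y, max_lag):
    """Build predictors [X[t], X[t-1], ..., X[t-max_lag]] for each target y[t]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Block k holds the predictors delayed by k time steps.
    blocks = [X[max_lag - k : n - k] for k in range(max_lag + 1)]
    return np.hstack(blocks), y[max_lag:]

def fit_ols(D, target):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(D)), D])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return beta[0], beta[1:]

# Synthetic check (NOT Copernicus data): the target depends on yesterday's predictor.
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 1))
y = np.zeros(60)
y[1:] = 1.0 + 2.0 * x[:-1, 0]

D, target = lagged_design(x, y, max_lag=2)
intercept, coefs = fit_ols(D, target)
# With a single predictor, coefs are ordered by lag (0, 1, 2);
# in this noise-free example the weight sits on lag 1.
```

With several predictors the columns are grouped lag by lag (all variables at lag 0, then all at lag 1, and so on), so the fitted coefficients show directly which delay of each variable carries weight.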

4.3 Experiment
Explain experimental parameters. Describe training data set and test data set.
Table with test results. Results for 1-unit prediction, 2-unit prediction, etc.
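As a placeholder for the experimental protocol, the sketch below shows one conventional way to set up a training/test split and h-unit-ahead targets for time-ordered data. It is an assumption about how such an experiment could be organised, not a description of the experiment itself: the helper names, the 25% test share, and the use of RMSE as the error measure are all hypothetical choices.

```python
import numpy as np

def chronological_split(n_samples, test_fraction=0.25):
    """Index arrays for a train/test split that keeps time order (no shuffling)."""
    cut = int(round(n_samples * (1.0 - test_fraction)))
    return np.arange(cut), np.arange(cut, n_samples)

def shift_target(X, y, horizon):
    """Pair the predictors at step t with the target at step t + horizon."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return X[:-horizon], y[horizon:]

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long series."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# 31 daily steps, as in one month of daily means; the test share is an assumption.
train_idx, test_idx = chronological_split(31, test_fraction=0.25)

# Toy series (NOT Copernicus data) to show the alignment for a 2-unit prediction.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.arange(10, dtype=float)
X2, y2 = shift_target(X, y, horizon=2)
```

Keeping the test days strictly after the training days avoids evaluating a forecast on days that precede the data the model was fitted on.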
The relationships between chlorophyll levels and the other physical and
biochemical variables were identified through simple correlation matrices
(Table 5 and Table 6). Taking chlorophyll as the reference, there is a high
positive correlation with nutrients (NO3, PO4, Si, Fe), phytoplankton, and
primary production, a strong negative correlation with oxygen and pH, and a
moderate negative correlation with temperature at 40 m and mixed layer depth.
Other parameters, such as surface CO2, salinity, sea surface height, and sea
current velocity, have a low correlation.
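Correlation matrices such as those in Tables 5 and 6 are plain Pearson correlations between pairs of variables. A minimal sketch of the computation follows; the helper names and the synthetic columns (built only to mimic the signs described in the text) are assumptions, and no Copernicus data are used.

```python
import numpy as np

def correlation_matrix(columns):
    """Pearson correlation between named 1-D series of equal length."""
    names = list(columns)
    data = np.vstack([np.asarray(columns[name], dtype=float) for name in names])
    return names, np.corrcoef(data)  # each row of `data` is one variable

def rank_against(names, corr, reference):
    """Other variables sorted by |r| with the reference, strongest first."""
    i = names.index(reference)
    pairs = [(name, float(corr[i, j])) for j, name in enumerate(names) if j != i]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)

# Synthetic stand-ins (NOT Copernicus data).
rng = np.random.default_rng(0)
base = rng.normal(size=500)
columns = {
    "Chl": base,
    "NO3": base + 0.3 * rng.normal(size=500),   # positively related by construction
    "O2": -base + 0.3 * rng.normal(size=500),   # negatively related by construction
    "zos": rng.normal(size=500),                # unrelated by construction
}
names, corr = correlation_matrix(columns)
ranking = rank_against(names, corr, "Chl")
```

Ranking by absolute value keeps strongly negative predictors, such as oxygen in Table 5, alongside the strongly positive ones when candidate predictors are screened.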

Table 5. Simple Correlation Matrix of Bio-Chemical Variables

             Chl    SpCO2       O2      NO3      PO4       Si       pH       Fe     Phyc     Nppv
Chl            1  -0.1479  -0.8706   0.8795   0.8481   0.8315  -0.8076   0.8308   0.9568   0.6796
SpCO2                   1   0.2154  -0.0461   0.0766  -0.0114   0.0697  -0.0357  -0.1811  -0.1167
O2                               1  -0.9423  -0.8888  -0.8974   0.8747  -0.8395  -0.7647  -0.5859
NO3                                       1   0.9698   0.8687   -0.882   0.8698    0.765   0.6012
PO4                                                1   0.8531  -0.8966   0.8266   0.7471   0.5896
Si                                                          1  -0.7529   0.8814   0.7113   0.4942
pH                                                                   1  -0.6824  -0.7349  -0.6987
Fe                                                                            1   0.6818   0.3676
Phyc                                                                                   1   0.7157
Nppv                                                                                            1

Table 6. Simple Correlation Matrix of Physical Variables

             Chl      SST    T-40m       Vo       Uo   mlotst       so      zos
Chl            1   0.2165  -0.4824  -0.0606    -0.02  -0.4245  -0.3062  -0.1058
SST                     1   0.3254   0.0736     0.12   0.0668  -0.4285   0.5167
T-40m                            1   0.0603  -0.0411   0.4863  -0.1902   0.5849
Vo                                        1   -0.052   0.1444   0.1362  -0.0144
Uo                                                 1    0.037   0.0292  -0.0768
mlotst                                                      1   0.3896   0.1067
so                                                                   1  -0.4936
zos                                                                           1

4.4 Results
Discuss the results.

5 Conclusions
References
1. Ahmad, H.: Machine learning applications in oceanography. Aquatic Research 2(3),
161–169 (2019)
2. Cimino, M.A., Anderson, M., Schramek, T., Merrifield, S., Terrill, E.J.: Towards a
fishing pressure prediction system for a Western Pacific EEZ. Scientific Reports 9(1),
1–10 (2019)
3. COPERNICUS: Product User Manual for Global Biogeochemical Analysis and
Forecasting Product. Marine Environment Monitoring Service (2019),
https://resources.marine.copernicus.eu/
4. COPERNICUS: Product User Manual for Global Physical Analysis and Coupled
System Forecasting Product. Marine Environment Monitoring Service (2020),
https://resources.marine.copernicus.eu/
5. Del Frate, F., Petrocchi, A., Lichtenegger, J., Calabresi, G.: Neural networks for
oil spill detection using ERS-SAR data. IEEE Transactions on Geoscience and Remote
Sensing 38(5), 2282–2287 (2000)
6. Desortová, B.: Relationship between chlorophyll-a concentration and phytoplankton
biomass in several reservoirs in Czechoslovakia. Internationale Revue der
gesamten Hydrobiologie und Hydrographie 66(2), 153–169 (1981)
7. Dutta, S., Chanda, A., Akhand, A., Hazra, S.: Correlation of phytoplankton
biomass (chlorophyll-a) and nutrients with the catch per unit effort in the PFZ
forecast areas of northern Bay of Bengal during simultaneous validation of winter
fishing season. Turkish Journal of Fisheries and Aquatic Sciences 16, 767–777
(2016)
8. Jiménez, F., Kamińska, J., E.L.S.J.P., Sciavicco, G.: Multi-objective evolutionary
optimization for time series lag regression. In: Proceedings of the 6th International
Conference on Time Series and Forecasting (ITISE). pp. 373–384 (2019)
9. Franklin, J.B., Sathish, T., Vinithkumar, N.V., Kirubagaran, R.: A novel approach
to predict chlorophyll-a in coastal-marine ecosystems using multiple linear
regression and principal component scores. Marine Pollution Bulletin 152, 110902 (2020)
10. Guisande, C., Manjarrés-Hernández, A., Pelayo-Villamil, P., Granado-Lorencio,
C., Riveiro, I., Acuña, A., Prieto-Piraquive, E., Janeiro, E., Matías, J., Patti, C.,
et al.: Ipez: an expert system for the taxonomic identification of fishes based on
machine learning techniques. Fisheries Research 102(3), 240–247 (2010)
11. Jones, P.J.: A governance analysis of the Galápagos Marine Reserve. Marine Policy
41, 65–71 (2013)
12. Kwon, Y.S., Baek, S.H., Lim, Y.K., Pyo, J., Ligaray, M., Park, Y., Cho, K.H.:
Monitoring coastal chlorophyll-a concentrations in coastal areas using machine
learning models. Water 10(8), 1020 (2018)
13. Marzuki, M.I., Gaspar, P., Garello, R., Kerbaol, V., Fablet, R.: Fishing gear
identification from vessel-monitoring-system-based fishing vessel trajectories. IEEE
Journal of Oceanic Engineering 43(3), 689–699 (2017)
14. Monolisha, S., George, G., Platt, T.: Fisheries oceanography-established links in
the Eastern Arabian Sea (2017)
15. Ricardo, S.: Chapter 4 - Living Ocean, pp. 2830–2841. Mercator Ocean
International (2019)
16. Santos, A.M.P.: Fisheries oceanography using satellite and airborne remote sensing
methods: a review. Fisheries Research 49(1), 1–20 (2000)
17. de Souza, E.N., Boerder, K., Matwin, S., Worm, B.: Improving fishing pattern
detection from satellite AIS using data mining and machine learning. PLoS ONE
11(7) (2016)
18. Thessen, A.: Adoption of machine learning techniques in ecology and earth science.
One Ecosystem 1, e8621 (2016)
19. Zainuddin, M.: Skipjack tuna in relation to sea surface temperature and
chlorophyll-a concentration of Bone Bay using remotely sensed satellite data.
Jurnal Ilmu dan Teknologi Kelautan Tropis 3(1) (2011)
