The focus of RAN1 is the AVHRR/3 sensors flown onboard the NOAA-KLMNN’ (NOAA-15 to 19) and Metop series. Relevant to SST applications, the major improvement over the AVHRR/2 (the first AVHRR with a split-window capability, suitable for multi-channel SST retrievals) was the installation of a larger external sun shield to the scan motor housing on NOAA satellites, to reduce sunlight impingement on the sensor and to stabilize its thermal regime, which in turn improved the quality and stability of its IR calibration. The AVHRR/3 GAC data analyzed in this study are summarized in Table 1. Note that the AVHRR L1b data are available from the NOAA Comprehensive Large Array-data Stewardship System (CLASS; www.class.noaa.gov). This study uses a copy of these data held on spinning disk at the NOAA Center for Satellite Applications and Research (STAR) Central Data Repository.
2.1. ACSPO Processing
The AVHRR L1b GAC orbital files are first aggregated into 1-hr non-overlapping granules. The ACSPO processing is organized into three major interleaved blocks: (1) forward radiance calculations using the NOAA Community Radiative Transfer Model (CRTM), with the Canadian Meteorological Center (CMC) L4 SST analysis [9] and the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS; www.emc.ncep.noaa.gov/index.php?branch=GFS) first guess fields as inputs [2]; (2) identification of clear-sky ocean pixels suitable for SST retrievals [1]; and (3) calculation of SST using the regression equations [3,4]. The following Multi-Channel SST (MCSST; nighttime) and Non-Linear SST (NLSST; daytime) equations are used in RAN1:
MCSST (Night; Solar Zenith Angle, SZA > 90°)
Here, T3.7, T11 and T12 are brightness temperatures (BTs) observed in AVHRR bands centered at 3.7, 11 and 12 µm, respectively; S = sec(VZA) − 1, VZA is the view zenith angle, and To is the first guess CMC L4 SST interpolated in space to the retrieval pixel. Only highest quality level (QL = 5) ACSPO data are used.
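As a small illustration of the angular predictor defined above, the following sketch evaluates S = sec(VZA) − 1 at a few view zenith angles. The function name `slant_path_term` is hypothetical and not part of ACSPO; only the formula itself is taken from the text.

```python
import math

def slant_path_term(vza_deg: float) -> float:
    """Return S = sec(VZA) - 1 for a view zenith angle given in degrees."""
    return 1.0 / math.cos(math.radians(vza_deg)) - 1.0

# S is zero at nadir and grows toward the swath edge.
for vza in (0.0, 30.0, 55.0, 68.0):
    print(f"VZA = {vza:4.1f} deg  ->  S = {slant_path_term(vza):.3f}")
```

Defining the predictor as sec(VZA) − 1 rather than sec(VZA) makes the angular terms of a regression vanish at nadir.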
2.2. SSTs Derived Using Static Regression Coefficients
The static SST regression coefficients were calculated using one full calendar year from 1 January to 31 December (of the year following 2002 or the year in which the satellite was launched; shown in the 5th column of Table 1). ACSPO SSTs, TS, and BTs have been matched up with quality controlled in situ SSTs, Tin situ, from the drifting and tropical moored buoys in the NOAA in situ Quality Monitor system (iQuam; www.star.nesdis.noaa.gov/sod/sst/iquam/, [10]), by selecting the closest in space ACSPO SST pixel within (10 km, 2 hr) of Tin situ. The SST coefficients were first calculated using all matchups, and then recalculated excluding outlier points in which the AVHRR SST deviated from the in situ SST by more than 4 global standard deviations (SD) of ΔTS = TS − Tin situ. (This exclusion had a very minor impact on the SST coefficients.) The static coefficients have been applied to the full record of all seven AVHRRs. For routine validation in the NOAA SST Quality Monitor system (SQUAM; www.star.nesdis.noaa.gov/sod/sst/squam/, [11]), the ACSPO SSTs were matched up with iQuam in situ SSTs using the same matchup criteria, except that the space-time window was relaxed to (20 km, 4 hr), which is currently the standard setting in SQUAM. One anonymous reviewer of this paper suggested reducing the time window to ±2 or even ±1 hr, if enough collocations are available. Rerunning the validation analyses is possible, but proper selection of the new parameters should be based on sensitivity analyses, which are currently underway but could not be completed in the limited time allocated for the revision. Note also that the diurnal changes are relatively small at night, and if anything, all validation statistics shown in this study should only improve if smaller windows are used.
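The matchup selection and outlier screening described in this section can be sketched as follows. This is a minimal illustration, not the iQuam or ACSPO code: the function names and the dictionary layout are hypothetical, the (10 km, 2 hr) window is interpreted as a ±2 hr time tolerance (an assumption), and the regression refit that follows the screening is not shown.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def closest_pixel(buoy, pixels, max_km=10.0, max_hours=2.0):
    """Return the pixel closest in space to the buoy within the window, or None.

    `buoy` and each pixel are dicts with keys lat, lon, time_h (hours) and sst.
    """
    best, best_d = None, None
    for px in pixels:
        if abs(px["time_h"] - buoy["time_h"]) > max_hours:
            continue  # outside the time window
        d = distance_km(buoy["lat"], buoy["lon"], px["lat"], px["lon"])
        if d <= max_km and (best_d is None or d < best_d):
            best, best_d = px, d
    return best

def screen_outliers(deltas, n_sd=4.0):
    """Keep satellite-minus-in-situ differences within n_sd global SDs of zero."""
    sd = statistics.pstdev(deltas)
    return [d for d in deltas if abs(d) <= n_sd * sd]
```

Relaxing the window to the SQUAM setting would correspond to calling `closest_pixel(buoy, pixels, max_km=20.0, max_hours=4.0)`.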
The time series of nighttime mean ΔTS are shown in Figure 1a as they appear in the NOAA SQUAM system. For some platforms, the derived SSTs are stable and close to in situ SSTs for the full time of their operation (e.g., Metop-A and -B, and NOAA-17 and -19). For others, initial periods of stability alternate with periods of uncontrolled variations (e.g., NOAA-16 and -18). For yet others, the SSTs are unstable for the full period from 2002 to present (e.g., NOAA-15).
2.3. The Root Cause of the Unstable SSTs
To attribute the anomalous SST behavior in Figure 1, another NOAA monitoring system is employed. Similarly to SQUAM, the MICROS system (www.star.nesdis.noaa.gov/sod/sst/micros/, [12]) monitors deviations of regression SST, TS, from the first guess L4 CMC SST, TL4: ΔTS = TS − TL4. In addition, it also monitors deviations of the “observed” AVHRR BTs, TO, from their “model” counterparts, TM: ΔTB = TO − TM, in three AVHRR bands 3b, 4, and 5. The time series of ΔTS and ΔTB in band 3b (which contributes most to the nighttime SST) corresponding to Figure 1a are shown in Figure 2a,b, respectively, as they appear in MICROS.
The two ΔTS’s in Figure 1a and Figure 2a are in close agreement. This is expected because the CMC L4, ACSPO AVHRR L2, and iQuam in situ data all characterize the same “true SST” state. However, they do it differently. In particular, the global gap-free CMC L4 SST is produced by anchoring several infrared and microwave satellite SST products to in situ SST and blending them together. (Note that the ACSPO AVHRR and iQuam in situ SSTs have not been assimilated in the CMC L4 used here.) Also, Figure 2a is based on the full global ACSPO retrieval domain, whereas Figure 1a uses only its relatively small (<1/1000) in situ matchup sub-sample. As a result, the time series of (TS − TL4) are less noisy than those of (TS − Tin situ), because the day-to-day noise is suppressed more efficiently in the L4 matchups than in the in situ matchups, due to the three orders of magnitude larger matchup samples.
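A rough sense of the size of this effect follows from the standard error of a mean, under the simplifying assumption (not made in the text) that individual matchup errors are independent and share a common SD σ:

```latex
\mathrm{SE}\left(\overline{\Delta T_S}\right) = \frac{\sigma}{\sqrt{N}}, \qquad
\frac{\mathrm{SE}_{\mathrm{in\ situ}}}{\mathrm{SE}_{\mathrm{L4}}}
  = \sqrt{\frac{N_{\mathrm{L4}}}{N_{\mathrm{in\ situ}}}}
  \approx \sqrt{1000} \approx 32
```

In practice, errors in neighboring pixels are correlated and the per-matchup SD differs between the two references, so the actual noise reduction is likely smaller than this idealized factor.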
Figure 2b shows ΔTB’s in band 3b corresponding to the ΔTS’s in Figure 2a. The two time series strongly correlate, suggesting that the major reason for the SST artifacts in Figure 1a and Figure 2a is the unstable AVHRR BTs (cf. [13]). The root cause of the unstable BTs is suboptimal AVHRR calibration [14]. Work is underway to generate an improved AVHRR Level 1b dataset, and use it in ACSPO RAN. In the meantime, the main focus of RAN1 was on improving the stability of SST time series using the most stable sensors and periods of their operation, and employing variable SST regression coefficients.
2.4. SSTs Derived from Variable Regression Coefficients and Comparison with PFV5.2
In consultation with the GPB and CRW Teams, RAN1 uses the two most stable satellites at any time, one midmorning and one afternoon. The selected satellites and periods are listed in Table 1 and the corresponding mean biases of (TS − Tin situ) are shown in Figure 1b. The time series in Figure 1b have improved from Figure 1a, but some instabilities and cross-platform inconsistencies still remain.
In PFV5.2, SST regression coefficients are recalculated monthly, to stabilize the retrieved SSTs in time. A similar approach was explored in RAN1, except that the coefficients here are calculated daily, using a 91-day moving window. In RAN1, all matchups within the 91-day window are used with equal weight, whereas in PFV5.2, weighted least squares regression is employed. A moving window is used to minimize the potential month-to-month discontinuities which may be expected in the PF SSTs, and a window approximately three times longer than the monthly PFV5.2 window is used to reduce the uncertainty of the calculated regression coefficients while still capturing their seasonal variability.
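The daily moving-window recalculation can be sketched as below. This is a minimal illustration under stated assumptions: the window is taken to be centered on the processing day (the text gives only its 91-day length), the function names are hypothetical, and the regression itself is left as a caller-supplied `fit` function because the actual fitting code is not described here.

```python
from datetime import date, timedelta

def window_matchups(matchups, center, window_days=91):
    """Select matchups whose date lies in a window centered on `center`.

    `matchups` is a list of (date, payload) tuples. A centered window is an
    assumption; the text only states that the window is 91 days long.
    """
    half = timedelta(days=window_days // 2)  # 45 days on each side of center
    return [m for m in matchups if center - half <= m[0] <= center + half]

def daily_coefficients(matchups, first_day, last_day, fit, window_days=91):
    """Refit once per day, giving equal weight to every matchup in the window.

    `fit` maps a list of matchups to regression coefficients (hypothetical).
    """
    out = {}
    day = first_day
    while day <= last_day:
        out[day] = fit(window_matchups(matchups, day, window_days))
        day += timedelta(days=1)
    return out
```

Because consecutive windows share 90 of their 91 days, the resulting coefficients change smoothly from day to day, which is the property the moving window is meant to provide.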
The time series of the mean biases in RAN1 and in PFV5.2 with respect to in situ data are plotted in Figure 3. Only the highest quality PFV5.2 data (QL = 5) are used in this study. Analysis of one day (1 July 2010) of global NOAA-18 data has shown that the number of nighttime observations with QL = 5 was ~1.1M (M = million), compared with ~0.7M with QL = 4. The QL = 4 data are clearly degraded compared with QL = 5 (in particular, they show a ~0.2 K colder bias and a 0.15 K larger standard deviation, SD), and therefore they were not included in the comparisons with RAN1. Note that in PFV5.2 only one platform is processed at any time, due to the initial requirement to ensure a consistent observation time in the CDR, from only afternoon platforms (although later, midmorning NOAA-17 was used before the NOAA-18 launch). Also, PFV5.2 data are available in a delayed mode (e.g., at the time of this writing, data up to December 2012 were available [8]).
RAN1 SSTs are more stable in time and more cross-platform consistent, with typical biases of ~0 ± 0.1 K, compared with −0.17 ± 0.25 K in PFV5.2. The regression coefficients in both datasets are trained against buoys, but in PFV5.2, 0.17 K is subtracted to obtain the “skin” SST product [8]. Comparing Figure 3 with Figure 1b shows that using variable coefficients in RAN1 makes the SST biases smaller and more stable. In PFV5.2, the SST biases are generally grouped around the expected −0.17 K level, but are less stable in time and less cross-platform consistent. In particular, increasing positive bias is seen in the last year(s) of NOAA-18 (which according to Figure 1 and Figure 2, started showing degraded BTs and SSTs in 2010, and therefore was replaced in RAN1 by NOAA-19 in early 2009). NOAA-19 is out of family, likely due to a coding error in PFV5.2, which has degraded its cloud mask [8].
The stabilized RAN1 SDs are shown in Figure 4a. (Note that additional analyses have shown that the SDs are only minimally impacted by using variable coefficients.) They are stable in time and uniform across different platforms (including between the midmorning platforms, with a nominal equator crossing time, ECT~9:30–10 p.m., and the afternoon platforms, with a nominal ECT~2 a.m.), and typically range from 0.30 to 0.45 K, compared with 0.30–0.70 K in PFV5.2. The improved SSTs in RAN1 over PFV5.2 are largely due to the use of the ACSPO clear-sky mask [1] and SST regression algorithms [3,4]. In particular, at night, ACSPO uses the MCSST in conjunction with the transparent 3.7 µm band (see Equation (1)), whereas PFV5.2 uses the split-window NLSST with the 11/12 µm bands [8]. The ACSPO and PFV5.2 SST equations are also different. While the ACSPO equations primarily target the effect of variable VZA, the PFV5.2 algorithms are stratified by “wet” and “dry” atmospheric conditions, to mainly correct for the effect of variable water vapor in the atmosphere (see discussion in [3,4] and in Section 2.7 below). The NOAA-19 SDs in PFV5.2 are degraded, likely due to the same coding error in the corresponding PFV5.2 cloud mask mentioned earlier [8]. Overall, the conventional daily validation statistics for PFV5.2 shown in Figure 3 and Figure 4 are largely consistent with their annual robust counterparts reported in [8].
The corresponding number of matchups used to calculate the daily validation statistics in Figure 3 and Figure 4 is shown in Figure 5. It follows the evolution of the in situ fleet with time, scaled with the number of QL = 5 retrievals in RAN1 and PFV5.2. In RAN1, the number of matchups ranges from <500 in the early 2000s to >2000 in recent years. In PFV5.2, it is approximately half of the RAN1 numbers. The different number of matchups in RAN1 and PFV5.2 is due to the different number of the corresponding clear-sky SST pixels with QL = 5 (shown in Figure 6). There is a clear seasonal pattern in both products, due to the alternating land-sea fraction in the Northern and Southern hemispheres following the Sun illumination periodicity. Note that direct comparison between Figure 5a,b is not straightforward, because PFV5.2 is a remapped equal-grid 0.04° L3 product, whereas RAN1 is an L2 product reported in the native GAC swath projection, with a ~4 km resolution at nadir and up to ~30 km at swath edge. Also, the retrievals in PFV5.2 are only made at VZA < 55°, in contrast with the ACSPO product, which covers the full swath up to VZA ~68°.
Generating a RAN1 L3 product similar to the PFV5.2 was also considered but not implemented, mainly because the mapping of the full-swath ACSPO data to a 0.04° grid would have resulted in multiple duplicates at the swath edge (where a ~30 km pixel may be mapped into multiple 0.04° grids). Using a coarser L3 grid (e.g., ~0.1°) would minimize the duplicates at swath edges (although not eliminate them completely), but at the expense of degraded spatial resolution at nadir. More discussions between the data producers and users are needed before a gridded version of RAN1 SST product can be generated.
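The duplicate-mapping argument above can be made concrete with a rough count of how many grid cells one pixel footprint spans along one dimension. This is an order-of-magnitude sketch only: it uses the length of one degree of latitude on a spherical Earth, ignores the convergence of meridians and the orientation of the footprint, and the function name is hypothetical.

```python
KM_PER_DEGREE = 111.195  # approximate length of 1 degree of latitude (6371 km sphere)

def cells_spanned(pixel_km: float, grid_deg: float) -> float:
    """Approximate number of grid cells one pixel footprint spans in one dimension."""
    return pixel_km / (grid_deg * KM_PER_DEGREE)

for grid in (0.04, 0.1):
    print(f"{grid:.2f} deg grid: a 30 km pixel spans ~{cells_spanned(30.0, grid):.1f} cells, "
          f"a 4 km pixel ~{cells_spanned(4.0, grid):.1f}")
```

Under these assumptions a swath-edge pixel still covers more than one cell on a ~0.1° grid, while a nadir pixel fills well under half a cell, which is the trade-off described in the text.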
The number of high-quality (QL = 5) nighttime SST retrievals is from ~2 to 3.5 M (M = million) GAC pixels in RAN1 whereas in PFV5.2, it is typically from 1 to 1.5 M (this latter number should be scaled by ~1.6, to account for the degraded retrievals with QL = 4). Note that the RAN1 product is available from two platforms, which effectively increases the coverage of the global ocean.
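The ~1.6 scaling factor is consistent with the single-day NOAA-18 counts quoted earlier (~1.1 M pixels with QL = 5 and ~0.7 M with QL = 4), assuming that day is representative of the full record:

```latex
\frac{N_{\mathrm{QL}=5} + N_{\mathrm{QL}=4}}{N_{\mathrm{QL}=5}}
  \approx \frac{1.1\,\mathrm{M} + 0.7\,\mathrm{M}}{1.1\,\mathrm{M}}
  \approx 1.6
```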
2.5. Improved RAN1 SST Using the Sensor-Specific Errors Statistics (SSES)
The international Group for High Resolution SST (GHRSST) recommends that estimates of SST bias and SD (comprising the Sensor-Specific Errors Statistics, or SSES) should be appended to each reported SST value. As of today, there is no community consensus on how the SSES should be calculated. In RAN1, the new formulation recently adopted in ACSPO is used [15]. Effectively, correcting for the SSES biases in ACSPO minimizes regional biases between the satellite “sub-skin” and in situ “bulk” SSTs, and thus results in a product which is better harmonized with the in situ SSTs measured by drifting and moored buoys.
Figure 7 re-plots the time series of the SST biases and SDs from Figure 3a and Figure 4a, respectively, but after the SSES bias correction has been applied. This results in reduced SST biases with respect to in situ data (~±0.05 K vs. ±0.10 K in Figure 3a), smaller SDs (0.20–0.35 K vs. 0.30–0.45 K in Figure 4a), and overall improved stability and cross-platform consistency of RAN1 SST. Therefore, we recommend that the SSES bias correction be applied in any data assimilation applications of RAN1 data, especially those blending together satellite and in situ data and targeting “bulk” SST L4 products (such as “foundation” SST, e.g., the CMC SST). Also, the RAN1 data may be assimilated in analyses with weights depending on the SSES SDs (e.g., inversely proportional to their squares; see, e.g., [16] and references therein).
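The recommendation above (apply the SSES bias correction, then weight by the inverse square of the SSES SD) can be sketched as a simple inverse-variance average. This is an illustration rather than any specific L4 analysis scheme: the function name is hypothetical, and the sign convention (the SSES bias is subtracted from the reported SST) is an assumption that should be checked against the product documentation.

```python
def sses_weighted_mean(observations):
    """Combine SSTs after SSES bias correction, weighting by 1 / SD**2.

    Each observation is a (sst, sses_bias, sses_sd) tuple in kelvin.
    The bias is subtracted from the reported SST (assumed sign convention).
    """
    num = 0.0
    den = 0.0
    for sst, bias, sd in observations:
        if sd <= 0:
            raise ValueError("SSES SD must be positive")
        weight = 1.0 / sd ** 2
        num += weight * (sst - bias)
        den += weight
    if den == 0.0:
        raise ValueError("no observations")
    return num / den

# Two collocated retrievals: the one with the smaller SD dominates the result.
obs = [(293.10, 0.10, 0.20), (293.40, 0.00, 0.40)]
print(round(sses_weighted_mean(obs), 2))
```

With these weights, a retrieval with half the SD of another contributes four times as much to the combined value.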
2.6. Validation against Independent in Situ SSTs from ARGO Floats
Validation in Section 2.4 and Section 2.5 was performed against the same drifters and tropical moorings used to train the regression coefficients in RAN1. One anonymous reviewer of this paper expressed a concern that this “dependent” validation may potentially favor RAN1 over PFV5.2 due to over-fitting. (Recall that the PFV5.2 SST was also trained against drifting buoys, but the reviewer is correct that those were obtained from sources other than iQuam, and processed with different quality control and matchup criteria, thus making the PFV5.2 matchup data used here more “independent”.) To address this concern, additional validation was performed against ARGO floats obtained from iQuam version 2 [17], and is reported in this section. In contrast with the daily validation in Section 2.4 and Section 2.5, this section reports monthly statistics, due to the significantly smaller number of ARGO matchups.
Figure 8 shows the monthly number of RAN1 and PFV5.2 matchups with drifters/tropical moorings and with ARGO floats. As seen before in Figure 5, the number of matchups is up to a factor of 2 larger in RAN1 than in PFV5.2 (in proportion to the number of SST reports in the two products shown in Figure 6). Since the ARGO project’s inception in 1999, the number of floats has been steadily increasing, resulting in <20 monthly matchups in RAN1 in 2002 and reaching >800 in late 2015. Recall that ARGO floats may only take 2–3 SST measurements a month, due to their 10-day profiling cycle, compared to continuous measurements by drifters and tropical moorings. The number of monthly matchups with ARGO floats is from two (in 2015) to three (in 2002) orders of magnitude smaller than with drifters/tropical moorings. As a result, all corresponding ARGO statistics shown in Figure 9 and Figure 10 are noisier than their drifters/tropical moorings counterparts.
Figure 9 plots mean ΔTS’s. With respect to drifters/tropical moorings, the mean biases are ~0 ± 0.03 K in RAN1 and ~−0.17 ± 0.15 K in PFV5.2. With respect to ARGO floats, the statistics are noisier for both datasets and typically are within ~0 ± 0.1 K for RAN1 and ~−0.17 ± 0.2 K for PFV5.2. The overall observation from Figure 9 is that, with proper consideration of noise in different matchup datasets, the independent (monthly) validation against ARGO floats is in good qualitative, and even quantitative, agreement with the dependent (daily and monthly) validation against drifters/tropical moorings.
Figure 10 plots SDs of the ΔTS’s. A seasonal signal is seen in both datasets (more so in PFV5.2). Typical monthly SDs with respect to drifters and tropical moorings are 0.28 ± 0.05 K in RAN1 and 0.47 ± 0.10 K in PFV5.2. With respect to ARGO floats, the corresponding monthly SDs are 0.32 ± 0.08 K in RAN1 and 0.50 ± 0.15 K in PFV5.2.
Overall, analyses in this section show that independent validation against ARGO floats, although noisier due to much smaller matchup samples, is qualitatively consistent with the results obtained against the dependent dataset of drifters/tropical moorings. The RAN1 mean biases and SDs are generally smaller and more stable in time, and show better cross-platform consistency, against both types of in situ data. The evaluation of VZA and regional performance of satellite SST in the next section was therefore performed against more abundant drifters/tropical moorings, to minimize the noise in the matchup dataset and to facilitate quantifying possible SST residuals and artifacts.
2.7. Performance of RAN1 SST as a Function of Latitude and VZA
Mean biases and SDs are convenient metrics to measure the global performance of various SST products and to compare different products. For users, however, it is also important to ensure that the performance is maximally uniform across the full retrieval domain.
The accuracy of satellite SSTs depends upon many factors, one of the most important being the atmosphere between the satellite sensor and ocean pixel. In the SST bands, the atmospheric attenuation and re-emission mainly depend on the total precipitable water, TPW, and VZA. One might expect that the two would aggregate as TPW × sec(VZA). However, in reality they are not fully coupled (due to, e.g., decreasing surface emissivity and increasing cloud across the swath). All current atmospheric correction algorithms fall under two broad categories of “MCSST” and “NLSST”, but the actual equations used by different retrieval groups differ. Some (e.g., those employed in OSI SAF) explicitly target the VZA dependencies, whereas others focus on the TPW correction (e.g., the algorithm employed in PFV5.2). Based on systematic comparisons in [3,4], the OSI SAF type MCSST/NLSST algorithms (represented by Equations (1) and (2) above) were found to be more efficient and were therefore adopted in ACSPO [3]. Proper handling of the VZA dependencies is critically important in ACSPO, which was the first SST system to attempt high-quality retrievals in the full sensor swath (up to VZA~68°; cf. VZA~55° cut-off in PFV5.2). The reality, however, is more complex than a simplistic “TPW-VZA” model, and there are other factors involved in the retrievals. Assuming that those may be aggregated and approximated as a function of latitude, the future release of PF plans to explore a new implementation of the SST regression algorithm, LATBAND [8] (which targets minimizing SST errors as a function of latitude; note that the LATBAND algorithm was also analyzed in [3]). In this section, the performance of the satellite SSTs is therefore evaluated as a function of VZA (only in RAN1, because the VZA is not reported in PFV5.2 data files) and latitude (in both RAN1 and PFV5.2).
Figure 11 plots mean biases and SDs in RAN1 SST as a function of VZA, by platform. The biases are close to zero, within several hundredths of a kelvin, and near-flat across the full swath. Note a slight trend across the swath (positive for all PM platforms, which descend at night, and negative for all AM platforms, which ascend), due to the satellite sensor looking at systematically different parts of the SST diurnal cycle across the swath. The SDs are also highly consistent across platforms, but non-uniform across the swath, increasing from ~0.25 K at nadir to ~0.40 K toward the edge. This degradation is expected, due to reduced atmospheric transmission and surface emission, and increased atmospheric re-emission, which together lead to a progressively smaller surface signal at slant angles. The actual degradation may appear small and disproportionate to, e.g., a factor of ×3 thicker atmosphere at VZA ~68°. However, one should keep in mind that the validation SD represents a combination of the atmospheric correction and other errors (such as uncertainty in the in situ data and matchup errors). Accounting for these “pedestal” contributions, which are uniform across the swath, will increase the contrast between the nadir and swath edge SDs.
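The effect of the “pedestal” on the nadir-to-edge contrast can be illustrated numerically. The sketch below assumes that the atmospheric-correction error and the pedestal are independent, so that their variances add; the pedestal value of 0.20 K is purely hypothetical and chosen for illustration, while the 0.25 K and 0.40 K totals are the approximate values quoted above. The function name is also hypothetical.

```python
import math

def atmospheric_sd(total_sd, pedestal_sd):
    """Remove a swath-uniform 'pedestal' SD from a total SD, in quadrature.

    Assumes the two error sources are independent, so their variances add.
    """
    if pedestal_sd >= total_sd:
        raise ValueError("pedestal must be smaller than the total SD")
    return math.sqrt(total_sd ** 2 - pedestal_sd ** 2)

nadir_sd, edge_sd = 0.25, 0.40   # K, approximate values from the text
pedestal = 0.20                  # K, hypothetical value for illustration only

raw_contrast = edge_sd / nadir_sd
net_contrast = atmospheric_sd(edge_sd, pedestal) / atmospheric_sd(nadir_sd, pedestal)
path_factor = 1.0 / math.cos(math.radians(68.0))  # sec(VZA) at the swath edge

print(f"raw edge/nadir SD ratio:       {raw_contrast:.2f}")
print(f"ratio after removing pedestal: {net_contrast:.2f}")
print(f"sec(68 deg) path factor:       {path_factor:.2f}")
```

With this illustrative pedestal, the edge-to-nadir ratio moves from the raw value toward the plane-parallel path factor sec(68°), which is the qualitative point made in the text; the actual pedestal is not quantified here.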
Turning to geographical biases, Figure 12 shows Hovmöller diagrams of the mean SST biases. The RAN1 biases are relatively flat in the full domain. A few larger and intermittent (positive/negative) anomalies are seen only in high latitudes, which are least populated with in situ data. Note that in the early 2000s, the number of drifters in the Arctic above 70° N was very limited, and increased after ~2008. In the Antarctic, very few observations are found south of 70° S, even at the present time.
Figure 12b shows that the PFV5.2 biases (which, recall, are expected to be centered at −0.17 K) are highly non-uniform in space and time. In particular, there are sharp discontinuities between NOAA-18 and -19 (consistent with the time series in Figure 3b and Figure 9b; likely due to the coding bug for NOAA-19 in PFV5.2 as mentioned earlier), and between the Northern and Southern hemispheres.
To estimate the magnitude of spatial and temporal non-uniformities in Figure 12 quantitatively, Figure 13 plots their corresponding histograms. In RAN1, the biases are near-Gaussian, centered at ~0 K and consistent between all five platforms. The inter-quartile ranges, IQR (defined as Q3 − Q1), are ~0.14 K, meaning that 50% of regional biases are within ±0.07 K (Figure 13a further shows that the vast majority of RAN1 biases fall within ±0.3 K). In PFV5.2, the mean biases are centered close to the expected −0.17 K for NOAA-17 and -18 but are ~0.1 K colder for NOAA-19, and less symmetric. The IQR~0.36 K suggests that 50% of regional biases in PFV5.2 fall within ±0.18 K, with a measurable fraction of cold biases (suggesting residual cloud) extending to <−1 K and warm biases to ~+0.5 K.
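The IQR statistic used above can be computed as in the following sketch. The function names are hypothetical, and reading half the IQR as a “±” range assumes a distribution that is symmetric about its median. Note also that quartile estimates depend slightly on the interpolation convention; the Python standard library default (“exclusive”) is used here, which may differ from the convention used to produce Figure 13.

```python
import statistics

def iqr(values):
    """Inter-quartile range, Q3 - Q1, of a sample of regional biases."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def central_half_width(values):
    """Half of the IQR: the +/- range holding the central 50% of values,
    if the distribution is symmetric about its median (an assumption)."""
    return iqr(values) / 2.0
```

For example, an IQR of ~0.14 K corresponds to a half-width of ~0.07 K, and an IQR of ~0.36 K to ~0.18 K, matching the ranges quoted for RAN1 and PFV5.2.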
Figure 14 further shows the spatiotemporal structure of the corresponding SDs. The RAN1 SDs are more uniform in space and time, typically ranging from 0.25 K to 0.5 K. In contrast, the PFV5.2 SDs are larger in magnitude (typically from 0.4 to 0.7 K), and more variable in space and time. Note that different spatiotemporal patterns in the SDs between RAN1 and PFV5.2 are at least in part due to the different SST algorithms used in the two products. In particular, the PFV5.2 split-window (11/12 µm) NLSST algorithm is more sensitive to the increased TPW in the tropics, compared with the RAN1 dual-window (3.7/11/12 µm) MCSST algorithm.
Analyses in this section show that compared to PFV5.2, the RAN1 SST performance is consistently superior and more uniform across different satellites and in the full retrieval domain (in particular, as a function of VZA and latitude).