Validation of FPOD Data
Validation is an essential step in delivering sound results. Before diving into the file you need to know
what false positive (FP) rate is acceptable and what the likely sources of false positives are.
Don’t say ‘Zero’! You need to assess how big the difference is that you are looking for. If you want
to show a difference between sites whose detection rates differ by a factor of 3, then
an error rate of 10% FPs is very unlikely to affect your conclusion.
If you are looking for a trend over years in a population you will want a much lower FP rate. You might
aim for a rate per year in your whole set of files that is below 5% or below 1%. Discuss this with your
friendly statistician!
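As a rough worked example of why a 10% FP rate barely matters for a threefold difference, the sketch below computes the worst-case range of the true ratio between two sites. It assumes that ‘10% FPs’ means up to 10% of the detections at a site are false, and that the errors could fall entirely on one site. The function name and the example counts are illustrative and are not taken from the F-POD software.

```python
def ratio_bounds(detections_a, detections_b, fp_fraction):
    """Worst-case range of the true A:B ratio if up to fp_fraction
    of the detections at either site are false positives."""
    if detections_b <= 0 or not 0 <= fp_fraction < 1:
        raise ValueError("need detections_b > 0 and 0 <= fp_fraction < 1")
    # All the false positives are at site A: the true ratio is lower than observed.
    lowest = detections_a * (1 - fp_fraction) / detections_b
    # All the false positives are at site B: the true ratio is higher than observed.
    highest = detections_a / (detections_b * (1 - fp_fraction))
    return lowest, highest


# Illustrative counts only: site A logs three times as many detections as site B.
low, high = ratio_bounds(300, 100, 0.10)
print(f"true ratio lies between {low:.2f} and {high:.2f}")
```

Even in this worst case site A still shows well over twice the detection rate of site B, so the conclusion stands. A small trend across years has far less margin, which is why it needs a lower FP rate.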
SONARS and other man-made sources. At source these clicks would be easy to distinguish,
being long and regularly timed, but after multiple reflections and long transmission paths the
trains can look very different.
NBHF species can be, rarely, misclassified as other cetaceans.
WUTS – weak unknown train sources. This one is hard, but rare.
Boat sonar trains …. etc
Boat sonars and other man-made sources. These typically make long clicks with very regular timing.
The FP rate for sonars from KERNO-F is not as low as for cetaceans, but this identification is largely
correct and can be used to give information on boat presence. These are put in the ‘sonar’ category.
The presence of these sources usually reduces the sensitivity to cetacean trains (the classifier
becomes more cautious). In signal detection terms they are interference rather than noise.
You should avoid deployment in harbours and busy traffic lanes if possible.
Acoustic fish tags are logged, and can be seen and extracted from the data, but rarely get identified as
train sources.
1. Assessing the whole file
A view of the whole time range of a file:
In most projects you will crop the file to the period between deployment and retrieval.
These can be shown for open files, or exported for batches of files, via the Filters +files page.
Here they are, bottom left, for the same file as above ‘Harbour outside 2021 01 14 FPOD_7001 file0.FP3’
As experience accumulates you may be able to skip train validation at this point for NBHF or Other Cet if there
are no warnings related to them.
2. Assessing the click trains
A file may contain 100,000 trains, and validation requires assessing only a sample. The default is 100 trains but
you can change that if your target max FP rate is 3% or less.
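One common way to see where a figure like 3% comes from is the ‘rule of three’: if a random sample of n trains contains no false positives, the 95% upper bound on the true FP rate is roughly 3/n. The sketch below uses the exact binomial form of that bound. It is a standard statistical result rather than anything taken from the F-POD software, the function names are illustrative, and it assumes the validation sample behaves like a random sample in which no FPs were found.

```python
import math


def fp_upper_bound(n_trains, confidence=0.95):
    """Upper confidence bound on the FP rate when 0 of n_trains sampled are false."""
    return 1 - (1 - confidence) ** (1 / n_trains)


def trains_needed(target_fp_rate, confidence=0.95):
    """Smallest error-free sample that puts the upper bound at or below the target."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - target_fp_rate))


print(f"bound with 100 clean trains: {fp_upper_bound(100):.3f}")
for target in (0.05, 0.03, 0.01):
    print(f"target {target:.0%}: {trains_needed(target)} clean trains needed")
```

On this reasoning the default of 100 trains with no errors found supports a claim of ‘below about 3%’, while a 1% target needs about 300 error-free trains. If some FPs are found the bound is higher, which is a good moment to consult the statistician.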
The sample is obtained by setting validation points:
Right-click in the display area of the FP3 (or CP3) file you want to validate and select ‘Set this
file for action’.
Set the filters on the left to the selection you will be using. For dolphins these are the normal
filter settings:
Click ‘set validation sampling points’ (low on the right of the Filters +files page shown above)
That will pick 100 minutes evenly spaced across all the ‘other cet’ clicks, with the starting point
at a randomised position in the first 1% of that set. If you repeat it you will get a different set, and
you can also change the number of validation points.
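The selection described above is a systematic sample with a random start. A minimal sketch of that idea follows. It is not the F-POD software’s code: the function name, the use of item indices rather than minutes, and the seeded random generator are all assumptions made for illustration.

```python
import random


def validation_points(n_items, n_points=100, rng=None):
    """Pick n_points evenly spaced indices from n_items, starting at a
    random position within the first 1/n_points of the set."""
    rng = rng or random.Random()
    if n_items <= 0 or n_points <= 0:
        return []
    n_points = min(n_points, n_items)
    step = n_items / n_points          # spacing between sampling points
    start = rng.random() * step        # random start in [0, step)
    # Guard against floating-point rounding pushing the last index out of range.
    return [min(n_items - 1, int(start + i * step)) for i in range(n_points)]


points = validation_points(100_000, 100, random.Random(1))
print(len(points), "points, first at index", points[0])
```

With 100 points the random start falls in the first 1% of the set, matching the description above, and repeating the call with a different seed gives a different set.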
Set the display time resolution to 20ms or similar.
Using validation points:
The lowest panel is showing clicks/s, which is just 1 / interval between successive clicks in the raw data. What
can be seen is a long fuzzy line spanning more than 1 minute showing repeated inter-click intervals. So it’s a
sonar! In the FP3 file the click trains have been wrongly identified as coming from an NBHF species because the
clicks are the right length (number of cycles), the right frequency, and narrowband.
By filtering the clicks – not trains – to those over 110kHz that long flat line in clicks/s in the FP1 becomes
sharper:
Long nearly-horizontal lines like that are THE diagnostic feature of sonars. They may have a gentle slope
because of multipath effects. You do not see such long distinct lines in cetacean data.
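To make the ‘long flat line’ idea concrete, here is a small sketch that takes click times, works through the inter-click intervals, and reports how long the rate stays nearly constant. It is only an illustration of what the eye is doing on the clicks/s display, not how KERNO-F works. The function name and the 5% tolerance are assumptions, and real raw data contain other clicks mixed in, which is why the line on screen looks fuzzy.

```python
def longest_steady_run_s(times_s, tolerance=0.05):
    """Duration in seconds of the longest run of consecutive clicks whose
    intervals all stay within +/- tolerance of the first interval in the run."""
    best = 0.0
    run_start = 0      # index of the first click in the current run
    reference = None   # first inter-click interval of the current run
    for i in range(1, len(times_s)):
        interval = times_s[i] - times_s[i - 1]
        if reference is None or abs(interval - reference) > tolerance * reference:
            # The interval has changed too much: start a new run at the previous click.
            run_start = i - 1
            reference = interval
        best = max(best, times_s[i] - times_s[run_start])
    return best


# Synthetic sonar-like source: one click every 0.5 s for about 100 s.
sonar = [i * 0.5 for i in range(200)]

# Synthetic train whose interval grows by 10% at every click.
varying, t, interval = [], 0.0, 0.01
for _ in range(50):
    varying.append(t)
    t += interval
    interval *= 1.1

print(longest_steady_run_s(sonar) > 60, longest_steady_run_s(varying) > 60)
```

The steady source gives a run well over a minute long, while the varying train never does. Comparing each interval with the first in the run, rather than with its immediate neighbour, is a deliberate choice: it tolerates a gentle slope but stops a slowly drifting train from counting as flat.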
There are other much weaker features, but you rarely need to look for them: KERNO-F handles the features
of the clicks reasonably well, so you won’t often get much power to detect any errors from them, but for
completeness here are the main discriminatory features:
Feature | Sonar | Cetaceans
Number of cycles in click | Sometimes over 70 cycles | NBHF occasionally up to 70 cycles
kHz | Sometimes over 160kHz | Rarely over 160kHz
Wavenumber of loudest cycle | Sometimes over 10, and sometimes late in a long click | Higher for NBHF than dolphins but not often >10. Rarely occurs late in a long click.
Amplitude profile | Tends to be flat | From dolphins it is rarely flat
Bandwidth | Often low | Rarely low from dolphins
Wider scene | Boats often go past in a few mins |
TRAIN SOURCE versus CHANCE TRAIN
This is a tough computational problem because there are so many distinct possible sequences of clicks in
almost every minute. Even so, the KERNO-F results on F-POD data are good, and much better than the results
from the KERNO classifier processing C-POD data. The KERNO classifier missed many dolphin click trains:
it did find the trains, but it could not be confident that they were cetacean trains. KERNO-F has
a (currently) unique input of very high-precision time-domain data on each click, with wave period
values (250ns resolution) consistently referenced to the loudest cycle in the click. It also receives the
cycle number of the loudest cycle, the last wave period and other time-domain features.
KERNO-F uses this high-resolution time-domain data to measure the coherence of each train and that
becomes a major element in identifying and rejecting chance trains. Coherence is an aggregated
measure of how much each click and each interval resemble its neighbours in the sequence…
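The actual coherence measure inside KERNO-F is not given here, so the following is only a toy illustration of the general idea of ‘resemblance to neighbours’. It scores a sequence of values (for example inter-click intervals in ms) by the average ratio of the smaller to the larger of each adjacent pair. The function name, the formula and the example numbers are all assumptions.

```python
def neighbour_similarity(values):
    """Mean of min/max over adjacent pairs: 1.0 means every value equals its
    neighbour, values near 0 mean large jumps from one item to the next."""
    pairs = [(a, b) for a, b in zip(values, values[1:]) if max(a, b) > 0]
    if not pairs:
        return 0.0
    return sum(min(a, b) / max(a, b) for a, b in pairs) / len(pairs)


smooth_intervals_ms = [10.0, 10.5, 11.0, 11.5]   # slowly changing, train-like
ragged_intervals_ms = [10.0, 3.0, 25.0, 7.0]     # jumps around, chance-like

print(f"{neighbour_similarity(smooth_intervals_ms):.2f}")
print(f"{neighbour_similarity(ragged_intervals_ms):.2f}")
```

The smoothly varying sequence scores close to 1 and the ragged one far lower. A real measure would combine several click features as well as the intervals, as the text describes.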
However you will see errors occasionally and here are their features:
Feature | Cetaceans | Chance trains
Wider scene | Cetacean detections are typically of encounters in which the animals are within detection range for a few minutes, producing trains that can be seen by eye even if they have not been classified as trains. A pattern of increasing amplitudes early on, with a more rapid fall at the end is typical. | These are mainly isolated, or within minutes that have true cetacean trains where they have ‘survived’ as a result of positive feedback provided by their true cetacean neighbours.
Amplitude profile within minute, at 100ms resolution or higher | Cetacean trains typically form discrete, neatly rounded or prominent, humps on the amplitude display. Sometimes the spacing of the peaks is showing you the cetacean’s tail beat rate. | Chance trains and sonars are often not prominent, or discrete, and the amplitude envelope is ragged rather than smooth.
Amplitude profile within clicks | Lots of sequences of similar profiles within a train – see images below. Big step changes in the profile don’t matter – it’s sequences that count. | The profile usually jumps around from click-to-click but may be similar when the source is sediment transport noise with a narrow frequency e.g. fine sand in suspension.
Inter-click-interval profile i.e. click rate | Often has a smoothly varying profile. Don’t worry about infrequent very brief up/down spikes on the graph. | Overall, a more irregular graph with sharp transitions in rate and few smooth sections.
Multipath cluster features | Where these are logged they are generally the most powerful discriminatory feature and are described in more detail below. (applies to both columns)
In the graphic below the multipath frequency content forms structured lines in the lowest panel - the frequency
display of the FP1 file. This is typical of fast dolphin click trains because the pathway does not change much
during the short inter-click intervals, so the clicks get split and sometimes reunited in similar ways as they
travel.
Below there is an NBHF train and then a false train from a very brief noise burst.
The false train shows no such structure in the lowest panel, which shows the frequencies in the FP1 display
(the lower 2 panels here), and its amplitude profile is ragged.
NBHF versus OTHER CETACEANS
The main problems here are:
Some, possibly many, dolphin species can produce narrow-band high frequency clicks. This
does not happen often but is seen in the large volumes of data collected by PODs. This gives
false NBHF trains.
Some ‘NBHF species’ make clicks that are not typical of the group and give false ‘other
cetacean’ trains. The report from the KERNO-F classifier gives an indication of this by showing
the actual modal kHz of the NBHF trains found in the whole file. (It is possible to shift the target
frequency, duration and position of the loudest cycle to optimise detections for particular
species and populations.)
Possibly some ‘NBHF species’ make clicks that are not typical of the group and may contain
much lower frequencies that appear in the multipath clusters. This gives false ‘other cetacean’
trains.
All features vary and are affected by ambient noise, so there are no perfectly sharp dividing
lines.
‘NBHF index’ is an arbitrary value derived from several click features (the code is given at the end)
including the number of cycles and bandwidth, so you often don’t need to look at them individually.
It is often useful in this species discrimination challenge. NBHFi also uses the ‘target frequency’ set
– the default is 120kHz, but there is variation between species and regions. If the target is too high it
will tend to generate errors in which NBHF trains are put into ‘other cet’. Fixing that requires re-
processing through KERNO-F with the adjusted target settings.
Feature | NBHF | Other cetaceans
Multipath clusters | Composed of clicks within NBHF frequency range (105 – 150kHz) | May include clicks outside the NBHF frequency range
… you rarely need to look at the features below as they are represented in the NBHFindex
There are good dolphin trains just before the misclassified NBHF train.
Zooming in on the first two ‘NBHF’ trains shows:
The echoes here – the lower amplitude lines between the louder clicks – show a range of frequencies with many
outside the kHz range of NBHF clicks. They are clearly part of the train because of their match in timing and
amplitude. So this cannot be an NBHF source.
The explanation here is that the clicks emitted from the Beluga’s melon on the acoustic axis were just
acceptable as an NBHF train, although many had low NBHFi values. At the same time the sound emitted off
the axis included lower frequencies than would come from a porpoise, and these are logged after reflection by
the sea surface.
WEAK UNKNOWN TRAIN SOURCES (WUTS)
WUTS are not classified as such by KERNO-F because we have too little data, they overlap other species
classes, and KERNO-F does not take a sufficiently wide view of the pattern of detections.
Instead KERNO-F gives trains a WUTS risk, and trains can be filtered by that. A high risk does not mean that
the train does come from a WUTS, only that it has some features of one.
Concluding that it is a WUTS depends on your analysis of the wider scene as described in the table below.
I’m confident WUTS are biological and that there are many species producing these sounds. Their features are
quite diverse and can overlap both dolphin and NBHF trains, so they are a challenge. Suspect sources include
small pelagic crustaceans, mollusc radulas, and polychaete worms in sediments.
They were first recognised in T-POD data from a ria in the SW of Britain, then in mangrove swamps, and they
seem to be more numerous in places with high nutrient levels. Other risky situations are PODs among kelps
(large seaweeds) or lying on the sea bed.
Feature | WUTS | The rest
Wider scene | These are generally isolated trains but where the POD is on the sea bed many may be recorded, and sometimes this happens when the POD is higher in the water column. The absence of any ‘good’ trains of dolphins or porpoises in the surrounding minutes is the most powerful feature. | ‘Encounters’ are usual and often there is a recognisable approach phase as clicks get louder, then trains are identified, and the end of the encounter is more abrupt (at least for cetaceans – boat sonars being vertical may fade in the same way they grew)
Multipath clusters – very important | Rare, and if present very limited i.e. one weak replicate very close in time to the primary path click. | Multipath is common with more clicks in the cluster in the middle of the real train than near the start or finish.
Amplitude profile of train | Mostly fairly flat but some do have rounded amplitude profiles, which is normally a feature of cetacean trains | Rounded amplitude profiles are the norm
Frequency - kHz | From the lowest logged to about 140kHz. A useful feature is a sweep in frequency through the train. | Trains below 25kHz are not classified as dolphins by KERNO-F. NBHF trains don’t show weak frequency sweeps, and dolphin trains rarely show smooth frequency sweeps (although they might in broadband data)
Click rate range | Can be very fast – near 2,000/s or down to 2/s … | The Boto uses social click trains at similarly high rates, but so far WUTS have not been identified in data from rivers.
procedure SetNBHFindex;
var
  diff, temp: integer;
begin
  // Frequency score: 3 if the click kHz is close to the target, 2 or 1 in the wider bands, else 0
  if Fs[FN].NrClk.ClkKHZ in [NBHFtargetKHZ - 2..NBHFtargetKHZ + 4] then temp:= 3 else
  if Fs[FN].NrClk.ClkKHZ in [NBHFloKHZ ..NBHFhiKHZ ] then temp:= 2 else
  if Fs[FN].NrClk.ClkKHZ in [NBHFminKHZ ..NBHFmaxKHZ ] then temp:= 1 else
    temp:= 0;

  // Cycle count relative to the target: bonus from 2 below to 34 above, growing penalty from 40 above
  diff:= Fs[FN].NrClk.Ncyc - NBHFtargetNcyc;
  if diff < -2 then else
  if diff < 6 then Inc(temp ) else
  if diff < 20 then Inc(temp,2) else
  if diff < 35 then Inc(temp ) else
  if diff < 40 then else
  if diff < 50 then Dec(temp ) else
  if diff < 60 then Dec(temp,2) else
  if diff < 70 then Dec(temp,3) else
    Dec(temp,4);

  // Position of the loudest cycle, applied only if the score so far is positive
  if temp > 0 then
  begin
    diff:= Fs[FN].NrClk.PkAt - NBHFtargetPkAt; // PkAt = the wavenumber of the loudest cycle in the click
    if diff < -3 then else
    if diff = -3 then temp:= Max(0,temp shr 2) else // shr = shift all bits right by n places
    if diff = -2 then temp:= Max(1,temp shr 1) else
    if diff = -1 then Inc(temp ) else
    if diff < 2 then Inc(temp,2) else
    if diff < 4 then Inc(temp ) else
    if diff < 6 then else
    if diff < 8 then Dec(temp ) else
    if diff < 13 then Dec(temp,2);