Article
Underwater Target Recognition Based on Multi-Decision
LOFAR Spectrum Enhancement: A Deep-Learning Approach
Jie Chen * , Bing Han , Xufeng Ma and Jian Zhang
National Key Laboratory of Science and Technology on Communications, University of Electronic Science and
Technology of China, Chengdu 610054, China; [email protected] (B.H.); [email protected] (X.M.);
[email protected] (J.Z.)
* Correspondence: [email protected]; Tel.: +86-1592-810-9908
Abstract: Underwater target recognition is an important supporting technology for the development
of marine resources, which is mainly limited by the purity of feature extraction and the universality of
recognition schemes. The low-frequency analysis and recording (LOFAR) spectrum is one of the key
features of the underwater target, which can be used for feature extraction. However, the complex
underwater environment noise and the extremely low signal-to-noise ratio of the target signal lead
to breakpoints in the LOFAR spectrum, which seriously hinders the underwater target recognition.
To overcome this issue and to further improve the recognition performance, we adopted a deep-
learning approach for underwater target recognition, and a novel LOFAR spectrum enhancement
(LSE)-based underwater target-recognition scheme was proposed, which consists of preprocessing,
offline training, and online testing. In preprocessing, we specifically design a LOFAR spectrum enhancement based on a multi-step decision algorithm to recover the breakpoints in the LOFAR spectrum. In offline training, the enhanced LOFAR spectrum is adopted as the input of a convolutional neural network (CNN), and a LOFAR-based CNN (LOFAR-CNN) for online recognition is developed. Taking advantage of the powerful capability of CNN in feature extraction, the recognition accuracy can be further improved by the proposed LOFAR-CNN. Finally, extensive simulation results demonstrate that the LOFAR-CNN network can achieve a recognition accuracy of 95.22%, which outperforms state-of-the-art methods.

Citation: Chen, J.; Han, B.; Ma, X.; Zhang, J. Underwater Target Recognition Based on Multi-Decision LOFAR Spectrum Enhancement: A Deep-Learning Approach. Future Internet 2021, 13, 265. https://doi.org/10.3390/fi13100265
The application of deep learning in the field of underwater target recognition mainly
involves three aspects. The first is the field of underwater recognition. Due to many
reasons such as confidentiality and security, the collection and production of data sets
are difficult. Therefore, researchers will use as many existing samples as possible, such
as using Generative Adversarial Networks (GAN) to achieve sample expansion to meet
the demand of deep learning for large amounts of data. The second is the orthodox field of deep learning, such as computer vision and natural language processing. Researchers start from optimizing and designing complex deep neural network structures, relying on the neural networks alone to complete feature extraction. The third is the data preprocessing stage before the data are input to the neural network. Because the collected data are seriously polluted by environmental noise, researchers perform denoising and spectral transformation on audio samples, or image denoising on sonar images. The purpose is to make the sample features as salient as possible through feature engineering, which better serves the feature extraction needs of deep neural networks.
In this paper, we are interested in underwater target-recognition methods based on
deep learning. The Low-frequency analysis and recording (LOFAR) spectrum is widely
used in the field of passive sonar ship target recognition due to its significant sound source
information and relatively high signal-to-noise ratio. It transforms the signal from the
time domain to the time-frequency domain. Sonar operators usually observe the line spectrum in the LOFAR spectrum to determine whether a target exists, and then perform tracking and recognition. This type of method is mainly realized by extracting features and training
classifiers. Unfortunately, because the data are always contaminated by environmental
noise, breakpoints are introduced in the LOFAR spectrum, which reduces the performance
of signal processing. To overcome this problem and further improve the performance
of underwater target recognition, we use deep-learning methods for underwater target
recognition, and propose an underwater target-recognition scheme based on LOFAR
spectral enhancement (LSE). This solution can restore the breakpoints in the LOFAR
spectrum and combine with Convolutional Neural Network (CNN) for online recognition,
which reduces the impact of environmental noise and significantly improves the target
recognition rate of existing algorithms.
1.1. Contributions
The main contributions of this paper are summarized as follows:
(1) In contrast to traditional algorithms, we use the resonance-based signal decomposition algorithm to preprocess the signal. Building on the multi-step decision algorithm with the line spectrum characteristic cost function [4], this paper proposes a specific double-threshold calculation method. With this design, the algorithm not only retains the continuous spectrum information in the original LOFAR spectrum, but also merges the extracted line spectrum with the original LOFAR spectrum. Finally, breakpoint completion of the LOFAR spectrum is realized.
(2) To further improve the recognition rate of underwater targets, we adopt the enhanced
LOFAR spectrum as the input of CNN and develop a LOFAR-based CNN (LOFAR-
CNN) for online recognition. Taking advantage of the powerful capability of CNN in
feature extraction, the proposed LOFAR-CNN can further improve the recognition
accuracy.
(3) Simulation results demonstrate that when testing on the ShipsEar database [5], our
proposed LOFAR-CNN method can achieve a recognition accuracy of 95.22% which
outperforms the state-of-the-art methods.
Compared with traditional feedforward neural networks such as the MLP, three strategies in CNN exploit the spatial correlation of data: weight sharing, local receptive fields, and down-sampling. They reduce the risk of overfitting, the problem of vanishing gradients, and the complexity and parameter size of the network, while improving the generalization ability of the network. CNN was first proposed by LeCun [11] in 1990 and applied to a handwritten character detection system. In 2014, Szegedy [12] proposed GoogLeNet, which introduced the inception module; receptive fields of different sizes enhanced the adaptability of the network to scale. The improved versions [13,14] greatly reduce the parameter count, enhance the nonlinearity of the network, and speed up the calculation. The residual network was proposed by Kaiming He [15] in 2015, who adopted the idea of the Shortcut Connection (SC) to solve the problem of network degradation. After full investigation and experimental verification, CNN is very suitable for underwater target recognition.
In addition, many effective and efficient DL-based schemes have been proposed for
underwater target recognition. For example, Refs. [16,17] focused on underwater target
recognition which have sufficient training samples. In the first step, the original audio
was converted into LOFAR spectrum, and then GAN was used for sample expansion.
In the second step, a 15% performance improvement could be obtained using convolu-
tional neural networks (CNNs) for feature learning and classification when the number
of samples was sufficiently large. Ref. [18] combined competitive learning with a deep belief network (DBN) and proposed a deep competitive network that used unlabeled samples to address the small number of samples in acoustic target recognition. This method could achieve a classification accuracy of 90.89%. To address the negative impact of redundant features on recognition accuracy and efficiency, the authors in [19] proposed a compressed deep competitive network, which combined network pruning with training quantization and other techniques and could achieve a classification accuracy of 89.1%. Refs. [20,21] proposed a new time-frequency feature extraction method by jointly exploiting the resonance-based sparse signal decomposition (RSSD) algorithm, phase space reconstruction (PSR), the time-frequency distribution (TFD), and manifold learning. At the same time, a one-dimensional convolutional auto-encoder-decoder model was used to further extract and separate features from high-resonance components, which finally completed the recognition task and achieved a recognition accuracy of 93.28%. In addition, Refs. [22–24] all used
convolutional neural networks for feature extraction, but the application scenarios and the
classifiers were different. Ref. [22] proposed an automatic target-recognition method of
unmanned underwater vehicle (UUV), which adopted CNN to extract features from sonar
images and used support vector machine (SVM) classifier to complete the classification.
Ref. [23] aimed to study different types of marine mammals. It also used the CNN+SVM structure to complete the feature extraction and classification tasks, and compared binary and multi-class classification scenarios. Ref. [24] adopted a civil ship data set and exploited a CNN+ELM (extreme learning machine) framework as the underwater target classifier, which improved the recognition accuracy. We can see that
with the in-depth research of scholars, the recognition rate of underwater targets based on
deep learning has gradually increased.
1.3. Organization
The rest of this article is organized as follows. The second section introduces the model
of the system. The third section introduces the deep-learning underwater target signal
recognition framework based on multi-step decision LOFAR line spectrum enhancement.
The fourth section is the experimental verification and simulation results of our proposed
algorithm framework. The fifth section is the summary of the article.
Some notations used in this paper are as follows: ‖·‖2 and ‖·‖1 respectively represent the L2 norm and the L1 norm; STFT{·} is the short-time Fourier transform; E(·) is the statistical expectation; argmin represents the value of the variable at which the objective function is minimized.
2. System Model
In this paper, we consider a deep-learning underwater target-recognition frame-
work based on multi-step decision LOFAR line spectrum enhancement which is shown in
Figure 1. It is divided into four modules: sampling, feature preprocessing, offline training
and online testing.
Figure 1. Deep-learning underwater target-recognition framework based on multi-step decision LOFAR line
spectrum enhancement.
The RSSD algorithm regards resonance as the basis for signal decomposition [26], and
the Q factor quantifies the degree of signal resonance. Specifically, high-resonance signals exhibit sustained oscillatory waveforms with a high degree of frequency aggregation in the time domain and a larger Q factor, while low-resonance signals appear as non-oscillating, indefinite transient signals with a smaller Q factor. Therefore, the basic idea of the RSSD algorithm is that, using two different wavelet basis functions (corresponding to Q factors of different sizes), we can find a sparse representation of a complex signal and reconstruct the signal.
The algorithm described in this section consists of the Tunable Q-factor Wavelet Transform (TQWT) [27] and Morphological Component Analysis (MCA) [28]. Its framework is shown in Figure 2.
Figure 2. RSSD framework: the ship radiated noise is decomposed via morphological component analysis and the tunable Q-factor wavelet transform into a high-resonance component, a low-resonance component, and a residual component.
$$J(w_l, w_h) = \|x - \Phi_h w_h - \Phi_l w_l\|_2^2 + \sum_{j=1}^{J_h+1} \lambda_{h,j} \|w_h^j\|_1 + \sum_{j=1}^{J_l+1} \lambda_{l,j} \|w_l^j\|_1. \quad (2)$$

Here, J_h and J_l represent the numbers of decomposition layers of x_h and x_l. w_h^j and w_l^j are the wavelet coefficients of the high-resonance component and the low-resonance component at the jth layer, respectively. λ_{h,j}, λ_{l,j} are the normalized coefficients of w_h^j, w_l^j, and their values are related to the energy of Φ_{h,j}, Φ_{l,j}:
(1) Select the appropriate filter scaling factor α, β according to the waveform character-
istics of the signal. Then calculate the parameters Qh , rh , Jh corresponding to the
high-resonance component, and the parameters Ql , rl , Jl corresponding to the low-
resonance component. At last, construct the corresponding wavelet basis function
Φh , Φl .
(2) Reasonably set the weighting coefficient λh,j , λl,j of the L1 norm of the wavelet
coefficients of each layer. Obtain the optimal wavelet coefficient wh∗ , wl∗ by minimizing
the objective function through the SALSA algorithm.
(3) Reconstruct the optimal sparse representation xh∗ , xl∗ of high-resonance components
and low-resonance components.
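The three steps above can be sketched with a toy ISTA solver for the sparse decomposition objective. This is only an illustration, not the paper's TQWT-based implementation: as stand-ins for the two wavelet dictionaries Φh, Φl it uses an orthonormal DCT dictionary (oscillatory, "high-resonance") and the identity basis (transient spikes, "low-resonance"); all names and parameter values here are ours.

```python
import numpy as np

def soft(x, t):
    # soft-thresholding: the proximal operator of the L1 norm
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def mca_separate(x, Phi_h, Phi_l, lam_h=0.05, lam_l=0.05, n_iter=300):
    # ISTA on 0.5*||x - Phi_h wh - Phi_l wl||^2 + lam_h*||wh||_1 + lam_l*||wl||_1,
    # a simplified form of the objective J(wl, wh) in Equation (2)
    wh = np.zeros(Phi_h.shape[1])
    wl = np.zeros(Phi_l.shape[1])
    step = 0.5  # 1/L; L = 2 for two orthonormal dictionaries stacked side by side
    for _ in range(n_iter):
        r = x - Phi_h @ wh - Phi_l @ wl
        wh = soft(wh + step * (Phi_h.T @ r), step * lam_h)
        wl = soft(wl + step * (Phi_l.T @ r), step * lam_l)
    return Phi_h @ wh, Phi_l @ wl

# stand-in dictionaries: orthonormal DCT-II atoms and the identity basis
N = 64
k = np.arange(N).reshape(-1, 1)
n = np.arange(N)
D = np.cos(np.pi * (n + 0.5) * k / N) * np.sqrt(2.0 / N)
D[0] /= np.sqrt(2.0)
Phi_h, Phi_l = D.T, np.eye(N)
```

Separating a cosine atom plus a spike with `mca_separate` recovers the oscillatory part in the first output and the transient part in the second, mirroring how RSSD splits ship radiated noise into high- and low-resonance components.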
Figure 3. TQWT analysis filter bank: at each level j (1 ≤ j ≤ J), the input x(n) passes through a low-pass filter Hlow(w) with low-pass scaling (LPS) α, whose output is fed to the next level, and a high-pass filter Hhigh(w) with high-pass scaling (HPS) β, which produces the subband W_H^j; the final low-pass output is W_L^J.
Figure 4. TQWT synthesis filter bank: the inverse filters 1/Hlow(w), 1/Hhigh(w) with scalings LPS 1/α and HPS 1/β reconstruct y(n) from the subbands level by level.
The analysis filter bank of each layer is composed of a high-pass filter Hhigh(w), a low-pass filter Hlow(w), and the corresponding scaling processes, which are defined as follows:

$$H_{high}(w) = \begin{cases} 0, & |w| \le (1-\beta)\pi \\ \theta\!\left(\dfrac{\alpha\pi - w}{\alpha + \beta - 1}\right), & (1-\beta)\pi \le w \le \alpha\pi \\ 1, & \alpha\pi \le |w| \le \pi \end{cases} \quad (7)$$

$$H_{low}(w) = \begin{cases} 1, & |w| \le (1-\beta)\pi \\ \theta\!\left(\dfrac{w + (\beta-1)\pi}{\alpha + \beta - 1}\right), & (1-\beta)\pi \le w \le \alpha\pi \\ 0, & \alpha\pi \le |w| \le \pi \end{cases} \quad (8)$$

$\theta(w) = 0.5\,(1 + \cos w)\sqrt{2 - \cos w}$ is the Daubechies filter with two vanishing moments [27]. α, β (0 < α < 1, 0 < β < 1) are the scaling factors applied after the signal passes through the low-pass and high-pass filters, respectively. The low-pass scaling LPS α and the high-pass scaling HPS β are defined as:

$$Y(w) = X(\alpha w), \quad |w| \le \pi, \quad (9)$$

$$Y(w) = \begin{cases} X(\beta w + (1-\beta)\pi), & 0 \le w \le \pi \\ X(\beta w - (1-\beta)\pi), & -\pi < w < 0 \end{cases} \quad (10)$$
The Q factor quantifies the degree of signal resonance, and it is defined as f_c/BW, where f_c represents the center frequency of the signal and BW represents the bandwidth. If the sampling frequency of the original input signal is f_s, then the center frequency f_c at filter bank level j can be expressed in terms of α, β [31] as:

$$f_c = \alpha^j \, \frac{2-\beta}{4\alpha} \, f_s. \quad (11)$$

Similarly, the bandwidth BW can be expressed as:

$$BW = \frac{\beta\,\alpha^{j-1}}{4} \, f_s, \quad (12)$$

so that the Q factor is:

$$Q = \frac{f_c}{BW} = \frac{2-\beta}{\beta}. \quad (13)$$
After the original signal passes through the filter bank, the output of the low-pass
channel is iteratively inputted to the deeper level filter bank until the preset level J. At the
same time, the wavelet basis functions Φh , Φl are constructed by selecting the oversampling
rate r. The deepest level Jmax and the oversampling rate r are defined as follows:
$$r = \frac{\beta}{1-\alpha}, \quad (14)$$

$$J_{max} = \left\lfloor \frac{\log(\beta N/8)}{\log(1/\alpha)} \right\rfloor. \quad (15)$$
In summary, in the TQWT algorithm, Q, r, and J can be calculated by selecting α, β, and the selection of α, β is determined only by the inherent oscillation characteristics of the signal. Therefore, α, β can be selected flexibly according to the specific requirements on Q, r, J. For the input signal of ship radiated noise, we need to set Q_h, r_h, J_h to extract its high-resonance information and Q_l, r_l, J_l to extract its low-resonance information.
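As a quick sanity check on these relations, a small helper (function and variable names are ours, not the paper's) can compute Q, r, J_max, and the per-level center frequency from α, β following Equations (11)–(15):

```python
import math

def tqwt_params(alpha, beta, N, fs):
    # collect the TQWT parameter relations: Q factor (13), oversampling
    # rate r (14), deepest level Jmax (15), and center frequency f_c (11)
    Q = (2.0 - beta) / beta
    r = beta / (1.0 - alpha)
    j_max = int(math.floor(math.log(beta * N / 8.0) / math.log(1.0 / alpha)))

    def fc(j):
        # center frequency of level j for sampling frequency fs
        return alpha**j * (2.0 - beta) / (4.0 * alpha) * fs

    return Q, r, j_max, fc
```

For example, α = 0.8, β = 0.4 gives Q = 4 and r = 2, i.e., a moderately oscillatory dictionary with twofold redundancy.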
where STFT {·} is short-time Fourier transform, s(t) is the signal to be transformed and
w(t) is the window function (truncating function). The process of calculating the LOFAR
spectrum can be compared with the “LOFAR Spectrum” in the Feature Processing stage in
Figure 1. The specific calculation steps are as follows:
(1) Framing and windowing. The sound signal is non-stationary globally, but can be regarded as stationary locally. Subsequent processing requires a stationary input signal, so it is necessary to frame the entire signal, i.e., to divide it into multiple segments. We divide the sampling sequence of the signal into K frames, each containing N sampling points. The larger N and K, the larger the amount of data and the closer the final result is to the true value. Due to the correlation between frames, adjacent frames usually overlap by some points. Framing is equivalent to truncating the signal, which distorts its spectrum and leaks its spectral energy. To reduce spectral energy leakage, different truncation functions, called window functions, can be used to truncate the signal. Window functions in practical use include the Hamming window, the rectangular window, and the Hanning window.
(2) Normalization and decentralization. The signal of each frame needs to be normalized
and decentralized, which can be calculated by the following formula:
$$s''(t) = \frac{s(t) - E[s(t)]}{\max(|s'(t)|)}. \quad (17)$$

Here, s'(t) is the normalization of s(t), which makes the power of the signal uniform in time, and s''(t) is the decentralization of s(t), which makes the mean of the samples zero.
(3) Perform Fourier transform on each frame signal and arrange the transformed spectrum
in the time domain to obtain the LOFAR spectrum.
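The three steps above can be sketched as follows. The frame length, hop, and window choice are illustrative, not the paper's exact settings, and the decentralize/normalize order follows Equation (17) only loosely:

```python
import numpy as np

def lofar_spectrum(s, n_fft=1024, hop=512):
    # framing + Hanning window + per-frame decentralize/normalize + FFT;
    # log-amplitude columns are arranged along time (frequency x time)
    frames = []
    for start in range(0, len(s) - n_fft + 1, hop):
        frame = s[start:start + n_fft].astype(float)
        frame = frame - frame.mean()            # decentralize (zero mean)
        peak = np.max(np.abs(frame))
        if peak > 0:
            frame = frame / peak                # normalize to [-1, 1]
        frame = frame * np.hanning(n_fft)       # window against spectral leakage
        mag = np.abs(np.fft.rfft(frame))
        frames.append(20.0 * np.log10(mag + 1e-10))
    return np.array(frames).T
```

Fed a pure tone, the resulting matrix shows a single horizontal line at the tone's frequency bin, which is exactly the kind of line spectrum the later enhancement stage tracks.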
$$O(\eta) = \frac{\lambda F(\eta) + \mu T(\eta)}{A(\eta)}, \quad (18)$$

where η represents a summation path along the time axis in the observation window of the LOFAR graph, and the length of the path is N. A(η) characterizes the amplitude of the line spectrum, F(η) is the frequency continuity of the line spectrum, T(η) is the trajectory continuity of the line spectrum, and λ and µ are weighting coefficients. The definitions of A(η), F(η), and T(η) are as follows:

$$A(\eta) = \sum_{i=1}^{N} a(P_i), \quad (19)$$

$$F(\eta) = \sum_{i=3}^{N} \left| d(P_{i-2}, P_{i-1}) - d(P_{i-1}, P_i) \right|, \quad (20)$$

$$T(\eta) = \sum_{i=1}^{N} g(P_i). \quad (21)$$

Each pixel on the summation path is P_i (1 ≤ i ≤ N), denoting a point on the ith line of the time axis. a(P_i) characterizes the amplitude of the point P_i, and d(P_{i-1}, P_i) characterizes the frequency gradient between two adjacent points on the path, which is defined as follows:
3.3. Sliding Window Line Spectrum Extraction Algorithm Based on Multi-Step Decision
In this section, a sliding window line spectrum extraction algorithm based on multi-
step decision is used to search for the optimal path. As shown in Figure 5, in this algorithm,
a window which can slide along the frequency axis and cover the whole time axis is set in
the LOFAR spectrum, and we search for the optimal path within this window. The reason for setting the window is that multiple line spectra may coexist in the LOFAR spectrum. By properly setting the size of the window, the search range of the path can be limited to a certain region of the LOFAR spectrum. The line spectrum in each window can then be extracted, which avoids extracting only the strongest spectral line in the whole LOFAR spectrum.
Figure 5. Frequency-domain sliding window multi-step decision dynamic tracking line spectrum.
To cover a line spectrum in a search window, the size of the window is related to
the line spectrum broadening and the frequency resolution in the LOFAR spectrum. The
specific calculation steps are as follows:
(1) The length of the frequency axis in the LOFAR spectrum is M; the start point is f_1 and the end point is f_M. The length of the time axis is N; the start point is t_1 and the end point is t_N. The search window size is defined as L.
(2) Each point in the figure is defined as P_i^j, representing the time-frequency pixel in the jth column on the frequency axis and the ith row on the time axis, where 1 ≤ j ≤ M, 1 ≤ i ≤ N. η*_{P_i^j} represents the optimal path from t_1 ending at P_i^j in the observation window. (A(η*_{P_i^j}), F(η*_{P_i^j}), T(η*_{P_i^j})) is defined as a ternary vector for the point P_i^j, and the triplet of each point at t_1 is initialized to (a(P_1^j), 0, 0).
(3) From t_2 to t_N, find the optimal paths with lengths from 2 to N in the search window line by line. In the figure, P_i is any point at t_i, the start position of the observation window is f_k, and the corresponding end position is f_{k+L−1}. At t_{i−1}, the L neighboring points of P_i form the set V(P_i) = {P_{i−1}^k, ..., P_{i−1}^{k+L−1}}. The optimal path η*_{P_i} of length i ending at the point P_i is obtained from the optimal path η*_{P_{i−1}^j} of some P_{i−1}^j ∈ V(P_i), i.e., η*_{P_i} = η*_{P_{i−1}^j} ∪ {P_i}, where k ≤ j ≤ k + L − 1 satisfies:
(5) Set a counter for each time-frequency point in the LOFAR spectrum, and the counter
value is initialized to 0. If the value of the objective function O(η ∗) corresponding to
the optimal path η ∗ in the search window is greater than the threshold γ, we would
consider that there is a line spectrum on the optimal path, and the counter values
corresponding to the N points on the optimal path are increased by 1 respectively.
The specific steps of threshold calculation are as follows:
First, the input of the algorithm is changed from the LOFAR spectrum of ship radiated noise to the LOFAR spectrum of marine environmental noise. The cost function value O(η^r_noise) of the optimal path η^r_noise in the rth observation window is obtained, where 1 ≤ r ≤ M − L + 1. The threshold is then:

$$\gamma = \min_{1 \le r \le M-L+1} O(\eta_{noise}^r). \quad (29)$$
(6) Slide the search window with a step size of 1. Repeat the above steps until the
observation window slides to the end. The output count value graph is the traced
line spectrum.
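The dynamic programming at the core of steps (2)–(6) can be sketched in a simplified form. Instead of the full (A, F, T) cost, the sliding window, and the noise-derived threshold γ, this sketch maximizes summed amplitude minus a frequency-jump penalty over the whole spectrum (`lam` and `max_jump` are assumed values of ours) and returns the traced frequency bin per frame:

```python
import numpy as np

def track_line_spectrum(lofar, max_jump=1, lam=0.5):
    # lofar: (N time frames) x (M frequency bins) amplitude matrix.
    # Forward pass: best cumulative score ending at each bin, with the
    # predecessor restricted to a +-max_jump neighborhood (the "window").
    N, M = lofar.shape
    score = np.zeros((N, M))
    back = np.zeros((N, M), dtype=int)
    score[0] = lofar[0]
    for i in range(1, N):
        for j in range(M):
            lo = max(0, j - max_jump)
            hi = min(M, j + max_jump + 1)
            cand = score[i - 1, lo:hi] - lam * np.abs(np.arange(lo, hi) - j)
            k = int(np.argmax(cand))
            score[i, j] = lofar[i, j] + cand[k]
            back[i, j] = lo + k
    # backtrack from the best final bin to recover the traced path
    path = [int(np.argmax(score[-1]))]
    for i in range(N - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```

Because each frame's decision reuses the optimal sub-paths of the previous frame, a weak but continuous line still wins over isolated loud pixels, which is what lets the full algorithm bridge breakpoints.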
Item                 Value
Optimizer            Adam
Learning rate        0.01
Number of samples    200
Training rounds      30
Loss function        Cross-entropy loss
Adam was originally proposed by Diederik Kingma of OpenAI and Jimmy Ba of the University of Toronto [36]. It is a first-order optimization algorithm that can replace the traditional stochastic gradient descent process and iteratively updates neural network weights based on the training data.
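A minimal sketch of one Adam update in the standard Kingma–Ba form (this is an illustration of the optimizer, not code from the paper; in the experiments the Keras implementation is used):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # exponential moving averages of the gradient and its square,
    # with bias correction for step t >= 1 (Kingma & Ba [36])
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Applied to the toy loss f(w) = w², repeated calls drive w from 1.0 toward 0 with the per-step magnitude bounded by roughly the learning rate, which is why Adam tolerates the relatively large lr = 0.01 used here.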
5. Numerical Results
5.1. Source of Experimental Data
The experimental data used in this article are divided into two parts. The first part is the underwater acoustic database named ShipsEar [5], which was recorded by David et al. in the port of Vigo and its vicinity on the Atlantic coast of northwestern Spain. The second part is based on four types of signals simulated from ship radiated noise. By mixing them with audios No. 81–92 in the database, which are treated as pure marine environment background noise, simulated ship radiated noise under different signal-to-noise ratios is obtained.
Vigo Port is one of the largest ports in the world, with considerable cargo and passenger traffic. Taking advantage of the high traffic intensity of the port and the diversity of ships, one can record the radiated noise of many different types of ships at the dock, including fishing boats, ocean liners, roll-on/roll-off ships, tugboats, yachts, small sailboats, etc. The ShipsEar database contains 11 ship types (plus marine environmental noise) and a total of 90 audio recordings in "wav" format, with audio lengths varying from 10 s to 11 min. By extracting and summarizing the audios in the database, they are divided into four categories according to the size of the ship types collected, as shown in Table 3. In addition, the date and weather conditions of the collected audios, the coordinates and driving status of each ship's position, the number, depth, and power gain of the hydrophones, and atmospheric and marine environmental data are also listed in detail. This information can be used as a reference in the study.
Class    Ship types
W        Fishing boat, trawler, mussel harvester, tugboat, dredge
X        Motorboat, pilot boat, sailboat
Y        Passenger ferry
Z        Ocean liner, ro-ro ship
During the whole experiment, we mainly used two pieces of software, PyCharm 2019.1 and MATLAB R2016b. Referring to Figure 1, we used MATLAB to complete the entire feature extraction process; the algorithm in Section 3 computes the LOFAR spectrum samples in MATLAB. For the algorithm in Section 4, we used PyCharm and Python to complete the task of CNN-based underwater target recognition. In addition, the deep-learning libraries and toolboxes used are Keras (offline training and online testing), Librosa (all phases), and TQWT (sampling and feature preprocessing).
5.3. Multi-Step Decision LOFAR Line Spectrum Enhancement Algorithm Validity Test
In this section, the audio data of ShipsEar (a database of measured ship radiated noise)
is used to verify the effectiveness of the algorithm.
In the process of testing the effectiveness of the multi-step decision LOFAR line
spectrum enhancement algorithm, morphological component analysis is required. In
Section 2.1.1, we detailed the use of the RSSD algorithm to construct the optimal sparse
representation of the high and low-resonance components in the ship radiated noise. In
the RSSD algorithm, it is necessary to select an appropriate filter scaling factor according to
where N_A(f) and N_B(f) represent the power spectra of the two types of signals A and B, respectively, and f_1 and f_2 delimit the range of the power spectrum. This means that radiated noise from two types of ships with a higher degree of difference has a smaller spectral correlation coefficient. It can be seen from Table 5 that the spectral correlation coefficients of the high-resonance components of signals A and B are smaller than those of the original signals. This means that we can enhance the degree of difference between the two signals by extracting their high-resonance components.
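The defining equation of the spectral correlation coefficient is lost from this copy; a standard normalized form consistent with the description (an assumption on our part, not necessarily the paper's exact definition) is:

```python
import numpy as np

def spectral_corr(Na, Nb):
    # normalized correlation of two power spectra N_A(f), N_B(f)
    # sampled over the band [f1, f2]; smaller values mean the two
    # ship classes are easier to distinguish
    return float(np.sum(Na * Nb) / np.sqrt(np.sum(Na**2) * np.sum(Nb**2)))
```

Identical spectra score 1.0 and spectra with disjoint support score 0.0, so a drop in this coefficient after extracting the high-resonance components indicates increased class separability.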
Figure 6. Percentage of total energy in each frequency band of the low-resonance component signal.
Figure 7. Percentage of total energy in each frequency band of the original signal.
Figure 8. Percentage of total energy in each frequency band of the high-resonance component signal.
Table 5. Spectral correlation coefficients between the two types of original signals and their high-
resonance components.
For the line spectrum enhancement algorithm based on multi-step decision, the experimental results are shown in Figures 9 and 10, which show the LOFAR spectrum of the original signal and the LOFAR spectrum after line spectrum enhancement, respectively. In Figure 9, there is an obvious line spectrum in the part marked by white circles, but the line spectrum is broken in the part marked by black circles. In Figure 10, the line spectra indicated by the white circles are extended to completeness, and the vacant parts of the line spectra indicated by the black circles are also completed. Therefore, even if a line spectrum in the LOFAR spectrum has "breakpoints" or "broken lines", or is only a short line due to noise interference, the line spectrum enhancement algorithm can still extend and complete it.
Figure 9. LOFAR spectrum of the original signal (color scale in dB).
Figure 10. LOFAR spectrum after line spectrum enhancement (color scale in dB).
limited to a certain range, which can eliminate singular sample data. At the same
time, it can also avoid the saturation of neurons and accelerate the convergence rate
of the network.
(3) First, perform the Fourier transform on each frame signal. Second, take the logarithmic amplitude spectrum of the transformed spectrum and arrange it along the time axis. Last, take 64 points on the time axis as one sample, i.e., obtain a LOFAR spectrum sample of size 1024 × 64. The sampling frequency of the audio is 52,734 Hz, and the duration of each sample is about 0.62 s. The numbers of training and testing samples of each class are shown in Table 6. The ID in the table is the label of the audio in the ShipsEar database; the ship type corresponding to each segment can be obtained from the ID and is used as the label for supervised learning of the deep neural networks.
(4) The sample obtained in step (3) is treated with LOFAR spectrum enhancement. The
specific sample processing process and calculation process are in Section 3.3. Then the
LOFAR spectrum with enhanced line spectrum characteristics is obtained. The LOFAR
spectrum is a two-dimensional matrix, which can be regarded as a single-channel
image. After that as shown in Figure 1, the data enhanced by the multi-step decision
LOFAR spectrum is input into the CNN network for subsequent identification.
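The slicing in step (3), which turns the long LOFAR matrix into fixed-width single-channel "images" for the CNN, can be sketched as follows (the helper name is ours; 64 frames per sample follows the text):

```python
import numpy as np

def make_samples(lofar, width=64):
    # lofar: (frequency bins) x (time frames) matrix; cut the time axis
    # into consecutive chunks of `width` frames, dropping the remainder,
    # and return an array of (n_samples, bins, width) CNN inputs
    n_bins, n_frames = lofar.shape
    n = n_frames // width
    cut = lofar[:, :n * width]
    return cut.reshape(n_bins, n, width).transpose(1, 0, 2)
```

With a 1024-bin spectrum this yields the 1024 × 64 samples described above; each sample can then be given a trailing channel axis before being fed to the Keras model.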
Real label \ Prediction      W         X         Y         Z
X                            0.0504    0.8760    0.0736    0
Y                            0         0         1.0000    0
Z                            0         0.0017    0.0216    0.9768
Figure 11. Confusion matrix of four types of measured ship radiated noise under CNN.
Table 7. The recognition accuracy rate of the four types of measured ship radiated noise.
Figure 12 shows the ROC curve and the corresponding AUC value of the four types
of signals. The horizontal axis uses a logarithmic scale to enlarge the ROC curve in the
upper left corner. The ROC curves of the signals of W, Y, and Z are relatively close to the
(0, 1) point, and their classification effects are relatively good. However, the ROC curve
of the signals of type X is closest to the 45-degree line, so its classification effect is the worst. Judging from the AUC values, the AUC of the Z-type signal is the highest, reaching 0.9981. The AUC values of the W-type and Y-type signals reach 0.9952 and 0.9925, respectively, while the AUC value of the X-type signal is only 0.9702. Therefore, the classification effect of the X-type signal is inferior to that of the other three types of signals.
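The AUC values discussed here can be computed without tracing the full ROC curve, using the rank-statistic equivalence (AUC equals the probability that a positive sample outranks a negative one). A minimal sketch, assuming distinct scores (no tie handling):

```python
import numpy as np

def roc_auc(scores, labels):
    # Mann-Whitney form of AUC: sum of ranks of the positive samples,
    # shifted and normalized; equivalent to the area under the ROC curve
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return float((ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
```

For a multi-class problem such as the four ship types, this is applied once per class in one-vs-rest fashion, which is how per-class AUC values like 0.9981 for Z arise.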
Figure 12. The ROC curve and AUC value of four types of measured ship radiated noise under CNN.
6. Conclusions
In this paper, we have studied underwater target recognition using the LOFAR spec-
trum. First, a deep-learning underwater target-recognition framework based on multi-step
decision LOFAR line spectrum enhancement is developed, in which we use CNN for
offline training and online testing. Under the developed underwater target-recognition
framework, we then use the LOFAR spectrum as the input of the CNN. In particular, in calculating the LOFAR spectrum of the high-resonance component, we use the resonance-based decomposition algorithm and design the multi-step-decision LOFAR line spectrum enhancement algorithm. In this way, the difference between the radiated noise of different types of ships is enhanced, and broken line spectra can be detected and restored. Finally, we conduct extensive experiments in terms of detection performance, scalability, and complexity. The results show that the LOFAR-CNN method achieves a recognition rate of 95.22% on measured ship radiation noise, an improvement in recognition accuracy over other traditional methods.
7. Future Works
This paper uses deep-learning methods to provide a framework for underwater target recognition. The algorithm shows excellent underwater target-recognition ability and has great application value in many areas, such as seabed exploration, oil platform monitoring, and economic fish detection. It can not only detect dangerous objects in advance and improve the safety of ship navigation, but also create greater economic benefits. However, there are still some shortcomings that need to be resolved.
(1) Most studies on underwater target recognition do not disclose their data sources, for reasons such as confidentiality, and the field also lacks unified, standardized data sets. The measured ship-radiated noise data set used in this article is already one of the few publicly available underwater acoustic data sets. However, the data set itself is severely disturbed by marine environmental noise, the samples of the various ship types are unevenly distributed, and the total number of samples is insufficient. Therefore, how to combine underwater target recognition with deep learning under such limited conditions remains a significant challenge.
(2) Because the data set is severely disturbed, this paper adopts a series of feature-enhancing preprocessing methods to improve the recognition rate, with excellent results. Further reducing the impact of ocean noise and evaluating the effect of different neural network architectures on recognition performance can be considered in future work.
Author Contributions: Conceptualization, J.C.; methodology, J.C., B.H., X.M. and J.Z.; software, B.H.
and X.M.; validation, B.H. and X.M.; formal analysis, J.C. and B.H.; investigation, J.C. and B.H.;
resources, J.C. and B.H.; data curation, J.C., B.H., X.M. and J.Z.; writing—original draft preparation,
J.C. and B.H.; writing—review and editing, J.C. and B.H.; visualization, J.C. and B.H.; supervision,
J.C.; project administration, J.C.; funding acquisition, J.C. All authors have read and agreed to the
published version of the manuscript.
Funding: This research was funded by the China National Key R&D Program under grant
2020YFB1807700.
Data Availability Statement: Not applicable; the study does not report any data.
Conflicts of Interest: The authors declare no conflict of interest.