1. Introduction
The advancement of science and technology has led to increased resource consumption and a growing scarcity of land resources, making the resource-rich oceans the target of competition among nations [1]. AUVs (Autonomous Underwater Vehicles), with their outstanding underwater operational capabilities, have become one of the most widely used devices in ocean exploration [2].
Since the beginning of the new century, with the acceleration of ocean development, underwater robots, particularly AUVs, have entered the public eye. The excellent underwater mobility and high level of autonomous control of AUVs have significantly enhanced human capabilities in exploring and developing the deep sea, showing great potential in marine exploration, surveying, and mapping [3]. Precise navigation is a prerequisite for AUVs to perform various underwater missions, and its accuracy directly affects the effectiveness of underwater detection, especially in hydrographic data collection and seabed mapping. Because GNSS signals cannot be received underwater, AUVs cannot correct their position in real time [4]; thus, inertial navigation systems have been widely applied in underwater navigation. Currently, the integration of SINS (Strapdown Inertial Navigation System) and DVL (Doppler Velocity Log) remains the mainstream solution for underwater navigation. Although the cooperative navigation of inertial and acoustic velocity-measuring devices generally meets requirements and keeps navigation errors within a controllable range, its limitation lies in the dependency of navigation accuracy on the precision of the equipment. The high cost of SINS and DVL limits the application and data collection of small AUVs [5]. Specifically, the cost of fiber-optic inertial navigation systems can reach tens of thousands of dollars, making them unaffordable for small teams. For AUVs that cannot frequently surface to obtain GNSS signals, acoustic navigation is a suitable alternative. Indeed, numerous studies have used USBL/LBL to correct the cumulative errors of dead reckoning, but this approach inherently relies on external sensors, which increases costs. Additionally, USBL/LBL base stations must be pre-deployed, restricting the activity range of AUVs and making them unsuitable for long-distance missions [6].
In recent years, Visual-Inertial Navigation Systems (VINS) have demonstrated significant potential in applications such as autonomous driving and drone navigation, providing accurate and robust navigation solutions [7]. However, despite its many advantages, the application of VINS to AUV navigation faces substantial limitations in underwater environments characterized by weak textures and low lighting [8]. Underwater SLAM (Simultaneous Localization and Mapping) based on imaging sonar has also made significant progress, but most SLAM algorithms designed for AUVs remain prohibitively expensive to run, making the real-time requirements of practical implementation in AUVs difficult to meet [9,10].
MEMS inertial sensors have dominated the market due to their low cost and relatively reliable performance. Thanks to advanced MEMS manufacturing processes, these sensors can achieve precise measurements and have significant advantages in size, weight, and power consumption. However, the accuracy of IMUs is susceptible to calibration parameters, including scale factors and axis misalignment [11,12]. Inaccurate calibration leads to imprecise measurements of angular velocity and linear acceleration, causing errors to accumulate rapidly during integration, which can have severe consequences [13].
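Concretely, these calibration parameters enter the standard gyroscope measurement model. A common form (the notation here is our own, introduced for illustration) is:

```latex
\boldsymbol{\omega}^{\mathrm{IMU}} \;=\; \mathbf{S}\,\mathbf{M}\,\boldsymbol{\omega} \;+\; \mathbf{b} \;+\; \mathbf{n},
```

where $\boldsymbol{\omega}$ is the true angular rate, $\mathbf{S}$ is a diagonal scale-factor matrix, $\mathbf{M}$ captures axis misalignment (unit diagonal with small off-diagonal terms), $\mathbf{b}$ is the bias, and $\mathbf{n}$ is measurement noise. Errors in $\mathbf{S}$, $\mathbf{M}$, or $\mathbf{b}$ propagate directly into the integrated orientation.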
To limit IMU error drift and fundamentally address the issue of cumulative error, scholars have conducted extensive research. Improving on traditional calibration methods, such as gravity-based accelerometer correction and continuous-rotation-based gyroscope correction, Rohac proposed a calibration method based on a sensor error model for accurately correcting MEMS sensor errors [12]. Additionally, by extending the open-source Kalibr toolkit [14], Rehder and colleagues achieved simultaneous calibration of multiple IMU parameters [15]. Over the past decade, with the tremendous success of neural networks in various fields, many researchers have applied them to dead reckoning [16], indoor pedestrian navigation [17], gyroscope noise reduction [18], and attitude estimation [19]. Despite the relatively shallow networks designed in these methods, the results obtained have been remarkably effective [20]. However, most of these studies focus on datasets collected by UAVs or ground rovers and have not yet addressed surface or underwater environments.
In this paper, we propose a lightweight CNN–LSTM network for the calibration of low-cost IMU gyroscopes. The network is built on the mathematical model of gyroscope calibration and extracts spatiotemporal features from historical IMU sequences to dynamically calibrate IMU measurements. Using the calibrated measurements, we achieve orientation estimation accuracy approaching that of fiber-optic SINS. Integrated with a DVL (an instrument that uses the Doppler effect to measure velocity over the seabed), this method provides a new approach to low-cost underwater navigation. The methodology framework is shown in Figure 1.
Our contributions can be summarized as follows:
We propose a lightweight CNN–LSTM model based on dilated convolutions to dynamically compensate for IMU measurement errors through learning.
We introduce a gyroscope calibration matrix (including additive and multiplicative noise) to construct a training and calibration framework, optimizing the calibration matrix and hyperparameters within the network through learning.
We conduct qualitative and quantitative evaluations of the proposed method on a self-made dataset. Because current research mostly focuses on ground vehicles and drones, we created a waterway navigation dataset using a motorboat to fill the gap in waterborne and underwater navigation. The experimental results show that the denoised MEMS IMU data achieve accuracy close to that of fiber-optic SINS. When integrated with DVL, this method provides a reliable reference for low-cost underwater navigation solutions.
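As a sketch of the calibration step described in the second contribution, the code below applies a multiplicative correction matrix and an additive bias correction to raw gyroscope samples. The correction terms, which the CNN–LSTM would predict from the IMU history, are stubbed here with fixed illustrative values; this is a minimal NumPy illustration, not the network itself.

```python
import numpy as np

def calibrate_gyro(omega_raw, C_hat, b_hat):
    """Apply the learned calibration: omega_cal = C_hat @ omega_raw + b_hat.

    omega_raw : (N, 3) raw gyroscope samples [rad/s]
    C_hat     : (3, 3) multiplicative correction (identity + learned residual,
                covering scale-factor and axis-misalignment errors)
    b_hat     : (3,)   additive correction (learned bias estimate)
    """
    return omega_raw @ C_hat.T + b_hat

# Hypothetical correction terms: a small residual around identity plus a
# constant bias estimate (values chosen only for illustration).
C_hat = np.eye(3) + np.array([[0.01,  0.002, 0.0],
                              [0.0,  -0.008, 0.001],
                              [0.003, 0.0,   0.005]])
b_hat = np.array([-0.002, 0.001, 0.0005])  # rad/s

omega_raw = np.array([[0.10, -0.05, 0.20],
                      [0.11, -0.04, 0.21]])
omega_cal = calibrate_gyro(omega_raw, C_hat, b_hat)
```

In the full method, `C_hat` and `b_hat` are not constants but are regressed per time window by the network, which is what makes the calibration dynamic.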
This paper is organized as follows: Section 2 introduces related work; Section 3 presents the mathematical theory and our method; Section 4 qualitatively and quantitatively tests the proposed method through experiments; and Section 5 concludes with future work.
2. Related Works
To improve the accuracy of IMUs, their parameters need to be calibrated. Tedaldi et al. [21] proposed an IMU calibration method that does not require external equipment; it involves placing the IMU in multiple static orientations. To avoid unobservability in the estimation of the calibration parameters, data from at least nine different orientations must be collected, and the more orientations used, the more accurate the calibration results. Cheuk et al. [22] proposed an automatic IMU calibration method that obtains the IMU's scale factors, biases, and axis-alignment errors by rotating all axes of the IMU and holding it stationary in 12 positions. Zhang et al. [23] proposed an IMU calibration method based on a three-axis turntable, which not only removes biases but also corrects the angular correlations between different sensors. To address on-site IMU calibration, Qureshi [24] proposed a method using the Earth's gravitational field as a reference; it requires no external equipment and only a few simple rotations of the IMU within 20 min. These methods rely on static mathematical models and cannot run online, making it difficult to meet the needs of dynamic calibration.
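The multi-position principle behind these methods can be illustrated in its simplest form: with the sensitive axis of an accelerometer held pointing up and then down, bias and scale factor follow in closed form from the two static readings. The snippet below uses synthetic values and a linear error model `a = s*g + b`; it is a didactic reduction of the full multi-orientation problem, not any cited author's implementation.

```python
import numpy as np

G = 9.80665  # standard gravity [m/s^2]

def two_position_calibration(a_up, a_down):
    """Closed-form single-axis calibration from static readings with the
    axis pointing up (+g) and down (-g), assuming a = s*g + b."""
    bias = (a_up + a_down) / 2.0
    scale = (a_up - a_down) / (2.0 * G)
    return bias, scale

# Synthetic sensor with known errors: scale 1.02, bias 0.15 m/s^2.
true_scale, true_bias = 1.02, 0.15
a_up = true_scale * G + true_bias
a_down = true_scale * (-G) + true_bias
bias, scale = two_position_calibration(a_up, a_down)
```

With nine or more orientations, as in Tedaldi et al., the same idea generalizes to a nonlinear least-squares fit of the full scale/misalignment/bias model against the known gravity magnitude.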
For VINS, Furgale et al. proposed a method that computes the IMU calibration parameters offline by pre-calibrating the extrinsic parameters between the camera and the IMU, known as the Kalibr library [14]. This method has been widely used in visual–inertial odometry and was extended in 2016 to achieve axis calibration for multiple IMUs [15]. For visual–inertial odometry, Qin et al. proposed an online calibration method that dynamically optimizes model parameters [25] and also introduced VINS-Mono, one of the most advanced visual–inertial odometry systems [26]. However, such methods rely on additional equipment, and because clear optical images are difficult to obtain under low-light conditions underwater, they are challenging to apply in underwater navigation.
Additionally, given the tremendous success of deep learning in various fields, researchers have started to use deep learning techniques to calibrate IMU errors. Herath et al. proposed RoNIN, which uses three network architectures based on ResNet, LSTM, and TCN to regress true velocities [27]. Chen et al. introduced IONet, an LSTM-based network that learns positional transformations in polar coordinates from raw IMU data and constructs an inertial odometry system, reducing errors by segmenting inertial data into independent windows [28]. Esfahani et al. proposed OriNet, a deep learning framework that achieves accurate orientation estimation for drones by learning from IMU sequences [29]. Liu et al. proposed TLIO, which combines ResNet with an EKF to predict displacement and its uncertainty by learning from data collected by head-mounted IMUs [30]. Nobre et al. introduced a reinforcement-learning-based framework that models the IMU calibration process as a Markov decision process and regresses optimal calibration parameters [31]. Brossard et al. proposed a method based on dilated convolutions [18], using only five dilated convolutional layers to extract spatiotemporal features from past IMU sequences and regress the true orientation. This method has been validated on datasets such as EuRoC [32] and performs comparably to top visual–inertial odometry systems like OpenVINS [33].
Russo et al. developed an intelligent deep denoising autoencoder to enhance Kalman filter outputs, its key advantage being a comprehensive noise compensation model that eliminates the need to handle each influencing factor separately [34]. Di Ciaccio et al. introduced DOES, a deep-learning-based method tailored for maritime navigation, designed to improve roll and pitch estimates from a conventional AHRS [35]. Zhang et al. proposed a multi-learning fusion model for denoising low-cost IMU data, integrating convolutional autoencoders, LSTM, and Transformer multi-head attention mechanisms, which proved more effective than traditional signal processing techniques for the complex motion characteristics of ships [36]. Wang et al. presented a denoising approach that combines the nonlinear generalization ability of SVM with the multi-resolution capabilities of the wavelet transform, fusing uncertain data from INS and GPS [37].
In summary, although research on deep-learning-based IMU calibration is still in its infancy, existing studies indicate that deep-learning-based MEMS IMU calibration can be performed online without relying on external equipment. Its application in underwater navigation is feasible, providing a new approach to reducing the cost of underwater navigation.
4. Experimental Evaluation
This section describes the collection and preprocessing of the dataset and uses it for qualitative and quantitative analysis of our method. In addition, we compare the performance of our method with other learning-based attitude estimation algorithms.
4.1. Data Collection and Preprocessing
We used a motorboat (shown in Figure 4) as the data collection platform. The platform is equipped with a high-precision SINS, RTK GNSS, and a DVL. To ensure diverse data types and a representative sample set, we collected trajectories at different times and under different weather conditions through various navigation tasks, including circular trajectories, irregular maneuvers, and sharp turns.
Our data collection system includes the following sensors:
A fiber optic strapdown inertial navigation system, including high-precision optical fiber gyroscopes and quartz flexible accelerometers, GNSS, and RTK positioning systems, providing high-frequency ground truth data (100 Hz) according to the NMEA-0183 protocol.
A low-precision MEMS inertial navigation system, including MEMS gyroscopes and accelerometers, and RTK GNSS, providing training data (100 Hz).
A Pathfinder RDI 600 kHz DVL, providing low-frequency bottom-track velocity information (5 Hz).
Detailed parameters can be found in Table 3.
Two INS units were installed in the cabin of the boat and synchronized using the precise time provided by GPS. The DVL was mounted on the left side of the boat at a depth of 1 m and synchronized with the INS via a 1 PPS signal. During the experiments, the boat traveled at a constant speed of 6 knots. Experiments were conducted in a natural canyon in northwest China (shown in Figure 5 and Figure 6) in April 2024 and June 2024 (both experiments were conducted on sunny days, with calm lake surfaces or occasional light breezes), lasting a total of 10 h and collecting approximately 30 km of data.
Finally, the raw data were processed into KITTI format [43] and augmented with the bottom-track velocity and altitude information provided by the DVL to facilitate subsequent processing and enhance interpretability. Additionally, we verified the accuracy of the data samples by integrating the collected angular velocity, linear acceleration, and DVL bottom-track velocity into a trajectory and comparing it with the precise positioning provided by RTK GNSS.
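The trajectory check described above can be sketched as a simple 2D dead-reckoning loop: integrate the gyroscope yaw rate into a heading, rotate the DVL body-frame velocity into the navigation frame, and accumulate position. The data below are synthetic stand-ins for the logged measurements; this is an illustration of the verification idea, not the project's processing code.

```python
import numpy as np

def dead_reckon(yaw_rate, v_body, dt):
    """2D dead reckoning.

    yaw_rate : (N,)   gyroscope z-axis rate [rad/s]
    v_body   : (N, 2) DVL forward/lateral velocity [m/s]
    dt       : sample period [s]
    Returns (N, 2) positions in the navigation frame.
    """
    yaw = np.cumsum(yaw_rate) * dt                    # heading from gyro
    c, s = np.cos(yaw), np.sin(yaw)
    v_nav = np.stack([c * v_body[:, 0] - s * v_body[:, 1],
                      s * v_body[:, 0] + c * v_body[:, 1]], axis=1)
    return np.cumsum(v_nav, axis=0) * dt              # position increments

# Synthetic straight run: 3 m/s, zero turn rate, 100 Hz for 10 s.
dt, n = 0.01, 1000
yaw_rate = np.zeros(n)
v_body = np.tile([3.0, 0.0], (n, 1))
traj = dead_reckon(yaw_rate, v_body, dt)
# Endpoint should be ~30 m along x; in practice the endpoint is compared
# against the RTK GNSS position to validate each recorded sample.
```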
4.2. Evaluation Metrics
To quantitatively validate the effectiveness of the method, we use the following metrics, based on those implemented in the EVO toolbox [44,45].
4.2.1. Absolute Orientation Error
The absolute orientation error (AOE) is defined as follows:
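With $\hat{\mathbf{R}}_i$ the estimated and $\mathbf{R}_i$ the ground-truth rotation for the $i$-th interval (notation assumed here, as the original equation is not reproduced in this text), a standard form consistent with the description below is:

```latex
\mathrm{AOE} = \frac{1}{M} \sum_{i=1}^{M} \left\| \log\!\left( \mathbf{R}_i^{\top} \hat{\mathbf{R}}_i \right) \right\|_2
```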
AOE represents the rotational difference between the estimated orientation and the ground truth within each time interval. In our experiments, we calculate the AOE for each time interval and then average these values over the entire trajectory, where M denotes the number of time intervals along the trajectory.
4.2.2. Absolute Positioning Error
The absolute positioning error (APE) is defined as follows:
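With $\hat{\mathbf{p}}_i$ the estimated and $\mathbf{p}_i$ the ground-truth position (notation assumed here, as the original equation is not reproduced in this text), the RMSE form is:

```latex
\mathrm{APE} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left\| \hat{\mathbf{p}}_i - \mathbf{p}_i \right\|_2^2 }
```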
APE represents the difference between the estimated position and the ground truth, calculated as the RMSE (Root Mean Square Error).
4.2.3. Relative Positioning Error
The relative positioning error (RPE) is defined as follows:
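Using the same position notation over an interval $\Delta t$ (assumed here, as the original equation is not reproduced in this text), the RMSE over the position increments is:

```latex
\mathrm{RPE} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left\| \left( \hat{\mathbf{p}}_{i+\Delta t} - \hat{\mathbf{p}}_i \right) - \left( \mathbf{p}_{i+\Delta t} - \mathbf{p}_i \right) \right\|_2^2 }
```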
RPE represents the difference between the incremental changes in position estimates and the ground truth increments within each time interval Δt, calculated as RMSE. In our experiments, Δt is set to 20 s.
4.3. Performance
We compare the following methods: the raw IMU output, Gyros-Net [16], the Mahony complementary filter, and our proposed method.
Table 4 shows the AOE results. Our method achieved the best performance in predicting RPY (roll, pitch, and yaw).
To illustrate the differences more clearly, we selected several trajectories and plotted the orientation estimates and estimation error curves in Figure 7. The black line represents the ground truth (RPY), the red line the raw IMU output, the gold line our method, the turquoise line Gyros-Net [16], and the purple line the Mahony filter. For all methods, the orientation was computed by integration. As the figure shows, due to uncalibrated, severe noise, the error of the raw IMU output accumulates rapidly, causing significant deviations from the ground truth. Our method deviates least from the ground truth, achieving the best results.
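The integration used for all methods can be sketched as incremental right-multiplication of rotation-vector exponentials (Rodrigues' formula). This is a generic illustration of gyroscope integration under our stated sampling assumptions, not the exact evaluation pipeline.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues formula: rotation vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    a = phi / theta
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])   # skew-symmetric matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def integrate_gyro(omega, dt, R0=np.eye(3)):
    """Integrate body-frame angular rates omega (N, 3) [rad/s] into orientation."""
    R = R0.copy()
    for w in omega:
        R = R @ so3_exp(w * dt)   # body-frame rate: right multiplication
    return R

# Constant 90 deg/s about z for 1 s at 100 Hz -> 90 degree yaw rotation.
dt = 0.01
omega = np.tile([0.0, 0.0, np.deg2rad(90.0)], (100, 1))
R = integrate_gyro(omega, dt)
```

Because this is open-loop integration, any residual gyroscope error accumulates over time, which is exactly why the raw IMU curve in Figure 7 diverges while the calibrated one does not.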
4.4. Navigation Experiment
To demonstrate the potential of our method in low-cost underwater navigation, we conducted a series of cooperative navigation experiments, selecting four trajectories with diverse motion patterns. These trajectories cover different navigation conditions to comprehensively test the adaptability and accuracy of our method.
In the experiments, we used a DVL to collect velocity data along these trajectories. We integrated the DVL velocity data with the IMU data denoised by our method to evaluate its performance in providing accurate navigation data and compared it with the traditional SINS–DVL navigation system.
Preliminary results are shown in Figure 8. Encouragingly, our method performs comparably to the SINS–DVL integrated navigation system, suggesting that it has significant potential for low-cost underwater navigation while substantially reducing equipment costs.
Table 5 and Table 6 show the performance of several methods in terms of APE and RPE. We observe that our method performs comparably to the SINS–DVL integrated navigation system. The comparison indicates that our approach is effective in practical scenarios and provides new insights for reducing the cost of underwater navigation.