Terrain Recognition
Abstract—This paper focuses on the problem of using artificial deep neural networks in the terrain recognition task based on data from a vision sensor. Information about the terrain class is valuable for mobile robots, as it can improve the performance of their motion control algorithms through the use of information about surface properties. In this work an RGB-D sensor was used to provide vision data, which comprise a depth map and an infra-red image in addition to the standard RGB data. Our own model of the artificial neural network is presented in this work. It was trained using the latest machine learning libraries. The results of this work demonstrate the performance of artificial neural networks in the terrain recognition task and give some hints on how to improve classification in the future.

Index Terms—neural networks, robot vision systems, terrain mapping

I. INTRODUCTION

Mobile robots are able to traverse versatile and challenging terrains in both indoor and outdoor conditions. In order to do that, knowledge about the properties (class) of the surface they are travelling on is required. This information can be gathered using pure vision [1]–[3], depth sensing [4], [5] or tactile sensors [6]–[8]. The information obtained by the perception system supports the execution of the robot movement by allowing better tuning of the motion controller parameters.

Most motion control algorithms assume that the robot is moving on solid ground, hence only the shape of the terrain is taken into account. In many cases, the shape of the terrain might be the same but the material, and hence the traction properties, might be different. The type of material directly influences the robot motion, e.g. a hill made of concrete or one made of sand requires different control strategies for the robot.

In this paper, we attempt to classify different terrain types using vision data. The classification process is performed using Deep Neural Networks.

In the beginning, we review the state of the art (Section II). Next, we describe the vision sensor used (Section III-A) and give some details about the dataset (Section III-B). Subsequently, more information on the Deep Neural Network architecture is given (Section III-C). This is followed by the results (Section IV). Finally, concluding remarks and future work plans are given (Section V).

This work was supported by the Poznan University of Technology, Faculty of Electrical Engineering grant No DSPB/0162 in year 2017.

II. RELATED WORK

In our previous works we have focused on terrain classification using a variety of sensors, from pure vision [9], through shape (depth and intensity data) [4], to haptic and multimodal perception [10].

In terms of visual terrain classification, most solutions found in the literature (until 2015) were based on hand-crafted features [2], [3]. Recent approaches are based on the use of Deep Neural Networks. In [1], Deep Neural Network based forest trail following from vision was performed. Friction parameter estimation from vision was provided in the work described in [11], where a CNN was used for material recognition. Deep neural networks were also used for acoustic terrain identification [12].

DNN approaches are mainly used in Computer/Robot Vision. Yet another approach to terrain identification is to use force data, e.g. [13], where a single leg was used in experiments on four kinds of ground types. There are also works describing haptic-based material classification of artificial [14] and natural terrains [15]. Besides pure classification, there are also approaches where prediction of terrain properties is done by matching visual (RGB-D) information with haptic classification [16], [17]. Most of the above-mentioned works were performed in a highly controlled, laboratory environment. To the best of our knowledge, the only feasibility study of performing in-situ terrain modelling from force data for legged systems is presented in [18].

III. MATERIALS AND METHODS

A. Sensor

In our research, we used images recorded with the Microsoft Kinect Sensor v2 for Windows. Using this device one can obtain an RGB image, depth data, and an infra-red image. Notably, the Kinect v2 depth sensor is a Time of Flight camera with good accuracy. The details of the data acquisition process for this sensor, taking into account each modality, are described in [19]. The data for each modality is available as a separate information source, but it can also be combined to obtain a coloured point cloud together with the infra-red data. To achieve that, one needs to obtain the RGB and depth data in the same reference frame, either the RGB camera or the depth camera reference frame. This functionality is provided by the SDK for this sensor.
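As a rough illustration of this registration step, the depth image can be back-projected to 3D and re-projected into the RGB camera to yield a coloured point cloud. The sketch below is a minimal, hypothetical implementation, not the SDK routine used in our work; the intrinsic matrices K_d and K_rgb and the depth-to-RGB extrinsics (R, t) are assumed to come from the sensor calibration and are placeholders here.

```python
# Minimal, hypothetical sketch of registering depth to the RGB frame to obtain
# a coloured point cloud; in practice this step is provided by the Kinect SDK.
# K_d, K_rgb (3x3 intrinsics) and R, t (depth-to-RGB extrinsics) are assumed
# to come from the sensor calibration -- they are placeholders here.
import numpy as np

def depth_to_coloured_cloud(depth, rgb, K_d, K_rgb, R, t):
    """depth: HxW array in metres; rgb: Hr x Wr x 3 colour image."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0  # discard pixels with no depth reading

    # Back-project depth pixels to 3D points in the depth-camera frame.
    x = (u.ravel() - K_d[0, 2]) * z / K_d[0, 0]
    y = (v.ravel() - K_d[1, 2]) * z / K_d[1, 1]
    pts_d = np.stack([x, y, z], axis=1)[valid]

    # Transform into the RGB-camera frame and project onto the colour image.
    pts_rgb = pts_d @ R.T + t
    u_c = np.round(K_rgb[0, 0] * pts_rgb[:, 0] / pts_rgb[:, 2] + K_rgb[0, 2]).astype(int)
    v_c = np.round(K_rgb[1, 1] * pts_rgb[:, 1] / pts_rgb[:, 2] + K_rgb[1, 2]).astype(int)
    inside = (u_c >= 0) & (u_c < rgb.shape[1]) & (v_c >= 0) & (v_c < rgb.shape[0])

    return pts_rgb[inside], rgb[v_c[inside], u_c[inside]]  # Nx3 points, Nx3 colours
```

The same transform applied in the opposite direction would map colour onto the depth frame; either frame can serve as the common reference mentioned above.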
Fig. 5. The accuracy of classification on the test set using RGB data only.

To obtain even better results, we performed learning and testing on the dataset containing the depth and infra-red data as well. The results are shown in Fig. 6. Here the accuracy is no lower than 96.49%, with an average result of 98.99%.

Fig. 6. The accuracy of classification on the test set using RGB, depth and infra-red data.
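Details of the network architecture and its input are given in Section III-C; as a rough illustration of how the additional modalities can enter the classifier, a patch from each modality can simply be stacked along the channel axis. The sketch below is hypothetical: the helper name and the normalisation constants are placeholders, not the values used in our experiments.

```python
# Hypothetical sketch: building a 5-channel input patch (RGB + depth + infra-red).
# The scaling constants are illustrative placeholders, not values used in this work.
import numpy as np

def make_multimodal_patch(rgb_patch, depth_patch, ir_patch):
    rgb = rgb_patch.astype(np.float32) / 255.0        # 3 channels, 8-bit colour
    depth = depth_patch.astype(np.float32) / 4500.0   # 1 channel, depth in mm (placeholder range)
    ir = ir_patch.astype(np.float32) / 65535.0        # 1 channel, 16-bit infra-red intensity
    return np.dstack([rgb, depth[..., None], ir[..., None]])  # H x W x 5
```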
Fig. 7. Loss function for the train and test data while working with the multimodal dataset. Borders between terrains – lines.

Comparing this to our previous work [10], where hand-crafted features were used, the results were 66% for pure vision; when we combined this information with different sensing modalities, the results were 94%.

However, such good results, presented in this paper, were obtained for patch-based classification. To apply this algorithm on a mobile robot, it should be able to perform image segmentation into the different classes. The results of applying the presented network, using a sliding-window approach with the window size equal to the size of a single patch, are shown in Fig. 8b and Fig. 9b. The obtained results can be compared qualitatively to the results obtained with the SegNet network, the approach proposed in [21]. These results are shown in Fig. 8c and Fig. 9c, respectively. As can be observed, because a sliding-window method was used in our approach, we obtain better segmentation when the terrain borders are straight lines, but SegNet outperforms our approach when the borders between terrain classes are curved.
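The sliding-window labelling can be outlined as follows. This is an illustrative sketch rather than the exact code used in this work: classify_patch stands for a call to the trained network, and patch_size and stride are hypothetical parameter names (the stride used in our experiments is not specified here).

```python
# Illustrative sliding-window segmentation: each patch is classified
# independently and its predicted label fills the corresponding window
# of the label map.
import numpy as np

def sliding_window_segmentation(image, classify_patch, patch_size=64, stride=64):
    """image: H x W x C array; classify_patch: callable returning a class id."""
    h, w = image.shape[:2]
    labels = np.full((h, w), -1, dtype=int)  # -1 marks pixels left unclassified
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patch = image[y:y + patch_size, x:x + patch_size]
            labels[y:y + patch_size, x:x + patch_size] = classify_patch(patch)
    return labels
```

Because every window receives a single label, the predicted class borders are composed of axis-aligned rectangles, which is consistent with the observation that this scheme handles straight borders between terrains well, while an end-to-end, per-pixel network such as SegNet copes better with curved borders.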
V. CONCLUSIONS

In our work, we have presented a Deep Neural Network architecture for terrain classification. The results obtained on the dataset recorded in a laboratory setting are almost perfect, reaching almost 99% of correctly recognized terrain classes. We compared our approach to a state-of-the-art architecture for the image segmentation task. It turned out that the sliding-window method influences the separation of the classes in the images; hence, end-to-end systems tend to perform better on images where the borders between classes are curved. As future work, we envision our approach being used on a real robot, where segmentation of the terrain must be performed robustly and in real time.
REFERENCES
[1] A. Giusti, J. Guzzi, D. C. Cireşan, F. L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, and L. M. Gambardella, "A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, July 2016.
[15] M. H. Hoepflinger, C. D. Remy, M. Hutter, and R. Siegwart, “Haptic
Terrain Classification on Natural Terrains for Legged Robots,” in In-
ternational Conference on Climbing and Walking Robots (CLAWAR),
Nagoya, Japan, 2010, pp. 785–792.
[16] M. A. Hoepflinger, M. Hutter, C. Gehring, M. Bloesch, and R. Siegwart,
“Unsupervised identification and prediction of foothold robustness,” in
International Conference on Robotics and Automation (ICRA). IEEE,
2013, pp. 3293–3298.
[17] M. A. Hoepflinger, M. Hutter, C. Gehring, P. Fankhauser, and R. Sieg-
wart, “Haptic Foothold Suitability Identification and Prediction for
Legged Robots,” in International Conference on Climbing and Walking
Robots (CLAWAR), Poznan, PL, 2014, pp. 425–432.
[18] W. Bosworth, J. Whitney, S. Kim, and N. Hogan, “Robot locomotion on
hard and soft ground: Measuring stability and ground properties in-situ,”
in 2016 IEEE International Conference on Robotics and Automation
(ICRA), May 2016, pp. 3582–3589.
[19] J. Sell and P. O'Connor, "The Xbox One system on a chip and Kinect sensor," IEEE Micro, vol. 34, no. 2, pp. 44–53, Mar 2014.
[20] G. Bradski, “The OpenCV Library,” Dr. Dobb’s J. of Software Tools,
2000.
[21] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, Dec 2017.