
Deep Neural Networks for Terrain Recognition Task

Pawel Kozlowski and Krzysztof Walas
Institute of Control, Robotics and Information Engineering
Poznan University of Technology
Poznan, Poland
[email protected], [email protected]

Abstract—This paper focuses on the problem of using artificial deep neural networks in a terrain recognition task based on data from a vision sensor. Information about the terrain class is valuable for mobile robots, as it can improve the performance of their motion control algorithms through the use of information about surface properties. In this work an RGB-D sensor was used to provide vision data, which comprise a depth map and an infra-red image in addition to the standard RGB data. Our own model of an artificial neural network is presented in this work. It was trained using the latest machine learning libraries. The results of this work demonstrate the performance of artificial neural networks in the terrain recognition task and give some hints on how to improve classification in the future.

Index Terms—neural networks, robot vision systems, terrain mapping

I. INTRODUCTION

Mobile robots are able to traverse versatile and challenging terrains in both indoor and outdoor conditions. In order to do that, knowledge about the properties (class) of the surface they are travelling on is required. This information can be gathered using pure vision [1]–[3], depth sensing [4], [5], or tactile sensors [6]–[8]. The information obtained by the perception system supports the execution of the robot's movement by allowing better tuning of the motion controller parameters.

Most motion control algorithms assume that the robot is moving on solid ground, hence only the shape of the terrain is taken into account. In many cases, the shape of the terrain might be the same but the material, and hence the traction properties, might be different. The type of material directly influences the robot's motion; e.g., a hill made of concrete and one made of sand require different control strategies for the robot.

In this paper, we attempt to classify different terrain types using vision data. The classification process is performed using Deep Neural Networks.

In the beginning, we review the state of the art (Section II). Next, we describe the vision sensor used (Section III-A) and give some details about the dataset (Section III-B). Subsequently, more information on the Deep Neural Network architecture is given (Section III-C). This section is followed by the results (Section IV). Finally, concluding remarks and future work plans are given (Section V).

This work was supported by the Poznan University of Technology, Faculty of Electrical Engineering, grant No. DSPB/0162 in the year 2017.

II. RELATED WORK

In our previous works we have focused on terrain classification using a variety of sensors, from pure vision [9], through shape (depth and intensity data) [4], to haptic and multimodal perception [10].

In terms of visual terrain classification, most solutions found in the literature (until 2015) were based on hand-crafted features [2], [3]. Recent approaches are based on the use of Deep Neural Networks. In [1], Deep Neural Network vision-based forest trail following was performed. Friction parameter estimation from vision was provided in the work described in [11], where a CNN was used for material recognition. Deep neural networks were also used for acoustic terrain identification [12].

The DNN approaches are mainly used in Computer/Robot Vision. Yet another approach to terrain identification is to use force data, e.g. [13], where a single leg was used in experiments on four kinds of ground types. There are also works describing haptic-based material classification of artificial [14] and natural terrains [15]. Besides pure classification, there are also approaches where prediction of terrain properties is done by matching visual (RGB-D) information with haptic classification [16], [17]. Most of the above-mentioned works were performed in a highly controlled laboratory environment. To the best of our knowledge, the only feasibility study of performing in-situ terrain modelling from force data for legged systems is presented in [18].

III. MATERIALS AND METHODS

A. Sensor

In our research, we used images recorded with the Microsoft Kinect Sensor v2 for Windows. Using this device one can obtain an RGB image, depth data, and an infra-red image. Notably, the Kinect v2 depth sensor is a Time-of-Flight camera with good accuracy. The details of the data acquisition process for this sensor, taking into account each modality, are described in [19]. The data for each modality is available as a separate information source, but it can also be combined to obtain a coloured point cloud together with the infra-red data. To achieve that, one needs to obtain the RGB and depth data in the same reference frame, either the RGB camera or the depth camera reference frame. This functionality is provided by the SDK for this sensor.
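As an illustration, a minimal sketch of acquiring registered RGB-D frames from the Kinect v2 is given below. It uses the open-source libfreenect2 driver through its Python bindings (pylibfreenect2); this library choice is an assumption on our part, since the paper states only that the registration functionality is provided by the sensor's SDK.

# Minimal sketch, assuming pylibfreenect2; the paper only says the
# sensor SDK provides RGB/depth registration in a common frame.
import numpy as np
from pylibfreenect2 import (Freenect2, SyncMultiFrameListener,
                            FrameType, Registration, Frame)

fn = Freenect2()
assert fn.enumerateDevices() > 0
device = fn.openDevice(fn.getDeviceSerialNumber(0))

listener = SyncMultiFrameListener(FrameType.Color | FrameType.Ir | FrameType.Depth)
device.setColorFrameListener(listener)
device.setIrAndDepthFrameListener(listener)
device.start()

# Registration re-projects the colour image onto the depth camera frame,
# so that RGB, depth and IR data share one reference frame.
registration = Registration(device.getIrCameraParams(),
                            device.getColorCameraParams())
undistorted = Frame(512, 424, 4)  # rectified depth, depth-camera frame
registered = Frame(512, 424, 4)   # colour aligned to the depth frame

frames = listener.waitForNewFrame()
registration.apply(frames["color"], frames["depth"], undistorted, registered)

rgb = registered.asarray(np.uint8)       # aligned RGB, 512x424
depth = undistorted.asarray(np.float32)  # depth in millimetres
ir = frames["ir"].asarray(np.float32)    # infra-red image

listener.release(frames)
device.stop()
device.close()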



Fig. 1. Terrains used in the experiments.

B. Dataset

Using the sensor described in the previous section, the terrain dataset was registered. Namely, images of 9 different terrains, from various view and tilt angles and with different lighting conditions, were recorded. We stored the aligned RGB and depth images together with the IR image, all in the depth camera coordinate frame. The types of terrains are: artificial grass (1), black rubber (2), ceramic tiles (3), chipping (4), gravel (5), green rubber (6), sand (7), rocks (8), and wooden boards (9). The terrain samples are shown in Fig. 1.

Before training the system we performed preprocessing of the images. Using standard image processing methods available in OpenCV [20], applied to an image such as Fig. 2a, we obtained a binary mask for the part of the image where the terrain sample is placed (see Fig. 2b). To ease this process, each terrain sample lies on a homogeneous background. Subsequently, the mask was applied to the original image to obtain the masked image shown in Fig. 2c. As mentioned in Section III-A, data registration is performed in a common reference frame for each modality; therefore, we apply the same mask to the depth and infra-red data.

Fig. 2. Image of an example terrain – rocks. The process of forming a mask for the terrain: the original image of the terrain (a), a binary mask (b), the result of applying the mask to the RGB image (c).

Finally, for each terrain type we performed tiling to obtain patches which could be used in the learning process. The size of each patch is 20x20 pixels. In this way we obtained 4000 patches per terrain. The dataset was divided into a larger training set and a smaller testing set. The tiling of an example terrain is shown in Fig. 3. As the data for each modality are registered in a common reference frame, the same tiling was also performed for the depth and infra-red images.

Fig. 3. Tiling of the example terrain – artificial grass.
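A minimal sketch of this preprocessing step is given below, assuming a simple global threshold separates the sample from the homogeneous background (the paper does not specify which OpenCV operations were used). The same function can be applied to the registered depth and infra-red images, since all modalities share one reference frame.

# Hedged sketch: the exact OpenCV operations and the threshold value
# are assumptions; the paper only says standard methods were used.
import cv2
import numpy as np

def mask_and_tile(image, patch=20, thresh=40):
    """Mask out the background and cut the terrain sample into 20x20 patches."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Binary mask of the region where the terrain sample is placed.
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    masked = cv2.bitwise_and(image, image, mask=mask)

    patches = []
    h, w = mask.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            # Keep only tiles lying fully inside the terrain sample.
            if np.all(mask[y:y + patch, x:x + patch] > 0):
                patches.append(masked[y:y + patch, x:x + patch])
    return masked, patches

# Usage (hypothetical file name):
masked_img, rgb_patches = mask_and_tile(cv2.imread("terrain_rgb.png"))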
C. Algorithm

Based on the registered data, we proposed the Deep Neural Network architecture shown in Fig. 4. Looking from left to right: in the first phase there are two CONV layers, each followed by a RELU layer to provide efficient gradient propagation. At the end of the first phase, POOLING is performed, followed by a 25% dropout, which prevents overfitting to the training data. Phase 2 is exactly the same. Thereafter, we obtain an abstract, learned representation of the data. The output of the second phase is fed to fully connected layers, which perform the classification based on this abstract representation. For this part we selected an architecture with 512 input neurons, which are fed into a RELU layer. Next, a 50% dropout is applied. Finally, the architecture has 9 neurons in the penultimate layer, representing the 9 terrain classes in one-hot encoding. To make the output probabilistic, a SOFTMAX layer was added.

Fig. 4. Scheme of neural network model.
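The description above maps onto the following sketch, written here in Keras (an assumption on our part; the paper says only that the latest machine learning libraries were used). The filter counts are illustrative, as they are not stated in the text; the layer ordering, the dropout rates, the 512-unit fully connected layer, and the 9-way softmax follow the description.

# Sketch of the described architecture; filter counts (32, 64) are assumed.
from tensorflow.keras import layers, models

def build_model(input_shape=(20, 20, 3), num_classes=9):
    model = models.Sequential([
        # Phase 1: two CONV+RELU layers, then POOLING and 25% dropout.
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Phase 2: exactly the same structure.
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Classifier over the learned abstract representation.
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # 9 terrain classes
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

For the multimodal experiments, input_shape would grow to five channels (RGB, depth, IR), e.g. (20, 20, 5); training is then a call such as model.fit(x_train, y_train, validation_data=(x_test, y_test)) with the labels one-hot encoded.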
IV. RESULTS

Using the proposed Deep Neural Network architecture we obtained very promising results, which are presented in the form of a graph in Fig. 5. As can be observed, none of the per-class recognition rates falls below 94.50%. The average result is 98.41%.
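For reference, the per-class recognition rates plotted in Fig. 5 correspond to the following simple computation (our formulation; y_true and y_pred are hypothetical arrays of true and predicted class indices over the test patches):

import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes=9):
    """Recognition rate for each terrain class, plus their average."""
    rates = np.array([np.mean(y_pred[y_true == c] == c)
                      for c in range(num_classes)])
    return rates, float(rates.mean())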

Fig. 5. The accuracy of classification on the test set using RGB data only.

To obtain even better results, we performed learning and testing on the dataset containing the depth and infra-red data as well. The results are shown in Fig. 6. Here the accuracy is no lower than 96.49%, with an average result of 98.99%.

Fig. 6. The accuracy of classification on the test set using RGB, depth and infra-red data.

The results are almost ideal: the accuracy of finding the right terrain class from vision data falls only slightly short of 100%. The changes in the loss function for the training and testing data when working with the multimodal dataset are shown in Fig. 7.

Fig. 7. Loss function for the train and test data, while working with the multimodal dataset. Borders between terrains – lines.

Comparing this to previous work [10], where hand-crafted features were used, the results were 66% for pure vision; when that information was combined with different sensing modalities, the result was 94%.

However, such good results were obtained for patch-based classification. To apply this algorithm on a mobile robot, the robot should be able to perform image segmentation into different classes. The results of applying the presented network, using a sliding window approach with the window size equal to the size of a single patch, are shown in Fig. 8b and Fig. 9b. The obtained results can be compared qualitatively with the results obtained with the SegNet network, the approach proposed in [21]; these are shown in Fig. 8c and Fig. 9c, respectively. As can be observed, because our approach uses a sliding window, we obtain better segmentation when the terrain borders are straight lines, but SegNet outperforms our approach when the borders between terrain classes are curved.

Fig. 8. The accuracy of classification on the test set using RGB, depth and infra-red data. Borders between terrains – curves.

Fig. 9. The accuracy of classification on the test set using RGB, depth and infra-red data.
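A minimal sketch of the sliding-window labelling described above is given below: the trained patch classifier (e.g. the model from Section III-C) is evaluated on every patch-sized window and the predicted class is written into a label map. The stride and the batching of windows are our assumptions, as the paper gives no implementation details.

import numpy as np

def sliding_window_segmentation(model, image, patch=20, stride=20):
    """Label an image by classifying every patch-sized window."""
    h, w = image.shape[:2]
    windows, coords = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            windows.append(image[y:y + patch, x:x + patch])
            coords.append((y, x))
    # One batched forward pass; argmax picks the terrain class per window.
    labels = np.argmax(model.predict(np.asarray(windows), verbose=0), axis=1)

    label_map = np.zeros((h, w), dtype=np.uint8)
    for (y, x), c in zip(coords, labels):
        label_map[y:y + patch, x:x + patch] = c
    return label_map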

V. CONCLUSIONS

In our work, we have presented a Deep Neural Network architecture for terrain classification. The results obtained on the dataset recorded in a laboratory setting are almost perfect, reaching almost 99% of correctly recognized terrain classes. We compared our approach with a state-of-the-art architecture on the image segmentation task. It turned out that the sliding window method influences the separation of classes in the images; hence, end-to-end systems tend to perform better on images where the borders between classes are curved. As future work, we envision our approach being used on a real robot, where segmentation of the terrain must be performed robustly and in real time.
REFERENCES

[1] A. Giusti, J. Guzzi, D. C. Cireşan, F. L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, and L. M. Gambardella, "A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, July 2016.

[2] S. T. Namin, M. Najafi, M. Salzmann, and L. Petersson, "Cutting Edge: Soft Correspondences in Multimodal Scene Parsing," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec 2015, pp. 1188–1196.
[3] P. Filitchkin and K. Byl, "Feature-Based Terrain Classification for LittleDog," in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2012, pp. 1387–1392.
[4] K. Walas and M. Nowicki, "Terrain Classification using Laser Range Finder," in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2014, pp. 5003–5009.
[5] G. Ishigami, M. Otsuki, and T. Kubota, "Lidar-based Terrain Mapping and Navigation for a Planetary Exploration Rover," in Proc. Int. Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), 2012.
[6] C. Kertesz, "Rigidity-Based Surface Recognition for a Domestic Legged Robot," IEEE Robotics and Automation Letters, vol. 1, no. 1, pp. 309–315, Jan 2016.
[7] J. Mrva and J. Faigl, "Feature Extraction for Terrain Classification with Crawling Robots," in Proceedings ITAT 2015: Information Technologies – Applications and Theory, Slovensky Raj, Slovakia, September 17–21, 2015, pp. 179–185.
[8] M. Hoffmann, K. Stepanova, and M. Reinstein, "The Effect of Motor Action and Different Sensory Modalities on Terrain Classification in a Quadruped Robot Running with Multiple Gaits," Robotics and Autonomous Systems, vol. 62, no. 12, pp. 1790–1798, 2014.
[9] K. Walas, "Terrain Classification Using Vision, Depth and Tactile Perception," in RSS Workshop RGB-D: Advanced Reasoning with Depth Cameras, June 2013, archived on the website of the workshop.
[10] ——, "Terrain Classification and Negotiation with a Walking Robot," Journal of Intelligent & Robotic Systems, vol. 78, no. 3–4, pp. 401–423, 2015.
[11] M. Brandão, Y. M. Shiguematsu, K. Hashimoto, and A. Takanishi, "Material recognition CNNs and hierarchical planning for biped robot locomotion on slippery terrain," in 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), Nov 2016, pp. 81–88.
[12] A. Valada, L. Spinello, and W. Burgard, "Deep Feature Learning for Acoustics-based Terrain Classification," in Proceedings of the International Symposium on Robotics Research (ISRR), September 2015.
[13] L. Ding, H. Gao, Z. Deng, J. Song, Y. Liu, G. Liu, and K. Iagnemma, "Foot–Terrain Interaction Mechanics for Legged Robots: Modeling and Experimental Validation," The Int. J. of Robotics Research, vol. 32, no. 13, pp. 1585–1606, 2013.
[14] M. Höpflinger et al., "Haptic Terrain Classification for Legged Robots," in IEEE Int. Conf. on Robotics and Automation (ICRA), 2010, pp. 2828–2833.
[15] M. H. Hoepflinger, C. D. Remy, M. Hutter, and R. Siegwart, "Haptic Terrain Classification on Natural Terrains for Legged Robots," in International Conference on Climbing and Walking Robots (CLAWAR), Nagoya, Japan, 2010, pp. 785–792.
[16] M. A. Hoepflinger, M. Hutter, C. Gehring, M. Bloesch, and R. Siegwart, "Unsupervised identification and prediction of foothold robustness," in International Conference on Robotics and Automation (ICRA). IEEE, 2013, pp. 3293–3298.
[17] M. A. Hoepflinger, M. Hutter, C. Gehring, P. Fankhauser, and R. Siegwart, "Haptic Foothold Suitability Identification and Prediction for Legged Robots," in International Conference on Climbing and Walking Robots (CLAWAR), Poznan, PL, 2014, pp. 425–432.
[18] W. Bosworth, J. Whitney, S. Kim, and N. Hogan, "Robot locomotion on hard and soft ground: Measuring stability and ground properties in-situ," in 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016, pp. 3582–3589.
[19] J. Sell and P. O'Connor, "The Xbox One system on a chip and Kinect sensor," IEEE Micro, vol. 34, no. 2, pp. 44–53, Mar 2014.
[20] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.
[21] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, Dec 2017.

