Learning Driven Coarse-to-Fine Articulated Robot Tracking
Christian Rauch¹, Vladimir Ivan¹, Timothy Hospedales¹, Jamie Shotton², Maurice Fallon³
[Fig. 4 plot: probability over joint position, −π to +π, for prediction (orange) and samples (green).]
Fig. 4. Example distribution (orange) and samples (green) of a lower arm joint position, predicted and sampled from an image of the occluded bottle sequence (Figure 8). The distribution shows two strong modes at 1.2 rad and −1.8 rad since the link has a similar visual appearance at half rotations.
[Network architecture diagram: a 640×480 input image is encoded by a ResNet-34 and upscaled through transposed-convolution (ConvT) stages (3×3×256, 3×3×256, 3×3×128, 3×3×128, 3×3×23) to 320×240 outputs: keypoint heatmaps, (x, y, z) keypoint coordinates, and discretised joint positions.]
The discretised joint positions are serialised into a single vector R^{900×1} (15 × 60) and then reshaped into a matrix R^{15×60}. After prediction we can treat the scores of each joint as a probability distribution over its discretised positions.
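The reshaping-and-sampling step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the random scores, the softmax normalisation, and the bin-to-angle mapping over [−π, π) are assumptions for the sake of the example (the paper states only the 900 = 15 × 60 serialisation and that per-joint scores are treated as a distribution, as in Fig. 4).

```python
import numpy as np

# Assumed setup: the network emits one serialised score vector of length
# 900 = 15 joints x 60 discretised position bins (random stand-in here).
rng = np.random.default_rng(0)
scores = rng.normal(size=(900, 1))    # serialised prediction, R^{900x1}
per_joint = scores.reshape(15, 60)    # one row of 60 bin scores per joint

# Treat each joint's row of scores as a categorical distribution
# (softmax per row, shifted by the row max for numerical stability).
logits = per_joint - per_joint.max(axis=1, keepdims=True)
probs = np.exp(logits)
probs /= probs.sum(axis=1, keepdims=True)

# Map bin indices to joint angles in [-pi, pi) and draw one sample per
# joint, analogous to the green samples in Fig. 4.
bin_centres = -np.pi + (np.arange(60) + 0.5) * (2 * np.pi / 60)
samples = np.array([rng.choice(bin_centres, p=p) for p in probs])
print(samples.shape)  # (15,)
```

Sampling per joint (rather than taking the argmax) preserves multi-modal predictions such as the two modes in Fig. 4, where visually similar half-rotations are ambiguous from a single image.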
[Figs. 6 and 8 plots: forearm and palm position error [m] over time [s], comparing "Keypoints" against "Keypoints + Edges".]
Fig. 6. Grasping box. Using the additional edge objective reduces the average position error from 5 cm to 3.1 cm (forearm) and from 3.7 cm to 2.7 cm (palm).
Fig. 8. Occluded bottle. Using the additional edge objective reduces the average forearm position error from 3.1 cm to 2.1 cm but increases the palm position error from 2.6 cm to 2.8 cm.