
PIXHAWK: A System for Autonomous Flight using Onboard Computer Vision

Lorenz Meier, Petri Tanskanen, Friedrich Fraundorfer and Marc Pollefeys

Lorenz Meier is a master student with the Computer Vision and Geometry Lab, ETH Zurich, 8092 Zurich, Switzerland ([email protected]). P. Tanskanen, F. Fraundorfer and M. Pollefeys are with the Computer Vision and Geometry Lab, ETH Zurich, 8092 Zurich, Switzerland ({petri.tanskanen, friedrich.fraundorfer, marc.pollefeys}@inf.ethz.ch).

Abstract— We provide a novel hardware and software system for micro air vehicles (MAV) that allows high-speed, low-latency onboard image processing. It uses up to four cameras in parallel on a miniature rotary wing platform. The MAV navigates based on onboard-processed computer vision in GPS-denied indoor and outdoor environments. It can process images and inertial measurement information from multiple cameras in parallel for multiple purposes (localization, pattern recognition, obstacle avoidance) by distributing the images through a central, low-latency image hub. Furthermore, the system can utilize low-bandwidth radio links for communication and is designed and optimized to scale to swarm use. Experimental results show successful flight with a range of onboard computer vision algorithms, including localization, obstacle avoidance and pattern recognition.

I. INTRODUCTION

Fig. 1. PIXHAWK Cheetah Quadrotor

We present a novel small rotary-wing hardware and software system design capable of autonomous flight using onboard processing for computer vision. Our system (Fig. 1) is a micro air vehicle that can be operated indoors and outdoors in GPS-denied environments. Our key contribution is a lightweight system design pattern which suits micro air vehicle applications better than larger robotic toolkits geared towards ground robotics. The presented software and hardware system is an open-source research platform which enables full onboard processing on a micro air vehicle. In contrast to previous research, it allows the vehicle to be navigated fully autonomously without any radio link or external processing device. The system design allows the use of up to four cameras (for example as two stereo camera pairs) for localization, pattern recognition and obstacle avoidance. Cameras and inertial measurement unit (IMU) are hardware-synchronized and thus allow tight vision-IMU fusion. We show the validity of the system design with real flight results in the final section.

A. Onboard Processing

Current micro air vehicle research systems use either GPS/Inertial Navigation System (INS) navigation on a microcontroller or computer vision and laser ranging with off-board processing for localization and maneuvering. Off-board processing effectively makes the MAV dependent on the external processing unit and severely limits the safety and operation range of the vehicle. In addition, the physical wireless bandwidth limits the number of vehicles which can operate in parallel, so the swarm size is physically limited to very few vehicles. This fundamental limitation can be addressed by onboard computer vision. Up until now, only larger systems (20-100 kg all-up weight) have processed images for vision-based localization onboard. Our system brings the multi-process architecture and onboard processing capabilities from the 20-100 kg range to vehicles of around 1 kg liftoff weight. In contrast to systems using local stabilization approaches on specialized microcontroller hardware (Parrot AR.Drone), the system is geared towards global localization and autonomous exploration of unknown environments using stereo vision. The presented initial results show that at a 30 Hz frame rate our system consumes only 10 % of the maximum CPU load (5 ms processing time per frame) for autonomous marker-based flight and 40-60 % load (20 ms processing time per frame) for stereo-based obstacle avoidance and pattern recognition, which leaves enough capacity for future work, including simultaneous localization and mapping.

B. Time Base for Computer Vision

GPS and, to a large extent, laser-based systems can offer a deterministic processing time to fuse the sensor data into a localization. Computer vision, in contrast, has a varying and often longer processing time that depends on the image content. Therefore the estimation and control steps cannot rely on a fixed interval length ∆t and a fixed processing delay ∆p. Instead they must use the actual timestamp of all measurements to calculate the correct intervals. Therefore all information in our system is timestamped with microsecond resolution. The system guarantees correct time information (stamp-and-predict design pattern) instead of guaranteeing a fixed interval. This also relaxes the timing constraints for the computer vision algorithms, thus allowing more complex approaches which can deal with larger environments.

C. Related Work

Current MAV research uses either GPS/INS navigation on a microcontroller ([7], [6]) or computer vision/laser ranging with offboard processing. Until now, only larger systems processed images onboard. Conte et al. [3] processed visual odometry at 4 Hz on a Yamaha RMAX helicopter UAV with over 20 kg payload (94 kg maximum total weight) and 3.6 m diameter, but did not use the output for flight control. In the MAV domain, previous onboard processing approaches ([4]) used simple optical-flow mouse sensors to locally stabilize the position and to do simple obstacle avoidance. Other approaches used laser scanners and cameras but did not process the data/images onboard. The system of [1] uses an Asctec Pelican quadrotor with a Hokuyo URG line scanner and an onboard Intel Atom processor to collect laser scan data and camera images; this data is sent to an off-board notebook which computes the actual localization. The outdoor system of [7] uses an analog TV camera for object tracking and GPS/INS for position control. The work of Blösch et al. [2] demonstrated visual localization using a camera on a USB tether cable and processing on a notebook. The STARMAC quadrotor [6] has a PC104 form factor onboard computer, but does not utilize it for vision processing due to limited performance.

II. COMPUTER VISION FRAMEWORK

Following the well-known principle of stereo cameras, two images with a known baseline (the distance between the camera centers) are needed to estimate metric distance in 3D. Therefore the PIXHAWK quadrotor carries a 2x2 camera setup forming two stereo pairs, pointed downward and forward with a 5 cm baseline. All four cameras are triggered by the onboard inertial measurement unit. Computer vision allows the extraction of both the 3D geometry of the environment and its texture/appearance. Multiple algorithms are therefore necessary to extract all information from an image, leading to the need to distribute the images to multiple algorithms in multiple processes. Previous approaches ([1], [2]) did not have the possibility to run the camera interface and the image processing pipelines separately. The central image hub distributes the combined images and IMU information to all connected computer vision algorithms, including localization and pattern recognition algorithms.
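The stereo setup described above yields metric depth through the standard relation Z = f · b / d, where f is the focal length in pixels, b the baseline and d the disparity. The following minimal sketch applies this relation for the 5 cm baseline; the focal length and the disparity values are illustrative placeholders, not calibrated parameters of the PIXHAWK cameras.

# Minimal sketch: depth from disparity for a rectified stereo pair.
# The 5 cm baseline is taken from the text; focal length and disparities
# are illustrative placeholders, not calibrated PIXHAWK values.
import numpy as np

BASELINE_M = 0.05   # stereo baseline b (5 cm, as described above)
FOCAL_PX = 470.0    # assumed focal length f in pixels (placeholder)

def depth_from_disparity(disparity_px):
    # Z = f * b / d for every pixel with a valid (positive) disparity.
    disparity_px = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0
    depth[valid] = FOCAL_PX * BASELINE_M / disparity_px[valid]
    return depth

print(depth_from_disparity([47.0, 23.5, 4.7]))   # -> [0.5 1.  5. ] metres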

A. Vision-IMU Combination

As the trigger system supports both monocular and stereo setups, any localization approach that uses one or two cameras can receive the vision-IMU data and process it. The attitude (roll, pitch, yaw speed) is available as part of the image metadata. This allows the localization algorithms to be sped up, either by providing an initial guess of the attitude or, in closed form, as a direct contribution to the localization. In the case of the 3-point algorithm [5], the calculation steps for the triangulation are significantly simplified when parts of the calculation are substituted by IMU roll and pitch, which speeds up RANSAC (see Fig. 2 for the geometric relation). For any non-global vision-based localization approach, IMU information can provide the gravity vector and heading as a global reference. This property is important when local vision information is used as controller input.

Fig. 2. Relation of gravity vector and camera angles

III. AERIAL ROBOTICS MIDDLEWARE

Several toolkits for larger unmanned ground, surface and air vehicles have been widely used in research. Micro air vehicles with significant or complete onboard processing are a rather new development, though. Existing toolkits for ground robotics include ROS, CARMEN and CLARAty. Their communication architecture significantly hinders the adaptation of ground robotic toolkits to small-scale flying platforms: all of these toolkits assume TCP/IP or UDP network links, such as IEEE 802.3 Ethernet and IEEE 802.11a/b/g/n WiFi. However, MAV onboard networks typically include several devices connected via serial links. As these toolkits do not scale down to this link type, every packet has to be transcoded by bridge processes, leading to unnecessary effort and system complexity. Therefore we propose a new communication protocol and architecture that can be used transparently over different hardware links and minimizes the system complexity.

Fig. 3. PIXHAWK Middleware Architecture Layers

The PIXHAWK robotics toolkit is based on a lightweight protocol called MAVLink, which scales from serial to UDP links. It also serves as the communication protocol between the flight computer (pxIMU) and the onboard main computer (pxCOMex/pxOvero). This is also important for the safe operation of any robotic aircraft, as a human operator should always be able to intervene. The typical 30-100 m range of WiFi does not generalize to most outdoor applications, which makes a communication architecture that scales down to radio modems also desirable for the off-board communication. As shown in Figure 3, the PIXHAWK Linux middleware consists of several layers. This architecture allows the use of different base communication layers (ROS and LCM) and provides a convenient high-level programming interface (API) to the image distribution server. MAVLink messages from the IMU and the ground control station can also be received directly in any process.
A. MAVLink Protocol

Our MAVLink protocol is a very lightweight message marshalling protocol optimized for micro air vehicles. It has only 8 bytes of overhead per packet, allows routing on an inter-system or intra-system level and has built-in packet-drop detection. Due to the low overhead, it is suitable for both UDP and UART/radio modem transport layers. The efficient encoding also allows the protocol to be executed on microcontrollers. These properties allowed building a homogeneous communication architecture across the PIXHAWK system. MAVLink has already been adopted by a number of other systems (pxIMU autopilot, ArduPilotMega autopilot, SLUGS autopilot, UDB autopilot). The MAVLink sentences are generated from an XML protocol specification file in the MAVLink format. The code generator ensures well-formed messages and generates C89-compatible C code for message packing and unpacking. This allows fast and safe extensions and changes to the communication protocol and ensures that no implementation errors will occur for new messages. Our current implementation supports the use of the Lightweight Communications and Marshalling library (LCM) or the Robot Operating System (ROS) as transport layers.
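To make the 8 bytes of overhead concrete, the sketch below packs a payload into a serial frame with a six-byte header (start byte, payload length, sequence number, system ID, component ID, message ID) and a two-byte checksum, which mirrors the general layout of early MAVLink framing. The start-byte value and the toy checksum are placeholders and are not taken from the MAVLink specification; a real system should use the generated C89 code mentioned above.

# Illustrative sketch of an 8-byte-overhead serial frame in the spirit of
# MAVLink: 6 header bytes + 2 checksum bytes around the payload. The start
# byte and the checksum below are placeholders, not the real MAVLink CRC.
import struct

START_BYTE = 0xFE  # frame delimiter (placeholder value)

def simple_checksum(data: bytes) -> int:
    # Toy 16-bit checksum standing in for the protocol CRC.
    acc = 0xFFFF
    for b in data:
        acc = ((acc << 1) | (acc >> 15)) & 0xFFFF
        acc ^= b
    return acc

def pack_frame(seq: int, sys_id: int, comp_id: int, msg_id: int,
               payload: bytes) -> bytes:
    header = struct.pack("<BBBBBB", START_BYTE, len(payload), seq,
                         sys_id, comp_id, msg_id)
    body = header[1:] + payload            # checksum covers all but the start byte
    return header + payload + struct.pack("<H", simple_checksum(body))

frame = pack_frame(seq=0, sys_id=1, comp_id=1, msg_id=0, payload=b"\x00" * 9)
assert len(frame) == 9 + 8                 # payload plus 8 bytes of overhead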
IV. VEHICLE DESIGN

The PIXHAWK Cheetah quadrotor design was built from scratch for onboard computer vision. Apart from the commercial-off-the-shelf (COTS) motor controllers and cameras, all electronics and the mechanical frame are our custom design. First the payload, consisting of the pxCOMEx processing module and four PointGrey Firefly MV USB 2.0 cameras, was selected. The system design then followed the requirements of onboard computer vision.

A. Electronics

The onboard electronics consists of an inertial measurement and autopilot unit, pxIMU, and the onboard computer vision processing unit, pxCOMEx.

1) Autopilot Unit: The pxIMU inertial measurement unit/autopilot board (Fig. 4) provides 3D linear acceleration (accelerometer, ±6 g), 3D angular velocity (±500 deg/s), 3D magnetic field (± milligauss), barometric pressure (130-1030 hPa) and temperature. The onboard MCU for sensor readout and sensor fusion as well as position and attitude control is a 60 MHz ARM7 microcontroller. It can be flashed via a USB bootloader and stores settings such as PID parameters in its onboard EEPROM. It provides the required I2C bus to the motor controllers and additional GPIOs, ADC inputs and other peripherals. It is interfaced via UART to the computer vision processing unit and operates at a maximum update rate of 200-500 Hz.

Fig. 4. pxIMU Inertial Measurement Unit

2) Processing Unit: The processing unit is the core piece of the system and consists of a two-board stack. The pxCOMEx base board provides the USB and UART peripherals to interface machine vision cameras, communication equipment and the pxIMU module. It can accept any micro COM Express industry standard module. Currently, a Kontron etxExpress module with an Intel Core 2 Duo at 1.86 GHz and 2 GB DDR3 RAM is used, but future upgrade options include Intel i7 CPUs. It offers 4x UART, 7x USB 2.0 and 1x S-ATA 2.0 peripheral options. The typical onboard setup consists of 4x PointGrey Firefly MV monochrome cameras, 1x USB 2.0 802.11n WiFi adapter and 1x S-ATA 128 GB SSD with more than 100 MB/s write speed. The pxIMU unit, the GPS module and the XBee radio modem are connected via UART to the processing unit. With a weight of 230 g including cooling and only 27 W peak power consumption, the processing unit can be easily lifted by a wide range of aerial systems, not limited to the quadrotor presented here.

B. Mechanical Structure and Flight Time

Our custom mechanical design effectively protects the onboard processing module in case of a system crash, and the fixed mounting of the four cameras allows inter-camera and camera-IMU calibration. As the processing board and the four cameras represent a relatively large payload of 400 g for the small diameter of 0.55 m (0.70 m for the larger version) of the quadrotor, the overall system structure has been optimized for low weight. It consists of lightweight carbon sandwich material with carbon fiber base plates and an inner honeycomb layer made of Kevlar. Each of the four motors with 8” propellers contributes a maximum of 452 g thrust, enabling the system to lift the 400 g payload at a total system weight of 1.00-1.20 kg, including battery. This allows a continuous flight time of 7-9 minutes with 8” propellers and 14-16 minutes with 12” propellers. The propulsion consumes 150-180 W for hovering, while the high-speed onboard computer consumes only 27 W peak. Therefore the flight time is governed by the weight of the system.
V. LOCALIZATION AND FLIGHT CONTROL PIPELINE

The localization and flight control pipeline is only one of several onboard pipelines. As the PIXHAWK middleware provides a precise time base, a standard textbook estimation and control pipeline already performs well for autonomous flight. The overall pipeline, including camera interfacing and communication, consumes only 10-15 % of the total CPU power. Other implemented pipelines are stereo obstacle avoidance and planar pattern recognition. Individual pipelines can be activated and deactivated at runtime, and individual pipeline components can be replaced by different algorithms without changes to the overall system. Fig. 5 illustrates the data processing and information flow from image capture to motor control output.

Fig. 5. Vision-guided Control Pipeline
A. Vision-IMU Synchronization

As the vision-IMU fusion depends on measurements from a known time base, the image capture is controlled by the inertial measurement unit using a shutter signal. The IMU also delivers the current roll, pitch and yaw estimate at the time of image capture, together with the shutter timestamp, to the vision pipeline. Images are transmitted over USB to the camera process, while the IMU measurements and the shutter timestamp are delivered through a serial interface. Image transmission from camera to main memory via USB takes approximately 16 ms, while the transmission of the shutter timestamp from the IMU to main memory via serial/MAVLink takes approximately 1-3 ms. As it is guaranteed that the IMU data arrives earlier than the image, the camera process can always deliver the full vision-IMU dataset at once to the localization process.
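Because the IMU metadata is guaranteed to arrive before the corresponding image, the camera process only has to look up the queued shutter record whose timestamp matches the incoming frame. The sketch below illustrates this pairing; the field names and the matching tolerance are assumptions for illustration, not the actual PIXHAWK camera process.

# Sketch of pairing a captured image with the IMU shutter metadata that
# arrived earlier over the serial link. Field names and the matching
# tolerance are illustrative assumptions, not the PIXHAWK implementation.
from collections import deque

class VisionImuPairing:
    def __init__(self, tolerance_us=2000):
        self.pending_imu = deque()       # entries: (timestamp_us, roll, pitch, yaw)
        self.tolerance_us = tolerance_us

    def on_imu(self, timestamp_us, roll, pitch, yaw):
        self.pending_imu.append((timestamp_us, roll, pitch, yaw))

    def on_image(self, timestamp_us, image):
        # IMU data is guaranteed to arrive before the image, so the matching
        # entry is already queued; stale entries are dropped while searching.
        while self.pending_imu:
            imu = self.pending_imu.popleft()
            if abs(imu[0] - timestamp_us) <= self.tolerance_us:
                return {"image": image, "stamp_us": timestamp_us,
                        "roll": imu[1], "pitch": imu[2], "yaw": imu[3]}
        return None  # no metadata yet: should not happen given the ordering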
B. Vision based Localization

The initial flight tests were conducted using a marker-based approach with an adapted implementation of ARToolkit+ [8] for the localization. The marker positions are encoded in a global world map with the 6D position and attitude of each marker. By extracting the marker quadrangle, the global marker position can be estimated by calculating the homography on the four corner points of the quadrangle. The correct orientation on the quadrangle plane and the marker ID are encoded by a 2D binary code inside each marker. An example of a larger marker setup is shown in Fig. 6. However, the system itself does not depend on this particular approach – any kind of localization algorithm can be used.

Fig. 6. Figure-8 setup at IMAV 2010 competition
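As an illustration of the homography step, the sketch below recovers a marker pose from its four detected corner points: a DLT homography is decomposed with the camera intrinsics into two rotation columns and the translation. This is a generic planar pose estimation sketch, not the ARToolkit+ implementation; the intrinsic matrix K, the marker size and the corner ordering are assumed inputs, and the sign ambiguity of the decomposition is not handled.

# Sketch of recovering a marker pose from its four image corners via a
# homography. K and the marker size are placeholders; a production system
# would also resolve the sign ambiguity and refine the pose.
import numpy as np

def homography_dlt(obj_pts, img_pts):
    # Direct linear transform from four (or more) planar correspondences.
    rows = []
    for (X, Y), (u, v) in zip(obj_pts, img_pts):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def marker_pose(corners_px, marker_size, K):
    s = marker_size / 2.0
    obj = [(-s, -s), (s, -s), (s, s), (-s, s)]       # marker plane, Z = 0
    H = homography_dlt(obj, corners_px)
    M = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(M[:, 0])
    r1, r2, t = lam * M[:, 0], lam * M[:, 1], lam * M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    u, _, vt = np.linalg.svd(R)                      # re-orthonormalize R
    return u @ vt, t                                 # rotation, translation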

C. Outlier Removal

The data is filtered with a Kalman filter in the next step, which implies that the filter is parametrized with the error model of the computer vision approach. As IMU and vision both estimate the 3-degree-of-freedom attitude of the helicopter, this redundant data can be used to detect and remove position outliers produced by the localization step. Any erroneous vision measurement will not only contain a wrong position estimate, but also a wrong attitude estimate, because of the projective dependency of position and attitude. Position outliers can therefore be rejected based on a comparison of the roll and pitch estimates from the IMU and from the visual localization.
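A minimal sketch of such a consistency gate is given below; the 10-degree threshold is an illustrative value, not the tuned parameter of the onboard implementation.

# Sketch of the roll/pitch consistency gate: a vision pose whose attitude
# disagrees with the IMU beyond a threshold is treated as a position outlier.
import math

MAX_ATTITUDE_ERROR_RAD = math.radians(10.0)   # illustrative threshold

def accept_vision_measurement(vision_roll, vision_pitch, imu_roll, imu_pitch):
    return (abs(vision_roll - imu_roll) < MAX_ATTITUDE_ERROR_RAD and
            abs(vision_pitch - imu_pitch) < MAX_ATTITUDE_ERROR_RAD)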
D. Discrete Kalman Filtering

The remaining data conforms more closely to the normal distribution and allows the use of a simple discrete Kalman filter. As the dynamics of a quadrotor are only loosely coupled in the x, y and z directions, the dynamics can be modelled as three independent dimensions. As the yaw angle is of much better accuracy and resolution in indoor settings from computer vision than from a magnetometer (due to iron structures in the building), the yaw angle is taken as the fourth independent dimension for filtering. Given this quadrotor dynamic model, the Kalman filter is designed as a block of four 1D Kalman filters with position and speed as states. The Kalman filter assumes a constant speed model and takes position as input. The estimated velocity is critical to damp the system, as the only physical damping is the air resistance in the horizontal plane, which is not significant at the typical hovering or low-speed conditions. The states of the four Kalman filters are

xk = [x ẋ]ᵀ,  yk = [y ẏ]ᵀ,  zk = [z ż]ᵀ,  ψk = [ψ ψ̇]ᵀ.

We try to estimate the current state of the vehicle xk, which is modeled by

xk = A · xk−1 + wk−1,

where the dynamics matrix A models the law of motion, xk−1 is the previous state and wk−1 the process noise. This motion is measured at certain time steps, where the measurements are expressed as the gain H times the current state plus the measurement noise vk:

zk = H · xk + vk.

The speed in the model will therefore only be changed by measurements and then again be assumed constant during prediction. From this formulation it is already obvious that varying time steps can be handled by the filter as long as they are precisely measured. As this filter does not include the control input matrix B, it assumes a constant speed model, which is a valid approximation if the filter update frequency is fast enough with respect to the change of speed of the physical object. Because the PIXHAWK system provides a precise time base, the filter uses the measured inter-frame interval as time difference input ∆t. It uses the standard Kalman prediction-update scheme. If measurements are rejected as outliers, the filter only predicts for this iteration and compensates in the next update step for the then longer time interval. This allows the system to estimate its egomotion for several seconds without new vision measurements.
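One of the four identical filters can be sketched as follows. The prediction step takes the measured inter-frame interval ∆t as described above; the process and measurement noise levels are placeholders rather than the values used onboard.

# Minimal sketch of one of the four independent 1D filters: constant-speed
# model, position-only measurement, prediction driven by the measured ∆t.
import numpy as np

class ConstantSpeedKalman1D:
    def __init__(self, q=1.0, r=0.01):
        self.x = np.zeros(2)             # state: [position, speed]
        self.P = np.eye(2)
        self.q, self.r = q, r            # process / measurement noise levels
        self.H = np.array([[1.0, 0.0]])  # only position is measured

    def predict(self, dt):
        A = np.array([[1.0, dt], [0.0, 1.0]])
        Q = self.q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
        self.x = A @ self.x
        self.P = A @ self.P @ A.T + Q

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.r
        K = self.P @ self.H.T / S
        self.x = self.x + (K * (z - self.H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

One instance each runs for x, y, z and yaw. When the outlier gate rejects a measurement, only the prediction step is executed for that frame and the next update spans the correspondingly longer interval.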

E. Position and Attitude Control


As already pointed out, the x, y, z and yaw motions can be modeled as independent. Thus, it is possible to control the quadrotor's x- and y-position with the angle of attack of the collective thrust vector by setting the desired pitch angle for x and the desired roll angle for y. The z-position can be controlled with the component of the collective thrust collinear to the gravity vector. The yaw angle, finally, can be controlled by the difference in rotor drag between the clockwise (CW) and counter-clockwise (CCW) rotor pairs. As the previous step contributed a smoothed position and speed estimate with little phase delay, no model-driven/optimal control is needed to account for the missing direct speed measurement. The controller can thus be designed as a standard PID controller, implemented as four independent SISO PID controllers for x, y, z and yaw. Attitude control was implemented following the standard PID-based attitude control approach for quadrotors, using one PID controller each for roll, pitch and yaw. The craft is actuated by directly mixing the attitude control output onto the four motors.
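A rough sketch of this outer position loop is given below: one SISO PID per axis, with the x and y outputs interpreted as desired pitch and roll angles and the z output as a collective thrust correction. Gains, limits and sign conventions are illustrative assumptions, not the tuned onboard parameters.

# Sketch of the outer-loop position control: one SISO PID per axis, with the
# x/y outputs handed to the attitude controller as pitch/roll setpoints.
class PID:
    def __init__(self, kp, ki, kd, limit):
        self.kp, self.ki, self.kd, self.limit = kp, ki, kd, limit
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.limit, min(self.limit, out))   # clamp the command

pid_x, pid_y = PID(0.3, 0.0, 0.1, 0.4), PID(0.3, 0.0, 0.1, 0.4)
pid_z, pid_yaw = PID(0.5, 0.1, 0.2, 0.6), PID(1.0, 0.0, 0.2, 1.0)

def position_control(setpoint, state, dt):
    # setpoint and state are (x, y, z, yaw); yaw wrap-around is ignored here.
    pitch_sp = pid_x.step(setpoint[0] - state[0], dt)       # x error -> pitch command
    roll_sp = pid_y.step(setpoint[1] - state[1], dt)        # y error -> roll command
    thrust_offset = pid_z.step(setpoint[2] - state[2], dt)  # altitude correction
    yaw_cmd = pid_yaw.step(setpoint[3] - state[3], dt)      # passed to the yaw attitude loop
    return pitch_sp, roll_sp, thrust_offset, yaw_cmd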
VI. OPERATOR CONTROL UNIT

The design paradigm presented in this paper shows a clear separation between the onboard autonomy system and the off-board operator control. As the MAV is no longer a remote sensor head, no sensor data needs to be transmitted to the ground control station for autonomous flight. It is therefore desirable to reduce the communication and operator load and to send only abstract system information such as remaining fuel/battery and position. The QGroundControl application can represent multiple vehicles; Fig. 7 shows its aerial map view.

Fig. 7. QGroundControl Operator View with map and detected patterns
VII. EXPERIMENTAL RESULTS / FLIGHT

Fig. 8 shows flight results using artificial-marker-based localization. The plot shows a flight around a rectangular path. The two perpendicular movements are autonomous takeoff and landing. The right vertical trajectory is the open-loop liftoff: as initially no computer-vision-based localization is available, the helicopter runs an open-loop maneuver until it has 0.5 m altitude. The left vertical path is the autonomous landing after a short period of hovering on the spot. The plot shows a section of the Figure-8 setup of the International Micro Air Vehicle Competition 2010, which is also depicted in Fig. 6. A video of a similar flight can be viewed at http://pixhawk.ethz.ch/videos/.

Fig. 8. Trajectory of an autonomous flight using ARToolkit localization including takeoff and landing

As our system is modular, we can also easily replace the ARToolkit-based localization with other methods for position control. Fig. 9 shows a trajectory resulting from a flight using a Vicon motion capture system. The localization is very precise (< 1 mm error) at a maximum rate of 250 Hz. We use a low-latency wireless link to transfer the current position computed by the motion capture system to the helicopter. The flight trajectory contains autonomous takeoff and landing, as well as several yaw rotations and an obstacle avoidance reaction (the overlapping parts of the trajectory at the top of the figure). The onboard view of the helicopter during this flight is shown in Fig. 10. This illustrates the flexibility and scalability of the system, as these experimental results include onboard stereo depth estimation, dynamic obstacle avoidance and pattern recognition operating in parallel on the onboard computer. Although sensors delivering a depth map without high processing burden are currently emerging, such as the PrimeSense Kinect sensor, for outdoor applications stereo still remains the best option to obtain a depth map on a micro air vehicle. The significantly higher processing performance of the presented platform is therefore a key differentiator with respect to the previous state of the art.

Fig. 9. Trajectory of an autonomous flight using Vicon localization including takeoff and landing

Fig. 10. QGroundControl HUD View with live image streaming from the helicopter. The live view shows detected faces and patterns by the helicopter. In the top right corner the computed depth map from the stereo camera setup is shown.

VIII. CONCLUSIONS AND FUTURE WORKS

The open-source system presented in this paper provides a new onboard and offboard architecture for computer vision based flight with tight IMU integration. As the PIXHAWK system is built for onboard computer vision, future work will focus on natural-feature-based localization and mapping for autonomous flight. Other future improvements are the optimization of the onboard position and attitude control and the extension of the current waypoint-based scheme to trajectory control. The current system load for artificial-feature-based localization is 10 % of the maximum CPU capacity. Higher-level approaches such as stereo obstacle avoidance and pattern recognition, however, increase the load significantly, into the 40-60 % range if run in parallel. Because of the choice of an industry-standard computing platform, the presented system will scale with future increases in processing performance and always roughly deliver the processing speed of a medium-level notebook computer. Hence enough capacity for extensions is available onboard. As the system design provides a precise common time base, IMU-GPS-vision fusion will be a future extension for outdoor navigation. On a multi-system level, the lightweight MAVLink protocol provides an ideal basis for future swarm extensions. As all processing is onboard, the number of vehicles is not limited by the communication bandwidth.

IX. ACKNOWLEDGMENTS

We would like to thank our students (in alphabetical order) Bastian Bücheler, Andi Cortinovis, Fabian Landau, Laurens Mackay, Tobias Nägeli, Philippe Petit, Martin Rutschmann, Amirehsan Sarabadani, Christian Schluchter and Oliver Scheuss for their contributions to the current system, and the students of the previous semesters for the foundations they provided. Raffaello d'Andrea and Sergei Lupashin (ETH IDSC) provided constant and valuable feedback. This work was supported in part by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant #231855 (sFly) and by the Swiss National Science Foundation (SNF) under grant #200021-125017.

REFERENCES

[1] M. Achtelik, A. Bachrach, R. He, and S. Prentice. Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments. Proceedings of the SPIE Unmanned Systems Technology XI, 7332, 733219, Jan 2009.
[2] M. Blösch, S. Weiss, D. Scaramuzza, and R. Siegwart. Vision based MAV navigation in unknown and unstructured environments. Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2010.
[3] G. Conte and P. Doherty. An integrated UAV navigation system based on aerial image matching. Proceedings of the IEEE Aerospace Conference, Jan 2008.
[4] S. G. Fowers, Dah-Jye Lee, B. J. Tippetts, K. D. Lillywhite, A. W. Dennis, and J. K. Archibald. Vision aided stabilization and the development of a quad-rotor micro UAV. International Symposium on Computational Intelligence in Robotics and Automation (CIRA 2007), pages 143-148, 2007.
[5] F. Fraundorfer, P. Tanskanen, and M. Pollefeys. A minimal case solution to the calibrated relative pose problem for the case of two known orientation angles. In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision - ECCV 2010, volume 6314, pages 269-282. Springer Berlin / Heidelberg, 2010.
[6] G. Hoffmann, D. Rajnarayan, and S. Waslander. The Stanford testbed of autonomous rotorcraft for multi agent control (STARMAC). Proceedings of the Digital Avionics Systems Conference (DASC 04).
[7] N. Roy, R. He, A. Bachrach, and M. Achtelik. On the design and use of a micro air vehicle to track and avoid adversaries. International Journal of Robotics Research, Jan 2010.
[8] D. Wagner and D. Schmalstieg. ARToolKitPlus for pose tracking on mobile devices. Proceedings of the 12th Computer Vision Winter Workshop, Jan 2007.
Higher-level approaches such as stereo obstacle avoidance
