Revisiting The Paper Helicopter Project Using An Adaptive Surrogate-Based Approach
Abstract
The paper helicopter project has been given to graduate students of the French engineering school
Ecole des Mines de Saint Etienne as practical work for two lecture courses: design of experiments and
global optimization. The project consists of designing a very simplified paper helicopter for maximum
flying time. Its main difficulty lies in the very high level of uncertainty, as flying times for the same
design can vary substantially due to experimental conditions or manufacturing imprecision.
The students were asked to combine design of experiments and surrogate modeling techniques (in
order to deal with noise) with global optimization strategies. Despite its apparent simplicity, the
project was found quite challenging, and the strategies tried by the students were eventually
close to what a real-case problem would require.
1 Problem description
The paper helicopter problem was initially proposed by Box (1992) to introduce design of
experiments and regression concepts. Since then, it has been reused and modified many times (see for
instance Antony and Jiju Antony, 2001; Annis, 2005; Viana et al, 2011), as it offers many pedagogical
and practical qualities.
We present here a variant of the problem, as it has been given to graduate students of the French
engineering school Ecole des Mines de Saint Etienne as practical work for two lecture courses: design of
experiments and global optimization.
The helicopter is composed of two superimposed rectangles (a slightly simpler version than the original
of Box, 1992); the upper rectangle is cut in two and folded to constitute the rotor and the lower rectangle
constitutes the tail. The rotor wings make a right angle with the tail. For stability, two paper clips are
attached at the bottom of the tail (Figure 1). The objective is to design the helicopter for maximum
flying time.
The helicopter must be cut from an A4 sheet without gluing; hence, we have the following constraints:
th + wh ≤ 29.7 cm
2ww ≤ 21 cm
The helicopter should hold two large paper clips (3 × 1 cm) placed vertically next to each other, which
gives a minimal size for the tail:
tw ≥ 2 × 1 cm
th ≥ 3 cm
To ease the design of experiments process in particular, we decided to simplify those constraints to
interval constraints on the design variables:
3.0 ≤ wh ≤ 13.0 cm
1.0 ≤ ww ≤ 7.0 cm
3.5 ≤ th ≤ 10.0 cm
2.2 ≤ tw ≤ 6.0 cm
The flying time T is measured by dropping the helicopters from a fixed height, as large as possible
to minimize timing error (typically, from a height of two storeys), preferably in a wind-free environment (a
stairwell). This time is a random variable, due mainly to three factors: the experimental conditions
(wind, throwing gesture, ...), the measurement error and the manufacturing uncertainties (in particular, the
folding angle between the tail and the wings). Hence, we choose as an objective the maximization of the
expected flying time T̄, which is estimated by the average time of five throws.
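The optimization problem is then expressed as follows:
$$
\begin{aligned}
\max_{t_w,\, t_h,\, w_w,\, w_h} \quad & \bar{T} \\
\text{s.t.} \quad & 3.0 \le w_h \le 13.0 \text{ cm} \\
& 1.0 \le w_w \le 7.0 \text{ cm} \\
& 3.5 \le t_h \le 10.0 \text{ cm} \\
& 2.2 \le t_w \le 6.0 \text{ cm}
\end{aligned}
$$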
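As a minimal sketch of how this formulation might be encoded, the Python fragment below stores the interval bounds and averages five throws to estimate the expected flying time. The function names, the example design and the timings are hypothetical and purely illustrative; only the bounds and the five-throw average come from the text.

```python
from statistics import mean

# Interval bounds (cm) on the four design variables, copied from the problem statement.
BOUNDS = {"wh": (3.0, 13.0), "ww": (1.0, 7.0), "th": (3.5, 10.0), "tw": (2.2, 6.0)}

def in_bounds(design):
    """Return True if every design variable lies within its interval."""
    return all(lo <= design[name] <= hi for name, (lo, hi) in BOUNDS.items())

def mean_flying_time(throws):
    """Estimate the expected flying time (s) as the average of five throws."""
    if len(throws) != 5:
        raise ValueError("expected exactly five throws")
    return mean(throws)

# Hypothetical design and made-up timings, for illustration only.
design = {"wh": 10.0, "ww": 4.0, "th": 5.0, "tw": 3.0}
times = [3.1, 2.9, 3.4, 1.8, 3.0]
```

Note that any design inside these intervals also satisfies the original A4 constraints, since th + wh ≤ 10.0 + 13.0 = 23.0 ≤ 29.7 cm and 2ww ≤ 14.0 ≤ 21 cm.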
2 Methodology
2.1 General scheme
The overall objective of the project is the global optimization of the paper helicopter based on design of
experiments and metamodeling techniques. As the class is divided in groups, each group will experiment a
different strategy, all the results being eventually compared. However, all the strategies use the following
general scheme:
1. construction of an initial design of experiments (DOE) (50 designs);
2. construction of a metamodel based on the initial set of observations;
Steps 1, 2 and 4 are standard in surrogate-based optimization, while step 3 is more original. It
consists of adding a set of observations that are chosen based on the initial information, in particular in
order to improve the quality of prediction of the metamodel. At that stage, the design space can also
be reduced if some regions are clearly identified as non-optimal. It would be possible to suppress this
step and transfer the experimental effort (20 observations) to either step 1 or step 4; however, this step
is motivated by several reasons:
• it is more efficient than relying on step 1 alone, as part of the observations are model-oriented and
lie within a smaller region;
• step 4 requires performing experiments one by one, which is very time-consuming, whereas they are
done all at once in step 3;
• it is also much more robust than step 4 when the number of observations is too small to fit an
accurate model.
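Step 1 of the general scheme (an initial DOE of 50 designs) could be sketched as follows. The text does not state which type of initial design was used, so the Latin hypercube sampling below is an assumption chosen for illustration, and the function name `latin_hypercube` is hypothetical.

```python
import random

# Bounds in cm, in the order (tw, th, ww, wh), taken from the problem statement.
BOUNDS = [(2.2, 6.0), (3.5, 10.0), (1.0, 7.0), (3.0, 13.0)]

def latin_hypercube(n_points, bounds, seed=0):
    """Draw n_points designs so that each variable has one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        # One value per stratum [i/n, (i+1)/n), at a random position inside it.
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(strata)  # decouple the strata of the different variables
        columns.append([lo + u * (hi - lo) for u in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

doe = latin_hypercube(50, BOUNDS)  # initial DOE of 50 designs
```

Each of the 50 designs would then be built and thrown five times before fitting the metamodel in step 2.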
Criteria The initial DOEs are improved by adding a set of 20 observations using model-oriented
criteria, that is, the new observations are added to improve the quality of the initial model. For PRS,
the D-criterion (determinant of the Fisher information matrix, see John and Draper (1975)) is used.
For kriging, the IMSE criterion is used (see Sacks et al (1989)). In both cases, the criterion is of the
form:
$$
C(x_{n+1}, \ldots, x_{n+p}) = \phi(x_1, \ldots, x_{n+p}), \qquad (1)
$$
where $x = (t_w, t_h, w_w, w_h)$ denotes a design, $x_1, \ldots, x_n$ is the current DOE, $x_{n+1}, \ldots, x_{n+p}$ is a set of
candidate additional observations, and $\phi$ is a scalar function defined by the model. Calculation details
are given in Section 4.
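For the PRS case, the D-criterion can be sketched as the determinant of the information matrix of the augmented design (current designs plus candidates). The first-order basis below (intercept plus the four variables) is an assumption made to keep the example short; the students' actual polynomial basis is not specified here, and all function names are hypothetical.

```python
def basis(x):
    """Hypothetical first-order basis: intercept plus the four design variables."""
    return [1.0, *x]

def information_matrix(designs):
    """Return M = F^T F, where row i of F is basis(x_i)."""
    F = [basis(x) for x in designs]
    p = len(F[0])
    return [[sum(row[i] * row[j] for row in F) for j in range(p)] for i in range(p)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0  # numerically singular information matrix
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det

def d_criterion(current, candidates):
    """D-criterion of the DOE obtained by adding the candidates to the current designs."""
    return determinant(information_matrix(list(current) + list(candidates)))
```

The criterion is to be maximized over the candidate designs; adding any design can only leave the determinant unchanged or increase it.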
Improving the DOE amounts to solving the following optimization problem:
Optimization of the criteria Solving this problem is challenging, as it can be of high dimension
(4 × p), and the objective is strongly multimodal (it is for instance invariant under a permutation of any
$x_{n+i}$, $x_{n+j}$). Therefore, it constitutes an excellent application for global optimization algorithms.
The students first implemented different strategies: Nelder-Mead, CMA-ES, DIRECT, etc.
These were then tested on two analytical functions (a quadratic function and Michalewicz's function)
before being applied to the DOE problem.
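The two analytical test functions might look as follows. The exact quadratic used in class is not given, so the centred sum of squares is an assumption; Michalewicz's function is written in its usual minimization form with steepness m = 10. The uniform random search is only a baseline stand-in for comparison, not one of the algorithms the students implemented.

```python
import math
import random

def quadratic(x, center=(1.0, 1.0)):
    """Convex test function with a single minimum (value 0) at `center`."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center))

def michalewicz(x, m=10):
    """Multimodal test function, usually minimized over [0, pi]^d."""
    return -sum(
        math.sin(xi) * math.sin((i + 1) * xi ** 2 / math.pi) ** (2 * m)
        for i, xi in enumerate(x)
    )

def random_search(f, dim, n_evals=2000, seed=0):
    """Baseline optimizer: best of n_evals uniform samples in [0, pi]^dim."""
    rng = random.Random(seed)
    best_x, best_f = None, math.inf
    for _ in range(n_evals):
        x = [rng.uniform(0.0, math.pi) for _ in range(dim)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

In two dimensions, the global minimum of Michalewicz's function is approximately -1.8013, near (2.20, 1.57), which gives a reference value against which any optimizer can be checked.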
Kriging Unlike PRS, kriging is well suited to sequential design, and the final step consists of
several iterations of the Efficient Global Optimization (EGO) algorithm (see Jones et al (1998)). Similarly
to the PRS case, this strategy amounts to finding the maximizer of a function defined by the kriging mean
and variance, called the Expected Improvement and defined as:
$$
EI_n(x) = (m_n(x) - y_{\max})\,\Phi\!\left(\frac{m_n(x) - y_{\max}}{s_n(x)}\right) + s_n(x)\,\phi\!\left(\frac{m_n(x) - y_{\max}}{s_n(x)}\right), \qquad (3)
$$
where $\Phi$ and $\phi$ denote the Gaussian cumulative distribution function and probability density, respectively,
$m_n$ and $s_n$ denote the kriging mean and standard deviation, and $y_{\max}$ is the maximum of the current set of
observations.
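A minimal sketch of the Expected Improvement for a maximization problem is given below. It uses the standard closed form, in which the density term enters with a plus sign, and takes the kriging mean and standard deviation at a point as plain numbers; the function names are illustrative and no particular kriging library is assumed.

```python
import math

def normal_pdf(z):
    """Standard Gaussian probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, y_max):
    """Expected improvement over y_max for a Gaussian prediction N(mean, std^2)."""
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mean - y_max, 0.0)
    z = (mean - y_max) / std
    return (mean - y_max) * normal_cdf(z) + std * normal_pdf(z)
```

For instance, when the predicted mean equals the current maximum and the standard deviation is 1, the value reduces to the density at zero, about 0.399; the criterion is always non-negative and grows with either a higher mean or a larger uncertainty.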
As the observations are noisy, the original EGO strategy should not be applied, so a heuristic
modification (called the reinterpolation procedure) is used instead (see Forrester et al (2006)).
In both cases, as doing experiments one at a time is particularly time-consuming, only a few designs
are tested.
3 Discussion
Scientific challenges and potential advanced topics In our setting, a relatively large proportion
of the designs in a space-filling DOE (say, 20 to 30%) experience unstable flights, which result in much
shorter recorded times. These instabilities are not systematic, and often occur two or three times out of five.
A first consequence is that the noise is largely heteroskedastic, which requires a statistical treatment. It
even seems reasonable to infer a noise model, either continuous (noise varying smoothly with some design
variables) or discrete (by partitioning the space into stable/unstable regions).
Another interesting point is that optimal designs seem to be at the limit of stability (helicopter as
big as possible but stable). An alternative approach would be to define a constrained optimization
problem. However, the stochasticity of the constraint evaluation (stability) makes the problem particularly
challenging.
Finally, it seems that a critical factor, hidden in our approach, is the angle between the wing and
the body. The students were told to fold the paper to an approximate 90 degree angle, but this is quite
difficult to control. As shown in Annis (2005), this angle has a strong influence on the helicopter velocity,
and its uncertainty may explain a large proportion of the noise in the observations. Controlling this angle
(by changing the helicopter design), or coupling the statistical approach with a physical model (Annis,
2005) might provide interesting solutions.
Pedagogical aspects Despite its apparent simplicity, this project was found quite rich and challenging
regarding both statistical and optimization aspects, which makes it an excellent tool for teaching.
First, each observation has an actual cost (in terms of time), hence it really motivates the students to
double-check their codes, designs of experiments, etc., before performing any actual experiment. Besides,
contrary to computer experiments for instance, the study cannot be redone, so the students have to
deal with the data they have acquired, which sometimes leads to interesting compromises.
Second, this problem shares many characteristics with current challenging computer experiments and
optimization problems: it can be seen as an expensive, stochastic, multi-physics black box. Hence, while
the primary purpose of the problem was to teach engineers statistics (Box, 1992; Antony and Jiju Antony,
2001), we feel that it can also serve as an introduction to advanced research topics.
4 Appendix
4.1 D criterion
Let the polynomial response surface be of the form:
$$
y(x) = \sum_{k=1}^{p} f_k(x) + \varepsilon \qquad (4)
$$
For a design of experiments $x_1, \ldots, x_n$, the corresponding Fisher information matrix is given by:
$$
M = F^T F, \qquad (5)
$$
where $s_n^2(\cdot)$ denotes the prediction variance based on n observations. In our context, p observations are
to be added to the design, so the criterion to optimize is:
$$
IMSE(x_{n+1}, \ldots, x_{n+p}) = \int_D s_{n+p}^2(x)\,dx. \qquad (10)
$$
The kriging prediction variance does not depend on the value of the observations, hence $s_{n+p}$ can be
computed by updating the current kriging model with dummy observation values. Note that kriging
update equations can be used for speed-up, but they are not considered here.
The integral over D must be computed numerically, using Gauss quadrature or quasi-Monte Carlo methods
for instance. As the integral calculation is embedded in an optimization loop, it is important to always
use the same integration points, which creates some bias but avoids having to treat numerical noise in
the objective function.
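The fixed-integration-point idea can be sketched as follows. Everything specific here is an assumption made for illustration: a Gaussian covariance with known length scale and variance, a nugget standing for the observation noise, design variables rescaled to the unit hypercube, and plain pseudo-random integration points in place of a Gauss or quasi-Monte Carlo rule. Since the variance formula does not involve the observed values, no dummy observations are needed in this form.

```python
import math
import random

def kernel(a, b, length=0.3, variance=1.0):
    """Gaussian covariance; length scale and variance are assumed known."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return variance * math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy, A is untouched
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Integration points drawn once and reused for every criterion evaluation,
# so that the criterion is a deterministic function of the candidate designs.
_rng = random.Random(0)
INTEGRATION_POINTS = [tuple(_rng.random() for _ in range(4)) for _ in range(200)]

def imse(current, candidates, nugget=1e-2):
    """Average prediction variance over the fixed points (unit-volume domain)."""
    designs = list(current) + list(candidates)
    K = [[kernel(a, b) + (nugget if i == j else 0.0)
          for j, b in enumerate(designs)] for i, a in enumerate(designs)]
    total = 0.0
    for x in INTEGRATION_POINTS:
        k = [kernel(x, a) for a in designs]
        weights = solve(K, k)
        total += kernel(x, x) - sum(ki * wi for ki, wi in zip(k, weights))
    return total / len(INTEGRATION_POINTS)
```

Because the integration points never change, two calls with the same designs return exactly the same value, which is what allows a deterministic optimizer to be applied to the criterion.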
References
Annis DH (2005) Rethinking the paper helicopter: Combining statistical and engineering knowledge.
The American Statistician 59(4):320–326
Antony J, Jiju Antony F (2001) Teaching the Taguchi method to industrial engineers. Work Study
50(4):141–149
Box GE (1992) Teaching engineers experimental design with a paper helicopter. Quality Engineering
4(3)
Forrester A, Keane A, Bressloff N (2006) Design and analysis of "noisy" computer experiments. AIAA
Journal 44(10):2331
John RS, Draper NR (1975) D-optimality for regression designs: a review. Technometrics 17(1):15–23
Jones D, Schonlau M, Welch W (1998) Efficient global optimization of expensive black-box functions.
Journal of Global Optimization 13(4):455–492
Sacks J, Welch W, Mitchell T, Wynn H (1989) Design and analysis of computer experiments. Statistical
Science pp 409–423
Viana FA, Haftka RT, Hamman R, Venter G (2011) Efficient global optimization with experimental data:
revisiting the paper helicopter design. Gainesville: University of Florida