
Interactive Virtual Cinematography

Paolo Burelli
Center For Computer Games Research
IT University of Copenhagen

A thesis submitted for the degree of
Doctor of Philosophy

April 2012
To Diego and Rosalba
Acknowledgements

I would like to express my gratitude to my supervisor and friend Dr. Georgios Yannakakis. He has been a critical and yet very encouraging supervisor and a
model for me to develop as a researcher. Furthermore, I would like to thank all
the colleagues that I had the honour and pleasure to collaborate with during my
studies at the Center for Computer Games Research. My gratitude goes also to
the members of my doctoral committee Dr. Julian Togelius, Dr. John Hallam
and Dr. James Lester for their insightful comments and suggestions. Finally, a
big “thank you” to my friends and family; this thesis is also the product of your
support, your patience and your love.
Abstract

A virtual camera represents the point-of-view of the player through which she
perceives the game world and gets feedback on her actions. Thus, the virtual
camera plays a vital role in 3D computer games and affects player experience and enjoyment. Interactive virtual cinematography is the process of
visualising the content of a virtual environment by positioning and animating
the virtual camera in the context of interactive applications such as a computer
game.

Camera placement and animation in games are usually directly controlled by the player or statically predefined by designers. Direct control of the camera by
the player increases the complexity of the interaction and reduces the designer’s
control on game storytelling. A completely designer-driven camera releases the
player from the burden of controlling the point of view, but might generate
undesired camera behaviours. Furthermore, if the content of the game is pro-
cedurally generated, the designer might not have the necessary information to
define a priori the camera positions and movements.

Automatic camera control aims to define an abstraction layer that permits the control of the camera using high-level and environment-independent rules. The
camera controller should dynamically and efficiently translate these rules into
camera positions and movements before (or while) the player plays the game.
Automatically controlling the camera in virtual 3D dynamic environments is an
open research problem and a challenging task. From an optimisation perspective
it is a relatively low dimensional problem (i.e. it has a minimum of 5 dimen-
sions) but the complexity of the objective function evaluation combined with the
strict time constraints make the problem computationally complex. Moreover,
the multi-objective nature of the typical camera objective function introduces problems such as constraint conflicts, over-constraining or under-constraining.

A hypothetical optimal automatic camera control system should provide the right tool to allow designers to place cameras effectively in dynamic and unpre-
dictable environments. However, there is still a limit in this approach: to bridge
the gap between automatic and manual cameras the camera objective should
be influenced by the player. In our view, the camera control system should be
able to learn camera preferences from the user and adapt the camera setting to
improve the player experience. Therefore, we propose a new approach to auto-
matic camera control that indirectly includes the player in the camera control
loop.

To achieve this goal we have analysed the automatic camera control problem
from a numerical optimisation perspective and we have introduced a new opti-
mization algorithm and camera control architecture able to generate real-time,
smooth and well composed camera animations. Moreover, we have designed
and tested an approach to model the player’s camera preferences using machine
learning techniques and to tailor the automatic camera behaviour to the player
and her game-play style.

Experiments show that the novel optimisation algorithm introduced successfully handles highly dynamic and multi-modal fitness functions such as the ones
typically involved in dynamic camera control. Moreover, when applied in a
commercial-standard game, the proposed automatic camera control architecture proves able to control the camera accurately and smoothly. Finally, the results of a user survey, conducted to evaluate the suggested methodology for camera behaviour modelling and adaptation, show that the resulting adaptive cinematographic experience is largely favoured by the players and generates a positive impact on the game performance.

Contents

1 Introduction 1
1.1 Problem Definition and Motivation . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives and Key Contributions . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Automatic Camera Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Adaptive Camera Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Related Work 9
2.1 Navigation and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Virtual Camera Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Automatic Camera Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Camera Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.2 Virtual Camera Composition . . . . . . . . . . . . . . . . . . . . . . 12
2.3.3 Camera Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Camera Control in Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.2 Artificial Intelligence in Games . . . . . . . . . . . . . . . . . . . . . 19
2.5.3 Player Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 Gaze Interaction in Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Automatic Camera Control 23


3.1 Frame constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Vantage Angle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.2 Object Projection Size . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1.3 Object Visibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.4 Object Frame Position . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.5 Camera Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


3.1.6 Composition Objective Function . . . . . . . . . . . . . . . . . . . . 31


3.2 Animation Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Controller’s Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 Optimisation Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4.1 Artificial Potential Field . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4.2 Sliding Octree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.3 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.4 Differential Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.5 Particle Swarm Optimisation . . . . . . . . . . . . . . . . . . . . . . 41
3.4.6 Hybrid Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 Animation Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5.1 Lazy Probabilistic Roadmap . . . . . . . . . . . . . . . . . . . . . . 45
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 Adaptive Camera Control 49


4.1 Camera Behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1.1 Gaze . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.1.2 Behaviour Identification . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.3 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5 Test-bed Environments 61
5.1 Virtual Camera Sandbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.1 Functionalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.2 Environment Elements . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 Lerpz Escape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.1 Game Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.2 Stages and Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2.3 Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2.4 Technical Characteristics . . . . . . . . . . . . . . . . . . . . . . . . 70
5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70


6 Hybrid Genetic Algorithm Evaluation 73


6.1 Test-bed Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2.1 Fitness-based Measures . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2.2 Complexity Measures by Törn et al. . . . . . . . . . . . . . . . . . . 75
6.2.3 Dynamism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.2.4 Scenario categorisation by problem complexity . . . . . . . . . . . . 78
6.3 Algorithms Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.4.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4.2 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7 Controller Evaluation 87
7.1 Experimental Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.3 Collected Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.4.1 Order Of Play . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.4.2 Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.4.3 Gameplay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

8 Evaluation of Adaptive Camera 99


8.1 Camera Behaviour Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.1 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.1.2 Collected Data Types and Feature Extraction . . . . . . . . . . . . . 103
8.1.3 Behaviour Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.1.4 Models Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.2 User Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.2.1 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


9 Discussion and Conclusions 125


9.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
9.1.1 Dynamic Virtual Camera Composition . . . . . . . . . . . . . . . . . 125
9.1.2 Real-Time Automatic Camera Control . . . . . . . . . . . . . . . . . 126
9.1.3 Adaptive Camera Control . . . . . . . . . . . . . . . . . . . . . . . . 127
9.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
9.2.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
9.2.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
9.3 Extensibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
9.3.1 Benchmarking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
9.3.2 Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
9.3.3 Manual Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9.3.4 Aesthetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9.3.5 Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.3.6 Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Bibliography 135

List of Figures

1.1 Standard virtual camera parameters. . . . . . . . . . . . . . . . . . . . . . . 2


1.2 An example of an automatic camera control problem: the camera controller
is requested to find the position and orientation of the camera that produced
a certain shot (a) and has to animate the camera to the target position
according to the characteristics of a certain animation (b) . . . . . . . . . . 3

2.1 Examples of advanced camera control in modern computer games. . . . . . 17

3.1 View angle sample shots. Each shot is identified by a horizontal and a vertical
angle defined in degrees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Object projection size sample shots. Each shot is identified by a projection
size defined as the ratio between the longest side of the object’s projection
bounding box and the relative side of the frame. . . . . . . . . . . . . . . . 27
3.3 Object visibility sample shots. Each shot is identified by a visibility value
defined as the ratio between the visible area of the object and its complete
projected area. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Example of the 5 points used to check visibility in the Object Visibility
objective function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5 Object frame position sample shots. Each shot is identified by a two-dimensional
vector describing the position of the object’s center in the frame. . . . . . . 30
3.6 Controller’s architecture. The black squares identify the controllable settings,
while the white squares identify the virtual environment features considered
by the controller. The two modules of the architecture are identified by the
white ellipses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.7 Position (b) and orientation (c) potential fields produced by a visibility con-
straint on a sphere (a). The two three-dimensional potential fields are sam-
pled along the XZ plane at the Y coordinate of the sphere. . . . . . . . . . 36


3.8 Example of a Sliding Octree iteration step. At each pass of the iteration, the
branch of the octree containing the best solution is explored, the distance
between the children nodes and the parent node decreases by 25% at each
level of the tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.9 Chromosome of an individual of the Genetic Algorithm, containing 5 real
values describing the camera position and orientation. . . . . . . . . . . . . 40
3.10 Flowchart representing the execution flow of the proposed hybrid GA. The
labels CR, MU, A-MU are used to refer, respectively, to the crossover, mu-
tation and APF-based mutation operators . . . . . . . . . . . . . . . . . . . 44
3.11 Probabilistic Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1 Camera behaviour modelling and adaptation phases of the adaptive camera
control methodology. The first phase leads to the construction of a camera
behaviour model, which is used in the adaptation phase to drive the auto-
matic camera controller and generate the adaptive camera behaviour. . . . 50
4.2 A screenshot from a 3D platform game in which the objects observed by the
player are highlighted by green circles. . . . . . . . . . . . . . . . . . . . . . 51
4.3 Example of a data collection setup. The player plays a game and manu-
ally controls the virtual camera. The game keeps track of the virtual camera
placement, the player behaviour and the gaze position on the screen; gaze po-
sition is recorded using a gaze-tracking device. The software used in the setup
portrayed in the figure is the ITU Gaze Tracker (http://www.gazegroup.org),
and the sensor is an infra-red camera. . . . . . . . . . . . . . . . . . . . . . 54
4.4 An example of a fully connected feed-forward artificial neural network. Start-
ing from the inputs, all the neurons are connected to all neurons in the sub-
sequent layer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5 Camera behaviour prediction scheme. The neural network is presented with
the features describing the gameplay characteristics of the next record and
the features about the previous player behaviour as input and returns the
next predicted camera behaviour for the next record as the output. . . . . . 58

5.1 Interface of the virtual camera sandbox, showing the virtual environment and
the interface elements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 Main components of the virtual camera sandbox. From left to right: the
subject, the forest, the house and the square. . . . . . . . . . . . . . . . . . 63


5.3 Maximum value of the objective function sampled for each area of the sand-
box across the X and Z axis. The position and orientation of each subject is
identified by the black marks. The composition problem evaluated in each
figure contains three frame constraints: an Object Visibility, an Object
Projection Size and a Vantage Angle. . . . . . . . . . . . . . . . . . . . . . 64
5.4 Virtual camera sandbox architecture. . . . . . . . . . . . . . . . . . . . . . . 65
5.5 Main components of the Lerpz Escape game. From left to right: player’s
avatar (Lerpz), a platform, a collectible item (fuel canister), an enemy (cop-
per), a respawn point and Lerpz’s spaceship. . . . . . . . . . . . . . . . . . 66
5.6 The two virtual environments employed in the user evaluation. The avatar is
initially placed at the right side of the map, close to the dome-like building,
and has to reach the space-ship at the left side of the map. . . . . . . . . . 67
5.7 The three different area types met in the test-bed game. . . . . . . . . . . . 68
5.8 Screenshots from the game used during the evaluation displaying the different
control schemes. The game interface displays the game controls configu-
ration, as well as the current number of collected canisters and the time
remaining to complete the level. . . . . . . . . . . . . . . . . . . . . . . . . . 69

6.1 Test problems sorted in a scatter plot according to their dynamism factor D
and their landscape complexity P . The Pareto front identified by the squares
contains all the non-dominated problems in terms of complexity and dynamism. 79
6.2 Median best solution value over time (median run) for each algorithm on the
problems in which the proposed hybrid GA fails to achieve the best results. 83

7.1 Screen-shots of the game menus introducing the experiment and gathering
information about the subject and her experience. . . . . . . . . . . . . . . 88
7.2 Expressed preferences (7.2a) and corresponding motivations (7.2b). The bar
colours in the motivations chart describe which preference the motivations
have been given for. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93


7.3 Differences ∆F = Fa − Fm of completion time (7.3a), number of canisters


collected (7.3b), number of jumps (7.3c) and number of falls (7.3d) between
the games played with the automatic camera controller and the ones with
manual camera control. The background color pattern shows which level was
preferred and which level was played first for each game played. If the dark
grey bar is in the upper half of the plot the level featuring the automatic
camera controller was preferred and vice versa. If the light grey bar is in the
upper half of the plot the level featuring the automatic camera controller was
played first and vice versa. The four features displayed have a significant or
close-to-significant (i.e. p-value < 0.10) correlation with either the order of
play or the camera control paradigm. . . . . . . . . . . . . . . . . . . . . . . 96

8.1 Experimental setup used for the data collection experiment. . . . . . . . . . 101
8.2 Best 3-fold cross-validation performance obtained by the three ANNs across
different input feature sets and past representations. The bars labelled 1S
refer to the one step representation of the past trace, the ones labelled 2S refer
to the two step representation and 1S+A to the representation combining one
previous step and the average of the whole past trace. . . . . . . . . . . . . 109
8.3 Example of a transition from one camera profile to another. As soon as the
avatar enters the bridge, the neural network relative to the fight area is ac-
tivated using the player’s gameplay features, recorded in all the previously
visited fight areas, as its input. The camera profile associated to the be-
haviour cluster selected by the network is activated until the avatar moves
to a new area. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
8.4 Changes in a camera profile before and after the collection of one fuel canister
and the activation of one re-spawn point. The screen-shots above each profile
depict the camera configuration produced by the two profiles for the same
avatar position. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.5 Expressed preferences (a) and corresponding motivations (b). The bar colours
in the motivations chart describe which preference the motivations have been
given for. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.6 Subjects sorted according to their average completion time and average num-
ber of collected canisters. The subjects indicated with a cross symbol belong
to the expert players cluster, the ones indicated with a triangle symbol be-
long to the average players cluster, while the subjects indicated with a circle
symbol belong to the novices cluster. . . . . . . . . . . . . . . . . . . . . . . 118


8.7 Differences ∆F = Fad − Fau of completion time (a), number of canisters


collected (b), number of jumps (c) and number of falls (d) between the levels
played with the adaptive camera behaviours and the ones without. The
background color pattern shows which level was preferred and which level
was played first for each game played. If the dark grey bar is in the upper half
of the plot the level featuring the adaptive camera controller was preferred
and vice versa. If the light grey bar is in the upper half of the plot the level
featuring the adaptive camera controller was played first and vice versa. . . 122

Chapter 1

Introduction

According to the Oxford Dictionary Of English (2005), cinematography is “the art of
photography and camerawork in film-making”; following this definition it is possible to
define the concept of virtual cinematography (or virtual camera control) as the application
of this art to virtual reality. With the term interactive virtual cinematography we identify
the process of visualising the content of a virtual environment in the context of an interactive
application such as a computer game. In this thesis, we propose the automation of such a
creative process using automatic camera control, we investigate the challenges connected to
camera automation and we examine solutions for them. In addition we investigate means
for the player to influence the automatic control process.

1.1 Problem Definition and Motivation

A virtual camera represents the point-of-view of the player through which she perceives the
game world and gets feedback on her actions. The perspective camera model in OpenGL1 defines a virtual camera with six parameters: position, orientation, field of view, aspect
ratio, near plane and far plane (see Figure 1.1). Camera position is a three-dimensional
vector of real values defining a Cartesian position. Camera orientation can be defined either
using a quaternion, a set of three Euler angles or a combination of two three-dimensional
vectors describing the front direction and the up direction.
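As an illustration of this parameterisation, consider the following minimal sketch in Python (the class name, field names and default values are ours, not part of the OpenGL specification):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class VirtualCamera:
        """Perspective camera state: the six OpenGL-style parameters."""
        position: np.ndarray        # three-dimensional Cartesian position
        forward: np.ndarray         # front direction (unit vector)
        up: np.ndarray              # up direction (unit vector)
        fov: float = 60.0           # vertical field of view, in degrees
        aspect: float = 16 / 9      # frame width to height ratio
        near: float = 0.1           # near clipping plane distance
        far: float = 1000.0         # far clipping plane distance
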
Camera placement — i.e. the configuration of the camera parameters within the virtual environment — and animation — i.e. the process of transitioning from one set of camera parameters to another — play a vital role in interactive 3D graphics applications and deeply influence the player’s perception of the environment and her ability to effectively ac-
complish any task. In applications such as 3D modellers and computer games, the virtual
camera provides means of interaction with the virtual environment and has a large impact
1 Open Graphics Library - http://www.opengl.org


Figure 1.1: Standard virtual camera parameters.

on the usability and the overall user experience (Pinelle and Wong, 2008). Moreover, in
3D computer games the presentation of the game events largely depends on the camera
position and movements, thus virtual camera control has a significant impact on aspects
such as gameplay and story-telling.
In computer games, the virtual camera is usually directly controlled by the player or
manually predefined by a game designer. While the player’s control is often assisted by the
game, direct control of the camera by the player increases the complexity of the interaction
and reduces the designer’s control over game storytelling. On the other hand, designer
placed cameras release the player from the burden of controlling the point of view, but they
cannot guarantee a correct visualisation of all possible player actions. This often leads the
game designers to reduce the range of possible player actions to be able to generate a more
cinematographic player experience. Moreover, in multi-player games or in games where
the content is procedurally generated, the designer has potentially no information to define
a priori the camera positions and movements.
Automatic camera control aims to define an abstraction layer that permits the control of
the camera using high-level and environment-independent requirements, such as the visibil-
ity of a particular object or the size of that object on the screen. Given these requirements
and the game state at any time, the camera controller should dynamically and efficiently
calculate an optimal configuration. An optimal camera configuration is defined as the com-
bination of camera settings which maximises the satisfaction of the requirements imposed
on the camera, known as camera profile (Bares et al., 2000).
The camera requirements describe the desired camera in terms of abstract properties
such as frame composition or motion. Frame composition (see Figure 1.2a) describes the
way in which the objects should be framed by the camera, such as their position in the
frame or the size of their projected image. Camera motion requirements describe the way

(a) Composition (b) Animation

Figure 1.2: An example of an automatic camera control problem: the camera controller is requested to find the position and orientation of the camera that produced a certain shot (a) and has to animate the camera to the target position according to the characteristics of a certain animation (b).

the virtual camera should be animated in the virtual environment while framing the subjects
(see Figure 1.2b). The definition of these requirements originates from the rules used to
shoot real-world scenes in cinematography and photography (Arijon, 1991).
Finding the optimal camera that maximises the fulfilment of the designer requirements
is a complex optimisation problem, despite its relatively low dimensionality (i.e. it can
be modelled using a 5 to 10 dimensional space); the complexity of the objective function
evaluation and the short execution time imposed to ensure a real-time execution, make
the problem computationally hard. The evaluation of visual aspects such as visibility or
size of the project image require computationally expensive operations such as rendering or
ray-casting and their evaluation functions often generate terrains that are very rough for
a search algorithm to explore. Moreover, in real-time camera control, the algorithm needs
to find the best possible solution within the rendering time of a frame (typically between
16.6 and 40 ms in commercial games) and needs to maintain it throughout the computation
while the environment changes due to the dynamic nature of ergodic media such as games.
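To make the time constraint concrete, such a controller typically runs its search as an anytime process inside the per-frame budget; the following minimal sketch assumes a hypothetical optimiser object exposing step() and best_solution(), and is not the implementation described later in this thesis:

    import time

    def optimise_within_budget(optimiser, budget_ms=16.6):
        """Run search iterations until the frame-time budget expires,
        always keeping the best camera configuration found so far."""
        deadline = time.perf_counter() + budget_ms / 1000.0
        best = optimiser.best_solution()
        while time.perf_counter() < deadline:
            candidate = optimiser.step()          # one iteration of the search
            if candidate.fitness > best.fitness:  # maximise requirement satisfaction
                best = candidate
        return best
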
Several techniques for automatic camera control have been proposed in the past — the
reader is referred to Christie et al. (2008) and Chapter 2 of this thesis for a comprehensive
review. The most common approaches model the camera control problem as a constraint
satisfaction or optimisation problem. These approaches allow the designer to define a set
of requirements on the frames that the camera should produce and on the camera motion.
Depending on the approach, the controller positions and animates one or more virtual
cameras that attempt to satisfy the predefined requirements.
While automatic camera control aims to automatise the translation process between high-level requirements and camera placement and animation, the definition of the requirements is commonly delegated to a human designer who hand-crafts the
cinematographic experience. Some efforts have been dedicated also to the automation of
these requirements (Bares and Lester, 1997a; Bares et al., 1998; Tomlinson et al., 2000). In
particular the studies of Bares et al. (1997a; 1998) have investigated the personalisation of
the cinematographic experience through task and user modelling.
In this thesis we propose a novel approach to interactive virtual cinematography in
which we address both the problem of efficient and effective automatic camera control
(see Section 1.3) and camera behaviour adaptivity (see Section 1.4). For this purpose, we
present an automatic camera control framework, named CamOn, that addresses both
virtual camera composition (i.e. finding the best camera configuration to fulfil the com-
position requirements) and camera animation (i.e. finding the best camera trajectory to
fulfil the motion requirements). We introduce a novel hybrid search algorithm able to find
and track the optimal camera configuration in dynamic real-time environments and, finally,
we present a methodology to build camera user models and use them to generate adaptive
cinematographic experiences for interactive 3D virtual environments.

1.2 Objectives and Key Contributions

This thesis aims to contribute primarily to the field of virtual cinematography; however, mi-
nor contributions to areas of research peripheral to camera control, such as player modelling,
natural interaction and dynamic optimisation, are also present. An impact is expected also
on the computer games industry as we believe that the instruments for interactive virtual
cinematography described in this thesis could expand the design possibilities for future
computer games.
The main research questions that will be answered in this thesis are as follows.

1. How can camera composition and animation be effectively approached in real-time in dynamic and interactive environments?

2. How does automatic camera control affect the player experience in three-dimensional computer games?

3. Within the framework of automatic camera control, how can the player affect the cinematographic experience?

In order to answer these research questions, this thesis pursues two objectives: develop and evaluate a real-time automatic camera controller, and design and evaluate a methodology for adaptive camera control. Our first hypothesis is that a combination of optimisation
and path planning can be used to successfully animate the camera in dynamic and inter-
active environments; moreover, we believe that by hybridising a population-based optimi-
sation algorithm with a local search algorithm, a camera controller can achieve sufficient
efficiency, accuracy and robustness to be employed in such conditions. We also hypothesise
that controlling the camera using such an approach can effectively increase the quality of
the player experience in computer games and can improve the player’s performance. Our
last hypothesis is that player modelling can be employed to implicitly involve the player in the camera control loop even when the camera is controlled automatically.
In summary, the main contributions of this thesis are as follows.

• A novel architecture for automatic camera control that is able to address both the
problems of virtual camera composition and virtual camera animation in interactive
virtual environments (Burelli and Yannakakis, 2010a, 2012a).

• A novel hybrid optimisation algorithm for dynamic virtual camera composition that
combines a Genetic Algorithm with an Artificial Potential Field based local search
algorithm (Burelli and Jhala, 2009a,b; Burelli and Yannakakis, 2010b, 2012b).

• A novel methodology to generate personalised cinematographic experiences in com-


puter games based on player modelling and adaptation (Picardi et al., 2011; Burelli
and Yannakakis, 2011).

In the remainder of this chapter we introduce the concepts of automatic camera control
and adaptive camera control.

1.3 Automatic Camera Control

To address the key problems of camera composition and animation, we propose a novel
approach that combines advantages from a set of algorithms with different properties. In
particular, we combine a local search algorithm with a population-based algorithm to couple
the computational efficiency of the first with the robustness of the second. Finally, the
solution found through the combination of these algorithms is used as a target to guide a
3D path planning algorithm.
At each iteration, the system evaluates two types of input: the current state of the
environment and the camera requirements. The first input class includes the geometry of
the scene and the camera configuration defined as a three-dimensional position and a two-
dimensional rotation (represented as spherical coordinates). The scene geometry is stored
in an engine-dependent data structure (usually a scene tree) and it is used to evaluate the
frame constraints that define the composition problem.


The second input class includes the desired frame composition properties — which
describe how the scene rendered through the virtual camera should look — and the
desired camera motion properties. Frame composition properties describe the disposition
of the visual elements in the image (Arijon, 1991); following the model proposed by Bares
et al. (2000), we define a set of properties each of which may be applied to an object of
interest for the camera.
The optimal camera configuration is calculated by optimising an objective function
which is proportional to the satisfaction level of the required frame composition proper-
ties. For this purpose, we have designed a hybrid optimisation algorithm that combines a
Genetic Algorithm (Holland, 1992) with an Artificial Potential Field (Khatib, 1986) based
search algorithm. The structure of this hybrid meta-heuristic algorithm follows the struc-
ture of a Genetic Algorithm with one main difference: a mutation operator that is driven
by an APF is added to the standard crossover and mutation operators. Such an operator is
added to exploit the knowledge of the objective function by employing an Artificial Poten-
tial Field optimisation algorithm based on an approximation of the composition objective
function derivative.
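A minimal sketch of this hybrid scheme follows (Python; a camera is a list of five values, three positional coordinates and two spherical angles, and the operators are simplified placeholders for the CR, MU and A-MU operators described in Chapter 3):

    import random

    def hybrid_ga(fitness, random_camera, generations=50, pop_size=30, step=0.1):
        """Genetic Algorithm whose extra mutation operator moves an individual
        along a finite-difference approximation of the objective derivative,
        mimicking an APF-driven local search step."""
        def apf_mutation(cam):
            base = fitness(cam)
            grad = [(fitness(cam[:i] + [cam[i] + 1e-3] + cam[i + 1:]) - base) / 1e-3
                    for i in range(len(cam))]
            return [x + step * g for x, g in zip(cam, grad)]

        pop = [random_camera() for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)      # rank by composition objective
            parents = pop[:pop_size // 2]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                child = [(x + y) / 2 for x, y in zip(a, b)]            # crossover (CR)
                if random.random() < 0.2:                              # mutation (MU)
                    child = [x + random.gauss(0, step) for x in child]
                if random.random() < 0.2:                              # APF mutation (A-MU)
                    child = apf_mutation(child)
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)
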
The results of a comparative evaluation show that this hybrid Genetic Algorithm demon-
strates the best reliability and accuracy across most of the investigated scenarios indepen-
dently of task complexity.
Furthermore, the overall architecture combining the hybrid Genetic Algorithm with a
Probabilistic Roadmap Method (Kavraki et al., 1994) for camera animation is evaluated
in a commercial-standard three-dimensional computer game production. A user survey on
this game reveals both high levels of player satisfaction for the automatic camera and a
significant preference for the automatic camera over the manual camera scheme.

1.4 Adaptive Camera Control

Virtual camera parameters are commonly hand-crafted by game designers and are not influenced by player preferences. Including the player in the definition of these parameters
requires the construction of a model of the relationship between camera motion and player
experience (Martinez et al., 2009). We aim to close the loop of automatic camera control
by allowing the player to implicitly affect the automatic camera controller; for this purpose
we investigate player preferences concerning virtual camera placement and animation, and
we propose a model of the relationship between camera behaviour and player behaviour,
and game-play. This model is used to drive the automatic camera controller and provide a
personalised camera behaviour.


In this thesis, we present a methodology to build user models of camera behaviour from
the combination of player’s gaze, virtual camera position and player’s in-game behaviour
data. Including gaze information allows for a finer analysis of the player’s visual behaviour, permitting us not only to understand which objects are visualised by the player, but also which ones are actually observed. Information on the player’s visual focus also allows us to identify exactly which objects are relevant to the player among the ones visualised through her control of the virtual camera.
From this data, a set of camera behaviours is extracted using a clustering algorithm
and the relationship between such behaviours and the players’ playing style is modelled
using machine learning. This model is then used to predict which kind of camera behaviour the user prefers while playing the game, in order to appropriately instruct the camera controller to replicate such a behaviour.
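A schematic sketch of the two phases (Python with scikit-learn-style estimators; the feature dimensions are placeholders, and k-means stands in for the clustering algorithm, while the thesis itself uses feed-forward artificial neural networks for the prediction step):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neural_network import MLPClassifier

    # Offline: cluster recorded camera traces into a small set of behaviours.
    camera_features = np.random.rand(200, 6)    # placeholder camera-trace features
    behaviour = KMeans(n_clusters=3, n_init=10).fit_predict(camera_features)

    # Offline: learn the mapping from gameplay features to behaviour cluster.
    gameplay_features = np.random.rand(200, 8)  # placeholder gameplay features
    model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)
    model.fit(gameplay_features, behaviour)

    # Online: predict the preferred behaviour for the current player and hand
    # the matching camera profile to the automatic camera controller.
    preferred = model.predict(np.random.rand(1, 8))[0]
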

1.5 Thesis Structure

The thesis is organized into chapters as follows.


Chapter 2 outlines the state-of-the-art in control, user modelling and their application
to games; furthermore, it presents an extensive literature review of automatic virtual cine-
matography.
Chapter 3 presents a camera control architecture designed to successfully solve the virtual
camera composition and animation problems and it describes the algorithms and techniques
employed.
Chapter 4 describes a methodology to design adaptive cinematographic experiences in
games by building user models of the camera behaviour and using them to control camera
movements.
Chapter 5 presents the virtual environment developed to evaluate the algorithms and
methodologies presented in this thesis.
Chapter 6 presents an evaluation of the optimisation algorithm at the core of the automatic
camera control architecture presented in the thesis.
Chapter 7 showcases the application of the automatic camera control architecture pre-
sented in this thesis to a commercial-standard 3D platform game and tests its performance through a user evaluation experiment.
Chapter 8 describes a case study that showcases the application of the adaptive camera
control methodology presented in Chapter 4 to a 3D platform game. Moreover it presents
a user evaluation of the resulting adaptive camera controller.


Chapter 9 summarises the thesis’ main achievements and contributions and discusses the
proposed approaches’ current limitations. Moreover, potential solutions that might address these drawbacks are presented.

Chapter 2

Related Work

This chapter briefly outlines the state-of-the-art of navigation, control and optimisation
since these are the areas to which interactive cinematography and our solution belong.
The analysis then narrows its scope by presenting an extensive literature review of virtual
camera control and it follows with an analysis of camera control in computer games. The
chapter closes with a review of the techniques used for optimisation, player modelling and
adaptation in games and with a summary of the findings.

2.1 Navigation and Control

Navigation (i.e. motion planning) has attracted the interest of different communities, such
as non-linear control, robotics and artificial intelligence. A classical motion planning prob-
lem raises several challenges (Salichs and Moreno, 2000): first, the agent must be able to
control its movements while interacting with the external environment; second, the agent
must be able to collect knowledge of the environment and be aware of its state within the environment; finally, the agent must be able to identify the goal location and an optimal
path that connects its current location to the goal.
In real-world motion control problems, such as autonomous vehicle control (Frazzoli
et al., 2000), the controller is required to handle the agent’s interaction with the physical
world and it has to deal with aspects such as obstacle avoidance, inertia or speed. Common
techniques for motion control include Artificial Potential Field (APF) (Khatib, 1986), Vi-
sual Servoing (Corke, 1994) or Certainty Grids (Moravec and Elfes, 1985); such techniques
address the motion control problem in different environment types and with different types
of inputs; for instance, Artificial Potential Fields require global spatial information, while Visual Servoing uses local visual information acquired through one or multiple cameras.
Furthermore, if the world model is unknown, an agent also needs the ability to explore and learn about its environment, build a model of the world and identify its state within
this model. The representation of such a model greatly varies among different approaches
depending on the type of information available about the world and the moment in which
the learning takes place. Occupancy Grids (Filliat and Meyer, 2002) are a deterministic, graph-based model of the environment which is built during the navigation process, while
Certainty Grids (Moravec and Elfes, 1985) employ Artificial Neural Networks to store the
probability of a certain position to be occupied by an obstacle.
In order to reach the goal destination, an agent needs to identify a plan that connects its
current position to the target one. Probably the most popular technique used for this task
is A* (Hart et al., 1968). A* performs a best-first search to find the least-cost path from
a given initial cell to the goal cell in a discrete space. Other algorithms such as Reinforce-
ment Learning (Sutton and Barto, 1998), Monte-Carlo Search (bar, 1990) or Probabilistic
Roadmap Methods (PRM) (Kavraki et al., 1994) address the path planning problem under different conditions, such as a continuous space or an unknown distance heuristic function.
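For reference, a compact sketch of A* (the standard textbook algorithm, in Python; not code from any of the cited systems):

    import heapq
    import itertools

    def a_star(start, goal, neighbours, cost, heuristic):
        """Best-first search returning the least-cost path from start to goal."""
        tie = itertools.count()                      # tie-breaker for the heap
        frontier = [(heuristic(start, goal), next(tie), 0.0, start, None)]
        came_from, g_score = {}, {start: 0.0}
        while frontier:
            _, _, g, node, parent = heapq.heappop(frontier)
            if node in came_from:
                continue                             # already expanded more cheaply
            came_from[node] = parent
            if node == goal:                         # walk the parents back to start
                path = [node]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            for nxt in neighbours(node):
                new_g = g + cost(node, nxt)
                if new_g < g_score.get(nxt, float("inf")):
                    g_score[nxt] = new_g
                    heapq.heappush(frontier,
                                   (new_g + heuristic(nxt, goal), next(tie), new_g, nxt, node))
        return None                                  # goal unreachable
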
Controlling the view point in a 3D application shares some of the same problems as
the virtual camera needs to be moved through a virtual environment composed of complex three-dimensional geometries; the next section will introduce how the problem has
been modelled within the field of virtual camera control and it will show how some of the
aforementioned techniques have been successfully applied to address different tasks.

2.2 Virtual Camera Control

Since the introduction of virtual reality, virtual camera control has attracted the attention of a large number of researchers (refer to Christie et al. (2008) for a comprehensive review).
Early approaches focused on the mapping between the degrees of freedom (DOF) for input
devices to 3D camera movement. Ware and Osborne (1990) proposed the following set of
mappings:

Eyeball in hand: the camera is controlled by the user as if she holds it in her hands;
rotations and translations are directly mapped to the camera.

Scene in hand: the camera is pinned to a point in the world, and it rotates and translates with respect to that point; the camera always faces the point and rotates around it, and forward movement is directed towards that point. Such a metaphor has also been explored by Mackinlay et al. (1990) in the same period (a minimal sketch of this mapping follows the list below).

Flying vehicle control: the camera is controlled similarly to an aeroplane, with controls
for translation and rotation velocity.
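As an illustration of the scene-in-hand metaphor, a minimal orbit mapping (Python; the spherical-coordinate convention is ours):

    import math

    def scene_in_hand(pivot, yaw, pitch, distance):
        """Map rotation and zoom inputs to a camera that always faces the
        pivot point (the scene-in-hand / orbit metaphor)."""
        position = (pivot[0] + distance * math.cos(pitch) * math.sin(yaw),
                    pivot[1] + distance * math.sin(pitch),
                    pivot[2] + distance * math.cos(pitch) * math.cos(yaw))
        # The front direction is the unit vector from the camera to the pivot.
        forward = tuple((p - c) / distance for p, c in zip(pivot, position))
        return position, forward
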


While these metaphors are currently still common in many virtual reality applications,
direct manipulation of the camera’s several degrees of freedom soon proved to be problematic for the user, leading researchers to investigate how to simplify camera
control (Phillips et al., 1992; Drucker and Zeltzer, 1994). Another control metaphor, called
through-the-lens camera control, was introduced by Gleicher and Witkin (1992) and was
refined by Kyung and Kim (1995). In through-the-lens camera control, the user controls the
camera by translating and rotating the objects on the screen; the camera parameters are
recomputed to match the new user’s desired locations on screen by calculating the camera
displacement from the object’s screen displacement. Christie and Hosobe (2006) extended
this metaphor by including an extended set of virtual composition primitives to control the
features of the image.
In parallel to the research on control metaphors, a number of researchers investigated
the automation of the camera configuration process. The first example of an automatic
camera control system was showcased by Blinn (1988). He designed a system to
automatically generate views of planets in a space simulator for NASA. Although limited in
its expressiveness and flexibility and suitable only for non interactive applications, Blinn’s
work served as an inspiration to many other researchers that investigated more efficient
solutions and more flexible mathematical models able to handle more complex aspects such
as camera motion and frame composition (Arijon, 1991) in static and dynamic contexts.

2.3 Automatic Camera Control

Automatic camera control identifies the process of automatically configuring the camera in a
virtual environment according to a set of requirements. It is a non-linear automatic control
problem (Pontriagin, 1962): the system’s input are the requirements on composition and
motion, while the internal state of the system is defined by the camera parameters — e.g.
position and rotation. Following this model, it is possible to identify three main research
questions:

• How to identify the best inputs for the system.

• How to find the optimal state of the camera with respect to the given inputs.

• How to control the dynamic behaviour of the camera.

The first problem, the definition of the system’s inputs, is often referred to as camera/shot planning and it deals with the selection of the shot type to be used to frame a certain event
in the virtual world. The second problem deals with the approach to be used to find the
optimal camera configuration that satisfies the input requirements. Constraint satisfaction or optimisation techniques are often used for this purpose. The last problem deals with the
control of the camera movements during the framing of one scene and during the transitions
between two subsequent shots.

2.3.1 Camera Planning

The camera planning problem was defined for the first time by Christianson et al. (1996)
as the problem of automatically scheduling the sequence of shots to film one or more events
in a virtual environment. Christianson et al. proposed a language (DCCL) to define shot
sequences and to automatically relate such sequences to events in the virtual world. He
et al. (1996) extended the concept of idioms within DCCL by modelling them as finite state machines and allowing for richer expressiveness. Bares and Lester (1997b; 1998)
suggested and evaluated a system that selects the most appropriate camera settings to
support different user’s tasks in a virtual learning environment. Charles et al. (2002) and
Jhala and Young (2005) investigated the automatic generation of shot plans from a story.
Moreover, Jhala and Young (2009; 2010) proposed an evaluation method for such task,
based on the users’ understanding of the story represented.
An alternative approach to shot generation through planning has been proposed by
various researchers. Tomlinson et al. (2000) modelled the camera as an autonomous virtual
agent, called CameraCreature, with an affective model and a set of motivations. The agent selects the most appropriate shot at every frame according to the events happening in the
environment and its current internal state. Bares and Lester (1997a) investigated the idea of
modelling the camera behaviour according to the user preferences to generate a personalised
cinematographic experience. The user model construction required the user to specifically
express some preferences on the style for the virtual camera movements. In this thesis, we
propose a methodology to build user profiles of camera implicitly by capturing a user’s gaze
movements on the screen during the interaction.

2.3.2 Virtual Camera Composition

The aforementioned approaches addressed the problem of automatic shot selection; however, once the shot has been selected it is necessary to identify the best camera configuration to
convey the desired visuals. This process involves two aspects: the definition of the desired
shot in terms of composition rules and the calculation of the camera parameters.
The first seminal works addressing virtual camera composition according to this model
(Jardillier and Languénou, 1998; Bares et al., 2000; Olivier et al., 1999) defined the concept
of frame constraint and camera optimisation. These approaches require the designer to
define a set of required frame properties which are then modelled either as an objective
function to be maximised by the solver or as a set of constraints that the camera configu-
ration must satisfy. These properties describe how the frame should look like in terms of
object size, visibility and positioning.
Jardillier and Languénou (1998) as well as Bares et al. (2000) modelled the visual prop-
erties as constraints and looked for valid camera configurations within the constrained space.
The solver suggested by Bares et al., as well as all the other constraint satisfaction approaches,
returns no solution in the case of conflicting constraints. Bares and Lester (1999) addressed
the issue by identifying conflicting constraints and producing multiple camera configurations
corresponding to the minimum number of non-conflicting subsets.
In contrast to constraint satisfaction approaches, global optimisation approaches (Olivier
et al., 1999; Halper and Olivier, 2000) model frame constraints as an objective function (a
weighted sum of each required frame property) allowing for partial constraint satisfaction.
These approaches are able to find a solution also in case of conflicting constraints; how-
ever, they may converge to a near-optimal solution and their computational cost is usually
considerably higher than the constraint satisfaction ones. A set of approaches (Pickering,
2002; Christie and Normand, 2005; Burelli et al., 2008) address this computational cost
issue by combining constraint satisfaction to select feasible volumes (thereby reducing the
size of the search space) and optimisation to find the best camera configuration within these
spaces. While such solutions are more robust than pure constraint satisfaction methods,
the pruning process might still exclude all possible solutions.
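In the weighted-sum formulation adopted by these global optimisation approaches, the solver seeks the camera configuration c maximising (notation ours, reconstructed from the description above)

    f(c) = Σ_{i=1}^{n} w_i · f_i(c)

where each f_i(c) ∈ [0, 1] measures the satisfaction of the i-th required frame property and w_i is its designer-assigned weight. Partial constraint satisfaction follows naturally, since f remains defined even when no configuration satisfies every property at once.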
An algorithm’s computational cost becomes a critical factor especially when the algo-
rithm is applied for control on real-time dynamic virtual environments. In the camera
control context the controller is required to produce a reasonable solution at short intervals
(potentially down to 16.6 ms) for an indefinitely long time. For this reason, researchers
have also investigated the application of local search methods; for instance, Bourne et al.
(2008) proposed a system that employs sliding octrees to guide the camera to the optimal
camera configuration.
Local search approaches offer reasonable real-time performance and handle frame coherence well, but they often converge prematurely to local minima. This characteristic be-
comes critical when the camera control system has to optimise the visibility of a subject in
the frame since the visibility heuristic consists of many local minima areas with almost no
gradient information available to guide local search.

Occlusion

Successfully handling object occlusion constitutes a vital component of an efficient camera controller (Christie et al., 2008). Object visibility plays a key role in frame composition.
For instance, an object satisfying all frame conditions (e.g. position in frame and projection
size) does not provide any of the required visual information if it is completely invisible due
to an occlusion.
The object occlusion problem can be separated into two dissimilar yet dependent tasks: occlusion evaluation/detection and occlusion minimisation/avoidance. An occlusion occurs
when one of the subjects the camera has to track is hidden (fully or partially) by another
object. A common technique to detect occlusion consists of casting a series of rays between
the object of interest and the camera (Burelli et al., 2008; Bourne et al., 2008). A similar
approach (Marchand and Courty, 2002) generates a bounding volume containing both the
camera and the object of interest and checks whether other objects intersect this volume.
A third approach (Halper et al., 2001) exploits the graphic hardware by rendering the scene
at a low resolution with a colour associated to each object and checking the presence of the
colour associated with the object of interest. All the optimisation algorithms examined in
this thesis perform occlusion detection using ray casting, as presented in Chapters 6, 7 and 8.
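As an illustration, a minimal ray-casting occlusion test along these lines could look as follows; the sketch is hypothetical, and the raycast helper passed in stands in for the host engine's ray-scene intersection query:

    def is_occluded(camera_pos, point, target, raycast):
        # Return True if `point` (e.g. a vertex of `target`) is hidden from
        # `camera_pos` by any object other than the target itself.
        # `raycast(origin, direction, max_dist)` is assumed to return the
        # first object hit by the ray, or None.
        direction = [p - c for p, c in zip(point, camera_pos)]
        distance = sum(d * d for d in direction) ** 0.5
        if distance == 0.0:
            return False
        direction = [d / distance for d in direction]
        hit = raycast(camera_pos, direction, distance)
        return hit is not None and hit is not target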
Based on their occlusion detection technique, Halper et al. (2001) deal with the problem
of occlusion avoidance by sequentially optimising each constraint and solving visibility as
the last problem in the sequence; therefore, occlusion minimisation overrides all the previous
computations. Bourne et al. (2008) devised an escape mechanism from occluded camera
positions which forces the camera to jump to the first non-occluded position between its cur-
rent position and the position of the object of interest. Their approach, however, considers
just one object of interest.
Pickering (2002) proposed a shadow-volume occlusion avoidance algorithm where the
object of interest is modelled as a light emitter and all the shadow-volumes generated are
considered occluded areas. However, this approach to occlusion avoidance is not suitable
for real-time applications like games due to its high computational cost. Lino
et al. (2010), within their system on real-time cinematography, use a similar approach, based
on a cell-and-portal visibility propagation technique, which achieves real-time performance.

2.3.3 Camera Animation

The aforementioned virtual camera composition approaches calculate an optimal camera
configuration given a shot description. However, they do not take into consideration the
dynamic nature of the scene and the camera: the camera is positioned in order to maximise
the satisfaction of the current requirements without considering the previous and the future
states of the system. Nevertheless, these aspects are extremely relevant to the automatic
generation of camera animations and the automatic control of the camera in dynamic
environments. In these conditions, continuity in camera position, smooth animations and
responsiveness become critical aspects.
Beckhaus et al. (2000) proposed a first approach to automating camera animation for the
generation of virtual museum tours. Their system used an Artificial Potential Field (APF)
to guide the camera through the museum: each painting was modelled as a low potential
area attracting the camera and, every time a painting was reached, the potential of the area
was deactivated so that the camera would smoothly continue towards the next painting.
The system suggested by Bourne et al. (2008) is also able to control the camera animation
during the optimisation process; this is a characteristic common to local search approaches,
as subsequent solutions are normally close to each other and can, therefore, be used to
animate the camera.
Such an approach combines both the identification of the target camera configuration
and the animation of the camera towards the target. However, due to the hill-climbing
nature of the algorithms adopted, these approaches often fail to find a smooth path for the
camera as they tend to converge prematurely to local optima. A set of approaches
(Nieuwenhuisen and Overmars, 2004; Baumann et al., 2008; Oskam et al., 2009) have focused
purely on the camera path planning task and have proposed different forms of the probabilistic
roadmap method to generate smooth and occlusion-aware camera paths.
The camera control framework presented in this thesis addresses the problems of camera
composition and animation by combining different algorithms. In particular, we couple a
local search algorithm with a population-based algorithm to combine the computational
efficiency of the first with the robustness of the second (Burelli and Yannakakis, 2012b).
Finally, the solution found through the combination of these algorithms is used as a target
to guide a 3D path planning algorithm (Burelli and Yannakakis, 2012a).

2.4 Camera Control in Games

Computer games stand as a natural benchmark application for automatic camera control
techniques as they impose the necessity for real-time execution; moreover, the camera needs
to react to unexpected and dynamic changes of the environment and to accommodate the
behaviour of the player to foster an optimal interactive experience. Despite this, in the
game industry, virtual cinematography has received considerably less attention than other
aspects such as visual realism or physics, and camera control has been mostly restricted to
the following standard interaction paradigms (Christie et al., 2008):

First person: the camera position and orientation correspond to the player's location in
the virtual environment; therefore, the camera control scheme follows the character
control scheme. Examples of games adopting such a camera control scheme include
Quake (id Software, 1996) and Halo: Combat Evolved (Microsoft Game Studios, 2001).

Third person: the camera either follows the character from a fixed distance with different
angles to avoid obstacles in the environment or multiple cameras are manually placed
in the environment and the view point changes according to the position of the main
game character. A different variant of the first type of camera control paradigm is
used in strategy or managerial games; in these games the target of the camera is
freely selectable by the player. Examples of games adopting such a camera control
scheme include Tomb Raider (Eidos Interactive, 1996) and Mass Effect (Microsoft Game
Studios, 2007).

Cut-scenes and replays: in these non-interactive phases of games, the camera focuses
on representing the important elements of the story rather than on fostering the
interaction. This scheme is often used in sport games (i.e. replays) and in story-heavy
games (i.e. cut-scenes). Games featuring such a camera control scheme include Metal Gear
Solid (Konami, 1998) and most sport video games.

Recent games demonstrate an increasing tendency to extend this list of paradigms and
enhance the player experience by employing cinematographic techniques to portray narrative
and interaction in games. Examples such as Heavy Rain (Sony Computer Entertainment,
2010) or the Uncharted (Sony Computer Entertainment, 2007) series show extensive
usage of cinematic techniques to frame the in-game actions (see Figure 2.1a). In such games,
however, the cameras are set manually in place during the development of the game, heavily
reducing the movements and the actions the player can take. Moreover, such a solution
would be inapplicable in games in which the content is not known in advance since it is
either procedurally generated (Yannakakis and Togelius, 2011; Shaker et al., 2010) or crowd-
sourced — e.g. World Of Warcraft (Vivendi, 2004).
Some simple dynamic techniques have been applied to more action oriented games such
as Gears Of War (Microsoft Game Studios, 2006) (see Figure 2.1b), in which the camera
changes relative position and look-at direction automatically to enhance some actions or
allow for a better view of the environment.
In games, a common camera control problem involves following and keeping visible
one or more subjects while avoiding obstacles. Moreover, aesthetics and narrative can be
supported during and between actions by applying high-level composition and animation rules.
Halper et al. (2001) proposed an automatic camera control system specifically designed for
computer games, which highlights the necessity of frame coherence (smooth changes in
camera location and orientation) to avoid disorienting the player during the game.

Figure 2.1: Examples of advanced camera control in modern computer games. (a) A screen-shot from Heavy Rain by Quantic Dream, demonstrating usage of cinematographic techniques. (b) A screen-shot from Gears Of War by Epic during a running action; in such a context the camera moves downward and shakes to enhance the haste sensation.

Hamaide (2008) presented a camera system developed by 10Tacle Studios based on the controller by
Bourne et al. (2008); this system extends the one presented by Bourne et al. by adding
more advanced control of the camera's dynamic behaviour, with support for predefined
movement paths.
None of these systems, however, supports composition over multiple subjects and camera
animation simultaneously, making them unable to support cinematographic representations
of the game actions. Moreover, these works focus primarily on the automation of camera
animation and occlusion avoidance, without considering aspects such as shot planning and
editing. This thesis addresses these aspects by applying machine learning to model the
players' preferences on camera behaviour; such computational models are used to identify
the most appropriate shot to be selected during gameplay.

2.5 Artificial Intelligence

As stated by Russell and Norvig (2009), Artificial Intelligence (AI) has been defined in
multiple ways depending on which aspect is considered more important. Russell and Norvig
identify two dimensions along which the different definitions vary: whether the focus lies
on reasoning or on acting, and whether the final goal is to achieve an optimal or a human-like
result. This thesis work, by investigating the aforementioned aspects of automatic camera
control, focuses on the acting aspect of AI with the purpose of achieving both an optimal
and a human-like behaviour of the camera. This section will present a short overview of
the state of the art in optimisation, followed by two subsections describing how optimisation
and machine learning are used in computer games to generate optimal agent behaviours and
game content and to model the player.


2.5.1 Optimisation

Many of the approaches presented in section 2.3 address different aspects of virtual camera
control as numerical optimisation tasks. In numerical optimisation, the algorithm is required
to find the best configuration in a given search space that minimises or maximises a given
objective function. For instance, in APF, the controller searches for the solution which
minimises the function representing the potential field; similarly, in PRM, the controller
searches for the path of minimum length connecting two points. In the first example, the
objective function is the potential field function and the search space is its domain. In the
second example, the objective function is the function measuring the length of the produced
path, while the search space contains all the possible paths connecting the two points.
Optimisation techniques vary according to the domain type, the objective function and
how much is known about these two aspects. Some techniques, such as linear program-
ming (Hadley, 1962), make strong assumptions on the type of objective function, while
other techniques, such as meta-heuristics, make few or no assumptions about the problem
being optimised. Due to this flexibility, meta-heuristic methods stand as an ideal instrument
to address both the problem of camera animation and that of virtual camera composition
(Olivier et al., 1999; Halper and Olivier, 2000; Beckhaus et al., 2000; Bourne et al., 2008;
Burelli and Yannakakis, 2012b; Pickering, 2002; Christie and Normand, 2005; Burelli et al., 2008).
Metaheuristics are optimisation methods that iteratively try to improve a candidate
solution with regard to a given measure of quality; the iterative process commonly in-
cludes some form of stochastic optimization. Examples of meta-heuristic methods include
population based algorithms such as Genetic Algorithms (Holland, 1992) or Evolutionary
Algorithms (Schwefel, 1981) and local search algorithms such as Simulated Annealing (Kirk-
patrick et al., 1983) or Tabu Search (Glover and Laguna, 1997).
Section 3.4 describes virtual camera composition as an optimisation problem and, more
specifically, a dynamic optimisation problem. This family of problems includes all the
optimisation problems in which the objective function changes during the optimisation
process (Branke, 2001). Algorithms that attempt to solve such problems have to be able to
reuse information between optimisation cycles to guide their convergence while keeping the
population diversified to avoid premature convergence. Examples of memory mechanisms
used to improve standard meta-heuristics algorithms include memorisation of successful
individuals (Ramsey and Grefenstette, 1993) or diploidy (Goldberg and Smith, 1987). Ex-
amples of techniques adopted to ensure diversity in the population include both approaches
that inject noise after the objective function changes, such as Hypermutation (Cobb, 1990)
and Variable Local Search (Vavak et al., 1997), as well as approaches that maintain the
diversity during the optimisation process, such as multi-modal optimisation (Ursem, 2000).

These techniques, however, assume that the objective function changes at most once every
generation; as explained in Chapter 3, the change rate in automatic camera control
is potentially much higher, so a different approach is proposed to deal with the dynamic
nature of the objective function.

2.5.2 Artificial Intelligence in Games

Artificial Intelligence in games has been employed in a variety of forms to address a multi-
tude of problems such as procedural content generation (Togelius et al., 2010), game play-
ing (Lucas, 2008) or player modelling (Houlette-Stottler, 2004; Yannakakis and Maragoudakis,
2005).
Various optimisation algorithms have been employed to develop optimal strategies to
play games (Wirth and Gallagher, 2008) or to generate more intelligent non player character
behaviours (Hagelbäck and Johansson, 2009). Others attempted to use machine learning to
learn how to play a game (Tesauro, 1994; Thrun, 1995); learning to play a computer game
often stands as a more complex task than learning board or card games, due to the larger
space of states, higher unpredictability and larger number of possible actions. Therefore,
several approaches have attempted to learn to play computer games such as Super Mario
Bros (Togelius et al., 2009), Pac-Man (Lucas, 2005) or TORCS (Cardamone et al., 2009).
Another prominent direction of research in the field investigates the application of ma-
chine learning for content generation in games. The primary challenges of procedural con-
tent generation are: how to represent the game content, how to evaluate the content quality
and how to find the best content configuration. Machine learning has been used to address
all these problems: for instance, evolution has been employed to generate tracks for racing
games (Togelius et al., 2006) or to generate strategy game units (Mahlmann et al., 2011)
and artificial neural networks (ANNs) have been used to model the weapons in a multi-player
space fight game (Hastings et al., 2009). Shaker et al. (2010) built a set of ANN models of player experience and
employed them as evaluation functions for the game content.
Bares and Lester (1997a) proposed an explicit method to build user models of camera
behaviour, used to generate personalised camera control experiences. This thesis draws upon
that work and modern player modelling to design a method to generate adaptive cinemato-
graphic experiences in computer games.

2.5.3 Player Modelling

The term player modelling identifies the application of user modelling to games (Charles
and Black, 2004; Houlette-Stottler, 2004; Yannakakis and Maragoudakis, 2005). Player
modelling has been employed on different aspects of games; however, it is possible to isolate
two main purposes: the post-hoc analysis of the players' in-game behaviour (Drachen et al.,
2009) and the first step of game content adaptation (Yannakakis and Togelius, 2011).
Clustering techniques have been used to isolate important traits of the player behaviour:
self-organising maps have been used to identify relevant game-play states (Thurau et al.,
2003) or to identify player behaviour types from game-play logs (Thawonmas et al., 2006;
Drachen et al., 2009) and neural gas (Thurau et al., 2004) has been applied to learn players’
plans. Thurau et al. (2003; 2004) coupled clustering techniques with supervised learning to
build believable Quake II (Activision, 1997) bots.
Viewing player modelling as an initial step in the design process of adaptive behaviour
in games, Yannakakis and Maragoudakis (2005) combined naive Bayesian learning and
online neuro-evolution in Pac-Man to maximise the player's entertainment during the game
by adjusting NPCs behaviour. Thue et al. (2007) built a player profile during the game,
based on theoretical qualitative gameplay models, and used this profile to adapt the events
during an interactive narrative experience.
Yannakakis et al. (2010) studied the impact of camera viewpoints on player experience
and built a model to predict this impact. That study demonstrates the existence of a
relationship between player emotions, physiological signals and camera parameters; however,
since the relationship is built on low-level camera parameters, the findings give limited
information about which visual features are most relevant to the player. Therefore, in
the light of these results, in this thesis we further investigate the relationship between
camera and player experience to automate the generation and selection of the virtual camera
parameters. More specifically, we attempt to incorporate alternative player input modalities
(i.e. gaze) to model the user's visual attention for camera profiling.

2.6 Gaze Interaction in Games

Eye movements can be recognised and categorised according to speed, duration and direction
(Yarbus, 1967). In this thesis, we focus on fixations, saccades and smooth pursuits. A
fixation is an eye movement that occurs when a subject focuses on a static element on the
screen; a saccade occurs when a subject rapidly switches her attention from one point to
another; and a smooth pursuit is a movement that takes place when a subject is looking at
a dynamic scene and following a moving object.
Research on gaze interaction in computer games includes studies on the usage of gaze
as a direct player input (Nacke et al., 2009; Munoz et al., 2011) and studies on gaze as a
form of implicit measure of the player’s state. El-Nasr and Yan (2006), for instance, used
an eye tracker to record eye movements during a game session to determine eye movement
patterns and areas of interest in the game. Moreover, they employed this information to
calculate the areas of the game that necessitate a higher graphic quality.
Sundstedt et al. (2008) conducted an experimental study to analyse players’ gaze be-
haviour during a maze puzzle solving game. The results of their experiment show that
gaze movements, such as fixations, are mainly influenced by the game task. They conclude
that the direct use of eye tracking during the design phase of a game can be extremely
valuable to understand where players focus their attention, in relation to the goal of the
game. Bernhard et al. (2010) performed a similar experiment using a three-dimensional
first-person shooter game in which the objects observed by the players were analysed to
infer the player’s level of attention. We are inspired by the experiment of Bernhard et al.
(2010); unlike that study however, in this thesis, we analyse the player’s gaze patterns to
model the player’s camera movements, and moreover, investigate the relationship between
camera behaviour, game-play and player-behaviour.

2.7 Summary

This chapter described the state of the art of virtual camera control and how it relates
to fields such as optimisation, machine learning and user modelling. Moreover, it gave an
outline of the outstanding issues in the field and, more specifically, in the use of
automatic camera control in dynamic and interactive applications. Problems such as the
computational complexity of dynamic camera placement and animation and the dichotomy
between interactivity and cinematography, which have been introduced in this chapter, will
be analysed in depth in the following chapter and a series of solutions will be proposed and
evaluated.

Chapter 3

Automatic Camera Control

This chapter¹ presents the algorithms and techniques adopted to successfully tackle the
virtual camera composition and animation problems.

¹ Parts of this chapter have been published in (Burelli and Jhala, 2009b,a; Burelli and Yannakakis, 2010a,b) and have been submitted for publication in (Burelli and Yannakakis, 2012b).
The virtual camera composition problem is commonly defined as a static problem in
which, given an environment setup and a shot description, one or multiple optimal static
shots are generated. In this thesis, the concept of virtual camera composition is extended
to address environments that change dynamically during optimisation. In this context,
optimisation is not intended as a finite process that produces an optimal set of results at
the end of its execution. It is, instead, a never-ending process that continuously adapts and
tracks the best possible camera configuration while the environment is changing. In other
words, the optimisation process is not restarted at every frame; instead, it runs in parallel
to the rendering process and, at each frame rendering, the current best solution can be used
to drive the camera.
We believe that continuous dynamic optimisation of the camera configuration with respect
to composition is a fundamental aspect of automatic camera control in dynamic
environments. Successfully solving such an optimisation problem would allow the development
of a controller which is always aware of the optimal camera configuration in composition
terms. Such information could be employed directly to place the camera or it could be used
to drive an animation process. In such a structure, a camera controller is divided into three
layers solving, respectively, composition, animation and shot planning. The camera
configurations identified by the composition layer during the optimisation are used to inform
the animation layer, which employs a Probabilistic Roadmap based path planning algorithm
to smoothly animate a virtual camera.
In the first part of the chapter, virtual camera composition and animation are defined
as dynamic numerical optimisation problems, and the concepts of frame and animation
constraints are introduced; each constraint is described in detail with references to the
state of the art. The chapter continues with an overview of the architecture of CamOn,
a detailed description of the optimisation and path planning algorithms employed in its
development and evaluation, and it concludes with a summary of the chapter’s content.

3.1 Frame constraints

In order to define virtual camera composition as an optimisation problem, we primarily
need to identify the number and types of key attributes that need to be included in the
objective function of the optimisation problem.
The characteristics of the search space of the optimisation problem depend on the num-
ber and type of parameters used to define the camera. The choice of parameters affects
both the dimensionality of the optimisation problem (thus, the performance of the optimi-
sation process) and the expressiveness, in terms of shot types that can be generated. The
solution space contains all the possible camera configurations (the terms solution and
camera configuration are used interchangeably in this thesis) and, according to the standard
perspective camera model in OpenGL, a virtual camera is defined by six parameters: posi-
tion, orientation, field of view, aspect ratio, near plane and far plane. Camera position is
a three-dimensional vector of real values defining a Cartesian position. Camera orientation
can be defined either using a quaternion, a set of three Euler angles or a combination of two
three-dimensional vectors describing the front direction and the up direction. Including the
last four parameters, the domain of the virtual camera composition objective function is at least
10-dimensional. However, some parameters such as near and far planes and the camera up
vector are commonly constant, while other parameters are tightly related to the shot type.
In particular, parameters such as field of view and aspect ratio are used dynamically and
statically to express certain types of shots (Arijon, 1991). For instance, an unintended change
of field of view performed by the composition optimisation layer during the shooting of a
scene might create an undesired zoom effect which disrupts the current shot's purpose. On
the other hand, if a vertigo shot is selected at shot planning level — e.g. by a designer or by
an automatic planning system such as Darshak (Jhala and Young, 2010) —, the animation
layer will take care of widening the field of view, while the composition layer will automat-
ically track the best camera configuration towards the subject to maintain the composition
properties. For this reason, we consider these aspects of camera as parts of the high level
requirements for the optimisation algorithm instead of as optimisation variables. The search
space which composes the domain of our objective function is, therefore, five-dimensional
and it contains all possible combinations of camera positions and orientations.


Bares et al.                      CamOn
---------------------------------------------------------------
OBJ PROJECTION SIZE               Object Projection Size
OBJ VIEW ANGLE                    Vantage Angle
OBJ IN FIELD OF VIEW              Object Visibility
OBJ OCCLUSION MINIMISE            Object Visibility
OBJ EXCLUDE OR HIDE               Object Visibility
OBJ OCCLUSION PARTIAL             Object Visibility
OBJ PROJECTION ABSOLUTE           Object Projection Position
CAM POS IN REGION                 Camera Position

Table 3.1: Comparison between the frame constraints defined by Bares et al. (2000) and
the frame constraints supported by the camera controller described in this thesis.

Bares et al. (2000) described the concept of frame constraint as a set of requirements that
can be imposed to define a camera control problem. Every frame constraint is converted
into an objective function that, in a linear combination with all the constraints imposed,
defines a camera control objective function (Olivier et al., 1999). An example of a frame
constraint is OBJ PROJECTION SIZE that requires the projection of an object to cover
a specified fraction of the frame.
We consider a reduced set of frame constraints: Vantage Angle, Object Projection Size,
Object Visibility and Object Projection Position. These four constraints serve as representa-
tives of all the constraints listed by Bares et al. (2000). Table 3.1 contains a comprehensive
comparison of such constraints and their equivalent frame constraint in the CamOn system.
In the remainder of this section, we present these frame constraints and their corresponding
objective functions, as well as how each constraint relates to one or more constraints of the
aforementioned list. Note that the terms fitness and objective function value will be used
interchangeably in this chapter, as most of the algorithms considered are population-based
meta-heuristics.

3.1.1 Vantage Angle

This constraint binds the camera position to the position and rotation of a target object.
It requires the camera to be positioned so that the angle between the target object's front
vector and the direction from the target to the camera equals a certain value. A vantage
angle constraint is defined by three parameters: the target object, the horizontal angle and
the vertical angle. Figure 3.1 depicts three sample shots showcasing the relationship between
the angles, the target object and the generated shot.

Figure 3.1: View angle sample shots with angles (a) (0, 0), (b) (-90, 0) and (c) (45, 45). Each shot is identified by a horizontal and a vertical angle defined in degrees.

The objective function $f_\theta$ of this frame constraint quantifies how close the camera
position is to the required angle and it is defined as

follows:

$$
\begin{aligned}
f_\theta &= f_\alpha \cdot f_\beta \\
f_\alpha &= 1 - \frac{|\hat{P}_x - \hat{H}_x| + |\hat{P}_z - \hat{H}_z|}{4} \\
f_\beta &= 1 - \frac{|\hat{V}_y - \hat{P}_y|}{2} \\
\vec{P} &= \vec{C} - \vec{T} \\
\vec{V} &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\beta & -\sin\beta \\ 0 & \sin\beta & \cos\beta \end{pmatrix} \vec{F} \\
\vec{H} &= \begin{pmatrix} \cos\alpha & 0 & \sin\alpha \\ 0 & 1 & 0 \\ -\sin\alpha & 0 & \cos\alpha \end{pmatrix} \vec{F}
\end{aligned}
\tag{3.1}
$$

where $\alpha$ is the desired horizontal angle, $\beta$ is the desired vertical angle, $\vec{F}$ is the target's
front vector, $\vec{C}$ is the current camera position, $\vec{T}$ is the current target position and $\hat{P}$ is the
normalised relative direction of the camera with respect to the target object. Using this
constraint, it is also possible to control only one angle, in which case $f_\theta$ equals either $f_\alpha$
or $f_\beta$ depending on the angle that should be constrained.
This frame constraint is equivalent to the OBJ VIEW ANGLE constraint of the Bares et al.
(2000) list (see Table 3.1).
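To make the mapping from Eq. 3.1 to an implementation concrete, a minimal sketch follows; it assumes angles given in radians and a normalised front vector, and all function names are illustrative:

    import math

    def normalised(v):
        # Return v scaled to unit length (unchanged if it is the zero vector).
        n = math.sqrt(sum(c * c for c in v)) or 1.0
        return tuple(c / n for c in v)

    def rot_y(v, a):
        # Rotate v around the world Y axis by a radians (matrix H of Eq. 3.1).
        x, y, z = v
        return (math.cos(a) * x + math.sin(a) * z, y,
                -math.sin(a) * x + math.cos(a) * z)

    def rot_x(v, b):
        # Rotate v around the world X axis by b radians (matrix V of Eq. 3.1).
        x, y, z = v
        return (x, math.cos(b) * y - math.sin(b) * z,
                math.sin(b) * y + math.cos(b) * z)

    def vantage_angle_fitness(cam_pos, target_pos, target_front, alpha, beta):
        # f_theta of Eq. 3.1: equals 1 when the camera lies at the desired
        # horizontal (alpha) and vertical (beta) angle from the target.
        P = normalised(tuple(c - t for c, t in zip(cam_pos, target_pos)))
        H = rot_y(target_front, alpha)
        V = rot_x(target_front, beta)
        f_alpha = 1 - (abs(P[0] - H[0]) + abs(P[2] - H[2])) / 4
        f_beta = 1 - abs(V[1] - P[1]) / 2
        return f_alpha * f_beta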

3.1.2 Object Projection Size

This constraint binds the camera position and rotation to the position and size of a target
object. It requires the area covered by the projection of a target object to have a specific
size. The object projection size constraint is defined by two parameters: the target object
and the fraction of the frame size that the projection should cover. Figure 3.2 shows three
sample shots demonstrating the relationship between the projection size, the target object
and the generated shot.

Figure 3.2: Object projection size sample shots with sizes (a) 1.0, (b) 0.5 and (c) 1.5. Each shot is identified by a projection size defined as the ratio between the longest side of the object's projection bounding box and the corresponding side of the frame.

The objective function $f_\sigma$ of this frame constraint quantifies the proximity of the
current camera position to the closest position which generates a projection of the object
covering the desired size. It is calculated as:
$$
f_\sigma = \begin{cases} \sigma_c/\sigma_d & \text{if } \sigma_d > \sigma_c \\ \sigma_d/\sigma_c & \text{otherwise} \end{cases}
\tag{3.2}
$$

where σd is the desired projection size and σc is the actual projected image size of the
target object. The size of an object’s projected area can be calculated with different lev-
els of approximation. Maximum accuracy can be obtained by rendering to an off-screen
buffer and counting the area covered by the object. The bounding box and sphere can
also be used effectively for the area calculation. While these approximations drastically
decrease the computational cost of the objective function evaluation, they also provide less
accuracy. Using the bounding sphere of the object is the fastest evaluation method but
it approximates poorly most of the possible targets, especially human-shaped objects. In
the current implementation of the evaluation function, the target object is approximated
using its bounding box and the projected area size is calculated using Schmalstieg and
Tobler's method (1999). This frame constraint corresponds to the OBJ PROJECTION SIZE
constraint of the Bares et al. (2000) list (see Table 3.1).
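As a minimal illustration of Eq. 3.2 (the treatment of degenerate, zero-area projections as worst case is an assumption of this sketch):

    def projection_size_fitness(sigma_c, sigma_d):
        # f_sigma of Eq. 3.2: the ratio of current to desired projected size
        # (or its inverse), peaking at 1 when the two sizes match.
        if sigma_c == 0.0 or sigma_d == 0.0:
            return 0.0  # degenerate projection; treated here as worst case
        return sigma_c / sigma_d if sigma_d > sigma_c else sigma_d / sigma_c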

3.1.3 Object Visibility

This constraint binds the camera position and rotation to the position and size of a target
object. It requires the target object to be included in the frame and not hidden by any
other object; both conditions are necessary to identify the target object as visible. In order
to respect these two requirements, the camera should be placed at a sufficient distance from
the target and oriented so as to frame the target. Moreover, the volume between the
camera and the target object should not contain obstacles that hide the target object.

Figure 3.3: Object visibility sample shots with visibility values (a) 1.0, (b) 0.6 and (c) 0.5. Each shot is identified by a visibility value defined as the ratio between the visible area of the object and its complete projected area.

Every opaque object in the virtual environment can potentially act as an obstacle and
generate an occlusion. Figure 3.3 illustrates three sample shots showcasing the relationship
between the visibility value, the target object and the generated shot. The objective function
$f_\gamma$ of this frame constraint quantifies the ratio between the actual visible area of the
projected image of the object and its total projected area and it is defined as:

$$
\begin{aligned}
f_\gamma &= 1 - |\gamma_d - \gamma_c| \\
\gamma_c &= \frac{\sum_{i=1}^{N} \mathit{infov}(\vec{v}_i)}{N} \cdot \frac{\sum_{i=1}^{N_e} (1 - \mathit{occ}(\vec{e}_i))}{5} \\
\mathit{infov}(\vec{x}) &= \begin{cases} 1 & \text{if } \vec{x} \text{ is in the view frustum} \\ 0 & \text{otherwise} \end{cases} \\
\mathit{occ}(\vec{x}) &= \begin{cases} 1 & \text{if } \vec{x} \text{ is occluded} \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
\tag{3.3}
$$

where $\gamma_c$ is the current visibility value of the target object, $\gamma_d$ is the desired visibility value,
$\vec{v}_i$ is the position of the $i$-th vertex of the object's mesh, $N$ is the number of vertices of the
mesh, the function $\mathit{infov}(\vec{x})$ calculates whether a point is included in the field of view or not,
$\vec{e}$ is the list containing the positions of the four extreme vertices in the field of view — i.e. the
top, bottom, left and right vertices on screen — plus the one closest to the centre of the
projected image, and $N_e$ is equal to 5 (an example of these points is depicted in Fig. 3.4).
The $\mathit{occ}(\vec{x})$ function calculates whether the point $\vec{x}$ is occluded by another object or not.


Figure 3.4: Example of the 5 points used to check visibility in the Object Visibility objective
function.


The first part of the visibility function returns the fraction of the object which is in the
field of view, while the second part returns the fraction of that part which is not occluded;
the product of these two values is the overall visibility.
The implemented version of the function is optimised not to evaluate the second part
if the first part is equal to 0. The occlusion check is implemented by casting a ray towards
the point defined by the vector $\vec{x}$ and checking whether the ray intersects any object
other than the target. The $\mathit{infov}(\vec{x})$ function is implemented by checking whether the point
defined by $\vec{x}$ lies within the six planes composing the view frustum.
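A minimal sketch of this evaluation, with the in-view and occlusion tests injected as predicates (their implementation belongs to the host engine, e.g. via frustum checks and ray casts), might look as follows:

    def visibility_fitness(gamma_d, vertices, extremes, in_fov, occluded):
        # f_gamma of Eq. 3.3. `vertices` are the target's mesh vertices,
        # `extremes` the five reference points of Fig. 3.4; `in_fov` and
        # `occluded` are boolean predicates supplied by the engine.
        in_view = sum(in_fov(v) for v in vertices) / len(vertices)
        if in_view == 0.0:
            return 1.0 - gamma_d  # nothing in view: skip all ray casts
        unoccluded = sum(not occluded(e) for e in extremes) / len(extremes)
        gamma_c = in_view * unoccluded
        return 1.0 - abs(gamma_d - gamma_c)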
The object visibility constraint covers the OBJ IN FIELD OF VIEW, OBJ OCCLUSION
MINIMISE, OBJ EXCLUDE OR HIDE and OBJ OCCLUSION PARTIAL constraints of
the list proposed by Bares et al. (2000) (see Table 3.1). The first two can be obtained by
setting the desired visibility to 1, the third by setting it to 0, while any value between these
two extremes expresses the fourth constraint.

3.1.4 Object Frame Position

This constraint binds the camera position and rotation to the position of a target object. It
requires the target object to be included in the frame at a specific two-dimensional location.
The object frame position constraint is defined by three parameters: the target object, the
desired horizontal position and the desired vertical position. Figure 3.5 shows three sample
shots demonstrating the relationship between the frame position, the target object and the
generated shot.

Figure 3.5: Object frame position sample shots at positions (a) (0.25, 0.25), (b) (0.75, 0.75) and (c) (0.5, 0.5). Each shot is identified by a two-dimensional vector describing the position of the object's centre in the frame.
The objective function $f_\rho$ of this frame constraint quantifies how close the camera is to
the required orientation and it is defined as follows:

$$
f_\rho = 1 - \frac{|\vec{p}_a - \vec{p}_d|}{\sqrt{2}}
\tag{3.4}
$$
where $\vec{p}_a$ is the two-dimensional position of the target object in the frame and $\vec{p}_d$ is the
desired position. Both vectors are defined between (0,0) and (1,1), where (0,0) corresponds
to the lower left corner of the frame and (1,1) corresponds to the upper right corner. By
combining the object frame position and object projection size constraints it is possible to
express the OBJ PROJECTION ABSOLUTE constraint, since it is possible to control both
the size and the location of the object's projected image.
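A one-line transcription of Eq. 3.4, with illustrative naming:

    import math

    def frame_position_fitness(actual, desired):
        # f_rho of Eq. 3.4: equals 1 when the target's on-screen position
        # matches the desired one; both arguments are (x, y) pairs in [0, 1]^2.
        dist = math.hypot(actual[0] - desired[0], actual[1] - desired[1])
        return 1.0 - dist / math.sqrt(2)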

3.1.5 Camera Position

This constraint binds the camera position to a specific location in the virtual environment.
Strictly speaking, it is not a frame constraint, as it does not bind the solution to a
characteristic of the image to be rendered; however, it is often useful to be able to bind the
camera to a certain location. Moreover, the constraint has been included for completeness
with respect to the reference list defined by Bares et al. (2000), as it corresponds to the
CAM POS IN REGION constraint (see Table 3.1).
The objective function $f$ of this frame constraint expresses how close the camera is to
the region of space identified by the positions $\vec{v}_{max}$ and $\vec{v}_{min}$ and it is defined as follows:

$$
f = \begin{cases} 1 & \text{if } \vec{v} < \vec{v}_{max} \wedge \vec{v} > \vec{v}_{min} \\ 1 - \dfrac{\min(|\vec{v} - \vec{v}_{max}|, |\vec{v} - \vec{v}_{min}|)}{D_{max}} & \text{otherwise} \end{cases}
\tag{3.5}
$$

where $\vec{v}$ is the current camera position and $D_{max}$ is the maximum distance between two
points in the virtual environment.

3.1.6 Composition Objective Function

The complete virtual camera composition objective function is a linear combination of the
aforementioned objective functions. Each objective function corresponds to a frame
constraint imposed on a certain object; the complete objective function $f$ is given by:

$$
f = \sum_{i=1}^{N_\gamma} w_{\gamma_i} f_{\gamma_i} + \sum_{i=1}^{N_\sigma} w_{\sigma_i} f_{\sigma_i} + \sum_{i=1}^{N_\theta} w_{\theta_i} f_{\theta_i} + \sum_{i=1}^{N_\rho} w_{\rho_i} f_{\rho_i} + \sum_{i=1}^{N} w_{i} f_{i}
\tag{3.6}
$$

where $N_\gamma$, $N_\sigma$, $N_\theta$, $N_\rho$ and $N$ are, respectively, the number of object visibility, object
projection size, vantage angle, object frame position and camera position constraints;
$w_{\gamma_i}$ and $f_{\gamma_i}$ are the weight and the objective function value of the $i$-th object visibility
constraint; $w_{\sigma_i}$ and $f_{\sigma_i}$, those of the $i$-th object projection size constraint; $w_{\theta_i}$ and
$f_{\theta_i}$, those of the $i$-th vantage angle constraint; $w_{\rho_i}$ and $f_{\rho_i}$, those of the $i$-th object
frame position constraint; and $w_i$ and $f_i$, those of the $i$-th camera position constraint.
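In code, Eq. 3.6 reduces to a weighted sum over all imposed constraints; the following sketch assumes each constraint exposes its objective function as a callable returning a value in [0, 1] (all names are illustrative):

    def composition_objective(constraints, camera):
        # Eq. 3.6: `constraints` is a list of (weight, objective_fn) pairs,
        # one per frame constraint imposed on the shot; each objective_fn
        # maps a camera configuration to a fitness value in [0, 1].
        return sum(w * f(camera) for w, f in constraints)

    # Example usage: weight visibility twice as much as projection size.
    # fitness = composition_objective([(2.0, f_vis), (1.0, f_size)], cam)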

3.2 Animation Constraints

The objective function presented in Eq. 3.6 describes the ideal shot the camera should
generate; however, in a dynamic virtual environment, the camera cannot continuously jump
to the optimal configuration, as this would often produce a completely disorienting behaviour
(e.g. a virtual camera instructed to frame a flickering object, such as a flame, would
continuously jump forward and backward to maintain the object's projection size).
In dynamic environments, it is necessary for an automatic camera controller to also
moderate the camera's motion and animation properties. For this purpose, we identify a
minimal set of four key motion constraints that the system should support, which are as follows:

• Camera Movement Speed: Defines the speed in space units per second at which
the camera follows the ideal position.

• Camera Rotation Speed: Defines the speed in degrees per second at which the
camera adjusts towards ideal rotation.

• Frame Coherence: Defines the threshold value (minimum percentage of visible
targets) for generating a cut — i.e. if the current visible surface of all the targets is
below the fraction defined by frame coherence (or there is no available path connecting
the current camera position with the next one), the controller will generate a cut and
will place the camera at the new position without generating any animation.

• Obstacle Avoidance: A predefined boolean value that controls whether the camera
should or should not avoid the objects in the scene during its motion.

More constraints could be included in the set to support more cinematographic animations,
such as constraints on motion directions or constraints on frame properties during
motion (Lino et al., 2010). However, the proposed set guarantees essential control of the
camera's dynamic behaviour.
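As an illustration, the four animation constraints above can be grouped into a simple configuration record; the field names and default values here are illustrative assumptions, not the controller's actual API:

    from dataclasses import dataclass

    @dataclass
    class AnimationConstraints:
        # Container for the four motion constraints described above.
        movement_speed: float = 5.0   # space units per second
        rotation_speed: float = 90.0  # degrees per second
        frame_coherence: float = 0.3  # min. visible fraction before cutting
        avoid_obstacles: bool = True  # avoid scene objects while moving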


3.3 Controller’s Architecture

The structure of CamOn (see Fig. 3.6) is the result of an iterative design process. Through
the different design phases we have analysed the virtual camera composition objective func-
tion (Eq. 3.6) and we have evaluated different algorithms. The final results of this empirical
process are presented in Chapter 6.
The architecture of the controller includes two main components: an optimisation mod-
ule and an animation module (see Figure 3.6). The controller operates iteratively: at each
iteration, the optimisation module finds the next best camera solution according to the
frame constraints defined to describe the current shot, while the animation module ani-
mates the camera towards this solution according to the animation constraints. The inputs
considered by the controller are: the current camera configuration, the current geomet-
rical configuration of the virtual environment, the frame constraints and the animation
constraints. The output of the controller is a new virtual camera configuration. The pa-
rameters of the camera which are not controlled by CamOn (e.g. field of view) can be
directly set on the camera.
The controller balances the computational resources given to the two modules at each
iteration in order to guarantee real-time execution; therefore, depending on the computa-
tional resources available, the controller will pause the optimisation process to guarantee
an execution of each iteration within 16.6 ms (60 iterations per second). For this reason,
only a few steps of the optimisation are executed each cycle; therefore, the optimiser has to
search for an optimal solution in a dynamic problem in which the geometry might change
during the optimisation process. After the optimisation phase is concluded, the animation
module calculates the new configuration of the virtual camera according to the animation constraints
and the best solution found so far by the optimiser.
The controller’s execution flow is as follows:

1. The system takes the current camera configuration, the frame constraints, the ani-
mation constraints and the current virtual environment’s geometry configuration as
input.

2. The next potential camera configuration that optimises the frame composition objec-
tive function is calculated by the optimisation module.

3. A path connecting the potential new camera configuration to the current one is cal-
culated by the animation module.



4. If a path connecting the two points is available, the camera position and rotation are
animated towards the new solution according to the animation constraints; otherwise
the current camera is replaced by the newly found one, generating a cut.

5. The execution restarts from step 1.

Figure 3.6: Controller's architecture (diagram omitted; it connects the Camera Settings, Frame Constraints and Animation Constraints inputs and the Camera and Geometry features to the Optimiser and Animator modules). The black squares identify the controllable settings, while the white squares identify the virtual environment features considered by the controller. The two modules of the architecture are identified by the white ellipses.
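A minimal sketch of one controller iteration follows; all object and method names are illustrative stand-ins for the actual modules, not the controller's real API, and the optimiser is assumed to be internally time-sliced so the whole iteration fits the 16.6 ms budget:

    def camon_iteration(camera, scene, frame_constraints, anim, optimiser, animator):
        # One iteration of the loop above (steps 2-4); `optimiser` and
        # `animator` stand in for the two modules of Fig. 3.6.
        best = optimiser.step(frame_constraints, scene)                   # step 2
        path = animator.plan_path(camera.position, best.position, scene)  # step 3
        if path is not None:                                              # step 4
            camera.move_along(path, anim.movement_speed)
            camera.rotate_towards(best.orientation, anim.rotation_speed)
        else:
            camera.jump_to(best)  # no path available: generate a cut
        return camera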

3.4 Optimisation Module

The optimisation module within the CamOn controller is responsible for finding and tracking
the camera configuration that optimises the given set of frame constraints. The optimisation
is executed in real-time while the virtual environment changes; therefore, the module needs
to search for the optimal solution of a dynamic optimisation problem.
We propose a novel approach, based on a hybrid meta-heuristic, to the problem of
finding and tracking the optimal camera in real-time in a dynamic environment. The
proposed search algorithm, combining an Artificial Potential Field and a Genetic Algorithm,
stands upon the author's earlier work on camera control (Burelli and Jhala, 2009b; Burelli
and Yannakakis, 2010a) and extends it by introducing a novel coupling mechanism between
the two algorithms. At the initial stage of its development, the optimisation module was
purely based on an Artificial Potential Field to optimise the camera position and orientation;
however, while the algorithm proved to allow the controller to operate in real-time, it showed
limitations caused by the tendency of APF to converge prematurely to local optima with no
visibility of the targets (Burelli and Jhala, 2009b). A specific study on this aspect of the
optimisation function revealed that GAs successfully optimise the visibility objective
function (Burelli and Yannakakis, 2010b); however, the objective function in dynamic virtual
camera composition changes multiple times during the evaluation of a single generation and,
as pointed out by Schönemann (2007), such a high frequency of change prevents any
evolutionary strategy from identifying and tracking the global optimum successfully. The
addition of an APF operator allows the hybrid genetic algorithm to track the global optimum
by concentrating the search process locally as long as exploration is not necessary to avoid
or escape local optima basins.
The remainder of this section describes a set of algorithms for virtual camera composition
and presents the hybrid Genetic Algorithm used to optimise the virtual camera composition
objective function in CamOn. The algorithms described alongside our solution are: Artificial
Potential Field, Sliding Octree, Genetic Algorithm, Particle Swarm Optimisation and
Differential Evolution. These algorithms have been selected since they stand as state-of-the-art
optimisation methods for virtual camera composition and they are part of the comparative
study presented in Chapter 6.

3.4.1 Artificial Potential Field

The first algorithm presented in this section is the Artificial Potential Field (APF) (Khatib,
1986). APF is an iterative technique commonly adopted in the area of robotics for control-
ling the navigation of robots in dynamic environments. Robots are modelled as particles
moving in a field of potentials attracted by low potential areas; the position to be reached
generates an attractive force (a low potential zone) and obstacles generate repulsive forces
(high potential zones). At each iteration, the particle moves along the force resulting from
the sum of all repulsive (obstacle avoidance) and attractive (goals) forces influencing the
particle’s current position; the particle continues to move until it reaches a stable state.
In automatic camera control, APF has initially been employed for simple navigation
tasks (Drucker and Zeltzer, 1994; Beckhaus et al., 2000), while we have adapted it to
solve the full camera optimisation problem. In its application to camera optimisation,
obstacle avoidance is modelled using repulsive forces and frame composition is obtained by
translating frame constraints into forces affecting both the position and the look-at point
of the camera.
Each frame constraint produces one force attracting or repulsing the camera’s position
and one force attracting or repulsing the camera’s look-at point; the system treats these two
aspects of the camera as two different particles moving into two different potential fields.


Figure 3.7: Position (b) and orientation (c) potential fields produced by a visibility constraint on a sphere (a). The two three-dimensional potential fields are sampled along the XZ plane at the Y coordinate of the sphere.

An example of the two potential fields created by a frame constraint (the target sphere has
to be fully visible in the projected frame) can be seen in Figure 3.7. The two potential fields
shown are a sample of the 3D field measured along the horizontal plane passing through
the sphere centre. The particle representing the look-at point of the camera and the one
representing its position are attracted by the low potential areas, resulting in the camera
turning towards the target sphere and moving to a sufficient distance to be able to frame
the complete object.
The Vantage Angle constraint produces one force $\vec{F}_{p_\theta}$ that affects the camera position,
defined as follows:

$$
\begin{aligned}
\vec{F}_{p_\theta} &= (1 - f_\theta)\,\widehat{(\vec{c} - \vec{t}\,)} \\
\vec{t} &= \begin{pmatrix} \cos\alpha & 0 & \sin\alpha \\ 0 & 1 & 0 \\ -\sin\alpha & 0 & \cos\alpha \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\beta & -\sin\beta \\ 0 & \sin\beta & \cos\beta \end{pmatrix} \vec{f}_{w_c}\,|\vec{c} - \vec{s}\,|
\end{aligned}
\tag{3.7}
$$

where $\vec{c}$ is the camera position, $\vec{s}$ is the subject position, $\vec{f}_{w_c}$ is the subject's front vector
and $f_\theta$ is the vantage angle objective function defined in Equation 3.1.
The Object Projection Size constraint also produces only a force $\vec{F}_{p_\sigma}$ that affects the
camera position, defined as follows:

$$
\begin{aligned}
\vec{F}_{p_\sigma} &= d_p\,(1 - f_\sigma)\,\widehat{(\vec{c} - \vec{s}\,)} \\
d_p &= \begin{cases} 1 & \text{if } \sigma_c < \sigma_d \\ -1 & \text{otherwise} \end{cases}
\end{aligned}
\tag{3.8}
$$

where $\sigma_c$ is the current projection size value and $\sigma_d$ the desired one, $\vec{c}$ is the camera position,
$\vec{s}$ is the subject position and $f_\sigma$ is the projection size objective function defined
in Equation 3.2.


The Object Visibility constraint produces one force that affects the camera position and
one force that affects the camera look-at point. The camera position force $\vec{F}_{p_\gamma}$ is defined as
follows:

$$
\begin{aligned}
\vec{F}_{p_\gamma} &= (v\vec{u}_w + h\vec{r}_c)\,|\vec{c} - \vec{s}\,|\,(1 - f_\gamma) \\
v &= \begin{cases} d_p & \text{if } \mathit{occ}(\vec{e}_{bottom}) \wedge \neg\mathit{occ}(\vec{e}_{top}) \\ -d_p & \text{if } \mathit{occ}(\vec{e}_{top}) \wedge \neg\mathit{occ}(\vec{e}_{bottom}) \\ 0 & \text{otherwise} \end{cases} \\
h &= \begin{cases} d_p & \text{if } \mathit{occ}(\vec{e}_{left}) \wedge \neg\mathit{occ}(\vec{e}_{right}) \\ -d_p & \text{if } \mathit{occ}(\vec{e}_{right}) \wedge \neg\mathit{occ}(\vec{e}_{left}) \\ 0 & \text{otherwise} \end{cases} \\
d_p &= \begin{cases} 1 & \text{if } \gamma_c < \gamma_d \\ -1 & \text{otherwise} \end{cases}
\end{aligned}
\tag{3.9}
$$

where $\gamma_c$ is the current visible fraction value and $\gamma_d$ the desired one, $\vec{c}$ is the camera position,
$\vec{s}$ is the subject position, $\vec{u}_w$ is the world's up vector and $\vec{r}_c$ is the camera's right vector.
The $\vec{e}$ vectors, the $f_\gamma$ function and the $\mathit{occ}()$ function are the ones defined in Equation 3.3.
The camera orientation force $\vec{F}_{o_\gamma}$ is defined as follows:

$$
\begin{aligned}
\vec{F}_{o_\gamma} &= d_o\,(\vec{s} - \vec{c} - \vec{f}_{w_c}((\vec{s} - \vec{c}) \cdot \vec{f}_{w_c}))\,|i - \gamma_d| \\
d_o &= \begin{cases} 1 & \text{if } i < \gamma_d \\ -1 & \text{otherwise} \end{cases} \\
i &= \frac{\sum_{i=1}^{N} \mathit{infov}(\vec{v}_i)}{N}
\end{aligned}
\tag{3.10}
$$

where $\gamma_d$ is the desired visible fraction, $\vec{v}_i$ is the position of the $i$-th vertex of the subject's mesh,
$N$ is the number of vertices of the mesh, the function $\mathit{infov}(\vec{x})$ calculates whether a point is
included in the field of view or not, $\vec{c}$ is the camera position, $\vec{s}$ is the subject position and
$\vec{f}_{w_c}$ is the camera's front vector.
The Object Frame Position constraint produces one force $\vec{F}_{o_\rho}$ that affects the camera
look-at point and it is defined as follows:

$$
\begin{aligned}
\vec{F}_{o_\rho} &= (1 - f_\rho)(v\vec{u}_w + h\vec{r}_c) \\
v &= \begin{cases} 1 & \text{if } \vec{\rho}_{d_y} > \vec{\rho}_{c_y} \\ -1 & \text{otherwise} \end{cases} \\
h &= \begin{cases} 1 & \text{if } \vec{\rho}_{d_x} < \vec{\rho}_{c_x} \\ -1 & \text{otherwise} \end{cases}
\end{aligned}
\tag{3.11}
$$

where $\vec{\rho}_c$ is the current on-screen position value and $\vec{\rho}_d$ the desired one, $\vec{u}_w$ is the world's
up vector, $\vec{r}_c$ is the camera's right vector and $f_\rho$ is the function defined in Equation 3.4.


The Camera Position constraint produces one force $\vec{F}_{p}$ that affects the camera position,
defined as follows:

$$
\vec{F}_{p} = (1 - f)\,\widehat{(\vec{d} - \vec{c}\,)}
\tag{3.12}
$$

where $\vec{c}$ is the current camera position, $\vec{d}$ is the desired camera position and $f$ is the
camera position objective function defined in Equation 3.5.
The force value at each point is described as a linear combination of scalar-weighted
forces, where each force corresponds to a frame constraint and each scalar value to the
relative subject importance; the resulting values define the gradients of the potential field.
The resulting force value represents the multi-dimensional derivative of the potential field;
thus, an Euler integration method can be used to follow the slope of the potential
curve and converge to the low potential area. The iteration step $s$ starts from an
arbitrary initial value and, during the optimisation, decreases according to the following
formula:

$$
s_i = s_0\,\frac{e^{10(1-F_{i-1})} - 1}{e^{10(1-F_{i-1})} + 1}
\tag{3.13}
$$

where $s_0$ is the initial step, $F_i$ is the best fitness value before iteration $i$ and $s_i$ is the step
length that should be applied at iteration $i$. Note that $F_{i-1}$ can assume values between 0
and 1; thus, the length reduction is also constrained in the same interval. Such a fitness-based
decrease is intended to speed up convergence when the objective function value of the
current solution is low and to stabilise the convergence when the solution is near-optimal.
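A compact sketch of one APF iteration for the camera-position particle follows; each force function is assumed to map a position to a 3D vector, as in Eqs. 3.7 to 3.12, and all names are illustrative:

    import math

    def apf_step_length(s0, prev_best_fitness):
        # Step-length schedule of Eq. 3.13: long steps while the current
        # solution is poor, vanishing steps as the fitness approaches 1.
        e = math.exp(10 * (1 - prev_best_fitness))
        return s0 * (e - 1) / (e + 1)

    def apf_update(position, weighted_forces, s0, prev_best_fitness):
        # One Euler-integration step: move the particle along the weighted
        # sum of the constraint forces acting on its current position.
        total = [0.0, 0.0, 0.0]
        for weight, force in weighted_forces:
            f = force(position)  # force maps a position to a 3D vector
            total = [t + weight * c for t, c in zip(total, f)]
        s = apf_step_length(s0, prev_best_fitness)
        return [p + s * t for p, t in zip(position, total)]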

3.4.2 Sliding Octree

In the second version of their camera control system, Bourne et al. (2008) introduced a
local search algorithm, the Sliding Octree (SO), that explores the search space through
cubic spaces that get smaller at each iteration step.
SO is a best-first search algorithm, following the node of the octree with the highest
fitness value. The search algorithm starts by generating an octree that contains 8 points
surrounding the current camera position. The distance between the points is equivalent to
the maximum distance the camera can move during one algorithm iteration. During each
iteration the algorithm performs a number of passes in which the octree slides towards the
point having the highest evaluated fitness value and a new evaluation of the octree nodes is
performed. At each pass, the distance of the octree points is decreased by a linear scaling
factor, so that the search process starts coarsely and becomes increasingly focused around
the solution. After all the passes have been performed, a new octree is generated around
the new solution and the execution proceeds with the next iteration.


Figure 3.8: Example of a Sliding Octree iteration step (diagram omitted; it depicts the initial solution, the potential solutions explored at passes #1, #2, ..., #n, and the resulting iteration solution). At each pass of the iteration, the branch of the octree containing the best solution is explored; the distance between the children nodes and the parent node decreases by 25% at each level of the tree.

The distance scaling factor and the number of passes are the parameters of the algorithm.
If the scaling factor is too large, the octree shrinks too quickly and large portions of the
search space may be missed. If the scaling factor is too small, the solver takes significantly
longer to converge on the optimal solution. Bourne et al. suggest a linear scaling factor
equal to 0.75 and a number of passes Np calculated as follows:

Np = 3Ds (3.14)

where Ds is the number of dimensions of the search space.


The version of the optimisation algorithm evaluated in Chapter 6 differs slightly from
this implementation due to the dimensionality of the search space. The original version of
the search algorithm optimised only the camera position; therefore, it searched for a solution
only in a three-dimensional space. To apply the same heuristic to the five-dimensional search
space explored by the CamOn optimisation module, the number of nodes at each level of
the tree is 32 ($2^5$) instead of 8 ($2^3$).
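A minimal sketch of one SO iteration over a d-dimensional point follows, under the assumption of a fitness function to maximise (names are illustrative):

    import itertools

    def sliding_octree_iteration(centre, radius, fitness, passes, scale=0.75):
        # One SO iteration: at each pass, evaluate the 2^d corners around the
        # current best point, slide onto the best corner found, and shrink
        # the corner offsets by `scale` to focus the search.
        best, best_fit = list(centre), fitness(centre)
        for _ in range(passes):
            pass_centre = list(best)
            for signs in itertools.product((-radius, radius), repeat=len(centre)):
                candidate = [c + s for c, s in zip(pass_centre, signs)]
                f = fitness(candidate)
                if f > best_fit:
                    best, best_fit = candidate, f
            radius *= scale
        return best, best_fit

    # With the five-dimensional camera representation, passes = 3 * 5
    # (Eq. 3.14) and each pass evaluates 2^5 = 32 corner points.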

3.4.3 Genetic Algorithm

A Genetic Algorithm (GA) (Holland, 1992) is a biologically-inspired population-based
meta-heuristic that mimics the process of evolution. It operates by generating a population of
random individuals in the search space, each encoding a possible solution. At each iteration
of the algorithm, each individual of the population is evaluated against the function that
needs to be optimised (objective function). Subsequently, a new population of individuals
is generated from the application of genetic operators such as crossover and mutation on a
selection of individuals belonging to the previous generation.


Figure 3.9: Chromosome of an individual of the Genetic Algorithm, containing 5 real values
(x, y, z, h, v) describing the camera position and orientation.

Despite the vast variation of GAs used in the literature, there are some common basic
algorithmic steps which are described as follows:

1. A population of chromosomes (i.e. individuals or solutions) is randomly initialised.

2. Each chromosome is evaluated and its fitness value is calculated as the objective
function value corresponding to the solution described by the chromosome.

3. A subset of parents is selected according to their fitness. Several selection schemes
can be applied, such as elitism or roulette-wheel selection.

4. Genetic operators (crossover and mutation) are applied to the previously selected
subset of chromosomes and a new generation of potential solutions is generated.

5. The new solutions are evaluated and reinserted into the population by substituting
other solutions according to a replacement strategy.

6. The algorithm re-iterates starting from step 3.

In GAs, a crossover operator is a genetic operation that generates one or more new
potential solutions from the combination of two parents. The chromosome of each new
potential solution is the result of a recombination of two parent chromosomes. A mutation
operator is a genetic operation that alters a potential solution by randomly changing the
values contained in the selected chromosome. Both operators exist in different variations
and can be applied to both binary and real valued solutions.
In the author’s initial study on the combination of APF and GA for automatic camera
control (Burelli and Yannakakis, 2010a), a GA was employed to optimise a reduced objective
function built only on occlusion requirements. Occlusion depends only on camera position;
therefore, by reducing the dimensionality of the search space, the computational cost of
the GA was significantly lower. However, with this reduced objective, the GA often converged to undesirable areas of the search space with a very low overall objective function value, preventing the local search process from converging to a better solution.
For the purpose of this thesis, a GA is employed to optimise the complete virtual cam-
era composition objective function; therefore each individual represents a possible camera
configuration. The chromosome of each individual contains 5 float values representing the


camera position and orientation, using 3 Cartesian coordinates for the position and 2 spher-
ical coordinates for the orientation, as displayed in Fig. 3.9. The crossover operator applied
is a uniform crossover, while the mutation operator applies a uniform random mutation.
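For illustration, the two operators over such a five-valued chromosome could be sketched as follows (the coordinate bounds are illustrative assumptions, not the values used in this thesis):

    import random

    # Chromosome layout: [x, y, z, h, v], i.e. Cartesian position plus two
    # spherical orientation angles. The bounds below are illustrative.
    BOUNDS = [(-50.0, 50.0), (-50.0, 50.0), (-50.0, 50.0),
              (0.0, 6.2832), (0.0, 3.1416)]

    def uniform_crossover(parent_a, parent_b):
        # Each gene is inherited from either parent with equal probability.
        return [a if random.random() < 0.5 else b
                for a, b in zip(parent_a, parent_b)]

    def uniform_mutation(chromosome, rate=0.2):
        # Each gene is replaced by a uniform random value with probability rate.
        return [random.uniform(lo, hi) if random.random() < rate else gene
                for gene, (lo, hi) in zip(chromosome, BOUNDS)]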

3.4.4 Differential Evolution

Differential Evolution (Storn and Price, 1997) is an evolutionary algorithm which, similarly
to a GA, encodes the solutions as chromosomes belonging to a population of individuals
which are iteratively selected and recombined during the optimisation process. DE, however, employs a particular selection and recombination scheme: for each individual in the population, three other individuals are randomly picked and a new offspring is generated by combining these individuals, with probability C applied to each value of the chromosome.
The new values of the offspring y are given by the following equation:
$$y_i = \begin{cases} \alpha_i + W_d(\beta_i - \gamma_i) & \text{if } r < C \\ x_i & \text{otherwise} \end{cases} \quad (3.15)$$

where $W_d$ is the weighting factor, $x_i$ is the $i$-th value of the current individual's chromosome, $\alpha$, $\beta$ and $\gamma$ are the corresponding gene values of the three randomly selected individuals, $C$ is the crossover probability and $r$ is a uniform random number in the range $(0,1)$. DE has proven to be one of the best performing evolutionary algorithms on a large number of problems and has demonstrated to be extremely robust to the choice of parameters (Storn and Lampinen, 2004; Ali and Törn, 2004). The $W_d$ parameter controls the magnitude of the recombination of the chromosomes, while the $C$ parameter controls its probability.
In its implementation for virtual camera composition, DE encodes in each chromosome
a potential five-dimensional camera solution defined by the same real-valued parameters
described for GA, as displayed in Fig. 3.9.
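A minimal sketch of the recombination scheme of Eq. (3.15) might read as follows (the parameter values $W_d = 0.8$ and $C = 0.9$ are common defaults from the literature, not necessarily the settings used here):

    import random

    def de_offspring(population, i, w_d=0.8, c=0.9):
        # Generate one offspring for individual i following Eq. (3.15).
        x = population[i]
        # Pick three distinct individuals other than the current one.
        alpha, beta, gamma = random.sample(
            [ind for j, ind in enumerate(population) if j != i], 3)
        offspring = []
        for d in range(len(x)):
            if random.random() < c:
                # Recombination: base gene plus weighted difference of genes.
                offspring.append(alpha[d] + w_d * (beta[d] - gamma[d]))
            else:
                # Keep the gene of the current individual.
                offspring.append(x[d])
        return offspring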

3.4.5 Particle Swarm Optimisation

Particle Swarm Optimisation (Kennedy and Eberhart, 1995) is a population-based global optimisation algorithm, whose dynamics are inspired by the social behaviour that underlies
the movements of a swarm of insects or a flock of birds. These movements are directed
towards both the best solution to the optimisation problem found by each individual and
the global best solution. The algorithm initially generates a population of random solutions
encoded as particles. Each particle is then initialised with a random velocity. At each
iteration n, after the particles have moved at their previously assigned velocities, the
algorithm evaluates the fitness corresponding to their new positions and recalculates their


velocity according to the new global and local optima detected. The velocity $v_i^n$ of the $i$-th particle at the $n$-th iteration is calculated according to the following formula:

$$v_i^n = l\,v_i^{n-1} + c_1 r_1 (o_i^{n-1} - x_i^{n-1}) + c_2 r_2 (o_g^{n-1} - x_i^{n-1}) \quad (3.16)$$

where $r_1$ and $r_2$ are two uniformly distributed random numbers in the $[0, 1]$ range, whose purpose is to maintain population diversity. The constants $c_1$ and $c_2$ are, respectively, the cognitive and social parameters, which control the influence of the local and global optimum on the convergence of each particle. A high $c_1$ value favours exploration and diversity in the population, while a high $c_2$ value favours exploitation of the current known solutions, increasing speed of convergence but also increasing the possibility of early convergence.
The value $l$ is the inertia weight and it establishes the influence of the search history on the current move. Finally, $o_i^{n-1}$ is the best configuration assumed by the $i$-th particle at step $n-1$ and is known as the particle's local optimum, while $o_g^{n-1}$ is the best configuration assumed by any particle at step $n-1$ and is the current global optimum.
Since at each iteration a solution to the problem is available (although it is not guaran-
teed to be the optimal one), PSO belongs to the family of any-time algorithms, which can
be interrupted at any moment while still providing a solution.
PSO was first employed in camera control for off-line camera optimisation combined
with a space pruning technique (Burelli et al., 2008). In the implementation included in this
thesis, each particle moves in a 5-dimensional vector space defined by the same parameters
of the camera described for GA in Section 3.4.3.
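One iteration of the velocity and position update of Eq. (3.16) could be sketched as follows (fitness maximisation and a simple particle record are assumed):

    import random
    from dataclasses import dataclass

    @dataclass
    class Particle:
        x: list        # current camera configuration (x, y, z, h, v)
        v: list        # current velocity
        best_x: list   # particle's local optimum o_i
        best_f: float  # fitness at the local optimum

    def pso_step(particles, objective, l=0.7, c1=1.5, c2=1.5):
        # The global optimum o_g is the best local optimum in the swarm.
        o_g = max(particles, key=lambda p: p.best_f).best_x
        for p in particles:
            for d in range(len(p.x)):
                r1, r2 = random.random(), random.random()
                # Eq. (3.16): inertia + cognitive pull + social pull.
                p.v[d] = (l * p.v[d]
                          + c1 * r1 * (p.best_x[d] - p.x[d])
                          + c2 * r2 * (o_g[d] - p.x[d]))
                p.x[d] += p.v[d]
            f = objective(p.x)
            if f > p.best_f:           # update the particle's local optimum
                p.best_f, p.best_x = f, list(p.x)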

3.4.6 Hybrid Genetic Algorithm

The optimisation module employed in the CamOn camera controller is based on a hybrid
Genetic Algorithm designed to exploit the speed of convergence of APF while maintaining
the reliability and the ability to explore the solution space of a GA. Such a hybrid meta-
heuristic extends and draws upon the author’s earlier work integrating visibility optimisation
via a GA combined with an APF based camera controller (Burelli and Yannakakis, 2010a).
However, contrary to the aforementioned approach, in which the two algorithms were executed as separate entities and the GA was only used to minimise occlusion, the algorithm proposed in this thesis couples the two algorithms in a tighter fashion. The structure of the
hybrid meta-heuristic algorithm follows the structure of a classic GA; however, a new unary
operator is added to the classical crossover and mutation: an artificial potential field driven
mutation. This operator is added to exploit the knowledge of the objective function by
employing an Artificial Potential Field optimisation algorithm based on an approximation
of the composition objective function derivative.


At the initial stage of its development, CamOn employed a pure Artificial Potential Field approach to optimise the camera position and orientation; however, while the algorithm allowed the controller to operate in real-time, it showed limitations caused by the tendency of APF to prematurely converge to local optima with no visibility of the targets (Burelli and Jhala, 2009b). A specific study on this aspect of the optimisation function revealed that GAs successfully optimise the visibility objective function (Burelli and Yannakakis, 2010b); however, the objective function in dynamic virtual camera composition changes multiple times during a single generation evaluation and, as pointed out by Schönemann (2007), such a high frequency of change prevents any evolutionary strategy from identifying and tracking the global optimum successfully. The addition of an APF operator allows the hybrid genetic algorithm to track the global optimum by concentrating the search process locally as long as exploration is not necessary to avoid or escape local optima basins.
As it is possible to observe in the flow chart depicted in Fig. 3.10, at every execution iteration the flow control passes to a component named Scheduler; this component is responsible for deciding which of two policies to follow: exploitation or exploration. When
exploration is chosen the algorithm behaves exactly like a standard generational GA; there-
fore, it initialises the population up to the desired number of individuals and then applies
the standard genetic operators. On the contrary, when the Scheduler chooses to activate
exploitation, the APF based mutation operator is applied to the current optimal solution.
In the execution flow of the algorithm, the Scheduler is activated before any action is
picked; therefore, the algorithm is able to exploit or explore also during the population
initialisation phase. At the beginning of the algorithm execution, the Scheduler component
follows the exploration policy once every P exploitations, where P is an integer parameter
that defines how often the hybrid GA follows the exploitation policy. If the scheduler detects
an early convergence, the ratio between the execution of the two policies is inverted; therefore, the Scheduler follows the exploitation policy only once every P explorations. Two conditions
may trigger an early convergence: all the objects (that are requested to be visible) are
completely occluded, or the optimal solution’s fitness does not increase for more than one
frame rendering time. When all the aforementioned conditions cease, the Scheduler returns
to the initial configuration.
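The alternation between the two policies can be rendered schematically as follows (a sketch only: explore_step stands for the standard generational GA step, apf_mutation for the APF-driven operator, and the early-convergence test is simplified to a flag):

    def scheduler_step(state, explore_step, apf_mutation, period=5):
        # `state` is assumed to carry the population, the current optimum
        # (state.best), an iteration counter (state.tick) and an
        # `early_convergence` flag set when all requested objects are
        # occluded or the best fitness has stalled for more than one frame.
        state.tick += 1
        if state.early_convergence:
            # Inverted ratio: exploit only once every `period` explorations.
            exploit = state.tick % period == 0
        else:
            # Default ratio: explore only once every `period` exploitations.
            exploit = state.tick % period != 0
        if exploit:
            # Apply the APF-based mutation to the current optimal solution.
            state.best = apf_mutation(state.best)
        else:
            # Behave like a standard generational GA step (initialisation,
            # crossover, mutation, evaluation and replacement).
            explore_step(state)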

3.5 Animation Module

This section describes the second component of the CamOn architecture; this component
is responsible for finding a path between the current camera position and the new solution found by the optimisation module, and for animating the camera along it. If no obstacle avoidance is
required for camera motion, this component animates the camera position along a straight


Figure 3.10: Flowchart representing the execution flow of the proposed hybrid GA. The labels CR, MU and A-MU refer, respectively, to the crossover, mutation and APF-based mutation operators.

line between the current camera position and the next one. On the other hand, if the camera
is required to avoid obstacles, the component calculates and updates a path that connects
the current camera position with the target one at each frame. This path is computed using
the Lazy Probabilistic Roadmap Method (Lazy PRM) path planning algorithm (Bohlin and
Kavraki, 2000).
In both cases, the camera rotation is animated by linearly interpolating the current
camera orientation with the target orientation. The movement speed along the calculated path and the rotation speed are controlled according to the animation constraints
given as inputs to the camera controller. Under two conditions, the animation module does
not follow the aforementioned procedure and positions and rotates the camera directly to


the optimal solution. This happens if the path planning algorithm is unable to find a valid path in the allowed computation time or if the camera is required to perform a cut for
cinematographic reasons.
The animation component in the aforementioned architecture ensures control over cam-
era animation independently of optimisation convergence and, moreover, effectively reduces
the camera jittering problems caused by the natural oscillation in the local search conver-
gence process.

3.5.1 Lazy Probabilistic Roadmap

The path planning algorithm employed in CamOn is a Lazy Probabilistic Roadmap (Bohlin
and Kavraki, 2000), which is an evolution of the original Probabilistic Roadmap Method
introduced by Kavraki et al. (1994). In a standard PRM, the path planning process is
separated into two phases: the learning phase and the query phase. In the first phase, the algorithm
builds a probabilistic roadmap by randomly sampling the search space and storing the valid
solutions as nodes of a fully connected graph (see Fig. 3.11a). Each edge of the graph is
then checked for collisions and the ones for which a collision is detected are disconnected.
This way, only a graph of the possible transitions is built (see Fig. 3.11b). In the second
phase, a query asks for a path between two nodes; to process the query, any graph path-finding algorithm can be used: for instance, Dijkstra's algorithm (Dijkstra, 1959) if the shortest path is required.
In a dynamic problem, such as camera control where the starting and target position
change at each iteration, the two phases need to be recomputed continuously to be able to
query for a path on a constantly updated probabilistic roadmap. However, the path needs
to be recalculated at every frame; therefore the changes of the two nodes are generally very small and we can assume that a large part of the graph and the path computed at the previous iteration is still valid at the next one. For this reason, a Lazy PRM is used to compute the path at each iteration. The main difference between this algorithm and its
original form (Kavraki et al., 1994) is that, at each iteration of the learning phase, the
roadmap is not reconstructed but it is updated by removing the nodes which have been
already crossed and by adding a number of new nodes positioned towards the new target
configuration. The collision checks are then performed only on the edges connecting the
new nodes, thus, substantially reducing the computational cost of each iteration.
In our implementation, a set of random (normally distributed) points is gen-
erated in the space between the two camera positions; the mean of the mixed-Gaussian
distribution is located halfway between the two positions and its standard deviation equals
half of their distance. All the generated points as well as the current camera position and


Figure 3.11: Probabilistic Roadmap. (a) Graph generation; (b) Shortest path.

the newly found solution define the nodes of a 3D graph which is fully interconnected. All
the point-connectors which cross an object are removed and the shortest path between the
current position and the goal position is calculated (see Fig. 3.11) and used to animate the
camera. Since our implementation is based on Lazy PRM, the graph is not recomputed
at each iteration and only a new node containing the new position is added to the graph,
thereby reducing the computational cost of path planning and generating a more stable tra-
jectory for the camera compared to the authors’ previous solution that employed standard
PRM (Burelli and Yannakakis, 2010a).
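The sampling and graph-building steps just described could be sketched as follows (assuming a collides(a, b) predicate backed by the engine's geometry queries; the shortest path over the resulting weighted graph can then be extracted with Dijkstra's algorithm):

    import math
    import random

    def sample_roadmap_nodes(current, target, n_samples=30):
        # Gaussian samples centred halfway between the two camera positions,
        # with standard deviation equal to half of their distance.
        mean = [(c + t) / 2 for c, t in zip(current, target)]
        sigma = math.dist(current, target) / 2
        nodes = [list(current), list(target)]
        nodes += [[random.gauss(m, sigma) for m in mean]
                  for _ in range(n_samples)]
        return nodes

    def build_roadmap(nodes, collides):
        # Fully interconnect the nodes, then drop the edges whose straight
        # segment crosses an object in the environment.
        edges = {i: [] for i in range(len(nodes))}
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                if not collides(nodes[i], nodes[j]):
                    w = math.dist(nodes[i], nodes[j])
                    edges[i].append((j, w))
                    edges[j].append((i, w))
        return edges  # node 0 is the current position, node 1 the target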

3.6 Summary

This chapter introduced the problems of virtual camera composition and virtual camera
animation and described their characteristics from a numerical perspective. The concepts
of frame and animation constraints are defined as well as their relationship with the camera
parameters. Virtual camera composition has been defined as an optimisation problem and
its objective function has been described in all its components. Such an objective function is
the linear combination of the objective functions corresponding to each frame constraint
used to describe the controller’s task. The objective functions of all constraints have also
been mathematically defined.
The second part of the chapter described our approach to the composition and anima-
tion problems by initially introducing the CamOn camera controller architecture and later
describing the algorithms used for the controller’s implementation and evaluation. The


CamOn controller addresses the key problems of camera composition and animation by
combining advantages from a set of algorithms with different properties. In particular, we
combine a local search algorithm with a population-based algorithm to couple the compu-
tational efficiency of the first with the robustness of the second. Finally, the solution found
through the combination of these algorithms is used as a target to guide a 3D path planning
algorithm.
The hybrid optimisation algorithm proposed is a modified Genetic Algorithm with a
new genetic operator based on an Artificial Potential Field algorithm. The path planning
algorithm used in the animation module is a Lazy Probabilistic Roadmap Method. A
set of other state-of-the-art optimisation algorithms, used for the optimisation module’s
performance evaluation (see Chapter 6), is also described in the chapter.

Chapter 4

Adaptive Camera Control

This chapter1 describes a methodology to model the relationship between camera placement
and playing behaviour in games for the purpose of building a user model of the camera
behaviour that can be used to control camera movements based on player preferences.
A hypothetical optimal automatic camera control system should provide the right tool to allow designers to place the camera effectively in dynamic and unpredictable environments; however, a limit of this definition is that it excludes the player from the control loop. Tomlinson et al., in their paper on expressive autonomous cinematography (Tomlinson et al., 2000), quote a statement by Steven Drucker at SIGGRAPH '99: "It was great! I didn't notice it!". Drucker was commenting on Tomlinson's work presented at SIGGRAPH that year and, in his comment, he clearly associates the quality of a camera control system with its lack of intrusiveness; however, how such a result can be achieved, or how it is possible to take control of the camera away from the player while still moving it the way she would have wanted (or as close as possible), are open research questions. We
believe that, to bridge the gap between automatic and manual camera control, the camera
objective should be affected by the player. To achieve this goal, we propose a new approach
to automatic camera control that indirectly includes the player in the camera control loop.
In our view, the camera control system should be able to learn camera preferences from the
user and adapt the camera profile to improve the player experience.
For this purpose, we investigate player preferences concerning virtual camera placement
and animation, and we propose a methodology to model the relationship between camera
behaviour, player behaviour and game mechanics. Player behaviour describes the way the
player performs in the game — e.g. how many jumps she performs, while camera behaviour
describes how a player moves the camera and what she would like to frame with it. In the
proposed approach, we model camera behaviour using a combination of gaze and camera
position at each frame. Combining gaze data with camera data allows a finer analysis of the
1 Parts of this chapter have been published in (Burelli and Yannakakis, 2011; Picardi et al., 2011).


Figure 4.1: Camera behaviour modelling and adaptation phases of the adaptive camera control methodology. The first phase leads to the construction of a camera behaviour model, which is used in the adaptation phase to drive the automatic camera controller and generate the adaptive camera behaviour.

player's visual behaviour, permitting not only to understand what objects are visualised by the player, but also which ones are actually observed. This information makes it possible to identify exactly which objects are relevant for the player among the ones visualised by the player through her control of the virtual camera. A cluster analysis of the gaze data collected is
run to investigate the existence of different virtual camera motion patterns among different
players and different areas of the game.
Moreover, the relationship between player behaviour and camera behaviour is modelled
using machine learning. Artificial Neural Networks (ANNs) are employed to build models predicting the virtual camera behaviour from the player behaviour in earlier stages of the game.
These models are used to drive the automatic camera controller and provide a personalised
camera behaviour.
The two aforementioned phases of the methodology can be seen in Figure 4.1: the first phase builds the camera behaviour model for a game (left diagram)

Figure 4.2: A screenshot from a 3D platform game in which the objects observed by the
player are highlighted by green circles.

which is later used to detect the most appropriate camera behaviour for the player when
the game is played (right diagram).
The remainder of the chapter describes the different techniques used to build the mod-
els and the adaptation mechanism. Chapter 8 presents an experiment demonstrating the
effectiveness of the methodology on a 3D platform game.

4.1 Camera Behaviour

Camera behaviour can be modelled directly using the data about camera position relative to the avatar. However, this approach would fail to reveal which objects the player wants to watch during play. A better approximation would be achieved by analysing the objects present on screen in each area. The presence of a certain object on the screen, however, does not necessarily imply intent by the player; e.g. the object might be on the screen only because it is close to an object the player is interested in. The available gaze data makes it possible to overcome this limitation since, using the gaze position, it is possible to understand which object is actually observed among the ones framed by the player. Therefore, to model the camera behaviour, we combine camera movements and gaze coordinates to identify the objects observed by the player at each frame. Figure 4.2 shows an example of how gaze information can help filtering the objects framed by the camera, identifying the highlighted objects as the only ones receiving visual attention from the player.
In addition to the gaze information, a number of features regarding the dynamic camera
behaviour are required, such as the average camera speed or the maximum and minimum


acceleration. All this information combined is used to build a model describing how the
player moves the camera in a specific game under different game conditions.
To generate the camera behaviour model for a specific game, we propose to collect the
aforementioned data regarding the players’ camera behaviour while playing the game using
a manual camera controller (e.g. during an early stage of the game testing) as seen in
Figure 4.3. Subsequently, following the approach outlined by Mahlmann et al. (2010) we
propose to mine the logged data to identify a set of prototypical behaviours and then use
machine learning to build a model connecting the player in-game behaviour with the camera
behaviour types. This model is used to adapt the camera behaviour during the actual game
play based on the player’s in-game behaviour.

4.1.1 Gaze

Eye movements can be recognised and categorised according to speed, duration and direc-
tion (Yarbus, 1967). In this work, we focus on fixations, saccades and smooth pursuits.
These movements, among others, describe how a person looks at a scene and how her visual
attention is distributed between the different objects visualised. A consistent pattern of
eye movements, called ”saccade and fixate” strategy can be found in all humans; a saccade
occurs when a person is rapidly switching her attention from one point to another and a fix-
ation is an eye movement that occurs when a subject focuses at a static object. If the object
observed is moving, the fixation movement is replaced by a smooth pursuit. According to
Land and Tatler (2009), there are two reasons for those different movements: first, the area
of the eye with maximum acuity of vision, the fovea, covers only one four-thousandth of
the retinal surface, so the eye needs to move to place the desired object’s projected image
on this area. Second, gaze must remain still on the target object for the photo-reception
process to acquire an unblurred image.
During saccadic movements the eye is effectively blind due to the motion blur and active
suppression; therefore, to model the player’s visual attention, we will focus on fixations and
smooth pursuit movements. A fourth movement that Land and Tatler (2009) identify
in the human eye movement repertoire is vergence; however, it is not considered in this
methodology as it occurs only when looking at objects at different distances from the viewer.
This kind of movement does not occur while looking at a virtual environment on a screen,
as even objects at different virtual distances have the same physical distance to the eyes.
These two movements are not necessarily accurate indicators of cognitive processing, since a person is able to process visual information also in the area surrounding the gaze position, and the gaze itself can be driven automatically towards salient image areas without an active cognitive process. However, in a visually rich virtual environment such as a computer game, the locus of attention coincides with the gaze direction (Irwin, 2004) and, moreover, due to the task-oriented nature of games, fixations and pursuits are mainly attention driven (Land, 2009).
The detailed characteristics of the aforementioned eye movements are as follows:

Saccade Saccades are fast, stereotyped, jump-like movements of the eyes with speeds up
to 900◦ /s. Saccades can be motivated by external stimuli, such as an object appearing
in the field of view, or by internally generated instructions. These movements occur
before and after a smooth pursuit or a fixation movement to relocate the visual focus.

Fixation Fixations are pauses in the eye scanning process over informative regions of inter-
est. During these pauses the eye is able to collect visual information about the fixation
target area. They are usually intervals between two consecutive saccadic movements
and they usually have a duration of at least 100 ms. Longer fixation duration implies
more time spent on interpreting, processing or associating a target with its internal-
ized representation. During a fixation, the eyes make small jittery motions, generally
covering less than 1-2 degrees. This measure is called fixation dispersion and it is one
of the measures usually calculated in order to classify a sequence of gaze records as a
fixation.

Smooth pursuit Smooth pursuits are a class of movements similar to fixations and they
can be observed when the gaze follows a moving object. During a smooth pursuit
the velocity will be similar to the target’s speed, up to 15◦ /s. Above this velocity
the smooth pursuit will be interrupted by catch-up saccades, as the eye is unable to
follow the target.

We represent the virtual camera behaviour as the amount of time the player spends
framing and observing different objects in the game environment while playing the game.
This representation of behaviour is chosen over a direct model of the camera position and
motion as it describes the behaviour in terms of the content visualised by the camera and,
therefore, it is independent of the absolute position of the avatar, the camera and other
objects. The features related to the time spent observing objects are calculated as the sum
of the durations of the smooth pursuit and fixation movements of the eyes during which the
gaze position falls within an object’s projected image.
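As a sketch, given a log of already classified eye movements (the record format below is hypothetical), the per-object observation times could be accumulated as follows:

    def observation_times(movements):
        # Sum fixation and smooth-pursuit durations per observed object.
        # Each movement is assumed to be a dict such as
        # {'type': 'fixation', 'duration': 0.24, 'object': 'copper'},
        # where 'object' is the object whose projected image contains the
        # gaze position (None when the gaze falls on no object).
        times = {}
        for m in movements:
            # Saccades are discarded: the eye is effectively blind during them.
            if m['type'] in ('fixation', 'smooth_pursuit') and m['object']:
                times[m['object']] = times.get(m['object'], 0.0) + m['duration']
        return times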
The gaze and camera behaviour described by these features is game dependent; therefore
a different model should be built for each game. To collect the data needed to build such
a model, the participants of the data collection experiment should play the game manually
controlling the camera (using either the mouse or a joypad) while a logging system records


Figure 4.3: Example of a data collection setup. The player plays a game and manually
controls the virtual camera. The game keeps track of the virtual camera placement, the
player behaviour and the gaze position on the screen; gaze position is recorded using a
gaze-tracking device. The software used in the setup portrayed in the figure is the ITU
Gaze Tracker (http://www.gazegroup.org), and the sensor is an infra-red camera.

the three-dimensional virtual camera position, the gaze position on screen and the in-game
events.
The gaze position on screen can be recorded using a gaze-tracking system; these systems
usually employ one or more cameras tracking the orientation of the eyes and the head
position. Figure 4.3 shows an example of the setup needed during the data-collection phase:
the experiment participant plays the game at a desk sitting on a chair with rigid back and
no wheels (to reduce head movements), the gaze is tracked using an infra-red camera to
detect and track eye features.

4.1.2 Behaviour Identification

To investigate and create a model of the relationship between camera behaviour, player
behaviour and game-play, we analyse the collected data through two steps: identification
and prediction. In the first step, we use a clustering technique to extract the relevant
camera behaviours and analyse their characteristics, while in the second step, we build a
model based on this categorisation able to predict the correct camera behaviour given a set
of game-play and player behaviour data.
The number of distinct camera behaviours as well as their internal characteristics can be
based, in part, on domain knowledge. One can infer camera behaviour profiles inspired by a


theoretical framework of virtual cinematography (Jhala and Young, 2005) or, alternatively, follow an empirical approach — as the one suggested here — to derive camera behaviour
profiles directly from data. The few existing frameworks focus primarily on story-driven
experiences with little or no interaction and are thus not applicable in our context. Therefore,
we adopt a data-driven approach and we employ clustering on the gaze-based extracted
features for the purpose of retrieving the number and type of different camera behaviours.
Before clustering can be applied, the data has to be divided into records, each describing a single triple of player behaviour, gameplay and camera behaviour. Depending on the type of game and the length of the recorded experience, the records might vary in temporal and spatial length. The observation times describing the gaze behaviour should be normalised for each record depending on the record's length.
The granularity of the subdivision into records depends on the game characteristics, and three different criteria can guide this process: time, space and task. According to the task criterion, the logged data for each player is divided into records, each representing the experience for one task completed by the player. While this solution is probably the most appropriate, as it is based on the game characteristics, it is often inapplicable as some parts of the game might have unclear tasks or multiple ones active at the same time. The space criterion is simpler to apply as many games contain well defined areas, e.g. Devil May Cry (Capcom, 2001) or Halo (Microsoft, 2001); therefore, it is easy to identify the beginning and the end of each record. In case neither of the previous criteria can be applied, the overall experience can be divided into time chunks of variable length depending on the game characteristics. The player behaviour in each game chunk is described in quantitative terms, and the number and type of the factors varies depending on the game. Moreover, depending on the complexity and variety of the game mechanics, the records might be sorted into different groups according to the type of interaction, so that multiple gameplay-specific models can be built.
After having divided the data in records, a clustering algorithm applied on the camera
behaviour data can be used to identify the types of behaviour represented by the collected
data. The algorithm employed in the data analysis presented in this thesis is K-means (Mac-
Queen, 1967), a simple and well known algorithm for data clustering. In K-means, each
record can be thought of as being represented by some feature vector in an n-dimensional
space, n being the number of all features used to describe the objects to cluster. The algo-
rithm then randomly chooses k points in that vector space; these points serve as the initial
centroids of the clusters. Afterwards, each record is assigned to the centroid it is closest
to. Usually, the distance measure is chosen by the user and determined by the learning task.
After that, a new centroid is computed for each cluster by averaging the feature vectors of


Figure 4.4: An example of a fully connected feed-forward artificial neural network. Starting from the inputs, all the neurons are connected to all neurons in the subsequent layer.

all records assigned to it. The process of assigning records and recomputing centroids is
repeated until the process converges.
K-means requires initial knowledge of the number of clusters k present in the data to minimise the intra-cluster variance. To overcome this limitation, the algorithm can be run with progressively higher k values and the clusters generated at each run can be evaluated using a set of cluster validity indexes such as the Davies-Bouldin (1979), the Krzanowski-Lai (1988) or the Calinski-Harabasz (1974) index. The number of clusters can then be selected using a majority voting mechanism; the algorithm runs a number of times for each k and the run with the smallest within-cluster sum of squared errors is picked.
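With an off-the-shelf implementation, this search for k could be sketched as follows (scikit-learn is one possible tool here, not necessarily the one used for the experiments in this thesis):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

    def score_cluster_counts(records, k_range=range(2, 9), restarts=10):
        # Run K-means for increasing k and report two validity indexes.
        X = np.asarray(records)   # one camera-behaviour feature vector per record
        scores = {}
        for k in k_range:
            # n_init restarts; scikit-learn keeps the run with the lowest
            # within-cluster sum of squared errors automatically.
            labels = KMeans(n_clusters=k, n_init=restarts).fit_predict(X)
            scores[k] = {
                'davies_bouldin': davies_bouldin_score(X, labels),        # lower is better
                'calinski_harabasz': calinski_harabasz_score(X, labels),  # higher is better
            }
        return scores  # pick k by majority voting across the indexes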

4.1.3 Prediction

Once the camera behaviour patterns are identified, we proceed by modelling the relationship
between game-play and camera behaviour types. More precisely, since the model is intended
to select the most appropriate camera behaviour that fits the player’s preferences in the
game, we attempt to approximate the function that maps the game-play behaviour of the
player to the camera behaviour. For this purpose, we use Artificial Neural Networks (ANNs), which are chosen for their universal function approximation capacity.
An ANN is a bio-inspired data structure used to model arbitrary non-linear mappings
between inputs and outputs, typically used for pattern recognition and association (Haykin,
2008). The computation is performed with a number of neurons, the neurons are connected


in a graph, and typically propagate signals through the network (see Figure 4.4). The
signal’s path through the network is controlled by both the structure of the network and
the weights, w, that are placed at each connection between neurons. Each neuron in the
network transfers the input values to its output according to a transfer function $f_i^T$ defined as follows:

$$y_i = f_i^T\left(\sum_{j} w_{ji}\, x_{ji} + \theta_i\right) \quad (4.1)$$

where $y_i$ is the output of the $i$-th neuron, $x_{ji}$ is the $j$-th input of the $i$-th neuron, $w_{ji}$ is the weight of the $j$-th input of the $i$-th neuron, and $\theta_i$ is the bias of the $i$-th neuron.
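In code, Eq. (4.1) amounts to a single weighted sum per neuron; a sketch with a tanh transfer function, one common choice, is:

    import math

    def neuron_output(inputs, weights, bias, transfer=math.tanh):
        # Eq. (4.1): weighted sum of the inputs plus bias, passed through f^T.
        return transfer(sum(w * x for w, x in zip(weights, inputs)) + bias)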
For the network to produce the desired association between its inputs and its outputs,
it is necessary to find the configuration of size (number of nodes and layers) and weights of
the connections that generate the desired function. The process of finding these parameters
is called learning; depending on the application of the network and the knowledge of the
function to be approximated, three types of learning can be performed: supervised, rein-
forcement and unsupervised. For the purpose of this thesis we will focus only on supervised
learning; however, the reader is advised to refer to Haykin (2008) for fundamentals on ANNs
and other learning methods.
In supervised learning, a set of valid input-output pairs, called training set, are used
to train the network. At each training iteration, an input vector is presented to the ANN
along with a corresponding vector of desired (or target) outputs, one for each neuron, at the
output layer. The discrepancy between the desired and actual response of each output node is calculated and this error is used to correct the weights of the network according to
a learning rule.
One of the most common learning rules for training a neural network is back-propagation
(Rumelhart et al., 1986). This algorithm is a gradient descent method of training in which
gradient information is used to modify the network weights to minimise the error function
value on subsequent tests of the inputs. Using back-propagation, it is possible for a feed-
forward artificial neural network to learn the association between the inputs and the outputs
in the training set; moreover, if coupled with techniques such as early stopping (Caruana
et al., 2001), the resulting network will be able to generalise the association also to previously
unseen inputs.
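A sketch of such supervised training with early stopping, using scikit-learn's multi-layer perceptron as one possible implementation (the network topology shown is illustrative):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def train_behaviour_model(player_features, camera_clusters):
        # Fit an ANN mapping in-game behaviour features to the camera
        # behaviour cluster of each record in the training set.
        net = MLPClassifier(hidden_layer_sizes=(10,),  # topology is illustrative
                            early_stopping=True,       # hold out data to stop training
                            validation_fraction=0.2,
                            max_iter=2000)
        net.fit(np.asarray(player_features), np.asarray(camera_clusters))
        return net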
ANNs and back-propagation can be used to learn the connection between the player
in-game behaviour and her camera behaviour; however, to be able to adapt the camera
behaviour, the network needs to be trained to predict the future player choice over camera
instead, so that the automatic camera controller can be instructed to generate the desired
camera behaviour exactly at the moment the behaviour is needed.


Figure 4.5: Camera behaviour prediction scheme. The neural network is presented with the features describing the gameplay characteristics of the next record and the features about the previous player behaviour as input, and returns the predicted camera behaviour for the next record as the output.

For this purpose, the ANN should be trained to learn the association between the camera behaviour cluster at the i-th record and the player's in-game behaviour in the previously recorded records (from 0 to i − 1). Moreover, if the game includes different types of challenges, the ANN should also consider the characteristics of the gameplay at record i (see Figure 4.5).
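Assembling the network input according to this scheme could be sketched as follows (the record fields are hypothetical):

    def prediction_input(past_records, next_gameplay):
        # Concatenate the player-behaviour features of records 0..N-1 with
        # the gameplay (content) features of record N, whose camera
        # behaviour is to be predicted. The flattening below is illustrative.
        features = []
        for record in past_records:          # records 0 .. N-1
            features.extend(record['player_behaviour'])
        features.extend(next_gameplay)       # content features of record N
        return features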

4.2 Adaptation

Using the model of camera behaviour built in the data collection experiment it is possible
to predict the player's camera preferences during the game and use this information to
instruct an automatic camera controller and personalise the cinematographic experience.
When the player performs a switch from one chunk (area, task or time frame, depending on
the subdivision criterion) to the next one, the prediction of the neural network can be used
to decide which camera profile should be assigned to instruct the camera controller for the
next chunk of the game.
The same input-output scheme depicted in Figure 4.5, used for training, should be used
for prediction. The computational model is able to predict the camera behaviour given
information about the way the player played up to a certain point of the game. Therefore,
before the right camera profile can be selected for the player it is necessary that she plays
at least one area for each area type. A variable number of previously played game chunks


might be necessary to achieve the best accuracy of prediction. The length of the past tracks depends on the characteristics of the game; therefore, it is necessary to find the best configuration through an empirical test at the time of training.
Once the next camera behaviour is detected, a camera profile — i.e. a set of frame
and motion constraints — should be generated to instruct the automatic camera controller.
This translation process can be approached in two ways: either a designer assigns a custom
designed camera profile to each behaviour identified by the clustering algorithm — which
is then automatically picked by the adaptation mechanism — or the profile is generated
completely based on the selected behaviour’s characteristics.
Independently of the approach chosen, the translation process between gaze based cam-
era behaviours and camera profiles is hardly generalisable over different games as the number
and the quality of the objects present on screen varies considerably. In most action games
— e.g. Tomb Raider (Eidos Interactive, 1996) or Super Mario 64 (Nintendo, 1996) — the
camera can be instructed to follow the main avatar and maintain the visibility of the objects
which have received visual attention in the camera behaviour. The weights of the frame
constraints imposed on each object can be related to the amount of time spent observing objects of the same kind, as this information reflects the amount of attention that an object receives.
Such an approach can be applied to virtually any game which features an avatar and
a third person camera and it is the one used in the test case presented in Chapter 8 for a
three-dimensional platform game. The constraints imposed on the avatar and on the other
objects included in the behaviour can be changed to alter the overall composition. For very
different game genres, such as strategy games or racing games, different strategies can be
developed to transform the behaviours into camera profiles.
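For a third-person game of this kind, the translation step could be sketched as follows (the constraint naming follows the frame constraints of Chapter 3; the proportional weighting is an assumption of this sketch):

    def behaviour_to_profile(observed_times):
        # Translate a gaze-based camera behaviour into weighted frame
        # constraints. Each object type that received visual attention
        # yields a visibility constraint whose weight is proportional to
        # the normalised time spent observing objects of that kind.
        total = sum(observed_times.values()) or 1.0
        profile = [('avatar', 'ObjectVisibility', 1.0)]  # always frame the avatar
        for obj_type, time_spent in observed_times.items():
            profile.append((obj_type, 'ObjectVisibility', time_spent / total))
        return profile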

4.3 Summary

This chapter presented a methodology to model the relationship between camera placement
and playing behaviour in games and to generate adaptive camera behaviours. The method-
ology combines player’s gaze, gameplay and virtual camera data to build a model of camera
behaviour based on the player’s in-game behaviour. The data collected is clustered using
k-means to identify relevant behaviours and the relationship between the clusters and the
game-play experience is modelled using three ANNs. Finally, an adaptation mechanism
based on the aforementioned model is presented including a method for translating the
camera behaviour clusters into camera profiles suitable for an automatic camera controller.
An application of the method described in this chapter and its evaluation is presented
in Chapter 8.

Chapter 5

Test-bed Environments

This chapter1 presents the virtual environments developed to evaluate the algorithms
methodologies presented in this thesis. The test environments will be described in detail
and their choice will be motivated in terms of completeness and generality.
Two test environments are presented in this chapter with two different purposes. The
first, a virtual camera sandbox, is an application developed to test the numerical perfor-
mance of optimisation algorithms and to evaluate the characteristics of the camera com-
position problem. The sandbox will be used in Chapter 6 to test the algorithms employed
in CamOn and to analyse the complexity of different camera composition problems. The
second test environment is a 3D computer game developed to test the proposed camera con-
troller and the camera adaptation method against human players and to evaluate their effect on player experience. On this basis, this game will be used in the experiments presented in both Chapter 7 and Chapter 8.

5.1 Virtual Camera Sandbox

The first test-bed environment presented in this thesis is a development sandbox featuring a three-dimensional virtual environment and a series of virtual characters with which the environment can be populated to yield camera control problems.
The purpose of such a tool is to stand as an abstraction layer between the user and the virtual environment, allowing her to easily define camera composition problems and test her algorithms. The long term aim of the sandbox development is to create a flexible
testing platform that contains a set of standard and replicable camera control problems
usable as common benchmarks within the research community.
At the time this manuscript is written, the sandbox is still at an early stage of development; however, it already offers the possibility to build composition problems with any
1 Parts of this chapter have been submitted for publication in (Burelli and Yannakakis, 2012b,a).


Figure 5.1: Interface of the virtual camera sandbox, showing the virtual environment and
the interface elements.

number of subjects and frame constraints. The sandbox allows the user to test the behaviour of custom solving techniques on dynamic and static problems; moreover, it allows her to analyse the complexity of such problems.

5.1.1 Functionalities

The virtual camera sandbox is a development tool designed to act as a testing platform for
virtual camera control algorithms. For this purpose, the sandbox contains a virtual envi-
ronment composed of a set of areas (see Fig. 5.1) with different geometrical characteristics.
The user can place any number of characters in the environment and assign to each of them
a set of frame constraints.
The interaction between the sandbox user and the sandbox can be described through
the three following steps:

1. The user connects to the sandbox and selects the area in which to perform the exper-
iment.

2. The user places a number of subjects in the environment and sets the frame con-
straints.


Figure 5.2: Main components of the virtual camera sandbox. From left to right: the subject, the forest, the house and the square.

3. The user conducts the evaluation.

The sandbox supports four possible area choices: forest (see Fig. 5.2b), square (see Fig. 5.2d), house (see Fig. 5.2c) and everywhere. The first three options restrict the experiment to the respective area; the last option configures the sandbox to allow sampling in any point of the virtual environment.
In the second step, a number of subjects can be placed anywhere within the previously
selected area. After placement, each subject will be affected by the physical model of the
virtual environment; therefore, it will move until it reaches a stable condition. At this point,
it will be possible to assign one or more frame constraints to the subject just placed in the
environment. The sandbox supports all the constraints defined in Section 3.1, i.e. Object
Visibility, Object Projection Size, Object Frame Position and Vantage Angle.
The evaluation step can be performed with two different aims: to evaluate an algorithm's behaviour or to analyse the problem's objective function. In the first case, the sandbox can be used as a black box providing the objective function value of any camera configuration during the optimisation process. In the second case, the sandbox provides several functions to sample the objective function, visualise it in the virtual environment and save it to a log file for further analysis.
The evaluation can be conducted on either a static or a dynamic problem. In the first case, the subjects stand still in the position where they have been placed initially. In the second case, the subjects move in random directions, turning at random times and avoiding the obstacles in the environment. The user can set the movement speed and can control the animation flow through the play, pause and next functions.

5.1.2 Environment Elements

The sandbox allows the user to place a set of subjects (see Fig. 5.2a) within the virtual
environment to define the camera composition problem. These subjects are human-like 3D
characters approximately 1.8 meters high (the measure is defined in proportion to the rest


Figure 5.3: Maximum value of the objective function sampled for each area of the sandbox across the X and Z axes. The position and orientation of each subject is identified by the black marks. The composition problem evaluated in each figure contains three frame constraints: an Object Visibility, an Object Projection Size and a Vantage Angle constraint.

of the environment elements). Each subject can be placed anywhere in the environment
and, when placed, it will rest on the ground in a standing posture. The orientation of the
subject can be selected by the user; however, the subject can be rotated only along the
vertical axis.
The virtual environment in which the subjects can be placed is a collection of three
distinct environments: forest, house and square. Each environment features different char-
acteristics and challenges in terms of camera composition and animation, and overall they
represent a selection of typical game environments. This section will present the three areas and will give a short description of the characteristics of each area, using a Long Shot problem as a representative example, to motivate their inclusion. An in-depth analysis of the camera control problem in each area will be presented in Chapter 6.
The three test environments have been selected to provide the widest possible range of
geometrical features commonly present in computer games and interactive virtual environ-
ments. Moreover, they have been designed to incorporate a wide variety of camera control
challenges with different levels of complexity. They include one indoor and two outdoor environments with different geometrical characteristics: a forest, a house and a square.
The forest environment (see Fig. 5.2b) is an outdoor virtual environment composed of a cluster of trees; the subjects are placed between these trees, which act as partial occluders and scattered obstacles. As it is possible to observe from the sample landscape depicted in Fig. 5.3a, such an environment influences the objective function landscape by increasing the search space modality; this is mostly due to the fact that the tree trunks are thin occluders which produce a slicing effect in the objective function landscape.
The second environment, the house (see Fig. 5.2c), is an indoor environment with closed
spaces separated by solid walls. As described by Burelli and Yannakakis (2010b), walls act


Figure 5.4: Virtual camera sandbox architecture.

as large occluders inducing large areas of the objective function landscape to have little or
no visibility gradient, as visible in Fig. 5.3b.
The last environment, the square (see Fig. 5.2d), is the simplest one from a geometrical
perspective. It is largely an empty space with one single obstacle placed in the center.
This environment is expected to be the simplest in terms of optimisation complexity, as the
lack of obstacles produces mostly landscapes with smooth gradients and a small number of
modalities. Figure 5.3c illustrates an example of such a smooth objective function landscape.

5.1.3 Architecture

The sandbox is a stand-alone application developed using the Unity 3D game engine. It contains a virtual environment including the aforementioned elements and a camera composition objective function evaluation module.
As it is possible to see in Fig. 5.4, the sandbox can be controlled by the user via a TCP socket. This communication method has been chosen for two reasons: socket communication allows the sandbox to run on a machine different from the user's, and sockets are supported by most programming languages.
At the time this document is written, the sandbox includes libraries for both Java and
C++; these languages have been initially selected as most of the optimisation algorithms
in the literature have been implemented in either of the two. To conduct experiments on
the sandbox, a user needs only to include the sandbox library in her project and
use the methods to connect, build the camera control problem and evaluate the objective
function.
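For illustration, a client session might look like the following sketch; the command names and wire protocol shown here are purely hypothetical, as the actual library API may differ:

    import socket

    class SandboxClient:
        # A hypothetical line-based protocol: one command per line,
        # one reply per line. The real sandbox API may differ.
        def __init__(self, host='localhost', port=5000):
            self.sock = socket.create_connection((host, port))
            self.reader = self.sock.makefile()

        def send(self, command):
            self.sock.sendall((command + '\n').encode())
            return self.reader.readline().strip()

    client = SandboxClient()
    client.send('SELECT_AREA house')                      # hypothetical command
    client.send('ADD_SUBJECT 1.0 0.0 2.5 90')             # position + rotation
    client.send('ADD_CONSTRAINT 0 ObjectVisibility 1.0')  # constraint + weight
    fitness = float(client.send('EVALUATE 0.0 1.7 -3.0 0.4 1.2'))  # camera x,y,z,h,v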
The user can also interact with the sandbox through its user interface; however, through the interface it is only possible to set up a problem and test a set of standard optimisation functions included in the software.


Figure 5.5: Main components of the Lerpz Escape game. From left to right: player's avatar (Lerpz), a platform, a collectible item (fuel canister), an enemy (copper), a respawn point and Lerpz's spaceship.

5.2 Lerpz Escape

The second test environment described in this thesis is a three-dimensional platform game
designed to evaluate the applicability and the effects of automatic camera control in com-
puter games. Within the context of this thesis, this test environment is used to evaluate
both our automatic camera control system and our approach to camera behaviour modelling
and automatic camera adaptation against human players.

5.2.1 Game Mechanics

The game is a custom version of Lerpz Escape, a tutorial game by Unity Technologies2 . It
features an alien-like avatar (Lerpz, see Fig. 5.5a) trapped in a futuristic 3D environment
made of floating platforms (see Fig. 5.5b). The platforms can be flat, or be composed
by multiple smaller platforms with different heights clustered together. Each platform can
be connected to another platform through a bridge or be disconnected, in which case, the
avatar is required to jump to move from one platform to the other.
The game includes three main elements/objects that the avatar can interact with: fuel
canisters, coppers and re-spawn points. Fuel canisters (see Fig. 5.5c) are floating items that
the player needs to collect to complete the game. Coppers (see Fig. 5.5d) are animated
robots which chase the avatar and hit it until it falls from a platform. Coppers are normally
static and get activated when the avatar enters the platform they are placed on. The player
can kill a copper by moving the avatar close to it and hitting it three times; killing allows
the player to collect one extra fuel canister which spawns out of the robot’s wreckage. Re-
spawn points (see Fig. 5.5e) are small glowing stands placed on some platforms. When the
avatar touches a re-spawn point this becomes activated; each time the avatar falls from a
platform it reappears on the last activated re-spawn point.
2 http://www.unity3d.com


Figure 5.6: The two virtual environments employed in the user evaluation: (a) the tutorial level and (b) the main level. The avatar is initially placed at the right side of the map, close to the dome-like building, and has to reach the space-ship at the left side of the map.

The goal of the player is to control Lerpz and make him reach his spaceship (see Fig. 5.5f)
at the end of the level. However, the spaceship is surrounded by a force field which does not
allow any access. To deactivate the field, Lerpz has to collect a number of fuel canisters (2
for the tutorial level and 6 for the main level) across the virtual environment. Moreover,
the task has to be performed within a predefined time window, otherwise the current level
will stop and the game will proceed to the next level whether the goal has been achieved or
not.
The game designed for this evaluation stands as a prototypical platform game, as it
incorporates the typical aspects of the genre. According to the classification described by
Wolf (2001), platform games are defined as “games in which the primary objective re-
quires movement through a series of levels, by way of running, climbing, jumping, and other
means of locomotion”; moreover, according to Wolf, such games often “involve the avoidance
of dropped or falling objects, conflict with (or navigation around) computer-controlled char-
acters, and often some character, object, or reward at the top of the climb which provides
narrative motivation”.
Therefore, we can reasonably assume that the results of the evaluation performed using


Figure 5.7: The three different area types met in the test-bed game: (a) fight area; (b) jump area; (c) collection area.

the aforementioned game are generalisable to the whole platform game genre. Moreover, still according to Wolf's classification, it is possible to hypothesise that such results would, at least partially, apply also to genres such as Adventure and Obstacle Course, which have
a very similar interaction scheme.
Moreover, the game has been designed not only to maximise the applicability of the
evaluation results, but also to incorporate a highly interactive gameplay that minimises
narrative. The reason for this choice is our desire for the evaluation of automatic camera
control to focus on interaction rather than on story telling.

5.2.2 Stages and Areas

The game consists of two stages: a tutorial and the main level. Each level starts with the
avatar standing on the initial re-spawn point, as shown in Fig. 5.8a and Fig. 5.8b, and ends
when the avatar reaches the spaceship in the last platform. The tutorial level includes a
walk-through of the game controls and an explanation of the purpose and the effects of
the different objects available. Moreover, during the tutorial the player is presented with
all the four main challenges she will face during the main game: jumping from platform
to platform, facing an enemy (i.e. a copper), collecting items (i.e. fuel canisters) and
reaching the end of the level (i.e. the spaceship).
In the tutorial level the subjects have to face one copper, collect at most five fuel canisters
and activate two respawn points. Only one copper is present in the main level, while fuel


Figure 5.8: Screenshots from the game used during the evaluation, displaying the different control schemes: (a) manual camera control; (b) automatic camera control. The game interface displays the game controls configuration, as well as the current number of collected canisters and the time remaining to complete the level.

canisters and respawn points are 10 and 3, respectively.


The two stages are composed of a series of sub-stages, named areas, which are classified
as jump, fight or collection areas. The areas are categorised according to the game-play
experience they offer and the type of challenge they pose to the player. In case an area offers
more than one challenge type, the category is defined by the most threatening challenge.
The challenges are sorted in decreasing level of threat as follows: fight, jump and collect.
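This precedence rule can be stated compactly in code; the sketch below (in Python, with illustrative names that are not part of the game's implementation) shows the labelling logic:

    # Threat precedence used to label mixed areas: fight > jump > collect.
    THREAT_ORDER = ["fight", "jump", "collect"]

    def classify_area(challenges):
        """Label an area by the most threatening challenge type it contains."""
        for label in THREAT_ORDER:
            if label in challenges:
                return label
        raise ValueError("area contains no known challenge type")

    # An area offering both canisters to collect and gaps to jump over is a jump area.
    assert classify_area({"collect", "jump"}) == "jump"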
Figure 5.7a shows a fight area where the main threat is given by the opponent copper
at the center of the platform. The jump area depicted in Fig. 5.7b is composed of several
small floating platforms; the player needs to make the avatar jump across all the platforms
to complete the area. Figure 5.7c shows an area where the main task of the player is to
collect the fuel cells placed around the platform.

5.2.3 Controls

The game features two possible control schemes along with two possible camera control
schemes: manual camera control and automatic camera control. Figure 5.8a shows how the
game controls are displayed when the camera is controlled manually, while Fig. 5.8b shows how
the same controls are presented when the camera is controlled automatically. Instructions
on the level objective and what action to perform next are displayed in the lower left corner.
Control of avatar movements is common to both schemes: pressing the up arrow moves
the avatar forward (with respect to its local orientation), while pressing the down arrow
moves the avatar slowly backward. The left and right arrows correspond respectively to a
leftward rotation and a rightward rotation. Pressing a horizontal arrow key at the same
time as a vertical arrow key will make the avatar curve towards the selected direction.


On the contrary, the buttons used to activate the punch and jump actions differ between
the two camera control schemes. If the player controls the camera, the left mouse button
activates the punch action and the right mouse button activates the jump action. If the
camera is controlled by the automatic camera control system, the punch action is activated
by pressing the Z key and the jump action is activated by pressing the X key. The difference
of the punch and jump controls between the two camera control schemes has been introduced
after a pilot evaluation showed that the players felt uncomfortable controlling the jump
and punch actions using the mouse buttons when the mouse was not also used to control the camera.
The last difference in the game controls between the two schemes lies in the control of
the camera. When the camera is not automatically animated by the automatic camera
controller, the player controls the camera position and orientation using the mouse. In this
case, the camera automatically follows the avatar at a fixed distance and angle: the player
can control the angles by moving the mouse and the distance using the mouse scroll wheel.
Moving the mouse forward increases the vertical angle, while moving it backward decreases
it; leftward and rightward mouse movements analogously control the horizontal angle.
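The manual scheme is, in essence, an orbit in spherical coordinates around the avatar; the following minimal sketch (in Python, with illustrative names; the actual game was built in Unity 3D and its code is not reproduced here) shows the geometry such a control maps onto:

    import math

    def orbit_camera_position(avatar, yaw, pitch, distance):
        """Place a chase camera on a sphere around the avatar.

        avatar: (x, y, z) world position; yaw and pitch in radians;
        distance in metres. The camera is assumed to look at the avatar.
        """
        x = avatar[0] + distance * math.cos(pitch) * math.sin(yaw)
        y = avatar[1] + distance * math.sin(pitch)
        z = avatar[2] + distance * math.cos(pitch) * math.cos(yaw)
        return (x, y, z)

    # Mouse input maps onto the three parameters: forward/backward motion changes
    # pitch, leftward/rightward motion changes yaw, the scroll wheel changes distance.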
The choice of a mouse plus keyboard control scheme for the game, although not ideal
for the genre, is motivated by the necessity to make the game playable in a browser. Such a
choice allows us to conduct experiments with wider audiences, reduced experimentation effort
and a more believable playing environment (the participants play in a location of their
choice), which possibly leads to a more genuine experience.

5.2.4 Technical Characteristics

The game has been developed using the Unity 3D game engine and built for the web player
platform, in order to be played via a web browser. The avatar model is composed of
3334 triangles, platforms are composed on average of 400 triangles each, enemies of 1438
triangles each, fuel canisters of 260 triangles each, respawn points of 192 triangles each
and the spaceship of 1548 triangles. Overall, the tutorial level geometry comprises
approximately 15500 triangles with 5 light sources, while the main level comprises
approximately 20000 triangles with 6 light sources. All objects are skinned with one or more
textures and both stages contain approximately 86.2 MB of textures.

5.3 Summary

The chapter presented the two 3D applications that will be used in the experiments pre-
sented in Chapter 6, Chapter 7 and Chapter 8. The two environments have been developed
to be able to evaluate both the algorithmic performance of camera controller and the effect
of automatic camera control and adaptation on human players.


In summary, the first test environment is a development tool (a sandbox) that allows
us to easily create a wide variety of camera composition problems, to evaluate the resulting
objective function and to test different camera control algorithms. The second test environ-
ment is a 3D platform game integrating all the typical gameplay elements of the genre.
It has been developed to support both automatic and manual camera control, to assist the
design of comparative user survey experiments that aim to evaluate the differences between
camera control paradigms.

Chapter 6

Hybrid Genetic Algorithm Evaluation

This chapter1 intends to evaluate the optimisation algorithm at the core of the automatic
camera control architecture described in Chapter 3. For this purpose, we propose an analysis
of the real-time virtual camera composition problem from a dynamic optimisation perspec-
tive and, based on this analysis, we present a comparative study of the proposed solution
against a set of state-of-the-art algorithms in automatic camera control. The performance
of this hybrid GA is evaluated on its ability to generate well composed shots in real-time in
a set of unpredictable dynamic problems. The results obtained are compared against the
optimisation algorithms described in Chapter 3: Genetic Algorithm, Differential Evolution,
Particle Swarm Optimisation, Sliding Octree and Artificial Potential Field. The compar-
ative study is conducted on a set of scenarios representing the different challenges usually
met in virtual camera optimisation. In each scenario, the algorithms are evaluated on their
reliability and accuracy in the optimisation of four different problems: optimising the visi-
bility, optimising the projection size, optimising the vantage angle and the combination of
these problems. Moreover, each problem in each scenario is tested with one, two or three
subjects. To describe these test problems and to better understand the performance of the
algorithms, the objective functions connected to each test problem are analysed and a set
of complexity measures are suggested. The problems are evaluated on their optimisation
complexity both as static and dynamic optimisation problems.

1 The content of this chapter is currently at the second stage of review for the Computer Graphics Forum journal (Burelli and Yannakakis, 2012b).

6.1 Test-bed Problems

This section presents the set of 36 dynamic virtual camera problems designed to evaluate
the proposed hybrid GA. This study includes 12 dynamic composition problems tested in
the 3 virtual environments described in Section 5.1.2. The 12 composition problems include
the optimisation of a Vantage Angle constraint, a Projection Size constraint, a Visibility
Constraint and the combination of all three constraints. These four conditions are evaluated
with one, two and three subjects. In each test problem, the subjects are positioned and
move around the environment in a random fashion. The subjects speed is also randomised
and varies from 0 to the average human walking speed (1.38 m/s). The subjects are shaped
as stylized human beings, and have human-like proportions and height (approx. 1.8 m tall).
The test problems include up to three moving subjects, with up to three frame constraints
imposed on each subject. We believe that such a range of problems covers a large number of
the possible virtual camera composition tasks in terms of optimisation challenges, making
them an ideal benchmark to evaluate and compare the performances of the algorithms
included in this thesis.

6.2 Complexity

The choice of appropriate case studies is a key aspect of any comparative analysis since they
affect both the validity and the extensibility of any conclusions derived. Moreover, to analyse
and compare the performance of the proposed algorithm fairly, it is necessary to define
common complexity measures (Törn et al., 1999) within common test problems (Floudas
and Pardalos, 1990). Such measures should express how hard it is to find and track the global
optima of a certain optimisation problem.
In our effort to measure the complexity of each scenario examined we identified four
different complexity heuristics: one fitness-based measure and three measures inspired by
the complexity analysis of Törn et al. (1999). Moreover, since we aim to analyse a set of
dynamic virtual camera composition problems and to evaluate the algorithms on them, we
also include a measure of the problem's dynamism on top of these four measures.

6.2.1 Fitness-based Measures

The average fitness of a population of solutions has been used as a complexity measure
for occlusion minimisation test problems (Burelli and Yannakakis, 2010b). This measure,
however, did not prove to be fully reliable as the algorithms tested did not perform according
to the estimated problem complexity. Moreover, the average fitness does not consider any
topological aspect of the fitness landscape, so it gives limited space for analysis and is
not considered in this thesis.
Jones and Forrest (1995) proposed the correlation between fitness and distance to a
global optimum as a statistical measure of complexity. Fitness Distance Correlation (FDC)
is generally considered a valid complexity measure but, as Jones and Forrest point out,


it fails to capture the complexity of problems that exhibit scattered fitness landscapes —
i.e. landscapes that contain multiple and densely distributed local optima, separated by
areas with little or no gradient. As seen in Figure 5.3, the objective functions we consider
clearly showcase a scattered fitness landscape, potentially with a large number of sparse
local optima. The complexity of such landscapes is only poorly represented by a linear
correlation. Even though the above disadvantages limit the use of FDC for the description
of virtual camera control complexity, FDC is included in this study as a reference.

6.2.2 Complexity Measures by Törn et al.

Törn et al. (1999) place global optimisation problems into four categories (unimodal, easy,
moderate and difficult) by analysing four key aspects of the problem:

• The size of the region of attraction of global optima, p∗ . The p∗ region is defined as
“the largest region containing global optima, xm , such that when starting an infinitely
small step strictly descending local optimisation from any point in p∗ then xm will be
found each time.”

• The number of affordable objective function evaluations in a unit of time.

• If and how embedded the global optima are, i.e. whether there exist local optima near
the global optima region of attraction, so that sampling around these leads to the
detection of better solutions and, eventually, of a global optimum.

• The number of local optima.

The above-mentioned aspects may be transposed to three complexity measure values:
P, E and NoL (number of local optima). The P value represents the probability that the
region of attraction is missed when randomly sampling the solution space for one frame
time (16.6 ms) and is defined as follows:

P = (1 − p∗)^Nf     (6.1)

where p∗ is the percentage of the whole search space being a region of attraction of global
optima and Nf denotes the affordable number of objective function evaluations within 16.6
ms.
The E value measures the degree to which the global optima are embedded within local
optimal solutions, which further implies that search near those local optima may lead to
optimal solutions. The value of E is given by:
E = 1 − (1/(n · Dmax)) · Σ_{i=1}^{n} Dmin(xi)     (6.2)

75
Chapter 6. Hybrid Genetic Algorithm Evaluation

where Dmin (x) is the distance to the closest global optima region of attraction from each
solution x and Dmax is the maximum distance between two points in the search space.
In the experiment’s search space, this distance equals the length of the parallelepiped’s
diagonal. While this formula is not defined mathematically by Törn et al., the proposed
formalisation is as close as possible to their description.
The last complexity measure considered, NoL, is the relative size of the areas of the
search space containing local optima and it is defined as follows:

NoL = Alo / Ago     (6.3)

where Alo and Ago are the sizes of the areas of space containing respectively local optima
and global optima.
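To illustrate how the three measures can be estimated from a sampled landscape, the following Python sketch assumes that the samples have already been flagged as lying in global-optima regions of attraction or in areas containing local optima (the flags are the product of the sampling analysis and are taken as given here; all names are illustrative):

    import numpy as np

    def static_complexity(points, in_global_attr, in_local_areas, n_f, d_max):
        """Estimate P (Eq. 6.1), E (Eq. 6.2) and NoL (Eq. 6.3) from landscape samples.

        points:          (n, d) array of sampled camera configurations
        in_global_attr:  boolean mask, True where a sample falls inside a region
                         of attraction of a global optimum
        in_local_areas:  boolean mask, True where a sample falls inside an area
                         containing local optima
        n_f:             affordable objective evaluations within one frame (16.6 ms)
        d_max:           length of the search-space parallelepiped's diagonal
        """
        p_star = in_global_attr.mean()            # relative size of p*
        P = (1.0 - p_star) ** n_f                 # probability of missing p*

        # Distance from every sample to the nearest sampled point of p*.
        d_min = np.linalg.norm(
            points[:, None, :] - points[in_global_attr][None, :, :],
            axis=2).min(axis=1)
        E = 1.0 - d_min.mean() / d_max            # embeddedness of global optima

        # With uniform sampling, sample counts are proportional to area sizes.
        NoL = in_local_areas.sum() / in_global_attr.sum()
        return P, E, NoL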
The measures derived from the study of Törn et al. (1999) appear to be appropriate for
capturing problem complexity of virtual camera control. The three measures, collectively,
draw a broader picture of problem complexity than FDC and, moreover, P considers the
evaluation time as a key component of problem complexity. The latter is an important
aspect within the analysis of the camera control paradigm since execution time plays a
central role for this optimisation problem.

6.2.3 Dynamism

Since the problems presented in this study incorporate a set of dynamic elements, the op-
timisation complexity cannot be analysed only in static terms. We need, therefore, to
define a measure of how complex the problems are in terms of dynamism. Such a measure
should capture the speed and the magnitude of the changes in the objective function and
should give a comprehensive measure for the whole landscape. For this purpose, we define
a measure D of the landscape dynamism calculated as follows:
D = (1/(K · T)) · Σ_{i=1}^{K} Σ_{j=0}^{N−1} |f(xi, tj+1) − f(xi, tj)|     (6.4)

where K is the number of samples taken from the landscape at each sampling, T is the
duration of the animation, N is the number of times the landscape is sampled during the
animation, f(xi, tj) is the objective function value of the sample xi at the j-th sampling time
and f(xi, tj+1) is the objective function value of the sample xi at time tj+1.
This measure is a discrete approximation of the average absolute change in the objective
function for all the points in the solution space over a unit of time. High D values correspond
to highly dynamic problems, while a D value equal to 0 corresponds to a static optimisation
problem.
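As an illustration, the discrete approximation amounts to summing absolute successive differences over a matrix of sampled objective values (a Python sketch; the variable names are ours):

    import numpy as np

    def dynamism(samples, duration):
        """Approximate D (Eq. 6.4).

        samples:  (K, M) array; samples[i, j] is the objective value of the i-th
                  landscape sample at the j-th sampling instant (M instants in total)
        duration: length T of the animation in seconds
        """
        K = samples.shape[0]
        # Absolute change between consecutive sampling instants, summed over
        # all samples and instants, normalised by K and T.
        return np.abs(np.diff(samples, axis=1)).sum() / (K * duration)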


P E NoL FDC D
Forest
Angle 1 0.00 1.00 0.47 -0.87 1.34
Angle 2 0.02 1.00 0.45 -0.68 0.94
Angle 3 0.10 1.00 3.45 -0.73 0.77
Size 1 0.64 0.91 823.63 -0.01 0.14
Size 2 0.96 0.71 8401.20 0.00 0.15
Size 3 0.94 0.73 6382.89 -0.01 0.09
Visib. 1 0.52 1.00 46.10 -0.16 0.04
Visib. 2 0.84 0.59 158.48 -0.20 0.05
Visib. 3 0.97 0.56 17439.26 -0.06 0.03
All 1 0.47 0.85 879.82 -0.03 0.49
All 2 0.53 1.00 1267.34 -0.07 0.20
All 3 0.78 0.95 3343.36 -0.09 0.33
House
Angle 1 0.00 1.00 0.00 -0.86 1.15
Angle 2 0.12 1.00 3.23 -0.70 0.93
Angle 3 0.14 1.00 5.57 -0.70 0.89
Size 1 0.63 0.93 131.96 -0.03 0.18
Size 2 0.94 0.69 2773.30 -0.02 0.09
Size 3 0.94 0.67 4017.58 -0.02 0.09
Visib. 1 0.63 0.95 95.04 -0.16 0.05
Visib. 2 0.95 0.79 3186.23 -0.07 0.03
Visib. 3 0.97 0.81 14262.47 -0.05 0.04
All 1 0.64 0.99 2912.77 -0.05 0.44
All 2 0.84 0.99 6249.59 -0.09 0.31
All 3 0.89 0.75 8614.36 -0.06 0.33
Square
Angle 1 0.00 1.00 0.39 -0.92 1.08
Angle 2 0.15 0.88 11.40 -0.63 0.90
Angle 3 0.13 0.91 3.99 -0.74 0.85
Size 1 0.57 0.94 221.38 -0.05 0.17
Size 2 0.93 0.77 696.86 -0.04 0.11
Size 3 0.95 0.55 1902.31 -0.03 0.10
Visib. 1 0.49 0.98 54.32 -0.21 0.06
Visib. 2 0.80 0.92 533.79 -0.24 0.06
Visib. 3 0.90 0.88 1445.53 -0.18 0.06
All 1 0.35 0.94 631.58 -0.04 0.41
All 2 0.69 0.89 1633.66 -0.09 0.34
All 3 0.82 0.95 2497.88 -0.08 0.34

Table 6.1: Complexity measures for each scenario.


6.2.4 Scenario categorisation by problem complexity

Table 6.1 presents the values of the aforementioned complexity measures for each test prob-
lem investigated. All the values have been calculated by sampling the solution space with
a resolution of 0.5 meters on the X and Z axis, 1 m on the Y axis and by sampling linearly
32 rotations per position. All the problems have been evaluated by sampling the land-
scape every half second for 10 seconds; this process has been repeated 20 times for each test
problem to minimise the effects of the initial random placement of the subjects. Each mea-
surement contains 28000 samples, for a total of 560000 samples per repetition. The static
complexity values contained in Table 6.1 are averaged over the whole number of samples
and repetitions, while the dynamism measure is an average of the dynamism calculated in
every repetition. The FDC values are also presented in Table 6.1 for comparison purposes.
It is clearly shown that FDC follows the trend of the other static complexity measures
proposed in most scenarios; however, the value becomes non-representative of the problem
complexity in the most complex cases, in which the correlation value is always close to 0
despite the complexity differences revealed by the measures defined by Törn et al. This
problem is expected, as FDC is known to be inaccurate for scattered fitness landscapes and
it does not consider the objective function evaluation complexity.
From a static optimisation perspective, the test problems are sorted into 4 categories
following the ranges proposed by Törn et al. (1999). According to this categorisation, the
vantage angle test problems with one subject are categorised as unimodal since their P
value equals 0. The other two vantage angle test problems are categorised as easy. These
scenarios have a smaller global optima attraction area and higher evaluation times resulting
in P values greater than 0. All of the aforementioned scenarios are also highly embedded
(high E values), meaning that, even if the search algorithm ends in a local optimum,
there are high chances that it can still converge to a global optimum if a small random
perturbation is applied. The projection size and visibility test problems with one subject
also fall in the easy category in all test environments; these problems all have a
probability of missing the global optima attraction region of around 50-60% and have highly
embedded local optima (E > 0.91); however, the number of local optima is more than an
order of magnitude higher than the previous problems, identifying these problems as slightly
more complex than the previous ones. More precisely, according to the categorisation by
Törn et al., the first problems fall in the E1 category while the second ones in the E2.
The projection size and visibility test problems with more than one subject are categorised
as moderate in the square virtual environment, since they demonstrate a high P value and
embedded local optima, while they are categorised as difficult in the other two environments
due to the lower E value. Interestingly, the combined problems do not stand as the most
complex ones; on the contrary, they exhibit a complexity that varies from easy to moderate.

[Scatter plot: each test problem plotted by its dynamism D (x axis) against its landscape complexity P (y axis); marker shapes distinguish the Visibility, ProjectionSize, Angle and All problems, and squares mark the non-dominated problems.]

Figure 6.1: Test problems sorted in a scatter plot according to their dynamism factor D
and their landscape complexity P. The Pareto front identified by the squares contains all
the non dominated problems in terms of complexity and dynamism.
When analysing the problems according to their dynamism, a different picture emerges:
the visibility and projection size problems all have a significantly lower D value than the
angle problems, revealing that the latter are the most complex set of problems from a
dynamic optimisation perspective. The reason for such a difference in terms of dynamism
is expected, as the angle objective function depends on the subjects' orientation; therefore,
even a small orientation change, in terms of degrees, has an amplified effect on the objective


function landscape proportional to the distance to the subject.


Due to the contradiction between static and dynamic complexity, we need to sort the
problems in a two dimensional space, defined by these two measures. Figure 6.1 shows the
problems sorted according to P on the Y axis and D on the X axis; it is apparent how
the combined problems — i.e. the ones including all the three frame constraints — stand
in the middle of the graph, demonstrating to have an average complexity in terms of both
dynamism and static optimisation complexity. The problems identified by a square symbol
in Fig. 6.1 belong to the Pareto front; a solution belongs to the Pareto front if it is not
dominated by any other solution in the solution space. A Pareto optimal solution cannot
be improved with respect to any objective without worsening at least one other objective;
thus, this group of problems is the one that can be identified as the most complex both
in terms of dynamism and static optimisation complexity. We identify this group as the
most representative subset of problems and, therefore, we concentrate the evaluation of the
algorithms on these 11 problems:

1. A visibility constraint on 3 subjects in the forest

2. A visibility constraint on 3 subjects in the house

3. A projection size constraint on 2 subjects in the forest

4. All frame constraints on 3 subjects in the house

5. All frame constraints on 3 subjects in the square

6. All frame constraints on 1 subject in the house

7. All frame constraints on 1 subject in the forest

8. A vantage angle constraint on 2 subjects in the square

9. A vantage angle constraint on 2 subjects in the house

10. A vantage angle constraint on 2 subjects in the forest

11. A vantage angle constraint on 1 subject in the forest

The first three problems (1,2,3) are among the ones categorised as difficult in terms of
static optimisation complexity, while they have extremely low dynamism. We name these
problems as difficult-slow. Following the same principles, problems 4,5,6,7 are labelled as
average, problems 8,9,10 as easy-fast and problem 11 as easy-faster.


6.3 Algorithms Configurations

The algorithms employed in the comparative study have all been configured based on ex-
tensive experimentation and known successful settings.
Differential Evolution is extremely robust to the choice of parameters (Storn and Lampinen,
2004; Ali and Törn, 2004); therefore, the algorithm has been configured for this study follow-
ing the parameter settings suggested by Storn (1996) as the ideal ones for a 6-dimensional
optimisation problem: the population consists of 60 individuals, the crossover proba-
bility C is 0.8 and the differential weight Wd is 0.9.
Particle Swarm Optimisation has been employed in camera control for off-line camera
optimisation combined with a space pruning technique (Burelli et al., 2008). Based on the
results of that study, the algorithm has been implemented using a population of 64 particles,
a cognitive factor of 0.75, a social factor of 0.5 and an inertia weight that linearly decreases from an
initial value of 1.2 to 0.2. The cognitive factor has been increased to 0.75 to increase the
diversity in the population and improve the algorithm’s ability to deal with a dynamic
objective function.
The settings of both Genetic Algorithm and hybrid genetic algorithm have been chosen
based on the successful parameter settings of our previous study on occlusion minimisa-
tion (Burelli and Yannakakis, 2010b): the population consists of 80 individuals, the crossover
probability is 100%, the mutation probability is 30% and the selection scheme is ranking
with all individuals selected for breeding. The main difference lies in the replacement policy,
which is generational as it is better suited for dynamic optimisation.
The configuration of the Artificial Potential Field follows the settings described in the
authors’ previous work on automatic camera control using Artificial Potential Field (Burelli
and Jhala, 2009b). The configuration of the Sliding Octree Algorithm follows the suggestions
described in (Bourne et al., 2008); therefore, the scaling factor at each pass is 0.75,
the number of passes is 15 and the maximum movement distance is 1.
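For reference, the settings above can be gathered in one place; the dictionary below merely transcribes the text (the key names are our shorthand, not taken from the cited works):

    # Parameter settings used in the comparative study (transcribed from the text).
    # The hybrid GA shares the GA settings below.
    ALGORITHM_CONFIGS = {
        "DE":  {"population": 60, "crossover_prob": 0.8, "diff_weight": 0.9},
        "PSO": {"particles": 64, "cognitive": 0.75, "social": 0.5,
                "inertia": (1.2, 0.2)},  # linearly decreasing over the run
        "GA":  {"population": 80, "crossover_prob": 1.0, "mutation_prob": 0.3,
                "selection": "ranking", "replacement": "generational"},
        "SO":  {"scaling_factor": 0.75, "passes": 15, "max_move_distance": 1.0},
    }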

6.4 Results

The experiment consists of a comparative analysis of the performance of the six optimisation
algorithms. Each algorithm is run to find and track the optimal solution on the 11 dynamic
virtual camera composition problems described in Section 6.2. All algorithms are evaluated
on each problem for 10 seconds, during which, the subjects move in the environment as
described in Section 6.2. The best solution found by the algorithms is recorded at each


Hyb. GA    GA    DE    PSO    APF    SO


Difficult-Slow
F. Visib. 3 0.61 0.27 0.56 0.43 0.14 0.25
H. Visib. 3 0.51 0.23 0.35 0.30 0.07 0.14
F. Size 2 0.10 0.32 0.31 0.24 0.16 0.22
Average
H. All 3 0.38 0.28 0.35 0.35 0.21 0.30
S. All 3 0.43 0.28 0.41 0.39 0.23 0.32
H. All 1 0.72 0.34 0.40 0.31 0.28 0.23
F. All 1 0.89 0.35 0.38 0.32 0.43 0.25
Easy-Fast
S. Angle 2 0.65 0.46 0.47 0.51 0.66 0.46
H. Angle 2 0.65 0.53 0.49 0.48 0.66 0.49
F. Angle 2 0.66 0.51 0.50 0.52 0.63 0.48
Easy-Faster
F. Angle 1 0.49 0.52 0.49 0.47 0.49 0.48

Table 6.2: Average best function values (ABFV) reached by each search algorithm. The
first column indicates the test environment (F: forest, H: house, S: square), the frame
constraint and the number of subjects. The best values are highlighted.

algorithm iteration. All the evaluations are repeated 50 times to minimise the effects of
the initial random placement and the random movements of the subjects.
In this section, we showcase the performance of the proposed optimisation method in
comparison with the other five search algorithms. The algorithms are compared with respect
to the complexity of the task to be solved and the time taken for the algorithm to reach a
solution. All experiments are executed on an Intel Core i7 950 3.06 GHz (the implemented
algorithms use only one core) with 4 GB of RAM at 1333 MHz. The performance of the
algorithms is compared for accuracy and reliability. Accuracy is measured using the average
best function value measure suggested by Schönemann (2007), while reliability is measured
as the average amount of time in which the algorithm is unable to find a solution with an
objective function value higher than 0.25.

6.4.1 Accuracy

The average best function value (ABFV), obtained by each algorithm across 50 runs on
each test problem is used to measure how accurately the algorithm finds and tracks a global
optimum. This value is calculated as the mean of the median run, which is defined as the
list of median values across all the runs at each moment of the computation.
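Given a matrix holding the best objective value at each recorded step of each run, the computation is a two-step reduction; a Python sketch with illustrative names:

    import numpy as np

    def abfv(best_values):
        """Average best function value (Schönemann, 2007).

        best_values: (runs, steps) array holding the best objective value
        recorded at each iteration of each run. The median run is the
        per-step median across runs; the ABFV is the mean of that median run.
        """
        median_run = np.median(best_values, axis=0)
        return median_run.mean()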
Table 6.2 contains the accuracy values of the hybrid GA compared to the accuracy values
of the other 5 algorithms on all test problems. The values highlighted in bold in each table
row have no significant difference in accuracy between them — i.e. the result of the pairwise
t-test with the other highlighted algorithms yields a p value greater than 0.01 — but they
show a significantly higher accuracy when compared to the non-highlighted values.
From the presented results, it is evident that the hybrid GA is capable of finding and
tracking the best solution across the vast majority of the test problems. In particular, the
algorithm appears to constantly achieve the best accuracy in the visibility tasks in all virtual
environments. Moreover, it also demonstrates the best accuracy in the optimisation of the
problems including all the frame constraints, independently of the number of subjects and
the type of virtual environment.
If we focus the analysis on a comparison between the hybrid GA and the two algorithms
on which this algorithm is based (GA and APF), we can observe two facts: in the average
and difficult-slow problems, the hybrid GA is able to achieve accuracy often more than two
times higher than the classical GA and the APF, while in the easy-fast and easy-faster the
hybrid GA and the APF achieve similar results. This behaviour reveals that the scheduler
and the early convergence detection method described in Section 3.3 are effectively able
to accelerate the convergence of a classical GA while maintaining a sufficient population
diversity to adapt to the changes of the objective function. However, in one of the difficult-
slow optimisation problems and in the easy-faster problems, the classical GA achieves higher
accuracy. This happens for two distinct reasons: inaccuracy of the approximated projection
size derivative and the high dynamism of the problems.

[Two line plots of the median best objective function value over time (0-10 s) for Hybrid GA, GA, DE, PSO, APF and SO.]

(a) Forest - Projection Size - 2 (b) Forest - Angle - 1

Figure 6.2: Median best solution value over time (median run) for each algorithm on the
problems in which the proposed hybrid GA fails to achieve the best results.


Hyb. GA    GA    DE    PSO    APF    SO


Difficult-Slow
F. Visib. 3 1.07 4.48 1.03 2.38 7.96 5.02
H. Visib. 3 0.77 4.63 3.49 3.62 8.61 6.85
F. Size 2 9.14 4.43 4.33 6.11 8.15 6.52
Average
H. All 3 2.27 4.66 2.88 2.63 6.95 3.75
S. All 3 1.85 4.67 2.01 2.41 6.37 3.89
H. All 1 1.28 4.12 3.84 4.99 4.66 6.36
F. All 1 0.17 4.38 4.05 5.62 4.05 6.11
Easy-Fast
S. Angle 2 0.78 1.48 2.12 1.52 0.72 1.92
H. Angle 2 0.60 1.42 1.89 1.73 0.73 1.07
F. Angle 2 0.78 1.25 1.88 1.38 0.88 1.82
Easy-Faster
F. Angle 1 3.66 2.45 2.74 2.85 2.66 3.15

Table 6.3: Average time during which each algorithm's best available solution has an
objective function value lower than 0.25. The time is expressed in seconds; the duration of
each run is 10 seconds. The best values, i.e. the lowest times, are highlighted.

In the projection size optimisation problems, the algorithm fails to accurately optimise
the function when more than one subject is included in the problem. The approximated
derivative becomes inaccurate when dealing with more than one subject, misleading the
search algorithms, as it is possible to observe in the median run of both GA and APF
depicted in Fig. 6.2a. This is even more evident if we consider that the same algorithm with
only one subject achieves the best accuracy in all environments (these problems are not
part of the Pareto front in Fig. 6.1; therefore, they have not been included in the tables).
In the vantage angle problem with one subject (easy-faster ), the hybrid algorithm is
unable to track the optimum due to the high dynamism of the problem, and the scheduler is
unable to effectively detect the early convergence of the algorithm. This results in the
oscillating behaviour of both the hybrid GA and APF visible in Fig. 6.2b. The latter
aspect, in particular, requires future investigation, as the detection of early convergence, in
a highly dynamic environment in which it is impossible to rely on value stagnation, is still
an open research problem.

6.4.2 Reliability

We define the concept of reliability as the ability of the algorithm to find and keep track
of an acceptable solution over time and it is measured as the average amount of time in
which the algorithms are unable to provide an acceptable solution during each run of the


dynamic experiment; the lower this time, the more reliable the algorithm. A solution
is considered acceptable if its fitness is above a certain fraction of the ideal value of the
objective function (f = 1); for this experiment, we compare the algorithms’ reliability with
the following threshold: F ≥ 0.25. To measure this value, we run each algorithm 50 times,
measure the amount of time each algorithm provides a solution with a value lower than
0.25 for each run and we calculate the average of all the runs on each test problem. Table
6.3 shows the values of the reliability measure for each algorithm across the different test
problems.
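Assuming the best objective value is recorded at a fixed interval during each run, the reliability measure can be computed as follows (a Python sketch with illustrative names):

    import numpy as np

    def mean_failure_time(best_values, dt, threshold=0.25):
        """Average time (s) per run without an acceptable solution.

        best_values: (runs, steps) array of best objective values over time
        dt:          wall-clock time between consecutive recorded steps, in seconds
        """
        below = best_values < threshold   # steps lacking an acceptable solution
        return below.sum(axis=1).mean() * dt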
The reliability findings exhibit a pattern very similar to the accuracy results, in which
the hybrid GA appears to perform best on all average and easy-fast problems, with fail
times down to 1.7% of the total experiment time. Also similarly to the accuracy results,
the performance decreases drastically in two problems: the projection size problem with two
subjects and the vantage angle problem with one subject. It is clear that reliability is also
affected by the imprecise approximation of the projection size derivative and by the failure
of the early convergence detection method.

6.5 Summary

This chapter presented an experimental evaluation of the optimisation algorithm at the core
of the automatic camera control architecture described in Chapter 3.
To test this algorithm, we identify the building blocks of the camera control problem and
we study the complexity of each block. In particular, we identify three principal objective
functions, we relate these to the state-of-the-art definition of frame constraints and we
systematically analyse the complexity of each objective function and their combination.
As a result of this analysis, we identify 11 test problems which represent a wide range of
problem complexity values in terms of both static and dynamic optimisation.
These problems compose a benchmark on which we evaluate the performance of the
proposed hybrid meta-heuristic algorithm and compare it with other five search algorithms
widely used in automatic camera control. The algorithms are compared on two aspects: ac-
curacy and reliability. Accuracy is evaluated using the average best function value (ABFV)
measure suggested by Schönemann (2007), while reliability is measured as the average
amount of time in which the algorithm is unable to find a solution with an objective func-
tion value higher than 0.25.
The proposed search algorithm is able to find and track solutions with the highest ac-
curacy and reliability on most test problems; however, the results achieved in two test
problems reveal the main limitations of the proposed algorithm. The accuracy and reliabil-
ity demonstrated by the algorithm in the optimisation of projection size for more than one


subject are low, as the algorithm is unable to provide an acceptable solution, on average, for
91% of the time. Similar performance is manifested also by the standalone APF algorithm,
suggesting that the approximated projection size objective function derivative used by the
APF operator misleads the search process. The hybrid GA appears also unable to accu-
rately and reliably optimise the most dynamic of the problems considered: a vantage angle
constraint on one subject in the forest environment. We believe that such a behaviour is
the consequence of an imprecise detection of early convergence, which induces the scheduler
to constantly switch between the two policies.
However, the results emerging from the optimisation of the problems including all the
frame constraints are very positive: the algorithm shows the best accuracy and reliability
independently of the number of subjects and the type of virtual environment. This kind
of problem is the most realistic and the most common encountered in interactive virtual
environments, where most of the time the camera needs to follow an avatar and a number
of other objects.

Chapter 7

Controller Evaluation

In this chapter1, we intend to evaluate the automatic camera control architecture described
in Chapter 3 and, in general, test the suitability of automatic camera control to three dimen-
sional computer games with respect to its impact on player experience. For this purpose,
an implementation of the automatic camera control architecture has been integrated into
a custom built three-dimensional platform game including all the stereotypical aspects of
the game genre. The game has been published as an on-line experiment on the author’s
web page2, attracting 22 subjects as participants of the experiment, among which 16 were
males, 13 declared to be regularly playing games and 9 declared to be regularly playing
three-dimensional platform games. The age of the subjects is between 26 and 37 years.
Each participant was asked to play a custom version of the game presented in Section 5.2.
Each player initially plays a tutorial level (see Fig. 5.6a) to familiarise with the game; the
tutorial level is followed by two instances of the main level (see Fig. 5.6b), played alternately
with automatic camera control and manual camera control. At the end of each pair of
games, we collect information about the players' preferences and their performance in the
games. The remainder of the chapter describes the experimental protocol followed (see
Section 7.1), the experimental setup (see Section 7.2), the extracted features (see Section
7.3) and the analysis of the data collected (see Section 7.4).

1 The content of this chapter has been submitted to the ACM Transactions On Graphics journal (Burelli and Yannakakis, 2012a).
2 http://www.paoloburelli.com/experiments/CamOnLerpzEscapeEvaluation/

7.1 Experimental Methodology

Our experimental hypotheses are that players prefer to play without controlling the camera
and that players who play with an automatically controlled camera will achieve better
results in the game (e.g. more collected items, shorter time). To test these hypotheses we
have conducted a within-subject experimental evaluation, in which each subject plays the
main level with and without automatic camera control and, at the end of the second level
played, she or he expresses her or his preference between the two. The order of the levels
presented is randomised to eliminate potential order effects.
The participant is initially presented with a short description of the experiment and the
game objectives (see Fig. 7.1a). The subject chooses to start the game, by clicking on the
start button on the screen. The game then goes in full screen mode and the subject is asked
to fill in the following information (see Fig. 7.1b): age, gender, whether she normally plays
computer games, whether she normally plays 3D platform games and how often she plays
games in a week (e.g. from 0 to 2 hours per week or from 2 to 5 hours per week).

(a) Opening screen. (b) Demographic questionnaire.

(c) 4-AFC questionnaire of preference. (d) Motivations questionnaire.

Figure 7.1: Screen-shots of the game menus introducing the experiment and gathering
information about the subject and her experience.


At completion of the questionnaire, the subject starts to play the tutorial level (see
Fig. 5.6a). During this phase of the experiment, the player plays a short version of the
main game level, in which she can familiarise with the controls. To minimise the impact
of the learning effect, the camera control paradigm is randomised during the tutorial stage.
After the tutorial, the participant proceeds to the main part of the experiment, in which
she plays two times through the main game level (see Fig. 5.6b) with or without automatic
camera control.
At the end of this phase, the subject is asked to express a preference between the two
main game levels (Game A and Game B) just played (see Fig. 7.1c) and subsequently to
provide a motivation for the preference (see Fig. 7.1d). The preference is expressed through
a 4-alternative forced choice (4-AFC) questionnaire scheme. The preference questionnaire
includes four alternative choices: Game A, Game B, Neither or Both Equally. This scheme
has been chosen over a rating scheme for a number of advantages including the absence
of scaling, personality, and cultural biases as well as the lower order and inconsistency
effects (Yannakakis and Hallam, 2011). Moreover, a 4-AFC scheme, opposed to a 2-AFC
scheme, accounts also for cases of non-preference.
The motivations for preference can be picked out of six predefined options (labels) which
are as follows:

• It was simpler to play.

• I felt more immersed.

• I dislike losing control of the camera.

• The game looked more aesthetically pleasing.

• The camera was pointing in the wrong direction.

• The camera was moving oddly.

The first three motivations have been included to allow us to analyse the magnitude
of the impact of the camera control scheme on gameplay. The fourth motivation has been
included to gather more information about the role of aesthetics in camera control. The
last two motivations have been included to allow the participant to report for any undesired
behaviour of the automatic camera controller.


7.2 Experimental Setup

Since subjects participated in the experiment directly on their computers by loading and
running the game in a web browser, there is no obvious physical experimental setup. How-
ever, we can identify some common conditions among the subjects: all of them played the
game through a web browser and the game was played in full screen. All subjects played
a custom version of the game presented in Section 5.2: the game features one tutorial level
and two main levels. The two main stages differ in terms of the camera control paradigm:
in one stage, the player controls the camera manually and, in the other one, the camera is
controlled automatically.
In the stage featuring automatic camera control, the controller is instructed to position
and animate the camera so that the generated images display the following properties for the
avatar:

• The avatar should be fully visible; to achieve this, the Object Visibility property should
be equal to 1.0.

• The projection image of the avatar, either horizontally or vertically, should cover
approximately one third of the screen length; that means the Object Projection Size
property should be equal to 0.3.

• The avatar should be shot from above its right shoulder; that means the Object View
Angle property should be equal to 170 degrees horizontally and 25 degrees vertically.

• The avatar should appear at the crossing point of the lower horizontal line and the
left vertical line according to the rule of the thirds in photography; that means the
Object Frame Position property coordinates should be equal to (0.33,0.66).

All together, these properties describe an over-the-shoulder shot, which places the avatar
in the lower left quadrant of the screen. This quadrant has been chosen to balance the
image, since the controls are depicted on the right side of the screen, and to grant the
player a good view of the horizon. Figure 5.8b displays an example of how the game
is framed by the automatic camera controller. The numerical constraint parameters have
been selected after a testing phase in which we have searched for the values that matched the
desired composition principles.
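Taken together, these targets form a compact camera profile. The following sketch records them as data (in Python; the field names are illustrative and do not reproduce the controller's actual interface):

    # Over-the-shoulder profile used in the automatic camera control condition.
    AVATAR_SHOT_PROFILE = {
        "object_visibility":      1.0,            # avatar fully visible
        "object_projection_size": 0.3,            # ~1/3 of the screen length
        "object_view_angle":      (170.0, 25.0),  # horizontal, vertical (degrees)
        "object_frame_position":  (0.33, 0.66),   # rule-of-thirds crossing point
    }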

7.3 Collected Features

As already mentioned, the player is asked for her preference and the motivation for it
upon the completion of the two gaming sessions. Given that the two


games differ only in the way the camera is controlled, the self-reported preferences pro-
vide a valid indicator of the preferred camera control scheme and a set of arguments for
justifying this preference. Moreover, we can evaluate whether the implementation of the
camera control architecture presented in this thesis is able to provide a satisfactory camera
experience to the player in a real game context. To evaluate this aspect, we triangulate the
preference information with the average frame-rate and the number of algorithm iterations
per second. Statistical features of the in-game performance of the experiment participants
are also recorded and include the following:

• Level completion time.

• Number of collected canisters.

• Amount of damage inflicted.

• Amount of damage received.

• Number of falls from the platforms.

• Number of jumps performed.

• Number of respawn points activated.

These performance features hold important information about which aspects of the gameplay
are affected more by the camera control paradigm and what the magnitude of such impact is.

7.4 Results

As a first step in the analysis of the results, we test if the order of play affects the reported
preferences and a number of key gameplay features. The section continues with an analysis of
the effect of the order of play on the player performance and preferences (see Section 7.4.1),
and an analysis of the impact of the proposed camera controller on player preferences (see
Section 7.4.2) and performance (see Section 7.4.3).

7.4.1 Order Of Play

To check whether the order of playing game levels affects the users' performance, we test
the statistical significance of the difference between the features collected in the first game
level and in the second game level played. Table 7.1 contains the results of a paired two-
sample t-test between each feature collected in the first and in the second game. The
statistical test shows that the order of play does not significantly affect any of the features


Feature (F ) First (F1 ) Second (F2 ) p-value


Completion time 187.47 170.81 0.100
Canisters collect. 6.86 7.32 0.094
Damage inflicted 0.54 0.68 0.164
Damage received 0.59 0.86 0.233
Falls 1.22 1.45 0.277
Jumps 57.18 55.54 0.278
Respawns activ. 2.86 2.77 0.164

Table 7.1: Average features of the players with respect to the order of play and test of order
of play effect on the features. The p-value is the result of a paired-sample t-test between
the values recorded in the first game level and the values recorded in the second game level.
Features for which the null hypothesis can be rejected with a 5% threshold (i.e. there is a
significant effect of order of play on the feature) are highlighted.

describing the players' performance, as the null hypothesis cannot be rejected in any case
(all p-values > 0.05).
The p-value for the number of collected canisters is very close to the 5% threshold,
suggesting the presence of a weak influence of the order of play. Such influence can also be
seen in Fig. 7.3b where, in most cases, the fuel canisters difference is negative when the level
featuring the automatic camera controller is played as the second game. Such a difference
is most likely a learning effect caused by the canister positions which are the same in both
games; players appear to be more efficient in collecting them in the second game.
To check the effect of the order of play on the user’s preferences, we follow the procedure
suggested in (Yannakakis et al., 2008) and we calculate the correlation ro as follows:
ro = (K − J) / N     (7.1)
where K is the number of times the users prefer the first game level, J is the number of
times the users prefer the second game level and N is the number of games. The statistical
test shows that no significant order effect emerges from the reported preferences (ro = 0.000,
p-value = 0.500).
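As a small illustration (in Python), computing ro is a one-liner; the accompanying p-value, shown here as a two-sided binomial sign test via SciPy, is our assumption, since the thesis does not state how its p-values were obtained:

    from scipy.stats import binomtest

    def order_effect(K, J, N):
        """Correlation r_o of Eq. 7.1 plus a sign-test p-value (our assumption)."""
        r_o = (K - J) / N
        p = binomtest(K, K + J, 0.5).pvalue  # two-sided binomial sign test
        return r_o, p

    # With K = J (no order effect), r_o = 0, as reported in the text.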

7.4.2 Preferences

Figure 7.2a displays the total number of selections for each of the four options of the 4-AFC
questionnaire. The first noticeable fact is that 18 subjects (81.8%) have expressed a clear
preference between the two camera types. Nearly all of these subjects (17 out of 18) have
stated that they prefer playing the level in which the camera was automatically controlled
by our system. Therefore, the vast majority of the players prefer the automatic camera
scheme over the manual one, confirming our first hypothesis. The significance of this
choice is tested using the same formula employed to test the order effect with different K, J
and N values: K is the number of times the level featuring the automatic camera controller
was picked, whereas J is the number of times the level with manual camera control was
picked; N is the number of games where subjects expressed a clear preference (either
A < B or A > B; 2-AFC). The test reveals a strong positive correlation (ro = 0.889,
p-value = 0.001) confirming the presence of a significant influence of the camera control
scheme on the users' preference.

[Bar charts of the number of answers per option: Automatic, Manual, Both, Neither for the preferences; Difficulty, Immersion, Control, Aesthetics, Other for the motivations.]

(a) Preferences

(b) Motivations

Figure 7.2: Expressed preferences (7.2a) and corresponding motivations (7.2b). The bar
colours in the motivations chart describe which preference the motivations have been given
for.
The most common motivation, given by 15 subjects out of 17, is that the level with auto-
matic camera control was “simpler to play”; two of these subjects have explicitly stated that


Feature (F ) Automatic (Fa ) Manual (Fm ) p-value


Completion time 167.70 191.05 0.032
Canisters collect. 7.09 7.09 0.500
Damage inflicted 0.55 0.68 0.164
Damage received 0.82 0.64 0.314
Falls 1.59 1.09 0.093
Jumps 58.27 54.45 0.081
Respawns activ. 2.82 2.82 0.500

Table 7.2: Average features of the players with respect to camera control type. The third
column contains the p-values of a paired two sample t-test between the two measurements
across camera control type. The features for which the null hypothesis can be rejected with
a 5% of significance (i.e. there is a significant effect of the camera control scheme on the
feature) appear in bold.

the game was made simpler by the fact that the camera was controlled automatically. Such
results validate our initial hypothesis that players would prefer an automatically controlled
camera to a manually controlled camera in a virtual environment such as a 3D platform
game.
A very interesting fact that emerges from the subjects’ motivations (see Fig. 7.2b) is that,
although the automatic camera controller provided a well composed shot according to the
rule of the thirds, only one subject stated that she perceived the level with automatic camera
as “more aesthetically pleasing” and only four “felt more immersed”. Considering that the
automatic camera controller was explicitly instructed to provide properly composed shots,
such results might question the applicability of classical photography and cinematography
rules to computer games.
The four subjects (18.2%) that have expressed no preference between the two games
motivated this by stating that they did not like the controls in general and found both
games frustrating. It is worth noticing that one of these four subjects, despite having
expressed no preference between the two levels, stated in the motivations that she
preferred the camera to be automatically controlled.
The subject who stated a preference for the level featuring manual camera control moti-
vated her choice by saying that she personally “prefers a mouse+keyboard control for PC
games”. This answer, along with the fact that no subject reported any kind of unexpected
or annoying behaviour of the camera, allows us to state that the camera control system
is able to successfully position and animate the camera in a real game environment with
dynamic geometry and interactive events.


Feature (F) c(z) p-value


Completion time -0.389 0.037
Canisters collect. -0.111 0.311
Damage inflicted 0.056 0.403
Damage received 0.111 0.311
Falls 0.167 0.229
Jumps 0.222 0.160
Respawns activ. 0.000 0.500

Table 7.3: Correlation between collected features and reported preferences. Features for
which the null hypothesis can be rejected with a 5% threshold (i.e. there is a significant
correlation between preference and the feature) are highlighted.

7.4.3 Gameplay

Our second hypothesis is that, when players play with the automatic camera controller, their
performance in the game will increase. As described earlier in Section 7.3, we have extracted
and collected seven features to describe the players’ performance during the game. The
collected features have been sorted according to the camera control scheme (see Table 7.2)
and the two groups of measurements have been tested to assess whether their means differ
significantly. The first two columns of Table 7.2 contain the mean values of each feature
in the levels featuring the automatic camera controller and the ones in which the player
is requested to manually control the camera. The last column displays the p-values of a
two-sample paired t-test between the two groups of features.
These statistics reveal that only completion time is affected by the camera control scheme,
as it is the only feature for which it is possible to reject the null hypothesis. In other words,
between the games played with the automatic camera controller and without it, completion
time is the only feature that differs significantly (at the 5% threshold), while no significant
difference emerges for the other features. Figure 7.3a shows how the difference of
completion time between the automatically controlled and the manually controlled stages is
constantly equal to or below 0, except for 3 instances, further confirming that the automatic
camera controller helps reduce the time required to complete the game.
Two other features worth observing are the number of falls and the number of jumps.
In both tests (see Tables 7.1 and 7.2), the p-values are slightly above the 5% significance
level, meaning that these features are probably also mildly affected by the camera control
scheme. The average number of jumps is about 9% higher in the games played with automatic
camera control and the number of falls is about 32% higher. Moreover, the trend depicted in
Figure 7.3 shows that, for both features, the difference between the level with the automatic
camera controller and the one with a manually controlled camera is positive more than half
of the times. An explanation for these findings could be that the automatic camera controller
did not adequately support the jump action and, therefore, did not provide the optimal
view point for the player to jump between platforms, resulting in more attempts and more
falls.

[Per-participant bar plots of the differences Fa − Fm, with background shading indicating order of play and preference: (a) Completion time; (b) Fuel canisters; (c) Jumps; (d) Falls.]

Figure 7.3: Differences ∆F = Fa − Fm of completion time (7.3a), number of canisters
collected (7.3b), number of jumps (7.3c) and number of falls (7.3d) between the games
played with the automatic camera controller and the ones with manual camera control.
The background colour pattern shows which level was preferred and which level was played
first for each game played. If the dark grey bar is in the upper half of the plot, the level
featuring the automatic camera controller was preferred, and vice versa; analogously, the
light grey bar indicates which level was played first. The four features displayed have a
significant or close-to-significant (i.e. p-value < 0.10) correlation with either the order of
play or the camera control paradigm.
Finally, an analysis of statistically significant correlations between subjects' expressed
preferences and collected gameplay features has been carried out. Correlation coefficients
c(z) are calculated according to the following equation:

c(z) = Σ_{i=1}^{Np} (zi / Np)     (7.2)


where Np is the total number of games in which subjects expressed a clear preference; zi = 1 if the subject preferred the level with the larger value of the examined feature, and zi = −1 if the subject chose the other level. A significant negative correlation is observed between average completion time and reported preference (see Table 7.3), further validating the results of the survey: shorter completion times can be seen as a sign of lower gameplay difficulty, and decreased difficulty was widely reported as a motivation for preferring the automatic camera over the manual one.
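As a concrete illustration, the sketch below computes c(~z) for a set of hypothetical preference labels and estimates its significance; the two-sided binomial test is our assumption, since the thesis does not state which test was used for this coefficient.

    # Sketch of Eq. (7.2) on hypothetical preference labels; the binomial
    # significance test is our assumption, not taken from the thesis.
    from scipy.stats import binomtest

    def preference_correlation(z):
        """c(z) = sum(z_i) / Np, with z_i in {+1, -1}."""
        return sum(z) / len(z)

    # +1: the level with the larger feature value was preferred; -1: the other.
    z = [-1, -1, 1, -1, -1, 1, -1, -1, -1, -1, 1, -1]   # hypothetical data
    c = preference_correlation(z)
    # Under the null hypothesis both choices are equally likely, so the
    # number of +1 labels follows Binomial(Np, 0.5).
    p = binomtest(z.count(1), n=len(z), p=0.5).pvalue
    print(f"c(z) = {c:.2f}, p-value = {p:.3f}")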

7.5 Summary

The chapter presented a user evaluation of the automatic camera control architecture described in Chapter 3 in a commercial-standard three-dimensional computer game. Through this experiment, the performance of the camera control system and the impact of automatic camera control in third-person computer games are evaluated.

The pairwise preferences reported by the participants indicated that automatic camera control is perceived as an enhancement to the game, as it simplifies the interaction. Furthermore, the subjects reported no complaints about the camera behaviour, which, along with the dominating preference for automatic camera control, suggests that the camera control system was able to elicit a satisfactory experience without noticeable issues in the quality of the generated images and camera motion. The analysis of the features describing player performance in the game revealed that, when players did not control the camera, they completed the level in less time, confirming our hypothesis that the automatic camera controller would positively influence their performance.

However, when playing the level with the automatic camera controller, a large number of players had more difficulty performing jumps. The reason for such a negative impact is most likely the chosen camera profile, which did not support all the types of actions the player had to perform.


Chapter 8

Evaluation of Adaptive Camera

In this chapter¹ we investigate player preferences concerning virtual camera placement and
animation, by applying the methodology described in Chapter 4 to a 3D platform game. We
build a model of the relationship between camera behaviour, player behaviour and game-
play and we evaluate the performance of this model. For this purpose, data from player
gaze and the virtual camera motion is collected in a data-collection experiment and used
to identify and describe the players’ camera behaviours.
The obtained model is then employed to adjust in real-time the behaviour of an auto-
matic camera controller at the same 3D platform game. This camera adaptation process
is tested through a user evaluation experiment in which a set of participants are asked to
express their preference between a level of the game played featuring automatic camera
control with a static camera profile (identical to the controller employed in the evaluation
study of Chapter 7) and a level that features the adaptive camera profile approach.
The remainder of this chapter presents the models obtained from the first experiment, discusses their accuracy and finally presents the results of the user evaluation of the adaptation mechanism.

8.1 Camera Behaviour Models

Our first hypothesis is that camera behaviour depends on player’s actions and the game
context and that it is possible to predict the current camera behaviour given a description
of the current game situation and a track of the previous player actions. Therefore, we
expect that different players will focus on different object types in the same game situation
and that the same player will observe different types of object in different game situations.
Moreover, we expect to find a function that relates these differences to the game situation
and the previous actions of the players.
¹ Parts of this chapter have been published in Burelli and Yannakakis (2011) and Picardi et al. (2011).


To evaluate this hypothesis, we conducted an experiment in which we collected eye gaze, camera and game-play data from subjects playing a 3D platform game. In this experiment, the participants play a three-dimensional platform game featuring all the stereotypical aspects of the genre's mechanics.
The player controls, using keyboard and mouse, a humanoid avatar and the camera through the game while jumping on platforms, fighting with enemies and collecting objects (see Section 5.2). Camera behaviour is modelled using the combination of gaze and camera position at each frame. Combining gaze data with camera data allows a finer analysis of the player's visual behaviour, permitting us not only to understand which objects are visualised by the player, but also which ones are actually observed. This information permits us to filter exactly which objects are relevant for the player among the ones visualised through her control of the virtual camera. A cluster analysis of the collected gaze data is run to investigate the existence of different virtual camera motion patterns among different players and different areas of the game. Moreover, the relationship between game-play and camera behaviour is investigated through a linear analysis and also modelled using artificial neural networks. The results obtained support the initial hypothesis: relevant differences in camera behaviour were found among groups of players and among game situations, and the models built are accurate predictors of camera behaviour.

8.1.1 Experiment

A custom version of the three-dimensional platform game presented in Section 5.2 has been
developed as a test-bed for this study. Similarly to the original game, it features an alien-
like avatar in a futuristic environment floating in the open space. The player controls the
avatar and the camera using keyboard and mouse. Avatar movements, defined in camera-
relative space, are controlled using the arrow keys, and jump and hit actions are activated
by pressing the left and right mouse buttons, respectively. The camera orbits around the
avatar at a fixed distance; the player can change the distance using the mouse wheel and
can rotate the camera around the avatar by moving the mouse. The only difference between
the game used in this experiment and the original test-bed environment lies in the structure of the main game level. For the purpose of this study, the main game comprises 34 areas: 14 collection areas, 11 fight areas and 9 jump areas. The main level is longer than that of the original game, to maximise the amount of data collected from each participant. This is possible for two reasons: first, in this experiment the participants are expected to play only one instance of the main level, so the total duration of the experiment is shorter; and, second, the participants play the game in a controlled environment, so it is much less likely for them to drop out of the experiment before its conclusion.


Figure 8.1: Experimental setup used for the data collection experiment.

We represent the virtual camera behaviour as the amount of time the player spends
framing and observing different objects in the game environment while playing the game.
This representation of behaviour is chosen over a direct model of the camera position and
motion as it describes the behaviour in terms of the content visualised by the camera
and, therefore, it is independent of the absolute position of the avatar, the camera and
other objects. To get information about the objects observed by the player during the
experiment, we used the Eyefollower² gaze tracker. The Eyefollower is a device able to
locate the player’s gaze position on a computer screen through a combination of three
cameras. The first camera, placed on top of the screen, tracks the head position, while two
other motorised infra-red cameras, placed at the bottom of the screen, follow the player’s
eyes and track the point of regard (Fig. 8.1).
The Eyefollower requires an environment with fluorescent lighting and no daylight or incandescent lights, as the ambient infra-red light from these sources would interfere with the infra-red light emitted by the gaze tracker. In order to provide the best possible operational environment for the gaze tracker, the experiment is set in a room without windows.
The room contains a desk, a chair and two computers (see Fig. 8.1): the first computer
runs the computer game, while the second one collects the data from the game and the
player’s gaze, and synchronises the two streams into one log. The first computer is connected
to a 17” screen, a keyboard and a mouse; the second computer is only connected to the gaze tracker and, through a network cable, to the first computer. Running the gaze-tracking process and the game on different machines guarantees the necessary resources for both and minimises the risk of data loss due to heavy computation.
² Developed by LC Technologies, Inc. (www.eyegaze.com)


Twenty-nine subjects participated in the experiment. Twenty-four were male and the age of the participants ranged between 23 and 34 years (mean=26.7, standard deviation=2.41). Statistical data about game-play behaviour, virtual camera movement and gaze position is collected for each participant. The data collection is synchronised to the Eyefollower sampling rate; therefore, both the game data and the gaze-tracker data are sampled 120 times per second. Each data sample contains information about
the game-play, position and orientation of the camera, coordinates of the gaze position on
the screen and the number and the type of objects around the avatar. The objects are
classified into two categories: close and far. All the objects reachable by the avatar within
the next action are labelled as close, otherwise as far.
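For illustration, a minimal sketch of how one such synchronised log sample could be represented is given below; the field names are our own assumptions and do not reflect the actual log format used in the experiment.

    # Illustrative layout of one synchronised 120 Hz log sample; all field
    # names are assumptions, not the actual log format of the experiment.
    from dataclasses import dataclass, field

    @dataclass
    class LogSample:
        timestamp: float                  # seconds since the level started
        camera_position: tuple            # (x, y, z) in world coordinates
        camera_orientation: tuple         # camera rotation, e.g. Euler angles
        gaze_screen: tuple                # (x, y) gaze point on the screen
        gameplay: dict = field(default_factory=dict)       # e.g. {"jumps": 3}
        close_objects: dict = field(default_factory=dict)  # counts by type
        far_objects: dict = field(default_factory=dict)    # counts by type

    sample = LogSample(0.0083, (1.2, 3.4, -0.5), (0.0, 90.0, 0.0), (512, 300))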
The data is logged only during the time the participants play the game; this phase is
preceded by the calibration of the gaze-tracking system, a tutorial level and a demographics
questionnaire. However, this is only a step of the whole experimental procedure, which is
structured as follows:

1. The participant enters the experiment room and sits at the desk in front of the screen.

2. The screen and the cameras are placed 60 cm away from the participant's head and just below her line of sight.

3. The experimenter guides the participant through the gaze tracker calibration process.

4. The experimenter leaves the room and the participant starts the experiment by click-
ing on the start button on the screen.

5. The participant is presented with on-screen instructions about the game controls and proceeds to the tutorial level of the game.

6. After the tutorial, the participant is asked to answer a few demographic questions.

7. The participant plays through the game during which data is logged.

8. Once the game is over, the player exits the experiment room.

The total amount of time needed to complete the experiment is, on average, 13 minutes. No time limit is imposed, in order to give the players the opportunity to familiarise themselves with the game and choose the game style they prefer.


8.1.2 Collected Data Types and Feature Extraction

The data collected for each game is sorted into three datasets according to the three area
types described in Section 5.2. Each time a player completes an area two types of statistical
features are extracted from that area: game-play and camera related features. The features
of the first type are the experiment’s independent variables and encapsulate elements of
the player’s behaviour in the area. The features of the second type describe the camera
behaviour for each platform, therefore, they define the experiment’s dependent variables.
The player's behaviour is defined by the actions the player performs in each area or, more precisely, by the consequences of these actions. Hence, the features extracted describe the interaction between the player and the game through the avatar's actions, rather than the sequences of pressed keys. For each area the following features have been extracted: the number of fuel cells collected, damage given, damage received, enemies killed, re-spawn points visited and jumps performed. The features are normalised to a range between 0 and 1 using a standard min-max normalisation.
To model the camera behaviour, we analyse the content visualised by the camera instead of the camera's absolute position and rotation. The presence of a certain object on the screen, however, does not necessarily imply an intention of the player; e.g. the object might be on the screen only because it is close to an object the player is interested in. The available gaze data permits us to overcome this limitation since, using the gaze position, it is possible to understand which object is currently observed among the ones framed by the player. Therefore, we combine camera movements and gaze coordinates to identify the objects observed by the player at each frame and we extract the following statistical features: the relative camera speed as well as the time spent observing the avatar, the fuel cells close to the avatar, the enemies close to the avatar, the re-spawn points close to the avatar, the jump pads close to the avatar, the platforms close to the avatar and the far objects.
The seven features related to the time spent observing objects are calculated as the sum
of the durations of the smooth pursuit and fixation movements of the eyes (Yarbus, 1967),
during which the gaze position falls within an object’s projected image. These values are
normalised to [0, 1] via the completion time of each area. The first feature is the average
speed S of the camera relative to the avatar and it is defined as S = (Dc − Da )/T , where Dc
is the distance covered by the camera during the completion of an area, Da is the distance
covered by the avatar and T is the time spent to complete the area.
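The following sketch illustrates how these two feature families could be computed from per-frame position logs; the function names and data layout are illustrative assumptions, not the thesis's implementation.

    # Sketch of the two feature families defined above, assuming simple
    # per-frame position logs; names and data layout are illustrative.
    import math

    def path_length(points):
        """Total distance covered along a sequence of 3D positions."""
        return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

    def relative_camera_speed(cam_path, avatar_path, completion_time):
        """S = (Dc - Da) / T, as defined above."""
        return (path_length(cam_path) - path_length(avatar_path)) / completion_time

    def observation_fraction(gaze_event_durations, completion_time):
        """Summed fixation/smooth-pursuit time over an object type,
        normalised by the completion time of the area."""
        return sum(gaze_event_durations) / completion_time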
Each time the avatar leaves an area, the aforementioned features are logged for that
area. The data is then cleaned of experimental noise by removing all the records with a duration shorter than 3 seconds and the ones with no platforms or no enemies and fuel cells left. The remaining records are sorted into three separate groups according to the area type


Index             | Collection | Fight | Jump
Davies-Bouldin    |     3      |   3   |  8
Krzanowski-Lai    |     2      |   3   |  3
Calinski-Harabasz |     2      |   3   |  3
Dunn              |     2      |   2   |  2
Hubert-Levin      |     3      |   3   |  3

Table 8.1: Optimal K-values for the five validity indexes used to choose the number of clusters. The K-value is selected according to a majority voting scheme; the selected value is highlighted in the table for each area type.

and stored into three datasets, containing 239 records for the collection areas, 378 records
for the fight areas and 142 records for the jump areas.

8.1.3 Behaviour Models

As described in Chapter 4, we adopt a data-driven approach to the definition of the camera behaviour; therefore, we employ the k-means clustering algorithm on the gaze-based extracted features to isolate the different types of camera behaviour present in the data. Unsupervised learning allows us to isolate the most significant groups of samples from each dataset. However, k-means requires the number of clusters k existing in the data to be specified in advance in order to minimise the intra-cluster variance. To overcome this limitation, the algorithm runs with a progressively higher k value (from 2 to 10) and the clusters generated at each run are evaluated using a set of five cluster validity indexes. The algorithm runs 50 times for each k and the run with the smallest within-cluster sum of squared errors is picked. Each selected partition is evaluated against 5 validity indices: Davies-Bouldin (1979), Krzanowski-Lai (1988), Calinski-Harabasz (1974), Dunn (1973) and Hubert-Levin (1976). As displayed in Table 8.1, the smallest k value that optimises at least 3 of the five validity measures is used for the clustering; the chosen k value is 2 for the collection type areas and 3 for the fight and jump type areas.
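A minimal sketch of this model-selection procedure is shown below, using the two validity indices that ship with scikit-learn; the thesis uses five, so Dunn, Krzanowski-Lai and Hubert-Levin would require custom implementations, and the data here is a random placeholder.

    # Sketch of the cluster-count selection using the two validity indices
    # available in scikit-learn; the remaining three would need custom code.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import davies_bouldin_score, calinski_harabasz_score

    def vote_k(X, k_range=range(2, 11), runs=50):
        votes, best_db, best_ch = {}, np.inf, -np.inf
        for k in k_range:
            # n_init keeps the run with the smallest within-cluster SSE
            labels = KMeans(n_clusters=k, n_init=runs).fit_predict(X)
            db = davies_bouldin_score(X, labels)     # lower is better
            ch = calinski_harabasz_score(X, labels)  # higher is better
            if db < best_db:
                best_db, votes["davies_bouldin"] = db, k
            if ch > best_ch:
                best_ch, votes["calinski_harabasz"] = ch, k
        return votes  # the thesis picks the smallest k backed by >= 3 of 5 indices

    X = np.random.rand(200, 6)   # placeholder for the gaze-based features
    print(vote_k(X))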

Clusters Characteristics

As seen in Table 8.2, the camera behaviour is described with a different feature set for each area type. The features are selected to match the visual stimuli offered by each area; thus, only the features describing the observation of objects which are present in the area type are included in the set. Moreover, for each area type the features are sorted into 5 categories: camera speed, time spent observing the avatar, the primary task objects, the other close objects and the far objects. The primary task objects, highlighted in dark grey colour in Table 8.2, represent the time spent observing objects related to the main task of


Collection Areas (k = 2)
Records | Avatar | Fuel Cells | Jump Pads | Re-spawn Points | Far Objects | Speed
150     | 0.595  | 0.108      | 0.034     | 0.113           | 0.021       | 3.338
89      | 0.361  | 0.125      | 0.056     | 0.072           | 0.012       | 8.852

Fight Areas (k = 3)
Records | Avatar | Fuel Cells | Coppers | Jump Pads | Re-spawn Points | Far Objects | Speed
137     | 0.674  | 0.042      | 0.095   | 0.049     | 0.034           | 0.036       | 3.283
99      | 0.676  | 0.032      | 0.478   | 0.044     | 0.056           | 0.025       | 5.293
142     | 0.250  | 0.029      | 0.069   | 0.030     | 0.052           | 0.013       | 5.927

Jump Areas (k = 3)
Records | Avatar | Fuel Cells | Platforms | Far Objects | Speed
33      | 0.759  | 0.464      | 0.795     | 0.202       | 2.1293
80      | 0.736  | 0.166      | 0.658     | 0.059       | 2.7593
29      | 0.450  | 0.113      | 0.559     | 0.012       | 5.5854

Table 8.2: Camera behaviour features of the centroid of each cluster and the number of records in each cluster. Speed indicates the average camera speed with respect to the avatar. The remaining features express the time spent observing each object of a type in an area divided by the time spent completing the area. Highlighted in dark grey colour is the feature related to the main task of the area type. The features related to the other objects close to the avatar are highlighted in light grey colour.

each area type; all other objects are categorised as secondary. The time spent observing the avatar, the far objects and the camera speed are sorted separately since they represent type-independent features. According to this feature categorisation, it is possible to observe four main behaviour types: task focused, including the clusters spending more time observing the task related objects; avatar focused, including the clusters mostly observing the avatar; overview, which includes the clusters evenly observing all the objects in each area; and look ahead, which includes the clusters that reveal a higher time spent observing the far objects. Each of these categories is further divided, according to the camera speed, into slow or fast depending on whether the speed is lower or higher than the overall average speed of 4.667.
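The categorisation described above is qualitative; the sketch below shows one way it could be automated, with the thresholds following the text (average speed of 4.667) and the look-ahead cut-off being our own illustrative choice.

    # One way to automate the labelling rule above; the look-ahead cut-off
    # is an illustrative assumption, not taken from the thesis.
    OVERALL_AVG_SPEED = 4.667

    def label_cluster(centroid, task_key):
        """centroid: observation-time fractions per object type plus 'speed'."""
        speed = "slow" if centroid["speed"] < OVERALL_AVG_SPEED else "fast"
        obs = {k: v for k, v in centroid.items() if k != "speed"}
        if centroid.get("far", 0.0) > 0.15:          # illustrative cut-off
            focus = "look ahead"
        elif centroid["avatar"] >= max(obs.values()):
            focus = "avatar focused"
        elif centroid[task_key] >= max(obs.values()):
            focus = "task focused"
        else:
            focus = "overview"
        return f"{speed} {focus}"

    # First collection cluster of Table 8.2 -> "slow avatar focused"
    print(label_cluster({"avatar": 0.595, "fuel": 0.108, "far": 0.021,
                         "speed": 3.338}, task_key="fuel"))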
The two clusters emerging from the gaze behaviour data of the collection areas show two
very distinct behaviours in terms of camera speed and time spent observing the avatar and
the fuel cells. The players of the first cluster spend, on average, almost twice as much time as the players of the second cluster observing the avatar. Moreover, the players belonging to
the first cluster spend on average 20% less time observing fuel cells and move the camera at
less than half of the speed, compared to the players belonging to the second cluster. These
values suggest that the camera for the first cluster should move slowly and focus primarily
on the avatar, while the behaviour exhibited by the second cluster would be better modelled
with a fast camera with an equal focus on the avatar and the other relevant items in the
area. We label these behaviour patterns, respectively, as slow avatar focused and fast task

105
Chapter 8. Evaluation of Adaptive Camera

focused.
The first and the second clusters of behaviours in the fight areas exhibit patterns matching, respectively, the slow avatar focused and the fast task focused types. The third cluster of behaviours, on the other hand, exhibits very short observation times for all the objects, suggesting a far camera moving at high speed. We label this behaviour as fast overview.
In the jump areas, as expected, all the players spend a large amount of time observing the platforms; however, their camera behaviour differs on all the other parameters. The centroid of the first cluster spends a large amount of time evenly focusing on all the objects in the area and moves the camera at very low speed (2.1293). Moreover, the time spent looking at far objects is more than four times larger than in all the other clusters. Such values suggest that the camera for this cluster should stay at a longer distance, in order to frame all the objects of each platform, and should move at a low speed. We label this behaviour as slow look ahead. Both the second and the third clusters of players focus primarily on the avatar and the platforms, but differ in camera speed. Since the platforms are the primary task objects in these areas, we label the two clusters as slow task focused and fast task focused, respectively.

Behaviour Prediction

Once the relevant camera behaviour types are identified, we proceed by modelling the relationship between game-play and camera behaviour types. More precisely, since the model is intended to select the most appropriate camera behaviour that fits the player's preferences in the game, we attempt to approximate the function that maps the game-play behaviour of the player to the camera behaviour. For this purpose, we use Artificial Neural Networks (ANNs), which are chosen for their universal function approximation capacity. In particular, we employ a different ANN for each area type, instead of one for the whole dataset, to be able to base each model on the best features necessary to describe the game-play in that area.

The three fully connected feed-forward ANNs are trained using Resilient Backpropagation (Riedmiller and Braun, 1993) on the game-play data (ANN input) and the camera behaviour clusters (ANN output), using early stopping to avoid over-fitting. The networks employ the logistic (sigmoid) function at all their neurons. The performance of the ANNs is obtained as the best classification accuracy in 100 independent runs using 3-fold cross-validation. While the inputs of the ANN are selected algorithmically from the set of game-play features, the ANN outputs are a set of binary values corresponding to each cluster of the dataset.


The exact ANN input features, the number of hidden neurons and the number of previous areas considered in the ANN input are found empirically through automatic feature selection and exhaustive experimentation. Sequential Forward Selection (SFS) (Kittler, 1978) is employed to find the feature subset that yields the most accurate ANN model. SFS is a local-search procedure in which one feature at a time is added to the current feature (ANN input) set for as long as the accuracy of the prediction increases. The feature to be added is selected from the subset of the remaining features, such that the new feature set generates the maximum value of the performance function over all candidate features for addition. The performance of a feature set is calculated as the best classification accuracy of the model in 100 independent runs using 3-fold cross-validation. The network used for the feature selection process has one hidden layer containing a number of neurons equal to twice the number of inputs. Once the best feature set is selected, the best ANN topology is found through an exhaustive search of all possible combinations of neurons in two hidden layers, with a maximum of 30 neurons per layer.
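A compact sketch of the SFS loop is given below; the evaluation is simplified to a single cross-validated scikit-learn MLP rather than the best accuracy over 100 runs, and the data is a random placeholder.

    # Compact sketch of the SFS loop; evaluation simplified to one
    # cross-validated MLP instead of the best accuracy over 100 runs.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    def evaluate(X, y, cols):
        # One hidden layer with twice as many neurons as inputs, as in the text
        net = MLPClassifier(hidden_layer_sizes=(2 * len(cols),), max_iter=500)
        return cross_val_score(net, X[:, cols], y, cv=3).mean()

    def sfs(X, y):
        remaining, selected, best_acc = list(range(X.shape[1])), [], 0.0
        while remaining:
            scores = {f: evaluate(X, y, selected + [f]) for f in remaining}
            f, acc = max(scores.items(), key=lambda kv: kv[1])
            if acc <= best_acc:        # stop when accuracy no longer improves
                break
            selected.append(f); remaining.remove(f); best_acc = acc
        return selected, best_acc

    X, y = np.random.rand(150, 9), np.random.randint(0, 3, 150)  # placeholders
    print(sfs(X, y))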
The combination of automatic feature and topology selection is tested on three different feature sets representing different time horizons in the past: input (game-play features) from the one previous area visited (one step), input from the previous two areas visited (two step), and the combination of the one previous area with the average features of all the other previously visited areas (one step + average). Each area is represented by the 6 features describing player behaviour introduced in Section 8.1.3 and 3 further values describing the number of fuel cells, coppers and re-spawn points present in the area.

8.1.4 Models Evaluation

In this section, we present and discuss the results in terms of prediction accuracy of the
camera behaviour models created. First, a statistical analysis of the data is performed to
check the existence of a relationship between camera behaviours and game-play features and
to identify the significant ones. Then, the prediction accuracy of the models is evaluated
with respect to the length of the time-series expressing the past which is considered in the
ANN input, the selected feature set and the network topology.
To isolate the significant features among the ones logged, we perform an inter-cluster one-way ANOVA (Snedecor and Cochran, 1989) on each game-play feature, to identify the features for which we can reject the null hypothesis (that no statistical difference exists between the clusters).
As it is possible to see in Table 8.3, for each area type, at least one feature demonstrates
a significant difference revealing the existence of significant linear relationships. In the fight
areas dataset, there is a significant difference in terms of damage (both given and taken),


Area Type  | Fuel Cells | Damage Given | Damage Received | Enemies Killed | Re-spawn Points | Jumps
Collection |    5.02    |      -       |        -        |       -        |      1.23       | 0.76
Fight      |    1.63    |    12.42     |      10.89      |     24.03      |      0.49       | 9.64
Jump       |   11.98    |      -       |        -        |       -        |        -        | 0.53

Table 8.3: F-distribution values of the inter-cluster ANOVA test on the game-play features. The threshold for a 5% significance level is F > 3.85 for the collection areas and F > 2.99 for the jump areas. Values above the threshold appear in bold.

number of killed enemies and number of jumps. In the other two area datasets, the clusters
differ significantly in the number of fuel cells collected.
The features highlighted by the ANOVA test reveal the existence of a linear relationship
between the current camera behaviour and those features. However, variable relationships,
in general, are most likely more complex, given that linearly separable problems are ex-
tremely rare. Thus, the aim of the analysis presented below is the construction of non-linear
computational models of camera behaviour via the use of ANNs described in Section 8.1.3.
Figure 8.2 depicts the best performance (3-fold cross validation) of 100 runs for each feature
set on the three representations of the past events described in Section 8.1.3. The best value
over 100 runs is picked to minimise the impact of the initial randomisation of the network
weights on the backpropagation performance.
Each value shown in Figure 8.2 corresponds to the best topology found. It is noteworthy that all the selected topologies have at least one hidden layer, confirming the non-linear nature of the relationship. This aspect is also highlighted, in Figure 8.2, by the difference in prediction accuracy between the ANNs that use the subset of significant features identified through the ANOVA test and the ANNs using the subset identified through SFS. The latter type of ANNs yields a better accuracy regardless of the past representation and game areas. The best 3-fold cross-validation performances achieved for the fight, the jump and the collection areas are, respectively, 70.04%, 76.43% and 82.29%. It is also worth noting that, in the collection areas, the first type of ANNs, built solely on the features found significant by ANOVA, performs even worse than the ones using the full feature set, indicating that the linear analysis does not capture the relationship between game-play and camera behaviour accurately.
While, in the collection and jump areas, the ANOVA test indicates the number of fuel
cells collected as the only significant feature, SFS selects the number of jumps and the
number of re-spawn points activated as additional features for the ANN input. On the
other hand, in the collection areas, SFS does not only select features not indicated by


[Figure 8.2: grouped bar chart of classification accuracy (%) for the Fight, Jump and Collection areas across the 1S, 2S and 1S+A past representations, with bars for All Features, Significant Features and SFS.]

Figure 8.2: Best 3-fold cross-validation performance obtained by the three ANNs across different input feature sets and past representations. The bars labelled 1S refer to the one step representation of the past trace, the ones labelled 2S refer to the two step representation and 1S+A to the representation combining one previous step and the average of the whole past trace.

ANOVA (the number of re-spawn points activated), but also discards the number of jumps performed.
The results shown in Figure 8.2 also confirm our supposition that a more extensive representation of the past events leads to a better accuracy. In fact, the best accuracies are achieved when the ANNs use the most extensive information about the past game-play events. All ANN models predicting camera behaviour in the collection type areas have two hidden layers, suggesting a more complex function between game-play and camera behaviour in the collection areas. The non-linearity of the models emerges also from the difference, in terms of performance, between the models based on the significant features and the ones based on the features selected with SFS. The former always exhibit a lower accuracy than the corresponding models assisted by SFS and, for the collection type areas, they even perform worse than the models based on the full feature set. This further confirms our conjecture on the complexity of the function between game-play and camera behaviour in the collection type areas, as a selection based on linear relationships discards very important information.

8.2 User Evaluation

The second step towards adaptive camera control requires the adaptation of the camera
behaviour based on the models just evaluated. Following the methodology described in
Chapter 3, we need to define a camera profile corresponding to each behaviour identified
and to employ the predictive models to choose the right behaviour at the right moment in


the game. In this section, we present an experiment in which we apply this methodology to the 3D platform game described in Chapter 5 and test the effects of adaptive camera control on human players. Through this experiment, we evaluate the effectiveness of the proposed methodology beyond the numerical accuracy of the models and we analyse the impact of adaptivity on player preferences and in-game performance.

The remainder of the section describes the experiment, the mechanism used to select and activate the behaviours and the profiles associated with each camera behaviour identified in Section 8.1.3, and it concludes with a discussion of the evaluation results obtained.

8.2.1 Experiment

Our experimental hypothesis is that the camera behaviour models built on the combination of gameplay, gaze and camera information can be successfully used to adapt the camera behaviour to the player's preferences. To test this hypothesis, we have conducted a within-subject evaluation in which each subject plays the same level under two camera control schemes (conditions): (a) the camera is controlled automatically with a static profile, or (b) the camera is controlled automatically with an adaptive profile that is influenced by the player's behaviour. At the end of the second game, each subject expresses her or his preference between the two camera control schemes.

This experiment has been conducted in parallel with the evaluation presented in Chapter 7 and it shares the same protocol structure and the same game mechanics; therefore, the rest of the section will only highlight the main differences with respect to that experiment. For details about the game and the experimental setup please refer to Chapter 7.
More precisely, the two experiments share the following aspects:

• The experiments have been conducted on-line; therefore the players have participated in the experiment in a web browser through the Unity 3D web player.

• The game used in the evaluation is the one described in Section 5.2.

• In both experiments, the players are asked to answer a demographic questionnaire and play a tutorial level followed by two main levels with different experimental conditions; they are finally asked for their preference between the two last games and the motivations for their choice.

• The order of the two camera control conditions is randomised to minimise the learning effect.


• The preference is expressed, in both experiments, using a 4-AFC scheme. The mo-
tivations can be chosen from a list of suggestions or expressed freely using a text
box.

• The data collected includes both the player preferences and the following seven fea-
tures from the gameplay: level completion time, number of collected canisters, amount
of damage inflicted, amount of damage received, number of falls from the platforms,
number of jumps performed and number of respawn points activated.

The primary difference between the two experiments lies in the experimental conditions, as this experiment aims to evaluate the improvement given to automatic camera control by adapting the camera profiles to the player behaviour. Therefore, in both conditions the player controls only the avatar movements, while the automatic camera controller animates the camera according to the camera profile: in one condition the camera profile is static, while in the other the camera profile is selected from a list of behaviours according to the way the player plays. Furthermore, as no experimental condition features direct manual camera control, during the tutorial the player controls the camera manually, to eliminate any possible learning effect on the camera control scheme.

Adaptation

In the level featuring adaptive camera control, the camera profile depends on the way the
player performed in the game and on the type of area in which the player is currently
playing. At the instant in which the player enters an area, the game takes the gameplay
data collected up until that moment, selects the most appropriate prediction model and
activates it using the collected data as input. The output of the model indicates which
camera behaviour should be selected from that moment until the player enters another
area. Figure 8.3 displays an example of a camera profile transition that happens between
a collection area and a fight area. The new profile is activated two seconds after the player
enters the new area, to avoid multiple repetitive changes if the player moves in and out of
the area.
A new camera profile can also be activated if the current area changes category, which can happen in two cases: if the current area is a collection area and the last collectible item is taken, or if the area is a fight area and the last enemy present is killed. This event is treated as an area transition by the game: the area is considered completed and the player enters a new area. The new area type depends on the type of elements present in the area and it is categorised according to the most threatening challenge. The challenges are sorted in decreasing level of threat as follows: fight, jump


[Figure 8.3: screenshot showing a Fight Area and a Collection Area; the next camera profile is selected when the avatar crosses between them.]

Figure 8.3: Example of a transition from one camera profile to another. As soon as the avatar enters the bridge, the neural network for the fight area is activated using the player's gameplay features, recorded in all the previously visited fight areas, as its input. The camera profile associated with the behaviour cluster selected by the network is activated until the avatar moves to a new area.

and collect. If the area offers none of these challenges, it has no category; in that case, no prediction model is activated to identify the behaviour and a neutral camera profile is selected.
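This categorisation rule can be summarised in a few lines of code; the sketch below is an illustrative rendering with hypothetical names.

    # Illustrative rendering of the categorisation rule above; names are ours.
    THREAT_ORDER = ["fight", "jump", "collect"]   # decreasing level of threat

    def categorise_area(challenges):
        """challenges: the set of challenge types still present in the area."""
        for c in THREAT_ORDER:
            if c in challenges:
                return c
        return None   # no category: a neutral camera profile is selected

    print(categorise_area({"collect", "jump"}))   # -> "jump"
    print(categorise_area(set()))                 # -> None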
The models of camera behaviour presented in Section 8.1.3 are able to predict the camera behaviour cluster each player belongs to in each area, based on two inputs: the area type and a set of features describing the player behaviour in the previous platforms of the corresponding type. Therefore, before the right camera profile can be selected for the player, it is necessary that she plays at least one area of each area type. Moreover, as emerges from the results presented in Section 8.1.4, the more areas are used by the model to predict the behaviour, the more accurate the model will be.
For this reason, in this experiment, the data about the player's in-game behaviour is collected starting from the tutorial game. The tutorial includes at least one area of each type (2 collection, 1 jump, 1 fight), allowing the model to predict the camera profile starting from the first area of the level played with adaptive camera control. The in-game player behaviour is recorded throughout the game so that, each time the model is requested to predict which camera behaviour cluster to select, the inputs are updated to the way the player has played up until that moment. This means that a player playing the same area twice might experience two different camera behaviours, either because her behaviour changed during the game or because a more accurate model (with more inputs) can be used.
After the tutorial game, the player has crossed at least two collection areas, one fight
area and one jump area. The next time the player enters an area, the game selects the
most appropriate prediction model according to the area type and the number of areas
previously explored. For instance, when the player enters a jump area for the first time

112
8.2. User Evaluation

after the tutorial, the game will activate the behaviour prediction model that uses only one previous area in its inputs. As displayed in Figure 8.2, the behaviour will be predicted with approximately 62% accuracy. The next time a jump area is visited, two previous areas will be used for the prediction and the accuracy will increase up to the maximum prediction accuracy of 76.43%.
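The selection logic of this subsection could be organised as in the following sketch; the class, the model keys and the flat feature encoding are illustrative assumptions rather than the actual implementation.

    # Sketch of the profile-selection logic of this subsection; the class,
    # model keys and feature encoding are illustrative assumptions.
    class CameraAdapter:
        def __init__(self, models):
            # models[(area_type, n_past_areas)] -> trained behaviour predictor
            self.models = models
            self.history = {"collection": [], "fight": [], "jump": []}

        def record_area(self, area_type, features):
            """Store the gameplay features logged for a completed area."""
            self.history[area_type].append(features)

        def select_profile(self, area_type):
            if area_type not in self.history:
                return "neutral"              # uncategorised area
            past = self.history[area_type]
            if not past:
                return "neutral"              # no data for this area type yet
            steps = min(len(past), 2)         # prefer the more accurate model
            model = self.models[(area_type, steps)]
            flat = [v for area in past[-steps:] for v in area]
            cluster = model.predict([flat])[0]
            return (area_type, cluster)       # key into the profile table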

Camera Profiles

In the level featuring no adaptivity, the controller is instructed to position and animate the camera so that the generated images display the following properties for the avatar:

• The Object Visibility property should be equal to 1.0, meaning that the avatar should be fully visible.

• The Object Projection Size property should be equal to 0.3, meaning that the projection image of the avatar should cover approximately one third of the screen, either horizontally or vertically.

• The Vantage Angle property should be equal to 170 degrees horizontally and 25 degrees vertically.

• The Object Frame Position property coordinates should be equal to (0.33, 0.66), meaning that the avatar should appear at the crossing point of the lower horizontal line and the left vertical line according to the rule of thirds in photography.

Altogether, these properties describe an over-the-shoulder shot, which places the avatar in the lower left quadrant of the screen. This quadrant has been chosen to balance the image, as in the experiment presented in Chapter 7. Figure 5.8b displays an example of how the game is framed by the automatic camera controller.
The profiles used in the level featuring adaptive camera control are built on top of this profile. Each profile differs in three aspects: the weight given to the frame constraints on the avatar, the desired angle and projection size of the avatar, and the other objects on which a frame constraint is imposed. The basic profile is also used in the level featuring adaptive camera control when the current area cannot be placed in any category.

A different camera profile corresponds to each behaviour identified in Section 8.1.3. Each camera profile includes the frame constraints of the basic profile plus one Object Visibility constraint for each object present in the platform that belongs to a category observed for more than 5% of the time (e.g. the profile corresponding to the first behaviour cluster of the collection areas will contain an Object Visibility frame constraint for each re-spawn point and one for each fuel cell). The 5% threshold has been introduced to deal with possible noise

in the collected data; the objects observed for a time window shorter than the threshold are considered unimportant.
The time spent observing the far objects is considered separately; it affects the vertical angle value of the Object View Angle constraint imposed on the avatar. The vertical angle β is equal to 45° if the time spent observing far objects is equal to 0, and it decreases towards 0 as that time increases, according to the following formula:

β = 45 · (tmax − tfo) / tmax    (8.1)

where tmax is the maximum fraction of time spent observing far objects across all behaviour clusters and equals 0.202, and tfo is the fraction of time spent observing the far objects in the current cluster.
The weight of the frame constraints imposed on the avatar is equal to the fraction of time the avatar is observed, while the weight of the Object Visibility constraints for the other objects is equal to the fraction of time in which objects of the same type are observed in the cluster, divided by the number of such objects present in the area (e.g. if there are 3 fuel cells in the area and the currently active camera profile corresponds to the second collection behaviour, each Object Visibility constraint will have a weight equal to 0.125/3). The weight of each object is assigned according to this scheme because the observation time of each cluster is independent of the number of objects present in each area.
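The sketch below renders the two rules above, Eq. (8.1) and the weight scheme, in code; the helper names are ours. Note that vertical_angle(0.012) yields approximately 42.3 degrees, which appears consistent with the β = 42 shown in the profiles of Figure 8.4, since 0.012 is the far-object fraction of the second collection cluster in Table 8.2.

    # Eq. (8.1) and the weight scheme in code; helper names are ours,
    # T_MAX follows the text (0.202).
    T_MAX = 0.202   # largest far-object observation fraction over all clusters

    def vertical_angle(t_far):
        """beta = 45 * (tmax - tfo) / tmax, Eq. (8.1)."""
        return 45.0 * (T_MAX - t_far) / T_MAX

    def visibility_weights(obs_fraction, n_objects):
        """Per-object weight: the cluster's observation fraction for the type
        divided by the number of such objects in the area."""
        return [obs_fraction / n_objects] * n_objects

    print(visibility_weights(0.125, 3))   # the 0.125/3 example above
    print(vertical_angle(0.012))          # ~42.3 degrees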
When the player enters a new area, the game queries the most appropriate behaviour prediction model (given the number of previous areas played and the type of area) and builds an initial camera profile; when the player collects an item or kills an enemy, the profile is updated. If such an action triggers an area type change, a completely new profile is selected; otherwise, the current profile is updated according to the remaining number of objects in the area. Figure 8.4 shows an example of the update of a camera profile in a collection area. The camera behaviour remains unaltered, as the type of area does not change, but two frame constraints are dropped from the camera profile and the weight of the Object Visibility constraint on the remaining fuel canister is doubled.

8.2.2 Results

Twenty-eight subjects participated in the experiment, among which 21 were male, 20 declared to be regularly playing games and 14 declared to be regularly playing three-dimensional platform games. The age of the participants ranged between 18 and 40 years (mean=27.04, standard deviation=4.63). Before the analysis, the data has been filtered to remove the logs of the experimental sessions in which the average frame-rate was lower than 30 frames per second. The data about these subjects has been removed to guarantee that


Profile a: two fuel canisters and one active re-spawn point.
• Object Visibility on Avatar: visibility=1, weight=0.36
• Object Projection Size on Avatar: size=0.3, weight=0.36
• Vantage Angle on Avatar: α = 170, β = 42, weight=0.36
• Object Frame Position on Avatar: x = 0.33, y = 0.66, weight=0.36
• Object Visibility on Avatar: visibility=1, weight=0.36
• Object Visibility on Fuel Canister: visibility=1, weight=0.06
• Object Visibility on Fuel Canister: visibility=1, weight=0.06
• Object Visibility on Re-spawn: visibility=1, weight=0.07

Profile b: one fuel canister.
• Object Visibility on Avatar: visibility=1, weight=0.36
• Object Projection Size on Avatar: size=0.3, weight=0.36
• Vantage Angle on Avatar: α = 170, β = 42, weight=0.36
• Object Frame Position on Avatar: x = 0.33, y = 0.66, weight=0.36
• Object Visibility on Avatar: visibility=1, weight=0.36
• Object Visibility on Fuel Canister: visibility=1, weight=0.12

Figure 8.4: Changes in a camera profile before and after the collection of one fuel canister and the activation of one re-spawn point. The screen-shots above each profile depict the camera configuration produced by the two profiles for the same avatar position.


all the participants included in the study have been exposed to the same experience. After this filtering, 23 subjects are considered in this experiment.

Due to the similarities in structure between the experiment presented in this section and the experiment protocol presented in Chapter 7, the analysis of the results follows the same steps. We first control for the effect of the order of play on the participants' performance and on their preferences. The section ends with an analysis of the impact of the adaptive camera behaviours on the participants' preferences and performance.

Order Of Play

To check whether the order of playing the levels affects the users' performance, we test for the statistical significance of the difference between the features collected in the first level and in the second level played. Table 8.4 contains the results of a paired two-sample t-test between each feature collected in the first and in the second game.

The statistical test shows that the order of play affects one of the features describing the players' performance: completion time. Moreover, similarly to what happened in the evaluation of the automatic camera controller, the p-value for the number of collected canisters is very close to the 5% significance level, suggesting the presence of a weak influence of the order of play on this feature. Both influences can also be observed in Figure 8.7a and in Figure 8.7b. In the first case, the completion time difference between the level featuring adaptive camera behaviour and the level without it is almost always positive when the adaptive level is played first. The opposite behaviour can be seen for the fuel canisters where, in several cases, the difference is positive when the level featuring adaptivity is played as the second game. Similarly to what emerged in the automatic camera controller evaluation experiment, such an influence most likely occurs because players can remember the positions of the objects and the directions to reach the end of the level.
To check the effect of the order of play on the users' preferences, we follow the procedure suggested by Yannakakis et al. (2008) and we calculate the correlation ro as follows:

ro = (K − J) / N    (8.2)

where K is the number of times the users prefer the first level, J is the number of times the users prefer the second level and N is the number of games. The greater the absolute value of ro, the more the order of play tends to affect the subjects' preference; ro is trinomially distributed under the null hypothesis. The statistical test shows that no significant order effect emerges from the reported preferences (ro = −0.22, p-value = 0.46).
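The sketch below reproduces Eq. (8.2); the counts K and J are hypothetical values consistent with the reported ro, and the binomial test (which ignores ties) is only our approximation of the trinomial null distribution, so its p-value differs from the reported 0.46.

    # Eq. (8.2) with a significance check; K and J are hypothetical counts
    # consistent with the reported ro, and the binomial test (which ignores
    # ties) only approximates the trinomial null distribution.
    from scipy.stats import binomtest

    def order_effect(K, J, N):
        """r_o = (K - J) / N."""
        ro = (K - J) / N
        p = binomtest(K, n=K + J, p=0.5).pvalue
        return ro, p

    ro, p = order_effect(K=7, J=12, N=23)
    print(f"ro = {ro:.2f}, p = {p:.2f}")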


Feature (F)         | First (F1) | Second (F2) | p-value
Completion time     |   172.00   |   148.24    |  0.01
Canisters collected |     7.09   |     7.78    |  0.07
Damage inflicted    |     0.57   |     0.57    |  0.50
Damage received     |     0.30   |     0.35    |  0.61
Falls               |     2.83   |     2.09    |  0.20
Jumps               |    58.87   |    53.04    |  0.15
Respawns activated  |     2.65   |     2.74    |  0.29

Table 8.4: Average performance values of the players with respect to the order of play and test of the order-of-play effect on the features. The p-value is the result of a paired-sample t-test between the values recorded in the first level and the values recorded in the second level. Features for which the null hypothesis can be rejected with a 5% threshold (i.e. there is a significant effect of order of play on the feature) are highlighted.

[Figure 8.5: bar charts of the number of instances, with panels (a) Preferences (Adaptive, Automatic, Both, Neither) and (b) Motivations (Difficulty, Immersion, Control, Aesthetics, Wrong, Other).]

Figure 8.5: Expressed preferences (a) and corresponding motivations (b). The bar colours
in the motivations chart describe which preference the motivations have been given for.

Preferences

Figure 8.5a displays the total number of selections for each of the four options of the 4-AFC questionnaire. The first noticeable fact is that 18 subjects (78.3%) expressed a clear preference between the two camera types. Among these, half of the subjects (9 out of 18) stated that they preferred playing the level in which the camera profile was adapted, and the same number preferred the level without adaptation.
The significance and the magnitude of the correlation between preference and camera control paradigm can be tested using the same formula employed to test the order effect, with different K, J and N values: K is the number of times the level featuring the adaptive camera controller is preferred, J is the number of times the level without adaptivity is


[Figure 8.6: scatter plot of the subjects by average completion time (100-300 seconds) and average number of collected fuel canisters (2-10).]

Figure 8.6: Subjects sorted according to their average completion time and average number
of collected canisters. The subjects indicated with a cross symbol belong to the expert
players cluster, the ones indicated with a triangle symbol belong to the average players
cluster, while the subjects indicated with a circle symbol belong to the novices cluster.

preferred and N is the number of games. A positive correlation value expresses a stronger preference for the levels featuring the adaptive camera behaviour, while a negative correlation expresses a stronger preference for the levels without adaptation.

Even when considering the Both equally preferences as positive (assuming that the adaptation process is successful also when it is not noticeable), the test reveals only a very mild correlation (ro = 0.14) with no statistical significance (p-value = 0.51). These results indicate that adaptivity has not been clearly preferred by all participants; however, if we sort the subjects according to their performance, it appears that there are two distinct groups of players among the participants with different preferences about the camera control paradigm.
To identify such groups, we clustered the subjects, according to the completion time and the number of collected canisters, into three groups using k-means: the expert players, the intermediate players and the novice players (see Figure 8.6). The first group contains the 9 players who, on average, have completed each level in approximately 106 seconds and have collected approximately 8.1 canisters. The second group contains the 9 players who, on average, have completed each level in approximately 154 seconds and have collected approximately 8.3 canisters, while the last group contains 5 subjects who completed the level

in approximately 276 seconds and have collected approximately 4.6 canisters on average. The main difference between the expert and the intermediate player groups is the completion time. In addition, the novices group differs from the other two in both features considered (completion time and canisters collected). The novice players were often unable to complete the level and, when they did complete it, it took them significantly more time than the other players.
As displayed in Table 8.5, the correlation coefficient ro for the first group of subjects equals −0.55, clearly indicating that the expert players preferred the level without adaptation. The same analysis carried out on the two non-expert groups reveals, instead, a significant positive correlation between preference and adaptivity (ro = 0.71 for the intermediate players and ro = 0.6 for the novice players), indicating that the non-expert players preferred the level played with adaptation.
The motivations given by the players for their preferences (see Figure 8.5b) can partially explain these results; in fact, the main motivation given when the level without adaptation is preferred is the lack of a sense of control. The answer "I dislike losing control of the camera" was given 5 times in total (4 times by expert players). Moreover, 3 subjects reported that the camera pointed in the wrong direction while playing the level with adaptive camera control, suggesting that the camera behaviour models are either not accurate enough or not general enough. The two subjects who expressed a dislike for both levels motivated this by stating that they did not like automatic camera control in general and found both controls frustrating.

Gameplay

Our second hypothesis is that adaptive camera control will positively affect the player. As
described earlier, we have extracted and collected seven features to describe the players’
performance during the game. The collected features have been sorted according to the
camera control scheme (see Table 8.6) and the two groups of measurements have been

Player group | ro    | p-value
All players  |  0.14 | 0.51
Experts      | -0.55 | 0.04
Intermediate |  0.71 | 0.01
Novices      |  0.60 | 0.05

Table 8.5: Correlation between expressed preference and camera type for all players and
for the players sorted according to their skill. A positive correlation expresses a positive
preference towards the adaptive camera control experience.


a: All players
Feature (F)         | Adap. (Fad) | Auto. (Fau) | p
Completion time     |   156.54    |   163.70    | 0.25
Canisters collected |     7.96    |     6.91    | 0.01
Damage inflicted    |     0.61    |     0.52    | 0.08
Damage received     |     0.35    |     0.30    | 0.61
Falls               |     1.83    |     3.09    | 0.05
Jumps               |    54.70    |    57.22    | 0.54
Respawns activated  |     2.78    |     2.61    | 0.13

b: Expert players
Feature (F)         | Adap. (Fad) | Auto. (Fau) | p
Completion time     |   115.07    |    98.28    | 0.90
Canisters collected |     8.20    |     8.00    | 0.31
Damage inflicted    |     0.70    |     0.60    | 0.17
Damage received     |     0.40    |     0.10    | 0.96
Falls               |     0.80    |     0.30    | 0.84
Jumps               |    53.20    |    46.80    | 0.11
Respawns activated  |     2.80    |     2.90    | 0.70

c: Intermediate players
Feature (F)         | Adap. (Fad) | Auto. (Fau) | p
Completion time     |   146.12    |   161.92    | 0.19
Canisters collected |     8.50    |     8.25    | 0.26
Damage inflicted    |     0.75    |     0.75    | 1.00
Damage received     |     0.25    |     0.75    | 0.05
Falls               |     1.50    |     1.88    | 0.29
Jumps               |    57.50    |    62.25    | 0.45
Respawns activated  |     3.00    |     3.00    | 1.00

d: Novice players
Feature (F)         | Adap. (Fad) | Auto. (Fau) | p
Completion time     |   256.15    |   297.38    | 0.10
Canisters collected |     6.60    |     2.60    | 0.01
Damage inflicted    |     0.20    |     0.00    | 0.19
Damage received     |     0.40    |     0.00    | 0.91
Falls               |     4.40    |    10.60    | 0.04
Jumps               |    53.20    |    70.00    | 0.27
Respawns activated  |     2.40    |     1.40    | 0.04

Table 8.6: Average performance values of the players with respect to camera control type. The third column contains the p-values of a paired two-sample t-test between the two measurements across camera control type. The features for which the null hypothesis can be rejected at the 5% significance level (i.e. there is a significant effect of the camera control scheme on the feature) appear in bold.

tested to assess whether their means differ significantly. Moreover, the same analysis has been separately applied to the three groups of subjects introduced in the previous section. The first two columns of Table 8.6 contain the mean values of each feature in the levels featuring the adaptive camera behaviour and the levels without it. The last column displays the p-values of a two-sample paired t-test between the two groups of features.

These statistics reveal that, overall, the number of collected canisters and the number of falls are affected by the camera control paradigm, as these are the only features for which it is possible to reject the null hypothesis.

It is worth noting that neither of these features affects the player preference, as displayed in Table 8.7; this appears to be another reason for the lack of a clear preference towards the adaptive version of the game. Completion time is, overall, the only feature affecting the players' preference (the players tended to prefer the level that took them less time to complete) and adaptivity did not affect this aspect of the players' performance.

Within the group of novice players, two different performance features yield a significant


a: All players
Feature (F)         | Pref.  | Other  | p
Completion time     | 149.57 | 170.66 | 0.02
Canisters collected |   7.78 |   7.09 | 0.07
Damage inflicted    |   0.57 |   0.57 | 0.50
Damage received     |   0.39 |   0.26 | 0.81
Falls               |   2.04 |   2.87 | 0.17
Jumps               |  53.13 |  58.78 | 0.16
Respawns activated  |   2.83 |   2.57 | 0.04

b: Expert players
Feature (F)         | Pref.  | Other  | p
Completion time     |  93.18 | 120.17 | 0.01
Canisters collected |   8.10 |   8.10 | 0.50
Damage inflicted    |   0.60 |   0.70 | 0.83
Damage received     |   0.20 |   0.30 | 0.30
Falls               |   0.20 |   0.90 | 0.08
Jumps               |  45.20 |  54.80 | 0.01
Respawns activated  |   2.90 |   2.80 | 0.30

c: Intermediate players
Feature (F)         | Pref.  | Other  | p
Completion time     | 137.43 | 170.61 | 0.02
Canisters collected |   8.38 |   8.38 | 0.50
Damage inflicted    |   0.75 |   0.75 | 1.00
Damage received     |   0.62 |   0.38 | 0.77
Falls               |   1.38 |   2.00 | 0.18
Jumps               |  55.50 |  64.25 | 0.14
Respawns activated  |   3.00 |   3.00 | 1.00

d: Novice players
Feature (F)         | Pref.  | Other  | p
Completion time     | 281.79 | 271.74 | 0.61
Canisters collected |   6.20 |   3.00 | 0.05
Damage inflicted    |   0.20 |   0.00 | 0.19
Damage received     |   0.40 |   0.00 | 0.91
Falls               |   6.80 |   8.20 | 0.37
Jumps               |  65.20 |  58.00 | 0.66
Respawns activated  |   2.40 |   1.40 | 0.04

Table 8.7: Average performance values of the players with respect to reported preference.
The last column contains the p-values of a paired two-sample t-test between the two
measurements across preference (preferred versus non-preferred level). The features for
which the null hypothesis can be rejected at the 5% significance level (i.e. there is a
significant difference between preferred and non-preferred levels) appear in bold.

difference between the level types: the number of collected canisters and the number of falls.
The number of canisters is significantly higher (by 154%) in the levels featuring adaptation,
and the number of falls drops from 10.60 to 4.40 on average between the two types of levels.
This finding reveals that, for a part of the subjects, the adaptation mechanism allowed the
automatic camera controller to overcome the limitations that emerged in Chapter 7. Moreover,
if we compare the average values for these two features, it is worth noticing how the
adaptation mechanism improves the performance of the novice players, bringing them closer
to the performance of the other players and, above all, allowing them to complete the level:
the average number of canisters increases above 6 (the minimum number of canisters required
to complete a level) and the completion time is approximately 40 seconds lower than the
maximum level duration. Such an improvement is not noticeable for the expert players.
Moreover, when comparing the significant features presented in Table 8.6d and Table 8.7d,
it appears that the adaptation of the camera behaviour supported the collection of fuel
canisters and led to a stronger preference of the novice players for the levels featuring
adaptivity.


[Figure 8.7 appears here: four panels plotting, for each game instance, the difference
Fad − Fau of (a) completion time, (b) number of fuel canisters collected, (c) number of
jumps performed and (d) number of falls.]

Figure 8.7: Differences ∆F = Fad − Fau of completion time (a), number of canisters collected
(b), number of jumps (c) and number of falls (d) between the levels played with the adaptive
camera behaviours and the ones without. The background colour pattern shows which level
was preferred and which level was played first for each game played. If the dark grey bar is
in the upper half of the plot, the level featuring the adaptive camera controller was preferred,
and vice versa. If the light grey bar is in the upper half of the plot, the level featuring the
adaptive camera controller was played first, and vice versa.

8.3 Summary

The chapter presented a case study that evaluated the applicability of the adaptive camera
control methodology described in Chapter 4.
The camera behaviour is modelled using a combination of information about players’


gaze and virtual camera position collected during a game experiment. The data collected
is sorted into three sets of areas according to the game-play provided to the player. For
each group, the data is clustered using k-means to identify relevant behaviours and the
relationship between the clusters and the game-play experience is modelled using three
ANNs.
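As an illustration of this modelling pipeline, the sketch below clusters hypothetical per-area camera and gaze statistics with k-means and trains a small feed-forward network to predict the resulting behaviour label from game-play features; all names, dimensions and parameters are illustrative assumptions and do not reproduce the implementation used in this thesis.

```python
# Illustrative sketch of the behaviour-modelling pipeline: k-means identifies
# camera-behaviour clusters and an ANN learns to predict the cluster from
# game-play features (one such model is trained per area type).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical data for one area type (e.g. the "collection" areas):
# camera/gaze statistics describing how each player framed the action, and
# game-play features describing the current and previously visited areas.
camera_features = rng.random((200, 6))     # e.g. camera distance, gaze time
gameplay_features = rng.random((200, 10))  # e.g. jumps, damage, falls

# Step 1: cluster the camera/gaze data to identify relevant behaviours.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
behaviour_labels = kmeans.fit_predict(camera_features)

# Step 2: model the relationship between game-play and camera behaviour
# with a small feed-forward network.
ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
ann.fit(gameplay_features, behaviour_labels)

# At run time, the predicted cluster drives the choice of camera profile.
predicted_behaviour = ann.predict(gameplay_features[:1])
```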
The accuracy of the ANNs in predicting camera behaviour is analysed with respect
to the number and type of features used as input to the models. The analysis reveals
that the best prediction accuracies (i.e. 76.43% for jump, 82.29% for collection and 70.04%
for fight) are achieved using a representation of the past events which includes the previous
area and the average features of the other previously visited areas. Moreover, sequential
feature selection reduces the input vector size, which results in higher accuracies for all ANNs.
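A minimal sketch of a sequential forward selection procedure of this kind is given below; the `evaluate` callback is an assumption and would, for instance, return the cross-validated accuracy of an ANN trained on the candidate feature subset.

```python
# Illustrative sketch of sequential forward feature selection: features are
# added greedily, one at a time, for as long as they improve model accuracy.
def sequential_forward_selection(all_features, evaluate):
    selected, best_score = [], 0.0
    improved = True
    while improved and len(selected) < len(all_features):
        improved = False
        candidates = [f for f in all_features if f not in selected]
        # Score each remaining feature when added to the current subset.
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score > best_score:
            selected.append(feature)
            best_score = score
            improved = True
    return selected, best_score
```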
The chapter concluded with the presentation of a user evaluation of the adaptive camera
control methodology. Through this experiment, the performance of the adaptation mechanism
and its impact on automatic camera control in third-person computer games are evaluated.
While the results show no clear preference for the levels featuring adaptivity, it appears
that the adaptive camera controller is able to overcome some of the limitations that emerged
in the evaluation of the automatic controller. Moreover, the adaptation mechanism proved
able to provide a satisfactory experience for most of the non-expert participants.

Chapter 9

Discussion and Conclusions

This thesis aimed to answer three main research questions: how to effectively solve the
camera composition and animation problems in real-time in dynamic and interactive envi-
ronments; how automatic camera control affects the player experience in three-dimensional
computer games; and, within the framework of automatic camera control, how the player
can affect the cinematographic experience. In order to answer these research questions,
this thesis pursued two objectives: develop and evaluate a real-time automatic camera con-
troller, and design and evaluate a methodology for adaptive camera control. The rest of
the chapter summarises the results presented in the thesis and discusses the extent of its
contributions and its limitations.

9.1 Contributions

This section summarises this thesis’ main achievements and contributions that advance the
state-of-the-art in automatic camera control for interactive and dynamic virtual environ-
ments. To a lesser degree, results of this thesis can be of used to the fields of dynamic
optimisation, animation and human-computer interaction. More specifically, the contribu-
tions of this thesis can be described as follows.

9.1.1 Dynamic Virtual Camera Composition

This thesis introduces a hybrid search algorithm, combining local and population-based
search, for virtual camera composition in real-time dynamic virtual environments. The al-
gorithm is a Genetic Algorithm augmented with an APF-based mutation operator.
The proposed algorithm is evaluated and compared against state-of-the-art algorithms for
optimisation in automatic virtual camera control.
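To make the idea concrete, the following minimal sketch shows a GA loop in which mutation alternates between a random perturbation and a move along an Artificial Potential Field; the toy objective, force definitions and scheduling are illustrative assumptions and do not reproduce the exact operators used in this thesis.

```python
# Sketch of a hybrid GA whose mutation operator can follow an Artificial
# Potential Field: the field attracts the camera towards a preferred
# viewing distance and repels it from an occluder (toy objective).
import math
import random

TARGET = (0.0, 0.0, 0.0)    # subject to frame
OCCLUDER = (2.0, 0.0, 2.0)  # object the camera should stay clear of
IDEAL_DIST = 5.0            # preferred camera-subject distance

def objective(cam):
    d = math.dist(cam, TARGET)
    occ = math.dist(cam, OCCLUDER)
    # Reward the preferred distance, penalise proximity to the occluder.
    return -abs(d - IDEAL_DIST) - max(0.0, 1.0 - occ)

def apf_mutation(cam, step=0.3):
    # Local-search move: follow the attractive/repulsive forces acting
    # on the camera instead of applying random noise.
    d = math.dist(cam, TARGET) or 1e-9
    occ = math.dist(cam, OCCLUDER) or 1e-9
    mutated = []
    for c, t, o in zip(cam, TARGET, OCCLUDER):
        attract = (t - c) / d * (d - IDEAL_DIST)     # towards the ideal sphere
        repel = (c - o) / occ * max(0.0, 1.0 - occ)  # away from the occluder
        mutated.append(c + step * (attract + repel))
    return mutated

def random_mutation(cam, sigma=0.5):
    return [c + random.gauss(0.0, sigma) for c in cam]

def evolve(pop_size=30, generations=100):
    pop = [[random.uniform(-10, 10) for _ in range(3)] for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=objective, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            # Simple schedule: alternate global (random) and local (APF) moves.
            child = apf_mutation(child) if gen % 2 else random_mutation(child)
            children.append(child)
        pop = parents + children
    return max(pop, key=objective)

best_camera = evolve()
```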
To test this algorithm, we identify the building blocks of the camera control problem
and we study the complexity of each. In particular, we identify three principal objective


functions, we relate these to the state-of-the-art definition of frame constraints and we
systematically analyse the complexity of each objective function and their combination.
As a result of this analysis we identify 11 test problems which represent a wide range of
problem complexity values in terms of both static and dynamic optimisation.
These problems (see Section 6.2) compose a benchmark on which we evaluate the per-
formance of the proposed hybrid meta-heuristic algorithm and compare it with five other
search algorithms widely used in automatic camera control. The algorithms are compared
on two aspects: accuracy and reliability. Accuracy is evaluated using the average best func-
tion value (ABFV) measure suggested by Schönemann (2007), while reliability is measured
as the average amount of time in which the algorithm is unable to find a solution with an
objective function value higher than 0.25.
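For concreteness, a minimal sketch of the two measures, computed over a per-frame trace of the best objective value held by an algorithm, is given below; the trace itself is a hypothetical placeholder.

```python
# Minimal sketch of the two performance measures described above, computed
# over a hypothetical per-frame trace of the best objective value (in [0, 1])
# held by an algorithm during a run.
best_values = [0.10, 0.22, 0.41, 0.63, 0.58, 0.71, 0.69, 0.80]  # hypothetical

# Accuracy: average best function value (ABFV) across the whole run.
abfv = sum(best_values) / len(best_values)

# Reliability: fraction of the run in which the algorithm fails to hold a
# solution with an objective function value above the 0.25 threshold.
failure_time = sum(1 for v in best_values if v <= 0.25) / len(best_values)

print(f"ABFV = {abfv:.3f}, time without acceptable solution = {failure_time:.1%}")
```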
The proposed search algorithm is able to find and track solutions with the highest
accuracy and reliability on most test problems (see Section 6.4). Moreover, the results
obtained by the hybrid Genetic Algorithm in the optimisation of the test problems including
all the frame constraints are very positive: the algorithm shows the best accuracy and
reliability independently of the number of subjects and the type of virtual environment.
Such test problems are the most realistic and the most commonly encountered in interactive
virtual environments, where most of the time the camera needs to follow an avatar and a
number of other objects.
Moreover, the performance measures and the test problems employed in the evalua-
tion of the hybrid Genetic Algorithm stand, to the best of the author’s knowledge, as the
first systematic benchmark for virtual camera composition in dynamic environments (see
Section 6.2).

9.1.2 Real-Time Automatic Camera Control

The thesis introduces a novel automatic camera control architecture (see Chapter 3) that
combines the aforementioned hybrid Genetic Algorithm and a lazy probabilistic roadmap
to effectively solve the problems of composition and animation of the camera in dynamic
and interactive virtual environments. The proposed approach employs the hybrid Genetic
Algorithm to iteratively search for the camera configuration that optimises the composition
requirements at each frame. The lazy probabilistic roadmap algorithm is used to animate
the camera to the new solution found at each iteration of the system. We evaluated the
system's quality and impact on player experience by conducting a user study in which
participants play a game featuring, alternately, a manually controlled camera and a camera
automatically controlled by the system presented in this thesis.
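A schematic of how the two components could interact at each frame is sketched below; the class and method names are illustrative assumptions rather than the system's actual interface.

```python
# Schematic per-frame control loop of the architecture described above:
# a hybrid GA tracks the best camera configuration for the current frame
# constraints, and a lazy probabilistic roadmap animates the camera to it.
class AutomaticCameraController:
    def __init__(self, optimiser, roadmap, camera):
        self.optimiser = optimiser  # hybrid GA over camera configurations
        self.roadmap = roadmap      # lazy PRM over the navigable space
        self.camera = camera

    def update(self, frame_constraints, dt):
        # 1. Run a few GA iterations to (re-)optimise the target camera
        #    configuration for the current composition requirements.
        target = self.optimiser.step(frame_constraints)

        # 2. Plan a path to the new target, validating roadmap edges lazily
        #    (i.e. only when the shortest path actually traverses them).
        path = self.roadmap.shortest_path(self.camera.position, target.position)

        # 3. Move the camera along the path, respecting speed and acceleration
        #    limits, and interpolate its orientation towards the target look-at.
        self.camera.move_along(path, dt)
        self.camera.look_towards(target.look_at, dt)
```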


The thesis presented a user evaluation of the proposed automatic camera control system
in a commercial-standard three-dimensional computer game (see Chapter 7). Through this
experiment, the performance of the camera control system and the impact of automatic
camera control in third-person computer games were evaluated.
The pairwise preferences reported by the participants indicate that automatic camera
control is perceived as an enhancement to the game by simplifying the interaction. Fur-
thermore, the subjects reported no complaints about the camera behaviour, which, along with
the dominating preference for automatic camera control, suggests that the camera control
system was able to elicit a satisfactory experience without noticeable issues in the quality of
the generated images and camera motion. The analysis of the features representing aspects
of player performance in the game revealed that when players did not control the camera,
they completed the level in less time, confirming our hypothesis that the automatic camera
controller would positively influence the players’ performance.

9.1.3 Adaptive Camera Control

This thesis introduced a methodology to build adaptive virtual camera control in computer
games by modelling camera behaviour and its relationship to game-play (see Chapter 4).
Camera behaviour is modelled using a combination of information about players’ gaze and
virtual camera position collected during a game experiment. The data collected is clustered
using k-means to identify relevant behaviours and the relationship between the clusters
and the game-play experience is modelled using three ANNs. These models of camera
behaviour are then used to select — at specific times in the game — the camera profile
with which to instruct the automatic camera controller and provide a personalised
cinematographic experience.
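Schematically, the run-time selection step could look like the following sketch, in which the profile names and function signatures are illustrative assumptions.

```python
# Illustrative sketch of the run-time adaptation step: when the player enters
# a new area, the behaviour model for that area type predicts the most likely
# camera-behaviour cluster, and the matching camera profile is handed to the
# automatic camera controller. All names here are hypothetical.
PROFILES = {0: "close_follow", 1: "distant_overview", 2: "action_framing"}

def adapt_camera(area_type, gameplay_history, models, controller):
    # One behaviour model per area type (jump, collection, fight).
    model = models[area_type]
    # Predict the behaviour cluster from features of the visited areas.
    cluster = int(model.predict([gameplay_history])[0])
    # Instruct the automatic camera controller with the matching profile.
    controller.set_profile(PROFILES[cluster])
```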
The methodology is evaluated on a three-dimensional platform game and the model
accuracy in predicting camera-behaviour is analysed with respect to the number and type
of features used as input to the model (see Chapter 8). The analysis reveals that the best
prediction accuracies are around 76% for areas in which the main task is to jump, 82%
for the ones in which items collection is the main task, and around 70% for fight areas.
Such results are achieved using a representation of the past in-game events which includes
game and player features of the previous area as well as the average features of the other
previously visited areas. Moreover, sequential feature selection reduces the input space of
the models, which results in higher accuracies for all ANNs.
The models built from this data are employed for camera behaviour adaptation in the
same game and the adaptation mechanism is evaluated in a user study. The pairwise pref-
erences reported by the participants do not indicate a strong preference towards adaptive


camera control which is, in general, not perceived as an enhancement to the game. How-
ever, if the subjects are sorted by their playing skill level, it appears that the two groups
of players with lower in-game performance significantly prefer adaptive camera control over
simple automatic camera control, while the opposite effect is present for the expert play-
ers. Moreover, for the novice players the number of falls decreases significantly (from 10.60 to 4.40 on average)
when the adaptive camera control is used. Such findings suggest that, for a part of the
subjects, the adaptation mechanism allowed the automatic camera controller to overcome
its limitations and to support a wider set of player actions.

9.2 Limitations

The limitations of the proposed approach to virtual cinematography, as they appeared
throughout the experiments of this thesis, are summarised in this section. Moreover, the
generality of the obtained experimental results is also discussed.

9.2.1 Methods

The first limitation of the proposed camera control approach emerges from the hybrid
optimisation algorithm per se. The accuracy and reliability demonstrated by the algorithm
in the optimisation of the projection size frame property for more than one subject are low,
as the algorithm is unable to provide an acceptable solution for, on average, 91% of the
time. As shown in Section 6.4, similar performance is also manifested by the standalone
APF algorithm, suggesting that the approximated derivative of the projection size objective
function used to drive the APF operator misleads the search process. This aspect requires
further investigation, and a more precise force function needs to be designed. The hybrid GA
also appears unable to accurately and reliably optimise the most dynamic of the problems
considered: a vantage angle constraint on one subject in the forest environment. We believe
that such a behaviour is the consequence of an imprecise detection of early convergence,
which induces the scheduler to constantly switch between the application of the standard
operators and the APF-based operator. A better early convergence detection mechanism
would probably allow the algorithm to achieve better accuracy and reliability. This is a
direction that should be considered in order to shed further light on the potential of hybrid
optimisation for real-time camera composition.
A second limitation emerges from the user evaluation of the automatic camera controller
presented in Chapter 7: a large number of players had difficulty performing jumps.
The reason for such a negative impact is probably the chosen camera profile, which did not
support all the types of actions the player had to perform. This issue has been addressed
by adapting the camera profile to the player's preferences and the type of interaction; as it


appears from the results of the evaluation of the adaptive camera controller, the number of
falls decreases significantly when the game is played with a personalised camera.
While the incorporation of adaptation elements into automatic camera control appears
to improve, to some extent, the performance of the players, the preferences expressed in
the user evaluation reveal that the proposed adaptive camera controller is not successful in
improving the experience for a large number of players. The data collected in the experi-
ment presented in Chapter 8 suggest that the expert players systematically preferred the
game without adaptation and their performance is mostly unaffected by the camera control
scheme.
The reasons for such results are manifold and can probably be identified in all three
stages of the adaptation methodology: the data collection experiment, behaviour modelling
and camera profile adaptation.
The analysis of the data collected to build the camera behaviour models (see Section 8.1)
suggests that the game used in the collection experiment is visually very complex and a
multitude of objects have high visual saliency and compete for the player’s attention. This
is especially evident in the camera behaviour clusters in which the players do not focus
primarily on the task-related objects but spend a large amount of time observing the virtual
environment. This effect could be minimised using a different version of the game during
the data collection phase, which incorporates textures with low saliency values.
While the accuracy of the models evaluated in Section 8.1.4 is relatively high, none of the
models used in the evaluation has a prediction accuracy higher than 85%. This means that
there is a substantial number of cases for which the models are unable to predict the right
behaviour, leading to the selection of incorrect camera profiles during the adaptation phase.
Different directions could be pursued to improve these results, such as alternative input
representations, a larger data set, or global search-based feature selection. A different
clustering algorithm could also be employed, as the error might be a consequence of an
incorrect cluster separation.
However, considering also that building generalised human behaviour models is a hard
problem, it might be impossible to obtain better prediction results; therefore, the model
itself could be adjusted during the gameplay, to better match the behaviour of each specific
player.
Finally, the camera profile adaptation process employs an arbitrary method to translate
the behaviours emerging from the collected data into camera profiles; the same criterion
probably cannot be applied to all players.


9.2.2 Experiments

The algorithms presented in Chapter 3 represent the state-of-the-art in optimisation for
automatic camera control. Such a selection limits the scope of the results, as it does not
allow the results drawn from the presented experiment to be extended beyond the problem
of virtual camera composition. However, such an analysis was necessary to position the
proposed solution within the field and to suggest a systematic methodology for future
comparisons. Given the results of the comparative study and the analysis of the virtual
camera composition objective function, a comparison against the state-of-the-art algorithms
in dynamic optimisation is a natural future step.
The game used for the evaluation of the proposed automatic camera controller and the
adaptive camera control methodology has been designed to represent a large number of
games and, therefore, allow the generalisation of the findings to other games. However,
aspects such as the number of participants and the specific characteristics of the games
need to be considered to evaluate the extent of the generality of the evaluation results.
In the evaluation presented in Chapter 7, the participants were asked to state their
preference between a game played using automatic camera control and a game in which
they were required to manually control the camera. In order to maintain a coherent control
scheme, the participants were required to change the hand they use to control the avatar (the
left one when the camera is controlled manually and the right one when the camera is auto-
matically controlled). While this lack of symmetry might have influenced the participants'
judgement, this scheme appeared the most natural during the pilot study conducted prior
to the experiment. Other options would have required the participants to either change
the controls completely or to interact from very uncomfortable positions. Moreover, only
one participant mentioned the avatar control key mapping as uncomfortable and suggested
the use of W,A,S,D keys.
Another limitation of the experiment presented in Chapter 7 comes from the baseline
condition: a comparison of the proposed automatic controller against a simple manual
controller limits the scope of the results. Nonetheless, we believe that the comparison was
necessary as, to the best of the author's knowledge, there is no previous evidence in the
literature that players would prefer automatic camera control over manual camera control.
Therefore, this needed to be tested, since a negative answer would have undermined the
core motivations of this thesis. Moreover, a preference towards CamOn in such an experiment
does not only demonstrate that automatic camera control is preferred over manual control,
but also that CamOn is capable of providing an acceptable automatic camera experience.
The game interaction consists of four main activities: jump, fight, collect and explore.
It is therefore possible to assume that the findings can be applied to any three-dimensional


game in which the gameplay is based primarily on these aspects (e.g. platform games,
action games or fighting games). Although these games often include a strong narrative
component, which is almost completely absent from the test game, we still assume our findings
to be valid for the rest of the game interaction. Moreover, the impact of automatic camera
on narrative is largely assumed to be positive in the literature (Drucker et al., 1992; Charles
et al., 2002; Christie et al., 2002) and a user survey by Jhala and Young (2010) confirms
such an assumption.
Another critical aspect in estimating the validity of the findings is the size of the par-
ticipants’ sample. The number of subjects in our user studies is relatively small; however,
considering that only one independent variable has been tested, that the findings are re-
stricted to a specific typology of games and that the results appear to be largely in favour
of automatic camera control, we believe the sample size to be sufficient. One or more larger
user survey studies will be required to evaluate automatic camera control across more game
genres.

9.3 Extensibility

The experiments evaluating the different aspects of this thesis work have demonstrated the
solidity of the proposed solutions. Here, we discuss a number of research directions that
could benefit from the findings of this thesis. Moreover, on the basis of the results obtained,
we suggest new research objectives and possible solutions.

9.3.1 Benchmarking

One of the challenges encountered during this thesis work consisted of designing the eval-
uation of the methodologies proposed. In automatic camera control, in contrast to other
fields such as computer vision, there is a lack of standard data corpora and measures which
would allow different approaches to be fairly evaluated and compared. This thesis proposes an
initial benchmark and suggests a set of performance measures for the evaluation of virtual
camera composition algorithms in dynamic environments. We believe that, starting from
this initial experiment, a complete camera control benchmark can be developed and that
this would be beneficial for future advancements in the field of both automatic camera
control and dynamic optimisation.

9.3.2 Optimisation

The results demonstrated by the hybrid Genetic Algorithm proposed in this thesis are an
encouraging starting point in the development of hybrid optimisation algorithms for dy-
namic camera composition. This aspect deserves further empirical analysis to identify the


ideal combination of algorithms. Considering the high dynamism of the camera control
objective function and its very rough landscape (see Chapter 6), an algorithm that suc-
cessfully optimises this function could prove very effective in other dynamic optimisation
problems (Branke, 2001). For this purpose, it will be necessary to analyse the virtual
camera composition problem with respect to the existing benchmarks in dynamic optimi-
sation and test the proposed hybrid optimisation algorithm on these benchmarks against
the state-of-the-art algorithms for dynamic optimisation.
Furthermore, investigating mechanisms for dynamic population handling during the
optimisation could improve the start-up performance and mitigate the problem of early convergence.
This mechanism, as well as the best pattern of application of the genetic operators, could
be learned using machine learning. For this purpose, the benchmark problems introduced
in this thesis could be employed as a training set.
Finally, since the landscape of the camera composition objective function contains
multiple alternative solutions that are often visually different, such solutions can be used to
generate multiple different shots from the same description. It is already known that, from an
optimisation standpoint, these problems are usually multi-modal, so that algorithms with
restart and/or niching components are needed, as we may otherwise miss the best optima.
Supported by some initial results (Preuss et al., 2012; Burelli et al., 2012), we believe it
is worth exploring the objective function landscapes to see how the difficulty of finding
multiple good shots varies across different scenes and problems in both static and dynamic
environments.

9.3.3 Manual Control

The evaluation of the automatic camera controller presented in Chapter 7 compares the impact
of two camera control schemes on player preferences and performance. While the results
clearly show a strong preference of the players towards automatic camera control, it would
be interesting to compare the proposed camera controller against manual camera control
schemes other than the one used in this thesis (orbit camera). Such an extended study
could lead to an empirical taxonomy of camera control schemes in different game genres.

9.3.4 Aesthetics

The evaluations of the automatic camera controller and the adaptation methodology have
both revealed a very low number of subjects identifying aesthetics as a motivation for their
preference. In particular, the lack of a reported aesthetic improvement between the game
featuring manual camera control and the one employing the automatic camera controller


suggests the necessity to extend the authors' evaluation of the ability of the system to
provide well-composed shots in dynamic environments (Burelli and Yannakakis, 2010a).
Furthermore, it appears very important to investigate the role of image composition and
virtual camera control in computer game aesthetics, as it differs from photography and
classic cinematography. For this purpose, an evaluation of the impact of different directorial styles
on player experience in different game genres should be carried out; such a study could lead
to the design of a structured taxonomy of interactive virtual cinematography.

9.3.5 Animation

The animation capabilities of the camera control architecture presented in this thesis are
limited to the control of a few dynamic parameters of the camera, such as speed and accel-
eration. These parameters proved sufficient as control attributes for the virtual camera
of a 3D platform game; however, in order to generate more cinematographic experiences, we
believe it is necessary to explore new virtual camera animation methods and the integration
of trajectory prediction (Halper et al., 2001) algorithms in the path planning process, to
reduce undesired discontinuities in the animation.

9.3.6 Adaptation

In the proposed methodology for camera behaviour adaptivity, the user models are built
a priori and do not change during the experience. Integrating a form of feedback about
the user's cognitive state would allow not only the camera profile, but also the model
itself, to be adapted. Investigating the use of multiple input modalities, such as computer
vision, electroencephalogram (EEG), electromyogram (EMG) and other physiological signals,
also appears to be a promising way to detect and model the user's affective state (Yannakakis
et al., 2010), allowing a better understanding of the relationship between virtual
cinematography and the player.

9.4 Summary

This dissertation has presented an approach to interactive virtual cinematography addressing
the problems of efficient and effective camera placement and implicit camera interaction.
The presented approach includes a novel architecture for automatic camera control, a novel
hybrid optimisation algorithm and a methodology to design adaptive virtual camera expe-
riences. These approaches have been described in detail and each one of the architecture’s
components has been evaluated in terms of both numerical performance and its effect on
player experience. The key findings of the thesis suggest that, by embedding domain knowl-
edge into a global optimisation algorithm through the Artificial Potential Field operator, the


algorithm is able to deal with highly dynamic and multi-modal fitness functions. Moreover,
the user evaluation of the proposed automatic camera control architecture shows that the
combination of hybrid optimisation and path planning allows the camera to be controlled
accurately and smoothly. Finally, the suggested methodology for camera behaviour modelling
and adaptation proved able to overcome some of the limitations of pure automatic
camera control.

Bibliography

Barraquand, J. and Latombe, J.-C. (1990). A Monte-Carlo algorithm for path planning with
many degrees of freedom. In IEEE International Conference on Robotics and Automation.
— Cited on page 10.

Ali, M. M. and Törn, A. A. (2004). Population set-based global optimization algorithms:
some modifications and numerical studies. Computers & Operations Research,
31(10):1703–1725. — Cited on pages 41 and 81.

Arijon, D. (1991). Grammar of the Film Language. Silman-James Press LA. — Cited on
pages 3, 6, 11, and 24.

Bares, W. H. and Lester, J. C. (1997a). Cinematographic User Models for Automated
Realtime Camera Control in Dynamic 3D Environments. In International Conference on
User Modeling, pages 215–226. Springer-Verlag. — Cited on pages 4, 12, and 19.

Bares, W. H. and Lester, J. C. (1997b). Realtime Generation of Customized 3D Animated
Explanations for Knowledge-Based Learning Environments. In Conference on Innovative
Applications of Artificial Intelligence. — Cited on page 12.

Bares, W. H. and Lester, J. C. (1999). Intelligent multi-shot visualization interfaces for
dynamic 3D worlds. In International Conference on Intelligent User Interfaces, pages
119–126, Los Angeles, California, USA. ACM Press. — Cited on page 13.

Bares, W. H., McDermott, S., Boudreaux, C., and Thainimit, S. (2000). Virtual 3D camera
composition from frame constraints. In ACM Multimedia, pages 177–186, Marina del
Rey, California, USA. ACM Press. — Cited on pages 2, 6, 12, 13, 24, 25, 26, 27, 30,
and 31.

Bares, W. H., Zettlemoyer, L. S., Rodriguez, D. W., and Lester, J. C. (1998). Task-Sensitive
Cinematography Interfaces for Interactive 3D Learning Environments. In International
Conference on Intelligent User Interfaces, pages 81–88, San Francisco, California, USA.
ACM Press. — Cited on pages 4 and 12.


Baumann, M. A., Dupuis, D. C., Leonard, S., Croft, E. A., and Little, J. J. (2008).
Occlusion-free path planning with a probabilistic roadmap. In IEEE/RSJ International
Conference on Intelligent Robots and Systems, pages 2151–2156. IEEE. — Cited on
page 15.

Beckhaus, S., Ritter, F., and Strothotte, T. (2000). CubicalPath - Dynamic Potential Fields
for Guided Exploration in Virtual Environments. In Pacific Conference on Computer
Graphics and Applications, pages 387–459. — Cited on pages 15, 18, and 35.

Bernhard, M., Stavrakis, E., and Wimmer, M. (2010). An empirical pipeline to derive gaze
prediction heuristics for 3D action games. ACM Transactions on Applied Perception,
8(1):4:1–4:30. — Cited on page 21.

Blinn, J. (1988). Where Am I? What Am I Looking At? IEEE Computer Graphics and
Applications, 8(4):76–81. — Cited on page 11.

Bohlin, R. and Kavraki, L. (2000). Path planning using lazy PRM. In IEEE International
Conference on Robotics and Automation, volume 1, pages 521–528. IEEE. — Cited on
pages 44 and 45.

Bourne, O., Sattar, A., and Goodwin, S. (2008). A Constraint-Based Autonomous 3D
Camera System. Journal of Constraints, 13(1-2):180–205. — Cited on pages 13, 14, 15,
17, 18, 38, and 81.

Branke, J. (2001). Evolutionary Optimization in Dynamic Environments. Kluwer Academic
Publishers. — Cited on pages 18 and 132.

Burelli, P., Di Gaspero, L., Ermetici, A., and Ranon, R. (2008). Virtual Camera Composi-
tion with Particle Swarm Optimization. In International symposium on Smart Graphics,
pages 130–141. Springer. — Cited on pages 13, 14, 18, 42, and 81.

Burelli, P. and Jhala, A. (2009a). CamOn: A Real-Time Autonomous Camera Control Sys-
tem. In AAAI Conference On Artificial Intelligence In Interactive Digital Entertainment,
Palo Alto. AAAI. — Cited on pages 5 and 23.

Burelli, P. and Jhala, A. (2009b). Dynamic Artificial Potential Fields for Autonomous
Camera Control. In AAAI Conference On Artificial Intelligence In Interactive Digital
Entertainment, Palo Alto. AAAI. — Cited on pages 5, 23, 34, 35, 43, and 81.

Burelli, P., Preuss, M., and Yannakakis, G. N. (2012). Optimising for Multiple Shots: An
Analysis of Solutions Diversity in Virtual Camera Composition. In FDG Workshop On
Intelligent Camera Control and Editing. — Cited on page 132.


Burelli, P. and Yannakakis, G. N. (2010a). Combining Local and Global Optimisation for
Virtual Camera Control. In IEEE Conference on Computational Intelligence and Games,
page 403. — Cited on pages 5, 23, 34, 40, 42, 46, and 133.

Burelli, P. and Yannakakis, G. N. (2010b). Global Search for Occlusion Minimisation in
Virtual Camera Control. In IEEE Congress on Evolutionary Computation, pages 1–8,
Barcelona. IEEE. — Cited on pages 5, 23, 35, 43, 64, 74, and 81.

Burelli, P. and Yannakakis, G. N. (2011). Towards Adaptive Virtual Camera Control In
Computer Games. In International symposium on Smart Graphics. — Cited on pages 5,
49, and 99.

Burelli, P. and Yannakakis, G. N. (2012a). Automatic Camera Control in Computer Games:
Design And Evaluation. Under Journal Review. — Cited on pages 5, 15, 61, and 87.

Burelli, P. and Yannakakis, G. N. (2012b). Real-Time Virtual Camera Composition In
Dynamic Environments. Under Journal Review. — Cited on pages 5, 15, 18, 23, 61,
and 73.

Calinski, T. and Harabasz, J. (1974). A Dendrite Method For Cluster Analysis. Commu-
nications in Statistics, 3(1):1–27. — Cited on pages 56 and 104.

Cardamone, L., Loiacono, D., and Lanzi, P. L. (2009). On-line Neuroevolution Applied to
The Open Racing Car Simulator. In IEEE Congress on Evolutionary Computation. —
Cited on page 19.

Caruana, R., Lawrence, S., and Giles, L. (2001). Overfitting in Neural Nets : Backprop-
agation, Conjugate Gradient and Early Stopping. In Advances in Neural Information
Processing Systems, pages 402–408. MIT Press. — Cited on page 57.

Charles, D. and Black, M. (2004). Dynamic Player Modelling: A Framework for Player-
centred Digital Games. In International Conference on Computer Games: Artificial
Intelligence, Design and Education, pages 19–35. — Cited on page 19.

Charles, F., Lugrin, J.-l., Cavazza, M., and Mead, S. J. (2002). Real-time camera con-
trol for interactive storytelling. In International Conference for Intelligent Games and
Simulations, pages 1–4, London. — Cited on pages 12 and 131.

Christianson, D., Anderson, S., He, L.-w., Salesin, D. H., Weld, D., and Cohen, M. F.
(1996). Declarative Camera Control for Automatic Cinematography. In AAAI, pages
148–155. AAAI Press. — Cited on page 12.


Christie, M. and Hosobe, H. (2006). Through-the-lens cinematography. In Butz, A., Fisher,
B., Krüger, A., and Olivier, P., editors, International symposium on Smart Graphics,
volume 4073 of Lecture Notes in Computer Science, pages 147–159, Berlin, Heidelberg.
Springer Berlin Heidelberg. — Cited on page 11.

Christie, M., Languénou, E., and Granvilliers, L. (2002). Modeling Camera Control with
Constrained Hypertubes. In International Conference on Principles and Practice of Con-
straint Programming, pages 618–632, London, UK. Springer-Verlag. — Cited on page 131.

Christie, M. and Normand, J.-M. (2005). A Semantic Space Partitioning Approach to
Virtual Camera Composition. Computer Graphics Forum, 24(3):247–256. — Cited on
pages 13 and 18.

Christie, M., Olivier, P., and Normand, J.-M. (2008). Camera Control in Computer Graph-
ics. In Computer Graphics Forum, volume 27, pages 2197–2218. — Cited on pages 3, 10,
13, and 15.

Cobb, H. G. (1990). An investigation into the use of hypermutation as an adaptive operator
in genetic algorithms having continuous, time-dependent nonstationary environments.
Technical report, Naval Research Laboratory, Washington, DC. — Cited on page 18.

Corke, P. I. (1994). Visual Control Of Robot Manipulators – A Review. In Visual Servoing,
pages 1–31. World Scientific. — Cited on page 9.

Davies, D. L. and Bouldin, D. W. (1979). A Cluster Separation Measure. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 1(2):224–227. — Cited on pages 56 and 104.

Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische
Mathematik, 1(1):269–271. — Cited on page 45.

Drachen, A., Canossa, A., and Yannakakis, G. N. (2009). Player modeling using self-
organization in Tomb Raider: Underworld. In IEEE Symposium on Computational In-
telligence and Games, pages 1–8. IEEE. — Cited on page 20.

Drucker, S. M., Galyean, T. A., and Zeltzer, D. (1992). CINEMA: a system for procedu-
ral camera movements. In ACM SIGGRAPH Symposium on Interactive 3D Graphics,
page 67. — Cited on page 131.

Drucker, S. M. and Zeltzer, D. (1994). Intelligent camera control in a virtual environment.
In Graphics Interface, pages 190–199. — Cited on pages 11 and 35.


Dunn, J. C. (1973). A Fuzzy Relative of the ISODATA Process and Its Use in Detecting
Compact Well-Separated Clusters. Cybernetics and Systems, 3(3):32–57. — Cited on
page 104.

El-Nasr, M. S. and Yan, S. (2006). Visual Attention in 3D Video Games. In ACM SIGCHI
international conference on Advances in computer entertainment technology. — Cited on
page 20.

Filliat, D. and Meyer, J.-A. (2002). Global localization and topological map-learning for
robot navigation. In From Animals to Animats. — Cited on page 10.

Floudas, C. A. and Pardalos, P. M. (1990). A collection of test problems for constrained
global optimization algorithms. Springer-Verlag New York. — Cited on page 74.

Frazzoli, E., Dahleh, M., and Feron, E. (2000). Robust hybrid control for autonomous
vehicle motion planning. In IEEE Conference on Decision and Control, volume 1, pages
821–826. IEEE. — Cited on page 9.

Gleicher, M. and Witkin, A. (1992). Through-the-lens camera control. In ACM SIGGRAPH,
volume 26, page 331. — Cited on page 11.

Glover, F. W. and Laguna, M. (1997). Tabu Search. Springer. — Cited on page 18.

Goldberg, D. and Smith, R. (1987). Nonstationary function optimization using genetic algo-
rithms with dominance and diploidy. In International Conference on Genetic Algorithms,
pages 59–69. — Cited on page 18.

Hadley, G. (1962). Linear programming. Addison-Wesley. — Cited on page 18.

Hagelbäck, J. and Johansson, S. J. (2009). A Multi-agent Potential Field based bot for a
Full RTS Game Scenario. In AAAI Conference On Artificial Intelligence In Interactive
Digital Entertainment. — Cited on page 19.

Halper, N., Helbing, R., and Strothotte, T. (2001). A Camera Engine for Computer Games:
Managing the Trade-Off Between Constraint Satisfaction and Frame Coherence. Com-
puter Graphics Forum, 20(3):174–183. — Cited on pages 14, 16, and 133.

Halper, N. and Olivier, P. (2000). CamPlan: A Camera Planning Agent. In International
symposium on Smart Graphics, pages 92–100. AAAI Press. — Cited on pages 13 and 18.

Hamaide, J. (2008). A Versatile Constraint-Based Camera System. In AI Game Program-
ming Wisdom 4, pages 467–477. — Cited on page 16.


Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A formal basis for the heuristic determi-
nation of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics,
4(2):100–107. — Cited on page 10.

Hastings, E. J., Guha, R. K., and Stanley, K. O. (2009). Automatic Content Generation in
the Galactic Arms Race Video Game. IEEE Transactions on Computational Intelligence
and AI in Games, 1(4):1–19. — Cited on page 19.

Haykin, S. (2008). Neural Networks and Learning Machines. Prentice Hall. — Cited on
pages 56 and 57.

He, L.-w., Cohen, M. F., and Salesin, D. H. (1996). The virtual cinematographer: a
paradigm for automatic real-time camera control and directing. In ACM SIGGRAPH,
pages 217–224. ACM Press. — Cited on page 12.

Holland, J. H. (1992). Genetic Algorithms. Scientific American, 267:66–72. — Cited on
pages 6, 18, and 39.

Houlette-Stottler, R. (2004). Player Modeling for Adaptive Games. In AI Game Program-
ming Wisdom 2, pages 557–566. Charles River Media, Inc. — Cited on page 19.

Hubert, L. J. and Levin, J. R. (1976). Evaluating object set partitions: Free-sort analysis
and some generalizations. Journal of Verbal Learning and Verbal Behavior, 15(4):459–470.
— Cited on page 104.

Irwin, D. E. (2004). Fixation Location and Fixation Duration as Indices of Cognitive
Processing. In The interface of language, vision, and action: Eye movements and the
visual world, chapter 3, pages 105–133. Psychology Press, New York. — Cited on page 53.

Jardillier, F. and Languénou, E. (1998). Screen-Space Constraints for Camera Movements:
the Virtual Cameraman. Computer Graphics Forum, 17(3):175–186. — Cited on pages 12
and 13.

Jhala, A. and Young, R. M. (2005). A discourse planning approach to cinematic camera
control for narratives in virtual environments. In AAAI, pages 307–312, Pittsburgh,
Pennsylvania, USA. AAAI Press. — Cited on pages 12 and 55.

Jhala, A. and Young, R. M. (2009). Evaluation of intelligent camera control systems based
on cognitive models of comprehension. In International Conference On The Foundations
Of Digital Games, pages 327–328. — Cited on page 12.


Jhala, A. and Young, R. M. (2010). Cinematic Visual Discourse: Representation, Gen-
eration, and Evaluation. IEEE Transactions on Computational Intelligence and AI in
Games, 2(2):69–81. — Cited on pages 12, 24, and 131.

Jones, T. and Forrest, S. (1995). Fitness Distance Correlation as a Measure of Problem
Difficulty for Genetic Algorithms. In International Conference on Genetic Algorithms,
pages 184–192. Morgan Kaufmann. — Cited on page 74.

Kavraki, L., Svestka, P., Latombe, J.-C., and Overmars, M. (1994). Probabilistic Roadmaps
for Path Planning in High-Dimensional Configuration Spaces. In IEEE International
Conference on Robotics and Automation, page 171. — Cited on pages 6, 10, and 45.

Kennedy, J. and Eberhart, R. C. (1995). Particle swarm optimization. In IEEE Conference
on Neural Networks, pages 1942–1948. — Cited on page 41.

Khatib, O. (1986). Real-time obstacle avoidance for manipulators and mobile robots. The
International Journal of Robotics Research, 5(1):90–98. — Cited on pages 6, 9, and 35.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated an-
nealing. Science, 220:671–680. — Cited on page 18.

Kittler, J. (1978). Feature Set Search Algorithms. Pattern Recognition and Signal Process-
ing, pages 41–60. — Cited on page 107.

Krzanowski, W. J. and Lai, Y. T. (1988). A criterion for determining the number of groups
in a data set using sum-of-squares clustering. Biometrics, 44(1):23–34. — Cited on
pages 56 and 104.

Kyung, M. and Kim, M. (1995). Through-the-lens camera control with a simple Jacobian
matrix. In Graphics Interface. — Cited on page 11.

Land, M. F. (2009). Vision, eye movements, and natural behavior. Visual neuroscience,
26(1):51–62. — Cited on page 53.

Land, M. F. and Tatler, B. W. (2009). Looking and Acting: Vision and Eye Movements in
Natural Behavior. Oxford University Press. — Cited on page 52.

Lino, C., Christie, M., Lamarche, F., Schofield, G., and Olivier, P. (2010). A Real-time
Cinematography System for Interactive 3D Environments. In Eurographics/ACM SIG-
GRAPH Symposium on Computer Animation, pages 139–148. — Cited on pages 14
and 32.


Lucas, S. M. (2005). Evolving a Neural Network Location Evaluator to Play Ms. Pac-Man.
In IEEE Symposium on Computational Intelligence and Games. — Cited on page 19.

Lucas, S. M. (2008). Computational intelligence and games: Challenges and opportunities.
International Journal of Automation and Computing, 5(1):45–57. — Cited on page 19.

Mackinlay, J. D., Card, S. K., and Robertson, G. G. (1990). Rapid controlled movement
through a virtual 3D workspace. In ACM SIGGRAPH, volume 24, pages 171–176, Dallas.
— Cited on page 10.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate obser-
vations. In Berkeley Symposium on Mathematical Statistics and Probability, volume 1,
pages 281–297. — Cited on page 55.

Mahlmann, T., Drachen, A., Togelius, J., Canossa, A., and Yannakakis, G. N. (2010).
Predicting Player Behavior in Tomb Raider: Underworld. In IEEE Conference on Com-
putational Intelligence and Games. — Cited on page 52.

Mahlmann, T., Togelius, J., and Yannakakis, G. N. (2011). Towards Procedural Strategy
Game Generation : Evolving Complementary Unit Types. In European Conference on
Applications of Evolutionary Computation. — Cited on page 19.

Marchand, E. and Courty, N. (2002). Controlling a camera in a virtual environment. The
Visual Computer, 18:1–19. — Cited on page 14.

Martinez, H. P., Jhala, A., and Yannakakis, G. N. (2009). Analyzing the impact of camera
viewpoint on player psychophysiology. In International Conference on Affective Comput-
ing and Intelligent Interaction and Workshops, pages 1–6. IEEE. — Cited on page 6.

Moravec, H. and Elfes, A. E. (1985). High Resolution Maps from Wide Angle Sonar. In
IEEE International Conference on Robotics and Automation, pages 116 – 121. — Cited
on pages 9 and 10.

Munoz, J., Yannakakis, G. N., Mulvey, F., Hansen, D. W., Gutierrez, G., and Sanchis, A.
(2011). Towards Gaze-Controlled Platform Games. In IEEE Conference on Computa-
tional Intelligence and Games, pages 47–54. — Cited on page 20.

Nacke, L., Stellmach, S., Sasse, D., and Lindley, C. A. (2009). Gameplay experience in a
gaze interaction game. In Communication by Gaze Interaction, pages 49–54. — Cited on
page 20.


Nieuwenhuisen, D. and Overmars, M. H. (2004). Motion Planning for Camera Movements
in Virtual Environments. In IEEE International Conference on Robotics and Automation,
pages 3870–3876. IEEE. — Cited on page 15.

Olivier, P., Halper, N., Pickering, J., and Luna, P. (1999). Visual Composition as Optimi-
sation. In Artificial Intelligence and Simulation of Behaviour. — Cited on pages 12, 13,
18, and 25.

Oskam, T., Sumner, R. W., Thuerey, N., and Gross, M. (2009). Visibility transition planning
for dynamic camera control. In ACM SIGGRAPH/Eurographics Symposium on Computer
Animation, pages 55–65. — Cited on page 15.

Phillips, C. B., Badler, N. I., and Granieri, J. (1992). Automatic viewing control for 3D
direct manipulation. In ACM SIGGRAPH Symposium on Interactive 3D graphics, pages
71–74, Cambridge, Massachusetts, USA. ACM Press. — Cited on page 11.

Picardi, A., Burelli, P., and Yannakakis, G. N. (2011). Modelling Virtual Camera Be-
haviour Through Player Gaze. In International Conference On The Foundations Of
Digital Games. — Cited on pages 5, 49, and 99.

Pickering, J. (2002). Intelligent Camera Planning for Computer Graphics. PhD thesis,
University of York. — Cited on pages 13, 14, and 18.

Pinelle, D. and Wong, N. (2008). Heuristic evaluation for games. In CHI '08, page 1453,
New York, New York, USA. ACM Press. — Cited on page 2.

Pontriagin, L. S. (1962). Mathematical Theory of Optimal Processes. Interscience Publishers.
— Cited on page 11.

Preuss, M., Burelli, P., and Yannakakis, G. N. (2012). Diversified Virtual Camera Composi-
tion. In European Conference on the Applications of Evolutionary Computation, Malaga.
— Cited on page 132.

Ramsey, C. and Grefenstette, J. (1993). Case-based initialization of genetic algorithms. In
International Conference on Genetic Algorithms, pages 84–91. — Cited on page 18.

Riedmiller, M. and Braun, H. (1993). A direct adaptive method for faster backpropagation
learning: the RPROP algorithm. In IEEE International Conference on Neural Networks,
pages 586–591. IEEE. — Cited on page 106.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representa-
tions by error propagation. Nature, 323:533–536. — Cited on page 57.


Russell, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach. Prentice
Hall, 3rd edition. — Cited on page 17.

Salichs, M. A. and Moreno, L. (2000). Navigation of mobile robots: open questions. Robot-
ica, 18(3):227–234. — Cited on page 9.

Schmalstieg, D. and Tobler, R. F. (1999). Real-time Bounding Box Area Computation.
Technical report, Vienna University of Technology. — Cited on page 27.

Schönemann, L. (2007). Evolution Strategies in Dynamic Environments. In Yang, S.,
Ong, Y.-S., and Jin, Y., editors, Evolutionary Computation in Dynamic and Uncertain
Environments, volume 51 of Studies in Computational Intelligence, pages 51–77. Springer
Berlin Heidelberg, Berlin, Heidelberg. — Cited on pages 35, 43, 82, 85, and 126.

Schwefel, H.-P. (1981). Numerical Optimization of Computer Models. John Wiley & Sons.
— Cited on page 18.

Shaker, N., Yannakakis, G. N., and Togelius, J. (2010). Towards automatic personalized
content generation for platform games. In AAAI Conference On Artificial Intelligence In
Interactive Digital Entertainment. — Cited on pages 16 and 19.

Snedecor, G. W. and Cochran, W. G. (1989). Statistical Methods. Iowa State University
Press. — Cited on page 107.

Soanes, C. and Stevenson, A. (2005). Oxford Dictionary of English. Oxford University
Press. — Cited on page 1.

Storn, R. (1996). On the usage of differential evolution for function optimization. In North
American Fuzzy Information Processing, pages 519–523. IEEE. — Cited on page 81.

Storn, R. and Lampinen, J. A. (2004). Differential Evolution. In New Optimization Tech-
niques in Engineering, chapter 6, pages 123–166. Springer-Verlag. — Cited on pages 41
and 81.

Storn, R. and Price, K. (1997). Differential Evolution – A Simple and Efficient Heuristic for
global Optimization over Continuous Spaces. Journal of Global Optimization, 11(4):341–
359. — Cited on page 41.

Sundstedt, V., Stavrakis, E., Wimmer, M., and Reinhard, E. (2008). A psychophysical
study of fixation behavior in a computer game. In Symposium on Applied perception in
graphics and visualization, pages 43–50. ACM. — Cited on page 21.

Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction (Adaptive
Computation and Machine Learning). The MIT Press. — Cited on page 10.


Tesauro, G. (1994). TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-
Level Play. Neural Computation, 6(2):215–219. — Cited on page 19.

Thawonmas, R., Kurashige, M., Keita, I., and Kantardzic, M. (2006). Clustering of Online
Game Users Based on Their Trails Using Self-organizing Map. In International Conference
on Entertainment Computing, pages 366–369. — Cited on page 20.

Thrun, S. (1995). Learning To Play the Game of Chess. Advances in Neural Information
Processing Systems, 7. — Cited on page 19.

Thue, D., Bulitko, V., Spetch, M., and Wasylishen, E. (2007). Interactive Storytelling: A
Player Modelling Approach. In AAAI Conference On Artificial Intelligence In Interactive
Digital Entertainment, pages 43–48, Stanford, CA. — Cited on page 20.

Thurau, C., Bauckhage, C., and Sagerer, G. (2003). Combining Self Organizing Maps and
Multilayer Perceptrons to Learn Bot-Behavior for a Commercial Game. In GAME-ON,
pages 119–123. — Cited on page 20.

Thurau, C., Bauckhage, C., and Sagerer, G. (2004). Learning human-like movement behav-
ior for computer games. In International Conference on Simulation of Adaptive Behavior,
pages 315–323. — Cited on page 20.

Togelius, J., Karakovskiy, S., and Koutník, J. (2009). Super Mario Evolution. In IEEE Sym-
posium on Computational Intelligence and Games, pages 156–161. — Cited on page 19.

Togelius, J., Nardi, R. D., and Lucas, S. M. (2006). Making Racing Fun Through Player
Modeling and Track Evolution. In SAB Workshop on Adaptive Approaches to Optimizing
Player Satisfaction. — Cited on page 19.

Togelius, J., Yannakakis, G. N., Stanley, K. O., and Browne, C. (2010). Search-based
Procedural Content Generation. In European Conference on Applications of Evolutionary
Computation, pages 1–10. — Cited on page 19.

Tomlinson, B., Blumberg, B., and Nain, D. (2000). Expressive autonomous cinematography
for interactive virtual environments. In International Conference on Autonomous Agents,
page 317. — Cited on pages 4, 12, and 49.

Törn, A., Ali, M., and Viitanen, S. (1999). Stochastic Global Optimization: Problem
Classes and Solution Techniques. Journal of Global Optimization, 14(4):437–447. —
Cited on pages 74, 75, 76, and 78.


Ursem, R. K. (2000). Multinational GAs: Multimodal Optimization Techniques in Dy-
namic Environments. In Genetic and Evolutionary Computation Conference. — Cited
on page 18.

Vavak, F., Jukes, K., and Fogarty, T. (1997). Learning the local search range for genetic
optimisation in nonstationary environments. IEEE International Conference on Evolu-
tionary Computation, pages 355–360. — Cited on page 18.

Ware, C. and Osborne, S. (1990). Exploration and virtual camera control in virtual three
dimensional environments. ACM SIGGRAPH, 24(2):175–183. — Cited on page 10.

Wirth, N. and Gallagher, M. (2008). An influence map model for playing Ms. Pac-Man.
In IEEE Symposium On Computational Intelligence and Games, pages 228–233. IEEE. —
Cited on page 19.

Wolf, M. J. P. (2001). Genre and the video game. In Wolf, M. J. P., editor, The medium of
the video game, chapter 6, pages 113–134. University of Texas Press. — Cited on page 67.

Yannakakis, G. N. and Hallam, J. (2011). Rating vs. Preference: A comparative study of
self-reporting. In Affective Computing and Intelligent Interaction Conference. Springer-
Verlag. — Cited on page 89.

Yannakakis, G. N., Hallam, J., and Lund, H. H. (2008). Entertainment capture through
heart rate activity in physical interactive playgrounds. User Modeling and User-Adapted
Interaction, 18(1-2):207–243. — Cited on pages 92 and 116.

Yannakakis, G. N. and Maragoudakis, M. (2005). Player modeling impact on player's en-
tertainment in computer games. In Ardissono, L., Brna, P., and Mitrovic, A., editors,
International Conference on User Modeling, volume 3538 of Lecture Notes in Computer
Science, pages 74–78, Berlin, Heidelberg. Springer Berlin Heidelberg. — Cited on pages 19
and 20.

Yannakakis, G. N., Martinez, H. P., and Jhala, A. (2010). Towards Affective Camera
Control in Games. User Modeling and User-Adapted Interaction. — Cited on pages 20
and 133.

Yannakakis, G. N. and Togelius, J. (2011). Experience-driven procedural content generation.
IEEE Transactions on Affective Computing, pages 1–16. — Cited on pages 16 and 20.

Yarbus, A. L. (1967). Eye movements and vision. Plenum press. — Cited on pages 20, 52,
and 103.
