A Review of Film Editing Techniques For
Remi Ronfard
INRIA, LJK, Université de Grenoble, France
[email protected]
and techniques, and identify both promising avenues and hot topics for future research.

2. CINEMATOGRAPHY AND EDITING
One fundamental part of cinematography, as outlined in Mascelli's 5C's of cinematography [14], is to provide shots that can easily be edited together. In the early days of cinema, the interplay between cinematography and editing was a matter of trial and error. As noted by Barry Salt [20], it took several years before cinematographers and editors understood the "exit left, enter right" editing rule. Before that, the rule was usually obeyed because it appeared to work better in most cases. But the "wrong" solution was still used from time to time. When it finally became clear what the "right" solution was, cinematographers stopped shooting the alternate solution because they knew it was useless. After more than a century of cinema, good professional cinematographers have thus "internalized" the rules of editing in such a way that they can avoid shots that will not cut together. In games, we are probably still at an earlier stage, because it is not yet quite clear how the rules of cinematography should translate to an interactive game, which is a very different situation from a movie.

In computer graphics, the camera is controlled by animators. A good professional animator should have a similar sense of which shots will cut together. When this is not the case, editing problems can still be taken into account by following one of several working practices. We mention three of them.

1. Cutting in the head means that the director has already decided very precisely every single shot, usually in the form of a storyboard. In that case, it suffices to shoot each action or beat in the screenplay from a single viewpoint. Textbooks in film-making warn against the dangers of this method because it cannot recover easily from errors in planning.

This approach is very suitable for real-time applications. It consists in planning the editing first, resulting in a list of shots that can then be rendered exactly as planned, following the timeline of the final movie. One drawback of this approach is that the animation itself cannot always be predicted in all its actual details. As a result, it may be difficult to plan exactly when to cut from shot to shot.

2. Three-take technique. A common variant of "cutting in the head" consists in shooting a little more of the action from each planned camera position. As a result, each action is shot from three camera positions: one according to the shot list, one from the immediately previous viewpoint, and one from the next viewpoint. This has the advantage that the exact cutting point can be resolved at a later stage.

3. Master-shot technique. Another common practice consists in planning all the camera work for shooting the scene in one continuous take (the "master shot") and then adding shots of various sizes (close-ups and medium shots) to show the details of the action. Editing can then be more carefully prepared by ensuring that all those shots will cut nicely with the master shot, resulting in a typical "master, close-up, master, close-up" sequence.
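As an illustration, the three-take technique can be sketched in a few lines: each planned (action, camera) pair is expanded so the action is also covered from the neighboring planned viewpoints. The `Take` record and the shot-list format below are hypothetical, chosen only for the sketch; they do not come from any of the systems discussed here.

```python
from dataclasses import dataclass

@dataclass
class Take:
    action: str      # action or beat being covered
    camera: str      # camera position used for this take

def three_take_coverage(shot_list):
    """Expand a planned shot list so that each action is also covered
    from the previous and next planned viewpoints."""
    takes = []
    for i, (action, camera) in enumerate(shot_list):
        cameras = {camera}
        if i > 0:
            cameras.add(shot_list[i - 1][1])   # previous viewpoint
        if i + 1 < len(shot_list):
            cameras.add(shot_list[i + 1][1])   # next viewpoint
        takes.append([Take(action, c) for c in sorted(cameras)])
    return takes

# Planned shot list: (action, camera) pairs, one viewpoint per beat.
plan = [("A greets B", "cam1"), ("B answers", "cam2"), ("A leaves", "cam3")]
coverage = three_take_coverage(plan)
```

The extra coverage is what lets the exact cutting point be chosen later, in the edit, rather than at planning time.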
Note that those techniques are very useful in practice because they are more general than "film idioms", where the camera positions are prescribed once and for all.

3. AUTOMATIC CAMERA EDITING
This section covers the approaches that draw on a theory of film editing for planning and performing camera placement and composition. Here, scenes are described in terms of actions and communicative goals that must be translated into successive shots. Cutting between cameras adds considerable freedom in the focalization and order of presentation of the visual material. Cutting between cameras also introduces constraints. We review the most important constraints and corresponding rules (the 180-degree rule, the 60-degree rule) and explain how they can be expressed and solved algorithmically. Then, we review the principles that can be used to evaluate the quality of a shot sequence and the algorithmic strategies that can be used to solve for the best sequence. Finally, we review the strengths and limitations of some of the existing systems proposed for real-time, live editing [He96, Funge98] as well as offline, post-production editing [Elson07], and sketch promising future directions for research in this area.

Automatic film editing has a long history, dating back at

Satisfying these constraints goes a long way toward preventing editing errors of all orders. But that is of course not the entire story, because there are still infinitely many "correct" camera pairs that can be cut together at any given time. A second part of automated editing is therefore to evaluate when to cut to which shot.

The classical Hollywood concept of editing [14] recommends that successive shots should minimize perceptually disruptive transitions. The modern viewpoint [9] stresses the consistency of the narrative structure, which overrules disturbing transitions, as attention will primarily be directed to grasping the succession of significant events in the story. A good computational theory of film editing should probably stand in the middle ground between those two viewpoints. On the one hand, it is difficult to get a good model of "perceptually disruptive transitions". At best, a computational model may be expected to avoid the most obvious mistakes, still leaving a large number of possibilities. On the other hand, the narrative structure of an animated scene may not always be easily uncovered, again leaving multiple choices.

Few editors have written about their art with more depth than Walter Murch [16]. In his book, he introduces a Rule of Six, with six layers of increasing complexity and importance in the choice of how and when to cut between shots:
hal-00694444, version 1 - 4 May 2012
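The 180-degree rule mentioned above has a simple geometric reading in the ground plane: two cameras can be cut together only if they lie on the same side of the line of action joining the two actors. The following is a minimal sketch of that textbook test, not the exact formulation used by any of the systems reviewed here; positions are assumed to be 2-D (x, y) pairs.

```python
# Illustrative check of the 180-degree rule in the ground plane: two
# cameras may be cut together only if they lie on the same side of the
# line of action joining the two actors.

def side(a, b, p):
    """Sign of the cross product (b - a) x (p - a):
    > 0 if p is left of the line a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def respects_180(actor_a, actor_b, cam1, cam2):
    """True if both cameras lie strictly on the same side of the line
    of action, so cutting between them preserves screen direction."""
    return side(actor_a, actor_b, cam1) * side(actor_a, actor_b, cam2) > 0

# Two actors facing each other along the x axis.
a, b = (0.0, 0.0), (4.0, 0.0)
ok = respects_180(a, b, (1.0, 2.0), (3.0, 1.5))    # same side: legal cut
bad = respects_180(a, b, (1.0, 2.0), (3.0, -1.5))  # crosses the line
```

The same sign test, applied pairwise over a set of candidate cameras, is enough to rule out the camera pairs that would reverse screen direction.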
editing. Computing the screen size and visibility of actors and objects in a shot is the easy part. Computing their importance in the plot is the really difficult part.

In a scripted sequence, it seems reasonable to assume that the scripted actions are all equally important. Thus, at any given time, the importance of actors and objects can be approximated as the number of actions in which they are taking part, divided by the total number of actions being executed in the scene at that time. Other approximations are of course possible. For instance, it may be preferable to assign all the attention to a single action at all times. This may be implemented with a "winner takes all" strategy.

3.1.5 Emotion.
Emotion is hardest to evaluate. There is a large body of research being done in neuroscience on emotion. This research distinguishes between primitive emotions, such as surprise, fear, laughter, etc., whose action is very fast; primitive moods, such as sadness or joy, whose action is much slower; and learned, cognitive affects such as love, guilt, shame, etc.

For the purpose of editing, evaluating the emotional impact of any given shot or cut appears to be very difficult. Emotional cues can be received from the screenplay or from the director's notes. They assert which emotions should be conveyed at any given point in time. Given such emotional cues, we can then apply simple recipes: separating actors or showing them closer together; changing editing rhythm or shot sizes to show increasing or decreasing tension; using lower camera angles to show ceilings and convey oppression; using higher camera angles to hide ceilings and convey freedom; using longer lenses to slow down actor movements and isolate them from the background; using wider lenses to accelerate actor movements and put them in perspective; etc. How simple should those strategies be? Too simple a solution may look foolish. Too complicated a solution may be out of reach.

3.3 Declarative approaches
In the beginning, automatic editing was attempted with traditional, rule-based systems.

IDIC by Sack and Davis [19] was one of the first systems to attempt automatic film editing from annotated movie shots. Mostly a sketch of what is possible, it was based on the general problem solver (GPS), a very simple forward planner [18].

"Declarative Camera Control for Automatic Cinematography" is a much more elaborate attempt at formalizing the editing of an animated movie, this time using modern planning techniques [4]. In that paper, idioms are not described in terms of cameras in world coordinates but in terms of shots in screen coordinates, through the use of the DCCL language. DCCL is compiled into a film tree, which contains all the possible editings of the input actions. Actions are represented as subject-verb-object triples. As in the Virtual Cinematographer companion paper, the programming effort for implementing an idiom is important.

Jhala and Young have used text generation techniques to automatically edit shots together using "plan operators" [12]. In another paper, Jhala and Young have used examples from the movie "The Rope" by Alfred Hitchcock to emphasize

the actors. The VC takes as input strings of simple sentences, SUBJECT+VERB+OBJECT, representing the action taking place in the scene. The VC also takes as input a continuous stream of bounding boxes and orientations, representing the relative geometric positions and orientations of the virtual actors, objects and scene elements.

Idioms are usually chosen based on the next action string. More complex editing patterns can also be achieved by defining hierarchical state machines, encoding the transitions between idioms.

While powerful, this scheme has yet to demonstrate that it can be used in practical situations. One reason may be that there is a heavy burden on the application programmer, who must encode all idioms for all narrative situations. Another reason may be that the resulting editing may be too predictable.

In a finite state machine, the switching of a camera is triggered by the next action string. This may have the undesirable effect that the switching becomes too predictable. A good example is the "dragnet" style of editing [16], where the camera consistently switches to a close-up of the speaker on each speaker change, then back to a reaction shot of the other actors being spoken to. This can become especially annoying when the speakers alternate very quickly.

While it is possible to use the dragnet style of editing as a separate film idiom, this causes the number of idioms to explode, since every configuration can be filmed in dragnet style. A better solution separates the camera set-ups from the state machines: for each set-up, different styles can then be encoded with different state machines. But the same "style" must still be separately re-encoded for each set-up.

It is not obvious how to "generalize" film idioms. This is an open problem for procedural approaches.

3.4 Optimization approaches
To overcome the problems of procedural and AI-based declarative approaches, it seems natural to rephrase the editing problem as an optimization problem. In this section, we revisit the editing constraints listed above and illustrate how they can be used to build a quality function.

Let us review the common case of a dialog scene between two actors. We are given a sequence of time intervals, each of which may include actions performed by the two characters A and B:

(a1, b1, t1), (a2, b2, t2), ..., (an, bn, tn)

A solution to the automatic editing problem is a sequence of shots

(c1, s1, t1), (c2, s2, t2), ..., (cn, sn, tn)

where each shot si is taken by camera ci starting at time ti. Cuts occur whenever the camera changes between successive intervals. Reframing actions occur when the camera remains the same but the shot descriptions change. Transitions between the same shots result in longer shots, and we write the duration of shot i as ∆i.
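This optimization view can be sketched as a shortest-path problem over shot choices, solvable by dynamic programming. The cost terms below (a per-interval shot-suitability score and a fixed cut penalty) are illustrative assumptions, not the quality function of any particular published system.

```python
# Minimal dynamic-programming sketch for choosing one camera per time
# interval so as to minimize total editing cost. The costs are toy
# assumptions: a suitability cost per (interval, camera) pair and a
# fixed penalty for every cut (camera change between intervals).

CUT_PENALTY = 1.0

def best_edit(intervals, cameras, shot_cost):
    """intervals: list of interval descriptions (e.g. (a_i, b_i, t_i)).
    cameras: list of camera identifiers.
    shot_cost(interval, camera) -> float, lower is better.
    Returns (total_cost, camera chosen for each interval)."""
    n = len(intervals)
    # cost[k][c] = best cost of editing intervals 0..k ending on camera c
    cost = [{c: shot_cost(intervals[0], c) for c in cameras}]
    back = [dict()]
    for k in range(1, n):
        row, brow = {}, {}
        for c in cameras:
            prev, pc = min(
                (cost[k - 1][p] + (CUT_PENALTY if p != c else 0.0), p)
                for p in cameras)
            row[c] = prev + shot_cost(intervals[k], c)
            brow[c] = pc
        cost.append(row)
        back.append(brow)
    # Backtrack from the cheapest final camera.
    last = min(cost[-1], key=cost[-1].get)
    choice = [last]
    for k in range(n - 1, 0, -1):
        choice.append(back[k][choice[-1]])
    choice.reverse()
    return cost[-1][last], choice

# Toy dialog: prefer a close-up of whoever is speaking.
intervals = ["A speaks", "A speaks", "B speaks"]
cameras = ["CU-A", "CU-B"]
def shot_cost(interval, camera):
    return 0.0 if camera[-1] == interval[0] else 2.0

total, cams = best_edit(intervals, cameras, shot_cost)
```

Because each interval only interacts with its predecessor, the optimum over all camera assignments is found in O(n · |cameras|²) time rather than by enumerating all sequences.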
that are aesthetically pleasing. FACADE by Mateas and Stern is a good example, although with a very simple cinematic look [15].

3.5.2 Automated movie production
Some systems go even beyond camera control and editing, towards fully automated movie production. In 1966, Alfred Hitchcock dreamed of "a machine in which he'd insert the screenplay at one end and the film would emerge at the other end, complete and in color" (Truffaut/Hitchcock, p. 330). In limited ways, this dream can be achieved by combining the staging of virtual actors with the techniques of camera control and editing described in this course. An example is the text-to-scene system by Xtranormal. This includes various declarative shots, one-shots and two-shots, with a variety of camera angles and compositions. Camera placement is automated for declarative shots. Editing is fully automated and makes use of both declarative shots (idioms) and free cameras. This overcomes the traditional limitations associated with a purely idiom-based system. Visibility is taken into account through the use of "stages", i.e. empty spaces with unlimited visibility, similar to Elson and Riedl. Both systems use a simple algebra of "stages", i.e. intersections and unions of stages, allowing for very fast visibility computation against the static elements of the scene. Occlusion between actors is handled separately by taking pictures through the eyes of the actors. The text-to-scene system by Xtranormal is currently limited to short dialogue scenes, although with a rich vocabulary of gestures, facial expressions and movements. But we can expect future improvements and extensions to other scene categories, including action and mood scenes.

3.5.3 Machinima
Most machinima systems include some sort of camera control. For instance, MovieStorm by Short Fuze includes "through-the-lens" camera control with two types of cameras: "free" cameras and cameras that "watch" other actors. In dedicated applications such as Xtranormal, The Sims or MovieStorm, the actors' movements are labeled with higher-level commands, including "looking", "speaking", "pointing", "sitting" or "standing", etc. This is sufficient in principle to motivate the cinematography and editing. In addition, MovieStorm outputs a "movie script" inferred from the choice of actions. On the other hand, text-to-scene systems such as Xtranormal instead use the movie script as an input, and infer the sequence of actions to be performed by the virtual actors from the script.

4. DISCUSSION AND OPEN ISSUES
This final section discusses the problems related to the actual deployment of these techniques and directions for future research, including: augmenting the expressiveness of camera control and switching techniques by considering cognitively well-founded perceptual and aesthetic properties of the shots, including framing and lighting; extending camera models to include the control of other important cinematographic properties such as focus, depth-of-field (DOF) and stereoscopic 3-D depth (interaxial distance and convergence); and learning more general and varied camera control and editing idioms directly from real movies using a variety of data mining and machine learning techniques.

4.1 Perception and aesthetics
The state of the art in automatic framing (composition) and editing relies on a symbolic description of the view seen by the virtual camera. This is powerful, but important aspects of the image are not well taken into account. A possible avenue for future research lies in performing image analysis directly from the virtual camera, to recover other important perceptual and/or aesthetic attributes of the image. This is especially important for lighting [5]. Other image attributes, such as the contrast between figure and background, may be equally important [2].
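One simple proxy for the figure/background contrast mentioned above is to compare mean luminance inside the actor's screen-space bounding box with mean luminance outside it. The sketch below uses a tiny synthetic "image" (rows of gray levels in [0, 1]) in place of a rendered frame, and the Michelson-style metric is an illustrative choice, not one prescribed by the cited work.

```python
# Illustrative figure/background contrast from a virtual camera's frame.
# img is a list of rows of luminance values in [0, 1]; box is the
# actor's screen-space bounding box (x0, y0, x1, y1), half-open.

def region_mean(img, box, inside=True):
    """Mean luminance inside (or outside) the bounding box."""
    x0, y0, x1, y1 = box
    vals = [v for y, row in enumerate(img) for x, v in enumerate(row)
            if (x0 <= x < x1 and y0 <= y < y1) == inside]
    return sum(vals) / len(vals)

def figure_background_contrast(img, box):
    """Michelson contrast between figure and background luminance."""
    f = region_mean(img, box, inside=True)
    b = region_mean(img, box, inside=False)
    return abs(f - b) / (f + b)

# 4x4 frame: a bright 2x2 "actor" on a dark background.
frame = [[0.1, 0.1, 0.1, 0.1],
         [0.1, 0.9, 0.9, 0.1],
         [0.1, 0.9, 0.9, 0.1],
         [0.1, 0.1, 0.1, 0.1]]
contrast = figure_background_contrast(frame, (1, 1, 3, 3))
```

Such a score could be added to a shot quality function, penalizing framings where the figure does not separate well from its background.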
4.2 Level of details
One as yet unexplored area for future research is the relation between cinematography and level-of-details modeling. During the early phases of pre-production and previz, a rough version of the scene with little detail may be sufficient to do the blocking of the actors and the cameras, and even to generate an early version of the editing (a rough cut). The choices made in this stage result in a list of a few shots which need to be rendered at full resolution. Thus, only those parts of the scene that appear in the shot list really need to be modeled and rendered in full detail. In practice, this is not easy to implement because the animation must still appear realistic. Levels of detail are still a problem for physics-based animation and AI.

4.3 Cinematic knowledge
Much of the current research remains limited to simple toy problems such as two-actor dialogues and fights. At this point, there has never been a convincing demonstration of a touching machine-generated love scene. Or a funny machine-generated comic scene. Or a frightening machine-generated horror scene.

REFERENCES
[4] D. B. Christianson, S. E. Anderson, L. He, D. Salesin, D. S. Weld, and M. F. Cohen. Declarative camera control for automatic cinematography. In Proceedings of AAAI '96 (Portland, OR), pages 148–155, 1996.
[5] M. S. El-Nasr. A user-centric adaptive story architecture: borrowing from acting theories. In ACE '04: Proceedings of the 2004 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, pages 109–116, 2004.
[6] D. K. Elson and M. O. Riedl. A lightweight intelligent virtual cinematography system for machinima generation. In AI and Interactive Digital Entertainment, 2007.
[7] D. Friedman and Y. A. Feldman. Automated cinematic reasoning about camera behavior. Expert Systems with Applications, 30(4):694–704, May 2006.
[8] F. Germeys and G. d'Ydewalle. The psychology of film: perceiving beyond the cut. Psychological Research, 71(4):458–466, 2007.
[9] J.-L. Godard. Montage, mon beau souci. Les cahiers du cinéma, 11(65), December 1956.
[10] L. He, M. Cohen, and D. Salesin. The virtual cinematographer: a paradigm for automatic real-time camera control and directing. In SIGGRAPH '96, pages 217–224, 1996.
[11] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259, 1998.