Automatic 3D Posing from 2D Hand-Drawn Sketches
1 Centre for Digital Entertainment, National Centre for Computer Animation, Bournemouth University, UK
2 Hibbert Ralph Animation, UK
Figure 1: From initial storyboard to pre-visualisation. The lamp in these three shots is automatically posed using the storyboard
drawing. On the top row, drawings are overlaid on top of the 3D scene with the assets unposed. On the bottom row, the lamp
has been posed automatically using our method.
Abstract
Inferring the 3D pose of a character from a drawing is a non-trivial and under-constrained problem. Solving it
may help automate various parts of an animation production pipeline such as pre-visualisation. In this paper, a
novel way of inferring the 3D pose from a monocular 2D sketch is proposed. The proposed method does not make
any external assumptions about the model, allowing it to be used on different types of characters. The 3D pose
inference is formulated as an optimisation problem, and a parallel variation of the Particle Swarm Optimisation
algorithm called PARAC-LOAPSO is used to search for the minimum. The presented method is evaluated, both in
isolation and as part of a larger scene, by posing a lamp and a horse character. The results show that this
method is robust and can be extended to various types of models.
Categories and Subject Descriptors (according to ACM CCS): I.2.8 [Artificial Intelligence]: Problem Solving, Con-
trol Methods, and Search—Heuristic methods I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—
Shape I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Animation I.4.9 [Image Processing
and Computer Vision]: Applications—
DOI: 10.2312/pgs.20141264
126 A. Gouvatsos / Automatic 3D Posing from 2D Hand-Drawn Sketches
production. These drawings are then used by animators or pose artists to set up the initial pose layout before going into the animation stage. Rather than introducing a new workflow or step, the aim of the proposed method is to automate one of the existing steps. Specifically, the focus is on making the initial posing stage of a production less labour intensive.

Posing 3D models from 2D sketches is a non-trivial open problem, closely related to the computer vision problem of inferring a 3D pose from a 2D photograph, but with several differences. Photographs are usually accurate in terms of scale and include colour information. Sketches may have squashing and stretching, usually lack colour and contain noise in the form of auxiliary strokes, e.g. for shading.

This paper focuses on using sketches to pose 3D characters automatically, for layout or pre-visualisation.

2. Related work

The mainstream method of posing 3D characters is by manipulating controls of a skeletal structure, usually joints [BW76]. Even with the invention of Inverse Kinematics [ZB94], this remains a time-consuming process which requires the user to be trained in specific suites of software.

A drawing represents an internal model the artist has about a scene and it contains no depth information. As such, the problem of inferring the depth from a drawing is under-constrained, creating problems such as the forward-backward ambiguity problem (Figure 2).

Figure 2: An example of the forward-backward ambiguity issue in inferred 3D pose of a lamp when not using the internal lines for additional information.

Automatic 3D posing: In general, inferring a 3D pose from a drawing can be broken down into two sub-problems. Firstly, match the 3D model projection to the drawing. Secondly, infer the missing depth. For 3D productions, the camera represents the position from which the characters in the drawing are viewed, according to the artist's internal model.

A comprehensive overview of progress in this computer vision problem [Gav99] highlights two main types of approach, the explicit shape model approach and the database approach. In this case the focus is on data from drawings rather than photographs or video frames.

Database approach: Previous computer vision research has used database approaches not only for tracking and posing [JSH09] but also for positioning [XCF∗13] in a scene or simply classifying [EHA12]. In animation, Jain et al. [JSH09] follow a database approach in order to infer the missing depth and pose humanoid characters. However, it is able to handle only orthographic camera projections, while the method proposed in this paper can handle both orthographic and perspective projection. Building a database of poses may be expensive, especially since motion capture cannot be easily used for the non-humanoids that are so common in 3D animation productions. As such, an approach that does not require a database was preferred.

Explicit shape model approach: This approach is very effective when there are multiple camera views because the problem is no longer under-constrained (no occlusions). It consists of comparing the rendered 3D pose to the 2D image in order to estimate how close it is to a correct pose. One of its limitations in computer vision is that it requires a 3D model that is a good match to the 2D image. In 3D productions this limitation does not apply, since a 3D model exists anyway.

Favreau et al. successfully pose animals using Radial Basis Functions of 3D pose examples [FRDC06]. Ivekovic et al. [ITP08] infer 3D poses of humans in video sequences using multi-view images. Ivekovic et al. still make human-specific assumptions, limiting the use of their method to a small subset of characters, while both Favreau et al. and Ivekovic et al. use video data and exploit temporal relations between frames.

Posing interfaces: Interfaces to replace the mainstream skeleton manipulation process have been the focus of previous research, e.g. tangible devices [JPG∗14]. Examples of sketch-based interfaces make use of differential blending [OBP∗13] or the line of action to pose 3D characters [GCR13]. These sketch-based interfaces may offer improvements, but are still operated by artists. The proposed method aims to reduce the process to an automatic one that a user with no artistic background is able to perform.

This paper proposes a method for posing 3D models from 2D drawings, aiming at pose layout and pre-visualisation. It makes no assumptions about the 3D model in order to deal with many types of characters. Unlike Jain et al. [JSH09], it does not rely on a database. Instead, a particle swarm optimisation (PSO) variant called Parallel Convergence Local Optima Avoidance PSO (PARAC-LOAPSO) is used. This allows for posing of both cooperative (e.g. humans) and non-cooperative (e.g. animals) characters. It does not require the model to have a skeleton; any type of controller is suitable. Unlike Ivekovic et al. [ITP08], the input data are monocular, stand-alone drawings with no temporal relations between them. Minimal user input is required, in the form of pinpointing the joints on top of the drawing. To avoid forward-backward ambiguity, the proposed method uses a combination of various descriptors [FAK12]. There are currently no other methods that perform automatic posing for the layout phase of animation, that work irrespective of the character model and that can be applied without the need for a pre-existing database.
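The forward-backward ambiguity discussed in this section can be illustrated with a single bone. The sketch below is only an illustration, not part of the proposed method: the function names, the camera placement and the focal length are all hypothetical. It tilts a unit-length bone by the same angle towards and away from the camera and projects its tip with an orthographic and a simple pinhole (perspective) camera.

```python
import math


def bone_tip(length, tilt_deg):
    """Tip of a bone rooted at the origin and tilted out of the image plane.

    The camera is assumed to look along the -z axis, so a positive tilt
    leans the bone towards the camera and a negative tilt leans it away.
    """
    a = math.radians(tilt_deg)
    return (0.0, length * math.cos(a), length * math.sin(a))


def project_orthographic(p):
    """Orthographic projection: depth (z) is simply discarded."""
    x, y, _ = p
    return (x, y)


def project_perspective(p, focal=1.0, camera_z=5.0):
    """Pinhole camera at (0, 0, camera_z) looking down -z (assumed setup)."""
    x, y, z = p
    depth = camera_z - z  # distance from the camera along the view axis
    return (focal * x / depth, focal * y / depth)


forward = bone_tip(1.0, 30.0)    # leaning towards the camera
backward = bone_tip(1.0, -30.0)  # leaning away from the camera

# Vertical gap between the two projected tips under each camera model.
ortho_gap = abs(project_orthographic(forward)[1] - project_orthographic(backward)[1])
persp_gap = abs(project_perspective(forward)[1] - project_perspective(backward)[1])
```

Under the orthographic camera the two poses project to the same 2D point, so joint positions alone cannot separate them. Under the assumed perspective camera they differ only by a small amount of foreshortening.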
of the epochs will be dedicated to exploitation. $t_{max}$ is the maximum number of epochs for a run.

$$X_i^{t+1} = X_i^t + V_i^{t+1} \tag{4}$$

For every particle $i$, its position $X_i$ is updated (equation 4). If the particle belongs to the first population subgroup, its velocity component $V_i^{t+1}$ is updated using equation 5. Otherwise, to attempt to avoid local minima by adding variation, it is updated using equation 6.

$$V_i^{t+1} = K\left[V_i^t + c_1 (B_i^t - X_i^{t+1}) + c_2 (B_g^t - X_i^{t+1})\right] \tag{5}$$

$$V_i^{t+1} = K\left\{V_i^t - L_t \left[r_1 (B_i^t - X_i^{t+1}) + r_2 (B_g^t - X_i^{t+1})\right]\right\} \tag{6}$$

$$L_t = 2 - \frac{2t}{t_{max}} \tag{7}$$

$$r_1, r_2 \sim U(0, 1) \tag{8}$$

$$c_1 = \phi_1 r_1 \tag{9}$$

$$c_2 = \phi_2 r_2 \tag{10}$$

Algorithm: The above components are combined in the following steps to form the overall algorithm.

1. Initialise the swarm population, with global best $B_g^t$.
2. For each particle $i$, with personal best $B_i^t$ (in parallel):
   a. Update particle position (equation 4).
   b. Update descriptors of 3D pose (Section 3.3).
   c. Evaluate fitness of particle (equation 3).
   d. If current fitness < $B_i^t$, set $B_i^t$ to current fitness.
   e. If current fitness < $B_g^t$, set $B_g^t$ to current fitness.

3.3. Comparing drawings to renders

Three ways to compare the drawing with the 3D render were chosen given their previously reported advantages and disadvantages. Joints are used by Jain et al. in order to find close poses from a database. To make use of the internal edges, edge maps were chosen because they are simple, fast and as effective [TR07] as more expensive methods like Shape Context Histograms [AT06]. Finally, silhouettes have been used to find poses generatively [ITP08] or to train models [AT04].

Joints: The user overlays the positions of the joints on top of the drawing (Figure 3). The positions of the overlaid joints on the drawing are compared to the 3D positions of the model's joints, transformed into screen space coordinates.

Edge map: To use internal lines in the drawing, the Canny algorithm [Can86] is used to extract a binary map of the external and internal edges (Figure 4). Then the binary map of the drawing is compared with that of the rendered image.

Silhouette: Finally, for the overall shape, the silhouette of the drawing is used as a binary map (Figure 5). The silhouette distance between the rendered image and the drawing is calculated using a previously successful method [FAK12].

Figure 3: Joints overlaid by a user on top of a drawing of a human.

Figure 4: Edge map of a drawing of a human, extracted using the Canny algorithm.

Figure 5: Silhouette of a drawing of a human.

4. Results

The system was tested on two problems of different dimensionality and difficulty: a lamp (Figure 6) and a horse (Figure 7). The reason for choosing them was to evaluate whether the proposed method is suitable irrespective of the problem. Both models would be difficult to pose via methods such as motion capture, especially the horse. Most motion capture solutions focus on humanoids and are not affordable for smaller studios. Apart from posing models in isolation, the proposed method is evaluated by posing a lamp within a scene using an existing storyboard drawing (Figure 1).

For these tests, the algorithm was run for $t_{max} = 1000$ epochs. The values $\phi_1$ and $\phi_2$, as defined in Section 3, were set to 2.05, which is the default value in the canonical PSO [BK07] and ensures convergence of the swarm. The minimum and maximum velocity of each particle were set to -360.0 and 360.0 respectively, since this is the space of possible rotations. This means that in one epoch, a joint can rotate at most by one complete rotation, either clockwise or counter-clockwise. The minimum and maximum position of each particle were set dynamically based on the angle rotation limits of each joint of the model, reducing the search space of possible poses. The exploration phase (Section 3.2) was set to $T_c = \frac{3}{4}$, so the first $t_{max} T_c = 750$ epochs were dedicated to exploration and the last quarter of epochs was focused on exploitation.
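The update rules in equations 4 to 10, together with the parameter values reported above, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: $K$ is assumed to be the Clerc-Kennedy constriction coefficient of the canonical PSO, $r_1$ and $r_2$ are assumed to be drawn per dimension, the velocity is computed from the particle's current position (the usual PSO reading), and joint limits are assumed to be enforced by clamping. All function names are hypothetical.

```python
import math
import random

PHI1 = PHI2 = 2.05            # phi_1 and phi_2 as reported in the results
T_MAX = 1000                  # epochs per run, as reported
V_MIN, V_MAX = -360.0, 360.0  # velocity bounds in degrees, as reported


def constriction(phi1=PHI1, phi2=PHI2):
    """Clerc-Kennedy constriction coefficient (assumed definition of K)."""
    phi = phi1 + phi2
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def lt(t, t_max=T_MAX):
    """Equation 7: L_t = 2 - 2t / t_max, decaying from 2 to 0 over a run."""
    return 2.0 - 2.0 * t / t_max


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def update_velocity(v, x, best_i, best_g, t, first_subgroup, rng=random):
    """Equations 5 and 6, applied per dimension (one joint angle each)."""
    k = constriction()
    new_v = []
    for vd, xd, bi, bg in zip(v, x, best_i, best_g):
        r1, r2 = rng.random(), rng.random()        # equation 8
        if first_subgroup:
            c1, c2 = PHI1 * r1, PHI2 * r2          # equations 9 and 10
            vd = k * (vd + c1 * (bi - xd) + c2 * (bg - xd))            # eq. 5
        else:
            vd = k * (vd - lt(t) * (r1 * (bi - xd) + r2 * (bg - xd)))  # eq. 6
        new_v.append(clamp(vd, V_MIN, V_MAX))
    return new_v


def update_position(x, v, limits):
    """Equation 4, then clamp each angle to its joint limits (assumption)."""
    return [clamp(xd + vd, lo, hi) for xd, vd, (lo, hi) in zip(x, v, limits)]
```

With $\phi_1 = \phi_2 = 2.05$ the assumed constriction coefficient evaluates to roughly 0.73. Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when comparing descriptor choices.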
Figure 6: Side by side comparison of the drawings (top) and the estimated 3D pose (bottom) for the lamp model.
1000 epochs took approximately 81 seconds on average on a system with 8GB of RAM, an Intel HD Graphics 3000 GPU and an Intel Core i5-2520M 2.50GHz CPU.

Figure 7: Side by side comparison of the drawings (top) and the estimated 3D pose (bottom) for the horse model.

5. Discussion

It is important to note that this method does not aim to produce final animated output. It aims at automating the initial pose layout phase or pre-visualisation, which is currently performed manually by skilled operators.

The choice of the method of comparing the drawing to the render may have a notable effect on the end result. For example, by not using the internal lines of a drawing, the forward-backward ambiguity problem can become an issue, as seen in Figure 2.

The method proposed in this paper easily fits into traditional animation pipelines. Unlike Jain et al. [JSH09], it does not require pre-existing data and can use perspective and orthographic camera models. The user input of pinpointing the joints can take place as part of the clean-up phase. The system is flexible and applicable to a broad range of models, from objects like lamps to quadrupeds like horses.

However, it is this flexibility which leads to the main disadvantage of this method. Since the approach is generative, it requires a rendering step in every iteration, which may mean it takes a longer time to return a result. It is important to note that the time to return a result is computational only, meaning that this method is still able to contribute to reducing costs. Furthermore, implementing the method on a GPU reduces the effects of this disadvantage significantly.

Since the PARAC-LOAPSO algorithm is stochastic, result accuracy may vary between runs. However, a system with a powerful graphics card can normalise the results through the use of a large population of particles, which helps both in searching and in having a more varied initialisation.

6. Conclusion

This paper proposes a method to automatically pose 3D models using information from monocular drawings. This approach is generative and deals with inferring a 3D pose as an optimisation problem. The results show that it is general enough to deal with many types of characters, and a lamp and a horse model are successfully posed in different scenarios. Moreover, it does not require changes in the pipeline to accommodate it. The focus of the proposed method is to pose models for the pose layout phase or pre-visualisation, not final animated output. It may serve as a direct link between the storyboarding and pose layout phases of a pipeline.

It is worth examining more general or accurate descriptors and methods of comparing drawings to renders, such as a context-based approach on joints [JSH09]. Completely removing the need for user input may be possible by using medial axis methods [ABCJ08] to extract the curve skeleton automatically from the drawing and then fit the character skeleton to the curve skeleton.

A hybrid between the current optimisation approach and a data-driven approach such as Jain et al. [JSH09] is another area for future work, to get results faster while remaining more general than a pure database method.