
Learning Hatching for Pen-and-Ink Illustration of Surfaces


EVANGELOS KALOGERAKIS
University of Toronto and Stanford University
DEREK NOWROUZEZAHRAI
University of Toronto, Disney Research Zurich, and University of Montreal
SIMON BRESLAV
University of Toronto and Autodesk Research
and
AARON HERTZMANN
University of Toronto

© ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is published in ACM Transactions on Graphics 31(1), 2012.

This article presents an algorithm for learning hatching styles from line drawings. An artist draws a single hatching illustration of a 3D object. Her strokes are analyzed to extract the following per-pixel properties: hatching level (hatching, cross-hatching, or no strokes), stroke orientation, spacing, intensity, length, and thickness. A mapping is learned from input geometric, contextual, and shading features of the 3D object to these hatching properties, using classification, regression, and clustering techniques. Then, a new illustration can be generated in the artist's style, as follows. First, given a new view of a 3D object, the learned mapping is applied to synthesize target stroke properties for each pixel. A new illustration is then generated by synthesizing hatching strokes according to the target properties.

Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation—Line and curve generation; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Geometric algorithms, languages, and systems; I.2.6 [Artificial Intelligence]: Learning—Parameter learning

General Terms: Algorithms

Additional Key Words and Phrases: Learning surface hatching, data-driven hatching, hatching by example, illustrations by example, learning orientation fields

ACM Reference Format:
Kalogerakis, E., Nowrouzezahrai, D., Breslav, S., and Hertzmann, A. 2012. Learning hatching for pen-and-ink illustration of surfaces. ACM Trans. Graph. 31, 1, Article 1 (January 2012), 17 pages.
DOI = 10.1145/2077341.2077342 http://doi.acm.org/10.1145/2077341.2077342

This project was funded by NSERC, CIFAR, CFI, the Ontario MRI, and KAUST Global Collaborative Research.
Authors' addresses: E. Kalogerakis (corresponding author), University of Toronto, Toronto, Canada and Stanford University; email: [email protected]; D. Nowrouzezahrai, University of Toronto, Toronto, Canada, Disney Research Zurich, and University of Montreal, Canada; S. Breslav, University of Toronto, Toronto, Canada and Autodesk Research; A. Hertzmann, University of Toronto, Toronto, Canada.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2012 ACM 0730-0301/2012/01-ART1 $10.00 DOI 10.1145/2077341.2077342 http://doi.acm.org/10.1145/2077341.2077342

1. INTRODUCTION

Nonphotorealistic rendering algorithms can create effective illustrations and appealing artistic imagery. To date, these algorithms are designed using insight and intuition. Designing new styles remains extremely challenging: there are many types of imagery that we do not know how to describe algorithmically. Algorithm design is not a suitable interface for an artist or designer. In contrast, an example-based approach can decrease the artist's workload, when it captures his style from his provided examples.

This article presents a method for learning hatching for pen-and-ink illustration of surfaces. Given a single illustration of a 3D object, drawn by an artist, the algorithm learns a model of the artist's hatching style, and can apply this style to rendering new views or new objects. Hatching and cross-hatching illustrations use many finely-placed strokes to convey tone, shading, texture, and other qualities. Rather than trying to model individual strokes, we focus on hatching properties across an illustration: hatching level (hatching, cross-hatching, or no hatching), stroke orientation, spacing, intensity, length, and thickness. Whereas the strokes themselves may be loosely and randomly placed, hatching properties are more stable and predictable. Learning is based on piecewise-smooth mappings from geometric, contextual, and shading features to these hatching properties.

To generate a drawing for a novel view and/or object, a Lambertian-shaded rendering of the view is first generated, along with the selected per-pixel features. The learned mappings are applied, in order to compute the desired per-pixel hatching properties. A stroke placement algorithm then places hatching strokes to match these target properties. We demonstrate results where the algorithm generalizes to different views of the training shape and/or different shapes.

Our work focuses on learning hatching properties; we use existing techniques to render feature curves, such as contours, and an existing stroke synthesis procedure. We do not learn properties like randomness, waviness, pentimenti, or stroke texture. Each style is learned from a single example, without performing analysis across a broader corpus of examples.

Fig. 1. Data-driven line art illustrations generated with our algorithm, and comparisons with alternative approaches. (a) Artist’s illustration of a screwdriver.
(b) Illustration produced by the algorithm of Hertzmann and Zorin [2000]. Manual thresholding of N · V is used to match the tone of the hand-drawn illustration
and globally-smoothed principal curvature directions are used for the stroke orientations. (c) Illustration produced with the same algorithm, but using local
PCA axes for stroke orientations before smoothing. (d) Illustration produced with the same algorithm, but using the gradient of image intensity for stroke
orientations. (e) Illustration whose properties are learned by our algorithm for the screwdriver, but without using segmentation (i.e., orientations are learned by
fitting a single model to the whole drawing and no contextual features are used for learning the stroke properties). (f) Illustration learned by applying all steps
of our algorithm. This result more faithfully matches the style of the input than the other approaches. (g) Results on new views and new objects.

Nonetheless, our method is still able to successfully reproduce many aspects of a specific hatching style even with a single training drawing.

2. RELATED WORK

Previous work has explored various formulas for hatching properties. Saito and Takahashi [1990] introduced hatching based on isoparametric and planar curves. Winkenbach and Salesin [1994; 1996] identify many principles of hand-drawn illustration, and describe methods for rendering polyhedral and smooth objects. Many other analytic formulas for hatching directions have been proposed, including principal curvature directions [Elber 1998; Hertzmann and Zorin 2000; Praun et al. 2001; Kim et al. 2008], isophotes [Kim et al. 2010], shading gradients [Singh and Schaefer 2010], parametric curves [Elber 1998], and user-defined direction fields (e.g., Palacios and Zhang [2007]). Stroke tone and density are normally proportional to depth, shading, or texture, or else based on user-defined prioritized stroke textures [Praun et al. 2001; Winkenbach and Salesin 1994, 1996]. In these methods, each hatching property is computed by a hand-picked function of a single feature of shape, shading, or texture (e.g., proportional to depth or curvature). As a result, it is very hard for such approaches to capture the variations evident in artistic hatching styles (Figure 1). We propose the first method to learn hatching of 3D objects from examples.

There have been a few previous methods for transferring properties of artistic rendering by example. Hamel and Strothotte [1999] transfer user-tuned rendering parameters from one 3D object to another. Hertzmann et al. [2001] transfer drawing and painting styles by example using nonparametric synthesis, given image data as input. This method maps directly from the input to stroke pixels. In general, the precise locations of strokes may be highly random (and thus hard to learn) and nonparametric pixel synthesis can make strokes become broken or blurred.

Mertens et al. [2006] transfer spatially-varying textures from source to target geometry using nonparametric synthesis. Jodoin et al. [2002] model relative locations of strokes, but not conditioned on a target image or object. Kim et al. [2009] employ texture similarity metrics to transfer stipple features between images. In contrast to the preceding techniques, our method maps to hatching properties, such as desired tone. Hence, although our method models a narrower range of artistic styles, it can model these styles much more accurately.

A few 2D methods have also been proposed for transferring styles of individual curves [Freeman et al. 2003; Hertzmann et al. 2002; Kalnins et al. 2002] or stroke patterns [Barla et al. 2006], problems which are complementary to ours; such methods could be useful for the rendering step of our method.

A few previous methods use machine learning techniques to extract feature curves, such as contours and silhouettes. Lum and Ma [2005] use neural networks and Support Vector Machines to identify which subset of feature curves match a user sketch on a given drawing. Cole et al. [2008] fit regression models of feature curve locations to a large training set of hand-drawn images. These methods focus on learning locations of feature curves, whereas we focus on hatching. Hatching exhibits substantially greater complexity and randomness than feature curves, since hatches form a network of overlapping curves of varying orientation, thickness, density, and cross-hatching level. Hatching also exhibits significant variation in artistic style.

3. OVERVIEW

Our approach has two main phases. First, we analyze a hand-drawn pen-and-ink illustration of a 3D object, and learn a model of the artist's style that maps from input features of the 3D object to target hatching properties. This model can then be applied to synthesize renderings of new views and new 3D objects. Below we present an overview of the output hatching properties and input features, and then summarize the steps of our method.

Hatching properties. Our goal is to model the way artists draw hatching strokes in line drawings of 3D objects. The actual placements of individual strokes exhibit much variation and apparent randomness, and so attempting to accurately predict individual strokes would be very difficult. However, we observe that the individual strokes themselves are less important than the overall appearance that they create together. Indeed, art instruction texts often focus on achieving particular qualities such as tone or shading (e.g., Guptill [1997]). Hence, similar to previous work [Winkenbach and Salesin 1994; Hertzmann and Zorin 2000], we model the rendering process in terms of a set of intermediate hatching properties related to tone and orientation. Each pixel containing a stroke in a given illustration is labeled with the following properties; an illustrative data-structure sketch follows the list.

—Hatching level (h ∈ {0, 1, 2}) indicates whether a region contains no hatching, single hatching, or cross-hatching.
—Orientation (φ1 ∈ [0, π]) is the stroke direction in image space, with 180-degree symmetry.
—Cross-hatching orientation (φ2 ∈ [0, π]) is the cross-hatch direction, when present. Hatches and cross-hatches are not constrained to be perpendicular.
—Thickness (t ∈ ℝ+) is the stroke width.
—Intensity (I ∈ [0, 1]) is how light or dark the stroke is.
—Spacing (d ∈ ℝ+) is the distance between parallel strokes.
—Length (l ∈ ℝ+) is the length of the stroke.
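As a minimal illustration of the data involved, the per-pixel quantities above can be gathered in a single record. The Python sketch below is our own illustrative structure; the type and field names do not come from the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class HatchingPixel:
    """Per-pixel hatching properties extracted from, or synthesized for, a drawing."""
    h: int            # hatching level: 0 = no hatching, 1 = hatching, 2 = cross-hatching
    phi1: float       # stroke orientation in [0, pi), with 180-degree symmetry
    phi2: float       # cross-hatching orientation in [0, pi); meaningful only when h == 2
    t: float          # stroke thickness (image-space units)
    intensity: float  # stroke intensity I in [0, 1]
    d: float          # spacing between parallel strokes
    length: float     # stroke length
    c: int = 0        # segment label (introduced below), selecting the orientation rule
```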
The decomposition of an illustration into hatching properties is illustrated in Figure 2 (top). In the analysis process, these properties are estimated from hand-drawn images, and models are learned. During synthesis, the learned model generates these properties as targets for stroke synthesis.

Modeling artists' orientation fields presents special challenges. Previous work has used local geometric rules for determining stroke orientations, such as curvature [Hertzmann and Zorin 2000] or gradient of shading intensity [Singh and Schaefer 2010]. We find that, in many hand-drawn illustrations, no local geometric rule can explain all stroke orientations. For example, in Figure 3, the strokes on the cylindrical part of the screwdriver's shaft can be explained as following the gradient of the shaded rendering, whereas the strokes on the flat end of the handle can be explained by the gradient of ambient occlusion ∇a. Hence, we segment the drawing into regions with distinct rules for stroke orientation. We represent this segmentation by an additional per-pixel variable.

—Segment label (c ∈ C) is a discrete assignment of the pixel to one of a fixed set of possible segment labels C.

Each set of pixels with a given label will use a single rule to compute stroke orientations. For example, pixels with label c1 might use principal curvature orientations, and those with c2 might use a linear combination of isophote directions and local PCA axes. Our algorithm also uses the labels to create contextual features (Section 5.2), which are also taken into account for computing the rest of the hatching properties. For example, pixels with label c1 may have thicker strokes.

Features. For a given 3D object and view, we define a set of features containing geometric, shading, and contextual information for each pixel, as described in Appendices B and C. There are two types of features: "scalar" features x (Appendix B) and "orientation" features θ (Appendix C). The features include many object-space and image-space properties which may be relevant for hatching, including features that have been used by previous authors for feature curve extraction, shading, and surface part labeling. The features are also computed at multiple scales, in order to capture varying surface and image detail. These features are inputs to the learning algorithm, which maps from features to hatching properties.

Data acquisition and preprocessing. The first step of our process is to gather training data and to preprocess it into features and hatching properties. The training data is based on a single drawing of a 3D model. An artist first chooses an image from our collection of rendered images of 3D objects. The images are rendered with Lambertian reflectance, distant point lighting, and spherical harmonic self-occlusion [Sloan et al. 2002]. Then, the artist creates a line illustration, either by tracing over the illustration on paper with a light table, or in a software drawing package with a tablet. If the illustration is drawn on paper, we scan the illustration and align it to the rendering automatically by matching borders with brute-force search. The artist is asked not to draw silhouette and feature curves, or to draw them only in pencil, so that they can be erased. The hatching properties (h, φ, t, I, d, l) for each pixel are estimated by the preprocessing procedure described in Appendix A.

Learning. The training data is comprised of a single illustration with features x, θ and hatching properties given for each pixel. The algorithm learns mappings from features to hatching properties (Section 5). The segmentation c and orientation properties φ are the most challenging to learn, because neither the segmentation c nor the orientation rules are immediately evident in the data; this represents a form of "chicken-and-egg" problem.


Fig. 2. Extraction of hatching properties from a drawing, and synthesis for new drawings. Top: The algorithm decomposes a given artist’s illustration into
a set of hatching properties: stroke thickness, spacing, hatching level, intensity, length, orientations. A mapping from input geometry is learned for each of
these properties. Middle: Synthesis of the hatching properties for the input object and view. Our algorithm automatically separates and learns the hatching
(blue-colored field) and cross-hatching fields (green-colored fields). Bottom: Synthesis of the hatching properties for a novel object and view.


[Figure 3 panels: (a) estimated clusters using our mixture-of-experts model; (b) learned labeling with Joint Boosting alone; (c) learned labeling with Joint Boosting + CRF; (d) synthesized labeling for another object. The legend lists each region's learned hatching (f1) and cross-hatching (f2) orientation functions, for example f1 = ∇a_2, f1 = .73(∇I_3) + .27(r), f2 = .54(k_max,1) + .46(r_⊥), and f2 = v.]

Fig. 3. Clustering orientations. The algorithm clusters stroke orientations according to different orientation rules. Each cluster specifies rules for hatching (f1 )
and cross-hatching (f2 ) directions. Cluster labels are color-coded in the figure, with rules shown below. The cluster labels and the orientation rules are estimated
simultaneously during learning. (a) Inferred cluster labels for an artist’s illustration of a screwdriver. (b) Output of the labeling step using the most likely labels
returned by the Joint Boosting classifier alone. (c) Output of the labeling step using our full CRF model. (d) Synthesis of part labels for a novel object. Rules:
In the legend, we show the corresponding orientation functions for each region. In all cases, the learned models use one to three features. Subscripts {1, 2, 3}
indicate the scale used to compute the field. The ⊥ operator rotates the field by 90 degrees in image-space. The orientation features used here are: maximum
and minimum principal curvature directions (kmax, kmin), PCA directions corresponding to the first and second largest eigenvalues (ea, eb), fields aligned with ridges and valleys respectively (r, v), the Lambertian image gradient (∇I), the gradient of ambient occlusion (∇a), and the gradient of L · N (∇(L · N)). Features that arise as 3D vectors are projected to the image plane. See Appendix C for details.

We address this using a learning and clustering algorithm based on Mixtures-of-Experts (Section 5.1).

Once the input pixels are classified, a pixel classifier is learned using Conditional Random Fields with unary terms based on JointBoost (Section 5.2). Finally, each real-valued property is learned using boosting for regression (Section 5.3). We use boosting techniques for classification and regression since we do not know in advance which input features are the most important for different styles. Boosting can handle a large number of features, can select the most relevant features, and has a fast sequential learning algorithm.

Synthesis. A hatching style is transferred to a target novel view and/or object by first computing the features for each pixel, and then applying the learned mappings to compute the preceding hatching properties. A streamline synthesis algorithm [Hertzmann and Zorin 2000] then places hatching strokes to match the synthesized properties. Examples of this process are shown in Figure 2.

4. SYNTHESIS ALGORITHM

The algorithm for computing a pen-and-ink illustration of a view of a 3D object is as follows. For each pixel of the target image, the features x and θ are first computed (Appendices B and C). The segment label and hatching level are each computed as a function of the scalar features x, using image segmentation and recognition techniques. Given these segments, orientation fields for the target image are computed by interpolation of the orientation features θ. Then, the remaining hatching properties are computed by the learned functions of the scalar features. Finally, a streamline synthesis algorithm [Hertzmann and Zorin 2000] renders strokes to match these synthesized properties. A streamline is terminated when it crosses an occlusion boundary, when its length grows past the per-pixel target stroke length l, or when it violates the target stroke spacing d.

We now describe these steps in more detail. In Section 5, we will describe how the algorithm's parameters are learned.
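To make the order of operations concrete, the following Python sketch outlines the synthesis pass described above. The container and function names are hypothetical placeholders, not from the paper's code; each callable stands in for a component detailed in Sections 4.1 through 4.3.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class LearnedStyle:
    """Hypothetical container for the learned components of one hatching style."""
    infer_labels: Callable   # scalar features -> per-pixel segment labels c  (Section 4.1)
    infer_levels: Callable   # scalar features -> per-pixel hatching levels h (Section 4.1)
    orientations: Callable   # (orientation features, c, h) -> (phi1, phi2)   (Section 4.2)
    thickness: Callable      # scalar features -> t                            (Section 4.3)
    intensity: Callable      # scalar features -> I
    spacing: Callable        # scalar features -> d
    length: Callable         # scalar features -> l

def synthesize(x: np.ndarray, theta: np.ndarray, style: LearnedStyle,
               trace_streamlines: Callable):
    """Synthesis pass: per-pixel features in, hatching strokes out."""
    c = style.infer_labels(x)
    h = style.infer_levels(x)
    phi1, phi2 = style.orientations(theta, c, h)
    t = style.thickness(x)
    inten = style.intensity(x)
    d = style.spacing(x)
    ln = style.length(x)
    # Streamline synthesis [Hertzmann and Zorin 2000]: strokes stop at occlusion
    # boundaries, at the target length, or when they violate the target spacing.
    return trace_streamlines(h, phi1, phi2, t, inten, d, ln)
```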
4.1 Segmentation and Labeling

For a given view of a 3D model, the algorithm first segments the image into regions with different orientation rules and levels of hatching. More precisely, given the feature set x for each pixel, the algorithm computes the per-pixel segment labels c ∈ C and hatching level h ∈ {0, 1, 2}. There are a few important considerations when choosing an appropriate segmentation and labeling algorithm. First, we do not know in advance which features in x are important, and so we must use a method that can perform feature selection. Second, neighboring labels are highly correlated, and performing classification on each pixel independently yields noisy results (Figure 3). Hence, we use a Conditional Random Field (CRF) recognition algorithm, with JointBoost unary terms [Kalogerakis et al. 2010; Shotton et al. 2009; Torralba et al. 2007]. One such model is learned for segment labels c, and a second for hatching level h. Learning these models is described in Section 5.2.

The CRF objective function includes unary terms that assess the consistency of pixels with labels, and pairwise terms that assess the consistency between labels of neighboring pixels. Inferring segment labels based on the CRF model corresponds to minimizing the following objective function:

E(c) = \sum_i E_1(c_i; x_i) + \sum_{i,j} E_2(c_i, c_j; x_i, x_j),   (1)

where E_1 is the unary term defined for each pixel i, and E_2 is the pairwise term defined for each pair of neighboring pixels {i, j}, where j ∈ N(i) and N(i) is defined using the 8-neighborhood of pixel i.

The unary term evaluates a JointBoost classifier that, given the feature set x_i for pixel i, determines the probability P(c_i | x_i) for each possible label c_i. The unary term is then

E_1(c_i; x_i) = -\log P(c_i \mid x_i).   (2)

The mapping from features to probabilities P(c_i | x_i) is learned from the training data using the JointBoost algorithm [Torralba et al. 2007].

The pairwise energy term scores the compatibility of adjacent pixel labels c_i and c_j, given their features x_i and x_j. Let e_i be a binary random variable representing whether pixel i belongs to a boundary of a hatching region or not. We define a binary JointBoost classifier that outputs the probability of boundaries of hatching regions P(e | x) and compute the pairwise term as

E_2(c_i, c_j; x_i, x_j) = -\ell \cdot I(c_i, c_j) \cdot \left( \log\big(P(e_i \mid x_i) + P(e_j \mid x_j)\big) + \mu \right),   (3)

where \ell, \mu are the model parameters and I(c_i, c_j) is an indicator function that is 1 when c_i ≠ c_j and 0 when c_i = c_j. The parameter \ell controls the importance of the pairwise term, while \mu contributes to eliminating tiny segments and smoothing boundaries.

Similarly, inferring hatching levels based on the CRF model corresponds to minimizing the following objective function:

E(h) = \sum_i E_1(h_i; x_i) + \sum_{i,j} E_2(h_i, h_j; x_i, x_j).   (4)

As already mentioned, the unary term evaluates another JointBoost classifier that, given the feature set x_i for pixel i, determines the probability P(h_i | x_i) for each hatching level h ∈ {0, 1, 2}. The pairwise term is also defined as

E_2(h_i, h_j; x_i, x_j) = -\ell \cdot I(h_i, h_j) \cdot \left( \log\big(P(e_i \mid x_i) + P(e_j \mid x_j)\big) + \mu \right),   (5)

with the same values for the parameters \ell, \mu as earlier.

The most probable labeling is the one that minimizes the CRF objective functions E(c) and E(h), given their learned parameters. The CRFs are optimized using alpha-expansion graph cuts [Boykov et al. 2001]. Details of learning the JointBoost classifiers and \ell, \mu are given in Section 5.2.
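As a concrete reading of Eqs. (1)-(5), the sketch below evaluates the CRF objective for a candidate labeling with NumPy. It is illustrative only: the probability maps are assumed to come from the JointBoost classifiers, and the actual minimization is performed with alpha-expansion graph cuts [Boykov et al. 2001], which is not shown here.

```python
import numpy as np

def crf_energy(labels, P_label, P_boundary, ell, mu):
    """Evaluate E(c) of Eqs. (1)-(3) for a given labeling (E(h) is analogous).

    labels:     (H, W) int array of segment labels c_i
    P_label:    (H, W, C) per-pixel probabilities P(c_i | x_i) from JointBoost
    P_boundary: (H, W) per-pixel boundary probabilities P(e_i | x_i)
    ell, mu:    pairwise-term parameters
    """
    H, W = labels.shape
    rows, cols = np.indices((H, W))
    # Unary terms, Eq. (2): E1 = -log P(c_i | x_i).
    energy = -np.log(P_label[rows, cols, labels] + 1e-12).sum()
    # Pairwise terms, Eq. (3), over the 8-neighborhood (each pair counted once).
    for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        ci = labels[max(0, -dy):H - max(0, dy), max(0, -dx):W - max(0, dx)]
        cj = labels[max(0, dy):H - max(0, -dy), max(0, dx):W - max(0, -dx)]
        pi = P_boundary[max(0, -dy):H - max(0, dy), max(0, -dx):W - max(0, dx)]
        pj = P_boundary[max(0, dy):H - max(0, -dy), max(0, dx):W - max(0, -dx)]
        changed = (ci != cj)                     # indicator I(c_i, c_j)
        energy += np.sum(changed * (-ell * (np.log(pi + pj + 1e-12) + mu)))
    return energy
```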
4.2 Computing Orientations

Once the per-pixel segment labels c and hatching levels h are computed, the per-pixel orientations φ1 and φ2 are computed. The number of orientations to be synthesized is determined by h. When h = 0 (no hatching), no orientations are produced. When h = 1 (single hatching), only φ1 is computed and, when h = 2 (cross-hatching), φ2 is also computed.

Orientations are computed by regression on a subset of the orientation features θ for each pixel. Each cluster c may use a different subset of features. The features used by a segment are indexed by a vector σ, that is, the features' indices are σ(1), σ(2), ..., σ(k). Each orientation feature represents an orientation field in image space, such as the image projection of principal curvature directions. In order to respect 2-symmetries in orientation, a single orientation θ is transformed to a vector as

v = [\cos(2\theta), \sin(2\theta)]^T.   (6)

The output orientation function is expressed as a weighted sum of selected orientation features:

f(\theta; w) = \sum_k w_{\sigma(k)} \, v_{\sigma(k)},   (7)

where σ(k) represents the index to the k-th orientation feature in the subset of selected orientation features, v_{σ(k)} is its vector representation, and w is a vector of weight parameters. There is an orientation function f(θ; w_{c,1}) for each label c ∈ C and, if the class contains cross-hatching regions, it has an additional orientation function f(θ; w_{c,2}) for determining the cross-hatching directions. The resulting vector is converted to an image-space angle as φ = atan2(y, x)/2.

The weights w and feature selection σ are learned by the gradient-based boosting for regression algorithm of Zemel and Pitassi [2001]. The learning of the parameters and the feature selection is described in Section 5.1.
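A direct NumPy reading of Eqs. (6)-(7) and the angle conversion is sketched below; the array shapes and the function name are our own, and the per-segment feature selection σ is assumed to have already been applied to pick the K orientation features that are passed in.

```python
import numpy as np

def orientation_from_features(theta_sel, w):
    """Per-pixel orientation from selected orientation features, Eqs. (6)-(7).

    theta_sel: (K, H, W) selected orientation features, as angles in [0, pi)
    w:         (K,) learned weights w_{sigma(k)}
    Returns phi in [0, pi) per pixel.
    """
    vx = np.cos(2.0 * theta_sel)                  # Eq. (6): angle -> 2D vector
    vy = np.sin(2.0 * theta_sel)
    x = np.tensordot(w, vx, axes=1)               # Eq. (7): weighted sum of fields
    y = np.tensordot(w, vy, axes=1)
    return np.mod(np.arctan2(y, x) / 2.0, np.pi)  # back to an image-space angle
```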
4.3 Computing Real-Valued Properties

The remaining hatching properties are real-valued quantities. Let y be a property to be synthesized for a pixel with feature set x. We use multiplicative models of the form

y = \prod_k \left( a_k x_{\sigma(k)} + b_k \right)^{\alpha_k},   (8)

where σ(k) is the index of the k-th selected scalar feature from x. The use of a multiplicative model is inspired by Goodwin et al. [2007], who propose a model for stroke thickness that can be approximated by a product of radial curvature and inverse depth. The model is learned in the logarithmic domain, which reduces the problem to learning the weighted sum

\log(y) = \sum_k \alpha_k \log\left( a_k x_{\sigma(k)} + b_k \right).   (9)

Learning the parameters α_k, a_k, b_k, σ(k) is again performed using gradient-based boosting [Zemel and Pitassi 2001], as described in Section 5.3.
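The following short sketch evaluates Eq. (8) in the log domain of Eq. (9) for one real-valued property (e.g., thickness or spacing); clamping each term away from zero is our own numerical safeguard, not part of the model.

```python
import numpy as np

def multiplicative_property(x_sel, a, b, alpha):
    """Evaluate Eq. (8) for one property (thickness, spacing, intensity, or length).

    x_sel:       (K, H, W) the K scalar features selected by boosting, x_{sigma(k)}
    a, b, alpha: (K,) learned parameters a_k, b_k, alpha_k
    """
    terms = a[:, None, None] * x_sel + b[:, None, None]
    terms = np.maximum(terms, 1e-12)              # keep the log well-defined (our safeguard)
    log_y = np.sum(alpha[:, None, None] * np.log(terms), axis=0)   # Eq. (9)
    return np.exp(log_y)
```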

5. LEARNING

We now describe how to learn the parameters of the functions used in the synthesis algorithm described in the previous section.

5.1 Learning Segmentation and Orientation Functions

In our model, the hatching orientation for a single-hatching pixel is computed by first assigning the pixel to a cluster c, and then applying the orientation function f(θ; w_c) for that cluster. If we knew the clustering in advance, then it would be straightforward to learn the parameters w_c for each pixel. However, neither the cluster labels nor the parameters w_c are present in the training data. In order to solve this problem, we develop a technique inspired by Expectation-Maximization for Mixtures-of-Experts [Jordan and Jacobs 1994], but specialized to handle the particular issues of hatching.

The input to this step is a set of pixels from the source illustration with their corresponding orientation feature set θ_i, training orientations φ_i, and training hatching levels h_i. Pixels containing intersections of strokes or no strokes are not used. Each cluster c may contain either single-hatching or cross-hatching. Single-hatch clusters have a single orientation function (Eq. (7)), with unknown parameters w_{c1}. Clusters with cross-hatches have two subclusters, each with an orientation function with unknown parameters w_{c1} and w_{c2}. The two orientation functions are not constrained to produce directions orthogonal to each other. Every source pixel must belong to one of the top-level clusters, and every pixel belonging to a cross-hatching cluster must belong to one of its subclusters.

For each training pixel i, we define a labeling probability γ_{ic} indicating the probability that pixel i lies in top-level cluster c, such that Σ_c γ_{ic} = 1.

Also, for each top-level cluster, we define a subcluster probability β_{icj}, where j ∈ {1, 2}, such that β_{ic1} + β_{ic2} = 1. The probability β_{icj} measures how likely the stroke orientation at pixel i corresponds to a hatching or cross-hatching direction. Single-hatching clusters have β_{ic2} = 0. The probability that pixel i belongs to the subcluster indexed by {c, j} is γ_{ic} β_{icj}.

The labeling probabilities are modeled based on a mixture-of-Gaussians distribution [Bishop 2006]:

\gamma_{ic} = \frac{\pi_c \exp(-r_{ic}/2s)}{\sum_{c'} \pi_{c'} \exp(-r_{ic'}/2s)},   (10)

\beta_{icj} = \frac{\pi_{cj} \exp(-r_{icj}/2s_c)}{\pi_{c1} \exp(-r_{ic1}/2s_c) + \pi_{c2} \exp(-r_{ic2}/2s_c)},   (11)

where π_c, π_{cj} are the mixture coefficients, s, s_c are the variances of the corresponding Gaussians, r_{icj} is the residual for pixel i with respect to the orientation function j in cluster c, and r_{ic} is defined as

r_{ic} = \min_{j \in \{1,2\}} \| u_i - f(\theta_i; w_{cj}) \|^2,   (12)

where u_i = [\cos(2\phi_i), \sin(2\phi_i)]^T.

The process begins with an initial set of labels γ, β, and w, and then alternates between two update steps: the model update step, where the orientation functions, the mixture coefficients, and the variances are updated, and the label update step, where the labeling probabilities are updated.

Model update. Given the labeling, orientation functions for each cluster are updated by minimizing the boosting error function, described in Appendix D, using the initial per-pixel weights α_i = γ_{ic} β_{icj}.

In order to avoid overfitting, a set of holdout-validation pixels is kept for each cluster. This set is found by selecting rectangles of random size and marking their contained pixels as holdout-validation pixels. Our algorithm stops when 25% of the cluster pixels are marked as holdout-validation pixels. The holdout-validation pixels are not considered for fitting the weight vector w_{cj}. At each boosting iteration, our algorithm measures the holdout-validation error on these pixels. It terminates the boosting iterations when the holdout-validation error reaches a minimum. This helps avoid overfitting the training orientation data.

During this step, we also update the mixture coefficients and variances of the Gaussians in the mixture model, so that the data likelihood is maximized [Bishop 2006]:

\pi_c = \sum_i \gamma_{ic} / N, \qquad s = \sum_{i,c} \gamma_{ic} r_{ic} / N,   (13)

\pi_{cj} = \sum_i \beta_{icj} / N, \qquad s_c = \sum_{i,j} \beta_{icj} r_{icj} / N,   (14)

where N is the total number of pixels with training orientations.

Label update. Given the estimated orientation functions from the previous step, the algorithm computes the residual for each model and each orientation function. Median filtering is applied to the residuals, in order to enforce spatial smoothness: r_{ic} is replaced with the median of r_{∗c} in the local image neighborhood of pixel i (with radius equal to the local spacing S_i). Then the pixel labeling probabilities are updated according to Eqs. (10) and (11).

Initialization. The clustering is initialized using a constrained mean-shift clustering process with a flat kernel, similar to constrained K-means [Wagstaff et al. 2001]. The constraints arise from a region-growing strategy to enforce spatial continuity of the initial clusters. Each cluster grows by considering randomly-selected seed pixels in their neighborhood and adding them only if the difference between their orientation angle and the cluster's current mean orientation is below a threshold. In the case of cross-hatching clusters, the minimum difference between the two mean orientations is used. The threshold is automatically selected once during preprocessing by taking the median of each pixel's local neighborhood orientation angle differences. The process is repeated for new pixels and the cluster's mean orientation(s) are updated at each iteration. Clusters composed of more than 10% cross-hatch pixels are marked as cross-hatching clusters; the rest are marked as single-hatching clusters. The initial assignment of pixels to clusters gives a binary-valued initialization for γ. For cross-hatch pixels, if more than half the pixels in the cluster are assigned to orientation function w_{k2}, our algorithm swaps w_{k1} and w_{k2}. This ensures that the first hatching direction will correspond to the dominant orientation. This aids in maintaining orientation field consistency between neighboring regions.

An example of the resulting clustering for an artist's illustration of a screwdriver is shown in Figure 3(a). We also include the functions learned for the hatching and cross-hatching orientation fields used in each resulting cluster.
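To make the label update step concrete, here is a minimal NumPy sketch of Eqs. (10) and (12) for the top-level clusters; the subcluster probabilities of Eq. (11) follow the same pattern with per-cluster variances. The array layout and function name are our own, and the median filtering of residuals described above is omitted.

```python
import numpy as np

def cluster_responsibilities(u, f_vals, pi_c, s):
    """Evaluate Eq. (10) for all training pixels and top-level clusters.

    u:      (N, 2) training orientation vectors u_i = [cos 2*phi_i, sin 2*phi_i]
    f_vals: (C, 2, N, 2) outputs of the orientation functions f(theta_i; w_{c,j}),
            j in {1, 2} (single-hatching clusters can repeat their only function)
    pi_c:   (C,) mixture coefficients
    s:      shared variance of the Gaussians
    Returns gamma with shape (N, C), rows summing to 1.
    """
    sq = np.sum((u[None, None, :, :] - f_vals) ** 2, axis=-1)   # (C, 2, N)
    r = sq.min(axis=1)                                          # Eq. (12): best of the two
    logits = np.log(pi_c)[:, None] - r / (2.0 * s)              # Eq. (10), in log space
    logits -= logits.max(axis=0, keepdims=True)                 # numerical stability
    gamma = np.exp(logits)
    return (gamma / gamma.sum(axis=0, keepdims=True)).T
```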
5.2 Learning Labeling with CRFs

Once the training labels are estimated, we learn a procedure to transfer them to new views and objects. Here we describe the procedure to learn the Conditional Random Field model of Eq. (1) for assigning segment labels to pixels, as well as the Conditional Random Field of Eq. (4) for assigning hatching levels to pixels.

Learning to segment and label. Our goal here is to learn the parameters of the CRF energy terms (Eq. (1)). The input is the scalar feature set x̃_i for each stroke pixel i (described in Appendix B) and their associated labels c_i, as extracted in the previous step. Following Tu [2008], Shotton et al. [2008], and Kalogerakis et al. [2010], the parameters of the unary term are learned by running a cascade of JointBoost classifiers. The cascade is used to obtain contextual features which capture information about the relative distribution of cluster labels around each pixel. The cascade of classifiers is trained as follows.

The method begins with an initial JointBoost classifier using an initial feature set x̃, containing the geometric and shading features described in Appendix B. The classifier is applied to produce the probability P(c_i | x̃_i) for each possible label c_i given the feature set x̃_i of each pixel i. These probabilities are then binned in order to produce contextual features. In particular, for each pixel, the algorithm computes a histogram of these probabilities as a function of geodesic distance from it:

p_{ic} = \sum_{j \,:\, d_b \le \mathrm{dist}(i,j) < d_{b+1}} P(c_j) / N_b,   (15)

where the histogram bin b contains all pixels j within geodesic distance range [d_b, d_{b+1}] from pixel i, and N_b is the total number of pixels in histogram bin b. The geodesic distances are computed on the mesh and projected to image space. Four bins are used, chosen in logarithmic space. The bin values p_{ic} are normalized to sum to 1 per pixel. The total number of bins is 4|C|. The values of these bins are used as contextual features, which are concatenated into x̃_i to form a new scalar feature set x_i. Then, a second JointBoost classifier is learned, using the new feature set x as input and outputting updated probabilities P(c_i | x_i). These are used in turn to update the contextual features. The next classifier uses the contextual features generated by the previous one, and so on.

Fig. 4. Comparisons of various classifiers for learning the hatching level. The training data is the extracted hatching level on the horse of Figure 2 and feature set x. Left to right: least-squares for classification, decision tree (Matlab's implementation based on Gini's diversity index splitting criterion), Gaussian Naive Bayes, Nearest Neighbors, Support Vector Machine, Logistic Regression, Joint Boosting, Joint Boosting and Conditional Random Field (full version of our algorithm). The regularization parameters of SVMs, Gaussian Bayes, and Logistic Regression are estimated by hold-out validation with the same procedure as in our algorithm.

Fig. 5. Comparisons of the generalization performance of various techniques for regression for the stroke spacing. The same training data are provided to the techniques, based on the extracted spacing on the horse of Figure 2 and feature set x. Left to right: Linear Regression (least-squares without regularization), Ridge Regression, Lasso, gradient-based boosting. Fitting a model on such a very high-dimensional space without any sparsity prior yields very poor generalization performance. Gradient-based boosting gives more reasonable results than Ridge Regression or Lasso, especially on the legs of the cow, where the predicted spacing values seem to be more consistent with the training values on the legs of the horse (see Figure 2). The regularization parameters of Ridge Regression and Lasso are estimated by hold-out validation with the same procedure as in our algorithm.
Each JointBoost classifier is initialized with uniform weights and terminates when the holdout-validation error reaches a minimum. The holdout-validation error is measured on pixels that are contained in random rectangles on the drawing, selected as before. The cascade terminates when the holdout-validation error of a JointBoost classifier increases with respect to the holdout-validation error of the previous one. The unary term is defined based on the probabilities returned by the latter classifier.
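The binning of Eq. (15) can be read as in the NumPy sketch below. It is illustrative only: the dense pairwise-distance matrix is used here for exposition, whereas the paper computes geodesic distances on the mesh and projects them to image space, and the final normalization reflects our reading of the text.

```python
import numpy as np

def contextual_features(P, dist, bin_edges):
    """Binned label-probability histograms in the spirit of Eq. (15).

    P:         (N, C) current per-pixel probabilities P(c_j | x_j)
    dist:      (N, N) pairwise geodesic distances (dense here only for exposition)
    bin_edges: (B+1,) distance thresholds d_b, chosen in logarithmic space
    Returns an (N, B*C) matrix of contextual features, normalized per pixel.
    """
    N, C = P.shape
    B = len(bin_edges) - 1
    feats = np.zeros((N, B, C))
    for b in range(B):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])   # (N, N)
        n_b = np.maximum(in_bin.sum(axis=1, keepdims=True), 1)        # N_b per pixel
        feats[:, b, :] = (in_bin.astype(float) @ P) / n_b             # Eq. (15)
    feats = feats.reshape(N, B * C)
    total = np.maximum(feats.sum(axis=1, keepdims=True), 1e-12)
    return feats / total                                              # sums to 1 per pixel
```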
To learn the pairwise term of Eq. (3), the algorithm needs to estimate the probability of boundaries of hatching regions P(e | x), which also serve as evidence for label boundaries. First, we observe that segment boundaries are likely to occur at particular parts of an image; for example, two pixels separated by an occluding or suggestive contour are much less likely to be in the same segment than two pixels that are adjacent on the surface. For this reason, we define a binary JointBoost classifier, which maps to probabilities of boundaries of hatching regions for each pixel, given the subset of its features x computed from the feature curves of the mesh (see Appendix B). In this binary case, JointBoost reduces to an earlier algorithm called GentleBoost [Friedman et al. 2000]. The training data for this pairwise classifier are supplied by the marked boundaries of hatching regions of the source illustration (see Appendix A); pixels that are marked as boundaries have e = 1, otherwise e = 0. The classifier is initialized with more weight given to the pixels that contain boundaries of hatching level regions, since the training data contains many more nonboundary pixels. More specifically, if N_B is the total number of boundary pixels, and N_{NB} is the number of nonboundary pixels, then the weight is N_{NB}/N_B for boundary pixels and 1 for the rest. The boosting iterations terminate when the hold-out validation error measured on validation pixels (selected as described earlier) is minimum.

Finally, the parameters \ell and \mu are optimized by maximizing the energy term

E_S = \sum_{i \,:\, c_i \ne c_j,\; j \in N(i)} P(e_i \mid x_i),   (16)

where N(i) is the 8-neighborhood of pixel i, and c_i, c_j are the labels for each pair of neighboring pixels i, j, inferred using the CRF model of Eq. (1) based on the learned parameters of its unary and pairwise classifier and using different values for \ell, \mu. This optimization attempts to "push" the segment label boundaries to be aligned with pixels that have a higher probability of being boundaries. The energy is maximized using Matlab's implementation of Preconditioned Conjugate Gradient with numerically-estimated gradients.
Learning to generate hatching levels. The next step is to learn the hatching levels h ∈ {0, 1, 2}. The input here is the hatching level h_i per pixel contained inside the rendered area (as extracted during the preprocessing step, Appendix A), together with the full feature set x_i (including the contextual features extracted as above). Our goal is to compute the parameters of the second CRF model used for inferring the hatching levels (Eq. (4)). Our algorithm first uses a JointBoost classifier that maps from the feature set x to the training hatching levels h. The classifier is initialized with uniform weights and terminates the boosting rounds when the hold-out validation error is minimized (the hold-out validation pixels are selected as described earlier). The classifier outputs the probability P(h_i | x_i), which is used in the unary term of the CRF model. Finally, our algorithm uses the same pairwise term parameters trained with the CRF model of the segment labels to rectify the boundaries of the hatching levels.

Examples comparing our learned hatching algorithm to several alternatives are shown in Figure 4.

5.3 Learning Real-Valued Stroke Properties

Thickness, intensity, length, and spacing are all positive, real-valued quantities, and so the same learning procedure is used for each one in turn. The input to the algorithm is the values of the corresponding stroke property, as extracted in the preprocessing step (Appendix A), and the full feature set x_i per pixel.

The multiplicative model of Eq. (8) is used to map the features to the stroke properties. The model is learned in the log-domain, so that it can be learned as a linear sum of log functions (Eq. (9)). The model is learned with gradient-based boosting for regression (Appendix D). The weights for the training pixels are initialized as uniform. As earlier, the boosting iterations stop when the holdout-validation error measured on randomly selected validation pixels is minimum.

Examples comparing our method to several alternatives are shown in Figure 5.


Fig. 6. Data-driven line art illustrations generated with our algorithm. From left to right: Artist’s illustration of a horse. Rendering of the model with our
learned style. Renderings of new views and new objects.

6. RESULTS

The figures throughout our article show synthesized line drawings of novel objects and views produced with our learning technique (Figures 1 and 6 through 14). As can be seen in the examples, our method captures several aspects of the artist's drawing style, better than alternative previous approaches (Figure 1). Our algorithm adapts to different styles of drawing and successfully synthesizes them for different objects and views. For example, Figures 6 and 7 show different styles of illustrations for the same horse, applied to new views and objects. Figure 14 shows more examples of synthesis with various styles and objects.

However, subtleties are sometimes lost. For example, in Figure 12, the face is depicted with finer-scale detail than the clothing, which cannot be captured in our model. In Figure 13, our method loses variation in the character of the lines, and depiction of important details such as the eye. One reason for this is that the stroke placement algorithm attempts to match the target hatching properties, but does not optimize to match a target tone. These variations may also depend on types of parts (e.g., eyes versus torsos), and could be addressed given part labels [Kalogerakis et al. 2010]. Figure 11 exhibits randomness in stroke spacing and width that is not modeled by our technique.

Selected features. We show the frequency of orientation features selected by gradient-based boosting, averaged over all our nine drawings, in Figure 15. Fields aligned with principal curvature directions as well as local principal axes (corresponding to candidate local planar symmetry axes) play very important roles for synthesizing the hatching orientations. Fields aligned with suggestive contours, ridges, and valleys are also significant for determining orientations. Fields based on shading attributes have moderate influence.

We show the frequency of scalar features selected by boosting, averaged over all our nine drawings, in Figure 16 for learning the hatching level, thickness, spacing, intensity, length, and segment label. Shape descriptor features (based on PCA, shape contexts, shape diameter, average geodesic distance, distance from medial surface, contextual features) seem to have large influence on all the hatching properties. This means that the choice of tone is probably influenced by the type of shape part the artist draws. The segment label is mostly determined by the shape descriptor features, which is consistent with previous work on shape segmentation and labeling [Kalogerakis et al. 2010]. The hatching level is mostly influenced by image intensity, V · N, and L · N. The stroke thickness is mostly affected by shape descriptor features, curvature, L · N, the gradient of image intensity, the location of feature lines, and, finally, depth. Spacing is mostly influenced by shape descriptor features, curvature, derivatives of curvature, L · N, and V · N. The intensity is influenced by shape descriptor features, image intensity, V · N, L · N, depth, and the location of feature lines. The length is mostly determined by shape descriptor features, curvature, radial curvature, L · N, image intensity and its gradient, and the location of feature lines (mostly suggestive contours).


Fig. 7. Data-driven line art illustrations generated with our algorithm. From left to right: Artist's illustration of a horse with a different style than in Figure 6. Rendering of the model with our learned style. Renderings of new views and new objects.

However, it is important to note that different features are learned for different input illustrations. For example, in Figure 11, the light directions mostly determine the orientations, which is not the case for the rest of the drawings. We include histograms of the frequency of orientation and scalar features used for each of the drawings in the supplementary material.

Computation time. In each case, learning a style from a source illustration takes 5 to 10 hours on a laptop with an Intel i7 processor. Most of the time is consumed by the orientation and clustering step (Section 5.1) (about 50% of the time for the horse), which is implemented in Matlab. Learning segment labels and hatching levels (Section 5.2) represents about 25% of the training time (implemented in C++), and learning stroke properties (Section 5.3) takes about 10% of the training time (implemented in Matlab). The rest of the time is consumed by extracting the features (implemented in C++) and training hatching properties (implemented in Matlab). We note that our implementation is currently far from optimal, hence running times could be improved. Once the model of the style is learned, it can be applied to different novel data. Given the predicted hatching and cross-hatching orientations, hatching level, thickness, intensity, spacing, and stroke length at each pixel, our algorithm traces streamlines over the image to generate the final pen-and-ink illustration. Synthesis takes 30 to 60 minutes. Most of the time (about 60%) is consumed here by extracting the features. The implementations for feature extraction and tracing streamlines are also far from optimal.

7. SUMMARY AND FUTURE WORK

Ours is the first method to generate predictive models for synthesizing detailed line illustrations from examples. We model line illustrations with a machine learning approach using a set of features suspected to play a role in the human artistic process. The complexity of man-made illustrations is very difficult to reproduce; however, we believe our work takes a step towards replicating certain key aspects of the human artistic process. Our algorithm generalizes to novel views as well as objects of similar morphological class.

There are many aspects of hatching styles that we do not capture, including: stroke textures, stroke tapering, randomness in strokes (such as wavy or jittered lines), cross-hatching with more than two hatching directions, style of individual strokes, and continuous transitions in hatching level. Interactive edits to the hatching properties could be used to improve our results [Salisbury et al. 1994].

Since we learn from a single training drawing, the generalization capabilities of our method to novel views and objects are limited. For example, if the relevant features differ significantly between the test views and objects, then our method will not generalize to them. Our method relies on holdout validation using randomly selected regions to avoid overfitting; this ignores the hatching information existing in these regions that might be valuable. Retraining the model is sometimes useful to improve results, since these regions are selected randomly. Learning from a broader corpus of examples could help with these issues, although this would require drawings where the hatching properties change consistently across different objects and views.


Fig. 8. Data-driven line art illustrations generated with our algorithm. From left to right: Artist’s illustration of a rocker arm. Rendering of the model with our
learned style. Renderings of new views and new objects.


Fig. 9. Data-driven line art illustrations generated with our algorithm. From left to right: Artist’s illustration of a pitcher. Rendering of the model with our
learned style. Renderings of new views and new objects.

In addition, if none of the features, nor a combination of them, can be mapped to a hatching property, then our method will also fail.

Finding what and how other features are relevant to artists' pen-and-ink illustrations is an open problem. Our method does not represent the dependence of style on part labels (e.g., eyes versus torsos), as previously done for painterly rendering of images [Zeng et al. 2009]. Given such labels, it could be possible to generalize the algorithm to take this information into account.

The quality of our results depends on how well the hatching properties were extracted from the training drawing during the preprocessing step. This step gives only coarse estimates, and depends on various thresholds. This preprocessing cannot handle highly-stylized strokes such as wavy lines or highly-textured strokes.

Example-based stroke synthesis [Freeman et al. 2003; Hertzmann et al. 2002; Kalnins et al. 2002] may be combined with our approach to generate styles with similar stroke texture. An optimization technique [Turk and Banks 1996] might be used to place streamlines appropriately in order to match a target tone. Our method focuses only on hatching, and renders feature curves separately. Learning the feature curves is an interesting future direction.


Fig. 10. Data-driven line art illustrations generated with our algorithm. From left to right: Artist’s illustration of a Venus statue. Rendering of the model with
our learned style. Renderings of new views and new objects.


Fig. 11. Data-driven line art illustrations generated with our algorithm. From left to right: Artist’s illustration of a bunny using a particular style; hatching
orientations are mostly aligned with point light directions. Rendering of the model with our learned style. Renderings of new views and new objects.

Another direction for future work is hatching for animated scenes, possibly based on a data-driven model similar to Kalogerakis et al. [2009]. Finally, we believe that aspects of our approach may be applicable to other applications in geometry processing and artistic rendering, especially for vector field design.

APPENDIX

A. IMAGE PREPROCESSING

Given an input illustration drawn by an artist, we apply the following steps to determine the hatching properties for each stroke pixel. First, we scan the illustration and align it to the rendering automatically by matching borders with brute-force search. The following steps are sufficiently accurate to provide training data for our algorithms.

Intensity. The intensity Ii is set to the grayscale intensity of pixel i of the drawing. It is normalized within the range [0, 1].

Thickness. Thinning is first applied to identify a single-pixel-wide skeleton for the drawing. Then, from each skeletal pixel, a Breadth-First Search (BFS) is performed to find the nearest pixel in the source image with intensity less than half of the start pixel. The distance to this pixel is the stroke thickness.

Orientation. The structure tensor of the local image neighborhood is computed at the scale of the previously-computed thickness of the stroke. The dominant orientation in this neighborhood is given by the eigenvector corresponding to the smallest eigenvalue of the structure tensor. Intersection points are also detected, so that they can be omitted from orientation learning. Our algorithm marks as intersection points those points detected by a Harris corner detector in both the original drawing and the skeleton image. Finally, in order to remove spurious intersection points, pairs of intersection points are found with distance less than the local stroke thickness, and their centroid is marked as an intersection instead.
Given an input illustration drawn by an artist, we apply the fol- order to remove spurious intersection points, pairs of intersection
lowing steps to determine the hatching properties for each stroke points are found with distance less than the local stroke thickness,
pixel. First, we scan the illustration and align it to the rendering and their centroid is marked as an intersection instead.
automatically by matching borders with brute-force search. The Spacing. For each skeletal pixel, a circular region is grown around
following steps are sufficiently accurate to provide training data for the pixel. At each radius, the connected components of the region
our algorithms. are computed. If at least 3 pixels in the region are not connected to
the center pixel, with orientation within π/6 of the center pixel’s
Intensity. The intensity Ii is set to the grayscale intensity of the orientation, then the process halts. The spacing at the center pixel
pixel i of the drawing. It is normalized within the range [0, 1]. is set to the final radius.
Thickness. Thinning is first applied to identify a single-pixel-wide Length. A BFS is executed on the skeletal pixels to count the
skeleton for the drawing. Then, from each skeletal pixel, a number of pixels per stroke. In order to follow a single stroke
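To make the orientation step concrete, the following Python sketch (not the code used in this article; the function and variable names are our own, and a single smoothing scale sigma stands in for the per-pixel thickness-dependent scale) estimates a dominant stroke direction per pixel from the structure tensor of the drawing.

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def stroke_orientations(drawing, sigma):
        """Estimate a per-pixel stroke direction from the structure tensor.
        drawing: 2D float array (grayscale illustration); sigma: smoothing
        scale, ideally proportional to the local stroke thickness."""
        gx = sobel(drawing, axis=1)
        gy = sobel(drawing, axis=0)
        # Structure tensor entries, averaged over the local neighborhood.
        jxx = gaussian_filter(gx * gx, sigma)
        jxy = gaussian_filter(gx * gy, sigma)
        jyy = gaussian_filter(gy * gy, sigma)
        orientations = np.zeros(drawing.shape + (2,))
        for y, x in np.ndindex(drawing.shape):
            t = np.array([[jxx[y, x], jxy[y, x]],
                          [jxy[y, x], jyy[y, x]]])
            evals, evecs = np.linalg.eigh(t)   # eigenvalues in ascending order
            # The eigenvector of the smallest eigenvalue points along the stroke.
            orientations[y, x] = evecs[:, 0]
        return orientations

The intersection handling (the Harris-corner test on both the drawing and its skeleton) is omitted here for brevity.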
Fig. 12. Data-driven line art illustrations generated with our algorithm. From left to right: Artist’s illustration of a statue. Rendering of the model with our
learned style. Renderings of new views and new objects.
Fig. 13. Data-driven line art illustrations generated with our algorithm. From left to right: Artist’s illustration of a cow. Rendering of the model with our
learned style. Renderings of new views and new objects.

Length. A BFS is executed on the skeletal pixels to count the number of pixels per stroke. In order to follow a single stroke (excluding pixels from overlapping cross-hatching strokes), at each BFS expansion, pixels are considered inside the current neighborhood if they have similar orientation (at most π/12 angular difference from the current pixel's orientation).

Hatching level. For each stroke pixel, an ellipsoidal mask is created with its semiminor axis aligned to the extracted orientation, and major radius equal to its spacing. All pixels belonging to any of these masks are given label Hi = 1. For each intersection pixel, a circular mask is also created around it with radius equal to its spacing. All connected components are computed from the union of these masks. If any connected component contains more than 4 intersection pixels, the pixels of the component are assigned with

Fig. 14. Data-driven line art illustrations generated with our algorithm based on the learned styles from the artists’ drawings in Figures 1, 6, 7, 10, 13.

label Hi = 2. Two horizontal and vertical strokes give rise to a minimum cross-hatching region (with 4 intersections).

Hatching region boundaries. Pixels are marked as boundaries if they belong to boundaries of the hatching regions or if they are endpoints of the skeleton of the drawing.

We perform a final smoothing step (with a Gaussian kernel of width equal to the median of the spacing values) to denoise the properties.

Fig. 15. Frequency of the first three orientation features selected by gradient-based boosting for learning the hatching orientation fields. The frequency is averaged over all our nine training drawings (Figures 1, 6, 7, 8, 9, 10, 11, 12, 13). The contribution of each feature is also weighted by the total segment area where it is used. The orientation features are grouped based on their type: principal curvature directions (kmax, kmin), local principal axes directions (ea, eb), ∇(L × N), ∇(V × N), directions aligned with suggestive contours (s), valleys (v), ridges (r), gradient of ambient occlusion (∇a), gradient of image intensity (∇I), gradient of (L · N), gradient of (V · N), vector irradiance (E), and projected light direction (L).

B. SCALAR FEATURES

There are 1204 scalar features (x̃ ∈ R^1204) for learning the scalar properties of the drawing. The first 90 are mean curvature, Gaussian curvature, maximum and minimum principal curvatures by sign and absolute value, derivatives of curvature, radial curvature and its derivative, view-dependent minimum and maximum curvatures [Judd et al. 2007], and geodesic torsion in the projected viewing direction [DeCarlo and Rusinkiewicz 2007]. These are measured at three scales (1%, 2%, 5% relative to the median of all-pairs geodesic distances in the mesh) for each vertex. We also include their absolute values, since some hatching properties may be insensitive to sign. The aforesaid features are first computed in object-space and then projected to image-space.

The next 110 features are based on local shape descriptors, also used in Kalogerakis et al. [2010] for labeling parts. We compute the singular values s1, s2, s3 of the covariance of vertices inside patches of various geodesic radii (5%, 10%, 20%) around each vertex, and also add the following features for each patch: s1/(s1+s2+s3), s2/(s1+s2+s3), s3/(s1+s2+s3), (s1+s2)/(s1+s2+s3), (s1+s3)/(s1+s2+s3), (s2+s3)/(s1+s2+s3), s1/s2, s1/s3, s2/s3, s1/s2 + s1/s3, s1/s2 + s2/s3, s1/s3 + s2/s3, yielding 45 features total. We also include 24 features based on the Shape Diameter Function (SDF) [Shapira et al. 2010] and distance from medial surface [Liu et al. 2009]. The SDF features are computed using cones of angles 60, 90, and 120 degrees per vertex. For each cone, we get the weighted average of the samples and their logarithmized versions with different normalizing parameters α = 1, α = 2, α = 4. For each of the preceding cones, we also compute the distance of the medial surface from each vertex. We measure the diameter of the maximal inscribed sphere touching each vertex. The corresponding medial surface point will be roughly its center. Then we send rays from this point uniformly sampled on a Gaussian sphere, gather the intersection points, and measure the ray lengths.

[Figure 16: six bar charts showing, from left to right and top to bottom, the top features used for hatching level, thickness, spacing, intensity, length, and segment label.]

Fig. 16. Frequency of the first three scalar features selected by the boosting techniques used in our algorithm for learning the scalar hatching properties. The frequency is averaged over all nine training drawings. The scalar features are grouped based on their type: Curvature (Curv.), Derivatives of Curvature (D. Curv.), Radial Curvature (Rad. Curv.), Derivative of Radial Curvature (D. Rad. Curv.), Torsion, features based on PCA analysis of local shape neighborhoods, features based on Shape Context histograms [Belongie et al. 2002], features based on the geodesic distance descriptor [Hilaga et al. 2001], shape diameter function features [Shapira et al. 2010], distance from medial surface features [Liu et al. 2009], depth, ambient occlusion, image intensity (I), V · N, L · N, gradient magnitudes of the last three, strength of suggestive contours, strength of apparent ridges, strength of ridges and valleys, and contextual label features.

As with the shape diameter features, we use the weighted average of the samples, and we normalize and logarithmize them with the same preceding normalizing parameters. In addition, we use the average, squared mean, and 10th, 20th, ..., 90th percentile of the geodesic distances of each vertex to all the other mesh vertices, yielding 11 features. Finally, we use 30 shape context features [Belongie et al. 2002], based on the implementation of Kalogerakis et al. [2010]. All these features are first computed in object-space per vertex and then projected to image-space.

The next 53 features are based on functions of the rendered 3D object in image-space. We use maximum and minimum image curvature, image intensity, and image gradient magnitude features, computed with derivative-of-Gaussian kernels with σ = 1, 2, 3, 5, yielding 16 features. The next 12 features are based on shading under different models: V · N, L · N (both clamped at zero), and ambient occlusion, where V, L, and N are the view, light, and normal vectors at a point. These are also smoothed with Gaussian kernels of σ = 1, 2, 3, 5. We also include the corresponding gradient magnitudes and the maximum and minimum curvature of the V · N and L · N features, yielding 24 more features. We finally include the depth value for each pixel.
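As an illustration of the image-space shading features, the following Python sketch (our own simplified example, not the original implementation; it assumes per-pixel normal, view-direction, and light-direction maps are already available from the renderer, and it omits ambient occlusion) computes the clamped V · N and L · N terms and their multi-scale smoothed versions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def shading_features(normals, view_dirs, light_dirs, sigmas=(1, 2, 3, 5)):
        """normals, view_dirs, light_dirs: (H, W, 3) unit-vector maps.
        Returns a dict of per-pixel shading feature maps keyed by (name, sigma)."""
        v_dot_n = np.clip(np.sum(view_dirs * normals, axis=2), 0.0, None)
        l_dot_n = np.clip(np.sum(light_dirs * normals, axis=2), 0.0, None)
        features = {}
        for s in sigmas:
            # Smooth each shading term at several scales, as described above.
            features[("V.N", s)] = gaussian_filter(v_dot_n, s)
            features[("L.N", s)] = gaussian_filter(l_dot_n, s)
        return features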

We finally include the per-pixel intensity of occluding and suggestive contours, ridges, valleys, and apparent ridges extracted by the rtsc software package [Rusinkiewicz and DeCarlo 2007]. We use 4 different thresholds for extracting each feature line (the rtsc thresholds are chosen from the logarithmic space [0.001, 0.1] for suggestive contours and valleys, and [0.01, 0.1] for ridges and apparent ridges). We also produce dilated versions of these feature lines by convolving their image with Gaussian kernels with σ = 5, 10, 20, yielding in total 48 features.

Finally, we also include all the aforesaid 301 features with their powers of 2 (quadratic features), −1 (inverse features), and −2 (inverse quadratic features), yielding 1204 features in total. For the inverse features, we prevent divisions by zero by truncating near-zero values to 1e−6 (or −1e−6 if they are negative). Using these transformations on the features yielded slightly better results for our predictions.
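This final feature expansion can be written in a few lines; the Python sketch below (ours, with hypothetical names) appends the quadratic, inverse, and inverse-quadratic versions of a raw feature matrix while guarding against division by zero.

    import numpy as np

    def expand_features(X, eps=1e-6):
        """X: (num_pixels, 301) matrix of raw scalar features.
        Returns a (num_pixels, 1204) matrix with the original features plus
        their powers of 2, -1, and -2, truncating near-zero values first."""
        # Truncate near-zero entries to +/- eps before inverting.
        safe = np.where(np.abs(X) < eps, np.where(X < 0, -eps, eps), X)
        return np.hstack([X, X ** 2, safe ** -1.0, safe ** -2.0])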
C. ORIENTATION FEATURES

There are 70 orientation features (θ) for learning the hatching and cross-hatching orientations. Each orientation feature is a direction in image-space; orientation features that begin as 3D vectors are projected to 2D. The first six features are based on surface principal curvature directions computed at 3 scales as before. The next six features are based on surface local PCA axes, projected on the tangent plane of each vertex, corresponding to the two larger singular values of the covariance of multiscale surface patches computed as earlier. Note that the local PCA axes correspond to candidate local planar symmetry axes [Simari et al. 2006]. The next features are L × N and V × N. The preceding orientation fields are undefined at some points (near umbilic points for curvature directions, near planar and spherical patches for the PCA axes, and near L · N = 0 and V · N = 0 for the rest). Hence, we use globally-smoothed directions based on the technique of Hertzmann and Zorin [2000]. Next, we include L and the vector irradiance E [Arvo 1995]. The next 3 features are vector fields aligned with the occluding and suggestive contours (given the view direction), ridges, and valleys of the mesh. The next 16 features are image-space gradients of the following scalar features: ∇(V · N), ∇(L · N), ambient occlusion, and image intensity ∇I, computed at 4 scales as before. The remaining orientation features are the directions of the first 35 features rotated by 90 degrees in the image-space.
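Since the last 35 orientation features are simply the first 35 rotated in the image plane, they can be generated mechanically; the short Python sketch below (our own illustration, assuming the first 35 features have already been projected to image-space) appends the 90-degree rotations.

    import numpy as np

    def add_rotated_features(directions):
        """directions: (H, W, 35, 2) array of image-space orientation features.
        Returns a (H, W, 70, 2) array with the 90-degree rotations appended."""
        # Rotating (x, y) by 90 degrees in image space gives (-y, x).
        rotated = np.stack([-directions[..., 1], directions[..., 0]], axis=-1)
        return np.concatenate([directions, rotated], axis=-2)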
D. BOOSTING FOR REGRESSION

The stroke orientations, as well as the thickness, intensity, length, and spacing, are learned with the gradient-based boosting technique of Zemel and Pitassi [2001]. Given input features x, the gradient-based boosting technique aims at learning an additive model of the following form to approximate a target property:

τ(x) = Σk rk ψσ(k)(x),   (17)

where ψσ(k) is a function on the k-th selected feature with index σ(k) and rk is its corresponding weight. For stroke orientations, the functions are simply single orientation features: ψσ(k)(v) = vσ(k). Hence, in this case, the preceding equation represents a weighted combination (i.e., interpolation) of the orientation features, as expressed in Eq. (7) with rk = wσ(k). For the thickness, spacing, intensity, and length, we use functions of the form ψσ(k)(x) = log(ak xσ(k) + bk), so that the selected feature is scaled and translated properly to match the target stroke property, as expressed in Eq. (9) with rk = ασ(k).

Given N training pairs {xi, ti}, i = {1, 2, ..., N}, where ti are exemplar values of the target property, the gradient-based boosting algorithm attempts to minimize the average error of the models of the single features with respect to the weight vector r:

L(r) = (1/N) Σi=1..N ( Σk=1..K rk^−0.5 ) exp( ( Σk=1..K rk · (ti − ψk(xi)) )^2 ).   (18)

This objective function is minimized iteratively by updating a set of weights {ωi} on the training samples {xi, ti}. The weights are initialized to be uniform, that is, ωi = 1/N, unless there is a prior confidence on each sample; in this case, the weights can be initialized accordingly as in Section 5.1. Then, our algorithm initiates the boosting iterations, which have the following steps.

—For each feature f in x, the following function is minimized:

Lf = Σi=1..N ωi rk^−0.5 exp( ( rk (ti − ψf(xi)) )^2 )   (19)

with respect to rk, as well as the parameters ak, bk in the case of learning stroke properties. The parameter rk is optimized using Matlab's active-set algorithm, including the constraint that rk ∈ (0, 1] (with initial estimate set to 0.5). For the first boosting iteration k = 1, rk = 1 is always used. For stroke properties, our algorithm alternates between optimizing for the parameters ak, bk with Matlab's BFGS implementation, keeping rk constant, and optimizing for the parameter rk, keeping the rest constant, until convergence or until 10 iterations are completed.

—The feature f that yields the lowest value for Lf is selected, hence σ(k) = arg min_f Lf.

—The weights on the training pairs are updated as follows:

ωi = ωi · rk^−0.5 exp( ( rk · (ti − ψσ(k)(xi)) )^2 ).   (20)

—The weights ωi = ωi / Σi ωi are normalized so that they sum to 1.

—The hold-out validation error is measured: if it is increased, the loop is terminated and the selected feature of the current iteration is disregarded.

Finally, the weights rk = rk / Σk rk are normalized so that they sum to 1.
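The following Python sketch illustrates the boosting loop described above for a scalar stroke property. It is a simplified reading of this appendix rather than the authors' implementation: it substitutes scipy.optimize.minimize for Matlab's active-set and BFGS routines, uses a fixed number of boosting rounds, and omits the hold-out validation test; all names are our own.

    import numpy as np
    from scipy.optimize import minimize

    def stage_loss(xf, t, w, r, a, b):
        """Weighted single-feature loss of Eq. (19), with psi(x) = log(a*x + b)."""
        psi = np.log(np.maximum(a * xf + b, 1e-6))
        return float(np.sum(w * r ** -0.5 * np.exp((r * (t - psi)) ** 2)))

    def fit_single_feature(xf, t, w, first_iter):
        """Fit (r, a, b) for one candidate feature by alternating optimization."""
        a, b = 1.0, 1.0 + max(0.0, -float(xf.min()))  # keep the log argument positive
        r = 1.0 if first_iter else 0.5
        for _ in range(10):  # "until convergence or until 10 iterations"
            res = minimize(lambda p: stage_loss(xf, t, w, r, p[0], p[1]),
                           [a, b], method="L-BFGS-B")
            a, b = res.x
            if not first_iter:  # r is fixed to 1 on the first boosting round
                res = minimize(lambda p: stage_loss(xf, t, w, p[0], a, b),
                               [r], method="L-BFGS-B", bounds=[(1e-3, 1.0)])
                r = float(res.x[0])
        return stage_loss(xf, t, w, r, a, b), r, a, b

    def boost(X, t, num_rounds=20):
        """X: (N, F) scalar features; t: (N,) target stroke property, assumed
        roughly normalized so the exp terms stay finite."""
        n, f = X.shape
        w = np.full(n, 1.0 / n)  # uniform sample weights
        model = []               # list of (feature index, r, a, b)
        for k in range(num_rounds):
            fits = [fit_single_feature(X[:, j], t, w, k == 0) for j in range(f)]
            j = int(np.argmin([fit[0] for fit in fits]))
            _, r, a, b = fits[j]
            model.append((j, r, a, b))
            psi = np.log(np.maximum(a * X[:, j] + b, 1e-6))
            w *= r ** -0.5 * np.exp((r * (t - psi)) ** 2)  # weight update, Eq. (20)
            w /= w.sum()
        total = sum(r for _, r, _, _ in model)
        return [(j, r / total, a, b) for j, r, a, b in model]  # normalize the weights

For orientation learning, the per-feature fit reduces to choosing the weight r alone, since ψσ(k) is the raw orientation feature itself.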
ACKNOWLEDGMENTS

The authors thank Seok-Hyung Bae, Patrick Coleman, Vikramaditya Dasgupta, Mark Hazen, Thomas Hendry, and Olga Vesselova for creating the hatched drawings. The authors thank Olga Veksler for the graph cut code and Robert Kalnins, Philip Davidson, and David Bourguignon for the jot code. The authors thank the Aim@Shape, VAKHUN, and Cyberware repositories as well as Xiaobai Chen, Aleksey Golovinskiy, Thomas Funkhouser, Andrea Tagliasacchi, and Richard Zhang for the 3D models used in this article.

REFERENCES

ARVO, J. 1995. Applications of irradiance tensors to the simulation of non-lambertian phenomena. In Proceedings of the SIGGRAPH Conference. 335–342.

BARLA, P., BRESLAV, S., THOLLOT, J., SILLION, F., AND MARKOSIAN, L. 2006. Stroke pattern analysis and synthesis. Comput. Graph. Forum 25, 3.
BELONGIE, S., MALIK, J., AND PUZICHA, J. 2002. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24, 4.
BISHOP, C. M. 2006. Pattern Recognition and Machine Learning. Springer.
BOYKOV, Y., VEKSLER, O., AND ZABIH, R. 2001. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23, 11.
COLE, F., GOLOVINSKIY, A., LIMPAECHER, A., BARROS, H. S., FINKELSTEIN, A., FUNKHOUSER, T., AND RUSINKIEWICZ, S. 2008. Where do people draw lines? ACM Trans. Graph. 27, 3.
DECARLO, D. AND RUSINKIEWICZ, S. 2007. Highlight lines for conveying shape. In Proceedings of the NPAR Conference.
ELBER, G. 1998. Line art illustrations of parametric and implicit forms. IEEE Trans. Vis. Comput. Graph. 4, 1, 71–81.
FREEMAN, W. T., TENENBAUM, J., AND PASZTOR, E. 2003. Learning style translation for the lines of a drawing. ACM Trans. Graph. 22, 1, 33–46.
FRIEDMAN, J., HASTIE, T., AND TIBSHIRANI, R. 2000. Additive logistic regression: A statistical view of boosting. Ann. Statist. 38, 2.
GOODWIN, T., VOLLICK, I., AND HERTZMANN, A. 2007. Isophote distance: A shading approach to artistic stroke thickness. In Proceedings of the NPAR Conference. 53–62.
GUPTILL, A. L. 1997. Rendering in Pen and Ink, S. E. Meyer, Ed., Watson-Guptill.
HAMEL, J. AND STROTHOTTE, T. 1999. Capturing and re-using rendition styles for non-photorealistic rendering. Comput. Graph. Forum 18, 3, 173–182.
HERTZMANN, A., JACOBS, C. E., OLIVER, N., CURLESS, B., AND SALESIN, D. H. 2001. Image analogies. In Proceedings of the SIGGRAPH Conference.
HERTZMANN, A., OLIVER, N., CURLESS, B., AND SEITZ, S. M. 2002. Curve analogies. In Proceedings of the EGWR Conference.
HERTZMANN, A. AND ZORIN, D. 2000. Illustrating smooth surfaces. In Proceedings of the SIGGRAPH Conference. 517–526.
HILAGA, M., SHINAGAWA, Y., KOHMURA, T., AND KUNII, T. L. 2001. Topology matching for fully automatic similarity estimation of 3d shapes. In Proceedings of the SIGGRAPH Conference.
JODOIN, P.-M., EPSTEIN, E., GRANGER-PICHÉ, M., AND OSTROMOUKHOV, V. 2002. Hatching by example: A statistical approach. In Proceedings of the NPAR Conference. 29–36.
JORDAN, M. I. AND JACOBS, R. A. 1994. Hierarchical mixtures of experts and the em algorithm. Neur. Comput. 6, 181–214.
JUDD, T., DURAND, F., AND ADELSON, E. 2007. Apparent ridges for line drawing. ACM Trans. Graph. 26, 3.
KALNINS, R., MARKOSIAN, L., MEIER, B., KOWALSKI, M., LEE, J., DAVIDSON, P., WEBB, M., HUGHES, J., AND FINKELSTEIN, A. 2002. WYSIWYG NPR: Drawing strokes directly on 3D models. In Proceedings of the SIGGRAPH Conference. 755–762.
KALOGERAKIS, E., HERTZMANN, A., AND SINGH, K. 2010. Learning 3d mesh segmentation and labeling. ACM Trans. Graph. 29, 3.
KALOGERAKIS, E., NOWROUZEZAHRAI, D., SIMARI, P., MCCRAE, J., HERTZMANN, A., AND SINGH, K. 2009. Data-Driven curvature for real-time line drawing of dynamic scenes. ACM Trans. Graph. 28, 1.
KIM, S., MACIEJEWSKI, R., ISENBERG, T., ANDREWS, W. M., CHEN, W., SOUSA, M. C., AND EBERT, D. S. 2009. Stippling by example. In Proceedings of the NPAR Conference.
KIM, S., WOO, I., MACIEJEWSKI, R., AND EBERT, D. S. 2010. Automated hedcut illustration using isophotes. In Proceedings of the Smart Graphics Conference.
KIM, Y., YU, J., YU, X., AND LEE, S. 2008. Line-Art illustration of dynamic and specular surfaces. ACM Trans. Graph.
LIU, R. F., ZHANG, H., SHAMIR, A., AND COHEN-OR, D. 2009. A part-aware surface metric for shape analysis. Comput. Graph. Forum 28, 2.
LUM, E. B. AND MA, K.-L. 2005. Expressive line selection by example. Vis. Comput. 21, 8, 811–820.
MERTENS, T., KAUTZ, J., CHEN, J., BEKAERT, P., AND DURAND, F. 2006. Texture transfer using geometry correlation. In Proceedings of the EGSR Conference.
PALACIOS, J. AND ZHANG, E. 2007. Rotational symmetry field design on surfaces. ACM Trans. Graph.
PRAUN, E., HOPPE, H., WEBB, M., AND FINKELSTEIN, A. 2001. Real-Time Hatching. In Proceedings of the SIGGRAPH Conference.
RUSINKIEWICZ, S. AND DECARLO, D. 2007. rtsc library. http://www.cs.princeton.edu/gfx/proj/sugcon/.
SAITO, T. AND TAKAHASHI, T. 1990. Comprehensible rendering of 3-D shapes. In Proceedings of the SIGGRAPH Conference. 197–206.
SALISBURY, M. P., ANDERSON, S. E., BARZEL, R., AND SALESIN, D. H. 1994. Interactive pen-and-ink illustration. In Proceedings of the SIGGRAPH Conference. 101–108.
SHAPIRA, L., SHALOM, S., SHAMIR, A., ZHANG, R. H., AND COHEN-OR, D. 2010. Contextual part analogies in 3D objects. Int. J. Comput. Vis.
SHOTTON, J., JOHNSON, M., AND CIPOLLA, R. 2008. Semantic texton forests for image categorization and segmentation. In Proceedings of the CVPR Conference.
SHOTTON, J., WINN, J., ROTHER, C., AND CRIMINISI, A. 2009. Textonboost for image understanding: Multi-Class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 81, 1.
SIMARI, P., KALOGERAKIS, E., AND SINGH, K. 2006. Folding meshes: Hierarchical mesh segmentation based on planar symmetry. In Proceedings of the SGP Conference.
SINGH, M. AND SCHAEFER, S. 2010. Suggestive hatching. In Proceedings of the Computational Aesthetics Conference.
SLOAN, P.-P., KAUTZ, J., AND SNYDER, J. 2002. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. In Proceedings of the SIGGRAPH Conference. 527–536.
TORRALBA, A., MURPHY, K. P., AND FREEMAN, W. T. 2007. Sharing visual features for multiclass and multiview object detection. IEEE Trans. Pattern Anal. Mach. Intell. 29, 5.
TU, Z. 2008. Auto-context and its application to high-level vision tasks. In Proceedings of the CVPR Conference.
TURK, G. AND BANKS, D. 1996. Image-Guided streamline placement. In Proceedings of the SIGGRAPH Conference.
WAGSTAFF, K., CARDIE, C., ROGERS, S., AND SCHRÖDL, S. 2001. Constrained k-means clustering with background knowledge. In Proceedings of the ICML Conference.
WINKENBACH, G. AND SALESIN, D. 1994. Computer-Generated pen-and-ink illustration. In Proceedings of the SIGGRAPH Conference. 91–100.
WINKENBACH, G. AND SALESIN, D. 1996. Rendering parametric surfaces in pen and ink. In Proceedings of the SIGGRAPH Conference. 469–476.
ZEMEL, R. AND PITASSI, T. 2001. A gradient-based boosting algorithm for regression problems. In Proceedings of the Conference on Neural Information Processing Systems.
ZENG, K., ZHAO, M., XIONG, C., AND ZHU, S.-C. 2009. From image parsing to painterly rendering. ACM Trans. Graph. 29.

Received October 2010; revised June 2011; accepted July 2011
