GPU-Driven Real-Time Mesh Contour Vectorization
Wangziwei Jiang, Guiqing Li, Yongwei Nie, Chuhua Xian
South China University of Technology, Institute of Computer Science and Engineering, China
Figure 1: Real-time vectorization and stylization of a rose (158k triangles) at 2560 × 1440 resolution. From left to right: the input 3D mesh, the vectorized stroke curves rendered in different colors, and two different stylizations based on the extracted stroke curves.
Abstract
Rendering contours of 3D meshes has a wide range of applications. Previous CPU-based contour rendering algorithms support advanced stylized effects but cannot achieve real-time performance. On the other hand, real-time GPU-based algorithms have to sacrifice some advanced stylization effects because of the difficulty of linking contour elements into stroke curves. This paper proposes a GPU-based mesh contour rendering method with the following steps: (1) before rendering, a preprocessing step analyzes the adjacency and geometric information of the 3D mesh model; (2) at runtime, an extraction stage first selects contour edges from the 3D mesh model, and a parallelized Bresenham algorithm then rasterizes the contour edges into a set of oriented contour pixels; (3) next, a parallelized Potrace extracts (pixel) edge loops from the contour pixels; (4) subsequently, a novel segmentation procedure partitions the edge loops into strokes; (5) finally, these strokes are converted into 2D strip meshes that can be rendered with controllable styles. Except for the preprocessing step, all procedures are implemented in parallel on the GPU. This enables our framework to achieve real-time performance for high-resolution rendering of dense mesh models.
CCS Concepts
• Computing methodologies → Non-photorealistic rendering; Image processing;
fully implemented on the GPU in order to avoid frequent communication between the CPU and GPU. GPU-based methods can be roughly divided into two categories: contour-edge-based rendering [MH04, CF09] and image-filtering-based rendering [ST90, ND04]. The former directly extracts contour edges from the mesh and then renders each edge as a line segment or a rectangle, while the latter first renders the geometry (for example, depth and normals) into textures and then finds feature pixels via image-processing filters. Unfortunately, to our knowledge, existing fully GPU-based methods cannot link contour pixels/edges together to form stroke curves, which is, however, the key to contour stylization. Recently, we have also witnessed deep neural networks being utilized to produce stylized line drawings [LNHK20, LFHK21], but these mainly focus on learning styles rather than curve generation.

Chaining contour elements (image pixels or mesh edges) into a curve usually starts from one contour element and then continuously links the current element to its adjacent neighbor, until arriving at a singular point where the chain's visibility changes [BH19]. When linking 2D pixels, this process can be considered a particular genre of image vectorization. The linking procedure is difficult to parallelize due to its sequential nature and the irregular topology of contour edges or pixels.

To address this issue, we propose a GPU-based system to generate stroke curves from 3D mesh models. It first works on the CPU to prepare the adjacency information between vertices, edges and faces, as well as geometric attributes including vertex positions and face normals. A GPU scheme is then designed to quickly locate contour edges between the front and back faces of the mesh. After that, a parallelized Bresenham algorithm [Wri90] is adopted to rasterize these contour edges. To trace the boundaries of the rasterized contours efficiently, we parallelize the Potrace algorithm [Sel03] on the GPU, where the pixel-edge chaining step is parallelized by the technique of parallel list ranking [Wyl79]. Finally, based on the orientation of the mesh contours and the traced boundaries, we devise a simple, likewise parallelized heuristic to extract stroke polylines from the image boundaries.

In summary, our contributions include:

• We parallelize the Potrace algorithm [Sel03], previously designed for CPU-based image vectorization, overcoming the sequential nature of boundary tracing with the technique of parallel list ranking [Wyl79].
• We propose a heuristic-rule-based parallel algorithm to extract stroke curves from the traced boundaries.
• Our method is fully parallel. By exploiting the sparsity of contour edges and pixels, we further improve its performance, achieving real-time rates.

2. Related work

A large amount of literature has been contributed to contour extraction and stylization, which can be roughly classified into three categories: image-based contour rendering, mesh-edge-based contour rendering, and hybrid methods. This work focuses on real-time approaches that can be implemented on a GPU. We refer the readers to the survey by Bénard and Hertzmann [BH19] for more details.

2.1. Image-based contour rendering

Image-based approaches directly apply image filters to extract feature pixels from a rendered image. Some CPU-based algorithms further exploit image vectorization to convert the feature pixels into continuous planar curves. For example, CPU-based image vectorization algorithms [Sel03] can be used to trace feature curves. Xiong et al. [XFZ16] used the GPU to accelerate the vectorization process; however, their method relies on the CPU to finish the sequential contouring.

GPU-based approaches usually use a fragment shader to apply an edge-detection filter to G-buffers. A G-buffer generally consists of three components: scene color, depth and normal images. A pixel in the G-buffer is considered a line feature if its gradient is higher than a specified threshold [CS16]. The mesh contour can be approximately extracted as a subset of these line features. As only a set of scattered pixels is generated, this kind of method supports only limited control over the stylization [ND04, Har07]. For example, it is challenging to achieve thick lines, since the detected pixels are usually highly noisy under a low gradient threshold. It is often inaccurate too; for example, contours may be missed when the depth varies slowly around the contour area.

The "inverted hull", a special GPU-based method by Raskar and Cohen [RC99], is very popular in industry due to its simplicity and efficiency. The given mesh model is rendered twice to reveal its outline: the first pass renders the front faces into a depth buffer, while the second pass renders slightly enlarged back faces in black, so that the contour appears as black borders.

Bénard et al. [BJC∗12] proposed a method to track feature curves in image space with temporal coherence. Their algorithm mainly runs on the CPU, except for the line-pixel filtering and final rendering stages, which are done on a GPU. Each curve is represented as a polyline initialized using a CPU-based image vectorization algorithm. In each frame, they avoid the cost of vectorization (essentially reconstruction) by tracking and deforming a set of curves. Although able to achieve excellent temporal coherence for meshes of moderate complexity, the approach suffers from a performance bottleneck due to multiple readbacks from the GPU to the CPU. Having little knowledge about the underlying 3D scene, its curve topology sometimes deviates from the scene's occlusions and details.

2.2. Mesh-edge-based contour rendering

Instead of extracting contour strokes from rendered images, some methods directly compute and render contour edges from the 3D model. An edge is considered on the contour when one of its two adjacent faces is front-facing and the other back-facing with respect to the current viewpoint.

The earliest GPU-based methods [CM02, Goo03] treat each mesh edge as a degenerate quad and select contour edges in a vertex shader. Each quad contains four vertices (two are the endpoints of the mesh edge and the other two are the opposite vertices on its two adjacent faces) in order to determine whether the corresponding edge is a contour one. A fragment shader is then devised to scan-convert the contour edges.
Noticing that gaps may appear between adjacent edges when vertex normals fail to reflect the contour curvature well, McGuire and Hughes [MH04] drew caps at the ends of each contour edge. There are also efforts that use the GPU to extract mesh edges for other purposes. For example, Peciva et al. [PSM∗13] and Wächter et al. [WKS07] used the GPU to efficiently compute shadow volumes.

Cole and Finkelstein [CF10] noted that early GPU-based methods suffer from visibility issues. They utilized geometry-shader and advanced fragment-shader techniques to achieve accurate visibility determination for contour edges: each edge is projected onto the screen and sliced into small 2D segments, the visibility of each segment is estimated by comparing its depth against the scene depth buffer, and finally each contour edge is individually rendered as textured quads.

2.3. Hybrid approaches

Hybrid methods combine the geometric information of contour edges with the texture information of the rasterized pixels to generate contour stroke curves. Both contour edges and pixels have their own advantages and disadvantages for rendering. The former may lead to small and frequent zig-zag artifacts when rendered as strokes [NM00]. The latter, on the contrary, has simpler topology and a more natural appearance, but usually loses accurate 3D information.

A typical hybrid approach by Isenberg et al. [IHS02] extracts 3D curves from contour edges with the help of an image-precision line-visibility algorithm adapted for contours. The algorithm is essentially a software depth test in which contour edges are scan-converted into pixel-sized fragments and each fragment compares its depth against its 3 × 3 neighbors in the z-buffer.

Our approach analyzes the strokes in image space. We also record geometric information such as vertex positions, face normals, primitive adjacency, and the projected directions of contour edges. Therefore, our method can also be viewed as a GPU-based hybrid algorithm.

3. Overview

Our method takes a triangular mesh as input and generates vectorized contour curves. Specifically, it consists of five stages, as shown in Figure 2. From left to right: (1) preprocessing is conducted on the CPU to collect the adjacency information and geometric attributes of the given mesh model; (2) rasterization recognizes the contour edges by checking the orientation of the faces sharing each edge, and then rasterizes the edges into pixels via a parallelized Bresenham algorithm; (3) the vectorization stage parallelizes the Potrace algorithm to trace the loops of the pixel boundaries; (4) the stroke-generation stage employs a simple heuristic to extract strokes from the pixel-edge loops; (5) finally, the stylization stage renders the contour strokes in a specific style.

Figure 2: Our approach consists of five stages. From left to right: preprocessing, rasterization, vectorization, stroke generation, and stylization rendering. In the middle three pictures, white pixels stand for background regions.

4. Contour rasterization

Conventional hardware rasterization only yields a whole image instead of the desired contour pixels. Hence, we develop a specialized rasterization scheme that collects contour pixels only. Our scheme includes three sequential stages: recognition of contour edges, rasterization of the recognized edges, and visibility decisions on the rasterized fragments (pixels).

4.1. Computation of contour edges

An edge of a mesh is a contour edge if and only if one of its two adjacent triangles is a front face and the other is a back face with respect to the viewpoint. Given the local information of all edges collected on the CPU, our GPU-based procedure first computes the orientation of all faces and then collects the contour edges while discarding the non-contour ones.

Pre-processing. Recognizing a contour edge requires knowing the local geometry near the edge; we therefore collect all related information on the CPU. This includes the following five buffers:

• edge-vertex buffer B_ev: stores the indices of the two vertices of each edge;
• edge-face buffer B_ef: stores the indices of the two adjacent faces of each edge;
• vertex buffer B_vc: records vertex coordinates;
• face-vertex buffer B_fv: stores the vertex indices of each face;
• face-normal buffer B_fn: records face normals.

Considering that concave edges, whose internal dihedral angles are greater than π, cannot be part of a visible contour [BH19], we discard all such edges from B_ev and B_ef to save resources. In our experience, about 40% of the total mesh edges can be removed this way (see Figure 12).

Orientation of triangles with respect to the viewpoint. We dispatch a GPU kernel that calculates the orientation of each face with respect to the viewpoint based on the buffers B_fv and B_fn, and stores it in the face-orientation buffer B_fo, in which a back face is labelled '1' and a front face '0'.

Detection of contour edges. With B_fo as input, a GPU kernel recognizes contour edges from B_ef: an edge is a contour edge if its two adjacent faces have different orientations, namely one with label '1' and the other with label '0'. Next, we use parallel stream compaction [BOA09] to select the contour edges while discarding the rest. This yields a new buffer, the contour-edge buffer B_ce. Subsequent GPU threads only process edges in B_ce instead of those in B_ev. According to McGuire [McG04, MH04], the number of contour edges is close to N_f^0.8, where N_f is the number of mesh faces.
recognizing the contour edges by checking the orientation of faces
sharing the edge and then rasterizing the edges into pixels via a
parallelized Bresenham algorithm; (3) The vectorization state par- 4.2. Fragment generation
allelizes Potrace algorithm to trace the loops of the pixel bound- A parallelized Bresenham algorithm [Wri90] is designed to scan-
aries; (4) The stroke generation stage employs a simple heuristic convert the contour edges into fragments. Each fragment is a pixel-
to extract the strokes from the pixel edge loops; (5) Finally, the sized primitive with geometric attributes and a pointer to its contour
stylization stage yields the rendering result of contour edges with a edge. The algorithm consists of two passes: a counting pass and an
specific style. allocation pass.
Fragment counting pass. With Bce and Bvc as input, this pass
4. Contour rasterization
counts how many fragments are covered by each contour edge. If
Conventional hardware rasterization only yields a whole image in- the absolute slope of the projection of the contour edge is less than
stead of generating the desired contour pixels. Hence, we develop 1, the number of pixels equals to the length of its projection along
Figure 2: Our approach consists of five stages. From left to right are respectively preprocessing, rasterization, vectorization, stroke genera-
tion, and stylization rendering. In the middle three pictures, white pixels stand for background regions.
Fragment generation pass. This pass allocates a fragment-attribute buffer B_fa according to the total fragment count. For each fragment, we record in B_fa its pixel coordinates, the projection of its associated edge vector (with the edge vertices ordered by the adjacent front face), its depth and its normal. The fragments of each contour edge are stored sequentially in B_fa. To achieve such an allocation scheme, we need the mapping between the contour edges in B_ce and the contour fragments in B_fa. We apply an exclusive add-scan to the fragment-count buffer B_cf to build the mapping B_ce_f from each edge to its starting fragment index. The fragment-to-edge mapping B_f_ce is initialized with −1; we use B_ce_f to write the mapping at each starting fragment in B_f_ce, and then broadcast it to the remaining fragments via a segmented max-scan [Ble90] over B_f_ce, with each starting fragment treated as a segment head. B_f_ce and B_ce_f enable each fragment (resp. contour edge) to access the attributes of its corresponding contour edge (resp. fragments). Finally, we apply the parallel Bresenham algorithm [Wri90] to compute the coordinates of each fragment. Note that the depth and normal should be interpolated from the vertex attributes in a perspective-correct manner.
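A minimal serial C++ sketch of the two passes and the scan-built mappings follows; the screen-space edge layout is our own assumption, and the loops stand in for what are, in the paper, GPU kernels plus parallel scans.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// CPU sketch of the two-pass fragment generation (Section 4.2).
struct Edge2D { float x0, y0, x1, y1; }; // projected contour edge (assumption)

// Counting pass: one Bresenham-style count per edge (sub-buffer B_cf).
std::uint32_t fragmentCount(const Edge2D& e) {
    float dx = std::fabs(e.x1 - e.x0), dy = std::fabs(e.y1 - e.y0);
    return 1u + static_cast<std::uint32_t>(dx >= dy ? dx : dy); // major axis
}

void allocateFragments(const std::vector<Edge2D>& bce) {
    std::vector<std::uint32_t> bcf(bce.size());
    for (std::size_t e = 0; e < bce.size(); ++e) bcf[e] = fragmentCount(bce[e]);

    // Exclusive add-scan: B_ce_f maps each edge to its first fragment slot.
    std::vector<std::uint32_t> bcef(bce.size(), 0);
    std::exclusive_scan(bcf.begin(), bcf.end(), bcef.begin(), 0u);
    std::uint32_t total = bce.empty() ? 0 : bcef.back() + bcf.back();

    // Generation pass: B_f_ce maps every fragment back to its edge. On the
    // GPU this is a write at each segment head + a segmented max-scan [Ble90].
    std::vector<std::int32_t> bfce(total, -1);
    for (std::size_t e = 0; e < bce.size(); ++e) bfce[bcef[e]] = std::int32_t(e);
    for (std::size_t i = 1; i < bfce.size(); ++i)
        bfce[i] = std::max(bfce[i], bfce[i - 1]); // broadcast within segments
    // Each fragment i now takes attributes from edge bfce[i], stepping
    // Bresenham to its pixel (depth/normal perspective-correct).
}
```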
pv-frag points are then rendered into a texture with hardware z-test.
4.3. Contour pixel generation At last, each contour pixel samples the texture at its coordinate and
decodes the sampled color to the corresponding fragment attributes.
We need to extract visible fragments from B f a . Accurate contour In Figure 3, pv-frag ’e’ is finally selected in this test.
visibility has long been a challenging problem [CF10,BHK14]. We
address the issue by a two-pass procedure on GPU. A soft depth test Figure 4 presents an example of the visibility test: visible (resp.
picks up pixels covered by visible contour fragments, referred to as hidden) fragments are marked as green (resp. red) on the left col-
contour pixels. A hardware z-test pass then selects the front-most umn; the right column illustrates contour pixels colored with en-
fragment for each contour pixel. coded geometrical attributes. Rasterized contour-pixels only oc-
cupy a tiny portion of the screen, making it possible to achieve
Soft depth-test pass. A scene depth texture is rendered in ad-
realtime image vectorization.
vance. For each contour fragment, we compare its depth from B f a
against depth samples from its 3 × 3 neighborhood in the depth tex-
ture. A fragment passes the test if it is in front of no fewer than two
neighbors and is called a pseudo-visible-fragment (abbrev. as pv-
frag). This relaxed depth test allows multiple pv-frags to cluster in
the same contour pixel as shown in Figure 3 in which ’e’, ’d’ and
’f’ among 6 fragments pass the test to be a pv-frag within the same
screen pixel.
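A minimal sketch of the soft-test predicate is given below, assuming a row-major depth texture; border handling and any depth bias are our assumptions.

```cpp
#include <cstddef>
#include <vector>

// Soft depth test (Section 4.3): a contour fragment becomes a pv-frag if it
// lies in front of at least two of the nine depth samples in its 3x3
// neighborhood of the pre-rendered scene depth texture.
bool isPseudoVisible(const std::vector<float>& sceneDepth, // row-major texture
                     int width, int height,
                     int px, int py, float fragDepth) {
    int wins = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int x = px + dx, y = py + dy;
            if (x < 0 || y < 0 || x >= width || y >= height) continue;
            if (fragDepth <= sceneDepth[std::size_t(y) * width + x]) ++wins;
        }
    return wins >= 2; // relaxed: several pv-frags may share one contour pixel
}
```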
To generate the contour pixels, we use a texture with all pixels initialized to 0. Each pv-frag atomically reads its pixel value from the texture and then marks it as 1. We record the coordinate of each pixel in the pv-frag that first visits it, and then employ the parallel stream-compaction algorithm [SHG∗] to obtain a contour-pixel buffer B_cp with pixel coordinates.

Figure 3: Creation of contour pixels: a soft test generates a buffer of contour pixels such as (x, y) and a set of pv-frags for each pixel, e.g. 'e', 'd' and 'f' on (x, y); a hard test selects the front-most one for each contour pixel, e.g. 'e' among 'e', 'd' and 'f' on (x, y).

Hardware depth-test pass. This pass picks the front-most pv-frag for each contour pixel and copies its fragment attributes into the corresponding pixel of B_cp. We treat each pv-frag as a one-pixel-sized point whose depth is the fragment depth and whose color is computed by packing the bits of the fragment attributes from B_fa. The pv-frag points are then rendered into a texture with the hardware z-test. At last, each contour pixel samples the texture at its coordinate and decodes the sampled color into the corresponding fragment attributes. In Figure 3, pv-frag 'e' is finally selected by this test.

Figure 4 presents an example of the visibility test: visible (resp. hidden) fragments are marked green (resp. red) in the left column; the right column illustrates the contour pixels colored with encoded geometric attributes. Rasterized contour pixels occupy only a tiny portion of the screen, which makes real-time image vectorization possible.

Figure 4: An example showing results before (left: fragments) and after (right: contour pixels) generating contour pixels.
5. Contour chaining

So far, we have obtained B_cp, the buffer of contour pixels with geometric attributes, in which the contour pixels generally form long and thin strands in the corresponding image. A chaining process should be conducted to link the contour pixels into a set of long curves [GTDS10].

Our chaining process is inspired by Potrace [Sel03], which is designed for vectorizing the boundary of a binary image, where each boundary consists of a sequence of boundary pixel-edges. A pixel has four pixel-edges when viewed as a square, and a boundary pixel-edge is one shared by a foreground pixel and a background pixel, as shown in Figure 5. Each boundary is an oriented pixel-edge loop and encloses a connected region. These loops act as a superset of our final stroke curves.

Figure 5: Edge-loops: black and white squares are foreground (contour) and background pixels, respectively; edges shared by white and black squares are boundary pixel edges (red, blue and yellow, with arrows indicating their direction); the three colored polygons are pixel-edge loops.

5.1. Generation of pixel edges and creation of their linkage

We follow the 'path decomposition' scheme of Potrace to generate oriented pixel-edges for each contour pixel, and build their linkage according to the different contour-pixel configurations.

Each pixel-edge is oriented clockwise around its contour pixel. Therefore, for the left pixel-edge of a contour pixel, we need to consider the 2 × 2 block in which the contour pixel sits at the bottom-right corner, as shown in Figure 6. In this case, there are four possible configurations for the next pixel-edge. The other three cases, namely the top, right and bottom pixel-edges of a contour pixel, can be dealt with in a similar manner. Furthermore, we also need to find the previous pixel-edge of the current one in each of the above four cases, which is required by the loop-breaking process.

The whole task only involves the 3 × 3 neighborhood of a contour pixel in the bitmap and is trivial to parallelize. GPU threads only work on B_cp, i.e., the buffer of contour pixels (foreground pixels), in order to improve performance. A binary bitmap with the contour pixels as foreground is required to support neighboring-pixel queries. This step finally outputs a pixel-edge loop buffer, denoted B_pel, in which each element knows the indices of its previous and next pixel-edge neighbors.

Figure 6: Four pixel configurations. Given the left pixel-edge (red arrow) of a contour pixel (black one), its next pixel-edge should be the blue one.
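As a concrete illustration of the left-edge case, the C++ sketch below implements one possible successor rule. The conventions are our assumptions: image coordinates with y growing downward, and edges running clockwise around their own pixel, so the left edge of P = (x, y) travels upward and ends at P's top-left corner. Under these conventions the four configurations of Figure 6 collapse to three distinct successors, and the other three edge types follow by rotating this case.

```cpp
#include <functional>

// Successor rule for the LEFT pixel-edge of contour pixel P = (x, y)
// (Section 5.1). The 2x2 block around the edge's end corner has P at its
// bottom-right, and (x-1, y) is already known to be background.
enum class EdgeType { Left, Top, Right, Bottom };
struct PixelEdge { int x, y; EdgeType type; };

// fg(x, y) reports whether a pixel is a contour (foreground) pixel.
PixelEdge nextOfLeftEdge(int x, int y, const std::function<bool(int, int)>& fg) {
    bool above = fg(x, y - 1);     // pixel straight ahead of the travel
    bool diag  = fg(x - 1, y - 1); // pixel ahead-left of the travel
    if (!above)     return {x, y, EdgeType::Top};            // turn right
    if (!diag)      return {x, y - 1, EdgeType::Left};       // go straight
    /* both set */  return {x - 1, y - 1, EdgeType::Bottom}; // turn left
}
```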
5.2. Edge loop flattening

We propose a parallel solution that replaces the highly sequential process of Potrace: it extracts all pixel-edge loops from B_pel and flattens them onto a linear array, as shown in Figure 7. Our solution consists of two passes: loop breaking and list ranking. The first pass selects a head element to break each edge-loop, while the second pass ranks the pixel-edges of each edge-loop with respect to its head element.

In our setting, each edge-loop is a circular linked list and each pixel-edge is a list node randomly scattered in B_pel. This makes Wyllie's parallel list-ranking algorithm [Wyl79] well suited to determining the rank of each pixel-edge within its pixel-edge loop. With the ranks calculated, organizing the pixel-edges into linear arrays becomes trivial.

Loop breaking. In this step, we determine the head pixel-edge of each pixel-edge loop. We specify the pixel-edge with the largest Morton code [Mor66] as the head of the loop, where the Morton code, unique for each pixel-edge, encodes its direction and related pixel coordinates. After obtaining the Morton codes of all pixel-edges, we employ Wyllie's algorithm with its operator set to integer maximum to pick the head pixel-edge with the maximal Morton code in each loop. The tail node of an edge-loop is then the predecessor of its head. Two traced edge-loops are shown at the top of Figure 7, in which the red arrows mark the heads.
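One way to build such a key is sketched below. The bit interleaving follows the usual Morton construction [Mor66]; the exact bit budget (16 bits per axis, 2 bits for the edge direction) is our assumption.

```cpp
#include <cstdint>

// Loop-breaking key (Section 5.2): every pixel-edge gets a unique integer
// encoding its pixel coordinates and direction, so the maximum over a loop
// picks a deterministic head.
static std::uint64_t spreadBits(std::uint32_t v) { // 16 bits -> every other bit
    std::uint64_t x = v & 0xFFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// dir in {0,1,2,3} distinguishes the four pixel-edges of the same pixel.
std::uint64_t pixelEdgeKey(std::uint32_t px, std::uint32_t py, std::uint32_t dir) {
    std::uint64_t morton = spreadBits(px) | (spreadBits(py) << 1);
    return (morton << 2) | dir; // unique per pixel-edge
}
```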
List ranking. This pass ranks the above linked lists with head and tail nodes via Wyllie's algorithm [Wyl79]. After ranking, we use the rank of each node (pixel-edge) as its array index and serialize all pixel-edge loops into an array, i.e., B_pel, such that the pixel-edges belonging to the same loop occupy a contiguous segment, as shown at the bottom of Figure 7.
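The following serial C++ sketch illustrates the pointer-jumping skeleton of Wyllie's algorithm; each while-iteration corresponds to one parallel GPU pass over all pixel-edges, and replacing the addition with an integer maximum over the Morton keys yields the loop-breaking variant described above.

```cpp
#include <vector>

// Serial sketch of Wyllie's list ranking [Wyl79] (Section 5.2). After loop
// breaking, each loop is a proper list: following `next` from any node
// reaches the tail, whose successor is -1. Every pass doubles the distance
// each pointer skips, so O(log n) passes suffice.
void wyllieRank(std::vector<int> next,      // successor index; -1 at the tail
                std::vector<int>& rank) {   // out: #hops from node to tail
    const int n = static_cast<int>(next.size());
    rank.assign(n, 0);
    for (int i = 0; i < n; ++i) rank[i] = (next[i] == -1) ? 0 : 1;

    bool active = true;
    while (active) {                        // ~ceil(log2 n) rounds
        active = false;
        std::vector<int> next2 = next;      // double buffering, as on the GPU
        std::vector<int> rank2 = rank;
        for (int i = 0; i < n; ++i) {       // conceptually one thread per node
            if (next[i] != -1) {
                rank2[i] = rank[i] + rank[next[i]];
                next2[i] = next[next[i]];
                active = true;
            }
        }
        next.swap(next2);
        rank.swap(rank2);
    }
    // rank[i] is i's distance to the tail; the serialized array index of i
    // inside its loop is (loopLength - 1) - rank[i].
}
```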
5.3. Operations on the edge-loop pool

The pixel-edge loop buffer B_pel forms the basis of our subsequent screen-space algorithms; we call it an edge-loop pool. We develop two special operations on it: spatial filtering and segmentation. Classical parallel-computing primitives like the segmented scan [SHG∗] can be applied to the edge-loop pool by treating each edge-loop as a segment.

Spatial filtering. Spatial filtering can be considered a 1D convolution on each edge-loop: each pixel-edge navigates around its edge-loop and collects data from the neighboring pixel-edges. In our implementation, GPU threads map linearly to all pixel-edges, and each thread caches data in thread-group shared memory. In most cases, neighboring data can be fetched efficiently from this cache. However, there are a few cache misses: (1) the pixel-edge is mapped to the start or end of a thread group; (2) the pixel-edge is at the start (or end) of an edge-loop and its predecessor (or successor) is not mapped to the same thread group, which can only happen for the first or last edge-loop mapped to the group. We detect both scenarios and load the missed data into group shared memory. Since the topology of the edge-loops is fixed within each frame, we can prepare the missed data for each thread group once and reuse it for the whole frame.

Segmentation. Given a key for each pixel-edge, segmentation splits each edge-loop into segments; pixel-edges inside a segment share the same key, and two adjacent segments have different keys. Segmentation requires each pixel-edge to evaluate where its segment starts and ends, which can be implemented via two segmented scans: one for the starting index and another for the ending index.
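As a reference for the forward half of this computation, the sketch below derives segment start indices with a single sequential pass that plays the role of the inclusive segmented max-scan; the loop-head flag layout is our assumption, and the backward pass for the end indices is symmetric.

```cpp
#include <vector>

// Segmentation on the flattened edge-loop pool (Section 5.3). Keys are
// stored loop-contiguously; loopStart[i] flags the first pixel-edge of each
// loop. A segment starts where the key changes or a loop starts; forwarding
// the start index is an inclusive segmented max-scan on the GPU.
std::vector<int> segmentStarts(const std::vector<int>& key,
                               const std::vector<char>& loopStart) {
    const int n = static_cast<int>(key.size());
    std::vector<int> start(n);
    for (int i = 0; i < n; ++i) {
        bool isHead = loopStart[i] || (i > 0 && key[i] != key[i - 1]);
        start[i] = isHead ? i : start[i - 1]; // propagate within the segment
    }
    return start;
}
```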
6. Stroke extraction

Edge-loops excessively cover the contour features and neither start nor end at visibility changes. To resolve this issue, we select the desired pixel-edges from the loops; we call them stroke segments. Each stroke segment starts or ends where its underlying mesh contour becomes visible or hidden, and each contour feature is covered by exactly one stroke segment (Figure 8). An "inner" edge-loop without visibility changes is extracted as a single stroke (see the inner loop in Figure 7).

We match the orientation of each pixel-edge with its surrounding contour-pixels along the edge-loop; pixel-edges with coherent orientation are selected as stroke segments. According to Bénard and Hertzmann [BH19], there are two kinds of visibility changes among contour-pixels: cusps and junctions (see Figure 8b). Our heuristic can resolve both cases owing to the oriented nature of edge-loops and mesh contours.

Figure 8: Orientation-based stroke extraction. (b) A self-occluding model generates a cusp and a junction.

6.1. Orientation of contour pixels

Seen from the viewpoint, the vertices of a front face (resp. back face) have a counter-clockwise (resp. clockwise) winding order. Let each contour edge share the vertex order of its adjacent front face. The contour of a smooth mesh will then form counter-clockwise curves on the screen. After rasterization, the hidden contour is discarded while the visible contour becomes thin, long strands of contour-pixels. As the camera projection preserves the orientation of a triangle face, the strands of contour-pixels share a counter-clockwise orientation (see Figure 8).

During the rasterization stage (Section 4), we enumerate the vertices v_c0 and v_c1 of each contour edge according to its winding order in the adjacent front face, project them to the screen positions v_s0 and v_s1 respectively, and finally copy the edge direction v_s1 − v_s0 to its rasterized contour fragments.

6.2. Orientation of pixel-edges

All pixel-edges are originally oriented clockwise around their contour-pixel square. To estimate an accurate orientation for a pixel-edge, we fit a curve to the local shape of its edge-loop. Our fitting algorithm follows the framework of Lewiner et al. [LGJLC05].

We dispatch two kernels to realize the local curve fitting. The first kernel samples the midpoint of each pixel-edge and then applies a Laplacian operator to smooth the midpoints, using the spatial filtering discussed in Subsection 5.3. The second kernel applies the spatial filtering again to collect, for each pixel-edge e_0, the midpoints W = {m_{-n}, ..., m_0, ..., m_n} of its neighborhood along the corresponding edge-loop (n = 8 in our experiments). In addition, we compute the arc-length parametrization of W as

$$ s_k = \begin{cases} 0, & k = 0;\\ s_{k+1} + \lVert m_k - m_{k+1} \rVert, & k = -1, -2, \cdots, -n;\\ s_{k-1} + \lVert m_k - m_{k-1} \rVert, & k = 1, 2, \cdots, n. \end{cases} \qquad (1) $$
A quadratic parametric curve is then used to fit W:

$$ r(s) = a s + b s^2, \qquad (2) $$
where r(s) = (x(s), y(s)), a = (a_x, a_y) and b = (b_x, b_y). This leads to the following optimization:

$$ \operatorname*{argmin}_{\{a, b\}} \; \sum_{k=-n}^{n} \lVert w_k \, (m_k - r(s_k)) \rVert^2, \qquad (3) $$

where $w_k = e^{-(p_{k+e_0} - p_{e_0})^2 / \sigma^2}$ are Gaussian weights. The orientation of pixel-edge e_0 is then computed as $t_{e_0} = a / \lVert a \rVert$.
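Since the fit is linear in a and b and separable per coordinate, it reduces to a 2 × 2 normal-equation system per pixel-edge. The sketch below solves it directly; note that it substitutes arc-length-based Gaussian weights for the paper's position-based w_k and assumes a non-degenerate neighborhood.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the per-pixel-edge quadratic fit of Section 6.2: minimize
//   sum_k || w_k (m_k - (a s_k + b s_k^2)) ||^2
// via its normal equations, independently for x and y. Inputs are the
// smoothed midpoints m_k (relative to m_0, so r(0) = 0) and their arc-length
// parameters s_k from Eq. (1); sigma is an assumed smoothing radius.
struct Vec2 { float x, y; };

// Returns the unit tangent t_e0 = a / |a| of the fitted r(s) = as + bs^2.
Vec2 fitOrientation(const std::vector<Vec2>& m, const std::vector<float>& s,
                    float sigma) {
    double A = 0, B = 0, C = 0, Px = 0, Py = 0, Qx = 0, Qy = 0;
    for (std::size_t k = 0; k < m.size(); ++k) {
        double w2 = std::exp(-2.0 * s[k] * s[k] / (sigma * sigma)); // w_k^2
        double s1 = s[k], s2 = s1 * s1;
        A += w2 * s2;  B += w2 * s2 * s1;  C += w2 * s2 * s2;
        Px += w2 * s1 * m[k].x;  Py += w2 * s1 * m[k].y;
        Qx += w2 * s2 * m[k].x;  Qy += w2 * s2 * m[k].y;
    }
    double det = A * C - B * B;          // 2x2 system [A B; B C][a;b] = [P;Q]
    double ax = (Px * C - B * Qx) / det; // Cramer's rule, a only: b is not
    double ay = (Py * C - B * Qy) / det; // needed for the tangent direction
    double len = std::sqrt(ax * ax + ay * ay);
    return {float(ax / len), float(ay / len)};
}
```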
Conventional stroke rendering algorithms [DiV13] can easily be applied to these stroke polylines to achieve stylized results. To support texture mapping, we extend each vertex of a polyline along its normal direction [HLW93] to obtain a strip mesh, as illustrated in Figure 10, and then create texture coordinates for each mesh vertex on the given texture.

Sparsity analysis of contour primitives. Our approach benefits greatly in performance from the sparsity of contour edges and contour pixels, which determines the GPU workload. To verify this, we rotate each mesh model and record the average of three ratios: convex edges to total edges, contour edges to total edges, and contour pixels to the pixels covered by the mesh's screen bounding box. Figure 12(a) illustrates that about 60% of the edges remain as convex edges after preprocessing, while only 4% of the total edges are contour edges. Similarly, only about 4% of the rendering pixels are contour pixels.

Figure 12: Sparsity of primitives: (a) ratios between convex edges and total edges, and between contour edges and total edges; (b) ratio between contour pixels and the screen pixels covered by the mesh bounding box.
7.3. Comparison with previous work

As no existing GPU-based approach supports contour vectorization, all the approaches we compare against are CPU-based: Freestyle, Line Art, Pencil+4 and Active Strokes. Freestyle and Line Art are line-drawing modules of Blender, of which Freestyle is based on a series of work by Grabli et al. [GTDS10, GTDS04]. Pencil+4 is a closed-source line-drawing renderer with implementations across multiple software packages; since our system is developed in the Unity engine, we choose the Unity version of Pencil+4 for comparison.

Active Strokes is a prototype based on an image-space line rendering method [BJC∗12]. It differs from the other three tools in two critical points: (1) it only generates curves in the first frame and then maintains a set of curves throughout the subsequent frames, while the other three methods generate stroke curves in every frame; (2) it generates curves from feature samples in the depth buffer with image-space filters (not from the actual 3D mesh contours), while the other three methods generate curves from the 3D contour of the mesh model.

Runtime performance. For the methods generating curves in each frame (all except Active Strokes), we profile the time of contour extraction and stroke vectorization, which are the main focus of our method. For Active Strokes, we record the total cost of feature-pixel extraction (sample extraction and image readback) and curve processing (advection, relaxation, and topology adjustments).
Table 1: Timing comparison over twelve mesh models (see the text for what is measured for each method).

Mesh Model (tris)  Suzanne (0.1k)  Bunny (5k)  Cow (6k)  Teapot (6k)  Fandisk (13k)  Rocker. (20k)
Ours               0.76            0.78        0.7       0.75         0.63           0.86
Pencil+4           7.93            8.43        9.32      8.21         9.22           14.84
Freestyle          27.10           43.32       43.75     44.79        53.34          67.13
Line Art           11.30           18.80       19.65     20.29        28.71          44.69
Active Strokes     54.40           54.68       53.79     46.70        52.26          54.40

Mesh Model (tris)  Horse (97k)  Buddha (99k)  Arm. (100k)  David (100k)  Dragon (249k)  Lucy (300k)
Ours               0.87         0.98          0.97         1.02          1.42           0.96
Pencil+4           68.10        97.67         111.61       67.00         208.00         183.60
Freestyle          242.37       331.10        370.56       383.69        828.47         845.61
Line Art           165.90       189.49        233.44       166.54        445.14         551.52
Active Strokes     51.45        97.09         124.56       91.49         107.43         73.73
Table 1 demonstrates that our approach achieves a speedup of tens to hundreds of times over the other methods. This is because the serial nature of the CPU makes it challenging to process massive geometric data. In addition, iterative processes such as contour vectorization consume considerable time on the CPU (linear time complexity). In contrast, the time complexity of our vectorization algorithm is only O(log n).

Stroke chaining quality. Generally, our method produces stroke curves with quality comparable to, or even better than, the CPU-based approaches, regardless of the complexity of the mesh shapes. In our experiments, the hybrid methods (all except Active Strokes) were tuned to ensure a coherent configuration: only the mesh contour is extracted, and contour elements are chained into stroke curves starting and ending at visibility events. Since the stroke topology of Active Strokes is mainly determined by its curve-tracking process, instead of rendering a static scene we animate the scene to bring motion to the curves and take a screen capture.

All experimental results are presented in Figure 14, in which strokes are drawn in different colors. As shown in areas C and D of the figures, the contours produced by other methods are inappropriately broken into fragmented curves, and the curve topology fails to reflect the occlusion relationships. In contrast, the continuous curves produced by our method match the occlusion relationships better. Nevertheless, area A shows that our image-based line extraction process cannot recognize endpoints with subtle visibility changes on the screen, while the other hybrid methods (all except Active Strokes) produce a more accurate line distribution. This defect is even more visible for Active Strokes, which links all pixels into a single curve. For the dense meshes shown in the third ("David") and fourth ("Lucy") columns, our method yields a line topology similar to that of Pencil+4 and Active Strokes, and much better than those of Line Art and Freestyle, which are highly fragmented, as shown in areas G to K.

Our algorithm achieves a balance between Pencil+4 and Active Strokes: (1) it leads to more coherent and continuous curves than Pencil+4, which links curves directly on the mesh, because contour pixels in the image often have cleaner topology and smoother geometric attributes than the contour edges on the mesh; (2) it captures more details and better reflects the occlusion relationships than Active Strokes.

7.4. Stylized rendering of complex models

Finally, we present some stylized results for complex models. Figure 15 depicts three types of stroke patterns for the Lucy model, while Figure 16 illustrates the final rendering results and stylized strokes for two complex monstrous models.

8. Limitations, conclusions and future work

Our method suffers from some disadvantages in special situations. First, it may ignore subtle contour visibility changes for lack of accurate 3D contour information to guide the stroke extraction, and may therefore wrongly connect different strokes together (see region A in Figure 14). Second, a stroke may be falsely broken if it is occluded by another primitive or object, as shown at the top left of Figure 13a. This case worsens for contour features highly clustered on the screen, such as thin objects, as shown at the bottom left of Figure 13a. This usually does not happen for geometry-based algorithms such as Pencil+4 (see the second column in Figure 13b). Third, a minor limitation is that the extracted strokes have a pixel offset from the actual screen-space contour. This artefact is generally imperceptible and can be amended by moving the stroke pixels towards their associated contour pixels.

Figure 13: Limitations: (a) our method wrongly partitions the rectangle into two strokes due to occlusion by a stick (top left) while Pencil+4 preserves its integrity well (top right); (b) our method falsely clusters contour features of thin objects (the red rectangle regions, bottom left) and Pencil+4 again yields more reasonable results (bottom right).

Regardless of the aforementioned drawbacks, our method achieves an acceleration of hundreds of times over CPU-based methods, which is enough to make up for these disadvantages in real-time applications. Future work includes extending our framework to generate temporally coherent stylized contour animations. It would also be interesting to integrate the proposed framework into a more complete and powerful GPU contour stylization pipeline [BH19]. A reference implementation of the proposed method is available at https://github.com/JiangWZW/Realtime-GPU-Contour-Curves-from-3D-Mesh.

9. Acknowledgements

We thank the anonymous reviewers, especially the primary reviewer, for their valuable and careful comments. We thank Pierre Bénard for kindly providing the experimental data of Active Strokes [BJC∗12]. We also thank Wengrui Ma and Yiming Wu for helpful discussions on Freestyle and Line Art.

This research is sponsored in part by the National Natural Science Foundation of China (61972160, 62072191) and in part by the Natural Science Foundation of Guangdong Province (2019A1515012301, 2019A1515010860). Guiqing Li is the corresponding author.
References

[BCGF10] Bénard P., Cole F., Golovinskiy A., Finkelstein A.: Self-similar texture for coherent line stylization. In Proceedings of the 8th International Symposium on Non-Photorealistic Animation and Rendering (New York, NY, USA, 2010), NPAR '10, Association for Computing Machinery, pp. 91–97. doi:10.1145/1809939.1809950.

[BH19] Bénard P., Hertzmann A.: Line drawings from 3D models: a tutorial. Foundations and Trends in Computer Graphics and Vision 11, 1-2 (2019), 1–159. doi:10.1561/0600000075.

[BHK14] Bénard P., Hertzmann A., Kass M.: Computing smooth surface contours with accurate topology. ACM Transactions on Graphics 33, 2 (2014), 1–21. doi:10.1145/2558307.

[BJC∗12] Bénard P., Jingwan L., Cole F., Finkelstein A., Thollot J.: Active strokes: coherent line stylization for animated 3D models. In NPAR 2012 - 10th International Symposium on Non-Photorealistic Animation and Rendering (Annecy, France, 2012), ACM, pp. 37–46.

[Ble90] Blelloch G.: Prefix sums and their applications. Tech. rep., Carnegie Mellon University, 1990.

[BOA09] Billeter M., Olsson O., Assarsson U.: Efficient stream compaction on wide SIMD many-core architectures. In Proceedings of the Conference on High Performance Graphics 2009 (New York, NY, USA, 2009), HPG '09, Association for Computing Machinery, pp. 159–166.

[CF09] Cole F., Finkelstein A.: Fast high-quality line visibility. In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games (Boston, Massachusetts, 2009), Association for Computing Machinery, pp. 115–120.

[CF10] Cole F., Finkelstein A.: Two fast methods for high-quality line visibility. IEEE Transactions on Visualization and Computer Graphics 16, 5 (2010), 707–717. doi:10.1109/TVCG.2009.102.

[CM02] Card D., Mitchell J. L.: Non-photorealistic rendering with pixel and vertex shaders. Direct3D ShaderX: Vertex and Pixel Shader Tips and Tricks (2002), 319–333.

[CS16] Cardona L., Saito S.: Temporally coherent and artistically intended stylization of feature lines extracted from 3D models. Computer Graphics Forum 35, 7 (2016), 137–146. doi:10.1111/cgf.13011.

[DiV13] DiVerdi S.: A brush stroke synthesis toolbox. In Image and Video-Based Artistic Stylisation. 2013, pp. 23–44.

[GDS04] Grabli S., Durand F., Sillion F.: Density measure for line-drawing simplification, 2004.

[Goo03] Gooch B.: Silhouette extraction. Course Notes for Theory and Practice of Non-Photorealistic Graphics: Algorithms, Methods, and Production Systems 6 (2003), 1–10.

[GTDS04] Grabli S., Turquin E., Durand F., Sillion F. X.: Programmable style for NPR line drawing. In Proceedings of the Fifteenth Eurographics Conference on Rendering Techniques (Norrköping, Sweden, 2004), Eurographics Association, pp. 33–44.

[GTDS10] Grabli S., Turquin E., Durand F., Sillion F. X.: Programmable rendering of line drawing from 3D scenes. ACM Transactions on Graphics 29, 2 (2010), 1–20.

[GVH07] Goodwin T., Vollick I., Hertzmann A.: Isophote distance: a shading approach to artistic stroke thickness. In Proceedings of the 5th International Symposium on Non-Photorealistic Animation and Rendering (San Diego, California, 2007), Association for Computing Machinery, pp. 53–62.

[Har07] Harvill A.: Effective toon-style rendering control using scalar fields, 2007.

[Har10] Harris M.: State of the Art in GPU Data-Parallel Algorithm Primitives. Tech. rep., Nvidia, 2010.

[HLW93] Hsu S. C., Lee I. H. H., Wiseman N. E.: Skeletal strokes. In Proceedings of the 6th Annual ACM Symposium on User Interface Software and Technology (New York, NY, USA, 1993), UIST '93, Association for Computing Machinery, pp. 197–206. doi:10.1145/168642.168662.

[IHS02] Isenberg T., Halper N., Strothotte T.: Stylizing silhouettes at interactive rates: from silhouette edges to silhouette strokes. Computer Graphics Forum 21, 3 (2002), 249–258. doi:10.1111/1467-8659.00584.

[LFHK21] Liu D., Fisher M., Hertzmann A., Kalogerakis E.: Neural strokes: stylized line drawing of 3D shapes, October 2021.

[LGJLC05] Lewiner T., Gomes Jr J. D., Lopes H., Craizer M.: Curvature and torsion estimators based on parametric curve fitting. Computers & Graphics 29, 5 (2005), 641–655.

[LNHK20] Liu D., Nabail M., Hertzmann A., Kalogerakis E.: Neural contours: learning to draw lines from 3D shapes, June 2020.

[McG04] McGuire M.: Observations on silhouette sizes. Journal of Graphics Tools 9, 1 (2004), 1–12. doi:10.1080/10867651.2004.10487594.
Figure 14: (continued) Comparison of contour stroke quality among five approaches: four examples are presented for each approach, and from top to bottom are, respectively, our approach, Pencil+4, Line Art, Freestyle and Active Strokes. Dotted rectangles on the models mark zoom-in regions whose enlarged versions are placed around the models.
Figure 15: Stylization with texture mapping: three types of stroke patterns are depicted for the Lucy model.
[MH04] McGuire M., Hughes J. F.: Hardware-determined feature edges. In Proceedings of the 3rd International Symposium on Non-Photorealistic Animation and Rendering (2004), pp. 35–47.

[Mor66] Morton G. M.: A computer oriented geodetic data base and a new technique in file sequencing, 1966.

[ND04] Nienhaus M., Döllner J.: Sketchy drawings. In Proceedings of the 3rd International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa (Stellenbosch, South Africa, 2004), Association for Computing Machinery, pp. 73–81.

[NM00] Northrup J., Markosian L.: Artistic silhouettes: a hybrid approach. In Proceedings of the 1st International Symposium on Non-Photorealistic Animation and Rendering (2000), pp. 31–37.

[PSM∗13] Peciva J., Starka T., Milet T., Kobrtek J., Zemcik P.: Robust silhouette shadow volumes on contemporary hardware. In GraphiCon'2013 (2013), pp. 56–59.

[RC99] Raskar R., Cohen M.: Image precision silhouette edges. In Proceedings of the 1999 Symposium on Interactive 3D Graphics (Atlanta, Georgia, USA, 1999), Association for Computing Machinery, pp. 135–140.

[Sel03] Selinger P.: Potrace: a polygon-based tracing algorithm (2003). http://potrace.sourceforge.net/potrace.pdf.

[SHG∗] Sengupta S., Harris M., Garland M., et al.: Efficient parallel scan algorithms for GPUs.

[ST90] Saito T., Takahashi T.: Comprehensible rendering of 3-D shapes. SIGGRAPH Computer Graphics 24, 4 (1990), 197–206. doi:10.1145/97880.97901.

[WKS07] Wächter C., Keller A., Stich M.: Efficient and robust shadow volumes using hierarchical occlusion culling and geometry shaders, 2007.

[Wri90] Wright W. E.: Parallelization of Bresenham's line and circle algorithms. IEEE Computer Graphics and Applications 10, 5 (1990), 60–67.
Figure 16: Stylization with toon shading for two monsters: each row shows the final render (left) and stylized strokes (right).