A vector format for Flutter (and beyond)

2019-2021 - Ian Hickson - Flutter team - flutter.dev/go/vector-graphics

This document is publicly shared.

Please do not hesitate to add comments.

Introduction
Supporting a vector graphics format is a popular and long-standing request from the Flutter community [1].
There are no good formats available. This document discusses the requirements and priorities for
creating new formats, and ends with some straw-man proposals for a potential new industry-wide vector
graphics standard.

Table of contents
Introduction 1

Table of contents 1

Terminology and typographic conventions 6

Scope 6

Target audience 6

Use cases 6

Use cases deemed out of scope 7

User stories 7

User interface components (especially icons) 7

Featured graphics 7

Geographic maps 8

Existing formats 8

SVG 8

PDF 8

PostScript 9

Lottie 9

Open Font Format (OFF) 9

VectorDrawable 9

Rive 9

HVIF 9

[1] https://github.com/flutter/flutter/issues/1831
IconVG 10

Priorities 10

Optimizing for the authoring experience 10

Hand authoring of images 10

Tool-driven authoring of images (round-tripping in editors) 11

Optimizing for external constraints 11

Security 11

Accessibility 12

Indexability 12

Disk and network footprint (compressed size) 12

Flutter suitability 12

Backwards compatibility 13

Forwards compatibility 13

Optimizing for the renderer 13

Memory footprint 14

Footprint of the renderer itself 14

Speed of rendering the full image 14

Speed of rendering subsets of the image (cropping at rendering time) 14

Speed of rendering images at small sizes 15

Power requirements of rendering the image 15

Optimizing for quality of rendering 15

Hinting 15

Level of detail 15

Summary 🧚 15

Features 16

Metadata 16

Image dimensions 16

Baseline 16

Text 🚫 16

Text styling features 17

Span styling 17
Font fallback 18

Paragraph layout 18

Internationalization 18

Vertical text 18

Text along a path or along a mesh 18

Localization 18

Parameters ✅ 19

Constants 19

Predefined parameters 🚫 19

Animation 20

Multi-stage animations 🚫 20

Versatility of rendering at different sizes 20

Types of parameters 20

Expressions ✅ 21

Arithmetic 21

Conditional 22

Properties of values 22

Interpolation 22

Shapes 22

Types of paths 22

Fills ✅ 23

Strokes 🚫 23

Animations 24

Hairline strokes 24

Pixel snapping, shape alignment 🚫 24

Paints 24

Colors ✅ 24

Color spaces 24

High dynamic range 24

Gradients ✅ 25

Textures 🚫 25
Shaders 🚫 25

Transforms at the shape level ✅ 25

Group transforms, group opacity, effect layers, and clips 🚫 25

Injected widgets 🚫 26

Back references ✅ 26

Hit testing ✅ 26

Turing completeness 26

Clipping at the image edge 27

Design ideas 27

Culling 27

Designing for GPUs 28

Traditional GPUs 28

Animations 28

General-purpose GPUs 29

Future GPUs 29

Discussion 29

Ideas from other formats 30

Avoiding antialiasing seams 🚫 30

Multiple formats 30

Using fixed-point coordinates 🚫 30

Design decision summary 31

Evaluating existing designs 32

Evaluating new designs 32

Compatibility with authoring tools 32

Strawman designs 32

Icon VG 32

Open Font Format 32

A fixed-alignment binary format 32

Introduction 33

Conventions 36

Structure 37
Header block 37

Metadata blocks 38

Parameter blocks 38

Expression blocks 39

Matrix blocks 43

Shapes 44

Cubic Béziers 44

Rational Quadratic Béziers 44

Curve blocks 44

Shape blocks 46

Gradient blocks 47

Paint blocks 49

Flat color 50

Linear gradient 50

Radial gradient 51

Flags 51

Paint codes 51

Composition blocks 52

APIs 53

Updating parameters 53

Hit testing 53

Bounds introspection 53

Metadata APIs 53

Other APIs 54

Compressibility 54

Summary of internal references 54

Future extensions 55

New block types 55

Bitmap images and other attachments 55

Extending packed blocks 55

More parameters, expressions, and paints 55


More kinds of references 56

Terminology and typographic conventions


Terms in this document are generally used according to their usual meaning in the industry. The following
words in particular, however, are used herein to refer to concepts with specific meanings. To avoid
ambiguity where those terms may in other contexts sometimes be used to refer to other concepts, they
are defined explicitly here.

shape. a piece of geometry or other rendered component of the graphic, which could be as simple as a
circle or path, but could also be complex, such as a paragraph of styled text or a composite layer
(consisting of other shapes and instructions for blending them as a whole into the scene).

🚫 indicates a feature that is currently not planned for inclusion.


✅ indicates a feature that is intended to be supported.
🧚 indicates a decision (more subtle than just a "yes" or "no").
🤔 indicates a feature where more thought is required to make a determination.
Scope
Target audience
While Flutter's use cases (as described below) are motivating, a format will be more useful if it has broad
appeal. As such, it is the goal of this document to describe a format suitable for implementation in web
browsers and other user interface frameworks. A format will also be significantly more useful if it is
implemented as an export format in common vector graphic editors, so that should also be a goal.

Use cases
These are the use cases that have been discussed for which vector graphics would be useful in Flutter:

● Icons. Static images that may need to be recoloured dynamically (e.g. grayed out when
disabled), and may need to have varying levels of detail depending on the target render size.
Unlikely to have text.
● Animated icons. Could be two icons as described above, and a description of how to seamlessly
transition from one to another; or it could be a single icon that has a way to react to being
touched. The Material Design guidelines have a variety of examples [2]. These are unlikely to have
text.
● Backdrops. Images that may be much larger than the screen, maybe shown with parallax;
generally simple scenes with shapes, gradients, and shadows, unlikely to have text.
● Featured graphics. Images that form the foreground of the application, for example for tutorials,
or to report state or suggest the user perform some action. Typically animated. For example, an
image showing how to plug a peripheral into the host device. May have text.
● Skeuomorphic UI widgets. For example, for knobs or sliders. Some variants are unlikely to have
text (e.g. knobs), some are likely to have text (e.g. push buttons with labels integrated into the
picture).
● Skeuomorphic UI regions. For example, a tutorial for some machinery's control panel. This
would involve being able to hit-test on components. May have text.

[2] https://material.io/design/iconography/animated-icons.html

Anecdotally, a very common source of animations is Adobe After Effects.

Use cases deemed out of scope

The following use case has also been discussed, but is not something for which we are going to actively
optimize:

● Maps. Images that can be panned and zoomed several orders of magnitude, with varying detail
at different zoom levels. Likely to have text. This is considered out of scope because in practice
applications (or libraries) that show maps are likely to be focused on this topic enough that they
can afford to provide bespoke solutions.

User stories
Let us study some of these use cases in more detail.

User interface components (especially icons)

It is common for visual user interfaces to contain graphical components (indeed, that is pretty much all
that they contain other than text). These tend to fall into one of two categories: graphics that are drawn
by the application (or its framework) directly, such as window outlines, background colors and gradients,
even checkboxes or radio buttons; and more elaborate images as used in icons.

Icons can be represented by bitmaps, but this tends to fail on modern hardware if one is looking for
high-quality imagery as different devices have different device pixel ratios (also known as resolution or
pixel density). One could downsample from a much larger image, but this requires a large image
(affecting app size) and lots of memory or computation time during decoding. One could ship many
different versions, but this also ends up having an unfortunate impact on app size and not every possible
resolution can be provided (since the user can, on some systems, select an arbitrary value).
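
As a rough worked example of this trade-off, the sketch below estimates the decoded size of a square icon at several device pixel ratios. The 24-logical-pixel icon size, the 4 bytes per pixel (uncompressed RGBA), and the function name are assumptions chosen for illustration; they do not come from any measurement.

```python
def decoded_bytes(logical_size, device_pixel_ratio, bytes_per_pixel=4):
    """Bytes needed to hold a square bitmap once decoded into memory."""
    # Side length in hardware pixels for the given device pixel ratio.
    side = round(logical_size * device_pixel_ratio)
    return side * side * bytes_per_pixel


# Hypothetical 24x24 logical-pixel icon at a few common ratios.
for ratio in (1.0, 2.0, 3.0, 4.0):
    print(ratio, decoded_bytes(24, ratio))
```

The decoded cost grows with the square of the device pixel ratio, and a bitmap prepared for one ratio still has to be resampled for any ratio in between.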

Thus developers tend to gravitate towards vectors as a solution for showing icons. Vectors tend to be a
good choice since icons typically have clean lines and a low level of detail.

Developers tend to very quickly find that static icons are insufficient. Modern user experiences are rife
with transitions and animations. It is no longer sufficient for a "play" icon to change from a triangle to a
square when tapped; instead it must morph from one to the other. It is no longer sufficient for a dial to
merely rotate when its value is changed; instead the reflection and shadows represented in the image
must maintain a consistent illusion throughout.

As such, whatever format applications use needs to support some level of animation.

Featured graphics

Some applications are focused around specific images. For example, an education application could
show a diagram of an animal's anatomy or a photograph of a historical event.
These graphics tend to fall into two categories: the photorealistic, for which bitmaps are the only practical
solution, and the more abstract, such as diagrams, for which vectors are the preferred solution due to
their smaller size and resolution independence.

As with user interface components, one could use bitmaps for the diagram case. However, doing so
quickly runs into memory and disk consumption issues (these images are by necessity large
to be beautiful even at high pixel densities on large displays).

Geographic maps [3]

Applications that show geographic maps, such as Google Maps, operate in a vector space. Roads are
lines that intersect, labels are placed upon those lines, buildings may be represented by paths. However,
a vector image is insufficient to properly represent this data. The actual map data is not Cartesian (it's the
surface of a sphere, or potentially an even more complicated shape), the layers are too numerous (and
include petabytes of satellite image data), the logic for showing or hiding labels depends on features
such as the identity of the user seeing the map (e.g. consider "home" and "work" labels, or highlighting
recent destinations), not to mention the need to directly transition from 2D views to 3D views (zooming
out to see the whole planet, zooming in to see a 3D view of a building interior).

These features all result in a desire from map application creators to carefully control their rendering
surface and as such providing these features in a vector graphics format would not actually help their use
case.

Existing formats
The de facto standards for vector graphics are SVG [4] and PDF [5]. Most other formats are either proprietary [6],
or hail from a different era (e.g. designed in the 1990s) and thus not well-suited to modern needs.

SVG
SVG is really an application SDK that happens to include vector graphics (e.g. fully supporting SVG
involves supporting XML, JS, DOM, SMIL, HTML, audio playback, keyboard, mouse, and touch input,
form controls, HTTP submission, video conferencing, etc). Additionally, there is no clearly defined subset
of SVG to target if one only wants "vector graphics": the modern version of SVG Tiny involves supporting
JavaScript and video playback, while the original version of SVG Tiny requires an unusual subset of
features, for example it does not support gradients, but does support custom fonts.

PDF
Similarly, PDF has developed into an electronic document exchange format that happens to include
vector graphics (e.g. fully supporting PDF involves supporting multipage documents, form controls, video,
3D, digital signatures, etc).

[3] This section does not apply to simpler kinds of maps like floor plans and topological maps.
[4] Specification: https://www.w3.org/TR/SVG/
[5] Specification: https://www.iso.org/obp/ui/#iso:std:63534:en
[6] There are many such formats, for example Skia has the SKP format. Like other proprietary formats, it has limitations that make it unsuitable for this document's purposes.
PostScript
PDF is built on PostScript, which is also the basis of EPS. PostScript is a programming language
designed in 1984; vector graphics are the output of the program. EPS could be used as a vector graphics
format more easily than PDF; implementing support for EPS does not, for instance, involve supporting
digital signatures. However, PostScript, and thus EPS, are intended primarily for printing. A variant of
PostScript called Display PostScript (DPS) was designed to make PostScript more suitable for use in
user interfaces [7]. DPS is the most plausible existing format that could be used to address the use cases
described above. Even DPS, however, comes with significant baggage, for example a garbage collection
model, the ability to specify the halftone phase, and a specific set of fonts.

Lottie
There is one recent addition to this space that has gained some traction, namely Lottie [8]. This is a format
created by Airbnb specifically for the purpose of allowing Adobe After Effects assets to be rendered on
the Web, Android, iOS, and (increasingly) other platforms. It is based on JSON. There is currently no
first-class Lottie support in Flutter, although adding such support has been discussed [9] and there are
packages that allow Lottie to be used with Flutter in various ways.

Lottie suffers from being very specific to After Effects. Many of its design decisions are, in the abstract,
esoteric, and not what you would want from a format designed from the ground up. There's also no
specification for the format, currently [10].

Open Font Format (OFF)


OFF (also known as OpenType), as a font format, is effectively a vector graphics format. It may be one of
the vector graphics formats most used by the general population, in fact. Supporting Emoji has led to this
format adding support for color, and there are proposals to extend it to support gradients [11].

VectorDrawable
Android's Vector Drawable is spiritually a simpler version of SVG (XML-based, similar path data format).

Rive
Rive's format [12] is designed specifically for Flutter but is also proprietary and is largely undocumented [13].

HVIF
The Haiku open source project's vector format, HVIF, is optimized for icons. Notable features include a
level-of-detail system that is extensively used by Haiku's icon set. While it would be an interesting choice
for a subset of the use cases described above, some of the design choices limit its use. For example,
there is a small maximum number of styles and paths per file. It also seems to lack a formal specification.

[7] NeXT used DPS for its vector graphics. Mac OS X, now macOS, originally a fork of NeXT, switched to a subset of PDF for its vector graphics.
[8] https://airbnb.io/lottie/
[9] Usually in the context of exposing Skia's Skottie module.
[10] This may eventually change, q.v. https://github.com/lottie-animation-community
[11] https://github.com/googlefonts/colr-gradients-spec/blob/main/OFF_AMD2_WD.md
[12] https://rive.app/
[13] This may eventually change, q.v. https://help.rive.app/runtimes/advanced_topics/format
IconVG
On the simpler end of the spectrum, IconVG [14] is an experimental format that could address some of the
needs described in this document. Currently its focus is a little unclear [15]. Similar to HVIF, its design
places a small file size higher in the list of priorities than this document argues is appropriate (as
discussed below).

Priorities
In creating a new format, one must first decide what one is optimizing for. Here are some options, many
of which are, to some extent or another, mutually exclusive, and all of which are relevant to today's
market:

Optimizing for the authoring experience


Naturally, any format, to be useful, must be supported as an export format from major authoring tools
such as Adobe Illustrator [16]. Similarly, any successful format is going to need tools to convert between
it and today's widely used formats like SVG.

Hand authoring of images

One could imagine creating a graphics format optimized for hand-authoring. While graphics are usually
edited using a WYSIWYG graphical editor, hand authoring is useful when creating series of similar
images [17], or when creating diagrams or other images where precision is more important than aesthetics.

A hand-authored format is also easier to test than a binary format, since it is easier to create content for
such a format.

Such a format would be text-based, maybe based on XML, JSON, or some other commonly-understood
metalanguage, and would probably focus on features to allow styles to be reused, coordinates to be
given relative to other coordinates, and generally may support many ways to express the same core
concepts, such as having colors expressible either by name, or by decimal RGB values, or hex RGB
values.

In many ways, SVG fits this description [18]. It isn't clear what new value would be brought to the table by
creating a new format that is so close to an existing one.

Furthermore, if we build a format that is not optimized for authoring, it would be a simple matter to create
a hand-authoring-optimized variant of that format along with a tool that converts files from one format to
the other.

The reverse is not necessarily true: a format optimized for hand authoring may not be easily converted
into a format optimized for other concerns, such as low memory usage at render time or fast rendering.

[14] https://github.com/google/iconvg
[15] q.v. https://github.com/google/iconvg/issues/4#issuecomment-860649783
[16] It's tempting to list other authoring tools here, such as Inkscape or Affinity Designer, but the reality appears to be that this market has only one major player, with other vector graphic authoring tools having minimal usage in comparison. If export from Adobe Illustrator is supported, it is probably sufficient to ensure the format's viability from a designer perspective; on the other hand, even if ten other tools were to support export to this format, it may not be enough to matter.
[17] For example, the ability to hand-author SVG was key to creating the original set of consistent icons on the Flutter widget catalog page: https://flutter.dev/docs/reference/widgets
[18] That said, it doesn't appear that SVG's original design was intended to be optimized for hand-authoring, and hand-authoring SVG is not an overly pleasant experience. See also: https://www.w3.org/Graphics/SVG/WG/wiki/Secret_Origin_of_SVG

For example, a format optimized for low memory usage would probably avoid creating an object model
that can be manipulated by a script during the rendering; if we create a vector format that handles
animation by running a script each frame that can manipulate an object model, then it may be difficult to
faithfully "compile" it to a memory-efficient format [19].

For these reasons, we will not focus on a format optimized for hand authoring, though we will keep in
mind the ability to provide a corresponding hand-authoring format and a tool to convert between the two
formats. 🧚
Tool-driven authoring of images (round-tripping in editors)

Vector graphics editors can typically export to SVG or PDF, but even those that use SVG or PDF as their
native format require extensions to exactly represent their internal state, and these extensions vary from
editor to editor [20]. This is natural, as different editors have different UIs and thus different state. Indeed,
many vector graphic editors have a dedicated format that they use to represent their internal state [21].

This also means, however, that there is little sense in creating a general format for editors: each editor
has its own needs. 🧚
Optimizing for external constraints
If one does not optimize for authoring (whether by hand or by tool), one can optimize instead for aspects
of the file itself.

Security

A format can be optimized for security, meaning that it is designed to not contain features that are
insecure as designed, and meaning that it is unlikely for renderers to accidentally implement features in
an insecure way.

For example, a format not optimized for security could contain x86 code that is intended to run directly on
the CPU. This may be good for performance, but would be terrible for security.

A format not optimized for security could also do things like have two ways to mark the sizes of data
buffers, e.g. having a length field as well as a redundant end-of-block sentinel. This could lead to
renderers allocating a buffer using the length field but then writing data to the buffer until the end-of-block
sentinel is reached, allowing for buffer overrun situations when the file is maliciously crafted.
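
To make the hazard concrete, here is a minimal sketch of a reader that treats a single length field as the only source of truth and validates it before touching the payload. The block layout (a 4-byte little-endian length followed by the payload) and all names are assumptions made for illustration; nothing here is a layout specified by this document.

```python
import struct


class MalformedBlockError(ValueError):
    """Raised when a block claims more data than the file contains."""


def read_block(data: bytes, offset: int):
    """Read one length-prefixed block and return (payload, next_offset).

    Hypothetical layout: 4-byte little-endian length, then the payload.
    There is deliberately no end-of-block sentinel that could disagree
    with the length field.
    """
    if offset < 0 or offset + 4 > len(data):
        raise MalformedBlockError("truncated length field")
    (length,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    end = start + length
    # Validate the declared length against the bytes actually available.
    if end > len(data):
        raise MalformedBlockError("block length exceeds file size")
    return data[start:end], end
```

Because the size is stated exactly once and checked against the input, a maliciously crafted file can at worst trigger a clean error rather than a write past the end of a buffer.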

For obvious reasons, we should optimize for security.

The threat model used to evaluate this proposal assumes that graphics are represented as a single file,
which is entirely under the control of an attacker, and which is being rendered by software running with
the privileges of a potential victim user. All of the following are considered security flaws: exfiltrating data
back to the attacker, effecting a change to the system configuration or any user data, affecting the display
in any manner outside of the region to which the graphic image is being rendered (including violating
expectations regarding the order of painting, e.g. being able to paint over a window that itself should be
covering the image), consuming hardware resources disproportionate to the complexity of the image
being rendered.

[19] Tools that convert SVG to other formats suffer from this issue, and therefore uniformly only support a subset of SVG's features, though the precise subset varies from tool to tool and can be hard to precisely describe.
[20] For example, Inkscape uses proprietary extensions to SVG to describe editing state (see https://inkscape.org/learn/faq/#what-inkscape-svg-opposed-plain-svg), and Adobe Illustrator uses a variant of EPS.
[21] For example, Corel Draw has had a variety of file formats over the years, all proprietary.

Accessibility

A vector format could be optimized for accessibility above all else: for example, every shape could be
required to have a description, images could be required to have multiple palettes to handle color
blindness, and text in images could be required to be at a minimum font size.

In practice, we know from experience (e.g. with SVG, which has the <title> and <desc> elements to
describe any arbitrary shape) that authors typically do not make any attempt to make their images
accessible even when text description features are available [22], relying instead on the host environment to
provide accessibility affordances (e.g. alt="" attributes on <img> elements in HTML, when they point to
SVG images).

Flutter provides text description accessibility affordances for bitmap images already; we can rely on those
for vector graphics as well, and therefore ignore that issue for the vector graphics format itself.

The other issues are harder to ignore. Text size scaling may make sense, for instance, as might allowing
control over the colors to allow for adjustments for color blindness. That said, it's not critical for the format
itself to have this built in. While we may wish to allow it, these use cases could be equally handled by
merely providing multiple images. For this reason, this is probably a low priority for the format (while it
naturally remains a high priority for the platform as a whole).

Indexability

Indexability is the ability for search engines to find images. With modern search engines able to evaluate code, use ML
models to recognize images, and perform OCR to find text, it's hard to imagine a format that would make
this especially difficult, but it's worth considering.

One step in this direction might be to ensure that text is available in an easily accessible form, for
example having a string resources section if the overall format is binary.

Disk and network footprint (compressed size)

One obvious factor to optimize for is the size of the file.

In practice, even a format like SVG, for which very little attempt has been made to really optimize for
size, can describe images of relatively modest complexity with relatively modest sizes [23], and that's before
compression.

For this reason, it's not clear that optimizing for disk or network footprint first is especially valuable.
Naturally, once a focus is established, decisions can be made with a bias towards minimizing the
compressed size footprint.

[22] Even those who consider using these features to make SVG-based apps accessible often find it difficult. A full solution really requires combining ARIA and SVG and scripts dedicated to updating the ARIA attributes, but that's for "images" beyond the scope of this document (applications, really, for which we would propose using Flutter itself, not whatever format is designed for this document).
[23] Consider, e.g., the sample images for SVG provided by the W3C (warning, some sexist content): https://dev.w3.org/SVG/tools/svgweb/samples/svg-files/?C=S;O=D

Flutter suitability

We can optimize for use in Flutter, or we can make the format more general.

For example, we can design the format to fit the instantiateImageCodec API [24] rather than requiring
that vector graphics in this format use an entirely different codepath than bitmap images (as is currently
required for SVG in Flutter). We can use straight colors rather than using colors with a premultiplied
alpha channel.
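
For readers unfamiliar with the distinction, a straight (non-premultiplied) color stores its red, green, and blue channels independently of alpha, whereas a premultiplied color stores them already scaled by alpha. The following is a minimal sketch of converting between the two, assuming channels are floats in the range 0 to 1; the function names are illustrative only and are not part of any Flutter or Skia API.

```python
def premultiply(r, g, b, a):
    """Convert a straight RGBA color to premultiplied form."""
    return (r * a, g * a, b * a, a)


def unpremultiply(r, g, b, a):
    """Convert a premultiplied RGBA color back to straight form."""
    if a == 0.0:
        # Fully transparent: the original color is unrecoverable,
        # so transparent black is returned by convention here.
        return (0.0, 0.0, 0.0, 0.0)
    return (r / a, g / a, b / a, a)
```

Note that the conversion is lossy at zero alpha, which is one reason the choice of representation matters for a file format.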

In practice, there is a limit to how much we can really optimize for Flutter above other potential hosts of
vector graphics, because Flutter is really just a Dart binding of the Skia API. Anything we do to optimize
for Flutter is really optimizing for Skia, which is also used by, e.g., Android, Chrome and Firefox.

There's also the possibility that optimizing for Flutter may involve changing Flutter, e.g. if our vector
format supports adjusting parameters on the fly, or animation along multiple axes, etc, then targeting the
instantiateImageCodec API may not be desirable.

Overall, optimizing for Flutter is a logical choice with little likely downside.

Backwards compatibility

A format can be optimized for backwards compatibility, that is, the ability for a renderer of a later version
of the format to render images that were written for an earlier version of the format.

Lacking backwards compatibility is a non-starter. If we try to release a renderer that cannot render
existing files that are supposedly of the same format, people will describe that as a serious regression.
Therefore, this is key to any design.

In practice, backwards compatibility at the format level is easy to achieve. It merely requires that
revisions not involve renaming or renumbering features from earlier versions, or otherwise causing the
semantics of existing files to change.

Forwards compatibility

A format can also be optimized for forwards compatibility, that is, the ability to render images that use a
later version of the format using a renderer written for an earlier version of the format.

This is less critical than backwards compatibility.

Forwards compatibility is also easy to achieve. It requires defining error handling behavior (which should
be done anyway for security), and defining enough of the error handling to be non-fatal that extensions
(features in newer versions of the format) can be "smuggled" into files (from the perspective of older
renderers).
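
As an illustration of how non-fatal error handling lets newer features pass through older renderers, the sketch below walks a stream of tagged, length-prefixed blocks and skips any block type it does not recognize. The layout (a 1-byte type code, a 4-byte little-endian length, then the payload) and the type codes are hypothetical, chosen only for this example.

```python
import struct

# Hypothetical type codes understood by this "older" renderer.
KNOWN_BLOCK_TYPES = {1: "header", 2: "shape"}


def parse_blocks(data: bytes):
    """Return (recognized_blocks, skipped_type_codes) for a block stream."""
    recognized = []
    skipped = []
    offset = 0
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated block header")
        block_type, length = struct.unpack_from("<BI", data, offset)
        start = offset + 5
        end = start + length
        if end > len(data):
            raise ValueError("block length exceeds file size")
        if block_type in KNOWN_BLOCK_TYPES:
            recognized.append((KNOWN_BLOCK_TYPES[block_type], data[start:end]))
        else:
            # Unknown type: not an error. Step over it using its length.
            skipped.append(block_type)
        offset = end
    return recognized, skipped
```

Structural problems (a truncated header, a length that overruns the file) remain hard errors, while an unknown type code is merely skipped; that split is what allows extensions to be "smuggled" past an older renderer.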

It makes sense to design for forwards compatibility where this does not conflict with more important
priorities. (An example of where we would forego forward compatibility could be if we found a security
problem in the format itself, which required a change that older renderers could not handle.)

[24] https://master-api.flutter.dev/flutter/dart-ui/instantiateImageCodec.html
Optimizing for the renderer
We can choose between the usual time/space tradeoffs, with various variants.

Memory footprint

The first possible aspect to optimize for is memory usage during rendering. This comes in various forms,
including the steady-state cost once the image is loaded, the peak cost as the image is being parsed,
and the cost of merely loading the raw data into memory to parse it in the first place.

In practice, memory is limited but not exiguous in the environments that Flutter is used in. Even the
smallest of devices we might one day target (e.g. an Android Wear watch) have at least 512MB of RAM [25].
We generally consider a modestly bigger memory (or disk) footprint an acceptable price to pay for
improved performance [26].

Therefore, as with optimizing disk or network footprint, this is probably something that is best considered
as a secondary concern: something that we bias towards, but only after having first optimized for
something more important.

Footprint of the renderer itself

The design of the language impacts the code size of the renderer itself. For example, an SVG renderer
must include an XML parser, which is a not insignificant amount of code in its own right.

Flutter has footprint constraints in various environments (for example, there is a fervent desire for
Flutter's overhead on Android to be an order of magnitude smaller), so we should attempt to minimize the
disk footprint of the implementation of any vector format that we eventually hope will be implemented by
Flutter's runtime.

That said, Flutter will usually accept a greater renderer footprint if it allows greater rendering speed.
Processing cycles are much more scarce than disk and network bandwidth.

Speed of rendering the full image

Flutter optimizes heavily for rendering performance in other aspects of its design, because rendering
performance is one of the corollaries of our main value27. Optimizing for rendering performance in the
context of vector graphics is a logical continuation of this.

It is common for animated images to be used in large numbers28. We should make sure that any format
we design can handle animating many images simultaneously without skipping frames.

Speed of rendering subsets of the image (cropping at rendering time)

A specific aspect of rendering performance that we can optimize for, given the use cases described
above, is that of rendering subsets of an image, as in when the image is being cropped (e.g. due to it
being panned and zoomed).

25
Obviously not all of this would be accessible to the vector graphics renderer...
26
See also: A strategy for making judgements regarding space/time trade-offs (PUBLICLY SHARED)
27
"Build the best way to develop user interfaces", with the corollary being "The best way to develop user interfaces creates fast applications".
See: https://github.com/flutter/flutter/wiki/Values
28
For example, galleries of animated images.

As only a subset of the use cases require this feature, it makes sense to correspondingly prioritize this
aspect below some of the others. (Anecdotally, this does not seem like a widely-needed feature.)

Speed of rendering images at small sizes

The more images are on the screen, the smaller they typically are. If an image with a lot of complexity is
being drawn at a small size, we may be able to get away with only rendering the larger shapes -- for
example, if a shape is less than a tenth of a hardware pixel in size, then it isn't likely to really matter. This
can help with the total cost of rendering all the images, as adding more images (and correspondingly
shrinking them) could end up increasing the overall cost sub-linearly.

By carefully designing the format to allow us to skip small parts when they are so small that they don't
matter, we could reduce the cost of rendering images at small sizes. Anecdotally, this feature does seem
to have some use, especially in icons, where small details are omitted entirely at smaller sizes in order to
keep the icons looking simple and recognizable rather than cluttering them with detail that may be
desired at larger sizes to give images more texture.
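
As a minimal sketch of this kind of culling (in Python, purely for illustration), a renderer could drop any shape whose bounding box covers less than a threshold number of hardware pixels at the current scale. The `Shape` record, the `visible_shapes` helper, and the use of one tenth of a pixel as the default threshold are assumptions made for this example, not part of any proposed format.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    name: str
    width: float   # bounding-box width in image coordinate units
    height: float  # bounding-box height in image coordinate units


def visible_shapes(shapes, pixels_per_unit, min_pixels=0.1):
    """Keep shapes whose larger bounding-box side covers at least min_pixels hardware pixels."""
    return [
        s for s in shapes
        if max(s.width, s.height) * pixels_per_unit >= min_pixels
    ]


icon = [Shape("outline", 100.0, 100.0), Shape("rivet", 0.5, 0.5)]

# At full size both shapes are drawn; when the icon is shrunk, the rivet is skipped.
full_size = visible_shapes(icon, pixels_per_unit=1.0)
shrunk = visible_shapes(icon, pixels_per_unit=0.16)
```

A real format would presumably need the encoder to record shape sizes (or sort shapes by size) so that this test does not itself require walking every path.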

Power requirements of rendering the image

This is essentially the same as the speed of rendering the full image, especially in the context of multiple
parallel animations of many images (where optimizing for reducing the total incremental cost on each
frame is as important as the cost of rendering the image in the first place). On mobile devices battery
usage is sometimes the driving motivation behind the same optimizations that would be made for
improving the overall rendering speed.

Optimizing for quality of rendering


Hinting

Fonts have historically supported means to adjust glyph shapes based on size, a process known as
hinting29, to improve legibility (especially at small font sizes). Icons are often rasterized for use in
applications to enable designers to "touch up" the images in a way that they could not achieve with pure
vector graphics (even if the originals are vector graphics).

This suggests that the format could provide features for such adjustments. Such features vary from the
relatively straight-forward (such as pixel snapping) to the relatively complicated (e.g. allowing shape
positioning to be relative to other shapes, e.g. to force at least one device pixel to exist between two
shapes regardless of render size).

The trade-off here is between renderer complexity and rendering quality.

Level of detail

Similar to hinting, but more coarse, is the option to entirely omit sections at certain sizes. This is closely
related to features for optimizing the rendering speed at small sizes.

This would be simpler than the more elaborate hinting features. The trade-off here is between format
complexity and rendering quality.

29
See also https://en.wikipedia.org/wiki/Font_hinting

Summary 🧚
Based on the discussion above, our priorities for this format are, in order of importance:

1. Can be supported as an export format from authoring tools (most important).
2. Security.
3. Backwards compatibility.
4. Rendering speed of the full image (and power requirements).
5. Flutter suitability.
6. Forwards compatibility.
7. Rendering speed of the image when rendered at small sizes.
8. Rendering quality.
9. Rendering speed of subparts of the image.
10. Disk footprint of the renderer.
11. Disk footprint of the image.
12. Memory footprint.
13. Ability to create a corresponding hand-editable format.
14. Accessibility30 and indexability.

Features
To handle the use cases listed earlier in this document, there are some features that would be particularly
helpful. This section discusses possible features that we could include in the format beyond the obvious
ones such as "circle" or "fill path with color".

Metadata
Beyond the pixels, there is information that describes how the image can be used.

Image dimensions

Images typically have a width and height. This can be expressed in various ways:

● 4 values: minX, maxX, minY, maxY
● 2 values: width, height
● 1 value: aspect ratio

Giving just the aspect ratio means that images don't have an intrinsic size, which may or may not be a
good thing: intrinsic sizes in images are what cause images to "pop in" in incremental environments like
the web. Without intrinsic sizes, but only an intrinsic ratio, one dimension would need to be provided.
With neither, both dimensions would need to be provided (but then the aspect ratio might be lost, which
leads to poor rendering quality).
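
To make the three cases concrete, here is a small sketch of how a host might resolve the rendered size of an image that carries only an intrinsic aspect ratio. The function name `resolve_size` and its signature are hypothetical, and the ratio is assumed to mean width divided by height.

```python
def resolve_size(aspect_ratio, width=None, height=None):
    """Return (width, height) for an image that only has an intrinsic aspect ratio.

    aspect_ratio is assumed to be width / height.
    """
    if width is not None and height is not None:
        # Both dimensions were provided: the intrinsic ratio is ignored,
        # so the image may be distorted.
        return (width, height)
    if width is not None:
        return (width, width / aspect_ratio)
    if height is not None:
        return (height * aspect_ratio, height)
    raise ValueError("without an intrinsic size, at least one dimension must be provided")


from_width = resolve_size(2.0, width=100)
from_height = resolve_size(2.0, height=30)
```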

Baseline

Images are often embedded within text, in which case aligning the image in an aesthetically pleasing
manner is non-trivial. For example, the 🐑 symbol on this line is aligned neither with the bottom of the line
nor with the baseline of the line; it is aligned so that it sits pleasantly relative to the baseline.

30
As noted earlier, accessibility in the platform as a whole is critical. Accessibility being low on this list reflects that the needs can be met outside
the format as well, and supporting them inside the format would be beneficial only to the extent that it provides greater flexibility to designers.

To achieve this effect, the image needs to have an intrinsic baseline position. (In the case of the 🐑 symbol,
the picture is actually a character in a font, so baseline information comes from the font.)

Text 🚫
A common feature of graphics is embedded text. We could require that all text be vectorized before being
embedded. This would guarantee that the results are the same on all platforms, and would side-step the
need to deal with fonts, which are a serious source of difficulty with vector graphics (generally one does
not want to embed every font in every vector image, but if the font is not embedded there is the risk that it
isn't available at render time).

One reason to support text as a primitive shape, though, is that it would allow for reflowing of text when
rendering at different sizes (for example to honor font size scaling done for accessibility), without the
image having to contain all the precomputed shapes. It would also allow for text to be "late-bound",
provided as a parameter during rendering (see also the next few sections).

Looking at our use cases and the priorities listed above, the arguments pro and con are somewhat
limited. Assuming fonts are carried out of band anyway to show text in the app, an argument could be
made that including the text verbatim would result in much smaller images than if every glyph had to be
expressed as a path. On the other hand, if the font must be embedded then the footprint argument
swings the other way.

One could also argue that it would make hand-editing easier if text was supported (so that people didn't
have to find a way to vectorize text when hand-editing files).

The main argument against is the cost of implementation. Text is not a trivial problem31, and so many
features would need to be added to support it; for example, to style the text or spans of the text, as well
as those features listed in the following subsections. Even the simplest of text features presents a very
large implementation burden and the potential scope is even larger. Text rendering involves executing
code (fonts are Turing-complete), which has security implications. Text is out of reach of trivial
implementations, which could limit the potential reach of the format. Text may also be too complicated to
reasonably implement purely in GPUs (e.g. for a shader-based implementation filling paths is fine but
implementing hyphenation, line wrapping, the bidi algorithm, shaping, etc, may be a step too far).

Text styling features

There are a number of features that could be considered when implementing text in a vector graphics
format:

● Alignment to a side or to a center.
● Fonts and font selection; embedded fonts, system fonts, referencing fonts in remote resources
(e.g. over HTTP by URL).
● Font variants, font features.
● Font size.
● Paint style for text (e.g. color, gradients, blend modes...).
○ For consistency, the same styles as can be applied to any shape should apply to text.

In addition there are more complicated features, listed in the subsequent sections.

31
Much has been written on the complexities of text layout, shaping, et al. This blog post provides an interesting introduction to the topic.

Span styling

A line of text could be styled uniformly, or support could exist for styling subspans of the text with different
styles. This introduces new difficulties such as baseline alignment and text decoration spanning (e.g. do
underlines span across subspans or can they be turned off). Spanning itself can be described either as
overlapping regions or as a tree structure.

Font fallback

A subset of span styling is support for font fallback, where a glyph that is absent in one font is obtained
from another font during rendering. For example, text might use a basic Latin1-only font but include
Emojis and Fraktur mathematical symbols that are obtained from two other dedicated fonts. Support for
this is effectively a form of implicit span styling and suffers from many of the same complications.

Paragraph layout

Supporting text could mean supporting a single line of text, or supporting flowing text into multiple lines.
In the latter case there are a number of potential complications:

● Defining line breaking opportunities.
● Hyphenation.
● Justification.
● Line spacing (half-leading, struts, etc).
● Irregular wrapping shapes (flowing around an image).

Internationalization

If we support text, we must support Unicode, bidirectional text, labeling text as LTR vs RTL, aligning to
"start" and "end", providing the text's locale for font selection, and so forth. Flutter already bears the cost
of supporting this, so the impact on the implementation in Flutter would be small. The cost in the format
itself should also be relatively low. Unicode is the standard way of encoding text and is quite efficient; it
also supports expressing bidirectional text formatting. Labeling the overall direction takes one bit per text
shape, the text alignment a few more bits, and so forth.

Vertical text

Flutter explicitly does not support vertical text. We could support vertical text in this format, since many of
the constraints don't apply if there's no layout mechanism (presuming for instance that we decide to lay
shapes absolutely rather than computing their layout at runtime). This could also be a direction to expand
in later, should there be demand.

In general, even if we support text, we should probably initially not support vertical text, so as to minimize
the overall scope of the initial effort. The priorities described above argue for this too (minimizing the disk
footprint of the renderer).

Text along a path or along a mesh

Two common effects in formats that have a text primitive are drawing the text along an
irregular baseline (placing glyphs tangential to a path) and warping it to a mesh.

Localization

We could allow images to include tables of strings, and then have the strings be looked up based on a
locale parameter (see below).

With this feature, individual images may be bigger (containing text for every supported locale). Without
this feature, we would either need multiple images (one per locale), or need to make text
parameterizable.

Looking at the priority list described earlier, there is a push towards not supporting this feature in the
format itself but instead putting the burden on the application that uses the image (Flutter suitability
arguing to just rely on Flutter's existing mechanisms, and disk footprint of the image arguing against
tables of strings).

Combining localization and parameterization features (for example allowing numbers to be inserted into
text) would dramatically increase the cost of localization, since it would require supporting numeric, time,
and date formats, pluralization, and other localization features which are significantly more work than
merely picking a string from a table. Similarly, localization combined with paragraph-wrapping and
hyphenation suddenly extends the scope of both localization and wrapping to include locale-specific
hyphenation dictionaries.

Parameters ✅
Values within the image, such as coordinates for geometry, colors, the size of text (or maybe even the
contents of text), could be driven by input from outside the image.

Some of the parameters described below would be very useful for addressing some of the priorities and
use cases listed above (e.g. improving the rendering speed at small sizes, or animations). Once
parameters are supported in any form, supporting them in general need only be a minimal additional
cost. The precise extent to which they should be supported can be decided based on the constraints of
the details of the format when it is designed.

Constants

Some values are known when the image is generated (for example, the image's intrinsic dimensions).
These do not need to be exposed to the image, since they can be hard-coded into the file by the
generator.

Predefined parameters 🚫
Certain values that could impact the rendering and that are not known at the time the image is generated
include:

● The total width and height of the render surface in physical pixels (possibly an approximation32).
● The time according to the system clock.
● The user's preferred locale(s).
● The user's preferred font size scaling factor.
● The ambient text directionality (RTL vs LTR).

32
An approximation because this measurement cannot be exact if the image is transformed in any way beyond a scale transform.

Exposing these to all files may have downsides, however. For example, testing is harder if the file can
determine the time independent of the test. If, instead, all parameters must be explicitly passed in, then a
test would have full control over the output. Another example would be potential privacy implications: a
website that shows user-provided images would unwittingly allow a user to upload a file that always
matched the other users' locales (e.g. imagine a file that shows a different flag based on the locale),
which could be used in malicious ways.

Animation

For animated icons that transition from one state to another, or that react to a state, a parameter could be
provided from the host that drives a clock from 0.0 to 1.0. This would fit in well with the animation APIs in
Flutter already.

Multi-stage animations 🚫
A feature that isn't handled by merely having one parameter for the animation clock is an animation that
transitions between states that each themselves loop. For example, consider a vector graphic that
describes a train spinning on a loop of track with a switch that leads to a second loop. One clock is
required to describe the looping train, and if the train is also to be able to switch to the other loop, a
second parameter is needed to indicate which loop the train should be on. However, merely those two
parameters are insufficient to have the train remain on the first loop until it reaches the switch.

To achieve this kind of animation control, either the application would need to manage multiple
parameters, or the animation would need built-in logic to make decisions about its animations, or the
format would need some mechanism to support such animations. The solutions are not fantastically
attractive. The first requires artists to get engineers to write bespoke code for their animations. The
second would lead to Turing-completeness, which is discussed elsewhere in this document. The third
option would open the format to a potentially unlimited set of features to handle compound animations.

Versatility of rendering at different sizes

Being able to turn on or turn off certain shapes based on the zoom level would be very useful for several
of the use cases, most obviously icons, which are often shown at wildly varying sizes (e.g. on macOS
icons are rendered at sizes from 1024x1024 to 16x16 depending on the UI mode).

In practice, it's more than just "on" vs "off". When rescaling an icon, for instance, features in the image
that are turned off at one level should probably fade out rather than simply snapping out of existence.
This suggests that the detail level should be a parameter, possibly corresponding to some approximation
of the number of physical pixels per coordinate system pixel33, that can be used to drive the level-of-detail
feature. 🧚

There is a lot of prior art in this area, especially relating to fonts, which try to optimize shapes for different
sizes to maintain consistent stroke widths, improve contrast, and maximize legibility ("optical sizing"). In
some cases, entirely different glyphs are used at different sizes.
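
The "fade out rather than snap" behavior described above could be sketched as follows. This assumes the detail parameter is an approximate count of physical pixels per coordinate unit, and that each optional feature carries two hypothetical thresholds, `fade_start` and `fade_end`; none of these names come from an actual format.

```python
def detail_opacity(pixels_per_unit, fade_start, fade_end):
    """Opacity of an optional detail: 0.0 below fade_start, 1.0 above fade_end,
    and a linear ramp in between, so the detail fades instead of popping."""
    if fade_end <= fade_start:
        raise ValueError("fade_end must be greater than fade_start")
    t = (pixels_per_unit - fade_start) / (fade_end - fade_start)
    return max(0.0, min(1.0, t))


hidden = detail_opacity(0.1, fade_start=0.25, fade_end=0.75)
halfway = detail_opacity(0.5, fade_start=0.25, fade_end=0.75)
shown = detail_opacity(2.0, fade_start=0.25, fade_end=0.75)
```

A shape whose opacity evaluates to 0.0 can be skipped entirely, which ties this feature to the rendering-speed optimization for small sizes.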

33
An approximation because if the image is being rendered with a non-uniform scale or non-affine transform, the precise number of hardware
pixels per coordinate system pixel may vary based on the axis or location in the image.

Types of parameters

There's a variety of types of data in a vector graphic. Most of them are numeric, or can be trivially
interpreted as numeric values; some others have more elaborate representations.

● doubles, e.g. for coordinates, for sizes, stroke widths
● colors (8 bits per channel, more than 8 bits per channel)
● strings, e.g. for text being rendered in the image 🚫
● booleans
● various enums, or custom enums (or integers)
● transformation matrices
● points (offsets), sizes (i.e. pairs of doubles)
● rectangles (four doubles)
● paths and components of paths 🚫
● paints 🚫
● bitmap images 🚫
● locales 🚫

One can imagine allowing any of these to be used as parameters.

Expressions ✅
If one exposes parameters, as discussed above, then one quickly finds the need to derive values from
those parameters. For example, darkening a color, so that an icon can be colorized with a single
parameter, but still retain multiple shades. For some computations, workarounds could be found, e.g.
changing colors by blending the given color with transparent black or white. However, expressiveness is
increased if we allow for arbitrary expressions.
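
As an illustration of deriving several shades from a single color parameter, the sketch below blends the supplied color towards black or white. The helper names (`blend`, `icon_shades`), the 8-bit RGB tuples, and the 40% blend amount are all assumptions made for the example.

```python
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)


def blend(color, other, amount):
    """Blend an (r, g, b) color towards another; amount 0.0 keeps color, 1.0 gives other."""
    return tuple(round(c + (o - c) * amount) for c, o in zip(color, other))


def icon_shades(base):
    """Derive the shades an icon might need from one color parameter."""
    return {
        "shadow": blend(base, BLACK, 0.4),
        "base": base,
        "highlight": blend(base, WHITE, 0.4),
    }


half_dark = blend((200, 100, 50), BLACK, 0.5)
shades = icon_shades((200, 100, 50))
```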

To make this useful at all, some built-in operators and functions are necessary. Different types have
different needs. There is a question about how much to allow types to be converted between each other;
for example, should it be possible to take four doubles and create a color? Should it be possible to cast a
double to an integer, and an integer to an enum? Answering these questions will likely require a study of
the use cases and of available features in hardware (see the GPU section below). 🧚 The sections below
cover some of the possibilities for the various types discussed above.

There is a tradeoff to be made between expressiveness and complexity (and thus footprint) of the
implementations. Assuming a forward-compatible strategy is used, it is likely best to start with a minimal
set of features here and then extend them in response to market needs.

Arithmetic

Arithmetic would allow parameters to be used for controlling the positions and other details of shapes.
For example, having one shape move at twice the speed of another in an animation, or having three
shapes staggered one after the other in an animation.
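
Both examples reduce to simple arithmetic on a clock parameter. The following sketch assumes a host-provided clock running from 0.0 to 1.0; the function names and the `duration` fraction are hypothetical.

```python
def clamp01(value):
    return max(0.0, min(1.0, value))


def double_speed(clock):
    """Progress of a shape that moves at twice the speed of the clock."""
    return clamp01(2.0 * clock)


def staggered_progress(clock, index, count, duration=0.5):
    """Local 0.0..1.0 progress of shape number `index` out of `count`.

    Each shape animates for `duration` of the overall clock, and the start
    times are spread evenly so that the last shape finishes when clock is 1.0.
    """
    start = index * (1.0 - duration) / (count - 1) if count > 1 else 0.0
    return clamp01((clock - start) / duration)


# Halfway through, the first shape is done, the second is halfway, the third is starting.
midpoint = [staggered_progress(0.5, i, 3) for i in range(3)]
```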

Precisely how much to build in is unclear. Presumably the basics, addition and subtraction, multiplication
and division, are uncontroversial. Beyond this, however, a wide variety of operators and functions could
be provided, for example:

● exponentiation (powers, roots).
● logarithms, e.g. log base 2, log base e.
● trigonometry, e.g. sin, cos, tan.
● rounding, e.g. round, ceil, floor.

Some types are numeric in nature, but may need more operators. For example, colors may need bitwise
operators. We could also expose arithmetic on individual components of a color rather than the whole, or
conversion between RGB and HSL/HSV color spaces, or between degrees, turns, and radians.
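
For instance, if colors were packed as 32-bit 0xAARRGGBB integers (an assumption for this sketch; the document does not fix a color representation), bitwise operators would be enough to read or replace individual components:

```python
def channels(argb):
    """Split a 0xAARRGGBB integer into (alpha, red, green, blue)."""
    return (
        (argb >> 24) & 0xFF,
        (argb >> 16) & 0xFF,
        (argb >> 8) & 0xFF,
        argb & 0xFF,
    )


def with_alpha(argb, alpha):
    """Return the same color with its alpha component replaced."""
    return (argb & 0x00FFFFFF) | ((alpha & 0xFF) << 24)


parts = channels(0xFF336699)
translucent = with_alpha(0xFF336699, 0x80)
```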

Conditional

We could include a manner in which to optionally include a shape. We could also include a mechanism to
select between two values in an expression (as in the "?:" operator). In either case, we would be dealing
with booleans, and usually this implies needing a way to compare values to each other.

For numeric types, the basic operators (<, >, <=, >=, ==, !=) seem uncontroversial. For other types,
equality seems obviously valuable. However, it is easy to see more elaborate options, e.g. point-in-path,
pattern matching for strings, or measuring the "darkness" of a color (e.g. so that an image can
automatically adjust its colors to remain high-contrast regardless of parameter values).
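
One possible definition of "darkness" is the relative luminance used by the WCAG contrast-ratio formula. The sketch below applies it to pick whichever of two candidate foreground colors contrasts best with a background given as a parameter; the helper names and the choice of black and white as candidates are assumptions.

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB color: 0.0 for black, about 1.0 for white."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a, b):
    """WCAG-style contrast ratio between two colors (1.0 means no contrast)."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def pick_foreground(background, candidates=((0, 0, 0), (255, 255, 255))):
    """Choose the candidate with the highest contrast against the background."""
    return max(candidates, key=lambda c: contrast_ratio(background, c))


on_yellow = pick_foreground((255, 255, 0))
on_navy = pick_foreground((0, 0, 128))
```

Note that this requires exponentiation, which illustrates how quickly a seemingly simple comparison can pull in other operators.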

Properties of values

Some values are multidimensional and it makes sense to inspect different aspects of them. For example,
the red, green, and blue components of a color, or the length of a string, a particular property of a paint,
or the bounding box of a path.

Interpolation

A common pattern in Flutter code around animations is the "lerp" method (short for "linear interpolation").
Many types have defined interpolations. One could provide such a mechanism in expressions in this
format, e.g. to allow easily computing the color during a fade between two colors given in parameters.

One could go even further and define interpolation between shapes (especially between paths).

Shapes
A path primitive is fundamental to a vector format, as it allows for the ultimate flexibility in drawing vector
images. There is the issue of what path primitives to expose, and whether to support relative and
absolute coordinates. These issues depend on implementation details that will be discussed below.

There are other possible primitives. Text has been mentioned already. One could imagine providing
primitives similar to Canvas.drawAtlas and Canvas.drawPoints. Which primitives should be included in
the format depends on implementation details.

Types of paths

Shape paths can be described in a number of ways. It is common in formats like SVG to provide an
expressive vocabulary with arcs, straight lines, Bézier curves of various orders, etc, potentially with
different variants such as absolute coordinates, relative coordinates, and chaining curves (e.g. SVG's "T"
command).

One could also imagine a format that supports only a single command, if that path type is sufficiently
expressive (e.g. rational cubic Bézier curves).

The primary factors to consider for each particular type of curve and variants of that type are:

● How much data does this particular type of curve need?
● How expensive is it to implement?
● How expressive is it?
● How redundant is it with other types of curves?

In addition, one must consider the cumulative cost of each supported type.

For example, straight, axis-aligned lines require very little data, and are cheap to implement. On the other
hand, they are entirely redundant with arbitrary straight lines (those not necessarily axis-aligned), as well
as with Bézier curves (a Bézier curve can describe any straight line). So when deciding whether to
support a dedicated axis-aligned line and arbitrary straight line features, one must compare the
complexity of a format with N curve types, and one with N+1 curve types, and one with N+2 curve types.
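
The redundancy is easy to demonstrate: placing the two inner control points of a cubic Bézier curve one third and two thirds of the way along a segment reproduces that segment exactly. The helper names in this sketch are illustrative only.

```python
def lerp_point(a, b, t):
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def line_as_cubic(start, end):
    """Control points of a cubic Bézier curve that traces the straight line start..end."""
    return (start, lerp_point(start, end, 1 / 3), lerp_point(start, end, 2 / 3), end)


def cubic_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameter t using the Bernstein basis."""
    u = 1.0 - t
    weights = (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)
    points = (p0, p1, p2, p3)
    return (
        sum(w * p[0] for w, p in zip(weights, points)),
        sum(w * p[1] for w, p in zip(weights, points)),
    )


curve = line_as_cubic((0.0, 0.0), (9.0, 3.0))
x, y = cubic_point(*curve, 0.25)  # a quarter of the way along the line
```

The cost of the redundancy is visible too: the line needs two points, while the equivalent cubic stores four.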

In general, the following feedback is pretty compelling:

● Cubic Bézier curves are able to express most shapes, including straight lines, but they cannot
strictly express true circular arcs.
● There is a desire to be able to express true circular arcs and straight lines.
● Rational cubic Bézier curves (which could handle circles) are expensive to compute.
● Rational quadratic Bézier curves (which could handle circles) are less expensive.
● Cubic Bézier curves are less expensive.
● Minimizing the number of kinds of curves supported is desirable to minimize implementation
complexity.
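
To illustrate the claim about circular arcs: a rational quadratic Bézier curve with a middle weight of cos(45°) traces an exact quarter of the unit circle. This is a sketch for illustration, and the function name is hypothetical.

```python
import math


def rational_quadratic_point(p0, p1, p2, weight, t):
    """Evaluate a rational quadratic Bézier curve whose middle control point has the given weight."""
    u = 1.0 - t
    b0, b1, b2 = u * u, 2.0 * u * t * weight, t * t
    denominator = b0 + b1 + b2
    return (
        (b0 * p0[0] + b1 * p1[0] + b2 * p2[0]) / denominator,
        (b0 * p0[1] + b1 * p1[1] + b2 * p2[1]) / denominator,
    )


# Quarter circle from (1, 0) to (0, 1); the weight is cos(45 degrees).
WEIGHT = math.sqrt(0.5)
samples = [
    rational_quadratic_point((1.0, 0.0), (1.0, 1.0), (0.0, 1.0), WEIGHT, i / 8)
    for i in range(9)
]

# Every sample lies on the unit circle, up to floating-point rounding.
max_error = max(abs(math.hypot(x, y) - 1.0) for x, y in samples)
```

With a weight of 1.0 the same code evaluates an ordinary (non-rational) quadratic curve, which only approximates the arc.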

From these points, it follows that one could consider a format with only two kinds of curves (rational
quadratic Bézier curves and cubic Bézier curves). The main downside is that images with lots of straight
lines would be bigger and slower to render than if the format was optimized for straight lines. 🧚

Similarly, when considering relative coordinates vs absolute coordinates vs supporting both, one must
consider the redundancy of having both (with the commensurate implementation cost) as well as the
convenience of having both (e.g. making paths easier to handcraft). In practice, since hand-authoring this
format is not a priority, there seems little need for supporting redundant features like relative coordinates. 🧚

Fills ✅
Filling the inside of a path is probably the most basic feature of a vector format. How to describe the path
is an issue that will be discussed in more detail below, but it is worth noting in passing that a fill can be
described either by a sequential set of steps in a path, or an unordered set of path segments that,
together, describe an outline. This latter approach can allow for more parallelism in the implementation.

Strokes 🚫
Stroking a path is a common feature in vector formats but for static images it does not add more
expressiveness as any stroke can be converted to a fill of a more elaborate shape34. For example, the
stroke of a circle is the fill of two nested circles with the right winding rule.
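
The circle example can be sketched with an even-odd fill rule, where a point is painted if it lies inside an odd number of contours. The helper names are illustrative, and the even-odd rule is one assumption about what "the right winding rule" could be (a non-zero rule with oppositely wound circles would also work).

```python
import math


def stroke_as_fill(center, radius, stroke_width):
    """Replace a stroked circle with two nested circles to be filled together."""
    half = stroke_width / 2.0
    return [(center, radius + half), (center, radius - half)]


def filled_even_odd(point, circles):
    """Even-odd rule: painted if the point is inside an odd number of circles."""
    inside = sum(
        1 for (cx, cy), r in circles
        if math.hypot(point[0] - cx, point[1] - cy) < r
    )
    return inside % 2 == 1


ring = stroke_as_fill((0.0, 0.0), radius=10.0, stroke_width=2.0)
on_stroke = filled_even_odd((10.0, 0.0), ring)
in_hole = filled_even_odd((0.0, 0.0), ring)
outside = filled_even_odd((12.0, 0.0), ring)
```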

34
This is not strictly speaking true; a stroke may require a curve of higher order to be precisely represented. But in practice you can always
approximate it with sufficient precision for this to be true enough.

Strokes do require more implementation complexity than fills, however. Corners can have different joins,
and some joins may need miter limits specified; the ends of paths might need special caps; the precise
order of points in the path is important. All these additional complexities make it tempting to push the
problem of strokes to the encoder, thus simplifying the format and its implementations.

Animations

One use case that would suffer if strokes had to be pre-converted to fills by the encoder is animated
strokes, especially animated dashed strokes (e.g. a crawling ants effect). In some cases, animating a fill
by rotating a gradient's transform could achieve a similar effect but this is not a general solution,
especially for curved strokes.

Hairline strokes

A feature that cannot be implemented using fills alone is hairline strokes (a stroke that is exactly one
device pixel wide, regardless of the image size).

Pixel snapping, shape alignment 🚫
Shapes could be annotated to indicate that their coordinates (especially start and end coordinates)
should be snapped to device pixel boundaries.
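
In its simplest form, snapping rounds a coordinate to the nearest device pixel boundary. The sketch below assumes the renderer knows the device pixel ratio (device pixels per coordinate unit) and that the image origin is itself pixel-aligned; the function name is hypothetical.

```python
import math


def snap_to_device_pixel(coordinate, device_pixel_ratio):
    """Round a coordinate to the nearest device pixel boundary.

    math.floor(x + 0.5) is used instead of round() so that halfway cases
    always round up rather than to the nearest even number.
    """
    return math.floor(coordinate * device_pixel_ratio + 0.5) / device_pixel_ratio


snapped_2x = snap_to_device_pixel(10.3, 2.0)  # boundaries every 0.5 units
snapped_1x = snap_to_device_pixel(10.2, 1.0)  # boundaries every 1.0 units
```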

Shape coordinates could be defined relative to each other, e.g. starting a line at an offset to another
shape's coordinates, maybe with the offset being influenced by the pixel density.

These features could interact with level-of-detail features, or parameters that expose the device pixel
ratio, the physical image size, the absolute device image alignment offset, etc.

Paints
Shapes can be styled in various ways: solid colors, gradients, textures that are repeated with particular
transforms and generated from bitmaps or nested vector graphics, programmatically generated patterns,
blend modes, filters...

The precise set that a format should support is probably best determined by considering the use cases,
priorities, and implementation needs. For example, programmatically generated patterns would involve
embedding a programming language in the format; this is something we will probably want to avoid (see
"Turing completeness" below). Solid colors are very common in icons, so this is certainly something we
will want to include.

Colors ✅
The most obvious stylistic option is flat color. There are questions that would have to be answered even
here: is alpha supported, is it premultiplied, what is the color space, etc.

Color spaces

While a first version of a format might be able to get away with only supporting sRGB, subsequent
versions of the format will surely find the need to support more elaborate color spaces to take advantage
of the greater expressivity of newer display hardware.

High dynamic range

In addition to supporting broader color spaces than sRGB, a format may need to support a greater color
depth than 8 bits per channel: 48 bit color (16 bits per channel) is becoming more widely available today
and will surely become commonplace in the decades hence.

There is a question of the cost of reserving 8 bytes each time a color is expressed. A vector format will
rarely contain millions of uniquely specified colors (gradients result in many colors in the output but a
much smaller number appear in the input). This leaves open the possibility of having a palette that
describes 64 bit colors but using 32 bit, or even 16 bit, indices into that palette in the file itself.
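
A rough sketch of the palette idea, assuming four 16-bit channels per color and 16-bit little-endian indices (the byte layout and the function name are invented for this example):

```python
import struct


def encode_with_palette(colors):
    """Encode a sequence of (r, g, b, a) colors with 16-bit channels.

    Returns (palette_bytes, index_bytes): each unique color is stored once
    as 8 bytes, and each use of a color is stored as a 2-byte index.
    """
    palette = []
    lookup = {}
    indices = []
    for color in colors:
        if color not in lookup:
            lookup[color] = len(palette)
            palette.append(color)
        indices.append(lookup[color])
    if len(palette) > 0x10000:
        raise ValueError("too many unique colors for 16-bit indices")
    palette_bytes = b"".join(struct.pack("<4H", *color) for color in palette)
    index_bytes = struct.pack(f"<{len(indices)}H", *indices)
    return palette_bytes, index_bytes


# 1000 uses of two colors: 8000 bytes if stored inline, 2016 bytes with a palette.
uses = [(65535, 0, 0, 65535), (0, 0, 0, 65535)] * 500
palette_bytes, index_bytes = encode_with_palette(uses)
inline_size = len(uses) * 8
palette_size = len(palette_bytes) + len(index_bytes)
```

The saving disappears for images in which nearly every color is unique, where the palette adds two bytes per color on top of the eight already needed.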

Gradients ✅
In addition to a single flat color, one could provide the option of styling with more colors, in the form of a
gradient. There are many questions one would have to answer here: how many colors, what stop points
are they at, what is the interpolation function, are the gradients linear, radial, swept, or of some other
shape; can the gradients be transformed, what tiling mode do they use...

Textures 🚫
It is common for vector graphics to embed bitmap images.

This feature has several tradeoffs. Supporting this feature requires supporting a form of image reference,
either inlining a bitmap format, supporting arbitrary attachments, or allowing external references (e.g.
URLs). For example, SVG supports the latter35, while PDFs support inline bitmap images.

These features all come with complexity. For example, inlining another file requires defining an envelope
format, and augments the conformance requirements of the format to include the entirety of the
conformance space of the adopted bitmap format. Testing implementations for conformance expands to
include the entirety of the conformance testing of the adopted format. Implementations can end up
supporting different kinds of bitmap formats, which fragments the format's ecosystem.

If a format can be designed without support for these in the initial version without preventing future
support for such features, this would allow for the basic features of the format to be solidified before
having to attend to these additional complexities.

Shaders 🚫
Beyond gradients and textures, there is no limit to what could be provided: a static set of predefined
shaders (e.g. color filters, blurs), or even an open-ended space (e.g. inline SPIR-V code). These could be
provided inline in the file, or could be configurable at runtime.

Transforms at the shape level ✅
For some shapes (e.g. paths), transforms (and clips) can be baked in. However, it would allow for simpler
animations if transforms could be separately encoded and manipulated at the path level rather than
requiring each point to be manipulated during an animation.

35
Implementations typically support data: URLs too, enabling a super-inefficient inline encoding of bitmap data.
Group transforms, group opacity, effect layers, and clips 🚫
Applying effects at the shape level allows for many images to be expressed. However, it may be simpler
to reason about the image if groups of paths could be collected and treated as a unit, which could then
itself be transformed, clipped, or otherwise painted (e.g. blended). This also allows for certain effects that
are not otherwise possible to express, for example, applying a shadow to a group of differently-painted
shapes.

There are trade-offs involved in offering group effects, notably around performance, since each group
typically requires separate rasterization, and render target switching is expensive.

Injected widgets 🚫
If the vector graphic format is integrated tightly with Flutter's rendering pipeline, it becomes possible
(possibly even easy) to support injecting content from outside the vector graphic into the image. By
specifying a placeholder rectangle in the image, the renderer can be told to pause rendering the image,
call back out to Flutter's framework and ask for a Picture to be rendered at the given location, with the
given size. This is similar to how widgets can be embedded inline into text rendering in Flutter.

Painting a widget can involve pushing layers. For example, a TextureLayer for hardware-accelerated
video or for a Web view. To support these along with special blend modes in the vector graphic would
require deeply integrating the graphics rendering with Flutter's rendering pipeline. For embedding
widgets with inline text we avoid this complexity by only allowing widgets to be layered atop the text.

Back references ✅
For images that contain many copies of the same shapes, paints, or other effects, it may be useful to
offer a way to define such objects and allow them to be referenced later. For example, defining a
particular path that is then reused for a clip mask in one location and a stroke in another, or defining a
gradient paint that is then reused in multiple shapes.

This would especially help with the disk and memory footprints of the image.

Hit testing ✅
For some use cases, the ability to hit-test the image would be useful. It would be relatively
straightforward to provide a kind of shape that does not render but that has an identifier; the renderer
could then report all such shapes (or the topmost such shape) that intersect(s) a given point. This would
integrate well with Flutter's framework.
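
A minimal sketch of such a lookup, assuming hypothetical axis-aligned rectangular hit regions listed in paint order. The names HitRegion, hit_test and topmost_hit are illustrative only and are not part of any proposed format:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HitRegion:
    """A non-rendering rectangle with an identifier (hypothetical layout)."""
    identifier: int
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def hit_test(regions: List[HitRegion], x: float, y: float) -> List[int]:
    """Return the identifiers of all regions containing (x, y), topmost first.

    Regions are assumed to be listed bottom-to-top, i.e. in paint order.
    """
    return [r.identifier for r in reversed(regions) if r.contains(x, y)]


def topmost_hit(regions: List[HitRegion], x: float, y: float) -> Optional[int]:
    """Return only the topmost identifier, or None if nothing is hit."""
    hits = hit_test(regions, x, y)
    return hits[0] if hits else None


regions = [HitRegion(1, 0, 0, 48, 48), HitRegion(2, 10, 10, 20, 20)]
print(hit_test(regions, 15, 15))     # both regions, topmost first
print(topmost_hit(regions, 30, 30))  # only the larger region is hit
```

A real format would presumably allow arbitrary paths as hit regions; rectangles are used here only to keep the containment test short.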

Turing completeness
Some formats, notably PDF/EPS/DPS (via PostScript) and SVG (via JavaScript), are literally
Turing-complete, in that they can run programs (or in the case of PostScript, are programs) to compute
the graphical output.

There are advantages to such an approach, in particular, expressiveness. There are also disadvantages,
prime among which is that it makes a comprehensive static analysis of the image essentially impossible
(due to the halting problem). For some purposes, e.g. determining ahead of time what parts of the image
will be rendered based on the given inputs (see the "culling" section below), static analysis is very
important.

For this reason, it is probably valuable if this format eschews Turing completeness. 🧚
Depending on what features we introduce (especially around expressions), this may be tricky, and care will need to be
taken to keep from accidentally falling into this trap36.

Clipping at the image edge


It's usually assumed that images will be clipped at their edge, so shapes that extend beyond the edge of
the image are not drawn outside the image bounds. In principle one could require that encoders never go
outside the bounds, but this opens the door to some interesting security issues, e.g. images on web sites
that render over adjacent content from other security domains (origins).

In general, high-quality clips are expensive. A compromise requiring low-quality (non-antialiased) clips at
the image edge may be sufficient to address the security needs without a performance hit.

Design ideas
This section lists some design ideas that may or may not make it into the final proposal.

Culling
Shapes in the vector image could be stored so that those that are needed to render a subscene can be
quickly found.

There are multiple dimensions that are relevant:

● two or more dimensions to describe the region being drawn and the region covered by the shape
(in the simplest case, the region and shape can be described as axis-aligned bounding
rectangles, which only requires two dimensions).
● for variable detail, one dimension for the current level of detail to show (see discussion above).
● for images that depend on parameters that may themselves vary, e.g. a clock parameter, one
dimension per parameter.

If the detail level and other parameters are treated uniformly, then this simplifies to two dimensions for
the bounding box, plus one dimension per parameter. However, if these parameters are themselves
capable of being used to affect the geometry, the bounding box would have to be the bounding box over
every possible combination of values for the parameters, which may be prohibitively expensive to
compute.

Several data structures are candidates for this culling mechanism, including multidimensional interval
trees, and multidimensional R-trees. For geometry-based filtering in particular, the scene could be stored
in a data structure similar to a quad tree.
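
To make the dimensions concrete, here is a sketch of the query that such a data structure would accelerate, written as a plain linear scan. The record layout (a bounding box plus a level-of-detail interval) and all names are assumptions made for illustration; an interval tree or R-tree would answer the same query without visiting every shape.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # left, top, right, bottom


@dataclass(frozen=True)
class CullRecord:
    """Hypothetical per-shape culling entry."""
    shape_id: int
    bounds: Rect        # axis-aligned bounding rectangle of the shape
    min_detail: float   # lowest level of detail at which the shape is shown
    max_detail: float   # highest level of detail at which the shape is shown


def rects_overlap(a: Rect, b: Rect) -> bool:
    """True if the two axis-aligned rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def visible_shapes(records: List[CullRecord], region: Rect, detail: float) -> List[int]:
    """Return the ids of shapes needed to draw `region` at `detail`."""
    return [
        r.shape_id
        for r in records
        if r.min_detail <= detail <= r.max_detail and rects_overlap(r.bounds, region)
    ]


records = [
    CullRecord(0, (0, 0, 48, 48), 0.0, 1.0),    # backdrop, always shown
    CullRecord(1, (4, 4, 12, 12), 0.5, 1.0),    # fine detail only
    CullRecord(2, (30, 30, 40, 40), 0.0, 1.0),  # outside the region queried below
]
print(visible_shapes(records, (0, 0, 16, 16), 0.25))
print(visible_shapes(records, (0, 0, 16, 16), 0.75))
```

Each additional runtime parameter would add one more interval to the record and one more comparison to the filter.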

36
Humans have a long history of accidentally making things Turing complete, e.g. with C++ templates and other type systems, various card
games and video games, even the x86 MOV instruction alone.

The tradeoffs involved here:

● This would likely improve performance for complex scenes that use parameters or that are
rendered cropped. Without a culling algorithm, the entire scene has to be processed in every
frame, regardless of how much of the scene is to be rendered.
● Implementation complexity of the renderer is increased somewhat.
● Implementation complexity of the encoder is increased; the magnitude of the increase depends
on the design. (Generally, creating balanced trees is more complicated than querying
pre-balanced trees, so the cost on the generation side is likely higher than the cost on the
renderer side.)
● Disk footprint of the image is increased since it has to hold these tables.
● Depending on the design, there is the potential for redundancy in the format (e.g. if the shape is
of a size that exceeds the size implied by its position in the quad tree). This would be a source of
bugs.
● Memory footprint for rendering is increased, as these tables must be kept in memory. This could
be somewhat mitigated by careful design of the on-disk format so that it can be efficiently
processed in its raw form.

In practice, the burden for implementation here is primarily on the generator, in determining what shapes
are visible for what combinations of parameters. That said, the degenerate case where the generator
assumes every shape is always available would function correctly; it would just be less efficient.

Conclusion: This feature should be included in the format if a data structure can be found that has a
reasonable level of implementation complexity. 🧚
Designing for GPUs37
One question to be asked is how much can the language be optimized for implementation using GPUs
(e.g. using shaders)? Maximizing the level to which specialized hardware is used to render the vector
graphics, moving as much work as possible from the CPU to other hardware like the GPU, is in line with
our desire to prioritize rendering speed.

Traditional GPUs

In practice, to truly optimize a format for the traditional GPU hardware, one would need to consider a very
basic format, primarily focused around drawing triangles (something equivalent to different calls to
"Canvas.drawVertices" in the Flutter API). As soon as the format is in any way more complicated than
that, the implementation on traditional GPUs becomes non-trivial and there is little sense in trying to
optimize for the ability to implement it efficiently in hardware.

Animations

At a higher level, there are format decisions that can be made that can dramatically help with
performance. The main one is being able to report ahead of time if a particular shape or set of shapes
will be changing, vs whether it will remain static. If a shape remains static, a higher up-front cost to
"compile" it into a form that is more efficient to paint will be worth paying, as the initial cost is amortized
over subsequent frames (to put it another way, "GPUs like things that never change"). On the other hand,
if a shape will change dynamically every frame, it is more efficient to use more expensive paint
operations to paint the shape, but avoid the much more expensive cost to set up the drawing in the first
place.

37
This section is based on discussions with the Skia and Spinel teams.
For example, consider drawing a circle38. There are two approaches one could take on a traditional GPU.
The first is to create a shader that solves the equation of the ellipse for each pixel, to determine if the
pixel should be rendered as opaque or transparent. This approach is expensive on a per-pixel basis, and
has a constant cost; subsequent frames will cost the same to render as the first frame. The price is
almost entirely borne by the GPU. The second approach is to convert the circle into a batch of triangles.
This has a high upfront cost (borne by the CPU), but actually painting the circle is absurdly fast as it
leverages the GPU's innate affinity to drawing triangles. If the circle is drawn twice, the second time will
be very quick, much quicker than the approach with the shader and the equation of the ellipse. On the
other hand, if the circle changes radius every frame, then the triangle approach would be much more
expensive as it would need to be converted to triangles afresh every frame.
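
The two approaches can be mimicked on the CPU to show where the work goes. This is only an illustration in Python with hypothetical function names; a real renderer would run the first as a fragment shader and hand the output of the second to the GPU as a vertex buffer.

```python
import math


def covers_pixel(px, py, cx, cy, radius):
    """Approach 1: evaluate the circle equation at one pixel centre.

    This has to be repeated for every pixel on every frame.
    """
    return (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2


def tessellate_circle(cx, cy, radius, segments=32):
    """Approach 2: approximate the circle once as a fan of triangles.

    The result can be cached and redrawn for as long as the radius is unchanged.
    """
    rim = [
        (cx + radius * math.cos(2 * math.pi * i / segments),
         cy + radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]
    return [((cx, cy), rim[i], rim[(i + 1) % segments]) for i in range(segments)]


def triangle_area(a, b, c):
    """Area of a triangle from the cross product of two of its edges."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


triangles = tessellate_circle(0.0, 0.0, 10.0, segments=64)
approx_area = sum(triangle_area(*t) for t in triangles)
print(len(triangles), approx_area, math.pi * 10.0 ** 2)
```

If the radius changes every frame, tessellate_circle has to be rerun every frame, which is the case in which the per-pixel approach becomes the cheaper one.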

An approach that can help with performance for animations in particular is a mutable scene graph, where
animations result in updates to the scene graph rather than an entirely new display list each frame. The
update approach allows a rendering engine to re-use substantial portions of the computation from
previous frames. For this reason, we should ensure that any format we develop is designed to support
being implemented as a scene graph.

General-purpose GPUs

For more modern GPUs (General Purpose GPUs, supporting Vulkan), an approach based on rendering
paths in bulk with unordered segments is dramatically more efficient39. For such an approach to be
maximally effective, path data must be provided in a form that can be consumed by a shader efficiently,
rather than being expressed as an imperative set of operations. For example, rather than passing the set
of commands "moveTo x0,y0, lineTo x1,y1, lineTo x2,y2, lineTo x3,y3, close", one might pass an array of
path segments of type "line" consisting of "x0,y0,x1,y1;x0,y0,x3,y3;x1,y1,x2,y2;x2,y2,x3,y3". In such an
approach, strokes are not a supported primitive; instead, strokes would be pre-converted to fills. Taking
this further, one can imagine bulk-uploading not just paths, but also transforms, style information, and
blend information (each being a separate step in a parallel pipeline for rendering the paths).
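
As a sketch of that conversion, the following turns an imperative command list into a flat list of line segments. The command tuples and the function name are hypothetical. This sketch keeps each segment in its direction of travel and emits segments in command order, although the point of the approach is that the consumer does not depend on that order.

```python
def commands_to_segments(commands):
    """Convert moveTo/lineTo/close commands into (x0, y0, x1, y1) line segments."""
    segments = []
    start = None    # first point of the current subpath
    current = None  # current pen position
    for command in commands:
        op = command[0]
        if op == "moveTo":
            start = current = (command[1], command[2])
        elif op == "lineTo":
            if current is None:
                raise ValueError("lineTo before moveTo")
            point = (command[1], command[2])
            segments.append((*current, *point))
            current = point
        elif op == "close":
            if current is None:
                raise ValueError("close before moveTo")
            if current != start:
                segments.append((*current, *start))
            current = start
        else:
            raise ValueError(f"unsupported command: {op}")
    return segments


square = [
    ("moveTo", 0, 0),
    ("lineTo", 10, 0),
    ("lineTo", 10, 10),
    ("lineTo", 0, 10),
    ("close",),
]
print(commands_to_segments(square))
```

Because every segment is self-contained, the resulting list can be shuffled, split across workers, or uploaded in bulk without changing the filled region.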

Future GPUs

Looking forward to hardware that could be expected in coming decades, the highest importance is to
enable parallelism, and thus avoid features that are inherently ordered in their processing. For example,
a fill style that was defined as an iterative function where a user-provided expression was computed for
each pixel based on the result of that expression applied to the previous pixel would be a worst-case
scenario: only one pixel can be computed at a time. A fill style that was defined as a user-provided
expression whose parameters are only the coordinates of the pixel would, on the other hand, allow every
pixel to be computed in parallel.

Discussion

Another minor factor is how much effort is needed to convert the data in the file into a form usable by the
rendering logic. A format that must be converted into data by mapping commands in the file to an
imperative immediate-mode drawing API may not achieve as fast a rendering performance as one where
the format can be mapped directly into data structures that can be used to drive the rendering. One way
to achieve this would be to provide path descriptions in one part of the file, paint (styling) descriptions in
another part of the file, and so forth, with those sections quickly parsed (or even directly mapped) into
corresponding data structures.

38
This is a vastly simplified example that is intended to convey the general truth rather than conveying an accurate reality. In practice, there are
many ways to draw circles, and the details may vary from GPU to GPU, from circle to circle, and over time as new algorithms are discovered.
39
See also GPU-Centered Font Rendering Directly from Glyph Outlines (Lengyel 2017), Resolution Independent Curve Rendering using
Programmable Graphics Hardware (Loop, Blinn; 2005), and piet-gpu.

Another approach is to avoid features that indirectly cause changes to the scene graph where significant
work must be done to determine what will change. (An example of such a feature is SVG's use of CSS
rules and inheritance, where many changes (e.g. to one element's attributes, or even mouse movements
via ":hover" rules) can have knock-on effects on other parts of the scene graph -- computing what is
affected by any particular change, and computing what might be affected by any hypothetical future
change, is non-trivial.)

There are specific features that can be expensive, especially on traditional GPUs. Blurs and some other
image filter effects are one obvious example (blurs in particular are expensive because they require
multiple passes and are not a good fit for implementation on the GPU). Compositing layers is another
(equivalent to "saveLayer" in the Flutter API).

In general, any time the GPU has to switch configurations there is a cost; the render target switch of
compositing layers is merely the most expensive example. Another would be alternating between
drawing rectangles and drawing text. This particular cost can be avoided in many cases by reordering
draw operations so that similar operations are done together (drawing multiple rectangles then drawing
multiple segments of text).

Ideas from other formats


Avoiding antialiasing seams 🚫
Consider two adjacent rectangles of one color, composited over another solid color. At the point where
the two shapes touch, there is typically a seam because the colors of the rectangles are anti-aliased with
the background.

Paths in Adobe Flash could be given a separate "left side color" and "right side color", so that two
adjacent shapes being composited over a shape of another color could be antialiased without leaving
this seam40. Essentially, the "outside color" would override the background color when computing the
antialiasing of the shape's edge.
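
The seam can be reproduced with a few lines of arithmetic. The sketch below assumes the usual model in which antialiasing treats fractional pixel coverage as alpha and composites source over destination; it looks at a single pixel through which the shared edge passes, so that each of the two white shapes covers half of it. The last line is one reading of the Flash approach, not a description of its actual implementation.

```python
def composite(dst, src, coverage):
    """Source-over compositing of one channel, using coverage as alpha."""
    return src * coverage + dst * (1.0 - coverage)


background = 0.0  # black
shape = 1.0       # white

# Each shape is antialiased against whatever is already in the pixel.
pixel = composite(background, shape, 0.5)  # first shape:  0.5
pixel = composite(pixel, shape, 0.5)       # second shape: 0.75
print(pixel)  # 0.75, not 1.0: a quarter of the background shows through

# With an "outside color" equal to the neighbouring shape's color, the edge
# is blended against that color instead of against the background.
seamless = composite(shape, shape, 0.5)
print(seamless)  # 1.0
```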

Multiple formats
It is possible that rather than solving all the use cases listed above with one format, we should consider
different formats. For example, one for static images and one for animated images; or one for small
images (icons, skeuomorphic widgets) and one for large images (backdrops, skeuomorphic screens); or
one for simple graphics and one for graphics with parameters and hit testing.

The tradeoff here is on the implementation complexity front and on the matter of how easy it would be for
us to convince people to adopt multiple formats rather than one.

Using fixed-point coordinates 🚫


Most graphics formats use floating-point coordinates. There are benefits to this; for example, it allows for
arbitrary levels of detail; a map could be expressed using a coordinate system in kilometers and yet still
allow the user to zoom into the image and show a virus in detail at the true scale, while simultaneously
allowing the user to zoom out of the image and show the entire solar system in the same image also at
true scale.

40
More or less. This is documented in the SWF spec on pages 128-129, and in some blog posts.

However, our use cases explicitly exclude maps, and the use cases that are listed really
do not need this level of expressivity.

Floating point numbers have some issues.

Most ARM GPUs today don't support 64 bit floats, so we would have to consider using 32 bit floats or
requiring some preprocessing for today's slowest hardware. (In contrast, 64 bit integers, even where not
supported, could be implemented relatively easily in software using instructions such as ADDC.)

Errors tend to creep into arithmetic involving floating point numbers in unintuitive ways. If we take an
approach where fill paths are expressed using disjoint path segments, then code that attempts to
correlate points may find that the encoders failed to compute the coordinates consistently and that the
path does not in fact exactly line up in the least-significant-bits. This problem is so pervasive in Flutter's
layout code that Flutter allows floating point numbers to be considered equal even if they are only mostly
equal. (In contrast, integers do not have this issue.)

This all suggests considering using integers, potentially combined with some defined or dynamic scale
factor, as the basic data type for expressing coordinates. On the other hand, graphics are commonly
done using floating point, and forcing all computation to be done in the integer domain may be sufficiently
non-idiomatic to be worth avoiding.
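
A small demonstration of the difference follows. Python floats are 64-bit, but the same effect occurs at 32 bits. The scale factor of 1000 units per pixel and the helper name are arbitrary assumptions made for the example.

```python
import math

# Binary floating point: the sum carries a rounding error.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: the "mostly equal" workaround

# Integer coordinates with a fixed scale factor (hypothetical: 1000 units per pixel).
SCALE = 1000


def to_fixed(value: float) -> int:
    """Convert a coordinate to integer units, rounding once at encode time."""
    return round(value * SCALE)


print(to_fixed(0.1) + to_fixed(0.2) == to_fixed(0.3))  # True
```

With integers the rounding happens once, when the encoder writes the coordinate; after that, two segments that are meant to share an endpoint either match exactly or do not.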

Design decision summary


● Format will be optimized for machine readability, not hand-authoring.
● A human-optimized format will exist that can be compiled to the machine format.
● A tool will be created to convert from human format to the machine format.
● The format will not be optimized for editors.
● The format will not be Turing complete41.
● Design will prioritize concerns as follows:
1. Security (most important).
2. Backwards compatibility.
3. Rendering speed of the full image (and power requirements).
4. Flutter suitability.
5. Forwards compatibility.
6. Rendering speed of the image when rendered at small sizes.
7. Rendering quality.
8. Rendering speed of subparts of the image.
9. Disk footprint of the renderer.
10. Disk footprint of the image.
11. Memory footprint.
12. Ability to create a corresponding hand-editable format.
13. Accessibility and indexability.
● Features will include only:
○ parameters (allowing runtime manipulation of numbers)
■ only explicitly defined parameters, no implicit parameters
■ only parameters expressible as numeric values
■ usable as replacements for colors/gradients, coordinates
○ expressions to manipulate parameters
■ operators limited to what is commonly available in GPUs
○ filling shapes (not strokes, not text, not textures) with specific styles
■ paths described using only cubic Béziers.
■ styles described using only:
● flat straight colors
● gradients
○ hit testing
○ parameter-driven image composition (e.g. including or excluding shapes based on level of
detail)
● The format will be designed so that primitives (e.g. shapes) can be referenced multiple times.

41
Taking bets on how long it'll take to accidentally violate this design goal. Anyone? Anyone?

Evaluating existing designs


The focus on rendering speed as a very high priority pulls away from formats such as SVG and
VectorDrawable, which require, at a minimum, an XML parser, and to a lesser degree formats like Lottie,
which require a JSON parser. For optimal performance one is pushed towards binary formats and,
ideally, formats that can be interpreted and rendered natively in GPU hardware (e.g. using compute
shaders). Parsing text formats does not lend itself to this implementation strategy.

Evaluating new designs


Compatibility with authoring tools

As part of reviewing new designs (such as those below), we should consider how well the proposed
formats fit in with existing tools. For example, verifying that gradients are defined in a manner compatible
with the conventions used in Adobe Illustrator or SVG.

Strawman designs
If you have any proposals, please do not hesitate to describe them here. Proposals should have sample
implementations and sample images (ideally derived from the sample images of existing formats, so that
they can be compared more easily), as well as documentation describing the format.

Icon VG
Developed in the google/iconvg GitHub repository, IconVG is a vector graphics format whose design
constraints differ from those described in this file, but which addresses a similar set of needs.

Primary designer: Nigel Tao

Dart implementation: https://github.com/google/iconvg/tree/main/src/dart

Test images: https://github.com/google/iconvg/tree/main/test/data


Open Font Format
One hypothetical proposal could be to extract the vector graphics parts of the OFF into an independent
format.

A fixed-alignment binary format


This section describes a file format known here as Web Vector Graphics, or WVG.

This is presented as a proof of concept, not a formal proposal. It is intended to encourage a
review of the priorities presented earlier in this document, to verify that the specified features are
indeed a suitable set of features and that none of the omitted features are important enough to
warrant reconsideration.

Primary designer: Ian Hickson

Dart implementation: https://github.com/google/ui-exp-dg/blob/e9416c/wvg/rendering/lib/wvg.dart

Test images: https://github.com/google/ui-exp-dg/tree/e9416c/wvg/handcrafting/samples

Introduction

This section is non-normative.

The WVG format is a binary vector graphics format.

While it is intentionally quite extensible and therefore could host many more features in the future,
currently this format supports only painting stacks of paths, each one painted by filling it by either a solid
color, a linear gradient, or a radial gradient.

A path is described as one or more shapes. Shapes consist of one or more curves, and curves are either
cubic Béziers or rational quadratic Béziers.

The format consists of blocks of 64 words, and every word is 32 bits. There are various block types, such
as matrix blocks, curve blocks, or gradient blocks. Data in these blocks is aligned in a regular fashion; for
example, a matrix block consists of four sets of 16 words giving the 16 values of a 4x4 matrix. Blocks
have no framing.
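
Because blocks have a fixed size and no framing, splitting a file into blocks is pure arithmetic. The sketch below assumes little-endian words; this byte order is an assumption, chosen because it makes the signature word 0x0a475657 appear in the file as the bytes "WVG" followed by a newline. The function name is hypothetical.

```python
import struct

WORDS_PER_BLOCK = 64
BYTES_PER_WORD = 4
BLOCK_SIZE = WORDS_PER_BLOCK * BYTES_PER_WORD  # 256 bytes


def split_into_blocks(data: bytes):
    """Split raw file bytes into blocks, each a list of 64 unsigned 32-bit words."""
    if len(data) % BLOCK_SIZE != 0:
        raise ValueError("file length is not a whole number of blocks")
    return [
        list(struct.unpack_from("<64I", data, offset))  # "<" = little-endian (assumed)
        for offset in range(0, len(data), BLOCK_SIZE)
    ]


# A tiny synthetic file: a block starting with the signature, then an all-zero block.
data = struct.pack("<64I", 0x0A475657, *([0] * 63)) + bytes(BLOCK_SIZE)
blocks = split_into_blocks(data)
print(len(blocks), hex(blocks[0][0]), data[:4])
```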

Here is a sample file:


0: 0a475657 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000006 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1: 42400000 42400000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2: 000000ff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3: 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 41c00000 40800000 00000000 3f800000
3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 41d00000 42080000 00000000 3f800000
3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 41d00000 41900000 00000000 3f800000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4: c1a00000 00000000 41a00000 00000000 c0800000 c0800000 00000000 00000000 c0800000 c0800000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
5: 41a00000 42200000 41a00000 00000000 00000000 c1400000 c1400000 00000000 00000000 c0800000 c0800000 00000000 ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
6: c130cccd c1a00000 4130cccd 41a00000 00000000 c0800000 c0800000 00000000 00000000 c0800000 c0800000 00000000 ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
7: 00000000 41f86666 42200000 410f3333 00000000 00000000 c1400000 c1400000 00000000 00000000 c0800000 c0800000 ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
8: c1a00000 c130cccd 41a00000 4130cccd c0800000 c0800000 00000000 00000000 c0800000 c0800000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
9: 410f3333 42200000 41f86666 00000000 00000000 c1400000 c1400000 00000000 00000000 c0800000 c0800000 00000000 ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
10: 00000000 00000000 00000004 00000006 00000000 00000004 00000004 00000006 00000000 00000008 00000004 00000006 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
11: 00000000 00000000 00000002 ffd00000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

This file has 12 blocks (numbered along the left margin). The first block is the header, and specifies how
many blocks of each type are in the file:

0: 0a475657 00000001 00000000 00000000 00000000 00000000 00000000...

The first word is the signature. The second word says that there is one block of type 0 (metadata blocks),
the next few words are all zero indicating that there are no blocks of type 1, 2, 3, etc.

Examining the first block carefully indicates that there are the following blocks:

● 1 header block (not indicated in the header itself)
● 1 block of type 0
● 1 block of type 7
● 1 block of type 23
● 6 blocks of type 31
● 1 block of type 35
● 1 block of type 55

This adds to a total of 12 blocks, as expected.
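
The same tally can be made mechanically. The sketch below transcribes the non-zero words of block 0 from the sample file and assumes, consistent with the description above, that the count for block type N is stored in word N+1 (counting words from zero). The function name is hypothetical.

```python
def block_counts(header_words):
    """Map block type -> number of blocks, skipping types with a zero count."""
    return {
        block_type: count
        for block_type, count in enumerate(header_words[1:])  # word 0 is the signature
        if count != 0
    }


# Non-zero words of the sample file's header block; all other words are zero.
header = [0] * 64
header[0] = 0x0A475657  # signature
header[1] = 1           # type 0
header[8] = 1           # type 7
header[24] = 1          # type 23
header[32] = 6          # type 31
header[36] = 1          # type 35
header[56] = 1          # type 55

counts = block_counts(header)
total_blocks = 1 + sum(counts.values())  # plus the header block itself
print(counts, total_blocks)
```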

Most block types aren't defined in this specification, which is why their counts are zero; this allows for future
expansion in a forward- and backward-compatible manner (renderers ignore unknown block types but
can skip them easily).

Each of the block types that are present has a particular meaning. For example, block type 0 is the
metadata block. The metadata block starts as follows:

1: 42400000 42400000 00000000 00000000 00000000 00000000 00000000...

The word 0x42400000 is an IEEE754-encoded floating point number (binary32). It represents the
number 48.0. The first one is the width of the image, the second is the height. The remaining 62 words of
the metadata block are zero; again, future versions of the format may define meaning for those values
but for now they are skipped.
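
The decoding can be checked by reinterpreting the 32 bits of the word as a binary32 value. The helper name is hypothetical:

```python
import struct


def word_to_float(word: int) -> float:
    """Reinterpret a 32-bit unsigned word as an IEEE 754 binary32 number."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


print(word_to_float(0x42400000))  # 48.0
```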

As per the header, the next block is of type 7, which corresponds to a parameter block. The parameter
blocks define blocks of 64 values that can be configured at "runtime" (when the image is being
displayed). The values in the file represent the default values for the parameters. In this file, it turns out
that only the first parameter is actually used; the other 63 are ignored. (There is no way to know this
directly from examining the parameter block.)

Here is the parameter block:

2: 000000ff 00000000 00000000 00000000 00000000 00000000 00000000...


The default value of the first parameter in this file is 0x000000FF, which is either the number 255, the
color "black", or roughly 3.57e-43, depending on whether it represents an integer, a color, or a floating
point number. We will see how the parameter is used later (spoiler: in this file, it's interpreted as a color).
A file could have more than one block of parameters; for example, a file with 3 blocks of parameters
would have 3*64 = 192 configurable parameters.
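As an illustration, the three readings of this word can be reproduced with a few lines of Python. This is only a sketch: the variable names are made up for the example, and the color reading checks just that the 24 color-channel bits are zero and the low 8 (alpha) bits are 255.

```python
import struct

word = 0x000000FF

as_integer = word                                            # uint32 reading
as_float = struct.unpack("<f", struct.pack("<I", word))[0]   # float32 reading
color_channel_bits = word >> 8                               # 24 bits of color channels
alpha = word & 0xFF                                          # low 8 bits

print(as_integer)                  # 255
print(as_float)                    # a subnormal number, roughly 3.57e-43
print(color_channel_bits, alpha)   # 0 255, i.e. opaque black
```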

The next block is of type 23. (This implies that there are no expression blocks in this file; those are of
type 15, and the 17th word in the file, which gives the number of blocks of type 15, is zero.)

That block has a lot more non-zero data than the others:

3: 3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000
00000000 00000000 3f800000 00000000 41c00000 40800000 00000000 3f800000
3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000
00000000 00000000 3f800000 00000000 41d00000 42080000 00000000 3f800000
3f800000 00000000 00000000 00000000 00000000 3f800000 00000000 00000000
00000000 00000000 3f800000 00000000 41d00000 41900000 00000000 3f800000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

This block is a set of four matrices, in column-major order, with each word representing a binary32
floating-point number. In this case the first matrix is42:
1.0 0.0 0.0 24.0
0.0 1.0 0.0 4.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0
...which is a translation matrix applying an offset of (+24.0, +4.0). The second matrix is almost identical,
but applies an offset of (+26.0, +34.0), the third applies an offset of (+26.0, +18.0), and the fourth is all
zeroes (and, it will transpire, is not used in this file).
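The column-major layout of the first matrix can be sketched as follows. The function name decode_matrix and the constant FIRST_MATRIX_WORDS are hypothetical; the sixteen words are the first sixteen words of the block shown above.

```python
import struct

# First sixteen words of the matrix block in the example file.
FIRST_MATRIX_WORDS = [
    0x3F800000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x3F800000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x3F800000, 0x00000000,
    0x41C00000, 0x40800000, 0x00000000, 0x3F800000,
]

def decode_matrix(words):
    """Return a 4x4 matrix as a list of rows from 16 column-major words."""
    values = struct.unpack("<16f", struct.pack("<16I", *words))
    # In column-major order, entry col * 4 + row holds the cell at (row, col).
    return [[values[col * 4 + row] for col in range(4)] for row in range(4)]

rows = decode_matrix(FIRST_MATRIX_WORDS)
for row in rows:
    print(row)
# [1.0, 0.0, 0.0, 24.0]
# [0.0, 1.0, 0.0, 4.0]
# [0.0, 0.0, 1.0, 0.0]
# [0.0, 0.0, 0.0, 1.0]
```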

Next we have 6 blocks of type 31, curve blocks:

4: c1a00000 00000000 41a00000 00000000 c0800000 c0800000 00000000...
5: 41a00000 42200000 41a00000 00000000 00000000 c1400000 c1400000...
6: c130cccd c1a00000 4130cccd 41a00000 00000000 c0800000 c0800000...
7: 00000000 41f86666 42200000 410f3333 00000000 00000000 c1400000...
8: c1a00000 c130cccd 41a00000 4130cccd c0800000 c0800000 00000000...
9: 410f3333 42200000 41f86666 00000000 00000000 c1400000 c1400000...

These are read "vertically": each curve has one coordinate in each block, so here we see 7 curves (out
of the 64 curves that these 6 blocks represent). The data in these blocks is in binary32 format (floating
point numbers). In this case the curves are all cubics, and the coordinates in block 4 are the x3
coordinates, block 5 has the y3 coordinates, block 6 has the x1 coordinates, and so on with y1, x2, and
y2. (The x0 and y0 coordinates of each curve are implied by the previous curve; and each set of curves
begins at the origin.) Curve blocks come in groups, in this case 6 blocks form the group.

Looking at the blocks carefully will show that many of these words are 0xFFFFFFFF. This is a NaN in the
floating point binary32 format. Of the 64 curves, all but 12 are entirely formed of NaNs. These are,
unsurprisingly, unused in the file. So really, there are 12 curves in these 6 blocks.
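Reading the blocks "vertically" can be sketched like this, using only the first seven words of each of the six blocks as printed above. The names GROUP, f32, and cubic are hypothetical helpers for the example.

```python
import struct

# First seven words of blocks 4..9 of the example file.
GROUP = [
    [0xC1A00000, 0x00000000, 0x41A00000, 0x00000000, 0xC0800000, 0xC0800000, 0x00000000],  # x3
    [0x41A00000, 0x42200000, 0x41A00000, 0x00000000, 0x00000000, 0xC1400000, 0xC1400000],  # y3
    [0xC130CCCD, 0xC1A00000, 0x4130CCCD, 0x41A00000, 0x00000000, 0xC0800000, 0xC0800000],  # x1
    [0x00000000, 0x41F86666, 0x42200000, 0x410F3333, 0x00000000, 0x00000000, 0xC1400000],  # y1
    [0xC1A00000, 0xC130CCCD, 0x41A00000, 0x4130CCCD, 0xC0800000, 0xC0800000, 0x00000000],  # x2
    [0x410F3333, 0x42200000, 0x41F86666, 0x00000000, 0x00000000, 0xC1400000, 0xC1400000],  # y2
]

def f32(word):
    """Reinterpret a 32 bit word as a binary32 number."""
    return struct.unpack("<f", struct.pack("<I", word))[0]

def cubic(index):
    """Collect the six coordinates of one curve, one word from each block."""
    x3, y3, x1, y1, x2, y2 = (f32(block[index]) for block in GROUP)
    return (x1, y1), (x2, y2), (x3, y3)

p1, p2, p3 = cubic(0)
print(p1, p2, p3)
```

For the first curve this gives control points of approximately (-11.05, 0.0) and (-20.0, 8.95), and an end point of (-20.0, 20.0).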

42 Recall that the format is column-major, which is why the 24.0 value, 0x41c00000, is the 13th entry in the matrix data, not the 4th.

There are only two more blocks in this file. The first of these, block 10, is of type 35, a shape block:

10: 00000000 00000000 00000004 00000006 00000000 00000004 00000004...

It indicates how to combine the curves into a shape. In the shape blocks, each block holds up to 16
shapes (four words each). The words are integers. The first two words of the shape identify the first curve
of the shape (so for the first shape, 0, 0, the first curve of the shape starts at block 0 of the curves, word
0 of that block). The third word is the number of curves in the shape (for the first shape here, that's 4
curves), and the fourth word is the number of blocks per group for these curves (in this case, 6). The
second shape's numbers are 0, 4, 4, 6, indicating that the second shape has four curves, starting at curve
4 in block 0. It turns out there is one more shape, whose numbers are 0, 8, 4, 6. (The rest of the block is
all zeroes.) So in total we have three shapes, each formed of four curves, all in the same group
described by the 6 blocks of type 31 discussed above.
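A sketch of reading the shape block four words at a time follows. The function name and the tuple layout are hypothetical, and treating an entry with a curve count of zero as an unused slot is an assumption made for this example, based only on the observation that the rest of the block is all zeroes.

```python
# The shape block of the example file: three shapes, then zeroes.
SHAPE_BLOCK = [0, 0, 4, 6,  0, 4, 4, 6,  0, 8, 4, 6] + [0] * 52

def decode_shapes(block_words):
    """Return (first_block, first_word, curve_count, group_size) per shape."""
    shapes = []
    for start in range(0, 64, 4):
        first_block, first_word, curve_count, group_size = block_words[start:start + 4]
        if curve_count:  # assumption: a zero curve count marks an unused slot
            shapes.append((first_block, first_word, curve_count, group_size))
    return shapes

for shape in decode_shapes(SHAPE_BLOCK):
    print(shape)
# (0, 0, 4, 6)
# (0, 4, 4, 6)
# (0, 8, 4, 6)
```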

Finally we have one more block, the composition block, of type 55. Each composition takes an entire
block (allowing for significant expansion in the future). Here is the one composition in this file:

11: 00000000 00000000 00000002 ffd00000 00000000 00000000 00000000...

Compositions specify groups of shapes and matrices to form together into a single path, which is then
filled by a specified paint (gradient) or color. Currently each composition consists of five numbers (and 59
zeroes). The first word specifies the index of the matrix that is used for the first shape, the second
specifies the index of the first shape itself, the third is the number of extra shapes to add, and the fourth
and fifth specify the paint style.

So in this case we specify that the first matrix is matrix 0, the first shape is shape 0, and that there's a
total of 3 shapes. (This is why there are three matrices. Each one specifies how to position one of these
shapes to form the actual path.)

The fourth word is 0xFFD00000 which is a special value indicating parameter 0 is to be used as a color
to paint the path. 0xFFD0 indicates a parameter reference, and 0x0000 indicates the first parameter.

Parameters can actually be referenced in many places, e.g. in curves, using this same form.
(0xFFD00000 is a NaN value in binary32, so this does not reduce the expressivity of the format.)
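The composition words discussed above can be decoded with a sketch like the one below. The function decode_composition and its dictionary keys are hypothetical, and only the parameter-reference form of the fourth word is handled; the fifth word and other paint styles are not interpreted here.

```python
import math
import struct

# The composition block of the example file.
COMPOSITION = [0x00000000, 0x00000000, 0x00000002, 0xFFD00000] + [0] * 60

def f32(word):
    """Reinterpret a 32 bit word as a binary32 number."""
    return struct.unpack("<f", struct.pack("<I", word))[0]

def decode_composition(words):
    matrix_index, shape_index, extra_shapes, paint = words[:4]
    info = {
        "first_matrix": matrix_index,
        "first_shape": shape_index,
        "shape_count": extra_shapes + 1,  # the first shape plus the extra ones
    }
    if paint >> 16 == 0xFFD0:             # high 16 bits: parameter reference
        info["paint"] = ("parameter", paint & 0xFFFF)
    else:
        info["paint"] = ("uninterpreted", paint)
    return info

info = decode_composition(COMPOSITION)
print(info["shape_count"], info["paint"])  # 3 ('parameter', 0)
print(math.isnan(f32(0xFFD00000)))         # True
```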

Conventions

All assertions, diagrams, examples, and notes are non-normative, as are all sections explicitly marked
non-normative. Everything else is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD
NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
described in RFC 2119.

These keywords have the same meaning when written in lowercase and cannot appear in non-normative
content.

This section describes two conformance classes: WVG files (and by implication, WVG file generators),
and WVG renderers. Conformance requirements for each class are entirely independent. WVG file
generators must not generate non-conforming WVG files.

In this section, where algorithms are described as sequences of steps, the construction "Assert:
Condition" indicates that if this specification has no errors, the condition specified should invariably be
true at that point in the algorithm.

Expressions used in the definitions in this section operate in an unlimited domain. (As opposed to, say,
the 32 bit domain. For example, addition never overflows.) Implementations are not expected to use
these expressions literally, but must implement equivalent logic.

Numbers with a 0x prefix are hexadecimal. Numbers with a 0b prefix are binary. Numbers without a prefix
are decimal. For example, 0xC0, 0b11000000, and 192 are equivalent.

Ranges of numbers are inclusive (i.e. the range 0..3 is the four numbers 0, 1, 2, 3).

Structure

WVG files consist of sequences of blocks, each of which is 64 words long. A word is 32 bits long and can
be interpreted, depending on context, as any of:

● uint32: a little-endian unsigned integer.
● int32: a little-endian two's-complement signed integer.
● float32: a little-endian IEEE 754 binary32 (single-precision) floating-point number.
● color32: a little-endian unsigned integer representing straight (not premultiplied) RGBA quads,
with the high 8 bits representing the red channel, the next 8 bits representing the green channel,
the next 8 bits representing the blue channel, and the low 8 bits representing the alpha channel.

When this section talks about comparing a word to an integer, it must be interpreted as a uint32. When
this section talks about the bits of a word, those bits are interpreted as an unsigned integer of the given
size. (For example, "the high 16 bits" of a word implies interpreting the word as a uint32, then shifting that
integer right by sixteen bits.) The number zero is represented identically in all three numeric
representations and so is sometimes referenced without specifying its interpretation. Zero and
fully-transparent black are equivalent.

WVG files must be at least one block long. The first block, the header, consists of a signature word
followed by 63 words describing the number of subsequent blocks of each type.

All blocks of a particular type are present in the file contiguously.

Header block

The first word of a WVG file must be the WVG signature. The WVG signature is a uint32 with value
0x0A475657. (Recall that WVG files are little-endian.)

If the file is less than 64 words long, or if the signature word interpreted as uint32 is not 0x0A475657, the
remainder of the file must be ignored.

Words 2-64 form an array of 63 uint32s known as BLOCK_SIZES[i] where i has the values 0..62, with the
second word in the file being BLOCK_SIZES[0], the third word being BLOCK_SIZES[1], and so forth up
to the 64th word being BLOCK_SIZES[62].

The sum of all the numbers in BLOCK_SIZES plus 1 must correspond to the exact size of the file in
blocks (a block being 64 words or 256 bytes). If it does not, then the file is invalid and the remainder of
the file must be ignored.

For convenience, the following indices in BLOCK_SIZES are named, and the respective entries in
BLOCK_SIZES give the number of blocks of each known type in the file:

● METADATA_BLOCKS = 0 (metadata blocks, with information like width/height)
● PARAM_BLOCKS = 7 (parameter blocks)
● EXPR_BLOCKS = 15 (expression blocks)
● MATRIX_BLOCKS = 23 (matrix blocks, where raw data for matrices is given)
● CURVE_BLOCKS = 31 (curve blocks, where the raw data for paths is described)
● SHAPE_BLOCKS = 35 (shape blocks, where curves are collected into paths)
● GRADIENT_BLOCKS = 43 (gradient blocks, where raw data for gradients is given)
● PAINT_BLOCKS = 47 (paint blocks, where styles are described)
● COMP_BLOCKS = 55 (composition blocks)

The value BLOCK_OFFSETS[i] is defined to be 1 plus the sum of all values in BLOCK_SIZES with
indices less than i. (BLOCK_OFFSETS[63] is therefore the size of the file in blocks.) It specifies the offset
of the first block of a particular type.

The values of BLOCK_SIZES[i] for values of i other than those with named indices above must be zero.
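The header rules above can be sketched as follows. This is an illustration rather than a reference implementation: the function parse_header and its return convention (None when the remainder of the file must be ignored) are choices made for this example, and it does not check that the unnamed BLOCK_SIZES entries are zero.

```python
import struct

WVG_SIGNATURE = 0x0A475657
BLOCK_WORDS = 64
BLOCK_BYTES = BLOCK_WORDS * 4

def parse_header(data: bytes):
    """Return (block_sizes, block_offsets), or None if the file is to be ignored."""
    if len(data) < BLOCK_BYTES:
        return None
    words = struct.unpack("<64I", data[:BLOCK_BYTES])
    if words[0] != WVG_SIGNATURE:
        return None
    block_sizes = list(words[1:])                    # BLOCK_SIZES[0..62]
    if (sum(block_sizes) + 1) * BLOCK_BYTES != len(data):
        return None                                  # size mismatch: invalid file
    block_offsets = [1]                              # the header is block 0
    for size in block_sizes:
        block_offsets.append(block_offsets[-1] + size)   # BLOCK_OFFSETS[0..63]
    return block_sizes, block_offsets

# Rebuild a header with the block counts of the example file discussed earlier.
sizes = [0] * 63
for block_type, count in {0: 1, 7: 1, 23: 1, 31: 6, 35: 1, 55: 1}.items():
    sizes[block_type] = count
data = struct.pack("<64I", WVG_SIGNATURE, *sizes) + bytes(BLOCK_BYTES * sum(sizes))

block_sizes, block_offsets = parse_header(data)
print(block_offsets[31], block_offsets[35], block_offsets[55], block_offsets[63])
# 4 10 11 12
```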

Metadata blocks

If BLOCK_SIZES[METADATA_BLOCKS] is greater than zero, then the first word of the block starting at
BLOCK_OFFSETS[METADATA_BLOCKS], as float32, gives IMAGE_WIDTH, the width of the image,
and the second word of that block, as float32, gives IMAGE_HEIGHT, the height of the image. If
BLOCK_SIZES[METADATA_BLOCKS] is zero then the width and height are both 1.0.

The units of IMAGE_WIDTH and IMAGE_HEIGHT are arbitrary but the coordinate space they define is
the one used by all other coordinates in the file, modulo transforms.

All other words in metadata blocks must be zero and must be ignored by renderers.

WVG files should not contain any compositions that would render pixels outside of the rectangle whose
top left is at the origin and whose bottom right is at IMAGE_WIDTH,IMAGE_HEIGHT.

Renderers should clip images to the rectangle whose top left is at the origin and whose bottom right is at
IMAGE_WIDTH,IMAGE_HEIGHT. Renderers may perform this clip at low quality, because for conforming
images this clip will never be necessary. Renderers may skip the clip entirely, especially for content that
is known to not extend outside the image rectangle (e.g. because it is not arbitrary user-generated
content but is instead content selected by the same team as that invoking the renderer).

Parameter blocks

WVG files can be adjusted at runtime by varying their parameters. For example, a value in the parameter
block could represent the time component of an animation, and paint blocks can refer to parameters
when defining flat color paints and colors in gradients.

Parameters are words. A WVG file has zero or more parameter blocks, each of which introduces 64
parameters. For convenience, we define PARAM_COUNT as BLOCK_SIZES[PARAM_BLOCKS] * 64.

If PARAM_COUNT is zero, then the file has no parameters.

Otherwise, the BLOCK_SIZES[PARAM_BLOCKS] blocks starting at
BLOCK_OFFSETS[PARAM_BLOCKS] are parameter blocks and represent default parameter data. Each
word represents one parameter's default value. Parameters are numbered consecutively and default to
the value given in the file, with the first word of the first parameter block being the default value of
parameter zero, and the last word of the last parameter block being the default value of the parameter
with number PARAM_COUNT - 1.

Later references to parameters with indices PARAM_COUNT or greater are interpreted as references to
the number zero.

PARAM_COUNT can in theory be as large as 2^38, but due to other limitations of this format, only 65536
possible parameters are ever actually accessible (parameters are referenced using 16 bit values). For
this reason, files should not specify a BLOCK_SIZES[PARAM_BLOCKS] value greater than 1024, and
renderers can treat PARAM_COUNT values greater than 65536 as 65536 without loss of generality
(skipping over the "unreachable" parameter blocks if BLOCK_SIZES[PARAM_BLOCKS] is greater than
1024).

At runtime, before or after rendering a file (but not during the rendering of a file), implementations may
replace the value of any parameter in the range 0..PARAM_COUNT-1 in an implementation-defined
manner (typically, as a result of some API call, as described below). When a parameter is changed, the
image should be rerendered at the next available opportunity.

Expression blocks

Parameters are converted to values used in shape and paint definitions using expressions.

Expression blocks are the BLOCK_SIZES[EXPR_BLOCKS] blocks starting with the block at
BLOCK_OFFSETS[EXPR_BLOCKS].

Each expression block represents one expression. Expressions are evaluated as per the steps described
below. Expressions can refer to earlier expressions (this is necessary in some cases to express
complicated expressions since the definition of each expression must fit in 64 words). An evaluated
expression has a value which is a word, the interpretation of which is determined when it is used (so e.g.
an expression could describe the addition of two int32s, which itself results in an int32, but which is later
interpreted as a float32 as part of a coordinate).

For convenience, we define EXPR_COUNT as BLOCK_SIZES[EXPR_BLOCKS].

EXPR_COUNT can in theory be as large as 2^32, but due to other limitations of this format, only 65536
possible expressions are ever actually accessible (expressions are referenced using 16 bit values). For
this reason, files should not specify a BLOCK_SIZES[EXPR_BLOCKS] value greater than 65536, and
renderers can treat EXPR_COUNT values greater than 65536 as 65536 without loss of generality
(skipping over the "unreachable" expression blocks if BLOCK_SIZES[EXPR_BLOCKS] is greater than
65536).

If EXPR_COUNT is greater than zero, then, for each expression from zero to EXPR_COUNT-1, the value
of the expression is computed as follows. These computations must be done sequentially since later
expressions may refer to earlier expressions.

1. Let CURRENT_EXPR be the number of the expression being evaluated, where the first
expression has number zero, and the last expression has number EXPR_COUNT-1.
2. Let EXPR[i] be the ith word of the 64 words in the block of the expression being evaluated, where
i is zero for the first word of the block and 63 for the last word of the block. The block in question
is the one at block offset BLOCK_OFFSETS[EXPR_BLOCKS] + CURRENT_EXPR.
3. Let EXPR_INDEX be zero.
4. Let STACK[i] be storage space for 64 words, where i is zero for the first stored word and 63 for
the 64th stored word. The algorithm below is designed to allow, though not require, that STACK
and EXPR share the same memory (the observable behaviour is intended to be identical either
way).
5. Let STACK_INDEX be zero.
6. Let ERROR_COUNT be zero. (ERROR_COUNT is only used to determine the validity of the file,
it does not affect renderer semantics.)
7. Loop:
7.1. Assert: STACK_INDEX is less than 64, because each loop iteration increases
EXPR_INDEX by one and STACK_INDEX by no more than one, and the loop ends if
EXPR_INDEX reaches 64.
7.2. If the high bit of EXPR[EXPR_INDEX] is zero:
7.2.1. Let STACK[STACK_INDEX] equal EXPR[EXPR_INDEX]. This allows any positive
number (whether int32 or float32) to be encoded verbatim in the expression.
Negative numbers can be encoded as their positive value followed by the integer
negate or float negate operator (see below).
7.2.2. Increment STACK_INDEX.
7.2.3. Skip to loop increment below.
7.3. Otherwise, the high bit of EXPR[EXPR_INDEX] is one. If the two highest bits of
EXPR[EXPR_INDEX] are 0b10, this is a one-argument operator:
7.3.1. If STACK_INDEX is less than one, increment ERROR_COUNT and skip to the
loop increment step below. The operator has no effect.
7.3.2. If EXPR[EXPR_INDEX] is 0x80000000 (integer negate):
7.3.2.1. Let STACK[STACK_INDEX-1] equal the int32 negation of
STACK[STACK_INDEX-1] as int32.
7.3.2.2. Skip to loop increment below.
7.3.3. If EXPR[EXPR_INDEX] is 0x80010000 (float negate):
7.3.3.1. Let STACK[STACK_INDEX-1] equal the float32 negation of
STACK[STACK_INDEX-1] as float32.
7.3.3.2. Skip to loop increment below.
7.3.4. If EXPR[EXPR_INDEX] is 0x80008000 (integer cast):
7.3.4.1. Let ARG be STACK[STACK_INDEX-1] as float32.
7.3.4.2. If ARG is not a finite number, or if ARG is greater than 2^31-1, or if ARG is less
than -2^31, let ARG be zero.
7.3.4.3. Let STACK[STACK_INDEX-1] equal the int32 nearest integer
representation of ARG using odd-even rounding.
7.3.4.4. Skip to loop increment below.
7.3.5. If EXPR[EXPR_INDEX] is 0x80018000 (float cast):
7.3.5.1. Let STACK[STACK_INDEX-1] equal the nearest float32 representation of
STACK[STACK_INDEX-1] as int32.
7.3.5.2. Skip to loop increment below.
7.3.6. If EXPR[EXPR_INDEX] is 0x80020000 (duplicate):
7.3.6.1. Let STACK[STACK_INDEX] equal STACK[STACK_INDEX-1].
7.3.6.2. Increment STACK_INDEX.
7.3.6.3. Skip to loop increment below.
7.3.7. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop
increment below.
7.4. If the three highest bits of EXPR[EXPR_INDEX] are 0b110, this is a two-argument
operator:
7.4.1. If STACK_INDEX is less than two, increment ERROR_COUNT and skip to the
loop increment step below. The operator has no effect.
7.4.2. If EXPR[EXPR_INDEX] is 0xC0000001 (integer add):
7.4.2.1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 sum of
STACK[STACK_INDEX-2] as int32 and STACK[STACK_INDEX-1] as int32.
7.4.2.2. Decrement STACK_INDEX.
7.4.2.3. Skip to loop increment below.
7.4.3. If EXPR[EXPR_INDEX] is 0xC0000002 (integer subtract):
7.4.3.1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 difference
of STACK[STACK_INDEX-2] as int32 as the minuend and
STACK[STACK_INDEX-1] as int32 as the subtrahend.
7.4.3.2. Decrement STACK_INDEX.
7.4.3.3. Skip to loop increment below.
7.4.4. If EXPR[EXPR_INDEX] is 0xC0000003 (integer multiply):
7.4.4.1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 product
of STACK[STACK_INDEX-2] as int32 and STACK[STACK_INDEX-1] as
int32.
7.4.4.2. Decrement STACK_INDEX.
7.4.4.3. Skip to loop increment below.
7.4.5. If EXPR[EXPR_INDEX] is 0xC0000004 (integer divide):
7.4.5.1. Let STACK[STACK_INDEX-2] equal the lower 32 bits of the int64 integer
quotient with STACK[STACK_INDEX-2] as int32 as the dividend and
STACK[STACK_INDEX-1] as int32 as the divisor. (Performing this operation
entirely in the 32 bit domain would allow overflow in the case where
STACK[STACK_INDEX-2] is -2^31 and STACK[STACK_INDEX-1] is -1.) (This
being the "integer quotient" means the result is an integer, e.g.
"0x07 0x04 /" leaves 0x01 on the stack.) If the divisor is zero, the result
must be zero.
7.4.5.2. Decrement STACK_INDEX.
7.4.5.3. Skip to loop increment below.
7.4.6. If EXPR[EXPR_INDEX] is 0xC0010001 (float add):
7.4.6.1. Let STACK[STACK_INDEX-2] equal the float32 sum of
STACK[STACK_INDEX-2] as float32 and STACK[STACK_INDEX-1] as
float32.
7.4.6.2. Decrement STACK_INDEX.
7.4.6.3. Skip to loop increment below.
7.4.7. If EXPR[EXPR_INDEX] is 0xC0010002 (float subtract):
7.4.7.1. Let STACK[STACK_INDEX-2] equal the float32 difference of
STACK[STACK_INDEX-2] as float32 as the minuend and
STACK[STACK_INDEX-1] as float32 as the subtrahend.
7.4.7.2. Decrement STACK_INDEX.
7.4.7.3. Skip to loop increment below.
7.4.8. If EXPR[EXPR_INDEX] is 0xC0010003 (float multiply):
7.4.8.1. Let STACK[STACK_INDEX-2] equal the float32 product of
STACK[STACK_INDEX-2] as float32 and STACK[STACK_INDEX-1] as
float32.
7.4.8.2. Decrement STACK_INDEX.
7.4.8.3. Skip to loop increment below.
7.4.9. If EXPR[EXPR_INDEX] is 0xC0010004 (float divide):
7.4.9.1. Let STACK[STACK_INDEX-2] equal the float32 quotient with
STACK[STACK_INDEX-2] as float32 as the dividend and
STACK[STACK_INDEX-1] as float32 as the divisor. If the divisor is zero, the
result must be infinity (with the sign being positive if the dividend and
divisor have the same sign, otherwise negative).
7.4.9.2. Decrement STACK_INDEX.
7.4.9.3. Skip to loop increment below.
7.4.10. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop
increment below.
7.5. If the 10 highest bits of EXPR[EXPR_INDEX] are all set, this is a zero-argument operator:
7.5.1. If EXPR[EXPR_INDEX] is 0xFFC00000 (terminate):
7.5.1.1. If there exists a value of i, where i is greater than EXPR_INDEX but
less than 64, for which EXPR[i] is not 0xFFFFFFFF, then increment
ERROR_COUNT.
7.5.1.2. End the loop (skip to the after loop step).
7.5.2. If the high 16 bits of EXPR[EXPR_INDEX] are 0xFFD0 (parameter reference):
7.5.2.1. Let ARG be the lower 16 bits of EXPR[EXPR_INDEX].
7.5.2.2. If ARG is greater than or equal to PARAM_COUNT, increment
ERROR_COUNT.
7.5.2.3. Let STACK[STACK_INDEX] equal the parameter with index ARG. (If ARG
is greater than or equal to PARAM_COUNT, then this is zero.)
7.5.2.4. Increment STACK_INDEX.
7.5.2.5. Skip to loop increment below.
7.5.3. If the high 16 bits of EXPR[EXPR_INDEX] are 0xFFE0 (expression reference):
7.5.3.1. Let ARG be the lower 16 bits of EXPR[EXPR_INDEX].
7.5.3.2. If ARG is greater than or equal to CURRENT_EXPR, increment
ERROR_COUNT.
7.5.3.3. Let STACK[STACK_INDEX] equal the value of the expression numbered
ARG. (If ARG is greater than or equal to CURRENT_EXPR, then this is
zero.)
7.5.3.4. Increment STACK_INDEX.
7.5.3.5. Skip to loop increment below.
7.5.4. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop
increment below.
7.6. Otherwise, the operator has no effect; increment ERROR_COUNT and skip to loop
increment below.
7.7. Loop increment: Increase EXPR_INDEX by one.
7.8. If EXPR_INDEX equals 64, end the loop (skip to the after loop step).
7.9. Jump to the top of the loop.
8. After loop: If STACK_INDEX is zero:
8.1. Let STACK[STACK_INDEX] equal zero.
8.2. Increment STACK_INDEX.
9. The expression's value is STACK[STACK_INDEX-1]. If ERROR_COUNT is non-zero, then the
expression is invalid.

A WVG file must not contain any expression blocks which, when computed according to the steps above,
are determined to be invalid. This primarily means that operators in valid files will always have the right
number of values on the stack, that references will be to valid parameters and expressions, and that
values after the terminate operator will all be 0xFFFFFFFF NaNs.
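The evaluation steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a conforming implementation: the names are hypothetical, integer division is assumed to truncate toward zero, "odd-even rounding" is assumed to mean rounding ties to even, and float arithmetic is done in Python doubles and then rounded back to binary32.

```python
import math
import operator
import struct

MASK = 0xFFFFFFFF

def f32(word):  # word -> float32 value
    return struct.unpack("<f", struct.pack("<I", word))[0]

def from_f32(value):  # Python float -> nearest float32 bit pattern
    try:
        return struct.unpack("<I", struct.pack("<f", value))[0]
    except OverflowError:  # magnitude too large for float32: signed infinity
        return 0x7F800000 if value > 0 else 0xFF800000

def i32(word):  # word -> two's-complement int32 value
    return word - (1 << 32) if word & 0x80000000 else word

def int_div(a, b):
    if b == 0:
        return 0
    q = abs(a) // abs(b)  # assumption: truncate toward zero
    return q if (a < 0) == (b < 0) else -q

def float_div(a, b):
    if b == 0.0:
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b

def int_op(fn):
    return lambda a, b: fn(i32(a), i32(b)) & MASK

def float_op(fn):
    return lambda a, b: from_f32(fn(f32(a), f32(b)))

BINARY = {
    0xC0000001: int_op(operator.add), 0xC0000002: int_op(operator.sub),
    0xC0000003: int_op(operator.mul), 0xC0000004: int_op(int_div),
    0xC0010001: float_op(operator.add), 0xC0010002: float_op(operator.sub),
    0xC0010003: float_op(operator.mul), 0xC0010004: float_op(float_div),
}

def evaluate(expr, params=(), earlier=()):
    """Evaluate one 64-word expression block; return (value, error_count)."""
    stack, errors = [], 0
    for index, op in enumerate(expr):
        if op >> 31 == 0:                    # literal value
            stack.append(op)
        elif op >> 30 == 0b10:               # one-argument operators
            if not stack:
                errors += 1
            elif op == 0x80000000:           # integer negate
                stack[-1] = -i32(stack[-1]) & MASK
            elif op == 0x80010000:           # float negate (flip the sign bit)
                stack[-1] ^= 0x80000000
            elif op == 0x80008000:           # integer cast
                arg = f32(stack[-1])
                if not math.isfinite(arg) or arg > 2**31 - 1 or arg < -(2**31):
                    arg = 0.0
                stack[-1] = round(arg) & MASK    # round() rounds ties to even
            elif op == 0x80018000:           # float cast
                stack[-1] = from_f32(float(i32(stack[-1])))
            elif op == 0x80020000:           # duplicate
                stack.append(stack[-1])
            else:
                errors += 1
        elif op >> 29 == 0b110:              # two-argument operators
            if len(stack) < 2 or op not in BINARY:
                errors += 1
            else:
                b, a = stack.pop(), stack.pop()
                stack.append(BINARY[op](a, b))
        elif op >> 22 == 0x3FF:              # zero-argument operators
            if op == 0xFFC00000:             # terminate
                errors += any(w != MASK for w in expr[index + 1:])
                break
            table = {0xFFD0: params, 0xFFE0: earlier}.get(op >> 16)
            if table is None:
                errors += 1
                continue
            arg = op & 0xFFFF
            errors += arg >= len(table)
            stack.append(table[arg] if arg < len(table) else 0)
        else:
            errors += 1
    return (stack[-1] if stack else 0), errors

def block(*words):  # pad with terminate and 0xFFFFFFFF filler to 64 words
    return list(words) + [0xFFC00000] + [MASK] * (63 - len(words))

print(evaluate(block(7, 4, 0xC0000004)))   # (1, 0)
value, errors = evaluate(block(0x3FC00000, 0x40200000, 0xC0010001))
print(f32(value), errors)                  # 4.0 0
```

Here earlier holds the values of the expressions numbered below CURRENT_EXPR, so a caller would evaluate the expression blocks in order and append each result to that list.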

Later references to expressions with numbers EXPR_COUNT or greater are interpreted as references to
the number zero.

Matrix blocks

Matrices are used by paints and compositions. In WVG all matrices are 4x4.

Matrix blocks are the BLOCK_SIZES[MATRIX_BLOCKS] blocks starting with the block at
BLOCK_OFFSETS[MATRIX_BLOCKS].

Each matrix block contains 4 matrices.

For convenience we define MATRIX_COUNT as BLOCK_SIZES[MATRIX_BLOCKS] * 4.

MATRIX_COUNT can in theory be as large as 2^34, but due to other limitations of this format, only 2^32
possible matrices are ever actually accessible (matrices are referenced using 32 bit values). For this
reason, files should not specify a BLOCK_SIZES[MATRIX_BLOCKS] value greater than 2^30, and
renderers can treat MATRIX_COUNT values greater than 2^32 as 2^32 without loss of generality (skipping
over the "unreachable" matrix blocks if BLOCK_SIZES[MATRIX_BLOCKS] is greater than 2^30). Of
course a file with that many matrices would be over 17 gigabytes so this may be academic for a while
yet.

We additionally define MATRIX[i] as the first matrix returned when following these steps:

1. If i is greater than or equal to MATRIX_COUNT, return the identity matrix. It is static and valid.
2. Let STATIC be true.
3. Let VALID be true.
4. Let CELL[j] be the word that is i*16+j words after the start of the
BLOCK_OFFSETS[MATRIX_BLOCKS] block.
5. Let RESULT be an empty 4-by-4 matrix with rows and columns numbered 0 to 3.
6. For values of x from 0 to 3:
6.1. For values of y from 0 to 3:
6.1.1. Let VALUE be CELL[x * 4 + y]. (The matrix is stored in column-major order.)
6.1.2. Let the value of the cell of the RESULT matrix in column x and row y be the first
value returned when following these substeps:
6.1.2.1. If VALUE as float32 is a non-NaN value, then return VALUE as float32.
6.1.2.2. Let ARG be the low 16 bits of VALUE.
6.1.2.3. Let MODE be the high 16 bits of VALUE.
6.1.2.4. If MODE is 0xFFD0, then let STATIC be false, let VALID be false if ARG is
greater than or equal to PARAM_COUNT, and return the parameter with
index ARG.
6.1.2.5. If MODE is 0xFFE0, then let STATIC be false, let VALID be false if ARG is
greater than or equal to EXPR_COUNT, and return the value of the
expression numbered ARG.
6.1.2.6. If no value has yet been returned by these substeps, let VALID be false.
6.1.2.7. Return VALUE as float32.
7. Return the RESULT matrix. If STATIC is true, the matrix is static, otherwise it is dynamic. If VALID
is true, the matrix is valid, otherwise it is invalid.

A matrix is either static or dynamic as defined by the steps above. (This indicates whether it depends on
the parameters, and therefore whether an implementation might need to recompute it when the
parameters are changed.)
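The MATRIX[i] steps can be sketched as below. The function resolve_matrix and its arguments are hypothetical: matrix_words is assumed to be the flat list of words of all matrix blocks, params and expr_values are assumed to be lists of words, and a referenced word is assumed to be interpreted as float32 (with out-of-range references giving zero).

```python
import math
import struct

def f32(word):
    """Reinterpret a 32 bit word as a binary32 number."""
    return struct.unpack("<f", struct.pack("<I", word))[0]

def resolve_matrix(i, matrix_words, params=(), expr_values=()):
    """Return (rows, is_static, is_valid) for MATRIX[i]."""
    if i >= len(matrix_words) // 16:          # beyond MATRIX_COUNT: identity
        identity = [[float(r == c) for c in range(4)] for r in range(4)]
        return identity, True, True
    rows = [[0.0] * 4 for _ in range(4)]
    static = valid = True
    for x in range(4):                        # column
        for y in range(4):                    # row
            value = matrix_words[i * 16 + x * 4 + y]   # column-major order
            number = f32(value)
            if math.isnan(number):
                table = {0xFFD0: params, 0xFFE0: expr_values}.get(value >> 16)
                if table is None:
                    valid = False             # a NaN that is not a reference
                else:
                    static = False
                    arg = value & 0xFFFF
                    if arg < len(table):
                        number = f32(table[arg])
                    else:
                        valid, number = False, 0.0
            rows[y][x] = number
    return rows, static, valid

# A translation matrix whose x offset is taken from parameter 0.
words = [0x3F800000, 0, 0, 0,  0, 0x3F800000, 0, 0,
         0, 0, 0x3F800000, 0,  0xFFD00000, 0x40800000, 0, 0x3F800000]
words += [0] * 48                             # pad to one full matrix block
rows, static, valid = resolve_matrix(0, words, params=[0x41C00000])
print(rows[0][3], rows[1][3], static, valid)  # 24.0 4.0 False True
```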

A WVG file must not contain any matrix blocks such that the first matrix returned by MATRIX[i] is defined
as invalid, for any value of i. As a result of these definitions, WVG file generators may substitute
references to the identity matrix with references to matrices with an index greater than or equal to
MATRIX_COUNT. It is suggested that for files whose MATRIX_COUNT is less than 2^32, the index
0xFFFFFFFF be used for the identity matrix.

Shapes

Shapes consist of a series of one or more curves. Each curve in a shape is anchored at (and leading
away from) the end of the previous curve. (The first curve is implicitly anchored at, and leading away
from, the origin.) A straight line called the closing line (not to be confused with a clothing line) is implied
leading from the end of the last curve in a shape back to the origin.

WVG supports the expression of two kinds of curves: Cubic Béziers and Rational Quadratic Béziers.
Cubic Béziers are defined by six numbers (the coordinates of two control points, plus the coordinate of
the end point). Rational Quadratic Béziers use five numbers (the coordinates of the control point, the
control point's weight, and the coordinates of the end point).

Cubic Béziers

Cubic Béziers are third-order Bézier curves where:

● P0 = the end point of the previous curve, if any, or the origin otherwise
● P1 = the point x1,y1 (first control point)
● P2 = the point x2,y2 (second control point)
● P3 = the point x3,y3 (end point of this curve)

Rational Quadratic Béziers

Rational Quadratic Béziers are second-order rational Bézier curves where:

● P0 = the end point of the previous curve, if any, or the origin otherwise
● P1 = the point x1,y1 (control point)
● P2 = the point x2,y2 (end point of this curve)
● w0 = the value w, the weight

Curve blocks

The numbers used for representing curves are stored in contiguous groups of contiguous blocks. Each
group provides the data for up to 64 curves. The number of blocks per group (the group size) depends on
the needs of the curves in that group.

Curve blocks are those starting with the block at BLOCK_OFFSETS[CURVE_BLOCKS].

Data is striped within a group so that each curve has data at the same index of each block in the
contiguous blocks of that group. At each index there is either data for a cubic Bézier, a rational quadratic
Bézier, or no curve at all. Within each group, the blocks have the following semantics:

Block        Cubic Béziers                Rational Quadratic Béziers    None
block 0:     x3 (end point)               x2 (end point)                0xFFFFFFFF
block 1:     y3 (end point)               y2 (end point)                0xFFFFFFFF
block 2:     x1 (first control point)     x1 (first control point)      0xFFFFFFFF
block 3:     y1 (first control point)     y1 (first control point)      0xFFFFFFFF
block 4:     x2 (second control point)    w (weight)                    0xFFFFFFFF
block 5:     y2 (second control point)    0xFFFFFFFF                    0xFFFFFFFF
blocks 6+:   0xFFFFFFFF                   0xFFFFFFFF                    0xFFFFFFFF

Cells in the table above labeled 0xFFFFFFFF indicate that the relevant word must have all 32 bits set to
indicate that the data is unused. Other cells indicate the semantic of the relevant word; those words must
be stored and interpreted as float32 values. (0xFFFFFFFF, when interpreted as a float32, is a NaN.)

In other words, the first block of a group contains 64 x-coordinates of the end points of 64 curves, then
the second block contains 64 y-coordinates of the end points of those 64 curves, and so forth. Curve
types can be mixed, for example in the fifth block (block 4) of a group that has just two curves, one cubic
and one quadratic, the first word would be the x-coordinate of the second control point of the cubic curve,
the second word would be the weight of the quadratic, and the other 62 words would all be set to
0xFFFFFFFF.

In this version of WVG, a 0xFFFFFFFF value in block 6 (or a group with only 6 blocks) with a non-NaN
value in block 5 indicates a cubic Bézier, and a 0xFFFFFFFF in block 5 (or a group with only 5 blocks)
indicates that the curve is a rational quadratic Bézier. There is never a need for a renderer to recognize
the "no curve" case. Future versions of this format may introduce other sentinel values to indicate other
kinds of curves.

Blocks that are not present in a group (e.g. the seventh block, block 6, in a 6-block group) are implicitly
full of 0xFFFFFFFF values (this is implemented in step 4 of the algorithm below). So for example if a
shape refers to curves with a group size of 5, then implicitly all the curves will be rational quadratics (not
cubics) because block 5 is implicitly always 0xFFFFFFFF in that set of curves.

While it would be highly unusual, there is nothing in this format that prevents blocks from being part of
more than one group (for example, with one group using curve blocks 0 to 6 and another using curve
blocks 3 to 9).

The curve with index i, group offset b, and group size g is defined as follows:

1. Let GROUP be the integer component of i / 64.
2. Let GROUP_OFFSET be BLOCK_OFFSETS[CURVE_BLOCKS] + b + GROUP * g.
3. Assert: GROUP_OFFSET < BLOCK_OFFSETS[CURVE_BLOCKS+1]
4. Let RAWCELL[j] be defined as 0xFFFFFFFF if j is greater than or equal to g, and the word with
offset i % 64 in the block with offset GROUP_OFFSET + j otherwise. (Word offsets are measured
in words from the start of their block, and block offsets are measured in blocks from the start of
the file.)
5. Let STATIC be true.
6. Let VALID be true.
7. Let CELL[j] be defined as the first value returned from following these substeps:
7.1. If RAWCELL[j] is 0xFFFFFFFF, let VALID be false and return RAWCELL[j].
7.2. If RAWCELL[j] is a non-NaN value when interpreted as float32, return RAWCELL[j].
7.3. Let ARG be the low 16 bits of RAWCELL[j].
7.4. Let MODE be the high 16 bits of RAWCELL[j].
7.5. If MODE is 0xFFD0, then let STATIC be false, let VALID be false if ARG is greater than or
equal to PARAM_COUNT, and return the parameter with index ARG.
7.6. If MODE is 0xFFE0, then let STATIC be false, let VALID be false if ARG is greater than or
equal to EXPR_COUNT, and return the value of the expression numbered ARG.
7.7. If no value has yet been returned by these substeps, let VALID be false.
7.8. Return RAWCELL[j] (a NaN value that is not 0xFFFFFFFF).
8. If CELL[6] is 0xFFFFFFFF and CELL[5] is not a NaN value, then the curve is a cubic Bézier:
8.1. x1 and y1 are CELL[2] and CELL[3] respectively, as float32.
8.2. x2 and y2 are CELL[4] and CELL[5] respectively, as float32.
8.3. x3 and y3 are CELL[0] and CELL[1] respectively, as float32.
9. Otherwise if CELL[5] is 0xFFFFFFFF, then the curve is a rational quadratic Bézier:
9.1. x1 and y1 are CELL[2] and CELL[3] respectively, as float32.
9.2. x2 and y2 are CELL[0] and CELL[1] respectively, as float32.
9.3. w is CELL[4], as float32.
10. Otherwise, the curve is nothing. Let VALID be false.
11. If STATIC is true, then this curve is static. Otherwise, it is dynamic.
12. If VALID is true, then this curve is valid. Otherwise, it is invalid.

A curve is either static or dynamic as defined by the steps above. (This indicates whether it depends on
the parameters, and therefore whether an implementation might need to recompute it when the
parameters are changed.)

A curve is either valid or invalid as defined by the steps above. A WVG file must not contain any invalid
curves.

A curve is either drawable or not. A curve is drawable if it is a cubic Bézier or a rational quadratic Bézier,
and all of its parameters are finite (not NaN and not infinite). Whether a curve is drawable or not can vary
based on the value of the parameters.

Assertion: If a curve is valid and static, it is drawable.
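
The classification part of the steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function and variable names are hypothetical, the file is assumed to be available as a flat list of uint32 words, parameters and expressions are assumed to be already evaluated to floats, and the reference prefixes 0xFFD0 (parameter) and 0xFFE0 (expression) are the ones given in the summary of internal references. VALID tracking is deliberately omitted.

```python
import math
import struct

SENTINEL = 0xFFFFFFFF
WORDS_PER_BLOCK = 64


def as_float32(word):
    """Reinterpret a 32-bit word as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


def decode_curve(words, curve_base, i, b, g, params, exprs):
    """Classify curve i of a group (offset b, size g) and resolve its cells.

    words: the file as a flat list of uint32 words (assumption).
    curve_base: block offset of the first curve block.
    params, exprs: already-evaluated float values (assumption).
    """
    group_offset = curve_base + b + (i // WORDS_PER_BLOCK) * g
    # Blocks beyond the group size read as the sentinel (step 4).
    raw = [
        SENTINEL if j >= g
        else words[(group_offset + j) * WORDS_PER_BLOCK + i % WORDS_PER_BLOCK]
        for j in range(7)
    ]
    is_static = True
    cell = []
    for word in raw:
        value = as_float32(word)
        if word != SENTINEL and math.isnan(value):
            mode, arg = word >> 16, word & 0xFFFF
            if mode == 0xFFD0:  # parameter reference
                is_static = False
                value = params[arg] if arg < len(params) else math.nan
            elif mode == 0xFFE0:  # expression reference
                is_static = False
                value = exprs[arg] if arg < len(exprs) else math.nan
        cell.append(value)

    if raw[6] == SENTINEL and not math.isnan(cell[5]):
        kind, used = "cubic", cell[:6]
        points = [(cell[2], cell[3]), (cell[4], cell[5]), (cell[0], cell[1])]
    elif raw[5] == SENTINEL:
        kind, used = "rational_quadratic", cell[:5]
        # Control point, end point, then the weight w.
        points = [(cell[2], cell[3]), (cell[0], cell[1]), cell[4]]
    else:
        kind, used, points = None, [], []
    drawable = kind is not None and all(math.isfinite(v) for v in used)
    return {"kind": kind, "points": points, "static": is_static, "drawable": drawable}
```

With a group size of 5 the same words decode as a rational quadratic, because blocks 5 and 6 are then implicitly 0xFFFFFFFF.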

Shape blocks

Shape blocks are those starting with the block at BLOCK_OFFSETS[SHAPE_BLOCKS].

Each shape block describes 16 shapes, using four words each. Shapes are numbered. The shape with
index SHAPE_INDEX starts SHAPE_INDEX * 4 words after the start of the block at
BLOCK_OFFSETS[SHAPE_BLOCKS].

For each shape:

1. The first word is SHAPE_GROUP_OFFSET, a uint32 indicating the index of the first curve block
for this shape.
2. The second word is SHAPE_START_CURVE_INDEX, a uint32 indicating the index within that
block for the first curve of this shape.
3. The third word is SHAPE_CURVE_COUNT, a uint32 indicating the number of curves in this
shape.
4. The fourth word is SHAPE_GROUP_SIZE, a uint32 indicating the number of blocks per group for
this shape.

A shape with no curves must have all four values set to zero.

All the shapes in shape blocks in a WVG file must meet all of the following conditions. If any of the
following conditions are not met, then the shape is considered invalid.

● SHAPE_GROUP_OFFSET must be less than BLOCK_SIZES[CURVE_BLOCKS].
● SHAPE_START_CURVE_INDEX must be less than 64.
● Let LENGTH be (SHAPE_START_CURVE_INDEX + SHAPE_CURVE_COUNT)/64, rounding up
to the nearest integer if it is not an integral number. SHAPE_GROUP_OFFSET + LENGTH *
SHAPE_GROUP_SIZE must be less than or equal to BLOCK_SIZES[CURVE_BLOCKS].
● SHAPE_GROUP_SIZE must be greater than or equal to 5 and less than or equal to 64.

These conditions imply that all but the low six bits of SHAPE_START_CURVE_INDEX and
SHAPE_GROUP_SIZE are unused (and will be zero)44. These bits could be used for future extensions of
this format.

A renderer must treat an invalid shape as if it had no curves.
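
The four conditions listed above translate directly into a small predicate. This is a sketch only: the function name and argument names are hypothetical, and curve_block_count stands for BLOCK_SIZES[CURVE_BLOCKS].

```python
def shape_is_valid(group_offset, start_curve_index, curve_count,
                   group_size, curve_block_count):
    """Check one shape record against the conditions listed above."""
    if group_offset >= curve_block_count:
        return False
    if start_curve_index >= 64:
        return False
    if not 5 <= group_size <= 64:
        return False
    # LENGTH: number of groups spanned, rounded up to a whole group.
    length = (start_curve_index + curve_count + 63) // 64
    return group_offset + length * group_size <= curve_block_count
```

For instance, a shape that starts at curve 60 and has 10 curves spans two groups, so with a group size of 6 it needs 12 curve blocks from its group offset onwards.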

A shape's curves are the curves with index i, group offset SHAPE_GROUP_OFFSET, and group size
SHAPE_GROUP_SIZE, where i ranges over the integers greater than or equal to
SHAPE_START_CURVE_INDEX and less than SHAPE_START_CURVE_INDEX+SHAPE_CURVE_COUNT, in
ascending numerical order of i.

Every curve of a shape must be a valid curve.

A renderer must treat curves that are not drawable as if they were straight lines with zero length.

If a shape has one or more valid dynamic curves, then the shape itself is dynamic. Otherwise, the shape
is static. An implementation can precompute static shapes; they will not change even when the file's
parameters are changed at runtime. On the other hand, a dynamic shape might change when the
parameters are updated.

44
Really all but one bit of SHAPE_GROUP_SIZE is unused right now, if we're honest.

For convenience, we define SHAPE_COUNT as BLOCK_SIZES[SHAPE_BLOCKS] * 16.

SHAPE_COUNT can in theory be as large as 2^36, but due to other limitations of this format, only 2^32
possible shapes are ever actually accessible (shapes are referenced using 32 bit values). For this reason,
files should not specify a BLOCK_SIZES[SHAPE_BLOCKS] value greater than 2^28, and renderers can
treat SHAPE_COUNT values greater than 2^32 as 2^32 without loss of generality (skipping over the
"unreachable" shape blocks if BLOCK_SIZES[SHAPE_BLOCKS] is greater than 2^28).

A renderer must treat a reference to a shape with an index greater than or equal to SHAPE_COUNT as if
it had no curves.

Gradient blocks

Gradient blocks describe the stop points and their colors for gradients defined in paint blocks.

Gradient blocks are the BLOCK_SIZES[GRADIENT_BLOCKS] blocks starting with the block at
BLOCK_OFFSETS[GRADIENT_BLOCKS].

For convenience we define GRADIENT_COUNT as BLOCK_SIZES[GRADIENT_BLOCKS] / 2, rounded
down if it is not an integer. (In other words, gradient blocks always come in pairs.) The number of
gradient blocks in a file must be even. Renderers must ignore the last gradient block if the number of
gradient blocks is odd.

Each pair of gradient blocks defines 2 to 64 stops and a matching number of colors. Stops are float32
numbers in the range 0..1. The first stop is always 0.0, the last stop is always 1.0, and each stop is
greater than or equal to the previous stop. Colors are always references to parameters or expressions.

We define GRADIENT[i] as the set of stops and corresponding colors yielded from following these steps
until they are terminated:

1. If i is greater than or equal to GRADIENT_COUNT, then:
1.1. Yield a 0.0 stop with the color 0x00000000 (fully transparent black).
1.2. Yield a 1.0 stop with the color 0x00000000 (fully transparent black).
1.3. Terminate these steps.
2. Let STOP[j] be the word that is i*128+j words after the start of the
BLOCK_OFFSETS[GRADIENT_BLOCKS] block.
3. Let COLOR[j] be the word that is i*128+64+j words after the start of the
BLOCK_OFFSETS[GRADIENT_BLOCKS] block.
4. Let LAST_STOP be 0.0.
5. Let COUNT be 0.
6. Loop:
6.1. If COUNT is 0, then let NEXT_STOP be 0.0.
6.2. Otherwise:
6.2.1. Let VALUE be STOP[COUNT].
6.2.2. Let NEXT_STOP be the first value returned from following these steps, as float32:
6.2.2.1. If VALUE is a non-NaN value when interpreted as float32, return VALUE.
6.2.2.2. Let ARG be the low 16 bits of VALUE.
6.2.2.3. Let MODE be the high 16 bits of VALUE.
6.2.2.4. If MODE is 0xFFD0, then return the parameter with index ARG.
6.2.2.5. If MODE is 0xFFE0, then return the value of the expression numbered
ARG.
6.2.2.6. Return VALUE.
6.2.3. If LAST_STOP is less than 1.0 and NEXT_STOP is NaN or greater than 1.0, or, if
COUNT is 63, then let NEXT_STOP be 1.0.
6.3. If NEXT_STOP is less than LAST_STOP, greater than 1.0, or NaN, then terminate these
steps.
6.4. Let LAST_STOP be NEXT_STOP.
6.5. Let VALUE be COLOR[COUNT].
6.6. Let ARG be the low 16 bits of VALUE.
6.7. Let MODE be the high 16 bits of VALUE.
6.8. If MODE is 0xFFD0, then let NEXT_COLOR be the parameter with index ARG as color32.
6.9. Otherwise, if MODE is 0xFFE0, then let NEXT_COLOR be the expression numbered ARG
as color32.
6.10. Otherwise, let NEXT_COLOR be zero as color32 (fully transparent black).
6.11. Yield NEXT_STOP as float32 and NEXT_COLOR as color32.
6.12. Increment COUNT.
6.13. If COUNT is 64, terminate these steps.
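
A minimal sketch of the loop above, under stated assumptions: the names are hypothetical, the file is a flat list of uint32 words, first_word is the word offset of the first gradient block, and params and exprs hold raw 32-bit words that are read as float32 for stops and as color32 for colors. The existence check is written against the number of gradient blocks (pair i needs blocks 2i and 2i+1), and out-of-range references are treated as NaN stops or transparent colors, which the steps do not spell out.

```python
import math
import struct


def as_float32(word):
    """Reinterpret a 32-bit word as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


def resolve_stop(word, params, exprs):
    """Return the stop position encoded by word, or NaN if it has none."""
    value = as_float32(word)
    if not math.isnan(value):
        return value
    mode, arg = word >> 16, word & 0xFFFF
    if mode == 0xFFD0 and arg < len(params):
        return as_float32(params[arg])
    if mode == 0xFFE0 and arg < len(exprs):
        return as_float32(exprs[arg])
    return math.nan


def resolve_color(word, params, exprs):
    """Return the color32 referenced by word, or transparent black."""
    mode, arg = word >> 16, word & 0xFFFF
    if mode == 0xFFD0 and arg < len(params):
        return params[arg]
    if mode == 0xFFE0 and arg < len(exprs):
        return exprs[arg]
    return 0


def decode_gradient(words, first_word, block_count, i, params, exprs):
    """Return the list of (stop, color32) pairs for gradient i."""
    if 2 * i + 1 >= block_count:
        return [(0.0, 0), (1.0, 0)]  # missing gradient: transparent black
    stops_at = first_word + i * 128
    colors_at = stops_at + 64
    result = []
    last_stop = 0.0
    for count in range(64):
        if count == 0:
            next_stop = 0.0
        else:
            next_stop = resolve_stop(words[stops_at + count], params, exprs)
            out_of_range = math.isnan(next_stop) or next_stop > 1.0
            if (last_stop < 1.0 and out_of_range) or count == 63:
                next_stop = 1.0
        if math.isnan(next_stop) or next_stop < last_stop or next_stop > 1.0:
            break
        last_stop = next_stop
        color = resolve_color(words[colors_at + count], params, exprs)
        result.append((next_stop, color))
    return result
```

Once a 1.0 stop has been yielded, the trailing 0xFFFFFFFF words read as NaN and end the loop, so a well-formed pair yields exactly its n stops.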

Every even gradient block must fulfill the following criteria:

● The first word in the block is zero.
● There is exactly one n such that:
○ n is greater than or equal to 2.
○ n is not greater than 64.
○ The nth word as float32 is 1.0.
○ None of the first n words in the block are 0xFFFFFFFF.
○ All words in the block after the nth word (if any) are 0xFFFFFFFF.
● Ignoring any words which, when interpreted as float32, are NaN values, none of the words in the
block, when interpreted as float32, represent a value that is less than an earlier value in the block.
● Each word that is not 0xFFFFFFFF but is a NaN when interpreted as float32 in the even gradient
block must fulfill the following criteria:
○ The high sixteen bits of the word are either 0xFFD0 or 0xFFE0.
○ The low sixteen bits of a word whose high sixteen bits are 0xFFD0 are a number less than
PARAM_COUNT.
○ The low sixteen bits of a word whose high sixteen bits are 0xFFE0 are a number less than
EXPR_COUNT.
● The word 64 words after each word that is not 0xFFFFFFFF in the even gradient block must fulfill
the following criteria:
○ The high sixteen bits of the word are either 0xFFD0 or 0xFFE0.
○ The low sixteen bits of a word whose high sixteen bits are 0xFFD0 are a number less than
PARAM_COUNT.
○ The low sixteen bits of a word whose high sixteen bits are 0xFFE0 are a number less than
EXPR_COUNT.
● Each word that is 0xFFFFFFFF must have a corresponding zero as the word 64 words later in the
file (the corresponding color in the odd gradient block).

Paint blocks

Paint blocks are the BLOCK_SIZES[PAINT_BLOCKS] blocks starting with the block at
BLOCK_OFFSETS[PAINT_BLOCKS].

Paint blocks use a varying number of words to describe the paint effect they represent. To allow for future
expansion, each block represents a single effect. Composition blocks refer to paint blocks to describe
how they should be styled.

The first word in each paint block describes the kind of effect represented by the block; the remaining
words describe the parameters of the effect.

We define PAINT_COUNT as BLOCK_SIZES[PAINT_BLOCKS].

PAINT_COUNT can in theory be as large as 2^32, but due to other limitations of this format, only 65536
possible paints are ever actually accessible (paints are referenced using 16 bit values). For this reason,
files should not specify a BLOCK_SIZES[PAINT_BLOCKS] value greater than 65536, and renderers can
treat PAINT_COUNT values greater than 65536 as 65536 without loss of generality (skipping over the
"unreachable" paint blocks if BLOCK_SIZES[PAINT_BLOCKS] is greater than 65536).

The paint described by an index i is a paint that draws nothing if i is greater than or equal to
PAINT_COUNT; otherwise, it is the paint described by the block with offset
BLOCK_OFFSETS[PAINT_BLOCKS] + i, as defined by the section that corresponds to the first word of
that block, as per the following table and the following subsections.

First word      Effect
0x00000010      Linear gradient
0x00000014      Radial gradient
Anything else   A paint that paints nothing

Paint blocks must not start with a word that does not have a corresponding section below.

Flat color

No paint blocks describe a flat color paint; to describe a flat color, a flat color paint code or a paint code
that references a parameter or expression is used instead.

Linear gradient

A paint block whose first word is 0x00000010 is a linear gradient.

The words of such a block must be interpreted as follows:

Word       Interpretation
word 0     must be 0x00000010, signature for linear gradient
word 1     index of gradient to use, GRADIENT_INDEX, as uint32
word 2     flags, FLAGS, as uint32
word 3     index of matrix to use, MATRIX_INDEX, as uint32
words 4+   must be zero, must be ignored

The block represents a paint that draws a linear gradient that interpolates using the stops and the colors
of GRADIENT[GRADIENT_INDEX] from the origin to the coordinate 1.0,0.0, with the flags FLAGS,
transformed by MATRIX[MATRIX_INDEX].

A matrix is used for linear gradients (rather than just specifying two coordinates, which for linear
gradients is equivalent and would be simpler) so that gradients can be adjusted by parameters without
requiring implementations to have logic for expanding paints.

Radial gradient

A paint block whose first word is 0x00000014 is a radial gradient.

The words of such a block must be interpreted as follows:

Word       Interpretation
word 0     must be 0x00000014, signature for radial gradient
word 1     index of gradient to use, GRADIENT_INDEX, as uint32
word 2     flags, FLAGS, as uint32
word 3     index of matrix to use, MATRIX_INDEX, as uint32
words 4+   must be zero, must be ignored

The block represents a paint that draws a radial gradient that interpolates using the stops and the colors
of GRADIENT[GRADIENT_INDEX] from the origin to the unit circle, with the flags FLAGS, transformed
by MATRIX[MATRIX_INDEX].

Flags

The flags of a gradient are bits in a uint32. The bottom two bits must be interpreted as follows:

Bits   Interpretation
0x0    Samples beyond the edge must be clamped to the nearest color in the defined inner area.
0x1    Samples beyond the edge must be repeated from the far end of the defined area.
0x2    Samples beyond the edge must be mirrored back and forth across the defined area.
0x3    Samples beyond the edge must be treated as transparent black.
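
One way to read the four modes is as a mapping of the gradient position t, where the defined area is assumed to be 0.0 to 1.0 (from the origin to 1.0,0.0 for linear gradients, or to the unit circle for radial ones). The sketch below is illustrative only; the function name is hypothetical and None is used as a stand-in for "transparent black".

```python
import math


def apply_edge_mode(t, flags):
    """Map gradient position t into 0..1 according to the low two flag bits.

    Returns None when the sample should be transparent black.
    """
    if 0.0 <= t <= 1.0:
        return t
    mode = flags & 0x3
    if mode == 0x0:  # clamp to the nearest edge
        return 0.0 if t < 0.0 else 1.0
    if mode == 0x1:  # repeat from the far end
        return t - math.floor(t)
    if mode == 0x2:  # mirror back and forth
        u = t % 2.0
        return u if u <= 1.0 else 2.0 - u
    return None  # 0x3: transparent black
```

Only the bottom two bits are examined, so any higher bits set in FLAGS do not affect the result of this sketch.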

Paint codes

A paint code consists of two words. The paint for a paint code whose two words are OPERATOR and
COLOR is defined as the first paint that is returned from the following steps:

1. If OPERATOR is 0xFFFFFFFF, return a paint that draws with the flat color COLOR interpreted as
color32. The paint code is valid.
2. Let ARG be the low 16 bits of OPERATOR.
3. Let MODE be the high 16 bits of OPERATOR.
4. If MODE is 0xFFD0, then return a paint that draws with a flat color, that color being the parameter
with index ARG as color32. In this case, if ARG is less than PARAM_COUNT and COLOR is zero
then the paint code is valid, otherwise it is not.
5. If MODE is 0xFFE0, then return a paint that draws with a flat color, that color being expression
numbered ARG as color32. In this case, if ARG is less than EXPR_COUNT and COLOR is zero
then the paint code is valid, otherwise it is not.
6. If MODE is 0xFFF0, then return the paint described by the paint block with index ARG. In this
case, if ARG is less than PAINT_COUNT and COLOR is zero then the paint code is valid,
otherwise it is not.
7. Return a paint that draws nothing. In this case, the paint code is not valid.

A paint code can be valid or not, as determined by these steps.
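
The steps above can be sketched as a function that returns a symbolic description of the paint together with its validity. The tuple tags and the function name are hypothetical; resolving the actual parameter, expression, or paint block is left to the caller.

```python
def decode_paint_code(operator, color, param_count, expr_count, paint_count):
    """Return ((kind, value), valid) for a two-word paint code."""
    if operator == 0xFFFFFFFF:
        return ("flat_color", color), True
    arg = operator & 0xFFFF
    mode = operator >> 16
    if mode == 0xFFD0:
        return ("parameter_color", arg), arg < param_count and color == 0
    if mode == 0xFFE0:
        return ("expression_color", arg), arg < expr_count and color == 0
    if mode == 0xFFF0:
        return ("paint_block", arg), arg < paint_count and color == 0
    return ("nothing", None), False
```

Note that COLOR only carries a value in the flat color case; for every reference form it has to be zero for the paint code to be valid.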

Composition blocks

Composition blocks are the BLOCK_SIZES[COMP_BLOCKS] blocks starting with the block at
BLOCK_OFFSETS[COMP_BLOCKS].

Composition blocks represent actual rendering. Each block specifies a group of matrices and shapes,
and a paint. Specifically, the words in a composition block are as follows:

1. The MATRIX_INDEX, as uint32.
2. The SHAPE_INDEX, as uint32.
3. The SEQUENCE_LENGTH, as uint32.
4. The OPERATOR, as uint32.
5. The COLOR, as color32.

All remaining words must be zero.

The SEQUENCE_LENGTH is biased by one, meaning that a value of zero indicates there is one shape
in the composition, a value of one indicates two shapes, etc.

For each composition block, a path must be created as per the following steps:

1. Let i be zero.
2. Loop:
2.1. Let SHAPE be the shape with index SHAPE_INDEX + i.
2.2. Let MATRIX be the matrix with index MATRIX_INDEX + i.
2.3. Transform SHAPE by MATRIX to form PATH_COMPONENT.
2.4. Add PATH_COMPONENT to the path being created.
2.5. Increment i.
2.6. If i is greater than SEQUENCE_LENGTH, terminate these steps.

The interior of the path (which is the area that is painted, as described below) is defined by a non-zero
sum of signed edge crossings: for a given point, the point is considered to be on the inside of the path if
a line drawn from the point to infinity crosses curves going clockwise around the point a different number
of times than it crosses curves going counter-clockwise around that point.
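
To illustrate the non-zero rule, the sketch below computes a winding number for a point against closed polygons. It assumes, as a simplification, that each path component has already been flattened from Bézier curves into a closed list of (x, y) points; that flattening step and all names here are hypothetical and not part of the format.

```python
def is_left(p0, p1, p2):
    """Positive if p2 is left of the line p0->p1, negative if right."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])


def winding_number(point, contour):
    """Sum signed crossings of a horizontal ray from point with contour."""
    winding = 0
    for k in range(len(contour)):
        a, b = contour[k], contour[(k + 1) % len(contour)]
        if a[1] <= point[1]:
            if b[1] > point[1] and is_left(a, b, point) > 0:
                winding += 1  # upward crossing, point on the left
        elif b[1] <= point[1] and is_left(a, b, point) < 0:
            winding -= 1  # downward crossing, point on the right
    return winding


def is_inside(point, contours):
    """Non-zero rule: inside when the signed crossings do not cancel."""
    return sum(winding_number(point, c) for c in contours) != 0
```

Two nested contours wound in the same direction leave the inner region filled, while reversing the inner contour cuts a hole, which is the behavior that distinguishes this rule from an even-odd rule.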

Composition blocks must be composited, in the order specified in the file. For each composition, the path
must be filled as specified by the paint with the paint code formed by OPERATOR and COLOR, into the
rectangle whose top left is at 0,0 and whose bottom right is at IMAGE_WIDTH,IMAGE_HEIGHT. Each
composition block's pixels must be combined with the previous composition's using the "over" Porter-Duff
operator.

Composition blocks must not have a MATRIX_INDEX greater than or equal to MATRIX_COUNT, a
SHAPE_INDEX greater than or equal to SHAPE_COUNT, a SEQUENCE_LENGTH greater
than or equal to SHAPE_COUNT-SHAPE_INDEX or MATRIX_COUNT-MATRIX_INDEX, or a paint code
(consisting of OPERATOR and COLOR) that is not valid.
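
These constraints can be checked with a short predicate. This is a sketch with hypothetical names; paint_code_valid is assumed to have been computed separately by following the paint code steps.

```python
def composition_is_valid(matrix_index, shape_index, sequence_length,
                         matrix_count, shape_count, paint_code_valid):
    """Check one composition block against the constraints above."""
    if matrix_index >= matrix_count or shape_index >= shape_count:
        return False
    # The sequence length is biased by one, so a value of n uses n + 1
    # consecutive shapes and matrices starting at the given indices.
    if sequence_length >= shape_count - shape_index:
        return False
    if sequence_length >= matrix_count - matrix_index:
        return False
    return paint_code_valid
```

Because of the bias, a file with a single shape and a single matrix can only use a sequence length of zero.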

APIs

Implementations should offer the following APIs for introspecting images.

Updating parameters

Given a parameter index that is equal to or greater than zero and less than PARAM_COUNT, as well as
a parameter value in the form of a 32 bit integer (signed or unsigned), 32 bit color value, or 32 bit floating
point value (in binary32 format), the parameter updating API must update the value of the parameter
specified by that index to the given new parameter value, and then schedule the image to be rerendered
at the earliest available and appropriate opportunity.

Hit testing

Given a point in the image's coordinate space (as given by the width and height in the metadata block, or
implied by such a block's absence), the hit testing API must return the index of the top-most composition
that describes a path that considers the given point to be within its interior, or a sentinel value (such as -1
or null) if the point is not within the interior of any of the paths.

For example, a suitable Dart API could have the following signature:

int? hitTest(Offset position);

Bounds introspection

Given an index that specifies a composition block (the index being greater than or equal to zero, and less
than BLOCK_SIZES[COMP_BLOCKS]), the bounds introspection API must return an axis-aligned
rectangle (aligned to the x and y axes of the image) giving the smallest rectangle that contains all points
that are considered to be in the interior of the path of that composition block (the bounding box of that
path).

For example, a suitable Dart API could have the following signature:

Rect bounds(int composition);

Implementations should fail (e.g. throw an exception) if the specified composition index is out of range.

Implementations should also fail (e.g. throw an exception) if the specified composition block has no
curves.

Metadata APIs

The width API must return the image's width as specified by the metadata block, or 1.0 if there is no
metadata block.

The height API must return the image's height as specified by the metadata block, or 1.0 if there is no
metadata block.

Other APIs

Implementations may offer other affordances, e.g. providing a count of parameters or composition
blocks, exposing the default or current values of parameters, or offering APIs to update parameters
continually (e.g. specifying that a particular parameter's value should be increased by a specific amount
every 16ms).

Compressibility

This section is non-normative.

WVG files are somewhat sparse, have a lot of redundancy, and are extremely regular, which makes them
interesting targets for compressibility.

In practice, simple WVG files compress by a factor of 10, in some cases a factor of 20. Anecdotally,
based on the very few sample files at this early stage, xz (which uses LZMA2) performs best among
commonly-available compression tools, compressing the 35,328 byte test data to 2,220 bytes.

Summary of internal references

This section is non-normative.

Expression blocks, matrix blocks, curve blocks, and gradient blocks can contain words of the form
0xFFD0XXXX to refer to parameters and words of the form 0xFFE0YYYY to refer to earlier expressions,
where XXXX is the parameter index (the XXXXth word of the file starting from the first parameter), and
YYYY is the expression index (the YYYYth block of the file starting from the first expression block).
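
As an illustration of these two reference forms, the sketch below builds and classifies words for a block whose cells are otherwise float32 values (curve blocks, for example). The helper names are hypothetical. It also shows that every reference word is a NaN when read as float32, which is what lets references share cells with ordinary numbers.

```python
import math
import struct

REFERENCE_KINDS = {0xFFD0: "parameter", 0xFFE0: "expression"}


def as_float32(word):
    """Reinterpret a 32-bit word as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


def make_reference(mode, index):
    """Build a reference word from a 16 bit prefix and a 16 bit index."""
    if mode not in REFERENCE_KINDS or not 0 <= index <= 0xFFFF:
        raise ValueError("unsupported reference")
    return (mode << 16) | index


def classify_word(word):
    """Return (kind, value) for a word in a float32-valued cell."""
    kind = REFERENCE_KINDS.get(word >> 16)
    if kind is not None:
        return kind, word & 0xFFFF
    value = as_float32(word)
    if math.isnan(value):
        return "nan", None
    return "number", value
```

For example, make_reference(0xFFD0, 3) produces 0xFFD00003, a reference to the parameter with index 3.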

Shape blocks specify curves by giving the number of the curve block that contains the first coordinate of
the curve in question, and the index of that coordinate's word in that block (as well as the number of
blocks in the groups).

Paint blocks refer to gradients by specifying the gradient index (which is half the number of the block that
the gradient starts from, relative to the first gradient block) and matrices by specifying the matrix index
(which is the number of words from the first word of the first matrix block to the first word of the matrix
being specified, divided by sixteen).

Composition blocks refer to matrices in the same manner as paint blocks, shapes by the number of
words from the first word of the first shape block to the first word of the shape being specified divided by
four, and colors by using 0xFFD0XXXX to refer to parameters, 0xFFE0YYYY to refer to
expressions, or 0xFFF0ZZZZ to refer to paints, where XXXX and YYYY are interpreted as above and
ZZZZ is the number of the paint block being referenced relative to the first paint block.

In summary, parameters, expressions, curves, gradients, matrices, shapes, and paints can be
referenced. The method for referencing a feature of the format is the same any time that it can be
referenced (so e.g. every reference to a parameter is always of the form 0xFFD0XXXX).

Future extensions

When adding features to this format in the future, various options are available. Here are some thoughts
that may help.

New block types

Obviously the simplest extension mechanism is adding new block types. To make these block types
relevant, they would need to be referenced from a source, e.g.

Bitmap images and other attachments

Arbitrary data from other formats can be embedded in a new block type without additional internal
structure, with parts identified by start offset and length. Alternatively, two block types could be used, one
containing raw unstructured binary data (PNGs, JPEGs, etc), and the other providing a directory index, or
manifest, of the data in the other blocks, identifying entries by name, type, offset, and length (and
potentially including even more data such as modification times). Each file would use one block of the
manifest, and specific files could then be identified in other parts of the format (e.g. in paints) by indexing
into the manifest.

Extending packed blocks

Some block types, e.g. matrices and parameters, are packed tightly, with no room for expansion. To add
new information to such blocks, a new parallel block type can be minted, and indices into the original
type can simultaneously refer to the additional data in the new block. Thus for example a reference to
matrix 4 would refer simultaneously to the fourth group of 16 words in the first block of the
MATRIX_BLOCKS blocks, as well as the fourth group of 16 words in the first block of the
MATRIX2_BLOCKS blocks, where MATRIX2_BLOCKS is a different block type (e.g. 24, one more than
MATRIX_BLOCKS itself).

More parameters, expressions, and paints

The format currently limits the number of parameters, expressions, and paints to 65,536 (2^16), because it
uses 16 bits to specify the index. This makes the implementations very slightly simpler by splitting the 32
bit words that reference parameters, expressions, and paints into two 16 bit parts that together always
form a floating point NaN value. This allows the high 16 bits to always be compared literally (and always
to the same magic constants, even though the context doesn't always require NaN-boxing), and the low
16 bits to be used literally as the index.

However, the low 4 bits of the high 16 bits are always zero in the current scheme. These bits could be
used to extend the references with only a slight increase in complexity, allowing up to 1,048,576 (2^20)
parameters, expressions, and paints per file.

More kinds of references

The remaining 12 bits are the NaN sign bit, the eight exponent bits which must all be set to indicate a
NaN value, and three bits of the mantissa (the NaN payload). The mantissa cannot be all zeroes. This
leaves the following ways to bundle data into the NaNs:

High 12 bits           Range of low 20 bits45   Current assigned meaning
0b011111111000 0x7F8   0x00001-0xFFFFF          Not currently assigned
0b011111111001 0x7F9   0x00000-0xFFFFF          Not currently assigned
0b011111111010 0x7FA   0x00000-0xFFFFF          Not currently assigned
0b011111111011 0x7FB   0x00000-0xFFFFF          Not currently assigned
0b011111111100 0x7FC   0x00000-0xFFFFF          Not currently assigned
0b011111111101 0x7FD   0x00000-0xFFFFF          Not currently assigned
0b011111111110 0x7FE   0x00000-0xFFFFF          Not currently assigned
0b011111111111 0x7FF   0x00000-0xFFFFF          Not currently assigned
0b111111111000 0xFF8   0x00001-0xFFFFF          Not currently assigned
0b111111111001 0xFF9   0x00000-0xFFFFF          Not currently assigned
0b111111111010 0xFFA   0x00000-0xFFFFF          Not currently assigned
0b111111111011 0xFFB   0x00000-0xFFFFF          Not currently assigned
0b111111111100 0xFFC   0x00000-0xFFFFF          Not currently assigned
0b111111111101 0xFFD   0x00000-0xFFFFF          Parameter reference
0b111111111110 0xFFE   0x00000-0xFFFFF          Expression reference
0b111111111111 0xFFF   0x00000-0xFFFFF          Paint reference

Ranges could be combined if more bits are needed, for example rows 0x7FC to 0x7FF could be
combined to store a 22 bit number in the remaining bits, if 20 bits is insufficient for some payload.

In expression blocks, references must start with a leading one bit (so that all positive integers can be
pushed onto the stack), so the rows above starting with 0x7F are only useful for references that are not
meaningful in expressions.

45
The two rows where the other bits of the mantissa are zeroes must have a non-zero payload, so the low 20 bits
cannot represent the number 0x00000. Every other row can encode any 20 bit number.
