A Comparison of Performance On WebGPU and WebGL in The Godot Game Engine
2024 IEEE Gaming, Entertainment, and Media Conference (GEM) | 979-8-3503-7453-7/24/$31.00 ©2024 IEEE | DOI: 10.1109/GEM61861.2024.10585437

Abstract—WebGL has been the standard API for rendering graphics on the web over the years. WebGPU, a new technology set for release in 2023, adopts many of the rendering approaches and features common to modern native graphics APIs such as Vulkan. Currently, very limited research exists regarding WebGPU’s rasterization capabilities. In particular, no research exists about its capabilities when used as a rendering backend in game engines. This paper investigates performance differences between WebGL and WebGPU in the context of the game engine Godot, where the measured performance is the CPU and GPU frame time. The results show that WebGPU performs better than WebGL when used as a rendering backend in Godot, for both the game tests and the synthetic tests. The comparisons clearly show that WebGPU achieves shorter mean CPU and GPU frame times.

Index Terms—Game Engine, Performance Overhead, Rendering, WebGPU, WebGL

I. INTRODUCTION

Modern video games leverage sophisticated graphics application programming interfaces (APIs) to render highly detailed worlds. They accomplish this at interactive frame rates by utilizing the powerful graphics processing units (GPUs) found in modern computers. Commonly used APIs include Direct3D [1] for machines running Windows, Metal [2] for Apple products, and Vulkan [3] and OpenGL [4] as cross-platform alternatives. Those APIs all target native platforms, and as is evident, many choices are available to developers. However, when it comes to rendering on the web, the choices narrow significantly. WebGL was the lowest-level alternative for rendering on the web [5]. It is based on the aforementioned OpenGL native API and adopts the same workflow and syntax. WebGL is a cross-platform, open-source API for rendering interactive 2D and 3D graphics on the web, with an initial release in March 2011. A typical WebGL program consists of control code written in JavaScript and shader code written in the OpenGL Shading Language (GLSL). Additionally, Emscripten may compile C/C++ OpenGL code into WebAssembly, allowing the WebGL API to be used from lower-level languages [6]. WebGL is a mature API supported by many different hardware products and browsers. It has been applied in many environments and fields, such as rendering backends in the gaming industry and visualization in medicine and geospatial applications. WebGL currently exists as a possible rendering backend in the Godot engine for rendering graphics on the web platform.

This paper consists of an implementation of a rendering backend for the game engine Godot using WebGPU, currently the latest low-level web graphics API, and a comparison of its performance in various test cases to the performance of the WebGL backend currently implemented in Godot. WebGPU is a new graphics API that aims to bring a more modern API workflow to web platforms, with its first draft specification released in 2021 [7]. Like the previously mentioned modern APIs, it aims to enable the developer to work closer to the hardware of the machine it is running on. The native API that the web browser uses underneath WebGPU is determined by the operating system on which it is executed. Depending on the specifications of the system, the web browser may utilize either the Direct3D 12, Vulkan, or Metal APIs. As with these APIs, WebGPU provides developers with relatively direct access to previously inaccessible low-level GPU resources. It also employs a stateless syntax, which leads to fewer API calls and thus less API overhead compared to the stateful syntax that WebGL inherited from OpenGL.

Section II lists some related work in the area of WebGPU. Section III details the overall research method, including the implementation details and how the experiment and data gathering were conducted. Section IV presents the results and analysis of the conducted experiment. Section V contains a discussion of the performed work, and the final section presents the conclusions and future work.

II. RELATED WORK

A study by Hidaka et al. found that their implementation of a deep neural network (DNN) using WebGPU performed around 36 times faster (91 ms versus 3297 ms) than another popular DNN implementation for the web that makes use of the emulated compute capabilities of WebGL [8].

Aldahir researched the compute performance differences (Mandelbrot set generation and matrix multiplication) of CUDA and WebGPU, with WebGPU set up to run compute operations in a cluster of web browsers. The results showed that CUDA is faster and more efficient than WebGPU. However, the author added that WebGPU is still in early development
Authorized licensed use limited to: Zhejiang University. Downloaded on January 01,2025 at 02:28:27 UTC from IEEE Xplore. Restrictions apply.
and hence not as stable and mature as CUDA. Also, WebGPU, along with WebRTC, displayed good scalability with over 75% efficiency for building clusters of web browsers [9].

Usher and Pascucci compared the computing capabilities of WebGPU with those of native Vulkan. In the paper, the marching cubes algorithm applied to a scalar field was used as a proxy for compute-intensive tasks. The results display similar performance, with WebGPU falling in the same order of magnitude as, and often even closer to, the Vulkan implementation in terms of time-to-render [10].

Dyken et al. investigated the relative performance of rendering large-scale graph layouts on the web using libraries based on WebGPU (GraphWaGu), WebGL (NetV and Stardust), and non-GPU-accelerated equivalents (such as D3 Canvas). GraphWaGu is the only GPU-leveraged library that is able to compute iterations of the graph algorithms in parallel. At 100 000 nodes and 2 000 000 edges, only GraphWaGu is able to maintain interactive rendering at a frame rate of ten or more. The equivalent frame rate for NetV is three, with Stardust being unable to render the graph layout at all [11].

A fair amount of research has been done on WebGPU and its general computing capabilities. However, the same does not hold true for its rasterization capabilities, in particular research involving comparisons of WebGPU and WebGL. Furthermore, at the time of this study, no research could be found that places its context inside the environment of a game engine. The work presented in this paper aims to reduce the research gap on WebGPU as a new rasterization technology for the web in the environment of the Godot game engine, grounding the research and results in real usability scenarios.

III. METHODS

Godot is an open-source game engine first released in 2015. It has since had many updates, and the newest version, 4.0, was recently released at the time of this study [12], with many new features and an entirely new rendering pipeline leveraging the aforementioned Vulkan API, along with a host of updates to the existing legacy rendering backends. Godot offers several advantages as a foundation for implementing a rendering backend. Firstly, a pre-established architecture can be followed during implementation, keeping comparisons between rendering APIs fair. Secondly, the currently implemented WebGL rendering backend can be assumed to be fairly well optimized and thus serves as a good benchmark for the performance of WebGL rendering engines in the industry. The reason for choosing Godot over another game engine mainly comes down to its open-source nature.

A. Implementation

In order for the implemented WebGPU Rasterizer and the existing WebGL Rasterizer to be eligible for performance comparisons, the overall computation work they do must be as identical as possible. More precisely, these prerequisites must be aimed for:
1) The shaders used must be as close as possible in terms of instruction count, branching, and operations. Exactly the same work must be done in the shaders.
2) The shader pressure, in terms of data types and data layout, must be as close as possible.
3) No optimizations are allowed for the WebGPU Rasterizer on the CPU side or GPU side that would give it an unfair advantage over the WebGL Rasterizer.
4) The CPU workflow must be as identical as possible in terms of computations and branching.
5) The run-time allocations should be as identical as possible.

To achieve the prerequisites, the work began with deconstructing the WebGL Rasterizer to a state where it would match the targeted minimum viable product (MVP) as closely as possible; the Rasterizer should be able to render simple 2D games of predetermined complexity and nothing more. In order for measurements between the performance of the two APIs to be as fair as possible, the WebGPU rendering backend has to adhere to the rendering techniques that Godot employs. The techniques that concern the scaled-down version of the Rasterizer backend include batching and instancing as well as the forcing of render target blitting. Batching is a technique used to group similar items and render them together to avoid unnecessary resource binding. For blitting, a separate pipeline was set up with a vertex shader that simply renders a triangle covering the entire back buffer, and a fragment shader that textures this triangle using the main render target texture.

B. Experiment and Data Gathering

When it comes to the performance of games and graphical scenes, the general consensus is that how well something performs comes down to how smoothly it appears to run to the human eye. The data gathered in the conducted experiment is the frame time, measured in milliseconds. As the WebGL backend and the implemented WebGPU backend span both the CPU and GPU in terms of work performed, both the CPU work times and GPU work times are measured. The time gathered is for a full frame for the CPU and GPU.

The timings are gathered as averages over 2000 frames. Elapsed time on the CPU for the various scopes was measured using the C++ standard library’s chrono header. A timestamp was acquired from chrono::high_resolution_clock at the start of the relevant scope and another one at the end of it. To calculate how much time elapsed, the start timestamp was subtracted from the end one. This elapsed time was then stored in a vector and used later, once enough samples had been gathered, to calculate an average elapsed time. For measuring time on the GPU, different methods need to be used for the different APIs. WebGL provides a way of measuring the elapsed time between two points, whereas WebGPU provides a way to queue a timestamp on the command encoder. If one timestamp is acquired at the start of a frame and one at the end, the elapsed time can be acquired in the same way as described for the CPU measurements.
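As an illustration of the CPU-side measurement described above, the following is a minimal sketch and not the code used in the study. The class name FrameTimer and its member functions are hypothetical; only std::chrono::high_resolution_clock, the subtraction of the start timestamp from the end timestamp, the storage of samples in a vector, and the averaging step come from the description in the text.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the paper): collects per-scope CPU timings
// and reports their mean once enough samples have been gathered.
class FrameTimer {
    using Clock = std::chrono::high_resolution_clock;

public:
    explicit FrameTimer(std::size_t sample_target) : sample_target_(sample_target) {
        samples_ms_.reserve(sample_target);
    }

    // Acquire a timestamp at the start of the measured scope.
    void begin() { start_ = Clock::now(); }

    // Acquire a second timestamp and store (end - start) in milliseconds.
    void end() {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        samples_ms_.push_back(elapsed.count());
    }

    // True once the requested number of samples (e.g. 2000 frames) is stored.
    bool done() const { return samples_ms_.size() >= sample_target_; }

    std::size_t sample_count() const { return samples_ms_.size(); }

    // Mean elapsed time in milliseconds; 0.0 if no samples were recorded.
    double mean_ms() const {
        if (samples_ms_.empty()) return 0.0;
        const double sum = std::accumulate(samples_ms_.begin(), samples_ms_.end(), 0.0);
        return sum / static_cast<double>(samples_ms_.size());
    }

private:
    std::size_t sample_target_;
    Clock::time_point start_{};
    std::vector<double> samples_ms_;
};
```

A caller would construct the timer with the desired sample count (2000 in the experiment), wrap the measured scope in begin() and end() each frame, and read mean_ms() once done() returns true. Note that high_resolution_clock is allowed to be an alias of a non-monotonic clock on some standard library implementations; std::chrono::steady_clock is a common alternative when monotonic behaviour matters.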
The experiments include two categories: simple 2D games and synthetic tests. For the category of simple 2D games, six different games that are simple in scope and complexity were selected. As the Rasterizers are limited in scope, and as the games must be supported by the Godot version used in this work, the games were selected purely based on the engine’s and the two Rasterizers’ ability to support and render them. The games are:
1) Snake [13], in which the player must avoid obstacles and gather apples in order for the snake character to grow longer and longer.
2) Evader [14], in which the player must avoid incoming shapes on the highway.
3) Checkers1 [15], in which the player plays the checkers game either versus an AI or optionally versus another player locally.
4) Falling Cats [16], in which the player must catch cats

1 The version used in testing is v1.0.1-0-g7a4203b
2 The version used for testing is v1.0.0

will increase significantly with each test increasing the polygon count, up to and including 16 million vertices.

C. Hardware and Software Specification

The hardware as well as the versions of the relevant graphics drivers that were used are presented in Table I.

TABLE I
INFORMATION ABOUT HARDWARE AND SOFTWARE VERSIONS OF THE MACHINE UPON WHICH ALL TEST CASES WERE RUN.

Component                Specification
CPU                      Intel Core i7 12700H, 2.7 GHz
GPU                      NVIDIA GeForce RTX 3070 Ti (Laptop Version), 8 GB GDDR6
Memory                   SK Hynix, 2x8 GB DDR4, 3.2 GHz
Disk                     Samsung MZVL21T0HCLR-00B07, 1 TB, 7.0/5.1 GB/s
Monitor Resolution       2560x1440
Monitor Refresh Rate     165 Hz
Operating System         Windows 11 Home 22H2
NVIDIA Driver Version    531.41
Emscripten               3.1.30
Chrome Canary            114.0.5715.1
Godot Engine Version     4.0

1) GPU Frame Time: In Figure 1, it can be seen that WebGPU on average has much shorter GPU frame times than
WebGL in all games that were included in the test. Furthermore, the speed-up of WebGPU over WebGL ranges between 6.822, in the case of Ponder, and 35.611, in the case of Evader. Figure 2 shows that the difference between the lowest 95% of frame times and the highest 1% is larger for WebGL. However, for Checkers, Ponder, and Falling Cats, the percentage difference is more significant for WebGPU. For Checkers, this comes out to a 7.020 times increase for WebGPU compared to a 4.641 times increase for WebGL. For Ponder, the increase is 4.748 times for WebGPU and 4.292 times for WebGL. Lastly, for Falling Cats, WebGPU shows a 1.613 times increase and WebGL shows a 1.611 times increase. For the other games, WebGPU has a smaller spread in absolute and percentage terms.

Fig. 4. Comparison of the highest 1% mean and lowest 95% mean WebGL and WebGPU CPU frame times, in milliseconds, for the various games.

2) CPU Frame Time: In Figure 3, it is shown that WebGPU has shorter mean frame times than WebGL for all of the game tests. Deck Before Dawn is a clear outlier in the data set in terms of how much shorter the CPU frame time is with the WebGPU implementation. Figure 4 shows that the percentage differences between the lowest 95% and highest 1% of frame times are typically lower than the spread documented for GPU frame times in Figure 2. This does not, however, hold true for all cases. For instance, Evader shows a larger spread for WebGPU in CPU frame time than it did for the GPU.

B. Performance Comparison of Synthetic Tests

1) GPU Frame Time: For the synthetic test involving rendering multiple quads, WebGPU outperforms WebGL in all cases in GPU mean frame times, as can be clearly seen in Figure 5. The speed-up factor ranges from 4.588, when rendering 40 000 quads, up to 9.039, when rendering ten quads. The results of rendering multiple

Fig. 5. Comparison of the mean WebGL and WebGPU GPU frame times, in milliseconds, for the Multiple Quads test. The workloads range from 10 to 50 000 quads.

Fig. 6. Comparison of the mean WebGL and WebGPU GPU frame times, in milliseconds, for the Full-screen quads test. The workloads range from 10 to 50 000 full-screen quads.

The Polygons synthetic tests show the biggest comparative GPU frame time differences between the two Rasterizers, with WebGPU vastly outperforming WebGL in every case. As an example, in Figure 7, at the point of rendering 50 000 polygons, WebGL manages an average of 150.94 milliseconds per frame while WebGPU is still running at passable real-time speeds (15.75 milliseconds, equivalent to more than 60 frames per second). The GPU frame times for the Large Polygons synthetic test show that WebGPU is roughly 2 to 3 times faster across various workloads. The frame time increases roughly linearly for the Rasterizers with greater workloads, with a statistical deviation occurring at 4 million polygons for WebGL. Like with the test of rendering multiple quads,
steady increase for all workloads below 50 000 quads, the 50
000 quads variant has a much larger frame time than the 40
000 quads variant. The frame time for this variant looks very
similar to the GPU frame time presented in Figure 6.
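The speed-up factors, frame-rate equivalents, and lowest 95% and highest 1% means reported in this section can all be derived from raw frame-time samples. The following is a minimal sketch of that arithmetic. The exact percentile definition used in the study is not stated, so the one below (sort ascending, average the lowest 95% and the highest 1% of samples) is an assumption, and all function and type names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of sorted samples in the index range [first, last); 0.0 if the range is empty.
static double mean_of(const std::vector<double>& v, std::size_t first, std::size_t last) {
    if (first >= last) return 0.0;
    const auto b = v.begin() + static_cast<std::ptrdiff_t>(first);
    const auto e = v.begin() + static_cast<std::ptrdiff_t>(last);
    return std::accumulate(b, e, 0.0) / static_cast<double>(last - first);
}

struct Spread {
    double low95_mean_ms;  // mean of the lowest 95% of frame times
    double high1_mean_ms;  // mean of the highest 1% of frame times
};

// Assumed definition (not confirmed by the paper): sort ascending, then
// average the first 95% of samples and the last 1% (at least one sample).
Spread frame_time_spread(std::vector<double> samples_ms) {
    std::sort(samples_ms.begin(), samples_ms.end());
    const std::size_t n = samples_ms.size();
    const std::size_t low_count = n * 95 / 100;
    const std::size_t high_count = std::max<std::size_t>(n / 100, n ? 1 : 0);
    return {mean_of(samples_ms, 0, low_count), mean_of(samples_ms, n - high_count, n)};
}

// Speed-up of WebGPU over WebGL: ratio of the mean frame times.
double speed_up(double webgl_ms, double webgpu_ms) { return webgl_ms / webgpu_ms; }

// Frame rate equivalent to a given frame time in milliseconds.
double frames_per_second(double frame_time_ms) { return 1000.0 / frame_time_ms; }
```

Using the Multiple Polygons figures quoted in the text, speed_up(150.94, 15.75) gives roughly 9.58, and frames_per_second(15.75) gives roughly 63.5, which matches the statement that 15.75 milliseconds corresponds to more than 60 frames per second.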
Fig. 7. Comparison of the mean WebGL and WebGPU GPU frame times, in milliseconds, for the Multiple Polygons test. The workloads range from 10 to 50 000 polygons.

Fig. 8. Comparison of the mean WebGL and WebGPU GPU frame times, in milliseconds, for the Large Polygons test. The workloads range from 2 million to 16 million vertices.

Fig. 11. Comparison of the mean WebGL and WebGPU CPU frame times, in milliseconds, for the Multiple Polygons test. The workloads range from 10 to 50 000 polygons.
Fig. 12. Comparison of the mean WebGL and WebGPU CPU frame times, in milliseconds, for the Large Polygons test. The workloads range from 2 million to 16 million vertices.

GPU frame time performance. This is especially prominent in the case of rendering multiple polygons and multiple quads. WebGL also shows more fluctuating frame times, as is clearly evident from the multiple quads and the large polygons tests.

V. DISCUSSION

There are several explanations as to why WebGPU performs better than WebGL at the rendering tasks presented in the study. One is the use of modern graphics drivers and bundled state. These provide optimizations that WebGL or OpenGL cannot achieve, and explain the superior performance of WebGPU. Despite WebGPU showing consistently better frame times than WebGL, there are still times when it struggles. For instance, the CPU frame time for the 50 000 full-screen quads synthetic test increases significantly compared to the 40 000 full-screen quads test due to data uploading to the instance buffer. This function call may force CPU and GPU synchronization, leading to longer CPU frame times and longer GPU frame times. The reason why synchronization is not necessary for the other variants is unknown and requires further study.

Aside from the already discussed performance benefits inherent to WebGPU as a modern technology, there exist other possible explanations as to why the performance of WebGL falls behind WebGPU in the experiments conducted. One of the reasons is related to stalling. WebGL experiences different types of stalling at varying workloads, while WebGPU does not. In the case of larger workloads, WebGL is several hundred milliseconds slower than WebGPU in mean GPU frame times. On the other hand, the Polygons tests show that the GPU is stalled instead as the workload increases. The CPU stalls as it waits for WebGL GPU instructions to complete, which is reflected in the exceptionally high mean CPU frame times in WebGL.

VI. CONCLUSIONS AND FUTURE WORK

This paper has investigated the relative performance of two Rasterizers based on two different rendering APIs: WebGL and WebGPU. This was done by implementing a WebGPU Rasterizer backend and comparing it with the existing WebGL Rasterizer backend in the context of the Godot game engine. The work was grounded in both game examples with realistic workloads and raw stress tests of varying workloads through synthetic experiments. The results presented show that the WebGPU implementation, in its current state, consistently performs better than the WebGL equivalent. It does so across all conducted experiments in terms of total mean CPU and GPU frame time. Furthermore, and in general, the presented results are statistically significant. The WebGPU renderer implementation is relatively naive; better results could be achieved with a more modern graphics API workflow. A notable suggestion for future research is to investigate the GPU VRAM usage of both WebGL and WebGPU, if and when this feature eventually becomes available for WebGPU. Another suggestion for future research is to build upon the work in this study in order to make the WebGPU Rasterizer more feature rich. This would mainly involve adding support for additional render item types and complementing the 2D Canvas Renderer with the 3D Scene Renderer.

REFERENCES

[1] “Direct3D - Win32 apps,” Microsoft, Sep. 2021. [Online]. Available: https://learn.microsoft.com/en-us/windows/win32/direct3d
[2] “Metal Overview,” Apple Inc. [Online]. Available: https://developer.apple.com/metal/
[3] Vulkan, “Vulkan Cross platform 3D Graphics,” Khronos Group. [Online]. Available: https://www.vulkan.org/
[4] “OpenGL - The Industry Standard for High Performance Graphics,” Khronos Group. [Online]. Available: https://www.opengl.org/
[5] A. Evans, M. Romeo, A. Bahrehmand, J. Agenjo, and J. Blat, “3D graphics on the web: A survey,” Computers & Graphics, vol. 41, pp. 43–61, 2014.
[6] D. Liu, J. Peng, Y. Wang, M. Huang, Q. He, Y. Yan, B. Ma, C. Yue, and Y. Xie, “Implementation of interactive three-dimensional visualization of air pollutants using WebGL,” Environmental Modelling & Software, vol. 114, pp. 188–194, Apr. 2019. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1364815218304195
[7] “WebGPU,” W3C, 2021. [Online]. Available: https://www.w3.org/TR/2021/WD-webgpu-20210518/
[8] M. Hidaka, Y. Kikura, Y. Ushiku, and T. Harada, “WebDNN: Fastest DNN execution framework on web browser,” in MM 2017 - Proceedings of the 2017 ACM Multimedia Conference, 2017, pp. 1213–1216.
[9] A. Aldahir, “Evaluation of the performance of webGPU in a cluster of web-browsers for scientific computing,” Bachelor’s thesis, Umeå University, 2022. [Online]. Available: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-197058
[10] W. Usher and V. Pascucci, “Interactive Visualization of Terascale Data in the Browser: Fact or Fiction?” in 2020 IEEE 10th Symposium on Large Data Analysis and Visualization (LDAV), 2020, pp. 27–36.
[11] L. Dyken, P. Poudel, W. Usher, S. Petruzza, J. Y. Chen, and S. Kumar, “GraphWaGu: GPU powered large scale graph layout computation and rendering for the web,” in Eurographics Symposium on Parallel Graphics and Visualization, 2022. [Online]. Available: https://diglib.eg.org/xmlui/bitstream/handle/10.2312/pgv20221067/073-083.pdf?sequence=1
[12] “Godot 4.0 sets sail: All aboard for new horizons,” Godot. [Online]. Available: https://godotengine.org/article/godot-4-0-sets-sail/
[13] P. Hex, “Snake in Godot4,” Itch.io. [Online]. Available: https://hexblit.itch.io/snake-in-godot4
[14] MohamedA.G, “Evader,” Itch.io. [Online]. Available: https://mohamedag.itch.io/evader
[15] Aezart, “Checkers,” Itch.io. [Online]. Available: https://aezart.itch.io/checkers
[16] angelchama333, “Falling Cats,” Itch.io. [Online]. Available: https://angelchama333.itch.io/falling-cats
[17] ShoeFisherGames, “Deck Before Dawn,” Itch.io. [Online]. Available: https://shoefishergames.itch.io/deck-before-dawn
[18] ceruleancerise, “Ponder,” Itch.io. [Online]. Available: https://ceruleancerise.itch.io/ponder