Performance Analysis of RTX Architecture in
Abstract— Real-time rendering techniques developed for computer games, combined with improved algorithms and advanced hardware such as the Nvidia GeForce RTX 3000 series of graphics cards, improve the quality of rendered images in CGI. In this paper, our goal is to test the performance of RTX architecture in Virtual Production and graphics processing. We conducted a series of tests rendering a scene in the Unreal game engine in a Virtual Production studio. Images are rendered in 4K and output to a network distribution system where the image is broken down into a series of smaller images, each rendered onto LED screens. A comparison of render times between two graphics workstations, one using an Nvidia RTX A6000 GPU and the other an Nvidia RTX 3090 GPU, shows that whilst RTX architecture produces better image quality, the gains might not be worth the additional hardware cost required by the high-end graphics cards. It might also be optimal to split the rendering of the scene across multiple computers.

Keywords— Real-time rendering, Graphics Processing, Virtual production, Game engineering

I. INTRODUCTION
Virtual Production (VP) integrates virtual and augmented reality technologies with CGI and VFX using a game engine to enable on-set film production crews to capture and unwrap scenes in real time [1]. The use of real-time rendering techniques, developed for computer games, offers great opportunities in film production. As stated in [2], until recently, most film production has been done at 2K (2048 pixels wide). As more 4K (4096 pixels wide) digital projectors become available, there has been a push to use 4K throughout the virtual production process. While this technically increases the image quality, some issues still need to be addressed.

This paper focuses on hardware used for virtual production. Our goal is to test the performance of RTX architecture in Virtual Production (VP) and graphics processing. In the remaining sections we first review graphics processing and RTX architecture, and conduct a performance analysis of RTX architecture. Then, we discuss the test results and evaluate the performance of two graphics cards used in virtual production.

II. ADVANTAGES OF VP
Modern rendering systems can produce photorealistic scenes in real time, and those can be projected onto screens behind the actors. With VP the actors are placed in front of a live screen displaying exactly what will appear in the final film. Thus, directors and cinematographers are better able to assess the quality of shots in real time and adjust the final output accordingly. This significantly reduces production time and improves quality. As discussed in [1] and [3], VP offers significant advantages over traditional green screen production methods.

Ray Tracing (RT) has been used for CGI movies for many years, but it is only recently that its application in real time has become practical. This is partly due to improved software, but advances in hardware, such as the Nvidia GeForce RTX 3000 series of graphics cards, have provided hardware support for real-time lighting and improved the quality of the rendered scenes in CGI when compared to traditional real-time rendering techniques [4]. This technology has also simplified the processes required to light a real-time scene. For traditional rasterized images, lightmaps and light probes must be created and placed by artists. With RT in VP, artists are instead free to focus on lighting photorealistic scenes in much the same way as lighting engineers in a traditional film production.

III. RTX ARCHITECTURE
Graphics cards powered by Ampere, NVIDIA’s 2nd generation RTX architecture [4], with new ray tracing (RT) cores, Tensor Cores, and streaming multiprocessors for the most realistic ray-traced graphics and cutting-edge AI features, promise to revolutionize real-time rendering over the coming decade. It is expected that RTX architecture [5] will largely replace the current rendering techniques for all mid to high range hardware applications. It is likely to become the standard way to light scenes, which is particularly relevant to virtual production applications [6], where real-time images are essential for believable integration of live actors and CGI.

RTX architecture (the NVIDIA Turing™ architecture) combined with the GeForce RTX™ platform fuses together real-time Ray Tracing (RT), AI, and programmable shading. It allows for significantly more transistors to be packed into a smaller area, using the 2nd generation of consumer RT and third generation deep learning hardware. “This dedicated RT hardware can cast upwards of 10 gigarays per second, allowing real-time, movie-like lighting in games. Real-time ray tracing is only possible because RTX graphics cards deliver up to 6x faster ray-tracing performance… RTX graphics cards also offer Tensor Cores capable of delivering over 100 teraflops of AI processing to accelerate gaming performance with NVIDIA DLSS” [25]. However, there are some problems with the process for which there are currently no good solutions. One of these is that the system requires placement of light probes in the scene, which are used in real time by the GPU hardware for quantization of the scene.

While only RTX cards have RT and Tensor Cores, the GTX 16-Series also uses the shared Turing architecture to offer high performance. The GTX cards excel at immersive gaming, providing great experiences in popular titles [17] such as Fortnite, Overwatch, Counter-Strike: Global Offensive and Wolfenstein II: The New Colossus by making use of Turing’s concurrent floating point and integer operation, and advanced shading, including variable rate
Authorized licensed use limited to: PUC MG. Downloaded on July 04,2023 at 18:11:23 UTC from IEEE Xplore. Restrictions apply.
2) Rasterization of the polygons in the scene geometry
3) The coloring of individual pixels which make up the
polygons once projected to the render surface.
Even though modern scenes can contain millions of points
which need transforming, it is the last stage in the pipeline
which typically requires the most significant GPU resources.
Each pixel needs to be colored and the calculations required
to light it can be a very significant cost.
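
To give a sense of scale for this per-pixel stage, the short sketch below counts how many pixel colorings per second a 4K output implies, assuming each pixel is colored at least once per frame. The 4096-pixel width is the figure quoted in the Introduction; the 2160-pixel height, the 24 fps target and the function name are assumptions chosen purely for illustration and are not measurements from our tests.

```python
def pixels_shaded_per_second(width: int, height: int, fps: float) -> float:
    """Lower bound on pixel colorings per second: one per pixel per frame."""
    return width * height * fps


WIDTH = 4096     # 4K width quoted in the Introduction
HEIGHT = 2160    # assumed height; not stated in the paper
TARGET_FPS = 24  # assumed film frame rate, for illustration only

per_frame = WIDTH * HEIGHT
per_second = pixels_shaded_per_second(WIDTH, HEIGHT, TARGET_FPS)

print(f"{per_frame:,} pixels per frame")
print(f"{per_second:,.0f} pixel colorings per second at {TARGET_FPS} fps")
```

Overdraw, transparency and multiple lights would each multiply this lower bound, which is consistent with the pixel stage dominating GPU cost.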
Fig. 1. Plan view of example scene

V. TESTING
Our goal in testing was to conduct a series of experiments to determine the relative effectiveness of various hardware and software configurations in terms of differences in frame rate when switching between RTX and traditional raster lighting. Data was collected from the above hardware configuration using three different test scenes and a variety of different configurations. Several scenes were created in Unreal for testing, based on the scene shown, with various modifications:
A. Complex scene with complex lighting (Fig 3)
B. Complex scene with simple lighting
C. Simple scene
D. Overlapped transparency
E. Particle effects
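
The combinations to be tested can be enumerated mechanically. The sketch below is only an illustration: the scene labels follow the list above, the hardware and lighting labels follow the configurations described in this section, and the identifiers (`SCENES`, `HARDWARE`, `LIGHTING`, `build_test_matrix`) are hypothetical names, not part of any tool we used. The full cross product is also an assumption; the sessions actually recorded may have covered a different subset.

```python
from itertools import product

# Labels taken from the text; identifiers are illustrative only.
SCENES = [
    "Complex scene, complex lighting",
    "Complex scene, simple lighting",
    "Simple scene",
    "Overlapped transparency",
    "Particle effects",
]
HARDWARE = [
    "Machine 1, NVLink enabled",
    "Machine 1, NVLink disabled",
    "Machine 2",
]
LIGHTING = ["Real-time ray tracing", "Rasterization"]


def build_test_matrix():
    """Return every (hardware, lighting, scene) combination as a tuple."""
    return list(product(HARDWARE, LIGHTING, SCENES))


runs = build_test_matrix()
for hardware, lighting, scene in runs:
    print(f"{hardware} | {lighting} | {scene}")
print(f"{len(runs)} runs in total")
```

Listing the runs up front makes it easier to check that every hardware configuration has been profiled with both lighting methods before results are compared.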
Fig. 2. Screen grab of example scene in the Unreal editor window.
• Particle Count: 13,200 Particles

For the transparent scenes an additional option was explored: testing with real-time ray tracing on only the transparency, and on both the particles and the transparency.

The principal hardware configurations tested were:
• Machine 1 – NVLink enabled
• Machine 1 – NVLink disabled
• Machine 2

For every combination above, tests were performed with the following lighting:
• Real-time ray tracing
• Rasterization

The tests were run without the physical camera being used, but some movement was simulated using a script which simply rotates the camera through 360 degrees and moves it along a track. This provides some degree of averaging across different scene complexity.

A. Data Analysis
Data was gathered using the Unreal Insights tool. “Unreal Insights helps developers identify bottlenecks, which is useful when optimizing for performance” [16]. The tool runs in parallel with the game engine and captures information in real time about every frame rendered. Our process was to load the scene, start the profiler and then run the script to simulate the camera movement. The data from Unreal Insights was then saved to a file which can be reloaded later for analysis. The tool is a great aid when analysing a scene to look for ways to improve its performance. The metric we chose to compare performance was the average time taken to render each frame. To calculate this average frame time we simply divide the time the test took to run (which is available in the session summary tab) by the total number of frames rendered; the average frame rate is the reciprocal of this value. We also note the maximum startup frame time, the maximum frame time after startup and the minimum frame time. This process was completed for all the test data we collected, and the results were collated into an Excel spreadsheet. Data was split between the non-transparent and transparent scenes. Session data was first grouped by hardware configuration; then, within those groups, subgroups were created for raster or ray traced, and then again by scene complexity.

Results were plotted to show the comparison of render times in similar hardware and software configurations across the six nontransparent scenes. Output from a typical session is shown in Fig 4. The vertical axis shows frames rendered per second; the higher the frame rate, the smoother any visible animation on the screens will be and the faster the scene will update if the camera is being moved. Results are grouped into those for the same scene, with the three hardware configurations next to one another for comparison. On all graphs the two RTX A6000 cards with NVLink are first, followed by the single RTX A6000 and finally the RTX 3090. The first group is the most complex scene and the last is the simplest scene.

VI. RESULTS
As might be expected, RT is the slowest to render and the simple scene with simple lighting the fastest. However, there are some interesting aspects to note here. First of all, whilst the two Nvidia RTX A6000 cards running with NVLink are the fastest hardware, they are not significantly so. Secondly, without the NVLink, when only one RTX A6000 is running, performance is worse on machine 1 than it is on machine 2. This is a surprise because machine 2 is inferior to machine 1 in every way. Again, performance comparisons of the Nvidia RTX 3090 and the Nvidia RTX A6000 suggest that the former is superior for gaming purposes, and because we are using Unreal as the tool for rendering the scenes, which is primarily aimed at the games market, it may be heavily optimized for the Nvidia RTX 3090.

[Bar chart: average FPS (vertical axis, 0.00 to 60.00) for six scene groups: ray trace; ray trace, simple lights; ray trace, simple scene; raster; raster, simple lights; raster, simple scene. Series: A6000 SLI, A6000, 3090.]
Fig. 4. Comparison of display update (Average FPS)

Fig 5 shows the Timing Window and how tasks are distributed between the various processing cores and threads. Analysis of the timing window from Unreal Insights in Fig 6 sheds some light on why NVLink is not providing a significant improvement in performance, as it shows that the two GPUs are rarely fully occupied.

Fig. 5. Typical Output from Unreal Insight

Fig. 6. Typical Output from the timing window (when Ray tracing enabled)

For a typical set of frames GPU1 runs at approximately 80% capacity, but GPU2 is only occupied for 50% of the time. Further analysis shows that GPU2 is still running tasks after GPU1 starts generating the next frame, and the next frame cannot be started until GPU2 has completed its tasks. This in turn delays GPU1 from starting the next frame. This sort of scheduling problem of one task waiting for another to
complete the task is very common in multi-threaded systems and would require careful scheduling to fully occupy both GPU threads. For the single-GPU configurations (both RTX A6000 and RTX 3090) we found that GPU1 was almost fully occupied all the time. Hence, for the NVLink configuration we typically see only approximately 130% of the performance of the single GPU configuration, and even then, performance will be further lost through the additional scheduling work that needs to be done.

Our observations with regard to the performance of NVLink closely follow those found in an online article [8]. In this article the author compares the performance of systems using a pair of Nvidia RTX 2080Ti and RTX 2080 cards with and without NVLink enabled. For their experiments they used a set of graphics cards to ray trace 5 scenes of varying complexity using V-Ray [13]. Their experiments compare render time in seconds. Their scenes are substantially more complex than ours, with render times of up to 500 seconds to generate a 4K image. In most cases the systems with NVLink perform slightly worse than the systems without it.

A. Transparency
We created a scene with many transparent overlapping layers and another with overlapping layers and particles. The intention was to obtain a quick estimate of the cost of transparency with the available hardware. Fig. 7 shows the average frame rate (frames per second) for each scene when rendering transparency. As with the previous graphs, results are grouped by scene and render complexity.

[Bar chart: average FPS (vertical axis, 0.00 to 60.00). Series: A6000, A6000 no SLI, 3090.]
Fig. 7. Transparency Average FPS

In Fig. 7, the first group shows the relative performance of the three hardware configurations with transparency rendered using RT. The second shows relative performance with particles and transparency and RT. The next two show the same scenes rendered using rasterization. The final three are a combination of raster for transparency and ray tracing for particles. As before, there is not much difference between the two graphics cards for the ray traced scene. The top end hardware is the fastest, but only by about 20%. The Nvidia RTX 3090 outperforms the single Nvidia RTX A6000. With the particles included, the twin Nvidia RTX A6000 cards perform about the same, but the other two machines perform somewhat worse. The introduction of particles has not significantly affected the frame rate; possibly the complexity of the scenes was not sufficient to stress the hardware, although visually the particles were quite obvious.

The scene which uses transparency is significantly faster to render using raster techniques than RT, over 300% in fact. Nvidia have provided some information as to how their RTX system handles ray tracing, which provides some insight into why this might be. For opaque meshes a flag should be set which ensures that a closest-hit shader is called once when a ray strikes a polygon. For non-opaque meshes an any-hit shader must be called and the ray may continue after intersecting the geometry. This is clearly slower than for opaque meshes [14], because multiple ray-to-geometry collisions will likely need to be calculated for the same ray and the any-hit shader itself is more complex, as it has to correctly calculate the layers of transparency and possibly spawn additional rays depending on the lighting technique used.

For rasterization there are several different techniques to handle non-opaque geometries [15]; which one is used is engine dependent. Transparency presents many problems for modern rendering hardware and render pipelines. It appears that RTX is no exception.

VII. EVALUATION
As to why the Nvidia RTX 3090 is almost as fast as the significantly more expensive Nvidia RTX A6000, it is likely that Unreal is optimized more towards this card, since it is predominantly aimed at the gaming market and the Nvidia RTX A6000 is aimed more at the professional graphics market. It is unlikely that many gamers would consider purchasing an Nvidia RTX 6000; the Nvidia RTX 3090 is currently considered state of the art. Finally, we note that Ray Tracing, even on the top end RTX cards, is significantly slower in all configurations than the equivalent scenes rendered with raster. The simple scene with simple light (one light) takes almost as long to render as the most complex scene using rasterization. This indicates that although the hardware is extremely capable, the workload involved in real-time ray tracing is still very high. This may be acceptable if the quality of rendered output is substantially better. However, informal tests indicate that in a side-by-side comparison most users struggled to tell the difference between real-time ray traced images and rasterized ones. Of course, this is at least in part due to the effort put into making the rasterized scene look acceptable (baking lights, light probes etc.), but still, it is an issue which needs to be considered.

The RTX 6000 promised that designers and artists could have the power of hardware-accelerated ray tracing, deep learning, and advanced shading to dramatically boost their productivity and create content faster than ever before. However, industry reports such as [17] and [18] have published disappointing outcomes, while the Nvidia GeForce RTX 3090 was rated at 100% in terms of performance [19]. Although we were unable to locate one-to-one testing of these cards, there are close enough comparisons between the RTX 2080Ti and the RTX 6000. A benchmarking report [18] states that the RTX 2080Ti, for example, is best suited for small-scale model development, rather than full-scale training workloads. On the other hand, the RTX 6000, while at a significantly higher cost than the RTX 2080Ti, has the benefits of both the 2080Ti's blower design and the TITAN RTX's large memory capacity. The blower design allows up to four cards to be configured in a single workstation. The Nvidia RTX A6000 can be densely populated in a system, whilst boasting large memory capacity
for large models. This makes it preferable in deep learning tasks for computer vision.

Is the NVIDIA Quadro RTX 6000 good value for money? This question is currently being explored in the industry. The architecture leveraging DirectX Raytracing (DXR), OptiX (a ray tracing API), and Vulkan (a cross-platform API and open standard for 3D graphics and computing that targets high-performance real-time 3D graphics applications, such as video games and interactive media) was first introduced in August 2018 at SIGGRAPH 2018 in the workstation-oriented Quadro RTX cards [16], and later at Gamescom [21]. As seen in [22], the earlier Turing generation experienced some glitches in addition to high prices, poor availability and ray tracing at a low level. Many users such as [23] and [24] reported failures of the RTX 2080Ti. However, we have not observed such glitches in the RTX 3090 and RTX 6000. [26] also compared the NVIDIA Quadro RTX 6000 with the most popular graphics cards. The NVIDIA Quadro RTX 6000 scores poorly in their evaluation, barely achieving 69% performance compared to the AMD Radeon 6900 XT and NVIDIA GeForce RTX 3090, which are cost-effective alternatives. Our findings are aligned with these studies.

VIII. CONCLUSION
In this paper, we conducted experiments to test various hardware configurations using a variety of different scenes. More specifically, we compared consumer-oriented and product-oriented graphics cards such as the NVIDIA Quadro RTX 6000 and the GeForce RTX 3090. The results were collated and presented graphically. Our high-end hardware did not offer the performance gains we had expected over the lower end hardware, and further research is needed to understand why. RTX was shown to be significantly slower, even on the top end hardware, than rasterization. Some additional experiments were carried out to test the effects of transparency in the scene. Transparency had a significant detrimental effect on performance, noticeably worse when rendering using RTX than when using rasterization.

In summary, we conclude that whilst RTX produces better image quality, the gains might not be worth the additional hardware cost required. It might also be optimal to split the rendering of the scene across multiple computers, where the images destined for the displays which do not appear in the frustum are rendered by lower spec computers using rasterization rather than ray tracing and at a lower resolution, but the main screen is rendered by higher end hardware.

In future, first, we intend to create a series of more complex scenes to adequately stress the hardware used in the high-end machine and run similar tests to the ones in this paper. Second, we plan to test the hardware configurations by physically switching the cards over and running the tests on the same computer to eliminate CPU issues. Third, we plan to assess the relative quality of the ray traced and rasterised versions of the scenes, using experimental subjects to view the output and rate the quality.

ACKNOWLEDGMENT
We are grateful to Andy Marriott, Managing Director of Silver Sun Pictures, for giving us permission to conduct testing in their VP Studio, and Lachlan Emanuel for running the tests and helping us in data collection.

REFERENCES
[1] Manolya Kavakli and Cinzia Cremona, 2022: The Virtual Production Studio Concept – A Game Changer in Filmmaking, IEEE VR 2022: the 29th IEEE Conference on Virtual Reality and 3D User Interfaces, 12-16 March, 2022, Virtual, p.1-10
[2] Jeffrey A. Okun and Susan Zwerman, 2021. The VES Handbook of Visual Effects, 3rd Edition, Taylor and Francis, London
[3] Tony Oakden and Manolya Kavakli, 2022: Graphics Processing in Virtual Production, 14th International Conference on Computer and Automation Engineering (ICCAE 2022) March 25-27, 2022 Brisbane, Australia, 1-6
[4] NVIDIA, 2021a. GEFORCE RTX 3080 FAMILY THE ULTIMATE PLAY, last accessed on 29 Oct 2021, https://www.nvidia.com/en-au/geforce/graphics-cards/30-series/rtx-3080-3080ti/
[5] NVIDIA, 2021b. NVIDIA RTX™ platform, last accessed on 29 Oct 2021, https://developer.nvidia.com/rtx
[6] Unreal, 2021. Storytelling reimagined, last accessed on 29 Oct 2021, https://www.unrealengine.com/en-US/solutions/film-television
[7] a6000-datasheet-us-nvidia, 2022. Retrieved from Nvidia.com: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/proviz-print-nvidia-rtx-a6000-datasheet-us-nvidia-1454980-r9-web%20(1).pdf
[8] Branko Gapo, 2021. NVLink vs. SLI and Multiple GPUs – Is it worth it? Retrieved from CGdirector: https://www.gpumag.com/nvlink-sli-difference/
[9] Tessera SX40, 2022. https://www.bromptontech.com/product/sx40/
[10] Brompton Tessera XD, 2022. https://www.bromptontech.com/product/xd/S.
[11] Manolya Kavakli, 2022: Requirements for reducing the input lag in a Virtual Production Studio, HCI INTERNATIONAL 2022, 24TH INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION, 26 June - 1 July 2022, Gothenburg, Sweden
[12] Sevan Dalkian, 2019. “nDisplay Technology Whitepaper,” Epic Games, https://cdn2.unrealengine.com/Unreal+Engine%2Fndisplay-whitepaper-final-updates%2FnDisplay_Whitepaper_FINAL-f87f7ae569861e42d965e4bffd1ee412ab49b238.pdf.
[13] Chaos, 2022. Retrieved from https://www.chaos.com/
[14] Juha Sjoholm, 2018. Effectively Integrating RTX Ray Tracing into a Real-Time Rendering Engine. Retrieved from Nvidia Developer Blog: https://developer.nvidia.com/blog/effectively-integrating-rtx-ray-tracing-real-time-rendering-engine/
[15] Alex Dunn and Louis Bavoil, 2014. Transparency (or Translucency) Rendering. Retrieved from Nvidia Developer Blog: https://developer.nvidia.com/content/transparency-or-translucency-rendering
[16] Epic. 2022. nDisplay Overview. Retrieved from https://docs.unrealengine.com/: https://docs.unrealengine.com/4.27/en-US/WorkingWithMedia/IntegratingMedia/nDisplay/Overview/
[17] https://benchmarks.ul.com/hardware/gpu/NVIDIA+Quadro+RTX+6000+review
[18] https://www.exxactcorp.com/blog/Benchmarks/deep-learning-benchmarks-comparison-2019-rtx-2080-ti-vs-titan-rtx-vs-rtx-6000-vs-rtx-8000-selecting-the-right-gpu-for-your-needs
[19] GPU Benchmarks Ranking (https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html)
[20] https://www.anandtech.com/show/13282/nvidia-turing-architecture-deep-dive/5
[21] "NVIDIA TURING GPU ARCHITECTURE: Graphics Reinvented" (PDF). Nvidia. 2018. Retrieved June 28, 2019.
[22] https://www.pcbuildersclub.com/en/2018/11/faulty-rtx-2080-ti-nvidia-switches-from-micron-to-samsung-for-gddr6-memory/
[23] Florian Maislinger [2018]. "Faulty RTX 2080 Ti: Nvidia switches from Micron to Samsung for GDDR6 memory". PC Builder's Club. Retrieved July 15, 2019.
[24] https://www.igorslab.de/
[25] https://blogs.nvidia.com/blog/2019/11/01/whats-the-difference-between-nvidia-rtx-and-gtx/