Performance Analysis and Optimization for PC-Based VR Applications: From the CPU’s Perspective

Virtual Reality (VR) is growing in popularity as hardware advances following Moore’s Law make this
new experience technically feasible. While VR gives users an immersive experience, it also places
significantly greater computing workloads on both the CPU and GPU than traditional applications do,
because of its dual-screen rendering, low latency, high resolution, and high frame rate requirements.
Performance issues are therefore especially critical in VR applications: a non-optimized VR experience
with an insufficient frame rate and high latency can cause nausea. In this article, we introduce a
general methodology to profile, analyze, and tackle bottlenecks and hotspots in a PC-based VR
application, regardless of the underlying engine or VR runtime. We use Pangu*, a PC VR game from
Tencent*, as an example to showcase the analysis flow.

The rendering pipeline in VR games and conventional games

Before digging into the details of the analysis, we want to explain why the CPU plays an important
role in VR and how it affects VR performance. Figure 1 shows the rendering pipeline in conventional
games, where the CPU and GPU work in parallel in order to maximize hardware utilization. However,
this scheme cannot be applied to VR, because VR requires a low and stable rendering latency and the
conventional pipeline does not meet this requirement.

Take Figure 1 as an example. The rendering latency of Frame N+2 is much longer than normal because
the GPU has to finish the workload of Frame N+1 before it starts working on Frame N+2, which adds
significant latency to Frame N+2. In addition, the rendering latency varies among Frame N, Frame N+1,
and Frame N+2 because of different execution circumstances, which is also unfavorable in VR since it
can cause simulation sickness.

Figure 1: The rendering pipeline in conventional games.

As a result, the rendering pipeline in VR is changed to the one shown in Figure 2 in order to achieve
the shortest latency for each frame. In Figure 2, the CPU/GPU parallelism is intentionally broken,
trading efficiency for a low and stable rendering latency for each frame. In this case, the CPU can
become a bottleneck, since the GPU has to wait for the CPU to finish its pre-rendering jobs (draw call
preparation, initialization of dynamic shadowing, occlusion culling, and so on). Optimizing the CPU
work can therefore help reduce GPU bubbles and improve performance.

Figure 2: The rendering pipeline in VR games.
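
The difference between the two pipelines can be illustrated with a toy timing model. The sketch below
is only an approximation under stated assumptions: each frame has one block of CPU work followed by
one block of GPU work, the conventional pipeline lets the CPU run ahead without limit, and the VR
pipeline starts a frame's CPU work only after the previous frame's GPU work has finished. The function
names and the per-frame durations are hypothetical and are not taken from the Pangu* traces.

```python
def pipelined_latencies(cpu_ms, gpu_ms):
    """Conventional pipeline: the CPU starts frame i+1 as soon as it finishes frame i."""
    latencies = []
    cpu_free = 0.0  # time at which the CPU can start the next frame
    gpu_free = 0.0  # time at which the GPU can start the next frame
    for cpu, gpu in zip(cpu_ms, gpu_ms):
        cpu_start = cpu_free
        cpu_end = cpu_start + cpu
        # The GPU may still be busy with the previous frame, so this frame waits.
        gpu_start = max(cpu_end, gpu_free)
        gpu_end = gpu_start + gpu
        latencies.append(gpu_end - cpu_start)
        cpu_free, gpu_free = cpu_end, gpu_end
    return latencies


def serialized_latencies(cpu_ms, gpu_ms):
    """VR-style pipeline: CPU work for a frame starts only after the previous frame's GPU work."""
    latencies = []
    now = 0.0
    for cpu, gpu in zip(cpu_ms, gpu_ms):
        cpu_start = now
        gpu_end = cpu_start + cpu + gpu  # GPU idles (a "bubble") while the CPU works
        latencies.append(gpu_end - cpu_start)
        now = gpu_end
    return latencies


# Hypothetical workload: 5 ms of CPU work and 10 ms of GPU work per frame.
cpu = [5.0, 5.0, 5.0]
gpu = [10.0, 10.0, 10.0]
print(pipelined_latencies(cpu, gpu))   # [15.0, 20.0, 25.0]
print(serialized_latencies(cpu, gpu))  # [15.0, 15.0, 15.0]
```

In this model the pipelined latency grows from frame to frame because GPU work queues up behind
earlier frames, while the serialized latency stays constant at the cost of leaving the GPU idle during
the CPU phase. This is why the CPU time directly limits the frame rate in the VR pipeline.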

Background of the Pangu* VR workload

Pangu* is a PC-based VR title from Tencent*. It is a DirectX* 11 FPS VR game developed with Unreal
Engine* 4 that supports both Oculus Rift* and HTC Vive*. We worked with Tencent* to improve the
performance and user experience of the game in order to achieve a best-in-class gaming experience on
Intel® Core™ i7 processors. During the development work outlined in this article, the frame rate
improved from 36.4 frames per second (fps) on Oculus Rift* DK2 (1920x1080) during early testing to
71.4 fps on HTC Vive* (2160x1200) at the time of this article. Here are the engines and VR runtimes
used at the start and end of the development work:

• Initial development platform: Oculus v0.8 x64 runtime and Unreal 4.10.2

• Final development platform: SteamVR* v1463169981 and Unreal 4.11.2

Different VR runtimes were used during development because Pangu was initially developed on Oculus
Rift DK2, since neither Oculus Rift CV1 nor HTC Vive had been released at that time. Pangu was then
migrated to HTC Vive once the device had been officially released. We evaluated the adoption of
different VR runtimes and found that it didn’t make a significant difference in performance, since
both the Oculus and SteamVR runtimes adopt the same VR rendering pipeline shown in Figure 2, and in
this situation the rendering performance is mainly determined by the game engine. Figure 5 and
Figure 14 also show that both the Oculus and SteamVR runtimes inserted GPU work (for the distortion
pass) after the GPU rendering of each frame, which consumed only a small proportion of time relative
to the rendering.

The screenshots below show the game before and after the optimization work. Note that the number of
draw calls was reduced by about 5X after optimization, and the GPU execution period for each frame
was reduced from 15.1 ms to 9.6 ms on average in order to fit the 90 fps requirement on HTC Vive*, as
seen in Figures 12 and 13:

Figure 3: Screenshots of the game before (left) and after (right) optimization.

The specifications of the test platform:

• Intel® Core™ i7-6820HK processor (4 cores, 8 threads) @ 2.7GHz

• NVIDIA GeForce* GTX980 16GB GDDR5

• Graphics Driver Version: 364.72

• 16 GB DDR4 RAM

• Windows* 10 RTM Build 10586.164

Spotting the performance issues

In order to better understand the potential performance issues of Pangu*, we first collected the
basic performance metrics of the game, shown in Table 1. All the data in this table were collected
using tools including GPU-Z, TypePerf, and Unreal Frontend. Comparing the data to system idle,
several observations can be made:

• Relatively low GPU utilization (49.64 percent on GTX980) given the low frame rate (36.4 fps). If
the GPU utilization were improved, a higher frame rate could be achieved.

• A high number of draw calls. Rendering in DirectX 11 is single threaded and has relatively high
draw call overhead in the render thread compared to DirectX 12. Since the game was developed on
DirectX 11 and the VR rendering pipeline breaks the CPU/GPU concurrency in order to achieve a
shorter Motion-to-Photon (MTP) latency, performance decreases significantly if the game is render
thread bound. Fewer draw calls can help relieve the render thread in this case.

• CPU utilization doesn’t seem to be an issue in this table, since it is only 13.6 percent on
average. In the following section we show that this impression is misleading and that the workload
is actually bound by a few CPU threads.

Metric                                   System Idle   Pangu* on Oculus Rift* DK2
                                                       (before optimization)
GPU Core Clock (MHz)                     135           1337.6
GPU Memory Clock (MHz)                   162           1749.6
GPU Memory Used (MB)                     184           1727.71
GPU Load (%)                             0             49.64
Average Frame Rate (fps)                 N/A           36.4
Draw Calls (/frame)                      0             4437
Processor(_Total)\Processor Time (%)     1.04          13.58
Processor Information(_Total)\
  Processor Frequency (MHz)              800           2700

Processor Time (%) values given in parentheses in the original table (one per logical processor):
  System Idle:           5.73/0.93/0.49/0.29/0.7/0.37/0.24/0.2
  Before optimization:   30.20/10.54/26.72/3.76/12.72/8.16/12.27/4.29

Table 1: Basic performance metrics of the game before optimization.
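
The third observation above can be checked directly from the eight parenthesized Processor Time
values in Table 1. The short sketch below assumes these are per-logical-processor readings (their
mean matches the reported total) and shows how a modest average can coexist with much busier
individual processors. The function name is hypothetical.

```python
# Processor Time (%) values from Table 1, before optimization.
per_processor_pct = [30.20, 10.54, 26.72, 3.76, 12.72, 8.16, 12.27, 4.29]


def summarize(utilization_pct):
    """Return (mean, maximum) utilization across logical processors."""
    mean = sum(utilization_pct) / len(utilization_pct)
    return round(mean, 2), max(utilization_pct)


mean_pct, busiest_pct = summarize(per_processor_pct)
print(f"mean {mean_pct}%, busiest logical processor {busiest_pct}%")
```

The mean reproduces the 13.58 percent total, while the busiest logical processor is at 30.20 percent.
Even the per-processor view can understate a thread-bound workload, because the operating system may
migrate a busy thread across processors; this is one reason a timeline tool is used in the next
section.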

In the following section, we use GPUView and Windows Performance Analyzer (WPA) from the
Windows Assessment and Deployment Kit (ADK) [1] to profile and analyze the bottlenecks in the VR
workload.

A deeper look into the performance issues

GPUView [2] is a tool that can be used to investigate the performance interaction between graphics
applications, CPU threads, graphics driver, Windows graphics kernel, and related interactions. This tool
can also show whether an application is CPU bound or GPU bound in the timeline view. On the other
hand, WPA [3] is an analysis tool that creates graphs and data tables of Event Tracing for Windows
(ETW) events. It has a flexible UI that can be pivoted to view call stacks, CPU hotspots, context switches,
and so on. It can also be used to explore the root cause of performance issues. Both GPUView and WPA
can be used to analyze the event trace log (ETL) file captured by Windows Performance Recorder (WPR),
which can be run from the user interface (UI) or from the command line, and have built-in profiles that
can be used to select the events to be recorded.

For a VR application, it is best to first determine whether the application is bound by the CPU,
the GPU, or both. We can then focus our optimization efforts on the most critical performance
bottlenecks, achieving as much performance gain as possible with minimum effort.

Figure 4 shows the timeline view of Pangu* in GPUView before optimization, including the GPU work
queue, CPU context queues, and CPU threads. Several facts can be concluded from the chart:

• The frame rate is about 37 fps.

• GPU utilization is about 50 percent.

• The user experience of this VR workload is poor, since the frame rate is far below 90 fps, which
can easily induce motion sickness and nausea in end users.

• As seen in the GPU work queue, only two processes submitted tasks to the GPU: the Oculus VR
runtime and the VR workload. The Oculus VR runtime performed work including distortion, chromatic
aberration correction, and time warp at the last stage of frame rendering.

• The VR workload was bound by both the CPU and GPU:

  ◦ CPU bound: the GPU was idle for 50 percent of the time (GPU bubbles) while it waited for
  some CPU threads (T1864, T8292, T8288, T4672, T8308) to finish. GPU work could not be
  submitted and executed until the CPU tasks in these threads had completed. If the CPU tasks
  were optimized, GPU utilization could be greatly improved, allowing more work to be
  accomplished on the GPU and thus achieving a higher frame rate.

  ◦ GPU bound: even if we could eliminate all the GPU bubbles, the GPU execution period of a
  single frame was still longer than 11.1 ms (about 14.7 ms in this workload). Without further
  optimization on the GPU side, the VR workload cannot run at 90 fps, which is the required
  frame rate for premier VR head-mounted displays (HMDs) including Oculus Rift* CV1 and HTC
  Vive*.

(Figure 4 labels: GPU bubbles; CPU bottleneck; CPU threads shown: render thread, game thread, three
task threads, driver thread.)

Figure 4: A timeline view of Pangu* in GPUView.
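
The CPU-bound and GPU-bound tests described above can be written down as a small check. This is a
minimal sketch, not part of GPUView: it assumes the frame period and the GPU busy time per frame have
already been read off the timeline, and the 10 percent bubble threshold is an arbitrary value chosen
for illustration. The function and parameter names are hypothetical.

```python
def classify_frame(frame_period_ms, gpu_busy_ms, target_fps=90.0, bubble_threshold=0.10):
    """Classify a frame as CPU bound and/or GPU bound from timeline measurements."""
    budget_ms = 1000.0 / target_fps              # 11.1 ms at 90 fps
    bubble_ms = frame_period_ms - gpu_busy_ms    # time the GPU sat idle in this frame
    return {
        "fps": 1000.0 / frame_period_ms,
        "gpu_utilization": gpu_busy_ms / frame_period_ms,
        # CPU bound: a noticeable share of the frame is spent in GPU bubbles.
        "cpu_bound": bubble_ms / frame_period_ms > bubble_threshold,
        # GPU bound: the GPU work alone does not fit in the frame budget.
        "gpu_bound": gpu_busy_ms > budget_ms,
    }


# About 37 fps and about 14.7 ms of GPU execution per frame, as reported above.
result = classify_frame(frame_period_ms=1000.0 / 37.0, gpu_busy_ms=14.7)
print(result["cpu_bound"], result["gpu_bound"])  # True True
```

With these inputs the frame is flagged as both CPU bound and GPU bound, which matches the reading of
Figure 4: removing the bubbles would raise the frame rate, but the GPU work would still exceed the
11.1 ms budget.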

Preliminary recommendations for improving the frame rate and GPU utilization:

• Defer non-urgent CPU work such as physics and AI so that graphics rendering jobs are submitted
earlier, in order to reduce GPU bubbles during CPU bottlenecks.

• Apply multithreading techniques efficiently to increase the amount of parallel execution and
reduce the CPU bottleneck in the game.

• Reduce tasks that lead to CPU bottlenecks, such as draw calls, dynamic shadowing, cloth
simulation, physics, and AI navigation.

• Submit the CPU task of the next frame earlier to reduce GPU gaps. Although motion-to-photon
latency might increase slightly, performance and efficiency could be greatly improved.

• DirectX 11 has high draw call and driver overheads. Too many draw calls make the workload
seriously CPU bound in the render thread; consider migrating to DirectX 12 if possible.

• Optimize GPU workloads as well (for example, overdraw, bandwidth, and texture fill rate), since
the GPU active period for a single frame is longer than a vsync period, which leads to dropped
frames.
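
The first recommendation can be illustrated with a toy ordering model. The sketch below assumes a
single CPU thread that runs its tasks back to back and can submit GPU work only once a task named
"render_prep" has finished. The task names and durations are hypothetical and are not measurements
from Pangu*.

```python
def gpu_submit_time_ms(tasks, gate="render_prep"):
    """Return the time at which the gating task finishes when tasks run back to back."""
    elapsed = 0.0
    for name, duration_ms in tasks:
        elapsed += duration_ms
        if name == gate:
            return elapsed
    raise ValueError(f"task {gate!r} not found")


# Same work in both cases; only the order differs.
original = [("physics", 3.0), ("ai", 2.0), ("render_prep", 4.0)]
deferred = [("render_prep", 4.0), ("physics", 3.0), ("ai", 2.0)]

print(gpu_submit_time_ms(original))  # 9.0
print(gpu_submit_time_ms(deferred))  # 4.0
```

In this model the total CPU time is unchanged, but the GPU can start 5 ms earlier when the
non-urgent work is moved after render preparation, which shortens the GPU bubble at the start of
the frame.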

In order to take a deeper look into the bottleneck, we can use WPA to explore the same ETL file
analyzed with GPUView. WPA can also be used to identify CPU hotspots in terms of CPU utilization or
context switches; readers who are interested in this topic can refer to [4] for more details. Here we
introduce the main methodology for CPU bottleneck analysis and optimization.

First, look at a single frame of the VR workload that has performance issues. Since the present
packet is submitted to the GPU once per frame after rendering, the time between two successive
present packets is the period of a single frame, as shown in Figure 5 (26.78 ms, which is equivalent
to 37.34 fps).

(Figure 5 labels: interval between two Present packets, 26.78 ms; GPU bubble of 7.37 ms marked "CPU
bottleneck leads to GPU bubble"; CPU threads shown: render thread, game thread, three task threads,
driver thread.)

Figure 5: A timeline view of Pangu* in GPUView for a single frame. Note the CPU threads that lead to
GPU bubble.
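
The conversion from present-packet spacing to frame rate is simple arithmetic. The sketch below
assumes a list of present timestamps in milliseconds has been read from the timeline; the example
timestamps are hypothetical and are spaced by the 26.78 ms interval measured in Figure 5.

```python
def frame_periods_ms(present_times_ms):
    """Return the interval between each pair of successive present packets."""
    return [later - earlier
            for earlier, later in zip(present_times_ms, present_times_ms[1:])]


def average_fps(present_times_ms):
    """Average frame rate over the span covered by the present packets."""
    periods = frame_periods_ms(present_times_ms)
    return 1000.0 * len(periods) / sum(periods)


presents = [0.0, 26.78, 53.56]  # hypothetical timestamps, 26.78 ms apart
print(round(average_fps(presents), 2))  # 37.34
```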

Note that there are GPU bubbles in the GPU work queue (for example, 7.37 ms at the beginning of a
frame), which were caused by CPU-bound threads in the VR workload, as marked in the red rectangle.
This is because CPU tasks such as draw call preparation, culling, and the like must finish before
GPU commands are submitted for rendering.

If we use WPA to look at the CPU-bound periods shown in GPUView, we can find the key CPU hotspots
that prevent the GPU from executing. Figures 6–11 show the utilization and the call stacks of CPU
threads in WPA, within the same time period as in GPUView.

(Figure 6 labels: CPU bottleneck leads to GPU bubble; 7.37 ms.)

Figure 6: A timeline view of Pangu* in WPA with the same period as Figure 5.

Let’s look at the bottleneck of each CPU thread.

Figure 7: The call stack of the render thread T1864.

As seen in the call stack, the top three bottlenecks in the render thread are

1. Base pass rendering for static meshes (50 percent)

2. Initialization of dynamic shadows (17 percent)
3. Compute view visibility (17 percent)

These bottlenecks are caused by too many draw calls, state changes, and shadow map rendering in
the render thread. Some suggestions to optimize the render thread performance:

• Apply batching in Unity* or actor merging in Unreal to reduce static mesh drawing. Combine
nearby objects and use Level of Detail (LOD). Using fewer materials and packing separate
textures into a larger texture atlas can also help.

• Use Double Wide Rendering in Unity or Instanced Stereo Rendering in Unreal to reduce draw
call submission overhead for stereo rendering.

• Reduce or turn off real-time shadows. Objects that receive dynamic shadowing will not be
batched, thus incurring a severe draw call penalty.

• Avoid using effects that cause objects to be rendered multiple times (reflections, per-pixel
lights, transparent objects, and multi-material objects).
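
The effect of batching and of dynamic shadows on the draw call count can be sketched with a simple
counting model. This is a deliberate simplification, not how Unity or Unreal decides what to batch:
it assumes objects sharing the same mesh and material collapse into one draw call, and that objects
receiving dynamic shadows are never batched, as noted above. The scene contents and field names
are hypothetical.

```python
def draw_calls_unbatched(objects):
    """One draw call per object."""
    return len(objects)


def draw_calls_batched(objects):
    """One draw call per distinct (mesh, material) pair, plus one per unbatchable object."""
    batches = set()
    unbatched = 0
    for obj in objects:
        if obj["dynamic_shadow"]:
            unbatched += 1  # objects receiving dynamic shadows are drawn individually
        else:
            batches.add((obj["mesh"], obj["material"]))
    return len(batches) + unbatched


# Hypothetical scene: many repeated static props and a few shadow-receiving crates.
scene = (
    [{"mesh": "rock", "material": "stone", "dynamic_shadow": False}] * 100
    + [{"mesh": "tree", "material": "bark", "dynamic_shadow": False}] * 50
    + [{"mesh": "crate", "material": "wood", "dynamic_shadow": True}] * 10
)
print(draw_calls_unbatched(scene))  # 160
print(draw_calls_batched(scene))    # 12
```

In this model, sharing materials is what allows objects to fall into the same batch, and every object
that receives dynamic shadows adds a draw call of its own, which is why both suggestions reduce the
load on the render thread.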

Figure 8: The call stack of the game thread T8292.

For the game thread, the top three bottlenecks are

1. Set up pre-requirements for parallel processing of animation evaluation (36.4 percent)

2. Redraw view ports (21.2 percent)
3. Process Mouse Move Event (21.2 percent)

These bottlenecks can be reduced by lowering the number of view ports and the overhead of
parallel animation evaluation on the CPU side. Use single-thread processing instead if only a few
animation nodes are used, and examine the usage of mouse control on the CPU side.

Task threads (T8288, T4672, T8308):

Figure 9: The call stack of the task thread T8288.

Figure 10: The call stack of the task thread T4672.

Figure 11: The call stack of the task thread T8308.

For the task threads, bottlenecks are mostly located in physics-related simulations such as cloth
simulation, animation evaluation, and particle system update.

Table 2 shows a summary of the CPU hotspots (percent of clockticks) during GPU bubble periods.

THREAD          FUNCTION                                     CLOCKTICK %   THREAD TOTAL %
Render thread   Base pass rendering for static meshes        13.1%         22.1%
                Initialization of dynamic shadows            4.5%
                Compute view visibility                      4.5%
Game thread     Set up pre-requirements for parallel
                processing of animation evaluation           7.7%          16.7%
                Redraw view ports                            4.5%
                Process Mouse Move Event                     4.5%
Physics         Cloth simulation                             13.5%         22%
                Animation evaluation                         4%
                Particle system update                       4.5%
Driver                                                                     4.4%

Table 2: CPU hotspots during GPU bubble periods before optimization.
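
The per-thread figures in Table 2 are the sums of the per-function figures, which can be verified
with a few lines. The sketch below simply re-enters the table values; the dictionary layout and
function name are hypothetical.

```python
# Clocktick percentages from Table 2 (before optimization).
hotspots = {
    "Render thread": {
        "Base pass rendering for static meshes": 13.1,
        "Initialization of dynamic shadows": 4.5,
        "Compute view visibility": 4.5,
    },
    "Game thread": {
        "Set up pre-requirements for parallel animation evaluation": 7.7,
        "Redraw view ports": 4.5,
        "Process Mouse Move Event": 4.5,
    },
    "Physics": {
        "Cloth simulation": 13.5,
        "Animation evaluation": 4.0,
        "Particle system update": 4.5,
    },
    "Driver": {"Driver": 4.4},
}


def thread_totals(table):
    """Sum the per-function clocktick percentages for each thread."""
    return {thread: round(sum(funcs.values()), 1) for thread, funcs in table.items()}


totals = thread_totals(hotspots)
for thread, pct in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{thread}: {pct}%")
```

Sorting by total puts the render thread (22.1 percent) and the physics work (22 percent) at the top,
which is where the optimizations described in the next section were concentrated.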

Optimization
After implementing some of the optimizations, including Level of Detail (LOD), instanced stereo
rendering, dynamic shadow removal, deferred CPU tasks, and optimized physics, the frame rate
increased from 36.4 fps on Oculus Rift* DK2 (1920x1080) to 71.4 fps on HTC Vive* (2160x1200). The
GPU utilization also increased from 54.7 percent to 74.3 percent due to fewer CPU bottlenecks.

Figures 12 and 13 show the GPU utilization of Pangu* before and after optimization, respectively, as
seen from the GPU work queue.

Figure 12: The GPU utilization of Pangu* before optimization.

Figure 13: The GPU utilization of Pangu* after optimization.

(Figure 14 labels: GPU bubble of 2.62 ms marked "CPU bottleneck leads to GPU bubble"; CPU threads
shown: render thread, game thread, three task threads, driver thread.)

Figure 14: A timeline view of Pangu* in GPUView after optimization.

Figure 14 shows the Pangu* VR workload in GPUView after optimization. The CPU bottleneck period
decreased from 7.37 ms to 2.62 ms, which was achieved by the following optimizations:

• Running start of the render thread (a method that reduces the CPU bottleneck by introducing
extra MTP latency) [5]

• Reduction of the number of draw calls and their overhead, including the adoption of LOD and
Instanced Stereo Rendering and the removal of dynamic shadowing

• Deferral of work in the game thread and task threads

Figure 15 shows the call stack of the CPU render thread in the CPU bottleneck period, as marked in
the red rectangle shown in Figure 14.

Figure 15: The call stack of the render thread T10404.

Table 3 shows a summary of the CPU hotspots (percent of clockticks) during GPU bubble periods after
optimization. Note that many of the hotspots and threads were removed from the CPU bottleneck as
compared to Table 2.

THREAD          FUNCTION                                 CLOCKTICK %   THREAD TOTAL %
Render thread   Base pass rendering for static meshes    44.3%         52.2%
                Render occlusion                         7.9%
Driver                                                                 38.5%

Table 3: CPU hotspots during GPU bubble periods after optimization.

More optimizations, such as actor merging or using fewer materials, can be done to optimize the
static mesh rendering in the render thread and further improve the frame rate. If CPU tasks were fully
optimized, the processing time of a single frame could be further reduced by 2.62 ms (the period of CPU
bottleneck in a single frame) to 11.38 ms, which is equivalent to 87.8 fps on average.
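
The projection above follows from the measured frame rate and the remaining CPU bottleneck period.
The sketch below reproduces the arithmetic, assuming the whole 2.62 ms could be removed from every
frame; the function name is hypothetical.

```python
def projected_fps(current_fps, removable_ms):
    """Frame rate if removable_ms could be cut from every frame."""
    frame_ms = 1000.0 / current_fps
    if removable_ms >= frame_ms:
        raise ValueError("cannot remove more time than the frame takes")
    return 1000.0 / (frame_ms - removable_ms)


# 71.4 fps after optimization, with 2.62 ms of CPU bottleneck left per frame.
print(round(projected_fps(71.4, 2.62), 1))  # 87.8
```

At 71.4 fps a frame takes about 14.0 ms, so removing 2.62 ms leaves roughly 11.4 ms per frame, which
is still slightly above the 11.1 ms budget required for 90 fps.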

Table 4 shows the performance metrics before and after the optimization.

Metric                                   System Idle   Pangu* on Oculus Rift* DK2   Pangu* on HTC Vive*
                                                       (before optimization)        (after optimization)
GPU Core Clock (MHz)                     135           1337.6                       1316.8
GPU Memory Clock (MHz)                   162           1749.6                       1749.6
GPU Memory Used (MB)                     184           1727.71                      2253.03
GPU Load (%)                             0             49.64                        78.29
Average Frame Rate (fps)                 N/A           36.4                         71.4
Draw Calls (/frame)                      0             4437                         845
Processor(_Total)\Processor Time (%)     1.04          13.58                        31.37
Processor Information(_Total)\
  Processor Frequency (MHz)              800           2700                         2700

Processor Time (%) values given in parentheses in the original table (one per logical processor):
  System Idle:           5.73/0.93/0.49/0.29/0.7/0.37/0.24/0.2
  Before optimization:   30.20/10.54/26.72/3.76/12.72/8.16/12.27/4.29
  After optimization:    46.63/27.72/33.34/18.42/39.77/19.04/46.29/19.76

Table 4: Basic performance metrics of the game before and after optimization.
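
The headline changes in Table 4 can be expressed as ratios. The sketch below re-enters three rows of
the table; note that the before and after columns were measured on different HMDs and resolutions, so
the ratios are indicative rather than a like-for-like comparison. The dictionary keys and function
name are hypothetical.

```python
before = {"frame_rate_fps": 36.4, "draw_calls": 4437, "gpu_load_pct": 49.64}
after = {"frame_rate_fps": 71.4, "draw_calls": 845, "gpu_load_pct": 78.29}


def summarize_change(before, after):
    """Return the frame rate gain, draw call reduction, and GPU load increase."""
    return {
        "frame_rate_gain_x": round(after["frame_rate_fps"] / before["frame_rate_fps"], 2),
        "draw_call_reduction_x": round(before["draw_calls"] / after["draw_calls"], 2),
        "gpu_load_gain_points": round(after["gpu_load_pct"] - before["gpu_load_pct"], 2),
    }


print(summarize_change(before, after))
```

The result is a frame rate gain of about 1.96X, a draw call reduction of about 5.25X, and a GPU load
increase of 28.65 percentage points.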

Conclusion
In this article, we worked closely with Tencent* to profile and optimize the Pangu* VR workload on
premier HMDs in order to achieve 90 fps on Intel® Core™ i7 processors. After implementing some of
our recommendations, the frame rate increased from 36.4 fps on Oculus Rift* DK2 (1920x1080) to
71.4 fps on HTC Vive* (2160x1200), and the GPU utilization increased from 54.7 percent to 74.3
percent on average due to fewer CPU bottlenecks. The CPU-bound period in a single frame was also
reduced from 7.37 ms to 2.62 ms. Additional optimizations such as actor merging and texture atlasing
could further improve performance.

Profiling and analyzing a VR application with various tools gives insight into the behavior and
bottlenecks of the application. This is essential to VR performance optimization, since performance
metrics alone might not reflect the real bottlenecks. The methodology and tools discussed in this
article can be used to analyze VR applications developed with different game engines and VR
runtimes, and to determine whether the workload is bound by the CPU, the GPU, or both. Sometimes the
CPU has a larger impact on VR performance than the GPU because of draw call preparation, physics
simulation, lighting, or shadowing. After analyzing various VR workloads with performance issues, we
found that many of them were CPU bound, implying that CPU optimization can help improve the GPU
utilization, performance, and user experience of these applications.

References
[1] https://developer.microsoft.com/en-us/windows/hardware/windows-assessment-deployment-kit

[2] http://graphics.stanford.edu/~mdfisher/GPUView.html

[3] https://msdn.microsoft.com/en-us/library/windows/hardware/hh162981.aspx

[4] https://randomascii.wordpress.com/2015/09/24/etw-central/

[5] http://www.gdcvault.com/play/1021771/Advanced-VR

About the author

Finn Wong is a senior application engineer in the Intel Software and Solutions Group (SSG),
Developer Relations Division (DRD), Advanced Graphics Enabling Team (AGE Team). He joined Intel in
2012 and has been actively enabling third-party media, graphics, and perceptual computing
applications for the company’s PC products since then. Before joining Intel, Finn had seven years of
experience in the fields of video coding, digital image processing, computer vision, algorithms, and
performance optimization, and published several academic papers. Finn holds a bachelor's degree in
electrical engineering and a master's degree in communication engineering, both from National Taiwan
University.

Notices

Intel technologies’ features and benefits depend on system configuration and may require enabled
hardware, software or service activation. Performance varies depending on system configuration. Check
with your system manufacturer or retailer or learn more at intel.com.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by
this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising
from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All
information provided here is subject to change without notice. Contact your Intel representative to
obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause
deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be
obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

Intel, the Intel logo, and Intel Core are trademarks of Intel Corporation in the U.S. and/or other
countries.

*Other names and brands may be claimed as the property of others.

This sample source code is released under the Intel Sample Source Code License Agreement.

From Everand
Nintendo 64 Architecture: Architecture of Consoles: A Practical Analysis, #8
Rodrigo Copetti
No ratings yet
Game Boy / Color Architecture: Architecture of Consoles: A Practical Analysis, #2
From Everand
Game Boy / Color Architecture: Architecture of Consoles: A Practical Analysis, #2
Rodrigo Copetti
No ratings yet
Mega Drive Architecture: Architecture of Consoles: A Practical Analysis, #3
From Everand
Mega Drive Architecture: Architecture of Consoles: A Practical Analysis, #3
Rodrigo Copetti
No ratings yet
Sega Saturn Architecture: Architecture of Consoles: A Practical Analysis, #5
From Everand
Sega Saturn Architecture: Architecture of Consoles: A Practical Analysis, #5
Rodrigo Copetti
No ratings yet
Dreamcast Architecture: Architecture of Consoles: A Practical Analysis, #9
From Everand
Dreamcast Architecture: Architecture of Consoles: A Practical Analysis, #9
Rodrigo Copetti
No ratings yet
PC Engine / TurboGrafx-16 Architecture: Architecture of Consoles: A Practical Analysis, #16
From Everand
PC Engine / TurboGrafx-16 Architecture: Architecture of Consoles: A Practical Analysis, #16
Rodrigo Copetti
No ratings yet
OpenGL Deep Dive: Expert Techniques and Performance Optimization: OpenGL
From Everand
OpenGL Deep Dive: Expert Techniques and Performance Optimization: OpenGL
Kameron Hussain
No ratings yet
OpenGL to Vulkan: Mastering Graphics Programming
From Everand
OpenGL to Vulkan: Mastering Graphics Programming
Kameron Hussain
No ratings yet
Joint Photographic Experts Group: Unlocking the Power of Visual Data with the JPEG Standard
From Everand
Joint Photographic Experts Group: Unlocking the Power of Visual Data with the JPEG Standard
Fouad Sabry
No ratings yet

Performanceanalysisoptimizationforpcbasedvr-Applicationscpuperspective-699994 VR

Uploaded by

Performanceanalysisoptimizationforpcbasedvr-Applicationscpuperspective-699994 VR

Uploaded by

Performance Analysis and Optimization for PC-Based VR Applications: From the

The rendering pipeline in VR games and conventional games

Figure 1: The rendering pipeline in conventional games.

Figure 2: The rendering pipeline in VR games.

Background of the Pangu* VR workload

• Final development platform: SteamVR* v1463169981 and Unreal 4.11.2

The specifications of the test platform:

• Intel® Core™ i7-6820HK processor (4 cores, 8 threads) @ 2.7GHz

• NVIDIA GeForce* GTX980 16GB GDDR5

• Graphics Driver Version: 364.72

• Windows* 10 RTM Build 10586.164

Spotting the performance issues

GPU Core Clock (MHz) 135 1337.6

Table 1: Basic performance metrics of the game before optimization.

A deeper look into the performance issues

• The frame rate is about 37 fps.

• GPU utilization is about 50 percent.

• The VR workload was bounded by both the CPU and GPU:

Figure 4: A timeline view of Pangu* in GPUView.

CPU bottleneck leads to GPU bubble

Let’s look at the bottleneck of each CPU thread.

Figure 7: The call stack of the render thread T1864.

1. Base pass rendering for static meshes (50 percent)

Figure 8: The call stack of the game thread T8292.

For the game thread, the top three bottlenecks are

1. Set up pre-requirements for parallel processing of animation evaluation (36.4 percent)

Task threads (T8288, T4672, T8308):

Figure 10: The call stack of the task thread T4672.

Figure 11: The call stack of the task thread T8308.

THREAD FUNCTION CLOCKTICK %

Figure 12: The GPU utilization of Pangu* before optimization.

Figure 13: The GPU utilization of Pangu* after optimization.

CPU bottleneck leads to GPU bubble

Figure 14: A timeline view of Pangu* in GPUView after optimization.

Figure 15: The call stack of the render thread T10404.

THREAD FUNCTION CLOCKTICK %

Render Base pass 44.3% 52.2%

Render occlusion 7.9%

Metric                     System Idle   Pangu* on Oculus   Pangu* on HTC Vive*
GPU Core Clock (MHz)       135           1337.6             1316.8
GPU Memory Clock
GPU Memory Used (MB)       184           1727.71            2253.03
GPU Load (%)               0             49.64              78.29
Average Frame Rate (fps)   N/A           36.4               71.4
Draw Calls (/frame)        0             4437               845
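
To make the comparison figures above easier to interpret, the following minimal sketch converts the two average frame rates into per-frame times and computes the relative change in draw calls. The 90 fps target is an assumption (a common refresh rate for PC VR headsets), not a value from the table, and the names `frame_time_ms` and `summarize` are hypothetical helpers introduced only for this illustration.

```python
# Values copied from the comparison table; TARGET_FPS is an assumed
# headset refresh rate and does not come from the table.
TARGET_FPS = 90.0

oculus = {"fps": 36.4, "draw_calls": 4437, "gpu_load_pct": 49.64}
vive = {"fps": 71.4, "draw_calls": 845, "gpu_load_pct": 78.29}


def frame_time_ms(fps: float) -> float:
    """Convert frames per second into milliseconds spent per frame."""
    return 1000.0 / fps


def summarize(first: dict, second: dict) -> dict:
    """Derive frame times and relative changes between two measured runs."""
    return {
        "frame_ms_first": frame_time_ms(first["fps"]),
        "frame_ms_second": frame_time_ms(second["fps"]),
        "fps_ratio": second["fps"] / first["fps"],
        "draw_call_reduction_pct": 100.0
        * (1.0 - second["draw_calls"] / first["draw_calls"]),
        "budget_ms": frame_time_ms(TARGET_FPS),
    }


summary = summarize(oculus, vive)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the frame time drops from roughly 27.5 ms to roughly 14.0 ms, while an assumed 90 fps target would allow about 11.1 ms per frame. Because the two runs were measured on different headsets, the ratios are indicative rather than a strict like-for-like comparison.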

About the author

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation
