UnityPerformanceTuningBible En
This document was created with the goal of being used as a reference when you
have trouble with performance tuning of Unity applications.
Performance tuning is an area where past know-how can be utilized, and I feel that
it is an area that tends to be highly individualized. Those with no experience in this
field may have the impression that it is somewhat difficult. One of the reasons may
be that the causes of performance degradation vary widely.
However, the workflow of performance tuning can be molded. By following that
flow, it becomes easy to identify the cause of the problem and simply look for a
solution that fits the event. Knowledge and experience can help in the search for
a solution. Therefore, this document is designed to help you learn mainly about
"workflow" and "knowledge from experience".
Although it started as an internal document, we have made it public in the hope that
many people will read it and help brush it up. We hope it will be of some help to
those who read it.
GitHub
The repository for this book *1 is open to the public. Additions and corrections will
be made as needed. You can also point out corrections or suggest additions via a
PR or Issue. Please make use of it.
Disclaimer
The information in this document is provided for informational purposes only. There-
fore, any development, production, or operation using this document must be done at
your own risk and discretion. We assume no responsibility whatsoever for the results
of development, production, or operation based on this information.
*1 https://github.com/CyberAgentGameEntertainment/UnityPerformanceTuningBible/
Table of Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
About this document . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Organization of this Manual . . . . . . . . . . . . . . . . . . . . . . . . ii
GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Disclaimer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Chapter 2 Fundamentals 24
2.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.1 SoC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.2 iPhone, Android and SoC . . . . . . . . . . . . . . . . 26
2.1.3 CPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.4 GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1.5 Memory . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.6 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2 Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.2.1 Rendering Pipeline . . . . . . . . . . . . . . . . . . . . 43
2.2.2 Semi-transparent rendering and overdraw . . . . . . . 45
2.2.3 Draw calls, set-pass calls, and batching . . . . . . . . . 46
2.3 Data Representation . . . . . . . . . . . . . . . . . . . . . . . 47
2.3.1 Bits and Bytes . . . . . . . . . . . . . . . . . . . . . . . 48
2.3.2 Image . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.3.3 Image Compression . . . . . . . . . . . . . . . . . . . 51
2.3.4 Mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.5 Keyframe Animation . . . . . . . . . . . . . . . . . . . 54
2.4 How Unity Works . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4.1 Binaries and Runtime . . . . . . . . . . . . . . . . . . . 56
2.4.2 Asset entities . . . . . . . . . . . . . . . . . . . . . . . 60
2.4.3 Threads . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4.4 Game Loop . . . . . . . . . . . . . . . . . . . . . . . . 63
2.4.5 GameObject . . . . . . . . . . . . . . . . . . . . . . . . 66
2.4.6 AssetBundle . . . . . . . . . . . . . . . . . . . . . . . . 68
2.5 C# Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.7.2 Allocations . . . . . . . . . . . . . . . . . . . . . . . . . 153
3.8 Android Studio . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
3.8.1 Profile Method . . . . . . . . . . . . . . . . . . . . . . . 157
3.8.2 CPU Measurement . . . . . . . . . . . . . . . . . . . . 158
3.8.3 Memory Measurement . . . . . . . . . . . . . . . . . . 160
3.9 RenderDoc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
3.9.1 Measurement Method . . . . . . . . . . . . . . . . . . 162
3.9.2 How to View Capture Data . . . . . . . . . . . . . . . . 165
7.3.4 SRP Batcher . . . . . . . . . . . . . . . . . . . . . . . . 224
7.4 SpriteAtlas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
7.5 Culling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
7.5.1 Visual Culling . . . . . . . . . . . . . . . . . . . . . . . 230
7.5.2 Rear Culling . . . . . . . . . . . . . . . . . . . . . . . . 230
7.5.3 Occlusion culling . . . . . . . . . . . . . . . . . . . . . 231
7.6 Shaders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.6.1 Reducing the precision of floating-point types . . . . . 234
7.6.2 Performing Calculations with Vertex Shaders . . . . . . 234
7.6.3 Prebuild information into textures . . . . . . . . . . . . 235
7.6.4 ShaderVariantCollection . . . . . . . . . . . . . . . . . 236
7.7 Lighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.7.1 Real-time shadows . . . . . . . . . . . . . . . . . . . . 238
7.7.2 Light Mapping . . . . . . . . . . . . . . . . . . . . . . . 242
7.8 Level of Detail . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
7.9 Texture Streaming . . . . . . . . . . . . . . . . . . . . . . . . . 247
12.3.2 UniTask Tracker . . . . . . . . . . . . . . . . . . . . . . 302
CONCLUSION 305
Chapter 1
Getting Started with Performance Tuning
This chapter describes the preparation required for performance tuning and the flow
of the process.
First, we will cover what you need to decide on and consider before starting
performance tuning. If your project is still in the early stages, please take a look.
Even if your project is somewhat advanced, it is a good idea to check again whether
you have taken the information in this section into account. Next, we will explain
how to deal with an application that is experiencing performance degradation.
By learning how to isolate the cause of a problem and how to resolve it, you will be
able to carry out the full performance tuning flow.
1.1 Preliminary Preparation
Once mass production is started, the cost of change will be enormous. It takes time
to decide on the indicators introduced in this section, but do not rush; settle them firmly.
Suppose a project now in the post-production phase has a rendering bottleneck
on low-spec devices. Memory usage is already near its limit, so switching
to lower-load models based on distance is not an option, and the team
decides instead to reduce the number of vertices in the models.
First, the reduced data must be commissioned; a new purchase order will be
needed. Next, the director needs to check the quality again. And finally, the
changes must be debugged. This is a simplified description; in reality there
will be more detailed work and scheduling.
After mass production, there will be dozens to hundreds of assets to handle
this way. This is time-consuming and labor-intensive, and can be fatal to the
project.
To prevent such a situation, it is very important to build the most burdensome
scenes in advance and verify that they meet the indicators.
Item | Element
Frame rate | What frame rate to aim for at all times
Memory | Estimate the maximum memory usage per screen and determine a limit value
Transition time | What transition wait time is appropriate
Heat | How much heat is tolerable after X hours of continuous play
Battery | How much battery consumption is acceptable for X hours of continuous play
Table 1.1  Frame rate and memory are the most important indicators among the
The definition of the volume zone depends on the project. You may decide based
on market research or on other titles that can serve as benchmarks. Or, given how
long people now keep their mobile devices, you might use a mid-range device from
about four years ago as the benchmark for now. Even if the rationale is somewhat
vague, set a target to aim for; you can adjust it from there.
Let us consider an actual example. Suppose you have a project with the following
goals.
When the team verbalized these vague goals, the following metrics were
generated.
• Frame rate
– 60 fps in-game and 30 fps out-game, from a battery-consumption perspective.
• Memory
– To speed up transition times, the design retains some out-game resources
during in-game. The maximum amount of memory used shall be 1 GB.
• Transition time
– Transitions to in-game and out-game should be on the same level as
competing titles: within 3 seconds.
• Heat
– Same level as competing titles. The verified device does not get hot after
1 hour of continuous play (while not charging).
• Battery
Once you have determined the targets, you can verify them on a reference device.
Even if a target is not reached at all at first, it serves as a good indicator to work toward.
In this case, the theme of the game was smooth movement, so the frame
rate was set at 60 frames per second. A high frame rate is also desirable
for rhythm-action games and games with strict timing judgments, such as
first-person shooters (FPS). However, a high frame rate has drawbacks:
the higher the frame rate, the more battery power is consumed. In addition,
the more memory an application uses, the more likely it is to be killed by
the OS while suspended. Weighing these advantages and disadvantages,
decide on an appropriate target for each game genre.
The verification environment was Unity 2019.4.3 and Xcode 11.6, using the values
in the Memory section of Xcode’s Debug Navigator as a reference. Based on the
results, it is recommended to keep memory usage within 1.3 GB for devices with
2 GB of onboard memory, such as the iPhone 6S and 7. It can also be seen that
when supporting devices with 1 GB of onboard memory, such as the iPhone 6, the
memory constraints are much stricter. Another observation was that memory usage
on iOS 11 was significantly higher, possibly due to a different memory management
mechanism. When verifying, note that such differences between OS versions can
occur.
Figure 1.1  The test environment in this example is a little old, so some of the
measurements were re-taken in the latest environment at the time of writing:
Unity 2020.3.25 and 2021.2.0 with Xcode 13.3.1, building for an iPhone XR on
OS versions 14.6 and 15.4.1. There was no particular difference in the measured
values, so we believe the data is still reliable.
There are several benchmark applications, but I use Antutu as my benchmark,
because there is a website that compiles measurement data and volunteers actively
report their results.
• Screen resolution
• Number of objects displayed
• Shadows
• Post-effect function
• Frame rate
• Ability to skip CPU-intensive scripts, etc.
However, these options reduce the visual quality of the project, so consult with the
director and explore together what level is acceptable for the project.
1.2 Prevention
As with defects, performance degradation can have a variety of causes that
accumulate over time, increasing the difficulty of investigation. It is a good idea to
build a mechanism into your application that lets you notice problems as early as
possible. A simple and effective way to do this is to display the current application
status on the screen. It is recommended that at least the following elements be
displayed on the screen at all times:
While a declining frame rate can be noticed through the user’s experience, memory
problems often only surface as crashes. Simply displaying both on the screen at all
times, as in Figure 1.3, increases the probability of detecting memory leaks at an
early stage.
This display method can be improved further to be more effective. For example, if
the target frame rate is 30 frames per second, turn the display green for 25 to 30
fps, yellow for 20 to 25 fps, and red below that. This way, you can see at a glance
whether the application meets the criteria.
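The color-coding rule above can be expressed as a small helper. This is a minimal sketch in plain C#, assuming a 30 fps target; the `FpsColor` enum and `FpsIndicator` class are hypothetical names, not Unity APIs (in a real project you would feed the result into, say, a UI text color):

```csharp
using System;

// Hypothetical helper for the color-coded FPS display described above.
// Thresholds assume a 30 fps target: 25+ = green, 20-25 = yellow, below = red.
public enum FpsColor { Green, Yellow, Red }

public static class FpsIndicator
{
    public static FpsColor Classify(float fps)
    {
        if (fps >= 25f) return FpsColor.Green;   // within the target range
        if (fps >= 20f) return FpsColor.Yellow;  // warning zone
        return FpsColor.Red;                     // clearly below target
    }
}
```

Keeping the thresholds in one place also makes it easy to adjust them per title when the target frame rate changes.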
1.3 Work on performance tuning
Identify the cause of the problem, fix it, and confirm that the application has actually
become faster. This is the essential attitude for performance tuning.
First, crashes can be classified into two main types: "memory overflow" and
"program execution error". The latter is not the domain of performance tuning,
so it will not be covered in this document.
Next, "CPU and GPU processing time" probably accounts for the majority of
frame drops and long loading times. In the following sections, we will focus on
"memory" and "processing time" to dig deeper into performance degradation.
This is from the author’s experience: there were cases where some resources
were not released due to timing issues after resource release (after
UnloadUnusedAssets). Those unreleased resources were then released when
transitioning to the next scene. In contrast, a gradual increase in memory
usage with repeated transitions will eventually cause a crash. To separate
the former problem from the latter, this document recommends repeating
transitions several times during memory measurement.
Incidentally, when a problem like the former occurs, some object is probably
still holding a reference at the time of resource release and releasing it
afterwards. It is not fatal, but it is a good idea to investigate and resolve
the cause.
1.5 Let’s investigate memory leaks.
Profiler (Memory)
This profiler tool is included by default in the Unity editor, so measurements are
easy to perform. Basically, you should take a memory snapshot with "Detailed" and
"Gather object references" enabled and investigate from there. Unlike the other
tools, it does not support comparing snapshots of measured data. For more
information on how to use it, refer to "3.1.3 Memory".
Memory Profiler
This one must be installed from the Package Manager. It displays memory contents
graphically as a tree map. It is officially supported by Unity and is still updated
frequently. Since v0.5, tracking of reference relationships has been greatly
improved, so we recommend using the latest version. See "3.4 Memory Profiler" for
details on usage.
Heap Explorer
This must be installed from the Package Manager. It is a tool developed by an
individual, but it is very easy to use and lightweight. It tracks references in a list
format, which nicely complements the Memory Profiler of v0.4 and earlier, and it is
a good alternative when the v0.5 Memory Profiler is not available. See "3.5 Heap
Explorer" for details on how to use it.
1.6.1 Assets
If the Simple View shows a large amount under Assets, the cause may be
unnecessary assets or memory leaks. The Assets-related area here is the area
enclosed by the rectangle in Figure 1.6.
Please refer to Chapter 4 "Tuning Practice - Asset" for more information on what
to look out for in each asset.
1.6.2 GC (Mono)
If GC (Mono) is large in the Simple View, it is likely that a large GC.Alloc is occurring
at once, or that memory has become fragmented because GC.Alloc occurs every
frame. Either can cause extra expansion of the managed heap. In this case, you
should steadily reduce GC.Alloc.
See "2.1.5 Memory" for more information on managed heap. Similarly, details on
This item is shown as "GC" in Unity 2020.2 and later, and as "Mono" in
2020.1 and earlier. Both refer to the amount of managed heap space
occupied.
1.6.3 Other
Check for suspicious items in the Detailed View. For example, it is worth opening
the Other item once to investigate; comparing the values of each entry may reveal
outliers. See "2. Detailed view" for more information.
1.6.4 Plug-ins
So far we have used Unity’s measurement tools to isolate the cause of the problem.
However, Unity can only measure memory managed by Unity; memory allocated by
plug-ins is not measured. Check whether third-party products are allocating extra
memory.
For this, use a native measurement tool such as Instruments in Xcode. See "3.7
Instruments" for more information.
All of these changes have a large impact and may fundamentally affect the fun of
the game, so changing the specification is a last resort. Estimate and measure
memory early on so that it does not come to that.
Instantaneous processing slowdowns appear in measurements as a sharp,
needle-like processing load; because of this appearance they are also called spikes.
The measured data in Figure 1.8 shows both a sudden increase in steady-state load
and periodic spikes, and both events require performance tuning. First, the relatively
simple investigation of instantaneous load will be explained, followed by the
investigation of steady-state load.
1.9 Investigate the steady-state load
The fewer the allocations the better, but this does not mean allocations must be
zero. For example, there is no way to avoid the allocations that occur during
Instantiate processing. In such cases, pooling, in which objects are reused instead
of being generated each time, is effective. For more information on GC, refer to
"2.5.2 Garbage Collection".
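The pooling idea can be sketched in plain C# as follows. The `SimplePool<T>` class below is illustrative (not a Unity API): `Get()` reuses a previously returned instance instead of allocating a new one, so steady-state use causes no GC.Alloc.

```csharp
using System;
using System.Collections.Generic;

// Illustrative object pool: instances returned via Return() are handed
// back out by Get(), avoiding a fresh allocation on every use.
public class SimplePool<T> where T : new()
{
    private readonly Stack<T> _items = new Stack<T>();

    // Reuse a pooled instance if one exists; allocate only when empty.
    public T Get() => _items.Count > 0 ? _items.Pop() : new T();

    // Hand an instance back for later reuse (reset its state first).
    public void Return(T item) => _items.Push(item);
}
```

Recent Unity versions also provide `UnityEngine.Pool.ObjectPool<T>`, which follows the same get/release pattern with optional create and reset callbacks.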
• Instantiate processing
• Active switching of a large number of objects or objects in a deep hierarchy
• Screen capture processing, etc.
As this depends heavily on the project’s code, there is no one-size-fits-all solution.
When measurement reveals the cause, share the results with the project members
and discuss how to improve it.
A situation where the CPU is the bottleneck is called CPU-bound, and a situation
where the GPU is the bottleneck is called GPU-bound.
As an easy way to isolate the two, if any of the following apply to you, there is a
good chance that you are GPU-bound.
On the other hand, if none of these apply, you are likely CPU-bound. In the following
sections, we will explain how to investigate CPU-bound and GPU-bound situations.
on optimization.
Note that occlusion culling requires data to be prepared in advance, and memory
usage increases when that data is loaded into memory. Building precomputed
information in memory to improve performance like this is a common practice.
Since memory and performance are often traded off against each other, it is a
good idea to keep memory in mind when adopting such techniques.
Is batching appropriate?
Batching is the process of drawing multiple objects together in a single pass.
Because it improves drawing efficiency, batching is effective in GPU-bound
situations. For example, Static Batching can combine the meshes of multiple
immobile objects.
There are many batching methods, so some of the most common are listed below.
If any interest you, refer to "7.3 Reducing Draw Calls".
• Static Batching
• Dynamic Batching
• GPU Instancing
• SRP Batcher, etc.
Otherwise
In that case, the many individual GPU processes are simply piling up into a heavy
total load, and the only way forward is to steadily improve them one by one.
Also, as with the CPU-bound case, if the reduction target cannot be reached, it is a
good idea to go back to "1.1.4 Determine quality setting specifications." and
reconsider.
1.10 Conclusion
In this chapter, we have discussed what to watch out for "before" and "during" per-
formance tuning.
The things to watch out for before and during performance tuning are as follows
Chapter 2
Fundamentals
2.1 Hardware
Computer hardware consists of five main devices: input devices, output devices,
storage devices, computing devices, and control devices. These are called the five
major devices of a computer. This section summarizes the basic knowledge of these
hardware devices that are important for performance tuning.
2.1.1 SoC
A computer is composed of various devices. Typical devices include CPUs for control
and computation, GPUs for graphics computation, and DSPs for processing audio
and video digital data. In most desktop PCs and other devices, these are indepen-
dent as separate integrated circuits, which are combined to form the computer. In
smartphones, on the other hand, these devices are implemented on a single chip to
reduce size and power consumption. This is called a system-on-a-chip, or SoC.
The naming of Snapdragon has been a combination of the string "Snapdragon"
and a three-digit number.
These numbers have a meaning: the 800s are the flagship models and are used
in so-called high-end devices. The lower the number, the lower the performance
and price, and the 400s are the so-called low-end handsets.
Even for a device in the 400s, performance improves with a newer release date, so
it is difficult to generalize, but basically the higher the number, the higher the
performance.
Furthermore, it was announced in 2021 that the naming convention will be
changed to something like Snapdragon 8 Gen 1 in the future, as this naming
convention will soon run out of numbers.
These naming conventions are useful to keep in mind when tuning performance,
as they can be used as an indicator to determine the performance of a device.
2.1.3 CPU
The CPU (Central Processing Unit) is the brain of the computer, responsible not
only for executing programs but also for interfacing with the computer’s various
hardware components. When actually tuning performance, it is useful to know what
kind of processing the CPU performs and what characteristics it has, so we explain
it here from a performance perspective.
CPU Basics
What determines the execution speed of a program is not just raw arithmetic power
but how fast the CPU can work through the steps of a complex program. A program
contains not only the four arithmetic operations but also branching operations, and
the CPU does not know which instruction comes next until the program executes.
CPU hardware is therefore designed to process a wide variety of instructions in
rapid succession.
The flow of instructions inside the CPU is called a pipeline, and instructions are
processed while the next instruction is predicted. When a prediction fails, a pause
called a pipeline stall occurs and the pipeline is reset. The majority of stalls are
caused by branching: branch prediction anticipates the outcome to some extent, but
it can still be wrong. Performance tuning is possible without memorizing the CPU’s
internal structure, but just knowing this much will make you more conscious of
avoiding branches in loops when writing code.
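As a toy illustration of branch avoidance, the sketch below sums the positive elements of an array two ways. The bit-mask trick is illustrative (valid for values above int.MinValue), and whether it is actually faster depends on the data and the JIT; the point is only that the second loop contains no branch for the predictor to miss.

```csharp
using System;

// Two equivalent loops: one branches per element, one replaces the
// branch with bit arithmetic. On unsorted data the branchy version can
// stall the pipeline through mispredictions; the branchless one cannot.
public static class BranchDemo
{
    public static int SumPositiveBranchy(int[] data)
    {
        int sum = 0;
        foreach (int v in data)
            if (v > 0) sum += v;          // branch inside the hot loop
        return sum;
    }

    public static int SumPositiveBranchless(int[] data)
    {
        int sum = 0;
        foreach (int v in data)
        {
            // isPos is 1 when v > 0, else 0 (valid for v != int.MinValue).
            int isPos = (int)((uint)(-v) >> 31);
            sum += v & -isPos;            // adds v or 0, with no branch
        }
        return sum;
    }
}
```

As always, measure before and after; on sorted or predictable data the branchy version may be just as fast.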
The advantage of asymmetric cores is that normally only the power-saving cores are
used to conserve battery power, and the cores can be switched when performance is
required, such as in games. Note, however, that the maximum parallel performance
is reduced by the power-saving cores, so the number of cores alone cannot be used
to judge the performance of asymmetric cores.
Whether a program can make full use of multiple cores also depends on how its
parallel processing is written. For example, a game engine may streamline physics
by running the physics engine in a separate thread, or parallelism may be exploited
through Unity’s JobSystem. Since the game’s main loop itself cannot be parallelized,
however, high per-core performance is still advantageous even on multi-core
devices.
programs can quickly access the data they need. There are three levels of cache
memory: L1, L2, and L3. The smaller the number, the faster the cache but the
smaller its capacity. Therefore, the CPU cache cannot store all data, only the most
recently handled data.
▲ Figure 2.5 Relationship between the CPU L1, L2, and L3 caches and main memory
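The cache's effect shows up in ordinary code through access patterns. The sketch below (illustrative, plain C#) sums the same 2D array twice: row-major order walks memory contiguously and is cache-friendly, while column-major order strides across rows and, on large arrays, misses the cache far more often even though both compute the same result.

```csharp
using System;

// Cache-friendly vs cache-hostile traversal of the same 2D array.
// C# rectangular arrays are laid out row-major in memory.
public static class CacheDemo
{
    public static long SumRowMajor(int[,] a)
    {
        long sum = 0;
        for (int y = 0; y < a.GetLength(0); y++)
            for (int x = 0; x < a.GetLength(1); x++)
                sum += a[y, x];           // contiguous access
        return sum;
    }

    public static long SumColumnMajor(int[,] a)
    {
        long sum = 0;
        for (int x = 0; x < a.GetLength(1); x++)
            for (int y = 0; y < a.GetLength(0); y++)
                sum += a[y, x];           // strided access across rows
        return sum;
    }
}
```

On a small array the difference is invisible; on arrays larger than the cache, the row-major loop can be several times faster.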
2.1.4 GPU
While CPUs specialize in executing programs GPU(Graphics Processing Unit) is
a hardware specialized for image processing and graphics rendering.
GPU Basics
GPUs are designed to specialize in graphics processing, so their structure is very
different from that of CPUs: they are built to process a large number of simple
calculations in parallel. For example, to convert an image to black and white, a CPU
must, pixel by pixel, read the RGB values at a coordinate from memory, convert
them to grayscale, and write them back. Such a process involves no branching, and
the calculation for each pixel does not depend on the results of other pixels, so it is
easy to perform the calculations for all pixels in parallel.
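The grayscale conversion can be sketched as follows, in plain C# standing in for what a fragment or compute shader would do per pixel. The Rec.601 luma weights used here are a common convention, not something specific to Unity; note that each output pixel depends only on its own input, which is exactly why the work parallelizes so well.

```csharp
using System;

// Per-pixel grayscale conversion: the same math a shader would run
// in parallel for every pixel, written here as a sequential CPU loop.
public static class Grayscale
{
    // Luma = 0.299 R + 0.587 G + 0.114 B (Rec.601 weights).
    public static byte ToGray(byte r, byte g, byte b) =>
        (byte)(0.299 * r + 0.587 * g + 0.114 * b);

    // rgb holds r,g,b triples; gray receives one byte per pixel.
    public static void ConvertToGray(byte[] rgb, byte[] gray)
    {
        for (int i = 0; i < gray.Length; i++)
            gray[i] = ToGray(rgb[i * 3], rgb[i * 3 + 1], rgb[i * 3 + 2]);
    }
}
```

Since no pixel reads another pixel's result, the loop body could be handed to as many parallel workers as there are pixels, which is what the GPU does.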
Therefore, GPUs can apply the same operation to a large amount of data in parallel
at high speed, and as a result graphics processing is fast. Graphics processing in
particular requires a large number of floating-point operations, at which GPUs excel.
For this reason, a performance index called FLOPS, the number of floating-point
operations per second, is commonly used. Since computing power alone is hard to
interpret, an indicator called fill rate, the number of pixels that can be drawn per
second, is also used.
GPU Memory
GPUs, of course, also require memory space for temporary storage to process data.
Normally, this area is dedicated to the GPU, unlike main memory. Therefore, to
perform any kind of processing, data must be transferred from main memory to GPU
memory. After processing, the data is returned to main memory in the reverse order.
Note that if the amount of data to be transferred is large, for example, transferring
multiple high-resolution textures, the transfer takes time and becomes a processing
bottleneck.
In mobile devices, however, main memory is generally shared between the CPU
and GPU rather than being dedicated to the GPU. This has the advantage that the
GPU’s share of memory can change dynamically, but the disadvantage that transfer
bandwidth is shared between the CPU and GPU. Even in this case, data must still
be transferred between the CPU and GPU memory areas.
GPGPU
GPUs can also be used for general-purpose computation that benefits from
massive parallelism at high speed, which CPUs are not good at. This is called
GPGPU (General Purpose GPU). In particular, GPUs are often used for
machine learning (AI) and computational workloads such as blockchain, which
has sharply increased demand for GPUs, driving up prices among other effects.
In Unity, GPGPU can be used through a feature called Compute Shader.
2.1.5 Memory
Basically, all data is held in main memory; the CPU holds only the data needed for
the current calculation. It is not possible to use more memory than the physical
capacity, so if too much is used, memory can no longer be allocated and the process
is forcibly terminated by the OS. This is commonly referred to as OOM (Out Of
Memory), and the terminated process is said to have been "killed". As of 2022, the
majority of smartphones are equipped with 4 to 8 GB of memory. Even so, you
should be careful not to use too much.
Also, as mentioned above, since memory is separated from the CPU, performance
itself will vary depending on whether or not memory-aware implementation is used.
In this section, we will explain the relationship between programs and memory so
that performance-conscious implementation can be performed.
Memory Hardware
Although placing main memory inside the SoC would be advantageous given the
physical distance, memory is not included in the SoC. One reason is that, if it were,
the amount of installed memory could not be varied from device to device. However,
slow main memory would noticeably affect program execution speed, so a relatively
fast bus is used to connect the SoC and memory. The memory and bus standard
commonly used in smartphones is LPDDR. There are several generations of
LPDDR, with theoretical transfer rates of several Gbps. Of course, the theoretical
performance is not always achieved, but in game development this is rarely a
bottleneck, so there is little need to be aware of it.
Memory and OS
Within an OS, there are many processes running simultaneously, mainly system pro-
cesses and user processes. The system processes play an important role in running
the OS, and most of them reside in the OS as services and continue to run regard-
less of the user’s intention. On the other hand, user processes are processes that
are started by the user and are not essential for the OS to run.
There are two display states for apps on smartphones: foreground (foremost) and
background (hidden). Generally, when a particular app is in the foreground, other
apps are in the background. While an app is in the background, the process exists
in a suspended state to facilitate the return process, and memory is maintained as it
is. However, when the memory used by the entire system becomes insufficient, the
process is killed according to the priority order determined by the OS. At this time,
the most likely to be killed are user applications (≒ games) in the background that
are using a lot of memory. In other words, games that use a lot of memory are more
likely to be killed when they are moved to the background, resulting in a worse user
experience when returning to the game and having to start all over again.
If there is no other process to kill when a process tries to allocate memory, the
process itself will be killed. In some cases, such as on iOS, a single process is
prevented from using more than a certain percentage of physical memory, so there
is a hard limit on how much can be allocated. As of 2022, the limit for an iOS device
with 3 GB of RAM, a mainstream capacity, is around 1.3 to 1.4 GB, so this is likely
the practical upper limit when creating games.
Memory Swap
In reality, there are many different hardware devices, some of which have very small
physical memory capacity. In order to run as many processes as possible on such
terminals, the OS tries to secure virtual memory capacity in various ways. This is
memory swap.
One method used in memory swap is memory compression. Physical capacity
is saved by compressing and storing in memory, mainly memory that will not be
accessed for a while. However, because of the compression and decompression
costs, it is not done for areas that are actively used, but for applications that have
gone to the background, for example.
Another technique is to write unused memory out to storage. On hardware with ample
Chapter 2 Fundamentals
2.1 Hardware
The heap, on the other hand, is a memory area that can be freely used within
the program. Whenever the program needs it, it can issue a memory allocation
instruction (malloc in C) to allocate and use a large amount of data. Of course, when
the program finishes using the memory, it needs to release it (free). In C#, memory
allocation and deallocation are automatically performed at runtime, so implementors
do not need to do this explicitly.
Since the OS does not know when and how much memory will be needed, it allocates
memory from free space on demand. If a contiguous block cannot be allocated when
allocation is attempted, the system is considered out of memory. The keyword
"contiguous" is important here. In general, repeated allocation and deallocation of
memory results in memory fragmentation. When memory is fragmented, there may be
no contiguous free block even though the total free space is sufficient. In such a case,
the OS first tries to expand the heap: it allocates new memory to the process, thereby
securing contiguous space. However, since the memory of the entire system is finite,
the OS will kill the process when there is no memory left to allocate.
A Stack Overflow error occurs when stack memory is exhausted, typically by
recursive function calls. The default stack size on iOS/Android is 1MB, so
this error becomes more likely as recursion grows deeper. In general, it can
be prevented by switching to an algorithm that does not recurse, or to one
whose recursion cannot become too deep.
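As a sketch of the first approach (an algorithm that does not recurse), the tree-counting example below replaces recursion with an explicit Stack&lt;T&gt;, so traversal depth is limited by heap size rather than the 1MB thread stack. The Node type is illustrative, not from the book:

```csharp
using System.Collections.Generic;

public static class NodeCounter
{
    public class Node
    {
        public List<Node> Children = new List<Node>();
    }

    // Recursive version: a deep tree risks Stack Overflow (1MB stack on iOS/Android).
    public static int CountRecursive(Node node)
    {
        int count = 1;
        foreach (var child in node.Children)
            count += CountRecursive(child);
        return count;
    }

    // Iterative version: the pending nodes live on the heap, not the thread stack.
    public static int CountIterative(Node root)
    {
        int count = 0;
        var stack = new Stack<Node>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            var node = stack.Pop();
            count++;
            foreach (var child in node.Children)
                stack.Push(child);
        }
        return count;
    }
}
```

Both methods count the same nodes; only the iterative one stays safe as the tree grows arbitrarily deep.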
2.1.6 Storage
When you actually proceed with tuning, you may notice that it often takes a long time
to read a file. Reading a file means reading data from the storage where the file is
stored and writing it to memory so that it can be handled by the program. Knowing
what is actually happening there is useful when tuning.
First, a typical hardware architecture will have dedicated storage for persistent
data. Storage is characterized by its large capacity and its ability to persist data
without a power supply (nonvolatile). Taking advantage of this feature, a vast amount
of assets as well as the program of the application itself are stored in the storage,
and are loaded from the storage and executed at startup, for example.
However, the process of reading and writing to this storage is very slow compared
to the program execution cycle from several perspectives.
• The physical distance from the CPU is greater than that of memory, resulting
in large latency and slow read/write speeds.
• There is a lot of waste because reads are done in block units, including the
commanded data and its surroundings.
• Sequential read/write is fast, while random read/write is slow.
The slowness of random reads/writes is particularly important. Reading or writing a
single file in order from its beginning is sequential access. Reading or writing multiple
parts of one file, or many small files at once, is random access. Note that even files
in the same directory are not necessarily located physically next to each other, so if
they are physically far apart, accessing them is effectively random.
When reading a file from storage, the details are omitted, but the process
roughly follows this flow:
1. The program commands the storage controller to read the region of storage holding the file
2. The storage controller receives the command and calculates the physical area where the data is located
3. The data is read
4. The data is written to memory
5. The program accesses the data in memory
Also, typical storage achieves performance and space efficiency by writing a single
file in blocks of around 4KB. These blocks are not necessarily physically contiguous,
even within a single file. The state in which a file is physically scattered is called
fragmentation, and the operation that eliminates it is called defragmentation. While
fragmentation was often a problem on the HDDs that were once the mainstay of PCs,
it has largely disappeared with the advent of flash storage. You do not need to worry
about file fragmentation on smartphones, but it is worth keeping in mind when
targeting PCs.
In the PC world, HDDs and SSDs are the most common types of storage.
You may never have seen an HDD: it records data on spinning disks,
somewhat like CDs, with heads that move over the disks to read the
magnetism. Structurally it is large, and the physical movement involved
gives it high latency. In recent years, SSDs have become popular. Unlike
HDDs, SSDs involve no physical movement and are therefore fast, but they
have a limited number of write cycles (a lifespan), so they wear out under
frequent writes. Smartphones do not use SSDs as such, but they use the
same kind of flash memory, called NAND.
• Storage read/write is surprisingly slow; do not expect memory-like speed
• Reduce the number of files to be read/written at the same time as much as
possible (e.g., distribute timing, consolidate files into a single file, etc.)
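As a rough illustration of the second bullet, the sketch below (file names and sizes are arbitrary assumptions) writes the same data as 100 small files and as one combined file, then times reading each back with Stopwatch. On real devices the gap depends heavily on OS caching and storage type, so treat this as a measurement harness, not a benchmark result:

```csharp
using System.Diagnostics;
using System.IO;

public static class StorageReadDemo
{
    // Writes the same payload as 100 small files and as 1 combined file,
    // then times reading each back. Returns elapsed ticks for each variant.
    public static (long manySmall, long single) Compare(string dir)
    {
        Directory.CreateDirectory(dir);
        var payload = new byte[4096];
        for (int i = 0; i < 100; i++)
            File.WriteAllBytes(Path.Combine(dir, $"part_{i}.bin"), payload);

        var combined = Path.Combine(dir, "combined.bin");
        using (var fs = File.Create(combined))
            for (int i = 0; i < 100; i++)
                fs.Write(payload, 0, payload.Length);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++)
            File.ReadAllBytes(Path.Combine(dir, $"part_{i}.bin"));
        long manySmall = sw.ElapsedTicks;

        sw.Restart();
        File.ReadAllBytes(combined);
        long single = sw.ElapsedTicks;
        return (manySmall, single);
    }
}
```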
2.2 Rendering
In games, rendering workloads often have a significant impact on performance.
Knowledge of rendering is therefore essential for performance tuning, and this
section covers its fundamentals.
*1 https://maxim-saplin.github.io/cpdt_results/
The rendering pipeline starts with sending the necessary data from the CPU to
the GPU. This data includes the coordinates of the vertices of the 3D model to be
rendered, the coordinates of the lights, the material information of the objects, the
camera information, and so on.
At this point, the data sent (the 3D model's vertex coordinates, the camera's position,
orientation, angle of view, and so on) are all separate pieces of data. The GPU
combines this information and calculates where each object will appear on the
screen when viewed through the camera. This process is called coordinate
transformation.
Once the position of the object on the screen is determined, the next step is to
determine its color. The GPU calculates this by asking, in effect, "what color does
each corresponding pixel on the screen become when light strikes the object?"
In the above process, "where on the screen the object appears" is determined by a
program called the vertex shader, and "what color each corresponding pixel on the
screen becomes" is calculated by a program called the fragment shader.
These shaders can be freely written. Therefore, writing heavy processing in the
vertex shaders and fragment shaders will increase the processing load.
Also, the vertex shader runs for every vertex in the 3D model, so the more vertices
there are, the greater the processing load. The fragment shader's load likewise grows
with the number of pixels to be rendered.
In the actual rendering pipeline, there are many processes other than vertex
shaders and fragment shaders, but since the purpose of this document is to
understand the concepts necessary for performance tuning, we will only give
a brief description.
First, consider the case where both of these objects are opaque. In this case, the
objects in front of the camera are drawn first. In this way, when drawing the object in
the back, the part of the object that is not visible because it overlaps the object in the
front does not need to be processed. This means that the fragment shader operation
can be skipped in this area, thus optimizing the processing load.
On the other hand, if both objects are semi-transparent, it would be unnatural if the
back object is not visible through the front object, even if it overlaps the front object.
In this case, the drawing process is performed starting with the object in the back
as seen from the camera, and the color of the overlapping area is blended with the
already drawn color.
There are several ways to implement the rendering pipeline. Of these, the
description in this section assumes forward rendering. Some points may not
be partially applicable to other rendering methods such as deferred rendering.
2.3 Data Representation
different from that of the object rendered in the previous draw call, the CPU must set
the texture and other information on the GPU again. This is done with a SetPass call
and is a relatively heavy process. Since it runs on the CPU's render thread, it adds
CPU load, and too many of them can hurt performance.
Unity has a feature called draw call batching to reduce draw calls. It combines, in
CPU-side processing, the meshes of objects that share the same texture and other
settings (i.e., the same material) so that they can be drawn with a single draw call.
There are two kinds: dynamic batching, which batches at runtime, and static
batching, which creates the combined mesh in advance.
The Scriptable Render Pipeline also provides a mechanism called the SRP Batcher.
With it, SetPass calls can be combined into one even when meshes and materials
differ, as long as the shader variant is the same. This does not reduce the number
of draw calls, but it does reduce SetPass calls, which are the most expensive part.
For more information on these batching arrangements, see "7.3 Reducing Draw
Calls".
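Static batching can also be triggered at runtime via StaticBatchingUtility. The sketch below is illustrative (the field name environmentRoot is an assumption), and the usual constraint of static batching applies: the combined objects can no longer move.

```csharp
using UnityEngine;

public class BatchAtRuntime : MonoBehaviour
{
    // Root whose child renderers share materials; the name is illustrative.
    [SerializeField] private GameObject environmentRoot;

    private void Start()
    {
        // Combines the meshes under the root so they can be drawn together.
        StaticBatchingUtility.Combine(environmentRoot);
    }
}
```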
GPU Instancing
If we use two bits, we can express the range that can be represented by two
digits of binary numbers, in other words, four combinations. Since there are four
combinations, it is possible to express, for example, which key was pressed: up,
down, left, or right.
Similarly, 8 bits can represent 2^8 = 256 combinations. At this point, quite a variety
of information can be expressed. These 8 bits are given the unit name of 1 byte. In
other words, one byte is a unit that can express 256 different values.
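A small sketch of the ideas above: one byte holds 256 values, and the four directions mentioned earlier fit comfortably as bit flags (the enum is illustrative):

```csharp
using System;

public static class BitDemo
{
    // Four directions as bit flags; all four fit in a single byte,
    // which can represent 256 distinct values in total.
    [Flags]
    public enum Direction : byte
    {
        None  = 0,
        Up    = 1 << 0,
        Down  = 1 << 1,
        Left  = 1 << 2,
        Right = 1 << 3,
    }

    // Tests whether a particular key's bit is set in the combined state.
    public static bool IsPressed(Direction state, Direction key)
        => (state & key) != 0;
}
```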
There are also units that represent larger numbers, such as the kilobyte (KB),
which represents 1,000 bytes, and the megabyte (MB), which represents 1,000 kilo-
bytes.
2.3.2 Image
Image data is represented as a set of pixels. For example, an 8 × 8 pixel image
consists of a total of 8 × 8 = 64 pixels.
In this case, each pixel has its own color data. So how is color represented in
digital data?
First, color is created by combining four elements: red (Red), green (Green), blue
(Blue), and transparency (Alpha). These are called channels, and each channel is
represented by the initial letters RGBA.
In the commonly used True Color method of color representation, each RGBA
value is expressed in 256 steps. As explained in the previous section, 256 steps
means 8 bits. In other words, True Color can be represented with 4 channels × 8
bits = 32 bits of information.
Thus, for example, an 8 × 8 pixel True Color image has 8 pixels × 8 pixels × 4
channels × 8 bits = 2,048 bits = 256 bytes. For a 1,024 × 1,024 pixel True Color
image, its information content would be 1,024 pixels × 1,024 pixels × 4 channels ×
8 bits = 33,554,432 bits = 4,194,304 bytes = 4,096 kilobytes = 4 megabytes.
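The arithmetic above can be expressed as a small helper:

```csharp
public static class TextureSizeEstimator
{
    // Uncompressed True Color: 4 channels x 8 bits = 4 bytes per pixel.
    public static long UncompressedBytes(int width, int height)
        => (long)width * height * 4;
}
```

For example, UncompressedBytes(8, 8) gives 256 bytes, and UncompressedBytes(1024, 1024) gives 4,194,304 bytes (4 MB), matching the calculations in the text.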
In Unity, various compression methods can be specified for each platform using
the texture import settings. Therefore, it is common to import an uncompressed
image and apply compression according to the import settings to generate the final
texture to be used.
2.3.4 Mesh
In 3DCG, a three-dimensional shape is expressed by connecting many triangles in
3D space. This collection of triangles is called a mesh.
Since the vertex information is stored in a single array, we need additional infor-
mation to indicate which of the vertices will be combined to form a triangle. This is
called the vertex index and is represented as an array of type int that represents the
index of the array of vertex information.
Additional information is needed for texturing and lighting objects. For example,
mapping a texture requires UV coordinates. Lighting also requires information such
as vertex color, normals, and tangents.
The following table summarizes the main vertex information and the amount of
information per vertex.
It is important to determine the number of vertices and the types of vertex information
in advance, because mesh data grows as the vertex count and the amount of
information per vertex increase.
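A rough way to estimate this, using typical per-attribute sizes (these numbers are common assumptions, not the book's table): position float3 = 12 bytes, normal float3 = 12 bytes, tangent float4 = 16 bytes, 32-bit color = 4 bytes, one float2 UV set = 8 bytes.

```csharp
public static class MeshSizeEstimator
{
    // Per-vertex attribute sizes below are assumed typical values.
    public static long EstimateVertexBytes(int vertexCount,
        bool normals, bool tangents, bool colors, int uvSets)
    {
        long perVertex = 12;              // position (float3)
        if (normals)  perVertex += 12;    // normal (float3)
        if (tangents) perVertex += 16;    // tangent (float4)
        if (colors)   perVertex += 4;     // 32-bit color
        perVertex += 8L * uvSets;         // each UV set (float2)
        return perVertex * vertexCount;
    }
}
```

For example, 1,000 vertices with normals and one UV set come to (12 + 12 + 8) × 1,000 = 32,000 bytes under these assumptions.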
In addition to time and value, keyframes have other information such as tangents
and their weights. By using these in the interpolation calculation, more complex
animations can be realized with less data.
2.4 How Unity Works
In keyframe animation, the more keyframes there are, the more complex the an-
imation can be. However, the amount of data also increases with the number of
keyframes. For this reason, the number of keyframes should be set appropriately.
There are methods to compress the amount of data by reducing the number of
keyframes while keeping the curves as similar as possible. In Unity, keyframes can
be reduced in the model import settings as shown in the following figure.
The reason for going through IL at all is that a binary converted to machine language
can only run on a single platform. With IL, any platform can run the program simply
by providing a runtime for that platform, eliminating the need to prepare binaries per
platform. The basic principle of Unity, then, is that the IL obtained by compiling the
source code is executed on the runtime for each environment, which is how
multi-platform support is achieved.
IL2CPP
As mentioned above, Unity basically compiles C# into IL code and executes it on a
runtime, but around 2015 a problem arose in some environments: 64-bit support for
apps on iOS and Android. As noted, C# requires a runtime in each environment to
execute IL code. Until then, Unity's runtime was a fork of Mono, a long-standing OSS
implementation of the .NET Framework, which Unity had modified for its own use. In
other words, for Unity to become 64-bit compatible, the forked Mono had to be made
64-bit compatible. That would of course have required a tremendous amount of work,
so Unity instead overcame the challenge by developing a technology called IL2CPP.
IL2CPP is, as the name suggests, IL to CPP, a technology that converts IL code
to C++ code. Since C++ is a highly versatile language that is natively supported in
any development environment, it can be compiled into machine language in each
development tool chain once it is output to C++ code. Therefore, 64-bit support is
*2 https://sharplab.io/
the job of the toolchain, and Unity does not have to deal with it. Unlike C#, C++ code
is compiled into machine language at build time, eliminating the need to convert it to
machine language at runtime and improving performance.
Although C++ code generally has the disadvantage of taking a long time to build,
the IL2CPP technology has become a cornerstone of Unity, solving 64-bit compati-
bility and performance in one fell swoop.
Unity Runtime
By the way, although Unity allows developers to program games in C#, the runtime
of Unity itself, called the engine, does not actually run in C#. The source itself is
written in C++, and the part called the player is distributed pre-built to run in each
environment. There are several possible reasons why Unity writes its engine in C++.
Since the C# code written by developers runs as C#, Unity consists of two areas:
the engine part, which runs natively, and the user-code part, which runs on the C#
runtime. The engine and user code exchange data as needed during execution. For
example, when GameObject.transform is called from C#, because all game execution
state such as the scene is managed inside the engine, the call first crosses into
native code to access the data in native memory and then returns the value to C#.
Note that memory is not shared between C# and native, so data needed on the C#
side is allocated on the C# side each time. API calls are also expensive because
they involve native calls, so a common optimization is to cache values rather than
calling the API frequently.
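A minimal sketch of that caching technique (the component and field names are illustrative): fetch the native-backed reference once and reuse it, instead of crossing the C#/native boundary every frame.

```csharp
using UnityEngine;

public class Mover : MonoBehaviour
{
    private Transform _cachedTransform;

    private void Awake()
    {
        // Property accesses like 'transform' involve a native call,
        // so fetch the reference once instead of on every access.
        _cachedTransform = transform;
    }

    private void Update()
    {
        _cachedTransform.Translate(Vector3.forward * Time.deltaTime);
    }
}
```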
*3 https://github.com/Unity-Technologies/UnityCsReference
tiple asset loads and leaks increases. This is because developers mainly profile and
debug the C# side. The execution state of the C# side alone is hard to interpret; it
must be analyzed against the engine-side execution state. Profiling of the native area
depends on the APIs Unity provides, which limits the available tools. This document
introduces analysis methods using a variety of tools, and they will be easier to follow
if you stay aware of the boundary between C# and native.
2.4.3 Threads
A thread is a unit of program execution, and a program generally proceeds by
creating multiple threads within a single process. Since a single CPU core can only
process one thread at a time, it handles multiple threads by switching between them
at high speed. This switching is called a context switch. Context switches incur
overhead, so if they occur frequently, processing efficiency drops.
When a program is executed, the underlying main thread is created, from which
the program creates and manages other threads as needed. Unity’s game loop is
designed to run on a single thread, so scripts written by users will basically run on
the main thread. Conversely, attempting to call Unity APIs from a thread other than
The main thread and the render thread run like a pipeline: the main thread starts
computing the next frame while the render thread is still rendering the current one.
However, if the render thread's processing time grows, the main thread cannot hand
the next frame off even when its own calculations are finished, and it has to wait. In
game development, be aware that the FPS will drop if either the main thread or the
render thread becomes too heavy.
1. Processing input from controllers such as keyboard, mouse, touch display, etc.
2. Calculating the game state that should progress in one frame of time
3. Rendering the new game state
4. Waiting until the next frame depending on the target FPS
This loop is repeated to output the game as video. If processing within a single frame
takes longer, the FPS naturally drops.
*4 https://docs.unity3d.com/ja/current/Manual/ExecutionOrder.html
2.4.5 GameObject
As mentioned above, since the Unity engine itself runs natively, the Unity API in C#
is, for the most part, an interface for calling internal native APIs. The same applies to
GameObject and to MonoBehaviour, which defines the components attached to it:
both always hold native references from the C# side. However, if the native side
manages the data while the C# side also holds references to it, an inconvenience
arises at destruction time: when the data is destroyed on the native side, the engine
cannot forcibly delete the references held in C#.
In fact, List 2.1 checks if the destroyed GameObject is null, but true is output in the
log. This is unnatural for standard C# behavior, since _gameObject is not assigned
null, so there should still be a reference to an instance of type GameObject.
*5 https://tsubakit1.hateblo.jp/entry/2018/04/17/233000
System.Collections.IEnumerator DelayedDestroy()
{
    // cache WaitForSeconds to reuse
    var waitOneSecond = new UnityEngine.WaitForSeconds(1f);
    yield return waitOneSecond;
    Destroy(_gameObject);
    yield return waitOneSecond;
    // logs "True" even though null was never assigned to _gameObject
    UnityEngine.Debug.Log(_gameObject == null);
}
// Excerpt from UnityEngine.Object: the == operator is overloaded
public static bool operator==(Object x, Object y) {
    return CompareBaseObjects(x, y);
}
*6 https://github.com/Unity-Technologies/UnityCsReference/blob/c84064be69f20dcf21ebe4a7bbc176d48e2f289c/Runtime/Export/Scripting/UnityEngineObject.bindings.cs
// Excerpt (continued): part of the aliveness check used by CompareBaseObjects;
// the preceding lines are omitted in this excerpt
    if (o is MonoBehaviour || o is ScriptableObject)
        return false;
    return DoesObjectWithInstanceIDExist(o.GetInstanceID());
}
2.4.6 AssetBundle
Games for smartphones are limited by the size of the app, and not all assets can be
included in the app. Therefore, in order to download assets as needed, Unity has a
mechanism called AssetBundle that packs multiple assets and loads them dynami-
cally. At first glance, this may seem easy to handle, but in a large project, it requires
careful design and a good understanding of memory and AssetBundle, as memory
can be wasted in unexpected places if not designed properly. Therefore, this section
describes what you need to know about AssetBundle from a tuning perspective.
In short, uncompressed gives the fastest load times, but its large file size wastes
storage space on smartphones, making it essentially unusable. LZMA, on the other
hand, yields the smallest files but has the drawbacks of slow decompression and,
due to the nature of the algorithm, no partial decompression. LZ4 is a compression
setting with a good balance of speed and file size, and as the name
ChunkBasedCompression suggests, it supports partial decompression, so parts can
be loaded without decompressing the whole file as LZMA requires.
AssetBundle also has Caching.compressionEnabled, which changes the
compression setting used when a bundle is cached on the device. In other words, by
delivering with LZMA and converting to LZ4 on the device, the download size can be
minimized while still enjoying the benefits of LZ4 at load time. However,
recompressing on the device costs that much more CPU, and temporarily consumes
extra memory and storage.
2.5 C# Basics
This section describes the language specification and program execution behavior
of C#, which is essential for performance tuning.
2.5 C# Basics
GarbageCollector.GCMode = GarbageCollector.Mode.Disabled;
But of course, if GC.Alloc occurs while collection is disabled, heap space keeps
being extended and consumed, and the app eventually crashes when no more
memory can be allocated. Since memory usage can easily balloon, the feature must
be implemented so that no GC.Alloc happens at all while collection is disabled. The
implementation cost is high, so practical uses are limited (e.g., disabling collection
only during the firefight portion of a shooting game).
In addition, Incremental GC can be selected starting with Unity 2019. With
Incremental GC, garbage collection work is spread across frames, so large spikes
can be reduced. However, for games that must squeeze the most out of each
frame's time budget, it is ultimately still necessary to write code that avoids GC.Alloc.
Specific examples are discussed in "10.1 GC.Alloc cases and how to deal with them".
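As a taste of what avoiding GC.Alloc looks like in practice (details are in Chapter 10), the sketch below reuses a single preallocated list instead of allocating a new one per call; the class name and capacity are illustrative:

```csharp
using System.Collections.Generic;

public class EnemyScanner
{
    // Preallocated once and reused every call, so steady-state calls
    // perform no GC.Alloc. Capacity 64 is an arbitrary example value.
    private readonly List<int> _results = new List<int>(64);

    public List<int> FindEven(int[] ids)
    {
        _results.Clear();        // Clear keeps the allocated capacity
        foreach (var id in ids)
            if (id % 2 == 0)
                _results.Add(id);
        return _results;         // no new list allocated per call
    }
}
```

The trade-off is that the returned list is only valid until the next call, a constraint that must be documented for callers.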
*7 https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/choosing-between-class-and-struct
Handling Arrays
Arrays of value types are allocated inline: the array elements are the value-type
instances themselves. In an array of reference types, by contrast, the elements are
references (addresses) to the reference-type instances. Allocating and deallocating
an array of value types is therefore much cheaper than for reference types. In
addition, in most cases arrays of value types greatly improve spatial locality of
reference, which raises the CPU cache hit rate and tends to make processing faster.
Value Copying
When a reference type is assigned, the reference (address) is copied. When a value
type is assigned, on the other hand, the entire value is copied. An address is 4 bytes
in a 32-bit environment and 8 bytes in a 64-bit environment, so assigning a reference
type is cheaper than assigning a value type larger than the address size.
Also, in terms of data exchange (arguments and return values) using methods,
the reference type passes the reference (address) by value, whereas the value type
passes the instance itself by value.
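The listing that the next paragraph refers to is not reproduced in this excerpt; the following is an assumed reconstruction (the field layout is invented for illustration):

```csharp
public struct MyStruct
{
    // 4 long fields: 32 bytes copied on every pass-by-value (assumed layout)
    public long A, B, C, D;
}

public class MyClass
{
    public long A, B, C, D;
}

public static class CopyCostDemo
{
    // The whole 32-byte struct is copied into the parameter.
    public static long Sum(MyStruct s) => s.A + s.B + s.C + s.D;

    // Only the 4- or 8-byte reference is copied, however large MyClass grows.
    public static long Sum(MyClass c) => c.A + c.B + c.C + c.D;
}
```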
For example, in this method, the entire value of MyStruct is copied. This means
that as the size of MyStruct increases, so does the copy cost. On the other hand,
the MyClass method only copies the reference to myClass as a value, so even if the
size of MyClass increases, the copy cost will remain constant because it is only for
the address size. Since the increase in copy cost is directly related to the processing
load, the appropriate choice must be made according to the size of the data to be
handled.
Immutability
Changes made to an instance of a reference type will affect other locations that
reference the same instance. On the other hand, a copy of an instance of a value
Pass-by-Reference
When a reference type is passed by value, the reference (address) itself is
copied, so reassigning the parameter to another instance does not affect the
caller's variable. Passing by reference, however, allows the caller's variable
itself to be replaced.
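A minimal sketch of this difference, with illustrative types:

```csharp
public static class RefDemo
{
    public class Item { public int Value; }

    // Pass-by-value of a reference type: reassigning the parameter
    // does not affect the caller's variable.
    public static void ReplaceByValue(Item item) => item = new Item { Value = 99 };

    // Pass-by-reference: the caller's variable itself is replaced.
    public static void ReplaceByRef(ref Item item) => item = new Item { Value = 99 };
}
```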
Boxing
Boxing is the process of converting a value type to an object type or to an interface
type. A box is an object allocated on the heap and subject to garbage collection, so
excessive boxing and unboxing results in GC.Alloc. In contrast, casting a reference
type involves no boxing.
▼ List 2.7 Boxing when a value type is cast to an object type
int num = 0;
object obj = num; // boxing
num = (int)obj;   // unboxing
No one would write such obviously pointless boxing on purpose, but what about
boxing that happens implicitly inside a method call?
▼ List 2.8 Example of boxing caused by an implicit cast
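The body of List 2.8 is missing from this excerpt; a typical example of the kind of implicit boxing it likely shows (names are assumed):

```csharp
public static class BoxingDemo
{
    // object parameter: passing an int here boxes it on the heap (GC.Alloc).
    public static string Describe(object value) => value.ToString();

    // Generic version: the int stays a value type, no boxing occurs.
    public static string DescribeGeneric<T>(T value) => value.ToString();
}
```

Calling Describe(42) implicitly casts the int to object and boxes it; DescribeGeneric(42) produces the same result without the allocation.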
• Guideline for choosing a struct: avoid defining a struct unless the type has all of
the following characteristics
– It logically represents a single value, as primitive types ( int, double, etc.) do
– Its instance size is under 16 bytes
– It is immutable
– It will not have to be boxed frequently
There are types that do not meet these criteria yet are defined as structs. Types
used frequently in Unity, such as Vector4 and Quaternion, are defined as structs
even though they are not under 16 bytes. Check how to handle such types efficiently,
and if copy costs are mounting, choose an approach that works around them. In
some cases, consider writing your own optimized version with equivalent
functionality.
2.6 Algorithms and computational complexity
▼ Table 2.5 Number of data items and computation steps for the major complexity classes
To illustrate each complexity class, let's look at a few code samples. First, O(1),
which indicates a constant amount of computation independent of the number of
data items.
Setting aside the usefulness of such a method, the work is clearly independent of the
array's length and takes a constant number of steps (here, one).
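The O(1) listing referred to above is not reproduced in this excerpt; it is presumably something along these lines:

```csharp
public static class ComplexityO1
{
    // O(1): the work does not depend on the array length.
    public static int First(int[] array) => array[0];
}
```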
Next, let's look at an O(n) code example.
▼ List 2.10 Code example of O(n)
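The body of List 2.10 is not reproduced in this excerpt; based on the description that follows, it is presumably along these lines:

```csharp
public static class ComplexityOn
{
    // O(n): the worst case scans every element once.
    public static bool ContainsOne(int[] array)
    {
        foreach (var value in array)
            if (value == 1)
                return true;
        return false;
    }
}
```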
Here, if the array contains the value 1, the method simply returns true. If 1 happens
to be at the beginning of the array, the method finishes as fast as possible; but if 1
first appears at the very end, or not at all, the loop runs all the way through, taking
n iterations. This worst case is what O(n) denotes, and you can see that the amount
of computation grows with the number of data items.
Next, let's look at an example whose worst case is O(n²).
▼ List 2.11 Code example of O(n²)
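The body of List 2.11 is mostly missing from this excerpt; based on the description that follows, it is presumably along these lines:

```csharp
public static class ComplexityOn2
{
    // O(n^2): the double loop compares every pair, n * n in the worst case.
    public static bool HasCommonValue(int[] a, int[] b)
    {
        foreach (var x in a)
            foreach (var y in b)
                if (x == y)
                    return true;
        return false;
    }
}
```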
This is just a method that uses a double loop to return true if the two arrays share
any value. The worst case is when nothing matches at all, taking n² iterations.
As a side note, computational complexity keeps only the highest-order term. If
we wrote a method that executed each of the three methods above once, its
complexity would be O(n²), not O(n² + n + 1).
Note also that complexity is only a guideline for when the number of data items
is sufficiently large, and it does not map directly onto measured time. Even
something that looks enormous, such as O(n⁵), may be no problem at all when
the data count is small. Use complexity as a reference, and each time measure
the processing time, taking the data count into account, to confirm it fits within
a reasonable range.
List<T>
This is the most commonly used collection. Its underlying data structure is an array.
It is effective when element order matters, or when elements are frequently read or
updated by index. On the other hand, inserting or deleting an element requires
copying every element after the affected index, so if insertions and deletions are
frequent, List<T> is best avoided.
Also, when Add exceeds the current capacity, the internal array is reallocated at
twice the current Capacity. Set an appropriate initial capacity so that no expansion
occurs and Add stays O(1).
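A sketch of the initial-capacity advice:

```csharp
using System.Collections.Generic;

public static class ListCapacityDemo
{
    public static List<int> FillWithKnownCount(int count)
    {
        // Setting the capacity up front avoids the grow-and-copy that
        // otherwise happens each time Add exceeds the current Capacity.
        var list = new List<int>(count);
        for (int i = 0; i < count; i++)
            list.Add(i);
        return list;
    }
}
```

Since the element count never exceeds the initial capacity, no reallocation (and no extra GC.Alloc) happens during the loop.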
LinkedList<T>
The data structure of LinkedList<T> is a linked list. A linked list is a basic data
structure in which each node holds a reference to the next node. C#’s LinkedList<T>
is a doubly linked list, so each node holds references to both the previous and the
next node. LinkedList<T> is strong at adding and deleting elements, but poor at
accessing a specific element by position. It is suitable for temporarily holding data
that needs to be added to or removed from frequently.
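A small standalone sketch of the add/remove pattern LinkedList<T> is suited for (the values are arbitrary):

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListExample
{
    public static void Main()
    {
        var list = new LinkedList<int>();

        // O(1) insertion at either end; each node links to its neighbors.
        LinkedListNode<int> first = list.AddLast(1);
        list.AddLast(2);
        list.AddLast(3);
        list.AddAfter(first, 10); // O(1) insertion next to a known node

        // O(1) removal when you already hold the node reference.
        list.Remove(first);

        // By contrast, reaching "the i-th element" requires walking the
        // nodes from the head: O(n), unlike an array-backed List<T>.
        Console.WriteLine(string.Join(",", list)); // 10,2,3
    }
}
```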
Queue<T>
Queue<T> is a collection class that implements FIFO (first in, first out). It is used to
implement so-called queues, for example to manage input operations. Internally,
Queue<T> uses a circular array. An element is added at the tail with Enqueue, and the
head element is removed with Dequeue. When adding beyond the capacity, the internal
array is expanded. Peek retrieves the head element without removing it. As the
computational complexity suggests, Enqueue and Dequeue keep performance high, but
the structure is not suited to operations such as traversal. TrimExcess is a method
that reduces the capacity, but from a performance-tuning perspective it is better to
set an appropriate initial capacity so that the capacity is neither increased nor
decreased in the first place, which further exploits the strengths of Queue<T>.
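The input-buffering use case mentioned above can be sketched as follows (the capacity and input names are illustrative):

```csharp
using System;
using System.Collections.Generic;

public static class QueueExample
{
    public static void Main()
    {
        // Initial capacity chosen up front (illustrative value) so that
        // Enqueue does not trigger expansion of the internal circular array.
        var inputs = new Queue<string>(16);

        inputs.Enqueue("Up");     // added at the tail: O(1)
        inputs.Enqueue("Down");
        inputs.Enqueue("Attack");

        Console.WriteLine(inputs.Peek());    // "Up" — inspect the head without removal
        Console.WriteLine(inputs.Dequeue()); // "Up" — removed from the head: O(1)
        Console.WriteLine(inputs.Dequeue()); // "Down" — FIFO order preserved
    }
}
```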
Stack<T>
Stack<T> is a collection class that implements LIFO (last in, first out). Stack<T> is
implemented on top of an array. An element is added with Push, and the most recently
added element is removed with Pop. Peek retrieves the top element without removing
it. A common use is implementing screen transitions: the scene information for each
destination is stored with Push, and when the back button is pressed, the previous
scene information is retrieved with Pop. As with Queue<T>, high performance is
obtained by sticking to Push and Pop. Avoid searching for elements, and be careful
not to repeatedly grow or shrink the capacity.
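The screen-transition pattern described above can be sketched like this (the scene names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class SceneHistoryExample
{
    public static void Main()
    {
        // LIFO history of visited scenes; only Push/Pop/Peek are used,
        // so every operation stays O(1).
        var history = new Stack<string>();

        history.Push("Title");  // hypothetical scene names
        history.Push("Home");
        history.Push("Shop");

        // Back button pressed: return to the most recently visited scene.
        Console.WriteLine(history.Pop());  // "Shop"
        Console.WriteLine(history.Peek()); // "Home" — the next destination, not removed
    }
}
```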
Dictionary<TKey, TValue>
While the collections introduced so far are ordered, Dictionary<TKey, TValue> is a
collection class that specializes in lookup by key. Its data structure is a hash table
(a kind of associative array). The structure is like a dictionary in which each key has
a corresponding value (in a real dictionary, words are the keys and their definitions
are the values). Dictionary<TKey, TValue> has the disadvantage of consuming more
memory, but lookups run in O(1), which is fast. It is very useful in cases that do not
require enumeration or traversal and where the emphasis is on looking up values.
Also, be sure to pre-set the capacity.
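A minimal sketch of key-based lookup with a pre-set capacity (the capacity and entries are illustrative):

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryExample
{
    public static void Main()
    {
        // Capacity pre-set (illustrative value) to avoid rehashing as
        // entries are added; lookups by key are O(1) on average.
        var wordBook = new Dictionary<string, string>(64)
        {
            ["cache"] = "a store of computed results",
            ["heap"]  = "memory area for dynamic allocation",
        };

        // O(1) lookup by key, regardless of how many entries exist.
        Console.WriteLine(wordBook["heap"]);

        // TryGetValue avoids an exception when the key may be missing.
        if (wordBook.TryGetValue("stack", out var description))
            Console.WriteLine(description);
        else
            Console.WriteLine("not found");
    }
}
```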
For the second and subsequent calls, we first check whether the result is
cached, and if it is, we simply return the cached result and exit. In this way,
no matter how large the amount of computation may be on the first call, it is
reduced to O(1) on the second and subsequent calls. If the set of arguments
that can be passed is known in advance, it is also possible to perform the
computations and cache the results before the game starts, so that the method
effectively always returns in O(1).
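The caching pattern just described can be sketched as follows. This is a minimal example, not from the book; Fibonacci stands in for any expensive computation, and a Dictionary serves as the cache:

```csharp
using System;
using System.Collections.Generic;

public sealed class FibonacciCache
{
    // Results computed so far; a cache hit is an O(1) dictionary lookup.
    private readonly Dictionary<int, long> _cache = new Dictionary<int, long>();

    public long Get(int n)
    {
        if (_cache.TryGetValue(n, out var cached))
            return cached; // second and subsequent calls return here

        // First call: pay the full computation cost once, then cache it.
        long result = n <= 1 ? n : Get(n - 1) + Get(n - 2);
        _cache[n] = result;
        return result;
    }
}

public static class FibonacciCacheExample
{
    public static void Main()
    {
        var fib = new FibonacciCache();
        Console.WriteLine(fib.Get(40)); // first call: computed and cached
        Console.WriteLine(fib.Get(40)); // second call: O(1) cache hit
    }
}
```

If the arguments are known ahead of time, the same cache can be filled during a loading screen so that gameplay-time calls only ever hit the O(1) path.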
Chapter 3
Profiling Tools
Profiling tools are used to collect and analyze data, identify bottlenecks, and
determine performance metrics. Several such tools are provided by the Unity engine
alone. Others include platform-native tools such as Xcode and Android Studio, and
GPU-specific tools such as RenderDoc. It is therefore important to understand the
features of each tool and choose appropriately. This chapter introduces each tool
and discusses profiling methods, with the aim of helping you use each one
appropriately.
3.1 Unity Profiler
The following functions, common to the entire Profiler tool, are also useful.
In the "Profiler Modules" section of Figure 3.3, ① lists the items that each module
measures. Clicking an item toggles its display on the timeline to the right.
Displaying only the necessary items makes the view easier to read. You can also
reorder the items by dragging them, and the graphs on the right are displayed in
that order. ② is a function for saving and loading the measured data. It is
recommended to save measurement results when needed. Note that only the data
currently displayed in the profiler can be saved.
This book explains the CPU Usage and Memory modules shown in Figure 3.1, as they
are the most frequently used.
simply to click the measurement button during execution, so the details are omitted.
Because Deep Profile uses a lot of memory, measurement may not be possible
in a large project due to insufficient memory. In that case, you have no choice
but to add your own measurement code, referring to "Supplement: About Sampler"
in the "3.1.2 CPU Usage" section.
There are two ways to configure these settings: by explicitly specifying them in a
script or by using the GUI. First, we will introduce the method of setting from a script.
The Unity Editor used for measurement does not have to be the one that produced the
build. It is recommended to create a new, empty project for measurement, as it is
lightweight.
Next, for Android, there are a few more steps than for iOS.
The adb forward command requires the Package Name of the application; for
example, if the Package Name is "jp.co.sample.app", specify that name in the command.
If adb is not recognized, add the adb directory to your PATH. There are many
instructions on the web for setting up adb, so they are omitted here.
For simple troubleshooting, if you cannot connect, check the following
As an additional note, if you run the application via Build And Run, the adb forward
command described above is executed internally, so no manual command input is
required for measurement.
Autoconnect Profiler
First, the Hierarchy view is explained in terms of what it shows and how to use it.
1. Hierarchy View
The Hierarchy view looks like Figure 3.7.
This view is characterized by the fact that the measurement results are arranged
in a list format and can be sorted by the items in the header. When conducting an
investigation, bottlenecks can be identified by opening items of interest in the list.
Note, however, that the displayed information shows only the time spent in the
currently selected thread. For example, if you are using the Job System or
multi-threaded rendering, processing time in other threads is not included. To
check those, select the thread as in Figure 3.8.
Because the Hierarchy view combines multiple calls of the same function into a
single item with a Calls count, it is easy to read. However, it does not show whether
all of those calls took equal time or only one of them was long. In such cases, the
Raw Hierarchy view is used. The Raw Hierarchy view differs from the Hierarchy view
in that Calls is always fixed at 1. Figure 3.9 shows multiple calls of the same
function displayed in the Raw Hierarchy view.
To summarize what has been said so far, the Hierarchy view is used for the follow-
ing purposes
When opening an item, it is often the case that there is a deep hierarchy. In this
case, you can open all levels of the hierarchy by holding down the Option key
on a Mac (Alt key on Windows). Conversely, closing an item while holding down
the key will close everything below that hierarchy.
2. Timeline View
The Timeline view is the other way to inspect the measurement results.
In the timeline view, items in the hierarchy view are visualized as boxes, so you
can intuitively see where the load is at a glance when viewing the entire view. And
because it is mouse-accessible, even deep hierarchies can be grasped simply by
dragging. In addition, with timelines, there is no need to switch threads; all threads
are displayed. This makes it easy to see when and what kind of processing is taking
place in each thread. Because of these features, timelines are mainly used for the
following purposes
Timeline is not suited for sorting operations to determine the order of heavy pro-
cessing, or for checking the total amount of allocations. Therefore, the Hierarchy
View is better suited for tuning allocations.
using UnityEngine.Profiling;

/* ... Omitted ... */

private void TestMethod()
{
    for (int i = 0; i < 10000; i++)
    {
        Debug.Log("Test");
    }
}

private void OnClickedButton()
{
    Profiler.BeginSample("Test Method");
    TestMethod();
    Profiler.EndSample();
}
The embedded sample will be displayed in both the Hierarchy and Timeline views.
There is one more feature worth mentioning. If the build is not a Development Build,
these profiling calls are disabled, so there is zero overhead. It may be a good idea
to put them in place in advance in areas where the processing load is likely to
increase in the future.
The BeginSample method is a static function, so it is easy to use, but there is also
a CustomSampler that provides similar functionality. CustomSampler was added in
Unity 2017 and has less measurement overhead than BeginSample, so it can measure
times more accurately.
using UnityEngine.Profiling;

/* ... Omitted ... */

private CustomSampler _samplerTest = CustomSampler.Create("Test");

private void TestMethod()
{
    for (int i = 0; i < 10000; i++)
    {
        Debug.Log("Test");
    }
}

private void OnClickedButton()
{
    _samplerTest.Begin();
    TestMethod();
    _samplerTest.End();
}
3.1.3 Memory
The Memory module is displayed as in Figure 3.12.
• Simple view
• Detailed view
First, we will explain the contents and usage of the Simple view.
1. Simple View
The Simple view looks like Figure 3.13.
The meanings of the items listed to the right of Total Used Memory in Figure 3.13
are as follows.
As an additional note on terminology, starting with Unity 2019.2, "Mono" was
renamed "GC" and "FMOD" was renamed "Audio".
Figure 3.13 also lists the number of assets used and the amount of memory
allocated for the following:
• Texture
• Mesh
• Material
• Animation Clip
• Audio Clip
The following information on the number of objects and GC Allocation is also avail-
able.
Asset Count
Total number of assets loaded.
The Simple view in Unity 2021 and later has a greatly improved UI, making it
easier to see the items displayed. There are no major changes in the content
itself, so the knowledge introduced here can be used as is. Note, however, that
some of the names have been changed. For example, GC has been renamed
Managed Heap.
2. Detailed View
The Detailed view looks like Figure 3.15.
The results in this view are obtained by clicking the "Take Sample" button, which
takes a snapshot of the memory at that point in time. Unlike the Simple view, this
view is not updated in real time, so to refresh it you must Take Sample again.
To the right of the "Take Sample" button in Figure 3.15, there is an item called
"Referenced By". It shows the objects that reference the currently selected object.
If an asset is leaking, information about what references it may help solve the
problem. This display only appears when "Gather object references" is enabled.
Enabling it increases the processing time of Take Sample, but it is basically
recommended to leave it enabled.
In Referenced By, you may see the notation ManagedStaticReferences(). This
means that it is referenced by some static object. If you are familiar with the
project, this information may be enough to give you some idea. If not, we rec-
ommend using "3.5 Heap Explorer".
The header items of the Detailed view are not explained here, since their meanings
are self-evident. The operation is the same as in "1. Hierarchy View" of "3.1.2 CPU
Usage": each header can be sorted, and the items are displayed hierarchically.
What follows explains the top nodes displayed under the Name column.
You may not be familiar with the items listed under Others among the top nodes.
The following are the ones worth knowing.
System.ExecutableAndDlls
Indicates the amount of allocation used for binaries, DLLs, and so on. Depending
on the platform or device, it may not be obtainable, in which case it is treated
as 0B. The memory load on the project is not as large as the listed
3.2 Profile Analyzer
and visualizing the results of optimization because it has a function for comparing
measurement data, which CPU Usage cannot do.
There are two modes of functionality: "Single" and "Compare". Single mode is
used to analyze a single measurement data, while Compare mode is used to com-
pare two measurement data.
"Pull Data" analyzes the data measured with the Unity Profiler and displays the
results.
"Save" and "Load" save and load the data analyzed by Profile Analyzer. Of course,
it is fine to keep only the Unity Profiler data; in that case, you must load the data
in the Unity Profiler and run Pull Data in Profile Analyzer each time. If that
procedure is troublesome, it is better to save the data in Profile Analyzer's own
format.
• Thread Summary
• Summary of selected markers
When Depth Slice is set to All, the top node PlayerLoop is displayed, and
different levels of the same process may appear mixed together, which can be
hard to read. In such cases, it is recommended to fix Depth at 2 or 3 so that
subsystems such as rendering, animation, and physics are displayed.
The mean is the value obtained by adding all values together and dividing
by the number of data points. The median, on the other hand, is the value
that lies in the middle of the sorted data; when the number of data points
is even, the median is the average of the two middle values.
The mean is susceptible to values that lie extremely far from the rest.
If there are frequent spikes, or the number of samples is insufficient,
it may be better to refer to the median.
Figure 3.23 is an example of a large difference between the median and
the mean.
Analyze your data after knowing the characteristics of these two values.
5. Frame Summary
This screen shows the frame statistics of the measured data.
This screen displays interval information for the frames being analyzed, and shows
the degree of variation in the values using a box plot and a histogram. Box plots
require an understanding of quartiles. Quartiles are the values defined over the
sorted data, as in Table 3.5.
The interval between the 25% and 75% quartiles is drawn as a box, which is why
this chart is called a box plot (box-and-whisker plot).
The histogram shows processing time on the horizontal axis and the number of
data on the vertical axis, which is also useful for viewing data distribution. In the
frame summary, you can check the interval and the number of frames by hovering
the cursor over them.
Once you understand how to read these diagrams, it is a good idea to analyze the
characteristics of your data.
6. Thread Summary
This screen shows statistics for the selected thread. You can see a box-and-whisker
diagram for each thread.
3.3 Frame Debugger
The usage of the screen is almost the same as in Single mode, but the labels "Left"
and "Right" appear on various screens, as in Figure 3.30.
These indicate which data set is which, and match the colors shown in Figure 3.29:
Left is the upper data and Right is the lower data. This mode makes it easier to
judge whether a tuning change improved things or not.
Start the application, select the device connection, and press "Enable" to display
the drawing instructions.
The left frame shows a single drawing instruction per item, with the instructions
issued in order from top to bottom. The right frame shows detailed information about
drawing instructions. You can see which Shader was processed with what properties.
Operation Panel
First, let’s look at the operation panel in the upper section.
The part marked "RT0" can be changed when there are multiple render targets.
This is especially useful when using multiple render targets to check the rendering
status of each target. Channels can be changed to display all RGBA or only one
of the channels. Levels is a slider that allows you to adjust the brightness of the
resulting rendering. This is useful, for example, to adjust the brightness of a dark
rendering, such as ambient or indirect lighting, to make it easier to see.
Drawing Overview
This area provides information such as the resolution and format of the render
target. With it, you will immediately notice if some target is being drawn at an
unexpectedly high resolution. Other information, such as the Shader name, Pass
settings such as Cull, and the keywords in use, can also be found here. The
sentence beginning "Why this~" at the bottom describes why the draw call could not
be batched. In the case of Figure 3.34, it states that batching was not possible
because this was the first draw call selected. Since the causes are described in
this much detail, you can rely on this information when working out how to batch
draws.
3.4 Memory Profiler
Sometimes it is necessary to check in detail the state of Texture2D displayed in
the property information. To do so, click on the image while holding down the
Command key on a Mac (Control key on Windows) to enlarge the image.
The UI of the Memory Profiler has changed significantly between v0.4 and later
versions. This book uses v0.5, which is the latest version at the time of writing.
For v0.4 or later versions, Unity 2020.3.12f1 or later version is required to use all
features. In addition, v0.4 and v0.5 look the same at first glance, but v0.5 has been
significantly updated. In particular, object references are now much easier to follow,
so we basically recommend using v0.5 or later.
Then install the Memory Profiler package from the Unity Registry in the Package
Manager. After installation, go to "Window -> Analysis -> Memory Profiler" to launch
the tool.
In Unity 2021 and later, the method of adding the package changed: click
"Add package by name" and enter "com.unity.memoryprofiler".
• Toolbar
• Snapshot Panel
• Measurement Results
• Detail Panel
1. Toolbar
Figure 3.41 shows the toolbar. Button ① selects the measurement target. Button ②
measures the memory at the moment it is pressed; optionally, you can choose to
measure only Native Objects or to disable screenshots, but the defaults are
basically fine. Button ③ loads previously measured data. The "Snapshot Panel" and
"Detail Panel" buttons show or hide the information panels on the left and right
sides of the screen; if you only want to see the tree map, it is better to hide
them. You can also click the "?" button to open the official documentation.
There is one important point to note regarding measurement: the memory required
for the measurement itself is newly allocated and is not released afterwards. It
does not grow infinitely, however, and settles down after several measurements.
The amount allocated depends on the complexity of the project. If you are not
aware of this, be careful: you may mistakenly suspect a leak when you see memory
usage balloon.
2. Snapshot Panel
The Snapshot Panel displays the measured data and lets you choose which data to
view. The data is organized by session, from application launch to termination.
You can also delete or rename the measured data.
"A" is the data selected in Single Snapshot and "B" is the data selected in Compare
Snapshots. Clicking the "Replace" button swaps "A" and "B" without returning to the
Single Snapshot screen.
3. Measurement Results
There are three tabs for measurement results: "Summary", "Objects and Allocations",
and "Fragmentation". This section describes the frequently used Summary tab, and
briefly covers the other two as supplementary information. The upper part of the
Summary screen is an area called Memory Usage Overview, which displays an overview
of the current memory. Clicking an item displays an explanation in the Detail
Panel, so it is a good idea to check any items you do not understand.
The next area is called the Tree Map, which graphically displays memory usage for
each category of object. Selecting a category lets you inspect the objects within
it. In Figure 3.45, the Texture2D category is selected.
The bottom part of the screen is called the Tree Map Table, where the objects are
listed in table form. The displayed items can be grouped, sorted, and filtered by
pressing the headers of the Tree Map Table. In particular, grouping by Type makes
analysis easier, so use it proactively.
When a category is selected in the Tree Map, a filter is automatically set to display
only objects in that category.
In the Tree Map Table, a Diff item is added to the Header. Diffs can be of the
following types
4. Detail Panel
This panel is used to track the reference relationships of the selected object. By
checking its Referenced By section, you can figure out what keeps holding a
reference to the object.
The bottom section, Selection Details, contains detailed information about the
object. Among it, the "Help" section contains advice on how to release the object.
Read it if you are not sure what to do.
"Fragmentation" visualizes the state of virtual memory and can be used to
investigate fragmentation. However, it may be difficult to use, since it contains a
lot of non-intuitive information such as memory addresses.
A new feature called "Memory Breakdowns" was added in v0.6 of the Memory Profiler.
It requires Unity 2022.1 or later, but it can show the Tree Map as a list and
display object information such as Unity Subsystems. Other new features include the
ability to check for potentially duplicated objects.
3.5 Heap Explorer
*1 https://github.com/pschraut
*2 https://github.com/pschraut/UnityHeapExplorer
The measurement result screen looks like the following. This screen is called
Overview.
In the Overview, the categories of particular concern are Native Memory Usage
and Managed Memory Usage, which are indicated by green lines. Click the "Investi-
gate" button to see the details of each category.
In the following sections, we will focus on the important parts of the category de-
tails.
1. Object
When you Investigate Native Memory, C++ objects are displayed in this area; in the
case of Managed Memory, C# objects are displayed.
DDoL
DDoL stands for "Don’t Destroy On Load". It shows whether the object is designated
as one that will not be destroyed on scene transitions.
Persistent
Indicates whether the object is a persistent object, i.e., one automatically
created by Unity at startup.
The display areas introduced below are updated by selecting an object (Figure
3.59).
Referenced by
Displays the objects that reference the target object.
References to
Displays objects that are referenced by the target object.
Path to Root
Displays the root objects that reference the target object. This is useful when
investigating memory leaks, as it lets you see what is holding the reference.
3.6 Xcode
Xcode is an integrated development environment tool provided by Apple. When you
set the target platform as iOS in Unity, the build result will be an Xcode project. It
is recommended to use Xcode for rigorous verification, as it provides more accu-
rate values than Unity. In this section, we will touch on three profiling tools: Debug
Navigator, GPU Frame Capture, and Memory Graph.
The second method is to attach the running application to the Xcode debugger.
This can be profiled by selecting the running process from "Debug -> Attach to
Process" in the Xcode menu after the application is running. However, the
certificate used at build time must be a development certificate (Apple
Development). Note that Ad Hoc or Enterprise certificates cannot be used to attach.
1. CPU Gauge
You can see how much CPU is being used. You can also see the usage rate of each
thread.
2. Memory Gauge
An overview of memory consumption can be viewed. Detailed analysis such as
breakdown is not available.
3. Energy Gauge
This gauge provides an overview of power consumption. You can get a breakdown
of CPU, GPU, Network, etc. usage.
4. Disk Gauge
This gauge provides an overview of File I/O. It will be useful to check if files are being
read or written at unexpected times.
5. Network Gauge
This gauge provides an overview of network communication. Like the Disk gauge, it
is useful for checking for unexpected communication.
6. FPS Gauge
This gauge is not displayed by default; it appears when GPU Frame Capture,
described in "3.6.3 GPU Frame Capture", is enabled. You can check not only the
FPS, but also the utilization of the shader stages and the processing time on both
the CPU and the GPU.
1. Preparation
To enable GPU Frame Capture in Xcode, you need to edit the scheme. First, open the
scheme editor via "Product -> Scheme -> Edit Scheme".
Next, change GPU Frame Capture to "Metal" from the "Options" tab.
Finally, from the "Diagnostics" tab, enable "Api Validation" for Metal.
2. Capture
Capture is performed by pressing the camera icon in the debug bar during
execution. Depending on the complexity of the scene, the first capture may take
some time, so be patient. Note that in Xcode 13 and later, the icon has been
changed to the Metal icon.
When the capture is completed, the following summary screen will be displayed.
From this summary screen, you can move to a screen where you can check de-
tails such as drawing dependencies and memory. The Navigator area displays com-
mands related to drawing. There are "View Frame By Call" and "View Frame By
Pipeline State".
In the By Call view, all drawing commands are listed in the order in which they
were invoked. This includes buffer settings and other preparations for drawing, so
a large number of commands are lined up. By Pipeline State, on the other hand,
lists only the drawing commands related to the geometry drawn by each shader.
Switch the display according to what you want to investigate.
By pressing any of the drawing commands in the Navigator area, you can check
the properties used for that command. The properties include texture, buffer, sam-
pler, shader functions, and geometry. Each property can be double-clicked to see
the details. For example, you can see the shader code itself, whether the sampler is
Repeat or Clamp, and so on.
Geometry properties not only display vertex information in a table format, but also
allow you to move the camera to see the shape of the geometry.
Next, we will discuss "Profile" in the Performance column of the Summary screen.
Clicking this button starts a more detailed analysis. When the analysis is finished,
the time taken for drawing will be displayed in the Navigator area.
The results of the analysis can be viewed in more detail in the "Counters" screen.
In this screen, you can graphically see the processing time for each drawing such as
Vertex, Rasterized, Fragment, etc.
Next, "Show Memory" in the Memory column of the Summary screen is explained.
Clicking this button will take you to a screen where you can check the resources used
by the GPU. The information displayed is mainly textures and buffers. It is a good
idea to check if there are any unnecessary items.
Use this screen when you want to see which drawings depend on what.
This tool can be used to investigate memory usage of objects that cannot be mea-
sured in Unity, such as plug-ins. The following is an explanation of how to use this
tool.
1. Preliminary Preparation
To obtain useful information from Memory Graph, you need to edit the scheme. Open
the scheme editor via "Product -> Scheme -> Edit Scheme", then enable "Malloc
Stack Logging" in the "Diagnostics" tab.
With this enabled, a Backtrace is shown in the Inspector, so you can see where
each allocation was made.
2. Capture
Capture is performed by pressing the branch-shaped icon in the debug bar while the
application is running.
A Memory Graph can be saved as a file via "File -> Export MemoryGraph". You can
investigate this file further with the vmmap, heap, and malloc_history commands;
check them out if you are interested. As an example, the summary display of the
vmmap command gives you an overall picture that is difficult to grasp from the
Memory Graph alone.
3.7 Instruments
Xcode includes a tool called Instruments that specializes in detailed measurement
and analysis. To launch Instruments, select "Product -> Profile". Once the build
completes, a screen opens for selecting a template for the measurement items, as
shown below.
As you can see from the large number of templates, Instruments can analyze a
wide variety of content. In this section, we will focus on "Time Profiler" and "Alloca-
tions," which are frequently used.
When the measurement is performed, the display will look like Figure 3.94.
Unlike the Unity Profiler, analysis is done not per frame but over a selected
interval. The Tree View at the bottom shows the processing time within that
interval. When optimizing the processing time of game logic, it is recommended to
analyze the processing below PlayerLoop in the Tree View.
To make the Tree View easier to read, set the Call Tree options at the bottom of
Xcode as in Figure 3.95. In particular, checking the Hide System Libraries checkbox
hides inaccessible system code, making investigation easier.
Note that the symbol names shown in the Time Profiler differ from those in the
Unity Profiler: they take the form "class name_function name_random string".
3.7.2 Allocations
Allocations is a tool for measuring memory usage. It is used to investigate memory
leaks and to reduce memory usage.
Before measuring, open "File -> Recording Options" and check "Discard events
for freed memory".
If this option is enabled, records are discarded when the corresponding memory is
freed. As you can see in Figure 3.99, the appearance changes significantly with
and without the option. With the option, a line is recorded only when memory is
allocated, and the line is discarded when the allocated area is released. In other
words, with this option set, any line that remains represents memory that has not
been released. For example, in a design where memory is released on scene
transitions, if many lines from before the transition remain, a memory leak is
suspected. In such a case, use the Tree View to investigate the details.
The Tree View at the bottom of the screen displays the details of the specified
range, similar to the Time Profiler. The Tree View can be displayed in four different
ways.
The most recommended display method is Call Trees. This allows you to follow which code caused the allocation. There are Call Trees display options at the bottom of the screen, where you can set options such as Hide System Libraries in the same way as Figure 3.95 introduced with the Time Profiler. Figure 3.101 shows a capture of the Call Trees display. You can see that 12.05MB of allocations are generated by SampleScript's OnClicked.
When the "Mark Generations" button is pressed, the memory at that point in time is stored. After that, pressing the button again records the amount of memory newly allocated since the previous mark.
As shown in Figure 3.103, each Generation is displayed in Call Tree format so that you can follow what caused the memory increase.
3.8 Android Studio
Next, open the exported project in Android Studio. Then, with the Android device
connected, press the gauge-like icon in the upper right corner to start the build. After
the build is complete, the application will launch and the profile will start.
The second method is to attach the running process to the debugger and measure
it. First, open the Android Profiler from "View -> Tool Windows -> Profiler" in the
Android Studio menu.
Next, open the Profiler and click on SESSIONS in the Profiler. To connect a ses-
sion, the application to be measured must be running. Also, the binary must be a
Development Build. Once the session is connected, the profile will start.
The second method of attaching to the debugger is good to keep in mind because
it does not require exporting the project and can be used easily.
Strictly speaking, what matters is the debuggable and profileable settings in AndroidManifest.xml, not Unity's Development Build setting. In Unity, debuggable is automatically set to true when you make a Development Build.
After selecting a thread, press the Record button to measure the thread's call stack. As shown in Figure 3.110, there are several measurement types; "Callstack Sample Recording" will be fine for most purposes.
Clicking the Stop button ends the measurement and displays the results. The result screen looks like the CPU module of the Unity Profiler.
If you want to see the breakdown of memory, you need to perform an additional measurement. There are three measurement methods. "Capture heap dump" acquires the memory information at the moment it is pressed. The other buttons analyze allocations during the measured interval.
3.9 RenderDoc
RenderDoc is a free, open-source, high-quality graphics debugging tool. It is currently available for Windows and Linux, but not for macOS. Supported graphics APIs include Vulkan, OpenGL(ES), D3D11, and D3D12, so it can also be used with Android devices.
Next, connect your Android device to RenderDoc. Click the house symbol in the
lower left corner to display the list of devices connected to the PC. Select the device
you want to measure from the list.
*3 https://renderdoc.org/
Next, select the application to be launched from the connected device. Select
Launch Application from the tabs on the right side and choose the application to run
from the Executable Path.
A File Browser window will open. Find the Package Name for this measurement and select the Activity.
Finally, from the Launch Application tab, click the Launch button to launch the application on the device. A new tab for measurement will then be added in RenderDoc.
Pressing "Capture Frame(s) Immediately" captures frame data, which is listed in the "Capture collected" tab. Double-click this data to open the capture.
Next is the Event Browser. Each command is listed here in order from the top.
Clicking the "clock symbol" at the top of the Event Browser displays the process-
ing time for each command in the "Duration" column. The processing time varies
depending on the timing of the measurement, so it is best to consider it as a rough
estimate. The breakdown of the DrawOpaqueObjects command shows that three commands are batched together and only one is drawn outside the batch.
Next, let’s look at the tabs on the right side of the window. In this tab, there is a
window where you can check detailed information about the command selected in
the Event Browser. The three most important windows are the Mesh Viewer, Texture
Viewer, and Pipeline State.
First, let’s look at Pipeline State. Pipeline State allows you to see what parameters
were used in each shader stage before the object was rendered to the screen. You
can also view the shaders used and their contents.
The stage names displayed in the Pipeline State are abbreviated, so the official
names are summarized at Table 3.7.
The VTX stage is selected in Figure 3.123, where you can see the topology and vertex input data. The FB stage, shown in Figure 3.124, lets you see details such as the state of the output destination texture and the Blend State.
You can also check the FS stage, shown in Figure 3.125, to see the textures and parameters used in the fragment shader.
Resources in the center of the FS stage shows the textures and samplers used.
Uniform Buffers at the bottom of the FS stage shows the CBuffer. This CBuffer
contains numerical properties such as float and color. To the right of each item,
there is a "Go" arrow icon, which can be pressed to see the details of the data.
The shader used is shown in the upper part of the FS stage, and the shader code can be viewed by pressing View. Selecting GLSL as the disassembly type is recommended to make the display easier to understand.
Next is the Mesh Viewer. This function allows you to visually view mesh informa-
tion, which is useful for optimization and debugging.
The upper part of the Mesh Viewer shows mesh vertex information in a table format. The lower part has a preview screen where you can move the camera to check the shape of the mesh. Both are divided into In and Out tabs, so you can see how the values and the appearance change before and after the vertex transformation.
Finally, there is the Texture Viewer. This screen shows the "texture used for input"
and "output result" of the command selected in the Event Browser.
In the area on the right side of the screen, you can check the input and output
textures. Clicking on the displayed texture will reflect it in the area on the left side of
the screen. The left side not only displays the texture but also lets you change how it is displayed, for example with an Overlay. In Figure 3.129, "Wireframe Mesh" was selected for the Overlay, so the object drawn by this command is displayed with a yellow wireframe, making it easy to identify visually.
Texture Viewer also has a feature called Pixel Context. This function allows the
user to view the drawing history of selected pixels. The history allows the user to
determine how often a pixel has been filled. This is a useful feature for overdraw
investigation and optimization. However, since it works on a per-pixel basis, it is not suited to investigating overdraw across the screen as a whole. To investigate, right-click the area of interest on the left side of Figure 3.129, and that location will be reflected in the Pixel Context.
Next, click the History button in the Pixel Context to see the drawing history of the
pixel.
In Figure 3.132, there are four history entries. A green line indicates that the pixel passed all the pipeline tests, such as the depth test, and was painted. If some test failed and the pixel was not rendered, the line is red. In the captured image, the screen clear and the capsule draw succeeded, while the Plane and the Skybox failed the depth test.
Chapter 4
Tuning Practice - Asset
Game production involves handling a large number of different types of assets such
as textures, meshes, animations, and sounds. This chapter provides practical knowl-
edge about these assets, including settings to keep in mind when tuning perfor-
mance.
4.1 Texture
Image data, which is the source of textures, is an indispensable part of game pro-
duction. On the other hand, it consumes a relatively large amount of memory, so it
must be configured appropriately.
4.1.2 Read/Write
This option is disabled by default. When disabled, textures are expanded only in GPU memory. When enabled, the texture is copied not only to GPU memory but also to main memory, doubling its consumption. Therefore, if you do not use APIs such as Texture2D.GetPixel or Texture2D.SetPixel and only access textures from shaders, be sure to disable it.
Also, for textures generated at runtime, set makeNoLongerReadable to true as shown at List 4.1 to avoid the copy to main memory.
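For runtime-generated textures, the pattern might look like the following minimal sketch (texture size and format are illustrative). Passing makeNoLongerReadable: true to Apply uploads the pixel data to GPU memory and discards the CPU-side copy:

```csharp
using UnityEngine;

public class RuntimeTextureExample : MonoBehaviour
{
    void Start()
    {
        // Create a texture at runtime and fill it on the CPU.
        var tex = new Texture2D(256, 256, TextureFormat.RGBA32, false);
        // ... write pixels with SetPixel / SetPixels here ...

        // Upload to GPU memory and discard the CPU-side copy.
        // After passing makeNoLongerReadable: true, GetPixel/SetPixel
        // can no longer be used on this texture.
        tex.Apply(updateMipmaps: false, makeNoLongerReadable: true);
    }
}
```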
Transferring textures from GPU memory back to main memory is time-consuming, so Unity keeps readable textures in both places to make CPU access fast.
The Aniso Level can be set from 0 to 16, but its behavior is a little special. When a texture is imported, the value is 1 by default. The Forced On setting, available under "Anisotropic Textures" in "Project Settings -> Quality", forces anisotropic filtering on for all textures and is therefore not recommended unless you are targeting high-spec devices.
Make sure that the Aniso Level setting is not enabled for objects that have no
effect, or that it is not set too high for objects that do have an effect.
The effect of Aniso Level is not linear; it changes in steps. The author verified that it changes in four steps: 0-1, 2-3, 4-7, and 8 or higher.
using UnityEditor;

public class ImporterExample : AssetPostprocessor
{
    private void OnPreprocessTexture()
    {
        var importer = assetImporter as TextureImporter;
        // Read/Write settings, etc. are also possible.
        importer.isReadable = false;

        var settings = new TextureImporterPlatformSettings();
        // Specify Android = "Android", PC = "Standalone"
        settings.name = "iPhone";
        settings.overridden = true;
        settings.textureCompression = TextureImporterCompression.Compressed;
        // Specify compression format
        settings.format = TextureImporterFormat.ASTC_6x6;
        importer.SetPlatformTextureSettings(settings);
    }
}
Not all textures need to be in the same compression format. For example, among
UI images, images with overall gradations tend to show a noticeable quality loss due
to compression. In such cases, it is recommended to set a lower compression ratio
for only some of the target images. On the other hand, for textures such as 3D
models, it is difficult to see the quality loss, so it is best to find an appropriate setting
such as a high compression ratio.
4.2 Mesh
The following are points to keep in mind when dealing with mesh (models) imported
into Unity. Performance of imported model data can be improved depending on the
settings. The following four points should be noted.
• Read/Write Enabled
• Vertex Compression
• Mesh Compression
• Optimize Mesh Data
If you do not need to access the mesh at runtime, you should disable Read/Write Enabled. Specifically, if the model is simply placed in a scene and used only to play an AnimationClip, it is fine to disable it.
Enabling Read/Write Enabled consumes twice as much memory, because a CPU-accessible copy of the mesh is kept in memory. Simply disabling it saves memory, so it is worth checking.
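As a sketch, Read/Write Enabled can also be disabled automatically at import time with an AssetPostprocessor, in the same style as the texture importer example in the previous section:

```csharp
using UnityEditor;

// Editor-side sketch: disable Read/Write on imported models automatically.
public class ModelImporterExample : AssetPostprocessor
{
    private void OnPreprocessModel()
    {
        var importer = (ModelImporter)assetImporter;
        // Skip the CPU-side mesh copy when runtime mesh access is not needed.
        importer.isReadable = false;
    }
}
```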
However, please note that Vertex Compression is disabled under the following conditions:
• Read/Write is enabled
• Mesh Compression is enabled
• Mesh with Dynamic Batching enabled and adaptable (less than 300 vertices
and less than 900 vertex attributes)
Mesh Compression can be set to one of four levels:
• Off: Uncompressed
• Low: Low compression
• Medium: Medium compression
• High: High compression
This option automatically deletes unused vertex data, which is useful, but be aware that it can cause unexpected problems. For example, when switching Materials or Shaders at runtime, vertex attributes that the new shader accesses may already have been deleted, producing incorrect rendering results. Also, when bundling only Mesh assets, incorrect Material settings may leave unnecessary vertex data in place. This is common in cases where only a mesh reference is provided, such as in the Particle System.
4.3 Material
Material is an important function that determines how an object is rendered. Although
it is a familiar feature, it can easily cause memory leaks if used incorrectly. In this
section, we will show you how to use materials safely.
Material material;

void Awake()
{
    material = renderer.material;
    material.color = Color.green;
}
Material material;

void Awake()
{
    material = renderer.material;
    material.color = Color.green;
}

void OnDestroy()
{
    if (material != null)
    {
        Destroy(material);
    }
}
Material material;

void Awake()
{
    // Dynamically generated material. Material has no parameterless
    // constructor; pass the shader to use ("Standard" is illustrative).
    material = new Material(Shader.Find("Standard"));
}

void OnDestroy()
{
    if (material != null)
    {
        Destroy(material); // Destroy the material when you have finished using it
    }
}
Materials should be destroyed when they are finished being used (OnDestroy).
Destroy materials at the appropriate timing according to the rules and specifications
of the project.
4.4 Animation
Animation is a widely used asset in both 2D and 3D. This section introduces practices
related to animation clips and animators.
This setting can also be adjusted dynamically from a script. Therefore, it is possible
to set Skin Weights to 2 for low-spec devices and 4 for high-spec devices, and so on,
for fine-tuning.
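A minimal sketch of such dynamic adjustment (the device-spec check is assumed to exist elsewhere in the project):

```csharp
using UnityEngine;

public static class SkinWeightsConfigurator
{
    // Hypothetical entry point: call once at startup after detecting
    // the device specification.
    public static void ApplyForDevice(bool isLowSpecDevice)
    {
        // Override the project-wide Skin Weights setting at runtime.
        QualitySettings.skinWeights = isLowSpecDevice
            ? SkinWeights.TwoBones   // cheaper skinning on low-spec devices
            : SkinWeights.FourBones; // full quality on high-spec devices
    }
}
```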
Keyframe Reduction reduces keys when there is little change in value. Specifically,
keys are removed when they are within the Error range compared to the previous
curve. This error range can be adjusted.
The Error settings are a little complicated: their units differ by item. Rotation is in degrees, while Position and Scale are in percent. In the captured image, the tolerance is 0.5 degrees for Rotation and 0.5% for Position and Scale. The detailed algorithm can be found in the Unity documentation at *1, so take a look if you are interested.
Optimal is even more confusing: it compares two reduction methods, the Dense Curve format and Keyframe Reduction, and uses whichever produces the smaller data. The key point is that Dense Curve is smaller than Keyframe Reduction, but it tends to be noisy, which may degrade the animation quality. With this characteristic in mind, visually check the actual animation to see whether the result is acceptable.
*1 https://docs.unity3d.com/Manual/class-AnimationClip.html#tolerance
There are a few things to note about each option. First, be careful when using root motion with Cull Completely. For example, with an animation that moves a character into the frame from off-screen, the animation stops immediately because the character starts off-screen, so it never enters the frame.
Next is Cull Update Transform. This seems like a very useful option, since it only
skips updating the transform. However, be careful if you have a shaking or other
Transform-dependent process. For example, if a character goes out of frame, no
updates will be made from the pose at that time. When the character enters the
frame again, it will be updated to a new pose, which may cause the shaking object to
move significantly. It is a good idea to understand the pros and cons of each option
before changing the settings.
Even with these settings, however, the frequency of animation updates cannot be changed dynamically in fine detail. For example, you might want to halve the update frequency of animations for objects far from the camera. In that case, you need to use AnimationClipPlayable, or deactivate the Animator and call Animator.Update yourself. Both require writing your own scripts, but the latter is easier to implement than the former.
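A rough sketch of the latter approach, stepping a deactivated Animator manually at half rate (component wiring and the rate are illustrative):

```csharp
using UnityEngine;

// Update the Animator only every other frame by disabling its
// automatic evaluation and stepping it manually.
public class HalfRateAnimator : MonoBehaviour
{
    [SerializeField] private Animator animator; // assumed assigned in the Inspector
    private float accumulated;
    private int frameCount;

    void Start()
    {
        // Stop the automatic per-frame evaluation.
        animator.enabled = false;
    }

    void Update()
    {
        accumulated += Time.deltaTime;
        frameCount++;
        if (frameCount % 2 == 0)
        {
            // Advance the animation by the accumulated time in one step.
            animator.Update(accumulated);
            accumulated = 0f;
        }
    }
}
```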
4.5 Particle System
▲ Figure 4.13 Limiting the number of particles emitted with Max Particles
Another way is to use Max Particles in the main module. In the example above, no further particles are emitted once 1,000 particles exist.
The Sub Emitters module generates arbitrary particle systems at specific times (at creation, at the end of life, and so on). Depending on its settings, the number of particles may spike to its peak all at once, so be careful when using this module.
The Quality setting of the Noise module can easily become a performance burden. Noise can express organic-looking particles and is often used as an easy way to raise the quality of effects. Because it is such a frequently used feature, keep an eye on its cost.
• Low (1D)
• Medium (2D)
• High (3D)
The higher the dimension of Quality, the higher the load. If you do not need Noise,
turn off the Noise module. If you need to use noise, set the Quality setting to Low
first, and then increase the Quality according to your requirements.
4.6 Audio
Sound files in their default imported state leave room for improvement in terms of performance. The following three settings are available.
• Load Type
• Compression Format
• Force To Mono
Set these settings appropriately for background music, sound effects, and voices
that are often used in game development.
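As a sketch, these three settings can also be applied automatically at import time with an AssetPostprocessor, in the same style as the texture importer example earlier in this chapter (the chosen values are illustrative):

```csharp
using UnityEditor;
using UnityEngine;

// Editor-side sketch: applying the three settings above on import.
public class AudioImporterExample : AssetPostprocessor
{
    private void OnPreprocessAudio()
    {
        var importer = (AudioImporter)assetImporter;
        importer.forceToMono = true; // Force To Mono

        var settings = importer.defaultSampleSettings;
        settings.loadType = AudioClipLoadType.CompressedInMemory;   // Load Type
        settings.compressionFormat = AudioCompressionFormat.Vorbis; // Compression Format
        importer.defaultSampleSettings = settings;
    }
}
```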
Load Type offers three options:
• Decompress On Load
• Compressed In Memory
• Streaming
Decompress On Load
Decompress On Load decompresses the audio into memory at load time. It is light on the CPU, so playback starts with little delay. On the other hand, it uses a lot of memory.
It is recommended for short sound effects that require immediate playback. BGM
and long voice files use a lot of memory, so care should be taken when using this
function.
Compressed In Memory
Compressed In Memory loads an AudioClip into memory in its compressed state, which means it is decompressed at playback time. The CPU load is therefore high, and playback delays are likely to occur.
It is suitable for sounds with large file sizes that you do not want to decompress
directly into memory, or for sounds that do not suffer from a slight playback delay. It
is often used for voice.
Streaming
Streaming, as the name implies, loads and plays back sound incrementally. It uses little memory but is more CPU-intensive. It is recommended for long BGM.
PCM
Uncompressed and consumes a large amount of memory. Do not set this unless you
want the best sound quality.
ADPCM
Uses about 70% less memory than PCM, at lower quality. Its CPU load is much lower than Vorbis, which means decompression is fast, making it suitable for immediate playback and for sounds played in large numbers. This is especially true for noisy sounds such as footsteps, collisions, and weapons that need to be played quickly and in large quantities.
Vorbis
As a lossy compression format, its quality is lower than PCM, but the file size is smaller. It is the only format that allows fine-tuning of the quality level. It is the most commonly used compression format for all kinds of sounds (background music, sound effects, voices).
Mono playback is often fine for sound effects. In some cases, mono playback is also better for 3D sound. It is recommended to enable Force To Mono after careful consideration. Each individual saving is small, but they add up. If monaural playback causes you no problems, you should actively use Force To Mono.
Although this is not performance tuning as such, audio files should be imported into Unity uncompressed. If you import compressed audio files, they will be decoded and recompressed on the Unity side, resulting in a loss of quality.
4.7 Resources / StreamingAssets
• Resources folder
• StreamingAssets folder
Normally, Unity includes in a build only the objects referenced by scenes, materials, scripts, and so on.
The rules are different for the special folders mentioned above: every stored file is included in the build. This means that even files that are not actually needed end up in the build if they are stored there, inflating the build size.
The problem is that unused files cannot be detected programmatically. You have to check visually for unnecessary files, which is time-consuming, so be careful when adding files to these folders.
However, the number of stored files inevitably grows as a project progresses, and unnecessary, no-longer-used files may creep in. We therefore recommend reviewing the stored files regularly.
Storing objects in the Resources folder makes them accessible from scripts, so it is easy to overuse. However, overusing the Resources folder increases the startup time of the application. The reason is that at startup Unity analyzes the structure of all Resources folders and builds a lookup table. It is best to minimize the use of the Resources folder as much as possible.
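For reference, loading from a Resources folder looks like this (the asset path is illustrative):

```csharp
using UnityEngine;

public class ResourcesLoadExample : MonoBehaviour
{
    void Start()
    {
        // Loads Assets/Resources/Prefabs/Enemy.prefab (path is illustrative).
        // The path is relative to a Resources folder and has no extension.
        var prefab = Resources.Load<GameObject>("Prefabs/Enemy");
        if (prefab != null)
        {
            Instantiate(prefab);
        }
    }
}
```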
4.8 ScriptableObject
ScriptableObjects are serialized as YAML by default, and many projects manage these files as text. By explicitly adding the [PreferBinarySerialization] attribute, you can change the storage format to binary. For assets that consist mainly of large amounts of data, the binary format improves the performance of read and write operations.
However, the binary format is naturally harder to handle with merge tools. [PreferBinarySerialization] is therefore recommended for assets that are only ever updated by overwriting the whole asset and whose changes never need to be reviewed as text diffs, or for assets whose data no longer changes once development is complete. It is not required for every ScriptableObject.
A common mistake when using ScriptableObjects is a mismatch between the class name and the source file name. The class and the file must have the same name; otherwise the asset will not serialize correctly. Be careful with naming when creating classes, and confirm that the .asset file is actually serialized and saved in binary format.
/*
 * When the source code file is named ScriptableObjectSample.cs
 */

// Serialization succeeds
[PreferBinarySerialization]
public sealed class ScriptableObjectSample : ScriptableObject
{
    ...
}

// Serialization fails
[PreferBinarySerialization]
public sealed class MyScriptableObject : ScriptableObject
{
    ...
}
Chapter 5
5.1 Granularity of AssetBundle
• Assets that are supposed to be used at the same time should be combined
into a single AssetBundle.
• Assets that are referenced by multiple assets should be in separate AssetBun-
dles.
It is difficult to control perfectly, but it is a good idea to set some rules regarding
granularity within the project.
Chapter 5 Tuning Practice - AssetBundle
AssetBundle.LoadFromFile
Load by specifying the file path that exists in storage. This is usually used
because it is the fastest and most memory-efficient.
AssetBundle.LoadFromMemory
Load by specifying AssetBundle data already loaded into memory. While the AssetBundle is in use, the entire data must be kept in memory, so the memory load is very large. For this reason, it is not normally used.
AssetBundle.LoadFromStream
Load by specifying Stream which returns the AssetBundle data. When loading
an encrypted AssetBundle while decrypting it, use this API in consideration of
the memory load. However, since Stream must be seekable, be careful not to
use a cipher algorithm that cannot handle seek.
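A minimal sketch of the usual AssetBundle.LoadFromFile pattern (file and asset names are illustrative):

```csharp
using System.IO;
using UnityEngine;

public static class BundleLoadExample
{
    public static GameObject LoadHeroPrefab()
    {
        // Path to an AssetBundle previously downloaded to storage
        // (file name "characters" is illustrative).
        string path = Path.Combine(Application.persistentDataPath, "characters");

        // Fast and memory-efficient: the bundle is read from storage on demand.
        var bundle = AssetBundle.LoadFromFile(path);
        if (bundle == null)
        {
            return null; // bundle missing or corrupted
        }

        // Asset name "Hero" is illustrative.
        return bundle.LoadAsset<GameObject>("Hero");
    }
}
```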
5.4 Optimization of the number of simultaneously loaded AssetBundles
*1 In Linux/Unix environments, the limit can be changed at runtime using the setrlimit function
Chapter 6
Tuning Practice - Physics

6.1 Turning Physics On and Off
6.3 Collision Shape Selection
There is also a setting called Maximum Allowed Timestep, which is the maximum amount of time that physics calculations may consume in a single frame. This value defaults to 0.33 seconds, but you may want to set it closer to the target frame time to limit the number of FixedUpdate calls and stabilize the frame rate.
The Collision Matrix specifies that two layers collide when the checkbox at their intersection is checked.
Setting this up properly is the most efficient way to eliminate calculations between objects that do not need to collide, because pairs of non-colliding layers are excluded even from the broad phase, the rough preliminary intersection check.
For performance, it is preferable to prepare dedicated layers for physics calculations and uncheck every checkbox between layers that do not need to collide.
6.5 Raycast Optimization
Physics.RaycastAll allocates and returns a new array of RaycastHit structures, so each call results in a GC Alloc, which can cause spikes due to GC.
To avoid this, there is a method called Physics.RaycastNonAlloc that takes a pre-allocated array as an argument, writes the results into it, and returns the number of hits.
For performance, GC Alloc should be avoided within FixedUpdate whenever possible.
As shown in List 6.1, GC.Alloc can be avoided (except at array initialization) by keeping the results array in a class field, a pool, or a similar mechanism and passing that array to Physics.RaycastNonAlloc.
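A sketch of this pattern (buffer size and ray distance are illustrative):

```csharp
using UnityEngine;

public class RaycastExample : MonoBehaviour
{
    // Allocate the results buffer once and reuse it every physics step,
    // so GC.Alloc happens only at initialization.
    private readonly RaycastHit[] results = new RaycastHit[16];

    void FixedUpdate()
    {
        var ray = new Ray(transform.position, transform.forward);

        // Writes hits into 'results' and returns the number of hits;
        // no array is allocated per call.
        int hitCount = Physics.RaycastNonAlloc(ray, results, 100f);
        for (int i = 0; i < hitCount; i++)
        {
            // ... process results[i] ...
        }
    }
}
```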
6.6 Collider and Rigidbody
When a Rigidbody remains almost motionless for a certain period, the object is considered dormant and its internal state changes to sleep. Once asleep, the computational cost for that object is minimized until it is moved by an external force, a collision, or another event.
Therefore, objects with a Rigidbody component that do not need to move should be transitioned to the sleep state whenever possible to reduce the cost of physics calculations.
The threshold used to determine whether a Rigidbody should go to sleep is the Sleep Threshold, set under Physics in Project Settings as shown in Figure 6.4. Alternatively, to specify the threshold for an individual object, set the Rigidbody.sleepThreshold property.
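A small sketch of the per-object settings (the threshold value is illustrative):

```csharp
using UnityEngine;

public class SleepExample : MonoBehaviour
{
    void Start()
    {
        var rb = GetComponent<Rigidbody>();

        // Per-object override of the project-wide Sleep Threshold
        // (the value here is illustrative).
        rb.sleepThreshold = 0.02f;

        // An object that does not need to move yet can also be put
        // to sleep immediately.
        rb.Sleep();
    }
}
```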
▲ Figure 6.5 The Physics item in the Profiler. You can see the number of active Rigidbody components as well as the number of each kind of element in the physics engine.
You can also check the number of elements in the Physics Debugger to see which objects in the scene are active.
▲ Figure 6.6 The Physics Debugger displays the state of the objects in the scene from the physics engine's perspective, color-coded.
Collision Detection offers four modes:
• Discrete
• Continuous
• Continuous Dynamic
• Continuous Speculative
6.8 Optimization of other project settings
6.8.1 Physics.autoSyncTransforms
In versions prior to Unity 2018.3, the physics engine's state was automatically synchronized with the Transform each time a physics API such as Physics.Raycast was called. This process is relatively heavy and can cause spikes when such APIs are called.
To address this, the Physics.autoSyncTransforms setting was added in Unity 2018.3. Setting it to false prevents the Transform synchronization described above when a physics API is called.
Synchronization of the Transform is then performed after FixedUpdate during the physics simulation. This means that if you move a collider and immediately raycast against its new position, the raycast will not hit the collider.
6.8.2 Physics.reuseCollisionCallbacks
Prior to Unity 2018.3, every time a collision event such as OnCollisionEnter was invoked on a Collider component, a new Collision instance was created and passed as the argument, resulting in a GC Alloc.
Because this behavior can hurt performance depending on how often events fire, the Physics.reuseCollisionCallbacks property was exposed in 2018.3. Setting it to true suppresses the GC Alloc by internally reusing a single Collision instance across event calls.
This setting defaults to true in 2018.3 and later, which is fine if your project was created with a relatively new Unity version; but if the project originated before 2018.3, the value may be false. If it is disabled, enable it and then fix any code that relied on the old behavior so the game runs correctly.
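A sketch of enabling the setting explicitly at startup:

```csharp
using UnityEngine;

public static class PhysicsSettingsExample
{
    // Runs once at startup, before the first scene loads by default.
    [RuntimeInitializeOnLoadMethod]
    private static void Init()
    {
        // Reuse one Collision instance across callbacks to avoid GC Alloc.
        Physics.reuseCollisionCallbacks = true;
    }
}
```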
Chapter 7
7.1 Resolution Tuning
The final resolution is determined by multiplying the Target DPI value by the Res-
olution Scaling DPI Scale Factor value in the Quality Settings.
Chapter 7 Tuning Practice - Graphics
7.2 Semi-transparency and overdraw
Note that resolution changes made with Screen.SetResolution are reflected only on an actual device, not in the Editor.
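A minimal sketch of lowering the rendering resolution at startup (the scale factor is illustrative):

```csharp
using UnityEngine;

public class ResolutionExample : MonoBehaviour
{
    void Start()
    {
        // Render at roughly 70% of the native resolution
        // (the factor is illustrative). This takes effect on an
        // actual device only, not in the Editor.
        int width = (int)(Screen.width * 0.7f);
        int height = (int)(Screen.height * 0.7f);
        Screen.SetResolution(width, height, Screen.fullScreen);
    }
}
```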
In the Built-in Render Pipeline, the Scene view mode can be set to Overdraw in the Editor, which is useful as a basis for adjusting overdraw.
The Universal Render Pipeline supports Scene Debug View Modes since Unity 2021.2.
7.3 Reducing Draw Calls
Dynamic batching automatically batches small dynamic meshes that use the same material.
To use it, enable the Dynamic Batching item in Player Settings.
In the Universal Render Pipeline, Dynamic Batching can also be enabled in the Universal Render Pipeline Asset; however, its use there is deprecated. Dynamic batching is often not recommended because of its steady impact on CPU load. The SRP Batcher, described below, can be used to achieve a similar effect.
To make an object eligible for static batching, the object's static flag must be enabled. Specifically, the Batching Static sub-flag within the static flags must be enabled.
Static batching differs from dynamic batching in that it does not involve vertex
conversion processing at runtime, so it can be performed with a lower load. However,
it should be noted that it consumes a lot of memory to store the mesh information
combined by batch processing.
Creating shaders that support GPU instancing requires some special handling. Below is example shader code with a minimal implementation for using GPU instancing in the Built-in Render Pipeline.
1: Shader "SimpleInstancing"
2: {
3: Properties
4: {
5: _Color ("Color", Color) = (1, 1, 1, 1)
6: }
7:
8: CGINCLUDE
9:
10: #include "UnityCG.cginc"
11:
12: struct appdata
13: {
14: float4 vertex : POSITION;
15: UNITY_VERTEX_INPUT_INSTANCE_ID
16: };
17:
18: struct v2f
19: {
20: float4 vertex : SV_POSITION;
21: // Required only when accessing INSTANCED_PROP in fragment shaders
22: UNITY_VERTEX_INPUT_INSTANCE_ID
23: };
24:
25: UNITY_INSTANCING_BUFFER_START(Props)
26: UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
27: UNITY_INSTANCING_BUFFER_END(Props)
28:
29: v2f vert(appdata v)
30: {
31: v2f o;
32:
33: UNITY_SETUP_INSTANCE_ID(v);
34:
35: // Required only when accessing INSTANCED_PROP in fragment shaders
36: UNITY_TRANSFER_INSTANCE_ID(v, o);
37:
38: o.vertex = UnityObjectToClipPos(v.vertex);
39: return o;
40: }
41:
42: fixed4 frag(v2f i) : SV_Target
43: {
44: // Only required when accessing INSTANCED_PROP with fragment shaders
45: UNITY_SETUP_INSTANCE_ID(i);
46:
47: float4 color = UNITY_ACCESS_INSTANCED_PROP(Props, _Color);
48: return color;
49: }
50:
51: ENDCG
52:
53: SubShader
54: {
55: Tags { "RenderType"="Opaque" }
56: LOD 100
57:
58: Pass
59: {
60: CGPROGRAM
61: #pragma vertex vert
62: #pragma fragment frag
63: #pragma multi_compile_instancing
64: ENDCG
65: }
66: }
67: }
GPU instancing only works on objects that reference the same material, but you can still set properties per instance. You can mark a property as individually changeable by enclosing it with UNITY_INSTANCING_BUFFER_START(Props) and UNITY_INSTANCING_BUFFER_END(Props), as in the shader code above. Such properties can then be set per instance from C# through the MaterialPropertyBlock API, for example to give each instance its own color. Just be careful not to use MaterialPropertyBlock on too many instances, as accessing it may affect CPU performance.
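A minimal sketch of the MaterialPropertyBlock usage described above (field and class names are illustrative; _Color matches the shader sample):

```csharp
using UnityEngine;

public class InstanceColorSetter : MonoBehaviour
{
    [SerializeField] private Renderer[] renderers;

    // Cache the property ID instead of passing the string every call.
    private static readonly int ColorId = Shader.PropertyToID("_Color");

    void Start()
    {
        var block = new MaterialPropertyBlock();
        foreach (var r in renderers)
        {
            // Per-instance color without creating per-instance materials,
            // so GPU instancing is not broken.
            block.SetColor(ColorId, Random.ColorHSV());
            r.SetPropertyBlock(block); // values are copied into the renderer
        }
    }
}
```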
You can also enable or disable the SRP Batcher at runtime with the following C# code.
▼ List 7.3 Enabling SRP Batcher
1: GraphicsSettings.useScriptableRenderPipelineBatching = true;
Two conditions must be met to make a shader compatible with the SRP Batcher: built-in engine properties must be declared in a constant buffer named UnityPerDraw, and material properties must be declared in a constant buffer named UnityPerMaterial. Universal Render Pipeline shaders and the like support UnityPerDraw by default, but you need to set up the UnityPerMaterial CBUFFER yourself.
1: Properties
2: {
3: _Color1 ("Color 1", Color) = (1,1,1,1)
4: _Color2 ("Color 2", Color) = (1,1,1,1)
5: }
6:
7: CBUFFER_START(UnityPerMaterial)
8:
9: float4 _Color1;
10: float4 _Color2;
11:
12: CBUFFER_END
With the above steps you can create a shader that supports the SRP Batcher. You can also check whether a given shader supports the SRP Batcher from its Inspector: if the SRP Batcher item in the shader's Inspector shows "compatible", the shader is supported; if it shows "not compatible", it is not.
7.4 SpriteAtlas
2D games and UIs often use many sprites to build the screen. SpriteAtlas is a feature for avoiding the large number of draw calls that would otherwise be generated in such cases. SpriteAtlas reduces draw calls by combining multiple sprites into a single texture. To use it, the 2D Sprite package must first be installed in the project from the Package Manager.
After installation, right click in the Project view and select "Create -> 2D -> Sprite
Atlas" to create the SpriteAtlas asset.
To specify the sprites to be packed into the atlas, use the Objects for Packing item in the SpriteAtlas Inspector to register individual sprites or the folder that contains them.
With the above settings, the sprite will be atlased during build and playback in the
Unity Editor, and the integrated SpriteAtlas texture will be referenced when drawing
the target sprite.
Sprites can also be obtained directly from SpriteAtlas with the following code.
1: [SerializeField]
2: private SpriteAtlas atlas;
3:
4: public Sprite LoadSprite(string spriteName)
5: {
6: // Obtain a Sprite from SpriteAtlas with the Sprite name as an argument
7: var sprite = atlas.GetSprite(spriteName);
8: return sprite;
9: }
Loading a single Sprite from a SpriteAtlas consumes more memory than loading that sprite on its own, since the texture of the entire atlas is loaded. SpriteAtlases should therefore be divided with appropriate granularity.
This section is written targeting SpriteAtlas V1. SpriteAtlas V2 may have signif-
icant changes in operation, such as not being able to specify the folder of the
sprite to be atlased.
7.5 Culling
In Unity, a culling process is used to skip, in advance, the processing of parts that will not ultimately be displayed on the screen.
1: SubShader
2: {
3:     Tags { "RenderType"="Opaque" }
4:     LOD 100
5:     Cull Back // Back (default), Front, or Off
6:     ...
7: }
There are three settings: Back, Front, and Off. The effect of each setting is as follows.
• Back - Do not draw polygons facing away from the viewer (back-face culling, the default)
• Front - Do not draw polygons facing toward the viewer
• Off - Disable culling and draw all faces
Occlusion culling reduces rendering cost, but at the same time, it puts more load
on the CPU for the culling process, so it is necessary to balance each load and make
appropriate settings.
Only the object rendering process is reduced by occlusion culling, while pro-
cesses such as real-time shadow rendering remain unchanged.
7.6 Shaders
Shaders are very effective for graphics, but they often cause performance problems.
1: CGPROGRAM
2: #pragma vertex vert
3: #pragma fragment frag
4:
5: #include "UnityCG.cginc"
6:
7: struct appdata
8: {
9: float4 vertex : POSITION;
10: float2 uv : TEXCOORD0;
11: };
12:
13: struct v2f
14: {
15: float2 uv : TEXCOORD0;
16: float3 factor : TEXCOORD1;
17: float4 vertex : SV_POSITION;
18: };
19:
20: sampler2D _MainTex;
21: float4 _MainTex_ST;
22:
7.6.4 ShaderVariantCollection
ShaderVariantCollection can be used to compile shaders before they are used to
prevent spikes.
ShaderVariantCollection allows you to keep a list of the shader variants used in your game as an asset. It is created by selecting "Create -> Shader -> Shader Variant Collection" from the Project view.
From the Inspector view of the created ShaderVariantCollection, press Add Shader
to add the target shader, and then select which variants to add for the shader.
You can also call ShaderVariantCollection.WarmUp() from a script to explicitly precompile the shader variants contained in the corresponding ShaderVariantCollection.
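A sketch of that call (class and field names are illustrative), run during a loading screen to avoid compilation spikes in gameplay:

```csharp
using UnityEngine;

public class ShaderPrewarmer : MonoBehaviour
{
    [SerializeField] private ShaderVariantCollection variants;

    void Start()
    {
        // Compile every variant in the collection up front, once.
        if (!variants.isWarmedUp)
        {
            variants.WarmUp();
        }
    }
}
```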
7.7 Lighting
Lighting is one of the most important aspects of a game’s artistic expression, but it
often has a significant impact on performance.
There are several ways to reduce the number of objects casting shadows; a simple one is to set the Cast Shadows setting of the MeshRenderer to Off. This removes the object from the shadow draw calls. The setting is on by default in Unity, so it deserves attention in projects that use shadows.
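The same setting can be changed from code; a minimal sketch (the component setup is assumed):

```csharp
using UnityEngine;
using UnityEngine.Rendering;

public class ShadowDisabler : MonoBehaviour
{
    void Start()
    {
        // Exclude this renderer from shadow map rendering.
        GetComponent<MeshRenderer>().shadowCastingMode = ShadowCastingMode.Off;
    }
}
```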
It is also useful to reduce the maximum distance at which objects are drawn into the shadow map. Use Shadow Distance in the Quality Settings to limit shadow-casting objects to the minimum necessary. Keeping this distance small also helps shadow quality, since the shadow map's resolution is then spent on the minimum necessary range.
The Shadows setting allows you to change the shadow format: Hard Shadows produces crisp shadow borders at a relatively low cost, while Soft Shadows is more expensive but can produce softly blurred shadow borders.
The Shadow Resolution and Shadow Cascades items affect the resolution of the shadow map; larger settings increase its resolution and consume more fill rate. Since these settings strongly affect shadow quality, adjust them carefully to strike a balance between performance and quality.
Some settings can be adjusted using the Light component’s Inspector, so it is
possible to change the settings for individual lights.
Pseudo Shadow
Depending on the game genre or art style, it can be effective to fake object shadows with flat (quad) polygons or similar stand-ins. Although this method has strong usage restrictions and little flexibility, it is far cheaper than regular real-time shadow rendering.
In this state, select "Window -> Rendering -> Lighting" from the menu to display
the Lighting view.
By default, no Lighting Settings asset is specified, so create a new one by clicking the New Lighting Settings button.
There are many settings that can be adjusted to change the speed and quality of
lightmap baking. Therefore, these settings should be adjusted appropriately for the
desired speed and quality.
Of these, the setting with the greatest impact on performance is Lightmap Resolution. It determines how many lightmap texels are allocated per unit of distance in Unity, and since the final lightmap size depends on this value, it significantly affects storage and memory consumption, texture access speed, and other factors.
Finally, press the Generate Lighting button at the bottom of the view to bake the lightmap. Once baking is complete, the baked lightmap is stored in a folder with the same name as the scene.
7.9 Texture Streaming
In addition, the texture import settings must be changed to allow streaming of texture mipmaps. Open the texture's Inspector and enable Streaming Mipmaps under the Advanced settings. This enables streaming mipmaps for that texture. Also, use Memory Budget under the Quality Settings to limit the total memory usage of loaded textures. The texture streaming system loads mipmaps without exceeding the amount of memory set here.
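Assuming the streaming-related QualitySettings properties available in recent Unity versions, the same settings can be sketched in code (the budget value is illustrative):

```csharp
using UnityEngine;

public class StreamingBudget : MonoBehaviour
{
    void Start()
    {
        // Enable texture mipmap streaming and cap texture memory (in MB).
        QualitySettings.streamingMipmapsActive = true;
        QualitySettings.streamingMipmapsMemoryBudget = 512f;
    }
}
```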
Chapter 8
Tuning Practice - UI
This chapter introduces tuning practices for uGUI, the Unity standard UI system, and for TextMeshPro, the mechanism for drawing text to the screen.
Splitting a Canvas is also effective when a Canvas is nested under another Canvas. If the elements contained in the child Canvas change, only a rebuild of the child Canvas runs, not of the parent Canvas. However, upon closer inspection, the situation seems to differ when UI in the child Canvas is switched to the active state with SetActive: in that case, if a large number of UI elements are placed in the parent Canvas, a high load can occur. The details of why this behavior occurs are unclear, but care should be taken when switching the active state of UI in a nested Canvas.
8.2 UnityWhite
When developing UIs, it is often the case that we want to display a simple rectangle-
shaped object. This is where UnityWhite comes in handy. UnityWhite is a Unity
built-in texture that is used when the Image or RawImage component does not specify
the image to be used ( Figure 8.1). You can see how UnityWhite is used in the Frame
Debugger ( Figure 8.2). Since this mechanism draws a white rectangle, a simple rectangular display can be achieved by combining it with a multiply color.
Draw calls can be merged because the same material is used for sprites taken from the same SpriteAtlas.
When using the Layout component, Layout rebuilds occur when the target object
is created or when certain properties are edited. Layout rebuilds, like mesh rebuilds,
are costly processes.
To avoid performance degradation due to Layout rebuilds, it is effective to avoid
using Layout components as much as possible.
For example, if you do not need dynamic placement, such as text whose layout changes based on its content, you do not need a Layout component. If you really do need dynamic placement, or it is used heavily on one screen, it may be better to control placement with your own scripts. If the requirement is to keep a position relative to the parent even when the parent is resized, this can be accomplished by adjusting the RectTransform anchors. If you use a Layout component while creating a prefab because it is convenient for placement, be sure to remove it if it is not needed at runtime.
8.4 Raycast Target
This property is enabled by default, but in practice many Graphic components do not need it enabled. Unity also has a feature called presets *1 that
allows you to change the default value in your project. Specifically, you can create
presets for the Image and RawImage components, respectively, and register them as
default presets from the Preset Manager in Project Settings. You may also use this
feature to disable the Raycast Target property by default.
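The property can also be disabled in bulk from code; a sketch (class name illustrative, assuming purely decorative UI under this object):

```csharp
using UnityEngine;
using UnityEngine.UI;

public class RaycastTargetDisabler : MonoBehaviour
{
    void Awake()
    {
        // Decorative graphics never need hit detection, so skip them
        // in GraphicRaycaster processing.
        foreach (var graphic in GetComponentsInChildren<Graphic>())
        {
            graphic.raycastTarget = false;
        }
    }
}
```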
8.5 Masks
To represent masks in uGUI, use either the Mask component or the RectMask2d
component.
Since Mask uses stencils to realize masking, the drawing cost increases with each additional Mask.
*1 https://docs.unity3d.com/ja/current/Manual/Presets.html
8.6 TextMeshPro
The common way to set text in TextMeshPro is to assign text to the text property,
but there is another method, SetText.
SetText has many overloads; for example, one takes a format string and a float value as arguments. Using this overload as in List 8.1 prints the value of the second argument. Here, assume that label is a variable of type TMP_Text (or a type inheriting from it) and that number is of type float.
1: label.SetText("{0}", number);
The advantage of this method is that it reduces the cost of generating strings.
1: label.text = number.ToString();
*2 https://issuetracker.unity3d.com/issues/rectmask2d-diffrently-masks-image-in-the-play-mode-when-animating-rect-transform-pivot-property
In the method using the text property, as in List 8.2, float's ToString() is executed, so a string generation cost is incurred every time the process runs. In contrast, SetText is designed to generate as few strings as possible, which is a performance advantage when the displayed text changes frequently.
This feature of TextMeshPro is also very powerful when combined with ZString *3 .
ZString is a library that reduces memory allocation in string generation. ZString pro-
vides many extension methods for the TMP_Text type, and by using those methods,
flexible text display can be achieved while reducing the cost of string generation.
*3 https://github.com/Cysharp/ZString
8.7 UI Display Switching
The CanvasGroup component allows you to adjust the transparency of all objects under it at once. If you use this function and set the alpha to 0, you can hide all objects under the CanvasGroup ( Figure 8.6).
While these methods are expected to avoid the load caused by SetActive, you
may need to be careful because GameObject will remain in the active state. For
example, if Update methods are defined, be aware that they will continue to run even
in the hidden state, which may lead to an unexpected increase in load.
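A sketch of the CanvasGroup approach (class and field names are illustrative):

```csharp
using UnityEngine;

public class PanelVisibility : MonoBehaviour
{
    [SerializeField] private CanvasGroup group;

    public void SetVisible(bool visible)
    {
        group.alpha = visible ? 1f : 0f; // hides rendering without SetActive
        group.interactable = visible;    // disables input on child UI
        group.blocksRaycasts = visible;  // stops the hidden UI catching raycasts
    }
}
```

Disabling interactable and blocksRaycasts alongside alpha prevents the invisible UI from still receiving input.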
For reference, we measured the processing time of switching 1280 GameObjects with Image components attached between visible and hidden states using each method ( Table 8.1). The measurement was done in the Unity Editor without Deep Profile. The processing time of each method is the sum of the execution time of the actual switching *4 and the execution time of UIEvents.WillRenderCanvases in that frame; the latter is included because the UI rebuild runs inside UIEvents.WillRenderCanvases.
From the results in Table 8.1, the method using CanvasGroup had by far the shortest processing time in the situation we tested.
*4 For example, in the SetActive case, the SetActive call is enclosed in Profiler.BeginSample and Profiler.EndSample to measure the time.
Chapter 9 Tuning Practice - Script (Unity)
Casual use of the features provided by Unity can lead to unexpected pitfalls. This
chapter introduces performance tuning techniques related to Unity’s internal imple-
mentation with actual examples.
1: void Update()
2: {
3:     // Calling GetComponent every frame is costly; cache the reference instead.
4:     Rigidbody rb = GetComponent<Rigidbody>();
5:     rb.AddForce(Vector3.up * 10f);
6: }
9.5 Classes that need to be explicitly discarded
1: void Start()
2: {
3: _texture = new Texture2D(8, 8);
4: _sprite = Sprite.Create(_texture, new Rect(0, 0, 8, 8), Vector2.zero);
5: _material = new Material(shader);
6: _graph = PlayableGraph.Create();
7: }
8:
9: void OnDestroy()
10: {
11: Destroy(_texture);
12: Destroy(_sprite);
13: Destroy(_material);
14:
15: if (_graph.IsValid())
16: {
17: _graph.Destroy();
18: }
19: }
1: _animator.Play("Wait");
2: _material.SetFloat("_Prop", 100f);
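The two calls above look up the state and property by string every time. A common optimization (sketched here under the assumption that _animator and _material are the same fields as above; not necessarily the book's exact recommendation) is to cache the hashed IDs once:

```csharp
// Cache hashed IDs once instead of hashing the strings on every call.
private static readonly int WaitState = Animator.StringToHash("Wait");
private static readonly int PropId = Shader.PropertyToID("_Prop");

void Update()
{
    _animator.Play(WaitState);           // int-hash overload of Play
    _material.SetFloat(PropId, 100f);    // int-ID overload of SetFloat
}
```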
JsonUtility (although it has less functionality than .NET JSON) has been shown in benchmark tests to be significantly faster than the commonly used .NET JSON. However, there is one performance-related issue to be aware of: the handling of null.
The sample code below shows the serialization process and its results. You can see that even though the member b1 of class A is explicitly set to null, it is serialized with class B and class C instances generated by the default constructor. If a field to be serialized is null like this, a dummy object is newly created during JSON conversion, so you may want to take that overhead into account.
*3 https://docs.unity3d.com/ja/current/Manual/JSONSerialization.html
9.8 Pitfalls of Renderer and MeshFilter
Accessing Renderer.material when the material is shared with other renderers clones the shared material, and the renderer uses the clone from then on. Keep materials and meshes acquired this way in member variables and destroy them at the appropriate time; it is your responsibility to destroy the automatically instantiated material or mesh when the GameObject is destroyed.
1: void Start()
2: {
3: _material = GetComponent<Renderer>().material;
4: }
5:
6: void OnDestroy()
7: {
8: if (_material != null) {
9: Destroy(_material);
10: }
11: }
*4 https://docs.unity3d.com/ja/current/ScriptReference/Renderer-material.html
*5 https://docs.unity3d.com/ja/current/ScriptReference/MeshFilter-mesh.html
If you turn off the Logging setting in Unity, stack trace collection stops, but logs are still output. If UnityEngine.Debug.unityLogger.logEnabled is set to false, no logs are output, but since this is only a branch inside the function, the function call cost and the now-unnecessary string generation and concatenation still occur. There is also the option of using the #if directive, but it is not realistic to wrap every log output in it.
1: #if UNITY_EDITOR
2: Debug.LogError($"Error {e}");
3: #endif
The Conditional attribute can be utilized in such cases. Calls to functions with the Conditional attribute are removed by the compiler if the specified symbol is not defined. As in the sample in List 9.13, a good rule is to route Unity's logging functions through your own log output class and add the Conditional attribute to each of its methods, so that the entire function call can be removed when necessary.
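A sketch of such a wrapper (class name and the ENABLE_LOG symbol are illustrative, not the book's exact code):

```csharp
using System.Diagnostics;

public static class GameLog
{
    // Calls to these methods are compiled out entirely
    // when ENABLE_LOG is not defined.
    [Conditional("ENABLE_LOG")]
    public static void Info(string message)
    {
        UnityEngine.Debug.Log(message);
    }

    [Conditional("ENABLE_LOG")]
    public static void Error(string message)
    {
        UnityEngine.Debug.LogError(message);
    }
}
```

With the symbol undefined, a call like GameLog.Info($"HP: {hp}") is removed including its string interpolation, so no argument evaluation cost remains.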
One thing to note is that the symbols specified must be able to be referenced by
the function caller. The scope of the symbols defined in #define would be limited to
the file in which they are written. It is not practical to define a symbol in every file that
calls a function with the Conditional attribute. Unity has a feature called Scripting
Define Symbols that allows you to define symbols for the entire project. This can be
done under "Project Settings -> Player -> Other Settings".
*6 https://docs.unity3d.com/Packages/[email protected]/manual/docs/QuickStart.html
*7 https://llvm.org/
9.10 Accelerate your code with Burst
1: [BurstCompile]
2: private struct MyJob : IJob
3: {
4: [ReadOnly]
5: public NativeArray<float> Input;
6:
7: [WriteOnly]
8: public NativeArray<float> Output;
9:
10: public void Execute()
11: {
12: for (int i = 0; i < Input.Length; i++)
13: {
14: Output[i] = Input[i] * Input[i];
15: }
16: }
17: }
In List 9.14, each element in line 14 of the job can be computed independently (there is no order dependence in the computation), and since the memory layout of the output array is contiguous, multiple elements can be computed together with SIMD instructions.
*8 https://docs.unity3d.com/Packages/[email protected]/manual/docs/CSharpLanguageSupport_Types.html
*9 https://docs.unity3d.com/Manual/JobSystemNativeContainer.html
You can see what kind of assembly the code will be converted to using the Burst Inspector, as shown in Figure 9.2.
▲ Figure 9.2 Using the Burst Inspector to check what assembly the code is converted to
The process on line 14 of List 9.14 is converted to the ARMV8A_AARCH64 assembly shown in List 9.15. The suffix .4s on the assembly operands confirms that SIMD instructions are used.
The performance of pure C# code and the Burst-optimized code was compared on a real device: an Android Pixel 4a, built with the IL2CPP scripting backend. The array size is 2^20 = 1,048,576. The same process was repeated 10 times and the average processing time was taken.
The results of the performance comparison are shown in Table 9.1.
Chapter 10 Tuning Practice - Script (C#)
This chapter mainly introduces performance tuning practices for C# code with exam-
ples. Basic C# notation is not covered here, but rather the design and implementation
that you should be aware of when developing games that require performance.
The major problem with this code is that a List<int> is newly allocated in the Update method, which runs every frame. To fix this, you can avoid GC.Alloc every frame by generating the List<int> in advance and reusing it.
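A minimal sketch of that fix (class name and capacity are illustrative):

```csharp
using System.Collections.Generic;
using UnityEngine;

public class NoAllocExample : MonoBehaviour
{
    // Allocated once; reused every frame.
    private readonly List<int> _buffer = new List<int>(capacity: 128);

    void Update()
    {
        // Clear keeps the underlying array, so no GC.Alloc occurs here.
        _buffer.Clear();
        for (int i = 0; i < 100; i++)
        {
            _buffer.Add(i);
        }
    }
}
```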
I don't think you will ever write meaningless code exactly like this sample, but similar examples can be found in more cases than you might imagine. As you may have noticed, the fix shown in List 10.2 above is all that is needed.
1: // Member Variables
2: private int _memberCount = 0;
3:
4: // static variables
5: private static int _staticCount = 0;
6:
7: // member method
8: private void IncrementMemberCount()
9: {
10: _memberCount++;
11: }
12:
13: // static method
14: private static void IncrementStaticCount()
15: {
16: _staticCount++;
17: }
18:
19: // Member method that only invokes the received Action
20: private void InvokeActionMethod(System.Action action)
21: {
22: action.Invoke();
23: }
In this way, the Action is newly allocated only the first time; it is cached internally, so GC.Alloc is avoided from the second call onward.
However, making all variables and methods static is not very adoptable in terms of
code safety and readability. In code that needs to be fast, it is safer to design without
using lambda expressions for events that fire at every frame or at indefinite times,
rather than to use a lot of statics to eliminate GC.Alloc.
For example, if a struct that does not implement the IEquatable<T> interface is specified as T, the argument of Equals is cast to object, resulting in boxing. By using a where clause (generic type constraint) to restrict the types T can accept to those implementing IEquatable<T>, such unexpected boxing can be prevented.
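A sketch of the constraint (class and method names are illustrative):

```csharp
using System;

public static class Finder
{
    // T is restricted to types implementing IEquatable<T>, so
    // value.Equals(item) calls the strongly typed overload and
    // struct arguments are not boxed.
    public static bool Contains<T>(T[] array, T value) where T : IEquatable<T>
    {
        foreach (var item in array)
        {
            if (value.Equals(item)) return true;
        }
        return false;
    }
}
```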
10.2 About for/foreach
*1 https://sharplab.io/
▼ List 10.13 Decompilation result of the example of looping through a List with foreach
When looping with foreach, you can see that the implementation obtains the enumerator, advances it with MoveNext(), and reads the value through Current. Furthermore, looking at the implementation of MoveNext() in list.cs *2 , it performs a number of extra property accesses, such as size checks, so more processing occurs than with direct access through the indexer.
Next, let's look at looping with for.
In C#, the for statement is syntactic sugar for a while statement, and each element is obtained through the indexer (public T this[int index]). Also, if you look closely at this while statement, you will see that the conditional expression contains list.Count. This means the Count property is accessed on every iteration. The load of these property accesses grows in proportion to the number of iterations and can become non-negligible. If Count does not change within the loop, the property-access load can be reduced by caching it before the loop.
*2 https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs
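A source-level sketch of the caching described above (variable and method names are assumed; the book's listing shows the decompiled form):

```csharp
using System.Collections.Generic;

static int Sum(List<int> list)
{
    int sum = 0;
    // Read the Count property only once, before the loop.
    for (int i = 0, count = list.Count; i < count; i++)
    {
        sum += list[i];
    }
    return sum;
}
```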
▼ List 10.17 Example of List in for: Decompiled result of the improved version
Caching Count reduces the number of property accesses and makes the loop faster. Neither version of this loop incurs GC.Alloc; the difference comes purely from the implementation. In the case of arrays, foreach is also optimized and behaves almost the same as the for version.
For verification, 10,000,000 elements with pre-assigned random numbers are used, and the sum of the data in a List<int> is calculated. The verification environment was a Pixel 3a with Unity 2021.3.1f1.
In the case of List<int>, a finer-grained comparison shows that for, and in particular for with the Count caching optimization, is faster than foreach. Rewriting a List's foreach into a for loop with Count caching removes the overhead of MoveNext() and the Current property in foreach processing, making it faster. In addition, comparing the respective fastest versions, arrays are approximately 2.3 times faster than List. For arrays, even though foreach and for produce much the same IL, foreach was the faster result, so an array's foreach is sufficiently optimized.
Based on the above results, arrays should be considered instead of List<T> when the number of elements is large and processing speed matters. However, if the foreach-to-for rewrite is incomplete, for example when a List stored in a field is referenced without caching it in a local variable, the expected speed-up may not materialize.
Reusing objects generated in advance instead of creating them each time they are needed is called object pooling. For example, objects to be used during the game phase can be pooled during the load phase and handled by only assigning and referencing the pooled objects when they are used, thereby avoiding GC.Alloc during the game phase.
Beyond reducing allocations, object pooling is useful in a variety of situations: enabling screen transitions without recreating the objects that make up the screen each time, reducing load times, and avoiding repeated heavy computations by retaining the results of processes with very high calculation costs.
The term "object" is used here in a broad sense: it applies not only to plain data but also to things like Coroutine and Action. For example, consider generating more Coroutines than the expected number of executions in advance and consuming them as needed. If a process will run at most 20 times in a game session of about 2 minutes, you can reduce generation costs by creating the IEnumerators in advance and only calling StartCoroutine when needed.
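A minimal generic pool sketch (class and member names are illustrative, not from the book):

```csharp
using System.Collections.Generic;

public class SimplePool<T> where T : new()
{
    private readonly Stack<T> _stack;

    public SimplePool(int initialCapacity)
    {
        _stack = new Stack<T>(initialCapacity);
        // Pre-generate instances during the load phase.
        for (int i = 0; i < initialCapacity; i++)
        {
            _stack.Push(new T());
        }
    }

    // Rent an instance; allocates only if the pool runs dry.
    public T Rent() => _stack.Count > 0 ? _stack.Pop() : new T();

    // Return the instance for reuse instead of letting it become garbage.
    public void Return(T item) => _stack.Push(item);
}
```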
10.4 string
A string object is a sequential collection of System.Char objects representing a string. GC.Alloc can easily occur from a single string operation. For example, concatenating two strings with the + operator creates a new string object. Because a string's value cannot be changed after creation (it is immutable), any operation that appears to change the value actually creates and returns a new string object. In the example above, a new string is created with each concatenation, resulting in GC.Alloc each time.
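One common mitigation, sketched here (variable names and capacity are illustrative), is to build the string in a StringBuilder so that only the final ToString() allocates:

```csharp
using System.Text;

// Reuse the builder's internal buffer across appends.
var builder = new StringBuilder(capacity: 64);
builder.Append("score: ");
builder.Append(100);
string text = builder.ToString(); // the single string allocation
```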
Options for avoiding string allocations more aggressively include implementing your own code with unsafe, or introducing a library with Unity extensions such as ZString *3 (for example, its NonAlloc support for TextMeshPro).
10.5 LINQ and Lazy Evaluation
The reason GC.Alloc occurs in List 10.22 lies in the internal implementation of LINQ. In addition, some LINQ methods are optimized for the caller's type, so the size of GC.Alloc changes depending on the type of the caller.
*3 https://github.com/Cysharp/ZString
23:
24: public void RunAsIEnumerable()
25: {
26: var query = ienumerable.Where(i => i % 2 == 0);
27: foreach (var i in query){}
28: }
We measured a benchmark for each method defined in List 10.23; the results are shown in Figure 10.1. They show that the size of heap allocations increases in the order T[] → List<T> → IEnumerable<T>. Thus, when using LINQ, the size of GC.Alloc can be reduced by being aware of the runtime type.
Part of the cause of GC.Alloc when using LINQ is LINQ's internal implementation. Many LINQ methods take an IEnumerable<T> and return an IEnumerable<T>, an API design that allows intuitive method chaining. The IEnumerable<T> returned by a method is an instance of a class specific to that function: LINQ internally instantiates a class implementing IEnumerable<T>, and GC.Alloc also occurs internally because GetEnumerator() is called to realize loop processing and the like.
List 10.25 shows the execution result of List 10.24. By appending ToArray, which forces immediate evaluation, the Where and Select methods are executed and their values evaluated at the point where the assignment to query is made. Therefore, since HeavyProcess is also called there, you can see that processing time is spent at the moment query is generated.
1: Query: 3013
2: diff: 3032
3: diff: 3032
4: diff: 3032
As you can see, unintentional calls to LINQ's immediate-evaluation methods can
result in bottlenecks at those points. Methods that require looking at the
entire sequence once, such as ToArray, OrderBy, and Count, are immediately evaluated, so
be aware of the cost when calling them.
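A sketch of the difference between deferred and immediate evaluation (HeavyProcess stands in for the book's expensive operation; the code is illustrative only):

```csharp
using System.Linq;

public static class LazyVsEagerDemo
{
    static int HeavyProcess(int x)
    {
        System.Threading.Thread.Sleep(10); // stand-in for expensive work
        return x;
    }

    public static void Run()
    {
        var source = Enumerable.Range(0, 5);

        // Deferred: nothing executes yet; HeavyProcess only fires
        // when the query is actually enumerated.
        var lazy = source.Select(HeavyProcess);

        // Immediate: ToArray() forces the whole query to run here,
        // so all of the cost is paid at this assignment.
        var eager = source.Select(HeavyProcess).ToArray();

        foreach (var _ in lazy) { } // the deferred cost is paid here instead
    }
}
```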
The use of LINQ can worsen heap allocation and execution speed compared to not using it. In fact, Microsoft's performance recommendations for Unity *4 clearly state "Avoid use of LINQ". List 10.26 is a benchmark comparison of the same logic implemented with and without LINQ.
The results are shown in Figure 10.2. The comparison of execution times shows
that the process with LINQ takes 19 times longer than the process without LINQ.
While the above results clearly show that the use of LINQ deteriorates perfor-
mance, there are cases where the coding intent is more easily conveyed by using
*4 https://docs.microsoft.com/en-us/windows/mixed-reality/develop/unity/performance-
recommendations-for-unity#avoid-expensive-operations
LINQ. After understanding these behaviors, there may be room for discussion within
the project as to whether to use LINQ or not, and if so, the rules for using LINQ.

10.6 How to avoid async/await overhead
1: using System;
2: using System.Threading.Tasks;
3:
4: namespace A {
5: public class B {
6: public async Task HogeAsync(int i) {
7: if (i == 0) {
8: Console.WriteLine("i is 0");
9: return;
10: }
11: await Task.Delay(TimeSpan.FromSeconds(1));
12: }
13:
14: public void Main() {
15: int i = int.Parse(Console.ReadLine());
16: Task.Run(() => HogeAsync(i));
17: }
18: }
19: }
In cases such as List 10.27, the cost of generating the state machine structure for the IAsyncStateMachine implementation, which is unnecessary when the method completes synchronously, can be avoided by splitting HogeAsync, which may complete synchronously, and implementing it as in List 10.28.
1: using System;
2: using System.Threading.Tasks;
3:
4: namespace A {
5: public class B {
6: public async Task HogeAsync(int i) {
7: await Task.Delay(TimeSpan.FromSeconds(1));
8: }
9:
10: public void Main() {
11: int i = int.Parse(Console.ReadLine());
12: if (i == 0) {
13: Console.WriteLine("i is 0");
14: } else {
15: Task.Run(() => HogeAsync(i));
16: }
17: }
18: }
19: }
*5 https://tech.cygames.co.jp/archives/3417/
10.7 Optimization with stackalloc
Since C# 7.2, stackalloc can be used without unsafe by receiving the result in a Span<T> structure, as shown in List 10.30.
▼ List 10.30 Allocating an array on the stack with stackalloc and Span<T>
In Unity, Span<T> is available as standard from 2021.2. In earlier versions, Span<T> does not exist, so System.Memory.dll must be installed.
Arrays allocated with stackalloc are stack-only and cannot be held in class or
structure fields. They must be used as local variables.
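A minimal sketch of the pattern under the constraints just described (the method and buffer are illustrative, not the book's listing):

```csharp
using System;

public static class StackallocDemo
{
    // Sums the first `count` squares using a stack-allocated buffer.
    // The Span<int> must stay a local variable; it cannot be stored
    // in a class or struct field.
    public static int SumOfSquares(int count)
    {
        Span<int> buffer = stackalloc int[count]; // no heap allocation
        for (var i = 0; i < count; i++)
            buffer[i] = i * i;

        var sum = 0;
        foreach (var v in buffer)
            sum += v;
        return sum;
    }
}
```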
Even though the array is allocated on the stack, it takes a certain amount of pro-
cessing time to allocate an array with a large number of elements. If you want to
use arrays with a large number of elements in places where heap allocation should
be avoided, such as in an update loop, it is better to allocate the array in advance
during initialization or to prepare a data structure like an object pool, and implement
it in such a way that it can be rented out when used.
287
Chapter 10 Tuning Practice - Script (C#)
Also, note that the stack area allocated by stackalloc is not released until the
function exits. For example, the code shown in List 10.31 may cause a stack overflow while looping, since all arrays allocated in the loop are retained and are only released
when the Hoge method exits.
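A sketch of the pitfall described above (the method name Hoge follows the book's example; the buffer size and loop count are illustrative):

```csharp
using System;

public static class StackallocPitfall
{
    // Each iteration allocates a NEW stack buffer, and none of them is
    // released until Hoge returns: stack usage grows with every loop
    // iteration and can eventually overflow for large counts.
    public static void Hoge(int iterations)
    {
        for (var i = 0; i < iterations; i++)
        {
            Span<byte> buffer = stackalloc byte[1024];
            buffer[0] = 1; // use the buffer so it is not optimized away
        }
        // all buffers are reclaimed only here, when the method exits
    }
}
```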
10.8 Optimizing method invocation under IL2CPP backend with sealed

Specifically, for each method call on a class, the code shown in List 10.32
is automatically generated.
1: struct VirtActionInvoker0
2: {
3: typedef void (*Action)(void*, const RuntimeMethod*);
4:
5: static inline void Invoke (
6: Il2CppMethodSlot slot, RuntimeObject* obj)
7: {
8: const VirtualInvokeData& invokeData =
9: il2cpp_codegen_get_virtual_invoke_data(slot, obj);
10: ((Action)invokeData.methodPtr)(obj, invokeData.method);
11: }
12: };
It generates similar C++ code not only for virtual methods, but also for non-virtual methods of types that may still be inherited from at compile time. This auto-generated behavior leads to bloated code size and increased processing time for method calls.
*6 https://blog.unity.com/technology/il2cpp-internals-method-calls
This problem can be avoided by adding the sealed modifier to the class definition
*7 .
If you define a class like List 10.33 and call a method on it, the C++ code
generated by IL2CPP will contain method calls like List 10.34.
▼ List 10.34 The C++ code corresponding to the method call in List 10.33
*7 https://blog.unity.com/technology/il2cpp-optimizations-devirtualization
▼ List 10.35 Class definition and method calls using the sealed modifier
Thus, we can see that the call site invokes Cow_Speak_m1607867742 directly, rather than going through a virtual invoker.
However, in relatively recent versions of Unity, Unity officially states that such optimization
is partially automatic *8 .
In other words, even if you do not explicitly specify sealed, it is possible that such
optimization is done automatically.
However, as mentioned in the forum thread "[il2cpp] Is ‘sealed‘ Not Worked As Said Anymore In Unity 2018.3?" *8 , this implementation is not complete as of April 2019.
Because of this current state of affairs, it would be a good idea to check the code
generated by IL2CPP and decide on the setting of the sealed modifier for each
project.
For more reliable direct method calls, and in anticipation of future IL2CPP opti-
mizations, it may be a good idea to set the sealed modifier as an optimizable mark.
*8 https://forum.unity.com/threads/il2cpp-is-sealed-not-worked-as-said-anymore-in-
unity-2018-3.659017/#post-4412785
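As a minimal illustration of the pattern discussed above (the Cow class and Speak method follow the naming visible in the generated code; this sketch is not the book's listing):

```csharp
// Without sealed, IL2CPP must assume a subclass of Cow could override
// Speak(), so calls are routed through a virtual invoker. Sealing the
// class allows the call to be devirtualized into a direct C++ call.
public abstract class Animal
{
    public abstract string Speak();
}

public sealed class Cow : Animal
{
    public override string Speak() => "moo";
}
```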
10.9 Optimization through inlining
Inlining copies and expands the contents of a method at its call site: for example, the call to the Add method within the Func method of List 10.37 is expanded as shown in List 10.38.
In IL2CPP, no particular inlining optimization is performed during code generation.
However, starting with Unity 2020.2, by specifying the MethodImpl attribute on
a method with MethodImplOptions.AggressiveInlining as its parameter, the corresponding function in the generated C++ code is given the inline specifier. In
other words, inlining at the C++ code level is now possible.
The advantage of inlining is that it not only reduces the cost of method calls, but
also saves copying of arguments specified at the time of method invocation.
For example, arithmetic methods often take multiple relatively large structures, such as Vector3 and Matrix, as arguments. If the structs are passed as-is, they are all copied, since they are passed by value. When the number of arguments and the size of the passed structs are large, the processing cost of the method invocation and the argument copying can be considerable. Moreover, because such methods are often used in periodic processing, such as the implementation of physics and animation, their call overhead can become a processing burden that cannot be overlooked.
In such cases, optimization through inlining can be effective. In fact, Unity's new
math library Unity.Mathematics specifies MethodImplOptions.AggressiveInlining on
methods throughout *9 .
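A sketch of the attribute usage described above (the method name and body are illustrative, not taken from Unity.Mathematics):

```csharp
using System.Runtime.CompilerServices;
using UnityEngine;

public static class FastMath
{
    // Requests inlining; with IL2CPP on Unity 2020.2+ the generated C++
    // function receives the `inline` specifier. Inlining also avoids
    // copying the two Vector3 arguments on every call.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static float SqrDistance(Vector3 a, Vector3 b)
    {
        var d = a - b;
        return d.x * d.x + d.y * d.y + d.z * d.z;
    }
}
```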
On the other hand, inlining has the disadvantage that the code size increases with
the expansion of the process within the method.
Therefore, consider inlining especially for methods that are called frequently within a single frame and lie on hot paths. Note also that specifying the attribute does not always result in inlining: inlining is generally limited to methods with small bodies, so methods you want inlined must be kept small.
Also, in versions earlier than Unity 2020.2 the attribute does not add the inline specifier, and even where the C++ inline specifier is present, there is no guarantee that inlining will actually be performed.
Therefore, if you want to ensure inlining, you may want to consider manual inlining
for hot-path methods, although it will reduce readability.
*9 https://github.com/Unity-Technologies/Unity.Mathematics/blob/
f476dc88954697f71e5615b5f57462495bc973a7/src/Unity.Mathematics/math.cs#L1894
Chapter 11 Tuning Practice - Player Settings

This chapter introduces the Player items in Project Settings that affect performance.
In addition, when the Scripting Backend is set to IL2CPP, the C++ Compiler Configuration can be selected.
11.1 C++ Compiler Configuration

Here you can choose between Debug, Release, and Master. Each has a tradeoff between build time and degree of optimization, so use the one that best suits your build objectives.
11.1.1 Debug
Debug does not perform well at runtime because no optimization is performed, but
build time is the shortest compared to the other settings.
11.1.2 Release
Optimization improves run-time performance and reduces the size of built binaries,
but increases build time.
11.1.3 Master
All optimizations available for the platform are enabled. For example, Windows builds
will use more aggressive optimizations such as link-time code generation (LTCG).
In return, build times will be even longer than with the Release setting, but Unity
recommends using the Master setting for production builds if this is acceptable.
11.2 Strip Engine Code / Managed Stripping Level

Because stripping relies on static analysis, types that are not directly referenced in the code, or code that is invoked dynamically via reflection, may be mistakenly removed. In such cases, exclude the relevant code from stripping with a link.xml file or by specifying the Preserve attribute. *1
*1 https://docs.unity3d.com/2020.3/Documentation/Manual/ManagedCodeStripping.html
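A minimal link.xml sketch (the assembly and type names below are hypothetical placeholders for code reached only via reflection):

```xml
<!-- Placed in the Assets folder; tells the managed stripper to keep
     types that static analysis cannot see being used. -->
<linker>
  <assembly fullname="MyGame.Runtime">
    <type fullname="MyGame.Save.SaveDataMigrator" preserve="all"/>
  </assembly>
</linker>
```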
Chapter 12 Tuning Practice - Third Party
This chapter introduces some things to keep in mind from a performance perspective
when implementing third-party libraries that are often used when developing games
in Unity.
12.1 DOTween
DOTween *1 is a library that allows scripts to create smooth animations. For example,
an animation that zooms in and out can be written concisely as in the following code.
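A hedged sketch of what such a zoom animation looks like with DOTween's API (durations and scale values are illustrative, not the book's listing):

```csharp
using DG.Tweening;
using UnityEngine;

public class ZoomAnimation : MonoBehaviour
{
    void Start()
    {
        // Scale up to 1.2x over 0.5s, then back, repeating forever.
        // Building the tween (DOScale/SetLoops) allocates memory.
        transform.DOScale(1.2f, 0.5f)
                 .SetLoops(-1, LoopType.Yoyo);
    }
}
```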
12.1.1 SetAutoKill
Since the process of creating a tween, such as DOTween.Sequence() or transform.
DOScale(...), basically involves memory allocation, consider reusing instances for
animations that are frequently replayed.
By default, a tween is automatically discarded when its animation completes; SetAutoKill(false) suppresses this. The first use case can then be replaced with the following code.
*1 http://dotween.demigiant.com/index.php
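A sketch of the reuse pattern described above (the field and method names are assumptions, not the book's listing):

```csharp
using DG.Tweening;
using UnityEngine;

public class ReusableZoom : MonoBehaviour
{
    Tween _zoom;

    void Awake()
    {
        // Build the tween once; SetAutoKill(false) keeps it alive after
        // completion, and SetLink ties its lifetime to this GameObject.
        _zoom = transform.DOScale(1.2f, 0.5f)
                         .SetAutoKill(false)
                         .SetLink(gameObject)
                         .Pause();
    }

    public void Play() => _zoom.Restart(); // replay without reallocating
}
```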
Note that a tween that calls SetAutoKill(false) will leak if it is not explicitly
destroyed. Call Kill() when it is no longer needed, or use the SetLink described
below.
12.1.2 SetLink
Tweens that call SetAutoKill(false), or that are made to repeat indefinitely with SetLoops(-1), will not be destroyed automatically, so you have to manage their lifetimes yourself. It is recommended to associate such a tween with a related GameObject via SetLink(gameObject), so that when the GameObject is destroyed, the tween is destroyed with it.
It is also useful to check for tween objects that continue to move even though their
associated GameObjects have been discarded and for tween objects that are in a
Pause state and leaking without being discarded.
12.2 UniRx
UniRx *2 is a library implementing Reactive Extensions optimized for Unity. With a
rich set of operators and helpers for Unity, event handling of complex conditions can
be written in a concise manner.
12.2.1 Unsubscribe
UniRx allows you to Subscribe to the stream publisher IObservable to
receive notifications of its messages.
When subscribing, instances of objects that receive the notifications, callbacks that process the messages, and so on are created. To avoid these instances remaining in memory
beyond the lifetime of the subscriber, it is basically the subscriber's responsibility to unsubscribe when it no longer needs to receive notifications.
There are several ways to unsubscribe, but for performance, it is best to retain the IDisposable returned by Subscribe and explicitly call Dispose on it.
If your class inherits from MonoBehaviour, you can also call AddTo(this) to auto-
matically unsubscribe at the timing of your own Destroy. Although there is an over-
head of calling AddComponent internally to monitor the Destroy, it is a good idea to
use this method, which is simpler to write.
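A sketch of the two unsubscription styles described above (the stream and handlers are illustrative):

```csharp
using System;
using UniRx;
using UnityEngine;

public class JumpInput : MonoBehaviour
{
    IDisposable _subscription;

    void Start()
    {
        // Style 1: keep the IDisposable and dispose it explicitly.
        _subscription = Observable.EveryUpdate()
            .Where(_ => Input.GetKeyDown(KeyCode.Space))
            .Subscribe(_ => Debug.Log("jump"));

        // Style 2: AddTo(this) disposes automatically on Destroy, at the
        // cost of an internal AddComponent used to monitor the Destroy.
        Observable.EveryUpdate()
            .Subscribe(_ => { /* per-frame work */ })
            .AddTo(this);
    }

    void OnDestroy() => _subscription?.Dispose();
}
```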
*2 https://github.com/neuecc/UniRx
12.3 UniTask
UniTask is a powerful library for high-performance asynchronous processing in Unity,
featuring zero-allocation asynchronous processing with the value-based UniTask
type. It can also control the execution timing according to Unity’s PlayerLoop, thus
completely replacing conventional coroutines.
12.3.1 UniTask v2
UniTask v2, a major upgrade of UniTask, was released in June 2020. UniTask v2
features significant performance improvements, such as zero-allocation of the entire
async method, and added features such as asynchronous LINQ support and await
support for external assets. *3
On the other hand, be careful when updating from UniTask v1, as v2 includes breaking changes, such as tasks returned by factories like UniTask.Delay(...) starting to run at the moment they are invoked, and normal UniTask instances no longer being awaitable more than once *4 . However, aggressive optimizations have further improved
performance, so UniTask v2 is basically the way to go.
*3 https://tech.cygames.co.jp/archives/3417/
*4 In UniTask v2, UniTask.Preserve can be used to convert a UniTask into one that can be awaited multiple times.
If this MonoBehaviour is Destroyed before _hp is fully depleted, _hp will never be
depleted any further, so the UniTask returned by WaitForDeadAsync loses
the opportunity to complete and continues to wait forever.
It is recommended that you use this tool to check for UniTask leaks caused by
misconfigured termination conditions.
The reason why the example code leaks a task is that it does not take into
account the case where the object itself is destroyed before the termination
condition is met.
To fix this, simply check whether the object itself has been destroyed. Alternatively,
the CancellationToken obtained from this.GetCancellationTokenOnDestroy()
can be passed to WaitForDeadAsync so that the task is canceled
when the MonoBehaviour is Destroyed.
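A sketch of the cancellation approach just described (WaitForDeadAsync and _hp follow the book's example; the method body is an assumption):

```csharp
using System.Threading;
using Cysharp.Threading.Tasks;
using UnityEngine;

public class Enemy : MonoBehaviour
{
    int _hp = 100;

    // Completes when _hp reaches zero; the token aborts the wait loop
    // (with an OperationCanceledException) if the object is destroyed.
    async UniTask WaitForDeadAsync(CancellationToken token)
    {
        while (_hp > 0)
            await UniTask.Yield(token);
    }

    void Start()
    {
        var token = this.GetCancellationTokenOnDestroy();
        WaitForDeadAsync(token).Forget();
    }
}
```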
At Destroy time, the former UniTask completes without incident, while the
latter throws an OperationCanceledException. Which behavior is preferable
depends on the situation, and the appropriate implementation should be chosen.
CONCLUSION
This is the end of this document. We hope that through this book, those of you who
were "not confident about performance tuning" have come to think, "I kind of get it, and
I want to try it." As more people practice it in their projects, they will be able to deal
with problems much faster, and the stability of their projects will increase.
You may also encounter complex events that cannot be solved with the information
presented in this book. But even in such cases, what you will do will be the same.
You will still need to profile, analyze the cause, and take some action.
From this point forward, please make full use of your own knowledge, experience,
and imagination through practice. I hope you will enjoy performance tuning in this
way. Thank you for reading to the end.
Introduction of the Authors
The following is a list of the authors involved in this book. Please note that the
profiles of the authors and the sections they are responsible for are current at the
time of writing.
Takuya Iida
Engineering Manager, SGE Core Technology Division, Grange Corporation. He is in
charge of writing Chapter 1 "Getting Started with Performance Tuning" and Chapter 3 "Profiling Tools". Currently involved in optimization across subsidiaries. He does a variety of work and strives every day to improve development speed and quality.
Yusuke Ishiguro
SGE Core Technology Division, CyberAgent, Inc.
Responsible for part of Chapter 2 "Fundamentals" and for writing Chapter 5 "Tuning Practice - AssetBundle". He was assigned to the infrastructure development team
of Ameba game (now QualArts) as a Unity engineer and engaged in the development
of various infrastructures, such as a real-time infrastructure, a chat infrastructure, the AssetBundle management infrastructure "Octo", and an authentication and billing infrastructure.
He has since transferred to the SGE Core Technology Division, where he leads
overall infrastructure development and focuses on optimizing the development
efficiency and quality of the entire Game Division.
Daiki Hakamata
SGE Core Technology Division, CyberAgent, Inc. He writes Chapter 9 "Tuning Practice - Script (Unity)". Engaged in game development and operation at Grange Inc. and
G-Crest Inc. Currently he belongs to the SGE Core Technology Division and is developing the infrastructure.
Gaku Ishii
Server- and client-side engineer at Samzap Inc. He writes
Chapter 11 "Tuning Practice - Player Settings" and Chapter 12 "Tuning Practice -
Third Party". After being assigned to Samzap Inc., he worked on the development of
new game apps as a Unity engineer. After being involved in the release of several
apps, he switched to server-side engineering. He currently works on both the server
and client side, as a server-side engineer at Samzap and as a Unity engineer in the
SGE Core Technology Division.
Kazunori Tamura
He belongs to QualArts Corporation and writes Chapter 8 "Tuning Practice - UI".
At QualArts, he is engaged in game development and internal infrastructure development as a Unity engineer, and is mainly involved in the development of UI-related
internal infrastructure.
He is also interested in improving the efficiency of game development through AI,
and is struggling to utilize AI within the game division.
Unity Performance Tuning Bible