Approaches to optimizing V8 JavaScript engine

Article  in  Proceedings of the Institute for System Programming of RAS · January 2015


DOI: 10.15514/ISPRAS-2015-27(6)-2


Dmitri Botcharnikov
Samsung Advanced Institute of Technology

All content following this page was uploaded by Dmitri Botcharnikov on 09 October 2018.


Dmitry Botcharnikov. Approaches to Optimizing V8 JavaScript Engine. Trudy ISP RAN / Proc. ISP RAS, vol. 27, issue 6, 2015, pp. 21-32

Approaches to Optimizing V8 JavaScript Engine

Dmitry Botcharnikov <[email protected]>
LLC Samsung R&D Institute Rus, 12, ul. Dvintsev, housing 1, office #1500,
Moscow, 127018, Russian Federation

Abstract. JavaScript is one of the most popular programming languages in the world. Having started as a simple scripting language for web browsers, it has become the language of choice for millions of engineers in web, mobile and server-side development. However, its interpreted nature does not always provide adequate performance. Several optimization techniques have been developed in recent years to speed up the execution of JavaScript programs. One example of a modern high-performance JavaScript engine is V8, used in the Google Chrome browser and the node.js server platform, among others. It is an open source project that implements advanced optimization methods including Just-in-Time compilation, Polymorphic Inline Caches, optimized recompilation of hot code regions, On-Stack Replacement, etc. Last year we were involved in a project to optimize the performance of the V8 JavaScript engine on major benchmark suites including Octane, SunSpider and Kraken. The project was tightly time-limited, but we achieved about a 10% total performance improvement compared to the open source version. We decided to focus on the following approaches to achieve the project's goal: an optimized build of V8 itself, because total running time is shared between compilation and execution; tuning of V8 runtime options, whose default values may not always be optimal; and implementation of additional scalar optimizations. All of these approaches contributed to the final result.
Keywords: JavaScript; optimizations; V8; common subexpression elimination
DOI: 10.15514/ISPRAS-2015-27(6)-2
For citation: Botcharnikov Dmitry. Approaches to Optimizing V8 JavaScript Engine. Trudy ISP RAN/Proc. ISP RAS, vol. 27, issue 6, 2015, pp. 21-32 (in Russian). DOI: 10.15514/ISPRAS-2015-27(6)-2

1. Introduction
JavaScript is one of the most popular programming languages in the world [1]. Having started as a simple scripting language for web browsers, it has become the language of choice for millions of engineers in web, mobile and server-side development. However, its interpreted nature does not always provide adequate performance. Several optimization techniques have been developed in recent years to speed up the execution of JavaScript programs. One example of a modern high-performance JavaScript engine is V8 [2], used in the Google Chrome browser and the node.js server platform, among others. It is an open source project that implements advanced optimization methods including Just-in-Time compilation [3], Polymorphic Inline Caches [4], optimized recompilation of hot code regions, On-Stack Replacement [5], etc.
Last year we were involved in a project to optimize the performance of the V8 JavaScript engine on major benchmark suites including Octane [6], SunSpider [7] and Kraken [8]. The project was tightly time-limited, but we achieved about a 10% total performance improvement compared to the open source version.
The rest of the paper is organized as follows: Section 2 gives an architectural overview of V8; Section 3 enumerates and motivates our approaches, which are discussed in more detail in Sections 4, 5 and 6. We conclude in Section 7.

2. V8 engine architecture
In contrast to other JavaScript engines, V8 has compiled to native code from the beginning. It consists of two JIT compilers: the first (called the Full code generator) performs fast non-optimized compilation of every JavaScript function it encounters, while the second (called Crankshaft) compiles and optimizes only those functions (and loops) that have already run for some time and are likely to keep running.

Fig. 1 V8 Engine Architecture

The overall work of the V8 engine is as follows (Fig. 1):
• Every new script is first pre-parsed to separate out its individual functions.
• The function that should run is compiled into Abstract Syntax Tree (AST) form.
• The AST is compiled into native machine code instrumented with counters for function calls and loop back edges.
• At method call sites V8 also inserts a special dispatch structure called a Polymorphic Inline Cache (PIC). The cache is initialized with a call to a generic dispatch routine. After each invocation the PIC is populated with a direct call to the type-specific receiver, up to some predefined limit. In this way PICs collect runtime type information about objects.
• The resulting code then runs.
• When the instrumentation counters reach a predefined threshold, the "hot" function or loop is selected for optimized recompilation.
• For this purpose V8 once again compiles the selected function into AST form, but this time it also performs optimizations.
• It compiles the AST into Static Single Assignment (SSA) form (called Hydrogen) and propagates the type information collected by PICs along SSA edges.
• Then it performs several optimizations on this SSA form using the type information.
• After that it generates a low-level representation (called Lithium), performs register allocation and generates optimized native code, which then runs.
Note that the V8 optimizing compiler performs far fewer transformation passes than common ahead-of-time compilers (e.g. gcc, clang/llvm). We discuss the reasons for this in Section 6.

3. Approaches to speed up V8 engine
To investigate possible areas for optimization we profiled the V8 engine on an ARM platform with three different profiling tools: Perf [9], ARM Streamline [10] and Gprof [11]. Each has advantages and disadvantages relative to the others, but the results are very close: the V8 JavaScript engine itself has no 'hot' functions that need to be optimized. Different tools rank functions differently by their share of total execution time. This is clear evidence that the contribution of any individual function is very small compared to the precision of measurement. Thus optimizing individual functions cannot yield much of a performance increase.
In the following table the object identified as perf-2549.map is the code generated by the V8 engine.

Overhead  Shared Object      Symbol
65.11%    perf-2549.map      0x5aba4000
0.76%     d8                 v8::internal::Scanner::ScanIdentifierOrKeyword()
0.75%     d8                 v8::internal::IncrementalMarking::Step(int, v8::internal::IncrementalMarking::CompletionAction)
0.66%     [kernel.kallsyms]  _raw_spin_unlock_irqrestore
0.65%     libc-2.17.so       memchr
0.52%     d8                 v8::internal::Heap::DoScavenge(v8::internal::ObjectVisitor*, unsigned char*)
0.51%     libc-2.17.so       0x0004f3ac
0.50%     d8                 int v8::internal::FlexibleBodyVisitor<v8::internal::NewSpaceScavenger, v8::internal::JSObject::BodyDescriptor, int>::VisitSpecialized<20>(v8::internal::Map*, v8::internal::HeapObject*)
0.44%     d8                 void v8::internal::ScavengingVisitor<(v8::internal::MarksHandling)1, (v8::internal::LoggingAndProfiling)0>::EvacuateObject<(v8::internal::ScavengingVisitor<(v8::internal::MarksHandling)1, (v8::internal::LoggingAndProfiling)0>::ObjectContents)1, 4>(v8::internal::Map*, v8::internal::HeapObject**, v8::internal::HeapObject*, int)
0.43%     d8                 v8::internal::ScavengeWeakObjectRetainer::RetainAs(v8::internal::Object*)
0.43%     d8                 v8::internal::Scanner::Scan()
0.37%     [kernel.kallsyms]  __memzero

Fig. 2 Several top entries from detailed profile of Octane benchmark by V8 on Linux.

We decided to focus on the following approaches to achieve the project's goal:
• An optimized build of V8 itself, because total running time is shared between compilation and execution.
• Tuning of V8 runtime options, whose default values may not always be optimal.
• Implementation of additional scalar optimizations.
All of these approaches contributed to the final result.

4. Optimized build
We decided to investigate Link Time Optimization [12] and platform options tuning [13]. The latter gave a small gain (~0.5%), while the former decreased performance.
We carried out the first investigation (Link Time Optimization) on an Arndale ARM development board (Samsung Exynos 5250 CPU) running Linux with the Linaro gcc 4.7 toolchain, and the second (platform options tuning) on the same board running Android 4.4 with the Android NDK 9 Linaro toolchain.
We specified the following platform options:
• -O3 for the highest optimization level
• -mcpu=cortex-a15 for the target CPU.

Fig. 3 Effect of LTO

Fig. 4 Effect of platform options tuning

5. Runtime parameters tuning
The V8 engine has a fairly large set of parameters that guide JIT compilation and the execution of JavaScript programs. We found that their default values are not adequate in all cases; for example, disabling lazy compilation can substantially improve performance.
As noted in Section 2, V8 performs preliminary parsing of each new script source to separate out its individual functions. However, when the parameter '--no-lazy' is specified, it instead compiles all functions in the given script at once.
Enabling this mode affects different benchmarks differently. The CodeLoad test score degrades by about 40%, while at the same time the MandreelLatency test score increases by a factor of 2.5. The overall increase of about 5% was also reproduced on Galaxy Note 3 devices running Android 4.4.

Fig. 5 Effect of eager compilation on Octane benchmarks.

6. Scalar optimizations
We tried to implement several well-known scalar optimizations in V8, with varying success. In contrast to ahead-of-time compilers for classic imperative languages such as C/C++, Pascal, Ada, etc., a just-in-time compiler has to share time among analysis, optimization and execution. That is why sophisticated optimizations that require thorough analysis do not necessarily improve performance in this setting.
As noted in Section 2, the V8 engine performs optimized compilation of 'hot' regions, much as offline compilers do. At this stage PICs have already collected type
information, so we can apply well-known scalar optimization techniques to the AST and SSA representations.
The platform used for benchmarking was a Samsung Galaxy Note 3 (N9005) with a Qualcomm Snapdragon CPU. The devices ran Android 4.4.2 (KitKat). The Octane benchmark suite used in the tests was Version 9, downloaded from the corresponding repository. For development we used Android NDK r9c on Linux x86_64 Ubuntu 12.04 LTS.

6.1 Algebraic Simplification
Algebraic Simplification uses algebraic identities such as a - 0 = a to simplify expressions. This transformation was implemented in the V8 parser, at the point where it builds the AST representation for Crankshaft.
As noted above, type information has already been collected at this point, so we can safely simplify an algebraic expression provided that its operands are numeric.
Despite the large number of expressions optimized in the Octane benchmark suite, the effect on the final result was very small.

Fig. 6 Effect of Algebraic Expression Simplification

6.2 Common Subexpression Elimination
The V8 engine already implements the Global Value Numbering optimization, which eliminates redundant code. However, there are related but not identical optimizations such as Constant Propagation and Common Subexpression Elimination. For their differences see [14].
Because V8 already has a form of Constant Propagation, we decided to implement Global Common Subexpression Elimination on the SSA form.
We found that running this optimization before and after Global Value Numbering gives a net performance improvement of about 2%.

Fig. 7 Effect of Global Common Subexpression Elimination

6.3 Fast call frame for ARM
In our investigations we also found an interesting instruction sequence that speeds up call frame management on original ARMv7 CPUs.
To support the EABI [15], a compiler typically generates the following prologue and epilogue in each function.
Prologue:
func:
    stmdb sp!, {r4-r5, fp, lr}
    add fp, sp, #N
Epilogue:
    mov sp, fp
    ldmia sp!, {r4-r5, fp, lr}
    bx lr
We found, however, that the following instruction sequences, while providing the same functionality, execute faster on ARMv7 CPUs:
Prologue:
func:
    sub sp, sp, #16
    stm sp, {r4, r5, fp, lr}
    add fp, sp, #N
Epilogue:
    mov sp, fp
    ldm sp, {r4, r5, fp, lr}
    add sp, sp, #16
    bx lr
The results on a synthetic benchmark (~2 million calls, sec):

It is interesting, however, that these results were not reproduced on the Qualcomm Snapdragon 800 CPU.
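
The statement in Section 6.3 that the two sequences "provide the same functionality" can be checked on a toy model. The Python sketch below is only an illustration and is not taken from V8 or from the measurements above: it models the stack as a dictionary of word addresses, takes N = 0 so that `mov sp, fp` points `sp` back at the saved registers, and uses hypothetical helper names (`stm`, `stmdb_wb`, `ldm`, `ldmia_wb`, `run_frame`). It compares architectural state only, so it says nothing about why one sequence may be faster on a given core.

```python
"""Toy model of the two ARM call-frame sequences (illustrative assumptions only)."""

REGS = ["r4", "r5", "fp", "lr"]  # STM/LDM transfer registers in ascending order
WORD = 4


def stm(state, mem, regs):
    """stm sp, {regs}: store at sp, sp+4, ... and leave sp unchanged."""
    for i, reg in enumerate(regs):
        mem[state["sp"] + WORD * i] = state[reg]


def stmdb_wb(state, mem, regs):
    """stmdb sp!, {regs}: decrement sp by the block size, then store."""
    state["sp"] -= WORD * len(regs)
    stm(state, mem, regs)


def ldm(state, mem, regs):
    """ldm sp, {regs}: load from sp, sp+4, ... and leave sp unchanged."""
    base = state["sp"]
    for i, reg in enumerate(regs):
        state[reg] = mem[base + WORD * i]


def ldmia_wb(state, mem, regs):
    """ldmia sp!, {regs}: load, then increment sp by the block size."""
    ldm(state, mem, regs)
    state["sp"] += WORD * len(regs)


def run_frame(variant):
    """Run prologue, a register-clobbering body, and epilogue; N is assumed 0."""
    state = {"sp": 0x1000, "r4": 4, "r5": 5, "fp": 0x2000, "lr": 0x8000}
    mem = {}

    # Prologue
    if variant == "classic":
        stmdb_wb(state, mem, REGS)          # stmdb sp!, {r4-r5, fp, lr}
    else:
        state["sp"] -= 16                   # sub sp, sp, #16
        stm(state, mem, REGS)               # stm sp, {r4, r5, fp, lr}
    state["fp"] = state["sp"]               # add fp, sp, #0
    after_prologue = (dict(state), dict(mem))

    # Stand-in for the function body: clobber saved registers, use some stack
    for reg in ("r4", "r5", "lr"):
        state[reg] = 0xDEAD
    state["sp"] -= 64

    # Epilogue
    state["sp"] = state["fp"]               # mov sp, fp
    if variant == "classic":
        ldmia_wb(state, mem, REGS)          # ldmia sp!, {r4-r5, fp, lr}
    else:
        ldm(state, mem, REGS)               # ldm sp, {r4, r5, fp, lr}
        state["sp"] += 16                   # add sp, sp, #16
    return after_prologue, state


if __name__ == "__main__":
    print(run_frame("classic") == run_frame("split"))
```

In this model both variants leave identical register and memory contents after the prologue and restore the caller's `sp`, `r4`, `r5`, `fp` and `lr` after the epilogue; the only difference is whether the stack-pointer update is folded into the multiple-register transfer or issued as a separate instruction.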

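
Looking back at Sections 6.1 and 6.2, the two scalar optimizations are easy to show on a small example. The Python sketch below is a hypothetical illustration, not V8 code: the tuple-based instruction format, the `types` map standing in for type feedback, and the function names are all assumptions. It is also a local (single basic block) variant, whereas the paper describes a global pass on the SSA form. The numeric guard matters because in JavaScript `"5" - 0` evaluates to the number 5 rather than to the original string, so `a - 0` may be replaced by `a` only when `a` is known to be a number.

```python
"""Sketch of type-guarded algebraic simplification and local CSE on a toy SSA block."""
from typing import Dict, List, Tuple

# (dest, op, lhs, rhs); SSA-style, so every dest is defined exactly once.
Instr = Tuple[str, str, str, str]

COMMUTATIVE = {"add", "mul"}


def simplify(block: List[Instr], types: Dict[str, str]) -> List[Instr]:
    """Rewrite a - 0 and a * 1 into a copy of a, only if a is a known number."""
    out = []
    for dest, op, lhs, rhs in block:
        numeric = types.get(lhs) == "number"
        is_identity = (op == "sub" and rhs == "0") or (op == "mul" and rhs == "1")
        if numeric and is_identity:
            out.append((dest, "copy", lhs, ""))
        else:
            out.append((dest, op, lhs, rhs))
    return out


def eliminate_common_subexpressions(block: List[Instr]) -> List[Instr]:
    """Replace a recomputed (op, lhs, rhs) with a copy of the first result."""
    available: Dict[Tuple[str, str, str], str] = {}
    alias: Dict[str, str] = {}  # name -> earlier name holding the same value
    out = []
    for dest, op, lhs, rhs in block:
        # Look through copies so equal values share one name.
        lhs, rhs = alias.get(lhs, lhs), alias.get(rhs, rhs)
        if op == "copy":
            alias[dest] = lhs
            out.append((dest, "copy", lhs, ""))
            continue
        if op in COMMUTATIVE and rhs < lhs:
            lhs, rhs = rhs, lhs  # canonical operand order for a*b vs b*a
        key = (op, lhs, rhs)
        if key in available:
            alias[dest] = available[key]
            out.append((dest, "copy", available[key], ""))
        else:
            available[key] = dest
            out.append((dest, op, lhs, rhs))
    return out


if __name__ == "__main__":
    block = [
        ("t1", "sub", "a", "0"),    # t1 = a - 0
        ("t2", "mul", "t1", "b"),   # t2 = t1 * b
        ("t3", "mul", "b", "a"),    # t3 = b * a  (same value as t2)
        ("t4", "add", "t2", "t3"),  # t4 = t2 + t3
    ]
    types = {"a": "number", "b": "number"}
    for instr in eliminate_common_subexpressions(simplify(block, types)):
        print(instr)
```

On this block the simplification turns `t1` into a copy of `a`, which in turn exposes `b * a` as a repeat of `a * b`. This mirrors the paper's observation that the benefit of such passes depends on the type information already being available when they run.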
7. Conclusion
We have found that, even with type information available in the V8 optimizing compiler, applying traditional scalar optimizations to JavaScript gives diminishing returns.
On the other hand, the successful application of an optimized build is evidence that there is still room for optimization in JavaScript engines.

References
[1]. TIOBE Index for October 2015 (http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html).
[2]. Chrome V8, September 10, 2015 (https://developers.google.com/v8/?hl=en)
[3]. Just-in-time compilation, Wikipedia, October 17, 2015 (https://en.wikipedia.org/wiki/Just-in-time_compilation)
[4]. Hölzle U., Chambers C., Ungar D. Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches, ECOOP '91 proceedings, Springer Verlag Lecture Notes in Computer Science 512, July, 1991
[5]. Wingo A., On-stack replacement in V8, June 20, 2011 (https://wingolog.org/archives/2011/06/20/on-stack-replacement-in-v8)
[6]. Octane 2.0 (https://chromium.github.io/octane)
[7]. SunSpider 1.0.2 JavaScript Benchmark (https://www.webkit.org/perf/sunspider/sunspider.html)
[8]. Kraken JavaScript Benchmark (version 1.1) (http://krakenbenchmark.mozilla.org)
[9]. perf (Linux), Wikipedia (https://en.wikipedia.org/wiki/Perf_%28Linux%29)
[10]. Streamline Performance Analyzer (http://ds.arm.com/ds-5/optimize)
[11]. Gprof, Wikipedia (http://en.wikipedia.org/wiki/Gprof)
[12]. Interprocedural optimization, Wikipedia (https://en.wikipedia.org/wiki/Interprocedural_optimization)
[13]. GCC ARM options (https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/ARM-Options.html#ARM-Options)
[14]. Muchnick S., Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, San Francisco, USA, 1997, 856p
[15]. Application Binary Interface for the ARM Architecture v2.09 (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0036b/index.html)
