Nomad Cross-Platform Computational Offloading and Migration in Femtoclouds Using WebAssembly
Abstract—Latency and privacy concerns, together with the spread of IoT devices, have recently sparked interest in edge computing and computational offloading to the edge and be-

paper, we use the term edge to denote this paradigm where commodity hardware on the edge is used to opportunistically perform computations on behalf of each other.
2021 IEEE International Conference on Cloud Engineering (IC2E) | 978-1-6654-4970-0/21/$31.00 ©2021 IEEE | DOI: 10.1109/IC2E52221.2021.00032
of writing one from scratch, we modify an existing interpreter, wasm3 [38], the fastest WebAssembly interpreter available, to support migration. Our system is purely user-space and does not modify WebAssembly itself in any way.

We introduce several architectural changes and new components to wasm3 to enable migration (Sec. III). First, we decouple the program state from the host to make it consistent across systems. We replace native calls implementing WebAssembly calls with a virtual call stack that holds the call frames of WebAssembly functions. Similarly, break and continue operations in loops tail-call their target instructions instead of returning to the loop head. This yields a continuation-passing style (CPS) execution of the interpreter which, with the help of tail-call optimization, keeps the execution fast and the native stack flat. Second, we add the migrator, which is responsible for pausing the program, saving its state on the migration source, and restoring the paused program on the migration destination. When the migrator is triggered, it stops the interpreter and starts serializing the program state. The migrator serializes absolute pointers in the virtual call stack to relative pointers which can be migrated. A state writer writes the serialized state to a file which can be migrated and given to a state reader. The state reader reloads the paused program, translates relative addresses to the corresponding absolute addresses in the target, and continues from where it was stopped.

We implement a fully working cross-platform prototype of Nomad and we systematically evaluate its performance on five platforms of different operating systems and hardware architectures (Sec. IV). We use standard benchmarks and a set of functions each representative of a kind of computational load. Our tests show our system is on par with wasm3, the fastest available WebAssembly interpreter that does not support migration, exceeding it on some platforms with minimal overhead on others. Our system also migrates programs with a mean increase in CPU and memory usage post-migration that is less than 5%, and a startup time, serialization time, and restoration time under 2ms in most tests. The system can also migrate repeatedly with no special overhead for repetition. Finally, we successfully carry out cross-platform migration between Linux and Windows and also between x86-64 and aarch64. We show that execution speed cross-platform is comparable to running on a single platform.

We make the following contributions:
• Identify portability and migratability as core requirements for computational offloading in edge femtoclouds.
• Develop an interpreter architecture that decouples the runtime state from host architecture.
• Build Nomad, the first cross-platform live-migration system for WebAssembly.
• Conduct a thorough evaluation of migration cost and performance for Nomad.

II. BACKGROUND AND WEBASSEMBLY

Offloading to edge femtoclouds requires portability and migratability to account for the heterogeneity and relative instability of femtocloud infrastructure. To motivate the need for Nomad, we show next that common platforms used today provide either portability or migratability, but not both sufficiently. After that, we provide an overview of WebAssembly, its code format, execution model and runtime environments and why it is, with some additions, a potentially superior femtocloud platform. This will guide our description of the Nomad architecture in the next section.

A. Portability and migratability options

Portability and migratability are old problems that have been tackled historically under two main paradigms, a system-oriented approach and a language-oriented approach. System-oriented approaches use system-level mechanisms to isolate programs from the host environment, providing a level of portability. Language-based approaches use a non-compiled language and rely on the existence of the language runtime on all target platforms to run the same code. We use portability to denote the ability to run across operating systems and hardware architectures and also independence from specific programming languages. For migratability, we focus on hot-migration exclusively since cold migration, i.e., restarting a service independently on another host, is easily achievable for stateless apps. For stateful ones, it requires application-specific rewriting to implement persistence [33], and is thus not generalizable.

1) System-based approaches: Virtual machines (VMs) are relatively simple to migrate but suffer from high overhead. On the other hand, containers are more lightweight since they make direct use of native resources, which comes at the cost of sacrificing portability across incompatible systems. System-based platforms perform at near-native speed and thus are ideal for high-performance applications running on relatively stable infrastructure. In the literature, the focus has been on reducing the migration overhead in terms of time and space [39], [40], [41], [42], [40], [26].

Authorized licensed use limited to: UNIVERSIDADE FEDERAL DE GOIAS. Downloaded on April 10,2024 at 09:22:23 UTC from IEEE Xplore. Restrictions apply.

Fig. 1: Tracing a function in WebAssembly model. (a) C source, (b) wasm text format, (c) Trace.
2) Language-based approaches: Language runtimes can be developed for many platforms, which allows programs in that language to run on many potentially incompatible systems, e.g., Java and JavaScript. However, since these programs are not pre-compiled, there is a certain performance overhead in using portable language runtimes [2], [3]. Moreover, this approach requires that particular programming languages are used, which is not always possible, especially with preexisting libraries. The state of the program running on one system cannot be directly migrated to another since the representation is different on each platform.

The characteristics of the different approaches are summarized in Table I. As the table shows, system-oriented approaches are generally migratable but less portable due to their reliance on the host system. On the other hand, language-based approaches tend to be more portable as they rely on a portable language runtime, but they perform worse, tie developers to specific programming languages and are generally not migratable.

Fig. 2: WebAssembly in the browser

B. WebAssembly Overview

As web content gets increasingly richer, more and more computations are happening on the client-side, and a platform for fast code execution on the web is necessary. Numerous solutions arose over time including, e.g., Java applets and Google’s LLVM-based PNaCl [43]. Earlier solutions did not see the aspired success due to their poor integration with the web platform and the lack of combined support from all the browsers. Web-browser vendors recognized the need for a unified solution, hence, in 2017, all the major browser vendors announced the standardization of WebAssembly (wasm), a binary format for code to run on the web. Table I lists WebAssembly and shows it is, for our metrics, more portable than current language-based platforms while still being close in efficiency to system-based platforms. To better understand the components of Nomad, we start with an overview of how WebAssembly itself works.

1) The WebAssembly format: WebAssembly is a stack-based language using structured control flow and only primitive data types. WebAssembly states its key design goals as efficiency, safety, debuggability and web integration [37]. WebAssembly uses a binary format with an accompanying standard text format. It is a compilation target for high-level languages, e.g., C++ and Rust, and itself can be compiled to native code to run efficiently in-browser.

Figure 1 shows an example of tracing the execution of abstract WebAssembly for a function that computes the absolute value. This is for illustration purposes only as WebAssembly actually has a built-in absolute value instruction for floating-point values. Grayed-out boxes indicate values that have just been popped. After the caller places the arguments on the stack, it executes call, which pops the argument to the locals of the callee and transfers control to the first instruction in the callee. Next, get pushes a local and const pushes a numerical constant to the stack. ge_s checks if the first argument is greater than or equal to the second argument. if pops the condition from the stack and the if-branch is taken iff the popped value is non-zero; here the else-branch is taken. At the end, the final value that is on the stack is the return value of the function.
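The stack discipline traced in Fig. 1 can be mimicked with a short Python sketch. It is only an illustration under stated assumptions: the helper name `run_abs`, the trace format, and the body of the else-branch (computing 0 - x) are hypothetical, since the figure itself is not reproduced here.

```python
def run_abs(x):
    """Trace a toy stack machine computing |x| for a signed integer.

    The instruction names follow the text (call, get, const, ge_s, if);
    the else-branch body (0 - x) is an assumption made for illustration.
    """
    stack, local_vars, trace = [], [], []

    def record(name):
        # Snapshot the operand stack after each step.
        trace.append((name, list(stack)))

    # The caller places the argument on the stack.
    stack.append(x)
    record("push arg")
    # `call` pops the argument into the callee's locals.
    local_vars.append(stack.pop())
    record("call")
    # `get 0` pushes local 0, `const 0` pushes the constant 0.
    stack.append(local_vars[0])
    record("get 0")
    stack.append(0)
    record("const 0")
    # `ge_s` pops two operands and pushes 1 if first >= second, else 0.
    second, first = stack.pop(), stack.pop()
    stack.append(1 if first >= second else 0)
    record("ge_s")
    # `if` pops the condition; non-zero selects the if-branch.
    cond = stack.pop()
    record("if")
    if cond != 0:
        stack.append(local_vars[0])
        record("get 0")
    else:
        stack.append(0)
        record("const 0")
        stack.append(local_vars[0])
        record("get 0")
        second, first = stack.pop(), stack.pop()
        stack.append(first - second)
        record("sub")
    # The value left on the stack is the return value.
    return stack.pop(), trace


result, steps = run_abs(-5)
```

Calling `run_abs(-5)` takes the else-branch, as in the trace described above, while a non-negative argument takes the if-branch and returns the local unchanged.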
Fig. 3: Nomad Architecture

2) Running WebAssembly: WebAssembly is only a specification of a format and its formal semantics which can run in practice in multiple ways including in-browser, in a standalone runtime, and through an interpreter. In all methods, the code starts as source code that is compiled to WebAssembly which is distributed and runs using one of the methods above. Running in-browser is the originally intended way of running WebAssembly, which is illustrated in Fig. 2. WebAssembly runs inside the JavaScript engine which just-in-time compiles the WebAssembly to native code and executes it. Currently, WebAssembly requires JavaScript glue which loads the WebAssembly module, compiles it to machine code, instantiates its imports then invokes some exported function. WebAssembly on its own has no concept of a main function. Instead, all globals are initialized at instantiation, then WebAssembly acts as a library of functions that can be called from JavaScript. There are also standalone WebAssembly runtimes that can run standalone WebAssembly modules outside the browser. These are usually just-in-time compilers that run WebAssembly locally without the JavaScript glue. Finally, WebAssembly can also be run using an interpreter, which comes with the usual trade-off of interpreters: lower performance for higher portability. Non-browser solutions, including standalone runtimes and interpreters, may also support a standard extension proposal called WebAssembly System Interface (WASI), which defines a standard interface for WebAssembly to perform system calls across different platforms and thus allows for standalone WebAssembly executables.

III. NOMAD SYSTEM ARCHITECTURE

In this section, we describe our implementation of Nomad, an interpreter for WebAssembly that provides fine-grained control over execution and live-migration of code across heterogeneous OS and ISA platforms. Instead of developing one from scratch, we modify the wasm3 interpreter [38] by Massey and Shymanskyy which is, to date, the fastest interpreter for WebAssembly. We modify wasm3 to decouple the runtime from the host system to enable migration and add a migrator which outputs dumps of running programs and resumes programs from state dumps. Throughout the paper, we refer to the original interpreter as wasm3 and to the modified interpreter as Nomad. All of our changes are in the interpreter, which uses the standard WebAssembly format with the standard WASI extension and runs in user-space.

A. Nomad Architecture Overview

The architecture of Nomad is depicted in Fig. 3, and components that are new or modified from wasm3 are highlighted. The input WASM file is compiled into an internal intermediate representation (IR). Each program has a value stack, a memory (heap), a set of globals and, in Nomad, a virtual call stack. The IR also uses a set of virtual registers which are part of the program state. The program state encapsulates the runtime state of the program, and is what we transfer from source to destination alongside the source WebAssembly during migration.

Execution is carried out by the engine, a set of functions in the interpreter that each interprets a particular instruction of the IR. Some of these functions are modified to enable migration as we will see later. In particular, in wasm3, when a WebAssembly function call is executed, the state of the caller is saved on the native stack. This is modified to a virtual call stack holding the virtual registers and the return address of the caller WebAssembly function. Loops are similarly modified to avoid native function calls becoming part of the program state, thus enabling migration. Figure 4 shows an example WebAssembly function, how it is represented in wasm3 IR and what one of the engine functions looks like.

When a migration trigger is fired, control goes from the interpreter engine to the state writer in the migrator, which writes the state of the program to a state dump file. It utilizes
the pointer serializer, which translates absolute pointers to relative ones using tables of functions and bridges in the IR. We migrate both the source WebAssembly file and the state dump. On target, the state reader uses the state dump to recreate the program state, the pointer serializer translates relative pointers back to addresses matching the newly compiled IR and execution is resumed as seen in Figure 5.

Fig. 4: IR and engine functions

B. Interpreter

WebAssembly is defined in terms of a virtual stack machine which is inefficient to interpret directly. The interpreter starts at the compiler, which compiles the input WebAssembly to an internal wasm3 IR that implements some optimizations which make it more suitable for real hardware. Nomad does not modify this IR. In this IR, each opcode is encoded as a pointer to the function in the engine that interprets it. WebAssembly functions are just-in-time (JIT) compiled to the IR on the first invocation. To avoid repeated reallocations, the wasm3 compiler allocates a new page of code whenever necessary and injects a bridge that jumps from the previous page to the next page, since allocated pages are not contiguous in memory. This speeds up compilation, and thus startup, but complicates migration when a single function spans multiple pages, as we describe in the migrator section. We discuss further details of the IR throughout this section as we describe the rest of the Nomad architecture. The engine interprets the instructions of the IR, modifying the program state in the process. We modify some engine functions and the structure of the program state to make it independent of the host system, which enables migration.

1) Program State: The program state consists of the components that change during execution and these are the parts we write to the state dump file during migration. This consists of the globals, the memory (heap), the value stack, virtual registers and the virtual call stack. Some of these are implicit in the native runtime in wasm3 and we factor these out to in-memory data structures to be able to write them out to the state dump.

The virtual registers keep track of the state of the program during execution and include the program counter pc pointing to the IR, the value stack pointer sp and the call stack pointer cs. An important optimization in this IR compared to abstract WebAssembly is introducing an accumulator virtual register r0 to cache intermediate values of computations, which speeds up execution as fewer reads and writes happen to the value stack. For example, Fig. 4 shows a WebAssembly function that adds three integers and the corresponding IR in wasm3. First, op_Add_ss 0 1 reads the values at stack slots 0 and 1, which are the first two parameters, adds them, then stores the result to the accumulator r0. Next, op_Add_rs 2 adds r0 to the value in slot 2, which is the third parameter, and stores the result in r0. Then, op_SetSlot 0 copies r0 to slot 0, which is conventionally the return-value slot. Finally, op_Return exits the WebAssembly function and resumes its caller.

The virtual call stack, a new component we introduce, maintains the states of function calls. In wasm3, the call stack is implicit in the native stack, which is neither portable nor easily programmatically accessible. WebAssembly function calls are implemented through native calls to the first operation in the callee. The state of the caller stays in the native stack’s caller frame. On function return in WebAssembly, it returns in the native stack to the caller frame and continues. In Nomad, we modify this so whenever a WebAssembly function call is encountered, the virtual registers of the caller are first saved into a virtual call stack which is in the heap. This allows the call stack to be serialized at migration time. This introduces the call stack pointer cs which is not a part of wasm3. When a WebAssembly function hits a return operation, Nomad pops a frame off the virtual call stack and loads its registers into the virtual registers. A side-effect problem that arises from this implementation is that the last return that terminates the program will try to read invalid memory that is beyond the beginning of the virtual call stack. To solve this, we insert a dummy operation at the beginning of the stack when starting the program. The final return will call this operation, which performs a long jump to the part of the interpreter that called the first instruction in the WebAssembly program.
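The accumulator-based IR and the virtual call stack described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `State` class, the `PROGRAM` table, the `(function, index)` program counter and the dispatch loop are hypothetical and are not the wasm3 or Nomad data structures; in particular, a plain loop stands in for the tail-calling engine functions, and a `None` return address stands in for the dummy operation that ends the program.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    slots: list                      # value stack
    sp: int = 0                      # base slot of the current function
    r0: int = 0                      # accumulator register
    pc: tuple = ("main", 0)          # (function name, op index), a relative address
    call_stack: list = field(default_factory=list)  # virtual call stack on the heap


def op_add_ss(st, a, b):
    # r0 <- slot[a] + slot[b]
    st.r0 = st.slots[st.sp + a] + st.slots[st.sp + b]


def op_add_rs(st, a):
    # r0 <- r0 + slot[a]
    st.r0 = st.r0 + st.slots[st.sp + a]


def op_set_slot(st, a):
    # slot[a] <- r0
    st.slots[st.sp + a] = st.r0


def op_call(st, callee, base):
    # Save the caller's registers in the virtual call stack, not the native one.
    # st.pc was already advanced by the loop, so it is the return address.
    st.call_stack.append((st.pc, st.sp))
    st.sp += base
    st.pc = (callee, 0)


def op_return(st):
    # Pop a frame and restore the caller's registers.
    st.pc, st.sp = st.call_stack.pop()


PROGRAM = {
    # add3 mirrors the op sequence quoted in the text.
    "add3": [(op_add_ss, 0, 1), (op_add_rs, 2), (op_set_slot, 0), (op_return,)],
    # main calls add3 with its arguments placed in slots 1..3.
    "main": [(op_call, "add3", 1), (op_return,)],
}


def run(st):
    # Sentinel frame: the final return restores pc = None and stops the loop.
    st.call_stack.append((None, 0))
    while st.pc is not None:
        fn, idx = st.pc
        handler, *immediates = PROGRAM[fn][idx]
        st.pc = (fn, idx + 1)
        handler(st, *immediates)
    return st


final = run(State(slots=[0, 10, 20, 12]))
```

After the run, slot 1 holds the sum of the three arguments and the virtual call stack is empty. Because every frame is a plain `(pc, sp)` pair held in ordinary data rather than in native stack frames, the whole `State` object could be written out between any two operations, which is the property the virtual call stack is meant to provide.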
Globals, memory and the value stack are just arrays in the heap and they store all the data of the program. The value stack in wasm3 is different from the abstract value stack in WebAssembly. Each function has a base stored in a virtual stack pointer. The IR reads and writes to offsets from the base pointer which are embedded in the IR.

2) Engine: The instructions of the IR are interpreted by the functions in the interpreter engine. An operation in the IR is stored as a pointer to the function that interprets it. Each Engine function, after executing its operation, fetches the next operation from the IR and jumps to it. Each function calls the next function which results in continuation-passing style (CPS)
execution of the program. Each of the engine functions receives the wasm3 virtual registers as parameters. As most modern hardware passes the first few function arguments in registers, the wasm3 registers are actually mapped to hardware registers, resulting in very efficient interpretation. Figure 4 shows an example of an engine function that adds two stack slots and stores the result in the accumulator register r0.

Some wasm3 engine functions are modified to accommodate migration, such as the function call and function return operations we mentioned earlier. Another example of a change in Nomad is loops. Loops in wasm3 use the native stack. The loop head calls the first loop body operation and the break and continue operations return in the native stack to the loop head, which decides whether it should rerun the loop. In Nomad, the loop does not introduce a new native stack frame. Instead, continue calls the loop head and break calls the operation after the loop. This way, if we migrate in the middle of the loop, resuming does not rely on the native stack frame of the loop head below the current operation frame.

As each Engine function calls the next function, it is essential that the Engine uses tail-call optimization (TCO) to ensure the stack does not overflow. This is platform-dependent and the Engine might need customization for different platforms. For example, we modify the order of the parameters of the Engine functions on Windows to ensure TCO, as well as some operations inside the engine functions which prevent clang on Windows from performing TCO.

C. Migrator

The migrator is the last component of Nomad in Fig. 3 and it allows pausing and restoring programs. The state writer produces the state dump from the running program and the state reader resumes the program from a state dump. Both make use of the pointer serializer to translate pointers between absolute and relative representations as the program state layout in memory can change between systems.

The migration flow is depicted in Figure 5. The migration trigger, when fired, sets a flag that is checked by Engine functions in the interpreter. If the flag is set, the function transfers control to the state writer instead of calling the next operation. The state writer then serializes the components of the program state into a state dump file. The IR is not part of the serialized dump because the IR does not change as the program executes. Instead, the source WebAssembly itself is migrated to the target alongside the state dump. On the target, the state reader reconstructs the state from the dump file and the WebAssembly file is recompiled to IR.

Fig. 5: Migration in Nomad

A crucial decision to make is where to check the migration flag in the Engine functions. We can check after every operation, which will instantly respond to the migration trigger, but slow down the program. We can check just at function boundaries, which is what many migration methods do. We decided to check at function boundaries and loop heads. A program cannot execute long without going into a loop or calling a function, but this still allows many operations to run uninterrupted. Nevertheless, this is a choice that can be reconfigured, and we evaluate some of the alternatives in the next section (Sec. IV).

Algorithm 1: Pointer Serializer
funcs = Sorted block starts and the function the block belongs to;
bridges = Sorted array of bridge positions and their targets;
Function Abs2RelPc(pc):
    fnId ← greatest function in funcs ≤ pc;
    base ← Start of function fnId’s code;
    offset ← 0;
    while true do
        nextBridge ← first bridge with position ≥ base;
        if base > pc or nextBridge.pos < pc then
            offset ← offset + (nextBridge.pos − base);
            base ← nextBridge.target;
        else
            offset ← offset + (pc − base);
            break;
        end
    end
end
Function Rel2AbsPc(fnId, offset):
    base ← Start of function fnId’s code;
    while offset > 0 do
        nextBridge ← first bridge with position ≥ base;
        if nextBridge.pos < base + offset then
            offset ← offset − (nextBridge.pos − base);
            base ← nextBridge.target;
        else
            pc ← base + offset;
            break;
        end
    end
end

1) Pointer Serializer: At runtime, the virtual call stack stores actual pointers to the return addresses. However, since the IR is compiled separately on each machine, it can have different addresses on different machines, so absolute pointers cannot be migrated. The pointer serializer converts between absolute and relative representation of pointers, which is detailed in Algorithm 1. As mentioned earlier, functions are compiled to IR on the fly to potentially separate pages. We call each contiguous block of the function a snippet. The code of a function consists of a list of snippets where each snippet ends with a bridge to the next snippet. As functions can consist of a list of snippets not contiguous in memory, the offset of a pointer from the base of its function is not trivial. To solve this, the pointer serializer inserts hooks into the compiler to create snippets, a map from snippets to functions sorted by snippet address; and bridges, a sorted array of bridges’ positions and their target snippet address. Abs2RelPc searches snippets to find the id and the base of the function to which pc belongs. Next, it traverses the bridges from base to reach pc computing the offset along the
way. On the other hand, Rel2AbsPc takes the id and the offset, starts from the base of the function with the given id, then traverses bridges until it reaches the required offset.

2) State Reader and Writer: At migration time, the state writer takes the program state and serializes it into a portable state dump. Conversely, the state reader takes this dump and reconstructs the program state. The process is not a simple memory dump, however, and involves a few transformations as detailed in Algorithm 2. We use the syntax output{name} << expression to denote that expression is written into the output file and that we name that expression name to refer to it later. First, the writer converts pc to relative, then writes all the virtual registers to file, subtracting sp and cs from their base (lines 3-6). Next, global variables, the memory and the value stack are written directly since they are just values (line 7). Finally, the writer iterates over frames of the call stack, converting pc and bp (stack base of the previous frame) of each frame to relative pointers and outputting them (lines 8-10).

The state reader reverses the process to reconstruct the program state. Since wasm3 JIT-compiles WebAssembly to IR on the first invocation, the state reader in Nomad precompiles the functions in the call stack and the function corresponding to pc, so that it can reconstruct the absolute addresses for them. In addition, compiling each function changes the bridges table, which has to be sorted in the pointer serializer. To avoid sorting repeatedly, the reader does two passes on the call stack, first reading the relative pointers and compiling the functions (lines 20-24), then sorting the bridges and passing again to convert all pointers (lines 27-29). Finally, the reader jumps to pc to resume the execution of the program.

Algorithm 2: State Writer and Reader
 1 Procedure StateWriter:
 2     call pointer serializer to sort funcs and bridges;
 3     output {fnId, offset} << Abs2RelPc(pc);
 4     output {sp-relative} << sp − valStackBase;
 5     output {cs-relative} << cs − callStackBase;
 6     output << r0;
 7     output << {Globals, Memory, Value Stack}
 8     foreach frame f in call stack do
 9         output {f.fnId, f.offset} << Abs2RelPc(f.return-address);
10         output {f.bp-relative} << f.bp − valStackBase;
11     end
12 end
13 Procedure StateReader:
14     {fnId, offset} ← input;
15     sp ← input {sp-relative} + valStackBase;
16     cs ← input {cs-relative} + callStackBase;
17     r0 ← input;
18     {Globals, Memory, Value Stack} ← input;
19     compile(function fnId);
20     foreach frame f in call stack do
21         {f.fnId, f.offset} ← input;
22         f.bp ← input {f.bp-relative} + valStackBase;
23         compile(function f.fnId);
24     end
25     call pointer serializer to sort bridges;
26     pc ← Rel2AbsPc(fnId, offset);
27     foreach frame f in call stack do
28         f.return-address ← Rel2AbsPc(f.fnId, f.offset);
29     end
30     resume execution from *pc;
31 end

IV. EVALUATION

In this section, we present our evaluation of Nomad, both as compared to the original interpreter wasm3 and in terms of the performance of migrating code. Our solution is the first implementation of truly live migration of WebAssembly. Hence, we are unable to benchmark our solution directly against other solutions. The range of hardware we used exceeds what can be used with current migration solutions such as containers. Having said that, we discuss other cross-platform migration solutions and how they compare to Nomad in the next section.

A. Experimental setup

Since Nomad is a platform for offloading to edge femtoclouds, we must consider the myriad conditions that arise in an edge setting. We run sample WebAssembly programs and benchmarks and migrate them with Nomad to evaluate our implementation. We use a variety of devices to represent the range of hardware and software typically existing in an edge environment. The devices in our testbed are listed in Table II.

For the set of programs to test, we use a collection that represents different kinds of computational loads. We use the programs listed in Table III to benchmark Nomad. CoreMark [44] is a standard CPU benchmark widely used in industry to benchmark CPU performance. Our system is an architecture that interprets WebAssembly instructions, so it is appropriate to use this benchmark. Loops and function calls are the two components changed in the engine and function calls also test our new component: the virtual call stack. Hence, for the rest of the tests, we use our own programs which represent a combination of loop-heavy and function call-heavy tasks. n is the input size, e.g., insertion sort was run with an array of 30000 integers. The Fibonacci test uses these two formulae:

$$F_{2n} = \sum_{i=0}^{n-1} F_{2i+1}, \qquad F_{2n+1} = 1 + \sum_{i=0}^{n} F_{2i},$$

which was actually produced through compiler optimizations from C code that implements the common formula

$$F_n = F_{n-1} + F_{n-2}.$$

The CoreMark test was not used in experiments where actual migration happens because it uses the system timer to measure

TABLE II: Experimental testbed
Device       OS (kernel)              Processor
laptop       Fedora 32 (5.10.8)       Core i5-8250u
laptop       Windows 10               Core i5-8250u
Desktop      Ubuntu 18.04 (4.15.0)    Core i7-3770
Phone        Android 9 (4.9.59)       Snapdragon 845
Pi B V1.1    Raspbian (5.4.79)        BCM2836 (32-bit)

TABLE III: Benchmark Programs
Program                 n        notes
CoreMark (cm)           -        runs for 10s
Matrix Multiply (mm)    1000     (n, n) × (n, n)
Fibonacci (fib)         43
Insertion Sort (is)     30000    linear search
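The two summation identities quoted for the Fibonacci benchmark can be checked numerically against the common recurrence. The sketch below is only a sanity check under the usual convention F_0 = 0, F_1 = 1 (an assumption, since the text does not state the base cases); the helper names are hypothetical and this is not the benchmark code itself.

```python
def fib_table(limit):
    """Return [F_0, ..., F_limit] using F_n = F_{n-1} + F_{n-2}."""
    fib = [0, 1]
    while len(fib) <= limit:
        fib.append(fib[-1] + fib[-2])
    return fib


def fib_even(n, fib):
    """F_{2n} as the sum of the odd-indexed terms F_1, F_3, ..., F_{2n-1}."""
    return sum(fib[2 * i + 1] for i in range(n))


def fib_odd(n, fib):
    """F_{2n+1} as 1 plus the sum of the even-indexed terms F_0, ..., F_{2n}."""
    return 1 + sum(fib[2 * i] for i in range(n + 1))


FIB = fib_table(43)

# Compare both identities with the table for every index up to n = 43.
identities_hold = all(
    fib_even(n, FIB) == FIB[2 * n] and fib_odd(n, FIB) == FIB[2 * n + 1]
    for n in range(0, 22)
)
```

Both identities agree with the recurrence for all indices up to 43, the input size listed for the fib benchmark in Table III.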
(a) Startup (b) Serialization (c) Restoration
Fig. 8: Additional time costs
state transformations. This experiment tests the extent of this problem in Nomad. The results are shown in Figure 8. Figure 8a shows that the startup time of Nomad, which comprises initializing the interpreter runtime and compiling the input WebAssembly into the internal IR, is minimal: less than 1 ms for the laptop/desktop and barely 2 ms for the Pi. It also shows that the startup time is comparable for all programs thanks to JIT-compilation, which delays compilation of functions until they are called. Figures 8b and 8c show the serialization and restoration delays for our system, plotted on a semi-log scale to fit the wide range of values. These delays are mostly less than 2 ms, with the exception of matrix multiplication: that program uses a lot of memory (~12 MB of matrices), and writing this out to the state dump file bottlenecks on I/O.

4) Cross-migration: So far, we have only emulated migration by pausing the program and then resuming it from a state dump on the same device. In the next experiment, we carry out migration across different operating systems and hardware architectures. For each test, we run the program on the source device and pause it around halfway through. Then we transfer the state dump file to the destination device and resume the program to completion. The plots in Fig. 9 show, for each test, the time it takes to run on source (blue) and destination (orange) without interruption, compared to the time it takes to run half on source, migrate, and complete on destination (red and purple stacked). The halfway mark is only approximated from the full time on source. The key finding from this figure is that cross-migration works in Nomad and the total execution delay is close to the average of the execution delays on source and target. Thus, cross-migration does not introduce any considerable overhead compared to running the program uninterrupted. Finally, Nomad migration is compatible even with different OSes and ISAs on source and destination, which is beyond what containers, for example, can achieve. This is consistent with our architecture, as the program state is completely independent of the host architecture; hence migration to a different platform is equivalent to migrating to the same platform.

5) Additional costs: We ran other tests that are not plotted here, either for brevity or because they are not precisely measurable. First, we measured reaction time: the time from the migration trigger going off to Nomad transferring control to the state writer. We consider two scenarios: checking the migration flag on function boundaries only, and checking on both function boundaries and loop heads. In the first case, the reaction time is instant for fib as it is recursive, but is essentially ∞ for matmul and is, since these are nested loops with no function calls. Checking the migration flag at both function boundaries and loop heads (the default we use in all other tests) leads to instant reaction time, less than 1 μs, which is beyond the precision of the timers available to us. Neither case can be plotted in a useful way.

Finally, in a deployed migration system, the network also introduces a delay when we migrate the state file from source to target. We do not measure this delay because it depends on the network characteristics, not on our system. Any migration system will incur such a network delay, depending on the network and on the migration and scheduling strategy, which are outside the scope of this paper.

V. RELATED WORK

A. Cross-platform migration

Several attempts have been made to achieve migration across different platforms. Owing to JavaScript's de facto status as the programming language of the web, prior work attempted to harness its ubiquity to migrate programs. Kwon and Moon [29] modified a JavaScript engine to capture the state of the program at migration time. In ThingsMigrate [30], Gascon-Samson et al. instrument the input JavaScript program to keep track of the state. This state object is migrated and used to recreate the paused state at the destination. Jeong et al. [31] take this a step further and allow the JavaScript to call WebAssembly functions. The WebAssembly functions, however, are stateless, and at migration time only the JavaScript state is migrated and the WebAssembly functions are recompiled. All these approaches are useful for app migration, but not for live migration of computational loads: they rely on the JavaScript event engine to process migration, so no actively running computation can be migrated.

Recently, Docker added support for multi-arch images with buildx [45]. This bundles the images for several architectures into a single image and the Docker engine selects the correct
Fig. 9 panels: (a) laptop-linux to android, (b) laptop-linux to laptop-windows, (c) laptop-linux to Pi.
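The observation from the cross-migration experiment, that the total delay is close to the average of the source and target delays, follows from simple arithmetic when the program is paused about halfway. The sketch below is only an illustration of that reasoning, not part of Nomad: the function name, its parameters, and the example numbers are hypothetical, and it assumes that progress is uniform in time on each device.

```python
def expected_migrated_runtime(t_source, t_dest, pause_fraction=0.5, overhead=0.0):
    """Estimate total runtime of a run that starts on the source and finishes on the destination.

    Hypothetical helper (not from the paper). Assumptions:
      - t_source / t_dest are the uninterrupted runtimes on each device,
      - work progresses uniformly in time, so a fraction of the work
        costs the same fraction of the uninterrupted runtime,
      - overhead lumps together serialization, transfer, and restoration.
    """
    if not 0.0 <= pause_fraction <= 1.0:
        raise ValueError("pause_fraction must be in [0, 1]")
    time_on_source = pause_fraction * t_source
    time_on_dest = (1.0 - pause_fraction) * t_dest
    return time_on_source + time_on_dest + overhead


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    # Pausing halfway with negligible overhead gives the mean of the two runtimes.
    print(expected_migrated_runtime(10.0, 30.0))
```

With a pause at the halfway mark and negligible overhead, the estimate reduces to the mean of the two uninterrupted runtimes, which matches the trend the text reports for Fig. 9.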
REFERENCES

[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and linux containers,” in 2015 IEEE ISPASS, 03 2015, pp. 171–172.
[2] A. Jangda, B. Powers, E. D. Berger, and A. Guha, “Not so fast: Analyzing the performance of webassembly vs. native code,” in USENIX ATC’19, 07 2019, pp. 107–120.
[3] L. Gherardi, D. Brugali, and D. Comotti, “A java vs. c++ performance evaluation: A 3d modeling benchmark,” vol. 7628, 11 2012.
[4] H. Abdelnasser, K. Harras, and M. Youssef, “A ubiquitous wifi-based fine-grained gesture recognition system,” IEEE Transactions on Mobile Computing, vol. 18, no. 11, pp. 2474–2487, 2018.
[5] H. Abdelnasser, K. A. Harras, and M. Youssef, “Magstroke: A magnetic based virtual keyboard for off-the-shelf smart devices,” in IEEE SECON, 2020, pp. 1–9.
[6] A. Saeed, A. Abdelkader, M. Khan, A. Neishaboori, K. A. Harras, and A. Mohamed, “Argus: realistic target coverage by drones,” in ACM/IEEE IPSN, 2017.
[7] A. Saeed, M. Ammar, K. A. Harras, and E. Zegura, “Vision: The case for symbiosis in the internet of things,” in Proceedings of the 6th International Workshop on Mobile Cloud Computing and Services. ACM, 2015, pp. 23–27.
[8] A. Emam, A. A. Abdellatif, A. Mohamed, and K. A. Harras, “Edgehealth: An energy-efficient edge-based remote mhealth monitoring system,” in IEEE WCNC, 2019, pp. 1–7.
[9] M. F. Al-Sa’D, M. Tlili, A. A. Abdellatif, A. Mohamed, T. Elfouly, K. Harras, M. D. O’Connor et al., “A deep learning approach for vital signs compression and energy efficient delivery in mhealth systems,” IEEE Access, vol. 6, pp. 33 727–33 739, 2018.
[10] M. Satyanarayanan, “The emergence of edge computing,” Computer, vol. 50, no. 1, pp. 30–39, 01 2017.
[11] N. Abbas, Y. Zhang, A. Taherkordi, and T. Skeie, “Mobile edge computing: A survey,” IEEE Internet of Things Journal, vol. 5, no. 1, pp. 450–465, 02 2018.
[12] A. Elgazar and K. Harras, “Teddybear: Enabling efficient seamless container migration in user-owned edge platforms,” in IEEE CloudCom, 2019, pp. 70–77.
[13] A. E. Elgazar, M. Aazam, and K. A. Harras, “SMC: Smart media compression for edge storage offloading,” in 2nd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 19), 2019.
[14] K. Habak, M. Ammar, K. A. Harras, and E. Zegura, “Femto clouds: Leveraging mobile devices to provide cloud service at the edge,” in IEEE CLOUD, 2015, pp. 9–16.
[15] H. Gedawy, K. A. Harras, K. Habak, and M. Hamdi, “Femtoclouds beyond the edge: The overlooked data centers,” IEEE Internet of Things Magazine, vol. 3, no. 1, pp. 44–49, 2020.
[16] H. K. Gedawy, K. Habak, K. Harras, and M. Hamdi, “Ramos: A resource-aware multi-objective system for edge computing,” IEEE Transactions on Mobile Computing, 2020.
[17] C. Shi, V. Lakafosis, M. H. Ammar, and E. W. Zegura, “Serendipity: Enabling remote computing among intermittently connected mobile devices,” ser. MobiHoc ’12. New York, NY, USA: Association for Computing Machinery, 2012, pp. 145–154.
[18] K. Habak, E. W. Zegura, M. Ammar, and K. A. Harras, “Workload management for dynamic mobile device clusters in edge femtoclouds,” in Proceedings of ACM/IEEE SEC ’17. New York, NY, USA: Association for Computing Machinery, 2017.
[19] A. Essameldin and K. A. Harras, “The hive: An edge-based middleware solution for resource sharing in the internet of things,” in MobiCom Smart Objects Workshop, 2017, pp. 13–18.
[20] A. Essameldin, M. Nurulhoque, and K. A. Harras, “More than the sum of its things: Resource sharing across iots at the edge,” in ACM/IEEE SEC, 2020.
[21] A. Elgazar, M. Aazam, and K. Harras, “Edgestore: Leveraging edge devices for mobile storage offloading,” in IEEE CloudCom, 2018, pp. 56–61.
[22] L. A. Barroso and U. Hölzle, “The case for energy-proportional computing,” IEEE Computer, vol. 40, 2007.
[23] A. Mtibaa, K. A. Harras, K. Habak, M. Ammar, and E. W. Zegura, “Towards mobile opportunistic computing,” in 2015 IEEE 8th International Conference on Cloud Computing, 06 2015, pp. 1111–1114.
[24] B. Stella and K. J. Bwalya, Fog Computing in a Developing World Context: Jumping on the Bandwagon. Cham: Springer International Publishing, 2018, pp. 63–80.
[25] L. Silver, A. Smith, C. Johnson, J. Jiang, M. Anderson, and L. Rainie, “Mobile Connectivity in Emerging Economies,” 03 2019. [Online]. Available: https://www.pewresearch.org/internet/2019/03/07/mobile-connectivity-in-emerging-economies/
[26] A. E. Elgazar and K. A. Harras, “Enabling seamless container migration in edge platforms,” in CHANTS’19, 2019, pp. 1–6.
[27] M. Habib ur Rehman, S. L. Chee, T. Y. Wah, A. Iqbal, and P. P. Jayaraman, “Opportunistic computation offloading in mobile edge cloud computing environments,” in IEEE MDM ’16, vol. 1, 2016, pp. 208–213.
[28] W. Li, X. You, Y. Jiang, J. Yang, and L. Hu, “Opportunistic computing offloading in edge clouds,” Journal of Parallel and Distributed Computing, vol. 123, pp. 69–76, 2019.
[29] J.-w. Kwon and S.-M. Moon, “Web application migration with closure reconstruction,” ser. WWW ’17. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, 2017, pp. 133–142.
[30] J. Gascon-Samson, K. Jung, S. Goyal, A. Rezaiean-Asel, and K. Pattabiraman, “ThingsMigrate: Platform-Independent Migration of Stateful JavaScript IoT Applications,” in ECOOP’18, 2018, pp. 18:1–18:33.
[31] H.-J. Jeong, C. H. Shin, K. Y. Shin, H.-J. Lee, and S.-M. Moon, “Seamless offloading of web app computations from mobile device to edge clouds via html5 web worker migration,” in SoCC ’19, 2019, pp. 38–49.
[32] P. Olivier, A. K. M. F. Mehrab, S. Lankes, M. L. Karaoui, R. Lyerly, and B. Ravindran, “Hexo: Offloading hpc compute-intensive workloads on low-cost, low-power embedded systems,” in Proceedings of HPDC ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 85–96.
[33] A. Barbalace, M. L. Karaoui, W. Wang, T. Xing, P. Olivier, and B. Ravindran, “Edge computing: The case for heterogeneous-isa container migration,” ser. VEE ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 73–87.
[34] A. Hall and U. Ramachandran, “An execution model for serverless functions at the edge,” in Proceedings of IoTDI ’19, 2019, pp. 225–236.
[35] E. Wen and G. Weber, “Wasmachine: Bring the edge up to speed with a webassembly os,” in 2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 2020, pp. 353–360.
[36] P. Hickey, “How fastly and the developer community are investing in the webassembly ecosystem.” [Online]. Available: www.fastly.com/blog/how-fastly-and-developer-community-invest-in-webassembly-ecosystem
[37] A. Haas et al., “Bringing the web up to speed with webassembly,” in Proceedings of PLDI ’17, 2017, pp. 185–200.
[38] S. Massey and V. Shymanskyy, “wasm3.” [Online]. Available: https://github.com/wasm3/wasm3
[39] K. Ha, Y. Abe, T. Eiszler, Z. Chen, W. Hu, B. Amos, R. Upadhyaya, P. Pillai, and M. Satyanarayanan, “You can teach elephants to dance: Agile vm handoff for edge computing,” in Second ACM/IEEE SEC, 2017, pp. 12:1–12:14.
[40] L. Ma, S. Yi, and Q. Li, “Efficient service handoff across edge servers via docker container migration,” in Second ACM/IEEE SEC, 2017, pp. 11:1–11:13.
[41] A. Machen, S. Wang, K. K. Leung, B. J. Ko, and T. Salonidis, “Live service migration in mobile edge clouds,” IEEE Wireless Communications, vol. 25, no. 1, pp. 140–147, 02 2018.
[42] R. A. Addad, D. L. Cadette Dutra, M. Bagaa, T. Taleb, and H. Flinck, “Towards a fast service migration in 5g,” in 2018 IEEE CSCN, 2018, pp. 1–6.
[43] A. Donovan, R. Muth, B. Chen, and D. Sehr, “Pnacl: Portable native client executables,” Google White Paper, 2010.
[44] S. Gal-On and M. Levy, “Exploring coremark™ – a benchmark maximizing simplicity and efficacy,” EEMBC, Tech. Rep. [Online]. Available: https://www.eembc.org/techlit/articles/coremark-whitepaper.pdf
[45] A. Parco, 04 2019. [Online]. Available: https://www.docker.com/blog/multi-arch-images/
[46] T. Schürmann. [Online]. Available: https://www.admin-magazine.com/Archive/2014/22/Save-and-Restore-Linux-Processes-with-CRIU
[47] A. Reber and P. Väterlein, “Checkpoint/restore in user-space with open mpi,” in Proceedings of SInCom ’14, 2014, pp. 50–54.
[48] P. Mendki, “Evaluating webassembly enabled serverless approach for edge computing,” in 2020 IEEE Cloud Summit, 2020, pp. 161–166.
[49] J. Napieralla, “Considering webassembly containers for edge computing on hardware-constrained iot devices,” Master’s thesis, Faculty of Computing, 2020.