Prebaking Functions to Warm the Serverless Cold Start
All content following this page was uploaded by Thiago Emmanuel Pereira da Cunha Silva on 17 December 2020.
Middleware ’20, December 7–11, 2020, Delft, Netherlands Paulo Silva, Daniel Fireman, and Thiago Emmanuel Pereira
related to the long and unpredictable delays observed when the platform needs to start new function replicas. The so-called cold-start happens not only when a new version of the function runs for the first time but also whenever the FaaS platform policy decides to scale the function up to address a demand growth [9]. This start-up delay includes: i) the time spent provisioning and starting the resources to run the functions (typically, VMs or containers); ii) the initialisation of the function runtime environment (e.g., the JVM, the Chrome V8 engine or the Python interpreter); and iii) the execution of application-specific bootstrap, which includes loading and compiling libraries.

Evaluating and improving the efficiency of serverless infrastructure is an active area of research, in particular regarding resource provisioning and its impact on cold-starts. A common approach is to avoid delays by being conservative when provisioning functions [14]. On the one hand, by maintaining an idle pool of function instances, the platform addresses surges in demand with no performance penalty. On the other hand, as the platform provider does not charge for idle function instances, this strategy increases the platform's operational cost. Other approaches to the cold-start performance problem include the usage of specialized sandboxing mechanisms like unikernels [2, 8], lightweight containers [19, 25] and microVMs [1].

In this work, we focus on reducing the impact of the runtime environment setup and loading. The prebaking technique replaces the standard fork-exec procedure with a mechanism that restores snapshots of previously created function processes. To evaluate the prebaking technique, we developed a prototype using the CRIU checkpoint/restore tool¹ available for the Linux kernel and analyzed this prototype experimentally.

¹ https://criu.org/

Our experiments evaluate the start-up delays of real and synthetic functions. We compared the cold-start when using the prebaking mechanism and the usual option based on creating new processes each time a function is started. The evaluated functions include: i) a NOOP function that does nothing, ii) an Image Resizer, and iii) a renderer of Markdown files.

The results indicate that the checkpoint/restore technique is effective: in the least favorable case, the prebaking technique decreases the start-up delay by 40% for the elementary function that does nothing other than returning an ack for the request. We also observe that the speed-up achieved by the prototype increases as a function grows more complex and is based on the increased amount of code (e.g., number of loaded classes). For a function that renders a file in the markdown format to HTML, the start-up time is reduced from 100 ms to 53 ms. For the Image Resizer, the improvement is even more significant: the start-up delay is decreased from 310 ms to 87 ms, i.e., a speed-up of 71%.

The results attained in these experiments indicate that the runtime initialization (which in Java includes lazy code compilation) is key to the cold start delay. To understand the relationship between the code compilation and the cold start delay, we extend our experiments to consider synthetically generated functions, which vary in the code size. These results helped us to discover that it is critical to decide at which point of the function execution lifetime the snapshot should be generated. The speed-ups reported earlier were achieved using checkpoints generated just after the function was ready to process requests. The results improve further when one creates the checkpoints after the functions have received at least one request, which forces the Java runtime to compile and optimize the code. In this case, the speed-ups can be even higher: the ratio between the start-up time using the standard function creation mechanism and the prebaking technique is increased from 127.45% to 403.96% for a small, synthetic function; for a bigger, synthetic function, this ratio increases from 121.07% to 1932.49%.

In summary, this paper has the following contributions:
• Dives into the function start-up to investigate causes of delay, including within the runtime environment;
• Proposes the usage of checkpoint/restore in the context of FaaS, the so-called prebaking technique;
• Creates a prototype of the prebaking technique using CRIU, a checkpoint mechanism available for the Linux kernel;
• Evaluates the prebaking prototype, comparing it with the state-of-the-practice.

The remainder of the paper is organized as follows. In Section 2, we overview the design of FaaS platforms and its relation to the cold-start issue. In Section 3, we describe the design of our cloning technique as well as the prototype implementation using the CRIU tool. In Section 4, we describe our experimental design and the results attained in the evaluation of our technique. This evaluation compares the proposed technique against the state-of-the-practice of function start-up. In Section 5, we show the feasibility of our technique by reporting how we managed to integrate our technique with an existing serverless platform. In Section 6, we overview the literature on function-as-a-service performance improvements, in particular those related to the function start-up problem. Finally, in Section 7, we discuss the results we obtained and possible limitations and improvements.

2 Background
In this Section, we overview the typical design of FaaS platforms. In addition to providing the background to this research, this overview highlights the issue of cold-start of such platforms.

As introduced earlier, the FaaS model has two key aspects: 1) payment is based only on time and resources used during the execution of functions; and, 2) users are liberated
from operation and management of computing resources. Although all the major FaaS providers share the above basic principles, the design of their platforms is very diverse; maybe a sign of a still-nascent field. Nevertheless, there are already some common, emerging, architectural patterns, as identified by the SPEC Research Group on Cloud [7]. We based our background overview on the SPEC-RG reference architecture.

The SPEC-RG reference architecture is organized in three layers: the Resource Orchestration, the Function Management, and Workflow Management layers. We focus the remainder of this discussion on the first two layers since they are more related to the cold-start issue considered in this work.

The Resource Orchestration layer is responsible for the management of computing resources, e.g. containers and VMs, employed to support the executions of functions. In its turn, the Function Management layer, built upon the previous one, is responsible for deploying the function replicas, in addition to executing the functions and to autoscaling the function replicas, when needed.

Figure 1 illustrates the interaction between some of the components of the reference architecture to handle the execution of a function when there is no function replica already deployed (the cold-start case). The Figure shows, for the Resource Orchestration layer, the Resource Manager component, while for the Function Management layer, it shows the Function Router, Function Deployer, Function Registry, and Function Replica components.

The Function Router dispatches new requests or events to the correct function replicas (or queues the requests and events while the replicas are still not available to process them). The Function Registry is a repository for the metadata and binaries of the functions available in the platform. The Function Builder transforms the function representations, kept by the Function Registry, into a deployable form (this process might include compiling, handling dependencies, and other building activities). The Function Deployer drives the actual deploy mechanisms, implemented by the Resource Orchestration layer, to deploy new function replicas into computing resources. The Function Deployer component is responsible for deciding how many function replicas should be deployed and which kind of resources should be used to deploy the functions. The Resource Manager, based on the information gathered by agents running on the infrastructure nodes, ensures that the state of the computing cluster is always in the desired state.

The function building process starts when the Function Builder component receives the function source code and transforms it into a deployable artifact. After the building phase, the deployable function is stored into the Function Registry (from where later it can be downloaded and used to create a Function Replica [7]).

The execution flow shown in Figure 1 starts when the Function Router receives a new event that would trigger the execution of a function (in this case, not available). As a result, the Function Deployer steps in to have a new Function Replica provisioned. To this end, it gathers the desired function configuration from the Function Registry. With the necessary information, the Resource Manager is commanded to deploy the Function Replica on the computing nodes. Once the Function Deployer is informed that a new function replica was created, the Function Router resumes the triggering of the original event that will lead to the execution of the function.

The cold-start delay of the above execution scenario has two components: 1) the delay to provision the execution environment (VMs and containers) for the new function replica; and 2) the delay to start up the function application.

As containerization or virtualization techniques are optimized to decrease start-up time [16, 19, 23], application start-up time will become a more evident problem. Our experiments showed that Java applications typically take more than 100 ms to start, and depending on the application initialization requirements, this time can reach 300 ms.

3 Prebaking
In this Section, we describe the design and a prototype implementation of our prebaking technique, which has the primary goal of reducing serverless cold-starts. Furthermore, the technique aims to: i) be easy to integrate with existing serverless platforms, ii) not harm the function performance after the start-up, and iii) not increase the costs of operating the serverless platform.

In the following, we detail the design of the prebaking technique (Section 3.1). Then, we describe a prototype implementation of this design using the CRIU checkpoint/restore tool (Section 3.2).

3.1 Design
The prebaking technique reduces function start-up time by restoring snapshots of previously started function runtimes. As shown in Figure 1, before a function is ready to serve requests, it executes a complex (and, in many cases, slow) series of steps. These steps include: creating a new process to host the runtime, bootstrapping the runtime (e.g., the initialization of its data structures and auxiliary services), and loading the function code. We assume that it is faster to restore a function snapshot than to re-execute all those start-up steps.

This sort of checkpoint/restart method is widely used in high-performance computing (HPC) to tolerate faults in long-running applications; when a failure happens, the application can be resumed from periodically generated snapshots, instead of restarting from scratch. However useful, checkpoint/restart may impact application performance. On the one hand, the more frequent the snapshots are generated,
the faster it is to recover the state just before the failure (because less computation must be re-executed). On the other hand, since the snapshot generation competes for computing resources, frequent snapshots could slow the application down. In the best case, when there is no fault, the snapshot generation is pure overhead.

Figure 1. Function execution sequence when a function replica is not available. Adapted from [7].

Differently from the HPC case, the prebaking technique creates function snapshots only when the user deploys a new function version. From a typical serverless platform architecture point of view, it is more appropriate for the Function Builder to trigger the function snapshot since this component is responsible for transforming the function into deployable artifacts. After building the function based on the prebaking technique, the function building process can remain the same as explained in Section 2. This has the additional advantage of not delaying the function execution, since function building executes before the function is available to be called.

The platform would restore the snapshot whenever a new function instance is created. The same snapshot can be used to restore different Function Replicas because all of them have the same state at the beginning of the execution. More importantly, the prebaking technique allows the creation of snapshots at any point of the function setup. This characteristic opens room for optimizing the process of snapshot generation to minimize the restart delay. For example, a runtime-agnostic option is to generate the snapshots after the end of the start-up procedure. This alternative eases the snapshot generation. Instead of instrumenting runtime-specific code, it is only necessary to wait for the completion of the process creation to generate the snapshot.

Another possible optimization process is to fine-tune the moment of the runtime start-up to snapshot the function. Choosing the best time requires in-depth knowledge about the runtime, such as when the start-up procedure reaches the right balance between progress and the amount of state generated. The rationale behind such optimization is that the larger the snapshot, the longer it takes to be restored.

Despite the potential benefits, harnessing fine-grained knowledge about the runtime could compromise our aim to be easy to integrate with existing serverless platforms, since any modification on the runtime (not unusual) should be integrated back. It would also make it hard to instrument the runtime and function codes to support the generation of the snapshots.

3.2 Prototype implementation
The first thing needed to perform the checkpoint is to read the memory and state of the target process. The most straightforward mechanism to perform this is to modify the program to perform its checkpoint. This solution is inadequate to our scenario. As we described in the beginning of this Section, we aim to be easy to integrate with existing serverless platforms and the solution would demand modifications to the code of any function submitted to the platform.

A fully-transparent checkpoint, that is, without acknowledgment of the target program, is doable at the kernel level. At this level, the checkpoint procedure would be able to access the address space of any process. Unfortunately, however doable, the kernel-based solution never achieved mainstream adoption. Another option is to use solutions that relax transparency to stay at the userland, which is the case of
libckpt [22]. That is still not ideal because it requires the recompilation of the target application to include the code that performs the checkpoint.

More recently, CRIU achieved a fully-transparent checkpoint at the user level. Instead of modifying the target applications at compile time, CRIU injects the checkpoint procedure into the application code while the application is running. After injection, the checkpoint code runs in the same address space of the target process and thus can read its internal state to perform the checkpoint. Once the checkpoint is finished, it removes itself from the code, and the unaware target application resumes its execution.

First, CRIU needs to freeze all the target process's threads, so that its state does not change while generating the checkpoint dump. After stopping all the threads, CRIU needs to discover what should be checkpointed for each of these threads. For example, it reads the /proc/$pid/pagemap file to find the mapped memory areas. Afterward, CRIU injects the procedure (parasite code) responsible for performing the actual dump into the target process address space using the ptrace system call. When the parasite code starts running, it communicates with the CRIU process to know what to dump, reads the content from the process address space, and sends it through a pipe to the CRIU process. Finally, CRIU uses the ptrace system call to remove the parasite code and to detach from the target process, which resumes its execution.

The restore process is more straightforward than the dump one. During the restoration, the CRIU tool process transmutes itself into the checkpointed process. The first action is to read the dump files and restore the process's state. Then, it recreates all namespaces and opened files. Finally, the checkpointed memory is remapped.

CRIU is able to run both the checkpoint and the restore mechanisms unprivileged. This is possible due to the recently added CAP_CHECKPOINT_RESTORE capability [11]. This capability relaxes some permissions to execute procedures such as selecting a specific pid when cloning a new process and accessing the memory mapped files.

4 Evaluation
In this Section, we evaluate the performance of using the prebaking technique, described previously, in comparison with the usual start-up procedure. We focused our evaluation on two questions: i) Can the prebaking technique improve the start-up time of a serverless function (Section 4.2)? ii) Does the novel start-up procedure lead to any penalty on the function performance after start-up (Section 4.3)?

4.1 Methodology
To answer the questions described above, we conducted a 2² factorial experiment. Our experiments measured the runtime start-up and the function response time of 200 subsequent – i.e., prebaking versus the usual start method, based on fork-exec system calls (henceforth, the Vanilla method). The start-up of a function involves procedures executed both by the operating system (e.g. clone, fork and exec system calls) as well as procedures executed at the user level, including the bootstrap of the runtime and the application initialization. Since the duration of the user level procedures is affected by the characteristics of the functions, we evaluated three different functions: NOOP, Markdown Render and Image Resizer².

² https://github.com/paulofelipefeitosa/serverless-handlers

All functions were written in Java and used an HTTP server to handle the requests, as usually employed in commercial FaaS providers, such as AWS Lambda, Google Cloud Functions, Azure Functions, and IBM OpenWhisk [13, 17].

The NOOP function is very straightforward. It does nothing and returns success to every incoming request. The function business logic neither has extra dependencies nor adds extra processing/memory overhead. On the other hand, the Markdown Render converts a markdown to an HTML page. We embed a markdown³ inside the body of each incoming request, and receive the HTML page as response.

³ https://github.com/PrincetonUniversity/openpiton

In its turn, the Image Resizer is more complex [2, 19]. On start-up, it loads a 1MB, 3440x1440 pixels image⁴, and for each incoming request the function scales it down to 10% of its original size. The Image Resizer function depends on three image processing packages, all from the Java Software Development Kit [21].

As we are focusing on the function start-up, the experiments were composed only of the load generator and the function runtime (i.e., JVM). That means we deliberately excluded some typical components of FaaS platforms, such as container orchestrators [17]. Without loss of generality, focusing on the runtime simplified the experimental setup and removed sources of experimental noise.

The most common concurrency model in public clouds is that each function replica handles one request at a time. If a replica is busy and a new request arrives, the platform starts another replica to do the job. On the other hand, if a replica is inactive for a certain period, the platform garbage collects the function replica to save resources [27]. So, to mimic this behavior, the load generator starts the function replica and holds the first request until the replica becomes ready. After that, the load is sent sequentially and at a constant rate.

All the experiments discussed in Sections 4.2 and 4.3 were performed in a quad-core Intel(R) Core(TM) i5-3470S 2.90GHz VM, with 8GB RAM running Ubuntu 16.04 with Linux kernel 4.15.0-45-generic-x86_64. We used the Java Oracle 1.8.0_201 runtime. Each experiment treatment was repeated 200 times. The load generator and the function runtime were restarted before a run.
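The measurement procedure described in this methodology (start a fresh runtime, wait until it is ready, record the elapsed time, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' load generator: the function names are hypothetical, readiness is signalled by a line on standard output rather than by an HTTP request, and the 95% interval uses a normal approximation, which the paper does not specify.

```python
import math
import statistics
import subprocess
import sys
import time


def measure_startup_ms(cmd, ready_marker="READY"):
    """Spawn cmd and return the milliseconds elapsed until it prints ready_marker."""
    start = time.perf_counter()
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        try:
            for line in proc.stdout:
                if ready_marker in line:
                    return (time.perf_counter() - start) * 1000.0
            raise RuntimeError("process exited before signalling readiness")
        finally:
            proc.kill()  # each run must start from a fresh process


def repeat_startup(cmd, runs=200, ready_marker="READY"):
    """Repeat the measurement, restarting the process before every run."""
    return [measure_startup_ms(cmd, ready_marker) for _ in range(runs)]


def mean_ci95(samples):
    """Mean and 95% confidence bounds (normal approximation, an assumption)."""
    mean = statistics.mean(samples)
    half = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - half, mean + half


def reduction_pct(vanilla_ms, prebaked_ms):
    """Percentage by which the start-up time shrinks relative to Vanilla."""
    return 100.0 * (vanilla_ms - prebaked_ms) / vanilla_ms


if __name__ == "__main__":
    # Hypothetical stand-in for a function runtime: prints READY, then idles.
    fake_runtime = [sys.executable, "-c",
                    "import time; print('READY', flush=True); time.sleep(30)"]
    samples = repeat_startup(fake_runtime, runs=5)
    print("mean, low, high (ms):", mean_ci95(samples))
    print("reduction (%):", reduction_pct(100.0, 53.0))
```

To compare the two treatments, the same loop would be run once with the Vanilla launch command and once with the restore command, and the two sample sets compared. Applying `reduction_pct` to the figures reported in the introduction gives 47% for the Markdown function (100 ms to 53 ms) and roughly 71.9% for the Image Resizer (310 ms to 87 ms), in line with the reported 71%.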
versus 121.07%, for more realistic functions). This growth is because snapshot loading is less impacted by function size than the vanilla source-code loading and compilation. Table 1 helps us analyze this impact by showing the start-up time intervals for the three start-up techniques and function sizes. As we can see, the start-up time growth from small to big functions was ≈ 30 ms for prebaking with warmup (i.e., PB-Warmup) and ≈ 1168 ms for prebaking without warmup (i.e., PB-NOWarmup).

4.3 Service Time Overhead
In Section 4.2, we analyzed the start-up time and showed that the prebaking technique led to significant performance improvements. In the state of the practice, those improvements imply more latency predictability, as the perceived latency of the first request suffers less from the cold start. In this Section, we present the evaluation of the service time, aiming to understand how FaaS functions behave after being restored. In other words, we assess if the start-up procedure leads to any performance penalty.

Figure 7 presents the empirical cumulative distribution function (ECDF) of the service time for 200 requests applied to NOOP, Markdown Render and Image Resizer functions after being initialized by the prebaking and vanilla technique. Both ECDFs pretty much coincide, which is a good indication that the prebaking technique does not lead to any performance penalty after the functions are restored.

5 Integration
To assess the feasibility of the prebaking technique, we integrated it with the open-source OpenFaaS platform⁶. In our integration scenario, we used Kubernetes as the Resource Management layer. OpenFaaS is one of the most popular open-source serverless platforms. It provides excellent documentation about the platform architecture, making it easier to understand how we could integrate our technique. In the next sections, we overview the OpenFaaS design and explain its integration with the prebaking technique.

⁶ https://www.openfaas.com/

The prebaking technique was designed to be easily integrated with the existing serverless platforms. Such a premise does not tie our technique to a specific serverless platform, nor to a process isolation technology.

5.1 OpenFaaS
OpenFaaS is a container-based serverless platform. It means that there is a container for every function, and the container should encapsulate all dependencies (e.g., source-code and runtime). That said, Figure 8 presents an overview of the OpenFaaS architecture and how its components communicate.

As shown in Figure 8, users operate OpenFaaS through the FaaS-CLI, which defines an API for the operations with functions. As an example, we list some operations which are essential to the integration with the prebaking technique:
1. new: creates a new function project by copying a language template from the Templates Repository. After it, the developer can edit the project to implement the function's business logic;
2. build: transforms the function source code into a deployable artifact which is a Docker container image;
3. push: stores the function deployable artifacts into the Function Registry which is a Container Image Repository;
4. deploy: deploys the function into an OpenFaaS Gateway enabling its usage through the platform.

Every request that comes through the platform hits the Gateway API, which is the OpenFaaS platform entry point. It provides APIs to deploy, invoke, scale, and gather information and metrics about the instances of the function. Furthermore, the platform auto-scaling functionality is shared between the Gateway API and the Prometheus tool, which continuously monitors metrics and fires alerts. All alerts fired by Prometheus are processed by the Gateway API, which decides when to scale down/up the number of active function replicas.

Instead of directly executing operations, such as incrementing the number of replicas of a particular function, the Gateway API delegates it to the FaaS-Provider. This indirection abstracts details about different container orchestration mechanisms and tools. Currently, the FaaS-Provider has implementations for Kubernetes and Docker Swarm integration.

Finally, the function Watchdog is the component responsible for managing and monitoring the function replica lifecycle. Furthermore, it is a communication interface between the platform API and the replica process.

5.2 Prebaking OpenFaaS Functions
As shown in Figure 9, OpenFaaS introduced the concept of templates. A template hides setup complexity from users that have everyday use cases. There are templates for languages like Go, Python, Java, PHP, and C#.

To spin off a prebaked function, we need to create a template that adds all CRIU dependencies and executes CRIU commands. As CRIU uses different commands to start processes in different runtimes, we created a new CRIU-version template for each language that we wanted to support⁷. With the prebaking template, the developers can create function projects that adopt the prebaking technique by using the new operation from FaaS-CLI.

⁷ https://github.com/paulofelipefeitosa/templates

Prebaking templates work differently from usual OpenFaaS templates. On the build phase, when transforming the source-code and dependencies into a deployable function, Prebaking templates start the function runtime and run an optional post-processing script (e.g., warm-up requests),
and checkpoint the function process into the container image. And, after creating the Docker container image, the OpenFaaS platform can finish the function push and deploy processes as usual. With that, when it is time to start the function replica, the container executes the CRIU command to restore the dump previously saved inside the container image.

Table 1. Start-up time intervals (in milliseconds) for functions with small, medium and big code bases. Intervals were calculated with 95% statistical confidence.

Figure 7. Empirical cumulative distribution function (ECDF) of the service time for 200 requests applied to NOOP, Markdown Render and Image Resizer functions after being initialized by the Prebaking and Vanilla technique.

Since usual docker build does not allow the execution of privileged operations, it was necessary to install the Docker Buildx CLI plugin⁸ to allow the docker command to perform such operations.

⁸ https://docs.docker.com/buildx/working-with-buildx/

After creating a new function using the CRIU templates and installing the Docker Buildx, the developers can deploy function instances using the same commands provided by FaaS-CLI.

Finally, as mentioned before, the restore operation is privileged. The docker run command already supports this functionality by starting the container using the --privileged option. As Kubernetes already supports this behavior, we only needed to introduce it in the FaaS-Provider implementation.

6 Related Work
Long start-up time remains an open problem in serverless computing. Many efforts have been proposed to decrease start-up time, including snapshot-based solutions. For example, SEUSS harnesses unikernels to execute snapshots of serverless applications [4]. SEUSS improves startup times by snapshotting the unikernel state at different moments of the function life-cycle and cloning the unikernel snapshot when there is no Function Replica available. SEUSS runs on a dedicated OS focused on delivering fast snapshot restore. The decision of adopting an ad-hoc kernel instead of using an out-of-box Linux kernel comes with a price: it leads to more
complex integrations with existing serverless platforms. In particular, when the platform is deployed on premise, by small organizations, the cost of operation could be prohibitive. Furthermore, it is unclear how SEUSS performs when dealing with more complex functions as the work describes only the evaluation of the NOOP function.

Figure 8. OpenFaaS main components. Function artifacts are stored in a Container Image Repository, and later downloaded and used to create new Function Replicas.

SOCK proposes to cache application loaders with pre-imported packages and clone them to avoid runtime start-up [18, 19]. However, the SOCK service implementation is language-specific, and it does not deal with other application aspects that influence the start-up time, for instance, I/O heavy initialization. Boucher et al. [3] propose the adoption of language-based isolation instead of process-based. The authors implemented a multi-tenant worker process in Rust, which directly executes functions by dynamically loading the function code and running it as a thread. Even though they achieved a start-up time in the order of microseconds, the solution requires users to write functions in Rust, which brings challenges in terms of usage and code transpilation. Our work provides a language-aware solution that can be plugged by the cloud provider and massively improve the start-up time of applications without changing the user code or imposing new user requirements.

The clone and restore approach has already been used in other contexts to decrease cloud applications' start-up time. Kukreti and Mueller [12] propose the process cloning technique to avoid speculative tasks recomputing work already done by the original task, improving the probability of a speculative task catching up with the straggler task. Kawachiya et al. [10] propose to clone and restore the JVM internal structures, avoiding several start-up steps, to reduce Java applications' start-up time. Oh and Moon [20] propose to decrease Web application start-up time by snapshotting JavaScript objects and restoring them when the application is loaded. We chose the process cloning technique because of its generality. As the clone operation is applied at the process level, it does not need to know runtime or application internal structures. However, the knowledge of runtime internals could be used to make the start-up even better.

7 Conclusion
Serverless platform users face a well-known problem of high response times when the request handling needs to wait for the platform to scale up. This problem is commonly known as "cold start", which has as significant contributors the platform orchestration overhead and virtualized environment (container or VM) start-up [14, 16]. However, corroborating previous studies [14, 15, 19], our findings revealed that function runtime start-up also plays a major role in cold starts. Our experiment results show that JVM start-up times range from 310 to 1600 ms, depending on the code size.

We focused on decreasing the function process start-up time using a cloning technique based on Checkpoint/Restore In Userspace (CRIU). The proposed solution persists the process state of a ready-to-serve serverless instance to recover this state in a cloned process later, when the platform needs to scale up. Our results show that using process cloning to
Figure 9. Diagram of the Prebaked Functions deployment and execution flow in the OpenFaaS platform. On the build phase,
CRIU triggers the process checkpoint and stores the Function Snapshot data inside the Function Container Image. Whenever
the FaaS-Provider launches a new Function Replica, CRIU restores the snapshot.
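The build-time checkpoint and launch-time restore flow in Figure 9 can be sketched as two CRIU command-line invocations. This is a minimal illustration under stated assumptions, not the authors' OpenFaaS integration: the helper names (`checkpoint_cmd`, `restore_cmd`, `run`), the snapshot directory, and the choice of flags are hypothetical, and running the commands requires CRIU to be installed and typically elevated privileges.

```python
import subprocess
from pathlib import Path
from typing import List


def checkpoint_cmd(pid: int, snapshot_dir: Path, shell_job: bool = False) -> List[str]:
    """Build a `criu dump` command for the process tree rooted at `pid`.

    Assumption: the function process is already ready to serve requests
    (or, for prebaking, has already served a warm-up request).
    """
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", str(snapshot_dir)]
    if shell_job:
        # Only relevant when the process was started from a terminal session.
        cmd.append("--shell-job")
    return cmd


def restore_cmd(snapshot_dir: Path, shell_job: bool = False) -> List[str]:
    """Build a `criu restore` command that recreates the process from its images."""
    cmd = ["criu", "restore", "--images-dir", str(snapshot_dir), "--restore-detached"]
    if shell_job:
        cmd.append("--shell-job")
    return cmd


def run(cmd: List[str]) -> None:
    """Execute a CRIU command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical PID and path; only print the commands instead of running them.
    snapshot = Path("/tmp/function-snapshot")
    print(" ".join(checkpoint_cmd(1234, snapshot)))
    print(" ".join(restore_cmd(snapshot)))
```

In the prebaking variant, the platform would send at least one warm-up request to the function before issuing the checkpoint command, so that the snapshot already contains loaded and compiled code.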
start serverless functions removes the overhead of the JVM start-up, leading to a gain of 40% for a NOOP function, and 47% to 71% for more representative functions. The proposed solution also allows platform maintainers to interact with the process before persisting its state. We used this functionality to warm a Java function up before persisting it (i.e., prebaking), and our experiments show that it leads to improvements ranging from 127.45% to 1932.49%. That means the prebaking technique not only removed the JVM start-up overhead, but also effectively removed the overhead caused by loading and compiling the code (JIT). Finally, we showed that these gains are proportional to the code size of the function.

As future work, we plan to extend our evaluation to other runtime environments such as Node.js and Python, all supported by the leading public FaaS platforms. As different runtimes implement distinct start-up procedures, the potential improvements remain unknown. Also, we plan to evolve the integration of the Prebaking technique into other FaaS platforms, aiming to assess how easy it is to integrate the technique with different designs. In addition to that, we plan to evaluate checkpoint/restore as a service, including aspects such as the performance when dealing with even bigger function code sizes and concurrent snapshots. Finally, we plan to adopt the recently released version of the CRIU tool, which does not require the execution of privileged operations, and to experiment with in-memory optimization on CRIU to speed up snapshot restore [26].

Acknowledgments
We would like to thank all anonymous reviewers from this 2020 Middleware edition and our shepherd Lucy Cherkasova for their guidance and precious feedback. This work was supported by CAPES, the Brazilian Federal Agency for Support and Evaluation of Graduate Education. Furthermore, it has been funded by grant #2015/24461-2, São Paulo Research Foundation (FAPESP), and by the project ATMOSPHERE (atmosphere-eubrazil.eu), by the Brazilian Ministry of Science, Technology and Innovation (Project 51119 - MCTI / RNP 4th Coordinated Call) and by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement no 777154.
References
[1] Alexandru Agache, Marc Brooker, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. 2020. Firecracker: Lightweight Virtualization for Serverless Applications. In 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020, Santa Clara, CA, USA, February 25-27, 2020, Ranjita Bhagwan and George Porter (Eds.). USENIX Association, 419–434. https://www.usenix.org/conference/nsdi20/presentation/agache
[2] Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. 2018. SAND: Towards High-Performance Serverless Computing. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11-13, 2018, Haryadi S. Gunawi and Benjamin Reed (Eds.). USENIX Association, 923–935. https://www.usenix.org/conference/atc18/presentation/akkus
[3] Sol Boucher, Anuj Kalia, David G. Andersen, and Michael Kaminsky. 2018. Putting the "Micro" Back in Microservice. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11-13, 2018, Haryadi S. Gunawi and Benjamin Reed (Eds.). USENIX Association, 645–650. https://www.usenix.org/conference/atc18/presentation/boucher
[4] James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, and Jonathan Appavoo. 2020. SEUSS: skip redundant paths to make serverless fast. In EuroSys ’20: Fifteenth EuroSys Conference 2020, Heraklion, Greece, April 27-30, 2020, Angelos Bilas, Kostas Magoutis, Evangelos P. Markatos, Dejan Kostic, and Margo I. Seltzer (Eds.). ACM, 32:1–32:15. https://doi.org/10.1145/3342195.3392698
[5] Serjik G. Dikaleh, Eric Charpentier, John Liu, Neil DeLima, and Vince Yuen. 2018. Build a cognitive serverless slack app with IBM cloud functions & IBM Watson API. In Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering, CASCON 2018, Markham, Ontario, Canada, October 29-31, 2018. 354–355. https://dl.acm.org/citation.cfm?id=3291336
[6] Bradley Efron and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. Number 57 in Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, Florida, USA.
[7] Erwin Van Eyk, Alexandru Iosup, Johannes Grohmann, Simon Eismann, André Bauer, Laurens Versluis, Lucian Toader, Norbert Schmitt, Nikolas Herbst, and Cristina L. Abad. 2019. The SPEC-RG Reference Architecture for FaaS: From Microservices and Containers to Serverless Platforms. IEEE Internet Comput. 23, 6 (2019), 7–18. https://doi.org/10.1109/MIC.2019.2952061
[8] Henrique Fingler, Amogh Akshintala, and Christopher J. Rossbach. 2019. USETL: Unikernels for Serverless Extract Transform and Load Why should you settle for less?. In Proceedings of the 10th ACM SIGOPS Asia-Pacific Workshop on Systems, APSys 2019, Hangzhou, China, August 19-20, 2019. ACM, 23–30. https://doi.org/10.1145/3343737.3343750
[9] Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Jayant Yadwadkar, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica, and David A. Patterson. 2019. Cloud Programming Simplified: A Berkeley View on Serverless Computing. CoRR abs/1902.03383 (2019). arXiv:1902.03383 http://arxiv.org/abs/1902.03383
[10] Kiyokuni Kawachiya, Kazunori Ogata, Daniel Silva, Tamiya Onodera, Hideaki Komatsu, and Toshio Nakatani. 2007. Cloneable JVM: a new approach to start isolated java applications faster. In Proceedings of the 3rd International Conference on Virtual Execution Environments, VEE 2007, San Diego, California, USA, June 13-15, 2007, Chandra Krintz, Steven Hand, and David Tarditi (Eds.). ACM, 1–11. https://doi.org/10.1145/1254810.1254812
[11] Linux Kernel. 2020. Linux merged patch for unprivileged checkpoint/restore. Retrieved August 31, 2020 from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=74858abbb1032222f922487fd1a24513bbed80f9
[12] Sarthak Kukreti and Frank Mueller. 2018. CloneHadoop: Process Cloning to Reduce Hadoop’s Long Tail. In 5th IEEE/ACM International Conference on Big Data Computing Applications and Technologies, BDCAT 2018, Zurich, Switzerland, December 17-20, 2018. IEEE Computer Society, 11–20. https://doi.org/10.1109/BDCAT.2018.00011
[13] Hyungro Lee, Kumar Satyam, and Geoffrey C. Fox. 2018. Evaluation of Production Serverless Computing Environments. In 11th IEEE International Conference on Cloud Computing, CLOUD 2018, San Francisco, CA, USA, July 2-7, 2018. IEEE Computer Society, 442–450. https://doi.org/10.1109/CLOUD.2018.00062
[14] Ping-Min Lin and Alex Glikson. 2019. Mitigating Cold Starts in Serverless Platforms: A Pool-Based Approach. CoRR abs/1903.12221 (2019). arXiv:1903.12221 http://arxiv.org/abs/1903.12221
[15] Johannes Manner, Martin Endreß, Tobias Heckel, and Guido Wirtz. 2018. Cold Start Influencing Factors in Function as a Service. In 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion, UCC Companion 2018, Zurich, Switzerland, December 17-20, 2018, Alan Sill and Josef Spillner (Eds.). IEEE, 181–188. https://doi.org/10.1109/UCC-Companion.2018.00054
[16] Anup Mohan, Harshad Sane, Kshitij Doshi, Saikrishna Edupuganti, Naren Nayak, and Vadim Sukhomlinov. 2019. Agile Cold Starts for Scalable Serverless. In 11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019, Renton, WA, USA, July 8, 2019, Christina Delimitrou and Dan R. K. Ports (Eds.). USENIX Association. https://www.usenix.org/conference/hotcloud19/presentation/mohan
[17] Sunil Kumar Mohanty, Gopika Premsankar, and Mario Di Francesco. 2018. An Evaluation of Open Source Serverless Computing Frameworks. In 2018 IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2018, Nicosia, Cyprus, December 10-13, 2018. IEEE Computer Society, 115–120. https://doi.org/10.1109/CloudCom2018.2018.00033
[18] Edward Oakes, Leon Yang, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2017. Pipsqueak: Lean Lambdas with Large Libraries. In 37th IEEE International Conference on Distributed Computing Systems Workshops, ICDCS Workshops 2017, Atlanta, GA, USA, June 5-8, 2017, Aibek Musaev, João Eduardo Ferreira, and Teruo Higashino (Eds.). IEEE Computer Society, 395–400. https://doi.org/10.1109/ICDCSW.2017.32
[19] Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2018. SOCK: Rapid Task Provisioning with Serverless-Optimized Containers. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11-13, 2018, Haryadi S. Gunawi and Benjamin Reed (Eds.). USENIX Association, 57–70. https://www.usenix.org/conference/atc18/presentation/oakes
[20] JinSeok Oh and Soo-Mook Moon. 2015. Snapshot-based loading-time acceleration for web applications. In Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015, San Francisco, CA, USA, February 07 - 11, 2015, Kunle Olukotun, Aaron Smith, Robert Hundt, and Jason Mars (Eds.). IEEE Computer Society, 179–189. https://doi.org/10.1109/CGO.2015.7054198
[21] Oracle. 2019. Overview (Java Platform SE 8). Retrieved August 6, 2019 from https://docs.oracle.com/javase/8/docs/api/
[22] James S. Plank, Micah Beck, Gerry Kingsley, and Kai Li. 1995. Libckpt: Transparent Checkpointing under UNIX. In USENIX 1995 Technical Conference on UNIX and Advanced Computing Systems, New Orleans, Louisiana, USA, January 16-20, 1995, Conference Proceedings. USENIX Association, 213–224. https://www.usenix.org/conference/usenix-1995-technical-conference/libckpt-transparent-checkpointing-under-unix
[23] Amazon Web Services. 2018. Firecracker. Retrieved May 15, 2019 from https://firecracker-microvm.github.io
[24] S. S. Shapiro and M. B. Wilk. 1965. An analysis of variance test for normality (complete samples). Biometrika 52, 3-4 (Dec. 1965), 591–611. https://doi.org/10.1093/biomet/52.3-4.591
[25] Jörg Thalheim, Pramod Bhatotia, Pedro Fonseca, and Baris Kasikci. 2018. Cntr: Lightweight OS Containers. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11-13, 2018, Haryadi S. Gunawi and Benjamin Reed (Eds.). USENIX Association, 199–212. https://www.usenix.org/conference/atc18/presentation/thalheim
[26] Ranjan Sarpangala Venkatesh, Till Smejkal, Dejan S. Milojicic, and Ada Gavrilovska. 2019. Fast in-memory CRIU for docker containers. In Proceedings of the International Symposium on Memory Systems, MEMSYS 2019, Washington, DC, USA, September 30 - October 03, 2019. ACM, 53–65. https://doi.org/10.1145/3357526.3357542
[27] Liang Wang, Mengyuan Li, Yinqian Zhang, Thomas Ristenpart, and Michael M. Swift. 2018. Peeking Behind the Curtains of Serverless Platforms. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11-13, 2018, Haryadi S. Gunawi and Benjamin Reed (Eds.). USENIX Association, 133–146. https://www.usenix.org/conference/atc18/presentation/wang-liang