Serverless Handbook

TABLE OF CONTENTS

What is serverless
Serverless is an ecosystem
Serverless pros
Serverless cons
The verdict?
AWS
Azure
Firebase
Netlify
Vercel
So … what to choose?
Infrastructure-as-code
Fast deploys
Architecture principles
Everything fails
Conclusion
Queue
API Gateway
Logging
Isolate errors
Be debuggable
Conclusion
Blockchain
Fin
Conclusion
Observability
Conclusion
Glossary
Getting Started with Serverless
Hello friend
Don’t want the intro? Jump straight to your first app at the end of
this chapter.
What is serverless
Serverless is other people’s servers running
your code.
The logical next step to platform as a service, which came from The
Cloud, which came from virtual private servers, which came from
colocation, which came from a computer on your desk running a
web server.
With a static IP address, you can tell DNS1 servers how to find your
server with a domain. People can type that domain into a URL and
find your server.
1 https://en.wikipedia.org/wiki/Domain_Name_System
Your internet is lower tier than a business would get. Less reliable
and if the provider needs to do maintenance, they think nothing of
shutting off your pipes during non-peak hours. Your server needs
strong internet 24/7.
Colocation lets you take that same server and put it in a data cen-
ter. They supply the rack space, stable power, good internet, and
physical security.
PS: Computers break all the time. A large data center replaces a hard
drive every few minutes because a typical drive lasts 4 years and
when you have thousands, the stats are not in your favor.
Colocation solved physical problems, but not the fact that your
servers are bored.
You have to keep the hardware happy and thermally content, you
have to over-provision in case of traffic spikes or developer mis-
takes. Sometimes your site just isn’t as popular as you’d like.
The first type of virtualization was basic virtual hosts2 . They let
you run multiple websites on the same machine. A domain maps to
an application on your computer, web server knows the mapping,
and voila: sites can share resources.
Websites on the same computer are very close together. You could
hack one site and gain access to another. You could starve every
website for resources by attacking 1 site with a lot of traffic. You
could config yourself into a corner with overlapping configuration.
You’re on the hook for software setup and you share the machine
with other users.
2 https://en.wikipedia.org/wiki/Virtual_hosting
3 https://en.wikipedia.org/wiki/Virtual_private_server
4 https://en.wikipedia.org/wiki/OS-level_virtualisation
Early VPS was a lot like The Cloud. Computers running on the
internet without touching hardware.
Once your traffic started to grow, you’d need more servers to han-
dle the load. There’s only so much a single server can do every
second.
But how do you ensure your servers are all the same? How do you
spin them up quickly when traffic spikes on Black Friday?
5 https://en.wikipedia.org/wiki/Scalability#VERTICAL-SCALING
6 https://en.wikipedia.org/wiki/Scalability#VERTICAL-SCALING
You start every new server from an image in your cloud provider’s
library. Comes with basic setup and common defaults. You add
tweaks and create a new image.
The cloud provider gives you easy controls to create as many in-
stances of that server as you’d like. Press a button, get a server.
7 https://en.wikipedia.org/wiki/Docker_(software)
8 https://en.wikipedia.org/wiki/Kubernetes
With PaaS you pay somebody else to deal with the cloud while
you focus on code. They configure your servers and dockers and
kubernetes and make everything play together. You build the app.
Many PaaS providers let you drop down a few levels and break ev-
erything. You get to mess with low level configs, operating system
libraries, web servers, databases, etc. Empowering and dangerous.
I tend to get it wrong.
While PaaS takes care of your servers, you have to take care of the
“frontend”. Set up domains and DNS, make your application run
right for the platform, configure your own CDN10 , deal with static
files, and so on.
9 https://en.wikipedia.org/wiki/Platform_as_a_service
10 https://en.wikipedia.org/wiki/Content_delivery_network
Server containers so tiny you can spin them up and down in mil-
liseconds. They achieve this because the code they run is:
1. Small
2. Standardized
3. Does 1 thing
Servers never idle because they live as long as the request they’re
serving.
11 https://en.wikipedia.org/wiki/Serverless_computing
In the next few minutes you’re going to build your first serverless
backend. A service that says Hello
12 https://serverlesshandbook.dev/serverless-flavors
When working with serverless I like to use the open source Server-
less13 framework. We’ll talk more about why in the Good server-
less dev experience14 chapter.
Install it globally:

npm install --global serverless
13 https://github.com/serverless/serverless
14 https://serverlesshandbook.dev/serverless-dx
15 https://serverless.com/framework/docs/providers/aws/guide/credentials/
mkdir hello-world
cd hello-world
touch serverless.yml
touch handler.js
# serverless.yml
service: hello-world
provider:
  name: aws
  runtime: nodejs12.x
  stage: dev
16 https://serverlesshandbook.dev/dev-qa-prod
# serverless.yml
service: hello-world
provider:
  name: aws
  runtime: nodejs12.x
  stage: dev

functions:
  hello:
    handler: ./handler.hello
    events:
      - http:
          path: hello
          method: GET
          cors: true
Each entry becomes its own tiny server – a serverless lambda. To-
gether, they’re the hello-world service.
PS: enabling CORS17 lets you call this function from other websites.
Like your frontend app.
// handler.js
17 https://en.wikipedia.org/wiki/Cross-origin_resource_sharing
You get a URL for your lambda and some debugging output. My
URL is https://z7pc0lqnw9.execute-api.us-east-1.amazonaws.com/dev/
and if you open it in your browser, it's going to say Hello 👋
I'll keep it up because it's free unless somebody clicks. And when
they do, current AWS pricing gives me 1,000,000 clicks per month
for free.
Next chapter, we talk about the pros & cons of using serverless in
your next project.
Yes!
Serverless is a great option for most projects most of the time. You
save configuration and maintenance time and gain flexibility; only in
extreme cases do you spend more per request than you would running
your own servers.
Large apps can reach the cost curve limits of serverless. Bank of
America, for example, announced $2B in savings18 from building
their own data centers.
You won’t hit those issues. And if you do, I hope there’s a business
model to back it up and you can afford DevOps professionals.
18 https://www.businessinsider.com/bank-of-americas-350-million-internal-cloud-bet-striking-p
Serverless is an ecosystem
When I say serverless, I don’t mean just throwing up code on a
function-as-a-service19 platform like AWS Lambda. I’m talking
about the whole ecosystem.
Is part of your app the same for every user? Package it up at deploy
time. No need to bother the server or the client with that work.
Is part of your app specific to individual users? Let the client handle
it. Every phone is a powerful computer these days.
19 https://en.wikipedia.org/wiki/Function_as_a_service
20 https://en.wikipedia.org/wiki/Content_delivery_network
Serverless pros
The main benefit of serverless is that you don’t deal with servers.
They’re somebody else’s problem.
21 https://serverlesshandbook.dev/serverless-architecture-principles
Programming productivity
• easier testing
• quicker understanding
• shorter development cycles
22 https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well
You save opportunity and employee cost and you’re not paying for
servers you aren’t using.
With serverless, you pay per execution and run time. Like pay-as-
you-go pricing: Run code, pay for that run.
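As a back-of-envelope sketch of that pricing model (the rates below are illustrative assumptions, not current AWS prices):

```javascript
// Pay-per-use lambda cost: a fee per request plus a fee per GB-second
// of runtime. Rates are assumed for illustration; check current pricing.
const PRICE_PER_MILLION_REQUESTS = 0.2; // dollars, assumed
const PRICE_PER_GB_SECOND = 0.0000166667; // dollars, assumed

function monthlyCost(requests, avgDurationMs, memoryGb) {
  const requestCost = (requests / 1e6) * PRICE_PER_MILLION_REQUESTS;
  // GB-seconds: total seconds of runtime weighted by memory size
  const gbSeconds = requests * (avgDurationMs / 1000) * memoryGb;
  return requestCost + gbSeconds * PRICE_PER_GB_SECOND;
}
```

At 1,000,000 requests averaging 100ms at 128MB, this comes out well under a dollar; a provisioned server costs the same whether you use it or not.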
Scalability
Google likes to call serverless architectures from prototype to pro-
duction to planet-scale. You don't want to use serverless at planet
scale though.
23 https://serverlesshandbook.dev/getting-started
Serverless cons
As much as I think serverless is the next big thing in web develop-
ment, it’s not all fun and games out there. There are disadvantages
to using serverless.
1. Latency
2. Speed or bandwidth
Each execution is fast because the code is small and servers are fast.
A few milliseconds and you're done.
But latency can be high. You’re hitting the server cold every time.
That means each request waits for the computer to wake up.
24 https://en.wikipedia.org/wiki/Elasticity_(cloud_computing)
For low traffic applications with low latency demands, you might
need a constantly provisioned server.
Sometimes costly
If you have a lot of requests or long runtimes, you can rack up the
costs beyond what you’d pay with your own servers.
25 https://www.businessinsider.com/bank-of-americas-350-million-internal-cloud-bet-striking-p
Vendor lock-in
You can do all those things, but it’s a tedious and difficult task that
might break your app. You’re not building features or working on
your business while you migrate.
Avoid building architecture-agnostic code. It's hard and you're not
likely to need it.
We’ll talk more about that in the Robust Backend Design26 chap-
ter.
The verdict?
Now what?
AWS
28 https://serverlesshandbook.dev/getting-started
29 https://serverlesshandbook.dev/serverless-pros-cons
30 https://aws.amazon.com
Many other hosting providers use AWS. Heroku runs their dynos
on EC2 instances, Netlify and Vercel use S3 for static files, Lambda
for cloud functions, etc. The exact details are a secret, but we can
guess.
Did you know AWS was more than half of Amazon’s revenue31 in
2019? It’s a beast.
With over 165 services, it’s impossible to try or even know all of
AWS. A few that I’ve used are:
• EC2 – old school cloud. You get a virtual computer, set it up,
and you’re in control. Runs forever unless you make it stop.
• S3 – the standard solution for static files. Upload a file, get
a URL, file stays there forever. Used for image and video
assets, but can't run server code or host a dynamic website.
• CloudFront – a CDN32 that integrates with S3. Point to
static files via CloudFront and they go to a server nearest
to your users. Works like a read-through cache, makes your
apps faster.
• IAM – identity and account management. AWS forces you
to use this to manage permissions. It’s secure, tedious to set
up, and a thorn in your butt. Until it saves your butt.
31 https://www.itproportal.com/news/aws-now-makes-up-over-half-of-all-amazon-revenue/
32 https://en.wikipedia.org/wiki/Content_delivery_network
AWS services add up fast. Every tool does one job. No tool does
your job.
For example: I’d use AWS when my project involves data pipelines,
coordinating between users, and complex backend logic. You
know it’s backend logic because it impacts multiple users on
different devices.
33 https://azure.microsoft.com
34 https://firebase.google.com
You’ll have to change how you write your frontend code so it hooks
up with Firebase and … that’s about it.
Great for small demos and when you don’t want to think about the
backend at all.
Netlify is one of the best funded startups in this arena. That can be
an important factor.
They’ve been around long enough to rely on and are young enough
that you’ll get decent support, if you have issues. I find that to be a
great balance :)
Netlify is almost always a great choice for your web app. Their
cloud function support can be cumbersome and malnourished. It
doesn’t feel like a focus.
If you like interacting with your sites via the command line, I’ve
found Netlify to be less great.
I like Vercel’s command line interface and the fact I can run vercel
in any project and it shows up on the internet. No clicking or config
needed.
Vercel is best for frontend-heavy apps and when you’re using their
NextJS framework. Like Netlify, it feels unlikely their backend
support will reach the full power of AWS.
So … what to choose?
My preference is to put the frontend on Netlify or Vercel and the
backend on AWS.
I asked Twitter and it was all over the place. A theme emerged:
1. Infrastructure-as-code
2. Fast deploys
3. Tooling for common tasks
1. Your deploys are repeatable. Run deploy, get the same re-
sult every time. The same functions, the same queues, the
same caching servers, everything.
38 https://serverlesshandbook.dev/getting-started#setup-for-serverless-work
39 https://en.wikipedia.org/wiki/Serverless_Framework
Fast deploys
The shorter your feedback cycle, the faster you can work.
You make a change … now what? If you have unit tests, they show
you part of the picture. The specific scenarios you thought to test,
the methods you’re exercising, the particular inputs.
All great.
But unit tests can’t tell you your system works. That’s where bugs
come from – systems complexity.
You can simulate the environment and run your tests. That works
to an extent, but it’s never perfect.
0. Hit deploy
1. Compile your code locally on your fast developer machine.
Since your code is small, it compiles in seconds.
2. Compile your infrastructure – the serverless framework com-
piles your infrastructure into a config file for the target plat-
form. With AWS that's SAM40 .
3. Upload your bundle – this is the slowest part.
4. Infrastructure sets itself up using your config – servers appear,
queues go up, etc. Takes a few seconds.
5. You're ready to go
40 https://aws.amazon.com/serverless/sam/
# package.json
"scripts": {
  "build": "tsc --build",
  "deploy": "npm run build && sls deploy"
}
With those 2 lines you can deploy from any branch without worry
that you’ll forget to build your project first. The build script runs
a typescript build and sls deploy runs a serverless deploy.
41 https://serverlesshandbook.dev/dev-qa-prod
Ding.
Ding. Ding.
BZZZ
The high API error rate was the biggest alarm. A catch-all that
triggered when you can’t be sure the more specific alarms even
work.
Nobody noticed.
Everything fails
The design principle behind every backend architecture states:

Everything fails all the time.
In 2011 Netflix forced engineers to think about this with the Chaos
Monkey48 . They wanted to “move from a development model that
assumed no breakdowns to a model where breakdowns were considered
to be inevitable”.
Not a big deal, but if that error happens in your payment flow and
you double charge a user … they’ll care.
48 https://en.wikipedia.org/wiki/Chaos_engineering
49 https://aws.amazon.com/lambda/sla/
A request comes in. You do X, then you do Y, then Z, then the result
comes out. The request is your input, the result is your output.
Like Henry Ford’s famous assembly line: steel comes in the fac-
tory on one end, cars leave on the other.
• get request
• check if request already processed
• if processed, finish
• if not, do your thing
• trigger the next step
• mark request as processed
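The steps above can be sketched as code (a toy sketch: the in-memory Set stands in for a real database flag):

```javascript
// Idempotent message handling: check a "processed" flag before doing work.
// A real system would store the flag in a database, not in memory.
const processed = new Set();

async function handleRequest(request, doWork) {
  // check if request already processed; if so, finish
  if (processed.has(request.id)) {
    return "already-done";
  }
  // if not, do your thing (and trigger the next step)
  await doWork(request);
  // mark request as processed
  processed.add(request.id);
  return "done";
}
```

Calling the handler twice with the same request runs the work exactly once.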
Having a way to answer “Did I process this yet?” gives you replayabil-
ity. Call the same action with the same request twice and nothing
happens.
Because each action performs one step in the process, you can look
at the processed flags to see what’s up.
And because you have replayability, you can retry until your action
succeeds.
If it’s a hardware problem, you can wait until your function runs on
a different physical machine. Yay serverless.
You can keep retrying until success because actions are safely re-
playable.
PS: more on queues and how this works in the chapter on serverless
elements51
Since actions are functions and your messages are plain data, you
can test locally. Unit tests are great, running production code on
production data in your terminal, that’s wow.
Conclusion
Build your system out of small isolated pieces
that talk to each other via queues.
Next chapter we dive into queues and lambdas, and talk about how
to tie them together into a system.
52 https://serverlesshandbook.dev/serverless-architecture-principles
53 https://serverlesshandbook.dev/serverless-flavors
54 https://en.wikipedia.org/wiki/Lambda_calculus
55 https://en.wikipedia.org/wiki/Turing_machine
56 https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis
// src/handler.ts
Other providers and services have different events and expect dif-
57 https://swizec.com/blog/how-i-answer-the-door-with-aws-lambda-and-twilio/
swizec/9255
functions:
  helloworld:
    handler: dist/helloworld.handler
    events:
      - http:
          path: helloworld
          method: GET
          cors: true
events lists the triggers that run this function. An HTTP GET
request on the path /helloworld in our case.
Different implementations exist and they all share these core prop-
erties:
Many modern queues add time to the mix. You can schedule mes-
sages for later. 2 seconds, 2 minutes, 2 days, … Some queues limit
how long messages can stick around.
Server processes can fail for any reason at any time61 . For tempo-
rary errors a queue can use exponential backoff62 when retrying.
Giving your system more and more time to recover from issues.
60 https://en.wikipedia.org/wiki/Polling_(computer_science)
61 https://serverlesshandbook.dev/architecture-principles
62 https://en.wikipedia.org/wiki/Exponential_backoff
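Exponential backoff can be sketched like this (a generic sketch, not the queue's internal implementation; the delays are illustrative):

```javascript
// Retry an async action, doubling the wait after each failure:
// 100ms, 200ms, 400ms, ... giving the system time to recover.
async function retryWithBackoff(action, maxRetries = 5, baseDelayMs = 100) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await action();
    } catch (err) {
      // out of retries: give up and surface the error
      if (attempt === maxRetries - 1) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```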
Defining a queue
AWS Simple Queue Service (SQS) is a great queue service for the AWS
serverless ecosystem. More powerful alternatives exist, but they re-
quire more setup, more upkeep, and have features you might not
need.
resources:
  Resources:
    MyQueue:
      Type: "AWS::SQS::Queue"
63 https://serverlesshandbook.dev/robust-backend-design
Processing a queue
You need a lambda to process messages on MyQueue.
functions:
myQueueProcess:
64 https://serverlesshandbook.dev/dev-qa-prod
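The rest of this function definition was lost in extraction. Based on the description below (a batchSize of 1, an SQS event with the queue's Arn via GetAtt), it plausibly looked something like this (the handler path is a guess):

```yaml
functions:
  myQueueProcess:
    handler: dist/lambdas/myQueue.handler
    events:
      - sqs:
          arn:
            Fn::GetAtt: [MyQueue, Arn]
          batchSize: 1
```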
A lambda that runs each time MyQueue has a new message to pro-
cess. With a batchSize of 1, each message runs its own lambda
– a good practice for initial implementations. More about batch
sizes in the Lambda Workflows65 and Robust Backend Design66
chapters.
The strange yaml syntax reads as: an SQS event fired by a queue
with the ARN identifier of getAttribute(Arn) from MyQueue. Amazon
Resource Names, ARN, are unique identifiers for each resource in
your AWS account.
65 https://serverlesshandbook.dev/lambda-workflows
66 https://serverlesshandbook.dev/robust-backend-design
SNS
A useful alternative to SQS is the Simple Notification Service – SNS.
Similar behavior, except you can’t store messages on the queue.
API Gateway
You might not realize this, but servers don’t talk directly to the
internet.
67 https://en.wikipedia.org/wiki/Reverse_proxy
Other providers might have different names and they all perform
the same function: Take request from the internet and pass it on
to your lambda.
68 https://en.wikipedia.org/wiki/Scalability#HORIZONTAL-SCALING
Adding large files to your code makes starting new containers slow.
You can’t save locally because your server disappears after each
request.
You can think of S3 as a hard drive with an API. Read and write files,
get their URL, change permissions, etc.
Each file gets a URL that’s backed by a server optimized for static
files. No code, no dynamic changes. A raw file flying through HTTP.
69 https://en.wikipedia.org/wiki/Content_delivery_network
You can automate this part with build tools. Netlify and Vercel
both handle it for you.
Now when a browser requests a file, the URL resolves to the near-
est server. Request goes to that server and if the file is there, it’s
served. If there’s no file, the CDN goes to your original source,
caches the file, and then sends it back to the user.
And now your JavaScript, HTML, images, fonts, and CSS are fast to
load anywhere in the world.
Logging
Logging is one of the hardest problems in a distributed multi-
service world. You can’t print to the console or write to a local
file because you can’t see the console and files vanish after every
request.
There’s a bunch more to discover, but that’s the core. Next chapter
we look at using these to build a robust system.
Your buddy with another vast and powerful army hides behind a
hill on the other side. You need their help to win.
Smoke signals would reveal your plan to the city. It’s too far to
shout and phones are 2000 years in the future.
Send more messengers until one makes it back? How does your
friend know that any messenger made it back? Nobody wants to
attack alone.
70 https://en.wikipedia.org/wiki/Two_Generals%27_Problem
• isolate errors
• retry until success
• make operations replayable
• be debuggable
• remove bad requests
• alert the engineer when something’s wrong
• control your flow
73 https://www.theverge.com/2017/3/2/14792442/amazon-s3-outage-cause-typo-internet-server
In your car, the brakes keep working even if your brake lights go
out. The systems work together, but independently.
Say you were building a basic math module. Would you write a
function that performs plus and minus?
That looks odd to me. Plus and minus are distinct operations.
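To make the contrast concrete, a sketch (not from the book):

```javascript
// Coupled: one function performing two distinct operations,
// switched by a mode argument.
function calculate(op, a, b) {
  return op === "plus" ? a + b : a - b;
}

// Decoupled: each operation does one thing and stands alone.
const plus = (a, b) => a + b;
const minus = (a, b) => a - b;
```

Both work, but the single-purpose versions are easier to test, name, and reason about.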
You grab the pill bottle, take out a pill, put it on your desk, get an
email and start reading. You elbow the pill off your desk.
10 minutes later you look down and there’s no pill. Did you take it?
Avoid coupling
With atomic operations and delegating heavy work to other func-
tions, you’re primed for another mistake: Direct dependency.
Like this:
async function myLambda() {
  // read from db
  // prep the thing
  await anotherLambda(data)
}
74 https://en.wikipedia.org/wiki/Atomicity_(database_systems)
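A decoupled sketch of the same lambda: instead of awaiting the other lambda directly, it drops a message on a queue (sendToQueue is a hypothetical stand-in for an SQS client call):

```javascript
// Decoupled version: push a message and move on. The other lambda
// picks up the work from the queue on its own schedule.
async function myLambda(sendToQueue) {
  const data = { value: 42 }; // read from db, prep the thing
  await sendToQueue("another-lambda-queue", data);
  return "queued";
}
```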
You can see this principle in action in the Lambda Pipelines for
distributed data processing75 chapter.
75 http://serverlesshandbook.dev/lambda-pipelines
AWS retries every lambda invocation76 if the call fails. The num-
ber of retries depends on who’s calling.
API Gateway is proxying requests from users and that makes re-
tries harder than an SQS queue which has all the time in the world.
Details on how each implements retries differ. You can read more
about How SNS works77 and How SQS works78 in AWS docs.
76 https://docs.aws.amazon.com/lambda/latest/dg/invocation-retries.html
77 https://docs.aws.amazon.com/sns/latest/dg/sns-message-delivery-retries.html
78 https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-basic-architecture.html
Wait to delete the message until after confirmation. You might lose
data otherwise.
The Two Generals Problem may strike between you and your database:
function processMessage(messageId) {
  let message = db.get(messageId)

  if (!message.processed) {
    try {
      doTheWork(message)
    } catch (error) {
      throw error
    }

    message.processed = true
    db.save(message)

    if (db.get(messageId).processed) {
      return success
    } else {
      throw "Processing failed"
    }
  }

  return success
}
Be debuggable
You can use a debugger to step through your code locally. With
a unit test using production data. But it’s not the same as a full
production environment.
If local debugging fails, add logs. Many logs. Run in production, see
what happens.
9 requests go great, the 10th is a poison pill. Your code gets stuck
trying and retrying for days.
Dead letter queues79 can help. They hold bad messages until you
have time to debug.
# serverless.yml
functions:
  worker:
    handler: dist/lambdas/worker.handler
    events:
      # triggering from SQS events
      - sqs:
79 https://en.wikipedia.org/wiki/Dead_letter_queue
resources:
  Resources:
    WorkerQueue:
      Type: "AWS::SQS::Queue"
      Properties:
        QueueName: "WorkerQueue-${self:provider.stage}"
        # send to deadletter after 10 retries
        RedrivePolicy:
          deadLetterTargetArn:
            Fn::GetAtt:
              - WorkerDLQueue
              - Arn
          maxReceiveCount: 10
    WorkerDLQueue:
      Type: "AWS::SQS::Queue"
      Properties:
        QueueName: "WorkerDLQueue-${self:provider.stage}"
        # keep messages for a long time to help debug
        MessageRetentionPeriod: 1209600 # 14 days
Now all you need is an alarm on dead letter queue size to say “Hey
something’s wrong, you should check”.
Bug in your code? Fix the bug, re-run worker from dead letter
queue. No messages are lost.
Alert an engineer
The challenge with serverless systems is that you can’t see what’s
going on. And you’re not sitting there staring at logs.
AWS has basic monitoring built-in, Datadog is great for more con-
trol. More on monitoring in the Monitoring serverless apps chapter80.
Writing fast code is great, but if your speedy lambda feeds into a
slow lambda, you’re gonna have a bad day. Work piles up, systems
stop, customers complain.
Conclusion
A distributed system is never 100% reliable. You can make it
better with small replayable operations, keeping code debuggable,
and removing bad requests.
84 https://serverlesshandbook.dev/serverless-performance
But how do you choose which database and where do you put it?
It depends.
85 https://en.wikipedia.org/wiki/Database
86 https://en.wikipedia.org/wiki/Cache_(computing)
Notice how the list is about speed? That’s because speed of data
access is the biggest predictor of app performance.
I’ve seen API endpoints hit the database 30+ times. Queries that
take 10ms instead of 1ms can mean the difference between a great
user experience and a broken app.
We’ll focus on speed in this chapter. But to gain speed and scalabil-
ity, databases sacrifice correctness. It’s important that you know
what correctness means in a database context.
You’re building a glorified database for your web and mobile apps
:)
87 https://en.wikipedia.org/wiki/ACID
88 https://serverlesshandbook.dev/serverless-architecture-principles
1. Flat file storage – the simplest and fastest solution, great for
large data
2. Relational databases – the most correct and surprisingly fast
solution, great for complex data
3. NoSQL – the class of databases breaking ACID for greater
speed/scalability; different types exist
4. Blockchain – a distributed database without a central author-
ity; the industry is figuring out what it’s good for
The simplest way to store data is a flat file database89 . You might
call it “organized files”.
Flat files are commonly used for blobby binary data like images.
You’ll want to put them on S3 for a serverless environment. That
negates some of the advantages.
89 https://en.wikipedia.org/wiki/Flat-file_database
To add a line at the beginning of a file, you have to move the whole
thing. To change a line in the middle, you have to update everything
that comes after.
Common use cases for flat files are logs, large datasets, and binary
files (image, video, etc).
90 https://serverlesshandbook.dev/appendix-more-databases#flat-file-database
91 https://en.wikipedia.org/wiki/Relational_database
Disadvantages of relational
databases
Relational databases are harder to use, require more expertise to
tune performance, and you lose flexibility. This can be a good thing.
But you can make it more flexible with a blobby JSON field on every
model. Perfect for metadata.
This makes relational databases the perfect choice for typical ap-
plications. You wouldn’t use an RDBMS for files, but should con-
sider it for metadata about those files.
93 https://serverlesshandbook.dev/appendix-more-databases#relational-databases--rdbms
94 https://en.wikipedia.org/wiki/NoSQL
Flavors of NoSQL
You can classify NoSQL databases in 4 categories:
Use key:value stores when you need blazing fast data with low
overhead.
Read more about choosing a NoSQL database and how to use it in the
appendix96
Blockchain
95 https://en.wikipedia.org/wiki/Eventual_consistency
96 https://serverlesshandbook.dev/appendix-more-databases#the-nosql-approach-to-data
That’s right, git97 and The Blockchain98 share the same underlying
data structure: a merkle tree.
As a result you don’t need a central authority to tell you the current
state of your data. Each client can decide if their data is valid.
97 https://en.wikipedia.org/wiki/Git
98 https://en.wikipedia.org/wiki/Blockchain
99 https://en.wikipedia.org/wiki/Merkle_tree
100 https://blockstack.org/
Files for large binary blobs, relational database for business data,
key:value store for persistent caching, document store for com-
plex data that lives together.
In this chapter, learn about REST best practices and finish with a
small implementation you can try right now. I left mine running
What is REST
REST101 stands for REpresentational State Transfer. Coined in
Roy Fielding’s 2000 doctoral thesis102 , it now represents the stan-
dard approach to web APIs.
You may have noticed this in the wild. RESTful APIs follow similar
guidelines and no two are alike.
These days any API that uses HTTP to transfer data and URLs to
identify resources is called REST.
A uniform interface
Each request identifies the resource it is requesting, using the URL
itself.
All that is natural. The important part is to start on the right foot
and clean up when you can.
That means
Here are tips I’ve picked up over the past 14 years of building and
using RESTful APIs.
URL schema
Your URL schema exists to solve one problem: Create a uniform
way to identify resources and endpoints on your server.
Engineers like to get stuck on pointless details, but it’s okay. Make
sure your team agrees on what makes sense.
https://api.wonderfulservice.com/<namespace>/<model>/<id>
https://api.wonderfulservice.com/<namespace>/<model>/<verb>/<id>
The verb specifies what you’re doing to the model. More on that
when we discuss HTTP verbs further down.
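You could encode that agreement in a tiny helper so nobody builds URLs by hand. A sketch, with function and type names I made up for illustration:

```typescript
// Hypothetical helper enforcing the schema above.
// Not part of any framework; names are illustrative.
type UrlParts = {
  namespace: string
  model: string
  verb?: string
  id?: string
}

function buildUrl(base: string, parts: UrlParts): string {
  // keep the segment order fixed: /<namespace>/<model>/<verb>/<id>
  const segments = [parts.namespace, parts.model, parts.verb, parts.id]
    // drop missing segments so /<namespace>/<model>/<id> works without a verb
    .filter((s): s is string => Boolean(s))

  return `${base}/${segments.join("/")}`
}
```

Call it with `{ namespace: "v1", model: "item", id: "123" }` and you get `/v1/item/123` on your host. The point isn’t the helper, it’s that the whole team builds URLs the same way.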
103 https://en.wikipedia.org/wiki/ISO_8601
104 https://en.wikipedia.org/wiki/Unix_time
Everyone agrees that a GET request is for getting data and should
have no side-effects.
Other verbs belong to the POST request. Or you can use HTTP
verbs like PUT, PATCH, and DELETE.
GET for getting data. POST for posting data (both create and
update). DELETE for deleting … on the rare occasion I let clients
delete data.
Errors
Should you use HTTP error codes to communicate errors?
Opinions vary.
One camp says that HTTP errors are for HTTP-layer problems.
Your server being down, a bad URL, invalid payload, etc. When
your application processes a request and decides there’s an error,
it should return 200 with an error object.
The other camp says that’s silly and we already have a great system
for errors. Your application should use the full gamut of HTTP
error codes and return an error object.
{
  "status": "error",
  "error": "This is what went wrong"
}

Whichever camp you pick, an error object like the one above is best.
Added bonus: You can read the error object. Will you remember
what error 418 means?
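A small response helper keeps that envelope consistent across handlers. This is a sketch of the idea, not the book’s exact `response` function:

```typescript
// A sketch of the status/error envelope convention.
// `APIResponse` mirrors what API Gateway expects from a Lambda.
type APIResponse = {
  statusCode: number
  body: string
}

function response(statusCode: number, body: object): APIResponse {
  return {
    statusCode,
    // API Gateway wants a string body, so serialize here
    body: JSON.stringify(body),
  }
}

// Both camps can share this envelope; only the statusCode differs
const notFound = response(404, { status: "error", error: "Item not found" })
```

The 200-for-everything camp calls `response(200, ...)` with the same error object; the HTTP-codes camp varies the status. Either way, clients parse one shape.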
Versioning
You can see the full code on GitHub105 . I encourage you to play
around and try deploying to your own AWS.
105 https://github.com/Swizec/serverlesshandbook.dev/tree/master/
examples/serverless-rest-example
functions:
  getItems:
    handler: dist/manageItems.getItem
    events:
      - http:
          path: item/{itemId}
          method: GET
          cors: true
  updateItems:
    handler: dist/manageItems.updateItem
    events:
      - http:
          path: item
          method: POST
          cors: true
      - http:
          path: item/{itemId}
          method: POST
          cors: true
  deleteItems:
    handler: dist/manageItems.deleteItem
    events:
      - http:
          path: item/{itemId}
          method: DELETE
          cors: true
The {itemId} syntax lets APIGateway parse the URL for us and
pass identifiers to our code as parameters.
Mapping every operation to its own lambda means you don’t have
to write routing code. When a lambda gets called, it knows what
to do.
getItem
// src/manageItems.ts
106 https://github.com/Swizec/serverlesshandbook.dev/blob/master/
examples/serverless-rest-example/src/manageItems.ts
if (item.Item) {
  return response(200, {
    status: "success",
    item: item.Item,
  })
} else {
  return response(404, {
    status: "error",
    error: "Item not found",
  })
}
}
107 https://github.com/Swizec/serverlesshandbook.dev/blob/master/
examples/serverless-rest-example/src/dynamodb.ts#L80
// src/manageItems.ts
curl https://4sklrwb1jg.execute-api.us-east-1.amazonaws.com/dev/it
updateItem
// upsert an item
// /item or /item/ID
export const updateItem = async (
event: APIGatewayEvent
): Promise<APIResponse> => {
  let itemId = event.pathParameters
    ? event.pathParameters.itemId
    : uuidv4()

  if (!event.body) {
    return response(400, {
      status: "error",
      error: "Provide a JSON body",
    })
  }

  const body = JSON.parse(event.body)
  if (body.itemId) {
    // this will confuse DynamoDB, you can't update the key
    delete body.itemId
  }

  // ... DynamoDB update elided ...

  return response(200, {
    status: "success",
    item: item.Attributes,
  })
}
That’s all it takes to update an item.
deleteItem
Deleting is easy by comparison. Get the ID, delete the item. With
no verification that you should or shouldn’t be able to.
// src/manageItems.ts
if (!itemId) {
  return response(400, {
    status: "error",
    error: "Provide an itemId",
  })
}

// ... DynamoDB delete elided ...

return response(200, {
  status: "success",
  itemWas: item.Attributes,
})
}
We get itemId from the URL props and call a deleteItem method
on DynamoDB. The API returns the item as it was before deletion.
Fin
And that’s 14 years of REST API experience condensed into 2000
words. My favorite is how much easier this is to implement using
serverless than with Rails or Express.
fetch("https://swapi.dev/api/people/1/")
.then((res) => res.json())
.then(console.log)
{
  "name": "Luke Skywalker",
  "height": "172",
  "mass": "77",
  "hair_color": "blond",
  "skin_color": "fair",
  "eye_color": "blue",
  "birth_year": "19BBY",
  "gender": "male",
  "homeworld": "https://swapi.dev/api/planets/1/",
  "films": [
    "https://swapi.dev/api/films/2/",
Frustrating … all you wanted was his name and hair color. Why’s
the API sending you all this crap?
And what’s this about Luke’s species being 1? What the heck is 1?
{
  "name": "Human",
  "classification": "mammal",
  "designation": "sentient",
  "average_height": "180",
  "skin_colors": "caucasian, black, asian, hispanic",
  "hair_colors": "blonde, brown, black, red",
  "eye_colors": "brown, blue, green, hazel, grey, amber",
  "average_lifespan": "120",
  "homeworld": "https://swapi.dev/api/planets/9/",
  "language": "Galactic Basic",
  "people": [
    "https://swapi.dev/api/people/1/",
    "https://swapi.dev/api/people/4/",
    "https://swapi.dev/api/people/5/",
    "https://swapi.dev/api/people/6/",
    "https://swapi.dev/api/people/7/",
    "https://swapi.dev/api/people/9/",
That’s a lot of JSON to get the word "Human" out of the Star Wars
API109 …
109 https://swapi.dev/
fetch("https://swapi.dev/api/starships/12/")
.then((res) => res.json())
.then(console.log)
fetch("https://swapi.dev/api/starships/22/")
.then((res) => res.json())
.then(console.log)
And guess what, you didn’t cache anything. How often do you think
this data changes? Once a year? Twice?
query luke {
  # id found through allPeople query
  person(id: "cGVvcGxlOjE=") {
    name
    hairColor
    species {
      name
    }
    starshipConnection {
      starships {
        name
      }
    }
  }
}
What is GraphQL
GraphQL111 is an open-source data query and manipulation lan-
guage for APIs and a runtime for fulfilling queries with existing
data.
110 http://graphql.org/swapi-graphql
111 https://en.wikipedia.org/wiki/GraphQL
On the client, you describe the shape of what you want and
GraphQL figures it out. On the server, you write resolver
functions for sub-queries and GraphQL combines them into the
full result.
GraphQL queries
query {
  what_you_want {
    its_property
  }
}

query {
  what_you_want {
    its_property
  }
  other_thing_you_want {
    its_property {
      property_of_property
    }
  }
}
Variables let you create dynamic queries and build complex filters
to limit the scope of your result.
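For example, a paginated query might take variables like this (field and argument names are illustrative, not from the example app):

```graphql
query items($cursor: String, $limit: Int) {
  items(after: $cursor, first: $limit) {
    name
  }
}
```

The client sends the query once and swaps in new variable values on each request.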
GraphQL comes with basic equality filters built-in and you’re en-
couraged to add more in your resolvers. Typical projects choose
to support sorting, greater-than, not-equals, etc.
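Resolver-side filtering is plain code in the end. Here’s a sketch of how equality and greater-than filters might apply to a result set; the operator names and `Row` shape are assumptions, not a GraphQL built-in:

```typescript
// A sketch of resolver-side filtering. `eq` and `gt` are
// hypothetical operator names you might expose as arguments.
type Filters = {
  eq?: string // exact match on name
  gt?: number // price strictly greater than
}

type Row = { name: string; price: number }

function applyFilters(rows: Row[], filters: Filters): Row[] {
  return rows.filter((row) => {
    if (filters.eq !== undefined && row.name !== filters.eq) return false
    if (filters.gt !== undefined && !(row.price > filters.gt)) return false
    return true
  })
}
```

Your resolver reads filter arguments from the query and runs something like this before returning, or better, translates them into a database query.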
GraphQL mutations
GraphQL mutations write data. Following this pattern:
mutation {
  what_youre_updating(argument: "value", argument2: "value 2") {
    prop_you_want_back
  }
}
Like queries, you can put mutations side-by-side and use variables.
  other_field(argument: $argument) {
    return_prop
  }
}
People like to say GraphQL is replacing REST and that’s not the
case. GraphQL is augmenting REST.
Should you rip out your REST API and rewrite for GraphQL? Please
don’t.
112 https://swizec.com/blog/how-you-can-start-using-graphql-today-without-changing-the-bac
swizec/9350
No.
Define a clean API using domain driven design113 . How your back-
end stores data for max performance and good database design
differs from how clients think about that data.
113 https://en.wikipedia.org/wiki/Domain-driven_design
114 https://www.apollographql.com/
115 https://github.com/apollographql/apollo-server/tree/master/packages/
apollo-server-lambda
Apollo creates a GraphQL playground for us. You can try my imple-
mentation here: https://yrqqg5l31m.execute-api.us-east-1.amazonaws.co
116 https://serverlesshandbook.dev/serverless-rest-api#build-a-simple-rest
117 https://github.com/Swizec/serverlesshandbook.dev/tree/master/
examples/serverless-graphql-example
serverless.yml
We define a new function in the functions: section.
# serverless.yml
functions:
  graphql:
    handler: dist/graphql.handler
    events:
      - http:
          path: graphql
          method: GET
          cors: true
      - http:
          path: graphql
          method: POST
          cors: true
Make sure to define both GET and POST endpoints. GET serves the
Apollo playground, POST handles the queries and mutations.
// src/graphql.ts
This server won’t run because the schema and resolvers don’t
match. Resolvers have fields that the schema does not.
We’ll add type definitions to the schema and a resolver for each
Query and Mutation.
To mimic our CRUD API from before118 , we use a schema like this:
// src/graphql.ts
118 https://serverlesshandbook.dev/serverless-rest-api#build-a-simple-rest
type Query {
  item(id: String!): Item
}

type Mutation {
  updateItem(id: String, name: String, body: String): Item
  deleteItem(id: String!): Item
}
With REST, users could store and retrieve arbitrary JSON blobs.
GraphQL doesn’t support that. Instead, we define an Item type
with an arbitrary body string.
Each item will have an id, a name, and a few timestamps managed
by the server. Those help with debugging.
item()
// src/queries.ts
119 https://serverlesshandbook.dev/serverless-rest-api#build-a-simple-rest
return remapProps(item.Item)
}
updateItem()
It creates a new item when you don’t send an id and updates the
existing item when you do. If no item is found, the mutation throws
an error.
// src/mutations.ts
type ItemArgs = {
  id: string
  name: string
  body: string
}
if (find.Item) {
  // save createdAt so we don't overwrite on update
  createdAt = find.Item.createdAt
} else {
  throw "Item not found"
}

const updateValues = {
  itemName: args.name,
  body: args.body,
}

return remapProps(item.Attributes)
}
In the end, we return the object our database returned and let
GraphQL handle the rest.
// src/mutations.ts
return remapProps(item.Attributes)
}
// src/graphql.ts
// ...
const resolvers = {
  Query: {
    item,
  },
  Mutation: {
    updateItem,
    deleteItem,
  },
}
Run yarn deploy and you get a GraphQL server. There’s even an
Apollo playground that helps you test.
You could learn Elixir and Erlang – purpose built languages for
message processing used in networking. But is that where you
want your career to go?
You could try Kafka120 or Hadoop121 . Tools designed for big data,
used by large organizations for mind-boggling amounts of data.
Are you ready for that?
Elixir, Erlang, Kafka, Hadoop are wonderful tools, if you know how
to use them. But there’s a significant learning curve and devops
work to keep them running.
120 https://kafka.apache.org/
121 https://hadoop.apache.org/
The system accepts batches of events, adds info about user and
server state, then saves each event for easy retrieval.
Great for problems you can split into independent tasks like prep-
ping data. Less great for large inter-dependent operations like
machine learning.
123 https://en.wikipedia.org/wiki/MapReduce
That means you can distribute the work. Run each on a separate
Lambda in parallel. Thousands at a time.
124 https://en.wikipedia.org/wiki/Amdahl%27s_law
For the comp sci nerds: this has no impact on big-O complexity126.
You’re changing real-world performance, not the algorithm.
125 https://en.wikipedia.org/wiki/Commutative_property
126 https://en.wikipedia.org/wiki/Big_O_notation
• easy to understand
• robust against errors
• debuggable
• replayable
• always inspectable
The elements
We’re using 3 lambdas, 2 queues, and 2 DynamoDB tables.
3 Lambdas
Our lambdas are written in TypeScript and each does 1 part of the
process.
2 Queues
You can configure max retries and how long a message should stick
around. When it exceeds those deadlines, you can configure a
Dead Letter Queue to store the message.
2 tables
130 https://aws.amazon.com/sqs/
You first need to get data into the system. We use a Serverless
REST API131 .
# serverless.yml
functions:
  sumArray:
    handler: dist/sumArray.handler
    events:
      - http:
          path: sumArray
          method: POST
          cors: true
    environment:
      timesTwoQueueURL:
        Ref: TimesTwoQueue
131 /serverless-rest-api
It’s an SQS queue postfixed with the current stage, which helps us
split between production and development.
// src/sumArray.ts
132 https://docs.aws.amazon.com/AWSSimpleQueueService/latest/
SQSDeveloperGuide/sqs-visibility-timeout.html
if (!event.body) {
  return response(400, {
    status: "error",
    error: "Provide a JSON body",
  })
}

// ...

return response(200, {
  status: "success",
  array,
  arrayId,
})
}
// src/timesTwo.ts
133 https://github.com/Swizec/serverlesshandbook.dev/blob/master/
examples/serverless-data-pipeline-example/src/utils.ts#L11
AWS and SQS call our lambda and keep retrying when something
goes wrong. Perfect to let you re-deploy when there’s a bug.
await Promise.all(
  uniqueArrayIds.map((arrayId) =>
    sendSQSMessage(process.env.reduceQueueURL!, arrayId)
  )
)

return true
}
Accept an event from SQS, parse JSON body, do the work, store
intermediary results, trigger reduce step for each input.
// src/timesTwo.ts
await Promise.all(
uniqueArrayIds.map((arrayId) =>
sendSQSMessage(process.env.reduceQueueURL!, arrayId)
)
)
We use an ES6 Set to get a list of unique array ids from our input
message. You never know what gets jumbled up on the queue and
you might receive multiple inputs in parallel.
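Stripped of the SQS machinery, the dedupe step looks something like this. The record shape is simplified to the one field we read, and the function name is mine:

```typescript
// A sketch of deduping arrayIds from an SQS batch.
// Real SQSRecord objects have more fields; we only need body.
type SQSRecordLike = { body: string }

function uniqueArrayIds(records: SQSRecordLike[]): string[] {
  const ids = records.map(
    (record) => JSON.parse(record.body).arrayId as string
  )
  // Set drops duplicates that got batched together on the queue
  return [...new Set(ids)]
}
```

Sets preserve insertion order in JavaScript, so the ids come out in the order they first appeared.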
Combining intermediary steps into the final result is the most com-
plex part of our example.
You could combine 2 elements at a time and run the reduce step in
parallel.
1. Take 2 elements
2. Combine
3. Write the new element
4. Delete the 2 originals
Like this:
{
  arrayId: // ...
  packetId: // ...
  packetValue: 2,
  packetContains: 1,
  arrayLength: 10
}

{
  arrayId: // ...
  packetId: // ...
  packetValue: 4,
  packetContains: 1,
  arrayLength: 10
}
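The pairwise combine itself is pure arithmetic on those packet fields. A sketch, following the field names above:

```typescript
// A sketch of the pairwise combine step.
// Field names follow the packet objects above.
type Packet = {
  arrayId: string
  packetValue: number
  packetContains: number
  arrayLength: number
}

function combine(a: Packet, b: Packet): Packet {
  return {
    arrayId: a.arrayId,
    // sum the values
    packetValue: a.packetValue + b.packetValue,
    // track how many original elements this packet now covers
    packetContains: a.packetContains + b.packetContains,
    arrayLength: a.arrayLength,
  }
}
```

When `packetContains` reaches `arrayLength`, the reduction is done and you can save the final sum.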
// src/reduce.ts
export const handler = async (event: SQSEvent) => {
  // grab messages from queue
  // depending on batchSize there could be multiple
  let arrayIds: string[] = event.Records.map((record: SQSRecord) =>
    JSON.parse(record.body)
  )
Grab arrayIds from the SQS event and wait until every
reduceArray call is done.
// src/reduce.ts
if (packets.length > 0) {
  // sum packets together
  const sum = packets.reduce(
    (sum: number, packet: Packet) => sum + packet.packetValue,
    0
  )
// are we done?
if (newPacket.packetContains >= newPacket.arrayLength) {
  // done, save sum to final table
  await db.updateItem({
    TableName: process.env.SUMS_TABLE!,
    Key: {
      arrayId,
    },
    UpdateExpression: "SET resultSum = :resultSum",
    ExpressionAttributeValues: {
      ":resultSum": sum,
    },
  })
} else {
  // not done, trigger another reduce step
  await sendSQSMessage(process.env.reduceQueueURL!, arrayId)
}
}
}
134 We use a Limit: 2 argument to limit how much of the table we scan
through. Don’t need more than 2, don’t read more than 2. Keeps everything
snappy.
Conclusion
Lambda processing pipelines are a powerful tool that can process
large amounts of data in near real-time. I have yet to find a way to
swamp one of these in production.
What about the bad errors? And how do you debug code you can’t
see?
Observability
Observability is the art of understanding the internal state of a
system based on its outputs. It’s a continuous process.
135 /robust-backend-design
136 /serverless-architecture-principles
Always.
When you have 10 users, eh I’d focus on getting users. When you
have 100 users, eh they’ll tell you when there’s a bug.
You’ll see stranger and stranger bugs the more users you have.
A 1-in-1000 bug happens every day when you have 1000 users.
At Google scale, tiny impossible-to-reproduce bugs happen every
minute.
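The arithmetic behind that claim is simple enough to write down:

```typescript
// Expected number of times a 1-in-N bug fires per day,
// given your daily request volume. A back-of-the-envelope helper.
function expectedDailyHits(dailyRequests: number, oneInN: number): number {
  return dailyRequests / oneInN
}

// 1000 requests/day with a 1-in-1000 bug: about once a day
```

Scale the requests up a few orders of magnitude and "impossible" bugs become routine.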
What to measure
Deciding what to measure is an art. You’ll get it wrong.
You realize the 20% that are useful don’t have enough info. Despite
your best efforts, you can’t be certain what happened.
Adjust what you log, add the info you wish you had, remove the info
you didn’t need. Next time will be better.
You leave them behind so you can later trace a path through the
system. How did this user get into that state? Are we seeing bot-
tlenecks? Did event B that always comes after event A suddenly
stop coming?
At the least, you’ll want to measure 3 metrics for each part of your
system:
When to alarm
Metrics help when you look at them, logs help when you’re solving
a problem. Alarms come to you.
You’ll want to set alarms for high error and failure rates (depends
what you consider high) and anomalies in throughput. When a
100/hour event drops to zero, something’s wrong.
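An alarm rule like that is a few lines of code. Here’s a sketch; the thresholds are assumptions you’d tune per metric, not universal constants:

```typescript
// A sketch of a throughput-anomaly check.
// Thresholds are illustrative; tune them per metric.
function shouldAlarm(countLastHour: number, hourlyBaseline: number): boolean {
  // dead silence on a normally busy metric is as alarming as errors
  if (hourlyBaseline >= 10 && countLastHour === 0) return true
  // more than double the baseline looks like a spike
  return countLastHour > hourlyBaseline * 2
}
```

The `baseline >= 10` guard keeps quiet metrics from crying wolf every slow hour.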
Distributed logging
Logging is the core of your observability toolbox. Metrics and
traces build on top of logs.
In a serverless system, you can’t sign into a server to see the logs.
There’s no server and your system is distributed across many ser-
vices.
console.log("MetricName:value|type|sample_rate|tag1,tag2")
When you print in that format, you can connect a number of 3rd
party tools that give you power beyond the CloudWatch UI. Data-
Dog139 has been a great choice for me.
137 https://github.com/statsd/statsd
138 https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/
CloudWatch-Agent-custom-metrics-statsd.html
139 https://www.datadoghq.com
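Building that line is plain string formatting. A sketch that follows the format shown above; note that DogStatsD’s real wire format additionally prefixes the sample rate with `@` and tags with `#`:

```typescript
// A sketch of building the StatsD-style line format above.
// Follows the book's example shape: name:value|type|rate|tags
function statsdLine(
  name: string,
  value: number,
  type: "c" | "g" | "ms", // counter, gauge, timing
  sampleRate = 1,
  tags: string[] = []
): string {
  return `${name}:${value}|${type}|${sampleRate}|${tags.join(",")}`
}
```

Print one of these per metric and let the log shipper do the rest.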
You can build detailed dashboards for specific parts of your system.
That comes when your project grows.
You’ve built a pipeline that edits user data and want to make sure
it works. Test in production?
A word of caution: It’s easy to fill your database with crappy data.
Try to start production clean.
The bigger your system, the trickier this gets. You need to host a
database, run queues, caching layers, etc.
You can get close with the LocalStack plugin for the Serverless
Framework140 . But only production is like production.
0. localhost
1. development
2. staging / QA
3. production
You build and test on localhost. Get fast iteration and reasonable
certainty that your code works.
140 https://www.serverless.com/plugins/serverless-localstack
The development environment helps you test your code with oth-
ers’ work. You can show off to a friend, coworker, or product man-
ager for early feedback.
141 https://www.serverless.com/framework/docs/providers/aws/
cli-reference/invoke-local/
# serverless.yml
service: my-service
provider:
  name: aws
  stage: dev
Deploy with sls deploy and that creates or updates the dev
stage.
# serverless.yml
resources:
  Resources:
    TimesTwoQueue:
Dynamic stages
# serverless.yml
provider:
  name: aws
  # use stage option, dev by default
  stage: ${opt:stage, "dev"}
Deploy previews
The 3 stage split starts breaking down around the 6 to 7 engineers
mark. More if your projects are small, less if they’re big.
Bob meanwhile is fixing bugs and keeping the lights on. He needs
to merge his work into development, staging, and production ev-
ery day.
There are 2 solutions:
1. Deploy previews
2. Feature flags
142 /handling-secrets
You get an isolated environment with all the working bits and
pieces. Automate it with GitHub Actions to create a new stage for
every pull request.
That’s the model Netlify and Vercel champion. Every pull request
is automatically deployed on a new copy of production with every
update.
Trunk-based development
A popular approach in large teams is trunk-based development.
Anyone can change any code at any time. Tests help you prevent
accidents.
143 https://www.oreilly.com/library/view/software-engineering-at/
9781492082781/ch01.html
You can even split your project into sub-projects. Isolated areas
of concern that can move and deploy independently. Known as
microservices.
With metered pricing, serverless lets you pay directly for execu-
tion time, memory size, and storage space. No overhead. You don’t
use it, you don’t pay it.
144 /serverless-pros-cons
145 /databases
146 /lambda-pipelines
147 /robust-backend-design
148 https://aws.amazon.com/blogs/aws/new-for-aws-lambda-1ms-billing-granularity-adds-cost-sav
Faster code uses more memory to gain speed. Or you can beef up
CPU. Both make your code more expensive to run.
Saving storage space costs CPU time to compress and clean data.
Storage is cheap. Bandwidth to and from storage gets pricey.
You’ll find that computers are fast & cheap and humans are slow
& expensive. Stick to 1, add 2 for easier maintenance. 3 when
necessary.
149 https://aws.amazon.com/s3/pricing/
150 https://aws.amazon.com/lambda/pricing/
1. Latency
2. Throughput
151 https://en.wikipedia.org/wiki/Andrew_S._Tanenbaum
You can write the fastest code in the world, but if it takes 2 seconds
to get started you’ll have unhappy users.
• network time
• internal routing
• lambda wake up time
Throughput
Throughput measures how fast you work. When you get a request,
how long does it take?
• code performance
• input/output
152 /lambda-pipelines
153 https://en.wikipedia.org/wiki/Scalability
This type of scaling can get expensive. You need more resources
– faster CPU, more memory, better hardware, a GPU – and lots of
engineering effort to optimize your code.
Horizontal scaling
Horizontal scaling is the art of splitting work between computa-
tional resources.
But you have to find a balance. 6 cheap computers can cost more
than 3 expensive computers.
Achieving speed
Never optimize your code until it tells you where it hurts. Bottle-
necks are surprising and unpredictable. Measure.
But don’t kick the table with your pinky toe either. You already
know that hurts.
There is no one solution I can give you. Optimizing cold boot per-
formance takes work and understanding your software.
155 https://www.serverless.com/framework/docs/providers/aws/guide/
packaging#package-configuration
156 https://gist.github.com/hellerbarde/2843375
But you lose memory when your lambda goes to bed. That’s where
a cache can help. A service with persistent memory that stores pre-
computed values.
Use memoization when you call the same code multiple times per
request. Use cache when you need the same data across many
requests.
return memoized
}
Keep it simple.
Can you move it off the critical path158 ? Cache or memoize any
API and database responses?
158 https://en.wikipedia.org/wiki/Critical_path_method
Optimizing cost
I talked to an AWS billing expert and that was the take-away. Then
I killed the whole chapter on cost.
Execution time goes from 35s with 128MB to less than 3s with
1.5GB, while being 14% cheaper to run.
160 https://google.com
161 https://github.com/Swizec/serverlesshandbook.dev/tree/master/
examples/serverless-chrome-example
You can use the engine for browser automation – scraping, testing,
screenshots, etc. When you need to render a website, Chromium
is your friend.
This means:
1. install dependencies
This installs everything you need to both run and interact with
Chrome.
2. configure serverless.yml
# serverless.yml
service: serverless-chrome-example
provider:
  name: aws
  runtime: nodejs12.x
  stage: dev
package:
163 https://github.com/alixaxel/chrome-aws-lambda#versioning
Write code that interacts with a website like a person would. Any-
thing a person can do on the web, you can do with Puppeteer.
164 https://pptr.dev/
Build a scraper
Web scraping is fiddly but sounds simple in theory:
• load website
• find content
You adapt the core technique to each website you scrape and
there’s no telling when the HTML might change.
You might even find websites that actively fight against scraping.
Block bots, limit access speed, obfuscate HTML, …
Please play nice and don’t unleash thousands of parallel requests onto
unsuspecting websites.
https://www.youtube.com/watch?v=wRJTxahPIi4
1. more dependencies
Start with the serverless.yml and dependencies from earlier
(chrome-aws-lambda and puppeteer).
Add aws-lambda:
# serverless.yml
functions:
  scraper:
    handler: dist/scraper.handler
    memorySize: 2536
    timeout: 30
    events:
      - http:
          path: scraper
          method: GET
          cors: true
3. getChrome()
The getChrome method instantiates a new browser context. I like
to put this in a util file.
// src/util.ts
try {
  browser = await chrome.puppeteer.launch({
    args: chrome.args,
    defaultViewport: {
      width: 1920,
      height: 1080,
      isMobile: true,
      deviceScaleFactor: 2,
    },
    executablePath: await chrome.executablePath,
    headless: chrome.headless,
    ignoreHTTPSErrors: true,
  })
  return browser
}
4. a shared createHandler()
We’re building 2 pieces of code that share a lot of logic – scraping
and screenshots. Both need a browser, deal with errors, and parse
URL queries.
if (!search) {
  return {
    statusCode: 400,
    body: "Please provide a ?search= parameter",
  }
}

if (!browser) {
  return {
    statusCode: 500,
    body: "Error launching Chrome",
  }
}

try {
  // call the function that does the real work
  const response = await workFunction(browser, search)
  return response
} catch (err) {
  console.log(err)
  return {
    statusCode: 500,
    body: "Error scraping Google",
  }
}
}
5. scrapeGoogle()
if (!response.ok()) {
throw "Couldn't get response"
}
await page.goto(response.url())
return {
statusCode: 200,
body: JSON.stringify(links),
}
}
if (!response.ok()) {
throw "Couldn't get response"
}
await page.goto(response.url())
To scrape google, we type a search into the input field, then hit
submit and wait for the page to load.
return {
statusCode: 200,
body: JSON.stringify(links),
}
https://twitter.com/Swizec/status/1282446868950085632
Take screenshots
Taking screenshots is similar to scraping. Instead of parsing the
page, you call .screenshot() and get an image.
165 https://github.com/Swizec/serverlesshandbook.dev/tree/master/
examples/serverless-chrome-example
First, we tell API Gateway that it’s okay to serve binary data.
# serverless.yml
provider:
  name: aws
  runtime: nodejs12.x
  stage: dev
  apiGateway:
    binaryMediaTypes:
      - "*/*"

functions:
  screenshot:
    handler: dist/screenshot.handler
    memorySize: 2536
    timeout: 30
    events:
      - http:
          path: screenshot
          method: GET
          cors: true
3. screenshotGoogle()
We’re using similar machinery as before.
// src/screenshot.ts
if (!response.ok()) {
  throw "Couldn't get response"
}

await page.goto(response.url())

if (!element) {
  throw "Couldn't find results div"
}

if (!boundingBox) {
  throw "Couldn't measure size of results div"
}

await page.screenshot({
  path: imagePath,
  clip: boundingBox,
})

const data = fs.readFileSync(imagePath).toString("base64")

return {
  statusCode: 200,
  headers: {
    "Content-Type": "image/png",
  },
  body: data,
  isBase64Encoded: true,
}
}
Same code up to when we load the results page. Type a query, hit
submit, wait for reload.
if (!element) {
throw "Couldn't find results div"
}
if (!boundingBox) {
throw "Couldn't measure size of results div"
}
return {
statusCode: 200,
headers: {
"Content-Type": "image/png",
},
body: data,
isBase64Encoded: true,
}
3 and 4 are great because you can build a small website that ren-
ders a social card for your content and use this machinery to turn
it into an image.
Have fun
Isolated code that does one thing with no cruft. Runs on-demand,
consumes no resources when not in use, scales near infinitely. Per-
fection.
And it runs on a server where users can’t see the code. There’s
no right-click inspect, no JavaScript files downloaded, no user en-
vironment at all.
1. Hardcoded values
2. Dotenv files
3. Secrets manager
MY_SECRET_KEY="f3q20-98facv87432q4"
Hardcoded secrets are the easiest to use and the least secure.
Code runs on the server and users won’t be able to steal your
secrets.
But anyone with access to your code can see the secrets.
Share on GitHub and that includes the whole world. Bots always
scrape GitHub looking for strings that look like keys. Your secret
will be stolen.
AWS is paranoid enough that their own bot looks for secret keys.
If they find yours, your AWS account gets locked. Ask me how I
know166
166 https://swizec.com/blog/what-happens-when-you-push-aws-credentials-to-github/
Dotenv files
A step up from hardcoded keys are dotenv files – .env. Configura-
tion files in your codebase that hold secrets.
# .env
MY_SECRET_KEY=f3q20-98facv87432q4
MY_API_URL=https://example.com
You should not store these in version control. That’s where the
increased security comes from.
Secrets manager
The most secure way to handle secrets is using a secrets manager.
You can even make your secrets double blind. Nobody needs to
know their values.
Engineers can’t see secrets in the code, they’re not saved on any-
one’s laptop, you can’t steal them from the server, and with the
right configuration, secrets change every N days.
167 https://console.aws.amazon.com/secretsmanager/home
# serverless.yml
provider:
  # ...
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "secretsmanager:GetSecretValue"
      Resource: "arn:aws:secretsmanager:${self:provider.region}:*"
This instantiates a new SSM client, gets your secret value, and
returns JSON. Parse the JSON, get secrets.
This is an API call that might fail168. Make sure to handle errors
and fail gracefully if you can’t get the secret.
168 https://serverlesshandbook.dev/robust-backend-design
Conclusion
Choose the strategy that fits your use-case and safety needs.
Authentication.
It’s easy in theory: Save an identifier on the client, send with every
request, check against a stored value on the server.
Where do you save the identifier? How does the client get it? What
authentication scheme do you use? What goes on the server? How
do you keep it secure? What if you need to run authenticated code
without the user?
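The core of that loop is a signed identifier the server can verify. Here’s a hand-rolled sketch of the idea using an HMAC; in production you’d reach for a real JWT library rather than this, and the secret would come from a secrets manager:

```typescript
import { createHmac } from "node:crypto"

// A sketch of signed-identifier auth, NOT production code.
// Assumption: SECRET comes from a secrets manager in real life.
const SECRET = "example-secret"

function sign(userId: string): string {
  const signature = createHmac("sha256", SECRET).update(userId).digest("hex")
  // client stores and sends back "<userId>.<signature>"
  return `${userId}.${signature}`
}

function verify(token: string): string | null {
  const [userId, signature] = token.split(".")
  const expected = createHmac("sha256", SECRET).update(userId).digest("hex")
  // matching signature means the identifier wasn't tampered with
  return signature === expected ? userId : null
}
```

Tamper with the user id and the signature stops matching, so the server rejects the request. Real JWTs add expiry, claims, and timing-safe comparison on top of this idea.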
What is authentication
A typical authentication system deals with everything from user
identity, to access control, authorization, and keeping your system
secure.
Identity answers the “Who are you?” question. The most important
aspect of authentication systems. Can be as simple as an
honor-based input field.
Authorization answers the “Which parts of the system can you use?”
question. Two schemes are common: role-based and scope-based
authorization. They specify which users can do what.
Factors of authentication
169 https://en.wikipedia.org/wiki/Authentication#Authentication_factors
Credit card + PIN is 2-factor authentication. You own the card and
know the PIN.
The 2nd job after access control is authorization. What can this
user do?
Technically they’re the same – a user property. It’s like utility vs. se-
mantic classes in CSS. Debate until you’re blue in the face, then
pick what feels right :)
In practice you’ll see that roles get clunky and scopes are tedious.
Like my dayjob gave me permission to configure CloudFront, but
not to see what I’m doing.
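To make the contrast concrete, here's a hypothetical sketch of both schemes checking the same user property; the role and scope names are made up:

```typescript
// Hypothetical sketch: role-based vs scope-based authorization checks.
type User = { roles: string[]; scopes: string[] }

// role-based: coarse buckets of users — is this user an admin?
const canDeletePosts = (user: User) => user.roles.includes("admin")

// scope-based: fine-grained per-action permissions
const canEditPost = (user: User) => user.scopes.includes("posts:edit")

const editor: User = { roles: ["editor"], scopes: ["posts:edit"] }
```

Roles stay readable until you need exceptions; scopes stay precise until you're listing dozens per user. Hence the debate.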
You can test your implementation too. Change the Lambda base
URL
The API approach works great with modern JavaScript apps, mo-
bile clients, and other servers.
Always use HTTPS and remember: A JWT token on its own lets you
impersonate a user.
173 https://en.wikipedia.org/wiki/Cryptographic_hash_function
// src/util.ts
Without a salt, the string password turns into the same hash for
every app. Precomputed rainbow tables work like magic.
With a salt, the string password hashes uniquely to your app. At-
tackers need new rainbow tables, if they can find the salt.
Add the username and each hash is unique to your app and the user.
Creating new rainbow tables for every user is not worth it.
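A minimal sketch of the salt + username idea — not the book's actual hashPassword from src/util.ts, and assuming a SALT value from the environment:

```typescript
import { createHash } from "crypto"

// Sketch: salt + username make each hash unique to this app and this user.
// The fallback SALT string is illustrative — use real secrets management.
const SALT = process.env.SALT || "someRandomSecretString"

function hashPassword(username: string, password: string): string {
  return createHash("sha256")
    .update(`${SALT}:${username}:${password}`)
    .digest("hex")
}
```

In production you'd reach for a deliberately slow hash like bcrypt or scrypt rather than plain SHA-256; this sketch only shows the salting idea.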
174 https://en.wikipedia.org/wiki/Rainbow_table
Environment variables
# serverless.yml
service: serverless-auth-example
provider:
  # ...
  environment:
    SALT: someRandomSecretString_pleaseUseProperSecrets:)
    JWT_SECRET: useRealSecretsManagementPlease
175 /handling-secrets
auth.login function
Users need to be able to login – send an API request with their
username and password to get a JWT token. We’ll keep it similar
to the REST API chapter176 .
# serverless.yml
functions:
  login:
    handler: dist/auth.login
    events:
      - http:
          path: login
          method: POST
          cors: true
176 /serverless-rest-api
if (!user) {
  // user was not found, create
  user = await createUser(username, password)
} else {
  // check credentials
  if (hashPassword(username, password) !== user.password) {
    // 🚨
    return response(401, {
      status: "error",
      error: "Bad username/password combination",
    })
  }
}

return response(200, {
  user: omit(user, "password"),
  token,
})
}
Then we sign a JWT token with our secret and send it back. Make
sure you don’t send sensitive data like passwords to the client.
Even hashed.
return response(200, {
user: omit(user, "password"),
token,
})
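The chapter signs tokens with the jsonwebtoken library. For intuition, an HS256 signature can be sketched by hand like this — illustrative only, use the library in real code:

```typescript
import { createHmac } from "crypto"

// Illustrative sketch of what jwt.sign does for HS256: base64url-encode
// the header and payload, then sign both with an HMAC over the secret.
const base64url = (s: string) => Buffer.from(s).toString("base64url")

function signJwt(payload: object, secret: string): string {
  const header = base64url(JSON.stringify({ alg: "HS256", typ: "JWT" }))
  const body = base64url(JSON.stringify(payload))
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url")
  return `${header}.${body}.${signature}`
}
```

Anyone holding the secret can mint valid tokens — which is why a JWT on its own lets you impersonate a user.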
auth.verify function
For authentication to work across page reloads, you have to store
the JWT token. These can expire or get revoked by the server.
Clients use the verify API to validate a session every time they
initialize. A page reload on the web.
When you know the session is valid, you treat the user as logged in.
Ask for a username/password otherwise.
177 https://github.com/auth0/node-jsonwebtoken
178 https://codesandbox.io/s/serverless-auth-example-9ipfb
# serverless.yml
functions:
  # ...
  verify:
    handler: dist/auth.verify
    events:
      - http:
          path: verify
          method: POST
          cors: true
// src/auth.ts
// handler shape assumed — reads the token from the POST body
export const verify = async (event: APIGatewayEvent) => {
  const { token } = JSON.parse(event.body || "{}")

  try {
    jwt.verify(token, process.env.JWT_SECRET!)
    return response(200, { status: "valid" })
  } catch (err) {
    return response(401, err)
  }
}
private.hello function
This is where it gets fun – verifying authentication for private APIs.
# serverless.yml
functions:
179 https://github.com/auth0/node-jsonwebtoken
// src/private.ts
if (user) {
  return response(200, {
    message: `Hello ${user.username}`,
  })
} else {
  return response(401, {
    status: "error",
    error: "This is a private resource",
  })
}
The checkAuth method takes our request, verifies its JWT token,
and returns the payload. A user in our case.
// src/util.ts
if (bearer) {
  try {
    const decoded = jwt.verify(
      // Bearer prefix from Authorization header
      bearer.replace(/^Bearer /, ""),
      process.env.JWT_SECRET!
    )
    return decoded
  } catch (err) {
    // invalid token — treat as not authenticated (assumed handling)
    return null
  }
}
verify decodes the token for us, which means we can see the
user’s username without a database query.
And if a big provider gets hacked, your app is one among thousands.
Feels less bad eh?
That means a login dance between your server and the auth
provider. You’ll need to send your own JWT token to ask if the
user’s token is valid.
The course platform runs in the browser, which means the user <>
lambda connection wasn’t necessary. A benefit of using a provider.
if (ping.product_permalink in PRODUCTS) {
  // create user from Gumroad data
  const user = await upsertUser(ping);

  if (user) {
    // initialize Auth0 server client
    const auth0 = await getAuth0Client();
    const roleId = PRODUCTS[ping.product_permalink];
Auth0 Client
return auth0;
}
// find user
const users = await auth0.getUsersByEmail(purchaseData.email);
Libraries help :)
The simplest way to store data is a flat file database180 . Even if you
call it “just organized files”.
Serverless systems don’t have drives to store files so these flat file
databases aren’t a popular choice. You’d have to use S3 or similar,
which negates some of the built-in advantages.
180 https://en.wikipedia.org/wiki/Flat-file_database
181 https://en.wikipedia.org/wiki/Paging
182 https://en.wikipedia.org/wiki/File_system
183 https://en.wikipedia.org/wiki/Direct_memory_access
To add a line at the beginning of a file, you have to move the whole
thing. To change a line in the middle, you have to update everything
that comes after.
You have to read all your files to compare, analyze, and search. If
you didn’t think of a use-case beforehand, you’re left with a slow
search through everything.
To find images from a certain date, you can search through your
files and look at the metadata they contain.
Gives you quick access to specific dates and fast scans across many.
Due to its low overhead, flat file storage is a great choice when
you’re looking for speed and simplicity.
The most common use cases for flat files are logs, large datasets,
and large binary files (image, video, etc).
You often append logs and rarely read them. You store large
datasets as structured files for easy sharing. You save images and
rarely update, and they contain orders of magnitude more binary
data than structured metadata.
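The append-mostly log pattern above can be sketched with Node's fs module. On serverless you'd write to S3 or a log service instead; this local sketch (file path illustrative) just shows the access pattern:

```typescript
import { appendFileSync, readFileSync } from "fs"
import { tmpdir } from "os"
import { join } from "path"

// One JSON line per event — appending never rewrites existing bytes,
// which is the flat-file sweet spot.
const logFile = join(tmpdir(), `app-${process.pid}.log`)

function logEvent(event: Record<string, unknown>) {
  appendFileSync(logFile, JSON.stringify({ at: Date.now(), ...event }) + "\n")
}

function readEvents() {
  // reading means scanning the whole file — fine when reads are rare
  return readFileSync(logFile, "utf8")
    .trim()
    .split("\n")
    .map((line) => JSON.parse(line))
}
```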
Relational databases –
RDBMS
186 https://en.wikipedia.org/wiki/PostgreSQL
Disadvantages of relational
databases
187 https://en.wikipedia.org/wiki/Database_index
A database server
Since you don’t want to run your own servers (the whole point of
serverless), you’ll need a provider. 3rd party services are okay if
your serverless provider doesn’t offer their own.
188 https://aws.amazon.com/rds/
189 https://aws.amazon.com/rds/aurora/
https://twitter.com/Swizec/status/1210371195889049600
Let’s say you’re building a blog. You have authors and posts.
190 https://en.wikipedia.org/wiki/Domain_model
191 https://en.wikipedia.org/wiki/Object_model
192 https://en.wikipedia.org/wiki/Foreign_key
193 https://en.wikipedia.org/wiki/SQL
Having the posts table “belong to” (point at) the authors table
means each author can have multiple posts.
Where life gets real tricky real fast is selecting data from multiple
tables. You have to use SQL joins196 , which are based on set arith-
metic.
If you want a list of post titles and dates with each author:
This is called an inner join197 where you take a cartesian join com-
bining every row in authors with every row in posts and filter
away the non-matches.
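That set arithmetic can be sketched in plain TypeScript — combine every author with every post, keep the pairs where the foreign key matches. Sample data is made up:

```typescript
// In-memory inner join: cartesian-combine authors and posts,
// keep pairs where post.authorId matches author.id.
const authors = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
]
const posts = [
  { authorId: 1, title: "On Engines", date: "1843-01-01" },
  { authorId: 1, title: "Notes", date: "1844-01-01" },
  { authorId: 2, title: "On Compilers", date: "1952-01-01" },
]

const joined = authors.flatMap((author) =>
  posts
    .filter((post) => post.authorId === author.id)
    .map((post) => ({ author: author.name, title: post.title, date: post.date }))
)
```

A database does the same filtering with `SELECT ... FROM posts JOIN authors ON posts.author_id = authors.id`, just backed by indexes instead of a full scan.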
Those are some basics that cover most use-cases. It takes some
practice to use SQL effectively so practice away :)
196 https://en.wikipedia.org/wiki/Join_(SQL)
197 https://en.wikipedia.org/wiki/Join_(SQL)#Inner_join
This makes relational databases the perfect choice for most appli-
cations. You wouldn’t use them to store files, but should consider
it for metadata about those files. They’re also not a great choice
for fast append-only writes like logs or tweets.
I wouldn’t worry about number 5. If you ever reach the scale where
your data doesn’t fit in a single database, you’ll have a team to solve
the problem for you :)
199 https://en.wikipedia.org/wiki/NoSQL
Flavors of NoSQL
You can classify NoSQL databases in 4 broad categories:
You pay for that 3 years down the line with inconsistent data.
Learned my lesson
Haven’t had a good excuse to use one yet, but I’ve heard Neo4j205
is great.
200 https://en.wikipedia.org/wiki/Redis
201 https://en.wikipedia.org/wiki/Memcached
202 https://en.wikipedia.org/wiki/MongoDB
203 https://en.wikipedia.org/wiki/Amazon_DynamoDB
204 https://en.wikipedia.org/wiki/Bigtable
205 https://en.wikipedia.org/wiki/Neo4j
The simplicity of key:value stores gives you speed at the cost of not
being able to store complex data.
206 https://en.wikipedia.org/wiki/Eventual_consistency
# serverless.yml
provider:
  environment:
    # ...
resources:
  Resources:
    DataTable:
      # ...

  return result.Attributes;
};
207 https://en.wikipedia.org/wiki/Universally_unique_identifier
To update this data you have to create a similar method that gets
your dataId as a parameter and uses it to run an updateItem
query. Make sure you aren’t always creating a new identifier.
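A sketch of the shape such an update takes — just building the DynamoDB updateItem parameters, with an illustrative table name and helper (`buildUpdateParams` is not from the book):

```typescript
// Build updateItem params that reuse an existing dataId
// instead of minting a fresh identifier.
function buildUpdateParams(dataId: string, values: Record<string, string>) {
  const names: Record<string, string> = {}
  const attributeValues: Record<string, { S: string }> = {}
  const sets: string[] = []

  // map each field into DynamoDB's UpdateExpression placeholder syntax
  Object.entries(values).forEach(([key, value], i) => {
    names[`#key${i}`] = key
    attributeValues[`:val${i}`] = { S: value }
    sets.push(`#key${i} = :val${i}`)
  })

  return {
    TableName: "DataTable",
    // the existing identifier — don't generate a new uuid here
    Key: { dataId: { S: dataId } },
    UpdateExpression: `SET ${sets.join(", ")}`,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: attributeValues,
    ReturnValues: "ALL_NEW",
  }
}
```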
208 https://github.com/Swizec/markdownlandingpage.com/blob/master/
server/src/dynamodb.ts#L35
return result.Items;
};
But it lets you scan through a table looking for entries that fit
your criteria.
You can use the getItem approach when you know exactly what
you’re looking for.
209 https://github.com/Swizec/markdownlandingpage.com/blob/master/
server/src/dynamodb.ts#L73
if (!result.Item) {
  return {};
}

return result.Item;
};
Blockchain
That means you can always verify your data. Follow the chain and
validate every hash. Once you reach the initial block, you know
your chain is valid.
As a result you don’t need a central authority to tell you the current
state of your data. Clients can independently decide if the data
they have is valid. Often by assuming the longest valid chain is
correct.
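That verification walk fits in a few lines. A toy hash chain — not a real blockchain (no proof-of-work, no consensus), just the hash-linking idea:

```typescript
import { createHash } from "crypto"

type Block = { data: string; prevHash: string; hash: string }

// each block's hash covers its data AND the previous block's hash
const hashBlock = (data: string, prevHash: string) =>
  createHash("sha256").update(prevHash + data).digest("hex")

function addBlock(chain: Block[], data: string): Block[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "genesis"
  return [...chain, { data, prevHash, hash: hashBlock(data, prevHash) }]
}

// follow the chain and validate every hash back to the initial block
function isValidChain(chain: Block[]): boolean {
  return chain.every((block, i) => {
    const prevHash = i === 0 ? "genesis" : chain[i - 1].hash
    return (
      block.prevHash === prevHash &&
      block.hash === hashBlock(block.data, prevHash)
    )
  })
}
```

Tampering with any block changes its hash, which breaks every link after it — that's why clients can verify the data without trusting a central authority.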
I wouldn’t use the blockchain to store real data just yet, but it’s
an exciting space to watch. Blockstack213 is a great way to get
started.
210 https://en.wikipedia.org/wiki/Git
211 https://en.wikipedia.org/wiki/Blockchain
212 https://en.wikipedia.org/wiki/Merkle_tree
213 https://blockstack.org/