
AWS Lambda at Scale - ITBA

The document discusses strategies for optimizing AWS Lambda functions to handle high loads at scale. It describes three key performance indicators (latency, throughput, and cost per transaction) and emphasizes minimizing latency. Various techniques are presented for reducing cold start durations and optimizing handler logic, such as increasing RAM allocation, moving work outside the handler, employing efficient algorithms, and using multi-threading.

Uploaded by

Alexis Cuadrado
Copyright © All Rights Reserved

AWS LAMBDA @ SCALE

DESIGNING FOR HIGH LOAD


BY ALEXIS CUADRADO
LOOK, IT SCALES AUTOMATICALLY!

Also on the slide: MANAGED SERVICE, ACCOUNT CONCURRENCY, COLD STARTS, EXECUTION, THROTTLING, COSTS, BURST CONCURRENCY.

LAMBDA @ SCALE ALEXIS CUADRADO
A SCHEDULED JOB
Diagram (EventBridge, Lambda Function, IAM, SNS, IAM User):
1. EventBridge triggers the Lambda function daily.
2. The function checks the age of the access keys in IAM.
3. It notifies SNS of any noncompliance.
4. SNS sends an email to the IAM user.
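The deck shows this job only as a diagram. Below is a minimal sketch of such a function, under loudly stated assumptions: a 90-day key-age threshold, and an SNS topic ARN read from a hypothetical TOPIC_ARN environment variable. The pure check is separated from the boto3 calls so it can be tested without AWS access.

```python
"""Sketch of the daily access-key age check (threshold and env var are assumptions)."""
import os
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)  # assumption: compliance threshold


def find_stale_keys(keys, now, max_age=MAX_KEY_AGE):
    """Return (user, key_id) pairs whose CreateDate is older than max_age."""
    return [
        (k["UserName"], k["AccessKeyId"])
        for k in keys
        if now - k["CreateDate"] > max_age
    ]


def handler(event, context):
    import boto3  # imported lazily so find_stale_keys can be tested without AWS

    iam = boto3.client("iam")
    sns = boto3.client("sns")
    keys = []
    # Step 2: walk every IAM user and collect their access key metadata
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            resp = iam.list_access_keys(UserName=user["UserName"])
            keys.extend(resp["AccessKeyMetadata"])
    stale = find_stale_keys(keys, datetime.now(timezone.utc))
    # Steps 3 and 4: publish to SNS, which emails the topic's subscribers
    if stale:
        sns.publish(
            TopicArn=os.environ["TOPIC_ARN"],  # hypothetical env var
            Subject="Access key noncompliance",
            Message="\n".join(f"{user}: {key}" for user, key in stale),
        )
    return {"stale_keys": len(stale)}
```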


A CRITICAL LAMBDA-BASED SERVICE
Diagram: Internet → API Gateway → Lambda Function (Booking service), which calls other functions and downstream services.


OH, BUT WHEN LOAD HITS...
Same diagram, now receiving 1M+ requests per minute (RPM) from the Internet.


I HAVE TWO SIDES
Chart: requests over time, swinging between two moods:
“I AM GOD, THE WHOLE UNIVERSE”
“I AM NOTHING, WITH NO CONTROL”


THE CHALLENGE OF SCALE
GETTING OUR LAMBDA FUNCTIONS TO PERFORM AS REQUIRED
THREE PERFORMANCE INDICATORS

INDICATOR                  UNIT                             GOAL
Latency                    seconds or milliseconds          Minimize
Throughput                 transactions per second (TPS)    Maximize (meet demand)
Cost per transaction ($)   USD                              Minimize

TIP: Define proper Service Level Objectives (SLOs)
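Latency SLOs are usually stated as a percentile, for example a hypothetical "p99 latency of at most 200 ms". A minimal sketch of checking measured latencies against such an objective, using the nearest-rank percentile definition (one of several common definitions):

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms, p, threshold_ms):
    """True if the p-th percentile latency is within the SLO threshold."""
    return percentile(latencies_ms, p) <= threshold_ms
```

With 995 requests at 120 ms and 5 at 900 ms, a p99 ≤ 200 ms objective is met, while a p100 (maximum) objective is not.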
MINIMIZING LATENCY
MAKE EVERY MILLISECOND COUNT

ANATOMY OF A LAMBDA INVOCATION
Layers, from top: STATIC CODE and HANDLER CODE, RUNTIME, EXECUTION ENVIRONMENT, COMPUTE SUBSTRATE.
Steps:
1. Download Code
2. Start Environment
3. Bootstrap Runtime
4. Run Static Code
5. Run Handler


LONG LIVE THE EXECUTION ENVIRONMENT!
Timeline: one execution environment is initialized once, then serves invocations 1, 2 and 3 one after another, staying available between them.
Legend: invocation, initializing, executing, available.
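Because an environment outlives a single invocation, module-level state survives between the invocations it serves. A minimal sketch with a hypothetical handler that reports this reuse; note that the counter is per environment, so two environments each start from 1.

```python
import time

# Module scope: runs once per execution environment, while it is initializing.
_ENV_STARTED = time.monotonic()
_invocation_count = 0


def handler(event, context):
    """Hypothetical handler that reports how often its environment was reused."""
    global _invocation_count
    _invocation_count += 1  # survives between invocations in the same environment
    return {
        "first_in_env": _invocation_count == 1,
        "invocations_in_env": _invocation_count,
        "env_age_s": round(time.monotonic() - _ENV_STARTED, 3),
    }
```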


STARTED ON THE COLD FOOT
Timeline: environment E1 serves invocations 1, 3 and 4. Invocation 2 arrives while E1 is still busy, so Lambda starts a second environment, E2, which pays a cold start before serving invocations 2 and 5.
Legend: invocation, initializing, executing, available.
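The rule in this timeline (reuse an idle environment if there is one, otherwise start a new one and pay the cold start) can be simulated in a few lines. This is a simplified model, not Lambda's actual scheduler: it assumes the first idle environment is always chosen and that initialization takes a fixed init_ms.

```python
def assign_environments(requests, init_ms):
    """Simulate routing requests to execution environments.

    requests: list of (arrival_ms, duration_ms), sorted by arrival.
    A request reuses the first idle environment (warm start); if none is idle,
    a new environment is created and spends init_ms initializing (cold start).
    Returns one (env_index, cold) tuple per request.
    """
    free_at = []  # time at which each environment becomes available again
    result = []
    for arrival, duration in requests:
        idle = [i for i, t in enumerate(free_at) if t <= arrival]
        if idle:
            env, cold, start = idle[0], False, arrival
        else:
            env, cold = len(free_at), True
            free_at.append(0)
            start = arrival + init_ms  # the handler runs only after initialization
        free_at[env] = start + duration
        result.append((env, cold))
    return result
```

With arrivals at 0, 50, 400, 600 and 650 ms (100 ms each, 200 ms init), environment 0 serves requests 1, 3 and 4 and environment 1 serves requests 2 and 5, matching the E1/E2 pattern above.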


DISSECTING LATENCY
COLD START (INITIALIZATION): steps 1 to 4 (Download Code, Start Environment, Bootstrap Runtime, Run Static Code)
EXECUTION: step 5 (Run Handler)

A warm environment skips the cold start:
WARM LATENCY = EXECUTION
COLD LATENCY = COLD START + EXECUTION
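Combining the two formulas gives the mean latency when some fraction of requests are cold: execution + fraction × cold start. A small sketch (the numbers in the test are made up):

```python
def expected_latency_ms(execution_ms, cold_start_ms, cold_fraction):
    """Mean latency when a fraction of requests hit a cold start.

    WARM LATENCY = EXECUTION and COLD LATENCY = COLD START + EXECUTION, so
    mean = f * (cold_start + execution) + (1 - f) * execution
         = execution + f * cold_start
    """
    if not 0 <= cold_fraction <= 1:
        raise ValueError("cold_fraction must be in [0, 1]")
    return execution_ms + cold_fraction * cold_start_ms
```

With 50 ms execution, 800 ms cold starts and 1% cold requests, the mean is only 58 ms. The few cold requests still take 850 ms, which is why a percentile-based SLO exposes cold starts that a mean hides.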


OPTIMIZE HANDLER LOGIC
A. MOVE WORK OUTSIDE HANDLER: reusable objects should be statically initialized.
B. AVOID ORCHESTRATION: use Step Functions instead of coordinating workflows inside the handler.
C. EMPLOY EFFICIENT ALGORITHMS: put those hard-won whiteboarding skills to use.
D. MULTI-THREADING: parallelize I/O operations (e.g. S3 downloads).
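A minimal sketch of tip D: a thread pool that overlaps I/O-bound calls. The S3 part is left as commented usage because it needs AWS access; the bucket name and event shape are hypothetical. The boto3 client is created once at module scope, which also follows tip A.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(keys, fetch_one, max_workers=8):
    """Run I/O-bound calls concurrently; results come back in the order of keys.

    Threads help here because each call spends most of its time waiting on the
    network, not on the CPU.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, keys))


# Hypothetical S3 usage (bucket name and event shape are assumptions):
# import boto3
# s3 = boto3.client("s3")  # created once, outside the handler (tip A)
#
# def download(key):
#     return s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()
#
# def handler(event, context):
#     bodies = fetch_all(event["keys"], download)
#     return {"total_bytes": sum(len(b) for b in bodies)}
```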


TURN UP THE RAM!
Example Python function: return all prime numbers between 0 and 10K.
More MEMORY also means more vCPU and more NETWORK THROUGHPUT.

MEMORY (MB)   EXECUTION DURATION (MS)
128           170
256           80
512           40
1024          20
1536          17   (lowest latency)
3008          17

Beware of negative returns beyond 1536 MB: more memory (and cost) no longer reduces latency.
TIP: Enable Lambda Insights for profiling.
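The deck does not show the function's code, so this is a hypothetical version of the benchmark workload using a Sieve of Eratosthenes (tip C in action); the "limit" event field is an assumption. It is CPU-bound and single-threaded. AWS documents that a function gets the equivalent of one vCPU at 1,769 MB, which is consistent with the flat 17 ms from 1536 MB upward.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p with 0 <= p <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Cross out every multiple of n, starting at n * n
            is_prime[n * n :: n] = bytes(len(range(n * n, limit + 1, n)))
    return [n for n, flag in enumerate(is_prime) if flag]


def handler(event, context):
    limit = int(event.get("limit", 10_000))  # hypothetical event field
    primes = primes_up_to(limit)
    return {"count": len(primes), "largest": primes[-1] if primes else None}
```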


CHOOSE YOUR DESTINY
I mean... your runtime.
Chart: runtimes (e.g. .NET) compared along two axes, FASTER INITIALIZATION and FASTER EXECUTION.
TIP: Run your own benchmarks (if performance is critical to you).
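A minimal harness for that tip, measuring warm, in-process execution of a handler or any function. It does not capture cold starts; those have to be measured on Lambda itself, where the REPORT log line of a cold invocation includes an Init Duration.

```python
import statistics
import time


def benchmark(fn, *args, runs=50, warmup=3):
    """Call fn repeatedly and return (median_ms, max_ms) over the timed runs."""
    for _ in range(warmup):
        fn(*args)  # untimed calls so one-off setup costs don't skew the result
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples), max(samples)
```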


STATIC INITIALIZATION DONE RIGHT
+ 120 ms

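A sketch of static initialization in Python, with hypothetical names throughout. Objects every request needs are built eagerly at module scope, so they are created once per environment during "Run Static Code". Objects only some requests need are built lazily on first use and cached, so invocations that never touch them don't pay for them.

```python
import functools

INIT_CALLS = {"settings": 0, "report_template": 0}  # counters for illustration only


def _load_settings():
    """Hypothetical expensive setup (SDK clients, config, lookup tables)."""
    INIT_CALLS["settings"] += 1
    return {"table": "bookings"}


# Eager: runs once per execution environment, not once per invocation.
SETTINGS = _load_settings()


@functools.lru_cache(maxsize=None)
def report_template():
    """Lazy: built on the first request that needs it, then cached."""
    INIT_CALLS["report_template"] += 1
    return "Booking {id} stored in {table}"


def handler(event, context):
    if event.get("want_report"):
        return report_template().format(id=event["id"], table=SETTINGS["table"])
    return {"table": SETTINGS["table"]}
```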


PACKAGE SIZE MATTERS
Figure: the same cold start with a 50 MB and a 20 MB deployment package. The smaller package spends less time in step 1 (Download Deployment Package), so Start Environment, Bootstrap Runtime, Run Static Code and Run Handler all begin sooner.

TIPS:
• Audit and remove unused dependencies
• Use minifiers (e.g. node-minify)
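For the first tip, a quick way to see which dependencies dominate is to rank the top-level entries of the unpacked package directory (the folder you zip for deployment) by size. This is a minimal sketch; symlinks are skipped so that nothing is counted twice.

```python
import os


def entry_size(path):
    """Bytes used by a file, or by all regular files under a directory."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.path.getsize(path) if os.path.isfile(path) else 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(package_dir, top=10):
    """Top-level entries of an unpacked deployment package, largest first."""
    sizes = [
        (name, entry_size(os.path.join(package_dir, name)))
        for name in os.listdir(package_dir)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


# Usage (hypothetical path):
# for name, size in largest_entries("./build"):
#     print(f"{size / 1_000_000:8.2f} MB  {name}")
```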


ON COLD START FREQUENCY
Isn’t that what we’re all asking in our own lives?
HOW CAN WE GET MORE OF THIS? (WARM)
... AND LESS OF THIS? (COLD)


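CONCURRENCY
Timeline: E1 serves invocations 1 and 2; invocations 3 and 4 overlap with it and run on new environments E2 and E3. Concurrency, the number of environments busy at once, is 1 at t₀ and 3 at t₂.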
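Concurrency at any instant is just the number of invocations in flight, and peak concurrency can be computed with a sweep over start and end events. A minimal sketch over (start, end) intervals in any consistent time unit:

```python
def concurrency_at(intervals, t):
    """Number of invocations running at time t (start inclusive, end exclusive)."""
    return sum(1 for start, end in intervals if start <= t < end)


def peak_concurrency(intervals):
    """Maximum number of overlapping invocations, via a sweep over events."""
    events = [(start, 1) for start, _ in intervals] + [(end, -1) for _, end in intervals]
    events.sort()  # at equal times, -1 (an end) sorts before +1 (a start)
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```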


PROVISION ALL THE CONCURRENCY!
Timeline with provisioned concurrency = 3: at t₀ the concurrency is provisioned, so E1, E2 and E3 are initialized ahead of time. When requests start arriving at t₁, E1 serves invocations 1 and 4, E2 serves 2 and 5, and E3 serves 3. NO COLD STARTS!
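Provisioned concurrency is configured per published version or alias (not $LATEST). A minimal sketch wrapping boto3's Lambda `put_provisioned_concurrency_config`; the client is passed in so the wrapper can be exercised with a stand-in.

```python
def provision_concurrency(lambda_client, function_name, alias, count):
    """Keep `count` execution environments initialized for an alias or version.

    lambda_client is a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,  # a version number or alias name; $LATEST is not allowed
        ProvisionedConcurrentExecutions=count,
    )
```

Usage with hypothetical names: `provision_concurrency(boto3.client("lambda"), "booking-service", "live", 3)`.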
AUTOSCALING
Chart: DEMAND and CONCURRENCY over TIME.


MAXIMIZING THROUGHPUT
LEAVE NO REQUEST BEHIND

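A FUNCTION’S THROUGHPUT
THROUGHPUT = CONCURRENCY / LATENCY
• Concurrency = 1, latency = 500 ms: requests 1 and 2 complete in one second, so throughput = 2 TPS.
• Concurrency = 1, latency = 250 ms: requests 1 to 4 complete in one second, so throughput = 4 TPS.
• Concurrency = 2, latency = 500 ms: two environments each complete two of requests 1 to 4 in one second, so throughput = 4 TPS.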
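The formula works in both directions: given a target load, the same relationship tells you how much concurrency you need. A minimal sketch, assuming a steady state in which every environment is always busy:

```python
import math


def throughput_tps(concurrency, latency_s):
    """THROUGHPUT = CONCURRENCY / LATENCY."""
    return concurrency / latency_s


def required_concurrency(requests_per_minute, latency_s):
    """Inverse form: environments needed to sustain a given load."""
    tps = requests_per_minute / 60
    return math.ceil(tps * latency_s)
```

For the 1M RPM scenario from earlier and a hypothetical 100 ms latency, `required_concurrency(1_000_000, 0.1)` is 1667 concurrent environments; at 500 ms it is 8334.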
REDUCING LATENCY TO GAIN THROUGHPUT
With concurrency = 1, cutting latency raises the number of requests completed per second:
1. 250 ms latency = 4 TPS
2. 125 ms latency = 8 TPS
3. 100 ms latency = 10 TPS
4. 50 ms latency = ?
OH, THE RATE LIMITS
3. Concurrency = 1, 100 ms latency = 10 TPS
4A. ASYNCHRONOUS INVOCATIONS at 50 ms: no limit! Throughput = 20 TPS.
4B. SYNCHRONOUS INVOCATIONS at 50 ms: throughput capped due to rate limiting, STILL 10 TPS.
With concurrency = 1, additional invocations will increase concurrency or be throttled.
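Extending the throughput formula with the cap from this slide (10 synchronous requests per second per unit of concurrency) explains case 4B: below 100 ms, lower latency stops adding throughput and only more concurrency helps. A sketch under that assumption:

```python
import math

SYNC_RPS_PER_ENVIRONMENT = 10  # the synchronous cap shown on the slide


def effective_tps(concurrency, latency_s, synchronous):
    """Throughput after applying the per-environment request-rate cap."""
    ideal = concurrency / latency_s
    if synchronous:
        return min(ideal, SYNC_RPS_PER_ENVIRONMENT * concurrency)
    return ideal


def concurrency_for_sync_load(tps, latency_s):
    """Smallest concurrency whose capped synchronous throughput reaches tps."""
    by_latency = math.ceil(tps * latency_s)
    by_rate_cap = math.ceil(tps / SYNC_RPS_PER_ENVIRONMENT)
    return max(by_latency, by_rate_cap)
```

Serving 20 synchronous TPS at 50 ms therefore needs concurrency = 2, even though one environment could finish 20 requests per second.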
ACCOUNT CONCURRENCY
Timeline: environments E1 through E1000 run at the same time. With account concurrency = 1000, concurrency cannot go higher, so any further request is THROTTLED.


EVERYONE GETS THEIR (FAIR?) SHARE
All functions share account concurrency = 1000. ANOTHER FUNCTION is running at concurrency = 200, so OUR FUNCTION can reach only concurrency = 800 before its next requests are THROTTLED.
RESERVED CONCURRENCY
OUR FUNCTION has reserved concurrency = 900 while currently running at concurrency = 300. The reservation is set aside even when unused, so ANOTHER FUNCTION can only use the remaining 100 of account concurrency = 1000, and its next request is THROTTLED.
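The arithmetic behind this slide and the next one fits in one function: a function with a reservation is capped at exactly that reservation, and all other functions share whatever the reservations leave over. A minimal sketch returning that upper bound:

```python
def max_concurrency(account_limit, reserved, function):
    """Upper bound on `function`'s concurrency.

    reserved: dict mapping function name -> reserved concurrency.
    A reserved function is capped at its reservation; unreserved functions
    share the pool left after all reservations (and compete for it).
    """
    if function in reserved:
        return reserved[function]
    return account_limit - sum(reserved.values())
```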
NO MORE THAN YOU DESERVE
Even with account concurrency = ∞, OUR FUNCTION has reserved concurrency = 1900, so it tops out at concurrency = 1900 (E1 through E1900) and its next request is THROTTLED. Reserved concurrency is also a ceiling.
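Reserved concurrency is set per function. A minimal sketch wrapping boto3's Lambda `put_function_concurrency`, with the client passed in so the wrapper can be tested with a stand-in:

```python
def reserve_concurrency(lambda_client, function_name, count):
    """Set aside `count` concurrency for a function; it also becomes its cap.

    lambda_client is a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=count,
    )
```

Usage with a hypothetical name: `reserve_concurrency(boto3.client("lambda"), "booking-service", 1900)`. Setting the value to 0 throttles every invocation of the function.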
BURST CONCURRENCY (OR HOW FAST CONCURRENCY CAN RISE)
Chart: FUNCTION CONCURRENCY (0 to 5K units) against AVAILABLE BURST over time (0 to 9 minutes), with BURST QUOTA = 3000 and ACCOUNT CONCURRENCY = 5000. Concurrency rises by at most +500 per minute (per-minute increments shown: +0, +500, +500, +500, +500, +0, +0, +500, +500). Demand above that curve lands in the BURST THROTTLE ZONE; demand above 5000 lands in the ACCOUNT THROTTLE ZONE.
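A sketch of the scaling ceiling as drawn on this slide: an immediate burst up to the burst quota, then +500 per full minute, never above account concurrency. The constants are parameters because they come from the slide, and AWS has since revised its scaling behavior, so check the current documentation before relying on them.

```python
def burst_ceiling(minutes_since_burst, burst_quota=3000, account_limit=5000, per_minute=500):
    """Max concurrency reachable under the slide's burst model."""
    grown = burst_quota + per_minute * int(minutes_since_burst)  # whole minutes only
    return min(account_limit, grown)
```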


MINIMIZING $ COSTS
GET THE BIGGEST BANG FOR YOUR BUCK

HOW LAMBDA PRICING WORKS
COST PER TRANSACTION (also: cost per execution) = REQUEST CHARGES + COMPUTE CHARGES
• Request charges: a fixed fee per request.
• Compute charges: EXECUTION DURATION × ALLOCATED MEMORY. Latency determines execution duration, and allocated memory influences it.
• Free Tier available.
• Eligible for Savings Plans.
• Rates vary based on Region and CPU architecture.
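The formula as code. Prices are deliberately left as inputs rather than hard-coded, since rates vary by Region and architecture; Free Tier and Savings Plans discounts are not modeled. Compute is expressed in GB-seconds (memory in GB × duration in seconds).

```python
def compute_gb_seconds(memory_mb, duration_ms):
    """Compute charge unit: allocated memory (GB) x execution duration (s)."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def cost_per_transaction(memory_mb, duration_ms, price_per_request, price_per_gb_second):
    """COST PER TRANSACTION = REQUEST CHARGES + COMPUTE CHARGES."""
    request_charge = price_per_request
    compute_charge = compute_gb_seconds(memory_mb, duration_ms) * price_per_gb_second
    return request_charge + compute_charge
```

Applied to the prime-number table: 128 MB × 170 ms is 0.02125 GB-s per invocation, while 1024 MB × 20 ms is 0.02 GB-s. The faster configuration is also slightly cheaper.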
POWER TUNING
Find the OPTIMAL MEMORY CONFIGURATION with alexcasalboni/aws-lambda-power-tuning.
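The core of the trade-off can be reproduced by hand from the measurements in TURN UP THE RAM!. This sketch picks the memory size with the lowest compute cost per invocation, optionally within a latency budget. Cost is compared in MB × ms, which is proportional to GB-seconds and avoids floating-point ties; request fees are identical across sizes and so are ignored.

```python
# (memory_mb, duration_ms) from the prime-number benchmark in the deck
MEASUREMENTS = [(128, 170), (256, 80), (512, 40), (1024, 20), (1536, 17), (3008, 17)]


def cheapest_config(measurements, max_latency_ms=None):
    """Lowest compute cost (MB x ms); ties go to the lower latency."""
    candidates = [
        m for m in measurements if max_latency_ms is None or m[1] <= max_latency_ms
    ]
    if not candidates:
        raise ValueError("no configuration meets the latency budget")
    return min(candidates, key=lambda m: (m[0] * m[1], m[1]))
```

256, 512 and 1024 MB all cost 20,480 MB·ms per invocation, so 1024 MB wins as the fastest of the cheapest. Under a 17 ms budget, 1536 MB is chosen; 3008 MB costs about 2.5 times as much for no latency gain.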
ARM YOURSELF (WITH GRAVITON2)
Chart: COST and LATENCY with arm64 (Graviton2).



PUTTING IT ALL TOGETHER


BEFORE OUR BRAINS BURST
WHAT WE (HOPEFULLY) LEARNED TODAY
• THE CHALLENGE OF SCALE: getting our Lambda functions to perform as required
• KEY PERFORMANCE METRICS: latency, throughput, cost per transaction ($)
• HOW LAMBDA PROCESSES INVOCATIONS: execution environments, cold starts
• WHAT FACTORS AFFECT LATENCY: cold start, execution, optimization techniques
• HOW LAMBDA SCALES: CONCURRENCY QUOTAS (account concurrency: how much; burst concurrency: how fast) and CONTROLS (provisioned concurrency: pre-warm; reserved concurrency: set aside)
• THROUGHPUT CONSIDERATIONS: relation to latency and concurrency; rate limits, concurrency quotas and controls
• HOW PRICING WORKS and POWER TUNING: interplay between memory, execution time and costs; price-performance gains with arm64 (Graviton2)
THANK YOU
ITBA!
FOLLOW ME
𝕏 @alexscuadrado
/in/alexis-cuadrado
alexis.hashnode.dev
