Performance With Laravel
Measuring performance
ab
jmeter
Inspector
Telescope
OpenTelemetry
XDebug + qcachegrind
Clockwork
htop
How to start measuring performance?
N+1 queries
Solutions
Prevent lazy loading
Let the client decide what it wants
Multiple resources
Pagination
Cursor pagination
Database indexing
Theory
Arrays
Linked list
Binary tree
Binary search tree (BST)
Indexing in the early days
Single-level indexing
Multi-level indexing
B-Tree
Problems with B-Trees
B+ Trees
Index access types
const
range
range (again)
index
ALL
Select *
Composite indexes
Cardinality
Database indexing in practice
Listing posts by status
Feed
Publishing posts
Avoiding memory problems
Avoiding spamming the database
Measuring performance
Async workflows
Web scraping with jobs
Concurrent programming
fork
Concurrent HTTP requests
Queues and workers
supervisor
Multiple queues and priorities
Optimizing worker processes
Chunking large datasets
Exports
Imports
Generators & LazyCollections
PHP generators
Imports with generators
Imports with LazyCollections
Reading files
Deleting records
Miscellaneous
fpm processes
nginx cache
Caching static content
Caching fastcgi responses
MySQL slow query log
Monitoring database connections
Docker resource limits
Health check monitors
Measuring performance
Before we talk about how to optimize performance we need ways to effectively measure it. But even before
we can measure it we need to know what exactly we want to measure.
Here are some of the most important performance measures of an API/backend service:
Throughput: the number of requests the system can handle in a given amount of time (usually expressed as requests per second) without going down.
Load time: the amount of time it takes for an HTTP request to get a response.
Server uptime: the duration of time the server is up and running usually expressed as a percentage.
CPU usage: the amount of CPU your system needs to run. It is usually expressed as load average which
I'm gonna explain later.
In this book, we're going to talk about backend and APIs but of course there are some frontend-related
metrics as well:
Load time: the amount of time it takes for the full page to load.
First byte: the time taken to start loading the data of your web application after a user requests it.
Time to interactive: this measures how long it takes a page to become fully interactive, i.e., the time
when layout has stabilized, key web fonts are visible, and the main thread is available enough to
handle user input.
Page size: the total size of the web page, that includes all of its resources (HTML, CSS, JavaScript,
images, etc).
Number of requests: the number of individual requests made to the server to fully load the web page.
These things are "black box measures" or "external measures." Take load time for an example and say the
GET api/products endpoint took 912s to load which is slow. Measuring the load time tells you that your
system is slow, but it doesn't tell you why. To find out the cause we need to dig deeper into the black box.
We need to debug things such as:
and so on
Measuring a system from the outside (for example load time of an API endpoint) is always easier than
measuring the internal parts. This is why we start with the external measures first.
ab
The easiest tool to test your project's performance is ab or Apache Benchmark. It's a command line tool
that sends requests to a given URL and then shows you the results.
Unfortunately, in ab we cannot specify the ramp-up time. This is used to define the total time in which the tool sends the requests to your app. For example, "I want to send 100 requests in 10 seconds." You cannot do that with ab. It always sends requests as fast as it can. So if the first batch of concurrent requests (which is 10 requests in this example) finishes in 3 seconds, it sends the next batch, and so on. Other than that, it's the perfect tool to quickly check the throughput of your application.
Concurrency Level: 10
Time taken for tests: 2.114 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1636000 bytes
HTML transferred: 1610100 bytes
Requests per second: 47.31 [#/sec] (mean)
Time per request: 211.363 [ms] (mean)
Time per request: 21.136 [ms] (mean, across all concurrent requests)
Transfer rate: 755.88 [Kbytes/sec] received
As you can see, it sent a total of 100 requests with a concurrency level of 10. The whole test took 2114ms or 2.114 seconds. If we divide 100 by 2.114 seconds the result is 47.31. This is the throughput of the server. It can handle 47 requests per second.
The next two numbers were quite hard for me to understand at first. They are:
Time per request: 211.363 [ms] (mean)
Time per request: 21.136 [ms] (mean, across all concurrent requests)
When you run ab -n 100 -c 10 ab creates 10 request "groups" that contain 10 requests each.
In this case Time per request: 21.136 [ms] (mean, across all concurrent requests) means that 1 request took 21ms on average. This is the important number.
The other Time per request: 211.363 [ms] (mean) refers to a request group, which contains 10 requests. You can clearly see the correlation between these numbers: 211.363ms is roughly 10 x 21.136ms.
So if you use concurrency the per-group number doesn't really make sense on its own. It was really confusing for me at first, so I hope I gave you a better explanation.
To sum up the pros of ab:
Easy to install
Easy to use
You can load test your app in minutes and get quick results
jmeter
The next tool to load test your application is jmeter . It has more advanced features than ab , including:
A GUI
Ramp-up periods and loop counts (explained below)
Other useful testing features such as XPath, regular expressions, JSON, script variables, and response parsing, which help to build more exact and effective tests.
A quick note. If you're having trouble starting jmeter try this command with the UseG1GC argument: java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -jar ApacheJMeter.jar . You can also use this alias:
alias jmeter='JVM_ARGS="-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC" /path/to/jmeter/bin/jmeter'
To start load testing with jmeter you need to create a new Thread group that has the following options:
Number of threads refers to the number of users or, to put it simply, the number of requests.
Ramp-up period defines how much time jmeter should take to start the requests. If 10 threads are used, and the ramp-up period is 100 seconds, then jmeter will take 100 seconds to get all 10 threads up and running. Each thread will start 10 (100/10) seconds after the previous one began.
And then we have Loop count . By default it's 1, meaning that jmeter runs your HTTP tests once. If you set it to 100 it repeats all of the tests 100 times.
Note: you can find the example test plan in the source code 1-measuring-performance/simple-jmeter-
test-plan.jmx
Inside the Thread group we need to add an HTTP Sampler which can be found inside the Sampler
category. An HTTP request is pretty straightforward. You just need to configure the base URL, the endpoint,
and query or POST body parameters if you have any.
In order to measure the results we need to add Listeners as well. These are the components that can display the results in various formats. Two of the most crucial listeners are the Summary Report and the View Results Tree . Add them to the Thread group.
Std. Dev.: it's the standard deviation during the test. In general, standard deviation shows you the
amount of variation or dispersion of a set of values. In performance testing, the standard deviation
indicates how much individual sample times deviate from the average response time. In this example,
27 indicates that the individual sample times in the dataset are, on average, 27 units away from the
mean of the dataset. This means that there is a relatively high level of variability or dispersion in the
sample times.
Throughput: the number of requests per minute that can be served by the server
Received KB/Sec: the amount of data received per second during the test.
So the summary report gives you a quick overview of the overall results of your tests.
View Results Tree on the other hand enables you to check out individual requests which can be helpful if
you have 5xx responses. It looks like this:
The last thing you probably need is to send the Authorization header in your requests. In jmeter there's a dedicated component to set header values. It's called HTTP Header Manager and can be found in the Managers category. Setting it up is really easy: you just need to add the header's name and value.
Note: you can find this example test plan in the source code 1-measuring-performance/simple-jmeter-
test-plan.jmx
Inspector
ab and jmeter are great if you want to understand the throughput and overall responsiveness of your
application. Let's say you found out that the GET /api/transactions endpoint is slow. Now what? You open the project, go to the Controller, and try to find the slow part. You might add some dd or time() calls and so on. Fortunately, there's a better way: a monitoring tool such as Inspector.
For example, here's a time distribution of the different occurrences of the GET /api/transactions request:
7 times it took only 20-50ms. You can see these occurrences on the left side on the 0ms mark.
3 times it took something between 500ms and 2500ms. These are the 3 smaller bars.
And then one time it took almost 15 seconds. This is the lonely bar at the right side.
If I click on these bars I can quickly see the difference between a 29ms and a 2300ms request:
In the left panel you can see that only 4 mysql queries were executed and the request took 29ms to
complete. On the right side, however, there were 100 queries executed and it took 2.31 seconds. You can
see the individual queries as well. On the right side there are these extra select * from products queries
that you cannot see on the left side.
In the User menu you can always check out the ID of the user that sent the request. It's a great feature
since user settings and different user data can cause differences in performance.
If it's a POST request you can see the body in the Request Body menu:
Another great feature of Inspector is that it also shows outgoing HTTP requests and dispatched jobs in the
timeline:
In my example application, the POST /api/transactions endpoint communicates with other APIs and also dispatches a job. These are the highlighted rows on the image.
The great thing about Inspector is that it integrates well with Laravel so it can detect things like your queue
jobs:
You can dig into the details of a job just like an HTTP request:
You have the same comparison view with all the database queries, HTTP requests, or other dispatched jobs:
The best thing about Inspector? The whole installation process is basically a composer require plus an API key in your .env file.
It's an awesome tool to monitor your application in production. Easy to get started and easy to use. It gives
you a great overview and you can dig deeper if you need to. It integrates well with Laravel so you'll see your
HTTP requests, commands, and jobs out of the box.
Telescope
Even though Inspector is awesome, it's a paid 3rd party tool so I understand not everyone wants to use it. One of the easiest tools you can use to monitor your app is Laravel Telescope.
After you've installed the package you can access the dashboard at localhost:8000/telescope . If you send some requests you'll see something like this:
It gives you a great overview of your requests and their duration. What's even better, if you click on a
specific request you can see all the database queries that were executed:
If you click on an entry you can see the whole query and the request's details:
Telescope doesn't only track requests and queries. It also shows:
Commands
Jobs
Cache
Events
Exceptions
Logs
Mails
...and so on
For example, here are some queue jobs after a laravel-excel export:
Telescope is a great tool and it's a must-have if you want to monitor and improve your app's performance. If you want to use only free and simple tools go with ab and Telescope. ab tells you which part of the app is slow. Telescope tells you why it's slow.
OpenTelemetry
Both Inspector and Telescope track everything by default, which is a great thing. However, sometimes you might want to control what's being tracked and what is not. This is where OpenTelemetry comes in. It's built around two concepts:
Traces
Spans
A trace is a set of events. It's usually a complete HTTP request and contains everything that happens inside. Imagine that your API endpoint sends an HTTP request to a 3rd party, dispatches a job, sends a notification, and runs 3 database queries. All of this is one trace. Every trace has a unique ID.
A span is an operation inside a trace. The HTTP request to the 3rd party can be a span. The dispatched job can be another one. The notification can be the third one. And finally you can put the 3 queries inside another span. Each span has a unique ID and they contain the trace ID. It's a parent-child relationship.
So it's similar to Inspector, however, it requires manual instrumentation. Instrumentation means you need to start and stop the traces manually, and then you need to add spans as you like. So it requires more work but you can customize it as you wish.
OpenTelemetry offers a PHP SDK. However, the bare-bones SDK is a bit complex to be honest, so I'll use a simple but awesome Spatie package to simplify the whole process. It's called laravel-open-telemetry.
Http::get('...');
The start method starts a new span. Behind the scenes a unique trace ID will be generated at the start of
every request. When you call Measure::start() a span will be started that will get that trace id injected.
So we only worry about spans. Traces are handled by the package.
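Here's a minimal sketch of what instrumenting a single operation with the package looks like (the span name and URL are illustrative, and the facade import path may differ slightly depending on the package version):

use Illuminate\Support\Facades\Http;
use Spatie\OpenTelemetry\Facades\Measure;

Measure::start('external-api-call');   // opens a span inside the current trace

Http::get('https://example.com/api');  // the operation we want to measure

Measure::stop('external-api-call');    // closes the span and records its duration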
But what happens with these traces and spans? How can I view them? Great question!
The collected data needs to be stored somewhere and needs a frontend. We need to run some kind of store and connect it to the Spatie package. There are multiple tracing systems that handle OpenTelemetry data, for example, ZipKin or Jaeger. I'm going to use ZipKin since it's the simplest to set up locally. All we need to do is this:
'drivers' => [
    Spatie\OpenTelemetry\Drivers\HttpDriver::class => [
        'url' => 'http://localhost:9411/api/v2/spans',
    ],
],
Now Spatie will send the collected metrics to localhost:9411 where ZipKin listens.
Let's see an example of how we can add these spans. When you purchased this book (thank you very much!) you interacted with Paddle even if you didn't realize it. It's a merchant of record, meaning you paid them and they send me the money once a month. This way, I worry about only one invoice a month. They also handle VAT ramifications.
So imagine an endpoint where we can buy a product; let's call it POST /api/transactions . The request looks like this:
namespace App\Http\Requests;
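// The rest of the request class isn't shown in this excerpt. Based on how it's used
// below (product(), quantity(), customerEmail()), a simplified sketch might look like
// this - the class name and lookup logic are illustrative:

use App\Models\Product;
use Illuminate\Foundation\Http\FormRequest;

class StoreTransactionRequest extends FormRequest
{
    public function product(): Product
    {
        return Product::findOrFail($this->input('product_id'));
    }

    public function quantity(): int
    {
        return (int) $this->input('quantity');
    }

    public function customerEmail(): string
    {
        return (string) $this->input('customer_email');
    }
}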
It's a simplified example, of course. When someone buys a product we need to do a number of things:
Triggering a webhook, meaning we call some user-defined URLs with the transaction's data
Calculating the VAT, which involves talking to 3rd party services (for example VatStack)
The transactions table can be huge in an application like this, so it's a good idea to place a span that contains this one query specifically.
$product = $request->product();
Measure::start('Calculate VAT');

Measure::stop('Calculate VAT');

$feeRate = $setting->fee_rate;

Measure::start('Insert transaction');

$transaction = Transaction::create([
    'product_id' => $product->id,
    'quantity' => $request->quantity(),
    'product_data' => $product->toArray(),
    'user_id' => $product->user_id,
    'stripe_id' => Str::uuid(),
    'revenue' => $total,
    'fee_rate' => $feeRate,
    'fee_amount' => $feeAmount,
    'tax_rate' => $vat->rate,
    'tax_amount' => $vat->amount,
    'balance_earnings' => $total
        ->subtract($vat->amount)
        ->subtract($feeAmount),
    'customer_email' => $request->customerEmail(),
]);

Measure::stop('Insert transaction');

try {
    if ($webhook = Webhook::transactionCreated($request->user())) {
        SendWebhookJob::dispatch($webhook, 'transaction_created', [
            'data' => $transaction,
        ]);
    }
} catch (Throwable) {}

Measure::stop('Create transaction');

return response([
    'data' => TransactionResource::make($transaction)
], Response::HTTP_CREATED);
}
The trace is called laravel: create transaction where laravel comes from the default config of the
package and create transaction comes from the first span.
And finally SendWebhookJob was recorded by the package automatically. It tracks every queue job by
default and puts them into the right trace. It's a great feature of the Spatie package.
Unfortunately, it's not perfect. You can see the duration of 1.307s in the upper left corner, which refers to the duration of the whole trace. But it's misleading since the operation itself took only 399ms plus 78ms for the job. Since the job is async, there's a delay between dispatching it and the start of the execution by the worker process. I honestly don't know how we can overcome this problem.
The duration is much better and I think the timeline is also better. Of course, it's more detailed. If these
small 5ms segments are annoying I have good news. You can group them using segments:
inspector()->addSegment(function () {
    $feeRate = $setting->fee_rate;
    $feeAmount = $total->multiply((string) $feeRate->value);

    // ...
}, 'process', 'Group #1');
Group #1 will be displayed in Inspector as the name of this segment. Instead of 10 small segments you'll
see only one. It's a great feature if you're in the middle of debugging and you want to see less stuff to have a
better overview of your endpoint.
To sum it up:
You have to add your own spans which gives you great customization
Compare it to Inspector:
You can still add your own segments to customize the default behavior
XDebug + qcachegrind
The next profiling tool is the lowest-level of all. It might not be the most useful but I felt I had to include at least one low-level tool.
Lots of people know about the step debugging aspect of XDebug. And it's great. I think you should set it up and use it. Here's a Jeffrey Way video that teaches you the whole process in 11 minutes and 39 seconds.
The other feature of XDebug is profiling. When you send a request to your app it can profile every method call and create a pretty big data structure out of it that can be viewed and analyzed for performance problems. The program that allows you to view these structures is called qcachegrind on Mac and kcachegrind on Linux.
zend_extension=xdebug
xdebug.profiler_enabled=1
xdebug.mode=profile,debug
xdebug.profiler_output_name=cachegrind.out.%c
xdebug.start_upon_error=yes
xdebug.client_port=9003
xdebug.client_host=127.0.0.1
xdebug.mode=profile makes it listen to our requests and create a function call map. xdebug.profiler_output_name is the file that it creates in the /var/tmp directory. client_port and client_host are only needed for step debugging.
php --ini
As you can see, I didn't add the XDebug config in the php.ini file but created an ext-xdebug.ini file in
the conf.d folder that is automatically loaded by PHP.
Now you need to restart your php artisan serve , or Docker container, or local fpm installation. If you did
everything right phpinfo() should include the XDebug extension.
Now all you need to do is send a request to your application. After that, you should see a new file inside the
/var/tmp directory:
qcachegrind cachegrind.out.1714780293
It's a little bit old school, it's a little ugly, but it's actually pretty powerful.
On the left side you see every function that was invoked during the request.
The great thing about the call graph is that it includes the time spent in the given function. Take a look at
this:
This is the part where Laravel dispatches my TransactionController class and calls the index method in
which I put a sleep function. 40% of the time was spent in the sleep function which is expected in this
case.
The great thing about XDebug+qcachegrind is that you can really dig deep into your application's behavior.
However, I think in most cases it's unnecessary. With Telescope or Inspector you'll get a pretty great
overview of your performance problems. In a standard, high-level, "business" application your problems will most likely be related to the database, and Telescope or Inspector are just better tools to profile these kinds of problems.
However, XDebug+qcachegrind can teach us a few things. For example, I never realized this:
These are the functions that were executed during the request. I highlighted four of them:
Let me give you some context. These examples come from a financial app. The request I was testing is the GET /api/transactions . It returns 50 transactions. A transaction record looks like this:
1 1 2 1800
2 1 1 900
3 2 1 2900
protected $casts = [
    'quantity' => 'integer',
    'revenue' => MoneyCast::class,
    'fee_rate' => PercentCast::class,
    'fee_amount' => MoneyCast::class,
    'tax_rate' => PercentCast::class,
    'tax_amount' => MoneyCast::class,
    'balance_earnings' => MoneyCast::class,
    'product_data' => 'array',
];
}
MoneyCast is just a Cast that uses the Money value object from the moneyphp package:
Pretty simple. The database stores scalar values and this Cast casts them into value objects.
There are these MoneyForHuman calls. It's just another value object that formats Money objects.
return TransactionResource::collection(
    $request->user()->transactions()->paginate(50),
);
Returning only 50 transactions resulted in 1,100 calls to these objects and functions!
It's crazy. If I put something in one of these classes that takes only 50ms, the whole request will take an extra 55,000ms to complete. That is an extra 55 seconds.
These are the base results without slowing down the functions:
I sent only one request and it took 278ms to complete. Of course, it will vary but it's good enough.
class MoneyForHuman
{
    public function __construct(private readonly Money $money)
    {
        usleep(55000);

        // ...
    }
}
So even though XDebug+qcachegrind can be a little bit too low-level for 95% of our usual performance problems, as you can see, it can help us see the small details that can ruin the performance of our applications in some cases.
If you want to learn more about XDebug+qcachegrind check out this live stream from the creator of
XDebug.
Clockwork
There are some other useful tools to profile your applications. Clockwork and Debugbar are great examples.
Clockwork is very similar to Telescope. It's a composer package that you can install and after that you can open 127.0.0.1:8000/clockwork and you'll get a page such as this:
It's the timeline of an API request showing all the database queries that were executed.
You can also check how many models are being retrieved to serve the request:
The great thing about Clockwork is that it also comes with a Chrome plugin. So you can see everything in your developer tools:
I think Clockwork is the fastest way to start profiling your application on your localhost. You don't even have
to go to a separate page. Just open your console and the information is there.
htop
The last tool I'd like to talk about is htop . It's a simple but very powerful command line tool that I'll use in
the rest of this book. It looks like this:
You can check the utilization of your CPU and memory. It's a very important tool to debug performance issues in real time. By real time I mean two things:
When shit happens and there is some serious performance issue in your production environment you can check out htop and see what's happening in real time.
When you're developing a feature on your local machine you can always check htop to get an idea about the CPU load. Of course, it's very different from your prod servers but it can be a good indicator.
Other than the visual representation of the load of the cores we can also see the load average numbers. They are 1.85, 2.27, 2.39 in my case. These numbers represent the overall load of your CPU. The three numbers mean:
1.85 (the first one) is the load average of the last 1 minute
2.27 (the second one) is the load average of the last 5 minutes
2.39 (the last one) is the load average of the last 15 minutes
What does a number such as 1.85 actually mean? It means that the overall CPU utilization was around 23%
on my machine in the last minute. Straightforward, right?
If you have only 1 CPU core a load average of 1 means your core is working 100%. It is fully utilized. If your
load average is 2 then your CPU is doing twice as much work as it can handle.
But if you have 2 CPU cores a load average of 1 means your cores are working at 50%. In this case, a load
average of 2 means 100% utilization.
So the general rule is that if the load average is higher than the number of cores your server is overloaded.
Back to my example. I have 8 cores so a load average of 8 would be 100% utilization. My load average is 1.85
on the image so it means 1.85/8 or about 23% CPU load.
How to start measuring performance?
In a typical business application where users must log in, probably one of the most important pages is the dashboard, the home page that is presented right after they log in. If it's a publicly available webpage then it is the landing page. If you don't know where/how to start, this is the perfect place.
Determine how many users you want to/have to serve. Let's say it's 1,000
Come up with a reasonable goal. For example, "I want to be able to serve 100 concurrent users
with a maximum load time of 1.5 seconds" (these numbers are completely random, please don't
take them seriously)
Now open up Inspector or Telescope and identify what takes a long time
I know, I know, every feature of your app is "the most important at the moment" according to the product team. However, we all know we can identify a handful of features that are the most critical in the application no matter what. Try to identify them and measure them the same way as your home page. However, in this case your target numbers can be lower because it's usually rare that 72% of your users use the same feature at the same time. It's always true for the home page but usually it's not the case with other features. Unless, of course, your feature has some seasonality such as booking.com, or it follows deadlines such as an accounting software. In that case, you know that on day X of every month 90% of users will use that one feature.
We tend to forget to optimize background jobs because they run in the background and they do not overly affect the overall user experience. However, they still use our servers. They consume CPU and RAM. They cost us money.
Just as with features, try to identify your most important/critical jobs and analyze them with Inspector
and/or Telescope the same way as if they were web requests. Try to reduce the number of queries, the
overall memory consumption, the execution time with techniques discussed in the book.
When you set target numbers (such as serving 100 concurrent users, loading the page within 500ms, etc.) it's important to use the same kind of hardware as your production environment. Usually the staging environment is a good starting point.
When you debug a specific feature or a job you can use your local environment as well. Of course, execution times will be different compared to production, but you can think in percentages. For example, "this job took 10s to finish but now it only takes 9s. I gained 10%." The number of queries and the overall memory consumption will be similar to production.
N+1 queries
I put this chapter first because the N+1 query is one of the most common performance issues in lots of
projects. The good news is that it's relatively easy to fix.
This chapter doesn't discuss particular solutions; it only describes what an N+1 problem is. Feel free to skip it if you already know it.
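The original code example isn't reproduced in this excerpt; the pattern it describes looks something like this sketch (the model and relationship names are illustrative):

// $user is an already loaded User model
$posts = $user->posts; // 1 query to load the user's posts

foreach ($posts as $post) {
    // accessing a lazy-loaded relationship runs +1 query per post
    echo $post->comments->count();
}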
Congratulations! If the given user has 500 posts we just executed 501 queries.
This is why it's called an "N+1 query" problem. It always has the following elements:
A loop
A database query inside the loop
Let's zoom out a little bit and see where the $user variable comes from:
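Again, as a sketch of the pattern rather than the book's exact code:

$users = User::all(); // 1 query loading every user into memory

foreach ($users as $user) {
    foreach ($user->posts as $post) {  // +1 query per user
        echo $post->comments->count(); // +1 query per post
    }
}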
This is exponentially worse. This is now an N*M+1 query problem where N is the number of users and M is
the average number of posts per user. If you have 1,000 users and they have 30 posts on average this
function runs 30,000 database queries.
Another issue is that we load 1,000 users directly into memory all at once. It doesn't sound like too much, right? Can you guess the size of a User object in a fresh Laravel installation in bytes? We're going to talk about that in another chapter.
class OrderController
{
    public function markOrdersPaid(Request $request)
    {
        $orders = Order::whereIn('id', $request->ids)->get();

        foreach ($orders as $order) {
            $order->markAsPaid();
        }
    }
}
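The body of markAsPaid isn't shown here; based on the description below, a minimal sketch of the model side could look like this (the status column is illustrative):

use Illuminate\Database\Eloquent\Model;

class Order extends Model
{
    public function markAsPaid(): void
    {
        $this->status = 'paid';

        $this->save(); // at least one update query per order
    }
}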
In the Controller it's not obvious because the N+1 query happens inside the model. Of course, markAsPaid indicates that at least one query is executed. However, if the function call was something like calculateVat and it executed 5 additional queries for each order, the situation would be much worse.
In the model, you don't know immediately where the function is being called. You don't know the
context so it's not obvious that an N+1 issue is happening.
class OrderController
{
    public function index()
    {
        return OrderResource::collection(Order::all());
    }
}
It's a hidden loop since the OrderResource class is being used N times where N is the number of orders
you return from the Controller. For each order, it executes two additional queries.
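The resource itself isn't shown in this excerpt, but a resource that triggers two extra queries per order might look something like this sketch (relationship and field names are illustrative):

use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\JsonResource;

class OrderResource extends JsonResource
{
    public function toArray(Request $request): array
    {
        return [
            'id' => $this->id,
            'customer' => $this->user->name,        // lazy-loads the user: +1 query per order
            'items_count' => $this->items->count(), // lazy-loads the items: +1 query per order
        ];
    }
}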
Solutions
As we have seen, one of the most frequent occurrences of N+1 queries is when additional queries are
executed to get related models. Fortunately, there's an easy fix to that problem. Eager loading:
$users = User::with('posts')->get();
Instead of User::all() I use User::with('posts') which means that the posts relationship is loaded in the original query.
Laravel runs only one additional query that gets all the related posts for the users: something like select * from posts where user_id in (1, 2, 3, ...) next to the select * from users query.
In the case of resources we can do one more thing. Using the whenLoaded helper in the Resource:
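A sketch of what such a resource could look like, using the items and user relationships mentioned below (class and related resource names are illustrative):

use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\JsonResource;

class OrderResource extends JsonResource
{
    public function toArray(Request $request): array
    {
        return [
            'id' => $this->id,
            'items' => OrderItemResource::collection($this->whenLoaded('items')),
            'user' => UserResource::make($this->whenLoaded('user')),
        ];
    }
}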
In this case, if the items or user relationships are not eager-loaded using with the resource won't query
them at all. This way you can avoid N+1 queries in resources completely. However, I think this is still not the
best solution when it comes to resources but it's a fix to the problem. In the following chapter, I'll show you
my favorite way of handling API requests and relationships in resources.
What about the cases when the issue is not caused by relationships? Such as this example:
class OrderController
{
    public function markOrdersPaid(Request $request)
    {
        $orders = Order::whereIn('id', $request->ids)->get();

        foreach ($orders as $order) {
            $order->markAsPaid();
        }
    }
}
They are harder to generalize but in the upcoming chapters, we're going to talk about these kinds of
problems. For example, in the Async workflows/Concurrent programming chapter, you can see how to run
queries in a parallel way utilizing most of your CPU cores.
Prevent lazy loading
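The code example isn't fully reproduced in this excerpt. Laravel ships with Model::preventLazyLoading() , which you typically enable in the AppServiceProvider ; a minimal sketch looks like this:

namespace App\Providers;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Throw an exception whenever a relationship would be lazy-loaded,
        // but only outside of production.
        Model::preventLazyLoading(! $this->app->isProduction());
    }
}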
So if you use preventLazyLoading in your local environment you can make sure there are no N+1
problems in your codebase.
Let the client decide what it wants
HTTP Resource classes tend to grow big over time. As the project grows, your User model (or any other
model) has more and more columns. It has more and more relationships. Usually, "more and more" is an
understatement. For example, here are some numbers from one of the projects I'm working on. The User
model has:
9 belongsToMany
2 belongsTo
3 hasManyThrough
28 hasMany
1 morphMany
12 hasOne
It has 55 relationships and it's not even a 10-year-old legacy project. As you add features to the project a
good portion of these relationships will appear in the UserResource :
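The resource code isn't shown in this excerpt; a simplified sketch of the problematic version could look like this (only two of the many relationships are shown, and the related resource names are illustrative):

use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\JsonResource;

class UserResource extends JsonResource
{
    public function toArray(Request $request): array
    {
        return [
            'id' => $this->id,
            'name' => $this->name,
            'posts' => PostResource::collection($this->posts),          // queries posts
            'comments' => CommentResource::collection($this->comments), // queries comments
        ];
    }
}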
This will result in a lot of unnecessary database queries as it will execute a select * from posts where
user_id = 1 and a select * from comments where user_id = 1
To avoid executing lots of queries usually we eager load the necessary relationships in the Controller:
class UserController
{
    public function show(User $user)
    {
        $user->load(['posts', 'comments']);

        return UserResource::make($user);
    }
}
That solves the N+1 query problem, however, it comes with a cost:
You need to load the necessary relationships in every controller, every time.
You need to prepare the resource for the case when they are not loaded.
If you forget the first one, you'll end up with N+1 queries. If you forget the second one, you'll get null and you might end up having bugs on the frontend.
What if the frontend changes and it doesn't show the comments anymore? Then hypothetically you can
remove the comments from the resource. But it is used in 9 other endpoints so you can't really remove it
because you're not sure when it's needed.
These things should be decided by the frontend. It should know exactly what it needs.
Instead of this:
GET /api/v1/users/1
the frontend sends this:
GET /api/v1/users/1?include=posts,comments
The frontend tells the backend that this specific page needs two relationships: posts and comments. Another page might only need the posts:
GET /api/v1/users/1?include=posts
The same resource is used in both cases. It looks like this:
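The exact resource isn't reproduced here; one way it could look, deciding based on the include query parameter (the parsing logic is illustrative):

use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\JsonResource;

class UserResource extends JsonResource
{
    public function toArray(Request $request): array
    {
        $includes = explode(',', $request->input('include', ''));

        return [
            'id' => $this->id,
            'name' => $this->name,
            'posts' => $this->when(
                in_array('posts', $includes),
                fn () => PostResource::collection($this->posts),
            ),
            'comments' => $this->when(
                in_array('comments', $includes),
                fn () => CommentResource::collection($this->comments),
            ),
        ];
    }
}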
If the expression returns true the given relationship is included, otherwise, it's not.
However, it still has N+1 query problems. Every time you load a user there are two additional queries for
posts and comments. Of course, we could use whenLoaded :
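For example, by switching the two attributes to something like this (sketch):

'posts' => PostResource::collection($this->whenLoaded('posts')),
'comments' => CommentResource::collection($this->whenLoaded('comments')),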
But now, we have the same problem. The controller must eager-load the posts relationship.
Fortunately, there's an easy solution with Spatie's laravel-query-builder package and it looks like this:
class UserController
{
    public function index()
    {
        $users = QueryBuilder::for(User::class)
            ->allowedIncludes(['posts', 'comments'])
            ->get();

        return UserResource::collection($users);
    }
}
allowedIncludes respects the include parameter of the request. If it contains posts or comments it eager loads the relationships.
If you rely on eager-loading and you forget a relationship in the controller you'll end up with an N+1
query problem.
If you use QueryBuilder and you forget to allow an include you'll get an exception instead of
performance problems.
For example, if I only allow the comments relationship I get this error:
This seems like a small difference, but believe me, in larger projects N+1 queries are one of the most common and annoying performance problems. HTTP resources are a common source of these kinds of problems. By using the include query param and the laravel-query-builder package you can eliminate lots of N+1 query issues.
Multiple resources
If for some reason you can't or just don't want to use include parameters in your requests (for example, it would be a huge refactor in a large project) you can start using multiple resources for the same model.
Let's say we're working on a real estate site, something like Zillow. We have a list and a detailed view of real
estate. On the list we only need to display 4 attributes:
Location
Price
Number of bedrooms
Area
But on the detailed page, we obviously need to show more attributes than this.
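The detailed resource itself isn't reproduced in this excerpt; as a sketch, it would return far more attributes than the list view needs (the extra fields below are purely illustrative):

use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\JsonResource;

class RealEstateDetailedResource extends JsonResource
{
    public function toArray(Request $request): array
    {
        return [
            'id' => $this->id,
            'address' => $this->address,
            'price' => $this->price,
            'number_of_bedrooms' => $this->number_of_bedrooms,
            'area' => $this->area,
            'picture' => $this->picture,
            'description' => $this->description,
            'year_built' => $this->year_built,
            'lot_size' => $this->lot_size,
            // ...
        ];
    }
}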
And the list goes on. If you take a look at Zillow I think they have at least 20+ more attributes for each
property. If we use only this one resource we waste lots of bandwidth, memory, and CPU just to show the
first 4 attributes on the index page.
This response:
{
    "data": {
        "id": 12379,
        "address": "11 Starrow Drive, Newburgh, NY 12550",
        "price": "$379,900",
        "number_of_bedrooms": "3",
        "area": "1,332",
        "picture": "https://shorturl.at/awDV4"
    }
}
is only 169 bytes and contains everything to replicate Zillow's home page.
So one of the solutions to avoid having large resources when you don't need them is to use many of them.
In this situation, we can create two:
RealEstateMinimalResource
RealEstateDetailedResource
RealEstateMinimalResource can be used on the home page and it would look like this:
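Based on the response shown above, a sketch of the minimal resource could look like this:

use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\JsonResource;

class RealEstateMinimalResource extends JsonResource
{
    public function toArray(Request $request): array
    {
        return [
            'id' => $this->id,
            'address' => $this->address,
            'price' => $this->price,
            'number_of_bedrooms' => $this->number_of_bedrooms,
            'area' => $this->area,
            'picture' => $this->picture,
        ];
    }
}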
RealEstateDetailedResource would be the one I showed you earlier and would be used on the detailed
page of a real estate with lots of information.
This is a pretty easy way to speed up your requests and save some bandwidth for your users.
Pagination
First of all, use pagination whenever it's possible. I'm not going to go through the basics because it's an easy
concept and Laravel has awesome documentation.
Secondly, did you know that there's a pagination technique that can be 400x faster than the one you're
probably using?
The Product::paginate(50) method returns a LengthAwarePaginator and executes a query along the lines of select * from products limit 50 offset 0 (plus a separate count query for the paginator).
If you send a request such as 127.0.0.1:8000/api/products?page=1500 the query becomes select * from products limit 50 offset 74950 .
While this pagination is simple and works well for smaller projects the problem is that MySQL has to go through 75,000 records and discard the first 74,950.
So the query results in a full table scan, meaning that MySQL has to read everything from the disk, loop through the first 74,950 records, discard them, and then return rows #74,951 to #75,000.
From this mechanism, you can quickly see that the more records you have the worse it gets. The higher the
requested page number is the worse it gets because MySQL needs to process more and more records. It
means two things:
Simple pagination does not perform particularly well with large datasets. It's hard to define "large" but
probably something like 100,000+
It's pretty good for smaller datasets. If you have a few thousand rows it's not gonna be a problem at all.
Cursor pagination
What about a query such as select * from products where id > 74950 order by id limit 50 ?
It returns 50 products from #74,951 to #75,000 in an ordered way, limiting the number of results by using a where expression on the id column and a limit expression.
Now MySQL can use the primary key index and it can perform a range query. Meaning, it can use the index
to perform a B-TREE traversal to find the starting point of a range and scan records from that point on. It
performs significantly fewer I/O operations than a full table scan.
However, when using this kind of pagination we don't directly use IDs in the query. Instead, we use MySQL
cursors. This is why this technique is called cursor pagination or relative cursor pagination. A cursor is an
object that allows for sequential processing of query results, row by row. It is a reference point that
identifies a specific position in the result set. It can be a unique ID, a timestamp, or any sortable value that
allows the database to determine where to continue fetching results. It's basically a pointer that points to a
specific row. It's important that the column has to be sortable (such as an ID) so the cursor knows how to
move forward in the result set.
OPEN cur;
When the fetch statement is executed the cursor runs the query and fetches the first row. Then the cursor
is set to that specific row. If we run fetch again the next row is going to be retrieved. So typically a cursor is
used inside a loop:
read_loop: LOOP
FETCH cur INTO product_id, product_title;
Of course, if there are no results we need to leave the loop and close the cursor but that's not important
now. The important thing is that a cursor is a pointer to a row and we can fetch data row by row. By the
way, this sounds like a pretty good technique to implement infinite scroll just as social media sites do.
class ProductController
{
    public function index()
    {
        return ProductResource::collection(Product::cursorPaginate(50));
    }
}
When you use the cursorPaginate method, instead of page numbers, Laravel returns a cursor ID:
{
    "path": "http://127.0.0.1:8000/api/products",
    "per_page": 50,
    "next_cursor": "eyJwcm9kdWN0cy5pZCI6NTAsIl9wb2ludHNUb05leHRJdGVtcyI6dHJ1ZX0",
    "prev_cursor": null
}
And then you can get the next page by sending this cursor ID:
{
    "first": null,
    "last": null,
    "prev": null,
    "next": "http://127.0.0.1:8000/api/products?cursor=eyJwcm9kdWN0cy5pZCI6NTAsIl9wb2ludHNUb05leHRJdGVtcyI6dHJ1ZX0"
}
Let's compare the two queries. Here's the one with offset :
It's 2.7ms.
The difference doesn't seem that much because these are pretty low numbers but there's an 11x difference between the two queries. I used a demo database with just 100,000 records.
Let's see what happens if the database contains 700,000 rows and we want to get products starting at #500,000. This is the offset query:
Now it's red in Telescope and it took 262ms. We spend most of our time in the HTTP layer, and looking at HTTP requests and responses 262ms is pretty fast, but at the database level, it's very, very slow.
If we check the same query with an offset value of 50,000 instead of 500,000 the result is 41ms:
So here's the proof: the more you paginate and the more records you have, the worse your database performs.
Okay, I know these values (262ms vs 3.52ms) are so small that they sound abstract and negligible. Let's put them into context!
Imagine for a minute that you have a MySQL server and there is only one connection to your Laravel app (which is not recommended and unlikely in the real world). So every user uses the same MySQL connection and they have to wait for each other's queries to finish.
If your home page gets 100 visitors and you use the offset query the total execution time is 100 * 262ms or 26.2s. The 100th user has to wait almost half a minute.
If you use the cursor query the total execution time is 100 * 3.52ms or 0.35 seconds.
Of course, in the real world, we have more than one connection but the point remains the same. A 74 times
difference is huge.
Shopify experienced the same problem. They ran into pretty slow queries and even complete database timeouts because of offset pagination. In their article, they explain how they managed to achieve a 400x improvement by adopting cursor-based pagination.
Database indexing
My goal in this chapter is to give you the last indexing tutorial you'll ever need. Please do not skip the following
pages.
Theory
This is one of the most important topics to understand, in my opinion. No matter what kind of application you're working on, there's a good chance it has a database. So it's really important to understand what happens under the hood and how indexes actually work. Because of that, this chapter starts with a little bit of theory about the following data structures:
Arrays
Linked lists
Binary trees
B-Trees
B+ Trees
Arrays
They are one of the oldest data structures. We all use arrays on a daily basis so I won't go into details, but here are some of their properties from a performance point of view:
An array is a fixed-sized, contiguous data structure. The array itself is a pointer to a memory address x, and the element at index i lives at the address x + (sizeof(t) * i) where
sizeof(t) is the size of the data type. For example, an int takes up 8 bytes
The contiguous memory layout has an interesting implication: your computer has to shift the elements when inserting or deleting an item. This is why mutating an array is, in most cases, an O(n) operation.
Since it's a linear data structure with subsequent elements and memory addresses searching an element is
also an O(n) operation. You need to loop through all the elements until you find what you need. Of course,
you can use binary search if the array is sorted. Binary search is an O(log N) operation and quicksort is an
O(N * log N) one. The problem is that you need to sort the array every single time you want to find an
element. Or you need to keep it sorted all the time which makes inserts and deletes even worse.
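To make the O(log N) claim concrete, here's a minimal binary search sketch over a sorted PHP array:

function binarySearch(array $sorted, int $needle): ?int
{
    $low = 0;
    $high = count($sorted) - 1;

    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);

        if ($sorted[$mid] === $needle) {
            return $mid; // found: return the index
        }

        // halve the search space on every iteration -> O(log n)
        if ($sorted[$mid] < $needle) {
            $low = $mid + 1;
        } else {
            $high = $mid - 1;
        }
    }

    return null; // not found
}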
What arrays are really good at is accessing random elements. It's an O(1) operation since all PHP needs to do is calculate the memory address based on the index.
Linked list
Since arrays have such bad performance when it comes to inserting and deleting elements, engineers came up with linked lists to solve these problems.
A linked list is a logical collection of elements scattered randomly in memory. They are connected only via pointers: each item has a pointer to the next one. There's another variation called a doubly linked list where each element has two pointers: one for the previous and one for the next item.
The memory addresses are not subsequent. This has some interesting implications:
Since a linked list is not a coherent structure in memory, inserts always have a better performance
compared to an array. PHP doesn't need to shift elements. It only needs to update pointers in nodes.
A linked list is an excellent choice when you need to insert and delete elements frequently. In most cases, it
takes considerably less time and memory.
Binary tree
The term binary tree can be misleading since it has lots of special versions. However, a simple binary tree means a tree where every node has two or fewer children.
The only important property of this tree is that each node has two or fewer children.
Now, let's think about how many steps it takes to traverse a binary tree. For example, in the first tree, how many steps does it take to traverse from the root node to one of the leaf nodes (9, 5, 6, 5)? It takes two steps. If I want to go to the left-most node (9) it'd look like this (we're already at the root node):
Now let's do the same with the second tree. How many steps does it take to go to the leaf node (to 43,
starting from the root)? 6 steps.
Both trees have 7 nodes. Using the first one takes only 2 steps to traverse to one of the leaf nodes but using
the second one takes 6 steps. So the number of steps is not a function of the number of nodes but the
height of the tree which is 2 in the first one and 6 in the second one. We don't count the root node.
Both of these trees have a name. The first one is a complete tree meaning every node has exactly two
children. The second one is a degenerative tree meaning each parent has only one child. These are the two
ends of the same spectrum. The first one is perfect and the other one is useless.
In a binary tree, density is the key. The goal is to represent the maximum number of nodes in the smallest
depth binary tree possible.
The minimum height of a binary tree is roughly log n which is shown in the first picture. It has 7 elements and the height is 2.
The maximum height possible is n-1 which is shown in the second picture. 7 elements with a height of 6.
From these observations, we can conclude that traversing a binary tree is an O(h) operation where h is the height of the tree. In a well-balanced tree the height is about log n, so traversal becomes O(log n).
To put it in context, if you have a tree with 100,000,000 elements and your CPU can run 100,000,000 operations per second:
Scanning the elements one by one (as in an array or a linked list) takes 100,000,000 operations, so about 1 second.
Traversing a well-balanced tree takes about 26 operations (log2 of 100,000,000), so a fraction of a microsecond.
There's a 3,846,153x difference between the two, so engineers came up with the following conclusion: if a tree is structured well, you can traverse it in O(log n) time which is far better than arrays or linked lists.
Binary search tree (BST)
A binary search tree is a binary tree with two extra constraints:
Each node has a left child that is less than or equal to itself
Each node has a right child that is greater than itself
The fact that the tree is ordered makes it pretty easy to search elements. For example, this is how we can find 5.
Eight is the starting point. Is it greater than 5? Yes, so we need to continue in the left subtree.
Is 4 greater than 5? Nope. Each node has a right child that is greater than itself. So we go right.
Is 5 equal to 5? Yes.
We found a leaf node in just 3 steps. The height of the tree is 3, and the total number of elements is 9. This is the same thing we discussed earlier. The cost of the search is O(log N).
So if we take a usual binary tree and add two constraints to it so that it is ordered at all times, we have O(log N) search.
Unfortunately, the constraints of a BST don't tell us anything about balance. So this is also a perfectly fine BST:
Each node has two or fewer children. The left child is always less than or equal to the parent. The right child
is always greater than the parent. But the right side of the tree is very unbalanced. If you want to find the
number 21 (the bottom node in the right subtree) it becomes an O(N) operation.
Indexing in the early days
For simplicity, let's assume a row takes 128 bytes to store on the disk. When you read something from the disk the smallest unit possible is 1 block. You cannot just randomly read 1 bit of information. The OS will return the whole block. For this example, we assume a block is 512 bytes. So we can fit 4 records (4 * 128B) into one block (512B). If we have 100 records we need 25 blocks.
If you run the following query against this table (assuming no index, no PK):
select *
from users
where id = 50
It reads the first block from the disk that contains row #1 - row #4. If the row is not there, it reads the next block, and so on.
In the worst-case scenario, it executes 25 I/O operations scanning the table block-by-block. This is called a full table scan. It's slow. So engineers invented indexing.
Single-level indexing
As you can see, the problem was the size and the number of I/O operations. Can we reduce it by introducing
some kind of index? Some kind of secondary table that is smaller and helps reduce I/O operations? Yes, we
can.
The index table stores every record that can be found in users . They both have 100 rows. The main benefit
is that the index is small. It only holds an ID that is equivalent to the ID in the users table and a pointer.
This pointer points to the row on the disk. It's some kind of internal value with a block address or something
like that. How big is this index table?
Let's assume that both the ID and ptr columns take up 8 bytes of space. So a record's size in the index table
is 16 bytes.
One block can hold 32 index entries (512B / 16B), so only 4 blocks are needed to store the entire index on disk. To store the entire table the number of blocks is 25. It's a 6x difference.
select *
from users
where id = 50
MySQL scans the index block-by-block, which takes at most 4 I/O operations. When it finds #50 in the index, it queries the table based on the pointer, which is another I/O operation.
In the worst-case scenario, it executes 5 I/O operations. Without the index table, it was 25. It's a 5x performance improvement. Just by introducing a "secondary table."
Multi-level indexing
An index table made things much better, however, the main issue remained the same: size and I/O operations. Now, imagine that the original users table contains 1,000 records instead of 100. The I/O numbers would look like this: 250 blocks for the table instead of 25, and 40 blocks for the index instead of 4. Everything is 10x larger, of course. So engineers tried to divide the problem even more by chunking the size into smaller pieces, and they invented multi-level indexes. We said that you can store 32 entries from the index table in a single block. What if we had a new index where every entry points to an entire block in the index table?
Each entry in the second-level index points to a range of records in the first-level index:
Entry #1 points to index rows #1 - #32
Entry #2 points to index rows #33 - #64
etc
Each row in L2 points to a chunk of 32 rows in L1 because that's how many records can fit into one block of
disk.
If the L1 index can be stored using 40 blocks (as discussed earlier), then L2 can be stored using 40/32, so roughly 2 blocks. It's because in L2 every record points to a chunk of 32 records in L1. So L1 is 32x bigger than L2. 1,000 rows in L1 is 32 rows in L2.
select *
from users
where id = 50
Now we can find a specific row by reading just 4 blocks from the disk: at most 2 blocks to scan the L2 index, 1 block from L1, and 1 block from the table itself.
They were able to achieve a 62x performance improvement by introducing another layer.
B-Tree
In 1970, two gentlemen at Boeing invented B-Trees which was a game-changer in databases. This is the era
when Unix timestamps looked like this: 1 If you wanted to query the first quarter's sales, you would write
this: between 0 and 7775999 . Black Sabbath released Paranoid. Good times.
What does the B stand for? They didn't specify it, but often people call them "balanced" trees.
A B-Tree is a specialized version of an M-way tree. "What's an M-way tree?" Glad you asked!
Each node holds more than one value. To be precise, a node can have up to m children and m-1 values (or keys).
The keys in children nodes are also ordered compared to the parent node (such as 10 being on the left side of 20 and 30 being on the right side).
Since it's a 3-way tree, a node can have a maximum of 3 children and can hold up to two values.
The problem, however, is that there are no rules or constraints for insertion or deletion. This means you can do whatever you want, and m-way trees can become unbalanced just as we saw with binary search trees. If a tree is unbalanced, searching becomes O(n) which is very bad for databases.
So B-Trees are an extension of m-way search trees. They define the following constraints:
I don't know how someone can be that smart but these three simple rules make B-trees always at least half
full, have few levels, and remain perfectly balanced.
There's a B-Tree visualizer website where you can see how insertion and deletion are handled and how the
tree remains perfectly balanced at all times.
Of course, in the case of a database, every node has a pointer to the actual record on disk just as we
discussed earlier.
The next important thing is this: MySQL does not use a standard B-Tree. Even though we use the word
BTREE when creating an index it's actually a B+ Tree. It is stated in the documentation:
The use of the term B-tree is intended as a reference to the general class of index design. B-tree
structures used by MySQL storage engines may be regarded as variants due to sophistications not
present in a classic B-tree design. - MySQL Docs
He built multiple forks of MySQL, for example, Twitter MySQL, he was the head of Cloud SQL at Google and
worked on the internals of MySQL and InnoDB.
Problems with B-Trees
There are two issues with a B-Tree. The first one is range queries. Imagine a query such as this one:
select *
from users
where id in (1,2,3,4,5)
To collect IDs 1 through 5, the engine has to jump back and forth between the nodes of the example tree:
From 4 to 2
From 2 to 1
From 1 back to 2
From 2 to 3
From 3 to 2
From 2 to 4
From 4 to 6
From 6 to 5
The other problem is wasting space. There's one thing I didn't mention so far: in this example, only the ID is present in the tree because this is a primary key index. But of course, in real life, we add indexes to other columns such as usernames, created_at, other dates, and so on. These values are also stored in the tree.
An index has the same number of elements as the table so its size can be huge if the table is big enough.
This makes a B-Tree less optimal to load into memory.
B+ Trees
As the available size of the memory grew in servers, developers wanted to load the index into memory to
achieve really good performance. B-Trees are amazing, but as we discussed they have two problems: size
and range queries.
Surprisingly enough, one simple property of a B-Tree can lead us to a solution: most nodes are leaf nodes.
The tree above contains 15 nodes and 9 of them are leaves. This is 60%.
Sometime around 1973, someone probably at IBM came up with the idea of a B+ Tree:
This tree contains the same numbers from 1 to 15. But it's considerably bigger than the previous B-Tree,
right?
Every value is present as a leaf node. At the bottom of the tree, you can see every value from 1 to 15
Some nodes are duplicated. For example, number 2 is present twice on the left side. Every node that is
not a leaf node in a B-Tree is duplicated in a B+ Tree (since they are also inserted as leaf nodes)
Leaf nodes form a linked list. This is why you can see arrows between them.
With the linked list, the range query problem is solved. Given the same query:
select *
from users
where id in (1,2,3,4,5)
Once you have found the first leaf node, you can traverse the linked list since it's ordered. Now, in this
specific example, the number of operations is the same as before, but in real life, we don't have a tree of 15
but instead 150,000 elements. In these cases, linked list traversal is way better.
So the range query problem is solved. But how does an even bigger tree help reduce the size?
The trick is that routing nodes do not contain values. They don't hold the usernames, the timestamps, etc. They only contain pointers, so they are really small items. All the data is stored at the bottom level; only leaf nodes contain our data.
Only the routing nodes are loaded into memory, not the leaf nodes. As weird as it sounds at first, according to PostgreSQL the routing nodes take up only 1% of the overall size of the tree. Leaf nodes are the remaining 99%:
Each internal page (comment: they are the routing nodes) contains tuples (comment: MySQL stores
pointers to rows) that point to the next level down in the tree. Typically, over 99% of all pages are leaf
pages. - PostgreSQL Docs
So database engines typically keep only the routing nodes in memory. They can traverse them to find the necessary leaf nodes that contain the actual data. If the query doesn't need other columns, it can essentially be served using only the index. If the query needs other columns as well, MySQL reads them from the disk using the pointers in the leaf nodes.
I know this was a long introduction but in my opinion, this is the bare minimum we should know about
indexes. Here are some closing thoughts:
Both B-Trees and B+ trees have O(log n) time complexity for search, insert, and delete but as we've
seen range queries perform better in a B+ tree.
The nodes in real indexes do not contain 3 or 4 keys as in these examples. They contain thousands of
them. To be precise, a node matches the page size in your OS. This is a standard practice in databases.
Here you can see in MySQL's source code documentation that the btr_get_size function, for
example, returns the size of the index expressed as the number of pages. btr_ stands for btree .
Interestingly enough MongoDB uses B-Trees instead of B+ Trees as stated in the documentation.
Probably this is why Discord moved to Cassandra. They wrote this on their blog:
Around November 2015, we reached 100 million stored messages and at this time we started to see
the expected issues appearing: the data and the index could no longer fit in RAM and latencies
started to become unpredictable. It was time to migrate to a database more suited to the task. -
Discord Blog
Now, let's apply what we've learned. When the following command is executed:
we know that the leaf nodes of the B+ Tree will contain last_name and first_name. This fact leads us to the most important rule of indexing: you create an index for a specific query, or a few queries. An index is not a generic thing that magically makes your whole application faster.
So it's easy to find user #100. We also know that last_name and first_name are present in the leaf nodes, so they are already loaded from the disk together with the index. If the query only needs these two columns then there's no need to load the entire row from the disk.
Since the query needs only these two columns the DB engine won't touch the pointer and won't execute
extra I/O operations.
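To make this concrete, here is a minimal sketch (Laravel query builder; the table and column names come from the example above, everything else is illustrative) of a query such an index can cover entirely, and one it cannot:
use Illuminate\Support\Facades\DB;

// Covered: both selected columns live in the (last_name, first_name) index,
// so MySQL can answer this from the index alone ("Using index").
$names = DB::table('users')
    ->where('last_name', 'Smith')
    ->get(['last_name', 'first_name']);

// Not covered: job_title is not part of the index, so MySQL has to follow the
// pointers in the leaf nodes and read the full rows from disk.
$people = DB::table('users')
    ->where('last_name', 'Smith')
    ->get(['last_name', 'first_name', 'job_title']);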
The index cannot be used anymore. Or at least not in the same way with the same efficiency. In this case,
job_title cannot be found in the tree so MySQL needs to run I/O operations.
Access type which can be found in the type column of the output is the most important part of explain . It
tells us how MySQL will or will not use the indexes in the database.
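By the way, you don't need any special tool to get this output. A minimal sketch, assuming a default MySQL connection, of running it straight from a Laravel app:
use Illuminate\Support\Facades\DB;

// Returns one row per table in the query; each row contains the type,
// possible_keys, key, rows, filtered and Extra columns discussed below.
$plan = DB::select('explain select * from users where id = 1');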
We'll use this simple table to explore how MySQL runs queries and uses indexes:
const
Given the following query:
select *
from users
where id = 1
This is the output of explain select parsed with Tobias Petry's MySQL Explain tool.
In the type column we can see const . This type is used when MySQL is able to see that only one row can
be selected based on a unique key index. It is the most efficient type as it involves only a single constant
value lookup. In the possible_keys column, we can see what indexes MySQL considered to use. In the
key column we can see the index that was actually used. In both cases, there's the PRIMARY which is
created by MySQL, based on the unique, auto-increment ID column.
Quick note: if you use UUIDs as your primary keys, your index is going to be huge. Storing a bigint
unsigned requires 8 bytes, while storing a char(32) column requires 32 bytes. It's a 4x difference in size. A
bigger index requires more space, more memory, and is slower to search in.
So the const type is used because of the where id = 1 clause. It's super fast, and works like this:
In just O(log N) time it's able to find the node and it loads the data from the disk since the query uses
select *
range
Given this query:
select *
from users
where id in (1,2)
It's a range type. This means that MySQL can traverse the B+ tree and find the first node that satisfies our query. From that point on, it can traverse the linked list (leaf nodes) to get all the other nodes as well. It's a great access type since in just O(log N) time it's able to find the first node of the range. And from that point on it only needs to inspect x number of elements where x is the number of IDs in the in (1,2) clause. It is often used for queries with range conditions such as between, in, etc.
So the range access type can be pretty fast. However, the database still needs to perform I/O operations
after it finds the nodes.
range (again)
Let's make one small change in the previous query:
select id
from users
where id in (1,2)
It's the same range type. However, there's an important new item in the extra column: Using index. It means using only the index. Since the query only needs the id column (not *) the index is a covering index, meaning it contains everything the query needs. It covers the query.
The visual representation looks almost the same but without the extra I/O lines on the bottom:
index
The new query is:
select id
from users
where id != 1
It's an index type. In this case, MySQL cannot identify a range or a single row that needs to be returned.
The best it can do is to scan the entire index. This sounds good, or at least better than a full table scan but
it's still an O(n) operation. Generally speaking, it's not that bad, however, it can cause problems if n (your
table) is large enough.
When Using index is present in the Extra column it's a bit better since at least MySQL doesn't need to perform extra I/O operations.
But when Using index is not present it is, generally speaking, a slower query. According to MySQL it is as bad as a full table scan:
The index join type is the same as ALL , except that the index tree is scanned. This occurs in two
ways:
If the index is a covering index for the queries and can be used to satisfy all data required from
the table, only the index tree is scanned. In this case, the Extra column says Using index . An
index-only scan usually is faster than ALL because the size of the index usually is smaller than
the table data.
A full table scan is performed using reads from the index to look up data rows in index order.
Uses index does not appear in the Extra column.
MySQL Docs
Of course, 0001 is not selected but it is scanned and checked against the filter where id != 1
ALL
Finally, the last query is this:
It's an ALL type. I'd like to quote from Kai Sassnowski's awesome video: "avoid at all costs."
This type of query runs a full table scan so MySQL doesn't use the index at all. In this example, it's because
we filter based on the first_name column which is not part of any index. MySQL essentially runs a for loop
and scans every row until it finds John.
If you want to try similar queries and check the explain output make sure your table contains at least a
few hundred rows. Otherwise, MySQL might choose to run a full table scan because the table is so small
that the full scan is actually faster than deciding between different optimization strategies.
Select *
From the examples above we can come to an observation: select * is usually not a great thing to do. It
can be the difference between traversing a B+ Tree and executing thousands of extra I/O operations. Here
are some things about select * :
Index usage. As we discovered it may prevent the optimizer from utilizing indexes efficiently.
Network traffic. MySQL connections are simple TCP (network) connections. When you retrieve every
column vs just a few ones the difference can be big in size which makes TCP connections heavier and
slower.
Resource consumption. Fetching everything from disk just simply uses more CPU and memory as
well. The worst case scenario is when you don't need all the data and the query (without select * )
could have been served using only the index. In this case, the difference is orders of magnitude.
So we can say, in general, it's a good thing to avoid select * queries and fetch only the columns you really need. Of course, one of the disadvantages is this:
    $user->notify(new AbandonedOrdersNotification($orders));
}
In the getAbandonedOrders method you only select the order ID and the order items' names that are being
used in the Notification. The possible bug is that you need to know that the $orders collection contains
only specific columns. You cannot use $order->total for example because it's not loaded. These
properties will default to null . So if you have a nullable column you might think everything is great, you
just have an order where column x happens to be null , but in fact, the column is not even loaded.
Composite indexes
There's another topic I want to cover before jumping into more complicated queries and indexes. Let's talk
about composite indexes.
Let's assume in this example we want to query orders by users in a given time period such as this:
select id
from orders
where user_id = 3001
and created_at between "2024-03-16 00:00:00" and "2024-04-16 23:59:59"
In my case, the table has 6,000 rows and 1,000 of them belong to user #3001.
It all looks good. It's a range type and Using index is present in the Extra column. Well, take a look at the column called Rows. It says 5,513. Essentially, for some reason, MySQL thinks that it needs to look at every
single record in the database in order to execute the given query. It seems a bit weird especially since the
constraints in the query are quite strict:
If user #3001 has 1,000 records overall then the date filter should narrow it down even more, right? In fact,
this query returns only 513 rows. So why does MySQL think that it needs to scan every node in the index?
The answer is the order of the columns in the index. We already discussed that indexes are sorted. The
same thing is true if you have a composite index but now it is ordered by two columns. First, created_at
and then user_id .
If we imagine the index as a table this is what the order looks like:
created_at user_id
2024-03-15 1
2024-03-16 1
2024-03-16 2
2024-03-16 3
2024-03-17 1
User IDs are sorted only in relation to the created_at dates. Just look at the value 1 in this table: it's all over the place. They are effectively unordered from this query's perspective, so MySQL:
Finds every node in the index between 2024-03-16 and 2024-04-16, which happens to be 5,500 records (90% of the table)
Traverses through them and discards the ones where the user ID is not equal to 3001
Even though it's a range query, in practice it is a full index scan ( index type) which is the second worst
query type.
This is a much better execution plan because the database engine only wants to scan 513 rows. It's 10% of
the previous one. It's a 10x improvement. All of that is because of the column order in the index.
user_id created_at
1 2024-03-15
1 2024-03-16
1 2024-03-17
2 2024-03-16
3 2024-03-16
With this ordering, MySQL is able to select the following range: from row #1 to row #3 (which in the real
example is 500+ rows) and then perform the where filter on the created_at column.
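To make the difference tangible, here is a hedged sketch of the two composite indexes as Laravel migrations (the text only tells us the column order; the migration form and index names are illustrative):
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// The problematic order: the index is sorted by created_at first,
// so user_id values are scattered and the range covers ~90% of the table.
Schema::table('orders', function (Blueprint $table) {
    $table->index(['created_at', 'user_id']);
});

// The better fit for this query: user_id first, so MySQL jumps straight to
// user #3001 and then range-scans the ordered created_at values within that user.
Schema::table('orders', function (Blueprint $table) {
    $table->index(['user_id', 'created_at']);
});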
Cardinality
Cardinality means how unique your dataset is. How many unique values are there? For example:
In one month there are 2,678,400 unique timestamps (if we count only seconds). In this one-month
period, the created_at column has a cardinality of 2.6m
There's a good chance you don't have 2.6m users but less. The cardinality of the user_id is x where
x is the number of users. Maybe a few thousand, maybe tens of thousands.
If there are fewer users than timestamps in this example, then user_id has much fewer unique values. It
has a lower cardinality. The cardinality of a column determines the selectivity of the query. The more
selective a query is the fewer rows it returns. Or in human-readable form: fewer unique values = fewer
results = faster queries.
This is exactly what happened with the composite index in the previous example.
The index was ordered based on a column that has much fewer unique values
The result was that the optimizer was able to fetch a range of just 500 rows from the index
You should be able to exploit the fact that cardinality matters in some cases. One of the best examples is
when a column can have two or three values. Such as a bool, or a string label, for example, a status
column in a posts table where possible values are published , draft , or archived . These kinds of
columns can be excellent candidates for indexes and effective queries.
In recent years, I worked on an application that had some social features. The app is being used in larger
companies and admins can post news, events, etc to users. This is one of the most used features and is a
good example of database indexing and queries.
The table itself is pretty simple and self-explanatory but there are already some important things:
status is going to store three values: draft , published , and archived . If you know that the longest
word you are about to store is 9 characters, don't use a default varchar(255) column. If you use a
column in an index the size can have a big impact on performance. And if you have a status column
there's a good chance it's a candidate for an index.
content is a text column. In lots of projects, longtext somehow became the default for storing something like the content of a post. Do you know what the size of a longtext column is? It's 4,294,967,295 bytes. That's 4 billion characters. You probably don't need that much. A text column can
hold 65,535 characters.
The same goes for title . Usually, a column such as title , or name always ends up in an index. We
can save some space and set it to 150 . It's probably enough for users. If not, we can always resize.
Maybe now you say: "well, these are pretty minor stuff." Yeah, they are in some cases. In the Building an
analytics platform part I'll show you what these minor changes can do when you have a table with only a
few million records. TL;DR: save some space/memory/CPU and use the right data types.
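As a reference, here's a hedged sketch of what such a migration could look like (the columns come from the description above; names and exact sizes are illustrative):
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::create('posts', function (Blueprint $table) {
    $table->id();
    $table->foreignId('user_id')->constrained();
    $table->string('title', 150);          // 150 is probably enough; we can always resize
    $table->text('content');               // text (65,535 characters) instead of longtext (4 billion)
    $table->string('status', 9);           // 'published' is the longest value (9 characters)
    $table->timestamp('publish_at')->nullable();
    $table->timestamps();
});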
There are three main features to cover:
Listing posts by status. Admins (the owner of a post) want to see their own posts by status.
Feed
Publishing posts. There's a publish_at column in the table. It can hold past or future dates. If it's a
future one it means we need to automatically publish it. So in a real-world app, I would add a
background job that queries "publishable" posts and publishes them. We need a query for that.
In these examples, we'll only discuss raw MySQL queries. Remember, the main goal of this chapter is MySQL
and MySQL indexes, not some fancy Eloquent function.
select *
from posts
where user_id = 13268
and status = 'published'
Of course, the table has no indexes right now so it's going to do a full table scan. We can easily identify the
right index for this query:
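The exact statement isn't shown here, but based on the description below (the lower-cardinality column first, then user_id) it would be something along these lines as a migration:
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('posts', function (Blueprint $table) {
    // status has a handful of distinct values, user_id has thousands
    $table->index(['status', 'user_id']);
});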
After so much theory I think this index is self-explanatory. The column with the lower cardinality goes first.
As we discussed earlier, it helps in most cases (however, in some cases it doesn't make a big difference).
After adding this index the execution plan looks like this:
It's a ref type. We haven't seen that one before. I had a hard time understanding what it was based on the
MySQL Docs. Fortunately, PlanetScale's definition by Aaron Francis is much easier to understand:
The ref access type is used when the query includes an indexed column that is being matched by an
equality operator. If MySQL can locate the necessary rows based on the index, it can avoid scanning
the entire table, speeding up the query considerably. - Aaron Francis
As far as I know, it has similar performance to range. The optimizer is able to identify a range of records
(390 in this case) based on equality operators. A range access type occurs when we use a range operator
such as between , or > , etc.
So it is an efficient query that won't cause problems. We know from the select * part that MySQL will perform lots of I/O operations to get every column from the disk. We can optimize this a bit by selecting only the columns the client actually needs. For example, we can skip status, since the user requested posts by status so he/she must know it (hopefully).
Using select col1, col2 over select * has two interesting properties:
It's an optimization since MySQL returns less data over the TCP connection
But on the I/O level it doesn't matter at all. As we discovered earlier, the OS always reads an entire
block of data from the disk. This block contains multiple rows. Each row contains every column. So
unfortunately we cannot save I/O operations with this technique. Even if you have a table with 10
columns and your query looks like this:
select col1
from my_table
where id = 1
MySQL reads probably hundreds of rows with 10 columns from the disk (given you don't have an index on
col1 ).
Back to the query. The next feature request is to add a date filter:
Only two things changed: the filtered column went down from 100 to 11.11 . And the extra column
contains Using where . What does it mean?
MySQL uses the index to get 390 rows. It uses the status and the user_id columns
These 390 rows have to be checked against the publish_at between ... expression
MySQL estimates that only 11.11% of the rows will remain in the result. This means about 43 records
This is also the reason why Using where appeared in the extra column.
It all makes sense, right? However, usually, a lower filtered value indicates a slower query. Just think
about it. If MySQL had to drop 88.89% of rows it means that the index is not very good for this query. In a
perfect world, we would be able to serve the request using only an index and save the time to filter out 88%
of records.
To test this theory we can add the publish_at column to the index:
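The exact statement isn't shown here either; a hedged sketch, keeping the column order that shows up in the skip scan discussion later in this chapter, might look like this:
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('posts', function (Blueprint $table) {
    $table->index(['status', 'user_id', 'publish_at']);
});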
For this specific example, I seeded 177k posts from which 166K belong to user #13268. So the rows values
will be different:
As expected with the publish_at column being included in the index MySQL doesn't need to run extra
filters. The index can satisfy the query.
105ms when publish_at is not part of the index and MySQL needs to execute an extra filter
84ms when publish_at is part of the index and every filter can be served from the index
These numbers are quite small, however there's a ~20% difference between the two queries.
While the filtered column is certainly not the most important part of explain it can be a good indicator
for a better index. However, focusing on the access type is much more important, in my opinion.
The next interesting thing is Using index condition in the extra column. It means that MySQL was able
to use the index to run every filter we have in the query. Previously, it had to read full table rows because
publish_at was not part of the index. Now, it is part of the index, so it can be used to run the necessary
filters. What's the difference between Using index and Using index condition ?
Using index means that MySQL only used the index to satisfy the whole query.
Using index condition means that MySQL used the index to run the filters. However, it still had to
read from disk because in the select statement, we have created_at and title and they are not part
of the index.
Why not add these columns to the index as well? Because that index would contain almost the whole table and
it would consume so much memory and disk space that it can be problematic.
For example, if I add created_at and title to the index the query becomes a Using index one, and the
execution time is 77ms. Previously it was 84ms. So the time improvement is only 8.33% and I can guarantee
you that adding another timestamp and a varchar(255) to the index causes a memory consumption
increase that is larger than 8%.
Also, this query isn't worth optimizing anymore, I think. It's very unlikely that a single user has 166k posts in
an application. But even if that's the case:
It's fine.
Feed
This feature is not too realistic but it demonstrates another cool access type. Let's say we need to show a
feed to users that has no filter only a date. In a company environment, it's not realistic since you always
have some groups, visibilities, etc. But for a minute, assume this is a Facebook group and as one of the
members of the group, you'll see every post in a given time period.
select id
from posts
where publish_at between "2024-04-10" and "2024-04-17"
and status = "archived"
It's a range query as it should be, but take a look at the extra column. It says Using index for skip
scan . This is called a Skip Scan Range access method and it's beautiful.
1. Range scan
As we discussed earlier, in a range scan MySQL identifies a range of nodes from the tree. It uses only the
index to achieve that.
However, in this case, it cannot be done. The index can be visualized as a table such as this one:
status user_id publish_at
archived 1 2024-04-17
archived 2 2024-03-28
archived 2 2024-04-01
draft 1 2024-03-01
draft 6 2024-02-28
published 4 2024-04-07
Let's say we're looking for archived posts published between 2024-04-10 and 2024-04-17 (the range from the original query). First, MySQL traverses the tree to get only
archived rows:
archived 1 2024-04-17
archived 2 2024-03-28
archived 2 2024-04-01
And now what? Now it gets a subset of nodes ordered by user IDs. This is because user_id is the second
column in the index. The query, however, doesn't have a filter on user_id only on publish_at . This subset
of nodes is ordered by user_id . But we don't care about it right now. If we take it out from the table it
looks like this:
status publish_at
archived 2024-04-17
archived 2024-03-28
archived 2024-04-01
Now you can see what the problem is. From this perspective, timestamps are in a completely random order.
This means that MySQL cannot just traverse the tree to perform a binary search on the nodes because it's
not ordered by publish_at .
So MySQL doesn't have a choice other than performing a full index scan using the linked list at the bottom.
But it's such a waste. Using the subset shown above, there has to be a better way to select the required rows.
Fortunately, there is.
archived 1 2024-04-17 1
archived 2 2024-03-28 2
archived 2 2024-04-01 2
MySQL takes the first distinct value of user_id (1 in this example) and constructs a range based on this user_id and the date filter from the original query. So it will construct a lookup that can be imagined like this:
select id
from posts
where status = "archived"
and user_id = 1
and publish_at between "2024-04-10" and "2024-04-17"
In the real world, of course, it's not a MySQL query but an index lookup. This imaginary query examines Row
#1 which satisfies the filters. So we can skip to the next unique value of user_id and repeat the process:
Construct a range based on this user_id and the date filter from the original query:
select id
from posts
where status = "archived"
and user_id = 2
and publish_at between "2024-04-10" and "2024-04-17"
This imaginary query will examine Row #2 and Row #3. Both failed the filters so they got rejected from the
results.
The lookup is done in this specific example and only Row #1 made it.
As you can imagine, the cost of this operation is much lower than a full index scan which is O(n) . The skip
scan range requires O(m) iterations where m is the number of distinct users that have archived posts,
which cannot be larger than the number of all users.
Once again, cardinality and the order of columns matter a lot in a composite index! Be aware of these
things.
Publishing posts
And finally, let's write some PHP/Laravel code! This example is not strictly related to database indexing but
it's an interesting one so I left it here.
The next feature is publishing scheduled posts. To do that we need to query all publishable posts, where publishable means: the status is still draft and the publish_at date is in the past (or right now).
Then we loop through the posts, mark them as published, and send notifications to the audience. I added a new relationship to users. Each user can have subscribers that are stored in a subscriptions table:
author_id subscriber_id
1 10
1 11
2 23
Author #2 has two subscribers in this example. Both columns are foreign keys to the users table. There are
two relationships in the User model:
namespace App\Models;
It's an n-n relationship where each author (user) can have many subscribers (users) and each subscriber
(user) can subscribe to many authors (users).
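A hedged sketch of what those two relationships might look like on the User model (the pivot column names come from the table above; the method names are assumptions):
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsToMany;

class User extends Model
{
    // the users who subscribed to this author
    public function subscriptions(): BelongsToMany
    {
        return $this->belongsToMany(User::class, 'subscriptions', 'author_id', 'subscriber_id');
    }

    // the authors this user subscribed to
    public function authors(): BelongsToMany
    {
        return $this->belongsToMany(User::class, 'subscriptions', 'subscriber_id', 'author_id');
    }
}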
namespace App\Jobs;
            $post->status = 'published';
            $post->save();
}
}
}
It tries to load all publishable posts into memory at once. It works perfectly for a small number of posts
but the moment you try to load tens of thousands of them it will slow down your server or exceed your
memory limit and fail.
It tries to execute potentially thousands of update queries in a pretty short time. It'll spam the
database which can cause a slowdown or even worse an outage.
Neither the author nor the subscriptions relationship is eager loaded. The for loop causes potentially thousands of extra queries because of the $post->author->subscriptions line.
The goal here is to "divide et impera" or divide and conquer. We need to chunk this big task into smaller
more manageable pieces.
To avoid loading 10,000 posts into memory at once we can use one of Laravel's built-in helpers:
lazy and lazyById that give us a LazyCollection that can be used just like a normal one but uses generators and keeps only one record in memory at a time
chunk and chunkById that work the same but instead of one collection they return chunks of data in predefined sizes
We'll use chunkById since it's a better fit for the use case:
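The dispatching code would look roughly like this. It's a sketch assembled from the fragments discussed below (PublishPostChunkJob is the chunk-handling job we'll write in a minute):
use App\Jobs\PublishPostChunkJob;
use App\Models\Post;
use Illuminate\Support\Collection;

Post::query()
    ->select('id', 'title', 'user_id')
    ->with('author.subscriptions:id,email')
    ->where('status', 'draft')
    ->where('publish_at', '<=', now())
    ->chunkById(100, function (Collection $posts) {
        PublishPostChunkJob::dispatch($posts);
    });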
This is going to load 100 posts into memory at once and dispatch another job that handles the notification
and the status update. The new job is the reason why chunkById is a better fit than lazyById. The other important part is the ->with('author.subscriptions:id,email') line. It prevents N+1 queries and eager loads the required relationships. The :id,email at the end makes
sure we only load columns we actually need. Subscribers are required because of the notification so it only
needs an ID and an email address.
The memory problem is solved, now let's write the other job.
namespace App\Jobs;
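A hedged sketch of the rest of this job; the class name comes from the dispatch call above, while the notification class is a hypothetical placeholder:
use App\Models\Post;
use App\Notifications\PostPublishedNotification; // hypothetical notification class
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Collection;

class PublishPostChunkJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public Collection $posts)
    {
    }

    public function handle(): void
    {
        // one update query for the whole chunk instead of 100 separate ones
        Post::query()
            ->whereIn('id', $this->posts->pluck('id'))
            ->update(['status' => 'published']);

        foreach ($this->posts as $post) {
            foreach ($post->author->subscriptions as $subscriber) {
                $subscriber->notify(new PostPublishedNotification($post));
            }
        }
    }
}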
It takes a collection of Posts and then updates them in one query. After that, it sends the notifications. The collection always has 100 posts, which is not a lot and it won't cause problems in the query. If you have
25,000 elements then you end up with a query like this:
update posts
set status = "published"
where id in (1,2,3,4,5,6, ... 24997,24998,24999,25000)
This query is going to be huge and it'll take a long time to run and consume lots of memory. As a general
rule of thumb, a few hundred or a few thousand items won't cause problems but tens of thousands
probably will.
Before measuring performance let's think about the numbers if the job needs to publish 10,000 posts:
With a chunk size of 100, there will be 100 chunk jobs.
Each chunk job runs only one update query. So overall 100 queries will be executed.
Measuring performance
The job needs to publish 11k posts. On the first try, it failed after about 15 seconds:
The interesting thing is that it did not fail when loading 11k models into memory but after that. It processed
10.5k records:
So it looks like loading 11k rows wasn't a problem but updating them in one go caused the process to fail.
So the straightforward implementation wasn't good enough to handle 11k records. Now let's see the
optimized one.
The whole process (meaning all the jobs) took 8-10 seconds to complete with 11k posts using 4 workers.
This might be surprising but the PublishPostsJob executed 40 queries:
As you can see, there are lots of subscriptions , users , and posts queries. The reason is that chunkById
or lazyById will run multiple database queries. It will generate a query such as this one:
select
`id`,
`title`,
`user_id`
from
`posts`
where
`status` = 'draft'
and `publish_at` <= '2024-04-18 01:14:14'
and `id` > 175810
order by
`id` asc
limit
100
It chunks the query based on the ID using the number you give it (100 in this case).
We used eager loading so it's not just 1 x n queries (where n is the number of chunks) but 3 x n, since we loaded two additional relationships. These are pretty small, performant queries and this job is executed only once at the beginning so there's nothing to worry about.
For some reason, each job runs 50 queries. Lots of them target the users and the subscriptions table.
Well, it looks like an N+1 query caused by this row:
This is where we access the author (querying from the users table) and the subscriptions relationships
(reaching out to the subscriptions table).
Post::query()
    // This line should avoid N+1
    ->with('author.subscriptions')
    ->select('id', 'title', 'user_id')
    ->where('status', 'draft')
    ->where('publish_at', '<=', now())
    ->chunkById(100, function (Collection $posts) {
        PublishPostChunkJob::dispatch($posts);
    });
Well, this is the harsh reality of queue jobs: models are serialized and eager-loaded relationships are lost.
They are not there when your worker processes the job.
Let me repeat that: eager-loaded relationships are not loaded in the worker process. It's an automatic
N+1 query.
$this->posts->load('author.subscriptions');
As you can see the update query is pretty fast as well. The overall time for the job is between 20-30ms:
This last feature was not strictly related to database indexes but I hope you found it useful.
Now that we have scratched the surface of jobs let's talk about parallelism, concurrency, and async
workflows. My all-time favorite topic since I was introduced to NodeJS and the event loop.
Async workflows
You probably already know what an async task means but here's a quick summary: asynchronicity means
that a program can perform tasks independently of the main execution flow. In a typical Laravel app, it
means we run background jobs while users can still interact with the application.
There's another concept that we don't usually use in PHP. It's called parallelism. It means simultaneous
execution of multiple tasks, where each task is split into subtasks that can be processed concurrently. Later,
I'm going to show you some examples of concurrent workflows in Laravel.
Web scraping with jobs
Web scraping is the process of extracting data from websites. It involves fetching the HTML content of a web page and
then parsing the data to extract the desired information, such as text, images, links, or other content. This
data can then be saved, analyzed, or used for various purposes.
Usually, these kinds of scrapers are used to fetch some product and price information or simply to fetch e-mail addresses from websites and spam the hell out of them. In this example, we'll build a simple one that
discovers a site and then simply extracts H1 and H2 tags. It only handles that simple scenario and I only
tested it with Freek's and my blog.
1. Start with the site's root URL
2. Collect every link on the current page
3. Go to that URL
4. Scrape the content we need (H1 and H2 tags)
5. Repeat Step 2
There's one major problem and it's Step 5 . Can you guess why? Just think about it for a minute. Can you
tell how many URLs a given website has before you start the scraping process?
You cannot, unfortunately. It's not like importing a huge CSV with 1,000,000 records. You know it has
1,000,000 rows and you can dispatch 1,000 jobs each processing 1,000 rows.
But if we don't know how many jobs we should start, how can we tell if every job has succeeded and we can
start exporting the results?
On top of that, discovering URLs involves recursion which makes everything 10% more weird.
As far as I know, when you don't know how many jobs you need, you cannot natively determine whether all of them have succeeded using Laravel's toolset.
DiscoverPageJob is the one with recursion. It fetches the content of a given URL and looks for <a> tags.
It dispatches another DiscoverPageJob for every href it found.
ScrapePageJob is the one that finds h1 and h2 tags and fetches their content.
There are a number of different approaches to running these jobs. Here's an example website that helps us
understand these approaches:
The page /contact has no further links. Let's see two different solutions and compare them which would
be faster.
Discover first
Discover every page and then dispatch the scrape jobs. This would look like this:
DiscoverPageJob('/')
DiscoverPageJob('/blog')
DiscoverPageJob('/blog/first-article')
DiscoverPageJob('/blog/second-article')
DiscoverPageJob('/products')
DiscoverPageJob('/products/first-product')
DiscoverPageJob('/products/second-product')
DiscoverPageJob('/contact')
These jobs result in the URLs that the given website has. After that, we can dispatch 8 ScrapePageJobs for the 8 URLs. Is this a good approach?
What does "good" mean? Good means two things in this example:
The whole process should take as few ticks as possible.
We have two runners and we don't want to have idle time when it's not necessary.
What are the other alternatives? We'll talk about them later.
So there are two runners and we want to use them effectively. Let's simulate a scraping process:
Worker #1 Worker #2
DiscoverPageJob('/') -
Worker #1 Worker #2
DiscoverPageJob('/blog') -
Worker #1 Worker #2
DiscoverPageJob('/blog/first-article') DiscoverPageJob('/blog/second-article')
This "branch" of the discovery process has ended. There are no other pages on the /blog branch. So it
goes back to the /products branch:
Worker #1 Worker #2
DiscoverPageJob('/products') -
Worker #1 Worker #2
DiscoverPageJob('/products/first-product') DiscoverPageJob('/products/second-product')
The branch has ended. It goes back to the last leaf of Level 2:
Worker #1 Worker #2
DiscoverPageJob('/contact') -
/contact does not have links so the discovery process has been completed.
At this point, we have all 8 URLs so we can dispatch the 8 ScrapePageJobs in 4 ticks. Both workers process one job at a time.
So it took 6 ticks to discover the website and then 4 ticks to scrape it. Overall it's 10 ticks.
In the discovery process Worker #2 was idle in 4 ticks. It was idle 4/6 or 67% of the time. That's probably not good.
During discovery the two workers were utilized 8/12 or 67% of the time. Counting the scraping ticks as well, it's 16/20 or 80% overall.
Number of ticks: 10
This is the second approach. Discover one level of the website and scrape it immediately.
Tick 1:
Worker #1: DiscoverPageJob('/')
Worker #2: ScrapePageJob('/')
Queue: ScrapePageJob('/blog'), DiscoverPageJob('/blog'), ScrapePageJob('/products'), DiscoverPageJob('/products'), ScrapePageJob('/contact'), DiscoverPageJob('/contact')
You can immediately see the difference. As soon as we start discovering a page we can scrape it at the same
time.
The DiscoverPageJob dispatches other Discover and also Scrape jobs. In the case of the home page, it
finds three links: /blog , /products , and /contact so it dispatches 6 jobs to discover and scrape these 3
pages. This results in 6 pending jobs in the queue waiting to be processed.
Tick 2:
Worker #1: ScrapePageJob('/blog')
Worker #2: DiscoverPageJob('/blog')
Queue: ScrapePageJob('/products'), DiscoverPageJob('/products'), ScrapePageJob('/contact'), DiscoverPageJob('/contact'), ScrapePageJob('/blog/first-article'), DiscoverPageJob('/blog/first-article'), ScrapePageJob('/blog/second-article'), DiscoverPageJob('/blog/second-article')
Workers process the first jobs from the queue which is discovering and scraping the blog page. It has two
links, so the discover job dispatches 4 new jobs.
Tick 3:
Worker #1: ScrapePageJob('/products')
Worker #2: DiscoverPageJob('/products')
Queue: ScrapePageJob('/contact'), DiscoverPageJob('/contact'), ScrapePageJob('/blog/first-article'), DiscoverPageJob('/blog/first-article'), ScrapePageJob('/blog/second-article'), DiscoverPageJob('/blog/second-article'), ScrapePageJob('/product/first-product'), DiscoverPageJob('/product/first-product'), ScrapePageJob('/product/second-product'), DiscoverPageJob('/product/second-product')
This is the same but with the /products page. We know that there are no other links on the webpage so
from here we can process everything.
Tick Worker #1 Worker #2
4 ScrapePageJob('/contact') DiscoverPageJob('/contact')
5 ScrapePageJob('/blog/first-article') DiscoverPageJob('/blog/first-article')
6 ScrapePageJob('/blog/second-article') DiscoverPageJob('/blog/second-article')
7 ScrapePageJob('/product/first-product') DiscoverPageJob('/product/first-product')
8 ScrapePageJob('/product/second-product') DiscoverPageJob('/product/second-product')
There's a 20% decrease in the number of ticks and a ~25% increase in utilization.
When I introduced the "Discover first" approach I asked the question, "Is this a good approach?" And then I
gave you a bit more information:
What does "good" mean? Good means two things in this example:
The whole process should take as few ticks as possible.
We have two runners and we don't want to have idle time when it's not necessary.
What are the other alternatives? We'll talk about them later.
Now we can clearly see it was not a "good" approach. At least not the best.
The point is that if you want to make something async by using jobs try to think parallel. Try to squeeze as
much work out of your workers as possible.
Scraping is a model that represents the scraping of a website. It has a HasMany relationship to the
ScrapingItem model that contains every discovered URL on the site. This is the scrapings table:
id url created_at
I don't worry about user management and authentication right now, but if it was a standard SaaS application the scrapings table would also contain a user_id column.
And this is the scraping_items table:
id scraping_id url content status created_at
1 2 https://martinjoo.dev/blog {"h1": "Hey", "h2s": []} done 2024-01-31
A Scraping has many ScrapingItems, one for every URL. The content contains the h1 and all the h2 tags on the given page.
Before we start a scraping process and dispatch the DiscoverPageJob we have to create a Scraping model and pass it to the job.
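Something along these lines (a sketch; the Scraping model's fillable attributes and the calling context are assumptions):
use App\Jobs\DiscoverPageJob;
use App\Models\Scraping;

$scraping = Scraping::create([
    'url' => 'http://localhost:3000',
]);

// the other constructor arguments have defaults, as described below
DiscoverPageJob::dispatch($scraping);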
The second parameter to the job is the $currentUrl which is empty by default. It refers to the URL the job has to discover. On the first run it's the homepage, so it defaults to the Scraping model's url property.
$depth refers to the current level that is being discovered just as I showed you earlier. $maxDepth sets a
limit where the job stops discovering the page. It's not necessary but it's a great way to avoid jobs running
for multiple hours or days (just imagine discovering 100% of Amazon).
$response = Http::get($this->currentUrl)->body();

$html = new DOMDocument();
@$html->loadHTML($response);
}
It fetches the content of the current URL and then it parses it as HTML using PHP's DOMDocument class.
Then we have to loop through the <a> tags, scrape them, and discover them further.
if (Str::startsWith($href, '#')) {
    continue;
}

// assumption: skip links that point to another site (the original condition isn't fully visible here)
if (Str::startsWith($href, 'http') && !Str::startsWith($href, $this->scraping->url)) {
    continue;
}
These checks skip links such as #second-header (an in-page anchor) or http://google.com (a link pointing to another site).
As the next step, we can dispatch the scrape and the discover jobs to the new page we just found at $href .
However, links can be in two forms: absolute and relative. Some <a> tags contain links such as
https://example.com/page-1 while others have a relative URL such as /page-1 . We need to handle this:
if (Str::startsWith($href, 'http')) {
    $absoluteUrl = $href;
} else {
    $absoluteUrl = $this->scraping->url . $href;
}
The $absoluteUrl variable contains an absolute URL where we can send HTTP requests so it's time to
dispatch the jobs:
DiscoverPageJob::dispatch(
    $this->scraping, $absoluteUrl, $this->depth + 1, $this->maxDepth
);
ScrapePageJob fetches the content of the page while DiscoverPageJob discovers all links on the page and
dispatches new jobs.
try {
    $response = Http::get($this->url)->body();

    $doc = new DOMDocument();
    @$doc->loadHTML($response);

    $h1 = @$doc->getElementsByTagName('h1')->item(0)->nodeValue;

    $h2s = collect(@$doc->getElementsByTagName('h2'))
        ->map(fn ($item) => $item->nodeValue)
        ->toArray();

    $scrapingItem->status = 'done';
    $scrapingItem->content = [
        'h1' => $h1,
        'h2s' => $h2s,
    ];

    $scrapingItem->save();
} catch (Throwable $ex) {
    $scrapingItem->status = 'failed';
    $scrapingItem->save();

    throw $ex;
}
}
}
It uses the same DOMDocument class to find h1 and h2 tags on the page and then it creates the
scraping_items record. If something goes wrong it sets the status to failed .
So we have the main logic. We can discover and scrape webpages (all right, I didn't show 100% of the code
here because it has some edge cases and small details but that's not important from the async point-of-
view. You can check out the source code.)
DiscoverPageJob::dispatch(
    $this->scraping, $absoluteUrl, $this->depth + 1, $this->maxDepth
);
How can we tell if scraping has finished? Right now there's no way. It's just an endless stream of jobs
without any coordination.
Usually, when you want to execute a function or dispatch another job when a set of jobs has finished you
can use job batches:
Bus::batch([
    new FirstJob(),
    new SecondJob(),
    new ThirdJob(),
])
    ->then(function () {
        echo 'All jobs have completed';
    })
    ->dispatch();
However, in our case, we don't know exactly how many jobs there are because the DiscoverPageJob
recursively dispatches new ones.
One idea is to wrap the two jobs we dispatch for each discovered link in their own batch:
// ...
Bus::batch([
    new ScrapePageJob($this->scraping, $absoluteUrl, $this->depth),
    new DiscoverPageJob(
        $this->scraping, $absoluteUrl, $this->depth + 1, $this->maxDepth
    ),
])
    ->then(function () {
        var_dump('Batch has finished');
    })
    ->dispatch();
}
But this doesn't solve the problem: each pair of jobs gets its own batch, so the then callback runs for every discovered link instead of once at the end. Another idea would be to create the batch before the loop, add the jobs to it inside, and then dispatch them once the loop is finished:
$batch = Bus::batch([]);

// ...

$batch->add([
    new ScrapePageJob($this->scraping, $absoluteUrl, $this->depth),
    new DiscoverPageJob(
        $this->scraping, $absoluteUrl, $this->depth + 1, $this->maxDepth
    ),
]);
}

$batch
    ->then(function () {
        var_dump('Batch has finished');
    })
    ->dispatch();
But it's not a good solution either. The only difference is that each batch has more jobs but we still have 8
batches if we're using the sample website from earlier as an example. Each DiscoverPageJob created a
new batch with number of links * 2 jobs in it.
So using batches is the right path because they allow us to await jobs. However, as far as I know, our exact
problem cannot be solved with Laravel's built-in methods and classes.
What we want to do is count the number of jobs as we dispatch them and then decrease the counter as
workers process them.
$this->dispatchJobBatch([
    new ScrapePageJob($this->scraping, $absoluteUrl, $this->depth),
    new DiscoverPageJob(
        $this->scraping, $absoluteUrl, $this->depth + 1, $this->maxDepth
    ),
]);
The next step is to implement the counter. Since we have multiple workers it has to be a "distributed" counter available to all workers. Redis and the Cache facade are an awesome starting point. I mean, the database cache driver is equally good:
if (Cache::has($jobCountCacheKey)) {
    Cache::increment($jobCountCacheKey, count($jobs));
} else {
    Cache::set($jobCountCacheKey, count($jobs));
}

Bus::batch($jobs)
    ->dispatch();
}
I'm using the Scraping object's ID as part of the cache key and I increment it with the number of jobs being
dispatched (usually 2).
So now we know exactly how many jobs are needed to scrape a given website:
Great! The next step is to decrease the counter as workers process the jobs. Fortunately, there's a
progress method available on the Bus object:
Bus::batch($jobs)
    ->progress(function () {
        var_dump("Just casually making some progress ");
    })
    ->dispatch();
So the progress callback runs every time a job is processed in the batch. Exactly what we need:
if (Cache::has($jobCountCacheKey)) {
    Cache::increment($jobCountCacheKey, count($jobs));
} else {
    Cache::set($jobCountCacheKey, count($jobs));
}

Bus::batch($jobs)
    ->progress(function () use ($jobCountCacheKey) {
        Cache::decrement($jobCountCacheKey);
    })
    ->dispatch();
}
Anytime you call this function and dispatch jobs, the counter is increased by the number of jobs
After every completed job, the counter is decreased
Now we're ready to add the then callback and run a callback when every job has been completed:
Bus::batch($jobs)
    ->then(function () use ($jobCountCacheKey, $scraping, $discoveredUrlsCacheKey) {
        if (Cache::get($jobCountCacheKey) === 0) {
            var_dump("Look! I'm ready ");
        }
    })
    ->progress(function () use ($jobCountCacheKey) {
        Cache::decrement($jobCountCacheKey);
    })
    ->dispatch();
Exactly what we wanted! On the screenshot the var_dump is not the very last row, I know, but it's only the
terminal output. Logically it works as it needs to.
Don't forget that we can only run our callback if the counter is zero. So this is a very important line:
if (Cache::get($jobCountCacheKey) === 0) {
    var_dump("Look! I'm ready ");
}
Once again, the then function is called after a batch has been finished so we need this if statement.
One more thing we can do is delete the cache key entirely after all jobs have been completed:
        Cache::delete($jobCountCacheKey);
    }
})
And of course, instead of var_dumping I actually dispatch another export job but that's not important right
now, and there's a dedicated chapter for exports and imports.
if (Cache::has($jobCountCacheKey)) {
    Cache::increment($jobCountCacheKey, count($jobs));
} else {
    Cache::set($jobCountCacheKey, count($jobs));
}

$scraping = $this->scraping;

Bus::batch($jobs)
    ->then(function () use ($jobCountCacheKey, $scraping, $discoveredUrlsCacheKey) {
        if (Cache::get($jobCountCacheKey) === 0) {
            Excel::store(new ScrapingExport($scraping), 'scraping.csv');

            Cache::delete($jobCountCacheKey);
        }
    })
    ->progress(function () use ($jobCountCacheKey) {
        Cache::decrement($jobCountCacheKey);
    })
    ->dispatch();
}
In these callbacks ( then , progress , etc) we cannot use $this . This is why there's this line $scraping =
$this->scraping and then the $scraping variable is being used in the then callback.
$response = Http::get($this->currentUrl)->body();

@$html->loadHTML($response);

if (Str::startsWith($href, '#')) {
    continue;
}

if (Str::startsWith($href, 'http')) {
    $absoluteUrl = $href;
} else {
    $absoluteUrl = $this->scraping->url . $href;
}

$this->dispatchJobBatch([
    new ScrapePageJob($this->scraping, $absoluteUrl, $this->depth),
    new DiscoverPageJob(
        $this->scraping, $absoluteUrl, $this->depth + 1, $this->maxDepth
    ),
]);
}
}
Plan your workflow because there's a big difference between different solutions
I included the sample website's source code in async-workflows/sample-website so you can watch/debug
the process if you'd like to. You can serve the site with:
php -S localhost:3000
And then you can start the whole process with an action:
$scraper = app(App\Actions\ScrapeAction::class);

$scraper->execute('http://localhost:3000');
Concurrent programming
PHP is traditionally a single-threaded, blocking language. But what does that mean? Take an index.php that contains only this:
sleep(5);

echo "Hello";
If you serve it with PHP's built-in server:
php -S localhost:3000
And then send two requests at the same time, this is what happens:
The first request took 5s to complete but the second one took almost 10s.
That's because PHP by default uses only one thread to execute your code. The first request occupied this
one thread for 5 seconds so the second one had to wait for 5s before it was processed.
This sounds lame. And fortunately, this is not what happens in production systems, right? I mean, if 10 users
use the application at the same time the app won't be 10 times slower for the last user. This is possible
because of PHP-FPM.
Earlier, I published a very detailed article about FPM, nginx, and FastCGI so if you don't know them very well,
please read this article. But here's the executive summary:
It balances the requests across the workers, but every request goes to one of them
The worker process boots up your application and executes your code
The main point is that in a production system, your application is not using only one thread. Each of those
worker processes uses different threads. So even though, in our codebase, we use one thread, the
execution environment runs multiple processes and multiple threads to handle multiple requests. And this
is a great architecture. Easy to write your code and easy to scale as well.
However, we can still end up in situations when one function takes a long time to run. No matter how great
FPM works this one function inside the request still needs to be executed and if it takes 10 seconds users
won't be happy. 90% of the time we can use jobs to process these long-running tasks in the background. In
an async way. However, what if we just can't do that? For example, there's a GET request and we need a response immediately. So it's not like you can just dispatch a job, because you need to respond in a sync way, but the function takes seconds to run.
We can fork new processes. Before we do so, let's clarify three terms: program, process, and thread.
Program: A program is a set of instructions and data that are stored in a file and are intended to be
executed by a computer. Your index.php is a program.
Process: A process is an instance of a program that is being executed. It represents the entire runtime
state of a program, including its code, data, stack, registers, and other resources such as open files and
network connections. If you open htop you see processes. For example, this is an artisan command
(program) being executed as a process:
Thread: A thread is the smallest unit of execution within a process. A single process can contain
multiple threads, each running its own sequence of instructions concurrently. Eventually, a thread is
the actor that executes your code.
So now we know that a process is a running program with code and data. When we fork a new process, the
currently running process copies itself and creates a child process with the same code and data. I
highlighted the words copy and code for a reason that will be important in a minute.
pcntl_fork();
From that point, everything that follows pcntl_fork will run twice. For example:
var_dump('only once');
pcntl_fork();
var_dump('twice');
Remember, a process contains code and data. When we fork a new process, the currently running process
copies itself and creates a child process with the same code and data. So everything that comes after
pcntl_fork will be executed twice: by the parent and the child process as well.
But it's obviously not what we want. I mean, what's the point of having another process that does exactly
the same as the original one? Nothing.
pcntl_fork returns a different value in the two processes: in the child process the returned PID is 0, and in the parent process the PID is an actual number such as 25477 (the child's process ID).
So we can do this:
$pid = pcntl_fork();

if ($pid === 0) {
    var_dump('child process');
} else {
    var_dump('parent process');
}
Of course, this is still weird. The whole code is one if-else statement without any user input and somehow
we managed to run both the if and the else branches at the same time. Imagine if this app had any
users.
But now you can see that we have different "branches." One for the child and one for the parent. Of course, in a minute we'll have not one but 8 children. The basic idea is that a child process does the heavy lifting, and lots of children can do lots of heavy work.
Let's simulate that the child process does some long-running tasks:
$pid = pcntl_fork();

if ($pid === 0) {
    var_dump('child process');
    sleep(2);
    var_dump('child finished');

    exit;
} else {
    var_dump('parent process');
}
As you can see, the parent process exits before the child process finishes its job.
One of the responsibilities of a parent is to wait for children. For that, we can use the pcntl_waitpid
function:
$pid = pcntl_fork();

if ($pid === 0) {
    var_dump('child process');
    sleep(2);
    var_dump('child finished');

    exit;
} else {
    var_dump('parent process');

    pcntl_waitpid($pid, $status);

    var_dump('parent finished');
}
So now we have two "branches." One that does the work and another one that manages these workers. It
can wait for them, maybe receive some outputs, and so on.
$childPids = [];

for ($i = 0; $i < 8; $i++) {
    $pid = pcntl_fork();

    if ($pid === 0) {
        sleep(1);
        exit;
    } else {
        $childPids[] = $pid;
    }
}

foreach ($childPids as $childPid) {
    pcntl_waitpid($childPid, $status);
}
The parent process collects all the PIDs in the $childPids array and waits for all of them in a simple
foreach . That's it and we have 8 worker processes.
By the way, you don't even need the else statement since the $pid is not 0 for the parent process and
there's an exit at the end of the if statement.
Maybe you're confused because I said that the $pid is always zero for a child process. Yes, it is. But it
doesn't mean that a child process doesn't have a PID. It means that the function pcntl_fork returns 0 for
the children. But they still have PIDs as you can see.
The next step is to actually do something in the child process. Let's start with something simple, but CPU-
bound such as calculating prime numbers. For now, let's count prime numbers in a given range:
$childPids = [];

for ($i = 0; $i < 8; $i++) {
    $pid = pcntl_fork();

    if ($pid === 0) {
        $start = $i * 1_000_000 + 1;
        $end = ($i + 1) * 1_000_000;
        echo countPrimeNumbers($start, $end) . PHP_EOL;
        exit;
    } else {
        $childPids[] = $pid;
    }
}
Each child process processes 1,000,000 numbers. The first one processes numbers from 1 to 1,000,000 the
second one processes numbers from 1,000,001 to 2,000,000 and so on until 8,000,000
If we try the same function using only one process this is the result:
So it took 23.9s
If we take a look at htop we can clearly see that all the cores are busy running the parent-child.php
script:
The next step is to "return" values from the child processes because right now they just echo out the
results.
Unfortunately, it's not simple because we cannot just use variables such as this:
$sum = 0;

$pid = pcntl_fork();

if ($pid === 0) {
    $sum += countPrimeNumbers(1, 1_000_000);
}

echo $sum;
$sum remains zero because each process has its own code and data so there are no "shared" variables
across processes.
What we need is Inter-Process Communication or IPC for short. IPC refers to the mechanisms and
techniques used by processes to communicate and synchronize with each other. Processes can exchange
data, share resources, and coordinate their activities. Exactly what we need.
One technique is using sockets. Sockets are communication endpoints that allow processes to
communicate with each other, either on the same machine or across a network. Right now, they're on the
same machine so we don't need network sockets.
AF_UNIX means it's a local socket using a local communication protocol family. This is what you need if your
processes live on the same machine.
But in order to communicate in a two-way manner (child -> parent, parent -> child) we need two of these
sockets. They can be created with the socket_create_pair function:
This is an old school function so instead of returning a value we have to pass an array as the last argument.
The function puts two sockets into it.
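A minimal sketch of that call (the variable names mirror the ones used below):
$sockets = [];

// AF_UNIX: local, same-machine sockets; SOCK_STREAM: reliable two-way byte stream
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $sockets);

[$socketToParent, $socketToChild] = $sockets;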
$pid = pcntl_fork();

if ($pid === 0) {
    socket_write($socketToParent, 'hello', 1024);
    socket_close($socketToParent);

    exit;
}

$childResult = socket_read($socketToChild, 1024);
echo $childResult;

socket_close($socketToChild);
The data you write into $socketToParent in the child process can be read from $socketToChild in the
parent process. We need to close the sockets as if they were files.
$childPids = [];
$childSockets = [];
$primesCount = 0;

for ($i = 0; $i < 8; $i++) {
    $sockets = [];
    socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $sockets);
    [$socketToChild, $socketToParent] = $sockets;

    $pid = pcntl_fork();

    if ($pid === 0) {
        $start = $i * 1_000_000 + 1;
        $end = ($i + 1) * 1_000_000;

        socket_write($socketToParent, (string) countPrimeNumbers($start, $end), 1024);
        socket_close($socketToParent);

        exit;
    } else {
        $childPids[] = $pid;
        $childSockets[$pid] = $socketToChild;
    }
}

foreach ($childPids as $childPid) {
    pcntl_waitpid($childPid, $status);

    $childResult = (int) socket_read($childSockets[$childPid], 1024);
    $primesCount += $childResult;

    socket_close($childSockets[$childPid]);
}

var_dump($primesCount);
Now the child processes can communicate and return data to the parent process. The parent process can
manage the flow of the program.
Both the single-process and the multi-process versions give the same result:
As you can see we can gain a lot from using multiple processes. However, there are a lot of technical details
we need to consider. Also, the code doesn't look too good, to be honest, and it's easy to mess it up.
fork
Fortunately, there's a Spatie package (who else) that makes the process seamless. It's called spatie/fork.
$results = Fork::new()
    ->run(...$callbacks);

var_dump(collect($results)->sum());
Fork gives you a very high-level, easy-to-understand API and takes the low-level stuff away. But under the
hood, it uses pcntl_fork and sockets.
The run function takes a number of callback functions and it returns an array with the results of these
callbacks in order, for example:
$results = Fork::new()
    ->run(
        fn () => 1,
        fn () => 2,
    );

// $results is [1, 2]
Fork::new()
    ->run(
        fn () => Http::get('https://foo.com'),
        fn () => Http::get('https://bar.com'),
    );
Let's say we are working on a financial application. There's a transactions table with hundreds of
thousands or millions of rows. We need to update a large chunk of these rows, let's say 10,000 rows:
$transactionIds = Transaction::getIdsForUpdate();

Transaction::query()
    ->whereIn('id', $transactionIds)
    ->update(['payout_id' => $payout->id]);
This is going to be a huge query with lots of memory consumption and it'll probably run for a long time.
$chunks = $transactionIds->chunk(1000);

foreach ($chunks as $chunk) {
    Transaction::query()
        ->whereIn('id', $chunk)
        ->update(['payout_id' => $payout->id]);
}
Now there are 10 smaller queries that update 1,000 rows each. It's usually a better solution than having one
query that updates 10,000 rows.
$chunks = $transactionIds->chunk(1000);

$callbacks = [];

foreach ($chunks as $chunk) {
    $callbacks[] = fn () => Transaction::query()
        ->whereIn('id', $chunk)
        ->update(['payout_id' => $payout->id]);
}

Fork::new()
    ->before(fn () => DB::connection('mysql')->reconnect())
    ->run(...$callbacks);
It creates 10 callback functions for the 10 chunks and then runs them using Fork. If the child processes run
database queries you have to include this line:
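->before(fn () => DB::connection('mysql')->reconnect())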
The function passed to before will run before every callback (child process). The package requires a
reconnect if we want to use the database in the child processes.
Execution time
Memory usage
As you can see, the parallel one is the clear winner by execution time. It's 2.2x faster than running one huge
query and 1.4x faster than using query chunks. However, there's no difference in memory usage.
These results are already great, in my opinion, but let's see what happens if we "scale" our app and try to
update 100,000 transactions instead of 10,000. The chunks are going to be the same size (1,000).
First of all, updating 100,000 rows in one database query is not possible on my system:
Second, the concurrent version wins with an even more impressive result:
Using query chunks took 4.9s. Using child processes took only 2.4s. So now there's a 2x time difference.
In general, the more tasks you have the more you can benefit from concurrent programming.
But as always, there's a trade-off. With child processes, you can process tasks faster but with a higher CPU
load.
Obviously, the goal is to take the most out of your CPU. However, if there are other important tasks on this
server, then it can cause problems because right now, this job is using ~80% of your server. There's not
much computing power left for other tasks.
Fortunately, there's an "in-between" solution that gives us the best of both worlds. We can limit the number
of concurrent tasks when using fork . Right now, it uses as many CPU cores as possible, which is eight in
my case. But we can limit that to, let's say four:
Fork::new()
    ->concurrent(4)
    ->before(fn () => DB::connection('mysql')->reconnect())
    ->run(...$callbacks);
The time difference is almost negligible but of course, it depends on lots of things. You can play around with
these numbers and hopefully, you can find your sweet spot.
use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

// the base_uri is just an example
$client = new Client(['base_uri' => 'http://httpbin.org']);

$promises = [
    'image' => $client->getAsync('/image'),
    'png' => $client->getAsync('/image/png'),
    'jpeg' => $client->getAsync('/image/jpeg'),
    'webp' => $client->getAsync('/image/webp'),
];

// wait for all the requests to complete
$responses = Utils::unwrap($promises);

echo $responses['image']->getBody();
echo $responses['png']->getBody();
It's pretty similar to Fork's syntax but it uses an associative array instead and in the results, we can
reference the keys which is nice. Instead of the get function we need to use getAsync and then
Utils::unwrap to wait for the promises.
With these techniques, concurrent programming is not hard at all. It can be pretty useful in some situations.
Whenever you cannot use queue jobs maybe you can use Fork or concurrent Guzzle requests instead. But
of course, you can write concurrent logic inside the job too.
For example, let's say you use some 3rd party systems such as MailChimp, a CRM, etc. You want to sync
your users' data with these 3rd parties on a scheduled basis. You have 5,000 users. For each user, you need
to send 3 HTTP requests to 3 different 3rd parties. Here's what you can do:
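Here's a minimal sketch of such a job (the class name and the 3rd-party URLs are made up for illustration):

namespace App\Jobs;

use App\Models\User;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Http;
use Spatie\Fork\Fork;

class SyncUserJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(public User $user)
    {
    }

    public function handle(): void
    {
        // the three requests run in three child processes at the same time,
        // so the job takes ~500ms instead of ~1,500ms
        Fork::new()->run(
            fn () => Http::post('https://api.mailchimp.example/sync', ['email' => $this->user->email]),
            fn () => Http::post('https://api.crm.example/contacts', ['email' => $this->user->email]),
            fn () => Http::post('https://api.newsletter.example/users', ['email' => $this->user->email]),
        );
    }
}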
You can seriously speed up a workflow such as this one. Just think about it: if each HTTP request takes 500ms to complete, traditionally it would take 5,000 x 3 x 0.5s = 7,500s, which is 125 minutes or about two hours. If you send the requests concurrently, each job takes only ~500ms (instead of 1,500ms), so the whole workflow takes about 5,000 x 0.5s = 2,500s or roughly 42 minutes.
supervisor
The next chapter is related to deployments but it's important to understand if we want to optimize worker
processes.
Whenever you're deploying worker processes it's a good idea to use supervisor .
The most important thing is that worker processes need to run all the time even if something goes wrong.
Otherwise, they'd be unreliable. For this reason, we cannot just run php artisan queue:work on a
production server as we do on a local machine. We need a program that supervises the worker process,
restarts them if they fail, and potentially scales the number of processes.
The program we'll use is called supervisor . It's a process manager that runs in the background (daemon)
and manages other processes such as queue:work .
[program:worker]
command=php /var/www/html/posts/api/artisan queue:work --tries=3 --verbose --timeout=30 --sleep=3
We can define many "programs" such as queue:work . Each has a block in a file called supervisord.conf .
Every program has a command option which defines the command that needs to be run. In this case, it's the
queue:work but with the full artisan path.
[program:worker]
command=php /var/www/html/posts/api/artisan queue:work --queue=default,notification --tries=3 --verbose --timeout=30 --sleep=3
numprocs=2
In this example, it'll start two separate worker processes. They both can pick up jobs from the queue
independently from each other. This is similar to when you open two terminal windows and start two
queue:work processes on your local machine.
Supervisor will log the status of the processes. But if we run the same program ( worker ) in multiple
instances it's a good practice to differentiate them with "serial numbers" in their name:
[program:worker]
command=php /var/www/html/posts/api/artisan queue:work --queue=default,notification --tries=3 --verbose --timeout=30 --sleep=3
numprocs=2
process_name=%(program_name)s_%(process_num)02d
%(program_name)s will be replaced with the name of the program ( worker ), and %(process_num)02d will
be replaced with a two-digit number indicating the process number (e.g. 00 , 01 , 02 ). So when we run
multiple processes from the same command we'll have logs like this:
Next, we can configure how supervisor is supposed to start or restart the processes:
[program:worker]
command=php /var/www/html/posts/api/artisan queue:work --queue=default,notification --tries=3 --verbose --timeout=30 --sleep=3
numprocs=2
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
autostart=true tells supervisor to start the program automatically when it starts up. So when we start
supervisor (for example when deploying a new version) it'll automatically start the workers.
autorestart=true tells supervisor to automatically restart the program if it crashes or exits. Worker
processes usually take care of long-running heavy tasks, often communicating with 3rd party services. It's
not uncommon that they crash for some reason. By setting autorestart=true we can be sure that they are
always running.
[program:worker]
command=php /var/www/html/posts/api/artisan queue:work --queue=default,notification --tries=3 --verbose --timeout=30 --sleep=3
numprocs=2
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
stopasgroup and killasgroup basically mean: stop/kill all subprocesses as well when the parent process
(queue:work) stops/dies.
As I said, errors happen fairly often in queue workers, so it's a good practice to think about them:
[program:worker]
command=php /var/www/html/posts/api/artisan queue:work --queue=default,notification --tries=3 --verbose --timeout=30 --sleep=3
numprocs=2
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/worker.log
redirect_stderr=true tells supervisor to redirect standard error output to the same place as standard
output. We treat errors and info messages the same way.
That was all the worker-specific configuration we need but supervisor itself also needs some config in the
same supervisord.conf file:
[supervisord]
logfile=/var/log/supervisor/supervisord.log
pidfile=/run/supervisord.pid
pidfile=/run/supervisord.pid tells supervisor where to write its own process ID (PID) file. These files
are usually located in the run directory:
By the way, PID files on Linux are similar to a MySQL or Redis database for us, web devs.
They are files that contain the process ID (PID) of a running program. They are usually created by daemons
or other long-running processes to help manage the process.
When a daemon or other program starts up, it will create a PID file to store its own PID. This allows other
programs (such as monitoring tools or control scripts) to easily find and manage the daemon. For example,
a control script might read the PID file to determine if the daemon is running, and then send a signal to that
PID to stop or restart the daemon.
[supervisorctl]
serverurl=unix:///run/supervisor.sock
This section sets some options for the supervisorctl command-line tool. supervisorctl is used to
control Supervisor. With this tool, we can list the status of processes, reload the config, or restart processes
easily. For example:
supervisorctl status
Now let's talk about multiple queues and how to prioritize them. First, it's important to understand the difference between a connection and a queue:
connection: this is what Redis or MySQL is in Laravel-land. Your app connects to Redis, so it's a connection.
queue: inside Redis, we can have multiple queues with different names.
For example, if you're building an e-commerce site, the app connects to one Redis instance but you can
have at least three queues:
payments
notifications
default
Since payments are the most important jobs it's probably a good idea to separate them and handle them
with priority. The same can be true for notifications as well (obviously not as important as payments but
probably more important than a lot of other things). And for every other task, you have a queue called
default. These queues live inside the same Redis instance (the same connection) but under different keys
(please don't quote me on that).
So let's say we have payments, notifications, and the default queue. Now, how many workers do we need?
What queues should they be processing? How do we prioritize them?
A good idea can be to have dedicated workers for each queue, right? Something like this:
[program:payments-worker]
command=php artisan queue:work --queue=payments --tries=3 --verbose --timeout=30 --sleep=3
numprocs=4

[program:notifications-worker]
command=php artisan queue:work --queue=notifications --tries=3 --verbose --timeout=30 --sleep=3
numprocs=2

[program:default-worker]
command=php artisan queue:work --queue=default --tries=3 --verbose --timeout=30 --sleep=3
numprocs=2
ProcessPaymentJob::dispatch()->onQueue('payments');

$user->notify(
    (new OrderCompletedNotification($order))->onQueue('notifications')
);
By defining the queue in the job you can be 100% sure that it'll always run in the given queue so it's a safer
option in my opinion.
So the two queues (in fact three, because there's also the default) are being processed at the same time by dedicated workers. Which is great, but what if something like this happens?
There are so many jobs in the notifications queue but none in the payments queue. If that happens, we just waste all the payments worker processes since they have nothing to do. But this command doesn't let them process anything else:
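php artisan queue:work --queue=payments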
This means they can only touch the payments queue and nothing else.
Because of that problem, I don't recommend you have dedicated workers for only one queue. Instead,
prioritize them!
We can do this:
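php artisan queue:work --queue=payments,notifications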
The command means that if there are jobs in the payments queue, these workers can only process them.
However, if the payments queue is empty, they can pick up jobs from the notifications queue as well. And
we can do the same for the notifications workers:
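php artisan queue:work --queue=notifications,payments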
Now payments workers also pick up jobs from the notifications queue so we don't waste precious worker processes. But of course, if there are payments jobs, they prioritize them over notifications:
In this example, only one payment job came in so one worker is enough to process it. All of this is managed
by Laravel!
# payment workers
php artisan queue:work --queue=payments,notifications,default

# notification workers
php artisan queue:work --queue=notifications,payments,default

# other workers
php artisan queue:work --queue=default,payments,notifications
If there are a lot of payment jobs, potentially all three worker groups (which means way more than three processes, of course) will process them.
If there isn't any important job (payment or notification), there are a lot of workers available for default jobs.
How many worker processes should you run? That's a tricky question, but a good rule of thumb: run one process for each CPU core.
But of course, it depends on several factors, such as the amount of traffic your application receives, the
amount of work each job requires, and the resources available on your server.
As a general rule of thumb, you should start with one worker process per CPU core on your server. For
example, if your server has 4 CPU cores, you might start with 4 worker processes and monitor the
performance of your application. If you find that the worker processes are frequently idle or that there are
jobs waiting in the queue for too long, you might consider adding more worker processes.
It's also worth noting that running too many worker processes can actually decrease performance, as each
process requires its own memory and CPU resources. You should monitor the resource usage of your
worker processes and adjust the number as needed to maintain optimal performance.
However, there are situations when you can run more processes than the number of CPUs. It's a rare case,
but if your jobs don't do much work on your machine you can run more processes. For example, I have a
project where every job sends API requests and then returns the results. These kinds of jobs are not
resource-heavy at all since they do not run much work on the actual CPU or disk. But usually, jobs are
resource-heavy processes so don't overdo it.
Queued jobs can cause memory leaks. Unfortunately, I don't know the exact reasons, but not everything is detected by PHP's garbage collector. As time goes on and your worker processes more and more jobs, it uses more and more memory.
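To mitigate this, you can restart the workers periodically, for example:

php artisan queue:work --max-jobs=1000 --max-time=3600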
--max-jobs tells Laravel that this worker can only process 1000 jobs. After it reaches the limit it'll be shut
down. Then memory will be freed up and supervisor restarts the worker.
--max-time tells Laravel that this worker can only live for an hour. After it reaches the limit it'll be shut
down. Then memory will be freed up and supervisor restarts the worker.
Often we run workers and nginx on the same server. This means that they use the same CPU and memory. Now, imagine what happens if there are 5,000 users in your application and you need to send a notification to everyone. 5,000 jobs will be pushed onto the queue and workers start processing them like there's no tomorrow. Sending notifications isn't too resource-heavy, but if you're using database notifications as well, it means at least 5,000 queries. Let's say the notification contains a link to your site and users start to come to it. nginx has few resources left since your workers eat up the server.
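One way to mitigate this is to start the workers with a lower CPU priority using nice. A sketch, assuming the same supervisor setup as before:

[program:worker]
command=nice -n 10 php /var/www/html/posts/api/artisan queue:work --queue=default,notification --tries=3 --timeout=30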
These values can go from 0-19 and a higher value means a lower priority to the CPU. This means that your
server will prioritize nginx or php-fpm processes over your worker processes if there's a high load.
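The other option is the --rest option of queue:work, for example:

php artisan queue:work --queue=default,notification --rest=1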
This means the worker will wait for 1 second after it finishes a job. So your CPU has an opportunity to serve nginx or fpm processes.
I never knew about nice or rest before reading Mohamed Said's amazing book Laravel Queues in Action.
Exports
Exporting to CSV or XLS and importing from them is a very common feature in modern applications.
I'm going to use a finance application as an example, something like Paddle or Gumroad. They are merchants of record that work like this:
Buyers buy the product from the landing page using a Paddle checkout form
I personally use Paddle to sell my books and SaaS and it's a great service. The main benefit is that you don't
have to deal with hundreds or thousands of invoices and VAT ramifications. Paddle handles it for you. They
send an invoice to the buyer and apply the right amount of VAT based on the buyer's location. They also
handle VAT ramifications. You, as the seller, don't have to deal with any of that stuff. They just send you the
money once every month and you have only one invoice. It also provides nice dashboards and reports.
Every month they send payouts to their users based on the transactions. They also send a CSV that contains
all the transactions in the given month.
This is the problem we're going to imitate in this chapter. Exporting tens of thousands of transactions in an
efficient way.
These are two transactions for user #1. I shortened some UUIDs so the table fits the page better. Most
columns are pretty easy to understand. Money values are stored in cent values so 3900 means $39 . There
are other rows as well, but they are not that important.
When it is payout time, a job queries all transactions in a given month for a user, creates a Payout object,
and then sets the payout_id in this table. This way we know that the given transaction has been paid out.
The same job exports the transactions for the user and sends them via e-mail.
laravel-excel is one of the most popular packages when it comes to imports/exports so we're going to
use it in the first example.
namespace App\Exports;

class TransactionsExport implements FromCollection, WithMapping
{
    // the collection() method returns the transactions (more on that below)

    public function map($row): array
    {
        return [
            $row->uuid,
            Arr::get($row->product_data, 'title'),
            $row->quantity,
            MoneyForHuman::from($row->revenue)->value,
            MoneyForHuman::from($row->fee_amount)->value,
            MoneyForHuman::from($row->tax_amount)->value,
            MoneyForHuman::from($row->balance_earnings)->value,
            $row->customer_email,
            $row->created_at,
        ];
    }
}
I've seen dozens of exports like this one over the years. It creates a CSV from a collection. In the
collection method, you can define your collection which is 99% of the time the result of a query. In this
case, the collection contains Transaction models. Nice and simple.
The collection method runs a single query and loads each and every transaction into memory. The moment you exceed a certain number of models, your process will die because of memory limitations. That number, of course, varies highly.
If your collection is not that big and the export made it through the query, the map function will run for each and every transaction. If you execute even one query here (a lazy-loaded relationship, for example), it'll run n times where n is the number of rows in your CSV. This is the breeding ground for N+1 problems.
Be aware of these things because it's pretty easy to kill your server with a poor export.
The export uses the Exportable trait from the package, which has a queue function. The method that runs the export uses this queue method:
(new TransactionsExport(
    $user,
    $interval,
))
    ->queue($report->relativePath())
    ->chain([
        new NotifyUserAboutExportJob($user, $report),
    ]);
Fortunately, there's a much better export type than FromCollection , it is called FromQuery . This export
does not define a Collection but a DB query instead that will be executed in chunks by laravel-excel .
namespace App\Exports;
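// A sketch of the FromQuery version (the query conditions and the chunk size are illustrative):

use App\Models\Transaction;
use App\Models\User;
use Maatwebsite\Excel\Concerns\Exportable;
use Maatwebsite\Excel\Concerns\FromQuery;
use Maatwebsite\Excel\Concerns\WithCustomChunkSize;

class TransactionsExport implements FromQuery, WithCustomChunkSize
{
    use Exportable;

    public function __construct(private User $user)
    {
    }

    public function query()
    {
        // a query builder, not a Collection; laravel-excel executes it in chunks
        return Transaction::query()->where('user_id', $this->user->id);
    }

    public function chunkSize(): int
    {
        return 250;
    }
}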
Instead of returning a Collection, the query method returns a query builder. In addition, you can also use the chunkSize method. It works hand in hand with Exportable and FromQuery: queued exports (using the Exportable trait and the queue method) are processed in chunks.
So in the chunkSize we can control how many jobs we want. For example, if we have 5,000 transactions for
a given user and chunkSize() returns 250 it means that 20 jobs will be dispatched each processing 250
transactions. Unfortunately, I cannot give you exact numbers. It all depends on your specific use case.
However, it's a nice way to fine-tune your export.
Using the techniques above, exporting 10k transactions is a walk in the park:
9,847 to be precise but the jobs are running smoothly. There are 40 jobs each processing 250 transactions:
Imports
This is what a basic laravel-excel import looks like:
namespace App\Imports;
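// A sketch of a basic import (the model and the column indexes are assumptions):

use App\Models\User;
use Illuminate\Support\Facades\Hash;
use Maatwebsite\Excel\Concerns\ToModel;

class UsersImport implements ToModel
{
    public function model(array $row)
    {
        // called for every row; laravel-excel saves the returned model
        return new User([
            'name' => $row[0],
            'email' => $row[1],
            'password' => Hash::make($row[2]),
        ]);
    }
}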
It reads the CSV and calls the model method for each row, then it calls save on the model you returned. This means it executes one query for each row. If you're importing thousands or tens of thousands of users, you'll spam your database and there's a good chance it will become unavailable.
laravel-excel offers two features to solve these problems:
Batch inserts
Chunk reading
Batch inserts
Batch insert means that laravel-excel won't execute one query per row, but instead, it batches the rows
together:
namespace App\Imports;
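// A sketch with batch inserts (the batch size is illustrative):

use App\Models\User;
use Maatwebsite\Excel\Concerns\ToModel;
use Maatwebsite\Excel\Concerns\WithBatchInserts;

class UsersImport implements ToModel, WithBatchInserts
{
    public function model(array $row)
    {
        return new User([
            'name' => $row[0],
            'email' => $row[1],
        ]);
    }

    public function batchSize(): int
    {
        // 500 rows are inserted with a single query
        return 500;
    }
}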
Chunk reading
Chunk reading means that instead of reading the entire CSV into memory at once laravel-excel chunks it
into smaller pieces:
namespace App\Imports;
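// A sketch with chunk reading (the chunk size is illustrative):

use App\Models\User;
use Maatwebsite\Excel\Concerns\ToModel;
use Maatwebsite\Excel\Concerns\WithChunkReading;

class UsersImport implements ToModel, WithChunkReading
{
    public function model(array $row)
    {
        return new User([
            'name' => $row[0],
            'email' => $row[1],
        ]);
    }

    public function chunkSize(): int
    {
        // only 1,000 rows are read into memory at a time
        return 1000;
    }
}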
Of course, these two features can be used together to achieve the best performance:
namespace App\Imports;
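// A sketch combining the two (the sizes are illustrative):

use App\Models\User;
use Maatwebsite\Excel\Concerns\ToModel;
use Maatwebsite\Excel\Concerns\WithBatchInserts;
use Maatwebsite\Excel\Concerns\WithChunkReading;

class UsersImport implements ToModel, WithBatchInserts, WithChunkReading
{
    public function model(array $row)
    {
        return new User([
            'name' => $row[0],
            'email' => $row[1],
        ]);
    }

    public function batchSize(): int
    {
        return 500;
    }

    public function chunkSize(): int
    {
        return 1000;
    }
}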
Of course, you don't have to use a package at all. Here's a readCsv function that reads a CSV manually with fgetcsv:

public function readCsv(string $path): Collection
{
    $rows = [];
    $rowIdx = -1;
    $columns = [];

    $stream = fopen($path, 'r');

    while (($data = fgetcsv($stream)) !== false) {
        $rowIdx++;

        if ($rowIdx === 0) {
            $columns = $data;

            continue;
        }

        $row = [];

        foreach ($data as $idx => $value) {
            $row[$columns[$idx]] = $value;
        }

        $rows[] = $row;
    }

    fclose($stream);

    return collect($rows);
}
fgetcsv by default reads the file line by line so it won't load too much data into memory, which is good.
This function assumes that the first line of the CSV contains the headers. This block saves them into the
$columns variable:
if ($rowIdx === 0) {
    $columns = $data;

    continue;
}
So the $columns array looks like this:

[
    0 => 'username',
    1 => 'email',
    2 => 'name',
]

And each subsequent line read by fgetcsv looks like this:

[
    0 => 'johndoe',
    1 => '[email protected]',
    2 => 'John Doe',
]
For every other line, the function builds an associative array (keyed by the column names) and collects them:

$row = [];

foreach ($data as $idx => $value) {
    $row[$columns[$idx]] = $value;
}

$rows[] = $row;
At the end, the function closes the file, and returns a collection such as this:
[
    [
        'username' => 'johndoe',
        'email' => '[email protected]',
        'name' => 'John Doe',
    ],
    [
        'username' => 'janedoe',
        'email' => '[email protected]',
        'name' => 'Jane Doe',
    ],
]
It's quite simple, but it has one problem: it holds every row in memory. Just like with laravel-excel it will
exceed the memory limit after a certain size. There are two ways to avoid this problem:
PHP generators
Laravel's LazyCollection
Since LazyCollections are built on top of generators, let's first understand them.
PHP generators
With a little bit of simplification, a generator function is a function that has multiple return statements. But
instead of return we can use the yield keyword. Here's an example:
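Something like this (a minimal sketch; the product data is made up):

function getProducts(): Generator
{
    for ($i = 1; $i <= 10_000; $i++) {
        yield [
            'id' => $i,
            'name' => 'Product #' . $i,
            'price' => 100,
        ];
    }
}

foreach (getProducts() as $product) {
    // only one product lives in memory at a time
}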
Any function that uses the yield keyword returns a Generator object, which implements the Iterator interface, so we can use it in a foreach.
Each time you call the getProducts function you get exactly one product back. So it won't load 10,000
products into memory at once, but only one.
function getProducts(): array
{
    $products = [];

    for ($i = 1; $i <= 10_000; $i++) {
        $products[] = [
            'id' => $i,
            'name' => 'Product #' . $i,
            'price' => 100,
        ];
    }

    return $products;
}
But this function will load 10,000 products into memory each time you call it.
Here are the results of the array-based version:

Number of items    Peak memory usage
10,000             5.45MB
100,000            49MB
300,000            PHP Fatal error: Allowed memory size of 134217728 bytes exhausted
It reached the 128MB memory limit with 300,000 items. And these items are lightweight arrays with only
scalar attributes! Imagine Eloquent models with 4-5 different relationships, attribute accessors, etc.
And the results of the generator-based version:

Number of items    Peak memory usage
10,000             908KB
100,000            4.5MB
1,000,000          33MB
2,000,000          65MB
3,000,000          PHP Fatal error: Allowed memory size of 134217728 bytes exhausted
It can handle 2,000,000 items using only 65MB of RAM. It's 20 times more than what the standard function
could handle. However, the memory usage is only 32% higher (65M vs 49M).
while (($data = fgetcsv($stream)) !== false) {
    $rowIdx++;

    if ($rowIdx === 0) {
        $columns = $data;

        continue;
    }

    $row = [];

    foreach ($data as $idx => $value) {
        $row[$columns[$idx]] = $value;
    }

    yield $row;
}
The whole function is identical except that it's not accumulating the data in a $rows variable but instead, it
yields every line when it reads it.
$transactions = $this->readCsv();
This is the equivalent of chunk reading in laravel-excel . Now let's implement batch inserts as well.
$transactions = $this->readCsv();
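foreach ($transactions as $transaction) {
    // one INSERT for every CSV line
    Transaction::create($transaction);
}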
It runs one DB query for each CSV line. It can be dangerous if the CSV contains 75,000 lines, for example.
$transactions = $this->readCsv();

$transactionBatch = [];

foreach ($transactions as $idx => $transaction) {
    $transactionBatch[] = $transaction;

    if (($idx + 1) % 500 === 0) {
        Transaction::insert($transactionBatch);

        $transactionBatch = [];
    }
}

if (!empty($transactionBatch)) {
    Transaction::insert($transactionBatch);
}
It accumulates transactions until it hits an index that can be divided by 500 then it inserts 500 transactions
at once. If there were 1,741 transactions, for example, the insert after the loop inserts the remaining 241.
With generators and a little trick, we achieved the same two things as with laravel-excel: chunk reading and batch inserts. The other option is Laravel's LazyCollection:
$collection = LazyCollection::make(function () {
    $handle = fopen('log.txt', 'r');

    while (($line = fgets($handle)) !== false) {
        yield $line;
    }
});
It works the same way as Generators but the make function returns a LazyCollection instance that has
lots of useful Collection methods such as map or each .
public function readCsv(string $path): LazyCollection
{
    return LazyCollection::make(function () use ($path) {
        $rowIdx = -1;
        $columns = [];

        $stream = fopen($path, 'r');

        while (($data = fgetcsv($stream)) !== false) {
            $rowIdx++;

            if ($rowIdx === 0) {
                $columns = $data;

                continue;
            }

            $row = [];

            foreach ($data as $idx => $value) {
                $row[$columns[$idx]] = $value;
            }

            yield $row;
        }

        fclose($stream);
    });
}
The function that uses the readCsv method now looks like this:
$this->readCsv()
    ->chunk(500)
    ->each(function (LazyCollection $transactions) {
        Transaction::insert($transactions->toArray());
    });
We can leverage the built-in chunk method that chunks the result by 500.
Reading files
Similarly to CSVs, reading a simple text file by chunks is pretty straightforward:
public function readByLines(string $path): Generator
{
    $stream = fopen($path, 'r');

    while (($line = fgets($stream)) !== false) {
        yield $line;
    }

    fclose($stream);
}
Even when reading a 60MB file the peak memory usage is 20MB.
$contents = $this->readByLines("./storage/app/test.txt");
Deleting records
Deleting a large number of records can be tricky because you need to run a huge query that keeps MySQL
busy for seconds or even minutes.
For example, I have a page_views table from which I'd like to delete 4.2 million records. Running a simple count(*) query against this table took 1.47s (there's no index on the created_at column).
The Laravel command that removes page_views records looks like this:
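A sketch of what its handle method could look like (the PageView model and the exact time range are assumptions):

public function handle(): void
{
    // one huge DELETE statement for ~4.2 million rows
    PageView::query()
        ->where('created_at', '<', now()->subMonths(3))
        ->delete();
}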
For 58s the database was really busy executing this huge delete operation. While it was deleting records I
ran the same count(*) query a few times to check the speed of MySQL. At some point, it took 4.4s to run.
Normally it took 1.47s. So a huge delete query such as this one will make your database much slower, or it might even bring it down completely.
A pretty simple and neat trick we can apply here is to chunk the query and add a sleep in the function. I
first read about this trick from Matt Kingshott on Twitter.
// $query only contains the where expression, for example:
// $query = PageView::query()->whereBetween('created_at', [$start, $end]);

while ($query->exists()) {
    $query->limit(5000)->delete();

    sleep(1);
}
The $query doesn't specify if it's a select or a delete. It's just a query builder object with a where expression.
The while loop invokes exists, which runs a lightweight select query to determine if there are any rows in the given time range.
If records can be found, we delete 5,000 of them and then sleep for 1 second.
Of course, 1 second is just an example, maybe in your specific use case you need to use more than that.
This way, the process will take a much longer time to complete, but you won't overload your database and it
remains fast during the execution.
The point is whenever you need to work with a large dataset, it's probably a good idea to apply "divide and
conquer," or in other words chunk your data into smaller pieces and process them individually. In the
"Async workflows" chapter, you can see more examples of this idea.
Miscellaneous
fpm processes
php-fpm comes with a number of configurations that can affect the performance of our servers. These are
the most important ones:
pm.max_children : This directive sets the maximum number of fpm child processes that can be
started. This is similar to worker_processes in nginx.
pm.start_servers : This directive sets the number of fpm child processes that should be started when
the fpm service is first started.
pm.min_spare_servers : This directive sets the minimum number of idle fpm child processes that
should be kept running to handle incoming requests.
pm.max_requests : This directive sets the maximum number of requests that an fpm child process can
handle before it is terminated and replaced with a new child process. This is similar to the --max-jobs
option of the queue:work command.
The number of php-fpm processes is often calculated based on memory rather than CPU because PHP
processes are typically memory-bound rather than CPU-bound.
When a PHP script is executed, it loads into memory and requires a certain amount of memory to run. The
more PHP processes that are running simultaneously, the more memory will be consumed by the server. If
too many PHP processes are started, the server may run out of memory and begin to swap, which can lead
to performance issues.
TL;DR: if you don't have some obvious performance issue in your code, PHP usually consumes more memory than CPU.
So we need a few pieces of information to figure out the correct number for the max_children config:
How much memory does an average php-fpm process use?
How much memory does your server need just to stay alive?
Here's a command that will give you the average memory used by fpm processes:
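ps -ylC php-fpm8.1 --sort:rss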
-y tells ps to display the process ID (PID) and the process's controlling terminal.
-l instructs ps to display additional information about the process, including the process's state, the
amount of CPU time it has used, and the command that started the process.
-C php-fpm8.1 tells ps to only display information about processes with the name php-fpm8.1 .
--sort:rss : will sort the output based on the amount of resident set size (RSS) used by each process.
What the hell is the resident set size? It's a memory utilization metric that refers to the amount of physical
memory currently being used by a process. It includes the amount of memory that is allocated to the
process and cannot be shared with other processes. This includes the process's executable code, data, and
stack space, as well as any memory-mapped files or shared libraries that the process is using.
It's called "resident" for a reason. It shows the amount of memory that cannot be used by other processes.
For example, when you run memory_get_peak_usage() in PHP it only returns the memory used by the PHP
script. On the other hand, RSS measures the total memory usage of the entire process.
The command will spam your terminal with an output such as this:
The RSS column shows the memory usage: from 25MB to 43MB in this case. The first line (which has significantly lower memory usage) is usually the master process. We can take that out of the equation and say the average memory used by a php-fpm worker process is 43MB.
The next question is how much memory does your server need just to stay alive? This can be determined
using htop :
As you can see from the load average, right now nothing is happening on this server but it uses ~700MB of
RAM. This memory is used by Linux, PHP, MySQL, Redis, and all the system components installed on the
machine.
This means there is 1.3GB of RAM left to use. So we can spin up 1300/43 ≈ 30 fpm processes.
It's a good practice to decrease the available RAM by at least 10% as a kind of "safety margin". So let's calculate with 1.17GB of RAM: 1170/43 ≈ 27-28.
To be completely honest, I'm not sure how these values are calculated, but they are the "standard" settings. You can search for these configs on the web and you'll probably run into articles suggesting similar numbers. By the way, there's also a calculator here.
To configure these values you need to edit the fpm config file, which in my case is located in /etc/php/8.1/fpm/pool.d/www.conf:
pm.max_children = 28
pm.start_servers = 7
pm.min_spare_servers = 7
pm.max_spare_servers = 21
Changing the number of children processes requires a full restart since fpm needs to kill and spawn
processes.
nginx cache
There are different types of caching mechanisms in nginx. We're gonna discover three of them:
Static content
FastCGI
Proxy
Caching static content with nginx can significantly improve the performance of a web application by
reducing the number of requests to the server and decreasing the load time of pages.
nginx provides several ways to cache static content such as JavaScript, CSS, and images. One way is to use
the expires directive to set a time interval for the cached content to be considered fresh.
location ~* \.(css|js|png|jpg|gif|ico)$ {
access_log off;
add_header Cache-Control public;
add_header Vary Accept-Encoding;
expires 1d;
}
~* means a case-insensitive regular expression that matches files such as https://example.com/style.css.
In most cases, it's a good idea to turn off the access_log when requesting images, CSS, and JS files. It spams the hell out of your access log file but doesn't really help you.
add_header Cache-Control public; : this adds a response header to enable caching of the static files
by public caches such as browsers, proxies, and CDNs. Basically, this instructs the browser to store the
files.
add_header Vary Accept-Encoding; : this adds a response header to indicate that the content may
vary based on the encoding of the request.
expires 1d; : this sets the expiration time for the cached content to 1 day from the time of the
request. There's no "perfect" time here. It depends on your deployment cycle, the usage, and so on. I
usually use a shorter time since it doesn't cause too many errors. For example, if you cache JS files for 7
days because you deploy on a weekly basis it means you cannot release a bugfix confidently, because
browsers might cache the old, buggy version. Of course, you can define a dedicated location directive
for JS, CSS files and another one for images. Something like this:
location ~* \.(css|js)$ {
access_log off;
add_header Cache-Control public;
add_header Vary Accept-Encoding;
expires 1d;
}
location ~* \.(png|jpg|gif|ico)$ {
access_log off;
add_header Cache-Control public;
add_header Vary Accept-Encoding;
expires 7d;
}
As you can see, it was pretty easy. Caching static content with nginx is an effective way to improve the
performance of your app, reduce server load, and enhance the user experience.
The fastcgi_cache directive can be used to store the responses from the fastcgi server on disk and serve them directly to clients without having to go through the backend server every time. So this is what happens: the first time a request comes in, nginx forwards it to fastcgi (PHP-FPM), returns the response to the client, and also saves it to disk.
Next time when a request comes in to the same URL it won't forward the request to fastcgi. Instead, it loads the content from the disk and returns it immediately to the client.
Caching fastcgi responses can drastically reduce the load on backend servers, improve the response time of
web applications, and enhance the user experience. It is particularly useful for websites that have high
traffic and serve dynamic content that changes infrequently.
In the company I'm working for, we had a recurring performance problem. The application we're building is
a platform for companies to handle their internal communication and other PR or HR-related workflows.
One of the most important features of the app is posts and events. Admins can create a post and publish
them. Employees get a notification and they can read the post.
Let's say a company has 10,000 employees. They publish an important post that interests people. All 10,000 employees get the notification in 60 seconds or so. And they all hit the page within a few minutes. That's a
big spike compared to the usual traffic. The post details page (where employees go from the mail or push
notification) is, let's say, not that optimal. It's legacy code and has many performance problems such as N+1
queries. The page triggers ~80 SQL queries. 10 000 x 80 = 800 000 SQL queries. Eight hundred thousand
SQL queries in 5-10 minutes or so. That's bad.
There were two things we could do. The first one: optimize the code and remove N+1 queries and other performance issues. This is outside of the scope, but fortunately, there's Laracheck which can detect N+1 and other performance problems in your code! Now, that was a seamless plug, wasn't it?
The second one: cache the response on the nginx level. A few things made caching a good fit here:
The API response doesn't change frequently. Only when admins update or delete the post.
The response is independent of the current user. Every user sees the same title, content, etc so there's
no personalization on the page. This is required because nginx doesn't know anything about users and
their settings/preferences.
Since we're trying to solve a traffic spike problem, it's a very good thing if we could handle it on the
nginx-level. This means users won't even hit the API and Laravel. Even if you cache the result of a
database query with Laravel Cache 10 000 requests will still come into your app.
We can cache the posts for a very short time. For example, 1 minute. When the spike happens this 1
minute means thousands of users. But, using a short TTL means we cannot make big mistakes. Cache
invalidation is hard. Harder than we think so it's always a safe bet to use shorter TTLs. In this case, it
perfectly fits the use case.
I'll solve the same situation in the sample app. There's an /api/posts/{post} endpoint that we're gonna
cache.
http {
    fastcgi_cache_path /tmp/nginx_cache levels=1:2 keys_zone=content_cache:100m inactive=10m;
}
First, we need to tell nginx where to store the cache on the disk. This is done by using the
fastcgi_cache_path directive. It has a few configurations:
levels=1:2 tells nginx to create 2 levels of subdirectories inside this folder. The folder structure will
be something like that:
4e/
    b45cffe084dd3d20d928bee85e7b0f4e
    2c322014fccc0a5cfbaf94a4767db04e
32/
    e2446c34e2b8dba2b57a9bcba4854d32
So levels=1:2 means that nginx builds a two-level directory structure from the hashed cache key: the first level is named after one character of the hash and the second level after two more characters (such as 4e and 32 above). The cached response itself is stored in a file named with the full hash, such as b45cffe084dd3d20d928bee85e7b0f4e.
If you don't specify the levels option nginx will create only one level of directories. Which is fine for
smaller sites. However, for bigger traffic, specifying the levels option is a good practice since it can boost
the performance of nginx.
keys_zone=content_cache:100m defines the key of this cache which we can reference later. The 100m
sets the size of the cache to 100MB.
inactive=10m tells how long to keep a cache entry after it was last accessed. In this case, it's 10
minutes.
location ~\.php {
    fastcgi_cache_key $scheme$host$request_uri$request_method;
    fastcgi_cache content_cache;
    fastcgi_cache_valid 200 5m;
    fastcgi_cache_use_stale error timeout invalid_header http_500 http_503 http_404;
    fastcgi_ignore_headers Cache-Control Expires Set-Cookie;
}
fastcgi_cache_key defines the key for the given request. For GET https://mysite.com/posts/1 it looks like this: httpsmysite.com/posts/1GET. This is the string that will be the file name after it's hashed.
fastcgi_cache here we need to specify the key we used in the keys_zone option.
The fastcgi_cache_valid directive sets the maximum time that a cached response can be
considered valid. In this case, it's set to 5 minutes for only successful (200) responses.
The fastcgi_ignore_headers directive specifies which response headers should be ignored when
caching responses. Basically, they won't be cached at all. Caching cache-related headers with
expiration dates does not make much sense.
The fastcgi_cache_use_stale directive specifies which types of stale cached responses can be used
if the backend server is unavailable or returns an error. A stale cached response is a response that has
been previously cached by the server, but has exceeded its maximum allowed time to remain in the
cache and is considered "stale". This basically means that even if the BE is currently down we can serve
clients by using older cached responses. In this project, where the content is not changing that often
it's a perfectly good strategy to ensure better availability.
All right, so we added these directives to the location ~\.php block, so they apply to every request, which is not the desired outcome. The way we can control which requests should use the cache looks like this:
fastcgi_cache_bypass 1;
fastcgi_no_cache 1;
If fastcgi_cache_bypass is 1, nginx will not use the cache and forwards the request to the backend. If fastcgi_no_cache is 1, the response won't be stored in the cache at all.
Obviously, we need a way to set these values dynamically. Fortunately, nginx can handle variables and if
statements:
set $no_cache 1;

if ($request_uri ~* "\/posts\/([0-9]+)") {
    set $no_cache 0;
}

if ($request_method != GET) {
    set $no_cache 1;
}
This code will set the $no_cache variable to 0 only if the request is something like this: GET /posts/12. Otherwise, it'll be 1. Finally, we can use this variable:
location ~\.php {
    fastcgi_cache_key $scheme$host$request_uri$request_method;
    fastcgi_cache content_cache;
    fastcgi_cache_valid 200 5m;
    fastcgi_cache_use_stale error timeout invalid_header http_500 http_503 http_404;
    fastcgi_ignore_headers Cache-Control Expires Set-Cookie;
    fastcgi_cache_bypass $no_cache;
    fastcgi_no_cache $no_cache;
}
With this simple config, we can cache every posts/{post} response for 5 minutes. The 5 minutes is an arbitrary number and it's different for every use case. As I said earlier, with this example I wanted to solve a performance problem that happens in a really short time, so caching responses for 5 minutes is a good solution to this problem. And of course, the shorter you cache something the less risk you take (by serving outdated responses).
An important thing about caching on the nginx level: it can be tricky (or even impossible) to cache user-dependent content. For example, what if users can see the post in their preferred language? To make this possible we need to add the language to the cache key, so one post will have many cache keys, one for each language. If you have the language key in the URL it's not a hard task, but if you don't, you have to refactor your application. Or you need to use Laravel cache (where you have access to the user object and preferences, of course).
There are other kinds of settings that can cause problems. For example, what if every post has an audience?
So you can define who can see your post (for example, only your followers or everyone, etc). To handle this,
you probably need to add the user ID to the URL and the cache key as well.
Be aware of these scenarios. There's also an analogous proxy_cache directive that you can use when nginx acts as a reverse proxy (proxy_pass).
slow_query_log = 1
slow_query_log_file = /var/log/slow_query.log
long_query_time = 1
long_query_time is in seconds so 1 means MySQL logs every query that takes more than 1s. You should
adjust it to your preferences. With these settings, MySQL will log slow queries in /var/log/slow_query.log
after you restart it.
# Time: 2024-04-23T17:56:50.185184Z
# User@Host: root[root] @ [192.168.65.1] Id: 8
# Query_time: 2.103501 Lock_time: 0.000050 Rows_sent: 5 Rows_examined: 1557987
SET timestamp=1713895008;
select
hashed_uri, count(*) as total
from
`page_views`
where visited_at between "2024-04-07 19:00:00" and "2024-04-08 23:59:59"
and site_id = 1
group by hashed_uri
order by total desc
limit 5;
# Time: 2024-04-23T17:58:48.187662Z
# User@Host: root[root] @ [192.168.65.1] Id: 8
# Query_time: 6.309928 Lock_time: 0.000009 Rows_sent: 1557982 Rows_examined: 1557982
SET timestamp=1713895121;
select *
from page_views
You can enable this on your server and monitor your queries. After a time the log file will be huge so don't
forget to purge or rotate it.
It counts the number of connections your database has at the given moment and dispatches an event if it's greater than the --max argument, which is 100 in this case. We need to schedule this command to run every minute:
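A sketch of the scheduler entry (the command name is an assumption):

$schedule->command('app:monitor-db-connections --max=100')->everyMinute();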
The command will fail and dispatch an event if the number of connections is greater than 100. We can
handle the event like this:
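A sketch of such a listener (the event, notification, and config key names are assumptions):

use Illuminate\Support\Facades\Notification;

class NotifyAdminsAboutDatabaseConnections
{
    public function handle(DatabaseConnectionsExceeded $event): void
    {
        // send an on-demand notification to the team
        Notification::route('mail', config('mail.admin_address'))
            ->notify(new TooManyDatabaseConnectionsNotification($event->count));
    }
}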
app:
  image: app:1.0.0
  deploy:
    resources:
      limits:
        cpus: 1.5

queue:
  image: app:1.0.0
  deploy:
    resources:
      limits:
        cpus: 1

mysql:
  image: mysql:8.0.35
  deploy:
    resources:
      limits:
        cpus: 1.5
This is an example with a 2-core machine. This config makes sure that no container will drive your CPU
crazy. Even if you write an infinite loop that calculates prime numbers your server will have at least 0.5 cores
available for other processes. In the queue, I used a smaller number. This is not a coincidence. In my
opinion, background processes should not use the entire CPU. This way, containers that serve user requests
always have some available CPU.
app:
  image: app:1.0.0
  deploy:
    resources:
      limits:
        cpus: 1.5
        memory: 128M
spatie/laravel-health
pragmarx/health
Both of them do the same thing: they monitor the health of the application, meaning things such as CPU load, used disk space, the number of database connections, whether Redis is available, etc.
You can define the desired threshold and the package will notify you if necessary. I'm going to use the
Spatie package but the other one is also pretty good. Spatie is more code-driven meanwhile the Pragmarx
package is more configuration-driven.
<?php
namespace App\Providers;
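// A sketch of registering the checks (the thresholds follow the descriptions below):

use Illuminate\Support\ServiceProvider;
use Spatie\CpuLoadHealthCheck\CpuLoadCheck;
use Spatie\Health\Checks\Checks\DatabaseConnectionCountCheck;
use Spatie\Health\Checks\Checks\RedisCheck;
use Spatie\Health\Checks\Checks\RedisMemoryUsageCheck;
use Spatie\Health\Checks\Checks\UsedDiskSpaceCheck;
use Spatie\Health\Facades\Health;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        Health::checks([
            UsedDiskSpaceCheck::new()
                ->warnWhenUsedSpaceIsAbovePercentage(70)
                ->failWhenUsedSpaceIsAbovePercentage(90),

            CpuLoadCheck::new()
                ->failWhenLoadIsHigherInTheLast5Minutes(2.0)
                ->failWhenLoadIsHigherInTheLast15Minutes(1.75),

            DatabaseConnectionCountCheck::new()
                ->warnWhenMoreConnectionsThan(50)
                ->failWhenMoreConnectionsThan(100),

            RedisCheck::new(),

            RedisMemoryUsageCheck::new()->failWhenAboveMb(500),
        ]);
    }
}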
UsedDiskSpaceCheck warns you if more than 70% of the disk is used and it sends an error message if
more than 90% is used.
CpuLoadCheck measures the CPU load (the numbers you can see when you open htop). It sends you a failure message if the 5-minute average load is more than 2 or if the 15-minute average is more than 1.75. I'm using 2-core machines in this project, so a load of 2 means both cores run at 100%. If you have a 4-core CPU, 4 means 100% load.
DatabaseConnectionCountCheck sends you a warning if there are more than 50 connections and a
failure message if there are more than 100 MySQL connections.
RedisCheck tries to connect to Redis and notifies you if the connection cannot be established.
RedisMemoryUsageCheck sends you a message if Redis is using more than 500MB of memory.
These are the basic checks you can use in almost every project.
To be able to use the CpuLoadCheck and the DatabaseConnectionCountCheck you have to install these
packages as well:
spatie/cpu-load-health-check
doctrine/dbal
The package can send you e-mail and Slack notifications as well. Just set them up in the health.php config file:
'notifications' => [
    Spatie\Health\Notifications\CheckFailedNotification::class => ['mail'],
],

'mail' => [
    'to' => env('HEALTH_CHECK_EMAIL', '[email protected]'),

    'from' => [
        'address' => env('MAIL_FROM_ADDRESS', '[email protected]'),
        'name' => env('MAIL_FROM_NAME', 'Health Check'),
    ],
],

'slack' => [
    'webhook_url' => env('HEALTH_SLACK_WEBHOOK_URL', ''),
    'channel' => null,
    'username' => null,
    'icon' => null,
],
Finally, you need to run the command provided by the package every minute:
$schedule->command('health:check')->everyMinute();
You can also write your own checks. For example, I created a QuerySpeedCheck that simply measures the
speed of an important query:
namespace App\Checks;

use App\Models\Post;
use Illuminate\Support\Benchmark;
use Spatie\Health\Checks\Check;
use Spatie\Health\Checks\Result;

class QuerySpeedCheck extends Check
{
    public function run(): Result
    {
        $result = Result::make();

        $executionTimeMs = Benchmark::measure(function () {
            Post::with('author')->orderBy('publish_at')->get();
        });

        // the threshold is arbitrary (see below)
        if ($executionTimeMs > 500) {
            return $result->failed("The query took {$executionTimeMs}ms");
        }

        return $result->ok();
    }
}
In the sample application I don't have too many database queries, but selecting the posts with authors is
pretty important and happens a lot. So if this query cannot be executed in a certain amount of time it
means the application might be slow. The numbers used in this example are completely arbitrary. Please
measure your own queries carefully before setting a threshold.