HTTP Caching - Roadmap - SH
HTTP Caching
Everything you need to know about web caching
Introduction
What is a web cache? It is something that sits somewhere between the
client and the server, watching the requests and their responses and
storing any responses that can be cached, so that less time is spent
when the same request is made again.
Note that this image is just to give you an idea. Depending upon the
type of cache, the place where it is implemented could vary. More
on this later.
Before we get into further details, let me give you an overview of the
terms that will be used later in the article.
Origin Server, the source of truth, houses all the content required
by the client and is responsible for fulfilling the client’s requests.
Caching Locations
A web cache can be shared or private, depending on where it is
located. Here is the list of the different caching locations:
Browser Cache
Proxy Cache
Browser Cache
You might have noticed that when you click the back button in your
browser it takes less time to load the page than the time that it took
during the first load; this is the browser cache in play. Browser cache is
the most common location for caching and browsers usually reserve
some space for it.
A browser cache is limited to just one user and unlike other caches, it
can store the “private” responses. More on it later.
Proxy Cache
Unlike a browser cache, which serves a single user, a proxy cache may
serve hundreds of different users accessing the same content. Proxy
caches are usually implemented at a broader level, for example by ISPs
or other independent entities.
Reverse Proxy Cache
Caching Headers
So, how do we control the web cache? Whenever the server emits a
response, it is accompanied by HTTP headers that tell the caches
whether and how to cache that response. It is up to the content
provider to return the proper HTTP headers so that caches handle the
content as intended.
Expires
It should be noted that the date cannot be more than a year and if the
date format is wrong, the content will be considered stale. Also, the
clock on the cache has to be in sync with the clock on the server,
otherwise, the desired results might not be achieved.
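The date rules above can be sketched in Python. This is only an illustration, assuming the server builds the header value itself; the function names expires_header and is_stale are made up for this example, and the one-year cap simply mirrors the limit mentioned above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now: datetime, lifetime: timedelta) -> str:
    """Build an Expires value in the HTTP date format (always GMT).

    `now` must be a timezone-aware UTC datetime.
    """
    # Cap the lifetime at one year, the upper limit mentioned in the text.
    capped = min(lifetime, timedelta(days=365))
    return format_datetime(now + capped, usegmt=True)


def is_stale(expires_value: str, now: datetime) -> bool:
    """Treat a malformed date as already expired, as a cache would."""
    try:
        expires_at = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        return True
    if expires_at.tzinfo is None:
        # No usable timezone information: be conservative.
        return True
    return now >= expires_at


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    value = expires_header(now, timedelta(days=30))
    print("Expires:", value)
    print("stale right now?", is_stale(value, now))
```

Both functions take the current time as an argument, which makes the clock dependency explicit: if the cache and the server disagree about "now", they will also disagree about staleness.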
Pragma
Another one from the old, pre-HTTP/1.1 days is Pragma. Everything that
it could do is now possible using the Cache-Control header described
below. However, one thing I would like to point out is that you might
see Pragma: no-cache being used here and there in the hope of
stopping the response from being cached. It will not necessarily
work, as the HTTP specification discusses it for request headers and
does not define it for response headers. The Cache-Control header
should be used to control caching instead.
Cache-Control
private
Setting Cache-Control to private means that the content will not be
cached by any of the proxies; it will only be cached by the client
(i.e. the browser).
Cache-Control: private
Having said that, don’t let it fool you into thinking that setting this
header will make your data any more secure; you still have to use
SSL/TLS for that purpose.
public
If set to public, the content can be cached not only by the client but
also by the proxies, which serve many other users.
Cache-Control: public
no-store
no-store specifies that the content must not be stored by any of
the caches.
Cache-Control: no-store
no-cache
max-age: seconds
max-age specifies the number of seconds for which the content will
be cached.
s-maxage: seconds
s-maxage targets the shared caches. Like max-age, it takes the number
of seconds for which something is to be cached. If present, it overrides
the max-age directive and the Expires header for shared caches.
Cache-Control: s-maxage=3600, public
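To make the interplay of private, public, no-store, max-age and s-maxage concrete, here is a rough Python sketch of how a cache might read the header. It is a simplification rather than a complete implementation of the HTTP rules, and the function names (parse_cache_control, may_store, lifetime_seconds) are invented for this example.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def may_store(directives: Dict[str, Optional[str]], shared: bool) -> bool:
    """Decide whether a cache may keep the response at all."""
    if "no-store" in directives:
        return False  # nobody may store it
    if shared and "private" in directives:
        return False  # only the browser may store it
    return True


def lifetime_seconds(directives: Dict[str, Optional[str]], shared: bool) -> Optional[int]:
    """Return the freshness lifetime given by the directives, if any."""
    # s-maxage only applies to shared caches and wins over max-age there.
    if shared and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None  # no explicit lifetime in this header


if __name__ == "__main__":
    parsed = parse_cache_control("max-age=600, s-maxage=3600, public")
    print("proxy lifetime:", lifetime_seconds(parsed, shared=True))
    print("browser lifetime:", lifetime_seconds(parsed, shared=False))
```

The shared flag is the only thing that distinguishes a proxy from a browser here, which is exactly the split that private and s-maxage are designed around.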
must-revalidate
If there are network problems and the content cannot be retrieved from
the server, the browser may serve stale content without validation.
must-revalidate avoids that. If this directive is present, it means that
stale content cannot be served in any case and the data must be
revalidated with the server before serving.
proxy-revalidate
proxy-revalidate is like must-revalidate, but it applies only to shared
(proxy) caches. Why did they not call it s-revalidate? I have no idea;
if you have any clue, please leave a comment below.
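The effect of the two revalidation directives can be summed up in a few lines of Python. This is a deliberately small sketch under the assumption that the cached copy is already stale; can_serve_stale and its parameters are hypothetical names, not part of any library.

```python
def can_serve_stale(cache_control: str, origin_reachable: bool, shared: bool) -> bool:
    """Decide whether an already-stale response may be served as-is."""
    directives = {
        part.strip().split("=")[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if origin_reachable:
        # The origin can be asked, so the cache should revalidate instead.
        return False
    if "must-revalidate" in directives:
        return False  # applies to every cache
    if shared and "proxy-revalidate" in directives:
        return False  # applies to shared caches only
    return True


if __name__ == "__main__":
    print(can_serve_stale("max-age=60", origin_reachable=False, shared=False))
    print(can_serve_stale("max-age=60, must-revalidate", origin_reachable=False, shared=False))
```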
Mixing Values
Validators
Up until now we have only discussed how the content is cached and how
long the cached content is to be considered fresh, but we have not
discussed how the client validates its copy with the server. Below we
discuss the headers used for this purpose.
ETag
A strong ETag means that two resources are exactly the same and there
is no difference between them at all, while a weak ETag means that two
resources, although not strictly identical, can be considered the
same. Weak ETags might be useful for dynamic content, for example.
Now you know what ETags are, but how does the browser validate its
copy? By making a request to the server and sending the ETag it has in
the If-None-Match header.
Consider this scenario: you opened a web page which loaded a logo
image with a caching period of 60 seconds and an ETag of abc123xyz.
After about 30 minutes you reload the page. The browser will notice
that the logo, which was fresh for 60 seconds, is now stale; it will
trigger a request to the server, sending the ETag of the stale logo
image in the If-None-Match header:
If-None-Match: "abc123xyz"
The server will then compare this ETag with the ETag of the current
version of the resource. If the two ETags match, the server will send
back a 304 Not Modified response, which tells the client that the copy
it has is still good, and it will be considered fresh for another 60
seconds. If the ETags do not match, the logo has likely changed, and
the client will be sent the new logo, which it will use to replace the
stale logo that it has.
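The server side of this exchange can be sketched roughly as follows. This is an illustrative Python function, not how any particular server implements it; respond and its tuple return value are assumptions made for the example, and the 60-second max-age mirrors the scenario above.

```python
from typing import Dict, Optional, Tuple


def _opaque(tag: str) -> str:
    """Drop the weak marker so W/"x" and "x" compare as equal."""
    return tag[2:] if tag.startswith("W/") else tag


def respond(
    if_none_match: Optional[str], current_etag: str, body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    headers = {"ETag": current_etag, "Cache-Control": "max-age=60"}
    if if_none_match is not None:
        # The header may carry several comma-separated ETags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        matched = "*" in candidates or _opaque(current_etag) in {
            _opaque(tag) for tag in candidates
        }
        if matched:
            # The client's copy is still good: no body is sent.
            return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    status, _, _ = respond('"abc123xyz"', '"abc123xyz"', b"logo bytes")
    print(status)
```

Sending the headers again with the 304 is what lets the client treat its copy as fresh for another 60 seconds.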
Last-Modified
When the content gets stale, the client makes a conditional request to
the server, sending the last modified date that it has in a header
called If-Modified-Since. If the content has not been modified since
that date, the server answers with 304 Not Modified and the cached
content is considered fresh for another n seconds. If it has been
modified, the new content is downloaded from the server and replaces
the content that the client has.
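A date-based check on the server could look something like the sketch below. It assumes the server knows the resource's modification time as a timezone-aware datetime; the function name check_modified is made up for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def check_modified(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the resource is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unreadable date is ignored
    if since.tzinfo is None:
        return 200  # no usable timezone information
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


if __name__ == "__main__":
    modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(modified, usegmt=True)
    print("Last-Modified:", header)
    print(check_modified(header, modified))
```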
You might be wondering now: what if the cached content has both a
Last-Modified date and an ETag assigned to it? Well, in that case the
client sends both validators with its conditional request. Older
versions of the HTTP specification required both to agree before the
server could answer 304 Not Modified; the current specification tells
servers to evaluate If-None-Match and ignore If-Modified-Since when
both are present. Either way, if the validators show that the resource
has changed, the content has to be downloaded again.
Where do I start?
Now that we have got everything covered, let us put everything in
perspective and see how you can use this information.
Utilizing Server
Before we get into the possible caching strategies, let me add that
most servers, including Apache and Nginx, allow you to implement your
caching policy in the server configuration so that you don’t have to
juggle headers in your code.
For example, if you are using Apache and your static content is
placed at /static, you can put an .htaccess file in that directory
to have all the content in it cached for a year.
You can further use the FilesMatch directive to add conditions and use
a different caching strategy for different kinds of files.
Or, if you don’t want to use the .htaccess file, you can modify Apache’s
configuration file httpd.conf. The same goes for Nginx: you can add the
caching information in the location or server block.
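The idea of choosing a policy per kind of file is not tied to Apache or Nginx. As a language-neutral illustration, here is a small Python sketch of the same decision; the extension table, the lifetimes and the name cache_control_for are all hypothetical and would need to be adapted to a real application.

```python
import posixpath
from typing import Optional

ONE_YEAR = 31536000  # seconds

# Hypothetical policy table: long-lived static assets, short-lived HTML.
POLICY_BY_EXTENSION = {
    ".css": f"public, max-age={ONE_YEAR}",
    ".js": f"public, max-age={ONE_YEAR}",
    ".png": f"public, max-age={ONE_YEAR}",
    ".html": "public, max-age=300",
}


def cache_control_for(path: str) -> Optional[str]:
    """Pick a Cache-Control value from the file extension, if one is configured."""
    _, extension = posixpath.splitext(path)
    return POLICY_BY_EXTENSION.get(extension.lower())


if __name__ == "__main__":
    for path in ("/static/style.css", "/index.html", "/api/items"):
        print(path, "->", cache_control_for(path))
```

Keeping the table in one place plays the same role as the server configuration: application code does not have to set caching headers response by response.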
Caching Recommendations
There is no golden rule or set standard for what your caching policy
should look like; each application is different, and you have to find
what suits your application best. However, just to give you a rough
idea:
You can have aggressive caching (e.g. cache for a year) on any
static content and use fingerprinted filenames (e.g.
style.ju2i90.css) so that the cached copy is automatically bypassed
whenever the files are updated. Also note that you should not go
beyond the upper limit of one year, as it might not be honored.
Decide whether you need caching for any dynamic content at all and,
if so, for how long. For example, the RSS feed of a blog could be
cached for a few hours, but inventory items in an ERP should not be
cached at all.
Separate the content that changes often from the content that
doesn’t change that often (e.g. in JavaScript bundles) so that an
update to one part doesn’t make the whole cached content stale.
Test and monitor the caching headers being served by your site.
You can use the browser console or curl -I http://some-url.com for
that purpose.
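Fingerprinted filenames, as recommended above, are usually generated by a build tool, but the idea fits in a few lines. The sketch below is one possible approach, assuming a SHA-256 content hash truncated to 8 characters; the function name fingerprinted_name and the hash length are arbitrary choices for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension, e.g. style.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    print(fingerprinted_name("css/style.css", b"body { color: black; }"))
    print(fingerprinted_name("css/style.css", b"body { color: navy; }"))
```

Because the name changes whenever the content changes, the old URL can safely be cached for a year: an updated file is simply requested under a new URL.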