For Content Publishers: Michael J. Radwin O'Reilly Open Source Convention July 28, 2004

The document discusses HTTP caching and cache-busting techniques for content publishers. It explains how browsers and proxies cache web content locally for faster loading, and recommends five techniques for publishers: 1) use "Cache-Control: private" for personalized content so that shared proxy caches do not store it, 2) implement an "Images Never Expire" policy to cache static images long-term, 3) use a cookie-free domain for static content, 4) rely on Apache's default revalidation behavior for CSS and JavaScript files, and 5) use random strings in URLs for accurate hit metering or very sensitive content.
HTTP Caching & Cache-Busting

for Content Publishers


Michael J. Radwin
O’Reilly Open Source Convention
July 28, 2004
Publishers must think about caching
• Publishers have a lot of web content
– HTML, images, Flash, movies
• Speed is an important part of the user experience
• Bandwidth is expensive
– Use what you need, but avoid unnecessary extra transfers
• Personalization differentiates
– Show timely data (stock quotes, news stories)
– Get accurate advertising statistics
– Protect sensitive info (e-mail, account balances)

HTTP Review
(1) Client connects to www.example.com port 80
(2) Client sends HTTP GET request
(3) Client reads HTTP response from server
(4) Client and Server close connection

[Diagram: Client ↔ Internet ↔ Server]

HTTP Example
mradwin@machshav:~$ telnet www.example.com 80
Trying 192.168.37.203...
Connected to w6.example.com.
Escape character is '^]'.
GET /foo/index.html HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Date: Wed, 28 Jul 2004 23:36:12 GMT
Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
Content-Length: 3688
Connection: close
Content-Type: text/html

<html><head>
<title>Hello World</title>
...

Browsers use private caches
Client → Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com

Server → Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul 2004 01:52:37 GMT
Content-Length: 3688
Content-Type: text/html

Client stores a copy of http://www.example.com/foo/index.html on its hard disk with a timestamp.

Revalidation (Conditional GET)
Client → Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com
If-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT

Server → Client:
HTTP/1.1 304 Not Modified

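The revalidation exchange above can be sketched from the server's side. This is a minimal illustration, not code from the talk: the function name conditional_get is hypothetical, and it assumes the server already knows the resource's modification time.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    Hypothetical helper for illustration; last_modified must be timezone-aware.
    """
    if if_modified_since is None:
        return 200  # plain GET: send the full response
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if client_time.tzinfo is None:
        # Treat a date without a zone as GMT
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200
```

With the timestamps from the slide, a request carrying If-Modified-Since: Fri, 23 Jul 2004 01:52:37 GMT for a file last modified at that same instant yields 304, so only headers cross the wire.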
Non-Caching Proxy
Client → Proxy:
GET /foo/index.html HTTP/1.1
Host: www.example.com

Proxy → Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com

Server → Proxy:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

Proxy → Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

Proxy Cache Miss
Client → Proxy:
GET /foo/index.html HTTP/1.1
Host: www.example.com

Proxy → Server (no cached copy yet):
GET /foo/index.html HTTP/1.1
Host: www.example.com

Server → Proxy (proxy stores a copy of the response):
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

Proxy → Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

Proxy Cache Hit
Client → Proxy:
GET /foo/index.html HTTP/1.1
Host: www.example.com

Proxy → Client (served from the proxy's cache; the server is not contacted):
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

Proxy Cache Revalidation Hit
Client → Proxy:
GET /foo/index.html HTTP/1.1
Host: www.example.com

Proxy → Server:
GET /foo/index.html HTTP/1.1
Host: www.example.com
If-Modified-Since: Fri, 23 Jul ...

Server → Proxy:
HTTP/1.1 304 Not Modified

Proxy → Client:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

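The three proxy outcomes above (miss, hit, revalidation hit) can be sketched as a toy in-memory simulation. Everything here is an assumption made for illustration: the class names OriginServer and CachingProxy and the fresh flag are hypothetical, and the origin compares If-Modified-Since by exact string match rather than by parsing dates.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Response:
    status: int
    last_modified: str = ""
    body: bytes = b""


class OriginServer:
    """Toy origin serving one document; counts the requests it receives."""

    def __init__(self, body: bytes, last_modified: str) -> None:
        self.body = body
        self.last_modified = last_modified
        self.requests = 0

    def get(self, path: str, if_modified_since: Optional[str] = None) -> Response:
        self.requests += 1
        # Simplification: exact string match instead of a date comparison
        if if_modified_since == self.last_modified:
            return Response(304)
        return Response(200, self.last_modified, self.body)


class CachingProxy:
    """Toy shared cache keyed by path."""

    def __init__(self, origin: OriginServer) -> None:
        self.origin = origin
        self.cache: Dict[str, Response] = {}

    def get(self, path: str, fresh: bool = False) -> Tuple[str, Response]:
        cached = self.cache.get(path)
        if cached is None:
            # Cache miss: fetch from the origin and keep a copy
            response = self.origin.get(path)
            self.cache[path] = response
            return "miss", response
        if fresh:
            # Cache hit: the stored copy is assumed still fresh
            return "hit", cached
        # Revalidate the stored copy with a conditional GET
        answer = self.origin.get(path, if_modified_since=cached.last_modified)
        if answer.status == 304:
            return "revalidation hit", cached
        self.cache[path] = answer
        return "miss", answer
```

In this sketch a revalidation hit still costs one round trip to the origin, but the body is served from the proxy's stored copy, which matches the 304 and 200 pair shown on the slide.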
Assumptions about content types
Rate of change once published:
Frequently        Occasionally       Rarely/Never
HTML              CSS, JavaScript    Images, Flash, PDF

Dynamic Content   <------------->    Static Content
Personalized      <------------->    Same for everyone

Top 5 techniques for publishers
1. Use “Cache-Control: private” for
personalized content
2. Implement “Images Never Expire” policy
3. Use a cookie-free TLD for static content
4. Use Apache defaults for CSS & JavaScript
5. Use random strings in URL for accurate
hit metering or very sensitive content

1. Use “Cache-Control: private”
for personalized content

Bad caching of personalized content
(1) Client 1 (Jane) requests a message through the proxy, which forwards the request to the webmail server:
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=jane

The webmail server returns Jane’s e-mail message to the proxy, and the proxy returns it to Client 1.

(2) The proxy stores a copy of Jane’s e-mail message as msg3.html.

(3) Client 2 (Mary) requests the same URL through the same proxy:
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=mary

The proxy answers from its stored msg3.html, so Mary receives Jane’s e-mail message.

What’s cacheable?
• HTTP/1.1 allows caching anything by default
– Unless explicit Cache-Control header
• In practice, most caches avoid anything with
– Cache-Control/Pragma header
– Cookie/Set-Cookie headers
– WWW-Authenticate/Authorization header
– POST/PUT method
– 302/307 status code

Cache-Control: private
• Shared caches bad for personalized content
– Mary shouldn’t be able to read Jane’s webmail
• Private caches perfectly OK
– Speed up web browsing experience
• Avoid personalization leakage with single
line in httpd.conf or .htaccess
    Header set Cache-Control private

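How a cache might act on the private directive can be sketched as below. This is a simplified illustration, not how any particular proxy is implemented: the function names are hypothetical, and the parser ignores quoted values that contain commas.

```python
from typing import Dict


def parse_cache_control(value: str) -> Dict[str, str]:
    """Split a Cache-Control header value into lowercase directives.

    Simplification: quoted arguments containing commas are not handled.
    """
    directives: Dict[str, str] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def may_store(cache_control: str, shared: bool) -> bool:
    """Decide whether a cache may keep a response (hypothetical helper)."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return False  # nobody may store it
    if shared and "private" in directives:
        return False  # only the user's own browser cache may store it
    return True
```

Under these assumptions, may_store("private", shared=True) is False for the proxy while may_store("private", shared=False) is True for the browser, which is the split the slide asks for.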
2. “Images Never Expire” policy

The “Images Never Expire” Policy
• Encourage caching of icons & logos
– Forever ≈ 10 years in Internet biz
• Must change URL when you change image
– http://us.yimg.com/i/new.gif
– http://us.yimg.com/i/new2.gif
• Tradeoff
– More difficult for designers
– Bandwidth savings, faster user experience

Images Never Expire (mod_expires)
# Works with both HTTP/1.0 and HTTP/1.1
ExpiresActive On
ExpiresByType image/gif A315360000
ExpiresByType image/jpeg A315360000
ExpiresByType image/png A315360000

Images Never Expire (mod_headers)
# Works with HTTP/1.1 only
<FilesMatch "\.(gif|jpe?g|png)$">
    Header set Cache-Control "max-age=315360000"
</FilesMatch>
# Works with both HTTP/1.0 and HTTP/1.1
<FilesMatch "\.(gif|jpe?g|png)$">
    Header set Expires "Mon, 28 Jul 2014 23:30:00 GMT"
</FilesMatch>

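The value 315360000 used in the configurations above is ten 365-day years expressed in seconds. The sketch below shows that arithmetic and how an application could compute both headers itself instead of hard-coding a date. The function name never_expire_headers is hypothetical, and now is assumed to be a UTC-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict

# 10 years x 365 days x 24 hours x 60 minutes x 60 seconds
TEN_YEARS = 10 * 365 * 24 * 60 * 60


def never_expire_headers(now: datetime) -> Dict[str, str]:
    """Build far-future caching headers (illustrative helper).

    now must carry tzinfo=timezone.utc, otherwise format_datetime raises.
    """
    expires = now + timedelta(seconds=TEN_YEARS)
    return {
        # Understood by HTTP/1.1 caches
        "Cache-Control": "max-age=%d" % TEN_YEARS,
        # Understood by HTTP/1.0 caches as well
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Because the constant ignores leap days, a response generated on 28 Jul 2004 gets an Expires date of 26 Jul 2014 rather than 28 Jul 2014; for a "never expire" policy the two-day difference should not matter.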
mod_images_never_expire
/* Enforce policy with module that runs at URI translation hook */
static int translate_imgexpire(request_rec *r) {
    const char *ext;
    if ((ext = strrchr(r->uri, '.')) != NULL) {
        if (strcasecmp(ext, ".gif") == 0 || strcasecmp(ext, ".jpg") == 0 ||
            strcasecmp(ext, ".png") == 0 || strcasecmp(ext, ".jpeg") == 0) {
            if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL ||
                ap_table_get(r->headers_in, "If-None-Match") != NULL) {
                /* Don't bother checking filesystem, just hand back a 304 */
                return HTTP_NOT_MODIFIED;
            }
        }
    }
    return DECLINED;
}

3. Cookie-free TLD for static content

Cookie-free TLD for static content
• For maximum efficiency use two domains
– www.example.com for HTML
– static.example.net for images
• Many proxies won’t cache requests that carry a Cookie header
– But: multimedia is never personalized
– Cookies would be ignored by the server anyway

Typical GET request w/Cookies
GET /i/foo/bar/quux.gif HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Cookie: U=mt=vtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEcA--&ux=IIr.AB&un=42vnticvufc8v;
brandflash=1; B=amfco1503sgp8&b=2; F=a=NC184LcsvfX96G.JR27qSjCHu7bII3s.
tXa44psMLliFtVoJB_m5wecWY_.7&b=K1It; LYC=l_v=2&l_lv=7&l_l=h03m8d50c8bo
&l_s=3yu2qxz5zvwquwwuzv22wrwr5t3w1zsr&l_lid=14rsb76&l_r=a8&l_um=1_0_1_0_0;
GTSessionID835990899023=83599089902340645635; Y=v=1&n=6eecgejj7012f
&l=h03m8d50c8bo/o&p=m012o33013000007&jb=16|47|&r=a8&lg=us&intl=us&np=1;
PROMO=SOURCE=fp5; YGCV=d=; T=z=iTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ-
&a=YAE&sk=DAAwRz5HlDUN2T&d=c2wBT0RBekFURXdPRFV3TWpFek5ETS0BYQFZQUUBb2sBWlcwLQF
0aXABWUhaTVBBAXp6AWlUdS5BQmdXQQ--&af=QUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE
FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ftc1ZFZUhBLS0-;
LYS=l_fh=0&l_vo=myla; PA=p0=dg13DX4Ndgk-&p1=6L5qmg--&e=xMv.AB;
YP.us=v=2&m=addr&d=1525+S+Robertson+Blvd%01Los+Angeles%01CA%0190035-
4231%014480%0134.051590%01-118.384342%019%01a%0190035
Referer: http://www.example.com/foo/bar.php?abc=123&def=456
Accept-Language: en-us,en;q=0.7,he;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

Same request, no Cookies
GET /i/foo/bar/quux.gif HTTP/1.1
Host: static.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Referer: http://www.example.com/foo/bar.php?abc=123&def=456
Accept-Language: en-us,en;q=0.7,he;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

• Added bonus: much smaller GET request
– Dial-up MTU size 576 bytes, PPPoE 1492
– 1450 bytes reduced to 550

4. Apache defaults for static,
occasionally-changing content

Revalidation works pretty well
• Revalidation is the default behavior for static content
– Browser sends If-Modified-Since request
– Server replies with short 304 Not Modified
– No fancy Apache config needed
• Use if you can’t predict when content will change
– Page designers can change content immediately
– No renaming necessary
• Cost: extra HTTP transaction for 304
– Small with Keep-Alive, but large sites disable Keep-Alive

Techniques to encourage caching
• Send explicit Cache-Control or Expires
• Generate “static content” headers
– Last-Modified, ETag
– Content-Length
• Avoid “cgi-bin”, “.cgi” or “?” in URLs
– Some proxies (e.g. Squid) won’t cache such URLs
– Use PATH_INFO instead

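For dynamically generated pages, the "static content" headers listed above have to be produced by the application. A minimal sketch follows, assuming the page body is available as bytes and its modification time is known as a UTC-aware datetime. The function name is hypothetical, and deriving the ETag from a content hash is an assumption of this sketch, not the scheme Apache uses for files.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def static_style_headers(body: bytes, modified: datetime) -> Dict[str, str]:
    """Headers that let caches store and revalidate a generated page."""
    # Assumption: a truncated content hash serves as the entity tag
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {
        "Last-Modified": format_datetime(modified, usegmt=True),
        "ETag": etag,
        "Content-Length": str(len(body)),
    }
```

The same body always produces the same ETag, so a cache that revalidates with If-None-Match can be answered with a 304 as long as the content is unchanged.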
5. Random URL strings for accurate
hit metering or very sensitive content

Accurate advertising statistics
• If you trust proxies
– Send Cache-Control: must-revalidate
– Count 304 Not Modified log entries as hits
• If you don’t
– Ask client to fetch uncacheable image URL
– Return 307 redirect to highly cacheable image file
– Count 307s as hits
– Don’t bother looking at the log of the cacheable image server

Hit-metering for advertisements (1)
<script type="text/javascript">
// Timestamp plus random number make every ad URL unique, so no cache can answer for it
var r = Math.random();
var t = new Date();
document.write("<img width='109' height='52' " +
  "src='http://ads.example.com/ad/foo/bar.gif?t=" +
  t.getTime() + ";r=" + r + "'>");
</script>
<noscript>
<img width="109" height="52" src="http://ads.example.com/ad/foo/bar.gif?js=0">
</noscript>

Hit-metering for advertisements (2)
GET /ad/foo/bar.gif?t=1090538707;r=0.510772917234983 HTTP/1.1
Host: ads.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
rv:1.7) Gecko/20040707 Firefox/0.8
Referer: http://www.example.com/foo/bar.php?abc=123&def=456
Cookie: uid=C50DF33E-E202-4206-B1F3-946AEDF9308B

HTTP/1.1 307 Temporary Redirect
Date: Wed, 28 Jul 2004 23:45:06 GMT
Cache-Control: max-age=0,no-cache,no-store
Expires: Tue, 11 Oct 1977 01:23:45 GMT
Pragma: no-cache
Location: http://static.example.net/i/foo/bar.gif
Content-Type: text/html

<a href="http://static.example.net/i/foo/bar.gif">Moved</a>

Hit-metering for advertisements (3)
GET /i/foo/bar.gif HTTP/1.1
Host: static.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
rv:1.7) Gecko/20040707 Firefox/0.8
Referer: http://www.example.com/foo/bar.php?abc=123&def=456

HTTP/1.1 200 OK
Date: Wed, 28 Jul 2004 23:45:07 GMT
Last-Modified: Mon, 05 Oct 1998 18:32:51 GMT
ETag: "69079e-ad91-40212cc8"
Cache-Control: public,max-age=315360000
Expires: Mon, 28 Jul 2014 23:45:07 GMT
Content-Length: 6096
Content-Type: image/gif

GIF89a...
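The server side of the three-step exchange above can be sketched as a single function: count the hit, then redirect to the cacheable copy. This is an illustration under stated assumptions. The function name meter_and_redirect and the in-memory Counter are hypothetical, and the mapping from /ad/... on the ad host to /i/... on static.example.net is inferred from the example URLs on the slides.

```python
from collections import Counter
from typing import Dict, Tuple

# Assumption: cacheable copies live under this prefix, as in the slide's URLs
STATIC_BASE = "http://static.example.net/i"

hits: Counter = Counter()  # impressions per ad resource


def meter_and_redirect(request_path: str) -> Tuple[int, Dict[str, str]]:
    """Count one impression, then redirect to the highly cacheable image."""
    resource = request_path.split("?", 1)[0]  # drop the cache-busting query
    if not resource.startswith("/ad/"):
        return 404, {}
    hits[resource] += 1
    location = STATIC_BASE + resource[len("/ad"):]
    headers = {
        # The redirect itself must never be cached, or hits go uncounted
        "Cache-Control": "max-age=0,no-cache,no-store",
        "Pragma": "no-cache",
        "Expires": "Tue, 11 Oct 1977 01:23:45 GMT",
        "Location": location,
    }
    return 307, headers
```

Only the small uncacheable redirect reaches the ad server on every view; the image bytes themselves can come from any cache along the way.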
Turning proxies into private caches
• Use distinct tokens in URL
– No two users use same token
– Defeats shared proxy caches
– Works well with private caches
• Doesn’t break the back button
• May break visited-link highlighting
– e.g. JavaScript timestamps/random numbers
– Every link is blue, no purple
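One way to put a distinct token in each user's URLs is sketched below. The query parameter name tok and both function names are hypothetical, and the sketch assumes the token is generated once per user or session and then reused, so the user's own browser cache keeps working.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def new_token() -> str:
    """Generate an unguessable per-user token (about 22 URL-safe characters)."""
    return secrets.token_urlsafe(16)


def add_user_token(url: str, token: str) -> str:
    """Append the user's token to the URL's query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("tok", token))  # "tok" is an illustrative parameter name
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Two users with different tokens never request the same URL, so a shared proxy cannot serve one user's page to the other, while a single user revisiting a page sends the same URL and can be answered from the private cache.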
Breaking the Back button
• When users click browser Back button
– Expect to go back one page instantly
– Private cache enables this behavior
• Aggressive cache-busting breaks Back button
– Server sends Pragma: no-cache or Expires in past
– Browser must re-visit server to re-fetch page
– Hitting network much slower than hitting disk
• Use very sparingly
– Compromising user experience is A Bad Thing

Review: Top 5 techniques
1. Use “Cache-Control: private” for
personalized content
2. Implement “Images Never Expire” policy
3. Use a cookie-free TLD for static content
4. Use Apache defaults for CSS & JavaScript
5. Use random strings in URL for accurate
hit metering or very sensitive content