Here at Port80, we're big advocates of what we call cache control policies. But frankly, it's proven to be something of an uphill battle. Amazingly to us, for example, only about a fifth of Fortune 1000 Web sites show any evidence of having proper cache control policies in place.
There are probably a lot of reasons for this but we remain convinced that two of the big ones are that:
1. Few Web professionals realize how much faster their sites would seem to end users if they took the trouble to write proper cache control policies; and
2. Even fewer realize how darn easy this is to do.
What follows is a small example of point number one (we'll come back to point number two in a subsequent post).
With Internet Explorer's default settings and a clear cache, make a request to the Port80 homepage using a good HTTP trace tool (we link to a few here) that lets you see what is going on at the protocol level, and gives you download times for individual objects. You'll get something like the following:
| Result | Time (s) | Size (bytes) | Type | URL-path |
|---|---|---|---|---|
| 200 | 0.931 | 50 | image/gif | /images/H_vertline |
This is a little vertical line image used for decoration on the page. It weighs 50 bytes and it took almost a full second to download, from the beginning of the request.
Okay, now let's pretend there is no cache control policy in place for this file. What happens to it, given IE's default caching behavior? The first thing to notice is that the vertical line image, as is normal for static files, had a Last-Modified header associated with it:
Last-Modified: Thu, 26 Sep 2002 00:07:24 GMT
This means that, if we close down the browser then come back later, when we hit that homepage again, IE isn't simply going to ask for the little vertical line image. Instead, it is going to ask for confirmation that the cached copy it has is still fresh. In IE's HTTP request headers, you would see something like this:
If-Modified-Since: Thu, 26 Sep 2002 00:07:24 GMT
And the server would respond, not by sending the file, but by saying, right in the HTTP response line, the first line of text it sends back to the client:
HTTP/1.1 304 Not Modified
meaning, "Go ahead and use the one you have, it hasn't changed on the server." So IE uses the cached copy, and the numbers look like this:
| Result | Time (s) | Size (bytes) | Type | URL-path |
|---|---|---|---|---|
| 304 | 0.220 | 50 | image/gif | /images/H_vertline |
So we saved about seven-tenths of a second (0.220 seconds instead of 0.931), because the image itself didn't have to be downloaded, only the lightweight message headers telling IE to use the cached copy. And you didn't have to do a thing. Pretty nice, huh?
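To make that exchange concrete, here is a minimal Python sketch of the decision a server makes when it sees an If-Modified-Since header. The function name `respond_to_conditional_get` is our own invention for illustration, and real servers handle more cases (ETags, malformed dates, and so on), so treat it as a simplification rather than how any particular server is written.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def respond_to_conditional_get(last_modified, if_modified_since=None):
    """Return the status line for a request to a static file.

    last_modified: timezone-aware datetime of the file on the server.
    if_modified_since: raw header value sent by the browser, or None.
    (Hypothetical helper, for illustration only.)
    """
    if if_modified_since is not None:
        try:
            client_copy_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_copy_time = None  # unparseable date: fall back to a full response
        if client_copy_time is not None:
            if client_copy_time.tzinfo is None:
                # Assumption: treat a date without a zone as GMT.
                client_copy_time = client_copy_time.replace(tzinfo=timezone.utc)
            if last_modified <= client_copy_time:
                # File unchanged since the browser cached it: headers only, no body.
                return "HTTP/1.1 304 Not Modified"
    # First visit, or the file changed: send the whole file.
    return "HTTP/1.1 200 OK"


file_time = datetime(2002, 9, 26, 0, 7, 24, tzinfo=timezone.utc)
print(respond_to_conditional_get(file_time))
print(respond_to_conditional_get(file_time, "Thu, 26 Sep 2002 00:07:24 GMT"))
```

The first call models the clear-cache request and the second models the return visit, where the browser echoes back the Last-Modified date it stored.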
But there's a problem here. 0.220 seconds may not seem like much, but now navigate around the site a bit and then, after a while, come back to the homepage. Notice how it popped right up, with no delay at all? Why is that?
The reason is this: Everything was already in the cache but, more importantly, nothing had to be revalidated. IE assumes, within the browser session, that the objects it has cached are still fresh. Hence the eye-popping performance -- no round trips to the server were required. On the return trip, by contrast, we wasted about a quarter second going back to the server to revalidate a tiny file that was obviously reusable -- and that was just the penalty for a single object.
Say that homepage happens to have well over 50 objects on it, many of them tiny ones that hardly ever change -- just like H_vertline.gif. Even with multiplexed and persistent TCP connections (even with HTTP pipelining), you'll never get this kind of instantaneous page load if, for each of those little objects, the browser has to go back to the server and ask, "Can I still use this?" And that is exactly what IE does, by default, on every return visit.
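A rough back-of-the-envelope calculation shows how that per-object penalty adds up. The Python sketch below is deliberately crude: it assumes every revalidation costs the same 0.220 seconds we measured for H_vertline, that the checks run in equal-sized batches across a fixed number of parallel connections, and that pipelining is not in play. The connection count of two is an assumed value for illustration, not something we measured.

```python
import math


def revalidation_overhead(num_objects, seconds_per_check, parallel_connections):
    """Estimate total time spent on 304 round trips for one page load.

    Crude model: objects are revalidated in rounds, one object per
    connection per round, and every check takes the same time.
    """
    if parallel_connections < 1:
        raise ValueError("need at least one connection")
    rounds = math.ceil(num_objects / parallel_connections)
    return rounds * seconds_per_check


# 50 objects at 0.220 s each, over an assumed 2 parallel connections.
print(round(revalidation_overhead(50, 0.220, 2), 2))
```

Under these assumptions the page spends about five and a half seconds just asking permission, before counting anything that actually has to be downloaded.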
Now imagine if you could get intra-session-like performance on return visits. That would pretty clearly be a superior type of caching, wouldn't it? Well, you can get performance like that -- if you have cache control policies. This is because, once you know what your cache control policies are, you can implement expiration-based caching, instead of relying on the Last-Modified mechanism. And expiration-based caching blows Last-Modified caching away when it comes to fast page loads.
For instance, if you have a cache control policy in place that says how long H_vertline.gif (or other files like it) should be cached, then the first response with that file in it could also contain Expires and Cache-Control headers that implement that policy.
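As a sketch of what those two headers might look like, the Python below builds them from a policy expressed as a number of seconds. The helper name `expiration_headers` and the one-day lifetime are hypothetical, chosen only for illustration; your own policy would set the real number.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiration_headers(now, max_age_seconds):
    """Build Expires and Cache-Control headers for a cacheable response.

    now: timezone-aware UTC datetime at which the response is sent.
    max_age_seconds: how long the policy says the file may be reused.
    (Hypothetical helper, for illustration only.)
    """
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        # Relative lifetime, understood by HTTP/1.1 caches.
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute expiry date in the standard HTTP date format.
        "Expires": format_datetime(expires_at, usegmt=True),
    }


sent_at = datetime(2002, 9, 26, 0, 7, 24, tzinfo=timezone.utc)
for name, value in expiration_headers(sent_at, 86400).items():
    print(f"{name}: {value}")
```

Sending both headers is a common belt-and-braces choice: Cache-Control carries the lifetime for HTTP/1.1 clients, while Expires states the same policy as a date for older ones.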
The result would be that IE, on return visits to the homepage, would know it could reuse this little vertical line image straight out of its own cache, instead of going all the way to the server to get permission. And that would mean an object load time too small to even bother measuring in thousandths of a second. (You can actually try this out on our homepage because, in the real world, the images on that page do use expiration-based cache control.)
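The browser's side of that bargain can be sketched in a few lines of Python. This is a simplified model with names of our own choosing, not a description of IE's actual internals: it only compares the age of the cached copy against the lifetime the server granted, and falls back to revalidation when no lifetime was given or it has run out.

```python
def browser_action(age_seconds, max_age_seconds=None):
    """Decide what to do with a cached object on a return visit.

    age_seconds: how long ago the object was stored in the cache.
    max_age_seconds: lifetime from the response's cache headers,
        or None if the server sent no expiration information.
    (Simplified, hypothetical model of a browser cache.)
    """
    if max_age_seconds is not None and age_seconds < max_age_seconds:
        # Still fresh: no round trip to the server at all.
        return "reuse cached copy"
    # No policy, or the copy has expired: ask the server first.
    return "revalidate with If-Modified-Since"


print(browser_action(3600, 86400))  # one hour old, one-day lifetime
print(browser_action(3600))         # no expiration headers were sent
```

The first case is the one that makes return visits feel like intra-session navigation; the second is the 304 round trip described earlier.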
Well, we hope that converted a few more of you. Next time we return to this topic, we'll cover how easy it is to write a good set of cache control policies.