[200 OK]: A Port80 Software Blog

We're all 200 OK: Web, HTTP and IIS Insights
Cache Control Policies: Why Bother?


Here at Port80, we're big advocates of what we call cache control policies.  But frankly, promoting them has proven to be something of an uphill battle.  To our amazement, for example, only about a fifth of Fortune 1000 Web sites show any evidence of having proper cache control policies in place.

There are probably a lot of reasons for this but we remain convinced that two of the big ones are that:

1. Few Web professionals realize how much faster their sites would seem to end users if they took the trouble to write proper cache control policies; and

2. Even fewer realize how darn easy this is to do.

What follows is a small example of point number one (we'll come back to point number two in a subsequent post).

With Internet Explorer's default settings and a clear cache, make a request to the Port80 homepage using a good HTTP trace tool (we link to a few here) that lets you see what is going on at the protocol level, and gives you download times for individual objects.  You'll get something like the following:

Result  Time (s)  Size (bytes)  Type       URL-path
200     0.931     50            image/gif  /images/H_vertline

This is a little vertical line image used for decoration on the page.  It weighs 50 bytes and it took almost a full second to download, from the beginning of the request.

Okay, now let's pretend there is no cache control policy in place for this file.  What happens to it, given IE's default caching behavior?  The first thing to notice is that the vertical line image, as is normal for static files, had a Last-Modified header associated with it:

Last-Modified: Thu, 26 Sep 2002 00:07:24 GMT

This means that if we close the browser and come back to the homepage later, IE isn't simply going to ask for the little vertical line image.  Instead, it is going to ask for confirmation that the cached copy it has is still fresh.  In IE's HTTP request headers, you would see something like this:

If-Modified-Since: Thu, 26 Sep 2002 00:07:24 GMT

And the server would respond, not by sending the file, but by saying, right in the HTTP response line, the first line of text it sends back to the client:

HTTP/1.1 304 Not Modified

meaning, "Go ahead and use the one you have, it hasn't changed on the server."  So IE uses the cached copy, with the resulting kind of numbers:

Result  Time (s)  Size (bytes)  Type       URL-path
304     0.220     50            image/gif  /images/H_vertline

So we saved about seven-tenths of a second of download time (0.931 seconds down to 0.220), because the image itself didn't have to be downloaded, only the lightweight message headers telling IE to use the cached copy.  And you didn't have to do a thing.  Pretty nice, huh?
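The exchange above can be sketched in a few lines of Python.  This is only an illustration of the server's side of the conversation, not how IIS or any real server implements it: the `handle_get` function and the 50-byte stand-in body are hypothetical, while the Last-Modified date is the one from the trace.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Last-Modified date from the trace above.
LAST_MODIFIED = datetime(2002, 9, 26, 0, 7, 24, tzinfo=timezone.utc)
# Hypothetical stand-in for the 50-byte GIF.
BODY = b"\x00" * 50


def handle_get(request_headers):
    """Hypothetical handler: returns (status_line, response_headers, body)."""
    headers = {"Last-Modified": format_datetime(LAST_MODIFIED, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            cached_copy_date = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            cached_copy_date = None  # unparseable date: send the full file
        if cached_copy_date is not None:
            if cached_copy_date.tzinfo is None:
                # Treat a date without a usable zone as GMT.
                cached_copy_date = cached_copy_date.replace(tzinfo=timezone.utc)
            if LAST_MODIFIED <= cached_copy_date:
                # Unchanged since the browser cached it: headers only, no body.
                return "HTTP/1.1 304 Not Modified", headers, b""
    # First visit (or the file changed): send the whole file.
    return "HTTP/1.1 200 OK", headers, BODY
```

Calling `handle_get({})` models the first visit and returns the full 50-byte body; calling it with the If-Modified-Since header shown above returns the 304 status line and an empty body.  Either way, the browser still pays for a round trip to the server.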

But there's a problem here.  0.220 seconds may not seem like much, but now navigate around the site a bit and then, after a while, come back to the homepage.  Notice how it popped right up, with no delay at all?  Why is that?

The reason is this:  Everything was already in the cache but, more importantly, nothing had to be revalidated.  Within a browser session, IE assumes that the objects it has cached are still fresh.  Hence the eye-popping performance -- no round trips to the server were required.  On the return visit in a new session, by contrast, we wasted about a quarter second going back to the server to revalidate a tiny file that was obviously reusable -- and that was just the penalty for a single object.

Say that homepage happens to have well over 50 objects on it, many of them tiny ones that hardly ever change -- just like H_vertline.gif.  Even with parallel, persistent connections (and even with HTTP pipelining), you'll never get this kind of instantaneous page load if, for each of those little objects, the browser has to go back to the server and ask, "Can I still use this?"  And that is exactly what IE does, by default, on every return visit.
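To put a rough number on that, here is a back-of-the-envelope sketch in Python.  It assumes every revalidation costs the 0.220 seconds measured above, that the browser opens two connections to the server (an assumption about IE's default for HTTP/1.1), and that requests are not pipelined.  The `revalidation_delay` function is hypothetical and ignores everything else that affects page load time.

```python
import math


def revalidation_delay(objects, round_trip_s, connections):
    """Rough estimate: objects are revalidated in batches, one per connection."""
    batches = math.ceil(objects / connections)
    return batches * round_trip_s


# 50 objects, 0.220 s per 304 response, 2 parallel connections (assumed).
estimate = revalidation_delay(objects=50, round_trip_s=0.220, connections=2)
print(f"about {estimate:.1f} seconds spent asking 'Can I still use this?'")
```

Under those assumptions the estimate comes to 25 batches of 0.220 seconds, or roughly five and a half seconds, none of which transfers a single byte of content.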

Now imagine if you could get intra-session-like performance on return visits.  That would pretty clearly be a superior type of caching, wouldn't it?  Well, you can get performance like that -- if you have cache control policies.  This is because, once you know what your cache control policies are, you can implement expiration-based caching, instead of relying on the Last-Modified mechanism.  And expiration-based caching blows Last-Modified caching away when it comes to fast page loads.

For instance, if you have a cache control policy in place that says how long H_vertline.gif (or other files like it) should be cached, then the first response with that file in it could also contain Expires and Cache-Control headers that implement that policy.

The result would be that IE, on return visits to the homepage, would know it could reuse this little vertical line image straight out of its own cache, instead of going all the way to the server to get permission.  And that would mean an object load time too small to even bother measuring in thousandths of a second.  (You can actually try this out on our homepage because, in the real world, the images on that page do use expiration-based cache control.)
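As a minimal sketch of what expiration-based caching involves, the Python below builds the two headers for a given policy and shows the freshness check a browser can then make without contacting the server.  The 30-day lifetime, the timestamps, and the function names `caching_headers` and `is_fresh` are all hypothetical; the header names and the `max-age` directive are standard HTTP.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(now, max_age_s):
    """Response headers implementing a 'cache for max_age_s seconds' policy."""
    return {
        "Cache-Control": f"max-age={max_age_s}",
        "Expires": format_datetime(now + timedelta(seconds=max_age_s), usegmt=True),
    }


def is_fresh(stored_at, max_age_s, now):
    """Browser-side check: can the cached copy be reused with no round trip?"""
    return (now - stored_at).total_seconds() < max_age_s


THIRTY_DAYS = 30 * 24 * 3600  # hypothetical policy for rarely changing images
first_visit = datetime(2005, 2, 9, 12, 31, tzinfo=timezone.utc)

print(caching_headers(first_visit, THIRTY_DAYS))
print(is_fresh(first_visit, THIRTY_DAYS, first_visit + timedelta(days=7)))
```

A return visit a week later finds the copy still fresh, so no request is sent at all; only after the 30 days have passed does the browser fall back to asking the server.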

Well, we hope that converted a few more of you.  Next time we return to this topic, we'll cover how easy it is to write a good set of cache control policies.
 

posted on Wednesday, February 09, 2005 12:31 PM

Feedback

# re: Cache Control Policies: Why Bother?

Nice stuff, but how come you don't use them on your website? Look at the logo at the beginning of this page...
3/1/2005 7:16 AM | Just passing by

# re: Cache Control Policies: Why Bother?

It is a case of the shoe maker's daughter not getting the best shoes ALL the time, even though she could...

Thanks for the reminder:

http://www.port80software.com/tools/cachecheck.asp?url=http://www.port80software.com/200ok

Best,
Chris @ Port80

3/1/2005 10:56 AM | Chris @ Port80

# re: Cache Control Policies: Why Bother?

Obviously, we should be careful about the objects we cache, but how can we remedy mistakes? How can you refresh an object once it's been cached? Perhaps you made a mistake and have to replace a gif file.

Let's say an object is cached for 72 hours and a user accesses your page. If you update the object right after the user caches it, the user won't see the new object for roughly another 72 hours.
3/1/2005 11:19 AM | Mike

# re: Cache Control Policies: Why Bother?

In the case of a mistake like you mention, the fix is to do what a Content Management System does: rename the object with a new name or version number.
3/1/2005 11:38 AM | no way jose

# re: Cache Control Policies: Why Bother?

A-ha! I guess the moral of the story is to be very careful that you get it right the first time.

I can't afford a CMS, so in my case I would potentially have to go and modify all of my HTML files. For example, if I messed up my company's "logo.gif" file and had to rename it to something like "logo1.gif", then I would have to modify all of the HTML files that referenced "logo.gif".

But after reading the article, I believe the benefits of setting cache control policies are well worth the effort.
3/1/2005 12:34 PM | Mike
