Principles and Techniques of Cost-Effective Web Site Acceleration

By Thomas A. Powell and Joe Lima

This three-part article outlines a common sense, cost-effective approach to Web site acceleration according to the two simple laws of Web performance:

  • Send as little data as possible
  • Send it as infrequently as possible

If used properly, these basic principles should result in:

  • Faster Web page loads
  • Reduction of server usage
  • Improved bandwidth utilization

All of these techniques should not only improve user satisfaction with a site or Web-based application, but also save money on site delivery costs.

The principles presented in this article will not only be applied to developer-accessible Web page source including (X)HTML, CSS, and JavaScript, but will also address Web server configuration and modifications. Some suggestions may touch on structural site changes or modification to server-side programming environments, but the primary focus will be on relatively easy changes that can be made to existing sites.

The techniques derived from the two aforementioned principles fall into three major categories, one for each part of this article.

Part I: 20 Tips for Client-Side Code Optimization

Let's begin by taking a look at client-side code optimization -- the easiest and generally cheapest to implement of the three site acceleration techniques.

Code for Yourself, Compile for Delivery

Any application programmer knows that there are good reasons why the code one works with is not the code one should deliver. It is best to comment source code extensively, to format it for maximum readability, and to avoid overly terse, but convoluted syntax that makes maintenance difficult. Later, one translates that source code using a compiler into some other form that is optimized for performance and protected from reverse engineering. This model can be applied to Web development as well. To do so, you would take the "source" version of your site and prepare it for delivery by "crunching" it down through simple techniques like white space reduction, image and script optimization, and file renaming. You would then take your delivery-ready site and post it.

Now hopefully this isn't too foreign a concept, since you are likely already at least working on a copy of your site, rather than posting changes directly to the live site. If not, please stop reading right now and make a copy of your site, as this is the only proper way to develop, regardless of whether the site is a static brochure or a complex, CMS-driven application. If you don't believe us now, you surely will some day in the very near future if you ruin some of your site files and can't easily recover them.

As you build your site, you are probably focusing on the biggest culprits in slow site downloads: images and binary files like Flash. While reducing the colors in GIF files, compressing JPEGs, and optimizing SWF files will certainly help a great deal, there are still plenty of other areas for improvement. Remembering the first rule of Web performance, we should always strive to send as few bytes as possible, regardless of whether the file is markup, image, or script. Now it might seem like wasted effort to focus on shaving bytes here and there in (X)HTML, CSS, or JavaScript. However, this may be precisely where the greatest attention ought to be paid.

During a typical Web page fetch, an (X)HTML document is the first to be delivered to a browser. We can dub this the host document since it determines the relationships to all other files. Once received, the browser begins to parse the markup, and in doing so, often initiates a number of requests for dependent objects such as external scripts, linked style sheets, images, embedded Flash, and so on. These CSS and JavaScript files may, in turn, host additional calls for related image or script files. The faster these requests for dependent files get queued up, the faster they will get back to the browser and start rendering in the page. Given the importance of the host document, it would seem critical to get it delivered to the browser and parsed as quickly as possible since, despite constituting a relatively small percentage of the overall page weight, it can dramatically impede the loading of the page. Remember: users don't measure bytes, they measure time!

So what specifically do you need to do to fully prep your site for optimal delivery? The basic approach involves reducing white space, crunching CSS and JavaScript, renaming files, and similar strategies for making the delivered code as terse as possible (See Google for an example). These general techniques are well known and documented on the Web and in books like Andy King's Speed up Your Site: Website Optimization. In this article we present what we consider to be the top twenty markup and code optimization techniques. You can certainly perform some of these optimizations by hand, find some Web editors and utilities that perform a few of the features for you, or roll your own crunching utilities. We do also point you to a tool developed at Port80 Software, called the w3compiler. This tool is the only one on the market today that provides a reference implementation for nearly all the optimizing features described here and that serves as a legitimate example of the "real world" value of code optimization. Now on with the tips!

Markup Optimization

Typical markup is either very tight, hand-crafted and standards-focused, filled with comments and formatting white space, or it is bulky, editor-generated markup with excessive indenting, editor-specific comments often used as control structures, and even redundant or needless markup or code. Neither case is optimal for delivery. The following tips are safe and easy ways to decrease file size:

Questionable Markup Optimization Techniques

While the first five techniques can result in significant savings on the order of 10 to 15 percent, many tools and developers looking for maximum delivery compression employ some questionable techniques, including:

  • Quote removal on attributes
  • Doctype statement elimination
  • Optional close tag removal
  • Tag substitution like <strong> to <b>

While it is true that most browsers will make sense of whatever "tag soup" they are handed, reasonable developers will not rely on this and will instead always attempt to deliver standards-compliant markup. Generally speaking, the problems associated with bypassing standards (for example, diminished portability and interoperability) outweigh the small gains in speed, and, in the case of missing closing tags, there may even be a performance penalty at page rendering time. While sites like Google have consciously employed many of these techniques on their homepage markup, you probably don't need to go that far, and we suggest that you avoid them unless you have extreme performance requirements.

1. Remove white space wherever possible
In general, multiple white space characters (spaces, tabs, newlines) can safely be eliminated, but of course avoid changing <pre>, <textarea>, and tags affected by the white-space CSS property.
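
As a rough illustration, white space reduction with protected regions might be sketched in JavaScript as follows. The function name collapseWhiteSpace is hypothetical, and the sketch makes two assumptions beyond the tip itself: it also leaves <script> and <style> blocks alone (collapsing newlines there can break code), and it cannot detect elements whose white space matters only because of the white-space CSS property.

```javascript
// Hypothetical helper: collapse each run of white space to a single space,
// but pass <pre>, <textarea>, <script>, and <style> blocks through untouched.
function collapseWhiteSpace(html) {
  // First alternative matches a whole protected block; second matches a
  // run of white space anywhere else in the document.
  const pattern = /<(pre|textarea|script|style)\b[\s\S]*?<\/\1\s*>|\s+/gi;
  return html.replace(pattern, (match) =>
    match.startsWith("<") ? match : " "
  );
}

const page = "<p>Hello   \n\t world</p>\n<pre>a\n  b</pre>";
console.log(collapseWhiteSpace(page));
```

A regular expression is not an HTML parser, so a production tool would want to tokenize the markup properly; the point here is only that protected regions have to be skipped.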

2. Remove comments
Almost all comments, save for client-side conditional comments for IE and doctype statements, can be safely removed.
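
One possible way to strip comments while keeping IE conditional comments is sketched below. The name stripHtmlComments is hypothetical, and the sketch assumes that anything inside <script> or <style> (including masking comments) should be left for the script and style optimizers to handle.

```javascript
// Hypothetical helper: remove ordinary markup comments, keep conditional
// comments, and leave <script>/<style> blocks (and doctypes) untouched.
function stripHtmlComments(html) {
  const pattern = /<(script|style)\b[\s\S]*?<\/\1\s*>|<!--[\s\S]*?-->/gi;
  return html.replace(pattern, (match) => {
    if (!match.startsWith("<!--")) return match; // a script or style block
    // Matches both "<!--[if ...]>" and "<!--<![endif]-->" forms.
    const conditional = /^<!--(\[if\b|\s*<!\[endif\])/i.test(match);
    return conditional ? match : "";
  });
}

console.log(stripHtmlComments("<!-- build 42 --><p>Hi</p>"));
```

Because a doctype begins with <! rather than <!--, it never matches the comment pattern and is preserved without special handling.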

3. Remap color values to their smallest forms
Rather than using all hex values or all color names, use whichever form is shortest in each particular case. For example, a color attribute value like #ff0000 could be replaced with red, while lightgoldenrodyellow would become #FAFAD2.
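
The "pick whichever form is shortest" rule can be sketched as a small lookup. The table below is deliberately tiny and the names NAMED_COLORS and shortestColor are hypothetical; a real tool would carry the full list of named colors.

```javascript
// A few named colors and their hex equivalents (illustrative subset only).
const NAMED_COLORS = new Map([
  ["red", "#ff0000"],
  ["tan", "#d2b48c"],
  ["white", "#ffffff"],
  ["black", "#000000"],
  ["lightgoldenrodyellow", "#fafad2"],
]);

// Return the shortest equivalent spelling of a color value.
function shortestColor(value) {
  const v = value.toLowerCase();
  const candidates = [v];
  if (NAMED_COLORS.has(v)) candidates.push(NAMED_COLORS.get(v));
  for (const [name, hex] of NAMED_COLORS) {
    if (hex === v) candidates.push(name);
  }
  // Keep the first candidate on ties so known-good input is left alone.
  return candidates.reduce((best, c) => (c.length < best.length ? c : best));
}

console.log(shortestColor("#FF0000"));              // red
console.log(shortestColor("lightgoldenrodyellow")); // #fafad2
```

Color values are case-insensitive, so the sketch normalizes to lower case before comparing lengths.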

4. Remap character entities to their smallest forms
As with color substitution, you can substitute a numeric entity for a longer alpha-oriented entity. For example, &Egrave; would become &#200;. Occasionally, this works in reverse as well: &#240; saves a byte if referenced as &eth;. However, this is not quite as safe to do, and the savings are limited.

5. Remove useless tags
Some "junk" markup, such as tags applied multiple times or certain tags used as advertisements for editors, can safely be eliminated from documents.

CSS Optimizations

CSS is also ripe for simple optimizations. In fact, most CSS created today tends to shrink much more than (X)HTML when optimized. The following techniques are all safe, except for the final one, the complexities of which demonstrate the extent to which client-side Web technologies can be intertwined.

6. Remove CSS white space
As is the case with (X)HTML, CSS is not terribly sensitive to white space, and thus its removal is a good way to significantly reduce the size of both CSS files and <style> blocks.

7. Remove CSS comments
Just like markup comments, CSS comments should be removed, as they provide no value to the typical end user. However, a CSS masking comment in a <style> tag probably should not be removed if you are concerned about down-level browsers.
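
Tips 6 and 7 together might look something like the following sketch. The name minifyCss is hypothetical, and the approach assumes simple style sheets: it does not understand quoted strings (such as content values), and it would alter a selector that relies on a space before a colon.

```javascript
// Hypothetical helper: strip /* */ comments and needless white space.
function minifyCss(css) {
  return css
    .replace(/\/\*[\s\S]*?\*\//g, "")     // remove CSS comments
    .replace(/\s+/g, " ")                 // collapse white space runs
    .replace(/\s*([{}:;,])\s*/g, "$1")    // no spaces around punctuation
    .trim();
}

const source = "p {\n  color: red; /* brand color */\n  margin: 0 auto;\n}";
console.log(minifyCss(source)); // p{color:red;margin:0 auto;}
```

An HTML-style masking comment inside a <style> block is not a /* */ comment, so this sketch leaves it in place.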

8. Remap colors in CSS to their smallest forms
As in HTML, CSS colors can be remapped between word and hex formats, whichever is shorter. However, the advantage gained by doing this in CSS is slightly greater. The main reason for this is that CSS supports three-digit hex color values like #fff for white.
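
The three-digit form only applies when each pair of hex digits repeats, which is easy to check. A minimal sketch, with shortenHex as a hypothetical name, operating on a single color value rather than a whole style sheet (so that id selectors such as #aabbcc are never touched by mistake):

```javascript
// Hypothetical helper: #rrggbb becomes #rgb when each digit pair repeats.
function shortenHex(color) {
  const paired = /^#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3$/i;
  return color.replace(paired, "#$1$2$3");
}

console.log(shortenHex("#ffffff")); // #fff
console.log(shortenHex("#fafad2")); // #fafad2 (pairs do not repeat)
```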

9. Combine, reduce, and remove CSS rules
CSS properties like font-size, font-weight, and so on can often be expressed in a shorthand notation using the single property font. When employed properly, this technique allows you to take something like

p {font-size: 36pt; font-family: Arial; line-height: 48pt; font-weight: bold;}

and rewrite it as

p{font:bold 36pt/48pt Arial;}

You also may find that some rules in style sheets can be significantly reduced or even completely eliminated if inheritance is used properly. So far, there are no automatic rule-reduction tools available, so CSS wizards will have to hand-tweak for these extra savings. However, the upcoming 2.0 release of the w3compiler will include this feature.

10. Rename class and id values
The most dangerous optimization that can be performed on CSS is to rename class or id values. Consider a rule like

.superSpecial {color: red; font-size: 36pt;}

It might seem appropriate to rename the class to sS. You might also take an id rule like

#firstParagraph {background-color: yellow;}

and use #fp in place of #firstParagraph, changing the appropriate id values throughout the document. Of course, in doing this you start to run into the problem of markup-style-script dependency: If a tag has an id value, it is possible that this value is used not only for a style sheet, but also as a script reference, or even as a link destination. If you modify this value, you need to make very sure that you modify all related script and link references as well. These may even be located in other files, so be careful.

Changing class values is not quite as dangerous, since experience shows that most JavaScript developers tend not to manipulate class values as often as they do id values. However, class name reduction ultimately suffers from the same problem as id reduction, so again, be careful.

Note: You should probably never remap name attributes, particularly on form fields, since these values are also operated on by server-side programs that would have to be altered as well. Though not impossible, calculating such dependencies would be difficult in many Web site environments.
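
Before renaming an id by hand, it can help to list every file that mentions it. The sketch below is one way to do that; findIdReferences and the file names are hypothetical, and the three patterns are heuristics that can miss references built dynamically in script.

```javascript
// Hypothetical helper: report which files appear to reference an id value.
// `files` maps a file name to that file's text content.
function findIdReferences(id, files) {
  const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const patterns = [
    new RegExp("\\bid\\s*=\\s*[\"']?" + escaped + "\\b"), // markup attribute
    new RegExp("#" + escaped + "\\b"),                    // CSS selector or link target
    new RegExp("[\"']" + escaped + "[\"']"),              // string literal in script
  ];
  return Object.keys(files).filter((name) =>
    patterns.some((pattern) => pattern.test(files[name]))
  );
}

const site = {
  "index.html": '<p id="firstParagraph">Hi</p><a href="#firstParagraph">Top</a>',
  "site.css": "#firstParagraph {background-color: yellow;}",
  "menu.js": 'document.getElementById("firstParagraph").focus();',
  "other.js": "var x = 1;",
};
console.log(findIdReferences("firstParagraph", site));
```

Every file in the result would need the same substitution applied for the rename to be safe.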

JavaScript Optimization

More and more sites rely on JavaScript to provide navigational menus, form validation, and a variety of other useful things. Not surprisingly, much of this code is quite bulky and begs for optimization. Many of the techniques for JavaScript optimization are similar to those used for markup and CSS. However, JavaScript optimization must be performed far more carefully because, if it is done improperly, the result is not just a visual distortion, but potentially a broken page! We start with the most obvious and easiest improvements and then move on to ones that require greater care.

11. Remove JavaScript comments
Except for the <!-- //--> masking comment, all JavaScript comments indicated by // or /* */ can safely be removed, as they offer no value to end users (except for the ones who want to understand how your script works).

12. Remove white space in JavaScript
Interestingly, white space removal in JavaScript is not nearly as beneficial as it might seem. On the one hand, code like

x = x + 1;

can obviously be reduced to

x=x+1;

However, because of the common sloppy coding practice of JavaScript developers failing to terminate lines with semi-colons, white space reduction can cause problems. For example, given the legal JavaScript below which uses implied semi-colons

x=x+1
y=y+1

a simple white space remover might produce

x=x+1y=y+1

which would obviously throw an error. If you add in the needed semi-colons to produce

x=x+1;y=y+1;

you actually gain nothing in byte count. We still encourage this transformation, however, since Web developers who provided feedback on the Beta version of the w3compiler found the "visually compressed" script more satisfying (perhaps as visual confirmation that they are looking at transformed rather than original code). They also liked the side benefit of delivering more obfuscated code.
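
The semi-colon repair described above can be sketched as follows. This is deliberately naive: joinStatements is a hypothetical name, and the code assumes every non-empty line is one complete, simple statement (no blocks, no statements continued across lines, no comments). Real JavaScript needs a proper parser to decide where implied semi-colons fall.

```javascript
// Hypothetical helper: join one-statement-per-line code onto a single line,
// adding a terminating semi-colon wherever a line lacks one.
function joinStatements(source) {
  return source
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => (line.endsWith(";") ? line : line + ";"))
    .join("");
}

console.log(joinStatements("x=x+1\ny=y+1")); // x=x+1;y=y+1;
```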

13. Perform code optimizations
Simple ideas like removing implied semi-colons, var statements in certain cases, or empty return statements can help to further reduce some script code. Shorthand can also be employed in a number of situations, for example

x=x+1;

can become

x++;

However, be careful, as it is quite easy to break your code unless your optimizations are very conservative.

The Obfuscation Side Effect of JavaScript Optimization

You'll notice that, if you apply these various JavaScript optimizations, the source code becomes effectively unreadable or, some might even say, obfuscated. While it is true that the reverse engineering of optimized JavaScript can be difficult, it is far from impossible. Real obfuscation would use variables like O1l1l1O0l1 and Ol11l001l, so that unraveling the code would be more confusing. Some may even go so far as to employ light encryption on the page. Be aware that, in general, obfuscation and optimization can be at odds with each other, to the point that more obfuscated code may be larger than the original code. Fortunately, lightweight code obfuscation is generally enough to deter casual code thieves, while still offering performance improvements.

14. Rename user-defined variables and function names
For good readability, any script should use variables like sumTotal instead of s. However, for download speed, the lengthy variable sumTotal is a liability and it provides no user value, so s is a much better choice. Here again, writing your source code in a readable fashion and then using a tool to prepare it for delivery shows its value, since remapping all user defined variable and function names to short one- and two-letter identifiers can produce significant savings.
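
Generating the short replacement names is the easy part of this tip; a sketch is shown below. The names shortNames and buildRenameMap are hypothetical, and the sketch assumes you already have a list of user-defined identifiers. It skips the two-letter reserved words do, if, and in, but it does not check for collisions with globals or names used by other scripts, which a real tool must do.

```javascript
// Two-letter JavaScript reserved words that must never be used as names.
const RESERVED = new Set(["do", "if", "in"]);

// Yield a, b, ..., z, then aa, ab, ..., zz.
function* shortNames() {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  for (const a of letters) yield a;
  for (const a of letters) for (const b of letters) yield a + b;
}

// Hypothetical helper: map each long identifier to the next free short name.
function buildRenameMap(identifiers) {
  const names = shortNames();
  const map = new Map();
  for (const id of identifiers) {
    let next = names.next().value;
    while (RESERVED.has(next)) next = names.next().value;
    if (next === undefined) throw new Error("ran out of short names");
    map.set(id, next);
  }
  return map;
}

console.log(buildRenameMap(["sumTotal", "taxRate"]));
```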

15. Remap built-in objects
The bulkiness of JavaScript code, beyond long user variable names, comes from the use of built-in objects like Window, Document, Navigator and so on. For example, given code like

alert(window.navigator.appName);
alert(window.navigator.appVersion);
alert(window.navigator.userAgent);

you could rewrite it as

w=window;n=w.navigator;a=alert;
a(n.appName);
a(n.appVersion);
a(n.userAgent);

This type of remapping is quite valuable when objects are used repeatedly, which they generally are. Note however, that if the window or navigator object were used only once, these substitutions would actually make the code bigger, so be careful if you are optimizing by hand. Fortunately, many JavaScript code optimizers will take this into account automatically.
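
The break-even point is simple arithmetic: the alias costs one assignment up front and then saves the length difference on every use. A small sketch, with aliasSavings as a hypothetical name:

```javascript
// Hypothetical helper: bytes saved (negative means bytes added) by
// aliasing `expression` as `alias` when it is used `uses` times.
function aliasSavings(expression, alias, uses) {
  const original = expression.length * uses;
  const assignment = (alias + "=" + expression + ";").length;
  const aliased = assignment + alias.length * uses;
  return original - aliased;
}

console.log(aliasSavings("window.navigator", "n", 3)); // 26
console.log(aliasSavings("window.navigator", "n", 1)); // -4
```

With three uses the alias saves 26 bytes; with a single use it adds 4, which is why an optimizer needs to count references before substituting.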

This tip brings up a related issue regarding the performance of scripts with remapped objects: in addition to the benefit of size reduction, such remappings actually slightly improve script execution times because the objects are copied higher up into JavaScript's scope chain. This technique has been used for years by developers who write JavaScript games, and while it does improve both download and execution performance, it does so at the expense of local browser memory usage.

File-Related Optimization

The last set of optimization techniques is related to file and site organization. Some of the optimizations mentioned here might require server modifications or site restructuring.

16. Rename non-user accessed dependent files and directories
Sites will often have file names such as SubHeaderAbout.gif or rollover.js for dependent objects that are never accessed by a user via the URL. Very often, these are kept in a standard directory like /images, so you may see markup like

<img src="/images/SubHeaderAbout.gif">

or worse

<img src="/portals/0/../../images/SubHeaderAbout.gif">

Given that these files will never be accessed directly, this readability provides no value to the user, only the developer. For delivery's sake it would make more sense to use markup like

<img src="/images/a.gif">

While manual file-and-directory remapping can be an intensive process, some content management systems can deploy content to target names, including shortened values. Furthermore, the w3compiler has a feature that automatically copies and sets up these dependencies. If used properly, this can result in very noticeable savings in the (X)HTML files that reference these objects and can also make reworking of stolen site markup much more difficult.

17. Shorten all page URLs using a URL rewriter
Notice that the previous step does not suggest renaming the host files like products.html, which would change markup like

<a href="products.html">Products</a>

to something like

<a href="p.html">Products</a>

The main reason is that end users will see a URL like http://www.sitename.com/p.html, rather than the infinitely more usable http://www.sitename.com/products.html.

However, it is possible to reap the benefits of file name reduction in your source code without sacrificing meaningful page URLs if you combine the renaming technique with a change to your Web server's configuration. For example, you could substitute p.html for products.html in your source code, but then set up a URL rewriting rule to be used by a server filter like mod_rewrite to expand the URL back into a user friendly value. Note that this trick will only put the new URL in the user's address bar if the rewrite rule employs an "external" redirect, thereby forcing the browser to re-request the page. In this case, the files themselves are not renamed as the short identifiers are only used in the source code URLs.

Because of the reliance on URL rewriting and the lack of widespread developer access to, and understanding of, such server-side tools as mod_rewrite, even an advanced tool like the w3compiler does not currently promote this technique. However, considering that sites like Yahoo! actively employ this technique for significant savings, it should not be ignored, as it does produce noticeable (X)HTML reduction when extremely descriptive directory and file names are used in a site.

18. Remove or reduce file extensions
Interestingly, there really is little value to including file extensions such as .gif, .jpg, .js, and so on. The browser does not rely on these values to render a page; rather it uses the MIME type header in the response. Knowing this, we might take

<img src="/portals/0/images/SubHeaderAbout.gif">

and shorten it to

<img src="/portals/0/images/SubHeaderAbout">

Or, if combined with file renaming, you might have

<img src="/0/sA">

Don't be scared away by how strange this technique looks at first, as your actual file will still be sA.gif. It is just the end user who won't see it that way!

In order to take advantage of this more advanced technique, however, you do need to make modifications to your server. The main thing you will have to do is to enable something called "content negotiation," which may be native to your server or require an extension such as mod_negotiation for Apache or Port80's PageXchanger for IIS. The downside to this is that it may cause a slight performance hit on your server. However, the benefits of adding content negotiation far outweigh the costs. Clean URLs improve both security and portability of your sites, and even allow for adaptive content delivery whereby you can send different image types or languages to users based upon their browser's capabilities or system preferences! See "Towards Next Generation URLs" by the same authors for more information.

Note: Extension-less URLs will not hurt your search engine ranking. Port80 Software, as well as major sites like the W3C, use this technique and have suffered no ill effects.

19. Restructure <script> and <style> inclusions for optimal number of requests
You will often see in the <head> of an HTML document markup like

<script src="/scripts/rollovers.js"></script>
<script src="/scripts/validation.js"></script>
<script src="/scripts/tracking.js"></script>

In most cases, this should have been reduced to

<script src="/0/g.js"></script>

where g.js contains all the globally used functions. While the break-up of the script files into three pieces makes sense for maintainability, for delivery it does not. The single script download is far more efficient than three separate requests, and it even reduces the amount of needed markup. Interestingly, this approach mimics the concept of linking in a traditional programming language compiler.
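
The "linking" step can be as simple as concatenating the source files in order at build time. A minimal sketch follows; bundleScripts is a hypothetical name, and the function takes the file contents as strings so that how they are read from disk is left to your build process.

```javascript
// Hypothetical helper: combine several scripts into one deliverable file.
// Each piece gets a terminating semi-colon and its own line so that one
// file's last statement (or trailing // comment) cannot run into the next.
function bundleScripts(sources) {
  return sources
    .map((source) => source.trim())
    .filter((source) => source.length > 0)
    .map((source) => (source.endsWith(";") ? source : source + ";"))
    .join("\n");
}

const bundle = bundleScripts(["function a(){}", "var b=1;", "  "]);
console.log(bundle);
```

The order of the input array matters whenever one script depends on definitions from another, just as the order of the original <script> tags did.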

20. Consider cacheability at the code level
One of the most important improvements to site performance that can be made is to improve cacheability. Web developers may be very familiar with using the <meta> tag to set cache control, but (apart from the fact that meta has no effect on proxy caches) the true value of cacheability is found in its application to dependent objects such as images and scripts. To prepare your site for improved caching, you should consider segmenting your dependent objects according to frequency of change, storing your more cacheable items in a directory like /cache or /images/cache. Once you start organizing your site this way, it will be very easy to add cache control rules that will make your site clearly "pop" for users who are frequent visitors.
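
To show why the directory split pays off, here is a sketch of the kind of rule it enables: one check on the URL path decides the caching policy. The name cacheControlFor is hypothetical, and the one-year lifetime and the no-cache default are assumptions chosen for illustration; in practice such rules usually live in the Web server configuration rather than in script.

```javascript
// Directories assumed to hold rarely changing dependent objects.
const LONG_LIVED_PREFIXES = ["/cache/", "/images/cache/"];

// Hypothetical helper: choose a Cache-Control header value from the path.
function cacheControlFor(path) {
  const longLived = LONG_LIVED_PREFIXES.some((prefix) =>
    path.startsWith(prefix)
  );
  // 31536000 seconds is one year; other responses must be revalidated.
  return longLived ? "public, max-age=31536000" : "no-cache";
}

console.log(cacheControlFor("/images/cache/a.gif"));
console.log(cacheControlFor("/products.html"));
```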

So you now have 20 useful code optimization tips to make your site faster. One by one, they may not seem very powerful, but taken together you will see an obvious improvement in site delivery. Next, we will focus primarily on caching, explaining how it is generally misused and how you can significantly improve performance with just a few simple changes.
