Leech Bandits on the Loose!

Posted: June 22nd, 2009
Filed under: IIS & HTTP

And What You Can Do to Stop Them

What is Leeching?

Leeching. Inline linking. Hotlinking. Bandwidth theft. Sometimes it is even called direct linking or confused with deep linking. Whatever the term, if a third-party site requests your files and presents them on its own pages, you are paying for the bandwidth while they use your content as their own.

How Does this Bandwidth Theft Occur?

In a classic example of content leeching, an online ad (e.g. an eBay listing) is created with an unauthorized image served from a manufacturer’s site. This arrangement is good for the eBay user, but bad for the manufacturer’s bandwidth bill – they serve the image but get no benefit from the transfer.

Another common image leeching faux pas is bloggers linking directly to images on another site instead of uploading copies to their own server or host.

Not only is this a strain on your bandwidth; in many cases it is an infringement of your copyright as well. It’s a safe bet that if they are using your bandwidth, they are using your images without your permission too.

How Do You Stop Leeching?

If you are working in an Apache server environment, one way to block unauthorized access to your files is to add the following to your .htaccess file:

RewriteEngine On
# Allow empty Referers (direct requests, privacy proxies that strip the header)
RewriteCond %{HTTP_REFERER} !^$
# Allow our own domain and its subdomains
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?example\.com/ [NC]
# Don't rewrite the placeholder itself, or this rule would loop
RewriteCond %{REQUEST_URI} !/images/nohotlink\.jpeg$
RewriteRule \.(jpe?g|gif|bmp|png)$ /images/nohotlink.jpeg [NC,L]

The basic idea here is that we intercept requests for image files and serve the “nohotlink” placeholder whenever the Referer header does not point to our own domain, in this case “example.com.” The “!^$” condition lets requests with an empty Referer pass, since some browsers and privacy proxies strip the header entirely. This is a simple check, but interestingly it knocks down most bandwidth thieves.

It is possible to check other headers as well. For example, we might require that a User-Agent header be present and deny requests without one in similar fashion. We might expand the scheme to match particular User-Agent values, but such lists can be quite troublesome to keep up to date.
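As a rough sketch of what such a header check might look like in application logic (the function name and the tiny denylist below are purely illustrative, not any particular product’s rules):

```python
# Illustrative denylist of User-Agent substrings; as noted above,
# real lists like this are troublesome to keep current.
DENY_SUBSTRINGS = ("wget", "curl", "libwww")

def allow_request(headers):
    """Pass only requests that carry a plausible User-Agent header.

    `headers` is assumed to be a dict with lowercase header names.
    """
    ua = headers.get("user-agent", "").strip().lower()
    if not ua:
        return False  # no User-Agent at all: deny outright
    return not any(s in ua for s in DENY_SUBSTRINGS)
```

The presence check alone catches the laziest bots; the substring denylist is the part that demands ongoing maintenance.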

Microsoft IIS doesn’t have URL rewriting technology built in, but there are solutions out there, including Microsoft’s URL Rewrite module for IIS 7 and later. However, rewriting alone doesn’t properly address the leeching problem, which is why the broader rules found in our LinkDeny product should be employed.

Taking it a Step Further

LinkDeny can block using simple checks on header values, particularly the obvious Referer and User-Agent headers. LinkDeny can also enforce that HTTP requests are well formed in their minimal set of headers (such as the Host header required by HTTP/1.1), their style, and so forth. Basically, the idea is that if people want to use bots to steal content, they have to get the protocol details right, or we will swat the request.
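The “swat malformed requests” idea can be sketched as a minimal validator; the function and the exact policy below are hypothetical assumptions, not LinkDeny’s actual rules:

```python
def validate_request(version, headers):
    """Return True only if the request meets minimal HTTP requirements.

    `headers` is assumed to be a dict with lowercase header names.
    """
    if version == "HTTP/1.1" and "host" not in headers:
        return False  # HTTP/1.1 (RFC 2616) requires a Host header
    if "user-agent" not in headers:
        return False  # policy choice: real browsers always send one
    return True
```

A careless bot author who hand-rolls requests and forgets Host fails this check even though any real browser passes it.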

We can certainly go a bit further and limit who may connect at all. For example, when we spot someone abusing the site and stealing content, we can simply add an IP block. Or we may think in reverse and disallow everyone except, say, a few trusted IP partners. Using GeoIP resolution, we can even extend this thinking to whole geographic regions when we see continued abuse coming from particular locales. We can add such checks into a server module or into application logic.
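A minimal sketch of such allow/deny logic, using Python’s standard ipaddress module; the networks listed are placeholders, and a GeoIP variant would simply map the client address to a country code first and check that instead:

```python
import ipaddress

# Hypothetical lists; a real deployment would load these from config.
DENY_NETS = [ipaddress.ip_network("203.0.113.0/24")]  # observed abusers
ALLOW_NETS = []  # non-empty switches to "trusted partners only" mode

def ip_allowed(client_ip):
    """Deny listed abusers; optionally allow only trusted partners."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in DENY_NETS):
        return False
    if ALLOW_NETS:
        return any(addr in net for net in ALLOW_NETS)
    return True
```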

However, the wily content thief will do their best to imitate real user-agents and bounce off proxy servers to avoid detection. We have to assume these thieves will not be swatted down by simple checks, so we limit our exposure by limiting access. If we serve content at stable locations like /books/manual.pdf or /images/logo.gif, it is quite easy for the unscrupulous to automate a bot, or even script a real browser, to fetch those objects. We can up the ante, however, by regularly changing the object’s location or the credentials required to access it.

First, let’s consider the use of a cookie. We could issue cookies on our site and require that a user present one to fetch dependent images. This alone will knock down many bots, which don’t accept cookies. Our real aim, though, is defeating those who know all these things and use browser automation; they will happily accept cookies and record a script to fetch the desired object. We can foil these folks by making the cookie short lived and reissuing allowed cookies very often. Now their script may work for a few minutes, but it fails later, and if they re-record, it will fail again as soon as the cookie rotates.
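One way to sketch such a short-lived cookie is an HMAC-signed timestamp; everything here (the secret, the lifetime, the cookie format) is an illustrative assumption rather than any product’s actual scheme:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # assumption: kept private on the server
TTL = 300                       # cookie lifetime in seconds

def issue_cookie(now=None):
    """Return a '<timestamp>.<signature>' cookie value."""
    ts = str(int(now if now is not None else time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}.{sig}"

def cookie_valid(cookie, now=None):
    """Accept only an untampered cookie issued within the last TTL seconds."""
    now = now if now is not None else time.time()
    try:
        ts, sig = cookie.split(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    return (now - int(ts)) <= TTL
```

Because the timestamp is signed, a scripted client can’t just mint its own fresh cookie; it has to keep coming back to a page that issues one, which is exactly the behavior we want to force.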

Obviously this would seem defeatable simply by visiting the site properly when you want the object, and that’s OK. Remember, the point of anti-leeching isn’t to lock out legitimate browser access; it is to lock out those who don’t play by the rules.

We can take time limitation and make it really hard to deal with by changing not only the cookie but the very URLs of the dependent objects over time. Thus logo.gif might be logo123fdf2345.gif one moment and logo1235234ss.gif the next. Our pages obviously have to employ some sort of token replacement system to emit the current names, but this makes scripting access to the objects quite annoying.
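A time-bucketed token scheme along these lines might look like the following sketch, where the secret, the window length, and the naming convention are all assumptions for illustration:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # assumption: private to the server
WINDOW = 600                    # rotate names every ten minutes

def tokenized_name(path, now=None):
    """Map e.g. 'logo.gif' to 'logo-<token>.gif' for the current window."""
    bucket = int((now if now is not None else time.time()) // WINDOW)
    digest = hmac.new(SECRET, f"{path}:{bucket}".encode(), hashlib.sha256)
    token = digest.hexdigest()[:10]
    stem, _, ext = path.rpartition(".")
    return f"{stem}-{token}.{ext}"

def token_current(name, original_path, now=None):
    """True only if 'name' carries this window's token for the path."""
    return name == tokenized_name(original_path, now)
```

A real deployment would typically also honor the previous window’s name for a short grace period, so pages rendered just before a rotation don’t break for legitimate visitors.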


All of these anti-leeching schemes can be defeated, just as all locks, online and off, can be broken. Ultimately you get into the game of anti-automation and injecting CAPTCHAs for people to fill out. These too can be defeated, of course, but that doesn’t mean we should just give up. If that is your attitude, we suggest you stop locking your house or car, because frankly those employ easily breakable security mechanisms as well!

Whether you roll your own anti-leeching feature or use a product like LinkDeny, protecting your site content from leeches will save you bandwidth and money. You won’t be able to stop everyone from trying to steal your content, but you can make a sizeable dent in just how much they do steal!

~ Port80
