Spotting Unseen and Potentially Harmful Traffic

Posted: April 2nd, 2012
Filed under: IIS & HTTP

Startling Finds

Don’t think your site is receiving harmful traffic?  Incapsula recently put together a report on website traffic: how much of it is legitimate, how much is harmful, and how much is even human.  The data came “from a sample of one thousand website of Incapsula customers, with 50,000 to 100,000 monthly visitors,” their site says.

What did the report say?  51% of all website traffic was non-human.  Within that 51%, potentially harmful visitors, including hacking tools, scrapers, spies, and comment spammers, accounted for 31% of all traffic.

That means your site’s traffic may be less man than machine.

So wait, you’re telling me that of the 10,000 visitors Google Analytics told me I had last month, only 5,000 of them were actual people?

No.  The number Google Analytics gave you is a more or less accurate count of your human traffic; it simply doesn’t tell you about the hackers, spammers, scrapers, and spies that are also visiting your site.  These visitors go unnoticed and stay invisible unless you look for them.  But how can you prepare and secure your site if you don’t even know what’s actually happening on it?

A Problem of Analytical Proportions

Now you’re probably crying, “Why wouldn’t Google tell me these kinds of things?!”  Well, the short answer is: they can’t.  When a user goes to a page on your site, their browser asks your web server to send that specific page.  The web server returns the page to the browser with an embedded piece of JavaScript, which is the code that runs Google Analytics.  That code gathers information from the page opened in the browser and sends it to the databases at Google.  Once the information has been sent, you can go to your Google Analytics account to access the gathered data.

The problem is that non-human traffic, like scrapers and hacking tools, visits your site but doesn’t run the JavaScript that sends information from the browser to Google.  Since Google Analytics runs on servers separate from your site, that script has to send the information to the Google Analytics databases for a visit to be recorded.  No script, no record.
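
To make the gap concrete, here is a toy model in Python.  It is only a sketch of the idea, not how Google Analytics is actually implemented: the `Site` class, the `visit` method, and the `runs_javascript` flag are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy model: the web server logs every request, but the analytics
    service only hears from clients that run the embedded script."""
    server_log: list = field(default_factory=list)
    analytics_hits: list = field(default_factory=list)

    def visit(self, client: str, path: str, runs_javascript: bool) -> None:
        # The web server sees and logs the request no matter who sent it.
        self.server_log.append((client, path))
        # The tracking snippet only reports back if the client executes it.
        if runs_javascript:
            self.analytics_hits.append((client, path))


site = Site()
site.visit("browser", "/", runs_javascript=True)
site.visit("scraper", "/", runs_javascript=False)
site.visit("scraper", "/pricing", runs_javascript=False)

print("requests in server log:  ", len(site.server_log))
print("visits seen by analytics:", len(site.analytics_hits))
```

The server log ends up with every request, while the analytics side only knows about the one client that ran the script.  That difference is the traffic you never see in your reports.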

What Can You Do?

You can keep using Google Analytics, or your preferred analytics service, but there are measures you can take to make yourself more aware of what is happening on your site.

  • Check log files.  By manually combing through your server log files you can see all the traffic that has come to your site.  It may take a little bit of time and work, but spotting bots can be done fairly easily.  Things to look for:
    • Suspicious user agents. Visitors that never fetch page dependencies such as JavaScript files, or that never accept cookies, are a good sign of a bot.
    • Rapid movement. If you notice a visitor has viewed a lot of pages in a very short period of time (e.g. 1000 pages in 10 seconds), that’s a good sign of a bot.
    • Bad requests. If your site runs on ASP and you see requests for PHP pages, a guessing bot is probably making the illegitimate requests.
  • Check for robots.txt requests.  This is the file that bots look for and request.  By noting the number of requests for it, you get a rough idea of how many times bots were on your site (bots that ignore robots.txt won’t be counted this way).
  • It’s a Trap!  You can set up traps to catch bots by creating invisible links on your site, which human visitors won’t be able to see or find, but which bots crawling your site for links will follow.  Every time that link is hit, you will know that a bot visited your site.
  • ServerDefender VP can track and show you every request hitting your server.  Real-time logs and monitoring clearly display the names of malicious visitors.  Below are images of what bad traffic looks like versus legitimate traffic.
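
If you’d rather not read log files by eye, the log checks listed above can be roughly automated.  The Python sketch below is one way to do it, under a few assumptions: the log is in the W3C extended format that IIS writes (with a `#Fields:` line naming the columns, and spaces inside the user agent replaced by `+`), the trap link lives at a made-up path `/trap-link.html`, and the thresholds are arbitrary starting points.  The function names `parse_w3c` and `summarize` are hypothetical, and the sample log lines are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

TRAP_PATH = "/trap-link.html"    # hypothetical hidden link, use your own
FOREIGN_EXTENSIONS = (".php",)   # assumes an ASP site that serves no PHP


def parse_w3c(lines):
    """Yield one dict per request, keyed by the names on the #Fields: line."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            row = dict(zip(fields, line.split()))
            row["when"] = datetime.strptime(
                row["date"] + " " + row["time"], "%Y-%m-%d %H:%M:%S")
            yield row


def summarize(rows, max_hits=20, window=timedelta(seconds=10)):
    """Count bot-like behaviour per client IP (thresholds are assumptions)."""
    report = {"robots": Counter(), "trap": Counter(), "bad": Counter(),
              "no_agent": Counter(), "rapid": set()}
    times = defaultdict(list)
    for row in rows:
        ip, path = row["c-ip"], row["cs-uri-stem"].lower()
        if path == "/robots.txt":
            report["robots"][ip] += 1
        if path == TRAP_PATH:
            report["trap"][ip] += 1
        if path.endswith(FOREIGN_EXTENSIONS):
            report["bad"][ip] += 1
        if row.get("cs(User-Agent)", "-") == "-":  # "-" marks an empty field
            report["no_agent"][ip] += 1
        times[ip].append(row["when"])
    # Rapid movement: max_hits requests from one IP inside the time window.
    for ip, stamps in times.items():
        stamps.sort()
        for i in range(len(stamps) - max_hits + 1):
            if stamps[i + max_hits - 1] - stamps[i] <= window:
                report["rapid"].add(ip)
                break
    return report


# Invented sample lines, for illustration only.
sample = [
    "#Fields: date time c-ip cs-uri-stem cs(User-Agent)",
    "2012-04-02 10:00:00 203.0.113.5 /robots.txt ExampleBot/1.0",
    "2012-04-02 10:00:01 203.0.113.5 /wp-login.php ExampleBot/1.0",
    "2012-04-02 10:00:01 203.0.113.5 /trap-link.html ExampleBot/1.0",
    "2012-04-02 10:00:02 198.51.100.7 /default.aspx Mozilla/5.0+(Windows+NT+6.1)",
]
report = summarize(parse_w3c(sample), max_hits=3)
print(report)
```

To run it against a real log, pass an open file instead of `sample`, for example `summarize(parse_w3c(open("your-log-file.log")))`.  Treat the output as a list of leads to investigate rather than proof: a busy office behind a single IP address can look “rapid” too, so tune `max_hits` and `window` to your own traffic.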