Why You Should Care About Bots, and How to Deal with Them

Posted: July 10th, 2014
Filed under: IIS & HTTP


Has your site recently been bogged down by thousands of rapid requests from a distant land you do no business with? Have you been seeing spam in your form responses or comments? Are you seeing requests for pages that don’t exist on your site? If so, you may have bots.

Don’t worry, we all have bots. It’s a normal part of a site growing up. One day you’re launching, and the next day skimmers, scammers, and scrapers are scouring your site for information or holes they can poke through.

Bots are a ubiquitous part of the web universe at this point, flying through the pipes of the internet looking for prey. Normal as they may be, there is reason to be concerned with bots. One of the most recent reports from Incapsula puts bot traffic at 61% of all web traffic. That number is nothing to sneeze at, mostly because sneezing at things is rude, but also because it’s a very big number. While this is significant, there is still some debate around whether or not this traffic is visible in web analytics.

What do bots actually do?

“Are you a good bot, or a bad bot?”

Well Dorothy, not all bots are bad. In fact, there are some good bots that do things like crawl your site for search engines, or monitor RSS feeds. These bots are helpful and you’ll want to make sure that they don’t encounter any obstacles when crawling your site. Bad bots are primarily used for reconnaissance, but can pose various degrees of threat.

Lesser Threat

  • Email scrapers – These bots will scour sites for email addresses to harvest, which can lead to lots of spam or potentially harmful bait emails (i.e. phishing attacks, or malware).
  • Link spam – Ever see spam comments or in your submission form results? If these form fields allow links to be submitted, they can cause a lot of trouble. A link to a site with malware on it in your comments could endanger your users.

Greater Threat

  • Link spam (part II) – Imagine a scenario where someone with admin privileges clicks a link from a form submission, or in a spam comment. Now imagine that the link is to a site that installs a key logger on the admin’s machine. Next time the admin logs into your site or server, the credentials are captured, and all the protection you’ve put in place is void.

Automated Exploits

  • (Aimlessly) Search and destroy – These bots can potentially do a lot of harm, if they find a vulnerability on your site. While these bots are dangerous, they also operate without any real direction. Armed with a list of popular known exploits, these bots will crawl the web and throw these known exploit tricks at every site they encounter. If they come across a site with a hole, a hook will add that site to a queue for further exploitation.
  • Targeted Search and destroy – The same as above, but with a targeted list of sites to crawl.

What’s the end game?

Bad bots are a way for an outsider to own your server. Once the server is controlled, the bad guys can do a range of things with it:

  • Steal sensitive data stored there (personal info, credit card numbers, etc.)
  • Steal account passwords
  • Send malicious emails through it
  • Attack other sites/servers with it

Why Stop Bots?

An overwhelming amount of bot requests can bog down a site and cause it to run very slowly, just like how a large amount of legitimate traffic can eat up resources and slow down a site. This is a problem for a couple of reasons:

  1. Slow site = unresponsive pages = unhappy customers = lost sales
  2. Slow site = SEO hit (site speed is a factor in SEO ranking)

Prevent heavy resource usage costs

What adds insult to injury after a slowed site prevents sales? A huge bill from your hosting provider! Yes, all those extra requests and all those extra resources being used typically cost money.

Prevent data theft

Guess what else will cost you money: data theft! Of course, this can also hugely damage public perception and reputation – which are invaluable. Not to mention the fact it could mean other people’s information, money, and identities are put at risk.

Signs You Have Bots

There are a number of ways to spot bot traffic in your logs, but if you don’t know what to look for, you will likely never know you have a bot problem. Here are a few tell-tale signs of bots hitting your site:

  • Rapid requests – A normal user browsing a site won’t request 100 pages in a few seconds, as most internet users do not have super-human reading and clicking abilities.  However, bots do. And bots will make multiple requests per second by simply following links they find, or attempting to complete forms.
  • Lots of requests from the same IP address – all over the site – Aside from making a ton of requests in quick succession, bots can typically be spotted by a long trail of requests. No matter how interesting the content on your site, most real users won’t browse every page on the site – unless it happens to be a very small site. Bots will do this. Most real users will also usually be able to successfully submit a form on the first try or two – given they have an IQ higher than that of a lemming. However, a bot, which has no IQ, may not be able to do so. You may, in fact, see multiple failed attempts to submit a form, all from the same IP.
  • Requests without sessions – Real users browsing your site will normally accept cookies, bots often will not. Requests from IPs that don’t have sessions are likely bots.
  • Requests at odd times/locations – If you see requests at times or from locations that do not make sense for your business, then it could be a sign of bot traffic. For example, if you only do business in North America, but you see a number of requests from Eastern Europe in the middle of the night, then it’s definitely worth investigating.
  • Suspicious user-agents – A general way to spot suspicious user-agents is by looking for rare or infrequent user-agents that aren’t associated with a browser (or at least a well-known or popular one). Once you find them, take a look at their activity in your logs for anything suspicious. There are also lists of known dangerous bots that can be used for reference. Lastly, a simple Google search should indicate if they are known to be bad or not.
  • Bad data – You may be accustomed to seeing bad data (spam, empty) come through your forms, but be sure to look at it with a critical eye. Spam in your forms, or empty submissions can be dangerous.
  • Bad requests – Well-behaved users won’t typically type in directories in the address bar when navigating your site, it’s much easier to navigate by clicking links on the site. So, if you see a bunch of requests for URLs with a .asp extension on your all-PHP site, then you may have a bot poking around for a known vulnerability.

Stop Bots with ServerDefender VP

By now, you’re probably asking: How can I stop the bot uprising? Don’t worry, you won’t need John Connor for this mission. You can stop these bots much more easily, and without 3 sequels, mind you. Using ServerDefender VP, you can set up a bot policy in minutes and prevent the pests of the internet from causing you headaches.

1) Figure out your policy. A very strict bot policy will require sensitive security controls that have a very low tolerance for behavior that looks like bot behavior. This will keep bot traffic down, but could put you at risk of blocking good bots or legitimate traffic. Things to take into consideration here:

  • What does normal user behavior look like?
  • Keep in mind that if your site is very error prone (be honest with yourself here), that you may want to be

2) Launch ServerDefender VP’s settings manager and enter expert view. Under the Session Management tab, go to Bot Policy. Click the Configure button to launch the configuration panel.

 

3) Once you know how you plan to handle bots, you can jump into the configuration. Here’s a brief rundown of what each control does:

“Begin applying bot detection counters after ____ requests” – This tells ServerDefender VP when it should begin sniffing an IP for bot behavior. If you set this value to 1, ServerDefender VP will begin monitoring an IP’s requests for bot behavior after its first request. This essentially provides no leeway. You can provide just enough leeway for the good bots by easing up when SDVP begins looking to detect bots. Providing some leeway isn’t necessarily a bad thing, as bad bots are likely to make many many requests, not just a few.

“Maximum allowed errors per second” – As explained earlier, normal users don’t make hundreds of requests per second, and therefore they do not make hundreds of errors per second. Once the number of requests set in the previous control group is reached, then the max errors per second allowed configuration will kick in. This area will determine the strength of your bot policy, as this setting is really where you’ll trap your bots.

Setting this to a higher value provides some leeway for good bots to crawl your site without being penalized for errors. The lower you set this value, the more strict it is. Typically, setting this number to a single digit value should provide sufficient padding to prevent blocking users committing innocuous errors, while ensuring trouble making bots do not pass through.

 

“Percentage of requests allowed without referrer” & “percentage of errors allowed without referrer”– These are good to keep at 100%, as legitimate users sometimes do not even make requests with a referrer all of the time. Once the bot controls are in place, you can also configure the blacklist for user-agents. You can either add bad user-agents you’ve encountered in the past, or add them from a list of bad user-agents. There are plenty of articles and lists of bad-user agents which you can pull from, if you choose to do so.

 

Questions? Need Advice or Help?

We’re always glad to lend a hand. If you have any web app security questions or would like to try out ServerDefender VP for yourself, you can email us at support@port80software.com

 

Bonus Bot Info

NPR reported on the severity of the bot threat recently, bringing the conversation to the general public.

No Comments »

Leave a Reply