

Blocking Harmful Bots From Your Site

By: Chris Richardson
2006-06-20

As you delve further into the all-encompassing realm of being a webmaster, you come to understand the nature of bots and how they can benefit or harm your site. Normally, when bots are discussed, it's in reference to search engine bots that crawl and index whatever pages they come across... unless they are blocked by the webmaster.

Keeping The Bad Bots Out

More often than not, webmasters want search bots to visit as often as they like (provided they aren't using up all of the site's bandwidth) because frequent crawling suggests the search engines are paying attention to the site. However, not every bot that visits a website is benign. In fact, some crawl sites for the express purpose of spamming or content theft.

With this in mind, WebProWorld moderator Webnauts posted a comprehensive list of the bots he was going to block in his site's robots.txt file. Google's definition search describes the file this way: "The robots exclusion standard or robots.txt protocol is a convention to prevent well-behaved web spiders and other web robots from accessing all or part of a website. The information specifying the parts that should not be accessed is specified in a file called robots.txt in the top-level directory of the website."
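To make the definition concrete, here is a minimal Python sketch of how a well-behaved crawler might consult robots.txt before fetching a page. The file contents and the bot names "BadBot" and "GoodBot" are hypothetical and are not taken from Webnauts' list; the parsing uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one named bot out of the whole site and
# keep every other crawler away from a single directory.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before every fetch.
print(parser.can_fetch("BadBot", "/index.html"))           # False
print(parser.can_fetch("GoodBot", "/index.html"))          # True
print(parser.can_fetch("GoodBot", "/private/notes.html"))  # False
```

Note that nothing here is enforced by the server: the check only runs inside crawlers that choose to perform it.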

As you can see, the robots.txt file can be quite powerful when it comes to bot management. But, as the discussion thread pointed out, what happens when malicious bots simply ignore a site's robots.txt? What actions can you take to protect your site when bots bypass this first line of defense? Some of the responses shed light here:

Andilinks - Continually parsing that huge robots.txt file will cause you more harm than the bad bots, which will ignore the file anyway. But not all bad bots shed their user-agents; many are simply being run by irresponsible kids who lack the brains or initiative to be truly evil.

So here's the Apache mod_rewrite code for blocking by user agent; substitute your bad user-agent for "FunWebProducts". This goes in the .htaccess file in your www directory. Test access to your site after making any change to the .htaccess file; the Apache server is very unforgiving of errors in this file.

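Code:
RewriteEngine on
# Flag any request whose User-Agent header contains "FunWebProducts"
SetEnvIf User-Agent "FunWebProducts" bad_bot=1
# Refuse requests that carry the bad_bot flag
Deny from env=bad_bot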
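The user-agent check above amounts to a case-sensitive substring match against the User-Agent header. As a rough sketch of that logic, and of its main weakness, the following Python mimics the test. The second blocklist entry, "ExampleScraper", and the sample header strings are hypothetical and chosen only for illustration.

```python
import re

# "FunWebProducts" comes from the post above; "ExampleScraper" is a
# made-up second entry showing how a list of tokens could be combined.
BAD_AGENT_TOKENS = ["FunWebProducts", "ExampleScraper"]

# One unanchored, case-sensitive pattern that matches any token.
BAD_AGENT_RE = re.compile("|".join(re.escape(t) for t in BAD_AGENT_TOKENS))


def is_bad_bot(user_agent: str) -> bool:
    """Return True when the User-Agent contains a blocklisted token."""
    return BAD_AGENT_RE.search(user_agent) is not None


flagged = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts)"
spoofed = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(is_bad_bot(flagged))  # True: the token appears in the header
print(is_bad_bot(spoofed))  # False: a bot faking a browser header slips through
```

The second result is the limitation the forum posters describe: the check only sees what the client chooses to send.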


Someone else offers another idea to block spam bots:

solecist - I had to put an IP blocking scheme in my blog's comment folder to block some spambots - it's a PITA - but you have to have several layers to get them...
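The post does not say how the IP blocking was implemented. As one possible sketch, a comment handler could compare the visitor's address against a list of blocked addresses and ranges. The addresses below are reserved documentation ranges, not real spam sources, and the function name is hypothetical.

```python
import ipaddress

# Hypothetical blocklist: one single address and one whole /24 range,
# both taken from address blocks reserved for documentation.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.15/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(remote_addr: str) -> bool:
    """Return True when the visitor's IP falls inside a blocked network."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("192.0.2.15"))     # True: exact address match
print(is_blocked("198.51.100.77"))  # True: inside the blocked /24 range
print(is_blocked("203.0.113.9"))    # False: not on the list
```

A check like this works independently of the User-Agent header, which is why it can serve as one of the "several layers" the poster mentions.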

However, another poster has these words of warning:

carlos_p - You'll be better off if you apply these rules to your server configuration instead of the robots.txt, thus completely barring the entry to these bots. For instance, you can accomplish this on an Apache server through the RewriteEngine directives.

Still, this ain't gonna keep'em all away because a really mean bot might shed its skin and fake the UserAgent anyway...


As you can see, the battle against malicious bots can be a hard one to win. That should not stop you from fighting it, however, especially if you are concerned about click fraud, comment spam, content theft, and the litany of other harmful actions these bots are capable of.

Even though the robots.txt approach may not be the most effective means of blocking harmful bots, Webnauts did provide a long list of bots that can and have harmed other sites. If you are thinking about implementing some sort of spam bot block, you should definitely use this post as a resource.






About the Author:
Chris is a staff writer for iEntry, focusing on the search industry.

