Vectors & Interfaces
The networking specialist

Save Your Site from Spambots

Techniques to Prevent Address Scraping

By Steven Champeon

The problem: too much spam. Unsolicited advertising email continues to account for untold business losses each year. To give you an idea of the scope of the problem, in 1998 AOL reported that of the approximately 30 million email messages its servers handled each day, between 5 and 30 percent were spam. Assuming that this rate is true for other email providers as well, spam takes a significant economic toll on business, not merely in terms of Internet resources, but in lost employee productivity as well.

Sometimes, whether you receive bulk email is just the luck of the draw. Target addresses are often generated at random, or constructed from common usernames and domains. My own mail server is configured to forward any mail sent to my domain, regardless of address, straight to my account. Among the legitimate mail, I notice lots of spam for variations on hesketh.net (for example, ed@hesketh.net), even though there are very few real email addresses in that domain (which is just the Web hosting arm of my business).

There are many other ways in which real email addresses commonly fall into the hands of spammers. Any publicly available source of email addresses can be considered fuel for their activities. Usenet newsgroups and mailing lists have long been gold mines for spammers, who happily steal return addresses from posts.

One of the most popular sources of addresses for bulk mailings, however, is the Web. Software packages, known informally as "spambots," spider the Web collecting information in much the same way that search engines do. The difference is that spambots have but one purpose: to "scrape," or harvest, every email address they find on the pages they analyze, and add them to bulk email lists.
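To see what a harvester sees, it can help to run the same kind of scan over your own pages. What follows is a minimal sketch, not any particular spambot's method: the pattern is a deliberately simple assumption about what an address looks like, and the function name and sample page are made up for illustration.

```python
import re

# Deliberately simple pattern: a local part, "@", and a dotted domain.
# This is an illustrative assumption, not a complete address grammar.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def find_exposed_addresses(html):
    """Return the unique addresses found in a page's source,
    lowercased, in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(html):
        address = match.lower()
        if address not in seen:
            seen.append(address)
    return seen


# Hypothetical page fragment; example.com is a reserved example domain.
sample_page = """
<p>Questions? Write to
<a href="mailto:webmaster@example.com">webmaster@example.com</a>
or to Sales@Example.com.</p>
"""

print(find_exposed_addresses(sample_page))
```

Anything this scan turns up in your published HTML is, in all likelihood, just as visible to a spambot.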

Email addresses might be harvested from posts on public Web forums or message boards. Worse, they could be gathered from your own corporate Web site. Fortunately, if you're in charge of maintaining your company's Web servers, there are steps you can take to prevent this from happening.

Apache to the Rescue

Apache—based on the old NCSA httpd—is the world's most popular Web server. According to the current Netcraft Survey, Apache runs on more than 62 percent of the world's Web servers. With its mod_rewrite module, Apache presents an effective means of blocking spambots from harvesting your site's addresses.

To build Apache with support for mod_rewrite from scratch, download the latest source distribution for your system from an appropriate mirror of apache.org. The file install.sh, available online, includes all of the command line options you'll need for most Unix systems. For other operating systems, see the relevant documentation on the Apache site, or read the INSTALL documentation that comes with Apache.

If you're already running Apache, simply key in the following command (substituting the appropriate path to your existing Apache binary) to check whether your server installation already supports mod_rewrite:

/usr/local/apache/bin/httpd -l

The output will either show that your server supports Apache's runtime shared objects, in which modules are compiled separately and loaded as needed, or list the modules that were linked in during a static build. Examples of the different types of output you can expect are shown in modules.txt, online. If the output of this command includes mod_rewrite.c, then your Apache installation has what you need. Congratulations!
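If you manage several servers, the same check can be scripted. The sketch below assumes that the listing puts one module source file per line, and that shared-object support shows up as mod_so.c; the function names and the sample listing are illustrative, and the listing is not copied from modules.txt.

```python
import subprocess


def read_compiled_modules(httpd_path):
    """Run `httpd -l` and return its output as text.
    httpd_path is whatever path your Apache binary lives at."""
    result = subprocess.run(
        [httpd_path, "-l"], capture_output=True, text=True, check=True
    )
    return result.stdout


def rewrite_support(listing):
    """Classify a module listing:
    'static' if mod_rewrite.c is compiled in,
    'shared' if only shared-object support (mod_so.c) is present,
    'none' otherwise."""
    modules = {line.strip() for line in listing.splitlines()}
    if "mod_rewrite.c" in modules:
        return "static"
    if "mod_so.c" in modules:
        # Assumption: mod_rewrite may still be loadable at run time,
        # but the listing alone cannot confirm that.
        return "shared"
    return "none"


# Illustrative listing only; real output varies by version and build.
sample_listing = """Compiled-in modules:
  http_core.c
  mod_alias.c
  mod_rewrite.c
"""

print(rewrite_support(sample_listing))
```

On a live server you would pass the result of read_compiled_modules("/usr/local/apache/bin/httpd") to rewrite_support. A result of 'shared' means only that modules can be loaded at run time; whether mod_rewrite is among them depends on your configuration.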

Getting to Know mod_rewrite

Because it works in seemingly mysterious and powerful ways, mod_rewrite has sometimes been described as voodoo. In a nutshell, the mod_rewrite module lets you perform customized URL rewriting deep in the guts of the Apache process, based on any of the properties associated with an incoming request.
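The idea is less mysterious when modeled in ordinary code. The following is a conceptual sketch only: it is not mod_rewrite's configuration syntax or its internals. The Request and Rule types, the agent names in the pattern, and the target path are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Rule:
    header: str   # request property to inspect
    pattern: str  # regular expression that property must match
    target: str   # path to serve instead when the pattern matches


def rewrite(request, rules):
    """Return the path to serve: the target of the first rule whose
    pattern matches the named header, else the original path."""
    for rule in rules:
        value = request.headers.get(rule.header, "")
        if re.search(rule.pattern, value, re.IGNORECASE):
            return rule.target
    return request.path


# Hypothetical rule: agent names and target path are examples only.
RULES = [
    Rule("User-Agent",
         r"^(EmailSiphon|EmailWolf|ExtractorPro)",
         "/no-harvesting.html"),
]

bot = Request("/contact.html", {"User-Agent": "EmailSiphon"})
browser = Request("/contact.html", {"User-Agent": "Mozilla/4.0 (compatible)"})

print(rewrite(bot, RULES))
print(rewrite(browser, RULES))
```

The point of the model is that the decision depends on a property of the request (here, the User-Agent header) rather than on the requested URL alone.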

