Save Your Site from Spambots
Techniques to Prevent Address Scraping
By Steven Champeon
The problem: too much spam. Unsolicited advertising email continues to
account for untold business losses each year. To give you an idea of the scope
of the problem, in 1998 AOL reported that of the approximately 30 million email
messages its servers handled each day, between 5 and 30 percent were spam.
Assuming that this rate is true for other email providers as well, spam takes a
significant economic toll on business, not merely in terms of Internet
resources, but in lost employee productivity as well.
Sometimes, whether you receive bulk email is just the luck of the draw.
Target addresses are often generated at random, or constructed from common
usernames and domains. My own mail server is configured to forward any mail sent
to my domain, regardless of address, straight to my account. Among the
legitimate mail, I notice lots of spam sent to guessed addresses at hesketh.net (for
example, ed@hesketh.net), even though there are very few real email addresses in
that domain (which is just the Web hosting arm of my business).
There are many other ways in which real email addresses commonly fall into
the hands of spammers. Any publicly available source of email addresses can be
considered fuel for their activities. Usenet newsgroups and mailing lists have
long been gold mines for spammers, who happily steal return addresses from
posts.
One of the most popular sources of addresses for bulk mailings, however, is
the Web. Software packages, known informally as "spambots," spider the
Web collecting information in much the same way that search engines do. The
difference is that spambots have but one purpose: to "scrape," or
harvest, every email address they find on the pages they analyze and add it
to bulk email lists.
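To get a feel for how little effort harvesting takes, and to audit your own pages for exposed addresses, consider the following minimal Python sketch. It is not taken from any actual spambot; the pattern, the function name find_exposed_addresses, and the sample page are all assumptions made for illustration, and the pattern is deliberately simpler than real address syntax.

```python
import re

# Deliberately simple pattern (an assumption for this sketch); real address
# syntax, and real harvesters, are more varied than this.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_exposed_addresses(html: str) -> list:
    """Return the unique addresses found in a page, in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(html):
        address = match.lower()  # treat Sales@ and sales@ as the same address
        if address not in seen:
            seen.append(address)
    return seen


# Hypothetical page content, using the reserved example.com domain.
SAMPLE_PAGE = """
<p>Contact <a href="mailto:webmaster@example.com">the webmaster</a>
or write to Sales@example.com.</p>
"""

if __name__ == "__main__":
    for address in find_exposed_addresses(SAMPLE_PAGE):
        print(address)
```

Running a check like this against your own site's HTML shows roughly what a simple harvester would walk away with.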
Email addresses might be harvested from posts on public Web forums or message
boards. Or, worse, they could be gathered from your own corporate Web site.
Fortunately, if you're in charge of maintaining your company's Web servers,
there are steps you can take to prevent this from happening.
Apache to the Rescue
Apache—based on the old NCSA httpd—is the world's most popular Web
server. According to the current Netcraft Survey, Apache runs on more than 62
percent of the world's Web servers. With its mod_rewrite module, Apache presents
an effective means of blocking spambots from harvesting your site's addresses.
To build Apache with support for mod_rewrite from scratch, download the
latest source distribution for your system from an appropriate mirror of
apache.org. The file install.sh, available online, includes all of the command
line options you'll need for most Unix systems. For other operating systems, see
the relevant documentation on the Apache site, or read the INSTALL documentation
that comes with Apache.
If you're already running Apache, simply key in the following command
(substituting the appropriate path to your existing Apache binary) to check
whether your server installation already supports mod_rewrite:
/usr/local/apache/bin/httpd -l
The output will either show you that you have support for Apache's runtime shared
objects, where modules are compiled and then loaded as needed, or else list the
modules that were linked during a static build. Examples of the different types
of output you can expect are shown in modules.txt, online. If the output of this
command includes mod_rewrite.c, then your Apache installation has what you need.
Congratulations!
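If you manage several servers, you may prefer to script this check. The Python sketch below reads the module listing and reports what it finds. It assumes that each compiled-in module appears on its own line ending in .c, and that a build with shared-object support lists mod_so.c. The function names and the sample listing are hypothetical, so compare them against the output of your own installation.

```python
import subprocess


def compiled_in_modules(listing: str) -> set:
    """Collect module source names (lines ending in .c) from `httpd -l` output."""
    return {
        line.strip()
        for line in listing.splitlines()
        if line.strip().endswith(".c")
    }


def describe_rewrite_support(listing: str) -> str:
    """Classify a listing as 'static', 'maybe-shared', or 'missing'."""
    modules = compiled_in_modules(listing)
    if "mod_rewrite.c" in modules:
        return "static"        # mod_rewrite was linked into the binary
    if "mod_so.c" in modules:
        return "maybe-shared"  # modules load at runtime; check your configuration
    return "missing"


def listing_from_binary(path: str) -> str:
    """Run `<path> -l` and return its output; the path is whatever you pass in."""
    result = subprocess.run([path, "-l"], capture_output=True, text=True)
    return result.stdout


# Illustrative listing only, not copied from a real server.
SAMPLE_LISTING = """Compiled-in modules:
  http_core.c
  mod_so.c
"""

if __name__ == "__main__":
    print(describe_rewrite_support(SAMPLE_LISTING))
```

To test a live server, pass the result of listing_from_binary("/usr/local/apache/bin/httpd"), with your own path substituted, to describe_rewrite_support.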
Getting to Know mod_rewrite
Because it works in seemingly mysterious and powerful ways, mod_rewrite has
sometimes been described as voodoo. In a nutshell, the mod_rewrite module lets
you perform customized URL rewriting deep in the guts of the Apache process,
based on any of the properties associated with an incoming request.
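The idea of rewriting a URL based on a property of the request can be modeled in a few lines of Python. This is a toy illustration only. It does not use Apache's rule syntax or reflect how mod_rewrite is implemented, and the Rule type, the rewrite function, the agent name, and the paths are all hypothetical.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    header: str   # request property to inspect, such as "User-Agent"
    pattern: str  # regular expression tested against that property
    target: str   # path to serve instead when the pattern matches


def rewrite(path: str, headers: dict, rules: list) -> str:
    """Return the first matching rule's target, or the original path if none match."""
    for rule in rules:
        value = headers.get(rule.header, "")
        if re.search(rule.pattern, value, re.IGNORECASE):
            return rule.target
    return path


# Hypothetical rule: send one made-up harvester to a page with no addresses.
RULES = [Rule("User-Agent", r"ExampleHarvester", "/no-addresses-here.html")]

if __name__ == "__main__":
    print(rewrite("/contact.html", {"User-Agent": "ExampleHarvester/1.0"}, RULES))
    print(rewrite("/contact.html", {"User-Agent": "Mozilla/4.0"}, RULES))
```

The point is only the shape of the decision: inspect a property of the request, test it against a pattern, and change which URL is served when it matches.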