Josh Tarchuk

Alternatives to totally banishing mailto:
There are alternatives to completely removing email addresses, but they all depend on the stupidity of the spambot, and so could be compromised by a new generation of pest. These include:

* Write out email addresses in a non-email format, e.g. instead of writing 'user@example.com' you would write 'user at example dot com', or something similar. It would only take some spambot with a little more intelligence to be able to scan these patterns and pick up "likely" addresses, so this strategy is a little risky. Any consistent method you choose to write out email addresses could in theory be analyzed and decoded by a savvy bot.
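
To show how little intelligence this takes, here is a minimal JavaScript sketch of what a harvester might do with the 'user at example dot com' style. The function name deobfuscate and the exact pattern are my own assumptions for illustration, not code from any real spambot.

```javascript
// Hypothetical sketch: turn "user at example dot com" back into an address.
// The pattern only covers the " at " / " dot " spelling described above.
function deobfuscate(text) {
  const pattern = /\b([\w.+-]+)\s+at\s+(\w[\w-]*(?:\s+dot\s+\w[\w-]*)+)\b/gi;
  return [...text.matchAll(pattern)].map(
    (m) => m[1] + "@" + m[2].replace(/\s+dot\s+/gi, ".")
  );
}

console.log(deobfuscate("Write to user at example dot com for details."));
```

A pattern this crude will also produce false positives on ordinary sentences, but a harvester can afford a few bad addresses.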

* URL-encode email addresses (suggested by Anthony Martin). Most browsers allow the mailto: URL to contain URL-encoded values: the string "%40" stands for the "@" symbol, while "%2E" stands for a period. For that matter, you could URL-encode the entire address (name, host and domain), so it's one long encoded string. This might work in the short term, but it's relatively easy for spambots to get smart enough to decode it.
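
As a rough sketch of the "encode the entire address" idea, the following JavaScript builds the encoded string by hand. The function name urlEncodeAddress is hypothetical, and the code assumes a plain ASCII address.

```javascript
// Percent-encode every character of an ASCII address, not just "@" and ".".
// encodeURIComponent() leaves letters and "." untouched, so the hex escape
// is built by hand here.
function urlEncodeAddress(address) {
  return Array.from(address, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

const encoded = urlEncodeAddress("user@example.com");
console.log('<a href="mailto:' + encoded + '">Mail me</a>');

// A harvester can undo the encoding with a single standard call:
console.log(decodeURIComponent(encoded));
```

The last line is the weakness in a nutshell: the decoding step is already built into every URL library.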

* Use HTML character entities for email addresses (suggested by Seann Herdejurgen). This is similar to the previous method. For example, <a href="&#109;ailto&#58;user&#64;example&#46;com">user&#64;example&#46;com</a>
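
If you generate your pages with a script, the entities don't have to be typed by hand. This is a minimal sketch; the function names toEntities and entityMailto are hypothetical, and it encodes every character rather than just a few as in the example above.

```javascript
// Replace each character with its decimal numeric character reference,
// e.g. "@" becomes "&#64;" and "." becomes "&#46;".
function toEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Build a complete link whose href and visible text are both entity-encoded.
function entityMailto(address) {
  const hidden = toEntities(address);
  return '<a href="' + toEntities("mailto:") + hidden + '">' + hidden + "</a>";
}

console.log(entityMailto("user@example.com"));
```

The browser turns the entities back into normal characters, so the link behaves like any other mailto: link for the visitor.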

* Add stuff to the email address to make it invalid, but so that a human could easily know what to do to make it work. An example of this is writing 'username@_NO_SPAM_example.com'. You need to remove the "_NO_SPAM_" part to make the email address valid. You can have some kind of explanation to make it clear what people have to do to use the address. Personally, I don't like this - you're depending on a level of sophistication on the part of your users which is risky. In my experience, there are a lot of very 'novice' level users out there, who only know how to click on a link. They don't know how to edit an email address. Heck, I've had people come to my site by typing the URL into Google, rather than the 'Location' box of their browser. Also, people don't read instructions.

* Make graphic images which contain the email address. Spambots usually don't download graphics, and even if they did, they probably couldn't decode the bits to get the text. However, they could do it in theory, since software for doing OCR (optical character recognition, getting text from scanned documents) has been around for a while. A downside to this approach is that the user has to manually copy down the email address, since it can't be cut'n'pasted. Also, you can't put a mailto: link on the image, otherwise you're back to square one. Finally, blind people (who use braille browsers) will have a BIG problem with pure graphics (unless you put in some kind of ALT text, using the techniques previously mentioned to obfuscate the email address). Instead of a mailto: link, you could make the image link to a contact form, with an argument in the link telling your server internally what email address to use. For example, the link could say "contact.cgi?to=23", where '23' is some database key to the actual email address. But the downside here is that you still need to generate the image, which is a bit of a pain in the ass if you have a lot of them. You can do it automatically, if you're willing to put the work in and write the scripts. There are some very nice graphics generation packages out there on CPAN for Perl.

Robert Logan tells me that the PBM package (which seems to be packaged with Linux) is a great way to generate these graphics, for example:

shell> echo user@example.com | pbmtext | pnmcrop | pnmpad -white -l2 -r2 -t2 -b2 > email.pnm
shell> convert email.pnm email.gif

This produces a small GIF of the address, which looks pretty neat and tidy.

An alternative to this (suggested by Andrew Park) is to just make certain characters into graphics, which can then be used again and again for all kinds of email addresses. For example, you could make a GIF of the '@' symbol, and possibly other common parts such as ".com" and ".org". If you have code on the server side that can then automatically convert email addresses into the appropriate HTML, then this will fool most spambots (for now!).
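
Here is a minimal sketch of what that server-side conversion could look like in JavaScript. The image paths (/img/at.gif and so on), the ALT texts and the function name addressToHtml are all assumptions for illustration; the GIFs themselves have to be made separately.

```javascript
// Hypothetical image snippets for the "@" sign and common domain endings.
const AT_IMAGE = '<img src="/img/at.gif" alt=" at ">';
const SUFFIX_IMAGES = {
  ".com": '<img src="/img/dotcom.gif" alt=" dot com">',
  ".org": '<img src="/img/dotorg.gif" alt=" dot org">',
};

// Escape the characters that are special in HTML text.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Convert "user@example.com" into HTML where "@" and ".com" are images.
function addressToHtml(address) {
  const at = address.lastIndexOf("@");
  if (at < 0) return escapeHtml(address); // not an address, leave as text
  const user = address.slice(0, at);
  let host = address.slice(at + 1);
  let suffixHtml = "";
  for (const [suffix, img] of Object.entries(SUFFIX_IMAGES)) {
    if (host.toLowerCase().endsWith(suffix)) {
      host = host.slice(0, -suffix.length);
      suffixHtml = img;
      break;
    }
  }
  return escapeHtml(user) + AT_IMAGE + escapeHtml(host) + suffixHtml;
}

console.log(addressToHtml("user@example.com"));
```

Because the same few images are reused for every address, nothing has to be generated per address.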

* Use JavaScript to make your email links hard to recognise for spambots. I personally don't like my site to be dependent on JavaScript, since I turn it off in my own browser (mostly for security reasons and to avoid the popup and popunder ads). But, there have been a number of methods suggested for doing this, for example:

o From Marcell Toth:

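<html>
<head>
<script type="text/javascript">
// Assemble the address only when the link is clicked.
function SendMail(Login, Server)
{
  // window.navigate() only exists in Internet Explorer;
  // assigning to window.location.href works in other browsers too.
  window.location.href = "mailto:" + Login + "@" + Server;
}
</script>
</head>
<body>
<a href="javascript:SendMail('marcell.toth', 'nextra.hu')">Mail me</a>
</body>
</html>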

o A JavaScript email encryptor (thanks to Joe Tucek for the link)

o From Brandon Gillespie:

There is a fourth means of dealing with the mailto: link I didn't see mentioned,
but which I have had good success with. Instead of doing href="mailto:foo@bar" you
create an obfuscated JavaScript function for each domain (for me they are all mailed
to the same domain, so it's easy), like:

function m_sfcon (u) {
  // "mailto:" is split in two so the literal string never appears in the source
  var pre = "mail";
  var url = pre + "to:" + u;
  document.location.href = url + "@sfcon.org";
}

Then use:

href="javascript:m_sfcon('myusername')"

* Some other interesting ideas:

o From Thomas "Balu" Walter:

While working on my new homepage I found myself asking how to defend against those bots.
I didn't want to break my eMail-address or to hide it using javascript or images -
especially because my visitors should be able to use mailto: links as expected.

My provider set up a "catchall" mailbox where all mails are stored that are sent
to my domain @example.com. Since I am developing my pages using PHP I thought of
a way to make them unique for each visitor. The result was the following small function:

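function generateMail(){
    // $_SERVER replaces $HTTP_SERVER_VARS, which current PHP versions no longer provide

    // is a proxy in use?
    if (!empty($_SERVER["HTTP_X_FORWARDED_FOR"])) {
        // the header may hold a comma-separated list; use the first entry
        $parts = explode(",", $_SERVER["HTTP_X_FORWARDED_FOR"]);
        $ip = trim($parts[0]);
    } else {
        $ip = $_SERVER["REMOTE_ADDR"];
    }

    // ip2long() only understands IPv4 addresses
    return "web-".sprintf("%u", ip2long($ip)).".".time()."@example.com";
}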

This generates an address in the form

web-32bitIP.timestamp@example.com

This way I can easily reject addresses that were found by bots and are used for SPAMming.
I even know where the bot came from and when. I can even find them in the webserver-logfiles
and analyze their activity.
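
When spam arrives at one of these addresses, the IP and time can be read back out of it. The following JavaScript is a minimal sketch of that decoding step; the function name parseTrackedAddress is hypothetical, and it assumes the web-32bitIP.timestamp format shown above with a Unix timestamp in seconds.

```javascript
// Decode "web-<32bitIP>.<timestamp>@domain" back into its parts.
// Returns null when the address does not follow that format.
function parseTrackedAddress(address) {
  const m = /^web-(\d+)\.(\d+)@/.exec(address);
  if (!m) return null;
  const n = Number(m[1]);
  if (n > 0xffffffff) return null; // not a 32-bit IPv4 value
  // Split the 32-bit number into four octets, highest byte first.
  const ip = [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
  // PHP's time() counts seconds; JavaScript dates count milliseconds.
  return { ip: ip, seenAt: new Date(Number(m[2]) * 1000) };
}

const hit = parseTrackedAddress("web-3232235777.1083412800@example.com");
console.log(hit.ip, hit.seenAt.toISOString());
```

The IP and time can then be matched against the webserver logfiles to find the visit during which the address was harvested.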

o From Ilmari Karonen (in response to the update regarding newer versions of spambots which use Google to find pages, and then follow no links on my website, thus foiling any link traps):

If the spambot is indeed not following links, an obvious solution is to
feed all mailto: links through a redirector script.

On a site I'm currently building, I'm doing the following:

1. All email links are given as "/email/?h=host&u=user".
2. The directory /email is disallowed in robots.txt.
3. Any URL under /email which is _not_ in the above format acts as a
spambot trap.
4. All pages contain links to "/email/something_random.html".

This works great as long as there are no e-mail addresses visible on the
page. I'm currently obfuscating those by inserting the HTML code

<font size="1" color="white" style="font-size: 1px;">X</font>

on either side of the @ sign. I figure a bot has to be pretty damn
clever to de-obfuscate that, while it's pretty obvious to a human even
if the CSS hiding trick fails.
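
The decision the redirector script has to make can be sketched as follows. This is not Ilmari's actual script, just a minimal JavaScript illustration under my own assumptions: the function name handleEmailRequest, the validation patterns and the returned object shape are all hypothetical, and wiring it into a real web server is left out.

```javascript
// Hypothetical validation patterns for the "h" (host) and "u" (user) values.
const HOST_RE = /^[a-z0-9-]+(\.[a-z0-9-]+)+$/i;
const USER_RE = /^[a-z0-9._-]+$/i;

// Decide what to do with a request for a URL under /email/.
function handleEmailRequest(requestUrl) {
  // The base is only needed so that URL can parse a relative path.
  const url = new URL(requestUrl, "http://localhost");
  const host = url.searchParams.get("h");
  const user = url.searchParams.get("u");
  if (url.pathname === "/email/" && host && user &&
      HOST_RE.test(host) && USER_RE.test(user)) {
    // A well-formed link: send the browser on to the real mailto: URL.
    return { action: "redirect", location: "mailto:" + user + "@" + host };
  }
  // Anything else under /email/ was only reachable by ignoring robots.txt
  // and following the bait links, so treat the client as a spambot.
  return { action: "trap" };
}

console.log(handleEmailRequest("/email/?h=example.com&u=user"));
console.log(handleEmailRequest("/email/something_random.html"));
```

What the trap does next (logging, blocking the IP, serving junk addresses) is up to you.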

As you can see, there are many ways you can make email addresses harder for spambots to recognise. It all depends on your own expertise and preferences. Still, in my opinion the only totally safe way to ensure spambots can't harvest email addresses is to totally remove them from your website! Can't get around that one, no matter how smart they get...

 




No endorsement or approval of any third parties or their advice, opinions, information, products or services is expressed or implied by any information on this Site or by any hyperlinks to or from any third party websites or pages. ©2004 Josh Tarchuk  |  Blackcomb.ca