Form Spam Prevention
Why Form Spam?
A question we often receive is simply: why? Why form spam? Why would a person (or an automated process) bother submitting a bunch of junk through whatever web forms they can find?
To understand the answer, consider email spam, for both types of spam are perpetrated for similar reasons. The goals behind email spam are obvious to anyone who has received it: to drive traffic to the spammer's websites, to phish for personal or financial information, to install malicious code on the victim's computer, and so on.
Form spam has the same goals in general; it just goes about them in a slightly different way. Consider search engine results: one of the best ways to increase the visibility of a given website is to increase its number of backlinks, that is, the number of links on other websites pointing to it. Since form submissions are often published on the related website (think blog comments), form spam accomplishes exactly this. Like email spam, it is a numbers game, which explains why your web form is being targeted: like an email address, simply because it exists.
Form Spam: An Ongoing Topic of Conversation
There has been much discussion in the last few years about how to prevent spambots from submitting forms on websites. I expect this conversation to continue, because the problem of spam does not appear to be a short-term one. Many different solutions have been presented, ranging from the simple to the complex. A number of these anti-form-spam solutions actually impact the usability and accessibility of the web page; the use of CAPTCHAs (and reCAPTCHAs), for example, is a classic case where both usability and accessibility are directly affected.
Here are a few common techniques and terms relating to the prevention of form spam that website owners should be aware of.
One of the simplest ways to avoid form spam is to use CSS. Non-human spammers usually fill out every available input field in a form before submitting it. The basic idea is to add an extra field to your form (a dummy text input) and then use CSS to make it invisible to your human visitors. If a submitted form contains any information at all in this dummy field, you can safely bet it is form spam, and it can be trashed accordingly.
The downside to such a simple method is that many of the more sophisticated form spambots can tell this is a hidden field and thus "know" to avoid it. In fact, this was probably one of the first methods form spammers learned to circumvent automatically. In our opinion, however, it is still much better than using NO form spam avoidance techniques on a given form.
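The hidden-field ("honeypot") check described above can be sketched in a few lines of server-side code. This is a generic illustration, not tied to any framework; the field name `website_url` and the `is_probable_spam` helper are arbitrary choices for this example.

```python
# Minimal sketch of the hidden dummy-field (honeypot) check.
# Assumes the form markup includes an extra text input hidden with CSS, e.g.:
#   <input type="text" name="website_url" class="hp">
#   with CSS:  .hp { display: none; }
# The field name "website_url" is a hypothetical choice; a plausible-looking
# name tends to attract bots that fill in every field they find.

def is_probable_spam(form_data, honeypot_field="website_url"):
    """Return True if the hidden dummy field contains anything at all."""
    return bool(form_data.get(honeypot_field, "").strip())

# Example usage, with submitted form data represented as a dict:
human = {"name": "Jane", "comment": "Nice post!", "website_url": ""}
bot = {"name": "x", "comment": "buy pills", "website_url": "http://spam.example"}

print(is_probable_spam(human))  # False: field left blank, likely human
print(is_probable_spam(bot))    # True: the hidden field was filled in
```

A human visitor never sees the field, so it arrives empty; a naive bot fills it in and gives itself away.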
The “Turing test” is a proposal for a test of a machine's ability to demonstrate intelligence. Described by Alan Turing in the 1950 paper "Computing Machinery and Intelligence," it proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen.
The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University), and John Langford (then of IBM). It is a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart." Carnegie Mellon University attempted to trademark the term, but the trademark application was abandoned on 21 April 2008. Currently, CAPTCHA creators recommend use of reCAPTCHA as the official implementation.
reCAPTCHA is a system developed at Carnegie Mellon University that uses CAPTCHA to help digitize the text of books whilst protecting websites from bots attempting to access restricted areas. reCAPTCHA is currently digitizing text from the Internet Archive and the archives of the New York Times.
reCAPTCHA supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects. This provides about the equivalent of 160 books per day, or 12,000 man-hours per day of free labor (as of September 2008).
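The validation round-trip described above follows a general challenge/response pattern: the server records the expected answer when it issues a challenge, then compares it against the user's reply. The sketch below illustrates that pattern generically; it is not the reCAPTCHA service's actual API, and the function names and session dict are assumptions for illustration.

```python
# Generic sketch of the challenge/response validation flow behind a
# CAPTCHA check. NOT the reCAPTCHA API; just the underlying pattern:
# the expected answer stays server-side, keyed by a random token that
# travels with the form.
import secrets

def issue_challenge(session, answer):
    """Store the expected answer server-side, keyed by a random token."""
    token = secrets.token_hex(8)
    session[token] = answer.lower()
    return token  # the token is embedded in the form; the answer never is

def check_response(session, token, user_reply):
    """Single-use comparison: the stored answer is consumed either way."""
    expected = session.pop(token, None)
    return expected is not None and user_reply.strip().lower() == expected

session = {}  # stand-in for real server-side session storage
t = issue_challenge(session, "overlooks")  # the word shown in the image
print(check_response(session, t, "Overlooks"))  # True: correct answer
print(check_response(session, t, "Overlooks"))  # False: token already used
```

Making each token single-use prevents a bot from solving one challenge once and replaying the answer indefinitely.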
Resources: Turing Test Methods for PHP Forms
Wikipedia – Turing Test - http://en.wikipedia.org/wiki/Turing_test