Jay
SuperDork
3/23/11 11:54 a.m.
Okay, seriously. I know why these things must exist and what they do, but there is no need to have a dick-wagging contest to see who can out-obfuscate each other. Case in point:
berkeley! How am I supposed to type that? The standard German keyboard has keys for the ö, ä, and ü, but I'm on an English keyboard so I had to go into the freaking character map. Wouldn't it be best to assume everyone has the 'lowest common denominator' keyboard? Surely there are enough permutations of A-Z and the numbers to screw over the average spambot. The second "word" is even worse - I don't know what letters those are? Is that "m" or "r n"? You can't tell! Yes, I got this one wrong and had to wait the 90-second "annoy period" before I could try it again.
This wouldn't be full-on rant fodder if this crud weren't all over the dang internet. This is by far not the worst example. Between being totally illegible, using different alphabets, or occasionally forcing people to type non-alphabetic characters that are only available under Windows, it's just getting stupid. I've had Russian sites expect me to type in Cyrillic even though I was on the English version of the site. Yeesh!
(Before anyone points out the obvious - I know for a fact you can easily make Latin letters on a Russian keyboard, just by holding down one function key. Probably the same goes for Arabic, Greek, Thai, or whatever other alphabet you can think of.)
I use the "refresh image" button on a regular basis. Which I suppose means I'm pretty much on the same reading comprehension level as a spambot.
No, seriously though...some of them are ridiculous.
Holy Crap!!! That's horrible... you watch Ugly Americans!!!
I have actually seen worse, not with those silly furrin' letters and such, but a lot harder to read (added lines, noise, etc.)
And I thought I was having a bad day translating a Swedish inspection report to English.
Detaljuppmatning me!
Rant time! Knock it off with the "catchpas"!
It's actually spelled "CAPTCHA", which may explain why you're having such a hard time.
Jay
SuperDork
3/23/11 12:13 p.m.
^^ Is that supposed to be a word? 'Coz Mr. Webster back there has a dozen guys with him who say it ain't.
Is that supposed to be a word?
I don't think so. I think you're just supposed to copy the letters. It goes "C-A-P-T-C-H-A."
From what I understand, there are 2 words. One has to be exact, the other doesn't. If it has punctuation or characters not on a standard QWERTY keyboard, then it's not the word they expect to be exact.
As a bonus lol, I also understand that the words often come from the mistyped words people put in and have rejected.
Actually with CAPTCHA you don't have to type the following items: numbers, words with punctuation, symbols, unreadable items, or double-stacked portions. CAPTCHA can only identify plain letters.
Keith
SuperDork
3/23/11 12:50 p.m.
You're actually looking at a "reCAPTCHA". It's a pretty cool concept. One word is the test, the other is a word from a scanned document that the computer can't figure out. So it asks you, because the human brain is much better at this kind of thing. Only the one, known word is used to determine if you pass the test. Google Books uses the reCAPTCHA.
some smart people said:
First Use - AltaVista
In 1997 AltaVista sought ways to block or discourage the automatic submission of URLs to their search engine. This free "add-URL" service is important to AltaVista since it broadens its search coverage. Yet some users were abusing the service by automating the submission of large numbers of URLs, in an effort to skew AltaVista's importance ranking algorithms.
Andrei Broder, Chief Scientist of AltaVista, and his colleagues developed a filter. Their method is to generate an image of printed text randomly so that machine vision (OCR) systems cannot read it but humans still can. In January 2002 Broder stated that the system had been in use for "over a year" and had reduced the number of "spam add-URL" by "over 95%." A U.S. patent was issued in April 2001.
Yahoo's Chat Room Problem
In September 2000, Udi Manber of Yahoo! described this "chat room problem" to researchers at CMU: 'bots' were joining on-line chat rooms and irritating the people there by pointing them to advertising sites. How could all 'bots' be refused entry to chat rooms?
CMU's Prof. Manuel Blum, Luis A. von Ahn, and John Langford articulated some desirable properties of a test, including:
the test's challenges can be automatically generated and graded
the test can be taken quickly and easily by human users
the test will accept virtually all human users with high reliability while rejecting very few
the test will reject virtually all machine users
the test will resist automatic attack for many years even as technology advances
CMU's CAPTCHA Research
The CMU team developed a 'hard' GIMPY CAPTCHA which picked English words at random and rendered them as images of printed text under a wide variety of shape deformations and image occlusions, the word images often overlapping. The user was asked to transcribe some number of the words correctly.
A simplified version of GIMPY (EZ GIMPY), using only one word-image at a time, was installed by Yahoo!, and is in use currently in their chat rooms to restrict access to only human users.
Pioneering CAPTCHA Research at PARC
PARC’s research builds on its pattern and image analysis competencies to create reading-based CAPTCHAs. Principal Scientist Henry Baird, an expert on computer vision and document image analysis, also organized the first NSF-funded International Workshop on Human Interactive Proofs, held at PARC in January 2002.
Baird also collaborated with Richard Fateman and Allison Coates of UC Berkeley to develop PessimalPrint, a CAPTCHA that uses a model of document image degradations that approximates ten aspects of the physics of machine-printing and imaging of text. This model included spatial sampling rate and error, affine spatial deformations, jitter, speckle, blurring, thresholding, and symbol size. Their paper, PessimalPrint: a Reverse Turing Test, was the first refereed technical publication on CAPTCHAs.
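To give a rough feel for what a degradation model like that does, here is a minimal sketch that applies three of the listed effects (blurring, noise, thresholding) to a tiny black-and-white bitmap. It is not the PessimalPrint model itself: the function names, the 3x3 box blur, the Gaussian noise, and every parameter value are assumptions chosen only for illustration.

```python
import random

def blur(img):
    """3x3 box blur; pixels outside the image count as 0 (background)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
            out[y][x] = total / 9.0
    return out

def degrade(img, threshold=0.5, noise_sd=0.1, seed=0):
    """Blur, add per-pixel noise, then threshold back to 0/1 ink values.

    All parameters are hypothetical knobs, not values from the paper.
    """
    rng = random.Random(seed)
    blurred = blur(img)
    noisy = [[v + rng.gauss(0.0, noise_sd) for v in row] for row in blurred]
    return [[1 if v >= threshold else 0 for v in row] for row in noisy]

# A 5x5 bitmap with a solid 3x3 block of "ink" in the middle.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

for row in degrade(glyph, noise_sd=0.15, seed=42):
    print("".join("#" if px else "." for px in row))
```

Raising the threshold thins the strokes and lowering it fattens them, which is the kind of knob a generator could turn until OCR fails while people can still read the result.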
Bracing for the Arms Race
Most CAPTCHA research to date has been limited to academic applications. Far more powerful algorithms will be required for commercial CAPTCHAs. As CAPTCHAs become more prevalent, bot programmers are expected to unleash armies of bots bent on breaking them.
Most research programs focus on either building CAPTCHAs or breaking them through, e.g., dictionary and computer-vision attacks. PARC research is unique in that it does both: we play both offense and defense. From exploring how to break them, researchers are discovering new techniques for building CAPTCHAs that are less vulnerable. For example, BaffleText uses non-English pronounceable character strings to defend against dictionary-driven attacks, and Gestalt-motivated image-masking degradations to defend against image restoration attacks.
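The dictionary-attack defense mentioned above can be sketched in a few lines: generate a string that is easy to pronounce but is not a real word, so a bot cannot narrow its guesses with a word list. The quoted text does not say how BaffleText builds its strings, so this swaps in a simple consonant/vowel alternation; the letter sets, the function names, and the tiny dictionary are all assumptions.

```python
import random

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"

def pronounceable(length, rng):
    """Alternate consonants and vowels so the string is easy to say."""
    start = rng.randrange(2)  # 0: begin with a consonant, 1: begin with a vowel
    return "".join(
        rng.choice(CONSONANTS if (i + start) % 2 == 0 else VOWELS)
        for i in range(length)
    )

def make_challenge(dictionary, length=6, seed=None, max_tries=1000):
    """Return a pronounceable string that does not appear in `dictionary`."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        word = pronounceable(length, rng)
        if word not in dictionary:
            return word
    raise RuntimeError("could not find a non-dictionary string")

# Hypothetical word list; a real one would be far larger.
english = {"banana", "tomato", "potato", "rabid", "lemon"}
print(make_challenge(english, seed=1))
```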
User-focused studies
PARC’s user-focused approach makes BaffleText algorithms more commercially viable by ensuring they are not too frustrating for people to use. Drawing on PARC’s long tradition of workplace studies that merge insights from both social and computer sciences, researchers have conducted usability studies to confirm the human legibility and user acceptance of BaffleText images.
PARC is seeking corporate partners interested in using PARC CAPTCHA technology inside their own products and applications. To learn more, please contact Julie Chen, Business Development, 650-812-4758.
The funny thing is, the reCAPTCHA is used to help decode old scanned texts. One word is known and the second is not. The program knows which is which and will have you type both. If you get the known word correct, the computer logs what you typed for the other one, and after so many people have typed the same string for that word it assumes we all got it right and makes it a known word. When the publisher gets all the words back the document is cleared for printing.
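For the curious, the two-word scheme described above can be sketched roughly like this. It is only an illustration of the idea as posted: the class name, the `votes_needed` threshold, and the case-insensitive comparison are assumptions, not details of the real service.

```python
from collections import Counter, defaultdict

class TwoWordChallenge:
    """Grade on the known word; collect guesses for the unknown one."""

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed    # hypothetical agreement threshold
        self.votes = defaultdict(Counter)   # unknown word id -> guess counts
        self.resolved = {}                  # unknown word id -> agreed text

    def submit(self, known_answer, typed_known, unknown_id, typed_unknown):
        """Return True if the user passes the test."""
        # Only the known word decides pass or fail.
        if typed_known.strip().lower() != known_answer.lower():
            return False
        # The user passed, so their reading of the unknown word counts as a vote.
        guess = typed_unknown.strip().lower()
        if guess and unknown_id not in self.resolved:
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= self.votes_needed:
                self.resolved[unknown_id] = guess
        return True

checker = TwoWordChallenge(votes_needed=2)
checker.submit("morning", "morning", "scan-17", "overlooks")
checker.submit("morning", "mourning", "scan-17", "overlooks")  # fails, vote ignored
checker.submit("morning", "Morning", "scan-17", "Overlooks")
print(checker.resolved)
```

Guesses from people who miss the known word are thrown away, which is what keeps a bot typing garbage from polluting the transcription.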
Kinda interesting in a geek sort of way.
Keith beat me to the punch.
pstrbrc
New Reader
3/23/11 12:55 p.m.
In case you weren't aware, the CAPTCHAs are used to help prevent spam in forums, comments, etc. It's a way of trying to foil bots.
When you see non-standard characters, you can just type the closest normal one. For the example above, you could've just typed "Ogodai rnamen".
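One way a checker could accept the "closest normal" letter is to strip accents from both strings before comparing them. This is a sketch of that idea only; nothing in the thread says how any real CAPTCHA compares answers, and the function names and the sample word "Ögödai" are made up for illustration.

```python
import unicodedata

def strip_accents(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def lenient_match(expected, typed):
    """Compare two answers while ignoring accents and letter case."""
    return strip_accents(expected).casefold() == strip_accents(typed).casefold()

print(strip_accents("Ögödai"))            # Ogodai
print(lenient_match("Ögödai", "ogodai"))  # True
```

Letters with no accent-free decomposition (a Cyrillic or Thai character, for instance) pass through unchanged, so this trick only helps with accented Latin letters.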
With reCAPTCHA I've had some stuff come up in Kanji and some form of Semitic before. In those cases I just hit the "give me another one" button.
a computer generates the captcha, but they don't think the spambot operators are capable of writing a program that can read them and defeat them?
Keith
SuperDork
3/23/11 7:03 p.m.
That's the whole idea. Character recognition like that is not an easy thing to do. Heck, that's why the reCAPTCHA exists - because people have the ability to recognize distorted characters that computers do not.
novaderrik wrote:
a computer generates the captcha, but they don't think the spambot operators are capable of writing a program that can read them and defeat them?
they have....... NASA SE is being inundated with spam at the moment... nothing they seem to do is able to stop it... 10 , 15 a day or more
While recaptcha is a neat idea, captchas are used by lazy and stupid web developers who don't realize or don't care how much they annoy their users.
There are much better ways to fight spam.
And yes, there are many ways spammers can -- and do -- beat captchas.
A useless annoyance.
One I got recently - I thought it was going a bit far...
Jay
SuperDork
3/28/11 10:05 p.m.
Okay, so today I got a new one... On registering for another forum, it came up with a box and a list of words. You were supposed to select the countries (i.e. "the USA", "France", etc.) out of the list of words and drag them into the empty box.
It wasn't nearly as infuriating as trying to decode illegible characters and getting it wrong, but still, it seemed ridiculously complicated for foiling a simple spambot. How long did it take them to code that?
That's the thing... no matter how complicated you get, you won't foil the spambot for long. Captchas don't work, they've never worked, they're annoying and pointless and do more to run off your customers/audience than fight spam.
Hey Tim, could you explain why/how they don't work at foiling spambots? I would have thought they would work.