On Jul 4, 2007, at 12:47 PM, Brian Candler wrote:

>> have fun putting that together.  to do it you need to render, not
>> just parse, html!
>
> It looks pretty easy to me. You've conveniently put all the noise characters
> in a different colour.
>
> Here's my two-minute solution:
>
> $ cat reader.rb
> src = File.read("test.html")
>
> src.gsub!(/<span [^>]*#ccc[^>]*>([^<]*)<\/span>/i) { " " * $1.size }
> src.gsub!(/&nbsp;/, ' ')
> src.gsub!(/<br>/i, "\n")
> src.gsub!(/<\/?pre[^>]*>/, '')
> puts src
> $ ruby reader.rb
>    __                      _
>   / _|                    | |
>  | |_     ___      ___      |__      __ _    _ __
>  |  _|   / _ \    / _ \   | '_ \    / _  |  | '__|
>  | |    | (_) |  | (_) |  | |_) |  | (_| |  | |
>  |_|      ___/    \___    |_._ /    \__,_|  |_|
>
> Of course you can keep changing your code, and I can keep changing mine. But
> someone who took more than two minutes over this could come up with a much
> more robust solution (e.g. dynamically working out the contrast between
> foreground and background)
>
> Anyway, once your code is deployed on a real live site, by someone other
> than you, it becomes much harder to change. And the source is going to be
> available to the attacker too.
>


the latest version addresses all these issues and more.  check out

http://drawohara.tumblr.com/post/4944987
http://fortytwo.merseine.nu:3000/flatulent/ajax


key points:

  - noise is drawn with the same chars as the image
  - no color diff between noise and image chars
  - image is not visible without running gecko or otherwise rendering
    javascript
  - image has an encoded timebomb in it: attacker has only 60s to
    post.  this just rules out brute force attacks.

i think this bumps it up into a new league of attacks - maybe not though,
people are creative ;-)
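
a minimal sketch of the timebomb idea from the key points, assuming the
server signs the issue time with an hmac and refuses anything older than
60s.  this is not taken from the flatulent source: SECRET, TTL,
timebomb_token and timebomb_valid? are all hypothetical names, and the
real encoding may be quite different.

```ruby
require 'openssl'

# hypothetical server-side secret, not from the actual library
SECRET = 'change-me'
# lifetime in seconds, matching the "60s to post" point
TTL = 60

# build a token carrying the issue time plus an hmac over it
def timebomb_token(now = Time.now)
  issued = now.to_i.to_s
  sig = OpenSSL::HMAC.hexdigest('SHA256', SECRET, issued)
  "#{issued}-#{sig}"
end

# accept a token only if the signature matches and it is at most TTL old
def timebomb_valid?(token, now = Time.now)
  issued, sig = token.to_s.split('-', 2)
  return false unless issued && sig
  expected = OpenSSL::HMAC.hexdigest('SHA256', SECRET, issued)
  # plain == keeps the sketch short; a constant-time compare would be safer
  return false unless sig == expected
  age = now.to_i - issued.to_i
  age >= 0 && age <= TTL
end
```

the token would travel with the form as a hidden field, and the post
handler would call timebomb_valid? before even looking at the answer.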


>> where i'm heading now is using css and javascript to
>> position the image and characters within the image.
>
> Hmm - this risks making the captcha visible by fewer and fewer browsers. OK,
> so lynx wouldn't be able to view a PNG captcha either; but you risk locking
> out a lot of mobile devices, set-top boxes and other embedded web browsers
> (which could otherwise display a PNG quite happily)
>
> However, perhaps ASCII-art generation (as a form of unusual and disjointed
> character set) combined with server-side rendering to a PNG would get around
> that issue, save you a lot of work in obfuscating the HTML itself, and also
> be harder to parse.
>

true.  i'm not too worried about that though.

>> two other factors in favour of ascii art
>>
>> 1) there are tons of ocr programs out there available for free.
>> there are no ascii art recognition programs that i am aware of.
>
> That's not because it's hard - it's because it's been totally pointless,
> until now that is. If spammers start using ASCII art text, then there's an
> incentive to make a reader. On the other hand, any E-mail which contains
> something that looks like ASCII art could probably be classified as spam on
> that basis alone.
>

the problem is that ascii art can contain any chars.  ;-) <- ascii art


> ASCII art is, I believe, much more suited to machine reading than a scanned
> printout.

i thought so too until i started playing with ocr'ing it - the
results are absolutely terrible.  no doubt someone could train a reader -
but that's true of all captchas: a sufficiently trained reader will win.

> Most importantly, the characters will be on an exact
> horizontal/vertical grid alignment, not rotated by a few degrees.

version 0.0.3 adds vertical and horizontal displacement.  the next
one will introduce rotation.
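
a rough sketch of what vertical and horizontal displacement can look like,
assuming each glyph is an array of row strings.  displace and its max_dy /
max_dx parameters are hypothetical names for illustration, not the 0.0.3
api.

```ruby
# glyphs: array of glyphs, each an array of strings (one string per row)
# returns banner rows where every glyph is pushed down by a random
# offset (0..max_dy) and preceded by a random gap (0..max_dx)
def displace(glyphs, max_dy: 2, max_dx: 3, rng: Random.new)
  height = glyphs.map(&:size).max + max_dy
  rows = Array.new(height) { String.new }
  glyphs.each do |glyph|
    width = glyph.map(&:size).max
    dy = rng.rand(0..max_dy)
    dx = rng.rand(0..max_dx)
    height.times do |y|
      # rows above the shifted glyph, or below its end, are blank
      src = y >= dy ? glyph[y - dy] : nil
      rows[y] << (' ' * dx) << (src || '').ljust(width)
    end
  end
  rows
end
```

`puts displace(glyphs)` prints the banner; since each glyph gets its own
offsets, the letters no longer sit on one shared baseline or at a fixed
pitch.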

> And also I
> suspect there will probably only be a handful of legible ASCII art character
> sets to choose from.
>

but the 'pixel' charset is large.  version 0.0.3 works on that angle  
too.
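
one way to read the 'pixel' charset point, sketched under assumptions: the
letter shape is a bitmap, and every "on" pixel is drawn with a char picked
at random from a pool.  PIXELS and render are hypothetical, and the pool
here is just an example, not the one version 0.0.3 uses.

```ruby
# hypothetical pool of chars that can stand in for an "on" pixel
PIXELS = %w[# @ $ % & * + = x o 8].freeze

# bitmap: array of strings made of '1' (on) and anything else (off)
# returns rows where each on pixel is a random char from the pool
def render(bitmap, pool: PIXELS, rng: Random.new)
  bitmap.map do |row|
    row.each_char.map { |c| c == '1' ? pool[rng.rand(pool.size)] : ' ' }.join
  end
end
```

the same bitmap then renders differently on every request, so matching
against a fixed set of known fonts gets less useful.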

> Anyway, time will tell. If your captcha isn't widely used, then it may
> remain strong enough for a reasonable time. (That's apart from the usual
> attacks on captchas, such as redirecting them to other humans who are in
> search of porn :-)

right.  and this is the key point: attackers can beat you with that
strategy every time (not with my timebomb though - at least not as
often).  the only goal for a captcha is that it is not easily beaten
by average coders - it's not securing something after all - it's a
filter (not a wall) for bots.

i'll await your next attack!

cheers.


-a
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama