> have fun putting that together.  to do it you need to render, not  
> just parse, html!

It looks pretty easy to me. You've conveniently put all the noise characters
in a different colour.

Here's my two-minute solution:

$ cat reader.rb
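# Read the captcha page to be decoded
src = File.read("test.html")

# Noise characters sit in light-grey (#ccc) spans: blank them out, keeping the width
src.gsub!(/<span [^>]*#ccc[^>]*>([^<]*)<\/span>/i) { " " * $1.size }
# Turn the remaining markup back into plain text
src.gsub!(/&nbsp;/, ' ')
src.gsub!(/<br>/i, "\n")
src.gsub!(/<\/?pre[^>]*>/, '')
puts src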
$ ruby reader.rb
   __                      _
  / _|                    | |
 | |_     ___      ___      |__      __ _    _ __
 |  _|   / _ \    / _ \   | '_ \    / _  |  | '__|
 | |    | (_) |  | (_) |  | |_) |  | (_| |  | |
 |_|      ___/    \___    |_._ /    \__,_|  |_|

Of course you can keep changing your code, and I can keep changing mine. But
someone who took more than two minutes over this could come up with a much
more robust solution (e.g. dynamically working out the contrast between
foreground and background).
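
To give an idea of what "working out the contrast" might look like, here is a
rough sketch. It borrows the relative-luminance and contrast-ratio formulas
from the WCAG accessibility guidelines, which is just one possible measure and
not something the captcha itself defines. The method names, the white default
background and the threshold of 3.0 are all assumptions made for illustration.

```ruby
# Parse "#rgb" or "#rrggbb" into three channel values in the range 0.0..1.0.
def parse_hex(colour)
  hex = colour.delete("#")
  hex = hex.chars.map { |c| c * 2 }.join if hex.size == 3
  hex.scan(/../).map { |pair| pair.to_i(16) / 255.0 }
end

# Relative luminance as defined by WCAG 2.x (0.0 = black, 1.0 = white).
def luminance(colour)
  r, g, b = parse_hex(colour).map do |c|
    c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055)**2.4
  end
  0.2126 * r + 0.7152 * g + 0.0722 * b
end

# Contrast ratio: 1.0 for identical colours, 21.0 for black on white.
def contrast(fg, bg)
  hi, lo = [luminance(fg), luminance(bg)].sort.reverse
  (hi + 0.05) / (lo + 0.05)
end

# Hypothetical rule: text too faint against the background is noise.
def noise?(fg, bg = "#fff", threshold = 3.0)
  contrast(fg, bg) < threshold
end
```

With something like this, the hard-coded `#ccc` in the regexp could be replaced
by a check such as `noise?(colour)` on whatever colour each span declares.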

Anyway, once your code is deployed on a real live site, by someone other
than you, it becomes much harder to change. And the source is going to be
available to the attacker too.

> now, where i'm heading now, is using css and javascript so to  
> position the image and characters within the image.

Hmm - this risks making the captcha viewable in fewer and fewer browsers. OK,
so lynx wouldn't be able to view a PNG captcha either; but you risk locking
out a lot of mobile devices, set-top boxes and other embedded web browsers
(which could otherwise display a PNG quite happily).

However, perhaps ASCII-art generation (as a form of unusual and disjointed
character set) combined with server-side rendering to a PNG would get around
that issue, save you a lot of work in obfuscating the HTML itself, and also
be harder to parse.

> two other factors in favour of ascii art
> 
> 1) there are tons of ocr programs out there available for free.   
> there are no ascii art regognition programs that i am aware of.  

That's not because it's hard - it's because it's been totally pointless,
until now, that is. If spammers start using ASCII art text, then there's an
incentive to make a reader. On the other hand, any E-mail which contains
something that looks like ASCII art could probably be classified as spam on
that basis alone.

ASCII art is, I believe, much more suited to machine reading than a scanned
printout. Most importantly, the characters will be on an exact
horizontal/vertical grid alignment, not rotated by a few degrees. And also I
suspect there will probably only be a handful of legible ASCII art character
sets to choose from.
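
As a rough illustration of why the grid alignment helps, here is a sketch of a
reader that cuts the art at columns which are blank on every row and looks
each piece up in a table of known glyphs. The three-row font in `toy_font` is
made up for this example and is not a real FIGlet font; a real reader would
need a table per character set, and would have to cope with glyphs that touch
or that contain blank columns.

```ruby
# Hypothetical three-row font, invented for illustration only.
def toy_font
  {
    "H" => ["| |", "|_|", "| |"],
    "I" => ["|", "|", "|"],
    "L" => ["|  ", "|  ", "|__"]
  }
end

# Split the art into glyphs at columns that are blank on every row.
# Each glyph comes back as an array of row strings.
def split_glyphs(art)
  rows = art.lines.map(&:chomp)
  width = rows.map(&:size).max
  columns = rows.map { |r| r.ljust(width).chars }.transpose
  # chunk drops elements whose block value is nil, so blank columns act as separators
  groups = columns.chunk { |col| col.any? { |ch| ch != " " } || nil }
  groups.map { |_, cols| cols.transpose.map(&:join) }
end

# Look each glyph up in the font; unknown shapes become "?".
def read_art(art, font)
  lookup = font.invert
  split_glyphs(art).map { |glyph| lookup.fetch(glyph, "?") }.join
end

art = ["| | | |  ",
       "|_| | |  ",
       "| | | |__"].join("\n")
puts read_art(art, toy_font)   # prints "HIL"
```

Because every glyph is an exact match on a fixed grid, there is no need for the
rotation and scaling tolerance that OCR on a scanned page requires.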

Anyway, time will tell. If your captcha isn't widely used, then it may
remain strong enough for a reasonable time. (That's apart from the usual
attacks on captchas, such as redirecting them to other humans who are in
search of porn :-)

Regards,

Brian.