On Jul 4, 2007, at 12:47 PM, Brian Candler wrote:

>> have fun putting that together. to do it you need to render, not
>> just parse, html!
>
> It looks pretty easy to me. You'll conveniently put all the noise
> characters in a different colour.
>
> Here's my two-minute solution:
>
> $ cat reader.rb
> src = File.read("test.html")
>
> src.gsub!(/<span [^>]*#ccc[^>]*>([^<]*)<\/span>/i) { " " * $1.size }
> src.gsub!(/&nbsp;/, ' ')
> src.gsub!(/<br>/i, "\n")
> src.gsub!(/<\/?pre[^>]*>/, '')
> puts src
> $ ruby reader.rb
> __ _
> / _| | |
> | |_ ___ ___ |__ __ _ _ __
> | _| / _ \ / _ \ | '_ \ / _ | | '__|
> | | | (_) | | (_) | | |_) | | (_| | | |
> |_| ___/ \___ |_._ / \__,_| |_|
>
> Of course you can keep changing your code, and I can keep changing
> mine. But someone who took more than two minutes over this could come
> up with a much more robust solution (e.g. dynamically working out the
> contrast between foreground and background)
>
> Anyway, once your code is deployed on a real live site, by someone
> other than you, it becomes much harder to change. And the source is
> going to be available to the attacker too.
>

the latest version addresses all these issues and more. check out

  http://drawohara.tumblr.com/post/4944987
  http://fortytwo.merseine.nu:3000/flatulent/ajax

key points:

  - noise is image chars
  - no color diff between noise and image chars
  - image is not visible without running gecko or otherwise rendering
    javascript
  - image has an encoded timebomb in it: attacker has only 60s for post.

this just rules out brute force attacks. i think it bumps it up into a
new league of attacks - maybe not though, people are creative ;-)

>> now, where i'm heading now, is using css and javascript so as to
>> position the image and characters within the image.
>
> Hmm - this risks making the captcha visible by fewer and fewer
> browsers.
> OK, so lynx wouldn't be able to view a PNG captcha either; but you
> risk locking out a lot of mobile devices, set-top boxes and other
> embedded web browsers (which could otherwise display a PNG quite
> happily)
>
> However, perhaps ASCII-art generation (as a form of unusual and
> disjointed character set) combined with server-side rendering to a PNG
> would get around that issue, save you a lot of work in obfuscating the
> HTML itself, and also be harder to parse.
>

true. i'm not too worried about that though.

>> two other factors in favour of ascii art
>>
>> 1) there are tons of ocr programs out there available for free.
>> there are no ascii art recognition programs that i am aware of.
>
> That's not because it's hard - it's because it's been totally
> pointless, until now that is. If spammers start using ASCII art text,
> then there's an incentive to make a reader. On the other hand, any
> E-mail which contains something that looks like ASCII art could
> probably be classified as spam on that basis alone.
>

the problem is that ascii art can contain any chars. ;-) <- ascii art

> ASCII art is, I believe, much more suited to machine reading than a
> scanned printout.

i thought so too until i started playing with ocr'ing it - the results
are absolutely terrible. no doubt someone could train an ocr program on
it - but that's true of all captchas: a sufficiently trained reader will
win.

> Most importantly, the characters will be on an exact
> horizontal/vertical grid alignment, not rotated by a few degrees.

version 0.0.3 adds vertical and horizontal displacement. the next one
will introduce rotation.

> And also I suspect there will probably only be a handful of legible
> ASCII art character sets to choose from.
>

but the 'pixel' charset is large. version 0.0.3 works on that angle too.

> Anyway, time will tell. If your captcha isn't widely used, then it may
> remain strong enough for a reasonable time.
> (That's apart from the usual attacks on captchas, such as redirecting
> them to other humans who are in search of porn :-)

right. and this is the key point: attackers can beat you with that
strategy every time (not with my timebomb though - at least not as
often). the only goal for a captcha is that it is not easily beaten by
average coders - it's not securing something after all - it's a filter
(not a wall) for bots.

i'll await your next attack!

cheers.

-a
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama
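
ps. for anyone wondering what a 'timebomb' could look like: below is a
minimal sketch in ruby. it is not the code flatulent actually uses, just
one plausible way to do it. the names SECRET, TTL, issue_token and
token_valid? are made up for illustration, and the scheme (an hmac over
the issue time plus the expected answer) is an assumption, not something
taken from the library.

```ruby
require 'openssl'

# hypothetical server-side secret - it never leaves the server
SECRET = 'change-me'
# seconds allowed between issuing the captcha and the post
TTL = 60

# hex hmac-sha256 of data under the server secret
def sign(data)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), SECRET, data)
end

# token is "issued_at:signature". the signature covers the expected
# answer too, so a token is only good for the captcha it came with.
def issue_token(answer, now = Time.now.to_i)
  "#{now}:#{sign("#{now}:#{answer}")}"
end

# true only if the token is well formed, not older than TTL, and the
# submitted answer matches the one the token was signed with.
# note: == is not a constant-time compare; a real deployment would
# probably want one.
def token_valid?(token, submitted, now = Time.now.to_i)
  issued, sig = token.to_s.split(':', 2)
  return false unless issued =~ /\A\d+\z/ && sig
  age = now - issued.to_i
  return false if age < 0 || age > TTL
  sig == sign("#{issued}:#{submitted}")
end
```

the token would travel in a hidden form field next to the captcha. on
post the server calls token_valid?(token, what_the_user_typed) and a
reply that is late, forged, or simply wrong is rejected in one step.
this only limits how long a solved captcha stays useful; it does nothing
against a human relay that answers inside the window.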