On Jun 15, 2005, at 3:36 PM, Nikolai Weibull wrote:

> Ezra Zygmuntowicz wrote:
>
>
>>     Could someone help me do a little regex conversion? I've got a
>> few perl compatible regexes from a php script I am trying to port to
>> ruby but I need a little help. Here are the php functions:
>>
>>  $buffer = preg_replace("#(?<!\"|http:\/\/)www\.(?:[a-zA-Z0-9\-]+\.)*
>> [a-zA-Z]{2,4}(?:/[^ \n\r\"\'<]+)?#", "http://$0", $buffer);
>>  $buffer = preg_replace("#(?<!\"|href=|href\s=\s|href=\s|href\s=)
>> (?:http:\/\/|https:\/\/|ftp:\/\/)(?:[a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,4}
>> (?::[0-9]+)?(?:/[^ \n\r\"\'<]+)?#",
>> "<a href=\"$0\" target=\"_blank\">$0</a>", $buffer);
>> $buffer = preg_replace("#(?<=[\n ])([a-z0-9\-_.]+?)@([^,< \n\r]+)#i",
>> "<a href=\"mailto:$0\">$0</a>", $buffer);
>>
>
> OK, this wins my newly instated prize for _worst regexes ever_.
> Inefficient, inconclusive, inconsistent, and just plain wrong.  I
> really hope you don't have to work with a lot of code like this.
>
> Nonetheless, here's my solution:
>
> domain_part = /(?:[[:alnum:]\-]+\.)/
> tld = /[[:alpha:]]{2,4}/
> buffer.gsub!(/(?<!"|http:\/\/)www\.#{domain_part}*#{tld}/, 'http://\0')
> buffer.gsub!(/(?<!"|href=|href\s=\s|href=\s|href\s=)
>               (?:https?|ftp):\/\/#{domain_part}+#{tld}
>               (?::\d+)?(?:\/[^\s"'<]+)?/x,
>              '<a href="\0" target="_blank">\0</a>')
> buffer.gsub!(/(?<=\s)[[:alnum:]\-_.]+@[^,<\s]+/i,
>              '<a href="mailto:\0">\0</a>')
>
> Totally untested, but at least it's somewhat easier to understand and
> a bit more correct.  There are better ways to extract URLs and email
> addresses from an input than this, mind you,
>         nikolai
>
> -- 
> Nikolai Weibull: now available free of charge at http://bitwi.se/!
> Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
> main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
>

Nikolai-
     Thank you. I have inherited a ton of NASTY PHP code like this at
the newspaper I work at. I am rewriting it all in Rails and Ruby CGI
scripts. The guy who wrote this stuff is no longer here, and I
think he liked making his code as obfuscated as possible in order to
keep his job secure. I am by no means a regex master, so digesting
volumes of stuff like this hurts my head. Thank you for the help.
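
For anyone following along, here is a rough sketch of how the three
substitutions from the quoted reply might be wrapped into one method.
It assumes a Ruby whose regex engine supports lookbehind (1.9 or
later). The method name autolink and the constant names are made up
for illustration and are not part of the original script, and the
mailto: replacement is taken from the PHP version rather than from
the reply.

```ruby
# Hypothetical helper: the names below are illustrative only.
DOMAIN_PART = /(?:[[:alnum:]\-]+\.)/   # one dot-terminated host label
TLD         = /[[:alpha:]]{2,4}/       # two to four letters, as in the PHP

def autolink(text)
  buffer = text.dup

  # 1. Give bare "www." hosts a scheme, unless they already have one
  #    or directly follow a double quote.
  buffer.gsub!(/(?<!"|http:\/\/)www\.#{DOMAIN_PART}*#{TLD}/, 'http://\0')

  # 2. Wrap URLs in an anchor, skipping ones already inside an href.
  buffer.gsub!(/(?<!"|href=|href\s=\s|href=\s|href\s=)
                (?:https?|ftp):\/\/#{DOMAIN_PART}+#{TLD}
                (?::\d+)?(?:\/[^\s"'<]+)?/x,
               '<a href="\0" target="_blank">\0</a>')

  # 3. Turn whitespace-preceded email addresses into mailto: links.
  buffer.gsub!(/(?<=\s)[[:alnum:]\-_.]+@[^,<\s]+/i,
               '<a href="mailto:\0">\0</a>')

  buffer
end

puts autolink("see www.example.com and mail bob@example.com now")
```

Each gsub! returns nil when nothing matched, so the method returns
buffer explicitly instead of relying on the value of the last call.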

-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
509-577-7732
ezra / yakima-herald.com