On 15.02.2007 16:19, Ian Macdonald wrote:
> On Thu 15 Feb 2007 at 12:39:21 +0900, Rob Biedenharn wrote:
> 
>> Yes, the LANG is affecting the result in irb, but not ruby.
>>
>> $ irb -v
>> irb 0.9.5(05/04/13)
>>
>> Whether the irb behavior is "correct" or anomalous is probably a  
>> question for the maintainers to debate.  The man page for ctype(3)  
>> (on my Mac OS X 10.4.8) indicates that the macros are supposed to be  
>> based on the locale and my copy of the pickaxe (p.71) says that the  
>> character classes are based on the ctype macros of the same name.   
>> However, a quick C program shows effectively the same behavior as  
>> ruby (i.e., only the [0-9A-Za-z] satisfy isalnum() even for nl_NL).   
>> I'm now more curious as to how irb is finding the character classes.
> 
> It turns out that the poster who mentioned possible interference from
> the readline(3) library was right.

That was me. :-)

> Look at this:
> 
> $ irb
> irb(main):001:0> foo = "préférées"
> => "pr\351f\351r\351es"
> irb(main):002:0> foo =~ /[^[:alnum:]]/
> => nil
> 
> $ irb --noreadline
> irb(main):001:0> foo = "préférées"
> => "pr\351f\351r\351es"
> irb(main):002:0> foo =~ /[^[:alnum:]]/
> => 2
> 
> This is _very_ unexpected and undesirable behaviour and, as such,
> probably qualifies as a bug.

Yeah, seems so.  Unless it's documented behavior. :-)

> Interestingly, adding "require 'readline'" to the stand-alone script
> does _not_ introduce this behaviour, so it must be something to do with
> the initialisation that irb does.

It's really strange, as both print the same output.  How about running
this in both irb sessions, just to be sure that both strings contain the
same sequence of bytes:

require 'enumerator'
foo.to_enum(:each_byte).to_a.join(", ")
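
If it helps, here is a minimal sketch of the same check that does not
need 'enumerator'.  byte_list is just a name I made up for illustration,
and I'm assuming the input arrives as ISO-8859-1, i.e. each \351 in the
inspect output above is one byte:

```ruby
# Hypothetical helper (not part of irb or readline): list the bytes of a
# string as decimal numbers, so two strings can be compared by eye.
def byte_list(str)
  str.unpack("C*").join(", ")   # "C*" = every byte as an unsigned 8-bit integer
end

# Assumption: ISO-8859-1 input, so each "\351" (octal) is one byte.
foo = "pr\351f\351r\351es"
puts byte_list(foo)
```

If the list comes out identical with and without --noreadline, the
strings themselves are the same and the difference must be in how the
character classes get evaluated.  In a UTF-8 session each accented
character would show up as two bytes (195, 169) instead.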

Kind regards

	robert