Issue #7845 has been updated by naruse (Yui NARUSE).


matz (Yukihiro Matsumoto) wrote:
> Akira, thank you for pointing that out.
> 
> But it's hard for me to imagine concrete problematic cases.
> When text from a network connection is marked as Unicode, it's OK to process it as Unicode text;
> otherwise it should be marked as 'ASCII-8BIT' so that #strip and other methods behave as
> they do now.
> 
> Matz.

Modern protocols such as SMTPUTF8 <http://tools.ietf.org/html/rfc6532> and the URL Standard <http://url.spec.whatwg.org/>
use UTF-8 as their character encoding, but they still use ASCII whitespace.
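
To illustrate the point, here is a minimal sketch of a protocol-style strip that removes only ASCII whitespace from UTF-8 text. The helper name strip_ascii_whitespace is hypothetical, and the character set (tab, LF, FF, CR, space) is an assumption taken from the usual WHATWG definition of ASCII whitespace; other protocols may define a slightly different set.

```ruby
# Hypothetical helper, not part of Ruby core: strips only ASCII whitespace
# (assumed here to be TAB, LF, FF, CR and SPACE) from both ends of a string.
def strip_ascii_whitespace(str)
  str.gsub(/\A[\t\n\f\r ]+|[\t\n\f\r ]+\z/, "")
end

input = " \thttp://example.com/\u3000\n"

# The trailing ideographic space (U+3000) is data, not ASCII whitespace,
# so it survives the strip.
result = strip_ascii_whitespace(input)
puts result.inspect
```

If #strip became Unicode-aware, a parser for such a protocol could silently drop characters like U+3000 that the protocol treats as ordinary data.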
----------------------------------------
Feature #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1)
https://bugs.ruby-lang.org/issues/7845#change-39169

Author: timothyg56 (Timothy Garnett)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 


Strip and associated methods in Ruby 1.9.2 and 1.9.3 do not remove leading/trailing Unicode space characters (such as non-breaking space \u00A0 and ideographic space \u3000), unlike Ruby 1.9.1.  I'd expect the 1.9.1 behavior.  Looking at the underlying native lstrip! and rstrip! methods, it looks like this is because 1.9.1 uses rb_enc_isspace() whereas 1.9.2+ uses rb_isspace().

1.9.1p378 :001 > "\u3000\u00a0".strip
 => "" 

1.9.2p320 :001 > "\u3000\u00a0".strip
 => "　 "

1.9.3p286 :001 > "\u3000\u00a0".strip
 => "　 "
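
As a possible workaround on versions where strip only handles ASCII whitespace, a Unicode-aware strip can be sketched with the [[:space:]] POSIX bracket, which matches Unicode whitespace when applied to UTF-8 strings. The method name unicode_strip is hypothetical and this is only a sketch, not the behavior of any core method.

```ruby
# Hypothetical helper, not part of Ruby core: strips leading and trailing
# Unicode whitespace, relying on [[:space:]] being Unicode-aware for
# UTF-8 strings.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

puts unicode_strip("\u3000\u00a0").inspect
puts unicode_strip("\u3000 abc\u00a0").inspect
```

Unlike a change to strip itself, this leaves the built-in method untouched, so code that depends on ASCII-only stripping keeps working.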


-- 
http://bugs.ruby-lang.org/