Issue #7845 has been updated by timothyg56 (Timothy Garnett).


A patch for this is pretty straightforward: see https://gist.github.com/tgarnett/5032660, which is only a couple of lines.

As someone dealing with a lot of web crawling and Chinese source data, I find having strip remove non-breaking / ideographic spaces a real boon (particularly given the large amount of code we originally wrote against 1.9.1).
----------------------------------------
Bug #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1)
https://bugs.ruby-lang.org/issues/7845#change-39126

Author: timothyg56 (Timothy Garnett)
Status: Open
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-linux]
Backport: 


Strip and its associated methods in Ruby 1.9.2 and 1.9.3 do not remove leading/trailing Unicode space characters (such as the non-breaking space \u00A0 and the ideographic space \u3000), unlike Ruby 1.9.1.  I'd expect the 1.9.1 behavior.  Looking at the underlying native lstrip! and rstrip! methods, this appears to be because 1.9.1 uses rb_enc_isspace() whereas 1.9.2+ uses rb_isspace().

1.9.1p378 :001 > "\u3000\u00a0".strip
 => "" 

1.9.2p320 :001 > "\u3000\u00a0".strip
 => "\u3000\u00a0"

1.9.3p286 :001 > "\u3000\u00a0".strip
 => "\u3000\u00a0"
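
In the meantime, a possible workaround is to strip with a regular expression instead of relying on strip. The following is only a minimal sketch, assuming UTF-8 encoded strings; unicode_strip is a hypothetical helper name, not part of Ruby or of the linked patch. It relies on the POSIX bracket class [[:space:]], which as far as I can tell matches Unicode whitespace in 1.9.x regexps, whereas \s covers only ASCII whitespace.

```ruby
# Hypothetical helper (not part of Ruby core): removes leading and trailing
# whitespace, including Unicode spaces such as U+00A0 and U+3000.
# Assumes the string is valid UTF-8.
def unicode_strip(str)
  # \A and \z anchor to the very start and end of the string;
  # [[:space:]] is Unicode-aware, unlike \s.
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

unicode_strip("\u3000\u00a0")           # leading/trailing Unicode spaces only
unicode_strip("\u00a0 hello world\u3000") # interior ASCII space is preserved
```

This returns a new string rather than modifying the receiver, so it mirrors strip rather than strip!; interior whitespace is left untouched.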


-- 
http://bugs.ruby-lang.org/