Nooo! Those are the first BYTES of the UTF-8 encoding of the punctuation that you listed. MANY Unicode characters (when encoded in UTF-8) can start with those bytes, so if you remove them from a given string, you're going to get back an invalidly encoded UTF-8 string, which is definitely not what you want.
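To make that concrete, here is a minimal sketch, assuming Ruby 1.9 or later (where strings know their encoding and `String#bytes` is available). The two sample characters are my own picks for illustration, not taken from your list; they just happen to share lead byte 226.

```ruby
# encoding: utf-8

# Two unrelated characters, chosen only for illustration.
quote = "\u201C"   # LEFT DOUBLE QUOTATION MARK (punctuation)
euro  = "\u20AC"   # EURO SIGN (not punctuation)

# Both UTF-8 encodings begin with the same lead byte.
lead_bytes = [quote, euro].map { |s| s.bytes.first }

# Stripping that byte from a string leaves orphaned continuation bytes.
text     = "5#{euro}"
stripped = text.bytes.reject { |b| b == 226 }.pack("C*").force_encoding("UTF-8")

puts lead_bytes.inspect         # both entries are 226
puts stripped.valid_encoding?   # false: the result is no longer valid UTF-8
```

So a byte-range filter such as 226..239 would hit nearly every non-ASCII character, not just the punctuation.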
If you want to split on those separators, then why not do so explicitly?
# fill up c as you've done below
>> "asdfasdfasdf".split(/#{c.join('|')}/)
=> ["asdf", "asdfasdf"]
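A slightly more defensive variant of the same idea, again only a sketch assuming Ruby 1.9 or later: `Regexp.union` escapes each separator for you, which matters if the list ever contains a regex metacharacter such as `.` or `|`. The three separators below are hypothetical stand-ins (ideographic comma, ideographic full stop, fullwidth comma); substitute your own `c`.

```ruby
# encoding: utf-8

# Hypothetical separator list; replace with the real punctuation array.
separators = ["\u3001", "\u3002", "\uFF0C"]

# Regexp.union builds an alternation and escapes each string as needed.
pattern = Regexp.union(separators)

parts = "asdf\u3001asdfasdf".split(pattern)
puts parts.inspect   # ["asdf", "asdfasdf"]
```

This matches whole characters rather than bytes, so the pieces you get back are still valid UTF-8.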
On May 8, 2:54 pm, Nanyang Zhan <s... / hotmail.com> wrote:
> >> c=%w{ ɡ                                  ݡ                       }
> => ["", "ɡ", "", "", "", "", "", "", "", "", "", "", "",
> "", "", "", "", "", "", "", "", "", "", "", "", "", " ",
> "", "", "", "", "", "", "", "ݡ", "", "", "", "", "", "",
> "", "", "", "", "", "", "", "", "", "", "", "", "", "",
> "", "", ""]
>
> >> c.collect.map{|o| o[0]}.sort.uniq
> => [226, 228, 229, 230, 231, 233, 239]
>
> maybe 226 to 239 is the range I need.