On Dec 3, 2:07 pm, Greg Willits <li... / gregwillits.ws> wrote:
> >> >> 'añÃvHtéØH¥ÊFuG'.scan(/[\303\251]/u)
> >> => []
> >> >> 'añÃvHtéØH¥ÊFuG'.scan(/[#{"\303\251"}]/u)
> >> => ["é"]
>
> OK, one thing I'm still confused about -- when I look up é in any
> table, its DEC is 233, which converted to OCT is 351, yet you're
> using 251 (and indeed it seems like reducing the OCTs I come up with
> by 100 is what actually works).
>
> Where is this 100 difference coming from?

http://www.fileformat.info/info/unicode/char/00e9/index.htm

The UTF-16 value is 233 (decimal), but the UTF-8 value is 0xC3 0xA9, which is 195 169 in decimal, or 0303 0251 in octal.
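
To see where the octal 100 comes from, here is a rough sketch (mine, not from the thread above) that builds the two UTF-8 bytes for é by hand. In the two-byte UTF-8 layout, the low six bits of the code point go into a trailing byte prefixed with the bits 10, while in 233 itself those same six bits sit under the bits 11. That one-bit difference is 0100 in octal, so subtracting 100 only happens to work for code points whose low byte starts with 11 (U+00C0 through U+00FF); it is not a general rule. The variable names are just for illustration.

```ruby
# A sketch: derive the UTF-8 bytes of é (U+00E9) by hand.
code_point = 233                              # decimal 233, octal 0351

# Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
lead_byte  = 0xC0 | (code_point >> 6)         # top bits of the code point
trail_byte = 0x80 | (code_point & 0x3F)       # low six bits, prefixed with 10

# Octal strings for the two bytes, as used in the regex escapes.
utf8_octal = [lead_byte, trail_byte].map { |b| b.to_s(8) }

# Cross-check against Ruby's own UTF-8 encoder.
packed_bytes = [code_point].pack("U").unpack("C*")

puts "lead=#{lead_byte} trail=#{trail_byte} octal=#{utf8_octal.join(' ')}"
```

So the 351 from the lookup table is the code point, and 303 251 are the bytes that actually sit in the string, which is what the regex has to match.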