On Sun, 27 Feb 2000, GOTO Kentaro wrote:

...

: Because Ruby's regexp is Japanese character code sensitive, some
: substrings are not matched by `/./'.  I know three solutions.

...

Ah, I guess I'm just used to handling things like EUC characters
on my own... :-)

I see that there are classes for Japanese string conversion and
detection, and there's jcode.rb, but is there a class or module that has
the concept of each EUC/SJIS "character" being a discrete unit instead
of two bytes?  Maybe a string-like class with the underlying data being
an array of integers. 
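
To make the idea concrete, here is a rough sketch of the kind of thing I
have in mind.  The helper name euc_codepoints is made up for
illustration (it is not part of jcode.rb or any library), and it assumes
well-formed EUC-JP input: 0x8f starts a three-byte character, 0x8e or
0xa1-0xfe starts a two-byte character, and anything else is one byte.

```ruby
# Hypothetical helper: split an EUC-JP byte string into one integer
# per character, so each multibyte character is a single array element.
def euc_codepoints(str)
  bytes = str.unpack("C*")
  chars = []
  i = 0
  while i < bytes.length
    b = bytes[i]
    # Decide how many bytes this character occupies (assumed EUC-JP rules).
    len = if b == 0x8f
            3
          elsif b == 0x8e || (0xa1..0xfe).include?(b)
            2
          else
            1
          end
    # Fold the bytes of one character into a single integer.
    code = 0
    bytes[i, len].each { |x| code = (code << 8) | x }
    chars << code
    i += len
  end
  chars
end

units = euc_codepoints("a\xa4\xa2b")   # => [0x61, 0xa4a2, 0x62]
```

A string-like class could wrap such an array and pack it back into bytes
on output.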

Is the behavior of the Japanese-sensitive regexes documented anywhere?  (I
didn't see anything about the "n" option either.)  For example, is there
a way to write a regex in which /./ matches two bytes, because . matches
a single multibyte character?

Is it possible to set an option like "n" when creating a regex with
Regexp.new()?  (I was building the regex on the fly from strings.)  The
regex options become an attribute of the regex itself, right?
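
For what it's worth, this is the general mechanism as I understand it,
shown with Regexp::IGNORECASE only.  Whether "n" can be passed the same
way is exactly what I don't know, so this sketch deliberately leaves it
out; the pattern string is just an illustration.

```ruby
# Build a regex from a string at run time; flags are passed as the
# second argument and stay attached to the resulting Regexp object.
pattern = "h.llo"                              # illustrative pattern string
re = Regexp.new(pattern, Regexp::IGNORECASE)

# The flag can be read back from the regex object itself.
flag_kept = (re.options & Regexp::IGNORECASE) != 0

# The flag affects matching: the pattern is lower case, the text is not.
matched_at = (re =~ "say HELLO")
```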

This also didn't work:

# change hiragana to katakana...
"\xa4\xa2".sub(/\xa4([\xa1-\xf3])/n, "\xa5\\1")

Other /regex/n substitutions with \1-style back-references that I tried
worked, but I don't know why this one and others like it didn't.  Is it
something in the String#sub implementation?  If I set $KCODE to none
before calling sub, it worked fine.
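
In the meantime, the same conversion can be done without the regex
engine at all by walking the bytes by hand.  This is only a sketch: the
method name hira_to_kata_euc is made up, and it assumes well-formed
EUC-JP input, with hiragana at 0xa4 0xa1-0xf3 as in the pattern above.

```ruby
# Hypothetical helper: change EUC-JP hiragana to katakana by rewriting
# the lead byte 0xa4 to 0xa5, stepping over other characters whole so a
# trail byte is never mistaken for a lead byte.
def hira_to_kata_euc(str)
  bytes = str.unpack("C*")
  i = 0
  while i < bytes.length
    b = bytes[i]
    if b == 0xa4 && bytes[i + 1] && (0xa1..0xf3).include?(bytes[i + 1])
      bytes[i] = 0xa5          # hiragana row -> katakana row
      i += 2
    elsif b == 0x8f
      i += 3                   # three-byte character
    elsif b == 0x8e || b >= 0xa1
      i += 2                   # other two-byte character
    else
      i += 1                   # single-byte character
    end
  end
  bytes.pack("C*")
end

kata = hira_to_kata_euc("\xa4\xa2")    # bytes 0xa5 0xa2
```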

Also, speaking of global variables, what happens to them in a
multithreaded program?  Does each thread get its own copy of, e.g., $!
if several threads raise an exception at the same time?
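
A small experiment along these lines should settle the $! case.  As far
as I can tell $! is kept per thread, so each thread should see only its
own exception, but I'd treat that as something to verify on your own
interpreter.  The thread names and messages are arbitrary.

```ruby
# Two threads raise at about the same time; each then reads $! after a
# short pause, giving the other thread time to raise its own exception.
threads = %w[first second].map do |name|
  Thread.new do
    begin
      raise "failure in #{name}"
    rescue
      sleep 0.05
      $!.message               # the thread's value is what it saw in $!
    end
  end
end

# Thread#value joins the thread and returns the block's result.
messages = threads.map { |t| t.value }
```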

thanks,
Wes.

##  Wes Nakamura  -  wknaka / pobox.com