In article <20071230084654.90944E0483 / mail.bc9.jp>,
Nobuyoshi Nakada <nobu / ruby-lang.org> writes:
> X11, and Solaris (IIRC).
I tested on Solaris.
% uname -a
SunOS solaris 5.11 snv_70b i86pc i386 i86pc
% cat wdump.c
#include <wchar.h>
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
int main(int argc, char **argv)
{
    char *s;
    wint_t wc;
    /* pick up the locale from the environment (LANG etc.) */
    s = setlocale(LC_ALL, "");
    if (s == NULL) { perror("setlocale"); exit(1); }
    /* sizeof yields size_t, so cast it to match %d */
    printf("sizeof(wchar_t):%d\n", (int)sizeof(wchar_t));
    /* dump each wide character read from stdin as a hex value */
    while ((wc = getwchar()) != WEOF) {
        printf("0x%lx\n", (unsigned long)wc);
    }
    if (ferror(stdin)) { perror("getwchar"); exit(1); }
    return 0;
}
% ./ruby -e 'print "\u3042"'|LANG=ja_JP.UTF-8 ./wdump
sizeof(wchar_t):4
0x3042
% ./ruby -e 'print "\u3042"'|iconv -f UTF-8 -t eucJP|LANG=ja_JP.eucJP ./wdump
sizeof(wchar_t):4
0x30001222
HIRAGANA LETTER A (U+3042) gets a different wchar_t value in each
locale: 0x3042 in ja_JP.UTF-8 but 0x30001222 in ja_JP.eucJP.
So we cannot know the internal representation of wchar_t.
(__STDC_ISO_10646__, which would guarantee ISO 10646 values, is optional.)
But the encoding of the mbs is known: nl_langinfo(CODESET).
So I recommend wcs -> mbs -> UTF-8.
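
A minimal sketch of that route, assuming POSIX iconv() and
nl_langinfo() are available. wcs_to_utf8 is a hypothetical helper
name used only for illustration, and the buffer sizes are assumed
upper bounds for stateless encodings, not a tested limit.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: wide string -> malloc'ed UTF-8 string, going
 * through the locale's multibyte encoding.  Returns NULL on failure.
 * Assumes the caller has already called setlocale(). */
char *wcs_to_utf8(const wchar_t *ws)
{
    assert(ws != NULL);

    /* Step 1: wcs -> mbs in the current locale's encoding.
     * Assumed bound: at most MB_CUR_MAX bytes per wide character. */
    size_t mbsize = wcslen(ws) * MB_CUR_MAX + 1;
    char *mbs = malloc(mbsize);
    if (mbs == NULL) return NULL;
    if (wcstombs(mbs, ws, mbsize) == (size_t)-1) { free(mbs); return NULL; }

    /* Step 2: mbs -> UTF-8.  nl_langinfo(CODESET) names the mbs encoding. */
    iconv_t cd = iconv_open("UTF-8", nl_langinfo(CODESET));
    if (cd == (iconv_t)-1) { free(mbs); return NULL; }

    size_t inleft = strlen(mbs);
    size_t outsize = inleft * 4 + 1;   /* assumed generous upper bound */
    char *out = malloc(outsize);
    if (out == NULL) { iconv_close(cd); free(mbs); return NULL; }

    /* Some older systems declare the second argument as const char **. */
    char *in = mbs;
    char *outp = out;
    size_t outleft = outsize - 1;      /* keep one byte for the terminator */
    size_t rc = iconv(cd, &in, &inleft, &outp, &outleft);
    iconv_close(cd);
    free(mbs);
    if (rc == (size_t)-1) { free(out); return NULL; }

    *outp = '\0';
    return out;
}
```

The function never looks at the numeric value of a wchar_t, so it
does not depend on __STDC_ISO_10646__. The caller frees the result.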
--
Tanaka Akira