Hi,

At Fri, 9 Nov 2007 16:08:02 +0900,
David Flanagan wrote in [ruby-core:13326]:
> >> Q1) In step 1 above, should the default primary encoding come from the
> >> locale environment variables (LC_ALL, LC_CTYPE, and LANG) instead of
> >> defaulting to ASCII?
> >
> > It's planned, but we have no mappings from locale name to
> > encoding name.  Attached is a quick hack I tried the week
> > before last.
>
> Did you consider nl_langinfo(CODESET)?

It requires setlocale() to be called first, which changes
process-global state.

> Your code looks good to me,
> except that you don't check LC_CTYPE.  Is it your intent to be
> conservative and assume ASCII unless the locale explicitly specifies an
> encoding name following a .?  You're not going to choose either EUC-JP
> or SJIS as the default for Japanese locales?

Forgotten, thank you.  I changed locale_encoding() as follows.

static rb_encoding *
locale_encoding(void)
{
    static const char *const langs[] = {"LC_ALL", "LC_CTYPE", "LANG",};
    const char *lang, *at;
    int i, len, idx = 0;
    char buf[32];
    rb_encoding *enc;

    for (i = 0; i < sizeof(langs) / sizeof(langs[0]); ++i) {
        if (!(lang = getenv(langs[i]))) continue;
        if (!(lang = strchr(lang, '.'))) continue;
        at = strchr(++lang, '@');
        if ((len = (at ? at - lang : strlen(lang))) >= sizeof(buf) - 1) continue;
        MEMCPY(buf, lang, char, len);
        buf[len] = 0;
        idx = rb_enc_find_index(buf);
        if (idx < 0 && len > 3 &&
            (strncasecmp(buf, "euc", 3) == 0 ||
             strncasecmp(buf, "utf", 3) == 0) &&
            buf[3]) {
            MEMMOVE(buf + 4, buf + 3, char, len - 2);
            buf[3] = '-';
            idx = rb_enc_find_index(buf);
        }
        enc = rb_enc_from_index(idx);
        if (enc) return enc;
    }
    return rb_enc_default();
}

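For readers who want to try the string handling outside the Ruby tree, here is a minimal sketch of just the parsing part: pulling the codeset out of a value like "ja_JP.eucJP@modifier" and inserting the hyphen after an "euc" or "utf" prefix. The helper names locale_codeset() and hyphenate_codeset() are hypothetical and not part of the patch. Two things differ from the function above and are assumptions of this sketch: it rejects an empty codeset, and it inserts the hyphen whenever the fourth character is not already '-', because there is no rb_enc_find_index() to try first. strncasecmp() comes from POSIX <strings.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: copy the codeset part of a locale string such as
 * "ja_JP.eucJP@modifier" into out.  Returns 0 on success, -1 if there is
 * no codeset, it is empty (an assumption of this sketch), or it does not
 * fit.  One byte is kept spare so a hyphen can be inserted later. */
static int
locale_codeset(const char *locale, char *out, size_t outsz)
{
    const char *dot, *at;
    size_t len;

    assert(out != NULL && outsz >= 2);
    if (!locale || !(dot = strchr(locale, '.'))) return -1;
    ++dot;                              /* skip the '.' itself */
    at = strchr(dot, '@');              /* optional "@modifier" suffix */
    len = at ? (size_t)(at - dot) : strlen(dot);
    if (len == 0 || len >= outsz - 1) return -1;
    memcpy(out, dot, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: turn "eucJP" / "utf8" into "euc-JP" / "utf-8".
 * The caller must guarantee one spare byte after the terminating NUL,
 * which locale_codeset() does. */
static void
hyphenate_codeset(char *buf)
{
    size_t len = strlen(buf);

    if (len > 3 && buf[3] != '-' &&
        (strncasecmp(buf, "euc", 3) == 0 ||
         strncasecmp(buf, "utf", 3) == 0)) {
        memmove(buf + 4, buf + 3, len - 3 + 1);  /* tail plus NUL */
        buf[3] = '-';
    }
}
```

With a 32-byte buffer, "ja_JP.eucJP" gives "eucJP", which the second helper rewrites to "euc-JP"; "en_US.UTF-8@euro" gives "UTF-8" and is left unchanged; a bare "C" has no '.' and is rejected.
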
-- Nobu Nakada