I've got a couple of questions about the handling of the primary encoding. First, here is my understanding of how things currently work in the 1.9 sources:

1) The default primary encoding is ASCII.
2) The -K, -E, and --encoding options set the primary encoding.
3) The primary encoding is associated with the string or file that the parser is going to parse, and thus the primary encoding is used as the default source encoding for the script.
4) If a script contains a coding comment or BOM, that sets the source encoding, overriding the default taken from the primary encoding.
5) Once the script has been parsed (but before it is executed), the source encoding is used to set the primary encoding if the primary encoding was not explicitly specified with -K, -E, or --encoding.

Here are my questions:

Q1) In step 1 above, should the default primary encoding come from the locale environment variables (LC_ALL, LC_CTYPE, and LANG) instead of defaulting to ASCII?

Q2) If the answer to Q1 is yes, then shouldn't we drop step 5 above? If my locale specifies UTF-8 as my primary encoding, then I don't think that should be changed just because I run a script developed by a Japanese programmer and encoded in EUC-JP.

Finally, I suspect that nl_langinfo is not a portable way to get the encoding from the locale. The code here (which looks like it is in the public domain) provides emulation:

http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c

David
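
P.S. To make steps 1 through 5 concrete, here is a rough Ruby model of my understanding. It is only a sketch and not the actual 1.9 code: resolve_encodings and all of its parameters are names made up for illustration, and encodings are plain strings. The default_primary and adopt_source parameters are there only to model the changes asked about in Q1 and Q2.

```ruby
# Hypothetical model of steps 1-5; nothing here exists in the 1.9 sources.
#   cli:             encoding given with -K, -E, or --encoding (nil if none)
#   script:          encoding from a coding comment or BOM (nil if none)
#   default_primary: step 1 default (Q1 asks whether the locale should supply it)
#   adopt_source:    whether step 5 is applied (Q2 asks whether to drop it)
def resolve_encodings(cli: nil, script: nil,
                      default_primary: "ASCII", adopt_source: true)
  # Steps 1 and 2: start from the default unless an option was given.
  primary = cli || default_primary

  # Steps 3 and 4: the primary encoding is the default source encoding,
  # but a coding comment or BOM overrides it.
  source = script || primary

  # Step 5: after parsing, the source encoding replaces the primary
  # encoding unless the primary encoding was given explicitly.
  primary = source if adopt_source && cli.nil?

  { primary: primary, source: source }
end

# Current behavior as I understand it: an EUC-JP script changes the
# primary encoding when no option is given.
current = resolve_encodings(script: "EUC-JP")

# Behavior proposed in Q1 and Q2: a UTF-8 locale supplies the default
# and step 5 is dropped, so the primary encoding stays UTF-8.
proposed = resolve_encodings(script: "EUC-JP",
                             default_primary: "UTF-8",
                             adopt_source: false)
```

In this model, current has both the primary and source encoding set to EUC-JP, while proposed keeps UTF-8 as the primary encoding and uses EUC-JP only as the source encoding.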