I've got a couple of questions about the handling of primary encoding. 
First, here is my understanding of how things currently work in the 1.9 
sources:

1) The default primary encoding is ASCII.

2) The -K, -E, and --encoding options set the primary encoding.

3) The primary encoding is associated with the string or file that
the parser is going to parse, and thus the primary encoding is used as 
the default source encoding for the script.

4) If a script contains a coding comment or BOM, that sets the source 
encoding, overriding the default taken from the primary encoding.

5) Once the script has been parsed (but before it is executed) the 
source encoding is used to set the primary encoding if the primary 
encoding was not explicitly specified with -K, -E, or --encoding.
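
To make steps 1-5 concrete, here is a rough Ruby model of my 
understanding.  The method name, its parameters, and the plain string 
encoding names are made up for illustration; this is a sketch of the 
behavior described above, not the actual 1.9 implementation.

```ruby
# Hypothetical model of steps 1-5 (not the real 1.9 code).
# explicit: encoding name given with -K, -E, or --encoding, or nil if none
# magic:    encoding name from a coding comment or BOM, or nil if none
def resolve_encodings(explicit, magic)
  primary = explicit || "ASCII"       # steps 1 and 2: default, or the option
  source  = magic || primary          # steps 3 and 4: comment/BOM overrides
  primary = source if explicit.nil?   # step 5: source feeds back into primary
  { :primary => primary, :source => source }
end

# A EUC-JP script run with no options changes the primary encoding:
p resolve_encodings(nil, "EUC-JP")
# The same script run with an explicit option does not:
p resolve_encodings("UTF-8", "EUC-JP")
```

In this model the step 5 line is the only place where a script can 
change the primary encoding.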

Here are my questions:

Q1) In step 1 above, should the default primary encoding come from the 
locale environment variables (LC_ALL, LC_CTYPE, and LANG) instead of 
defaulting to ASCII?
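
For Q1, the lookup I have in mind could be sketched roughly as below.  
The helper name locale_codeset is hypothetical, and falling back to 
ASCII when the locale name carries no codeset is a simplifying 
assumption of mine; a real implementation would need a table of 
per-locale defaults.

```ruby
# Hypothetical helper: guess the codeset from the POSIX locale variables.
# Precedence is LC_ALL, then LC_CTYPE, then LANG; the first variable that
# is set and non-empty wins.
def locale_codeset(env = ENV)
  name = %w[LC_ALL LC_CTYPE LANG].map { |k| env[k] }.find { |v| v && !v.empty? }
  return "ASCII" if name.nil? || name == "C" || name == "POSIX"
  # Locale names look like language[_territory][.codeset][@modifier].
  codeset = name[/\.([^@]+)/, 1]
  # Assumption: with no explicit codeset, fall back to ASCII.
  codeset || "ASCII"
end

p locale_codeset({ "LANG" => "en_US.UTF-8" })
p locale_codeset({ "LC_ALL" => "ja_JP.eucJP", "LANG" => "en_US.UTF-8" })
```

The codeset strings come back exactly as spelled in the environment 
(for example "eucJP"), so they would still have to be mapped to the 
encoding names Ruby uses.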

Q2) If the answer to Q1 is yes, then shouldn't we drop step 5?  If my 
locale specifies UTF-8 as my primary encoding then I don't think that 
should be changed just because I run a script developed by a Japanese 
programmer and encoded in EUC-JP.

Finally, I suspect that nl_langinfo is not a portable way to get the 
encoding from the locale.  The code here (it appears to be in the public 
domain) provides an emulation: http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c

	David