On Mon, 15 Sep 2008 22:24:57 +1000, Yukihiro Matsumoto  
<matz / ruby-lang.org> wrote:

> I am not sure what you mean by "inconsistent".  What are your ideal
> messages (or behavior) for each case?
>
> In message "Re: [ruby-core:18600] [Bug #566] String encoding error messages are inconsistent"
>     on Mon, 15 Sep 2008 15:50:17 +0900, Michael Selig <redmine / ruby-lang.org> writes:
>
> |Please compare:
> |"abc".encode("UTF-16BE") << "abc"
> |==> EncodingCompatibilityError: incompatible character encodings: UTF-16BE and US-ASCII
> |and:
> |"abc".encode("UTF-16BE") =~ /abc/
> |==> ArgumentError: incompatible encoding regexp match (US-ASCII regexp with UTF-16BE string)

I would expect both of these to raise "EncodingCompatibilityError".
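
One way to compare the two cases on whatever Ruby build is at hand is to record the exception class each expression raises. This is only a sketch: `raised_class` is a hypothetical helper, not part of Ruby, and the classes printed may differ between interpreter versions.

```ruby
# Hypothetical helper: return the class of the exception raised by the
# block, or nil if the block completes without raising.
def raised_class
  yield
  nil
rescue StandardError => e
  e.class
end

utf16 = "abc".encode("UTF-16BE")

# Appending an ASCII string to a UTF-16BE string (on a copy, so the
# original is left untouched).
append_error = raised_class { utf16.dup << "abc" }

# Matching the UTF-16BE string against an ASCII regexp.
match_error = raised_class { utf16 =~ /abc/ }

puts "append: #{append_error.inspect}"
puts "match:  #{match_error.inspect}"
puts "same class? #{append_error == match_error}"
```

If the last line prints `false`, the interpreter shows the inconsistency described above.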

> |
> |also handling of broken (illegal) string encodings is not consistent:
> |"abc".force_encoding("UTF-16BE") =~ /abc/
> |==> ArgumentError: broken UTF-16BE string
> |and:
> |"abc".force_encoding("UTF-16BE") == "abc"
> |==> false (no error)
> |and:
> |"abc".encode("UTF-16BE").count("b".force_encoding("UTF-16BE"))
> |==> ArgumentError: invalid byte sequence in UTF-16BE

I guess in this group there are 2 issues:
1) (This is minor) I would expect both error messages to have the same
text - I think "invalid byte sequence in XXX" is the better of the two.
2) It seems inconsistent to me that the 1st & 2nd expressions look almost  
the same as each other (a regexp match & a string compare) yet only the  
regexp match raises an error.

In fact I have noticed that most String methods seem not to complain when
operating on broken strings, but Regexps do. There is actually a rather
bizarre test in test_m17n.rb that relies on String methods NOT complaining
that they are operating on broken strings:
	s = "\xa1".force_encoding("euc-jp")
	assert_equal(true, "".center(2, s).valid_encoding?)
Here "\xa1" by itself is an invalid euc-jp character, but "\xa1\xa1" is
valid. This test actually relies on String#center placing the 2 invalid
characters around an empty string without complaining, thereby creating
one valid character! I think this behaviour could be confusing to a Ruby
programmer: padding to 2 chars and getting a 1-character result is
probably not what was intended.
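
The effect can be made visible by printing the sizes involved. This is a small sketch; `lone_pad` is a hypothetical helper name, and the behaviour shown is what the test above relies on.

```ruby
# A lone 0xA1 byte is an incomplete EUC-JP character.
# (.b makes a binary copy, so the literal itself is never modified.)
def lone_pad
  "\xA1".b.force_encoding("EUC-JP")
end

# Pad an empty string to width 2 with the broken one-byte pad.
padded = "".center(2, lone_pad)

puts "pad valid?    #{lone_pad.valid_encoding?}"
puts "result bytes  #{padded.bytesize}"
puts "result chars  #{padded.length}"
puts "result valid? #{padded.valid_encoding?}"
```

The two pad bytes end up adjacent and are read as a single two-byte EUC-JP character, so the result is shorter (in characters) than the requested width.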

To me it would be preferable if Regexp & String methods behaved the same  
way in this regard - probably the best would be to raise errors in both.  
That would prevent confusing behaviour like the above test.
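
To illustrate what "raise errors in both" might look like from the caller's side, here is a minimal sketch of a strict wrapper around String#center. `strict_center` is purely hypothetical (it is not an existing or proposed Ruby method); it just refuses to operate when either string is broken, reusing the "invalid byte sequence in XXX" wording.

```ruby
# Hypothetical strict variant of String#center: raise instead of silently
# padding when the receiver or the pad string has a broken encoding.
def strict_center(str, width, pad = " ")
  [str, pad].each do |s|
    unless s.valid_encoding?
      raise ArgumentError, "invalid byte sequence in #{s.encoding}"
    end
  end
  str.center(width, pad)
end

begin
  strict_center("", 2, "\xA1".b.force_encoding("EUC-JP"))
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

With valid input the wrapper behaves like plain String#center.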

Cheers
Mike.