On Fri, Jul 15, 2011 at 1:46 AM, Michael Edgar <adgar / carboni.ca> wrote:
> On Jul 15, 2011, at 12:45 AM, Austin Ziegler wrote:
>> I've had folks asking me for a release of text-hyphen that works with
>> Ruby 1.9, and while I've got something that passes the tests that I've
>> created and added for MRI 1.9, it *loses* compatibility with Ruby
>> 1.8.7 (and does so loudly in the tests) and JRuby (in either 1.8 or
>> 1.9 mode, it appears). I need some help to get the last bits ready,
>> because I'm not ready to drop Ruby 1.8 entirely (at least one more
>> version).
> Running with the debugger on for 1.8.7 brings up this discrepancy:
>
> The "letters" array for 1.8.7 is this:
> ["d", "a", "m", "p", "f", "s", "c", "h", "i", "f", "f", "f", "a", "h", "r", "t", "s", "k", "a", "p", "i", "t", "\303", "\244", "n", "s", "m", "\303", "\274", "t", "z", "e", "n", "h", "a", "l", "t", "e", "r", "h", "e", "r", "s", "t", "e", "l", "l", "e", "r"]
>
> Now, "\303", "\244" is the UTF-8 encoding of umlauts-over-a (ä). In your 1.8 German
> hyphenation file, you encode the ä in it with the Latin-1 encoding \344.
>
> Your input text is UTF-8, but the library searches for the Latin-1 encoding. Changing
> the input to \344 for ä and \374 for ü made the test pass for me on 1.8.7.

I think you're right. Now to figure out how to fix it properly in this case.
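
One possible direction, as a rough sketch only: normalise the input to
Latin-1 before it is matched against the Latin-1 pattern file. The helper
name to_latin1 is hypothetical and not part of text-hyphen, and this
assumes the input really is valid UTF-8 and that every character in it
exists in Latin-1. It uses String#encode where available (1.9) and falls
back to Iconv on 1.8.

```ruby
# Hypothetical helper, not part of text-hyphen: convert UTF-8 input to
# Latin-1 bytes so it can match a Latin-1 encoded pattern file.
def to_latin1(utf8_text)
  if utf8_text.respond_to?(:encode)
    # Ruby 1.9+: treat the bytes as UTF-8 and transcode to ISO-8859-1.
    utf8_text.encode("ISO-8859-1", "UTF-8")
  else
    # Ruby 1.8: strings are plain byte arrays, so let Iconv convert them.
    require 'iconv'
    Iconv.conv("ISO-8859-1", "UTF-8", utf8_text)
  end
end

# "kapit" + a-umlaut + "n", written as the UTF-8 bytes seen in the
# 1.8.7 "letters" array (\303 \244).
word   = "kapit\303\244n"
latin1 = to_latin1(word)

# The two-byte UTF-8 sequence collapses to the single byte 0xE4 (octal 344).
puts latin1.unpack("C*").inspect
```

Whether the conversion belongs in the library or in the tests is a
separate question; this only shows the byte-level mismatch going away.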

-a
-- 
Austin Ziegler halostatue / gmail.com austin / halostatue.ca
http://www.halostatue.ca/ http://twitter.com/halostatue