On Fri, Jul 15, 2011 at 8:18 AM, Kaspar Schiess <eule / space.ch> wrote:
>> Running with the debugger on for 1.8.7 brings up this discrepancy:
>>
>> The "letters" array for 1.8.7 is this:
>> ["d", "a", "m", "p", "f", "s", "c", "h", "i", "f", "f", "f", "a", "h",
>> "r", "t", "s", "k", "a", "p", "i", "t", "\303", "\244", "n", "s", "m",
>> "\303", "\274", "t", "z", "e", "n", "h", "a", "l", "t", "e", "r", "h",
>> "e", "r", "s", "t", "e", "l", "l", "e", "r"]
>>
>> Now, "\303", "\244" is the UTF-8 encoding of a-umlaut (ä). In your 1.8
>> German hyphenation file, you encode the ä with the Latin-1 encoding \344.
>>
>> Your input text is UTF-8, but the library searches for the Latin-1
>> encoding. Changing the input to \344 for ä and \374 for ü made the test
>> pass for me on 1.8.7.
>
> I second that analysis. It seems that to use text-hyphen in Ruby 1.8 with
> languages other than English (any language that uses characters not in
> ASCII), you will have to make sure that your input is in the same character
> encoding as the language file. In the case of German, this is Latin-1. So
> opening and changing the file in your text editor has probably converted the
> file to UTF-8, Austin.
>
> Fixing the 1.8 version in the general case (any input, any language file
> encoding) will be hard... and useless, since you would program towards a use
> case that should go extinct.

I'm not so much looking for the general case, but this specific case,
since it's a bug about a word that you filed four years ago (yes, the
one you linked) ;)

Text::Hyphen under Ruby 1.8 has always said you need to match the
encoding of the input to the encoding of the hyphenation file (and
that'll still be true under Ruby 1.9, but at least there it'll be a
*consistent* UTF-8 encoding for all hyphenation files). I just forgot
that for this particular test.
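
To make the mismatch concrete, here is a minimal sketch of the two byte
sequences involved. It assumes Ruby 1.9 or later (String#encode does not
exist in 1.8), and octal_bytes and to_latin1 are hypothetical helper names
for illustration, not part of Text::Hyphen.

```ruby
# Sketch only; needs Ruby 1.9+ because it relies on String#encode.

# Hypothetical helper: render each non-ASCII byte as a backslash-octal
# escape, similar to how Ruby 1.8 prints such bytes in inspect output.
def octal_bytes(str)
  str.bytes.map { |b| b < 128 ? b.chr : format("\\%o", b) }
end

# Hypothetical helper: transcode a UTF-8 string to Latin-1 (ISO-8859-1).
def to_latin1(str)
  str.encode("ISO-8859-1")
end

a_umlaut = "\u00E4" # ä as a UTF-8 string
u_umlaut = "\u00FC" # ü as a UTF-8 string

puts octal_bytes(a_umlaut).join(" ")            # prints \303 \244
puts octal_bytes(to_latin1(a_umlaut)).join(" ") # prints \344
puts octal_bytes(to_latin1(u_umlaut)).join(" ") # prints \374
```

A Latin-1 pattern file only ever contains the single byte \344, so the
two-byte UTF-8 form can never match it.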

> More than one solution offers itself ;)
>
> a) convert the file test_bugs.rb back to latin1 (-> bad, will break soon
> again)

Doing that would cause Ruby 1.9 to fail. If I'm willing to split the
test into 1.8 and 1.9 versions (and use load) for the specific failing
bug, then I can make this work for this release.
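
Roughly, the split could look like the sketch below. The file names and
the bug_test_file helper are hypothetical, chosen only for illustration;
the idea is that the Latin-1 file is loaded on 1.8 and the UTF-8 file on
1.9 and later.

```ruby
require "rubygems" # needed for Gem::Version on 1.8; harmless on later Rubies

# Hypothetical helper: pick the encoding-specific test file for the
# given Ruby version (defaults to the running interpreter).
def bug_test_file(ruby_version = RUBY_VERSION)
  if Gem::Version.new(ruby_version) < Gem::Version.new("1.9")
    "bug_umlaut_ruby18.rb" # hypothetical file, saved as Latin-1
  else
    "bug_umlaut_ruby19.rb" # hypothetical file, saved as UTF-8
  end
end

# In test_bugs.rb one would then do something like:
#   load File.join(File.dirname(__FILE__), bug_test_file)
```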

> b) digging back through the old version history (I am sure you have it ;)) -
> trying to see if [1] was specifically about german umlauts or if it was just
> the german and the size of the word that tripped the bug. If it was one of
> the latter - then remove those damn umlauts from the word (ä -> ae, ü -> ue)
> and use the new test expectations that derive from that. This would make the
> file ASCII again, and less sensitive to editor conversion.

It was the umlauts, and (ahem) you filed the bug with the umlauts. ;)

> c) The solution you say you don't want: Dropping 1.8 support from newer
> gems. Since bundler & rvm this is increasingly simple to manage - I'll just
> limit my old projects to use an old version of text-hyphen.
>
> Considering the impossible (aka: very laborious and quite not to the point)
> nature of the bug in 1.8, I would choose c) or (if must be) b).

I'm trying to get out *one more* release for 1.8 with this one, and then
Text::Hyphen (or its successor) will happily be 1.9 only. This is a
"final 1.8" release and then I'm going to bump the major version if I
keep the project name (which is a good one) and put "ruby >= 1.9.2" in
the gemspec. This is the transitional release only.
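
For reference, the gemspec constraint would be something along these
lines. This is a sketch: the version number and summary are placeholders,
and only required_ruby_version is the point.

```ruby
require "rubygems"

spec = Gem::Specification.new do |s|
  s.name    = "text-hyphen"
  s.version = "2.0.0"                   # placeholder for the bumped major version
  s.summary = "Placeholder summary"     # placeholder text
  s.required_ruby_version = ">= 1.9.2"  # RubyGems refuses to install on older Rubies
end

# The string is converted into a Gem::Requirement that can be queried:
puts spec.required_ruby_version.satisfied_by?(Gem::Version.new("1.8.7")) # prints false
puts spec.required_ruby_version.satisfied_by?(Gem::Version.new("1.9.3")) # prints true
```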

> [1]
> http://rubyforge.org/tracker/index.php?func=detail&aid=9807&group_id=294&atid=1195

-- 
Austin Ziegler halostatue / gmail.com austin / halostatue.ca
http://www.halostatue.ca/ http://twitter.com/halostatue