> I'm not sure that it is UTF-8 but it's supposed to be. The string I tested
> came back as UTF-8, but as you noted it may not be a well done thing.

There's no reliable way to programmatically discover the encoding of a
file, and there are plenty of write-ups online explaining why. Unless you
know who typed it, where, and with what tool, the best you can do is guess.
If it was typed in the United States, it may be UTF-8, Windows-1252, ISO
Latin 1 (ISO-8859-1), etc. If it was typed in another country, anything
goes.
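
To make the guessing concrete, here is a rough sketch of what I mean. The
helper name guess_encoding and the candidate list are made up for
illustration. It only reports the first candidate under which the bytes
happen to be valid. Single-byte encodings such as ISO-8859-1 accept nearly
any byte sequence, so a match proves nothing and the order of the list
matters.

```ruby
# Hypothetical helper: return the name of the first candidate encoding under
# which the raw bytes are valid. This is a guess, not a detection.
CANDIDATE_ENCODINGS = %w[UTF-8 Windows-1252 ISO-8859-1]

def guess_encoding(bytes, candidates = CANDIDATE_ENCODINGS)
  candidates.find do |name|
    # force_encoding only relabels the bytes; it does not convert them.
    bytes.dup.force_encoding(name).valid_encoding?
  end
end

latin1_bytes = "caf\xE9".b       # "café" as Latin-1 / Windows-1252 bytes
utf8_bytes   = "caf\xC3\xA9".b   # "café" as UTF-8 bytes

latin1_guess = guess_encoding(latin1_bytes)
utf8_guess   = guess_encoding(utf8_bytes)
```

Here the first string is not valid UTF-8, so the helper falls through to the
next candidate. The second one is reported as UTF-8, although the same bytes
would also be valid Windows-1252.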

And that's assuming that the file has remained in the original encoding it
was written in. If it was incorrectly re-encoded along the way, then it's
pretty much a lost cause.
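
A small sketch of why a bad re-encoding is so hard to spot, assuming the
mistake was treating UTF-8 bytes as Windows-1252 (only one of many possible
mistakes). The result is perfectly valid UTF-8, so a validity check will not
flag it, but the text is wrong.

```ruby
original = "caf\u00E9"   # "café", valid UTF-8

# Simulate the mistake: relabel the UTF-8 bytes as Windows-1252, then
# "convert" them to UTF-8. Each byte of the two-byte "é" becomes its own
# character.
mangled = original.dup.force_encoding("Windows-1252").encode("UTF-8")

still_valid = mangled.valid_encoding?   # validity alone cannot catch this
same_text   = (mangled == original)
```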

Anyway, you'll just have to fool around with it for a little bit to see if
you can convert it to something that won't break the Ruby interpreter. I'm
afraid there are no good answers to your problem.
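
Two things you could try, sketched below with a made-up sample string. Both
assume Ruby 2.1 or later (for String#scrub), and the second assumes the
source is really Windows-1252, which is only a guess. Neither recovers the
original text if the guess is wrong. They just give you a string that is
valid UTF-8.

```ruby
raw = "caf\xE9 ok".b   # sample bytes; \xE9 is not valid UTF-8 on its own

# Option 1: label the bytes as UTF-8 and replace invalid sequences with "?".
scrubbed = raw.dup.force_encoding("UTF-8").scrub("?")

# Option 2: assume the bytes are Windows-1252 and transcode to UTF-8,
# replacing anything that cannot be converted.
transcoded = raw.encode("UTF-8", "Windows-1252",
                        invalid: :replace, undef: :replace, replace: "?")
```

Option 1 loses the bad characters but makes no assumption about the source.
Option 2 keeps them, but only if the Windows-1252 assumption is right.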


On Mon, Mar 30, 2015 at 2:17 PM, leam hall <leamhall / gmail.com> wrote:

> On Mon, Mar 30, 2015 at 2:09 PM, Besnik Ruka <bruka / targetedvictory.com>
> wrote:
>
>> You haven't really given enough info to solve this.
>>
>> What encoding is your source file? How certain are you that it is UTF8
>> and that it has always been UTF8? It could've been something else and then
>> someone carelessly converted it to something else.
>>
>> Have you tried using the rails multibyte methods if available?
>> http://api.rubyonrails.org/classes/String.html#method-i-mb_chars
>>
>> Have you tried using the iconv library to convert to true UTF8?
>>
>> You can't test encoding issues on IRB. It's not the same as text coming
>> from an encoded file.
>>
>
> Good points! I'm digesting someone else's XML and have logged a ticket
> with them. They've acknowledged the issue and in 6-12 months it might get
> fixed. Until that time I'm just doing the best I can to get through the
> interim.
>
> I'm not sure that it is UTF-8 but it's supposed to be. The string I tested
> came back as UTF-8, but as you noted it may not be a well done thing.
>
> Had not seen iconv. Have to go see if there's a way to let the system
> assume whatever for the existing encoding.
>
>
>
> --
> Mind on a Mission <http://leamhall.blogspot.com/>
>