On Tue, Sep 18, 2012 at 1:54 AM, Thomas Bednarz <lists / ruby-forum.com> wrote:
> Hi Nathan,
> Thanks for your help. It is a bit of a trial and error thing! This
> worked for me:
>
> f = File.open("somefile.txt", "r:iso-8859-1:utf-8")
> f.each do |line|
>   ...
> end
>
> In real life I will have to process files (CSV) that are uploaded from
> anywhere, produced by any system (Mac, Linux, Windows) using whatever
> software like different versions of Excel, OpenOffice Calc or extracts
> from databases...

If it's an upload via HTTP, you might get lucky and get a 'charset' in
the Content-Type header. Failing that, you might be able to make a
good guess based on the operating system and the application that
created the file.
For example, if the CSV is from Windows and Excel, it's likely that
the encoding is Windows-1252. On that example, it's important to note
that Windows-1252 is a superset of ISO-8859-1 as far as printable
characters go: it puts curly quotes, the euro sign, dashes and a few
others in the 0x80-0x9F range, which ISO-8859-1 leaves to control
characters. So if you parse the file as ISO-8859-1 and it works most
of the time for files from Windows+Excel, but occasionally produces
garbage or invisible characters, it's because the CSV is using some
of the characters that exist only in Windows-1252.
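
As a rough sketch of that guessing order (not a definitive detector):
decode_upload is a made-up helper name, and the fallback to
Windows-1252 is just my assumption for "probably came from Excel on
Windows". It trusts a charset from the Content-Type header if there is
one, then tries UTF-8, then falls back.

```ruby
# Hypothetical helper: turn raw uploaded bytes into a UTF-8 string.
def decode_upload(bytes, content_type = nil)
  # Pull e.g. "ISO-8859-1" out of "text/csv; charset=ISO-8859-1".
  charset = content_type.to_s[/charset="?([^\s;"]+)/i, 1]

  [charset, "UTF-8"].compact.each do |name|
    begin
      enc = Encoding.find(name)
    rescue ArgumentError
      next # the header named a charset Ruby doesn't know
    end
    text = bytes.dup.force_encoding(enc)
    next unless text.valid_encoding?
    return text.encode("UTF-8", invalid: :replace, undef: :replace)
  end

  # Assumption: anything that isn't valid UTF-8 is treated as Windows-1252.
  bytes.dup.force_encoding("Windows-1252")
       .encode("UTF-8", invalid: :replace, undef: :replace)
end
```

Trying UTF-8 before the fallback is fairly safe, because non-ASCII
Windows-1252 text is very unlikely to be valid UTF-8 by accident.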

Check out http://en.wikipedia.org/wiki/Windows-1252 and
http://en.wikipedia.org/wiki/ISO/IEC_8859-1.
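
If you want to see the difference those two pages describe, here is a
small sketch. The sample bytes are made up for illustration: 0x80,
0x93 and 0x94 are the euro sign and curly double quotes in
Windows-1252.

```ruby
# Made-up sample bytes; String#b (Ruby 2.0+) gives a binary copy.
bytes = "\x80 \x93quoted\x94".b

as_cp1252 = bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
as_latin1 = bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts as_cp1252.inspect # euro sign and curly quotes
puts as_latin1.inspect # C1 control characters where the quotes should be
```

Both conversions succeed without an error, which is why a wrong guess
between these two encodings is easy to miss.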

>
> I don't know yet, how to determine the encoding of those uploaded
> files...
>
> Tom