On Feb 25, 2:35 pm, Austin Ziegler <halosta... / gmail.com> wrote:
> On Mon, Feb 25, 2008 at 8:25 AM, Simone Carletti <wep... / gmail.com> wrote:
> >  I ran a deep search through this group and other online resources,
> >  but I have been unable to find out whether there is a way to guess
> >  the charset of a string in Ruby 1.8.6.
>
> >  I need to ensure a string is always UTF-8 encoded, but Iconv requires
> >  the developer to specify both the input and output charsets.
> >  On the other hand, Kconv provides a #guess() method, but it doesn't
> >  support Latin or Western encodings.
>
> >  Any suggestion?
>
> Kconv can guess because the encodings for the set of Asian written
> languages are distinctive (they don't share much with the Latin
> character set). What you're wanting is nearly impossible without a
> large body of text for analysis, and even then the best commercial
> programs are taking stabs at probabilities. (Here's an example: how do
> you tell the difference between ISO-8859-1 and ISO-8859-15
> programmatically? IIRC, the only difference between them is that -15
> supports the Euro symbol, replacing a different symbol from -1.)
>
> You're better off seeking a slightly different approach.
>
> -austin
> --
> Austin Ziegler * halosta... / gmail.com * http://www.halostatue.ca/
>                * aus... / halostatue.ca * http://www.halostatue.ca/feed/
>                * aus... / zieglers.ca

If I'm right, ISO-8859-1 and ISO-8859-15 are both Latin encodings and
nearly identical, so I can convert them the same way using
Iconv.iconv('UTF-8', 'LATIN1', 'a string').join.
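Austin's point about ISO-8859-1 vs ISO-8859-15 can be checked directly: the same byte is a valid character in both charsets, just a different character. ISO-8859-15 (also called Latin-9) differs from ISO-8859-1 at eight byte values, not only at the euro sign, so converting -15 text with a LATIN1 conversion silently changes those characters. This is an illustrative sketch only. It assumes Ruby 1.9 or later (String#force_encoding and String#encode), not the 1.8.6 Iconv API discussed in this thread, and the helper decode_byte is hypothetical.

```ruby
# Illustrative sketch: assumes Ruby 1.9+ (String#force_encoding, String#encode);
# this is not the Ruby 1.8.6 Iconv API. decode_byte is a hypothetical helper.

# Interpret one raw byte in the given single-byte charset and return it as UTF-8.
def decode_byte(byte, charset)
  [byte].pack('C').force_encoding(charset).encode('UTF-8')
end

puts decode_byte(0xA4, 'ISO-8859-1')   # ¤  (U+00A4 CURRENCY SIGN)
puts decode_byte(0xA4, 'ISO-8859-15')  # €  (U+20AC EURO SIGN)

# Scan the upper half (0xA0..0xFF) for bytes where the two charsets disagree.
diffs = (0xA0..0xFF).select do |b|
  decode_byte(b, 'ISO-8859-1') != decode_byte(b, 'ISO-8859-15')
end
puts diffs.map { |b| format('0x%02X', b) }.join(' ')
# 0xA4 0xA6 0xA8 0xB4 0xB8 0xBC 0xBD 0xBE
```

Both decodings succeed for every byte, which is why nothing in the data itself says which charset was intended.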

My goal is not to detect every individual charset, but to convert all
strings from an input into UTF-8.
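One commonly used way to get everything into UTF-8 without real detection is to keep input that is already valid UTF-8 and assume a single fallback charset for everything else. The sketch below assumes Ruby 1.9 or later (String#force_encoding, #valid_encoding?, #encode) and assumes ISO-8859-1 as the fallback, matching the LATIN1 conversion above; the method name to_utf8 is hypothetical. On 1.8.6 the same shape could presumably be built with Iconv by attempting a UTF-8 to UTF-8 conversion and rescuing its conversion errors, but that variant is not shown here.

```ruby
# Sketch of a "UTF-8 if valid, otherwise assume Latin-1" normalizer.
# Assumes Ruby 1.9+; to_utf8 is a hypothetical name, not a library method.
def to_utf8(raw)
  # Reinterpret the same bytes as UTF-8 without changing them.
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Not valid UTF-8: assume ISO-8859-1. Every byte is a valid character
  # in ISO-8859-1, so this transcode cannot raise.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

utf8_input   = "caf\u00E9"                          # "café" already in UTF-8
latin1_input = [0x63, 0x61, 0x66, 0xE9].pack('C*')  # "café" in ISO-8859-1

puts to_utf8(utf8_input)             # café
puts to_utf8(latin1_input)           # café
puts to_utf8(latin1_input).encoding  # UTF-8
```

The heuristic is biased toward UTF-8: Latin-1 bytes that happen to form valid UTF-8 (for example 0xC3 0xA9) are kept as UTF-8, and ISO-8859-15 or Windows-1252 input gets its euro sign and a few other characters mis-mapped.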


In the meantime, I was reading the code of rFeedParser, the Ruby
implementation of the Python FeedParser.
I just discovered that it depends on a project called rchardet:
https://rubyforge.org/projects/rchardet/

I took a look, and it seems to do exactly what I was looking for.

Is anyone using this library?