On Feb 25, 2:35 pm, Austin Ziegler <halosta... / gmail.com> wrote:
> On Mon, Feb 25, 2008 at 8:25 AM, Simone Carletti <wep... / gmail.com> wrote:
> > I ran a deep search through this group and other resources online but
> > I have been unable to find whether there is a way to guess the charset
> > of a string in Ruby 1.8.6.
> >
> > I need to ensure a string is always UTF-8 encoded but Iconv requires
> > the developer to specify both the input and output charset.
> > On the other hand, Kconv provides a #guess() method but doesn't
> > support Latin or Western encodings.
> >
> > Any suggestions?
>
> Kconv can guess because the encodings for the set of Asian written
> languages are distinctive (they don't share much with the Latin
> character set). What you're wanting is nearly impossible without a
> large body of text for analysis, and even then the best commercial
> programs are taking stabs at probabilities. (Here's an example: how do
> you tell the difference between ISO-8859-1 and ISO-8859-15
> programmatically? IIRC, the only difference between them is that -15
> supports the Euro symbol, replacing a different symbol from -1.)
>
> You're better off seeking a slightly different approach.
>
> -austin
> --
> Austin Ziegler * halosta... / gmail.com * http://www.halostatue.ca/
>                * aus... / halostatue.ca * http://www.halostatue.ca/feed/
>                * aus... / zieglers.ca

If I'm right, both ISO-8859-1 and ISO-8859-15 belong to Latin1, so I
can convert them the same way using
Iconv.iconv('UTF-8', 'LATIN1', 'a string').join.

My goal is not to detect every single charset but to convert all
strings from an input into UTF-8.

In the meantime I was reading the code of rFeedParser, the Ruby
implementation of Python FeedParser. I just discovered it depends on a
project called rchardet:
https://rubyforge.org/projects/rchardet/

I gave it a look and it seems to do exactly what I was looking for.
Is anyone using this library?
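
For what it's worth, the "convert everything to UTF-8" goal can be
sketched without any real detection: keep the bytes if they are already
valid UTF-8, otherwise treat them as a single fallback charset. This is
only a rough sketch under assumptions. It assumes Ruby 1.9 or later,
where a String carries its own encoding (on 1.8.6 the conversion step
would have to go through Iconv instead), and to_utf8 is a hypothetical
helper name, not part of Iconv, Kconv or rchardet. It also shows
Austin's point about byte 0xA4, which the two ISO-8859 charsets decode
differently.

```ruby
# Hypothetical helper (not from Iconv, Kconv or rchardet).
# Assumes Ruby 1.9+; the thread itself is about Ruby 1.8.6.
#
# Returns the input as UTF-8. Bytes that already form valid UTF-8 are
# kept as they are; anything else is decoded as `fallback`.
def to_utf8(bytes, fallback = Encoding::ISO_8859_1)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Relabel the raw bytes as the fallback charset, then transcode.
  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

already_utf8 = to_utf8("caf\xC3\xA9".b) # valid UTF-8, returned unchanged
from_latin1  = to_utf8("caf\xE9".b)     # invalid UTF-8, decoded as ISO-8859-1

# The same byte gives a different character depending on the fallback:
currency = to_utf8("\xA4".b)                        # U+00A4, currency sign
euro     = to_utf8("\xA4".b, Encoding::ISO_8859_15) # U+20AC, euro sign
```

The fallback is a guess, not a detection: any single-byte charset
decodes without raising an error, so a wrong fallback silently produces
wrong characters. That is the gap a statistical detector such as
rchardet tries to fill.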