Nobuyoshi Nakada wrote:
> Wolfgang Nádasi-Donner wrote in [ruby-core:12804]:
>>> encoding.string("...")
>>> "...".encode(encoding)
>> This works already...
>>
>> irb(main):001:0> s = "abc".force_encoding("UTF-8")
>> => "abc"
>> irb(main):002:0> s.encoding
>> => <Encoding:UTF-8>
>
> It doesn't validate nor convert the content. We'll need method(s)
> to do it.

We have Iconv for doing the conversion. In the example, the text "abc" is encoded by the editor used to write it into the program. The author only needs to tell Ruby which encoding he expects.

Validation is another critical aspect. I see two different levels here.

For UTF-8 it is clearly defined which byte values are allowed in positions 1 up to a possible maximum of 4. If a sequence is not valid, the Ruby methods for UTF-8 will not work on it, so some validation will happen anyway when the methods are used. On the other hand, some applications (e.g. testing and analysis) occasionally need to generate invalid UTF-8 sequences. Today (Ruby 1.9) this can be done by assembling the bytes in ASCII-8BIT String objects and then using force_encoding. This works fine.

The next validation level is the Unicode codepoints. There are invalid ones, undefined ones, and several reserved for private use. I think validating the codepoints should be left to the programmer, because it is time-consuming and because invalid codepoints are sometimes wanted (for the same reasons as with invalid UTF-8).

To conclude my viewpoint: I think it is a good idea to have methods for validating correct UTF-8 and Unicode codepoints, but using them should be left to the programmer.

Nobuyoshi Nakada wrote:
> In general, by the magic coding comment.

I still don't understand this. Are these "magic coding comments" set by the editors? - I don't get anything like this when using SciTE on Windows (I'm usually a SciTE user).

Wolfgang Nádasi-Donner
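
As a rough illustration of the two validation levels discussed above, here is a minimal Ruby sketch. It assumes a later Ruby release in which `String#valid_encoding?` and `String#encode` are available. The helper `questionable_codepoints` is hypothetical (it is not part of Ruby) and only flags private-use codepoints and noncharacters; it does not detect unassigned codepoints.

```ruby
# encoding: utf-8
# The line above is a "magic coding comment": the author writes it by hand
# to declare the encoding of the string literals in this file.

# Byte level: assemble raw bytes as ASCII-8BIT, then relabel them as UTF-8.
# force_encoding only changes the label; it neither validates nor converts.
well_formed = [0xC3, 0xA1].pack("C*").force_encoding("UTF-8") # U+00E1
malformed   = [0xC3, 0x28].pack("C*").force_encoding("UTF-8") # 0x28 is no continuation byte

well_formed_ok = well_formed.valid_encoding? # true
malformed_ok   = malformed.valid_encoding?   # false

# encode, unlike force_encoding, actually converts the bytes.
utf16 = "abc".encode("UTF-16LE")

# Codepoint level, left to the programmer.
# Hypothetical helper: returns private-use codepoints and noncharacters.
def questionable_codepoints(str)
  str.codepoints.select do |cp|
    private_use = (0xE000..0xF8FF).cover?(cp) ||
                  (0xF0000..0xFFFFD).cover?(cp) ||
                  (0x100000..0x10FFFD).cover?(cp)
    # Noncharacters: U+FDD0..U+FDEF and every U+xxFFFE / U+xxFFFF.
    noncharacter = (0xFDD0..0xFDEF).cover?(cp) || (cp & 0xFFFE) == 0xFFFE
    private_use || noncharacter
  end
end

sample  = [0x61, 0xE000, 0xFFFE].pack("U*") # "a", a private-use char, a noncharacter
flagged = questionable_codepoints(sample)

puts well_formed_ok, malformed_ok, utf16.bytesize, flagged.inspect
```

The malformed string is created without any error, which matches the point that invalid sequences can be built on purpose for testing; the check only happens when the programmer asks for it.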