At 19:44 08/01/12, Vincent Isambart wrote:
>
>On Jan 12, 2008, at 10:53 AM, Martin Duerst wrote:
>
>> This might slightly change once we introduce a third argument to
>> String#encode. This third argument, as I currently plan it, should
>> be able to express things such as "convert non-convertibles to
>> a replacement character" or "simply drop non-convertible data"
>> or so.
>
>
>Instead of (or in addition to) this third argument, what about having
>the String#encode function take a block?

That's also planned. But a third argument can be much faster
for the simple cases; that's why I don't want to exclude it.

>I haven't thought about it a lot, but something like this could be useful:
>- to replace unknown characters with '?'
>str.encode('UTF-8') { '?' }
>- to strip the unknown characters
>str.encode('UTF-8') { '' }
>- to transcode from a mix of UTF-8 and ISO-8859-1 to ISO-8859-1 (yes
>that may sound strange but I've seen cases when it may appear with
>badly managed data)
>str.encode('ISO-8859-1', 'UTF-8') { |s| s }
>
>(to make everything simpler I did not take into account the encoding of
>the string returned by the block; checking it may be a good thing, I do
>not know)
>
>You may even want to be able to control whether the block is called with
>each unknown byte or with each sequence of unknown bytes. Giving
>the position in the source string to the block may also be a good idea.
>
>I'm not sure this idea could have any useful use except in the case of
>data in mixed encodings (and I'm not even sure if this is common or
>not), and it probably needs some more thought, but it was just an idea
>that crossed my mind and seemed more 'Ruby-like' than just an
>additional parameter. The encode function may of course be made to
>support both the additional parameter and the block.

It's definitely Ruby-like, and there are quite a few use cases.
The one I'm thinking about most is converting non-convertible
characters to escapes of various kinds.
I have thought about quite a few of the cases you mention above,
but I have to think through your 'convert from mixed encoding'
case a bit more.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp
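
The three behaviours discussed in this thread (a replacement character, dropping non-convertible data, and converting to escapes) can be sketched with the options that String#encode accepts in released Ruby versions (1.9 and later). This is only an illustration and not necessarily the design under discussion in the message: the sample string, the target encoding, and the `&#x...;` escape format are assumptions chosen for the example.

```ruby
# Sample text with two characters that ISO-8859-1 cannot represent
# (EURO SIGN and a CJK ideograph). Illustrative data only.
str = "price: 10\u20AC \u65E5"

# 1. Convert non-convertibles to a replacement character.
replaced = str.encode('ISO-8859-1', undef: :replace, replace: '?')

# 2. Simply drop non-convertible data (replace with the empty string).
dropped = str.encode('ISO-8859-1', undef: :replace, replace: '')

# 3. Convert non-convertibles to escapes. The :fallback option takes a
#    Hash, Proc, or Method; here a lambda receives each undefined
#    character and returns a hexadecimal numeric character reference
#    (the escape format is an assumption for this sketch).
escaped = str.encode('ISO-8859-1',
                     fallback: ->(c) { format('&#x%X;', c.ord) })

puts replaced
puts dropped
puts escaped
```

The options hash corresponds roughly to the "third argument" idea, while `:fallback` with a lambda comes close to the block proposal, since arbitrary Ruby code decides what each non-convertible character becomes.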