> On Wed, 06 Dec 2006 12:01:37 -0000, ciapecki <ciapecki / gmail.com> wrote:
>
> > David Vallner wrote:
> >> Ross Bamford wrote:
> >> > On Mon, 2006-12-04 at 22:40 +0900, ciapecki wrote:
> >> >> Is there a way in ruby to:
> >> >> - open a file encoded in ucs-2le,
> >> >> - replace every occurrence of '\t' (X'0009') with ',' (X'002c'),
> >> >> - and save it back in ucs-2le, without losing any content?
> >> > But that strikes me as unnecessary when you could just do:
> >> >
> >> > newdata = File.read('test').tr("\t", ',')
> >> > # => "a\000b\000c\000,\000\273\006,\0001\000"
> >> >
> >>
> >> Um. Other way around. *Old* data is in UCS-2LE, not in UTF-8, so it's
> >> not ASCII-transparent. Your iconv approach could work if you swapped
> >> around the encoding names, except you'd probably also have to involve a
> >> $KCODE = 'u' and require 'jcode' to avoid clobbering the possible cases
> >> where in UTF8, 0x09 and 0x2c are part of a multibyte sequence.
> >>
> >
> > Thanks Ross for the try, but it is not working,
> > tried for:
> >
> > "\377\376B\001\363\000|\001k\000o\000\t\000k\000s\000i\000\005\001|\001k\000a\000\t\000c\000z\000B\001o\000w\000i\000e\000k\000\r\000\n\000B\001\005\001k\000a\000\t\000\t\000|\001d\000z\001b\000B\001o\000\r\000\n\000"
> > which is:
> >
> > łóżko książka człowiek
> > łąka  żdźbło
> >
> > -> (the same :))
> >
> > the conversion should be:
> > łóżko,książka,człowiek
> > łąka,,żdźbło
> >
> > but with the Iconv try:
> > łóżko,książka,człowiek
> > Ø·äǵ©µù
> >
> > after swapping utf-8 to ucs-2le in both iconv conversions, I get an
> > error message:
> > `iconv': "\377\376B\001¥Ä¥» |ãø¥³k\000o\000\t\000k\000"...
> > (Iconv::IllegalSequence)
> >
> >
> > Any other suggestions highly appreciated.
> >
>
> I think David is confusing the order of the 'from' and 'to' arguments to
> Iconv.iconv - they go: (to, from, data).
> My short example was
> ill-conceived, though - this might be safer:
>
> $ irb -riconv
>
> s = <the string you show above>
>
> s.gsub(/\t\000(?!\000)/, ",\000")
> # =>
> "\377\376B\001\363\000|\001k\000o\000,\000k\000s\000i\000\005\001|\001k\000a\000,\000c\000z\000B\001o\000w\000i\000e\000k\000\r\000\n\000B\001\005\001k\000a\000,\000,\000|\001d\000z\001b\000B\001o\000\r\000\n\000"
>
> (This is:
>
> łóżko,książka,człowiek
> łąka,,żdźbło
> )
>
> But I'm not totally sure, so you might be better with iconv anyway:
>
> Iconv.iconv('ucs-2le', 'utf-8', Iconv.iconv('utf-8','ucs-2le',
> s).first.gsub(/\t/u, ',')).first
> # =>
> "\377\376B\001\363\000|\001k\000o\000,\000k\000s\000i\000\005\001|\001k\000a\000,\000c\000z\000B\001o\000w\000i\000e\000k\000\r\000\n\000B\001\005\001k\000a\000,\000,\000|\001d\000z\001b\000B\001o\000\r\000\n\000"
>
> (This too is:
>
> łóżko,książka,człowiek
> łąka,,żdźbło
> )
>
> Unless I missed something, this seems to work fine here. Does it work for
> you?
>
> --
> Ross Bamford - rosco / roscopeco.remove.co.uk

Thanks Ross,

I was stupid and forgot to open the output file in binary mode, "wb"
(before I had only "w").

Thanks again for your help
chris
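
For anyone finding this thread later, here is a minimal sketch of the whole round trip, including the binary-mode write that turned out to be the missing piece. It is not the code from the thread: Iconv was removed from the standard library in Ruby 2.0, so this assumes Ruby 2.0 or newer and uses String#encode instead. It also assumes the input is valid UTF-16LE (Ruby's name for the encoding that covers UCS-2LE). The method names tabs_to_commas_ucs2le and convert_file are made up for illustration.

```ruby
# Hypothetical helper (name not from the thread): takes the raw bytes of a
# UCS-2LE/UTF-16LE document and returns raw bytes with every TAB replaced
# by a comma. Assumes the input is valid UTF-16LE; String#encode raises
# an Encoding error otherwise.
def tabs_to_commas_ucs2le(bytes)
  # Reinterpret the bytes as UTF-16LE, then transcode to UTF-8, where a TAB
  # is always the single byte 0x09 and never part of a longer sequence.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # Replace on whole characters, transcode back, and return plain bytes.
  utf8.tr("\t", ",").encode(Encoding::UTF_16LE).b
end

# Hypothetical helper: converts a file on disk. Both files are opened in
# binary mode ("rb" / "wb") so that no newline translation touches the
# two-byte code units.
def convert_file(src, dst)
  data = File.open(src, "rb") { |f| f.read }
  File.open(dst, "wb") { |f| f.write(tabs_to_commas_ucs2le(data)) }
end
```

The "wb" matters mostly on Windows: in text mode every "\n" byte is written as "\r\n", which inserts an extra byte into the UCS-2LE stream and shifts every code unit after the first line break. That would explain output where the first line is fine and the second one is garbage.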