All-

I'm trying to convert some double-byte Czech and Polish characters from a text file into HTML entities. I have a Ruby script for this, but when I scan the string I get several characters at once instead of just one. Can somebody help me out here? I imagine I'm not doing this terribly efficiently anyhow, so any optimizations would be appreciated.

Here's the text I'm converting:

    Aguarda carregamento da páČina....

And I'm doing this for the conversion:

    str = "Aguarda carregamento da páČina...."
    out = ""
    str.scan(/./mu) { |c|
      # first code point of the matched character
      v = c.unpack("U*")[0]
      if (v > 255)
        out += "&\##{v};"
      else
        out += c
      end
    }
    puts out

The output is:

    Aguarda carregamento da p᧩na....

Which is obviously not correct. When I do a "p c" inside the String#scan block, it shows the script is grabbing all characters individually (as it should), except "áČi", which it grabs together. The output from that is:

    "\341gi"

I don't know enough about internationalization to know what this means, but I need to figure it out ASAP. I'd be much obliged for any help.

Cheers,

bs.
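
For reference, here is a minimal sketch of the conversion described above. It assumes a Ruby with encoding-aware strings (1.9 or later) and that the input really is UTF-8. The helper name to_html_entities is hypothetical and not part of the original script. The validity check is there because the byte \341 (0xE1) is the lead byte of a three-byte sequence in UTF-8, so text that is actually in a single-byte encoding can be misread as one three-byte character, which would be consistent with the "\341gi" result reported above.

```ruby
# Hypothetical helper (not from the original post): replace every code point
# above 255 with a decimal HTML entity and leave all other characters as-is.
# Assumption: the input is meant to be UTF-8; anything else is rejected.
def to_html_entities(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless utf8.valid_encoding?

  # each_char yields whole characters, so no regex scanning is needed
  utf8.each_char.map { |c| c.ord > 255 ? format('&#%d;', c.ord) : c }.join
end

# \u00E1 is "á" (225, kept as-is); \u010C is "Č" (268, converted)
puts to_html_entities("Aguarda carregamento da p\u00E1\u010Cina....")
```

If the ArgumentError is raised, the file is probably not UTF-8 and would need to be transcoded first (for example with String#encode, given the correct source encoding) before converting.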