-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

All-

I'm trying to convert the double-byte Czech and Polish characters in a
text file into HTML entities. I have a Ruby script for this, but when I
scan the string I get several characters at once instead of just one.
Can somebody help me out here? I imagine I'm not doing this terribly
efficiently anyhow, so any optimizations would be appreciated. Here's
the text I'm converting:

	Aguarda carregamento da páČina....

And I'm doing this for the conversion:

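str = "Aguarda carregamento da páČina...."
out = ""
# Walk the string one UTF-8 character at a time.
str.scan(/./mu) { |c|
	v = c.unpack("U*")[0]	# code point of the matched character
	if (v > 255)
		out += "&\##{v};"	# numeric HTML entity
	else
		out += c
	end
}
puts out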

The output is:
	Aguarda carregamento da p&#6633;na....

Which is obviously not correct. When I do a "p c" inside the String#scan
block, it shows the script grabbing every character individually (as
it should), except "áČi", which it grabs together. The output for that
match is: "\341gi"
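
One plausible reading of that grouping, sketched below under assumptions
that are not confirmed anywhere above: the file is in a single-byte
encoding (ISO-8859-2 is a guess) rather than UTF-8, so "\341" is a lone
0xE1 byte. In UTF-8, 0xE1 announces a three-byte character, so a UTF-8
regexp would swallow the next two bytes along with it. Decoding those
three bytes without validating them gives 6633 (U+19E9), the code point
of the stray character in the output shown earlier. The helper names
lax_utf8_decode3 and to_entities are made up for illustration, and the
second one relies on the String encoding methods of Ruby 1.9 and later.

```ruby
# Hypothetical helper: decode a 3-byte UTF-8 sequence WITHOUT checking that
# the second and third bytes are valid continuation bytes.
def lax_utf8_decode3(b0, b1, b2)
  ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
end

# "\341gi" is the byte 0xE1 followed by "g" (0x67) and "i" (0x69).
puts lax_utf8_decode3(0xE1, 0x67, 0x69)

# Hypothetical helper: transcode raw bytes from an ASSUMED source encoding
# to UTF-8 first, then replace every code point above 255 with a numeric
# HTML entity (the same threshold as the script above).
def to_entities(raw, source_encoding)
  utf8 = raw.dup.force_encoding(source_encoding).encode("UTF-8")
  utf8.each_char.map { |c| c.ord > 255 ? format("&#%d;", c.ord) : c }.join
end

# 0xE1 is "á" and 0xC8 is "Č" in ISO-8859-2 (assumed encoding).
puts to_entities("p\xE1\xC8ina".b, "ISO-8859-2")
```

If the file really is UTF-8 already, passing "UTF-8" as the source
encoding skips the transcoding guess and only does the entity step.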

I don't know enough about internationalization to know what this
means, but I need to figure it out ASAP.

Any help would be much appreciated.

Cheers,

bs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE9pzF4UpoGqensAXIRAtCHAKCHgJV/E0zvAgqxgJklwMkgcaBpYQCfZmOe
2yZBjeacRfyn3SSMIuGpIms=
=7Aet
-----END PGP SIGNATURE-----