Quoth Jamal Bengeloun:
> Thanks a lot for your help. I thought I was going mad with this. I
> thought it had something to do with Ruby being C-based (I saw something
> on the internet about the difference between Python and JPython, where the
> accented characters were encoded in UTF-8 and not HTML-escaped).
>
> What if the end rendering engine is not a browser (I checked and you're
> absolutely right, it does work in a browser)? How can I get true
> UTF-8-encoded characters instead of HTML-escaped ones? I am using Builder
> to generate XML files from the data I get.
>
> Thanks a lot for your explanation (it really did enlighten me) and your
> help.
>
> Jamal
>
> Konrad Meyer wrote:
> > Quoth Jamal Bengeloun:
> >> characters into utf-8
> >>
> >> ...
> >>
> >> Does someone have an explanation?
> >>
> >> Does anyone know how to get those characters into the final xml files?
> >>
> >> Any help would be greatly appreciated.
> >>
> >> Jamal
> >
> >   In short, you're asking what the differences between "\303\251", "é",
> > and "&#130;" are.
> >
> >   The first is an octal escape sequence embedded in a string (it happens
> > to be the same as UTF-8 'é'). The second is also UTF-8 'é'. These two are
> > the same string ("\303\251" == "é"). The last, '&#130;', is the
> > HTML-escaped notation for an 'é' (I'm trusting your email for the correct
> > number here). That is, literally "&#130;" != "é", but they should render
> > the same in a browser capable of displaying UTF-8.
> >
> > HTH,
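
Konrad's distinction can be checked in a few lines of Ruby. This is a minimal sketch assuming Ruby 2.0 or later, where source files are read as UTF-8 by default; the variable names are illustrative only. It relies on the fact that 'é' is Unicode U+00E9 (decimal 233), so its numeric character reference is &#233;.

```ruby
# Assumes Ruby >= 2.0, where source files default to UTF-8.
octal   = "\303\251" # two bytes written as octal escapes
literal = "é"        # the same character typed directly
escaped = "&#233;"   # numeric character reference for U+00E9

puts octal == literal   # prints true: same bytes, same encoding
p literal.bytes         # prints [195, 169]: the UTF-8 bytes (octal 303, 251)
puts literal.ord        # prints 233: the Unicode code point
puts escaped == literal # prints false: six ASCII characters, not one é
puts escaped.length     # prints 6
```

As a Ruby string the reference is just text; it only becomes 'é' when an HTML or XML parser interprets it.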

If I'm not mistaken, HTML and XML use the same numeric character reference
syntax, so you're fine with those &#xxxxxx; characters: an XML parser will
turn them back into the real characters.
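
If literal UTF-8 characters are still wanted in the output files (for consumers that read the raw text instead of parsing it), one option is to post-process the generated markup. This is a sketch under stated assumptions, not Builder's own API: the helper `decode_char_refs` is hypothetical, and it only rewrites references to code points above 127, so references such as &#60; (<) and &#38; (&) stay escaped and the XML remains well-formed.

```ruby
# Hypothetical helper (not part of Builder): replace numeric character
# references to non-ASCII code points, e.g. "&#233;" or "&#xE9;", with the
# literal UTF-8 character. References to code points 0-127 are kept, so
# escaped markup characters such as "<" and "&" stay escaped.
def decode_char_refs(xml)
  xml.gsub(/&#(?:x(\h+)|(\d+));/i) do
    original = $&
    code = $1 ? $1.to_i(16) : $2.to_i
    next original if code <= 127
    begin
      code.chr(Encoding::UTF_8)
    rescue RangeError
      original # not a valid Unicode code point; leave the reference alone
    end
  end
end

# Stand-in for markup produced by an XML generator such as Builder.
generated = '<?xml version="1.0" encoding="UTF-8"?>' \
            "<city>Caf&#233; &#60;Paris&#62; &#xE9;t&#xe9;</city>"

decoded = decode_char_refs(generated)
puts decoded
# prints <?xml version="1.0" encoding="UTF-8"?><city>Café &#60;Paris&#62; été</city>
```

Whatever writes the file afterwards should keep those UTF-8 bytes unchanged, and the XML declaration should say UTF-8 (which is also XML's default when no declaration is present).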

HTH,
-- 
Konrad Meyer <konrad / tylerc.org> http://konrad.sobertillnoon.com/
