Ben Schumacher wrote:

> Tobias Reif wrote:
>>What would be the point then? Then the output of an XML writer can't be
>>read by an XML parser. If it's an XML writer, it should @ least be
>>possible to generate a well-formed document.
>>When there's a package parser+writer, and I input a well-formed
>>document, then call the writer to serialize all out, I want to get an
>>equivalent (entities resolved), well-formed document as output.
> 
> I would agree with this, for the 'write' output. However, I would argue
> that if I'm going to be just extracting the CDATA from an element or the
> value of an attribute, it would be nice to have the data automatically
> transformed to the characters, rather than getting resolved entities.

Actually, I'm afraid I let this get out of hand.  Let me clarify:

The source document must be well formed.  If an unescaped '&', '<', or '>' 
is encountered in the source document, an error is (or should be) reported.
However:

  element.text = "cats & dogs"

is valid; REXML will auto-convert '&' to '&amp;' on output.  With the newer 
versions (1.1a+, I believe), it also correctly processes text such as:

  element.text = "cats & &#100;ogs"

When you write out the element, the bare '&' will be converted to '&amp;', 
and the '&#100;' reference will be left as it is.
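
A minimal sketch of the first case above (the plain '&'), assuming a Ruby 
install where REXML loads as "rexml/document".  The element name 'pets' is 
made up for illustration.  The '&#100;' case is left out on purpose, since 
how embedded references are handled may differ between REXML versions.

```ruby
require "rexml/document"

# Hypothetical element name, for illustration only.
element = REXML::Element.new("pets")
element.text = "cats & dogs"

# Reading the text back gives the characters as they were assigned.
plain = element.text

# Serializing the element escapes the bare '&'.
serialized = element.to_s

puts plain
puts serialized
```

The point is that the escaping happens only when the element is written 
out; the value you get back from element.text stays unescaped.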

When you *read* an XML source, '&amp;', '&lt;', and '&gt;' are converted to 
'&', '<', and '>' automatically for you, for convenience; all other 
entities are left as they are.  Unquoted '&', '<', and '>' generate errors.
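
As a rough illustration of the read side, assuming the same 
"rexml/document" library and a made-up 'pets' document: the escaped forms 
come back as plain characters, and an unescaped '&' in the source is 
rejected.  Only the '&' error case is shown here.

```ruby
require "rexml/document"

# Well-formed source: '&amp;' and '&lt;' are escaped in the markup.
doc = REXML::Document.new("<pets>cats &amp; dogs &lt;3</pets>")
read_back = doc.root.text

# Source with a bare '&' is not well formed, so parsing should fail.
begin
  REXML::Document.new("<pets>cats & dogs</pets>")
  rejected = false
rescue REXML::ParseException
  rejected = true
end

puts read_back
puts rejected
```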

This is purely for convenience, and isn't affected by the XML spec at all.

BTW, as per the XML spec, both Text and Attribute values have this 
conversion performed.  Other content (CDATA sections and Comments) does not.
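
A small sketch of that distinction, again assuming "rexml/document"; the 
element name, attribute name, and strings are invented for the example:

```ruby
require "rexml/document"

# Attribute values are escaped when the element is serialized.
attr_el = REXML::Element.new("pets")
attr_el.add_attribute("kind", "cats & dogs")
attr_xml = attr_el.to_s

# CDATA sections and comments are written out verbatim.
raw_el = REXML::Element.new("pets")
raw_el.add(REXML::CData.new("cats & dogs"))
raw_el.add(REXML::Comment.new("fish & chips"))
raw_xml = raw_el.to_s

puts attr_xml
puts raw_xml
```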

-- 
--- SER