Hello --

On Fri, 16 Nov 2001, Sean Russell wrote:

> Actually, I'm afraid I let this get out of hand.  Let me clarify:

Don't take all the credit :-)

> The source document must be well formed.  If an '&', '<', or '>' is
> encountered in the source document, an error is (should be) reported.
> However:
>
>   element.text = "cats & dogs"
>
> is valid; REXML will auto-convert '&' to '&amp;' on output.  With the newer
> versions (1.1a+, I believe), it also correctly processes text such as:
>
>   element.text = "cats & &#100;ogs"
>
> When you write out the element, '&' will be converted, and the entity will
> be ignored.

There's a possible conceptual problem with this: namely, that when you
set the text this way, you are actually writing to the source
document.  (I don't mean a file on disk; I mean modifying the document
as it's viewed by the parser.)  So one could argue that the string passed
to Element#text= should follow the same rules as other input.  As you
say, the auto-escaping is just a convenience, but I'm wondering
whether it could be misleading or too convenient :-)
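
To make the two positions concrete, here is a rough sketch in plain
Ruby.  It does not use REXML at all: lenient_text, strict_text, and
bare_amp are names I made up for illustration, and the
well-formedness check is deliberately simplified.

```ruby
# Hypothetical helpers, not part of REXML: two policies that a text
# setter such as Element#text= could follow.

# Matches '&' only when it does NOT start an entity or character reference.
def bare_amp
  /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)/
end

# Policy 1 (convenience): treat the argument as plain data and escape it,
# leaving anything that already looks like a reference untouched.
def lenient_text(str)
  str.gsub(bare_amp, "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Policy 2 (strict): treat the argument as source text and reject what the
# parser would reject.  '>' is included because the rule quoted above
# treats it as an error too.
def strict_text(str)
  if str =~ bare_amp || str =~ /[<>]/
    raise ArgumentError, "not well-formed character data: #{str.inspect}"
  end
  str
end

puts lenient_text("cats & dogs")        # cats &amp; dogs
puts lenient_text("cats & &#100;ogs")   # cats &amp; &#100;ogs

begin
  strict_text("cats & dogs")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Under the first policy the caller never sees an error, but cannot tell
whether "&#100;" was meant as a reference or as literal text.  Under
the second, the setter behaves like any other input to the parser.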

> When you *read* an XML source, '&amp;', '&lt;', and '&gt;' are converted to
> '&', '<', and '>' automatically for you, for convenience; all other
> entities are ignored.  Unquoted '&', '<', and '>' generate errors.

What if I've defined an entity?  (And there are the other two
built-ins, &apos; and &quot;, but I think you've added handling for
those.)  For example:

  require 'rexml/document'
  include REXML

  doc = Document.new <<EOS
  <?xml version='1.0'?>
  <!DOCTYPE doc [
    <!ENTITY me "David Alan Black">
  ]>
  <doc>
    <person>
      Hello, I am &me;
    </person>
  </doc>
  EOS

  puts doc.root.elements["person"].text

  # =>  Hello, I am &me;

Well, I guess that raises the whole DTD question :-)
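
For what it's worth, here is a rough sketch of the kind of expansion I
have in mind.  It is plain Ruby, not REXML's API: declared_entities
and expand_entities are hypothetical names, and the sketch only
handles simple internal entities with literal values (no parameter
entities, no external entities, no nested references).

```ruby
# Hypothetical sketch, not REXML's API: collect internal entity
# declarations from a DOCTYPE subset and expand references to them.

# Returns a Hash mapping entity name => replacement text.
def declared_entities(source)
  table = {}
  source.scan(/<!ENTITY\s+([A-Za-z_][\w.-]*)\s+(["'])(.*?)\2\s*>/m) do |name, _quote, value|
    table[name] = value
  end
  table
end

# Replaces &name; with its declared value; references that are not in
# the table (including the built-ins) are left exactly as written.
def expand_entities(text, table)
  text.gsub(/&([A-Za-z_][\w.-]*);/) { table.fetch($1, $&) }
end

source = <<~XML
  <!DOCTYPE doc [
    <!ENTITY me "David Alan Black">
  ]>
XML

table = declared_entities(source)
puts expand_entities("Hello, I am &me;", table)   # Hello, I am David Alan Black
puts expand_entities("cats &amp; dogs", table)    # cats &amp; dogs
```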


David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav