Personally, I think that conversion when adding to a string would be a good
thing. However, I could see an argument for not doing it if somebody will be
writing pre-escaped XML. Perhaps this could be done with some sort of flag?
Or a special method?

I'm not sure what would be considered the most Ruby-esque way to implement
this, but it seems to me people should have the option.
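
To make the flag idea concrete, here is a rough sketch in plain Ruby. The
names xml_text and raw: are made up for illustration and are not part of
REXML; the point is only that escaping happens by default and the caller can
opt out for pre-escaped text.

```ruby
# Hypothetical helper, not REXML's API: escape &, <, and > on write
# unless the caller passes raw: true for text that is already escaped.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def xml_text(str, raw: false)
  return str if raw                        # caller promises str is pre-escaped
  str.gsub(/[&<>]/) { |ch| ESCAPES[ch] }   # otherwise escape each special char
end

puts xml_text("a < b & c")             # prints: a &lt; b &amp; c
puts xml_text("a &lt; b", raw: true)   # prints: a &lt; b
```

A separate method (say, one for raw text and one for escaped text) would work
just as well; the flag simply keeps both behaviors behind one call.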

Ben

-----Original Message-----
From: TAKAHASHI Masayoshi [mailto:maki / inac.co.jp]
Sent: Wednesday, November 14, 2001 10:09 PM
To: ruby-talk / ruby-lang.org
Subject: [ruby-talk:25246] Re: ANN: REXML 1.1a3


Hi, I'm a REXML newbie :-), and I have a comment.

Sean Russell <ser / germane-software.com> wrote:
> 1) REXML hasn't been handling entities in parsed documents very well.  This
> has been fixed, but I'm wondering if REXML's behavior is confusing.  REXML
> inherited Electric XML's behavior of converting &, <, and > to entities on
> write.  This is convenient, but it may be confusing for users to know when
> they have to quote their own entities, and when to leave them alone.  1.1a3
> fixes this to a certain extent; REXML ignores entities in text, but it
> still converts &, <, and >.  It also reverse-converts &amp;, &lt;, and &gt;
> back to characters on a read.  &#xxx; entities are now correctly handled.

> I'm accepting opinions about this matter.  I'd like entity handling to be 
> fairly painless, but I don't want ambiguous behavior if I can avoid it.

In the XML 1.0 Rec., there are 5 predefined general entities, for the
characters "&", "<", ">", "'" and '"'.
(cf. 4.6 Predefined Entity
     http://www.w3.org/TR/REC-xml#sec-predefined-ent )

Should REXML convert "'" and '"' into &apos; and &quot;?
At least, the behavior:

  foo_str = ""
  foo = REXML::Element.new("foo")
  foo.attributes["bar"] = "aaa'bbb\"ccc"
  foo.write(foo_str)
  p foo_str         #=> "<foo bar='aaa'bbb\"ccc'/>"

is odd.
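
For comparison, here is a minimal sketch of the output I would expect. It is
plain Ruby, not REXML's actual code; escape_attr and write_attr are
hypothetical names, and I assume the attribute value is written inside
single quotes as in the output above.

```ruby
# Hypothetical helpers, not part of REXML: escape all five predefined
# entities before writing an attribute value inside single quotes.
ATTR_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  "'" => "&apos;", '"' => "&quot;"
}.freeze

def escape_attr(value)
  value.gsub(/[&<>'"]/) { |ch| ATTR_ESCAPES[ch] }
end

def write_attr(name, value)
  "#{name}='#{escape_attr(value)}'"
end

puts write_attr("bar", "aaa'bbb\"ccc")
# prints: bar='aaa&apos;bbb&quot;ccc'
```

Strictly, only the quote character used as the delimiter has to be escaped,
but escaping both keeps the result valid whichever delimiter is chosen.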


Regards,

TAKAHASHI Masayoshi (maki / inac.co.jp)