Further to this, the escaping performed by Builder is a bit slack. I
believe that this:

    def _escape(text)
      text.
        gsub(%r{&}, '&amp;').
        gsub(%r{<}, '&lt;').
        gsub(%r{>}, '&gt;')
    end

should be replaced with the more paranoid:

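    def _escape(text)
        # Escape every character that is not on the whitelist of safe ASCII.
        text.gsub(/[^-\w\d\/\n\r _:;+=.\@*,()#]/) do |x|
            case x
                when '"' then '&quot;'
                when '\'' then '&apos;'
                when '<' then '&lt;'
                when '>' then '&gt;'
                when '&' then '&amp;'
                else
                    # Numeric character reference (x[0] on Ruby 1.8).
                    "&##{x.ord};"
            end
        end
    end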

This should escape everything outside a whitelist of safe ASCII
characters, to ensure the data isn't corrupted by invalid characters. I
might have missed a character or 10, but escaping too much is safer than
escaping too little.

(Though it should probably be split so that quotes are only escaped
within attribute values; otherwise the output can look a little messy.)
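
As a rough sketch of that split, assuming the same whitelist as the
method above: the names escape_text, escape_attribute and the two
tables are made up for illustration and are not part of Builder's API.
Element content leaves quotes alone, while attribute values escape
them.

```ruby
# Hypothetical helpers, not Builder's actual API.
# Same whitelist as the proposed _escape: anything else gets escaped.
UNSAFE = /[^-\w\d\/\n\r _:;+=.\@*,()#]/

# Element content: quotes are harmless, so they map to themselves.
TEXT_TABLE = {
  '<' => '&lt;', '>' => '&gt;', '&' => '&amp;',
  '"' => '"', "'" => "'"
}.freeze

# Attribute values: quotes must be escaped as well.
ATTR_TABLE = TEXT_TABLE.merge('"' => '&quot;', "'" => '&apos;').freeze

def escape_with(text, table)
  # Characters missing from the table become numeric character references.
  text.gsub(UNSAFE) { |ch| table.fetch(ch) { "&##{ch.ord};" } }
end

def escape_text(text)
  escape_with(text, TEXT_TABLE)
end

def escape_attribute(text)
  escape_with(text, ATTR_TABLE)
end
```

With this, escape_text('say "hi" & go') keeps the quotes and only
rewrites the ampersand, whereas escape_attribute on the same string
also turns each quote into &quot;.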

The hoodwink.d onslaught RSS is currently not well-formed (there's a
\210 character in a couple of entries), and I was going to pass it
through MouseHole to fix it, but there were further issues there. If
hoodwink.d is using Builder::XmlMarkup (or FeedTools) to generate the
onslaught RSS, it probably needs this change to stay valid.
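
To illustrate the kind of breakage a stray \210 byte causes, here is a
minimal sketch for a current Ruby (String#scrub needs 2.1 or later). It
assumes the feed is meant to be UTF-8, which I haven't confirmed for
hoodwink.d, and reference_invalid_bytes is a made-up helper name. Each
undecodable byte is replaced with a numeric character reference; which
character the byte was really meant to be depends on the encoding the
entry was written in.

```ruby
# Hypothetical helper: replace bytes that are invalid in the string's
# encoding (assumed UTF-8 here) with numeric character references.
def reference_invalid_bytes(text)
  text.scrub do |bad|
    # bad holds the offending byte sequence; emit one reference per byte.
    bad.bytes.map { |b| "&##{b};" }.join
  end
end

# An entry containing a lone 0x88 byte (octal \210).
entry = "onslaught \210 entry"

puts entry.valid_encoding?                           # false for UTF-8
puts reference_invalid_bytes(entry)                  # onslaught &#136; entry
puts reference_invalid_bytes(entry).valid_encoding?  # true
```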

> -----Original Message-----
> From: Daniel Sheppard 
> Sent: Friday, 28 October 2005 11:10 AM
> To: ruby-talk ML
> Subject: Escaping Attributes with Builder::XmlMarkup
> 
> Just hit a problem using FeedTools where links with 
> ampersands in them were being left unescaped in the output. I 
> realised this was a Builder::XmlMarkup thing, and patched it 
> there, but when I went to the Builder::XmlMarkup CVS to see 
> what was going on there, I found this:
> 
> http://rubyforge.org/cgi-bin/viewcvs.cgi/builder/lib/builder/xmlmarkup.rb.diff?r1=1.3&r2=1.4&cvsroot=builder
> 
> It seems that a new option has already been added to 
> Builder::XmlMarkup to escape XML attributes, but that it 
> defaults to false. Is there a reason for this? I was sure 
> that <element attr="value&value"> was not well-formed XML, 
> and should read <element attr="value&amp;value"> - why is 
> that not the default behaviour?
> 
> So, which library needs to be fixed? Builder or FeedTools?
> 
> (BTW - this is affecting the xml reprocessing in the CVS 
> version of MouseHole - so _why you might want to keep an eye on this).
> 
> ##############################################################
> #######################
> This email has been scanned by MailMarshal, an email content filter.
> ##############################################################
> #######################
> 
> 