2007/8/3, Bob Hutchison <hutch / recursive.ca>:
> Hi,
>
> Does anyone know of a fast implementation of the XML escape method
> (the one that converts '"<>& to &quot; etc.)?
>
> I did some benchmarking on one of my applications and the
> implementation I have, which I thought was okay -- simple-minded for
> sure, but okay -- turns out to be a bottleneck in certain operations.
>
> I used ruby-prof with a simple test, running over a 400 character
> string 50,000 times or so. Running the profiler on version0 (below)
> took 1.39 seconds.
>
> def version0(input)
>    # all kinds of other processing of input simulated by the input.dup
>    result = input.dup
>
>    return result
> end
>
> The original simple-minded version ran in 8.74 seconds under ruby-prof:
>
> def version1(input)
>    # all kinds of other processing of input simulated by the input.dup
>    result = input.dup
>
>    result.gsub!("&", "&amp;")
>    result.gsub!("<", "&lt;")
>    result.gsub!(">", "&gt;")
>    result.gsub!("'", "&apos;")
>    result.gsub!("\"", "&quot;")
>
>    return result
> end
>
> The best I've come up with so far ran in 3.33 seconds under ruby-prof:
>
> def version2(input)
>    # all kinds of other processing of input simulated by the input.dup
>    result = input.dup
>
>    result.gsub!(/[&<>'"]/) do |match|
>      # The block's value is the replacement. A `return` here would exit
>      # version2 itself at the first match instead of replacing it.
>      case match
>      when '&' then '&amp;'
>      when '<' then '&lt;'
>      when '>' then '&gt;'
>      when "'" then '&apos;'
>      when '"' then '&quot;'
>      end
>    end
>
>    return result
> end
>
> After accounting for overhead, 3.8 times faster is good, but I'd like it
> faster still. BTW, gsub is only marginally slower than gsub!. I've
> tried simple iteration, gsub with a hash to avoid the case, and other
> variations; all were slower to a lot slower than version1, and nothing
> came near version2 (which really was the first variation I tried).
>
> Any ideas?

You are on the right track. There is just one thing to improve: get
rid of "case":

class Converter
  MAP = {
    "&" => "&amp;",
# ...
  }

  def self.convert(s)
    s.gsub(/[&<>'"]/) do |m|
      MAP[m] || "ERROR"
    end
  end
end
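
For reference, here is a fuller sketch of the same idea. It assumes the
elided entries are simply the five characters from version1. The module
name XmlEscapeSketch and the constant PATTERN are made up for
illustration and do not come from any library:

```ruby
# Hypothetical sketch: table-driven XML escaping with a single gsub pass.
module XmlEscapeSketch
  # Assumed table: the five characters handled by version1.
  MAP = {
    "&"  => "&amp;",
    "<"  => "&lt;",
    ">"  => "&gt;",
    "'"  => "&apos;",
    "\"" => "&quot;"
  }.freeze

  # Built once, so the regexp literal is not re-evaluated per call.
  PATTERN = /[&<>'"]/

  # gsub returns a new string, so the caller's input is left untouched.
  def self.escape(s)
    s.gsub(PATTERN) { |m| MAP[m] }
  end
end

puts XmlEscapeSketch.escape(%q{a<b & 'c' > "d"})
```

Because the pattern only matches keys that are in the table, the lookup
cannot miss here, so the "ERROR" fallback is left out of the sketch.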

Also, I believe x.dup.gsub! is less efficient than a single x.gsub,
since gsub already returns a new string.
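
If you want to check that on your own machine, something along these
lines should do. It is only a rough sketch using the standard Benchmark
module. The method names, the sample string and the iteration count are
arbitrary choices of mine, and the timings will vary with the Ruby
version:

```ruby
require 'benchmark'

# Assumed table and pattern, matching the five characters from version1.
ESCAPE_MAP = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  "'" => "&apos;", "\"" => "&quot;"
}.freeze
ESCAPE_PATTERN = /[&<>'"]/

# Variant 1: copy first, then substitute in place.
# gsub! returns nil when nothing matched, so return the copy explicitly.
def escape_with_dup(input)
  result = input.dup
  result.gsub!(ESCAPE_PATTERN) { |m| ESCAPE_MAP[m] }
  result
end

# Variant 2: let gsub build the new string itself.
def escape_with_gsub(input)
  input.gsub(ESCAPE_PATTERN) { |m| ESCAPE_MAP[m] }
end

sample = %q{<a href="x">Tom & Jerry's</a> } * 10
n = 10_000

t_dup  = Benchmark.realtime { n.times { escape_with_dup(sample) } }
t_gsub = Benchmark.realtime { n.times { escape_with_gsub(sample) } }

puts format("dup + gsub!: %.3fs", t_dup)
puts format("gsub:        %.3fs", t_gsub)
```

Note the explicit `result` at the end of the first variant: chaining
x.dup.gsub! directly would hand back nil for input that contains nothing
to escape.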

Kind regards

robert