2007/8/3, Bob Hutchison <hutch / recursive.ca>:
> Hi,
>
> Does anyone know of a fast implementation of the XML escape method
> (the one that converts '"<>& to &quot; etc.)?
>
> I did some benchmarking on one of my applications and the
> implementation I have, which I thought was okay -- simple minded for
> sure, but okay -- turns out to be a bottleneck in certain operations.
>
> I used ruby-prof with a simple test, running over a 400 character
> string 50,000 times or so. Running the profiler on version0 (below)
> took 1.39 seconds.
>
>   def version0(input)
>     # all kinds of other processing of input simulated by the input.dup
>     result = input.dup
>
>     return result
>   end
>
> The original simple-minded way, under ruby-prof, ran in 8.74 seconds:
>
>   def version1(input)
>     # all kinds of other processing of input simulated by the input.dup
>     result = input.dup
>
>     result.gsub!("&", "&amp;")
>     result.gsub!("<", "&lt;")
>     result.gsub!(">", "&gt;")
>     result.gsub!("'", "&apos;")
>     result.gsub!("\"", "&quot;")
>
>     return result
>   end
>
> The best I've come up with so far, under ruby-prof, ran in 3.33 seconds:
>
>   def version2(input)
>     # all kinds of other processing of input simulated by the input.dup
>     result = input.dup
>
>     result.gsub!(/[&<>'"]/) do |match|
>       case match
>       when '&' then return '&amp;'
>       when '<' then return '&lt;'
>       when '>' then return '&gt;'
>       when "'" then return '&apos;'
>       when '"' then return '&quot;'
>       end
>     end
>
>     return result
>   end
>
> After accounting for overhead, 3.8 times faster is good, but I'd like it
> faster still. BTW, gsub is only marginally slower than gsub! I've
> tried using simple iteration, gsub with a hash to avoid the case, and
> variations, all slower to a lot slower than version 1, nothing really
> near version2 (which really was the first variation I tried).
>
> Any ideas?

You are on the right track.  There is just one thing to improve: get
rid of "case":

  class Converter
    MAP = {
      "&" => "&amp;",
      # ...
    }

    def self.convert(s)
      s.gsub(/[&<>'"]/) do |m|
        MAP[m] || "ERROR"
      end
    end
  end

Also, I believe x.dup.gsub! is less efficient than doing just a single
x.gsub.

Kind regards

robert
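
For reference, the single-pass table lookup suggested above can be written out in full roughly as follows. This is only a sketch: the module name XmlEscapeSketch and both method names are made up for illustration, and the table is filled in with the five predefined XML entities as an assumption, since the reply elides it. The block yields its replacement as the block's value rather than using `return`, because `return` inside a block exits the enclosing method. The second method relies on String#gsub accepting a Hash as the replacement, which requires Ruby 1.9 or later.

```ruby
# Hypothetical module and method names, for illustration only.
module XmlEscapeSketch
  # Assumed table: the five predefined XML entities.
  MAP = {
    "&" => "&amp;",
    "<" => "&lt;",
    ">" => "&gt;",
    "'" => "&apos;",
    '"' => "&quot;"
  }.freeze

  PATTERN = /[&<>'"]/

  # Block form: one pass over the string, one hash lookup per match.
  # The block's value is the replacement; no `return` is used.
  def self.escape_block(s)
    s.gsub(PATTERN) { |m| MAP[m] }
  end

  # Hash form (Ruby 1.9+): gsub looks each match up in MAP itself.
  def self.escape_hash(s)
    s.gsub(PATTERN, MAP)
  end
end

puts XmlEscapeSketch.escape_block(%q{<a href="x">Tom & Jerry's</a>})
```

Both methods use the non-destructive gsub, so the input string is left untouched and no separate dup is needed. Which form is faster may depend on the Ruby version, so it is worth measuring both with the same profiler setup.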