On Fri, Jul 16, 2010 at 6:52 PM, Xeno Campanoli / Eskimo North and Gmail
<xeno.campanoli / gmail.com> wrote:

> On 10-07-16 03:16 PM, Ammar Ali wrote:
>
>> On Sat, Jul 17, 2010 at 12:28 AM, Xeno Campanoli / Eskimo North and Gmail
>> <xeno.campanoli / gmail.com> wrote:
>>
>>> I'm looking through what documentation I can find for Hpricot (Nokogiri
>>> wouldn't install for me, and I just want a quick and simple solution),
>>> and I cannot find a simple method to take two XML strings and find out
>>> if they are equivalent. I'm getting a bunch of XHTML back from our
>>> rendering agent with random permutations of attributes inside of the
>>> tags, and I want a quick and easy Ruby way to find out if segments are
>>> equivalent without writing my own regex-based parser...???
>>
>> I can think of a few definitions for equivalence. One definition would
>> simply require unifying the case of both strings and checking if they are
>> the same. A second definition would require building a tree of the
>> structure in each string, including attributes, sorting it, and looping
>> over them to check if they contain the same elements (Nokogiri's
>> XML::NodeSet does something like this with . A third definition would
>> build on the second one, while treating certain tags as equivalent to
>> other tags (for example, q is equivalent to blockquote).
>>
>> What's *your* definition of equivalence for two XML documents or
>> fragments?
>>
>> Ammar
>
> The only thing I am concerned about is permutations of attributes inside
> the tags. Everything else I'm seeing is regular. Is there something where
> I can parse all the tags in a segment and tell if they are equivalent and
> just have the attributes in different orders? I'm not even concerned about
> different tag forms. We don't see that.
> A typical example is:
>
> <li><img src="my/image/path/thingy.jpg" alt="alt text" />My Text</li>
>
> <li><img alt="alt text" src="my/image/path/thingy.jpg" />My Text</li>
>
> I need to have something that can help me judge such things as equivalent.
> Again, I NEVER see tag permutations, but just attribute permutations.

You should take a look at Lorax:

  http://github.com/flavorjones/lorax

which is Nokogiri-based. Your definition of equivalence (the semantically
correct one, imho) can be tested by computing a signature for each document
and checking whether the two match:

  Lorax::Signature.new(Nokogiri::XML(string1).root).signature
  Lorax::Signature.new(Nokogiri::XML(string2).root).signature

And note that Nokogiri will also allow you to parse XML fragments.

HTH,
-m

> Thank you for your response.
>
> Sincerely, Xeno
>
> --
> "It's the preponderance, stupid!" - Professor Stephen Schneider, IPCC
> member
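
In case Nokogiri (and therefore Lorax) will not install, here is a minimal sketch of the same idea using REXML, which ships with the standard Ruby distribution. It is not how Lorax works internally. It only illustrates the "build a tree, sort the attributes, compare" definition from earlier in the thread. The helper names `canonical` and `xml_equivalent?` are made up for this example, and the sketch assumes well-formed input with a single root element and ignores comments and whitespace-only text.

```ruby
require "rexml/document"

# Reduce a node to a nested array that does not depend on attribute order.
# Elements become [:element, name, sorted attributes, children];
# non-blank text becomes [:text, stripped text]; everything else is dropped.
def canonical(node)
  case node
  when REXML::Element
    attrs = []
    node.attributes.each_attribute do |attr|
      attrs << [attr.expanded_name, attr.value]
    end
    kids = node.children.map { |child| canonical(child) }.compact
    [:element, node.expanded_name, attrs.sort, kids]
  when REXML::Text
    text = node.value.strip
    text.empty? ? nil : [:text, text]
  end
end

# Two XML strings are treated as equivalent when their canonical forms match.
def xml_equivalent?(xml_a, xml_b)
  canonical(REXML::Document.new(xml_a).root) ==
    canonical(REXML::Document.new(xml_b).root)
end

a = '<li><img src="my/image/path/thingy.jpg" alt="alt text" />My Text</li>'
b = '<li><img alt="alt text" src="my/image/path/thingy.jpg" />My Text</li>'
puts xml_equivalent?(a, b)  # prints true
```

Child order is still significant here, which matches the stated requirement: only attribute permutations are treated as equivalent, not tag permutations.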