On Fri, Jul 16, 2010 at 6:52 PM, Xeno Campanoli / Eskimo North and Gmail <
xeno.campanoli / gmail.com> wrote:

> On 10-07-16 03:16 PM, Ammar Ali wrote:
>
>> On Sat, Jul 17, 2010 at 12:28 AM, Xeno Campanoli / Eskimo North and Gmail<
>> xeno.campanoli / gmail.com>  wrote:
>>
>>> I'm looking through what documentation I can find for Hpricot (Nokogiri
>>> wouldn't install for me, and I just want a quick and simple solution), and
>>> I cannot find a simple method to take two XML strings and find out if they
>>> are equivalent.  I'm getting a bunch of XHTML back from our rendering agent
>>> with random permutations of attributes inside of the tags, and I want a
>>> quick and easy Ruby way to find out if segments are equivalent without
>>> writing my own regex-based parser...???
>>>
>>
>>
>> I can think of a few definitions for equivalence. One definition would
>> simply require unifying the case of both strings and checking if they are
>> the same. A second definition would require building a tree of the
>> structure in each string, including attributes, sorting it, and looping
>> over them to check if they contain the same elements (Nokogiri's
>> XML::NodeSet does something like this with . A third definition would
>> build on the second one, while treating certain tags as equivalent to
>> other tags (for example q is equivalent to blockquote).
>>
>> What's *your* definition of equivalence for two xml documents or
>> fragments?
>>
>> Ammar
>>
>
> The only thing I am concerned about is permutations of attributes inside
> the tags.  Everything else I'm seeing is regular.  Is there something where
> I can parse all the tags in a segment and tell if they are equivalent and
> just have the attributes in different orders?  I'm not even concerned about
> different tag forms.  We don't see that.  A typical example is:
>
> < <li><img src="my/image/path/thingy.jpg" alt="alt text" />My Text</li>
> > <li><img alt="alt text" src="my/image/path/thingy.jpg" />My Text</li>
>
> I need to have something that can help me judge such things as equivalent.
> Again, I NEVER see tag permutations, but just attribute permutations.
>

You should take a look at Lorax:

http://github.com/flavorjones/lorax

which is Nokogiri-based.

Your definition of equivalence (the semantically correct one, imho) can be
tested with:

    Lorax::Signature.new(Nokogiri::XML(string1).root).signature ==
      Lorax::Signature.new(Nokogiri::XML(string2).root).signature

And note that Nokogiri will also allow you to parse XML fragments.
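
If Nokogiri really won't install, the same idea (compare the trees with
attribute order ignored) can be roughly sketched with REXML, which ships
with Ruby. This is not what Lorax does internally, just a minimal
illustration. The names canonical_form and equivalent_markup? and the
"fragment" wrapper element are made up for this example, and whitespace
between tags is compared as-is.

```ruby
require 'rexml/document'

# Reduce an element to nested arrays: [name, sorted attribute pairs, children].
# Sorting the attribute pairs is what makes attribute order irrelevant.
def canonical_form(element)
  attrs = []
  element.attributes.each_attribute do |attr|
    attrs << [attr.expanded_name, attr.value]
  end

  children = element.children.map do |child|
    case child
    when REXML::Element then canonical_form(child)
    when REXML::Text    then child.value
    end                 # comments and other node types become nil
  end

  [element.name, attrs.sort, children.compact]
end

# Wrap each string in a dummy root (hypothetical name "fragment") so that
# fragments with several top-level elements still parse.
def equivalent_markup?(xml_a, xml_b)
  root_a = REXML::Document.new("<fragment>#{xml_a}</fragment>").root
  root_b = REXML::Document.new("<fragment>#{xml_b}</fragment>").root
  canonical_form(root_a) == canonical_form(root_b)
end

# Sample strings for illustration only.
first  = '<li><img src="a.jpg" alt="alt text" />My Text</li>'
second = '<li><img alt="alt text" src="a.jpg" />My Text</li>'
puts equivalent_markup?(first, second)
```

Child order still matters in this sketch; only attributes are sorted, which
matches the case where tags never move but attributes do.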

HTH,
-m



>
>>
> Thank you for your response.
>
> Sincerely, Xeno
>
>
> --
> "It's the preponderance, stupid!" - Professor Stephen Schneider, IPCC
> member
>
>
