On Wed, Mar 10, 2010 at 6:09 AM, Intransition <transfire / gmail.com> wrote:

> On Mar 9, 10:38 pm, Mike Dalessio <mike.dales... / gmail.com> wrote:
> > lorax version 0.1.0 has been released!
> >
> > * <http://github.com/flavorjones/lorax>
> >
> > The Lorax is a diff and patch library for XML/HTML documents, based on
> > Nokogiri.
> >
> > It can tell you whether two XML/HTML documents are identical, or if
> > they're not, tell you what's different. In trivial cases, it can even
> > apply the patch.
>
> Why not in every case?

Because there are still boogs! :-D

One example: the XPath pointing at the elements involved in a delta doesn't account for sibling elements that an earlier delta may have inserted or removed, so by the time the later delta is applied, its path can point at the wrong node.

Another example: there are edge cases where Lorax can get confused by many identical sibling nodes interleaved with changing elements (think whitespace in an HTML doc).

I'd like to note that the library uses dependency injection to allow a modular choice of algorithm. So people with better CS chops than me can take a whack at it by building their own delta-set generator for their favorite algorithm, while still using the fast subtree signatures.

If you're curious and interested, I'd love to have more eyes and hands on these issues. Both the master and whitespace-fix branches have failing tests which can tell you where to dive in. The TODO has information on class responsibilities, algorithmic notes, missing integration tests and a list of needed features (like an rspec matcher).
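To make the first bug above concrete, here is a rough sketch in plain Ruby. It is not Lorax code and does not use Lorax's or Nokogiri's API: the `Delta` struct and both `apply_*` methods are made up for illustration. The children of one parent are modeled as an array, and each delta carries a 1-based position computed against the original document, much like an XPath such as `/root/*[3]`. The naive version applies each position as-is; the adjusted version corrects it for inserts and deletes applied earlier.

```ruby
# Hypothetical model, not Lorax's API: a delta targets a 1-based sibling
# position in the ORIGINAL document, similar to an XPath like /root/*[3].
Delta = Struct.new(:op, :position, :value)

# Naive application: every position is used unchanged, even though earlier
# deltas may already have shifted the siblings.
def apply_naive(siblings, deltas)
  result = siblings.dup
  deltas.each do |d|
    case d.op
    when :insert then result.insert(d.position - 1, d.value)
    when :delete then result.delete_at(d.position - 1)
    end
  end
  result
end

# Adjusted application: shift each position by the net effect of the
# deltas that were applied before it.
def apply_adjusted(siblings, deltas)
  result = siblings.dup
  applied = []
  deltas.each do |d|
    shift = applied.sum do |prev|
      if prev.op == :insert && prev.position <= d.position then 1
      elsif prev.op == :delete && prev.position < d.position then -1
      else 0
      end
    end
    index = d.position - 1 + shift
    case d.op
    when :insert then result.insert(index, d.value)
    when :delete then result.delete_at(index)
    end
    applied << d
  end
  result
end

original = %w[a b c]
# Insert "x" before the first sibling, then delete the original third one.
deltas = [Delta.new(:insert, 1, "x"), Delta.new(:delete, 3)]

NAIVE_RESULT    = apply_naive(original, deltas)     # removes "b" by mistake
ADJUSTED_RESULT = apply_adjusted(original, deltas)  # removes "c" as intended
```

With these two deltas the naive version deletes "b" instead of "c", because the insert pushed every original sibling one slot to the right. A real fix would have to do this bookkeeping on the document tree rather than on a flat array, so treat this only as a picture of the problem.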