I went browsing in CPAN to find something interesting, and came up with Algorithm::Merge. I don't use the perl version, but 3 way merging is something I do often since we allow concurrent access with our source control at work. Merge.rb is a fairly straight port of the perl version. I did change a callback to a block, and added some symbols in place of numeric constants. I need to add better documentation, but I wanted to get this in before it was too late for the summary. Usage: original= "Ok,\n this is a test sentence\n which will be edited." edited ="Ok,\n this is a sample phrase\n which has been edited." change="Hello World,\n this is a test phrase\n which I edited." puts "\nSplit by lines\n--------------------" $;="\n" Merge::diff3(original.split, edited.split, change.split).each{|l| puts l.inspect} puts Merge::merge(original.split, edited.split, change.split) puts "\nSplit by words\n--------------------" $;=" " Merge::diff3(original.split, edited.split, change.split).each{|l| puts l.inspect} puts Merge::merge(original.split, edited.split, change.split){|conflicts| ["<<"]+conflicts[1]+["|"]+conflicts[2]+[">>"]}.join(' ') yields: Split by lines -------------------- ["r", "Ok,", "Ok,", "Hello World,"] ["c", " this is a test sentence", " this is a sample phrase", " this is a test phrase"] ["c", " which will be edited.", " which has been edited.", " which I edited."] Hello World, <!-- ------ START CONFLICT ------ --> this is a sample phrase which has been edited. <!-- ---------------------------- --> this is a test phrase which I edited. <!-- ------ END CONFLICT ------ -->} Split by words -------------------- ["r", "Ok,", "Ok,", "Hello"] ["r", nil, nil, "World,"] ["u", "this", "this", "this"] ["u", "is", "is", "is"] ["u", "a", "a", "a"] ["l", "test", "sample", "test"] ["o", "sentence", "phrase", "phrase"] ["u", "which", "which", "which"] ["c", "will", "has", "I"] ["c", "be", "been", nil] ["u", "edited.", "edited.", "edited."] Hello World, this is a sample phrase which << has been | I >> edited. Bugs: - Merge::diff3(original,edited, change) - does a character-based diff, but returns inconsistent results (lines like [u, e, s, e]). I think this is because the callback_map has some no-ops where it should have valid callbacks, but it could be due to a porting error. I am still struggling to completely grok the use of the callback_map, with hopes of simplifying/clarifying it. Question: Can I add to or replace the Perl license with the ruby one? Source: ---- Merge.rb ----- module Merge # Module Merge # Three-way merge and diff # # based on perl's Algorithm::Merge # by James G. Smith, <jsmith / cpan.org> # Copyright (C) 2003 Texas A&M University. All Rights Reserved. # This module is free software; you may redistribute it and/or # modify it under the same terms as Perl itself. # ported to Ruby # by Adam Shelly <adam.shelly / gmail.com> require 'diff/lcs' # Given references to three lists of items, diff3 performs a # three-way difference. # This function returns an array of operations describing how the # left and right lists differ from the original list. In scalar # context, this function returns a reference to such an array. # # Given the following three lists, # original: a b c e f h i k # left: a b d e f g i j k # right: a b c d e h i j k # # merge: a b d e g i j k # # we have the following result from diff3: # # [ 'u', 'a', 'a', 'a' ], # [ 'u', 'b', 'b', 'b' ], # [ 'l', 'c', undef, 'c' ], # [ 'o', undef, 'd', 'd' ], # [ 'u', 'e', 'e', 'e' ], # [ 'r', 'f', 'f', undef ], # [ 'o', 'h', 'g', 'h' ], # [ 'u', 'i', 'i', 'i' ], # [ 'o', undef, 'j', 'j' ], # [ 'u', 'k', 'k', 'k' ] # # The first element in each row is the array with the difference: # c - conflict (no two are the same) # l - left is different # o - original is different # r - right is different # u - unchanged # The next three elements are the lists from the original, left, # and right arrays respectively that the row refers to (in the synopsis, # def Merge::diff3( pivot, doc_a, doc_b) ret = [] no_change = proc do |args| ret << ['u', pivot[args[0]], doc_a[args[1]], doc_b[args[2]] ] end conflict = proc do |args| p= pivot[args[0]] if args[0] a= doc_a[args[1]] if args[1] b= doc_b[args[2]] if args[2] ret << ['c', p, a, b] end diff_a = proc do |args| case args.size when 1 ret << ['o',pivot[args[0]], nil, nil] when 2 ret << ['o',nil, doc_a[args[0]], doc_b[args[1]]] when 3 ret << ['o', pivot[args[0]], doc_a[args[1]], doc_b[args[2]]] end end diff_b = proc do |args| case args.size when 1 ret << ['l', nil, doc_a[args[0]], nil] when 2 ret << ['l', pivot[args[0]], nil, doc_b[args[1]]] when 3 ret << ['l', pivot[args[0]], doc_a[args[1]], doc_b[args[2]]] end end diff_c = proc do |args| case args.size when 1 ret << ['r', nil, nil, doc_b[args[0]]] when 2 ret << ['r', pivot[args[0]], doc_a[args[1]], nil] when 3 ret << ['r', pivot[args[0]], doc_a[args[1]], doc_b[args[2]]] end end traverse_sequences3(pivot, doc_a, doc_b, {:NO_CHANGE=>no_change, :CONFLICT=>conflict, :A_DIFF=> diff_a, :B_DIFF=>diff_b, :C_DIFF=>diff_c} ) return ret end #callbacks for Diff::LCS class LCS_Traverse_Callbacks def initialize diffs @diffs = diffs end def [] l,r @diffs[@left=l]=[] @diffs[@right=r]=[] self end def match *args end def discard_a event @diffs[@left]<<event.old_position end def discard_b event @diffs[@right]<<event.new_position end end # constants for traverse_sequences D=nil AB_A=32 AB_B=16 AC_A=8 AC_C=4 BC_B=2 BC_C=1 CB_B=5 #not used in calculations CB_C=3 #not used in calculations @base_doc = {AB_A=>:A,AB_B=>:B,AC_A=>:A,AC_C=>:C,BC_B=>:B,BC_C=>:C} def Merge::traverse_sequences3(adoc, bdoc, cdoc, callbacks = {}) target_len = [bdoc.size,cdoc.size].min bc_different_len = (bdoc.size != cdoc.size) diffs = Hash.new([]) # callbacks#match:: Called when +a+ and +b+ are pointing # to common elements in +:A+ and +:B+. # callbacks#discard_a:: Called when +a+ is pointing to an # element not in +:B+. # callbacks#discard_b:: Called when +b+ is pointing to an # element not in +:A+. # The methods for <tt>callbacks#match</tt>, <tt>callbacks#discard_a</tt>, # and <tt>callbacks#discard_b</tt> are invoked with an event comprising # the action ("=", "+", or "-", respectively), the indicies +ii+ and # +jj+, and the elements <tt>:A[ii]</tt> and <tt>:B[jj]</tt>. Return # values are discarded by #traverse_sequences. ts_callbacks = LCS_Traverse_Callbacks.new(diffs) Diff::LCS::traverse_sequences(adoc, bdoc, ts_callbacks[AB_A, AB_B]) Diff::LCS::traverse_sequences(adoc, cdoc, ts_callbacks[AC_A,AC_C]) if (bc_different_len) Diff::LCS::traverse_sequences(cdoc, bdoc, ts_callbacks[CB_C,CB_B]) Diff::LCS::traverse_sequences(bdoc, cdoc, ts_callbacks[BC_B,BC_C]) if diffs[CB_B] != diffs[BC_B] || diffs[CB_C] != diffs[BC_C] puts "Diff::diff is not symmetric for second and third sequences - results might not be correct"; #trim to equal lengths and try again b_len, c_len = bdoc.size, cdoc.size bdoc_save = bdoc.slice!(target_len..-1) cdoc_save = cdoc.slice!(target_len..-1) Diff::LCS::traverse_sequences(bdoc, cdoc, ts_callbacks[BC_B,BC_C]) #mark the trimmed part as different and then restore diffs[BC_B] += (target_len..b_len).to_a if target_len < b_len diffs[BC_C] += (target_len..c_len).to_a if target_len < c_len bdoc.concat bdoc_save cdoc.concat cdoc_save end else # not bc_different_len Diff::LCS::traverse_sequences(bdoc, cdoc, ts_callbacks[BC_B,BC_C]) end pos = {:A=>0,:B=>0,:C=>0} sizes ={:A=>adoc.size, :B=>bdoc.size, :C=>cdoc.size} matches=[] noop = proc {} # Callback_Map is indexed by the sum of AB_A, AB_B, ..., as indicated by @matches # this isn't the most efficient, but it's a bit easier to maintain and # read than if it were broken up into separate arrays # half the entries are not noop - it would seem then that no # entries should be noop. I need patterns to figure out what the # other entries are. callback_Map = [ [ callbacks[:NO_CHANGE], :A, :B, :C ], # 0 - no matches [ noop, ], # 1 - BC_C [ callbacks[:B_DIFF], :B ], #*2 - BC_B [ noop, ], # 3 - BC_B BC_C [ noop, ], # 4 - AC_C [ callbacks[:C_DIFF], :C ], # 5 - AC_C BC_C [ noop, ], # 6 - AC_C BC_B [ noop, ], # 7 - AC_C BC_B BC_C [ callbacks[:A_DIFF], :A ], # 8 - AC_A [ noop, ], # 9 - AC_A BC_C [ callbacks[:C_DIFF], :A, :B ], # 10 - AC_A BC_B [ callbacks[:C_DIFF], :A, :B, ], # 11 - AC_A BC_B BC_C [ noop, ], # 12 - AC_A AC_C [ noop, ], # 13 - AC_A AC_C BC_C [ callbacks[:C_DIFF], :A, :B, ], # 14 - AC_A AC_C BC_B [ callbacks[:C_DIFF], :A, :B, :C ], # 15 - AC_A AC_C BC_B BC_C [ noop, ], # 16 - AB_B [ noop, ], # 17 - AB_B BC_C [ callbacks[:B_DIFF], :B ], # 18 - AB_B BC_B [ noop, ], # 19 - AB_B BC_B BC_C [ callbacks[:A_DIFF], :B, :C ], # 20 - AB_B AC_C [ noop, ], # 21 - AB_B AC_C BC_C [ noop, ], # 22 - AB_B AC_C BC_B [ callbacks[:CONFLICT], :A, :B, :C ], # 23 - AB_B AC_C BC_B BC_C [ callbacks[:B_DIFF], :B ], # 24 - AB_B AC_A [ noop, ], # 25 - AB_B AC_A BC_C [ callbacks[:C_DIFF], :B, :C ], # 26 - AB_B AC_A BC_B [ noop, ], # 27 - AB_B AC_A BC_B BC_C [ callbacks[:A_DIFF], :B, :C ], # 28 - AB_B AC_A AC_C [ noop, ], # 29 - AB_B AC_A AC_C BC_C [ noop, ], # 30 - AB_B AC_A AC_C BC_B [ callbacks[:B_DIFF], :B ], # 31 - AB_B AC_A AC_C BC_B BC_C [ callbacks[:NO_CHANGE], :A, :B, :C ], # 32 - AB_A [ callbacks[:B_DIFF], :A, :C ], # 33 - AB_A BC_C [ noop, ], # 34 - AB_A BC_B [ callbacks[:B_DIFF], :A, :C ], # 35 - AB_A BC_B BC_C [ noop, ], # 36 - AB_A AC_C [ noop, ], # 37 - AB_A AC_C BC_C [ noop, ], # 38 - AB_A AC_C BC_B [ noop, ], # 39 - AB_A AC_C BC_B BC_C [ callbacks[:A_DIFF], :A, ], # 40 - AB_A AC_A [ noop, ], # 41 - AB_A AC_A BC_C [ callbacks[:A_DIFF], :A ], # 42 - AB_A AC_A BC_B [ noop, ], # 43 - AB_A AC_A BC_B BC_C [ noop, ], # 44 - AB_A AC_A AC_C [ callbacks[:C_DIFF], :A, D, :C ], # 45 - AB_A AC_A AC_C BC_C ##ADS: I think this should be :CONFLICT?? [ noop, ], # 46 - AB_A AC_A AC_C BC_B [ noop, ], # 47 - AB_A AC_A AC_C BC_B BC_C [ noop, ], # 48 - AB_A AB_B [ callbacks[:B_DIFF], :A, :C ], # 49 - AB_A AB_B BC_C [ noop, ], # 50 - AB_A AB_B BC_B [ callbacks[:B_DIFF], :A, :B, :C ], # 51 - AB_A AB_B BC_B BC_C [ callbacks[:A_DIFF], :B, :C ], # 52 - AB_A AB_B AC_C [ noop, ], # 53 - AB_A AB_B AC_C BC_C [ noop, ], # 54 - AB_A AB_B AC_C BC_B [ callbacks[:C_DIFF], :C ], # 55 - AB_A AB_B AC_C BC_B BC_C [ callbacks[:B_DIFF], :A, :C ], # 56 - AB_A AB_B AC_A [ noop, ], # 57 - AB_A AB_B AC_A BC_C [ callbacks[:CONFLICT], :A, :B, D ], # 58 - AB_A AB_B AC_A BC_B ##ADS: I changed this one to :CONFLICT [ noop, ], # 59 - AB_A AB_B AC_A BC_B BC_C [ callbacks[:A_DIFF], :A, :B, :C ], # 60 - AB_A AB_B AC_A AC_C [ callbacks[:CONFLICT], :A, D, :C ], # 61 - AB_A AB_B AC_A AC_C BC_C [ callbacks[:CONFLICT], :A, :B, D ], # 62 - AB_A AB_B AC_A AC_C BC_B [ callbacks[:CONFLICT], :A, :B, :C ], # 63 - AB_A AB_B AC_A AC_C BC_B BC_C ] #while there is something to work with while diffs.values.find{|e|e.size>0} && [:A,:B,:C].find{|n|pos[n]<sizes[n]} #find all the differences at the current position of each doc matchset=[:A,:B,:C].inject([]) do |ms,i| ms+diffs.find_all {|k,v|@base_doc[k]==i && v[0]==pos[i]} end callback_num=matchset.uniq.inject(0){|cb,val| (cb|val[0])} callback = callback_Map[callback_num] args = callback[1..-1] callback[0].call(args.map{|ar| ar&&pos[ar]}) args.each do |n| pos[n]+=1 if n case n when :A diffs[AB_A].shift while diffs[AB_A][0] && ( diffs[AB_A][0] < pos[n] ) diffs[AC_A].shift while diffs[AC_A][0] && ( diffs[AC_A][0] < pos[n] ) when :B diffs[AB_B].shift while diffs[AB_B][0] && ( diffs[AB_B][0] < pos[n] ) diffs[BC_B].shift while diffs[BC_B][0] && ( diffs[BC_B][0] < pos[n] ) when :C diffs[AC_C].shift while diffs[AC_C][0] && ( diffs[AC_C][0] < pos[n] ) diffs[BC_C].shift while diffs[BC_C][0] && ( diffs[BC_C][0] < pos[n] ) end end #args.each #raise "args empty" if args.empty? ##ADS: args.empty? is true if the callback was a no-op. I don't think that should happen. break if args.empty? end #this part takes care of the leftovers bits={:A=>4,:B=>2,:C=>1} while [:A,:B,:C].find{|n|pos[n]<sizes[n]} match = 0 args=[] [:A,:B,:C].each do |i| if pos[i]<sizes[i] match|=bits[i] args << pos[i] pos[i]+=1 end end switch = [0,5,24,17,34,8,10,0][match] #ADS: I totally don't understand how these callbacks were chosen callback_Map[switch][0].call(*args) if callback_Map[switch][0] end end # Given references to three lists of items, merge performs a three-way # merge. The merge function uses the diff3 function to do most of # the work. # # The optional block parameter is called for conflicts. It should # accept an array of 3 arrays # The first array holds a list of elements from the original list. # The second array has a list of elements from the left list. # The last array holds a list of elements from the right list. # The block should return a list of elements to place in the merged # list in place of the conflict. # # The default conflict handler returns: # ["<!-- ------ START CONFLICT ------ -->", # args[1], # "<!-- ---------------------------- -->", # args[2], # "<!-- ------ END CONFLICT ------ -->}"] def Merge::merge(pivot,doc_a, doc_b) conflict_callback = proc do |args| ["<!-- ------ START CONFLICT ------ -->", args[1], "<!-- ---------------------------- -->", args[2], "<!-- ------ END CONFLICT ------ -->}"] end diff = diff3(pivot, doc_a, doc_b); ret = [] conflict = [[],[],[]] diff.each do |diffline| i = 0 if diffline[0] == 'c' # conflict conflict[0] << diffline[1] if diffline[1]; conflict[1] << diffline[2] if diffline[2]; conflict[2] << diffline[3] if diffline[3]; else unless (conflict[0].empty? && conflict[1].empty? && conflict[2].empty?) ret << (block_given? ? yield(conflict) : conflict_callback.call(conflict)) conflict = [[],[],[]] end case diffline[0] when 'u' # unchanged ret << diffline[2] || diffline[3]; when 'o','l' # added by both or left ret << diffline[2] if diffline[2] when 'r' #added by right ret << diffline[3] if diffline[3] end end end unless (conflict[0].empty? && conflict[1].empty? && conflict[2].empty?) ret << (block_given? ? yield(conflict) : conflict_callback.call(conflict)) end ret end end