On Mon, Oct 16, 2006 at 07:15:52PM +0900, Minkoo wrote:
> Hi list.
> 
> I read an article posted at Wikipedia about Levenshtein distance (aka edit
> distance).
> The location of the document is
> http://en.wikipedia.org/wiki/Levenshtein_distance
> 
> In the document, a sample Ruby code goes like the following:
> 
> class String
>   def levenshtein(comparator)
>     a, b = self.unpack('U*'), comparator.unpack('U*')
>     n, m = a.length, b.length
>     a, b, n, m = b, a, m, n if n > m
>     current = [*0..n]
>     1.upto(m) do |i|
>       previous, current = current, [i]+[0]*n
>       1.upto(n) do |j|
>         add, delete = previous[j]+1, current[j-1]+1
>         change = previous[j-1]
>         change += 1 if a[j-1] != b[i-1]
>         current[j] = [add, delete, change].min
>       end
>     end
>     current[n]
>   end
> end
> 
> In the code, there's a parameter called comparator which seems to be used to
> decode given parameter. But, I can't understand what exactly the comparator
> is doing.
> 
> Does anybody know the detail?

It's simply the string you're comparing against; unpack('U*') just turns the
UTF-8 characters into unsigned integers:

    class String
      def levenshtein(comparator)
	a, b = self.unpack('U*'), comparator.unpack('U*')
	b                                                 # => [102, 111, 111, 98, 97, 114]
	n, m = a.length, b.length
	a, b, n, m = b, a, m, n if n > m
	current = [*0..n]
	1.upto(m) do |i|
	  previous, current = current, [i]+[0]*n
	  1.upto(n) do |j|
	    add, delete = previous[j]+1, current[j-1]+1
	    change = previous[j-1]
	    change += 1 if a[j-1] != b[i-1]
	    current[j] = [add, delete, change].min
	  end
	end
	current[n]
      end
    end

    "foo".levenshtein("foobar")                    # => 3

-- 
Mauricio Fernandez  -   http://eigenclass.org   -  singular Ruby