On 10/16/06, Mauricio Fernandez <mfp / acm.org> wrote:
>
> On Mon, Oct 16, 2006 at 07:15:52PM +0900, Minkoo wrote:
> > Hi list.
> >
> > I read an article on Wikipedia about Levenshtein distance (aka edit
> > distance).
> > The location of the document is
> > http://en.wikipedia.org/wiki/Levenshtein_distance
> >
> > In the document, the sample Ruby code is as follows:
> >
> > class String
> >   def levenshtein(comparator)
> >     a, b = self.unpack('U*'), comparator.unpack('U*')
> >     n, m = a.length, b.length
> >     a, b, n, m = b, a, m, n if n > m
> >     current = [*0..n]
> >     1.upto(m) do |i|
> >       previous, current = current, [i]+[0]*n
> >       1.upto(n) do |j|
> >         add, delete = previous[j]+1, current[j-1]+1
> >         change = previous[j-1]
> >         change += 1 if a[j-1] != b[i-1]
> >         current[j] = [add, delete, change].min
> >       end
> >     end
> >     current[n]
> >   end
> > end
> >
> > In the code, there's a parameter called comparator which seems to be
> > used to decode the given parameter. But I can't understand what exactly
> > the comparator is doing.
> >
> > Does anybody know the details?
>
> It's simply the string you're comparing against; unpack('U*') just turns
> the UTF-8 characters into unsigned integers (their code points):
>
> class String
>   def levenshtein(comparator)
>     a, b = self.unpack('U*'), comparator.unpack('U*')
>     b                              # => [102, 111, 111, 98, 97, 114]
>     n, m = a.length, b.length
>     a, b, n, m = b, a, m, n if n > m
>     current = [*0..n]
>     1.upto(m) do |i|
>       previous, current = current, [i]+[0]*n
>       1.upto(n) do |j|
>         add, delete = previous[j]+1, current[j-1]+1
>         change = previous[j-1]
>         change += 1 if a[j-1] != b[i-1]
>         current[j] = [add, delete, change].min
>       end
>     end
>     current[n]
>   end
> end
>
> "foo".levenshtein("foobar")        # => 3
>
> --
> Mauricio Fernandez - http://eigenclass.org - singular Ruby
>

I'm afraid I'm not familiar with character encodings. Does Ruby use UTF-8
by default?
In other words, suppose that I've launched irb and run
"foo".levenshtein("foobar"). In that case, is the string "foo" encoded as
UTF-8? Do I always have to unpack the string as in the code shown above?

Sincerely,
Minkoo Seo
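
As a rough illustration of the unpacking step discussed above, the sketch below compares unpack('C*'), which returns one integer per byte, with unpack('U*'), which reads the bytes as UTF-8 and returns one integer per character. The sample strings and variable names are made up for this example and are not taken from the Wikipedia code.

```ruby
# Sketch only: the sample strings and variable names are illustrative.

# Pure-ASCII text: bytes and UTF-8 code points are the same numbers.
ascii = "foo"
ascii_bytes  = ascii.unpack('C*')   # one integer per byte
ascii_points = ascii.unpack('U*')   # one integer per UTF-8 character

# Build "cafe" with an accented final e without relying on the source
# file's encoding: [0xE9].pack('U') yields the UTF-8 bytes for U+00E9.
accented = "caf" + [0xE9].pack('U')
accented_bytes  = accented.unpack('C*')   # the accented e occupies two bytes
accented_points = accented.unpack('U*')   # the accented e is one code point

p ascii_bytes      # => [102, 111, 111]
p ascii_points     # => [102, 111, 111]
p accented_bytes   # => [99, 97, 102, 195, 169]
p accented_points  # => [99, 97, 102, 233]
```

For ASCII-only input such as "foo" the two views agree, so the choice of directive makes no difference there. For text with multibyte characters, a byte-level comparison would count one changed character as more than one edit, which is presumably why the sample code unpacks with 'U*' before comparing.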