On 10/16/06, Mauricio Fernandez <mfp / acm.org> wrote:
>
> On Mon, Oct 16, 2006 at 07:15:52PM +0900, Minkoo wrote:
> > Hi list.
> >
> > I read an article posted at Wikipedia about Levenshtein distance (aka edit
> > distance).
> > The location of the document is
> > http://en.wikipedia.org/wiki/Levenshtein_distance
> >
> > In the document, a sample Ruby code goes like the following:
> >
> > class String
> >   def levenshtein(comparator)
> >     a, b = self.unpack('U*'), comparator.unpack('U*')
> >     n, m = a.length, b.length
> >     a, b, n, m = b, a, m, n if n > m
> >     current = [*0..n]
> >     1.upto(m) do |i|
> >       previous, current = current, [i]+[0]*n
> >       1.upto(n) do |j|
> >         add, delete = previous[j]+1, current[j-1]+1
> >         change = previous[j-1]
> >         change += 1 if a[j-1] != b[i-1]
> >         current[j] = [add, delete, change].min
> >       end
> >     end
> >     current[n]
> >   end
> > end
> >
> > In the code, there's a parameter called comparator which seems to be used to
> > decode given parameter. But, I can't understand what exactly the comparator
> > is doing.
> >
> > Does anybody know the detail?
>
> It's simply the string you're comparing against; unpack('U*') just turns the
> UTF-8 characters into unsigned integers:
>
>     class String
>       def levenshtein(comparator)
>         a, b = self.unpack('U*'), comparator.unpack('U*')
>         b                              # [102, 111, 111, 98, 97, 114]
>         n, m = a.length, b.length
>         a, b, n, m = b, a, m, n if n > m
>         current = [*0..n]
>         1.upto(m) do |i|
>           previous, current = current, [i]+[0]*n
>           1.upto(n) do |j|
>             add, delete = previous[j]+1, current[j-1]+1
>             change = previous[j-1]
>             change += 1 if a[j-1] != b[i-1]
>             current[j] = [add, delete, change].min
>           end
>         end
>         current[n]
>       end
>     end
>
>     "foo".levenshtein("foobar")                    # 3
>
> --
> Mauricio Fernandez  -   http://eigenclass.org   -  singular Ruby
>
>
I'm afraid I'm not familiar with character encodings. Does Ruby use UTF-8 by
default?

In other words, suppose I launch irb and run "foo".levenshtein("foobar").
In that case, is the string "foo" encoded as UTF-8? Do I always have to
unpack strings the way the code above does?
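
To make the question concrete, here is a minimal sketch of the two ways I
can imagine looking at a string. The helper names and the sample strings are
my own, not from the Wikipedia article, and I'm assuming the input bytes are
UTF-8:

```ruby
# Hypothetical helpers, only for illustrating the question.

# Decode the string as UTF-8 and return one integer per character
# (this is what the Wikipedia code does with unpack('U*')).
def code_points(str)
  str.unpack('U*')
end

# Return one integer per raw byte, with no decoding at all.
def raw_bytes(str)
  str.unpack('C*')
end

ascii  = "foo"
accent = "\xC3\xA9"   # e-acute written out as its two UTF-8 bytes

p code_points(ascii)   # [102, 111, 111]
p raw_bytes(ascii)     # [102, 111, 111]
p code_points(accent)  # [233]
p raw_bytes(accent)    # [195, 169]
```

For plain ASCII such as "foo" the two views agree, so I can't tell from that
example alone whether the unpack step matters. They only differ once a
character takes more than one byte.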

Sincerely,
Minkoo Seo
