Jan Fischer wrote:
> Hello everyone,
> 
> I have a problem grouping rows in a database by similarity. What I am trying to
> achieve is the following:
> I have a table that looks like this (just an example):
> 
> ID   Companyname       Group
> 1    Mickeysoft Ltd.   NULL
> 2    Mickysoft LTD     NULL
> 3    Mickeysft Limited NULL
> 4    Aple Inc          NULL
> 5    APPLE INC         NULL
> 
> and so on, you get the point. Group should be 1 for the IDs 1 to 3 and 2
> for the IDs 4 and 5.

If you are trying to catch every kind of typo, then I do not think there
is an algorithm that will do it reliably.
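That said, a rough first pass can shrink the pile you have to check by hand, as long as you treat its output as suggestions. Here is a minimal Ruby sketch using only the standard library. It lowercases each name, strips punctuation and a few legal-form suffixes, and then puts names in the same group when their Levenshtein edit distance is at most 2, following chains of close matches. The `SUFFIXES` list, the threshold of 2 and the method names are my own guesses for illustration, not anything from your question.

```ruby
# Rough first-pass grouping of company names by similarity.
# ASSUMPTIONS (hypothetical, tune for your data): the suffix list and
# the edit-distance threshold. The result is a suggestion to review by hand.

SUFFIXES = %w[ltd limited inc incorporated corp gmbh].freeze

# Lowercase, replace punctuation with spaces, drop known legal-form suffixes.
def normalize(name)
  words = name.downcase.gsub(/[^a-z0-9\s]/, " ").split
  words.reject { |w| SUFFIXES.include?(w) }.join(" ")
end

# Classic dynamic-programming Levenshtein distance, one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Link every pair of names within max_distance edits (union-find), so
# A~B and B~C end up in one group even if A and C are further apart.
def suggest_groups(names, max_distance: 2)
  keys = names.map { |n| normalize(n) }
  parent = (0...names.size).to_a
  find = ->(i) { i = parent[i] while parent[i] != i; i }

  names.each_index do |i|
    ((i + 1)...names.size).each do |j|
      next if levenshtein(keys[i], keys[j]) > max_distance
      parent[find.(i)] = find.(j)
    end
  end

  names.each_index.group_by { |i| find.(i) }.values.map { |idx| idx.map { |i| names[i] } }
end

names = ["Mickeysoft Ltd.", "Mickysoft LTD", "Mickeysft Limited", "Aple Inc", "APPLE INC"]
p suggest_groups(names)
# => [["Mickeysoft Ltd.", "Mickysoft LTD", "Mickeysft Limited"], ["Aple Inc", "APPLE INC"]]
```

On the five sample names it gets the grouping right, but a worse typo, or two genuinely different short names that happen to be a couple of letters apart, will fool it. It also compares every pair of names, so on a big table run it on the distinct names rather than on every row.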

You said that the names are in a database. What if you were to run

select distinct Companyname from theTable
order by Companyname

Then just go over the list yourself. Perhaps you can add a field called
newID or something like that, and enter a value in it to identify which
names go together. You could even add another field that holds the
correct spelling. When you have gone through it all, you can build a
list of the correct spellings and have users choose from it in a pick
list, which prevents new spelling errors.
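Once the review is done, applying it is mechanical. A minimal Ruby sketch, assuming the review ends up as a hash from each distinct spelling to the group number and correct spelling you chose. `REVIEW`, `Row`, `assign_groups` and `pick_list` are hypothetical names, and the "correct" spellings in the sample data are made up for the example.

```ruby
# HYPOTHETICAL review result: each distinct spelling from the
# SELECT DISTINCT list maps to [group number, correct spelling].
REVIEW = {
  "Mickeysoft Ltd."   => [1, "Mickeysoft Ltd."],
  "Mickysoft LTD"     => [1, "Mickeysoft Ltd."],
  "Mickeysft Limited" => [1, "Mickeysoft Ltd."],
  "Aple Inc"          => [2, "Apple Inc."],
  "APPLE INC"         => [2, "Apple Inc."]
}.freeze

# Stand-in for a table row: ID, Companyname, Group.
Row = Struct.new(:id, :companyname, :group)

# Fill in the group from the review; names not reviewed yet stay nil
# so they are easy to spot.
def assign_groups(rows, review)
  rows.each do |row|
    entry = review[row.companyname]
    row.group = entry && entry.first
  end
end

# The distinct correct spellings, sorted, form the pick list for users.
def pick_list(review)
  review.values.map(&:last).uniq.sort
end
```

For example, `pick_list(REVIEW)` gives `["Apple Inc.", "Mickeysoft Ltd."]`, and after `assign_groups` the five sample rows carry groups 1, 1, 1, 2, 2.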

Alternatively, depending on how the table is set up, you could just mark
all but one row in each group for deletion.
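Picking which rows to mark can be sketched the same way. This assumes the Group column has already been filled in and that the row with the lowest ID in each group is the one to keep; both are assumptions, and if other tables reference these IDs they would need to be pointed at the kept row before anything is deleted. `Row` and `ids_to_delete` are hypothetical names.

```ruby
# Stand-in for a table row: ID, Companyname, Group.
Row = Struct.new(:id, :companyname, :group)

# ASSUMPTION: keep the lowest ID in each group, mark the rest.
# Rows with no group yet (not reviewed) are never marked.
def ids_to_delete(rows)
  rows.reject { |r| r.group.nil? }
      .group_by(&:group)
      .flat_map { |_group, members| members.sort_by(&:id).drop(1).map(&:id) }
end
```

With the sample table grouped as in the question, this marks IDs 2, 3 and 5 and keeps 1 and 4.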

You need to do this by hand IMO.  GL!
-- 
Posted via http://www.ruby-forum.com/.