On Mar 23, 2011, at 06:45 , Martin Hansen wrote:

> Hello all,
> 
> 
> Here is an attempt at a naive fuzzy string matching algorithm. I would
> appreciate comments on the code - which is not as simple as I would
> like.
> 
> I have a couple of speed improvements in mind:
>  1) replace EQUAL hash with array (possibly NArray).

That would generally be slower, depending on use. One thing you might want to do is clean up the code is match:

      EQUAL[(@seq1[self.seq1_index].upcase +
             @seq2[self.seq2_index].upcase).to_sym]

could be:

      EQUAL[:"#{@seq1[self.seq1_index]}#{@seq2[self.seq2_index]}"]

You'll have to change your initializer to upcase the input:

    @seq1           = seq1.upcase
    @seq2           = seq2.upcase

Looks like you'd get a better speed up using strings as your hash keys:

# of iterations = 1000000
                          user     system      total        real
null_time             0.130000   0.000000   0.130000 (  0.131828)
upcase + to_sym      21.320000   0.020000  21.340000 ( 21.366015)
interpolated sym     14.760000   0.010000  14.770000 ( 14.796624)
interpolated str     11.290000   0.020000  11.310000 ( 11.449326)

>  2) a path can be terminated before fully explored if it can be
> calculated that it will never complete (somewhere inside in_bounds?).

Dunno... but you have a good test suite, so it is probably pretty easy to write a test and fix it.

Minor cleanup:

EQUAL = Hash[%w(AA BU TH UY TT CB UH SC CC GB VA SG GG TB VC CS UU UB
                VG GS NA DA AV WA NT DG CV WT NC DT GV WU NG DU KG AW
                NU AD KT TW AN GD KU UW TN TD GK RA CN UD TK RG GN HA
                UK AR UN HC YC GR NN HT YT MA BC HU YU MC BG AH CY AM
                BT CH TY CM).map { |pair| [pair.to_sym, true] }]