On Tue, Apr 19, 2011 at 3:13 PM, Martin Hansen <mail / maasha.dk> wrote:
>> IMHO it would be better to separate representation of the sequence and
>> the matching process.  The matcher then would only carry a reference
>> to the sequence and all the data it needs to do matching.
>
> I am not sure if I understand this. I have tried to copy the behavior of
> String#match.

Maybe on the interface, but you create side effects on the String (Seq

>> Also #vector_update creates a lot of objects and does so for each
>> position in the sequence.  That's likely where you can improve things.
>
> Yes, that is quite possible. I might be able to skip .dup on line 128
> and 130. That will require some thinking and testing on my side.
>
>> I am not sure what the matching algorithm is exactly.  Can you summarize
>> it?
>
> Well, it is a dynamic programming algorithm to do fuzzy searches of
> patterns in strings - allowing for custom matching rules (A==N, etc) and
> a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel
> Paleo (page 197):
>
> http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf
>
> A short example:
>
> http://pastie.org/1811496
>
>
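
For anyone following along, the general technique described above
(approximate matching by dynamic programming with a maximum edit
distance) can be sketched roughly as below.  This is not Martin's code:
the method name fuzzy_find and its parameters are made up for
illustration, and plain == stands in for the custom matching rules.

```ruby
# Hypothetical sketch: return the end positions (0-based, inclusive) in
# text where pattern matches with at most max_edits insertions,
# deletions or substitutions.
def fuzzy_find(pattern, text, max_edits)
  m = pattern.length
  # column[i] = minimal edit distance between pattern[0, i] and some
  # substring of text ending at the current position.
  column = (0..m).to_a
  hits = []

  text.each_char.with_index do |tc, pos|
    prev_diag = column[0] # column[i - 1] as it was at the previous position
    column[0] = 0         # a match may start anywhere in text

    (1..m).each do |i|
      cost = pattern[i - 1] == tc ? 0 : 1 # custom rules (A==N etc.) would go here
      current = [
        prev_diag + cost,  # match or substitution
        column[i] + 1,     # extra character in text
        column[i - 1] + 1  # pattern character missing from text
      ].min
      prev_diag = column[i]
      column[i] = current
    end

    hits << pos if column[m] <= max_edits
  end

  hits
end
```

Only one column of the table is kept and it is updated in place, so no
new objects are allocated per position apart from the small candidate
array.
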
>> you can make matching simpler
>>
>> def match?(char1, char2)
>>   (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
>> end
>
> Yes, but that should not give any significant speed increase?
>
>> You might as well consider changing the Array into a Hash.  Then you
>> can even get rid of the #ord call.
>
> Actually, I started with a hash for this - and it was slightly faster.
> However, I think this bit field is very elegant - and since I was
> preparing for porting to C - I think this is the way to go!

I did not say you should get rid of the bit field!  Plus, if you are
in need of speed and it was slow, then you should get rid of it
regardless of elegance.
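
To illustrate, here is a minimal sketch that keeps the bit field but
stores it in a Hash keyed by character, so no #ord call is needed.  The
MASK table and its bit assignments are assumptions made up for this
example, not the values from your EQUAL array:

```ruby
# Hypothetical bit masks: one bit per concrete nucleotide.
A = 0b0001
C = 0b0010
G = 0b0100
T = 0b1000

# Ambiguity codes are the OR of the bases they stand for
# (only a small, assumed subset is listed here).
MASK = {
  'A' => A, 'C' => C, 'G' => G, 'T' => T,
  'R' => A | G,         # purine
  'Y' => C | T,         # pyrimidine
  'N' => A | C | G | T  # any base
}
MASK.default = 0 # unknown characters match nothing
MASK.freeze

# Two characters match if their masks share at least one bit.
def match?(char1, char2)
  (MASK[char1] & MASK[char2]) != 0
end
```

With this, match?('A', 'N') is true and match?('A', 'C') is false.
Whether it is actually faster than the Array lookup is something only a
benchmark on your data can tell.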

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/