Sorry for the half message, thick fingers hit the wrong key accidentally.

On Tue, Apr 19, 2011 at 3:13 PM, Martin Hansen <mail / maasha.dk> wrote:
>> IMHO it would be better to separate representation of the sequence and
>> the matching process. =A0The matcher then would only carry a reference
>> to the sequence and all the data it needs to do matching.
>
> I am not sure if I understand this. I have tried to copy the behavior of
> String#match.

Maybe on the interface, but you create side effects on the String (Seq
in your case).  This is neither a clean separation (makes classes
bigger and harder to understand) nor is it thread safe (e.g. if you
want to try to concurrently match several patterns against the same
sequence).

Btw, in your case I would probably implement #scan and derive
implement match like

def match(*args)
  scan(*args) do |match_data|
    return match_data
  end

  nil # no match
end

That's pretty easy and you can still replace this implementation with
something else if it should be necessary.

>> Also #vector_update creates a lot of objects and does so for each
>> position in the sequence. =A0That's likely where you can improve things.
>
> Yes, that is quite possible. I might be able to skip .dup on line 128
> and 130. That will require some thinking and testing on my side.

You could at least switch to only two arrays which you use alternating
with relatively moderate effort.  You might even get away with a
single array.

>> I am not sure what the matching algorithm is exactly. =A0Can you summari=
ze
>> it?
>
> Well, it is a dynamic programming algorithm to do fuzzy searches of
> patterns in strings - allowing for custom matching rules (A=3D=3DN, etc) =
and
> a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel
> Paleo (page 197):
>
> http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf
>
> A short example:
>
> http://pastie.org/1811496

Ah, interesting.  Thanks!

>> you can make matching simpler
>>
>> def match?(char1, char2)
>> =A0 =A0 (EQUAL[char1.ord] & EQUAL[char2.ord]) !=3D 0
>> end
>
> Yes, but that should not give any significant speed increase?

Why not?  You get at least rid of a Ruby method call.  Did you measure
differences?

>> You might as well consider changing the Array into a Hash. =A0Then you
>> can even get rid of the #ord call.
>
> Actually, I started with a hash for this - and it was slightly faster.
> However, I think this bit field is very elegant - and since I was
> preparing for porting to C - I think this is the way to go!

I did not say you should get rid of the bit field!  Plus, if you are
in need for speed and if it was slow then you should get rid of it
regardless of elegance.

Kind regards

robert

--=20
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/