Hi, I am not able to unsubscribe from my mailing list.

-----Original Message-----
From: Gavin Kistner [mailto:gavin / refinery.com] 
Sent: Saturday, May 28, 2005 7:53 AM
To: ruby-talk ML
Subject: Re: How to build an index of phrases in a phrase/sentence?

One last follow-up (sorry, I'm bored on board a plane) :)

I did one manual test of RAM usage, comparing the VM used by the Set
storage and by the Trie storage on the previously measured 496-word
document and on a document with 1007 words. The results were as I
expected:

469 words:
                           user     system      total        real
     create set:      16.040000   1.100000  17.140000 ( 21.742738)
     159MB of VM

     create matcher:  85.430000   1.340000  86.770000 ( 96.524512)
     68MB of VM


1007 words:
                           user     system      total        real
     create set:     137.470000   9.400000 146.870000 (166.828737)
     ~1GB of VM

     create matcher: 746.690000  11.050000 757.740000 (806.450292)
     149MB of VM
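One way to see why the Set storage gets so large: if the Set holds every contiguous run of words (an assumption; the Set code itself isn't shown in this message), an n-word document has n(n+1)/2 such runs. That is 110,215 for 469 words and 507,528 for 1007 words, and the average phrase also gets longer as n grows. The hypothetical helpers `all_subphrases` and `subphrase_count` below are a minimal sketch of that idea, not the code that produced the numbers above.

```ruby
require 'set'

# Hypothetical helper (assumed reconstruction of a Set-based index):
# stores every contiguous run of words in the text as one String.
def all_subphrases(text)
  words = text.downcase.scan(/[a-z']+/)
  phrases = Set.new
  words.each_index do |start|
    # Grow the phrase one word at a time from this starting position.
    phrase = nil
    (start...words.length).each do |stop|
      phrase = phrase ? "#{phrase} #{words[stop]}" : words[start]
      phrases << phrase
    end
  end
  phrases
end

# Number of contiguous sub-phrases in an n-word document
# (an upper bound on the Set size, since duplicates collapse).
def subphrase_count(n)
  n * (n + 1) / 2
end

index = all_subphrases("the quick brown fox")
puts index.size                     # 10
puts index.include?("quick brown")  # true
puts subphrase_count(469)           # 110215
puts subphrase_count(1007)          # 507528
```

Going from 469 to 1007 words multiplies the phrase count by about 4.6 while each stored string also grows, which is consistent with VM use climbing much faster than the word count.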

Conclusion: if you have the RAM to spare, the Set-based approach is
quite speedy, but it gets greedy with memory as your full phrase base
grows. If you need to save memory and can spare the time, go with the
Trie-based approach.
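For comparison, here is a minimal sketch of how a word-level Trie could answer the same sub-phrase question, assuming every suffix of the document's word list is inserted (a suffix trie over words). The class name `WordTrie` and its methods are hypothetical; this is not the matcher that was benchmarked above. Shared prefixes share nodes, so a phrase is stored as a path of per-word Hash nodes rather than as a full String of its own.

```ruby
# Hypothetical word-level suffix trie: every suffix of the document's
# word list is inserted, so any contiguous sub-phrase is a path from
# the root. Each node is a Hash mapping a word to its child node.
class WordTrie
  def initialize(text)
    @root = {}
    words = text.downcase.scan(/[a-z']+/)
    words.each_index do |start|
      node = @root
      words[start..-1].each do |word|
        # Reuse the existing child when this prefix was already inserted.
        node = (node[word] ||= {})
      end
    end
  end

  # True if the phrase's words appear consecutively in the document.
  def include?(phrase)
    node = @root
    phrase.downcase.scan(/[a-z']+/).each do |word|
      node = node[word]
      return false if node.nil?
    end
    true
  end
end

trie = WordTrie.new("The quick brown fox jumps")
puts trie.include?("brown fox")  # true
puts trie.include?("quick fox")  # false
```

Lookups walk one Hash per word, which is slower than a single Set membership test but avoids storing every sub-phrase as a separate String.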



Now, having done all this work... if all you want is sub-phrase
matching, why not use a regexp?


469 words:
                               user     system      total        real
     create clean string:  0.010000   0.010000   0.020000 (  0.003050)
     run 100k matches:    10.750000   0.140000  10.890000 ( 15.839430)
     28MB of VM

1007 words:
                               user     system      total        real
     create clean string:  0.010000   0.010000   0.020000 (  0.432572)
     run 100k matches:    19.350000   0.200000  19.550000 ( 27.612700)
     28MB of VM



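[Slim:~/Desktop/Match Phrases] gavinkis% cat regexp.rb
require 'benchmark'

# Usage: ruby regexp.rb DOCUMENT PHRASE
# PHRASE should be lowercase, since the document text is downcased below.
cleaned = nil

# Match the phrase only on word boundaries; Regexp.escape keeps any
# regexp metacharacters in PHRASE from being interpreted as syntax.
matcher = Regexp.new( "\\b#{Regexp.escape( ARGV[1] )}\\b" )

Benchmark.bm( 20 ) do |x|
  x.report( "create clean string:" ) do
    # Lowercase the document, keep only runs of letters and apostrophes,
    # and rejoin them with single spaces.
    cleaned = IO.read( ARGV[0] ).downcase.scan( /[a-z']+/ ).join( ' ' )
  end
  x.report( "run 100k matches:" ) do
    # Each iteration runs two matches: PHRASE and a fixed test phrase.
    100_000.times do
      cleaned =~ matcher
      cleaned =~ /the brown fox/
    end
  end
end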
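The same regexp idea can be wrapped as a small reusable check. In this sketch the helper names `normalize` and `contains_phrase?` are hypothetical; it assumes you want the phrase cleaned exactly the same way as the document (lowercased, reduced to runs of letters and apostrophes, joined by single spaces), so that punctuation or capitalization in the query doesn't cause a miss.

```ruby
# Hypothetical helper: clean text the same way regexp.rb cleans the
# document (lowercase, keep runs of letters/apostrophes, single spaces).
def normalize(text)
  text.downcase.scan(/[a-z']+/).join(' ')
end

# Hypothetical helper: true if the normalized phrase occurs in the
# already-normalized text on word boundaries.
def contains_phrase?(cleaned_text, phrase)
  # \b anchors stop "brown fox" from matching inside "brown foxes".
  pattern = /\b#{Regexp.escape(normalize(phrase))}\b/
  pattern.match?(cleaned_text)
end

doc = normalize("The quick, brown fox -- it jumped!")
puts doc                                   # the quick brown fox it jumped
puts contains_phrase?(doc, "Brown Fox")    # true
puts contains_phrase?(doc, "brown foxes")  # false
```

Because the cleaned document is built once and each query is a single regexp scan, memory stays flat regardless of how many phrases you test, which matches the constant 28MB seen above.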