Hi --

On Wed, 18 Dec 2002, Shannon Fang wrote:

> Hi Algorithmists,
>
> I am writing a spell checker in ruby. The first step
> is to load the dictionary into memory. After some
> experiments, I found that the following simple code
> worked quite nicely:
>
> f=File.new("dict.txt")
> text=f.sysread(File.size("dict.txt"))
> lexicon=text.split(/\n/)
>
> On my 1.6G Pentium 4 running ruby 1.6.7, the total
> time used to load the dictionary is less than 0.3
> second. (The dictionary file is about 1 MB, with 90K
> records.)
>
> However, since the dictionary lookup operation will
> be quite heavy, I am thinking of using a hash instead
> of array. I tried the following code:
>
> f=File.new("dict.txt")
> text=f.sysread(File.size("dict.txt"))
> words=text.split(/\n/)
> lexicon={}
> words.each do |word|
> 	lexicon[word]=0
> end
>
> Disaster! It took me about 15 seconds to load the
> dictionary. The problem is that the #each method took
> too much time.

That's really puzzling.  I ran the same script in 0.23 s of real time,
using a dictionary with about 38K words, on a 1.4 GHz Pentium.
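One way to narrow this down is to time only the hash-building step, apart from the file read. Below is a minimal sketch that assumes the standard `benchmark` library. Since dict.txt isn't available here, the 90K entries are synthetic stand-ins (the "word#{i}" strings are hypothetical data). The building loop is the same `#each` loop from your post.

```ruby
require 'benchmark'

# Hypothetical stand-in for dict.txt: 90,000 synthetic words, one per line,
# roughly the size of the dictionary described above.
text = (1..90_000).map { |i| "word#{i}" }.join("\n")

lexicon = nil
elapsed = Benchmark.realtime do
  words = text.split("\n")
  lexicon = {}
  # Same approach as the original: one hash entry per word.
  words.each { |word| lexicon[word] = 0 }
end

puts format("built %d entries in %.3f s", lexicon.size, elapsed)

# Membership is now a hash lookup instead of a linear array scan.
puts lexicon.key?("word42")
```

If this finishes quickly on your machine, #each is probably not the culprit, and the slowdown is more likely tied to something in that particular setup. For pure yes/no membership tests, the standard `Set` class (`require 'set'`; `Set.new(words)`) is another option. It is backed by a Hash.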


David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav