Hi Algorithmists,

I am writing a spell checker in Ruby. The first step
is to load the dictionary into memory. After some
experiments, I found that the following simple code
works quite nicely:

f=File.new("dict.txt")
text=f.sysread(File.size("dict.txt"))
lexicon=text.split(/\n/)

On my 1.6 GHz Pentium 4 running Ruby 1.6.7, the total
time to load the dictionary is less than 0.3 seconds.
(The dictionary file is about 1 MB, with 90K records,
one word per line.)

However, since dictionary lookups will be quite
frequent, I am thinking of using a hash instead of an
array. I tried the following code:

f=File.new("dict.txt")
text=f.sysread(File.size("dict.txt"))
words=text.split(/\n/)
lexicon={}
words.each do |word|
	lexicon[word]=0
end

Disaster! It took about 15 seconds to load the
dictionary. The problem seems to be that the #each loop,
which inserts every word into the hash, takes almost all
of that time.
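
To check where the time actually goes, the two phases can be timed
separately. The following is only a minimal sketch. It assumes a current
Ruby with the standard Benchmark and Tempfile libraries, which may not be
available in this form on 1.6.7. The synthetic word list is a made-up
stand-in for dict.txt, and the variable names are for illustration only.

```ruby
require "benchmark"
require "tempfile"

# Synthetic stand-in for dict.txt: 90,000 distinct "words", one per line.
dict = Tempfile.new("dict")
90_000.times { |i| dict.puts("word#{i}") }
dict.close

words = nil
lexicon = {}

# Phase 1: read the whole file and split it into lines.
read_time = Benchmark.realtime do
  words = File.read(dict.path).split("\n")
end

# Phase 2: insert every word into the hash.
fill_time = Benchmark.realtime do
  words.each { |word| lexicon[word] = 0 }
end

puts "read+split: #{read_time}s, hash fill: #{fill_time}s, entries: #{lexicon.size}"

# Remove the temporary file.
dict.unlink
```

If the second number dominates, the cost is in the hash insertions rather
than in the file I/O or the split.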

I have 2 questions:

1. Is it worth using a hash instead of an array with
binary search?

2. If I do use a hash, how can I minimize the dictionary
load time? BTW, I tried converting the dict.txt file into
Ruby source, i.e., dictionary={'first'=>0,'second'=>0,
...}, but the system hangs when I eval(text) after the
sysread... :(
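
For concreteness, the two lookup strategies from question 1 can be
sketched side by side. This is an illustration only: the five-word list is
a hypothetical stand-in for the lines of dict.txt, in_sorted? is a made-up
helper name, and Array#bsearch is assumed to be available (it exists in
Ruby 2.0 and later, not in 1.6.7, where the search loop would have to be
written by hand).

```ruby
# Hypothetical small lexicon standing in for the lines of dict.txt.
words = %w[pear apple fig banana cherry]

# Option A: hash lookup, constant time per query on average.
lexicon = {}
words.each { |word| lexicon[word] = 0 }

# Option B: sorted array plus binary search, O(log n) per query.
sorted = words.sort

def in_sorted?(sorted, target)
  # Find-any mode of Array#bsearch: the block returns 0 on a match,
  # a positive number when the target sorts after the probed word,
  # and a negative number when it sorts before it.
  !sorted.bsearch { |word| target <=> word }.nil?
end

%w[fig grape].each do |query|
  puts "#{query}: hash=#{lexicon.key?(query)} bsearch=#{in_sorted?(sorted, query)}"
end
```

Both options give the same answers. The trade-off is load time and memory
for the hash against a slightly slower lookup for the sorted array, which
needs no extra work at load time if dict.txt is already sorted.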

Thanks a lot!
Shannon

