David,

After reading your results I thought I would try to make a couple of
simple changes. I attempted to clean up the 'insert' routine, since that
is where most of the processing time seemed to be spent. I also added
the ability to perform multi-term searching (individual terms or single
string). This will worsen the look-up times, but it might be a good
change.

If possible, could you run this version through your test to see how it
does?

class IndexHash
	def initialize( documents=nil )
		@index = Hash.new( [] )
		input( documents ) if documents
	end

	def input( documents )
		documents.each_pair do |symbol, contents|
			contents.split.each { |word| insert( symbol, word) }
		end
	end

	def insert( document_symbol, word )
		w = word.downcase
		@index[w] += [ document_symbol ] unless @index[w].include?( document_symbol )
	end

	def find( *strings )
		result = []
		strings.each do |string|
			string.split.each do |word|
				result += @index[ word.downcase ]
			end
		end
		result.uniq
	end

	def words
		@index.keys.sort
	end
end
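
A side note on the Hash.new( [] ) default used above, as a minimal
sketch: the default array is one shared object, so 'insert' is only
safe because it uses += (which builds a new array and stores it under
the key). Appending with << would change the shared default instead.
The keys and document symbols below are made up for illustration.

```ruby
# Hypothetical data; only the Hash.new( [] ) behaviour is the point here.
safe = Hash.new( [] )
safe["ruby"] += [ :doc1 ]    # builds a new array and stores it under "ruby"
safe["quiz"] += [ :doc2 ]

unsafe = Hash.new( [] )
unsafe["ruby"] << :doc1      # mutates the shared default, stores nothing
unsafe["quiz"] << :doc2

p safe.keys           # ["ruby", "quiz"]
p safe["missing"]     # []
p unsafe.keys         # []
p unsafe["missing"]   # [:doc1, :doc2]
```

So the class works as written, but swapping += for << in 'insert'
would quietly break it.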

class IndexBitmap
	def initialize( documents=nil )
		@index = []
		@documents = Hash.new( 0 )
		input( documents ) if documents
	end

	def input( documents )
		documents.each_pair do |symbol, contents|
			contents.split.each { |word| insert( symbol, word) }
		end
	end

	def insert( document_symbol, word )
		w = word.downcase
		@index.push( w ) unless @index.include?( w )
		@documents[ document_symbol ] |= (1<<@index.index( w ))
	end

	def find( *strings )
		result = []
		mask = 0

		strings.each do |string|
			string.split.each do |word|
				w = word.downcase
				mask |= (1<<@index.index(w)) if @index.index(w)
			end
		end

		@documents.each_pair do |symbol, value|
			# 0 is truthy in Ruby, so compare against zero explicitly.
			result.push( symbol ) if (value & mask) != 0
		end
		result
	end
	
	def words
		@index.sort
	end
end
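
For anyone skimming, here is a minimal standalone sketch of the bitmap
idea, not the class itself: each word gets a bit position, each
document holds an integer with the bits of its words set, and a search
ORs the term bits into a mask and keeps the documents that share a bit
with it. The word list, document symbols, and the mask_for helper are
hypothetical, made up for illustration. Note that the match test
compares against zero, since 0 counts as true in Ruby.

```ruby
# Hypothetical word list and document bitmaps, for illustration only.
words = %w[ruby quiz index]                 # bit 0, bit 1, bit 2
docs  = { :doc1 => 0b011, :doc2 => 0b100 }  # doc1: ruby, quiz; doc2: index

# Hypothetical helper: OR together the bits of every known term.
def mask_for( words, *terms )
	terms.inject( 0 ) do |mask, term|
		pos = words.index( term.downcase )
		pos ? mask | (1 << pos) : mask
	end
end

mask = mask_for( words, "Quiz" )

# A document matches when it shares at least one set bit with the mask.
hits = docs.select { |_, bits| (bits & mask) != 0 }.keys
p hits   # [:doc1]
```

An unknown term contributes no bits, so a search made up only of
unknown words leaves the mask at 0 and matches nothing.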