On Nov 28, 2004, at 11:11 AM, Wayne Vucenic wrote:

> The solution basically recursively calls clean? on the full array of
> words, then on the first half and on the second half, then on the
> first quarter, the second quarter, the third quarter, etc.  Whenever
> clean? returns true, it stops dividing that section of the array.

Below is my solution, which works the same way.

This "Divide and Conquer" approach is very effective when only a few 
words are banned, but I believe there are better approaches for many 
banned words.  Hopefully, we'll see one submitted...

> I then noticed that we don't really need to make all these calls to
> clean?, because we can sometimes deduce what the results will be
> without calling clean?.  For example,  if we call clean? on [0...8]
> and the result is false, then we call clean? on [0...4] and the result
> is true, then we know that clean? on [4...8] must return false, so we
> don't need to make this call.

Clever.  I didn't think of this.

> Here are the results with Jannis' testing classes.

[...]

> Words:         121430
> Banned Words:  3

[...]

> Words:         121430
> Banned Words:  126

[...]

> Words:         3930
> Banned Words:  29

[...]

> Words:         125282
> Banned Words:  80

[...]

> Words:         127367
> Banned Words:  10

I consider all of these tests to be in the "very few are blocked" 
category, from the quiz.  Anyone try anything closer to the suggested 
10% blocked from the quiz?

James Edward Gray II

#!/usr/bin/env ruby

#
# A simple class for managing a filter that prevents to use
# of a given _banned_words_ list.
#
class LanguageFilter
	#
	# Create a new LanguageFilter object that will
	# disallow _banned_words_.
	# Accepts a list of words, arrays of words,
	# or a combination of the two.
	#
	def initialize( *banned_words )
		@banned_words = banned_words.flatten.sort
		@clean_calls = 0
	end
	
	# A count of the calls to <i>clean?</i>.
	attr_reader :clean_calls
	
	#
	# Test if provided _text_ is allowable by this filter.
	# Returns *false* if _text_ contains _banned_words_,
	# *true* if it does not.
	#
	def clean?( text )
		@clean_calls += 1
		@banned_words.each do |word|
			return false if text =~ /\b#{word}\b/
		end
		true
	end
	
	#
	# Verify a _suspect_words_ list against the actual
	# _banned_words_ list.
	# Returns *false* if the two lists are not identical or
	# *true* if the lists do match.
	# Accepts a list of words, arrays of words,
	# or a combination of the two.
	#
	def verify( *suspect_words )
		suspect_words.flatten.sort == @banned_words
	end
end

# my algorithm
def isolate( list, test )
	if test.clean? list.join(" ")
		Array.new
	elsif list.size == 1
		list
	else
		left, right = list[0...(list.size / 2)], list[(list.size / 2)..-1]
		isolate(left, test) + isolate(right, test)
	end
end

# test code
words = ARGF.read.split " "
filter = LanguageFilter.new words.select { rand <= 0.01 }

start = Time.now
banned = isolate words, filter
time = Time.now - start

puts "#{words.size} words, #{banned.size} banned words found"
puts "Correct?  #{filter.verify banned}"
puts "Time taken: #{time} seconds"
puts "Calls: #{filter.clean_calls}"
puts "Words:"
puts banned.map { |word| "\t" + word }

__END__