Begin forwarded message:

> From: James Quick <jimquick / mac.com>
> Date: July 12, 2006 9:51:17 AM CDT
> To: submission / rubyquiz.com
> Subject: Please Forward: Ruby Quiz Submission
>
> I am not on the ruby-talk list, and would appreciate if you could  
> pass this
> along. This was a nice problem to get my feet wet in this language. I
> began looking at Ruby last week after hearing about what an old  
> colleague
> of mine was doing with it. Because I am familiar with several  
> languages
> with which it  shares some features (smalltalk, eiffel, objective- 
> c, perl) it
> is deceptively easy to pick up, and occasionally infuriating. Thank  
> you,
> James, for providing this excellent service to the community.  
> Sometimes
> it is essential to have throwaway problems to guide one's exploration.
>
> My first attempts were very classy (in a bad way) partitioning  
> things into separate
> classes, rigorously defining interfaces and accessor methods. They  
> were also
> very slow, and usually failed to converge on a solution.
>
> Then, by reading source code for a bunch of modules in /usr/lib/ 
> ruby, I found
> that mixing a few methods into the base classes was not only  
> sufficient for
> most of my uses but preferable. Profiling showed that much of my  
> time was
> being spent in each_byte loops so I ripped out the conversions to  
> and from
> Strings and replaced it with a byte comparison factory which stored  
> lambda
> functions. This way, I could pretend that I was producing arrays of  
> byte counts
> from string operations without actually doing them more than once.
>
> This was more than twice as fast as my previous attempts but was still
> not converging very quickly. I had already optimized my target_counts
> array by initializing it with complete information on what was  
> knowable
> about a solution before starting a search. Unfortunately during the  
> search,
> if several hundred thousand loops had occurred, I would do some sanity
> checking which proved to be insane. I converted the target counts  
> into a
> string then back into a count array. If the real result of the  
> sanity check was
> beyond the range of the target and source, I would munge one or more
> values into my target array, thus throwing out all the good work!
>
> It was not until I saw the excellent submissions by Simon Krer that  
> I realized
> that my sanity checking was demoting me to a random search. Optimal  
> random
> Robinizing is more like solving a rubix cube than like searching  
> randomly
> for text. If properly initialized the target and result are  
> mathematically linked.
> If you make any change to the target or source without making a  
> complementary
> change to the other, you will immediately devolve into a random  
> search.
> Doing so is the equivalent of occasionally removing a piece from the
> cube, twisting it, and putting it back. You may have made it  
> impossible
> to solve (unless by chance you randomly undo what you just did).
>
> Basically, the target contains a direct representation of the  
> pangram search
> space in the form of c->n. Each count expresses the assertion that  
> there be
> n occurrences of the character 'c' in the pangram. For a particular  
> value of
> n it may be a completely bogus assertion. All that matters is that  
> the assertion
> is congruent with the information stored in the result_string.
>
> I initialize my target and source with the exact values for  
> constant text
> and their spelled out occurrences. Simon does it far more simply at  
> the
> expense of a few more iterations (but his chainsaw has a much more
> powerful engine!).  He initializes his target (which he calls  
> guess) to
> an array of 1's indicating the invariant assumption that there will be
> 1 occurrence of each letter of the alphabet in the pangram.  His  
> result
> starts with an array of 1's (representing a-z from the itemized
> list n a's, n b's .... and n z's). He then adds the counts for  
> "and", the prefix
> string, and 26 copies of the word "one". The 26 copies of the word  
> "one"
> provide the required link which binds the guess and the "real" result
> together. As soon as he chooses a better value than 1 for a letter
> frequency he will subtract "one" and then add the spelled out version
> of the new choice.
>
> These 26 copies of "one", or the loop of calls to adder lambda  
> functions
> in my version, each perform an equivalent task. They ensure that the
> target and result are rigorously linked.
>
> From here on, despite their quite different implementations, our  
> solutions
> are essentially the same. Random Robinizing proceeds as follows:
> When the target and source differ, randomly choose a new value(n).
> Add or subtract that value from the source (plain integer arithmetic).
> In the destination, decrement the letter counts for each letter in the
> spelled out number (before the change), then increment the letter
> counts for the new spelled out number.
>
> As you can see, the source and destination have completely different
> operations performed on them. They are different but embody the
> essence of the pangram itself. A self referential sentence whose
> implementation (spelling) is identical to its semantic meaning
> (numerical commentary about the sentence).
>
> If you (like me) ran into a performance wall, and found that
> answers were converging slowly or not at all, take a step back.
> As soon as you break the underlying relationship between the
> target and destination, your program will stray into a random walk,
> which may contain long cycles of aimless wandering.
>
> I think there may be some bugs lurking here, though I think most are
> gone. I would appreciate any stylistic or other advice on best  
> practices.
> I alternated several times between studlyCaps and c style naming.
> I also realize that I have know idea about the relative performance
> tradeoffs between cacheing variables and evaluating method calls.
>
> I did see some fairly large performance boosts when variables are
> found first in the outer scope rather then being allocated and lost
> with each iteration.
>
> Finally, I am interested in learning where to look for up to date
> information on the language definition or syntax. I could only find
> something circa 1.6.
>
> The following code makes me realize that I am far from being
> proficient in this language. However, I hope the comments and
> the basic algorithm are clear enough to provide some useful
> insight into the problem domain.
>
> here is some sample output.
>
> As you can see 9 of the 26 letters are already solved before the  
> first pass
> through the loop. This neighborhood quickly converged. In retrospect
> I am now sure that the 'q' from my initials helped. Zebras are useful
> too.
> lili% ruby jqpangram.rb
> "--- 0 Wed Jul 12 10:32:42 EDT 2006"
> [7, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 2, 2, 1, 1, 1, 1, 1,  
> 1, 1, 1, 2]
> [7, 2, 2, 2, 24, 2, 2, 2, 2, 2, 1, 1, 3, 24, 28, 2, 2, 5, 12, 10,  
> 1, 2, 8, 1, 1, 2]
> "--- 10000 Wed Jul 12 10:32:47 EDT 2006"
> [7, 2, 2, 2, 28, 5, 3, 7, 8, 2, 1, 2, 3, 16, 17, 2, 2, 12, 31, 25,  
> 3, 7, 12, 3, 4, 2]
> [7, 2, 2, 2, 35, 5, 4, 8, 8, 2, 1, 3, 3, 16, 15, 2, 2, 10, 32, 26,  
> 2, 9, 13, 2, 4, 2]
> "--- 20000 Wed Jul 12 10:32:52 EDT 2006"
> [7, 2, 2, 2, 26, 6, 2, 4, 10, 2, 1, 1, 3, 22, 16, 2, 2, 10, 31, 26,  
> 2, 5, 14, 4, 5, 2]
> [7, 2, 2, 2, 21, 7, 2, 3, 9, 2, 1, 1, 3, 17, 20, 2, 2, 9, 31, 25,  
> 4, 4, 14, 5, 5, 2]
> "--- 30000 Wed Jul 12 10:32:57 EDT 2006"
> [7, 2, 2, 2, 28, 6, 4, 6, 10, 2, 1, 2, 3, 17, 16, 2, 2, 9, 30, 25,  
> 3, 5, 12, 3, 4, 2]
> [7, 2, 2, 2, 27, 6, 3, 6, 10, 2, 1, 2, 3, 16, 15, 2, 2, 10, 32, 24,  
> 3, 6, 12, 4, 4, 2]
> "--- 40000 Wed Jul 12 10:33:01 EDT 2006"
> [7, 2, 2, 2, 29, 8, 4, 8, 9, 2, 1, 3, 3, 15, 17, 2, 2, 12, 30, 23,  
> 4, 6, 12, 2, 4, 2]
> [7, 2, 2, 2, 28, 7, 4, 7, 9, 2, 1, 3, 3, 17, 16, 2, 2, 11, 30, 25,  
> 4, 5, 13, 2, 4, 2]
> "--- 50000 Wed Jul 12 10:33:06 EDT 2006"
> [7, 2, 2, 2, 30, 6, 2, 7, 10, 2, 1, 1, 3, 19, 17, 2, 2, 11, 30, 23,  
> 3, 6, 10, 4, 4, 2]
> [7, 2, 2, 2, 28, 4, 2, 6, 7, 2, 1, 2, 3, 19, 16, 2, 2, 11, 31, 23,  
> 3, 5, 10, 3, 4, 2]
> "--- 60000 Wed Jul 12 10:33:11 EDT 2006"
> [7, 2, 2, 2, 29, 7, 2, 6, 9, 2, 1, 3, 3, 19, 16, 2, 2, 11, 33, 23,  
> 4, 5, 12, 4, 4, 2]
> [7, 2, 2, 2, 31, 6, 2, 6, 9, 2, 1, 3, 3, 20, 16, 2, 2, 12, 31, 23,  
> 4, 6, 12, 3, 4, 2]
> "solution found in 69833 iterations"
> [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25,  
> 3, 5, 12, 4, 4, 2]
> [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25,  
> 3, 5, 12, 4, 4, 2]
> "mypan----------------------------------------"
> [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25,  
> 3, 5, 12, 4, 4, 2]
> "A pangram from jq contains one zebra, seven a's, two b's, two c's,  
> two d's, twenty-six e's, eight f's, three g's, six h's, ten i's,  
> two j's, one k, two l's, three m's, fifteen n's, sixteen o's, two  
> p's, two q's, ten r's, thirty-one s's, twenty-five t's, three u's,  
> five v's, twelve w's, four x's, four y's, and two z's."
> true
>
>
> cut here VVVV jqpangram.rb follows
> --------------------------------------
> #!/usr/local/bin/ruby -w
>
> class Integer
>     ONES = %w[ zero one two three four five six seven eight nine ]
>     TEEN = %w[ten eleven twelve thirteen fourteen fifteen
>                   sixteen seventeen eighteen nineteen ]
>     TENS = %w[ zero ten twenty thirty forty fifty
>                 sixty seventy eighty ninety ]
>
>     def to_en
>         d, n = self.divmod(10)
>         if d == 0
>             ONES[n]
>         elsif d == 1
>             TEEN[n]
>         elsif d < 10
>             TENS[d] + ((n > 0) ?  ("-" + ONES[n]) : "")
>         else
>             raise "Class Integer#to_en only goes to 99 not #{self}"
>         end
>     end
> end
>
> class String
>     # to_counts returns a 26 element array containing the counts of
>     # each letter in the input string. e.g. "add".to_counts ->  
> [1,0,0,2,0...]
>     # Only counting lower case for speed. Be sure to downcase.
>     def to_counts
>         counts = Array.new(26).fill(0)
>         each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end
>         counts
>     end
>
>     # similar to to_counts except adds the result into an existing  
> array.
>     def accum_counts(counts)
>         each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end
>     end
> end
>
> class Array
>     include Enumerable
>     # Return an array of textual pangram sections from a count_array
>     def pan_body()
>         self.zip((?a..?z).to_a).collect do |n, c|
>             sprintf(" %s%s %c%s",                   \
>                 (c==?z ? "and " : ""),              \
>                 n.to_en, c,                         \
>                 (n>1) ? "'s" : "")
>         end
>     end
>
>     # Turn a pangram text array into a string.
>     def pan_string()
>         pan_body().join(',') + "."
>     end
> end
>
> class CounterFactory
>     # The designated initializer is called with a pangram prefix  
> string
>     # e.g. "This string contains". It sets up a pair of hash tables
>     # filled with pattern counting functions and other initial values.
>     def initialize(withString=nil)
>         @add_word = {}
>         @subtract_word = {}
>         add_numbers('+', @add_word)
>         add_numbers('-', @subtract_word)
>         add_initial(withString) if withString != nil
>         add_numbers('+', @add_word)
>     end
>
>     # add_function takes an input string and creates lambda  
> functions which
>     # transform count arrays (like those made by to_counts).
>     # add_function(1, "one", "+", add_word) creates a function of  
> arity 1
>     # and adds it to the @add_word hash table with a key of 1.
>     # Calling this function with a count array increments the 'o',  
> 'n', and 'e'
>     # letter counts. If the input array was initially filled with  
> 0, then
>     # the content of that array is equivalent to "one".to_counts.
>     def add_function(key, aString, op, table)
>         counts = aString.downcase.to_counts
>         src = 'lambda { |targ| '
>         counts.each_with_index do |n,i|
>                 src += "targ[#{i}] #{op}= #{n}; " if (n > 0)
>         end
>         src += '}'
>         table[key] = eval(src)
>     end
>
>     # Add functions for each named number, with integer keys, to a  
> table.
>     def add_numbers(op, table)
>         #add_function(0, "no", op, table)
>         add_function(1, "one", op, table)
>         for n in (2..99)
>             add_function(n, n.to_en + "s", op, table)
>         end
>     end
>
>     # Add a function mapping the constant elements of a pangram to  
> a count array.
>     def add_initial(prefix)
>         # A pangram is : "prefix" + counts of each letter + "and"  
> before 'z'
>         known_string = prefix.downcase + ('a'..'z').to_a.join("") +  
> "and"
>         add_function(:initial, known_string, '+', @add_word)
>
>         # known_count[letter] is the count of each letter we know  
> must exist
>         known_count = known_string.to_counts
>
>         name_bytes = Array.new(26).fill(0)
>         (1..99).each { |n| n.to_en.accum_counts(name_bytes) }
>         in_names = name_bytes.collect {|n| n > 0}
>         # Now we have an array of booleans representing whether a  
> character
>         # is constant (only appearing in the prefix or enumerated  
> list)
>         # or is a variable quantity (because it also appears in  
> number names
>
>         # From the count of what we initially knew to be present,  
> return
>         # our initial target. This is our best initial guess for a  
> result
>         # which will converge quickly. Basically, anything which is
>         # not found in a name, is now known to occur a constant  
> number of
>         # times in the result. If it is in a name, I just punt and
>         # prepare to see it 1 time like Simon does.
>         @target_template = in_names.zip(known_count).collect do | 
> in_name, known|
>             if in_name
>                 1
>             else
>                 known
>             end
>         end
>
>         # Starting with the count of known contents add the letter  
> counts
>         # for the numbers we expect to see. The counts in the  
> target template
>         # indicate that we expect to see N occurences of that  
> (char) in the
>         # result. Thus if 7 a's are reflected in the target we must  
> have
>         # an associated set of bytes {seven's} in the result. The  
> following
>         # code initializes the result template with the static byte  
> counts
>         # that we know must occur in the result. Using the target  
> template as
>         # a guide, we then call the appropriate adder_function to  
> increment
>         # the counts for the spelled out numbers which MUST be  
> present in
>         # the result if our guess remains congruent. e.g. if the  
> target contained
>         # [7, 2, 1, 2]... we MUST increment the result_template by
>         # "sevens" "twos" "one" "twos"
>         @result_template = known_count.dup
>         @target_template.each do |n|
>             @add_word[n].call(@result_template)
>         end
>
>     end
>
>     # Make sure these don't get clobbered so return a copy
>     # Should these be dup'ed or frozen instead?
>     def target_template()
>         @target_template.dup
>     end
>     def result_template()
>         @result_template.dup
>     end
>
>     # This is syntactic sugar, taking any number of arguments
>     # or an array of arguments and calling all of them on a fresh  
> array.
>     def array_from_functions(*args)
>         counts = Array.new(26).fill(0)
>         for arg in args
>             if arg.is_a?(Array)
>                 arg.each do |o| @add_word[o].call(counts); end
>             else
>                 p arg
>                 @add_word[arg].call(counts);
>             end
>         end
>         counts
>     end
>
>     # Easy access to the function maps
>     attr_reader :add_word, :subtract_word
> end
>
> def pangram_main()
>     sallowPrefix = 'This pangram tallies'
>     sallowTarget = [5, 1, 1, 2, 28, 8, 6, 8, 13, 1, 1,
>         3, 2, 18, 15, 2, 1, 7, 25, 22, 4, 4, 9, 2, 4, 1];
>     prefix = "A pangram from jq contains one zebra,"
>
>     # Initialize the CounterFactory and grab the maps
>     cf = CounterFactory.new(prefix)
>     add_word=cf.add_word
>     subtract_word=cf.subtract_word
>
>     # Set up the target and result arrays.
>     # Though these will converge eventually these are not  
> interchangeable!
>     # They are complementary not equivalent. The target initially  
> contains
>     # a set of small integers. Each n refers to the expectation of  
> finding
>     # N of a particular character in the output.  Each N == 1  
> indicates
>     # that we have no idea how many occur in the end result.
>     # Any fortunate N > 1 means, we know exactly how may will be in  
> the
>     # final tally. The result_count entries contain sums for each byte
>     # we know we will encounter and also the sum for the spelled out
>     # numbers reflected in out target_count array.
>     #
>     # These two structure mirror what a pangram is.
>     # The target reflects the semantic intent ->
>     # It (the pangram) will have: 1 a, 2 b's...
>     # The result reflects the mechanical instantiation
>     # "This pangram sentence has one a, two b's..."
>     target_count = cf.target_template
>     result_count = cf.result_template
>
>     delta, i, f, target, result = nil
>     loopcount = 0
>     different = true
>
>     # We will loop until we converge
>     while different
>         if loopcount % 10000 == 0
>             p "--- #{loopcount} " + Time.new.to_s
>             p target_count, result_count
>         end
>         loopcount += 1
>
>         # Each time through, visit and compare each target and  
> result element
>         # While they differ, we will change our guess by updating the
>         # counts in the target_array. Change a guess by increasing
>         # or decreasing the expected occurence of a single character.
>         # Reflect that change in the result by adding and subtracting
>         # multiple counts in the result corresponding to spelled out
>         # numbers. For instance if target_count[i] is changed from
>         # 7 to 12 we decrement the result buckets for the bytes in
>         # 'sevens' and increment by the bytes in 'twelves'. The 'seve'
>         # changes are a wash so we have 1 less n and one more t, w,  
> and l
>         # in the result.
>
>         different = false
>
>         # If the target and result differ make
>         for i in (0..25)
>             target = target_count[i]
>             result = result_count[i]
>
>             if target != result
>                 delta = rand((target - result).abs + 1)
>
>                 begin
>                     subtract_word[target].call(result_count)
>                     if target < result
>                         target = target + delta
>                     else
>                         target = result + delta
>                     end
>                     add_word[target].call(result_count)
>                     target_count[i] = target
>                     different = true
>                 rescue
>                     # I'm not sure how to handle this yet.
>                     # A while ago I was getting errors with some
>                     # pangram prefix strings. The calculations
>                     # diverged, causing calls to nonexistent
>                     # adders or subtractors: with keys 0, -1, -2...
>                     # instead of fixnums between 1 and 99
>                     # This usually happens only after 500K or so  
> iterations.
>                     raise
>                 end
>
>             end
>         end
>     end
>
>     p "solution found in #{loopcount} iterations"
>     p target_count, result_count
>     mypan = prefix + target_count.pan_string
>     t2=(mypan.downcase).to_counts
>     p "mypan" +  "--"*20
>     p t2, mypan
>     p t2 == target_count
> end
>
> if __FILE__ == $0
>     pangram_main()
> end
> #[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31,  
> 25, 3, 5, 12, 4, 4, 2]
> #panstring="A pangram from jq contains one zebra, seven a's, two  
> b's, two c's, \
> #two d's, twenty-six e's, eight f's, three g's, six h's, ten i's,  
> two j's, one k, \
> #two l's, three m's, fifteen n's, sixteen o's, two p's, two q's,  
> ten r's, \
> #thirty-one s's, twenty-five t's, three u's, five v's, twelve w's,  
> four x's, \
> #four y's, and two z's."
> #true
>
>