Begin forwarded message: > From: James Quick <jimquick / mac.com> > Date: July 12, 2006 9:51:17 AM CDT > To: submission / rubyquiz.com > Subject: Please Forward: Ruby Quiz Submission > > I am not on the ruby-talk list, and would appreciate if you could > pass this > along. This was a nice problem to get my feet wet in this language. I > began looking at Ruby last week after hearing about what an old > colleague > of mine was doing with it. Because I am familiar with several > languages > with which it shares some features (smalltalk, eiffel, objective- > c, perl) it > is deceptively easy to pick up, and occasionally infuriating. Thank > you, > James, for providing this excellent service to the community. > Sometimes > it is essential to have throwaway problems to guide one's exploration. > > My first attempts were very classy (in a bad way) partitioning > things into separate > classes, rigorously defining interfaces and accessor methods. They > were also > very slow, and usually failed to converge on a solution. > > Then, by reading source code for a bunch of modules in /usr/lib/ > ruby, I found > that mixing a few methods into the base classes was not only > sufficient for > most of my uses but preferable. Profiling showed that much of my > time was > being spent in each_byte loops so I ripped out the conversions to > and from > Strings and replaced it with a byte comparison factory which stored > lambda > functions. This way, I could pretend that I was producing arrays of > byte counts > from string operations without actually doing them more than once. > > This was more than twice as fast as my previous attempts but was still > not converging very quickly. I had already optimized my target_counts > array by initializing it with complete information on what was > knowable > about a solution before starting a search. Unfortunately during the > search, > if several hundred thousand loops had occurred, I would do some sanity > checking which proved to be insane. I converted the target counts > into a > string then back into a count array. If the real result of the > sanity check was > beyond the range of the target and source, I would munge one or more > values into my target array, thus throwing out all the good work! > > It was not until I saw the excellent submissions by Simon Krer that > I realized > that my sanity checking was demoting me to a random search. Optimal > random > Robinizing is more like solving a rubix cube than like searching > randomly > for text. If properly initialized the target and result are > mathematically linked. > If you make any change to the target or source without making a > complementary > change to the other, you will immediately devolve into a random > search. > Doing so is the equivalent of occasionally removing a piece from the > cube, twisting it, and putting it back. You may have made it > impossible > to solve (unless by chance you randomly undo what you just did). > > Basically, the target contains a direct representation of the > pangram search > space in the form of c->n. Each count expresses the assertion that > there be > n occurrences of the character 'c' in the pangram. For a particular > value of > n it may be a completely bogus assertion. All that matters is that > the assertion > is congruent with the information stored in the result_string. > > I initialize my target and source with the exact values for > constant text > and their spelled out occurrences. Simon does it far more simply at > the > expense of a few more iterations (but his chainsaw has a much more > powerful engine!). He initializes his target (which he calls > guess) to > an array of 1's indicating the invariant assumption that there will be > 1 occurrence of each letter of the alphabet in the pangram. His > result > starts with an array of 1's (representing a-z from the itemized > list n a's, n b's .... and n z's). He then adds the counts for > "and", the prefix > string, and 26 copies of the word "one". The 26 copies of the word > "one" > provide the required link which binds the guess and the "real" result > together. As soon as he chooses a better value than 1 for a letter > frequency he will subtract "one" and then add the spelled out version > of the new choice. > > These 26 copies of "one", or the loop of calls to adder lambda > functions > in my version, each perform an equivalent task. They ensure that the > target and result are rigorously linked. > > From here on, despite their quite different implementations, our > solutions > are essentially the same. Random Robinizing proceeds as follows: > When the target and source differ, randomly choose a new value(n). > Add or subtract that value from the source (plain integer arithmetic). > In the destination, decrement the letter counts for each letter in the > spelled out number (before the change), then increment the letter > counts for the new spelled out number. > > As you can see, the source and destination have completely different > operations performed on them. They are different but embody the > essence of the pangram itself. A self referential sentence whose > implementation (spelling) is identical to its semantic meaning > (numerical commentary about the sentence). > > If you (like me) ran into a performance wall, and found that > answers were converging slowly or not at all, take a step back. > As soon as you break the underlying relationship between the > target and destination, your program will stray into a random walk, > which may contain long cycles of aimless wandering. > > I think there may be some bugs lurking here, though I think most are > gone. I would appreciate any stylistic or other advice on best > practices. > I alternated several times between studlyCaps and c style naming. > I also realize that I have know idea about the relative performance > tradeoffs between cacheing variables and evaluating method calls. > > I did see some fairly large performance boosts when variables are > found first in the outer scope rather then being allocated and lost > with each iteration. > > Finally, I am interested in learning where to look for up to date > information on the language definition or syntax. I could only find > something circa 1.6. > > The following code makes me realize that I am far from being > proficient in this language. However, I hope the comments and > the basic algorithm are clear enough to provide some useful > insight into the problem domain. > > here is some sample output. > > As you can see 9 of the 26 letters are already solved before the > first pass > through the loop. This neighborhood quickly converged. In retrospect > I am now sure that the 'q' from my initials helped. Zebras are useful > too. > lili% ruby jqpangram.rb > "--- 0 Wed Jul 12 10:32:42 EDT 2006" > [7, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1, 1, 2, 2, 1, 1, 1, 1, 1, > 1, 1, 1, 2] > [7, 2, 2, 2, 24, 2, 2, 2, 2, 2, 1, 1, 3, 24, 28, 2, 2, 5, 12, 10, > 1, 2, 8, 1, 1, 2] > "--- 10000 Wed Jul 12 10:32:47 EDT 2006" > [7, 2, 2, 2, 28, 5, 3, 7, 8, 2, 1, 2, 3, 16, 17, 2, 2, 12, 31, 25, > 3, 7, 12, 3, 4, 2] > [7, 2, 2, 2, 35, 5, 4, 8, 8, 2, 1, 3, 3, 16, 15, 2, 2, 10, 32, 26, > 2, 9, 13, 2, 4, 2] > "--- 20000 Wed Jul 12 10:32:52 EDT 2006" > [7, 2, 2, 2, 26, 6, 2, 4, 10, 2, 1, 1, 3, 22, 16, 2, 2, 10, 31, 26, > 2, 5, 14, 4, 5, 2] > [7, 2, 2, 2, 21, 7, 2, 3, 9, 2, 1, 1, 3, 17, 20, 2, 2, 9, 31, 25, > 4, 4, 14, 5, 5, 2] > "--- 30000 Wed Jul 12 10:32:57 EDT 2006" > [7, 2, 2, 2, 28, 6, 4, 6, 10, 2, 1, 2, 3, 17, 16, 2, 2, 9, 30, 25, > 3, 5, 12, 3, 4, 2] > [7, 2, 2, 2, 27, 6, 3, 6, 10, 2, 1, 2, 3, 16, 15, 2, 2, 10, 32, 24, > 3, 6, 12, 4, 4, 2] > "--- 40000 Wed Jul 12 10:33:01 EDT 2006" > [7, 2, 2, 2, 29, 8, 4, 8, 9, 2, 1, 3, 3, 15, 17, 2, 2, 12, 30, 23, > 4, 6, 12, 2, 4, 2] > [7, 2, 2, 2, 28, 7, 4, 7, 9, 2, 1, 3, 3, 17, 16, 2, 2, 11, 30, 25, > 4, 5, 13, 2, 4, 2] > "--- 50000 Wed Jul 12 10:33:06 EDT 2006" > [7, 2, 2, 2, 30, 6, 2, 7, 10, 2, 1, 1, 3, 19, 17, 2, 2, 11, 30, 23, > 3, 6, 10, 4, 4, 2] > [7, 2, 2, 2, 28, 4, 2, 6, 7, 2, 1, 2, 3, 19, 16, 2, 2, 11, 31, 23, > 3, 5, 10, 3, 4, 2] > "--- 60000 Wed Jul 12 10:33:11 EDT 2006" > [7, 2, 2, 2, 29, 7, 2, 6, 9, 2, 1, 3, 3, 19, 16, 2, 2, 11, 33, 23, > 4, 5, 12, 4, 4, 2] > [7, 2, 2, 2, 31, 6, 2, 6, 9, 2, 1, 3, 3, 20, 16, 2, 2, 12, 31, 23, > 4, 6, 12, 3, 4, 2] > "solution found in 69833 iterations" > [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, > 3, 5, 12, 4, 4, 2] > [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, > 3, 5, 12, 4, 4, 2] > "mypan----------------------------------------" > [7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, 25, > 3, 5, 12, 4, 4, 2] > "A pangram from jq contains one zebra, seven a's, two b's, two c's, > two d's, twenty-six e's, eight f's, three g's, six h's, ten i's, > two j's, one k, two l's, three m's, fifteen n's, sixteen o's, two > p's, two q's, ten r's, thirty-one s's, twenty-five t's, three u's, > five v's, twelve w's, four x's, four y's, and two z's." > true > > > cut here VVVV jqpangram.rb follows > -------------------------------------- > #!/usr/local/bin/ruby -w > > class Integer > ONES = %w[ zero one two three four five six seven eight nine ] > TEEN = %w[ten eleven twelve thirteen fourteen fifteen > sixteen seventeen eighteen nineteen ] > TENS = %w[ zero ten twenty thirty forty fifty > sixty seventy eighty ninety ] > > def to_en > d, n = self.divmod(10) > if d == 0 > ONES[n] > elsif d == 1 > TEEN[n] > elsif d < 10 > TENS[d] + ((n > 0) ? ("-" + ONES[n]) : "") > else > raise "Class Integer#to_en only goes to 99 not #{self}" > end > end > end > > class String > # to_counts returns a 26 element array containing the counts of > # each letter in the input string. e.g. "add".to_counts -> > [1,0,0,2,0...] > # Only counting lower case for speed. Be sure to downcase. > def to_counts > counts = Array.new(26).fill(0) > each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end > counts > end > > # similar to to_counts except adds the result into an existing > array. > def accum_counts(counts) > each_byte do |b| counts[b - ?a] += 1 if b >= ?a && b <= ?z end > end > end > > class Array > include Enumerable > # Return an array of textual pangram sections from a count_array > def pan_body() > self.zip((?a..?z).to_a).collect do |n, c| > sprintf(" %s%s %c%s", \ > (c==?z ? "and " : ""), \ > n.to_en, c, \ > (n>1) ? "'s" : "") > end > end > > # Turn a pangram text array into a string. > def pan_string() > pan_body().join(',') + "." > end > end > > class CounterFactory > # The designated initializer is called with a pangram prefix > string > # e.g. "This string contains". It sets up a pair of hash tables > # filled with pattern counting functions and other initial values. > def initialize(withString=nil) > @add_word = {} > @subtract_word = {} > add_numbers('+', @add_word) > add_numbers('-', @subtract_word) > add_initial(withString) if withString != nil > add_numbers('+', @add_word) > end > > # add_function takes an input string and creates lambda > functions which > # transform count arrays (like those made by to_counts). > # add_function(1, "one", "+", add_word) creates a function of > arity 1 > # and adds it to the @add_word hash table with a key of 1. > # Calling this function with a count array increments the 'o', > 'n', and 'e' > # letter counts. If the input array was initially filled with > 0, then > # the content of that array is equivalent to "one".to_counts. > def add_function(key, aString, op, table) > counts = aString.downcase.to_counts > src = 'lambda { |targ| ' > counts.each_with_index do |n,i| > src += "targ[#{i}] #{op}= #{n}; " if (n > 0) > end > src += '}' > table[key] = eval(src) > end > > # Add functions for each named number, with integer keys, to a > table. > def add_numbers(op, table) > #add_function(0, "no", op, table) > add_function(1, "one", op, table) > for n in (2..99) > add_function(n, n.to_en + "s", op, table) > end > end > > # Add a function mapping the constant elements of a pangram to > a count array. > def add_initial(prefix) > # A pangram is : "prefix" + counts of each letter + "and" > before 'z' > known_string = prefix.downcase + ('a'..'z').to_a.join("") + > "and" > add_function(:initial, known_string, '+', @add_word) > > # known_count[letter] is the count of each letter we know > must exist > known_count = known_string.to_counts > > name_bytes = Array.new(26).fill(0) > (1..99).each { |n| n.to_en.accum_counts(name_bytes) } > in_names = name_bytes.collect {|n| n > 0} > # Now we have an array of booleans representing whether a > character > # is constant (only appearing in the prefix or enumerated > list) > # or is a variable quantity (because it also appears in > number names > > # From the count of what we initially knew to be present, > return > # our initial target. This is our best initial guess for a > result > # which will converge quickly. Basically, anything which is > # not found in a name, is now known to occur a constant > number of > # times in the result. If it is in a name, I just punt and > # prepare to see it 1 time like Simon does. > @target_template = in_names.zip(known_count).collect do | > in_name, known| > if in_name > 1 > else > known > end > end > > # Starting with the count of known contents add the letter > counts > # for the numbers we expect to see. The counts in the > target template > # indicate that we expect to see N occurences of that > (char) in the > # result. Thus if 7 a's are reflected in the target we must > have > # an associated set of bytes {seven's} in the result. The > following > # code initializes the result template with the static byte > counts > # that we know must occur in the result. Using the target > template as > # a guide, we then call the appropriate adder_function to > increment > # the counts for the spelled out numbers which MUST be > present in > # the result if our guess remains congruent. e.g. if the > target contained > # [7, 2, 1, 2]... we MUST increment the result_template by > # "sevens" "twos" "one" "twos" > @result_template = known_count.dup > @target_template.each do |n| > @add_word[n].call(@result_template) > end > > end > > # Make sure these don't get clobbered so return a copy > # Should these be dup'ed or frozen instead? > def target_template() > @target_template.dup > end > def result_template() > @result_template.dup > end > > # This is syntactic sugar, taking any number of arguments > # or an array of arguments and calling all of them on a fresh > array. > def array_from_functions(*args) > counts = Array.new(26).fill(0) > for arg in args > if arg.is_a?(Array) > arg.each do |o| @add_word[o].call(counts); end > else > p arg > @add_word[arg].call(counts); > end > end > counts > end > > # Easy access to the function maps > attr_reader :add_word, :subtract_word > end > > def pangram_main() > sallowPrefix = 'This pangram tallies' > sallowTarget = [5, 1, 1, 2, 28, 8, 6, 8, 13, 1, 1, > 3, 2, 18, 15, 2, 1, 7, 25, 22, 4, 4, 9, 2, 4, 1]; > prefix = "A pangram from jq contains one zebra," > > # Initialize the CounterFactory and grab the maps > cf = CounterFactory.new(prefix) > add_word=cf.add_word > subtract_word=cf.subtract_word > > # Set up the target and result arrays. > # Though these will converge eventually these are not > interchangeable! > # They are complementary not equivalent. The target initially > contains > # a set of small integers. Each n refers to the expectation of > finding > # N of a particular character in the output. Each N == 1 > indicates > # that we have no idea how many occur in the end result. > # Any fortunate N > 1 means, we know exactly how may will be in > the > # final tally. The result_count entries contain sums for each byte > # we know we will encounter and also the sum for the spelled out > # numbers reflected in out target_count array. > # > # These two structure mirror what a pangram is. > # The target reflects the semantic intent -> > # It (the pangram) will have: 1 a, 2 b's... > # The result reflects the mechanical instantiation > # "This pangram sentence has one a, two b's..." > target_count = cf.target_template > result_count = cf.result_template > > delta, i, f, target, result = nil > loopcount = 0 > different = true > > # We will loop until we converge > while different > if loopcount % 10000 == 0 > p "--- #{loopcount} " + Time.new.to_s > p target_count, result_count > end > loopcount += 1 > > # Each time through, visit and compare each target and > result element > # While they differ, we will change our guess by updating the > # counts in the target_array. Change a guess by increasing > # or decreasing the expected occurence of a single character. > # Reflect that change in the result by adding and subtracting > # multiple counts in the result corresponding to spelled out > # numbers. For instance if target_count[i] is changed from > # 7 to 12 we decrement the result buckets for the bytes in > # 'sevens' and increment by the bytes in 'twelves'. The 'seve' > # changes are a wash so we have 1 less n and one more t, w, > and l > # in the result. > > different = false > > # If the target and result differ make > for i in (0..25) > target = target_count[i] > result = result_count[i] > > if target != result > delta = rand((target - result).abs + 1) > > begin > subtract_word[target].call(result_count) > if target < result > target = target + delta > else > target = result + delta > end > add_word[target].call(result_count) > target_count[i] = target > different = true > rescue > # I'm not sure how to handle this yet. > # A while ago I was getting errors with some > # pangram prefix strings. The calculations > # diverged, causing calls to nonexistent > # adders or subtractors: with keys 0, -1, -2... > # instead of fixnums between 1 and 99 > # This usually happens only after 500K or so > iterations. > raise > end > > end > end > end > > p "solution found in #{loopcount} iterations" > p target_count, result_count > mypan = prefix + target_count.pan_string > t2=(mypan.downcase).to_counts > p "mypan" + "--"*20 > p t2, mypan > p t2 == target_count > end > > if __FILE__ == $0 > pangram_main() > end > #[7, 2, 2, 2, 26, 8, 3, 6, 10, 2, 1, 2, 3, 15, 16, 2, 2, 10, 31, > 25, 3, 5, 12, 4, 4, 2] > #panstring="A pangram from jq contains one zebra, seven a's, two > b's, two c's, \ > #two d's, twenty-six e's, eight f's, three g's, six h's, ten i's, > two j's, one k, \ > #two l's, three m's, fifteen n's, sixteen o's, two p's, two q's, > ten r's, \ > #thirty-one s's, twenty-five t's, three u's, five v's, twelve w's, > four x's, \ > #four y's, and two z's." > #true > >