Great quiz. It will be interesting to see how others have solved this problem.
Here is my submission.

To use it, run the following:

$ cat <some_text_file> | ./rand_text.rb

Or you can give options:

$ cat <some_text_file> | ./rand_text.rb -o 2 -n 10

-o : the order, which is the number of previous words to consider
-n : the number of sentences to output

I used a hash of arrays to keep track of the possible state transitions. The key is the current state (the last 'n' words), and the array holds the possible next words. When generating the output, I randomly select elements from this array. I always start with the first 'n' words of the original text, where 'n' is the order.

Here is a sample output where <some_text_file> is Moby Dick, using the default parameters of order = 2 and number of sentences = 10.

==
Call me Ishmael.
Some years ago--never mind how long precisely --having little or no money in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every kind whatsoever.
It is a damp, drizzly November in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the mummies of those creatures in their huge bake-houses the pyramids.
No, when I go to sea, I go to sea as a Commodore, or a Captain, or a Cook.
I abandon the glory and distinction of such offices to those who like them.
For my part, I abominate all honorable respectable toils, trials, and tribulations of every kind whatsoever.
It is quite as much as I can.
This is my substitute for pistol and ball.
With a philosophical flourish Cato throws himself upon his sword; I quietly take to the royal mast-head.
True, they rather order me about--however they may thump and punch me about, I have of driving off the spleen, and regulating the circulation
==

On 4/7/06, Ruby Quiz <james / grayproductions.net> wrote:
>
> The three rules of Ruby Quiz:
>
> 1. Please do not post any solutions or spoiler discussion for this quiz until
> 48 hours have passed from the time on this message.
>
> 2. Support Ruby Quiz by submitting ideas as often as you can:
>
> http://www.rubyquiz.com/
>
> 3. Enjoy!
>
> Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
> on Ruby Talk follow the discussion.
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> This week's Ruby Quiz is about text generation. That's right, we're going to
> teach your computer to weave tall tales.
>
> At its most basic level, a solution might be:
>
>     >> (1..30).map { (("a".."z").to_a + [" "] * 10)[rand(36)] }.join
>     => "fb mcr hhluesjbhtf swm eehokmi"
>
> However, let's make our goal to get as close to English looking sentences as
> possible. One way you might do this is using a technique called Markov Chains.
>
> To use Markov Chains, you read some text document(s), making note of which
> characters commonly follow which characters or which words commonly follow other
> words (it works for either scale). Then, when generating text, you just select
> a character or word to output, based on the characters or words that came before
> it.
>
> The number of previous characters or words considered is called the "order" and
> you can adjust that to try and find a natural feel. For example, here is some
> generated text using a second order word chain derived from the Sherlock Holmes
> novel "The Hound of the Baskervilles" by Arthur Conan Doyle:
>
>     The stars shone cold and bright, while a crushing weight of responsibility
>     from my shoulders. Suddenly my thoughts with sadness. Then on the lady's
>     face. "What can I assist you?"
>
> If you need text's to prime your program with, I suggest searching Project
> Gutenberg:
>
>     http://www.gutenberg.org/
>

Content-Disposition: attachment; filename="rand_text.rb"

#!/cygdrive/c/ruby/bin/ruby

num_sentences = 10 # number of sentences to output
order = 2          # order: next word depends on n previous words

while ARGV.length > 0 do
  arg = ARGV.shift
  if arg == '-o'
    order = ARGV.shift.to_i
  elsif arg == '-n'
    num_sentences = ARGV.shift.to_i
  end
end

# Hash key:   string which has the last n words
#      value: an array of the next words that are seen in the text
#             given the last n (= order) words in the key
hash = {}

# last_n is an array of the last n words seen while processing the text
last_n = []

# these are the first n words seen
save_first_n = []

# This loop reads from stdin line by line
#
# Some obvious non-text characters are removed.
# Multiple punctuations are also removed.
#
# Otherwise, all punctuation is kept.
#
# White space is stripped
ARGF.each_line do |line|

  # When special characters like <> and {} and [] are encountered,
  # they and the contents between them are removed.
  line.gsub!(/<[^>]*>/,"")
  line.gsub!(/<[^>]*$/,"")
  line.gsub!(/^[^<]*>/,"")
  line.gsub!(/\[[^\]]*\]/,"")
  line.gsub!(/\[[^\]]*$/,"")
  line.gsub!(/^[^\[]*\]/,"")
  line.gsub!(/\{[^\}]*\}/,"")
  line.gsub!(/\{[^\}]*$/,"")
  line.gsub!(/^[^\{]*\}/,"")

  # Also remove multiple consecutive punctuation.
  # I don't think this is allowed in the English language (except for 3 dots)
  line.gsub!(/[!?|$:;'][!?|$:;']+/,"")
  line.gsub!(/\.\.\.\.+/,"") # remove four or more dots
  # remove two dots, keeping the characters on either side of them
  line.gsub!(/([^.]|^)\.\.($|[^.])/, '\1\2')

  line.strip!
  words = line.split
  words.each do |word|
    word.strip!
    if last_n.length == order
      # we've accumulated order # of words
      # Now we can store the transition to the next word into our hash
      last_n_str = last_n.join(" ")
      if hash[last_n_str]
        hash[last_n_str] << word
      else
        hash[last_n_str] = [word]
      end

      # Pop the first element in the last_n queue to make room for the next word
      last_n.shift
    else
      # Save the first n words; this will be our starting seed
      save_first_n << word
    end
    last_n << word
  end
end

# Initial starting point is the first n words
last_n = save_first_n
print save_first_n.join(" ") + " "

# Print 'num' sentences
# Sentences are ended when one of "!?." is encountered.
sentence_cnt = 0
while sentence_cnt < num_sentences do
  last_n_str = last_n.join(" ")
  if hash[last_n_str] != nil
    # Randomly select the next word given the last n words
    word = hash[last_n_str][rand(hash[last_n_str].length)]
    print word
    if word =~ /[!?.]$/
      # if word ends with one of "!?." then it is the last word of a sentence
      sentence_cnt += 1
      puts
    else
      print " "
    end
  else
    # Dead end: this state has no recorded successor, so stop here
    # instead of looping forever.
    break
  end
  last_n.shift
  last_n << word
end
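
The hash-of-arrays idea described in the message can also be looked at in isolation. What follows is only a minimal sketch, not the attached script: the method names build_transitions and generate, the max_words cap (used here instead of counting sentences), and the toy input sentence are all assumptions made for illustration. Duplicate successors are kept in each array, so a word that follows a state more often is also picked more often.

```ruby
# Build the transition table (hypothetical helper, not part of rand_text.rb).
# key:   the last `order` words joined by a space
# value: array of every word seen right after that key, duplicates included
def build_transitions(words, order)
  table = {}
  words.each_cons(order + 1) do |window|
    state = window[0...order].join(" ")
    (table[state] ||= []) << window[order]
  end
  table
end

# Walk the table starting from seed_words, emitting at most max_words words.
# rng is injectable so that a run can be made repeatable.
def generate(table, seed_words, max_words, rng = Random.new)
  state = seed_words.dup
  out = seed_words.dup
  while out.length < max_words
    successors = table[state.join(" ")]
    break if successors.nil? # dead end: this state was never followed by a word
    word = successors[rng.rand(successors.length)]
    out << word
    state.shift
    state << word
  end
  out.join(" ")
end

# Toy input, assumed for illustration only.
words = "the cat sat on the mat and the cat ran".split
table = build_transitions(words, 1)
puts generate(table, words.first(1), 8, Random.new(42))
```

With order 1, the key "the" maps to ["cat", "mat", "cat"], so "cat" is twice as likely as "mat" to follow it. The word "ran" never appears as a key, which is the dead-end case the generator has to stop on.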