Great quiz. It will be interesting to see how others have solved this
problem.
Here is my submission.

To use it, run the following:

$ cat <some_text_file> | ./rand_text.rb

Or you can pass options:

$ cat <some_text_file> | ./rand_text.rb -o 2 -n 10

-o : the order, i.e. the number of previous words to consider
-n : the number of sentences to output
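For comparison, the same two flags could be handled with Ruby's standard
optparse library. This is a minimal sketch, not what rand_text.rb does: the
parse_options name and the hash it returns are assumptions, and the defaults
(order 2, 10 sentences) are the ones the script uses.

```ruby
require 'optparse'

# Hypothetical helper: parse -o (order) and -n (sentences), falling back
# to the defaults of order 2 and 10 sentences.
def parse_options(argv)
  options = { order: 2, sentences: 10 }
  OptionParser.new do |opts|
    opts.banner = "Usage: rand_text.rb [-o ORDER] [-n SENTENCES] < file"
    opts.on("-o ORDER", Integer, "number of previous words to consider") do |v|
      options[:order] = v
    end
    opts.on("-n SENTENCES", Integer, "number of sentences to output") do |v|
      options[:sentences] = v
    end
  end.parse!(argv)
  options
end

# parse_options(%w[-o 3 -n 5]) returns a hash with order 3 and sentences 5.
```

Unlike a bare to_i, OptionParser's Integer type rejects a non-numeric
argument by raising OptionParser::InvalidArgument instead of silently using 0.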

I used a hash of arrays to keep track of the possible state transitions.
The key is the current state (the last 'n' words joined into a string), and
the array holds the possible next words. When generating the output I
randomly select elements from this array. I always start with the first 'n'
words of the original text, where 'n' is the order.
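The hash-of-arrays idea can be sketched as two small functions. This is a
hypothetical restatement, not the attached script: build_chain and generate
are names invented here, the input is assumed to be already split into words,
and generation stops at a dead end (a state with no recorded successor).

```ruby
# Hypothetical helper: map each run of `order` words (joined by " ") to
# every word that followed it in the text. Duplicates are kept on purpose.
def build_chain(words, order)
  chain = {}
  words.each_cons(order + 1) do |window|
    key = window.first(order).join(" ")
    (chain[key] ||= []) << window.last
  end
  chain
end

# Hypothetical helper: start from the seed words and append random
# successors until `num_sentences` words ending in ".", "!" or "?" have
# been emitted, or until the current state has no recorded successor.
def generate(chain, seed, num_sentences, rng = Random.new)
  order = seed.length
  last_n = seed.dup
  out = seed.dup
  sentences = 0
  while sentences < num_sentences
    choices = chain[last_n.join(" ")]
    break if choices.nil? # dead end
    word = choices.sample(random: rng)
    out << word
    sentences += 1 if word.end_with?(".", "!", "?")
    last_n = (last_n << word).last(order) # slide the window forward
  end
  out.join(" ")
end

words = "I am. You are.".split
chain = build_chain(words, 1)
puts generate(chain, words.first(1), 2) # prints "I am. You are."
```

Because duplicates stay in each array, a uniform random pick is automatically
weighted by how often each successor appeared in the source text.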

Here is a sample output where <some_text_file> is Moby Dick, using the
default parameters of order = 2 and number of sentences = 10.

==
Call me Ishmael.
Some years ago--never mind how long precisely --having little or no money in
my soul; whenever I find myself involuntarily pausing before coffin
warehouses, and bringing up the rear of every kind whatsoever.
It is a damp, drizzly November in my purse, and nothing particular to
interest me on shore, I thought I would sail about a little and see the
mummies of those creatures in their huge bake-houses the pyramids.
No, when I go to sea, I go to sea as a Commodore, or a Captain, or a Cook.
I abandon the glory and distinction of such offices to those who like them.
For my part, I abominate all honorable respectable toils, trials, and
tribulations of every kind whatsoever.
It is quite as much as I can.
This is my substitute for pistol and ball.
With a philosophical flourish Cato throws himself upon his sword; I quietly
take to the royal mast-head.
True, they rather order me about--however they may thump and punch me about,
I have of driving off the spleen, and regulating the circulation
==

On 4/7/06, Ruby Quiz <james / grayproductions.net> wrote:
>
> The three rules of Ruby Quiz:
>
> 1.  Please do not post any solutions or spoiler discussion for this quiz until
> 48 hours have passed from the time on this message.
>
> 2.  Support Ruby Quiz by submitting ideas as often as you can:
>
> http://www.rubyquiz.com/
>
> 3.  Enjoy!
>
> Suggestion:  A [QUIZ] in the subject of emails about the problem helps everyone
> on Ruby Talk follow the discussion.
>
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> This week's Ruby Quiz is about text generation.  That's right, we're going to
> teach your computer to weave tall tales.
>
> At its most basic level, a solution might be:
>
>         >> (1..30).map { (("a".."z").to_a + [" "] * 10)[rand(36)] }.join
>         => "fb mcr hhluesjbhtf swm eehokmi"
>
> However, let's make our goal to get as close to English looking sentences as
> possible.  One way you might do this is using a technique called Markov Chains.
>
> To use Markov Chains, you read some text document(s), making note of which
> characters commonly follow which characters or which words commonly follow other
> words (it works for either scale).  Then, when generating text, you just select
> a character or word to output, based on the characters or words that came before
> it.
>
> The number of previous characters or words considered is called the "order" and
> you can adjust that to try and find a natural feel.  For example, here is some
> generated text using a second order word chain derived from the Sherlock Holmes
> novel "The Hound of the Baskervilles" by Arthur Conan Doyle:
>
>         The stars shone cold and bright, while a crushing weight of responsibility
>         from my shoulders. Suddenly my thoughts with sadness. Then on the lady's
>         face. "What can I assist you?"
>
> If you need texts to prime your program with, I suggest searching Project
> Gutenberg:
>
>         http://www.gutenberg.org/
>
>
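Since the quiz notes that the technique works at either scale, here is a
minimal character-level sketch in which the order is simply the length of the
seed string. The names char_chain and char_generate are hypothetical and are
not part of the attached submission.

```ruby
# Hypothetical helper: map each run of `order` characters to every
# character that followed it in the text.
def char_chain(text, order)
  chain = {}
  (0...(text.length - order)).each do |i|
    (chain[text[i, order]] ||= []) << text[i + order]
  end
  chain
end

# Hypothetical helper: extend `seed` (whose length sets the order) one
# randomly chosen character at a time until `length` is reached.
def char_generate(chain, seed, length, rng = Random.new)
  order = seed.length
  out = seed.dup
  while out.length < length
    choices = chain[out[-order, order]]
    break if choices.nil? # dead end: these characters were never followed
    out << choices.sample(random: rng)
  end
  out
end

puts char_generate(char_chain("abab", 1), "a", 5) # prints "ababa"
```

Raising the order (a longer seed) makes the output copy longer stretches of
the source, which is the "natural feel" trade-off the quiz describes.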

Attachment: rand_text.rb

#!/cygdrive/c/ruby/bin/ruby

num_sentences = 10 # number of sentences to output
order = 2          # order: next word depends on n previous words

while ARGV.length > 0 do
    arg = ARGV.shift
    if arg == '-o'
        order = ARGV.shift.to_i
    elsif arg == '-n'
        num_sentences = ARGV.shift.to_i
    end
end

# Hash key: string holding the last n words
# value: an array of the words seen in the text right after the last n (order) words in the key
hash = {}
# last_n is an array of the last n words seen while processing the text
last_n = []
# these are the first n words seen
save_first_n = []

# This loop reads from stdin line by line
#
# Some obvious non-text characters are removed.
# Runs of consecutive punctuation are also removed.
# Otherwise, all punctuation is kept.
#
# White space is stripped
ARGF.each_line do |line|
    # When special characters like <> and {} and [] are encountered, they and the contents between them are removed.    
    line.gsub!(/<[^>]*>/,"")
    line.gsub!(/<[^>]*$/,"")
    line.gsub!(/^[^<]*>/,"")
    
    line.gsub!(/\[[^\]]*\]/,"")
    line.gsub!(/\[[^\]]*$/,"")
    line.gsub!(/^[^\[]*\]/,"")
    
    line.gsub!(/\{[^\}]*\}/,"")
    line.gsub!(/\{[^\}]*$/,"")
    line.gsub!(/^[^\{]*\}/,"")
    
    # Also remove runs of consecutive punctuation.
    # I don't think these are allowed in English (except for 3 dots)
    line.gsub!(/[!?|$:;'][!?|$:;']+/,"")
    line.gsub!(/\.\.\.\.+/,"")      # remove runs of 4 or more dots
    line.gsub!(/([^.]|^)\.\.($|[^.])/,'\1\2') # remove exactly two dots, keeping the characters around them
    
    line.strip!
       
    words = line.split
    words.each do |word|
        if last_n.length == order
            # we've accumulated `order` words
            # Now we can store the transition to the next word into our hash
            last_n_str = last_n.join(" ")
            if hash[last_n_str]
                hash[last_n_str] << word
            else
                hash[last_n_str] = [word]
            end
            # Drop the oldest word from the last_n queue to make room for the next word
            last_n.shift
        else
            # Save the first n words; this will be our starting seed
            save_first_n << word
        end
        last_n << word
    end
end

# Initial starting point is the first n words
last_n = save_first_n.dup

print save_first_n.join(" ") + " "

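# Print num_sentences sentences
# A sentence ends when a word ending in one of "!?." is encountered.
sentence_cnt = 0
while sentence_cnt < num_sentences do
    last_n_str = last_n.join(" ")
    if hash[last_n_str] != nil
      # Randomly select the next word from those seen after the last n words
      word = hash[last_n_str][rand(hash[last_n_str].length)]
      print word
      if word =~ /[!?.]$/
          # if word ends with one of "!?." then it is the last word of a sentence
          sentence_cnt += 1
          puts
      else
          print " "
      end
      last_n.shift
      last_n << word
    else
      # Dead end: the last n words of the text were never followed by anything.
      break if hash.empty?  # not enough input to build any transitions
      # Restart from the opening words instead of looping forever.
      last_n = save_first_n.dup
    end
end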
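The line cleanup in the script can also be pulled out into a function that is
easy to test on its own. This is a sketch under assumptions: clean_line and
BRACKET_PATTERNS are hypothetical names, and the two-dot rule keeps the
characters on either side of the dots (a plain "" replacement would delete
those too).

```ruby
# Hypothetical: bracketed spans to delete. A complete <...>, [...] or {...}
# is removed; an opener with no closer removes the rest of the line, and a
# closer with no opener removes everything before it.
BRACKET_PATTERNS = [
  /<[^>]*>/,   /<[^>]*$/,   /^[^<]*>/,
  /\[[^\]]*\]/, /\[[^\]]*$/, /^[^\[]*\]/,
  /\{[^}]*\}/,  /\{[^}]*$/,  /^[^{]*\}/
].freeze

# Hypothetical helper mirroring the cleanup steps of rand_text.rb.
def clean_line(line)
  line = line.dup
  BRACKET_PATTERNS.each { |re| line.gsub!(re, "") }
  line.gsub!(/[!?|$:;'][!?|$:;']+/, "")      # runs of 2+ punctuation marks
  line.gsub!(/\.{4,}/, "")                   # runs of 4 or more dots
  line.gsub!(/([^.]|^)\.\.($|[^.])/, '\1\2') # exactly two dots
  line.strip
end

puts clean_line("wait.. what") # prints "wait what"
```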

