------=_Part_6581_14397722.1144604533009
Content-Type: multipart/alternative; 
	boundary="----=_Part_6582_21519320.1144604533009"

------=_Part_6582_21519320.1144604533009
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Great quiz. It will be interesting to see how others have solved this
problem.
Here is my submission.

To use it, run the following:

$ cat <some_text_file> | ./rand_text.rb

Or you can give options:

$ cat <some_text_file> | ./rand_text.rb -o 2 -n 10

-o : the order, which is the number of previous words to consider
-n : the number of sentences to output
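
For illustration only, the same two options could also be parsed with Ruby's
standard optparse library. This is a sketch, not what the attached script
does (it walks ARGV by hand), and parse_options is a hypothetical helper name:

```ruby
require "optparse"

# Hypothetical helper, not part of rand_text.rb.
# Returns a hash with the order and the number of sentences.
def parse_options(argv)
  # Defaults match the ones used by the attached script.
  options = { order: 2, sentences: 10 }
  OptionParser.new do |opts|
    opts.on("-o ORDER", Integer, "number of previous words to consider") do |value|
      options[:order] = value
    end
    opts.on("-n COUNT", Integer, "number of sentences to output") do |value|
      options[:sentences] = value
    end
  end.parse(argv) # parse (without !) leaves argv itself untouched
  options
end

p parse_options(%w[-o 3 -n 5])
```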

I used a hash of arrays to keep track of the possible state transitions.
The key is the current state, and the array holds the possible next states.
A word that follows a state several times appears in the array several
times, so frequent successors are picked more often. When generating the
output, I randomly select elements from this array. I always start with the
first 'n' words of the original text, where 'n' is the order.
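
A minimal sketch of that idea, separate from the attached script: the method
names build_table and generate are made up for illustration, the sample text
is a toy input, and the sketch stops after a fixed number of words instead of
counting sentences.

```ruby
# Illustrative sketch only: build_table and generate are hypothetical names,
# not part of rand_text.rb.

# Map each run of `order` consecutive words to the words seen right after it.
# Repeated successors are stored repeatedly, which weights the random choice.
def build_table(words, order)
  table = Hash.new { |hash, key| hash[key] = [] }
  words.each_cons(order + 1) do |window|
    table[window.first(order).join(" ")] << window.last
  end
  table
end

# Start from `seed` and keep appending a randomly chosen successor of the
# current state until max_words is reached or the state has no successor.
def generate(table, seed, max_words, rng = Random.new)
  state = seed.dup
  output = seed.dup
  (max_words - seed.length).times do
    candidates = table.fetch(state.join(" "), [])
    break if candidates.empty?
    word = candidates[rng.rand(candidates.length)]
    output << word
    state = state.drop(1) << word
  end
  output.join(" ")
end

words = "the cat sat on the mat and the cat ran".split
table = build_table(words, 2)
puts generate(table, words.first(2), 8)
```

With this toy input, the state "the cat" maps to ["sat", "ran"], so the
output starts with "the cat" and then branches at random.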

Here is a sample output where <some_text_file> is Moby Dick, using the
default parameters of order = 2 and number of sentences = 10.



==
Call me Ishmael.
Some years ago--never mind how long precisely --having little or no money in
my soul; whenever I find myself involuntarily pausing before coffin
warehouses, and bringing up the rear of every kind whatsoever.
It is a damp, drizzly November in my purse, and nothing particular to
interest me on shore, I thought I would sail about a little and see the
mummies of those creatures in their huge bake-houses the pyramids.
No, when I go to sea, I go to sea as a Commodore, or a Captain, or a Cook.
I abandon the glory and distinction of such offices to those who like them.
For my part, I abominate all honorable respectable toils, trials, and
tribulations of every kind whatsoever.
It is quite as much as I can.
This is my substitute for pistol and ball.
With a philosophical flourish Cato throws himself upon his sword; I quietly
take to the royal mast-head.
True, they rather order me about--however they may thump and punch me about,
I have of driving off the spleen, and regulating the circulation
==

On 4/7/06, Ruby Quiz <james / grayproductions.net> wrote:
>
> The three rules of Ruby Quiz:
>
> 1.  Please do not post any solutions or spoiler discussion for this quiz until
> 48 hours have passed from the time on this message.
>
> 2.  Support Ruby Quiz by submitting ideas as often as you can:
>
> http://www.rubyquiz.com/
>
> 3.  Enjoy!
>
> Suggestion:  A [QUIZ] in the subject of emails about the problem helps everyone
> on Ruby Talk follow the discussion.
>
>
> -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=
=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D=
-=3D-=3D-=3D
>
> This week's Ruby Quiz is about text generation.  That's right, we're going to
> teach your computer to weave tall tales.
>
> At its most basic level, a solution might be:
>
>         >> (1..30).map { (("a".."z").to_a + [" "] * 10)[rand(36)] }.join
>         => "fb mcr hhluesjbhtf swm eehokmi"
>
> However, let's make our goal to get as close to English looking sentences as
> possible.  One way you might do this is using a technique called Markov Chains.
>
> To use Markov Chains, you read some text document(s), making note of which
> characters commonly follow which characters or which words commonly follow other
> words (it works for either scale).  Then, when generating text, you just select
> a character or word to output, based on the characters or words that came before
> it.
>
> The number of previous characters or words considered is called the "order" and
> you can adjust that to try and find a natural feel.  For example, here is some
> generated text using a second order word chain derived from the Sherlock Holmes
> novel "The Hound of the Baskervilles" by Arthur Conan Doyle:
>
>         The stars shone cold and bright, while a crushing weight of responsibility
>         from my shoulders. Suddenly my thoughts with sadness. Then on the lady's
>         face. "What can I assist you?"
>
> If you need text's to prime your program with, I suggest searching Project
> Gutenberg:
>
>         http://www.gutenberg.org/
>
>

------=_Part_6582_21519320.1144604533009--

------=_Part_6581_14397722.1144604533009
Content-Type: application/octet-stream; name=rand_text.rb
Content-Transfer-Encoding: 7bit
X-Attachment-Id: f_eltnxnit
Content-Disposition: attachment; filename="rand_text.rb"

#!/cygdrive/c/ruby/bin/ruby

num_sentences = 10 # number of sentences to output
order = 2   # order: next word depends on n previous words

while ARGV.length > 0 do 
    arg = ARGV.shift
    if arg == '-o'
        order = ARGV.shift.to_i
    elsif arg == '-n'
        num_sentences = ARGV.shift.to_i        
    end
end

# Hash key: string which has the last n words
# value: an array of the next words that are seen in the text given the last n(=order) words in the key
hash = {}
# last_n is an array of the last n words seen while processing the text
last_n = []
# these are the first n words seen
save_first_n = []

# This loop reads from stdin line by line
#
# Some obvious non-text characters are removed.
# Runs of consecutive punctuation are also removed.
# Otherwise, all punctuation is kept.
#
# White space is stripped
ARGF.each_line do |line|
    # When special characters like <> and {} and [] are encountered, they and the contents between them are removed.    
    line.gsub!(/<[^>]*>/,"")
    line.gsub!(/<[^>]*$/,"")
    line.gsub!(/^[^<]*>/,"")
    
    line.gsub!(/\[[^\]]*\]/,"")
    line.gsub!(/\[[^\]]*$/,"")
    line.gsub!(/^[^\[]*\]/,"")
    
    line.gsub!(/\{[^\}]*\}/,"")
    line.gsub!(/\{[^\}]*$/,"")
    line.gsub!(/^[^\{]*\}/,"")
    
    # Also remove multiple consecutive punctuation.
    # I don't think this is allowed in the English language (except for 3 dots)
    line.gsub!(/[!?|$:;'][!?|$:;']+/,"")
    line.gsub!(/\.\.\.\.+/,"")      # remove 4 or more dots
    # remove exactly two dots, keeping the characters on either side of them
    line.gsub!(/([^.]|^)\.\.($|[^.])/,'\1\2')
    
    line.strip!
       
    words = line.split
    words.each do |word|
        word.strip! 
        if last_n.length == order
            # we've accumulated order # of words
            # Now we can store the transition to the next word into our hash
            last_n_str = last_n.join(" ")
            if hash[last_n_str] 
                hash[last_n_str] << word
            else
                hash[last_n_str] = [word]
            end
            # Pop the first element in the last_n queue to make room for the next word 
            last_n.shift
        else
            # Save the first n words this will be our starting seed
            save_first_n << word
        end
        last_n << word        
    end
end

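# Nothing can be generated if the input had no more than 'order' words
if hash.empty?
    puts save_first_n.join(" ")
    exit
end

# Initial starting point is the first n words (copied, so the seed stays intact)
last_n = save_first_n.dup

print save_first_n.join(" ") + " "

# Print num_sentences sentences
# A sentence ends when a word ends in one of "!?."
sentence_cnt = 0
while sentence_cnt < num_sentences do
    candidates = hash[last_n.join(" ")]
    if candidates.nil?
        # Dead end: these words only occur at the very end of the text,
        # so there is no successor. Restart from the seed instead of looping forever.
        last_n = save_first_n.dup
        next
    end
    # Randomly select the next word from those seen after the last n words
    word = candidates[rand(candidates.length)]
    print word
    if word =~ /[!?.]$/
        # if word ends in one of "!?." then it is the last word of a sentence
        sentence_cnt += 1
        puts
    else
        print " "
    end
    last_n.shift
    last_n << word
end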


------=_Part_6581_14397722.1144604533009--