Good quiz this one. 

Markov Chains are highly practical creations, we use them extensively when 
calulating life insurance and pensions schemes in Mercer. 

Wikipedia has a nice summary of other applications, I recommend it hihgly:
http://en.wikipedia.org/wiki/Markov_chains#Scientific_applications

At University, they taught us Markov Chains multiple times for various
uses, but none of them used this linguistic approach. It's very intuitive,
and so much more fun than the hard-core introductions we were given. So if
I ever teach, I'll teach this quiz.


Here we go:

As primer for this writeup I've used Agatha Cristie's "The mysterious
affair at Styles", a 57_000 words english novel starring Poirot, her
famous detective.

To establish the chain of words I use a Hash of Hashes, where the first is
a simple lookup of words, and the second containt the words that folow.
The second also counts how many times each word follows the leader.

Example: 

Murder of Emily Agnes...
Murder against Alfred Inglethorp ...
Murder against some person ...
Murder against him right ...

@words["Murder"]        >> {"of"=>1, "against"=>3}


The counts are to weigh the randomness when selecting the next word. I
wanted a selection process where rare followers in the real world are rear
in this world as well. After 'Murder', 'of' will pop up every forth time.
This is done here

    sum = candidates.inject(0) {|sum,kv| sum += kv[1]}
    random = rand(sum)+1
    partial_sum = 0
    next_word = candidates.find do |word, count|
      partial_sum += count
      partial_sum >= random
    end.first

sum is the total number of followers (4 for 'Murder'). I pick a random
number 0..3, add +1 for indexing and iterate my way up the list of
followers, and keep track of how many words I have passed on my way. When
we reach (or pass for the first time) the random limit, the word at hand
is the word we want. Since the block finds the first ['word', count] pair
that satifies our condition, (f.ex ['against', 3]), I use .first to get
it.

That's it. Now just build a sentence by starting with a word and move
along it's path. My syntectic rules are very simple, a sentence is
complete when it containts a period ".", thereby opening up for som silly
grammar. But it doesn't really matter, grammar processing isn't the text
of the day.

The results are good enoughby far, and quite entertaining too. But alas,
even Mr Poirot would be hard pressed to solve this case.


Output # 1

Murder of sight. Mary Cavendish was vital to connect the charred paper. "A
great risk, it was a book?" "Yes, often. I could certainly had already
been tested!" I was before I asked.


Output # 2

Murder against Alfred Inglethorp, I fancied that, mon ami, it is broken
bell, and the matter, hoping that a note." Miss Howard had a day we came
to warn him more, had kindly offered to make amends; which weighed with
the Hall in the company.  Miss Howard was that?" "She made it was the
Hall--you'n a penknife, and went round Styles Court in some time I voiced
the room?" "I was inclined to call for some friends one side. His words of
horrors business to make a very fond of her--yes, I made a message to the
slips of dismissal.


=== code ===


class MarkovChain
  def initialize(text)
    @words = Hash.new
    wordlist = text.split
    wordlist.each_with_index do |word, index| 
      add(word, wordlist[index + 1]) if index <= wordlist.size - 2
    end
  end

  def add(word, next_word)
    @words[word] = Hash.new(0) if !@words[word]
    @words[word][next_word] += 1
  end

  def get(word)
    return "" if !@words[word]
    followers = @words[word]
    sum = followers.inject(0) {|sum,kv| sum += kv[1]}
    random = rand(sum)+1
    partial_sum = 0
    next_word = followers.find do |word, count|
      partial_sum += count            
      partial_sum >= random
    end.first
    #puts "Selected #{next_word} from #{candidates.inspect}"
    next_word
  end    
end

mc = MarkovChain.new(File.read("Agatha Christie - The Mysterious Affair at 
Styles.txt"))

sentence = ""
word = "Murder"
until sentence.count(".") == 4
  sentence << word << " "
  word = mc.get(word)
end
puts sentence << "\n\n"

=== code ===

-- 
Jon Egil Strand
Phone: +47 98232340
jes / luretanker.no