On Sat, 26 Feb 2005 02:53:40 +0900
Ruby Quiz <james / grayproductions.net> wrote:

> [Snipped Quiz Description]

The LetterWise algorithm seems quite natural. After reading the quiz description I wanted to implement a markov based model and it turns out thats more or less what letter wise is.

In my implementation each key has always the same letters mapped onto it. As with MultiTap multiple taps cycle through the available letters.

But with each keypress the letters mapped to a key are ordered descending by the probability that they will be typed now. Probabilities are learned from a corpus. The probabilities are defined to depend only on the n last entered letters. If the n last entered letters were not encountered in the text, a shorter prefix is used.

As training material I used all the english manpages on my system. So expect debian and linux to be quite easy to enter ;). I let the program learn probabilities for different prefix lengths, the dictionaries can be found on my website:

http://ruby.brian-schroeder.de/quiz/phonetyping/

Additionally I implemented the standard multitap algorithm, from which my algorithm is derived.

As framework I used JEG2 input routines, but added something to make left, right backspace and delete work. So I don't expect it to work on windows.

The learning breaks down to:

    prefix = []
    while c = io.getc
      if k = downcase[c]
        node = (@roots[k] ||= MarkovNode.new)
        prefix.each do | p |
          node.popularity += 1
          node = (node[p] ||= MarkovNode.new)
        end
        node.popularity += 1
        prefix.pop while prefix.length >= @max_prefix
        prefix.unshift k
      end
    end

Checking for the frequency is done like this:

  def popularity(char, prefix)
    node = @roots[char[0]]
    pos = prefix.length - 1
    while pos >= 0 and node.is_a?MarkovNode and node[prefix[pos]]
      node = node[prefix[pos]]
      pos -= 1
    end
    node.to_i
  end


All the rest can be found at:

http://ruby.brian-schroeder.de/quiz/phonetyping/

as always all code browsable in full color, or available as an archive.

best regards,

Brian