--rTlpS9DCuZHX1ghZLyR
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Here is my solution. It was so much fun I couldn't help but keep making
small tweaks. Here are a few of my favourite bits of output with varying
word limits, from various sources (including rubytalk):

thou wast question'd my algorithm? -- posted delight hath.

FasterCSV can also reason, having methods 2006 08:58 am, after lunch.

April 1st joke. I'm subscribed to ruby-dev and the other hand, I've
thought of it...hrrmmm... And now I fear, that the Mozilla Foundation
relicensed everything as MPL and GNU GPL provides. It is unfortunate
that it should be able to check against multiple types as well Hi.

Ruby Quiz: 1. Please do not hesitate dive in to it, on the C:\ruby\lib
\ruby\1.8\cgi\session.rb file, $350 convoluted I'm licenses David 11:26
install the 'ruby-db2-0.4' package and all that...

I'll blow this police whistle from my shoulders. Suddenly and makes it
incompatible. But I might have agreed tosuch without having to support
meta-progarmming. Perhaps an example how to handle initialize_copy: The
mixin manages some attributes and the other hand, I've thought of
it...hrrmmm...

Before using it, you have to give it some knowledge - the -l option
accepts a filename or URL to learn from (omit the filename for stdin).
After chomping through it, the knowledge will get dumped out to a file
ready to be used for generation. You can do this multiple times to
incrementally teach it.
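
As a rough illustration of what "knowledge" means here, the following is a
simplified sketch, not the attached script: the helper name `learn` and the
sample sentences are made up for the example. It builds a prefix-to-follower
score table from plain hashes (Marshal cannot dump a Hash with a default
proc), teaches it twice, and round-trips it through Marshal the way a
knowledge file would be.

```ruby
# Hypothetical helper (not from markov.rb): record, for every run of
# `order` words, how often each following word was seen.
def learn(chains, text, order = 2)
  text.split.each_cons(order + 1) do |window|
    prefix   = window[0...-1].join(" ").to_sym
    follower = window.last.to_sym
    # Plain nested hashes, so the table stays Marshal-friendly.
    (chains[prefix] ||= {})[follower] ||= 0
    chains[prefix][follower] += 1
  end
  chains
end

chains = {}
learn(chains, "the cat sat on the mat and the cat sat down")
learn(chains, "the cat ran off")      # learning is incremental

dumped   = Marshal.dump(chains)       # what would be written to the file
restored = Marshal.load(dumped)       # what generation would read back

p restored[:"the cat"]
```

The real script differs in detail (it grows the prefix word by word and can
weight by word length), but the stored structure has the same shape.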

If you want to use a large body of input I recommend storing each dump
in its own file, which you can do with the -f option. Then, specify one
or more -f options when running generation to use the specified
knowledge file(s).
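
One thing worth knowing when combining files: the attached script loads
each -f file with a plain Hash#merge!, so when the same prefix appears in
two files the later file's followers replace the earlier ones. The sketch
below uses two made-up tables to show that behaviour next to a possible
alternative that sums the scores instead. The alternative is only an
illustration, not something the script does.

```ruby
# Two small, invented knowledge tables sharing the prefix "the cat".
a = { :"the cat" => { :sat => 2 }, :"a dog" => { :barked => 1 } }
b = { :"the cat" => { :ran => 1, :sat => 1 } }

# Plain merge: for a shared prefix, b's inner hash wins outright.
shallow = a.merge(b)

# Alternative sketch: merge the inner hashes too, adding up the scores.
combined = a.merge(b) do |_prefix, old, new|
  old.merge(new) { |_word, x, y| x + y }
end

p shallow[:"the cat"]
p combined[:"the cat"]
```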

Once you've got some input, run markov.rb, specifying your -f options if
you're not using the default 'chainstore' file, and optionally
generation parameters such as -w N to limit to at most N words.
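
To give an idea of what generation with a word limit amounts to, here is
a stripped-down sketch. The method name `generate`, its parameters and the
sample table are assumptions for the example; the attached script adds
variance, a cycle-avoiding memory and sentence-boundary handling on top of
this. It walks the table from a starting prefix, always taking a
top-scoring follower, and stops at the limit or when it runs out of
knowledge.

```ruby
# Hypothetical, simplified generator (not the one in markov.rb).
def generate(chains, start, max_words, rng = Random.new)
  prefix = start.split
  out = prefix.dup
  while out.length < max_words
    followers = chains[prefix.join(" ").to_sym]
    break unless followers && !followers.empty?   # nothing known: stop
    best = followers.values.max
    candidates = followers.select { |_word, score| score == best }.keys
    word = candidates[rng.rand(candidates.length)].to_s
    out << word
    prefix = prefix[1..-1] << word                # slide the window along
  end
  out.join(" ")
end

chains = {
  :"the cat" => { :sat => 2, :ran => 1 },
  :"cat sat" => { :on => 1 },
  :"sat on"  => { :the => 1 },
  :"on the"  => { :mat => 1 }
}

puts generate(chains, "the cat", 5)
```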

There are a few options you can pass to control both learning and
generation; there's a bit of doc available from the -h option.

I posted a few chomped input files at:
	http://roscopeco.co.uk/code/ruby-quiz-entries/74/
	http://roscopeco.co.uk/code/ruby-quiz-entries/74/README

Thanks again for this quiz, I've gotten a fair few lol moments out of
it :)

-- 
Ross Bamford - rosco / roscopeco.REMOVE.co.uk

--rTlpS9DCuZHX1ghZLyR
Content-Disposition: attachment; filename=markov.rb
Content-Type: text/plain; name=markov.rb; charset=utf-8
Content-Transfer-Encoding: 7bit

#!/usr/local/bin/ruby
#
# Run with -h or --help for options.

require 'open-uri'
require 'optparse'

class MarkovEngine
  attr_accessor :chains
  
  def initialize(*filenames)
    @chains = {}

    filenames.each do |filename|
      # File.exists? was removed in Ruby 3.2; binread keeps Marshal data intact
      if filename && File.exist?(filename)
        @chains.merge! Marshal.load(File.binread(filename))
      end
    end

    # we could compress the tree here to just the top
    # scoring match, but that would prevent us using
    # variance and fuzzy matching.
  end

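  # NOTE: the default digits for order and weight (and the increment below)
  # were lost in the archived copy; 2, 0 and 1 are assumed here. The runner
  # always passes order and weight explicitly.
  def learn_from(text, order = 2, weight = 0, sentence_awareness = false)
    weight += 1 if weight
    txt = text.split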
    txt.inject(txt.shift) do |prev,word|
      # don't cross sentence boundaries?
      if sentence_awareness && word =~ /[.!?:;]$/
        prev = ""
      else
        oldprevs = prev.split
        plen = [oldprevs.length, order].min
        prev = oldprevs[-plen,plen].join(" ")
      end

      # if plen != order we need to fill up prev a bit more
      if plen == order && word =~ /\w/
        wordsym = word.intern
        prevsym = prev.intern
        
        # compensate for no default proc with either marshal or yaml
        (@chains[prevsym] ||= {})[wordsym] ||= 0
        @chains[prevsym][wordsym] += weight || word.length
      end

      prev << " " << word
    end
    nil
  end

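  # NOTE: the default digit for max_variance was lost in the archived copy;
  # 2 is an assumed placeholder.
  def select_next(word, max_variance = 2)
    word = word.to_sym   # may be a sym already
    o = @chains[word] || {}
    poss = o.inject([0,[]]) do |(score,words),(k,v)|
      variance = rand(max_variance)
      if v > score - variance
        [v,[k]]
      elsif v == score - variance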
        [score,words << k]
      else
        [score,words]
      end
    end.last
    
    r = (poss[rand(poss.length)] || 
         @chains.keys[rand(@chains.keys.length)]).to_s.split.first.to_sym
    
    $stderr.puts "#{word} #{r} (#{poss.inspect})" if $VERBOSE
    r
  end
  
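  # NOTE: the default digits for max_variance and memory_len were lost in
  # the archived copy; 2 and 3 are assumed placeholders.
  def generate_text(wordcount, 
                    start_with = nil, 
                    max_variance = 2, 
                    fuzzy_variance = true, 
                    memory_len = 3)
    word = start_with 
    unless word
      tries = 0
      until word.to_s =~ /^[A-Z]/ || (tries += 1) > 20
        word = @chains.keys[rand(@chains.keys.length)]
      end
    end
    prev = word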
    
    # loop for words
    memory = ([nil] * (memory_len - 1)) + [word]  
    txt = ""
    total = (wordcount / 2 + rand(wordcount / 2))    
    total.times do |i|
      txt << " " << word.to_s
      word, prev = fetch_next(prev,memory,max_variance,fuzzy_variance)
      
      # are we close to our total? Is this a sentence boundary?
      # quit now if so, we may not get the chance again...
      break if prev.to_s[-1,1] =~ /[\.\!\?]/ and i >= total - 10
    end
    
    txt.chomp!
    txt.lstrip!
    
    # If we didn't manage to finish on a sentence boundary, try a few
    # extra words to see if we can get one.
    extras = 0
    until txt[-1,1] =~ /[\.\!\?]/ or extras > [10,wordcount / 4].max
      txt << " " << word.to_s
      word, prev = fetch_next(prev,memory,max_variance,fuzzy_variance)
      
      txt.chomp!
      extras += 1
    end

    # last ditch
    txt << '.!?'[rand(3)] if txt[-1,1] !~ /[\.\!\?]/
    txt
  end

  def save_to(filename)
    # originally used YAML but there are probs with quoted symbols... :(
    # binary mode, so the Marshal dump survives on platforms with text mode
    File.open(filename, 'wb') do |f|
      f << Marshal.dump(@chains)
    end
    true
  end

  private

  # helper for generate_text
  def fetch_next(prev,memory,max_variance,fuzzy_variance)
    word = select_next(prev,max_variance)
    tries = 0
    while fuzzy_variance && memory.include?(word) && tries < 4
      word = select_next(prev,max_variance+(tries += 1))
    end
    if memory.include? word
      word = @chains.keys[rand(@chains.keys.length)].to_s.split.first.to_sym
    end
     
    prev = prev.to_s.split
    prev.shift
    prev = (prev << word).join(' ').to_sym

    # Remember this word so we don't get stuck in a cycle
    memory.shift
    memory.push(word)

    [word,prev]   
  end    
end

class MarkovRunner    
  class << self
    def run(opts = ARGV)
      new.run(opts)
    end
  end

  def initialize
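    # NOTE: the numeric defaults below lost their digits in the archived
    # copy. Only the weight default of 0 is confirmed by the help text; the
    # other numbers are assumed placeholders.
    @summary_only = false
    @learn = false
    @store_files = []
    @learn_from = []
    @learn_order = 2
    @learn_weight = 0
    @wordcount = 100
    @max_variance = 2
    @fuzzy_variance = true
    @sentence_breaks = false
    @start_with = nil
    @memory_length = 3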
  end
  
  def run(opts)
    parse_opts(opts)
    e = MarkovEngine.new(*@store_files)

    if @summary_only        
      if @store_files.empty?
        puts "Knowledge loaded from: (none)"
      else
        puts "Knowledge loaded from: #{@store_files.inspect}"
      end

      nch = e.chains.length
      puts "#{nch} top-level chains"
      if $VERBOSE
        nln = e.chains.inject(0) { |n,(word,chain)| n + chain.length }        
        puts "#{nln} links"
        puts "Average chain length: #{(nch > 0) ? (nln / nch.to_f) : 'n/a'}"
      end
      order = e.chains.keys[rand(e.chains.keys.length)].to_s.split.length
      puts "Apparent order is   : #{order}"
      
      return 1
    end
      
    if @learn
      if @store_files.length > 1
        $stderr.puts 'warning: multiple store files ignored'
      end
      
      if @learn_from.empty?
        e.learn_from($stdin.read,@learn_order,@learn_weight,@sentence_breaks)
      else
        @learn_from.each do |fn|
          if fn =~ /:\/\//
            e.learn_from(URI(fn).read,@learn_order,@learn_weight,
                         @sentence_breaks)
          elsif File.exist?(fn)
            e.learn_from(File.read(fn),@learn_order,@learn_weight,
                         @sentence_breaks)
          else
            $stderr.puts "warning: unrecognized input: #{fn}"
          end
        end
      end
      # fall back to the documented default when no -f was given
      e.save_to(@store_files.last || 'chainstore')
      0
    else
      # can't generate with an empty engine
      if e.chains.empty?
        $stderr.puts("error: need input")
        1
      else
        puts e.generate_text(@wordcount, 
                             @start_with, 
                             @max_variance, 
                             @fuzzy_variance, 
                             @memory_length)
        0
      end
    end
  end

  private

  def parse_opts(args)
    opts = OptionParser.new do |opt|
      opt.banner = "syntax: ./markov.rb [options]"

      opt.separator ""
      opt.separator "where [options] include:"
      
      opt.on('-f','--file FILENAME',
             'Specify knowledge file to use. More than',
             '  one -f option may be supplied when',
             '  generating text. In learn mode, only the',
             '  last filename specified is recognised.',
             '  (default: chainstore)') do |fn|
        @store_files << fn
      end
      
      opt.on('-w','--words MAXWORDCOUNT',Integer,
             'Set the maximum number of words to',
             "  output. (default: #{@wordcount})") do |count|
        @wordcount = count
      end

      opt.on('-s','--start-with WORDS',
             'Specify one or more (matching order ',
             '  setting) words from which generation',
             '  should begin. (default: random).') do |w|
        @start_with = w
      end
             
      opt.on('-v','--max-variance N',Integer,
             'Set the maximum scoring variance to use',
             '  in generation (higher = fuzzier match,',
             "  default: #{@max_variance})") do |variance|
        @max_variance = variance
      end

      opt.on('-z','--no-fuzzy-variance',
             'Disable variance fuzz when searching for',
             '  next word in generation.') do
        @fuzzy_variance = false
      end
      
      opt.on('-m','--memory-length N',Integer,
             'Set length of the queue used to avoid',
             "  cycles in output. (default: #{@memory_length})") do |length|
        @memory_length = length
      end

      opt.on('-l','--learn [FILEORURI]',
             'Read FILEORURI and learn from it.',
             '  Multiple -l options may be supplied.',
             '  If no file or URI is specified, stdin',
             '  is read.') do |uri|
        @learn = true
        @learn_from << uri if uri
      end        

      opt.on('-b','--sentence-breaks',
             'Enable sentence-break awareness in',
             "  learn mode. (default #{@sentence_breaks})") do |b|
        @sentence_breaks = b
      end
      
      opt.on('-o','--order N',Integer,
             'Set order to N for learn mode. Ignored',
             "  during generation. (default: #{@learn_order})") do |n|
        @learn_order = n
      end

      opt.on('-g','--weight N',
             'Set score weighting for learn mode.',
             '  If weight is "A", word-length weighting',
             '  will be used. (default: 0)') do |weight|
        if weight == "A"
          @learn_weight = nil
        else
          @learn_weight = weight.to_i
        end
      end

      opt.on_tail('-R', '--report','Display knowledge summary') do
        @summary_only = true
      end
      
      opt.on_tail('-V', '--verbose','Enable verbose output on stderr') do
        $VERBOSE = true
      end       

      opt.on_tail('-h','--help','Display this help text') do
        puts opts
        exit(1)
      end
    end

    opts.parse(args)
    if @store_files.empty? and File.exist?('chainstore')
      @store_files << 'chainstore'
    end
  end
end

if $0 == __FILE__
  exit(MarkovRunner.run)
end


--rTlpS9DCuZHX1ghZLyR--