--rTlpS9DCuZHX1ghZLyR
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Here is my solution. It was so much fun I couldn't help but keep making
small tweaks. Here are a few of my favourite bits of output with varying
word limits, from various sources (including rubytalk):
thou wast question'd my algorithm? -- posted delight hath.
FasterCSV can also reason, having methods 2006 08:58 am, after lunch.
April 1st joke. I'm subscribed to ruby-dev and the other hand, I've
thought of it...hrrmmm... And now I fear, that the Mozilla Foundation
relicensed everything as MPL and GNU GPL provides. It is unfortunate
that it should be able to check against multiple types as well Hi.
Ruby Quiz: 1. Please do not hesitate dive in to it, on the C:\ruby\lib
\ruby\1.8\cgi\session.rb file, $350 convoluted I'm licenses David 11:26
install the 'ruby-db2-0.4' package and all that...
I'll blow this police whistle from my shoulders. Suddenly and makes it
incompatible. But I might have agreed tosuch without having to support
meta-progarmming. Perhaps an example how to handle initialize_copy: The
mixin manages some attributes and the other hand, I've thought of
it...hrrmmm...
Before using it, you have to give it some knowledge - the -l option
accepts a filename or URL to learn from (omit the filename for stdin).
After chomping through it, the knowledge will get dumped out to a file
ready to be used for generation. You can do this multiple times to
incrementally teach it.
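
To give a rough idea of what "knowledge" means here, the following is a minimal sketch rather than the code in markov.rb. The `learn` helper, the sample sentences and the temporary file name are all made up for illustration. It counts which word follows each pair of words, dumps the table with Marshal, then reloads it and teaches it a little more:

```ruby
require 'tmpdir'

# Hypothetical helper (not part of markov.rb): count which word follows
# each run of `order` consecutive words.
def learn(chains, text, order = 2)
  text.split.each_cons(order + 1) do |window|
    prefix = window[0...-1].join(' ').to_sym
    nxt = window.last.to_sym
    followers = (chains[prefix] ||= {})
    followers[nxt] = followers.fetch(nxt, 0) + 1
  end
  chains
end

chains = learn({}, "the cat sat on the mat and the cat sat down")

# Dump the table to a file, reload it, and learn from a second text.
store = File.join(Dir.tmpdir, "chainstore-demo-#{Process.pid}")
File.binwrite(store, Marshal.dump(chains))
restored = learn(Marshal.load(File.binread(store)), "the cat sat up")
File.delete(store)

p chains[:"the cat"]
p restored[:"cat sat"]
```

In the first table "the cat" is followed by "sat" twice. After the second pass, the reloaded table also knows that "cat sat" can be followed by "up".
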
If you want to use a large body of input I recommend storing each dump
in its own file, which you can do with the -f option. Then, specify one
or more -f options when running generation to use the specified
knowledge file(s).
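
One thing worth knowing when combining files: the script loads each knowledge file with Hash#merge!, so when two files share a prefix, the entry from the later file replaces the earlier one instead of adding to it. The sketch below shows that behaviour on two tiny made-up tables. The summing merge next to it is only a possible alternative, not something markov.rb does:

```ruby
# Two made-up knowledge tables that share the prefix "the cat".
a = { :"the cat" => { :sat => 2 } }
b = { :"the cat" => { :ran => 1 }, :"a dog" => { :barked => 1 } }

# Plain merge: the later table wins for any shared prefix.
shallow = a.merge(b)

# Hypothetical alternative: combine followers and add up their scores.
deep = a.merge(b) do |_prefix, x, y|
  x.merge(y) { |_word, m, n| m + n }
end

p shallow[:"the cat"]
p deep[:"the cat"]
```

With the plain merge only "ran" survives for "the cat". The summing version keeps both "sat" and "ran".
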
Once you've got some input, run markov.rb, specifying your -f options if
you're not using the default 'chainstore' file, and optionally
generation parameters such as -w N to limit to at most N words.
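
Generation then walks the table one word at a time. The following is a simplified sketch under assumptions: `generate` and the sample table are invented for illustration, and it leaves out the variance, fuzzy matching and cycle memory that markov.rb has. It always takes the highest-scoring follower, picks randomly among ties, and stops at the word limit or when it reaches a prefix it has never seen:

```ruby
# Hypothetical helper (not part of markov.rb): walk an order-2 chain table.
def generate(chains, start, max_words)
  prefix = start.split
  out = prefix.dup
  while out.length < max_words
    followers = chains[prefix.join(' ').to_sym]
    break if followers.nil? || followers.empty?
    best = followers.values.max
    candidates = followers.select { |_word, score| score == best }.keys
    word = candidates[rand(candidates.length)].to_s
    out << word
    # Slide the prefix window forward by one word.
    prefix = prefix[1..-1] << word
  end
  out.join(' ')
end

table = {
  :"the cat" => { :sat => 2, :ran => 1 },
  :"cat sat" => { :down => 1 }
}

puts generate(table, "the cat", 10)
puts generate(table, "the cat", 3)
```

The first call stops early because nothing is known to follow "sat down". The second stops because it hits the three-word limit.
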
There are a few options you can pass to control both learning and
generation; there's a bit of doc available from the -h option.
I posted a few chomped input files at:
http://roscopeco.co.uk/code/ruby-quiz-entries/74/
http://roscopeco.co.uk/code/ruby-quiz-entries/74/README
Thanks again for this quiz, I've gotten a fair few lol moments out of
it :)
--
Ross Bamford - rosco / roscopeco.REMOVE.co.uk
--rTlpS9DCuZHX1ghZLyR
Content-Disposition: attachment; filename=markov.rb
Content-Type: text/plain; name=markov.rb; charset=utf-8
Content-Transfer-Encoding: 7bit
#!/usr/local/bin/ruby
#
# Run with -h or --help for options.
require 'open-uri'
require 'optparse'
class MarkovEngine
  attr_accessor :chains

  def initialize(*filenames)
    @chains = {}
    filenames.each do |filename|
      if filename && File.exist?(filename)
        @chains.merge! Marshal.load(File.read(filename))
      end
    end
    # we could compress the tree here to just the top
    # scoring match, but that would prevent us using
    # variance and fuzzy matching.
  end

  # NOTE: the numeric defaults for order and weight were lost from this
  # copy of the script; 2 and 0 are assumed values.
  def learn_from(text, order = 2, weight = 0, sentence_awareness = false)
    weight += 1 if weight
    txt = text.split
    txt.inject(txt.shift) do |prev,word|
      # don't cross sentence boundaries?
      if sentence_awareness && word =~ /[.!?:;]$/
        prev = ""
      else
        oldprevs = prev.split
        plen = [oldprevs.length, order].min
        prev = oldprevs[-plen,plen].join(" ")
      end
      # if plen != order we need to fill up prev a bit more
      if plen == order && word =~ /\w/
        wordsym = word.intern
        prevsym = prev.intern
        # compensate for no default proc with either marshal or yaml
        (@chains[prevsym] ||= {})[wordsym] ||= 0
        @chains[prevsym][wordsym] += weight || word.length
      end
      prev << " " << word
    end
    nil
  end

  # NOTE: the default for max_variance was lost from this copy of the
  # script; 1 is an assumed value.
  def select_next(word, max_variance = 1)
    word = word.to_sym # may be a sym already
    o = @chains[word] || {}
    poss = o.inject([0,[]]) do |(score,words),(k,v)|
      variance = rand(max_variance)
      if v > score - variance
        [v,[k]]
      elsif v == score - variance
        [score,words << k]
      else
        [score,words]
      end
    end.last
    r = (poss[rand(poss.length)] ||
         @chains.keys[rand(@chains.keys.length)]).to_s.split.first.to_sym
    $stderr.puts "#{word} #{r} (#{poss.inspect})" if $VERBOSE
    r
  end

  # NOTE: the defaults for max_variance and memory_len were lost from this
  # copy of the script; 1 and 5 are assumed values.
  def generate_text(wordcount,
                    start_with = nil,
                    max_variance = 1,
                    fuzzy_variance = true,
                    memory_len = 5)
    word = start_with
    unless word
      tries = 0
      until word.to_s =~ /^[A-Z]/ || (tries += 1) > 20
        word = @chains.keys[rand(@chains.keys.length)]
      end
    end
    prev = word
    # loop for words
    memory = ([nil] * (memory_len - 1)) + [word]
    txt = ""
    total = (wordcount / 2 + rand(wordcount / 2))
    total.times do |i|
      txt << " " << word.to_s
      word, prev = fetch_next(prev,memory,max_variance,fuzzy_variance)
      # are we close to our total? Is this a sentence boundary?
      # quit now if so, we may not get the chance again...
      break if prev.to_s[-1,1] =~ /[\.\!\?]/ and i >= total - 10
    end
    txt.chomp!
    txt.lstrip!
    # If we didn't manage to finish on a sentence boundary, try a few
    # extra words to see if we can get one.
    extras = 0
    until txt[-1,1] =~ /[\.\!\?]/ or extras > [10,wordcount / 4].max
      txt << " " << word.to_s
      word, prev = fetch_next(prev,memory,max_variance,fuzzy_variance)
      txt.chomp!
      extras += 1
    end
    # last ditch
    txt << '.!?'[rand(3)] if txt[-1,1] !~ /[\.\!\?]/
    txt
  end

  def save_to(filename)
    # originally used YAML but there are probs with quoted symbols... :(
    File.open(filename, 'w+') do |f|
      f << Marshal.dump(@chains)
    end
    true
  end

  private

  # helper for generate_text
  def fetch_next(prev,memory,max_variance,fuzzy_variance)
    word = select_next(prev,max_variance)
    tries = 0
    while fuzzy_variance && memory.include?(word) && tries < 4
      word = select_next(prev,max_variance+(tries += 1))
    end
    if memory.include? word
      word = @chains.keys[rand(@chains.keys.length)].to_s.split.first.to_sym
    end
    prev = prev.to_s.split
    prev.shift
    prev = (prev << word).join(' ').to_sym
    # Remember this word so we don't get stuck in a cycle
    memory.shift
    memory.push(word)
    [word,prev]
  end
end
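
class MarkovRunner
  class << self
    def run(opts = ARGV)
      new.run(opts)
    end
  end

  def initialize
    @summary_only = false
    @learn = false
    @store_files = []
    @learn_from = []
    # NOTE: several numeric defaults were lost from this copy of the script.
    # Values marked "assumed" are guesses; @learn_weight follows the default
    # of 0 documented in the -g help text.
    @learn_order = 2      # assumed
    @learn_weight = 0
    @wordcount = 100      # assumed (only the trailing "00" survived)
    @max_variance = 1     # assumed
    @fuzzy_variance = true
    @sentence_breaks = false
    @start_with = nil
    @memory_length = 5    # assumed
  end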

  def run(opts)
    parse_opts(opts)
    e = MarkovEngine.new(*@store_files)
    if @summary_only
      if @store_files.empty?
        puts "Knowledge loaded from: (none)"
      else
        puts "Knowledge loaded from: #{@store_files.inspect}"
      end
      nch = e.chains.length
      puts "#{nch} top-level chains"
      if $VERBOSE
        nln = e.chains.inject(0) { |n,(word,chain)| n + chain.length }
        puts "#{nln} links"
        puts "Average chain length: #{(nch > 0) ? (nln / nch.to_f) : 'n/a'}"
      end
      order = e.chains.keys[rand(e.chains.keys.length)].to_s.split.length
      puts "Apparent order is : #{order}"
      return 1
    end
    if @learn
      if @store_files.length > 1
        $stderr.puts 'warning: multiple store files ignored'
      end
      if @learn_from.empty?
        e.learn_from($stdin.read,@learn_order,@learn_weight,@sentence_breaks)
      else
        @learn_from.each do |fn|
          if fn =~ /:\/\//
            e.learn_from(URI(fn).read,@learn_order,@learn_weight,
                         @sentence_breaks)
          elsif File.exist?(fn)
            e.learn_from(File.read(fn),@learn_order,@learn_weight,
                         @sentence_breaks)
          else
            $stderr.puts "warning: unrecognized input: #{fn}"
          end
        end
      end
      # fall back to the default store when no -f was given and no
      # 'chainstore' file exists yet
      e.save_to(@store_files.last || 'chainstore')
      0
    else
      # can't generate with an empty engine
      if e.chains.empty?
        $stderr.puts("error: need input")
        1
      else
        puts e.generate_text(@wordcount,
                             @start_with,
                             @max_variance,
                             @fuzzy_variance,
                             @memory_length)
        0
      end
    end
  end

  private

  def parse_opts(args)
    opts = OptionParser.new do |opt|
      opt.banner = "syntax: ./markov.rb [options]"
      opt.separator ""
      opt.separator "where [options] include:"
      opt.on('-f','--file FILENAME',
             'Specify knowledge file to use. More than',
             ' one -f option may be supplied when',
             ' generating text. In learn mode, only the',
             ' last filename specified is recognised.',
             ' (default: chainstore)') do |fn|
        @store_files << fn
      end
      opt.on('-w','--words MAXWORDCOUNT',Integer,
             'Set the maximum number of words to',
             " output. (default: #{@wordcount})") do |count|
        @wordcount = count
      end
      opt.on('-s','--start-with WORDS',
             'Specify one or more (matching order ',
             ' setting) words from which generation',
             ' should begin. (default: random).') do |w|
        @start_with = w
      end
      opt.on('-v','--max-variance N',Integer,
             'Set the maximum scoring variance to use',
             ' in generation (higher = fuzzier match,',
             " default: #{@max_variance})") do |variance|
        @max_variance = variance
      end
      opt.on('-z','--no-fuzzy-variance',
             'Disable variance fuzz when searching for',
             ' next word in generation.') do
        @fuzzy_variance = false
      end
      opt.on('-m','--memory-length N',Integer,
             'Set length of the queue used to avoid',
             " cycles in output. (default: #{@memory_length})") do |length|
        @memory_length = length
      end
      opt.on('-l','--learn [FILEORURI]',
             'Read FILEORURI and learn from it.',
             ' Multiple -l options may be supplied.',
             ' If no file or URI is specified, stdin',
             ' is read.') do |uri|
        @learn = true
        @learn_from << uri if uri
      end
      opt.on('-b','--sentence-breaks',
             'Enable sentence-break awareness in',
             " learn mode. (default #{@sentence_breaks})") do |b|
        @sentence_breaks = b
      end
      opt.on('-o','--order N',Integer,
             'Set order to N for learn mode. Ignored',
             " during generation. (default: #{@learn_order})") do |n|
        @learn_order = n
      end
      opt.on('-g','--weight N',
             'Set score weighting for learn mode.',
             ' If weight is "A", word-length weighting',
             ' will be used. (default: 0)') do |weight|
        if weight == "A"
          @learn_weight = nil
        else
          @learn_weight = weight.to_i
        end
      end
      opt.on_tail('-R', '--report','Display knowledge summary') do
        @summary_only = true
      end
      opt.on_tail('-V', '--verbose','Enable verbose output on stderr') do
        $VERBOSE = true
      end
      opt.on_tail('-h','--help','Display this help text') do
        puts opts
        exit(1)
      end
    end
    opts.parse(args)
    if @store_files.empty? and File.exist?('chainstore')
      @store_files << 'chainstore'
    end
  end
end

if $0 == __FILE__
  exit(MarkovRunner.run)
end
--rTlpS9DCuZHX1ghZLyR--