Ruby Quiz wrote:
> This quiz is to make a Hangman guessing player in Ruby. Play should proceed as
> follows:

I focused on building a program that makes good guesses.


== Algorithm overview

The guesser reads a dictionary and then builds a database (which is
reused) with the following information about each word:
* size
* positions of each character
* number of occurrences of each character

The basic algorithm is as follows:
* Remove all words that do not have a matching length.
* While the game has not been solved:
** Pick the character included in the most words still remaining.
** If the character is not in the word: remove all words with the character.
** If the character is in the word: remove all words that do not contain
the character at exactly the revealed positions.

=== Weaknesses

The algorithm is not optimal. The character that's included in the most
words is not necessarily the character which will give the largest
reduction in number of potential words since position is not considered.
Consider a dictionary with the following:

bdc
ebc
fcb

b and c are tied for number of occurrences, but b would be the better
choice. If we pick b we will in all cases be left with one potential
word. If we pick c and the word is one of the first two we get two
potential words.

My guess is that it's a good enough heuristic in most cases though.

Also note that this program is built on the assumption that the word is
picked randomly from the dictionary. More refined solutions could weigh
in the relative frequency of different words in normal English text.


== Speed

It takes about 40 minutes to create the database for a dictionary with
4*10^5 words, but it only has to be created once.

Computing all guesses for a word (i.e. from being given the length to
having the correct word) takes about 30 to 40 seconds for a dictionary
with 4*10^5 words. That time includes about 10 seconds to reset the
database from previous uses, another 10 seconds for pruning based on
word length and the rest for the remaining search.

=== Possible improvements

Much of the initial sorting could be precomputed (e.g. split words into
different table based on length and then only work against the table
with the specified length) to cut down on the time needed reset and do
the initial pruning. The first (and possibly some additional steps)
could also be precomputed.


== Dependencies

Requires a mysql database and the mysql-gem. You need to enter your
username, passwords and database name in HangmanGuesser#db_connection below.


== The code

#!/usr/bin/env ruby
# == Synopsis
#
# automated_hangman: plays a game of hangman with the word of your
# choice
#
# == Usage
#
# automated_hangman [OPTION] ... WORD
#
# -h, --help:
#    show help
#
# -d, --dictionary [dictionary location]:
#    sets up the database to use the specified dictionary (defaults to
#    /usr/share/dict/words), can take some time
#
# WORD: The word that the program should try to guess.


require 'getoptlong'
require 'rdoc/usage'
require 'mysql'

# Describes a game of hangman.
class Hangman
  LIVES = 6

  # Creates a new game of hangman where word is the target word.
  def initialize(word)
    @guesses = []
    @word_characters = word.chomp.downcase.split(//)
  end

  # Returns an array containing the incorrect guessed characters.
  def incorrect_guesses
    @guesses - @word_characters
  end

  # Guesses a specified character. Returns an array of indices (possibly
  # empty) where the character was found.
  def guess(char_guess)
    @guesses << char_guess
    indices = []
    @word_characters.each_with_index do |character, index|
      indices << index if character == char_guess
    end
    return indices
  end

  # Returns a string representation of the current progress.
  def to_s
    hidden_characters = @word_characters - @guesses
    return @word_characters.join(' ') if hidden_characters.empty?
    @word_characters.join(' ').gsub(
      /[#{hidden_characters.uniq.join}]/, '_')
  end

  # Checks whether the player has won.
  def won?
    (@word_characters - @guesses).empty?
  end

  # Checks whether the player has lost.
  def lost?
    incorrect_guesses.size > LIVES
  end

  # Gets the number of characters in the word.
  def character_count
    @word_characters.size
  end
end

# The guessing machine which picks the guesses.
class HangmanGuesser
  # The location of the default dictionary to use.
  DICTIONARY_FILE = '/usr/share/dict/words'
  # An array of the characters that should be considered.
  CHARACTERS = ('a'..'z').to_a
  # Set this to true to see how the search progresses.
  VERBOSE = true
  # The maximum word length accepted.
  MAX_WORD_LENGTH = 50

  # The dictionary given should be the location of a file containing one
  # word per line. The characters should be an array of all characters
  # that should be considered (i.e. no words with other characters are
  # included).
  def initialize(hangman_game, characters = CHARACTERS)
    @con = self.class.db_connection
    @characters = characters
    @hangman_game = hangman_game

    reset_tables
    prune_by_word_length @hangman_game.character_count
  end

  # Returns the guesses that the guesser would make.
  def guesses
    @guesses = []
    log{ "There are #{word_count} potential words left." }
    while not @hangman_game.won?
      guess = next_guess
      raise 'The word is not in the dictionary.' if guess.nil?
      @guesses << guess
      log{ "Guessing #{guess}" }
      add_information(guess, @hangman_game.guess(guess))
      log_state
      log{ "\n" }
    end
    return @guesses
  end

  class << self
    # Creates the database and populates it with the dictionary file
    # located at the specified location. Only considers the specified
    # characters (array).
    def create_database(dictionary = DICTIONARY_FILE,
        characters = CHARACTERS)
      @con = db_connection
      @characters = characters
      @tables = ['words'] + @characters +
        @characters.map{ |c| c + '_occurrences'}
      create_tables
      populate_tables File.open(dictionary)
    end

    # Connects to the database that should store the tables.
    def db_connection
      # Replace <username> and <password> with the database username and
      # password.
      Mysql.real_connect("localhost", <username>, <password>, "hangman")
    end

    private

    # Creates the tables used to store words.
    def create_tables
      # Drop old tables.
      @tables.each do |table|
        @con.query "DROP TABLE IF EXISTS `#{table}`"
      end

      # Words table.
      @con.query <<-"end_sql"
        CREATE TABLE `words` (
          `word_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
          `word` varchar(#{MAX_WORD_LENGTH}) NOT NULL,
          `length` tinyint(3) unsigned NOT NULL,
          `removed` tinyint(1) unsigned NOT NULL DEFAULT '0',
          PRIMARY KEY (`word_id`),
          INDEX (`removed`),
          INDEX (`length`)
        ) ENGINE=MyISAM
      end_sql

      # Tables for the number of occurrences of each character.
      character_occurrences_table_template =<<-'end_template'
        CREATE TABLE `%s_occurrences` (
          `word_id` mediumint(8) unsigned NOT NULL,
          `occurrences` tinyint(3) unsigned NOT NULL,
          PRIMARY KEY (`occurrences`, `word_id`),
          INDEX (`word_id`)
        ) ENGINE=MyISAM
      end_template

      # Tables for the positions of each character.
      character_table_template =<<-'end_template'
        CREATE TABLE `%s` (
          `word_id` mediumint(8) unsigned NOT NULL,
          `position` tinyint(3) unsigned NOT NULL,
          PRIMARY KEY (`position`, `word_id`),
          INDEX (`word_id`)
        ) ENGINE=MyISAM
      end_template

      @characters.each do |character|
        @con.query character_occurrences_table_template % character
        @con.query character_table_template % character
      end
    end

    # Loads a dictionary into the database.
    def populate_tables(dictionary_file)
      # Disable the keys so that we don't update the indices while
      # adding.
      @tables.each do |table|
        @con.query("ALTER TABLE #{table} DISABLE KEYS")
      end

      # Prepare statements.
      add_word = @con.prepare(
        "INSERT INTO words (word, length) VALUES (?, ?)")
      add_character = {}
      add_character_occurrences = {}
      @characters.each do |character|
        add_character[character] = @con.prepare(
          "INSERT INTO #{character} (word_id, position) VALUES (?, ?)")
        add_character_occurrences[character] = @con.prepare(
          "INSERT INTO #{character}_occurrences " +
          "(word_id, occurrences) VALUES (?, ?)")
      end

      # Populate the database.
      previous_word = nil
      dictionary_file.each_line do |line|
        # Only consider words that only contain characters a-z. Make
        # sure we don't get duplicates.
        word = line.chomp.downcase
        next if word == previous_word or word =~ /[^a-z]/ or
          word.size > MAX_WORD_LENGTH

        # Add the word, its character positions and number of
        # occurrences.
        add_word.execute(word, word.size)
        word_id = @con.insert_id
        characters = word.split(//)
        characters.each_with_index do |character, position|
          add_character[character].execute(word_id, position)
        end
        @characters.each do |character|
          occurrences = characters.select{ |c| c == character }.size
          add_character_occurrences[character].execute(
            word_id, occurrences)
        end

        previous_word = word
      end

      # Generate the indices.
      @tables.each do |table|
        @con.query("ALTER TABLE #{table} ENABLE KEYS")
      end
    end
  end

  private

  # Logs the current state of the guessing process.
  def log_state
    log do
      messages = []
      messages << @hangman_game.to_s
      count = word_count
      messages << "There are #{count} potential words left."
      if count <= 10
        res = @con.query('SELECT word FROM words WHERE removed = 0')
        res.each{ |row| messages << row[0] }
        res.free
      end
      messages.join("\n")
    end
  end

  # Logs the string produced by the block (may not be executed at all).
  def log(&block)
    puts yield() if VERBOSE
  end

  # Gets the number of potential words left.
  def word_count
    res = @con.query('SELECT COUNT(*) FROM words WHERE removed = 0')
    count = res.fetch_row[0].to_i
    res.free
    return count
  end

  # Computes the next character that should be guessed. The next guess
  # is the character (that has not yet been tried) that occurrs in the
  # most words remaining.
  def next_guess
    next_character = nil
    max_count = 0
    (@characters - @guesses).each do |character|
      res = @con.query(
        "SELECT COUNT(DISTINCT word_id) FROM #{character} " +
        "NATURAL JOIN words WHERE removed = 0")
      count = res.fetch_row[0].to_i
      res.free
      if count > max_count
        next_character = character
        max_count = count
      end
    end
    return next_character
  end

  # Adds the information about at what indices in the word the specified
  # character can be found to the guesser.
  def add_information(character, indices)
    if indices.empty?
      # The character isn't in the word.
      sql =<<-"end_sql"
        UPDATE words SET removed = 1 WHERE removed = 0 AND word_id IN (
          SELECT word_id FROM #{character}
        )
      end_sql
    else
      # Remove all words where the character isn't at the specified
      # places.
      sql =<<-"end_sql"
        UPDATE words NATURAL JOIN #{character}_occurrences
        SET removed = 1
        WHERE removed = 0
          AND (occurrences != #{indices.size}
            OR word_id IN (
              SELECT word_id FROM #{character}
              WHERE position NOT IN (#{indices.join(', ')})
            )
          )
      end_sql
    end
    @con.query(sql)
  end

  # Resets the table to start a new round of guesses.
  def reset_tables
    @con.query('UPDATE words SET removed = 0')
  end

  # Prunes all words that do not have the specified length.
  def prune_by_word_length(expected_length)
    @con.query(
      "UPDATE words SET removed = 1 WHERE length != #{expected_length}")
  end
end


opts = GetoptLong.new(
  [ '--help', '-h', GetoptLong::NO_ARGUMENT],
  ['--dictionary', '-d', GetoptLong::OPTIONAL_ARGUMENT])
opts.each do |opt, arg|
  case opt
    when '--help'
      RDoc::usage
    when '--dictionary'
      if arg != ''
        HangmanGuesser.create_database(arg)
      else
        HangmanGuesser.create_database
      end
  end
end

if ARGV.size != 1
  abort "Incorrect usage, see --help"
end

game = Hangman.new(ARGV[0])
guesses =  HangmanGuesser.new(game).guesses
if game.won?
  puts 'Successfully guessed the word.'
else game.lost?
  puts 'Failed guessing the word.'
end
puts "Made the following guesses: #{guesses.join(', ')}"
puts "Expended a total of #{game.incorrect_guesses.size} lives."


-- 
Andreas Launila