On Fri, 10 Dec 2004 23:29:02 +0900
Ruby Quiz <james / grayproductions.net> wrote:

> This week's Ruby Quiz is to implement an AI for playing Tic-Tac-Toe, with a
> catch:  You're not allowed to embed any knowledge of the game into your
> creation beyond how to make legal moves and recognizing that it has won or
> lost.
> 
> Your program is expected to "learn" from the games it plays, until it masters
> the game and can play flawlessly.
> 
> Submissions can have any interface, but should be able to play against humans
> interactively.  However, I also suggest making it easy to play against
> another AI, so you can "teach" the program faster.
> 
> Being able to monitor the learning progression and know when a program has
> mastered the game would be very interesting, if you can manage it.


Here comes my solution. As always, I've set up a website where you can browse
everything online; that will be the most comfortable way.

http://ruby.brian-schroeder.de/quiz/tictactoe/

I implemented minimax with alpha-beta search as a reference. It is impossible to
play better. And here we see that tic-tac-toe is quite uninteresting, in that
it is impossible to win against a perfect player. The best one can achieve is a
draw.
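
For reference, the core of such a search looks roughly like this. This is only a
sketch; the actual minimax-alphabeta implementation is on the website and uses a
different interface, so the names game.final?, game.score, game.moves and
game.play here are made up:

  # Plain minimax with alpha-beta pruning (illustration only).
  # `game` is assumed to respond to #final?, #score (from the maximizing
  # player's point of view), #moves, and #play(move) returning the next state.
  def alphabeta(game, alpha, beta, maximizing)
    return game.score if game.final?
    if maximizing
      game.moves.each do |move|
        alpha = [alpha, alphabeta(game.play(move), alpha, beta, false)].max
        break if alpha >= beta          # beta cut-off: opponent avoids this line
      end
      alpha
    else
      game.moves.each do |move|
        beta = [beta, alphabeta(game.play(move), alpha, beta, true)].min
        break if alpha >= beta          # alpha cut-off
      end
      beta
    end
  end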

As discussed elsewhere, minimax is not a valid solution. So I wrote an AI that
gets to know only the ID of the state the world is in, the IDs of all allowed
moves from there, and the state and move IDs that the opponent chose. It is also
allowed to know when it has reached a final state, and whether it has won,
drawn or lost.

Using this information the AI learns about the possible moves and states, and
assigns each state a value. It then always greedily goes for the state with the
highest value.

The amount of exploration vs. exploitation the AI does can be adjusted, and is
set higher for training than for actual use. Note that you can train the AI to
make errors by repeatedly giving away a chance. This will train the AI to keep
running into the danger that you exploit this chance one day. ;)

The values of all states encountered in one game are updated according to the
outcome of the game, with later states updated more strongly than earlier
states.
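
To make the update rule concrete, here is a small worked example mirroring the
code below (the state names and old values are made up for illustration):

  # A lost game (value -1000.0) that visited the states :s1, :s2, :s3 in order.
  value   = -1000.0
  values  = { :s1 => 40.0, :s2 => 20.0, :s3 => 0.0 }   # made-up old values
  visited = [:s1, :s2, :s3]
  factor  = 1.0
  while state = visited.pop                # walk backwards from the last state
    values[state] = (1.0 - factor) * values[state] + factor * value
    factor *= 0.5
  end
  values  # => :s3 becomes -1000.0, :s2 becomes -490.0, :s1 becomes -220.0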

I will attach only the learning algorithm. The rest of the framework 
- tictactoe-interface
- tictactoe-server
- tictactoe-client
- general minimax-alphabeta implementation
- the board
- the moves
can be browsed online if someone is interested.


  class Learning < BasicInterface
    attr_accessor :random_prob
    attr_reader :player

    def initialize
      @state_values      = Hash.new(0)  # value estimate for each state id
      @state_transitions = {}           # state id -> { move id => resulting state id }
      @random_prob       = 0.05         # probability of exploring with a random move
    end

    def new_game(player)
      @player = player
      @states_visited = []
    end

    def choose_move(game)
      moves = game.moves
      if !@state_transitions[game.state_id] or rand < random_prob
        # Unknown state or exploration: pick a random move
        move = moves[rand(moves.length)]
      else
        # Exploitation: pick the move leading to the most valuable known state
        move_id = @state_transitions[game.state_id].max{ | (ma, sa), (mb, sb) | @state_values[sa] <=> @state_values[sb] }[0]
        move = moves.select{|m| m.move_id == move_id}[0]
      end
      move
    end

    def inform_of_move(before, after, move)
      @states_visited << before.state_id << after.state_id
      (@state_transitions[before.state_id] ||= {})[move.move_id] = after.state_id

      if after.final?
        # Game over: spread the outcome back over the visited states,
        # weighting later states more strongly than earlier ones.
        winner = after.winner
        if winner
          value = winner == self.player ? 100.0 : -1000.0
        else
          value = 0.0
        end

        factor = 1.0
        while state = @states_visited.pop
          @state_values[state] = (1.0 - factor) * @state_values[state] + factor * value
          factor *= 0.5
        end
      end
    end
  end
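
To give an idea how the learner can be "taught" quickly by playing against
another instance of itself, a loop along these lines would do. This is only a
sketch: the Game class, its #play and #final? methods and the :x/:o player
symbols are assumptions; the real framework does this via the tictactoe-server
and clients:

  # Hypothetical self-play training loop (interface names are made up).
  x, o = Learning.new, Learning.new
  [x, o].each { |ai| ai.random_prob = 0.3 }   # explore a lot while training

  10_000.times do
    game = Game.new
    x.new_game(:x)
    o.new_game(:o)
    to_move = x
    until game.final?
      move  = to_move.choose_move(game)
      after = game.play(move)
      [x, o].each { |ai| ai.inform_of_move(game, after, move) }
      game    = after
      to_move = (to_move == x ? o : x)        # alternate turns
    end
  end

  [x, o].each { |ai| ai.random_prob = 0.0 }   # pure exploitation against humans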

Best regards,

Brian

-- 
Brian Schröder
http://ruby.brian-schroeder.de/