On Mon, 13 Dec 2004 23:27:19 +0900
Thomas Leitner <t_leitner / gmx.at> wrote:

> | This week's Ruby Quiz is to implement an AI for playing Tic-Tac-Toe,
> | with a catch:  You're not allowed to embed any knowledge of the game
> | into your creation beyond how to make legal moves and recognizing that
> | it has won or lost.
> | 
> | Your program is expected to "learn" from the games it plays, until it
> | masters the game and can play flawlessly.
> 
> So, I have also tried to program a learning AI player. However, it still
> does not do what it should and I do not know why. Maybe someone with
> more brains can help???
> 
> Here is the code for the AI player:
> 
> class AIPlayer < Player
> 
>   def initialize( game, sign )
>     super( game, sign )
>     @stats = {}
>     @cur_stats = []
>   end
> 
>   def move
>     unless @stats.has_key? @game.board
>       @stats[@game.board.dup] = {}
>       @game.board.valid_moves.each {|m| @stats[@game.board][m] = 0}
>     end
>     moves = @stats[@game.board].sort {|a,b| a[1] <=> b[1]}
>     result = moves.shift
>     result = moves.shift while (moves.length > 0) && rand < 0.02
>     @cur_stats << [@game.board.dup, result[0]]
>     return result[0]
>   end
> 
>   def game_finished( result )
>     mult = case result
>            when :won then 1000
>            when :lost then -100
>            else 0
>            end
>     @cur_stats.each_with_index do |o,i|
>       @stats[o[0]][o[1]] += mult * 2**i
>     end
>     @cur_stats = []
>   end
> 
> end
> 
> A short description for the code:
> 
> The @game object is the current game and knows the current board. When
> it is the AI player's move, the #move method is called and the AI player
> has to return a number between 0 and 8. That's because my tictactoe board
> is a simple array and the array indices correspond to these fields:
> 
> 0 1 2
> 3 4 5
> 6 7 8
> 
> In the #move method, I create a new entry in the @stats Hash (key is the
> board) if it does not have the current board as key and initialize its
> value with a Hash (keys are the valid fields and values are set to 0).
> After that the Hash with the valid moves for this board is taken from
> the @stats Hash and sorted so that the moves with the highest
> probability are in front. Then the first, or if rand < 0.02 the second
> or if rand < 0.02 the third, ... value is returned.
> 
> If the game has finished, the #game_finished method is called and the
> moves that were chosen in the game are assigned new values depending on
> the result of the game.
> 

If I understand your code correctly, you pick the worst move with the highest
probability ;). The sort is ascending, so moves.shift returns the entry with
the lowest score. You should either use moves.pop or sort in reverse order.
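
To illustrate, here is a minimal sketch with a made-up score table for a single
board position (the numbers and variable names are only assumptions for the
example, they are not taken from your program). It shows what the ascending
sort hands to moves.shift and how either fix selects the best-scored field
instead:

```ruby
# Hypothetical score table for one board position: field index => learned score.
scores = { 0 => -200, 4 => 3000, 8 => 0 }

# Ascending sort, as in the quoted #move: the lowest score ends up in front.
ascending = scores.sort { |a, b| a[1] <=> b[1] }
worst = ascending.first[0]            # the field moves.shift would return

# Fix 1: sort in reverse order, then take the first entry as before.
descending = scores.sort { |a, b| b[1] <=> a[1] }
best_by_reverse_sort = descending.first[0]

# Fix 2: keep the ascending sort and take the last entry instead.
best_by_pop = ascending.last[0]       # the field moves.pop would return

puts "shift on ascending sort picks field #{worst}"
puts "reverse sort picks field #{best_by_reverse_sort}"
puts "pop on ascending sort picks field #{best_by_pop}"
```

Note that with moves.pop the exploration loop in #move would need to pop as
well, otherwise it would step towards the worse moves again.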

Note that I used exactly the opposite weighting: -1000 for a loss and 100 for a
win, because it is impossible to win against a decent player, so I wanted to
avoid losses and tend towards draws. I don't know, though, whether this really
makes a difference compared with -1, 0, 1 or other numbers. I hope my choice
does not lead to weird psychoanalysis of me ;)
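
As a rough sketch of that weighting, plugged into the same update as the quoted
#game_finished (the method name reward, the board keys and the game history
below are hypothetical, and this is not my actual player code):

```ruby
# Loss-averse weights: a loss costs far more than a win gains.
# The method name `reward` is an assumption made for this example.
def reward(result)
  case result
  when :won  then 100
  when :lost then -1000
  else 0                      # a draw leaves the scores unchanged
  end
end

# Hypothetical history of one game: [board_key, chosen_field] per move.
history = [[:b0, 4], [:b1, 0], [:b2, 8]]

# Score table: board => { field => score }, defaulting every score to 0.
stats = Hash.new { |h, k| h[k] = Hash.new(0) }

# Same scheme as the quoted #game_finished: later moves are weighted by 2**i.
history.each_with_index do |(board, field), i|
  stats[board][field] += reward(:lost) * 2**i
end

history.each { |board, field| puts "#{board} #{field}: #{stats[board][field]}" }
```

With these numbers a single loss outweighs several wins on the same move, which
is what pushes the player towards draws.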

Best Regards,

Brian

-- 
Brian Schröder
http://www.brian-schroeder.de/