On Fri, 10 Dec 2004 23:29:02 +0900 Ruby Quiz <james / grayproductions.net> wrote: > This week's Ruby Quiz is to implement an AI for playing Tic-Tac-Toe, with a > catch: You're not allowed to embed any knowledge of the game into your > creation beyond how to make legal moves and recognizing that it has won or > lost. > > Your program is expected to "learn" from the games it plays, until it masters > the game and can play flawlessly. > > Submissions can have any interface, but should be able to play against humans > interactively. However, I also suggest making it easy to play against > another AI, so you can "teach" the program faster. > > Being able to monitor the learning progression and know when a program has > mastered the game would be very interesting, if you can manage it. Here comes my solution. As always I've set up a website where you can browse everything online, that will be the most comfortable way. http://ruby.brian-schroeder/quiz/tictactoe/ I implemented minimax with alpa-beta search as a standard. It is impossible to play better. And here we see, that tic-tac-toe is quite uninteresting, in that it is impossible to win against a perfect player. The best one can achieve is a draw. As discussed elsewhere, minimax is not a valid solution. So I wrote a ai that gets to know only the id of the state the world is in, the id's of all allowed moves from here, and the state and move id's that the opponent choose. Also it is allowed to know, when a it has reached a final state, and if it has won, drawn or lost. Using this information the ai learns about the possible moves and states, and assigns each state with a value. It then greedily goes always for the state with the highest value. The amount of exploration vs. exploitation that the AI makes can be adjusted, and is set higher for training than for the actual usage. Note that you can train the AI to make errors, by repeatedly giving away a chance. This will train the AI to always run into danger that you exploit this chance one day. ;) The values of all states encountered in one game are updated according to the outcome of the game. The later states are updated more strongly than earlier states. I will attach only the learning algorithm. The rest of the framework - tictactoe-interface - tictactoe-server - tictactoe-client - general minimax-alphabeta implementation - the board - the moves can be browsed online if someone is interested. class Learning < BasicInterface attr_accessor :random_prob attr_reader :player def initialize @state_values = Hash.new(0) @state_transitions = {} @random_prob = 0.05 end def new_game(player) @player = player @states_visited = [] end def choose_move(game) moves = game.moves if !@state_transitions[game.state_id] or rand < random_prob # Pick a random move move = moves[rand(moves.length)] else # Pick the best move move_id = @state_transitions[game.state_id].max{ | (ma, sa), (mb, sb) | @state_values[sa] <=> @state_values[sb] }[0] move = moves.select{|m| m.move_id == move_id}[0] end move end def inform_of_move(before, after, move) @states_visited << before.state_id << after.state_id (@state_transitions[before.state_id] ||= {})[move.move_id] = after.state_id if after.final? winner = after.winner if winner value = winner == self.player ? 100.0 : -1000.0 else value = 0.0 end factor = 1.0 while state = @states_visited.pop @state_values[state] = (1.0 - factor) * @state_values[state] + factor * value factor *= 0.5 end end end end Best regards, Brian -- Brian Schröäer http://ruby.brian-schroeder.de/