On Dec 16, 2004, at 9:39 AM, Brian Schröder wrote:

> On Thu, 16 Dec 2004 23:13:16 +0900
> Ruby Quiz <james / grayproductions.net> wrote:
>
>> Knowing this, choose_move() is easy to break down. It checks to see
>> if it knows anything about the moves from the current position. If it
>> does, it selects the highest score it can find for itself (else
>> branch). If it doesn't, it goes with a random choice from all
>> available moves (if branch).
>
> Thanks for the writeup, James. I have to correct this paragraph, as
> the program would not work if it worked the way you describe. It is
> important to differentiate between exploration and exploitation
> behaviour. Exploration means that the player tries out new moves to
> learn more about the game; exploitation means using the knowledge it
> has already learned. Your description suggests that once the player
> knows a move, it always makes it. If it behaved that way, it would
> never learn to play any differently than it did in its first game. The
> only thing it would learn is that it plays badly (the scores of those
> states would approach -INFINITY after an infinite number of games).
>
> In fact, there is an adjustable chance that the player picks a random
> move even when it knows a move. This is the exploration factor, which
> is set with the badly named random_prob attribute.

You're correct, of course. I was trying not to over-complicate my explanation, but that is a pretty important detail I omitted. Thanks for keeping me honest.

James Edward Gray II
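For illustration, the exploration/exploitation split Brian describes can be sketched as an epsilon-greedy move chooser. This is a minimal sketch, not the quiz solution's actual code: the class name LearningPlayer, the score table layout, and the learn/choose_move signatures are hypothetical. Only the random_prob name comes from the discussion above.

```ruby
# Hypothetical sketch of epsilon-greedy move selection.
# LearningPlayer, @scores and the method signatures are illustrative
# assumptions; only the random_prob attribute name comes from the thread.
class LearningPlayer
  # Chance (0.0..1.0) of exploring with a random move even when
  # scored moves are known for the current position.
  attr_accessor :random_prob

  def initialize(random_prob = 0.1, rng = Random.new)
    @random_prob = random_prob
    @scores = {}  # position => { move => score }
    @rng = rng
  end

  # Record (or overwrite) the learned score of a move from a position.
  def learn(position, move, score)
    (@scores[position] ||= {})[move] = score
  end

  def choose_move(position, available_moves)
    # Only consider learned moves that are legal right now.
    known = @scores.fetch(position, {}).select { |move, _| available_moves.include?(move) }

    if known.empty? || @rng.rand < @random_prob
      # Exploration: nothing known yet, or the random_prob roll succeeded.
      available_moves.sample(random: @rng)
    else
      # Exploitation: play the highest-scoring known move.
      known.max_by { |_, score| score }.first
    end
  end
end
```

With random_prob set to 0.0, the sketch degenerates into the behaviour described in the original summary (always play the best known move), which is exactly the case Brian points out can never improve. Any value above zero keeps the player occasionally trying moves it has not yet scored.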