On Dec 16, 2004, at 9:39 AM, Brian Schröder wrote:

> On Thu, 16 Dec 2004 23:13:16 +0900
> Ruby Quiz <james / grayproductions.net> wrote:
>
>> Knowing this, choose_move() is easy to break down.  It checks to see if it
>> knows anything about the moves from the current position.  If it does, it
>> selects the highest score it can find for itself (else branch).  If it
>> doesn't, it goes with a random choice from all available moves (if branch).
>>
>
> Thanks for the writeup, James. I have to correct this paragraph, as the
> program would not work if it behaved the way you describe. It is important
> to differentiate between exploration and exploitation behaviour.
> Exploration means that the player tries out new moves to learn more about
> the game; exploitation means using the learned knowledge. Your description
> suggests that once the game knows how to make a move, it makes it. If it
> exhibited this behaviour, it would never learn to play differently than it
> did in the first game. The only thing it would learn is that it plays badly
> (setting the score of the states to -INFINITY after an infinite number of
> games).
>
> In fact, there is an adjustable chance that the player picks a random move
> even though it knows a move. This is the exploration factor, which is
> adjusted with the badly named random_prob attribute.

You're correct, of course.  I was trying not to over-complicate my 
explanation, but that is a pretty important detail I omitted.  Thanks 
for keeping me honest.
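
For anyone following the thread, here is a minimal sketch of the behaviour
Brian describes. It is not the code from his solution: the method signature,
the known_scores hash, and the rng parameter are assumptions made for
illustration. Only the choose_move and random_prob names come from the
discussion above.

```ruby
# Hypothetical sketch of exploration vs. exploitation in move selection.
#
# moves        - Array of moves available from the current position
# known_scores - Hash of move => learned score (assumed structure; empty
#                when nothing is known about this position yet)
# random_prob  - chance (0.0..1.0) of exploring even when scores are known
# rng          - Random instance, injectable so the behaviour can be tested
def choose_move(moves, known_scores, random_prob, rng = Random.new)
  if known_scores.empty? || rng.rand < random_prob
    # Explore: try a random move to learn more about the game.
    moves.sample(random: rng)
  else
    # Exploit: use what has been learned and take the best-scoring move.
    # Moves without a score are treated as worst-case (an assumption).
    moves.max_by { |move| known_scores.fetch(move, -Float::INFINITY) }
  end
end
```

With random_prob set to 0.0 the player only explores positions it knows
nothing about, which is the behaviour my summary described and the reason it
would keep replaying its first game. Any value above 0.0 lets it occasionally
try something new.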

James Edward Gray II