Algorithms

Collection of useful AI algorithms

class algorithms.MCTS[source]

Implementation of Monte Carlo Tree Search. We want to learn how to play a game by keeping track of the best action in any state. We do this by propagating whether or not the current player won the game back up through the game history. After enough simulated games we can choose the best move based on this stored information.
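
A minimal usage sketch based on the methods documented below; the Game implementation (TicTacToe) and the initial_state helper are assumptions, not part of this module:

    from algorithms import MCTS
    from games import TicTacToe   # hypothetical Game implementation

    g = TicTacToe()
    mcts = MCTS()

    # Build up win/play statistics through self-play.
    mcts.search(g, num_iters=500, verbose=True)

    # Query the learned statistics for the best move in a state.
    s = g.initial_state()          # assumed Game helper
    p = 1                          # player about to move
    a = mcts.best_action(g, s, p)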

wins

dict – A dictionary where the key is a tuple (player, state_hash) and the value is the number of wins that occurred at that state for the player. Note that the player represents whoever played the move in the state.

plays

dict – A dictionary of the same format as wins, where the value is the number of times the player made a move in the given state.
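
For example (illustrative values; the state hash strings are made up), the empirical value of a state for a player is the ratio of the two counts:

    # Hypothetical contents after some simulations
    wins  = {(1, 'a3f9'): 12, (2, 'a3f9'): 3}
    plays = {(1, 'a3f9'): 20, (2, 'a3f9'): 20}

    # Estimated win rate for player 1 in that state
    win_rate = wins[(1, 'a3f9')] / plays[(1, 'a3f9')]   # 0.6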

best_action(g, s, p)[source]

Get the best action for a given player in a given game state

Parameters:
  • g (Game) – The game
  • s (state) – The current state of the game
  • p (int) – The current player
Returns:

The best action given the current knowledge of the game

Return type:

int

execute_episode(g, nnet=None, c_punt=1.4)[source]

Execute a single iteration of the search and update the internal state based on the generated examples

Parameters:
  • g (Game) – The game
  • nnet (Network) – Optional network to be used for the policy instead of a naive random playout
  • c_punt (float) – The degree of exploration. Defaults to 1.4
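
Continuing the usage sketch from the class description, a single iteration plausibly amounts to pairing search_episode with update; this pairing is an assumption based on the documented methods, not a description of the actual implementation:

    # One self-play episode followed by a statistics update (assumed flow)
    examples = mcts.search_episode(g, nnet=None, c_punt=1.4)
    mcts.update(examples)
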
monte_carlo_action(g, s, p, c_punt)[source]

Choose an action during self-play based on the UCB1 algorithm. Instead of just choosing the action that led to the most wins in the past, we choose the action that balances this concern with exploration; a sketch of the UCB1 score follows this entry.

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game
  • p (int) – The player who is about to make a move
  • c_punt (float) – The degree of exploration
Returns:

Tuple (best_move, expand), where expand is a boolean denoting whether or not the expansion phase has begun

Return type:

tuple
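
A minimal sketch of the UCB1 score this selection is based on. The wins and plays lookups match the documented dictionaries; the parent_visits argument and the exact bookkeeping are assumptions about how the implementation tracks visit counts:

    import math

    def ucb1(wins, plays, player, state_hash, parent_visits, c_punt=1.4):
        """UCB1: average win rate plus an exploration bonus."""
        w = wins.get((player, state_hash), 0)
        n = plays.get((player, state_hash), 0)
        if n == 0:
            return float('inf')   # unvisited states are tried first
        return w / n + c_punt * math.sqrt(math.log(parent_visits) / n)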

pi(g, s)[source]

Return the favorability of each action in a given state

Parameters:
  • g (Game) – The game
  • s (any) – The state to evaluate
Returns:

The favorability of each action

Return type:

list of float
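
Continuing the earlier usage sketch; interpreting the returned list as one score per action id, indexed by action, is an assumption:

    probs = mcts.pi(g, s)            # one float per action
    a = probs.index(max(probs))      # greedily pick the most favorable action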

static random_playout(g, s, p, max_moves=1000)[source]

Perform a random playout and return the winner

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game to start the playout from
  • p (int) – The player whose turn it currently is
  • max_moves (int) – Maximum number of moves before the function exits
Returns:

The winner of the game, or -1 if there is not one

Return type:

int
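
A sketch of such a playout under an assumed Game interface; winner, legal_actions, next_state, and next_player are hypothetical method names, not the library's actual API:

    import random

    def random_playout_sketch(g, s, p, max_moves=1000):
        for _ in range(max_moves):
            w = g.winner(s)                           # assumed terminal check, -1 if no winner
            if w != -1:
                return w
            a = random.choice(g.legal_actions(s))     # assumed: legal moves in state s
            s = g.next_state(s, a, p)                 # assumed: apply move a for player p
            p = g.next_player(p)                      # assumed: switch turns
        return -1                                     # no winner within max_moves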

search(g, num_iters=100, verbose=False, nnet=None, c_punt=1.4)[source]

Play out a certain number of games, each time updating our win and play counts for any state that we visit during the game. As we continue to play, num_wins / num_plays for a given state should begin to converge on the true optimality of that state.

Parameters:
  • g (Game) – Game to train on
  • num_iters (int) – Number of search iterations
  • verbose (bool) – Whether or not to render a progress bar
  • nnet (Network) – Optional network to be used for the policy instead of a naive random playout
  • c_punt (float) – The degree of exploration. Defaults to 1.4
search_episode(g, nnet=None, c_punt=1.4)[source]

We play a game by starting in the board's starting state and then choosing a random move. We then move to the next state, keeping track of which moves we chose. At the end of the game we go through our visited list and update the values of wins and plays so that we have a better understanding of which states are good and which are bad.

Parameters:
  • g (Game) – Game to search
  • nnet (Network) – Optional network to be used for the policy instead of a naive random playout
  • c_punt (float) – The degree of exploration. Defaults to 1.4
Returns:

List of examples where each entry is of the format [player, state_hash, reward]

Return type:

list

update(examples)[source]

Backpropagate the result of the training episodes

Parameters:
  • examples (list) – List of examples where each entry is of the format [player, state_hash, reward]
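
A plausible sketch of this backpropagation step over the documented wins and plays dictionaries; whether the reward is thresholded to a win count or added directly is an assumption:

    def update_sketch(mcts, examples):
        for player, state_hash, reward in examples:
            key = (player, state_hash)
            mcts.plays[key] = mcts.plays.get(key, 0) + 1
            # Assumption: a positive reward marks a win for this player
            mcts.wins[key] = mcts.wins.get(key, 0) + (1 if reward > 0 else 0)
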
class algorithms.Minimax(horizon=inf)[source]

Implementation of the minimax algorithm.

horizon

int – The max depth of the search. Defaults to infinity. Note that if a finite horizon is set, the game's heuristic is used to evaluate states at the depth cutoff.
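
A short usage sketch, assuming g, s, and p are a Game, state, and player as described in the parameter lists below:

    from algorithms import Minimax

    mm = Minimax(horizon=4)       # depth-limited search using the game's heuristic
    a = mm.best_action(g, s, p)   # best action found within the horizon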

best_action(g, s, p)[source]

Return the best action given a state and player

Parameters:
  • g (Game) – The game object
  • s (any) – The current state of the game
  • p (int) – The current player
Returns:

The best action the algorithm can find

Return type:

int

max_play(g, s, p, depth)[source]

Get the largest value of all the child nodes (a combined sketch of max_play and min_play appears after the min_play entry)

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game upon execution
  • p (int) – The current player (who is about to make a move)
  • depth (int) – The current depth of the search tree
Returns:

The largest value of all the child states

Return type:

int

min_play(g, s, p, depth)[source]

Get the smallest value of all the child nodes

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game upon execution
  • p (int) – The current player (who is about to make a move)
  • depth (int) – The current depth of the search tree
Returns:

The smallest value of all the child states

Return type:

int
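
A condensed sketch of how max_play and min_play alternate with the horizon cutoff. The Game methods used for expansion and evaluation (is_terminal, heuristic, legal_actions, next_state) are assumed names, and next_state is assumed to track whose turn it is; this is not the library's actual API:

    def max_play_sketch(g, s, p, depth, horizon):
        # p is the maximizing player; depth counts plies from the root
        if g.is_terminal(s) or depth >= horizon:
            return g.heuristic(s, p)                                   # assumed evaluation
        children = [g.next_state(s, a) for a in g.legal_actions(s)]    # assumed expansion
        return max(min_play_sketch(g, c, p, depth + 1, horizon) for c in children)

    def min_play_sketch(g, s, p, depth, horizon):
        # the opponent moves here, so we take the minimum over child values
        if g.is_terminal(s) or depth >= horizon:
            return g.heuristic(s, p)
        children = [g.next_state(s, a) for a in g.legal_actions(s)]
        return min(max_play_sketch(g, c, p, depth + 1, horizon) for c in children)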