Algorithms¶
Collection of useful AI algorithms

class algorithms.MCTS[source]¶ Implementation of Monte Carlo Tree Search. We want to learn how to play a game by keeping track of the best action in any state. We do this by propagating whether or not the current player won the game back up through the game history. After enough simulated games, we can choose the best move based on this stored information.

wins¶ dict – A dictionary keyed by the tuple (player, state_hash) whose value is the number of wins the player recorded from that state. Note that the player is whoever made the move in that state.

plays¶ dict – A dictionary in the same format as wins, representing the number of times the player made a move in the given state.
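The bookkeeping behind these two counters can be illustrated with a small standalone sketch; the `record` helper below is hypothetical and not part of the class:

```python
from collections import defaultdict

# Hypothetical sketch of the wins/plays bookkeeping described above.
wins = defaultdict(int)   # (player, state_hash) -> number of wins
plays = defaultdict(int)  # (player, state_hash) -> number of visits

def record(history, winner):
    """Back-propagate a finished game through every visited state."""
    for player, state_hash in history:
        plays[(player, state_hash)] += 1
        if player == winner:
            wins[(player, state_hash)] += 1

# One finished game: player 1 moved in s0 and s2, player 2 in s1, player 1 won.
record([(1, "s0"), (2, "s1"), (1, "s2")], winner=1)
win_rate = wins[(1, "s0")] / plays[(1, "s0")]  # 1.0 after this single game
```

The ratio `wins / plays` for a given key is the empirical win rate that the search later exploits.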

best_action(g, s, p)[source]¶ Get the best action for a given player in a given game state.
Parameters:  g (Game) – The game
 s (state) – The current state of the game
 p (int) – The current player
Returns: The best action given the current knowledge of the game
Return type: int

execute_episode(g, nnet=None, c_punt=1.4)[source]¶ Execute a single iteration of the search and update the internal state based on the generated examples.
Parameters:  g (Game) – The game
 nnet (Network) – Optional network to be used for the policy instead of a naive random playout
 c_punt (float) – The degree of exploration. Defaults to 1.4

monte_carlo_action(g, s, p, c_punt)[source]¶ Choose an action during self-play based on the UCB1 algorithm. Instead of simply choosing the action that led to the most wins in the past, we choose the action that balances this concern with exploration.
Parameters:  g (Game) – The game
 s (any) – The state of the game
 p (int) – The player who is about to make a move
 c_punt (float) – The degree of exploration
Returns: Tuple (best_move, expand), where expand is a boolean denoting whether or not the expansion phase has begun
Return type: tuple
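The UCB1 balance described above can be sketched as follows. This is an illustration of the formula rather than the library's exact code, with `c_punt` standing in for the exploration constant:

```python
import math

def ucb1(win_count, play_count, total_plays, c_punt=1.4):
    """UCB1 score: empirical win rate plus an exploration bonus."""
    if play_count == 0:
        return float("inf")  # always try unexplored actions first
    exploit = win_count / play_count
    explore = c_punt * math.sqrt(math.log(total_plays) / play_count)
    return exploit + explore

# A heavily explored strong move vs. a barely explored mediocre one:
total = 100
scores = {
    "a": ucb1(60, 80, total),  # 0.75 win rate, explored 80 times
    "b": ucb1(1, 2, total),    # 0.50 win rate, explored twice
}
best = max(scores, key=scores.get)  # "b": the bonus favors the rare move
```

The exploration term shrinks as an action is played more often, so well-understood moves are eventually judged on their win rate alone.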

pi(g, s)[source]¶ Return the favorability of each action in a given state.
Parameters:  g (Game) – The game
 s (any) – The state to evaluate
Returns: The favorability of each action
Return type: list of float
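One common way to produce such favorabilities is to normalize per-action visit counts into a probability-like vector; `pi_from_plays` below is a hypothetical sketch, not this method's implementation:

```python
def pi_from_plays(action_plays):
    """Turn per-action visit counts into a normalized favorability
    vector (hypothetical sketch; uniform when nothing was explored)."""
    total = sum(action_plays)
    if total == 0:
        return [1 / len(action_plays)] * len(action_plays)
    return [n / total for n in action_plays]

probs = pi_from_plays([10, 30, 60])  # most-visited action scores highest
```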

static random_playout(g, s, p, max_moves=1000)[source]¶ Perform a random playout and return the winner.
Parameters:  g (Game) – The game
 s (any) – The state of the game to start the playout from
 p (player) – The player whose turn it currently is
 max_moves (int) – Maximum number of moves before the function exits
Returns: The winner of the game, or 1 if there is not one
Return type: int
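A random playout can be illustrated with a toy Nim game standing in for the Game object. This is a hypothetical sketch; for simplicity it returns -1 when the move budget runs out:

```python
import random

def random_playout_nim(stones, player, max_moves=1000):
    """Random playout of toy Nim (take 1 or 2 stones; whoever takes
    the last stone wins). Players are numbered 1 and 2."""
    for _ in range(max_moves):
        if stones == 0:
            # The previous player took the last stone and won.
            return 3 - player
        stones -= random.choice([1, 2]) if stones >= 2 else 1
        player = 3 - player
    return -1  # no winner within the move budget

random.seed(0)
winner = random_playout_nim(5, player=1)
```

Even with uniformly random moves, averaging many such playouts from a state gives a usable estimate of how good that state is.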

search(g, num_iters=100, verbose=False, nnet=None, c_punt=1.4)[source]¶ Play out a certain number of games, each time updating our win and play counts for every state visited during the game. As we continue to play, num_wins / num_plays for a given state should converge on the true optimality of that state.
Parameters:  g (Game) – Game to train on
 num_iters (int) – Number of search iterations
 verbose (bool) – Whether or not to render a progress bar
 nnet (Network) – Optional network to be used for the policy instead of a naive random playout
 c_punt (float) – The degree of exploration. Defaults to 1.4

search_episode(g, nnet=None, c_punt=1.4)[source]¶ We play a game by starting in the board's starting state and then choosing a random move. We then move to the next state, keeping track of which moves we chose. At the end of the game, we go through our visited list and update the values of wins and plays so that we have a better understanding of which states are good and which are bad.
Parameters:  g (Game) – Game to search
 nnet (Network) – Optional network to be used for the policy instead of a naive random playout
 c_punt (float) – The degree of exploration. Defaults to 1.4
Returns: List of examples, where each entry is of the format [player, state_hash, reward]
Return type: list
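The episode loop described above can be sketched for the same style of toy game. `search_episode_nim` is hypothetical; it uses the pile size as the state hash and plays uniformly at random:

```python
import random

def search_episode_nim(stones=5, max_moves=100):
    """Play one random game of toy Nim (take 1 or 2 stones; taking
    the last stone wins) and return [player, state_hash, reward]
    examples for every visited state."""
    visited, player = [], 1
    for _ in range(max_moves):
        if stones == 0:
            winner = 3 - player  # previous player took the last stone
            return [[p, s, 1 if p == winner else -1] for p, s in visited]
        visited.append((player, stones))  # state hash: just the pile size
        stones -= random.choice([1, 2]) if stones >= 2 else 1
        player = 3 - player
    return [[p, s, 0] for p, s in visited]

random.seed(1)
examples = search_episode_nim()
```

Each example pairs a (player, state) visit with the reward that eventually resulted, which is exactly what the wins/plays update consumes.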


class algorithms.Minimax(horizon=inf)[source]¶ Implementation of the minimax algorithm.

horizon¶ int – The max depth of the search. Defaults to infinity. Note that if this is set, the game's heuristic is used at the horizon.

best_action(g, s, p)[source]¶ Return the best action given a state and player.
Parameters:  g (Game) – The game object
 s (any) – The current state of the game
 p (int) – The current player
Returns: The best action the algorithm can find
Return type: int

max_play(g, s, p, depth)[source]¶ Get the largest value of all the child nodes.
Parameters:  g (Game) – The game
 s (any) – The state of the game upon execution
 p (int) – The current player (who is about to make a move)
 depth (int) – The current depth of the search tree
Returns: The largest value of all the child states
Return type: int

min_play(g, s, p, depth)[source]¶ Get the smallest value of all the child nodes.
Parameters:  g (Game) – The game
 s (any) – The state of the game upon execution
 p (int) – The current player (who is about to make a move)
 depth (int) – The current depth of the search tree
Returns: The smallest value of all the child states
Return type: int
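The mutual recursion between max_play and min_play can be condensed into a single function for the same toy Nim game. This is a standalone sketch, not the library's code, and it ignores the horizon/heuristic cutoff:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def minimax_nim(stones, maximizing):
    """Minimax value of toy Nim (take 1 or 2 stones; taking the last
    stone wins): +1 if the maximizing player wins under perfect play."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    moves = [1, 2] if stones >= 2 else [1]
    values = [minimax_nim(stones - m, not maximizing) for m in moves]
    return max(values) if maximizing else min(values)

# Piles that are multiples of 3 are losses for the player to move.
value = minimax_nim(3, True)  # -1: the maximizer loses from a pile of 3
```

Swapping `max` and `min` as the turn alternates is the whole of the max_play/min_play pair; the horizon attribute would simply cut this recursion off at a fixed depth and fall back to the game's heuristic.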
