Algorithms

Collection of useful AI algorithms

class algorithms.MCTS[source]

Implementation of Monte Carlo Tree Search. We want to learn how to play a game by keeping track of the best action in any state. We do this by propagating whether or not the current player won the game back up through the game history. After enough simulated games we can choose the best move based on this stored information.
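
A minimal usage sketch based on the methods documented below; the Game implementation (TicTacToe) and the initial_state helper are assumptions, not part of this module:

    from algorithms import MCTS
    from games import TicTacToe   # hypothetical Game implementation

    g = TicTacToe()
    mcts = MCTS()

    # Build up win/play statistics through self-play.
    mcts.search(g, num_iters=500, verbose=True)

    # Query the learned statistics for the best move in a state.
    s = g.initial_state()          # assumed Game helper
    p = 1                          # player about to move
    a = mcts.best_action(g, s, p)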

wins

dict – A dictionary where the key is a tuple (player, state_hash) and the value is the number of wins that occurred at that state for the player. Note that the player represents whoever played the move in the state.

plays

dict – A dictionary of the same format as wins, where the value is the number of times the player made a move in the given state.
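
For example (illustrative values; the state hash strings are made up), the empirical value of a state for a player is the ratio of the two counts:

    # Hypothetical contents after some simulations
    wins  = {(1, 'a3f9'): 12, (2, 'a3f9'): 3}
    plays = {(1, 'a3f9'): 20, (2, 'a3f9'): 20}

    # Estimated win rate for player 1 in that state
    win_rate = wins[(1, 'a3f9')] / plays[(1, 'a3f9')]   # 0.6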

best_action(g, s, p)[source]

Get the best action for a given player in a given game state

Parameters:
  • g (Game) – The game
  • s (state) – The current state of the game
  • p (int) – The current player
Returns:

The best action given the current knowledge of the game

Return type:

int

execute_episode(g, nnet=None, c_punt=1.4)[source]

Execute a single iteration of the search and update the internal state based on the generated examples

Parameters:
  • g (Game) – The game
  • nnet (Network) – Optional network to be used for the policy instead of a naive random playout
  • c_punt (float) – The degree of exploration. Defaults to 1.4
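
Continuing the usage sketch from the class description, a single iteration plausibly amounts to pairing search_episode with update; this pairing is an assumption based on the documented methods, not a description of the actual implementation:

    # One self-play episode followed by a statistics update (assumed flow)
    examples = mcts.search_episode(g, nnet=None, c_punt=1.4)
    mcts.update(examples)
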
monte_carlo_action(g, s, p, c_punt)[source]

Choose an action during self-play based on the UCB1 algorithm. Instead of just choosing the action that led to the most wins in the past, we choose the action that balances this concern with exploration; a sketch of the UCB1 score follows this entry.

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game
  • p (int) – The player who is about to make a move
  • c_punt (float) – The degree of exploration
Returns:

Tuple (best_move, expand), where expand is a boolean denoting whether or not the expansion phase has begun

Return type:

tuple
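
A minimal sketch of the UCB1 score this selection is based on. The wins and plays lookups match the documented dictionaries; the parent_visits argument and the exact bookkeeping are assumptions about how the implementation tracks visit counts:

    import math

    def ucb1(wins, plays, player, state_hash, parent_visits, c_punt=1.4):
        """UCB1: average win rate plus an exploration bonus."""
        w = wins.get((player, state_hash), 0)
        n = plays.get((player, state_hash), 0)
        if n == 0:
            return float('inf')   # unvisited states are tried first
        return w / n + c_punt * math.sqrt(math.log(parent_visits) / n)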

pi(g, s)[source]

Return the favorability of each action in a given state

Parameters:
  • g (Game) – The game
  • s (any) – The state to evaluate
Returns:

The favorability of each action

Return type:

list of float
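
Continuing the earlier usage sketch; interpreting the returned list as one score per action id, indexed by action, is an assumption:

    probs = mcts.pi(g, s)            # one float per action
    a = probs.index(max(probs))      # greedily pick the most favorable action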

static random_playout(g, s, p, max_moves=1000)[source]

Perform a random playout and return the winner

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game to start the playout from
  • p (int) – The player whose turn it currently is
  • max_moves (int) – Maximum number of moves before the function exits
Returns:

The winner of the game, or -1 if there is not one

Return type:

int
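
A sketch of such a playout under an assumed Game interface; winner, legal_actions, next_state, and next_player are hypothetical method names, not the library's actual API:

    import random

    def random_playout_sketch(g, s, p, max_moves=1000):
        for _ in range(max_moves):
            w = g.winner(s)                           # assumed terminal check, -1 if no winner
            if w != -1:
                return w
            a = random.choice(g.legal_actions(s))     # assumed: legal moves in state s
            s = g.next_state(s, a, p)                 # assumed: apply move a for player p
            p = g.next_player(p)                      # assumed: switch turns
        return -1                                     # no winner within max_moves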

search(g, num_iters=100, verbose=False, nnet=None, c_punt=1.4)[source]

Play out a certain number of games, each time updating our win and play counts for any state that we visit during the game. As we continue to play, num_wins / num_plays for a given state should begin to converge on the true optimality of that state.

Parameters:
  • g (Game) – Game to train on
  • num_iters (int) – Number of search iterations
  • verbose (bool) – Whether or not to render a progress bar
  • nnet (Network) – Optional network to be used for the policy instead of a naive random playout
  • c_punt (float) – The degree of exploration. Defaults to 1.4
search_episode(g, nnet=None, c_punt=1.4)[source]

We play a game by starting in the board's starting state and then choosing a random move. We then move to the next state, keeping track of which moves we chose. At the end of the game we go through our visited list and update the values of wins and plays so that we have a better understanding of which states are good and which are bad.

Parameters:
  • g (Game) – Game to search
  • nnet (Network) – Optional network to be used for the policy instead of a naive random playout
  • c_punt (float) – The degree of exploration. Defaults to 1.4
Returns:

List of examples where each entry is of the format [player, state_hash, reward]

Return type:

list

update(examples)[source]

Backpropagate the result of the training episodes

Parameters:
  • examples (list) – List of examples where each entry is of the format [player, state_hash, reward]
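
A plausible sketch of this backpropagation step over the documented wins and plays dictionaries; whether the reward is thresholded to a win count or added directly is an assumption:

    def update_sketch(mcts, examples):
        for player, state_hash, reward in examples:
            key = (player, state_hash)
            mcts.plays[key] = mcts.plays.get(key, 0) + 1
            # Assumption: a positive reward marks a win for this player
            mcts.wins[key] = mcts.wins.get(key, 0) + (1 if reward > 0 else 0)
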
class algorithms.Minimax(horizon=inf)[source]

Implementation of the minimax algorithm.

horizon

int – The max depth of the search. Defaults to infinity. Note that if a finite horizon is set, the game's heuristic is used to evaluate states at the depth cutoff.
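
A short usage sketch, assuming g, s, and p are a Game, state, and player as described in the parameter lists below:

    from algorithms import Minimax

    mm = Minimax(horizon=4)       # depth-limited search using the game's heuristic
    a = mm.best_action(g, s, p)   # best action found within the horizon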

best_action(g, s, p)[source]

Return the best action given a state and player

Parameters:
  • g (Game) – The game object
  • s (any) – The current state of the game
  • p (int) – The current player
Returns:

The best action the algorithm can find

Return type:

int

max_play(g, s, p, depth)[source]

Get the largest value of all the child nodes (a combined sketch of max_play and min_play appears after the min_play entry)

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game upon execution
  • p (int) – The current player (who is about to make a move)
  • depth (int) – The current depth of the search tree
Returns:

The largest value of all the child states

Return type:

int

min_play(g, s, p, depth)[source]

Get the smallest value of all the child nodes

Parameters:
  • g (Game) – The game
  • s (any) – The state of the game upon execution
  • p (int) – The current player (who is about to make a move)
  • depth (int) – The current depth of the search tree
Returns:

The smallest value of all the child states

Return type:

int
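
A condensed sketch of how max_play and min_play alternate with the horizon cutoff. The Game methods used for expansion and evaluation (is_terminal, heuristic, legal_actions, next_state) are assumed names, and next_state is assumed to track whose turn it is; this is not the library's actual API:

    def max_play_sketch(g, s, p, depth, horizon):
        # p is the maximizing player; depth counts plies from the root
        if g.is_terminal(s) or depth >= horizon:
            return g.heuristic(s, p)                                   # assumed evaluation
        children = [g.next_state(s, a) for a in g.legal_actions(s)]    # assumed expansion
        return max(min_play_sketch(g, c, p, depth + 1, horizon) for c in children)

    def min_play_sketch(g, s, p, depth, horizon):
        # the opponent moves here, so we take the minimum over child values
        if g.is_terminal(s) or depth >= horizon:
            return g.heuristic(s, p)
        children = [g.next_state(s, a) for a in g.legal_actions(s)]
        return min(max_play_sketch(g, c, p, depth + 1, horizon) for c in children)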