Core contains the primitives for playing a game between two players.

class core.Player(player_id, agent)[source]

Player is a simple abstraction whose policy is defined by the agent that backs it. Agents learn the optimal play for each player, while a player is only concerned with the optimal play for itself.


player_id (int) – The id of the player


agent (Agent) – The agent associated with the player

Raises:ValueError – If the id is not 0 or 1
action(g, s, flip)[source]

Take an action with the backing agent. If the starting player is not 0, the board is inverted so that, from the perspective of the agent, the starting player is still 0.

  • g (Game) – The game the player is playing
  • s (any) – The state of the game
  • flip (bool) – Whether or not to flip the state so that the agent thinks that player 0 started the game. This is necessary since trainable agents like MCTSAgent operate under the assumption that player 0 always starts

Returns:The index of the action the player will take
Return type:int

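The behavior described above can be sketched as follows. This is an illustration rather than the actual implementation; in particular, the `invert_state` method name is an assumption, since this documentation does not name the Game method that inverts a state:

```python
class Player:
    """Sketch of the Player abstraction: a thin wrapper that defers
    to its backing agent, optionally flipping the board first."""

    def __init__(self, player_id, agent):
        if player_id not in (0, 1):
            raise ValueError("player_id must be 0 or 1")
        self.player_id = player_id
        self.agent = agent

    def action(self, g, s, flip):
        # If the actual starting player was not 0, flip the board so
        # the agent always sees the game as if player 0 had started.
        # (invert_state is an assumed name for the Game method that
        # swaps the two players in a state.)
        if flip:
            s = g.invert_state(s)
        return self.agent.action(g, s, self.player_id)
```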

class core.Arena(game, players)[source]

Place where two agents are pitted against each other in a series of games. Statistics on the win rates are recorded and can be displayed.


game (Game) – The game that is being played


players (list) – List of Player objects. Note that there should be exactly two, and each player's id should match its index in the list.


int – The number of games played in the arena


list – List of two integers representing the number of wins of each player, with the index being the id of the player


Play a single game, doing the necessary bookkeeping to maintain accurate statistics and returning the winner (or -1 if no winner).


We always start with player 0 from the perspective of the agent. Because of this we pass a flip boolean into the Player class's action method, which flips the board and makes it seem as though player 0 started, even if it was actually player 1.

Parameters:verbose (bool) – Whether or not to print the output of the game. Defaults to false
Returns:The winner of the game
Return type:int
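
The flipping convention can be illustrated with a minimal game loop. This is a sketch, not the Arena implementation: the Game method names used here (`initial_state`, `terminal`, `winner`, `to_readable_string`) are assumptions, since they are not named in this documentation, and the real Arena presumably chooses the starting player itself rather than taking a `start` argument:

```python
def play_game_sketch(g, players, start=0, verbose=False):
    """Sketch of a single game between two Players under the
    conventions above."""
    # If player 1 starts, ask each Player to flip the board so its
    # agent still sees the game as though player 0 started.
    flip = start != 0
    s = g.initial_state()
    p = start
    while not g.terminal(s):
        a = players[p].action(g, s, flip)
        s = g.next_state(s, a, p)
        if verbose:
            print(g.to_readable_string(s))
        p = 1 - p
    return g.winner(s)  # -1 if there is no winner
```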

Play a series of games between the players, recording the results so that statistics on which player performed better can be displayed.

  • num_episodes (int) – The number of games to play, defaults to 10
  • verbose (bool) – Whether or not to print output from each game. Defaults to false

Print out the statistics for a given series of games.

class core.Game[source]

Game class, which is extended to implement different types of adversarial, zero-sum games. The class itself is stateless and all methods are static.


For any given state, returns a list of all possible valid actions

Parameters:s (any) – The state of the game

Invert the state of the board so that player 0 becomes player 1 and vice versa

Parameters:s (any) – The state of the game

Return the initial state of the game

next_state(s, a, p)[source]

Given a state, action, and player id, return the state resulting from the player making that move

  • s (any) – The state of the game
  • a (int) – The action for the player to take
p (int) – The player taking the action
reward(s, p)[source]

Returns the reward for a given state from the perspective of a given player

  • s (any) – The state of the game
  • p (int) – The player to get the reward for

Returns whether a given state is terminal

Parameters:s (any) – The state of the game

Returns a hash of the game state, which is necessary for some algorithms such as MCTS

Parameters:s (any) – The state of the game

Returns a pretty-formatted representation of the board

Parameters:s (any) – The state of the game

Returns the winner of a game, or -1 if there is no winner

Parameters:s (any) – The state of the game
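
A minimal concrete Game might look like the following sketch of one-pile Nim. The method names that this documentation elides (`initial_state`, `action_space`, `invert_state`, `terminal`, `winner`, `to_readable_string`) are assumptions, and the state encoding is purely illustrative:

```python
class Nim:
    """Tiny zero-sum game in the documented Game shape: players
    alternately remove 1 or 2 sticks; taking the last stick wins.
    State is (sticks_remaining, last_player), with last_player == -1
    before the first move. All methods are static, as documented."""

    @staticmethod
    def initial_state():
        return (5, -1)

    @staticmethod
    def action_space(s):
        sticks, _ = s
        return [a for a in (1, 2) if a <= sticks]

    @staticmethod
    def invert_state(s):
        # Swap the player ids recorded in the state.
        sticks, last = s
        return (sticks, {0: 1, 1: 0}.get(last, last))

    @staticmethod
    def next_state(s, a, p):
        return (s[0] - a, p)

    @staticmethod
    def reward(s, p):
        # +1 if p took the last stick, -1 if the opponent did, else 0.
        if s[0] > 0:
            return 0
        return 1 if s[1] == p else -1

    @staticmethod
    def terminal(s):
        return s[0] == 0

    @staticmethod
    def hash(s):
        return hash(s)

    @staticmethod
    def to_readable_string(s):
        return "|" * s[0]

    @staticmethod
    def winner(s):
        return s[1] if s[0] == 0 else -1
```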

Note: There are two types of agents, agents that are trainable and agents that are not. If an agent is trainable then it inherits from the TrainableAgent class and must implement all of the members defined below. For example, MCTSAgent is a trainable agent, while MinimaxAgent is not.

class core.Agent[source]

An agent class which exposes a method called action. Given a certain state of a game and the player that is playing, the agent returns the best action it can find, according to a certain heuristic or strategy

action(g, s, p)[source]

Given a game, a state of the game, and the current player, return an action

  • g (Game) – The game the agent is competing in
  • s (any) – The state of the game
  • p (int) – The current player (either 0 or 1)

Returns:The index of the action within the returned action space
Return type:int


class core.TrainableAgent[source]

Class that extends the functionality of a normal agent. This is necessary because agents are bound to a particular player, but for some algorithms the agent is really being trained to play optimally for both players. This class therefore houses the training data, which is passed into the agents when they are instantiated to avoid duplicated work.

train(g, **kwargs)[source]

Train the agent. As a convenience this should return self.training_params() at the end of training

Parameters:g (Game) – The game the agent is training on
Returns:The training params of the agent
Return type:tuple
train_episode(g, **kwargs)[source]

Single training iteration

Parameters:g (Game) – The game the agent is training on

Return the params that result from training

Parameters:g (Game) – The game the agent is training on
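
The shared-training-data pattern described above can be sketched as follows. The visit-counting "training" is only a stand-in for a real algorithm such as MCTS, and the `initial_state` name is an assumption; the point is the shape: train once, then hand the params to each player's agent at construction:

```python
class CountingAgent:
    """Sketch of a TrainableAgent: one instance trains, and the
    resulting params are shared by the two per-player instances."""

    def __init__(self, params=None):
        # Shared training data, e.g. a table of state statistics.
        self.visits = params[0] if params is not None else {}

    def train(self, g, num_episodes=10, **kwargs):
        for _ in range(num_episodes):
            self.train_episode(g, **kwargs)
        # Per the documented convention, return the training params.
        return self.training_params(g)

    def train_episode(self, g, **kwargs):
        # Toy "episode": count visits to the initial state.
        h = g.hash(g.initial_state())  # initial_state name is assumed
        self.visits[h] = self.visits.get(h, 0) + 1

    def training_params(self, g):
        return (self.visits,)
```

Usage would then be to train once and construct both players' agents from the same params, so they share the underlying tables instead of each retraining.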
class core.Algorithm[source]

A basic abstraction for a class that finds an action to take in a given state for a given player. Even if the algorithm is not stateful it is still implemented as a class to provide a uniform interface.


Although this interface is almost identical to an agent's, the two are distinct: an agent can use multiple algorithms to come up with an action for a player to execute in a game.

best_action(g, s, p)[source]

Return the best action given a state and player

  • g (Game) – The game object
  • s (any) – The current state of the game
  • p (int) – The current player

Returns:The best action the algorithm can find
Return type:int
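
The distinction above, where an agent may consult several algorithms, could look like the following sketch. All class names, the mixing rule, and the `action_space` method name are illustrative assumptions, not part of the library:

```python
import random

class FirstActionAlgorithm:
    """Toy Algorithm with the documented best_action(g, s, p) shape."""
    def best_action(self, g, s, p):
        return g.action_space(s)[0]

class RandomAlgorithm:
    """Toy Algorithm that picks a uniformly random valid action."""
    def best_action(self, g, s, p):
        return random.choice(g.action_space(s))

class MixedAgent:
    """Sketch of an Agent consulting two Algorithms: it uses the
    primary algorithm while many actions remain and falls back to
    the other in smaller positions. A real agent might mix, say,
    MCTS and minimax this way; the threshold rule is made up."""

    def __init__(self, primary, fallback, threshold=3):
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold

    def action(self, g, s, p):
        wide = len(g.action_space(s)) > self.threshold
        algo = self.primary if wide else self.fallback
        return algo.best_action(g, s, p)
```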