Core
Core contains the primitives for playing a game between two players.
class core.Player(player_id, agent)

Player is a simple abstraction whose policy is defined by the agent that backs it. Agents learn the optimal play for each player, while players are only concerned with the optimal play for themselves.
player_id (int) – The id of the player.

agent (Agent) – The agent associated with the player.

Raises: ValueError – If the id is not 0 or 1.
action(g, s, flip)

Take an action with the backing agent. If the starting player is not 0, the board is inverted so that the starting player is still 0 from the perspective of the agent.

Parameters:
- g (Game) – The game the player is playing.
- s (any) – The state of the game.
- flip (bool) – Whether or not to flip the state so that the agent thinks player 0 started the game. This is necessary because trainable agents such as MCTSAgent operate under the assumption that player 0 always starts.

Returns: The index of the action the player will take.
Return type: int
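To illustrate the flip mechanics, a player wrapper along these lines would apply flip_state before delegating to its agent. This is a sketch only, not the library's implementation, and it assumes a hypothetical agent exposing an action(g, s) method:

```python
# Illustrative sketch only; the real core.Player may differ.
# Assumes a hypothetical agent with an action(g, s) method.

class Player:
    def __init__(self, player_id, agent):
        if player_id not in (0, 1):
            raise ValueError("player_id must be 0 or 1")
        self.player_id = player_id
        self.agent = agent

    def action(self, g, s, flip):
        # When the starting player was not 0, hand the agent a flipped
        # board so it still sees itself as player 0.
        if flip:
            s = g.flip_state(s)
        return self.agent.action(g, s)
```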
class core.Arena(game, players)

Place where two agents are pitted against each other in a series of games. Statistics on the win rates are recorded and can be displayed.
game (Game) – The game that is being played.

players (list) – List of Player objects. Note that there should only be two, and the id of each player should map to its index in the list.

games_played (int) – The number of games played in the arena.

wins (list) – List of two integers giving the number of wins of each player, indexed by player id.
play_game(verbose=False)

Play a single game, doing the necessary bookkeeping to maintain accurate statistics, and return the winner (or -1 if there is no winner).

Note: We always want the starting player to be player 0 from the perspective of the agent. Because of this we pass a flip boolean to the player's action method, which flips the board and makes it seem as though player 0 started, even if it was actually player 1.

Parameters: verbose (bool) – Whether or not to print the output of the game. Defaults to False.
Returns: The winner of the game.
Return type: int
play_games(**kwargs)

Play a series of games between the players, recording how they did so that statistics can be displayed on which player performed better.

Parameters:
- num_episodes (int) – The number of games to play. Defaults to 10.
- verbose (bool) – Whether or not to print output from each game. Defaults to False.
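The bookkeeping described above can be sketched as follows. This is an assumption-laden illustration, not the library's code: it relies on the Game interface documented below plus a hypothetical Game.initial_state() helper, and it omits the flip handling for brevity.

```python
# Sketch of the Arena bookkeeping (assumed, not the real implementation).
# Requires a hypothetical Game.initial_state() helper; flip is omitted.

class Arena:
    def __init__(self, game, players):
        self.game = game
        self.players = players          # players[i].player_id == i
        self.games_played = 0
        self.wins = [0, 0]              # win counts indexed by player id

    def play_game(self, verbose=False):
        s = self.game.initial_state()
        p = 0
        while not self.game.terminal(s):
            a = self.players[p].action(self.game, s, flip=False)
            s = self.game.next_state(s, a, p)
            p = 1 - p
        # The winner is whichever player the terminal state rewards,
        # or -1 if the game was a draw.
        winner = -1
        for pid in (0, 1):
            if self.game.reward(s, pid) == 1:
                winner = pid
        self.games_played += 1
        if winner != -1:
            self.wins[winner] += 1
        if verbose:
            print("winner:", winner)
        return winner

    def play_games(self, num_episodes=10, verbose=False):
        for _ in range(num_episodes):
            self.play_game(verbose=verbose)
```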
class core.Game

Game class, which is extended to implement different types of adversarial, zero-sum games. The class itself is stateless and all methods are static.
action_space(s)

For any given state, return a list of all possible valid actions.

Parameters: s (any) – The state of the game.
flip_state(s)

Invert the state of the board so that player 0 becomes player 1.

Parameters: s (any) – The state of the game.
next_state(s, a, p)

Given a state, an action, and a player id, return the state resulting from the player making that move.

Parameters:
- s (any) – The state of the game.
- a (int) – The action for the player to take.
- p (int) – The player making the move.
reward(s, p)

Return the reward of the given state for the given player.

Parameters:
- s (any) – The state of the game.
- p (int) – The player to get the reward for.
terminal(s)

Return whether the given state is terminal.

Parameters: s (any) – The state of the game.
to_hash(s)

Return a hash of the game state, which is necessary for some algorithms such as MCTS.

Parameters: s (any) – The state of the game.
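To make the interface concrete, here is a minimal sketch of these methods for 3x3 tic-tac-toe. It is purely illustrative and does not come from the library; the initial_state helper is an added convenience that the documented interface does not list.

```python
# Minimal, assumed implementation of the Game interface for tic-tac-toe.
# State: tuple of 9 cells, each -1 (empty), 0, or 1 (player marks).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

class TicTacToe:
    @staticmethod
    def initial_state():
        return (-1,) * 9

    @staticmethod
    def action_space(s):
        # Valid actions are the indices of empty cells
        return [i for i, c in enumerate(s) if c == -1]

    @staticmethod
    def flip_state(s):
        # Swap the two players' marks so player 0 becomes player 1
        return tuple(1 - c if c != -1 else -1 for c in s)

    @staticmethod
    def next_state(s, a, p):
        # Place player p's mark in cell a
        return s[:a] + (p,) + s[a + 1:]

    @staticmethod
    def _winner(s):
        for i, j, k in LINES:
            if s[i] != -1 and s[i] == s[j] == s[k]:
                return s[i]
        return None

    @staticmethod
    def reward(s, p):
        w = TicTacToe._winner(s)
        if w is None:
            return 0
        return 1 if w == p else -1

    @staticmethod
    def terminal(s):
        return TicTacToe._winner(s) is not None or -1 not in s

    @staticmethod
    def to_hash(s):
        # Tuples are hashable, so the state can be hashed directly
        return hash(s)
```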
Note: There are two types of agents: those that are trainable and those that are not. A trainable agent inherits from the TrainableAgent class and must implement all of the members defined below. For example, MCTSAgent is a trainable agent, while MinimaxAgent is not.
class core.Agent

An agent class which exposes a method called action. Given a certain state of a game and the player that is playing, the agent returns the best action it can find, according to a certain heuristic or strategy.
class core.TrainableAgent

Class that extends the functionality of a normal agent. This is necessary because agents are bound to a particular player, but for some algorithms the agent is really being trained to play optimally for both players. This class therefore houses the training data, which is passed into the agents when they are instantiated to avoid duplicated work.
train(g, **kwargs)

Train the agent. As a convenience this should return self.training_params() at the end of training.

Parameters: g (Game) – The game the agent is training on.
Returns: The training params of the agent.
Return type: tuple
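The contract above can be sketched with a toy subclass. Everything here is a stand-in: the tabular random-rollout update, the initial_state() helper, and the shape of training_params() are assumptions for illustration, not the library's actual algorithm.

```python
# Hypothetical TrainableAgent subclass. The value update and the
# Game.initial_state() helper are assumed for illustration only.
import random

class TabularAgent:
    def __init__(self, values=None):
        # Shared training data: state hash -> accumulated outcome.
        # Passing the same dict to both players' agents avoids
        # duplicated work, as described above.
        self.values = values if values is not None else {}

    def training_params(self):
        return (self.values,)

    def train(self, g, num_episodes=10, **kwargs):
        for _ in range(num_episodes):
            s = g.initial_state()
            p = 0
            trajectory = []
            while not g.terminal(s):
                trajectory.append(g.to_hash(s))
                a = random.choice(g.action_space(s))
                s = g.next_state(s, a, p)
                p = 1 - p
            z = g.reward(s, 0)          # outcome from player 0's view
            for h in trajectory:
                self.values[h] = self.values.get(h, 0.0) + z
        # By convention, return the training params at the end of training
        return self.training_params()
```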
class core.Algorithm

A basic abstraction for a class that finds an action to take in a given state for a given player. Even if the algorithm is not stateful, it is still implemented as a class to provide a uniform interface.
Note: Although this interface is almost identical to that of an agent, an agent can use multiple algorithms to come up with an action for a player to execute in a game.