Core
Core contains the primitives for playing a game between two players.
class core.Player(player_id, agent)

Player is a simple abstraction whose policy is defined by the agent that backs it. Agents learn the optimal play for each player, while players are only concerned with the optimal play for themselves.
player_id (int) – The id of the player.

agent (Agent) – The agent associated with the player.

Raises: ValueError – If the id is not 0 or 1.
action(g, s, flip)

Take an action with the backing agent. If the starting player is not 0, the board is inverted so that the starting player is still 0 from the perspective of the agent.

Parameters:
- g (Game) – The game the player is playing.
- s (any) – The state of the game.
- flip (bool) – Whether or not to flip the state so that the agent thinks player 0 started the game. This is necessary because trainable agents such as MCTSAgent operate under the assumption that player 0 always starts.

Returns: The index of the action the player will take.
Return type: int
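To illustrate the flip mechanics, a player wrapper along these lines would apply flip_state before delegating to its agent. This is a sketch only, not the library's implementation, and it assumes a hypothetical agent exposing an action(g, s) method:

```python
# Illustrative sketch only; the real core.Player may differ.
# Assumes a hypothetical agent with an action(g, s) method.

class Player:
    def __init__(self, player_id, agent):
        if player_id not in (0, 1):
            raise ValueError("player_id must be 0 or 1")
        self.player_id = player_id
        self.agent = agent

    def action(self, g, s, flip):
        # When the starting player was not 0, hand the agent a flipped
        # board so it still sees itself as player 0.
        if flip:
            s = g.flip_state(s)
        return self.agent.action(g, s)
```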
class core.Arena(game, players)

Place where two agents are pitted against each other in a series of games. Statistics on the win rates are recorded and can be displayed.
game (Game) – The game that is being played.

players (list) – List of Player objects. Note that there should only be two, and the id of each player should map to its index in the list.

games_played (int) – The number of games played in the arena.

wins (list) – List of two integers giving the number of wins of each player, indexed by player id.
play_game(verbose=False)

Play a single game, doing the necessary bookkeeping to maintain accurate statistics, and return the winner (or -1 if there is no winner).

Note: We always want the starting player to be player 0 from the perspective of the agent. Because of this we pass a flip boolean to the player's action method, which flips the board and makes it seem as though player 0 started, even if it was actually player 1.

Parameters: verbose (bool) – Whether or not to print the output of the game. Defaults to False.
Returns: The winner of the game.
Return type: int
play_games(**kwargs)

Play a series of games between the players, recording how they did so that statistics can be displayed on which player performed better.

Parameters:
- num_episodes (int) – The number of games to play. Defaults to 10.
- verbose (bool) – Whether or not to print output from each game. Defaults to False.
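The bookkeeping described above can be sketched as follows. This is an assumption-laden illustration, not the library's code: it relies on the Game interface documented below plus a hypothetical Game.initial_state() helper, and it omits the flip handling for brevity.

```python
# Sketch of the Arena bookkeeping (assumed, not the real implementation).
# Requires a hypothetical Game.initial_state() helper; flip is omitted.

class Arena:
    def __init__(self, game, players):
        self.game = game
        self.players = players          # players[i].player_id == i
        self.games_played = 0
        self.wins = [0, 0]              # win counts indexed by player id

    def play_game(self, verbose=False):
        s = self.game.initial_state()
        p = 0
        while not self.game.terminal(s):
            a = self.players[p].action(self.game, s, flip=False)
            s = self.game.next_state(s, a, p)
            p = 1 - p
        # The winner is whichever player the terminal state rewards,
        # or -1 if the game was a draw.
        winner = -1
        for pid in (0, 1):
            if self.game.reward(s, pid) == 1:
                winner = pid
        self.games_played += 1
        if winner != -1:
            self.wins[winner] += 1
        if verbose:
            print("winner:", winner)
        return winner

    def play_games(self, num_episodes=10, verbose=False):
        for _ in range(num_episodes):
            self.play_game(verbose=verbose)
```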
class core.Game

Game class, which is extended to implement different types of adversarial, zero-sum games. The class itself is stateless and all methods are static.
action_space(s)

For any given state, return a list of all possible valid actions.

Parameters: s (any) – The state of the game.
flip_state(s)

Invert the state of the board so that player 0 becomes player 1.

Parameters: s (any) – The state of the game.
next_state(s, a, p)

Given a state, an action, and a player id, return the state resulting from the player making that move.

Parameters:
- s (any) – The state of the game.
- a (int) – The action for the player to take.
- p (int) – The player making the move.
reward(s, p)

Return the reward of the given state for the given player.

Parameters:
- s (any) – The state of the game.
- p (int) – The player to get the reward for.
terminal(s)

Return whether the given state is terminal.

Parameters: s (any) – The state of the game.
to_hash(s)

Return a hash of the game state, which is necessary for some algorithms such as MCTS.

Parameters: s (any) – The state of the game.
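To make the interface concrete, here is a minimal sketch of these methods for 3x3 tic-tac-toe. It is purely illustrative and does not come from the library; the initial_state helper is an added convenience that the documented interface does not list.

```python
# Minimal, assumed implementation of the Game interface for tic-tac-toe.
# State: tuple of 9 cells, each -1 (empty), 0, or 1 (player marks).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

class TicTacToe:
    @staticmethod
    def initial_state():
        return (-1,) * 9

    @staticmethod
    def action_space(s):
        # Valid actions are the indices of empty cells
        return [i for i, c in enumerate(s) if c == -1]

    @staticmethod
    def flip_state(s):
        # Swap the two players' marks so player 0 becomes player 1
        return tuple(1 - c if c != -1 else -1 for c in s)

    @staticmethod
    def next_state(s, a, p):
        # Place player p's mark in cell a
        return s[:a] + (p,) + s[a + 1:]

    @staticmethod
    def _winner(s):
        for i, j, k in LINES:
            if s[i] != -1 and s[i] == s[j] == s[k]:
                return s[i]
        return None

    @staticmethod
    def reward(s, p):
        w = TicTacToe._winner(s)
        if w is None:
            return 0
        return 1 if w == p else -1

    @staticmethod
    def terminal(s):
        return TicTacToe._winner(s) is not None or -1 not in s

    @staticmethod
    def to_hash(s):
        # Tuples are hashable, so the state can be hashed directly
        return hash(s)
```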
Note: There are two types of agents: those that are trainable and those that are not. A trainable agent inherits from the TrainableAgent class and must implement all of the members defined below. For example, MCTSAgent is a trainable agent, while MinimaxAgent is not.
class core.Agent

An agent class which exposes a method called action. Given a certain state of a game and the player that is playing, the agent returns the best action it can find, according to a certain heuristic or strategy.
class core.TrainableAgent

Class that extends the functionality of a normal agent. This is necessary because agents are bound to a particular player, but for some algorithms the agent is really being trained to play optimally for both players. This class therefore houses the training data, which is passed into the agents when they are instantiated to avoid duplicated work.
train(g, **kwargs)

Train the agent. As a convenience this should return self.training_params() at the end of training.

Parameters: g (Game) – The game the agent is training on.
Returns: The training params of the agent.
Return type: tuple
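The contract above can be sketched with a toy subclass. Everything here is a stand-in: the tabular random-rollout update, the initial_state() helper, and the shape of training_params() are assumptions for illustration, not the library's actual algorithm.

```python
# Hypothetical TrainableAgent subclass. The value update and the
# Game.initial_state() helper are assumed for illustration only.
import random

class TabularAgent:
    def __init__(self, values=None):
        # Shared training data: state hash -> accumulated outcome.
        # Passing the same dict to both players' agents avoids
        # duplicated work, as described above.
        self.values = values if values is not None else {}

    def training_params(self):
        return (self.values,)

    def train(self, g, num_episodes=10, **kwargs):
        for _ in range(num_episodes):
            s = g.initial_state()
            p = 0
            trajectory = []
            while not g.terminal(s):
                trajectory.append(g.to_hash(s))
                a = random.choice(g.action_space(s))
                s = g.next_state(s, a, p)
                p = 1 - p
            z = g.reward(s, 0)          # outcome from player 0's view
            for h in trajectory:
                self.values[h] = self.values.get(h, 0.0) + z
        # By convention, return the training params at the end of training
        return self.training_params()
```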
class core.Algorithm

A basic abstraction for a class that finds an action to take in a given state for a given player. Even if the algorithm is not stateful, it is still implemented as a class to provide a uniform interface.
Note: Although this interface is almost identical to that of an agent, an agent can use multiple algorithms to come up with an action for a player to execute in a game.