Core
ECM constructors
Here we collect standalone functions that help construct different types of ECM.
standard_ps_upd
standard_ps_upd (reward, hmatrix, gmatrix, h_damp, g_damp)
*Given a reward, updates the h-matrix and g-matrix following the standard PS update rule:
\(h \leftarrow h - h_{\mathrm{damp}} \cdot (h - 1) + \mathrm{reward} \cdot g\)
\(g \leftarrow (1 - g_{\mathrm{damp}}) \cdot g\)*
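The rule maps directly onto array arithmetic. Below is a minimal NumPy sketch of the same update, not necessarily the library's exact implementation (for instance, it returns new arrays rather than updating in place):

```python
import numpy as np

def standard_ps_upd(reward, hmatrix, gmatrix, h_damp, g_damp):
    """Return the updated (h-matrix, g-matrix) after one PS update step."""
    # Damp h-values toward their initial value of 1 and reinforce
    # glowing edges in proportion to the reward.
    hmatrix = hmatrix - h_damp * (hmatrix - 1) + reward * gmatrix
    # Decay the glow matrix by the glow damping parameter.
    gmatrix = (1 - g_damp) * gmatrix
    return hmatrix, gmatrix

# Example with a 3-percept, 2-action memory (shapes are illustrative):
h, g = standard_ps_upd(reward=1.0,
                       hmatrix=np.ones((3, 2)),
                       gmatrix=np.zeros((3, 2)),
                       h_damp=0.01, g_damp=0.1)
```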
Pre-built ECMs
Here we collect the abstract parent class that any ECM should be built upon, as well as some pre-built ECMs ready to use.
Abstract_ECM
Abstract_ECM (*args, **kwargs)
Abstract parent class from which any episodic and compositional memory (ECM) should be derived. Asserts that the necessary methods are implemented. No compulsory input objects are needed.
Any ECM must have Abstract_ECM as a parent class, and hence must implement one compulsory method:
Abstract_ECM.sample
Abstract_ECM.sample ()
Performs a random walk through the ECM. Typically, this implies receiving an input percept and returning an action.
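As an illustration, here is a minimal sketch of a custom ECM derived from Abstract_ECM. The percept argument and the uniform action choice are illustrative assumptions; only the requirement to implement sample comes from the documentation above.

```python
import random
# Assumes Abstract_ECM has been imported from this package; the exact
# import path depends on the package layout.

class RandomECM(Abstract_ECM):
    """Toy ECM that implements the compulsory `sample` method."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def sample(self, percept=None):
        # Trivial "random walk": ignore the percept and pick a uniform action.
        return random.randrange(self.num_actions)
```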
Two_Layer
Two_Layer (num_actions:int, g_damp:float, h_damp:float, policy:str='greedy', policy_parameters:dict=None, glow_method:str='sum')
*Two-layer ECM. The first layer, which encodes the percepts observed in an environment, is initially empty (i.e. self.num_percepts = 0). As percepts are observed, they are added to the ECM and to the percept dictionary self.percepts. The second layer, which encodes the actions, has size self.num_actions. In practice, the ECM graph is never created explicitly. Instead, it is defined indirectly by the h-matrix and g-matrix, both of shape (self.num_percepts, self.num_actions). The input policy (greedy, softmax or other) is used to sample actions based on the h-matrix.
For an end-to-end example of how to use this class, see the introductory tutorial notebook on PS agents.*
|  | Type | Default | Details |
|---|---|---|---|
| num_actions | int |  | The number of available actions. |
| g_damp | float |  | The glow damping (or eta) parameter. |
| h_damp | float |  | The damping (or gamma) parameter. |
| policy | str | greedy | If ‘greedy’, uses a greedy policy that samples the action with the highest h-value for the current percept. If ‘softmax’, uses a softmax policy that samples an action based on the h-matrix and a temperature parameter (encoded in policy_parameters). If a callable object is passed, it is used to sample actions; its input must be the h-values corresponding to the current percept plus arbitrary policy_parameters. |
| policy_parameters | dict | None | The parameters of the policy. |
| glow_method | str | sum | Method to update the g-matrix. If ‘sum’, adds the new value to the current value. If ‘init’, sets the new value to 1. |
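A hedged usage sketch follows. The constructor call matches the documented signature, but passing the percept to sample and the ‘temperature’ key in policy_parameters are assumptions based on the descriptions above; see the tutorial notebook for the authoritative workflow.

```python
# Build a two-layer ECM with 4 actions and a softmax policy.
ecm = Two_Layer(num_actions=4,
                g_damp=0.1,      # glow damping (eta)
                h_damp=0.01,     # damping (gamma)
                policy='softmax',
                policy_parameters={'temperature': 0.5})  # key name is hypothetical

percept = 'red_light'         # new percepts are added to ecm.percepts on first sight
action = ecm.sample(percept)  # random walk through the ECM: percept in, action out
```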