Core
ECM constructors
Here we collect standalone functions that help construct different types of ECM.
standard_ps_upd
standard_ps_upd (reward, hmatrix, gmatrix, h_damp, g_damp)
*Given a reward, updates the h-matrix and g-matrix following the standard PS update rule:
\(h \leftarrow h - h_{\mathrm{damp}} \cdot (h - 1) + \mathrm{reward} \cdot g\)
\(g \leftarrow (1 - g_{\mathrm{damp}}) \cdot g\)*
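The rule maps directly onto array arithmetic. Below is a minimal NumPy sketch of the same update, not necessarily the library's exact implementation (for instance, it returns new arrays rather than updating in place):

```python
import numpy as np

def standard_ps_upd(reward, hmatrix, gmatrix, h_damp, g_damp):
    """Return the updated (h-matrix, g-matrix) after one PS update step."""
    # Damp h-values toward their initial value of 1 and reinforce
    # glowing edges in proportion to the reward.
    hmatrix = hmatrix - h_damp * (hmatrix - 1) + reward * gmatrix
    # Decay the glow matrix by the glow damping parameter.
    gmatrix = (1 - g_damp) * gmatrix
    return hmatrix, gmatrix

# Example with a 3-percept, 2-action memory (shapes are illustrative):
h, g = standard_ps_upd(reward=1.0,
                       hmatrix=np.ones((3, 2)),
                       gmatrix=np.zeros((3, 2)),
                       h_damp=0.01, g_damp=0.1)
```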
Pre-built ECMs
Here we collect the abstract parent class that any ECM should be built upon, as well as some pre-built ECMs ready to use.
Abstract_ECM
Abstract_ECM (*args, **kwargs)
Abstract parent class from which any episodic and compositional memory (ECM) should be derived. Asserts that the necessary methods are implemented. No compulsory input objects are needed.
Any ECM must have Abstract_ECM as a parent class, and hence must implement one compulsory method:
Abstract_ECM.sample
Abstract_ECM.sample ()
Performs a random walk through the ECM. Typically, this implies receiving an input percept and returning an action.
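As an illustration, here is a minimal sketch of a custom ECM derived from Abstract_ECM. The percept argument and the uniform action choice are illustrative assumptions; only the requirement to implement sample comes from the documentation above.

```python
import random
# Assumes Abstract_ECM has been imported from this package; the exact
# import path depends on the package layout.

class RandomECM(Abstract_ECM):
    """Toy ECM that implements the compulsory `sample` method."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def sample(self, percept=None):
        # Trivial "random walk": ignore the percept and pick a uniform action.
        return random.randrange(self.num_actions)
```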
Two_Layer
Two_Layer (num_actions:int, g_damp:float, h_damp:float, policy:str='greedy', policy_parameters:dict=None, glow_method:str='sum')
*Two-layer ECM. The first layer, which encodes the percepts observed in an environment, is initially empty (i.e. self.num_percepts = 0). As percepts are observed, they are added to the ECM and to the percept dictionary self.percepts. The second layer, which encodes the actions, has size self.num_actions. In practice, the ECM graph is never created explicitly. Instead, it is defined indirectly by the h-matrix and g-matrix, both of shape (self.num_percepts, self.num_actions). The input policy (greedy, softmax or other) is used to sample actions based on the h-matrix.
For an end-to-end example of how to use this class, see the introductory tutorial notebook on PS agents.*
|  | Type | Default | Details |
|---|---|---|---|
| num_actions | int |  | The number of available actions. |
| g_damp | float |  | The glow damping (or eta) parameter. |
| h_damp | float |  | The damping (or gamma) parameter. |
| policy | str | greedy | If ‘greedy’, uses a greedy policy that samples the action with the highest h-value for the current percept. If ‘softmax’, uses a softmax policy that samples an action based on the h-matrix and a temperature parameter (encoded in policy_parameters). If a callable object is passed, it is used to sample actions; its input must be the h-values corresponding to the current percept plus arbitrary policy_parameters. |
| policy_parameters | dict | None | The parameters of the policy. |
| glow_method | str | sum | Method to update the g-matrix. If ‘sum’, adds the new value to the current value. If ‘init’, sets the new value to 1. |
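A hedged usage sketch follows. The constructor call matches the documented signature, but passing the percept to sample and the ‘temperature’ key in policy_parameters are assumptions based on the descriptions above; see the tutorial notebook for the authoritative workflow.

```python
# Build a two-layer ECM with 4 actions and a softmax policy.
ecm = Two_Layer(num_actions=4,
                g_damp=0.1,      # glow damping (eta)
                h_damp=0.01,     # damping (gamma)
                policy='softmax',
                policy_parameters={'temperature': 0.5})  # key name is hypothetical

percept = 'red_light'         # new percepts are added to ecm.percepts on first sight
action = ecm.sample(percept)  # random walk through the ECM: percept in, action out
```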