Foraging
This notebook gathers different kinds of agents for foraging and target search in various scenarios, adapted for use in the reinforcement learning paradigm.
Helpers
rand_choice_nb
rand_choice_nb (arr:array, prob:array)
| | Type | Details |
|---|---|---|
| arr | array | A 1D numpy array of values to sample from. |
| prob | array | A 1D numpy array of probabilities for the given samples. |
| Returns | float | A random sample from the given array, drawn with the given probabilities. |
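As a quick illustration, here is a minimal usage sketch, assuming `rand_choice_nb` behaves as a numba-friendly replacement for `np.random.choice` with explicit probabilities (the example values are ours, not from the library):

```python
import numpy as np

# Hypothetical example: sample one value according to a probability vector.
arr = np.array([0.0, 1.0])       # values to sample from
prob = np.array([0.3, 0.7])      # probabilities for each value (must sum to 1)
sample = rand_choice_nb(arr, prob)  # a single float drawn from `arr`
```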
Forager
Forager
Forager (num_actions:int, size_state_space:array, gamma_damping=0.0, eta_glow_damping=0.0, policy_type='standard', beta_softmax=3, initial_prob_distr=array([], shape=(2, 0), dtype=float64), fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=10000, g_update='s')
This class defines a Forager agent, able to perform actions and learn from rewards based on the PS (Projective Simulation) formalism.
This is an updated version of the one used in the original paper (https://doi.org/10.1088/1367-2630/ad19a8), incorporating the improvements to the H and G matrices proposed by Michele Caraglio in our paper (https://doi.org/10.1039/D3SM01680C).
See `self.act`.
| | Type | Default | Details |
|---|---|---|---|
| num_actions | int | | Number of actions the agent can take |
| size_state_space | array | | Size of the state space, given as an array where each entry is the dimension of the corresponding environmental feature |
| gamma_damping | float | 0.0 | Gamma damping from PS |
| eta_glow_damping | float | 0.0 | Eta damping from PS |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Beta parameter for the softmax policy |
| initial_prob_distr | ndarray | [] | Initial probability distribution for the H matrix |
| fixed_policy | ndarray | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 10000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| g_update | str | s | Type of update for the G matrix. Can be ‘s’ (sum) or ‘r’ (reset). When updating, ‘s’ does g_mat += 1, while ‘r’ sets g_mat = 1 |
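As a sketch, this is how we might construct a Forager with the documented constructor arguments. Only the constructor call follows the signature above; the commented `act` call is an assumption based on the docstring's reference to `self.act`:

```python
import numpy as np

# Sketch: build a Forager using the documented constructor parameters.
agent = Forager(num_actions=2,                     # e.g. two possible actions
                size_state_space=np.array([100]),  # one environmental feature with 100 states
                gamma_damping=1e-5,
                eta_glow_damping=0.1,
                policy_type='standard',
                g_update='s')

# The docstring points to `self.act`; we assume it maps a perceived state to an action:
# action = agent.act(state)
```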
Parallel training launchers
Here we gather useful functions that launch the training and inference of agents in different environments.
Reset Environment
train_loop_reset
train_loop_reset (episodes:int, time_ep:int, agent:object, env:object, h_mat_allT:bool=False, when_save_h_mat=1, reset_after_reward=True)
Training loop for a forager agent in a given environment.
| | Type | Default | Details |
|---|---|---|---|
| episodes | int | | Number of episodes |
| time_ep | int | | Length of an episode |
| agent | object | | Agent that will be trained |
| env | object | | Environment where the agent will be trained |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired times |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the times at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
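A sketch of how the loop might be invoked, assuming `agent` is a Forager (constructed as above) and `env` is a compatible Reset environment defined elsewhere in this library; the structure of the returned results is not detailed on this page:

```python
# Sketch call; `agent` and `env` are assumed to be constructed beforehand.
# The returned object (if any) is not documented here.
train_loop_reset(episodes=100,          # number of episodes
                 time_ep=1000,          # steps per episode
                 agent=agent,
                 env=env,
                 h_mat_allT=False,      # do not store intermediate H matrices
                 when_save_h_mat=1,
                 reset_after_reward=True)
```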
Launchers
We now prepare some functions that allow us to launch parallel trainings with ease. We have to separate the launchers into 1D and 2D versions because of numba compilation, which would otherwise raise errors since the environments expect different inputs.
run_agents_reset_1D
run_agents_reset_1D (episodes, time_ep, N_agents, D=0.5, L=10.0, num_actions=2, size_state_space=array([100]), gamma_damping=1e-05, eta_glow_damping=0.1, g_update='s', initial_prob_distr=array([], shape=(2, 0), dtype=float64), policy_type='standard', beta_softmax=3, fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=1000, h_mat_allT=False, when_save_h_mat=1, reset_after_reward=True, num_runs=None)
Launches parallel trainings of forager agents in a 1D Reset environment.
| | Type | Default | Details |
|---|---|---|---|
| episodes | | | Number of episodes |
| time_ep | | | Length of an episode |
| N_agents | | | Number of parallel agents |
| D | float | 0.5 | Diffusion coefficient |
| L | float | 10.0 | Size of the environment |
| num_actions | int | 2 | Number of actions |
| size_state_space | ndarray | [100] | Size of the state space |
| gamma_damping | float | 1e-05 | PS damping factor |
| eta_glow_damping | float | 0.1 | PS glow damping factor |
| g_update | str | s | Type of G update. Can be ‘s’ (sum) or ‘r’ (reset) |
| initial_prob_distr | ndarray | [] | Initial probability distribution for the H matrix |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Softmax temperature if the softmax policy is used |
| fixed_policy | ndarray | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 1000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired times |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the times at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into several runs, each over the number of cores given by N_agents |
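For example, a hedged sketch of a call launching a few agents in parallel in the 1D Reset environment, using the documented parameter names and defaults (the shape of the returned results is not shown on this page):

```python
import numpy as np

# Sketch: launch 4 parallel agents in the 1D Reset environment.
run_agents_reset_1D(episodes=100,
                    time_ep=1000,
                    N_agents=4,                       # one agent per core
                    D=0.5, L=10.0,                    # environment parameters
                    num_actions=2,
                    size_state_space=np.array([100]),
                    gamma_damping=1e-5,
                    eta_glow_damping=0.1,
                    g_update='s',
                    reset_after_reward=True)
```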
run_agents_reset_2D
run_agents_reset_2D (episodes, time_ep, N_agents, dist_target=10.0, radius_target=1.0, D=0.5, num_actions=2, size_state_space=array([100]), gamma_damping=1e-05, eta_glow_damping=0.1, initial_prob_distr=array([], shape=(2, 0), dtype=float64), policy_type='standard', beta_softmax=3, fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=1000, h_mat_allT=False, when_save_h_mat=1, reset_after_reward=True, g_update='s', num_runs=None)
Launches parallel trainings of forager agents in a 2D Reset environment.
| | Type | Default | Details |
|---|---|---|---|
| episodes | | | Number of episodes |
| time_ep | | | Length of an episode |
| N_agents | | | Number of parallel agents |
| dist_target | float | 10.0 | Distance from the origin where the target is located |
| radius_target | float | 1.0 | Radius of the target |
| D | float | 0.5 | Diffusion coefficient |
| num_actions | int | 2 | Number of actions |
| size_state_space | ndarray | [100] | Size of the state space |
| gamma_damping | float | 1e-05 | PS damping factor |
| eta_glow_damping | float | 0.1 | PS glow damping factor |
| initial_prob_distr | ndarray | [] | Initial probability distribution for the H matrix |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Softmax temperature if the softmax policy is used |
| fixed_policy | ndarray | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 1000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired times |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the times at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
| g_update | str | s | Type of G update. Can be ‘s’ (sum) or ‘r’ (reset) |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into several runs, each over the number of cores given by N_agents |
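And the analogous sketch for the 2D launcher, where the target geometry (distance and radius) replaces the 1D box size; the example values are ours, chosen to match the documented defaults:

```python
import numpy as np

# Sketch: launch 4 parallel agents in the 2D Reset environment.
run_agents_reset_2D(episodes=100,
                    time_ep=1000,
                    N_agents=4,
                    dist_target=10.0,     # target distance from the origin
                    radius_target=1.0,    # target radius
                    D=0.5,
                    num_actions=2,
                    size_state_space=np.array([100]),
                    gamma_damping=1e-5,
                    eta_glow_damping=0.1,
                    g_update='s')
```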