Foraging

This notebook gathers different kinds of agents for foraging and target search in various scenarios, adapted for use in the reinforcement learning paradigm.

Helpers



rand_choice_nb

 rand_choice_nb (arr:array, prob:array)

|  | Type | Details |
|---|---|---|
| arr | array | A 1D numpy array of values to sample from. |
| prob | array | A 1D numpy array of probabilities for the given samples. |
| Returns | float | A random sample from the given array with a given probability. |
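
For reference, a weighted draw like this is often written in Numba-compatible code by inverting the cumulative distribution with a single uniform sample. The sketch below illustrates that common pattern; it is not necessarily the exact implementation used in this library.

```python
import numpy as np
from numba import njit

@njit
def rand_choice_nb_sketch(arr, prob):
    """Return one element of `arr`, drawn with the probabilities in `prob`."""
    # Invert the cumulative distribution with a single uniform draw;
    # np.cumsum and np.searchsorted are both supported in nopython mode.
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]

# Example: returns 1.0 with probability 0.7 and 2.0 with probability 0.3.
sample = rand_choice_nb_sketch(np.array([1.0, 2.0]), np.array([0.7, 0.3]))
```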

Forager



Forager

 Forager (num_actions:int, size_state_space:array, gamma_damping=0.0,
          eta_glow_damping=0.0, policy_type='standard', beta_softmax=3,
          initial_prob_distr=array([], shape=(2, 0), dtype=float64),
          fixed_policy=array([], shape=(2, 0), dtype=float64),
          max_no_H_update=10000, g_update='s')

This class defines a Forager agent, able to perform actions and learn from rewards based on the PS formalism.

This is an updated version of the agent used in the original paper (https://doi.org/10.1088/1367-2630/ad19a8), incorporating the improvements to the H and G matrices proposed by Michele Caraglio in our paper (https://doi.org/10.1039/D3SM01680C).

See self.act.

|  | Type | Default | Details |
|---|---|---|---|
| num_actions | int |  | Number of actions the agent can take |
| size_state_space | array |  | Size of the state space, given as an array where each entry is the dimension of each environmental feature |
| gamma_damping | float | 0.0 | Gamma damping from PS |
| eta_glow_damping | float | 0.0 | Eta damping from PS |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Beta parameter for the softmax policy |
| initial_prob_distr |  | [] | Initial probability distribution for the H matrix |
| fixed_policy |  | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 10000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| g_update | str | s | Type of update for the G matrix. Can be ‘s’ (sum), i.e. g_mat += 1, or ‘r’ (reset), i.e. g_mat = 1, whenever g_mat is updated |
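
As a rough illustration of how a Forager is typically driven in a reinforcement learning loop, the sketch below instantiates the agent and alternates perception, action and reward. The method names `act` and `learn`, the import path, and the toy environment are assumptions made for illustration only; check the class source for the actual API.

```python
import numpy as np
# from <package>.agents.foraging import Forager   # adjust the import to your install

class RandomRewardEnv:
    """Toy stand-in environment, used only to make the sketch self-contained."""
    def step(self, action):
        state = np.random.randint(100)                      # next percept index
        reward = 1.0 if np.random.random() < 0.01 else 0.0  # sparse random reward
        return state, reward

# Forager with a softmax policy over 2 actions and a single 100-state feature.
agent = Forager(num_actions=2,
                size_state_space=np.array([100]),
                gamma_damping=1e-5,
                eta_glow_damping=0.1,
                policy_type='softmax',
                beta_softmax=3)
env = RandomRewardEnv()

state = 0                              # index of the initial percept
for t in range(1000):
    action = agent.act(state)          # assumed method name: sample an action from the policy
    state, reward = env.step(action)   # toy environment defined above
    agent.learn(reward)                # assumed method name: reward-driven update of H and G
```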

Parallel training launchers

Here we gather useful functions that launch training and inference of agents in different environments.

Reset Environment



train_loop_reset

 train_loop_reset (episodes:int, time_ep:int, agent:object, env:object,
                   h_mat_allT:bool=False, when_save_h_mat=1,
                   reset_after_reward=True)

Training loop for a forager agent in a given environment.

|  | Type | Default | Details |
|---|---|---|---|
| episodes | int |  | Number of episodes |
| time_ep | int |  | Length of an episode |
| agent | object |  | Agent that will be trained |
| env | object |  | Environment where the agent will be trained |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired time |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the time at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
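
A minimal call might look as follows. The environment class name `ResetEnv1D` and its constructor arguments are placeholders for one of the reset environments documented under Environments/Foraging, and `output` stands for whatever the loop returns.

```python
import numpy as np

agent = Forager(num_actions=2, size_state_space=np.array([100]),
                gamma_damping=1e-5, eta_glow_damping=0.1)
env = ResetEnv1D(D=0.5, L=10.0)   # placeholder environment; see Environments/Foraging

# 100 episodes of 1000 steps, storing the H matrix at the time set by when_save_h_mat.
output = train_loop_reset(episodes=100, time_ep=1000,
                          agent=agent, env=env,
                          h_mat_allT=True, when_save_h_mat=10,
                          reset_after_reward=True)
```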

Launchers

We now prepare some functions that allow us to launch parallel trainings with ease. The launchers are split into 1D and 2D versions because of Numba compilation, which would otherwise raise errors since the two environments expect different inputs.



run_agents_reset_1D

 run_agents_reset_1D (episodes, time_ep, N_agents, D=0.5, L=10.0,
                      num_actions=2, size_state_space=array([100]),
                      gamma_damping=1e-05, eta_glow_damping=0.1,
                      g_update='s', initial_prob_distr=array([], shape=(2,
                      0), dtype=float64), policy_type='standard',
                      beta_softmax=3, fixed_policy=array([], shape=(2, 0),
                      dtype=float64), max_no_H_update=1000,
                      h_mat_allT=False, when_save_h_mat=1,
                      reset_after_reward=True, num_runs=None)

Launches parallel trainings of forager agents in a 1D Reset environment.

|  | Type | Default | Details |
|---|---|---|---|
| episodes |  |  | Number of episodes |
| time_ep |  |  | Length of an episode |
| N_agents |  |  | Number of parallel agents |
| D | float | 0.5 | Diffusion coefficient |
| L | float | 10.0 | Size of the environment |
| num_actions | int | 2 | Number of actions |
| size_state_space | ndarray | [100] | Size of the state space |
| gamma_damping | float | 1e-05 | PS damping factor |
| eta_glow_damping | float | 0.1 | PS glow damping factor |
| g_update | str | s | Type of G update. Can be ‘s’ (sum) or ‘r’ (reset) |
| initial_prob_distr |  | [] | Initial probability distribution for the H matrix |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Softmax temperature if the softmax policy is used |
| fixed_policy |  | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 1000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired time |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the time at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into several runs, each over the number of cores given by N_agents |
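
An illustrative call on a machine with 8 cores is sketched below; the parameter values are arbitrary, the import is omitted, and `results` stands for whatever the launcher returns. The commented line at the end shows one way to combine num_runs with N_agents to schedule more agents than cores, following the description above.

```python
import numpy as np

# 8 parallel agents (one per core), each trained for 500 episodes of 2000 steps
# in the 1D reset environment with diffusion coefficient D and size L.
results = run_agents_reset_1D(episodes=500, time_ep=2000, N_agents=8,
                              D=0.5, L=10.0,
                              num_actions=2, size_state_space=np.array([100]),
                              gamma_damping=1e-5, eta_glow_damping=0.1,
                              g_update='s')

# To train more agents than available cores, keep N_agents at the core count
# and repeat the launch over several runs, e.g. num_runs=4 for 32 agents total:
# results = run_agents_reset_1D(episodes=500, time_ep=2000, N_agents=8, num_runs=4)
```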


run_agents_reset_2D

 run_agents_reset_2D (episodes, time_ep, N_agents, dist_target=10.0,
                      radius_target=1.0, D=0.5, num_actions=2,
                      size_state_space=array([100]), gamma_damping=1e-05,
                      eta_glow_damping=0.1, initial_prob_distr=array([],
                      shape=(2, 0), dtype=float64),
                      policy_type='standard', beta_softmax=3,
                      fixed_policy=array([], shape=(2, 0), dtype=float64),
                      max_no_H_update=1000, h_mat_allT=False,
                      when_save_h_mat=1, reset_after_reward=True,
                      g_update='s', num_runs=None)

Launches parallel trainings of forager agents in a 2D Reset environment.

|  | Type | Default | Details |
|---|---|---|---|
| episodes |  |  | Number of episodes |
| time_ep |  |  | Length of an episode |
| N_agents |  |  | Number of parallel agents |
| dist_target | float | 10.0 | Distance from the origin where the target is located |
| radius_target | float | 1.0 | Radius of the target |
| D | float | 0.5 | Diffusion coefficient |
| num_actions | int | 2 | Number of actions |
| size_state_space | ndarray | [100] | Size of the state space |
| gamma_damping | float | 1e-05 | PS damping factor |
| eta_glow_damping | float | 0.1 | PS glow damping factor |
| initial_prob_distr |  | [] | Initial probability distribution for the H matrix |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Softmax temperature if the softmax policy is used |
| fixed_policy |  | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 1000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired time |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the time at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
| g_update | str | s | Type of G update. Can be ‘s’ (sum) or ‘r’ (reset) |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into several runs, each over the number of cores given by N_agents |
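
The 2D launcher is called in the same way as the 1D one; the short sketch below only highlights the 2D-specific target geometry (a target of radius radius_target placed at distance dist_target from the origin). Values are illustrative and the import is omitted.

```python
import numpy as np

results_2d = run_agents_reset_2D(episodes=500, time_ep=2000, N_agents=8,
                                 dist_target=10.0, radius_target=1.0, D=0.5,
                                 num_actions=2, size_state_space=np.array([100]),
                                 eta_glow_damping=0.1, g_update='s')
```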