Foraging
This notebook gathers different kinds of agents for foraging and target search in various scenarios, adapted for use in the reinforcement learning paradigm.
Helpers
rand_choice_nb
rand_choice_nb (arr:array, prob:array)
| | Type | Details |
|---|---|---|
| arr | array | A 1D numpy array of values to sample from. |
| prob | array | A 1D numpy array of probabilities for the given samples. |
| Returns | float | A random sample from the given array, drawn with the given probabilities. |
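As a quick illustration, here is a minimal usage sketch, assuming `rand_choice_nb` behaves as a numba-friendly replacement for `np.random.choice` with explicit probabilities (the example values are ours, not from the library):

```python
import numpy as np

# Hypothetical example: sample one value according to a probability vector.
arr = np.array([0.0, 1.0])       # values to sample from
prob = np.array([0.3, 0.7])      # probabilities for each value (must sum to 1)
sample = rand_choice_nb(arr, prob)  # a single float drawn from `arr`
```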
Forager
Forager
Forager (num_actions:int, size_state_space:array, gamma_damping=0.0, eta_glow_damping=0.0, policy_type='standard', beta_softmax=3, initial_prob_distr=array([], shape=(2, 0), dtype=float64), fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=10000, g_update='s')
This class defines a Forager agent, able to perform actions and learn from rewards based on the PS (Projective Simulation) formalism.
This is an updated version of the one used in the original paper (https://doi.org/10.1088/1367-2630/ad19a8), incorporating the improvements to the H and G matrices proposed by Michele Caraglio in our paper (https://doi.org/10.1039/D3SM01680C).
See `self.act`.
| | Type | Default | Details |
|---|---|---|---|
| num_actions | int | | Number of actions the agent can take |
| size_state_space | array | | Size of the state space, given as an array where each entry is the dimension of the corresponding environmental feature |
| gamma_damping | float | 0.0 | Gamma damping from PS |
| eta_glow_damping | float | 0.0 | Eta damping from PS |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Beta parameter for the softmax policy |
| initial_prob_distr | ndarray | [] | Initial probability distribution for the H matrix |
| fixed_policy | ndarray | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 10000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| g_update | str | s | Type of update for the G matrix. Can be ‘s’ (sum) or ‘r’ (reset). When updating, ‘s’ does g_mat += 1, while ‘r’ sets g_mat = 1 |
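As a sketch, this is how we might construct a Forager with the documented constructor arguments. Only the constructor call follows the signature above; the commented `act` call is an assumption based on the docstring's reference to `self.act`:

```python
import numpy as np

# Sketch: build a Forager using the documented constructor parameters.
agent = Forager(num_actions=2,                     # e.g. two possible actions
                size_state_space=np.array([100]),  # one environmental feature with 100 states
                gamma_damping=1e-5,
                eta_glow_damping=0.1,
                policy_type='standard',
                g_update='s')

# The docstring points to `self.act`; we assume it maps a perceived state to an action:
# action = agent.act(state)
```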
Parallel training launchers
Here we gather useful functions that launch the training and inference of agents in different environments.
Reset Environment
train_loop_reset
train_loop_reset (episodes:int, time_ep:int, agent:object, env:object, h_mat_allT:bool=False, when_save_h_mat=1, reset_after_reward=True)
Training loop for a forager agent in a given environment.
| | Type | Default | Details |
|---|---|---|---|
| episodes | int | | Number of episodes |
| time_ep | int | | Length of an episode |
| agent | object | | Agent that will be trained |
| env | object | | Environment where the agent will be trained |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired times |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the times at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
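A sketch of how the loop might be invoked, assuming `agent` is a Forager (constructed as above) and `env` is a compatible Reset environment defined elsewhere in this library; the structure of the returned results is not detailed on this page:

```python
# Sketch call; `agent` and `env` are assumed to be constructed beforehand.
# The returned object (if any) is not documented here.
train_loop_reset(episodes=100,          # number of episodes
                 time_ep=1000,          # steps per episode
                 agent=agent,
                 env=env,
                 h_mat_allT=False,      # do not store intermediate H matrices
                 when_save_h_mat=1,
                 reset_after_reward=True)
```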
Launchers
We now prepare some functions that allow us to launch parallel trainings with ease. We have to separate the launchers into 1D and 2D versions because of numba compilation, which would otherwise raise errors since the environments expect different inputs.
run_agents_reset_1D
run_agents_reset_1D (episodes, time_ep, N_agents, D=0.5, L=10.0, num_actions=2, size_state_space=array([100]), gamma_damping=1e-05, eta_glow_damping=0.1, g_update='s', initial_prob_distr=array([], shape=(2, 0), dtype=float64), policy_type='standard', beta_softmax=3, fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=1000, h_mat_allT=False, when_save_h_mat=1, reset_after_reward=True, num_runs=None)
Launches parallel trainings of forager agents in a 1D Reset environment.
| | Type | Default | Details |
|---|---|---|---|
| episodes | | | Number of episodes |
| time_ep | | | Length of an episode |
| N_agents | | | Number of parallel agents |
| D | float | 0.5 | Diffusion coefficient |
| L | float | 10.0 | Size of the environment |
| num_actions | int | 2 | Number of actions |
| size_state_space | ndarray | [100] | Size of the state space |
| gamma_damping | float | 1e-05 | PS damping factor |
| eta_glow_damping | float | 0.1 | PS glow damping factor |
| g_update | str | s | Type of G update. Can be ‘s’ (sum) or ‘r’ (reset) |
| initial_prob_distr | ndarray | [] | Initial probability distribution for the H matrix |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Softmax temperature if the softmax policy is used |
| fixed_policy | ndarray | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 1000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired times |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the times at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into several runs, each over the number of cores given by N_agents |
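For example, a hedged sketch of a call launching a few agents in parallel in the 1D Reset environment, using the documented parameter names and defaults (the shape of the returned results is not shown on this page):

```python
import numpy as np

# Sketch: launch 4 parallel agents in the 1D Reset environment.
run_agents_reset_1D(episodes=100,
                    time_ep=1000,
                    N_agents=4,                       # one agent per core
                    D=0.5, L=10.0,                    # environment parameters
                    num_actions=2,
                    size_state_space=np.array([100]),
                    gamma_damping=1e-5,
                    eta_glow_damping=0.1,
                    g_update='s',
                    reset_after_reward=True)
```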
run_agents_reset_2D
run_agents_reset_2D (episodes, time_ep, N_agents, dist_target=10.0, radius_target=1.0, D=0.5, num_actions=2, size_state_space=array([100]), gamma_damping=1e-05, eta_glow_damping=0.1, initial_prob_distr=array([], shape=(2, 0), dtype=float64), policy_type='standard', beta_softmax=3, fixed_policy=array([], shape=(2, 0), dtype=float64), max_no_H_update=1000, h_mat_allT=False, when_save_h_mat=1, reset_after_reward=True, g_update='s', num_runs=None)
Launches parallel trainings of forager agents in a 2D Reset environment.
| | Type | Default | Details |
|---|---|---|---|
| episodes | | | Number of episodes |
| time_ep | | | Length of an episode |
| N_agents | | | Number of parallel agents |
| dist_target | float | 10.0 | Distance from the origin where the target is located |
| radius_target | float | 1.0 | Radius of the target |
| D | float | 0.5 | Diffusion coefficient |
| num_actions | int | 2 | Number of actions |
| size_state_space | ndarray | [100] | Size of the state space |
| gamma_damping | float | 1e-05 | PS damping factor |
| eta_glow_damping | float | 0.1 | PS glow damping factor |
| initial_prob_distr | ndarray | [] | Initial probability distribution for the H matrix |
| policy_type | str | standard | Policy type. Can be ‘standard’ or ‘softmax’ |
| beta_softmax | int | 3 | Softmax temperature if the softmax policy is used |
| fixed_policy | ndarray | [] | Fixed policy for the agent to follow |
| max_no_H_update | int | 1000 | Max number of steps without updating the H matrix. After this number, the full H matrix is updated |
| h_mat_allT | bool | False | If True, saves the H matrix at the desired times |
| when_save_h_mat | int | 1 | If h_mat_allT = True, sets the times at which the H matrix is saved |
| reset_after_reward | bool | True | If True, the agent performs a reset after reaching a target |
| g_update | str | s | Type of G update. Can be ‘s’ (sum) or ‘r’ (reset) |
| num_runs | NoneType | None | Used when the desired number of agents differs from the number of available cores: the training is split into several runs, each over the number of cores given by N_agents |
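And the analogous sketch for the 2D launcher, where the target geometry (distance and radius) replaces the 1D box size; the example values are ours, chosen to match the documented defaults:

```python
import numpy as np

# Sketch: launch 4 parallel agents in the 2D Reset environment.
run_agents_reset_2D(episodes=100,
                    time_ep=1000,
                    N_agents=4,
                    dist_target=10.0,     # target distance from the origin
                    radius_target=1.0,    # target radius
                    D=0.5,
                    num_actions=2,
                    size_state_space=np.array([100]),
                    gamma_damping=1e-5,
                    eta_glow_damping=0.1,
                    g_update='s')
```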