Symbolic regression
Overview
This module implements symbolic regression techniques on top of the qdisc representation. It includes helpers for the various objectives (SR1/SR2/SR3), dataset preparation routines, and a SymbolicRegression training wrapper.
Ansätze
Currently, only the two-body correlator (2BC) ansatz is implemented, as TwoBodyModel.
TwoBodyModel
def TwoBodyModel(
pairs, key, add_constant:bool=False
):
A simple two-body model where the output is a sum of pairwise interactions between input features. Each pairwise interaction is parameterized by a learnable coefficient (alpha). The model can optionally include a constant term.
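As a rough illustration of the description above, the sketch below assumes the output is f(x) = c + sum over (i, j) in pairs of alpha_ij * x_i * x_j. The helper name `two_body_predict` and its arguments are hypothetical and are not the module's API. Plain Python is used to keep the example dependency-free, whereas the module itself presumably works on JAX arrays.

```python
def two_body_predict(x, pairs, alpha, constant=0.0):
    """Hypothetical helper: constant + sum of alpha_ij * x_i * x_j over the pairs."""
    return constant + sum(a * x[i] * x[j] for (i, j), a in zip(pairs, alpha))

# Toy values chosen for illustration only.
pairs = [(0, 1), (0, 2), (1, 2)]   # which feature pairs interact
alpha = [0.5, -1.0, 2.0]           # one coefficient per pair
x = [1, -1, 1]                     # one sample with features in {-1, +1}

y = two_body_predict(x, pairs, alpha)
y_with_constant = two_body_predict(x, pairs, alpha, constant=1.0)
```

With features in {-1, +1}, each product x_i * x_j is the parity of a pair, so each alpha_ij weights one two-body correlator.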
TwoBodyModel.predict
def predict(
X
):
Predict the output of the model given input X.
Losses
Below are the functions used to train the SR modules under the different objectives.
loss_SR3
def loss_SR3(
alpha, model, dataset, vk, G, L1_reg, options
):
loss used for 2BC with SR3
derivative_loss_alpha_multi_vk
def derivative_loss_alpha_multi_vk(
tree, X, options, vk, G
):
loss projecting and aligning the gradient
Args:
dy: (S, N) gradients of tree w.r.t. inputs
vk: (S, N, m) projector vectors per sample (m projectors)
G: (S, m) target projected gradients per sample
Returns:
MSE between normalized projected dy and normalized G (averaged over m).
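The computation described in this docstring can be sketched as follows. This is only an illustration under stated assumptions: the docstring does not say along which axis the normalization is done, so the sketch normalizes each projector column across the S samples, and the function names are hypothetical. Nested lists stand in for arrays of shape (S, N), (S, N, m) and (S, m).

```python
import math

def project(dy, vk):
    """p[s][k] = sum_n dy[s][n] * vk[s][n][k] (gradient projected on projector k)."""
    S, N, m = len(dy), len(dy[0]), len(vk[0][0])
    return [[sum(dy[s][n] * vk[s][n][k] for n in range(N)) for k in range(m)]
            for s in range(S)]

def normalize_columns(a, eps=1e-12):
    """ASSUMPTION: scale each column (one projector, all samples) to unit norm."""
    S, m = len(a), len(a[0])
    norms = [math.sqrt(sum(a[s][k] ** 2 for s in range(S))) + eps for k in range(m)]
    return [[a[s][k] / norms[k] for k in range(m)] for s in range(S)]

def projected_alignment_loss(dy, vk, G):
    """Mean squared difference between normalized projections and normalized G."""
    p = normalize_columns(project(dy, vk))
    g = normalize_columns(G)
    S, m = len(G), len(G[0])
    return sum((p[s][k] - g[s][k]) ** 2 for s in range(S) for k in range(m)) / (S * m)

# Toy example with S=2 samples, N=2 inputs, m=1 projector.
dy = [[1.0, 0.0], [0.0, 2.0]]
vk = [[[1.0], [0.0]], [[0.0], [1.0]]]
aligned = projected_alignment_loss(dy, vk, G=[[2.0], [4.0]])
opposed = projected_alignment_loss(dy, vk, G=[[-2.0], [-4.0]])
```

Because both sides are normalized, the loss only depends on the direction of the projected gradients: a target that is a positive multiple of the projection gives a loss close to zero, while a sign flip is penalized.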
loss_SR2
def loss_SR2(
alpha, model, dataset, G, L1_reg, options
):
loss used for the 2BC with SR2
derivative_loss_alpha_multi
def derivative_loss_alpha_multi(
tree, X, options, G
):
MSE loss aligning the gradient
loss_SR1
def loss_SR1(
alpha, model, X, Y, L1_reg, options
):
loss used for the 2BC with SR1
class_loss
def class_loss(
tree, X, options, Y
):
classification loss used for the 2BC with SR1
FFNN_theta_to_mu
def FFNN_theta_to_mu(
hidden_dim:int, num_layers:int, parent:Union=<flax.linen.module._Sentinel object at 0x7f74545e5280>,
name:Optional=None
)->None:
simple feed forward net from theta to mu1, used for SR3
Performance helpers
Below are the functions that help assess the performance/quality of the discovered expressions.
auc_from_scores_labels
def auc_from_scores_labels(
scores, labels
):
Compute AUC from scores and binary labels.
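For reference, the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pair counting. It is not the module's implementation, and the name `auc_pairwise` is hypothetical.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = auc_pairwise([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Pair counting costs O(n_pos * n_neg), which is fine for small checks. A rank-based formula gives the same value in O(n log n) for larger datasets.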
compare_theta_corr
def compare_theta_corr(
theta, X, C, method:str='pearson', use_upper:bool=True
):
Compare the learned theta matrix to the empirical correlation matrix C using the specified method (pearson, cosine, or spearman). If use_upper=True, only compare the upper triangular parts of the matrices.
spearman_rho
def spearman_rho(
a, b
):
Compute Spearman’s rank correlation coefficient between two vectors a and b.
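Spearman's coefficient is the Pearson correlation of the ranks of the two vectors. The sketch below spells this out in plain Python. Giving tied values the average of their ranks is the usual convention, but whether `spearman_rho` handles ties this way is an assumption, and the helper names are hypothetical.

```python
import math

def average_ranks(v):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and v[order[end + 1]] == v[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    return num / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))

def spearman(a, b):
    """Spearman rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))

rho_monotone = spearman([1, 2, 3, 4], [1, 4, 9, 16])   # nonlinear but monotone
rho_reversed = spearman([3, 1, 2], [10, 30, 20])
```

Because only ranks enter, any monotonically increasing relation gives a coefficient of 1 even when it is far from linear.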
cosine_sim
def cosine_sim(
a, b, eps:float=1e-12
):
Compute cosine similarity between two vectors a and b.
pearson_between_vectors
def pearson_between_vectors(
a, b, eps:float=1e-12
):
Compute Pearson correlation between two vectors a and b.
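Cosine similarity and Pearson correlation are closely related: the Pearson correlation is the cosine similarity of the mean-centered vectors. The sketch below shows both. Adding `eps` to the denominator to avoid division by zero is an assumption about how the module uses that parameter, and the function names are hypothetical.

```python
import math

def cosine(a, b, eps=1e-12):
    """Cosine of the angle between a and b; eps guards against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def pearson_via_cosine(a, b, eps=1e-12):
    """Pearson correlation = cosine similarity after subtracting the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b], eps)

parallel = cosine([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
anticorrelated = pearson_via_cosine([1.0, 2.0, 3.0], [5.0, 3.0, 1.0])
zero_vector = cosine([0.0, 0.0], [1.0, 1.0])   # returns 0.0 instead of raising
```

The practical difference is that cosine similarity is sensitive to a common offset in the entries, while Pearson correlation is not.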
flatten_upper
def flatten_upper(
mat
):
Flatten the upper triangular part of a matrix.
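A minimal sketch of this kind of flattening, using nested lists. Whether `flatten_upper` keeps the diagonal is not stated here, so the hypothetical `include_diagonal` flag below makes that choice explicit. The default of excluding it is an assumption.

```python
def upper_triangle(mat, include_diagonal=False):
    """Return the entries mat[i][j] with j > i (or j >= i), row by row."""
    n = len(mat)
    offset = 0 if include_diagonal else 1
    return [mat[i][j] for i in range(n) for j in range(i + offset, n)]

mat = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
strict = upper_triangle(mat)
with_diag = upper_triangle(mat, include_diagonal=True)
```

For a symmetric matrix, the strict upper triangle contains each pair (i, j) exactly once, which avoids double counting when two matrices are compared entry by entry.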
empirical_corr_matrix
def empirical_corr_matrix(
X, centered:bool=True
):
Compute empirical correlation matrix from data X. If centered=True, center the data by subtracting the mean before computing correlations.
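The sketch below illustrates the effect of the `centered` flag under the assumption that the matrix holds the sample averages of x_i * x_j, without dividing by the standard deviations. The module may normalize differently, and the function name is hypothetical.

```python
def second_moment_matrix(X, centered=True):
    """C[i][j] = mean over samples of (x_i - m_i) * (x_j - m_j); m = 0 if not centered."""
    S, N = len(X), len(X[0])
    means = [sum(row[i] for row in X) / S if centered else 0.0 for i in range(N)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / S
             for j in range(N)] for i in range(N)]

# Four toy samples of two +/-1 features (illustrative values only).
X = [[1, 1], [-1, -1], [1, -1], [1, 1]]
C_centered = second_moment_matrix(X, centered=True)
C_raw = second_moment_matrix(X, centered=False)
```

Centering removes the contribution of the one-body means, so the off-diagonal entries measure only how the features fluctuate together.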
curved_edge
def curved_edge(
ax, x1, y1, x2, y2, curvature:float=0.2, plot_kwargs:VAR_KEYWORD
):
Draw a curved (quadratic Bézier) edge between (x1,y1) and (x2,y2). curvature > 0 bends left, curvature < 0 bends right. Used to visualize pairwise interactions in the 2BC.
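The geometry behind such an edge can be sketched as follows. A quadratic Bézier curve is B(t) = (1-t)^2 P0 + 2(1-t)t Pc + t^2 P1, and the sketch assumes the control point Pc is the midpoint of the segment shifted along the left-hand normal by `curvature` times the segment vector. That construction and the name `quadratic_bezier_points` are assumptions, not the module's code.

```python
def quadratic_bezier_points(x1, y1, x2, y2, curvature=0.2, n_points=20):
    """Sample n_points on a quadratic Bezier curve from (x1, y1) to (x2, y2)."""
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    dx, dy = x2 - x1, y2 - y1
    # ASSUMPTION: control point = midpoint + curvature * left normal (-dy, dx).
    cx, cy = mx - curvature * dy, my + curvature * dx
    points = []
    for k in range(n_points):
        t = k / (n_points - 1)
        w0, w1, w2 = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((w0 * x1 + w1 * cx + w2 * x2, w0 * y1 + w1 * cy + w2 * y2))
    return points

# Edge from (0, 0) to (1, 0): a positive curvature bends towards +y,
# i.e. to the left when travelling from the first point to the second.
points = quadratic_bezier_points(0.0, 0.0, 1.0, 0.0, curvature=0.2, n_points=3)
```

The sampled points can then be unpacked into x and y lists and passed to matplotlib's `ax.plot` together with any plotting keyword arguments.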
SR class
The main entry point of SR is the SymbolicRegression class, which exposes the training and analysis workflow for symbolic regression on qdisc data.
SymbolicRegression
def SymbolicRegression(
dataset:Dataset, cluster_idx_in:Array, objective:str, type_of_vk:Optional=None,
cluster_idx_out:Optional=None, # only needed for SR1 if not specified, full/in
search_space:str='2_body_correlator', # ansatz or genetic
add_constant:bool=False, shift_data:bool=True, VAE_model:Optional=None, # needed for SR2,3
VAE_params:Optional=None, # needed for SR2,3
mu_cluster:Optional=None, # needed for SR2,3
idx_mu_cluster:Optional=None, # needed for SR2,3
):
Wrapper with the SR methods to be used on top of the representation learned by the cpVAE for quantum
Args:
dataset: Dataset object
cluster_idx_in: coord. specifying the location of the cluster we analyse in parameter (theta) space
objective: SR1, SR2 or SR3
cluster_idx_out: coord. specifying the location of the cluster we analyse in parameter (theta) space (only used in SR1)
search_space: 2_body_correlator or genetic
add_constant: if True, add a constant term to the model
shift_data: whether the symbolic function takes dataset.data directly or after mapping {0,1}->{-1,1}
for SR2,3 also need:
VAE_model: the VAE model
VAE_params: its params
mu_cluster: the value of the latent variable across theta space where the cluster appears (for now, only one mu)
idx_mu_cluster: index of the latent variable where the cluster appears (for now, only one)
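The {0,1}->{-1,1} mapping mentioned for shift_data is the affine map x -> 2x - 1. A minimal sketch, with a hypothetical helper name:

```python
def shift_bits(bits):
    """Map binary outcomes {0, 1} to {-1, +1} via 2*x - 1."""
    return [2 * b - 1 for b in bits]

sample = [0, 1, 1, 0]
shifted = shift_bits(sample)

# In the +/-1 encoding, a product of two entries is +1 when the bits agree
# and -1 when they differ, which is what a two-body correlator measures.
agreement = shifted[0] * shifted[3]
disagreement = shifted[0] * shifted[1]
```

In the original {0,1} encoding the same product would vanish whenever either bit is 0, so the shifted encoding is the natural input for pairwise products.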
SymbolicRegression.train
def train(
key:int, dataset_size:int=2000, kwargs:VAR_KEYWORD
):
redirect to the training method for the chosen search space
SymbolicRegression.call_pysr
def call_pysr(
key:PRNGKey, dataset_size:int, random_state:int=2575, # seed for reproducibility
niterations:int=200, # Number of iterations to search
binary_operators:list=['+', '*', '-'], # Allowed binary operations
unary_operators:list=[], # Other allowed operations
elementwise_loss:str='loss(x,y) = -y*log(1/(1+exp(-x)))-(1-y)*log(1-1/(1+exp(-x)))', # sigmoid loss for SR1
maxsize:int=20, # max complexity of the equations
progress:bool=True, # Show progress during training
extra_sympy_mappings:dict={'C': 'C'}, # Allow PySR to use constants
batching:bool=True, # batching, usually big dataset
batch_size:int=500, turbo:bool=True, deterministic:bool=True, # for reproducibility
parallelism:str='serial'
):
call pysr with the SR objective
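The default `elementwise_loss` string in the signature above is the binary cross-entropy applied to a sigmoid of the expression output x, with y the 0/1 label. The Python transcription below is only meant to make the formula easier to read and check. It is not how PySR evaluates the loss, since PySR runs the string in its Julia backend.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_cross_entropy(x, y):
    """Direct transcription of -y*log(s(x)) - (1-y)*log(1 - s(x)), s = sigmoid."""
    p = sigmoid(x)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# At x = 0 the predicted probability is 1/2, so the loss is log(2) for both labels.
loss_at_zero = sigmoid_cross_entropy(0.0, 1)

# For y = 1 the loss reduces to log(1 + exp(-x)); for y = 0 to log(1 + exp(x)).
loss_pos = sigmoid_cross_entropy(2.0, 1)
loss_neg = sigmoid_cross_entropy(2.0, 0)
```

A confident correct prediction (large positive x with y = 1) gives a loss near zero, while the same x with y = 0 is penalized roughly linearly in x.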
SymbolicRegression.train_2BC
def train_2BC(
key:PRNGKey, dataset_size:int=2000, L1_reg:float=0.0, print_info:bool=True, max_iter:int=500
)->object:
Train the 2 body correlator (2BC) ansatz on the various SR objectives
SymbolicRegression.prepare_dataset
def prepare_dataset(
key:PRNGKey, dataset_size:Optional=2000
)->tuple:
Prepare the dataset, redirect to a method depending on the objective
SymbolicRegression.plot_alpha
def plot_alpha(
topology:list, edge_scale:int=10, name:str='', threshold:float=None
):
plot the 2 body correlator weights alpha_ij
SymbolicRegression.compute_prediction
def compute_prediction(
theta_pair:tuple=(1, 0), values_other_thetas:tuple=()
)->Array:
compute f(x) on the parameter space
SymbolicRegression.compute_and_plot_prediction
def compute_and_plot_prediction(
theta_pair:tuple=(1, 0), values_other_thetas:tuple=(), name:str='', class_pred:bool=False,
fig_shape:tuple=(3, 3)
)->Array:
compute and plot f(x) on the parameter space
SymbolicRegression.reduce_alpha
def reduce_alpha(
random_state:int, niterations:int=200, binary_operators:list=['+', '*', '/', '-'],
unary_operators:list=['exp', 'log', 'sin', 'cos', 'tanh'], elementwise_loss:str='loss(x, y) = (x - y)^2',
maxsize:int=25, deterministic:bool=True, extra_sympy_mappings:dict={'C': 'C'}
)->str:
Use pysr to reduce the alpha. It tries to find a function g(i,j)->alpha_ij
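To make the idea concrete, the sketch below builds the kind of regression table such a reduction would work on, with one row (i, j) per pair and alpha_ij as the target, and scores a candidate g with the squared-error loss from the signature. The alpha values are made up for illustration and the helper names are hypothetical. The actual search over expressions is what pysr would do and is not reproduced here.

```python
def alpha_regression_table(pairs, alpha):
    """Inputs are the index pairs (i, j); targets are the fitted coefficients."""
    inputs = [[float(i), float(j)] for i, j in pairs]
    targets = [float(a) for a in alpha]
    return inputs, targets

def mean_squared_error(g, inputs, targets):
    """Average of (g(i, j) - alpha_ij)^2 over all pairs."""
    return sum((g(i, j) - t) ** 2 for (i, j), t in zip(inputs, targets)) / len(targets)

# Illustrative coefficients that decay with the distance |i - j|.
pairs = [(0, 1), (0, 2), (1, 2)]
alpha = [1.0, 0.5, 1.0]
inputs, targets = alpha_regression_table(pairs, alpha)

def candidate(i, j):
    """One expression a symbolic search might propose: 1 / |i - j|."""
    return 1.0 / abs(i - j)

mse = mean_squared_error(candidate, inputs, targets)
```

A compact g(i, j) with low error replaces the full table of coefficients by a single closed-form rule.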